How to iterate through multiple JSON URL pages

Posted: December 21, 2016 in UNIX

Below is an example of what the JSON page looks like. Each page has a "next" field holding the link to another page of JSON data; we need to collect all such links until we reach a page where "next" is null.
[Screenshot: example JSON page with its "next" link]
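The core idea is a loop: fetch a page, record its "next" link, and stop once "next" is null. Below is a minimal sketch of that idea, assuming each page is a JSON object with a "next" key. Unlike the script that follows, which computes an offset for every request, this version follows the "next" URL directly. The names iter_page_links and fetch_json are hypothetical, and the fetch function is passed in so that the logic can be tried on in-memory pages without a server.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str) -> dict:
    """Download one page and parse its JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_page_links(first_url: str,
                    fetch: Callable[[str], dict] = fetch_json,
                    max_pages: int = 100000) -> Iterator[str]:
    """Yield first_url and every following "next" link until "next" is null.

    max_pages is only a safety limit against a server that never returns null.
    """
    url = first_url
    for _ in range(max_pages):
        yield url
        next_url = fetch(url).get("next")
        if next_url is None:  # JSON null (or a missing key) ends the chain
            return
        url = next_url


if __name__ == "__main__":
    # Three fake pages standing in for a real paginated API.
    pages = {
        "page1": {"next": "page2"},
        "page2": {"next": "page3"},
        "page3": {"next": None},
    }
    print(list(iter_page_links("page1", fetch=pages.__getitem__)))
```

Against a real API you would call iter_page_links with your first page URL and write each yielded link to the output file.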

# This script collects all the page links and writes them to an output file.

import json
import sys
import urllib.request

SRC_DIR = sys.argv[1]  # path of the output file that will hold all the page links

# Offset passed to the API; it moves forward one page per request
count = 0

print(SRC_DIR)

# Every print(..., file=out) below writes to the output file
with open(SRC_DIR, 'w') as out:
    print('Initial link here', file=out)  # placeholder for the first page's link

    # 500000000 is only an upper bound; the loop normally ends when "next" is null
    while count < 500000000:
        URL = '(Your link here)Weblink=' + str(count)

        with urllib.request.urlopen(URL) as response:
            data = json.load(response)

        count += 500  # advance the offset to the next page (500 per step)

        Result = data['next']  # JSON null is parsed as None

        if Result is not None:
            print(Result, file=out)
        else:
            break

The script above writes all the URLs to a file, as shown below.

[Screenshot: output file listing the collected links]

Once we have this output file, we call the second script (below), which fetches the JSON data from one URL and converts it to CSV format.

This script can be called from a loop that reads the links file line by line and passes each line to this script as a parameter (the loop can be a shell script, a Perl script, or any other script).
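As one possible loop, here is a Python sketch that reads the links file line by line and runs the second script once per link, passing the URL and the CSV file name as its two arguments. The function name run_for_each_link and the file names in the usage comment (run_all.py, links.txt, json_to_csv.py, output.csv) are hypothetical placeholders. Because check=True is set, subprocess.run raises an error and the loop stops at the first page whose conversion fails.

```python
import subprocess
import sys


def run_for_each_link(links_file: str, script: str, csv_file: str) -> int:
    """Run `python <script> <url> <csv_file>` for every non-blank line of links_file.

    Returns the number of URLs processed.
    """
    processed = 0
    with open(links_file) as links:
        for line in links:
            url = line.strip()
            if not url:  # skip blank lines
                continue
            # sys.executable is the interpreter running this loop
            subprocess.run([sys.executable, script, url, csv_file], check=True)
            processed += 1
    return processed


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical names): python run_all.py links.txt json_to_csv.py output.csv
    total = run_for_each_link(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"processed {total} links")
```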

import json
import os
import sys
import urllib.request

# argv[1] is the URL of one JSON page, supplied at run time
data = urllib.request.urlopen(sys.argv[1]).read()
d = json.loads(data)  # parse the JSON text into a Python dict

# argv[2] is the output CSV file. It is opened in append mode so that each page adds
# its rows to the same file; the header is written only when the file is new or empty,
# otherwise it would be repeated once per page.
write_header = not os.path.exists(sys.argv[2]) or os.path.getsize(sys.argv[2]) == 0

with open(sys.argv[2], 'a') as out:
    if write_header:
        # header for the output file
        print('SERVER,PROVISION_DT,LAST_CHECKIN,OS,ARCHITECTURE,PLATFORM,STATE,IPADDRESS,LAST_CURRENCY_CHECK,LOCATION,RFS', file=out)

    for rows in d['results']:
        print(str(rows.get('name')) + ',' +
              str(rows.get('create_date')) + ',' +
              str(rows.get('checkin_date')) + ',' +
              str(rows.get('operating_system')) + ',' +
              str(rows.get('architecture')) + ',' +
              str(rows.get('platform')) + ',' +
              str(rows.get('state')) + ',' +
              str(rows.get('ip_address')) + ',' +
              str(rows.get('currency_date')) + ',' +
              str(rows.get('location')) + ',' +
              str(rows.get('rfs')), file=out)

# Using str() is essential: rows.get() returns None for a missing or null field, and
# without str() the concatenation fails with a TypeError such as
# TypeError: can only concatenate str (not "NoneType") to str
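Building each line by string concatenation also produces a broken CSV when a field itself contains a comma (a location such as "Dallas, TX", for example), and str(None) writes the literal text None. A sketch of the same conversion using Python's standard csv module follows: csv.writer quotes values that contain commas and writes None as an empty cell. The column-to-key mapping is taken from the script above; the names COLUMNS and results_to_csv are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

# Output column name -> JSON key, in the same order as the script above.
COLUMNS = [
    ("SERVER", "name"),
    ("PROVISION_DT", "create_date"),
    ("LAST_CHECKIN", "checkin_date"),
    ("OS", "operating_system"),
    ("ARCHITECTURE", "architecture"),
    ("PLATFORM", "platform"),
    ("STATE", "state"),
    ("IPADDRESS", "ip_address"),
    ("LAST_CURRENCY_CHECK", "currency_date"),
    ("LOCATION", "location"),
    ("RFS", "rfs"),
]


def results_to_csv(results: Iterable[dict], out: TextIO, header: bool = True) -> None:
    """Write one CSV row per result; missing or null fields become empty cells."""
    writer = csv.writer(out)
    if header:
        writer.writerow([column for column, _ in COLUMNS])
    for row in results:
        # csv.writer writes None as "" and quotes values that contain commas
        writer.writerow([row.get(key) for _, key in COLUMNS])


if __name__ == "__main__":
    buf = io.StringIO()
    results_to_csv([{"name": "web01", "location": "Dallas, TX"}], buf)
    print(buf.getvalue())
```

In the per-page script you would call results_to_csv(d['results'], out, header=write_header) on a file opened with open(sys.argv[2], 'a', newline=''), as the csv module documentation recommends.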

 

This is what the final output looks like.

[Screenshot: final CSV output]

The complete script is available for download at the link below:

https://drive.google.com/file/d/0B6HgeXG_yRq1UVhkYXJwQnZ5bE0/view?usp=sharing
