Archive for December, 2016

Below is an example of what the JSON page looks like. Each page has a "next" field containing a link to the next page of JSON data. We need to collect all such links until we reach a page where "next" is null.
[Screenshot: example JSON page]

# This script collects all the links and writes them to an output file.
# Written for Python 2 (urllib2 and the print statement).

import sys
import urllib2

import simplejson

SRC_DIR = sys.argv[1]  # path of the output file where all the web links are saved

# Start the offset counter at zero
count = 0

print(SRC_DIR)

# Redirect standard output so that every print below writes to the output file
sys.stdout = open(SRC_DIR, 'w')

print "Initial link here"  # first line written to the output file

# Request page after page, advancing the offset by 500 each time
while count < 500000000:
    URL = '(Your link here)Weblink=' + str(count)
    response = urllib2.urlopen(URL)
    data = simplejson.load(response)
    count += 500

    # JSON null is loaded as None, so str() turns it into 'None'
    Result = str(data['next'])
    if Result != 'None':
        print Result
    else:
        break

The above script writes all the URLs to a file, as shown below.

[Screenshot: output file listing the collected URLs]
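As a variation on the idea, the links can also be gathered by following each page's "next" value directly instead of building the URL from a counter. The following is only a minimal sketch in Python 3 (the script in this post is Python 2), and it assumes every page is a JSON object with a "next" key. The names fetch_json and collect_next_links, and the max_pages safety cap, are made up for this illustration.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one page and parse its body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_next_links(start_url, fetch=fetch_json, max_pages=1000000):
    """Follow each page's 'next' link until it is null (None in Python).

    'fetch' is any callable that maps a URL to the parsed page; it is a
    parameter so the loop can be tried out without a real server.
    """
    links = []
    url = start_url
    for _ in range(max_pages):  # hypothetical cap to avoid an endless loop
        next_url = fetch(url).get("next")
        if next_url is None:
            break
        links.append(next_url)
        url = next_url
    return links
```

Calling collect_next_links with the initial link and writing each returned item on its own line would produce a file similar to the one shown above.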

Once we have the above output file, we call our second script (below) for each URL. It fetches the JSON data from that URL and converts it to CSV format.

This script can be called from a small loop script that reads the above file line by line and passes each line as a parameter to this script (the loop can be a shell script, Perl or any other scripting language).

# Written for Python 2 (urllib2 and the print statement).
import json
import sys
import urllib2

# The first argument is the URL to fetch; it is supplied at run time.
data = urllib2.urlopen(sys.argv[1]).read()
d = json.loads(data)  # parse the JSON text returned by the URL

# Redirect standard output so the print statements append to the file
# named by the second argument.
sys.stdout = open(sys.argv[2], "a")

# Header row for the output file. Because the file is opened in append mode,
# this header is written again on every run of the script.
print 'SERVER,PROVISION_DT,LAST_CHECKIN,OS,ARCHITECTURE,PLATFORM,STATE,IPADDRESS,LAST_CURRENCY_CHECK,LOCATION,RFS'

for rows in d['results']:
    print str(rows.get('name')) + "," + \
        str(rows.get('create_date')) + "," + \
        str(rows.get('checkin_date')) + "," + \
        str(rows.get('operating_system')) + "," + \
        str(rows.get('architecture')) + "," + \
        str(rows.get('platform')) + "," + \
        str(rows.get('state')) + "," + \
        str(rows.get('ip_address')) + "," + \
        str(rows.get('currency_date')) + "," + \
        str(rows.get('location')) + "," + \
        str(rows.get('rfs'))

# Wrapping every field in str() is essential; otherwise a None value makes the script fail with the error below:
# TypeError: coercing to Unicode: need string or buffer, NoneType found
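The script above joins the fields with commas by hand, so a value that itself contains a comma would shift the columns. One possible alternative is Python's standard csv module, which quotes such values automatically. This is only a sketch in Python 3, not the script used in this post: the field and header names are copied from the script above, while write_rows and the sample record (web01, "Austin, TX") are invented for the example.

```python
import csv
import io

# Field names and header labels copied from the script above, in the same order.
FIELDS = ["name", "create_date", "checkin_date", "operating_system",
          "architecture", "platform", "state", "ip_address",
          "currency_date", "location", "rfs"]
HEADER = ["SERVER", "PROVISION_DT", "LAST_CHECKIN", "OS", "ARCHITECTURE",
          "PLATFORM", "STATE", "IPADDRESS", "LAST_CURRENCY_CHECK",
          "LOCATION", "RFS"]


def write_rows(results, out, write_header=True):
    """Write one CSV line per record to the file-like object 'out'."""
    writer = csv.writer(out, lineterminator="\n")
    if write_header:
        writer.writerow(HEADER)
    for row in results:
        # str() mirrors the original script: a missing field becomes "None"
        writer.writerow([str(row.get(field)) for field in FIELDS])


# Made-up sample record, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_rows([{"name": "web01", "location": "Austin, TX"}], buffer)
print(buffer.getvalue())
```

Passing write_header=False on later calls would keep the header from being repeated when several pages are appended to the same file.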

 

This is how the final output looks.

[Screenshot: final CSV output]
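To tie the two scripts together, the line-by-line loop mentioned earlier could also be written in Python instead of shell or Perl. The sketch below (Python 3) rests on a few assumptions that are not from this post: the second script is saved as json_to_csv.py, a python2 executable is available on the PATH to run it, and the function names are made up.

```python
import subprocess


def read_urls(path):
    """Return the lines of the link file, stripped, skipping blank lines."""
    with open(path) as handle:
        stripped = (line.strip() for line in handle)
        return [line for line in stripped if line]


def build_command(url, csv_path, script="json_to_csv.py", interpreter="python2"):
    """Build the command line for one run of the conversion script.

    'json_to_csv.py' and 'python2' are assumed names, not from the post.
    """
    return [interpreter, script, url, csv_path]


def run_all(link_file, csv_path):
    """Run the conversion script once per URL, stopping on the first failure."""
    for url in read_urls(link_file):
        subprocess.check_call(build_command(url, csv_path))
```

For example, run_all("links.txt", "servers.csv") would invoke the conversion script once for every URL in links.txt, appending to servers.csv; both file names are placeholders.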

The complete script can be downloaded from the link below:

https://drive.google.com/file/d/0B6HgeXG_yRq1UVhkYXJwQnZ5bE0/view?usp=sharing