JSON has become the de facto standard for data transfer across much of the internet. With mobile apps now ubiquitous, nearly all of their data travels as JSON.
Processing JSON rapidly is a recurring need in data processing.
Multiple libraries exist in Python and R for processing JSON. However, I strongly prefer to process JSON on the command line with tools built specifically for that purpose.
jq is my favorite command line JSON processor.
jq makes processing JSON a pleasure and a lot of fun. It's widely used, and plenty of material is available on the web to help with data wrangling.
Below, I'd like to demonstrate it on data from my favorite Spinning provider, Peloton.
I've got all the rides to date from Peloton in a file called consolidated-od-classes.json. Typically, for data analysis in R or Python, I need the data transformed to CSV so it can be loaded into a data frame for manipulation.
Here is a simple command that does the job:
    # generated from other scripts
    # download at https://l.mypad.in/peloton-od-classes
    output="./data/peloton/consolidated-od-classes.json"

    # for tracking performance
    start=$(date +%s)

    result="./data/peloton/result.csv"
    # write the CSV header row, then append one row per class
    echo 'title,id,scheduled_start_time' > "$result"
    jq -r '[.title, .id, .scheduled_start_time] | @csv' "$output" >> "$result"

    end=$(date +%s)
    runtime=$((end-start))
    # elapsed time in seconds
    echo $runtime
The CSV extract is ready!
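To see what the filter does to each record, here is a minimal sketch that runs it on two made-up records. The sample titles, ids, and timestamps are hypothetical, not real Peloton data, and I'm assuming the file holds one JSON object per class. The last command shows a variant for the case where the file is instead a single top-level array, which needs `.[]` to iterate over the elements first.

```shell
# Hypothetical sample records, one JSON object per line (not real Peloton data).
sample='{"title":"30 min Pop Ride","id":"abc123","scheduled_start_time":1577880000}
{"title":"45 min \"Power Zone\" Ride","id":"def456","scheduled_start_time":1577966400}'

# Same filter as in the script: build an array per record, then format it as a CSV row.
rows=$(printf '%s\n' "$sample" | jq -r '[.title, .id, .scheduled_start_time] | @csv')
printf '%s\n' "$rows"
# "30 min Pop Ride","abc123",1577880000
# "45 min ""Power Zone"" Ride","def456",1577966400

# Variant, assuming a single top-level array: slurp the sample into an array with -s,
# then iterate its elements with .[] before applying the same filter.
array_rows=$(printf '%s\n' "$sample" | jq -s '.' | jq -r '.[] | [.title, .id, .scheduled_start_time] | @csv')
printf '%s\n' "$array_rows"
```

Note that `@csv` wraps strings in double quotes and doubles any embedded quotes, so titles containing commas or quotes should still load cleanly into a data frame.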