
Daily Coding - Data transfer between Python and R

Quick #dailycoding writeup. Continuing from my previous post: I often use a combination of Python and R in my data analytics projects, and here is a quick demonstration of one option for moving data between the two.

I am analyzing TSLA options over the past few months. I have been tracking and writing about the stock in a few consulting projects and recently invested in it.

Most of my TSLA data is scraped into AWS DynamoDB. In this post, I will demonstrate:

  1. Extract the data from DynamoDB (as shown in the last post) in Python
  2. Save the data as Parquet files
  3. Read the data in R

Here is the Python script that accomplishes this:

The last few lines show writing to Parquet from Python.

#!/usr/bin/env python
# coding: utf-8

import os
from pathlib import Path

import pandas as pd
from dotenv import load_dotenv

# credit: https://github.com/theskumar/python-dotenv
# load AWS credentials from an explicit dotenv file in the working directory

cwd = os.getcwd()
print(cwd)
env_path = Path(cwd) / '.aws.dynamo.env'
print(env_path)

load_dotenv(dotenv_path=env_path, verbose=True)
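
# the .aws.dynamo.env file loaded above is expected to define the four
# keys read next, one per line; placeholder values shown, not real credentials:
#   AWSAccessKeyId=AKIA...
#   AWSSecretAccessKey=...
#   AWSRegion=us-east-1
#   AWSBucket=my-bucket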


# read the credentials from the environment
AWSAccessKeyId = os.getenv("AWSAccessKeyId")
AWSSecretAccessKey = os.getenv("AWSSecretAccessKey")
AWSRegion = os.getenv("AWSRegion")
AWSBucket = os.getenv("AWSBucket")

print(AWSAccessKeyId)


# credit: https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
import boto3

# build a session from the credentials loaded above; the region is passed
# explicitly so the client and resource below share it
session = boto3.Session(
    aws_access_key_id=AWSAccessKeyId,
    aws_secret_access_key=AWSSecretAccessKey,
    region_name=AWSRegion
)

client = session.client('dynamodb')
dynamodb = session.resource('dynamodb')


# sanity check: list the tables visible to these credentials
response = client.list_tables()
print(response['TableNames'])


# handle to the table holding the scraped option data
table = dynamodb.Table('options')


# credit: https://stackoverflow.com/a/38619425/644081
# a single scan() call returns at most 1 MB of items, so keep scanning
# with ExclusiveStartKey until the whole table has been read
response = table.scan()
data = response['Items']

while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])


# credit: https://stackoverflow.com/questions/45636460/use-pythons-pandas-to-treat-aws-dynamodb-data
# dynamodb_json converts DynamoDB's typed JSON (e.g. {'S': 'TSLA'})
# into plain Python values that pandas can consume directly
from dynamodb_json import json_util as json

obj = pd.DataFrame(json.loads(data))
print(obj)

# write the DataFrame to a parquet file
from fastparquet import write
write('options.parq', obj)
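
As a quick sanity check (not part of the original script), the file can be read back into pandas; this sketch assumes fastparquet is installed so it can serve as the engine:

import pandas as pd

# round-trip check: read the parquet file back and inspect its shape
check = pd.read_parquet('options.parq', engine='fastparquet')
print(check.shape)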

And here is the R script to read the data back:

# set the working directory to where options.parq was written
setwd("~/Dropbox/pandora/My-Projects/repos/diary/writing/python")

# install.packages("arrow")
library(arrow)
df_options <- read_parquet("options.parq")

library(data.table)
dt_options <- as.data.table(df_options)

# keep only the rows whose ticker matches TSLA
dt_options <- dt_options[grep("TSLA", ticker)]