Episode 2: Formatting Parse Data for DynamoDB


Disclaimer: In this series of blogs we’ll describe how we move from Parse to AWS. In no way, shape or form do we claim that this is the best way to do things. This is simply a narration of steps we took. In fact, if you have found better ways of doing these things, we’d love to hear about them!

Welcome back! If you are tuning in for the first time, this blog is dedicated to the journey that we are taking at Calorious to migrate from Parse to AWS. So let’s refer back to our plan from Episode 1:

The Plan

So the plan is simple:

  1. Export our data out of Parse (Done)
  2. Export our images out of Parse (Done)
  3. Put all our images into an S3 bucket with the same unique names that Parse gave them
  4. Import the JSON data we get out of Parse into DynamoDB along with the unique image names for our files.
    • In our new app, when we need to fetch images, we’ll first get the image names stored with our items, then fetch the images from our S3 bucket using those names.

In this blog we’ll deal with none of the topics in our plan! (We like keeping people on their toes here at Calorious)

Format my Data

At this point we have been able to get our data and images out of Parse. Our first reaction was: awesome, let’s just get this data uploaded and recreate our tables. Well, we had to slow down and go back to the textbooks, because DynamoDB forces you to rethink the structure of your data. At the very least, it requires a different format than standard Parse JSON.

The best starting point for us was the AWS documentation and the sample data in the DynamoDB getting started guide here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.LoadData.html

Once the sample data was downloaded, we opened up the file “forum.json”:

{
 "Forum": [
  {
   "PutRequest": {
    "Item": {
     "Name": {"S":"Amazon DynamoDB"},
     "Category": {"S":"Amazon Web Services"},
     "Threads": {"N":"2"},
     "Messages": {"N":"4"},
     "Views": {"N":"1000"}
    }
   }
  },
  {
   "PutRequest": {
    "Item": {
     "Name": {"S":"Amazon S3"},
     "Category": {"S":"Amazon Web Services"}
    }
   }
  }
 ]
}

What the gobbledygook is this? It seems like every attribute needs to be annotated with the type of its value. Looking at the above JSON, the Name of the Item is defined as type “S” (string). Does this mean we need to transform all our data into this format? Breathe, Calorious team member, breathe. After some research and experimentation, we came up with ways to shortcut this step. We will tell you about all that we tried… but first we need to address the immediate need: we need to reformat our Parse JSON into simple key-value pairs and fill the gaps that exist between Parse and DynamoDB.

If you want to know more about the data types supported in DynamoDB, the link below will help:

(http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html#DataModel.DataTypes)

Changes

So, there are a few differences right off the bat:

  1. No date type support in DynamoDB
  2. Parse has this neat object type for pointers and relations; no such luck with DynamoDB
  3. No default createdAt/updatedAt for free
  4. objectIds are obsolete in most cases

Now the first thing we had to do was actually think about our data from DynamoDB’s perspective. Depending on how we were going to query objects, we had to restructure some of the tables. This will be unique to every application, so we will stick to the generic steps that make the migration work:

Change datetime to milliseconds

We made the decision to convert all of our dates to milliseconds, which can be stored in DynamoDB as numbers. Why numbers? Storing dates as numbers lets us meet our requirements of sorting by latest, getting objects for a certain time period, and so on.
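For instance, here is a quick sketch of the conversion using Moment.js (the same library our script below uses). The timestamp is the one from the example later in this post:

var moment = require('moment');

// Parse's ISO 8601 string becomes a plain millisecond number
var ms = moment("2016-01-27T20:34:20.669Z").valueOf();
console.log(ms); // 1453926860669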

Reduce all pointer/relation objects to simple strings

So Parse has these pointer and relation objects: each is a JSON object with three attributes: __type, className, and objectId. For DynamoDB, we decided to reduce this to a single string attribute, the objectId. This is the most straightforward transformation and a good proof of concept, but we strongly urge you to look into whether you still really need objectIds.

When we query, if item A holds an objectId for item B, we first fetch item A and then fetch item B via its objectId (this assumes that item B has objectId as its hash key).

An Example


//  Before
//----------------------------------------
{
  "createdAt": "2016-01-27T20:34:20.669Z",
  "objectId": "1I2i3mNfiO",
  "updatedAt": "2016-01-27T20:34:20.669Z",
  "user": {
    "__type": "Pointer",
    "className": "_User",
    "objectId": "hLE6HT84c4"
  },
  "name": "Apple",
  "type": "Fruit"
}

//  After
//----------------------------------------
{
  "createdAt": 1453926860669, //sortkey
  "user": "hLE6HT84c4",
  "name": "Apple",
  "type": "Fruit" //hashkey
}

Here you see an example of what the JSON looked like for a sample object, before and after transformation.
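To make the two-step fetch concrete, here is a rough sketch using the DocumentClient from the AWS SDK for JavaScript. The table names (Food and User) and key names are just assumptions for illustration; yours will depend on your own schema:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

// Step 1: fetch item A (a food) by its hash key and sort key
docClient.get({
  TableName: 'Food', // assumed table name
  Key: { type: 'Fruit', createdAt: 1453926860669 }
}, function(err, foodData) {
  if (err) return console.error(err);

  // Step 2: fetch item B (the user) via the stored objectId string
  docClient.get({
    TableName: 'User', // assumed table name
    Key: { objectId: foodData.Item.user }
  }, function(err, userData) {
    if (err) return console.error(err);
    console.log(userData.Item);
  });
});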

Our best friend, Node

So how do we get all our data reformatted and ready to go? Time to write a small Node.js app to get the job done… we’d rather be eating than wasting time reformatting by hand.


var jsonfile = require('jsonfile');
var fs = require('fs-extra');
var moment = require('moment');
var attr = require('dynamodb-data-types').AttributeValue;

// Read the Parse export; the records live under the "results" key
var foodFile = 'parse-data/food.json';
var foodJson = jsonfile.readFileSync(foodFile);
var foodResults = foodJson.results;

var foods = [];
for (var i = 0; i < foodResults.length; i++) {
  var food = foodResults[i];

  // Convert the ISO date string to milliseconds (a DynamoDB number)
  food.createdAt = moment(food.createdAt).valueOf();

  // Reduce the Parse pointer to a plain objectId string
  food.user = food.user.objectId;

  // Drop the attributes we no longer need
  delete food.updatedAt;
  delete food.objectId;

  // Uncomment below if you want JSON in DynamoDB format
  // foods.push(attr.wrap(food));
  foods.push(food);
}

// outputFileSync creates the data/ directory if it doesn't exist
fs.outputFileSync('data/foods.ddb.json', JSON.stringify(foods));

What we’re doing here is reading a Parse export file, looping through all of the records, reshaping each one into the structure we want, then writing the result out as a new file. That file now contains JSON objects in the format we want for DynamoDB.

Wait… so what about that special DynamoDB JSON format?

In the script above, you’ll see a commented-out line: foods.push(attr.wrap(food)). This line changes our reformatted JSON into DynamoDB JSON. It uses a nice Node utility, dynamodb-data-types, to transform regular JSON into the JSON you saw at the beginning of this post. Make a note of this for now; we will refer to it again when we talk about importing data using Data Pipeline in our next blog.
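As a quick sketch of what wrap does with the sample record from earlier:

var attr = require('dynamodb-data-types').AttributeValue;

var wrapped = attr.wrap({ name: 'Apple', createdAt: 1453926860669 });
// wrapped is now:
// { name: { S: 'Apple' }, createdAt: { N: '1453926860669' } }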

Also, we used Moment.js to convert Dates to milliseconds.

Now we are ready to upload our data! Until the next blog… Eat responsibly! 🙂

