Episode 4: Importing JSON into DynamoDB


Disclaimer: In this series we’ll describe how we move from Parse to AWS. In no way do we claim that this is the best way to do things. This is simply a narration of steps we took. In fact, if you have found better ways of doing the same, we’d love to hear about it!

Ok! Now we’re at the final step of data migration. By the end of this episode, we will have our data all ported over from Parse to DynamoDB, and we’ll be ready to move on to other features like push notifications, user management, etc. Let’s quickly review our migration plan so far.

Plan Progress

  1. Export our data out of parse (Done)
  2. Export our images out of parse (Done)
  3. Format Parse data into dynamoDB compatible data (Done)
  4. Put all our images into an S3 bucket with the same unique name that parse gave them (Done)
  5. Import the JSON data we get out of Parse into DynamoDB along with the unique image names for our files.

For step 5, we’ll be using the JSON files we created at the end of Episode 2.

Method 1: Data Pipelines ( We ended up not using this )

If you google “importing data into AWS DynamoDB”, you’ll be bombarded with links telling you that Data Pipelines are the way to go. It sounds great, but there are some things we discovered that are worth considering before you go that route.

  • What is the correct format?? – We had no idea how to format our JSON files for Data Pipelines to work. There doesn’t seem to be an easy-to-find sample to replicate (or at least there wasn’t when we were doing this), so we resorted to trial and error. Don’t worry, we’ll share with you exactly what format you need… You’re welcome!
  • Oh dang, it costs money! – The default instance that AWS uses for the “Import DynamoDB backup data from S3” template is m3… which is not covered under the free tier. Not knowing this, we kept trying Data Pipelines with various wrong data formats and incurred about $10 in fees in one day! Ok, lesson learned, let’s use a smaller box. We edited our config to use a smaller m1 Linux box, which spat out the following in our faces:

    ERROR: Instance type m1.small is not supported on AMI ‘3.8.0’

    • It gets more complicated, but we wanted to get this entire migration done without spending $$ or too much time and effort… Data Pipelines were looking like a no-go.

So in case you still want to go ahead and use the data pipelines, the correct format for input files is the following:


{"Name": {"S":"Amazon DynamoDB"},"Category": {"S":"Amazon Web Services"}}
{"Name": {"S":"Amazon push"},"Category": {"S":"Amazon Web Services"}}
{"Name": {"S":"Amazon S3"},"Category": {"S":"Amazon Web Services"}}

As you can see, the file should contain DynamoDB JSON objects separated by newlines. Make sure you go over the DynamoDB format carefully – don’t include empty strings, etc. – because each time the import fails, it’ll cost you to try again.
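If you already have plain JSON arrays (like the ones we build in Episode 2), you don’t have to hand-write this format. Here’s a rough sketch of generating it with the dynamodb-data-types utility (the same one we mention in Episode 2) – the file paths here are just examples:


//sketch: turn a plain JSON array into newline-delimited
//DynamoDB JSON for Data Pipelines (example paths)
var jsonfile = require('jsonfile');
var fs = require('fs');
var attr = require('dynamodb-data-types').AttributeValue;

var places = jsonfile.readFileSync('data/places.ddb.json');
var lines = places.map(function(place){
    //attr.wrap adds the {"S": ...}/{"N": ...} type annotations
    return JSON.stringify(attr.wrap(place));
});
//one DynamoDB JSON object per line, no commas, no wrapping array
fs.writeFileSync('data/places.pipeline.json', lines.join('\n'));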

Method 2: Old faithful Node app (recommended)

Node has certainly been our friend throughout this process. So once more we resorted to a simple Node app to push data into DynamoDB. What’s even better is that the AWS SDK’s DynamoDB DocumentClient supports regular JSON… so we don’t need to produce the perfect DynamoDB JSON format ourselves. Well, we could keep talking about this… but you probably wanna just see the code already! So… here it is:


/*********************************
Simple Demo for loading files into
DynamoDB.
**********************************/

//package to read json files
var jsonfile = require('jsonfile');
//AWS node sdk
var AWS = require('aws-sdk');

//need to update region in config
AWS.config.update({
    region: "us-east-1"
});

//create a doc client to allow using JSON directly
var docClient = new AWS.DynamoDB.DocumentClient();

//prepared JSON file
//[{ ... }, { ... }]
var placeFile = "data/places.ddb.json";
var placeArray = jsonfile.readFileSync(placeFile);

//utility function to create a single put request
function getPlace(index){
    return {
        TableName: 'Places',
        Item: placeArray[index]
    };
}

//recursive function to save one place at a time
function savePlaces(index){
    if(index == placeArray.length){
        console.log("saved all.");
        return;
    }

    var params = getPlace(index);
    //spit out what we are saving for sanity
    console.log(JSON.stringify(params));
    //use the client to execute put request.
    docClient.put(params, function(err, data) {
        if (err) {
            console.log(err);
        }else{
            console.log("saved Place item "+index);
            index += 1;
            //save the next place on the list
            //with half a second delay
            setTimeout(function(){
                savePlaces(index);
            }, 500);
        }
    });
}

//start saving from index - 0
savePlaces(0);

So… all we’re doing here is the following:

  • Read our prepared JSON file from Episode 2 and hold the array of objects in memory.
  • Read the first item, create a JSON object to put in DynamoDB, and send out a put request.
  • Upon a successful put, wait half a second, then send out the next put.
  • The console.log calls help us determine exactly what we’re pushing (and what index we’re on).
  • Now let’s say the app throws after saving item 35. We know that something was wrong with item 36… so we quickly check our file, fix it… then edit the final “savePlaces(0)” call to say “savePlaces(36)” … and we will continue from 36 again.

Let it run till all your objects are pushed…. and Boom! Your DynamoDB is now ready to start serving your client side applications.
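One note: putting one item every half second was plenty for our data size. If you have a lot more items, the DocumentClient also offers batchWrite, which accepts up to 25 put requests per call. We didn’t need it, but a rough sketch (reusing docClient and placeArray from the script above) might look like this:


//sketch: batched puts, 25 items per request (the batchWrite limit),
//reusing docClient and placeArray from the script above
function saveBatch(startIndex){
    if(startIndex >= placeArray.length){
        console.log("saved all.");
        return;
    }
    var batch = placeArray.slice(startIndex, startIndex + 25);
    var params = {
        RequestItems: {
            'Places': batch.map(function(item){
                return { PutRequest: { Item: item } };
            })
        }
    };
    docClient.batchWrite(params, function(err, data){
        if (err) {
            console.log(err);
        } else {
            //a real script should retry anything left in data.UnprocessedItems
            console.log("saved items " + startIndex + " to " + (startIndex + batch.length - 1));
            setTimeout(function(){
                saveBatch(startIndex + 25);
            }, 500);
        }
    });
}

Batching is faster, but you lose the simple “resume from index N” trick above, so pick whichever trade-off suits your data size.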

A Quick Recap

At this point, we have all our data extracted from Parse and imported into DynamoDB. All our images are stored in S3 Bucket and their names are stored with respective items in DynamoDB. We are now ready to start connecting the dots and pulling, pushing data via a mobile app. I hope you are finding the series useful so far… we would love to hear from you about your experience/tips etc. Please feel free to leave us comments, feedback or maybe an emoji! Until next time… Eat Responsibly!

Episode 3: Uploading images into Amazon S3


Disclaimer: In this series we’ll describe how we move from Parse to AWS. In no way do we claim that this is the best way to do things. This is simply a narration of steps we took. In fact, if you have found better ways of doing the same, we’d love to hear about it!

First of all, we’d like to apologize for the delay in posting this episode. A family celebration at Calorious had us all distracted and eating cake! We are now back on track and in full form. So to recap, let’s review our progress so far:

Plan Progress

  1. Export our data out of parse (Done)
  2. Export our images out of parse (Done)
  3. Format Parse data into dynamoDB compatible data (Done)
  4. Put all our images into an S3 bucket with the same unique name that parse gave them
  5. Import the JSON data we get out of Parse into DynamoDB along with the unique image names for our files.

So in this blog, we’ll talk about step 4.

Uploading Images to S3

In Episode 1, we wrote a quick Node app to recursively download all our images. While this is fine for a small-scale solution (a few thousand files), for a much larger set of files it might be better to use the Node AWS SDK to upload the images directly to S3. For our use case, though, we took the easiest route – download all images locally, then upload them using the AWS console.
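For reference, a direct upload with the Node AWS SDK could look roughly like the sketch below (we did not end up doing this). The bucket name matches the policy shown later in this post, images/ is the folder from Episode 1, and for thousands of files you would want to throttle instead of firing every upload at once:


//sketch: upload everything in ./images/ straight to S3
var fs = require('fs');
var path = require('path');
var AWS = require('aws-sdk');

AWS.config.update({ region: "us-east-1" });
var s3 = new AWS.S3();

var dir = "images";
fs.readdirSync(dir).forEach(function(filename){
    var params = {
        Bucket: "calorious-images",   //example bucket name
        Key: filename,                //keep the unique Parse file name
        Body: fs.createReadStream(path.join(dir, filename))
    };
    s3.upload(params, function(err, data){
        if (err) {
            console.log("failed: " + filename, err);
        } else {
            console.log("uploaded: " + data.Key);
        }
    });
});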

Create a bucket

So first, let’s create a bucket where we will store the images. Log into your Amazon S3 console and create a new bucket.


Once the bucket is created, click it to enter it in the console. You now have the option to upload files.


Click Upload; that brings up a prompt that allows multiple file uploads. Select all the files we downloaded in Episode 1 and upload them. Once done, all your files, with their correct names, will be in your S3 bucket.

Permissions

By default, all buckets are private. This means that if you need to access these files from an app, you need to grant the right credentials permission. For this, go to the IAM console.

  • Click on Roles in the sidebar and pick the role you want to give permission to. Tip: if you have used Mobile Hub, a role for your authenticated users has already been created for you. It’ll look something like this – <app name>_auth_MOBILEHUB_<numbers>
  • Once you have picked the correct role, click the “Create Role Policy” button.
  • Use the policy generator; it makes life a little easier.
  • When you hit select, you’ll get to a screen with a few options. Pick the following
    • Effect – Allow
    • AWS Service – Amazon S3
    • Actions:
      • DeleteObject
      • GetObject
      • PutObject
      • RestoreObject
  • It’ll generate a policy that looks somewhat like the following:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "<auto-generated>",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject",
                "s3:RestoreObject"
            ],
            "Resource": [
                "arn:aws:s3:::calorious-images/*"
            ]
        }
    ]
}


All set! You are now ready to read/write to this bucket from your app! Until next entry… you got it… EAT RESPONSIBLY!

Episode 2: Formatting Parse Data for DynamoDB


Disclaimer: In this series of blogs we’ll describe how we move from Parse to AWS. In no way, shape or form do we claim that this is the best way to do things. This is simply a narration of steps we took. In fact, if you have found better ways of doing these things, we’d love to hear about it!

Welcome back! If you are tuning in for the first time this blog is dedicated to the journey that we are taking at Calorious to migrate from Parse to AWS. So let’s refer back to our plan from Episode 1:

The Plan

So the plan is simple

  1. Export our data out of parse (Done)
  2. Export our images out of parse (Done)
  3. Put all our images into an S3 bucket with the same unique name that parse gave them
  4. Import the json data we get out of Parse into DynamoDB along with the unique image names for our files.
    • In our new app, when we need to fetch images, we’ll first get image names that are stored with our items, then fetch image from our s3 buckets using the name.

In this blog we’ll deal with none of the topics in our plan! (We like keeping people on their toes here at Calorious)

Format my Data

At this point we have been able to get our data and images out of Parse. Our first reaction was: awesome, let’s just get this data uploaded and recreate our tables. Well, we had to slow down and go back to the textbooks, because DynamoDB forces you to rethink the structure of your data. At the very least, it requires a different format in comparison to standard Parse JSON.

The best starting point for us was the AWS documentation and the sample data in the DynamoDB getting started guide here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.LoadData.html

Once the sample data was downloaded, we opened up the file “forum.json”:

{
 "Forum": [
  {
   "PutRequest": {
    "Item": {
     "Name": {"S":"Amazon DynamoDB"},
     "Category": {"S":"Amazon Web Services"},
     "Threads": {"N":"2"},
     "Messages": {"N":"4"},
     "Views": {"N":"1000"}
    }
   }
  },
  {
   "PutRequest": {
    "Item": {
     "Name": {"S":"Amazon S3"},
     "Category": {"S":"Amazon Web Services"}
    }
   }
  }
 ]
}

What the gobbledygook is this? It seems like every attribute needs to be annotated with its value type. Looking at the above JSON, the Name of the Item is defined as type “S”. Does this mean we need to transform all our data into this format? Breathe, Calorious team member, breathe. After some research and experimentation, we came up with ways to shortcut this step. We will tell you about all that we tried… but first we need to address the immediate need: we need to format our Parse JSON into simple key-value pairs and fill the gaps that exist between Parse and Dynamo. Before we move forward, we need the following changes:

If you wanna know more about the data types supported in DynamoDB, the link below will help:

(http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html#DataModel.DataTypes)
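For quick reference, here is our own summary of the annotations you’ll run into most often (made-up values):

{
  "name":      {"S": "Apple"},                    //string
  "createdAt": {"N": "1453926860669"},            //number (always written as a string)
  "isFruit":   {"BOOL": true},                    //boolean
  "tags":      {"SS": ["red", "sweet"]},          //string set
  "photos":    {"L": [{"S": "a.jpg"}]},           //list of typed values
  "nutrition": {"M": {"calories": {"N": "95"}}}   //map of typed values
}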

Changes

So, there are a few differences right off the bat:

  1. No date type support in DynamoDB
  2. Parse has a neat object type for pointers and relations; no such luck with DynamoDB
  3. No default createdAt/updatedAt for free
  4. objectIds are obsolete in most cases

Now the first thing we had to do was actually think about our data from DynamoDB’s perspective. Depending on how we were going to query objects, we had to restructure some of the tables. This will be unique to every application, so we will stick to the generic things in this blog – the steps to make the migration work:

Change datetime to milliseconds

We made the decision to convert all of our dates to milliseconds, which can be stored in DynamoDB as numbers. Why numbers? Storing dates as numbers allows us to meet our requirements: sorting by latest, getting objects for a certain time period, and so on.
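To see why this matters: once createdAt is a numeric sort key, DynamoDB can run range queries on it. Here is a minimal sketch, assuming a hypothetical Foods table with type as the hash key and createdAt as the sort key (like the example further down):


//sketch: newest fruit from the last 24 hours, relying on
//createdAt being stored as milliseconds (a number)
var AWS = require('aws-sdk');
AWS.config.update({ region: "us-east-1" });
var docClient = new AWS.DynamoDB.DocumentClient();

var oneDayAgo = Date.now() - 24 * 60 * 60 * 1000;
var params = {
    TableName: 'Foods',                //example table name
    KeyConditionExpression: '#t = :type AND createdAt > :since',
    ExpressionAttributeNames: { '#t': 'type' },  //alias in case "type" clashes with reserved words
    ExpressionAttributeValues: { ':type': 'Fruit', ':since': oneDayAgo },
    ScanIndexForward: false            //newest first
};
docClient.query(params, function(err, data){
    if (err) console.log(err);
    else console.log(data.Items);
});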

Reduce all pointer/relation objects to simple strings

So Parse has these pointer and relation objects: each is a JSON object with three attributes (__type, objectId, and className). For DynamoDB, we decided to reduce each of them to a single string attribute – the objectId. This is the most straightforward transformation and a proof of concept; we strongly urge you to look into whether you still really need objectIds.

When we query: if item A holds an objectId for item B, we first fetch item A and then fetch item B via its objectId (this assumes that item B has objectId as its hash key).
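In code, that two-step fetch might look something like this (a sketch with hypothetical Foods and Users tables; your key names will differ):


//sketch: item A (a food) keeps only the user's objectId,
//so fetching the related user is a second get by that id
var AWS = require('aws-sdk');
AWS.config.update({ region: "us-east-1" });
var docClient = new AWS.DynamoDB.DocumentClient();

docClient.get({
    TableName: 'Foods',                               //example table
    Key: { type: 'Fruit', createdAt: 1453926860669 }  //example keys
}, function(err, foodData){
    if (err) { return console.log(err); }
    //the plain string we kept in place of the Parse pointer
    docClient.get({
        TableName: 'Users',                           //example table
        Key: { objectId: foodData.Item.user }         //assumes objectId is the hash key
    }, function(err, userData){
        if (err) { return console.log(err); }
        console.log(userData.Item);
    });
});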

An Example


//  Before
//----------------------------------------
{
  "createdAt": "2016-01-27T20:34:20.669Z",
  "objectId": "1I2i3mNfiO",
  "updatedAt": "2016-01-27T20:34:20.669Z",
  "user": {
    "__type": "Pointer",
    "className": "_User",
    "objectId": "hLE6HT84c4"
  },
  "name": "Apple",
  "type": "Fruit"
}

//  After
//----------------------------------------
{
  "createdAt": 1453926860669, //sortkey
  "user": "hLE6HT84c4",
  "name": "Apple",
  "type": "Fruit" //hashkey
}

Here you see an example of what JSON looked like for a sample object, before and after transformation.

Our best friend, Node

So how do we get all our data reformatted and ready to go? Time to write a small Node.js app to get the job done… we’d rather be eating than wasting time reformatting by hand.


var jsonfile = require('jsonfile');
var fs = require('fs-extra');
var moment = require('moment');
var _ = require('lodash');
var attr = require('dynamodb-data-types').AttributeValue;

//Parse export for the Food class
var foodFile = "parse-data/food.json";
var foodJson = jsonfile.readFileSync(foodFile);
var foodResults = foodJson.results;
var foods = [];

function loopy(){
    for(var i = 0; i < foodResults.length; i++){
        var food = foodResults[i];
        //dates become milliseconds (numbers)
        food.createdAt = moment(food.createdAt).valueOf();
        //pointers become plain objectId strings
        food.user = food.user.objectId;
        delete food.updatedAt;
        delete food.objectId;
        //uncomment below if you want json in dynamodb format
        //foods.push(attr.wrap(food));
        foods.push(food);
    }
}

loopy();

var ws = fs.createOutputStream('data/foods.ddb.json');
ws.write(JSON.stringify(foods));

What we’re doing here is reading a Parse export file, looping through all of the records, reformatting each one, and then writing the results out as a different file. This new file contains JSON objects in the format we want for DynamoDB.

Wait… so what about that special DynamoDB json format?

In the loop, you’ll see a commented-out line: foods.push(attr.wrap(food)). That line changes our reformatted JSON into DynamoDB JSON. It uses a nice Node utility, dynamodb-data-types, to transform regular JSON into the annotated JSON you saw at the beginning of this post. Make a note of this for now; we will refer to it again when we talk about importing data using Data Pipelines in our next blog.
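Roughly speaking, the transformation looks like this (made-up values):


var attr = require('dynamodb-data-types').AttributeValue;

//plain JSON in...
attr.wrap({ name: "Apple", createdAt: 1453926860669 });
//...annotated DynamoDB JSON out:
//{ name: { S: "Apple" }, createdAt: { N: "1453926860669" } }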

Also, we used Moment.js to convert Dates to milliseconds.

Now we are ready to upload our data! Until the next blog… Eat responsibly! 🙂

Episode 1: Exporting data and images out of Parse


Disclaimer: In this series of blogs we’ll describe how we move from Parse to AWS. In no way, shape or form do we claim that this is the best way to do things. This is simply a narration of steps we took. In fact, if you have found better ways of doing these things, we’d love to hear about it!

If you have been following along so far, you know that we made a decision to use a combination of Amazon’s DynamoDB and an S3 bucket to store our data. So, let’s consider where we are in our process – we have all our data (including pictures) in Parse, which stores it in MongoDB. The official migration blog recommends using another MongoDB instance or a hosted service like MongoLab, which we are not doing. So what do we do?

Research Import

The first order of business is to study DynamoDB to understand what format it requires the data to be in, and the best way to import that data. We will go into the details of format and other considerations of DynamoDB in our next blog; for now it is sufficient to know the following:

  • DynamoDB accepts its own JSON format, which is not the same as MongoDB’s (meaning we will need to massage our existing data)
  • Reading up on various Amazon resources, you’ll see tutorials on using Data Pipelines to import data – we will do a blog post to cover why, after experimenting with it, we decided to go a different route
  • DynamoDB has a maximum item size limit (400 KB), so it’s not a place where you want to store your image files or other binaries. This is why we’re using S3 buckets.

The Plan

So the plan is simple

  1. Export our data out of parse
  2. Export our images out of parse
  3. Put all our images into an S3 bucket with the same unique name that parse gave them
  4. Import the json data we get out of Parse into DynamoDB along with the unique image names for our files.
    • In our new app, when we need to fetch images, we’ll first get image names that are stored with our items, then fetch image from our s3 buckets using the name.

In this blog we’ll deal with steps 1 and 2.

Give me my Data

Getting the data dump out of Parse is fairly straightforward. If you’re using Parse’s beta dashboard, go to App Settings > General. Under App Management, you will see a button for Export App Data.


This will send you an email with a zip file containing all your data. There will be a file for each of your tables/classes, and each file will have JSON objects representing your records.
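For reference, each class file wraps its records in a results array, roughly like this (trimmed example using the sample record from the next episode; field names will vary by class):

{
  "results": [
    {
      "objectId": "1I2i3mNfiO",
      "createdAt": "2016-01-27T20:34:20.669Z",
      "updatedAt": "2016-01-27T20:34:20.669Z",
      "name": "Apple"
    }
  ]
}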

Give me my Files

Now, this is probably why a lot of you are here. We couldn’t find an easy, Parse-provided way of exporting images/files. All the answers we read recommended doing it manually, so that’s what we did. For this step, we created a quick Node app. The required npm packages:

  • Parse – to make Parse requests to fetch your Parse objects (these contain the actual URLs for your images)
  • Request – to download the images once you’ve gotten the image URL from the objects
  • fs – (built into Node, no npm install needed) to write the downloaded files
  • moment – a beautiful library for handling dates 🙂

Here’s the app in its entirety


var Parse = require('parse/node');
var fs = require('fs');
var request = require('request');
var moment = require('moment');

Parse.initialize("<app-id>", "<secret-key>");

/***********************************************
* The idea is to download all images, 100 at
* a time. We'll write a recursive function to
* do that. The files will be downloaded into the
* ./images/ directory (which must already exist).
************************************************/

//keep a count of downloaded images
var count = 0;

var download = function(items){
    //if we downloaded all images in the current
    //list of items, we need to fetch more (the next 100).
    if(count == items.length){
        //end case – if the # of items Parse fetched
        //is less than 100, we know we've reached
        //the end of the data.
        if(items.length < 100){
            return;
        }
        //reset the count
        count = 0;
        console.log("got all 100");
        //last processed item
        var lastItem = items[items.length-1];
        var newBeforeDate = moment(lastItem.createdAt);
        //let's get the next 100 starting from the created date
        //of the last object fetched
        getItems(newBeforeDate.toDate());
    }else{
        //if we haven't yet downloaded all images in the
        //current list of fetched items, download the next one
        var item = items[count];
        var filename = item.get("image")._name;
        var uri = item.get("image")._url;
        request.head(uri, function(err, res, body){
            if (err){
                console.log(err);
                console.log(item);
                return;
            }else {
                var stream = request(uri);
                stream.pipe(
                    fs.createWriteStream("images/"+filename)
                        .on('error', function(err){
                            //log the failure and drain the stream
                            console.log(err, filename);
                            stream.read();
                        })
                ).on('close', function() {
                    //move on to the next item
                    count++;
                    download(items);
                });
            }
        });
    }
};

function getItems(beforeDate){
    var query = new Parse.Query("Item");
    query.notEqualTo("deleted", true);
    query.descending("createdAt");
    query.lessThan("createdAt", beforeDate);
    //Parse returns up to 100 results per find() by default
    query.find({
        success: function(items){
            console.log("fetched more items…");
            download(items);
        }
    });
}

getItems(new Date());

So the idea is this:

  1. Fetch 100 Parse objects
  2. For each object, download its image, one at a time
  3. Repeat 1 and 2 till the last object is reached

Assuming you have Node and npm installed (and have put in your app’s credentials to initialize Parse), you can run the app with:

node getParseFiles.js

We left this running for some time… and at the end, we had all our files downloaded to our local file system.

So… now we have all our data in json format, and we have all our images downloaded. Next step, format the data for import and uploading images to S3! Until the next blog… Eat responsibly! 🙂

The Learning Process (Begins)

First things first, we have to look at what Parse offers for migration. For anyone who wants to see the “Moving-On” article from Parse, here is a link: http://blog.parse.com/announcements/moving-on/

Parse offerings to assist migration

  • Parse is offering a database migration tool (to migrate your data to MongoDB)
  • Parse is open sourcing Parse Server (this will let you run most of the Parse API from your own Node.js server)
  • There is an assumption that developers will move to Heroku or MongoLab
  • Cloud Code will break where it uses native Cloud Code modules that will not be available on Parse Server (App Links, Buffer, Mailgun, etc.)
  • Analytics, Jobs, and Push Notifications are no longer supported
  • The Parse Dashboard is no longer available

Ok, so there is a lot more information; please check out the link above.

Not being very familiar with the plethora of services that AWS offers, the first thing we wanted to know was…

How does Parse map to AWS in terms of services?

After a quick and cursory research phase, here’s what we came up with:

[Image: our rough mapping of Parse features to their AWS counterparts]

Mind you, this is a 1,000 ft view and a most simplistic mapping. Each of these services has its own set of features that differ from its counterpart’s. However, a quick look confirmed that all of Calorious’ use cases were potentially covered by AWS.

This provided us with the sanity check we needed before delving deeper into migration steps.

In addition, the AWS Mobile Hub was key to getting the basic pipeline for our application laid out, with the services connected. If you have not had a look at Amazon’s Mobile Hub yet, it lets you put together, test, and monitor mobile applications through a user-friendly web interface, and it streamlines the service configuration of your application. Check it out here: https://aws.amazon.com/mobile/. A member of the Calorious team was able to see Mobile Hub in action at the AWS Loft in SoHo in late 2015.

Note: at the time of writing this blog, Mobile Hub does not give you an option to have DynamoDB pre-configured in your starter app.

So now that we have shell applications to which we can migrate our client-side application code, it is time to jump into the nitty-gritty.

Cognito

What do we do about users and identification? If you got your starter application from Mobile Hub, chances are that you picked Cognito as one of the services. This means the starter is already set up with the pipelines for user authentication. In order to migrate successfully, we’d need to associate the Parse userId with Amazon’s Cognito identity. Stay tuned to see how we’ll achieve this. More info on Cognito – https://aws.amazon.com/cognito/

Storage

DynamoDB

When looking through the AWS documentation, you will notice that Amazon offers file storage (S3), a hosted relational database solution (RDS), and a hosted NoSQL database solution (DynamoDB). DynamoDB caught our eye. Calorious is a small startup, but we have big dreams and we need something that will be able to grow with us. DynamoDB seems to fit the bill. https://aws.amazon.com/dynamodb/

DynamoDB is different from MongoDB. DynamoDB encourages you to think about your data as key-value pairs. Needless to say, you have to examine the way your application’s data is laid out in Parse and restructure your database in DynamoDB. Parse leverages pointer and relation objects that let you create relationships between objects. That concept goes out the window when working with DynamoDB. THAT’S OK. As we learn and re-design our own data, we will explain our moves and mistakes during data migration.

Lambda

Last but not least, AWS Lambda provides the ability to invoke your functions via events. It is like Parse Cloud Code on steroids. You can connect your functions to pretty much any of the AWS services and invoke them to perform various tasks. Alright, we are pumped to get started! So let’s do this…

The Decision

On Friday, January 29th, we woke up to some bad news: Parse announced their demise. It was a sad day for Calorious, and like so many other dev teams we were wondering: what do we do? Parse is giving all of their users one year to migrate off their services and has open sourced their server code. At first we thought: this is awesome! We could migrate our data to our own MongoDB instances and run Parse Server on a cloud host. We quickly figured out that this was not going to work for us. Without using MongoLab (which seems pricey), we would have to worry about scaling. Also, features like push notifications are not available with Parse Server. We came to the conclusion that it was time to give AWS a shot and dive straight in.

AWS provides a number of services that allow for mobile development. It also provides a hosted NoSQL database called DynamoDB that removes the worries of scaling. To boot, if everything is configured correctly (we will share more details in a later post), AWS is cheap!

We are currently migrating to AWS and have started solving the problems of migration. Through this blog we hope to share our journey and help other teams move. Recently Azure has provided a simple migration over to their cloud offering. This is too easy and is not the way we roll at Calorious.

So join us for this journey!

For more information about Calorious please check out this link: https://www.facebook.com/Calorious-540284809486468