Episode 1: Exporting data and images out of Parse


Disclaimer: In this series of blogs we'll describe how we moved from Parse to AWS. In no way, shape, or form do we claim that this is the best way to do things; this is simply a narration of the steps we took. In fact, if you have found better ways of doing these things, we'd love to hear about them!

If you have been following along so far, you know that we made the decision to use a combination of Amazon's DynamoDB and S3 to store our data. So, let's consider where we are in the process: we have all our data (including pictures) in Parse, which stores it in MongoDB. The official migration blog recommends moving to another MongoDB instance or a hosted service like MongoLab, which we are not doing. So what do we do?

Researching the Import

First order of business is to study DynamoDB to understand what format it requires the data to be in, and the best way to import it. We will go into the details of DynamoDB's format and other considerations in our next blog; for now, it is sufficient to know the following:

  • DynamoDB accepts its own JSON format, which is not the same as MongoDB's (meaning we will need to massage our existing data; see the short sketch after this list)
  • Reading up on various Amazon resources, you'll see tutorials on using Data Pipeline to import data; we will do a blog post covering why, after experimenting with it, we decided to go a different route
  • DynamoDB has a maximum item size limit (400 KB), so it's not a place where you want to store your image files or other binaries. This is why we're using S3.
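
To make the first point concrete, here is a rough sketch of the difference, with hypothetical field names (we'll dig into the real format next time). DynamoDB's low-level JSON wraps every value in a type descriptor, while Parse exports plain JSON:

    // A record roughly as Parse exports it (hypothetical fields):
    var parseRecord = {
      objectId: 'xWMyZ4YEGZ',
      name: 'Avocado toast',
      calories: 320,
      createdAt: '2016-02-13T19:17:40.000Z'
    };

    // The same record in DynamoDB's low-level JSON: every value is
    // wrapped in a type descriptor (S = string, N = number), and
    // numbers are passed as strings.
    var dynamoItem = {
      objectId: { S: 'xWMyZ4YEGZ' },
      name: { S: 'Avocado toast' },
      calories: { N: '320' },
      createdAt: { S: '2016-02-13T19:17:40.000Z' }
    };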

The Plan

So the plan is simple:

  1. Export our data out of Parse
  2. Export our images out of Parse
  3. Put all our images into an S3 bucket, keeping the same unique names that Parse gave them
  4. Import the JSON data we get out of Parse into DynamoDB, along with the unique image names for our files
    • In our new app, when we need to fetch images, we'll first get the image names stored with our items, then fetch the images from our S3 bucket using those names (see the sketch after this list)
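
As a rough illustration of that read path, here's a minimal sketch using the AWS SDK for Node (the bucket name and helper function are hypothetical):

    var AWS = require('aws-sdk');
    var s3 = new AWS.S3();

    // imageName is the unique name we stored alongside the item in DynamoDB
    function getImageUrl(imageName) {
      // Returns a time-limited URL the app can hand straight to an image view
      return s3.getSignedUrl('getObject', {
        Bucket: 'my-app-images', // hypothetical bucket name
        Key: imageName,
        Expires: 300 // URL validity in seconds
      });
    }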

In this blog we'll deal with steps 1 and 2.

Give me my Data

Getting the data dump out of Parse is fairly straightforward. If you're using Parse's beta dashboard, go to App Settings > General. Under App Management, you will see a button for Export App Data.

[Screenshot: the Export App Data button in the Parse dashboard]

This will send you an email with a zip file containing all your data. There will be a file for each of your tables/classes, and each file will hold JSON objects representing your records.
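
For reference, each exported class file is a single JSON document with a results array. Here's a simplified sketch with hypothetical fields; note how a file column is stored as a pointer carrying the unique name and URL we'll need later:

    {
      "results": [
        {
          "objectId": "xWMyZ4YEGZ",
          "name": "Avocado toast",
          "image": {
            "__type": "File",
            "name": "tfss-1234abcd-avocado.jpg",
            "url": "http://files.parsetfss.com/app-id/tfss-1234abcd-avocado.jpg"
          },
          "createdAt": "2016-02-13T19:17:40.000Z"
        }
      ]
    }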

Give me my Files

Now this is probably why a lot of you are here. We couldn't find any easy Parse-provided way of exporting images/files. All the answers we read recommended doing it manually, so that's what we did. For this step, we created a quick Node app. The required packages:

  • parse – to make Parse requests that fetch your Parse objects (these contain the actual URLs for your images)
  • request – to download the images once you've gotten the image URL from the objects
  • fs – Node's built-in file-system module (not an npm package), used to write the downloaded files
  • moment – a beautiful library for handling dates 🙂

Here's the shape of the app.
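
The sketch below is a minimal reconstruction rather than our exact script: it assumes a hypothetical Photo class with a file-typed image column, and you'd plug in your own credentials and class/column names.

    var Parse = require('parse/node');
    var request = require('request');
    var fs = require('fs');
    var moment = require('moment');

    // Your Parse application credentials go here
    Parse.initialize('YOUR_APP_ID', 'YOUR_JAVASCRIPT_KEY');

    var PAGE_SIZE = 100;
    var OUT_DIR = './images/';

    if (!fs.existsSync(OUT_DIR)) {
      fs.mkdirSync(OUT_DIR);
    }

    // Fetch a page of 100 objects, download their images one at a
    // time, then move on to the next page until nothing comes back.
    function fetchPage(skip) {
      var query = new Parse.Query('Photo'); // hypothetical class name
      query.ascending('createdAt');
      query.limit(PAGE_SIZE);
      query.skip(skip);
      query.find().then(function (objects) {
        if (objects.length === 0) {
          console.log('All done!');
          return;
        }
        downloadNext(objects, 0, function () {
          fetchPage(skip + objects.length);
        });
      });
    }

    // Recursively download one image at a time, keeping the unique
    // file name that Parse assigned.
    function downloadNext(objects, index, done) {
      if (index === objects.length) {
        return done();
      }
      var file = objects[index].get('image'); // hypothetical column name
      if (!file) {
        return downloadNext(objects, index + 1, done);
      }
      request(file.url())
        .pipe(fs.createWriteStream(OUT_DIR + file.name()))
        .on('finish', function () {
          console.log(moment().format('HH:mm:ss') + ' saved ' + file.name());
          downloadNext(objects, index + 1, done);
        });
    }

    fetchPage(0);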

So the idea is this:

  1. Fetch 100 Parse objects
  2. For each object, recursively download its images one at a time
  3. Repeat steps 1 and 2 until the last object is reached

Assuming you have Node and npm installed (and have put in your app's credentials to initialize the SDK), you can run the app with:

node getParseFiles.js

We left this running for some time… and at the end, we had all our files downloaded to our local file system.

So… now we have all our data in JSON format, and we have all our images downloaded. Next step: formatting the data for import and uploading the images to S3! Until the next blog… Eat responsibly! 🙂


Comments on “Episode 1: Exporting data and images out of Parse”


  1. Wouldn't it have been more efficient to streamline the download of images -> upload directly to S3, skipping the local filesystem altogether?
    Probably not the case for you, but if you were dealing with a plethora of files, some of which fairly large, a case where you run out of local storage isn't at all unlikely.

    BTW, love the series, definitely interested in how your migration goes!


    1. Edit: the previous answer got its middle eaten up by WordPress, so correcting:
      Thank you for your question, and a very good point there indeed! It would definitely have been more efficient to set up our node app to upload images directly to the S3 bucket. But you correctly identified that our use case involved the transfer of fewer than 1k files.
      At the time of the file download, given the number of files we had, we thought the upload button on the S3 dashboard was the easiest way to get the files in. To be honest, a lot of the things we post here will not be the perfect solution, but more an account of what worked for us. The goal is to get the discussion started so intelligent people like yourselves can correct us / suggest alternatives 🙂
      To conclude though, for a huge number of files, it is definitely much better to use the node app to download, upload to S3, and clean up… all in the same workflow. Cheers!


