Porting legacy databases to Google App Engine

A reader posed the following question:

“I’m trying to convert my django app to work with google app engine. This is preferred rather than spending $100/year extra for a site with ssh access, plus I love the appengine dashboard.

Here is my issue: My current django app is fairly static. It pulls all its data from a mysql database containing ~6,000 rows. This itself is built from a gadfly database, so it should be pretty easy to get these values into the datastore/gql.

How can I sync my database with appengine?”

This is a highly relevant problem if you are porting an existing Django application to the Google App Engine. Luckily, the App Engine SDK includes a bulk data uploader tool that does the job. Let’s work through an example where we use this tool to transfer data from an existing MySQL database onto a Django application running on Google App Engine.

Case description: We have an inventory database that is currently stored in MySQL. This database is to be made available through a Django web application that allows visitors to review the inventory. The database is named ‘customerdb’ and has a single table called ‘inventory’:’

mysql> select * from inventory;
+----------+----------+
| name     | quantity |
+----------+----------+
| ham      |        2 |
| cheese   |        7 |
| macaroni |        1 |
+----------+----------+
3 rows in set (0.00 sec)

Setup: We need an App Engine-ready Django application that provides us with the views and models we need to display our inventory. For this scenario we will name the application ‘upload-demo’ and make it available on http://upload-demo.appspot.com. My earlier tutorials should provide you with what you need to build the basic application structure.

The full set of application files can be downloaded here. References to the application name and paths will have to be changed according to your system setup.

Once the fundamentals are in place you should add an inventory model that mirrors the table in our database:

# upload-demo/uploaddemo/main/models.py

from google.appengine.ext import db

class Inventory(db.Model):
    name = db.StringProperty()
    quantity = db.IntegerProperty()

We also need a view that displays the data:

# upload-demo/uploaddemo/main/views.py

from django.http import HttpResponse
from uploaddemo.main.models import Inventory

def main(request):
    result = ""
    items = Inventory.all()

    for item in items:
        result += "%s: %i<br/>" % (item.name, item.quantity)

    return HttpResponse(result)

Finally, your urls.py should point towards the view:

# upload-demo/uploaddemo/urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns("",
    (r"^$", "uploaddemo.main.views.main"),
)

The application directory structure should look exactly like this:

Project directory structure

To verify that we are good to go, deploy the application to App Engine:

[test@mybox ~]$ appcfg.py update upload-demo

You should see an empty page—which makes sense since we have no data yet.

Step 1 – Create a bulk load handler: The bulk loader accepts CSV-formatted data which it will feed it into the datastore:

# upload-demo/loader.py

from google.appengine.ext import bulkload

class InventoryLoader(bulkload.Loader):
    def __init__(self):
        fields = [
            ("name", str),
            ("quantity", int)
        ]
       
        bulkload.Loader.__init__(self, "Inventory", fields)

if __name__ == "__main__":
    bulkload.main(InventoryLoader())

In this case we have created a loader for the Inventory model where the fields match the name and type of the fields in the model. Note that the loader is kept outside of the Django application.

Step 2 – Add the handler to the project: This is done by adding an entry to app.yaml that references loader.py:

# upload-demo/app.yaml

application: upload-demo
version: 1
runtime: python
api_version: 1

handlers:
- url: /load
  script: loader.py
  login: admin
- url: /.*
  script: main.py

A login will be required to access the loader URL—we don’t want anyone to add to our inventory without permission.

Step 3 – Convert the data to CSV:

Getting this step right can be surprisingly tricky, depending on your legacy database. For MySQL you may have to make sure that the user account has file write access rights:

[root@mybox ~]# mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 74740
Server version: 5.0.45 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> grant file on *.* to 'test'@'localhost';
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

This command might have to be run as root, depending on how your database is configured. To do the data dump we run the following select statement:

[test@mybox ~]$ mysql -u test customerdb -e "select * into
    outfile '/tmp/inventory.txt' fields terminated by ',' from
    inventory"
[test@mybox ~]$ cat /tmp/inventory.txt
ham,2
cheese,7
macaroni,1

If you are using PostgreSQL you can achieve the same by using the COPY command.

Step 4 – Upload the data: First, redeploy your application to App Engine:

[test@mybox ~]$ appcfg.py update upload-demo

We then use the bulkload_client.py script to upload our CSV file. The script is found in the tools folder of your App Engine installation—you may have to add it to your PATH. Note that you have to use double dashes for the parameters.

[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt
    --kind=Inventory --url=http://upload-demo.appspot.com/load

INFO 2008-06-15 07:39:21,682 bulkload_client.py]
    Starting import; maximum 10 entities per post
INFO 2008-06-15 07:39:21,684 bulkload_client.py]
    Importing 3 entities in 29 bytes
ERROR 2008-06-15 07:39:21,997 bulkload_client.py]
    An error occurred while importing: Received code 302: Found
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://www.google.com/accounts/ServiceLogin?service=ah&
continue=http://upload-demo.appspot.com/_ah/login%3Fcontinue%3Dhttp://
upload-demo.appspot.com/load&ltmpl=gm&ahname=Django+data+u
pload+demo&sig=f9861d41d527e55f15742b8d54504bcc">here</A>.
</BODY></HTML>

ERROR    2008-06-15 07:39:21,997 bulkload_client.py] Import failed

Now, that didn’t work. Remember that app.yaml says we have to authenticate ourself as an admin user before we can upload data. Try visiting http://upload-demo.appspot.com/load in a web browser. After having authenticated yourself using your Google account you will be redirected to the following page:

Loader authentication screen

Just what we needed! Add the cookie string parameter to the previous request and try again:

[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt
    --kind=Inventory --url=http://upload-demo.appspot.com/load
    --cookie='ACSID=AJKiYcE[...]1Hh4'

INFO 2008-06-15 07:50:58,541 bulkload_client.py]
    Starting import; maximum 10 entities per post
INFO 2008-06-15 07:50:58,549 bulkload_client.py]
    Importing 3 entities in 29 bytes
INFO 2008-06-15 07:50:59,102 bulkload_client.py]
    Import succcessful

If you visit http://upload-demo.appspot you should now see the data we just uploaded.

Final notes: This simple example should be enough to get you started. When converting real-life databases you will have to deal with more complex schemas with references between tables. The discussion here should point you in the right direction. You may also find the SDK documentation on types and property classes useful when porting your legacy database.

33 thoughts on “Porting legacy databases to Google App Engine

  1. Nice tutorial. I just wonder how You manage relations between objects? What if an entity has foreigh key to other?

  2. Thanks for sharing your knowledge.

    I tried this with the dev_appserver and it doesn’t seem to work. Is that why you go straight to the production environment?

  3. @vonG: I will cover foreign key relationships in a follow-up posting.

    @Frank: I was able to get it working locally with dev_appserver.py but I did not document it since the process is quite similar. The only difference is the login screen and the cookie format. What kind of error did you get?

  4. Actually I am having the same problem with dev_appserver and the production server :
    ERROR 2008-06-19 23:22:54,562 bulkload_client.py] An error occurred while importing: Received code 302: Requires login

    If I remove “login: admin” from my app.yaml it works. The cookie I’m using for dev is –cookie=’dev_appserver_login=”test@example.com:True”‘

    BTW, there’s a little problem with your bulkload_client.py example. With the version of the SDK that I’m running on Windows, all the “-” options should be “–“. So the example would be :
    bulkload_client.py -–filename=/tmp/inventory.txt -–kind=Inventory -–url=http://upload-demo.appspot.com/load –-cookie=’ACSID=AJKiYcE[…]1Hh4′

  5. @Frank: The dev cookie is the same as the one I have been using. I have not tested the whole thing on a Windows box so that might have something to do with it. You should also make sure that you are using the latest version of the App Engine SDK, but since it doesn’t work in production either that is probably not it.

  6. I have the same problem, bulk import just doesn’t seem to work under Windows.
    The problem occurs for both test server and production. Tried to debug it but eventually gave up.

  7. Hi Thomas when I run the bulk upholder command I receive a ‘Connection refused’ error.

    G:\Cherian\Google App Engine Projects\onlinemovieticket>python c:\google_appengi
    ne\tools\bulkload_client.py –filename names.csv –kind Customer –url http://lo
    calhost:8081/load
    INFO 2008-07-14 10:19:35,280 bulkload_client.py] Starting import; maximum 10
    entities per post
    INFO 2008-07-14 10:19:35,280 bulkload_client.py] Importing 4 entities in 71
    bytes
    ERROR 2008-07-14 10:19:36,358 bulkload_client.py] An error occurred while imp
    orting: (10061, ‘Connection refused’)
    ERROR 2008-07-14 10:19:36,358 bulkload_client.py] Import failed

    What could be the reason.

    Regards,
    Cherian

  8. @c-soft: I might look into the Windows issue in a later posting.

    @Cherian: I noticed that you are using port number 8081 instead of 8080. Unless you have changed the default port setting for the local development server then that could be the problem.

  9. Thomas,
    nice site… Although, I wonder if you’ve tried doing a bulk load on existing tables with a relation. I’ve tried but come up against a problem . I used the bulkloader example. I’ll post the exact problem later when i get home but for now I’ll just describe my schema.
    Member — Running club member
    Race — a race event racename,date, distance
    Runner — is child of Member and Race being the time for the race taken by the member.

    Alan…

  10. I don’t know if you can help but I wonder if you’ve tried doing a bulk load on existing tables with a relation. I’ve tried but come up against a problem . I used the bulkloader example.
    This is my schema.

    Member — Running club member
    Race — a race event racename,date, distance
    Runner — is child of Member and Race being the time for the race taken by the member.

    — These are my models

    from google.appengine.ext import webapp
    from google.appengine.ext import db

    class Member(db.Model):
    memberid = db.StringProperty()
    name = db.StringProperty()
    category = db.StringProperty()
    currentMember = db.StringProperty()
    member = Member()
    member.put()

    class Race(db.Model):
    raceid = db.StringProperty()
    name = db.StringProperty()
    date = db.StringProperty()
    distance = db.StringProperty()
    units = db.StringProperty()
    race = Race()
    race.put()
    class Runner(db.Model):
    racerunner = db.StringProperty()
    raceid = db.StringProperty()
    runnerid = db.StringProperty()
    runnername = db.StringProperty()
    race = db.ReferenceProperty(reference_class=Race)
    member = db.ReferenceProperty(reference_class=Member)
    time = db.StringProperty()
    position = db.StringProperty()
    category = db.StringProperty()
    WAVA = db.StringProperty()
    pace = db.StringProperty()
    notes = db.StringProperty()
    runner = Runner()
    runner.put()
    #This is my data to load
    #Runner.csv
    00310017,0031,0017,Bil Salters,2:32:05,M40,94,59.87,9:30,NOMATCH
    00310016,0031,0016,Iian Sallis,2:40:26,SM,103,57.59,10:01,NOMATCH
    00320018,0032,0018,Jon Church,35:02,SM,67.34,6:29,2nd M40
    00320019,0032,0019,Sarah Church,51:39,F45,105,51.26,9:33,3rd F45
    #member.csv
    0017,Bil Salters,M40,Yes
    0016,Iian Sallis,SM,No
    0018,Jon Church,SM,Yes
    0019,Sarah Church,F45,No
    #race.csv
    0031,North Mids XC Race 3 (men) ,13-01-2007,6.5,miles
    0032,Bedford Half Marathon ,10-12-2006,13.1,miles

    I tried to load Runner with this:
    #myRunnerloader.py
    from google.appengine.ext import bulkload
    from google.appengine.api import datastore_types
    from google.appengine.api import datastore_entities
    from google.appengine.ext import search
    class RunnerLoader(bulkload.Loader):
    def __init__(self):
    # Our ‘Race’ entity contains an id name string date distance and units
    bulkload.Loader.__init__(self, ‘Runner’,
    [(‘racerunner’, str),
    (‘raceid’, str),
    (‘runnerid’, str),
    (‘runnername’, str),
    (‘time’, str),
    (‘position’, str),
    (‘category’, str),
    (‘WAVA’, str),
    (‘pace’, str),
    (‘notes’, str),
    ])
    def HandleEntity(self, entity):
    runner = datastore_entities.Runner(racerunner)
    runner.update(entity)

    raceid = runner[‘raceid’]
    runnerid = runner[‘runnerid’]
    if raceid:
    race = datastore.Query(‘Race’,{‘raceid’: raceid}).Get(1)
    if not race:
    race = [datastore_entities.Race(raceid)]
    datastore.Put(race[0])
    runner[‘race’] = race[0].key()
    if runnerid:
    member = datastore.Query(‘Member’,{‘runnerid’: runnerid}).Get(1)
    if not member:
    member = [datastore_entities.Member(runnerid)]
    datastore.Put(member[0])
    [‘member’] = member[0].key()
    return runner
    if __name__ == ‘__main__’:
    bulkload.main(RunnerLoader())

    #And this cmd command on XP
    runnercd C:\Google_apps
    python “C:\Program Files\Google\google_appengine\tools\bulkload_client.py”
    –filename C:\Google_apps\data\Runnerfull.csv
    –kind Runner
    –url http://127.0.0.1:8080/runnerload
    And I get this error:

    C:\Documents and Settings\Alan Maplethorpe>cd C:\Google_apps
    C:\Google_apps>python “C:\Program Files\Google\google_appengine\tools\bulkload_client.py” –filename C:\Google_apps\data\Runner.csv –kind Runner –url http://
    127.0.0.1:8080/runnerload
    INFO 2008-10-16 14:05:35,157 bulkload_client.py] Starting import; maximum 10 entities per post
    INFO 2008-10-16 14:05:35,167 bulkload_client.py] Importing 4 entities in 260 bytes
    ERROR 2008-10-16 14:05:36,178 bulkload_client.py] An error occurred while importing: Received code 400: Bad Request
    Loading from line 1…error:
    Traceback (most recent call last):
    File “C:\Program Files\Google\google_appengine\google\appengine\ext\bulkload\__init__.py”, line 376, in LoadEntities
    new_entities = loader.CreateEntity(columns, key_name=key_name)
    File “C:\Program Files\Google\google_appengine\google\appengine\ext\bulkload\__init__.py”, line 235, in CreateEntity
    entities = self.HandleEntity(entity)
    File “C:\Google_apps\myrunningdata\myrunnerloader2.py”, line 23, in HandleEntity

    runner = datastore_entities.Runner(racerunner)
    AttributeError: ‘module’ object has no attribute ‘Runner’
    ERROR 2008-10-16 14:05:36,178 bulkload_client.py] Import failed
    C:\Google_apps>

    Any ideas?

    Alan…

  11. Thomas,

    I downloaded you app and test the bulkupload but is is not working for me still geting following error.

    302 Moved
    302 Moved
    The document has moved
    here.

    ERROR 2008-11-17 13:16:43,453 bulkload_client.py] Import failed

    please help me out.
    Thanks,
    Gampesh

  12. Gampesh, what happened when you attempted the admin authentication? Did you get a cookie flag when you visited (in your case) http://famousnspicy.appspot.com/load ? And did you pass it to the bulkload_client.py script as described? If you don’t get the authentication right then you will end up with the error you are describing.

  13. Yes Thomas i am able to get the cookies and after adding that cookies in my bulkupload command still getting same error.

  14. One more question if i have data in which have already comma then what happen google app engine will take care of that or is there any way to add escape while uploading data.

  15. Gampesh, it is difficult to say what may be the case without viewing your code and the full bulkload_client.py command input and output. Could it be your data that is causing the problem? Try with a very simple csv file first just to eliminate source data issues as a potential problem-causer.

    About your second question: I’m not sure how the App Engine reacts if you escape any commas in your input data file, so you’ll just have to try it out yourself. One possible alternative option is to modify the bulkload_client.py file. It uses the standard Python csv library, which can be told to use a different separator character. See the Python documentation for more details.

  16. I am receiving this error after adding the –cookie:

    INFO 2008-12-09 15:04:09,109 bulkload_client.py] Starting import; maximum 10
    entities per post
    INFO 2008-12-09 15:04:09,109 bulkload_client.py] Importing 3 entities in 29
    bytes
    ERROR 2008-12-09 15:04:09,609 bulkload_client.py] An error occurred while imp
    orting: Received code 302: Found

    302 Moved
    302 Moved
    The document has moved
    here.

    ERROR 2008-12-09 15:04:09,609 bulkload_client.py] Import failed

  17. sorry the error was message was not complete…

    INFO 2008-12-09 15:12:24,490 bulkload_client.py] Starting import; maximum 10
    entities per post
    INFO 2008-12-09 15:12:24,506 bulkload_client.py] Importing 3 entities in 29
    bytes
    ERROR 2008-12-09 15:12:25,038 bulkload_client.py] An error occurred while imp
    orting: Received code 302: Found

    302 Moved
    302 Moved
    The document has moved
    here.

    ERROR 2008-12-09 15:12:25,038 bulkload_client.py] Import failed

  18. Varun, are you sure that you added the correct cookie and that there isn’t e.g. any extra white space from copying and pasting it? Also, check the documentation for bulkload_client.py to see if there has been any changes to the client interface. I haven’t had the time to do more experimentation with Google App Engine since I wrote this entry, so things might have changed. I think there has been at least a couple of updates to the App Engine SDK since then.

  19. Hi Thomas,

    I can not retrive the cookie in the way you are showing. When i type in the browser http://friendsgrid.appspot.com/load it gives me a “403 Forbidden” error .. Error: Forbidden
    Your client does not have permission to get URL /load from this server.

    and when i try to do it without the cookie… this is the error:

    jvasquez@inline:~/mentez/google_appengine$ ./bulkload_client.py –filename=../mosaico_data.csv –kind=Orkut –url=http://friendsgrid.appspot.com/load
    INFO 2009-04-13 14:50:32,456 bulkload_client.py] Starting import; maximum 10 entities per post
    INFO 2009-04-13 14:50:32,465 bulkload_client.py] Importing 10 entities in 1196 bytes
    ERROR 2009-04-13 14:50:32,826 bulkload_client.py] An error occurred while importing: Received code 302: Found

    302 Moved
    302 Moved
    The document has moved
    here.

    ERROR 2009-04-13 14:50:32,827 bulkload_client.py] Import failed
    jvasquez@inline:~/mentez/google_appengine$

    and when i do it like this links shows http://code.google.com/appengine/docs/python/tools/uploadingdata.html it gives me authentication errors and 404 errors

    I would appreciate a lot if you can give me some help

    Thanks

  20. i forgot to say that i’m working on a linux machine…

    i dont know if that could be a problem

  21. You mentioned you’d talk about porting tables with foreign keys at some point — were you ever able to do that?

  22. Graham, unfortunately I didn’t—I pretty much stopped experimenting with App Engine after I wrote this posting. Maybe one day.

  23. Hi there!

    The bulk upload just does not work!

    tried everything!

    please help!

    I’m on a windows box..

    this is the error i get

    “An error occurred while importing: Received code 302: FOUND”
    “IMPORT FAILED”

Leave a Reply

Your email address will not be published. Required fields are marked *