A reader posed the following question:
“I’m trying to convert my django app to work with google app engine. This is preferred rather than spending $100/year extra for a site with ssh access, plus I love the appengine dashboard.
Here is my issue: My current django app is fairly static. It pulls all its data from a mysql database containing ~6,000 rows. This itself is built from a gadfly database, so it should be pretty easy to get these values into the datastore/gql.
How can I sync my database with appengine?”
This is a highly relevant problem if you are porting an existing Django application to the Google App Engine. Luckily, the App Engine SDK includes a bulk data uploader tool that does the job. Let’s work through an example where we use this tool to transfer data from an existing MySQL database onto a Django application running on Google App Engine.
Case description: We have an inventory database that is currently stored in MySQL. This database is to be made available through a Django web application that allows visitors to review the inventory. The database is named ‘customerdb’ and has a single table called ‘inventory’:
mysql> select * from inventory;
+----------+----------+
| name     | quantity |
+----------+----------+
| ham      |        2 |
| cheese   |        7 |
| macaroni |        1 |
+----------+----------+
3 rows in set (0.00 sec)
Setup: We need an App Engine-ready Django application that provides us with the views and models we need to display our inventory. For this scenario we will name the application ‘upload-demo’ and make it available on http://upload-demo.appspot.com. My earlier tutorials should provide you with what you need to build the basic application structure.
The full set of application files can be downloaded here. References to the application name and paths will have to be changed according to your system setup.
Once the fundamentals are in place you should add an inventory model that mirrors the table in our database:
# upload-demo/uploaddemo/main/models.py
from google.appengine.ext import db

class Inventory(db.Model):
    name = db.StringProperty()
    quantity = db.IntegerProperty()
We also need a view that displays the data:
# upload-demo/uploaddemo/main/views.py
from django.http import HttpResponse
from uploaddemo.main.models import Inventory

def main(request):
    result = ""
    items = Inventory.all()
    for item in items:
        result += "%s: %i<br/>" % (item.name, item.quantity)
    return HttpResponse(result)
Finally, your urls.py should point towards the view:
# upload-demo/uploaddemo/urls.py
from django.conf.urls.defaults import *

urlpatterns = patterns("",
    (r"^$", "uploaddemo.main.views.main"),
)
The application directory structure should look like this:
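Roughly, based on the file paths used throughout this tutorial (the __init__.py and settings.py files are not shown anywhere in this post and are assumed from the standard Django project layout covered in my earlier tutorials):

upload-demo/
    app.yaml
    main.py
    loader.py
    uploaddemo/
        __init__.py
        settings.py
        urls.py
        main/
            __init__.py
            models.py
            views.py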
To verify that we are good to go, deploy the application to App Engine:
[test@mybox ~]$ appcfg.py update upload-demo
Visit http://upload-demo.appspot.com and you should see an empty page, which makes sense since we have no data yet.
Step 1 – Create a bulk load handler: The bulk loader accepts CSV-formatted data and feeds it into the datastore:
# upload-demo/loader.py
from google.appengine.ext import bulkload

class InventoryLoader(bulkload.Loader):
    def __init__(self):
        fields = [
            ("name", str),
            ("quantity", int)
        ]
        bulkload.Loader.__init__(self, "Inventory", fields)

if __name__ == "__main__":
    bulkload.main(InventoryLoader())
In this case we have created a loader for the Inventory model, where the fields match the names and types of the properties in the model. Note that the loader is kept outside of the Django application.
Step 2 – Add the handler to the project: This is done by adding an entry to app.yaml that references loader.py:
# upload-demo/app.yaml
application: upload-demo
version: 1
runtime: python
api_version: 1

handlers:
- url: /load
  script: loader.py
  login: admin
- url: /.*
  script: main.py
A login will be required to access the loader URL—we don’t want anyone to add to our inventory without permission.
Step 3 – Convert the data to CSV:
Getting this step right can be surprisingly tricky, depending on your legacy database. For MySQL you may have to make sure that the user account has file write access rights:
[root@mybox ~]# mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 74740
Server version: 5.0.45 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> grant file on *.* to 'test'@'localhost';
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
This command might have to be run as root, depending on how your database is configured. To do the data dump we run the following select statement:
[test@mybox ~]$ mysql -u test customerdb -e "select * into outfile '/tmp/inventory.txt' fields terminated by ',' from inventory"
[test@mybox ~]$ cat /tmp/inventory.txt
ham,2
cheese,7
macaroni,1
If you are using PostgreSQL you can achieve the same by using the COPY command.
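For example, a minimal sketch reusing the table and file names from the MySQL example above (run from psql; note that COPY writes the file on the database server and normally requires superuser rights):

-- Export the inventory table to a comma-separated file on the server.
COPY inventory TO '/tmp/inventory.txt' WITH DELIMITER ',';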
Step 4 – Upload the data: First, redeploy your application to App Engine:
[test@mybox ~]$ appcfg.py update upload-demo
We then use the bulkload_client.py script to upload our CSV file. The script is found in the tools folder of your App Engine installation—you may have to add it to your PATH. Note that you have to use double dashes for the parameters.
[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt --kind=Inventory --url=http://upload-demo.appspot.com/load
INFO 2008-06-15 07:39:21,682 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-06-15 07:39:21,684 bulkload_client.py] Importing 3 entities in 29 bytes
ERROR 2008-06-15 07:39:21,997 bulkload_client.py] An error occurred while importing: Received code 302: Found
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://www.google.com/accounts/ServiceLogin?service=ah&continue=http://upload-demo.appspot.com/_ah/login%3Fcontinue%3Dhttp://upload-demo.appspot.com/load&ltmpl=gm&ahname=Django+data+upload+demo&sig=f9861d41d527e55f15742b8d54504bcc">here</A>.
</BODY></HTML>
ERROR 2008-06-15 07:39:21,997 bulkload_client.py] Import failed
Now, that didn’t work. Remember that app.yaml says we have to authenticate ourselves as an admin user before we can upload data. Try visiting http://upload-demo.appspot.com/load in a web browser. After authenticating with your Google account you will be redirected to a page that displays the cookie string we need.
Just what we needed! Add the cookie string parameter to the previous request and try again:
[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt --kind=Inventory --url=http://upload-demo.appspot.com/load --cookie='ACSID=AJKiYcE[...]1Hh4'
INFO 2008-06-15 07:50:58,541 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-06-15 07:50:58,549 bulkload_client.py] Importing 3 entities in 29 bytes
INFO 2008-06-15 07:50:59,102 bulkload_client.py] Import successful
If you visit http://upload-demo.appspot.com you should now see the data we just uploaded.
Final notes: This simple example should be enough to get you started. When converting real-life databases you will have to deal with more complex schemas with references between tables. The discussion here should point you in the right direction. You may also find the SDK documentation on types and property classes useful when porting your legacy database.
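As a small taste of what such a conversion can involve, here is a sketch of a loader that uses custom conversion functions in the field list, for example to turn a date column from the CSV into a Python date object. The ‘Event’ kind and its fields are made up for illustration; they are not part of the demo application.

# Sketch only -- the 'Event' kind and its fields are hypothetical.
import datetime
from google.appengine.ext import bulkload

def parse_date(value):
    # Convert a 'DD-MM-YYYY' CSV column into a datetime.date,
    # suitable for a db.DateProperty on the corresponding model.
    return datetime.datetime.strptime(value, "%d-%m-%Y").date()

class EventLoader(bulkload.Loader):
    def __init__(self):
        # The second element of each tuple is the conversion function applied
        # to the raw CSV value, just like str and int in the Inventory loader.
        fields = [
            ("name", str),
            ("date", parse_date),
            ("distance", float),
        ]
        bulkload.Loader.__init__(self, "Event", fields)

if __name__ == "__main__":
    bulkload.main(EventLoader())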
Nice tutorial. I just wonder how you manage relations between objects? What if an entity has a foreign key to another?
Thanks for sharing your knowledge.
I tried this with the dev_appserver and it doesn’t seem to work. Is that why you go straight to the production environment?
@vonG: I will cover foreign key relationships in a follow-up posting.
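In the meantime, here is a very rough and untested sketch of the general idea (the ‘Item’ and ‘Category’ kinds and their properties are hypothetical): the loader’s HandleEntity hook can look up, or create, the entity on the other side of the relationship and store its key on the entity being imported.

# Untested sketch -- 'Item' and 'Category' are hypothetical kinds.
from google.appengine.api import datastore
from google.appengine.ext import bulkload

class ItemLoader(bulkload.Loader):
    def __init__(self):
        bulkload.Loader.__init__(self, "Item", [
            ("name", str),
            ("category_name", str),
        ])

    def HandleEntity(self, entity):
        # Look up the referenced Category by name; create and store it if missing.
        found = datastore.Query("Category", {"name": entity["category_name"]}).Get(1)
        if found:
            category = found[0]
        else:
            category = datastore.Entity("Category")
            category["name"] = entity["category_name"]
            datastore.Put(category)
        # Store the Category key so a ReferenceProperty on the Item model resolves to it.
        entity["category"] = category.key()
        return entity

if __name__ == "__main__":
    bulkload.main(ItemLoader())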
@Frank: I was able to get it working locally with dev_appserver.py but I did not document it since the process is quite similar. The only difference is the login screen and the cookie format. What kind of error did you get?
Actually I am having the same problem with dev_appserver and the production server:
ERROR 2008-06-19 23:22:54,562 bulkload_client.py] An error occurred while importing: Received code 302: Requires login
If I remove “login: admin” from my app.yaml it works. The cookie I’m using for dev is --cookie='dev_appserver_login="test@example.com:True"'
BTW, there’s a little problem with your bulkload_client.py example. With the version of the SDK that I’m running on Windows, all the “-” options should be “--”. So the example would be:
bulkload_client.py --filename=/tmp/inventory.txt --kind=Inventory --url=http://upload-demo.appspot.com/load --cookie='ACSID=AJKiYcE[...]1Hh4'
@Frank: The dev cookie is the same as the one I have been using. I have not tested the whole thing on a Windows box so that might have something to do with it. You should also make sure that you are using the latest version of the App Engine SDK, but since it doesn’t work in production either that is probably not it.
I have the same problem, bulk import just doesn’t seem to work under Windows.
The problem occurs for both test server and production. Tried to debug it but eventually gave up.
Hi Thomas, when I run the bulk upload command I receive a ‘Connection refused’ error.
G:\Cherian\Google App Engine Projects\onlinemovieticket>python c:\google_appengine\tools\bulkload_client.py --filename names.csv --kind Customer --url http://localhost:8081/load
INFO 2008-07-14 10:19:35,280 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-07-14 10:19:35,280 bulkload_client.py] Importing 4 entities in 71 bytes
ERROR 2008-07-14 10:19:36,358 bulkload_client.py] An error occurred while importing: (10061, 'Connection refused')
ERROR 2008-07-14 10:19:36,358 bulkload_client.py] Import failed
What could be the reason?
Regards,
Cherian
@c-soft: I might look into the Windows issue in a later posting.
@Cherian: I noticed that you are using port number 8081 instead of 8080. Unless you have changed the default port setting for the local development server, that could be the problem.
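If you did mean to run on 8081, the development server can be started on that port instead; roughly like this, run from inside your project directory:

python c:\google_appengine\dev_appserver.py --port=8081 .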
Great, this will provide a nice start…
Thomas,
nice site… I wonder if you’ve tried doing a bulk load on existing tables with a relation? I’ve tried but have come up against a problem. I used the bulkloader example. I’ll post the exact problem later when I get home, but for now I’ll just describe my schema.
Member — Running club member
Race — a race event: racename, date, distance
Runner — a child of Member and Race, holding the time taken for the race by the member.
Alan…
I don’t know if you can help, but I wonder if you’ve tried doing a bulk load on existing tables with a relation. I’ve tried but have come up against a problem. I used the bulkloader example.
This is my schema.
Member — Running club member
Race — a race event: racename, date, distance
Runner — a child of Member and Race, holding the time taken for the race by the member.
— These are my models
from google.appengine.ext import webapp
from google.appengine.ext import db

class Member(db.Model):
    memberid = db.StringProperty()
    name = db.StringProperty()
    category = db.StringProperty()
    currentMember = db.StringProperty()

member = Member()
member.put()

class Race(db.Model):
    raceid = db.StringProperty()
    name = db.StringProperty()
    date = db.StringProperty()
    distance = db.StringProperty()
    units = db.StringProperty()

race = Race()
race.put()

class Runner(db.Model):
    racerunner = db.StringProperty()
    raceid = db.StringProperty()
    runnerid = db.StringProperty()
    runnername = db.StringProperty()
    race = db.ReferenceProperty(reference_class=Race)
    member = db.ReferenceProperty(reference_class=Member)
    time = db.StringProperty()
    position = db.StringProperty()
    category = db.StringProperty()
    WAVA = db.StringProperty()
    pace = db.StringProperty()
    notes = db.StringProperty()

runner = Runner()
runner.put()
#This is my data to load
#Runner.csv
00310017,0031,0017,Bil Salters,2:32:05,M40,94,59.87,9:30,NOMATCH
00310016,0031,0016,Iian Sallis,2:40:26,SM,103,57.59,10:01,NOMATCH
00320018,0032,0018,Jon Church,35:02,SM,67.34,6:29,2nd M40
00320019,0032,0019,Sarah Church,51:39,F45,105,51.26,9:33,3rd F45
#member.csv
0017,Bil Salters,M40,Yes
0016,Iian Sallis,SM,No
0018,Jon Church,SM,Yes
0019,Sarah Church,F45,No
#race.csv
0031,North Mids XC Race 3 (men) ,13-01-2007,6.5,miles
0032,Bedford Half Marathon ,10-12-2006,13.1,miles
I tried to load Runner with this:
#myRunnerloader.py
from google.appengine.ext import bulkload
from google.appengine.api import datastore_types
from google.appengine.api import datastore_entities
from google.appengine.ext import search

class RunnerLoader(bulkload.Loader):
    def __init__(self):
        # Our 'Race' entity contains an id name string date distance and units
        bulkload.Loader.__init__(self, 'Runner',
            [('racerunner', str),
             ('raceid', str),
             ('runnerid', str),
             ('runnername', str),
             ('time', str),
             ('position', str),
             ('category', str),
             ('WAVA', str),
             ('pace', str),
             ('notes', str),
            ])

    def HandleEntity(self, entity):
        runner = datastore_entities.Runner(racerunner)
        runner.update(entity)
        raceid = runner['raceid']
        runnerid = runner['runnerid']
        if raceid:
            race = datastore.Query('Race', {'raceid': raceid}).Get(1)
            if not race:
                race = [datastore_entities.Race(raceid)]
                datastore.Put(race[0])
            runner['race'] = race[0].key()
        if runnerid:
            member = datastore.Query('Member', {'runnerid': runnerid}).Get(1)
            if not member:
                member = [datastore_entities.Member(runnerid)]
                datastore.Put(member[0])
            runner['member'] = member[0].key()
        return runner

if __name__ == '__main__':
    bulkload.main(RunnerLoader())
#And this cmd command on XP
cd C:\Google_apps
python "C:\Program Files\Google\google_appengine\tools\bulkload_client.py"
    --filename C:\Google_apps\data\Runnerfull.csv
    --kind Runner
    --url http://127.0.0.1:8080/runnerload
And I get this error:
C:\Documents and Settings\Alan Maplethorpe>cd C:\Google_apps

C:\Google_apps>python "C:\Program Files\Google\google_appengine\tools\bulkload_client.py" --filename C:\Google_apps\data\Runner.csv --kind Runner --url http://127.0.0.1:8080/runnerload
INFO 2008-10-16 14:05:35,157 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-10-16 14:05:35,167 bulkload_client.py] Importing 4 entities in 260 bytes
ERROR 2008-10-16 14:05:36,178 bulkload_client.py] An error occurred while importing: Received code 400: Bad Request
Loading from line 1...error:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\ext\bulkload\__init__.py", line 376, in LoadEntities
    new_entities = loader.CreateEntity(columns, key_name=key_name)
  File "C:\Program Files\Google\google_appengine\google\appengine\ext\bulkload\__init__.py", line 235, in CreateEntity
    entities = self.HandleEntity(entity)
  File "C:\Google_apps\myrunningdata\myrunnerloader2.py", line 23, in HandleEntity
    runner = datastore_entities.Runner(racerunner)
AttributeError: 'module' object has no attribute 'Runner'
ERROR 2008-10-16 14:05:36,178 bulkload_client.py] Import failed

C:\Google_apps>
Any ideas?
Alan…
Thomas,
I downloaded your app and tested the bulk upload, but it is not working for me; I am still getting the following error:
302 Moved
The document has moved here.
ERROR 2008-11-17 13:16:43,453 bulkload_client.py] Import failed
Please help me out.
Thanks,
Gampesh
Gampesh, what happened when you attempted the admin authentication? Did you get a cookie flag when you visited (in your case) http://famousnspicy.appspot.com/load ? And did you pass it to the bulkload_client.py script as described? If you don’t get the authentication right then you will end up with the error you are describing.
Yes Thomas, I am able to get the cookie, and after adding that cookie to my bulkload command I am still getting the same error.
One more question: if my data already contains commas, what happens? Will Google App Engine take care of that, or is there a way to escape them when uploading the data?
Gampesh, it is difficult to say what may be the case without seeing your code and the full bulkload_client.py command input and output. Could it be your data that is causing the problem? Try with a very simple CSV file first, just to rule out source data issues.
About your second question: I’m not sure how App Engine reacts if you escape commas in your input data file, so you’ll have to try it out yourself. One alternative is to modify the bulkload_client.py file. It uses the standard Python csv library, which can quote fields and be told to use a different separator character. See the Python documentation for more details.
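For illustration, here is a rough sketch (the file names are made up) of how the csv module handles both cases: quoting fields that contain commas when writing, and reading a file that uses a different separator:

# Sketch only -- file names are hypothetical.
import csv

# Writing: quote every field so embedded commas survive.
out = open("inventory_quoted.csv", "wb")
writer = csv.writer(out, quoting=csv.QUOTE_ALL)
writer.writerow(["ham, smoked", 2])
out.close()

# Reading: tell the reader to split on tabs instead of commas.
src = open("inventory_tabs.txt", "rb")
for row in csv.reader(src, delimiter="\t"):
    print row
src.close()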
I am receiving this error after adding the --cookie:
INFO 2008-12-09 15:04:09,109 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-12-09 15:04:09,109 bulkload_client.py] Importing 3 entities in 29 bytes
ERROR 2008-12-09 15:04:09,609 bulkload_client.py] An error occurred while importing: Received code 302: Found
302 Moved
The document has moved here.
ERROR 2008-12-09 15:04:09,609 bulkload_client.py] Import failed
Sorry, the error message was not complete…
INFO 2008-12-09 15:12:24,490 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-12-09 15:12:24,506 bulkload_client.py] Importing 3 entities in 29 bytes
ERROR 2008-12-09 15:12:25,038 bulkload_client.py] An error occurred while importing: Received code 302: Found
302 Moved
The document has moved here.
ERROR 2008-12-09 15:12:25,038 bulkload_client.py] Import failed
Varun, are you sure that you added the correct cookie and that there isn’t any extra white space from copying and pasting it? Also, check the documentation for bulkload_client.py to see if there have been any changes to the client interface. I haven’t had the time to do more experimentation with Google App Engine since I wrote this entry, so things might have changed. I think there have been at least a couple of updates to the App Engine SDK since then.
Hi Thomas,
I cannot retrieve the cookie in the way you are showing. When I type http://friendsgrid.appspot.com/load in the browser it gives me a “403 Forbidden” error:
Error: Forbidden
Your client does not have permission to get URL /load from this server.
And when I try to do it without the cookie, this is the error:
jvasquez@inline:~/mentez/google_appengine$ ./bulkload_client.py --filename=../mosaico_data.csv --kind=Orkut --url=http://friendsgrid.appspot.com/load
INFO 2009-04-13 14:50:32,456 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2009-04-13 14:50:32,465 bulkload_client.py] Importing 10 entities in 1196 bytes
ERROR 2009-04-13 14:50:32,826 bulkload_client.py] An error occurred while importing: Received code 302: Found
302 Moved
The document has moved here.
ERROR 2009-04-13 14:50:32,827 bulkload_client.py] Import failed
jvasquez@inline:~/mentez/google_appengine$
And when I do it the way this link shows, http://code.google.com/appengine/docs/python/tools/uploadingdata.html, it gives me authentication errors and 404 errors.
I would appreciate a lot if you can give me some help
Thanks
…
I forgot to say that I’m working on a Linux machine…
I don’t know if that could be a problem.
Hello, I tried to download the link http://thomas.broxrost.com/wp-content/uploads/2008/06/upload-demo.zip, but it seems to be broken.
Regards,
James
James, thanks for noticing! The file is back up again.
You mentioned you’d talk about porting tables with foreign keys at some point — were you ever able to do that?
Graham, unfortunately I didn’t—I pretty much stopped experimenting with App Engine after I wrote this posting. Maybe one day.
Hi there!
The bulk upload just does not work!
Tried everything!
Please help!
I’m on a Windows box.
This is the error I get:
“An error occurred while importing: Received code 302: FOUND”
“IMPORT FAILED”
I recommend looking at the new bulkload tool, which should be much easier to use. The documentation is currently available at http://code.google.com/appengine/docs/python/tools/uploadingdata.html