Thomas Brox Røst

Running pytst 1.15 on a 64-bit platform

January 25, 2009May 30, 2009 Thomas Brox Røst Leave a comment

Update: The latest version, 1.17, compiles on 64-bit platforms out of the box, so the patch below is no longer necessary.

Nicolas Lehuen’s pytst is a C++ ternary search tree implementation with a Python interface. It’s an excellent tool—and it is also really, really fast.

Unfortunately version 1.15 doesn’t compile on 64-bit platforms, giving the following error messages:

pythonTST.h:178: error: cannot convert 'int*' to 'Py_ssize_t*' for argument '3'
to 'int PyString_AsStringAndSize(PyObject*, char**, Py_ssize_t*)'
tst_wrap.cxx: In function 'PyObject* _wrap__TST_walk__SWIG_1(PyObject*, int, PyO
bject**)':
tst_wrap.cxx:3175: error: cannot convert 'int*' to 'Py_ssize_t*' for argument '3
' to 'int PyString_AsStringAndSize(PyObject*, char**, Py_ssize_t*)'
tst_wrap.cxx: In function 'PyObject* _wrap__TST_close_match(PyObject*, PyObject*
)':
tst_wrap.cxx:3250: error: cannot convert 'int*' to 'Py_ssize_t*' for argument '3
' to 'int PyString_AsStringAndSize(PyObject*, char**, Py_ssize_t*)'
tst_wrap.cxx: In function 'PyObject* _wrap__TST_prefix_match(PyObject*, PyObject
*)':
[...and so on...]

Until Nicolas releases an updated version, here is the quick fix:

cp pythonTST.h pythonTST.h.orig
cp tst_wrap.cxx tst_wrap.cxx.orig
sed -r 's/int size/Py_ssize_t size/' < tst_wrap.cxx.orig > tst_wrap.cxx
sed -r 's/int length/Py_ssize_t length/' < pythonTST.h.orig > tmpfile
sed -r 's/sizeof\(int\)/sizeof(long)/' < tmpfile > pythonTST.h

Run these commands from the pytst source directory and you should be all set. I’m not sure if this a fully satisfactory solution, but at least this will get the test suite running again.

Dostoevsky on the dangers of science

October 13, 2008June 11, 2009 Thomas Brox Røst Leave a comment

“He was devoured by the deepest and most insatiable passion, which absorbs a man’s whole life and does not, for beings like Ordynov, provide any niche in the domain of practical daily activity. This passion was science. Meanwhile it was consuming his youth, marring his rest at night with its slow, intoxicating poison, robbing him of wholesome food and of fresh air which never penetrated to his stifling corner. Yet, intoxicated by his passion, Ordynov refused to notice it. He was young and, so far, asked for nothing more. His passion made him a babe as regards external existence and totally incapable of forcing other people to stand aside when needful to make some sort of place for himself among them. Some clever people’s science is a capital in their hands; for Ordynov it was a weapon turned against himself.”

From “The Landlady” (1848)

3 lessons the individual investor can learn from JPMorgan Chase

September 29, 2008September 30, 2008 Thomas Brox Røst 2 Comments

JPMorgan Chase recently picked up the remains of troubled Washington Mutual for a mere $1.9 billion—a deal described by some as buying the company “for nothing.”

Although the takeover carries a fair amount of risk for JPMorgan, the bank looks increasingly likely to emerge as a credit crisis survivor. They still have a “reasonably strong” balance sheet and have by and large managed to steer clear of a debacle of historical proportions.

How did this come about? Also, are there elements to this story that you as an individual investor can learn from?

Fortune recently published a comprehensive account of how JPMorgan mostly avoided the subprime debacle (“Jamie Dimon’s Swat Team”), which is well worth a read. Based on the article, I have tried to summarize the main principles that have so far kept JPMorgan as a company out of trouble.

These principles are, in my opinion, just as valid on a personal level when you, as an individual investor, are to decide whether or not a company stock is a good buy. So, I have also tried to relate each principle to general best practices of investment.

1. It’s all in the numbers

In early 2006, JPMorgan were, like everyone else, dealing in subprime CDOs. By the end of that year, the bank had dumped more or less all of their subprime mortgage holdings. What happened?

First of all, the numbers were no longer looking good.

JPMorgan has a strong tradition of data-mining every aspect of their business and continuously trying to figure out the story behind the numbers. What Jamie Dimon, the CEO, and his team saw was that the subprime market was way to risky for the profits it was generating. Data from their retail banking division showed that subprime loan payments were increasingly late. Moreover, their own data analysis indicated that the supposedly safe AAA ratings lavished upon CDO bonds were bogus.

The numbers were increasingly and consistently negative and in sharp contrast to the conventional wisdom on the subprime market. Trusting the data and its interpretation rather than the general opinion, JPMorgan left the market altogether.

Learning points:

When evaluating a company as an investment opportunity, you can rely on a barrage of opinion from a sea of sources with a multitude of motivations.

Or, you can go straight to the facts.

The numbers in quarterly reports or annual accounts don’t lie unless deliberately tampered with. If the balance sheet tells you that a company is heading for trouble, then that company is heading for trouble, no matter what anyone else might be saying.

Learn how to read and understand the balance sheet, the income statement and the cash flow statement. Once understood, they tell you more about the company than any financial advisor or industry analyst ever will.
Be diligent in your pursuit of data, both on the company, the sector, and the general state of the economy.
Read the numbers first and then make up your own interpretation. Other people’s interpretations are not gospel, but rather a challenge to your own interpretation.
The opinionated parts of a company’s annual report are mostly fluff and should be read as such. Read the annual report from the back. It’s not a crime to talk about a company in optimistic terms; manipulating the numbers is.

2. Investment is not about short-term profit

JPMorgan exited the subprime market while it was still a booming business. This took a lot of guts when other Wall Street firms were making a killing from subprime.

In the short term they lost ground to competitors by not jumping on the latest Street bandwagon. Their conservative stance and the effect it had on quarterly earnings must have generated immense pressure, both internally and externally. From 2005 to 2007, JPMorgan fell from third to sixth place in fixed-income underwriting. This is the sort of development that causes ruckus in board meetings.

Nonetheless: In the long term they prevailed.

Their decision to trust their analysis of subprime being too risky turned out to be a sensible one, even if this meant a very negative short-term impact on their balance sheets.

By focusing on core company values rather than pursuing immediate profit, JPMorgan emerged on top.

Learning points:

You can certainly make money from overhyped stocks whose valuation belies the true worth of their business. This, however, requires you to play the game of getting out before the bubble of irrational exuberance pops.

Timing the market is ultimately about luck. Luck is a property that is best reserved for the lottery rather than your savings.

Fashion does not imply quality. Just because everyone else is ecstatic about something—be it dot-com companies or the mullet—does not mean you should be as well. Only get with the crowd if there is a fundamentally sane reason for doing so. Dare to be different.

Be prepared to stick it out as long as the underlying fundamentals of your analysis does not change. Quality always prevails in the long term.
Even fundamentally good stocks go down if they are unpopular. This is the way the market rolls; don’t lose any sleep over it.
You will not be able to consistently time the ups and downs of the market, so don’t even try to. Learn to live with the fact that good stocks will sometimes go down for no good reason.
Don’t check your portfolio every five minutes. Apart from keeping you from getting any other work done, it will only lead you to perceive the market as more volatile than it really is. Think the market is too volatile? Just reduce your sampling frequency. Remember that you are in it for the long run.
Listen to other people but don’t let them make your decisions. Even if they have a compelling chain of arguments there are likely more conclusions that can be drawn from the same set of underlying facts. Your explanation of why to invest should always be your own.

3. Question and diversify

JPMorgan operating-committee meetings are described as “loud and unsubtle”. According to Bill Daley, head of corporate responsibility and former Secretary of Commerce, “[p]eople were challenging Jamie, debating him, telling him he was wrong. It was like nothing I’d seen in a Bill Clinton cabinet meeting, or anything I’d ever seen in business.”

This culture of allowing, encouraging and listening to dissent ultimately made it easier for JPMorgan to make the right decisions. Getting all facts and viewpoints on the table while continously questioning what they were doing was a major success factor.

Still, JPMorgan made their own share of mistakes.

In 2007 a short-term secured loans unit bought a $2 billion subprime CDO—upper management claims they never knew. Other billion-dollar write-offs had to be endured as well. Their principle of only taking risks when you are paid well for doing so is anything but perfect. It also remains to be seen how well timed their shotgun purchase of Washington Mutual turns out—among its assets are an estimated $30 billion’s worth of loans that have to be written down. However, on the whole they look to be emerging from the credit crisis as a much healthier company than their surviving competitors.

Learning points:

There is no such thing as a risk-free investment, so be prepared to accept losses. JPMorgan’s competitors put themselves in a position where some of them could not weather a downturn in one of their business segments. JPMorgan, on the other hand, were doing what they could to make sure their good moves outweighed their bad moves.

Making bold investment choices always carries the probability of failure. Use diversification as a cushion for when failure strikes.

In bicycle racing, one does not talk in terms of if a rider will take a tumble but rather about when. The same should apply to your investments.

Moreover, be prepared to continuously question your own judgment. The premises for earlier decisions will change, so be prepared to revert on them. If faith becomes a stronger motivation than reason for holding on to stock it’s probably time to let go.

Hedge your investments. No matter the soundness of your strategy or how diligently you stick to your principles, things will still go wrong from time to time. Don’t allow mishaps to take you down.
Be prepared to change your position. Things change, the world keeps turning, and so should you.
Don’t get emotional about a stock. Your favorite company might turn from making mostly good decisions to making mostly bad decisions. These things happen, so be prepared to get out even if this means taking a loss.
If your sole reason for hanging on to a stock is the belief of future recovery then you have already lost. Get rid of it, count your losses, and learn from the experience.

Hacking comments in Django 1.0

September 5, 2008January 6, 2009 Thomas Brox Røst 9 Comments

The recent release of Django 1.0 included a full rewrite of the comments framework. Comments have been available in Django for a while but were never properly documented until now.

This article will show you how to adapt and extend the comments framework so that it fits the needs of your application. Why extend it? Well, mainly because the framework does what it says on the box—and nothing more. It allows you to attach comments to any Django object instance but for the rest of the business logic—e.g. regulating who can modify and delete comments—you are on your own.

Also, the current documentation does not cover all features so what I am writing here should hopefully fill a few gaps.

Prerequisites

You need to be familiar with Django. If you’re not, then have a look at the tutorial in the official documentation or alternatively at my previous article on how to get started with Django on Google App Engine.

How comments work

It’s really simple—just skim through the well-written documentation and you should pretty much be able to figure it out.

For example, to show a comment form for an instance of a model called my_model_instance, you just need two lines of template code:

{% load comments %}
{% render_comment_form for my_model_instance %}

The magic behind the comments framework lies in its use of generic model relations. This is a very powerful (and well-hidden) Django feature that allows your models to have generic foreign keys, meaning they can link to any other model. The comments framework uses this technique to ensure that comments can be attached to an arbitrary model in your application.

The scenario

I will be describing a real-life case from my company web site, Eventseer.net. Eventseer is an event tracker that helps researchers stay informed on upcoming conferences and workshops. It uses the comments framework for two different purposes.

Firstly, registered users can add comments to each event in our database. Secondly, all users can claim a personal profile page where they get what we call a whiteboard—which is simply a blogging application. Each entry on a whiteboard can be commented on by other registered users.

The problem

There are some limitations when it comes to adding comments on Eventseer. For example, only registered users are allowed to add comments. After a comment has been added, only the user who added it or an administrator are allowed to delete it.

These are fairly typical requirements—which are not supported out of the box in the comments framework. There is some support for using the built-in permissions system, but this will still not let you exercise fine-grained per user access control.

Moreover, the default comment templates are ugly as sin and will have to adapted to fit your application.

Step 1: Enabling comments

This is described well enough in the standard documentation. However, if we want to add extra functionality there are a couple of extra things to be done.

First, we add the comments framework to INSTALLED_APPS in settings.py:

# eventseer/settings.py

INSTALLED_APPS = (
    ...
    'django.contrib.comments',
    'eventseer.mod_comments',
    ...
)

Note that I also added an app called eventseer.mod_comments. This is where our comments wrapper code will reside. (I will be using the eventseer project name for the rest of this tutorial).

Now synchronize the database:

$ python manage.py syncdb

This creates the tables necessary for storing the comments.

Finally, add an entry in your base urls.py:

# eventseer/urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    ...
    (r'^comments/', include('eventseer.mod_comments.urls')),
)

This is where we deviate from the standard documentation: Instead of routing all comment URLs to the bundled comments application we instead route them to our own custom application. This allows us to intercept comment URLs as required.

Step 2: Add the modified comments application

This is done the usual way:

$ python manage.py startapp mod_comments

In the previous step we added a reference to urls.py in the mod_comments application, so this file must be added:

# eventseer/mod_comments/urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^delete/(?P<comment_id>\d+)/$', 'eventseer.mod_comments.views.delete'),
    (r'', include('django.contrib.comments.urls')),
)

The first line routes requests to /comments/delete/ to a custom delete view which we will create in the next step. For this example this is the only behavior we wish to modify. The last line ensures that all other requests are passed through to django.contrib.comments.urls.

Step 3: Create the wrapper view

We want to make sure that only the user who wrote a comment or administrators are allowed to delete it. This can be taken care of in mod_comments/views.py:

# eventseer/mod_comments/views.py

from django.contrib.auth.decorators import login_required
from django.contrib.comments.models import Comment
from django.http import Http404
from django.shortcuts import get_object_or_404
import django.contrib.comments.views.moderation as moderation

@login_required
def delete(request, comment_id):
    comment = get_object_or_404(Comment, pk=comment_id)
    if request.user == comment.user or \
       request.user.is_staff:
        return moderation.delete(request, comment_id)
    else:
        raise Http404

First we wrap the delete function with the login_required decorator so as to keep out non-authenticated users. We then check if the user who made the delete request actually owns the comment or if the user has administrator permissions. If either case holds true we pass the request on to the original delete method. Otherwise a 404 (page not found) error is raised.

We can of course modify the view method signature as required. In fact, the original delete method can be completely bypassed if that is what we want.

Step 4: Modifying delete behavior

By default the delete view shows a confirmation page (comments/delete.html) on GET requests and does the actual deletion on POST requests. After the deletion is done you will be shown the standard deleted.html template. Alternatively, adding a next parameter to the POST request will send the user to the given URL.

Say we wish to make some changes to the confirmation page, comments/delete.html. Instead of modifying the original in the Django distribution we create our own version. Create the directory eventseer/mod_comments/templates/comments and copy delete.html into it.

You will typically find this file in /usr/lib/python2.5/site-packages/django/contrib/comments/templates/comments on Linux systems or C:/Python2.5/Lib/site-packages/django/contrib/comments/templates/comments on Windows systems—your mileage may vary.

Typically you will wish to change this template to fit in with your site design, for instance by inheriting from your base templates.

To make the modified template take precedence, just add the new directory to settings.py:

# eventseer/settings.py

TEMPLATE_DIRS = (
    ...
    '/home/eventseer/src/eventseer/mod_comments/templates',
    ...
)

This will make sure that the Django URL resolver queries the eventseer/mod_comment/templates directory—where it will find our alternative version of comments/delete.html. Requests to other comment views that use the other default templates will be passed through to the correct default location.

Conclusion

The Django comments framework is the easiest and quickest way to add commenting functionality to your application. The flip side of this simplicity is that you will often have to extend the framework to make it behave according to your requirements. As this tutorial have shown, this can be done without making changes to the comments framework itself. One of the core strengths of Django is how it provides a set of reusable building blocks upon which you can add your own advanced functionality as required.

At the time of writing, the comments framework documentation is somewhat sparse. If you want to learn more about the inner workings of Django comments you will have to consult the source code—there are quite a few undocumented features that are really useful.

UPDATE

Tim Hoelscher noticed that I hadn’t said anything about how to work around the Django permission system, which was an unintentional omission.

The original delete method in django.contrib.comments.views.moderation requires that the user who wants to delete a comment has the comments.can_moderate permission. Regular users do not have this permission by default, so we have to set it for all users who are allowed to delete comments. (Remember, the wrapper delete makes sure that they can only delete their own comments.)

An easy way to solve this is to create a ‘user’ group, assign the comments.can_moderate permission to this group, and finally assign all users to this group. This can be done through the admin interface, with a few lines of SQL, or within your Django application. Refer to the Django permissions documentation for more information on how permissions work.

Persistent Django on Amazon EC2 and EBS – The easy way

August 21, 2008February 13, 2010 Thomas Brox Røst 69 Comments

Now that Amazon’s Elastic Block Store (EBS) is publicly available, running a complete Django installation on Amazon Web Services (AWS) is easier than ever.

Why EBS? EBS provides persistent storage, which means that the Django database is kept safe even after the Django EC2 instances terminate.

This tutorial will take you through all the necessary steps for setting up Django with a persistent PostgreSQL database on AWS. I will be assuming no prior knowledge of AWS, so those of you who have dabbled with it before might want to skim through the first steps. Knowing your way around Django is an advantage but not a requirement.

I am deliberately keeping things simple—to get a deeper understanding of the hows and whys of AWS you should take a look at James Gardner’s excellent article as well as the official documentation.

The command line tools can be a bit intimidating so I will also show you how Elasticfox can be a fully satisfactory alternative.

Summary

We are going to register with AWS, get acquainted with Elasticfox, start up an EC2 instance, install Django and PostgreSQL on the instance, and finally mount an EBS drive and move our database to it.

Step 1: Set up an AWS account

To use AWS you need to register at the AWS web page. If you already have an account with Amazon you can extend this to also cover AWS.

Step 2: Download and install the Elasticfox Firefox extension

This tool will make life a whole lot easier for you. Down the road there is no avoiding the official command line tools or alternatively boto if you want to access AWS programmatically. For now, let’s stick with Elasticfox.

You can install the extension from this page.

Step 3: Add your AWS credentials to Firefox

Launch Elasticfox (‘Tools’ -> ‘Elasticfox’) and click on the ‘credentials’ button. Enter your account name (typically the email address you registered with), AWS access key and AWS secret access key. This information can be found via the ‘Your web services account’ on the AWS start page.

Step 4: Create a new EC2 security group

Let’s pause for a while to consider what we are doing.

You will be running your Django installation off an EC2 instance. There is no magic to them at all—they are simply fully functional servers that you access the same way as, say, a dedicated server or a web hosting account.

By default, EC2 instances are an introverted lot: They prefer keeping to themselves and don’t expose any of their ports to the outside world. We will be running a web application on port 8000 so therefore port 8000 has to be opened. (Normally we would be opening port 80, but since I will only be using the Django development web server then port 8000 is preferable). SSH access is also essential, so port 22 should be opened as well.

To make this happen we must create a new security group where these ports are opened.

Click on the ‘Security Groups’ tab and then the ‘Refresh’ icon. The list should update to show you the ‘default’ group.

Then click the ‘Create Security Group’ icon and create a new group named ‘django’.

Now we need to add the actual permissions. Click the ‘Grant Permission’ icon and add ‘From port 8000 to 8000’ under ‘Protocol Details’. Repeat the same action for port 22.

Your security group is now ready for use.

Step 5: Set up a key pair

Having a security group is not enough; we also have to set up a key pair to access the instance via SSH.

Why is this necessary? Think about it: You are launching a server instance but no one has told you the root password. So, setting up a private/public key pair is the only way to gain access.

Click on the ‘KeyPairs’ tab and then the ‘Create a new keypair’ icon. Name your new key pair ‘django-keypair’. A save dialog will pop up, allowing you to save the private key in a safe location. Use the filename ‘django.pem’.

Step 6: Launch an EC2 instance

I have a certain fondness for Fedora, so I’ll be using the fedora-8-i386-base-v1.07 AMI with AMI ID ami-2b5fba42.

Return to the ‘AMIs and Instances’ tab.

If you click the ‘Refresh’ icon in the ‘Machine Images’ section you will get a list of all public images. To find the one we’re after, enter ‘fedora-8’ in the search box—after a while all the relevant images will appear.

Right-click the image with the AMI ID as above and select ‘Launch instance(s) of this AMI’.

This is where the actions from the previous steps start making sense. Set the key pair to ‘django-keypair’ and add the ‘django’ security group to the launch set. Leave all the other settings as they are. Then click the ‘Launch’ button.

Important: From this point and on the meter will be running! If the fire alarm goes off, you get bored with this tutorial, or whatever: Do remember to shut down the instance before you leave, otherwise it will cost you $2.40 per day.

The ‘Your Instances’ section should update, showing you that the instance you just launched is ‘pending’. Click the ‘Refresh’ icon after a while—in a minute or so the status should change to ‘running’.

Step 7: Connect with your new instance

Double click on the running instance and copy the ‘Public DNS Name’ entry. This is the domain name you use to access the instance from the outside. In this tutorial, my instance is hosted at ‘ec2-75-101-248-101.compute-1.amazonaws.com’.

Now we are going to SSH into the instance. I am doing this via Cygwin on Windows, but any SSH client should do. If you are on Windows and have Putty installed you can even launch directly from Elasticfox by right-clicking on the running instance and selecting ‘SSH to Public DNS Name’.

Let’s start with a basic sanity check:

$ ssh root@ec2-75-101-248-101.compute-1.amazonaws.com
The authenticity of host 'ec2-75-101-248-101.compute-1.amazonaws.
com (75.101.248.101)' can't be established.
RSA key fingerprint is db:0a:85:36:99:5f:65:6b:c7:77:3e:37:59:fc:16:fd.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-75-101-248-101.compute-1.amazonaws.
com,75.101.248.101' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-with-mic).

As expected, this isn’t working; we need to use the private key you saved earlier. Go to the directory where you saved the django.pem file and type the following:

$ ssh -i django-keypair.pem root@ec2-75-101-248-101.compute-1.amazonaws.com

         __|  __|_  )  Fedora 8
         _|  (     /    32-bit
        ___|\___|___|

 Welcome to an EC2 Public Image
                       : -)
    Base

[root@ ~]#

That’s better!

If you try pointing your browser towards ‘http://ec2-75-101-248-101.compute-1.amazonaws.com:8000/’ you should get a ‘can’t establish a connection’ error since there is no web server running on port 8000 as of yet.

Step 8: Install required software

Most AMI instances are stripped to the bone, so we have to add the software packages we need to get Django up and running. The steps required will of course vary from AMI to AMI, but running the following script as root is sufficient for our v1.07 Fedora 8 instance:

# Install subversion
yum -y install subversion

# Install, initialize and launch PostgreSQL
yum -y install postgresql postgresql-server
service postgresql initdb
service postgresql start

# Modify PostgreSQL config to avoid username/password problems
# Note: This grants access to _all_ local traffic!
cat &gt; /var/lib/pgsql/data/pg_hba.conf &lt;&lt;EOM
local all all trust
host all all 127.0.0.1/32 trust
EOM

# Restart PostgreSQL to enable new security policy
service postgresql restart

# Set up a database for Django
psql -U postgres -c &quot;create database djangotest encoding 'utf8'&quot;

# Install Django (I always checkout from SVN)
cd /opt
svn co http://code.djangoproject.com/svn/django/trunk/ django-trunk
ln -s /opt/django-trunk/django /usr/lib/python2.5/site-packages/django
ln -s /opt/django-trunk/django/bin/django-admin.py /usr/local/bin

# Install psycopg2 (for database access from Python)
yum -y install python-psycopg2

Step 9: Set up a Django project

First we set up an account for our test Django project:

[root ~]# useradd djangotest
[root ~]# su - djangotest
[djangotest ~]$

For the full story on how to create a new Django project you should have a look at the official tutorial. For now, just execute the following as the ‘djangotest’ user:

[djangotest ~]$ django-admin.py startproject mysite

Now we have all we need to test if the installation is working. Launch the development server like this:

[djangotest ~]$ python mysite/manage.py runserver ec2-75-101-248-101.compute-1.amazonaws.com:8000
Validating models...
0 errors found

Django version 1.0-beta_1-SVN-8461, using settings 'mysite.settings'
Development server is running at http://ec2-75-101-248-101.compute-1.amazonaws.com:8000/
Quit the server with CONTROL-C.

Note that I am using the full external domain name with the ‘runserver’ command.

Visit ‘http://ec2-75-101-248-101.compute-1.amazonaws.com:8000/’ with your browser and you should see the regular Django ‘It worked!’ page.

Note: Please don’t use the Django development server in a production setting. In fact, you probably shouldn’t use it on anything that is exposed to the outside world. The only reason I am doing it this way in this tutorial is to keep things simple—normally you should set up a proper web server such as Apache or Lighttpd. Refer to the Django documentation for information on how to do this.

Step 10: Create a Django application

I will show you how to put the Django database in persistent storage later on, so we have to set up a simple database-backed Django application.

Modify mysite/settings.py as follows:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'djangotest'
DATABASE_USER = 'postgres'
DATABASE_PASSWORD = ''
...

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
...

Then modify mysite/urls.py to allow access to the admin GUI:

from django.conf.urls.defaults import *

# Uncomment the next two lines to enable the admin:
from django.contrib import admin
admin.autodiscover()

urlpatterns = patterns('',
    # Example:
    # (r'^mysite/', include('mysite.foo.urls')),

    # Uncomment the next line to enable admin documentation:
    # (r'^admin/doc/', include('django.contrib.admindocs.urls')),

    # Uncomment the next line to enable the admin:
    (r'^admin/(.*)', admin.site.root),
)

Now we have to sync the database:

[djangotest ~]$ python mysite/manage.py syncdb

You will be asked to create an admin user—set both the username and the password to ‘djangotest’.

Then create a Django app:

[djangotest ~]$ python mysite/manage.py startapp myapp

If you got the preceding steps right, you should now be able to log on to the admin GUI at http://ec2-75-101-248-101.compute-1.amazonaws.com:8000/admin/ with the ‘djangotest’ user.

Add a new user to verify that the database connection works—we will be needing that new user later on.

Step 11: Create and mount an EBS instance

This is where things get really cool!

There is a huge problem with our current setup: Once you shut down the AMI instance, all the data in our database will disappear. Enter EBS.

EBS lets you define a persistent storage volume that can be mounted by EC2 instances. If we move our database files to an EBS volume then they will persist no matter what happens to our EC2 instances.

First, go back to Elasticfox and make a note of the availability zone of your running instance—this should be something like ‘us-east-1b’.

Then click on the ‘Volumes and Snapshots’ tab. Click the ‘Create Volume’ icon and create a 1GB volume that belongs to the same availability zone as your instance.

Right-click the new volume and choose ‘Attach this volume’. This will let you attach the volume to the running instance. Use /dev/sdh as the mount point. Refresh after a couple of seconds and the ‘Attachment status’ should have changed to ‘attached’.

Go back to your terminal and create an ext3 filesystem on the new volume:

[root ~]# mkfs.ext3 /dev/sdh
mke2fs 1.40.4 (31-Dec-2007)
/dev/sdh is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

All that remains is to mount the filesystem, in this case to /vol:

[root ~]# echo &quot;/dev/sdh /vol ext3 noatime 0 0&quot; &gt;&gt; /etc/
fstab
[root ~]# mkdir /vol
[root ~]# mount /vol
[root ~]# df --si
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda1               11G   1.4G   8.8G  14% /
/dev/sda2              158G   197M   150G   1% /mnt
none                   895M      0   895M   0% /dev/shm
/dev/sdh               1.1G    35M   969M   4% /vol

Step 12: Moving the database to persistent storage

First make sure that PostgreSQL is stopped:

[root ~]# service postgresql stop
Stopping postgresql service:                               [  OK  ]

You should also terminate your Django development server in case it is still running.

Now move the PostgreSQL database files to the EBS volume mounted at /vol:

[root ~]# mv /var/lib/pgsql /vol

For this to work we have to make a small modification to the /etc/init.d/postgresql file—make sure that the lines starting at around line 100 look exactly like this:

...
# Set defaults for configuration variables
PGENGINE=/usr/bin
PGPORT=5432
PGDATA=/var/lib/pgsql
if [ -f &quot;$PGDATA/PG_VERSION&quot; ] &amp;amp;&amp;amp; [ -d &quot;$PGDATA/base/template1&quot; ]
then
        echo &quot;Using old-style directory structure&quot;
else
        PGDATA=/var/lib/pgsql/data
fi
PGDATA=/vol/pgsql/data
PGLOG=/vol/pgsql/pgstartup.log
...

Note that this is a Fedora-specific hack—the main idea is to have the $PGDATA system variable point at /vol/pgsql/data.

For other databases the procedure will differ. A similar procedure for MySQL is available here.

PostgreSQL can now be restarted:

[root ~]# service postgresql start
Starting postgresql service:                               [  OK  ]

To verify that Django is using the same database as before you can revisit the admin GUI—the new user you added previously should still be available.

And there you have it!

Step 13: Shutting down

For completeness’ sake, let’s review the steps required to shut everything down.

First, stop the database server and unmount the EBS volume:

[root ~]# service postgresql stop
Stopping postgresql service:                               [  OK  ]
[root ~]# umount /vol

Then return to Elasticfox, right-click the EBS volume and select ‘Detach this instance’. When you are done with this tutorial you can delete the volume instance as well—having it in storage will cost you money.

Finally, go to the ‘AMIs and Instances’ tab and terminate the running instance. That should conclude your current transaction with AWS. (Refresh the volume and instances sections to verify that everything has really shut down).

Final words

If you now repeat steps 6 to 11 you should be able to launch a brand new EC2 instance that uses the database on your stored volume—this is left as an exercise for the reader. The only deviations from the procedure are that you shouldn’t have to run the PostgreSQL ‘initdb’ command, or create the ‘djangotest’ database.

This has been a bare-bones introduction to how EBS lets you run a persistent Django installation on AWS. In real life, the following issues have to be considered:

Use a proper web server.
Make sure the web server log files, database log, django logs etc. are moved to persistent storage as well.
Create a custom AMI that is properly set up for your Django project (so that you don’t have to do the full setup procedure every time you launch an instance).

Then there’s scaling, backup, and so on. Nonetheless, hopefully this article should be enough to get you started.

Addendum

A reader pointed out that the PostgreSQL user home directory should also be changed. While I haven’t tried this myself, the correct procedure is probably to do a usermod -d /vol/pgsql postgres as root.

Serving static files with Django and AWS – going fast on a budget

August 14, 2008 Thomas Brox Røst Leave a comment

I just posted an article on how to improve Django response times through the use of pre-generated static files:

Speed matters.

When Google tried adding 20 extra results to their search pages, traffic dropped by 20%. The reason? Page generation took an extra .5 seconds.

This article will show how Eventseer utilizes an often overlooked way of improving the responsiveness of a web application: Pre-generating and serving static files instead of dynamic pages.

The full posting can be read here.

Porting legacy databases to Google App Engine

June 15, 2008June 1, 2009 Thomas Brox Røst 33 Comments

A reader posed the following question:

“I’m trying to convert my django app to work with google app engine. This is preferred rather than spending $100/year extra for a site with ssh access, plus I love the appengine dashboard.

Here is my issue: My current django app is fairly static. It pulls all its data from a mysql database containing ~6,000 rows. This itself is built from a gadfly database, so it should be pretty easy to get these values into the datastore/gql.

How can I sync my database with appengine?”

This is a highly relevant problem if you are porting an existing Django application to the Google App Engine. Luckily, the App Engine SDK includes a bulk data uploader tool that does the job. Let’s work through an example where we use this tool to transfer data from an existing MySQL database onto a Django application running on Google App Engine.

Case description: We have an inventory database that is currently stored in MySQL. This database is to be made available through a Django web application that allows visitors to review the inventory. The database is named ‘customerdb’ and has a single table called ‘inventory’:’

mysql> select * from inventory;
+----------+----------+
| name     | quantity |
+----------+----------+
| ham      |        2 |
| cheese   |        7 |
| macaroni |        1 |
+----------+----------+
3 rows in set (0.00 sec)

Setup: We need an App Engine-ready Django application that provides us with the views and models we need to display our inventory. For this scenario we will name the application ‘upload-demo’ and make it available on http://upload-demo.appspot.com. My earlier tutorials should provide you with what you need to build the basic application structure.

The full set of application files can be downloaded here. References to the application name and paths will have to be changed according to your system setup.

Once the fundamentals are in place you should add an inventory model that mirrors the table in our database:

# upload-demo/uploaddemo/main/models.py

from google.appengine.ext import db

class Inventory(db.Model):
    name = db.StringProperty()
    quantity = db.IntegerProperty()

We also need a view that displays the data:

# upload-demo/uploaddemo/main/views.py

from django.http import HttpResponse
from uploaddemo.main.models import Inventory

def main(request):
    result = ""
    items = Inventory.all()

    for item in items:
        result += "%s: %i<br/>" % (item.name, item.quantity)

    return HttpResponse(result)

Finally, your urls.py should point towards the view:

# upload-demo/uploaddemo/urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns("",
    (r"^$", "uploaddemo.main.views.main"),
)

The application directory structure should look exactly like this:

Project directory structure

To verify that we are good to go, deploy the application to App Engine:

[test@mybox ~]$ appcfg.py update upload-demo

You should see an empty page—which makes sense since we have no data yet.

Step 1 – Create a bulk load handler: The bulk loader accepts CSV-formatted data which it will feed it into the datastore:

# upload-demo/loader.py

from google.appengine.ext import bulkload

class InventoryLoader(bulkload.Loader):
    def __init__(self):
        fields = [
            ("name", str),
            ("quantity", int)
        ]
       
        bulkload.Loader.__init__(self, "Inventory", fields)

if __name__ == "__main__":
    bulkload.main(InventoryLoader())

In this case we have created a loader for the Inventory model where the fields match the name and type of the fields in the model. Note that the loader is kept outside of the Django application.

Step 2 – Add the handler to the project: This is done by adding an entry to app.yaml that references loader.py:

# upload-demo/app.yaml

application: upload-demo
version: 1
runtime: python
api_version: 1

handlers:
- url: /load
  script: loader.py
  login: admin
- url: /.*
  script: main.py

A login will be required to access the loader URL—we don’t want anyone to add to our inventory without permission.

Step 3 – Convert the data to CSV:

Getting this step right can be surprisingly tricky, depending on your legacy database. For MySQL you may have to make sure that the user account has file write access rights:

[root@mybox ~]# mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 74740
Server version: 5.0.45 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> grant file on *.* to 'test'@'localhost';
Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

This command might have to be run as root, depending on how your database is configured. To do the data dump we run the following select statement:

[test@mybox ~]$ mysql -u test customerdb -e "select * into
    outfile '/tmp/inventory.txt' fields terminated by ',' from
    inventory"
[test@mybox ~]$ cat /tmp/inventory.txt
ham,2
cheese,7
macaroni,1

If you are using PostgreSQL you can achieve the same by using the COPY command.

Step 4 – Upload the data: First, redeploy your application to App Engine:

[test@mybox ~]$ appcfg.py update upload-demo

We then use the bulkload_client.py script to upload our CSV file. The script is found in the tools folder of your App Engine installation—you may have to add it to your PATH. Note that you have to use double dashes for the parameters.

[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt
    --kind=Inventory --url=http://upload-demo.appspot.com/load

INFO 2008-06-15 07:39:21,682 bulkload_client.py]
    Starting import; maximum 10 entities per post
INFO 2008-06-15 07:39:21,684 bulkload_client.py]
    Importing 3 entities in 29 bytes
ERROR 2008-06-15 07:39:21,997 bulkload_client.py]
    An error occurred while importing: Received code 302: Found
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://www.google.com/accounts/ServiceLogin?service=ah&
continue=http://upload-demo.appspot.com/_ah/login%3Fcontinue%3Dhttp://
upload-demo.appspot.com/load&ltmpl=gm&ahname=Django+data+u
pload+demo&sig=f9861d41d527e55f15742b8d54504bcc">here</A>.
</BODY></HTML>

ERROR    2008-06-15 07:39:21,997 bulkload_client.py] Import failed

Now, that didn’t work. Remember that app.yaml says we have to authenticate ourself as an admin user before we can upload data. Try visiting http://upload-demo.appspot.com/load in a web browser. After having authenticated yourself using your Google account you will be redirected to the following page:

Loader authentication screen

Just what we needed! Add the cookie string parameter to the previous request and try again:

[test@mybox ~]$ bulkload_client.py --filename=/tmp/inventory.txt
    --kind=Inventory --url=http://upload-demo.appspot.com/load
    --cookie='ACSID=AJKiYcE[...]1Hh4'

INFO 2008-06-15 07:50:58,541 bulkload_client.py]
    Starting import; maximum 10 entities per post
INFO 2008-06-15 07:50:58,549 bulkload_client.py]
    Importing 3 entities in 29 bytes
INFO 2008-06-15 07:50:59,102 bulkload_client.py]
    Import succcessful

If you visit http://upload-demo.appspot you should now see the data we just uploaded.

Final notes: This simple example should be enough to get you started. When converting real-life databases you will have to deal with more complex schemas with references between tables. The discussion here should point you in the right direction. You may also find the SDK documentation on types and property classes useful when porting your legacy database.

On why having your own hedge fund is a good career choice

April 22, 2008April 22, 2008 Thomas Brox Røst Leave a comment

“If you take big, even reckless, bets and win, you have a great year and you get a great bonus—or in the case of hedge funds, 20% of the profits. If you lose money the following year, you lose your investors’ money rather than your own—and you don’t have to give back last year’s bonus. Heads, you win; tails, you lose someone else’s money.”

(Chaos on Wall Street, explained)

Django on Google App Engine: Templates and static files

April 20, 2008July 21, 2008 Thomas Brox Røst 11 Comments

In a previous tutorial we learned how to set up a simple Django project on the Google App Engine. We also saw how to use the App Engine datastore in place of the Django model API.

Now, let’s have a look at how to integrate Django templates. I will also show you how to serve static files.

Important: Remember to upgrade to the latest version of the App Engine SDK (version 1.0.1 at the time of writing). Otherwise, this tutorial will not work for you if you are developing on Windows.

Step 1: Set up an App Engine project—I am calling mine djangostatic. Follow steps 1 through 7 from the previous tutorial, remembering to substitute the project directory path and project name in main.py and app.yaml, and you will be all set.

Step 2: We will create a simple view that makes use of a template. First, let us define the template. Create a directory where you can store templates:

tmp/djangostatic$ cd djangostatic/main
tmp/djangostatic/djangostatic/main$ mkdir -p templates/main

Then, add the file main.html to your new template directory:

# djangostatic/djangostatic/main/templates/main/main.html

<html>
    <head>
        <link href="/css/main.css" type="text/css"
               rel="stylesheet"></link>
    </head>
    <body>
        <p>
            Hello world!
        </p>
    </body>
</html>

Note that the template refers to a style sheet file, main.css, which we will create later on.

Step 3: Django needs to be told where to search for template files: this is done in the settings.py file. The settings file is mostly pre-configured; we just have to modify the part that sets the TEMPLATE_DIRS variable:

# djangostatic/djangostatic/settings.py

import os
ROOT_PATH = os.path.dirname(__file__)

TEMPLATE_DIRS = (
    ROOT_PATH + "/main/templates",
)

Step 4: After creating the template and telling Django where to find it, we have to write a view that does the actual rendering:

# djangostatic/djangostatic/main/views.py

from django.shortcuts import render_to_response

def main(request):
    return render_to_response("main/main.html")

This tells Django to use the template main/main.html when rendering the response. The render_to_response method is a convenient shortcut for rendering a template and returning a response in one step.

Step 5: Finally, we need to map a URL to our view—this is done in urls.py:

# djangostatic/djangostatic/urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns("",
    (r"^$", "djangostatic.main.views.main"),
)

Start your development server (dev_appserver.py djangostatic), fire up your browser, and open the page at http://127.0.0.1:8080/. If you have done everything right so far, you should get the “hello world” message from the template.

Step 6: So what about the style sheet file, main.css? A style sheet file is a typical example of a static file. We use Django for rendering dynamic pages, so requests for static files should not be handled by the Django engine. In a regular Django application, we usually configure the web server to route such requests to a specific directory. On the App Engine, we achieve the same effect by adding a static handler to app.yaml:

# djangostatic/app.yaml

application: djangostatic
version: 1
runtime: python
api_version: 1

handlers:
- url: /css
  static_dir: media/css
 
- url: /.*
  script: main.py

Here, we have added an entry that routes all requests beginning with /css to the directory media/css. Let us create this directory:

tmp/djangostatic$ mkdir -p media/css

Step 7: The link in our template specified /css/main.css as the full URL, so we have to add the main.css file to our new directory:

# djangostatic/media/css/main.css

p {
    font-size: 48px;
}

Reload the application page; the browser should now be able to make use of the style sheet so that the message is displayed in a larger font. You can view the final results here.

Final notes: To learn more about how to serve static files on App Engine, have a look at the official documentation on how to configure an app. Django templates are very powerful—this tutorial has only shown you the absolute basics. Visit the Django template documentation to get the full story.

On the value of ignorance

April 14, 2008 Thomas Brox Røst Leave a comment

“It is best for the author to be born away from literary centres, or to be excluded from their ruling set if he be born in them. It is best that he starts out with his thinking, not knowing how much has been thought and said about everything.

A certain amount of ignorance will insure his sincerity, will increase his boldness and shelter his genuineness, which is his hope of power.

Not ignorance of life, but life may be learned in any neighborhood;

—not ignorance of the greater laws which govern human affairs, but they may be learned without a library of historians and commentators, by imaginitive sense, by seeing better than by reading;

—not ignorance of the infinitudes of human circumstance, but knowledge of these may come to a man without the intervention of universities;

—not ignorance of one’s self and of one’s neighbor, but innocence of the sophistications of learning, its research without love, its knowledge without inspiration, its method without grace; freedom from its shame at trying to know many things as well as from its pride of trying to know but one thing; ignorance of that faith in small confounding facts which is contempt for large reassuring principles …”

(Woodrow Wilson, “How books become immortal”, 1891)