Archive for the ‘python’ Category

Introducing pupyMPI

Monday, January 25th, 2010

Jan, Frederik and I just released the first – somewhat stable – version of pupyMPI, a 100% pure Python implementation of the MPI standard. Or, as close to the standard as we saw fit.

Most Python MPI projects are bindings to some C implementation, which gives them a lot of strengths. They run very fast, for one thing; so fast that you can actually use them in real production if you want. In our opinion that is not very useful, as most real applications will depend on some further systems and most real clusters will probably only allow you to run your C and FORTRAN stuff. The bindings do, however, give developers a nice way to build rapid prototypes and to learn and play with MPI. pupyMPI boosts all three while keeping the system so close to regular MPI that you can probably convert to regular MPI if you need the performance.

A quick example

Just to show how fast you can actually implement programs in pupyMPI, here is a quick example of a distributed program. It does a Monte Carlo pi simulation with a user-defined number of simulations:

#!/usr/bin/python

from mpi import MPI
import sys
from random import random
from math import sqrt

mpi = MPI()
world = mpi.MPI_COMM_WORLD
rank = world.rank()

try:
    simulations = int(sys.argv[1])
except (IndexError, ValueError):
    simulations = 1000000

per_rank = simulations / world.size()

hits = 0

for i in xrange(per_rank):
    # Simulate a point landing inside the unit circle.
    x = random()
    y = random()

    if sqrt(x*x+y*y) < 1.0:
        hits+=1

# Gather the sum of hits from each process at the process
# with rank 0
total_hits = world.reduce(hits, sum, root=0)

if rank == 0:
    pi = float(total_hits)*4 / simulations
    print "Estimating PI on %d nodes through %d simulations yield %f" \
         % (world.size(), simulations, pi)

mpi.finalize()

The output of several runs is given below:

$ mpirun.py -c 2 monte_carlo_pi.py -- 1000
Estimating PI on 2 nodes through 1000 simulations yield 3.184000

$ mpirun.py -c 4 monte_carlo_pi.py -- 10000
Estimating PI on 4 nodes through 10000 simulations yield 3.135200

$ mpirun.py -c 8 monte_carlo_pi.py -- 100000
Estimating PI on 8 nodes through 100000 simulations yield 3.136560

$ mpirun.py -c 10 monte_carlo_pi.py -- 1000000
Estimating PI on 10 nodes through 1000000 simulations yield 3.144464

$ mpirun.py -c 10 monte_carlo_pi.py -- 10000000
Estimating PI on 10 nodes through 10000000 simulations yield 3.141069

$ mpirun.py -c 10 monte_carlo_pi.py -- 100000000
Estimating PI on 10 nodes through 100000000 simulations yield 3.141677

The above example has very little communication involved other than the final exchange of hits. We implemented a lot of different communication operations, as you can see in the online documentation.
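To see what that final exchange computes without starting several processes, here is a minimal single-process sketch. It does not use pupyMPI at all: the "ranks" are just loop iterations, and the names `count_hits` and `estimate_pi` as well as the per-rank seeding scheme are assumptions made for illustration. Unlike the listing above, it divides by the number of points actually drawn, which only matters when the number of simulations is not a multiple of the number of ranks.

```python
from random import Random


def count_hits(samples, rng):
    """Count random points in the unit square that land inside the unit circle."""
    hits = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits


def estimate_pi(simulations, size, seed=0):
    """Mimic `size` ranks in one process and combine their hit counts."""
    per_rank = simulations // size
    # One independent generator per pretend rank (hypothetical seeding scheme).
    per_rank_hits = [count_hits(per_rank, Random(seed + rank))
                     for rank in range(size)]
    # This sum is the value that reduce(hits, sum, root=0) hands to rank 0.
    total_hits = sum(per_rank_hits)
    # Divide by the points actually drawn, not by the requested total.
    return 4.0 * total_hits / (per_rank * size)
```

Calling `estimate_pi(100000, 4)` should give a value close to 3.14. The per-rank loop is the part that pupyMPI spreads over real processes.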

Performance

Don't expect much: a pupyMPI program will normally run 15-20 times slower than the C equivalent. But hopefully it will prove a fine educational tool and maybe also be used for fast prototyping. If we get the time (and credit at school) we'll performance tune it to get within a factor of 10 of the C version.

How to get custom GET variables into Django's admin list pages

Tuesday, May 12th, 2009

Yesterday I ranted a bit about how to insert extra GET parameters in the Django change list page without having the admin blow up when it was not able to recognise them as filters. I’m using this for building a custom menu that uses GET parameters to fold / unfold a submenu. The menu is context aware, so I can’t do this by looking up the path, i.e. several submenu points can link to /admin/auth/user/, usually just with different filter parameters.

I haven’t been able to find a nice solution for this, but here is a dirty one. As always, I find myself using middleware when something takes a nasty turn. The idea is to put the information in GET parameters, but use a middleware to find the special parameters and store them on the request object. A context processor can then later hand the variables to the template. I’m controlling which parameters we should fetch with a simple settings tuple:

#!/usr/bin/python
ADMIN_GET_VARS = (
    ('submenu', None),
)

In my case I have only defined the submenu parameter, but you could take as many as you like. The second element in the tuple (the None after “submenu”) is a default value that will be used if the GET parameter is not present. The middleware looks like this:

#!/usr/bin/python

from django.conf import settings

class ForwardGetMiddleware:
    def process_request(self, request):
        new_get = request.GET.copy()
        custom_vars = {}
        if hasattr(settings, "ADMIN_GET_VARS"):
            for name, default in settings.ADMIN_GET_VARS:
                custom_vars[name] = new_get.get(name, default)
                try:
                    del new_get[name]
                except KeyError:
                    pass
        request.ADMIN_GET_VARS = custom_vars
        request.GET = new_get

Here is something important. We need to remove the GET parameter from request.GET, as it will otherwise make the admin blow up. It’s therefore important that you don’t use parameter names that will clash with the admin. I’m considering introducing a prefix on every parameter to leave the admin alone. The context processor is very simple due to the way we built the middleware:

#!/usr/bin/python

def admin_get_vars(request):
    return getattr(request, "ADMIN_GET_VARS", {} )
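Both pieces still have to be registered in the project settings before they do anything. The fragment below is a sketch of how that might look with the settings names Django used at the time of writing (`MIDDLEWARE_CLASSES` and `TEMPLATE_CONTEXT_PROCESSORS`). The dotted paths under `myproject` are hypothetical; use wherever you actually saved the middleware and the context processor, and keep the entries your project already lists.

```python
# settings.py (fragment). The "myproject" module paths are assumptions.

ADMIN_GET_VARS = (
    ('submenu', None),
)

MIDDLEWARE_CLASSES = (
    # ... the middleware your project already uses ...
    'myproject.middleware.ForwardGetMiddleware',
)

TEMPLATE_CONTEXT_PROCESSORS = (
    # ... the context processors your project already uses ...
    'myproject.context_processors.admin_get_vars',
)
```

With this in place, a template rendered with a request context can refer to `submenu` directly.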

So there it is in a very basic stage. It’s possible to introduce more advanced settings to enable finer control over when this functionality kicks in, but it’s already a hack, so I don’t think there is any need for it. Also, one might consider limiting this to specific URLs to improve performance and reduce the number of side effects it might have.
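The two ideas mentioned above, a prefix on every custom parameter and a restriction to specific URLs, could be combined roughly as follows. This is only a sketch on plain dictionaries, not the code I run: the function name `split_admin_vars`, the `_x_` prefix and the `/admin/` root are all made up for the example, and it drops the per-parameter defaults from `ADMIN_GET_VARS`. A real middleware would have to apply the same logic to a copy of `request.GET`.

```python
def split_admin_vars(params, path, prefix="_x_", admin_root="/admin/"):
    """Return (custom_vars, remaining_params) for one request.

    `params` is a plain dict of GET parameters. `prefix` and `admin_root`
    are hypothetical values chosen for this sketch.
    """
    remaining = dict(params)
    custom = {}
    # Leave requests outside the admin untouched.
    if not path.startswith(admin_root):
        return custom, remaining
    for key in list(remaining):
        if key.startswith(prefix):
            # Strip the prefix so templates see plain names such as "submenu".
            custom[key[len(prefix):]] = remaining.pop(key)
    return custom, remaining


custom, rest = split_admin_vars(
    {"_x_submenu": "users", "is_staff__exact": "1"},
    "/admin/auth/user/",
)
# custom is {"submenu": "users"}; rest is {"is_staff__exact": "1"}
```

Because only prefixed names are removed, the admin's own filter parameters pass through unchanged.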

A small rant about the django admin!

Monday, May 11th, 2009

I’m currently working on a project involving a huge number of applications and models, all of which should of course be managed by the administration. The deadline is of course way too near to write the thing from scratch, and that would also be a waste of time. Instead I’m customising the django.contrib.admin site, which is working very well most of the time. Normally, when it doesn’t, I can look at the code and accept that the admin is not built to handle everything. What I can’t understand, however, is why the list pages try to handle all the GET parameters as filters and return an error if they can’t handle them.

From my point of view it could be a nice way to send different information to specific pages and hand it to my base_site.html (or whatever) through a context processor. As far as I can see there is no other easy way to do this. Please, Django people, why have you marked this as “design decision needed”??!

Levenshtein Distance

Monday, February 23rd, 2009

I wrote a little Python function to calculate the Levenshtein distance for an assignment. I figured it had been some time since I released code on my blog, so here goes. (more…)
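The function from the full post is not reproduced in this archive listing. As a stand-in, here is a minimal sketch of the textbook dynamic-programming approach to the Levenshtein distance, keeping only two rows of the table in memory. It is not necessarily how the original function was written.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3: two substitutions and one insertion.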

Python, now extra cool with vim..

Monday, September 15th, 2008

I’ll give credit to Mads for this one, as he was so kind as to throw an email with this link to me earlier today. It shows how much you can actually do with vim and how powerful the configuration possibilities are.

This is a good step towards the holy and great .vimrc file I had some years ago but lost due to my own foolishness and an rm command :)