
Tuesday, August 30, 2011

Testing authenticated handlers in Tornado

Tornado is an excellent little async HTTP framework, and I have been using it in many of the projects I've been part of. The documentation is a bit lacking, but access to the sources makes all the difference.

But say you are implementing handlers that require authentication. You are versed in TDD and you want your handlers tested as well. Here's a little trick that I use to "mock" the authenticated user.

When writing tests, I use the excellent pymox library, and the following example uses it, but you can use any mocking library you prefer.

Here is the handler we want to test:
# encoding: utf-8
__author__ = 'alex'
import tornado.web

class Protected(tornado.web.RequestHandler):
    def get_current_user(self):
        # get a user from somewhere
        return self.retrieve_user_from_db()

    @tornado.web.authenticated
    def get(self):
        self.render("protected-page.html")
We want to test the get() method, but since it's decorated with the authenticated decorator, testing it will result in response code 403. To overcome that (and make our test green!), we'll mock get_current_user() in our test - it's that easy:
# encoding: utf-8
__author__ = 'alex'
import tornado.web
import tornado.testing
import mox

import project.handlers

class TestAuthenticatedHandlers(tornado.testing.AsyncHTTPTestCase):
    def get_app(self):
        app = tornado.web.Application([(r'/protected', project.handlers.Protected)])
        self.mox.StubOutWithMock(tornado.web.RequestHandler, 'get_current_user', use_mock_anything=True)
        tornado.web.RequestHandler.get_current_user().AndReturn("authenticated_user")
        return app

    def setUp(self):
        self.mox = mox.Mox()
        super(TestAuthenticatedHandlers, self).setUp()

    def tearDown(self):
        self.mox.UnsetStubs()
        self.mox.ResetAll()
        super(TestAuthenticatedHandlers, self).tearDown()

    def test_new_admin(self):
        self.mox.ReplayAll()
        resp = self.fetch('/protected')
        self.assertEqual(resp.code, 200)
        self.mox.VerifyAll()
I stubbed out the get_current_user() method on the RequestHandler, but if you have implemented a BaseHandler and extend your handlers from it, you should mock its method instead (it's closer to your code).
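
For example, if your handlers extend a hypothetical project.handlers.BaseHandler, the stubbing in get_app() would look like this sketch instead:
self.mox.StubOutWithMock(project.handlers.BaseHandler, 'get_current_user', use_mock_anything=True)
project.handlers.BaseHandler.get_current_user().AndReturn("authenticated_user")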

I mock objects extensively in my tests - it is a great practice that helps you focus on the actual code you're testing. Hopefully this little trick will help you as well.

Monday, June 29, 2009

TinyURLs reloaded, now with Python3

A few days ago, the very smart and great people behind the Python project released the stable version of Python 3.1, which brings a bunch of improvements (including performance ones) and new features. You can check what's new in this release right here, if you wish.

I was quite impressed by the speed improvements over Python 3.0.1. It's quite fast, comparable to Python 2.6 (which is pretty damn fast). Running a few of my scripts with the new version (to test speed and conformance), I noticed that my TinyURL example no longer works.

string.letters is gone, and print is (as mentioned before) no longer a statement; it's a function. The updated function now looks like this:

import random
import string

def short_id(num):
    return "".join(random.sample(string.digits + string.ascii_letters, num))

print(short_id(6))


The only noticeable change in the function itself is the use of string.ascii_letters instead of string.letters. There you have it - now Py3 compliant! :)

Thursday, October 2, 2008

Python 2.6 final

It's here: the final version of Python 2.6 has been released. This is the last 2.x release of Python before the almighty Py3K. You can read what's new in Python 2.6 and, of course, grab your own copy.

I'm fetching the installer now, eager to see if it will work with PyObjC on my Mac. Next on the list is installing Stackless.

Wednesday, September 17, 2008

World is concurrent

For the last few years or so, in my free time, I have been coding in Stackless Python. I use Python almost exclusively for my projects, so it comes naturally to use Stackless for concurrency. Concurrent programming is very interesting and challenging, and Stackless Python makes it (very) bearable and easy.

Stackless Python introduces the concept of microthreads: tasklets wrap functions, allowing them to be launched as microthreads. Scheduling is built in and can be either cooperative or preemptive. Finally, there are channels, which are used for communication between tasklets. A channel blocks a sending tasklet until a receiver is waiting, and blocks a receiving tasklet until a sender is waiting. Another interesting thing is that tasklets can be serialized (pickled) to disk (or any other storage medium) and deserialized later to be resumed.
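
Here is a minimal sketch of those pieces working together - two tasklets exchanging a message over a channel (assuming Stackless Python is installed):
import stackless

def sender(ch):
    ch.send("ping")     # blocks until some tasklet is ready to receive

def receiver(ch):
    print ch.receive()  # blocks until some tasklet is ready to send

ch = stackless.channel()
stackless.tasklet(sender)(ch)
stackless.tasklet(receiver)(ch)
stackless.run()         # run the built-in scheduler until all tasklets finish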

Using this functionality, provided by Stackless through a single module, is very easy and intuitive. It keeps the Python code very readable and understandable, and it even improves the structure of the program. Common usage patterns are available from the authors of Stackless to help people new to concurrent programming understand how Stackless is used. It allows creating custom tasklet functionality (e.g. named tasklets) as well as custom channel functionality (e.g. broadcast channels, sending messages with a timeout, etc.).

As I mentioned, scheduling is built in, and it is up to the programmer to choose the type: preemptive or cooperative. With cooperative scheduling, one has to be careful to write code so that tasklets actually cooperate. During the design and implementation of tasklets, the programmer should remember to run the scheduler manually whenever an operation within a tasklet might starve the other tasklets of a chance to run. With preemptive scheduling, on the other hand, the scheduler itself is configured to interrupt and reschedule running tasklets. I find cooperative scheduling more useful in my implementations, since it gives more control. Under preemptive scheduling you can, however, kindly ask the scheduler not to put your running code back into the scheduling queue. One useful idiom is the sleeping tasklet, which blocks a tasklet for a certain period of time; interestingly, that idiom uses channels to accomplish this.
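
As a small illustration, here is a sketch of cooperative scheduling, where each tasklet explicitly yields control with stackless.schedule():
import stackless

def worker(name):
    for i in range(3):
        print name, i
        stackless.schedule()  # voluntarily give the other tasklets a turn

stackless.tasklet(worker)("first")
stackless.tasklet(worker)("second")
stackless.run()               # the workers' output interleaves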

Documentation for Stackless is available on the website and covers the basic functionality; it also offers examples and common patterns (idioms). The module itself is very well documented and can be explored in the Python interactive shell by typing help(stackless). The community is active on the mailing list, where help is always available. Every now and then there's a good discussion about advanced usages of Stackless, and I highly recommend subscribing to the list.

Issues exist, however. The current CPython implementation suffers from the infamous GIL (Global Interpreter Lock), which makes it difficult for Python to fully utilize multicore systems (almost every recent computer nowadays). For those who don't know its effects: the GIL is a global lock that keeps multiple threads from running in the interpreter, and thus from modifying the same object, at the same time. Only the thread holding the lock may execute Python bytecode, and the interpreter itself controls acquiring and releasing the lock, releasing it around potentially slow operations (such as I/O).
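
A small sketch of that effect: splitting a CPU-bound loop across two threads is no faster on CPython, because only the thread holding the GIL executes bytecode at any given moment:
import time
import threading

def count(n):
    while n:
        n -= 1

start = time.time()
count(20000000)                # all of the work in a single thread
print "one thread: ", time.time() - start

start = time.time()
threads = [threading.Thread(target=count, args=(10000000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print "two threads:", time.time() - start  # no faster, often slower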

There has been quite a lot of talk about the GIL and whether or not it should be removed from Python. Back in '99, a few brave souls (Greg Stein and Mark Hammond) tried it: they removed the GIL from the Python code, using locks on all mutable structures. According to benchmarks, this actually slowed down execution: single-threaded code ran about twice as slowly as it did with the GIL, so even on multi-CPU (multi-core) systems there would be no actual performance gain from removing it.

To truly distribute the code between CPUs, the solution is to run several Python processes and communicate tasklets between them; however, that can get (very) complicated, to say the least. Luckily, Python 2.6 (the next major release of Python, with the final release just around the corner) comes with the multiprocessing module. This module supports spawning processes in a fashion similar to how the threading module is used. Moreover, since the module follows the same API as the threading module, it makes refactoring projects that use threading a breeze. Process objects can communicate with each other using Queues or Pipes, the latter being bidirectional (each party holds one end of the pipe, with send() and recv() methods available for sending and receiving).
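
A minimal sketch of two processes talking over a Pipe:
from multiprocessing import Process, Pipe

def worker(conn):
    n = conn.recv()       # receive work over our end of the pipe
    conn.send(n * n)      # send the result back the other way
    conn.close()

if __name__ == '__main__':
    parent_end, child_end = Pipe()  # bidirectional by default
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(7)
    print parent_end.recv()         # prints 49
    p.join()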

Using the multiprocessing module, you can even distribute your processes remotely, and on top of that, it is possible to create a pool of workers for running tasks. Finally, since the module uses subprocesses instead of threads, the effects of the aforementioned GIL can be circumvented.
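
And a sketch of the worker pool, spreading a function over four subprocesses:
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == '__main__':
    pool = Pool(processes=4)           # four worker subprocesses
    print pool.map(square, range(10))  # the work is split across the pool
    pool.close()
    pool.join()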

Stackless by design (because of the scheduler) does not allow tasklets to access (and modify) data structures at the same time. Moreover, in true concurrency fashion, it uses channels for passing data between tasklets. It still has a notion of shared state between tasklets, but without the usual dangers. With multiprocessing on Python 2.6, Stackless Python programs will scale much better than they currently do, utilizing multi-CPU and multi-core environments more efficiently. That, however, remains to be seen, as the port of Stackless to Python 2.6 is a work in progress.

Enter Erlang! For those of you who don't know what it is, Erlang is a functional programming language (an entirely different approach to programming compared to, e.g., OOP) designed at Ericsson for the purpose of developing concurrent, distributed, fault-tolerant and (soft) real-time applications.

Being functional, it has no concept of mutable state: variables are assigned only once (just as they taught you in math classes), and it has dynamic typing, pattern matching of functions (with guards), etc. On top of that, it has extremely lightweight processes with no shared memory; processes pass messages around to communicate among themselves. The Erlang runtime supports a very high number of concurrent processes, which are abstracted from the operating system. It also supports dynamic distribution of processes, both locally (over multiple CPUs or cores) and remotely (over the network to another Erlang runtime node). Thus, you can build large-scale distributed applications that run on machines with many CPUs and on many machines in the network. Since there is no shared state and there are no shared variables, all the traditional problems related to concurrency simply disappear, as the need for locks is removed.

Erlang does give a lot of people headaches because of its syntax. Comparing it with Python's, I can say (being biased and all) that I definitely prefer Python's. On the other hand, I quite like the syntax of Erlang, too. It looked quite bizarre at first, but going through the documentation and examples helped me understand the basics. The Erlang book I purchased has just been delivered, and I am very excited to learn more about the language. As I have always been more interested in building backend applications, Erlang seems like a good choice worth learning more about.

As a follow-up, the next article on this topic will be sprinkled with some code examples, both in Stackless Python and in Erlang. I really like learning new programming languages and frameworks by implementing something useful with them, so I will try to do the same with Erlang. Some ideas are popping into mind, and in the next articles I might elaborate more on those, too.

Thursday, July 31, 2008

Python, the evolution

I just stumbled upon a project called code_swarm, which is an experiment in organic software visualisation. The project takes information from a source control system and creates a visual representation of the history of code commits. The end result looks quite awesome. Below is the code_swarm video of the Python source code and its evolution.

[Embedded video: code_swarm visualization of the Python source tree]

The project's source code is available here, free of charge, of course. It was created by Michael Ogawa using the Processing environment. Great stuff!

Friday, July 11, 2008

Protocol Buffers

Google has decided to open up their tool for serializing structured data, called Protocol Buffers. It's a language- and platform-neutral way of communicating data over networks or serializing it for storage. An interesting bit is that Google uses it in almost all of their projects.

On the surface, it reminds me a lot of CORBA, especially the way you define message structures, but it differs a lot in terms of message exchange and serialization - you can store and/or communicate your data structures across the network by any means, unlike CORBA, where you're forced to use CORBA message brokers. In my opinion, that's the main reason why CORBA was never so widely accepted.

How does it all work? First, you define your structure using a DSL and store it in a .proto file. You then compile that file with a tool to create data access classes for your language of choice. Classes can be generated for C++, Java and Python - my choice would be, as always, Python. Those generated classes are then used to create, populate, serialize and retrieve your protocol buffer messages.
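
As a rough sketch of that flow (the message definition and file names here are hypothetical):
# addressbook.proto, compiled with: protoc --python_out=. addressbook.proto
#
#   message Person {
#     required string name  = 1;
#     required int32  id    = 2;
#     optional string email = 3;
#   }

import addressbook_pb2

person = addressbook_pb2.Person()
person.name = "Alice"
person.id = 123

data = person.SerializeToString()  # serialize to the compact binary format

other = addressbook_pb2.Person()
other.ParseFromString(data)        # parse it back on the receiving side
print other.name                   # "Alice"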

The messages are very flexible - you can add new fields without breaking old code that uses them; old code will simply ignore the new fields. That comes in very handy, especially for larger systems (think versioning and deployment).

Yes, but what about XML, you might ask? According to Google, PBs have many advantages over XML; they are:
  • simpler
  • 3 to 10 times smaller
  • 20 to 100 times faster
  • less ambiguous
  • generate data access classes that are easier to use programmatically

I might disagree with the last one (think of JAXB in the Java world), but I completely agree with the others - it's quite true that XML tends to be cumbersome and overkill, especially in environments where size and speed matter.

For the end, I saved a very interesting quote from Google's PB pages:
Protocol buffers are now Google's lingua franca for data - at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

Looks very interesting and very promising, considering Google is behind it. I'll follow up this article with some neat examples.

Wednesday, February 20, 2008

Jython development gaining momentum

Jython developers gathered last Sunday in San Francisco and held a sprint to work on the next major release. This will bring the Jython implementation of Python on par with the CPython implementation, which is now at version 2.5.1 (with a 2.5.2 release candidate 1 released for public testing just a few days ago). The San Francisco sprint focused on the roadmap for Jython 2.5.

There is not much information available regarding the outcome of the sprint, but I suppose a lot of good work came out of it. Hopefully we will see a stable release soon. With Groovy and JRuby already having a steady (and growing) number of followers, Jython needs a jolt.