Wednesday, September 17, 2008

The world is concurrent

For the last few years or so, in my free time, I have been coding in Stackless Python. I use Python almost exclusively for my projects, so it comes naturally to use Stackless for concurrency. Concurrent programming is very interesting and challenging, and Stackless Python makes it (very) bearable and easy.

Stackless Python introduces the concept of microthreads: tasklets wrap functions, allowing them to be launched as microthreads. Scheduling is built in, and it can be either cooperative or preemptive. Finally, there are channels, which are used for communication between tasklets. A channel blocks a sending tasklet until another tasklet is waiting to receive, and blocks a receiving tasklet until another is waiting to send. Another interesting thing is that tasklets can be serialized (pickled) to disk (or any other storage medium) and deserialized later on to be resumed.
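To give a taste of how this looks, here is a minimal sketch of two tasklets talking over a channel (the sender and receiver functions are just placeholders; it needs the Stackless interpreter, since plain CPython has no stackless module):

```python
import stackless

def sender(channel):
    # send() blocks until some tasklet is waiting to receive
    for word in ("hello", "world"):
        channel.send(word)

def receiver(channel):
    # receive() blocks until some tasklet is waiting to send
    for _ in range(2):
        print channel.receive()

ch = stackless.channel()
stackless.tasklet(sender)(ch)    # wrap the functions as tasklets
stackless.tasklet(receiver)(ch)
stackless.run()                  # run the cooperative scheduler
```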

Using this functionality, provided by Stackless through a single module, is very easy and intuitive. It keeps the Python code very readable and understandable, and it even improves the structure of the program. Common usage patterns are available from the authors of Stackless to help people new to concurrent programming understand how Stackless is used in practice. They show how to build custom tasklet functionality (e.g. named tasklets) as well as custom channel functionality (e.g. broadcast channels, sending messages with a timeout, etc.).
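As an example of the first kind, the named-tasklet idiom boils down to subclassing stackless.tasklet; a sketch along the lines of the published idioms:

```python
import stackless

class NamedTasklet(stackless.tasklet):
    # a tasklet that carries a human-readable name, handy when debugging
    __slots__ = ["name"]

    def __init__(self, func, name=None):
        stackless.tasklet.__init__(self, func)
        self.name = name or ("at %08x" % id(self))

    def __repr__(self):
        return "<tasklet %s>" % self.name

def work():
    print "%r is running" % stackless.getcurrent()

NamedTasklet(work, "worker-1")()   # bind arguments and schedule
stackless.run()
```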

As I mentioned, scheduling is built in, and it is up to the programmer to choose the type: preemptive or cooperative. With cooperative scheduling one has to be careful to write code so that the tasklets actually cooperate. While designing and implementing tasklets, the programmer should take care to invoke the scheduler manually whenever an operation inside a tasklet might starve the other tasklets of a chance to run. With preemptive scheduling, on the other hand, the scheduler itself is configured to interrupt and reschedule running tasklets. I found cooperative scheduling more useful in my implementations, since it gives more control. Even under preemptive scheduling, however, you can kindly ask the scheduler not to put your running code back into the scheduling queue. One useful idiom is the sleeping tasklet, which blocks a tasklet for a certain period of time. Interestingly, that idiom uses channels to accomplish this, as the sketch below shows.
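The details vary between published versions of the idiom; this minimal sketch keeps a sorted list of wake-up times and a manager tasklet that shuts itself down once everyone is awake:

```python
import stackless
import time

sleeping = []   # (wake_time, channel) pairs, kept sorted by wake time

def sleep(seconds):
    # block the calling tasklet on a fresh channel; the manager
    # tasklet sends on it when the time is up
    ch = stackless.channel()
    sleeping.append((time.time() + seconds, ch))
    sleeping.sort()
    ch.receive()

def sleep_manager():
    while not sleeping:
        stackless.schedule()        # wait for the first sleeper
    while sleeping:
        now = time.time()
        while sleeping and sleeping[0][0] <= now:
            _, ch = sleeping.pop(0)
            ch.send(None)           # wake the sleeper up
        stackless.schedule()        # cooperate: let other tasklets run

def worker():
    print "going to sleep"
    sleep(1.0)
    print "woke up"

stackless.tasklet(sleep_manager)()
stackless.tasklet(worker)()
stackless.run()
```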

The Stackless documentation is available on the website, and it covers the basic functionality; it also offers examples and common patterns (idioms). The module itself is very well documented and available in the Python interactive shell by typing help(stackless). The community is active on the mailing list, where help is always available. Every now and then there's a good discussion about advanced uses of Stackless, and I highly recommend subscribing to the list.

Issues exist, however. The current CPython implementation suffers from the infamous GIL (Global Interpreter Lock), which makes it difficult for Python to fully utilize multicore systems (that is, almost every computer sold nowadays). For those who don't know the effects of the GIL: it is a single, interpreter-wide lock that keeps multiple threads from executing Python bytecode (and thus from modifying the same objects) at the same time. Only the thread holding the lock may run, and the interpreter controls the acquiring and releasing of the lock, releasing it mainly around potentially slow operations (such as I/O).
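The practical effect is easy to demonstrate with a rough sketch like the one below: a CPU-bound function run in two threads takes about as long as (often longer than) running it twice serially, because only one thread executes bytecode at a time. The actual timings depend on the machine; the point is only that the threaded version is not faster.

```python
import threading
import time

def count(n):
    # a purely CPU-bound loop; never releases the GIL for long
    while n:
        n -= 1

N = 10000000

start = time.time()
count(N); count(N)                      # serial: one after the other
print "serial:   %.2fs" % (time.time() - start)

start = time.time()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print "threaded: %.2fs" % (time.time() - start)
```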

There has been quite a lot of talk about the GIL and whether or not it should be removed from Python. Back in '99, a few brave souls (Greg Stein and Mark Hammond) tried, and removed the GIL from the Python code by putting locks on all mutable structures. According to benchmarks, this actually slowed the execution of code: the results showed that execution was roughly twice as slow in a single-threaded environment as with the GIL. Even on multi-CPU (multi-core) systems, there would be no actual performance gain from removing the GIL at that cost.

To truly distribute code between CPUs, the solution is to run several Python processes and pass tasklets between them; however, that can get (very) complicated, to say the least. Luckily, Python 2.6 (the next major release of Python, with the final release just around the corner) comes with the multiprocessing module. This module supports spawning processes in a fashion similar to the threading module. Better yet, since it follows the same API as the threading module, refactoring projects that already use threading is a breeze. Process objects can communicate with each other using Queues or Pipes, the latter being bidirectional (each party holds one end of the pipe, with send() and recv() methods available for sending and receiving).
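A minimal sketch of what this looks like (Process and Pipe are the real multiprocessing names; the worker function itself is just a placeholder):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()              # blocks until the parent sends
    conn.send("got: " + msg)       # pipes are bidirectional
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("hello")
    print parent_end.recv()        # -> "got: hello"
    p.join()
```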

Using the multiprocessing module you can even distribute your processes across remote machines, and on top of that it is possible to create a pool of workers for running tasks. Finally, since the module uses subprocesses instead of threads, the effects of the aforementioned GIL can be circumvented.
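The pool of workers takes only a few lines; a sketch (the square function is a stand-in for real work):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)           # four worker subprocesses
    print pool.map(square, range(10))  # fans out across the workers
    pool.close()
    pool.join()
```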

Stackless, by design (because of the scheduler), does not allow tasklets to access (and modify) data structures at the same time. Moreover, in true concurrency fashion, it uses channels for passing data between tasklets. It still has a notion of shared state between tasklets, but without the usual dangers. With multiprocessing on Python 2.6, Stackless Python programs should become much more scalable than they are today, utilizing multi-CPU and multi-core environments more efficiently. That, however, remains to be seen, as the port of Stackless to Python 2.6 is a work in progress.

Enter Erlang! For those of you who don't know what it is, Erlang is a functional programming language (an entirely different approach to programming compared to, e.g., OOP) designed at Ericsson for the purpose of developing concurrent, distributed, fault-tolerant and (soft) real-time applications.

Being functional, it has no concept of mutable state: variables are assigned exactly once (just as they taught you in math class). It also has dynamic typing, pattern matching on function arguments (with guards), etc. On top of that, it has extremely lightweight processes with no shared memory; processes communicate by passing messages around. The Erlang runtime supports a very high number of concurrent processes, which are abstracted away from the operating system. It also supports dynamic distribution of processes, both locally (over multiple CPUs or cores) and remotely (over the network, to another Erlang runtime node). Thus, you can build large-scale distributed applications that run on machines with many CPUs and on many machines in the network. Since there is no shared state and there are no shared variables, all the traditional problems related to concurrency simply disappear, as the need for locks is removed.

Erlang does give a lot of people a headache because of its syntax. Comparing it with Python's, I can say (being biased and all) that I definitely prefer Python's. On the other hand, I quite like the syntax of Erlang, too. It looked quite bizarre at first, but going through the documentation and examples helped me understand the basics of it. The Erlang book I purchased has just been delivered, and I am very excited to learn more about the language. As I have always been more interested in building backend applications, Erlang seems like a good choice worth learning more about.

As a follow-up, the next article on this topic will be sprinkled with some code examples, both in Stackless Python and in Erlang. I really like learning new programming languages and frameworks by implementing something useful with them, so I will try to do the same with Erlang. Some ideas are already popping into my mind, and I might elaborate on them in upcoming articles, too.