Thursday, July 31, 2008

Python, the evolution

I just stumbled upon a project called code_swarm, which is an experiment in organic software visualisation. The project takes information from a source control system and creates a visual representation of the history of code commits. The end result looks quite awesome. Below is the code_swarm video of Python source code and its evolution.



The project's source code is available here, free of charge, of course. It was created by Michael Ogawa, using the Processing environment. Great stuff!

Friday, July 11, 2008

Protocol Buffers

Google decided to open-source their tool for serializing structured data, called Protocol Buffers. It's a language- and platform-neutral way of communicating data over networks or serializing it for storage. The interesting bit is that Google is using it in almost all of their projects.

On the surface, it reminds me a lot of CORBA, especially in the way you define message structures, but it differs a lot in terms of message exchange and serialization - you can store your data structure or send it across the network by any means, unlike CORBA, where you're forced to use CORBA message brokers. In my opinion, that's the main reason why CORBA was never widely accepted.

How does it all work? First of all, you define your structure using a DSL and store it in a .proto file. You then compile that file using a tool and create data access classes for your language of choice. Classes can be generated for C++, Java and Python - my choice would be, as always, Python. Those generated classes are then used to create, populate, serialize and retrieve your protocol buffers messages.

The messages are very flexible - you can add new fields to your messages without breaking old code that uses them; the old code will simply ignore the new fields. That functionality comes in very handy, especially for larger systems (think versioning and deployment).
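To make the "old code ignores new fields" idea concrete, here is a minimal sketch in plain Python. It does not use the Protocol Buffers library, and the byte layout (one tag byte, one length byte, then the payload) is invented for illustration - it is not the real wire format. The only thing it shares with the real thing is that fields are identified by number. The names encode, decode, OLD_SCHEMA and NEW_SCHEMA are all hypothetical.

```python
# Toy tag-length-value encoding; NOT the real Protocol Buffers wire format.


def encode(fields):
    """Serialize {tag: bytes} into (tag, length, payload) triples."""
    out = bytearray()
    for tag, payload in fields.items():
        if not (0 < tag < 256 and len(payload) < 256):
            raise ValueError("toy format: tag and length must fit in one byte")
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)


def decode(data, known_tags):
    """Parse the triples, keeping only the tags this reader knows about."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        payload = data[pos + 2 : pos + 2 + length]
        pos += 2 + length
        if tag in known_tags:
            result[known_tags[tag]] = payload.decode("utf-8")
        # Unknown tags are skipped: the length byte says how far to jump.
    return result


# The "old" reader was written before the email field existed.
OLD_SCHEMA = {1: "name", 2: "city"}
NEW_SCHEMA = {1: "name", 2: "city", 3: "email"}

new_message = encode({1: b"Ada", 2: b"London", 3: b"ada@example.org"})
old_view = decode(new_message, OLD_SCHEMA)  # reads name and city only
new_view = decode(new_message, NEW_SCHEMA)  # reads all three fields
```

The old reader never fails on the message written by newer code; it just ends up without the email field, which is the behaviour that makes rolling out a new message version across a large system less painful.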

Yes, but what about XML, you might ask? According to Google, Protocol Buffers have many advantages over XML. They:
  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

I might disagree with the last one (think of JAXB in the Java world), but I completely agree with the other ones - it's quite true that XML tends to be cumbersome and a big overkill, especially in environments where size and speed do matter.

To finish, I saved a very interesting quote from Google's Protocol Buffers pages:
Protocol buffers are now Google's lingua franca for data - at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.

Looks very interesting and very promising, considering Google is behind it. I'll follow up this article with some neat examples.

Tuesday, July 1, 2008

Back at work after a long vacation

I was away on vacation for 7 weeks. Now I'm back at work. Tough, I must say. My mailbox is a total mess: loads of new mail, meeting invitations, updates and changes, etc. Then an idea hit me! Our corporate email suite comes with an Out of Office assistant, which helps you (and your coworkers) manage email while you're away on vacation (sending out auto-responders and such). It also has a very nice feature that reports who emailed you while you were away. I am still going through the emails I received, totally stressed about missing something important. I'm skimming through as fast as I can and it seems there's no end to it. I'm quite sure I missed something very important, although I'm trying my best to mark messages that I need to check again later. That means, of course, that I'll be reading my emails twice.

Now, the idea - wouldn't it be great to have a feature which would, upon opening your email suite after a long vacation, populate your inbox little by little? Let's say you have 120 messages which were sent to you while you were on vacation. The suite would send you a dozen messages every 10 minutes, so you can slowly read them and catch up. Preferably over a good cup of coffee. No stress, no worries, you can read your mail at a normal rate, as you would if you hadn't gone on vacation. I'm sure it would take some people a few days to catch up, but it sure beats reading 500+ emails in one hour.
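Just to show how little logic the feature would need, here is a rough sketch of the scheduling part in Python. It only works out which messages would be released when; it doesn't talk to any real mail server, and the drip_schedule name and its parameters are made up for this example.

```python
from datetime import datetime, timedelta


def drip_schedule(messages, start, batch_size=12, interval=timedelta(minutes=10)):
    """Yield (release_time, batch) pairs, one batch per interval."""
    for i in range(0, len(messages), batch_size):
        batch_number = i // batch_size
        yield start + interval * batch_number, messages[i : i + batch_size]


# 120 messages piled up during the vacation, released from 9:00 onwards.
backlog = [f"message {n}" for n in range(1, 121)]
schedule = list(drip_schedule(backlog, datetime(2008, 7, 1, 9, 0)))

for release_time, batch in schedule:
    print(release_time.strftime("%H:%M"), len(batch), "messages")
```

With a dozen messages every 10 minutes, the 120-message backlog goes out in ten batches, the last one an hour and a half after the first.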

Back to reading emails.