technology from back to front

Tools: Debian build-depends metapackages

When I’m doing development on an existing software project, and especially when I’m trying to fix a bug in a Debian package, I find that I install random packages I need to rebuild something, and then later on I wonder why I’ve got those installed. I try to keep to the philosophy that the bits of software I’m using are pretty knowledgeable about whatever they’re intended to do, and so trusting them to make smart decisions is a good idea. For package managers, this means I should only keep track of the software that I actually use, and tell the package manager that everything else has been automatically installed and can therefore be removed when it ceases to be a dependency of something I actually need. Having packages that I only installed for building something else, without a record of that in the package management system, breaks that mental model.

Enter dh-builddep-metapackage (the naming is in line with the names used by the debhelper scripts used for other Debian packaging stuff). dh-builddep-metapackage builds build-dependency metapackages in order to ease package management for package rebuilders. In effect, it builds a “<package name>-builddep” package that has no content, but depends on everything that the existing package build-depends on. By using dh-builddep-metapackage to create metapackages rather than using “apt-get build-dep”, I keep a record in the package management system of why I need a particular development package, and can remove the dependent packages when I’m no longer working with the relevant source package.

I think someone’s done this before, but some work with Google didn’t find it, and it was an interesting exercise anyway. Standard usage is “dh-builddep-metapackage -b <package name>”, which will create the metapackage data and build the package for you using dpkg-buildpackage. A folder called “<package name>-<package version>” will be created in the current directory, and if that folder already exists then dh-builddep-metapackage will refuse to overwrite it (unless you give the -o/--overwrite option).
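
To make the idea concrete, here is a minimal Python 3 sketch of the kind of control stanza such a metapackage needs. It is not taken from dh-builddep-metapackage; the function name, the default maintainer, and the description text are all made up for illustration, and it assumes the Build-Depends string carries no architecture restrictions or build profiles, which would need stripping before they could go into a Depends field.

```python
def builddep_control(package, version, build_depends,
                     maintainer="Nobody <nobody@example.org>"):
    """Render a control stanza for a hypothetical <package>-builddep package.

    build_depends is the raw comma-separated Build-Depends value of the
    source package; the maintainer default is a placeholder.
    """
    # Normalise spacing and drop empty entries left by trailing commas.
    deps = ", ".join(d.strip() for d in build_depends.split(",") if d.strip())
    lines = [
        "Package: %s-builddep" % package,
        "Version: %s" % version,
        "Architecture: all",
        "Maintainer: %s" % maintainer,
        "Depends: %s" % deps,
        "Description: build dependencies of %s" % package,
        " Empty metapackage that depends on everything %s build-depends on."
        % package,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(builddep_control("hello", "2.4-3", "debhelper (>= 7), libfoo-dev"))
```

Removing the resulting package later lets the package manager treat everything it pulled in as no longer needed.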

So far I’ve used it a few times, and it’s quite nice to have those packages around to remind me about the dependencies. I’m considering building a full repository with *-builddep packages for everything in the Debian archives, but I’d like to make sure I’d use this enough first.

by Tom Parker on 12/04/10

Tools: Installing Visual Studio AddIns for All Users

Whilst writing the installer for WebGAC, I was faced with some challenges trying to make the Add-In install for all users on the system. The MSDN documentation for Add-In registration generally recommends placing the files into the user’s My Documents directory. Its all-users solution is to place them into the Shared Documents directory. The problem I faced was that this directory has moved drastically on Windows 7, to the point where, as far as I can tell, Visual Studio (2008 at least) no longer searches there by default.

It turns out, though, that there is an easy solution.

by paulj on 05/03/10

Tools: WebGAC: Minding your .NET Dependencies

Managing binary dependencies in .NET can be a complicated task. For small projects, checking the dependencies into source control tends to work just fine. So does requesting that all developers have various binaries available in their GAC. Grow much bigger, or add more projects, and managing that starts to get very difficult.

The Java world has had a solution to this problem for a long time, in the form of Maven and Ivy. Remote servers store the binaries, and the build tool automatically downloads them on demand. WebGAC adds the core of this functionality to .NET, but without requiring you to switch build tools, or maintain a separate configuration file. Dependencies are specified just the same way as normal, but if you don’t have them when building your project, WebGAC will fetch them for you automatically.

WebGAC is available at http://github.com/paulj/webgac. Browse over there for more information and installation instructions, or continue reading here for more details.

by paulj on 27/02/10

Tools: mercurial-server 0.8 released

mercurial-server home page

mercurial-server gives your developers remote read/write access to centralized Mercurial repositories using SSH public key authentication; it provides convenient and fine-grained key management and access control.

by Paul Crowley on 10/11/09

Tools: simple build tool

I started using Maven at a company where we had 60+ Java projects, all with their own individual Ant build file. Each build file was different and each project was structured completely differently. Porting the most active projects to Maven made this situation a lot saner: test code was always in the same location, the same commands achieved the same things on different projects, and we could start to manage our library dependencies sensibly. However, some things always seemed crazy about Maven - having to explain that Maven really did need to ‘download the whole internet’ just to clean compiled code out of its target directory soon gets painful!

Since I started working on some Scala projects at LShift I have experienced further annoyances with Maven, so I thought I would examine an alternative written in Scala: the simple build tool (sbt), http://code.google.com/p/simple-build-tool.

The first thing it did nicely was new project creation. This is much simpler than in Maven: no multiple command line options or selections from very long lists, just a simple interactive script with questions that are easy to answer.

[bert] things%sbt
Project does not exist, create new project? (y/N/s) : y
Name: parsnip
Organization []: lshift
Version [1.0]:
Scala version [2.7.5]:
sbt version [0.5.5]:
:: retrieving :: sbt#boot
confs: [default]
2 artifacts copied, 0 already retrieved (9831kB/619ms)
:: retrieving :: sbt#boot
confs: [default]
3 artifacts copied, 0 already retrieved (3395kB/39ms)
[success] Successfully initialized directory structure.
[info] Building project parsnip 1.0 using sbt.DefaultProject
[info]    with sbt 0.5.5 and Scala 2.7.5
[info] No actions specified, interactive session started. Execute 'help' for more information.

This creates a project structure for you that is closely modelled on the standard directory layout used by Maven. There are two additional directories, lib and project. The lib directory provides a standard place to keep any dependencies that aren’t available in a Maven repository. The project directory is where the project configuration is kept.

Secondly, there is no XML required to configure the project; the configuration is carried out by a properties file and a Scala file. Dependencies are managed by adding values to the Scala file that defines the project configuration; for example:

import sbt._

class ParsnipProject(info: ProjectInfo) extends DefaultProject(info) {
  val derby     = "org.apache.derby" % "derby" % "10.4.1.3"
  val scalatest = "org.scala-tools.testing" % "scalatest" % "0.9.5" % "test"
}

This uses a simple DSL to replace your Maven or Ivy configuration file.

Thirdly, testing is easy! Maven can be a bit hit or miss when running the three major Scala testing frameworks, and it doesn’t report the actual error when a test fails - you need to dig through the test output. If a test fails with sbt, the standard output you would get from the test runner is displayed in the console. Additionally, you can make sbt monitor your project continuously and re-test as code changes, so you are informed of broken tests as soon as they break.

Fourthly, it won’t ‘download the internet’ unless you explicitly tell it to. The sbt update command downloads dependencies, but you can run sbt clean without having to download the internet.

[bert] things%sbt clean
[info] Building project parsnip 1.0 using sbt.DefaultProject
[info]    with sbt 0.5.5 and Scala 2.7.5
[info]
[info] == clean ==
[info] Deleting directory /Users/timc/tmp/things/target
[info] == clean ==
[success] Successful.
[info]
[info] Total time: 0 s
[success] Build completed successfully.
[info]
[info] Total build time: 1 s

So for Scala development sbt seems to be as good as, if not better than, Maven. Extending sbt is also simpler: custom tasks and plugins are simple Scala functions or classes, with no additional XML required.

by tim on 12/10/09

Tools: EvServer, Introduction: The tale of a forgotten feature

A long, long time ago there was a WSGI spec. This document described a lot of interesting stuff. Among other very important paragraphs you could find a hidden gem:

[...] applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with mulitpart boundaries (for “server push”), or just before time-consuming tasks (such as reading another block of an on-disk file). [...]

It means that all WSGI-conforming servers should be able to send multipart HTTP responses. A WSGI clock application could theoretically be written like this:
import datetime
import time

def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/plain')])
    for i in range(100):
        # Each yielded block goes to the client; then this worker sleeps.
        yield "%s\n" % (datetime.datetime.now(),)
        time.sleep(1)
The problem is that this way of programming just doesn’t work well. It’s not scalable, requires a lot of threads and can eat a lot of resources. That’s why the feature has been forgotten.

Until May 2008, when Christopher Stawarz reminded us of this feature and proposed an enhancement to it. He suggested that, instead of blocking inside the code (as time.sleep(1) does), the WSGI application should return a file descriptor to the server. When an event happens on this descriptor, the WSGI app is resumed. Here’s the equivalent of the previous code, but using the extension. With an appropriate server this could be scalable and work as expected:
import datetime
import socket

def clock_demo(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/plain')])
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # any UDP socket
    try:
        for i in range(100):
            # Ask the server to resume us when sd is readable or after 1.0s.
            yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
            yield "%s\n" % (datetime.datetime.now(),)
    except GeneratorExit:
        pass
    sd.close()
So I created a server that supports it: EvServer, the Asynchronous Python WSGI Server.
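
To show the control flow only, here is a toy Python 3 driver for an application written in this style. It is a sketch under loud assumptions, not EvServer’s code and not the extension’s specified behaviour: the WaitReadable marker class, the drive() function and the demo.* environ keys are invented for this example, and the driver blocks in select() for a single application, whereas a real server would register the descriptor with its event loop and go off to serve other clients.

```python
import datetime
import select
import socket


class WaitReadable:
    """Hypothetical marker: resume me when fd is readable or timeout expires."""

    def __init__(self, fd, timeout):
        self.fd = fd
        self.timeout = timeout


def clock_app(environ, start_response):
    """Same shape as clock_demo, with a configurable number of ticks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _ in range(environ["demo.ticks"]):
            yield environ["x-wsgiorg.fdevent.readable"](sd, environ["demo.interval"])
            yield ("%s\n" % datetime.datetime.now()).encode("ascii")
    finally:
        sd.close()


def drive(app, ticks=3, interval=0.01):
    """Run one app to completion and return (status, list of body blocks)."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status
        seen["headers"] = headers

    environ = {
        "x-wsgiorg.fdevent.readable": WaitReadable,
        "demo.ticks": ticks,          # made-up keys, only for this demo
        "demo.interval": interval,
    }
    body = []
    for item in app(environ, start_response):
        if isinstance(item, WaitReadable):
            # Toy behaviour: block here until the fd is readable or we time out.
            select.select([item.fd], [], [], item.timeout)
        else:
            body.append(item)
    return seen["status"], body


if __name__ == "__main__":
    status, blocks = drive(clock_app)
    print(status, len(blocks), "blocks")
```

The point of the design is that the application never sleeps itself; it only tells the server what it is waiting for, so one thread can juggle many such generators.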


Implementation

I did my best to implement the latest of the three versions of Chris’s proposal. The code is based on my hacked-together implementation of a very similar project, django-evserver, which was created well before the extension was invented and before I knew about the WSGI multipart feature.

EvServer is very small and lightweight; the core is about 1000 lines of Python code. Apparently, because EvServer uses ctypes bindings to libevent, it’s quite fast.

I did a basic test to see how fast it is. The methodology is very crude: I just measure the number of handled WSGI requests per second, so the result reflects only the raw server speed. The difference is clearly visible:
Server                     Fetches/sec
evserver                   4254
spawning with threads      1237
spawning without threads   2200
cherrypy wsgi server       1700

Description
 
So what is EvServer, really?
  • It’s yet another WSGI server.
  • It’s very low-level: the WSGI application has control over almost every HTTP header.
  • It’s great for building COMET applications.
  • It’s fast and lightweight.
  • It’s feature complete.
  • Internally it’s asynchronous.
  • It’s simple to use.
  • It’s 100% written in Python, though it uses the libevent library, which is in C.
What is EvServer not?
  • Unfortunately, it’s not mature yet.
  • It’s Linux and Mac only.
  • It’s not a full-blown, Apache-like web server.
  • Currently it’s Python 2.5 only.

Examples

Admittedly, using raw WSGI for regular web applications is a bit inconvenient. Fortunately, decent web frameworks support passing iterators from the web application down to the WSGI server, all the way through the framework. On my list of frameworks that support iterators you can find Django and Web.py.

Django

Django 1.0 supports returning iterators from views. This is Django code for the clock example:

import datetime
import socket

from django.http import HttpResponse

def django_clock(request):
    def iterator():
        sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            while True:
                yield request.environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                yield '%s\n' % (datetime.datetime.now(),)
        except GeneratorExit:
            pass
        sd.close()
    return HttpResponse(iterator(), mimetype="text/plain")
The problem is that this code is not going to work using the standard ./manage.py runserver development server. Fortunately, it’s very easy to integrate EvServer with Django; you only need to put this into settings.py:
INSTALLED_APPS = (
    [...]
    'django.contrib.sites',
    'evserver',             # <<< THIS LINE enables the runevserver command
)
Now you can test your app using ./manage.py runevserver.
Full source code for the example django application is in the EvServer examples directory.

Web.py

Since version 0.3, Web.py supports returning iterators. You can see it in action here:
import datetime
import socket

import web

class webpy_clock:
    def GET(self, name):
        web.header('Content-Type', 'text/plain', unique=True)
        environ = web.ctx.environ
        def iterable():
            sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # any udp socket
            try:
                while True:
                    yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                    yield "%s\n" % (datetime.datetime.now(),)
            except GeneratorExit:
                pass
            sd.close()
        return iterable()
The full source code is included in the EvServer examples directory. You can run this code using the command:
evserver --exec "import examples.framework_webpy; application = examples.framework_webpy.application"


Summary

I haven’t discussed any useful scenarios yet; I’ll try to do that in a future post. I’m thinking of some interesting uses for EvServer, such as pushing data to the browser using COMET.


Tools: Mix and match version control

LShift’s standard version control platform these days is Mercurial, but just before we adopted it, I started a project using Trac and Subversion, mostly because that’s what Trac does out of the box.

Later, we branched the project to add a large new project, and during that branch we converted from using Ant to Maven and modularised the project, resulting in a lot of moved files. This made what we were doing on the branch a lot easier, but left us with a merge that Subversion wasn’t capable of, even though we had used svn mv to move all the files.

What was capable of the merge was Mercurial. I imported the whole Subversion repository using hg convert (see the convert extension documentation). It works exactly as described, but make sure you have 1.0.1 or later - I had problems with earlier versions.

The merge went reasonably well, so I was left with a merged version in a Mercurial repository. I was going to switch to using Mercurial, and its Trac integration, when I discovered that it couldn’t cope with multiple repositories. The Trac instance was managing several different source projects, which would have to go into several Mercurial repositories, which I couldn’t merge together in any satisfactory way.

There are several projects around to address this (I’ll probably cover them in another post), none of which are ready for production yet. I decided the most expedient thing would be to try to generate a patch for my merge, and apply it to the Subversion repository.

A conventional patch would lose the version history of all the moved files, so I decided a git diff would do the job. You can certainly, with some patience, get git-svn to do this, and understand what it is doing. Lacking that patience, I wrote a script to do the job. It parses the git diff, deals with any directory creation needed, and makes the calls to svn mv, svn add, and svn rm required by the diff. It actually turned out to be a bit more work than I was expecting, so I’ve published it here.
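
The core of that mapping can be sketched in a few lines of Python 3. This is an illustration rather than the published script: svn_commands is a made-up name, it relies only on the standard git-format header lines (“rename from”, “rename to”, “new file mode”, “deleted file mode”), it assumes paths contain no “ b/” sequence, and it leaves out directory creation and the application of the actual content changes, both of which the real job needs.

```python
def svn_commands(diff_text):
    """Map git-format diff headers to the svn commands that mirror them.

    Returns a list of argument lists, e.g. ["svn", "mv", old, new].
    Content hunks are ignored; they still have to be applied separately
    (new files must exist on disk before "svn add" can succeed).
    """
    commands = []
    path = None          # new-side path of the file section being read
    rename_from = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # "diff --git a/old b/new": remember the new-side path.
            path = line.split(" b/", 1)[1]
            rename_from = None
        elif line.startswith("rename from "):
            rename_from = line[len("rename from "):]
        elif line.startswith("rename to ") and rename_from is not None:
            commands.append(["svn", "mv", rename_from, line[len("rename to "):]])
        elif line.startswith("new file mode ") and path is not None:
            commands.append(["svn", "add", path])
        elif line.startswith("deleted file mode ") and path is not None:
            commands.append(["svn", "rm", path])
    return commands


if __name__ == "__main__":
    sample = (
        "diff --git a/old/Foo.java b/core/src/Foo.java\n"
        "rename from old/Foo.java\n"
        "rename to core/src/Foo.java\n"
    )
    for cmd in svn_commands(sample):
        print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the parsing easy to check before anything touches the working copy.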

by david on 21/01/09

Tools: Firefox tabs are finally usable

If you use Firefox, go and install the Ctrl-Tab add-on.

Tabs are great for reducing clutter, but they fail to make life much easier because the tab navigation doesn’t support the common patterns of use. For example, I end up opening the same page in multiple tabs because it is quicker to do that than to hunt for it in the existing tabs. Eventually even that is unmanageable and I have to manually garbage collect tabs.

Ctrl-Tab makes tabs usable again by fixing the navigation. The key combination Control-Tab cycles through the tab history (in the same way as task switchers), so that a single tap takes you to the previous tab you visited, which is almost always the thing you wanted. Successive taps of Tab show thumbnails of the tab contents while you flip through them. Better still, Control-A (Control-Shift-a) shows thumbnails of all open tabs, and lets you filter incrementally by typing in the auto-focussed box, then select the tab with the arrow keys or mouse. In a single swoop it’s now much easier for me to switch to something I have open than open it again – the way it should be.

by mikeb on 12/11/08

Tools: Simple inter-process locks

I recently faced a very common problem: how to make sure that only one instance of my program is running at a time on the host.

There are a lot of approaches that can be taken to solve this problem, but I needed a portable solution for Python.

My first idea was to use widely known IPC techniques to lock some global resource. In C I would just create a semaphore and lock it. One problem is that a semaphore is not unlocked when a process dies. Another issue is the lack of support for named semaphores in Python.

The best solution on Unix is to gain an exclusive write lock on a file using fcntl(LOCK_EX).
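
As a rough Python 3 illustration of the Unix approach (not the interlocks code itself), the sketch below takes a non-blocking exclusive lock with fcntl.flock. The names grab_lock, release_lock and AlreadyLocked are invented here, and flock is my choice for the example; fcntl.lockf is the alternative, with POSIX record-lock semantics.

```python
import fcntl
import os


class AlreadyLocked(Exception):
    """Raised when another holder already has the lock (hypothetical name)."""


def grab_lock(path):
    """Return an open descriptor holding an exclusive lock on path.

    Raises AlreadyLocked instead of waiting. The kernel drops the lock
    when the descriptor is closed, including when the process dies.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise AlreadyLocked(path)
    return fd


def release_lock(fd):
    """Closing the descriptor releases the lock."""
    os.close(fd)
```

Unlike the semaphore, nothing is left behind locked if the holder is killed; only the (empty) lock file remains on disk.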

Of course, that doesn’t work on Windows. For that OS, the solution is to take advantage of its mutex facilities using the pywin32 module. I was surprised to see that this method works quite well.

It’s also possible to use the fact that only one process at a time can bind to a specific TCP/IP port (unless you use SO_REUSEPORT). This is the most portable, but also the most obscure, method.
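
The port trick fits in a handful of lines. This Python 3 sketch is only an illustration; grab_port is a made-up helper, and the port number is something you would have to pick and reserve for your program yourself.

```python
import socket


def grab_port(port, host="127.0.0.1"):
    """Return a bound socket that acts as the lock, or None if bind fails.

    Keep the returned socket open for as long as the lock is needed;
    closing it (or the process exiting) frees the port again.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # Most likely the port is in use, i.e. another instance is running.
        sock.close()
        return None
    return sock
```

Binding to the loopback address keeps the “lock” invisible from the network.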

Here’s the code for this inter-process “locking”. It’s not really locking, because you can’t block and wait for a lock. All you can do is grab a lock or get an exception. But this is enough to make sure that there is only one process that’s using a resource. This is how you can use this module:

import interlocks, time

lock = interlocks.InterProcessLock("my resource name")
try:
    lock.lock()
except interlocks.SingleInstanceError:
    print "Other process has acquired this lock."
else:
    print "Press CTRL+C to release the lock..."
    while True: time.sleep(32767)

Test code for the interlocks module needs to open an external process that blocks the resource. The code is not perfect (race conditions), but should be enough for just a test case:

import os, signal, time
import interlocks

def execute(cmd):
    ''' spawn a new python process that will execute 'cmd' '''
    cmd = '''import time;''' + cmd + '''time.sleep(10);'''
    pid = os.spawnv(os.P_NOWAIT,'/usr/bin/python', ['/usr/bin/python', '-c', cmd])
    time.sleep(1) # poor man's synchronization
    return pid

lock = interlocks.InterProcessLock('test')

# lock resource from other process
pid = execute("import interlocks; a=interlocks.InterProcessLock('test');a.lock();")
try: # fail to grab a lock
    lock.lock()
except interlocks.SingleInstanceError: print "success: the lock is blocked by spawned process"
else: print "FAILURE: the lock should be blocked by spawned process (pid=%i), but isn't" % (pid,)

os.kill(pid, signal.SIGKILL)
time.sleep(1) # poor man's synchronization

Coding the tests wasn’t so painful; much more problematic was making them run on Windows. Obviously we need an os.kill replacement for this platform. The next problem is to make os.spawnv() work on Windows at all: which slashes to use, or how to encode spaces in the path. Another issue is that the process pid returned from os.spawnv() can’t be killed. It seems that the return value is not really a proper pid. Don’t waste your time like I did: use subprocess.Popen(). Fixed test code, without os.spawnv, is included in the lib.
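
For comparison, here is roughly what the spawn-and-kill part looks like with subprocess in Python 3. It is a sketch, not the fixed test code from the lib: spawn_python and stop are made-up helper names, and Popen.kill() is only available from Python 2.6 onwards, so it would not run on the 2.5 interpreter used at the time.

```python
import subprocess
import sys


def spawn_python(code):
    """Start a child interpreter running code; returns the Popen object.

    sys.executable avoids hard-coding the interpreter path, and passing
    the arguments as a list avoids quoting problems with spaces.
    """
    return subprocess.Popen([sys.executable, "-c", code])


def stop(proc):
    """Kill the child and wait for it, returning its exit status."""
    proc.kill()
    return proc.wait()


if __name__ == "__main__":
    child = spawn_python("import time; time.sleep(30)")
    print("exit status:", stop(child))
```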

by marek on 05/11/08

Tools: Where did all my space go?

Over the last little while, I’ve started to suffer from lack of space on the hard disk in my laptop, which is ridiculous, since there’s an 80GB disk in there and there is no way I have that much data I need to hang on to. I decided to do something about it last week. The main part of the problem was to figure out what was eating all the space: du tells you exactly what’s using how much, but it’s hard to get a feel for where your space has gone by scanning through pages of du output. So I built a program to help.

spaceviz is a small Python program that takes the output of du -ak, and builds you a picture and HTML client-side imagemap of your space usage like this one:

Running it against the output of du -ak / showed me very clearly where all the space had gone: not only did I have a few seasons of various TV shows on my disk (which I already knew were there), but I had 11 GB of unneeded gzipped RDF data left over from a project that finished earlier this year (that I had forgotten about). Instant win!
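
If you only want the numbers, the first step (reading the du -ak output) is easy to sketch in Python 3. The code below is not taken from spaceviz; parse_du and biggest_children are made-up names, and it assumes du’s usual one-entry-per-line format of a size in kilobytes, a tab, and a path.

```python
import posixpath


def parse_du(text):
    """Turn `du -ak` output into a {path: kilobytes} dictionary."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kb, path = line.split("\t", 1)
        sizes[path] = int(kb)
    return sizes


def biggest_children(sizes, parent, limit=5):
    """Return up to limit (kilobytes, path) pairs directly under parent,
    largest first."""
    parent = parent.rstrip("/") or "/"
    children = [
        (kb, path)
        for path, kb in sizes.items()
        if path != parent and posixpath.dirname(path) == parent
    ]
    children.sort(key=lambda item: (-item[0], item[1]))
    return children[:limit]
```

Calling biggest_children repeatedly on the largest entry is a text-mode version of drilling down into the picture.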

To run it for yourself, check out the Mercurial repository at http://hg.opensource.lshift.net/spaceviz, and run

make veryclean all ROOT=/

replacing the ROOT=/ with a definition of ROOT that points at the directory tree you want to generate usage data for. The makefile will take care of running du and spaceviz.py for you. Edit the settings for WIDTH and HEIGHT in spaceviz.py to change the dimensions of the generated picture.

The program runs not only on Linux without fuss, but also on OS X so long as you have the netpbm port installed to convert the python-generated PPM file to a browser-accessible (and much more efficiently compressed!) PNG.

by tonyg on 29/10/08