technology from back to front

Archive for the ‘Web’ Category

Getting back into front-end web development

I’ve been working on a small SPA (Single Page Application) – just HTML, CSS and JavaScript statically served and doing its thing entirely in the browser. I learned a great deal throughout the project, but here are some of the things that strike me as most valuable.

Get a good workflow going

I used Grunt to set up a nice build system that is mostly a joy during development. It took a while to evolve my Gruntfile, but now when I edit a file the results are immediately refreshed in the browser (I don't even have to hit cmd-R). I can deploy to the test, staging and live sites on S3 with a single command that takes about 3 seconds. My SASS files are compiled down to minified CSS, my JS is minified with source maps, and so on.
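
For anyone setting up something similar, here is a rough sketch of just the watch/livereload piece of such a Gruntfile, assuming the grunt-contrib-watch and grunt-contrib-sass plugins; the paths and task names are illustrative rather than taken from my actual setup.

module.exports = function (grunt) {
  grunt.initConfig({
    sass: {
      dev: {
        // Compile the main stylesheet; dest on the left, src on the right.
        files: { 'dist/styles/main.css': 'app/styles/main.scss' }
      }
    },
    watch: {
      styles: {
        files: ['app/styles/**/*.scss'],
        tasks: ['sass:dev'],
        options: { livereload: true }   // the browser refreshes itself on save
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-sass');
  grunt.loadNpmTasks('grunt-contrib-watch');

  grunt.registerTask('default', ['sass:dev', 'watch']);
};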

The hardest part of using Grunt is figuring out how to configure it and its many contrib plugins. I could have done with a definitive reference, or perhaps I could have used Yeoman to give me an out-of-the-box solution. However, I recognised that I was always going to have to figure out the guts of Grunt, so I think I really was better off bespoking it from the start. I'm glad I did, as now I have a tight setup that does precisely what I want and that I understand completely.

Now it seems there is a new kid on the scene, Gulp – nicely introduced in this tutorial blog post. I will definitely be looking closely at that for my next project, with the piping approach looking like the key step beyond Grunt, along with nicer syntax. I’d also look at Browserify, for a nicer way to piece together the JS bits.

Learn JavaScript properly

To the uninitiated, JavaScript is fairly surprising in many subtle ways, and though I can grok the prototype-based inheritance fairly easily, the scoping rules caught me out repeatedly. This was especially the case as I tried to create jQuery plugins with private methods and state. Eventually a simple old article by the granddaddy of JavaScript writing, Douglas Crockford, gave me the vital clues I was missing.
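
The crucial trick is the closure-based pattern that sort of article points towards: wrap the plugin in an immediately-invoked function expression so its state and helpers are genuinely private. A minimal sketch (not my actual plugin code; names are made up):

(function ($) {
  var useCount = 0;                  // private state, shared by all invocations

  function highlight($el) {          // private helper, invisible outside the IIFE
    $el.addClass('is-active');
  }

  $.fn.activate = function () {      // the only thing exposed to the outside world
    useCount += 1;
    return this.each(function () {
      highlight($(this));            // 'this' is each DOM element in the jQuery set
    });
  };
}(jQuery));

// Usage: $('.panel').activate();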

Really I should just read his book, and I would recommend that anyone else doesn’t just attempt to learn JavaScript as they go, but takes some time to pro-actively figure out the core concepts – it will pay off in short order.

jQuery is non-negotiable

And the award for most indispensable library goes to: jQuery. Seriously, it should be baked into the browsers or the ECMAScript standard. The nicest thing about it is I can pretty much just guess at the API and be right most of the time, though reading the docs usually reveals new conveniences that go further than I even imagined.

Browser quirks can be a living nightmare

JavaScript itself is fairly reliable, especially with judicious use of mature libraries like jQuery that paper over the cross-browser DOM cracks. CSS in complicated scenarios, however, is where it all seems to go wrong.

It’s amazing how broken/different some browsers are. Here are just a few highlights, though every day brought tens of new oddities and associated workarounds.

  • Mobile Safari on iOS 7 reports the viewport height inconsistently (depending on how you access it) leading to bad layout and horrible JavaScript workarounds.
  • Use of -webkit-overflow-scrolling:touch causes the hardware accelerated renderer to kick in, resulting in various flickers, flashes and glitches with content not rendering.
  • IE 10 on Windows 8 shows back/forward overlays at the left/right of the screen when your mouse moves near them, obscuring links in those locations.
  • Chrome running on Retina Macs suffers from strange graphical glitches when running CSS Animations, but is fine with CSS Transitions. However other browsers/platforms really need CSS Animations to get smooth, hardware accelerated movement. In my case it was necessary to implement both approaches and select between them using browser detection, as sketched below.
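
The detection itself was nothing clever. A hedged sketch of the kind of switch involved (the class names and exact checks here are illustrative, not the project's real code):

// Prefer CSS Transitions on Retina Chrome on the Mac, CSS Animations elsewhere.
var isMac = /Mac/.test(navigator.platform);
var isChrome = /Chrome\//.test(navigator.userAgent);
var isRetina = window.devicePixelRatio > 1;

document.documentElement.className +=
    (isMac && isChrome && isRetina) ? ' use-transitions' : ' use-animations';

// The stylesheet then keys its @keyframes-based rules off .use-animations
// and its transition-based rules off .use-transitions.
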
by Sam Carr on 06/03/14

Two Magnolias, one container

We are using Magnolia in a number of projects here at LShift. My feeling is that Magnolia has a simple way to do most things, but that there are often a number of other plausible alternatives which gradually lead you into wasting enormous amounts of time.

Here I want to present a simple way to get both author and public instances of Magnolia running in your dev environment in the same container. It may seem very obvious. If so, good. This was not the first way I tried, and it cost me a lot of time.

We will be aiming for:

  1. Easily deploying Magnolia onto a stage or production environment — one file, one or two configuration parameters only.
  2. Making it easy for a tester to launch local public and author instances of Magnolia that talk to each other correctly.
  3. Making it easy for a developer to debug Magnolia, having both instances running under the control of the IDE.

Preconditions

I will be assuming that you have a parent project with a child project that represents your webapp. I also will assume that you have copied the contents of src/main/webapp/WEB-INF/config from the magnolia-empty-webapp project into your own webapp project. The source for this is in the ce-bundle at git.magnolia-cms.com/gitweb/?p=ce-bundle.pub.git, but assuming you have magnolia-empty-webapp as a dependency (as recommended) you should be able to pick it up from your target directory.

I will be using Tomcat 7 as Tomcat is recommended by Magnolia and 7 is the latest stable version at the time of writing.

Deploying Magnolia to Stage or Production environments

For deployment to stage or production you don’t want both author and public deployed in the same container, or even on the same machine; so we only need to be able to configure a single running instance to be either author or public.

This is quite simple and well documented. In your webapp project, open your src/main/webapp/WEB-INF/web.xml (that you copied from the empty webapp project as described above) and look for the lines:

  <context-param>
    <param-name>magnolia.initialization.file</param-name>
    <param-value>
      WEB-INF/config/${servername}/${contextPath}/magnolia.properties,
      WEB-INF/config/${servername}/${webapp}/magnolia.properties,
      WEB-INF/config/${servername}/magnolia.properties,
      WEB-INF/config/${contextPath}/magnolia.properties,
      WEB-INF/config/${webapp}/magnolia.properties,
      WEB-INF/config/default/magnolia.properties,
      WEB-INF/config/magnolia.properties
    </param-value>
  </context-param>

You will need to add your own line at the top of the <param-value> section:

      WEB-INF/config/${contextAttribute/instanceName}/magnolia.properties,

Then when you deploy your WAR, you can simply set the instanceName environment variable to magnoliaPublic or magnoliaAuthor depending on what type of instance you want. As you can see from the fragment of web.xml above, this will make the settings in src/main/webapp/WEB-INF/config/magnoliaPublic/magnolia.properties or src/main/webapp/WEB-INF/config/magnoliaAuthor/magnolia.properties active, respectively. Ultimately you will want to make more magnolia.properties files in more subdirectories (called, perhaps, stageAuthor, productionPublic and so on) with appropriate settings for those environments, and you can simply make instanceName refer to the appropriate subdirectory.

Local Magnolia from the command line

Now, it would seem plausible that this method can be made to make your local testing environment work. Plausible, but wrong. This is the difficult way. You’ll start writing your context.xml files, then you’ll need a server.xml file, then before you know it you’ll be building your own Tomcat so that you can manage it all.

The “secret” is to use the fact that the web.xml already refers to the context path, in the form of the line:

      WEB-INF/config/${contextPath}/magnolia.properties,

(as well as in another line which we won’t concern ourselves with). This means that, instead of using an environment variable, you can deploy the same WAR file to two different context paths and Magnolia will set itself up differently for each. And if you choose the paths /magnoliaAuthor and /magnoliaPublic you will automatically pick up the properties files provided by the empty webapp and all will be fine — Magnolia even sets up the author instance to point at http://localhost:8080/magnoliaPublic by default, so you won’t have to configure it yourself!

Well, actually, it’s not all fine. If you try this, you’ll find that one of your instances will refuse to start, complaining that its repository is already locked. Of course, they are trying to use the same repository. Fix this by adding a line similar to the following to magnoliaPublic/magnolia.properties:

magnolia.repositories.home=${magnolia.home}/repositories-public

The name of the subdirectory is not important. Note that, as it stands, this will change where the stage and production deployed Magnolias you configured above store their data. If that bothers you, now might be a good time to make your productionPublic/magnolia.properties and similar files.

So, how do we get that running painlessly so that your tester doesn’t keep asking you how to do it?

Add the Tomcat Maven plugin to your webapp’s pom.xml, and configure it to launch your WAR twice on two different context paths:

      <plugin>
        <groupId>org.apache.tomcat.maven</groupId>
        <artifactId>tomcat7-maven-plugin</artifactId>
        <version>2.2</version>
        <configuration>
          <webapps>
            <webapp>
              <groupId>com.my.group</groupId>
              <artifactId>my-webapp</artifactId>
              <version>1.0-SNAPSHOT</version>
              <type>war</type>
              <asWebapp>true</asWebapp>
              <contextPath>/magnoliaAuthor</contextPath>
            </webapp>
            <webapp>
              <groupId>com.my.group</groupId>
              <artifactId>my-webapp</artifactId>
              <version>1.0-SNAPSHOT</version>
              <type>war</type>
              <asWebapp>true</asWebapp>
              <contextPath>/magnoliaPublic</contextPath>
            </webapp>
          </webapps>
        </configuration>
      </plugin>

Replace com.my.group and my-webapp with your own webapp's group and artifact IDs.

Now you can run your Magnolia simply with:

mvn tomcat7:run-war

For reasons best known to the Tomcat plugin, boring old mvn tomcat7:run doesn't work: it deploys only one Magnolia, in its default location. Sorry.

The instances are available, of course, at http://localhost:8080/magnoliaAuthor and http://localhost:8080/magnoliaPublic.

Local Magnolia from your IDE

Now you're on the home straight. Here's how I configure Tomcat itself in Eclipse:

Firstly, you need to get Eclipse to know about Tomcat 7. The foolproof way to do this is as follows: Window -> Preferences -> Server -> Runtime Environments -> Add… -> Apache Tomcat v7.0 -> Next. Now give it a location that is writable by you in the “Tomcat installation directory” box and click “Download and Install…”; using your pre-existing Tomcat might not work if it isn’t laid out in the way Eclipse expects. Now Finish and open the Servers view.

You can now add a new Tomcat 7 server and double-click on it. Tick “Publish module contexts to separate XML files”, set the start timeout to something large like 480 seconds, and in the modules tab add your webapp project twice; once with the path /magnoliaAuthor and once with the path /magnoliaPublic.

Now you can launch and debug your two instances of Magnolia from within your IDE!

by Tim Band on 03/03/14

Grunt uglify file specs

I struggled a bit finding relevant examples of Gruntfile configuration for Uglify, so having solved a few specific problems myself, here’s what I came up with.

This is just a snippet from the whole Gruntfile of course, and contains half-decent comments already, though I’ll provide some extra explanations below to point out the most interesting bits.

// Variables used internally within this config.
conf: {
  app: 'app',
  dist: 'dist',
  // Just our own custom scripts.
  script_files: ['scripts/*.js'],
  // All scripts that should be minified into final result.
  // Ordering is important as it determines order in the minified output and hence load order at runtime.
  // We don't include jquery (though we could) as it's better to get it from Google where possible.
  minify_js_files: [
      'scripts/vendor/modernizr/modernizr.custom.js',
      '<%= conf.script_files %>',
      'scripts/polyfills/**/*.js']
},

uglify: {
  options: {
    banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
    sourceMap: '<%= conf.dist %>/scripts/source.map.js',
    sourceMapRoot: '/scripts',
    sourceMappingURL: '/scripts/source.map.js',
    sourceMapPrefix: 2
  },
  // For dev, effectively just concatenate all the JS into one file but perform no real minification.
  // This means that the HTML is the same for dev and prod (it just loads the single .js file) but
  // debugging in the browser works properly in the dev environment. It should work even when fully
  // minified, given the source maps, but practice shows that it doesn't.
  dev: {
    options: {
      report: false,
      mangle: false,
      compress: false,
      beautify: true
    },
    files: [{
      expand: true,
      cwd: '<%= conf.app %>',
      src: '<%= conf.minify_js_files %>',
      dest: '<%= conf.dist %>/scripts/main.min.js',
      // Because we want all individual sources to go into a single dest file, we need to use this
      // rename function to ensure all srcs get the same dest, otherwise each would get a separate
      // dest created by concatenating the src path onto dest path.
      rename: function(dest, src) { return dest; }
    }]
  },
  prod: {
    options: {
      banner: '/*! <%= pkg.name %> <%= grunt.template.today("yyyy-mm-dd") %> */\n',
      report: 'min',
      mangle: true,
      compress: true
    },
    files: '<%= uglify.dev.files %>'
  }
},

Use of a rename function for configuring file srcs and dests

I was really struggling to come up with src/dest configuration for Uglify that pushed all of my source files into a single minified dest file. To be fair, this is trivially easy in the common case, as you can simply use files: { 'dest_path': ['file_1', 'file2'] }.

However, I have my list of source files in <%= conf.minify_js_files %>, and the paths therein do not include the root app/ directory, because this works out for the best in various other Grunt tasks (not shown) where I use cwd in the files block to supply that root dir. Unfortunately, without my custom rename function, a separate dest is calculated for each src file by concatenating the src path with the dest, so instead of one minified JS file we get lots of individual files sprayed over all sorts of unintended locations. The trivial rename function I've used overrides those calculated dest locations with our originally intended single dest. Where different srcs have the same dest, the grunt-contrib-uglify plugin has the good sense to assume you want to merge their output, and hence we get the result we want. To be clear, this is only complicated because I want to use cwd in the file config rather than using the simpler approach.
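
To make the effect concrete, here is a tiny standalone illustration of why the rename override matters. The paths are hypothetical, and the dest calculation shown is only roughly what Grunt's expand machinery does:

var rename = function (dest, src) { return dest; };

var srcs = ['scripts/vendor/modernizr/modernizr.custom.js', 'scripts/main.js'];
var dest = 'dist/scripts/main.min.js';

// Roughly what expand:true computes without a rename function: one dest per src.
console.log(srcs.map(function (src) { return dest + '/' + src; }));
// => ['dist/scripts/main.min.js/scripts/vendor/modernizr/modernizr.custom.js', ...]

// With the trivial rename, every src shares the single intended dest, and
// grunt-contrib-uglify merges sources that map to the same destination.
console.log(srcs.map(function (src) { return rename(dest, src); }));
// => ['dist/scripts/main.min.js', 'dist/scripts/main.min.js']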

Re-using files blocks in multiple places

You can share common options amongst multiple targets by putting them at the top level of the task and overriding/extending as required in the specific targets. However, you can't do this when specifying files. In my case I want to specify the same files for both the dev and prod Uglify targets, so I specify them in full for dev and then use Grunt's templating facility to refer to them from prod with files: '<%= uglify.dev.files %>'.

Theoretically I could have put the definition in the conf block at the top, but it’s specific to Uglify and only used there so I prefer it to be local to the Uglify task. It seems obvious now that I can refer back to it like this, but at the time I struggled to see how to achieve it. I think I had a blind spot for the generic nature of the templating mechanism, having only used it in a rigid way for config previously, and still being very new to Gruntfiles.

Uglify may break JS debugging

I found that my minified JS files could not be successfully debugged in any browsers. I could see the original un-minified code thanks to the source maps and I could set breakpoints, but they usually wouldn’t trigger, or if I did break (e.g. with JS ‘debugger’ command in my code) it was impossible to get variable values. All very frustrating.

Whilst I’m developing I use Grunt’s watch task to serve my files and to auto-process modifications on the fly, so in that mode I turn off all the actual minification features of Uglify and effectively just concatenate the files together into one. Because it’s still the same single file with the same name as in production, I can use identical static HTML to include my JS script. The source maps are still there and allow me to see the individual files in the browser debugger.

by Sam Carr on 08/01/14

Documenting an HTTP API with Swagger

I recently tried out Swagger for documenting an HTTP API. The big win with Swagger is that it provides a sweet HTML UI to browse your API docs and experiment with sending requests and viewing responses, which is a great experience for other developers trying to get to grips with your API. Try out their demo of the Swagger UI for a simple petstore example.

[Screenshot: Swagger petstore example]

Swagger is effectively three things ‘architecturally’:

  • A specification for the JSON files, which contain your API documentation in the abstract
  • Various “generator” tools and libraries (many third party) for producing those JSON files in the first place, either statically or dynamically
  • Swagger UI, a static HTML/JS app which consumes those JSON files and presents the nice UI.

Ideally you use a generator that extracts the API documentation for free, right from your API code. Many such generators are available for various languages and HTTP/REST frameworks, so you may have to do very little to get a free lunch here. However typically you’d expect to use extra annotations to further document the API with the useful human-facing semantic information which isn’t present in the raw code, and further annotations may be required to clarify object serialisation etc. Still, this is a pretty good facsimile of the promised land, where documentation stays in sync with the code and never goes stale!

In our specific case we were working with an app written in Go, so we could potentially have used the go-restful library for our REST services, which has Swagger support built in. However, we were already committed to another library that didn't have that support, and being new to Swagger we couldn't be sure whether it was worth switching libraries or wiring up our own Swagger integration. We decided to prototype a Swagger solution by hand-crafting the JSON files in the first instance, to see if we (and our users) liked the results. This showed up a particular challenge that is worth covering here.

You can’t do URL hierarchies with static file serving

A typical REST API will have URL hierarchies such as /users (which lists users) and /users/fred-smith (details for a specific user), and indeed the Swagger JSON file URLs consumed by Swagger UI are assumed to be in this sort of hierarchy. Swagger UI consumes Swagger JSON files via HTTP: you give it the URL of the main JSON "resource listing" file, which provides URLs for the subordinate "API declaration" files. If that resource listing is served from /main, it expects the API declarations to be at /main/user, /main/product etc., and this is hardcoded into the way it constructs URLs. Unfortunately, if we want to provide these JSON files by simply serving them via Nginx, straight from disk with no smarts, we're out of luck, as your average filesystem cannot have both a file "main" and a directory "main" in the same parent directory. You just can't do it, so you can't serve up a hierarchy like that from static files.

Obviously you could configure your web server more intricately, mapping individual URLs to individual files to construct the hierarchy. This isn't appealing however, especially as Swagger UI itself can be served statically (it's just static HTML, JS, CSS etc.) and we are simply including our JSON files within its directory structure. Three simple lines of Nginx config should be enough to serve up swagger-ui and our included JSON files:

location /api-docs {
    alias /opt/swagger-ui;
}

The root problem here is that Swagger UI is extremely simplistic about how it interprets paths in the top-level resource listing JSON. It assumes that the paths to the individual API declaration files can simply be concatenated to the resource listing path, as if they are laid out in a pure hierarchy as sub-resources. If the resource listing is at /api-doc.json and it references a path “users.json” then Swagger UI concatenates these and looks for the API declaration at /api-doc.jsonusers.json. This looks especially bad if you have a .json extension and no leading / on the path. By fixing those two problems we get a bit closer but it’s still looking for /api-doc/users and as mentioned above, we can’t have both a file and a directory named “api-doc” in the filesystem so we are stuck. As an aside, losing the file extension is worth doing regardless, as Swagger UI uses the full name as the title for each section of the docs and you really want “users” rather than “users.json” as your heading.

The trick to win the day here is to use a path like “/../users” in the resource listing. Then the concatenated path is /api-doc/../users which is ultimately resolved to just /users. That being the case, we can put our JSON files “api-doc” and “users” in the same directory (even though Swagger likes to consider them as hierarchical) and they will link together correctly. If you do want the API declaration files to be down a level, you could use “/../apis/users” and put them in an “apis” directory one level deeper than the resource listing file. The key here is that we don’t have to have a file and directory with the same name.
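
For illustration, a resource listing using that trick might look something like this. The field names follow the Swagger 1.x resource-listing format we were targeting at the time; the values are made up:

{
  "apiVersion": "1.0",
  "swaggerVersion": "1.2",
  "apis": [
    { "path": "/../users",    "description": "User listing and details" },
    { "path": "/../products", "description": "Product catalogue" }
  ]
}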

by Sam Carr on 27/11/13

Embedded video and progressive download: A Quiz

I will provide you with two video files, video1.flv and video2.wmv; you need to embed them on the page and ensure that they use progressive download. Both video files are greater than 1GB in size, so it will be obvious whether they are playing before they have completely downloaded. You will need to use the Flash video player that I have provided for the Flash video. Which one of the HTML snippets shown below should you use?

Snippet A

<object type="application/x-shockwave-flash" data="/player.swf" >
  <param name="movie" value="/player.swf"/>
  <param name="FlashVars" value="flv=/video1.flv"/>
</object>

<object type="video/x-ms-wmv">
  <param name="FileName" value="/video2.wmv"/>
</object>

Snippet B

<object type="application/x-shockwave-flash" data="/player.swf" >
  <param name="movie" value="/player.swf"/>
  <param name="FlashVars" value="flv=http://myserver.lshift.net/video1.flv"/>
</object>

<object type="video/x-ms-wmv">
  <param name="FileName" value="http://myserver.lshift.net/video2.wmv"/>
</object>

Read more…

by tim on 13/02/11

A Custom ASP.Net Navigation Component for EpiServer CMS

LShift have used the EpiServer CMS on several customer projects and it generally does most things you would want to do with a CMS in a simple way. EpiServer is a .NET-based CMS, and if you understand ASP.NET templated pages and templated controls it is very straightforward, with a minimal learning curve.

One challenge I faced on a recent project was to implement a particular HTML navigation design using EpiServer. The HTML design called for the navigation to be rendered as nested HTML lists with the current section of the site annotated with a particular class.

For example, if you were looking at "Tasty Fish" in the "Cat Food" section of the site, the HTML should look something like this:

<ul>
  <li>Dog Food
    <ul>
      <li>Meaty Bones</li>
    </ul>
  </li>
  <li class="selected">Cat Food
    <ul>
      <li>Tasty Fish</li>
    </ul>
  </li>
</ul>

On initial inspection the EpiServer CMS appears to have two controls that may help: the EpiServer:MenuList and the EpiServer:PageTree. I first attempted to use the EpiServer:MenuList, which allowed me to do this:

<ul>
  <li>Dog Food</li>
  <li>Cat Food</li>
</ul>
<ul>
  <li class="selected">Tasty Fish</li>
</ul>

This isn't quite what the design required: the complete site navigation tree needed to be rendered, since CSS was being used to show and hide menus in response to mouse rollovers.

So for attempt two I tried the EpiServer:PageTree component; this component is designed to render a whole tree of pages so it should be an appropriate solution. It is a very flexible component and provides lots of templates for customising the layout based upon the state of the tree. This is what I ended up with:

<ul>
  <li>Dog Food
    <ul>
      <li>Meaty Bones</li>
    </ul>
  </li>
  <li>Cat Food
    <ul>
      <li class="selected">Tasty Fish</li>  <!-- OH NO THIS IS WRONG -->
    </ul>
  </li>
</ul>

This was very close! However, it didn't meet the design requirement: the top-level item that contained the current page needed to be tagged with the CSS class, not the item corresponding to the current page itself. There didn't seem to be an easy way to achieve this with the EPiServer components.

I decided I probably needed some type of custom control, and proceeded to write three implementations of a navigation control, moving from sinful generation of HTML in a code-behind, through my own templated control, until arriving at the obvious solution: the asp:ListView control plus a simple code-behind. This was a nice solution because it uses a standard ASP.NET component in a standard way, the complication of tagging the selected top-level item could be hidden away in a small code-behind, and the markup was completely under the control of the HTML developer.

The navigation section of the ASP page looked like this:

<asp:ListView ID="Level1" runat="server" ItemPlaceHolderID="Level1Item">
  <LayoutTemplate>
    <ul><asp:PlaceHolder ID="Level1Item" runat="server"/></ul>
  </LayoutTemplate>
  <ItemTemplate>
    <li class='<%# ((Boolean)Eval("Selected")) ? "selected" : "" %>'><%# Eval("Name") %>
      <asp:ListView ID="Level2" runat="server" ItemPlaceHolderID="Level2Item">
        <LayoutTemplate>
          <ul><asp:PlaceHolder ID="Level2Item" runat="server"/></ul>
        </LayoutTemplate>
        <ItemTemplate>
          <li><%# Eval("Name") %></li>
        </ItemTemplate>
      </asp:ListView>
    </li>
  </ItemTemplate>
</asp:ListView>

This is a straightforward usage of nested ListViews and ASP.NET data binding expressions: all of the markup is visible, and it can be explained to an HTML developer in a short amount of time. New navigation levels can be added in exactly the same way that the Level 2 navigation was added to the Level 1 navigation. The ternary operator within the data binding expression, class='<%# ((Boolean)Eval("Selected")) ? "selected" : "" %>', determines whether the navigation item is selected; this is a standard mechanism for conditional rendering with ASP.NET data bound controls.

This was combined with a code-behind like this:

protected override void OnLoad(System.EventArgs e)
{
    base.OnLoad(e);

    Level1.DataSource = BuildMenuItems();
    Level1.DataBind();
}

private List<MenuItem> BuildMenuItems()
{
    List<MenuItem> menuItems = new List<MenuItem>();

    PageData homePage = GetPage(PageReference.StartPage);
    foreach (PageData child in GetChildren(homePage.PageLink))
    {
        if (child.VisibleInMenu)
        {
            MenuItem item = CreateMenuItem(child, true);
            item.Selected = findPage(CurrentPage.PageGuid, child);
            menuItems.Add(item);
        }
    }

    return menuItems;
}

private MenuItem CreateMenuItem(PageData page, Boolean includeChildren)
{
    MenuItem item = new MenuItem(page.PageName);
    item.Url = page.LinkURL;

    if (includeChildren)
    {
        PageDataCollection children = GetChildren(page.PageLink);
        foreach (PageData child in children)
        {
            if (child.VisibleInMenu)
            {
                item.Children.Add(CreateMenuItem(child, true));
            }
        }
    }

    return item;
}

private Boolean findPage(Guid id, PageData parent)
{
    if (id == parent.PageGuid) return true;

    foreach (PageData page in GetChildren(parent.PageLink))
    {
        if (page.PageGuid == id)
        {
            return true;
        }
        if (findPage(id, page))
        {
            return true;
        }
    }

    return false;
}

With a helper class MenuItem defined like this:

public class MenuItem
{
    public MenuItem(String name)
    {
        this.Name = name;
    }

    public String Name { get; set; }
    public String Url { get; set; }
    public Boolean Selected { get; set; }

    private List<MenuItem> children = new List<MenuItem>();
    public List<MenuItem> Children
    {
        get { return children; }
        set { children = value; }
    }
}

The code-behind creates MenuItem instances for each page in the navigation; the top-level item gets tagged as selected only if the current page is the item itself or one of its descendants. This is a reasonable amount of code to write, but it was the smallest solution that solved the problem and kept the HTML obvious and available for modification by HTML developers.

by tim on 29/11/09

Untangling the BBC’s data feeds

Recently, Alan Ogilvie from A&Mi at the BBC announced that they were developing a “Feeds Hub”, and outlined their ambitions for it.

He also mentioned LShift, RabbitMQ and open source, and I would like to explain, from our point of view, what this project is and how we’re working with the BBC.

What is a “Feeds Hub”?

Alan describes the central problem they want to solve:

The number of new projects across the BBC starting to use feeds in creative ways is growing very quickly – just think of spaghetti… on a massive scale.

So what do we do? What are the options?

We could go down the route of gathering together a centralised ‘Feed Usage’ committee with members across the BBC, to ‘federate’ feeds so that they are all produced in the same way but, in practice, this never truly works and is likely to stifle creativity. Often it is quite difficult to convince people to work together when they have already experienced the freedom of doing what they want – often they are concerned that their projects will be delayed. Not all feeds sources that we use or want to use are under our control, things like Twitter, Flickr, blogs, etc. Federation will never solve all our problems anyway – for example, it can’t help when a source feed is turned off, it doesn’t monitor failures.

The idea, then, is to bring the spaghetti under control: not by mandating that things be done a certain way, but by overlaying a bunch of management and monitoring tools that would otherwise be ad hoc or not exist at all.

We also want to enable people to discover, reuse and adapt existing feeds, rather than reinvent them. Again, not by enforcement, but by making it easier to do so than to not.

And we’re not just talking about RSS — there are (at the BBC and in general) many different protocols and formats flying about.

Technically speaking, this adds up to a couple of pieces of kit: a platform for relaying feeds through, which supports routing, transformation and distribution by a number of different means; and a user interface for discovering, creating, managing and monitoring these feeds.

How are LShift involved?

In short: LShift are developing the core technology, helping the BBC shepherd the various strands of the project along, and helping engage with developers to build the open source aspect of the project (about which more in a bit).

LShift are the progenitors of RabbitMQ, a message broker implementing AMQP. Over the last few years we’ve been thinking about and experimenting with different applications of messaging (and not just AMQP); for example, Rabbiter, which puts a Twitter-like spin on XMPP.

In the meantime, RabbitMQ itself has gained client libraries, gateways, adapters, and a smart, active community, to the point where it’s no longer just an AMQP message broker — it’s becoming more like a universal messaging adapter.

So we were very enthused when we heard that the BBC wanted a feeds hub, because it seemed to bring together lots of what we’d been thinking about abstractly, as well as new ideas and problems to solve, and give it all a concrete purpose.

When and how will it be open source?

We’re working on a prototype, and our plan is to make the source public as soon as it’s fit for consumption. We hope this will be in the next month.

In the meantime, I may talk about some of the core technical ideas, and our plans, here on our blog; and, of course, you can follow LShift on Twitter and the Radiolabs blog.

by mikeb on 08/05/09

Reverse HTTP == Remote CGI

I’ve been working recently on Reverse HTTP, an approach to making HTTP easier to use as the distributed object system that it is. My work is similar to the work of Lentczner and Preston, but is independently invented and technically a bit different: one, I’m using plain vanilla HTTP as a transport, and two, I’m focussing a little more on the enrollment, registration, queueing and management aspects of the system. My draft spec is here (though as I’m still polishing, please excuse its roughness), and you can play with some demos or download and play with an implementation of the spec.

Comments welcome!

by tonyg on 08/03/09

Streamlining HTTP

HTTP/1.1 is a lovely protocol. Text-based, sophisticated, flexible. It
does tend toward the verbose though. What if we wanted to use HTTP’s
semantics in a very high-speed messaging situation? How could we
mitigate the overhead of all those headers?

Now, bandwidth is pretty cheap: cheap enough that for most
applications the kind of approach I suggest below is ridiculously far
over the top. Some situations, though, really do need a more efficient
protocol: I’m thinking of people having to consume the [OPRA][] feed,
which is fast approaching 1 million messages per second ([1][], [2][],
[3][]). What if, in some bizarre situation, HTTP was the protocol used
to deliver a full OPRA feed?

### Being Stateful

Instead of having each HTTP request start with a clean slate after the
previous request on a given connection has been processed, how about
giving connections a memory?

Let’s invent a syntax for HTTP that is easy to translate back to
regular HTTP syntax, but that [avoids repeating ourselves][DRY] quite
so much.

Each line starts with an opcode and a colon. The rest of the line is
interpreted depending on the opcode. Each opcode-line is terminated
with CRLF.

V:HTTP/1.x            Set HTTP version identifier.
B:/some/base/url      Set base URL for requests.
M:GET                 Set method for requests.
<:somename            Retrieve a named configuration.
>:somename            Give the current configuration a name.
H:Header: value       Set a header.
-:/url/suffix         Issue a bodyless request.
+:/url/suffix 12345   Issue a request with a body.

Opcodes `V`, `B`, `M` and `H` are hopefully self-explanatory. I’ll
explore `<` and `>` below. The opcodes `-` and `+` actually complete
each request and tell the server to process the message.

Opcode `-` takes as its argument a URL fragment that gets appended to
the base URL set by opcode `B`. Opcode `+` does the same, but also
takes an ASCII `Content-Length` value, which tells the server to read
that many bytes after the CRLF of the `+` line, and to use the bytes
read as the entity body of the HTTP request.

`Content-Length` is a slightly weird header, more properly associated
with the entity body than the headers proper, which is why it gets
special treatment. (We could also come up with a syntax for indicating
chunked transfer encoding for the entity body.)

As an example, let’s encode the following `POST` request:

POST /someurl HTTP/1.1
Host: relay.localhost.lshift.net:8000
Content-Type: text/plain
Accept-Encoding: identity
Content-Length: 13

hello world

Encoded, this becomes

V:HTTP/1.1
B:/someurl
M:POST
H:Host: relay.localhost.lshift.net:8000
H:Content-Type: text/plain
H:Accept-Encoding: identity
+: 13
hello world

Not an obvious improvement. However, consider issuing 100 copies of
that same request on a single connection. With plain HTTP, all the
headers are repeated; with our encoded HTTP, the only part that is
repeated is:

+: 13
hello world

Instead of sending (151 * 100) = 15100 bytes, we now send 130 + (20 *
100) = 2130 bytes.
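
To make the saving mechanical, here's a throwaway sketch of an encoder for this
syntax (not part of any spec, and it only handles the opcodes used above): the
fixed parts of the request are emitted once, then each repeated request costs
only a `+` line plus its body.

function encodeSession(version, base, method, headers, bodies) {
  var out = 'V:' + version + '\r\n' +
            'B:' + base + '\r\n' +
            'M:' + method + '\r\n';
  headers.forEach(function (h) { out += 'H:' + h + '\r\n'; });
  bodies.forEach(function (body) {
    out += '+: ' + body.length + '\r\n' + body;  // empty URL suffix, then length
  });
  return out;
}

var body = 'hello world\r\n';                    // 13 bytes, as above
var wire = encodeSession('HTTP/1.1', '/someurl', 'POST',
  ['Host: relay.localhost.lshift.net:8000',
   'Content-Type: text/plain',
   'Accept-Encoding: identity'],
  new Array(100).fill(body));
// wire.length is 2130: 130 bytes of set-up plus 100 * 20 for the repeated requests.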

The scheme as described so far takes care of the unchanging parts of
repeated HTTP requests; for the changing parts, such as `Accept` and
`Referer` headers, we need to make use of the `<` and `>`
opcodes. Before I get into that, though, let’s take a look at how the
scheme so far might work in the case of OPRA.

### Measuring OPRA

Each OPRA quote update is [on average 66 bytes
long](http://www.nanex.net/OPRADirect/OPRADirect.htm#3), making for
around 63MB/s of raw content.

Let’s imagine that each delivery appears as a separate HTTP request:

POST /receiver HTTP/1.1
Host: opra-receiver.example.com
Content-Type: application/x-opra-quote
Accept-Encoding: identity
Content-Length: 66

blablablablablablablablablablablablablablablablablablablablablabla

That’s 213 bytes long: an overhead of 220% over the raw message
content.

Encoded using the stateful scheme above, the first request appears on
the wire as

V:HTTP/1.1
B:/receiver
M:POST
H:Host: opra-receiver.example.com
H:Content-Type: application/x-opra-quote
H:Accept-Encoding: identity
+: 66
blablablablablablablablablablablablablablablablablablablablablabla

and subsequent requests as

+: 66
blablablablablablablablablablablablablablablablablablablablablabla

for an amortized per-request size of 73 bytes: a much less problematic
overhead of 11%. In summary:

| Encoding     | Bytes per message body | Per-message overhead (bytes) | Size increase over raw content | Bandwidth at 1M msgs/sec |
|--------------|------------------------|------------------------------|--------------------------------|--------------------------|
| Plain HTTP   | 66                     | 147                          | 220%                           | 203.1 MBy/s              |
| Encoded HTTP | 66                     | 7                            | 11%                            | 69.6 MBy/s               |

Using plain HTTP, the feed doesn’t fit on a gigabit ethernet. Using
our encoding scheme, it does.

Besides the savings in terms of bandwidth, the encoding scheme could
also help with saving CPU. After processing the headers once, the
results of the processing could be cached, avoiding unnecessary
repetition of potentially expensive calculations such as routing,
authentication, and authorisation.

### Almost-identical requests

Above, I mentioned that some headers changed, while others stayed the
same from request to request. The `<` and `>` opcodes are intended to
deal with just this situation.

The `>` opcode stores the current state in a named register, and the
`<` opcode loads the current state from a register. Headers that don't
change between requests are placed into a register, and each request
loads from that register before setting its request-specific headers.

To illustrate, imagine the following two requests:

GET / HTTP/1.1
Host: www.example.com
Cookie: key=value
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

GET /style.css HTTP/1.1
Host: www.example.com
Cookie: key=value
Referer: http://www.example.com/
Accept: text/css,*/*;q=0.1

One possible encoding is:

V:HTTP/1.1
B:/
M:GET
H:Host: www.example.com
H:Cookie: key=value
>:config1
H:Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
-:
<:config1
H:Referer: http://www.example.com/
H:Accept: text/css,*/*;q=0.1
-:style.css

By using `<:config1`, the second request reuses the stored settings
for the method, base URL, HTTP version, and `Host` and `Cookie`
headers.

### It’ll never catch on, of course — and I don’t mean for it to

Most applications of HTTP do fine using ordinary HTTP syntax. I’m not
suggesting changing HTTP, or trying to get an encoding scheme like
this deployed in any browser or webserver at all. The point of the
exercise is to consider how low one might make the bandwidth overheads
of a text-based protocol like HTTP for the specific case of a
high-speed messaging scenario.

In situations where the semantics of HTTP make sense, but the syntax
is just too verbose, schemes like this one can be useful on a
point-to-point link. There’s no need for global support for an
alternative syntax, since people who are already forming very specific
contracts with each other for the exchange of information can choose
to use it, or not, on a case-by-case basis.

Instead of specifying a whole new transport protocol for high-speed
links, people can reuse the considerable amount of work that’s gone
into HTTP, without paying the bandwidth price.

### Aside: AMQP 0-8 / 0-9

Just as a throwaway comparison, I computed the minimum possible
overhead for sending a 66-byte message using AMQP 0-8 or 0-9. Using a
single-letter queue name, “`q`”, the overhead is 69 bytes per message,
or 105% of the message body. For our OPRA example at 1M messages per
second, that works out at 128.7 megabytes per second, and we’re back
over the limit of a single gigabit ethernet again. Interestingly,
despite AMQP’s binary nature, its overhead is much higher than a
simple syntactic rearrangement of a text-based protocol in this case.

### Conclusion

We considered the overhead of using plain HTTP in a high-speed
messaging scenario, and invented a simple alternative syntax for HTTP
that drastically reduces the wasted bandwidth.

For the specific example of the OPRA feed, the computed bandwidth
requirement of the experimental syntax is only 11% higher than the raw
data itself — nearly 3 times less than ordinary HTTP.

---

Note: [this][3] is a local mirror of [this](http://www.jandj.com/presentations/SIFMATech2008-CapacityGrowth.pdf).

[DRY]: http://en.wikipedia.org/wiki/Don%27t_repeat_yourself
[OPRA]: http://www.opradata.com/

[1]: http://www.wallstreetandtech.com/news/trading/showArticle.jhtml?articleID=163101596
[2]: http://en.wikipedia.org/wiki/Options_Price_Reporting_Authority#Messages_per_Second
[3]: http://www.lshift.net/blog/wp-content/uploads/2009/02/sifmatech2008-capacitygrowth.pdf

by tonyg on 27/02/09

Jeff Lindsay on Web Hooks

From Jason Salas's interview with Jeff Lindsay, the guy who invented the term web hooks:

“For example, the Facebook Platform, although pretty complicated and full of their own technology, is still at the core based on web hooks. They call out to a user-defined external web application and integrate that with their application. That’s quite a radically different use of web hooks compared to the way people think of them in relation to XMPP.”

That’s an interesting point: while nothing is stopping XMPP from being used this way, it’s not how it is currently used. XMPP seems to be gaining some adoption for asynchronous or messaging-style tasks, but I haven’t seen much in the way of generalised RPC over XMPP yet. (Perhaps I’ve overlooked something obvious?) HTTP, on the other hand, is being used both for asynchronous operations (HTTP push, where the HTTP response has no body, and serves as an acknowledgement of receipt or completion) and for synchronous RPC-like operations (JSON-RPC, SOAP, CGI, ordinary static web pages).

Web hooks can be seen as an approach to making it easier for people to participate in the world of distributed objects that is HTTP — a worthy goal.

by tonyg on 26/02/09
