I will provide you with two video files, video1.flv and video2.wmv. You need to embed them on the page and ensure that they use progressive download. Both video files are larger than 1GB, so it will be obvious whether they are playing before they have completely downloaded. You will need to use the Flash video player that I have provided for the Flash video. Which one of the HTML snippets shown below should you use?
Snippet A
<object type="application/x-shockwave-flash" data="/player.swf" > <param name="movie" value="/player.swf"/> <param name="FlashVars" value="flv=/video1.flv"/> </object><object type="video/x-ms-wmv"> <param name="FileName" value="/video2.wmv"/> </object>
Snippet B
<object type="application/x-shockwave-flash" data="/player.swf" > <param name="movie" value="/player.swf"/> <param name="FlashVars" value="flv=http://myserver.lshift.net/video1.flv"/> </object><object type="video/x-ms-wmv"> <param name="FileName" value="http://myserver.lshift.net/video2.wmv"/> </object>
LShift have used the EPiServer CMS on several customer projects, and it generally does most things you would want to do with a CMS in a simple way. EPiServer is a .NET-based CMS, and if you understand ASP.NET templated pages and templated controls it is very straightforward, with a minimal learning curve.
One challenge I faced on a recent project was to implement a particular HTML navigation design using EPiServer. The HTML design called for the navigation to be rendered as nested HTML lists, with the current section of the site annotated with a particular class.
For example, if you were looking at “Tasty Fish” in the “Cat Food” section of the site, the HTML should look something like this:
<ul>
<li>Dog Food
<ul>
<li>Meaty Bones</li>
</ul>
</li>
<li class="selected">Cat Food
<ul>
<li>Tasty Fish</li>
</ul>
</li>
</ul>
On initial inspection the EPiServer CMS appears to have two controls that may help: the EPiServer:MenuList and the EPiServer:PageTree. I first attempted to use the EPiServer:MenuList, which allowed me to do this:
<ul>
<li>Dog Food</li>
<li>Cat Food</li>
</ul>
<ul>
<li class="selected">Tasty Fish</li>
</ul>
This isn’t quite what the design required: the complete site navigation tree needed to be rendered, since CSS was being used to show and hide menus in response to mouse rollovers.
So for attempt two I tried the EPiServer:PageTree component. This component is designed to render a whole tree of pages, so it should be an appropriate solution. It is a very flexible component and provides lots of templates for customising the layout based upon the state of the tree. This is what I ended up with:
<ul>
<li>Dog Food
<ul>
<li>Meaty Bones</li>
</ul>
</li>
<li>Cat Food
<ul>
<li class="selected">Tasty Fish</li> <!-- OH NO THIS IS WRONG -->
</ul>
</li>
</ul>
This was very close! However it didn’t meet the design requirement; the top level item that contained the current page needed to be tagged with the CSS class, not the item corresponding to the current page. There didn’t seem to be an easy way to achieve this with the EPiServer components.
I decided I probably needed some type of custom control. I then wrote three implementations of a navigation control, moving from sinful generation of HTML in a code behind, through my own templated control, until arriving at the obvious solution: the asp:ListView control and a simple code behind. This was a nice solution because it uses a standard ASP.NET component in a standard way, the complication of tagging the selected top level item could be hidden away in a small code behind, and the markup was completely under the control of the HTML developer.
The navigation section of the ASP page looked like this:
<asp:ListView ID="Level1" runat="server" ItemPlaceHolderID="Level1Item">
<LayoutTemplate>
<ul><asp:PlaceHolder ID="Level1Item" runat="server"/></ul>
</LayoutTemplate>
<ItemTemplate>
<li class='<%# ((Boolean)Eval("Selected")) ? "selected" : "" %>'><%# Eval("Name") %>
<asp:ListView ID="Level2" runat="server" ItemPlaceHolderID="Level2Item" DataSource='<%# Eval("Children") %>'>
<LayoutTemplate>
<ul><asp:PlaceHolder ID="Level2Item" runat="server"/></ul>
</LayoutTemplate>
<ItemTemplate>
<li><%# Eval("Name") %></li>
</ItemTemplate>
</asp:ListView>
</li>
</ItemTemplate>
</asp:ListView>
This is a straightforward usage of nested ListViews and ASP.NET data binding expressions: all of the markup is visible, and it can be explained to an HTML developer in a short amount of time. New navigation levels can be added in exactly the same way that the Level2 navigation was added to the Level1 navigation. The ternary operator within the data binding expression, ((Boolean)Eval("Selected")) ? "selected" : "", determines whether the navigation item is selected; this is a standard mechanism for conditional rendering with ASP.NET data bound controls.
This was combined with a page behind like this:
protected override void OnLoad(System.EventArgs e)
{
base.OnLoad(e);
Level1.DataSource = BuildMenuItems();
Level1.DataBind();
}
private List<MenuItem> BuildMenuItems()
{
List<MenuItem> menuItems = new List<MenuItem>();
PageData homePage = GetPage(PageReference.StartPage);
foreach(PageData child in GetChildren(homePage.PageLink))
{
if(child.VisibleInMenu)
{
MenuItem item = CreateMenuItem(child, true);
item.Selected = findPage(CurrentPage.PageGuid, child);
menuItems.Add(item);
}
}
return menuItems;
}
private MenuItem CreateMenuItem(PageData page, Boolean includeChildren)
{
MenuItem item = new MenuItem(page.PageName);
item.Url = page.LinkURL;
if (includeChildren)
{
PageDataCollection children = GetChildren(page.PageLink);
foreach (PageData child in children)
{
if (child.VisibleInMenu)
{
item.Children.Add(CreateMenuItem(child, true));
}
}
}
return item;
}
private Boolean findPage(Guid id, PageData parent)
{
if (id == parent.PageGuid) return true;
foreach (PageData page in GetChildren(parent.PageLink))
{
if (page.PageGuid == id)
{
return true;
}
if(findPage(id, page))
{
return true;
}
}
return false;
}
With a helper class MenuItem defined like this:
public class MenuItem
{
public MenuItem(String name)
{
this.Name = name;
}
public String Name { get; set; }
public String Url { get; set; }
public Boolean Selected { get; set; }
private List<MenuItem> children = new List<MenuItem>();
public List<MenuItem> Children {
get
{
return children;
}
set
{
children = value;
}
}
}
The page behind creates MenuItem instances for each page in the navigation. The top level item gets tagged as selected only if the current page is one of its children. This is a reasonable amount of code to write but it was the smallest solution that solved the problem and made the HTML obvious and available for modification by HTML developers.
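The selection rule itself is independent of EPiServer and ASP.NET, so it can be illustrated in a few lines. The following is only a sketch in Python, not the code used on the project: `Page`, `contains` and `build_menu` are hypothetical names, and the page tree is a plain in-memory structure rather than anything fetched from the CMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """Hypothetical stand-in for a CMS page: a name plus child pages."""
    name: str
    children: List["Page"] = field(default_factory=list)


def contains(page, target):
    """True if target is this page or sits anywhere beneath it."""
    return page is target or any(contains(c, target) for c in page.children)


def build_menu(home, current):
    """One entry per top level page, selected if it holds the current page."""
    return [
        {
            "name": top.name,
            "selected": contains(top, current),
            "children": [child.name for child in top.children],
        }
        for top in home.children
    ]


tasty_fish = Page("Tasty Fish")
home = Page("Home", [
    Page("Dog Food", [Page("Meaty Bones")]),
    Page("Cat Food", [tasty_fish]),
])
menu = build_menu(home, tasty_fish)
for entry in menu:
    print(entry["name"], entry["selected"])
```

With “Tasty Fish” as the current page, only the “Cat Food” entry is flagged, which is the behaviour the design asked for.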
Recently, Alan Ogilvie from A&Mi at the BBC announced that they were developing a “Feeds Hub”, and outlined their ambitions for it.
He also mentioned LShift, RabbitMQ and open source, and I would like to explain, from our point of view, what this project is and how we’re working with the BBC.
Alan describes the central problem they want to solve:
The number of new projects across the BBC starting to use feeds in creative ways is growing very quickly – just think of spaghetti… on a massive scale. So what do we do? What are the options? We could go down the route of gathering together a centralised ‘Feed Usage’ committee with members across the BBC, to ‘federate’ feeds so that they are all produced in the same way but, in practice, this never truly works and is likely to stifle creativity. Often it is quite difficult to convince people to work together when they have already experienced the freedom of doing what they want – often they are concerned that their projects will be delayed. Not all feeds sources that we use or want to use are under our control, things like Twitter, Flickr, blogs, etc. Federation will never solve all our problems anyway – for example, it can’t help when a source feed is turned off, it doesn’t monitor failures.
The idea, then, is to bring the spaghetti under control; not by mandating things be done a certain way, but by overlaying a bunch of management and monitoring tools that would otherwise be ad-hoc or not exist.
We also want to enable people to discover, reuse and adapt existing feeds, rather than reinvent them. Again, not by enforcement, but by making it easier to do so than to not.
And we’re not just talking about RSS — there are (at the BBC and in general) many different protocols and formats flying about.
Technically-speaking, this adds up to a couple of pieces of kit: a platform for relaying feeds through, that supports routing, transformation and distribution by a number of different means; and, a user interface for discovering, creating, managing and monitoring these feeds.
In short: LShift are developing the core technology, helping the BBC shepherd the various strands of the project along, and helping engage with developers to build the open source aspect of the project (about which more in a bit).
LShift are the progenitors of RabbitMQ, a message broker implementing AMQP. Over the last few years we’ve been thinking about and experimenting with different applications of messaging (and not just AMQP); for example, Rabbiter, which puts a Twitter-like spin on XMPP.
In the meantime, RabbitMQ itself has gained client libraries, gateways, adapters, and a smart, active community, to the point where it’s no longer just an AMQP message broker — it’s becoming more like a universal messaging adapter.
So we were very enthused when we heard that the BBC wanted a feeds hub, because it seemed to bring together lots of what we’d been thinking about abstractly, as well as new ideas and problems to solve, and give it all a concrete purpose.
We’re working on a prototype, and our plan is to make the source public as soon as it’s fit for consumption. We hope this will be in the next month.
In the meantime, I may talk about some of the core technical ideas, and our plans, here on our blog; and, of course, you can follow LShift on Twitter and the Radiolabs blog.
I’ve been working recently on Reverse HTTP, an approach to making HTTP easier to use as the distributed object system that it is. My work is similar to the work of Lentczner and Preston, but is independently invented and technically a bit different: one, I’m using plain vanilla HTTP as a transport, and two, I’m focussing a little more on the enrollment, registration, queueing and management aspects of the system. My draft spec is here (though as I’m still polishing, please excuse its roughness), and you can play with some demos or download and play with an implementation of the spec.
Comments welcome!
HTTP/1.1 is a lovely protocol. Text-based, sophisticated, flexible. It does tend toward the verbose though. What if we wanted to use HTTP’s semantics in a very high-speed messaging situation? How could we mitigate the overhead of all those headers?
Now, bandwidth is pretty cheap: cheap enough that for most applications the kind of approach I suggest below is ridiculously far over the top. Some situations, though, really do need a more efficient protocol: I’m thinking of people having to consume the OPRA feed, which is fast approaching 1 million messages per second (1, 2, 3). What if, in some bizarre situation, HTTP was the protocol used to deliver a full OPRA feed?
Instead of having each HTTP request start with a clean slate after the previous request on a given connection has been processed, how about giving connections a memory?
Let’s invent a syntax for HTTP that is easy to translate back to regular HTTP syntax, but that avoids repeating ourselves quite so much.
Each line starts with an opcode and a colon. The rest of the line is interpreted depending on the opcode. Each opcode-line is terminated with CRLF.
V:HTTP/1.x            Set HTTP version identifier.
B:/some/base/url      Set base URL for requests.
M:GET                 Set method for requests.
<:somename            Retrieve a named configuration.
>:somename            Give the current configuration a name.
H:Header: value       Set a header.
-:/url/suffix         Issue a bodyless request.
+:/url/suffix 12345   Issue a request with a body.
Opcodes V, B, M and H are hopefully self-explanatory. I’ll
explore < and > below. The opcodes - and + actually complete
each request and tell the server to process the message.
Opcode - takes as its argument a URL fragment that gets appended to
the base URL set by opcode B. Opcode + does the same, but also
takes an ASCII Content-Length value, which tells the server to read
that many bytes after the CRLF of the + line, and to use the bytes
read as the entity body of the HTTP request.
Content-Length is a slightly weird header, more properly associated
with the entity body than the headers proper, which is why it gets
special treatment. (We could also come up with a syntax for indicating
chunked transfer encoding for the entity body.)
As an example, let’s encode the following POST request:
POST /someurl HTTP/1.1
Host: relay.localhost.lshift.net:8000
Content-Type: text/plain
Accept-Encoding: identity
Content-Length: 13

hello world
Encoded, this becomes
V:HTTP/1.1
B:/someurl
M:POST
H:Host: relay.localhost.lshift.net:8000
H:Content-Type: text/plain
H:Accept-Encoding: identity
+: 13
hello world
Not an obvious improvement. However, consider issuing 100 copies of that same request on a single connection. With plain HTTP, all the headers are repeated; with our encoded HTTP, the only part that is repeated is:
+: 13
hello world
Instead of sending (151 * 100) = 15100 bytes, we now send 130 + (20 * 100) = 2130 bytes.
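The byte counts above can be checked mechanically. The sketch below is an illustration, not part of any real implementation: the helper names `plain_request`, `encoded_preamble` and `encoded_request` are made up for the purpose, and it assumes the 13-byte body is "hello world" followed by CRLF, since the source only states the Content-Length.

```python
CRLF = "\r\n"


def plain_request(method, url, headers, body):
    """Render one ordinary HTTP/1.1 request as it appears on the wire."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body)}")
    return (CRLF.join(lines) + CRLF + CRLF + body).encode("ascii")


def encoded_preamble(method, base_url, headers):
    """The V, B, M and H lines, which are sent once per connection."""
    lines = ["V:HTTP/1.1", f"B:{base_url}", f"M:{method}"]
    lines += [f"H:{name}: {value}" for name, value in headers]
    return "".join(line + CRLF for line in lines).encode("ascii")


def encoded_request(suffix, body):
    """A '+' line followed by the body, which is sent for every request."""
    return f"+:{suffix} {len(body)}{CRLF}{body}".encode("ascii")


headers = [
    ("Host", "relay.localhost.lshift.net:8000"),
    ("Content-Type", "text/plain"),
    ("Accept-Encoding", "identity"),
]
body = "hello world" + CRLF  # assumption: the trailing CRLF makes up the 13 bytes

plain = plain_request("POST", "/someurl", headers, body)
preamble = encoded_preamble("POST", "/someurl", headers)
repeat = encoded_request("", body)

plain_total = len(plain) * 100
encoded_total = len(preamble) + len(repeat) * 100
print(len(plain), len(preamble), len(repeat))
print(plain_total, encoded_total)
```

Under that assumption about the body, the three lengths come out as 151, 130 and 20 bytes, matching the figures in the text.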
The scheme as described so far takes care of the unchanging parts of
repeated HTTP requests; for the changing parts, such as Accept and
Referer headers, we need to make use of the < and >
opcodes. Before I get into that, though, let’s take a look at how the
scheme so far might work in the case of OPRA.
Each OPRA quote update is on average 66 bytes long, making for around 63MB/s of raw content.
Let’s imagine that each delivery appears as a separate HTTP request:
POST /receiver HTTP/1.1
Host: opra-receiver.example.com
Content-Type: application/x-opra-quote
Accept-Encoding: identity
Content-Length: 66

blablablablablablablablablablablablablablablablablablablablablabla
That’s 213 bytes long: an overhead of 220% over the raw message content.
Encoded using the stateful scheme above, the first request appears on the wire as
V:HTTP/1.1
B:/receiver
M:POST
H:Host: opra-receiver.example.com
H:Content-Type: application/x-opra-quote
H:Accept-Encoding: identity
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
and subsequent requests as
+: 66
blablablablablablablablablablablablablablablablablablablablablabla
for an amortized per-request size of 73 bytes: a much less problematic overhead of 11%. In summary:
| Encoding | Bytes per message body | Per-message overhead (bytes) | Size increase over raw content | Bandwidth at 1M msgs/sec |
|---|---|---|---|---|
| Plain HTTP | 66 | 147 | 220% | 203.1 MBy/s |
| Encoded HTTP | 66 | 7 | 11% | 69.6 MBy/s |
Using plain HTTP, the feed doesn’t fit on a gigabit ethernet. Using our encoding scheme, it does.
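The figures in the table can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes that "MBy/s" means units of 2^20 bytes per second (which is what makes the numbers match), treats gigabit ethernet as 10^9 bits per second, and ignores Ethernet, IP and TCP framing overhead entirely.

```python
MIB = 2 ** 20                 # assumed meaning of "MBy" in the table
MESSAGES_PER_SECOND = 1_000_000
BODY_BYTES = 66               # average OPRA quote update, from the text


def bandwidth_mib_per_s(bytes_per_message):
    """Bandwidth needed to carry the feed at the given per-message size."""
    return bytes_per_message * MESSAGES_PER_SECOND / MIB


# Per-message overhead in bytes, taken from the table above.
overheads = {"Plain HTTP": 147, "Encoded HTTP": 7}

# Raw capacity of a gigabit link, in the same units.
gigabit_mib_per_s = 1e9 / 8 / MIB

results = {}
for label, overhead in overheads.items():
    needed = bandwidth_mib_per_s(BODY_BYTES + overhead)
    results[label] = needed
    fits = "fits" if needed < gigabit_mib_per_s else "does not fit"
    print(f"{label}: {overhead / BODY_BYTES:.0%} overhead, "
          f"{needed:.1f} MBy/s, {fits} on gigabit ethernet")
```

The unrounded overhead for plain HTTP is a little under 223%, which the table rounds to 220%.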
Besides the savings in terms of bandwidth, the encoding scheme could also help with saving CPU. After processing the headers once, the results of the processing could be cached, avoiding unnecessary repetition of potentially expensive calculations such as routing, authentication, and authorisation.
Above, I mentioned that some headers changed, while others stayed the
same from request to request. The < and > opcodes are intended to
deal with just this situation.
The > opcode stores the current state in a named register, and the
< opcode loads the current state from a register. Headers that don’t
change between requests are placed into a register, and each request
loads from that register before setting its request-specific headers.
To illustrate, imagine the following two requests:
GET / HTTP/1.1
Host: www.example.com
Cookie: key=value
Accept: HTTP Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

GET /style.css HTTP/1.1
Host: www.example.com
Cookie: key=value
Referer: http://www.example.com/
Accept: text/css,*/*;q=0.1
One possible encoding is:
V:HTTP/1.1
B:/
M:GET
H:Host: www.example.com
H:Cookie: key=value
>:config1
H:Accept: HTTP Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
-:
<:config1
H:Referer: http://www.example.com/
H:Accept: text/css,*/*;q=0.1
-:style.css
By using <:config1, the second request reuses the stored settings
for the method, base URL, HTTP version, and Host and Cookie
headers.
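To show how a receiver might expand this syntax back into ordinary HTTP, here is a minimal decoder sketch. It is an illustration under several assumptions rather than a reference implementation: the class name `StatefulDecoder` is invented, a repeated H opcode is assumed to replace an earlier header of the same name, state is assumed to persist between requests until changed, and the Accept values are shortened for readability.

```python
import copy


class StatefulDecoder:
    """Expands opcode lines into plain HTTP requests (sketch only)."""

    def __init__(self):
        self.state = {"version": "HTTP/1.1", "base": "",
                      "method": "GET", "headers": {}}
        self.registers = {}

    def feed(self, data):
        """Decode a complete byte string; return the requests it contains."""
        requests, pos = [], 0
        while pos < len(data):
            end = data.index(b"\r\n", pos)
            opcode, _, arg = data[pos:end].decode("ascii").partition(":")
            pos = end + 2
            if opcode == "V":
                self.state["version"] = arg
            elif opcode == "B":
                self.state["base"] = arg
            elif opcode == "M":
                self.state["method"] = arg
            elif opcode == "H":
                name, _, value = arg.partition(": ")
                self.state["headers"][name] = value  # assumed: replaces
            elif opcode == ">":
                self.registers[arg] = copy.deepcopy(self.state)
            elif opcode == "<":
                self.state = copy.deepcopy(self.registers[arg])
            elif opcode == "-":
                requests.append(self._render(arg, None))
            elif opcode == "+":
                suffix, _, length = arg.rpartition(" ")
                body = data[pos:pos + int(length)]
                pos += int(length)
                requests.append(self._render(suffix, body))
            else:
                raise ValueError(f"unknown opcode {opcode!r}")
        return requests

    def _render(self, suffix, body):
        s = self.state
        lines = [f"{s['method']} {s['base']}{suffix} {s['version']}"]
        lines += [f"{k}: {v}" for k, v in s["headers"].items()]
        if body is not None:
            lines.append(f"Content-Length: {len(body)}")
        head = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
        return head + (body or b"")


encoded = "\r\n".join([
    "V:HTTP/1.1", "B:/", "M:GET",
    "H:Host: www.example.com", "H:Cookie: key=value",
    ">:config1",
    "H:Accept: text/html",
    "-:",
    "<:config1",
    "H:Referer: http://www.example.com/",
    "H:Accept: text/css,*/*;q=0.1",
    "-:style.css",
]).encode("ascii") + b"\r\n"

decoded = StatefulDecoder().feed(encoded)
for request in decoded:
    print(request.decode("ascii"))
```

Storing a deep copy in the register matters here: without it, the Accept header set for the first request would leak into config1 and reappear after the load.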
Most applications of HTTP do fine using ordinary HTTP syntax. I’m not suggesting changing HTTP, or trying to get an encoding scheme like this deployed in any browser or webserver at all. The point of the exercise is to consider how low one might make the bandwidth overheads of a text-based protocol like HTTP for the specific case of a high-speed messaging scenario.
In situations where the semantics of HTTP make sense, but the syntax is just too verbose, schemes like this one can be useful on a point-to-point link. There’s no need for global support for an alternative syntax, since people who are already forming very specific contracts with each other for the exchange of information can choose to use it, or not, on a case-by-case basis.
Instead of specifying a whole new transport protocol for high-speed links, people can reuse the considerable amount of work that’s gone into HTTP, without paying the bandwidth price.
Just as a throwaway comparison, I computed the minimum possible
overhead for sending a 66-byte message using AMQP 0-8 or 0-9. Using a
single-letter queue name, “q“, the overhead is 69 bytes per message,
or 105% of the message body. For our OPRA example at 1M messages per
second, that works out at 128.7 megabytes per second, and we’re back
over the limit of a single gigabit ethernet again. Interestingly,
despite AMQP’s binary nature, its overhead is much higher than a
simple syntactic rearrangement of a text-based protocol in this case.
We considered the overhead of using plain HTTP in a high-speed messaging scenario, and invented a simple alternative syntax for HTTP that drastically reduces the wasted bandwidth.
For the specific example of the OPRA feed, the computed bandwidth requirement of the experimental syntax is only 11% higher than the raw data itself — nearly 3 times less than ordinary HTTP.
From Jason Salas’s interview with Jeff Lindsay, the guy who invented the term web hooks:
“For example, the Facebook Platform, although pretty complicated and full of their own technology, is still at the core based on web hooks. They call out to a user-defined external web application and integrate that with their application. That’s quite a radically different use of web hooks compared to the way people think of them in relation to XMPP.”
That’s an interesting point: while nothing is stopping XMPP from being used this way, it’s not how it is currently used. XMPP seems to be gaining some adoption for asynchronous or messaging-style tasks, but I haven’t seen much in the way of generalised RPC over XMPP yet. (Perhaps I’ve overlooked something obvious?) HTTP, on the other hand, is being used both for asynchronous operations (HTTP push, where the HTTP response has no body, and serves as an acknowledgement of receipt or completion) and for synchronous RPC-like operations (JSON-RPC, SOAP, CGI, ordinary static web pages).
Web hooks can be seen as an approach to making it easier for people to participate in the world of distributed objects that is HTTP — a worthy goal.
A long, long time ago there was a WSGI spec. This document described a lot of interesting stuff. Among other very important paragraphs you could find a hidden gem:
[...] applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with mulitpart boundaries (for “server push”), or just before time-consuming tasks (such as reading another block of an on-disk file). [...]
    import datetime
    import time

    def clockdemo(environ, start_response):
        start_response("200 OK", [('Content-type','text/plain')])
        for i in range(100):
            yield "%s\n" % (datetime.datetime.now(),)
            time.sleep(1)

The problem is that this way of programming just doesn’t work well. It’s not scalable, requires a lot of threads and can eat a lot of resources. That’s why the feature has been forgotten.
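To make the block-by-block idea from the spec concrete, here is a small sketch of how a server might consume a generator-based application. It is not how any particular WSGI server is implemented: `drive` is a hypothetical stand-in for the server loop, `clock_app` is a shortened variant of the clock example, and the blocks are bytes because the sketch targets Python 3.

```python
import datetime


def clock_app(environ, start_response):
    """A generator-based WSGI app that produces its output in three blocks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    for _ in range(3):
        yield ("%s\n" % datetime.datetime.now()).encode("ascii")


def drive(app):
    """Hypothetical server loop: call the app, then pull one block at a time."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = headers

    blocks = []
    for block in app({}, start_response):
        # A real server would write and flush each block to the client here,
        # which is what lets the browser see output before the app finishes.
        blocks.append(block)
    return seen["status"], blocks


status, blocks = drive(clock_app)
print(status, len(blocks))
```

Each iteration of the loop in `drive` hands control back to the application until its next `yield`, so a blocking call such as `time.sleep` inside the application stalls whichever thread is running that loop.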
    import datetime
    import socket

    def clockdemo(environ, start_response):
        start_response("200 OK", [('Content-type','text/plain')])
        sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            for i in range(100):
                yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                yield "%s\n" % (datetime.datetime.now(),)
        except GeneratorExit:
            pass
        sd.close()

So I created a server that supports it: EvServer, the Asynchronous Python WSGI Server.
| Server | Fetches/sec |
|---|---|
| evserver | 4254 |
| spawning with threads | 1237 |
| spawning without threads | 2200 |
| cherrypy wsgi server | 1700 |
Examples
Admittedly, using raw WSGI for regular web applications is a bit inconvenient. Fortunately, decent web frameworks support passing iterators from the web application all the way down through the framework to the WSGI server. My list of frameworks that support iterators includes Django and Web.py.
Django
Django 1.0 supports returning iterators from views. This is Django code for the clock example:
    import datetime
    import socket
    from django.http import HttpResponse

    def djangoclock(request):
        def iterator():
            sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                while True:
                    yield request.environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                    yield '%s\n' % (datetime.datetime.now(),)
            except GeneratorExit:
                pass
            sd.close()
        return HttpResponse(iterator(), mimetype="text/plain")

The problem is that this code is not going to work using the standard ./manage.py runserver development server. Fortunately, it’s very easy to integrate EvServer with Django: you only need to put this into settings.py:
    INSTALLED_APPS = (
        [...]
        'django.contrib.sites',
        'evserver', # <<< THIS LINE enables runevserver command
    )

Now you can test your app using ./manage.py runevserver.
Web.py

    import datetime
    import socket
    import web

    class webpyclock:
        def GET(self, name):
            web.header('Content-Type','text/plain', unique=True)
            environ = web.ctx.environ
            def iterable():
                sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # any udp socket
                try:
                    while True:
                        yield environ['x-wsgiorg.fdevent.readable'](sd, 1.0)
                        yield "%s\n" % (datetime.datetime.now(),)
                except GeneratorExit:
                    pass
                sd.close()
            return iterable()

The full source code is included in the EvServer examples directory. You can run this code using the command:
    evserver --exec "import examples.framework_webpy; application = examples.framework_webpy.application"
If you use Firefox, go and install the Ctrl-Tab add-on.
Tabs are great for reducing clutter, but they fail to make life much easier because the tab navigation doesn’t support the common patterns of use. For example, I end up opening the same page in multiple tabs because it is quicker to do that than to hunt for it in the existing tabs. Eventually even that is unmanageable and I have to manually garbage collect tabs.
Ctrl-Tab makes tabs usable again by fixing the navigation. The key combination Control-Tab cycles through the tab history (in the same way as task switchers), so that a single tap takes you to the previous tab you visited, which is almost always the thing you wanted. Successive taps of Tab show thumbnails of the tab contents while you flip through them. Better still, Control-A (Control-Shift-a) shows thumbnails of all open tabs, and lets you filter incrementally by typing in the auto-focussed box, then select the tab with the arrow keys or mouse. In a single swoop it’s now much easier for me to switch to something I have open than to open it again – the way it should be.
After my talk on Javascript DVCS at the Osmosoft Open Source Show ‘n Tell, I went to visit Osmosoft, the developers of TiddlyWiki, to talk about giving TiddlyWiki some DVCS-like abilities. Martin Budden and I sat down and built a couple of prototypes: one where each tiddler is versioned every time it is edited, and one where versions are snapshots of the entire wiki, and are created each time the whole wiki is saved to disk.
| Regular DVCS | SynchroTiddly |
|---|---|
| Repository | The html file contains everything |
| File within repository | Tiddler within wiki |
| Commit a revision | Save the wiki to disk |
| Save a text file | Edit a tiddler |
| Push/pull synchronisation | Import from other file |
If you have Firefox (it doesn’t work with other browsers yet!) you can experiment with an alpha-quality DVCS-enabled TiddlyWiki here. Take a look at the “Versions” tab, in the control panel at the right-hand-side of the page. You’ll have to download it to your local hard disk if you want to save any changes.
It’s still a prototype, a work-in-progress: the user interface for version management is clunky, it’s not cross-browser, there are issues with shadow tiddlers, and I’d like to experiment with a slightly different factoring of the repository format, but it’s good enough to get a feel for the kinds of things you might try with a DVCS-enabled TiddlyWiki.
Despite its prototypical status, it can synchronize content between different instances of itself. For example, you can have a copy of a SynchroTiddly on your laptop, email it to someone else or share it via HTTP, and import and merge their changes when they make their modified copy visible via an HTTP server or email it back to you.
I’ve been documenting it in the wiki itself — if anyone tries it out, please feel free to contribute more documentation; you could even make your altered wiki instance available via public HTTP so I can import and merge your changes back in.
Yesterday I presented my work on Javascript diff, diff3, merging and version control at the Osmosoft Open Source Show ‘n Tell. (Previous posts about this stuff: here and here.) The slides for the talk are here. They’re a work-in-progress – as I think of things, I’ll continue to update them.
To summarise: I’ve used the diff3 I built in May to make a simple Javascript distributed version-control system that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the rev import/export feature and XMLHttpRequest, and the storage format and repository representation is brutally naive (and because it doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory).
You can try out a few browser-based demos of the features of the diff and DVCS libraries:
The code is available using Mercurial by
(or by simply browsing to that URL and exploring from there). It’s quite small and (I hope) easily understood – at the time of writing,
The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
The code is split into a few different files:
for an example of how to use the DVCS, and
for an example of the repository format and the use of the revision import feature.
* The diff and diff3 code itself.
* Graph utilities (for computing LCA etc.)
* The DVCS and pseudo-file-system code.
* The repository history-graph-drawing code and a Python script for drawing the little tile images used in rendering a repository history graph.