technology from back to front

EvServer, part 3: Simplified Etherpad clone

Realtime collaboration



I hate to write using markup languages.



The problem with markup is that when I see a typo in the rendered output, I have to click through the text and search for the exact place with the mistake. I have the same feeling about editing Wikipedia, documentation on code.google.com, Trac, Blogger, WordPress and so on.



But I hate writing in WYSIWYG editors even more. Almost all graphical editors generate crappy output: badly closed HTML tags, broken styles, stripped white space. Given these problems, I usually try to stay with markup.



The next problem is that I’m the only person who can fix mistakes in my texts. My friends tell me about typos, but I have to fix them by hand. I tried sharing texts on Google Docs, but the collaboration doesn’t work well enough.





A few months ago I saw an online real-time editor called Etherpad. It’s quite a cool toy. It solves the problem of sharing the text with my friends, but it doesn’t support any markup – it’s just a plaintext editor.



But I know how to create Comet applications easily using EvServer and Django. I realized that I could build a simplified Etherpad clone, which supports a markup language!





The Etherpad clone



I decided to spend a very limited time on this project. Actually I wanted to do everything in my spare time in a week, that is about 6 afternoons.



Features I wanted, ordered by importance:

  • must support editing by many users in real time – like Etherpad
  • must generate rendered markup with reasonable latency
  • must support all major browsers (though IE and Konqueror are not a must)
  • must be dead simple
  • should be able to scale up (for reasons I’ll describe below)
  • should show who created what – like Etherpad. In the end I dropped this requirement due to the limited time.



With such hard time constraints I was ready to make some technological decisions:

  • Python on the server side
  • EvServer as a server
  • Django
  • HAProxy as a load balancer
  • use EvServer’s Comet transports
  • RabbitMQ as a message broker
  • MemcacheDB as a database
  • Memcache for temporary storage
  • Support reStructuredText as a markup language



The hardest part of the project is synchronizing updates from many clients in real time: merging them, resolving collisions and propagating the changes back to users. Fortunately Neil Fraser from Google solved this problem and published the solution as a very nice project, diff-match-patch.
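To give a feel for what merging concurrent edits involves, here is a minimal sketch. It does not use diff-match-patch itself; as a stand-in it uses Python's standard difflib module, and the helper names edits and merge are made up for this illustration. It also assumes the two users changed different parts of the text. Where the changes overlap it simply gives up, whereas diff-match-patch applies patches fuzzily on a best-effort basis.

```python
from difflib import SequenceMatcher


def edits(base, new):
    """Return the changed regions of base as (start, end, replacement) tuples."""
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge(base, mine, theirs):
    """Apply both users' edits to the common base text.

    Hypothetical helper: raises ValueError when the edits overlap,
    instead of trying to resolve the collision.
    """
    # A set drops edits that both users made identically.
    combined = sorted(set(edits(base, mine)) | set(edits(base, theirs)))
    pieces = []
    pos = 0
    for start, end, replacement in combined:
        if start < pos:
            raise ValueError("overlapping edits")
        pieces.append(base[pos:start])  # untouched text before the edit
        pieces.append(replacement)      # the edit itself
        pos = end
    pieces.append(base[pos:])           # untouched tail
    return "".join(pieces)


base = "The quick brown fox"
merged = merge(base, "The quick red fox", "The quick brown fox jumps")
print(merged)
```

One user changed a word in the middle and the other appended text at the end, so both edits survive in the merged result.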



It seemed that the only job I had left was to glue these parts together.





Scalability?



A few days after Etherpad was launched I saw this dialog:



It seems that Etherpad failed to scale. I don’t expect my clone to be popular, but it would be nice to scale better, even if it’s only in theory.



So I built the application with scalability in mind. The major hub in the application is the message broker – RabbitMQ. As long as this AMQP server scales, so will my application. And RabbitMQ ought to be scalable – it was built exactly for that.



To synchronize user changes between browsers with low latency, the EvServer application shouldn’t do any work that requires high CPU usage. Managing and comparing changes from browsers is quite fast. The tough part is generating the rendered output from the markup. Rendering reStructuredText markup to HTML can take up to a few seconds of processor time! I decided that this job should be done by an external process that can afford greater latency. Except for that, the message flow is pretty standard for a real-time web application.
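The idea of pushing the slow rendering into a separate worker can be sketched as follows. This is only an illustration under loud assumptions: a standard-library queue stands in for the RabbitMQ broker, a thread stands in for the external process, a plain dict stands in for the storage, and render is a trivial placeholder for the real reStructuredText renderer. All names here are hypothetical.

```python
import html
import queue
import threading


def render(markup):
    """Placeholder for the slow markup-to-HTML step (not real reStructuredText)."""
    return "<pre>" + html.escape(markup) + "</pre>"


def render_worker(jobs, results):
    """Consume (doc_id, markup) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        doc_id, markup = job
        results[doc_id] = render(markup)


jobs = queue.Queue()   # stand-in for a broker queue
results = {}           # stand-in for the temporary storage

worker = threading.Thread(target=render_worker, args=(jobs, results))
worker.start()

# The latency-sensitive side only enqueues work and returns immediately.
jobs.put(("doc1", "Hello *world* <b>"))
jobs.put(None)         # tell the worker to stop
worker.join()

print(results["doc1"])
```

The point of the design is that the code handling browser updates never waits for the renderer; it only publishes a job and carries on.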







Burning the time



Once I had decided what I wanted to build, I was ready to start programming. I didn’t have a detailed plan, but I accomplished the task almost on time. This is roughly how the time and effort broke down:

  • 1st evening: creating initial html view
  • 2nd evening: figuring out javascript hacks for the textarea
  • 3rd evening: created basic message flow – the latency counter started to work
  • waiting at the airport and flight: integrated the diff-match-patch library, synchronization started to work
  • 4th evening: fixed message routing in amqp, integrated the persistent database
  • 5th evening: javascript fixes for browsers
  • all Sunday: making it work on IE and final fixes





The application






Here you can play with the final application. It’s not especially impressive, but it works. I met my goals and proved that with the proper tools it’s possible to build a simplified Etherpad clone in just a few days. I must admit that creating this app was great fun.



I’m even starting to think that the application could be useful. Maybe I should consider adding more popular markup languages like Markdown, Wiki or Trac.





by marek on 02/03/09

  1. Wow, that’s very cool. Add in some web hooks to save the file off somewhere else, and it’ll be my next general use text editor!

  2. Reminds me a lot of hikij, although that is all done in client side javascript and doesn’t round-trip to the server.

  3. You might want to take a look at http://collabedit.com

  4. How many users can edit the same document realtime? Etherpad can handle at most 8 users as far as I remember.

  5. Thanks for the links. Both hikij and collabedit.com are very interesting.

    How many users can edit the same document realtime?

    There aren’t any limits.

  6. The demo died just now. I’ll investigate it soon.

    Update: should be working again

  7. Tesseracter
    on 05/03/09 at 11:58 pm

    The largest problem i have seen with collaborative editors is the undo function. while i would love to move my coding to a collaborative system, i can’t do it without an undo button. i know it gets more complicated with the design with multiple users, but it would certainly be nice.

  8. troelskn
    on 05/03/09 at 11:38 pm

    See also: http://attacklab.net/showdown/

    And:
    upflow

  9. @Tesseracter: this is where using a DVCS for applying deltas to the state of a document comes in useful ;-) (and gives offline-capability too, if the UI is sophisticated enough)

  10. What’s the best production deployment model for evserver and Django?
    Also, what do you think of Orbited and how does it compare to evserver?

  11. Nice to see someone else using a live markup preview!

    Collision resolution seems like a lot of wasted effort for collaborative editors. Users want to edit with other people, but just about never want to type in the exact same place at once.

    We built a little editor called draftastic_ around live markup much like yours, but we do locking on arbitrary sections to keep users from ever colliding. (Which also has the handy side-effect of making per-user change logs and undos easy.)

    .. _draftastic: http://draftastic.com/

  12. @Erik: Orbited is used somewhere in production, while EvServer is still an alpha project. The biggest difference is that Orbited is (or at least was, I haven’t checked it for a while) a proxy that has to sit in front of the normal server, while EvServer can be set up on a separate machine with a different subdomain. This allows you to smoothly develop Comet code without interfering with your normal HTTP traffic.

  13. manuel Astudillo
    on 26/06/09 at 2:42 pm

    Hi,

    The link to the demo application is not working anymore.

    Is there any chance that you could make the repo with the code public? I think this little app has big potential, especially for blog writers that use ReST as their markup language.

    kind regards,

    Manuel.

  14. I’d like to second Manuel: is there a source link somewhere? I’d love to add some of these features into a project management tool I’m building.

    Aaron

 
 



2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK, +44 (0)20 7729 7060