Ringing Room Under the Hood, part 2 || Leland Paul Reimer

This is the second in a series of posts discussing the technical decisions that went into creating Ringing Room, the first web platform for distributed English change-ringing. The first post gives a little background on change-ringing so that non-ringers can follow the series. This post looks at how Ringing Room clients communicate with the server.

Ringing Room Under the Hood
Communications (this post)
Towers
(coming soon!)

This post is going to discuss how information on Ringing Room is passed back and forth between ringers and the server, including the communications protocol and the server stack.

To P2P or…

In a perfect world, we would like Ringing Room clients to communicate peer-to-peer, such that every client machine communicates directly with every other client. This architecture minimizes the time that it takes each client to hear about any event on any other machine; adding any kind of central server adds both processing overhead on the server and the extra time for the information to go up to the server and then out to the other clients. In practice, though, Ringing Room instead uses a client-server architecture, where all communication is mediated by a central server. The reasons for this boil down to the underlying communications protocols we have available.

Most of the web is built on TCP, which is (in general) not designed for P2P. The alternative is UDP, the protocol underlying most VoIP technology and many videoconferencing platforms. UDP lends itself better to P2P, but it has some disadvantages for this specific use-case:

UDP does not guarantee delivery. If a TCP packet goes missing along the way, the sender and receiver negotiate resending the packet; UDP sacrifices this insurance in order to reduce latency. For an application like videoconferencing, the worst case for a lost packet is some artifact or skip in the video — certainly annoying, but not crippling. For ringing, a lost packet might result in the bell-state getting out of sync between clients — that is, some of the ringers might see that bell 5 has rung a total of 5 times, while others see that it has rung a total of 6 times. Recall that the parity of the stroke is important for helping ringers keep track of their ringing; a skip like this is likely to cause the ringing to grind to a halt.

UDP does not guarantee order. This is closely related to the last point: UDP sacrifices strict packet-ordering in order to reduce latency. Again, this is pretty bad for ringing: The order in which the bells ring is, in fact, the entire point of the exercise!

UDP is harder in the browser. Ok, this one is a bit subjective, and is changing quickly. WebRTC, the browser-based UDP standard, is only now achieving broad browser penetration, and libraries & tutorials for it are thin on the ground. At least at the time I started working on Ringing Room, I found getting started with WebRTC to be overwhelming!

All of these challenges can be overcome: the delivery & order guarantees would need to be re-implemented in our code, but that’s entirely possible.¹ But when I first designed Ringing Room I made the choice to go with WebSockets (which are TCP-based), and thus a client-server model rather than a P2P one. While I’m still considering a switch to P2P for the future, I think that this model had a number of knock-on effects that really helped Ringing Room become a success in the first few months; I’ll discuss one of these below.
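To give a feel for what re-implementing those guarantees would involve, here is a minimal sketch of the receiving side only. It assumes every message carries a sequence number assigned by the sender; the class name `OrderedReceiver` and the message payloads are hypothetical, not taken from Ringing Room or from WebRTC. Actual retransmission is left out: the sketch only reports which sequence numbers would need to be requested again.

```python
from typing import Dict, List


class OrderedReceiver:
    """Hypothetical receiver that releases messages strictly in sequence order."""

    def __init__(self) -> None:
        self.next_seq = 0                  # next sequence number we may deliver
        self.pending: Dict[int, str] = {}  # early arrivals held back until the gap fills

    def receive(self, seq: int, payload: str) -> List[str]:
        """Accept one packet; return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []                      # duplicate packet: ignore it
        self.pending[seq] = payload
        delivered = []
        # Release the longest unbroken run starting at next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> List[int]:
        """Sequence numbers we would have to ask the sender to resend."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]


receiver = OrderedReceiver()
first = receiver.receive(0, "bell 1")     # in order: delivered at once
held = receiver.receive(2, "bell 3")      # arrives early: held back
gap = receiver.missing()                  # packet 1 is outstanding
released = receiver.receive(1, "bell 2")  # fills the gap, releases both
```

Note the cost hidden in `held`: bell 3 cannot sound until bell 2 has arrived, so the latency that UDP saved is partly given back whenever a packet is lost or reordered.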

When to ring?

One of the first decisions I needed to make about the communications protocol was: When a user hits a key to ring a bell, when should we actually ring that bell?

It would be easy to just have this happen immediately: As soon as the ringer hits the relevant key, the bell on their client changes state, makes a sound, and sends a signal to the server. The server then forwards that signal to all other clients in the room, which ring that bell as soon as they receive the server’s signal. In essence, the “canonical” time that the bell rings is the moment when the user presses a key; I’ll call this the “client-timed” model.

The alternative is a server-timed model, where the ringing event happens only once the server says it should. That is, when the user hits a key to ring a bell, their client sends a signal to the server, which then responds by broadcasting that signal back to all clients in the room. The ringer won’t see or hear their own bell respond until the signal has bounced back to them from the server. In effect, the canonical time that the bell rings is the moment when the server broadcasts its signal.

The client-timed model might at first seem better — surely having the app be more responsive is better? However, the server-timed model has a subtle advantage. If the bell rings as soon as the user presses a key, they have no feedback about how laggy their connection is — they can’t tell whether other ringers are hearing their bell soon after keypress or at some delay. By contrast, if the ringer has to wait for the signal to come back from the server, they have some implicit information about the delay and can compensate for it, pressing the key just a bit earlier in order to achieve an even rhythm.²
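The difference between the two models can be made concrete with a little arithmetic. The sketch below is a simplified illustration, not Ringing Room code: it assumes each client has a fixed one-way latency to the server, ignores server processing time, and uses made-up client names and latency values. The function `hear_times` is hypothetical.

```python
from typing import Dict


def hear_times(latency_ms: Dict[str, float], ringer: str, model: str) -> Dict[str, float]:
    """Time (ms after keypress) at which each client hears the ringer's bell.

    latency_ms maps each client to its assumed one-way latency to the server.
    """
    if model not in ("client-timed", "server-timed"):
        raise ValueError(f"unknown model: {model}")
    up = latency_ms[ringer]  # keypress signal travelling from ringer to server
    # Every client hears the bell once the server's broadcast reaches it.
    times = {client: up + down for client, down in latency_ms.items()}
    if model == "client-timed":
        times[ringer] = 0.0  # the ringer's own bell sounds immediately
    return times


latencies = {"alice": 100.0, "bob": 20.0}  # made-up one-way latencies in ms
client_timed = hear_times(latencies, "alice", "client-timed")
server_timed = hear_times(latencies, "alice", "server-timed")
```

With these numbers, Bob hears Alice’s bell 120 ms after her keypress under either model. Under the client-timed model Alice hears it at 0 ms and has no hint of that delay; under the server-timed model she hears it at 200 ms, which is the implicit feedback that lets her press a little earlier.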

Ringing Room has always run on a server-timed model, though I’ll admit that I first picked this option somewhat arbitrarily! But I think time has shown that this model works best for this particular application. Graham John’s Handbell Stadium, which uses a P2P-based system, initially ran on a client-timed model; he has since added an extremely clever “ping balancing” system that (among other things) imitates some of the effect of the server-timed model.³ Handbell Stadium users generally reported a significant increase in the evenness of the ringing after ping balancing was implemented.
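The footnote on ping balancing describes it as a moving average of ping times applied as an artificial delay between input and response. The following is a rough sketch of that one idea only, and is not Handbell Stadium’s actual implementation: the class `PingBalancer`, the window size, and the sample values are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque


class PingBalancer:
    """Hypothetical helper: delay the local response by a moving average of pings."""

    def __init__(self, window: int = 5) -> None:
        # Keep only the most recent `window` ping samples (in ms).
        self.samples: Deque[float] = deque(maxlen=window)

    def record_ping(self, ping_ms: float) -> None:
        self.samples.append(ping_ms)  # the oldest sample drops out when full

    def local_delay_ms(self) -> float:
        """Artificial delay to apply between a keypress and the local bell sound."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


balancer = PingBalancer(window=3)
for ping in (40.0, 60.0, 80.0, 100.0):  # made-up ping measurements in ms
    balancer.record_ping(ping)
delay = balancer.local_delay_ms()        # averages only the last three samples
```

Delaying the ringer’s own bell by roughly the time others wait to hear it gives the ringer the same kind of feedback about lag that the server-timed model provides for free.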

The sysadmin details

Without getting too much into the gory details: Ringing Room is built with Flask, mostly because I’m most comfortable working in Python.⁴ All of the static pages are served by nginx straight from the Flask app, just as normal. Real-time communication is handled by Socket.IO, or more particularly Flask-SocketIO on the server side. Socket.IO is an event-based system, meaning that we can set up listeners for particular named “events” on both server & client; all the event-types used by Ringing Room are documented in the repo.
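For readers who haven’t met the event-based style before, the pattern can be shown without any networking at all. The sketch below is a toy stand-in written in plain Python, not the Socket.IO or Flask-SocketIO API; the `EventBus` class and the event names `bell_pressed` and `bell_rung` are hypothetical and are not the event names documented in the repo.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Toy stand-in for an event-based socket library (not the Socket.IO API)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def on(self, event: str) -> Callable:
        """Decorator that registers a listener for a named event."""
        def register(handler: Callable[[Any], None]) -> Callable[[Any], None]:
            self.handlers[event].append(handler)
            return handler
        return register

    def emit(self, event: str, data: Any) -> None:
        """Call every listener registered under this event name."""
        for handler in self.handlers[event]:
            handler(data)


server = EventBus()
client = EventBus()
heard: List[int] = []


@server.on("bell_pressed")          # hypothetical event name
def relay(data: dict) -> None:
    client.emit("bell_rung", data)  # the server passes the event on to clients


@client.on("bell_rung")             # hypothetical event name
def ring(data: dict) -> None:
    heard.append(data["bell"])      # the bell only sounds on the server's say-so


server.emit("bell_pressed", {"bell": 5})
```

In the real stack the two buses sit on opposite ends of a network connection, but the shape is the same: each side names the events it cares about and attaches a function to each one.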

Unfortunately, nginx itself can’t handle Socket.IO / WebSockets on its own. This means that there’s a third piece to the server architecture, namely Gunicorn, which runs the Flask app; nginx is set up as a reverse proxy in front of it:

server {

  [...]

  location / {
    include proxy_params;
    proxy_pass http://unix:/srv/www/ringingroom/ringingroom.sock;
  }
}

All of this is currently hosted on a (set of) DigitalOcean droplets; I’ve used DigitalOcean for my personal website for years, so I picked it mostly out of familiarity.

And that’s that! In the next post, I’ll discuss more about the server-side architecture, in particular how Ringing Room goes about modelling towers.

¹ As proof of this, there are now a few other distributed ringing platforms which use UDP, including Handbell Stadium and Muster. Notably, both of those are native apps, not web apps.
² Change-ringers are quite accustomed to dealing with delay between action and sound: Tower bells often take around a full second between beginning to swing and making a sound. Anecdotally, many Ringing Room ringers do, in fact, consciously adjust their keypresses by some fixed offset to compensate for lag.
³ Ping balancing adds a moving average of the ping times between all clients as an artificial delay between user input and response. It’s quite a clever system, and I’ve wanted to port some bits of it over to Ringing Room for a while!
⁴ I’ll put a plug in here for Miguel Grinberg’s Flask Mega-Tutorial, which is how I learned nearly everything I know about Flask.