The curious world of web RTC (Part 2: Servers)

This is part 2 of my series on WebRTC.

Read Part 1 here: The curious world of web RTC (Part 1: Introduction)

Let's talk servers. Every WebRTC article I read addressed the servers at the very end of the discussion. I am a very bad reader: after the first few articles, I genuinely thought WebRTC didn't need any servers. That is a partial truth. The actual communication usually does not require a server, but at least one server is required to set up the initial connection.

My plan is to keep the series as simple as possible, so I will use a well-known pair of names as the example: Alice and Bob, two parties who want to establish an RTC peer connection. For this example, let's assume Alice wants to send letters directly to Bob.

Signalling Server:

A signalling server is almost always required in real-world scenarios. WebRTC does not specify how signalling should work, so it can be implemented in many ways, and there are many well-known signalling servers on the web. But because a signalling server is so simple, and because having full control over it helps, I highly recommend building your own.

Here's how the communication happens from Alice to Bob:

Alice generates an offer SDP (Session Description Protocol), a fancy way of saying a string that describes Alice's real-time communication capabilities and her network address. If Alice were a real person and her IP address were a real address, the imaginary SDP might say something like this:

I live in house number 4. I am capable of understanding and speaking VP9 video and Opus audio.
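A real SDP is less poetic: it is plain text with one `key=value` field per line, and capabilities like "I speak Opus" live in `a=rtpmap` lines. The snippet below uses a hand-trimmed, illustrative SDP (not a complete offer copied from a browser) and a hypothetical `listCodecs` helper that pulls the codec names out of it.

```typescript
// Illustrative, heavily trimmed SDP (not a complete, valid offer).
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 98",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:98 VP9/90000",
].join("\r\n");

interface Codec {
  payloadType: number;
  name: string;
  clockRate: number;
}

// Hypothetical helper: collects every codec announced in an a=rtpmap line.
function listCodecs(sdp: string): Codec[] {
  const codecs: Codec[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
    const match = /^a=rtpmap:(\d+) ([^/\s]+)\/(\d+)/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
      });
    }
  }
  return codecs;
}

console.log(listCodecs(sampleSdp).map((c) => c.name)); // [ 'opus', 'VP9' ]
```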

Alice sends her offer SDP to the signalling server. Think of the signalling server as a post box in a post office. It might be just a REST API server that receives Alice's SDP as a POST request, or, better, a WebSocket server.

Now Bob collects the SDP from the post box (the signalling server), or receives it as a socket message (depending on the signalling server implementation), and creates an answer SDP. The answer contains the same kind of information as the offer. Alice then collects or receives the answer from the signalling server.
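To make the post box idea concrete, here is a minimal in-memory sketch. The `PostBox` class and the `SignalMessage` shape are hypothetical (WebRTC does not prescribe any signalling format), and there is no network here; in a real deployment the same two operations would sit behind REST endpoints or WebSocket messages.

```typescript
// Messages Alice and Bob drop into the post box. The shape is an assumption:
// WebRTC leaves the signalling format entirely up to the application.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Hypothetical in-memory post box: one mailbox per (room, recipient).
class PostBox {
  private boxes = new Map<string, SignalMessage[]>();

  private key(room: string, recipient: string): string {
    return `${room}/${recipient}`;
  }

  // Drop a message addressed to `recipient` in `room`.
  send(room: string, recipient: string, message: SignalMessage): void {
    const k = this.key(room, recipient);
    const box = this.boxes.get(k) ?? [];
    box.push(message);
    this.boxes.set(k, box);
  }

  // Collect (and empty) everything waiting for `recipient` in `room`.
  collect(room: string, recipient: string): SignalMessage[] {
    const k = this.key(room, recipient);
    const messages = this.boxes.get(k) ?? [];
    this.boxes.delete(k);
    return messages;
  }
}

// Usage: the offer/answer exchange. The SDP strings are placeholders.
const postBox = new PostBox();
postBox.send("room-1", "bob", { kind: "offer", sdp: "v=0 (alice's offer)" });
const forBob = postBox.collect("room-1", "bob");
postBox.send("room-1", "alice", { kind: "answer", sdp: "v=0 (bob's answer)" });
const forAlice = postBox.collect("room-1", "alice");
console.log(forBob[0].kind, forAlice[0].kind); // offer answer
```

Collecting empties the mailbox, which mimics a polling REST server; with WebSockets the server would instead push each message to the recipient as soon as it arrives.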

In a small and happy world, Alice now knows Bob's address and capabilities, Bob knows Alice's, and they should be able to communicate.

Enter the world of NAT and firewalls:

In the real world there are thousands of houses numbered 4, in different areas. The same happens with IP addresses behind a NAT (Network Address Translation). NAT is easy to understand: think of the wireless router at your home or work. Every device on your Wi-Fi network gets a private IP address on the local network created by your router. The router then uses its own public IP address to talk to the internet on your devices' behalf. (In many cases, your router is itself behind a NAT run by your ISP, so there might be multiple layers of NAT between your device and its public IP address.) So, in our example, if Alice says she lives in house #4, that house might be in any locality of a big city. Most of the time, the SDP alone can't create a connection, and a signalling server alone isn't enough.
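The "house #4" addresses are the reserved private ranges that every NAT reuses. Here is a small sketch with a hypothetical `isBehindNat` helper that checks an IPv4 address against the RFC 1918 private ranges and the RFC 6598 shared range used by carrier-grade NAT at ISPs. It ignores IPv6, loopback and other special ranges.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit number.
function ipv4ToNumber(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// Ranges that only mean something inside some NAT ("house #4").
const NAT_RANGES: Array<[string, number]> = [
  ["10.0.0.0", 8], // RFC 1918 private
  ["172.16.0.0", 12], // RFC 1918 private
  ["192.168.0.0", 16], // RFC 1918 private
  ["100.64.0.0", 10], // RFC 6598 shared space (carrier-grade NAT)
];

// Hypothetical helper: true if the address falls in any of the ranges above.
function isBehindNat(ip: string): boolean {
  const addr = ipv4ToNumber(ip);
  return NAT_RANGES.some(([base, prefix]) => {
    // Keep only the top `prefix` bits of both addresses before comparing.
    const mask = (0xffffffff << (32 - prefix)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipv4ToNumber(base) & mask) >>> 0);
  });
}

console.log(isBehindNat("192.168.1.4"), isBehindNat("203.0.113.5")); // true false
```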

STUN Server:

STUN servers are used to determine a client's public IP address. A peer (Alice or Bob) asks the STUN server about its IP address, and the server simply replies with the IP address and port the request appeared to come from. The clients use this information to create ICE candidates (more on ICE candidates later).
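The question and answer are tiny binary messages defined in RFC 5389. As a sketch (not what any browser literally runs), here is how a Binding Request is laid out, and how the reply's XOR-MAPPED-ADDRESS attribute, the "what appears to be your IP address" part, is decoded. Sending the request over UDP (3478 is the default STUN port) and locating the attribute inside the response are left out; only IPv4 is handled.

```typescript
const MAGIC_COOKIE = 0x2112a442;

// Build a 20-byte STUN Binding Request (RFC 5389 header, no attributes).
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer); // DataView is big-endian by default
  view.setUint16(0, 0x0001); // message type: Binding Request
  view.setUint16(2, 0); // message length: no attributes follow
  view.setUint32(4, MAGIC_COOKIE); // fixed magic cookie
  msg.set(transactionId, 8); // random ID chosen by the client
  return msg;
}

// Decode the value of an XOR-MAPPED-ADDRESS attribute (IPv4 only).
// The server XORs the port and address with the magic cookie.
function decodeXorMappedAddress(value: Uint8Array): { ip: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) throw new Error("only IPv4 is handled in this sketch");
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [24, 16, 8, 0].map((shift) => (addr >>> shift) & 0xff).join(".");
  return { ip, port };
}

console.log(buildBindingRequest(new Uint8Array(12)).length); // 20
```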

You might not need a STUN server of your own. Because STUN requires very little processing, many public STUN servers are available for free. Here's a list.

TURN Server:

TURN servers are relay servers. When a direct connection between Alice and Bob is not possible, a TURN server relays the traffic between them (not so peer-to-peer in this case). Because all the media flows through the server, running a TURN server is expensive in bandwidth, and you probably won't find any free ones. Many services provide managed TURN servers, or you can run your own with some effort. More on those in upcoming blog posts.

STUN and TURN servers are collectively called ICE servers; they are handed to the peer connection in its `iceServers` list.
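Here is a sketch of what that list can look like. The `IceServer` interface mirrors the browser's `RTCIceServer` dictionary but is declared locally so the code runs outside a browser. `stun:stun.l.google.com:19302` is one commonly used public STUN server; the TURN host, username and credential are hypothetical placeholders, since TURN access normally requires credentials from whoever runs the server.

```typescript
// Local mirror of the browser's RTCIceServer shape.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical settings for a TURN server you run or rent.
interface TurnSettings {
  url: string;
  username: string;
  credential: string;
}

// Hypothetical helper: a public STUN server always, TURN only if configured.
function buildIceServers(turn?: TurnSettings): IceServer[] {
  const servers: IceServer[] = [{ urls: "stun:stun.l.google.com:19302" }];
  if (turn) {
    servers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return servers;
}

const iceServers = buildIceServers({
  url: "turn:turn.example.com:3478", // placeholder host
  username: "alice", // placeholder credentials
  credential: "not-a-real-secret",
});
// In a browser, this list is passed as: new RTCPeerConnection({ iceServers })
console.log(iceServers.length); // 2
```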

Other Servers:

Complex WebRTC applications might involve other servers, such as media servers that mix multi-party video feeds, transcode for unsupported devices or protocols, apply real-time filters, or record sessions (recording can also be done on the client, thanks to the MediaStream Recording API).

There might be other specialised servers for specific requirements.

What are ICE Candidates:

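ICE (Interactive Connectivity Establishment) candidates are the different suggested ways to reach a client: its local address (a host candidate), its public address as reported by a STUN server (a server-reflexive candidate), or an address on a TURN server (a relayed candidate). Every client gathers multiple ICE candidates, depending on the ICE servers provided to the peer connection. Unlike the SDP, which is handled as one big string, each candidate in the browser API is an RTCIceCandidate object with multiple properties, and it can be serialised to a JSON string and deserialised back.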
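Inside the object, the heart of a candidate is still a one-line string with a fixed grammar (RFC 8445). The sketch below uses a made-up server-reflexive candidate and a hypothetical `parseCandidate` helper, and shows the JSON round trip over the signalling channel. The JSON object uses a subset of the fields a browser's `RTCIceCandidate.toJSON()` produces.

```typescript
interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  address: string;
  port: number;
  type: string; // host | srflx | prflx | relay
  relatedAddress?: string;
  relatedPort?: number;
}

// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> [name value]...
function parseCandidate(line: string): ParsedCandidate {
  const tokens = line.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  if (tokens.length < 8 || tokens[6] !== "typ") throw new Error(`malformed candidate: ${line}`);
  const parsed: ParsedCandidate = {
    foundation: tokens[0],
    component: Number(tokens[1]),
    protocol: tokens[2].toLowerCase(),
    priority: Number(tokens[3]),
    address: tokens[4],
    port: Number(tokens[5]),
    type: tokens[7],
  };
  // The rest are name/value pairs; raddr/rport give the local address behind the NAT.
  for (let i = 8; i + 1 < tokens.length; i += 2) {
    if (tokens[i] === "raddr") parsed.relatedAddress = tokens[i + 1];
    if (tokens[i] === "rport") parsed.relatedPort = Number(tokens[i + 1]);
  }
  return parsed;
}

// Made-up srflx candidate: public address from STUN, private address in raddr/rport.
const example =
  "candidate:842163049 1 udp 1694498815 203.0.113.5 54321 typ srflx raddr 192.168.1.4 rport 54321";
const parsed = parseCandidate(example);

// What travels through the signalling server is JSON.
const wire = JSON.stringify({ candidate: example, sdpMid: "0", sdpMLineIndex: 0 });
const roundTrip = parseCandidate(JSON.parse(wire).candidate);
console.log(parsed.type, parsed.address, roundTrip.port); // srflx 203.0.113.5 54321
```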

The flow goes something like this:

Alice creates an offer SDP and sends it to Bob via the signalling server.
Bob creates an answer SDP and sends it to Alice, also via the signalling server.
They try to connect with only the SDP; most of the time, they fail.
Alice and Bob start generating ICE candidates with the information gathered from their ICE servers (STUN and TURN servers) and send those candidates to each other via the signalling server.
Both parties try to connect using each other's candidates and choose the most efficient path in the process.
If everything goes well, a peer connection is established.
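"Most efficient" is not arbitrary: each candidate carries a priority, and RFC 8445 recommends a formula whose type preferences put direct paths first and relayed (TURN) paths last. The sketch below computes only candidate priorities with the recommended defaults; the real process also builds candidate pairs, runs connectivity checks and nominates a winner, which is left out here. `candidatePriority` is a hypothetical helper name.

```typescript
// RFC 8445 recommended type preferences: direct paths beat relayed ones.
const TYPE_PREFERENCE: Record<string, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type: string, localPreference = 65535, componentId = 1): number {
  const typePref = TYPE_PREFERENCE[type];
  if (typePref === undefined) throw new Error(`unknown candidate type: ${type}`);
  return typePref * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Rank one candidate of each common type, highest priority first.
const ranked = ["relay", "srflx", "host"]
  .map((type) => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map((r) => `${r.type}: ${r.priority}`).join("\n"));
// host: 2130706431
// srflx: 1694498815
// relay: 16777215
```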

To relate it to the real world:

Alice creates an offer SDP and sends it to Bob.
Bob gets Alice's offer SDP, creates an answer SDP and sends it to Alice.
They try to send each other letters, and fail.
Alice and Bob each go to the STUN server and learn about their city and locality.
Then Alice and Bob generate ICE candidates from the information provided by the STUN server and send them to each other. They try sending letters again. If a letter goes through, the connection is established.
If they still can't send letters, they go to a restaurant (a TURN server) and share its address with each other. Then they meet at the restaurant and exchange letters there.

There are hundreds of operations in between; they are left out intentionally. In later posts, I will discuss the implementation of the process in more detail. I might have oversimplified, or overcomplicated, the process. Let me know your thoughts in the comments.