So You Want to Be a Video Communications App Developer

  • Video chat, video conferencing
  • Group calls, audio rooms
  • Co-watching, watch parties, chat between stream viewers
  • Live customer engagement
  • Virtual events, webinars, town halls
  • Live streaming with guest participants
  • Remote presence, telework, telehealth
  • Static content. Want to find out what’s on the lunch menu this week? It’s easier when the restaurant has a website. No more calling or emailing them! This was the web in 1995: great for distributing content that doesn’t update constantly and isn’t personalized to a particular user, but not very alive.
  • Forms. Planning on buying something? Want to ask a question? Instead of emailing them, it could save everyone time if they had a form on their website for structured requests.
  • Apps. From those contact forms, it’s a short conceptual leap to make the server answer requests automatically using a server-side application that generates a new web page. This was “web apps 1.0”, and it replaced email-based automatic content services which didn’t have the benefit of a structured UI.
  • Text chat. In 2001–2005, web apps became dynamic thanks to new browser APIs. Instead of rendering a whole new page for each server request, you could fetch data in small pieces and update parts of the web page as needed: Web 2.0! This unbridled network access also enabled realtime text chat, which then became a core part of business web design thanks to easy-to-use platforms like Intercom. This was the last piece of email to be unbundled: Intercom and others largely took over direct customer contact, providing tools better suited to each business vertical than simply having customer representatives read and send email.

Technology underpinnings of live Internet video

Transmitting realtime video on the Internet isn’t trivial. It’s extremely bandwidth-intensive compared to other applications and also extremely sensitive to fluctuations in delivery timing, a challenging combination on the Internet’s packet network.
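To put a rough number on “bandwidth-intensive”, here is a minimal sketch that computes the bitrate of uncompressed video. The figures are my own assumptions for illustration, not measurements: a 1280×720 picture at 30 frames per second, stored at 12 bits per pixel (8-bit samples with 4:2:0 chroma subsampling). The function name is made up for this example.

```python
def raw_bitrate_mbps(width: int, height: int, fps: int, bits_per_pixel: int = 12) -> float:
    """Bitrate of uncompressed video in megabits per second.

    bits_per_pixel defaults to 12, which assumes 8-bit samples with
    4:2:0 chroma subsampling (an assumption made for this sketch).
    """
    bits_per_second = width * height * fps * bits_per_pixel
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # Assumed example: 720p at 30 frames per second.
    print(f"{raw_bitrate_mbps(1280, 720, 30):.1f} Mbit/s uncompressed")
```

Under these assumptions a single uncompressed 720p stream is in the hundreds of megabits per second, which is why everything discussed below depends on aggressive compression.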

RTMP

Honestly, it’s not even worth expanding these acronyms. They’re all named the same: “real-time-something-not-descriptive-nor-useful”. Just remember that this first one under discussion is The One With The “M”.

RTP

What a difference one less “M” makes. We saw that RTMP is used for single video streams at fairly high bandwidth, where latency isn’t a prime concern (like television).

WebRTC

Enter WebRTC. It’s not a single protocol, but more like a shared framework for video communications applications that work in web browsers. It offers APIs and mandates the underlying protocols (like RTP), as well as the codecs used to compress the actual audio/video data. Thanks to WebRTC, Firefox users can talk to Chrome users, and both can talk to servers that understand the relevant parts of WebRTC.

The platforms

Zoom’s brand new SDK (introduced March 2021) was mentioned earlier. Since Zoom is the thousand-pound gorilla of consumer mindshare, we’ll definitely want to take a look at its developer offering. It will be pitted against three already established video communications platforms, presented here in alphabetical order.

Agora

Founded in 2013, Agora is dual-headquartered in Santa Clara, California and Shanghai, China.

Daily.co

Daily is the upstart among these companies, but it actually has five years of history already.

Twilio

Twilio brands itself a “cloud communications platform as a service”, or CPaaS for (not-very-)short.

Zoom

Ah, the company whose name practically became a common noun for “video meeting” during the Covid-19 pandemic.

The excluded

One important competitor left out here is Amazon’s Chime SDK. I’ve wanted to focus primarily on the needs of startups and individual developers, and the Chime SDK is geared towards serving enterprises that already have a substantial investment in the Amazon Web Services (AWS) ecosystem. I don’t feel I would do it justice in this context.

Pricing comparison

Reason prevails, and we find ourselves determined to leverage a well-maintained commercial platform rather than cobble something together from various semi-maintained open source widgets and risk being in the miracle business. That means we must pay, and big product dreams could translate into big bills (although at that point it’s usually a good problem to have, especially in the present startup funding environment).

Zoom’s products and pricing

To even test the Zoom Video SDK, you need to create a brand new Zoom user account and enter a valid credit card. The sign-up page tries to get you to sign up for the $1,000 / year plan. When you sign up for pay-as-you-go, Zoom still sends you an initial $0 invoice. Everything about this flow is designed for corporations, not individuals.

Twilio’s products and pricing

Twilio’s enterprise roots are showing in their Video API pricing. The website makes you choose upfront between three products, and at every corner lurks a button asking if you’d like to talk to a Twilio salesperson instead. Let’s try to figure it out without the hard-sell pitch.

  • Twilio Video WebRTC Go. This is a free sampler of their SDK, but it really won’t get you very far because it only supports 1:1 WebRTC calls (i.e. peer-to-peer between two persons). If you just want to see what the SDK is like, this is the way to go, but it won’t give you much insight into the full product.
  • Twilio Video P2P. This product’s tagline is “Build peer-to-peer video applications with unlimited TURN relay”. In other words, it’s the next step up in WebRTC capabilities compared to the free tier. We get multiple participants, and Twilio’s server will help relay the streams as needed (that’s what TURN does)… But fundamentally this is the hippie commune model of everyone connecting to everyone, which can be taxing for upload bandwidth. The price is $0.0015 per participant minute. So, a 30-minute call with five participants would come to 22.5 cents.
  • Twilio Video Groups. Remember SFU from the WebRTC discussion earlier? Well, this product is basically an SFU server. Up to 50 participants can join a session. The price is $0.004 / participant minute. A 30-minute call with the maximum of fifty participants would thus cost $6. If you want a recording of the call in a single video stream, it costs a bit extra (60 cents per hour). Or, if you want to record all the participant streams as separate video files, it doubles the price.
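The per-participant-minute arithmetic above can be sketched in a few lines. This is only an illustration using the two rates quoted in the list; the product keys and the function name are made up for the example, and recording surcharges are left out. It uses `Decimal` so that small currency amounts don’t pick up floating-point rounding noise.

```python
from decimal import Decimal

# Per-participant-minute rates in USD, as quoted in the list above.
# The dictionary keys are hypothetical labels for this sketch.
RATES = {
    "p2p": Decimal("0.0015"),
    "groups": Decimal("0.004"),
}


def call_cost(product: str, participants: int, minutes: int) -> Decimal:
    """Cost in USD of one call billed per participant minute."""
    if participants < 1 or minutes < 0:
        raise ValueError("need at least one participant and a non-negative duration")
    return RATES[product] * participants * minutes


if __name__ == "__main__":
    # 30-minute P2P call with five participants: 22.5 cents.
    print(call_cost("p2p", 5, 30))
    # 30-minute Groups call with fifty participants: 6 dollars.
    print(call_cost("groups", 50, 30))
```

The same function makes it easy to plug in your own expected call sizes and durations before committing to a product tier.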

Daily’s products and pricing

Daily.co is essentially the “anti-Twilio” in their pricing.

Agora’s products and pricing

We already briefly saw Agora’s product list. Let’s take another look:

Next

One of the reasons for writing this post is to get first-hand experience of those implicit limitations that vendors might not be eager to discuss. I’m going to write a basic video chat application four times, using each vendor’s web API, and then spend a bit of time trying to bump into the limitations. What happens if you join a video room with many participants in Agora vs. Daily vs. Twilio vs. Zoom? Can we use these demo apps to get some measurements that could help us understand the quality tradeoffs on each platform?

Pauli Olavi Ojala


"Say the words" is how the world's oldest surviving book begins. Writing is the original magic. 💮 Video tools @ Facebook. Previously Vidpresso (YC W14), Neonto