News from Industry

Answering ChatGPT questions about WebRTC

bloggeek - Mon, 07/07/2025 - 12:30

Explore the most common WebRTC questions in ChatGPT and get answers to them that a human would give (as opposed to letting ChatGPT both ask and answer…

I am trying to use Generative AI in my own work more and more. I’ve been told (as well as told others) that I won’t be replaced by AI, but I will be replaced by someone who uses Generative AI. So the only thing to do is to replace myself by learning to use AI myself.

It started small, with the Midjourney images on social media and in my articles – these are actually handled by my son most of the time. And now from time to time, I try to have short conversations with different LLM engines that relate to work. Mostly to get ideas.

For my Video Q&A series with Philipp Hancke, which passed 50 (!) videos already, I wanted a few new and fresh questions, so I asked ChatGPT for a few. I picked a couple for some future videos, but decided it was time to write an article, answering these questions – in a way, answering ChatGPT’s questions about WebRTC – and no – I didn’t ask ChatGPT to answer them for me 😉

Table of contents

🔧 General / Introductory
🧑‍💻 For Developers / Technical Use
🔒 Security and Privacy
📈 Performance and Optimization
🌐 Browser and Platform Compatibility
🏗️ Architecture and Deployment
📦 Use Cases and Applications
🤝 Interoperability and Standards
🔍 Debugging and Troubleshooting
Got any questions for me?

🔧 General / Introductory

What is WebRTC?

Well… that should be relatively simple.

WebRTC is a technology for enabling live streaming of voice and video in web browsers (and elsewhere). It is open source and available in all modern browsers today.

If you want a longer answer, with videos, then head on to this article 👉 What is WebRTC and what is it good for?

How does WebRTC work?

WebRTC is actually a set of standard specifications that together make up a sophisticated media engine that is optimized for voice and video conversations. A large part of it means dealing with the devices and the networks that users end up using.

If I had to explain it, it would be something like this:

WebRTC has minimal signaling of its own, and relies on the application on top of it to handle signaling for it
It negotiates the requirements of a session via SDP (which the application is responsible for sending and receiving)
The media connection gets established using the ICE protocol, which uses STUN and TURN servers as the means to get connectivity in as many network conditions as possible
Actual media is sent and received over SRTP, with the voice and video codecs negotiated in advance via SDP
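
To make the SDP part a bit more concrete, here is a minimal sketch in TypeScript that pulls the codec names out of an SDP blob by reading its a=rtpmap lines. The helper name listCodecsFromSdp and the sample SDP are made up for illustration, and real SDP carries a lot more than this.

```typescript
// Hypothetical helper: extracts codec names from the a=rtpmap lines of an SDP blob.
interface SdpCodecEntry {
  payloadType: number; // RTP payload type number
  codecName: string;   // e.g. "opus" or "VP8"
  clockRate: number;   // in Hz
}

function listCodecsFromSdp(sdp: string): SdpCodecEntry[] {
  const entries: SdpCodecEntry[] = [];
  const lines = sdp.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    // Format: a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
    const match = /^a=rtpmap:(\d+) ([^\/]+)\/(\d+)/.exec(lines[i]);
    if (match) {
      entries.push({
        payloadType: parseInt(match[1], 10),
        codecName: match[2],
        clockRate: parseInt(match[3], 10),
      });
    }
  }
  return entries;
}

// Made-up, heavily trimmed SDP fragment for illustration only.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const sampleCodecs = listCodecsFromSdp(sampleSdp);
```

In a real application you rarely need to parse SDP yourself. The point is only that the codec list travels inside the SDP your signaling carries.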

I won’t go over all the details here, but if you want to dive deeper into this, then I can suggest these two free courses that I have:

👉 WebRTC Basics

👉 WebRTC: The Missing Codelab

What are the main components of WebRTC?

This depends on how you look at it 🥸

I want to suggest two different approaches:

1. Technology stack

Here, you can find the various protocols that make up WebRTC. The drawing is taken from the great book High Performance Browser Networking.

You can read more about each one of these protocols in the WebRTC Glossary, or you can check out my WebRTC Protocols courses on webrtccourse.com

2. Entities

This is my preferred view of the WebRTC components, as it talks about the entities involved in connecting a session. It also suggests that in many ways, you’re not in control of most of what’s going on, which is sad, but it has its reasons.

Here are two articles to dig in about this angle:

👉 WebRTC Server: What is it exactly?

👉 The lead actors in WebRTC are outside of your control

Is WebRTC free to use?

Yes. No. Maybe.

WebRTC is an open protocol with a high quality, popular, permissive open source implementation.

This makes WebRTC free. Individuals and companies can use WebRTC to their heart’s content in whatever application they want to develop.

The thing is… developing with WebRTC is going to cost you time and engineers – both usually expensive. And running WebRTC applications isn’t free either – there are costs associated with hosting servers and paying for networking traffic – this can get expensive quickly for video that requires high bandwidth use.

Here’s a longer article on this topic 👉 Is WebRTC really free? The costs of running a WebRTC application

Who maintains or owns WebRTC?

Google. Not exactly, but close enough.

WebRTC is defined by the W3C and IETF. These are international standardization organizations that encompass the views of multiple vendors.

The implementation that goes into all modern web browsers today? That was implemented and maintained by Google.

They are by far the largest contributor to that piece of code, which means they control and own the behavior your application would deal with the moment it hits a web browser.

And remember, Google owns Chrome, which has a bigger market share than all other browsers combined, and by a wide margin. And all other browsers run the same piece of code, known as libWebRTC.

So yes. WebRTC is maintained by Google. Mostly.

Here’s some more on this topic/question:

👉 With WebRTC, don’t expect Google to be your personal outsourcing vendor

👉 libWebRTC

🧑‍💻 For Developers / Technical Use

How do I get started with WebRTC?

Depends who you are, what you do and what you are aiming to achieve.

For a developer trying to learn WebRTC, I’d go for building a sample application to understand the tech a bit better. You can use our WebRTC: The Missing Codelab training course as a starting point and an explainer for this
Companies who wish to develop a demo or an MVP of something should most likely use a third party managed service for that. There are quite a few vendors and you can find many of them listed on this page of a report of mine: Video API report
Support, QA, product managers, entrepreneurs and other people who need a basic understanding of WebRTC can start from my free WebRTC Basics training course

There are likely more approaches, and others who may need help getting started with WebRTC. If I haven’t covered your scenario or need, just leave me a comment on this article and I’ll try to help you out.

How do I establish a peer-to-peer connection?

Using WebRTC of course 🙂

You will need to pass the SDP messages created by the WebRTC API from one peer to another and vice versa. You will likely also need a TURN server (or a STUN server).
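
The exchange of SDP messages follows an offer/answer pattern. As a rough sketch (this is not how the browser implements it), the TypeScript below models the signaling states each side moves through. The state names mirror the ones WebRTC exposes through signalingState. The function and event names are hypothetical, and provisional answers and rollback are left out on purpose.

```typescript
// Simplified model of the offer/answer state machine (pranswer and rollback omitted).
type SketchSignalingState = "stable" | "have-local-offer" | "have-remote-offer";

// Hypothetical event names: which description was applied, and on which side.
type SketchSdpEvent = "local-offer" | "remote-offer" | "local-answer" | "remote-answer";

function nextSignalingState(state: SketchSignalingState, event: SketchSdpEvent): SketchSignalingState {
  if (state === "stable" && event === "local-offer") return "have-local-offer";
  if (state === "stable" && event === "remote-offer") return "have-remote-offer";
  if (state === "have-local-offer" && event === "remote-answer") return "stable";
  if (state === "have-remote-offer" && event === "local-answer") return "stable";
  throw new Error("Unexpected " + event + " while in state " + state);
}

// Caller: creates an offer, sends it over signaling, later applies the answer it gets back.
function callerFlow(): SketchSignalingState[] {
  const afterOffer = nextSignalingState("stable", "local-offer");
  const afterAnswer = nextSignalingState(afterOffer, "remote-answer");
  return [afterOffer, afterAnswer];
}

// Callee: applies the received offer, then creates and sends back an answer.
function calleeFlow(): SketchSignalingState[] {
  const afterOffer = nextSignalingState("stable", "remote-offer");
  const afterAnswer = nextSignalingState(afterOffer, "local-answer");
  return [afterOffer, afterAnswer];
}
```

Both sides end up back in stable, which is the point where ICE can get on with connecting the media.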

Lucky for you, the WebRTC: The Missing Codelab training course is free for all. It explains how to build a Node app that does exactly that – establishing a peer-to-peer connection with WebRTC. It is packed with explanations and rationale, including covering all relevant edge cases while at it.

What signaling server should I use with WebRTC?

Whatever fits your needs.

You need to start by figuring out the signaling protocol you want to use and work your way from there to the actual signaling server.

Here are two resources to guide you through this:

👉 Choosing the best WebRTC signaling protocol for your application

👉 What is a WebRTC Signaling Server and Why You Should NOT Use AppRTC?

How do I handle NAT traversal in WebRTC?

Using STUN and TURN servers and the ICE protocol.

WebRTC runs on technologies that are slightly different from the rest of what we’re used to in web browsers. This includes things like UDP, SRTP and ephemeral, dynamic ports. As such, certain network elements out there might block its traffic (such as NAT devices and firewalls). These shape the networks in ways that might hinder the ability to send what WebRTC needs sending over the network, which is why WebRTC uses STUN (to figure out public IP addresses) and TURN (to relay media). ICE then orchestrates the process to figure out the best path between the peers in the session.
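
One way to see this in practice is to look at the ICE candidates a session gathers: host candidates are local addresses, srflx (server reflexive) candidates are public addresses learned via STUN, and relay candidates are allocated on a TURN server. The sketch below classifies candidate strings by their typ field. The function names and sample candidates are made up for illustration (the IP addresses come from documentation ranges).

```typescript
type SketchCandidateType = "host" | "srflx" | "prflx" | "relay" | "unknown";

// Reads the "typ <type>" token that an ICE candidate line carries.
function candidateTypeOf(candidateLine: string): SketchCandidateType {
  const match = / typ (host|srflx|prflx|relay)( |$)/.exec(candidateLine);
  if (!match) return "unknown";
  return match[1] as SketchCandidateType;
}

// Explains where each candidate type comes from.
function describeCandidate(candidateLine: string): string {
  const kind = candidateTypeOf(candidateLine);
  if (kind === "host") return "local address of the device";
  if (kind === "srflx") return "public address discovered via STUN";
  if (kind === "prflx") return "address discovered during connectivity checks";
  if (kind === "relay") return "address allocated on a TURN server";
  return "not a recognizable candidate";
}

// Made-up candidates, one per common type.
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 61000 typ srflx raddr 192.0.2.10 rport 54321",
  "candidate:3 1 udp 41885439 198.51.100.20 49200 typ relay raddr 203.0.113.7 rport 61000",
];
const sampleCandidateTypes = sampleCandidates.map(candidateTypeOf);
```

If a session only ever connects through relay candidates, that is a hint the network in between is blocking direct traffic.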

More on this 👉 We TURNed to see a STUNning view of the ICE

What’s the best TURN/STUN server to use?

That depends…

Here are a few thoughts off the top of my head:

coturn is the most common and popular open source alternative. It is used quite a lot and by virtually everyone
STUNner is a rather new and promising alternative to coturn. It has an actual company backing it, which can be seen as an advantage
If you want a managed service, then I’d look at Cloudflare TURN or Twilio NTS before venturing to other avenues
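
Whichever server you pick, the client side looks roughly the same: you hand the browser a list of STUN and TURN URLs through the iceServers field of the peer connection configuration. A minimal sketch, where the host name, username and credential are placeholders and not real values:

```typescript
// Shape of one iceServers entry (urls, plus username/credential for TURN).
interface SketchIceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Hypothetical helper: builds the iceServers list for a single STUN/TURN host.
function buildIceServers(host: string, username: string, credential: string): SketchIceServer[] {
  return [
    { urls: ["stun:" + host + ":3478"] },
    {
      urls: [
        "turn:" + host + ":3478?transport=udp",
        "turn:" + host + ":3478?transport=tcp",
        "turns:" + host + ":5349?transport=tcp", // TURN over TLS
      ],
      username: username,
      credential: credential,
    },
  ];
}

const sketchRtcConfig = { iceServers: buildIceServers("turn.example.org", "demo-user", "demo-secret") };
// In a browser this object would be passed along as: new RTCPeerConnection(sketchRtcConfig)
```

Offering UDP, TCP and TLS variants of the same TURN server gives ICE more options to try on restrictive networks.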

How do I record a WebRTC stream?

There are multiple ways to record WebRTC streams.

I’ll start with a fact you need to first accept – you can’t use TURN to record WebRTC media streams. TURN servers aren’t privy to the encryption keys used…

You can record WebRTC streams on the client side using the MediaRecorder API or in media servers (multiple alternatives there).

For a deeper dive into recording head here 👉 WebRTC recording challenges and solutions

How can I share my screen with WebRTC?

WebRTC has an API called getDisplayMedia. With it, the user can decide to share a browser tab, a window or the whole screen.

The resulting media stream can then be sent like any other video stream over WebRTC (with some minor but important differences).

WebRTC: The Missing Codelab training course includes a lesson about screen sharing.

How do I implement group calling or multiparty video?

This one will take time and won’t fit here.

Group calling requires media servers. Usually, the SFU kind.

If you are asking, then my suggestion is to use a Video API vendor for this instead of doing it on your own. Assuming you want to build it on your own and be your own boss here, then go for one of the open source SFU media servers.

Start here to learn more about media servers 👉 What exactly is a WebRTC media server?

🔒 Security and Privacy

Is WebRTC secure?

Yes.

And no.

WebRTC is secure. To the point of being the most secure VoIP solution out there.

But you can ruin it all by doing things unintentionally in the application layer.

Here’s where you should continue when it comes to WebRTC security:

👉 Everything you need to know about WebRTC security 🔐

👉 WebRTC Security & Privacy Essentials (paid course)

Does WebRTC leak my IP address?

Yes. And no.

WebRTC needs IP addresses to work. How would anyone know how to reach your machine directly to send you media peer-to-peer otherwise?

While most of what you’ll find about WebRTC leak is FUD, there is truth in it as well. The fact IP addresses are needed can be abused in many creative ways.

You can read more about this here 👉 What is the WebRTC leak test?

How can I prevent WebRTC IP leaks in browsers?

A glitch in the ChatGPT matrix!

This question is too similar to the previous one 🤯

Just go read the answer above.

📈 Performance and Optimization

How do I reduce WebRTC latency?

By placing your media servers closer to the users
Doing the same for your TURN servers
Analyzing the whole media processing pipeline end to end and reducing latency along that pipeline wherever you see the opportunity to do so

Guess what? I even wrote a long form article titled Reducing latency in WebRTC 😎

How can I measure the quality of a WebRTC call?

This one is tricky.

First you’ll need to define quality. Is it related to connectivity? Actual media quality? On which devices? Over what networks and network conditions?

Are you fine eating up more of the device CPU and network for better quality? Does your answer change if the device is a smartphone and the user would rather use it for the whole day and without it heating up in his hand?

One way to measure quality in WebRTC is by way of MOS and VMAF scores. Both are not really objective and have their drawbacks.

In most cases, and for doing measurements at scale, you will end up just looking at network related metrics, such as bitrate, packet loss, jitter and round trip time.

Here’s an ebook that will give you some more information on this 👉 Top 7 WebRTC Video Quality Metrics and KPIs

Oh, and you use WebRTC stats to collect these metrics and make your measurements of quality.
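
As a rough illustration of those network metrics, the sketch below takes two snapshots of an inbound RTP stream (the kind of numbers you read from WebRTC stats) and derives bitrate and packet loss between them. The field names follow the inbound-rtp stats entries. The helper name and the sample values are invented.

```typescript
// Subset of the fields found on an "inbound-rtp" stats entry.
interface InboundSnapshot {
  timestamp: number;       // milliseconds
  bytesReceived: number;   // cumulative
  packetsReceived: number; // cumulative
  packetsLost: number;     // cumulative
  jitter: number;          // seconds
}

interface IntervalQuality {
  bitrateKbps: number;
  lossPercent: number;
  jitterMs: number;
}

// Hypothetical helper: compares two snapshots taken a few seconds apart.
function qualityBetween(previous: InboundSnapshot, current: InboundSnapshot): IntervalQuality {
  const seconds = (current.timestamp - previous.timestamp) / 1000;
  const bytes = current.bytesReceived - previous.bytesReceived;
  const received = current.packetsReceived - previous.packetsReceived;
  const lost = current.packetsLost - previous.packetsLost;
  const expected = received + lost;
  return {
    bitrateKbps: seconds > 0 ? (bytes * 8) / seconds / 1000 : 0,
    lossPercent: expected > 0 ? (lost / expected) * 100 : 0,
    jitterMs: current.jitter * 1000,
  };
}

// Invented sample values: two snapshots taken 2 seconds apart.
const sketchQuality = qualityBetween(
  { timestamp: 10000, bytesReceived: 500000, packetsReceived: 900, packetsLost: 10, jitter: 0.01 },
  { timestamp: 12000, bytesReceived: 750000, packetsReceived: 1380, packetsLost: 30, jitter: 0.02 }
);
```

Because the counters are cumulative, you always need the difference between two snapshots. A single reading tells you very little.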

What metrics should I monitor for WebRTC?

Ha! We just got an answer to it above.

I’ll reiterate it here then 👉 Top 7 WebRTC Video Quality Metrics and KPIs

How do I improve audio/video quality in poor networks?

Use better codecs
Reduce bitrate requirements
Incorporate error resiliency techniques such as FEC, retransmissions and packet loss concealment

Here are a few resources to read about this topic:

👉 WebRTC media resilience: the role FEC, RED, PLC, RTX and other acronyms play

👉 Fixing packet loss in WebRTC

👉 8 ways to optimize WebRTC performance

🌐 Browser and Platform Compatibility

Which browsers support WebRTC?

All modern browsers: Chrome, Safari, Edge and Firefox.

There are some differences, but they aren’t too many. Essentially, you’ll need to test on all browsers and fix any issues that crop up.

Does WebRTC work on mobile (Android/iOS)?

Yes.

Both Chrome and Safari on mobile support WebRTC. Again, with some minor limitations and differences, but for the most part they work.

You can also get WebRTC compiled into a native application on Android and iOS, which is quite popular.

How do I make WebRTC work in Safari?

Just like you do for Chrome, but with fewer debugging and troubleshooting tools and with a bit more of a headache while doing so 😉

🏗️ Architecture and Deployment

Can WebRTC scale for large audiences?

Yes. It requires media servers and effort, but it is doable.

Ignore the FUD around WebRTC being P2P and the need for something different (which usually ends with someone selling you their own WebRTC implementation).

Here are a few resources for you to read on this topic:

👉 What is WebRTC P2P mesh and why it can’t scale?

👉 How Many Users Can Fit in a WebRTC Call?

👉 Different WebRTC server allocation schemes for scaling group calling

What’s the difference between SFU and MCU?

Both are media servers geared towards managing group meetings.

An MCU will mix the media from the participants and generate a single stream going back to the participants.

An SFU routes the media it receives to the participants in the meeting. It doesn’t process media beyond routing it.

Today? SFUs are a lot more common and popular. They offer flexibility and cost less to operate.
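
The difference is easy to see with a bit of arithmetic. The sketch below counts the streams each participant sends and receives in a call of N participants. The mesh row is added here only for comparison (it is not part of the answer above), and the numbers ignore simulcast, SVC and any limit on how many participants are actually shown.

```typescript
type SketchTopology = "mesh" | "sfu" | "mcu";

interface PerParticipantStreams {
  sent: number;     // streams uploaded by each participant
  received: number; // streams downloaded by each participant
}

function streamsPerParticipant(topology: SketchTopology, participants: number): PerParticipantStreams {
  const others = Math.max(participants - 1, 0);
  if (topology === "mesh") {
    // Everyone sends directly to everyone else.
    return { sent: others, received: others };
  }
  if (topology === "sfu") {
    // One upload to the server, which routes the other participants' streams back.
    return { sent: others > 0 ? 1 : 0, received: others };
  }
  // MCU: one upload, one mixed stream coming back.
  return { sent: others > 0 ? 1 : 0, received: others > 0 ? 1 : 0 };
}

const tenPersonCall = {
  mesh: streamsPerParticipant("mesh", 10),
  sfu: streamsPerParticipant("sfu", 10),
  mcu: streamsPerParticipant("mcu", 10),
};
```

The MCU looks cheapest for the client, but the mixing happens on the server, which is where the operating cost goes.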

Start here for more information:

👉 WebRTC Multiparty Video Alternatives, and Why SFU is the Winning Model

👉 WebRTC conferences – to mix or to route audio

Should I use a media server with WebRTC?

Yes.

But it depends on your application and use case.

Generally speaking, you will need a media server if you wish to conduct group meetings or broadcasting.

Choosing the best WebRTC signaling protocol for your application

bloggeek - Mon, 06/23/2025 - 12:30

Deciding on WebRTC signaling? Explore standardized and proprietary protocols to find the best fit for your needs.

WebRTC comes without a signaling protocol. This means that you need to choose your own for your application. You can choose a standardized protocol for your WebRTC application. Maybe SIP or XMPP or something else. Or you could go for something proprietary – tailored to your specific custom needs.

Which signaling protocol is best for your WebRTC application? It depends. And it is what we’re going to try and find out today.

Table of contents

TL;DR – when to use what?
WebRTC Signaling 101
- What is signaling and why do we need it for WebRTC?
- There’s signaling and there’s transport…
Standard signaling: SIP over WebSocket
Standard signaling: XMPP
Standard signaling: MQTT
Standard signaling: Matrix
Standard signaling: WHIP and WHEP
Proprietary signaling protocol
Still confused?

TL;DR – when to use what?

Let’s start with a quick answer to satisfy curiosity. Here’s my own set of rules on how to make such a decision:

If your application already uses a “chat” protocol to send messages between users for some communications, then just extend that solution to include WebRTC signaling
- This can be a VoIP product that uses SIP (then you’ll need SIP over WebSockets to get to browsers and WebRTC with it)
- It can be XMPP if you’re more into messaging or MQTT if that’s more of an IoT type application
- Or it can be some other signaling protocol that I am just not aware of. It happens
The application has some kind of a messaging bus that is used for communication with users or between users? Use it
- This can be a simple WebSocket or REST or HTTP protocol (a proprietary one) that has been used before. I always give as an example a dating app that already has a way for people to schedule their blind date
- It can also be a managed cloud messaging service such as Ably, Pubnub, Pusher or others
- Here, you’ll need to introduce new types of messages and have your WebRTC SDP and control logic piggyback on that same signaling solution
Using media servers, most probably an SFU? These come with their own client SDKs and reference apps
- Sometimes, it is easier and better to just adopt these and be done with it
- You will need to extend them as your application evolves, but they do give a simple starting point
Do you send only or receive only? Try using WHIP or WHEP
None of the above? Just create a proprietary signaling protocol to exactly fit your needs
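
Purely as an illustration, the rules above can be written down as a small decision function. The field names are invented, and the order of the checks simply follows the order of the list.

```typescript
// Hypothetical description of an application, mirroring the rules above.
interface SignalingProfile {
  existingChatProtocol?: "sip" | "xmpp" | "mqtt" | "other";
  hasMessagingBus: boolean;    // a WebSocket/REST channel of your own, or a managed messaging service
  usesMediaServerSdk: boolean; // an SFU that ships with client SDKs
  unidirectional: boolean;     // send only or receive only
}

function suggestSignaling(profile: SignalingProfile): string {
  if (profile.existingChatProtocol === "sip") return "SIP over WebSocket";
  if (profile.existingChatProtocol === "xmpp") return "XMPP";
  if (profile.existingChatProtocol === "mqtt") return "MQTT";
  if (profile.existingChatProtocol === "other") return "extend your existing protocol";
  if (profile.hasMessagingBus) return "piggyback on your messaging bus";
  if (profile.usesMediaServerSdk) return "adopt the media server's SDK signaling";
  if (profile.unidirectional) return "WHIP or WHEP";
  return "proprietary signaling protocol";
}
```

For example, a contact center that already runs on SIP gets "SIP over WebSocket", while a greenfield two-way video app with none of the above lands on a proprietary protocol.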

WebRTC Signaling 101

WebRTC is a modern and powerful media engine. The thing is, you need to direct it in the right way to get it started.

I have a couple of questions for you:

How exactly do users register to a service?
How do they indicate that they are available?
How can one user search for the status of another?
How can he reach out and dial? Or alternatively, how does one join a virtual meeting room? Or an online live stream?

These questions aren’t answered by WebRTC. They are answered by a signaling protocol.

What is signaling and why do we need it for WebRTC?

A signaling protocol is there to answer the questions above.

It does so in a standardized way (hopefully, written down and well documented so it is easy for others to follow and implement as well).

You’d think it makes sense to have a signaling protocol in WebRTC, and you’d be correct!

But there isn’t…

Here’s what I wrote over 10 years ago about the death of signaling:

The decision not to add signaling to WebRTC might have been an innocent one – I can envision engineers sitting around a table in a Google facility some two years ago, having an interesting conversation:

“Guys, let’s add SIP to what we’re doing with WebRTC”

“But we don’t have anything we developed. We will need to use some of that open source stuff”

“And besides – why not pack XMPP with it? Our own GTalk uses XMPP”

“Go for it. Let’s do XMPP. We’ve got that libjingle lying around here somewhere”

“Never did like it, and there are other XMPP libraries floating around – you remember the one we used for that project back in the day? It is way better than libjingle”

“Hmm… thinking about it, it doesn’t seem like we’re ready for signaling. And besides, what we’re trying to do is open source a media engine for the web – we already have JavaScript XMPP – no need to package it now – it will just slow us down”

WebRTC was “rushed”. Google had an implementation ready to be baked into the browser. Figuring out signaling and making a decision by committee at the standardization organizations would have pushed the actual adoption and use by at least 5 years (and I am optimistic here).

The result was a decision to use something that already existed, SDP, as the API interface layer (because they already had it in the implementation, mind you), and to just let developers figure out how to send these messages over the network.

Is SDP good? Yes. It works.

Is it perfect? Hell no. It is horrible.

But it is what we have and it is what we use.

While we’re talking about SDP, there are plans to get rid of SDP munging as an interface in WebRTC. The question isn’t if this will happen but when. Make sure you are ready for it.

Our WebRTC Insights clients already received an action plan to rid themselves of SDP munging in a controlled way. If you want to be ahead of the curve in everything WebRTC, then you may want to check out our service.

There’s signaling and there’s transport…

You can’t just send your signaling message over TCP or UDP. I mean you can – but not if you want this to occur in a web browser. There is no programmable interface that enables that.

What you do is either use HTTPS or a secure WebSocket. Because that’s what’s available in web browsers for you to use. With HTTPS, there’s REST, XHR and SSE – all mechanisms that transform HTTPS from a page fetching mechanism to something that can do “messaging”.

On top of these transport mechanisms, we can place our signaling protocol.

Why the distinction? I am not sure, but here are a couple of reasons that come to mind:

The transport protocol is always standardized, while the signaling protocol can be proprietary
You can use different transport protocols for a signaling protocol. For example, SIP can work over UDP, TCP, TLS and WebSocket
Because with networking, we like thinking in layers

Standard signaling: SIP over WebSocket

One of the most common signaling protocols we have for VoIP is SIP.

Most of the backbone of the telephony companies is based on SIP or a variant of it. For the most part, I regard that world as PSTN – making a phone call to a phone number not using a specific app.

Incidentally, it also uses SDP (not really incidentally – it was on purpose, but the other way around: the media engine used originally as the baseline of Google’s WebRTC implementation had an SDP interface because it was meant to play nice with SIP).

To make sure SIP can work in web browsers, it needed a few minor changes. RFC 7118 is the standard that was created for that purpose – it enables SIP to work on WebSocket as a transport layer and then with WebRTC as its media engine.

The end result? You can use SIP over WebSocket as your signaling in a WebRTC application.

When to use it?

Your app is SIP based and you just need to enable some of the users to connect to your existing network from web browsers
You know and love SIP. And you feel confident in being able to use it in web browsers using JavaScript (this one is less likely)

When NOT to use it?

Your app doesn’t have any connectivity to SIP or PSTN networks. And you’re not a SIP expert
You have connectivity to SIP or PSTN but that’s marginal and not the main focus of your application (if you’re doing a contact center that has standard phones on one end and web browsers on the other, then SIP is most likely for you)

Standard signaling: XMPP

XMPP is the standard originally used for presence and messaging. It was also what Google used for Google Talk back in the day, well before Hangouts and Google Meet and before WebRTC was even announced.

It is quite the common protocol, so making use of it with WebRTC makes sense. Especially if you want to add voice and video communications to your app.

When to use it?

Similar to SIP, I’d use it if XMPP is already at the core of my application. There’s no point in using yet another signaling protocol next to it
If you know XMPP well, you might as well use it. Assuming you’re comfortable with that decision

When NOT to use it?

If you don’t use XMPP already and don’t know it, I’d skip
Your application doesn’t have a lot of messaging beyond just the pure signaling needed to get WebRTC sessions started

Standard signaling: MQTT

Then there’s MQTT. This is a signaling protocol designed first and foremost for the Internet of Things. Its purpose is to collect telemetry from devices.

Why mention it here? Because Facebook Messenger uses MQTT as its signaling protocol. And Messenger is one of the biggest WebRTC applications out there by usage.

When to use it?

If your application already makes use of MQTT for its messaging
Like XMPP, if you know MQTT, you might as well use it. Assuming you’re comfortable with that decision

When NOT to use it?

In all other cases
I simply don’t know how commonplace this protocol is in our industry, and I’d rather use a well known solution or one I built myself than something that has been around for years, but wasn’t adopted widely by my industry. Not because it isn’t good – but because other solutions seem good enough and more well known

Standard signaling: Matrix

I think it is time I recognize Matrix as a standard signaling solution…

Matrix is rather new and was introduced and built with federated decentralized communications in mind. Big words. I am not going to explain them here.

It comes with an open source implementation of both server and client in multiple programming languages and a managed service on top for those who need it – Element.

And yes. It can be used for WebRTC as its signaling protocol.

When to use it?

Think of it as all or nothing. If you use Matrix and its client and server side code for the benefits they offer (messaging, decentralization, etc), then choose it

When NOT to use it?

Don’t pick and choose pieces of it to form a signaling protocol

What I am trying and failing to say here is that you should pick Matrix if the open source app it comes with is very close to your own intended application behavior.

Standard signaling: WHIP and WHEP

Then there are WHIP and WHEP. These ARE WebRTC signaling protocols in the sense that they were designed and defined specifically for WebRTC – they aren’t used for anything else.

They are simple and limited in scope and capability.
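
To give a feel for how small these protocols are: publishing with WHIP boils down to an HTTP POST of the SDP offer, with the SDP answer coming back in the 201 Created response along with a Location header for the session (which is later removed with an HTTP DELETE). The sketch below only builds the request descriptions. The endpoint URL and token are placeholders, and actually sending them (for example with fetch) is left out.

```typescript
// Plain description of an HTTP request, so it can be handed to any HTTP client.
interface SketchHttpRequest {
  method: string;
  url: string;
  headers: { [header: string]: string };
  body: string;
}

// Hypothetical helper: the WHIP publish request carrying the SDP offer.
function buildWhipPublishRequest(endpointUrl: string, bearerToken: string, sdpOffer: string): SketchHttpRequest {
  return {
    method: "POST",
    url: endpointUrl,
    headers: {
      "Content-Type": "application/sdp",
      "Authorization": "Bearer " + bearerToken,
    },
    body: sdpOffer,
  };
}

// Hypothetical helper: tearing the session down uses the URL from the Location header.
function buildWhipTeardownRequest(sessionUrl: string, bearerToken: string): SketchHttpRequest {
  return {
    method: "DELETE",
    url: sessionUrl,
    headers: { "Authorization": "Bearer " + bearerToken },
    body: "",
  };
}

// Placeholder endpoint, token and SDP, for illustration only.
const sketchWhipRequest = buildWhipPublishRequest("https://media.example.org/whip/live", "demo-token", "v=0\r\n");
```

There are no rooms, presence or messaging in here, which is exactly why it only fits the send only or receive only cases.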

When to use it?

For unidirectional streaming, check if WHIP and WHEP are for you
If you plan on having third party devices stream into your service (think about OBS as an example), then WHIP; if you want to support some future generic players, then WHEP (future because it is too early for that)

When NOT to use it?

What you’re doing is bidirectional in nature
You don’t care about an ecosystem or third parties and adding WHIP or WHEP only complicates things even if only a bit

Proprietary signaling protocol

You decide what you want here.

Sit down and write what type of messages you need to be able to pass. What information these messages convey. Decide on their structure and method of parsing (JSON anyone? Maybe protobuf? Something else?). Figure out what transport you want to use. Document and implement.

Be sure to make it a wee bit extensible, with support for versioning.
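
As a minimal sketch of what that could look like with JSON, here is a versioned message envelope with an encoder and a decoder that rejects versions it does not understand. Every name here (message types, fields, version number) is invented for illustration. Your own protocol will look different.

```typescript
const SKETCH_PROTOCOL_VERSION = 1;

// Hypothetical message types for a simple 1:1 calling service.
type SketchMessageType = "join" | "offer" | "answer" | "candidate" | "leave";

interface SketchEnvelope {
  version: number; // lets old and new clients detect each other
  type: SketchMessageType;
  room: string;
  payload: string; // e.g. the SDP blob or a serialized ICE candidate
}

function encodeMessage(type: SketchMessageType, room: string, payload: string): string {
  const envelope: SketchEnvelope = { version: SKETCH_PROTOCOL_VERSION, type: type, room: room, payload: payload };
  return JSON.stringify(envelope);
}

function decodeMessage(raw: string): SketchEnvelope {
  const parsed = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) throw new Error("Not a message");
  if (parsed.version !== SKETCH_PROTOCOL_VERSION) throw new Error("Unsupported version: " + parsed.version);
  const knownTypes = ["join", "offer", "answer", "candidate", "leave"];
  if (knownTypes.indexOf(parsed.type) < 0) throw new Error("Unknown message type: " + parsed.type);
  if (typeof parsed.room !== "string" || typeof parsed.payload !== "string") throw new Error("Malformed message");
  return { version: parsed.version, type: parsed.type, room: parsed.room, payload: parsed.payload };
}
```

Checking the version first means a newer client can fail loudly (or fall back) when it talks to an older server, instead of silently misreading messages.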

When to use it?

If you already have something that can be viewed as signaling in your service. Then you just extend it this way
When you don’t find any reason to use any of the standardized signaling protocols

When NOT to use it?

Only if you lean into a standardized protocol due to reasons I’ve given in the previous sections

For me? A proprietary signaling protocol is likely going to be the way to go for a lot of the use cases that come my way.

Still confused?

I hear you.

Making a decision isn’t always simple and choosing a solid WebRTC signaling protocol for your application is one of those times.

Here’s what I can suggest:

If you picked the proprietary route, then our WebRTC: The Missing Codelab course has just switched from being a paid course to a free course. Enroll to learn more about this as part of that course.

If you want assistance in making the decision, just contact me.

The post Choosing the best WebRTC signaling protocol for your application appeared first on BlogGeek.me.

WebRTC is about reducing friction and barriers of entry

bloggeek - Mon, 06/09/2025 - 12:30

Discover how WebRTC removes the barriers of entry and the challenges associated with real time communication application implementation.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

I want to go back to the basics of WebRTC and when it came to be.

People complain that WebRTC is too complex. I say it is the simplest thing we have.

When WebRTC came out, developing a web meeting service that does video was expensive as hell:

You had to develop your own media framework (or purchase a commercial one, and there weren’t many out there)
You had to integrate your own signaling into it, even if that was SIP
You then had to port it to multiple operating systems (at least Windows, Android and iOS, but usually more)
And then test it. Over different operating systems and hardware configurations. Different Windows machines act differently (surprise – you likely don’t remember that), and you had to purchase them, test on them, deal with complaints from customers

It was a royal mess.

I wouldn’t start such a project without $1-2M investment just to get a first clunky and limited version to show for it. I know, because I’ve done it once or twice where I worked prior to WebRTC’s launch.

–

What did WebRTC bring with it?

The above… in a day. Or a week:

A commercial grade media engine, built into every browser
1. Widely tested across operating systems and devices
2. Running multiple voice and video codecs
3. With all the bells and whistles of network impairment adaptation logic
The notion and reality of royalty free voice and video codecs – Opus, VP8, VP9 and AV1 have all become commonplace, widely accepted and adopted
All that goodness, with a standard API on top

The end result was that a small team or even a single developer can now build a proof of concept in a short timespan, cutting down the initial investment to virtually nothing. This means you can get something out there either on seed funding or on a shoestring budget.

It also changed the nature of the developers. Most developers using WebRTC aren’t the classic VoIP developers. They don’t have that skill set or training when they start off – they simply take some open source project and move on from there, trying to figure things out for themselves. Sometimes it works. Sometimes it doesn’t. But it does bring with it a lot of creativity and out of the box thinking (simply because they don’t know where or what the box even is).

–

Scaling a successful service still is a huge challenge. For that, you do need to understand the technology intimately.

But that first step? And the second? And the third?

A lot easier to do with WebRTC than it ever was before.

The barrier of entry and friction for developers to use such technologies has gone dramatically down.

–

Before, the barrier was having enough money and the necessary training.

That is no longer the case.

You just need a darn good idea, that WebRTC is a viable solution for, with the ability to execute it. Sprinkle on that great timing and luck and you’re good to go.

The barriers needed for your business? They now need to come from elsewhere. WebRTC won’t give them to you.

Need help?

Be sure to follow this blog, as it is the most up to date resource out there about WebRTC

Subscribe to WebRTC Weekly to get a picture of what others are publishing about WebRTC out there

The WebRTC Insights service takes care of a lot of what goes on in the market for you, as well as the progress made by browsers with WebRTC support

I can assist with figuring out what is possible with WebRTC, and where to focus your energies in putting up the moat you need for your business

The post WebRTC is about reducing friction and barriers of entry appeared first on BlogGeek.me.

Using LTE modems under Debian

TXLAB - Sun, 06/08/2025 - 23:07

Back in the day I created a set of scripts for 3G and LTE modems to use under Debian: they used PPP chat scripts and custom udev rules for convenience. That’s all obsolete now.

NetworkManager and ModemManager hide all the modem communication under the hood, and you only need to initialize them properly. The following scenario was tested with Huawei ME906s and Fibocom L850-GL modems:

    apt install -y network-manager modemmanager
    nmcli connection edit type gsm con-name LTE
    # then, at the interactive nmcli prompt:
    save
    quit

Here it’s important not to set “connection.interface-name”, so that NetworkManager can pick any interface name of type “gsm”. You may also need to set the APN name if it’s different from “internet”.

The Fibocom L850-GL needs to be set to MBIM mode first:

apt install -y picocom
picocom /dev/ttyACM0
AT+GTUSBMODE?
AT+GTUSBMODE=7
AT+CFUN=15

After that, the NetworkManager will connect automatically to the LTE network if it is available. If an Ethernet connection is present, it will receive a route with a lower metric, so that the LAN path is preferred.

How WebRTC’s NetEQ Jitter Buffer Provides Smooth Audio

webrtchacks - Tue, 06/03/2025 - 14:00

Audio jitter buffers are required 101 introductory material for understanding VoIP. libWebRTC’s audio jitter buffer implementation – the one in Chromium – is known as NetEQ. NetEQ is anything but basic. This is good from a user perspective since real-life network conditions are often challenging. However, this means NetEQ’s esoteric code is complex and difficult […]

The post How WebRTC’s NetEQ Jitter Buffer Provides Smooth Audio appeared first on webrtcHacks.

8 ways to optimize WebRTC performance

bloggeek - Mon, 05/26/2025 - 12:30

Discover effective strategies to optimize WebRTC and enhance the quality of your video and audio streaming services.

In my update to the Video API report this time, I had the chance of reviewing what the vendors have done in the last 12 months or so. Some added new features and capabilities. Others not so much. Many were improving and optimizing their offering – better background replacement, fewer peer connections, more users in a single call, additional devices, …

WebRTC is a marathon and not a sprint. You can’t just write once and forget. You need to work at it. Day in, day out. Improving and optimizing your application.

Part of these optimizations are around WebRTC performance. Here are 8 places to validate the next time you need to optimize your WebRTC application’s performance:

Table of contents

1. Send and receive less bytes
2. Use better video codecs
3. Don’t send all audio streams all the time
4. Use simulcast and SVC only when needed
5. Treat different configurations differently
6. Have more media servers
7. Allocate users to closer servers
8. Collect, measure and monitor your metrics
Final thoughts on optimizing WebRTC performance

1. Send and receive less bytes

Here’s a shocker – if you send and receive less bytes (especially of the video kind), you are going to have higher performance. Your device will use less network and CPU resources (which will make it perform better). The media servers will have less data to route through them.

I know that what we want at the end of the day is the best possible 4K resolution at 60fps in a crisp look. And that’s before you start dreaming of doing VR or 8K.

But here’s the thing – do you really need 4K or even full HD on a smartphone with a 5” or 6” display? Is that 4K from the webcam useful when you’re also sharing your display at the same time and the other participant cares about your display and not your looks?

Why did I switch here from bytes to resolution? Because the higher the bitrate (=bytes) the higher the resolution I can compress at reasonable quality

We call this the resolution ladder – for a given bitrate, we match a suitable resolution, and we go up or down the ladder based on how much bitrate we have. The numbers vary per the video codec, frame rate, type of content and if you’re going up or down the ladder, but that’s for another time.

Oh, and you don’t control where the rungs on the ladder are – that’s a decision left to the browser to make
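To make the ladder idea concrete, here is a minimal sketch of how a rung could be matched to a bitrate. The rungs and thresholds in it are made-up numbers for illustration only (they are not what any browser uses), and `Rung`, `LADDER` and `pick_rung` are hypothetical names:

```python
from typing import NamedTuple, Sequence


class Rung(NamedTuple):
    min_bitrate_kbps: int  # lowest bitrate at which this rung is considered usable
    width: int
    height: int


# Illustrative rungs only, sorted from lowest to highest.
# Real values vary with the codec, frame rate and type of content.
LADDER = (
    Rung(0, 320, 180),
    Rung(300, 640, 360),
    Rung(1000, 1280, 720),
    Rung(2500, 1920, 1080),
)


def pick_rung(available_kbps: int, ladder: Sequence[Rung] = LADDER) -> Rung:
    """Return the highest rung whose minimum bitrate fits the available bitrate."""
    chosen = ladder[0]
    for rung in ladder:
        if available_kbps >= rung.min_bitrate_kbps:
            chosen = rung
    return chosen
```

With these assumed numbers, 800 kbps lands on the 640x360 rung. What you can influence in practice is the input to this decision (the bitrate and the maximum resolution you ask for), not the rungs themselves.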

–

So… first things first.

Go count your pixels. Check your bitrate. See if it is optimal for your use case. Ask yourself if, where and how you can reduce that bitrate. Either on the incoming or the outgoing streams. To think about it in a simpler way, start by focusing on the resolution and framerate and move your way from there towards bitrate and bytes.

2. Use better video codecs

Did I mention that video codecs affect bitrate and quality?

For the same bitrate budget, the quality you get will be something like this for each video codec:

VP8 < VP9 < AV1

AV1 will give better quality than VP9 which in turn offers better quality than VP8 (for the same bitrate).

So yes. Picking a newer video codec means lower bitrate. But it also means higher CPU and memory use. This makes the decision non-trivial…

When you pick a better video codec, there’s another decision to be made – are you going to use the added compression efficiency to improve quality at the same bitrate, or will you reduce the bitrate and maintain the same level of quality?

And this isn’t the only question to deal with in a multi video codec environment. You need to pick the video codec that is suitable for the specific scenario you’re in:

AV1 is a great codec to use today. But not on older devices. And not when the resolution and bitrate might be too high
AV1 is also great for text in screen sharing (text legibility at even low bitrates is way better than the other alternatives)
H.264 can be a great codec on the right devices – it comes with hardware acceleration in many cases, which means lower CPU use and having mobile handsets that don’t warm up on long video calls
VP8 is rock solid, available everywhere
HEVC is an Apple thing for Apple devices that might or might not be available
VP9 today is a kind of a transition point between VP8 and AV1

Which. One. Do. You. Use?

It depends.

And we will leave it at that. Just know that optimizing WebRTC for performance means figuring out which video codec to use in which scenario.
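To show what "which codec in which scenario" can look like in code, here is a rough sketch. The priority order in it is purely my assumption for illustration, the `Scenario` fields and `pick_codec` are hypothetical names, and HEVC is left out for brevity. Your own rules will differ:

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    supported: set                 # codecs both sides can use, e.g. {"VP8", "H264", "AV1"}
    screen_share: bool = False     # the stream carries screen content (text legibility matters)
    older_device: bool = False     # device likely too weak for AV1 encoding
    high_resolution: bool = False  # resolution/bitrate too high for AV1 on this device
    hw_h264: bool = False          # hardware accelerated H.264 is available


def pick_codec(s: Scenario) -> str:
    """Pick a codec using an assumed priority order; VP8 is the universal fallback."""
    av1_ok = "AV1" in s.supported and not s.older_device and not s.high_resolution
    if s.screen_share and av1_ok:
        return "AV1"   # better text legibility at low bitrates
    if s.hw_h264 and "H264" in s.supported:
        return "H264"  # lower CPU use, cooler handsets on long calls
    if av1_ok:
        return "AV1"
    if "VP9" in s.supported:
        return "VP9"
    return "VP8"
```

The value of writing it down like this is less in the specific order and more in forcing yourself to state the scenarios you actually care about.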

3. Don’t send all audio streams all the time

During Covid, I had a customer asking to be able to recreate the experience of a stadium full of people. Hearing the people around you and the crowd cheering together when a goal is scored.

The problem, besides the CPU and/or network required to make that happen, was that the WebRTC implementation from Google at the time (that’s libWebRTC) wasn’t fond of mixing too many audio sources. It simply took the 3 incoming streams with the loudest audio and mixed them – ignoring all others.

The good thing about it? It reduced CPU load. And frankly, if you have more than 3 people speaking in a meeting you have other issues than the WebRTC implementation – likely something you’ll need to settle between the people speaking anyway.

What happened is that Google a year or two ago decided to remove that optimization. It will now mix all incoming audio streams thrown at it. Theoretically, you can now give that stadium audience the vocal experience of everyone cheering. In reality? Your users might be suffering from CPUs that warm up a lot more due to the extra mixing effort.

What should you do?

“3 loudest” approach to audio mixing

Decide on the maximum number of audio streams you wish to mix. If you aren’t sure – just pick the magic number 3.

3 was the magic number libWebRTC used for over a decade. Now there’s no limit in libWebRTC. But… Google Meet still uses 3 as its magic number.

Now that you have that number, make sure your SFU never sends more than the 3 loudest streams towards the listeners. What do you do with the rest? Replace their media with DTX or just don’t send them… up to you and your architecture.
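The selection logic itself is small. Here is a minimal sketch of it, assuming your SFU already tracks an audio level per stream (a number where higher means louder). The function name and the shape of `levels` are hypothetical and not taken from any specific SFU:

```python
import heapq
from typing import Dict, List

MAX_AUDIO_STREAMS = 3  # the "magic number"


def streams_for_listener(
    levels: Dict[str, float], listener_id: str, limit: int = MAX_AUDIO_STREAMS
) -> List[str]:
    """Return the ids of the loudest streams to forward, loudest first.

    The listener's own stream is excluded, since nobody needs their own audio back.
    """
    candidates = {sid: level for sid, level in levels.items() if sid != listener_id}
    return heapq.nlargest(limit, candidates, key=candidates.get)
```

A real implementation would also want some smoothing of the levels over time, so that the forwarded set doesn't flip on every cough. That part is left out here.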

That will improve your session’s scale and optimize WebRTC performance for both network and CPU.

4. Use simulcast and SVC only when needed

Simulcast is great! SVC? Even better!

But not every problem is a nail with that hammer you call simulcast (or SVC for that matter).

Let’s take simulcast as an example. We use it to generate multiple video streams in various bitrates so that a group meeting will be able to deal with users on different networks and devices. It improves the average user experience of the meeting for its participants.

But… done in a 1:1 meeting, it is just wasteful.

The sender here is sending too many streams, causing it to waste precious CPU and network resources instead of using the same resources to improve the quality of that meeting with a single video stream.

You need to figure out when to use and when not to use these features…
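As a rough sketch of that decision, the function below returns a single encoding for a 1:1 call and three simulcast layers for a group call. The field names follow the WebRTC encoding parameters (`rid`, `scaleResolutionDownBy`, `maxBitrate` in bits per second), but the function itself, the threshold of 2 participants and the way the bitrate is split are my assumptions:

```python
from typing import Dict, List


def send_encodings(participant_count: int, max_bitrate_bps: int = 1_500_000) -> List[Dict[str, object]]:
    """Return the list of encodings a sender should publish for this meeting size."""
    if participant_count <= 2:
        # 1:1 call: spend the whole budget on a single, better looking stream.
        return [{"rid": "f", "maxBitrate": max_bitrate_bps}]
    # Group call: three layers so the SFU can pick one per receiver.
    return [
        {"rid": "q", "scaleResolutionDownBy": 4.0, "maxBitrate": max_bitrate_bps // 10},
        {"rid": "h", "scaleResolutionDownBy": 2.0, "maxBitrate": max_bitrate_bps // 3},
        {"rid": "f", "scaleResolutionDownBy": 1.0, "maxBitrate": max_bitrate_bps},
    ]
```

The harder part, which this sketch ignores, is what to do when a third participant joins an ongoing 1:1 call and the configuration needs to change mid-session.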

5. Treat different configurations differently

That example around simulcast above? Let’s generalize it a bit, shall we?

Your application will have different configurations for its WebRTC operation. It might be due to the number of users, their locations, the devices used, their network quality or even what it is that they are doing in the meeting itself.

Take all of these different permutations, let’s call them configurations. And now for each, figure out what is the best approach to optimize the performance of your WebRTC stack for it.

Is it worth the effort to optimize in such a way?

Does this configuration happen often enough? To important users/customers?

How complex is it to implement that kind of optimization?

What about switching from one configuration to another – can you smoothly turn on and off the various optimizations you have in place?

This is important. Go do the work.

6. Have more media servers

If you want to optimize a WebRTC application for performance, you might as well throw more media servers at the problem.

Throwing more hardware is great, but the point I want to make here is that these servers need to be CLOSER to the users.

Got all your media servers in a single data center in US East? You need to add another region.

Covered the US and Europe? Time to add Asia.

Etc.

In my Video API report, there’s the whole gamut of deployments:

Everything from a single region, single continent to over 200 regions. And it seems that you’re either happy with 10-30 or you strive for 200+ regions.

Check where your users are from. Populate the data centers around them with your media servers.

Oh – and you don’t really need to overdo it. Many of the bigger vendors (who have high media quality) make do with fewer than 20 different regions.

7. Allocate users to closer servers

Got your servers sprinkled all over the globe? Great!

Now where do you end up connecting your users? To which location?

Say there’s a meeting between 2 people in the US and 1 in France. In which region do you place the media server handling this headache of a meeting?

If it is in France… then the two in the US are going to have a poor experience. Especially when they talk to one another in the meeting (their media flows over the Atlantic ocean for no good reason)
If it is in the US… well… that guy in France might suffer from a poor connection over that same ocean and end up with more packet losses and latency than you wish for
You could cascade this and have multiple media servers in multiple regions handle the session. But that takes effort. Make it happen

The point I am trying to make? Media server allocation for group meetings isn’t trivial. Take your time figuring it out and implementing it properly.
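For the simple case of a single media server per meeting, one possible approach is to pick the region where the worst-off participant suffers the least. The sketch below does just that. It assumes you have a round trip time estimate from each participant to each region, and the region names and numbers in the example are invented:

```python
from typing import Dict


def pick_region(rtt_ms: Dict[str, Dict[str, float]]) -> str:
    """Pick the region whose highest participant RTT is the lowest.

    rtt_ms maps region -> participant -> estimated round trip time in milliseconds.
    """
    return min(rtt_ms, key=lambda region: max(rtt_ms[region].values()))


# Two participants in the US and one in France (illustrative numbers only).
example = {
    "us-east": {"alice": 20.0, "bob": 30.0, "chloe": 95.0},
    "eu-west": {"alice": 100.0, "bob": 110.0, "chloe": 15.0},
}
chosen = pick_region(example)
```

With these numbers the meeting lands in us-east, where the worst RTT is 95ms instead of 110ms. Minimizing the worst case is only one option (you could minimize the average instead), and none of this covers cascading across regions.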

8. Collect, measure and monitor your metrics

If you don’t know what’s wrong and why, there’s no way you’re going to be able to fix things. Or improve. Or optimize.

I started off by saying that WebRTC is a marathon and not a sprint. When it comes to optimizing WebRTC performance, it means that you need to improve your application over time.

Where and what to improve?

What gives the highest ROI for your effort?

Did your changes make a dent and actually improve things?

Answering these questions requires you to collect metrics, measure and analyze the data. And to monitor it continuously.

Make that a top priority for you.
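As a small starting point, here is a sketch that derives packet loss and bitrate from two consecutive snapshots of an incoming stream’s statistics. The keys follow the names of the WebRTC inbound RTP statistics (`timestamp` in milliseconds, `packetsLost`, `packetsReceived`, `bytesReceived`). How the snapshots reach your backend is assumed, and `interval_metrics` is a hypothetical helper:

```python
from typing import Dict


def interval_metrics(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """Compute loss percentage and bitrate for the interval between two snapshots."""
    elapsed_s = (curr["timestamp"] - prev["timestamp"]) / 1000.0
    # packetsLost may go down when late or duplicate packets arrive, so clamp at zero.
    lost = max(0.0, curr["packetsLost"] - prev["packetsLost"])
    received = curr["packetsReceived"] - prev["packetsReceived"]
    expected = lost + received
    loss_pct = 100.0 * lost / expected if expected > 0 else 0.0
    received_bits = (curr["bytesReceived"] - prev["bytesReceived"]) * 8
    bitrate_kbps = received_bits / 1000.0 / elapsed_s if elapsed_s > 0 else 0.0
    return {"loss_pct": loss_pct, "bitrate_kbps": bitrate_kbps}
```

Working with deltas between snapshots matters, since the raw counters are cumulative over the whole session and hide what happened in the last few seconds.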

Why?

Because the time will come when you will have users complaining. I’ve seen it happen multiple times with the companies I help.

Starting to put these monitoring tools in place at that point in time means you’re working under the urgency of churning customers, which isn’t fun.

Start earlier than that.

Final thoughts on optimizing WebRTC performance

This is what came off the top of my head the other day about optimizing WebRTC performance. There are likely at least 8 more ways to do that – all of them important and useful.

Don’t neglect this part in your WebRTC application development planning.

Optimizing a WebRTC application is great. But what about successfully launching it?

Check out my 3-step WebRTC launch action plan – a free resource that will show you what I do with every consulting project that deals with launching WebRTC applications.

The post 8 ways to optimize WebRTC performance appeared first on BlogGeek.me.

A good WebRTC application is like a great orchestra performance

bloggeek - Mon, 05/12/2025 - 12:30

Learn about the qualities that define an exceptional WebRTC application and why user experience matters.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

There’s something to be said about great WebRTC applications. Something about them is simply better than the rest when you bump into them. We’ve all seen them. For each of us it might even be a different application.

What do they all have in common?

Their user experience works for you (instead of you working for it)
You don’t need to think about calls not connecting (they might not connect, but somehow, you’re going to understand why – and it will happen less often)
Media quality will be good enough (you won’t find yourself comparing it to other experiences you’ve had)

Getting there requires a certain commitment. A need to look at the various parts of the application, the whole design, the implementation. And then to lovingly optimize it over and over again. Iterating in each stage to polish another piece of it.

Somehow, I wanted to compare a good WebRTC application to a great orchestra performance in this quote, but I find myself drawn to another conclusion immediately – the one that says that WebRTC is a marathon and not a sprint.

Table of contents

For Engineers
- Cover all your bases
- An ongoing effort
For Product Managers
For Customer Success and Support
Need help?

For Engineers

Getting WebRTC properly tuned like a great orchestra requires finesse and a lot of understanding of how WebRTC works.

There are a lot of moving parts in WebRTC – clients, browsers, media servers, TURN servers, …

And they all need to work together properly:

Cover all your bases

Just recently I sent out my tip & offer email to my subscribers, where I mentioned that a media server cannot work in a vacuum and needs a client side SDK.

Fast forward a few weeks, and Cloudflare acquires Dyte because its SFU was missing … a client SDK. I’ve written about Cloudflare as part of my previous article – go check it out.

The same is true for the other bits and pieces of WebRTC:

Yes. TURN servers are rather independent and the first thing I’d suggest to my clients is to “outsource” these to third party managed services if possible. But you still need to focus here on where your users are, their types, the need for custom installations at times, etc.
Media servers need to have client SDKs. I mentioned that already above
You need to figure out the source of truth in the whole deployment, if you even have one. Or do your media servers and application servers each communicate independently and directly with the clients, which pass JWT tokens carrying their permissions?
Scaling has multiple dimensions here: scaling a group call, scaling on the client’s UI, scaling specific server types, scaling a global session across servers, …
How do clients and media servers “negotiate” the dynamic capabilities and limitations of the client’s device?
Where do the UI and UX on the device play a role to “hide” certain limitations of the system – such as the latency, mute signals, low CPU, poor networks, …

The list here is endless…

An ongoing effort

An orchestra? It has a conductor. His role is to decide in real time what takes place. And for that he looks and listens to the musicians.

With a WebRTC application, we need observability – a way to understand what users feel in real time as it relates to the media being sent and received. And then we need to adapt.

This adaptation is done dynamically. But also as an optimization effort that takes place over time.

For Product Managers

Here are a few immediate insights to draw from this:

It isn’t that simple and obvious what makes up a good application
Good applications require attention to detail
Since WebRTC is built out of many moving parts, you need to orchestrate and tune how they work together to reach the type of an experience you want
This is going to take time. Longer than what your developers or your outsourcing vendor is promising you. And not because they don’t know – but because getting from a WebRTC application to a good WebRTC application isn’t obvious (or even factored in the requirements)

So. Where does that lead you?

Look at WebRTC projects as ongoing investments
Plan for generous “technical debt” time
- 20% of engineering effort around the communications piece should be fine
- Split this between actual technical debt and small tweaks and improvements that are targeted at tuning your WebRTC orchestra
- Have a Product Manager guide and prioritize these tuning initiatives
Compare your application to the market leaders every 6 months or so
- WebRTC moves fast, and so do the leading vendors
- Knowing what they do and “feeling” their apps will give you insights

For Customer Success and Support

An orchestra has lots of different musical instruments. Each giving its own unique sound to the final composition.

With WebRTC applications, we must not forget Customer Success and Support functions.

We may have the best implementation of WebRTC and the best infrastructure in place. But at the end of the day, what is going to matter is the here and now. The session the user is on, and the experience he is having.

And as I always say, things are out of our control, and some of the reasons for that are the user’s own device and the network.

In such cases, we will need to front user complaints and requests, and be able to handle them properly. This is part of the overall experience. Part of the “orchestra performance” that we’re putting out there in our WebRTC application.

Take care of all your WebRTC instruments – even the non-technical ones.

Need help?

Be sure to follow this blog, as it is the most up to date resource out there about WebRTC

Subscribe to WebRTC Weekly to get a picture of what others are publishing about WebRTC out there

The WebRTC Insights service takes care of a lot of what goes on in the market for you, as well as the progress made by browsers with WebRTC support

I can assist with comparisons to market leading apps, as well as in prioritizing efforts

The post A good WebRTC application is like a great orchestra performance appeared first on BlogGeek.me.

The future of Video APIs is… AI: LiveKit, Daily and Cloudflare this month

bloggeek - Mon, 04/28/2025 - 12:30

Three important news items were published in the past couple of weeks that are shaping the Video API market. And all have an AI aspect to them.

Our Programmable Communications industry (CPaaS) is moving and shifting. And those focused on video are the ones who matter at the moment. In the past, we’ve seen such innovations coming from Twilio, who defined and redefined the CPaaS market. In recent years not much. These days? You need to look at the video players to understand the trends.

Here are 3 big news items that got my attention this month and why they matter:

Daily announced Pipecat Cloud
LiveKit series B funding and… LiveKit Cloud Agents
Cloudflare acquires Dyte and partners with Hugging Face

Let’s see where all this leads us

Table of contents

Why video is leading the way in AI for CPaaS
From an AI interface to an AI framework
Daily and Pipecat Cloud
LiveKit and Cloud Agents
Cloudflare closing its gaps
Upcoming update of the Video API report

Why video is leading the way in AI for CPaaS

CPaaS started off around SMS and voice. The concept around it was to aggregate telecom providers and place a single, sane API on top of them.

The barrier or moat here for vendors was the negotiation of contracts and integrating with interfaces of 100+ telecom providers around the globe. Not fun at all.

That meant a customer could purchase a phone number, send a message and answer calls without the need to think if the underlying provider is Verizon, AT&T or Globe Telecom in the Philippines. And the customer didn’t really care who the underlying provider was, as long as the service was good. And that service was uniform in nature – you want calls to get connected and messages to be delivered at a high rate. Nothing less and nothing more.

Fast forward to today and nothing changed in voice-land.

But AI is different.

When you look at how the voice focused vendors are adding AI, some are doing so by deciding which algorithms/vendors to use and placing an API layer on top of it, taking their sweet time about it. The notion is that the customer doesn’t care much/enough about this anyways and/or that the algorithms/vendors are finite and small in their number. So they can do it all themselves.

The video focused vendors who are looking at this and are at the forefront with their vision are Daily, LiveKit and Agora. They all created AI frameworks and made them open source. Gustavo wrote about these already.

The concept behind all these frameworks is simple:

Make it easy to connect the Programmable Communications media stream to the framework
Have the framework flexible enough to deal with a variety of use cases, some of which are still unknown to us
Integrate with as many algorithms/vendors as possible
Make it open source, so that others can integrate more algorithms/vendors (because the world is infinite here and not finite)

And it worked for them. At least based on the engagement numbers we see on git for the relevant projects.

From an AI interface to an AI framework

The naive solution which I was promoting and aiming for was simple. If you are dealing with CPaaS, what you need to offer is a way to extract or inject in real-time audio and video streams to your platform in a backend-to-backend manner.

Such an approach just means that you have a WebSocket, RTP or some other transport mechanism from your media servers that can then be connected to external AI services. Think of STT (Speech-To-Text) for a call as an example. Users connect to your SFU. The developers can take the audio from that meeting and send it towards whatever STT service they want and continue things from there.

That enabler is an AI interface for CPaaS. Some services have had these for years on their voice channels. Those doing video started introducing them more recently. It gives developers the full capabilities, but little else. In a way, it leaves a lot to be desired. Especially now that LLMs are so popular and mostly text based.

What happens is that we usually need a kind of a processing pipeline these days. A way to ship media from the media server through one or more external components and then back into the media server. That requires an AI framework.

Something akin to… well… Daily’s Pipecat and LiveKit Agents.

I believe such frameworks connected to the Video API or being an integral part of them will be critical moving forward.
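To illustrate the difference between an interface and a framework, here is a toy sketch of such a processing pipeline: media goes in, passes through a chain of stages and comes back out. This is not the API of Pipecat or LiveKit Agents. Every name in it is hypothetical, and the three stages are fake stand-ins for real speech-to-text, LLM and text-to-speech services:

```python
from typing import Callable, List

Stage = Callable[[object], object]


def build_pipeline(stages: List[Stage]) -> Stage:
    """Chain stages so that the output of one becomes the input of the next."""
    def run(frame: object) -> object:
        for stage in stages:
            frame = stage(frame)
        return frame
    return run


# Fake stages that only shuffle bytes and text around.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend the audio was transcribed


def fake_llm(text: str) -> str:
    return f"You said: {text}"        # pretend a model produced a reply


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")       # pretend the reply was synthesized to audio


voice_agent = build_pipeline([fake_stt, fake_llm, fake_tts])
reply_audio = voice_agent(b"hello")
```

An AI interface gives you only the two ends of this chain (media out, media in). A framework gives you the chain itself, plus the ability to swap any stage for a different vendor.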

Daily and Pipecat Cloud

Daily had a hosted solution for AI called Daily Bots. It decided to sunset it and instead introduce Pipecat Cloud. The actual announcement was made by their CEO, Kwindla Hultman Kramer over LinkedIn:

(you should follow Kwindla on LinkedIn – he shares a ton of insights and resources there regularly)

The main change?

Up until now, Daily developers could use one of two approaches:

Adopt Pipecat as their AI framework, build their logic with it, and then deploy it on their own wherever they wanted – just like any other open source component
Use Daily Bots, which was a hosted service by Daily, built on top of Pipecat. It was great but limited in nature (it didn’t allow running custom Python code as part of the bot)

Daily decided to sunset Daily Bots and migrate its customers to a new platform called Pipecat Cloud. This is a managed Pipecat service, where developers build their own Pipecat pipelines in local Docker containers and then upload them to the Pipecat Cloud where they run in production. Daily takes care of scaling, monitoring and everything else.

It was the natural next step:

This widens the moat against those of Daily’s competitors who just use Pipecat; they would now need to offer a cloud based service as well to be compelling to begin with
It enables and entices an easy migration path between the Cloud and the open source offering

In a way, Daily took a step from LiveKit’s playbook – starting by offering an open source framework (Pipecat), getting developers hooked and addicted to it, and then introducing a paid Cloud service for it. Which is a natural segue to… LiveKit.

LiveKit and Cloud Agents

LiveKit had a big announcement this month, celebrating its new series B funding of $45M. This post is interesting in the way it is written – from the least important to the most important (at least for me):

LiveKit Agents 1.0 is released, in a way, stating this isn’t a beta or an MVP anymore without really saying it
- Workflows are introduced, for better support of a conversation flow with known steps in it (mainly for contact centers)
- Multilingual semantic turn detection, which is neat
- Telephony support, which was there before, but somehow mentioned here for emphasis I believe
Wrapped under Agents 1.0 is also Cloud Agents, which I believe deserve to be mentioned separately
- LiveKit Cloud Agents is the same as Pipecat Cloud – in the sense that you build your own LiveKit Agents logic and code, and then host it on LiveKit’s cloud
- Unlike Pipecat Cloud, Cloud Agents is in closed beta with a Google Form in front of it to access
- This might mean that LiveKit weren’t ready for this announcement, but had to push it through because of Daily’s announcement AND because of their series B funding
Series B funding
- $45M is serious money in 2025, especially in the Video API domain where funding is scarce. This is low when compared to pure AI players, but in a way, shows where the focus is in our industry now – AI (not surprising)
- Total funding LiveKit raised so far is $83M, which is considerable and shows the trust of its investors
- LiveKit plans to use this new funding towards “growing our team and furthering our progress towards offering an all-in-one platform for building AI agents that can see, hear, and speak like we do.”

This was great news for LiveKit and it gives them what they need to push through and grow their offering in ways that are hard to achieve in the current economic climate.

Cloudflare closing its gaps

I must admit. For me, Cloudflare in WebRTC was a bright shining light and a huge disappointment at the same time.

On one hand:

Cloudflare is likely the 4th IaaS vendor after AWS, Azure and GCP
Their spread of 200+ data centers and use of Anycast brought something fresh and new to the WebRTC market
A no frills hosted SFU was again something interesting and new

On the other hand though:

There was no client SDK to speak of
Cloudflare assumed developers would just connect to their SFU and it would magically just work, which is far from the reality. Especially if you want to optimize for media quality
Since the initial announcements, no further news came out of Cloudflare officially

It seems like Cloudflare didn’t lose interest in WebRTC. It just tried to figure out what the next big step should be, and it is trying to close the gaps with two different deals it did, wrapped into a single announcement.

It starts with a new name for the offering. Instead of Cloudflare Calls it is now called Cloudflare Realtime, which now includes 3 products: RealtimeKit (new and in beta), TURN Server (once almost a hidden part under Calls) and Serverless SFU (what was Calls).

Cloudflare acquired Dyte, another Video API vendor from India, and wrapped it into RealtimeKit
- Dyte will be moving its own API and SDKs to use Cloudflare’s infrastructure (IaaS and most likely also TURN and SFU). At some future point, they might just close Dyte as a product/company and have it all under RealtimeKit
- RealtimeKit now serves as the biggest missing part for Cloudflare – client SDKs. The announced platforms that will be supported by these SDKs are Kotlin, React Native, Swift, JavaScript and Flutter
- Recording and Voice AI (in partnership with ElevenLabs) will be part of the platform as well
- As with LiveKit Cloud Agents, the access to RealtimeKit is also in private beta behind a signup form
- There is also a promise that all this comes with a robust AI offering, but that feels more of a lip service or a roadmap item than anything else at the moment
Partnership with Hugging Face
- Hugging Face is a large and important player in the generative AI and machine learning domain
- Recently, it launched its own FastRTC framework. FastRTC is all about connecting WebRTC and WebSockets to AI models – essentially what we need to build our media pipelines in Video APIs; and in a way, somewhat similar to Pipecat and LiveKit Agents
- To make sure users of FastRTC end up with Cloudflare’s WebRTC infrastructure and RealtimeKit, the initial step that Cloudflare took was to offer 10GB of free TURN bandwidth each month to Hugging Face users. It sounds like a lot, but it is worth $0.5/month based on Cloudflare’s TURN pricing
- What’s important here is the partnership and the intent. I am sure this is a first step, considering the acquisition of Dyte in parallel to this

All in all, a positive announcement for Cloudflare, and one that shows intent of investing further in WebRTC and Video APIs.

Upcoming update of the Video API report

These market changes, along with a few previous ones, made me decide to update my Video APIs report.

It needs a better explanation of the market after Twilio decided to keep their Programmable Video service. And in light of the trends mentioned here, it also needs a beefed up section dealing with AI frameworks.

I am again reaching out to the vendors, to see what I missed from the work they put into their platforms this past year, and also looking for vendors who weren’t covered by the report so far and should be there. If you know of one, or work in one, just ping me to let me know.

And if you are interested to learn more about this report, or any of my other services – just reach out to me.

The post The future of Video APIs is… AI: LiveKit, Daily and Cloudflare this month appeared first on BlogGeek.me.

What’s Your SaaS for WebRTC Signaling?

bloggeek - Thu, 04/24/2025 - 12:00

Looking for a signaling solution for WebRTC? Why not ditch the whole protocol discussion and head straight towards a SaaS based approach?

The true meaning of cloud-based signaling

I’ve written about how to select a signaling protocol for WebRTC. This led to a lively discussion both on my blog and on Facebook’s WebRTC group. I learned a new thing that day:

VoIP signaling is a religion. People believe in a specific protocol and worship it. And they tend to fight with the atheists.

I was part of the religious in VoIP, but I now have doubts about its need. Call me a signaling protocol atheist.

I was corrected on that post that signaling protocol and network protocol are two separate things that need to be discussed and selected separately. It is true that they are different, but I think that most developers who approach WebRTC today don’t make that distinction any more – they simply don’t care – they are just trying to get their service to work.

On Facebook, Olle E Johansson commented:

Yes, the API is the key. Finding an abstraction level that the web developer understands, not that just exposes protocol operations. The signalling matters when things start growing and you need scalability or interoperability with other systems, but only then.

I guess you care about signaling protocol for interoperability, I just don’t see how it can help with scalability – the web is scalable enough – a lot more than VoIP today – just ask WhatsApp. Why bog it down with SIP? My recommendation still stands: If you don’t need to connect to other networks (or if you do, but only for a small part of your use case) – go for a proprietary signaling protocol.

What I did ignore/miss though, is what happens when you decide to go for a proprietary protocol, but don’t really want to deploy a server at all. What if what you want is to get “signaling as a service” – SaaS.

First question is why would you?

The easy answer here is because you can, and because it has its advantages over building your own. As with any other SaaS or cloud related service, these things come to mind:

Scalability – someone else who does that for a living takes care of it for you
Maintenance – do you really want a DevOps guy to sit all day playing with scripts and monitoring your signaling infrastructure?
Availability – same as above. Just too much work to deal with

The main thing though, is probably deciding what’s core to your business and what is just details. Signaling has migrated into the “details” part, so outsourcing it to a SaaS vendor makes sense.

Here are a few viable options to use for WebRTC signaling in a SaaS model.

Ably

Ably is one of the independent managed messaging/signaling platforms out there that can be used for WebRTC signaling.

I have a soft spot for Ably – at testRTC, years ago, when we needed a signaling solution to create our own simple demos or to integrate into our products, after going through the motions of trying out other alternatives (some listed here below), we ended up with Ably.

Why? Because it was the most straightforward and simple for our developers to integrate with.

What were the exact reasons? I don’t know, and didn’t investigate much at the time. It simply provided the best experience for us in getting things up and running – and that’s our goal anyways.

PubNub

If you’ve been around long enough with WebRTC, you should already be aware of PubNub. As an example, Rebtel are already using them.

PubNub offers a publish/subscribe infrastructure that can be used to develop messaging applications. WebRTC services being one of their targets, they are heavy on marketing their solution in WebRTC events and have gone as far as offering a reference implementation for developing a video calling service with WebRTC and PubNub.

If you are looking for a vendor that cares about showcasing customers that use WebRTC and offers the kind of scaling you will need today for other use cases – PubNub is a good choice.

Firebase

Google Firebase is one of the BaaS vendors out there (Backend as a Service). Their intent is to enable developers to build frontend apps without having to care at all about the backend.

The main difference from PubNub here is that it synchronizes data and acts as distributed storage/memory for your apps. If you need more than just messaging, I’d suggest you check it out.

Since its acquisition by Google, Firebase has expanded to be the developer backbone solution of a lot of services for developers – especially on Android apps.

I know of a few in the WebRTC community that are using Firebase, so it is a valid option. Firebase might not make a lot of noise about WebRTC, but that’s because it isn’t their main focus (which can be seen also as a downside)

PeerJS

PeerJS isn’t really a SaaS provider, but it is contemplating becoming one – at least from how it looks on their website.

PeerJS is a framework that provides signaling for WebRTC. It operates with a Node.js based server called PeerServer that has a service called PeerServer Cloud. This cloud service offers only free accounts for hacking, but nothing in the form of production support.

It is here because of three reasons:

Many are using it already to build their services, so it made sense to me
They have the potential (and general intent) of offering it in a SaaS model; although this hasn’t properly materialized in over 10 years of their existence
The source is freely available, which means that you have the ability to start with SaaS and migrate to your own data center

A word of caution – from the looks of it – this might not be able to scale easily to the millions. It just seems too… lightweight.

Pusher

If Ably, PubNub and Firebase are here, then Pusher should be as well. It is a messaging SaaS provider. Not much in the WebRTC domain about it, but I guess that it can be used just as well.

Use it if you already know it and like it.

–

I am sure there are others as well that I missed, like XSockets.NET – and those that are still too small like GrimWire. And then there are the likes of Stream, which started for messaging and now has its own video service built on top of WebRTC as well.

–

If you are trying to figure out what to use for your product, you can always contact me about it.

The post What’s Your SaaS for WebRTC Signaling? appeared first on BlogGeek.me.

OpenAI & WebRTC Q&A with Sean DuBois

webrtchacks - Tue, 04/22/2025 - 13:23

OpenAI is utilizing WebRTC for its Realtime API! Even better, webrtcHacks friend and Pion founder Sean DuBois helped to develop it and agreed to a Q&A about the implementation. It is not often a massive WebRTC use case like this emerges so rapidly. In addition, Sean was extremely transparent about his work at OpenAI. In […]

The post OpenAI & WebRTC Q&A with Sean DuBois appeared first on webrtcHacks.

WebRTC gives voice to LLMs

bloggeek - Mon, 04/14/2025 - 12:30

Explore the role of voice LLM in interactive AI. Understand how voice interfaces in generative AI require the use of WebRTC technology.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

We’re all into ChatGPT, LLMs, Agentic AI, Conversational AI, bots, whatever you want to call them.

Our world and life now revolves around prompting. It used to be search and copy+paste. Now it is all prompting.

A natural extension of text is voice. And for that, we need to also understand that the whole interaction is going to be different:

Where prompting is turn by turn, voice is a lot more interactive.

At the “beginning” (as if we have ChatGPT with us for a decade…), companies introduced support for voice interfaces to their Generative AI LLM models using WebSockets. Some still introduce it today as well – even calling it “low latency”.

Rather quickly, that notion has died and has been replaced with the use of… WebRTC.

Why? Because we need something that is low latency, real time, interactive and live. All words that are used to describe WebRTC.

Want to dig deeper into this? Check out the following articles:

What Will Be the API Giving Voice to LLMs? (Nordic APIs)
CPaaS and LLMs both need APIs and SDKs
OpenAI, LLMs, WebRTC, voice bots and Programmable Video
Generative AI and WebRTC: The fourth era in the evolution of WebRTC

Need help?

My Generative AI & WebRTC workshop is available for corporate customers to enroll for live sessions

This blog is chock full with resources and articles that deal with these things. You just need to search for it and read

I offer consulting to companies who want to develop with WebRTC. This includes making use of Generative AI technologies

The post WebRTC gives voice to LLMs appeared first on BlogGeek.me.

Measuring the response latency of OpenAIs WebRTC-based Realtime API

webrtchacks - Tue, 04/01/2025 - 14:00

As Chad mentioned in his post last week, we have been diving into what OpenAI is doing with WebRTC. Over the last months, we actually did a full teardown and compared OpenAI’s Realtime API to what powers chatgpt.com. What intrigued us most was how to measure response latency. One of the key metrics for any […]

The post Measuring the response latency of OpenAIs WebRTC-based Realtime API appeared first on webrtcHacks.

Upcoming Livestream April 10: Open AI WebRTC Q&A with Sean DuBois

webrtchacks - Tue, 04/01/2025 - 05:52

Thursday, April 10 at 17:00 UTC / 11:00 AM EDT / 8:00 AM PDT. OpenAI is utilizing WebRTC for its Realtime API! Join webrtcHacks editor Chad Hart for a Q&A with OpenAI’s lead […]

The post Upcoming Livestream April 10: Open AI WebRTC Q&A with Sean DuBois appeared first on webrtcHacks.

Tools for troubleshooting WebRTC applications

bloggeek - Mon, 03/31/2025 - 12:30

Troubleshoot your WebRTC applications with proven strategies. Discover tools to resolve common issues in connectivity and performance.

WebRTC is great. When it works.

When it doesn’t? A bit less so. Which is why there are tools available at your disposal to be able to debug and troubleshoot issues with your WebRTC application – be it connectivity failures, poor quality, bad use of the APIs or just buggy implementation.

This article, as well as the other articles in this series were written with the assistance of Philipp Hancke.

Interested in webrtc-internals and getStats? Then this series of articles is just for you:

webrtc-internals and getStats
Reading getStats in WebRTC
ICE candidates and active connections in WebRTC
WebRTC API and events trace
Tools for troubleshooting WebRTC applications (you are here)

Time to get down to business and see what tools are available to us for troubleshooting WebRTC applications.

Table of contents

The need for observability in WebRTC
Analyzing webrtc-internals (and getStats) with fippo’s dump importer
Using rtcstats for WebRTC data collection
Full fledged getStats based monitoring
observeRTC
peer metrics
testRTC
How can we help

The need for observability in WebRTC

People approach me frequently to help them find issues with their applications. They are stuck being unable to launch a WebRTC service or get too many customer complaints about the service instability.

The complaints are varied. They come in different shapes and sizes:

Too many calls don’t connect
Some calls have poor quality
Users complain our service isn’t as good as Zoom (or other providers)
A few of the participants can’t hear others in a group conference

There’s usually more than a single reason to cause each of the problems, which means the original complaint isn’t enough to solve the root cause of the problem.

For that, what is needed is observability in WebRTC. The ability to collect and analyze the relevant information in such calls that have issues. And for that, you need the system and the tooling in place.

The best approach if you ask me? Getting as close as possible to what webrtc-internals has to offer, and then outdo that as well.

What I want to do in this article, is to list a few of the solutions out there available today.

Analyzing webrtc-internals (and getStats) with fippo’s dump importer

The first tool on this list is fippo’s dump importer.

You take a webrtc-internals dump file, upload it to the dump importer, and then get a nice visualization of it.

We’ve covered this tool already in this series of articles, so there’s no point doing it here again.

The cool thing? It also supports files that get collected using rtcstats, which is the next tool on this list.

Using rtcstats for WebRTC data collection

What you get with fippo’s dump importer is great, but then you need to collect a webrtc-internals dump from an incident to be able to use it, and chrome://webrtc-internals is a technician’s solution to the problem, which means it can’t scale to production systems and real users.

Which is why we need to be able to collect that data on our own from the WebRTC calls in our application. To do that, you can use rtcstats. This is another open source project that can help in collecting the relevant data inside your client side JS code that runs in the browser. It collects data similar to webrtc-internals and lets you send it to a server in whatever way you choose. One way to do this is to send the data via WebSockets while the call is running. This may have a slight performance impact on the call but means the data is available immediately after the call ends.

Another alternative is to collect the data locally and upload a blob after the call ends. This avoids taking bandwidth from a live call but you need to consider what happens if a user closes the tab and never returns. This might still be useful if you only want to have the data available when the user files a support request in which case it can be uploaded as part of that.
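
To make the two options a bit more concrete, here is a minimal sketch of a client side buffer that supports both of them. The class name, the event shape ([method, connectionId, value, timestamp]) and the send callback are assumptions made for illustration only. This is not the actual rtcstats code or its wire format.

```javascript
// Hypothetical client side trace buffer; names and event shape are
// illustrative assumptions, not the rtcstats API.
class TraceBuffer {
  constructor(now = () => Date.now()) {
    this.now = now;   // injectable clock, handy for testing
    this.lines = [];  // one serialized event per entry
  }

  // Record a single event, e.g. trace("createOffer", "PC_0", null).
  trace(method, connectionId, value) {
    this.lines.push(JSON.stringify([method, connectionId, value, this.now()]));
  }

  // Line-oriented JSON: one event per line.
  toLineJson() {
    return this.lines.join("\n");
  }

  // Hand everything collected so far to a caller supplied send function.
  // That function could POST a blob after the call ends, or attach the
  // text to a support request.
  async flush(send) {
    if (this.lines.length === 0) return 0;
    const pending = this.lines;
    this.lines = [];
    try {
      await send(pending.join("\n"));
    } catch (error) {
      // Keep the events so a later flush can retry them.
      this.lines = pending.concat(this.lines);
      throw error;
    }
    return pending.length;
  }
}
```

Calling flush() on a timer while the call is running gets you close to the "live" option, and calling it once when the call ends (or when the user files a support request) is the "upload later" option. The tradeoffs are the ones described above.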

For native applications, you might want to port this code as well, though that’s a bit more challenging. Jitsi is maintaining a friendly fork of the project.

Full fledged getStats based monitoring

rtcstats is the client side for WebRTC data collection. It comes without a server side – it just generates the JSON blobs you need to collect somewhere.

While rtcstats-server is unmaintained (Jitsi maintains a fork of it too), it shows a couple of things you can do with the data and can serve as a starting point. The concept is to send the stats over a WebSocket to a server, and have that server process it at the scale you need. One of the most basic functionalities provided is taking the data and storing it to a file which then gets uploaded to a cloud storage service. A much fancier feature is to extract certain metrics from each session, such as the time it takes getUserMedia to resolve, and look at those metrics over all calls on your service.

One thing to remember – be sure to store the files in their rtcstats format (which is basically line-oriented JSON), so that you’ll be able to view them with fippo’s dump importer.
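
As an illustration of the "extract a metric from each session" idea, here is a small sketch that reads a line-oriented JSON dump and measures how long getUserMedia took to resolve. It assumes each line is a JSON array of [eventName, connectionId, value, timestampMs] and that the events are named "getUserMedia" and "getUserMediaOnSuccess". Both are assumptions for the sake of the example, so check them against the files you actually store.

```javascript
// Sketch only: assumes lines of [eventName, connectionId, value, timestampMs].
function getUserMediaDurations(dumpText) {
  const durations = [];
  let startedAt = null;
  for (const line of dumpText.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const [name, , , timestamp] = JSON.parse(line);
    if (name === "getUserMedia") {
      startedAt = timestamp;
    } else if (name === "getUserMediaOnSuccess" && startedAt !== null) {
      durations.push(timestamp - startedAt); // milliseconds
      startedAt = null;
    }
  }
  return durations;
}
```

Run something like this over every stored session and you get a distribution you can chart across all calls on your service.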

observeRTC

Then there’s observeRTC. An open source project that includes the client side and the server side.

It isn’t that popular, but it has many of the bits and pieces needed.

Check it out as well when you plan on building your own.

peer metrics

peer metrics is/was a commercial SaaS service for monitoring WebRTC. A year ago, they open sourced it. The open source projects themselves aren’t that popular at the moment and the amount of work done on this service is minimal.

Again, check it out if you are planning to build your own.

testRTC

If you are looking for a client side WebRTC data collection service that works at scale commercially, then there’s Cyara watchRTC. Cyara acquired Spearline which acquired testRTC.

testRTC was a company I co-founded with a couple of friends. You can say I am biased as to what this service can do.

If you want an out of the box solution – check them out.

How can we help

WebRTC statistics is an important part of developing and maintaining WebRTC applications. We’re here to help.

You can check out my products and services on the menu at the top of this page.

The two immediate services that come to mind?

WebRTC Courses – looking to upskill yourself or your team with WebRTC knowledge and experience? You’ll find no better place than my WebRTC training courses. So go check them out
WebRTC Insights – once in every two weeks we send out a newsletter to our Insights subscribers with everything they need to know about WebRTC. This includes things like important bugs found (and fixed?) in browsers. This has been a lifesaver more than once to our subscribers

Something else is bugging you with WebRTC? Just reach out to me.

The post Tools for troubleshooting WebRTC applications appeared first on BlogGeek.me.

The Unofficial Guide to OpenAI Realtime WebRTC API

webrtchacks - Tue, 03/18/2025 - 13:45

OpenAI using WebRTC in its Realtime API is obviously exciting to us here at webrtcHacks. Fippo and I were doing some blackboxing of this based on a quick sample the day of the WebRTC announcement so we could look at it in webrtc-internals and Wireshark. Some weeks later, my daughter was interested in using ChatGPT […]

The post The Unofficial Guide to OpenAI Realtime WebRTC API appeared first on webrtcHacks.

WebRTC API trace

bloggeek - Mon, 03/17/2025 - 13:00

Explore the WebRTC API trace for effective debugging and troubleshooting of connectivity and quality issues in your applications.

WebRTC is great. When it works.

This article, as well as the other articles in this series were written with the assistance of Philipp Hancke.

Interested in webrtc-internals and getStats? Then this series of articles is just for you:

webrtc-internals and getStats
Reading getStats in WebRTC
ICE candidates and active connections in WebRTC
WebRTC API and events trace (you are here)
Tools for troubleshooting WebRTC applications (coming soon)

What did your app do exactly? That’s going to be what we’ll look at and cover now. The events log that holds all WebRTC API calls.

Table of contents

WebRTC events log – the video version
What is in the events log?
How to create traces of your own programmatically?
Important APIs and callbacks in the events log
- The connection failed
- There was no audio
How can we help

WebRTC events log – the video version

Here’s a quick video guide on the WebRTC events log:

What is in the events log?

WebRTC has a rich set of APIs in web browsers for using it. “Stealing” from Olivier’s article about the state of WebRTC APIs, there are currently 479 APIs in WebRTC:

When an issue arises in a WebRTC application, it might be due to a multitude of reasons – from network, to device, signaling, the user, etc.

It might be connectivity and media quality. But also just unexpected behavior of the application.

One of the ways in which we can debug things (without breakpoints and runtime debuggers) is by looking at the WebRTC events log (also known as events trace at times).

If you open chrome://webrtc-internals as a destination inside a Chrome browser tab, it will collect WebRTC API calls and events from all of your Chrome browser tabs and log them into this new tab. You will then be able to review the flow of your WebRTC application – the APIs it called at each step, the return values of failures, events that were invoked, etc.

Here’s what it looks like when using StreamYard (to make the video recorded above) for example:

When more data is there, you can click to open up, showing for example, the SDP of the createOfferOnSuccess() event.

A very similar events log exists in the dump importer as well, so we’ve skipped showing a similar screenshot for it.

Under the hood this functionality is implemented outside of libWebRTC in the Chromium layer (which is the reason this is not easy to replicate for other browsers). The implementation (called the “peer connection tracker”) is monitoring all RTCPeerConnection objects and getting notified about all method calls and events. This information gets serialized to JSON and is sent to the webrtc-internals tab(s) (if they are open) and then turned into the event log we can see in the screenshot. While it has evolved quite a bit compared to how it looked in 2014 in one of Philipp’s first WebRTCHacks posts, the basic functionality has been there for over a decade and helped resolve countless bugs and issues:

How to create traces of your own programmatically?

webrtc-internals has in many ways shaped the approach WebRTC developers take to debugging WebRTC issues. It has a very serious shortcoming though: you cannot ask a user to send you a “webrtc-internals dump” that lets you look at their problem. As we have seen countless times, that is challenging even for developers.

You can take the same approach as Chromium and add tracing before and after each method call and for each event. That becomes quite a maintenance hassle, however. In JavaScript, one can use the same polyfilling techniques used by adapter.js to achieve the same result transparently. This is how “rtcstats.js” came to be and it is surprisingly compact, only 400 lines. These days, Jitsi is maintaining a fork.

The main advantage of this is that it is very lightweight on the client side, limiting itself to the traces and periodic collection of data while all the business logic is handled by a backend. Even without a backend the events and stats can be stored on the clients’ browsers in a storage such as IndexedDB and then attached to support requests.
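
To give a feel for the polyfilling technique, here is a minimal sketch of wrapping methods on a prototype so that every call gets logged. It is not the rtcstats.js implementation. The function name and the "OnSuccess" / "OnFailure" suffixes are assumptions chosen to resemble what the events log shows.

```javascript
// Wrap the named methods on a prototype so each call (and the outcome of
// any returned promise) is reported to a log callback. Sketch only.
function traceMethods(proto, methodNames, log) {
  for (const name of methodNames) {
    const original = proto[name];
    if (typeof original !== "function") continue; // skip unknown methods
    proto[name] = function (...args) {
      log(name, args);
      const result = original.apply(this, args);
      if (result && typeof result.then === "function") {
        // Observe the promise without changing what the caller receives.
        result.then(
          (value) => log(name + "OnSuccess", value),
          (error) => log(name + "OnFailure", String(error))
        );
      }
      return result;
    };
  }
}

// In a browser this might be used as:
// traceMethods(RTCPeerConnection.prototype,
//   ["createOffer", "setLocalDescription", "addIceCandidate"],
//   (name, value) => console.log(name, value));
```

The application code doesn’t change at all, which is what makes this approach so much easier to maintain than adding trace statements around every call by hand. Events would need similar treatment, for example by adding listeners when the peer connection gets constructed.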

Important APIs and callbacks in the events log

The events log has quite a few of the APIs and events that occur in a WebRTC application. Here’s our approach to sifting through it quickly.

The connection failed

We start with a verbal description of the problem, e.g. “the connection failed”.

For this we are going to look at the TURN servers configured, the candidates gathered via `onicecandidate` and the candidates added via `addIceCandidate` as well as the `iceconnectionstatechange` events:

Did the call fail to connect at all or did it fail at some point?
What happened prior to that point?
In particular, an iceconnectionstatechange to “failed” is such a frequent issue that the dump importer marks it in bright red so you can see it immediately
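
If you collect your own traces, the same questions can be asked in code. Here is a small sketch that finds the first ICE failure in a list of trace events and returns what happened just before it. The event shape ({ type, value }) is a hypothetical one, so adapt it to whatever your own trace format looks like.

```javascript
// Sketch only: assumes events like
// { type: "iceconnectionstatechange", value: "failed" }.
function findIceFailure(events, contextSize = 5) {
  const isIceState = (e) => e.type === "iceconnectionstatechange";
  const index = events.findIndex((e) => isIceState(e) && e.value === "failed");
  if (index === -1) return null; // no failure in this trace

  // Did the call ever connect before failing, or did it never connect at all?
  const everConnected = events
    .slice(0, index)
    .some((e) => isIceState(e) && (e.value === "connected" || e.value === "completed"));

  return {
    index,
    everConnected,
    // The last few events leading up to the failure.
    before: events.slice(Math.max(0, index - contextSize), index),
  };
}
```

A failure with everConnected set to false usually sends you towards the TURN configuration and the candidates exchanged, while a failure after a connection was made points more towards the network changing mid-call.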

There was no audio

Another example would be “there was no audio”.

We saw such an issue recently so the first thing we checked was whether an audio track was emitted via a ‘transceiverAdded’ event. This was the case, with both an audio and a video track. We then checked the statistics of the audio track and noticed that while `packetsReceived` increased so did `packetsDiscarded`. The jitter buffer emitted events but audio level was consistently zero which pointed to audio not being decoded. Going back to the `transceiverAdded` events for audio and video they showed different streams being used.

The bug? The ‘ontrack’ handler in JavaScript was setting the srcObject of the video element to the stream from each track event. Since the audio and video tracks arrived in different streams, the element ended up with a stream holding only the video track, and audio was never played out or decoded.
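
One possible way to avoid this class of bug is to stop relying on the streams that arrive with each track event and to collect all incoming tracks into a single stream owned by the media element. The sketch below shows that idea. The helper name and the createStream parameter are made up for this example, and this is not necessarily how the application in question was fixed.

```javascript
// Build an ontrack handler that adds every incoming track to one stream
// attached to the element, instead of overwriting srcObject per event.
function makeOnTrack(element, createStream) {
  return (event) => {
    if (!element.srcObject) {
      element.srcObject = createStream();
    }
    element.srcObject.addTrack(event.track);
  };
}

// In a browser this might be wired up as:
// pc.ontrack = makeOnTrack(videoElement, () => new MediaStream());
```

With this, it no longer matters whether the remote side put audio and video into the same stream or into different ones.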

See here for a fiddle reproducing the issue.

Over time, when working with this events log, you learn to see the patterns.

How can we help

WebRTC statistics is an important part of developing and maintaining WebRTC applications. We’re here to help.

You can check out my products and services on the menu at the top of this page.

The two immediate services that come to mind?

WebRTC Courses – looking to upskill yourself or your team with WebRTC knowledge and experience? You’ll find no better place than my WebRTC training courses. So go check them out
WebRTC Insights – once in every two weeks we send out a newsletter to our Insights subscribers with everything they need to know about WebRTC. This includes things like important bugs found (and fixed?) in browsers. This has been a lifesaver more than once to our subscribers

Something else is bugging you with WebRTC? Just reach out to me.

The post WebRTC API trace appeared first on BlogGeek.me.

ICE candidates and active connections in WebRTC

bloggeek - Mon, 03/17/2025 - 12:30

Understand WebRTC active connection and how to troubleshoot connectivity issues effectively in your WebRTC applications.

WebRTC is great. When it works.

This article, as well as the other articles in this series were written with the assistance of Philipp Hancke.

Interested in webrtc-internals and getStats? Then this series of articles is just for you:

webrtc-internals and getStats
Reading getStats in WebRTC
ICE candidates and active connections in WebRTC (you are here)
WebRTC API and events trace
Tools for troubleshooting WebRTC applications (coming soon)

This time? We’re going to figure out ICE negotiation and active connections. Let’s start…

Table of contents

WebRTC Peer Connections – a quick look
Reading WebRTC ICE related events – the video version
Understanding ICE negotiation in WebRTC
ICE candidates and ICE candidate pairs
State changes in ICE
Finding the active connection from webrtc-internals
Finding the active connection via getStats
How can we help

WebRTC Peer Connections – a quick look

Watch the video above if you’re unfamiliar with how WebRTC works. It shows two aspects of WebRTC:

Signaling, which is out of scope of WebRTC
Media, which presumably goes peer to peer – directly between the browsers

Here’s the thing though: media might not go directly between browsers. Or even from a browser to a media server. The reason for that is the network. At times, networks are going to block our traffic:

To overcome this, we use a protocol called ICE in WebRTC.

Reading WebRTC ICE related events – the video version

The video above is a visual explainer of what we have in this article (to some extent). Use it as an introduction before going into the details below.

Understanding ICE negotiation in WebRTC

Here’s a quick overview of ICE:

WebRTC uses ICE to handle NAT traversal.

ICE collects different addresses that the device can use – the local device IP addresses, its public IP addresses (obtained by using a STUN server) and any relay IP addresses (obtained by using TURN servers).

Each such address is called an ICE candidate
There are local candidates – the addresses of the local device
And there are remote candidates – the addresses of the remote device (be it another browser, device or media server)

In WebRTC, we prefer using a method called Trickle ICE, which collects the addresses and runs connectivity checks with addresses it already has in parallel.

Each pair of local and remote candidates is used to conduct a connectivity check.

Once such a check succeeds, we reach the connected state and can start sending media.

If more connections of such pairs are made, the pair with the highest priority will be used.
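
For the curious, here is a rough sketch of how candidate pairs and their priorities can be computed, using the pair priority formula from RFC 8445 (2^32 * MIN(G, D) + 2 * MAX(G, D) + (G > D ? 1 : 0), where G is the priority of the controlling side’s candidate and D that of the controlled side). The candidate objects here are simplified and hypothetical. A real ICE agent also only pairs candidates of the same component and address family, prunes redundant pairs, and libWebRTC’s actual selection logic is more involved than a simple sort.

```javascript
// Pair priority as defined in RFC 8445. BigInt is used because the result
// does not fit safely in a JavaScript Number.
function pairPriority(controllingPriority, controlledPriority) {
  const g = BigInt(controllingPriority);
  const d = BigInt(controlledPriority);
  const min = g < d ? g : d;
  const max = g > d ? g : d;
  return (1n << 32n) * min + 2n * max + (g > d ? 1n : 0n);
}

// Combine every local candidate with every remote candidate and order the
// pairs from highest to lowest priority. Simplified sketch.
function formPairs(localCandidates, remoteCandidates, localIsControlling) {
  const pairs = [];
  for (const local of localCandidates) {
    for (const remote of remoteCandidates) {
      const priority = localIsControlling
        ? pairPriority(local.priority, remote.priority)
        : pairPriority(remote.priority, local.priority);
      pairs.push({ local, remote, priority });
    }
  }
  // BigInt values are compared explicitly rather than subtracted.
  return pairs.sort((a, b) =>
    a.priority < b.priority ? 1 : a.priority > b.priority ? -1 : 0
  );
}
```

Since the lower of the two candidate priorities dominates the result, a pair that involves a relay candidate on either side will typically sort below a pair of host candidates.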

This process takes time and resources, and its results aren’t as deterministic as we’d like them to be either. And at times – you can’t really connect, or you end up connecting in ways that make little sense (usually because of your own bugs).

Why is this important to us when we talk about getStats and webrtc-internals?

Because A LOT of the issues we will face with WebRTC are going to revolve around connectivity of the session. And that boils down to understanding ICE negotiation, selected candidate pair and the active connection in many of the cases.

👉 read this quick article about STUN, TURN and ICE for a few more aspects of NAT traversal

ICE candidates and ICE candidate pairs

We’ve seen what ICE candidates and ICE candidate pairs look like in getStats() last time.

Lucky for us, WebRTC makes it a bit easier to see these things when you open the webrtc-internals tab in Chrome.

For me, there are 4 different places to look at when it comes to connectivity and ICE negotiation in WebRTC:

Peer connection configuration
State machines
ICE candidates table
Events log

1. Peer connection configuration

The peer connection configuration shows us the configuration of the peer connection.

The important parts here with regard to connectivity? The iceServers and the iceTransportPolicy (see here for the “official” documentation).

If the iceTransportPolicy is “relay” then we know we will end up connecting via TURN.

The iceServers configuration simply tells us which STUN and TURN servers are going to be approached when collecting IP addresses for local ICE candidates.
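
Here is a small sketch of what such a configuration can look like, together with a helper that summarizes it the way you would when reading it in webrtc-internals. The server host names and credentials are placeholders, and the helper is a made up function for this example, not a WebRTC API.

```javascript
// Example configuration; host names and credentials are placeholders.
const exampleConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:443"],
      username: "user",
      credential: "secret",
    },
  ],
  iceTransportPolicy: "all", // "relay" would force everything through TURN
};

// Summarize what a peer connection configuration implies for connectivity.
function summarizeIceConfig(config) {
  const urls = (config.iceServers || [])
    .flatMap((server) => (Array.isArray(server.urls) ? server.urls : [server.urls]))
    .filter((url) => typeof url === "string");
  const hasStun = urls.some((url) => /^stuns?:/i.test(url));
  const hasTurn = urls.some((url) => /^turns?:/i.test(url));
  const relayOnly = config.iceTransportPolicy === "relay";
  return {
    hasStun,
    hasTurn,
    relayOnly,
    // With a relay-only policy and no TURN server, no candidates can be gathered.
    relayOnlyWithoutTurn: relayOnly && !hasTurn,
  };
}
```

A session that fails to connect and has relayOnlyWithoutTurn set (or no TURN servers at all on a restrictive network) is a good first suspect when looking at connectivity complaints.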

2. State machines

The state machines indicate which states we’ve gone through.

Making sense of getStats in WebRTC

bloggeek - Mon, 03/03/2025 - 12:30

Unlock the potential of WebRTC stats with getStats to boost your application’s performance and reliability.

WebRTC is great. When it works.

This article, as well as the other articles in this series were written with the assistance of Philipp Hancke.

Interested in webrtc-internals and getStats? Then this series of articles is just for you:

webrtc-internals and getStats
Reading getStats in WebRTC (you are here)
ICE candidates and active connections in WebRTC (coming soon)
WebRTC API and events trace (coming soon)
Tools for troubleshooting WebRTC applications (coming soon)

This time? We’re taking a closer look at what’s inside getStats values – what the metrics that you’ll find there really mean (at least the more important ones)

Table of contents

webrtc-internals / getStats
Structure of a getStats returned value
A deep dive into getStats values
Structure of a webrtc-internals file
How can we help

webrtc-internals / getStats

We’re going to use these two terms interchangeably from now on, so please bear with us.

For me?

getStats is the API inside WebRTC that collects a lot of the data and metrics we’ll look at
webrtc-internals is what Chromium gives us as the main debugging tool for WebRTC (and a lot of the data in there? That’s getStats data)

If you’ve read the previous article, then you should know by now how to obtain a webrtc-internals dump file and also how to call getStats periodically to get the statistics you need.

So time to understand what’s in there…

Structure of a getStats returned value

There are many metrics that can be used in WebRTC to monitor various aspects of the peer connection. To put some sense and order into the process, the W3C decided to design the getStats() API in a manner that would “flatten” the information out for easy search access, and also include identifiers to be able to think of it all as structured tree data.

Here’s a “short” video explainer for WebRTC getStats() result structure:

https://youtu.be/B1MgeVkRQ-M

A map of stats objects

WebRTC has been broken down in the specification to various objects for the purpose of statistics reporting. These objects are sometimes singletons (such as the “peer-connection”) and sometimes may have multiple instances (think incoming media streams).

To get away from the need of maintaining multiple arrays, a single map of statistics is used, which stores a set of RTCStats objects.

Each RTCStats object always has in it an id (object identifier), a timestamp and a type. The rest of the fields (and values) stored in the object depend on the type.

Multiple objects of the same type, such as “inbound-rtp” will have a different id.
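
Since everything sits in one flat map, a common first step is to group the objects by their type. Here is a minimal sketch of that. It works on anything map-like with a forEach, which includes the RTCStatsReport returned by getStats(). The function name is just a suggestion.

```javascript
// Group the RTCStats objects of a report by their "type" field.
function groupStatsByType(report) {
  const byType = new Map();
  report.forEach((stat) => {
    if (!byType.has(stat.type)) byType.set(stat.type, []);
    byType.get(stat.type).push(stat);
  });
  return byType;
}

// In a browser this might be used as:
// const report = await pc.getStats();
// const inbound = groupStatsByType(report).get("inbound-rtp") || [];
```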

Here’s what it looks like if you inspect the response object in the JS console on Chrome:

Partial getStats

Before we dive into the hierarchy and the metrics, it is important to note what happens with getStats() when you call it with a specific selector. The selector is a specific MediaStreamTrack, so that the results returned are going to be limited to that track only.

getStats()
getStats(selector)
RTCRtpSender.getStats()
RTCRtpReceiver.getStats()

Great – right?

Not really…

This is not going to help you in any way. In many ways, it is a hindrance.

When calling getStats(), with or without a selector, libWebRTC goes about its business collecting the statistics across ALL of the WebRTC objects. It sweats and uses resources to collect everything, and then filters down the results for you. There’s no optimization in the collection process that is taking place here.

Since you’re usually going to need to check statistics across your tracks, calling this separately for each track is wasteful.

Our suggestion? Always call getStats() with no selector at all. Do the filtering yourself if needed.
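
Doing the filtering yourself can be as simple as the sketch below, which picks the RTP related objects that belong to a given track out of a full report. It relies on the trackIdentifier field found on "inbound-rtp" and "media-source" objects and on the mediaSourceId pointer found on "outbound-rtp" objects. The function name is hypothetical, and this does not attempt to reproduce exactly what getStats(selector) would return.

```javascript
// Pick the RTP related stats of one track out of a full getStats() report.
function statsForTrack(report, trackId) {
  const all = [...report.values()];
  // Sending side: the track is referenced from a "media-source" object.
  const sourceIds = new Set(
    all
      .filter((s) => s.type === "media-source" && s.trackIdentifier === trackId)
      .map((s) => s.id)
  );
  return all.filter(
    (s) =>
      (s.type === "inbound-rtp" && s.trackIdentifier === trackId) ||
      (s.type === "media-source" && sourceIds.has(s.id)) ||
      (s.type === "outbound-rtp" && sourceIds.has(s.mediaSourceId))
  );
}

// In a browser this might be used as:
// const report = await pc.getStats(); // one call, no selector
// const mine = statsForTrack(report, track.id);
```

One call to getStats() and a few cheap filters will serve all of your tracks, instead of paying for the full collection once per track.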

Hierarchy of objects

Most objects in getStats (but not all of them) end up connecting in one way or another to the “transport” object.

This “hidden” tree structure can be reconstructed by way of the various id fields found inside WebRTC’s stats objects (from WebRTC stats spec):

Some important notes about this table:

We’ve taken the liberty of marking in yellow all of the internal pointers, which can be used to easily jump from one RTCStats object to another inside the results object. All of these end with “Id”
We also marked in orange the track and data channel identifiers. These relate to internal identifiers of WebRTC objects – they can’t be used as stats pointers
Oh, and there are more fields than what you see here… two reasons why:
- The spec has more of them. This table may change as the standard evolves and should be considered partial at best (so click through in the specification to the RTCxxxStats objects to get the full details and descriptions)
- Chrome, as well as other browsers, may have their own proprietary fields that they’ve added where they saw fit. Why? Because they can
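
Following these internal pointers in code is straightforward. Here is a minimal sketch that takes one stats object and looks up everything its “Id” fields point to inside the same report. The function name is made up for this example.

```javascript
// Resolve every field ending with "Id" to the stats object it points to.
// Fields such as trackIdentifier are left alone since they are not pointers.
function resolveReferences(report, stat) {
  const resolved = {};
  for (const [field, value] of Object.entries(stat)) {
    if (field.endsWith("Id") && typeof value === "string" && report.has(value)) {
      resolved[field] = report.get(value);
    }
  }
  return resolved;
}

// In a browser, walking from an inbound stream to the selected candidate
// pair might look like this:
// const transport = resolveReferences(report, inboundRtp).transportId;
// const pair = resolveReferences(report, transport).selectedCandidatePairId;
```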

Let’s see what the main stats objects and fields are there.

The specification of these can be found in the W3C spec for WebRTC itself.

A deep dive into getStats values

Time to look at getStats objects and fields and understand what values we may get for certain WebRTC metrics.

Fields and value types

For me, all of these fields are just field:value (or key:value) pairs.

If I had to group the fields to the types of values they store, it would be something like this:

Identifiers – values that are used to link one stats object to another (we have a screenshot above with yellow markings for all these). Their names end with “Id”. Beware, “trackIdentifier” is not such a pointer
verbose/textual – these are values that store textual or verbose information. Not something that we plot on a graph
accumulators – these are metrics that grow over time, accumulating their information. For example, the number of packets lost (since the beginning)
calculated – the calculated metrics don’t exist in getStats(). getStats doesn’t have calculated values since it takes no stance on the interval over which to calculate averages. These reside in webrtc-internals, which places their names inside [] brackets. They take accumulators and divide them by “something” – usually seconds, to get them averaged out over short periods of time, making it easier to spot outliers on graphs
numbers – numeric values of various kinds that aren’t accumulators or calculated. They are just… numbers. They are either static most of the time, change a bit or change a lot throughout the session. An example? The audio level on the incoming audio or height (in pixels) of a video stream

Why did I want to mention all this? When you see a field, be sure to think about its type – it will help you determine how to read it and what you should do with it.
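To make the accumulator versus calculated distinction concrete, here is a small sketch of how you might turn any accumulator into a per-second value yourself. The helper name ratePerSecond is an assumption for illustration. The only thing taken from the stats API is that every report carries a timestamp in milliseconds.

```javascript
// Turn an accumulator into a per-second rate between two snapshots of the same report.
// ratePerSecond is a hypothetical helper; timestamps in getStats are in milliseconds.
function ratePerSecond(previous, current, field) {
  const elapsedSeconds = (current.timestamp - previous.timestamp) / 1000;
  if (!(elapsedSeconds > 0)) return undefined; // identical or out-of-order snapshots
  return (current[field] - previous[field]) / elapsedSeconds;
}

// Usage sketch: ratePerSecond(oldReport, newReport, 'packetsSent')
```

You pick the interval by picking how far apart the two snapshots are, which is exactly the decision getStats leaves to you.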

“transport” type

Link to spec (RTCTransportStats)

The “transport” type denotes the DTLS and ICE transport objects used to send and receive media. You can think about it as a single RTP/RTCP “connection”.

Things you’ll find on the “transport” type?

Accumulators for packets and bytes sent and received (these are packetsSent, packetsReceived, bytesSent and bytesReceived). These are totals, on a high level. You’ll be more interested in the lower level values on other objects most of the time
Status and state of DTLS and ICE objects, which is important for debugging (mainly iceRole, dtlsState, iceState and dtlsRole)
The selected ICE candidate pair identifier – selectedCandidatePairId, which is important to understand where we’re connected to and how exactly (UDP, TCP, direct, relay, etc)
The certificate identifiers – localCertificateId and remoteCertificateId – not much use in them

Typically you will have a single transport object per connection (unless you are not using BUNDLE).

“candidate-pair”, “local-candidate” and “remote-candidate” types

These objects deal with ICE negotiation candidates.

During this process, WebRTC collects all local candidates (IP addresses it can use to receive media and send media from) and the remote candidates (IP addresses at which the remote peer says it can be reached). WebRTC then conducts ICE connectivity checks by pairing different local candidates with remote candidates.

To that end, getStats stores and returns all “local-candidate” and “remote-candidate” types along with the “candidate-pair” types for the pairs it tried out.

“local-candidate” and “remote-candidate”?

Link to spec (RTCIceCandidateStats)

The ICE candidate statistics object stores static information in general. It doesn’t have anything that changes dynamically, as that happens on the pair. The main fields here relate to the IP, port and protocol (address, port, protocol, candidateType and relayProtocol) used by the candidate.

Our “candidate-pair”?

Link to spec (RTCIceCandidatePairStats)

The candidate pair is the actual connection (or attempted connection). Here things start to become interesting (at last).

On one hand, the pair contains quite a few identifiers, connecting it to the transport object (transportId) and to the local and remote candidates (localCandidateId and remoteCandidateId). The state field indicates whether ICE is still checking the pair, or whether the check failed or succeeded (not too useful).

There are quite a few interesting fields here:

packetsSent, packetsReceived, bytesSent and bytesReceived. These are similar in nature to the ones found on the “transport” type, but for the specific candidate pair
On top of these we have additional accumulators – requestsReceived, requestsSent, responsesReceived, responsesSent and consentRequestsSent – all of these relate to the ICE protocol and connectivity checks conducted for this pair. This becomes important when your connection does not go through
We’ve got timestamps, indicating when packets were last sent or received
Round trip calculations for the STUN/TURN connection, found in totalRoundTripTime and currentRoundTripTime. These measure the ICE RTT towards the peer that terminates ICE, which might be an SFU, and that is different from what RTCP RTT measures. Not necessarily what we want as RTT, but sometimes all we’ve got to go with
Bandwidth estimation calculated values in availableOutgoingBitrate and availableIncomingBitrate
Then there’s packetsDiscardedOnSend and bytesDiscardedOnSend, both accumulators that may indicate network or compute issues (read more about discarded WebRTC packets)

For the most part? This section still deals with connectivity related metrics. A lot less about quality itself.
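Following the identifiers from “transport” to the selected pair and on to its candidates could be sketched like this. The function name describeConnection and the shape of the returned object are made up for illustration. The average RTT follows the spec’s suggestion of dividing totalRoundTripTime by responsesReceived, and fields that a browser does not expose simply come back as undefined.

```javascript
// Resolve the selected candidate pair and summarize how we are connected.
// describeConnection is a hypothetical helper; `stats` is a Map-like of id -> report.
function describeConnection(stats) {
  const transport = [...stats.values()].find(
    r => r.type === 'transport' && r.selectedCandidatePairId);
  if (!transport) return undefined;
  const pair = stats.get(transport.selectedCandidatePairId);
  if (!pair) return undefined;
  const local = stats.get(pair.localCandidateId) || {};
  const remote = stats.get(pair.remoteCandidateId) || {};
  return {
    protocol: local.protocol,             // "udp" or "tcp"
    localType: local.candidateType,       // "host", "srflx", "prflx" or "relay"
    remoteType: remote.candidateType,
    relayed: local.candidateType === 'relay' || remote.candidateType === 'relay',
    relayProtocol: local.relayProtocol,   // only set on local relay candidates
    // Round trip times are reported in seconds; convert to milliseconds.
    currentRttMs: typeof pair.currentRoundTripTime === 'number'
      ? pair.currentRoundTripTime * 1000 : undefined,
    averageRttMs: pair.responsesReceived > 0
      ? (pair.totalRoundTripTime / pair.responsesReceived) * 1000 : undefined,
  };
}
```

Keep in mind this is the ICE RTT, so with an SFU in the middle it tells you about the leg to the SFU only.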

RTCRtpStreamStats

We’re getting to fragmented stats structures – think classes and inheritance in object oriented programming languages. The RTCRtpStreamStats fields are part of all RTP reports – “outbound-rtp”, “inbound-rtp”, “remote-inbound-rtp” and “remote-outbound-rtp”. What does it hold?

Link to spec (RTCRtpStreamStats)

ssrc is the static field connecting us to the SSRC value of the RTP stream itself. These reports also aggregate data from SSRCs related to this SSRC such as the RTX and FEC SSRCs.

kind just indicates if this is an “audio” or a “video” stream. That’s going to affect other metrics down the line, and is also a way to filter and find what we’re looking for.

Then we’ve got the pointer identifiers transportId and codecId.

Nothing much to write home about here, but important to know and understand nonetheless.

RTCSentRtpStreamStats and RTCReceivedRtpStreamStats

Each “*-rtp” type object also holds in it either an RTCSentRtpStreamStats or an RTCReceivedRtpStreamStats set of fields.

RTCSentRtpStreamStats

Link to spec (RTCSentRtpStreamStats)

The Sent one is rather simple. It holds two accumulators that we’ve seen already: packetsSent and bytesSent.

There are slightly more (and different) fields on the receive side of things:

Link to spec (RTCReceivedRtpStreamStats)

On the receiving end, we’re focused on two accumulators and a variable metric. The accumulators are packetsReceived and packetsLost (rather important ones that also help us in calculating packet loss percentage).

And then there’s the jitter metric, which is the reported jitter of the incoming stream’s packets.
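The packet loss percentage mentioned above is not a field you get directly. A minimal sketch of calculating it over an interval, from two snapshots of the same report, might look like this (packetLossPercent is a made-up helper name):

```javascript
// Packet loss percentage between two snapshots of the same report.
// Works on any report carrying packetsReceived and packetsLost accumulators.
function packetLossPercent(previous, current) {
  const lost = current.packetsLost - previous.packetsLost;
  const received = current.packetsReceived - previous.packetsReceived;
  const expected = lost + received;
  if (expected <= 0) return 0; // nothing arrived and nothing was reported lost
  // packetsLost can go down when late or duplicate packets show up, so clamp at 0.
  return Math.max(0, (100 * lost) / expected);
}
```

Using the deltas rather than the raw totals gives you the loss for the last interval, not an average over the whole session.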

“outbound-rtp” and “remote-inbound-rtp” types

These two types are about outgoing media. “outbound-rtp” is about what we send and “remote-inbound-rtp” is about what our peer reported it received from us.

Each of these holds more than one stats object inside of it. We’ve covered the basics of these objects above. Time to look at what they specifically hold.

Let’s review each one of them separately.

“outbound-rtp”

outbound-rtp reports back to us what our WebRTC implementation is sending on a stream. To begin with, the “outbound-rtp” stats object will be holding RTCRtpStreamStats and RTCSentRtpStreamStats fields.

On top of it, there’s a slew of additional fields that will be there, depending on the type of the stream – audio or video.

Link to spec (RTCOutboundRtpStreamStats)

Our outbound RTP metrics relate to both audio and video, with specific metrics that are relevant only for video.

Both audio and video:

mid and rid values, if existent and relevant. The mid tells you where in the SDP the media associated with these stats lives, the rid tells you which simulcast layer is described by it. You won’t be needing this much for quality measurements
rtxSsrc for the retransmission SSRC, if one is used
mediaSourceId and remoteId are again identifier indexes. The remoteId points to the relevant “remote-inbound-rtp” object described below
Lots of accumulators: headerBytesSent, retransmittedPacketsSent, retransmittedBytesSent, totalPacketSendDelay and nackCount
We’ve got our targetBitrate indicating what the encoder is compressing towards
active indicates if this is an active stream or not

Video only:

Additional accumulators here include totalEncodedBytesTarget (which is currently broken and may get removed from the specification), framesSent, hugeFramesSent, keyFramesEncoded, qpSum, firCount, pliCount and totalEncodeTime
- pliCount and firCount give you an idea how often the encoder needs to produce “expensive” keyframe
- totalEncodeTime divided by framesEncoded gives you an idea how much time the encoder is spending per frame on average – the upper limit for that is 33ms for 30fps
We can figure out the video resolution we’re sending by looking at frameWidth and frameHeight
And we’ve got framesPerSecond on top of framesSent so we don’t have to calculate fps directly (at least not for the simple scenarios)
Using SVC or Simulcast with temporal scalability? scalabilityMode is going to be a relevant metric to understand what layers are being encoded
qualityLimitationReason, qualityLimitationDurations, qualityLimitationResolutionChanges are unique in their structure and use. Suffice to say that we’ve done a fiddle about this one: Quality limitation stats in WebRTC
encoderImplementation is a static value that hints at the actual codec implementation (software or hardware). To that end, powerEfficientEncoder is also useful if available. These won’t always be available to you (some browsers restrict this due to privacy reasons)
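The encode time per frame calculation from the list above could be sketched as follows. The name encodeBudget and the fallback to 30fps when framesPerSecond is missing are assumptions made for this example. totalEncodeTime is accumulated in seconds, hence the conversion.

```javascript
// Average encode time per frame between two "outbound-rtp" video snapshots,
// compared against the per-frame budget implied by the frame rate.
// encodeBudget is a hypothetical helper, not part of any WebRTC API.
function encodeBudget(previous, current) {
  const frames = current.framesEncoded - previous.framesEncoded;
  if (frames <= 0) return undefined; // nothing was encoded in this interval
  // totalEncodeTime is accumulated in seconds; convert to milliseconds per frame.
  const encodeMsPerFrame =
    (1000 * (current.totalEncodeTime - previous.totalEncodeTime)) / frames;
  const fps = current.framesPerSecond || 30; // assumption: fall back to 30fps
  const budgetMs = 1000 / fps;               // roughly 33ms at 30fps
  return { encodeMsPerFrame, budgetMs, overBudget: encodeMsPerFrame > budgetMs };
}
```

An encoder that keeps running over budget is a good hint to go and look at qualityLimitationReason next.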

Now that we have what we “know” we sent, time to look at “remote-inbound-rtp”

“remote-inbound-rtp”

The remote-inbound-rtp object is all about what the remote side reported back about our sent stream. In essence, this is the RTCP RR (Receiver Report) data – or more accurately – parts of it. Our “remote-inbound-rtp” stats object also holds RTCRtpStreamStats and RTCReceivedRtpStreamStats fields.

Link to spec (RTCRemoteInboundRtpStreamStats)

We have the customary localId identifier connecting us back to “outbound-rtp”
totalRoundTripTime and roundTripTimeMeasurements are both accumulators, together hinting on the average RTT
roundTripTime as most recently calculated
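fractionLost is the packet loss as reported in the most recent Receiver Report, expressed as a fraction between 0 and 1 (multiply by 100 to get a percentage)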
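A rough sketch of reading these fields, and of using localId to hop back to the matching “outbound-rtp” report, is below. The function name remoteViewOfOutbound and the output shape are made up, and the code assumes fractionLost arrives as a fraction between 0 and 1 and round trip times in seconds, which is how the spec defines them.

```javascript
// Summarize what the remote side reported back about each stream we send.
// remoteViewOfOutbound is a hypothetical helper; `stats` is a Map-like of id -> report.
function remoteViewOfOutbound(stats) {
  return [...stats.values()]
    .filter(r => r.type === 'remote-inbound-rtp')
    .map(remote => {
      const outbound = stats.get(remote.localId); // our own "outbound-rtp" report
      return {
        kind: remote.kind,
        ssrc: remote.ssrc,
        mid: outbound ? outbound.mid : undefined,
        lastRttMs: typeof remote.roundTripTime === 'number'
          ? remote.roundTripTime * 1000 : undefined,
        // Average RTT over the session: accumulated RTT divided by number of measurements.
        averageRttMs: remote.roundTripTimeMeasurements > 0
          ? (1000 * remote.totalRoundTripTime) / remote.roundTripTimeMeasurements
          : undefined,
        lossPercent: typeof remote.fractionLost === 'number'
          ? 100 * remote.fractionLost : undefined,
      };
    });
}
```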

Time to talk about the “other side”…

“inbound-rtp” and “remote-outbound-rtp” types

What we had for outbound is there for inbound as well. “Inbound-rtp” is what we actually received and processed while “remote-outbound-rtp” is what the remote peer reported to us it sent (where some might have gotten lost in the void of the internet).

Here’s what we have for the “inbound-rtp” – RTCRtpStreamStats, RTCReceivedRtpStreamStats as well as additional fields:

Link to spec (RTCInboundRtpStreamStats)

For inbound RTP related stats, we have those that are specific to audio, those specific to video and those that relate to both.

Both audio and video:

trackIdentifier, connecting us to the media track
mid value, if existent and relevant. The mid tells you where in the SDP the media associated with these stats lives. You won’t be needing this much for quality measurements
rtxSsrc and fecSsrc for the retransmission and FEC SSRCs, if used. fecSsrc is set when receiving video FEC with a mechanism that uses a different SSRC like flexfec
remoteId as an identifier index, pointing to the relevant “remote-outbound-rtp” object described below
Lots of accumulators: headerBytesReceived, packetsDiscarded, fecBytesReceived, fecPacketsReceived, fecPacketsDiscarded, bytesReceived, totalProcessingDelay, nackCount, jitterBufferEmittedCount, retransmittedPacketsReceived and retransmittedBytesReceived
Our bytesReceived accumulator also includes the RTX and FEC bytes. For the most part, you’ll need to subtract retransmittedBytesReceived and fecBytesReceived from it to get to the raw payload bytes (actual media, without the extra fluff)
There are a few metrics that give us the status of the jitter buffer: jitterBufferDelay, jitterBufferTargetDelay and jitterBufferMinimumDelay. In the spec these are accumulated sums (in seconds), so dividing them by jitterBufferEmittedCount allows you to estimate how much time packets or frames spend in the jitter buffer
Then there are lastPacketReceivedTimestamp and estimatedPlayoutTimestamp values which you need to look at if you are wondering if you have not received data for a while
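Two of the calculations from this list, sketched in code. The helper name inboundSummary is made up. The average jitter buffer delay follows the spec’s definition of jitterBufferDelay divided by jitterBufferEmittedCount, and missing RTX or FEC counters are treated as zero, which is an assumption of this example.

```javascript
// Derive payload bytes and average jitter buffer delay from one "inbound-rtp" report.
// inboundSummary is a hypothetical helper, not part of any WebRTC API.
function inboundSummary(report) {
  // bytesReceived includes RTX and FEC, so subtract them to get to the media itself.
  const payloadBytes = report.bytesReceived
    - (report.retransmittedBytesReceived || 0)
    - (report.fecBytesReceived || 0);
  const emitted = report.jitterBufferEmittedCount;
  return {
    payloadBytes,
    // jitterBufferDelay is a sum in seconds over all emitted samples or frames.
    averageJitterBufferDelayMs: emitted > 0
      ? (1000 * report.jitterBufferDelay) / emitted
      : undefined,
  };
}
```

Both values are session totals here. Run the same arithmetic on the difference between two snapshots if you want them per interval.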

Audio only:

There are audio specific accumulators dealing with packet loss concealment: totalSamplesReceived, concealedSamples, silentConcealedSamples, concealmentEvents, insertedSamplesForDeceleration and removedSamplesForAcceleration
- Of these we found concealedSamples and concealmentEvents somewhat useful metrics for how often the audio jitter buffer has to make up audio. Too often and too long and the user will notice
We have two additional accumulators: totalSamplesDuration and totalAudioEnergy
The audioLevel enables us to know the volume level of the incoming audio (note that this has its accumulator in totalAudioEnergy above)
Then there’s playoutId, an identifier connecting us to the “media-playout” stats
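To put a number on “too often”, one option is to look at the share of concealed samples over an interval. A minimal sketch, with concealmentPercent being a made-up helper name:

```javascript
// Share of audio samples that were concealed (made up by the jitter buffer)
// between two snapshots of the same audio "inbound-rtp" report.
function concealmentPercent(previous, current) {
  const total = current.totalSamplesReceived - previous.totalSamplesReceived;
  if (total <= 0) return 0; // no audio was played out in this interval
  const concealed = current.concealedSamples - previous.concealedSamples;
  return (100 * concealed) / total;
}
```

What threshold counts as “the user will notice” is something you will need to tune for your own service.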

Video only:

Additional accumulators here include framesReceived, framesDecoded, keyFramesDecoded, framesRendered, framesDropped, qpSum, totalDecodeTime, totalInterFrameDelay, totalSquaredInterFrameDelay, pauseCount, totalPausesDuration, freezeCount, totalFreezesDuration, firCount, pliCount, framesAssembledFromMultiplePackets, totalAssemblyTime, totalCorruptionProbability, totalSquaredCorruptionProbability and the new corruptionMeasurements
- Of these, totalDecodeTime divided by framesDecoded is interesting for estimating CPU load
- The freezeCount tells you how often a video freeze was long enough to have been noticed by a user
We can figure out the video resolution we’re receiving by looking at frameWidth and frameHeight
And we’ve got framesPerSecond on top of framesReceived so we don’t have to calculate fps directly (at least not for the simple scenarios)
decoderImplementation is a static value that hints at the actual codec implementation (software or hardware). To that end, powerEfficientDecoder is also useful if available
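A sketch of pulling the more user-facing video numbers out of two snapshots follows. The name videoReceiveHealth and the returned shape are made up, and the code assumes the browser exposes all of the fields it reads (not all browsers do).

```javascript
// What happened to incoming video between two "inbound-rtp" snapshots.
// videoReceiveHealth is a hypothetical helper, not part of any WebRTC API.
function videoReceiveHealth(previous, current) {
  const received = current.framesReceived - previous.framesReceived;
  const decoded = current.framesDecoded - previous.framesDecoded;
  const dropped = current.framesDropped - previous.framesDropped;
  return {
    newFreezes: current.freezeCount - previous.freezeCount,
    frozenSeconds: current.totalFreezesDuration - previous.totalFreezesDuration,
    droppedPercent: received > 0 ? (100 * dropped) / received : 0,
    // totalDecodeTime is accumulated in seconds; convert to milliseconds per frame.
    decodeMsPerFrame: decoded > 0
      ? (1000 * (current.totalDecodeTime - previous.totalDecodeTime)) / decoded
      : undefined,
  };
}
```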

Now it is time to check what is being reported to us by the remote peer:

“remote-outbound-rtp”

The “remote-outbound-rtp” is what the remote peer tells us it sent us. This is received on our end by the RTCP SR (Sender Report) and then incorporated into this stats block.

As usual, it is composed of RTCRtpStreamStats, RTCSentRtpStreamStats and this additional block:

Link to spec (RTCRemoteOutboundRtpStreamStats)

Here we have:

The customary localId identifier connecting us back to “inbound-rtp”
Accumulators for reportsSent, totalRoundTripTime and roundTripTimeMeasurements
roundTripTime as most recently calculated (this and the relevant accumulators and fields only appear here if the relevant RTCP extension with the DLRR report block are implemented, which is still rather rare – more on that in our Low-Level WebRTC Protocols course)

“codec” type

The codec block holds information about the codec used – for both incoming and outgoing streams.

Link to spec (RTCCodecStats)

Frankly? There’s not much here to use for monitoring… The best thing here is the ability to resolve a nice name for the codec.
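Resolving that nice name means following codecId from an RTP report to its “codec” report. A small sketch, where codecNameFor is a made-up helper name and mimeType and sdpFmtpLine are the fields defined for RTCCodecStats:

```javascript
// Resolve a readable codec name for an "inbound-rtp" or "outbound-rtp" report.
// codecNameFor is a hypothetical helper; `stats` is a Map-like of id -> report.
function codecNameFor(stats, rtpReport) {
  const codec = stats.get(rtpReport.codecId);
  if (!codec || !codec.mimeType) return undefined;
  // mimeType looks like "video/VP8" or "audio/opus"; keep the part after the slash.
  const name = codec.mimeType.split('/')[1];
  return codec.sdpFmtpLine ? `${name} (${codec.sdpFmtpLine})` : name;
}
```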

“media-source” type

The “media-source” is about what we’re sending. It is split into 3 parts: generic, audio and video. Obviously, we will find either audio or video for any specific media source.

The generic

Link to spec (RTCMediaSourceStats)

The kind field will indicate if we’re dealing with audio or video…

The audio

Link to spec (RTCAudioSourceStats)

Here we have a few metrics, of which audioLevel is the most interesting:

The audioLevel enables us to know the volume level of the captured audio
For calculations of audio levels, we have two accumulators: totalAudioEnergy and totalSamplesDuration
Then there are two additional metrics available for microphones that have built-in echo cancellation: echoReturnLoss and echoReturnLossEnhancement
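The two accumulators can be combined into an average audio level over whatever interval you choose. The sketch below uses the root mean square formula given in the spec for totalAudioEnergy. The function name averageAudioLevel is made up.

```javascript
// Average audio level (linear, 0..1) between two snapshots of the same report,
// using sqrt(delta totalAudioEnergy / delta totalSamplesDuration).
function averageAudioLevel(previous, current) {
  const duration = current.totalSamplesDuration - previous.totalSamplesDuration;
  if (duration <= 0) return undefined; // no audio was processed in this interval
  const energy = current.totalAudioEnergy - previous.totalAudioEnergy;
  return Math.sqrt(energy / duration);
}
```

Unlike audioLevel, which is a single recent sample, this does not miss short bursts of sound that happen between two getStats() calls.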

The video

Link to spec (RTCVideoSourceStats)

We’ve seen the metrics here elsewhere as well – but this time, it indicates what our source video metrics are – not those measured just before encoding or after being decoded on the other end.

Towards that end, we have:

width and height for the video’s resolution. This is what was captured on camera or screen. It might get scaled down before being sent or displayed on the other end, so it is a good reference to figure out the differences
An accumulator of the number of frames captured so far
Calculated FPS in the form of framesPerSecond. How is this calculation done and over what period of time? Not something specified or agreed upon

“media-playout” type

Where “media-source” is about outgoing streams, “media-playout” is about incoming ones. That said, today at least, “media-playout” is limited to audio streams only.

Link to spec (RTCAudioPlayoutStats)

All of the fields here (besides the kind which is always set to “audio”) are accumulators.

Nothing much to add here.

Others? “peer-connection”, “data-channel” and “certificate” types

The other types of stats blocks don’t hold much in them. At least not in the form of something that is really useful when debugging.

The “peer-connection” has a running tally using accumulators for closed and opened data channels (dataChannelsOpened and dataChannelsClosed).

The “data-channel” one is built mostly of accumulators that can be calculated from sent and received data on the channels. Might be easier to take it from here, but it doesn’t add much value beyond being simpler to get in this manner.

And the “certificate”? Well… it just gives you that – the certificates trail. Not something we’ve used so far.

Structure of a webrtc-internals file

When it comes to chrome://webrtc-internals, the file itself is a simple JSON text file. The format is not specified and subject to change. It has grown historically and does some things like double-encoding as JSON.

Sometimes you need to look at the format when you are looking for a specific value that is not visualized by your tooling such as the dtlsCipher.

If you open the content in a nice JSON viewer, you’ll get something like this:

There are 2 arrays in this JSON file:

getUserMedia, which shows the getUserMedia() and getDisplayMedia() API calls with their parameters and the resulting streams and tracks or errors
List of PeerConnections objects, where each peer connection has its configuration, stats and updateLog

The stats inside the PeerConnections objects is an array of calls into getStats(). Here’s what you’ll find there:

Here we see the id COT01_96. The field of each item is the postfix of the id – transportId, payloadType, mimeType, clockRate, timestamp, …

For each, we have the startTime and endTime, denoting the time the first and last samples were taken. We have the statsType – the object this is collected for (“codec” in this case). And the values which are an array of the values as taken over the period of time.
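If you do need to dig into the file yourself, a sketch like the one below can collect all the fields of one stats id. Treat it with care: the format is unspecified, so the assumptions here (keys shaped as the id, a dash and then the field name, and values that may be double-encoded as a JSON string) are based on the description above and may break without notice. The name seriesForReport is made up.

```javascript
// Collect the time series of every field belonging to one stats id.
// `pcStats` is the `stats` object of a single peer connection, taken from an
// already parsed webrtc-internals dump. seriesForReport is a hypothetical helper.
function seriesForReport(pcStats, reportId) {
  const prefix = reportId + '-';
  const series = {};
  for (const [key, entry] of Object.entries(pcStats)) {
    if (!key.startsWith(prefix)) continue;
    const field = key.slice(prefix.length);
    // Values may be double-encoded as a JSON string; decode only when needed.
    const values = typeof entry.values === 'string'
      ? JSON.parse(entry.values)
      : entry.values;
    series[field] = {
      statsType: entry.statsType,
      startTime: entry.startTime,
      endTime: entry.endTime,
      values,
    };
  }
  return series;
}

// Usage sketch: seriesForReport(peerConnection.stats, 'COT01_96').mimeType.values
```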

The updateLog… that’s left for another article down the road.

If you are lazy, and you should be, then reading this file should be done using a dedicated visualizer. The open one out there is fippo’s WebRTC dump importer. It parses the structure and then visualizes some of the data. I’ll leave it to you to try it out – it works great. Maybe we should do a video explainer for it at some point…

How can we help

WebRTC statistics is an important part of developing and maintaining WebRTC applications. We’re here to help.

You can check out my products and services on the menu at the top of this page.

The two immediate services that come to mind?

WebRTC Courses – looking to upskill yourself or your team with WebRTC knowledge and experience? You’ll find no better place than my WebRTC training courses. So go check them out
WebRTC Insights – once in every two weeks we send out a newsletter to our Insights subscribers with everything they need to know about WebRTC. This includes things like important bugs found (and fixed?) in browsers. This has been a lifesaver more than once to our subscribers

Something else is bugging you with WebRTC? Just reach out to me.

The post Making sense of getStats in WebRTC appeared first on BlogGeek.me.

Everything you wanted to know about webrtc-internals and getStats

bloggeek - Mon, 03/03/2025 - 12:30

Maximize your understanding of webrtc stats and webrtc-internals, assisting you in monitoring and analyzing WebRTC applications.

WebRTC is great. When it works.

This article, as well as the other articles in this series were written with the assistance of Philipp Hancke.

Interested in webrtc-internals and getStats? Then this series of articles is just for you:

webrtc-internals and getStats (you are here)
Reading getStats in WebRTC
ICE candidates and active connections in WebRTC (coming soon)
WebRTC API and events trace (coming soon)
Tools for troubleshooting WebRTC applications (coming soon)

This time? We’re focusing on WebRTC debugging 101. Or as it is more widely known by: webrtc-internals and getStats

Table of contents

A quick introduction to getStats
Chrome and webrtc-internals
Visualising WebRTC statistics
How can we help

A quick introduction to getStats

WebRTC runs inside the browser. It has a set of JavaScript APIs so developers can build their applications using it. The thing is that networks are finicky and messy – they are unpredictable. Which is why developers need to monitor quality metrics. If you don’t do that in your application, then:

Customers might complain (poor resolution; video freezes; echo; etc) – and you will have nothing to say to them about that. It is not as simple as “your network sucks” anymore and the cost of refunding them for everything is going to affect your revenue
Your application might not be properly fine tuned and optimized for the scenario you’re using (you can call it bugs)

What is needed is observability, and that is done using an API that was available in WebRTC since its inception – known as getStats(). getStats exposes a surprisingly large amount of information about the internal performance of the underlying WebRTC library.

Calling getStats

getStats can either be called on the RTCPeerConnection object or specific senders or receivers. Since calling it on senders or receivers only filters the result obtained for the whole connection it is typically better to call it on the RTCPeerConnection:

const stats = await pc.getStats();

Remember that getStats is an asynchronous method, so it returns a Promise which needs to be awaited. The Promise resolves with a “Maplike” object that is a key-value store in JavaScript.

You can iterate over this with forEach and log the contents:

stats.forEach(report => console.log(report.id, report))

Please note that the “id” is an identifier and while it has a certain structure in Chrome, do not attach any meaning to that structure as it is subject to change without notice (this happened in the past already).

Alternatively you can get an array with the values which is useful if you are looking to filter for certain types of reports:

[...stats.values()].filter(report => report.type === 'inbound-rtp')

The key of each key-value pair is a string that uniquely identifies the object and is consistent across calls. This means you can call getStats at two different points in time and compare the objects easily:

// we assume `stats` has been obtained “a while ago”
const newStats = await pc.getStats();

Assuming we are interested in the audio bitrate, we would look for the “outbound-rtp” report with an “audio” kind:

const currentAudio = [...newStats.values()].find(report => report.type === 'outbound-rtp' && report.kind === 'audio');

We need to check that “currentAudio” exists and that stats.has(currentAudio.id) is true (i.e. the old snapshot holds a report with the same id). Then we can calculate the audio bitrate from the “bytesSent” values. The timestamps are in milliseconds, so the result is in bits per millisecond, which is the same as kilobits per second:

// check currentAudio and stats.has(currentAudio.id) first
audioBitrate = 8 * (currentAudio.bytesSent - stats.get(currentAudio.id).bytesSent) / (currentAudio.timestamp - stats.get(currentAudio.id).timestamp)

The pattern of taking the difference in the cumulative measure and dividing it by the time difference is very common, see here for the underlying design principle.
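Putting the pieces of this walkthrough together, a self-contained version might look like the sketch below. The function name audioSendBitrate is made up for illustration. It takes the old and the new getStats() results and hands back kilobits per second, or undefined when there is nothing to compare.

```javascript
// Audio send bitrate in kbit/s between two getStats() results.
// `stats` was obtained "a while ago", `newStats` just now.
// audioSendBitrate is a hypothetical helper, not part of any WebRTC API.
function audioSendBitrate(stats, newStats) {
  const currentAudio = [...newStats.values()].find(
    report => report.type === 'outbound-rtp' && report.kind === 'audio');
  // Bail out if there is no audio stream, or no older report with the same id.
  if (!currentAudio || !stats.has(currentAudio.id)) return undefined;
  const previousAudio = stats.get(currentAudio.id);
  const elapsedMs = currentAudio.timestamp - previousAudio.timestamp;
  if (elapsedMs <= 0) return undefined;
  // 8 bits per byte; bits per millisecond equals kilobits per second.
  return (8 * (currentAudio.bytesSent - previousAudio.bytesSent)) / elapsedMs;
}

// Usage sketch (in a browser):
//   const newStats = await pc.getStats();
//   console.log(audioSendBitrate(stats, newStats), 'kbit/s');
//   stats = newStats; // keep the new snapshot around for the next round
```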

What do the values inside getStats exactly mean? That’s what we’re covering in our reading getStats article.

getStats frequency

At what frequency should you be calling getStats()?

That’s up to you. For most metrics, calling it more often than once a second makes no sense. And calling it less often than once every 10 seconds will usually be too little.

getStats() uses a JavaScript Promise – which means it is asynchronous in nature. You ask for stats and then the browser (WebRTC) will be working to get the stats for you. It will resolve the Promise once done.

Calling too frequently means eating up CPU for collecting statistics since getStats needs to query a lot of information from different parts of the system. If you don’t plan on using it for something important enough at such a frequency, then call the function less frequently.

One example of using getStats for the wrong task was calling it several times per second to get and display the audio level. This has since been replaced by a better API, and getStats now returns the same result when it is called too frequently.

getStats returns aggregated values for many statistics such as the number of bytes received. This lets you call getStats and subtract the previous value from the current value and divide this by the time between the two measurements to get an average over a time period.

Our suggestion? Once a second. For statistics that are a bit jittery keep a 3-5 second old object around and average over the slightly larger window.
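Keeping a slightly older object around can be done with a tiny helper like the one sketched here. The class name SnapshotWindow and the default of 3000 milliseconds are assumptions made for this example, not something that comes from WebRTC itself.

```javascript
// Keeps recent snapshots and hands back a baseline that is roughly `windowMs` old.
// SnapshotWindow is a hypothetical helper, not part of any WebRTC API.
class SnapshotWindow {
  constructor(windowMs = 3000) {
    this.windowMs = windowMs;
    this.snapshots = [];
  }

  // Store the newest snapshot and return the one to compare it against.
  push(timeMs, stats) {
    this.snapshots.push({ timeMs, stats });
    // Drop the oldest entry for as long as the next one is still old enough.
    while (this.snapshots.length > 1 &&
           timeMs - this.snapshots[1].timeMs >= this.windowMs) {
      this.snapshots.shift();
    }
    return this.snapshots[0];
  }
}

// Usage sketch (in a browser), polling once a second:
//   const history = new SnapshotWindow(3000);
//   setInterval(async () => {
//     const stats = await pc.getStats();
//     const baseline = history.push(Date.now(), stats);
//     // compare `stats` against `baseline.stats` for the jittery metrics
//   }, 1000);
```

On the very first call the baseline is the snapshot you just pushed, so skip the calculation when the two are the same object.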

Reading getStats results

How to read getStats results is a bigger topic and won’t fit here. Lucky for you, we’ve written a dedicated article just for this!

Head on and check out how to make sense of getStats results.

Why collect stats from the client anyway

In the past, in VoIP services, we often focused on collecting the metrics and statistics from the “network”. We collected the metrics from the media servers and other application servers. We also placed network packet analyzers near the media servers to measure these metrics for us.

This cannot be done anymore…

WebRTC also enables peer to peer interactions, so the media doesn’t even flow to the servers, which means that the infrastructure has no clue about the session quality for peer to peer sessions
WebRTC encrypts all media traffic using SRTP. Encrypted traffic means a lot less visibility for packet inspection tools, which means that even metrics going to our backend are mostly “invisible” to us
Collecting from the media servers means we know what they “feel” about this session, but what we are really interested in is what the devices/users “feel” about this session. Many of the metrics available on the client just don’t exist on the other end of the session

When WebRTC was introduced, it immediately lent itself to client side metrics collection. The bandwidth available to us was higher than ever before for the most part, and many of the developers building WebRTC services were never indoctrinated as VoIP experts – they were just web developers. This meant that client side collection of stats was adopted and made common.

Whatever the reason is, today’s best practice is to collect the information from the client itself, and that makes total sense for WebRTC applications.

A word about rtcstats

A decade ago Philipp Hancke created an open source project called rtcstats. It is a very lightweight approach to wrap the underlying RTCPeerConnection object, periodically call getStats and send the statistics (as well as other information about the API calls) to a server. On the server one of the artifacts this produces is a “dump” with information equivalent to the webrtc-internals dump. While it has not been updated recently, there are friendly forks e.g. from Jitsi. This project enables us to easily collect WebRTC related information from a WebRTC application without much integration effort. The library itself is simple enough that it does not require much maintenance or frequent updates.

If you are looking to build your own WebRTC statistics collection for your WebRTC application, then this project is highly recommended.

rtcstats collects everything – API calls and all getStats metrics, sending it to the server side of your data collection. It does so with some thoughts about reducing the traffic on the network by implementing a kind of a virtual sparse graph of the metrics collected (think of it as not collecting metric values that haven’t changed). This avoids stealing away the bandwidth needed for real time communications for uploading the logs.
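To illustrate the idea of not sending values that haven’t changed, here is a minimal sketch of such a delta step. This is not the actual rtcstats code. The function name deltaCompress and the plain object output are made up, and the sketch only shows the general technique.

```javascript
// Keep only what changed between two stats snapshots before uploading.
// `previous` and `current` are Map-likes of id -> report.
// deltaCompress is a hypothetical helper and not the rtcstats implementation.
function deltaCompress(previous, current) {
  const delta = {};
  for (const [id, report] of current) {
    const old = previous.get(id);
    if (!old) {
      delta[id] = report; // new report: send it in full
      continue;
    }
    const changed = {};
    for (const [field, value] of Object.entries(report)) {
      // Nested objects are compared by reference here, so they always count as changed.
      if (old[field] !== value) changed[field] = value;
    }
    if (Object.keys(changed).length > 0) delta[id] = changed;
  }
  return delta;
}
```

Static fields such as type, kind or ssrc get sent once and never again, which is where most of the savings come from. The server needs to apply the deltas in order to rebuild the full picture.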

Chrome and webrtc-internals

In a way, Chrome was always at the forefront of WebRTC (not surprising considering ALL modern browsers end up using Google’s libWebRTC implementation). They were the first to implement and adopt it into their browser and services as well (obviously).

What happened is that Google needed simple tooling to debug and troubleshoot issues related to WebRTC. So they created webrtc-internals.

What is webrtc-internals?

webrtc-internals is a browser tab (just write chrome://webrtc-internals in the address bar of your Chrome browser) that collects and shares WebRTC related information from the browser itself.

It has information about GetUserMedia, PeerConnection configuration, WebRTC API calls and events and calls to getStats – both latest and visualized on graphs.

This treasure trove of information is quite handy when you’re trying to figure out what’s going on with your WebRTC application.

The challenge? The data itself is transient. There as long as the peer connection is open. Deleted and gone the moment it is closed.

This leaves us with two big challenges:

How do developers review webrtc-internals of users who complain? They can’t really open that on the user’s machine remotely
The data is available only during the session itself. How do you get to that data after the fact?

How to obtain a webrtc-internals file?

The first thing we need in order to “solve” the two challenges above is to “convert” webrtc-internals into a file (also known as a webrtc-internals dump).

The video above explains that visually. In essence:

Open chrome://webrtc-internals. Preferably, before starting to run the session at all
Keep it open and run your service – without closing the WebRTC session
Go back to the chrome://webrtc-internals tab, click on Create a WebRTC-Internals dump
Click the Download the “webrtc-internals dump” button

You should now have a webrtc_internals_dump.txt file in your downloads folder.

Note that you still need to be purposeful about it, planning on obtaining that information to begin with, and actively downloading the file. Not fun, but very useful.

Reading webrtc-internals

How to read getStats results is a bigger topic and won’t fit here. Lucky for you, we’ve written a dedicated article just for this!

Head on and check out how to make sense of getStats results.

webrtc-internals alternative in Firefox

Mozilla has its own about:webrtc browser tab for Firefox.

They even outdid Google here and actually wrote about it: Debugging with about:webrtc in Firefox, Getting Data Out

What they are lacking though is relevance… not many developers (or users) are using Firefox, so the whole focus and effort is elsewhere.

Here’s the thing – at the end of the day, what we need is a robust solution/service across all browsers and devices. This usually translates to rtcstats based solutions. More on that… in a later article in this series.

👉 Still interested in debugging on Firefox? Check out this section from Olivier’s post on debugging WebRTC in browsers

webrtc-internals alternative in Safari

Debugging in Safari is close to nonexistent. You’d be better off collecting the data yourself via rtcstats.

Apple, being Apple, doesn’t care much about WebRTC or developers in general.

👉 Still interested in debugging on Safari? Check out this section from Olivier’s post on debugging WebRTC in browsers

Visualising WebRTC statistics

Having stats is great, but what do you make out of this?

Being able to see anything here is hard. Which is why Philipp Hancke built and is still maintaining a tool called WebRTC dump importer – you take the webrtc-internals dump you’ve downloaded, upload it to this page, and magic happens. Go check it out.

There are other visualization tools available, but they are commercial and part of larger paid solutions (testRTC for example has great visualization, but it isn’t offered as a standalone).

How can we help

WebRTC statistics are an important part of developing and maintaining WebRTC applications. We’re here to help.

You can check out my products and services on the menu at the top of this page.

The two immediate services that come to mind?

WebRTC Courses – looking to upskill yourself or your team with WebRTC knowledge and experience? You’ll find no better place than my WebRTC training courses. So go check them out
WebRTC Insights – once every two weeks we send out a newsletter to our Insights subscribers with everything they need to know about WebRTC. This includes things like important bugs found (and fixed?) in browsers. This has been a lifesaver more than once for our subscribers

Something else is bugging you with WebRTC? Just reach out to me.

The post Everything you wanted to know about webrtc-internals and getStats appeared first on BlogGeek.me.

CPaaS and LLMs both need APIs and SDKs

bloggeek - Mon, 02/17/2025 - 12:30

Understanding API and SDK: Dive into their definitions and learn why both are crucial for effective software development with CPaaS and LLM.

An API and an SDK. They are similar but different. Both are interfaces used by services to expose their capabilities to developers and applications. For the most part, we’ve been happy enough with APIs that are based on REST, probably with an OpenAPI specification based definition for it.

But for things like WebRTC, communications and WebSocket based interfaces, an API just isn’t enough.

Let’s dive in to see why.

Table of contents

What are API and SDK?
- API
- SDK
CPaaS and Programmable Communication interfaces
Programmable LLM interfaces
Is an SDK critical to “hide” a WebRTC interface

What are API and SDK?

We will start with a quick definition of each.

Keep in mind that the actual definitions are rather fluid – the ones below are just those that are common today in our industry (networking software).

API

API stands for Application Programming Interface. In this day and age, such an interface is usually one that gets used by remote invocation – from one machine to another over an IP network.

The most common specification for an API? REST

REST is a rather simple mechanism built on top of HTTP. For me, it is a way to formalize how a URL can be used to retrieve values, push values or execute “stuff” on the server.

Why REST? Because it uses HTTP, making it easily accessible and usable by web applications running inside web browsers.

Then there’s OpenAPI which is simply a specification of how to express interfaces using REST in a formal way. This enables using software tools to create, document, deploy, test and use APIs.

While there are other types of APIs, which don’t rely on REST or OpenAPI, most do.

The unique thing about an API? It sits “inside” or “on top” of the service we want to interface with and we call/invoke that API by calling it from a separate process/machine.

SDK

SDK stands for Software Development Kit. For me, that’s a piece of code that gets embedded in your application as a library that you use directly.

Where an API gets invoked remotely, over the network, an SDK gets invoked directly, from inside a software application.

In many cases, an SDK is built on top of an API, to make it easier to integrate with.
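To make the distinction concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical: the `/messages` endpoint, the `MessagingSdk` class and the `Transport` type are invented for illustration and don't belong to any real service. The transport function stands in for the remote REST call (the API), while the class is the library your application embeds (the SDK).

```typescript
// One remote call: this is the API side of things (think REST over HTTP).
interface ApiRequest {
  method: "GET" | "POST";
  url: string;
  body?: unknown;
}

// Hypothetical transport. In a web app this would typically wrap fetch().
type Transport = (request: ApiRequest) => Promise<unknown>;

// The SDK: code that lives inside your application and hides the
// URLs, verbs and payload shapes of the API behind plain methods.
class MessagingSdk {
  private readonly baseUrl: string;
  private readonly transport: Transport;

  constructor(baseUrl: string, transport: Transport) {
    // Drop any trailing slash so URLs are built consistently.
    this.baseUrl = baseUrl.replace(/\/+$/, "");
    this.transport = transport;
  }

  sendMessage(to: string, text: string): Promise<unknown> {
    return this.transport({
      method: "POST",
      url: `${this.baseUrl}/messages`,
      body: { to, text },
    });
  }

  getMessage(id: string): Promise<unknown> {
    return this.transport({
      method: "GET",
      url: `${this.baseUrl}/messages/${encodeURIComponent(id)}`,
    });
  }
}
```

The application only ever calls `sendMessage()` and `getMessage()`. If the vendor changes a URL or a payload field, the fix goes into the SDK and not into every customer's code.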

CPaaS and Programmable Communication interfaces

Let’s look at Twilio as an example of the various interfaces a CPaaS vendor has on offer:

An API. REST based. The classic definition of an API as given above
Helper libraries. Also known as server-side SDKs. These libraries are written in various languages and simply call the APIs, so you don’t have to deal with REST calling directly
TwiML. An XML-based structure that defines what happens when a phone number is dialed. It is a kind of low-code technique that Twilio introduced years ago
Client-side SDKs. These SDKs are needed when you want to run complex code on client devices
- To implement voice and video calling using WebRTC. Because here, REST just isn’t enough…
- When using Conversations and Sync Twilio services. Because these likely use WebSocket instead of REST and are more challenging to implement directly using something like REST (they are likely more stateful in their nature)
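For readers who haven't seen TwiML, here is a rough sketch of what such a document looks like, generated from TypeScript. It uses only the `<Response>`, `<Say>` and `<Dial>` elements. The `greetAndForward` helper and the phone number in the usage note are made up for this example, and a real application would normally rely on Twilio's helper libraries rather than build the XML by hand.

```typescript
// Escape the characters that cannot appear as-is inside XML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: greet the caller, then forward the call.
function greetAndForward(greeting: string, forwardTo: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(greeting)}</Say>`,
    `  <Dial>${escapeXml(forwardTo)}</Dial>`,
    "</Response>",
  ].join("\n");
}
```

Calling `greetAndForward("Thanks for calling", "+15550100")` produces the document your server would return when Twilio requests instructions for an incoming call.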

A client-side SDK is needed once explaining how to interact with the server’s interface (think REST) gets complicated. Remember – CPaaS vendors are there to simplify development. So adding SDKs to simplify it further where needed makes total sense.

WebRTC almost forces us to create such client-side SDKs. Especially since signaling isn’t defined for WebRTC – it is up to the vendor to decide, and here, the vendor is the CPaaS vendor. So once the vendor defines an interface, it is easier to implement the client side of it as an SDK than to document it well enough and assume customers will be able to implement it properly without wasting too much time and too many support resources.

Programmable LLM interfaces

Time to look at LLM and Generative AI interfaces that are programmable. We do that by reviewing OpenAI’s developer platform documentation. Here’s what they have available at the moment:

REST API and an SDK, for the text based prompting technology. The SDKs are available for JavaScript and Python
WebSocket interface/API, for voice prompting. No SDK (yet)
WebRTC interface/API, for voice prompting. No SDK (yet)

With voice, OpenAI Realtime API started by offering a WebSocket interface. Google Gemini followed suit.

Why WebSocket?

Because the data that needs to pass through the connection is audio and not just a text request. HTTP is a bit less suitable for this
It is bidirectional in nature – voice goes both ways here
There are extra events that have to go alongside the voice itself
The connection needs to stay open for long periods of time
There is no real “request-response” paradigm like the one we’re used to on the web and with HTTP. While it is “there” to some extent, human interactions aren’t just a back and forth ping pong game
Also, all online TTS and STT services out there offer a WebSocket interface, so this was just following an adjacent industry’s best practices (showing why sometimes best practices just aren’t the best)
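The reasons above boil down to one long-lived, bidirectional connection that carries audio chunks and control events side by side. A minimal sketch of such a message envelope follows. The event names (`audio.chunk`, `speech.started`, `speech.stopped`) and the field names are hypothetical and are not taken from OpenAI's or Google's actual protocols.

```typescript
// Messages flowing in either direction over the same connection.
type RealtimeMessage =
  | { type: "audio.chunk"; seq: number; audioBase64: string }
  | { type: "speech.started"; atMs: number }
  | { type: "speech.stopped"; atMs: number };

// What you would hand to the WebSocket's send() method.
function encodeMessage(message: RealtimeMessage): string {
  return JSON.stringify(message);
}

// What you would call from the WebSocket's "message" handler.
// Only the "type" field is checked here; a real client would
// validate the remaining fields as well.
function decodeMessage(raw: string): RealtimeMessage {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Message is not a JSON object");
  }
  const type = (parsed as { type?: unknown }).type;
  switch (type) {
    case "audio.chunk":
    case "speech.started":
    case "speech.stopped":
      return parsed as RealtimeMessage;
    default:
      throw new Error(`Unknown message type: ${String(type)}`);
  }
}

// Audio and events interleave; there is no request-response pairing.
function summarize(message: RealtimeMessage): string {
  switch (message.type) {
    case "audio.chunk":
      return `audio #${message.seq} (${message.audioBase64.length} base64 chars)`;
    case "speech.started":
      return `speech started at ${message.atMs}ms`;
    case "speech.stopped":
      return `speech stopped at ${message.atMs}ms`;
  }
}
```

Note that nothing here ties a received message to a previously sent one. Either side can send audio or an event at any time, which is exactly what plain HTTP request-response handles poorly.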

Why no SDK? Because this is still in beta…

They quickly followed with a WebRTC interface. Which makes total sense – WebSocket isn’t really real time and comes with its own set of limitations for an interactive voice interface (more on that another time).

What they didn’t do here either was add an SDK.

And while with WebSocket this is “acceptable”, for WebRTC… I believe it is less so.

Here’s what I wrote about OpenAI, LLMs, voice and WebRTC a few months back

Is an SDK critical to “hide” a WebRTC interface

Yes it is.

WebRTC has an API surface that is quite extensive. It includes APIs, SDP, network configuration, etc.

Leaving all these exposed and even more – with no direct implementation other than an example in the documentation – isn’t going to help anyone.

WebRTC as a development interface suffers from a few big challenges:

Low level, suitable for experienced WebRTC developers only
Exposes internals of the implementation, which are better kept out of a third party engineer’s hands (for example, controlling the iceServers configuration)
Expansive interface with 100s of APIs that make up WebRTC. Which ones should a developer use? Which ones were tested against?
Varied interface that includes APIs, callbacks, promises, configurations and SDP munging
No defined signaling means someone needs to define it. And then developers need to understand and use that definition. Tricky (trust me)

This means that without having an SDK to a WebRTC interface (be it for a Programmable Video or Voice service, or for an LLM / Generative AI service), you are going to be left with a solution that is hard to adopt and easy to break:

Hard to adopt because it takes a long time for developers to integrate with, and in the process eats up expensive support resources on your end (not to mention frustration for both the customer and the support people)
Easy to break because there are just too many things developers can do that you haven’t thought about, and these are bound to fail or even cause outages on your end
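What “hiding” WebRTC behind an SDK can look like is sketched below in TypeScript, under some loud assumptions: `VoiceSdk`, `PeerLike`, `PeerFactory` and `SignalingExchange` are hypothetical names, the STUN host is a placeholder, and the signaling is reduced to a single offer/answer exchange. Media capture, ICE restarts and error reporting are left out on purpose. In a browser, `PeerLike` would be implemented by a thin adapter around `RTCPeerConnection`.

```typescript
interface Description {
  type: "offer" | "answer";
  sdp: string;
}

// The few peer connection operations this sketch needs.
interface PeerLike {
  createOffer(): Promise<Description>;
  setLocalDescription(description: Description): Promise<void>;
  setRemoteDescription(description: Description): Promise<void>;
  close(): void;
}

interface PeerConfig {
  iceServers: { urls: string }[];
}

type PeerFactory = (config: PeerConfig) => PeerLike;

// Vendor-defined signaling: send the offer SDP, get the answer SDP back.
type SignalingExchange = (offerSdp: string) => Promise<string>;

class VoiceSdk {
  // Decided by the vendor, never by the customer's code (placeholder host).
  private static readonly ICE_SERVERS = [{ urls: "stun:stun.example.com:3478" }];

  private readonly createPeer: PeerFactory;
  private readonly exchange: SignalingExchange;
  private peer: PeerLike | null = null;

  constructor(createPeer: PeerFactory, exchange: SignalingExchange) {
    this.createPeer = createPeer;
    this.exchange = exchange;
  }

  // The only call a customer needs in order to start a session.
  async connect(): Promise<void> {
    if (this.peer !== null) {
      throw new Error("Already connected or connecting");
    }
    const peer = this.createPeer({ iceServers: VoiceSdk.ICE_SERVERS });
    this.peer = peer;
    try {
      const offer = await peer.createOffer();
      await peer.setLocalDescription(offer);
      const answerSdp = await this.exchange(offer.sdp);
      await peer.setRemoteDescription({ type: "answer", sdp: answerSdp });
    } catch (error) {
      // Clean up the half-open connection before reporting the failure.
      peer.close();
      if (this.peer === peer) {
        this.peer = null;
      }
      throw error;
    }
  }

  hangUp(): void {
    if (this.peer !== null) {
      this.peer.close();
      this.peer = null;
    }
  }
}
```

The customer sees `connect()` and `hangUp()`. The SDP, the `iceServers` configuration and the signaling format stay on the vendor's side of the line, where they can be tested and changed without breaking anyone.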

Oh, and we didn’t go into the discussion of what to do with Android and iOS developers that might want to integrate with the services inside a native application (they need native SDKs…).

If you’re aiming to have an API for a WebRTC interface, then you should also work towards having an SDK for it. And if not, be very very clear to yourself why you don’t need an SDK.

The post CPaaS and LLMs both need APIs and SDKs appeared first on BlogGeek.me.

