How WebRTC media resilience works – what FEC, RED, PLC, RTX are and why they are needed to improve media quality in real-time communications.
Networks are finicky in nature, and media codecs even more so.
With networks, not everything sent is received on the other end, which means we have one more thing to deal with and care about when it comes to handling WebRTC media. Luckily for us, there are quite a few built-in tools that are available to us. But which one should we use at each point and what benefits do they bring?
This is what I’ll be focusing on in this article.
Communication networks are lossy in nature. This means that if you send a packet through a network – there’s no guarantee of that packet reaching the other side. There’s also no guarantee that packets will arrive in the order you’ve sent them or in a timely fashion, but that’s for another article.
This is why almost everything you do over the internet has this nice retransmission mechanism tucked away somewhere deep inside as an assumption. That retransmission mechanism is part of how TCP works – and for that matter, almost every other transport protocol implemented inside browsers.
The assumption here is that if something is lost, you simply send it again and you’re done. It may take a wee bit longer for the receiver to receive it, but it will get there. And if it doesn’t, we can simply announce that connection as severed and closed.
That “something is lost” aspect of networks is what we call – and measure – as packet loss.
Stripping away that automatic assumption that networks are reliable and everything you send over them is received on the other side is the first important step in understanding WebRTC but also in understanding real-time transport protocols and their underlying concepts.
Media codecs are lossy (and sensitive)

Media codecs are also lossy, but in a different way. When an audio codec or a video codec needs to encode (=compress) the raw input from a microphone or a camera, it strips out the data it deems unnecessary – trading away levels of perceived quality of the original media.
I remember many years ago, sitting at the dorms in the university and talking about albums and CDs. One of the roommates there was an audiophile. He always explained how vinyl albums have better audio quality than CDs and how MP3 just ruins audio quality. Me? I never heard the difference.
Perceived quality might be different between different people. The better the codec implementation, the more people will not notice degraded quality.
Back to codecs.
Most media codecs are lossy in nature. There are a few lossless ones, but these are rarely used for real time communications and not used in WebRTC at all. The reason we use lossy codecs is to have better compression rates:
Taking 1080p (Full HD) video at 30 frames per second will result in roughly 1.5Gbps of data. Without compressing it – it just won’t work. We’re trying to squeeze a lot of raw data over networks, and as always, we need to balance our needs with the resources available to us.
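The arithmetic behind that number is simple enough to sketch (assuming 8 bits per color channel, 3 channels per pixel and no chroma subsampling):

```typescript
// Raw (uncompressed) video bitrate = width * height * bits-per-pixel * fps
const width = 1920;
const height = 1080;
const bitsPerPixel = 24; // 8 bits x 3 color channels, no chroma subsampling
const fps = 30;

const rawBitsPerSecond = width * height * bitsPerPixel * fps;
console.log(`${(rawBitsPerSecond / 1e9).toFixed(2)} Gbps`); // ~1.49 Gbps
```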
To compress more, we need:
That last one is where media codecs become really sensitive.
If every bit matters, then losing a bit matters. And if losing a bit matters, then losing a whole packet matters even more.
Since networks are bound to lose packets, we’re going to need to deal with missing media packets, with our system (in the decoder or elsewhere) having to fill that gap somehow.
More on lossy codecs
More on the future of audio codecs (lossy and lossless ones)
Types of WebRTC media correction

Media packets get lost. Our media decoders – or our WebRTC system as a whole – need to deal with this fact. This is done using different media correction mechanisms. Here’s a quick illustration of the available choices in front of us:
Each such media correction technique has its advantages and challenges. Let’s review them so we can understand them better.
PLC: Packet Loss Concealment

Every WebRTC implementation needs a packet loss concealment strategy. Why? Because at some point, in some cases, you won’t have the packets you need to play NOW. And since WebRTC is all about real-time, there’s no waiting with NOW for too long.
What does packet loss concealment mean? It means that if we lost one or more packets, we need to somehow overcome that problem and continue to run to the best of our ability.
Before we dive a bit deeper, it is important to state: not losing packets is always better than needing to conceal lost packets. More on that – later.
This is done differently between audio and video:
Audio PLC

For the most part, audio packets are decoded frame-by-frame and usually also packet-by-packet. If one is lost, we can try various ways to overcome that – from playing out silence or comfort noise, through repeating the last received audio frame, to estimating the missing audio altogether (increasingly with the help of machine learning).
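As a naive illustration of the simplest of these (my own sketch, not how any production decoder implements it), concealment can be as crude as repeating the last good frame with growing attenuation, so that longer gaps decay towards silence instead of buzzing:

```typescript
// Naive audio PLC: repeat the last good frame, attenuated, when a frame is lost.
// Real decoders (e.g. Opus) do far more sophisticated interpolation than this.
function concealFrame(lastGoodFrame: Float32Array, consecutiveLosses: number): Float32Array {
  const attenuation = Math.pow(0.5, consecutiveLosses); // decay towards silence over repeated losses
  const concealed = new Float32Array(lastGoodFrame.length);
  for (let i = 0; i < lastGoodFrame.length; i++) {
    concealed[i] = lastGoodFrame[i] * attenuation;
  }
  return concealed;
}
```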
Packet loss on video streams has its own headaches and challenges.
In video, most of the frames are dependent on previous ones, creating chains of dependencies:
I-frames or keyframes (whatever they are called depending on the video codec used) break these dependency chains, and then one can use techniques like temporal scalability to reduce the dependencies for some of the frames that follow.
When you lose a packet, the question isn’t only what to do with the current video frame and how to display it, but rather what is going to happen to future frames depending on the frame with the lost packet.
In the past, the focus was on displaying every bit that got decoded, which ended up with video playing back with smearing artifacts and green or pink blotches.
Check it for yourself, with our most recent WebRTC fiddle around frame loss. Today, we mostly don’t display frames until we have a clean enough bitstream, opting to freeze the video a bit or skip video frames rather than show something that isn’t accurate enough. With the advances in machine learning, this may change in the future.
–
PLC is great, but there’s a lot to be done to get back the lost packets as opposed to trying to make do with what we have. Next, we will see the additional techniques available to us.
RTX: Retransmissions

Here’s a simple mechanism (used everywhere) to deal with packet loss – retransmission.
In whatever protocol you use, make sure to either acknowledge receiving what is sent to you or send a NACK (a negative acknowledgement) when you don’t receive what you should have. This way, the sender can retransmit whatever was lost and you will have it readily available.
This works well if there’s enough time for another round trip of data until you must play it back. Or when the data can help you out in future decoding (think the dependency across frames in video codecs). It is why retransmissions don’t always work that well in WebRTC media correction – we’re dealing with real time and low latency.
Another variation of this in video streams is asking for a new I-frame. This way, the receiver can signal the sender to “reset” the video stream and start encoding it from scratch, which essentially means a request to break the dependency between the old frames and the new ones that should be sent after the packet loss.
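You don’t control NACK/RTX or keyframe requests from JavaScript – the browser and the remote peer handle those on their own – but you can observe how often they happen. A small sketch using standard getStats counters (nackCount, pliCount, retransmittedPacketsSent):

```typescript
// Observe retransmission-related activity on a video sender via getStats()
async function logRtxActivity(pc: RTCPeerConnection) {
  const report = await pc.getStats();
  report.forEach(stat => {
    if (stat.type === 'outbound-rtp' && stat.kind === 'video') {
      console.log('NACKs received:', stat.nackCount);                    // retransmission requests from the remote side
      console.log('PLIs received:', stat.pliCount);                      // requests for a fresh keyframe
      console.log('Packets retransmitted:', stat.retransmittedPacketsSent);
    }
  });
}
```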
RED: REDundancy Encoding

Retransmission means we overcome packet losses after the fact. But what if we could solve things without retransmissions? We can do that by sending the same information more than once and being done with it.
Double or triple the bitstream by flooding it with the same information to add more robustness to the whole thing.
RED is exactly that. It concatenates older audio frames into fresh packets that are being sent, effectively doubling or tripling the packet size.
If a packet gets lost, the new frame it was meant to deliver will be found in one of the following packets that should be received.
Yes, it eats up our bandwidth budget, but in a video call where we send 1Mbps of video data or more, going from 40kbps of audio to 90kbps or so might be a sacrifice worth making for cleaner audio.
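In browsers that ship audio RED (Chrome does), the application can express a preference for it on the audio transceiver. A sketch, assuming the remote side negotiates it as well:

```typescript
// Prefer audio/red over plain Opus on an audio transceiver (browser and remote support permitting)
function preferAudioRed(transceiver: RTCRtpTransceiver) {
  const codecs = RTCRtpReceiver.getCapabilities('audio')?.codecs ?? [];
  const red = codecs.filter(c => c.mimeType.toLowerCase() === 'audio/red');
  const others = codecs.filter(c => c.mimeType.toLowerCase() !== 'audio/red');
  if (red.length > 0) {
    transceiver.setCodecPreferences([...red, ...others]); // RED first = preferred during negotiation
  }
}
```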
FEC: Forward Error Correction

Redundancy encoding requires an additional 100% or more of bitrate. We can do better using other means, usually referred to as Forward Error Correction.
Mind you, redundancy encoding is just another type of forward error correction mechanism
With FEC, we are going to add more packets that can be used to restore other packets that are lost. The most common approach for FEC is by taking multiple packets, XORing them and sending the XORed result as an additional packet of data.
If one of the packets is lost, we can use the XORed packet to recreate the lost one.
There are other correction algorithms that are a wee bit more complex mathematically (google Reed-Solomon if you’re interested), but the one used in WebRTC for this purpose is XOR.
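Here’s a toy illustration of the XOR idea, ignoring the actual ULPFEC/FlexFEC packetization details – one parity packet protects a group of equal-length packets, and any single lost packet in the group can be rebuilt from the rest:

```typescript
// Toy XOR FEC: protect a group of equal-length packets with one parity packet.
function makeParity(packets: Uint8Array[]): Uint8Array {
  const parity = new Uint8Array(packets[0].length);
  for (const pkt of packets) {
    for (let i = 0; i < pkt.length; i++) parity[i] ^= pkt[i];
  }
  return parity;
}

// If exactly one packet of the group is lost, XORing the parity with the
// surviving packets reconstructs the lost one.
function recoverLost(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
  return makeParity([...survivors, parity]);
}
```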
FEC is still an expensive tool since it increases the bitrate considerably, which is why it is used only sparingly:
PLC, RTX, FEC, RED, …
How is each one signaled over the network? When would it make sense to use it? How does WebRTC implement it in the browser and what exactly can you expect out of it?
All that is mostly arcane knowledge. Something that is passed from one generation of WebRTC developers to another it seems.
Lucky for you, Philipp Hancke and I are working on a new course – Higher Level WebRTC Protocols. In it, we cover these specific topics as well as quite a few others in a level of detail that isn’t found anywhere else out there.
Most of the material is already written down. We just need to prettify it a bit and record it.
If you are interested in learning more about this, be sure to join our waiting list so you’ll know once we launch the course.
Join the course waiting list

The post WebRTC media resilience: the role FEC, RED, PLC, RTX and other acronyms play appeared first on BlogGeek.me.
ChatGPT is changing computing and as an extension how we interact with machines. Here’s how it is going to affect WebRTC.
ChatGPT became the service with the highest growth rate of any internet application, reaching 100 million active users within the first two months of its existence. A few are using it daily. Others are experimenting with it. Many have heard about it. All of us will be affected by it in one way or another.
I’ve been trying to figure out what exactly a “ChatGPT WebRTC” duo means – or in other words – what ChatGPT means for those of us working with and on WebRTC.
Here are my thoughts so far.
Let’s start with a quick look at what ChatGPT really is (in layman’s terms, with a lot of hand waving, and probably more than a few mistakes along the way).
BI, AI and Generative AI

I’ll start with a few slides I cobbled together for a presentation I did for a group of friends who wanted to understand this.
ChatGPT is a product/service that makes use of machine learning. Machine learning is something that has been marketed a lot as AI – Artificial Intelligence. If you look at how this field has evolved, it would be something like the below:
We started with simple statistics – take a few numbers, sum them up, divide by their count and you get an average. You complicate that a bit with weighted average. Add a bit more statistics on top of it, collect more data points and cobble up a nice BI (Business Intelligence) system.
At some point, we started looking at deep learning:
Here, we train a model by using a lot of data points, to a point that the model can infer things about new data given to it. Things like “do you see a dog in this picture?” or “what is the text being said in this audio recording?”.
Here, a lot of 3 letter acronyms are used like HMM, ANN, CNN, RNN, GNN…
What deep learning did in the past decade or two was enable machines to describe things – be able to identify objects in images and videos, convert speech to text, etc.
It made it the ultimate classifier, improving the way we search and catalog things.
And then came a new field of solutions in the form of Generative AI. Here, machine learning is used to generate new data, as opposed to classifying existing data:
Here what we’re doing is creating a random input vector, pushing it into a generator model. The generator model creates a sample for us – something that *should* result in the type of thing we want created (say a picture of a dog). That sample that was generated is then passed to the “traditional” inference model that checks if this is indeed what we wanted to generate. If it isn’t, we iteratively try to fine tune it until we get to a result that is “real”.
This is time consuming and resource intensive – but it works rather well for many use cases (like some of the images on this site’s articles that are now generated with the help of Midjourney).
So…
The thing is, all of what I just explained wouldn’t be interesting without ChatGPT – a service that came into our lives only recently, becoming the hottest thing out there:
The Most Important Chart In 100 Years https://t.co/Ypcsqi0AWJ #AI #GPT #ChatGPT #technology @JohnNosta pic.twitter.com/QjMroVZ7cG
— Kyle Hailey (@kylelf_) February 16, 2023

ChatGPT is based on LLMs – Large Language Models – and it is fast becoming the hottest thing around. No other service has grown as fast as ChatGPT, which is why every business in the world is now trying to figure out if and how ChatGPT will fit into their world and services.
Why ChatGPT and WebRTC are like oil and water

So this begs the question: what can you do with ChatGPT and WebRTC?
Problem is, ChatGPT and WebRTC are like oil and water – they don’t mix that well.
ChatGPT generates data whereas WebRTC enables people to communicate with each other. The “generation” part in WebRTC is taken care of by the humans who interact with each other over it.
On one hand, this makes ChatGPT kinda useless for WebRTC – or at least not that obvious to use for it.
But on the other hand, if someone succeeds in cracking this one properly – they will have an innovative and unique thing on their hands.
What have people done with ChatGPT and WebRTC so far?

It is interesting to see what people and companies have done with ChatGPT and WebRTC in the last couple of months. Here are a few things that I’ve noticed:
In LiveKit’s and Twilio’s examples, the concept is to take the audio from the humans, convert it to text using Speech to Text, use that text as prompts for ChatGPT, convert ChatGPT’s response back to audio using Text to Speech, and play it back to the humans in the conversation.
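Conceptually, that pipeline is quite small. Here’s a rough sketch of one turn of such a conversation – the transcribe() and synthesize() helpers are hypothetical placeholders for whichever Speech to Text and Text to Speech services you’d plug in, while the OpenAI chat completions call follows the real API shape (as of early 2023):

```typescript
// Conceptual ChatGPT-in-a-call pipeline: audio -> text -> LLM -> text -> audio.
// transcribe() and synthesize() are hypothetical stand-ins for STT/TTS services.
async function respondToUtterance(userAudio: Blob, apiKey: string): Promise<Blob> {
  const userText = await transcribe(userAudio); // speech-to-text (placeholder)

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: userText }],
    }),
  });
  const reply = (await res.json()).choices[0].message.content;

  return synthesize(reply); // text-to-speech (placeholder)
}

declare function transcribe(audio: Blob): Promise<string>;
declare function synthesize(text: string): Promise<Blob>;
```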
Broadening the scope: Generative AI

ChatGPT is one of many generative AI services. Its focus is on text. Other generative AI solutions deal with images or sound or video or practically any other data that needs to be generated.
I have been using MidJourney for the past several months to help me with the creation of many images in this blog.
Today it seems that in any field where new data or information needs to be created, a generative AI algorithm can be a good place to investigate. And in marketing-speak – AI is overused and a new overhyped term was needed to explain what innovation and cutting edge is – so the word “generative” was added to AI for that purpose.
Fitting Generative AI to the world of RTC

How does one go about connecting generative AI technologies with communications then? The answer to this question isn’t an obvious or simple one. From what I’ve seen, there are 3 main areas where you can make use of generative AI with WebRTC (or just RTC):
Here’s what it means
Conversations and bots

In this area, we either have a conversation with a bot or have a bot “eavesdrop” on a conversation.
The LiveKit and Twilio examples earlier are about striking a conversation with a bot – much like how you’d use ChatGPT’s prompts.
A bot eavesdropping on a conversation can offer assistance throughout a meeting or after the meeting –
As I stated above, this has little to do with WebRTC itself – it takes place elsewhere in the pipeline; and to me, this is mostly an application capability.
Media compression

An interesting domain where AI is starting to be investigated and used is media compression. I’ve written about Lyra, Google’s AI-enabled speech codec, in the past. Lyra makes assumptions about how human speech sounds and behaves in order to send less data over the network (effectively compressing it), letting the receiving end figure out and fill in the gaps using machine learning. Can this approach be seen as a case of generative AI? Maybe.
Would it make sense to investigate such approaches – where the speakers are known – to better compress their audio and even video?
How about the whole super resolution angle, where you send video at WVGA or 720p resolution and then have the decoder scale it up to 1080p or 4K, losing little in the process? We’re generating data out of thin air, though probably not in the “classic” sense of generative AI.
I’d also argue that if you know the initial raw content was generated using generative AI, there might be a better way in which the data can be compressed and sent at lower bitrates. Is that something worth pursuing or investigating? I don’t know.
Media processing

Similar to how we can have AI based codecs such as Lyra, we can also use AI algorithms to improve quality – better packet loss concealment that learns the speech patterns in real time and then mimics them when there’s packet loss. This is what Google is doing with their WaveNetEQ, something I mentioned in my WebRTC unbundling article from 2020.
Here again, the main question is how much of this is generative AI versus simply AI – and does that even matter?
Is the future of WebRTC generative (AI)?

ChatGPT and other generative AI services are growing and evolving rapidly. While WebRTC isn’t directly linked to this trend, it certainly is affected by it:
Like any other person and business out there, you too should see if and how generative AI affects your own plans.
The post ChatGPT meets WebRTC: What Generative AI means to Real Time Communications appeared first on BlogGeek.me.
RTC@Scale is Facebook’s virtual WebRTC event, covering current and future topics. Here’s the summary for RTC@Scale 2023 so you can pick and choose the relevant ones for you.
WebRTC Insights is a subscription service I have been running with Philipp Hancke for the past two years. The purpose of it is to make it easier for developers to get a grip of WebRTC and all of the changes happening in the code and browsers – to keep you up to date so you can focus on what you need to do best – build awesome applications.
We got into a kind of a flow:
Oh – and we’re covering important events somewhat separately. Last month, a week after Meta’s RTC@Scale event took place, Philipp sat down and wrote a lengthy summary of the key takeaways from all the sessions, which we distributed to our WebRTC Insights subscribers.
As a community service (and a kind of a promotion for WebRTC Insights), we are now opening it up to everyone in this article
Meta ran their rtc@scale event again. Last year was a blast and we were looking forward to this one. The technical content was pretty good again. As with last year, our focus for this summary is what we learned or what it means for folks developing with WebRTC. Once again, the majority of speakers were from Meta. At times they crossed the line from “is this generally useful” into the realm of “Meta specific”, but most of the talks still provide value.
Compared to last year there were almost no “work with me” pitches (with one exception).
It is surprising how often Meta says “WebRTC” or “Google” (oh and Amazon as well).
Writing up these notes took a considerable amount of time (again) but we learned a ton and will keep referencing these talks in the future, so it was totally worth it (again). You can find the list of speakers and topics on the conference website, the seven hours of raw video here (which includes the speaker introductions), or you can just scroll down for our summary.
SESSION 1: Rish Tandon / Meta – Meta RTC State of the Union

Duration: 13:50
Watch if you
Key insights:
Duration: 19:30
Watch if you are
Key insights:
Duration: 18:00
Watch if you are
Key insights:
Duration: 18:40
Watch if you are
Key insights:
Duration: 25:00
Watch if:
Key points:
Duration: 21:30
Watch if you are
Key points:
Duration: 19:00
Watch if you are
Key points:
Duration: 15:50
Watch if you are
Key points:
Duration: 28:00
Watch if:
Key points:
Duration: 22:00
Watch if you are
Key points:
Duration: 18:15
Watch if you
Key points:
Duration: 25:00
Watch if you are
Key takeaways:
Duration: 17:45
Watch if you are
Key points:
Duration: 19:45
Watch if
Key points:
Duration: 18:15
Watch if you are
Key points:
Duration: 22:00
Watch if you are
Key points:
Duration: 14:10
Watch if
Key points:
We tried capturing as much as possible, which made this a wee bit long. The purpose though is to make it easier for you to decide in which sessions to focus, and even in which parts of each session.
Oh – and did we mention you should check out (and subscribe) to our WebRTC Insights service?
The post RTC@Scale 2023 – an event summary appeared first on BlogGeek.me.
WebRTC media server is an optional component in a WebRTC application. That said, in most common use cases, you will need one.
There are different types of WebRTC servers. One of them is the WebRTC media server. When will you be needing one and what exactly does it do? Read on.
Oh – and if you’re looking to dig deeper into WebRTC media servers, make sure to check the end of this article for an announcement of our latest WebRTC course
There are quite a few moving parts in a WebRTC application. There’s the client device side, where you’ll have the web browsers with WebRTC support and maybe other types of clients like mobile applications that have WebRTC implementations in them.
And then there are the server side components and there are quite a few of them. The illustration above shows the 4 types of WebRTC servers you are likely to need:
The illustration below shows how all of these WebRTC servers connect to the client devices and what types of data flows through them:
What is interesting, is that the only real piece of WebRTC infrastructure component that can be seen as optional is the WebRTC media server. That said, in most real-world use-cases you will need media servers.
The role of a WebRTC media server

At its conception, WebRTC was meant to be “between” browsers. Only recently did the good people at the W3C see fit to change it to something that can also work outside of browsers – in servers, for example. We’ve known that to be the case all along.
What does a WebRTC media server do exactly? It processes and routes media packets through the backend infrastructure – either in the cloud or on premise.
Let’s say you are building a group calling service and you want 10 people to be able to join in and talk to each other. For simplicity’s sake, assume we want to get 1Mbps of encoded video from each participant and show the other 9 participants on the screen of each of the users:
How would we go about building such an application without a WebRTC media server?
To do that, we will need to develop a mesh architecture:
We’d have the clients send out 1Mbps of their own media to all the other participants who wish to display them on their screen. This amounts to 9*1Mbps = 9Mbps of upstream data that each participant will be sending out. Each client receives streams from all 9 other participants, getting us to 9Mbps of downstream data.
This might not seem like much, but it is. Especially when sent over UDP in real time, and when we need to encode and encrypt each stream separately for each user, and run bandwidth estimation across the network. Even if we reduce the requirement from 1Mbps to a lower bitrate, this is still a hard problem to deal with and solve.
It becomes devilishly hard (impossible?) when we crank up the number to say 50 or 100 participants. Not to mention the numbers we see today of 1,000 or more participants in sessions (either active participants or passive viewers).
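A quick back-of-the-envelope calculation shows how quickly mesh falls apart as the group grows (assuming every participant sends the same 1Mbps stream to every other participant):

```typescript
// Mesh: every participant sends its stream separately to every other participant.
function meshBandwidthMbps(participants: number, streamMbps = 1) {
  const perClientUp = (participants - 1) * streamMbps;   // one copy per remote participant
  const perClientDown = (participants - 1) * streamMbps; // one incoming stream per remote participant
  return { perClientUp, perClientDown };
}

console.log(meshBandwidthMbps(10)); // { perClientUp: 9, perClientDown: 9 }
console.log(meshBandwidthMbps(50)); // { perClientUp: 49, perClientDown: 49 }
```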
Enter the WebRTC media server
This is where a WebRTC media server comes in. We will add it here to be able to do the following tasks for us:
Here’s what’s really going on and what we use these media servers for:
WebRTC media servers bridge the gaps in the architecture that we can’t solve with clients alone
How is a WebRTC media server different from TURN servers?

Before we continue and dive into the different types of media servers, there’s something that must be said and discussed:
WebRTC media server != TURN server
I’ve seen people try to use the TURN server to do what media servers do. Usually that would be things like recording the data stream.
This doesn’t work.
TURN servers route media through firewalls and NAT devices. They aren’t privy to the data being sent through them. WebRTC privacy is maintained by having data encrypted end to end when passing via TURN servers – the TURN servers don’t know the encryption key so can’t do anything with the media.
WebRTC media servers are implementations of WebRTC clients in a server component. From an architectural point of view, the “session” terminates in the WebRTC media server:
A WebRTC media server is privy to all data passing through it, and acts as a WebRTC client in front of each of the WebRTC devices it works with. It is also why it isn’t so well defined in WebRTC but at the same time so versatile.
Types of WebRTC media servers

This versatility of WebRTC media servers means that there are different types of such servers. Each one works under different architectural assumptions and concepts. Let’s review them quickly here.
Routing media using an SFU

The most common and popular WebRTC media server is the SFU.
An SFU routes media between the devices, doing as little as possible when it comes to the media processing part itself.
The concept of an SFU is that it offloads much of the decision making of layout and display to the clients themselves, giving them more flexibility than any other alternative. At the same time, it takes care of bandwidth management and routing logic to best fit the capabilities of the devices it works with.
To do all that, it uses technologies such as bandwidth estimation, simulcast, SVC and many others (things like DTX, cascading and RED).
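Simulcast, for instance, is something the sending client opts into when adding its video track, so the SFU can then choose which layer to forward to each viewer. A sketch (the rid names and bitrates below are arbitrary choices, not required values):

```typescript
declare const pc: RTCPeerConnection;
declare const videoTrack: MediaStreamTrack;

// Send three simulcast layers of the same camera track; the SFU picks which one
// to route to each receiver based on their capabilities and available bandwidth.
const transceiver = pc.addTransceiver(videoTrack, {
  direction: 'sendonly',
  sendEncodings: [
    { rid: 'q', scaleResolutionDownBy: 4, maxBitrate: 150_000 },  // quarter resolution
    { rid: 'h', scaleResolutionDownBy: 2, maxBitrate: 500_000 },  // half resolution
    { rid: 'f', maxBitrate: 1_500_000 },                          // full resolution
  ],
});
```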
At the beginning, SFUs were introduced and used for group calls. Later on, they started to appear as live streaming and broadcast components.
Mixing media with an MCU

Probably the oldest media server solution is the MCU.
The MCU was introduced years before WebRTC, when networks were limited. Telephony systems had/have voice conferencing bridges built around the concept of MCUs. Video conferencing systems required the use of media servers simply because video compression required specialized hardware and later too much CPU from client devices.
In telephony and audio, you’ll see this referred to as mixers or audio bridges and not MCUs. That said, they still are one and the same technically.
What an MCU does is receive and mix the media streams from the various participants, sending a single stream of media towards the clients. For clients, an MCU looks like a call between 2 participants – it is the only entity the client really interacts with directly. This means there’s a single audio and a single video stream coming into and going out of the client – regardless of the number of participants and how/when they join and leave the session.
MCUs were less used in WebRTC from the get go. Part of it was the simple economies of scale – MCUs are expensive to operate, requiring a lot of CPU power (encoding and decoding media is expensive). It is cheaper to offer the same or similar services using SFUs. There are vendors who still rely on MCUs in WebRTC for group calling, though in most cases, you will find MCUs providing the recording mechanism only – where what they end up doing is taking all inputs and mixing them into a single stream to place in storage.
Bridging across standards using a gateway

Another type of media server that is used in WebRTC is a gateway.
In some cases, content – rendered, live or otherwise – needs to be shared in a WebRTC session – or a WebRTC session needs to be shared on another type of a protocol/medium. To do so, a gateway can be used to bridge between the protocols.
The two main cases where these happen are probably:
One more example is a kind of a hybrid media server. One that might do routing and processing together. A group calling service that also records the call into a single stream for example. Such solutions are becoming more and more popular and are usually deployed as multiple media servers of different types (unlike the illustration above), each catering for a different part of the service. Splitting them up makes it easier to develop, maintain and scale them based on the workload needed by each media server type.
Cloud rendering

This might not be a WebRTC media server per se, but for me this falls within the same category.
Sometimes, what we want is to render content in the cloud and share it live with a user on a browser. This is true for things like cloud gaming or cloud application delivery (Photoshop in the cloud for hourly consumption). In such a case, this is more like a peer-to-peer WebRTC session taking place between a user on a browser and a cloud server that renders the content.
I see it as a media server because many of the aspects of development and scaling of the cloud rendering components are more akin to how you’d think about WebRTC media servers than they are about browser or native clients.
A quick exercise: What WebRTC media servers are used by Google Meet?

Let’s look at an example service – Google Meet. Why Google Meet? Well, because it is so versatile today and because if you want to trace capabilities in WebRTC, the best approach is to keep close tabs on what Google Meet is doing.
What WebRTC media servers does Google Meet use? Based on the functionality it offers, we can glean the types that make up this service:
A classic meeting service in WebRTC may well require more than a single type of WebRTC media server, likely deployed in hybrid mode across different hardware configurations.
When will you need a WebRTC media server?

As we’ve seen earlier, the answer to this is simple – when doing things with WebRTC clients only isn’t possible and we need something to bridge this gap.
We may lack:
What I usually do when analyzing the needs of a WebRTC application is to find these gaps and determine if a WebRTC media server is needed (it usually is). I do so by thinking of the solution as a P2P one, without media servers. And then based on the requirements and the gaps found, I’ll be adding certain WebRTC media server elements into the infrastructure needed for my WebRTC application.
E2EE and WebRTC media servers

We’ve seen a growing interest in privacy in recent years. The internet has shifted to encryption-first connections and WebRTC offers encrypted-only media. This shift towards privacy started as privacy from malicious actors on the public internet, but has since extended towards privacy from the service provider itself.
Running a group meetings service through a service provider that cannot itself access the meeting’s content is becoming more commonplace.
This capability is known as E2EE – End to End Encryption.
When introducing WebRTC media servers into the mix, it means that while they are still a part of the session and are terminating WebRTC peer connections (=terminating encrypted SRTP streams) on their own, they shouldn’t have access to the media itself.
This can be achieved only in the SFU type of WebRTC media servers by the use of insertable streams. With it, the application logic can exchange private encryption keys between the users and have a second encryption layer that passes transparently through the SFU – enabling it to do its job of packet routing without the ability to understand the media content itself.
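In Chrome this is exposed through encoded insertable streams: the application sees every encoded frame before packetization and can apply its own additional encryption layer. A minimal sketch – the encryptBody() function is a placeholder for real key management and cryptography, not something you should ship:

```typescript
// Second encryption layer over encoded frames (Chrome's insertable streams API).
// The SFU still routes the packets, but can no longer inspect the payloads.
declare const pc: RTCPeerConnection; // created with { encodedInsertableStreams: true }
declare function encryptBody(data: ArrayBuffer): ArrayBuffer; // placeholder, not real crypto

const sender = pc.getSenders().find(s => s.track?.kind === 'video')!;
const { readable, writable } = (sender as any).createEncodedStreams();

readable
  .pipeThrough(new TransformStream({
    transform(frame: any, controller) {
      frame.data = encryptBody(frame.data); // encrypt the encoded payload before packetization
      controller.enqueue(frame);
    },
  }))
  .pipeTo(writable);
```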
WebRTC media servers and open source

Another important aspect to understand about WebRTC media servers is that most of those using media servers in WebRTC do so using open source frameworks for media servers.
I’ve written at length about WebRTC open source projects – you’ll find details there about the market state and the open source WebRTC media servers available.
What is important to note is that more often than not, projects who don’t use managed services for their WebRTC media servers usually pick open source WebRTC media servers to work with and not develop their own from scratch. This isn’t always the case, but it is quite common.
Video APIs, CPaaS and WebRTC media servers

Video APIs and CPaaS are another area I cover quite extensively.
Vendors who decide to use a CPaaS vendor for their WebRTC application will mainly do it in one of two situations:
Both cases require media servers…
This leads to the following important conclusion: there’s no such thing as a CPaaS vendor doing WebRTC that isn’t offering a managed WebRTC media server as part of its solution – and if there is, then I’ll question its usefulness for most potential customers.
Taking a deep dive into WebRTC protocols

Last year, I released the Low-level WebRTC protocols course along with Philipp Hancke.
The Low-level WebRTC protocols course has been a huge success, which is why we’re starting to work on our next course in this series: Higher level WebRTC protocols
Before we go about understanding WebRTC media servers, it is important to understand the inner workings of the network protocols that WebRTC employs. Our low-level protocols course covers the first part of the underlying protocols. This second course looks at the higher-level protocols – the parts that deal a bit more with network realities – challenges brought to us by packet losses as well as other network characteristics.
Things we cover here include retransmissions, forward error correction, codec packetization and a myriad of media processing algorithms.
Want to be the first to know when we open our early bird enrollment?
Join the waiting list

The post What exactly is a WebRTC media server? appeared first on BlogGeek.me.
WHIP and WHEP are specifications to get WebRTC into live streaming. But is this really what is needed moving forward?
WebRTC is great for real time. Anything else – not as much. Recently two new protocols came into being – WHIP and WHEP. They work as signaling to WebRTC to better support live streaming use cases.
In recent months, there has been growing adoption in the implementation of these protocols (actual usage isn’t something I am privy to, so I can’t attest to it either way). This progress is positive, but I can’t ignore the feeling I have that this is only a temporary solution.
WHIP stands for WebRTC-HTTP Ingestion Protocol. WHEP stands for WebRTC-HTTP Egress Protocol. They are both relatively new IETF drafts that define a signaling protocol for WebRTC.
WebRTC explicitly decided NOT to have any signaling protocol so that developers will be able to pick and choose any existing signaling protocol of their choice – be it SIP, XMPP or any other alternative. For the media streaming industry, this wasn’t a good thing – they needed a well known protocol with ready-made implementations. Which led to WHIP and WHEP.
To understand how they fit into a solution, we can use the diagram below:
In a live streaming use case, we have one or more broadcasters who “Ingest” their media to a media server. That’s where WHIP comes in. The viewers on the other side, get their media streams on the egress side of the media servers infrastructure.
For a technical overview of WHIP & WHEP, check out this Kranky Geek session by Sergio Garcia Murillo from Dolby:
In video conferencing, WebRTC transformed the market and how it thought of meetings and interoperability by practically killing the notion of interoperability across vendors on the protocol level, shifting it to the application level and letting users install their own apps on devices or just load web pages on demand.
The streaming industry is different – it relies on 3 components, which can easily come from 3 different vendors: the ingestion side (the broadcaster’s client), the media servers, and the media players on the viewers’ side.
When a broadcaster implements his application, he picks and chooses the media servers and media players. Sometimes he will also pick the ingestion part, but not always. And none of the vendors in each of these 3 categories can really enforce the use of his own components for the others.
This posed a real issue for WebRTC – it has no signaling protocol – this is left for the implementers, but how do you develop such a solution that works across vendors without a suitable signaling protocol?
The answer for that was WHIP and WHEP –
These are really simple protocols built around the notion of a single HTTP request – in an attempt to get the streaming industry to use them and not shy away from the complexities hidden in WebRTC.
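At its core, WHIP ingestion is a single HTTP POST of the SDP offer, with the SDP answer coming back in the response. A simplified sketch, ignoring trickle ICE, authentication and teardown, and assuming you got a whipEndpoint URL from your media server or provider:

```typescript
// Minimal WHIP-style ingest: POST the SDP offer, apply the SDP answer.
async function whipPublish(stream: MediaStream, whipEndpoint: string) {
  const pc = new RTCPeerConnection();
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const response = await fetch(whipEndpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/sdp' },
    body: pc.localDescription!.sdp,
  });

  const answerSdp = await response.text();
  await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });
  return pc;
}
```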
Strengths

Here’s what’s working well for WHIP and WHEP:
There’s the challenging side of things as well:
This last weakness – WebRTC – leads me to the next issue at hand.
Streaming, latency and WebRTC

Streaming comes in different shapes and sizes.
The scenario might have different broadcasters:viewers count – 1:1, 1:many, few:1, few:many – each has its own requirements and nuances as to what I’d prefer using on the sending side, receiving end and on the media server itself.
What really changes everything here is latency. How much latency are we willing to accept?
The lower the latency we want, the more challenging the implementation becomes. The closer to live/real time we wish to get, the more sacrifices we will need to make in terms of quality. I’ve written about the need to choose either quality or latency.
WebRTC is razor focused on real time and live. So much so that it can’t really handle something that has latency in it. It can – but it will sacrifice too much for it at a high complexity cost – something you don’t really want or need.
What does that mean exactly?
This is when a few tough questions need to be asked – what exactly does your streaming service need?
If you need things to be conducted in sub-second latency only, then WebRTC is probably the way to go. But if you have in your use case other latencies as well, then think twice before choosing WebRTC as your go-to solution.
A hybrid WebRTC approach to “live” streaming

An important aspect that needs to be mentioned here is that in many cases, WebRTC is used in a hybrid model in media streaming.
Oftentimes, we want to ingest media using WebRTC and view the media elsewhere using other protocols – usually because we don’t care as much about latency or because we already have the viewing component solved and deployed – here WebRTC ingest is added to an existing service.
Adding the WHIP protocol here, and ingesting WebRTC media to the streaming service means we can acquire the media from a web browser without installing anything. Real time is nice, but not always needed. Browser ingest though is mostly about reducing friction and enabling web applications.
The 3 horsemen: WebTransport, WebCodecs and WebAssembly

That last suggestion would have looked different just two years ago, when for real time the only game in town for browsers was WebRTC. Today though, that isn’t the case.
In 2020 I pointed to the unbundling of WebRTC. The trend in which WebRTC is being split into its core components so that developers will be able to use each one independently, and in a way, build their own solution that is similar to WebRTC but isn’t WebRTC. These components are: WebTransport for the transport, WebCodecs for encoding and decoding, and WebAssembly for efficient in-browser processing.
Theoretically, using these 3 components one can build a real time communication solution, which is exactly what Zoom is trying to do inside web browsers.
In the past several months I’ve seen more and more companies adopting these interfaces. It started with vendors using WebAssembly for background blurring and replacement. Moved on to companies toying around with WebTransport and/or WebCodecs for streaming and recently a lot of vendors are doing noise suppression with WebAssembly.
Here’s what Intel showcased during Kranky Geek 2021:
This trend is only going to grow.
How does this relate to streaming?
Good that you asked!
These 3 enable us to implement our own live streaming solution – not based on WebRTC – that can achieve sub-second latency in web browsers. It is also flexible enough for us to add mechanisms and tools that can handle higher latencies as needed, where at higher latencies we improve upon the quality of the media.
Strengths

Here’s what I like about this approach:
It isn’t all shiny though:
I don’t know.
WHIP and WHEP are here. They are gaining traction and have vendors behind them pushing them.
On the other hand, they don’t solve the whole problem – only the live aspect of streaming.
The reason WebRTC is used at the moment is because it was the only game in town. Soon that will change with the adoption of solutions based on WebTransport+WebCodecs+WebAssembly, where an alternative to WebRTC for live streaming in browsers will present itself.
Can this replace WebRTC? For media streaming – yes.
Is this the way the industry will go? This is yet to be seen, but definitely something to track.
The post WHIP & WHEP: Is WebRTC the future of live streaming? appeared first on BlogGeek.me.
Note: Chinese translation thanks to Xueyuan Jia and Xiaoqian Wu of the W3C. See the English version here. W3C web technology experts François Daoust and Dominique Hazaël-Massieux (Dom) previously explored with us how to use WebCodecs and Streams for real-time video processing. That article focused on how to set up a pipeline for low-latency processing of video frames coming from a camera, a WebRTC stream, or other sources. It demonstrated a few processing examples – changing colors, overlaying images, and even changing the video codec. Other use cases referenced included machine learning processing, such as adding virtual backgrounds. Today, they focus on the many technology options available for doing the actual video processing. There are a lot of techniques for reading and changing the pixels inside a video frame. They give a comprehensive review of all the web-based options available today – JavaScript, WebAssembly (wasm), WebGPU, WebGL, WebCodecs, Web Neural Networks (WebNN), and WebTransport. Some of these technologies have been around for a while; many are new. This is an article about video analysis and manipulation. Thanks to François and Dominique for sharing their research, testing the complete catalog of technologies available for video processing on the web. Contents: video frame processing options, using JavaScript, pixel formats, performance, other considerations, using WebAssembly, demo code […]
The post Web 上的视频帧处理 – WebAssembly、WebGPU、WebGL、WebCodecs、WebNN 和 WebTransport appeared first on webrtcHacks.
There are a lot of options for reading and changing the pixels inside a video frame. In this post, W3C specialists François Daoust and Dominique Hazaël-Massieux (Dom) review every web-based option for processing video frames on the web available today - JavaScript, WebAssembly (wasm), WebGPU, WebGL, WebCodecs, Web Neural Networks (WebNN), and WebTransport.
The post Video Frame Processing on the Web – WebAssembly, WebGPU, WebGL, WebCodecs, WebNN, and WebTransport appeared first on webrtcHacks.
Understanding how WebRTC is governed in reality will enable you to make better decisions in your development strategy.
Whether you are correct or not is something we can argue about. What we can’t argue about is this: a company that maintains an open source library doesn’t owe you anything.
Free is worth exactly what you pay for it. 0⃣
And there lies the whole issue – if you aren’t paying for WebRTC, then what gives you the right to complain? (btw – this is different from the other side of it – could Google do a better job of maintaining WebRTC for everyone at the same or lower effort, while increasing external contributions to it).
Too. Many. Times. People. Complain. About. Google.
I do that as well
If you are complaining, at least know that you’re complaining about something that is reasonable…
One of the more recent cases comes from Twilio (or more accurately a customer of theirs):
There was a minor change in Google’s implementation of WebRTC. For some reason, they decided to be less lenient with how they parse iceServers in peer connections to be more “spec compliant”.
Yes. It is nitpicking.
Yes. It is a useless change.
Yes. They could have decided not to do it.
But they did. And in a weird way, it makes sense to do so.
And there’s a process in place already for dealing with that – Canary and Beta versions of Chrome that vendors (like Twilio) can use to catch and handle these things beforehand. Or they can… well… subscribe to WebRTC Insights.
Twilio had to fix their code (and they did by the way), and yet there are those who blame Google here for making changes in Chrome. Changes that one can say are needed.
I’d add a few more thoughts here before I continue to dive into this topic properly:
WebRTC is an open standard governed by the W3C and an open source library which confusingly is also named “webrtc”. I prefer to call it libwebrtc.
The WebRTC open standard is somewhat split in “ownership” between the W3C and the IETF. The W3C is in charge of the API surface we use in the browser for WebRTC, and the IETF of the network protocol itself – what gets sent over the network.
WebRTC as an open source library is… well… it depends. Google develops and maintains libwebrtc – that’s the source code that goes into Chrome. And Edge. And Firefox. And Safari. Yes – all of them. And then there are other alternative libraries you can use.
The thing is this – you can’t really use a different WebRTC implementation in the browser, because browsers come with libwebrtc “built-in”. And in many cases, if you don’t need a browser, you may still want to use libwebrtc just to be as close as possible to the browser implementation.
Does that mean that Google owns the WebRTC implementation? To some degree it does – while there are alternatives, none of them are truly usable for many of the use cases.
That said, anyone can fork the Google WebRTC implementation and create his own project – open source or otherwise – and continue from there. Apple could do it. So could Microsoft and Mozilla. And yet they all decided to stick with libwebrtc as is.
Why is that?
I can think of two main reasons:
So in a way, Google owns WebRTC without really owning it. At least as long as Chrome is the undisputed and dominant form in which we consume the internet (are you reading this on a Chrome browser?)
I usually place a global market share graph at this stage. This time, I’ll share this website’s visitors distribution:
A few words about libwebrtc

libwebrtc is maintained by Google for Google. It is open sourced and you can use it. You can even contribute back, which isn’t a simple process.
By Google for Google means that prioritization of features, testing and bug fixes is done based on Google’s needs. These needs include Google Meet, a few other Google services and the need to support and maintain the larger ecosystem.
Who sets the tone here? What decides if your bug is more important to deal with than Google Meet or another vendor’s problems?
Put yourself in the shoes of the Google product manager for WebRTC and you’ll know the answer – it would be Google Meet first. The others later.
This also sets the tone as to the build system and code structure of libwebrtc. It is highly geared towards its use inside Chrome. Less elsewhere. And this in turn means that adopting it as a library inside your own application means dealing with code that isn’t meant to be a classic generic purpose SDK – you’ll need to figure your way through it (and with a bit less documentation than you’d like).
Vendors in the WebRTC ecosystem

There are now hundreds if not thousands of vendors using WebRTC in the ecosystem. They do it directly or indirectly via CPaaS vendors and other tooling and solutions. You can find many of them in my WebRTC Developer Tools Landscape. Most of them view WebRTC as free. Not only that, it seems like many treat WebRTC as a human right – it needs to be there for them, it must be perfect, and if there’s something “wrong” with it, then humanity has the obligation to fix it for them.
So… WebRTC is free. But what does that mean exactly? What is the SLA associated with it? What can you expect of it and come back to complain if it isn’t met?
Here are a few additional interesting questions, if WebRTC is cardinal and strategic to your application:
To be clear – there are no right or wrong answers here – just make sure you position your expectations based on your answers as well
Putting your money where your mouth is

Philipp Hancke has been doing WebRTC for a long time and is renowned for his bug reports. He even got Google to fix quite a few of them. Some bugs stayed open for years however, like this bug about TURN relay servers sometimes being used in cases where STUN would be just fine. A bug here has an impact on the percentage of calls that get relayed via TURN servers, which has a negative impact on call quality (at times) and also increases the cost of running those servers.
This bug has been open since 2016. Quite a few Googlers took a look but without finding anything that stood out. The crucial hint of what was going wrong came in 2021 in another bug report. In the end, Philipp had to acquire the skills necessary to fix the bug (which will hopefully happen before the end of 2023).
This takes time, and time is not cheap – especially that of engineers. Microsoft, as his employer, apparently decided it was important enough for him to spend time fixing this and other issues.
Please Google, add a feature for me!

HEVC encoding and decoding in WebRTC seems to be a topic some folks get excited about. It would be great to know why…
There is a bug report about it in the WebRTC issue tracker which gets fairly frequent updates. And yet… Google does nothing! How can that be?
One would say that’s because it is outside the requirements of what Google needs for Google. There are other contributing factors as well here:
There’s this modern concept of zero trust in cloud computing these days.
Here’s my suggestion to you wrt WebRTC and your stance:
Zero expectations.
Don’t expect – and you won’t be disappointed.
But more importantly – understand how this game is played:
And yes – we’re here to help – you can use WebRTC Insights to get ahead of these issues in many ways.
The post With WebRTC, don’t expect Google to be your personal outsourcing vendor appeared first on BlogGeek.me.
WebRTC used to be about capturing some media and sending it from Point A to Point B. Machine Learning has changed this. Now it is common to use ML to analyze and manipulate media in real time for things like virtual backgrounds, augmented reality, noise suppression, intelligent cropping, and much more. To better accommodate this […]
The post Real-Time Video Processing with WebCodecs and Streams: Processing Pipelines (Part 1) appeared first on webrtcHacks.
In group calls there are different ways to decide on WebRTC server allocation. Here are some of them, along with recommendations of when to use what.
In WebRTC group calling, media server scaling is one of the biggest challenges. There are multiple scaling architectures that are used, and most likely, you will be aiming at a routing alternative, where media servers are used to route media streams between the various participants of a session.
As your service grows, you will need to deal with scale:
In all these instances, you will have to deal with the following challenge: How do you decide on which server to allocate a new user? There are various allocation schemes to choose from for WebRTC group calling. Each with its own advantages and challenges. Below, I’ll highlight a few such schemes to help you with implementing the WebRTC allocation scheme that is most suitable for your application.
First things first. Media servers in WebRTC don’t scale well. For most use cases, a single server will be able to support 200-500 users. When more than these numbers are supported, it will usually be because the server sends lower bitrates by design, supports only voice, or is built to handle only one-way live streaming scenarios.
This can be viewed as a bad thing, but in some ways, it isn’t all bad – with cloud architectures, it is preferable to keep the blast radius of failures smaller, so that an erroneous machine ends up affecting less users and sessions. WebRTC media servers force developers to handle scaling earlier in their development.
Our first order of the day is usually going to be deciding how to deal with more than a single media server in the same data center location. We are likely to load-balance these media servers through our signaling server policy, effectively associating a media server to a user or a media stream when the user joins a session. Here are a few alternatives to making this decision.
Server packing

This one is rather straightforward. We fill a media server to capacity before moving on to fill the next one.
Advantages:
Challenges:
In this technique, we look for the media server that has the most free capacity on it and place the new user or session on it.
Advantages:
Challenges:
Our “don’t think too much” approach. Allocate the next user or session to a server and move on to the next one in the list of servers for the next allocation.
Advantages:
Challenges:
Then there’s the approach of picking a server at random. It sounds reckless, but in many cases, it can be just as useful as least used or round robin.
Advantages:
Challenges:
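To make these single data center policies concrete, here’s a minimal sketch of what server packing, least used, round robin and random selection boil down to (the server shape and the capacity metric are my own simplification):

```typescript
// Simplified allocation policies for picking a media server within a data center.
interface MediaServer { id: string; sessions: number; capacity: number; }

// Server packing: fill the first server that still has room.
const pack = (servers: MediaServer[]) =>
  servers.find(s => s.sessions < s.capacity);

// Least used: pick the server with the most free capacity.
const leastUsed = (servers: MediaServer[]) =>
  [...servers].sort((a, b) => (b.capacity - b.sessions) - (a.capacity - a.sessions))[0];

// Round robin: rotate through the list regardless of load.
let next = 0;
const roundRobin = (servers: MediaServer[]) => servers[next++ % servers.length];

// Random: pick any server.
const random = (servers: MediaServer[]) =>
  servers[Math.floor(Math.random() * servers.length)];
```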
The second part is determining which region to send a session or a user in a session to.
If you plan on designing your service around a single media server handling the whole session, then the challenge is going to be where to open a brand new session (adding more users takes place on that same server anyway). Today, many services are moving away from the single server approach to a more distributed architecture.
Lets see what our options are here in general.
First in room

The first user in a session decides in which region and data center it gets created. If there is more than a single media server in that data center, then we go with our single data center allocation techniques to determine which one to use.
This is the most straightforward and naive approach, making it almost the default solution many start with.
Advantages:
Challenges:
Note that everything has a solution. The solutions though make this harder to implement and may degrade the user experience in the edge cases they deal with.
Application specific

You can let the first user who joins the room make the geolocation decision, or you can use other means to do that. Here, the intent is to use something you know in your application in advance to make the decision.
For example, if this is a course lesson with the teacher joining from India and all the students are joining from the UK, it might be beneficial to connect everyone to a media server in the UK or vice versa – depending on where you want to put the focus.
A similar approach is to have the session’s location determined by the host (similar to first in room) or by the host’s configuration – at account creation or at session creation.
Advantages:
Challenges:
Cascading is also viewed as a distributed/mesh media server architecture – pick the name you want for it.
With cascading, we let media servers communicate with each other to cater for a single session together. This approach is how modern services scale or increase media quality – in many ways, many of the other schemes here are “baked” into this one. Here are a few techniques that are applicable here:
Advantages:
Challenges:
This one surprised me the first time I saw it. In this approach, we “disconnect” all incoming traffic from outgoing and treat each of them separately as if it were an independent live stream.
What does that mean? When a user joins, they will always connect to the media server closest to them in order to send their media. For the incoming media from other users, they will subscribe to those streams directly on the media servers of those users.
Advantages:
Challenges:
One thing I ignored in all this is how you know when a server is “full”. This decision can be made in multiple ways, and I’ve seen different vendors take different approaches here. There are two competing aspects here to deal with:
Here are a few examples, so you can make an informed decision on your end:
Sometimes, we will use multiple metrics to make our allocation decision.
Final words

Scaling group calls isn’t simple once you dive into the details. There are quite a few WebRTC allocation schemes you can use to decide where to place new users joining group sessions, each with its own advantages and challenges.
Pick your poison
One last word – this article was written based on a new lesson that was just added to the Advanced WebRTC Architecture course. If you are looking for the best WebRTC training, then check out my WebRTC Courses.
The post Different WebRTC server allocation schemes for scaling group calling appeared first on BlogGeek.me.
Yes and no. WebRTC getStats is what we have to work with, so we have to make do with it. That said, your real problems may lie elsewhere altogether.
Philipp Hancke assisted in writing this article and Midjourney helped with most of the visuals
This is the question I was posed in a meeting last week:
Can I trust WebRTC getStats?
As the Jewish person that I am, I immediately answered with a question of my own:
Assume the answer is “No”. What are you going to do now?
I thought the conversation merits a bit more discussion and some public sharing, which led to this article being written.
Table of contents
Yes. You can and should trust the accuracy of WebRTC getStats, but like with everything else, you should also keep a healthy dose of suspicion around you.
Like any piece of software, libwebrtc – and its getStats implementation by extension – has bugs. These bugs get fixed over time. The priority given to fixing them relates mostly to how much Google's own services suffer from them, with a seemingly arbitrary prioritization for the rest of the issues.
See below to learn more on why we have a problem and what you can do about it.
A short history of WebRTC getStats
WebRTC was announced somewhere in 2011 and the initial public code in Chrome was released in 2012. The protocol itself was stabilized and officially published by the W3C in January 2021. Just… 10 years later.
In between these 10 years a lot of discussions took place and the actual API surface of the WebRTC standard specification was modified to fit the feedback provided and to encompass additional use cases and requirements.
We’ve had these discussions taking place in parallel to WebRTC being implemented in web browsers and shipped out so developers can make use of them. Years before WebRTC was officially “standardized” we had hundreds if not thousands of applications in production using WebRTC, oftentimes with paying customers.
At some point, the getStats implementation in the standard specification diverged from the one implemented by Google in Chrome, ending with two main alternatives: the legacy stats Chrome had been shipping for years, and the spec-compliant stats. This made switching from one to the other a challenge:
The decision was made that the distinction between the two would be how getStats() is called. Callback-based invocation returned the legacy stats while using a promise returned the spec-compliant getStats. The logic behind this was that promises were a new construct introduced to JavaScript at the time, so developers who used the legacy getStats didn't use promises (yet).
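To make the distinction concrete, here's a minimal sketch of the spec-compliant, promise-based invocation (the callback-based form is the one that returned the legacy report):

```typescript
// Minimal sketch: the promise-based invocation returns the spec-compliant stats.
// pc is assumed to be an existing, connected RTCPeerConnection.
async function dumpSpecCompliantStats(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats(); // promise => spec-compliant RTCStatsReport
  report.forEach((stats) => {
    // Every entry has at least an id, a type and a timestamp per the spec
    console.log(stats.type, stats.id, stats.timestamp);
  });
}
```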
This approach worked rather well for the last 6 years, with many (most?) applications adopting the use of the spec-compliant getStats:
We observed a steep drop in usage when Google Meet stopped using the legacy API (that's the blue line going down). That said, a few outliers still remain who use the old getStats. They will not be able to do so in 2024.
Google WebRTC housecleaning project
Fast forward to today (or last year).
WebRTC is a solid standard and implementation used by many. It got us through the pandemic in many ways and aspects.
All the bigger requirements from WebRTC are behind us. There aren’t that many innovations or new features that get introduced to it.
Which is leading Google in recent months to house cleaning tasks:
This house cleaning work has reached getStats, and with it, 4 main areas:
Such changes are great when viewed in the long term. But in the short term they are a huge headache.
Firefox & SafariSince Safari uses libwebrtc, it will get most statistics out of the box. However, the binding at the WebKit layer needs some code to be written which creates some difference with libWebRTC changes that Safari does not notice. We observed this with the “trackIdentifier” property recently but there may be others. Apple seems rather reactive here.
Firefox used to spearhead the "spec" getStats implementation but has fallen behind and lacks several stats types (such as candidate-pair stats). This means workarounds like the one shown in this WebRTC sample are still required for very basic functionality. Statistics related to media quality are lacking even more.
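Because of these differences, applications tend to feature-detect rather than assume a given stats type exists. Here's a minimal sketch of that defensive pattern, assuming nothing beyond the standard getStats call:

```typescript
// Minimal sketch: don't assume a given stats type exists in every browser -
// collect what is there and fall back gracefully.
async function getSelectedCandidatePair(pc: RTCPeerConnection) {
  const report = await pc.getStats();
  let selected: any;
  report.forEach((stats) => {
    // Preferred path: transport stats point at the selected candidate pair
    if (stats.type === 'transport' && stats.selectedCandidatePairId) {
      selected = report.get(stats.selectedCandidatePairId);
    }
  });
  if (!selected) {
    // Fallback for browsers that expose this differently: look for a
    // candidate-pair flagged as selected, if such a flag is present at all.
    report.forEach((stats) => {
      if (stats.type === 'candidate-pair' && (stats as any).selected) {
        selected = stats;
      }
    });
  }
  return selected; // may still be undefined - the caller has to cope with that
}
```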
Keeping up the pace with WebRTC getStats changes
At testRTC, we're offering tools for the full lifecycle of WebRTC applications. These include testing and monitoring services. As such, we rely heavily on getStats.
Years ago, we had to implement the migration from legacy stats to spec-compliant stats.
Then came 2022 and with it the housekeeping changes by Google to the statistics found in getStats. It started with Chrome 107 and continues even today. With each such release, we need to get an experienced WebRTC developer to check, test and fix our code to make sure our services collect the statistics properly. All that is on top of the need to support more metrics that Google adds to Chrome in WebRTC getStats from time to time.
Our job is harder than most in this simply because we need to collect and support all the stats – the customer base we have is varied and we never really know which metrics they’d be interested in.
This task of keeping up with getStats has been a bit of a challenge in the last few months. That’s because in each release something else changes. Each step is reasonable. Needed. Minor. But it brings with it changes we need to do in our own planning and roadmap.
To others, such changes have brought with them breakages as well. At times the need to update and upgrade open source components or to fix their own code.
This is a good thing
It is important to state – the changes and work conducted here by Google are for the better.
Going for a spec compliant WebRTC getStats implementation means we have actual documentation that we expect to work. It also means interoperability with other browsers and components (assuming they strive to spec compliance as well).
Improvements in performance and polishing out best practices means better performance and code for WebRTC applications in general.
Removing deadweight and deprecated/unused statistics and similar components means a smaller codebase with fewer edge cases and "things" to test.
This is what we want our WebRTC implementation to be and look like.
The fact that we need to undergo this ordeal is the price we pay for it. It would have been a wee bit nicer if Google laid out their plans for such changes well in advance (not through sporadic PSAs but rather as a kind of public roadmap). This would enable better planning for those running such applications. But it is what it is. And frankly – we get what we pay for (=free).
Chrome's WebRTC getStats implementation might not be the reason for bad metric values
Then there are bugs. Metrics you obtain from getStats that don't seem to reflect reality.
There are usually 3 reasons for that to happen:
A few things to remember here:
WebRTC is used by MANY inside browsers. Think billion(s) of people
It is adopted by thousands of applications developing directly and indirectly on top of it
Using statistics is standard practice to optimizing for media quality and most of the large WebRTC applications rely on it heavily already
Why should your application and use case be any different in trusting WebRTC getStats?
What can you do about WebRTC getStats changes?
Nothing.
That said, I do have a few suggestions for you:
The post Can I trust WebRTC getStats accuracy? appeared first on BlogGeek.me.
WebRTC is the best media engine out there. And it has nothing to do with its performance…
I’ve been part of the video conferencing industry throughout the first decade of the 21st century and a bit of the 2nd decade as well. The driving force at the time was resolution and frame rate. There was an arms race among vendors as to who provides higher resolutions and frame rates in their room system. A lot of the ethos at the time was the implementation of proprietary media engines that were built for the task at hand. Optimizing and fine tuning them for media quality was considered a core competency.
Fast forward to 2023, what should be the mindset and ethos today?
This is a kind of a continuation to my article on the WebRTC predictions for 2023
Table of contents
In the context of VoIP and WebRTC, a media engine is a component that takes care of media processing. Simplifying it, a media engine implementation does something like this:
The media engine also deals with improving voice and video – things such as echo cancellation, noise suppression, packet loss concealment, background blurring, etc.
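A small part of that media engine work is exposed to web developers as simple knobs. For example, here's a minimal sketch of asking the browser's media engine to apply its built-in audio processing when capturing a microphone:

```typescript
// Minimal sketch: the browser's media engine exposes some of its audio
// processing (echo cancellation, noise suppression, gain control) as
// getUserMedia constraints.
async function captureProcessedAudio(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  });
}
```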
WebRTC (and libWebRTC) as a media engine
One of the descriptions of WebRTC that I love is that WebRTC is a media engine with a JavaScript API on top.
Google’s implementation of WebRTC is libWebRTC. Originally, it came from its acquisition of GIPS (Global IP Solutions) – a company that licensed their proprietary media engine to VoIP developers. Google took that library, sprinkled the WebRTC API definition on top of it and integrated it with their Chrome browser.
10 years ago, there were other media engines as well. Most large vendors built and maintained their own media engine – especially if their market was video conferencing.
WebRTC, being a standard on both the network and interface layers, with libWebRTC being an open source implementation of it (one that is maintained by Google AND integrated inside the most popular web browser) – became the best media engine out there practically overnight (or at least within 10 years and through a pandemic).
Joining a video call in your browser? Great! If you aren’t using Zoom, then 99.99% chance that what you are using is WebRTC, with the libWebRTC implementation.
Can a media engine other than WebRTC perform better?
Yes.
But what does that even mean?
What does performing better than WebRTC mean exactly?
libWebRTC isn't the best media engine out there. At least not in the one (or more) parameters you've decided to compare it against in your own proprietary alternative. But does it even matter?
Advantages of native (and proprietary) media engines
Building and maintaining your own native and proprietary media engine? Good for you! Let's see what advantages you gain by doing that:
Now that we're happy with building our own native and proprietary media engines, let's see what our challenges are:
We’re in the 3rd year of the WebRTC unbundling trend. This is still early days.
WebAssembly is here. It is powerful. And it is used more and more, with ever increasing usefulness.
WebTransport and WebCodecs are still great experiments – usable mostly for proof of concepts or early implementations. Using these to power a full fledged media engine that doesn’t make use of WebRTC is still a challenge.
Not all browsers support these interfaces, and those that do still have instabilities and a lot of optimization work to pour into them.
Using these is a long term investment that won’t offer a usable solution for 2023.
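To get a feel for what such an unbundled media engine entails, here's a minimal, hedged sketch of configuring a WebCodecs video encoder – just one of the building blocks you'd need to wire up yourself (the codec string, resolution and bitrate are arbitrary choices for the example):

```typescript
// Minimal sketch: a WebCodecs VideoEncoder - one of the unbundled pieces a
// custom media engine has to orchestrate on its own (packetization, congestion
// control, jitter buffering etc. are all left to the application).
function createEncoder(onChunk: (chunk: EncodedVideoChunk) => void): VideoEncoder {
  const encoder = new VideoEncoder({
    output: (chunk) => onChunk(chunk), // hand encoded chunks to your own transport
    error: (e) => console.error('encode error', e),
  });
  encoder.configure({
    codec: 'vp8',       // arbitrary choice for the example
    width: 1280,
    height: 720,
    bitrate: 1_000_000, // 1 Mbps - tuning this is your job now, not libWebRTC's
    framerate: 30,
  });
  return encoder;
}
```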
Why would I choose WebRTC as my media engine every day of the week?
Going to use your own native and proprietary media engine implementation? Good for you!
But do you need browser support in your application? Are these 5% of the user base or interactions or is it more like 50% or more?
Are you looking to make use of open source media servers and components? If so, then are these available for your proprietary implementation or will it be easier to just use ones that support… WebRTC!
Assuming you need browser support for your application and that said browser support isn’t there just as another unused feature to win a customer deal (and then lay forgotten somewhere), then you should just use WebRTC.
Why?
Because at the end of the day, that’s what browsers have available for you.
The post Can a native media engine beat WebRTC’s performance? appeared first on BlogGeek.me.
Here are the WebRTC predictions and trends you should expect in 2023. It is more of the same, but with nuanced differences.
As we’re starting 2023, it is time to look back and then into the future, to understand where we are and where we are headed with WebRTC. This year, things are getting somewhat trickier here:
Oh, and did I mention that I changed a lot in my own work-life? I am now Chief Product Officer at Spearline, dealing with the larger picture of testing and monitoring communication networks. Life is full of surprises
There’s lots to cover, so let’s start.
Table of contents
Before I dive into the predictions, it is important to know where we stand. We'll do this by looking at 3 different layers:
Let’s start with the technology itself
The era of differentiation
We are well into the era of differentiation:
This started with Google unbundling WebRTC in the browser, starting to offer pieces of it as separate future W3C standards as well as opening up more access to lower levels of the stack. In the past year we've seen growing use of these capabilities outside of Google, both in experimentation and in production.
2021 brought with it background blurring and replacement in the browser to the masses.
In 2022 we’ve seen proprietary codecs and noise suppression finding a solid home in WebRTC applications and technologies using these capabilities. Representative commercial examples of this are Dolby Voice proprietary codec and Twilio’s Krisp partnership on noise cancellation.
If this is hinting at anything, it is that we're going to see more of these moving forward, as vendors try to differentiate further. The only thing slowing this trend down is the current market recession.
Peak WebRTC
The pandemic that has raised all boats is all but over.
China is opening up, with or without another COVID wave. Many have shifted to hybrid work. Others are now communicating via video sessions a lot more than they used to.
Zoom is seen as the poster child of the pandemic. If you overlay its stock price with WebRTC usage in Chrome, you get this interesting chart:
WebRTC is still 3-4 times bigger in use than it used to be prior to the pandemic. That said, throughout 2022 we’ve seen consistent decrease in use of WebRTC. This is likely to continue into 2023.
My guess/prediction is that we will stay at around 3 times the use we had at the beginning of 2020.
libWebRTC dominance
libWebRTC is still king of the hill when it comes to WebRTC client-side implementations.
Nothing comes close to it.
libWebRTC is Google’s implementation of WebRTC, and the one used across all browsers today. A monoculture.
For most projects, using libWebRTC as a starting point for a non-browser implementation is the way to go. In some niche use cases, other solutions can and should be considered. The main alternative in such cases is probably Pion today.
2022 has been mostly a year of optimizations and polishing for the libWebRTC implementation, continuing on Google’s focus in 2021. 2023 will look no different.
WebRTC Insights clients received an analysis of the contributors to the libWebRTC project throughout history as part of a recent issue tracker sent to them.
Let's try a quick Q&A here on libWebRTC:
Is there a competitive alternative to libWebRTC in WebRTC?
The most popular WebRTC implementation out there is libWebRTC.
It is also the most dominant since it got embedded in all modern browsers.
libWebRTC is well maintained and is undergoing consistent improvements and optimizations. No other WebRTC stack is getting the same level of investment.
This is not expected to change in the foreseeable future.
Why is Google investing in libWebRTC?
This isn’t about Google Meet. Google is monetizing the web via ads delivered on search conducted in browsers and smartphones. By placing more of our activities in browsers and on the web, Google can monetize more interactions – indirectly.
Then there’s Google Meet/Workspace, competing with Microsoft Office on enterprise productivity.
Commoditizing communications is Google's way of managing complementary technologies. Ben Thompson in his latest analysis of AI and the Big Five refers to Joel Spolsky's Strategy Letter V, which offers a great explanation of Google's approach and is a good segue to our next section on open source:
Open source is not exempt from the laws of gravity or economics. […] something is still going on which very few people in the open source world really understand: a lot of very large public companies, with responsibilities to maximize shareholder value, are investing a lot of money in supporting open source software, usually by paying large teams of programmers to work on it. And that’s what the principle of complements explains.
Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods. So:
Smart companies try to commoditize their products’ complements.
The state of WebRTC open source
Not much has changed since my analysis a year ago on WebRTC trends in 2022, where I looked at WebRTC open source projects.
Unsurprisingly, Janus, Jitsi, mediasoup and Pion still retain most of their founders and key figures. These are teams/individuals who are personally and emotionally invested in these projects, which is a good thing.
The challenge is that besides Janus, none of them offer any official support and custom development. For the rest, companies need to rely on in-house development or external outsourcing vendors and freelancers.
As this state hasn’t changed for a good few years, not much is expected to change in 2023.
The main difference or question mark can be put on the projects that are now indirectly owned by a business whose focus might be elsewhere:
The CPaaS landscape is changing and shifting where it comes to WebRTC.
We started seeing these shifts a couple of years ago, but it seems that change is accelerating in this space – something that is different from what is happening with WebRTC open source.
The perceived leaders in WebRTC CPaaS are still Twilio, Vonage and Agora. I have a feeling that by the end of 2023 this will change.
Let’s review the who’s who of WebRTC in CPaaS.
Twilio
No CPaaS list is complete without Twilio. I'll obviously start with them.
Twilio is continuing their trend from last year of going after the Customer Experience Platform market.
There was one big change that took place in 2022, where Twilio announced focusing on 4 pillars, instead of spreading all over. This was conveyed in Jeff Lawson’s open letter laying off 11% of their workforce. These focus areas are:
No word about WebRTC. Definitely no video in here.
The opposite has happened – Twilio Live, announced in 2021, is being shut down:
Interestingly, its migration guide is recommending Mux, a vendor that just launched a WebRTC video offering as well. Should Twilio customers using Programmable Video also migrate that part to Mux? One wonders
Vonage
Vonage has its hands full with Ericsson, who acquired them.
Not much has changed on their platform besides the introduction of background blurring and replacement.
As the honeymoon between Vonage and Ericsson will dissipate, along with the realization of a recession, it will be interesting to see what will happen to the Vonage Video APIs – will the level of investment there remain high or will it shrink?
Agora
Agora's stock tanked since its peak:
Our information there is more limited than that of Zoom simply because the Agora IPO took place only in 2020.
It got into a recent mud fight with Zoom over the quality of experience that their respective platforms offer.
Zoom
Zoom opted to go with the unbundled approach, using WebRTC only sparsely. For video, they are especially focused on building their own media stack replacing most of what WebRTC does. In the short term, such an approach isn't too productive. Longer run, who knows?
Zoom and APIs and CPaaS is a long affair by now. One which hasn’t worked out well enough for Zoom. Their browser story wasn’t tight enough until recently. This got them to go head to head with competition and commission a performance report pitting their Zoom Video SDK versus Vonage Video API, Agora, Twilio Programmable Video and Amazon Chime SDK.
This specific post is telling:
IaaS gone video CPaaS. That was in 2020. Both Microsoft Azure and Amazon AWS introduced their own video APIs.
Microsoft had the better story: Azure Communication Services. Uses the same infrastructure as Microsoft Teams. Being able (in the longer run) to connect directly to Microsoft Teams calls.
The network effect and infrastructure were always in their favor. That said, it doesn’t appear enough in discussions I have with developers building WebRTC applications.
There’s a lot of untapped potential here.
Amazon
I am starting to see the Amazon Chime SDK in more places. It seems that like Amazon Connect, after 3 years of being out there, it is getting the critical mass it needs to become "a thing" in the industry.
This is one to watch closely, especially if you are a video API vendor yourself…
Cloudflare (new entrants)
There's another IaaS vendor who is joining the party of Video APIs – Cloudflare.
Cloudflare started in 2021 with a managed TURN service. One that is still in private beta.
But they announced and launched two additional services in September 2022:
Both API offerings that are well-defined these days in the Video API or WebRTC CPaaS space.
Hopefully, they’ll move faster with these two than they had with their managed TURN service.
Mux (new entrants)
Mux, a vendor focused on video delivery via APIs, has joined the WebRTC market as well, offering their own Video APIs – Mux Real-Time Video. This is an interesting take, especially since their target audience is slightly different from that of developers who end up with CPaaS. It brings a fresh look and interpretation of the problem – just like the IaaS vendors and Zoom are.
The interesting part is that Twilio decided to refer their Twilio Live customers to Mux. If I were Mux, I’d mark every customer coming in from Twilio Live, making sure they get the best experience and support so that 6 months from now I can start talking to them about migrating away from Twilio Programmable Video.
SaaS as CPaaS, Embeddable & Prebuilt
Then there's the lowcode/nocode trend and how it manifests itself in CPaaS. I've written an ebook about it – Lowcode & Nocode in Communication APIs (sponsored by Daily, a known CPaaS vendor). In the past two years we've seen more and more CPaaS vendors offering lowcode and nocode solutions on top of their video APIs.
To that specific market/solution, we are seeing SaaS vendors heading as well – for some reason, everyone thinks that CPaaS is a great business.
The notable examples here are Whereby, a meetings platform that started offering Whereby Embedded, and Digital Samba, who started from a webinars platform and is now offering Digital Samba Embedded.
This part of the market will continue to evolve, with CPaaS vendors and others offering ever higher layers of abstraction.
How did I do with my 2022 WebRTC predictions?
We're done with the market overview. Time to move on to predictions.
I’ll start by looking at how I fared with my 2022 predictions of the upcoming trends…
This was a hit and miss thing (obviously).
Hitting the nail
There were three trends where I was spot-on.
#1 – Scale & performance
My bet at the time was that we would continue to see improvements in the scale and performance of WebRTC. This was definitely the case for 2022.
At the Kranky Geek event in November 2022, Google in their WebRTC annual update spent the time on quite a few items, but the first one of them was performance optimizations:
We will review this slide a few more times later on.
#2 – #newtech
This is the new technology trend, which was split a bit internally:
#4 – Live streaming
Live streaming continued to evolve in 2022:
This is where I got it wrong.
#3 – WebRTC infrastructure, hyperscaling and SD-WAN
Here, I thought we'd still be pondering whether Anycast and SD-WAN are important to WebRTC.
And then Subspace got shut down, and with it, a lot of the effort to push this story forward. It is sad, because I do think that striving to lower latencies and clearer networks is the way to go. This setback will delay such attempts by a few years.
#5 – 2D to Metaverse
Extremes and experiments to counter Zoom fatigue. I don't think that many new alternatives and suggestions were made in 2022 that we hadn't seen before.
Cloud media processing
This is something I haven’t seen coming. It can’t be considered a trend yet, but it is something to keep a close eye on.
The whole point of using SFUs in WebRTC is in order to reduce infrastructure costs in compute.
BUT…
Google started with doing noise suppression in the cloud for Google Meet a few years back. This means decoding and encoding audio in the cloud in an SFU architecture.
And now Google is doing the same for background replacement on low-end devices
Is that a one-time transitional thing, or will others follow suit?
WebRTC predictions for 2023
Time to look at my predictions for 2023. This is where I think we will see the most focus in WebRTC this year, and how it will shape up.
#1 – libWebRTC (and the future of WebRTC)
In libWebRTC we will see more of the same, with a few nuances.
Google’s WebRTC library is mature. It has all the bells and whistles expected of it. Here’s where we will see Google taking libWebRTC:
libWebRTC will maintain its leading and dominant position as the WebRTC stack of choice for client-side development. And Google will take it wherever THEY need it.
#2 – Machine learning and media processing
WebAssembly will continue to be a driving force in 2023 when it comes to WebRTC.
It will be used for media processing and in relatively the same places we see it used and experimented today – background replacement, noise suppression and proprietary codecs implementations.
We will also see it enabling more vendors to leave WebRTC's peer connection implementation behind and play around with media engines developed using WebAssembly and running on top of WebRTC data channels or WebTransport.
#3 – Voice before video (Lyra first, AV1 later)
This one is a bit of an overreach, but one I am willing to make.
Lyra, Google’s ML-based voice codec, will find its way into WebRTC before AV1 will. This isn’t in terms of availability, but in terms of adoption and popularity of use.
AV1 takes up too much CPU power and memory. This makes it usable only in high-end devices or devices with newer hardware (which is almost non-existent still). We have ways to go until AV1 can become a reality. Probably one or two more years.
Lyra is here. And it is improving in performance and quality. Microsoft’s Satin is breathing down Google’s neck. Something will have to happen here. And my bet is that this will happen in 2023.
The technology is most probably ready. The market is ready.
You can learn more about it from Philipp Hancke's session about voice codecs in WebRTC at the recent Kranky Geek event.
#4 – Observability
You can say I am biased. So be it.
Observability was always a real challenge with WebRTC applications. Its nature, due to many reasons (one of them being encryption), makes it hard to monitor using legacy tools and methodologies.
What we will see in 2023 is more interest in observability. We have more products in the market that use WebRTC. Contact centers are moving to the cloud. Many of the bigger vendors are in the process of shifting focus from SIP to WebRTC in their current deployments, and not just as a feature in their checklist.
This will bring with it the need for better tools to understand and figure out how WebRTC sessions behave – both in pre-production and in production.
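As a small taste of what such observability looks like in practice, here's a minimal sketch that polls getStats and derives a packet loss ratio from inbound-rtp stats – what you do with the numbers (thresholds, dashboards, alerts) is where the real work is:

```typescript
// Minimal sketch: poll getStats periodically and derive a couple of quality
// indicators from inbound-rtp stats. Reporting and thresholds are up to you.
function startStatsPolling(pc: RTCPeerConnection, intervalMs = 5000): number {
  return window.setInterval(async () => {
    const report = await pc.getStats();
    report.forEach((stats) => {
      if (stats.type === 'inbound-rtp') {
        const received = stats.packetsReceived ?? 0;
        const lost = stats.packetsLost ?? 0;
        const lossRatio = received + lost > 0 ? lost / (received + lost) : 0;
        // Ship these to your monitoring backend instead of logging them
        console.log(stats.kind, 'loss', lossRatio.toFixed(3), 'jitter', stats.jitter);
      }
    });
  }, intervalMs);
}
```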
And now it is time for some shameless self-promotion here –
Watch my session from Kranky Geek, where I discuss where observability of WebRTC statistics falls short (hint: troubleshooting)
Don’t forget to check out the WebRTC products we have at Spearline
#5 – M&As and shutdownsThis is an easy one to make in 2023.
We're in a recession. Some say it will get better by December; others say it will get worse and stay with us. Whoever is correct in estimating what will happen a year from now, one thing is quite apparent:
Companies are closing their pockets, downsizing and keeping to their core focus.
WebRTC is part of it, and as a relatively new technology, it might be hurt more than others. I don’t think this will be the case, simply because we’re also in transition towards hybrid work due to the pandemic we faced. These two will negate each other a bit.
The end though will be house cleaning of the industry itself:
This in itself puts a strain on developers who need to choose which CPaaS vendor to use – picking the wrong one may leave them stranded with the need to switch (think Twilio Live). They will go to the bigger, more known vendors. Which will lead to a vicious cycle, since the smaller vendors may not have the time to grow quickly enough – potential customers will be less willing to risk using them.
Preparing for a rocky year
Interesting times ahead.
2023 will shape up to be challenging.
On one hand, we have more of the same in a lot of areas. On the other hand, the current market state is causing a lot of instabilities that will cause some shifts in the market.
And that, without saying a word about generative AI and what that might mean to the market of WebRTC and communications moving forward.
The post WebRTC predictions for 2023 appeared first on BlogGeek.me.
New coturn project leads Gustavo Garcia and Pavel Punsky give an update on the popular TURN server project, what's new in STUN and TURN standards, and the roadmap for the project
The post coturn: No Time to Die – Q&A with new project leads appeared first on webrtcHacks.
Home assignments are coming to the next round of office hours for my WebRTC training courses for developers.
Around 6 years ago I launched the first WebRTC course here. Since then, that grew into its own separate website and multiple courses and bundles.
Next month, another round of office hours is about to begin. In each such round, there are live sessions where I teach something about WebRTC and then open the floor for general questions. That’s on top of all the recorded lessons, the chat widget and slack channel that are available.
In this round (starting February 6), I am experimenting with something new. This time, I will be adding home assignments…
The dynamics of office hours
The office hours are 10-12 lessons that take place on a weekly cadence in two separate time zones, to fit everyone.
In each I pick and choose a topic that is commonly discussed and try to untangle it from a slightly different angle than what you’ll be finding in the course itself. I then let people ask questions.
The office hours are semi-private. Usually with 2-6 participants each time. This gives the ability to really ask the questions you care about and need to deal with in your own WebRTC application.
Why home assignments?
As part of my new role as the Chief Product Officer at Spearline, I asked to enroll in a course – CPO Bootcamp (the best one if you're in Israel). It is grueling as hell but more importantly – highly useful and actionable.
One of the components in that bootcamp is home assignments. They are given every week, then they get checked and feedback is given. They make me think about the things I am doing at Spearline and how to improve and finetune our roadmap and strategy. I even share them with my own team – being able to delegate is great, but it is more about the shared brainpower.
As with anything else, when I see something that is so good, I try to figure out if and where I can make use of that idea.
Which brings me to the WebRTC courses home assignments.
Home assignments = implementation AND feedback
For me, home assignments fit best as part of the office hours.
Here’s what we’re going to do:
The assignments relate and are focused on your WebRTC application. Not to something unrelated. Their purpose is to make you think, revisit and evaluate the things you’ve done and decided.
They are also building upon one another, each touching a different aspect of the design and architecture.
In a way, this is a unique opportunity to get another pair of eyes (mine) looking at your set of requirements, architecture and decisions and offering a different viewpoint.
Getting the most out of the WebRTC courses
If you are planning to learn WebRTC, then now is the best time possible.
Those who have enrolled to the course in the last 12 months or have renewed their course subscription can join the office hours and take part in the home assignments.
Office hours will start February 6.
If you haven't enrolled yet, then you should. More information on how to enroll can be found on the WebRTC courses site.
The post WebRTC course home assignments are here appeared first on BlogGeek.me.
Kranky Geek 2022 follows our tradition of great curated content on WebRTC that is both timely and timeless. Here’s what we had this year.
Kranky Geek is the main event focusing on WebRTC. I’ve been doing it with Chris Koehncke and Chad Hart for many years now, with the help and assistance of Google along with various sponsors each time.
Like many, we’ve switched to an all virtual event since the pandemic started, and decided at least for this year to continue in the same format. This turned out well, since I had to go on a business trip to Ireland at the date of the event, and virtual meant I was still able to both host and speak at the event.
Kranky Geek is quite a grueling experience for the hosts. We curate the sessions, at times approaching those we want to speak, at other times telling the speakers what topics we think will fit best. We go over the draft slide decks and comment on them. We do dry runs during the week of the event with all speakers to make sure each session is top notch.
You won’t find much commercial content in a Kranky Geek event. What you will find is lots of best practices and suggestions based on the experience and the path taken by our great speakers.
To this year’s summary, Philipp Hancke did the commentary about the sessions themselves. If you are a WebRTC Insights subscriber, and would like to discuss the content and how it fits in your company, feel free to reach out to me to schedule a meeting.
If you are looking for the whole playlist, you can find it here. The videos have been embedded below to make it easier for you to watch.
Roundtable: The state of Open Source in WebRTC
Background blurring and light adjustment using MediaPipe.
Krisp SDK on the Web: noise suppression.
Where I speak and Philipp comments (I am kinda subjective on this session).
Recording and compositing video sessions.
FlexFEC and video.
It should be noted that without our sponsors, doing the Kranky Geek event would be impossible. When we set out to run these events, we had this in mind:
This requires sponsors to help with funding it. Each year we search for sponsors and end up with a few that are willing and happy to participate in this project of ours.
This year?
Check them out
A Kranky Geek 2023?
When we're planning and preparing for the event, it feels like this is going to be our last event. It isn't easy, and none of us in the Kranky Geek team are event planners by profession. The question arises after each such event – will we be doing another one?
Once this event was over, we started working on wrapping the event. Part of it was editing the content and uploading it to YouTube (which takes time).
Will there be another Kranky Geek event next year? Maybe
Will it be in person or virtual? Maybe
Until then, go check out our growing library of great WebRTC content: https://www.youtube.com/krankygeek
The post Kranky Geek WebRTC event summary 2022 appeared first on BlogGeek.me.
WebRTC comes with mandatory encryption, which enables privacy, but which type of privacy are you really looking for?
In the past, all the great stuff started in the enterprise and then trickled down to consumers. Now it is the other way around – features come to consumers first and from there find their way to enterprises.
Privacy is no different, but in enterprises it needs to be defined quite differently, making it a totally different kind of a feature.
This is where privacy vs privacy comes to play.
Table of contents
As a user, what do you mean when you say privacy?
That the data you generate is yours. Be it sensor related data (think GPS or heart rate). The conversations you have with people are not accessible to anyone else. The same for the photos you take.
Practically, you want no one other than you and those you explicitly share data with to have any access to that data. And that includes the services you use to generate and share that data.
Sending messages over Whatsapp or any other social media service? You probably want these messages to be encrypted on the go, so no one can sniff the network and read your messages. You also don’t want Whatsapp’s employees reading what you wrote.
Essentially, what you are looking for is E2EE – End-to-End Encryption. This means that any intermediary along the route of your communications, including the communication provider himself who is facilitating the session, won’t have the ability to read the content. Simply because it is encrypted using some encryption key that is known only to those on the session.
The enterprise version of privacy
Life for a consumer is simple. At least when compared to an enterprise.
In the enterprise you want this privacy thingy, but somehow you also want governance and the creation of some corporate knowledge base.
When a meeting takes place, should only the people in the meeting have access to it? Think about it. Should the people involved in that aspect of the business have access as well?
Let’s say we’re on a sales call with a customer. And then the sales rep on that call leaves and gets replaced with another one. Should the new sales rep have access to that call that took place and the decisions made in it?
Today, our CRM systems can connect directly to the corporate email and siphon any emails sent or received with certain customers into their account for recording and safekeeping. So we stay in sync with all conversations with that customer.
We may need to store certain conversations due to regulatory reasons. Or we might just want to transcribe them for later search – that internal company knowledge base repository.
There are also times when we’d like to use these conversations we’re having to improve performance. Similar to what Gong does to sales teams.
BUT
We don’t want others to have access to these meetings. In some cases, we don’t want the theoretical ability of the provider of the service to access these conversations – think of a Microsoft Teams session, Google Meet or a Zoom call that gets listened to by the employees of these companies.
Privacy in an enterprise looks different than for consumers. It is more granular and more structured, with different rules and permissions at different levels and layers.
WebRTC and privacy
Privacy is king in WebRTC, with a few caveats:
Why these caveats?
And why is privacy king in WebRTC? Because security is ingrained in WebRTC, which means you can use it to provide privacy conscious services.
Let's go over what privacy in WebRTC actually means:
WebRTC mandatory encryption (and security)
In WebRTC, all media is encrypted. You can't decide to send media "in the clear". And then the signaling itself is also encouraged to be encrypted, and for all intents and purposes – it is encrypted as well.
This means that if you send audio or video via WebRTC from one user to another or from one user to a media server – then that media is encrypted and can be played only by the recipient.
Someone looking at the bitstream “over the line” won’t be able to play it back or intervene with the content.
Note that a media server terminates the encryption here and is privy to what is being sent – it has access to the encryption keys. TURN servers don't have such access.
This mechanism of encryption isn’t optional – it is just there.
E2EE in WebRTC
If we increase the scope to group conversations, then we need E2EE – End-to-End Encryption.
This can be achieved on top of WebRTC using a mechanism known as insertable streams, which ends up as double encryption – one layer between the sender and the media server, and another between the sender and the receivers on the other end. That second layer of encryption is part of the application. WebRTC doesn't mandate it or even encourage it – it just enables you to implement it.
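Here's a rough sketch of what that second layer looks like with the Chromium flavor of insertable streams. The frame encryption itself is only a placeholder here – a real implementation needs key exchange, IVs and care around codec headers – and the API shape differs across browsers (others expose RTCRtpScriptTransform instead):

```typescript
// Rough sketch of the Chromium-flavored insertable streams API. Requires the
// RTCPeerConnection to be created with { encodedInsertableStreams: true }.
function addE2eeLayer(
  sender: RTCRtpSender,
  encryptFrame: (data: ArrayBuffer) => ArrayBuffer
): void {
  // createEncodedStreams() is Chromium-specific; other browsers use
  // RTCRtpScriptTransform running in a worker instead.
  const { readable, writable } = (sender as any).createEncodedStreams();
  const transform = new TransformStream({
    transform(encodedFrame, controller) {
      // Second layer of encryption, applied on top of the regular SRTP
      encodedFrame.data = encryptFrame(encodedFrame.data);
      controller.enqueue(encodedFrame);
    },
  });
  readable.pipeThrough(transform).pipeTo(writable);
}
```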
Deniability vs governance of communications in WebRTC
Here's where things can get tricky with WebRTC – it can be used to cater for both ends of the equation.
You can use WebRTC to obtain deniability.
WebRTC has a data channel that runs peer to peer. Using signaling servers to open up such connections to create a loose mesh network of peers means you can send private, encrypted messages from one user to another on that network without having any easy way to trace the communications – let alone to trace its metadata. That’s on the extreme scale of what can be achieved with WebRTC – a TOR/bittorrent-like network.
With the same methodology, I can get two users or even small groups to communicate directly, so that their media travels between them and them alone. Or I can employ E2EE on media servers and get privacy of the content of the communications from the infrastructure used to facilitate it.
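For reference, the data channel piece itself is tiny – the interesting part is the signaling and topology around it, which is deliberately left out of this sketch:

```typescript
// Minimal sketch: a peer-to-peer data channel. Once the offer/answer and ICE
// candidates are exchanged over your signaling channel (not shown), messages
// flow directly between the two peers.
function openDirectChannel(pc: RTCPeerConnection): RTCDataChannel {
  const channel = pc.createDataChannel('private-messages');
  channel.onopen = () => channel.send('hello, directly from peer to peer');
  channel.onmessage = (event) => console.log('received', event.data);
  return channel;
}
```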
You can use WebRTC to handle governance.
On the other side of the equation, you can use WebRTC and force all communications to go through media servers. Media servers which can then enforce policy, record media and provide governance. For some industries and verticals – that’s a mandatory requirement.
And you get these capabilities while keeping the communication encrypted over the internet.
Who cares?
With privacy, that's the biggest question. Who cares?
No one and everyone at the same time.
If you ask a person if he wants privacy the immediate answer is – yes!
And yet… Twitter still doesn’t offer E2EE on DM messages. And people use it.
Whatsapp added E2EE in 2016, when it already had a billion monthly active users. It added E2EE backups in 2021. It seems people wanted it, but not badly enough to switch to a more secure and private messaging system.
Here’s a screenshot from my own Whatsapp in one of the groups I have:
That weird message is an indication that a friend of mine has changed his security code. This usually means he re-installed Whatsapp or switched a phone I presume. I ignore these messages altogether, and I am assuming most people ignore these messages.
In the same way, companies look and strive for privacy and want the services they use to be private. But most of them want it only up to a point.
Does that mean privacy isn’t needed? No.
Does it mean we shouldn’t strive for privacy? No.
It just means that people value other things just as much or even more.
CPaaS, Video API and… privacy
When it comes to video APIs and CPaaS platforms, it feels like privacy is somewhat lagging behind.
Messaging platforms today mostly offer E2EE. UCaaS are and have been introducing E2EE to their chat services and video calls. Some are offering integration with third party KMS (Key Management Systems) so they don’t have access to the decryption keys to begin with.
CCaaS relies heavily on the telephony network, where, well, what privacy exactly? And they also like to record calls for “quality and training purposes” – which translates to using machine learning and providing governance.
Video CPaaS is somewhere in-between these days – it offers encryption on sessions because it uses WebRTC, which is encrypted by default. But anything going through the media server can usually be accessed by the Video APIs vendor itself. Very few have gone ahead and added E2EE capabilities as part of their solution.
The reasons for that? It is hard to offer E2EE, but it is even harder to offer it in a generic manner to fit multiple use cases. And on top of that, customers don’t necessarily care or will be willing to pay for it, while they will be willing to pay for features such as recording.
What next?
Here's the thing:
Everybody talks about privacy but nobody does anything about it
In the consumer space, we are moving to an E2EE world.
The enterprise space is glacially pacing towards that same goal.
Parallel to that though, machine learning and cloud media processing are shifting the balance back towards less privacy – at least less privacy from the vendor hosting the service.
Which is more important to the buyers of services? Privacy or governance? Deniability or machine learning?
The post WebRTC: Privacy or Privacy? Which one shall it be? appeared first on BlogGeek.me.
I interviewed mediasoup's co-founder, Iñaki Baz Castillo, about how the project got started, what makes it different, their recent Rust support, and how he maintains a developer community there despite the project's relative unapproachability. mediasoup was one of the second-generation Selective Forwarding Units (SFUs). This second generation emerged to incorporate different approaches or address different use cases a few years after the first generation of SFUs came to market. mediasoup was and is different. It is node.js-based, built as a library to be part of a server app, and incorporated the Object-oriented approaches used by ORTC – the alternative spec to WebRTC at the time. Today, mediasoup is a popular SFU choice among skilled WebRTC developers. mediasoup's low-level nature means this skill is required.
The post Revealing mediasoup’s core ingredients: Q&A with Iñaki Baz Castillo appeared first on webrtcHacks.
It is time to stop for a second and review what we’ve accomplished here with our WebRTC Insights in the past two years.
There are a few pet projects that I am doing with partners, and one of the prime partners in crime for me is Philipp Hancke. We’ve launched our successful WebRTC codelab and are now in the process of finalizing our second course together – Low-level WebRTC protocols.
Two years ago, we decided to start a service – WebRTC Insights – where we send out an email every two weeks about everything and anything that WebRTC developers need to be aware of. This includes bug reports, upcoming features, Chrome experiments, security issues and market trends.
All of this with the intent of empowering you and letting you focus on what is really important – your application. We take care of giving you the information you need quicker and in a form that is already processed.
Now, two years in, it is safe to say that this is a VERY useful tool for our subscribers.
“WebRTC insights might be the most important email you read every fortnight as a RTC / video engineer. It’s hard to keep tabs on what Google et al are doing with WebRTC while working on your product and the WebRTC Insights provides very specific and actionable items that help tremendously. We have been ahead countless times because of it. If you are serious about WebRTC you should definitely subscribe 100% worth it.”
— Saúl Ibarra Corretgé, Principal Software Engineer @ 8×8 (Jitsi)
How do we keep track of all the WebRTC changes?
Keeping track of all the changes in WebRTC is a pretty daunting task. Tsahi started WebRTC Weekly almost nine years ago and it has been the source of high-level information ever since. Philipp has closely worked with WebRTC at a more technical level for a decade too. We both had our routines for keeping notes and transforming them into something informative for our audience but joining forces (which we never expected after having strong arguments about whether XMPP was a great signaling protocol in the early days!) has yielded a surprising amount of synergy effects.
We start each Insights issue with a template. Whenever we find something that we think is interesting, we add a link and maybe a very brief comment to that template. Usually we chat about those too (as we have done for… almost a decade now). Then we move on, because both of us have day jobs that keep us busy.
Every two weeks we spend a couple of hours turning the “brain dump” into something that our audience understands. Philipp focuses on the technical bits while Tsahi focuses on the market. Then we review each other’s section, improve and exchange thoughts.
We did this before Insights already but putting a structure and a biweekly cadence to it has “professionalized” it. While it remains a side project for us, we now have the process in place.
WebRTC Insights by the numbers
We're not new to this. As this is our second year, we might as well compare the numbers today with those from year one of WebRTC Insights:
26 Insights issued this year with 447 issues & bugs, 151 PSAs, 11 security vulnerabilities, 146 market insights all totalling 239 pages. We’ve grown on all metrics besides security vulnerabilities.
WebRTC is still ever changing, but at least there are fewer security threats in it
Activity on libWebRTC has cooled down a bit in the last two years when it comes to the number of commits and people working on it:
After more than a decade that is a sign of maturity, the easy changes have already been done and all that is left is optimizations. The numbers we see for Insights roughly correlate with the amount of energy Google puts into the project. We are just glad we did not start it during the “hot phase” of 2016-2019.
Let’s dive into the categories, along with a few new initiatives we’ve taken this year as part of our WebRTC Insights service.
Bugs
Among the really useful feedback we have received was the suggestion to add a "component" or area the issue is in. This is useful for larger teams where one person may be digesting the biweekly email and routing items to a subteam with a particular focus such as audio, video or networking.
The other improvement is a visual hint whether a particular item is a bug, a regression, a feature or just something that is generally good to know:
In addition to that we classify it as “read, plan or act”. Of course we hope our subscribers read all the issues but some are more important than others.
PSAs & resources worth readingPublic service announcements or PSA are the main method Google’s WebRTC team uses to announce important changes on the discuss-webrtc mailing list. We track them and give some context why they are important or whether they are safe to ignore (which can happen for API changes where a PSA may be required by the release process.
We also look at important W3C changes in this section as well as other content that is too technical for the “market watch” section.
Experiments in WebRTC
Chrome's field trials for WebRTC are a good indicator of what large changes are rolling out which either carry some risk of subtle breaks or need A/B experimentation. Sometimes, those trials may explain behavior that only reproduces on some machines but not on others. We track the information from the chrome://version page over time which gives us a pretty good picture on what is going on:
In this example we saw the AV1 decoder switch from libaom to libdav1d over the course of several weeks.
WebRTC security alerts
This year we continued keeping track of WebRTC related CVEs in Chrome (totaling 11 new ones in the past year). For each one, we determine whether it only affects Chromium or whether it affects native WebRTC and needs to be cherry-picked into your own fork of libwebrtc if you use it that way.
To make it easier to track, we now keep a separate Security Tracker file that gets updated with new issues as they are found. This makes it easier to glance at all the security issues we’ve collected.
On top of that, when there’s a popular open source component that has its own security issues published, we tend to also indicate these, though not add them to the Security Tracker, so they aren’t even counted in our statistics.
WebRTC market guidance
Information overload. That's what all of us face these days with so much material that is out there on the Internet. On our end, we read a lot and try to make sense of it.
Part of that is taking what feels relevant to WebRTC and sharing it with our WebRTC Insights subscribers. It includes the reference to the article, along with our thoughts about it.
For product managers, this is their bread and butter in gleaning the bits and pieces of information they need to make educated decisions about roadmap and priorities.
For developers, this brings a bit more context than they are used to in their daily work – and is often outside of their immediate work and expertise.
Our purpose? Enrich your world about WebRTC and express some of the power plays and the shifts in the market that are taking place. So you know them well ahead of them happening in force.
Covering important events
We really enjoyed Meta's RTC@scale event. In terms of quality and technical depth it set a bar for the upcoming Kranky Geek event, which had been the gold standard so far.
However, the technical depth of the event was too intense for it to be digested in real-time. This meant Philipp sat down on a rainy Saturday and started rewatching the videos while keeping notes. And ended up watching each session multiple times since there were so many great points that needed or even demanded a bit more explanation. This turned into a nine page summary of the event, annotated with the timestamps in the video.
We decided to make this summary public because, while we thought it provided a ton of valuable lessons to our subscribers, Meta made the content freely available and so should we. And hey, we keep referencing this every other week.
This may have been a one-off but we still genuinely enjoyed it, so we might repeat the exercise… on a rainy Saturday!
WebRTC release notes interpretation
We started playing around with video release notes at the end of our first year, and quickly made it a part of the WebRTC Insights service.
Whenever Google publishes release notes for WebRTC, we publish our own video with a quick analysis of the release notes (and the release itself) for our Insights clients.
We go over the release answering 4 main questions:
Our intent here, as with anything else, is to reduce the amount of work our clients have to do figuring out WebRTC details.
We are also making these release notes videos publicly available, 3-4 versions back, so you can derive value from them. You can find them on YouTube:
https://www.youtube.com/watch?v=DQt_OQT4ZAo&list=PL7fuFATIj-PUtMVTQKpW_odTCO0_CPfXV
Be sure to subscribe to receive them once they get published freely to everyone.
Join the WebRTC experts
We are now headed into our third year of WebRTC Insights.
Our number of subscribers is growing. If you've got to this point, then the only question to ask is why aren't you already subscribed to WebRTC Insights if WebRTC interests you so much?
You can read more about the available plans for WebRTC Insights and if you have any questions – just contact Tsahi.
Oh – and you shouldn't take only our word for how great WebRTC Insights is – just see what our readers have to say about it:
“For any Service Provider or Apps who heavily relies on WebRTC, the WebRTC Insights offers great value. […] What I like most about the Insights is its bi-weekly cadence, which fits the rapid Chrome/WebRTC release cycle, and most of the mentions are actionable for us. With the recent Safari audio breakage, the Insights highlighted the problem timely and saved us a lot of troubleshooting effort.”
— Jim Fan, Engineering Director @ Dolby Laboratories
“As a service company specialized in WebRTC I think WebRTC Insights is really useful. It keeps us up to date about what is coming next, giving good ideas for projects and research. Also, receiving periodic insights is always a good excuse to stop what I am doing and find some time to go over the latest WebRTC updates in more detail. It is much easier to do when you get all summarized in a single document than on your own just googling and going through an overwhelming list of webrtc news, updates and bugs.”
— Alberto Gonzalez Trastoy, CTO @ WebRTC.ventures
Here's the summary of the first year of Insights if you're interested
The post Two years of WebRTC Insights appeared first on BlogGeek.me.
When developing with WebRTC, make sure you address the fact that many aspects are out of your control.
[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]
When you develop a WebRTC application, you need to take into consideration the sad truth that most of the things that are going to affect the media quality (and by extension the user experience) are out of your control.
To understand this, we first need to define who the lead actors are:
The main entities in WebRTC applications, taken from my presentation on testRTC
Your application
This is probably the only piece you do control in a WebRTC application.
The code and logic you write in the application has immediate effect over the media quality and connectivity.
Deciding on how group calls are architected for example –
All these are going to greatly change the experience and it is all up to you to decide.
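To make the "it is all up to you to decide" point a bit more concrete, here is a minimal sketch of one such application-level decision: whether to publish video with simulcast so an SFU can forward a different layer to each viewer. The rid names, bitrates and scaling factors below are illustrative placeholders, not recommendations:

```typescript
// An application-level decision you control: publish a video track with
// simulcast layers that an SFU can choose from per viewer.
// Assumes you already have an RTCPeerConnection and a local video track.
function publishWithSimulcast(pc: RTCPeerConnection, track: MediaStreamTrack) {
  pc.addTransceiver(track, {
    direction: "sendonly",
    sendEncodings: [
      // Placeholder values – tune these to your own architecture and use case.
      { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150_000 },
      { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500_000 },
      { rid: "f", maxBitrate: 1_500_000 },
    ],
  });
}
```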
The browsers
Web browsers are out of your control.
I’ll repeat that for effect:
Web browsers are out of your control.
You can’t call Google asking them to delay their Chrome release by a week so you can solve a critical bug you saw cropping up in their upcoming release. It. doesn’t. work. this. way.
Browsers have their own release cadence, and it is brutal. In many cases, it is way faster than what you are going to be able to manage – a release every month.
The problem isn't the fast pace itself. It is the fact that even now, more than 10 years after its announcement, WebRTC is still being changed and improved quite frequently:
All of these changes mean that your application might break when a new browser version goes out to your users. And as we said, you don’t control the roadmap or release schedule of the browser vendors.
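One way to reduce the blast radius of those changes is to feature-detect at runtime instead of assuming an API exists. The sketch below only illustrates the approach; the specific capabilities it probes for are examples I picked, not an exhaustive or required list:

```typescript
// Probe for WebRTC features instead of assuming they exist, so a browser
// update (or an older browser) degrades gracefully rather than throwing.
function probeWebrtcFeatures() {
  const hasSetCodecPreferences =
    typeof RTCRtpTransceiver.prototype.setCodecPreferences === "function";

  // Non-standard, Chrome-only API – exactly the kind of thing that can change.
  const hasEncodedStreams =
    typeof (RTCRtpSender.prototype as any).createEncodedStreams === "function";

  const videoCodecs = RTCRtpSender.getCapabilities("video")?.codecs ?? [];
  const av1Supported = videoCodecs.some(
    (c) => c.mimeType.toLowerCase() === "video/av1"
  );

  return { hasSetCodecPreferences, hasEncodedStreams, av1Supported };
}
```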
The network
You decide where to place your servers. But you don't get to decide what networks your users will be on.
I often get into talks with vendors who explain to me the weird places where they find their end users:
I am writing this article while sitting in the lobby of a dance studio on my laptop, tethered via WiFi to my smartphone’s cellular network (a long story). Users can be found in the most unexpected places and still want to get decent user experience.
With WebRTC being so sensitive to the network connection (think latency, jitter, packet loss and bandwidth), these are things you'll need to come to terms with.
In some cases, you can instruct your users to improve their connection. In others, you can only guide them. In still others, your best bet is to make do with what the user has.
Oh – and did I mention that the network's conditions are… dynamic? They tend to change throughout the duration of the session the users are on, so whatever you decide to do needs to accommodate such changes.
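Accommodating changing conditions usually starts with measuring them. A minimal sketch, assuming you poll getStats() on a timer and compare the cumulative counters between samples:

```typescript
// Derive rough packet loss and jitter for incoming audio from getStats().
// The packet counters are cumulative, so in practice you would sample
// periodically and look at deltas between samples, not absolute values.
async function sampleNetworkQuality(pc: RTCPeerConnection) {
  const report = await pc.getStats();
  let packetsLost = 0;
  let packetsReceived = 0;
  let jitterSec = 0;

  report.forEach((stats) => {
    if (stats.type === "inbound-rtp" && stats.kind === "audio") {
      packetsLost += stats.packetsLost ?? 0;
      packetsReceived += stats.packetsReceived ?? 0;
      jitterSec = stats.jitter ?? jitterSec;
    }
  });

  const total = packetsLost + packetsReceived;
  const lossPercent = total > 0 ? (100 * packetsLost) / total : 0;
  return { lossPercent, jitterMs: jitterSec * 1000 };
}
```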
The user's device
Is your user running on a supercomputer? Or a 2010 smartphone? Do you think that's going to make a difference in how they experience your WebRTC sessions?
WebRTC is a resource hog. It requires lots of CPU power to encode and decode media. Memory for the same purpose. It takes up bandwidth.
Your users don't care about all that. They just want to have a decent experience. Which means you will need to accommodate a vastly different range of devices. This leads to different application logic being selected based not only on the network conditions, but also on the performance of each and every user's device – without sacrificing the experience for others.
Sounds simple? It is. Until you need to implement it.
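To give a taste of what such an implementation might look like, here is a minimal sketch. It leans on a very crude device signal (hardwareConcurrency) plus the encoder's own quality limitation report; a real application would use richer heuristics and smarter policies:

```typescript
// If the device looks weak, or the encoder reports it is CPU-limited,
// scale the outgoing video down instead of letting the session degrade.
async function maybeReduceVideoLoad(sender: RTCRtpSender) {
  // Crude heuristic used here only for illustration.
  const weakDevice = (navigator.hardwareConcurrency ?? 4) <= 2;

  let cpuLimited = false;
  (await sender.getStats()).forEach((stats) => {
    if (stats.type === "outbound-rtp" && stats.qualityLimitationReason === "cpu") {
      cpuLimited = true;
    }
  });

  if (weakDevice || cpuLimited) {
    const params = sender.getParameters();
    for (const encoding of params.encodings) {
      encoding.scaleResolutionDownBy = Math.max(encoding.scaleResolutionDownBy ?? 1, 2);
    }
    await sender.setParameters(params);
  }
}
```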
How do you take back control of WebRTC?
The first step to gaining control of your WebRTC application and its lead actors is letting go.
Understand that you are not in control. And then embrace it and figure out how to make that into an advantage – after all – everyone is feeling these same pains.
Embracing them means for example:
The WebRTC Developer training courses touch on a lot of these issues while teaching you about WebRTC.
My WebRTC Scaling eBooks Bundle can assist you in figuring out some of the tools available to you when dealing with networks and devices.
This blog is chock-full of resources and articles that deal with these things. You just need to search for them and read.