Uncover the synergy between Programmable Video, Prebuilt, and marketplaces. Explore the role of video APIs in accelerating development.
Programmable Video is a known quantity. It is part of the CPaaS movement where, in this case, video APIs enable developers to build their applications faster. WebRTC is also a part of all this.
Prebuilt is another concept that is well defined but, unlike Programmable Video, is still reshaping itself. Prebuilt is about embedding the UX/UI component of the video interactions, and not just using an API – it makes the faster development of Programmable Video… well… faster.
Then there are/were marketplaces. When taken to the domain of Programmable Video, they take a slightly different shape still.
What’s there between Programmable Video, Prebuilt and marketplaces? Where are we headed with this? This is something I want to explore in this article.
This isn’t my first foray into the Prebuilt market. Even before WebRTC, I was fascinated by what a lowcode/nocode solution looks like for a Programmable Video offering.
At RADVISION, I was in charge of defining our cloud vision for developers. What we did was license protocol stacks to developers building their own voice and video communication applications. That was before CPaaS was called CPaaS. And before WebSocket was part of the web.
Later on, I wrote about embeddable solutions and then published an ebook on lowcode/nocode for video communications – and the Prebuilt solutions they prescribed for us. That ebook is still relevant today and can be downloaded freely.
Marketplaces 101
Let’s switch gears a bit. Before we head back to programmable video and lowcode/nocode solutions, I want to talk about a different topic: marketplaces.
When a vendor wants to build an ecosystem around his solution, one of the ways of doing that is to introduce a marketplace. The higher you go in the food chain, the more likely you are to find a marketplace as part of the complete offering.
I thought it would be best to explain what a marketplace is by looking at one. But just searching for an example gives you the gist of it. This is what I got for searching “AWS marketplace” on Google:
For me, a marketplace usually means:
The biggest marketplaces today are probably Apple’s App Store and Google Play for Android. There’s the Microsoft Store for Windows applications.
Then there are the marketplaces of all the big IaaS vendors (Amazon AWS, Microsoft Azure and Google Cloud).
Cloud contact center vendors? All the big ones have marketplaces (I just searched for NICE, Five9 and Genesys to confirm).
Zoom has their own App Marketplace.
To complete this part – marketplaces are there once a vendor is big enough and looking to encompass third party solutions and services as part of his own offering.
Oh, and if you don’t understand why this is here, then just think what a marketplace for a Programmable Video Prebuilt offering may look like and mean.
Lowcode/nocode in Programmable Video: The next generation
When I wrote the lowcode/nocode ebook, most of the market for Prebuilt revolved around CPaaS vendors who were just adding a UI layer on top – anywhere from a source code reference application to a higher level of abstraction with an API and documentation.
Since then, the market has evolved. We are now seeing vendors coming from a different origin story into the domain of CPaaS and Programmable Video, and these come with a different view of Prebuilt. Here’s how I recently explained the origin stories:
The vendors with a SaaS origin story started life with a full fledged video meetings application – UX/UI – the whole shebang. For them, going down the food chain towards Programmable Video meant their focus was Prebuilt first and then the rest of the low level APIs. As such, they brought with them some new qualities and capabilities not often found in Prebuilt Programmable Video solutions up to that point.
That brings us to the next generation of what Prebuilt is in Programmable Video, and how this market is evolving and shaping up towards the future.
A few notable examples
Here are a few notable examples of what is changing in the Prebuilt space, and how this is shaping what future Prebuilt solutions in the programmable video space are going to look like.
Supporting multiple languages
Moving from the API layer to the UX/UI layer brings with it a need to deal with different languages and internationalization.
That means that the text messages displayed on the screen and shared with the user need to be conveyed in different languages. Which ones? That depends on the vendor. Each vendor offers a different set of languages, usually based on the customer base it has.
This is more than just text translation. There’s changing the direction of the text for some languages (Arabic and Hebrew), number and date conventions, and the layout of the screen – placing text in a given area or on a specific button might require changing its size.
For a Prebuilt service, there’s also the ability (maybe?) of letting the customer make changes to the text being displayed, and that needs to be done – again – in multiple languages.
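To make the moving parts concrete, here is a minimal sketch of the three concerns mentioned above: text direction, date conventions, and customer overrides of the displayed text. The locale table, the translated strings and the function names are all assumptions made up for illustration. A real product would lean on CLDR data (through a library such as Babel, for example) rather than a hand-written table.

```python
from datetime import date

# Hypothetical, hand-written locale data for illustration only.
RTL_LANGUAGES = {"ar", "he"}
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "de-DE": "%d.%m.%Y",
    "he-IL": "%d.%m.%Y",
}
DEFAULT_STRINGS = {
    "en-US": {"join": "Join meeting"},
    "de-DE": {"join": "Meeting beitreten"},
    "he-IL": {"join": "הצטרפות לפגישה"},
}
FALLBACK_LOCALE = "en-US"


def text_direction(locale: str) -> str:
    """Return "rtl" for right-to-left languages, "ltr" otherwise."""
    language = locale.split("-")[0]
    return "rtl" if language in RTL_LANGUAGES else "ltr"


def format_date(day: date, locale: str) -> str:
    """Format a date using the locale's pattern, falling back to en-US."""
    pattern = DATE_PATTERNS.get(locale, DATE_PATTERNS[FALLBACK_LOCALE])
    return day.strftime(pattern)


def label(key: str, locale: str, overrides: dict) -> str:
    """Resolve a UI string: customer override, then vendor default, then English."""
    for table in (overrides, DEFAULT_STRINGS):
        text = table.get(locale, {}).get(key)
        if text is not None:
            return text
    return DEFAULT_STRINGS[FALLBACK_LOCALE][key]
```

The lookup order in `label` is the design choice worth noting: a customer's own wording wins, the vendor's translation comes next, and English is the last resort, so a customer can override one string in one language without touching the rest.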
It may sound obvious, especially if you’ve built consumer applications. But for those focused on developing APIs for other developers, this is a new type of a headache they need to deal with.
Prebuilt in Programmable Video is now coming in multiple languages from some vendors. Others may need to follow.
A user construct
For the most part, CPaaS and Programmable Video vendors don’t think in terms of users. Mostly minutes and peers or devices. Got a meeting to connect to? You publish your streams and subscribe to streams of others. They aren’t users in the sense that they aren’t identified or known users. Their identity, if any, is decided and managed by the application on top.
Programmable Video offers no memory or notion of the users, their preferences or history.
Prebuilt? Sometimes…
Some Prebuilt solutions are starting to show signs of dealing with users – their identification and authentication. Sometimes even offering different permission types within meetings based on who they are.
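The difference between the two layers can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's actual API: the class names, the role names and the permission sets are hypothetical. The low-level `Room` only knows opaque peer ids and streams, while the Prebuilt-style layer on top adds roles and checks permissions before publishing.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """Low-level view: opaque peer ids and the streams they publish."""
    published: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, peer_id: str, stream_id: str) -> None:
        self.published.setdefault(peer_id, set()).add(stream_id)

    def subscribable_by(self, peer_id: str) -> set[str]:
        # A peer can subscribe to everything published by the other peers.
        return {
            stream
            for other, streams in self.published.items()
            if other != peer_id
            for stream in streams
        }


# Hypothetical roles and permissions a Prebuilt layer might define.
ROLE_PERMISSIONS = {
    "host": {"publish", "mute_others", "record"},
    "guest": {"publish"},
    "viewer": set(),
}


@dataclass
class PrebuiltMeeting:
    """Higher-level view: peers have roles, and roles gate actions."""
    room: Room = field(default_factory=Room)
    roles: dict[str, str] = field(default_factory=dict)

    def join(self, peer_id: str, role: str) -> None:
        self.roles[peer_id] = role

    def can(self, peer_id: str, action: str) -> bool:
        role = self.roles.get(peer_id, "viewer")
        return action in ROLE_PERMISSIONS.get(role, set())

    def publish(self, peer_id: str, stream_id: str) -> None:
        if not self.can(peer_id, "publish"):
            raise PermissionError(f"{peer_id} is not allowed to publish")
        self.room.publish(peer_id, stream_id)
```

Note that `Room` never learns who a peer is. Identity and permissions live entirely in the layer above it, which is the part a Prebuilt offering may or may not take on.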
I am not sure how this will hold moving forward, but it is something to track and contemplate if you are investing in a Prebuilt offering.
Calendar integration
Meetings are sometimes done on a set schedule. And that schedule means there’s a calendar involved.
Programmable Video doesn’t have calendars integrated into it, so adding external ones via partnerships might make sense – and it does for some of the Prebuilt vendors.
The ones adding such an integration into their Prebuilt solution are mostly those with a SaaS origin story. They see such requirements from their users and then translate it to their embedded offering as well.
Transcriptions, translations, summaries and everything AI
Like calendars and multiple languages, there are other meeting features that aren’t a “classic” match for Programmable Video APIs but make sense for Prebuilt. These include the gamut of handling the speech to text side of things – the ability to transcribe, translate, generate summaries, extract action items, etc.
All these are things that Prebuilt solutions for Programmable Video are introducing now. And again, it comes mostly from those with a SaaS origin story.
While these are getting better and more accurate due to LLM and generative AI, I think it is worth separating the two. Which leads me to the next thing – LLMs.
How will LLM, conversational AI and bots fit in
With the introduction of generative AI due to the concept of LLM, we’ve seen huge amounts of money poured into this space. This is geared and focused towards the creation of conversational AI solutions along with voice and video bots. How this will affect the programmable video space is yet to be seen.
Programmable Voice and Video goes generative AI
OpenAI just released a Realtime API for ChatGPT. This is WebSocket based and not easy enough to use for live interaction from browsers or end devices.
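To see why a WebSocket interface is awkward for live audio, consider what the client has to do before sending anything: slice the captured audio into chunks, base64-encode each one and wrap it in a JSON text frame. The sketch below shows only that framing step. The `input_audio_buffer.append` event name and its `audio` field reflect my understanding of OpenAI's Realtime API documentation at launch and should be verified against the current docs. The function name and the chunk size are assumptions (4,800 bytes is 100 ms of audio, assuming 24 kHz 16-bit mono PCM).

```python
import base64
import json


def audio_append_events(pcm: bytes, chunk_size: int = 4800) -> list[str]:
    """Split raw PCM audio into JSON text frames ready to send over a WebSocket."""
    events = []
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        events.append(json.dumps({
            # Event name assumed from the Realtime API docs; verify before use.
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events
```

Everything a media stack normally handles (capture, pacing, jitter, packet loss, echo cancellation) sits outside this function and is left to the application, which is the gap the vendors are stepping into.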
This left a kind of a gap in the market, which a lot of CPaaS vendors and Programmable Video vendors have rushed to fill with an interface of their own connecting to OpenAI’s Realtime API. We’re tracking these as part of the WebRTC Insights service:
The reason for this rush is threefold:
What is missing here though is the fact that once OpenAI does release a decent realtime API that fixes the gaps (think WebRTC interface), where does that leave all the Programmable Voice (and Video) vendors for the 1:machine use case?
Will they need to compete head on with LLM technology vendors for developer mindshare (and pocket) or will they still be viewed as viable partners?
Prebuilt and generative AI
Would Prebuilt enable plugging machine intelligence instead of humans into these conversations, in an attempt to focus on specific industries and market niches? Or would it lean more towards interfacing with third parties who bring the machine intelligence piece from “elsewhere”?
More importantly to this article, how will that fit into the world of Prebuilt solutions? On the one hand, this can keep developers away from adopting Prebuilt approaches, as these may or may not be able to cater for the latest approach that comes along with generative AI. Using Prebuilt may be viewed as a way to stick with the best practice in the video conferencing domain. But we are at an inflection point where trying to figure out and understand what conversational AI really means and what it will look like in the future is practically like writing best practices from scratch. Keeping at the forefront here might mean skipping Prebuilt and needing to go at least one level lower in the abstraction stack.
On the other hand, going Prebuilt might mean having the ability and resources needed to figure out how to add conversational AI to such a solution, assuming it is flexible enough. But how does one know which Prebuilt solution is going to be flexible enough in a domain that is only now being defined?
And maybe, going Prebuilt might mean not needing to deal with this new technology front, and instead, having it provided by the Prebuilt vendor itself – at some (near) future point in time.
More questions here than answers.
The challenge of a niche focus
A word of caution though. Taking the strategy of Prebuilt means diving into a niche market.
If you are developing a Prebuilt offering, then know that not all businesses are going to need or align with your offering. Each has its own unique requirements, many of which you are unlikely to be able to cater for. It means knowing and understanding that your potential target market is smaller, but also likely different in nature than the traditional programmable video market.
For those looking for a solution, choosing a Prebuilt alternative means subscribing to the set of features and capabilities provided by that specific vendor. At its core, a Prebuilt offering is less generic and more opinionated. You essentially get what the vendor thinks makes sense. It might be the common sense best practices that the vendor baked into the solution, but that doesn’t mean that it fits your needs exactly. In some cases, using a more generic programmable video offering in the form of a video API might be the better option.
A hybrid approach
Some vendors have decided to enjoy both worlds. They do so by offering both a low level generic API while at the same time offering a higher level Prebuilt construct.
How they go about doing that is different and interesting. It is also explained in the video above. They might start with a low level API, adding a Prebuilt solution on top. Or rather start with a kind of a SaaS offering of video communications, later on creating a Prebuilt solution from it and further down the road introduce a lower level API for it as well.
As time goes by and the market matures, we will see more vendors taking up the hybrid approach.
We are seeing this today with CPaaS where quite a few vendors offer both a generic API and a drag and drop Flow/Studio interface.
What’s next
If you are into this domain and need assistance, be it in validating the work you are doing with your own APIs and lowcode/nocode solution or in deciding which vendor to work with for your application, reach out to me. I can help.
The post The future of Programmable Video: Prebuilt and marketplaces appeared first on BlogGeek.me.
Tuesday, December 10 @ 5PM CET / 11AM ET / 8AM PT / 16:00 UTC Join Chad Hart, Editor of webrtcHacks, for an analysis of WebRTC trends in GitHub, StackOverflow, and other open-source communities. Leveraging advanced quantitative analysis techniques, this talk examines millions of GitHub events and developer activity data to uncover key trends […]
The post Upcoming Livestream 10-Dec: 2024 WebRTC in Open Source Review appeared first on webrtcHacks.
Unlock the power of WebRTC in the era of generative AI. Explore the perfect partnership between these groundbreaking technologies.
I am working on my WebRTC for Business People update. Going through it, I saw that the slide I had depicting the evolution of WebRTC had to be updated and fit to today’s realities. These realities are… well… generative AI.
Here are some questions I want to cover this time
Let’s dive into it, shall we?
I had the above done about 5 years ago for the first time. Obviously, it had only the first 3 eras in it: Exploration, Growth and Differentiation –
As we are nearing the end of 2024, we are also closing the chapter on the Differentiation era and starting off the Generative AI one. The line isn’t as distinct as it was in the past, but it is there – you feel the difference in the energy inside companies today and in where they put their focus and resources. It is ALL about Generative AI.
Why Generative AI? Why now?
OpenAI introduced ChatGPT in November 2022, making LLMs (Large Language Models) popular. ChatGPT enabled users to write text prompts and have “the machine” reply back with answers that were human in nature. The initial adoption of ChatGPT was… instant – faster than anything we’ve ever seen before.
Source: Kyle Hailey
This validated the use of AI and Generative AI in a back and forth “prompted” conversation between a human and a machine. From there, the market exploded.
If you look at the WebRTC domain these days, it is kinda “boring”. We’ve had the adrenaline rush of the pandemic, with everyone working on scaling, optimization and getting to a 49-view magic squares layout. But now? Crickets. The use of WebRTC has gone drastically down after the pandemic. It is still a lot higher than in pre-pandemic times, but lower than at the pandemic peak. Companies’ usage went back down and their investments shrank with it. The world’s turmoil and instability aren’t helping here, and inflation is stifling investments as well.
So a new story was needed. One that would attract investment. LLMs and Generative AI became that story, powered by the popularity of OpenAI’s ChatGPT.
This is such a strong pull that I believe it is going to last for quite a few years, earning it an era of its own in my evolution of WebRTC view.
The need for speed: how GenAI and LLM fit so well with WebRTC
ChatGPT brought us prompting. You ask a question in text. You get a text answer back. A ping pong game. Conversations are somewhat like that, with a few distinct differences:
So there’s a race going on, where work is being invested everywhere in the Generative AI pipeline to reduce latency as much as possible. I touched on that when I wrote about Open AI, LLM and WebRTC a few months back.
Part of that pipeline is sending and receiving audio over the network, and that is best served today using WebRTC – WebRTC is low latency and available in web browsers. The interfaces vendors are designing for audio and LLM interactions aren’t the most optimized or the simplest to use for actual conversations, which is why there are many CPaaS vendors who are adding that layer on top today. I am not quite sure that this is the right approach, or what things will look like a few months out. So many things are currently being experimented with and decided.
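One way to reason about this latency race is a simple per-stage budget for a single conversational turn. The sketch below is illustrative only: the stage names and millisecond values are placeholders invented for the example, not measurements of any vendor or model, and the function names are hypothetical.

```python
def total_latency_ms(stages: dict[str, float]) -> float:
    """Sum the latency of every stage in one conversational turn."""
    return sum(stages.values())


def slowest_stage(stages: dict[str, float]) -> str:
    """Name the stage that contributes the most latency."""
    return max(stages, key=stages.get)


def headroom_ms(stages: dict[str, float], budget_ms: float) -> float:
    """Positive when the turn fits the budget, negative when it overshoots."""
    return budget_ms - total_latency_ms(stages)


# Placeholder numbers, made up purely to exercise the functions.
EXAMPLE_TURN = {
    "uplink_network": 50,
    "speech_to_text": 200,
    "llm_first_token": 300,
    "text_to_speech": 150,
    "downlink_network": 50,
}
```

With a breakdown like this, the transport legs are the ones WebRTC addresses, while the remaining stages belong to the speech and LLM vendors.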
What does that mean for WebRTC in 2025 and beyond
WebRTC has been in a kind of maintenance status for quite some time now. This isn’t going to change much in 2025. The most we will see is developers figuring out how to best fit Generative AI with WebRTC.
Some of the time this will be about the best integration points and APIs to use. At other times, it is going to be about minor tweaks to WebRTC itself and maybe even introducing a new API or two to make it easier for WebRTC to work with Generative AI.
More on what’s in store for us in 2025 in a few weeks’ time, once I actually sit down and work it out in a separate article.
I am here to help
If you are looking to figure out your own way with Generative AI and WebRTC, then contact me.
I am working on a brand new workshop titled “Generative AI and WebRTC” – you can register for the webinar now and reserve your spot.
The post Generative AI and WebRTC: The fourth era in the evolution of WebRTC appeared first on BlogGeek.me.
Stay informed about the latest trends and insights in WebRTC technology. Our unique service offers expert analysis and valuable information for anyone developing using WebRTC.
We are into our 5th year of WebRTC Insights. Along with Philipp Hancke, I’ve been doing this premium biweekly newsletter. Every two weeks, we send it out to our subscribers, covering everything and anything that WebRTC developers need to be aware of. This is used to guide developers with the things important to them. We include bug reports, upcoming features, Chrome experiments, security issues and market trends.
The purpose of it all? Letting developers and decision makers who develop with WebRTC focus on their own application, leaving a lot of the issues that might surprise them to us. We give them the insights they need before they get complaints from customers or get surprised by their competitors.
Each year Philipp asks me if this might be our last one, because, well, let’s face it – there are times when the newsletter is “only” 7 or 8 pages long without a lot of issues. The thing is, whatever is in there is important to someone. I myself took note of something Philipp indicated in issue #102 to be sure to integrate it into our testRTC products.
Why is WebRTC Insights so valuable to our clients?
It comes down to two key benefits:
We help engineers and product teams save time by quickly identifying WebRTC issues and market trends. Instead of spending hours searching the internet for clues or trying to piece together fragmented information, we deliver everything they need directly – often several days before their clients or management bring up the issue.
Beyond saving time, we help clients stay focused on what matters most. Whether it’s revisiting past issues, tracking security concerns, understanding Google’s ongoing experiments, or staying updated on areas where Google is investing, we make it easy for them to stay informed.
If I weren’t so humble, I’d say that for those truly dedicated to mastering WebRTC, we’re a force multiplier for their expertise.
WebRTC Insights by the numbers
Since this is the fourth year, you can also check out our past “year in review” posts:
This is what we’ve done in these 4 years:
26 Insights issued this year with 250 issues & bugs, 141 PSAs, 13 security vulnerabilities, 312 market insights all totaling 235 pages. We’re keeping ourselves busy making sure you can focus on your stuff.
We have covered well over a thousand issues and written close to 1,000 pages so far.
2024…
In the past year, we’ve seen quite a steep decline in the issues and bugs that were filed and that we talked about. From our peak of ~450 a year in 2022, to ~330 in 2023 and now 250 in 2024:
Year         Issues we reported on    Issues filed (libWebRTC / Chrome)
2020-2021    331                      658 / 579
2021-2022    447                      549 / 639
2022-2023    329                      515 / 557
2023-2024    250                      361 / 420
This correlates with the overall decline in the activity around libWebRTC which has dropped below 200 commits per month in the last year:
This is more visible by looking at the last three years:
The Google team working on WebRTC is now just keeping the lights on. While commit numbers stayed roughly the same, external contributions are now approximately 30% of the total commits. There’s little in the way of innovation and creativity. Most of the work is now technocratic maintenance, if we were to use boring slur words…
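The decline in the issues we reported on can be put as a year-over-year percentage. The short sketch below uses only the “issues we reported on” figures from the yearly breakdown above; the function name is just for illustration.

```python
# "Issues we reported on" per year, as listed in the yearly breakdown above.
REPORTED = {
    "2020-2021": 331,
    "2021-2022": 447,
    "2022-2023": 329,
    "2023-2024": 250,
}


def year_over_year(series: dict[str, int]) -> dict[str, float]:
    """Percentage change of each year relative to the one before it."""
    years = list(series)
    return {
        current: round((series[current] - series[previous]) / series[previous] * 100, 1)
        for previous, current in zip(years, years[1:])
    }


print(year_over_year(REPORTED))
# {'2021-2022': 35.0, '2022-2023': -26.4, '2023-2024': -24.0}
```

So after a 35% jump into the peak year, the count dropped by roughly a quarter in each of the two years that followed.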
The reality is that libWebRTC is mature and good enough. It is embedded inside Chrome, with over a billion installations, and any change in it has a wide range of effect on many applications and users. In the language of Werner Vogels, the CTO of AWS, the blast radius of a bug in libWebRTC can be rather big and impactful.
Let’s dive into the categories of our WebRTC Insights service, to figure out what we’ve had in our 4th year.
Bugs
In this section we track new issues filed and progress (in the form of code changes) for both libWebRTC and Chromium. We categorize the issues into regressions, for which developers need to take action, and insights and features, which inform developers about new capabilities or changes to existing behavior. We also assign a category such as “audio”, “video” or “bandwidth estimation” to make it easy for more specialized developers to read only about the issues affecting their area.
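The two-axis categorization described here (what kind of issue it is, and which area it touches) can be sketched as a small data model. This is a hypothetical illustration of the idea, not the tooling we actually use: the class names, function names and sample issue titles are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    REGRESSION = "regression"  # developers need to take action
    INSIGHT = "insight"        # informs about changes to existing behavior
    FEATURE = "feature"        # informs about new capabilities


@dataclass(frozen=True)
class Issue:
    title: str
    kind: Kind
    area: str  # e.g. "audio", "video" or "bandwidth estimation"


def for_areas(issues: list[Issue], areas: set[str]) -> list[Issue]:
    """Keep only the issues in the areas a specialized reader cares about."""
    return [issue for issue in issues if issue.area in areas]


def needs_action(issues: list[Issue]) -> list[Issue]:
    """Regressions are the ones that call for action."""
    return [issue for issue in issues if issue.kind is Kind.REGRESSION]


# Made-up sample entries for illustration.
ISSUES = [
    Issue("H.264 packetization change breaks interop", Kind.REGRESSION, "video"),
    Issue("Corruption detection work in progress", Kind.FEATURE, "video"),
    Issue("Example audio device insight", Kind.INSIGHT, "audio"),
]
```

A video-focused developer would then read `needs_action(for_areas(ISSUES, {"video"}))` first and skim the rest.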
A good example of regressions this year were several regressions in the handling of H.264:
In a nutshell, relatively harmless and very reasonable changes to the way libWebRTC deals with H.264 packetization caused interop issues for services that use H.264 and rely on some of its more exotic features. And those changes made it all the way to Chrome stable which suggests a lack of testing in Beta and Canary versions.
We also track progress on feature work such as “corruption detection” and speculate on why Google is embarking on such projects:
Google migrating both Chromium and WebRTC from the Monorail issue tracker system to the more modern Buganizer caused us a little bit of a headache here.
PSAs & resources worth reading
In this section we track “public service announcements” on the discuss-webrtc mailing list, webrtc-related threads on the blink/chromium mailing list, W3C activity (where we often shake our heads) and highly technical blog posts which do not fit into the “market” category.
A good example of this is Google experimenting with a new way to put the device permissions into the page content which we noted in May, followed by seeing how Google Meet put this into action in November. The process for this is “open” but as a developer you need to be aware of what is possible and being experimented with by Google to keep up.
We also used to track libWebRTC release notes in this section but stopped sending those earlier this year when the migration from Monorail to Buganizer broke the tooling we had. Not many folks missed them so far.
Experiments in WebRTC
Chrome’s field trials for WebRTC are a good indicator of what large changes are rolling out which either carry some risk of subtle breaks or need A/B experimentation. Sometimes, those trials may explain behavior that only reproduces on some machines but not on others. We track the information from the chrome://version page over time which gives us a pretty good picture of what is going on. Most recently we used it to track how Google is experimenting with a change in getUserMedia which changes how the “ideal” deviceId constraint behaves:
See this issue for more information about the change. We also waved goodbye to the longest-lasting field trial which had been with us the entire four years, being enabled 100% and causing a different behavior in Chrome versus Chromium-based browsers not using Google’s field trial configuration such as Microsoft Edge:
WebRTC-VP8ConferenceTemporalLayers
It was removed (without the default value changing) in this commit. Which is great because it had side-effects on other codecs like H.264.
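Tracking field trials over time boils down to comparing snapshots. Here is a minimal sketch of that comparison, assuming each snapshot has already been reduced to a mapping of trial name to group. The function name, the group values and the second trial name are hypothetical; only WebRTC-VP8ConferenceTemporalLayers comes from the text above.

```python
def diff_trials(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two field-trial snapshots (trial name -> group)."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


# Hypothetical snapshots; group values are placeholders.
OLDER = {
    "WebRTC-VP8ConferenceTemporalLayers": "Enabled",
    "WebRTC-ExampleTrial": "Control",
}
NEWER = {
    "WebRTC-ExampleTrial": "Experiment",
}
```

Running `diff_trials(OLDER, NEWER)` on these snapshots reports the VP8 trial as removed and the example trial as changed, which is the kind of signal that prompts a closer look at the underlying commits.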
WebRTC security alerts
We continued tracking WebRTC-related security issues announced in the Chrome release blog. We had eight of them this year, all but one related to how Chromium manages the underlying WebRTC objects. The remaining one was a vulnerability in the dav1d decoder (as we predicted last year, codec implementations will get some more eyes on them).
WebRTC market guidance
What is happening in the world of WebRTC? Who is doing what? Why? When? Where?
We’re looking at the leading vendors, but also at the small startups.
There are probably 3 main areas we cover here:
From time to time, you’ll see us looking at call centers, security and privacy, governance, open source, etc. All with a view from the prism of WebRTC developers and with an attempt to find an insight – something actionable for you to do with that information.
The purpose of it all? For you to understand the moves in the market as well as the best practices that are being defined. Things you can use to think over your own strategy and tactics. Ways for you to leave your company’s echo chamber for a bit. All for the purpose of improving your product at the end of the day.
With our shift towards an ever maturing WebRTC market, the market insights section is growing as well. We expect this to happen in the coming year yet again.
Join the WebRTC experts
We are now headed into our fifth year of WebRTC Insights.
On one hand, there are fewer technical issues you will bump into. But those that you do bump into are going to be more important than ever. Why? Because the market is maturing and competition is growing.
So if you’re working with WebRTC and not subscribed to WebRTC Insights yet – you need to ask yourself why that is. And if you might be interested, then let me know – and I’ll share with you a sample issue of our insights, so you can see what you’ve been missing out on.
The post Four years of WebRTC Insights appeared first on BlogGeek.me.
Twilio Programmable Video is back. Twilio decided not to sunset this service. Here’s where their new focus lies and what it means to you and to the industry.
A year ago, Twilio announced sunsetting its Programmable Video service. Now, it is back from the dead, like a phoenix rising up from the ashes. Or is that going to be more like a dead walking zombie?
Here’s what I think happened and what it means – to CPaaS, Twilio and other vendors.
👉 Twilio is central to CPaaS, so they have a dedicated page of their own on my site – you can check it out here: Twilio
Let’s first look at two important aspects of the decision of Twilio to sunset their Twilio Programmable Video service. I did a couple of video recordings converting some of the visuals from my Video API report and placed them on YouTube (you should subscribe to my channel if you haven’t already).
The first one? A look at Twilio’s video services.
The second one? A look at how the market is going to figure this one out:
All in all, not good for the market.
Twilio Customers in the past year
To be frank, this started before the EOL announcement. If you look at the commits done to the Twilio Video SDK you see this picture:
Half a year prior to the announcement, the SDK got no commits whatsoever. And then? The official EOL came.
This last year has been tough on Twilio’s customers who use Programmable Video.
They had to migrate away from Twilio, with the need to do it by the end of 2024.
The time wasn’t long enough for many of the customers, and they likely complained to Twilio. The EOL (End Of Life) date moved to 2026, giving two more years for these customers.
The development work needed to switch and migrate away from Twilio might not have been huge, but it was not scheduled and came in as a critical requirement. In some cases, the customers didn’t have the engineering team in place for it, because external outsourcing vendors and freelancers originally developed the integration. In other cases, the migration required also dealing with mobile native applications, which is always more expensive and time consuming.
In one case, a vendor complained to me that they couldn’t replace the code in the appliances they had deployed within a year even if they wanted to – they work in a regulated industry and environment with native mobile applications.
Twilio set its customers up for a royal mess and a real headache here.
Zag: Twilio Programmable Video back from the dead
Then came the zag. Twilio decided to revert its decision and keep Twilio Programmable Video going. Here’s the statement/announcement from Twilio’s blog.
Here’s how they start it off:
“Today, we’re excited to announce that Twilio Video will remain as a product that we are committed to investing in and growing to best meet the needs of our customers. […]
Twilio Video will not be discontinued, and instead, we are investing in its development moving forward to continue to enhance customer engagement by enabling businesses to embed Video calling into their unique customer experiences.”
In their “why the change” section of the post, Twilio is trying to build a case for video (again). In it, they are making an effort to explain that they aren’t going to sunset video in the future, which is an important signal to potential new customers as well as existing ones. Their explanation revolves around the customer engagement use cases – this is important.
The “what to expect moving forward” section is the interesting part. It is built out of 4 bullets. Here’s what I think about them:
All in all, Twilio is planning on focusing predominantly on 1:1 customer engagement use cases and connecting them to Segment. At least that’s my reading of things.
Sunk costs or a hidden opportunity for customers
What about Twilio Programmable Video customers?
They had a year to plan and move away from the service to something else. Many of them have either finished their migration or are close to that point.
Should they now revert back to using Twilio? Stick with the competition?
Those who are in the middle of migration – should they stick to Twilio or keep investing resources in migrating away from Twilio?
These customers spent time and money on moving away. Should they view that as sunk costs or as an opportunity?
From discussions with a few Twilio customers, it seems that the answers are varied. In some cases, what they’ve done is built an abstraction running on top of two vendors – Twilio and the new vendor they’re migrating to. This way, they can keep Twilio as a backup as long as Twilio runs the service.
Now? They have the option to pick and choose which of the two alternatives to use.
This works well for services that do 1:1 meetings. Less so for group meetings.
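To make the "abstraction on top of two vendors" idea concrete, here is a minimal sketch of the selection step only. Everything in it is hypothetical: `VideoProvider`, `chooseProvider` and `stubProvider` are names I made up for illustration, not part of Twilio's SDK or anyone else's. A real adapter would wrap each vendor's SDK behind the same interface, and the actual join calls would be asynchronous.

```typescript
// Hypothetical abstraction over two video vendors. None of these names come
// from a real SDK; each real adapter would wrap one vendor's client library.
interface VideoProvider {
  readonly name: string;
  // Whether this vendor can be used right now (service running, health check OK).
  isAvailable(): boolean;
}

// Walk the providers in order of preference and return the first usable one.
function chooseProvider(preferenceOrder: VideoProvider[]): VideoProvider {
  const chosen = preferenceOrder.find((p) => p.isAvailable());
  if (!chosen) {
    throw new Error("No video provider is currently available");
  }
  return chosen;
}

// Stub adapter, for illustration and testing only.
function stubProvider(name: string, available: boolean): VideoProvider {
  return { name, isAvailable: () => available };
}

// The new vendor is preferred; Twilio stays around as the backup.
const preferred = chooseProvider([
  stubProvider("new-vendor", true),
  stubProvider("twilio", true),
]);
console.log(preferred.name); // "new-vendor"
```

Swapping the order of the array is all it takes to flip which vendor is primary. Keeping the interface this narrow is also a hint at why group meetings are harder: the more vendor-specific behavior a session relies on, the less of it fits behind a shared interface.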
In a way, Twilio reverting back adds another layer of headache and decisions that customers now need to go through (again).
Twilio’s challenges ahead
This leads us to the challenges Twilio is about to face.
The 3 leading ones are:
All 3 are solvable, but will take time, attention and commitment on behalf of Twilio.
Zoom: The biggest winner of all
The big winner this past year? Zoom.
Zoom had an SDK and a Programmable Video offering, but it was known and popularized for its UCaaS service. Twilio sunsetting Programmable Video while at the same time sending customers to Zoom gave Zoom a proof of quality from a third party in the space.
This cannot be taken back now. It rocketed the Zoom Video SDK onto the shortlist: potential buyers now need to review it, or explain why they aren’t trialing it.
All in all, a good thing for Zoom.
This change of heart by Twilio? Not going to affect Zoom.
What should you do
If you are already using Twilio and were migrating away –
There’s also always my Video API report to help you out (contact me for a discount on it or if you want some more specific consultation)
The post Twilio Programmable Video is back from the dead appeared first on BlogGeek.me.
Struggling with WebRTC POC or demo development? Follow these best practices to save time and increase the success of your project.
I get approached by a lot of startups and developers who start on the path to building WebRTC applications. Oftentimes, they reach out to me when they can’t get their POC (Proof of Concept) or demo to work properly.
For those who don’t want to go through paid consulting, here are some best practices that can save you time and can considerably increase the success rate of your project.
I don’t want to delve here too much on peer to peer type solutions. These require no media server and due to that are “easier” to develop into a nice demo. The services that use media servers are the ones that are often more beefy and are also the ones that fall into many challenging traps during a POC development.
Media requires the use of ephemeral ports that get allocated dynamically. It needs to negotiate connections. There are more moving parts that can break and fail on you.
All of the following sections here include best practices that you should read before going on to implement your WebRTC demo. Best to use them during your design and planning phases.
👉 An introduction to WebRTC media servers
Use CPaaS
Let’s start with the most important question of all. If you’ve decided to install and host media servers in AWS or other locations – are you sure this is an important part of your demo?
I’ll try to explain this question. A demo or a POC comes to prove a point. It can be something like “we want to validate the technical viability of the project” or “we wanted to have something up and running quickly to start getting real customers’ feedback”.
If what you want is to build an MVP (Minimum Viable Product) with the intent of attracting a few friendly customers, going to a VC for funding or just testing the waters before plunging in, then be sure to do that using CPaaS or a Programmable Video solution. These usually have usage-based pricing, so they won’t be expensive when you’re just starting out. But they will reduce a lot of the headaches in development and maintenance of the infrastructure – so they’re more than worth it.
Sometimes, what you will be after is a POC that seeks to answer the question “what does it mean to build this on our own”. Not only due to costs but mainly due to the uniqueness of the requirements desired – these may include the need to run in a closed network, connect to certain restricted components, etc. Here, having the POC not use CPaaS and rely on open source self hosted components will make perfect sense.
First have the “official” media server demo work
Decided not to use CPaaS? Picked a few open source media servers and components that you’ll be using?
Make sure to install, run and validate the demo application of that open source media server.
You should do this because:
Using a 3rd party? Install and run its demo first.
Don’t. Use. Docker
Docker is great. Especially in production. Well… that’s what I’ve been told by DevOps people. It makes deploying easier. It is great for continuous integration. It is fairy dust on the code developers write.
But for WebRTC media servers? It is hell on earth to get configured properly for the first time. Too many ports need to be opened all over the place. Some TCP. Lots of them UDP. And if you miss the configuration – the media won’t get connected. Or it will. Sometimes. Which is worse.
My suggestion? Leave all the DevOps fairy dust for production. For your POC and demo? Go with operating systems on virtual machines or on bare metal. This will save you a lot of headaches by making sure things will fail less due to not having ports opened properly on your Docker configuration(s).
You don’t have time to waste when you’re developing that WebRTC POC.
Don’t do native. Go web
Remember that suggestion about doing the full session for your demo so you know the infrastructure is built properly? If you need native applications on mobile devices – don’t.
The easiest way to develop a demo for WebRTC would be by using a web browser for the client side. I’d go farther and say by using Chrome web browser. Ignore Firefox and Safari for the initial POC. Skip mobile – assume these are a lot of work but won’t validate anything architecturally. At least not for the majority of application types.
👉 Still need to go native and mobile? Here are your WebRTC mobile SDK alternatives
Use a 3rd party TURN service
Always always always configure TURN in your iceServers for the peer connections.
Your initial “hello world” moment is likely to take place on the local LAN or even on the same machine. But once you start placing the devices on different networks, things will start failing without TURN servers. To make sure you don’t get there, just have TURN configured.
And have it configured properly.
And don’t install and host your own TURN servers.
Just use a managed TURN service.
At this stage, the ones I’d pick for the task are either Twilio or Cloudflare. They are easy to start with.
You can always replace them with your own later without any vendor lock-in risk. But starting off with your own is too much work and hassle and will bring with it a slew of potential bugs and blockers that you just don’t need at this point in time.
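For reference, here is a minimal sketch of what "TURN configured in your iceServers" can look like. The host `turn.example.com`, the `TurnCredentials` shape and the `buildIceServers` helper are all placeholders of mine. A managed TURN service will hand you its own URLs and (usually short-lived) credentials, typically fetched through your backend. The `IceServer` interface is a local stand-in with the same fields as the browser's `RTCIceServer`, so the sketch compiles outside a browser.

```typescript
// Local stand-in for the browser's RTCIceServer (same field names).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical shape for what your backend returns after asking the
// managed TURN provider for credentials.
interface TurnCredentials {
  urls: string[];
  username: string;
  credential: string;
}

// Build the iceServers array and fail loudly if TURN is missing or incomplete,
// instead of finding out later when two devices sit on different networks.
function buildIceServers(turn: TurnCredentials): IceServer[] {
  const hasTurnUrl = turn.urls.some(
    (u) => u.startsWith("turn:") || u.startsWith("turns:"),
  );
  if (!hasTurnUrl) {
    throw new Error("No turn: or turns: URL configured");
  }
  if (!turn.username || !turn.credential) {
    throw new Error("TURN needs both a username and a credential");
  }
  return [
    { urls: turn.urls, username: turn.username, credential: turn.credential },
  ];
}

// Placeholder values, for illustration only.
const iceServers = buildIceServers({
  urls: [
    "turn:turn.example.com:3478?transport=udp",
    "turns:turn.example.com:443?transport=tcp",
  ],
  username: "placeholder-user",
  credential: "placeholder-secret",
});

// In the browser you would then do:
//   const pc = new RTCPeerConnection({ iceServers });
console.log(iceServers.length);
```

Because the provider details live in one small function, replacing the managed service with your own TURN servers later only means changing where the credentials come from.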
👉 More on NAT Traversal and TURN servers in WebRTC
Be very specific about your requirements (and “demo” them)
Don’t assume that connecting a single user to a meeting room in a demo application means you can connect 20 users into that meeting room.
Streaming a webcam to a viewer isn’t the same as streaming that same webcam to 100 viewers.
If you plan on doing a real proof of concept, be sure to define the exact media requirements you have and to implement them at the scale of a single session. Not doing so means you aren’t really validating anything in your architecture.
A 1:1 meeting uses a different architecture than a 4-way video meeting, which in turn uses a different architecture than a meeting with 20-50 participants, which is different once you think about 100 or 200 participants, which again looks different architecturally when you’re hitting 1,000-10,000 and then… you get the point on how to continue from here.
The same applies for things like using screen sharing, doing spatial audio, multiple video sharing, etc. Have all these as part of your POC. It can be clunky and kinda ugly, but it needs to be there. You must have an understanding of if and how it works – of what are the limits you are bound to hit with it.
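One way to keep yourself honest here is to write the single-session requirements down as data and compare them against what the POC has actually exercised. The sketch below is just that, a sketch: the field names and the `unvalidatedRequirements` helper are my own illustration, and your list of requirements will look different.

```typescript
// Hypothetical description of ONE session; field names are illustrative.
interface SessionRequirements {
  participants: number;
  screenSharing: boolean;
  spatialAudio: boolean;
  simultaneousVideoShares: number;
}

// Compare what the product needs with what the POC has really run in a
// single session, and list the gaps that are still unproven.
function unvalidatedRequirements(
  required: SessionRequirements,
  exercised: SessionRequirements,
): string[] {
  const gaps: string[] = [];
  if (exercised.participants < required.participants) {
    gaps.push(`participants: ${exercised.participants}/${required.participants}`);
  }
  if (required.screenSharing && !exercised.screenSharing) {
    gaps.push("screenSharing");
  }
  if (required.spatialAudio && !exercised.spatialAudio) {
    gaps.push("spatialAudio");
  }
  if (exercised.simultaneousVideoShares < required.simultaneousVideoShares) {
    gaps.push(
      `simultaneousVideoShares: ${exercised.simultaneousVideoShares}/${required.simultaneousVideoShares}`,
    );
  }
  return gaps;
}

// Example: the product needs 20 people in a room, the demo only ever had 2.
const gaps = unvalidatedRequirements(
  { participants: 20, screenSharing: true, spatialAudio: false, simultaneousVideoShares: 2 },
  { participants: 2, screenSharing: true, spatialAudio: false, simultaneousVideoShares: 1 },
);
console.log(gaps); // [ 'participants: 2/20', 'simultaneousVideoShares: 1/2' ]
```

An empty list doesn't mean you're production ready. It only means the POC has touched every requirement at the session size you care about.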
For the larger and more complex applications, be sure you know all of the suggestions in this article before coming to read it. If you don’t, then you should beef up your understanding and experience with WebRTC infrastructure and architecture…
Got a POC? Build it to scale for that single session you’re aiming for. I won’t care if you can do 2 of these in parallel or 1,000. That’s also important, but can wait for later stages.
👉 More on scaling WebRTC meeting sizes
One step at a time
Setting up a WebRTC POC is a daunting task. There are multiple moving parts in there, each with its own quirks. If one thing goes wrong, nothing works.
This is true for all development projects, but it is a lot more relevant and apparent in WebRTC development projects. When you start these exploration steps with putting up a POC or a demo, there is a lot to get done right. Configurations, ports, servers, clients, communication channels.
Taking multiple installation or configuration steps at once will likely end up with a failure due to a bug in one of these steps. Tracing back to figure out which change caused the failure will take quite some time, leading to delays and frustrations. Better to take one step at a time, validating each time that the step taken worked as expected.
I learned that the hard way at the age of 22, while being the lead integrator of an important project the company I worked for had with Cisco and HP. I blamed an issue we had with our VoIP implementation, one that lost us a full week, on a change that HP made. It ended up being me… doing two steps instead of one. But that’s a story for another time.
Know your tooling
If you don’t know what webrtc-internals is and haven’t used dump-importer then you’re doing it wrong.
Not using these tools means that when things go wrong (and they will), you’re going to be totally blind as to why. These aren’t perfect tools, but they give you a lot of power and visibility that you wouldn’t have otherwise.
Here’s how you download a webrtc internals file:
You’ll need to do that if you want to view the results on fippo’s webrtc-dump-importer.
And if you’re serious about it, then you can read a bit about what the WebRTC statistics there really mean.
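As one small example of reading those statistics: many of the values are cumulative counters, so what you usually care about is the change between two samples. The sketch below computes an incoming bitrate from two inbound-rtp samples. The `bytesReceived` and `timestamp` fields exist on inbound-rtp stats entries; the `InboundRtpSample` type and the `bitrateKbps` helper are my own minimal stand-ins, not part of any library.

```typescript
// Only the two fields this sketch needs from an inbound-rtp stats entry.
interface InboundRtpSample {
  timestamp: number; // milliseconds
  bytesReceived: number; // cumulative since the stream started
}

// Average incoming bitrate between two samples, in kilobits per second.
function bitrateKbps(prev: InboundRtpSample, curr: InboundRtpSample): number {
  const elapsedMs = curr.timestamp - prev.timestamp;
  if (elapsedMs <= 0) {
    return 0; // same or out-of-order samples: nothing meaningful to report
  }
  const bits = (curr.bytesReceived - prev.bytesReceived) * 8;
  return bits / elapsedMs; // bits per millisecond equals kilobits per second
}

// 125,000 bytes over one second is 1,000,000 bits per second.
console.log(
  bitrateKbps(
    { timestamp: 1000, bytesReceived: 0 },
    { timestamp: 2000, bytesReceived: 125000 },
  ),
); // 1000
```

In a browser you would take the two samples from successive `getStats()` calls on the peer connection, a second or so apart. This is the same kind of delta that the graphs in webrtc-internals plot for you.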
Now if you’re going to do this properly and with a budget, I can suggest using testRTC for both testing and monitoring.
Know more about WebRTC
Everything above will get you started. You’ll be able to get to a workable POC or demo. Is that fit for production? What will be missing there? Is the architecture selected the one that will work for you? How do you scale this properly?
You can read about it online or even ask ChatGPT as you go along. The thing is that a shallow understanding of WebRTC isn’t advisable here. Which is a nice segue to say that you should look at our WebRTC courses if you want to dig deeper into WebRTC and become skilled with using it.
The post Best practices for WebRTC POC/Demo development appeared first on BlogGeek.me.
Explore the world of video codecs and their significance in WebRTC. Understand the advantages and trade-offs of switching between different codec generations.
Technology grinds forward with endless improvements. I remember when I first came to video conferencing, over 20 years ago, the video codecs used were H.261, H.263 and H.263+ with all of its glorious variants. H.264 was starting to be discussed and deployed here and there.
Today? H.264 and VP8 are everywhere. We bump into VP9 in WebRTC applications and we talk about AV1.
What does it mean exactly to move from one video codec generation to another? What do we gain? What do we lose? This is what I want to cover in this article.
Don’t have time for my ramblings? This short video should have you mostly covered:
👉 I started recording these videos a few months back. If you like them, then don’t forget to like them 😉
The TL;DR:
A codec is a piece of software that compresses and decompresses data. A video codec consists of an encoder which compresses a raw video input and a decoder which decompresses the compressed bitstream of a video back to something that can be displayed.
👉 We are dealing here with lossy codecs: codecs that don’t preserve all of the data, but rather discard information, trying to stay as close to the original as possible while storing as little data as possible.
The way video codecs are defined is by their decoder:
Given a bitstream generated by a video encoder, the video codec specification indicates how to decompress that bitstream back into a viewable format.
What does that mean?
Video codecs require a lot of CPU and memory to operate. This means that in many cases, our preference would be to offload their job from the CPU to hardware acceleration. Most modern devices today have media acceleration components in the form of GPUs or other chipset components that are capable of bearing the brunt of this work. It is why mobile devices can shoot high quality videos with their internal camera for example.
Since video codecs are dictated by the specification of their decoder, defining and implementing hardware acceleration for video decoders is a lot easier than doing the same thing for video encoders. That’s because the decoders are deterministic.
For the video encoder, you need to start asking questions –
This leads us to the fact that in many cases and scenarios, hardware acceleration of video codecs isn’t suitable for WebRTC at all – they are added to devices so people can watch YouTube videos of cats or create their own TikTok videos. Both of these activities are asynchronous ones – we don’t care how long the process of encoding and decoding takes (we do, but not in the range of milliseconds of latency).
Up until a few years ago, most hardware acceleration out there didn’t work well for WebRTC and video conferencing applications. This started to change with the Covid pandemic, which caused a shift in priorities. Remote work and remote collaboration scenarios climbed the priorities list for device manufacturers and their hardware acceleration components.
Where does that leave us?
The end result? Another headache to deal with… and we didn’t even start to talk about codec generations.
New video codec generation = newer, more sophisticated tools
I mentioned the tools that are the basis of a video codec. The decoder knows how to read a bitstream based on these tools. The encoder picks and chooses which tools to use when.
When moving to a newer codec generation what usually happens is that the tools we had are getting more flexible and sophisticated, introducing new features and capabilities. And new tools are also added.
More tools and features mean the encoder now has more decisions to make when it compresses. This usually means the encoder needs to use more memory and CPU to get the job done if what we’re aiming for is better compression.
Switching from one video codec generation to another means we need the devices to be able to carry that additional resource load…
A few hard facts about video codecs
Here are a few things to remember when dealing with video codecs:
It is time to start looking at WebRTC and its video codecs. We will begin with the MTI video codecs – the Mandatory To Implement. This was a big debate back in the day. The standardization organizations couldn’t decide whether VP8 or H.264 should be the MTI codec.
To make a long story short – a decision was made that both are MTI.
What does this mean exactly?
These video codecs are rather comparable for their “price/performance”. There are differences though.
👉 If you’re contemplating which one to use, I’ve got a short free video course to guide you through this decision making process: H.264 or VP8 – What Shall it be?
The emergence of VP9 and rejection of HEVC
The descendants of VP8 and H.264 are VP9 and HEVC.
H.264 is a royalty bearing codec and so is HEVC. VP8 and VP9 are both royalty free codecs.
HEVC being newer and considerably more expensive made things harder for it to be adopted for something like WebRTC. That’s because WebRTC requires a large ecosystem of vendors and agreements around how things are done. With a video codec, not knowing who needs to pay the royalties stifles its adoption.
And here, should the ones paying be the chipset vendor? Device manufacturer? The browser vendor? The application developer? No easy answer, so no decision.
This is why HEVC ended up being left out of WebRTC for the time being.
VP9 was an easy decision in comparison.
Today, you can find VP9 in applications such as Google Meet and Jitsi Meet among many others who decided to go for this video codec generation and not stay in the VP8/H.264 generation.
The big promise of VP9 was its SVC (Scalable Video Coding) support.
Our brave new world of AV1
AV1 is our next gen of video codecs. The promise of a better world. Peace upon the earth. Well… no.
Just a fork in the road that puts the focus on a future that is mostly royalty free for video codecs (maybe).
What do we get from AV1 as a new video codec generation compared to VP9? Mainly what we got from VP9 compared to VP8. Better quality for the same bitrate at the price of CPU and memory.
Where VP9 brought us the promise of SVC, AV1 is bringing with it the promise of better screen sharing of text. Why? Because its compression tools are better equipped for text, something that was/is lacking in previous video codecs.
AV1 has most of the industry behind it. Somehow, at a magical moment in the past, they got together and came to the conclusion that a royalty free video codec would benefit everyone, creating the Alliance for Open Media and with it the AV1 specification. This gave the codec the push it needed to become the most dominant video coding technology of our near future.
For WebRTC, it marks the 3rd video codec generation that we can now use:
Here’s an update of what Meta is doing with AV1 on mobile from their RTC@Scale event earlier this year.
This is a start. And a good one. You see experiments taking place as well as first steps towards productizing it (think Google Meet and Jitsi Meet here among others) in the following areas:
First things first. If you’re going to use a video codec of a newer generation than what you currently have, then this is what you’ll need to decide:
Do you focus on keeping the same bitrate you had in the past, effectively increasing the media quality of the session? Or alternatively, are you going to lower the bitrate from where it was, reducing your bandwidth requirements?
Obviously, you can also pick anything in between the two, reducing the bitrate used a bit and increasing the quality a bit.
Starting to use another video codec though isn’t only about bitrate and quality. It is about understanding its tooling and availability as well:
There’s a lot more to be said about video codecs and how they get used in WebRTC.
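To make the availability part a bit more concrete, here is a minimal sketch of how an application might put a newer codec generation first in line while keeping the older ones as fallbacks. `preferCodecs` and `CodecCapability` are hypothetical names used for illustration; only the `mimeType` strings follow what browsers report.

```typescript
// Minimal shape of a codec entry; browsers report more fields than this.
interface CodecCapability {
  mimeType: string; // e.g. "video/VP8", "video/H264", "video/VP9", "video/AV1"
}

/**
 * Returns a reordered copy of `available`, with the codecs named in
 * `preferredMimeTypes` first (in that order) and everything else after them.
 * Codecs that the device does not list are never added.
 */
function preferCodecs<T extends CodecCapability>(
  available: T[],
  preferredMimeTypes: string[]
): T[] {
  const wanted = preferredMimeTypes.map((mime) => mime.toLowerCase());
  const rank = (codec: T): number => {
    const index = wanted.indexOf(codec.mimeType.toLowerCase());
    // Codecs that are not in the preference list go to the end.
    return index === -1 ? wanted.length : index;
  };
  // sort() is stable in modern engines, so entries with the same rank
  // keep the order in which the device listed them.
  return available.slice().sort((a, b) => rank(a) - rank(b));
}
```

In a browser, the input list would typically come from `RTCRtpReceiver.getCapabilities("video")` and the result would be handed to `RTCRtpTransceiver.setCodecPreferences()` before negotiation. Whether the preferred codec ends up with hardware acceleration on a given device is a separate question that this sketch doesn’t answer.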
For more, you can always enroll in my WebRTC courses.
The post WebRTC video codec generations: Moving from VP8 and H.264 to VP9 and AV1 appeared first on BlogGeek.me.
WebRTC’s peer connection includes a getStats method that provides a variety of low-level statistics. Basic apps don’t really need to worry about these stats but many more advanced WebRTC apps use getStats for passive monitoring and even to make active changes. Extracting meaning from the getStats data is not all that straightforward. Luckily return author […]
The post Power-up getStats for Client Monitoring appeared first on webrtcHacks.
Lip synchronization is a solved problem in WebRTC. That’s at least the case in the naive 1:1 sessions. The challenges start to mount once you hit multiparty architectures or when audio and video get generated/rendered separately.
Let’s dive into the world of lip synchronization, understand how it is implemented in WebRTC and in which use cases we need to deal with the headaches it brings with it.
Discover the fascinating world of lip synchronization technology and its impact on WebRTC applications.
When you watch a movie or any video clip for that matter on your device – be it a PC display, tablet, smartphone or television – the audio and video that gets played back at you gets lip synced. There’s no “combination” of audio and video. These are two separate data sets / files / streams that are associated with one another in a synchronized fashion.
When you play out an mp4 file for example, it is actually a container file of multiple media streams. Each decoded and played out independently, synchronized again by timing the playout.
This was a decision made long ago that enables more flexibility in encoding technologies – you can use different codecs for the audio and the video of the content, based on your needs and the type of content you have. It also makes more sense since the codecs and technologies for compressing audio and video are quite different from one another.
The RTP/RTCP solution to lip synchronization
When we’re dealing with WebRTC, we’re using SRTP as the protocol to send our media. SRTP is just the secure variant of RTP which is what I want to focus on here.
RTP is used to send media over the internet. RTCP acts as the control protocol for RTP and is tightly coupled with it.
The solution used for lip synchronization of RTP and RTCP was to rely on timestamps. To make sure we’re all confused though, the smart folks who conjured this solution up, decided to go with different types of timestamps and frequencies (it likely made them feel smart, though there’s probably a real reason I am not aware of that at least made sense at some point in the past).
We’re going to dive together into the charming world of RTP and NTP timestamps and see how together, we can lip sync audio and video in WebRTC.
RTP timestamp
RTP timestamp is like using “position: relative;” in CSS. We cannot use it to discern the absolute time a packet was sent (and we do not know the receiver’s clock in relation to ours).
What we can do with it is discern the time that has passed between one RTP timestamp and another.
The slide above is from my Low-level WebRTC protocols course in the RTP lesson. Whenever we send a packet of media over the internet in WebRTC, the RTP header for that packet (be it audio or video) has a timestamp field. This field has 32 bits of data in it (which means it can’t be absolute in a meaningful way – not enough bits).
WebRTC picks the starting point for the RTP timestamps randomly, and from there it increases the value based on the frequency of the codec. Why the frequency of the codec and not something saner like “milliseconds” or similar? Because.
For audio, we increment the RTP timestamp by 48,000 every second for the Opus voice codec. For video, we increment it by 90,000 every second.
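Here is a small sketch of that arithmetic: turning the difference between two RTP timestamps of the same stream into milliseconds. The function name `rtpElapsedMs` is hypothetical; the clock rates are the ones mentioned above, and the wraparound handling follows from the field being 32 bits wide.

```typescript
// Clock rates from the text: Opus ticks 48,000 times per second, video 90,000.
const OPUS_CLOCK_RATE = 48000;
const VIDEO_CLOCK_RATE = 90000;

// RTP timestamps are unsigned 32-bit values, so they wrap around at 2^32.
const RTP_TIMESTAMP_WRAP = 2 ** 32;

/** Milliseconds that passed between two RTP timestamps of the same stream. */
function rtpElapsedMs(earlier: number, later: number, clockRate: number): number {
  // Modular subtraction keeps the result correct across a wraparound.
  const ticks = (later - earlier + RTP_TIMESTAMP_WRAP) % RTP_TIMESTAMP_WRAP;
  return (ticks * 1000) / clockRate;
}
```

A difference of 90,000 ticks on a video stream is one second, while the same 90,000 ticks on an Opus stream is 1.875 seconds. That is why the two can’t be compared directly without knowing which clock rate each stream uses.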
The headache we’re left dealing with here?
We said RTP timestamp is relative? Then NTP timestamp is like using “position: absolute;” in CSS. It gives us the wallclock time. It is 64 bits of data, which means we don’t want to send it as often over the network.
Oh, and it covers 1900-2036 after which it wraps around (expect a few minor bugs a decade from now because of this). This is slightly different from the more common Unix 1970 startpoint timestamp.
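To show the difference between the two starting points, here is a minimal sketch that converts an NTP timestamp to Unix time in milliseconds. It assumes the 64 bits are handed over as two 32-bit halves (whole seconds and a binary fraction of a second) and that we are still before the 2036 wraparound. `ntpToUnixMs` is a hypothetical helper name.

```typescript
// Seconds between the NTP starting point (1900-01-01) and the Unix one (1970-01-01).
const NTP_TO_UNIX_OFFSET_SECONDS = 2208988800;

/** Converts the two 32-bit halves of an NTP timestamp to Unix milliseconds. */
function ntpToUnixMs(ntpSeconds: number, ntpFraction: number): number {
  // The lower 32 bits count units of 1 / 2^32 of a second.
  const fractionMs = (ntpFraction / 2 ** 32) * 1000;
  return (ntpSeconds - NTP_TO_UNIX_OFFSET_SECONDS) * 1000 + fractionMs;
}
```

The result can be passed to `new Date(...)` for display. This sketch does no handling of the wraparound itself.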
The slide above is from my Higher-level WebRTC protocols course in the Inside RTCP lesson.
You can see that when an RTCP SR block is sent over the network (let’s assume once every 5 seconds), then we get to learn about the NTP timestamp of the sender, as well as the RTP timestamp associated with it.
In a way, we can “sync” between any given RTP timestamp we bump into with the NTP/RTP timestamp pair we receive for that stream in a RTCP SR.
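As a rough sketch of that “sync”, the code below maps an RTP timestamp to the sender’s wallclock using the NTP/RTP pair from the latest RTCP SR, and then compares an audio frame with a video frame. The `SenderReport` shape and the function names are hypothetical, and the NTP value is assumed to be already converted to milliseconds.

```typescript
// The NTP/RTP timestamp pair learned from an RTCP SR for one stream.
interface SenderReport {
  ntpMs: number; // sender wallclock at the time of the report, in milliseconds
  rtpTimestamp: number; // RTP timestamp that corresponds to ntpMs
}

/** Sender wallclock time (in ms) that a given RTP timestamp corresponds to. */
function rtpToWallclockMs(rtpTimestamp: number, sr: SenderReport, clockRate: number): number {
  // "| 0" turns the difference into a signed 32-bit value, so timestamps a bit
  // before or after the report both work, also across a wraparound.
  const ticks = (rtpTimestamp - sr.rtpTimestamp) | 0;
  return sr.ntpMs + (ticks * 1000) / clockRate;
}

/** Positive result: the video frame was captured after the audio frame. */
function audioVideoOffsetMs(
  audioRtp: number, audioSr: SenderReport, audioClockRate: number,
  videoRtp: number, videoSr: SenderReport, videoClockRate: number
): number {
  return (
    rtpToWallclockMs(videoRtp, videoSr, videoClockRate) -
    rtpToWallclockMs(audioRtp, audioSr, audioClockRate)
  );
}
```

A receiver could use an offset like this to decide how long to hold back one stream so both play out together. A real implementation also has to account for jitter buffers and playout delay, which this sketch leaves out.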
What are we going to use this for?
Let’s sum this part up:
Easy peasy. Until it isn’t.
👉 RTP, RTCP and other protocols are covered in our WebRTC Protocols courses. If you want to dig deeper into WebRTC or just to upskill yourself, check out webrtccourse.com
When lip synchronization breaks in WebRTC
RTP/RTCP gives us the mechanism to maintain lip synchronization. And WebRTC already makes use of it. So why and how can WebRTC lose lip synchronization?
There are three main reasons for this to happen:
I’d like to tackle that from the perspective of the use cases. There are a few that are more prone than others to lip synchronization issues in WebRTC.
Group video conferences
In group video conferencing there are no lip synchronization issues. At least not if you design and develop it properly and make sure that you either use the SFU model or the MCU model.
Some implementations decide to decouple voice and video streams, handling them separately and in different architectural solutions:
The diagram above shows what that means. Take a voice conferencing vendor that decided to add video capabilities:
In such cases, I often hear the explanation of “this is quite synchronized. It only loses sync when the network is poor”. Well… when the network is poor is when users complain. And adding this to their list of complaints won’t help. Especially if you want to be on par with the competition.
💡 What to do in this case? Go all in for SFU or all in for MCU – at least when it comes to the avoidance of splitting the audio and video processing paths.
Cloud rendering
The other big architectural headache for lip synchronization is cloud rendering. This is when the actual audio and/or video gets rendered and not acquired from a camera/microphone on some browser or mobile device.
In cloud gaming, for example, a game gets played, processed and rendered on a server in the cloud. Since this isn’t done in the web browser, the WebRTC stack used there needs to be aware of the exact timing of the audio and video frames – prior to them being encoded. This information should then be translated to the NTP+RTP timestamps that WebRTC needs. Not too hard, but just another headache to deal with.
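The translation mentioned here can be sketched as follows, assuming the renderer can tell us the wallclock time of each frame before it gets encoded. `StreamClock` and `renderTimeToRtp` are hypothetical names; a real WebRTC stack will have its own way of accepting capture times.

```typescript
// Reference point for one outgoing stream.
interface StreamClock {
  clockRate: number; // 90000 for video, 48000 for Opus audio
  startWallclockMs: number; // wallclock time of the first rendered frame
  startRtp: number; // randomly picked RTP timestamp of that first frame
}

/** RTP timestamp to put on a frame that was rendered at `renderWallclockMs`. */
function renderTimeToRtp(renderWallclockMs: number, clock: StreamClock): number {
  const elapsedMs = renderWallclockMs - clock.startWallclockMs;
  const ticks = Math.round((elapsedMs * clock.clockRate) / 1000);
  // ">>> 0" keeps the value inside the unsigned 32-bit range of the RTP field.
  return (clock.startRtp + ticks) >>> 0;
}
```

As long as audio and video derive their timestamps from the same wallclock, the NTP/RTP pairs sent in the RTCP SRs let the receiver line the two streams up again.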
For many cases of cloud gaming, we might even prioritize latency over lip synchronization, playing whatever we have when we get it as much as possible over having audio (or video) wait up for the other media type. That’s because in cloud games, a few milliseconds can be the difference between winning and game over.
When we’re dealing with our brave new world of conversational AI, now powered by LLM and WebRTC, then the video will usually follow the rendering of the audio, and might be done on a totally different machine. At the very least, it will occur using a different set of processes and algorithms.
💡 Here, it is critical to understand and figure out how to handle the NTP and RTP timestamps to get proper lip synchronization.
Latency and peripherals (and their effect on lip synchronization)
Something I learned a bit later in my life when dealing with video conferencing is that the devices you use (the peripherals) have their own built in latency.
The sad thing here is that there’s NOTHING you can do about it. Remember that this is the user’s display or headset we’re talking about – you can’t tell them to buy something else.
On top of this, you have software device drivers that do noise reduction on the audio or add silly hats on the video (or replace the video altogether). These take their own sweet time to process the data and to add their own inherent latency into the whole media pipeline.
Device drivers on the operating system level should take care of this lag and this needs to be factored into your lip synchronization logic – otherwise, you are bound to get issues here.
Got lip synchronization issues in your WebRTC application?
Lip synchronization is one of these nasty things that can negatively impact the perception of media quality in WebRTC applications. Solving it requires reviewing the architecture, sniffing the network, and playing around with the code to figure out the root cause prior to doing any actual fixing.
I’ve assisted a few clients in this area over the years, trying together to figure out what went wrong and working out suitable solutions around this.
The post Lip synchronization and WebRTC applications appeared first on BlogGeek.me.
Explore the concept of WebRTC latency and its impact on real-time communication. Discover techniques to minimize latency and optimize your application.
WebRTC is about real time. Real time means low latency, low delay, low round trip – whatever metric you want to relate to (they are all roughly the same).
Time to look closer at latency and how you can reduce it in your WebRTC application.
Let’s do this one short and sweet:
Latency sometimes gets confused with round trip time. Let’s put things in order quickly here so we can move on:
Need more?
👉 I’ve written a longer form post on Cyara’s blog – What is Round-trip Time and How Does it Relate to Network Latency
👉 Round trip time (RTT) is one of the 7 top WebRTC video quality metrics
Latency isn’t good for your WebRTC health
When it comes to WebRTC and communications in general, latency isn’t a good thing. The higher it is, the worse off you are in terms of media quality and user experience.
That’s because interactivity requires low latency – it needs the ability to respond quickly to what is being communicated.
Here are a few “truths” to be aware of:
👉 One of the main things you should monitor and strive to lower is latency. This is usually done by looking at the round trip time metrics (which is what we can measure better than latency).
What are we measuring when it comes to latency?
When you say “latency” – what do you mean exactly?
Latency starts with defining what part of the session we are measuring.
And within that definition, there might be multiple pieces of processing in the pipeline that we’d want to measure individually. Usually we’d want to do that to decide where to focus our energies in optimizing and reducing the latency.
Here are two recent posts that talk about latency in the WebRTC-LLM space:
👉 You can decide to improve latency of the same use case, and take very different routes in how you end up doing that.
Different use cases deal with latency differently
Latency is tricky. There are certain physical limits we can’t overcome – the most notable one used as an example is the speed of light: trying to pass a message from one side of the globe to the other will take considerable milliseconds no matter what you do, even not accounting for the need to process the data along its route.
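To put a rough number on that physical limit, here is a back-of-the-envelope sketch. It assumes light in optical fiber travels at about two thirds of its speed in vacuum and that the cable follows the shortest path, which real networks never do. `minOneWayLatencyMs` is a hypothetical helper.

```typescript
const SPEED_OF_LIGHT_KM_PER_S = 299792; // in vacuum
const FIBER_SPEED_FACTOR = 2 / 3; // assumption: rough slowdown of light inside fiber

/** Lower bound for one-way latency over fiber, ignoring routing and processing. */
function minOneWayLatencyMs(distanceKm: number): number {
  return (distanceKm / (SPEED_OF_LIGHT_KM_PER_S * FIBER_SPEED_FACTOR)) * 1000;
}

// Half of the Earth's circumference is roughly 20,000 km.
const acrossTheGlobeMs = minOneWayLatencyMs(20000);
```

Under these assumptions, `acrossTheGlobeMs` comes out at about 100 milliseconds in one direction, so around 200 milliseconds of round trip before any router, server or codec has touched the data.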
Each use case or scenario has different ways to deal with these latencies. From defining what a low enough value is, through where in the processing pipeline to focus on optimizations, to the techniques to use to optimize latency.
Here are a few industries/markets where we can see how these requirements vary.
👉 Interested in the Programmable Video market, where vendors take care of your latency and use case? Check out my latest report: Video APIs: market status
Conferencing
Video conferencing has a set of unique challenges:
💡 Latency in conferencing? Below 200 milliseconds you’re doing great. 400 or 500 milliseconds is too high, but can be lived with if one must (though grudgingly).
Streaming
Streaming is more lenient than video conferencing. We’re used to seconds of latency for streaming. You click on Netflix to start a movie and it can take a goodly couple of seconds at times. Nagging? Yes. Something to cancel the service for? No.
That said, we are moving towards live streaming, where we need more interactivity. From auctions, to sports and betting, to webinars and other use cases. Here are a few of the challenges seen here:
💡 For live streaming? 500 milliseconds is great. 1-2 seconds is good, depending on the scenario.
Gaming
Gaming has a multitude of scenarios where WebRTC is used. What I want to focus on here is the one of having the game rendered by a cloud server and played “remotely” on a device.
The games here vary as well (which is critical). These can be casual games, board games (turn by turn), retro games, high end games, first person shooters, …
Often, these games have a high level of interaction that needs to be real time. Online gamers would pick an ISP, equipment and configuration that lowers their latency for games – just in order to get a bit more reaction time to improve their performance and score in the game. And this has nothing to do with rendering the whole game in the cloud – just about passing game state (which is smaller). Here’s an example of an article by CenturyLink for gamers on latency on their network. Lots of similar articles out there.
Cloud gaming, where the game gets rendered on the server in full and the video frames are sent via WebRTC over the network? That requires low latency to be able to play properly.
💡 In cloud gaming 50-60 milliseconds latency will be tolerable. Above that? Game over. Oh, and if you play against someone with 30 milliseconds? You’re still dead at 50 milliseconds. The lower the better at any number of milliseconds.
Conversational AI
Conversational AI is a hot topic these days. Voice bots, LLM, Generative AI. Lots of exciting new technologies. I’ve covered LLM and WebRTC recently, so I’ll skip the topic here.
Suffice to say – conversational AI requires the same latencies as conferencing, but brings with it a slew of new challenges by the added processing time needed in the media pipeline of the voice bot itself – the machine that needs to listen and then generate responses.
I know it isn’t a fair comparison to latencies in conferencing, because there we don’t add in the human participant’s response time, or even the time it takes them to understand what is being sent their way. Still, at the moment, the response time of most voice bots is too slow for high levels of interaction.
💡 In conversational AI, the industry is striving to reach sub 500 milliseconds latencies. Being able to get to 200-300 milliseconds will be a dream come true.
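The numbers from the four use cases above can be captured as a small lookup for monitoring dashboards or alerts. This is only a sketch: the `greatMs`/`tolerableMs` split is my reading of the figures above, and for cloud gaming the text only gives a tolerable range of 50-60 milliseconds, so using its two ends is an assumption.

```typescript
type UseCase = "conferencing" | "liveStreaming" | "cloudGaming" | "conversationalAI";

interface LatencyBudget {
  greatMs: number; // at or below this, latency is where we want it
  tolerableMs: number; // above this, users will feel it
}

const LATENCY_BUDGETS: Record<UseCase, LatencyBudget> = {
  conferencing: { greatMs: 200, tolerableMs: 500 },
  liveStreaming: { greatMs: 500, tolerableMs: 2000 },
  cloudGaming: { greatMs: 50, tolerableMs: 60 }, // assumption: ends of the 50-60 ms range
  conversationalAI: { greatMs: 300, tolerableMs: 500 },
};

/** Rates a measured latency against the budget of a use case. */
function rateLatency(useCase: UseCase, latencyMs: number): "great" | "tolerable" | "too high" {
  const budget = LATENCY_BUDGETS[useCase];
  if (latencyMs <= budget.greatMs) return "great";
  if (latencyMs <= budget.tolerableMs) return "tolerable";
  return "too high";
}
```

The same measured 150 milliseconds is great for conferencing and game over for cloud gaming, which is the whole point of treating each use case separately.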
Reducing latency in WebRTC
Different use cases have different latency requirements. They also have different architecture and infrastructure. This leads to the simple truth that there’s no single way to reduce latency in WebRTC. The use case and the way you’ve architected your application will decide what needs to be done to reduce the latency in it.
If you split the media processing pipeline in WebRTC to its coarse components, it makes it a bit easier to understand where latency occurs and from there, to decide where to focus your attention to optimize it.
Browsers and latency reduction
When handling WebRTC in browsers there’s not much you can do on the browser side to reduce latency. That’s because the browser controls and owns the whole media processing stack for you.
There are still areas where you can and should take a look at what you’re doing. Here are a few questions you should ask yourself:
The most important thing in the browser is going to be the collection of latency related measurements. Ones you can use later on for monitoring and optimizing it. These would be rtt, jitter and packet loss that we mentioned earlier.
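A minimal sketch of such a collection step is below. It works on plain objects shaped like the entries that `getStats()` returns, so it can be tried outside a browser. The field names (`currentRoundTripTime`, `jitter`, `packetsLost`) are the ones defined for WebRTC statistics, where times are reported in seconds; `summarizeLatency` and the two interfaces are hypothetical.

```typescript
// Subset of the fields found on WebRTC stats entries that we care about here.
interface StatEntry {
  type: string; // "candidate-pair", "inbound-rtp", ...
  nominated?: boolean; // candidate-pair only
  currentRoundTripTime?: number; // candidate-pair only, in seconds
  jitter?: number; // inbound-rtp, in seconds
  packetsLost?: number; // inbound-rtp, cumulative count
}

interface LatencySnapshot {
  rttMs?: number;
  maxJitterMs?: number;
  packetsLost: number;
}

/** Pulls rtt, jitter and packet loss out of a list of stats entries. */
function summarizeLatency(stats: StatEntry[]): LatencySnapshot {
  const snapshot: LatencySnapshot = { packetsLost: 0 };
  for (const entry of stats) {
    if (entry.type === "candidate-pair" && entry.nominated === true) {
      if (typeof entry.currentRoundTripTime === "number") {
        snapshot.rttMs = entry.currentRoundTripTime * 1000;
      }
    } else if (entry.type === "inbound-rtp") {
      if (typeof entry.jitter === "number") {
        // Keep the worst jitter across the incoming audio and video streams.
        snapshot.maxJitterMs = Math.max(snapshot.maxJitterMs ?? 0, entry.jitter * 1000);
      }
      if (typeof entry.packetsLost === "number") {
        snapshot.packetsLost += entry.packetsLost;
      }
    }
  }
  return snapshot;
}
```

In a browser, the input could be built with something like `Array.from((await pc.getStats()).values())`, where `pc` is your peer connection. Calling this every few seconds and shipping the snapshots to your backend gives you the history you need for monitoring.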
Mobile and latency reduction
Mobile applications, desktop applications, embedded applications. Any device side application that doesn’t run in a browser is one where you have more control.
This means there’s more room for you to optimize latency. That said, it usually requires specialized expertise and more resources than many would be willing to invest.
Places to look at here?
When taking this route, also remember that most optimizations here are going to be device and operating system specific. This means you’ll have your hands full with platforms to optimize.
Infrastructure latency reduction
This is the network latency that most of the rtt metric in WebRTC statistics comes from.
Where your infrastructure is versus the users has a huge impact on the latency.
The example I almost always use? Two users in France connected via a media or TURN server in the US.
Figuring out where your users are, what ISPs they are using, where to place your own servers, through which carriers to connect them to the users, how to connect your servers to one another when needed – all these are things you can optimize.
For starters, look at where you host your TURN servers and media servers. Compare that to where your users are coming from. Make sure the two are aligned. Also make sure the servers allocated for users are the ones closest to them (closest in terms of latency – not necessarily geography).
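Picking the closest server in terms of latency can be sketched as below, assuming the client (or your backend) has already collected a few round trip samples towards each candidate region. The region names, `RegionProbe` and `pickLowestLatencyRegion` are all hypothetical.

```typescript
// Round trip measurements from one user towards one server region.
interface RegionProbe {
  region: string;
  rttSamplesMs: number[];
}

/** Median is less sensitive to a single slow sample than the average. */
function medianOf(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

/** Region with the lowest median rtt, or undefined if nothing was measured. */
function pickLowestLatencyRegion(probes: RegionProbe[]): string | undefined {
  let best: { region: string; rttMs: number } | undefined;
  for (const probe of probes) {
    if (probe.rttSamplesMs.length === 0) continue; // nothing to compare
    const rttMs = medianOf(probe.rttSamplesMs);
    if (best === undefined || rttMs < best.rttMs) {
      best = { region: probe.region, rttMs };
    }
  }
  return best?.region;
}
```

For the two users in France, a measurement-based pick like this would steer them to a European server instead of one in the US, regardless of what a geography lookup might say.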
See if you need to deploy your infrastructure in more locations.
Rinse and repeat – as your service grows – you may need to change focus and infrastructure locations.
Other areas of improvement here are using Anycast or network acceleration that is offered by most large IaaS vendors today (at higher network delivery prices).
Media server processing and latencies
Then there are the media servers themselves. Most services need them.
Media servers are mainly the SFUs and MCUs that take care of routing and mixing media. There are also gateways of many shapes and sizes.
These media servers process media and have their own internal media processing pipelines. As with other pipelines, they have inherent latencies built into them.
Reducing that latency will reduce the end to end latency of the application.
The brave (new) world of generative AI, conversational AI and… LLMs
Remember where we started here? Me discussing latency because WebRTC-LLM use cases had to focus on reducing latency in their own pipeline.
This got the industry looking at latency again, trying to figure out how and where you can shave a few more milliseconds along the way.
Frankly? This needs to be done throughout the pipeline – from the device, through the infrastructure and the media servers and definitely within the STT-LLM-TTS pipeline itself. This is going to be an ongoing effort for the coming year or two, I believe.
Know your latency in WebRTC
We can’t optimize what we don’t measure.
The first step here is going to be measurements.
Here are some suggestions:
Did I mention that testRTC has some of the tools you’ll need to set up these environments? 😉 And if you need assistance with this process, you know where to find me.
The post Reducing latency in WebRTC appeared first on BlogGeek.me.
Learn about WebRTC LLM and its applications. Discover how this technology can improve real-time communication using conversational AI.
Talk about an SEO-rich title… anyways. When Philipp suggests something to write about I usually take note and write about it. So it is time for a teardown of last month’s demo by OpenAI – what place WebRTC takes there, how it affects the programmable video market of Video APIs.
I’ve been dragged into this discussion before. In my monthly recorded conversation with Arin Sime, we talked about LLMs and WebRTC:
Time to break down the OpenAI demo that was shared last month and what role WebRTC and its ecosystem plays in it.
Just to be on the same page, watch the demo below – it is short and to the point:
(for the full announcement demos video check out this link. You really should watch it all)
There were several interfaces shown (and not shown) in these demos:
Besides the interface used, there were 3 important aspects mentioned, explained and shown:
Let’s see why this is different from what we’ve seen so far, and what is needed to build such things.
Text be like…
ChatGPT started off as text prompting.
You write something in the prompt, and ChatGPT obligingly answers.
It does so with a nice “animation”, spewing the words out a few at a time. Is that due to how it works, or does it slow down the animation versus how it works? Who knows?
This gives a nice feel of a conversation – as if it is processing and thinking about what to answer, making up the sentences as it goes along (which to some extent it does).
This quaint prompting approach works well for text. A bit less for voice.
And now that ChatGPT added voice, things are getting trickier.
“Traditional” voice bots are like turn based games
Before all the LLM craze and ChatGPT, we had voice bots. The acronyms at the time were NLP and NLU (Natural Language Processing and Natural Language Understanding). The result was like a board game where each side has its turn – the customer and the machine.
The customer asks something. The bot replies. The customer says something more. Oh – now’s the bot’s turn to figure out what was said and respond.
In a way, it felt/feels like navigating the IVR menus via voice commands that are a bit more natural.
The turn by turn nature means there was always enough time.
You could wait until you heard silence from the user (known as endpointing). Then start your speech to text process. Then run the understanding piece to figure out intents. Then decide what to reply and turn it into text and from there to speech, preferably with punctuation, and then ship it back.
The pieces in red can easily be broken down into more logic blocks (and they usually are). For the purpose of discussing the real time nature of it all, I’ve “simplified” it into the basic STT-NLU-TTS
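The endpointing step that kicks off this pipeline can be sketched as a simple silence detector. This is a toy version under stated assumptions: `find_endpoint`, the per-frame energy values and the threshold are all hypothetical, and real systems use far more robust voice activity detection.

```python
def find_endpoint(frame_energies, silence_threshold, min_silent_frames):
    """Return the index of the frame at which the utterance is considered
    finished, or None if the user hasn't stopped talking yet."""
    silent_run = 0
    heard_speech = False
    for index, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0          # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return index        # enough trailing silence: hand over to STT
    return None

# Hypothetical per-frame energies: a short pause at index 3, then a real stop.
energies = [0.0, 0.8, 0.9, 0.1, 0.7, 0.05, 0.02, 0.01]
endpoint = find_endpoint(energies, silence_threshold=0.2, min_silent_frames=3)
```

Note that the wait for `min_silent_frames` is pure added latency. That is fine in a turn based bot and exactly the kind of delay that hurts in a real-time one.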
To build bots, we focused on each task one at a time. Trying to make that task work in the best way possible, and then move the output of that task to the next one in the pipeline.
If that takes a second or two – great!
But it isn’t what we want or need anymore. Turn based conversations are arduous and tiring.
Realtime LLMs are like… real-time games
Here are the 4 things from the announcement itself that struck a chord with me when GPT-4o was introduced:
Then there was the fact that the person in the demo cuts GPT-4o short in mid-sentence and actually gets a response back without waiting until the end.
There’s more flexibility here as well. Less to learn about what needs to be said to “strike” specific intents.
Moving from turn based voice bots to real-time voice bots is no easy feat. It is also what’s in our future if we wish these bots to become commonplace.
Real life and conversational bots
The demo was quite compelling. In a way, jaw dropping.
There were a few things there that were either emphasized or skimmed through quickly. They show off capabilities that, if they arrive in the product once it launches, are going to make a huge difference in the industry.
Here are the ones that resonated with me
There are quite a few topics that still need to be addressed. OpenAI and ChatGPT have made huge strides and this is another big step. But it is far from the last one.
We will know more on how this plays out in real life once we get people using it and writing about their own experiences – outside of a controlled demo at a launch event.
Working on the WebRTC and LLM infrastructure
In our domain of communication platforms and infrastructure, there are a few notable vendors that are actively working on fusing WebRTC with LLMs. This definitely isn’t an exhaustive list. It includes:
They are taking slightly different approaches, which makes it all the more interesting.
Before we start, let’s take the diagram from above of voicebots and rename the NLU piece into LLM, following marketing hype as it is today:
The main difference now is that LLM is like pure black magic: We throw corpuses of text into it, the more the merrier. We then sprinkle a bit of our own knowledge base and domain expertise. And voila! We expect it to work flawlessly.
Why? Because OpenAI makes it seem so easy to do…
Programmable Video and Video APIs doing LLM
In our domain of programmable video, what we see are vendors trying to figure out the connectors that make up the WebRTC-LLM pipeline and doing that at as low a latency as possible.
Agora
Agora just published a nice post about the impact of latency on conversational AI.
The post covers two areas:
In a way, they focus on the WebRTC-realm of the problem, ignoring (or at least not saying anything about) the AI/LLM-realm of the problem.
It should be said that this piece is important and critical in WebRTC no matter if you are using LLMs or just doing a plain meeting between mere humans.
Daily
Daily takes its unique approach to LLMs the same way it does in other areas. They offer a kind of a Prebuilt solution. They bring in partners and integrations and optimize them for low latency.
In a recent post they discuss the creation of the fastest voice bot.
For Daily, WebRTC is the choice to go for since it is already real time in nature. Sprinkle on top of it some of the Daily infrastructure (for low latency). And add the new components that are not part of a typical WebRTC infrastructure. In this case, packing Deepgram’s STT and TTS along with Meta’s Llama 3.
The concept here is to place STT-LLM-TTS blocks together in the same container so that the message passing between them doesn’t happen over a network or an external API. This reduces latencies further.
Go read it. They also have a nice table with the latency consumers along the whole pipeline in a more detailed breakdown than my diagrams here.
LiveKit
In January this year, LiveKit introduced the LiveKit Agents. Components used to build conversational AI applications. They haven’t spoken since about this on their blog, or about latency.
That said, it is known that OpenAI is using LiveKit for their conversational AI. So whatever worries OpenAI has about latencies are likely known to LiveKit…
LiveKit has been lucky to score such a high profile customer in this domain, giving it credibility in this space that is hard to achieve otherwise.
Twilio’s approach to LLMs
Twilio took a different route when it comes to LLM.
Ever since its acquisition of Segment, Twilio has been pivoting or diversifying. From communications and real time into personalization and storage. I’ve written about it somewhat when Twilio announced sunsetting Programmable Video.
This makes the announcement a few months back quite reasonable: Twilio AI Assistant
This solution, in developer preview, focuses on fusing the Segment data on a customer with the communication channel of Twilio’s CPaaS. There’s little here in the form of latency or real time conversations. That seems to be secondary for Twilio at the moment, but is also something they are likely now exploring as well due to OpenAI’s announcement of GPT-4o.
For Twilio? Memory and personalization are what is important about the LLM piece. And this is likely highly important to their customer base. How other vendors without access to something like Segment are going to deal with it is yet to be seen.
Fixie anyone?
When you give Philipp Hancke an article to review, he has good tips. This time it meant I couldn’t make this one complete without talking about fixie.ai. For a company that raised $17M they don’t have much of a website.
Fixie is important because of 3 things:
Fixie is working on Ultravox, an open source platform that is meant to offer a speech-to-speech model. No more need for STT and TTS components. Or breaking these into smaller pieces yet.
From the website, it seems that their focus at the moment is modeling speech directly into the LLM, avoiding the need to go through speech to text. The reasoning behind this approach is twofold:
The second part of it, of converting the result of the LLM back into speech, is not there yet.
Why is that interesting?
There are a lot more topics to cover around WebRTC and LLM. Rob Pickering looks at scaling these solutions for example. Or how do you deal with punctuations, pauses and other phenomena of human conversations.
With every step we make along this route, we find a few more challenges we need to crack and solve. We’re not there yet, but we definitely stumbled upon a route that seems really promising.
The post OpenAI, LLMs, WebRTC, voice bots and Programmable Video appeared first on BlogGeek.me.
Get your copy of my ebook on the top 7 video quality metrics and KPIs in WebRTC (below).
I’ve been dealing with VoIP ever since I finished my first degree in computer science. That was… a very long time ago.
WebRTC? Been at it since the start. I co-founded testRTC, dealing with testing and monitoring WebRTC applications. Did consulting. Wrote a lot about it.
For the last two years I’ve been meaning to write a short ebook explaining video quality metrics in WebRTC. And I finally did that 😎
The challenges of measuring video quality
Ever since we started testRTC, customers came to us asking for a quality score to fit their video application. But where do you even begin?
Deciding what’s good or bad is a personal decision that needs to be made by each and every company for its applications. Sometimes, differently per scenario used.
Where do we even start then?
Packet loss and latency aren’t enough
If I had to choose two main characteristics of media quality in real time communications, they would be packet loss and latency.
Packet loss tells you how bad the network conditions are (at least most of the time this is what it is meant to do). Your goal would be to reduce packet loss as much as possible (don’t expect to fully eradicate it).
Latency indicates how far the users are from your infrastructure or from each other. Shrinking this improves quality.
But that’s not enough. There’s more to it than these two metrics to be able to get a better picture of your application’s media quality – especially when dealing with video streams.
Know your top 7 video quality metrics in WebRTC
Which is why I invite you to download and review the top 7 video quality metrics in WebRTC – my new ebook which lists the most important KPIs when it comes to understanding video quality in WebRTC. There you will find an explanation of these metrics, along with my suggestions on what to do about them in order to improve your application’s video quality.
And yes – the ebook is free to download and read – once you jot down your name and email, it will be sent to you directly.
The post Video quality metrics you should track in WebRTC applications appeared first on BlogGeek.me.
Discover the hidden dangers of packet loss and its impact on your WebRTC application. Find out how to optimize your network performance and minimize packet loss.
If there’s one thing that can give you better media quality in WebRTC it is going to be the reduction (or elimination?) of packet loss. Nothing else will be as effective as this.
What I want to do here is to explain packet loss, why it is inevitable, and the many ways we have at our disposal to increase the resilience and quality of our media in WebRTC in the face of packet losses.
There are many reasons for packet losses to occur on modern networks and with WebRTC. To count a few of these:
We think of the internet as a reliable network. You direct a browser to a web page. And magically the page loads. If it doesn’t, then the network or server is down. End of story. That’s because packet losses there are handled by retransmitting what is lost. The cost? You wait a wee bit longer for your page to load.
With WebRTC we are dealing with real time communications. So if something gets lost there is little time to fix that.
👉 Packet losses are a huge headache for WebRTC applications
What to do to overcome packet losses?
Packet loss is an inevitability when it comes to WebRTC and VoIP in general. You can’t really avoid it. The question then becomes what can we do about this?
There are four different approaches here that can be combined for a better user experience:
From here on, let’s review each one of these four approaches.
Have less packet losses
This is the most important solution.
Because I don’t want you to miss this, I’ll write this again:
This is the most important solution.
If there is less packet loss, there is going to be less headache to deal with when trying to “fix” this situation. So reducing packet loss should be your primary objective. Since you can’t fully eradicate packet loss, we will still need to use other techniques. But it starts with reducing the amount of packet losses.
Location of infrastructure elements in WebRTC
Where you place your media servers and TURN servers and how you route traffic for your WebRTC service will have a huge impact on packet loss.
Best practice today is having the first server that WebRTC media hits as close to the user as possible. The understanding behind that is that this reduces the number of hops and network infrastructure components that the media packets need to traverse over the open internet. Once on your server, you have a lot more control over how that data gets processed and forwarded between the servers.
Having a single data center in the US cater for all your traffic is great, assuming your users are from that region. Once users start joining from across the pond – say… France. Or India. You will start seeing higher latencies and with it higher levels of packet loss.
A few things here:
Where to start?
👉 Know the latency (RTT) of your users. Monitor it. Strive towards improving it
👉 Check if there are locations and users that are routed across regions. Beef up your infrastructure in the relevant regions based on this data
👉 Since we want to reduce packet loss, you should also monitor… packet loss
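Monitoring packet loss usually means turning cumulative counters into a per-interval loss rate. Here is a minimal sketch, assuming two snapshots with `packets_lost` and `packets_received` fields. These names are modeled on the counters found in WebRTC receive statistics, but the snapshot dicts and the `loss_fraction` helper are hypothetical.

```python
def loss_fraction(prev, curr):
    """Fraction of packets lost between two cumulative counter snapshots."""
    lost = curr["packets_lost"] - prev["packets_lost"]
    received = curr["packets_received"] - prev["packets_received"]
    expected = lost + received
    if expected <= 0:
        return 0.0                   # nothing expected in this interval
    # The lost counter can move backwards (late or duplicate packets), so clamp.
    return max(0.0, lost / expected)

# Hypothetical snapshots taken a few seconds apart.
earlier = {"packets_lost": 10, "packets_received": 1000}
later = {"packets_lost": 15, "packets_received": 1095}
interval_loss = loss_fraction(earlier, later)
```

Working on deltas between snapshots matters: a lifetime average will look fine long after a session has started to fall apart.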
Better bandwidth estimation
I should have called this better bandwidth management, but for SEO reasons, kept it bandwidth estimation 😉
Here’s the thing:
Sending more than the network can handle, more than the sender can send, or more than the receiver can receive leads to packet loss and packet drops.
Fixing that boils down to bandwidth management – you don’t want to send too little since media quality will be lower than what you can achieve. And you don’t want to send too much since… well… packet loss.
Your service needs to be able to estimate bandwidth. That needs to happen on both the uplink and the downlink for each user.
The challenge is that available bandwidth is dynamic in nature. At each point in time, we need to estimate it. If we overshoot – packets are going to be delayed or lost. If we undershoot, we are going to reduce media quality below what we can achieve.
Web browser implementations of WebRTC have their own bandwidth management algorithms and they are rather good. Media servers have different implementations and their quality varies.
For media servers, we also need to remember that we aren’t dealing only with bandwidth estimation but rather with bandwidth management. Once we approximately know the available bandwidth, we need to decide which of the streams to send over the connection and at which bitrates; doing that while seeing the bigger picture of the session (hence bandwidth management and not estimation).
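To show the difference between estimating and managing, here is one possible sketch of the management part: given an estimated downlink and a prioritized list of streams with several bitrate layers each, decide what to send. The greedy policy, the `allocate` helper and all the numbers are assumptions for illustration, not how any specific media server does it.

```python
def allocate(available_kbps, streams):
    """streams: list of (name, [layer bitrates in kbps, ascending]) in
    priority order. Returns {name: chosen kbps, or 0 if not forwarded}."""
    chosen = {name: 0 for name, _ in streams}
    budget = available_kbps
    # Pass 1: try to give every stream its lowest layer, by priority.
    for name, layers in streams:
        if layers[0] <= budget:
            chosen[name] = layers[0]
            budget -= layers[0]
    # Pass 2: spend what is left on upgrades, again by priority.
    for name, layers in streams:
        if chosen[name] == 0:
            continue
        for bitrate in layers[1:]:
            extra = bitrate - chosen[name]
            if extra > budget:
                break
            budget -= extra
            chosen[name] = bitrate
    return chosen

# Hypothetical session: an active speaker and two other participants.
streams = [
    ("speaker", [150, 500, 1200]),
    ("alice", [150, 500]),
    ("bob", [150, 500]),
]
plan = allocate(1000, streams)
```

With a 1,000 kbps estimate, everyone gets a low layer first and only then does the speaker get upgraded, which is the "bigger picture of the session" part of the decision.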
Conceal packet losses (PLC)
Packet loss concealment is what we do after the fact. We lost packets, but we need to play out something for the user. What should we do to conceal the problem of packet loss?
This may seem like the last thing to deal with, but it is the first we need to tackle. There are two reasons why:
Audio and video are different, which is why from here on, we will distinguish between the two in the techniques we are going to use.
Audio and packet loss concealment
With audio, a loss of an audio packet almost always translates immediately to a loss of one or more audio frames (and we usually have 50 audio frames per second).
“Skipping” them doesn’t work so well, as it leads to robotic audio when there’s packet loss.
Other naive approaches here include things like playing back the last frame received – either as is or with a reduction in its volume.
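The naive "repeat the last frame at a lower volume" approach looks roughly like this. It is a toy sketch: frames are plain lists of samples, `None` marks a lost frame, and the `conceal` helper and its attenuation factor are made up for the example.

```python
def conceal(frames, attenuation=0.5):
    """Replace each lost frame (None) with the last received frame,
    fading it out a bit more for every consecutive loss."""
    output = []
    last = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last, gain = frame, 1.0
            output.append(list(frame))
        elif last is not None:
            gain *= attenuation
            output.append([sample * gain for sample in last])
        else:
            output.append([])        # nothing received yet, nothing to repeat
    return output

# Two consecutive losses in the middle of the stream.
played = conceal([[100, -100], None, None, [40, 40]])
```

The fade out is there so a long burst of losses decays into silence instead of a buzzing loop of the same frame.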
More sophisticated approaches try to estimate what should have been received by way of machine learning (or what we love calling it these days – generative AI). Google has such a capability inhouse (though not inside the open source implementation of WebRTC that they have). If you are interested in learning more about this, you can check out Google’s explanation of WaveNetEQ.
A few things to remember here:
👉 For the most part, this isn’t something in your control, unless you own/compile your WebRTC stack on the device side
👉 Knowing how browsers behave here enables you to be slightly smarter with the other techniques you are going to use (by deciding when to use them and how aggressively)
👉 In your own native application? You can improve on things, but you need to know what you’re doing and you need to have a compelling reason to take this route
Video and packet loss concealment 👉 frame dropping
Video is trickier with packet losses:
One lost packet translates into a lost frame, which can easily cause loss of the whole video sequence:
Packet loss concealment in video means dropping a frame, and oftentimes freezing the video until the next keyframe arrives.
What can the receiver do in case of such a loss? If it believes it won’t recover quickly (which is most commonly the case), it can send out a FIR or PLI message over RTCP to the sender. These messages indicate to the sender that there’s a loss that needs to be addressed, where the usual solution is to reset the encoder and send a new keyframe.
In the past, systems used to try and overcome packet losses by continuing to decode without the missing packets. The end result was smearing artifacts on the video until a new keyframe arrived. Today, best practice is to freeze the video until a keyframe arrives (which is what all browser implementations do).
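The freeze-until-keyframe behavior is essentially a tiny state machine. Below is a simplified sketch of it, assuming each frame is described by a `(kind, complete)` pair. The `render_decisions` helper and the string labels are hypothetical; `"freeze+PLI"` just marks the point where a receiver would ask the sender for a keyframe.

```python
def render_decisions(frames):
    """frames: list of (kind, complete), kind is "key" or "delta" and
    complete is False when packets of that frame were lost."""
    waiting_for_keyframe = False
    decisions = []
    for kind, complete in frames:
        if not complete:
            waiting_for_keyframe = True
            decisions.append("freeze+PLI")   # loss detected: ask for a keyframe
        elif waiting_for_keyframe and kind != "key":
            decisions.append("freeze")       # decodable data, but the chain is broken
        else:
            waiting_for_keyframe = False
            decisions.append("render")
    return decisions

sequence = [
    ("key", True), ("delta", True), ("delta", False),
    ("delta", True), ("key", True), ("delta", True),
]
decisions = render_decisions(sequence)
```

The fourth frame arrives intact and is still not shown, since it depends on a frame that was never decoded.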
A few things to remember here:
👉 You have more control here than in audio. That’s because a lost packet means you will receive FIR or PLI message on the other end. If that’s your media server receiving these messages, you can decide how to respond
👉 Sending a keyframe means investing more on bitrate for that frame. If there’s congestion over the network, then this will just put more burden. Most media servers would avoid sending too many of these in larger group meetings
👉 There are video coding techniques that reduce the dependencies between frames. These include temporal scalability and SVC
Retransmitting lost packets (RTX)
If a packet is missing, then the first solution we can go for is to retransmit it.
The receiver knows what packets it is missing. Once the sender knows about the missing packets (via NACK messages), it can resend them as RTX packets.
Retransmission is the most economic solution in terms of network resources. It is the least wasteful solution. It is also the hardest to make use of. That’s because it ends up looking something like this:
In order to retransmit, we need to:
This takes time. A long time.
The question then becomes, is it going to be too late to retransmit them.
Video and RTX
Video can make real use of retransmissions (and it does in WebRTC).
With video compression, we have a kind of hierarchy of frames. Some frames are more important than others:
The above illustration, for example, shows how keyframes and temporal scalability build dependency chains. Key denotes the keyframe while L0 has higher usability than L1 frames (L1 frames are dependent on L0 frames and nothing depends on them).
When we have such a dependency tree of frames, we can do some interesting things with resiliency. One of them is deciding if it is worthwhile to ask for a retransmission:
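One way to sketch that decision is to combine the frame's place in the dependency tree with the time we have left before playout. This is an illustrative policy only: `should_request_rtx`, the layer labels and the margin are assumptions, not what libwebrtc actually implements.

```python
def should_request_rtx(layer, rtt_ms, ms_until_playout, margin_ms=10):
    """layer is "key", "L0" or "L1". Ask for a retransmission only if other
    frames depend on this one and the resent packet can arrive in time."""
    if layer == "L1":
        return False                 # nothing depends on L1 frames, just skip it
    # A NACK plus the resent packet costs roughly one round trip.
    return rtt_ms + margin_ms <= ms_until_playout
```

For example, `should_request_rtx("key", rtt_ms=80, ms_until_playout=150)` is worth a retransmission, while the same loss on a 200 ms round trip is not, since the packet would arrive too late to be of use.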
Audio compression doesn’t enjoy the same dependency tree that video compression does. Which is why libwebrtc doesn’t have code to deal with audio RTX.
Would having RTX for audio be useful? It can. Audio packets usually wait for video packets to arrive for lip synchronization purposes. If we can use that wait time to retransmit, then we can improve upon audio quality. Google likely deemed this not important enough.
Correct packet losses in advance (FEC)
We could ask for a retransmission after the fact, but what about making sure there’s no need? This is what FEC (Forward Error Correction) is all about.
Think of it this way – if we had one shot at what we want to send and it was super important – would it make sense to send 100 copies of it, knowing that the chances that one of these copies would reach its destination is high?
FEC is about sending more packets that can be used to reconstruct or replace lost packets.
There are different FEC schemes that can be used, with the main 3 of them being:
WebRTC supports duplication and XOR out of the box.
The biggest hurdle of FEC is its use of bitrate – it is quite network hungry in that regard.
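To make the XOR flavor concrete, here is a minimal sketch: one parity packet protects a group of packets, and any single lost packet in the group can be rebuilt from the rest. The helper names are made up, and the example assumes equal-length packets (real schemes also have to protect packet lengths and headers).

```python
def xor_parity(packets):
    """XOR all packets together, byte by byte."""
    parity = bytearray(max(len(packet) for packet in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

def recover(received, parity):
    """received: the group's packets with exactly one None for the lost one.
    XORing the survivors with the parity yields the missing packet."""
    present = [packet for packet in received if packet is not None]
    return xor_parity(present + [parity])

group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)                      # sent as an extra packet
rebuilt = recover([b"abcd", None, b"ijkl"], parity)
```

The cost is visible right away: one parity packet for every three media packets is about a third more bitrate, and it still covers only a single loss per group.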
Audio FEC
Audio FEC comes in two different forms:
In-band FEC is implemented as part of the Opus codec library. It is ok’ish at best – nothing to write home about.
Then there’s RED – Redundancy Encoding – where each audio packet holds more than a single audio frame. And the ones it holds are just slightly older frames, so that if a packet is lost, we get it in another packet.
RED is implemented in libwebrtc. Support is limited to 1 level of redundancy for RED (meaning recovering up to one sequential lost packet). You can use WebRTC’s Insertable Streams mechanism to generate RED packets at higher redundancy or dynamic redundancy in the browser though.
In the above, Philipp Hancke explains RED (along with other resiliency features for audio in WebRTC).
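The idea behind RED can be sketched without any of the actual packet format. In this toy version each packet carries the current frame plus the previous one; the `pack_red` and `unpack_red` helpers and the `(index, payload)` layout are hypothetical and only illustrate the redundancy, not the real RED payload.

```python
def pack_red(frames, redundancy=1):
    """Each packet holds the current frame plus `redundancy` older frames."""
    packets = []
    for index in range(len(frames)):
        start = max(0, index - redundancy)
        packets.append([(i, frames[i]) for i in range(start, index + 1)])
    return packets

def unpack_red(packets, frame_count):
    """packets: list where None marks a lost packet. Returns the frames that
    could be recovered, with None where a frame is really gone."""
    frames = [None] * frame_count
    for packet in packets:
        if packet is None:
            continue
        for i, payload in packet:
            frames[i] = payload
    return frames

sent = pack_red(["f0", "f1", "f2", "f3"])
one_lost = unpack_red([sent[0], None, sent[2], sent[3]], 4)
two_lost = unpack_red([sent[0], None, None, sent[3]], 4)
```

With one level of redundancy a single lost packet costs nothing, while two lost packets in a row still lose a frame. This is why higher or dynamic redundancy is interesting on bad networks.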
Video FEC
FEC for video is considered wasteful. If we need to increase bitrate by 20% or more to introduce robustness using FEC, then it comes at the cost of the video quality we could otherwise gain by spending that bitrate on the video itself.
For the most part, WebRTC ignores FEC for video, which is a shame. When using temporal scalability or SVC, the same way that we can decide to retransmit only important packets, we can also decide to add FEC protection only to the more important frames.
Wrapping it all up
Dealing with packet loss in WebRTC isn’t a simple task. It gets more complex over time, as more techniques and optimizations are bolted on to the implementation. What I want to do here is to list the various tools at our disposal to deal with packet losses. When and how we decide to use them would determine the resulting robustness and media quality of the implementation.
Here’s a quick table to sum things up a bit:
| | PLC | RTX | FEC |
|---|---|---|---|
| Focus | What to playback to the user | When to ask for missing packets | When to send duplicated packets |
| Advantages | None. You must have this logic implemented | Low network footprint | Low latency overhead |
| Challenges | Audio may sound robotic; video will freeze | Increases latency. Might not be usable due to it | High network footprint. Can be quite wasteful |
| Audio | Duplicate last frames or reduce volume; use Gen AI to estimate what was lost | Not commonly used for audio in WebRTC | FlexFEC used by WebRTC; can use RED if you want to |
| Video | Skip video frames; ask for a fresh keyframe to reset the video stream | Can be optimized to retransmit packets of important frames only | Not commonly used for video in WebRTC |
Oh – and make sure you first put an effort to reduce the amount of packet losses before starting to deal with how to overcome packet losses that occur…
Learn more about WebRTC (and everything about it)
Packet loss is one of the topics you need to deal with when writing WebRTC applications. There are many aspects affecting media quality – packet loss is but one of them. This time, we looked into the tools available in WebRTC for dealing with packet losses.
To learn more about media processing and everything else related to WebRTC, check out these services:
And if what you want is to test, monitor, optimize and improve the performance of your WebRTC application, then I’d suggest checking out testRTC.
The post Fixing packet loss in WebRTC appeared first on BlogGeek.me.
Getting HEVC and WebRTC to work together is tricky and time consuming. Let’s see what the advantages are and if this is worth your time or not.
Does HEVC & WebRTC make a perfect match, or a match at all???
WebRTC is open source, open standard, royalty free, …
HEVC is royalty bearing, made by committee, expensive
And yet… we do see areas where WebRTC and HEVC mix rather well. Here’s what I want to cover this time:
Digging here in my blog, you can find articles discussing the WebRTC codec wars dating as early as 2012.
Prior to WebRTC, most useful audio and video codecs were royalty bearing. Companies issued patents related to media compression and then got the techniques covered by their patents integrated into codec standards, usually, under the umbrella of a standardization organization.
The logic was simple: companies and research institutes need to make a profit out of their effort, otherwise, there would be no high quality codecs. That was before the internet as we know it…
Once websites such as YouTube appeared, and UGC (User Generated Content) became a thing, this started to shift:
The new business models broke in one way or another the notion of royalty bearing codecs. Or at least tried to break. There were solutions of sorts – smartphones had hardware encoders prepaid for, decoder licenses required no payments, etc.
But that didn’t fit something symmetric like WebRTC.
When WebRTC was introduced, the codec wars began – which codecs should be supported in WebRTC?
The early days leaned towards royalty free codecs – VP8 for video and Opus for voice. At some point, we ended up with H.264 as well…
How H.264 wiggled its way into WebRTC
H.264 is royalty bearing. But it still found its way into WebRTC. That was due in large part to Cisco – they decided to contribute their encoder implementation of H.264 and pay the royalties on it (they likely already paid up to the cap needed anyways). That opened the way for a weird technical solution to be concocted to make room for H.264 and allow it in WebRTC:
Why? Because lawyers. Or something.
It worked for browsers. But not on mobile, where the solution was to use the hardware encoder on the device, that doesn’t always exist and doesn’t always work as advertised. And it left a gaping headache for native developers that wanted to use H.264. But who cared? Those who wanted to make a decision for WebRTC and move on – got it.
That made certain that at some point in the future, the H.264 royalty bearing crowd would come back asking for more. They’d be asking for HEVC.
HEVC, patents and big 💰
HEVC is a patent minefield, or at least was – I admit I haven’t been following up on this too closely for a few years now.
Here are two slides I have in my architecture course:
There are a gazillion patents related to HEVC (not that many, but 5 figures). They are owned by a lot of companies and get aggregated by multiple patent pools. Some of them are said to be trickling into VP9 and AV1, though for the time being, most of the market and vendors ignore that.
These patents make including HEVC in applications a pain – you need to figure out where to get the implementation of HEVC and who pays for its patents. With regard to WebRTC:
Oh, and there’s no “easy” cap to reach as there is/were with H.264 when it was included in WebRTC and paid for by Cisco.
HEVC is expensive, with a lot of vendors waiting to be paid for their efforts.
HEVC hardware
Software codecs and royalty payments are tricky. Why? Because it opens up the can of worms above, about who is paying. Hardware codecs are different in nature – the one paying for them is either the hardware acceleration vendor or the device manufacturer.
This means that hardware acceleration of codecs has two huge benefits – not only one:
This is likely why Apple decided to go all in with HEVC from iPhone 8 and on – it gave them an edge that Android phones couldn’t easily solve:
This gap for Android devices was a nice barrier for many years that kept Apple devices ahead. Apple could “easily” pay the HEVC royalties while Android vendors try to figure out how to get this done.
Today?
We have Intel and Apple hardware supporting HEVC. Other chipset vendors as well. Some Android devices. Not all of them. And many just do decoding but not encoding.
For the most part, the HEVC hardware support on devices is a swiss cheese with more holes than cheese in it. Which is why many focus on HEVC support in Apple devices only today (if at all).
Advantages of HEVC in WebRTC
When it comes to video codecs, there are different generations of codecs. In the context of WebRTC, this is what it looks like:
There are two axes to look at in the illustration above
If we move from the VP8 and H.264 to the next generation of VP9 and HEVC, we’re improving on the media quality for the same bitrate. The challenge though is the complexity and performance associated with it.
To deal with the increased compute, a common solution is to use hardware acceleration. This doesn’t exist that much for VP9 but is more prevalent in HEVC. That’s especially true since ALL Apple devices have HEVC support in them – at least when using WebRTC in Safari.
The other reason for using HEVC is media processing outside of WebRTC. Streaming and broadcasting services have traditionally been using royalty bearing video codecs. They are now slowly moving from H.264 to HEVC. This shift means that a lot of media sources will offer either H.264 or HEVC as their video codec – VP8 or VP9 will be a lot less common. This being the case, vendors would rather use HEVC than go for VP9 and deal with transcoding – their other alternative is to stick with H.264.
So, why use HEVC?
HEVC requires royalty payments in a minefield of organizations and companies.
Apple already committed itself fully to HEVC, but Google and the rest of the WebRTC industry haven’t.
Google will be supporting HEVC in Chrome for WebRTC only as a decoder and only if there’s a hardware accelerator available – no software implementation. Google’s “official” stance on the matter can be found in the Chrome issues tracker.
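If you want to check at runtime whether a given browser exposes HEVC for WebRTC at all, a rough sketch could look like the one below. It assumes HEVC shows up in the codec capabilities under the video/H265 MIME type; findHevcCodecs and CodecCapability are names made up for this illustration, not part of any API.

```typescript
// Minimal shape of an entry in RTCRtpCapabilities.codecs (only what we need here).
// CodecCapability is a hypothetical name used for this sketch.
interface CodecCapability {
  mimeType: string;
  sdpFmtpLine?: string;
}

// Returns the HEVC entries, if any. Assumption: HEVC is listed as "video/H265".
function findHevcCodecs(codecs: CodecCapability[]): CodecCapability[] {
  return codecs.filter((c) => c.mimeType.toLowerCase() === "video/h265");
}

// In a browser you could feed it the real capabilities, for example:
//   const caps = RTCRtpReceiver.getCapabilities("video");
//   const canReceiveHevc = findHevcCodecs(caps?.codecs ?? []).length > 0;
```

An empty result only tells you that this specific browser and device combination does not offer HEVC right now. Given the hardware-only support described above, the answer can differ between two machines running the same browser version.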
So if you are going to support HEVC, this is where you’ll find it:
Then there is AV1. A video codec years in the making. Royalty free. With a new non-profit industry consortium behind it, with all the who’s who:
The specification is ready. The software implementation already exists inside libwebrtc. Hardware acceleration is on its way. And compression results are better than HEVC. What’s not to like here?
This makes the challenge extra hard these days –
Should you invest and adopt HEVC, or start investing and adopting AV1 instead?
Adopt VP9? Wait for AV1?
Where can you fit HEVC and WebRTC?
Let’s see where there is room today to use HEVC. From here, you can figure out if it is worth the effort for your use case.
The Apple opportunity of WebRTC and HEVC
Why invest now in HEVC? Probably because HEVC is available on Apple devices. Mainly the iPhone. Likely for very specific and narrow use cases.
For a use case that needs to work there, there might be some reasoning behind using HEVC. It would work best there today with the hardware acceleration that Apple pampered us with for HEVC. It will be really hard or even impossible to achieve similar video quality in any other way on an iPhone today.
Doing this brings with it differentiation and uniqueness to your solution.
Deciding if this is worth it is a totally different story.
Intel (and other) HEVC hardware
Intel has worked on adding HEVC hardware acceleration to its chipsets. And while at it, they are pushing towards having HEVC implemented in WebRTC on Chrome itself. The reason behind this is a big unknown, or at least something that isn’t explained that much.
If I had to take a stab at it here, it would be the desire of Intel to work closely with Apple. Not sure why, it isn’t as if Intel chipsets are interesting for Apple anymore – they have been using their own chips for their devices for a few years now.
This might be due to some grandiose strategy, or just because a fiefdom (or a business unit or a team) within Intel needs to find things to do, and HEVC is both interesting and can be said to be important. And it is important, but is it important for WebRTC on Intel chipsets? That’s an open question.
Should you invest in HEVC for WebRTC?
No. Yes. Maybe. It depends.
When I told Philipp Hancke I am going to write about this topic, he said be sure to write that “it is a bit late to invest in HEVC in 2024”.
I think this is more nuanced than this.
It starts with the question of how much energy and resources you have, and whether you can spend them on both HEVC and AV1. If you can’t, then you need to choose only one of them, or none.
Investing in HEVC means figuring out how the end result will differentiate your service enough or give it an advantage with certain types of users that would make your service irresistible (or usable).
For the most part, a lot of the WebRTC applications are going to ignore and skip HEVC support. This means there might be an opportunity to shine here by supporting it. Or it might be wasted effort. Depending on how you look at these things.
Learn more about WebRTC (and everything about it)
Which codecs are available, which ones to use, how is that going to affect other parts of your application, how should you architect your solutions, can you keep up with the changes coming to WebRTC?
These and many other questions are being asked on a daily basis around the world by people who deal with WebRTC. I get these questions in many of my own meetings with people.
If you need assistance with answering them, then you may want to check out these services that I offer:
The post WebRTC & HEVC – how can you get these two to work together appeared first on BlogGeek.me.
GStreamer is one of the oldest and most established libraries for handling media. As a core media handling element in Linux and WebKit that was launched near the turn of the century, it is not surprising that many early WebRTC projects use various pieces of it. Today, GStreamer has expanded options for helping developers plumb […]
The post WebRTC Plumbing with GStreamer appeared first on webrtcHacks.
From time to time, WebRTC is going to discard media packets. Monitoring such behavior and understanding the reasons is important to optimize media quality.
WebRTC does things in real time. That means that if something takes its sweet time to occur, it will be too late to process it. This boils down to the fact that from time to time, WebRTC will discard media packets, which isn’t a good thing. Why is that going to happen? There are quite a few reasons for it, which is what this article is all about.
I just started a new initiative with Philipp Hancke. We’re publishing an answer to a WebRTC related question once a week (give or take), trying to keep it all below the 2-minute mark.
We are going to cover topics ranging from media processing, through signaling to NAT traversal. Dealing with client side or server side issues. Or anything else that comes to mind.
👉 Want to be the first to know? Subscribe to the YouTube channel
👉 Got a question you need answered? Let us know
Discarded media packets in WebRTC
Media packets and frames can be, and are, discarded by WebRTC in real life calls. There are even getstats metrics that allow you to track these:
The screenshot above was taken from the RTCInboundRtpStreamStats dictionary of getstats. I marked most of the important metrics we’re interested in for discarding media data.
packetsDiscarded – this field counts the packets that the jitter buffer decided to discard and ignore because they arrived too early or too late. It relates to audio packets.
framesXXX fields are dealing with video only and look at full frames which can span multiple packets. They get discarded because of a multitude of reasons which we will be dealing with later in this article. For the time being – just know where to find this.
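To make this a bit more concrete, here is a minimal sketch of pulling these counters out of a stats report. It works on plain objects so it can run anywhere; InboundRtpLike, DiscardSummary and summarizeInboundDiscards are names invented for this example, while the field names are the ones mentioned above.

```typescript
// Only the fields of an "inbound-rtp" stats entry that matter for this sketch.
interface InboundRtpLike {
  type: string;
  kind?: string;
  packetsDiscarded?: number;
  framesReceived?: number;
  framesDecoded?: number;
  framesDropped?: number;
}

interface DiscardSummary {
  audioPacketsDiscarded: number;
  videoFramesReceived: number;
  videoFramesDecoded: number;
  videoFramesDropped: number;
}

// Adds up the discard-related counters across all inbound streams.
function summarizeInboundDiscards(stats: InboundRtpLike[]): DiscardSummary {
  const summary: DiscardSummary = {
    audioPacketsDiscarded: 0,
    videoFramesReceived: 0,
    videoFramesDecoded: 0,
    videoFramesDropped: 0,
  };
  for (const s of stats) {
    if (s.type !== "inbound-rtp") continue; // skip everything that is not an incoming stream
    if (s.kind === "audio") {
      summary.audioPacketsDiscarded += s.packetsDiscarded ?? 0;
    } else if (s.kind === "video") {
      summary.videoFramesReceived += s.framesReceived ?? 0;
      summary.videoFramesDecoded += s.framesDecoded ?? 0;
      summary.videoFramesDropped += s.framesDropped ?? 0;
    }
  }
  return summary;
}

// In a browser (pc being your RTCPeerConnection), something along these lines:
//   const report = await pc.getStats();
//   const summary = summarizeInboundDiscards(Array.from(report.values()));
```

All of these counters are cumulative for the lifetime of the stream, so a single reading tells you less than the change between two readings.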
The diagram below is a screenshot taken in testRTC of a real session of a client. Here you can see a spike of 200 packetsDiscarded less than a minute into the call. We’ve recently added in testRTC insights that hunt for such cases (as well as for video frame drops), alerting about these scenarios so that the user doesn’t have to drill down and search for them too much – they now appear front and center to the user.
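Hunting for a spike like this boils down to looking at the difference between consecutive samples of a cumulative counter. The sketch below is not how testRTC does it; it is just a simple illustration, with CounterSample, findDiscardSpikes and the threshold value all being assumptions of mine.

```typescript
// One periodic reading of the cumulative packetsDiscarded counter.
interface CounterSample {
  timestampMs: number;
  packetsDiscarded: number;
}

interface DiscardSpike {
  atMs: number;
  delta: number;
}

// Flags every sampling interval in which the counter grew by at least `threshold`.
function findDiscardSpikes(samples: CounterSample[], threshold: number): DiscardSpike[] {
  const spikes: DiscardSpike[] = [];
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].packetsDiscarded - samples[i - 1].packetsDiscarded;
    if (delta >= threshold) {
      spikes.push({ atMs: samples[i].timestampMs, delta });
    }
  }
  return spikes;
}
```

What counts as a spike depends on your sampling interval and on the service itself, so treat any threshold as something to tune rather than a given.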
WebRTC = Real-Time. Timing is everything
WebRTC stands for Web Real Time Communication. The Real Time part of it is critical. It means that things need to happen in… real time… and if they don’t, then the opportunity has already passed. This leads to the eventuality that at times, media packets will need to be discarded simply because they aren’t useful anymore – the opportunity to use them has already passed.
For all that logic to happen, WebRTC uses a protocol called RTP. This protocol is in charge of sending and receiving real time media packets over the network. For that to occur, each RTP packet has two critical fields in its header:
The illustration above is taken from our course Low level WebRTC protocols. In it, you can see these two fields:
The sequence number is just a running counter which can easily be used to order the packets on the receiving end based on the value of the counter. This takes care of any reordering, duplication and packet losses that can occur over modern networks.
The timestamp is used to understand when the media packet was originally generated. It is used when we need to play back this packet. Multiple packets can have the same timestamp, for example when the frame we want to send gets split across packets – something that occurs frequently with video frames.
These two, sequence number and timestamp, are used to deal with the various characteristics of the network. Usually, we deal with the following problems (I am not going to explain them here): jitter, latency, packet loss and reordering.
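One detail worth showing in code is that the RTP sequence number is only 16 bits wide, so the running counter wraps around from 65535 back to 0. Any ordering logic has to take that into account. Here is a minimal sketch of that, assuming all packets being compared are less than half the sequence space apart; isNewerSeq and orderPackets are illustrative names, not libwebrtc functions.

```typescript
// True when sequence number `a` comes after `b`, taking 16-bit wraparound into account.
function isNewerSeq(a: number, b: number): boolean {
  const diff = (a - b) & 0xffff; // distance from b to a, modulo 65536
  return diff !== 0 && diff < 0x8000;
}

// Removes duplicates and puts packets back in sending order.
function orderPackets<T extends { seq: number }>(packets: T[]): T[] {
  const seen = new Set<number>();
  const unique = packets.filter((p) => {
    if (seen.has(p.seq)) return false; // duplicate, ignore it
    seen.add(p.seq);
    return true;
  });
  return unique.sort((x, y) => {
    if (x.seq === y.seq) return 0;
    return isNewerSeq(x.seq, y.seq) ? 1 : -1;
  });
}

// Packets received out of order, with one duplicate, right around the wraparound point.
const ordered = orderPackets([{ seq: 1 }, { seq: 65535 }, { seq: 0 }, { seq: 65535 }, { seq: 65534 }]);
console.log(ordered.map((p) => p.seq).join(",")); // 65534,65535,0,1
```

A gap in the resulting sequence (say 65535 followed directly by 1) is how the receiver knows that a packet is either lost or still on its way.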
All of this goodness, and more, is handled in WebRTC by what is called a jitter buffer. Here’s a short explainer of how a jitter buffer works:
WebRTC discarding incoming audio packets
The above video is our first WebRTC Q&A video. We started off with this because it popped up in discuss-webrtc. The question has since been deleted for some reason, but it was a good one.
Latency
The main reason for discarded audio packets is receiving them too late.
When audio packets are received by WebRTC, it pushes them into its jitter buffer. There, these packets get sorted in their sending order by looking at the sequence number of these packets. When to play them out is then dependent on the timestamp indicated in the packet.
Assuming we already played a newer packet to the user, we will be discarding packets that have a lower (and older) sequence number since their time has already passed.
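Here is a toy version of that rule. It is nowhere near what the real jitter buffer in WebRTC does, and ToyJitterBuffer with its methods is invented purely for illustration. The only thing it models is "if we already played something newer, this packet is too late".

```typescript
// True when sequence number `a` comes after `b`, with 16-bit wraparound.
function seqIsNewer(a: number, b: number): boolean {
  const diff = (a - b) & 0xffff;
  return diff !== 0 && diff < 0x8000;
}

// A deliberately naive model: it only remembers the last sequence number played out.
class ToyJitterBuffer {
  private lastPlayedSeq: number | null = null;
  packetsDiscarded = 0;

  // Returns true if the packet can still be used, false if it arrived too late.
  accept(seq: number): boolean {
    if (this.lastPlayedSeq !== null && !seqIsNewer(seq, this.lastPlayedSeq)) {
      this.packetsDiscarded++; // its playout time has already passed
      return false;
    }
    return true;
  }

  // Call this when a packet is handed over for playout.
  markPlayed(seq: number): void {
    this.lastPlayedSeq = seq;
  }
}

const jb = new ToyJitterBuffer();
jb.markPlayed(10);
console.log(jb.accept(9)); // false, since packet 10 was already played
console.log(jb.accept(11)); // true
```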
Lipsync
Audio and video packets get played out together. This is due to a lip synchronization mechanism that WebRTC has, where it tries to match timestamps of audio and video streams to make sure there’s lip synchronization.
Here, if the video advanced too much, then you may need to drop some audio packets instead of playing them out in sync with the video (simply because you can’t sync the two anymore).
Bugs
Here’s another reason why audio packets might end up being discarded by the receiver – bugs in the sender’s implementation…
When the sender doesn’t use the correct timestamp in the packets, or does other “bad” things with the header fields of the RTP packets, you can get to a point when packets get discarded.
👉 Our focus here was on the timestamp because for some arcane reasons, figuring out the timestamp values and their progression in audio (and video) is never a simple task. Audio and video use different frequency clocks when calculating timestamps, done with values that make little sense to those who aren’t dealing with the innards and logic of audio and video encoders. This may easily lead to miscalculations and bugs in timestamp setting.
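A small worked example of those clocks may help. The sketch assumes Opus audio, which uses a 48 kHz RTP clock, and video on the usual 90 kHz RTP clock; timestampStep is just a helper name I picked for this.

```typescript
const AUDIO_CLOCK_HZ = 48000; // RTP clock rate used for Opus
const VIDEO_CLOCK_HZ = 90000; // RTP clock rate used for video

// How much the RTP timestamp advances for media that lasts `frameDurationMs`.
function timestampStep(clockHz: number, frameDurationMs: number): number {
  return Math.round((clockHz * frameDurationMs) / 1000);
}

const opusStep = timestampStep(AUDIO_CLOCK_HZ, 20); // 20 ms audio packets
const videoStep = timestampStep(VIDEO_CLOCK_HZ, 1000 / 30); // 30 frames per second

console.log(opusStep); // 960
console.log(videoStep); // 3000
```

So the timestamp of consecutive 20 ms audio packets grows by 960, not by 20 and not by 1. A sender that increments it by the wrong amount will look to the receiver like it is sending packets that are far too early or far too late.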
WebRTC discarding outgoing audio packets
This doesn’t really happen. Or at least WebRTC ignores this option altogether.
How do we know that? Besides looking at the code, we can look at the fields that we have in getstats for this. While we have discarded frames for incoming and outgoing video and discarded incoming audio packets, we don’t have anything of this kind for outgoing audio packets.
These packets are too small and “insignificant” to cause any dropping of them on the sender side. That’s at least the logic…
WebRTC discarding incoming video frames
Before we go into the reasons, let’s understand how video packets are handled in the media processing pipeline of WebRTC. This is partial at best, and specifically focused on what I am trying to convey here:
The above diagram shows the process that video packets go through once they are received, along with the metrics that get updated due to this processing:
👉 The exact places where these metrics might be updated are a wee bit more nuanced. Consider the above just me flailing my hands in the air as an explanation.
This also hints that with video, there are multiple places where things can get dropped and discarded along the pipeline.
The above is another screenshot from testRTC. This time, indicating framesDropped. You can see how throughout the session, quite a few frames got dropped by WebRTC.
Let’s find the potential reasons for such dropped frames.
Latency, lip sync & bugs
Just like incoming audio packets, we can get dropped packets and video frames for much the same reasons.
Latency and lip synchronization may cause the jitter buffer to discard video packets.
And bugs on the sender side can easily cause WebRTC to drop incoming packets here as well.
That said, with video, we have to look at a slightly bigger picture – that of a frame instead of that of a singular packet.
Not all packets of a frame are available
Assume you have a packet dropped. And that packet is part of a frame that is sent over a series of 7 packets. We had 1 packet drop that caused a frame drop, which in turn, caused another 6 packets to be useless to us since we can’t really decode them without the missing packet (we can to some extent, but we usually don’t these days).
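The 7 packet example can be sketched as a simple completeness check. This assumes we already know the first and last sequence numbers of the frame, which in real life comes from payload-specific headers and the RTP marker bit; isFrameComplete is a hypothetical helper and not something you will find in WebRTC itself.

```typescript
// Checks that every packet between firstSeq and lastSeq (inclusive) was received.
// Sequence numbers are 16 bits wide, so the range may wrap around 65535.
function isFrameComplete(firstSeq: number, lastSeq: number, received: Set<number>): boolean {
  const count = ((lastSeq - firstSeq) & 0xffff) + 1;
  for (let i = 0; i < count; i++) {
    if (!received.has((firstSeq + i) & 0xffff)) return false;
  }
  return true;
}

// A frame sent as 7 packets (100..106), with packet 103 lost on the way.
const got = new Set<number>([100, 101, 102, 104, 105, 106]);
console.log(isFrameComplete(100, 106, got)); // false, so the 6 packets we did get are useless

got.add(103); // say a retransmission arrived in time
console.log(isFrameComplete(100, 106, got)); // true
```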
Dependency on older frames
With video, unless we’re decoding a keyframe, the frame we need to decode requires a previous frame to be decoded. There are dependencies here since for the most part, we only encode and compress the differences across frames and not the full frame (that would be a keyframe).
What happens then if a frame we need for decoding a fresh frame we just received isn’t available? Here, all packets were received for this new frame, but the frame (and all its packets) will still get dropped. This will be reported in framesDropped.
Not enough CPU
We might not have enough CPU available to decode video. Video is CPU intensive, and if WebRTC understands that it won’t have time to decode the frame, it will simply drop it before decoding it.
It might also decode the frame but then, due to CPU issues, miss the time for playout, causing framesRendered not to increment.
WebRTC discarding outgoing video frames
With outgoing media, there is a different dictionary we need to look at in getstats – RTCOutboundRtpStreamStats:
Here, the relevant fields are framesSent and framesEncoded. We should strive to have these two equal to each other.
We know that WebRTC decided to discard frames here if framesEncoded is higher than framesSent. If this happens, then it is bad on a few levels:
On the RTCIceCandidatePairStats dictionary, there’s also a packetsDiscardedOnSend metric, which hints at when and why we would lose and discard packets and frames on the sender side:
Total number of packets for this candidate pair that have been discarded due to socket errors, i.e. a socket error occurred when handing the packets to the socket. This might happen due to various reasons, including full buffer or no available memory.
If you’re dropping video frames on the sender side (framesEncoded > framesSent), then in all likelihood the network buffer on the device is full, causing a send failure. Here you should check the resources available on the device – especially memory and CPU – or just understand the network traffic you are dealing with.
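As a rough sketch, the sender side check can be done over the same kind of plain stats objects. The field names are the ones discussed above, while SenderStatLike and summarizeSenderDiscards are names I made up for the example.

```typescript
// Only the fields needed from "outbound-rtp" and "candidate-pair" stats entries.
interface SenderStatLike {
  type: string;
  kind?: string;
  framesEncoded?: number;
  framesSent?: number;
  packetsDiscardedOnSend?: number;
}

// Counts frames that were encoded but never sent, and packets discarded on send.
function summarizeSenderDiscards(stats: SenderStatLike[]): {
  framesNotSent: number;
  packetsDiscardedOnSend: number;
} {
  let framesEncoded = 0;
  let framesSent = 0;
  let packetsDiscardedOnSend = 0;
  for (const s of stats) {
    if (s.type === "outbound-rtp" && s.kind === "video") {
      framesEncoded += s.framesEncoded ?? 0;
      framesSent += s.framesSent ?? 0;
    } else if (s.type === "candidate-pair") {
      packetsDiscardedOnSend += s.packetsDiscardedOnSend ?? 0;
    }
  }
  return {
    framesNotSent: Math.max(0, framesEncoded - framesSent),
    packetsDiscardedOnSend,
  };
}
```

If framesNotSent keeps growing over the call while packetsDiscardedOnSend grows along with it, that is a decent hint that the send path on the device is where things are going wrong.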
Maintaining media quality in WebRTC
Media quality in WebRTC is a lot more than just dealing with bitrates or deciding what to do about packet losses. There are many aspects affecting media quality and they all do it dynamically throughout the session and in parallel to each other.
This time, we looked into why WebRTC discards media packets during calls. We’ve seen that there are many reasons for it.
To learn more about media processing and everything else related to WebRTC, check out these services:
The post Reasons for WebRTC to discard media packets appeared first on BlogGeek.me.