There’s a lot of fuzz around lately about WebRTC. Which is really about SRTP. Which is really important. But also really misplaced.
Before I Begin

This all started when Google Project Zero, a team tasked with actively searching for zero-day bugs (nasty crashes and similar bugs that might be exploited by hackers), set its sights on video conferencing and WebRTC. The end result of it all is a GitHub repository with tools to test RTP streams (and some filed bugs).
A few things to put the house in order:
Now that we’ve cleared the air – let’s check what’s all that fuzz. Shall we?
What Fuzzing means

Wikipedia has this to say about fuzzing:
Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
For me, fuzz testing is about the generation of malformed inputs in ways that the developers haven’t anticipated or tested for. This results in undefined behavior, which is largely a nicer way of saying a bug. In some cases, the bug will be an innocent one. In other cases, it can be nasty:
The type of bugs that can be found is endless, which makes for really good FUD (fear, uncertainty, doubt) and lore.
A good malformed input can theoretically be used to grant you administrative access to a machine or to allow you to read memory where you shouldn’t have access to.
A simple explanation can be this: assume your software expects a user’s email to be 40 characters long. Shorter than that is obviously fine, but what will happen if you use an email that is longer than 40 characters? Somewhere along the line, there should be a piece of code that checks the length and flags it as too long. And if there isn’t… well… we’ve reached the realm of undefined behavior and potential security bugs.
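To make that concrete, here’s a minimal sketch in TypeScript. The wire format and function names are made up for illustration – the point is only the missing bounds check:

```typescript
// Hypothetical field layout: one length byte, then the email characters.
const MAX_EMAIL = 40;

function parseEmailUnsafe(payload: Uint8Array): string {
  // Bug: blindly trusts the length byte sent by the peer. In a
  // memory-unsafe language, copying `len` bytes into a 40-byte buffer
  // is an overflow; here it "merely" reads into whatever fields follow.
  const len = payload[0];
  return new TextDecoder().decode(payload.subarray(1, 1 + len));
}

function parseEmailSafe(payload: Uint8Array): string {
  const len = payload[0];
  if (len > MAX_EMAIL || 1 + len > payload.length) {
    throw new Error("malformed input: email field too long");
  }
  return new TextDecoder().decode(payload.subarray(1, 1 + len));
}
```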
The same can happen in network protocols, where whatever you send “on the wire” has a structure of sorts. The machines need structure to be able to parse the data and act upon it. So if you change the data so it is close to the expected structure, but off by just a bit – you might get to that realm of undefined as well.
Fuzzing is trying to get to that place – adding randomness in just the right places to get to undefined software behavior.
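In its simplest form, that’s all a mutation fuzzer is. Here’s a sketch – `parsePacket` and `validSamplePacket` are placeholders for your own parser and a known-good capture:

```typescript
declare function parsePacket(input: Uint8Array): void; // code under test
declare const validSamplePacket: Uint8Array;           // known-good input

// Flip a few random bits in a copy of a valid input.
function mutate(valid: Uint8Array, flips = 3): Uint8Array {
  const out = new Uint8Array(valid);
  for (let i = 0; i < flips; i++) {
    const pos = Math.floor(Math.random() * out.length);
    out[pos] ^= 1 << Math.floor(Math.random() * 8);
  }
  return out;
}

// Hammer the parser until something misbehaves.
for (let i = 0; i < 100_000; i++) {
  const input = mutate(validSamplePacket);
  try {
    parsePacket(input);
  } catch (e) {
    console.log("interesting input:", input, e); // crash/assert candidate
  }
}
```

Real fuzzers like AFL or libFuzzer layer coverage feedback on top of this loop, steering mutations toward unexplored code paths – but the core idea is the above.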
Let me tell you a bedtime story

My fuzzy life started in Finland, though I’ve never been there (yet).
One day at Oulu University, a new something called the “PROTOS Test Suite” was created. At the time, I was the project manager leading the development and maintenance of RADVISION’s H.323 protocol stack. We licensed it to many vendors around the globe, all using our source code to build VoIP products.
The PROTOS Test-Suite was all about security testing. The intent behind it was to find bugs that cause crashes and other ailments to those using H.323. And they chose the best possible entry point. Here’s how they phrased it:
The purpose of this test-suite is to evaluate implementation level security and robustness of H.225.0 implementations. H.225.0 is a protocol responsible for signalling and setting up H.323 calls. […]
The scope of the test-suite was narrowed to H.225.0 version 4 Setup-PDU. Rationale behind this selection was:
The important part is this: the guys at Oulu decided to go after the “pick up line” of H.323 and try to come up with nasty Setup messages that would confuse H.323 devices.
And confuse they did. PROTOS has 4497 Setup messages. On my first run with it, probably 50% of them caused our beloved H.323 stack to crash. I spent a week building the software to automate running it and fixing all the nastiness out of it. I admired the work they did and the work they made me do.
PROTOS practically analyzed how things go on the wire, and devised a set of messages bound to trip up the bad programming practices we all err on as humans. This isn’t exactly fuzzing in an automated fashion, but it is the “manual” equivalent of it.
This got its own CERT vulnerability note and we had a great time working with our customers on updating our stack and getting these security fixes to work.
I believe some of our customers actually upgraded and updated their systems due to this. I am sure many didn’t. I am also assuming many of our customers’ customers didn’t upgrade their own deployed equipment. And the world continued on. Happily enough.
All this took place in 2004. Before WebRTC. Before the cloud. Before mobile. With practically the same RTP/RTCP protocol and the same techniques and mechanisms in VoIP that we use today in WebRTC.
Why didn’t people look at RTP vulnerabilities at that time? We’ll get to that.
Google’s Project Zero and video conferencing

This year, Google Project Zero decided to look at video conferencing. The “way in” was through WebRTC. Natalie Silvanovich was tasked with this, and she wrote a series of 5 posts about it. The first one was about her selection and her adventures with WebRTC itself. In it, she writes:
I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. […] WebRTC uses SDP for signalling.
I reviewed the WebRTC SDP parser code, but did not find any bugs. I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. […]
I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. […]
Setting up end-to-end fuzzing was fairly time intensive […]
A few things that come to mind here:
Time intensive is important, as this raises the bar for those wishing to exploit such a weakness.
The fact that RTP isn’t the first attack surface and isn’t the first layer of interaction makes it somewhat less obvious how to exploit it (besides instigating DDoS attacks on devices and servers).
Coupling these two – the complexity and the non-obviousness of an exploit – is what kept people from putting the effort into it up until today.
The Fuzzy feelings of our WebRTC industry

Ben Hawkes, Project Zero’s team lead, tweeted about the posts; his tweets garnered three-digit likes and retweets, tapering off for the last two posts (I attribute that to fatigue with the subject):
Project Zero blog: "Adventures in Video Conferencing Part 1: The Wild World of WebRTC" by @natashenka – https://t.co/pdtZLDDP9M
— Ben Hawkes (@benhawkes) December 4, 2018
That kind of sharing is an average day for most posts published by that team. A few immediately took the cue and started fuzzing on their own. A notable example is Philipp Hancke who aimed at the Janus media server and fuzzed REMB RTCP messages.
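REMB (Receiver Estimated Maximum Bitrate) is a small RTCP feedback message, which makes it a convenient fuzzing target. For a taste of what such fuzzing produces, here’s a sketch that builds a plausible REMB packet and then lies in its SSRC count field – one of many mismatches a fuzzer will stumble into. The layout follows the REMB draft; the specific corruption is illustrative, not Hancke’s actual test case:

```typescript
// Build a REMB packet whose header claims more SSRC entries than it carries.
function buildBogusRemb(senderSsrc: number): Uint8Array {
  const buf = new Uint8Array(24);
  const view = new DataView(buf.buffer); // big-endian = network byte order
  buf[0] = (2 << 6) | 15;                // V=2, P=0, FMT=15 (app layer FB)
  buf[1] = 206;                          // PT=206, payload-specific feedback
  view.setUint16(2, buf.length / 4 - 1); // length in 32-bit words minus one
  view.setUint32(4, senderSsrc);         // packet sender SSRC
  view.setUint32(8, 0);                  // media source SSRC, 0 for REMB
  buf.set([0x52, 0x45, 0x4d, 0x42], 12); // "REMB" identifier
  buf[16] = 0xff;                        // Num SSRC: claim 255 entries...
  buf[17] = 0x03;                        // BR exp=0, mantissa high bits
  buf[18] = 0xff; buf[19] = 0xff;        // mantissa low bits (~262 kbps)
  view.setUint32(20, senderSsrc);        // ...but carry only one entry
  return buf;
}
```

A parser that trusts the count field and walks 255 entries will read far past the end of the 24-byte packet – exactly the class of bug this kind of fuzzing shakes out of media servers.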
His attack was quite successful due to several reasons:
Probably not.
And let’s face it – in the list of tests that you want to do but don’t do today, fuzzing fits nicely near that end – the things you just never find the time or priority to handle.
The good thing? For most of us, fuzzing is something that “others” should be doing.
If you are using a CPaaS vendor, it is their task to protect their signaling and media servers against such attacks.
If you run on top of the browser… well… those who maintain the WebRTC code for the browser need to do it (and it is Google for the most part at the moment).
You should think about fuzzing in your own application logic and the things that are under your control, but the WebRTC pieces? Going down the rabbit hole of fuzzing RTP and RTCP packets? Not for you.
Your role here is to ask the vendors you work with if they have taken steps in the area of security testing and what exactly they have done there. Fuzzing needs to be one of those things.
Who should care about fuzzing?

There’s a shortlist of people that need to deal with fuzzing.
Fuzzing isn’t the first thing that comes to mind when you set off to build your business.
We are at a point where we are dealing with and addressing fuzzing, and the RTP layer is where people seem to be doing it (at least a bit). We’ve come a long way since we started with WebRTC, and it is a good sign.
To Fuzz or not to Fuzz? Where should you spend your energies with WebRTC? If you need help with that, just contact me.
The post All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC Applications appeared first on BlogGeek.me.
Fuzzing is a quality assurance and security testing technique that provides unexpected, often random data to a program input to try to break it. Natalie Silvanovich from Google’s Project Zero team has had quite some fun fuzzing various RTP implementations recently.
She found vulnerabilities in:
In a nutshell, she found a bunch of vulnerabilities just by throwing unexpected input at parsers.
Continue reading Let’s get better at fuzzing in 2019 – here’s how at webrtcHacks.
Chrome=The web. Is that a good thing or a bad thing?
I’ve always said that Chrome is almost the only browser we need. Microsoft Edge was always an easy target to mock. And it now seems that Microsoft has thrown in the towel on Edge and its technology stack as a differentiating factor and has decided to *gasp* use Chromium as the engine powering whatever comes next.
A long explanation from Microsoft on the move was published on github (more on GitHub later).
What are Browsers made of?

I’ll start with a quick explanation of how I see a browser’s architecture. It is going to be rather simplistic and probably somewhat far from the truth, but it will be good enough for us for now.
A browser is built out of two main pieces: the renderer and the runtime engine.
The Renderer deals with displaying HTML pages with their CSS styling. Today, it probably also deals with CSS animation. It is what takes your webpage and renders it into something that can be displayed on the screen.
The Runtime Engine is all about executing JavaScript code inside the browser. It is what makes modern browsers interactive. It is usually called a JavaScript Engine, but it already runs WebAssembly as well, hence my preference for referring to it as the Runtime Engine.
On top of these two pieces sits the browser engine itself, which is later wrapped by the browser.
Who Uses What?

That illustration of the browser makeup above? It shows in gray the components that Google uses in Chrome. Each browser vendor picks and chooses its own components.
In the past, we effectively had 3 browser engines: “Firefox”, “Internet Explorer” and “WebKit”.
WebKit was used by both Safari and Chrome. That lasted until 2013, when Google decided to part ways and create Blink – it started by deleting everything it didn’t use out of WebKit and continuing from there. In a way, it is a fork of WebKit, to the point that code integrated into WebKit oftentimes comes directly by porting it en masse from Blink/Chromium (this is how WebRTC is implemented in Safari/WebKit today).
Up until a year ago, we had 4 roughly independent browser engines for the major 4 browsers:
Internet Explorer is all but dead.
Edge never got useful market share and is now moving to embrace Chromium.
Apple’s Safari… I am not sure how much Apple cares about Safari, and besides, WebKit gets its fair share of code from Google’s Blink project. On top of it all, it runs only on Apple devices, limiting its popularity and use.
In a way, we’re down to two main browser stacks: Google’s and Mozilla’s
Mozilla wrote about the end of the line for EdgeHTML and they are spot on:
If one product like Chromium has enough market share, then it becomes easier for web developers and businesses to decide not to worry if their services and sites work with anything other than Chromium. That’s what happened when Microsoft had a monopoly on browsers in the early 2000s before Firefox was released. And it could happen again.
I tried Firefox and Edge a year or two ago. They worked well enough. But somehow they weren’t Chrome (possibly because I am a heavy user of Google services), so it just made no sense to stick with any of them when Chrome feels so much like “home”.
Does the current state of affairs lift Chromium to the status of Linux? More on that a bit later in this article.
Chrome’s Dominance

I’ve taken a snapshot of StatCounter’s desktop browser market share:
If you are more interested in the numbers than that boring visual line, then here you go:
Chrome with over 72%; IE and Safari at 5%; Edge at 4%.
Firefox has a single digit 9%.
Funnily enough, all non-Chrome browsers are trending downwards. Even Safari, which should enjoy growth due to the increase of Mac machines out there (for some unknown reason they are popular with developers these days – go figure).
Even if you ignore the desktop and check mobile only (see here), Chrome gets some 53% versus Safari’s 22%.
Investing in browser development isn’t a simple task. There are several vectors that need to be pursued at all times:
It would be safe to say that Chrome enjoys hundreds of Google employees developing code that goes directly into the Chrome browser.
Where will Microsoft take Edge?

Microsoft under the lead of CEO Satya Nadella has shifted towards the cloud and is doubling down on the enterprise. To a big extent, its Xbox business is an anomaly in the Microsoft of 2018.
Where once Microsoft was all about Windows and the Office suite, it has shifted towards Office 365 (subscription versus licensing business model for Office) and its Azure cloud. Windows is still there, but its importance and market dominance are a far cry from where it was a decade ago. Microsoft knows that and is making the necessary changes – not to win back the operating system market, but rather to grow its businesses on other core competencies and assets.
Microsoft Edge was an attempt to shed Internet Explorer. Give its browser a complete rewrite and bring something users would enjoy using. That hasn’t turned out well. After all the investment in Edge, it had a small market share to show for it, with many of the users switching to Windows 10 opting to switch to Chrome instead of Edge.
This user behavior is surprising, to say the least. With a default browser that is good enough (Edge), why would users make the conscious decision of browsing to chrome.com to download and install a different browser that does what Edge does?
Microsoft tried and failed to change this user behavior, which led it to the conclusion that Edge, or at least the innards of Edge are a waste of resources.
Why does opting for Chromium as a browser engine make sense for Microsoft?
As Microsoft shifted to the cloud, with Edge focused on web standards, the end result was that anything and everything Microsoft invested in for its web-based services (Office 365, for example) had to work first and foremost on Chrome – that’s where the users are anyway.
Google is using Chrome to drive proprietary initiatives to optimize its services for users and push them as standards later (think SPDY turning into HTTP/2, QUIC, or its latest Project Stream). It can do this due to its market dominance in browsers and the huge amount of web assets it operates. Microsoft never had that with Edge, so any proprietary initiative on Microsoft’s part in web technologies was bound to fail.
Microsoft derived no value out of maintaining its own browser technology stack, and investing hundreds of developers in it was an expensive and useless endeavor.
So it went with Chromium.
Chromium brings one more benefit – theoretically, Microsoft can now push its browser to non-Windows 10 devices. Mac and Linux included. And since Microsoft is interested more in Office and Azure than it is in Windows, having an optimized “window” towards Office and Azure in the form of a Chromium-based Microsoft browser that works everywhere made sense.
This also shows where Microsoft does want to focus its browser efforts – the user interface and experience, as well as delivering Microsoft services to customers.
Microsoft cannot forgo having its own browser and just pre-install Chrome or even Firefox on its Windows operating system. That would mean ceding too much control to others. It has to have its own browser.
Windows Chromiumized

Remember that browser architecture I shared in the beginning? It is changing in one critical way. Google decided to create an “operating system” and call it Chrome OS, which ends up being based to some extent on the browser itself:
We spend more time in front of web applications that reside in the browser (or in Electron apps) and less inside native apps. This means that in many ways, the browser is the operating system.
Google derives all of its value from the internet, with the browser being the window there.
Microsoft is heading in the same direction, and where it matters for its operating system, it now finds itself competing against Chrome OS and Chromebooks – a huge threat to Microsoft and Office.
And obviously, there’s a “lite” version of Windows in the works, at least by the reports on Petri. Is this related to Edge using Chromium in some way? Would Windows Lite be web focused in the same way that Chrome OS is?
Who Controls Chromium? And is it the new Linux?

Back to Chromium, and the reasons the Microsoft news is making ripples around the web about openness and positive fragmentation.
Browsers are becoming operating systems in many ways. Can we correlate between Linux and its ecosystem to Chromium and its growing ecosystem?
Linux and Ownership

I’d say that these are two distinctly different cases. If anything, Chromium’s status should worry many out there. It is less about monocultures, openness and lofty words, and more about control and competitive advantage.
On opensource.com, Greg Kroah-Hartman wrote a piece two years ago titled 9 lessons from 25 years of Linux kernel development. Here’s lesson 6:
6. Corporate participation in the process is crucial, but no single company dominates kernel development.
Some 5,062 individual developers representing nearly 500 corporations have contributed to the Linux kernel since the 3.18 release in December of 2014. The majority of developers are paid for their work—and the changes they make serve the companies they work for. But, although any company can improve the kernel for its specific needs, no company can drive development in directions that hurt the others or restrict what the kernel can do.
This is important.
Who really controls Linux? Who owns it? Who decides what comes next? The fact that there are no clear answers to these questions is what makes Linux so powerful and so useful to the industry as a whole.
Chromium and Google

Does the same apply to Chromium?
Chromium is a Google owned project. Hosted on a Google domain. Managed using Google tooling. Maintained by Google. This includes all the main browser pieces that are created, controlled and owned by Google to a large extent: the V8 JavaScript Engine, Blink web renderer and Chromium itself.
When someone wants to contribute to Chromium, they need to go through a rigorous process. One that takes place at Google’s leisure and based on its priorities. This is understandable. Chromium is what Chrome is made of, and Chrome gets released to a billion users every 6-8 weeks. Breakage there ends with backlash. Security holes there mean vulnerability at a large scale.
While these aspects of stability and security are there with Linux as well, when it comes to Chromium, Google is the one that is setting the priorities.
It doesn’t end with priorities. It goes to the types of web experiments and proprietary features that end up in Chrome. Since Google controls and owns the Chromium stack… it can do as it pleases.
Will Google cede control of Chromium just because?
No.
It might benefit the open-whatever if it did, but it would also slow down innovation and won’t further Google’s own cause.
Microsoft and Chromium

Microsoft is painting this in colors of open source and collaboration with the industry.
It isn’t.
This is about Microsoft going with Chromium because Edge took a few bad turns in its strategy from the get go:
Going with Chromium means two things to Microsoft:
The only challenge here is that it comes to Chromium as just another vendor. Not a partner or an owner.
A Single WebRTC Stack

At the recent Kranky Geek event, Microsoft discussed its WebRTC on UWP project. Part of it was about merging the changes it made to the WebRTC code back into webrtc.org (=the code that goes into Chrome). Here’s how James Cadd framed it in his session:
… after 4 years of maintaining a fork on github, we’ve been discussing with Google the possibility of submitting this back to the webrtc.org repo and we’re working on that now. The caveat is that there’s no guarantee that we’ll get 100% of the way there. We’re mostly using the public submission process, so we’re going through reviews just like everyone does, but that’s our goal.
The UWP specific changes are going to live in sdk-contrib-windows so we will have our own little area to contribute this back. Microsoft has committer rights there, so we’ll be able to keep everything moving there. […]
So just wanted to say thank you to Google for that opportunity. We’re looking forward for the collaboration.
A master and a slave? A landlord and a tenant? A patron and a client? Two partners? I am not sure what the exact relation here is, but it should be similar to what Microsoft has probably struck with Google across the board for all Chromium related technologies that are dear to Microsoft in one way or another.
Is a single stack good or bad?
If we look at it from a browser level perspective, we aren’t in a different position in the technology diversity than 8 years ago:
And here’s where we are today:
The main difference is market share – Chrome is eating up the internet with Blink and Chromium. Factor in Node.js, which uses the V8 JavaScript engine, and you get the same tech running servers as well.
WebRTC specifically though? It now runs on webrtc.org code only. All browser vendors pick bits and pieces from it for their own implementations, and while there are differences between browsers, they aren’t many.
As I said before in many of my articles here – most developers today can simply develop their code for Chrome and be done with it; adding support for more browsers only if they really really really need to.
Browsers are one piece of getting WebRTC to run. Check out what else you’ll need in this free video series unraveling the server side story of WebRTC:
Register to the video series
Could Microsoft Buy Their Way into Browser Market Share?

Not really. If they could have, they would have done so instead of going Chromium.
Let’s start from why such a move would be appealing.
GitHub

The recent acquisition of GitHub by Microsoft can be taken as a case in point. Especially considering the varied reactions it brought across the board.
6 months after that announcement, the sky hasn’t fallen. Open source hasn’t been threatened or gobbled up by Microsoft. And Microsoft is even using GitHub for its own projects and to announce its own initiatives – Edge using Chromium, for example.
Time will tell, but my gut tells me that Microsoft’s acquisition of GitHub is as meaningful as Facebook’s acquisition of Whatsapp and Instagram. These made little sense at the time from a valuation standpoint, but no one is doubting these acquisitions today.
With GitHub, Microsoft is buying its way into open source. Not only as lip service, but also in understanding how open source works. By owning a large portion of the open source interactions, and being able to analyze them closely, Microsoft can tell where developers are headed and what they are after. Microsoft was always successful due to the developers using their platform (top notch tools for developers – always). GitHub allows them to continue with that in an open source world.
Then why not the browser market?
There were two assets that could be acquired here – Mozilla and Electron.
Electron

Electron is already developed and maintained by GitHub directly. Microsoft owns it already.
What advantages does Microsoft derive from Electron? None, assuming you remember that Electron runs on top of Chromium.
From a strategic standpoint, there’s no value in Electron for Microsoft. At the end of the day, Electron is a window to Chromium and to web applications.
Microsoft is using it for its own cross platform applications – Skype on Linux has been known to use Electron for several years now.
Owning Electron through GitHub doesn’t help Microsoft in its browser market share.
Mozilla

Mozilla would have been an interesting acquisition.
Similarly to GitHub, it would be acquiring the obvious open source vendor. The challenge here is twofold:
Furthermore, acquiring Firefox as a window to Microsoft’s services and assets in the cloud is exactly one of those things that Mozilla is fighting Google over. It would be counterproductive to go there.
—
Microsoft has no one to buy in order to improve its position and market share in browsers.
It could only continue to fight it out with Edge or partner. And it decided to partner with the goliath in the room (an elephant wouldn’t be visible enough).
Will Chrome Reign Supreme?

Yes.
Anyone think otherwise?
The post Is Chrome on its Way to be ONLY Browser out there? (Microsoft throwing the towel on Edge) appeared first on BlogGeek.me.
What Does Machine Learning Have to do with MOS Scores?
Human subjectivity in MOS calculations doesn’t hold water when it comes to heterogeneous environments. That’s where machine learning comes into play.
MOS score. That Mean Opinion Score. You get a voice call. You want to know its quality. So you use MOS. It gives you a number between 1 and 5. 1 being bad. 5 being great. If you get 3 or above – be happy and move on, they say. If you get 4.something – you’re a god. If you don’t agree with my classification of the numbers, then read on – there’s probably a good reason why we don’t agree.
Anyways, if you go down the rabbit hole of how MOS gets calculated, you’ll find out that there isn’t a single way of doing that. You can go now and define your own MOS scoring algorithm if you want, based on tests you’ll conduct. From that same Wikipedia link about MOS:
“a MOS value should only be reported if the context in which the values have been collected in is known and reported as well”
Phrased differently – MOS is highly subjective, and you can’t really compare MOS scores produced on one device to MOS scores produced on another.
This is why I really truly hate delving into these globally-accepted-but-somewhat-useless quality metrics (and why we ended up with a slightly different scoring system in testRTC for our monitoring and testing services).
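To make the subjectivity concrete: about the only firmly standardized piece is the ITU-T G.107 E-model’s final mapping from a “transmission rating” R (0-100) to MOS. Everything that feeds into R – how you weigh delay, loss, codec distortion – is where implementations diverge. Here’s that mapping as a sketch:

```typescript
// ITU-T G.107 E-model: map transmission rating R to an estimated MOS.
// Computing R itself is where every vendor does things differently.
function rFactorToMos(r: number): number {
  if (r < 0) return 1;
  if (r > 100) return 4.5;
  return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6;
}
```

Two devices feeding different R computations into this same curve will happily report different MOS values for the same call – which is the whole problem.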
What Goes into MOS Scoring Calculations?

Easy. Everything.
Or at least everything you have access to:
Here are a few examples:
Physical desk phone

A physical IP phone has access to EVERYTHING. All the software and all the hardware.
It even knows how the headset works and what quality it offers.
Theoretically then, it can provide an accurate MOS that factors in everything there is.
Android native app

Android apps have access to all the software. Almost. Mostly.
The low-level device drivers are known, as is the hardware the app is running on. The only problem is the number of potential devices. A few years back, these types of visualizations of Android fragmentation were in fashion:
This one’s from OpenSignal. Different devices have different location for their mics and speakers. They use different device drivers. Have different “flavors” of the Android OS. They act differently and offer slightly different voice quality as well.
What does measuring what a listener thinks about the quality of a played audio stream mean in such a case? Do we need to run that testing per device?
Media server that routes voice around

Then we have the media server. It sends and receives voice. It might not even decode the audio (it could, and sometimes it does).
How does it measure MOS? What would it decide is good audio versus bad audio? It has access to all packets… so it can still be rather accurate. Maybe.
WebRTC inside a browser

And we have WebRTC. Can’t write an article without mentioning WebRTC.
Here though, it is quite the challenge.
How would a browser measure MOS of its audio? It can probably do as good a job as an Android device. But for some reason, MOS scoring isn’t part of the WebRTC bundle. At least not today.
So how would a JavaScript web application calculate MOS of the incoming audio? By using getStats? That has access to an abstraction on top of the RTCP sender and receiver reports. It correlates to these to some extent. But that’s about as much as it has at its disposal for such calculations, which doesn’t amount to much.
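Not much – but not nothing. Here’s a hedged sketch of what such an estimate could look like, reusing rFactorToMos() from above. The getStats() fields are standard; the arithmetic turning them into R is a popular rule of thumb, not anything normative:

```typescript
// Estimate a MOS-like score for incoming audio from getStats().
async function estimateAudioMos(pc: RTCPeerConnection): Promise<number> {
  const stats = await pc.getStats();
  let rttSec = 0, jitterSec = 0, lossPct = 0;
  stats.forEach((report: any) => {
    if (report.kind !== "audio") return;
    if (report.type === "inbound-rtp") {
      jitterSec = report.jitter ?? jitterSec;
      const total = (report.packetsLost ?? 0) + (report.packetsReceived ?? 0);
      lossPct = total > 0 ? (100 * (report.packetsLost ?? 0)) / total : 0;
    } else if (report.type === "remote-inbound-rtp") {
      rttSec = report.roundTripTime ?? rttSec; // RTT measured via RTCP
    }
  });
  // Rule of thumb: one-way delay plus a jitter penalty, then knock
  // points off R for latency and packet loss.
  const effLatencyMs = (rttSec * 1000) / 2 + jitterSec * 1000 * 2 + 10;
  let r = effLatencyMs < 160
    ? 93.2 - effLatencyMs / 40
    : 93.2 - (effLatencyMs - 120) / 10;
  r -= lossPct * 2.5;
  return rFactorToMos(r);
}
```

Which is exactly the point: this estimate knows nothing about the microphone, the speaker, the device drivers or the room – only what the RTCP reports happened to carry.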
Back to MOS calculations

But what does MOS really calculate?
The quality of the voice I hear in a session?
Maybe the quality of voice the network is capable of supporting?
Or is it the quality of the software stack I use?
What about the issue with voice quality when the person I am speaking with is just standing in a crowded room? Would that affect MOS? Does the actual original content need to be factored into MOS scores to begin with?
I’ll leave these questions open, but say that in my opinion, whatever quality measurement you look at should offer some information about the things that are in your power to change – at least as a developer or product owner. Otherwise, what can you do with that information?
What Affects Audio Quality in Communications?

Everything.
I am sure I missed a bullet or two. Feel free to add them in the comments.
The thing is, there’s a lot of things that end up affecting audio quality when you make the decision of sending it through a network.
Is Machine Learning Killing MOS Scoring or Saving It?

So what did we have so far?
A scoring system – MOS – which is subjective and inaccurate. It is also widely used and accepted as THE quality measure of voice calls. Most of the time, it looks at network traffic to decide on the quality level.
At Kranky Geek 2018, one of the interesting sessions for me was the one given by Curtis Peterson of RingCentral:
He discussed the problem of having different MOS scores for the SAME call at each device the call passes through in the network. The solution was to use machine learning to normalize MOS scoring across the network.
This got me thinking further.
Let’s say one of these devices provides machine-learning-based noise suppression. It is SO good that it is even employed on the incoming stream, as opposed to placing it traditionally on the outgoing stream. This means that after passing through the network, and getting scored for MOS by some entity along the way, the device magically “improves” the audio simply by reducing the noise.
Does that help or hurt MOS scoring? Or at least the ability to provide something that can be easily normalized or referenced?
Machine Learning and Media Optimization

We’ve had multiple vendors at Kranky Geek touching the domain of media optimization. This year, their focus was mainly on video – both Agora.io and Houseparty gave eye-opening presentations on using machine learning to improve the quality of a received video stream, each taking a different approach to tackling the problem.
While researching for the AI in RTC report, we’ve seen other types of optimizations being employed. The idea is always to “silently” improve the quality of the call, offering a better experience to the users.
In the next couple of years, we will see this area growing fast, with proprietary algorithms and techniques based on machine learning added to the arms race of the various communication vendors.
Interested in more of these sessions around real time communications and how companies solve problems with it today?
Subscribe to our YouTube channel
The post What Does Machine Learning Have to do with MOS Scores? appeared first on BlogGeek.me.
It is about time for video room systems to adopt WebRTC native approaches.
When I first started this blog, I had no clue where it was going to take me. I wanted it to be about developers. To be interesting. I also decided early on to write three posts about WebRTC:
Somehow, I ended up covering a lot more ground since then when it comes to WebRTC…
Signaling has come a long way since then. Most of you might not even know what H.323 is. SIP is still important, but a lot less these days. Proprietary signaling mechanisms are thriving – and that’s a good thing.
The thing that never did come to play was WebRTC in video room systems. When you went to purchase a room system, you were tethered to the vendor providing you that system, along with the signaling standards it supported. It is still painfully hard to connect room systems of different vendors. And if you factor in the need to integrate it with other services the enterprise uses, it becomes even worse.
What’s a Video Room System Anyway?

This is called a codec for some arcane reason.
A video room system is a device split into 4 parts in most cases:
The TV display itself is almost never included in the package (unless you’re starting to look at the new touch boards).
Speaker pods are sometimes integrated into the camera itself. This is suitable for smaller meeting rooms, also known as huddle rooms.
Remote controls were always nasty. A meeting room will have at least 3 of those: one for the TV, one for the projector in the room and one for the video room system. The one for the video room system is somehow the most complex to use. The projector one is gone along with the projector, now that we all just use the TV(s) instead.
In many cases, an external touch panel will be used to control the gizmos in the room, including lighting and other moving parts. And today, in many cases, these room systems are capable of tethering themselves to apps on smartphones for the control, killing the need for the remote control altogether.
The brains? They are sometimes just wrapped into the same box as the camera, just to save on cabling and space.
It started off as an all-custom solution. The hardware, the software – it was all proprietary and specific. DSPs made up the “brains”. High-end cameras were purchased and branded from Sony. The software was written on embedded operating systems like VxWorks (anyone remember that painful thing?).
We’ve standardized some of it as time went by. Cameras have become somewhat of a commodity, now that we’re all carrying powerful ones in our pockets. Operating systems for these devices have moved on to be Linux based. DSPs are less common now that we can just use SoCs (systems on a chip, packing the host operating system and the DSPs nicely together) or just rely on Intel chips.
What never happened is the standardization and commoditization of the software in the brains – the actual video software running the room system.
Let’s Talk UCaaS

That may finally be changing. As we head to the cloud, UCaaS (unified communications as a service) vendors are beefing up their offerings. Adding contact centers, APIs, video support and other trinkets to their war chests.
In the past few months, we’ve seen:
Each of these vendors today uses a third party for its video calling services, but can now potentially displace it with its own technology stack.
While that solves their video software issues, how are they going to handle video room systems?
Let’s see what the other notable players have done in that domain:
Vonage, 8×8 and RingCentral aren’t hardware vendors. They aren’t going to start designing and manufacturing video room systems. When it comes to physical phones, they partner with multiple device manufacturers. This is hard work when it comes to integration, to adding more devices into the fold and to trying to introduce new features. The video room system types of devices are limited today. Polycom offers partner-friendly solutions. Logitech sells components/peripherals (mainly cameras). Lifesize has its own cloud service. And again, integrating these video room systems with other features and capabilities is sometimes close to impossible.
On the other end of the spectrum, there’s the customer. Banking on one UCaaS supplier is fine, but if you invest in hardware devices, will they be usable when switching to another vendor? What if you want more than a single service to run on a room system? Let’s say you want to record and transcribe physical meetings taking place in a room – when not on a call. Does the UCaaS vendor or the video room system vendor need to add such a capability? Can you add it on your own by partnering with a totally different vendor while still using the same hardware?
Now, here’s the thing:
How can you partner with video room system vendors (even if there are ones) in a way that is relatively easy?
You Redefine What a Room System is

The one thing that is now changing is the software that is built into a video room system.
That is done by first changing the operating system. Instead of Linux – Android.
And Android means we can start thinking of a video room system as a device that can run multiple different applications by different vendors for different tasks.
Need to run Zoom? Why not?
Wanna switch to GoToMeeting? Fine.
How about attending a WebEx call? Sure.
Just install any of these apps – or better yet – try joining them from an integrated Chrome browser if they happen to support WebRTC.
But what if you want to show internal news for your company on that display connected to the video meeting room? Or give the ability to record and transcribe local meetings? Or connect to other internal or external services with ease? Not a problem. Just install that app on Android and you’re ready to go.
The difference here is that there is no integration work required from the video room system vendor. This is something the UCaaS vendor can do – or god forbid – the actual enterprise who is using the video room system.
I’ve been waiting for this level of commoditization and flexibility to take place.
Enter HELLO 2

One of the vendors in this space is Solaborate. I interviewed Labinot years ago on this blog. That was about his enterprise social network service. Since then, he’s added a hardware device called HELLO, which successfully launched on Kickstarter; he is now running a Kickstarter campaign for HELLO 2.
The HELLO 2 is an “all in one” video room system capable of what I was looking for:
The best though? It runs on Android, so you can either use the HELLO 2 / Solaborate applications or any other application you fancy using (that said, the applications may not be as polished on the big screen as they are on a phone or a tablet and that requires a bit of reworking on their end).
This gives some real flexibility:
One more thing – you can run Chrome directly on the HELLO 2, and it will successfully operate any WebRTC based web page with it.
The Future

This is the model of the future when it comes to video room systems. Generic types of devices, packing all the needed hardware, letting other vendors and customers handle the software components.
And today, there’s no easier way to do that than using Android as the baseline operating system. Having a Chrome browser inside the device is just an added bonus to let you join with guest access to those pesky calls your suppliers and customers schedule on their own services.
The post HELLO 2. Is Hardware Gear Finally Taking WebRTC Seriously? appeared first on BlogGeek.me.
Digital Ocean offers a private LAN for internal communication between VMs, and they claim it’s isolated from other customers. You get some random addresses within 10.133.0.0/16 (or maybe some other range), and the VMs can talk to each other over dedicated virtual NICs.
But that’s it. You cannot run OSPF because multicast packets are not let through. Even if you manage to configure direct neighbors in OSPF, it’s useless, because the private LAN does not forward packets with destination IP addresses outside of the LAN range. So, any kind of routing with a next hop in the private LAN will not work.
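If you want to verify the multicast part yourself, a quick probe is enough – one VM joins the OSPF AllSPFRouters group (224.0.0.5) and listens, another sends to it. On a network that eats multicast, the receiver stays silent. A sketch using Node.js dgram (note this is a plain UDP probe for the multicast path, not real OSPF, which rides directly on IP protocol 89):

```typescript
import dgram from "node:dgram";

const GROUP = "224.0.0.5"; // OSPF AllSPFRouters group
const PORT = 5000;         // arbitrary probe port

// Run with ROLE=recv on one VM, ROLE=send on the other.
if (process.env.ROLE === "recv") {
  const sock = dgram.createSocket({ type: "udp4", reuseAddr: true });
  sock.on("message", (msg, rinfo) =>
    console.log(`multicast works: got "${msg}" from ${rinfo.address}`));
  sock.bind(PORT, () => sock.addMembership(GROUP));
} else {
  const sock = dgram.createSocket("udp4");
  sock.send("hello", PORT, GROUP, (err) => {
    console.log(err ? `send failed: ${err.message}` : "probe sent");
    sock.close();
  });
}
```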
Too bad guys, very disappointed. So, we need to resort to Tinc VPN for internal routing, and this private LAN doesn’t make any sense.
For me, Kranky Geek 2018 was a tremendously fun experience.
We had our fourth Kranky Geek event in San Francisco last week. As usual, it is a nerve-wracking experience up until the point it ends. And it doesn’t start on the day of the event itself – we’ve been busy with content curation, handling presentation drafts and doing dry runs for a few weeks.
The result is quite satisfying. We’ve decided this time to dig even deeper into the domain of artificial intelligence and machine learning and its role in real time communications. As I’ve been saying, WebRTC is ready – so what would be the point of doing an event about WebRTC? We have a lot of WebRTC topics already covered from our past events – and they are all available in the Kranky Geek YouTube channel.
The way we see it, there are 4 domains we had to cover: speech analytics, voicebots, computer vision and RTC optimization.
So we went hunting for content for the event. In the end, we were able to cover all four domains and squeeze in a few WebRTC specific topics as well.
The Sessions

This year, we had the biggest number of sessions yet. Over the years, the event has grown from a shorter gathering into a full-day one. The people I talked to noted that the day was long and tiring, but somehow, almost everyone stayed to the end. Here’s what we had this year:
Our own welcome

Kranky Geek SF 2018: AI in RTC from Tsahi Levent-levi
One thing to note here – our AI in RTC report got a promotional discount of ~33%, which will be available until the end of the month. If this space interests you, then definitely check it out.
Discord

Discord operates a large chat operation for gamers. Part of that service includes voice and video calling. At peak, they handle 2.8 million concurrent voice connections to their service.
What they shared were the changes they have made to the vanilla WebRTC codebase in order to fit their needs.
Facebook

Facebook were kind enough to give a presentation around Facebook Portal – their new home device that is capable of handling video calls (using WebRTC, of course). The device uses machine learning to track the people in the room during a call. They talked about the challenges that come with automating the camera’s zoom and with connecting calls from Portal devices to mobile phones.
This was the first time they shared that information publicly at a conference.
Intel

Intel announced they are open sourcing their media server – the Intel Collaboration Suite for WebRTC – under the name Open Media Streamer. They also shared information about svt-hevc, their open source HEVC encoder.
Voicebase

Voicebase talked about paralinguistics – the way we speak, as opposed to the words we are saying. They shared the path they took charting that space, and understanding what makes more or less sense in terms of value.
Voicera

Voicera discussed virtual assistants and how they need to understand transcriptions.
IBM

IBM explained the notion of voicebots and how they fit into contact centers. They explained the need to be able to hand off a voicebot conversation to a human agent.
Nexmo

Nexmo showed a demo using Dialogflow, connected to a voice service for ordering a pizza. It stressed the need to be able to connect communication services to various machine learning ones.
Dialpad

Dialpad explained how to take an open source speech-to-text engine and add custom words to it in order to improve the accuracy of the transcription.
Callstats

Callstats clustered the sessions they are collecting, trying to figure out from that information the type of call and the root cause of issues it may have.
RingCentral

RingCentral normalized MOS scores of audio calls across its network and devices, to be able to give a clear indication of call quality – it appears that while there’s a standard specification for MOS, asking device manufacturers to follow it to the letter is rather challenging, so they are “fixing” that issue using machine learning.
Google

Google talked about the current status and efforts in getting Chrome’s WebRTC implementation to the 1.0 specification. It also shared the work being done to improve audio stability and performance in Chrome (lots of architectural changes in how devices get accessed, in order to reduce the number of threads used and get a stable delay model for the acoustic echo canceller). There was also a look at what comes after 1.0 – WebRTC NV and the role WebAssembly may play there (I’ll write more about it in the future).
Agora

Agora showed how they use super resolution to improve video quality in calls, and what it means to run super resolution on a mobile device.
Houseparty

Houseparty used machine learning to improve video quality as well, taking a different approach. They shared the work they are doing and the effort it takes to bring it to production.
Microsoft

Microsoft shared the work done on WebRTC on UWP, and explained how AR/VR fits into the story and the enterprise use cases they are seeing in the market.
Session Recordings

As always, all the sessions were recorded and are available online.
Kranky Geek in 2019

Every year we’ve done a Kranky Geek event, we came in with the notion that this is the last one. Not sure why, but that was always the case. Then, about 9 months after the event, we’d start discussing the next event with Google.
We’ve changed that this time. We are going to do an event in 2019, and we have a name for it:
Kranky Geek SF 2019
We have a tentative date for the event: November 15, 2019
Put it in your calendar.
We don’t yet know what the theme for next year will be, but I have a hunch that it will include WebRTC and machine learning.
If you want to speak – contact me
If you want to sponsor – contact me
If you have feedback on what we should improve – you know – contact me
Oh – and if you are interested in AI in WebRTC, check out our report – there’s a discount available for it until the end of the month.
The post Kranky Geek 2018. A post event post appeared first on BlogGeek.me.
Jitsi was just acquired by 8×8, shifting hands from Atlassian. Here’s what to expect.
It seems that Jitsi has now switched hands, moving from Atlassian to 8×8.
Three months ago, Atlassian made a bold (desperate?) decision. It put up a white flag and decided to kill Stride after investing huge amounts of money and resources in it, to throw Hipchat along with it, and to “sell” them to Slack, who “acquired” them.
The weird thing in this acquisition was that Jitsi was left behind.
Jitsi is an open source media framework. One of the most popular WebRTC frameworks out there. I wrote about that acquisition in 2015. The reason behind it was Atlassian’s need to own the video communications technology that powered Hipchat. And now that Hipchat is gone, what would Atlassian need Jitsi for?
The last 3 years

The last 3 years have been good for Jitsi in Atlassian.
The team of developers it had was big, considering the project’s scope (and open-sourceness). Especially if you factor in that everything Hipchat (and Stride) needed from Jitsi was implemented directly inside Jitsi. Not on a private branch of the project available only to Atlassian.
Compare it to how Twilio treated Kurento after its acquisition… Atlassian did a great job at keeping Jitsi’s momentum and community. At the very least, it didn’t hurt the project, letting it grow and flourish, paying the salaries of its developers.
The interesting initiative that took place alongside the Jitsi open source project is Jitsi Meet – a free version of a group video calling service. One that wasn’t limited to a small number of participants or lower video resolutions.
Jitsi is in a better place than it was 3 years ago, prior to its acquisition.
Leaving Atlassian

Leaving Atlassian was a matter of time.
There was no room in today’s Atlassian for an open source project like Jitsi that brings no added value to its commercial products.
Jitsi didn’t go to Slack as part of the Hipchat/Stride deal. Slack were already using Janus, and moving on to their own homegrown media server – something they shared with us at Kranky Geek 2017 (hint: come join us this year at Kranky Geek 2018). There was no reason for them to further invest in yet another migration – or they might have wanted to migrate to Jitsi and acquihire the team, but it didn’t pan out.
That left Atlassian with one of 3 alternatives:
8×8 acquiring Jitsi is an interesting choice.
Here’s where things get interesting:
8×8 already has a WebRTC-based web conferencing solution called “8×8 Virtual Office Meetings Online”. Somewhere in 2016, this service was rewritten. At some point between then and now, guest access on Chrome was introduced – from the looks of it, based on WebRTC.
Why would 8×8 need/want Jitsi when it had a solution already?
I can think of three possible reasons for it:
What would 8×8 do with Jitsi?
The obvious thing is to integrate the tech into its meetings service. If it is already there, then use the Jitsi team of developers to tweak and fine-tune the thing for the 8×8 use case.
If it isn’t there yet, then integrate it and replace the current WebRTC tech in the meetings app. This is a more challenging undertaking, as Jitsi will need to meet the current feature list of what 8×8 already has in that domain, along with integrating into an existing codebase of a service and an application.
Jitsi probably has most of the needed features to make this happen. It wouldn’t have been acquired otherwise.
In a different area, 8×8 has no real open source activity at the moment. Its GitHub account is mostly forked repos. Searching for “8×8 open source” is dominated by the Jitsi acquisition news:
(the rest are comparisons to other vendors, who are leaning more heavily on open source)
If 8×8 is interested in embracing open source, then it just got an interesting opportunity to do just that. Which brings me to the last topic –
The future of Jitsi

What will become of Jitsi?
Here we need to look at Jitsi and Jitsi Meet separately.
Jitsi

The Jitsi Videobridge, along with its derivatives, add-ons, plugins, extensions and client-side SDKs.
That’s the open source part of the project. At Atlassian, nothing was kept for internal use by Hipchat/Stride. Everything found its way back to the open source project.
Will 8×8 continue in that path?
Their focus in the coming months is going to be the integration of Jitsi into their 8×8 meetings service. They are bound to use the resources of the Jitsi team to do that.
Managers may decide to implement some features in the 8×8 meetings service moving forward and not invest in adding them to the Jitsi open source project. Or they might decide to add everything via Jitsi.
8×8 might end up taking the extreme route – ditching the Jitsi project as an open source one, embedding it into their meetings app, and from there on investing in that private branch only. I see that as a highly unlikely outcome in the next 2-3 years.
Time will tell which direction is taken.
Jitsi Meet

Jitsi Meet is a different story altogether.
It is a group video meeting service. One which doesn’t limit the users’ bitrate in sessions, doesn’t limit the number of users in a session, offers mobile apps, Slack and calendar integrations, and scales globally. All for free.
Would 8×8 see it as competition to their own 8×8 meetings app? If it grows in popularity and its maintenance costs increase, how happy would 8×8 be in paying the bills? Would it see Jitsi Meet as a sales tool for its other services? How would it measure the success of this service?
Whatsapp’s founders left Facebook this year over disputes about data, privacy and such. Most of all, it was probably a dispute about the future of Whatsapp and Facebook’s intent to monetize the asset. The same (at a much smaller scale) can happen here at some point.
How would 8×8 monetize Jitsi Meet? Should it? If it doesn’t, should it kill it?
I don’t know the answers. I am sure 8×8 doesn’t either. It is just too early to tell.
Last Words

Jitsi is an open source success story in WebRTC. There’s no doubt about it.
It is now entering a new chapter in its life, under 8×8.
I wish the team the best of luck and us as an industry to have the option to use Jitsi for our future projects.
Media frameworks are part of the backend story of WebRTC. Care to learn the rest? Try out my free mini video series on WebRTC backend servers:
Register to the video series
The post 8×8 Acquires Jitsi From Atlassian. Winners and Losers appeared first on BlogGeek.me.
Kranky Geek is happening this year again, the date is Nov 16, and we’ve got the best lineup of speakers for you.
Kranky Geek started almost by mistake. Like most good things that happened to me. It wasn’t planned. The result though is becoming a tradition by now, where I get to work with Chris Koehncke and Chad Hart for a period of time that can be considered quite intense (we’re all too opinionated).
Google, along with our other sponsors make this event happen. We only curate the content to make sure the end result is great.
At last year’s event, we started looking at the domain of AI. You can find the recordings of that event on YouTube. The feedback we got was positive, so this year we’re taking it a step further. Many of the sessions will focus on machine learning and AI and their impact on real time communications.
What’s on the Agenda?

AI in RTC.
As always, our intent here is to focus as much as possible on services and applications that are running in production already. It won’t be theories about what can be done but what are people doing. Today.
The updated agenda can be found online. It might change a bit in its ordering, but it is mostly ready.
This year, we have some brand new speakers for you:
We also have some “repeat” speakers:
We are expanding our family of Kranky Geek speakers and Kranky Geek companies, which is a true joy. I can’t wait to hear your feedback once the day is over.
Our sponsors this year

As always, the event is practically free to attend (there’s a $10 admission fee that gets donated to Girl Develop It).
The companies that made this event happen this year are Google, Intel, Agora.io and Nexmo, who are our premium partners for the event; and Callstats.io, Voicebase and RingCentral, who are our silver partners for the event.
No fire drill

I am not sure if this is good or bad. We had a surprise fire drill last year. We knew about it a week or two before the event. It caused so much headache for us. And a lot of worries.
It ended up pretty well, with our audience and speakers getting a one-hour break outside on a beautiful sunny day. Almost all of them came back after the drill, which isn’t obvious or even expected.
Many were happy for the break – and the smalltalk that ensued during it.
Hopefully, there will only be pleasant surprises this year as well.
What are we looking for in Kranky Geek?

We had to turn down a few vendors who wanted to speak. This is a process that takes place every year.
There’s no specific set of rules for what we approve or don’t as a session at Kranky Geek, but for me it boils down to this:
While the lineup of speakers for this year is full, if you want to speak in future Kranky Geek events – be sure to catch me during the event for a chat.
Should you travel just for this single day?

I got this question a few times in the past few weeks.
My guess is that if this is the only thing you’re doing in San Francisco and coming for, then skip it. Especially if you are traveling from abroad.
That said, if you want to feel where WebRTC is headed, and talk to many of the people who deal with it daily in the real world, then this is the place to be. So many discussions take place during the breaks that it might be worth coming only for the breaks… I know a person or two who are coming only for that.
We try to make Kranky Geek special and unique. We work hard to select the speakers and work with them on their presentations. All to make it worth your travel, wherever you come from.
Can non-developers attend?

We received this question recently.
There is no easy answer to this one. On one hand, the event and its session are technical in nature as our focus is developers. On the other hand, the sessions are short (20 minutes all-in-all), so our speakers tend to focus on the essence and not dive too deep into the nitty gritty details. So a tough call.
My suggestion? Check out some of the session recordings on YouTube from past events and make your decision based on that.
Register now

Yes, there’s this minor detail.
You need to register to attend. There’s limited room capacity, and at some point, we will need to close the registration.
We’re already half full in our registration list, so save your spot now and don’t wait.
Do you want to meet me prior to the event?
I’ll be in San Francisco Nov 12-17. Nov 15-16 are reserved for Kranky Geek. The rest for meetings with people – around WebRTC, CPaaS, testRTC, my WebRTC course, consulting and just catching up.
If you want to meet me during that week, leave me a note.
The post Meet me @ Kranky Geek San Francisco 2018 appeared first on BlogGeek.me.