bloggeek

The leading authority on WebRTC

WebRTC Multiparty Architectures

Mon, 04/15/2019 - 12:00

There are multiple ways to implement WebRTC multiparty sessions. These in turn are built around mesh, mixing and routing.

In the past few days I’ve been sick to the bone. Fever, headache, cough – the works. I couldn’t do much which meant no writing an article either. Good thing I had to remove an appendix from my upcoming WebRTC API Platforms report to make room for a new one.

I wanted to touch the topic of Flow and Embed in Communication APIs, and how they fit into the WebRTC space. This topic will replace an appendix in the report about multiparty architectures in WebRTC, which is what follows here – a copy+paste of that appendix:

Multiparty conferences of either voice or video can be supported in one of three ways:

  1. Mesh
  2. Mixing
  3. Routing

The quality of the solution relies heavily on the type of architecture used. In routing, we see further refinement of video routing into multi-unicast, simulcast and SVC.

WebRTC API Platform vendors who offer multiparty conferencing will have different implementations of this technology. For those who need multiparty calling, make sure you know which technology is used by the vendor you choose.

Mesh

In a mesh architecture, all users connect directly to all others and send their media to them. While there is no media server overhead, this option usually falls short of offering any meaningful media quality and starts breaking at 4 or more users.

Mesh topology

For the most part, consider vendors offering mesh topology for their video service as limited at best.
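To make the cost concrete, here is a minimal sketch of what a mesh client has to do, assuming a hypothetical `signaling` helper for message delivery. With n participants, every client runs this n-1 times, encoding its media separately for each peer:

```javascript
// One RTCPeerConnection (and one outgoing encode) per remote participant.
// "signaling" is a placeholder for whatever messaging channel you use.
const peers = new Map();

async function connectToPeer(remoteId, localStream, signaling) {
  const pc = new RTCPeerConnection();
  // The same local tracks are re-encoded for every remote peer, which is
  // why mesh CPU and uplink usage grow with the number of participants.
  localStream.getTracks().forEach(track => pc.addTrack(track, localStream));
  pc.onicecandidate = e => {
    if (e.candidate) signaling.send(remoteId, { candidate: e.candidate });
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(remoteId, { sdp: pc.localDescription });
  peers.set(remoteId, pc);
}
```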

Mixing

MCUs were quite common before WebRTC came into the market. MCU stands for Multipoint Conferencing Unit, and it acts as a mixing point.

MCU mixing topology

An MCU receives the incoming media streams from all users, decodes it all, creates a new layout of everything and sends it out to all users as a single stream.

This has the added benefit of being easy on the user devices, which see the MCU as a single user they need to operate in front of; but it comes at a high compute cost on the server and inflexibility on the user side.

Routing

SFUs were still quite new when WebRTC came into the market, but they are now an extremely popular solution. SFU stands for Selective Forwarding Unit, and it acts like a router of media.

SFU routing topology

An SFU receives the incoming media streams from all users, and then decides which streams to send to which users.

This approach leaves flexibility on the user side while reducing the computational cost on the server side, making it the popular and cost-effective choice in WebRTC deployments.

To route media, an SFU can employ one of three distinct approaches:

  1. Multi-unicast
  2. Simulcast
  3. SVC

Multi-unicast

This is the naïve approach to routing media. Each user sends his video stream towards the SFU, which then decides whom to route this stream to.

If there is a need to lower bitrates or resolutions, it is done either at the source, by forcing a user to change his sent stream, or on the receiver end, by having the receiving user throw away data he has received and processed.

It is also how most implementations of WebRTC SFUs were done until recently.

Simulcast

Simulcast is an approach where the user sends multiple video streams towards the SFU. These streams are compressed data of the exact same media, but in different quality levels – usually different resolutions and bitrates.

Simulcast

The SFU can then select which of the streams it received to send to which participant based on their device capability, available network or screen layout.

Simulcast has started to crop up in commercial WebRTC SFUs only recently.
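In browsers, simulcast is typically requested through the `sendEncodings` parameter of `addTransceiver()`. Here is a hedged sketch of publishing three quality levels of the same camera track; the `rid` names and bitrate numbers are illustrative, and actual browser support for this API still varied at the time of writing:

```javascript
// Publish three encodings of one video track; the SFU forwards one of them
// per receiver. rid values and bitrates here are arbitrary examples.
async function publishSimulcast(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'q', scaleResolutionDownBy: 4.0, maxBitrate: 150000 },  // quarter
      { rid: 'h', scaleResolutionDownBy: 2.0, maxBitrate: 500000 },  // half
      { rid: 'f', maxBitrate: 1500000 },                             // full
    ],
  });
}
```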

SVC

SVC stands for Scalable Video Coding. It is a technique where a single encoded video stream is created in a layered fashion, where each layer adds to the quality of the previous layer.

SVC

When an SFU receives a media stream that uses SVC, it can peel layers off that stream, to fit the outgoing stream to the quality, device, network and UI expectations of the receiving user. It offers better performance than Simulcast in both compute and network resources.

SVC has the added benefit of enabling higher resiliency to network impairments, by allowing error correction to be added only to the base layers. This works well over mobile networks, even for 1:1 calling.

SVC is very new to WebRTC and is only now being introduced as part of the VP9 video codec.
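Since SVC rides on VP9 here, a first practical step is simply making sure VP9 gets negotiated. A sketch, assuming a browser that implements `setCodecPreferences()`; how the SVC layers themselves get configured was still browser-specific at the time:

```javascript
// Reorder the codec list so VP9 is offered first in the SDP negotiation.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('video', { direction: 'sendonly' });
const codecs = RTCRtpSender.getCapabilities('video').codecs;
transceiver.setCodecPreferences([
  ...codecs.filter(c => c.mimeType === 'video/VP9'),
  ...codecs.filter(c => c.mimeType !== 'video/VP9'),
]);
```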

The post WebRTC Multiparty Architectures appeared first on BlogGeek.me.

Handling session disconnections in WebRTC

Mon, 04/08/2019 - 12:00

WebRTC disconnections are quite common, but you can “fix” many of them just by careful planning and proper development.

Years ago, I developed the H.323 Protocol Stack at RADVISION (later turned Avaya, turned Spirent turned Softil). I was there as a developer, R&D manager and then the product manager. My code is probably still in that codebase, lovingly causing products around the globe to crash from time to time – as any other developer, I have my share of bugs left behind.

Anyways, why am I mentioning this?

I had a client asking me recently about disconnections in WebRTC. And it kinda reminded me of a similar issue (or set of issues) we had with the H.323 stack and protocol years back.

If you bear with me a bit – I promise it will be worth your while.

This week I am starting the office hours for my WebRTC course. The next office hour (after the initial “hi everyone”) will cover WebRTC disconnections.

Check out the course – and maybe go over the first module for free:

Learn WebRTC

A quick intro to H.323 signaling and transport

H.323 is like SIP just better and more complex. At least for me, who started his way in VoIP with H.323 (I will always have a soft spot for it). For many years, the way H.323 worked is by opening two separate TCP connections for transporting its signaling. The first for passing what is called Q.931 protocol and the next for passing H.245 protocol.

If you would like to compare it to the way WebRTC handles things, then Q.931 is how you setup the connection – have the users find each other. H.245 is similar to what SDP and JSEP are for (I am blatantly ignoring H.225 here, another protocol in H.323 which takes care of registration and authentication).

Once Q.931 and H.245 get connected, you start adding the RTP/RTCP stuff over UDP, which gets you quite a lot of connections.

Add to that complexities like tunneling H.245 over Q.931, using something called faststart instead of H.245 (or before H.245), then sprinkle a dash of “parallel H.245” and then a bit of NAT traversal and/or security and you get a lot of places that require testing and a huge number of edge cases.

Where can H.323 get “stuck” or disconnected?

With so many connections, there are a lot of places that things can go wrong. There are multiple state machines (one for Q.931 state, one for H.245 state) and there are different connections that can get severed for one reason or another.

Oh – and in H.323 (at least in its earlier specifications that I had the joy to work with), when the Q.931 or H.245 connections get severed – the whole session is considered as disconnected, so you go and kill the RTP/RTCP sessions.

At the time, we suffered a lot from zombie sessions due to different edge cases. We ended up with solutions that were either based on the H.323 specification itself or best practices we created along the way.

Here are a few of these:

  • If the Q.931 connection gets severed – kill the session
  • If the H.245 connection gets severed – kill the session
  • If you don’t receive media or media control packets on RTP or RTCP respectively for a configurable period of time (think 5-10 seconds) – kill the session
  • When a state machine for Q.931 or H.245 initiates – start a timer. If that timer ends and the state machine didn’t get to the connected state – switch the state to timeout and… – kill the session
  • Killing the session means trying to gracefully close all connections, but if we can’t within a short period of a timeout – we just shut things down to collect the resources back to be used later

H.323 existed before smartphones. Systems were usually tethered to an ethernet cable, or at most connected over WiFi from a static location. There was no notion of roaming or moving between networks. Which meant that there was no need to ask yourself if a connection got severed because of a switch in the network or because there’s a real issue.

Life was simple:

And if you were really insistent then maybe this:

(in real life scenarios, these two simplistic state machines were a lot bigger and complicated, but their essence was based on these concepts)

Back to WebRTC signaling and transport

WebRTC is simpler and more complicated than H.323 at the same time.

It is simpler, as there is only SRTP. There’s no signaling that is standardized or preselected for WebRTC. And for the most part, the one you use will probably require only a single connection (as opposed to the two in H.323). It also has far fewer alternatives built into the specification itself than H.323 has.

It is more complicated, as you own the signaling part. You make that selection, so you’d better make a good one. And while you’re at it, implement it reasonably well and handle all of its edge cases. This is never a simple task, even for simple signaling protocols. And it’s now on you.

Then there’s the fact that networks today are more complex. Users expect to move around while communicating, and you should expect scenarios where users switch networks in mid-session.

If you use WebRTC in a browser, then you get these interesting aspects associated with your implementation:

  1. When you close the browser, the session dies
  2. When you close the tab where the WebRTC session lives, the session dies
  3. When you refresh the page where the WebRTC session lives, the session dies
  4. When you click a link to move to a different page (even on the same site), the session dies

A lot of dying takes place in the browser, and the server (or the other client) will need to “sniff out” these scenarios, as they might not disconnect gracefully, and decide what to do about them.
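One common way to “sniff” these deaths is to watch the ICE connection state and give the peer a grace period before tearing down. A minimal sketch, assuming `pc` is your RTCPeerConnection; the 10-second timeout is an arbitrary choice:

```javascript
let disconnectTimer = null;

pc.oniceconnectionstatechange = () => {
  const state = pc.iceConnectionState;
  if (state === 'disconnected') {
    // Might be a network blip or a page refresh; wait before killing.
    disconnectTimer = setTimeout(() => pc.close(), 10000);
  } else if (state === 'failed') {
    pc.close(); // beyond saving without an ICE restart
  } else if (state === 'connected' || state === 'completed') {
    clearTimeout(disconnectTimer); // the peer came back
  }
};
```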

Where can WebRTC get “stuck” or disconnected?

We can split disconnections of WebRTC into 3 broad categories:

  1. Failure to connect at all
  2. Media disconnections
  3. Signaling disconnections

In each, there will be multiple scenarios, defining the reasons for failure as well as how to handle and overcome such issues.

In broad strokes, here’s what I’d do in each of these 3 categories:

#1 – Failure to connect at all

There’s a decent amount of failures happening when trying to connect WebRTC sessions. They start from not being able to even send out an SDP, through interoperability issues across browsers and devices to ICE negotiation failing to connect media.

In many of these cases, better configuration of the service as well as focus on edge cases would improve the situation.

If you experience connection failures for 10% or more of the sessions – you’re doing something wrong. Some can get it as low as 1% or less, but oftentimes that depends on the type of users your service attracts.

This leads to another very important aspect of using WebRTC:

Measure what you can if you want to be able to improve it in the future
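In practice, measuring usually means polling `getStats()` and shipping a few key metrics somewhere you can analyze them later. A sketch, where `pc` is your RTCPeerConnection and `reportToBackend()` is a placeholder for your own analytics endpoint:

```javascript
setInterval(async () => {
  const stats = await pc.getStats();
  stats.forEach(report => {
    // Grab a few basics off the incoming video stream.
    if (report.type === 'inbound-rtp' && report.kind === 'video') {
      reportToBackend({
        packetsLost: report.packetsLost,
        jitter: report.jitter,
        framesDecoded: report.framesDecoded,
      });
    }
  });
}, 5000);
```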

#2 – Media disconnections

Sometimes, your sessions will simply disconnect.

There are many reasons why that can happen:

  • The firewall policies of the access point used are configured to kill P2P encrypted traffic (blame all them bittorrent-hating-IT-people)
  • The user switched from one network to another in mid-session, and you should follow WebRTC’s ICE restart mechanism (see the sketch below)
  • The other end crashed, closed or just got offline

Each of these requires different handling – some in code, others through manual handling (think customer support working out the configuration with a customer to resolve the firewall issue).
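For the network-switch case, the code-side fix is an ICE restart: create a new offer with fresh ICE credentials and renegotiate. A sketch, where `sendToPeer()` stands in for your signaling channel:

```javascript
async function restartIce(pc, sendToPeer) {
  // iceRestart forces new ICE credentials and fresh candidate gathering.
  const offer = await pc.createOffer({ iceRestart: true });
  await pc.setLocalDescription(offer);
  sendToPeer({ sdp: pc.localDescription });
}

pc.oniceconnectionstatechange = () => {
  if (pc.iceConnectionState === 'failed') {
    restartIce(pc, sendToPeer);
  }
};
```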

#3 – Signaling disconnections

Unlike H.323, if signaling gets disconnected, WebRTC doesn’t even know about it, so it won’t immediately cause the session itself to disconnect.

The first thing you’ll need to do is decide how you want to proceed in such cases – do you treat this as a session failure/disconnection, or do you let the show go on?

If you treat these as failures, then I suggest killing peer connections based on the status of your websocket connection to the server. If you are on the server side, then once a connection is lost, you should probably go ahead and kill the media paths – either from your media server towards the “dead” session leg or from the other participant on a P2P connection/session.

If you want to make sure the show goes on, you will need to try and reconnect the peer connection towards the same user/session somehow. In which case, additional signaling logic in your connection state machine along with additional timers to manage it will be necessary.
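Here is a sketch of that “show goes on” option: keep the peer connection alive, try to reconnect the signaling WebSocket for a bounded period, and only then give up. `resumeSession()` is a placeholder for re-identifying the user/session with the server:

```javascript
const GIVE_UP_AFTER_MS = 30000; // arbitrary: how long the show may go on

function reconnectSignaling(url, onGiveUp) {
  const deadline = Date.now() + GIVE_UP_AFTER_MS;
  (function attempt() {
    const ws = new WebSocket(url);
    ws.onopen = () => resumeSession(ws);
    ws.onclose = () => {
      if (Date.now() < deadline) setTimeout(attempt, 2000);
      else onGiveUp(); // now treat it as a failure and kill the peer connection
    };
  })();
}
```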

Announcing the WebRTC course snippets module

Here’s the thing.

My online WebRTC training has everything in it already. Well… not everything, but it is rather complete. What I’ve noticed is that I get repeat questions from different students and clients on very specific topics. They are mostly covered within lessons of the course, but they sometimes feel “buried” within the hours and hours of content.

This is why I decided to start creating course snippets. These are “lessons” that are 3-5 minutes long (as opposed to 20-40 minutes long), each meant to answer one specific question. Most of the snippets will be actionable and may contain additional materials to assist you in your development. This library of snippets will make up a new course module.

Here are the first 3 snippets that will be added:

  1. WebRTC session disconnections
  2. ICE servers configuration
  3. A Quick review of QUIC

While we’re at it, office hours for the course start today. If you want to learn WebRTC, now is the best time to enroll.

The post Handling session disconnections in WebRTC appeared first on BlogGeek.me.

CPaaS differentiation in 2019

Mon, 04/01/2019 - 12:00

CPaaS differentiation seems to be revolving around tackling niches.

Time for another look at the world of CPaaS – Communication Platform as a Service. In January 2018, a bit over a year ago, I’ve looked at CPaaS trends for 2018. The ones there were:

  1. Serverless – which didn’t really happen, at least not as a direct CPaaS offering, other than what Twilio has to offer and what Voximplant had as well
  2. Omnichannel – where we see most vendors collecting channels to support, with Whatsapp being the lead noise-maker
  3. Visual/IDE – ended up being a winner in 2018, with Plivo, MessageBird, Voximplant and Infobip joining Twilio. It is also now usually called “Flow”
  4. Machine learning and AI – still more talk than action, but we’re moving in this direction. The whole industry is
  5. AR/VR – happening, though less with the CPaaS vendors directly
  6. Bots – that’s part of the omnichannel + ML/AI story. And we see instances of it done with CPaaS
  7. GDPR – something that was done and somehow mostly forgotten

I’d like to look at what’s happening in CPaaS this time from a slightly different angle, one that alludes to trends as well, but in a more nuanced way. From briefings I’ve been given these past few weeks and the announcements and stories coming out of Enterprise Connect 2019, it looks like different CPaaS vendors are settling on different target audiences and catering to different use cases and market niches.

Today, CPaaS is almost synonymous with Twilio. Every player looks at what Twilio does in order to plot its own route in the market – at times with the intended aim of disrupting Twilio, mostly with lower price points; at other times by trying to offer something more/better.

Then there are external players who add APIs to their platform. Usually UCaaS (Unified Communications as a Service) platform. They don’t directly compete with CPaaS, but if you are purchasing a “phone system” for your enterprise from a UCaaS player, then why not use its APIs and services instead of opting for another vendor (a CPaaS vendor in this case)?

Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:

Get the shortlist

Here are how some of the vendors in this space are trying to differentiate, pivot and/or find their niche within the CPaaS market.

Agora.io – Gaming

If you look at Agora’s blog, what you’ll find out there is a slew of posts around gaming and gaming related frameworks (Unity to be exact):

  • It’s How You Play the Game: Trends at Game Developers Conference – Day 1 Recap
  • Adding Voice Chat to a Multiplayer Cross-Platform Unity game
  • How To: Create a Video Chat App in Unity
  • Add Voice Chat to your Unity game
  • (iOS) Run Video Chat within your Unity application
  • (Android) Run Video Chat within your Unity application

Agora offers a specific solution for gaming

Gaming is an untapped market for CPaaS.

There are communications there of all kinds – collaboration or communications across gamers inside a game, talking before the game, streaming the game to viewers, etc.

All of this communication is either developed by the gaming companies themselves (not a lot of it), catered for by specialized VoIP gaming vendors, or done out of scope (using Discord, Skype, …). Rarely is it covered by a CPaaS vendor.

Somehow, cracking this market is really tough for CPaaS. Agora.io is trying to do just that, along with its other focus areas – live broadcast and social (two other tough nuts).

ECLWebRTC – Media Pipeline

The Japanese platform from NTT Communications – ECLWebRTC.

Like many of the WebRTC-first/only platforms out there, ECLWebRTC had an SFU implementation and support for various devices and browsers.

When you get to that point, one approach is to go after voice and PSTN. Another one is to add more features and increase the sizes of meetings and live broadcasts that can be supported.

ECLWebRTC decided to go after machine learning here, with the intent of letting its customers integrate and connect its media paths directly to cloud APIs. This is done using what they call Media Pipeline Factory, which feels from the looks of it like a general purpose media server.

ECLWebRTC is less known in Europe and the US, and probably not much outside of Japan either. With the Japanese market focus on automation, it makes sense that media pipeline would be a focus area for ECLWebRTC. This type of a capability is relevant elsewhere as well, but it doesn’t seem to be a priority for others yet.

Infobip – Omnichannel

I’ve had the opportunity to fiddle around with Infobip Flow recently, something that turned out to be a very pleasant experience. From Flow, it became apparent that Infobip is working hard on offering its customers an omnichannel experience. Compared to other CPaaS vendors, they seem to have the most coverage of channels:

To the above, you can add SMS and RCS and email.

Infobip Flow has another nice quality – it is built for both inbound and outbound communications. Most of its competitors do inbound flows only.

In a world where competition may force price wars on CPaaS basic offerings of voice and SMS, adding support for omnichannel seems like a good way to limit attrition and churn and increase vendor lock-in.

RingCentral – Embeddables

RingCentral isn’t a CPaaS vendor. They offer a communication service for the enterprise. You got a company and need a way to communicate? There’s RingCentral.

What they’ve done in the past couple of years was add an API layer to some of their services. Things like pushing messages into Glip, handling phone calls, etc.

The idea is that if you need something done in an automated fashion in RingCentral, you can use the API for it. In many simple cases, this might be used instead of adopting CPaaS APIs. In other cases, it is about using a single vendor or having specific integrations relevant to the RingCentral platform.

What RingCentral did was add what they call Embeddable:

“With RingCentral Embeddable, you can embed a full-featured softphone into your favorite web application for an integrated communications experience that drives productivity and ease of use without lengthy development time“

This concept of embedding a piece of code isn’t new – YouTube videos offer such a capability as well as a slew of other services out there. When it comes to communications, it is similar in nature to what TokBox has in the form of Video Chat Embeds but done at the level of users and their user accounts on RingCentral.

This definitely makes integrations of RingCentral with CRM tools a lot easier to get done, and makes it easier for non-developers to engage with them – similar to how Flow type offerings make it easier for non-developers to handle communication flows.

SignalWire – Price and Flexibility

SignalWire is an interesting proposition. It comes from the team that created and is maintaining FreeSWITCH, the leading open source framework used today by many communication providers, including some of the CPaaS vendors.

The FreeSWITCH team decided to build their own managed service (=CPaaS in this case), calling it SignalWire. Here are a few examples of the punchy copy they have on their website:

  • Advanced communications from the source
  • We don’t price gouge you for carrier services like per-minute and per-message rates. Focus on what’s important to your business, not your phone bill

What they seem to be aiming for are two things: price and flexibility

Price

They offer close to whole-sale price points (at least based on the website – I haven’t gone to a price comparison on this one, though their sample pricing for the US does seem low).

To make things easier, they are targeting Twilio customers, doing that by offering TwiML support (similar to what Plivo did/is doing). TwiML is a markup language for Twilio, which can be used to control what happens on connected calls. Continuing with the blunt approach, SignalWire calls this LāML – Legacy Antiquated Markup Language.

While this may fit a certain type of Twilio customers, it certainly doesn’t cover the whole gamut of Twilio services today.

Flexibility

On the flexibility front, there’s mostly marketing messages today and not any real announced products on the SignalWire website.

Besides LāML there’s a WebSocket based client API/SDK, not so different than what you’ll find elsewhere.

They can probably get away with it in the sales process by saying “we give you FreeSWITCH from the source”, but I am not sure what happens when developers want to configure that elastic cloud service the way they are used to doing with their own FreeSWITCH installation.

All in all, this is an interesting offering, an interesting approach and go-to-market.

TeleSign – Security and Data Analytics

TeleSign is focused on SMS. And a bit of voice. As their website states: “APIs Delivering User Verification, Data Insights & Communications”

Since security, verification and fraud prevention these days rely heavily on analytics, TeleSign is “hoarding” data about phone numbers, using it for these use cases. It isn’t that others don’t do it (there’s Twilio Authy, Nexmo Number Insight and others), but this is what they are putting front and center.

Since their acquisition by BICS, a wholesale operator for wireline and wireless carriers, that has grown even further, as they gain access to more and more data.

It will be interesting to see whether TeleSign grows its business from security to additional communication domains, or whether it will focus on security and expand from the telecom space to adjacent areas.

Twilio – Adjacencies

Talking about adjacencies, that’s what Twilio is doing. Now that they are a public company, there is an even more insatiable appetite for growth within Twilio, in an effort to find more revenue streams. So far, this has worked great for Twilio.

Here are two areas we’ve seen Twilio going into:

  1. Contact centers, shifting away from developers per se with their CPaaS platform towards a cloud based contact center offering, competing head to head with some of their own customers (that would be Twilio Flex)
  2. Email, through the acquisition of SendGrid

How email fits into the Twilio communication APIs is still an open question, though I can see a few interesting initiatives there.

And then there’s the wireless offering of Twilio, which resembles a more flexible M2M play.

But where would Twilio go next?

UCaaS, going after unified communications vendors and competing with them head to head?

Maybe try to jump towards an Intercom like service of its own? Or purchase Intercom?

Or find another market of developers that is growing nicely – similar maybe to its recent Stripe integration for Twilio Pay.

Twilio in a way has been defining and redefining what CPaaS is for the past several years. They need to continue doing that to stay in the lead and well ahead of their competition.

VoIP Innovations – Marketplace

VoIP Innovations came out with what they call Showroom.

Here’s a short video of the explanation of what that is exactly:

Many of the CPaaS vendors offer a partner program of sorts. This is where vendors who develop stuff for others, or build tooling and apps on top of the CPaaS vendor’s APIs, can go and showcase their work. The programs vary from one CPaaS company to another.

Twilio has Showcase as well as an add-on marketplace of sorts. Nexmo has a partners directory. VoIP Innovations are banking on their showroom.

What makes it different a bit is the target audience associated with it:

  1. Developers – obvious, as CPaaS caters first and foremost for developers
  2. Resellers – who can pick off marketplace apps, whitelabel and resell them
  3. Subscribers – who pay for that privilege

While there isn’t much documentation to go on, I am assuming that the whole intent behind the marketplace is to offer direct monetization opportunities for developers and resellers, by taking care of customer acquisition as well as payment on behalf of the developer and reseller.

A concept taken from other marketplaces (think mobile app stores). It will be interesting to see how successful this will be.

Vonage – UCaaS+CPaaS

Vonage is interesting. It started as consumer VoIP, turned cloud UC vendor (=enterprise communications) through acquisitions, acquired Nexmo and then TokBox to add CPaaS, and continued with the NewVoiceMedia acquisition to cover the contact center space.

How does one differentiate with such a portfolio? Probably by leveraging synergies across its product offerings and markets.

What Vonage recently did was bring number programmability from its Nexmo/CPaaS offering to its VBC/UCaaS platform.

What do they gain?

  1. Single API across product lines, making it easier to learn and use the same APIs
  2. Large ecosystem of developers using Nexmo able to build on VBC – it is… the same API
  3. The level of flexibility that a CPaaS platform has right on top of a UCaaS offering. In this case, scripting using Nexmo NCCO

Is this good for Nexmo customers and partners? Yap. They can now reach out to the Vonage business customers as an additional target market.

Is this good for Vonage customers and partners? Yap. They can now do more, and more customized communications solutions with this added flexibility.

Voximplant – Flow

Voximplant is one of the lesser known CPaaS vendors. Its whole platform is built on the concept of an App Engine, where you write the communications logic right onto their platform using JavaScript. It is serverless from the ground up. A year or two ago, Voximplant added Smartcalls – a product that enables you to sketch out call flows for outbound interactions – marketing, sales, etc. These interactions can be played out across a large number of phone numbers and automated, making it really easy and flexible to drive phone based campaigns.

Now? Voximplant took the next step of adding inbound interactions, covering the IVR and contact center types of scenarios.

Twilio, MessageBird and Plivo offer inbound visual flow products. These allow developers to drag and drop communication widgets to build a flow – a customer interaction through the system.

Voximplant and Infobip offer inbound and outbound flows, where you can also plot company/agent based initiatives with greater ease as well as the customer initiated interactions.

Why aren’t you listed here?

The CPaaS market is large and varied. It is hard to see everyone all the time. It is also hard to innovate and differentiate every year. The vendors here are the ones I had briefings with or ones who promoted their products in ways that were visible to me. But more than anything, these are the ones that I felt have changed their offerings in the past year in a differentiating manner.

BTW – if you think that differentiation here means functionality that other vendors don’t have, then you are wrong. Doing that is close to impossible today. Differentiation is simply where each vendor is putting its focus, trying to attract customers and carve its niche within the broader market. It is the stories each vendor tells about its product.

If you feel like a vendor needs to be here, or did something meaningful and interesting, just contact me. I am always happy to learn more about what is happening in the market.

Who is missing in my WebRTC PaaS report?

Later this month, I will be releasing my latest update of the WebRTC PaaS report.

There are changes taking place in the market, and what vendors are offering in the WebRTC space as a managed API service is also changing. This report is there to guide buyers and sellers in the market on what to do.

For buyers, it is about which platform to pick for their project – or in some cases, in which of the platform vendors to invest.

For sellers, it is about what to add to their roadmap – to understand how they are viewed from the outside and how they compare to their peers.

Here’s who’s been in the last update of the report:

Think you should be there? Contact me.

Want to purchase the report? There’s a 30% discount on it from today and until the update gets published (and yes – you will be receiving the update once it gets published for no additional fee).

There will be a new appendix in the report, covering the topic of Flow and Embeddable trends in the market. Something that will become more important as we move forward.

The post CPaaS differentiation in 2019 appeared first on BlogGeek.me.

How does WebRTC connect people?

Mon, 03/25/2019 - 12:00

WebRTC doesn’t really connect people, but the way you think about signaling is important to your WebRTC application.

Here’s a comment left on one of my recent articles:

WebRTC is… still just a little confusing…Tsahi, i’m reading the book recommended by Loreto & Romano but the examples are outdated. With regards to the SDP signal – if peer A is on a webRTC application, but peer B is surfing youtube – How does peer B get notified of an offer? It would have to go to peer B’s email address right? — because there is no way of knowing peer B’s IP address. Please help.

A few quick things before I dig deeper into this WebRTC connectivity thing:

  • Yap. WebRTC is a little confusing. Maybe even a lot. It doesn’t behave like any other browser technology we have
  • The sad thing about books about WebRTC is that they didn’t age all too well. WebRTC still changes too fast
  • There’s some confusion here in wording – peers, offer, etc.

How well do you know WebRTC? Check it out in my online WebRTC quiz.

Take the WebRTC quiz

Connecting, Signaling and WebRTC

I’ll try to use a kind of a bad comparison here to try to explain this.

Let’s say you are the proud owner of a Pilates studio. You’re the instructor there (#truestory – at least for my wife).

My wife gives Pilates lessons at different hours of the day. These are private lessons so it is rather flexible on both sides. But let me ask you this – how do people know when to come for a lesson?

This being Israel, they usually communicate with my wife via Whatsapp to decide together on the date and time. Usually, people stick to the day of week and time and start communicating only if they can’t make it, want to reschedule or just make sure the lesson is still taking place.

Back to WebRTC.

WebRTC is that Pilates studio. It does one thing – enables live media to flow from one browser to another. Sometimes also non-browsers, but let’s stick to the basics here.

How do the people who need to share or receive that live media connect to each other? That’s not what WebRTC does – it happens somewhere else. And that somewhere is the signaling mechanism that you pick for your own application. I am calling it a mechanism and not a protocol, since it is going to be a tad more confusing in a second.

Or not.

Now let’s go back to WebRTC, signaling and connecting people and look at it from a point of view of different scenarios.

Scheduled Meeting

We’ll start with a scheduled meeting. At any given point in time, I have a few of those coming up. Meetings with clients, partners and potential clients. Here’s one such calendar invitation:

This one happens to take place using Google Meet. Who’s calling who? No one really. I’ll just click that link in the invite when the time comes and magically find myself in the same conference with the other participants.

In most scheduled conferences, you just join a WebRTC link

Where do you get that link to use?

  • Inside the calendar invite
  • In an email that was sent
  • Through an SMS reminder

Some of these services allow inviting people from inside the meeting. That ends up being sent to them via email or an SMS as a link or just dialing their phone (without WebRTC).

Ad-hoc “upgrade” of text chat to video conference

There are ad-hoc calls. These usually start from a chat message.

Often times, I’d rather text chat than do a voice or a video call. It has to do with the speed and asynchronous nature of text. Which means that I’ll be chatting with someone over whatever instant messaging service we select, and at some point, I might want to switch medium – move from text to something a bit more synchronous like video:

Like this example with Philipp – most of our conversations start in Hangouts (that’s where he is most reachable to me) and when needed, we’ll just jump on a call, without planning it first.

Who is calling whom here? Does it matter?

What happens here is that both of us are already “inside” the communications app, so we both have a direct link to the service. Passing that information from one side to the other is a no brainer at this point.

So how will that get signaled? However you see fit. Probably on top of a WebSocket or over HTTPS.
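To make that concrete, here is a minimal sketch of the receiving side of such a signaling mechanism over a WebSocket; the URL and message shape are made up for illustration:

```javascript
const signaling = new WebSocket('wss://example.com/signaling');
const pc = new RTCPeerConnection();

signaling.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.sdp) {
    await pc.setRemoteDescription(msg.sdp);
    if (msg.sdp.type === 'offer') {
      // Answer the offer and ferry the answer back the same way.
      await pc.setLocalDescription(await pc.createAnswer());
      signaling.send(JSON.stringify({ sdp: pc.localDescription }));
    }
  } else if (msg.candidate) {
    await pc.addIceCandidate(msg.candidate);
  }
};
```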

I am calling you on the “phone”

What if there’s nothing pre-planned, so it isn’t a scheduled meeting. And we haven’t really been on a text chat to warm things up towards a call. How do you reach me now?

How do you “dial”?

Puneet is one of our support/testing engineers at testRTC. While he will usually text me over slack to start a call, he might just try calling directly from time to time.

What happens then?

I am not in front of my laptop with the Slack app opened. My phone is on standby mode. How does it start ringing on me? What does WebRTC do to get my attention?

Nothing.

The phone starts ringing because it received a mobile push notification. I’ve got the Slack app installed, so it can receive push notifications. Slack invoked a push notification to wake up the app and make it “ring” for me.

The same can be done with web notifications. And there are probably other means to do similar things in IoT devices. The thing is – this is out of scope for WebRTC, but something that is doable with the signaling technologies available to you.

Contact center agent answering calls

When a contact center adopts WebRTC to migrate its agents from desktop phones or installed softphones towards WebRTC, calls will end up being received in the browser.

This happens by integrating callbars inside CRMs or just by having the CRM implement the contact center part of the equation as well.

What happens then? How do calls get dialed? (the above is a screenshot taken from Talkdesk’s support site)

They go through PSTN towards a PBX. More often than not, that PBX will be based on Asterisk or FreeSWITCH, though other alternatives exist. PBXs usually base themselves around the SIP protocol, which will lead to two alternatives on the signaling protocol that will be used by WebRTC in the browser:

  1. SIP over Websocket. Practically the same thing happening in SIP will happen on the browser
  2. Some proprietary protocol will be used, translated from SIP

In both cases, the contact center agent is registered in advance. It is also marked as “available” in most contact center software logic – this means that incoming calls waiting in the call center queue can be routed to that agent. So it is sitting and waiting for incoming calls. In some ways, this is similar to the upgrade from text chat scenario.
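For alternative #1, one way this is commonly wired up in the browser is with an open source SIP-over-WebSocket library such as JsSIP. A sketch, with made-up server addresses and credentials:

```javascript
import JsSIP from 'jssip';

// Register the agent against a WebSocket-enabled PBX (addresses are examples).
const socket = new JsSIP.WebSocketInterface('wss://pbx.example.com:7443');
const ua = new JsSIP.UA({
  sockets: [socket],
  uri: 'sip:agent42@pbx.example.com',
  password: 'secret',
});

// Once registered and marked available, queued calls can reach the agent.
ua.on('newRTCSession', ({ session }) => {
  if (session.direction === 'incoming') {
    session.answer({ mediaConstraints: { audio: true, video: false } });
  }
});
ua.start();
```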

Connecting? WebRTC?

When it comes to actual users, WebRTC doesn’t get them “connected”. At least not from a signaling point of view.

What WebRTC does is negotiate the paths that the media will use throughout the session. That’s the “offer-answer” (or JSEP) messages that pass from one WebRTC entity to another. And even that isn’t sent by WebRTC itself – WebRTC creates the blob of data it wants to send and lets your application send it in any way you see fit.
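On the sending side, that looks roughly like this; `sendViaYourChannel()` is a placeholder for whatever transport your application picked:

```javascript
const pc = new RTCPeerConnection();

pc.onnegotiationneeded = async () => {
  // WebRTC produces the blob (the SDP); shipping it is your job.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendViaYourChannel({ sdp: pc.localDescription });
};

pc.onicecandidate = ({ candidate }) => {
  if (candidate) sendViaYourChannel({ candidate });
};
```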

Still confused? There’s a course for that – my online WebRTC training. The first module (out of eight modules) is free, so go learn about WebRTC.

Get a WebRTC training

The post How does WebRTC connect people? appeared first on BlogGeek.me.

Why is WebRTC winning over its (non)competition?

Mon, 03/18/2019 - 12:00

WebRTC wins over competition because there is no competition – browsers offer only WebRTC as a technology for web developers.

It was raining and miserable this last Saturday. I had lots of ideas for articles to write for BlogGeek.me in my backlog, but none of them really inspired me to action. The 8yo went to his cousin. The wife had her own things to do. My 11yo daughter was bored to death. She comes to me and says: “Can we do a trip outside to the park? I need some fresh air.” How could I answer besides saying yes?

The rain stopped a bit, so we went outside. What she really wanted wasn’t fresh air, but a chaperone to the closest candy vending machine. They are having a game at school for Purim, where she needs to bring small presents and candies to another kid in her class without her knowing who is pampering her. She needed an extra candy.

How is this related to WebRTC? It isn’t.

When I asked her about her plans for this game, she mentioned the trinket she planned on giving today –

2 mechanical pencils.

And that’s definitely WebRTC related.

A quick conversation ensued between me and my daughter – are these 0.5 mm or 0.7 mm point type? My daughter went on to explain that it might even be 0.9 mm.

So many alternatives.

Competing standards

It got me thinking:

With analog video recording we had VHS and Betamax.

Paper size? A4 and Letter.

Power frequency? 50 Hz and 60 Hz.

With VoIP signaling we had H.323 and SIP. And also XMPP.

Audio and video codecs? A shopping mall of alternatives.

Web browser streaming? HLS and MPEG-DASH.

Inches and Meters. Left side vs right side driver in cars.

The list is endless.

WebRTC standard

But browser based real time media communications?

WebRTC.

There. Is. No. Other. Alternative.

We had that short romance around ORTC, which ended with ORTC dead and its main concepts just wrapped back into WebRTC.

What other technology would you use or could you use inside a browser to do a video call?

Nothing.

Just WebRTC.

The other alternatives just don’t cut it (including what Zoom is presumably doing).

  • You want to build a real time service
  • It needs to run in the browser
  • You use WebRTC

What does that mean exactly? It gives us a kind of a virtuous circle.

  • You want to build a real time service
  • Looking at alternatives, you find WebRTC
  • There’s a vibrant community around it (because of web browsers)
  • Alternatives are limited proprietary solutions or old open source
  • You pick WebRTC
  • Adding to its popularity, adoption and ecosystem

For the most part, there’s no question if you should select WebRTC these days. There’s also no question about what the alternatives are (there usually are none). It isn’t a question if WebRTC is getting adopted, used, growing or popular.

When our window to the world is the browser, then WebRTC is what you use.

For mobile apps or other devices, the need for browser support, or just for an ecosystem around the technology picked, translates again to WebRTC.

Thinking of using real time media technology? That’s synonymous to WebRTC.

Want to learn more about WebRTC? Check out the first module of my online course – it is free.

Start learning WebRTC

The post Why is WebRTC winning over its (non)competition? appeared first on BlogGeek.me.

Are you blocked by the rules of your upbringing in your WebRTC application?

Mon, 03/11/2019 - 12:00

I know I am. I am constantly surprised what people are doing with WebRTC.

Here’s something I hear a lot:

How do you make a call with WebRTC?

Well… you don’t. Not really. And in many scenarios – that term call, or dialing, or answering – has no real meaning.

Here’s a funny opposite for you:

Kids in front of old phones don’t know what to do. It isn’t “natural”. Guess what? Nothing is. The things that are natural to you are things you’ve learned, and are now used to. They are a set of rules in your upbringing.

If you come from a VoIP background, then WebRTC brings with it quite a challenge to your world. I know – I had 13 years of VoIP background before WebRTC was announced. Since that announcement, I’ve been surprised time and again by what people are doing with WebRTC. Especially people who shouldn’t be able to even use it because they don’t know VoIP enough.

Coming from VoIP? Interested in streaming? Broadcasting? Some other communication use cases? Tomorrow I am hosting a free webinar – Google Does Gaming: WebRTC Man-to-Machine Use Cases

Register to the webinar

When we all first started out in this adventure called WebRTC, what we’ve seen was video calling. It was all about face to face meetings. It took time to think about WebRTC in other settings and for other use cases.

And here we are. Years later, dealing with WebRTC in the aid of cloud gaming. Google used WebRTC in Project Stream, where they showcased playing the game Spartan through a web browser – the game itself was rendered in Google’s cloud.


(that’s a screenshot of one of my slides for tomorrow’s webinar)

Who would have thought WebRTC would be used for that?

Anyways, if you come from a VoIP background, here are some aspects of WebRTC you’ll need to unlearn and relearn – I am still grappling with them myself every once in a while:

Signaling? What’s “Signaling”?

With any other VoIP protocol out there, it seems like we’re starting off with signaling.

H.323? Signaling.

SIP? That’s signaling.

XMPP? Ditto.

WebRTC? Nope. No signaling. Sorry.

What does that mean exactly? That you can use whatever signaling mechanism/protocol you see fit. That’s assuming you can get it to run inside a web browser or wherever it is your application needs to operate.

SIP, which is the most popular VoIP signaling protocol out there, is probably overkill for a lot of WebRTC services. I tend to look at it as a hindrance when I see it in architectures – I often ask why it is there, to make sure there’s a real need beyond someone simply needing signaling for their WebRTC application.

You. Don’t. Answer. Calls.

There’s no such thing as a call while we’re at it.

I remember doing a live WebRTC training a couple of years back. I had to hammer out of the participants the incessant questions about dial, answer, mute, hold and a bunch of other paradigms they thought were golden rules in communications.

If you feel that way too, then look at that video at the top of this article again. What made sense 20 years ago doesn’t hold water today.

WebRTC isn’t fixed in any specific concept of how “calls” are made. I prefer using the term session and deal with the initiation part of it on a case by case basis.

If there’s no need for dialing or answering – just don’t force it on your WebRTC solution.

It isn’t only Google

Most days of the week, I like thinking of WebRTC as the source code that resides on webrtc.org. That’s the codebase Google is maintaining and putting inside its Chrome browser.

The thing is, many end up modifying it for their own needs. They:

  • Port it over to mobile
  • Fix private bugs in it
  • Add their own minor modifications to it where needed
  • Seriously change it (check out what Discord did)
  • Modify the Chromium version, replace it inside Electron and release their own stuff

There are some really interesting “mods” to the vanilla WebRTC implementation out there, usually held privately for internal use of companies. In many ways, this is a shortcut to building your own media engine from scratch.

There’s more than one way

What I like about WebRTC is that usually, there’s a single way of doing things with it: everything is encrypted – you can’t override that; it defaults to multiplex and bundle its media connections; the list goes on.

How you use it is a totally different story.

Each SFU implementation is different than the other. There are different ways to record a session. Different ideas and approaches to broadcasting at low latency.

The “right” answer differs a lot not only based on the use case, but also on the business model, the developers available, the DNA of the company, etc.

Wasteful can be just fine

There’s also a school of thought that never really existed with VoIP: the “good enough” approach – one where we’re just fine with not optimizing everything and leaving things in a kind of mediocre state that is good enough for what we’re trying to do. It may eat up too much bandwidth or tax the CPU. Or just not be how things are done around here. But it works. Good enough.

Heck – the default WebRTC implementation does it on its own, deciding to waste 1.7Mbps for a VGA resolution encoding instead of limiting it to 800kbps or less. Such a waste of good resources.
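If you do decide to optimize, capping the bitrate yourself is a few lines with `RTCRtpSender.setParameters()`. A sketch, assuming `pc` is your RTCPeerConnection:

```javascript
async function capVideoBitrate(pc, bps) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
  const params = sender.getParameters();
  if (!params.encodings || !params.encodings.length) params.encodings = [{}];
  params.encodings[0].maxBitrate = bps; // bits per second
  await sender.setParameters(params);
}

// e.g. cap to the 800kbps mentioned above:
// capVideoBitrate(pc, 800000);
```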

I learned to love this approach (and then try to optimize it with my clients).

How do you think about WebRTC?

What about you?

What mistakes do you see people make when thinking about WebRTC in ways that fit the old web or VoIP better?

What things do you need to unlearn about WebRTC?

Coming from VoIP? Interested in streaming? Broadcasting? Some other communication use cases? Tomorrow I am hosting a free webinar – Google Does Gaming: WebRTC Man-to-Machine Use Cases

Register to the webinar

The post Are you blocked by the rules of your upbringing in your WebRTC application? appeared first on BlogGeek.me.

When will WebRTC 1.0 be available?

Mon, 03/04/2019 - 12:00

Some believe WebRTC isn’t ready. I think it is ready. But when will WebRTC 1.0 be available?

Ready or not, WebRTC is here. The thing is, we still don’t have a closed standard specification we can all print and take on a plane to read for our enjoyment. There are drafts – but nothing that is final.

And once final, does it mean that it is available?

There are 3 parts that need to be addressed to answer this question. I’ll deal with only two of them (skipping the IETF one):

  1. When will the relevant WebRTC draft become IETF RFC
  2. When will the relevant WebRTC draft become W3C recommendation
  3. When will browsers implement the new specification

Want to learn more about WebRTC, the various components in its specification and what compute power you need for each WebRTC server? Try out my free video course:

Learn about WebRTC servers


WebRTC standardization

WebRTC as a standard is built out of two components:

  1. What goes on over the network – that’s what the IETF is working on
  2. What APIs can developers use on top of a web browser – that’s what the W3C is working on

Most of the industry is already viewing WebRTC as a done deal – so much so that the IETF already has an RFC for SIP over WebSocket. The only reason to have such an RFC is to be able to use SIP inside a browser, and the only way to use SIP inside a browser with media being sent or received is by way of WebRTC. The people working at the IETF were so certain WebRTC would get an RFC of its own that they published that one back in 2014 already (5 years ago!).

Each of these organizations has its own set of rules, policies, governance and flow.

I’ve tried to keep the standardization of WebRTC at arm’s length. In the past I’ve been part of standardization processes related to H.323 and 3G-324M, going to ITU-T and 3GPP standardization meetings as well as acting as a co-chair of the 3G-324M activity group at the IMTC (dealing with interoperability). It is tedious work that combines technology with politics. As fun as it is (at times at least), dealing with it as an employee of a company is different than doing it as a consultant. The value for me just wasn’t there.

For vendors? If you want to take a driver’s seat at this, and decide what gets more attention, then you should invest time in it.

But where are we with WebRTC then?

W3C WebRTC status

I’ve asked Dominique Hazael-Massieux about WebRTC’s status. He works as W3C staff dealing with WebRTC. Here’s what I got –

When it comes to W3C, where the browser WebRTC APIs are being defined, WebRTC is considered to be at the CR stage.

CR means a Candidate Recommendation. We’ve moved from a Working Draft (WD) towards a Candidate Recommendation.

Next up would be PR – Proposed Recommendation, and from there, a Recommendation.

How do we move to the next step?

  1. First the draft needs to be finalized. There are some open issues that need to be closed for that to happen (at the time of writing this, there were 53 open issues)
  2. All the features written in the draft need to be implemented in two independent browsers (this is kinda tricky now that Chrome is gobbling up the market). More on browser implementations later
  3. It needs to be tested for interoperability across browsers, so tests need to be written to validate that

That first one is “easy”. Get the people writing the spec into a room. Have them agree. Then have someone write down the agreement on “paper”. Get everyone to read it. And agree again. Rinse and repeat. It’s never easy.

That second one of implementing in browsers? That’s also not easy. They have other things on their minds as well. And WebRTC is pretty darn complex to implement. But we’re getting there.

That third one of interoperability testing? With a test suite. That tests for the various features? This is downright suicidal. And daunting.

All that work needs to be done for “free”. There’s no direct money to be made out of it. But lots of hours need to be spent by many people to get it done. We’re getting there, but we’re not there yet.

WebRTC 1.0 browser implementation

And then there are the browser implementations.

The specification is only as good as its implementations. People always complain when I suggest following Chrome’s behavior in WebRTC as opposed to implementing against the specification. That’s where theory and expectations meet reality.

At the end of the day, your service will need to:

  1. Run inside web browsers; and/or
  2. Integrate/port/embed a WebRTC SDK in your app

In the first case, Chrome wins on market share; Microsoft Edge will be migrating to Chromium. And for most use cases, Chrome is the first browser to target anyway.

In the second case, if you are using the code in webrtc.org for your app, then you are effectively basing your app on Chrome’s WebRTC implementation.

Better go with what’s available now than what will be ready some time in the future.

In the past, the changes we’ve seen in browser implementations of WebRTC revolved a lot around media optimizations and interoperability across browsers. What we are seeing now a lot more is changes in the API layer, where browsers are shifting towards the WebRTC 1.0 specification. This is necessary because:

  • Without spec compliant implementations we can’t move WebRTC from CR to PR
  • People still (rightfully) expect to have the specification implemented by browser vendors
  • It is about time…

These changes mean one sad thing though. You can be certain of one thing – during 2019, WebRTC implementations in browsers are going to break existing apps multiple times. This is due to the changes taking place. We are seeing the migration from Plan B towards Unified Plan, modifications to the connection state machine, and an experimental implementation of mDNS. There’s more that I probably forgot and more ahead of us still.

The only certainty is that nothing is certain. You’ll need to continue investing in aligning your app with the browser implementations with each and every browser version release.
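
As an example of what that alignment looked like in practice: during the Plan B to Unified Plan migration, Chrome exposed a non-standard sdpSemantics field in RTCConfiguration, letting apps pin the behavior they were built against while porting – a hedged sketch (the STUN URL is a placeholder):

const pc = new RTCPeerConnection({
  sdpSemantics: 'unified-plan', // Chrome-specific flag; 'plan-b' kept the legacy behavior
  iceServers: [{ urls: 'stun:stun.example.org' }] // placeholder STUN server
});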

When then?

The current intent is to be able to get to the PR stage for WebRTC somewhere in Q3 2019. Will it be postponed further? I don’t really know.

Interestingly, work has started in parallel on WebRTC NV – what comes next. I’ve covered the WebAssembly in WebRTC part of it in the past.

Want to learn more about WebRTC, the various components in its specification and what compute power you need for each WebRTC server? Try out my free video course:

Learn about WebRTC servers

The post When will WebRTC 1.0 be available? appeared first on BlogGeek.me.

The five make-or-break WebRTC challenges you need to address

Mon, 02/25/2019 - 12:00

WebRTC is a great piece of technology, assuming you can develop a coherent strategy on how you plan on using it.

There are two extremes happening in the enterprise communication space, and they are quite opposite in nature. On one hand, companies are striving towards more automation and this is coming to their contact centers by way of machine learning and bots “replacing” humans. On the other hand, many of us are striving for better and more meaningful communications. Be it for long distance relationships (personal as well as business ones) or by the use of machine learning (again) and context, to guide us through an interaction – being able to know beforehand the intents of people for example.

Enter WebRTC, which enables communications to take place anywhere – be it a mobile application, a physical device or a modern web browser. What WebRTC brings with it is better context for sessions and a lower barrier of entry for enterprises to make use of this technology. Some enterprises use it to improve business agility or lower their operating costs. Others use it to create new businesses never before seen or to improve the communications with their customers or peers in the industry.

We are now 7-8 years since the announcement of WebRTC (depends on who’s doing the counting and from which date), but in many ways, a lot of enterprises (I don’t want to say most) have failed to capture the value they initially envisioned from using WebRTC. In many cases, the lack of any thoughtful strategy created a rush towards initiatives that never really matured.

Through my work with many clients on their WebRTC initiatives along with discussions with many others on their projects and services – failed as well as successful ones, I’ve seen a few challenges that crop up consistently across such initiatives.

#1 – Where to begin?

WebRTC is a versatile and powerful building block in your arsenal. This means that you can do a lot with it. That range of utility can be overwhelming, oftentimes leading to wasted resources. The other problem is that WebRTC can’t do everything, while the expectations of it are rather high. This leads to requirements and plans that are often not grounded in what can be done in reality or within the allocated budget and resources.

Deciding what to build using WebRTC requires an understanding of the capabilities and limitations of WebRTC coupled with a clear view of the communication problems you are trying to solve for your customers. There’s a lot of feature creep happening when it comes to WebRTC. I find myself asked about a simple video chat service for 2 people, but once you dig a layer deeper, you see requirements for group video calls, recording and even broadcasts as part of the project. Being able to see the full picture, and map it back into requirements and a roadmap comprised out of multiple phases is an important first step in any WebRTC initiative.

There are a few other things to keep in mind –

Integration with existing infrastructure

Oftentimes, you’d be planning on adding WebRTC to an existing service. This can happen in many ways:

  • A chat application that gets voice/video interactions as an additional feature
  • An existing telephony/communication service that needs to get guest access via the web browser
  • Just a regular self service application with a new option to connect to the contact center via the application itself (instead of using expensive 1-800 numbers)

This requires extra care in how WebRTC gets introduced as it isn’t going into a green field where anything you pick immediately fits your needs.

Cloud migration and transformation

WebRTC was born in the cloud era. Many of its deployments are cloud based.

Most of its uses in non-cloud environments are actually enabling guest access from the public cloud towards the internal communications infrastructure. In other cases, it just needs to integrate with on premise data centers for things like users database and policies.

This places an additional strain on enterprises who are just starting out their migration towards the cloud.

Not your regular web application

WebRTC is different than other web technologies. It has a lot more moving parts to get to a minimal viable product, and then there’s that media quality issue to contend with. Its deployment needs to start as a global one for many of the use cases.

What are the server side components needed for WebRTC? Learn that in my free online mini video course.

Register now

#2 – Who should I have on my team?

Putting a team of developers on a WebRTC initiative is a daunting task. There are multiple disciplines they need to come from and the myth of a full stack developer that can do it all gets stretched even further here, as that superhero needs to also know about media processing, WebRTC APIs, browser changes and standardization processes.

Here’s what I wrote a while back about WebRTC developers after discussing the topic with a few people who manage/hire them.

Some other aspects you’ll need to decide on:

Internal vs External

Will you be relying on your existing engineering team or will you be outsourcing some/most of the project to an external vendor? Assuming you decide to go for an external vendor, who will maintain the service on an ongoing basis?

Multidisciplinary

The team in question needs to be multidisciplinary, capable of handling anything from media processing, to mobile app development, to backend integration work and ongoing DevOps and maintenance.

There needs to be a skilled product manager and a system architect who understand WebRTC enough to know what is possible and what’s… less possible. What incurs risk and where quick wins can be found.

Which new skills are needed?

Your teams. Do they have the necessary skills?

Here it goes to a lot more than just developers. There are product managers, testers, DevOps people, support staff.

Do I need to enhance some in-house capabilities?

What skills are you missing? If you operate everything on premise and WebRTC is forcing you to start using cloud services, then this is an in-house capability you will need to start contending with.

The same goes for mobile application development, going global in how you deploy servers, etc.

Looking to beef up the WebRTC experience and skills of your team? Check out my WebRTC training (the first module is free).

Enroll to my course

#3 – What technology stack do I use?

Different companies have different DNA to them. That often dictates what their technology stack will look like and how they’d prefer to partner/hire.

There are three main aspects that need to be taken into account when picking a WebRTC technology stack:

Open source / commercial

You might favor open source components and frameworks for your WebRTC service or you might be someone who prefers a commercial offering with a company focused on that product development.

Both alternatives can come with support contracts but companies seem to prefer one or the other.

Which alternative will it be for you?

Hosted or on prem?

These two approaches mean different technology stacks, levels of expertise and staffing on your end.

Are you planning on hosting this on your own, in your data centers, on bare metal or in the cloud? Or are you going to have someone else host the service for you? Which parts of it will be managed and which will be self managed?

Acquisitions

WebRTC is still relatively new, with the vendors ecosystem dynamically shifting. There have been quite a few acquisitions in this space. These acquisitions sometimes removed solutions from the market, made them weaker or made them stronger.

When selecting a technology stack, the potential acquisition scenario of the vendors in question needs to be taken into consideration as well.

Fit for the requirements

This one seems silly but it is highly relevant and important.

Are you sure the technology stack you’ve selected can do the things you want it to do?

I’ve seen too many cases where the framework used wasn’t up for the task. Things like picking a signaling framework when media servers need to be used, or picking a CPaaS vendor when the scenario requires too much control over media processing, etc.

Just look at what WebRTC signaling alternatives people have these days.

#4 – How do I know it is working?

You built it. Tested it in the lab. Did a call or two with your colleagues. Went home and showed it to a friend.

Does it scale? Will it work properly?

I had a customer recently who is developing a group video calling feature. He wanted to test the service with around 20 people in a single room. It wasn’t easy to find 20 people to run that one scenario. And when he did – things broke and needed fixing. So he had to find 20 people to run it again once a fix was put in place.

Testing is often neglected when it comes to WebRTC applications and it shouldn’t be. Take this one seriously. You can cobble up a testing environment on your own (there are even a few open source projects that can help you out here) or you can just use testRTC (I am a co-founder there) and start running tests within a couple of hours.

#5 – What do I track?

Tracking websites is rather “easy” these days. Use Nagios, Cacti, Zabbix or any other open source tool that sounds like a disease. Or use something like New Relic or DataDog to do it managed in the cloud.

Problem is, these tools only cover machine metrics and performance. They don’t really watch the media and its quality (or even whether a session got connected, for that matter). There’s no end-to-end monitoring/tracking.

You will need to collect WebRTC related metrics from either the backend or the devices (or both). You’ll need to track it for quality.
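
As a hedged sketch of the device side of that collection – polling getStats() and shipping a few quality-related fields to your backend. Here pc is your RTCPeerConnection; the endpoint is hypothetical, and exact stat field names vary across browser versions:

setInterval(async () => {
  const stats = await pc.getStats();
  const samples = [];
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp') {
      // A few of the fields you'd typically track for quality
      samples.push({ packetsLost: report.packetsLost, jitter: report.jitter });
    }
  });
  navigator.sendBeacon('/webrtc-metrics', JSON.stringify(samples)); // hypothetical endpoint
}, 5000);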

You’ll need to monitor your service (we’re doing a webinar on WebRTC monitoring next month @ testRTC – register to join).

How can I get help?

There are various ways in which you can get some help for what you are doing.

The best approach is probably to get some external assistance in what you are doing as part of your research and planning – even before you go outsourcing the whole project (if that’s the path you are going to take).

You can contact me for that, or go to other consultants. Some of the outsourcing vendors offer such consultancy service as well. Whatever you do – don’t go it alone. At least not in the planning stages.

The post The five make-or-break WebRTC challenges you need to address appeared first on BlogGeek.me.

Who needs QUIC in WebRTC anyway?

Mon, 02/18/2019 - 12:00

Is QUIC in WebRTC a solution looking for a problem or a real requirement?

QUIC is the next evolution of browser transport protocols. I’ve written about it in 2015, when Google started experimenting with the idea of replacing SCTP with QUIC for data channels. Three and a half years later, and we still don’t really have QUIC in WebRTC – at least not until last month. Google decided to come out with a new RTCQUICTransport for WebRTC in Chrome and wrote a post about it on their Chrome Developers site.

UDP, TCP, SCTP & QUIC. How do these transport protocols compare?

Download my free Transport Comparison Table

What is QUIC again?

I am not going to go into the technical details – I’ve done that in the past already, and there are other places for that. I want to focus here on the bigger picture.

If you look at the timeline of web transport protocols, it looks something like this:

We’ve had TCP and UDP for some 40 years now. HTTP 1.1 is aging, but runs most of the internet at the moment. HTTP/2 is growing nicely in adoption. According to W3Techs, we’re standing at ~33% adoption for HTTP/2 (Feb 2019):

HTTP/2 came to be after Google came out with SPDY, a “fix” for HTTP and got parts (most?) of it wrapped into HTTP/2 to get it standardized.

HTTP 1.0, 1.1 and HTTP/2 are all built on top of TCP. Signaling, which requires reliability and ordering, won’t work on top of UDP without adding these characteristics. After around 40 years, it is time for a refresh. Enter QUIC. It uses UDP and works in ways that are better than TCP for signaling purposes.

QUIC follows a similar path – Google created it to “fix” the ailments of HTTP over TCP. The end goal here is to turn it into HTTP/3.

Since QUIC is built on top of UDP, it can handle a lot more than just HTTP signaling. Which is why it is becoming an interesting topic for WebRTC –

Where QUIC in WebRTC fits exactly?

This is the real question. My answer to it in 2015 was this:

There are two places where QUIC fits in WebRTC:

1. In the signaling, which is out of scope of WebRTC, but interesting, as it enables faster connection of the initial call (theoretically at least)

2. In the data channel, by replacing SCTP with QUIC wholesale

Google’s answer in their post on Chrome Developers blog?

Why?

A powerful low level data transport API can enable applications (like real time communications) to do new things on the web. You can build on top of the API, creating your own solutions, pushing the limits of what can be done with peer to peer connections, […] WebRTC’s NV effort is to move towards lower level APIs, and experimenting early with this is valuable.

Why QUIC?

The QUIC protocol is desirable for real time communications. It is built on top of UDP, has built in encryption, congestion control and is multiplexed without head of line blocking.

Hmm… somehow they lost me in that explanation somewhere. This is about real time communications. It is about doing stuff on top of UDP. And it is about low level APIs. Great. Why do I need it again? For voice and video I already have SRTP in WebRTC. The SCTP data channel works quite well. So where exactly do I need this great thing called QUIC in WebRTC?

I think there’s merit, but it is in totally different places.

QUIC is about having a single, modern, common transport protocol for the web.

Here’s what we do today with WebRTC in terms of transport protocols:

  • HTTPS, HTTP/2 or WebSocket for our signaling, which runs over TCP/TLS
  • SRTP for media, which runs over UDP
  • SCTP for data channels

There’s this popular drawing from the High Performance Browser Networking book that shows this amalgamation of protocols:

So many transport protocols in a single standard. This makes backend implementations more complex, as they need to understand all of these transport protocols as well. One could say these protocols are common and widely used enough that replacing them is a solution looking for a problem, but the developer in me can appreciate unifying all this functionality over a single transport protocol.

Here’s how life will look like with QUIC in WebRTC:

  • QUIC is being planned for HTTP/3, so it can be used for WebRTC signaling moving forward (replacing both WebSocket and HTTP/2)
  • QUIC is looked at as an SRTP replacement, which means sending real time audio and video can take place on top of it
  • QUIC can replace SCTP for the data channels (that was the obvious use of QUIC in WebRTC to begin with)

Putting it into an architecture diagram of my own, we get this:

Much simpler.

What do we gain?

Theoretically, we can multiplex signaling, voice, video and low latency data in a single QUIC connection. That’s powerful:

  • We can now tunnel or proxy all that WebRTC traffic with a lot less logic, boxes and code in our servers
  • For smaller deployments, we might not even need multiple servers – just the one that handles it all
  • It makes developing web servers that handle media and data channels simpler, as they need to support only one transport – QUIC, instead of having to implement multiple transports

What do we lose?

This isn’t going to happen in a day. Getting there is going to be a journey of multiple years and people will complain and whine about it along the way. Similar to what is happening today with WebRTC – whenever something is modified or something new is added – things tend to break (either because APIs get deprecated, behavior changes or just pure bugs).

Moving to a QUIC based stack is a huge undertaking – for the WebRTC stack, browser vendors and all the related internet infrastructure vendors.

Connecting to other realms such as SIP? That’s going to get even harder, as we move away from the domain of SRTP towards QUIC, more translations and protocol interworking will be required.

The question then becomes – is it worth all the fuss? Are we gaining enough to make this effort worthwhile?

Can you use QUIC in WebRTC now?

To some extent you can. Check out the recent post on QUIC @ webrtcHacks for that.

I will be adding a new dedicated lesson to my online WebRTC course about QUIC – my goal is to have the most up to date and relevant WebRTC training curriculum in the market, so keeping up with these changes comes with the territory.

Interested in WebRTC? Check out my WebRTC course.

The post Who needs QUIC in WebRTC anyway? appeared first on BlogGeek.me.

Which WebRTC JS library should I use?

Mon, 02/11/2019 - 12:00

I don’t really know, but there’s a lot in this innocent “WebRTC JS library” question that isn’t clear without digging a lot further.

Every now and again (= a week or two) I get a question asking me to help with the selection of this or that open source component, pick a CPaaS vendor for a project, find someone to outsource WebRTC work to or hire a stellar WebRTC developer.

Many of these emails are about shortcuts. Give us that silver bullet. Shortcuts seldom work with WebRTC.

Last week, I had a question come in. A startup is looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.

The problem I had with it, is that this simple question of which WebRTC JS library should I use didn’t align that well with the set of questions asked.

This article is about what components are needed for WebRTC deployments. If you’re looking to dig deeper into the media paths in WebRTC, then join my free webinar: Mesh, MCU or SFU

Register to the webinar

Let’s break down WebRTC to its main components as seen from a network architecture perspective:

  1. Signaling
  2. NAT traversal
  3. Media
  4. Other

Here’s a slide I’ve been using to explain where a device gets connected to in a typical WebRTC session –

Signaling

Signaling is how the devices reach out to one another. They can’t do it directly, since they don’t have each other’s IP address, and even if they could, we need some kind of a “protocol” for them to do that.

Signaling in WebRTC is… non-existent. You need to bring your own signaling. This approach confuses some developers, and probably explains the lack of a single good solution that fits everyone.

Today, you can use SIP, XMPP, MQTT or just proprietary protocols as your signaling for WebRTC traffic. Each such protocol will have its own set of frameworks, services and SDKs that you can use. Some will be free (open source) while others will be licensable software or SaaS based.
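
A minimal sketch of what “bring your own signaling” amounts to – relaying SDP and ICE candidates over a plain WebSocket. The server URL and message format here are assumptions; WebRTC mandates none of this:

const ws = new WebSocket('wss://signaling.example.com'); // hypothetical signaling server
const pc = new RTCPeerConnection();

// Send our ICE candidates to the other side as we discover them
pc.onicecandidate = (event) => {
  if (event.candidate) ws.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
};

// Handle whatever the other side sends us
ws.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'answer') await pc.setRemoteDescription(msg);
  else if (msg.type === 'candidate') await pc.addIceCandidate(msg.candidate);
};

// Kick off a session by shipping an offer over the WebSocket
async function call() {
  await pc.setLocalDescription(await pc.createOffer());
  ws.send(JSON.stringify(pc.localDescription));
}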

NAT traversal

NAT traversal is about being able to actually get media flowing.

WebRTC is P2P (peer to peer), meaning you can, in some cases, send media directly across devices. This is something that is otherwise impossible with web browsers. WebRTC also has a preference for UDP, since it offers better real time, low latency characteristics. It is also the only web browser traffic that makes use of UDP, which means it sometimes gets blocked as well.

NAT traversal is how WebRTC gets past these pesky issues, and it requires additional servers to help it do so. Some of these servers (TURN) may end up relaying all traffic through them…

At the end of the day, you will need to deploy these servers or pay for someone to do it for you (no free meals here).
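
Wiring those servers into a session is the easy part – a sketch, with placeholder URLs and credentials (deploying and paying for the servers themselves is the real work):

const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:3478' }, // public address discovery
    {
      urls: 'turn:turn.example.com:443?transport=tcp', // relay of last resort
      username: 'user',      // placeholder credentials
      credential: 'secret'
    }
  ]
});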

Media

Recording. Group calling. The need to control media paths. Broadcasting. All these end up requiring media servers in the backend. Ones that can process media in one way or another.

The most common approach today is to use SFUs and solve most of the world’s (media) problems with them. These also offer some signaling protocol of their own – my preference is usually to short circuit it and redirect all this traffic through a different signaling/messaging path – especially for the more complex applications.

Again, they come in different shapes, sizes and types – open source ones and commercial ones. You usually won’t be able to pay for them separately as a hosted service and will need to go to a CPaaS vendor to get the whole set of solutions – if you’re looking for the hosted/managed path.

Other

Payments, user authentication and identity, the website itself and a large number of other things you might be needing.

These are really out of scope of WebRTC, but sometimes are provided by the various vendors and frameworks out there.

Back to that question

What were we dealing with to begin with here?

looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.

Here’s how I’d break this one down to try and understand what was asked:

  • That “WebRTC JS library” gives a hint of someone searching for a signaling framework. Which is great
  • 1:1 voice chats strengthens that feeling we’re dealing with signaling only
  • The word rooms… that feels more like an SFU media server. In this case, I’ll assume there’s no need for a media server though – due to the price points asked (free), the fact that there’s no ask on recording and that this is a 1:1 scenario
  • Stores user profiles. Hmm. this usually has nothing to do with WebRTC. So much so that most CPaaS vendors don’t offer such a capability either
  • Twilio is about the full shebang – getting a hosted, SaaS, CPaaS, managed (pick the term you like best) solution that gives you signaling, NAT traversal, media and some other knick knacks. Doesn’t quite fit in with the rest of the ask here

When I get such jumbled questions, it feels like there’s a bit of a misunderstanding of what WebRTC is and about how the ecosystem of vendors and services has evolved around it.

Want to learn more about WebRTC?

There are several things to do at this point if you need to grok WebRTC:

  1. Read this article on learning WebRTC for more suggestions
  2. Read my WebRTC for Business People report (it is free)
  3. Learn how I think about WebRTC requirements
  4. Take the first module of my WebRTC training (it’s free)
  5. Join me for the webinar tomorrow – I’ll talk about Mesh, MCU and SFU media architectures

The post Which WebRTC JS library should I use? appeared first on BlogGeek.me.

WebRTC for Business People: 2019 Edition

Mon, 02/04/2019 - 12:00

Fresh from the oven – an update to my first ever report – WebRTC for Business People. Download it for free.

It was time. Two years have passed since my last update to this report. In WebRTC-land, things deteriorate and become unusable quite fast. We now have WebRTC in all modern browsers (at least theoretically and for some scenarios) and Microsoft decided to place Edge on top of Chromium. On the vendor stories front, things have changed and shifted as well.

This, and the need to do something to start off 2019, got me to write an update to the report. This time, with the assistance of Frozen Mountain, who sponsored this update.

Besides the usual updates of reading the report and making sure it is as close to where we are with WebRTC today as possible (and adding more references and links while at it), I’ve also updated the use cases section. I consider this part the most important one in the report.

I removed a few of the stories and added others, ending up with a total of 28 vendor stories. While the groups of these vendor stories haven’t changed, the direction I’ve taken in some of them did.

Here’s what you’ll find in there:

Tooling

The tooling section is usually the hardest one. With over 100 vendors in this space, I wanted to make a few distinct picks, each from a different angle of tooling. I decided this time around to also feature testRTC, a company where I am a co-founder (I am biased on this one, so sorry).

Customer Services and Support

In the customer services space I wanted to make a change to reflect the growing adoption of “see what I see” type of contact center services, also known as “remote assistance” or similar names. To that end, I’ve featured Indeca4D who are making use of mixed reality in their solution.

Enterprise Communications

In the enterprise communications space, it was time to put a UCaaS vendor – something overdue from the last round I guess. I picked Vonage for this one. They are unique also because they offer CPaaS (=Tooling) and contact center services.

Webinars

For the webinars section, I decided to add AnyMeeting. I’ve used other platforms in the past, and after getting to know their platform somewhat more, I decided to start using it for my webinars in 2019. The first webinar will take place next week (feel free to register here).

Healthcare

In Healthcare I’ve replaced one of the stories there with the story of GuruMD. One of the trends in this space is the creation of marketplaces and tools that independent doctors and clinics can start using with their patients or for attracting new clients.

Education

For Education, I’ve added Soliya. I wanted to somehow emphasize that education is probably one of the most varied domains where you see WebRTC. Almost every vendor there is looking at education from a different angle, leading to different requirements and final product offerings.

Social

Social… remained the same. The stories got a bit of a refresh where needed, but stayed mostly the same. I felt that Facebook, Houseparty, Snap and YouNow are relevant today as they were two years ago.

Streaming and Content Delivery

In streaming and content delivery, I’ve replaced two vendors, deciding to showcase Google Project Stream and Limelight. Both bringing some strong validation to where WebRTC is headed and how it fits into these non-video calling domains.

Download the report

If WebRTC interests you, then you should definitely read this report –

Tell me what you think about it.

The post WebRTC for Business People: 2019 Edition appeared first on BlogGeek.me.

Asking Google: WebRTC is …

Mon, 01/28/2019 - 12:00

This is going to be awkward. For me? WebRTC is an open source media engine with a publicly known JavaScript API that got implemented in browsers.

I’ve written a “what is WebRTC” article more than once. The most notable ones?

  1. What is WebRTC? – an article from 2017
  2. WebRTC FAQ: The 2018 Version
  3. WebRTC for Business People – a report that got updated in 2017, with a new 2019 edition coming real soon
  4. Advanced WebRTC Architecture Course – a full length paid for course that teaches WebRTC

This time, I wanted to check what Google thinks of WebRTC, so I started asking it:

Before we continue down this rabbit hole, make sure to register and join me in two weeks for a webinar covering Mesh, MCU and SFU topologies and what each one is good for in your WebRTC application.

Lets go one by one over these alternatives, trying to understand what are people looking for in their WebRTC.

WebRTC is disabled

Somehow, this got the highest ranking. VPN vendors doing their best with FUD and SEO here, in trying to get people to disable WebRTC in browsers.

Reminds me of the good old days when people disabled JavaScript in their browsers.

WebRTC does give access to the camera, microphone, screen and local IP address of a user. Most of it under the user’s own volition. You can use browser extensions to prevent local IP address “leaks”, while in Safari exposing local IP addresses requires user authorization of some sort as well.
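
That “user’s own volition” bit is baked into the API itself – camera and microphone access starts with an explicit permission prompt. A quick sketch:

async function requestMedia() {
  try {
    // This triggers the browser's permission prompt
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    document.querySelector('video').srcObject = stream; // assumes a <video> element on the page
  } catch (err) {
    console.log('No media for you:', err.name); // user denied, or no device available
  }
}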

Not sure how this got first place in “WebRTC is”.

WebRTC is free

Yes it is. Mostly. Somewhat. If you understand what “free” is.

You can go to webrtc.org and download it for free. You can even use it and modify it.

But then again, hosting a service isn’t free. Someone needs to pay for the network and electricity. Someone needs to do the coding.

This brings up a rather interesting mindset that I see in entrepreneurs and developers – they feel like using a third party framework or even a managed service should be free – or a lot cheaper than it is. So they go about developing it on their own, spending time and money on development (and oftentimes a lot more than it would have cost to just pick a managed service instead).

That concept of free in WebRTC? It is mostly about removing barriers of entry for vendors. It isn’t about free video calling.

WebRTC is_component_build

Beats me how this got so high as a suggestion by google.

The build system in WebRTC is often challenging. That’s because Google maintains the main WebRTC open source project with the main purpose of being embedded in Chrome. Due to this, it is just part of the Chrome build process and scripts, and not a standalone product or library.

This part is probably the most painful in WebRTC for developers who need to modify or adapt it for native applications.

Still not sure why it ranks so high.

WebRTC is dead

It isn’t. Can’t even call it a grownup or a teenager.

Moving on.

WebRTC is ready

Yep. It is.

WebRTC is ready. Developers will still bitch and whine that it isn’t complete and changes all the time breaking things up, but at the end of the day – if you’re doing something with communications these days, WebRTC should be the first thing to look at before searching elsewhere.

WebRTC is udp

It is also TCP. With a dash of SCTP. With talks about making it QUIC. Go figure.

UDP is what WebRTC uses to send its media. It works well because TCP has this nasty habit of retransmitting things to make sure they get received. This retransmission thing doesn’t work well where what you’re sending is time sensitive (like media of an interactive conversation).
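
Curious what your own sessions end up using? getStats() will tell you – a quick sketch, assuming pc is an active RTCPeerConnection:

async function logTransports(pc) {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === 'local-candidate') {
      // e.g. 'host udp', 'srflx udp' - or 'relay tcp' when UDP got blocked
      console.log(report.candidateType, report.protocol);
    }
  });
}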

Not sure why this one is in the top 10 either.

WebRTC is_clang

Like is_component_build, is_clang is also a build/compiler related setting. In this case, deciding which C/C++ compiler to use with WebRTC.

And again, I am clueless as to how and why this is such a popular Google search for WebRTC is.

WebRTC is not defined

This is golden.

The search itself is most probably related to compilation and runtime errors of developers with WebRTC, where they post the error messages around the web in stack overflow, discuss-webrtc and other online forums – asking for help from fellow developers.

Yet…

WebRTC isn’t defined. Yet.

People have promised me WebRTC 1.0 since 2015. Maybe a year or two earlier. We are now in 2019, talking about things like WebAssembly in WebRTC. But we still don’t have WebRTC 1.0. We’re getting there, but it is still a draft. Will WebRTC 1.0 standardization complete in 2019? Maybe. But WebRTC is not defined. But it is ready. Go figure.

WebRTC is p2p

WebRTC is peer to peer.

You can send media directly from one browser to another (if network conditions allow). But you need to handle signaling in front of web servers, which is kinda centralized. And sometimes, sending media peer to peer won’t work and the media has to be routed. And other times, you’ll want to send media towards a media server.

You can read more about it here – Get Over it: WebRTC isn’t Peer-to-Peer

WebRTC is supported

Something that is going to change meaning in 2019.

People used to ask “which browsers support WebRTC?” or “is WebRTC supported on X” where X is Internet Explorer, Edge or Safari.

Nowadays, we’re over that bit of a challenge, with the last gaps closing as well.

The shift of this one is going to be towards traditional voice and video services that are adding WebRTC support for guest access or for those who don’t want to install any apps.

In the last year or so, I’ve had to install a lot less applications for meetings I have with companies. It isn’t because we all use Google Meet – it is because almost all of the services (Zoom is the exception here) give WebRTC guest access. WebEx, GoToMeeting, Amazon Chime – all offer WebRTC support. So I can easily handle these calls without installing anything. And yes – WebRTC is supported.

What’s your WebRTC is search term?

I found this list of google search suggestions for WebRTC is quite interesting. Not exactly what I expected starting out.

For me, WebRTC is progress. It is the next step we’re taking in figuring out communications, and in that, it fills the role of one of the most basic building blocks we now have and use.

What about you? WebRTC is …

Looking to learn more about what WebRTC is? How about understanding about mesh, mixing and routing architecture? You should join me for this free webinar:

Register to Mesh, MCU or SFU webinar

The post Asking Google: WebRTC is … appeared first on BlogGeek.me.

What is a WebRTC Signaling Server and Why You Should NOT Use AppRTC?

Mon, 01/21/2019 - 12:00

AppRTC isn’t your friend when it comes to developing a commercial WebRTC application.

I already wrote about the fact that there’s no free TURN server from Google. It seems that I failed to mention the fact that you shouldn’t use Google’s “free” STUN server in production either. Which leads us to this great question on github on AppRTC:

apprtc websocket server down?

The interesting part about this one is that no one from Google commented on it at any point in time.

You see, AppRTC wasn’t meant as a full fledged application, and to some extent, not even as a reference application for other developers. It is mostly meant to be a hello world type of an example.

With a glaring lack of good, simple, popular open source signaling frameworks for WebRTC, developers sometimes use AppRTC for that purpose.

Signaling is important, and so is media. If you want to learn more about mesh, mixing and routing architecture, you should join me for this free webinar:

Register to Mesh, MCU or SFU webinar

While I use AppRTC for baselining, I don’t think it is a good starting place for actual development of a real service.

Here are 4 reasons why:

#1 – AppRTC doesn’t get much love and attention

Look at github insights for AppRTC:

See the number of additions and deletions taking place in 2018?

Latest commit? March 2018.

One could argue that this is because the “Hello World” example for WebRTC is already quite polished and working well, so there’s no need to change anything. Or that WebRTC is now stable enough.

#2 – This is just a “Hello World”

Here’s an example of a Hello World js function:

function hello(name) {
  console.log("Hello " + name);
}

hello('node.js');

This isn’t a starting point I’d use for writing an application.

The AppRTC application is admittedly larger. Here’s the lines of code count for its github project at the time of writing (not that I’d expect much change to it in 2019):

The problem is in what AppRTC doesn’t include, which many developers want/try to add:

  • Android and/or iOS AppRTC apps – these aren’t available from Google. There are 3rd party projects for it you can find on github, but they are even less maintained than the Google AppRTC one
  • Screen sharing – it isn’t there. Need it? Add it on your own
  • Multiparty – not there either. And if you’d try using AppRTC for it, my guess is you’d end up with a mesh architecture (which for 99.9% of the use cases and most definitely for your use case – is destructive)

#3 – Not built to scale

AppRTC uses a python based signaling server, which is great. The actual signaling protocol selected and used isn’t really documented anywhere, so you’ll need to dive into the code to figure it out if you want to add or modify anything. And you will, simply because a lot of the functionality you might want is missing.

The thing is, if you plan on scaling up your service to large number of users, you’ll need this to work across machines – and that’s not easy – or at least not trivial.

At Kranky Geek 2016, Google explained what they did to scale and improve signaling for their own production services. Check out what that means:

Not everyone needs to do things at scale, but many do. Starting from AppRTC places you at the wrong place for growth.

And when it comes to edge cases, it doesn’t cover them all – if ICE negotiation fails, you won’t know about it in the UI; you’ll just get an ICE failure message in the console log. That’s the example I bumped into when using testRTC with it and closing all ports but 443.
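
Surfacing that failure to the UI is a small amount of code you’d have to add yourself – a sketch, where showError() is a hypothetical function in your app:

pc.addEventListener('iceconnectionstatechange', () => {
  if (pc.iceConnectionState === 'failed') {
    // AppRTC leaves this in the console; a real app should tell the user
    showError('Could not connect media - check firewalls/ports (TURN over TCP/443 may help)');
  }
});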

#4 – Don’t iframe or URL to it

Running a service and just need basic meeting capabilities?

Don’t place AppRTC in an iframe of your app or have a URL to it open in another window.

You don’t get an SLA from Google when using AppRTC, and they won’t treat it like a critical service when it fails to run. Throughout the years there have been times when AppRTC was down for one reason or another.

Upwork, for example, used to use a third party free/sample/demo service similar to AppRTC or Jitsi Meet. Had to schedule a meeting with people you work with on Upwork? Click a button, and it created a kind of ad-hoc, random URL for that meeting and opened it in a new browser tab. They were smart enough to replace it with their own branded meetings feature later down the road.

That service that Upwork used? No longer exists. Want to get a signed guarantee from Google that AppRTC will stay up and running and work the same way it does today 2 years from now?

If you plan on running a serious business, host your own communications infrastructure or pay for it.

Do you have any other alternative?

Not really. Not an immediate one at least.

People are still falling to the trap of using peerjs (see here why NOT to use peer.js).

We used to have EasyRTC and SimpleWebRTC in the past. EasyRTC still gets some love and attention, so you can try it out. SimpleWebRTC is now deprecated – &yet has decided to offer it “as a service” instead.

There are many other github projects offering webrtc signaling. Most of them seem to be projects people built for themselves but never really matured to a robust framework that others have adopted.

I started suggesting Matrix, but many don’t really manage to get WebRTC to work well with it.

Then there’s the cloud based services – PubNub, Pusher, Scaledrone, Ably and even Google’s Firebase. These give you robust transport where you can pour your signaling protocol into.

Or a commercial software you can install anywhere such as Frozen Mountain’s WebSync.

In many cases, this will be an each to his own situation, where you’ll just need to develop it yourself or start somewhere and make it your own quite fast.

Signaling is important, and so is media. If you want to learn more about mesh, mixing and routing architecture, you should join me for this free webinar:

Register to Mesh, MCU or SFU webinar

The post What is a WebRTC Signaling Server and Why You Should NOT Use AppRTC? appeared first on BlogGeek.me.

What’s the Role of WebAssembly in WebRTC?

Mon, 01/14/2019 - 12:00

WebAssembly in WebRTC will enable vendors to create differentiation in their products, probably favoring the more established, larger players.

In Kranky Geek two months ago, Google gave a presentation covering the overhaul of audio in Chrome as well as where WebRTC is headed next. That what’s-next part was presented by Justin Uberti, creator and lead engineer for Google Duo and WebRTC.

The main theme Uberti used was the role of WebAssembly, and how deeper customizations of WebRTC are currently being thought of/planned for the next version of WebRTC (also known as WebRTC NV).

Before we dive into this and where my own opinions lie, let’s take a look at what WebAssembly is and what makes it important.

Looking to learn more about WebRTC? Start from understanding the server side aspects of it using my free mini video course.

Enroll to the free course

What is WebAssembly?

Here’s what webassembly.org has to say about WebAssembly:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

To me, WebAssembly is a JVM for your browser. Just as Java is a language that gets compiled into binary code that then gets interpreted and executed on a virtual machine, WebAssembly, or Wasm, allows developers to take hard core languages (which means virtually any language), “compile” them to a binary representation that a Wasm virtual machine can execute efficiently. And this Wasm virtual machine just happens to be available in all web browsers.

WebAssembly allows vendors to do some really cool things – things that just weren’t possible to do with JavaScript. JavaScript is kinda slow compared to C/C++, and a lot of hard core stuff that’s already written in C/C++ can now be ported/migrated/compiled using WebAssembly and used inside a browser.

Here are a few interesting examples:
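
To give a feel for the mechanics, here’s a minimal, hedged sketch of loading and calling a compiled Wasm module from JavaScript – the file name and exported function are made up:

async function loadFilter() {
  // Fetch and compile the binary module in one go
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch('noise_filter.wasm') // hypothetical module compiled from C/C++
  );
  // Exported functions are then callable as if they were plain JavaScript
  return instance.exports.process_frame; // hypothetical export
}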

What’s in WebRTC NV?

While the ink hasn’t dried yet on WebRTC 1.0 (I haven’t seen a press release announcing its final publication), discussions are taking place around what comes next. This is being captured in a W3C document called WebRTC Next Version Use Cases – WebRTC NV in short.

The current list of use cases includes:

  • Multiparty voice and video communications for online gaming – mainly more control on how streams are created, consumed and controlled
  • Improved support in mobile networks – the ability to manage and switch across network connections
  • Better support for media servers
  • New file sharing capabilities
  • Internet of Things – giving some love, care and attention to the data channel
  • Funny hats – enabling AI (computer vision) on video streams
  • Machine learning – like funny hats, but a bit more generic in its nature and requirements
  • Virtual reality – ability to synchronize audio/video with the data channel

While some of these requirements will end up being added as APIs and capabilities to WebRTC, a lot of them will end up enabling someone to control and interfere with how WebRTC works and behaves, which is where WebAssembly will find (and is already finding) a home in WebRTC.

Google’s example use case for WebAssembly in WebRTC

At the recent Kranky Geek event, Google shared with the audience their recent work in the audio pipeline for WebRTC in Chrome and the work ahead around WebRTC NV.

For Google, WebRTC NV means these areas:

The Low Level APIs is about places where WebAssembly can be used.

You should see the whole session, but here it is from where Justin Uberti starts talking about WebRTC NV – and mainly about WebAssembly in WebRTC:

WebAssembly is a really powerful tool. To give a taste of it with WebRTC, Justin Uberti resorted to the domain of noise separation – distinguishing between speech and noise. To do that, he took RNNoise, a noise suppression algorithm based on machine learning, ported it to WebAssembly, and built a small online demo around it. The idea is that in a multiparty conference, the system won’t switch to the camera of a person unless he is really speaking – ignoring all other interfering noises (key strokes, a falling pen, eating, moving furniture, etc.).

Interestingly enough, the webpage hosting this demo is internal to Google and has a URL called hangouts_echo_detector/hackathon_2018/doritos – more on that later.

To explain the intent, Justin Uberti showed this slide:

As he said, the “stuff in green” (that’s Session Management, Media Processing, Codecs and Packetizer/FEC/RTX) can now be handled by the application instead of by WebRTC’s PeerConnection and enable higher differentiation and innovation.

I am not sure if this should make us happier or more worried.

In favor of differentiation and innovation through WebAssembly in WebRTC

Savvy developers will LOVE WebAssembly in WebRTC. It allows them to:

  • have way more control over the browser behavior with WebRTC
  • add their own shtick
  • do stuff they can’t do today – without waiting on Google and the other browser vendors

In 2018, I’ve seen a lot of companies using customized WebRTC implementations to solve problems that are very close to what WebRTC does, but with a difference. These mainly revolved around streaming and internet of things type of use cases, where people aren’t communicating with each other in the classic sense. If they’d have low level API access, they could use WebAssembly and run these same use cases in the browser instead of having to port, compile and run their own stand-alone applications.

This theoretically allows Zoom to use WebRTC and by using WebAssembly get it to play nice with its current Zoom infrastructure without the need to modify it. The result would give better user experience than the current Zoom implementation in the browser.

Enabling WebAssembly in WebRTC can increase the speed of innovation and spread it across a larger talent pool and vendors pool.

In favor of a level playing field for WebRTC

The best part about WebRTC? Practically any developer can get a sample application up and running in no time compared to the alternatives. It reduced the barrier of entry for companies who wanted to use real time communications, democratizing the technology and making it accessible to all.

Since I am on a roll here – WebRTC did one more thing. It leveled the playing field for the players in this space.

Enabling something like WebAssembly in WebRTC goes in the exact opposite direction. It favors the bigger players who can invest in media optimizations. It enables them to place patents on media processing and use it not only to differentiate but to create a legal mote around their applications and services.

The simplest example to this can be seen in how Google itself decided to share the concept by taking RNNoise and porting it to WebAssembly. The demo itself isn’t publicly available. It was shown at Kranky Geek, but that’s about it. Was it because it isn’t ready? Because Google prefers having such innovations to itself (which it is certainly allowed to do)? I don’t know.

There’s a dark side to enabling WebAssembly in WebRTC – and we will most definitely be seeing it soon enough.

Where do we go from here?

WebRTC is maturing, and with it, the way vendors are trying to adopt it and use it.

Enabling WebAssembly in WebRTC is going to take it to the next level, allowing developers more control of media processing. This is going to be great for those looking to differentiate and innovate or those that want to take WebRTC towards new markets and new use cases, where the current implementation isn’t suitable.

It is also going to require developers to have better understanding of WebRTC if they want to unlock such capabilities.

Looking to learn more about WebRTC? Start from understanding the server side aspects of it using my free mini video course.

Enroll to the free course

The post What’s the Role of WebAssembly in WebRTC? appeared first on BlogGeek.me.

What’s the Best Size for a WebRTC SFU Media Server?

Tue, 01/08/2019 - 12:00

Small, Medium, Big or Extra Large? How do you like your WebRTC SFU Media Server?

I just checked AWS. If I had to build the most bad-ass, biggest, meanest, scalest, siziest server for WebRTC – one that can handle gazillions of sessions – I’d go for this one:

A machine to drool over… Should buy such a toy to write my articles on.

Or should I go for the biggest machine out there?

I did a round-up of some of the people who develop these SFUs. And guess what? None of them is ordering the XL machine.

They go for a Medium or Medium Well. Or should I say Medium Large?

Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.

Start your free course

Anyways – here are a few things to think about when picking a machine for your SFU:

Going BIG on your SFU

As big as they come that’s how big you wanna take them.

We called it scale up in the past. Taking the same monolith application and putting it on a bigger machine to get more juice out of it.

It’s not all bad, and there are good reasons to go that route with a media server:

Managing less machines

If one big machine does the work of 10 smaller machines, then all in all, you’ll need 1/10 the number of machines to handle the same workload.

In many ways, scaling is non-linear. To get to linear scaling, you’ll need to put a lot of effort. Different bits and pieces of your architecture will start breaking once you scale too much. In this sense, having less machines to manage means less scaling headaches as well.

Having bigger rooms

Group calling is what we’re after with media servers. Not always, but mostly.

Getting 4 people in a room is easy. 20? Harder. 500? Doable.

The bigger the rooms, the more you’ll need to start addressing it with your architecture and scale out strategies.

If you take smaller machines, say ones that can handle up to 100 concurrent users, then getting any group meeting to 100 participants or more is going to be quite a headache – especially if the alternative is just to use a bigger machine spec.

The bigger the rooms you want, the bigger the machines you’ll aim for (up to a point – if you want to cater for 100+ users in a room, I’d aim for other scaling metrics and factors than just enlarging the machines).

Less fragmentation

Similar to how you fit chunks of memory allocations into physical memory, fitting group sessions into media servers, and maybe even cascading them across machines will end up with fragmentation headaches for you.

Let’s say some of your meetings are really large and most are pretty smallish. But you don’t really know in advance which is which. What would be the best approach to start fitting new rooms into existing media servers? This isn’t a simple question to answer, and it gets harder the smaller the machines are.

Simpler architecture (=no cascading)

If you are setting up the media server for a specific need, say catering for the needs of a hospital, then the size is known in advance – there’s a given number of hospital beds and they aren’t going to expand exponentially overnight. The size of the workforce (doctors and nurses) is also known. And these numbers aren’t too big. In such a case, aiming for a large machine, with an additional one acting as an active/passive pair for high availability, will be rather easy.

Aiming for smaller machines might get you faster to the need to scale out in your architecture. And scaling out has its own headaches and management costs.

Simpler

Bigger machines are going to be simpler in many ways.

Going small on your SFU

This is something I haven’t thought about as an alternative – at least not until a few years ago when I was helping a client in picking a media server for his cloud based service. One of the parameters that interested him was how small was considered too small by each media server vendor – trying to understand the overhead of a single media server process/machine/application.

I asked, and got good answers. I since decided to always look at this angle as well with the projects I handle. Here’s where smaller is better for WebRTC media servers:

Easier to upgrade

I dealt with upgrading WebRTC media servers in the past.

There are two things you need to remember and understand:

  1. WebRTC moves fast (and breaks things while doing so)
  2. You’ll need to update your backend rather frequently, including your media servers

The most common approach to upgrades these days is to drain media servers – when wanting to upgrade, block new sessions from going into some of the media servers, and once the sessions they are already handling are closed, kill and upgrade that media server. If it takes too long – just kill the remaining sessions.

Smaller machines make it easier to drain them as they hold less sessions in them to begin with.

Having more machines also means you can mark more of them in parallel for draining without breaking the bank.
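
To make this concrete, here’s a minimal sketch of what such a drain flow might look like. The MediaServer interface here (markDraining, activeSessions, terminate) is hypothetical – every deployment rolls its own orchestration layer:

```typescript
// A minimal drain-and-upgrade flow for a pool of media servers.
// The MediaServer interface is hypothetical - adapt to your orchestration layer.
interface MediaServer {
  id: string;
  activeSessions(): Promise<number>;
  markDraining(): Promise<void>; // load balancer stops sending new sessions here
  terminate(): Promise<void>;    // kill the instance so it can be upgraded/replaced
}

const DRAIN_TIMEOUT_MS = 60 * 60 * 1000; // let sessions end naturally for up to an hour
const POLL_INTERVAL_MS = 30 * 1000;

async function drainAndUpgrade(server: MediaServer): Promise<void> {
  await server.markDraining();

  const deadline = Date.now() + DRAIN_TIMEOUT_MS;
  while (Date.now() < deadline) {
    if ((await server.activeSessions()) === 0) break;
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
  }

  // If it takes too long - just kill the remaining sessions along with the server.
  await server.terminate();
}
```

The smaller the machine, the fewer sessions this loop has to wait on – which is exactly the point.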

Blast radius of crashes

This is what started me on this article to begin with.

I took the time to watch Werner Vogels’s keynote from AWS re:Invent, which took place in November 2018. In it, he explains what got AWS on the route to build their own databases instead of using Oracle, and why the cloud has different requirements and characteristics.

Here’s what Werner Vogels said:

With blast radius we mean that if a failure happens, and remember: everything fails all the time. Whether this is hardware or networking or transformers or your code. Things fail. And what you want to achieve is that you minimize the impact of such a failure on your customers.

Basically, if something fails, the minimum set of customers should be affected, if that’s the case.

Everything fails all the time.

And we do want to minimize who’s affected by such failures.

The more media servers we have (because they are smaller), the fewer customers will be affected when one of these servers fails. Why? Because our blast radius will be smaller.
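
A quick back-of-the-napkin example: with 1,000 concurrent users on 4 big servers, a single server crash takes down 250 users; spread the same load across 20 small servers, and a crash affects only 50.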

CPU utilization

Here’s something about most modern media servers you might not have known – they don’t eat up CPU. Well… they do, but less than they used to a decade ago.

In the past, media servers were focused on mixing media – the industry rallied around the MCU concept. This means that all video and audio content had to be decoded and re-encoded at least once. These days, it is a lot more common for vendors to be using a routing model for media – in the form of SFUs. With it, media gets routed around but never decoded or encoded.

Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.

Start your free course

In an SFU, network I/O and even memory get far more utilized than the CPU itself. When vendors go for bigger machines, they end up using less of the CPU of those machines, which translates into wasted resources (and you are paying for that waste).

At times, cloud vendors throttle network traffic, putting a limit on the number of packets you can send or receive from your cloud servers, which again ends up limiting how much you can push through your servers – causing you to go for bigger machines while finding it hard to get them fully utilized.

Smaller machines translate into better CPU utilization for your SFU in most cases.

Number of Cores/CPUs and Your SFU’s Architecture

Big or small, there’s another thing you’ll need to give some thought to – the architecture of the media server itself.

Media servers contain two main components (at least for an SFU):

  1. Control/signaling
  2. Media routing

Sometimes, they are coupled together, other times, they are split between threads or even processes.

In general, there are 3 types of architectures that SFUs take:

  1. Have a single process handle both control and media; doing it in a multithreaded mode
  2. Have separate processes that can scale out, running each on its own machine or thread
  3. Decoupling control and media and having both of them scale out independently of each other

Me? I like the third alternative for large scale deployments. Especially when each process there is also running a single thread (I don’t really like multithreaded architectures and prefer shying away from them if possible).

That said, that third option isn’t always the solution I suggest to clients. It all depends on the use case and requirements.

In any case, you do need to give some thought to this as well when you pick a machine size – in almost all cases, you’ll be using a multi-core machine anyway, so better make the most of it.
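
To illustrate that single-threaded-process pattern, here’s a sketch using Node.js’ cluster module – forking one media worker per core, with the primary process acting as the control plane. startMediaWorker() is a hypothetical stand-in for your SFU’s per-core entry point:

```typescript
// Sketch: one single-threaded media worker process per CPU core.
import cluster from "node:cluster";
import { cpus } from "node:os";

if (cluster.isPrimary) { // cluster.isMaster on older Node versions
  // The primary process handles control/signaling and supervises workers.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
  cluster.on("exit", (worker) => {
    console.log(`media worker ${worker.process.pid} died - restarting`);
    cluster.fork(); // keep the blast radius to a single core's worth of sessions
  });
} else {
  startMediaWorker();
}

// Hypothetical per-core entry point - your SFU's single-threaded media loop.
function startMediaWorker(): void {
  console.log(`media worker ${process.pid} running on its own core`);
}
```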

How Do You Like Your SFU?

Back to you.

Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.

Start your free course

The post What’s the Best Size for a WebRTC SFU Media Server? appeared first on BlogGeek.me.

A new design and what to expect in 2019 from BlogGeek.me?

Mon, 12/31/2018 - 12:00

The new look is here – and it is less… green.

I’m splitting this one into two main parts – the redesign and what’s going to happen in 2019.

BlogGeek.me – Redesigned

When I started this blog, what I didn’t want is yet another blue website. Somehow, it didn’t seem right to me. I ended up with a green one. So much so, that it stuck to almost everything else that I did online. As a kid, I really liked light blue – I don’t think green was anywhere in my sights.

Earlier this year, I wanted to refresh the look and the “brand” that is BlogGeek.me a bit. Luckily, the original designer just moved back from being a designer in an IoT startup to being a freelancer again, so I asked her for a new look. Which she happily and lovingly provided.

A few months later, with a lot of deliberation, hard work and updating ALL posts and pages (I had a lot of crap lying around due to custom shortcodes and plugins that accumulated in 6 years), I decided to take the plunge and update the main site with the new design.

What are the main differences?

There’s a lot… but here’s what you should know:

  1. I’ve reduced the number and frequency of nagging popups. From now on, the only thing that will jump at you might be what is called an exit intent – it will show relevant content you may want to review further, and only once you’re ready to leave the page (no more searching for the x in the middle of reading an article)
  2. What is it that I do for a living? My site was designed and built as a blog. That last redesign I did was nice, but still left people wondering how I can actually help them. I tried fixing that with a new homepage and a simplified menu bar and footer area
  3. No course. I haven’t closed my WebRTC training – I just moved it to a website of its own: WebRTCcourse.com. This allows me to focus on the course and improve it in ways I just couldn’t do when it was part of BlogGeek.me
  4. Better reading experience. For now, I decided that article pages won’t have a sidebar, so you’ll get a distraction-free reading experience. The fonts are also bigger now (I am getting older, and with it my preference in font size seems to be changing)

Oh – and the pictures of me featuring on the website? They’re also new. Took them earlier in 2018.

Things are still broken

Not everything is working flawlessly. And there’s a reason for that: I knew that if I didn’t just ship the thing, it would never come to be. So I decided to release it “as is” at this point. I wanted to have a fresh start in 2019 with my website.

Here are some things I know are broken:

  1. Mobile. Bad job there. This is known and will be taken care of through January
  2. Digital payments. The online store that I have/had was split into 2 – the one on BlogGeek.me which serves the reports and a separate one on WebRTCcourse.com which… needs to be fixed

Other than that, some pages are still ugly, and in other cases, there might be some dead or broken links.

If you find anything – just email me about it – I must have missed some of the ailments throughout this transition so I really appreciate your help here.

What to expect from BlogGeek.me in 2019?

Honestly, I don’t really know. At least not exactly.

Each year I start off with a plan, in which certain initiatives take place throughout the year. Some of them come to fruition while others – don’t.

Here’s what I decided for 2019:

Webinars

Last year was a rather slow year for webinars. Both on BlogGeek.me and on testRTC (where I am a co-founder and CEO).

This is going to change.

In 2019, I want, at least theoretically, to do a webinar a month for each. A lineup of topics has been created and is maintained (I’ll need more topics, but I have a good starting point).

For BlogGeek.me, webinars would be around topics that make sense for me at a given month. First one will be around Mesh/MCU/SFU – one of those topics that I can endlessly babble about.

testRTC webinars are going to focus on things that you can do with testRTC. Instead of trying to aim for generic WebRTC industry/testing/marketing/promoting/whatever non-focus, we’re going to double down on best practices, hacks and interesting things we’re bumping into with our customers at testRTC.

testRTC

Speaking of testRTC – we’ve had a good year in 2018, growing our list of customers and getting into new areas. We’ve rewritten a big portion of our backend and we will continue with the rewrite in 2019 to close our technical debt.

Expect some new features and a new product or two from testRTC to be announced during 2019.

Articles on BlogGeek.me

I am going to write this year on BlogGeek.me, as well as other places when time permits.

For now, I plan to stick with one article per week, something that was hard to maintain this year and I assume will be even harder in 2019.

WebRTC Training

My online WebRTC course has over 250 registered students. I want to scale it up even further.

This year, I’ll be giving the course additional focus, making sure it stays the best alternative out there for those who wish to learn WebRTC.

In February, there will be a few announcements about the course.

Reports update

The reports will get some refresh in 2019.

The WebRTC for Business People report is up for a 2019 edition (later this month). I’d like to thank Frozen Mountain for sponsoring this initiative and making this edition free for everyone.

I might do an update to Choosing a WebRTC API Platform report. There are enough changes in the industry taking place that merit such an update. If you are a CPaaS vendor, who is now offering WebRTC support of some kind and you’re not featured in this report already – contact me.

The recent AI in RTC report I’ve written with Chad Hart doesn’t need an update. Yet.

Kranky Geek

Unlike previous years, Kranky Geek already has a date for 2019: November 15, San Francisco, Google office – same place as always.

If you’d like to talk about sponsorships, speaking opportunities and such – we’re happy to start this earlier than usual.

In any case, mark your calendar.

Other projects and initiatives

As in previous years, more projects will crop up during the year. There are a few I am contemplating already, but not sure yet if I’ll be doing them.

If there’s a project you’d like to do together – just tell me.

2019

Have a great new year!

The post A new design and what to expect in 2019 from BlogGeek.me? appeared first on BlogGeek.me.

All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC Applications

Mon, 12/17/2018 - 12:00

There’s a lot of fuzzing around lately about WebRTC. Which is really about SRTP. Which is really important. But also really misplaced.

Before I Begin

This all started when Google Project Zero, a team tasked with actively searching for zero day bugs (nasty crashes and similar bugs that might be exploited by hackers) set their sights on video conferencing and WebRTC. The end result of it all is a github repository with tools to test RTP streams (and some filed bugs).

A few things to put the house in order:

  1. These bugs are important. Go fix them
  2. I am not a security expert, but I know my way with security and have a few scars to show for it
  3. This isn’t the end of the world. A few bugs were found. Many of them old. This happens every day. Some are nastier than others
  4. These won’t be the last bugs in WebRTC and they won’t be the most serious that get found either. Just ask NewVoiceMedia about their recent audio issues
  5. We will all forget about this come 2019 and proceed with our normal daily lives

Now that we’ve cleared the air – let’s check what’s all that fuzz. Shall we?

What Fuzzing means

Wikipedia has this to say about Fuzzing:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.

For me, fuzz testing is about the generation of malformed inputs in ways that the developers haven’t anticipated or tested for. This will result in undefined behavior, which is largely a nicer way of saying a bug. In some cases, the bug will be an innocent one. In other cases, it can be nasty:

  • It might cause the software to crash
  • Go read or write where it shouldn’t (overflow)
  • Deadlock the whole thing (=cause it to freeze)
  • Cause a memory leak

The types of bugs that can be found are endless, which makes for really good FUD (fear, uncertainty, doubt) and lore.

A good malformed input can theoretically be used to grant you administrative access to a machine, or to let you read memory you shouldn’t have access to.

A simple explanation can be this: assume your software expects a user’s email to be 40 characters long. Shorter than that is obviously fine, but what will happen if you use an email that is longer than 40 characters? Somewhere along the line, there should be a piece of code that checks the length and rejects it as too long. And if there isn’t… well… we’ve reached the realm of undefined behavior and potential security bugs.

The same can happen in network protocols, where whatever you send “on the wire” has a structure of sorts. The machines need that structure to be able to parse the data and act upon it. So if you change the data so it is close to the expected structure, but off by just a bit – you might get to that realm of undefined as well.

Fuzzing is trying to get to that place – adding randomness in just the correct places to get to undefined software behavior.
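
To make the email example concrete, here’s a minimal sketch of generation-based fuzzing. parseEmail() is just a stand-in for whatever input-handling code you’d want to exercise:

```typescript
// Minimal generation-based fuzzer sketch. parseEmail() is a stand-in
// for the real code under test - replace it with your own parser.
function parseEmail(input: string): void {
  if (!input.includes("@")) throw new Error("not an email");
  // ... real parsing and length checks would live here
}

function randomInput(maxLength: number): string {
  const length = Math.floor(Math.random() * maxLength);
  let s = "";
  for (let i = 0; i < length; i++) {
    // Use the full byte range - control characters included - to hit
    // code paths the developers never anticipated.
    s += String.fromCharCode(Math.floor(Math.random() * 256));
  }
  return s;
}

for (let i = 0; i < 100_000; i++) {
  const input = randomInput(10_000); // far past that assumed 40-character limit
  try {
    parseEmail(input);
  } catch {
    // Expected parse errors are fine; crashes, hangs and runaway memory
    // are the interesting finds.
  }
}
```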

Let me tell you a bedtime story

My fuzzy life started in Finland, though I’ve never been there (yet).

One day at Oulu University, a new something called the “PROTOS Test Suite” was created. At the time, I was the project manager leading the development and maintenance of RADVISION’s H.323 protocol stack. We licensed it to many vendors around the globe, all using our source code to build VoIP products.

The PROTOS Test-Suite was all about security testing. The intent behind it was to find bugs that cause crashes and other ailments to those using H.323. And they chose the best possible entry point. Here’s how they phrased it:

The purpose of this test-suite is to evaluate implementation level security and robustness of H.225.0 implementations. H.225.0 is a protocol responsible for signalling and setting up H.323 calls. […]

The scope of the test-suite was narrowed to H.225.0 version 4 Setup-PDU. Rationale behind this selection was:

  • Setup is the first message sent to a target H.323 endpoint upon call signalling, it is easy to deliver test-cases and to restore the implementation back to its initial state by disconnecting.
  • […]

I marked in bold the important parts. Specifically, the guys at Oulu decided to go after the “pick up line” of H.323 and try to come up with nasty Setup messages that will confuse H.323 devices.

And confuse they did. PROTOS has 4497 Setup messages. On my first run with it, probably 50% of them caused our beloved H.323 stack to crash. I spent a week building software to automate running it and fixing all the nastiness out of our stack. I admired the work they did and the work they made me do.

PROTOS practically analyzed how things go on the wire, and devised a set of messages that were bound to trip up the bad programming practices we all err on as humans. This isn’t exactly fuzzing in an automated fashion, but it is the “manual” equivalent of it.

This got its own CERT vulnerability note and we had a great time working with our customers on updating our stack and getting these security fixes to work.

I believe some of our customers actually upgraded and updated their systems due to this. I am sure many didn’t. I am also assuming many of our customers’ customers didn’t upgrade their own deployed equipment. And the world continued on. Happily enough.

All this took place in 2004. Before WebRTC. Before the cloud. Before mobile. With practically the same RTP/RTCP protocol and the same techniques and mechanisms in VoIP that we use today in WebRTC.

Why didn’t people look at RTP vulnerabilities at that time? We’ll get to that.

Google’s Project Zero and video conferencing

This year, Google Project Zero decided to look at video conferencing. The “way in” was through WebRTC. Natalie Silvanovich was tasked with this and she wrote a series of 5 posts about it. The first one was about her selection and adventures with WebRTC itself. In it, she writes:

I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. […] WebRTC uses SDP for signalling.

I reviewed the WebRTC SDP parser code, but did not find any bugs. I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. […]

I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. […]

Setting up end-to-end fuzzing was fairly time intensive […]

A few things that come to mind here:

  1. The “signaling” layer in WebRTC (=the SDP parser) is rather robust against these types of attacks. Natalie couldn’t find anything there
  2. Signaling and SDP, is the equivalent of what the guys at Oulu did with their PROTOS test suite
  3. There is a notion here of “call answering”. This isn’t what WebRTC does. It connects sessions. Sometimes directly and sometimes indirectly. And in all cases, there are layers above RTP that the users (and attackers) will need to go through first
  4. Setting up such a test, doing end-to-end fuzzing in the RTP layer is time intensive

Time intensive is important, as this raises the bar to those wishing to exploit such a weakness.

The fact that RTP isn’t the first attack surface and isn’t the first layer of interaction makes it somewhat less obvious how to exploit it (beyond instigating DDoS attacks on devices and servers).

Coupling these two – the complexity and the non-obviousness of an exploit – is what kept people from putting the effort into it up until today.

The Fuzzy feelings of our WebRTC industry

Ben Hawkes, Project Zero’s team lead, tweeted about the series; his tweets garnered 3-digit likes and retweets, tapering off for the last 2 posts (I attribute that to fatigue with the subject):

Project Zero blog: "Adventures in Video Conferencing Part 1: The Wild World of WebRTC" by @natashenka https://t.co/pdtZLDDP9M

— Ben Hawkes (@benhawkes) December 4, 2018

That kind of sharing is an average day for most posts published by that team. A few immediately took the cue and started fuzzing on their own. A notable example is Philipp Hancke who aimed at the Janus media server and fuzzed REMB RTCP messages.

His attack was quite successful due to several reasons:

  1. He had the source code of Janus and was able to isolate the area he wanted to attack. This made the process easier than the work done by Project Zero
  2. He picked an obvious target that was bound to crash multiple times – a message buried deep inside the protocol, aimed at control logic that kicks in long after the session gets connected. A sketch of this kind of mutation fuzzing follows below
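
Here’s roughly what that kind of mutation fuzzing looks like. The seed bytes and parseRtcp() below are placeholders – in practice you’d capture a valid REMB packet off the wire and feed the mutated copies into the media server’s actual RTCP parsing code:

```typescript
// Mutation fuzzing sketch: corrupt a few random bytes of a valid RTCP
// packet and feed it to the parser under test.
// The seed is a placeholder - capture real REMB bytes from a live session.
const seedRembPacket = new Uint8Array([
  0x8f, 0xce, /* ...rest of a captured REMB packet... */
]);

// Stand-in for the code under test (e.g. a media server's RTCP handling).
function parseRtcp(packet: Uint8Array): void {
  if (packet.length < 8) throw new Error("runt packet");
  // ... real header/length/SSRC parsing would live here
}

function mutate(packet: Uint8Array): Uint8Array {
  const copy = Uint8Array.from(packet);
  const flips = 1 + Math.floor(Math.random() * 4);
  for (let i = 0; i < flips; i++) {
    const pos = Math.floor(Math.random() * copy.length);
    copy[pos] = Math.floor(Math.random() * 256); // corrupt lengths, ids, anything
  }
  return copy;
}

for (let i = 0; i < 100_000; i++) {
  try {
    parseRtcp(mutate(seedRembPacket));
  } catch {
    // Clean parse errors are fine; crashes and memory corruption are the finds.
  }
}
```
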
Should you start Fuzzing away your WebRTC application?

Probably not.

And let’s face it – in the list of tests that you want to do but don’t do today, fuzzing fits nicely at that end of the list – the things you just never find the time or priority to handle.

The good thing? For most of us, fuzzing is something that “others” should be doing.

If you are using a CPaaS vendor, it is his task to protect his signaling and media servers against such attacks.

If you run on top of the browser… well… those who maintain the WebRTC code for the browser need to do it (and it is Google for the most part at the moment).

You should think about fuzzing in your own application logic and the things that are under your control, but the WebRTC pieces? Going down the rabbit hole of fuzzing RTP and RTCP packets? Not for you.

Your role here is to ask the vendors you work with if they have taken steps in the area of security testing and what exactly they have done there. Fuzzing needs to be one of them things.

Who should care about fuzzing?

There’s a shortlist of people who need to deal with fuzzing.

  • If you develop and deploy your own media servers and client side frameworks – you should fuzz them away
    • The example above that Philipp Hancke did with Janus? It should be done on more such message types and protocol layers and it should be done for the other media servers
    • A WebRTC implementation in Python added some fuzzing related fixes in version 0.9.14: “Fix RTP and RTCP parsing errors detected by fuzzing”
    • That said, do we want them to do that or implement unified plan? What has a higher priority? For most of the industry, it would be unified plan…
  • If you are using third parties, you need to make sure you update them frequently
    • Using a WebRTC stack from a year or two ago isn’t something you should be doing
    • Using open source media servers without upgrading them from time to time (and actively looking for security patches for them) is also not something you should be doing
  • CPaaS vendors…
    • This is one of them things they live for
    • They deal with this headache so you don’t have to
    • If they don’t – you should take your business elsewhere. Just saying
  • Browser vendors. Enough said
Where do we go to next?

Fuzzing isn’t the first thing that comes to mind when you set off to build your business.

We are at a point where we are dealing with and addressing fuzzing, and the RTP layer is where people seem to be doing it (at least a bit). We’ve come a long way since we started with WebRTC, and that’s a good sign.


To Fuzz or not to Fuzz? Where should you spend your energies with WebRTC? If you need help with that, just contact me.

The post All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC Applications appeared first on BlogGeek.me.

Is Chrome on its Way to be ONLY Browser out there? (Microsoft throwing the towel on Edge)

Mon, 12/10/2018 - 12:00

Chrome=The web. Is that a good thing or a bad thing?

I’ve always said that Chrome is almost the only browser we need. Microsoft Edge was always an easy target to mock. And it now seems that Microsoft has thrown in the towel on Edge and its technology stack as a differentiating factor, deciding to *gasp* use Chromium as the engine powering whatever comes next.

A long explanation from Microsoft on the move was published on GitHub (more on GitHub later).

What are Browsers made of?

I’ll start with a quick explanation of how I see a browser’s architecture. It is going to be rather simplistic and probably somewhat far from the truth, but it will be good enough for us for now.

A browser is built out of two main pieces: the renderer and the runtime engine.

The Renderer deals with displaying HTML pages with their CSS styling. Today, it probably also deals with CSS animation. It is what takes your webpage and renders it into something that can be displayed on the screen.

The Runtime Engine is all about executing JavaScript code inside the browser. It is what makes modern browsers interactive. It is usually called a JavaScript Engine, but it already runs WebAssembly as well, hence my preference for referring to it as the Runtime.

On top of these two pieces sits the browser engine itself, which is later wrapped by the browser.

Who Uses What?

That illustration of the browser makeup above? It shows in gray the components that Google uses in Chrome. Each browser vendor picks and chooses its own components.

In the past, we effectively had 3 browser engines: “Firefox”, “Internet Explorer” and “WebKit”

WebKit was used by both Safari and Chrome. That lasted until 2013, when Google decided to part ways and create Blink – it started by deleting everything it didn’t use out of WebKit and continued from there. In a way, it is a fork of WebKit, to the point that code integrated into WebKit oftentimes comes directly by porting it en masse from Blink/Chromium (this is how WebRTC is implemented in Safari/WebKit today).

Up until a year ago, we had 4 roughly independent browser engines for the major 4 browsers:

  1. Chrome, using Chromium, Blink and V8
  2. Firefox, using its own tech stack; with Gecko as the renderer, being replaced by Servo
  3. Safari uses WebKit and Nitro
  4. Edge had its own stuff – EdgeHTML and Chakra; now migrating to Chromium tech (and maybe a rebranded name instead of Edge?)

Internet Explorer is all but dead.

Edge was never getting useful market share and now moving to embrace Chromium.

Apple’s Safari… I am not sure how much Apple cares about Safari, and besides, WebKit gets its fair share of code from Google’s Blink project. On top of it all, it runs only on Apple devices, limiting its popularity and use.

In a way, we’re down to two main browser stacks: Google’s and Mozilla’s

Mozilla wrote about the end of the line for EdgeHTML and they are spot on:

If one product like Chromium has enough market share, then it becomes easier for web developers and businesses to decide not to worry if their services and sites work with anything other than Chromium. That’s what happened when Microsoft had a monopoly on browsers in the early 2000s before Firefox was released. And it could happen again.

I’ve tried Firefox and Edge a year or two ago. They worked well enough. But somehow they weren’t Chrome (possibly because I am a heavy user of Google services), so it just made no sense to stick with any of them when Chrome feels too much like “home”.

Does the current state of affairs lift Chromium to the status of Linux? More on that a bit later down this article.

Chrome’s Dominance

I’ve taken a snapshot of StatCounter’s desktop browsers market share:

If you are more interested in the numbers than that boring visual line, then here you go:

Chrome with over 72%; IE and Safari at 5%; Edge at 4%.

Firefox has a single digit 9%.

Funnily enough, all non-Chrome browsers are trending downwards. Even Safari which should enjoy growth due to an increase of Mac machines out there (for some unknown reason they are popular with developers these days – go figure).

Even if you ignore the desktop and check mobile only (see here), Chrome gets some 53% versus Safari’s 22%.

Investing in browser development isn’t a simple task. There are several vectors that need to be pursued at all times:

  • Adherence to the HTML5 specification(s), adding new components to it along the way (PWA, WebGL, WebVR, WebAssembly, Web Workers to name a few)
  • Deal with backward compatibility of the billions of web pages that are out there as much as possible
  • Handle security aspects
  • Deal with performance and bloat
  • Support hardware acceleration for optimized performance where possible, a trend that is becoming common

It would be safe to say that Chrome enjoys hundreds of Google employees developing code that goes directly into the browser.

Where will Microsoft take Edge?

Microsoft, under the lead of CEO Satya Nadella, has shifted towards the cloud and is doubling down on the enterprise. To a big extent, its Xbox business is an anomaly in the Microsoft of 2018.

Where once Microsoft was all about Windows and the Office suite, it has shifted towards Office 365 (subscription versus licensing business model for Office) and its Azure cloud. Windows is still there, but its importance and market dominance are a far cry from where it was a decade ago. Microsoft knows that and is making the necessary changes – not to win back the operating system market, but rather to grow its businesses on other core competencies and assets.

Microsoft Edge was an attempt to shed Internet Explorer. Give its browser a complete rewrite and bring something users would enjoy using. That hasn’t turned out well. After all the investment in Edge, it had a small market share to show for it, with many of the users switching to Windows 10 opting for Chrome instead of Edge.

This user behavior is surprising, to say the least. With a default browser that is good enough (Edge), why would users make the conscious decision of browsing to chrome.com to download and install a different browser that does what Edge does?

Microsoft tried and failed to change this user behavior, which led it to the conclusion that Edge, or at least the innards of Edge are a waste of resources.

Why does opting for Chromium as a browser engine make sense for Microsoft?

As Microsoft shifted to the cloud, with Edge focusing on web standards, the end result was that anything and everything Microsoft invested in for its web based services (Office 365 for example) had to work first and foremost on Chrome – that’s where the users are anyway.

Google is using Chrome to drive proprietary initiatives that optimize its services for users and push them as standards later (think SPDY turning into HTTP/2, QUIC, or its latest Project Stream). It can do so due to its market dominance in browsers and the huge amount of web assets it operates. Microsoft never had that with Edge, so any proprietary initiative on Microsoft’s part in web technologies was bound to fail.

Microsoft derived no value out of maintaining its own browser technology stack, and investing hundreds of developers in it was an expensive and useless endeavor.

So it went with Chromium.

Chromium brings one more benefit – theoretically, Microsoft can now push its browser to non-Windows 10 devices. Mac and Linux included. And since Microsoft is interested more in Office and Azure than it is in Windows, having an optimized “window” towards Office and Azure in the form of a Chromium-based Microsoft browser that works everywhere made sense.

This also shows where Microsoft does want to focus its efforts in the browser – the user interface and experience, as well as delivering Microsoft services to customers.

Microsoft cannot forgo having its own browser and just pre-installing Chrome or even Firefox on its Windows operating system. That would mean ceding too much control to others. It has to have its own browser.

Windows Chromiumized

Remember that browser architecture I shared in the beginning? It is changing in one critical way. Google decided to create an “operating system” and call it Chrome OS, which ends up being based to some extent on the browser itself:

We spend more time in front of web applications that reside in the browser (or in Electron apps) and less inside native apps. This means that in many ways, the browser is the operating system.

Google derives all of its value from the internet, with the browser being the window there.

Microsoft is heading in the same direction, and where it matters for its operating system, it now finds itself competing against Chrome OS and Chromebooks, making them a huge threat to Microsoft and Office.

And obviously, there’s a “lite” version of Windows in the works, at least by the reports on Petri. Is this related to Edge using Chromium in some way? Would Windows Lite be web focused in the same way that Chrome OS is?

Who Controls Chromium? And is it the new Linux?

Back to Chromium, and the reasons that the Microsoft news is making ripples in the web around openness and positive fragmentation.

Browsers are becoming operating systems in many ways. Can we draw a parallel between Linux and its ecosystem and Chromium and its growing ecosystem?

Linux and Ownership

I’d say that these are two distinctly different cases. If anything, Chromium’s status should worry many out there. It is less about monocultures, openness and lofty ideals, and more about control and competitive advantage.

On opensource.com, Greg Kroah-Hartman wrote a piece two years ago titled 9 lessons from 25 years of Linux kernel development. Here’s lesson 6:

6. Corporate participation in the process is crucial, but no single company dominates kernel development.

Some 5,062 individual developers representing nearly 500 corporations have contributed to the Linux kernel since the 3.18 release in December of 2014. The majority of developers are paid for their work—and the changes they make serve the companies they work for. But, although any company can improve the kernel for its specific needs, no company can drive development in directions that hurt the others or restrict what the kernel can do.

This is important.

Who really controls Linux? Who owns it? Who decides what comes next? The fact that there are no clear answers to these questions is what makes Linux so powerful and so useful to the industry as a whole.

Chromium and Google

Does the same apply to Chromium?

Chromium is a Google owned project. Hosted on a Google domain. Managed using Google tooling. Maintained by Google. This includes all the main browser pieces that are created, controlled and owned by Google to a large extent: the V8 JavaScript Engine, Blink web renderer and Chromium itself.

When someone wants to contribute into Chromium, they need to go through a rigorous process. One that takes place at Google’s leisure and based on their priorities. This is understandable. Chromium is what Chrome is made up of, and Chrome gets released to a billion users every 6-8 weeks. Breakage there ends with backlash. Security holes there means vulnerability at a large scale.

While these aspects of stability and security are there with Linux as well, when it comes to Chromium, Google is the one that is setting the priorities.

It doesn’t end with priorities. It goes to the types of web experiments and proprietary features that end up in Chrome. Since Google controls and owns the Chromium stack… it can do as it pleases.

Will Google cede control of Chromium just because?

No.

It might benefit the open-whatever if it did, but it would also slow down innovation and wouldn’t further Google’s own cause.

Microsoft and Chromium

Microsoft is painting this in colors of open source and collaboration with the industry.

It isn’t.

This is about Microsoft going with Chromium because Edge took a few bad turns in its strategy from the get go:

  1. Limiting Edge to Windows 10 only
    • Internet Explorer was always a Windows play, ignoring its stint on Mac
    • Microsoft today is in a very different place – access to its services across all devices is what is driving it
    • This requires its browser to run everywhere and not be limited to Windows 10
  2. Making Edge all about performance and security
    • When Chrome was released, its leading pitch was exactly that. A secure browser with high performance
    • As it grew in adoption, all browsers focused more resources towards that goal, and today, it is a moot point
    • While Chrome is definitely a memory and resource hog, there’s no big backlash due to it
    • Trying to take that same strategy as a differentiating point failed
  3. Not differentiating Edge through Microsoft’s assets
    • There’s a challenge in this one. Take Office 365: if you make it run better on Edge while purposefully harming it on Chrome, you lose on (1) – you limit it on non-Windows devices
    • Microsoft should have invested in a world where the user’s profile and preferences are stored in the cloud. Google and Apple devices “just work” when you log in with your credentials. Microsoft’s don’t, really
    • Having a user’s profile in the cloud, easily accessible via Edge would strengthen the tie between people using Office and Azure to an Edge browser, keeping them away from Chrome

Going with Chromium means two things to Microsoft:

  1. Working on making Chromium (and by extension the new Edge) work perfectly on Windows devices (not only Windows 10, but also Windows 7, HoloLens and whatever comes next in the Internet of Things). This is an optimization effort, simply shifting it from what was Edge towards Chromium
  2. Doubling down on the differentiation of Edge based on a single browser engine, which is where it should have focused in the first place anyway

The only challenge here is that it comes to Chromium as just another vendor. Not a partner or an owner.

A Single WebRTC Stack

At the recent Kranky Geek event, Microsoft discussed its WebRTC on UWP project. Part of it was about merging changes it made to the WebRTC code from webrtc.org (=the code that goes into Chrome). Here’s how James Cadd framed it in his session:

… after 4 years of maintaining a fork on github, we’ve been discussing with Google the possibility of submitting this back to the webrtc.org repo and we’re working on that now. The caveat is that there’s no guarantee that we’ll get 100% of the way there. We’re mostly using the public submission process, so we’re going through reviews just like everyone does, but that’s our goal.

The UWP specific changes are going to live in sdk-contrib-windows so we will have our own little area to contribute this back. Microsoft has committer rights there, so we’ll be able to keep everything moving there. […]

So just wanted to say thank you to Google for that opportunity. We’re looking forward for the collaboration.

A master and a slave? A landlord and a tenant? A patron and a client? Two partners? I am not sure what the exact relation here is, but it should be similar to what Microsoft has probably struck with Google across the board for all Chromium related technologies that are dear to Microsoft in one way or another.

Is a single stack good or bad?

If we look at it from a browser level perspective, we aren’t in a different position in terms of technology diversity than we were 8 years ago:

And here’s where we are today:

The main difference is market share – Chrome is eating up the internet with Blink and Chromium. Factor in Node.js, which uses the V8 JavaScript engine, and you get the same tech running servers as well.

WebRTC specifically though? It now runs on webrtc.org code only. All browser vendors pick bits and pieces from it for their own implementations, and while there are differences between browsers, there aren’t many.

As I said before in many of my articles here – most developers today can simply develop their code for Chrome and be done with it; adding support for more browsers only if they really really really need to.

Browsers are one piece of getting WebRTC to run. Check out what else you’ll need in this free video series unraveling the server side story of WebRTC:

Register to the video series

Could Microsoft Buy Their way into Browser Market Share?

Not really. If they could have, they would have done so instead of going with Chromium.

Let’s start from why such a move would be appealing.

GitHub

The recent acquisition of GitHub by Microsoft can be taken as a case in point. Especially considering the varied reactions it brought across the board.

6 months after that announcement, the sky hasn’t fallen. Open source hasn’t been threatened or gobbled up by Microsoft. And Microsoft is even using GitHub for its own projects, and to announce its own initiatives – Edge using Chromium, for example.

Time will tell, but my gut tells me that Microsoft’s acquisition of GitHub is as meaningful as Facebook’s acquisition of Whatsapp and Instagram. These made little sense at the time from a valuation standpoint, but no one is doubting these acquisitions today.

With GitHub, Microsoft is buying its way into open source. Not only as lip service, but also in understanding how open source works. By owning a large portion of the open source interactions, and being able to analyze them closely, Microsoft can tell where developers are headed and what they are after. Microsoft was always successful due to the developers using their platform (top notch tools for developers – always). GitHub allows them to continue with that in an open source world.

Then why not the browser market?

There were two assets that could be acquired here – Mozilla and Electron.

Electron

Electron is already developed and maintained by GitHub directly. Microsoft owns it already.

What advantages does Microsoft derive from Electron? None, assuming you remember that Electron runs on top of Chromium.

From a strategic standpoint, there’s no value in Electron for Microsoft. At the end of the day, Electron is a window to Chromium and to web applications.

Microsoft is using it for its own cross platform applications – Skype on Linux has been known to use Electron for several years now.

Owning Electron through GitHub doesn’t help Microsoft in its browser market share.

Mozilla

Mozilla would have been an interesting acquisition.

Similarly to GitHub, it would be acquiring the obvious open source vendor. The challenge here is twofold:

  1. Mozilla wouldn’t want to be acquired and would rather stay independent, as this is their stance and current market position. It may change, but internal resistance from Mozilla employees would likely be big
  2. Firefox market share is now a single digit and the trend isn’t a positive one

Furthermore, acquiring Firefox as a window to Microsoft’s services and assets in the cloud is exactly the kind of thing Mozilla is fighting Google over. It would be counterproductive to go there.

Microsoft has no one to buy in order to improve its position and market share in browsers.

It could only continue to fight it out with Edge or partner. And it decided to partner with the goliath in the room (an elephant wouldn’t be visible enough).

Will Chrome Reign Supreme?

Yes.

Anyone thinks otherwise?

The post Is Chrome on its Way to be ONLY Browser out there? (Microsoft throwing the towel on Edge) appeared first on BlogGeek.me.

What Does Machine Learning Have to do with MOS Scores?

Mon, 12/03/2018 - 12:00

Human subjectivity in MOS calculations doesn’t hold water when it comes to heterogeneous environments. That’s where machine learning comes into play.

MOS score. That Mean Opinion Score. You get a voice call. You want to know its quality. So you use MOS. It gives you a number between 1 and 5. 1 being bad. 5 being great. If you get 3 or above – be happy and move on, they say. If you get 4.something – you’re a god. If you don’t agree with my classification of the numbers then read on – there’s probably a good reason why we don’t agree.

Anyways, if you go down the rabbit hole of how MOS gets calculated, you’ll find out that there isn’t a single way of doing that. You can go now and define your own MOS scoring algorithm if you want, based on tests you’ll conduct. From that same Wikipedia link about MOS:

“a MOS value should only be reported if the context in which the values have been collected in is known and reported as well”

Phrased differently – MOS is highly subjective, and you can’t really compare MOS scores produced by one device to MOS scores produced by another.

This is why I really truly hate delving into these globally-accepted-but-somewhat-useless quality metrics (and why we ended up with a slightly different scoring system in testRTC for our monitoring and testing services).

What Goes into MOS Scoring Calculations?

Easy. Everything.

Or at least everything you have access to:

  • RTCP sender and receiver reports
  • Received RTP packets
  • Knowing the voice codec used
  • Actually decoding the audio stream and “listening” to it
  • Understanding what the end user is really going to hear

Here are a few examples:

Physical desk phone

A physical IP phone has access to EVERYTHING. All the software and all the hardware.

It even knows how the headset works and what quality it offers.

Theoretically then, it can provide an accurate MOS that factors in everything there is.

Android native app

Android apps have access to all the software. Almost. Mostly.

The low-level device drivers are known, as is the hardware the app is running on. The only problem is the number of potential devices. A few years back, these types of visualizations of the Android fragmentation were in fashion:

This one’s from OpenSignal. Different devices have different locations for their mics and speakers. They use different device drivers. They have different “flavors” of the Android OS. They act differently and offer slightly different voice quality as well.

What does measuring what an objective person thinks about the quality of a played audio stream mean in such a case? Do we need to run this subjectivity test per device?

Media server who routes voice around

Then we have the media server. It sends and receives voice. It might not even decode the audio (it could, and sometimes it does).

How does it measure MOS? What would it decide is good audio versus bad audio? It has access to all packets… so it can still be rather accurate. Maybe.

WebRTC inside a browser

And we have WebRTC. Can’t write an article without mentioning WebRTC.

Here though, it is quite the challenge.

How would a browser measure the MOS of its audio? It can probably do as good a job as an Android device. But for some reason, MOS scoring isn’t part of the WebRTC bundle. At least not today.

So how would a JavaScript web application calculate the MOS of incoming audio? By using getStats? That has access to an abstraction on top of the RTCP sender and receiver reports. It correlates to them to some extent. But that’s about all it has at its disposal for such calculations, which doesn’t amount to much.
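
For illustration only, here’s a sketch of what such a calculation could look like – pulling jitter and packet loss out of getStats and feeding them into a crude E-model-style formula. The latency and loss penalties are rough VoIP folklore rather than a standard; only the final R-factor-to-MOS mapping comes from the E-model (ITU-T G.107):

```typescript
// A rough MOS estimate from WebRTC getStats - not a standardized calculation.
async function estimateAudioMos(pc: RTCPeerConnection): Promise<number | undefined> {
  const stats = await pc.getStats();
  let mos: number | undefined;

  stats.forEach((report: any) => {
    if (report.type !== "inbound-rtp") return;
    if ((report.kind ?? report.mediaType) !== "audio") return;

    const jitterMs = (report.jitter ?? 0) * 1000; // getStats reports jitter in seconds
    const total = (report.packetsReceived ?? 0) + (report.packetsLost ?? 0);
    const lossPct = total > 0 ? (100 * (report.packetsLost ?? 0)) / total : 0;

    // Crude "effective latency"; add round-trip time if you have it
    // (it lives on the remote-inbound-rtp report).
    const effectiveLatency = 2 * jitterMs + 10;

    let r = 93.2 - effectiveLatency / 40 - 2.5 * lossPct;
    r = Math.max(0, Math.min(100, r));

    // E-model R-factor to MOS mapping (ITU-T G.107).
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6;
  });

  return mos;
}
```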

Back to MOS calculations

But what does MOS really calculate?

The quality of the voice I hear in a session?

Maybe the quality of voice the network is capable of supporting?

Or is it the quality of the software stack I use?

What about the issue with voice quality when the person I am speaking with is just standing in a crowded room? Would that affect MOS? Does the actual original content need to be factored into MOS scores to begin with?

I’ll leave these questions open, but say that in my opinion, whatever quality measurement you look at should offer some information about the things that are in your power to change – at least as a developer or product owner. Otherwise, what can you do with that information?

What Affects Audio Quality in Communications?

Everything.

  • The quality of the microphone used to record the original audio (though this usually gets neglected in discussions around MOS)
  • The location of the person speaking – a crowded room, airport, next to a working vacuum cleaner – or in a silent recording studio
  • The voice codec used, its configuration and the level and aggressiveness of the compression it is using for this session
  • The network conditions – in the last mile from both the sender and the receiver, of every hop along the way and the routers and servers it has to pass through
  • The media servers – and every possible aspect about them
  • The receiver’s software. Especially the jitter buffer and packet loss concealment algorithms
  • The sender’s acoustic echo cancellation implementation quality
  • The receiver’s voice decoder implementation
  • The receiver’s speakers

I am sure I missed a bullet or two. Feel free to add them in the comments.

The thing is, there are a lot of things that end up affecting audio quality once you make the decision to send it through a network.

Is Machine Learning Killing MOS Scoring or Saving It?

So what did we have so far?

A scoring system – MOS, which is subjective and inaccurate. It is also widely used and accepted as THE quality measure of voice calls. Most of the time, it looks at network traffic to decide on the quality level.

At Kranky Geek 2018, one of the interesting sessions for me was the one given by Curtis Peterson of RingCentral:

He discussed the problem of having different MOS scores for the SAME call on each device the call passes through in the network. The solution was to use machine learning to normalize MOS scoring across the network.

This got me thinking further.

Let’s say one of these devices provides machine learning based noise suppression. It is SO good, that it is even employed on the incoming stream, as opposed to placing it traditionally on the outgoing stream. This means that after passing through the network, and getting scored for MOS by some entity along the way, the device magically “improves” the audio simply by reducing the noise.

Does that help or hurt MOS scoring? Or at least the ability to provide something that can be easily normalized or referenced.

Machine Learning and Media Optimization

We’ve had at Kranky Geek multiple vendors touching the domain of media optimizations. This year, their focus was mainly in video – both Agora.io and Houseparty gave eye opening presentations on using machine learning to improve the quality of a received video stream. Each taking a different approach to tackling the problem.

While researching for the AI in RTC report, we’ve seen other types of optimizations being employed. The idea is always to “silently” improve the quality of the call, offering a better experience to the users.

In the next couple of years, we will see this area growing fast, with proprietary algorithms and techniques based on machine learning added to the arms race of the various communication vendors.

Interested in more of these sessions around real time communications and how companies solve problems with it today?

Subscribe to our YouTube channel

The post What Does Machine Learning Have to do with MOS Scores? appeared first on BlogGeek.me.
