Subscribe to webrtchacks feed
Guides and information for WebRTC developers
Updated: 2 hours 44 min ago

Facetime doesn’t face WebRTC

Tue, 06/09/2015 - 16:03

This is the next decode and analysis in Philipp Hancke’s Blackbox Exploration series conducted by &yet in collaboration with Google. Please see our previous posts covering WhatsApp and Facebook Messenger for more details on these services and this series. {“editor”: “chad“}


FaceTime is Apple’s answer to video chat, coming preinstalled on all modern iPhones and iPads. It allows audio and video calls over WiFi and, since 2011, 3G too. Since Apple does not talk much about WebRTC (or anything else), maybe we can find out if they are using WebRTC behind the scenes?

As part of the series of deconstructions, the full analysis (another sixteen pages) is available for download here, including the Wireshark dumps.
If you prefer watching videos, check out the recording of this talk I did at Twilio’s Signal conference where I touch on this analysis and the others in this series.

In a nutshell, FaceTime

  • is quite impressive in terms of quality,
  • requires an open port (16402) in your firewall as documented here,
  • supports iOS and MacOS devices only,
  • supports simultaneous ring on multiple devices,
  • is separate from the messaging application, unlike WhatsApp and Facebook Messenger,
  • announces itself by sending metrics over an unencrypted HTTP connection (Dear Apple, have you heard about pervasive monitoring?)
  • presumably still uses SDES (no signs of DTLS handshakes, but I have not seen a=crypto lines in the SDP either).

Since privacy is important, it is sad to see a complete lack of encryption in the HTTP metrics call like this one:

Example of an unencrypted keep alive packet that could be intercepted by a 3rd party to track a user


FaceTime has been analyzed earlier- first when it was introduced back in 2010 and more recently in 2013. While the general architecture is still the same, FaceTime has evolved over the years like adding new codecs like H.265 when calling over cellular data.

What else has changed? And how much of the changes can we observe? Is there anything those changes tell us about potential compatibility with WebRTC?

Still using SDES

It is sad that Apple continuing to use SDES almost two years after the IETF at it’s Berlin meeting where it was decided that WebRTC MUST NOT Support SDES. The consensus on this topic during the meeting was unanimous. For more background information, see either Victor’s article on why SDES should not be used or dive into Eric Rescorla’s presentation from that meeting comparing the security properties of both systems.

NAT traversal

Like WebRTC, FaceTime is using the ICE protocol to work around NATs and provide a seamless user experience. However, Apple is still asking users to open a certain number of ports to make things works. Yes, in 2015.

Their interpretation of ICE is slightly different from the standard. In a way similar to WhatsApp, it has a strong preference for using a TURN servers to provide a faster call setup. Most likely, SDES is used for encryption.


For video, both the H.264 and the H.265 codecs are supported, but only H.264 was observed when making a call on a WiFi. The reason for that is probably that, while saving bandwidth, H.265 is more computationally expensive. One of the nice features is that the optimal image size to display on the remote device is negotiated by both clients.


For audio, the AAC-ELD codec from Fraunhofer is used as outlined on the Fraunhofer website.
In nonscientific testing, the codec did show behaviour of playing out static noise during wifi periods of packet loss between two updated iPhone 6 devices.


The signaling is pretty interesting, using XMPP to establish a peer-to-peer connection and then using SIP to negotiate the video call over that peer-to-peer connection (without encrypting the SIP negotiation).

This is a rather complicated and awkward construct that I have seen in the past when people tried to avoid making changes to their existing SIP stack. Does that mean Apple will take a long time to make the library used by FaceTime generally usable for the variety of use cases arising in the context of WebRTC? That is hard to predict, but this seems overly complex.

Quality of Experience

FaceTime offers an impressive quality and user experience. Hardware and software are perfectly attuned to achieve this. As well as the networking stack as you can see in the full story.


{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post Facetime doesn’t face WebRTC appeared first on webrtcHacks.

Why put WebRTC in Webkit? Q&A with the webrtcinwebkit team

Tue, 05/26/2015 - 15:45

The world of browsers and how they work is both complex and fascinating. For those that are new to the browser engine landscape, Google, Apple, and many others collaborated on an open source web rendering engine for many years known as WebKit.  WebKit has active community with many less well known browsers that use it, so the WebKit community was shocked when Google announced they would fork WebKit into a new engine for Chrome called Blink.

Emphasis for implementing WebRTC shifted with Google into Blink at the expense of WebKit. To date, Apple has not given any indications it was going to add  WebRTC into WebKit (see this post for an idea on nudging them). This is not good for the eclectic WebKit development community that would like to start working with WebRTC or those hoping for WebRTC support in Apple’s browsers.

Then an interesting project by Ericsson, Temasys, and Igalia was announced –  webrtcinwebkit. As the name implies, the goal of this project was to put WebRTC in WebKit, by themselves. Many questions come to mind, but the most important may be is this a worthwhile effort and one that other WebRTC developers should be involved in?

I asked webrtcinwebkit contributors Alex Gouaillard and Stefan Håkansson to give some more background and help answer who should care about this project and why they are working on this.

{“intro-by”, “chad“}




webrtcHacks: For those who are not familiar, can you describe the webrtcinwebkit project?

Alex:  webrtcinwebkit is a project aiming at bringing WebRTC support in WebKit which is the foundation of a lot of web browsers like Safari and web browsing frameworks like WebView used extensively in Mobile applications.

webrtcHacks: How did this come together? Who are the major contributors to this project?

Alex:  in 2014, Ericsson had an initial implementation to flesh out and was thinking about open sourcing the underlying engine now known as OpenWebRTC. Igalia’s Philippe Normand had been trying to integrate WebRTC in WebKitgtk+ using Ericsson’s 2011 code base for some time, Temasys was trying to find a way to enable WebRTC in iOS and remove the need for a plugin in desktop Safari. At a W3C meeting in Santa Clara in November 2014, a discussion happened with an Apple employee that make the parties believe that there was a way. The project was born.

webrtcHacks: Why do you think Apple has not implemented WebRTC already? How will webrtcinwebkit change that?

Alex:  Those who know can’t talk and those who talk likely don’t know.

Our answer is: we don’t know for sure.

webrtcHacks: How much of this project is about Apple vs. other WebKit users?

Alex:  Not at all about Apple. We’re working on adding WebRTC API’s to WebKit, and to back it with OpenWebRTC for the WebKitgtk+ port. To what extent other ports make use of the WebRTC API’s is up to them, but of course we hope they will use them.

webrtcHacks: was there a lot of WebRTC in WebKit already from before Google forked blink?

Stefan:  Yes, there was quite a bit of WebRTC in there, contributed by Ericsson, Google, Igalia and others. Some parts can be re-used, but largely the standard has evolved so much that we had to start over.


WebKit commits per month after Blink. Source: Juan J Sanchez, Igalia (May 2014)

webrtcHacks: Can you comment on how your perspective of how Google’s forking of webkit has impacted the webkit community? Isn’t everyone just moving to blink now?

Stefan:  Just looking at the revision history of the WebKit source code will immediately show that the WebKit project is very much alive.

webrtcHacks: Is there value in webrtcinwebkit for other developers even if Apple ignores this?

Alex:  the WebKit community itself is pretty big, and again as far as mobile and embedded devices are concerned, this is the main way of handling web pages. Whether Apple adopt the changes or not, this is good news for many people.

Stefan: I agree fully. I would also like to add the value for the WebRTC community and the standardization effort of a second, independent implementation, as the initial back end used will be OpenWebRTC. The current WebRTC implementations in Chrome, Firefox and Opera all to a large extent use the same webrtc.org backend so we think it is a point in using OpenWebRTC as the backend as it gives a second, truly independent, implementation of the standard.

webrtcHacks: How much effort is  going into this project? Is it staffed adequately for success?

Alex:  Everybody is working on separate projects, so depending for example if you count people working on OpenWebRTC as being part of this project or not, the count will change. There are roughly the following Full-time-equivalent (FTE) staff:

  • 4  at Ericsson
  • 2 Ericsson contractors
  • 2 Temasys
  • 1 Igalia

Not all FTE are equal contributors though and the main technical leaders (for WebKit) are with Ericsson and Igalia.

webrtcHacks: How long will it take to complete?

Alex:  We have a version that includes getUserMedia already working a demoable in a separate fork. Ericsson is polishing the PeerConnection, while Temasys is handling Datachannel. It then takes a little bit of time to submit patches to upstream WebKit, so the nightly version of WebKit is still behind. OpenWebRTC and webrtc in WebKit is based on a very recent version of GStreamer, and updating that component in WebKit as far reaching consequences. It touches all the media functionalities of WebKit. We think this will take some time to get in, then the following patches should be self contained and easier to push. We aim at a June timeline.

webrtcHacks: I have heard many comments about GStreamer not being appropriate for real time applications. Can you address that topic?

Stefan:  That does not match our experience at all. We’ve been using GStreamer for many real time applications since back in 2010 when we showed our first implementation of what eventually became WebRTC, and we’ve consistenly got good real time performance. The fact that there is a lively GStreamer community is also a big plus, that enabled OpenWebRTC to quickly make use of the HW accelerated video codec that became available in iOS 8 – something I have heard webrtc.org still doesn’t for example.

webrtcHacks: How tightly coupled is webrtcinwebkit with OpenWebRTC? Is webrtcinwebkit really a derivative and dependant on OpenWebRTC?

Stefan: Although our initial implementation is backended by OpenWebRTC, the project is designed to fully support other implementations, so no, webrtcinwebkit is not a derivative on OpenWebRTC.

Alex:  Given enough time (and resource), the original plan was to have two back ends for webrtcinwebkit: both OWR from OpenWebRTC and libwebrtc from webrtc.org. It’s not clear yet if we will be able to make it happen, but it’s still a wish.

Browser Engine Share Chrome Blink 41% IE Trident 14% Safari Webkit 14% Android Browser Webkit 7% Firefox Gecko 12% Opera Blink 4% UC Browser ?? 3% Nokia Webkit 1% Browser Usage Share – May 2014 to Apr 2015
Source: http://gs.statcounter.com/#all-browser-ww-monthly-201405-201504-bar

webrtcHacks: How has the community developed in the month since you launched this project? How many contributors are outside of Ericsson and Temasys?

Alex:  We announced the project before before it was ready for others to contribute. Most of the WebRTC ecosystem was under the impression there was nothing being done on that side, which was just not true. We wanted to let people know something was brewing, and that they would be able to contribute soon. We also wanted to welcome early feedback.

We received a lot of interest by e-mail, from different companies. We eventually made our fork of WebKit, our working copy if you wish, public on GitHub. Today anybody can make a pull request. That being said, we believe it would be better to wait until we have a first implementation of getUserMedia, PeerConnection and DataChannel before they jump in. For most of the work, you need a first implementation of Offer/Answer, and the capacity to see on screen the video streams, or hear the audio stream before you can start to debug. That should happen in a matter of weeks now.

webrtcHacks: What does success of this project look like? How will you know when this succeeds?

Alex:  As soon as WebKitgtk+, one of the default Linux browsers, supports WebRTC, we think it will be a success. Eventually seeing other ports – other browsers using WebKit  adopting those changes will be very gratifying too.

{“interviewer”, “chad“}

{“interviewees”, [“Alex Gouaillard“, “Stefan Håkansson“]}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post Why put WebRTC in Webkit? Q&A with the webrtcinwebkit team appeared first on webrtcHacks.

Facebook Messenger likes WebRTC

Mon, 05/11/2015 - 12:37

Two weeks ago Philipp Hancke,  lead WebRTC developer of Talky and part of the &yet‘s WebRTC consulting team, started a series of posts about detailed examinations he is doing on several major VoIP deployments to see if and how they may be using WebRTC. Please see that post on WhatsApp for some background on the series and below for another great analysis – this time on Facebook Messenger. {“editor”: “chad“}


Last week, Facebook announced support for video chats in their Messenger app. Given that Messenger claims to account for 10% of global mobile VoIP traffic, this made in a very interesting target for further investigation. As part of the series of deconstructions, the full analysis (another fifteen pages, using the full range of analysis techniques demonstrated earlier) is available for download here, including the wireshark dumps.


Facebook Messenger likes WebRTC

Facebook Messenger is an extremely popular messaging and communications app. It works in Chrome / Firefox and Opera as well as Android and iOS. The mobile versions use the WebRTC.org library and Messenger is optimized heavily for mobile use cases:

  • Instead of DTLS, the older SDES encryption scheme is used between mobile clients. While not offering features such as perfect forward secrecy, it provides faster call setup time.
  • The audio codec depends on platform and peer. Opus, iSAC and iSAC Low Complexity (LC) are used, with a packetization that prefers fewer, larger packets.
  • VP8 is used as video codec on web, iOS and Android. Interestingly, Facebook chose VP8 over H.264 for iOS, even though VP8 has no hardware acceleration capability on Apple iOS devices.
Comparison with WebRTC  Feature WebRTC/RTCWeb Specifications Messenger SDES MUST NOT offer SDES uses SDES between mobile clients, DTLS with browsers ICE RFC 5245 RFC 5245 with some old google quirks TURN usage used as last resort TURN and old Google relay Audio codec Opus or G.711 Opus, iSAC, ISAC Low Complexity Video codec VP8 or H264 VP8 WebRTC fulfilling its promise: No plugins required

Back in 2011, Facebook launched Skype-powered video calling. It used a plugin. The need for a plugin is now gone, probably along with other parts of the integration as described in this announcement. Most platforms are supported without requiring users to install something. WebRTC is starting to fulfill its promise: no plugins needed.

The WebRTC rollout at Facebook has been done gradually, starting in early 2015. Chad Hart was one of the first people to notice in mid-January. I took a quick look and was not very impressed by what I found. Basically it was a simple 1-1 webchat akin to Google’s apprtc sample.

Recently, there has been a lot of news on Facebook’s Messenger. They launched messenger.com as a standalone website with support for voice and video between browsers. As Tsahi Levent-Levi pointed out already, it is using WebRTC.

On the mobile side, the Messenger client had been voice only. Video calling is available in the apps as well. All of this warrants a closer look. With WhatsApp and Messenger coming from the same company, we were expecting similarities, making this report easy. Well… that’s not how it ended up.

It turns out that unlike WhatsApp:

  • Messenger uses the webrtc.org library provided by Google
  • VP8 video codec is used for video calls
  • Choice of the audio codec, which includes ISAC and Opus, varies based on the devices used and the other party in the call
  • DTLS is used for encrypting the media between the app and the browsers
  • DTLS is not used in calls between two mobile devices (surprisingly)

While Messenger looks pretty standard, there are quite a number of optimizations here. Some gems are hidden to anyone glancing through the details. Chrome’s webrtc-internals are easily available and were used in the first scenarios tested. It allows looking at the implementation from a number of different angles, all the way from the captured packets, via the signaling protocol and up to the WebRTC API calls.

The iOS application offers a number of non-standardized codecs. In particular, when calling Chrome the iSAC audio codec will be used. Opus is used with Firefox, which does not implement iSAC. In both cases, the app tweaks the codec usage by sending larger frames than the browser. That is quite an interesting hack reminiscent of the WhatsApp behaviour of switching to a larger packetization. I can only speculate that this may show larger packets to be more robust on wireless networks?

Calls between two mobile devices turn out to be more interesting. Here, the optimizations for calling a browser come to bear fully.

Currently, video calls are only supported between either mobile devices or browsers, which will surely change soon. VP8 is used as the codec.

Unlike WhatsApp, Messenger can be run on multiple devices. So when calling someone who is logged in on multiple devices, all devices ring. This is nice behavior, as it allows the called user to choose where to take the call. This makes a lot more sense than previous approaches, e.g. Google Talk attempting to let the caller determine the recipients “most available device” from a UX point of view. In a world where mobile devices are disconnected frequently and need to be woken up via push messages, this is even more important than it was a decade ago.


The most important takeaway is that instead of using DTLS, Messenger uses the older SDES encryption scheme between two mobile clients. This means media can flow as soon as ICE is done, eliminating a round-trip for the DTLS handshake. This shows a desire to reduce the session startup time, similar to what we saw with WhatsApp.
The widely known downside of that is that the encryption keys are sent via the signaling servers and can be used to retroactively decrypt traffic. Government agencies will surely rejoice at the news of this being applicable to 10% of the mobile VoIP traffic…


{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post Facebook Messenger likes WebRTC appeared first on webrtcHacks.

What’s up with WhatsApp and WebRTC?

Wed, 04/22/2015 - 17:14

One of our first posts was a Wireshark analysis of Amazon’s Mayday service to see if it was actually using WebRTC. In the very early days of WebRTC, verifying a major deployment like this was an important milestone for the WebRTC community. More recently, Philipp Hancke – aka Fippo – did several great posts analyzing Google Hangouts and Mozilla’s Hello service in Firefox. These analyses validate that WebRTC can be successfully deployed by major companies at scale. They also provide valuable insight for developers and architects on how to build a WebRTC service.

These posts are awesome and of course we want more.

I am happy to say many more are coming. In an effort to disseminate factual information about WebRTC, Google’s WebRTC team has asked &yet – Fippo’s employer – to write a series of publicly available, in-depth, reverse engineering and trace analysis reports. Philipp has agreed to write summary posts outlining the findings and implications for the WebRTC community here at webrtcHacks. This analysis is very time consuming. Making it consumable for a broad audience is even more intensive, so webrtcHacks is happy to help with this effort in our usual impartial, non-commercial fashion.

Please see below for Fippo’s deconstruction of WhatsApp voice calling.

{“editor”: “chad“}


Philipp Hancke deconstructs WhatsApp to search for WebRTC


After some rumors (e.g. on TechCrunch), WhatsApp recently launched voice calls for Android. This spurred some interest in the WebRTC world with the usual suspects like Tsahi Levent-Levi chiming in and starting a heated debate. Unfortunately, the comment box on Tsahi’s BlogGeek.Me blog was too narrow for my comments so I came back here to webrtchacks.

At that point, I had considered doing an analysis of some mobile services already and, thanks to support from the Google WebRTC team, I was able to spend a number of days looking at Wireshark traces from WhatsApp in a variety of scenarios.

Initially, I was merely trying to validate the capture setup (to be explained in a future blog post) but it turned out that there is quite a lot of interesting information here and even some lessons for WebRTC. So I ended up writing a full fifteen page report which you can get here. It is a long story of packets (available for download here) which will be very boring if you are not an engineer so let me try to summarize the key points here.


WhatsApp is using the PJSIP library to implement Voice over IP (VoIP) functionality. The captures shows no signs of DTLS, which suggests the use of SDES encryption (see here for Victor’s past post on this).  Even though STUN is used, the binding requests do not contain ICE-specific attributes. RTP and RTCP are multiplexed on the same port.

The audio codec can not be fully determined. The sampling rate is 16kHz, the codec bandwidth of about 20kbit/s and the bandwidth was the same when muted.

An inspection of the binary using the strings tool shows both PJSIP and several strings hinting at the use of elements from the webrtc.org voice engine such as the acoustic echo cancellation (AEC), AECM, gain control (AGC), noise suppression and the high-pass filter.

Comparison with WebRTC  Feature WebRTC/RTCWeb Specifications WhatsApp SDES MUST NOT offer SDES probably uses SDES ICE RFC 5245 no ICE, STUN connectivity checks TURN usage used as last resort uses a similar mechanism first Audio codec Opus or G.711 unclear, 16khz with 20kbps bitrate Switching from a relayed session to a p2p session

The most impressive thing I found is the optimization for a fast call setup by using a relay initially and then switching to a peer-to-peer session. This also opens up the possibility for a future multi-party VoIP call which would certainly be supported by this architecture. The relay server is called “conf bridge” in the binary.

Lets look at the first session to illustrate this (see the PDF for the full, lengthy description):

  1. The session kicks off (in packet #70) by sending TURN ALLOCATE requests to eight different servers. This request doesn’t use any standard STUN attributes which is easy to miss.
  2. After getting a response the client is exchanging some signaling traffic with the signaling server, so this is basically gathering a relayed candidate and sending an offer to the peer.
  3. Packet #132 shows the client sending something to one of those TURN servers. This turns out to be an RTCP packet, followed by some RTP packets, which can be seen by using Wiresharks “decode as” functionality. This is somewhat unusual and misleading, as it is not using standard TURN functionality like send or data indications. Instead, it just does raw RTP on that.
  4. Packet #146 shows the first RTP packet from the peer. For about three seconds, the RTP traffic is relayed over this server.
  5. In the mean time, packet #294 shows the client sending a STUN binding request to the peer’s public IP address. Using a filter (ip.addr eq and ip.addr eq and (udp.port eq 45395 and udp.port eq 35574)  clearly shows this traffic.
  6. The first response is received in packet #300.
  7. Now something really interesting happens. The client switches the destination of the RTP stream between packets #298 and #305. By decoding those as RTP we can see that the RTP sequence number increases just by one. See this screenshot:

Now, if we have decoded everything as RTP (which is something Wireshark doesn’t get right by default so it needs a little help), we can change the filter to rtp.ssrc == 0x0088a82d  and see this clearly. The intent here is to try a connection that is almost guaranteed to work first (I used a similar rationale in the minimal viable SDP post recently even) and then switch to a peer-to-peer connection in order to minimize the load on the TURN servers.

Wow, that is pretty slick. It likely reduces the call setup time the user perceives. Let me repeat that: this is a hack which makes the user experience better!

By how much is hard to quantify. Only a large-scale measurement of both this approach and the standard approach can answer that.

Lessons for WebRTC

In WebRTC, we can do something similar, but it is a little more effort right now. We can setup the call with iceTransports: ‘relay’ which will skip host and server-reflexive candidates. Also, using a relay helps to guarantee the connetion will work (in conditions where WebRTC will work at all).

There are some drawbacks to this approach in terms of round-trip-times due to TURN’s permission mechanism. Basically when creating a TURN-relayed candidate the following happens (in Chrome; Firefox’s behavior differs slightly):

  1. Chrome tries to create an allocation without authentication
  2. the TURN server asks for authentication
  3. Chrome retries to create an allocation with authentication
  4. the TURN server tells chrome the address and port of the candidate.
  5. Chrome signals the candidate to the JavaScript layer via the onicecandidate callback. That is two full round-trip times.
  6. after adding a remote candidate, Chrome will create a TURN permission on the server before the server will relay traffic from the peer. This is a security mechanism described here.
  7. now STUN binding requests can happen over the relayed address. This uses TURN send and data indications. These add the peer’s address and port to each packet received.
  8. when agreeing on a candidate, Chrome creates a TURN channel for the peer’s address which is more efficient in terms of overhead.

Compared to this, the proprietary mechanism used by Whatsapp saves a number of roundtrips.

this is a hack which makes the user experience better!

If we started with just relay candidates, then, since this hides the IP addresses of the parties involved from each other, we might even establish the relayed connection and do the DTLS handshake before the callee accepts the call. This is known as transport warmup, it reduces the perceived time until media starts flowing.

Once the relayed connection is established, we can call setConfiguration (formerly known as updateIce; which is currently not implemented) to remove the restriction to relay candidates and do an ICE restart by calling createOffer again with the iceRestart flag set to true. This would trigger an ICE restart which might determine that a P2P connection can be established.

Despite updateIce not being implemented, we can still switch from a relay to peer-to-peer today. ICE restarts work in Chrome so the only bit we’re missing is the iceTransports ‘relay’ which just generates relay candidates. Now the same effect can be simulated in Javascript by dropping any non-relay candidates during the first iteration. It was pretty easy to implement this behaviour in my favorite sdp munging sample. The switch from relayed to P2P just works. The code is committed here.

While ICE restart is inefficient currently, the actual media switch (which is hard) happens very seamlessly.


In my humble opinion

Whatsapp’s usage of STUN and RTP seems a little out of date. Arguably, the way STUN is used is very straightforward and makes things like implementing the switch from relayed calls to P2P mode easier. But ICE provides methods to accomplish the same thing, in a more robust way. Using a custom TURN-like functionality that delivers raw RTP from the conference bridge saves some bytes’ overhead for TURN channels, but that overhead is typically negligible.

Not using DTLS-SRTP with ciphers capable of perfect forward secrecy is a pretty big issue in terms of privacy. SDES is known to have drawbacks and can be decrypted retroactively if the key (which is transmitted via the signaling server) is known. Note that the signaling exchange might still be protected the same way it is done for text messages.

In terms of user experience, the mid-call blocking of P2P showed that this scenario had been considered which shows quite some thought. Echo cancellation is a serious problem though. The webrtc.org echo cancellation is capable of a much better job and seems to be included in the binary already. Maybe the team there would even offer their help in exchange for an acknowledgement… or awesome chocolate.


{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post What’s up with WhatsApp and WebRTC? appeared first on webrtcHacks.

The WebRTC Troubleshooter: test.webrtc.org

Mon, 04/20/2015 - 09:00

WebRTC-based services are seeing new and larger deployments every week. One of the challenges I’m personally facing is troubleshooting as many different problems might occur (network, device, components…) and it’s not always easy to get useful diagnostic data from users.

troubleshooting (Image source: google)

Earlier this week, Tsahi, Chad and I participated at the WebRTC Global Summit in London and had the chance to catch up with some friends from Google, who publicly announced the launch of test.webrtc.org. This is great diagnostic tool but, to me, the best thing is that it can be easily integrated into your own applications; in fact, we are already integrating this in some of our WebRTC apps.

Sam, André and Christoffer from Google are providing here a brief description of the tool. Enjoy it and happy troubleshooting!

{“intro-by”: “victor“}

The WebRTC Troubleshooter: test.webrtc.org (by Google) Why did we decide to build this?

We have spent countless hours debugging things when a bug report comes in for a real-time application. Besides the application itself, there are many other components (audio, video, network) that can and will eventually go wrong due to the huge diversity among users’ system configurations.

By running small tests targeted at each component we hoped to identify issues and create the possibility to gather information on the system reducing the need for round-trips between developers and users to resolve bug reports.

Test with audio problem

What did we build?

It was important to be able to run this diagnostic tool without installing any software and ideally one should be able to integrate very closely with an application, thus making it possible to clearly identify bugs in an application from the components that power it.

To accomplish this, we created a collection of tests that verify basic real-time functionality from within a web page: video capture, audio capture, connectivity, network limitations, stats on encode time, supported resolutions, etc… See details here. 

We then bundled the tests on a web page that enables the user to download a report, or make it available via a URL that can be shared with developers looking into the issue.

How can you use it?

Take a look at test.webrtc.org and find out what tests you could incorporate in your app to help detect or diagnose user issues. For example, simple tests to distinguish application failures from system components failures, or more complex tests such as detecting if the camera is delivering frozen frames, or tell the user that their network signal quality is weak. 


You are encouraged by us to take ideas and code from GitHub and integrate similar functionality in your own UX. Using test.webrtc.org should be part of any “support request” flow for real-time applications. We encourage developers to contribute! 

In particular we’d love some help getting a uniform getStats API between browsers.

test.webrtc.org repo

What’s next?

Working on adding more tests (e.g. network analysis detecting issues that affect audio and video performance is on the way).

We want to learn how developers integrate our tests into their apps and we want to make them easier to use!

{“authors”: [“Sam“, “André“, “Christoffer”]}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post The WebRTC Troubleshooter: test.webrtc.org appeared first on webrtcHacks.

Put in a Bug in Apple’s Apple – Alex Gouaillard’s Plan

Tue, 04/14/2015 - 13:08

Apple Feast photo courtesy of flikr user Overduebook. Licensed under Creative Commons NC2.0.

One of the biggest complaints about WebRTC is the lack of support for it inside Safari and iOS’s webview. Sure you can use a SDK or build your own native iOS app, but that is a lot of work compared to Android which has Chrome and WebRTC inside the native webview on Android 5 (Lollipop) today. Apple being Apple provides no external indications on what it plans to do with WebRTC. It is unlikely they will completely ignore a W3C standard, but who knows if iOS support is coming tomorrow or in 2 years.

Former guest webrtcHacks interviewee Alex Gouillard came to me with an idea a few months ago for helping to push Apple and get some visibility. The idea is simple – leverage Apple’s bug process to publicly demonstrate the desire for WebRTC support today, and hopefully get some kind of response from them. See below for details on Alex’s suggestion and some additional Q&A at the end.

Note: Alex is also involved in the webrtcinwebkit project – that is a separate project that is not directly related, although it shares the same goal of pushing Apple. Stay tuned for some coverage on that topic.

{“intro-by”: “chad“}

Plan to Get Apple to support WebRTC The situation

According to some polls, adding WebRTC support to Safari, especially on iOS and in native apps in iOS, is the most wanted WebRTC item today.

The technical side of the problem is simple: any native app has to follow Apple’s store rules to be accepted in the store. These rules state that any apps that “browse the web” need to use Apple provided WebView [rule 2.17] based on the WebKit framework. Safari is also based on WebKit. WebKit does not Support WebRTC… yet!

First Technical step

The webrtcinwebkit.org project aims at addressing the technical problem within the first half of 2015. However, bringing WebRTC support to WebKit is just part of the overall problem. Only Apple can decide to use it in their products, and they are not commenting about products that have not been released.

There have been lots of signs though that Apple is not opposed to WebRTC in WebKit/Safari.

  • Before the Chrome fork of WebKit/WebCore in what became known as blink, Apple was publicly working on parts of the WebRTC implementation (source)
  • Two umbrella bugs to accept implementation of WebRTC in WebKit are still open and active in WebKit’s bugzilla, with an Apple media engineer in charge (Bug 124288 &  Bug 121101)
  • Apple Engineers, not the usual Apple standard representative, joined the W3C WebRTC working group early 2014 (public list), and participated to the technical plenary meeting in November 2014 (w3c members restricted link)
  • Finally, an early implementation of Media Streams and GetUserMedia API in WebKit was contributed late 2014 (original bug & commit).

So how to let Apple know you want it and soon – potentially this year?

Let Apple know!

Chrome and Internet Explorer (IE), for example, have set up pages for web developers to directly give their feedback about which feature they want to see next (WebRTC related items generally rank high by the way).  There is no such thing yet for Apple’s product.

The only way to formally provide feedback to Apple is through the bug process. One needs to have or create a developer account, and open a bug to let Apple know they want something.  Free accounts are available, so there is no financial cost associated with the process. One can open a bug in any given category, the bugs are then triaged and will end up in “WebRTC” placeholder internally.

Volume counts. The more people will ask for this feature, the most likely Apple is to support it. The more requests the better.

But that is not the only thing that counts. Users of WebRTC libraries, or any third party who has a business depending on WebRTC can also raise their case with Apple that their business would profit from Apple supporting WebRTC in their product. Here too, volume (of business) counts.

As new releases of Safari are usually made with new releases of the OS, and generally in or around September, it is very unlikely to see WebRTC in Safari (if ever) before the next release, late 2015.

We need you

You want WebRTC support on iOS? You can help. See below for a step-by-step guide on how.

How to Guide Step-by-step guide
  1. Register a free Apple Developer account. Whether you are a developer or not does not matter eventually. You will need to make an Apple ID if you do not have one already.
  2. Sign in to the Bug Reporter:
  3. Once signed in, you should see the following screen:
  4. Click on Open, then select Safari:
  5. Go ahead and write the bug report:

It is very important here that you write WHY, in your own words, you want WebRTC support in Safari. There are a multiple of different reasons you might want it:

  • You’re a developer  you have developed a website that requires WebRTC support, and you cannot use it on Safari. If your users are requesting it, please share the volume of request, and/or share the volume of usage you’re getting on non-safari browsers to show the importance of the this for Apple.
  • You’re a company with a WebRTC product or service. You have the same problem as above, and the same suggestions apply.
  • You’re a user of a website that requires WebRTC, and owner of many Apple devices. You would love to be able to use your favorite WebRTC product or service on your beloved device.
  • You’re a company that propose a plugin for WebRTC in Safari, and you would love to get rid of it.
  • others

Often times, some communities organize “bug writing campaigns” that include boilerplate text to include in a bug.  It’s a natural tendency for reviewers to discount those bugs somewhat because they feel like more of a “me too” than a bug filed by someone that took 60 seconds to write up a report in their own words.

{“author”, “Alex Gouaillard“}

{“editor”, “chad“}

Chad’s follow-up Q&A with Alex

Chad: What is Apple’s typical response to these bug filing campaigns?

Alex: I do not have the direct answer to this, and I guess only Apple has. However, here are two very clear comments by an Apple representative:

The only way to let Apple know that a feature is needed is through bug filling.

I would just encourage people to describe why WebRTC (or any feature) is important to them in their own words. People sometimes start “bug writing campaigns” that include boilerplate text to include in a bug, and I think people here have a natural tendency to discount those bugs somewhat because they feel like more of a “me too” than a bug filed by someone that took 60 seconds to write up a report in their own words.”

So my initiative here is not to start a bug campaign per say, where everybody would copy paste the same text, or click the same report to increment a counter. My goal here is to let the community know they can let Apple know their opinion in a way that counts.

[Editor’s note: I was not able to get a direct confirmation from Apple (big suprise) – I did directly confirm  evidence that at least one relevant Apple employee agrees with the sentiment above.]

Chad: Do you have any examples of where this process has worked in the past to add a whole new W3C-defined capability like WebRTC?

Alex: I do not. However, the comment #1 above by Apple representative was very clear that whether it will eventually work or not, there is no other way.

Chad: Is there any kind of threshold on the number of bug filings you think the community needs to meet?

Alex: My understanding is that it’s not so much about the number of people that send bugs, it’s more about the case they make. It’s a blend between business opportunities and number of people. I guess volume counts – whether it is people or dollars. This is why it is so important that people use they own words and describe their own case. 

Let’s say my friends at various other WebRTC Platform-as-a-Service providers desire to show the importance for them of having WebRTC in iOS or Safari- one representative of the company could go in and explain their use case and their numbers for the platform / service. They could also ask their devs to file a bug describing their application they developed on top of their WebRTC platform. They could also ask their users to describe why as users of the WebRTC app that they feel segregated against their friends who owns a Samsung tablet and who can enjoy WebRTC while they cannot on their iPad. (That is just an example, and I do not suggest that they should write exactly this. Again, everybody should use their own word.)

If I understand correctly, it does not matter whether one or several employees of the above named company fill only one or several bugs for the same company use case.

Chad: Are you confident this will be a good use of the WebRTC developer’s community’s time?

Alex: Ha ha. Well, let’s put it that way, the whole process takes around a couple of minutes in general, and maybe just a little bit more for companies that have a bigger use case and want to weight in the balance. Less than what you are spending reading this blog post. If you don’t have a couple of minute to fill a bug to Apple, then I guess you don’t really need the feature.

More seriously, I have been contacted by enough people that just wanted to have a way, anyway, to make it happen, that I know this information will be useful. For the cynics out there, I’m tempted to say, worse case scenario you lost a couple of minutes to prove me wrong. Don’t miss the opportunity.

Yes, I’m positive this will be a good use of everybody’s time.

{“interviewer”, “chad“}

{“interviewee”, “Alex Gouaillard“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post Put in a Bug in Apple’s Apple – Alex Gouaillard’s Plan appeared first on webrtcHacks.

The Minimum Viable SDP

Tue, 03/31/2015 - 13:30

Unnatural shrinkage. Photo courtesy Flikr user Ed Schipul


One evening last week, I was nerd-sniped by a question Max Ogden asked:

That is quite an interesting question. I somewhat dislike using Session Description Protocol (SDP)  in the signaling protocol anyway and prefer nice JSON objects for the API and ugly XML blobs on the wire to the ugly SDP blobs used by the WebRTC API.

The question is really about the minimum amount of information that needs to be exchanged for a WebRTC connection to succeed.

 WebRTC uses ICE and DTLS to establish a secure connection between peers. This mandates two constraints:

  1. Both sides of the connection need to send stuff to each other
  2. You need at minimum to exchange ice-ufrag, ice-pwd, DTLS fingerprints and candidate information

Now the stock SDP that WebRTC uses (explained here) is a rather big blob of text, more than 1500 characters for an audio-video offer not even considering the ICE candidates yet.

Do we really need all this?  It turns out that you can establish a P2P connection with just a little more than 100 characters sent in each direction. The minimal-webrtc repository shows you how to do that. I had to use quite a number of tricks to make this work, it’s a real hack.

How I did it Get some SDP

First, we want to establish a datachannel connection. Once we have this, we can potentially use it negotiate a second audio/video peerconnection without being constrained in the size of the offer or the answer. Also, the SDP for the data channel is a lot smaller to start with since the is no codec negotiation. Here is how to get that SDP:

var pc = new webkitRTCPeerConnection(null); var dc = pc.createDataChannel('webrtchacks'); pc.createOffer( function (offer) { pc.setLocalDescription(offer); console.log(offer.sdp); }, function (err) { console.error(err); } );

The resulting SDP is slightly more than 400 bytes. Now we need also some candidates included, so we wait for the end-of-candidates event:

pc.onicecandidate = function (event) { if (!event.candidate) console.log(pc.localDescription.sdp); };

The result is even longer:

v=0 o=- 4596489990601351948 2 IN IP4 s=- t=0 0 a=msid-semantic: WMS m=application 47299 DTLS/SCTP 5000 c=IN IP4 a=candidate:1966762134 1 udp 2122260223 47299 typ host generation 0 a=candidate:211962667 1 udp 2122194687 40864 typ host generation 0 a=candidate:1002017894 1 tcp 1518280447 0 typ host tcptype active generation 0 a=candidate:1109506011 1 tcp 1518214911 0 typ host tcptype active generation 0 a=ice-ufrag:1/MvHwjAyVf27aLu a=ice-pwd:3dBU7cFOBl120v33cynDvN1E a=ice-options:google-ice a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24 a=setup:actpass a=mid:data a=sctpmap:5000 webrtc-datachannel 1024

Only take what you need

We are only interested in a few bits of information here: 

  1. the ice-ufrag: 1/MvHwjAyVf27aLu
  2. the ice-pwd: 3dBU7cFOBl120v33cynDvN1E
  3. the sha-256 DTLS fingerprint: 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24
  4. the ICE candidates

The ice-ufrag is 16 characters due to randomness security requirements from RFC 5245. While it is possible to reduce that, it’s probably not worth the effort. The same applies to the 24 characters of the ice-pwd. Both are random so there is not much to gain from compressing them even.

The DTLS fingerprint is a representation of the 256 bytes of the sha-256 hash. It’s length can easily be reduced from 95 characters to almost optimal (assuming we want to be binary-safe) 44 characters: 

var line = "a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24"; var hex = line.substr(22).split(':').map(function (h) { return parseInt(h, 16); }); console.log(btoa(String.fromCharCode.apply(String, hex))); // yields dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=

So we have So we’re at 84 characters now. We can hardcode everything else in the application.

Dealing with candidates

Let’s look at the candidates. Wait, we got only host candidates. This is not going to work unless people are on the same network. STUN does not help much either since it only works in approximately 80% of all cases.

So we need candidates that were gathered from a TURN server. In Chrome, the easy way to achieve this is to set the iceTransports constraint to ‘relay’ which will not even gather host and srflx candidates. In Firefox, you need to ignore all non-relay candidates currently.

If you use the minimal-webrtc demo you need to use your own TURN credentials, the ones in the repository will no longer work since they’re using the time-based credential scheme. Here is what happened on my machine was that two candidates were gathered:

a=candidate:1211076970 1 udp 41885439 47751 typ relay raddr rport 0 generation 0 a=candidate:1211076970 1 udp 41819903 38132 typ relay raddr rport 0 generation 0

I believe this is a bug in chrome which gathers a relay candidate for an interface which is not routable, so I filed an issue.

Lets look at the first candidate using the grammar defined in RFC 5245: 

  1. the foundation is 1211076970
  2. the component is 1. Another reason for using the datachannel, there are no RTCP candidates
  3. the transport is UDP
  4. the priority is 41885439
  5. the IP address is (the ip of the TURN server I used)
  6. the port is 47751
  7. the typ is relay
  8. the raddr and rport are set to and 0 respectively in order to avoid information leaks when iceTransports is set to relay
  9. the generation is 0. This is a Jingle extension of vanilla ICE that allows detecting ice restarts

If we were to simply append both candidates to the 84 bytes we already have we would end up with 290 bytes. But we don’t need most of the information in there.

The most interesting information is the IP and port. For IPv4, that is 32bits for the IP and 16 bits for the port. We can encode that using btoa again which yields 7 + 4 characters per candidate. Actually, if both candidates share the same IP, we can skip encoding it again, reducing the size.

After consulting RFC 5245 it turned out that the foundation and priority can actually be skipped, even though that requires some effort. And everything else can be easily hard-coded in the application. 

sdp.length = 106

Let’s summarize what we have so far: 

  1. the ice-ufrag: 16 characters
  2. the ice-pwd: 22 characters
  3. the sha-256 DTLS fingerprint: 44 characters
  4. the ip and port: 11 characters for the first candidate, 4 characters for subsequent candidates from the same ip.

Now we also want to encode whether this is an offer or an answer. Let’s use uppercase O and A respectively. Next, we concatenate this and separate the fields with a ‘,’ character. While that is less efficient than a binary encoding or one that relies on fixed field lengths, it is flexible. The result is a string like:

O,1/MvHwjAyVf27aLu,3dBU7cFOBl120v33cynDvN1E, dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=, 1k85hij,1ek7,157k

106 characters! So that is tweetable. Yay!

You better be fast

Now, if you try this it turns out it does not usually work unless you are fast enough pasting stuff.

ICE is short for Interactive Connectivity Establishment. If you are not fast enough in transferring the answer and starting ICE at the Offerer, it will fail. You have less than 30 seconds between creating the answer at the Answerer and setting it at the Offerer. That’s pretty tough for humans doing copy-paste. And it will not work via twitter.

What happens is that the Answerer is trying to perform connectivity checks as explained in RFC 5245. But those never reach the Offerer since we are using a TURN server. The TURN server does not allow traffic from the Answerer to be relayed to the Offerer before the Offerer creates a TURN permission for the candidate, which it can only do once the Offerer receives the answer. Even if we could ignore permissions, the Offerer can not form the STUN username without the Answerer’s ice-ufrag and ice-pwd. And if the Offerer does not reply to the connectivity checks by Answerer, the Answerer will conclude that ICE has failed.


So what was the point of this?

Now… it is pretty hard to come up with a use-case for this. It fits into an SMS. But sending your peer an URL where you both connect using a third-party signaling server is a lot more viable most of the time. Especially given that to achieve this, I had to make some tough design decisions like forcing a TURN server and taking some shortcuts with the ICE candidates which are not really safe. Also, this cannot use trickle ice.


(thanks, Max)

So is this just a case study in arcane signaling protocols? Probably. But hey, I can now use IRC as a signaling protocol for WebRTC. IRC has a limit of 512 characters so one can include more candidates and information even. CTCP WEBRTC anyone?

{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post The Minimum Viable SDP appeared first on webrtcHacks.

Avoiding Contact Center IVR Hell with WebRTC

Mon, 03/02/2015 - 14:08

A couple of decades ago if you bought something of any reasonable complexity, odds are it came with a call center number you had to call in case something went wrong. Perhaps like the airline industry, economic pressures on contact centers shifted their modus operandi from customer delight to cost reduction. Unsurprisingly this has not done well for contact center public sentiment. Its no wonder the web came along to augment and replace much of this experience –  but no where near all it. Today, WebRTC offers a unique opportunity for contact centers to combine their two primary means of customer interaction – the web and phone calls – and entirely change the dynamic to the benefit of both sides.

To delve into what this looks like, we invited Rob Welbourn to walk us through a typical WebRTC-enabled contact center infrastructure. Rob has been working on the intersection of telephony and web technologies for more than 8 years, starting at Covergence. Rob continued this work which eventually coalesced into deep enterprise and contact center WebRTC expertise at Acme Packet, Oracle, Cafe X, and now as an consultant for hire.

Please see Rob’s great technology brief on WebRTC architectures in the Contact Center below.

{“intro-by”: “chad“}

Robert Welbourn


If ever there was an area where WebRTC is expected to have a major impact, it is surely the contact center.  By now most readers of this blog have seen the Amazon Kindle Fire commercials, featuring the get-help-now Mayday button and Amy, the annoyingly perky call center agent:

Those in the industry know that Mayday’s voice and video capability use WebRTC, as detailed by Chad and confirmed by Google WebRTC development lead Justin Uberti.   When combined with screen sharing, annotation and co-browsing, this makes for a compelling package. Executives in charge of call centers have taken notice, and are looking to their technology suppliers to spice up their call centers in the same way.

Indeed, the contact center is a very instructive example of how WebRTC can be used to enhance a well-established, existing system.  For those who doubt that the technology isn’t mature enough for widespread deployment, I’ll let you into a dirty little secret: WebRTC on the consumer side of the call center isn’t happening in web browsers, it’s happening in mobile apps.  I’ll say more about this later.

What a Contact Center looks like

Before we examine how we can turbocharge a contact center with WebRTC, let’s take a look at the main component parts, and some of the pain points that both customers and call center staff encounter in their daily lives.

(Disclaimer:  This sketch is a simplified caricature of a call center, drawn from the author’s experience with a number of different systems.   The same is true for the descriptions of WebRTC gateways in the following sections, which should be viewed as idealized and not a description of any one vendor’s offerings.)

Generic Contact Center Architecture

The web-to-call correlation problem

Let’s imagine that we’re a consumer, calling our auto insurance company.  Perhaps we’ve been to their website, or maybe we’re using their shiny new mobile app on our smartphone.  Either way, we’ve logged into the insurer’s web portal, to get an update on an insurance claim, update our coverage, or whatever.  (And yes, even if we’re using a mobile app, we’re most likely still communicating with a web server.  It’s only the presentation layer that’s different.)

Now suppose that we actually want to talk to a human being who can help us.  If we’re lucky, the web site will provide a phone number in an easy-to-find place, or maybe our mobile app will automatically bring up the phone’s dialer to make the call.  However, at this point, all of our contextual information, such as our identity and the web page we were on, gets lost.

The main problem here is that it is not easy to correlate the web session with the phone call.  The PSTN provides no way of attaching a context identifier from a web session to a phone call, leaving the caller ID or dialed number as the only clues in the call signaling.   That leaves us with the following possibilities:

  • Use the caller ID.  This is ambiguous at best, in that a phone number doesn’t definitively identify a person, and mobile device APIs in any case forbid apps from harvesting a device’s phone number, so it can’t be readily passed into the contact center by the app.
  • Use the called number.  Some contact centers use the concept of the steering pool, where a phone number from a pool is used to temporarily identify a particular session, which could potentially be used by a mobile app.  However, the redial list is the enemy of this idea; since the number is temporarily allocated to a session, you wouldn’t want a customer mistakenly thinking they could use the same number to call back later.
  • Have the contact center call the customer back when it’s their turn, and an agent is about to become available.  This is in fact a viable approach, but complex to implement, largely for reasons of not tying up an agent while an attempt is made to reach the customer and verify they still want the call.
  • Use WebRTC for in-app, contextual communications.
Customer-side interaction

But let’s continue with the premise that the customer has made a regular phone call to the contact center.  From the diagram above, we can see that the first entity the call hits is the ingress gateway (if via TDM) or Session Border Controller (if via a SIP trunk).  This will most likely route the call directly to an Interactive Voice Response (IVR) system, to give the caller an opportunity to perform self-service actions, such as looking up their account balance.  Depending on the vendor, the ingress gateway or SBC may itself take part in the interaction, by hosting a VoiceXML browser, as is the case with Cisco products; or else the IVR may be an application running on a SIP-connected media server platform.

Whatever the specific IVR architecture, it will certainly connect to the same customer database used by the web portal, but using DTMF to input an account number and PIN, rather than a username and password.  If the customer is lucky, they have managed to find an account statement that tells them what their account number is; if not, the conversation with agent is going to start by having them spell their name, give the last four digits of their Tax ID, and so on.  Not only that, but if a PIN is used, it is doubtless the same one used for their bank card, garage door opener and everything else, which hardly promotes security.  This whole process is time-consuming for both customer and agent, error-prone, and generally frustrating.

At this point the IVR has determined who the caller is, and why they are calling – “Press 1 for auto claims, 2 for household claims…”; the call now needs to be held in a queue, awaiting a suitably qualified agent.  The job of managing the pool of agents with their various skills, and the queues of incoming calls, is the job of the Automated Call Distributor (ACD).  An ACD typically has a well-defined but proprietary interface or protocol by which it interacts with an IVR.  The IVR will submit various data items to the ACD, notably the caller ID, called number, customer identity and required skill group.  The ACD may then itself interrogate the customer database, perhaps to determine whether this is a customer who gets priority service, or whether they have a delinquent account and need to be handled specially, and so on, so that the call can be added to the appropriate queue.  The ACD may also be able to provide the IVR with the estimated wait time for an agent, for feedback to the caller.

Agent-side interaction

Let’s turn for a moment to the agent’s side of the contact center.  An agent will invariably have a phone (whether a physical device or a soft client), an interface to the ACD (possibly a custom “thick client”, but increasingly a web-based one in modern contact centers) and a view into the customer database.  For business-to-business contact centers, the agent may also be connected to a CRM system: Salesforce.com, Siebel, Oracle CRM, Microsoft Dynamics, and so on.

For the purposes of our discussion, the agent’s phone is connected to a PBX, the PBX will provide call status information to the ACD using a standard telephony interface such as JTAPI, and the ACD will in turn use the same interface to direct incoming calls to agents.  This would typically be the case where an organization has a Cisco or Avaya PBX, for example, and the use of standard JTAPI allows for the deployment of a multi-vendor call center.  Other vendors, notably Genesys, have taken the approach of building their call center software using a SIP proxy as a key component, and the agents register their phones directly with the ACD rather than with a PBX.

The agent will log into the ACD at the beginning of their shift, signaling that they are available.  Call handling is then directed by the ACD, and when a call is passed to an agent, the ACD pushes down the customer ID to the agent’s desktop, which is then used to automatically do a “screen pop” of the customer’s account details from the customer database or CRM system.

Call handling in a contact center is thus a complex orchestration between an IVR, ACD, PBX and various pieces of enterprise software, usually requiring the writing of custom scripts and plugins to make everything work together  Not only this, but contact centers also make use of analytics software, call recording systems, and so on.

The caller experience

Let’s return to our caller, parked on the IVR and being played insipid on-hold music.  When the call eventually reaches the head of the queue, the ACD will instruct the IVR to transfer the call to a specific agent.  The agent gets the screen pop, asks the caller to verify their identity, and then begins the process of asking why they called.

To summarize, the contact center experience typically involves:

  • Loss of contextual information from an existing web or mobile app session.
  • Navigating IVR Hell.
  • Waiting on hold.
  • Re-establishing identity and context.
  • A voice-only experience with a faceless representative, and lack of collaboration tools.

It’s no wonder this is judged a poor experience, for customers and contact center agents alike.

Adding WebRTC to the Contact Center

WebRTC is part of what is called in the contact center business, the “omnichannel experience”, in which multiple modalities of communication between a customer and the contact center all work together seamlessly.  An interaction may start on social media, be escalated to chat, from there to voice and video, and possibly be accompanied by screen sharing and co-browsing.  But how is this accomplished?

Contact Center with WebRTC Gateways and Co-browse Server

The key thing to hold in mind is that voice and video are integrated into the contact center app, and that context is at all times preserved.  As a customer, you have already established your identity with the contact center’s web portal; there’s no need to have the PSTN strip that away when you want to talk to a human being.  And when you do get put through to an agent, why shouldn’t they be able to view the same web page that you do?  (Subject to permission, of course.)

To do this, we need the following components (shown colored purple in the above diagram):

  • A back-end to the web portal that is capable of acting as a pseudo-IVR.  As far as the ACD is concerned, it’s getting a regular incoming call, which has to be queued and transferred to an agent as usual.  The fact that this is a WebRTC call and not from the PSTN is totally transparent to the ACD.
  • A co-browsing server – this acts as a rendezvous point between the customer and the agent for a particular co-browsing session, where changes to the customer’s web page are published over a secure WebSockets (WSS) connection, and the agent subscribes to those changes.  The actual details of how this works are proprietary and vary between vendors; however, the DOM Mutation Observer API is generally at the heart of the toolkit used.  When the agent wishes to navigate on behalf of the customer, mouse-click  events are sent back over the WSS connection from the agent and injected into the customer’s web page using a JavaScript or jQuery simulated mouse click event.  Annotation works similarly, with a mousedown event being passed over the WSS connection and used to paint on an HTML canvas element overlaying the customer’s web page.
  • A WebRTC-to-SIP signaling gateway (as webrtcHacks has covered here).
  • A media gateway, which transforms the SRTP used by WebRTC to the unencrypted RTP used by most enterprise telephony systems, and vice-versa.  This element may also carry out H.264 to VP8 video transcoding and audio codec transcoding if required.

The signaling and media gateways are common components for vendors selling WebRTC add-ons for legacy SIP-based systems, and are functionally equivalent in the network to a Session Border Controller.  Indeed, several such products are based on SBCs, or a combination of an SBC for the media and a SIP application server for the signaling gateway.  On the other hand, the pseudo-IVR and co-browse servers are rather more specialized elements, designed for contact center applications.

The work of this array of network elements is coordinated by the web portal, using their APIs and supporting SDKs.  The sequence diagrams in the next section show how the web portal and the ACD between them orchestrate a WebRTC call from its creation to being handed off as a SIP call to an agent, and how it is correlated with a co-browsing session.

Finally, it should be noted that a reverse HTTP proxy is generally required to protect the web servers in this arrangement, which reside within the inner firewall.  The media gateway would normally be placed within the DMZ.  The use of multiplexing to allow the media streams of multiple calls to use a single RTP port is a particularly noteworthy feature of WebRTC, which is deserving of appreciation by those whose job it is to manage firewalls.

Call Flows

In the diagrams that follow, purple lines indicate web-based interactions, often based on REST APIs. Some interactions may use WebSockets because of their asynchronous, bidirectional nature, which is particularly useful for call signaling and event handling.

Preparing for a call

Let us start at the point where the customer has already been authenticated by the web portal, and has been perusing their account details. Seeing the big friendly ‘Get Help’ button on their mobile app (remember, this is a mobile-first deployment), they decide they want to talk to a human. Inevitably, an agent is never just sitting around waiting for the call, so there is work to be done to make this happen.

WebRTC Contact Center Call Flow, Part 1

The first step in preparing for the call is for the web portal code to allocate a SIP identity for the caller, in other words, the ‘From’ URI or the caller id. This could be any arbitrary string or number, but it should be unique, since we’re also going to use it to identify the co-browse session. Next, the portal requests the WebRTC signaling gateway to authorize a session for this particular URI, because, well, you don’t want people hacking into your PBX and committing toll fraud using WebRTC. The signaling gateway obliges, and passes back to the web portal a one-time authorization token. Armed with the token, the portal instructs the client app (or browser) to prepare for the WebRTC call. It provides the token, the From URI, the location of a STUN server and information on how to contact the signaling gateway.

While the client is being readied, the portal makes a web services call to the ACD to see when an agent is expected to become available, given the customer’s identity and the nature of their inquiry. (The nature of the inquiry will be determined by what page of the website or app they were on when they pressed the ‘Get Help’ button.) Assuming an agent is not available at that very moment, the portal passes back the estimated wait time to be displayed by the client.

But what about the insipid on-hold music I mentioned earlier? Don’t we need to transfer the customer to a media server to play this? Well, no, we don’t. This is the Web we’re talking about, and we can readily tell the client to play a video from YouTube, or wherever, while they are waiting.

Next, the web portal submits the not-yet-created call to the ACD for queuing, via the pseudo-IVR component. Key pieces of information submitted are the From URI, the customer ID and the queue corresponding to the reason for the call. When the call reaches the head of the queue, the ACD instructs the call to be transferred to the selected agent.

(Side-note: Pseudo-IVR adapters for contact centers are used for a variety of purposes. They may be used to dispatch social media “tweets”, inbound customer service emails and web-chat sessions, as well as WebRTC calls.)

For modern deployments, agent desktop software may be constructed from a web-based framework, which allows third-party plugin components to pull information from the customer database, to connect to a CRM system, and in our case, to connect to the co-browse server. The screen pop to the agent uses the customer URI to connect to the correct session on the co-browse server.

Making the call

Now that the customer and agent are both ready, the web portal instructs the WebRTC client to call the agent’s URI. The actual details of how this is done depend on the vendor-specific WebRTC signaling protocol supported by the gateway; however, on the SIP side of the gateway they are turned into the standard INVITE, with the SDP payload reflecting the media gateway’s IP address and RTP ports.

WebRTC Contact Center Call Flow, Part 2

The fact that this is a video call is transparent to the ACD. The pool of agents with video capability can be put in their own skill group for the purposes of allocating them to customers using WebRTC.  The agents could be using video on suitably equipped phone handsets, or they could themselves be using WebRTC clients.  Indeed, some contact center vendors with whom I have spoken point to the advantages of delivering the entire agent experience within a browser: delivering updates to a SIP soft client or thick-client agent desktop software then becomes a thing of the past.

After the video session has been established, the customer may assent to the sharing of their mobile app screen or web-browsing session.  The co-browse server acts as the rendezvous point, with the customer’s unique URI acting as the session identifier.

Concluding thoughts: It’s all about mobile, stupid!

The fact that WebRTC is not ubiquitous, that it is not currently supported in major browsers such as Safari and Internet Explorer, might be thought an insurmountable barrier to deploying it in a contact center.  But this is not the case.  The very same infrastructure that works for web browsers also works for mobile apps, which in many cases are simply mobile UI elements placed on top of a web application, making the same HTTP calls on a web server.

All that is required is a WebRTC-based SDK that works in Android or iOS.  Happily for us, Google has made its WebRTC code readily available through the Chromium project.  Several vendors have made that code the basis of their mobile SDKs, wrapping them with Java and Objective-C language bindings equivalent to the JavaScript APIs found in browsers.

For contact center executives, a mobile-first approach offers the following advantages:

  • You don’t want your customers messing around trying to install WebRTC browser plugins for Safari and IE.  If they’re going to download anything, it may as well be your mobile app.
  • Mobile devices are near-ubiquitous.  Both Pew and Nielsen report their popularity amongst older demographics in particular, where regular PCs might not be used.
  • Microphones and cameras on mobile devices are near-universal and of excellent quality, and echo cancellation works well.  That old PC with a flaky webcam?  Perhaps not so much.
  • If your customer is having a real-world problem, then the back-facing camera on a phone or tablet is a great way of showing it.  The auto insurance industry comes readily to mind.
  • Although the Great Video Codec Compromise now promises H.264 support in browsers, mobile SDKs have been able to take advantage of those devices’ hardware support for H.264 video encoding for some time.  When your contact center agents have sleek enterprise-class, video-capable phones that don’t support VP8, you don’t want to have to buy a pile of servers simply to do video transcoding.

In the call center industry, Amazon and American Express have shown the way in supporting video in their tablet apps, and both these services use WebRTC under the hood.  Speaking at the 2014 Cisco Live! event in San Francisco, Amex executive Todd Walthall related how users of the Amex iPad app who used the video feature had greater levels of customer satisfaction, through a more personal experience.  This should not surprise us, as it’s much easier to empathize with a customer service representative if they’re not just a disembodied voice.

For companies deploying WebRTC, it’s an incremental approach that doesn’t require significant architectural change or the replacement of existing systems. Early adopters are seeing shorter calls, as context is preserved and co-browsing allows problems to be resolved more quickly. One day we will look back at IVR Hell, waiting on endless hold with only a lo-fi rendition of Mantovani for company, trying in vain to find our account number and PIN, as if it were a childhood nightmare.

{“author”: “Robert Welbourn“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

The post Avoiding Contact Center IVR Hell with WebRTC appeared first on webrtcHacks.


Using the greatness of Parallax

Phosfluorescently utilize future-proof scenarios whereas timely leadership skills. Seamlessly administrate maintainable quality vectors whereas proactive mindshare.

Dramatically plagiarize visionary internal or "organic" sources via process-centric. Compellingly exploit worldwide communities for high standards in growth strategies.

Get free trial

Wow, this most certainly is a great a theme.

John Smith
Company name

Startup Growth Lite is a free theme, contributed to the Drupal Community by More than Themes.