Your Browser as an Audio Conference Server with WebRTC & Web Audio (Alexey Aylarov)

Fri, 05/20/2016 - 11:04

Conference calling is a multi-billion dollar industry that is mostly powered by expensive, high-powered conferencing servers. Now you can replicate much of this functionality for free with a modern browser using the combination of WebRTC and WebAudio. Like with video, multi-party audio can utilize a few architectures: Full mesh – each client sends their audio […]

Update: Anatomy of a WebRTC SDP (Antón Román)

Thu, 05/05/2016 - 14:33

Session Description Protocol (SDP) is a fundamental, but very unintuitive concept behind how WebRTC works today. It’s no wonder that the Anatomy of a WebRTC SDP post and the interactive SDP guide by Quobis CTO Antón Román have been so popular here on webrtcHacks. As with all things WebRTC, things have changed and we were due for an […]

Sharpening the Edge – extended Q&A with Microsoft for RTC devs

Thu, 04/21/2016 - 17:22

Two weeks ago Microsoft’s Bernard Aboba (a former webrtcHacks interviewee) gave an update on Edge’s ORTC and WebRTC at the Microsoft Build conference. He covered some big topics including VP8 and WebRTC 1.0 support. You can see the update video here or read the follow-up post for details. Then last week Microsoft announced plug-in free Skype […]

The Big Churn – learning from real usage stats (Lasse Lumiaho and Varun Singh)

Fri, 04/08/2016 - 15:39

Losing customers because of issues with your network service is a bad thing. Sure, you can gather data and try to fix them afterwards, but isn’t it better to prevent issues in the first place? What are the most common pitfalls to look out for? What’s a good benchmark? What WebRTC-specific user experience elements should you spend […]

Is Slack’s WebRTC Really Slacking? (Yoshimasa Iwase)

Thu, 03/24/2016 - 13:25

Earlier this month Fippo published a post analyzing Slack’s new WebRTC implementation. He did not have direct access or a team account to do a thorough deep dive – not to mention he is supposed to be taking some time off this month. That left many with some open questions. Is there more to the TURN network? […]

Dear Slack: why is your WebRTC so weak?

Thu, 03/03/2016 - 20:41

Dear Slack, There has been quite some buzz this week about you and WebRTC. WebRTC… kind of. Because actually you only do stuff in Chrome and your native apps: I’ve been there. Launching stuff only for Chrome. That was in late 2012. In 2016, you need to have a very good excuse to launch something […]

getUserMedia resolutions III – constraints unleashed

Mon, 02/08/2016 - 14:03

Back in October 2013, in the relatively early days of WebRTC, I set out to get a better understanding of the getUserMedia API and camera constraints in one of my first and most popular posts. I discovered that working with getUserMedia constraints was not all that straightforward. A year later I gave an update after the […]

Surviving Mandatory HTTPS in Chrome (Xander Dumaine)

Thu, 12/17/2015 - 13:11

Xander Dumaine provides some strategies and code for dealing with the new secure origin only policy in Chrome 47+ that forces the use of HTTPS.

Shut up! Monitoring audio volume in getUserMedia

Thu, 12/10/2015 - 13:14

A few days back my old friend Chris Koehnke, better known as “Kranky” asked me how hard it would be to implement a wild idea he had to monitor what percentage of the time you spent talking instead of listening on a call when using WebRTC. When I said “one day” that made him wonder whether he could offshore it to save money. Well… good luck!

A week later Kranky showed me some code. Wait, he is writing code? It was not bad – it was using the WebAudio API so going in the right direction. It was enough to prod me to finish writing the app for him.

The audio stream volume sample application from Google calculates the root mean square (RMS) of the audio signal, which is extracted from the input stream using a script processor every 200ms. There are a lot of tuning options here, of course.
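As a rough illustration of the RMS approach, here is a minimal sketch of the calculation on one block of samples. It is not the code of the Google sample; the function name rootMeanSquare is made up for this example, and the input is assumed to be an array of samples in the range -1 to 1.

```javascript
// Root mean square of one block of audio samples.
// Assumption: `samples` holds values in the range [-1, 1], e.g. a
// Float32Array taken from a script processor's input buffer.
function rootMeanSquare(samples) {
  if (samples.length === 0) {
    return 0;
  }
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```

In a browser, such a block would typically come from event.inputBuffer.getChannelData(0) inside a script processor's onaudioprocess handler.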

Instead of starting from scratch, I decided to use hark, a small open source module for this task that my coworker Philip Roberts had built in mid-2013 when the WebAudio API first became available.

Instead of the RMS, hark uses the Fast Fourier Transform to obtain a frequency domain representation of the input signal. Then, hark picks the maximum amplitude as an indication of the volume of the signal. Let’s try this (full code here):

var hark = require('../hark.js');
var getUserMedia = require('getusermedia');

getUserMedia(function(err, stream) {
  if (err) throw err;

  var options = {};
  var speechEvents = hark(stream, options);

  speechEvents.on('volume_change', function(volume) {
    console.log('current volume', volume);
  });
});

On top of this, hark uses a simple speech detection algorithm that considers speech to be started when the maximum amplitude stays above a threshold for a number of milliseconds. Much less complicated than typical voice activity detection algorithms but pretty effective. And easy to use as well, just subscribe to two additional events:

speechEvents.on('speaking', function() {
  console.log('speaking');
});

speechEvents.on('stopped_speaking', function() {
  console.log('stopped_speaking');
});
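To make the threshold-plus-history idea more concrete, here is a simplified sketch of such a detector. This is not hark's actual implementation: createSpeechDetector, threshold and historySize are names invented for this example, and the history is counted in polls rather than milliseconds.

```javascript
// Simplified threshold-plus-history speech detection.
// Assumptions: volumes are in dB (more negative = quieter) and the caller
// feeds one volume value per polling interval.
function createSpeechDetector(options) {
  const threshold = options.threshold;     // volumes above this count as loud
  const historySize = options.historySize; // consecutive polls needed to switch state
  let loudRun = 0;
  let quietRun = 0;
  let speaking = false;

  // Returns 'speaking', 'stopped_speaking' or null if the state did not change.
  return function update(volume) {
    if (volume > threshold) {
      loudRun += 1;
      quietRun = 0;
    } else {
      quietRun += 1;
      loudRun = 0;
    }
    if (!speaking && loudRun >= historySize) {
      speaking = true;
      return 'speaking';
    }
    if (speaking && quietRun >= historySize) {
      speaking = false;
      return 'stopped_speaking';
    }
    return null;
  };
}
```

With a polling interval of 100ms, a historySize of 4 would mean that speech is only reported after roughly 400ms above the threshold, which is the kind of delay described further down.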

Tuning the threshold for accurate speech detection is pretty tricky. So I needed visualization (and just requiring hark only took five minutes so I had plenty of time). Using the awesome Highcharts graph library I quickly added plot bands to the graph I was generating:

With the visualization I could easily see that the speech detection events happened a bit later than I expected, since hark requires a certain history over the threshold for the trigger to work (say: 400ms). To adjust for this in the graph, I had to subtract this trigger delay from my x-axis (now() – 400ms, for example).

That graph is still visible on the more techie variant of the website so if you think the results are not accurate… it might help you figure out what is going on. I am happy with the current behavior.

The percentage of speech is then calculated as the sum of the intervals in which speech is detected divided by the duration of the call. As a display, a gauge chart is used with three different colors:

  • up to 65% speech time: green
  • up to 79%: yellow
  • more than 80%: red
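The calculation and the color bands can be sketched as follows. The function names and the [start, end] interval format are assumptions made for this example, not the app's actual code, and treating values between 79% and 80% as yellow is also an assumption.

```javascript
// Share of the call (in percent) during which speech was detected.
// Assumption: `intervals` is a list of [start, end] pairs in milliseconds.
function speechPercentage(intervals, callDurationMs) {
  if (callDurationMs <= 0) {
    return 0;
  }
  const speechMs = intervals.reduce(function(sum, interval) {
    return sum + (interval[1] - interval[0]);
  }, 0);
  return (speechMs / callDurationMs) * 100;
}

// Maps the percentage to the three gauge colors.
function gaugeColor(percent) {
  if (percent <= 65) return 'green';
  if (percent < 80) return 'yellow'; // assumption: the 79-80% gap counts as yellow
  return 'red';
}
```

For example, two speech intervals of ten seconds each in a forty second call give 50%, which lands in the green band.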

Adding remote audio to this would be awesome. However, while the WebAudio API is supported for local media streams in Chrome, Firefox and Edge, it is only supported for remote streams in Firefox. Hooking this up with the getStats API (in Chrome) to get the audio level would certainly be possible, but would require calling getStats at a very high frequency to get proper averages.

Check out the app in action at talklessnow and let us know what you think.

{"author": "Philipp Hancke"}

OMG WebRTC is tracking me! Or is it?

Thu, 11/05/2015 - 15:23

There has been more noise about WebRTC making it possible to track users. We have covered some of the nefarious uses of WebRTC and how to look out for them before. After reading a blog post on this topic covering some allegedly new unaddressed issues a week ago, I decided to ignore it after some discussion on the Mozilla IRC channel. But this has come up on the Twitter-sphere again and Tsahi said ‘ouch’, so here are my thoughts.


The blog post (available here) makes a number of claims about how certain Chrome behavior makes fingerprinting easier:

  • Chrome started caching certificates for 30 days recently, creating a cookie-like attack surface for privacy
  • this allows cross-origin tracking of users
  • the incognito mode behavior is inconsistent with respect to this

Caching certificates

First, there is a claim that the way Chrome caches certificates changed recently:

In the past, Google Chrome used to generate a new self-signed certificate for every WebRTC PeerConnection. But now (using Chrome 46, or maybe earlier as i did not check) it generates a self-signed certificate which is valid for one month and uses it for all PeerConnections of a particular domain.

The code used to demonstrate this behaviour is rather odd, too. It uses the getStats API to query the fingerprint, which is also available more easily in the SDP.
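For illustration, reading the fingerprint from the SDP only takes a few lines. This is a minimal sketch, not the code from the blog post in question; getFingerprint is a name chosen for this example and it only looks at the first a=fingerprint line it finds.

```javascript
// Extracts the first DTLS certificate fingerprint from an SDP string.
// Returns { algorithm, value } or null if there is no a=fingerprint line.
function getFingerprint(sdp) {
  const prefix = 'a=fingerprint:';
  const line = sdp.split(/\r?\n/).find(function(l) {
    return l.startsWith(prefix);
  });
  if (!line) {
    return null;
  }
  const parts = line.slice(prefix.length).trim().split(' ');
  return { algorithm: parts[0], value: parts[1] };
}
```

In a browser the input would be something like pc.localDescription.sdp after the local description has been set.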

Chrome has cached certificates in this way for about two years, so this is not real news. One of the reasons for this is that it is rather expensive to generate the current private keys for DTLS, especially on mobile devices. In the future, there will be more control over this behaviour. Neither Firefox nor Edge currently cache certificates.

To be fair, the WebRTC team made a serious blunder here. Until Chrome 45, the certificate was not cleared when cookies were cleared, only when all data was cleared. The bugfix for this only appeared in the Chrome 47 release notes:

Issue 510850 DTLS cert should be cleared when cookies are cleared

Cross-Origin Tracking

So this part is not really news. The second claim made in the blog post is that this enables cross-origin tracking:

To test this go to http://www.kapejod.org/tracking/test.html and to http://kapejod.org/tracking/test.html. Open the network tab of Chrome’s developer console and compare the urls of the requested “tracking.png”. They should contain the same fingerprint, now!

They do. Now, let’s look at this test page:

// make up some random id
var transactionId = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
  var r = Math.random() * 16 | 0, v = c == 'x' ? r : r & 0x3 | 0x8;
  return v.toString(16);
});
var fragment = document.createDocumentFragment();
var div = document.createElement("DIV");
div.innerHTML = '<iframe src="http://kapejod.org/tracking/identify.html?' + transactionId + '" width="1" height="1" style="display:none;"/>';
fragment.appendChild(div);
document.body.insertBefore(fragment, document.body.childNodes[document.body.childNodes.length - 1]);

It includes the URL http://kapejod.org/tracking/identify.html. Let’s look at the code there as well. It executes the code shown above and logs the fingerprint to the console:

console.log('your fingerprint is: ' + fingerprint);

Now why is the fingerprint the same? Well, the iframe is always included from kapejod.org, which means the JavaScript is executed within the context of this origin. So Chrome can use the persisted fingerprint, as well as any cookies and localStorage data. The attack surface here is no worse than setting a cookie.

Another thing related to this (and I am surprised this has not yet been mentioned) are the deviceIds returned by navigator.mediaDevices.enumerateDevices. Those are also persisted with the same lifetime as cookies. The W3C mediacapture specification has a paragraph about security and privacy considerations on this:

The identifiers for the devices are designed to not be useful for a fingerprint that can track the user between origins, but the number of devices adds to the fingerprint surface. It recommends to treat the per-origin persistent identifier deviceId as other persistent storages (e.g. cookies) are treated.

Again, WebRTC and other HTML5 techniques increase the fingerprint surface. But by design, this is not worse than cookies or equivalent techniques like localStorage.

Incognito Mode

Last but not least the blog post makes claims about the incognito mode:

But to make it generate a new one you have to close ALL incognito tabs. Otherwise you can be tracked across multiple domains.

Again, this behaviour is consistent with the incognito mode behaviour for things like localStorage. In both Chrome and Firefox. In incognito mode, open a site, set something in localStorage. Open another tab. Close first tab. Navigate to same site. Check localStorage. Boo!


There is no real news here. In Germany, we call this ‘olle kamellen’.

{"author": "Philipp Hancke"}

Are we There Yet? WebRTC standards Q&A with Dan Burnett

Wed, 10/21/2015 - 12:58

If you are new to WebRTC then you have missed out on years of drama in the standards bodies over various issues like SDP and codecs. These standards dictate what vendors must implement so they ultimately dictate the industry roadmap. To get a deep perspective and appreciation of the issues, we like to ask Dan Burnett, W3C editor, to comment on where we are with the standardization process. I caught up with Dan at this year’s IIT Real Time Communications Conference and had a more detailed Q&A with him shortly thereafter.

We asked Dan to comment on recent spec changes, ORTC, the next version of WebRTC, codecs, Apple, when the 1.0 spec might ever be finalized, and a whole lot more.

{"editor", "chad hart"}

New Governance

webrtcHacks: Hi Dan. Can you describe some of the recent changes to the W3C WebRTC governance?

Dan: Yes. There was a long-running but productive discussion among the members of the WebRTC Working Group (WG), ORTC Community Group (CG), and some of the members of the W3C advisory committee – which is the group that officially determines group charters.

As part of the Charter renewal process, we decided that there would be one additional Chair of the WebRTC Working Group – Erik Lagerway of Hookflash, who was one of the initiators of ORTC. The decision was also that the WebRTC WG is the official group where all future standardization work in WebRTC will happen, meaning the ORTC work will gradually fold into that group.

Additionally, the group was chartered to work on another version beyond 1.0 – WebRTC Next Version or WebRTC-NV.

There are 2 requirements on that version:

  1. There is no requirement that new features introduced in the specification have an SDP equivalent
  2. WebRTC NV is not a replacement for WebRTC 1.0 – it is an extension. It is expected that all browsers that support WebRTC NV will support 1.0 functionality as well.

One other thing that has happened, which is not official but is probably good, is that Bernard Aboba from Microsoft has joined the WebRTC 1.0 editing team.

The Next Version

webrtcHacks: yeah, Bernard mentioned that in the interview I did with him last week. Can you explain WebRTC NV? Why didn’t you just call it 2.0, or 1.1, or whatever?

Dan: I have been working on standards for a long time. I have seen groups spend ridiculous amounts of time deciding on a name for a specification. In this particular case a “1.1” sounds like a minor change from “1.0” while “2.0” sounds like a major change. Some people want a minor change. Some people want a major change. If enough people want different minor changes it will end up being a 2.0 anyway because of the number of changes. The goal was to avoid that disagreement now so that we can move forward.

webrtcHacks: So what is WebRTC NV then, beyond what you stated earlier about no SDP?

Dan: Nothing is officially decided but I expect that there will continue to be more low-level controls as in ORTC. This is complicated by the fact that new feature proposals are continuing to come in for 1.0. Many of these features are from ORTC.

In the Sapporo meeting coming up, Google will be sharing their idea for what should go into WebRTC-NV when we finally start working on it.

Dan at the IIT-RTC Conference

webrtcHacks: How do you see ORTC influencing the WebRTC spec? Is WebRTC-NV really just ORTC?

Dan: If I had to summarize WebRTC-NV I would say that it is the combination of WebRTC 1.0 and ORTC. It is a requirement that 1.0 applications continue to work in WebRTC-NV implementations. It is not required that ORTC applications work directly in WebRTC-NV.

I believe the ORTC community intends to modify ORTC as necessary to remain consistent with WebRTC as it evolves.

webrtcHacks: Is there an end date to ORTC then? When it is mostly merged with WebRTC-NV will it cease to exist?

Dan: I can’t speak for the ORTC group. I have not heard of an end date. You’ll have to ask one of the primary ORTC contributors.

Spec Changes

webrtcHacks: What are some of the changes made to the specs recently, particularly those that impact the developers out there?

Dan: First I would like to give a little plug for my webrtcstandards.info site where I have been putting exactly that sort of information over the past few months. I will mention some things here, but you can get more details on that site.

webrtcHacks: ok, we’ll give you one plug (laughs)

Dan: One of the biggest changes, and the most relevant to what we were just talking about, is the introduction of RTP senders and receivers (RTCRtpSender and RTCRtpReceiver). These are objects that allow for both information and more direct control over how tracks are sent over a PeerConnection. Notice as part of this that we have moved from a stream-based API to a track-based API.

webrtcHacks: And what advantage does the track approach provide?

Dan: It turns out developers want to have more control over exactly how tracks are sent and received. For example being able to specify which codecs are to be used and the parameters used to configure those codecs. They should be able to configure some transport properties as well on a per track basis such as FEC, retransmission, and bandwidth. Because of this it really didn’t make sense to talk about streams as the primary primitive being sent over a PeerConnection since they are really just a collection of tracks.
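As an illustration of the track-based style Dan describes, here is a minimal sketch. It assumes `pc` is an RTCPeerConnection whose addTrack method returns an RTCRtpSender and `stream` is a MediaStream, for example from getUserMedia; the helper name sendTracks is made up for this example.

```javascript
// Adds every track of a stream to the connection individually and returns
// the resulting sender objects, one per track.
// Assumption: pc.addTrack(track, stream) is available and returns a sender.
function sendTracks(pc, stream) {
  return stream.getTracks().map(function(track) {
    return pc.addTrack(track, stream);
  });
}
```

Each returned sender is the per-track handle through which codec and transport settings of the kind mentioned above can be inspected or adjusted, to the extent a given browser supports it.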

One of Peter Thatcher’s ORTC update slides showing the differences between the WebRTC and ORTC API. Source: IIT-RTC 2015

webrtcHacks: So the others?

Dan: First, on the one we just mentioned – that was a foundational change where we are going to be seeing many other changes later on. Now I’ll talk about the others that are not related to that.

One big change was that the APIs have been converted to use ECMAScript Promises. I think I mentioned this last year.

webrtcHacks: You did.

Dan: It has happened. It is now in the specifications.

Promises are now the recommended mechanism for WebRTC specifications and for web specifications in general for dealing with asynchronous function calls. Not so much for things that generate multiple events, but definitely for any single asynchronous function call.

This is part of the move of ECMAScript toward truly asynchronous function calls, as you can see if you look at some of the thoughts for future versions of ECMAScript.

The original callback-based APIs currently still exist but will eventually be deprecated. Developers should start using the Promise versions.
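A minimal sketch of what the Promise style looks like for creating an offer is shown below. It assumes `pc` is an RTCPeerConnection whose createOffer and setLocalDescription return Promises; the helper name createAndSetOffer is made up for this example.

```javascript
// Callback style, for comparison:
//   pc.createOffer(onSuccess, onFailure);
//
// Promise style: each asynchronous step returns a Promise, so the steps
// can be chained and errors handled in one place.
function createAndSetOffer(pc) {
  return pc.createOffer()
    .then(function(offer) {
      return pc.setLocalDescription(offer);
    })
    .then(function() {
      return pc.localDescription;
    });
}
```

A caller would then use createAndSetOffer(pc).then(...) to send the description to the remote side and a single .catch(...) to handle a failure in either step.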

webrtcHacks: I know media capture from the DOM is another one.

Dan: There has been good progress on capturing media directly from media elements such as audio, video and canvas. Developers have had to use hacks up to this point to be able to capture a canvas for example. Maybe they would take snapshots, but that is not the same as a realtime media stream as you would get from a getUserMedia call.

The major changes going into the specification soon  are to try to reproduce the resulting media stream as faithfully as possible to what a user would experience from that element. For example, if the user is playing a video and pauses it and then resumes, the resulting stream should show the paused video for the amount of time it was paused and then resume again.

This seems to be what developers are most interested in.

webrtcHacks: can you talk about some of the use cases that are being referenced around this feature?

Dan: Shared whiteboard is probably the best example, but there may be some instances for training purposes where you want to capture how the user has interacted with existing elements – video or audio.

webrtcHacks: What about screensharing?

Dan: There is good progress happening there as well on the specification. It still has some tricky issues in terms of what apps should be able to request to be shared and what users should have control over. An example of this is Microsoft PowerPoint – if a user has 3 PowerPoint documents up, say different presentations for different clients, they are likely to only want to share one of those presentations – one window of that application. That works great until they go into presentation mode, which as far as the computer is concerned is a different window. So is this a case where the user should decide or is this a case where the application should decide what is shared?

In general the WG believes that the user should have the control, but browsers may have to make special cases for known applications such as PowerPoint so that it just works.

webrtcHacks: How about simulcast?

Dan: At the Seattle meeting there were some strong opinions on how simulcast should work and some proposals. Each time we get to the details the discussions diverge rather than converge. We all want it but we do not agree on how it should be signaled.


webrtcHacks: Now for an easier one. When will 1.0 be done?


Dan: I am tempted to give a similar answer as last year.

There are 2 primary specifications. The media capture specification is right now finishing up addressing the comments from its Last Call review which is the wide range review that is required in order to go forward. There aren’t any new features being requested by group members – it’s just cleaning up and fixing.

It probably will be stable within another 6 months.

webrtcHacks: Stable meaning not changing any more?

Dan: Yes – meaning no contentful changes. Only editorial fixes.

Now the WebRTC specification has the problem that new features keep coming in.

webrtcHacks: Just to clarify – the Media Capture group is the getUserMedia API and when you say WebRTC, that means the RTCPeerConnection and DataChannel related APIs?

Dan: Yes.

These are features that have come from ORTC. At each meeting we have tried to finalize the list, but new proposals continue to creep in. Within 6 months we will know whether the chairs have been able to hold the line on the most recent list agreed to in Seattle.

webrtcHacks: So is this why it is taking so long?

Dan: Yes. The good news about it is that the features that are going in are the most requested ones from ORTC.

IP Leakage

webrtcHacks: The IP leakage issue was a hot topic on webrtcHacks and elsewhere. Many have labeled it as a flaw; others say this behaviour was by design. Can you share the “standards” perspective on this topic and the considerations that were discussed?

Dan: The summary is this – there are 2 problems with IP leakage:

One kind is the leakage of public addresses that the user doesn’t want leaked. This can happen when a user is using a VPN and not all of the traffic is sent over the VPN – a so-called split tunnel VPN. This is an issue if the user doesn’t want their non-VPN public address to be revealed. This is not a WebRTC problem; this is a split tunnel VPN problem. That doesn’t mean that people don’t blame the browser vendors even though it’s not their fault (laughs).

Technically any application running on your machine could do the same thing if you’re running a split tunnel VPN. There are extensions to turn off WebRTC for people who are very concerned about this.

The other kind of leakage is leakage of your local IP address. The reason this concerns some people is that it can be used to map the topology of your local network, say within an enterprise. However, it turns out that applications can use an XMLHttpRequest to do the same thing. Despite that, the browser vendors are working on ways to turn off the reporting of these local addresses.
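To show where these addresses become visible to an application, here is a small sketch that parses an ICE candidate string and checks whether the address is in a private IPv4 range. The function names are made up for this example, only the leading fields of the candidate are parsed, and IPv6 is ignored.

```javascript
// Parses the leading fields of an ICE candidate string of the form
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(candidate) {
  const parts = candidate.replace(/^a=/, '').replace(/^candidate:/, '').split(' ');
  if (parts.length < 8 || parts[6] !== 'typ') {
    return null;
  }
  return { address: parts[4], port: Number(parts[5]), type: parts[7] };
}

// True for the private IPv4 ranges 10/8, 172.16/12 and 192.168/16 (RFC 1918).
function isPrivateIPv4(address) {
  return /^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(address);
}
```

A candidate of type host with a private address is the local address discussed here, while a server reflexive (srflx) candidate carries the public address seen by the STUN server.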

There will be more details coming up in an upcoming post on my site.

Dan talking to webrtcHacks guest author Alan Johnston at the IIT-RTC show

What’s Apple Doing?

webrtcHacks: Now the only major browser vendor left is Apple. Can you comment on public participation by Apple?

Dan: It is clear that people from Apple continue to follow the work, but they still don’t contribute.

webrtcHacks: Do you know if they contribute to other WGs more actively?

Dan: Yes, Apple does contribute more actively in other WG within W3C.


webrtcHacks: Anything new with video codecs now that the market has had some time to react to the decision to include both VP8 & H.264 for browsers? How have VP9 vs. H.265 and the Alliance for Open Media (AOM) changed the discussion?

Dan: The gauntlet has been thrown for the creation of free and open source video codecs. MPEG-LA needs to take notice that the media producers and distributors are serious about coming up with lower cost alternatives. This pressure just continually increases. The AOM is a prime example of that.

webrtcHacks: Has the Alliance for Open Media come up in standards discussion? In the past I know there was discussion of just allowing software codecs that could be defined on the fly.

Dan: Codecs still need to be created. The discussions of VP8 vs. H.264 and VP9 vs. H.265 are not really technical discussions. They are all about intellectual property because of the cost of licensing the codecs. The issue is not being able to select a codec – the issue is having a codec that you want to choose.

One API change that has just gone in is being able to choose which of the browser-supported codecs to use.


webrtcHacks: Anything else to add?

Dan: I think we’re finally on a good track in respect to a path forward for ORTC and WebRTC and thus the inclusion of Microsoft as a true and complete WebRTC vendor eventually. We just need the feature inflow from ORTC to stop right now to be able to declare victory and move on.

I think this is evidence that the industry really does want this to happen.

I spoke with a number of people who talk to HTML developer groups and they all agree that even today no more than 50% of the developers have heard of WebRTC – still! It is likely that one reason for that is for many developers a technology isn’t real until it is in Internet Explorer or its successor – Edge.

So having Microsoft fully engaged on a plan that we can all agree on now is a good thing for everyone.


{"interviewer": "chad hart", "interviewee": "Dan Burnett"}

Hello Chrome and Firefox, this is Edge calling

Thu, 10/15/2015 - 14:15

Chrome, Firefox, and Edge are all on the same party line. Image from Pillow Talk (1959)

For the first time, Chrome, Firefox and Edge can “talk” to each other via WebRTC and ORTC. Check the demo on Microsoft’s modern.ie testdrive.

tl;dr: don’t worry, audio works. codec interop issue…

Feature        Interoperability   Notes
ICE            yes                Edge requires end-of-candidate signaling
DTLS           yes
audio          yes                using G.722, Opus or G.711 codecs
video          no                 standard H.264 is not supported in Edge yet
DataChannels   no                 Edge does not support dataChannels

As a reader of this blog, you probably know what WebRTC is but let me quote this:

WebRTC is a new set of technologies that brings clear crisp voice, sharp high-definition (HD) video and low-delay communication to the web browser.

In order to succeed, a web-based communications platform needs to work across browsers. Thanks to the work and participation of the W3C and IETF communities in developing the platform, Chrome and Firefox can now communicate by using standard technologies such as the Opus and VP8 codecs for audio and video, DTLS-SRTP for encryption, and ICE for networking.

This description is taken from the early-2013 Chromium blog post that announced interoperability between Chrome and Firefox. And now Edge?


So we have interoperability – for audio calls.  It is just audio. No video interoperability yet. Now this is just an issue of all vendors implementing at least one common video codec:

  • Edge currently implements a Microsoft variant of H264 called H264UC which adds some features like SVC
    • Adding H264 is work in progress
    • While there is a VP9 decoder for playing videos, that is not usable for ORTC so don’t get too excited
    • See Bernard’s comments for more information
  • Chrome implements VP8; H264 is work in progress
  • Firefox implements VP8 and H264

Audio interoperability is currently using G.722 instead of Opus because Edge still prefers Silk and G.722 over Opus.


But wait, how can those browsers talk if they do not agree on APIs?

Well, I implemented the PeerConnection API on top of ORTC. The gory details can be found here as part of a pull request for adapter.js. It has undergone a quite critical review and improved as a result of that. This process also showed some issues in the ORTC specification. While there has always been the assumption that it would be possible to implement the PeerConnection API using the lower-level ORTC API, nobody had actually done it.

The functionality provided is limited. More than a single audio and video track has not been tested and, since this uses an SDP similar to what is specified in the Unified Plan draft, it would likely not be interoperable with Chrome. But this is sufficient for quite a number of applications that are simple enough not to benefit from ORTC natively.


Using this Javascript implementation, Edge will generate something that is close enough to the SDP used by the PeerConnection API:

v=0
o=thisisadapterortc 8169639915646943137 2 IN IP4
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 104 9 106 0 103 8 97 13 118 101
c=IN IP4
a=rtcp:9 IN IP4
a=rtpmap:104 SILK/16000
a=rtcp-fb:104 x-message app send:dsh recv:dsh
a=rtpmap:9 G722/8000
a=rtcp-fb:9 x-message app send:dsh recv:dsh
a=rtpmap:106 OPUS/48000/2
a=rtcp-fb:106 x-message app send:dsh recv:dsh
a=rtpmap:0 PCMU/8000
a=rtcp-fb:0 x-message app send:dsh recv:dsh
a=rtpmap:103 SILK/8000
a=rtcp-fb:103 x-message app send:dsh recv:dsh
a=rtpmap:8 PCMA/8000
a=rtcp-fb:8 x-message app send:dsh recv:dsh
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=rtcp-mux
a=ice-ufrag:lMRF
a=ice-pwd:NR15fT4U6wHaOKa0ivn64MtQ
a=setup:actpass
a=fingerprint:sha-256 6A:D8:7D:05:1A:ED:DB:BD:6A:60:1A:BC:15:70:D1:6C:A1:D9:00:79:E5:5C:56:15:73:80:E2:82:9D:B9:FB:69
a=mid:nbiwo5l60z
a=sendrecv
a=msid:7E4272C7-2B6C-49BD-BF7A-A3E7B8DD44F5 D2945771-D7B4-4915-AC29-CEA9EC51EC9E
a=ssrc:1001 msid:7E4272C7-2B6C-49BD-BF7A-A3E7B8DD44F5 D2945771-D7B4-4915-AC29-CEA9EC51EC9E
a=ssrc:1001 cname:3s6hzpz1jj

Check the anatomy of a WebRTC SDP post to find out what each of these lines means.
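The codec preference that Edge expresses in such an SDP can be read from the order of the payload types on the m= line. The following is a minimal sketch that resolves them to codec names; codecPreference is a name chosen for this example, and it assumes an SDP with a single media section, as in the audio-only SDP above.

```javascript
// Lists the codecs of a single-media-section SDP in order of preference.
// Payload types without an a=rtpmap line are returned as their number.
function codecPreference(sdp) {
  const rtpmap = {};
  let payloadTypes = [];
  sdp.split(/\r?\n/).forEach(function(line) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <proto> <payload type> <payload type> ...
      payloadTypes = line.split(' ').slice(3);
    } else if (line.startsWith('a=rtpmap:')) {
      const parts = line.slice('a=rtpmap:'.length).split(' ');
      rtpmap[parts[0]] = parts[1];
    }
  });
  return payloadTypes.map(function(pt) {
    return rtpmap[pt] || pt;
  });
}
```

Applied to the SDP above, the list starts with SILK/16000, G722/8000 and OPUS/48000/2, which matches the ordering of Silk and G.722 ahead of Opus mentioned earlier.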

This allows quite a number of the WebRTC PeerConnection samples to work in Edge, just like many of the getUserMedia samples already work.

With that working, the next big challenge was browser interoperability. Would this underspecified blob of text be good enough to be accepted by Chrome and Firefox?

It turned out to be good enough. After adding ICE candidates on both sides, the ICE connection and DTLS states soon changed to completed and connected. Yay. In Chrome at least. Firefox did not work because of trivial mistakes that took a while to figure out. But then, it just worked as well.

As far as I am concerned this shows the hard part, making ICE and DTLS interoperable, is solved. The rest is something for codec folks to work out. Not my area of interest.

{"author": "Philipp Hancke"}

Microsoft’s ORTC Edge for WebRTC – Q&A with Bernard Aboba

Mon, 10/12/2015 - 17:56

We have been waiting a long time for Microsoft to add WebRTC to its browser portfolio. That day finally came last month when Microsoft announced its new Windows 10 Edge browser had ORTC. This certainly does not immediately address the Internet Explorer population and ORTC is still new to many (which is why we cover it often). On the positive side, interoperability between Edge, Chrome, and Firefox on the audio side was proven within days by multiple parties. Much of ORTC is finding its way into the WebRTC 1.0 specification and browser implementations.

I was with Bernard Aboba, Microsoft’s WebRTC lead at the IIT Real Time Communications Conference (IIT-RTC) and asked him for an interview to cover the Edge implementation and where Microsoft is headed. The conversation below has been edited for readability and technical accuracy. The full, unedited audio recording is also available below if you would rather listen than read. Warning – we recorded our casual conversation in an open room off my notebook microphone, so please do not expect high production value.


We cover what exactly is in the Edge ORTC implementation, why ORTC in the first place, the roadmap, and much more.

You can view the IIT-RTC ORTC Update presentation slides given by Bernard, Robin Raymond of Hookflash, and Peter Thatcher of Google here.

{"editor": "chad hart"}

Microsoft’s Edge is hungry for WebRTC

Intro to Bernard

webrtcHacks: Hi Bernard. To start out, can you please describe your role at Microsoft and the projects you’ve been working on? Can you give a little bit of background about your long time involvement in WebRTC Standards, ORTC, and also your new W3C responsibilities?

Bernard: I’m a Principal Architect at Skype within Microsoft, and I work on the Edge ORTC project primarily, but also help out other groups within the company that are interested in WebRTC. I have been involved in ORTC since the very beginning as one of the co-authors of ORTC, and very recently, signed up as an Editor of WebRTC 1.0.

webrtcHacks:  That’s concurrent with some of the agreement around merging more of ORTC into WebRTC going forward. Is that accurate?

Bernard: One of the reasons I signed up was that I found that I was having to file WebRTC 1.0 API issues and follow them. Because many of the remaining bugs in ORTC related to WebRTC 1.0, and of course we wanted the object models to be synced between WebRTC 1.0 and ORTC, I had to review pull requests for WebRTC 1.0 anyway, and reflect the changes within ORTC. Since I had to be aware of WebRTC 1.0 Issues and Pull Requests to manage the ORTC Issues and Pull Requests, I might as well be an editor of WebRTC 1.0.

Bernard Aboba of Microsoft and Robin Raymond of Hookflash discussing ORTC at the IIT Real Time Communications Conference (IIT-RTC)

What’s in Edge

webrtcHacks:  Then I guess we’ll move on to Edge then. Edge and Edge Preview are out there with varying forms of WebRTC. Can you walk through a little bit of that?

Bernard: Just also to clarify for people, Edge ORTC is in what’s called Windows Insider Preview.  Windows Insider Preview builds are only available to people who specifically sign up to receive them.  If you sign up for the Windows Insider Preview program and install the most recent build 10547, then you will have access to the ORTC API in Edge. In terms of what is in it, the audio is relatively complete. We have:

  • G.711,
  • G.722,
  • Opus,
  • Comfort Noise,
  • DTMF, as well as the
  • SILK codec.

Then on the video side, we have an implementation of H.264/SVC, known as H.264UC, which does both simulcast and scalable video coding, as well as forward error correction (FEC). I should also mention, we support RED and forward error correction for audio as well.

That’s what you will find in the Edge ORTC API within Windows Insider Preview, as well as support for “half-trickle” ICE, DTLS 1.0, etc.

webrtcHacks: I’ll include the slide from your presentation for everyone to reference because there’s a lot of stuff to go through. I do have a couple of follow-up questions. One was about support on the video side of things. I think you mentioned external FEC and also talked about other aspects of robustness, such as retransmission?

Bernard’s slide from IIT-RTC 2015 showing Edge’s ORTC coverage

Bernard: Currently in Edge ORTC Insider Preview, we do not support generic NACK or re-transmission. We do support external forward error correction (FEC), both for audio and video. Within Opus as well as SILK we do not support internal FEC, but you can configure RED with FEC externally. Also, we do not support internal Discontinuous Operation (DTX) within Opus or SILK, but you can configure Comfort Noise (CN) for use with audio codecs, including Opus and SILK.

Video interoperability

webrtcHacks: Then could you explain H.264 UC? The majority of the people out there aren’t familiar with the old Lync, or Skype for Business as it is now called.

Bernard: Basically, H.264 UC supports spatial simulcast along with temporal scalability in H.264/SVC, handled automatically “under the covers”.  These are basically the same technologies that are in Hangouts with VP8.   While the ORTC API offers detailed control of things like simulcast and SVC, in many cases, the developer just basically wants the stack to do the right thing, such as figuring out how many layers it can send. That’s what H.264UC does.  It can adapt to network conditions by dropping or adding simulcast streams or temporal layers, based on the bandwidth it feels is available. Currently, the H.264UC codec is only supported by Edge.

webrtcHacks:  Is the base layer H.264?

Bernard: Yes, the base layer is H.264 but RFC 6190 specifies additional NAL Unit types for SVC, so that an implementation that only understands the base layer would not be able to understand extension layers.  Also, our implementation of RFC 6190 sends layers using distinct SSRCs, which is known as Multiple RTP stream Single Transport (MRST).  In contrast, VP8 uses Single RTP stream Single Transport (SRST).

We are going to work on an implementation of H.264/AVC in order to interoperate.  As specified in RFC 6184 and RFC 6190, H.264/AVC and H.264/SVC have different codec names.

webrtcHacks:  For Skype, at least, in the architecture that was published, they showed a gateway. Would you expect other people to do similar gateways?

Bernard: Once we support H.264/AVC, developers should be able to configure that codec, and use it to communicate with other browsers supporting H.264/AVC.  That would be the preferred way to interoperate peer-to-peer.  There might be some conferencing scenarios where it might make sense to configure H.264UC and have the SFU or mixer strip off layers to speak to H.264/AVC-only browsers, but that would require a centralized conferencing server or media relay that could handle that. 


webrtcHacks:  What can you say about the future roadmap? Is it basically what’s on the dev.modern.ie page?

Bernard: In general, people should look at the dev.modern.ie web page for status, because that has the most up-to-date information. In fact, I often learn about things from the page. As I mentioned, the Screen Sharing and Media Recorder specifications are now under consideration, along with features that are in preview or are under development. The website breaks down each feature. If the feature is in Preview, then you can get access to it via the Windows Insider Preview. If it is under development, this means that it is not yet in Preview. Features that are supported have already been released, so if you have Windows 10, you should already have access to them.

Slide from Bernard’s IIT-RTC 2015 presentation covering What’s in Edge

In terms of our roadmap, we made a roadmap announcement in October 2014 and are still executing on things such as H.264, which we have not delivered yet.  Supporting interoperable H.264 is about more than just providing an encoder/decoder, which we have already delivered as part of H.264UC.  The IETF RTCWEB Video specification provides guidance on what is needed to provide interoperable H.264/AVC, but that is not all that a developer needs to implement – there are aspects that are not yet specified, such as bandwidth estimation and congestion control.

Beyond the codec bitstream, RTP transport and congestion control there are other aspects as well.  For example, I mentioned robustness features such as Forward Error Correction and Retransmission.   A Flexible FEC draft is under development in IETF which will handle burst loss (distances greater than one).  That is important for robust operation on wireless networks, for both audio and video.  Today we have internal FEC within Opus, but that does not handle burst loss well.

webrtcHacks: Do you see Edge pushing the boundaries in this area? 

Bernard: One of the areas where Edge ORTC has advanced the state of the art is in external forward error correction (FEC) as well as in statistics. Enabling external FEC to handle burst loss provides additional robustness for both audio and video. We also support additional statistics which provide information on burst loss and FEC operation. What we have found is that burst loss is a fact of life on wireless networks, so being able to measure this and to address it is important. The end result of this work is that Edge should be more robust than existing implementations with respect to burst loss (at least with larger RTTs where retransmission would not be available). We can also provide burst loss metrics, which other implementations cannot currently do. I should also mention that there are metrics that have been developed in the XRBLOCK WG to address issues of burst loss, concealment, error correction, etc.
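
As an editorial aside, “burst loss” can be made concrete with a few lines of JavaScript. This is only a minimal sketch and not how Edge computes its statistics: the `lossBursts` name and the input format are made up for illustration, and it assumes sorted sequence numbers without wraparound or reordering.

```javascript
// Hypothetical helper: given the RTP sequence numbers a receiver actually saw
// (sorted ascending, no wraparound), return the length of every loss burst.
function lossBursts(receivedSeqs) {
  const bursts = [];
  for (let i = 1; i < receivedSeqs.length; i++) {
    // Number of packets missing between two consecutive received packets.
    const gap = receivedSeqs[i] - receivedSeqs[i - 1] - 1;
    if (gap > 0) {
      bursts.push(gap);
    }
  }
  return bursts;
}

// Packet 4 is lost on its own, packets 7, 8 and 9 are lost in a row.
console.log(lossBursts([1, 2, 3, 5, 6, 10, 11])); // [ 1, 3 ]
```

A burst of length one is an isolated loss. Bursts longer than one are the case Bernard describes as common on wireless networks.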


webrtcHacks:  You have been a long time advocate for ORTC. Maybe you can summarize why ORTC was a good fit for Edge? Why did you start with that spec versus something else? What does it enable you to do now as a result?

Bernard: Some of the advantages of ORTC were indeed advantages, but in implementation we found there were also other advantages we didn’t think of at the time.


Bernard: ORTC doesn’t have SDP [like WebRTC 1.0]; the irony is ORTC allowed us to get to WebRTC 1.0 compatibility and interoperability faster than we would have otherwise. If you look at adapter.js, it’s actually interesting to read that code – the actual code for Edge is actually smaller than for some of the other browsers. One might think that’s weird – why would it take less adaptation for Edge than for anything else? Are we really more 1.0 compatible than 1.0? The answer is, in some respects, we are, because we don’t generate SDP that somebody needs to parse and reformat. It certainly saves a lot of development to not have to write that code and have control in JavaScript, and also be easy to modify in case people find bugs in it.

Connection State Details

The other thing we found about ORTC that we didn’t quite understand early on was it gives you detailed status of each of the transports- each of your ICE transports. Particularly when you’re dealing with situations like multiple interfaces, you actually get information about failure conditions that you don’t get out of WebRTC 1.0. 

It’s interesting to look at 1.0 – one of the reasons that I think people will find the objects interesting in 1.0 is because you actually need that kind of diagnostic information. The current connection state [in the current WebRTC] is not really enough – it’s not even clear what it means. It says in the spec that it’s about ICE, but it really combines ICE and DTLS. With the object model, you know exactly what ICE transport went down or if DTLS is in some weird state. Actually for diagnostics, details of the connection state is actually pretty important. It’s one of the most frequently requested statistical things. That was a benefit we didn’t anticipate, that we found is pretty valuable and will be coming into 1.0.

Many simple scenarios

Bernard: Then there were the simple scenarios. Everyone said, “I don’t need ORTC because I don’t do scalable video coding and simulcast.” Do you ever do hold? Do you ever do changing owners of codecs? All illustrations that Peter [Thatcher] showed in his WebRTC 1.0 presentation. The answer is, a lot of those things are, in fact, common, and were not possible in 1.0. There are a lot of fairly basic benefits that you get as well.

How is Edge’s Media Engine built

webrtcHacks:  In building and putting this into Edge, you had a few different media engines you could choose from. You had the Skype media engine and a Lync media engine – you could combine them or go and build a new one. Can you reveal the Edge media architecture and how you put that together?

Bernard: What we chose to do in Skype is move to a unified media engine. What we’ve done is, we’ve added WebRTC capabilities into that media engine. That’s a good thing because, for example, things like RTCP MUX and things like BUNDLE are now part of the Skype media engine so we can use them. The idea was to produce something that was unified and would have all the capabilities in one. It took a little bit longer to do it that way, but the benefit is that we get to produce a standards-compliant browser and we also get to use those technologies internally. Now we do not have 3 or 4 different stacks that we would have to rationalize later.

Also, I should mention that one thing that is interesting about the way we work is we produce stacks that are both client and server capable. We don’t just produce pure client code that wouldn’t, for example, be able to handle load. Some of those things can go into back-end components as well. That is also true for DTLS and all that. Whether or not we use all those things in Skype is another issue, but it is part of the repertoire for apps. 

More than Edge

webrtcHacks: Is there anything else that’s not on dev.modern.ie that is exposed that a developer would care about? Any NuGet packages with these APIs for example?

Bernard: There are a couple of things. dev.modern.ie does not cover non-browser things in the Windows platform. For example, currently we support DTLS 1.0. We do want to support 1.2, because there are additional cipher suites that are important. For example, the Elliptic Curve stuff we’re seeing going into all the browsers. I think Mozilla already has it, or Chrome has it, or if they don’t, they will very soon. That is actually very important. Elliptic Curve turned out to be more than just a cipher suite issue – the time and effort it takes to generate more secure certificates is large. For RSA-2048 you can actually block the UI thread if you thread the object. Anyway, those are very important things that we don’t cover on dev.modern.ie, but those are the things we obviously have to do.

There’s a lot of work and a lot of thinking that’s been going on in the IETF relating to ICE and how to make it better for mobile scenarios. Some of that I don’t think has converged yet, but there’s a new ICE working group. Some of that is in the ortc-lib implementation already. Robin [Raymond] likes to be on the cutting edge so he has done basically the first implementation of a lot of those new technologies. That’s something, I think, that is of general interest – particularly as ORTC moves to mobile.

I should mention, by the way, that the Edge Insider Preview was only for desktop. It does not run on Windows Phone just to clarify that. 

webrtcHacks:  Any plans for embedding the Edge ORTC engine as an IE plugin?

Bernard: An external plugin or something?

webrtcHacks:  Yeah, or a Microsoft plugin for IE that would implement ORTC. 

Bernard: Basically at this point, IE is frozen technology. All the new features, if you look on the website, they all go into Edge. That’s what we’ve been developing for. I never say Microsoft will never do anything, but currently that’s not the thinking. Windows 10 for consumers is a free upgrade. Hopefully, people will take advantage of that and get all the new stuff, including Edge.

Is there an @MSEdgeDev post on the relationship between this and InPrivate? pic.twitter.com/bbu0Mdz0Yd

— Eric Lawrence (@ericlaw) September 22, 2015

A setting discovered in Internet Explorer that appears to address the IP Address Leakage issue.

Validating ORTC

webrtcHacks:  Is there anything you want to share?

Bernard: I do want to clarify a little bit, I think adapter.js is a very important thing because it validates our original idea that essentially WebRTC 1.0 could be built into the JavaScript layer with ORTC. 

webrtcHacks:  And that happened pretty quick – with Fippo‘s help. Really quick. 

Bernard: Fippo has written all the pull requests. We’re paying a lot of attention to the bugs he’s finding. Obviously, he’s finding bugs in Edge, which hopefully we’ll fix, but he’s also finding spec bugs. It really helps make sure that this compatibility that we’ve promised is actually real. It’s a very interesting process to actually reduce that to code so that it’s not just a vague promise. It has to be demonstrated in software.

Of course what we’ve done is currently with audio. We know that video is more complicated, particularly as you start adding lots and lots of codecs to get that level of compatibility. I wouldn’t say that when Fippo is done with audio that it will be the last word. I think we’ll have to pay even more attention to interoperability stuff in the video cases. It will be interesting because video is a lot more complicated.

What does the Microsoft WebRTC team look like

webrtcHacks:  Can you comment on how big the team is that’s working on ORTC in Edge? You have a lot of moving pieces in different aspects …

Bernard: There’s the people in Edge. There’s the people in Skype. In the Windows system there’s the people on the S-channel team that worked on the DTLS. There’s people all over – for example, the VP9 work that we talked about, was not done by either Skype or the conventional Edge people. It’s the whole Windows Media team. I don’t really know how to get my hands around this, because if you look at all the code we’re using, it’s written by probably, I don’t know, hundreds and hundreds of people. 

webrtcHacks:  And you need to pull it together for purposes of WebRTC/ORTC, is that right?

Bernard: Yeah. We have to pull it together, but there’s a lot there. There’s a lot of teams. There will probably be more teams going forward. People say, “Why don’t you have the dataChannel?” The dataChannel isn’t something that would be in Skype’s specific area of expertise. That’s a transport protocol, it should really be written by people who are experts in transport protocols, which isn’t either Edge or Skype. It’s not some decision that was made by either of our groups not to do it. We have to find somebody who proves that they can do that work, to take ownership of that.

Feedback please

webrtcHacks:  Any final comments?

Bernard: No. I just encourage people to download the preview, run it, file bugs, and let us know what you think. You can actually vote on the website for new features, which is cool.

We do listen to the input. WebRTC is an expanding thing. There’s a ton of things you can do – there’s all that stuff on the dev.modern.ie site and then there’s internal improvement. Getting a sense of priority, of what’s most important to people, is not that easy, because there’s so much that you could possibly focus on. I’d say right now, our focus is very much on video, and trying to get that more solid, and more interoperable, at least for the moment. We can walk and chew gum at the same time. We can do more than just one thing. Conceivably, especially when you look at IE and other teams.

webrtcHacks:  This is great and very insightful. I think it will be a big help to all the developers out there. Thanks!

{"interviewer": "chad hart", "interviewee": "Bernard Aboba"}

Traffic Encryption

Wed, 09/23/2015 - 20:35

So I talked about Skype and Viber at KrankyGeek two weeks ago. Watch the video on youtube or take a look at the slides. No “reports” or packet dumps to publish this time, mostly because it is very hard to draw conclusions from the results.

The VoIP services we have looked at so far use the RTP protocol for transferring media. RTP uses a packet header which is not encrypted and contains a number of attributes such as the payload type (identifying the codec used), a synchronization source (which identifies the source of the stream), a sequence number and a timestamp. This allows routers to identify RTP packets and prioritize them. This also allows someone monitoring all network traffic (“Pervasive Monitoring“) to easily identify VoIP traffic. Or someone wiretapping your internet connection.
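
To show how little effort reading those attributes takes, here is a minimal sketch that parses the 12-byte fixed RTP header as laid out in RFC 3550. The `parseRtpHeader` name and the hand-made example packet are mine and not part of any tooling mentioned in this post. CSRC lists and header extensions are ignored.

```javascript
// Parse the fixed RTP header (RFC 3550, section 5.1) from a Uint8Array.
function parseRtpHeader(bytes) {
  if (bytes.length < 12) {
    throw new Error('RTP header needs at least 12 bytes');
  }
  // DataView reads multi-byte fields in network byte order (big-endian) by default.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: bytes[0] >> 6,            // top two bits of the first byte
    payloadType: bytes[1] & 0x7f,      // lower seven bits, identifies the codec
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8)            // synchronization source
  };
}

// Hand-made packet: version 2, payload type 111, sequence number 1,
// timestamp 960, SSRC 0x12345678.
const example = new Uint8Array([
  0x80, 0x6f, 0x00, 0x01,
  0x00, 0x00, 0x03, 0xc0,
  0x12, 0x34, 0x56, 0x78
]);
console.log(parseRtpHeader(example));
```

Anyone on the path can do the same, since none of these fields is encrypted.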

Skype and Viber encrypt all packets. Does that make them less susceptible to this kind of attack?

Bear with me, the answer is going to be very technical. tl;dr:

  • it is still pretty easy to determine that you are making a call.
  • it is also pretty easy to tell if you muted your microphone.
  • it is pretty apparent whether this is a videochat.

Not expecting to find much, I ran a standard set of scenarios with Skype for Android and iOS similar to those used in the WhatsApp analysis.
A first look did not show much. Luckily, when analyzing WhatsApp I had developed some tooling to deal with RTP. I modified those tools, removing the RTP parser, and was greeted with these graphs:

While the bitrate alone (blue is my ipad3 with a 172.16. ip address, black is my old Android phone) is not very interesting, the packet rate of exactly 50 packets per second was. Also, the packet length distribution was similar to Opus. As I figured out later from the integrated debugging (on the Android device, this must be too technical for iOS users!), this was the Silk codec. In fact, if you account for some overhead, the black distribution matches what we saw from WhatsApp earlier and what is now known to be Opus at 16 kHz or 8 kHz.
So the encryption did not change the traffic pattern. Nor does it hide the fact that a call is happening.
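
The kind of per-second statistics behind such graphs needs nothing but arrival times and packet lengths. The following is a rough sketch and not the actual tooling: the `perSecondStats` name and the capture format (`timeMs`, `length`) are assumptions, and the capture is synthetic.

```javascript
// Group packets into one-second buckets and count packets and bits in each.
// Assumed input: [{ timeMs: <arrival time in ms>, length: <bytes> }, ...]
function perSecondStats(packets) {
  const buckets = new Map();
  for (const p of packets) {
    const second = Math.floor(p.timeMs / 1000);
    const bucket = buckets.get(second) || { packets: 0, bits: 0 };
    bucket.packets += 1;
    bucket.bits += p.length * 8;
    buckets.set(second, bucket);
  }
  return buckets;
}

// Synthetic capture: one 120-byte packet every 20 ms for two seconds.
const capture = [];
for (let i = 0; i < 100; i++) {
  capture.push({ timeMs: i * 20, length: 120 });
}
console.log(perSecondStats(capture).get(0)); // { packets: 50, bits: 48000 }
```

A steady 50 packets per second means one packet every 20 ms, which stays visible no matter how well the payload is encrypted.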

Keepalive Traffic

When muting the audio on one device, one can even see regular spikes in the traffic every ten seconds. Supposedly, those are keepalive packets.

Telling audio and video traffic apart

Let’s look at some video traffic. Note the two distinct distributions in the third graph? Let’s suppose that the left one is audio and everything else is video. This works well enough looking at the last graph which shows the ‘audio’ traffic in green and orange respectively.

The accuracy could possibly be improved a little by looking at the number of packets which is pretty much constant for audio.
In RTP, we would use the synchronization source (SSRC) field from the header to accomplish this. But that just makes things easier for routers.
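
The “left distribution is audio” guess boils down to a length threshold. Here is a minimal sketch of that idea. The `splitByLength` name, the packet format and the 200-byte threshold are made up for illustration; a real threshold would have to be read off the length distribution of the capture at hand.

```javascript
// Assumption: packets at or below `threshold` bytes are audio, larger ones video.
function splitByLength(packets, threshold) {
  const audio = [];
  const video = [];
  for (const p of packets) {
    (p.length <= threshold ? audio : video).push(p);
  }
  return { audio, video };
}

// Synthetic lengths: small, regular packets mixed with large ones.
const packets = [
  { length: 110 }, { length: 1150 }, { length: 118 },
  { length: 980 }, { length: 105 }
];
const result = splitByLength(packets, 200);
console.log(result.audio.length, result.video.length); // 3 2
```

Checking that the “audio” bucket keeps a constant packet rate, as suggested above, would be a cheap sanity check on the chosen threshold.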

Relay traffic

Last but not least relays. When testing this from Europe, I was surprised to see my traffic being routed through Redmond, Washington.

This is quite interesting in comparison to the first graph. The packet rate stays roughly the same, but the bitrate doubles to 100 kilobits/second. That is quite some overhead compared to the standard TURN protocol which has negligible overhead. The packet length distribution is shifted to the right and there are a couple of very large packets. Latency was probably higher but this is very hard to measure.
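
A back-of-the-envelope calculation shows how large that overhead is per packet. The sketch below assumes a baseline of 50 kilobits/second (inferred from “doubles to 100”) and 50 packets per second. Both are rounded readings from the graphs, so treat the result as an order of magnitude. The function name is mine.

```javascript
// Extra bytes per packet implied by a bitrate increase at a constant packet rate.
function overheadBytesPerPacket(directKbps, relayedKbps, packetsPerSecond) {
  const extraBitsPerSecond = (relayedKbps - directKbps) * 1000;
  return extraBitsPerSecond / packetsPerSecond / 8;
}

console.log(overheadBytesPerPacket(50, 100, 50)); // 125
```

For comparison, the ChannelData framing of standard TURN adds 4 bytes per packet.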


While I got some pretty interesting results from Skype, Viber turned out to be harder. Thanks to the tooling, it now took only a matter of seconds to discover that, like WhatsApp, it uses a relay server to help with call establishment:

Blue traffic is captured locally before it is sent to the peer, the black and green traffic is received from the remote end. The traffic shown in black almost vanishes after a couple of initial spikes (which contain very large packets at a low frequency). Visualizations of this kind are a lot easier to understand than the packet dumps captured with Wireshark.

And for the sake of completeness, muting audio on both sides showed keepalive traffic, visible as tiny periodic spikes in this graph:


VoIP security is hard. And this is not really news: attacks on encrypted VoIP traffic have been known for quite a while, see e.g. this paper from 2008 and the more recent ‘Phonotactic Reconstruction’ attacks.

The fact that RTP does not encrypt the header data makes it slightly easier to identify, but it seems that a determined attacker could have come to the same conclusions about the encrypted traffic of services like Skype. Keep that in mind when talking about the security of your service. Also, keep the story of the ECB penguin in mind.

Or, as Emil Ivov said about the security of peer-to-peer: “Unless there is a cable going between your computer and the other guys computer and you can see the entire cable, then you’re probably in for a rude awakening”.

{"author": "Philipp Hancke"}

First steps with ORTC

Fri, 09/18/2015 - 21:20

ORTC support in Edge has been announced today. A while back, we saw this on twitter:

Windows Insider Preview build 10525 is now available for PCs: http://t.co/zeXQJocgLs This release lays groundwork for ORTC in Microsoft Edge

— Microsoft Edge Dev (@MSEdgeDev) August 18, 2015

“This release [build 10525] lays the groundwork for ORTC” was quite an understatement. The API was considered experimental, and the implementation still differs slightly from the specification (which is itself a work in progress), but it already worked, and as a developer you can use it to get familiar with how ORTC works and how it differs from the RTCPeerConnection API.
If you want to test this, please use builds newer than 10547. Join the Windows Insider Program to get them and make sure you’re on the fast ring.

The approach taken differs from the RTCPeerConnection way of giving you a blob that you exchange as this WebRTC PC1 sample shows quite well. It’s more about giving you the building blocks.

In ORTC, you have to incrementally build up things. Let’s walk through the code (available on github):

Setting up a Peer to Peer connection

    var gatherer1 = new RTCIceGatherer(iceOptions);
    var transport1 = new RTCIceTransport(gatherer1);
    var dtls1 = new RTCDtlsTransport(transport1);

There are three elements on the transport side:
* the RTCIceGatherer which gathers ICE candidates to be sent to the peer,
* the RTCIceTransport where you add the candidates from the peer,
* the RTCDtlsTransport which is sitting on top of the ICE transport and deals with encryption.

As in the peerConnection API, you exchange the candidates:

    // Exchange ICE candidates.
    gatherer1.onlocalcandidate = function (evt) {
      console.log('1 -> 2', evt.candidate);
      transport2.addRemoteCandidate(evt.candidate);
    };
    gatherer2.onlocalcandidate = function (evt) {
      console.log('2 -> 1', evt.candidate);
      transport1.addRemoteCandidate(evt.candidate);
    };

Also, you need to exchange the ICE parameters (usernameFragment and password) and start the ICE transport:

    transport1.start(gatherer1, gatherer2.getLocalParameters(), 'controlling');
    transport1.onicestatechange = function() {
      console.log('ICE transport 1 state change', transport1.state);
    };

This is done with SDP in the PeerConnection API. One side needs to be controlling, the other is controlled.
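
The snippet above only shows one side. As a sketch of the symmetry, the helper below starts both ICE transports of a local loopback pair: each side passes its own gatherer, the other side’s ICE parameters and its role. The `startIce` wrapper is mine. It only uses the `start` and `getLocalParameters` calls shown above, and in Edge you would pass it the objects created earlier.

```javascript
// Start both ends of a loopback ICE pair with complementary roles.
function startIce(transport1, gatherer1, transport2, gatherer2) {
  // Side 1 uses side 2's usernameFragment and password and takes the controlling role.
  transport1.start(gatherer1, gatherer2.getLocalParameters(), 'controlling');
  // Side 2 mirrors this with side 1's parameters and is controlled.
  transport2.start(gatherer2, gatherer1.getLocalParameters(), 'controlled');
}
```

Usage would be `startIce(transport1, gatherer1, transport2, gatherer2);` after the candidate exchange handlers are in place.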

You also need to start the DTLS transport with the remote fingerprint and hash algorithm:

    dtls1.start(dtls2.getLocalParameters());
    dtls1.ondtlsstatechange = function() {
      console.log('DTLS transport 1 state change', dtls1.state);
    };

Once this is done, you can see the candidates being exchanged and the ICE and DTLS state changes on both sides.

Cool. Now what?

Sending a MediaStream track over the connection

Let’s send a MediaStream track. First, we acquire it using the promise-based navigator.mediaDevices.getUserMedia API and attach it to the local video element.

    // call getUserMedia to get a MediaStream.
    navigator.mediaDevices.getUserMedia({video: true})
    .then(function(stream) {
      document.getElementById('localVideo').srcObject = stream;
      // The snippets below continue inside this callback.

Next, we determine the send and receive parameters. This is where the PeerConnection API does the “offer/answer” magic.
Since our sending capabilities match the receiving capabilities, there is little we need to do here.
Some black magic is still involved, check the specification for the gory details.

    // Determine RtpCodecParameters. Consider this black magic.
    var params = RTCRtpReceiver.getCapabilities('video');
    params.muxId = 1001;
    params.encodings = [{
      ssrc: 1001,
      codecPayloadType: 0,
      fec: 0,
      rtx: 0,
      priority: 1.0,
      maxBitrate: 2000000.0,
      minQuality: 0,
      framerateBias: 0.5,
      resolutionScale: 1.0,
      framerateScale: 1.0,
      active: true,
      dependencyEncodingId: undefined,
      encodingId: undefined
    }];
    // We need to transform the codec capability into a codec.
    params.codecs.forEach(function (codec) {
      codec.payloadType = codec.preferredPayloadType;
    });
    params.rtcp = {
      cname: "",
      reducedSize: false,
      ssrc: 0,
      mux: true
    };
    console.log(params);

Then, we start the RtpReceiver with those parameters:

    // Start the RtpReceiver to receive the track.
    receiver = new RTCRtpReceiver(dtls2, 'video');
    receiver.receive(params);
    var remoteStream = new MediaStream();
    remoteStream.addTrack(receiver.track);
    document.getElementById('remoteVideo').srcObject = remoteStream;

Note that the Edge implementation is slightly different from the current ORTC specification here since you need to specify the media type as second argument when creating the RtpReceiver.
We create a stream to contain the track and attach it to the remote video element.
Last but not least, let’s send the video track we got:

    sender = new RTCRtpSender(stream.getVideoTracks()[0], dtls1);
    sender.send(params);

That’s it. It gets slightly more complicated when you have to deal with multiple tracks, and have to actually negotiate capabilities in order to interop between Chrome and Edge. But that’s a longer story…
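
To give a taste of what negotiating capabilities means, here is a simplified sketch that keeps only the codecs both sides know, matched by name and clock rate. The `commonCodecs` helper and the capability objects are made up for illustration. Real negotiation also has to look at payload types, codec parameters, header extensions and more.

```javascript
// Return the local codecs that also appear in the remote capabilities.
// Codec names are compared case-insensitively, clock rates exactly.
function commonCodecs(localCaps, remoteCaps) {
  return localCaps.codecs.filter(function (local) {
    return remoteCaps.codecs.some(function (remote) {
      return local.name.toLowerCase() === remote.name.toLowerCase() &&
          local.clockRate === remote.clockRate;
    });
  });
}

// Hand-written stand-ins for what getCapabilities('audio') might return.
var localCaps = { codecs: [
  { name: 'opus', clockRate: 48000 },
  { name: 'PCMU', clockRate: 8000 },
  { name: 'SILK', clockRate: 16000 }
] };
var remoteCaps = { codecs: [
  { name: 'opus', clockRate: 48000 },
  { name: 'pcmu', clockRate: 8000 },
  { name: 'ISAC', clockRate: 16000 }
] };
console.log(commonCodecs(localCaps, remoteCaps).map(function (c) {
  return c.name;
})); // [ 'opus', 'PCMU' ]
```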

{"author": "Philipp Hancke"}

Reacting to React Native for native WebRTC apps (Alexey Aylarov)

Tue, 09/15/2015 - 23:47

It turns out people like their smartphone apps, so native mobile is pretty important. For WebRTC that usually leads to venturing outside of JavaScript into the world of C++/Swift for iOS and Java for Android. You can try hybrid applications (see our post on this), but many modern web applications use JavaScript frameworks like AngularJS, Backbone.js, Ember.js, or others and those don’t always mesh well with these hybrid app environments.

Can you have it all? Facebook is trying with React which includes the ReactJS framework and  React Native for iOS and now Android too. There has been a lot of positive fanfare with this new framework, but will it help WebRTC developers? To find out I asked VoxImplant’s Alexey Aylarov to give us a walkthrough of using React Native for a native iOS app with WebRTC.

{"editor": "chad hart"}

If you haven’t heard about ReactJS or React Native, I recommend checking them out. They already have a big influence on web development and have started to influence mobile app development, with React Native released for iOS and an Android version just out. It sounds familiar, doesn’t it? We’ve heard the same about WebRTC, since it changes the way web and mobile developers implement real-time communication in their apps. So what is React Native after all?

“React Native enables you to build world-class application experiences on native platforms using a consistent developer experience based on JavaScript and React. The focus of React Native is on developer efficiency across all the platforms you care about — learn once, write anywhere. Facebook uses React Native in multiple production apps and will continue investing in React Native.”

I can simplify it to “one of the best ways for web/javascript developers to build native mobile apps, using familiar tools like Javascript, NodeJS, etc.”. If you are connected to WebRTC world (like me) the first idea that comes to your mind when you play with React Native is “adding WebRTC there should be a big thing, how can I make it?” and then from React Native documentation you’ll find out that there is a way to create your own Native Modules:

Sometimes an app needs access to platform API, and React Native doesn’t have a corresponding module yet. Maybe you want to reuse some existing Objective-C, Swift or C++ code without having to reimplement it in JavaScript, or write some high performance, multi-threaded code such as for image processing, a database, or any number of advanced extensions.

That’s exactly what we needed! Our WebRTC module in this case is a low-level library that provides high-level Javascript API for React Native developers. Another good thing about React Native is that it’s an open source framework and you can find a lot of required info on GitHub. It’s very useful, since React Native is still very young and it’s not easy to find the details about native module development. You can always reach out to folks using Twitter (yes, it works! Look for #reactnative or https://twitter.com/Vjeux) or join their IRC channel to ask your questions, but checking examples from GitHub is a good option.

React Native’s module architecture

Native modules can have C/C++, Objective-C, and JavaScript code. This means you can put the native WebRTC libraries, signaling and some other libs written in C/C++ in the low-level part of your module, implement video element rendering in Objective-C and offer a JavaScript/JSX API for React Native developers.

Technically, low-level and high-level code is divided in the following way:

  1. you create an Objective-C class that implements React’s RCTBridgeModule protocol and
  2. use RCT_EXPORT_METHOD to let JavaScript code work with it.

In Objective-C you can interact with the OS and C/C++ libs, and even create iOS widgets. The ready-to-use native module(s) can be distributed in a number of different ways, the easiest one being via an npm package.

WebRTC module API

We’ve been implementing a React Native module for our own platform and already knew which of our API functions we would provide to JavaScript. Creating a WebRTC module that is independent of signaling and can be used by any WebRTC developer is a much more complicated problem.

We can divide the process into a few parts:

Integration with WebRTC

Since WebRTC does not dictate how developers discover user names and network connection information, this signaling can be done in multiple ways. Google’s WebRTC implementation is known as libwebrtc. libwebrtc has a built-in library called libjingle that provides “signaling” functionality.

There are 3 ways libwebrtc can be used to establish communication:

  1. libjingle with built-in signaling

This is the simplest one, leveraging libjingle. In this case signaling is implemented in libjingle via the XMPP protocol.

  2. Your own signaling

This is a more complicated one, with signaling on the application side. In this case you need to implement the exchange of SDP and ICE candidates and pass the data to WebRTC. One popular method is to use a SIP library for signaling.

  3. Application-controlled RTC

For the hardcore, you can avoid using signaling altogether. This means the application should take care of all RTP session params: RTP/RTCP ports, audio/video codecs, codec params, etc. An example of this type of integration can be found in the WebRTC sources, in the WebRTCDemo app for Objective-C (src/talk/app/webrtc).

Adding Signaling

We used the 2nd approach in our implementation. Here are some code examples for making/receiving calls (C++):

  1. First of all, create the Peer Connection factory:
    peerConnectionFactory = webrtc::CreatePeerConnectionFactory(…);
  2. Then create the local stream (here we decide whether it will be a voice or video call):
    localStream = peerConnectionFactory->CreateLocalMediaStream(uniqueLabel);
    localStream->AddTrack(audioTrack);
    if (withVideo)
        localStream->AddTrack(videoTrack);
  3. Create the PeerConnection (set the STUN/TURN server list, if you are going to use one):
    webrtc::PeerConnectionInterface::IceServers servers;
    webrtc::CreateSessionDescriptionObserver* peerConnectionObserver;
    peerConnection = peerConnectionFactory->CreatePeerConnection(servers, …., peerConnectionObserver);
  4. Add the local stream to the Peer Connection:
  5. Create the SDP:
    webrtc::CreateSessionDescriptionObserver* sdpObserver;

    1. For an outbound call:
      1. Create the SDP:
      2. Wait for the SDP from the remote peer (via signaling) and pass it to the Peer Connection:
    2. For an inbound call we need to set the remote SDP before setting the local SDP:
      peerConnection->SetRemoteDescription(remoteSDP);
      peerConnection->CreateAnswer(sdpObserver);
  6. Wait for events and send the SDP and ICE candidate info to the remote party (via signaling):
    webrtc::CreateSessionDescriptionObserver::OnSuccess(webrtc::SessionDescriptionInterface* desc) {
        if (this->outgoing)
            sendOffer();
        else
            sendAnswer();
    }
    webrtc::CreateSessionDescriptionObserver::OnIceCandidate(const webrtc::IceCandidateInterface* candidate) {
        sendIceCandidateInfo(candidate);
    }
  7. Wait for ICE candidate info from the remote peer and, when it arrives, pass it to the Peer Connection:
  8. After a successful ICE exchange (if everything is ok) the connection/call is established.

Integration with React Native

First of all we need to create a react-native module (https://facebook.github.io/react-native/docs/native-modules-ios.html), where we describe the API and implement audio/video calling using WebRTC (Obj-C, iOS):

@interface YourVoipModule () { }
@end

@implementation YourVoipModule

RCT_EXPORT_MODULE();

// Exposed to JavaScript as createCall(to, video, callback)
RCT_EXPORT_METHOD(createCall:(NSString *)to
                  withVideo:(BOOL)video
                  ResponseCallback:(RCTResponseSenderBlock)callback)
{
    // createVoipCall:withVideo: is the module's own method that starts the call
    NSString *callId = [self createVoipCall:to withVideo:video];
    callback(@[callId]);
}

@end
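
On the JavaScript side, React Native exposes a method exported with RCT_EXPORT_METHOD under the part of its selector before the first colon, so the method above should be callable as createCall(to, video, callback). Here is a minimal sketch of how an app might wrap it. The placeCall helper and the fake module are hypothetical names introduced only so the snippet runs in plain Node; in a real app the module would come from NativeModules.YourVoipModule.

```javascript
// Hypothetical helper: validates input and forwards to the native createCall method.
// In a real app `voipModule` would be require('react-native').NativeModules.YourVoipModule.
function placeCall(voipModule, to, withVideo, onCallId) {
  if (typeof to !== 'string' || to.length === 0) {
    throw new TypeError('callee must be a non-empty string');
  }
  // The native side invokes the callback with the call id as its first argument.
  voipModule.createCall(to, Boolean(withVideo), function (callId) {
    onCallId(callId);
  });
}

// Stand-in for the native module (an assumption, not part of React Native).
var fakeVoipModule = {
  createCall: function (to, withVideo, callback) {
    callback('call-to-' + to + (withVideo ? '-video' : '-audio'));
  }
};

var lastCallId = null;
placeCall(fakeVoipModule, 'alice', true, function (callId) {
  lastCallId = callId;
});
```

Keeping the native module behind a small wrapper like this also makes it easy to swap in a fake during tests.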

If we want to support video calling we will need an additional component to show the local camera (Preview) or remote video stream (RemoteView):

@interface YourRendererView : RCTView @end

Initialization and deinitialization can be implemented in the following methods:

- (void)removeFromSuperview {
    [videoTrack removeRenderer:self];
    [super removeFromSuperview];
}

- (void)didMoveToSuperview {
    [super didMoveToSuperview];
    [videoTrack addRenderer:self];
}

You can find the code examples on our GitHub page – just swap the references to our signaling with your own. We found examples very useful while developing the module, so hopefully they will help you to understand the whole idea much faster.


The end result can look as follows:

Closing Thoughts

When the WebRTC community started working on the standard, one of the main ideas was to make real-time communications simpler for web developers by providing them with a convenient JavaScript API. React Native has a similar goal: it lets web developers build native apps using JavaScript. In our opinion, bringing WebRTC to the set of available React Native APIs makes a lot of sense, because web app developers will be able to build their RTC apps for mobile platforms. The people behind React Native have just released it for Android at the Scale conference, so we will update the article or write a new one about building a module compatible with Android as soon as we know all the details.

{"author": "Alexey Aylarov"}


Gaming with the WebRTC DataChannel – A Walkthrough with Arin Sime

Thu, 08/27/2015 - 14:51

The fact that you can use WebRTC to implement a secure, reliable, and standards based peer-to-peer network is a huge deal that is often overlooked.  We have been notably light on the DataChannel here at webrtcHacks, so I asked Arin Sime if he would be interested in providing one of his great walkthroughs on this topic.  He put together a very practical example of a multi-player game.  You may recognize Arin from RealTime Weekly or from his company Agility Feat or his new webRTC.ventures brand. Check out this excellent step-by-step guide below and start lightening the load on your servers and reducing message latency with the DataChannel.

{"editor": "chad hart"}

WebRTC is “all about video chat”, right? I’ve been guilty of saying things like that myself when explaining WebRTC to colleagues and clients, but it’s a drastic oversimplification that should never go beyond your first explanation of WebRTC to someone.

Of course, there’s more to WebRTC than just video chat. WebRTC allows for peer-to-peer video, audio, and data channels.  The Data channels are a distinct part of that architecture and often forgotten in the excitement of seeing your video pop up in the browser.

Don’t forget about the Data Channel!

Being able to exchange data directly between two browsers, without any sort of intermediary web socket server, is very useful. The Data Channel carries the same advantages as WebRTC video and audio:  it’s fully peer-to-peer and encrypted.  This means Data Channels are useful for things like text chat applications, file transfers, P2P file exchanges, gaming, and more.

In this post, I’m going to show you the basics of how to set up and use a WebRTC Data Channel.

First, let’s review the architecture of a WebRTC application.

You have to set up signaling code in order to establish the peer-to-peer connection between two peers.  Once the signaling is complete (which takes place over a 3rd party server), you have a Peer to Peer (P2P) connection between two users which can contain video and audio streams, and a data channel.

The signaling for a Data Channel is very similar to the signaling for video and audio, except that if you are building a Data Channel only application then you don’t need to call getUserMedia or exchange streams with the other peer.

Data Channel Security

There are a couple of other differences about using the DataChannel.  The most obvious one is that users don’t need to give you their permission in order to establish a Data Channel over an RTCPeerConnection object.  That’s different than video and audio, which will prompt the browser to ask the user for permissions to turn on their camera and microphone.

Although it’s generating some debate right now, data channels don’t require explicit permission from users.  That makes it similar to a web socket connection, which can be used in a website without the knowledge of users.

The Data Channel can be used for many different things.  The most common examples are for implementing text chat to go with your video chat.  If you’re already setting up an RTCPeerConnection for video chat, then you might as well use the same connection to supply a Data Channel for text chat instead of setting up a different socket connection for text chat.

Likewise, you can use the Data Channel for transferring files directly between your peers in the RTCPeerConnection.  This is nicer than a normal socket style connection because just like WebRTC video, the Data Channel is completely peer-to-peer and encrypted in transit.  So your file transfer is more secure than in other architectures.

The game of “Memory”

Don’t limit your Data Channel imagination by these common examples though.  In this post, I’m going to show you how to use the Data Channel to build a very simple two-player game.  You can use the Data Channel to transfer any type of data you like between two browsers, so in this case we’ll use it to send commands and data between two players of a game you might remember called “Memory”.

In the game of memory, you can flip over a card, and then flip a second card, and if they match, you win that round and the cards stay face up.  If they didn’t match, you put both face down again, and it’s the next person’s turn.  By trying to remember what you and your opponents have been flipping, and where those cards were, you can win the game by correctly flipping the most pairs.

Photo Credit: http://www.vwmin.org/memory-game.html

Adam Khoury already built a JavaScript implementation of this game for a single player, and you can read his tutorial on how to build the game Memory for a single player.  I won’t explain the logic of his code for building the game.  What I’m going to do instead is build on top of his code with a very simple WebRTC Data Channel implementation to keep the card flipping in sync across two browsers.

You can see my complete code on GitHub, and below I’m going to show you the relevant segments.

In this example view of my modified Memory game, the user has correctly flipped pairs of F, D, and B, so those cards will stay face up.  The cards K and L were just flipped and did not match, so they will go back face down.

Setting up the Data Channel configuration

I started with a simple NodeJS application to serve up my code, and I added in Express to create a simple visual layer.  My project structure looks like this:

The important files for you to look at are datachannel.js (where the majority of the WebRTC logic is), memorygame.js (where Adam’s game javascript is, and which I have modified slightly to accommodate the Data Channel communications), and index.ejs, which contains a very lightweight presentation layer.

In datachannel.js, I have included some logic to setup the Data Channel.  Let’s take a look at that:

//Signaling Code Setup
var configuration = {
    'iceServers': [{
        'url': 'stun:stun.l.google.com:19302'
    }]
};
var rtcPeerConn;
var dataChannelOptions = {
    ordered: false, //no guaranteed delivery, unreliable but faster
    maxRetransmitTime: 1000, //milliseconds
};
var dataChannel;

The configuration variable is what we pass into the RTCPeerConnection object, and we’re using a public STUN server from Google, which you often see used in WebRTC demos online.  Google is kind enough to let people use this for demos, but remember that it is not suitable for production use.  If you are building a real app, you should look into setting up your own servers or using a commercial service like Xirsys to provide production-ready STUN and TURN servers for you.
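
When you move from the demo STUN server to your own infrastructure, the configuration object grows a little, mostly because TURN servers need credentials. The sketch below is only an illustration: the host names stun.example.com and turn.example.com, the credentials, and the buildIceConfiguration helper are all placeholders I made up. It also uses the `urls` key from the current specification rather than the older `url` key used in this post.

```javascript
// Builds an RTCPeerConnection configuration from a list of server descriptions.
// Host names and credentials passed in below are placeholders, not real services.
function buildIceConfiguration(servers) {
  return {
    iceServers: servers.map(function (server) {
      var entry = { urls: server.urls };
      // TURN servers need credentials; STUN servers do not.
      if (server.username !== undefined) {
        entry.username = server.username;
        entry.credential = server.credential;
      }
      return entry;
    })
  };
}

var productionConfiguration = buildIceConfiguration([
  { urls: 'stun:stun.example.com:3478' },
  { urls: 'turn:turn.example.com:3478', username: 'demo-user', credential: 'demo-secret' }
]);
```

The resulting object can be passed to the RTCPeerConnection constructor in place of the configuration variable.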

The next set of options we define are the data channel options.  You can choose for “ordered” to be either true or false.

When you specify “ordered: true” and set no retransmit limit, you are asking for a Reliable Data Channel.  That means that the messages are guaranteed to all arrive in the correct order, without any loss.  This is a good idea for applications where there is significant burden if messages are occasionally lost due to a poor connection.  However, it can slow down your application a little bit.

We’ve set ordered to false and added a retransmit limit, which means we are okay with an Unreliable Data Channel.  Our commands are not guaranteed to all arrive, but they probably will unless we are experiencing poor connectivity.  Unless you take the Memory game very seriously and have money on the line, it’s probably not a big deal if you have to click twice.  Unreliable data channels are a little faster.

Finally, we set a maxRetransmitTime, the number of milliseconds after which the Data Channel will give up on retransmitting a message.  (The current specification calls this option maxPacketLifeTime.)  Alternatively, we could have specified a number for maxRetransmits, but we can’t specify both constraints together.

Those are the most common options for a data channel, but you can also specify a protocol string to name the application-level subprotocol you are running over the channel, and you can set negotiated to true if you want to keep WebRTC from setting up a data channel on the other side.  If you choose to do that, then you will also want to supply your own id for the data channel.  Typically you won’t need to set any of these options; leave them at their defaults by not including them in the dataChannelOptions variable.
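
Since the two retransmit limits cannot be combined, it can help to check the options before handing them to createDataChannel. The following is a sketch under my own assumptions: checkDataChannelOptions is a hypothetical helper, not part of WebRTC, and it accepts both the legacy maxRetransmitTime name used in this post and the current maxPacketLifeTime name.

```javascript
// Hypothetical pre-flight check for data channel options.
function checkDataChannelOptions(options) {
  var hasTimeLimit = options.maxRetransmitTime !== undefined ||
                     options.maxPacketLifeTime !== undefined;
  var hasCountLimit = options.maxRetransmits !== undefined;
  if (hasTimeLimit && hasCountLimit) {
    throw new Error('Specify a retransmit time limit or a retransmit count, not both');
  }
  return {
    ordered: options.ordered !== false,        // ordered delivery is the default
    reliable: !hasTimeLimit && !hasCountLimit  // no limit means retransmit until delivered
  };
}

var summary = checkDataChannelOptions({ ordered: false, maxRetransmitTime: 1000 });
```

For the options used in this post the summary reports an unordered, unreliable channel, which matches the behavior described above.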

Set up your own Signaling layer

The next section of code may be different based on your favorite options, but I have chosen to use express.io in my project, which is a socket.io package for node that integrates nicely with the express templating engine.

So the next bit of code is how I’m using socket.io to signal to any others on the web page that I am here and ready to play a game.  Again, none of this is specified by WebRTC.  You can choose to kick off the WebRTC signaling process in a different way.

io = io.connect();
io.emit('ready', {"signal_room": SIGNAL_ROOM});

//Send a first signaling message to anyone listening
//In other apps this would be on a button click, we are just doing it on page load
io.emit('signal',{"type":"user_here", "message":"Would you like to play a game?", "room":SIGNAL_ROOM});

In the next segment of datachannel.js, I’ve set up the event handler for when a different visitor to the site sends out a socket.io message that they are ready to play.

io.on('signaling_message', function(data) {
    //Setup the RTC Peer Connection object
    if (!rtcPeerConn)
        startSignaling();

    if (data.type != "user_here") {
        var message = JSON.parse(data.message);
        if (message.sdp) {
            rtcPeerConn.setRemoteDescription(new RTCSessionDescription(message.sdp), function () {
                // if we received an offer, we need to answer
                if (rtcPeerConn.remoteDescription.type == 'offer') {
                    rtcPeerConn.createAnswer(sendLocalDesc, logError);
                }
            }, logError);
        }
        else {
            rtcPeerConn.addIceCandidate(new RTCIceCandidate(message.candidate));
        }
    }
});

There are several things going on here.  The first one to be executed is that if the rtcPeerConn object has not been initialized yet, then we call a local function to start the signaling process.  So when Visitor 2 announces themselves as here, they will cause Visitor 1 to receive that message and start the signaling process.

If the type of socket.io message is not “user_here”, which is something I arbitrarily defined in my socket.io layer and not part of WebRTC signaling, then the code goes into a couple of WebRTC specific signaling scenarios – handling an SDP “offer” that was sent and crafting the “answer” to send back, as well as handling ICE candidates that were sent.
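
The branching described above can be pulled out into a small function that only decides what kind of signaling message arrived, which makes the logic easy to test without a browser. This is a sketch, not the code from my project: classifySignal is a hypothetical name, and the message shapes simply mirror the ones emitted elsewhere in this post.

```javascript
// Decides what to do with one message from the signaling channel.
// Returns a small description instead of touching an RTCPeerConnection.
function classifySignal(data) {
  if (data.type === 'user_here') {
    return { kind: 'presence' };
  }
  var message = JSON.parse(data.message);
  if (message.sdp) {
    // An offer must be answered; an answer completes the exchange.
    return { kind: 'sdp', needsAnswer: message.sdp.type === 'offer', payload: message.sdp };
  }
  if (message.candidate) {
    return { kind: 'candidate', payload: message.candidate };
  }
  return { kind: 'unknown' };
}

var presenceSignal = classifySignal({ type: 'user_here', message: 'Would you like to play a game?' });
var offerSignal = classifySignal({
  type: 'SDP',
  message: JSON.stringify({ sdp: { type: 'offer', sdp: 'v=0' } })
});
var candidateSignal = classifySignal({
  type: 'ice candidate',
  message: JSON.stringify({ candidate: { candidate: 'candidate:0 1 UDP 1 192.0.2.1 9 typ host' } })
});
```

Note that the "user_here" message is never passed to JSON.parse, since its message field is plain text rather than JSON.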

The WebRTC part of Signaling

For a more detailed discussion of WebRTC signaling, I refer you to Sam Dutton’s HTML5 Rocks tutorial (http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/), which is what my signaling code here is based on.

For completeness’ sake, I’m including below the remainder of the signaling code, including the startSignaling method referred to previously.

function startSignaling() {
    rtcPeerConn = new webkitRTCPeerConnection(configuration, null);
    dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions);

    dataChannel.onopen = dataChannelStateChanged;
    rtcPeerConn.ondatachannel = receiveDataChannel;

    // send any ice candidates to the other peer
    rtcPeerConn.onicecandidate = function (evt) {
        if (evt.candidate)
            io.emit('signal',{"type":"ice candidate", "message": JSON.stringify({ 'candidate': evt.candidate }), "room":SIGNAL_ROOM});
    };

    // let the 'negotiationneeded' event trigger offer generation
    rtcPeerConn.onnegotiationneeded = function () {
        rtcPeerConn.createOffer(sendLocalDesc, logError);
    }
}

function sendLocalDesc(desc) {
    rtcPeerConn.setLocalDescription(desc, function () {
        io.emit('signal',{"type":"SDP", "message": JSON.stringify({ 'sdp': rtcPeerConn.localDescription }), "room":SIGNAL_ROOM});
    }, logError);
}

This code handles setting up the event handlers on the RTCPeerConnection object for dealing with ICE candidates to establish the Peer to Peer connection.

Adding DataChannel options to RTCPeerConnection

This blog post is focused on the DataChannel more than the signaling process, so the following lines in the above code are the most important thing for us to discuss here:

rtcPeerConn = new webkitRTCPeerConnection(configuration, null);
dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions);

dataChannel.onopen = dataChannelStateChanged;
rtcPeerConn.ondatachannel = receiveDataChannel;

In this code what you are seeing is that after an RTCPeerConnection object is created, we take a couple extra steps that are not needed in the more common WebRTC video chat use case.

First we ask the rtcPeerConn to also create a DataChannel, which I arbitrarily named ‘textMessages’, and I passed in those dataChannelOptions we defined previously.

Setting up Message Event Handlers

Then we just define where to send two important Data Channel events:  onopen and ondatachannel.  These do basically what the names imply, so let’s look at those two events.

function dataChannelStateChanged() {
    if (dataChannel.readyState === 'open') {
        dataChannel.onmessage = receiveDataChannelMessage;
    }
}

function receiveDataChannel(event) {
    dataChannel = event.channel;
    dataChannel.onmessage = receiveDataChannelMessage;
}


When the data channel is opened, we’ve told the dataChannel to call dataChannelStateChanged, which in turn tells the dataChannel to call another method we’ve defined, receiveDataChannelMessage, whenever a data channel message is received.

The receiveDataChannel method gets called when we receive a data channel from our peer, so that both parties have a reference to the same data channel.  Here again, we are also setting the onmessage event of the data channel to call our receiveDataChannelMessage method.
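
The readyState check matters for sending as well as receiving, because calling send on a channel that is not yet open throws an error. Below is a minimal sketch of a guard around send. The sendIfOpen helper and the fake channel are hypothetical and exist only so the example runs without a browser; a real RTCDataChannel exposes the same readyState and send members.

```javascript
// Sends only when the channel is open; returns whether the message went out.
function sendIfOpen(channel, text) {
  if (!channel || channel.readyState !== 'open') {
    return false;
  }
  channel.send(text);
  return true;
}

// Minimal stand-in with the two members used above, so this runs without a browser.
function makeFakeChannel(readyState) {
  return {
    readyState: readyState,
    sent: [],
    send: function (text) { this.sent.push(text); }
  };
}

var openChannel = makeFakeChannel('open');
var connectingChannel = makeFakeChannel('connecting');
var sentWhileOpen = sendIfOpen(openChannel, 'memoryFlipTile tile_3');
var sentWhileConnecting = sendIfOpen(connectingChannel, 'memoryFlipTile tile_3');
```

A game could use the return value to tell the player that the peer is not connected yet instead of failing silently.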

Receiving a Data Channel Message

So let’s look at that method for receiving a Data Channel message:

function receiveDataChannelMessage(event) {
    if (event.data.split(" ")[0] == "memoryFlipTile") {
        var tileToFlip = event.data.split(" ")[1];
        displayMessage("Flipping tile " + tileToFlip);
        var tile = document.querySelector("#" + tileToFlip);
        var index = tileToFlip.split("_")[1];
        var tile_value = memory_array[index];
        flipTheTile(tile,tile_value);
    } else if (event.data.split(" ")[0] == "newBoard") {
        displayMessage("Setting up new board");
        memory_array = event.data.split(" ")[1].split(",");
        newBoard();
    }
}

Depending on your application, this method might just print out a chat message to the screen.  You can send any characters you want over the data channel, so how you parse and process them on the receiving end is up to you.

In our case, we’re sending a couple of specific commands about flipping tiles over the data channel.  So my implementation is parsing out the string on spaces, and assuming the first item in the string is the command itself.

If the command is “memoryFlipTile”, then this is the command to flip the same tile on our screen that our peer just flipped on their screen.

If the command is “newBoard”, then that is the command from our peer to set up a new board on our screen with all the cards face down.  The peer is also sending us a stringified array of values to go on each card so that our boards match.  We split that back into an array and save it to a local variable.
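
The parsing rules described above can also be written as a standalone function that turns a raw message into a small object, leaving the DOM work to the caller. This is only a sketch: parseCommand is a hypothetical name, and the tile id format ("tile_7") is an assumption based on how the handler above splits the id on an underscore.

```javascript
// Splits "command argument" messages as described above.
function parseCommand(raw) {
  var spaceIndex = raw.indexOf(' ');
  var command = spaceIndex === -1 ? raw : raw.slice(0, spaceIndex);
  var argument = spaceIndex === -1 ? '' : raw.slice(spaceIndex + 1);
  if (command === 'memoryFlipTile') {
    // The number after the underscore indexes into the board array.
    return { command: command, tileId: argument, index: Number(argument.split('_')[1]) };
  }
  if (command === 'newBoard') {
    return { command: command, board: argument.split(',') };
  }
  return { command: 'unknown', raw: raw };
}

var flip = parseCommand('memoryFlipTile tile_7');
var board = parseCommand('newBoard A,A,B,B');
var other = parseCommand('hello');
```

Returning an explicit "unknown" command gives the receiver a place to log or ignore messages it does not understand.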

Controlling the Memory game to flip tiles

The actual flipTheTile and newBoard methods that are called reside in the memorygame.js file, which is essentially Adam’s code with our modifications.

I’m not going to step through all of Adam’s code to explain how he built the single player Memory game in javascript, but I do want to highlight two places where I refactored it to accommodate two players.

In memorygame.js, the following function tells the DataChannel to let our peer know which card to flip, as well as flips the card on our own screen:

function memoryFlipTile(tile,val){
    dataChannel.send("memoryFlipTile " + tile.id);
    flipTheTile(tile,val);
}

Notice how simple it is to send a message to our peers using the data channel – just call the send method and pass any string you want.  A more sophisticated example might send well formatted XML or JSON in a message, in any format you specify.  In my case, I just send a command followed by the id of the tile to flip, with a space between.
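
As an illustration of the JSON option, here is one way the same commands could be wrapped in a small envelope. This is not what my game does; encodeCommand, decodeCommand, and the envelope shape with command and payload fields are assumptions made for this sketch.

```javascript
// Wraps a command and its data in a JSON envelope suitable for dataChannel.send.
function encodeCommand(command, payload) {
  return JSON.stringify({ command: command, payload: payload });
}

// Returns the parsed envelope, or null when the text is not a valid command.
function decodeCommand(text) {
  try {
    var parsed = JSON.parse(text);
    if (parsed && typeof parsed.command === 'string') {
      return parsed;
    }
  } catch (err) {
    // Not valid JSON; fall through and report it as unusable.
  }
  return null;
}

var wire = encodeCommand('memoryFlipTile', { tileId: 'tile_7' });
var decoded = decodeCommand(wire);
var rejected = decodeCommand('not json');
```

The trade-off is a slightly larger message in exchange for not having to worry about spaces or commas inside the data.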

Setting up a new game board

In Adam’s single player memory game, a new board is set up whenever you load the page.  In my two player adaptation, I decided to have a new board triggered by a button click instead:

var setupBoard = document.querySelector("#setupBoard");
setupBoard.addEventListener('click', function(ev){
    memory_array.memory_tile_shuffle();
    newBoard();
    dataChannel.send("newBoard " + memory_array.toString());
    ev.preventDefault();
}, false);


In this case, the only important thing to notice is that I’ve defined a “newBoard” string to send over the data channel, and in this case I want to send a stringified version of the array containing the values to put behind each card.
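
To make the round trip concrete, the sketch below shuffles a board, turns it into a "newBoard" message, and rebuilds the array the way the receiving side does. Adam’s memory_tile_shuffle is not reproduced here; the shuffled function is a generic Fisher-Yates stand-in, and boardToMessage and messageToBoard are hypothetical helper names.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffled(values, random) {
  var result = values.slice();
  for (var i = result.length - 1; i > 0; i--) {
    var j = Math.floor(random() * (i + 1));
    var tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}

// Same wire format as above: the command, a space, then comma-separated values.
function boardToMessage(board) {
  return 'newBoard ' + board.toString();
}

function messageToBoard(message) {
  return message.split(' ')[1].split(',');
}

var localBoard = shuffled(['A', 'A', 'B', 'B', 'C', 'C'], Math.random);
var remoteBoard = messageToBoard(boardToMessage(localBoard));
```

This format only works while card values contain no spaces or commas, which holds for single letters but is worth remembering if the cards ever carry richer data.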

Next steps to make the game better

That’s really all there is to it!  There’s a lot more we could do to make this a better game.  I haven’t built in any logic to limit the game to two players, keep score by players, or enforce the turns between the players.  But it’s enough to show you the basic idea behind using the WebRTC data channel to send commands in a multiplayer game.

The nice thing about building a game like this that uses the WebRTC data channel is it’s very scalable.  All my website had to do is help the two players get a connection set up, and after that, all the data they need to exchange with each other is sent over an encrypted peer-to-peer channel and it won’t burden my web server at all.

A completed multiplayer game using the Data Channel

Here’s a video showing the game in action:

Demo of a simple two player game using the WebRTC Data Channel video

As I hope this example shows you, the hard part of WebRTC data channels is really just in the signaling and configuration, and that’s not too hard.  Once you have the data channel set up, sending messages back and forth is very simple.  You can send messages that are as simple or complex as you like.

How are you using the Data Channel?  What challenges have you run into?  Feel free to contact me on Twitter or through my site to share your experiences too!

{"author": "arin sime"}





Making WebRTC source building not suck (Alex Gouaillard)

Tue, 08/25/2015 - 15:17

One of WebRTC’s benefits is that the source to it is all open source. Building WebRTC from source provides you the ultimate flexibility to do what you want with the code, but it is also crazy difficult for all but the small few VoIP stack developers who have been dedicated to doing this for years. What benefit does the open source code provide if you can’t figure out how to build from it?

As WebRTC matures into mobile, native desktop apps, and now into embedded devices as part of the Internet of Things, working with the lower-level source code is becoming increasingly common.

Frequent webrtcHacks guest poster Dr. Alex Gouaillard has been trying to make this easier. Below he provides a review of building WebRTC from source, exposing many of the gears WebRTC developers take for granted when they leverage a browser or someone else’s SDK. Alex also reviews the issues and complexities associated with this process and introduces the open source make process he developed to help ease it.

{"editor": "chad hart"}

Building WebRTC from source sometimes feels like engineering the impossible. Photo courtesy of Andrew Lipson.

Building WebRTC from source

Most of the audience for WebRTC (and webrtcHacks) is made up of web developers, JavaScript and cloud ninjas who might be less familiar with handling external libraries from source. That process is painful. Let’s make it clear, it’s painful for everybody – not only web devs.

What are the cases where you need to build from source?

  1. Writing a native app (mobile, desktop, IoT, ...)
  2. Some kind of server (gateway, media, ...)
  3. Plugin (for IE, Safari, Cordova, ...)

You basically need to build from source anytime you can’t leverage a browser, a WebRTC-enabled node.js (for the sake of discussion), SDKs someone else put together for you, or anything else.

These main cases are illustrated below in the context of a comprehensive offering.

Figure 1: map of a WebRTC solution

Usually, the project owners provide precompiled and tested libraries that you can use yourself (stable) and the most recent version that is compiled but not tested for those who are brave.

Pre-compiled libraries are usable out of the box, but do not allow you to modify anything. Sometimes there are build scripts that help you recompile the libs yourselves. This provides more flexibility in terms of what gets in the lib, and what optimizations/options you set, at the cost of now having to maintain a development environment.

Comparing industry  approaches

For example, Cisco with its openH264 library provides both precompiled libraries and build scripts. In their case, using the precompiled library defers H264 royalty issues to them, but that’s another subject. While the libwebrtc project includes build scripts, they are complex to use, do not provide a lot of flexibility for modifying the source, and make it difficult to test any modifications.

The great Cordova plugin from eFace2Face uses a precompiled libWebRTC (here) (see our post on this too). Pristine.io were among the first to propose a build script to make it easier (see here; more about that later).

Sarandogou/doubango’s webrtc-everywhere plugin for IE and Safari does NOT use automated build scripts, versioning or a standard headers layout, which causes them a lot of problems and slows their progress.

The pristine.io guys put together a drawing of the process, and noted that, conceptually, there is not a big difference between the Android and iOS builds as to the steps you need to follow. Practically, there is a difference in the tools you use though.

My build process

Here is my build process:


Please also note that I mention testing explicitly and there is a good reason for that, learned the hard way. I will come to it in the next section.

You will see I have a “send to dashboard” step. I mean something slightly different than what people usually refer to as a dashboard. Usually, people want to report the results of the tests to a dashboard to show that a given revision is bug free (as much as possible) and that the corresponding binary can be used in production.

If you have performance tests, a dashboard can also help you spot performance regressions.  In my case here, I also want to use a common public dashboard as a way to publish failing builds on different systems or with different configurations, and still provide full log access to anyone. It makes solving those problems easier. The one asking the question can point to the dashboard, and interested parties have an easier time looking at the issue or reproducing it. More problems reported, more problems solved, everyone is happy.

Now that we have reviewed the build from source process a bit, let’s talk about what’s wrong with it.

Building from Source Sucks

Writing an entire WebRTC stack is insanely hard. That’s why Google went out and bought GIPS, even though they have a lot of very, very good engineers at their disposal. Most devs and vendors use an existing stack.

For historical reasons most people use Google’s contributed WebRTC stack based on the GIPS media engine, and Google’s libjingle for the network part.

Even Mozilla is using the same media engine, even though they originally went for a Cisco SIP soft phone code as the base (see here, under “list of components”, “SIPCC”) to implement the network part of WebRTC. Since then, Mozilla went on and rewrote almost all that part to support more advanced functionality such as multi-party. However, the point is, their network and signaling is different from Google’s while their media engine is almost identical. Furthermore, Mozilla does not attempt to provide a standalone version of their WebRTC implementation, which makes it hard for developers to make use of it right away.

Before Ericsson’s OpenWebRTC announcement in October 2014, the Google standalone version was the only viable option out there for most. OpenWebRTC has advantages in some areas, like hardware support for H.264 on iOS, but lacks some features and Windows support, which can be a showstopper for some. It is admittedly less mature. It also uses GStreamer, which has its own conventions and its own build system (cerbero), which is also tough to learn.

The webrtc.org stack is not available in a precompiled library with an installer. This forces developers to compile WebRTC themselves, which is “not a picnic”.

One first needs to become accustomed to the Chrome dev tools, which are quite unique, adding a learning step to the process. The code changes quite often (4 commits a day), and the designs are poorly documented at best.

Even if you manage to compile the libs, either by yourself or using resources on the web, it is almost certain that you cannot test them before using them in your app, as most of the bug report, review, build, test and dashboard infrastructure is under the control of Google by default.

Don’t get me wrong, the bug report and review servers allow anybody to set up an account. What is done with your tickets or suggestions however is up to Google. You can end up with quite frustrating answers. If you dig deep enough in the Chrome infrastructure for developers, you will also find how to replicate their entire infrastructure, but the skill level needed to go down this path and the amount of effort to get it right are prohibitive for most teams. You want to develop your product, not become a Chrome expert.

Finally, the contributing process at Google allows for bugs to get in. You can actually look at the logs and see a few “Revert” commits there.

Figure 2: Example of a Revert commit message.

From the reverted commits (see footnote[1]: 107 since January 2015), one can tell that arbitrary revisions of WebRTC on HEAD are broken. Here again, this comment might be perceived as discriminatory against Google. It is not. There is nothing wrong there; it always happens in any project, and having only 107 reverts in 6 months while maintaining 4 commits a day is quite an achievement. However, it means that you, as a developer, cannot work with any given commit and expect the library to be stable. You have at least to test it yourself.
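The footnote's one-liner can also be expressed as a small script. The sketch below is only an illustration: it assumes the output of `git log --pretty=oneline` is already available as a string, and the names `countReverts` and `revertRatio` are made up for this example. It only counts subjects that start with "Revert", which is slightly stricter than a plain `grep Revert`.

```javascript
// Count revert commits in the text produced by `git log --pretty=oneline`.
// Each line has the form "<sha> <subject>".
function countReverts(onelineLog) {
  return onelineLog.split('\n').filter(function (line) {
    // Drop the leading commit hash and keep only the subject.
    var subject = line.replace(/^[0-9a-f]+\s+/, '');
    return /^Revert\b/.test(subject);
  }).length;
}

// Share of commits in the log that are reverts (0 for an empty log).
function revertRatio(onelineLog) {
  var commits = onelineLog.split('\n').filter(function (line) {
    return line.trim() !== '';
  });
  return commits.length === 0 ? 0 : countReverts(onelineLog) / commits.length;
}
```

Fed with the log of a release branch, a ratio like this gives a rough idea of how often HEAD is likely to need a follow-up fix, which is one more reason to test a revision before building on it.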

My small side project to help

My goals are:

  1. Provide information to the community that is not documented elsewhere, or not consolidated. The blog posts on www.webrtcbydralex.com fulfill this goal.
  2. Learn more about WebRTC
  3. Prepare a course for the local university.
  4. Do something useful with my current “long vacations”

    Yes, vacations in Boracay, Philippines, once voted #2 most beautiful beach in the world by tripadvisor are nice. But I very quickly get that I-need-to-code urge, and they have Wi-Fi on the beach ….

  5. Have fun!

More importantly I would like to lower the barrier of adoption / collaboration / contribution by providing:

  • WebRTC installers that sync with chrome revisions that developers could use blindly out of the box (knowing they’ve been tested)
  • Code for anyone to set up their own build/try/package pipeline, either locally or in the cloud
  • Easy patching and testing framework to enhance WebRTC. As an example, provide an h264 compliant WebRTC lib based on work from Kaiduan Xue, Randell Jesup, and others.
  • More examples and applications for Devs to start from. A first example will be a stand-alone, h264 compliant, appRTCDemo desktop app.
  • Public dashboard for a community to come together, contribute build bots and de-duplicate the testing efforts going on at almost every vendor for the base stack.
  • Public dashboard for people to submit their failed builds as a way to ask questions on the mailing list and get faster answers.

Example of my dashboard

What we did exactly

We leveraged the CMake / CTest / CDash / CPack suite of tools instead of the usual shell scripts, to automate most of the fetch, configure, build, test, report and package processes.

CMake is cross platform from the ground up, and makes it very easy to deploy such processes. No need to maintain separate or different build scripts for each platform, or build-toolchain.

CTest helps you manage your test suites, and is also a client for CDash, which handles the dashboard part of the process.

Finally CPack handles packaging your libs with headers and anything else you might want, and supports a lot of different packagers with a unified syntax.

This entire suite of tools has also been designed in such a way that “a gifted master student could use it and contribute back in a matter of days”, while being so flexible and powerful that big companies like Netflix or Canonical (Ubuntu) use it as the core of their engineering process.

Most of the posts at webrtcbydralex.com will take you through the process of setting up this solution, step by step, in conjunction with a github repository holding all the corresponding source code.

The tool page provides installers for WebRTC for those in a hurry.

{"author": "Alex Gouaillard"}

[1] git log --since=1.week --pretty=oneline | grep Revert | wc -l

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

How to stop a leak – the WebRTC notifier

Tue, 08/04/2015 - 11:15

The “IP Address Leakage” topic has turned into a public relations issue for WebRTC. It is a fact that the WebRTC APIs can be used to share one’s private IP address(es) without any user consent today. Nefarious websites could potentially use this information to fingerprint individuals who do not want to be tracked. Why is this an issue? Can this be stopped? Can I tell when someone is trying to use WebRTC without my knowledge? We try to cover those questions below along with a walkthrough of a Chrome extension that you can install or modify for yourself that provides a notification if WebRTC is being used without your knowledge.

Creative solutions for leaks

The “IP Leakage” problem

Why does WebRTC need a local IP address?

As Reid explained long ago in his An Intro to WebRTC’s NAT/Firewall Problem, peer-to-peer communications cannot occur without providing the peer your IP address. The ICE protocol gathers and checks all the addresses that can be used to communicate to a peer. IP addresses come in a few flavors:

  • host IP address – this is usually the local LAN IP address and is the one that is being exposed that is causing all the fuss
  • server-reflexive – this is the address that a server outside your network, such as the web server hosting the page, will see
  • relay – this will show-up if you have a TURN server

Why not just use the server reflexive and relay addresses? If you have 2 peers that want to talk to each other on the same LAN, then the most effective way to do this is to use the host IP address to keep all the traffic local. Otherwise you might end up sending the traffic out to the WAN and then back into the LAN, adding a lot of latency and degrading quality. The host IP address is the best address to use for this situation.

Relay addresses require that you set up a TURN server to relay your media. Use of relay means you are no longer truly peer-to-peer. Relay use is typically temporary, to speed up connection time, or a last resort when a direct peer-to-peer connection cannot be made. Relay is generally avoided since just passing along a lot of media with no added value is expensive in terms of bandwidth costs and added latency.
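The three flavors described above show up as the `typ` field of each ICE candidate line. As a rough sketch, the helper below pulls the type and address out of a candidate string that follows the RFC 5245 grammar. The function name `parseCandidate` and the sample addresses are made up for illustration.

```javascript
// Extract transport, address, port and type from an ICE candidate line.
// Field order per RFC 5245: foundation, component, transport, priority,
// address, port, then "typ <type>" followed by optional extensions.
function parseCandidate(line) {
  var parts = line.replace(/^a=/, '').replace(/^candidate:/, '')
    .trim().split(/\s+/);
  var typIndex = parts.indexOf('typ');
  if (parts.length < 8 || typIndex === -1) return null;
  return {
    transport: parts[2].toLowerCase(),
    address: parts[4],
    port: parseInt(parts[5], 10),
    type: parts[typIndex + 1] // host, srflx, prflx or relay
  };
}

// Example candidates (addresses are placeholders).
var hostCandidate = parseCandidate(
  'candidate:842163049 1 udp 1677729535 192.168.1.2 46154 typ host generation 0');
var reflexiveCandidate = parseCandidate(
  'candidate:1 1 udp 1677729535 203.0.113.5 46154 typ srflx raddr 192.168.1.2 rport 46154');
```

The `host` candidate is the one that carries the local LAN address. Note that a `srflx` candidate can also carry the local address in its `raddr` field.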

This is why the WebRTC designers do not consider the exposure of the host IP address a bug – they built WebRTC this way on purpose. The challenge is that this mechanism can be used to help with fingerprinting, providing a datapoint on your local addresses that you and your network administrator might not be happy about. The concern over this issue is illustrated by the enormous response to the Dear NY Times, if you’re going to hack people, at least do it cleanly! post last month.

Why not just ask when someone wants your local IP address?

When you want to share a video or audio stream, a WebRTC application uses the getUserMedia API. The getUserMedia API requires user consent to access the camera & microphone. However, there is no requirement to do this when using a dataChannel. So why not require consent here?

Let’s look at the use-cases. For a typical WebRTC videochat, user consent is required for the camera permission. The question “do you want to allow this site to access your camera and microphone” is easy to understand for users. One might require consent here or impose the requirement that a mediastream originating from a camera is attached to the peerconnection.

What about a webinar? Participants might want to join just to listen. No permission is asked currently. Is that bad? Well… is there a permission prompt when you connect to a streaming server to watch a video? No. What is the question that should be asked here?

There are usecases like filetransfer which involve datachannel-only connections without the requirement of local media. Since you can upload the file to any http server without the browser asking for any permission, what is the question to ask here?

Last but not least, there are usecases like peer-to-peer CDNs where visitors of a website form a CDN to reduce the server load for high-bandwidth resources like videos. While many people claim this is a new use-case enabled by WebRTC, Adobe showed this capability in Flash at MAX 2008 and 2009.

As a side note, the RTMFP protocol in Flash has leaked the same information since then. It was just a lot less obvious to acquire.

There is an additional caveat here. Adobe required user consent before using the user’s upstream to share data — even if peer-to-peer connections did not require consent. Apparently, this consent dialog completely killed the use-case for Flash, at a time when it was still the best way to deliver video. What is the question that the user must answer here? And does the user understand the question?

Photo courtesy flickr user Nisha A under Creative Commons 2.0

What are the browser vendors and the W3C doing about it?

Last week Google created an extension with source code to limit WebRTC to only using public addresses. There have been some technical concerns about breaking applications and degrading performance.
Mozilla is considering similar capabilities for Firefox as discussed here. This should hit the nightly build soon.
The W3C also discussed the issue at their recent meeting in Berlin and will likely address this as part of the unsanctioned tracking group.
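To make the distinction between public and private addresses concrete, the sketch below tests whether an IPv4 address falls in one of the RFC 1918 private ranges. It is only an illustration: the function name `isPrivateIPv4` is made up here, IPv6 is ignored, and nothing in this post says that Google's extension or Firefox classify addresses this way.

```javascript
// Return true if `address` is a dotted-quad IPv4 address inside one of
// the RFC 1918 private ranges: 10/8, 172.16/12 or 192.168/16.
function isPrivateIPv4(address) {
  var match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(address);
  if (!match) return false; // not an IPv4 address at all
  var octets = match.slice(1).map(Number);
  if (octets.some(function (octet) { return octet > 255; })) return false;
  return octets[0] === 10 ||
    (octets[0] === 172 && octets[1] >= 16 && octets[1] <= 31) ||
    (octets[0] === 192 && octets[1] === 168);
}
```

An address for which this returns true is typically the kind of host address that the fingerprinting concern is about.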


How do I know if a site is trying to run WebRTC?

We usually have chrome://webrtc-internals open all the time and occasionally we do see sites using WebRTC in unexpected ways. I wondered if there was an easier way to see if a site was covertly using WebRTC, so I asked Fippo how hard it would be to make an extension to show peerConnection attempts. In usual fashion he had some working sample code back to me in a couple of hours. Let’s take a look…

How the extension works

The extension source code is available on github.
It consists of a content script, snoop.js, which is run at document start (as specified in the manifest.json file) and a background script, background.js.
The background script sits idle, waiting for messages sent via the Message Passing API.
When it receives a message with the right format, it prints that message to the background page’s console and shows the page action.

chrome.runtime.onConnect.addListener(function (channel) {
    channel.onMessage.addListener(function (message, port) {
        if (message[0] !== 'WebRTCSnoop') return;
        console.log(new Date(), message[1], message[2]);
        chrome.pageAction.show(port.sender.tab.id);
    });
});

Pretty simple, eh? You can inspect the background page console from the chrome://extensions page.
Let’s look at the content script as well. It consists of three blocks.
The first block does the important work. It overloads the createOffer, createAnswer, setLocalDescription and setRemoteDescription methods of the webkitRTCPeerConnection using a technique also used by adapter.js. Whenever one of these methods is called, it does a window.postMessage, which then triggers a call to the background page.

var inject = '('+function() {
    // taken from adapter.js, written by me
    ['createOffer', 'createAnswer',
     'setLocalDescription', 'setRemoteDescription'].forEach(function(method) {
        var nativeMethod = webkitRTCPeerConnection.prototype[method];
        webkitRTCPeerConnection.prototype[method] = function() {
            // TODO: serialize arguments
            var self = this;
            this.addEventListener('icecandidate', function() {
                //console.log('ice candidate', arguments);
            }, false);
            window.postMessage(['WebRTCSnoop', window.location.href, method], '*');
            return nativeMethod.apply(this, arguments);
        };
    });
}+')();';

The code snippet also shows how to listen for the ice candidates in a way which
The second part, inspired by the WebRTCBlock extension, injects the Javascript into the page by creating a script element, inserting the code and removing it immediately.

var script = document.createElement('script');
script.textContent = inject;
(document.head||document.documentElement).appendChild(script);
script.parentNode.removeChild(script);

Last but not least, a message channel is set up that listens to the events generated in the first part and sends them to the background page:

var channel = chrome.runtime.connect();
window.addEventListener('message', function (event) {
    if (typeof(event.data) === 'string') return;
    if (event.data[0] !== 'WebRTCSnoop') return;
    channel.postMessage(event.data);
});

There is a caveat here. The code is not executed for iframes that use the sandbox attribute as described here so it does not detect all usages of WebRTC. That is outside our control. Hey Google… can you fix this?
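The wrapping technique used in the content script above can be tried outside the browser. The following is a minimal sketch of the same pattern applied to a plain object. `snoopMethods` and `FakePeerConnection` are hypothetical names standing in for the extension's inline code and for webkitRTCPeerConnection, and the `notify` callback takes the place of window.postMessage.

```javascript
// Wrap the named methods on a prototype so that every call is reported
// to `notify` before the original implementation runs.
function snoopMethods(proto, methods, notify) {
  methods.forEach(function (method) {
    var nativeMethod = proto[method];
    if (typeof nativeMethod !== 'function') return; // skip missing methods
    proto[method] = function () {
      notify(method);
      // Preserve `this`, the arguments and the return value.
      return nativeMethod.apply(this, arguments);
    };
  });
}

// Hypothetical stand-in for webkitRTCPeerConnection.
function FakePeerConnection() {}
FakePeerConnection.prototype.createOffer = function () { return 'offer'; };

var seen = [];
snoopMethods(FakePeerConnection.prototype, ['createOffer', 'createAnswer'],
  function (method) { seen.push(method); });
var offer = new FakePeerConnection().createOffer();
// `offer` is still 'offer', and `seen` now records the createOffer call.
```

Because the wrapper calls the original with `apply`, the page keeps working exactly as before, which is what lets the extension observe calls without breaking the site.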

Ok, but how do I install it?

If you are not familiar with side-loading Chrome extensions, the instructions are easy:

  1. Download the zip from github
  2. Unzip it to a folder of your choice
  3. Go to chrome://extensions
  4. Click on “Developer mode”
  5. Then click “Load unpacked extension”
  6. Find the webrtcnotify-master folder that you unzipped

View of the WebRTC Notifier extension

That’s it! If you want to see more details from the extension then it is helpful to load the extension’s console log. To do this just click on “background page” by “Inspect views”.

If you are familiar with Chrome Extensions and have improvement ideas, please contribute to the project!

What do I do if I find an offending site?

No one really knows how big of a problem this is yet, so let’s try to crowd source it. If you find a site that appears to be using WebRTC to gather your IP address in a suspicious way then post a comment about it here. If we get a bunch of these and others in the community confirm then we will create a public list.

With some more time we could potentially combine selenium with this extension to do something like a survey of the most popular 100k websites. We are not trying to start a witch hunt here, but having data to illustrate how big a problem this is would help inform the optimal path forward enormously.

{"authors": ["Chad Hart", "Philipp Hancke"]}


Wiresharking Wire

Thu, 07/16/2015 - 21:07

This is the next decode and analysis in Philipp Hancke's Blackbox Exploration series conducted by &yet in collaboration with Google. Please see our previous posts covering WhatsApp, Facebook Messenger and FaceTime for more details on these services and this series. {"editor": "chad"}

Wire is an attempt to reimagine communications for the mobile age. It is a messaging app available for Android, iOS, Mac, and now web that supports audio calls, group messaging and picture sharing. One of its often quoted features is the elegant design. As usual, this report will focus on the low level VoIP aspects, and leave the design aspects up for the users to judge.

As part of the series of deconstructions, the full analysis is available for download here, including the wireshark dumps.

Half a year after launching, the Wire Android app has been downloaded between 100k and 500k times. They also recently launched a web version, powered by WebRTC. Based on this, it seems to be stuck with what Dan York calls the directory dilemma.

What makes Wire more interesting from a technical point of view is that they’re strong proponents of the Opus codec for audio calls. Maybe there is something to learn here…

The Wire blog explains some of the problems that they are facing in creating a good audio experience on mobile and wifi networks:

The WiFi and mobile networks we all use are “best effort” — they offer no quality of service guarantees. Devices and apps are all competing for bandwidth. Therefore, real-time communications apps need to be adaptive. Network adaptation means working around parameters such as variable throughput, latency, and variable latency, known as jitter. To do this, we need to measure the variations and adjust to them in as close to real-time as possible.
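The quote above talks about measuring jitter without saying how. One common way to do it is the interarrival jitter estimator from RFC 3550 (section 6.4.1), sketched below. This is not taken from Wire, whose own measurement method is not described in the post. The function name `interarrivalJitter` and the shape of the `packets` array are assumptions made for this example.

```javascript
// Interarrival jitter estimator as defined in RFC 3550, section 6.4.1.
// `packets` is an array of { sent, received } timestamps expressed in
// the same unit (for example milliseconds), in arrival order.
function interarrivalJitter(packets) {
  var jitter = 0;
  for (var i = 1; i < packets.length; i++) {
    var prev = packets[i - 1];
    var cur = packets[i];
    // Change in transit time between two consecutive packets.
    var d = (cur.received - prev.received) - (cur.sent - prev.sent);
    // Smooth the absolute difference with a gain of 1/16.
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}
```

With a constant transit time the estimate stays at zero. Each packet that arrives early or late nudges it by one sixteenth of the deviation, so a single outlier does not dominate the result.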

Given the preference of ISAC over Opus by Facebook Messenger, the question which led to investigating Wire was whether they can show how to successfully use Opus on mobile.


The blog post mentioned above also describes the Wire stack as “a derivate of WebRTC and the IETF standardized Opus codec”. It’s not quite clear what exactly “derivate of WebRTC” means. What we found when looking at Wire, in comparison to the other apps reviewed, was a more “out of the box” WebRTC app, using the protocols as defined by the standards bodies.

Comparison with WebRTC

Feature | WebRTC/RTCWeb Specifications | Wire
SDES | MUST NOT offer SDES | does not offer SDES
ICE | RFC 5245 | RFC 5245
TURN usage | used as last resort | used as last resort
Audio codec | Opus or G.711 | Opus
Video codec | H.264 or VP8 | none (yet?)

Quality of experience

Audio quality did turn out to be top notch, as our unscientific tests on various networks showed.
Testing on simulated 2G and 3G networks showed some adaptivity to the situations there.


The STUN implementation turned out to be based on the BSD-licensed libre by creytiv.com, which is compatible with both the Chrome and Firefox implementations of WebRTC. Binary analysis showed that the webrtc.org media engine along with libopus 1.1 is used for the upper layer.


Wire is a company that prides itself on the user privacy protection that comes from having its HQ in Switzerland, yet has its signalling and TURN servers in Ireland. They get strong kudos for using DTLS-SRTP. To sum it up, Wire offers a case study in how to fully adopt WebRTC for both Web and native mobile.

