webrtchacks
The Big Churn – learning from real usage stats (Lasse Lumiaho and Varun Singh)
Losing customers because of issues with your network service is a bad thing. Sure, you can gather data and react after the fact, but isn’t it better to prevent issues in the first place? What are the most common pitfalls to look out for? What’s a good benchmark? What WebRTC-specific user experience elements should you spend […]
Is Slack’s WebRTC Really Slacking? (Yoshimasa Iwase)
Earlier this month Fippo published a post analyzing Slack’s new WebRTC implementation. He did not have direct access or a team account to do a thorough deep dive – not to mention he is supposed to be taking some time off this month. That left many with some open questions. Is there more to the TURN network? […]
Dear Slack: why is your WebRTC so weak?
Dear Slack, There has been quite some buzz this week about you and WebRTC. WebRTC… kind of. Because actually you only do stuff in Chrome and your native apps: I’ve been there. Launching stuff only for Chrome. That was in late 2012. In 2016, you need to have a very good excuse to launch something […]
getUserMedia resolutions III – constraints unleashed
Back in October 2013, the relative early days of WebRTC, I set out to get a better understanding of the getUserMedia API and camera constraints in one of my first and most popular posts. I discovered that working with getUserMedia constraints was not all that straightforward. A year later I gave an update after the […]
Surviving Mandatory HTTPS in Chrome (Xander Dumaine)
Xander Dumaine provides some strategies and code for dealing with the new secure origin only policy in Chrome 47+ that forces the use of HTTPS.
Shut up! Monitoring audio volume in getUserMedia
A few days back my old friend Chris Koehnke, better known as “Kranky” asked me how hard it would be to implement a wild idea he had to monitor what percentage of the time you spent talking instead of listening on a call when using WebRTC. When I said “one day” that made him wonder whether he could offshore it to save money. Well… good luck!
A week later Kranky showed me some code. Wait, he is writing code? It was not bad – it was using the WebAudio API so going in the right direction. It was enough to prod me to finish writing the app for him.
The audio stream volume sample application from Google calculates the root mean square (RMS) of the audio signal, which is extracted from the input stream using a script processor every 200ms. There are a lot of tuning options here, of course.
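For illustration, the RMS approach looks roughly like this (a simplified sketch, not the actual sample code; it assumes stream came from a getUserMedia call):

var audioContext = new AudioContext();
var source = audioContext.createMediaStreamSource(stream);
// A ScriptProcessorNode hands us raw sample buffers to analyze.
var processor = audioContext.createScriptProcessor(2048, 1, 1);
source.connect(processor);
processor.connect(audioContext.destination);
processor.onaudioprocess = function(event) {
  var samples = event.inputBuffer.getChannelData(0);
  var sum = 0;
  for (var i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  // Root mean square of the current buffer as an instantaneous volume value.
  var rms = Math.sqrt(sum / samples.length);
  console.log('instant volume (RMS)', rms.toFixed(4));
};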
Instead of starting from scratch, I decided to use hark, a small open source module for this task that my coworker Philip Roberts had built in mid-2013 when the WebAudio API first became available.
Instead of the RMS, hark uses the Fast Fourier Transform to obtain a frequency domain representation of the input signal. Then, hark picks the maximum amplitude as an indication of the volume of the signal. Let’s try this (full code here):
var hark = require('../hark.js')
var getUserMedia = require('getusermedia')

getUserMedia(function(err, stream) {
  if (err) throw err

  var options = {};
  var speechEvents = hark(stream, options);

  speechEvents.on('volume_change', function(volume) {
    console.log('current volume', volume);
  });
});

On top of this, hark uses a simple speech detection algorithm that considers speech to be started when the maximum amplitude stays above a threshold for a number of milliseconds. Much less complicated than typical voice activity detection algorithms but pretty effective. And easy to use as well, just subscribe to two additional events:
speechEvents.on('speaking', function() {
  console.log('speaking');
});
speechEvents.on('stopped_speaking', function() {
  console.log('stopped_speaking');
});

Tuning the threshold for accurate speech detection is pretty tricky. So I needed visualization (and just requiring hark only took five minutes so I had plenty of time). Using the awesome Highcharts graph library I quickly added plot bands to the graph I was generating:
With the visualization I could easily see that the speech detection events happened a bit later than I expected, since hark requires a certain history over the threshold for the trigger to work (say 400ms). To adjust for this in the graph, I had to subtract this trigger time from my x-axis (now() – 400ms, for example).
That graph is still visible on the more techie variant of the website, so if you think the results are not accurate… it might help you figure out what is going on. I am happy with the current behavior.
The percentage of speech is then calculated as the sum of the intervals in which speech is detected, divided by the duration of the call. For the display, a gauge chart is used with three different colors (a rough sketch of the calculation follows the list below):
- up to 65% speech time: green
- up to 79%: yellow
- more than 80%: red
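To illustrate the percentage calculation mentioned above, here is a rough sketch (hypothetical variable names, not the app’s actual code) that accumulates speaking time from hark’s events:

var callStart = Date.now();
var speakingSince = null;
var totalSpeakingMs = 0;

speechEvents.on('speaking', function() {
  speakingSince = Date.now();
});
speechEvents.on('stopped_speaking', function() {
  if (speakingSince !== null) {
    totalSpeakingMs += Date.now() - speakingSince;
    speakingSince = null;
  }
});

// Sum of the speaking intervals divided by the call duration, as a percentage.
function speechPercentage() {
  return Math.round(100 * totalSpeakingMs / (Date.now() - callStart));
}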
Adding remote audio to this would be awesome. However, while the WebAudio API is supported for local media streams in Chrome, Firefox and Edge, it is only supported for remote streams in Firefox. Hooking this up with the getStats API (in Chrome) to get the audio level would certainly be possible, but would require calling getStats at a very high frequency to get proper averages.
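For what it’s worth, such a polling sketch might look like this, assuming a browser that exposes audioLevel on its inbound-rtp stats (the later standardized getStats shape; Chrome’s legacy stats used different field names at the time):

// pc is the RTCPeerConnection of the call.
setInterval(function() {
  pc.getStats().then(function(report) {
    report.forEach(function(stat) {
      if (stat.type === 'inbound-rtp' && (stat.kind || stat.mediaType) === 'audio') {
        console.log('remote audio level', stat.audioLevel);
      }
    });
  });
}, 100); // polled frequently enough to build a usable average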
Check out the app in action at talklessnow and let us know what you think.
{“author”: “Philipp Hancke“}
OMG WebRTC is tracking me! Or is it?
There has been more noise about WebRTC making it possible to track users. We have covered some of the nefarious uses of WebRTC and how to look out for them before. After reading a blog post on this topic covering some allegedly new, unaddressed issues a week ago, I decided to ignore it after some discussion on the Mozilla IRC channel. But this has come up in the twitter-sphere again and Tsahi said ‘ouch’, so here are my thoughts.
Claims
The blog post (available here) makes a number of claims about how certain Chrome behavior makes fingerprinting easier:
- Chrome started caching certificates for 30 days recently, creating a cookie-like attack surface for privacy
- this allows cross-origin tracking of users
- the incognito mode behavior is inconsistent with respect to this
First, there is a claim that the way Chrome caches certificates changed recently:
In the past, Google Chrome used to generate a new self-signed certificate for every WebRTC PeerConnection. But now (using Chrome 46, or maybe earlier as i did not check) it generates a self-signed certificate which is valid for one month and uses it for all PeerConnections of a particular domain.
The code used to demonstrate this behaviour is rather odd, too. It uses the getStats API to query the fingerprint, which is also available more easily in the SDP.
Chrome has cached certificates in this way for about two years, so this is not real news. One of the reasons for this is that it is rather expensive to generate the private keys used for DTLS, especially on mobile devices. In the future, there will be more control over this behaviour. Neither Firefox nor Edge currently cache certificates.
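The control that was being specified at the time looks roughly like this (a sketch of the generateCertificate API, not a description of Chrome’s caching behaviour): the application generates a certificate itself and decides how long to keep it around.

// Generate a certificate explicitly and hand it to the PeerConnection,
// so the application, not the browser cache, controls its lifetime.
RTCPeerConnection.generateCertificate({ name: 'ECDSA', namedCurve: 'P-256' })
  .then(function(cert) {
    var pc = new RTCPeerConnection({ certificates: [cert] });
    // the app can persist or discard 'cert' on its own schedule
  });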
To be fair, the WebRTC team made a serious blunder here. Until Chrome 45, the certificate was not cleared when cookies were cleared, only when all data was cleared. The bugfix for this only appeared in the Chrome 47 release notes:
Issue 510850 DTLS cert should be cleared when cookies are cleared
Cross-Origin Tracking
So this part is not really news. The second claim made in the blog post is that this enables cross-origin tracking:
To test this go to http://www.kapejod.org/tracking/test.html and to http://kapejod.org/tracking/test.html. Open the network tab of Chrome’s developer console and compare the urls of the requested “tracking.png”. They should contain the same fingerprint, now!
They do. Now, let’s look at this test page:
// make up some random id
var transactionId = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
  var r = Math.random() * 16 | 0,
      v = c == 'x' ? r : r & 0x3 | 0x8;
  return v.toString(16);
});
var fragment = document.createDocumentFragment();
var div = document.createElement("DIV");
div.innerHTML = '<iframe src="http://kapejod.org/tracking/identify.html?' + transactionId + '" width="1" height="1" style="display:none;"/>';
fragment.appendChild(div);
document.body.insertBefore(fragment, document.body.childNodes[document.body.childNodes.length - 1]);

It includes the URL http://kapejod.org/tracking/identify.html. Let’s look at the code there as well. It executes the code shown above and logs the fingerprint to the console:
console.log('your fingerprint is: ' + fingerprint);

Now why is the fingerprint the same? Well, the iframe is always included from kapejod.org. Which means the Javascript is executed within the context of this origin.
So Chrome can use the persisted fingerprint. As well as any cookies and localStorage data. The attack surface here is no worse than setting a cookie.
Another thing related to this (and I am surprised this has not yet been mentioned) is the deviceIds returned by navigator.mediaDevices.enumerateDevices. Those are also persisted with the same lifetime as cookies. The W3C mediacapture specification has a paragraph about security and privacy considerations on this:
The identifiers for the devices are designed to not be useful as a fingerprint that can track the user between origins, but the number of devices adds to the fingerprint surface. The specification recommends treating the per-origin persistent identifier deviceId as other persistent storage (e.g. cookies) is treated.
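A minimal sketch of what that looks like in practice – the deviceId values below are the per-origin persistent identifiers in question:

navigator.mediaDevices.enumerateDevices().then(function(devices) {
  devices.forEach(function(device) {
    // kind is 'audioinput', 'audiooutput' or 'videoinput';
    // deviceId persists per origin, much like a cookie.
    console.log(device.kind, device.deviceId, device.label);
  });
});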
Again, WebRTC and other HTML5 techniques increase the fingerprint surface. But by design, this is not worse than cookies or equivalent techniques like localStorage.
Incognito Mode
Last but not least the blog post makes claims about the incognito mode:
But to make it generate a new one you have to close ALL incognito tabs. Otherwise you can be tracked across multiple domains.
Again, this behaviour is consistent with the incognito mode behaviour for things like localStorage. In both Chrome and Firefox. In incognito mode, open a site, set something in localStorage. Open another tab. Close first tab. Navigate to same site. Check localStorage. Boo!
tl;dr
There is no real news here. In Germany, we call this ‘olle Kamellen’ (old news).
{“author”: “Philipp Hancke“}
Are we There Yet? WebRTC standards Q&A with Dan Burnett
If you are new to WebRTC then you have missed out on years of drama in the standards bodies over various issues like SDP and codecs. These standards dictate what vendors must implement, so they ultimately dictate the industry roadmap. To get a deep perspective and appreciation of the issues, we like to ask Dan Burnett, W3C editor, to comment on where we are at with the standardization process. I caught up with Dan at this year’s IIT Real Time Communications Conference and had a more detailed Q&A with him shortly thereafter.
We asked Dan to comment on recent spec changes, ORTC, the next version of WebRTC, codecs, Apple, when the 1.0 spec might ever be finalized, and a whole lot more.
{“editor”, “chad hart“}
New Governance
webrtcHacks: Hi Dan. Can you describe some of the recent changes to the W3C WebRTC governance?
Dan: Yes. There was a long-running but productive discussion among the members of the WebRTC Working Group (WG), the ORTC Community Group (CG), and some of the members of the W3C advisory committee – which is the group that officially determines group charters.
As part of the charter renewal process, we decided that there would be one additional Chair of the WebRTC Working Group – Erik Lagerway of Hookflash, who was one of the initiators of ORTC. It was also decided that the WebRTC WG is the official group where all future standardization work in WebRTC will happen, meaning the ORTC work will gradually fold into that group.
Additionally, the group was chartered to work on another version beyond 1.0 – WebRTC Next Version or WebRTC-NV.
There are 2 requirements on that version:
- There is no requirement that new features introduced in the specification have an SDP equivalent
- WebRTC NV is not a replacement for WebRTC 1.0 – it is an extension. It is expected that all browsers that support WebRTC NV will support 1.0 functionality as well.
One other thing that has happened, which is not official but is probably good, is that Bernard Aboba from Microsoft has joined the WebRTC 1.0 editing team.
The Next Version
webrtcHacks: Yeah, Bernard mentioned that in the interview I did with him last week. Can you explain WebRTC NV? Why didn’t you just call it 2.0, or 1.1, or whatever?
Dan: I have been working on standards for a long time. I have seen groups spend ridiculous amounts of time deciding on a name for a specification. In this particular case a “1.1” sounds like a minor change from “1.0”, while “2.0” sounds like a major change. Some people want a minor change. Some people want a major change. If enough people want different minor changes it will end up being a 2.0 anyway because of the number of changes. The goal was to avoid that disagreement now so that we can move forward.
webrtcHacks: So what is WebRTC NV then, beyond what you stated earlier about no SDP?
Dan: Nothing is officially decided but I expect that there will continue to be more low-level controls as in ORTC. This is complicated by the fact that new feature proposals are continuing to come in for 1.0. Many of these features are from ORTC.
In the Sapporo meeting coming up, Google will be sharing their idea for what should go into WebRTC-NV when we finally start working on it.
Dan at the IIT-RTC Conference
webrtcHacks: How do you see ORTC influencing the WebRTC spec? Is WebRTC-NV really just ORTC?
Dan: If I had to summarize WebRTC-NV I would say that it is the combination of WebRTC 1.0 and ORTC. It is a requirement that 1.0 applications continue to work in WebRTC-NV implementations. It is not required that ORTC applications work directly in WebRTC-NV.
I believe the ORTC community intends to modify ORTC as necessary to remain consistent with WebRTC as it evolves.
webrtcHacks: Is there an end date for ORTC then? When it is mostly merged with WebRTC-NV, will it cease to exist?
Dan: I can’t speak for the ORTC group. I have not heard of an end date. You’ll have to ask one of the primary ORTC contributors.
Spec Changes
webrtcHacks: What are some of the changes made to the specs recently? Particularly those that impact the developers out there?
Dan: First I would like to give a little plug for my webrtcstandards.info site where I have been putting exactly that sort of information over the past few months. I will mention some things here, but you can get more details on that site.
webrtcHacks: ok, we’ll give you one plug (laughs)
Dan: One of the biggest and most relevant changes, related to what we were just talking about, is the introduction of RTCRtpSenders and RTCRtpReceivers. These are objects that allow for both information about and more direct control over how tracks are sent over a PeerConnection. Notice as part of this that we have moved from a stream-based API to a track-based API.
webrtcHacks: And what advantage does the track approach provide?
Dan: It turns out developers want to have more control over exactly how tracks are sent and received – for example, being able to specify which codecs are to be used and the parameters used to configure those codecs. They should be able to configure some transport properties as well on a per-track basis, such as FEC, retransmission, and bandwidth. Because of this it really didn’t make sense to talk about streams as the primary primitive being sent over a PeerConnection, since they are really just a collection of tracks.
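To make that concrete, here is a rough sketch of the per-track control as it later took shape in the API (pc is assumed to be an RTCPeerConnection; this is not code from the interview):

// Find the sender for the video track and cap its bandwidth.
var sender = pc.getSenders().find(function(s) {
  return s.track && s.track.kind === 'video';
});
var params = sender.getParameters();
if (params.encodings && params.encodings.length) {
  params.encodings[0].maxBitrate = 500000; // bits per second
}
sender.setParameters(params).then(function() {
  console.log('sender parameters updated');
});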
One of Peter Thatcher’s ORTC update slides showing the differences between the WebRTC and ORTC APIs. source: IIT-RTC 2015
webrtcHacks: So the others?
Dan: First, on the one we just mentioned – that was a foundational change on top of which we are going to be seeing many other changes later on. Now I’ll talk about the others that are not related to that.
One big change was that the APIs have been converted to use ECMAScript Promises. I think I mentioned this last year.
webrtchacks: You did.
Dan: It has happened. It is now in the specifications.
Promises are now the recommended mechanism for WebRTC specifications and for web specifications in general for dealing with asynchronous function calls. Not so much for things that generate multiple events, but definitely for any single asynchronous function call.
This is part of the move of ECMAScript toward truly asynchronous function calls, as you can see if you look at some of the thinking on future versions of ECMAScript.
The original callback-based APIs currently still exist but will eventually be deprecated. Developers should start using the Promise versions.
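As a quick illustration (pc being an RTCPeerConnection), the same call in both styles:

// Legacy callback style, to be deprecated eventually:
pc.createOffer(function(offer) {
  pc.setLocalDescription(offer);
}, function(err) {
  console.error(err);
});

// Promise style, now recommended:
pc.createOffer()
  .then(function(offer) { return pc.setLocalDescription(offer); })
  .then(function() { console.log('local description set'); })
  .catch(function(err) { console.error(err); });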
webrtcHacks: I know media capture from the DOM is another one.
Dan: There has been good progress on capturing media directly from media elements such as audio, video and canvas. Developers have had to use hacks up to this point to be able to capture a canvas for example. Maybe they would take snapshots, but that is not the same as a realtime media stream as you would get from a getUserMedia call.
The major changes going into the specification soon are to try to reproduce the resulting media stream as faithfully as possible to what a user would experience from that element. For example, if the user is playing a video and pauses it and then resumes, the resulting stream should show the paused video for the amount of time it was paused and then resume again.
This seems to be what developers are most interested in.
webrtcHacks: can you talk about some of the use cases that are being referenced around this feature?
Dan: Shared whiteboard is probably the best example, but there maybe some instances for training purposes where you want to capture how the user has interacted with existing elements – video or audio.
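A whiteboard-flavored sketch of the element-capture idea, assuming a browser that implements HTMLCanvasElement.captureStream (the element id and pc are hypothetical):

var canvas = document.getElementById('whiteboard');
// Capture the canvas as a realtime video track at 15 frames per second.
var stream = canvas.captureStream(15);
stream.getVideoTracks().forEach(function(track) {
  pc.addTrack(track, stream); // send the drawing to the remote side
});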
webrtcHacks: What about screensharing?
Dan: There is good progress happening there as well on the specification. It still has some tricky issues in terms of what apps should be able to request to be shared and what users should have control over. An example of this is Microsoft PowerPoint – if a user has 3 PowerPoint documents up, say different presentations for different clients, they are likely to only want to share one of those presentations – one window of that application. That works great until they go into presentation mode, which as far as the computer is concerned is a different window. So is this a case where the user should decide or is this a case where the application should decide what is shared?
In general the WG believes that the user should have the control, but browsers may have to make special cases for known applications such as Powerpoint so that it just works.
webrtcHacks: How about simulcast?
Dan: At the Seattle meeting there were some strong opinions on how simulcast should work and some proposals. Each time we get to the details the discussions diverge rather than converge. We all want it but we do not agree on how it should be signaled.
Timelines
webrtcHacks: Now for an easier one. When will 1.0 be done?
(laughs)
Dan: I am tempted to give a similar answer as last year.
There are 2 primary specifications. The media capture specification is right now finishing up addressing the comments from its Last Call review, which is the wide review that is required in order to go forward. There aren’t any new features being requested by group members – it’s just cleaning up and fixing.
It probably will be stable within another 6 months.
webrtcHacks: Stable meaning not changing any more?
Dan: Yes – meaning no substantive changes. Only editorial fixes.
Now the WebRTC specification has the problem that new features keep coming in.
webrtcHacks: Just to clarify – the Media Capture spec is the getUserMedia API, and when you say WebRTC, that means the RTCPeerConnection and DataChannel related APIs?
Dan: Yes.
These are features that have come from ORTC. At each meeting we have tried to finalize the list, but new proposals continue to creep in. Within 6 months we will know whether the chairs have been able to hold the line on the most recent list agreed to in Seattle.
webrtcHacks: So is this why it is taking so long?
Dan: Yes. The good news about it is that the features that are going in are the most requested ones from ORTC.
IP Leakage
webrtcHacks: The IP leakage issue has been a hot topic on webrtcHacks and elsewhere. Many have labeled it as a flaw; others say this behaviour was by design. Can you share the “standards” perspective on this topic and the considerations that were discussed?
Dan: The summary is this – there are 2 problems with IP leakage:
One kind is the leakage of public addresses that the user doesn’t want leaked. This can happen when a user is using a VPN and not all of the traffic is sent over the VPN – a so-called split tunnel VPN. This is an issue if the user doesn’t want their non-VPN public address to be revealed. This is not a WebRTC problem; this is a split tunnel VPN problem. That doesn’t mean that people don’t blame the browser vendors even though it’s not their fault (laughs)
Technically any application running on your machine could do the same thing if you’re running a split tunnel VPN. There are extensions to turn off WebRTC for people who are very concerned about this.
The other kind of leakage is leakage of your local IP address. The reason this concerns some people is that it can be used to map the topology of your local network, say within an enterprise. However, it turns out that applications can use an XmlHttpRequest to do the same thing. Despite that, the browser vendors are working on ways to turn off the reporting of these local addresses.
There will be more details coming up in an upcoming post on my site.
Dan talking to webrtcHacks guest author Alan Johnston at the IIT-RTC show
What’s Apple Doing?
webrtcHacks: Now the only major browser vendor left is Apple. Can you comment on public participation by Apple?
Dan: It is clear that people from Apple continue to follow the work, but they still don’t contribute.
webrtcHacks: Do you know if they contribute to other WGs more actively?
Dan: Yes, Apple does contribute more actively in other WGs within the W3C.
Codecs
webrtcHacks: Anything new with video codecs now that the market has had some time to react to the decision to include both VP8 & H.264 for browsers? How has the VP9 vs. H.265 and Alliance for Open Media (AOM) debate changed the discussion?
Dan: The gauntlet has been thrown for the creation of free and open source video codecs. MPEG-LA needs to take notice that the media producers and distributors are serious about coming up with lower cost alternatives. This pressure just continually increases. The AOM is a prime example of that.
webrtcHacks: Has the Alliance for Open Media come up in standards discussions? In the past I know there was discussion of just allowing software codecs that could be defined on the fly.
Dan: Codecs still need to be created. The discussions of VP8 vs. H.264 and VP9 vs. H.265 are not really technical discussions. They are all about intellectual property because of the cost of licensing the codecs. The issue is not being able to select a codec – the issue is having a codec that you want to choose.
One API change that has just gone in is the ability to choose which of the browser-supported codecs to use.
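The shape this eventually took in the API looks roughly like the following sketch (using setCodecPreferences as later specified, not something Dan described verbatim):

// Prefer VP8 among the codecs the browser supports for this video transceiver.
var transceiver = pc.getTransceivers().find(function(t) {
  return t.sender.track && t.sender.track.kind === 'video';
});
var codecs = RTCRtpSender.getCapabilities('video').codecs;
var vp8First = codecs.filter(function(c) { return c.mimeType === 'video/VP8'; })
  .concat(codecs.filter(function(c) { return c.mimeType !== 'video/VP8'; }));
transceiver.setCodecPreferences(vp8First);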
Microsoft
webrtcHacks: Anything else to add?
Dan: I think we’re finally on a good track with respect to a path forward for ORTC and WebRTC, and thus the eventual inclusion of Microsoft as a true and complete WebRTC vendor. We just need the feature inflow from ORTC to stop right now to be able to declare victory and move on.
I think this is evidence that the industry really does want this to happen.
I spoke with a number of people who talk to HTML developer groups and they all agree that even today no more than 50% of the developers have heard of WebRTC – still! It is likely that one reason for that is for many developers a technology isn’t real until it is in Internet Explorer or its successor – Edge.
So having Microsoft fully engaged on a plan that we can all agree on now is a good thing for everyone.
{
“Q&A”:{
“interviewer”:“chad hart“,
“interviewee”:“Dan Burnett“
}
}
Hello Chrome and Firefox, this is Edge calling
Chrome, Firefox, and Edge are all on the same party line. Image from Pillow Talk (1959)
For the first time, Chrome, Firefox and Edge can “talk” to each other via WebRTC and ORTC. Check the demo on Microsoft’s modern.ie testdrive.
tl;dr: don’t worry, audio works. codec interop issue…
Feature interoperability summary:
- ICE: yes (Edge requires end-of-candidate signaling)
- DTLS: yes
- Audio: yes (using G.722, Opus or G.711 codecs)
- Video: no (standard H.264 is not supported in Edge yet)
- DataChannels: no (Edge does not support DataChannels)

As a reader of this blog, you probably know what WebRTC is but let me quote this:
WebRTC is a new set of technologies that brings clear crisp voice, sharp high-definition (HD) video and low-delay communication to the web browser.
In order to succeed, a web-based communications platform needs to work across browsers. Thanks to the work and participation of the W3C and IETF communities in developing the platform, Chrome and Firefox can now communicate by using standard technologies such as the Opus and VP8 codecs for audio and video, DTLS-SRTP for encryption, and ICE for networking.
This description is taken from the early-2013 Chromium blog post that announced interoperability between Chrome and Firefox. And now Edge?
Codecs…
So we have interoperability – for audio calls. It is just audio. No video interoperability yet. Now this is just an issue of all vendors implementing at least one common video codec:
- Edge currently implements a Microsoft variant of H264 called H264UC which adds some features like SVC
- Adding H264 is work in progress
- While there is a VP9 decoder for playing videos, that is not usable for ORTC so don’t get too excited
- See Bernard’s comments for more information
- Chrome implements VP8; H264 is work in progress
- Firefox implements VP8 and H264
Audio interoperability is currently using G.722 instead of Opus because Edge still prefers Silk and G.722 over Opus.
APIs
But wait, how can those browsers talk if they do not agree on APIs?
Well, I implemented the PeerConnection API on top of ORTC. The gory details can be found here as part of a pull request for adapter.js. It has undergone quite a critical review and has improved as a result. This process also showed some issues in the ORTC specification. While there has always been the assumption that it would be possible to implement the PeerConnection API using the lower-level ORTC API, nobody had actually done it.
The functionality provided is limited. More than a single audio and video track has not been tested and, since this uses SDP similar to what is specified in the Unified Plan draft, it would likely not be interoperable with Chrome. But this is sufficient for quite a number of applications that are simple enough not to benefit from ORTC natively.
SDP!
Using this Javascript implementation, Edge will generate something that is close enough to the SDP used by the PeerConnection API:
v=0
o=thisisadapterortc 8169639915646943137 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 104 9 106 0 103 8 97 13 118 101
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=rtpmap:104 SILK/16000
a=rtcp-fb:104 x-message app send:dsh recv:dsh
a=rtpmap:9 G722/8000
a=rtcp-fb:9 x-message app send:dsh recv:dsh
a=rtpmap:106 OPUS/48000/2
a=rtcp-fb:106 x-message app send:dsh recv:dsh
a=rtpmap:0 PCMU/8000
a=rtcp-fb:0 x-message app send:dsh recv:dsh
a=rtpmap:103 SILK/8000
a=rtcp-fb:103 x-message app send:dsh recv:dsh
a=rtpmap:8 PCMA/8000
a=rtcp-fb:8 x-message app send:dsh recv:dsh
a=rtpmap:97 RED/8000
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=rtpmap:101 telephone-event/8000
a=rtcp-mux
a=ice-ufrag:lMRF
a=ice-pwd:NR15fT4U6wHaOKa0ivn64MtQ
a=setup:actpass
a=fingerprint:sha-256 6A:D8:7D:05:1A:ED:DB:BD:6A:60:1A:BC:15:70:D1:6C:A1:D9:00:79:E5:5C:56:15:73:80:E2:82:9D:B9:FB:69
a=mid:nbiwo5l60z
a=sendrecv
a=msid:7E4272C7-2B6C-49BD-BF7A-A3E7B8DD44F5 D2945771-D7B4-4915-AC29-CEA9EC51EC9E
a=ssrc:1001 msid:7E4272C7-2B6C-49BD-BF7A-A3E7B8DD44F5 D2945771-D7B4-4915-AC29-CEA9EC51EC9E
a=ssrc:1001 cname:3s6hzpz1jj

Check the anatomy of a WebRTC SDP post to find out what each of these lines means.
This allows quite a number of the WebRTC PeerConnection samples to work in Edge, just like many of the getUserMedia samples already work.
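That is the whole point of the shim: application code stays the same across browsers. A simplified sketch (using the legacy addStream call common at the time):

// With adapter.js loaded, the same code runs in Chrome, Firefox and,
// via the ORTC shim, in Edge.
var pc = new RTCPeerConnection(null);
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    pc.addStream(stream);
    return pc.createOffer();
  })
  .then(function(offer) { return pc.setLocalDescription(offer); })
  .then(function() {
    // hand pc.localDescription to the signaling channel as usual
  });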
With that working, the next big challenge was browser interoperability. Would this underspecified blob of text be good enough to be accepted by Chrome and Firefox?
It turned out to be good enough. After adding ICE candidates on both sides, the ICE connection and DTLS states soon changed to completed and connected. Yay. In Chrome at least.
Firefox did not work because of trivial mistakes that took a while to figure out. But then, it just worked as well.
As far as I am concerned this shows that the hard part, making ICE and DTLS interoperable, is solved. The rest is something for codec folks to work out. Not my area of interest.
{“author”: “Philipp Hancke“}
Microsoft’s ORTC Edge for WebRTC – Q&A with Bernard Aboba
We have been waiting a long time for Microsoft to add WebRTC to its browser portfolio. That day finally came last month when Microsoft announced its new Windows 10 Edge browser had ORTC. This certainly does not immediately address the Internet Explorer population and ORTC is still new to many (which is why we cover it often). On the positive side, interoperability between Edge, Chrome, and Firefox on the audio side was proven within days by multiple parties. Much of ORTC is finding its way into the WebRTC 1.0 specification and browser implementations.
I was with Bernard Aboba, Microsoft’s WebRTC lead at the IIT Real Time Communications Conference (IIT-RTC) and asked him for an interview to cover the Edge implementation and where Microsoft is headed. The conversation below has been edited for readability and technical accuracy. The full, unedited audio recording is also available below if you would rather listen than read. Warning – we recorded our casual conversation in an open room off my notebook microphone, so please do not expect high production value.
https://webrtchacks.com/wp-content/uploads/2015/10/Bernard-Aboba-QA.mp3

We cover what exactly is in the Edge ORTC implementation, why ORTC in the first place, the roadmap, and much more.
You can view the IIT-RTC ORTC Update presentation slides given by Bernard, Robin Raymond of Hookflash, and Peter Thatcher of Google here.
{“editor”, “chad hart“}
Microsoft’s Edge is hungry for WebRTC
Intro to Bernard
webrtcHacks: Hi Bernard. To start out, can you please describe your role at Microsoft and the projects you’ve been working on? Can you give a little bit of background about your long-time involvement in WebRTC standards, ORTC, and also your new W3C responsibilities?
Bernard: I’m a Principal Architect at Skype within Microsoft, and I work on the Edge ORTC project primarily, but also help out other groups within the company that are interested in WebRTC. I have been involved in ORTC since the very beginning as one of the co-authors of ORTC, and very recently, signed up as an Editor of WebRTC 1.0.
webrtcHacks: That’s concurrent with some of the agreement around merging more of ORTC into WebRTC going forward. Is that accurate?
Bernard: One of the reasons I signed up was that I found that I was having to file WebRTC 1.0 API issues and follow them. Because many of the remaining bugs in ORTC related to WebRTC 1.0, and of course we wanted the object models to be synced between WebRTC 1.0 and ORTC, I had to review pull requests for WebRTC 1.0 anyway and reflect the changes within ORTC. Since I had to be aware of WebRTC 1.0 Issues and Pull Requests to manage the ORTC Issues and Pull Requests, I might as well be an editor of WebRTC 1.0.
Bernard Aboba of Microsoft and Robin Raymond of Hookflash discussing ORTC at the IIT Real Time Communications Conference (IIT-RTC)
What’s in Edge
webrtcHacks: I guess we’ll move on to Edge then. Edge and Edge Preview are out there with varying forms of WebRTC. Can you walk through a little bit of that?
Bernard: Just also to clarify for people, Edge ORTC is in what’s called Windows Insider Preview. Windows Insider Preview builds are only available to people who specifically sign up to receive them. If you sign up for the Windows Insider Preview program and install the most recent build 10547, then you will have access to the ORTC API in Edge. In terms of what is in it, the audio is relatively complete. We have:
- G.711,
- G.722,
- Opus,
- Comfort Noise,
- DTMF, as well as the
- SILK codec.
Then on the video side, we have an implementation of H.264/SVC, which does both simulcast and scalable video coding, as well as forward error correction (FEC), known as H.264UC. I should also mention, we support RED and forward error correction for audio as well.
That’s what you will find in the Edge ORTC API within Windows Insider Preview, as well as support for “half-trickle” ICE, DTLS 1.0, etc.
webrtcHacks: I’ll include the slide from your presentation for everyone to reference because there’s a lot of stuff to go through. I do have a couple of questions on a few things for follow-up. One was support on the video side of things. I think you mentioned external FEC and also talked about other aspects of robustness, such as retransmission?
Bernard’s slide from IIT-RTC 2015 showing Edge’s ORTC coverage
Bernard: Currently in Edge ORTC Insider Preview, we do not support generic NACK or re-transmission. We do support external forward error correction (FEC), both for audio and video. Within Opus as well as SILK we do not support internal FEC, but you can configure RED with FEC externally. Also, we do not support internal Discontinuous Operation (DTX) within Opus or SILK, but you can configure Comfort Noise (CN) for use with audio codec, including Opus and SILK.
Video interoperability
webrtcHacks: Then could you explain H.264UC? For the majority of the people out there who aren’t familiar with the old Lync, or Skype for Business as it is now called.
Bernard: Basically, H.264 UC supports spatial simulcast along with temporal scalability in H.264/SVC, handled automatically “under the covers”. These are basically the same technologies that are in Hangouts with VP8. While the ORTC API offers detailed control of things like simulcast and SVC, in many cases, the developer just basically wants the stack to do the right thing, such as figuring out how many layers it can send. That’s what H.264UC does. It can adapt to network conditions by dropping or adding simulcast streams or temporal layers, based on the bandwidth it feels is available. Currently, the H.264UC codec is only supported by Edge.
webrtcHacks: Is the base layer H.264?
Bernard: Yes, the base layer is H.264 but RFC 6190 specifies additional NAL Unit types for SVC, so that an implementation that only understands the base layer would not be able to understand extension layers. Also, our implementation of RFC 6190 sends layers using distinct SSRCs, which is known as Multiple RTP stream Single Transport (MRST). In contrast, VP8 uses Single RTP stream Single Transport (SRST).
We are going to work on an implementation of H.264/AVC in order to interoperate. As specified in RFC 6184 and RFC 6190, H.264/AVC and H.264/SVC have different codec names.
webrtcHacks: For Skype, at least, in the architecture that was published, they showed a gateway. Would you expect other people to do similar gateways?
Bernard: Once we support H.264/AVC, developers should be able to configure that codec, and use it to communicate with other browsers supporting H.264/AVC. That would be the preferred way to interoperate peer-to-peer. There might be some conferencing scenarios where it might make sense to configure H.264UC and have the SFU or mixer strip off layers to speak to H.264/AVC-only browsers, but that would require a centralized conferencing server or media relay that could handle that.
Roadmap
webrtcHacks: What can you say about the future roadmap? Is it basically what’s on the dev.modern.ie page?
Bernard: In general, people should look at the dev.modern.ie web page for status, because that has the most up-to-date information. In fact, I often learn about things from the page. As I mentioned, the Screen Sharing and Media Recorder specifications are now under consideration, along with features that are in preview or are under development. The website breaks down each feature. If the feature is in Preview, then you can get access to it via the Windows Insider Preview. If it is under development, this means that it is not yet in Preview. Features that are supported have already been released, so if you have Windows 10, you should already have access to them.
Slide from Bernard’s IIT-RTC 2015 presentation covering What’s in Edge
In terms of our roadmap, we made a roadmap announcement in October 2014 and are still executing on things such as H.264, which we have not delivered yet. Supporting interoperable H.264 is about more than just providing an encoder/decoder, which we have already delivered as part of H.264UC. The IETF RTCWEB Video specification provides guidance on what is needed to provide interoperable H.264/AVC, but that is not all that a developer needs to implement – there are aspects that are not yet specified, such as bandwidth estimation and congestion control.
Beyond the codec bitstream, RTP transport and congestion control there are other aspects as well. For example, I mentioned robustness features such as Forward Error Correction and Retransmission. A Flexible FEC draft is under development in IETF which will handle burst loss (distances greater than one). That is important for robust operation on wireless networks, for both audio and video. Today we have internal FEC within Opus, but that does not handle burst loss well.
webrtcHacks: Do you see Edge pushing the boundaries in this area?
Bernard: One of the areas where Edge ORTC has advanced the state of the art is in external forward error correction (FEC) as well as in statistics. Enabling external FEC to handle burst loss provides additional robustness for both audio and video. We also support additional statistics which provide information on burst loss and FEC operation. What we have found is that burst loss is a fact of life on wireless networks, so being able to measure it and to address it is important. The end result of this work is that Edge should be more robust than existing implementations with respect to burst loss (at least with larger RTTs where retransmission would not be available). We can also provide burst loss metrics, which other implementations cannot currently do. I should also mention that there are metrics that have been developed in the XRBLOCK WG to address issues of burst loss, concealment, error correction, etc.
Why ORTC?
webrtcHacks: You have been a long-time advocate for ORTC. Maybe you can summarize why ORTC was a good fit for Edge? Why did you start with that spec versus something else? What does it enable you to do now as a result?
Bernard: Some of the advantages of ORTC were indeed advantages, but in implementation we found there were also other advantages we didn’t think of at the time.
Interoperability
Bernard: ORTC doesn’t have SDP [like WebRTC 1.0]; the irony is ORTC allowed us to get to WebRTC 1.0 compatibility and interoperability faster than we would have otherwise. If you look at adapter.js, it’s actually interesting to read that code – the code for Edge is actually smaller than for some of the other browsers. One might think that’s weird – why would it take less adaptation for Edge than for anything else? Are we really more 1.0 compatible than 1.0? The answer is, in some respects, we are, because we don’t generate SDP that somebody needs to parse and reformat. It certainly saves a lot of development to not have to write that code, to have control in JavaScript, and to be able to modify it easily in case people find bugs in it.
The irony is ORTC allowed us to get to WebRTC 1.0 compatibility and interoperability faster than we would have otherwise
Connection State Details
The other thing we found about ORTC that we didn’t quite understand early on was that it gives you detailed status of each of the transports – each of your ICE transports. Particularly when you’re dealing with situations like multiple interfaces, you actually get information about failure conditions that you don’t get out of WebRTC 1.0.
It’s interesting to look at 1.0 – one of the reasons that I think people will find the objects interesting in 1.0 is because you actually need that kind of diagnostic information. The current connection state [in the current WebRTC] is not really enough – it’s not even clear what it means. It says in the spec that it’s about ICE, but it really combines ICE and DTLS. With the object model, you know exactly what ICE transport went down or if DTLS is in some weird state. Actually for diagnostics, details of the connection state is actually pretty important. It’s one of the most frequently requested statistical things. That was a benefit we didn’t anticipate, that we found is pretty valuable and will be coming into 1.0.
Many simple scenarios
Bernard: Then there were the simple scenarios. Everyone said, “I don’t need ORTC because I don’t do scalable video coding and simulcast.” Do you ever do hold? Do you ever do changing of codecs? These are all illustrations that Peter [Thatcher] showed in his WebRTC 1.0 presentation. The answer is, a lot of those things are, in fact, common, and were not possible in 1.0. There are a lot of fairly basic benefits that you get as well.
How is Edge’s Media Engine built
webrtcHacks: In building and putting this in Edge, you had a few different media engines you could choose from. You had the Skype media engine and a Lync media engine – did you combine them or go and build a new one? Can you reveal the Edge media architecture and how you put that together?
Bernard: What we chose to do in Skype is move to a unified media engine. What we’ve done is add WebRTC capabilities into that media engine. That’s a good thing because, for example, things like RTCP MUX and BUNDLE are now part of the Skype media engine so we can use them. The idea was to produce something that was unified and would have all the capabilities in one. It took a little bit longer to do it that way, but the benefit is that we get to produce a standards-compliant browser and we also get to use those technologies internally. Now we do not have 3 or 4 different stacks that we would have to rationalize later.
right now, our focus is very much on video, and trying to get that more solid, and more interoperable
Also, I should mention that one thing that is interesting about the way we work is we produce stacks that are both client and server capable. We don’t just produce pure client code that wouldn’t, for example, be able to handle load. Some of those things can go into back-end components as well. That is also true for DTLS and all that. Whether or not we use all those things in Skype is another issue, but it is part of the repertoire for apps.
More than Edge
webrtcHacks: Is there anything else that’s not on dev.modern.ie that is exposed that a developer would care about? Any NuGet packages with these APIs, for example?
Bernard: There are a couple of things. dev.modern.ie does not cover non-browser things in the Windows platform. For example, currently we support DTLS 1.0. We do want to support 1.2, because there are additional cipher suites that are important. For example, the Elliptic Curve stuff we’re seeing going into all the browsers. I think Mozilla already has it, or Chrome has it, or if they don’t, they will very soon. That is actually very important. Elliptic Curve turned out to be more than just a cipher suite issue – the time and effort it takes to generate more secure certificates is large. For RSA-2048 you can actually block the UI thread if you thread the object. Anyway, those are very important things that we don’t cover on dev.modern.ie, but those are the things we obviously have to do.
There’s a lot of work and a lot of thinking that’s been going on in the IETF relating to ICE and how to make it better for mobile scenarios. Some of that I don’t think is converged yet, but there’s a new ICE working group. Some of that is already in the ortc-lib implementation. Robin [Raymond] likes to be on the cutting edge, so he has done basically the first implementation of a lot of those new technologies. That’s something I think is of general interest – particularly as ORTC moves to mobile.
I should mention, by the way, that the Edge Insider Preview was only for desktop. It does not run on Windows Phone just to clarify that.
webrtcHacks: Any plans for embedding the Edge ORTC engine as a IE plugin?
Bernard: An external plugin or something?
webrtcHacks: Yeah, or a Microsoft plugin for IE that would implement ORTC.
Bernard: Basically at this point, IE is frozen technology. All the new features, if you look on the website, they all go into Edge. That’s what we’ve been developing for. I never say Microsoft will never do anything, but currently that’s not the thinking. Windows 10 for consumers is a free upgrade. Hopefully, people will take advantage of that and get all the new stuff, including Edge.
Is there an @MSEdgeDev post on the relationship between this and InPrivate? pic.twitter.com/bbu0Mdz0Yd
— Eric Lawrence (@ericlaw) September 22, 2015
A setting discovered in Internet Explorer that appears to address the IP Address Leakage issue.
Validating ORTC
webrtcHacks: Is there anything you want to share?
Bernard: I do want to clarify a little bit, I think adapter.js is a very important thing because it validates our original idea that essentially WebRTC 1.0 could be built into the JavaScript layer with ORTC.
webrtcHacks: And that happened pretty quick – with Fippo‘s help. Really quick.
Bernard: Fippo has written all the pull requests. We’re paying a lot of attention to the bugs he’s finding. Obviously, he’s finding bugs in Edge, which hopefully we’ll fix, but he’s also finding spec bugs. It really helps make sure that this compatibility that we’ve promised is actually real. It’s a very interesting process to actually reduce that to code so that it’s not just a vague promise. It has to be demonstrated in software.
Of course what we’ve done is currently with audio. We know that video is more complicated, particularly as you start adding lots and lots of codecs to get that level of compatibility. I wouldn’t say that when Fippo is done with audio it will be the last word. I think we’ll have to pay even more attention to interoperability stuff in the video cases. It will be interesting because video is a lot more complicated.
adapter.js is a very important thing because it validates our original idea that essentially WebRTC 1.0 could be built into the JavaScript layer with ORTC.
What does the Microsoft WebRTC team look like
webrtcHacks: Can you comment on how big the team is that’s working on ORTC in Edge? You have a lot of moving pieces in different aspects…
Bernard: There’s the people in Edge. There’s the people in Skype. In the Windows system there’s the people on the S-channel team that worked on the DTLS. There’s people all over – for example, the VP9 work that we talked about, was not done by either Skype or the conventional Edge people. It’s the whole Windows Media team. I don’t really know how to get my hands around this, because if you look at all the code we’re using, it’s written by probably, I don’t know, hundreds and hundreds of people.
webrtcHacks: And you need to pull it together for purposes of WebRTC/ORTC, is that right?
Bernard: Yeah. We have to pull it together, but there’s a lot there. There’s a lot of teams. There will probably be more teams going forward. People say, “Why don’t you have the dataChannel?” The dataChannel isn’t something that would be in Skype’s specific area of expertise. That’s a transport protocol; it should really be written by people who are experts in transport protocols, which isn’t either Edge or Skype. It’s not some decision that was made by either of our groups not to do it. We have to find somebody who proves that they can do that work, to take ownership of that.
Feedback please
webrtcHacks: Any final comments?
Bernard: No. I just encourage people to download the preview, run it, file bugs, and let us know what you think. You can actually vote on the website for new features, which is cool.
We do listen to the input. WebRTC is an expanding thing. There’s a ton of things you can do – there’s all that stuff on the dev.modern.ie site and then there’s internal improvement. Getting a sense of priority – what’s most important to people – is not that easy, because there’s so much that you could possibly focus on. I’d say right now, our focus is very much on video, and trying to get that more solid, and more interoperable, at least for the moment. We can walk and chew gum at the same time. We can do more than just one thing. Conceivably, especially when you look at IE and other teams.
webrtcHacks: This is great and very insightful. I think it will be a big help to all the developers out there. Thanks!
{
“Q&A”:{
“interviewer”:“chad hart“,
“interviewee”:“Bernard Aboba“
}
}
Traffic Encryption
So I talked about Skype and Viber at KrankyGeek two weeks ago. Watch the video on youtube or take a look at the slides. No “reports” or packet dumps to publish this time, mostly because it is very hard to draw conclusions from the results.
The VoIP services we have looked at so far use the RTP protocol for transferring media. RTP uses a packet header which is not encrypted and contains a number of attributes such as the payload type (identifying the codec used), a synchronization source (which identifies the source of the stream), a sequence number and a timestamp. This allows routers to identify RTP packets and prioritize them. It also allows someone monitoring all network traffic (“Pervasive Monitoring“) to easily identify VoIP traffic. Or someone wiretapping your internet connection.
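To make the cleartext header point concrete, here is a minimal sketch (Node.js-style, hypothetical) of reading those fields from a captured RTP packet; SRTP only encrypts the payload, not these first twelve bytes:

function parseRtpHeader(buf) {
  return {
    version: buf[0] >> 6,
    payloadType: buf[1] & 0x7f,        // identifies the codec
    sequenceNumber: buf.readUInt16BE(2),
    timestamp: buf.readUInt32BE(4),
    ssrc: buf.readUInt32BE(8)          // identifies the stream source
  };
}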
Skype and Viber encrypt all packets. Does that make them less susceptible to this kind of attack?
Bear with me, the answer is going to be very technical. tl;dr:
- it is still pretty easy to determine that you are making a call.
- it is also pretty easy to tell if you muted your microphone.
- it is pretty apparent whether this is a videochat.
Not expecting to find much, I ran a standard set of scenarios with Skype on Android and iOS, similar to those used in the Whatsapp analysis.
A first look did not show much. Luckily, when analyzing WhatsApp I had developed some tooling to deal with RTP. I modified those tools, removing the RTP parser, and was greeted with these graphs:
While the bitrate alone (blue is my ipad3 with a 172.16. ip address, black is my old Android phone) is not very interesting, the packet rate of exactly 50 packets per second was interesting. Also, the packet length distribution was similar to Opus. As I figured out later from the integrated debugging (on the Android device; this must be too technical for iOS users!), this was the Silk codec. In fact, if you account for some overhead, the black distribution matches what we saw from WhatsApp earlier and what is now known to be Opus at 16kHz or 8kHz.
So the encryption did not change the traffic pattern. Nor does it hide the fact that a call is happening.
When muting the audio on one device, one can even see regular spikes in the traffic every ten seconds. Supposedly, those are keepalive packets.
Let’s look at some video traffic. Note the two distinct distributions in the third graph? Let’s suppose that the left one is audio and everything else is video. This works well enough looking at the last graph which shows the ‘audio’ traffic in green and orange respectively.
The accuracy could possibly be improved a little by looking at the number of packets which is pretty much constant for audio.
In RTP, we would use the synchronization source (SSRC) field from the header to accomplish this. But that just makes things easier for routers.
Last but not least, relays. When testing this from Europe, I was surprised to see my traffic being routed through Redmond, Washington.
This is quite interesting in comparison to the first graph. The packet rate stays roughly the same, but the bitrate doubles to 100 kilobits/second. That is quite some overhead compared to the standard TURN protocol which has negligible overhead. The packet length distribution is shifted to the right and there are a couple of very large packets. Latency was probably higher but this is very hard to measure.
While I got some pretty interesting results from Skype, Viber turned out to be harder. Thanks to the tooling it now took only a matter of seconds to discover that, like Whatsapp, it uses a relay server to help with call establishment:
Blue traffic is captured locally before it is sent to the peer, the black and green traffic is received from the remote end. The traffic shown in black almost vanishes after a couple of initial spikes (which contain very large packets at a low frequency). Visualizations of this kind are a lot easier to understand than the packet dumps captured with Wireshark.
And for the sake of completeness, muting audio on both sides showed keepalive traffic, visible as tiny periodic spikes in this graph:
VoIP security is hard. And this is not really news; attacks on encrypted VoIP traffic have been known for quite a while – see e.g. this paper from 2008 and the more recent ‘Phonotactic Reconstruction’ attacks.
The fact that RTP does not encrypt the header data makes it slightly easier to identify, but it seems that a determined attacker could have come to the same conclusions about the encrypted traffic of services like Skype. Keep that in mind when talking about the security of your service. Also, keep the story of the ECB penguin in mind.
Or, as Emil Ivov said about the security of peer-to-peer: “Unless there is a cable going between your computer and the other guy’s computer and you can see the entire cable, then you’re probably in for a rude awakening”.
{“author”: “Philipp Hancke“}
The post Traffic Encryption appeared first on webrtcHacks.
First steps with ORTC
ORTC support in Edge has been announced today. A while back, we saw this on twitter:
Windows Insider Preview build 10525 is now available for PCs: http://t.co/zeXQJocgLs This release lays groundwork for ORTC in Microsoft Edge
— Microsoft Edge Dev (@MSEdgeDev) August 18, 2015
“This release [build 10525] lays the groundwork for ORTC” was quite an understatement. The implementation is still considered experimental and differs slightly from the specification (which is itself still a work in progress), but it already works, and as a developer you can get familiar with how ORTC works and how it differs from the RTCPeerConnection API.
If you want to test this, please use builds newer than 10547. Join the Windows Insider Program to get them and make sure you’re on the fast ring.
The approach taken differs from the RTCPeerConnection way of giving you a blob that you exchange, as this WebRTC PC1 sample shows quite well. ORTC is more about giving you the individual building blocks.
In ORTC, you have to incrementally build up things. Let’s walk through the code (available on github):
Setting up a Peer to Peer connection var gatherer1 = new RTCIceGatherer(iceOptions); var transport1 = new RTCIceTransport(gatherer1); var dtls1 = new RTCDtlsTransport(transport1);There are three elements on the transport side:
* the RTCIceGatherer which gathers ICE candidates to be sent to the peer,
* the RTCIceTransport where you add the candidates from the peer,
* the RTCDtlsTransport which sits on top of the ICE transport and deals with encryption.
As in the peerConnection API, you exchange the candidates:
// Exchange ICE candidates. gatherer1.onlocalcandidate = function (evt) { console.log('1 -> 2', evt.candidate); transport2.addRemoteCandidate(evt.candidate); }; gatherer2.onlocalcandidate = function (evt) { console.log('2 -> 1', evt.candidate); transport1.addRemoteCandidate(evt.candidate); };Also, you need to exchange the ICE parameters (usernameFragment and password) and start the ICE transport:
transport1.start(gatherer1, gatherer2.getLocalParameters(), 'controlling'); transport1.onicestatechange = function() { console.log('ICE transport 1 state change', transport1.state); };This is done with SDP in the PeerConnection API. One side needs to be controlling, the other is controlled.
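For symmetry, the second ICE transport needs to be started as well. That call is not shown above; a minimal sketch, assuming the same variable names as in the snippets here, would be:
// Mirror image of the call above: the second transport uses the first
// gatherer's parameters and takes the 'controlled' role.
transport2.start(gatherer2, gatherer1.getLocalParameters(), 'controlled');
transport2.onicestatechange = function() {
  console.log('ICE transport 2 state change', transport2.state);
};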
You also need to start the DTLS transport with the remote fingerprint and hash algorithm:
dtls1.start(dtls2.getLocalParameters()); dtls1.ondtlsstatechange = function() { console.log('DTLS transport 1 state change', dtls1.state); };Once this is done, you can see the candidates being exchanged and the ICE and DTLS state changes on both sides.
Cool. Now what?
Sending a MediaStream track over the connectionLet’s send a MediaStream track. First, we acquire it using the promise-based navigator.mediaDevices.getUserMedia API and attach it to the local video element.
// call getUserMedia to get a MediaStream. navigator.mediaDevices.getUserMedia({video: true}) .then(function(stream) { document.getElementById('localVideo').srcObject = stream;Next, we determine the send and receive parameters. This is where the PeerConnection API does the “offer/answer” magic.
Since our sending capabilities match the receiving capabilities, there is little we need to do here.
Some black magic is still involved, check the specification for the gory details.
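Purely as an illustration (an assumption based on the ORTC capabilities/parameters model, not the exact code from the demo), deriving those parameters from the static capabilities could look roughly like this:
// Hypothetical sketch: build RTP parameters from the static video capabilities.
// Both sides run the same Edge build, so sender and receiver capabilities match.
var caps = RTCRtpReceiver.getCapabilities('video');
var params = {
  muxId: '',
  codecs: caps.codecs.map(function(codec) {
    return {
      name: codec.name,
      payloadType: codec.preferredPayloadType,
      clockRate: codec.clockRate,
      parameters: codec.parameters
    };
  }),
  headerExtensions: [],
  encodings: []
};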
Then, we start the RtpReceiver with those parameters:
// Start the RtpReceiver to receive the track. receiver = new RTCRtpReceiver(dtls2, 'video'); receiver.receive(params); var remoteStream = new MediaStream(); remoteStream.addTrack(receiver.track); document.getElementById('remoteVideo').srcObject = remoteStream;Note that the Edge implementation is slightly different from the current ORTC specification here since you need to specify the media type as second argument when creating the RtpReceiver.
We create a stream to contain the track and attach it to the remote video element.
Last but not least, let’s send the video track we got:
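The sending snippet itself is not reproduced above. A minimal sketch, assuming the Edge RTCRtpSender mirrors the receiver and takes the track plus the DTLS transport, would be:
// Hypothetical sketch of the sending side: wrap the camera track in an
// RTCRtpSender on top of the first DTLS transport and send with the same
// parameters used for the receiver.
var sender = new RTCRtpSender(stream.getVideoTracks()[0], dtls1);
sender.send(params);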
That’s it. It gets slightly more complicated when you have to deal with multiple tracks, and have to actually negotiate capabilities in order to interop between Chrome and Edge. But that’s a longer story…
{“author”: “Philipp Hancke“}
Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.
The post First steps with ORTC appeared first on webrtcHacks.
Reacting to React Native for native WebRTC apps (Alexey Aylarov)
It turns out people like their smartphone apps, so native mobile is pretty important. For WebRTC that usually leads to venturing outside of JavaScript into the world of C++/Swift for iOS and Java for Android. You can try hybrid applications (see our post on this), but many modern web applications use JavaScript frameworks like AngularJS, Backbone.js, Ember.js, or others, and those don’t always mesh well with these hybrid app environments.
Can you have it all? Facebook is trying with React which includes the ReactJS framework and React Native for iOS and now Android too. There has been a lot of positive fanfare with this new framework, but will it help WebRTC developers? To find out I asked VoxImplant’s Alexey Aylarov to give us a walkthrough of using React Native for a native iOS app with WebRTC.
{“editor”: “chad hart“}
If you haven’t heard about ReactJS or React Native, then I recommend checking them out. They already have a big influence on web development and have started influencing mobile app development with the React Native release for iOS and the just-released Android version. It sounds familiar, doesn’t it? We’ve heard the same about WebRTC, since it changes the way web and mobile developers implement real-time communication in their apps. So what is React Native after all?
“React Native enables you to build world-class application experiences on native platforms using a consistent developer experience based on JavaScript and React. The focus of React Native is on developer efficiency across all the platforms you care about — learn once, write anywhere. Facebook uses React Native in multiple production apps and will continue investing in React Native.”
https://facebook.github.io/react-native/
I can simplify it to “one of the best ways for web/JavaScript developers to build native mobile apps, using familiar tools like JavaScript, NodeJS, etc.”. If you are connected to the WebRTC world (like me), the first idea that comes to mind when you play with React Native is “adding WebRTC there should be a big thing, how can I make it happen?”, and then from the React Native documentation you’ll find out that there is a way to create your own Native Modules:
Sometimes an app needs access to platform API, and React Native doesn’t have a corresponding module yet. Maybe you want to reuse some existing Objective-C, Swift or C++ code without having to reimplement it in JavaScript, or write some high performance, multi-threaded code such as for image processing, a database, or any number of advanced extensions.
That’s exactly what we needed! Our WebRTC module in this case is a low-level library that provides a high-level JavaScript API for React Native developers. Another good thing about React Native is that it’s an open source framework and you can find a lot of the required info on GitHub. That is very useful, since React Native is still very young and it’s not easy to find details about native module development. You can always reach out to folks using Twitter (yes, it works! Look for #reactnative or https://twitter.com/Vjeux) or join their IRC channel to ask your questions, but checking examples from GitHub is a good option.
React Native’s module architectureNative modules can have C/C++, Objective-C, and JavaScript code. This means you can put the native WebRTC libraries, signaling and some other libs written in C/C++ as the low-level part of your module, implement video element rendering in Objective-C, and offer a JavaScript/JSX API for React Native developers.
Technically, the low-level and high-level code is divided in the following way:
- you create an Objective-C class that adopts React’s RCTBridgeModule protocol, and
- use RCT_EXPORT_METHOD to let Javascript code work with it.
In Objective-C you can interact with the OS, C/C++ libs, and even create iOS widgets. The ready-to-use native module(s) can be distributed in a number of different ways, the easiest being via an npm package.
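On the JavaScript side, calling such an exported method is then a one-liner. The module and method names below are placeholders matching the Objective-C example further down, so treat this as a hypothetical sketch rather than a finished API:
// Hypothetical usage of the exported native module from JavaScript.
var React = require('react-native');
var YourVoipModule = React.NativeModules.YourVoipModule;

// createCall was exported with RCT_EXPORT_METHOD; the last argument is the
// callback that receives the call id from the native side.
YourVoipModule.createCall('alice@example.com', true, function(callId) {
  console.log('started call with id', callId);
});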
WebRTC module APIWe’ve been implementing a React Native module for our own platform and already knew which of our API functions we would provide to JavaScript. Creating a WebRTC module that is independent of signaling and can be used by any WebRTC developer is a much more complicated problem.
We can divide the process into a few parts:
Integration with WebRTCSince WebRTC does not dictate how developers discover user names and exchange network connection information, this signaling can be done in multiple ways. Google’s WebRTC implementation is known as libwebrtc. libwebrtc includes a library called libjingle that provides “signaling” functionality.
There are 3 ways libwebrtc can be used to establish communication:
- libjingle with built-in signaling
This is the simplest one leveraging libjingle. In this case signaling is implemented in libjingle via XMPP protocol.
- Your own signaling
This is a more complicated one with signaling on the application side. In this case you need to implement SDP and ICE candidate exchange and pass the data to webrtc. One popular method is to use a SIP library for signaling.
- Application-controlled RTC
For the hardcore, you can avoid using a signaling library altogether. This means the application has to take care of all the RTP session params: RTP/RTCP ports, audio/video codecs, codec params, etc. An example of this type of integration can be found in the WebRTC sources in the WebRTCDemo app for Objective-C (src/talk/app/webrtc)
Adding SignalingWe used the 2nd approach in our implementation. Here are some code examples for making/receiving calls (C++):
- First of all, create Peer Connection factory:
peerConnectionFactory = webrtc::CreatePeerConnectionFactory(…); - Then creating local stream (we can set if it will be voice or video call):
localStream = peerConnectionFactory->CreateLocalMediaStream(uniqueLabel); localStream->AddTrack(audioTrack); if (withVideo) localStream->AddTrack(videoTrack); - Creating PeerConnection (set STUN/TURN servers list, if you are going to use it)
webrtc::PeerConnectionInterface::IceServers servers; webrtc::CreateSessionDescriptionObserver* peerConnectionObserver; peerConnection = peerConnectionFactory ->CreatePeerConnection(servers, …., peerConnectionObserver); - Adding local stream to Peer Connection:
peerConnection->AddStream(localStream); - Creating SDP:
webrtc::CreateSessionDescriptionObserver* sdpObserver;- For outbound call:
- Creating SDP:
peerConnection->CreateOffer(sdpObserver); - Waiting for SDP from remote peer (via signaling) and pass it to Peer Connection:
peerConnection->SetRemoteDescription(remoteSDP);
- Creating SDP:
- In case of inbound call we need to set remote SDP before setting local SDP:
peerConnection->SetRemoteDescription(remoteSDP); peerConnection->CreateAnswer(sdpObserver);
- For outbound call:
- Waiting for events and sending SDP and ICE-candidate info to remote party (via signaling):
webrtc::CreateSessionDescriptionObserver::OnSuccess(webrtc::SessionDescriptionInterface* desc) { if (this->outgoing) sendOffer(); else sendAnswer(); } webrtc::CreateSessionDescriptionObserver::OnIceCandidate(const webrtc::IceCandidateInterface* candidate) { sendIceCandidateInfo(candidate); } - Waiting for ICE candidates info from remote peer and when it arrives pass it to Peer Connection:
peerConnection->AddIceCandidate(candidate); - After a successful ICE exchange (if everything is ok) connection/call is established.
First of all we need to create a React Native module (https://facebook.github.io/react-native/docs/native-modules-ios.html), where we describe the API and implement audio/video calling using WebRTC (Obj-C, iOS):
@interface YourVoipModule () { } @end @implementation YourVoipModule RCT_EXPORT_MODULE(); RCT_EXPORT_METHOD(createCall: (NSString *) to withVideo: (BOOL) video ResponseCallback: (RCTResponseSenderBlock)callback) { NSString * callId = [createVoipCall: to withVideo:video]; callback(@[callId]); }If we want to support video calling, we will need an additional component to show the local camera (Preview) or the remote video stream (RemoteView):
@interface YourRendererView : RCTView @endInitialization and deinitialization can be implemented in the following methods:
- (void)removeFromSuperview { [videoTrack removeRenderer:self]; [super removeFromSuperview]; } - (void)didMoveToSuperview { [super didMoveToSuperview]; [videoTrack addRenderer:self]; }You can find the code examples on our GitHub page – just swap the references to our signaling with your own. We found examples very useful while developing the module, so hopefully they will help you to understand the whole idea much faster.
DemoThe end result can look as follows:
Closing ThoughtsWhen the WebRTC community started working on the standard, one of the main ideas was to make real-time communications simpler for web developers by providing a convenient JavaScript API. React Native has a similar goal: it lets web developers build native apps using JavaScript. In our opinion, bringing WebRTC to the set of available React Native APIs makes a lot of sense – web app developers will be able to build their RTC apps for mobile platforms. The folks behind React Native have just released it for Android at the Scale conference, so we will update this article or write a new one about building an Android-compatible module as soon as we know all the details.
{“author”: “Alexey Aylarov”}
Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.
The post Reacting to React Native for native WebRTC apps (Alexey Aylarov) appeared first on webrtcHacks.
Gaming with the WebRTC DataChannel – A Walkthrough with Arin Sime
The fact that you can use WebRTC to implement a secure, reliable, and standards-based peer-to-peer network is a huge deal that is often overlooked. We have been notably light on the DataChannel here at webrtcHacks, so I asked Arin Sime if he would be interested in providing one of his great walkthroughs on this topic. He put together a very practical example of a multi-player game. You may recognize Arin from RealTime Weekly or from his company Agility Feat or his new webRTC.ventures brand. Check out this excellent step-by-step guide below and start lightening the load on your servers and reducing message latency with the DataChannel.
{“editor”: “chad hart“}
WebRTC is “all about video chat”, right? I’ve been guilty of saying things like that myself when explaining WebRTC to colleagues and clients, but it’s a drastic oversimplification that should never go beyond your first explanation of WebRTC to someone.
Of course, there’s more to WebRTC than just video chat. WebRTC allows for peer-to-peer video, audio, and data channels. The Data channels are a distinct part of that architecture and often forgotten in the excitement of seeing your video pop up in the browser.
Being able to exchange data directly between two browsers, without any sort of intermediary web socket server, is very useful. The Data Channel carries the same advantages as WebRTC video and audio: it’s fully peer-to-peer and encrypted. This means Data Channels are useful for things like text chat applications, file transfers, P2P file exchanges, gaming, and more.
In this post, I’m going to show you the basics of how to setup and use a WebRTC Data Channel.
First, let’s review the architecture of a WebRTC application.
You have to set up signaling code in order to establish the peer-to-peer connection between two peers. Once the signaling is complete (which takes place over a 3rd party server), you have a Peer to Peer (P2P) connection between two users which can contain video and audio streams and a data channel.
The signaling for both processes is very similar, except that if you are building a Data Channel only application then you don’t need to call getUserMedia or exchange streams with the other peer.
Data Channel SecurityThere are a couple of other differences about using the DataChannel. The most obvious one is that users don’t need to give you their permission in order to establish a Data Channel over an RTCPeerConnection object. That’s different than video and audio, which will prompt the browser to ask the user for permissions to turn on their camera and microphone.
Although it’s generating some debate right now, data channels don’t require explicit permission from users. That makes it similar to a web socket connection, which can be used in a website without the knowledge of users.
The Data Channel can be used for many different things. The most common example is implementing text chat to go with your video chat. If you’re already setting up an RTCPeerConnection for video chat, then you might as well use the same connection to supply a Data Channel for text chat instead of setting up a separate socket connection for it.
Likewise, you can use the Data Channel for transferring files directly between your peers over the RTCPeerConnection. This is nicer than a normal socket-style connection because, just like WebRTC video, the Data Channel is completely peer-to-peer and encrypted in transit. So your file transfer is more secure than in other architectures.
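As a quick illustration of the file transfer idea (a toy sketch only, not part of the game we build below; a real implementation would also signal the file name, size and end-of-file, and watch dataChannel.bufferedAmount):
// Read the file in fixed-size chunks and send each chunk as an ArrayBuffer.
function sendFile(dataChannel, file) {
  var chunkSize = 16 * 1024;
  var offset = 0;
  var reader = new FileReader();
  reader.onload = function() {
    dataChannel.send(reader.result);           // ship this chunk to the peer
    offset += chunkSize;
    if (offset < file.size) readNextChunk();   // continue until the whole file is sent
  };
  function readNextChunk() {
    reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
  }
  readNextChunk();
}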
The game of “Memory”Don’t limit your Data Channel imagination by these common examples though. In this post, I’m going to show you how to use the Data Channel to build a very simple two-player game. You can use the Data Channel to transfer any type of data you like between two browsers, so in this case we’ll use it to send commands and data between two players of a game you might remember called “Memory”.
In the game of Memory, you flip over a card and then flip a second card; if they match, you win that round and the cards stay face up. If they don’t match, you put both face down again and it’s the next person’s turn. By trying to remember what you and your opponents have been flipping, and where those cards were, you can win the game by correctly flipping the most pairs.
Photo Credit: http://www.vwmin.org/memory-game.html
Adam Khoury already built a JavaScript implementation of this game for a single player, and you can read his tutorial on how to build the game Memory for a single player. I won’t explain the logic of his code for building the game; what I’m going to do instead is build on top of his code with a very simple WebRTC Data Channel implementation to keep the card flipping in sync across two browsers.
You can see my complete code on GitHub, and below I’m going to show you the relevant segments.
In this example view of my modified Memory game, the user has correctly flipped pairs of F, D, and B, so those cards will stay face up. The cards K and L were just flipped and did not match, so they will go back face down.
Setting up the Data Channel configurationI started with a simple NodeJS application to serve up my code, and I added in Express to create a simple visual layer. My project structure looks like this:
The important files for you to look at are datachannel.js (where the majority of the WebRTC logic is), memorygame.js (where Adam’s game javascript is, and which I have modified slightly to accommodate the Data Channel communications), and index.ejs, which contains a very lightweight presentation layer.
In datachannel.js, I have included some logic to setup the Data Channel. Let’s take a look at that:
//Signaling Code Setup var configuration = { 'iceServers': [{ 'url': 'stun:stun.l.google.com:19302' }] }; var rtcPeerConn; var dataChannelOptions = { ordered: false, //no guaranteed delivery, unreliable but faster maxRetransmitTime: 1000, //milliseconds }; var dataChannel;The configuration variable is what we pass into the RTCPeerConnection object, and we’re using a public STUN server from Google, which you often see used in WebRTC demos online. Google is kind enough to let people use this for demos, but remember that it is not suitable for public use and if you are building a real app for production use, you should look into setting up your own servers or using a commercial service like Xirsys to provide production ready STUN and TURN signaling for you.
The next set of options we define are the data channel options. You can choose for “ordered” to be either true or false.
When you specify “ordered: true”, you are asking for a Reliable Data Channel. That means the packets are guaranteed to all arrive, in the correct order, without any loss; otherwise the whole transaction will fail. This is a good idea for applications where there is a significant burden if packets are occasionally lost due to a poor connection. However, it can slow down your application a little bit.
We’ve set ordered to false, which means we are okay with an Unreliable Data Channel. Our commands are not guaranteed to all arrive, but they probably will unless we are experiencing poor connectivity. Unless you take the Memory game very seriously and have money on the line, it’s probably not a big deal if you have to click twice. Unreliable data channels are a little faster.
Finally, we set a maxRetransmitTime before the Data Channel will fail and give up on that packet. Alternatively, we could have specified a number for maxRetransmits, but we can’t specify both constraints together.
Those are the most common options for a data channel, but you can also specify the protocol if you want something other than the default SCTP, and you can set negotiated to true if you want to keep WebRTC from setting up a data channel on the other side. If you choose to do that, then you might also want to supply your own id for the data channel. Typically you won’t need to set any of these options; leave them at their defaults by not including them in the configuration variable.
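For comparison, a channel meant for something like file transfer would typically flip these options around. A hedged sketch, reusing the rtcPeerConn object created later in this walkthrough:
// Reliable, ordered delivery: omit the retransmit limits entirely so the
// channel behaves more like TCP, at the cost of extra latency on lossy networks.
var reliableChannelOptions = {
  ordered: true
};
var fileChannel = rtcPeerConn.createDataChannel('fileTransfer', reliableChannelOptions);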
Set up your own Signaling layerThe next section of code may be different based on your favorite options, but I have chosen to use express.io in my project, which is a socket.io package for node that integrates nicely with the express templating engine.
So the next bit of code is how I’m using socket.io to signal to any others on the web page that I am here and ready to play a game. Again, none of this is specified by WebRTC. You can choose to kick off the WebRTC signaling process in a different way.
io = io.connect(); io.emit('ready', {"signal_room": SIGNAL_ROOM}); //Send a first signaling message to anyone listening //In other apps this would be on a button click, we are just doing it on page load io.emit('signal',{"type":"user_here", "message":"Would you like to play a game?", "room":SIGNAL_ROOM});In the next segment of datachannel.js, I’ve setup the event handler for when a different visitor to the site sends out a socket.io message that they are ready to play.
io.on('signaling_message', function(data) { //Setup the RTC Peer Connection object if (!rtcPeerConn) startSignaling(); if (data.type != "user_here") { var message = JSON.parse(data.message); if (message.sdp) { rtcPeerConn.setRemoteDescription(new RTCSessionDescription(message.sdp), function () { // if we received an offer, we need to answer if (rtcPeerConn.remoteDescription.type == 'offer') { rtcPeerConn.createAnswer(sendLocalDesc, logError); } }, logError); } else { rtcPeerConn.addIceCandidate(new RTCIceCandidate(message.candidate)); } } });There are several things going on here. The first one to be executed is that if the rtcPeerConn object has not been initialized yet, then we call a local function to start the signaling process. So when Visitor 2 announces themselves as here, they will cause Visitor 1 to receive that message and start the signaling process.
If the type of socket.io message is not “user_here”, which is something I arbitrarily defined in my socket.io layer and not part of WebRTC signaling, then the code goes into a couple of WebRTC specific signaling scenarios – handling an SDP “offer” that was sent and crafting the “answer” to send back, as well as handling ICE candidates that were sent.
The WebRTC part of SignalingFor a more detailed discussion of WebRTC signaling, I refer you to Sam Dutton’s HTML5 Rocks tutorial (http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/), which is what my signaling code here is based on.
For completeness’ sake, I’m including below the remainder of the signaling code, including the startSignaling method referred to previously.
function startSignaling() { rtcPeerConn = new webkitRTCPeerConnection(configuration, null); dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions); dataChannel.onopen = dataChannelStateChanged; rtcPeerConn.ondatachannel = receiveDataChannel; // send any ice candidates to the other peer rtcPeerConn.onicecandidate = function (evt) { if (evt.candidate) io.emit('signal',{"type":"ice candidate", "message": JSON.stringify({ 'candidate': evt.candidate }), "room":SIGNAL_ROOM}); }; // let the 'negotiationneeded' event trigger offer generation rtcPeerConn.onnegotiationneeded = function () { rtcPeerConn.createOffer(sendLocalDesc, logError); } } function sendLocalDesc(desc) { rtcPeerConn.setLocalDescription(desc, function () { io.emit('signal',{"type":"SDP", "message": JSON.stringify({ 'sdp': rtcPeerConn.localDescription }), "room":SIGNAL_ROOM}); }, logError); }This code handles setting up the event handlers on the RTCPeerConnection object for dealing with ICE candidates to establish the Peer to Peer connection.
Adding DataChannel options to RTCPeerConnectionThis blog post is focused on the DataChannel more than the signaling process, so the following lines in the above code are the most important thing for us to discuss here:
rtcPeerConn = new webkitRTCPeerConnection(configuration, null); dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions); dataChannel.onopen = dataChannelStateChanged; rtcPeerConn.ondatachannel = receiveDataChannel;In this code what you are seeing is that after an RTCPeerConnection object is created, we take a couple extra steps that are not needed in the more common WebRTC video chat use case.
First we ask the rtcPeerConn to also create a DataChannel, which I arbitrarily named ‘textMessages’, and I passed in those dataChannelOptions we defined previously.
Setting up Message Event HandlersThen we just define where to send two important Data Channel events: onopen and ondatachannel. These do basically what the names imply, so let’s look at those two events.
function dataChannelStateChanged() { if (dataChannel.readyState === 'open') { dataChannel.onmessage = receiveDataChannelMessage; } } function receiveDataChannel(event) { dataChannel = event.channel; dataChannel.onmessage = receiveDataChannelMessage; }
When the data channel is opened, we’ve told the RTCPeerConnection to call dataChannelStateChanged, which in turn tells the dataChannel to call another method we’ve defined, receiveDataChannelMessage, whenever a data channel message is received.
The receiveDataChannel method gets called when we receive a data channel from our peer, so that both parties have a reference to the same data channel. Here again, we set the onmessage event of the data channel to call our receiveDataChannelMessage method.
Receiving a Data Channel MessageSo let’s look at that method for receiving a Data Channel message:
function receiveDataChannelMessage(event) { if (event.data.split(" ")[0] == "memoryFlipTile") { var tileToFlip = event.data.split(" ")[1]; displayMessage("Flipping tile " + tileToFlip); var tile = document.querySelector("#" + tileToFlip); var index = tileToFlip.split("_")[1]; var tile_value = memory_array[index]; flipTheTile(tile,tile_value); } else if (event.data.split(" ")[0] == "newBoard") { displayMessage("Setting up new board"); memory_array = event.data.split(" ")[1].split(","); newBoard(); } }Depending on your application, this method might just print out a chat message to the screen. You can send any characters you want over the data channel, so how you parse and process them on the receiving end is up to you.
In our case, we’re sending a couple of specific commands about flipping tiles over the data channel. So my implementation is parsing out the string on spaces, and assuming the first item in the string is the command itself.
If the command is “memoryFlipTile”, then this is the command to flip the same tile on our screen that our peer just flipped on their screen.
If the command is “newBoard”, then that is the command from our peer to setup a new board on our screen with all the cards face down. The peer is also sending us a stringified array of values to go on each card so that our boards match. We split that back into an array and save it to a local variable.
Controlling the Memory game to flip tilesThe actual flipTheTile and newBoard methods that are called reside in the memorygame.js file, which is essentially the same code that we’ve modified from Adam.
I’m not going to step through all of Adam’s code to explain how he built the single player Memory game in javascript, but I do want to highlight two places where I refactored it to accommodate two players.
In memorygame.js, the following function tells the DataChannel to let our peer know which card to flip, as well as flips the card on our own screen:
function memoryFlipTile(tile,val){ dataChannel.send("memoryFlipTile " + tile.id); flipTheTile(tile,val); }Notice how simple it is to send a message to our peers using the data channel – just call the send method and pass any string you want. A more sophisticated example might send well formatted XML or JSON in a message, in any format you specify. In my case, I just send a command followed by the id of the tile to flip, with a space between.
Setting up a new game boardIn Adam’s single player memory game, a new board is setup whenever you load the page. In my two player adaptation, I decided to have a new board triggered by a button click instead:
var setupBoard = document.querySelector("#setupBoard"); setupBoard.addEventListener('click', function(ev){ memory_array.memory_tile_shuffle(); newBoard(); dataChannel.send("newBoard " + memory_array.toString()); ev.preventDefault(); }, false);
In this case, the only important thing to notice is that I’ve defined a “newBoard” string to send over the data channel, and in this case I want to send a stringified version of the array containing the values to put behind each card.
Next steps to make the game betterThat’s really all there is to it! There’s a lot more we could do to make this a better game. I haven’t built in any logic to limit the game to two players, keep score by players, or enforce the turns between the players. But it’s enough to show you the basic idea behind using the WebRTC data channel to send commands in a multiplayer game.
The nice thing about building a game like this on the WebRTC data channel is that it’s very scalable. All my website has to do is help the two players get a connection set up; after that, all the data they need to exchange with each other travels over an encrypted peer-to-peer channel and doesn’t burden my web server at all.
A completed multiplayer game using the Data ChannelHere’s a video showing the game in action:
Demo of a simple two player game using the WebRTC Data Channel video
As I hope this example shows you, the hard part of WebRTC data channels is really just in the signaling and configuration, and that’s not too hard. Once you have the data channel setup, sending messages back and forth is very simple. You can send messages that are as simple or complex as you like.
How are you using the Data Channel? What challenges have you run into? Feel free to contact me on Twitter or through my site to share your experiences too!
{“author”: “arin sime“}
Sources:
http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/
http://www.w3.org/TR/webrtc/#simple-peer-to-peer-example
https://www.developphp.com/video/JavaScript/Memory-Game-Programming-Tutorial
Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.
The post Gaming with the WebRTC DataChannel – A Walkthrough with Arin Sime appeared first on webrtcHacks.
Making WebRTC source building not suck (Alex Gouaillard)
One of WebRTC’s benefits is that the source to it is all open source. Building WebRTC from source provides you the ultimate flexibility to do what you want with the code, but it is also crazy difficult for all but the small few VoIP stack developers who have been dedicated to doing this for years. What benefit does the open source code provide if you can’t figure out how to build from it?
As WebRTC matures into mobile, native desktop apps, and now into embedded devices as part of the Internet of Things, working with the lower-level source code is becoming increasingly common.
Frequent webrtcHacks guest poster Dr. Alex Gouaillard has been trying to make this easier. Below he provides a review of building WebRTC from source, exposing many of the gears WebRTC developers take for granted when they leverage a browser or someone else’s SDK. Alex also reviews the issues and complexities associated with this process and introduces the open source make process he developed to help ease it.
{“editor”: “chad hart“}
Building WebRTC from source sometimes feels like engineering the impossible. Photo courtesy of Andrew Lipson.
Building WebRTC from sourceMost of the audience for WebRTC (and webrtcHacks) is made up of web developers, JavaScript and cloud ninjas that might be less familiar with handling external libraries from source. That process is painful. Let’s make it clear: it’s painful for everybody – not only web devs.
What are the cases where you need to build from source?
- Writing a native app (mobile, desktop, IoT, …)
- Some kind of server (gateway, media, ….)
- Plugin (either for IE, Safari, Cordova, …)
You basically need to build from source anytime you can’t leverage a browser, a WebRTC-enabled node.js (for the sake of discussion), an SDK someone has put together for you, or anything else.
These main cases are illustrated below in the context of a comprehensive offering.
Figure 1: map of a WebRTC solution
Usually, the project owners provide precompiled and tested libraries that you can use yourself (stable) and the most recent version that is compiled but not tested for those who are brave.
Pre-compiled libraries are usable out of the box, but do not allow you to modify anything. Sometimes there are build scripts that help you recompile the libs yourselves. This provides more flexibility in terms of what gets in the lib, and what optimizations/options you set, at the cost of now having to maintain a development environment.
Comparing industry approachesFor example, Cisco with its openH264 library provides both precompiled libraries and build scripts. In their case, using the precompiled library defers H264 royalty issues to them, but that’s another subject. While the libwebrtc project includes build scripts, they are complex to use, do not provide a lot of flexibility for modifying the source, and make it difficult to test any modifications.
The great Cordova plugin from eFace2Face uses a precompiled libWebRTC (here) (see our post on this too). Pristine.io were among the first to propose build scripts to make this easier (see here; more about that later).
Sarandogou/doubango’s webrtc-everywhere plugin for IE and Safari does NOT use automated build scripts, versioning or a standard headers layout, which causes them a lot of problems and slows their progress.
The pristine.io guys put together a drawing of what the process is, and noted that, conceptually, there is not a big difference between Android and iOS builds in the steps you need to follow. Practically, there is a difference in the tools you use, though.
My build processHere is my build process:
Please also note that I mention testing explicitly, and there is a good reason for that, learned the hard way. I will come back to it in the next section.
You will see I have a “send to dashboard” step. I mean something slightly different than what people usually refer to as a dashboard. Usually, people want to report the results of the tests to a dashboard to show that a given revision is bug free (as much as possible) and that the corresponding binary can be used in production.
If you have performance tests, a dashboard can also help you spot performance regressions. In my case, I also want to use a common public dashboard as a way to publish failing builds on different systems or with different configurations, and still provide full log access to anyone. It makes solving those problems easier. The person asking the question can point to the dashboard, and interested parties have an easier time looking at the issue or reproducing it. More problems reported, more problems solved, everyone is happy.
Now that we have reviewed the build from source process a bit, let’s talk about what’s wrong with it.
Building from Source SucksWriting an entire WebRTC stack is insanely hard. That’s why Google went out and bought GIPS, even though they have a lot of very, very good engineers at their disposal. Most devs and vendors use an existing stack.
For historical reasons most people use google’s contributed WebRTC stack based on the GIPS media engine, and Google’s libjingle for the network part.
Even Mozilla is using the same media engine, even though they originally went with Cisco’s SIP softphone code as the base (see here, under “list of components”, “SIPCC”) to implement the network part of WebRTC. Since then, Mozilla has rewritten almost all of that part to support more advanced functionality such as multi-party calling. However, the point is that their network and signaling code is different from Google’s while their media engine is almost identical. Furthermore, Mozilla does not attempt to provide a standalone version of their WebRTC implementation, which makes it hard for developers to make use of it right away.
Before Ericsson’s OpenWebRTC announcement in October 2014, the Google standalone version was the only viable option out there for most. OpenWebRTC has advantages in some areas, like hardware support for H.264 on iOS for example, but lacks some features and Windows support, which can be a showstopper for some. It is admittedly less mature. It also uses GStreamer, which has its own conventions and its own build system (cerbero), which is also tough to learn.
The webrtc.org stack is not available in a precompiled library with an installer. This forces developers to compile WebRTC themselves, which is “not a picnic”.
One first needs to become accustomed to the Chrome dev tools, which are quite unique, adding a learning step to the process. The code changes quite often (4 commits a day), and the designs are poorly documented at best.
Even if you manage to compile the libs, either by yourself or using resources on the web, it is almost certain that you cannot test it before using it in your app, as most of the bug report, review, build, test and dashboard infrastructure is under the control of Google by default.
Don’t get me wrong, the bug report and review servers allow anybody to set up an account. What is done with your tickets or suggestions, however, is up to Google, and you can end up with quite frustrating answers. If you dig deep enough into the Chrome infrastructure for developers, you will also find out how to replicate their entire infrastructure, but the level of expertise you need to go down this path, and the amount of effort to get it right, is prohibitive for most teams. You want to develop your product, not become a Chrome expert.
Finally, the contributing process at Google allows for bugs to get in. You can actually look at the logs and see a few “Revert” commits there.
Figure 2: Example of a Revert commit message.
From the reverted commits (see footnote [1]: 107 since January 2015), one can tell that arbitrary revisions of WebRTC at HEAD can be broken. Here again, this comment might be perceived as discriminatory against Google. It is not. There is nothing wrong there; it happens in any project, and having only 107 reverts in 6 months while maintaining 4 commits a day is quite an achievement. However, it means that you, as a developer, cannot pick any given commit and expect the library to be stable. You have at least to test it yourself.
My small side project to helpMy goals are:
- Provide information to the community that is not documented elsewhere, or not consolidated. The blog posts on www.webrtcbydralex.com fulfill this goal.
- Learn more about WebRTC
- Prepare a course for the local university.
- Do something useful with my current “long vacations”
Yes, vacations in Boracay, Philippines, once voted the #2 most beautiful beach in the world by TripAdvisor, are nice. But I very quickly get that I-need-to-code urge, and they have Wi-Fi on the beach….
- Have fun!
More importantly I would like to lower the barrier of adoption / collaboration / contribution by providing:
- WebRTC installers that sync with Chrome revisions, which developers can use blindly out of the box (knowing they’ve been tested)
- Code for anyone to set up their own build/try/package pipeline, either locally or in the cloud
- An easy patching and testing framework to enhance WebRTC. As an example, provide an H.264-compliant WebRTC lib based on work from Kaiduan Xue, Randell Jesup, and others.
- More examples and applications for devs to start from. A first example will be a stand-alone, H.264-compliant appRTCDemo desktop app.
- A public dashboard for the community to come together, contribute build bots, and de-duplicate the testing efforts going on at almost every vendor for the base stack.
- A public dashboard for people to submit their failed builds as a way to ask questions on the mailing list and get faster answers.
Example of my dashboard
What we did exactlyWe leveraged the CMake / CTest / CDash / CPack suite of tools instead of the usual shell scripts, to automate most of the fetch, configure, build, test, report and package processes.
CMake is cross platform from the ground up, and makes it very easy to deploy such processes. No need to maintain separate or different build scripts for each platform, or build-toolchain.
CTest helps you manage your test suites, and is also a client for CDash, which handles the dashboard part of the process.
Finally, CPack handles packaging your libs with headers and anything else you might want, and supports a lot of different packagers with a unified syntax.
This entire suite of tools has also been designed in such a way that “a gifted master student could use it and contribute back in a matter of days”, while being so flexible and powerful that big companies like Netflix or Canonical (Ubuntu) use it at the core of their engineering process.
Most of the posts at webrtcbydralex.com will take you through this process, step by step, of setting up this solution, in conjunction with a GitHub repository holding all the corresponding source code.
The tool page provides installers for WebRTC for those in a hurry.
{“author”: “Alex Gouaillard“}
[1] git log --since=1.week --pretty=oneline | grep Revert | wc -l
Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.
The post Making WebRTC source building not suck (Alex Gouaillard) appeared first on webrtcHacks.
How to stop a leak – the WebRTC notifier
The “IP Address Leakage” topic has turned into a public relations issue for WebRTC. It is a fact that the WebRTC APIs can be used to share one’s private IP address(es) without any user consent today. Nefarious websites could potentially use this information to fingerprint individuals who do not want to be tracked. Why is this an issue? Can this be stopped? Can I tell when someone is trying to use WebRTC without my knowledge? We try to cover those questions below, along with a walkthrough of a Chrome extension that you can install or modify for yourself that provides a notification if WebRTC is being used without your knowledge.
Creative solutions for leaks
The “IP Leakage” problem Why does WebRTC need a local IP address?As Reid explained long ago in his An Intro to WebRTC’s NAT/Firewall Problem, peer-to-peer communications cannot occur without providing the peer your IP address. The ICE protocol gathers and checks all the addresses that can be used to communicate to a peer. IP addresses come in a few flavors:
- host IP address – this is usually the local LAN IP address and is the one being exposed that is causing all the fuss
- server-reflexive – this is the address the outside world (e.g. the web server hosting the page) will see
- relay – this will show up if you have a TURN server
Why not just use the server reflexive and relay addresses? If you have 2 peers that want to talk to each other on the same LAN, then the most effective way to do this is to use the host IP address to keep all the traffic local. Otherwise you might end up sending the traffic out to the WAN and then back into the LAN, adding a lot of latency and degrading quality. The host address is the best address to use in this situation.
Relay addresses require that you set up a TURN server to relay your media. Use of relay means you are no longer truly peer-to-peer. Relay use is typically temporary, to speed connection time, or a last resort when a direct peer-to-peer connection cannot be made. Relay is generally avoided since just passing along a lot of media with no added value is expensive in terms of bandwidth costs and added latency.
This is why the WebRTC designers do not consider the exposure of the host IP address a bug – they built WebRTC this way on purpose. The challenge is that this mechanism can be used to help with fingerprinting, providing a datapoint on your local addresses that you and your network administrator might not be happy about. The enormous response to last month’s Dear NY Times, if you’re going to hack people, at least do it cleanly! post illustrates the concern over this issue.
Why not just ask when someone wants your local IP address?When you want to share a video or audio stream, a WebRTC application uses the getUserMedia API. The getUserMedia API requires user consent to access the camera & microphone. However, there is no requirement to do this when using a dataChannel. So why not require consent here?
Let’s look at the use-cases. For a typical WebRTC videochat, user consent is required for the camera permission. The question “do you want to allow this site to access to your camera and microphone” is easy to understand for users. One might require consent here or impose the requirement that a mediastream originating from a camera is attached to the peerconnection.
What about a webinar? Participants might want to join just to listen. No permission is asked for currently. Is that bad? Well… is there a permission prompt when you connect to a streaming server to watch a video? No. What is the question that should be asked here?
There are use cases like file transfer which involve datachannel-only connections without the requirement of local media. Since you can upload a file to any HTTP server without the browser asking for any permission, what is the question to ask here?
Last but not least, there are use cases like peer-to-peer CDNs where the visitors of a website form a CDN to reduce the server load for high-bandwidth resources like videos. While many people claim this is a new use case enabled by WebRTC, Adobe showed this capability in Flash at MAX 2008 and 2009.
As a side note, the RTMFP protocol in Flash has leaked the same information since then. It was just a lot less obvious to acquire.
There is an additional caveat here. Adobe required user consent before using the user’s upstream to share data — even if peer-to-peer connections did not require consent. Apparently, this consent dialog completely killed the use-case for Flash, at a time when it was still the best way to deliver video. What is the question that the user must answer here? And does the user understand the question?
Photo courtesy flickr user Nisha A under Creative Commons 2.0 What are the browser vendors and the W3C doing about it?Last week Google created an extension with source code to limit WebRTC to only using public addresses. There have been some technical concerns about breaking applications and degrading performance.
Mozilla is considering similar capabilities for Firefox as discussed here. This should hit the nightly build soon.
The W3C also discussed the issue at their recent meeting in Berlin and will likely address this as part of the unsanctioned tracking group.
How do I know if a site is trying to run WebRTC?
We usually have chrome://webrtc-internals open all the time and occasionally we do see sites using WebRTC in unexpected ways. I wondered if there was an easier way to see if a site was covertly using WebRTC, so I asked Fippo how hard it would be to make an extension to show peerConnection attempts. In usual fashion he had some working sample code back to me in a couple of hours. Let’s take a look…
How the extension worksThe extension source code is available on github.
It consists of a content script, snoop.js, which is run at document start (as specified in the manifest.json file), and a background script, background.js.
The background script is sitting idly and waiting for messages sent via the Message Passing API.
When receiving a message with the right format, it prints that message to the background page’s console and shows the page action.
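A rough sketch of that background logic (a paraphrase of the idea, not the verbatim background.js):
// Listen for connections from the content script, log any 'WebRTCSnoop'
// messages and show the page action for the tab that reported them.
chrome.runtime.onConnect.addListener(function(port) {
  port.onMessage.addListener(function(msg) {
    if (!msg || msg[0] !== 'WebRTCSnoop') return; // ignore unrelated messages
    console.log('WebRTC usage detected:', msg);
    if (port.sender && port.sender.tab) {
      chrome.pageAction.show(port.sender.tab.id);
    }
  });
});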
Pretty simple, eh? You can inspect the background page console from the chrome://extensions page.
Let’s look at the content script as well. It consists of three blocks.
The first block does the important work. It overloads the createOffer, createAnswer, setLocalDescription and setRemoteDescription methods of the webkitRTCPeerConnection using a technique also used by adapter.js. Whenever one of these methods is called, it does a window.postMessage which then triggers a call to the background page.
The code snippet also shows how to listen for the ICE candidates in a way that does not interfere with the page’s own candidate handling.
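To make the technique concrete, here is a minimal sketch of the overloading idea (not the actual snoop.js code; the message format simply mirrors the 'WebRTCSnoop' check in the listener shown below):
// Wrap createOffer so every call is reported to the content script, and
// observe ICE candidates with addEventListener so the page's own
// onicecandidate handler is left untouched.
var origCreateOffer = webkitRTCPeerConnection.prototype.createOffer;
webkitRTCPeerConnection.prototype.createOffer = function() {
  window.postMessage(['WebRTCSnoop', window.location.href, 'createOffer'], '*');
  if (!this._snooping) {
    this._snooping = true;
    this.addEventListener('icecandidate', function(evt) {
      if (evt.candidate) {
        window.postMessage(['WebRTCSnoop', window.location.href,
            evt.candidate.candidate], '*');
      }
    });
  }
  return origCreateOffer.apply(this, arguments);
};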
The second part, inspired by the WebRTCBlock extension, injects the Javascript into the page by creating a script element, inserting the code and removing it immediately.
Last but not least, a message channel is set up that listens to the events generated in the first part and sends them to the background page:
var channel = chrome.runtime.connect(); window.addEventListener('message', function (event) { if (typeof(event.data) === 'string') return; if (event.data[0] !== 'WebRTCSnoop') return; channel.postMessage(event.data); });There is a caveat here. The code is not executed for iframes that use the sandbox attribute as described here so it does not detect all usages of WebRTC. That is outside our control. Hey Google… can you fix this?
Ok, but how do I install it?If you are not familiar with side-loading Chrome extensions, the instructions are easy:
- Download the zip from github
- Unzip it to a folder of your choice
- go to chrome://extensions
- Click on “Developer mode”
- Then click “Load unpacked extension”
- Find the webrtcnotify-master folder that you unzipped
View of the WebRTC Notifier extension
That’s it! If you want to see more details from the extension then it is helpful to load the extension’s console log. To do this just click on “background page” by “Inspect views”.
If you are familiar with Chrome Extensions and have improvement ideas, please contribute to the project!
What do I do if I find an offending site?No one really knows how big of a problem this is yet, so let’s try to crowd source it. If you find a site that appears to be using WebRTC to gather your IP address in a suspicious way then post a comment about it here. If we get a bunch of these and others in the community confirm then we will create a public list.
With some more time we could potentially combine Selenium with this extension to do something like a survey of the 100k most popular websites. We are not trying to start a witch hunt here, but having data to illustrate how big a problem this is would help enormously in informing the optimal path forward.
{“authors”: [“Chad Hart“, “Philipp Hancke“]}
Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.
The post How to stop a leak – the WebRTC notifier appeared first on webrtcHacks.
Wiresharking Wire
This is the next decode and analysis in Philipp Hancke’s Blackbox Exploration series conducted by &yet in collaboration with Google. Please see our previous posts covering WhatsApp, Facebook Messenger and FaceTime for more details on these services and this series. {“editor”: “chad“}
Wire is an attempt to reimagine communications for the mobile age. It is a messaging app available for Android, iOS, Mac, and now the web that supports audio calls, group messaging and picture sharing. One of its often-quoted features is its elegant design. As usual, this report will focus on the low-level VoIP aspects and leave the design aspects for users to judge.
As part of the series of deconstructions, the full analysis is available for download here, including the wireshark dumps.
Half a year after launching, the Wire Android app has been downloaded between 100k and 500k times. They also recently launched a web version, powered by WebRTC. Based on this, it seems to be stuck in what Dan York calls the directory dilemma.
What makes Wire more interesting from a technical point of view is that they’re strong proponents of the Opus codec for audio calls. Maybe there is something to learn here…
The Wire blog explains some of the problems that they are facing in creating a good audio experience on mobile and WiFi networks:
The WiFi and mobile networks we all use are “best effort” — they offer no quality of service guarantees. Devices and apps are all competing for bandwidth. Therefore, real-time communications apps need to be adaptive. Network adaptation means working around parameters such as variable throughput, latency, and variable latency, known as jitter. To do this, we need to measure the variations and adjust to them in as close to real-time as possible.
Given the preference of ISAC over Opus by Facebook Messenger, the question which led to investigating Wire was whether they can show how to successfully use Opus on mobile.
ResultsThe blog post mentioned above also describes the Wire stack as “a derivate of WebRTC and the IETF standardized Opus codec”. It’s not quite clear what exactly “derivate of WebRTC” means. What we found when looking at Wire, in comparison to the other apps reviewed, was a more “out of the box” WebRTC app, using the protocols as defined in the standards body.
Comparison with WebRTC
- SDES: the RTCWeb specifications say MUST NOT offer SDES; Wire does not offer SDES
- ICE: RFC 5245 in the specifications; RFC 5245 in Wire
- TURN usage: used as last resort in the specifications; used as last resort in Wire
- Audio codec: Opus or G.711 in the specifications; Opus in Wire
- Video codec: H.264 or VP8 in the specifications; none in Wire (yet?)
Quality of experienceAudio quality did turn out to be top notch, as our unscientific tests on various networks showed.
Testing on simulated 2G and 3G networks showed some adaptivity to the situations there.
The STUN implementation turned out to be based on the BSD-licensed libre by creytiv.com, which is compatible with both the Chrome and Firefox implementations of WebRTC. Binary analysis showed that the webrtc.org media engine along with libopus 1.1 is used for the upper layer.
Privacy

Wire is a company that prides itself on the user privacy protection that comes from having its HQ in Switzerland, yet it has its signalling and TURN servers in Ireland. They get strong kudos for using DTLS-SRTP. To sum it up, Wire offers a case study in how to fully adopt WebRTC for both web and native mobile.
The post Wiresharking Wire appeared first on webrtcHacks.
Dear NY Times, if you’re going to hack people, at least do it cleanly!
So the New York Times uses WebRTC to gather your local IP addresses… Tsahi describes the non-technical parts of the issue in his blog. Let’s look at the technical details… it turns out that the JavaScript code used is very clunky and inefficient.
First thing to do is to check chrome://webrtc-internals (my favorite tool since the hangouts analysis). And indeed, nytimes.com is using the RTCPeerConnection API. We can see a peerconnection created with the RtpDataChannels argument set to true and using stun:ph.tagsrvcs.com as a STUN server.
Also, we see that a data channel is created, followed by calls to createOffer and setLocalDescription. That pattern is commonly used to gather IP addresses.
Using Chrome’s devtools search feature it is straightforward to find out that the RTCPeerConnection is created in the following JavaScript file:
http://s.tagsrvcs.com/2/4.10.0/loaded.js
Since it is minified, here is the de-minified snippet that gathers the IPs:
Mt = function() {
    function e() {
        this.addrsFound = { "0.0.0.0": 1 }
    }
    return e.prototype.grepSDP = function(e, t) {
        var n = this;
        if (e) {
            var o = [];
            e.split("\r\n").forEach(function(e) {
                if (0 == e.indexOf("a=candidate") || 0 == e.indexOf("candidate:")) {
                    var t = e.split(" "),
                        i = t[4],
                        r = t[7];
                    ("host" === r || "srflx" === r) && (n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1))
                } else if (0 == e.indexOf("c=")) {
                    var t = e.split(" "),
                        i = t[2];
                    n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1)
                }
            }), o.length > 0 && t.queue(new y("webRTC", o))
        }
    }, e.prototype.run = function(e) {
        var t = this;
        if (c.wrip) {
            var n = window.RTCPeerConnection || window.webkitRTCPeerConnection || window.mozRTCPeerConnection;
            if (n) {
                var o = { optional: [{ RtpDataChannels: !0 }] },
                    i = [];
                -1 == w.baseDomain.indexOf("update.") && i.push({ url: "stun:ph." + w.baseDomain });
                var r = new n({ iceServers: i }, o);
                r.onicecandidate = function(n) {
                    n.candidate && t.grepSDP(n.candidate.candidate, e)
                }, r.createDataChannel(""), r.createOffer(function(e) {
                    r.setLocalDescription(e, function() {}, function() {})
                }, function() {});
                var a = 0,
                    s = setInterval(function() {
                        null != r.localDescription && t.grepSDP(r.localDescription.sdp, e), ++a > 15 && (clearInterval(s), r.close())
                    }, 200)
            }
        }
    }, e
}(),

Let’s look at the run function first. It creates a peerconnection with the optional RtpDataChannels constraint set to true. There is no reason for that; it just unnecessarily creates candidates with an RTCP component in Chrome and is ignored in Firefox.
As mentioned earlier,
stun:ph.tagsrvcs.com
is used as the STUN server. From the Wireshark dumps it is pretty easy to figure out that this is running the coturn STUN/TURN server (https://code.google.com/p/coturn/); the SOFTWARE field in the binding response is set to
Coturn-4.4.2.k3 ‘Ardee West’.
The code hooks up the onicecandidate callback and inspects every candidate it gets. Then, a data channel is created and createOffer and setLocalDescription are called to start the candidate gathering process.
Additionally, in the setInterval loop at the end of the run function, the localDescription is searched for candidates every 200ms for three seconds. That polling is pretty unnecessary: once candidate gathering is done, onicecandidate will have been called with the null candidate, so polling is not required.
Let’s look at the grepSDP function. It is called in two contexts: once in the onicecandidate callback with a single candidate, and once with the complete SDP.
It splits the SDP or candidate string into individual lines and parses each candidate line, extracting the candidate type at index 7 and the IP address at index 4.
Since without a relay server one will never get anything but host or srflx candidates, the check on the candidate type ("host" === r || "srflx" === r) is unnecessary. The rest of that line does eliminate duplicates, however.
Oddly, the code also looks for an IP in the c= line which is completely unnecessary as this line will not contain new information. Also, looking for the candidate lines in the localDescription.sdp will not yield any new information as any candidate found in there will also be signalled in the onicecandidate callback (unless someone is using a 12+ months old version of Firefox).
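For comparison, here is a minimal sketch of how the same host and srflx address gathering could be done more cleanly: no RtpDataChannels, no c= line parsing, no polling of localDescription, just the onicecandidate callback and its null-candidate end-of-gathering signal. This is an illustration, not the code the page actually runs:

// Illustrative only: gather local host/srflx addresses relying solely on
// onicecandidate; the null candidate marks the end of gathering, so no
// polling of localDescription is needed.
function gatherAddresses(stunUrl, done) {
  var PeerConnection = window.RTCPeerConnection ||
      window.webkitRTCPeerConnection || window.mozRTCPeerConnection;
  var pc = new PeerConnection({
    iceServers: stunUrl ? [{ urls: stunUrl }] : []
  });
  var found = {};

  pc.onicecandidate = function(evt) {
    if (!evt.candidate) {            // gathering is complete
      pc.close();
      done(Object.keys(found));
      return;
    }
    var parts = evt.candidate.candidate.split(' ');
    found[parts[4]] = true;          // index 4 is the connection address
  };

  pc.createDataChannel('');          // triggers gathering without getUserMedia
  pc.createOffer(function(offer) {
    pc.setLocalDescription(offer, function() {}, function() {});
  }, function() {});
}

// usage: gatherAddresses('stun:stun.l.google.com:19302', function(addrs) {
//   console.log('addresses found:', addrs);
// });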
Since the JS is minified it is rather hard to trace what actually happens with those IPs.
If you’re going to hack people, at least do it cleanly!
{“author”: “Philipp Hancke“}
The post Dear NY Times, if you’re going to hack people, at least do it cleanly! appeared first on webrtcHacks.
Can an Open Source SFU Survive Acquisition? Q&A with Jitsi & Atlassian HipChat
Atlassian’s HipChat acquired BlueJimp, the company behind the Jitsi open source project. Other than for positive motivation, why should WebRTC developers care? Well, Jitsi had its Jitsi Video Bridge (JVB), which was one of the few open source Selective Forwarding Unit (SFU) projects out there. Jitsi’s founder and past webrtcHacks guest author, Emil Ivov, was a major advocate for this architecture in both the standards bodies and in public. As we have covered in the past, SFUs are an effective way to add multiparty video to WebRTC. Beyond this one component, Jitsi was also a popular open source project for its VoIP client, XMPP components, and much more.
So, we had a bunch of questions: what’s new in the SFU world? Is the Jitsi project going to continue? What happens when an open source project gets acquired? Why the recent licensing change?
To answer these questions I reached out to Emil, now Chief Video Architect at Atlassian and Jitsi Project Lead, and Joe Lopez, Senior Development and Product Manager at Atlassian, who is responsible for establishing and managing the Atlassian Open Source program.
Photo courtesy of Flickr user Scott Cresswell
webrtcHacks: It has been a while since we have covered multi-party video architectures here. Can you give us some background on what a SFU is and where it helps in WebRTC?
Emil: A Selective Forwarding Unit (SFU) is what allows you to build reliable and scalable multi-party conferences. You can think of them as routers for video, as they receive media packets from all participants and then decide if and who they need to forward them to.
Compared to other conferencing servers, like video mixers – i.e. Multipoint Control Units (MCUs) – SFUs only need a small amount of resources and therefore scale much better. They are also significantly faster as they don’t need to transcode or synchronize media packets, so it is possible to cascade them for larger conferences.
webrtcHacks: is there any effort to standardize the SFU function?
Emil: Yes. It is true that SFUs operate at the application layer, so, strictly speaking, they don’t need to be standardized the way IP routers are. Different vendors implement different features for different use cases and things work. Still, as their popularity grows, it becomes more and more useful for people to agree on best practices for SFUs. There is an ongoing effort at the IETF to describe how SFUs generally work: draft-ietf-avtcore-rtp-topologies-update. This helps the community understand how to best build and use them.
Having SFUs well described also helps us optimize other components of the WebRTC ecosystem for them. draft-aboba-avtcore-sfu-rtp-00.txt, for example, talks about how a number of fields that encoders use are currently shared by different codecs (like VP8 and H.264) but are still encoded differently, in codec-specific ways. This is bad as it means developers of more sophisticated SFUs need to suddenly start caring about codecs and the whole point of moving away from MCUs was to avoid doing that. Therefore, works like draft-berger-avtext-framemarking and draft-pthatcher-avtext-esid aim to take shared information out of the media payload and into generic RTP header extensions.
Privacy is another issue that has seen significant activity at the IETF, specifically around improving end-to-end privacy in SFUs. All existing SFUs today need to decrypt all incoming data before they can process it and forward it to other participants. This obviously puts SFUs in a position to eavesdrop on calls, which, unless you are running your own instance just for yourself, is not a great thing. It also means that the SFU needs to allocate a lot of processing resources to transcrypting media, and avoiding this would improve scalability even further.
Selective Forwarding Middlebox diagram from https://tools.ietf.org/html/draft-ietf-avtcore-rtp-topologies-update-08
webrtcHacks: Other than providing multi-party video functionality, what else do SFUs like the JVB do?
Emil: Simple straightforward relaying from everyone to everyone – what we call full star routing – means you may end up sending a lot of traffic to a lot of people … potentially more than they care or are able to receive. There are two main ways to address that issue.
First, you can limit the number of streams that everyone receives. This means that in a conference with a hundred participants, rather than getting ninety-nine streams, everyone would only receive the streams for the last four, five or N active speakers. This is what we call Last N and it is something that really helps scalability. Right now N is a number that JVB deployments have as a config param but we are working on making it adaptive so that JVB would adapt it based on link quality.
Another way we improve bandwidth usage is by using “simulcast”. Chrome has the option of generating multiple outgoing video streams in different resolutions. This allows us to pick the higher resolution for active speakers (presumably those are the ones you would want to see in good quality) and resend it to participants who can afford the traffic. It will just relay the lower resolution to everyone else.
A few other SFUs, like the one Google Hangouts uses, implement simulcast as it saves a lot of resources on the server as well as on the client side. We are actually working on some improvements there right now.
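To make the Last N idea above a bit more concrete, here is a rough editorial sketch of the kind of selection logic it implies; this is purely illustrative and is not Jitsi Videobridge’s actual implementation:

// Illustrative Last-N selection (not Jitsi's code): forward a participant's
// video only if they are among the N most recently active speakers.
function LastNSelector(n) {
  this.n = n;
  this.lastActivity = {};   // participantId -> timestamp of last detected speech
}

LastNSelector.prototype.onSpeechDetected = function(participantId, now) {
  this.lastActivity[participantId] = now;
};

LastNSelector.prototype.shouldForward = function(participantId) {
  var activity = this.lastActivity;
  var ranked = Object.keys(activity).sort(function(a, b) {
    return activity[b] - activity[a];   // most recently active first
  });
  return ranked.slice(0, this.n).indexOf(participantId) !== -1;
};

// usage:
// var lastN = new LastNSelector(4);
// lastN.onSpeechDetected('alice', Date.now());
// if (lastN.shouldForward('alice')) { /* relay alice's video packets */ }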
webrtcHacks: Joe – now that you have had a chance to get to know the Blue Jimp/Jitsi team and technology, can you give an update on your plans to incorporate Jitsi into HipChat and Atlassian?
Joe: Teams of all sizes use HipChat every day to communicate in real-time all over the world. Teams are stronger when they feel connected, and video is an integral part of that. Users have logged millions of minutes of 1:1 video using HipChat, which helps teams collaborate and build their company culture regardless of whether they work in the same location. With Jitsi we can give users so much more! We’re in the process of developing our own video, audio, and screen-sharing features using Jitsi Video Bridge. It’s a little early for us to comment on exact priority order and timing, but our objective is to make it easier for teams to connect effortlessly anywhere, anytime, on any device.
webrtcHacks: what is the relationship between the core Atlassian team, HipChat, and Jitsi? How does Jitsi fit in your org structure?
Joe: The Jitsi team joined Atlassian as part of HipChat team and makes up the core of our video and real-time communication group. They are experts on all things RTC and we’re now leveraging their expertise. The Jitsi developers are relocating to our Austin offices and will keep working on Jitsi and Atlassian implementations.
webrtcHacks: Can you disclose the terms of the deal?
Emil: I can only say that the acquisition was a great thing for both BlueJimp and Jitsi.
Joe: Same here. We really wanted to add to our team the sort of expertise and technology that BlueJimp brings.
webrtcHacks: I had to try... So, how big is the Jitsi community? Do you know how many active developers you have using your various projects?
Emil Ivov – founder of the Jitsi project
Emil: We haven’t been tracking our users so it’s hard to say. In terms of development, a lot of the work on Jitsi Videobridge and Jitsi Meet is done by BlueJimp, but we are beginning to get some pretty good patches. Hopefully, the trend will continue in that direction.
As for the Jitsi client, we haven’t had a lot of time for it in the past couple of years so community contributions there are likely surpassing those of the company.
webrtcHacks: Your public statements indicate the primary focus of the acquisition was the Jitsi Videobridge. Jitsi had many other popular products, including the Jitsi client, a TURN server, and many others. What is the future of these other projects? Does Atlassian have justification to continue to maintain these elements?
Joe: Our plan is to continue developing the Jitsi Videobridge as well as the other projects including libjitsi, Jitsi Meet, Jirecon, Jigasi and other WebRTC related projects in the Jitsi community. We’re also going to continue providing the build infrastructure for the Jitsi client just as BlueJimp has been doing. But, we don’t have immediate plans for substantial development on the purely client-side.
Emil: It’s worth pointing out that the heart of Jitsi Videobridge, libjitsi, is something it shares with the Jitsi client. There’s also a lot of code that Jigasi, our SIP gateway, imports directly from the client. So, while the upper UX layers in the client are not our main focus, we will continue working heavily on the core.
Above all, however, the developer community around the client is much older and more mature than that of our newer projects. There are developers like Ingo Bauersachs or Danny van Heumen, for example, who have been long involved with the project and who continue working on it. Developers coming and going or changing focus is a natural part of any FLOSS project, and as Joe mentioned, we are going to continue providing the logistics.
webrtcHacks: While Atlassian has initiatives to help open source projects, your core products are not based on open source and you are not known for having many open source projects of your own. Can you address this concern? Is the Jitsi acquisition an attempt to change this? If so, what else has Atlassian done to accommodate more of an open sourcing culture internally?
Joe: Actually, Atlassian uses, supports and develops a number of open source projects. We just haven’t been very vocal about it. That will be changing soon. In addition to managing our real-time communication project, I’m also responsible for our open source program. We’re in the process of restructuring how Atlassian supports open source, and the Jitsi project is one of the first initiatives in our plan. We see Jitsi as a great opportunity, and we are *very* serious about making this project a success. We’ll have more to say about open source later this year.
Finally we think that Jitsi Videobridge is the most advanced open source Selective Forwarding Unit, which puts the team in a unique position to contribute to the WebRTC ecosystem. We are very keen on doing this.
webrtcHacks: last week you moved all the Jitsi licenses from LGPL to Apache.
Emil, why did you choose LGPL in the first place?
Emil: Ever since we started the project, one of our primary motivations had been to get our code in the hands of as many people as possible, so we wanted to lower adoption barriers. Licensing is one of the important components here and, during its very early stages, around 2003, 2004, Jitsi (then SIP Communicator) started with an Apache license.
Then we had to think a little bit more seriously about how we were going to make a living off of our work, because otherwise there wouldn’t have been any project at all. That’s when we thought we might be better off if BlueJimp had protection and decided to switch to LGPL.
So, although it wasn’t our first choice, it did give us a certain measure of protection.
I am very happy that Atlassian has decided to take the risk and relinquish that protection. I firmly believe this is the best option for Jitsi and its users.
Joe Lopez of Atlassian at an Austin WebRTC Meetup
webrtcHacks: Joe – why the move to Apache? Why not other licenses like MIT, BSD, etc?
Joe: As for why Apache over MIT/BSD, it’s actually very simple: like at many organizations, Apache is our preferred license when using other people’s open source work. So, it made sense to us that this is what we should choose for our projects. We talked with a number of people internally and externally, and even went so far as to evaluate all licenses. But our technical and legal experts found Apache to be a tried and tested license respected by many organizations for its terms and clarity. At the end of the day we chose Apache because it best fit our organization and others.
webrtcHacks: how do you expect this license change will impact existing Jitsi users?
Emil: Very positively! A number of developers and companies are looking at using Jitsi Videobridge for their new startups, products and services. We expect the Apache license to make Jitsi significantly more appealing to them.
When you are integrating a technology, the more permissive the license is, the less it precludes you from certain choices in the future. When launching a new service or a product, it is very hard to know that you would never need to keep some parts of it proprietary. This is especially true for startups, and I am saying it from experience.
You simply need to keep that option open because sometimes it makes all the difference between a company closing its doors or thriving for years.
That’s the liberty that you get from Apache.
webrtcHacks: the Meet application was previously a MIT license. How are you handling that? Some argue that going from MIT to Apache is a step in the wrong direction.
Emil: That’s true – the first lines of code in the Jitsi Meet project did come under MIT. But there’s not much to handle there. The MIT license allows for code to be redistributed under any other license, including Apache, and Joe already pointed out why we think Apache is a better choice.
Joe: There is also a purely practical side to this. As I mentioned, we’re in the process of restructuring our open source story, and Jitsi is one of the first in this effort. So, it’s important for us to apply the same policy everywhere. The more exceptions we have, the harder it will be to manage and ensure a good experience for any Atlassian contributor.
webrtcHacks: Jitsi was known in the past for soliciting community input before making major decisions. Why didn’t you announce the plans to change your licensing model before the actual change this time?
Emil: Knowing the project as I do, it just never crossed my mind that this would be a problem for anyone. Throughout the past years I only heard concerns from people that found the LGPL too restrictive for them, so I only expected positive opinions. And the overwhelming majority have reacted positively.
For the few people who have raised concerns, let me reiterate that we think this is the best possibility for Jitsi, and we also need to be practical and use a uniform license for all Atlassian projects.
People who feel that LGPL was a better match for them are completely free to take last week’s version of the project and continue maintaining it under that license.
webrtcHacks: what level of transparency can the Jitsi community expect going forward?
Emil: This is actually one of the main ways in which BlueJimp’s acquisition is going to be beneficial to Jitsi.
A lot of the work that BlueJimp did in the past was influenced by customer demand. As a result, we never really knew exactly what to expect a month in the future. This is now over. Today it is much easier for us to define a roadmap and stick to it. Obviously we will still remain flexible as we listen to requests and important use cases from the community, but we are going to have significantly more visibility than before.
webrtcHacks: the GitHub charts indicate a slowdown in activity vs. last year. Was this due to distraction from the acquisition? What level of public commits should we expect out of the new Atlassian Jitsi team going forward?
Emil: As with any project, there’s a lot that needs to be done in the early stages. Ninety percent of what you do is push code. This gradually changes with time as the problems you are solving become more complex. At that point you spend a lot of time thinking, testing and debugging. As a result your code output diminishes.
Take our joint efforts with Firefox, for example. This took a lot of time looking through wireshark traces, debugging and making small adjustments. The time it took to actually write the code was negligible compared to everything else we needed to do. Still, adding Firefox compatibility was important to Jitsi, and that happened within Atlassian.
In addition, the entire team is relocating to Austin, and a relocation can be time-consuming.
But, there haven’t been any private commits, if that’s what you are thinking of :).
Illustration of a Selective Forwarding Unit (SFU) architecture with 3 participants
webrtcHacks: can you share some of your roadmap & plans for Jitsi?
Emil: Gladly! We are really excited to continue working on what makes Jitsi Videobridge the most advanced SFU out there. This includes things like bandwidth adaptivity, for instance. We have big changes coming to our Simulcast and Last N support. Scalability and reliability will also be a main focus in the next months. This includes being able to do more conferences per deployment but also more people per conference. We are also going to be working on mobile, to make it easier for people to use the project on iOS and Android. Supporting other browsers and switching to Maven are also on the roadmap.
We’re not ready to say when, or in what order these things will be happening – but they’re coming.
webrtcHacks: should we expect to see the Jitsi source move from github to bitbucket?
Joe: We’re keeping Jitsi on GitHub, since they excel at being a place for open source projects. Bitbucket is better designed for software teams within organizations that want greater control over their source code, to restrict access within their organization, teams, or even to specific individuals. However, one area that we do want to address is issue tracking. This has been a source of pain for Jitsi, so we’re considering moving issue tracking to JIRA, Atlassian’s issue tracking and management software, which will provide us with everything we need for better project management.
We will be discussing this with the community in the coming weeks.
{“interviewer”: “chad“}
{“interviewees”: [“Emil Ivov“, “Joe Lopez“]}
The post Can an Open Source SFU Survive Acquisition? Q&A with Jitsi & Atlassian HipChat appeared first on webrtcHacks.
Developing mobile WebRTC hybrid applications
There are a lot of notable exceptions, but most WebRTC developers start with the web because, well, WebRTC does start with “web” and development is much easier there. Market realities tell a very different story – there is more traffic on mobile than desktop and this trend is not going to change. So the next phase in most WebRTC deployments is inevitably figuring out how to support mobile. Unfortunately for WebRTC, that has often meant finding the relatively rare native iOS and Android developer.
The team at eFace2Face decided to take a different route and build a hybrid plugin. Hybrid apps allow web developers to use their HTML, CSS, and JavaScript skills to build native mobile apps. They also open sourced the project and verified its functionality against the webrtc.org AppRTC reference. We asked them to give us some background on hybrid apps and to walk us through their project.
{“intro-by”: “chad“}
Hybrid apps for WebRTC (image source)
When deciding how to create a mobile application using WebRTC there is no obvious choice. Several items should be taken into consideration when facing this difficult decision, such as any existing code base and the expertise, resources, and knowledge available. Maintenance and support are also very important factors given the fragmentation of the mobile environment.
At eFace2Face we wanted to extend our service to mobile devices. We decided to choose our own path: exploring and filling in the gaps (developing new tools when needed) in order to create the solution that fitted us best. This post shares some of the knowledge and expertise we gained the hard way while doing so. We hope you find it useful!
Types of mobile apps (image source)
What’s a hybrid application?

There are two main approaches to how hybrid apps are built:
- WebView: Put simply, this is an HTML5 web application that is bundled inside a native app and uses the device’s web browser to display it. The development framework for the application provides access to the device’s functions (camera, address book, accelerometers, etc.) in the form of JavaScript APIs through the use of plugins. It should also be totally responsive and use native-like resources to get a UX similar to a real app. Examples include Cordova/PhoneGap, Trigger.io, Ionic, and Sencha (the latter two being like Cordova with steroids).
- Compilation: You can choose from several different languages (like C#) and the code gets compiled to a native application for each supported platform. Examples are Xamarin, Appcelerator, Embarcadero FireMonkey, or RubyMotion.
Simple hybrid app example using PhoneGap (source)
Creating a hybrid HTML5 app is the most extensive alternative and the one we prefer because it uses web-specific technologies. You can get a deeper overview of native vs. HTML5 (and hybrid applications) in a recent blog post at Android Authority.
Hybrid App Pros & Cons

Pros:

- Hybrid apps are as portable as HTML5 apps. They allow code reuse across platforms, with the framework handling all platform-specific differences.
- A hybrid app can be built at virtually the same speed at which an HTML5 app can be built. The underlying technology is the same.
- A hybrid app can be built for almost the same cost as an HTML5 app. However, most frameworks require a license, which adds an extra development cost.
- Hybrid apps can be made available and distributed via the relevant app store, just like native apps.
- Hybrid apps have greater access to native hardware resources than plain HTML5 apps, usually through the corresponding framework’s own APIs.
Cons:

- Not all native hardware resources are available to hybrid apps. The available functionality depends on the framework used.
- Hybrid apps appear to the end user as native apps, but run significantly slower than native apps. The same restriction that gets HTML5 apps rejected from Apple’s App Store for being too slow and unresponsive also applies to hybrid apps. Rendering complex CSS layouts will take longer than rendering a corresponding native layout.
- Each framework has its own unique idiosyncrasies and ways of doing things that are not necessarily useful outside of the given framework.
From our point of view, a typical WebRTC application is not really graphic-intensive (i.e. it is not, for instance, a game with lots of animations and 3D effects). Most of the complex processes are done internally by the browser, not in JavaScript, so a graphical UX interface should be perfectly doable on a hybrid application and run without any significant perceptible slowdown. Instagram is a good example of a well-known hybrid app that uses web technologies in at least some of its components.
WebRTC on native mobile: current status

Native support in Android and iOS is a bit discouraging. Apple does not support it at all and has published no information about when, or even whether, it will. On Android, the native WebView has supported WebRTC since version 4.4 (but be cautious, as that WebView is based on Chromium 36) and in 5.0 onwards.
Browser vendors fight (source)
Note that there are no “native WebRTC” APIs on Android or iOS yet, so you will have to use Google’s WebRTC library. Justin Uberti (@juberti) provides a very nice overview of how to do this (go here to see the slides).
Solutions

Let’s take a look at the conclusions of our research.
Android: Crosswalk

On Android, using the native WebView seems like a good approach; in fact, we used it during our first attempt to create our application. But then we decided to switch to Intel’s Crosswalk, which includes what’s best described as a “full Chrome browser”. It actually allows us to use a fully updated version of native Chromium instead of the WebView.
These were our reasons for choosing Crosswalk:
- Fully compatible source code: You only have to handle a single Chromium version across all Android devices. More importantly, it has the latest, regularly updated WebRTC APIs.
- Backward compatibility: According to developer.android.com, approximately 48% of Android devices currently in use are running Android versions below 4.4. While most of them don’t have hardware powerful enough to run WebRTC (either native or hybrid), you shouldn’t exclude this market.
- Fragmentation: Different versions of Android mean different versions of WebView. Given the speed at which WebRTC is evolving, you will have difficulties dealing with version fragmentation and supporting old versions of WebView.
- Performance: It seems you can get up to a 10x improvement in both HTML/CSS rendering and JavaScript performance, along with better CSS correctness.
An advanced reader could think: “Ok, this is cool but I need to use different console clients (Cordova and Crosswalk) to generate my project, and I don’t like the idea of that.” You’re right, it would be a hassle, but we also found another trick here. This project allows us to add Crosswalk support to a Cordova project; it uses a new Cordova feature to provide different engines like any other plugin. This way we don’t need to have different baselines in the source code.
iOS: Cordova plugin

As explained before, there are frameworks that provide hybrid applications with device functionality via plugins. You can use them in your JavaScript code, but they are implemented using native code. So, it should be possible to add the missing WebRTC JavaScript APIs.
There are several options available, but most of them provide custom APIs or are tightly coupled with some proprietary signaling from a service provider. That’s the reason that we released an open source WebRTC Cordova plugin for iOS.
The plugin is built on top of Google’s native WebRTC code and exposes the W3C WebRTC APIs. Also, as it is a Cordova plugin, it allows you to have the same Cordova application running on Android with Crosswalk, and on iOS with the WebRTC plugin. And both of them reuse all of the code base you are already using for your web application.
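As a rough illustration of what that reuse looks like in application code (assuming the plugin’s registerGlobals() helper, which maps its implementation onto the standard global names; the exact API surface depends on the plugin and browser versions):

// Sketch: the same W3C-style WebRTC code running in the Cordova app.
// Assumes cordova-plugin-iosrtc provides registerGlobals() to expose the
// standard names on iOS; on Android/Crosswalk the browser APIs already exist.
document.addEventListener('deviceready', function() {
  if (window.cordova && cordova.plugins && cordova.plugins.iosrtc) {
    cordova.plugins.iosrtc.registerGlobals();   // only needed on iOS
  }

  // From here on the code matches the web version of the app.
  navigator.mediaDevices.getUserMedia({ audio: true, video: true })
    .then(function(stream) {
      var pc = new RTCPeerConnection({ iceServers: [] });
      stream.getTracks().forEach(function(track) {
        pc.addTrack(track, stream);
      });
      // ...continue with the usual offer/answer signaling
    })
    .catch(function(err) {
      console.error('getUserMedia failed:', err);
    });
}, false);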
Show me the code!

“Yes, I have heard this already”, you might say, so let’s get some hands-on experience. In order to demonstrate that it’s trivial to reuse your current code and have your mobile application running in a matter of days (if not hours), we decided to take Google’s AppRTC HTML5 application and create a mobile application using the very same source code.
You can find the iOS code on GitHub. Here are the steps required to get everything we’re talking about working in minutes:
- Get the source code: “git clone https://github.com/eface2face/iOSRTCApp; cd iOSRTCApp”
- Add both platforms; all required plugins are installed automatically because of their inclusion in the “config.xml” file: “cordova platform add ios android”
- Run as usual: “cordova run --device”
- Once running, enter the same room as the one that’s already been created via web browser at https://apprtc.appspot.com/ and enjoy!
Call between iOSRTCApp on iPad and APPRTC on browser
We needed to make some minor changes in order to make it work properly in the Cordova environment. None of these changes required more than a couple of js/html/css lines:
- Due to Cordova’s nature, we had to add its structure to the project. Some plugins are required to get native features and permissions. The scripts js/apprtc.debug.js and js/appwindow.js are loaded once Cordova’s deviceready event is fired. This is necessary since the first one relies on the existing window.webkitRTCPeerConnection and navigator.webkitGetUserMedia, which are not set by cordova-plugin-iosrtc until the event fires.
- The webrtcDetectedVersion global variable is hardcoded to 43 as the AppRTC JavaScript code expects the browser to be Chrome or Chromium, and fails otherwise.
- In order to correctly place video views (iOS native UIView elements), the plugin function refreshVideos is called when local or remote video is actually displayed. This is because the CSS video elements use transition effects that modify their position and size for a duration of 1 second.
- A new CSS file css/main_overrides.css changes the properties of video elements. For example, it sets opacity to 0.85 in local-video and remote-video so HTML call controls are shown even below the native UIView elements rendering the local and remote video.
- Safari crashes when calling plugin methods from within WebSocket events (“onopen”, “onmessage”, etc.). Instead, you have to run a setTimeout within the WebSocket event if you need to call plugin methods from it. We loaded the provided ios-websocket-hack.js script into our Cordova iOS app and that solved it (a minimal sketch of the workaround follows after this list).
- A polyfill for window.performance.now(), which is used by the AppRTC code.
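As referenced in the WebSocket item above, a minimal sketch of that setTimeout workaround might look like the following; it is illustrative only, since the app actually uses the provided ios-websocket-hack.js script (the signaling URL and handler below are hypothetical):

// Sketch: defer plugin calls out of WebSocket event handlers to avoid the
// crash described above. Illustrative only; the signaling URL and the
// handleSignalingMessage() helper are hypothetical.
var ws = new WebSocket('wss://example.org/signaling');

ws.onmessage = function(event) {
  // Do not call plugin methods synchronously in this handler.
  setTimeout(function() {
    handleSignalingMessage(JSON.parse(event.data));   // safe to touch the plugin now
  }, 0);
};

function handleSignalingMessage(msg) {
  // ...create/answer offers, add ICE candidates, etc.
}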
Deciding whether to go hybrid or native for your WebRTC app is up to you. It depends on the kind of resources and relevant experience your company has, the kind of application that you want to implement, and the existing codebase and infrastructure you already have in place. The good news is our results show that using WebRTC is not a key factor in this decision, and you can have the mobile app version of your WebRTC web service ready in much less time than you probably expected.
References

- Hybrid Mobile Apps: Providing A Native Experience With Web Technologies
- “Diego Ferreiro & Norbert Hu: Building a performant HTML5/Hybrid Ap”
- Your favourite app isn’t native
- Crosswalk comes to Ionic
{“authors”: [“Jesus Perez“, “Iñaki Baz“, “Sergio Garcia Murillo“]}
The post Developing mobile WebRTC hybrid applications appeared first on webrtcHacks.