webrtchacks

Subscribe to webrtchacks feed
Guides and information for WebRTC developers
Updated: 35 min 59 sec ago

Traffic Encryption

Wed, 09/23/2015 - 20:35

So I talked about Skype and Viber at KrankyGeek two weeks ago. Watch the video on youtube or take a look at the slides. No “reports” or packet dumps to publish this time, mostly because it is very hard to draw conclusions from the results.

The VoIP services we have looked at so far which use the RTP protocol for transferring media. RTP uses a packet header which is not encrypted and contains a number of attributes such as the payload type (identifying the codec used), a synchronization source (which identifies the source of the stream), a sequence number and a timestamp. This allows routers to identify RTP packets and prioritize them. This also allows someone monitoring all network traffic (“Pervasive Monitoring“) to easily identify VoIP traffic. Or someone wiretapping your internet connection.

Skype and Viber encrypt all packets. Does that make them them less susceptible for this kind of attack?

Bear with me, the answer is going to be very technical. tl;dr:

  • it is still pretty easy to determine that you are making a call.
  • it is also pretty easy to tell if you muted your microphone.
  • it is pretty apparent whether this is a videochat.
Skype

Not expecting to find much, I ran a standard set of scenarios with Skype of Android and iOS similar to those used in the Whatsapp analysis.
A first look did not show much. Luckily, when analyzing WhatsApp I had developed some tooling to deal with RTP. I modified those tools, removing the RTP parser, and was greeted with these graphs:

While the bitrate alone (blue is my ipad3 with a 172.16. ip address, black is my old Android phone) is not very interesting, the packet rate of exactly 50 packets was interesting. Also, the packet length distribution was similar to Opus. As I figured out later from the integrated debugging (on the Android device, this must be too technical for iOS users!), this was the Silk codec. In fact, if you account for some overhead the black distribution matches what we saw from WhatsApp earlier and what is now known to be Opus at 16khz or 8khz.
So the encryption did not change the traffic pattern. Nor does it hide the fact that a call is happening.

Keepalive Traffic

When muting the audio on one device, one can even see regular spikes in the traffic every then seconds. Supposedly, those are keepalive packets.

Telling audio and video traffic apart

Let’s look at some video traffic. Note the two distinct distributions in the third graph? Let’s suppose that the left one is audio and everything else is video. This works well enough looking at the last graph which shows the ‘audio’ traffic in green and orange respectively.

The accuracy could possibly be improved a little by looking at the number of packets which is pretty much constant for audio.
In RTP, we would use the synchronization source (SSRC) field from the header to accomplish this. But that just makes things easier for routers.

Relay traffic

Last but not least relays. When testing this from Europe, I was surprised to see my traffic being routed through Redmond, Washington.

This is quite interesting in comparison to the first graph. The packet rate stays roughly the same, but the bitrate doubles to 100 kilobits/second. That is quite some overhead compared to the standard TURN protocol which has negligible overhead. The packet length distribution is shifted to the right and there are a couple of very large packets. Latency was probably higher but this is very hard to measure.

Viber

While I got some pretty interesting results from Skype, Viber turned out to harder. Thanks to the tooling it took now only a matter of seconds to discover that, like Whatsapp, it uses a relay server to help with call establishment:

Blue traffic is captured locally before it is sent to the peer, the black and green traffic is received from the remote end. The traffic shown in black almost vanishes after a couple of initial spikes (which contain very large packets at a low frequency). Visualizations of this kind are a lot easier to understand than the packet dumps captured with Wireshark.

And for the sake of completeness, muting audio on both sides showed keepalive traffic, visible as tiny period spikes in this graph:

Conclusions

VoIP security is hard. And this not really news, attacks on encrypted VoIP traffic have been known for quite a while, see e.g. this paper from 2008 and the more recent ‘Phonotactic Reconstruction’ attacks.

The fact that RTP does not encrypt the header data makes it slightly easier to identify, but it seems that a determined attacker could have come to the same conclusions about the encrypted traffic of services like Skype. Keep that in mind when talking about the security of your service. Also, keep the story of the ECB penguin in mind.

Or, as Emil Ivov said about the security of peer-to-peer: “Unless there is a cable going between your computer and the other guys computer and you can see the entire cable, then you’re probably in for a rude awakening”.

{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post Traffic Encryption appeared first on webrtcHacks.

First steps with ORTC

Fri, 09/18/2015 - 21:20

ORTC support in Edge has been announced today. A while back, we saw this on twitter:

Windows Insider Preview build 10525 is now available for PCs: http://t.co/zeXQJocgLs This release lays groundwork for ORTC in Microsoft Edge

— Microsoft Edge Dev (@MSEdgeDev) August 18, 2015

“This release [build 10525] lays the groundwork for ORTC” was quite an understatement. It was considered experimental and while the implementation still differs from the specification (which is still work in progress) slightly, it already worked and as a developer you can get familiar with how ORTC works and how it is different from the RTCPeerConnection API.
If you want to test this, please use builds newer than 10547. Join the Windows Insider Program to get them and make sure you’re on the fast ring.

The approach taken differs from the RTCPeerConnection way of giving you a blob that you exchange as this WebRTC PC1 sample shows quite well. It’s more about giving you the building blocks.

In ORTC, you have to incrementally build up things. Let’s walk through the code (available on github):

Setting up a Peer to Peer connection

var gatherer1 = new RTCIceGatherer(iceOptions); var transport1 = new RTCIceTransport(gatherer1); var dtls1 = new RTCDtlsTransport(transport1);

There are three elements on the transport side:
* the RTCIceGatherer which gathers ICE candidates to be sent to the peer,
* the RTCIceTransport where you add the candidates from the peer,
* the DtlsTransport which is sitting on top of the ICE transport and deals with encryption.

As in the peerConnection API, you exchange the candidates:

// Exchange ICE candidates. gatherer1.onlocalcandidate = function (evt) { console.log('1 -> 2', evt.candidate); transport2.addRemoteCandidate(evt.candidate); }; gatherer2.onlocalcandidate = function (evt) { console.log('2 -> 1', evt.candidate); transport1.addRemoteCandidate(evt.candidate); };

Also, you need to exchange the ICE parameters (usernameFragment and password) and start the ICE transport:

transport1.start(gatherer1, gatherer2.getLocalParameters(), 'controlling'); transport1.onicestatechange = function() { console.log('ICE transport 1 state change', transport1.state); };

This is done with SDP in the PeerConnection API. One side needs to be controlling, the other is controlled.

You also need to start the DTLS transport with the remote fingerprint and hash algorithm:

dtls1.start(dtls2.getLocalParameters()); dtls1.ondtlsstatechange = function() { console.log('DTLS transport 1 state change', dtls1.state); };

Once this is done, you can see the candidates being exchanged and the ICE and DTLS state changes on both sides.

Cool. Now what?

Sending a MediaStream track over the connection

Let’s send a MediaStream track. First, we acquire it using the promise-based navigator.mediaDevices.getUserMedia API and attach it to the local video element.

// call getUserMedia to get a MediaStream. navigator.mediaDevices.getUserMedia({video: true}) .then(function(stream) { document.getElementById('localVideo').srcObject = stream;

Next, we determine the send and receive parameters. This is where the PeerConnection API does the “offer/answer” magic.
Since our sending capabilities match the receiving capabilities, there is little we need to do here.
Some black magic is still involved, check the specification for the gory details.

// Determine RtpCodecParameters. Consider this black magic. var params = RTCRtpReceiver.getCapabilities('video'); params.muxId = 1001; params.encodings = [{ ssrc: 1001, codecPayloadType: 0, fec: 0, rtx: 0, priority: 1.0, maxBitrate: 2000000.0, minQuality: 0, framerateBias: 0.5, resolutionScale: 1.0, framerateScale: 1.0, active: true, dependencyEncodingId: undefined, encodingId: undefined }]; // We need to transform the codec capability into a codec. params.codecs.forEach(function (codec) { codec.payloadType = codec.preferredPayloadType; }); params.rtcp = { cname: "", reducedSize: false, ssrc: 0, mux: true }; console.log(params);

Then, we start the RtpReceiver with those parameters:

// Start the RtpReceiver to receive the track. receiver = new RTCRtpReceiver(dtls2, 'video'); receiver.receive(params); var remoteStream = new MediaStream(); remoteStream.addTrack(receiver.track); document.getElementById('remoteVideo').srcObject = remoteStream;

Note that the Edge implementation is slightly different from the current ORTC specification here since you need to specify the media type as second argument when creating the RtpReceiver.
We create a stream to contain the track and attach it to the remote video element.
Last but not least, let’s send the video track we got:

sender = new RTCRtpSender(stream.getVideoTracks()[0], dtls1); sender.send(params);

That’s it. It gets slightly more complicated when you have to deal with multiple tracks, and have to actually negotiate capabilities in order to interop between Chrome and Edge. But that’s a longer story…

{“author”: “Philipp Hancke“}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post First steps with ORTC appeared first on webrtcHacks.

Reacting to React Native for native WebRTC apps (Alexey Aylarov)

Tue, 09/15/2015 - 23:47

It turns out people like their smartphone apps, so that native mobile is pretty important. For WebRTC that usually leads to venturing outside of JavaScript into the world of C++/Swift for iOS and Java for Android. You can try hybrid applications (see our post on this), but many modern web apps applications often use JavaScript frameworks like AngularJS, Backbone.js, Ember.js, or others and those don’t always mesh well with these hybrid app environments.

Can you have it all? Facebook is trying with React which includes the ReactJS framework and  React Native for iOS and now Android too. There has been a lot of positive fanfare with this new framework, but will it help WebRTC developers? To find out I asked VoxImplant’s Alexey Aylarov to give us a walkthrough of using React Native for a native iOS app with WebRTC.

{“editor”: “chad hart“}

If you haven’t heard about ReactJS or React Native then I can recommend to check them out. They already have a big influence on a web development and started having influence on mobile app development with React Native release for iOS and an Android version just released. It sounds familiar, doesn’t it? We’ve heard the same about WebRTC, since it changes the way web and mobile developers implement real-time communication in their apps. So what is React Native after all?

“React Native enables you to build world-class application experiences on native platforms using a consistent developer experience based on JavaScript and React. The focus of React Native is on developer efficiency across all the platforms you care about — learn once, write anywhere. Facebook uses React Native in multiple production apps and will continue investing in React Native.”
https://facebook.github.io/react-native/

I can simplify it to “one of the best ways for web/javascript developers to build native mobile apps, using familiar tools like Javascript, NodeJS, etc.”. If you are connected to WebRTC world (like me) the first idea that comes to your mind when you play with React Native is “adding WebRTC there should be a big thing, how can I make it?” and then from React Native documentation you’ll find out that there is a way to create your own Native Modules:

Sometimes an app needs access to platform API, and React Native doesn’t have a corresponding module yet. Maybe you want to reuse some existing Objective-C, Swift or C++ code without having to reimplement it in JavaScript, or write some high performance, multi-threaded code such as for image processing, a database, or any number of advanced extensions.

That’s exactly what we needed! Our WebRTC module in this case is a low-level library that provides high-level Javascript API for React Native developers. Another good thing about React Native is that it’s an open source framework and you can find a lot of required info on GitHub. It’s very useful, since React Native is still very young and it’s not easy to find the details about native module development. You can always reach out to folks using Twitter (yes, it works! Look for #reactnative or https://twitter.com/Vjeux) or join their IRC channel to ask your questions, but checking examples from GitHub is a good option.

React Native’s module architecture

Native modules can have C/C++ , Objective-C, and Javascript code. This means you can put the native WebRTC libraries, signaling and some other libs written in C/C++ as a low-level part of your module, implement video element rendering in Objective-C and offer Javascript/JSX API for react native developers.

Technically low-level and high-level code is divided in the following way:

  1. you create Objective-C class that extends React’s RCTBridgeModule class and
  2. use RCT_EXPORT_METHOD to let Javascript code work with it.

While in Objective-C you can interact with the OS, C/C++ libs and even create iOS widgets. The Ready-to-use native module(s) can be distributed in number of different ways, the easiest one being via a npm package.

WebRTC module API

We’ve been implementing a React Native module for our own platform and already knew which of our API functions we would provide to Javascript. Creating a WebRTC module that is independent of signaling that can be used by any WebRTC developer is a much more complicated problem.

We can divide the process into few parts:

Integration with WebRTC

Since webRTC does not limit developers how to discover user names and network connection information, this signaling can be done in multiple ways. Google’s WebRTC implementation known as libwebrtc. libwebrtc has a built-in library called libjingle that provides “signaling” functionality.

There are 3 ways how libwebrtc can be used to establish a communication:

  1. libjingle with built-in signaling

This is the simplest one leveraging libjingle. In this case signaling is implemented in libjingle via XMPP protocol.

  1. Your own signaling

This is a more complicated one with signaling on the application side. In this case you need to implement SDP and ICE candidates exchange and pass data to webrtc. One of popular methods is to use some SIP library for signaling.

  1. Application-controlled RTC

For the hardcore you can avoid using signaling altogether This means the application should take care of all RTP session params: RTP/RTCP ports, audio/video codecs, codec params, etc. Example of this type of integration can be found in WebRTC sources in WebRTCDemo app for Objective-C (src/talk/app/webrtc)

Adding Signaling

We used the 2nd approach in our implementation. Here are some code examples for making/receiving calls (C++):

  1. First of all, create Peer Connection factory:
    peerConnectionFactory = webrtc::CreatePeerConnectionFactory(…);
  2. Then creating local stream (we can set if it will be voice or video call):
    localStream =  peerConnectionFactory->CreateLocalMediaStream(uniqueLabel); localStream->AddTrack(audioTrack);     if (withVideo)        localStream->AddTrack(videoTrack);
  3. Creating PeerConnection (set STUN/TURN servers list, if you are going to use it)
    webrtc::PeerConnectionInterface::IceServers servers; webrtc::CreateSessionDescriptionObserver* peerConnectionObserver; peerConnection = peerConnectionFactory ->CreatePeerConnection(servers, …., peerConnectionObserver);
  4. Adding local stream to Peer Connection:
    peerConnection->AddStream(localStream);
  5. Creating SDP:
    webrtc::CreateSessionDescriptionObserver* sdpObserver;

    1. For outbound call:
      1. Creating SDP:
        peerConnection->CreateOffer(sdpObserver);
      2. Waiting for SDP from remote peer (via signaling) and pass it to Peer Connection:
        peerConnection->SetRemoteDescription(remoteSDP);
    2. In case of inbound call we need to set remote SDP before setting local SDP:
      peerConnection->SetRemoteDescription(remoteSDP); peerConnection->CreateAnswer(sdpObserver);
  6. Waiting for events and sending SDP and ICE-candidate info to remote party (via signaling):
    webrtc::CreateSessionDescriptionObserver::OnSuccess(webrtc::SessionDescriptionInterface* desc) { if (this->outgoing) sendOffer();         else         sendAnswer(); } webrtc::CreateSessionDescriptionObserver::OnIceCandidate(const webrtc::IceCandidateInterface* candidate)  {         sendIceCandidateInfo(candidate); }
  7. Waiting for ICE candidates info from remote peer  and when it arrives pass it to Peer Connection:
    peerConnection->AddIceCandidate(candidate);
  8. After a successful ICE exchange (if everything is ok) connection/call is established.
Integration with React Native

First of all we need to create react-native module (https://facebook.github.io/react-native/docs/native-modules-ios.html) , where we describe the API and implement audio/video calling using WebRTC (Obj-C , iOS):

@interface YourVoipModule () { } @end @implementation YourVoipModule RCT_EXPORT_MODULE(); RCT_EXPORT_METHOD(createCall: (NSString *) to withVideo: (BOOL) video ResponseCallback: (RCTResponseSenderBlock)callback) { NSString * callId = [createVoipCall: to withVideo:video]; callback(@[callId]); }

If want to to support video calling we will need an additional component to show the local camera (Preview) or remote video stream (RemoteView):

@interface YourRendererView : RCTView @end

Initialization and deinitialization can be implemented in the following methods:

- (void)removeFromSuperview {         [videoTrack removeRenderer:self];         [super removeFromSuperview]; } - (void)didMoveToSuperview {         [super didMoveToSuperview];         [videoTrack addRenderer:self]; }

You can find the code examples on our GitHub page – just swap the references to our signaling with your own. We found examples very useful while developing the module, so hopefully they will help you to understand the whole idea much faster.

Demo

The end result can look like as follows:

Closing Thoughts

When WebRTC community started working on the standard one of the main ideas was to make real-time communications simpler for web developers and provide developers with a convenient Javascript API for real time communications. React Native has similar goal, it lets web developers build native apps using Javascript. In our opinion bringing WebRTC to the set of available React Native APIs makes a lot of sense – web app developers will be able to build their RTC apps for mobile platforms. Guys behind React Native has just released it for Android at Scale conference, so we will update the article or write a new one about building the module compatible with Android as soon as we know all the details.

{“author”, “Alexey Aylarov”}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post Reacting to React Native for native WebRTC apps (Alexey Aylarov) appeared first on webrtcHacks.

Gaming with the WebRTC DataChannel – A Walkthrough with Arin Sime

Thu, 08/27/2015 - 14:51

The fact that you can use WebRTC to implement a secure, reliable, and standards based peer-to-peer network is a huge deal that is often overlooked.  We have been notably light on the DataChannel here at webrtcHacks, so I asked Arin Sime if would be interested in providing one of his great walkthrough’s on this topic.  He put together a very practical example of a multi-player game.  You make recognize Arin from RealTime Weekly or from his company Agility Feat or his new webRTC.ventures brand. Check out this excellent step-by-step guide below and start lightening the load on your servers and reducing message latency with the DataChannel.

{“editor”: “chad hart“}

WebRTC is “all about video chat”, right? I’ve been guilty of saying things like that myself when explaining WebRTC to colleagues and clients, but it’s a drastic oversimplification that should never go beyond your first explanation of WebRTC to someone.

Of course, there’s more to WebRTC than just video chat. WebRTC allows for peer-to-peer video, audio, and data channels.  The Data channels are a distinct part of that architecture and often forgotten in the excitement of seeing your video pop up in the browser.

Don’t forget about the Data Channel!

Being able to exchange data directly between two browsers, without any sort of intermediary web socket server, is very useful. The Data Channel carries the same advantages of WebRTC video and audio:  it’s fully peer-to-peer and encrypted.  This means Data Channels are useful for things like text chat applications, file transfers, P2P file exchanges, gaming, and more.a

In this post, I’m going to show you the basics of how to setup and use a WebRTC Data Channel.

First, let’s review the architecture of a WebRTC application.

You have to setup signaling code in order to establish the peer to peer connection between two peers.  Once the signaling is complete (which takes place over a 3rd party server), then you have a Peer to Peer (P2P) connection between two users which can contain video and audio streams, and a data channel.

The signaling for both processes is very similar, except that if you are building a Data Channel only application then you don’t need to call GetUserMedia or exchange streams with the other peer.

Data Channel Security

There are a couple of other differences about using the DataChannel.  The most obvious one is that users don’t need to give you their permission in order to establish a Data Channel over an RTCPeerConnection object.  That’s different than video and audio, which will prompt the browser to ask the user for permissions to turn on their camera and microphone.

Although it’s generating some debate right now, data channels don’t require explicit permission from users.  That makes it similar to a web socket connection, which can be used in a website without the knowledge of users.

The Data Channel can be used for many different things.  The most common examples are for implementing text chat to go with your video chat.  If you’re already setting up an RTCPeerConnection for video chat, then you might as well use the same connection to supply a Data Channel for text chat instead of setting up a different socket connection for text chat.

Likewise, you can use the Data Channel for transferring files directly between your peers in the RTCPeerConnection.  This is nicer than a normal socket style connection because just like WebRTC video, the Data Channel is completely peer-to-peer and encrypted in transit.  So your file transfer is more secure than in other architectures.

The game of “Memory”

Don’t limit your Data Channel imagination by these common examples though.  In this post, I’m going to show you how to use the Data Channel to build a very simple two-player game.  You can use the Data Channel to transfer any type of data you like between two browsers, so in this case we’ll use it to send commands and data between two players of a game you might remember called “Memory”.

In the game of memory, you can flip over a card, and then flip a second card, and if they match, you win that round and the cards stay face up.  If they didn’t match, you put both face down again, and it’s the next person’s turn.  By trying to remember what you and your opponents have been flipping, and where those cards were, you can win the game by correctly flipping the most pairs.

Photo Credit: http://www.vwmin.org/memory-game.html

Adam Khoury already built a javascript implementation of this game for a single player, and you can read his tutorial on how to build the game Memory for a single player.  I won’t explain the logic of his code for building the game, what I’m going to do instead is build on top of his code with a very simple WebRTC Data Channel implementation to keep the card flipping in synch across two browsers.

You can see my complete code on GitHub, and below I’m going to show you the relevant segments.

In this example view of my modified Memory game, the user has correctly flipped pairs of F, D, and B, so those cards will stay face up.  The cards K and L were just flipped and did not match, so they will go back face down.

Setting up the Data Channel configuration

I started with a simple NodeJS application to serve up my code, and I added in Express to create a simple visual layer.  My project structure looks like this:

The important files for you to look at are datachannel.js (where the majority of the WebRTC logic is), memorygame.js (where Adam’s game javascript is, and which I have modified slightly to accommodate the Data Channel communications), and index.ejs, which contains a very lightweight presentation layer.

In datachannel.js, I have included some logic to setup the Data Channel.  Let’s take a look at that:

//Signaling Code Setup var configuration = {         'iceServers': [{                 'url': 'stun:stun.l.google.com:19302'         }] }; var rtcPeerConn; var dataChannelOptions = {         ordered: false, //no guaranteed delivery, unreliable but faster         maxRetransmitTime: 1000, //milliseconds }; var dataChannel;

The configuration variable is what we pass into the RTCPeerConnection object, and we’re using a public STUN server from Google, which you often see used in WebRTC demos online.  Google is kind enough to let people use this for demos, but remember that it is not suitable for public use and if you are building a real app for production use, you should look into setting up your own servers or using a commercial service like Xirsys to provide production ready STUN and TURN signaling for you.

The next set of options we define are the data channel options.  You can choose for “ordered” to be either true or false.

When you specify “ordered: true”, then you are specifying that you want a Reliable Data Channel.  That means that the packets are guaranteed to all arrive in the correct order, without any loss, otherwise the whole transaction will fail.  This is a good idea for applications where there is significant burden if packets are occasionally lost due to a poor connection.  However, it can slow down your application a little bit.

We’ve set ordered to false, which means we are okay with an Unreliable Data Channel.  Our commands are not guaranteed to all arrive, but they probably will unless we are experiencing poor connectivity.  Unless you take the Memory game very seriously and have money on the line, it’s probably not a big deal if you have to click twice.  Unreliable data channels are a little faster.

Finally, we set a maxRetransmitTime before the Data Channel will fail and give up on that packet.   Alternatively, we could have specified a number for maxRetransmits, but we can’t specify both constraints together.

Those are the most common options for a data channel, but you can also specify the protocol if you want something other than the default SCTP, and you can set negotiated to true if you want to keep WebRTC from setting up a data channel on the other side.  If you choose to do that, then you might also want to supply your own id for the data channel.  Typically you won’t need to set any of these options, leave them at their defaults by not including them in the configuration variable.

Set up your own Signaling layer

The next section of code may be different based on your favorite options, but I have chosen to use express.io in my project, which is a socket.io package for node that integrates nicely with the express templating engine.

So the next bit of code is how I’m using socket.io to signal to any others on the web page that I am here and ready to play a game.  Again, none of this is specified by WebRTC.  You can choose to kick off the WebRTC signaling process in a different way.

io = io.connect(); io.emit('ready', {"signal_room": SIGNAL_ROOM}); //Send a first signaling message to anyone listening //In other apps this would be on a button click, we are just doing it on page load io.emit('signal',{"type":"user_here", "message":"Would you like to play a game?", "room":SIGNAL_ROOM});

In the next segment of datachannel.js, I’ve setup the event handler for when a different visitor to the site sends out a socket.io message that they are ready to play.

io.on('signaling_message', function(data) {         //Setup the RTC Peer Connection object         if (!rtcPeerConn)                 startSignaling();         if (data.type != "user_here") {                 var message = JSON.parse(data.message);                 if (message.sdp) {                         rtcPeerConn.setRemoteDescription(new RTCSessionDescription(message.sdp), function () {                                 // if we received an offer, we need to answer                                 if (rtcPeerConn.remoteDescription.type == 'offer') {                                         rtcPeerConn.createAnswer(sendLocalDesc, logError);                                 }                         }, logError);                 }                 else {                         rtcPeerConn.addIceCandidate(new RTCIceCandidate(message.candidate));                 }         }         });

There are several things going on here.  The first one to be executed is that if the rtcPeerConn object has not been initialized yet, then we call a local function to start the signaling process.  So when Visitor 2 announces themselves as here, they will cause Visitor 1 to receive that message and start the signaling process.

If the type of socket.io message is not “user_here”, which is something I arbitrarily defined in my socket.io layer and not part of WebRTC signaling, then the code goes into a couple of WebRTC specific signaling scenarios – handling an SDP “offer” that was sent and crafting the “answer” to send back, as well as handling ICE candidates that were sent.

The WebRTC part of Signaling

For a more detailed discussion of WebRTC signaling, I refer you to http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/”>Sam Dutton’s HTML5 Rocks tutorial, which is what my signaling code here is based on.

For completeness’ sake, I’m including below the remainder of the signaling code, including the startSignaling method referred to previously.

function startSignaling() {         rtcPeerConn = new webkitRTCPeerConnection(configuration, null);         dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions);         dataChannel.onopen = dataChannelStateChanged;         rtcPeerConn.ondatachannel = receiveDataChannel;         // send any ice candidates to the other peer         rtcPeerConn.onicecandidate = function (evt) {                 if (evt.candidate)                         io.emit('signal',{"type":"ice candidate", "message": JSON.stringify({ 'candidate': evt.candidate }), "room":SIGNAL_ROOM});         };         // let the 'negotiationneeded' event trigger offer generation         rtcPeerConn.onnegotiationneeded = function () {                 rtcPeerConn.createOffer(sendLocalDesc, logError);         }   }   function sendLocalDesc(desc) {         rtcPeerConn.setLocalDescription(desc, function () {                 io.emit('signal',{"type":"SDP", "message": JSON.stringify({ 'sdp': rtcPeerConn.localDescription }), "room":SIGNAL_ROOM});         }, logError); }

This code handles setting up the event handlers on the RTCPeerConnection object for dealing with ICE candidates to establish the Peer to Peer connection.

Adding DataChannel options to RTCPeerConnection

This blog post is focused on the DataChannel more than the signaling process, so the following lines in the above code are the most important thing for us to discuss here:

rtcPeerConn = new webkitRTCPeerConnection(configuration, null); dataChannel = rtcPeerConn.createDataChannel('textMessages', dataChannelOptions); dataChannel.onopen = dataChannelStateChanged; rtcPeerConn.ondatachannel = receiveDataChannel;

In this code what you are seeing is that after an RTCPeerConnection object is created, we take a couple extra steps that are not needed in the more common WebRTC video chat use case.

First we ask the rtcPeerConn to also create a DataChannel, which I arbitrarily named ‘textMessages’, and I passed in those dataChannelOptions we defined previously.

Setting up Message Event Handlers

Then we just define where to send two important Data Channel events:  onopen and ondatachannel.  These do basically what the names imply, so let’s look at those two events.

function dataChannelStateChanged() {         if (dataChannel.readyState === 'open') {                 dataChannel.onmessage = receiveDataChannelMessage;         } } function receiveDataChannel(event) {         dataChannel = event.channel;         dataChannel.onmessage = receiveDataChannelMessage; }

 

When the data channel is opened, we’ve told the RTCPeerConnection to call dataChannelStateChanged, which in turn tells the dataChannel to call another method we’ve defined, receiveDataChannelMessage, whenever a data channel message is received.

The receiveDataChannel method gets called when we receive a data channel from our peer, so that both parties have a reference to the same data channel.  Here again, we are also setting the onmessage event of the data channel to call our method receiveDataChannelMessage method.

Receiving a Data Channel Message

So let’s look at that method for receiving a Data Channel message:

function receiveDataChannelMessage(event) {                 if (event.data.split(" ")[0] == "memoryFlipTile") {                 var tileToFlip = event.data.split(" ")[1];                 displayMessage("Flipping tile " + tileToFlip);                 var tile = document.querySelector("#" + tileToFlip);                 var index = tileToFlip.split("_")[1];                 var tile_value = memory_array[index];                 flipTheTile(tile,tile_value);         } else if (event.data.split(" ")[0] == "newBoard") {                 displayMessage("Setting up new board");                 memory_array = event.data.split(" ")[1].split(",");                 newBoard();         } }

Depending on your application, this method might just print out a chat message to the screen.  You can send any characters you want over the data channel, so how you parse and process them on the receiving end is up to you.

In our case, we’re sending a couple of specific commands about flipping tiles over the data channel.  So my implementation is parsing out the string on spaces, and assuming the first item in the string is the command itself.

If the command is “memoryFlipTile”, then this is the command to flip the same tile on our screen that our peer just flipped on their screen.

If the command is “newBoard”, then that is the command from our peer to setup a new board on our screen with all the cards face down.  The peer is also sending us a stringified array of values to go on each card so that our boards match.  We split that back into an array and save it to a local variable.

Controlling the Memory game to flip tiles

The actual flipTheTile and newBoard methods that are called reside in the memorygame.js file, which is essentially the same code that we’ve modified from Adam.

I’m not going to step through all of Adam’s code to explain how he built the single player Memory game in javascript, but I do want to highlight two places where I refactored it to accommodate two players.

In memorygame.js, the following function tells the DataChannel to let our peer know which card to flip, as well as flips the card on our own screen:

function memoryFlipTile(tile,val){         dataChannel.send("memoryFlipTile " + tile.id);         flipTheTile(tile,val); }

Notice how simple it is to send a message to our peers using the data channel – just call the send method and pass any string you want.  A more sophisticated example might send well formatted XML or JSON in a message, in any format you specify.  In my case, I just send a command followed by the id of the tile to flip, with a space between.

Setting up a new game board

In Adam’s single player memory game, a new board is setup whenever you load the page.  In my two player adaptation, I decided to have a new board triggered by a button click instead:

var setupBoard = document.querySelector("#setupBoard"); setupBoard.addEventListener('click', function(ev){         memory_array.memory_tile_shuffle();         newBoard();         dataChannel.send("newBoard " + memory_array.toString());         ev.preventDefault(); }, false);

 

In this case, the only important thing to notice is that I’ve defined a “newBoard” string to send over the data channel, and in this case I want to send a stringified version of the array containing the values to put behind each card.

Next steps to make the game better

That’s really all there is to it!  There’s a lot more we could do to make this a better game.  I haven’t built in any logic to limit the game to two players, keep score by players, or enforce the turns between the players.  But it’s enough to show you the basic idea behind using the WebRTC data channel to send commands in a multiplayer game.

The nice thing about building a game like this that uses the WebRTC data channel is it’s very scalable.  All my website had to do is help the two players get a connection setup, and after that, all the data they need to exchange with each other is done over an encrypted peer-to-peer channel and it won’t burden my web server at all.

A completed multiplayer game using the Data Channel

Here’s a video showing the game in action:

Demo of a simple two player game using the WebRTC Data Channel video

As I hope this example shows you, the hard part of WebRTC data channels is really just in the signaling and configuration, and that’s not too hard.  Once you have the data channel setup, sending messages back and forth is very simple.  You can send messages that are as simple or complex as you like.

How are you using the Data Channel?  What challenges have you run into?  Feel free to contact me on Twitter or through my site to share your experiences too!

{“author”: “arin sime“}

Sources:

http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/

http://www.w3.org/TR/webrtc/#simple-peer-to-peer-example

https://www.developphp.com/video/JavaScript/Memory-Game-Programming-Tutorial

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post Gaming with the WebRTC DataChannel – A Walkthrough with Arin Sime appeared first on webrtcHacks.

Making WebRTC source building not suck (Alex Gouaillard)

Tue, 08/25/2015 - 15:17

One of WebRTC’s benefits is that the source to it is all open source. Building WebRTC from source provides you the ultimate flexibility to do what you want with the code, but it is also crazy difficult for all but the small few VoIP stack developers who have been dedicated to doing this for years. What benefit does the open source code provide if you can’t figure out how to build from it?

As WebRTC matures into mobile, native desktop apps, and now into embedded devices as part of the Internet of Things, working with the lower-level source code is becoming increasingly common.

Frequent webrtcHacks guest poster Dr. Alex Gouaillard has been trying to make this easier. Below he provides a review of the building WebRTC from source, exposing many of the gears WebRTC developers take for granted when they leverage a browser or someone else’s SDK. Alex also reviews the issues complexities associated with this process and introduces the open source make process he developed to help ease the process.

{“editor”: “chad hart“}

Building WebRTC from source sometimes feels like engineering the impossible. Photo courtesy of Andrew Lipson.

Building WebRTC from source

Most of the audience for WebRTC (and webrtcHacks)  is made of web developers, JavaScript and cloud ninjas that might not be less familiar with handling external libraries from source. That process is painful. Let’s make it clear, it’s painful for everybody – not only web devs.

 What are the cases where you need to build from source?

  1. Writing a native app – mobile, desktop, IoT,..)
  2. Some kind of server (gateway, media, ….)
  3. Plugin (either for IE, Safari, Cordova, …)

 You basically need to build from source anytime you can’t leverage a browser, WebRTC enabled node.js (for the sake of discussion), SDK’s someone how put together for you,  or anything else.

 These main cases are illustrated below in the context of a comprehensive offering.

Figure 1: map of a WebRTC solution

Usually, the project owners provide precompiled and tested libraries that you can use yourself (stable) and the most recent version that is compiled but not tested for those who are brave.

Pre-compiled libraries are usable out of the box, but do not allow you to modify anything. Sometimes there are build scripts that help you recompile the libs yourselves. This provides more flexibility in terms of what gets in the lib, and what optimizations/options you set, at the cost of now having to maintain a development environment.

Comparing industry  approaches

For example, Cisco with its openH264 library provides both precompiled libraries and build scripts. In their case, using the precompiled library defers H264 royalty issues to them, but that’s another subject. While the libwebrtc project includes build scripts, they are complex use, do not provide a lot of flexibility for modifying the source, and make it difficult to test any modifications.

The great cordova plugin from eFace2Face is using a precompiled libWebRTC (here) (see our post on this too). Pristine.io were among the first one to propose build script to make it easier (see here; more about that later).

Sarandogou/doubango’s webrtc-everywhere plugin for IE and Safari does NOT use automated build scripts, versioning or a standard headers layout, which causes them a lot of problems and slows their progress.

The pristine.io guys put a drawing of what the process is, and noted that, conceptually, there is not a big difference between android and iOS build as to the steps you need to follow. Practically, there is a difference in the tools you used though.

My build process

Here is my build process:

 

Please also note that I mention testing explicitly and there is a good reason for that, learned the hard way. I will come to it in the next section.

You will see I have a “send to dashboard” step. I mean something slightly different than what people usually refer to as a dashboard. Usually, people want to report the results of the tests to a dashboard to show that a given revision is bug free (as much as possible) and that the corresponding binary can be used in production.

If you have performance tests, a dashboard can also help you spot performance regressions.  In my case here, I also want to use a common public dashboard as a way to publish failing builds on different systems or with different configurations, and still provide full log access to anyone. It makes solving those problem easier. The one asking the question can point to the dashboard, and interesting parties have an easier time looking at the issue or reproducing it. More problems reported, more problems solved, everyone is happy.

Now that we have reviewed the build from source process a bit, let’s talk about what’s wrong with it.

Building from Source Sucks

Writing an entire WebRTC stack is insanely hard. That’s why Google went out and bought GIPS, even though they have a lot of very very good engineers at disposal. Most devs and vendors use an existing stack.

For historical reasons most people use google’s contributed WebRTC stack based on the GIPS media engine, and Google’s libjingle for the network part.

Even Mozilla is using the same media engine, even though they originally went for a Cisco SIP soft phone code as the base (see here, under “list of components”, “SIPCC”) to implement the network part of WebRTC. Since then, Mozilla went on and rewrote almost all that part to support more advanced functionality such as multi-party. However, the point is, their network and signaling is different from Google’s while their media engine is almost identical. Furthermore, Mozilla does not attempt to provide a standalone version of their WebRTC implementation, which makes it hard for developers to make use of it right away.

Before Ericson’s OpenWebRTC announcement in October 2014, the Google standalone version was the only viable option out there for most. OpenWebRTC has advantages on some parts, like hardware support for H.264 on iOS for example, but lacks some features and Windows support that can be a showstopper for some. It is admittedly less mature. It also uses GStreamer, which has its own conventions and own build system (cerbero), which is also tough to learn.

The webrtc.org stack is not available in a precompiled library with an installer. This forces developers to compile WebRTC themselves, which is “not a picnic”.

One needs first to become accustomed to Chrome dev tools which are quite unique, adding a learning step to the process. The code changes quite often (4 commits a day), and the designs are poorly documented at best.

Even if you manage to compile the libs, either by yourself or using resources on the web, it is almost certain that you cannot test it before using it in your app, as most of the bug report, review, build, test and dashboard  infrastructure is under the control of Google by default.

Don’t get me wrong, the bug report and review servers allow anybody to set up an account. What is done with your tickets or suggestions however is up to Google. You can end up with quite frustrating answers. If you dig deep enough in the Chrome infrastructure for developers, you will also find how to replicate their entire infrastructure, but the level you need to have to go through this path, and the amount of effort to get it right is prohibitive for most teams. You want to develop your product, not become a Chrome expert.

Finally, the contributing process at Google allows for bugs to get in. You can actually looks at the logs and see a few “Revert” commits there.

Figure 2: Example of a Revert commit message.

From the reverted commits (see footnote[1]: 107 since January 2015), one can tell that revisions of WebRTC on the HEAD are arbitrarily broken. Here again, this comment might be perceived as discriminatory against Google. It is not. There is nothing wrong there; it always happen for any project, and having only 107 reverts in 6 months while maintaining 4 commits a day is quite an achievement. However, it means that you, as a developer, cannot work with any given commit and expect the library to be stable. You have at least to test it yourself.

My small side project to help

My goals are:

  1. Provide information to the community that is not documented elsewhere, or not consolidated. The blog posts on www.webrtcbydralex.com fulfill this goal.
  2. Learn more about WebRTC
  3. Prepare a course for the local university.
  4. Do something useful of my current “long vacations”

    Yes, vacations in Boracay, Philippines, once voted #2 most beautiful beach in the world by tripadvisor are nice. But I very quickly get that I-need-to-code urge, and they have Wi-Fi on the beach ….

  5. Have fun!

More importantly I would like to lower the barrier of adoption / collaboration / contribution by providing:

  • WebRTC installers that sync with chrome revisions that developers could use blindly out of the box (knowing they’ve been tested)
  • Code for anyone to set up their own build/try/package pipeline, either locally or in the cloud
  • Easy patching and testing framework to enhance Webrtc. As an example, provide an h264 compliant WebRTC lib based on work from Kaiduan Xue, Jesup Randell, and others.
  • More examples and applications for Devs to start from. A first example will be a stand-alone, h264 compliant, appRTCDemo desktop app.
  • Public dashboard for a community to come together, contribute build bots and de duplicate the tests efforts going on at almost every vendor for the base stack.
  • Public dashboard for people to submit their fail builds as a way to ask question on the mailing list and get faster answers.

Example of my dashboard

What we did exactly

We leveraged the CMake / CTest / CDash / CPack suite of tools instead of the usual shell scripts, to automate most of the fetch, configure, build, test, report and package processes.

CMake is cross platform from the ground up, and makes it very easy to deploy such processes. No need to maintain separate or different build scripts for each platform, or build-toolchain.

CTest help you manage your test suites, and is also a client for CDash which handle the dashboard part of the process.

Finally CPack handle packaging your libs with headers and anything else you might want, and support a lot of different packagers with a unified syntax.

This entire suite of tools have also designed in such a way that “a gifted master student could use it and contribute back in a matter of days”, while being so flexible and powerful that big companies like Netflix or Canonical (Ubuntu), use it as the core of their engineering process.

Most of the posts at webrtcbydralex.com will take you through the process, step by step of setting up this solution., in conjunction with a github repository holding all the corresponding source code.

The tool page provides installers for WebRTC for those in a hurry.

{“author”: “Alex Gouaillard“}

[1] git log –since=1.week –pretty=oneline | grep Revert | wc -l

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post Making WebRTC source building not suck (Alex Gouaillard) appeared first on webrtcHacks.

How to stop a leak – the WebRTC notifier

Tue, 08/04/2015 - 11:15

The “IP Address Leakage” topic has turned into a public relations issue for WebRTC. It is a fact that the WebRTC API’s can be used to share one’s private IP address(es) without any user consent today. Nefarious websites could potentially use this information to fingerprint individuals who do not want to be tracked. Why is this an issue? Can this be stopped? Can I tell when someone is trying to use WebRTC without my knowledge? We try to cover those questions below along with a walkthrough of a Chrome extension that you can install or modify for yourself that provides a notification if WebRTC is being used without your knowledge.

Creative solutions for leaks

The “IP Leakage” problem Why does WebRTC need a local IP address?

As Reid explained long ago in his An Intro to WebRTC’s NAT/Firewall Problem, peer-to-peer communications cannot occur without providing the peer your IP address. The ICE protocol gathers and checks all the addresses that can be used to communicate to a peer. IP addresses come in a few flavors:

  • host IP address – this is the usually the local LAN IP address and is the one that is being exposed that is causing all the fuss
  • server-reflexive – this is the address outside the web server hosting the page will see
  • relay – this will show-up if you have a TURN server

Why not just use the server reflexive and relay addresses? The host IP address is the If you have 2 peers that want to talk to each other on the same LAN, then the most effective way to do this is to use the host IP address to keep all the traffic local. Otherwise you might end up sending the traffic out to the WAN and then back into the LAN, adding a lot of latency and degrading quality. This is the best address to use for this situation.

Relay addresses require that you setup a TURN server to relay your media. Use of relay means you are no longer truely peer-to-peer. Relay use is typically temporarily to speed connection time or as a last resort when a direct peer-to-peer connection cannot be made. Relay is generally avoided since just passing along a lot of media with no added value is expensive in terms of bandwidth costs and added latency.

This is why the WebRTC designers do not consider the exposure of the host IP address a bug – they built WebRTC on this way on purpose. The challenge is this mechanism can be used in to help with fingerprinting, providing a datapoint on your local addresses that you and your network administrator might not be happy about. The concern over this issue is illustrated by the enormous response on the Dear NY Times, if you’re going to hack people, at least do it cleanly! post last month exemplified this issue.

Why not just ask for when someone wants your local IP address?

When you want to share a video or audio stream, a WebRTC application you use the getUserMedia API. The getUserMedia API requires user consent to access the camera & microphone. However, there is no requirement to do this when using a dataChannel. So why not require consent here?

Let’s look at the use-cases. For a typical WebRTC videochat, user consent is required for the camera permission. The question “do you want to allow this site to access to your camera and microphone” is easy to understand for users. One might require consent here or impose the requirement that a mediastream originating from a camera is attached to the peerconnection.

What about a webinar. Participants might want to join just to listen. No permission is asked currently. Is that bad? Well… is there a permission prompt when you connect to a streaming server to watch a video? No. What is the question that should be asked here?

There are usecases like filetransfer which involve datachannel-only connections without the requirement of local media. Since you can upload the file to any http server without the browser asking for any permission, what is the question to ask here?

Last but not least, there are usecases like peer-to-peer CDNs where visitors of a website form a CDN to reduce the server-load in high-bandwidth resources like videos. While many people claim this is a new use-case enabled by WebRTC, Adobe showed this capability in Flash at MAX 2008 and 2009.

As as side-note, the RTMFP protocol in Flash has leaked the same information since then. It was just alot less obvious to acquire.

There is an additional caveat here. Adobe required user consent before using the user’s upstream to share data — even if peer-to-peer connections did not require consent. Apparently, this consent dialog completely killed the use-case for Flash, at a time when it was still the best way to deliver video. What is the question that the user must answer here? And does the user understand the question?

Photo courtesy flickr user Nisha A under Creative Commons 2.0 What are the browser vendors and the W3C doing about it?

Last week Google created an extension with source code to limit WebRTC to only using public addresses. There have been some technical concerns about breaking applications and degrading performance.
Mozilla is considering similar capabilities for Firefox as discussed here. This should hit the nightly build soon.
The W3C also discussed the issue at their recent meeting in Berlin and will likely address this as part of the unsanctioned tracking group.

 

How do I know if a site is trying to run WebRTC?

We usually have chrome://webrtc-internals open all the time and occasionally we do see sites using WebRTC in unexpected ways? I wondered if there was an easier way to see if a site was covertly using WebRTC, so I asked Fippo how hard it would be to make an extension to show peerConnection attempts. In usual fashion he had some working sample code back to be in a couple of hours. Let’s take a look…

How the extension works

The extension source code is available on github.
It consists of a content script, snoop.js, which is run at document start (as specified in the manifest.json file) and a background script, background.js
The background script is sitting idly and waiting for messages sent via the Message Passing API.
When receiving a message with the right format, it prints that message to the background page’s console and show the page action.

chrome.runtime.onConnect.addListener(function (channel) { channel.onMessage.addListener(function (message, port) { if (message[0] !== 'WebRTCSnoop') return; console.log(new Date(), message[1], message[2]); chrome.pageAction.show(port.sender.tab.id); }); });

Pretty simple, eh? You can inspect the background page console from the chrome://extensions page.
Let’s look at the content script as well. It consists of three blocks.
The first block does the important work. It overloads the createOffer, createAnswer, setLocalDescription and setRemoteDescription methods of the webkitRTCPeerConnection using a technique also used by adapter.js. Whenever one of these methods is called, it does a window.postMessage which is then triggers a call to the background page.

var inject = '('+function() { // taken from adapter.js, written by me ['createOffer', 'createAnswer', 'setLocalDescription', 'setRemoteDescription'].forEach(function(method) { var nativeMethod = webkitRTCPeerConnection.prototype[method]; webkitRTCPeerConnection.prototype[method] = function() { // TODO: serialize arguments var self = this; this.addEventListener('icecandidate', function() { //console.log('ice candidate', arguments); }, false); window.postMessage(['WebRTCSnoop', window.location.href, method], '*'); return nativeMethod.apply(this, arguments); }; }); }+')();';

The code snippet also shows how to listen for the ice candidates in a way which
The second part, inspired by the WebRTCBlock extension, injects the Javascript into the page by creating a script element, inserting the code and removing it immediately.

var script = document.createElement('script'); script.textContent = inject; (document.head||document.documentElement).appendChild(script); script.parentNode.removeChild(script);

Last but not least, a message channel is set up that listens to the events generated in the first part and send them to the background page:

var channel = chrome.runtime.connect(); window.addEventListener('message', function (event) { if (typeof(event.data) === 'string') return; if (event.data[0] !== 'WebRTCSnoop') return; channel.postMessage(event.data); });

There is a caveat here. The code is not executed for iframes that use the sandbox attribute as described here so it does not detect all usages of WebRTC. That is outside our control. Hey Google… can you fix this?

Ok, but how do I install it?

If you are not familiar with side-loading Chrome extensions, the instructions are easy:

  1. Download the zip from github
  2. Unzip it to a folder of your choice
  3. go to chrome://extensions
  4. Click on “Developer mode”
  5. Then click “Load unpacked extension”
  6. Find the webrtcnotify-master folder that you unzipped

View of the WebRTC Notifier extension

That’s it! If you want to see more details from the extension then it is helpful to load the extension’s console log. To do this just click on “background page” by “Inspect views”.

If you are familiar with Chrome Extensions and have improvement ideas, please contribute to the project!

What do I do if I find an offending site?

No one really knows how big of a problem this is yet, so let’s try to crowd source it. If you find a site that appears to be using WebRTC to gather your IP address in a suspicious way then post a comment about it here. If we get a bunch of these and others in the community confirm then we will create a public list.

With some more time we could potentially combine selenium with this extension to do something like a survey of the most popular 100k websites? We are not trying to start a witch hunt here, but having data to illustrate how big a problem this is would help inform the optimal path forward enormously.

{“authors”: [“Chad Hart“, “Philipp Hancke“]}

Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

The post How to stop a leak – the WebRTC notifier appeared first on webrtcHacks.

Wiresharking Wire

Thu, 07/16/2015 - 21:07

This is the next decode and analysis in Philipp Hancke’s Blackbox Exploration series conducted by &yet in collaboration with Google. Please see our previous posts covering WhatsApp, Facebook Messenger and FaceTime for more details on these services and this series. {“editor”: “chad“}

Wire is an attempt to reimagine communications for the mobile age. It is a messaging app available for Android, iOS, Mac, and now web that supports audio calls, group messaging and picture sharing. One of it’s often quoted features is the elegant design. As usual, this report will focus on the low level VoIP aspects, and leave the design aspects up for the users to judge.

As part of the series of deconstructions, the full analysis is available for download here, including the wireshark dumps.

Half a year after launching the Wire Android app currently has been downloaded between 100k and 500k times. They also recently launched a web version, powered by WebRTC. Based on this, it seems to be stuck with what Dan York calls the directory dilemma.

What makes Wire more interesting from a technical point of view is that they’re strong proponents of the Opus codec for audio calls. Maybe there is something to learn here…

The wire blog explains some of the problems that they are facing in creating a good audio experience on mobile and wifi networks:

The WiFi and mobile networks we all use are “best effort” — they offer no quality of service guarantees. Devices and apps are all competing for bandwidth. Therefore, real-time communications apps need to be adaptive. Network adaptation means working around parameters such as variable throughput, latency, and variable latency, known as jitter. To do this, we need to measure the variations and adjust to them in as close to real-time as possible.

Given the preference of ISAC over Opus by Facebook Messenger, the question which led to investigating Wire was whether they can show how to successfully use Opus on mobile.

Results

The blog post mentioned above also describes the Wire stackas “a derivate of WebRTC and the IETF standardized Opus codec”. It’s not quite clear what exactly “derivate of WebRTC” means. What we found when looking at Wire was, in comparison to the other apps reviewed, was a more “out of the box” WebRTC app, using the protocols as defined in the standards body.

Comparison with WebRTC  Feature WebRTC/RTCWeb Specifications Wire SDES MUST NOT offer SDES does not offer SDES ICE RFC 5245 RFC 5245 TURN usage used as last resort used as last resort Audio codec Opus or G.711 Opus Video codec H.264 or VP8 none (yet?) Quality of experience

Audio quality did turn out to be top notch, as our unscientific tests on various networks showed.
Testing on simulated 2G and 3G networks showed some adaptivity to the situations there.

Implementation

The STUN implementation turned out to be based on the BSD-licensed libre by creytiv.com, which is compatible with both the Chrome and Firefox implementations of WebRTC. Binary analysis showed that the webrtc.org media engine along with libopus 1.1 is used for the upper layer.

Privacy

Wire is company that prides itself on the user privacy protection that comes from having it’s HQ in Switzerland, yet has it’s signalling and TURN servers in Ireland. They get strong kudos for using DTLS-SRTP. To sum it up, Wire offers a case study in how to fully adopt WebRTC for both Web and native mobile.
Related articles across the web

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

    The post Wiresharking Wire appeared first on webrtcHacks.

    Dear NY Times, if you’re going to hack people, at least do it cleanly!

    Tue, 07/14/2015 - 00:07

    So the New York times uses WebRTC to gather your local ip addresses… Tsahi describes the non-technical parts of the issue in his blog. Let’s look at the technical details… it turns out that the Javascript code used is very clunky and inefficient.

    First thing to do is to check chrome://webrtc-internals (my favorite tool since the hangouts analysis). And indeed, nytimes.com is using the RTCPeerConnection API. We can see a peerconnection created with the RtpDataChannels argument set to true and using stun:ph.tagsrvcs.com as a STUN server.
    Also, we see that a data channel is created, followed by calls to createOffer and setLocalDescription. That pattern is pretty common to gather IP addresses.

    Using Chrome’s devtools search feature it is straightforward to find out that the RTCPeerConnection is created in the following Javascript file:
    http://s.tagsrvcs.com/2/4.10.0/loaded.js

    Since it’s minified here is the de-minified snippet that is gathering the IPs:

    Mt = function() { function e() { this.addrsFound = { "0.0.0.0": 1 } } return e.prototype.grepSDP = function(e, t) { var n = this; if (e) { var o = []; e.split("\r\n").forEach(function(e) { if (0 == e.indexOf("a=candidate") || 0 == e.indexOf("candidate:")) { var t = e.split(" "), i = t[4], r = t[7]; ("host" === r || "srflx" === r) && (n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1)) } else if (0 == e.indexOf("c=")) { var t = e.split(" "), i = t[2]; n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1) } }), o.length > 0 && t.queue(new y("webRTC", o)) } }, e.prototype.run = function(e) { var t = this; if (c.wrip) { var n = window.RTCPeerConnection || window.webkitRTCPeerConnection || window.mozRTCPeerConnection; if (n) { var o = { optional: [{ RtpDataChannels: !0 }] }, i = []; - 1 == w.baseDomain.indexOf("update.") && i.push({ url: "stun:ph." + w.baseDomain }); var r = new n({ iceServers: i }, o); r.onicecandidate = function(n) { n.candidate && t.grepSDP(n.candidate.candidate, e) }, r.createDataChannel(""), r.createOffer(function(e) { r.setLocalDescription(e, function() {}, function() {}) }, function() {}); var a = 0, s = setInterval(function() { null != r.localDescription && t.grepSDP(r.localDescription.sdp, e), ++a > 15 && (clearInterval(s), r.close()) }, 200) } } }, e }(),

    Let’s look at the run function first. It is creating a peerconnection with the optional RtpDataChannels constraint set to true. No reason for that, it will just unnecessarily create candidates with an RTCP component in Chrome and is ignored in Firefox.

    As mentioned earlier,
    stun:ph.tagsrvcs.com
    is used as STUN server. From Wireshark dumps it’s pretty easy to figure out that this is running the [coturn stun/turn server](https://code.google.com/p/coturn/); the SOFTWARE field in the binding response is set to
    Coturn-4.4.2.k3 ‘Ardee West’.

    The code hooks up the onicecandidate callback and inspects every candidate it gets. Then, a data channel is created and createOffer and setLocalDescription are called to start the candidate gathering process.
    Additionally, in the following snippet

    var a = 0, s = setInterval(function() { null != r.localDescription && t.grepSDP(r.localDescription.sdp, e), ++a > 15 && (clearInterval(s), r.close()) }, 200)

    the localDescription is searched for candidates every 200ms for three seconds. That polling is pretty unnecessary. Once candidate gathering is done, onicecandidate would have been called with the null candidate so polling is not required.

    Lets look at the grepSDP function. It is called in two contexts, once in the onicecandidate callback with a single candidate, the other time with the complete SDP.
    It splits the SDP or candidate into individual lines and then parses that line, extracting the candidate type at index 7 and the IP address at index 4.
    Since without a relay server one will never get anything but host or srflx candidates, the check in following line is unnecessary. The rest of this line does eliminate duplicates however.

    Oddly, the code also looks for an IP in the c= line which is completely unnecessary as this line will not contain new information. Also, looking for the candidate lines in the localDescription.sdp will not yield any new information as any candidate found in there will also be signalled in the onicecandidate callback (unless someone is using a 12+ months old version of Firefox).

    Since the JS is minified it is rather hard to trace what actually happens with those IPs.
    If you’re going to hack people, at least do it cleanly!

    {“author”: “Philipp Hancke“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

    The post Dear NY Times, if you’re going to hack people, at least do it cleanly! appeared first on webrtcHacks.

    Can an Open Source SFU Survive Acquisition? Q&A with Jitsi & Atlassian HipChat

    Sun, 07/12/2015 - 22:12

    Atlassian’s HipChat acquired BlueJimp, the company behind the Jitsi open source project. Other than for positive motivation, why should WebRTC developers care? Well, Jitsi had its Jitsi Video Bridge (JVB) which was one of the few open source Selective Forwarding Units (SFU) projects out there. Jitsi’s founder and past webrtcHacks guest author, Emil Ivov, was a major advocate for this architecture in both the standards bodies and in the public. As we have covered in the past, SFU’s are an effective way to add multiparty video to WebRTC. Beyond this one component, Jitsi was also a popular open source project for its VoIP client, XMPP components, and much more.

    So, we had a bunch of questions: what’s new in the SFU world? Is the Jitsi project going to continue? What happens when an open source project gets acquired? Why the recent licensing change?

    To answer these questions I reached out to Emil, now Chief Video Architect at Atlassian and Jitsi Project Lead and Joe Lopez, Senior Development and Product Manager at Atlassian who is responsible for establishing and managing Atlassian Open Source program. 

    Photo courtesy of Flickr user Scott Cresswell

    webrtcHacks: It has been a while since we have covered multi-party video architectures here. Can you give us some background on what a SFU is and where it helps in WebRTC?

    Emil: A Selective Forwarding Unit (SFU) is what allows you to build reliable and scalable multi-party conferences. You can think of them as routers for video, as they receive media packets from all participants and then decide if and who they need to forward them to.

    Compared to other conferencing servers, like video mixers – i.e. Multipoint Control Units (MCUs) – SFUs only need a small amount of resources and therefore scale much better. They are also significantly faster as they don’t need to transcode or synchronize media packets, so it is possible to cascade them for larger conferences.

    webrtcHacks: is there any effort to standardize the SFU function?

    Emil: Yes. It is true that SFUs operate at the application layer, so, strictly speaking, they don’t need to be standard the way IP routers do. Different vendors implement different features for different use cases and things work. Still, as their popularity grows, it becomes more and more useful for people to agree on best practices for SFUs. There is an ongoing effort at the IETF to describe how SFUs generally work: draft-ietf-avtcore-rtp-topologies-update. This helps the community understand how to best build and use them.

    Having SFUs well described also helps us optimize other components of the WebRTC ecosystem for them. draft-aboba-avtcore-sfu-rtp-00.txt, for example, talks about how a number of fields that encoders use are currently shared by different codecs (like VP8 and H.264) but are still encoded differently, in codec-specific ways. This is bad as it means developers of more sophisticated SFUs need to suddenly start caring about codecs and the whole point of moving away from MCUs was to avoid doing that. Therefore, works like draft-berger-avtext-framemarking and draft-pthatcher-avtext-esid aim to take shared information out of the media payload and into generic RTP header extensions.

    Privacy is another issue that has seen significant activity on the IETF is about improving the end-to-end privacy in SFUs. All existing SFUs today need to decrypt all incoming data before they can process it and forward it to other participants. This obviously puts SFUs in a position to eavesdrop on calls, which, unless you are running your own instance just for yourself, is not a great thing. It also means that the SFU needs to do allocate a lot of processing resources to transcrypting media and avoiding this would improve scalability even further.

    Selective Forwarding Middlebox diagram from https://tools.ietf.org/html/draft-ietf-avtcore-rtp-topologies-update-08

    webrtcHacks: Other than providing multi-party video functionality, what else do SFU’s like the JVB do? 

    Emil: Simple straightforward relaying from everyone to everyone – what we call full star routing – means you may end up sending a lot of traffic to a lot of people … potentially more than they care or are able to receive. There are two main ways to address that issue.

    First, you can limit the number of streams that everyone receives. This means that in a conference with a hundred participants, rather than getting ninety-nine streams, everyone would only receive the streams for the last four, five or N active speakers. This is what we call Last N and it is something that really helps scalability. Right now N is a number that JVB deployments have as a config param but we are working on making it adaptive so that JVB would adapt it based on link quality.

    Another way we improve bandwidth usage is by using “simulcast”. Chrome has the option of generating multiple outgoing video streams in different resolutions. This allows us to pick the higher resolution for active speakers (presumably those are the ones you would want to see in good quality) and resend it to participants who can afford the traffic. It will just relay the lower resolution for and to everyone else.

    A few other SFUs, like the one Google Hangouts are using implement simulcast as it saves a lot of resources on the server but also on the client side.We are actually working on some improvements there right now.

    webrtcHacks: Joe – now that you have had a chance to get to know the Blue Jimp/Jitsi team and technology, can you give an update on your plans to incorporate Jitsi into HipChat and Atlassian?

    Joe: Teams of all sizes use HipChat every day to communicate in real-time all over the world. Teams are stronger when they feel connected, and video is an integral part of that. Users have logged millions of minutes of 1:1 video using HipChat, which helps teams collaborate and build their company culture regardless of whether they work in the same location. With Jitsi we can give users so much more! We’re in the process of developing our own video, audio, and screen-sharing features using Jitsi Video Bridge. It’s a little early for us to comment on exact priority order and timing, but our objective is to make it easier for teams to connect effortlessly anywhere, anytime, on any device.

    webrtcHacks: what is the relationship between the core Atlassian team, HipChat, and Jitsi? How does Jitsi fit in your org structure?

    Joe: The Jitsi team joined Atlassian as part of HipChat team and makes up the core of our video and real-time communication group. They are experts on all things RTC and we’re now leveraging their expertise. The Jitsi developers are relocating to our Austin offices and will keep working on Jitsi and Atlassian implementations.

    webrtcHacks: Can you disclose the terms of the deal?

    Emil: I can only say that the acquisition was a great thing for both BlueJimp and Jitsi.

    Joe: Same here. We really wanted to add to our team the sort of expertise and technology that BlueJimp brings.

    webrtcHacks: I had to try.. So, how big is the Jitsi community? Do you know how many active developers you have using your various projects?

    Emil Ivov – founder of the Jitsi project

    Emil:  We haven’t been tracking our users so it’s hard to say. In terms of development, a lot of the work on Jitsi Videobridge and Jitsi Meet is done by BlueJimp, but we are beginning to get some pretty good patches. Hopefully, the trend will continue in that direction.

    As for the Jitsi client, we haven’t had a lot of time for it in the past couple of years so community contributions there are likely surpassing those of the company.

    webrtcHacks: Your public statements indicate the primary focus of the acquisition was the Jitsi Videobridge. Jitsi had many other popular products, including the Jitsi client, a TURN server, and many others. What is the future of these other projects? Does Atlassian have justification to continue to maintain these elements?

    Joe: Our plan is to continue developing the Jitsi Videobridge as well as the other projects including libjitsi, Jitsi Meet, Jirecon, Jigasi and other WebRTC related projects in the Jitsi community. We’re also going to continue providing the build infrastructure for the Jitsi client just as BlueJimp has been doing.  But, we don’t have immediate plans for substantial development on the purely client-side.

    Emil: It’s worth pointing out that the heart of Jitsi Videobridge, libjitsi, is something it shares with the Jitsi client. There’s also a lot of code that Jigasi, our SIP gateway, imports directly from the client.  So, while the upper UX layers in the client are not our main focus, we will continue working heavily on the core.

    Above all, however, the developer community around the client is much older and more mature than that of our newer projects. There are developers like Ingo Bauersachs or Danny van Heumen, for example, who have been long involved with the project and who continue working on it. Developers coming and going or changing focus is a natural part of any FLOSS project, and as Joe mentioned, we are going to continue providing the logistics.

    webrtcHacks: While Atlassian has initiatives to help open source projects, your core products are not based on open source and you are not known for having many open source projects of its own. Can you address this concern? Is the Jitsi acquisition an attempt to change this? If so, what else has Atlassian done to accommodate more of an open sourcing culture internally?

    Joe: Actually, Atlassian uses, supports and develops a number of open source projects.  We just haven’t been very vocal about it. That will be changing soon. In addition to managing our real-time communication project, I’m also responsible for our open source program. We’re in the process of restructuring how Atlassian supports open source, and the Jitsi project is one of the first initiatives in our plan. We see Jitsi as a great opportunity, and we are *very* serious about making this project a success. We’ll have more to say about open source later this year.

    Finally we think that Jitsi Videobridge is the most advanced open source Selective Forwarding Unit, which puts the team in a unique position to contribute to the WebRTC ecosystem. We are very keen on doing this.

    webrtcHacks: last week you moved all the Jitsi licenses from LGPL to Apache.

    Emil, why did you choose LGPL in the first place?

    Emil: Ever since we started the project, one of our primary motivations had been to get our code in the hands of as many people as possible, so we wanted to lower adoption barriers. Licensing is one of the important components here and, during its very early stages, around 2003, 2004, Jitsi (then SIP Communicator) started with an Apache license.

    Then we had to think a little bit more seriously about how we were going to make a living off of our work, because otherwise there wouldn’t have been any project at all. That’s when we thought we might be better off if BlueJimp had protection and decided to switch to LGPL.

    So, although it wasn’t our first choice, it did give us a certain measure of protection.

    I am very happy that Atlassian has decided to take the risk and relinquish that protection. I firmly believe this is the best option for Jitsi and its users.

    Joe Lopez of Atlassian at an Austin WebRTC Meetup

    webrtcHacks: Joe – why the move to Apache? Why not other licenses like MIT, BSD, etc?

    Joe: As for why Apache over MIT/BSD, it’s actually very simple: like at many organizations, Apache is our preferred license of choice when using other people’s open source work. So, it made sense to us that this is what we should choose for our projects. We talked with a number of people internally and externally, and even went so far as to evaluate all licenses. But our technical and legal experts found Apache to be a tried and tested license respected by many organizations for their terms and clarity. At the end of the day we chose Apache because it best fit our organization and others.

    webrtcHacks: how do you expect this license change will impact existing Jitsi users?

    Emil: Very positively! A number of developers and companies are looking at using Jitsi Videobridge for their new startups, products and services. We expect the Apache license to make Jitsi significantly more appealing to them.

    When you are integrating a technology, the more permissive the license is, the less it precludes you from certain choices in the future. When launching a new service or a product, it is very hard to know that you would never need to keep some parts of it proprietary. This is especially true for startups, and I am saying it from experience.

    You simply need to keep that option open because sometimes it makes all the difference between a company closing its doors or thriving for years.

    That’s the liberty that you get from Apache.

    webrtcHacks: the Meet application was previously a MIT license. How are you handling that? Some argue that going from MIT to Apache is a step in the wrong direction.

    Emil: That’s true – the first lines of code in the Jitsi Meet project did come under MIT.  But there’s not much to handle there. The MIT license allows for code to be redistributed under any other license, including Apache, and Joe already pointed out why we think Apache is a better choice.

    Joe: There is also a purely practical side to this. As I mentioned, we’re in the process of restructuring our open source story, and Jitsi is one of the first in this effort.  So, it’s important for us to apply the same policy everywhere. The more exceptions we have, the harder it will be to manage and ensure a good experience for any Atlassian contributor.

    webrtcHacks: Jitsi was known in the past for soliciting community input before making major decisions. Why didn’t you announce the plans to change your licensing model before the actual change this time?

    Emil: Knowing the project as I do, it just never crossed my mind that this would be a problem for anyone. Throughout the past years I only heard concerns from people that found the LGPL too restrictive for them, so I only expected positive opinions. And the overwhelming majority have reacted positively.

    For the few people who have raised concerns, let me reiterate that we think this is the best possibility for Jitsi, and we also need to be practical and use a uniform license for all Atlassian projects.

    People who feel that LGPL was a better match for them are completely free to take last week’s version of the project and continue maintaining it under that license.

    webrtcHacks: what level of transparency can the Jitsi community expect going forward?

    Emil: This is actually one of the main ways in which BlueJimp’s acquisition is going to be beneficial to Jitsi.

    A lot of the work that BlueJimp did in the past was influenced by customer demand.  As a result, we never really knew exactly what to expect a month in the future. This is now over. Today it is much easier for us to define a roadmap and stick to it. Obviously we will still remain flexible as we listen to requests and important use cases from the community, but we are going to have significantly more visibility than before.

    webrtcHacks: the github charts indicate a slow down in activity vs. last year. Was this due to distraction from the acquisition? What level of public commits should we expect out of the new Atlassian Jitsi team going forward?

    Emil:  As with any project, there’s a lot that needs to be done in the early stages.  Ninety percent of what you do is push code. This gradually changes with time as the problems you are solving become more complex. At that point you spend a lot of time thinking, testing and debugging. As a result your code output diminishes.

    Take our joint efforts with Firefox, for example. This took a lot of time looking through wireshark traces, debugging and making small adjustments. The time it took to actually write the code was negligible compared to everything else we needed to do. Still, adding Firefox compatibility was important to Jitsi, and that happened within Atlassian.

    In addition, the entire team is relocating to Austin, and a relocation can be time-consuming.  

    But, there haven’t been any private commits, if that’s what you are thinking of :).

    Illustration of a Selective Forwarding Unit (SFU) architecture with 3 participants

    webrtcHacks: can you share some of your roadmap & plans for Jitsi?

    Emil: Gladly! We are really excited to continue working on what makes Jitsi Videobridge the most advanced SFU out there. This includes things like bandwidth adaptivity, for instance. We have big changes coming to our Simulcast and Last N support. Scalability and reliability will also be a main focus in the next months. This includes being able to do more conferences per deployment but also more people per conference. We are also going to be working on mobile, to make it easier for people to use the project on iOS and Android. Supporting other browsers and switching to Maven are also on the roadmap.

    We’re not ready to say when, or in what order these things will be happening – but they’re coming.

    webrtcHacks: should we expect to see the Jitsi source move from github to bitbucket?

    Joe: We’re keeping Jitsi on GitHub, since they excel at being a place for open source projects. Bitbucket is better designed for software teams within organizations that want greater control over their source code, to restrict access within their organization, teams, or even to specific individuals. However, one area that we do want to address is issue tracking. This has been a source of pain for Jitsi, so we’re considering moving issue tracking to JIRA, Atlassian’s issue tracking and management software, which will provide us with everything we need for better project management.

    We will be discussing this with the community in the coming weeks.

    {“interviewer”, “chad“}

    {“interviewees”, [“Emil Ivov“, “Joe Lopez“]}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

    The post Can an Open Source SFU Survive Acquisition? Q&A with Jitsi & Atlassian HipChat appeared first on webrtcHacks.

    Developing mobile WebRTC hybrid applications

    Mon, 06/29/2015 - 15:30

    There are a lot of notable exceptions, but most WebRTC developers start with the web because well, Web RTC does start with web and development is much easier there. Market realities tells a very different story – there is more traffic on mobile than desktop and this trend is not going to change. So the next phase in most WebRTC deployments is inevitably figuring out how to support mobile. Unfortunately for WebRTC that has often meant finding the relatively rare native iOS and Android developer.

    The team at eFace2Face decided to take a different route and build a hybrid plugin. Hybrid apps allows web developers to use their HTML, CSS, and JavaScript skills to build native mobile apps. They also open sourced the project and verified its functionality with the webrtc.org AppRTC reference. We asked them to give us some background on hybrid apps and to walk us through their project.

     {“intro-by”, “chad“}

    Hybrid apps for WebRTC (image source)

    When deciding how to create a mobile application using WebRTC there is no obvious choice. There are several items that should be taken into consideration when faced with this difficult decision, like the existence of previous code base, the expertise, amount of resources and knowledge available. Maintenance and support are also a very important factors given the fragmentation of the mobile environment.

    At eFace2Face we wanted to  extend our service to mobile devices.  We decided to choose our own path- exploring and filling in the gaps (developing new tools when needed) in order to create the solution that fitted us best.This post shares some of the knowledge and expertise we gained the hard way while doing so. We hope you find it useful!

    Types of mobile apps (image source)

    What’s a hybrid application?

    There are two main approaches on how hybrid apps are built:

    • WebView: Put simply, this is an HTML5 web application that is bundled inside a native app and uses the device’s web browser to display it. The development framework for the application provides access to the device’s functions (camera, address book, accelerometers, etc.) in the form of JavaScript APIs through the use of plugins. It should also be totally responsive and use native-like resources to get a UX similar to a real app. Examples include Cordova/PhoneGap, Trigger.io, Ionic, and Sencha (the latter two being like Cordova with steroids).

    Simple hybrid app example using PhoneGap (source)

    Creating Hybrid HTML5 app is the most extensive alternative and the one we prefer because it uses web specific technologies. You can get a deeper overview about native vs. HTML5 (and hybrid applications) in a recent blog post at Android Authority.

    Hybrid App Pros & Cons Pros:
    • Hybrid apps are as portable as HTML5 apps. They allow code reuse across platforms, with the framework handling all platform-specific differences.
    • A hybrid app can be built at virtually the same speed at which an HTML5 app can be built. The underlying technology is the same.  
    • A hybrid app can be built for almost the same cost as an HTML5 app. However, most frameworks require a license, which adds an extra development cost.
    • Hybrid apps can be made available and distributed via the relevant app store, just like native apps.
    • Hybrid apps have greater access to native hardware resources than plain HTML5 apps, usually through the corresponding framework’s own APIs.
    Cons:
    • Not all native hardware resources are available to hybrid apps. The available functionality depends on the framework used.
    • Hybrid apps appear to the end user as native apps, but run significantly slower than native apps. The same restriction on HTML5 apps being rejected for being too slow and not responsive on Apple’s App Store also applies to hybrid apps. Rendering complex CSS layouts will take longer than rendering a corresponding native layout.
    • Each framework has its own unique idiosyncrasies and ways of doing things that are not necessarily useful outside of the given framework.

    From our point of view, a typical WebRTC application is not really graphic-intensive (i.e. it is not, for instance, a game with lots of animations and 3D effects). Most of the complex processes are done internally by the browser, not in JavaScript, so a graphical UX interface should be perfectly doable on a hybrid application and run without any significant perceptible slowdown. Instagram is a good example of a well-known hybrid app that uses web technologies in at least some of its components.

    WebRTC on native mobile: current status

    Native support in Android and iOS is a bit discouraging. Apple do not support it at all, and has no public information about when are they going to do so, if they decide to support it at all. On Android, the native WebView supported WebRTC starting in version 4.4 (but be cautious as it is based on Chromium 36) then in 5.0 and onwards.

    Browser vendors fight (source)

    Note that there are no “native WebRTC” APIs on Android or iOS yet, so you will have to use Google’s WebRTC library. Justin Uberti (@juberti) provides a very nice overview of how to do this (go here to see the slides).

    Solutions

    Let’s take a look at the conclusions of our research.

    Android: Crosswalk

    In Android, using the native WebView seems like a good approach; in fact we used it during our first attempt to create our application. But then we decided to switch to Intel’s Crosswalk, which includes what’s best described as a “full Chrome browser”. It actually allows us to use a fully updated version of native Chromium instead of WebView.

    These were our reasons for choosing Crosswalk:

    • Fully compatible source code: You only have to handle a single Chromium version across all Android devices. More importantly, it has the latest, regularly updated WebRTC APIs.
    • Backward compatibility: According to developer.android.com, approximately 48% of Android devices currently in use are running Android versions below 4.4. While most of them don’t have hardware powerful enough to run WebRTC (either native or hybrid), you shouldn’t exclude this market.
    • Fragmentation: Different versions of Android mean different versions of WebView. Given the speed at which WebRTC is evolving, you will have difficulties dealing with version fragmentation and supporting old versions of WebView.
    • Performance: It seems you can get up to 10x improvement of both HTML/CSS rendering and JavaScript performance and CSS correctness.

    An advanced reader could think: “Ok, this is cool but I need to use different console clients (Cordova and Crosswalk) to generate my project, and I don’t like the idea of that.” You’re right, it would be a hassle, but we also found another trick here. This project allows us to add Crosswalk support to a Cordova project; it uses a new Cordova feature to provide different engines like any other plugin. This way we don’t need to have different baselines in the source code.

    iOS: Cordova plugin

    As explained before, there are frameworks that provide hybrid applications with the device functionality code via plugins. You can use them in your JavaScript code but they are implemented using native code. So, it should be possible to add the missing WebRTC JavaScript APIs.

    There are several options available, but most of them provide custom APIs or are tightly coupled with some proprietary signaling from a service provider. That’s the reason that we released an open source WebRTC Cordova plugin for iOS.

    The plugin is built on top of Google’s native WebRTC code and exposes the W3C WebRTC APIs. Also, as it is a Cordova plugin, it allows you to have the same Cordova application running on Android with Crosswalk, and on iOS with the WebRTC plugin. And both of them reuse all of the code base you are already using for your web application.

    Show me the code!

    “Yes, I have heard this already”, you might say, so let’s get some hands-on experience. In order to demonstrate that it’s trivial to reuse your current code and have your mobile application running in a matter of days (if not hours), we decided to take Google’s AppRTC HTML5 application and create a mobile application using the very same source code.

    You can find the iOS code on github, Here are the steps required to get everything we’re talking about working in minutes:

    • Get the source code: “git clone https://github.com/eface2face/iOSRTCApp; cd iOSRTCApp”
    • Add both platforms; all required plugins are installed automatically because of their inclusion in the “config.xml” file: Cordova platform add iOS android
    • Run as usual: “cordova run –device”
    • Once running, enter the same room as the one that’s already been created via web browser at https://apprtc.appspot.com/ and enjoy!

    Call between iOSRTCApp on iPad and APPRTC on browser

    We needed to make some minor changes in order to make it work properly in the Cordova environment. Each of these changes didn’t require more than a couple of js/html/css lines:

    • Due to Cordova’s nature, we had to add its structure to the project. Some plugins are required to get native features and permissions. The scripts js/apprtc.debug.js and js/appwindow.js are loaded once Cordova’s deviceready  event is fired. This is necessary since the first one relies on the existing window.webkitRTCPeerConnection  and navigator.webkitGetUserMedia , which are not set by cordova-plugin-iosrtc until the event fires.
    • The webrtcDetectedVersion  global variable is hardcoded to 43 as the AppRTC JavaScript code expects the browser to be Chrome or Chromium, and fails otherwise.
    • In order to correctly place video views (iOS native UIView elements), the plugin function refreshVideos is called when local or remote video is actually displayed. This is because the CSS video elements use transition effects that modify their position and size for a duration of 1 second.
    • A new CSS file css/main_overrides.css changes the properties of video elements. For example, it sets opacity to 0.85 in local-video  and remote-video  so HTML call controls are shown even below the native UIView elements rendering the local and remote video.
    • Safari crashes when calling plugin methods within WebSocket events (“onopen”, “onmessage”, etc.). Instead, you have to run a setTimeout  within the WebSocket event if you need to call plugin methods on it. We loaded the provided ios-websocket-hack.js script into our Cordova iOS app and solved this.
    • Polyfill for windows.performance.now()  used on AppRTC code.
    Conclusion

    Deciding whether to go hybrid or native for your WebRTC app is up to you. It depends on the kind of resources and relevant experience your company has, the kind of application that you want to implement, and the existing codebase and infrastructure you already have in place. The good news is our results show that using WebRTC is not a key factor in this decision, and you can have the mobile app version of your WebRTC web service ready in much less time than you probably expected.

    References

     

    {“authors”, [“Jesus Perez“,”Iñaki Baz“, “Sergio Garcia Murillo“]}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart, @victorpascual and @tsahil.

    The post Developing mobile WebRTC hybrid applications appeared first on webrtcHacks.

    The new Android M App Permissions – Dag-Inge Aas

    Wed, 06/17/2015 - 15:30

    Android got a lot of WebRTC’s mobile development attention in the early days.  As a result a lot of the blogosphere’s attention has turned to the harder iOS problem and Android is often overlooked for those that want to get started with WebRTC. Dag-Inge Aas of appear.in has not forgotten about the Android WebRTC developer. He recently published an awesome walkthrough post explaining how to get started with WebRTC on Android. (Dag’s colleague Thomas Bruun also put out an equally awesome getting started walkthrough for iOS.) Earlier this month Google also announced some updates on how WebRTC permissions interaction will work on the new Android.  Dag-Inge provides another great walkthrough below, this time covering the new permission model.

    {“editor”: “chad“}

     

    At this year’s Google I/O, Google released the Android M Developer Preview with lots of new features. One of them is called App Permissions, and will have an impact on how you design your WebRTC powered applications. In this article, we will go through how you can design your applications to work with this new permissions model.

    To give you the gist of App Permissions, they allow the user to explicitly grant access to certain high-profile permissions. These include permissions such as Calendar, Sensors, SMS, Camera and Microphone. The permissions are granted at runtime, for example, when the user has pressed the video call-button in your application. A user can also at any time, without the app being notified, revoke permissions through the app settings, as seen on the right. If the app requests access to the camera again after being revoked, the user is prompted to once again grant permission.

    This model is very similar to how iOS has worked their permission model for years. User’s can feel safe that their camera and microphone are only used if they have given explicit consent at a time that is relevant for them and the action they are trying to perform.

    However, this does mean that WebRTC apps built for Android now have to face the same challenges that developers on iOS have to face. What if the user does not grant access?

    To get started, let’s make sure our application is built for the new Android M release. To do this, you have to edit your application’s build.gradle file with the following values:

    targetSdkVersion "MNC" compileSdkVersion "android-MNC" minSdkVersion "MNC" // For testing on Android M Preview only.

    Note that these values are prone to change once the finalized version of Android M is out.

    In addition, I had to update my Android Studio to the Canary version (1.3 Preview 2) and add the following properties to my build.gradle to get my sources to compile successfully:

    compileOptions { sourceCompatibility JavaVersion.VERSION_1_7 targetCompatibility JavaVersion.VERSION_1_7 }

    However, your mileage may vary. With all that said and done, and the M version SDK installed, you can compile your app to your Android device running Android M.

    Checking and asking for permissions

    If you start your application and enable its audio and video capabilities, you will notice that the camera stream is black, and that no audio is being recorded. This is because you haven’t asked for permission to use those APIs from your user yet. To do this, you have to call requestPermissions(permissions, yourRequestCode)  in your activity, where permissions is a String[] of android permission identifiers, and yourRequestCode is a unique integer to identify this specific request for permissions.

    String[] permissions = {   "android.permission.CAMERA",   "android.permission.RECORD_AUDIO" }; int yourRequestId = 1; requestPermissions(permissions, yourRequestCode);

    Calling requestPermissions will spawn two dialogs to the user, as shown below.

    When the user has denied or allowed access to the APIs you request, the Activity’s onRequestPermissionsResult(int requestCode, String permissions[], int[] grantResults) method is called. Here we can recognize the requestCode we sent when asking for permissions, the String[] of permissions we asked for access to, and an int[]  of results from the grant permission dialog. We now need to inspect what permissions the user granted us, and act accordingly. To act on this data, your Activity needs to override this method.

    @Override public void onRequestPermissionsResult(int requestCode, String permissions[], int[] grantResults) { switch (requestCode) { case YOUR_REQUEST_CODE: { if (grantResults[0] == PackageManager.PERMISSION_GRANTED){ // permission was granted, woho! } else { // permission denied, boo! Disable the // functionality that depends on this permission. } return; } } }

    Handling denied permissions will be up to your app how you want to handle, but best practices dictate that you should disable any functions that rely on these permissions being granted. For example, if the user denies access to the camera, but enables the microphone, the toggle video button should be disabled, or alternatively trigger the request again, should the user wish to add their camera stream at a later point in the conversation. Disabling access to video also means that you can avoid doing the VideoCapturer  dance to get the camera stream, it will be black anyway.

    One thing to note is that you don’t always need to ask for permission. If the user has already granted access to the camera and microphone previously, you can skip this step entirely. To determine if you need to ask for permission, you can use checkSelfPermission(PERMISSION_STRING)  in your Activity. This will return PackageManager.PERMISSION_GRANTED  if the permission has been granted, and PackageManager.PERMISSION_DENIED  if the request was denied. If the request was denied, you may ask for permission using requestPermissions.

    if (checkSelfPermission(Manifest.permission.CAMERA) == PackageManager.PERMISSION_DENIED) { requestPermissions(new String[]{Manifest.permission.CAMERA}, YOUR_REQUEST_CODE); }

    When to ask for permission

    The biggest question with this approach is when to ask for permission. When are the user’s more likely to understand what they are agreeing to, and therefore more likely to accept your request for permissions?

    To me, best practices really depend on what your application does. If your applications primary purpose is to enable video communication, for example a video chat application, then you should at initial app startup prime the user for setting up their permissions. However, you must at the same time make sure that permissions are still valid, and if necessary, reprompt the user in whatever context is natural for permissions should you need it. For example, a video chat application based on rooms may prompt the user to enable video and audio before entering a video room. If the user has already granted access, the application should proceed to the next step. If not, the application should explain in clear terms why it needs audio and video access, and ask the user to press a button to get prompted. This makes the user much more likely to trust the application, and grant access.

    If your application’s secondary purpose is video communication, for example in a text messaging app with video capabilities, best practices dictate that the user should get prompted when clicking the video icon for starting a video conversation. This has a clear benefit, as the user knows their original intention, and will therefore be likely to grant access to their camera and microphone.

    And remember, camera and microphone access can be revoked at any time without the app’s knowledge, so you have to run the permission check every time.

    The new Android M permissions UI

    But do I have to do it all now?

    Yes and no. Android M is still a ways off, so there is no immediate rush. In addition, the new permission style only affects applications built for, or targeting, Android M or greater API levels. This means that your application with the old permission model will still work and be available to devices running Android M. The user will instead be asked to grant access at install time. Note that the user may still explicitly disable access through the app settings, at which point your application will show black video or no sound. So unless you wish to use any of the new features made available in Android M, you can safely hold off for a little while longer.

    {“author”: “Dag-Inge Aas“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post The new Android M App Permissions – Dag-Inge Aas appeared first on webrtcHacks.

    WebRTC and Man in the Middle Attacks

    Fri, 06/12/2015 - 21:33

    WebRTC is supposed to be secure. A lot more than previous VoIP standards. It isn’t because it uses any special new mechanism, but rather because it takes it seriously and mandates it for all sessions.

    Alan Johnston decided to take WebRTC for a MitM spin – checking how easy is it to devise a man-in-the-middle attack on a naive implementation. This should be a reminder to all of us that while WebRTC may take care of security, we should secure our signaling path and the application as well.

    {“editor”: “tsahi“}

    Earlier this year, I was invited to teach a graduate class on WebRTC at IIT, the Illinois Institute of Technology in Chicago.  Many of you are probably familiar with IIT because of the excellent Real-Time Communications (RTC) Conference (http://www.rtc-conference.com/) that has been hosted at IIT for the past ten years.

    I’ve taught a class on SIP and RTC at Washington University in St. Louis for many years, but I was very excited to teach a class on WebRTC.  One of the key challenges in teaching is to come up with ways to make the important concepts come alive for your students.  Trying to make security more interesting for my students led me to write my first novel, Counting from Zero, a technothriller that introduces concepts in computer and Internet security (https://countingfromzero.net).  For this new WebRTC class, I decided that when I lectured about security, I would – without any warning – launch a man-in-the-middle (MitM) attack (https://en.wikipedia.org/wiki/Man-in-the-middle_attack) on my students.

    It turned out the be surprisingly easy to do, for two reasons.

    1. It is so quick and easy to prototype and test new ideas with WebRTC. JavaScript is such a fun language to program in, and node.js makes it really easy on the server side.
    1. Unfortunately, WebRTC media sessions have virtually no protection against MitM attacks. Not many people seem to be aware of the fact that although WebRTC uses DTLS to generate keys for the SRTP media session, the normal protections of TLS are not available.

    So, a few weeks later, I had a WebRTC MitM attack ready to launch on my students that neither Chrome or Firefox could detect.

    How did it work?  Very simple.  First, I compromised the signaling server.  I taught the class using the simple demo application from the WebRTC book (http://webrtcbook.com) that I wrote with Dan Burnett.  (You can try the app with a friend at http://demo.webrtcbook.com:5001/data.html?turnuri=1.)  The demo app uses a simple HTTP polling signaling server that matches up two users that enter the same key and allows them to exchange SDP offers and answers.

    I compromised the signaling server so that when I entered a key using my MitM JavaScript application, instead of the signaling server connecting the users who entered that key, those users would instead be connected to me.  When one of the users called the other, establishing a new WebRTC Peer Connection, I would actually receive the SDP offer, and I would answer it, and then create a new Peer Connection to the other user, sending them my SDP offer.  The net result was two Peer Connections instead of one, and both terminated on my MitM JavaScript application.  My application performs the SDP offer/answer negotiation and the DTLS Handshake with each of the users.  Each of the Peer Connections was considered fully authenticated by both browsers.  Unfortunately, the Peer Connections were fully authenticated to the MitM attacker, i.e. me.

    Here’s how things look with no MitM attacker:

     

    Here’s how things look with a MitM attacker who acts as a man-in-the-middle to both the signaling channel and DTLS:

    How hard was it to write this code?  Really easy.  I just had to duplicate much of the code so that instead of one signaling channel, my MitM JavaScript had two.  Instead of one Peer Connection, there were two.  All I had to do was take the MediaStream I received incoming over one Peer Connection and attach it to the other Peer Connection as outgoing, and I was done.  Well, almost.  It turns out that Firefox doesn’t currently support this yet (but I’m sure it will one of these days) and Chrome has a bug in their audio stack so that the audio does not make it from one Peer Connection to another (see bug report https://code.google.com/p/webrtc/issues/detail?id=2192#c15).  I tried every workaround I could think of, including cloning, but no success.  If anyone has a clever workaround for this bug, I’d love to hear about it.  But the video does work, and in the classroom, my students didn’t even notice that the MitM call had no audio.  They were too busy being astonished that after setting up their “secure WebRTC call” (we even used HTTPS which gave the green padlock – of course, this had no effect on the attack but showed even more clearly how clueless DTLS and the browsers were), I showed them my browser screen which had both of their video streams.

    When I tweeted about this last month, I received lots of questions, some asking if I had disclosed this new vulnerability.  I answered that I had not, because it was not an exploit and was not anything new.  Everyone involved in designing WebRTC security was well aware of this situation.  This is WebRTC working as designed – believe it or not.

    So how hard is it to compromise a signaling server?  Well, it was trivial for me since I did it to my own signaling server.  But remember that WebRTC does not mandate HTTPS (why is that, I wonder?).  So if HTTP or ordinary WebSocket is used, any attacker can MitM the signaling if they can get in the middle with a proxy.  If HTTPS or secure WebSocket is used, then the signaling server is the where the signaling would need to be compromised.  I can tell you from many years of working with VoIP and video signaling that signaling servers make very tempting targets for attackers.

    So how did we get here?  Doesn’t TLS and DTLS have protection against MitM attacks?

    Well, TLS as used in web browsing uses a certificate from the web server issued by a CA that can be verified and authenticated.  On the other hand, WebRTC uses self-signed certificates that can’t be verified or authenticated.  See below for examples of self-signed certificates used by DTLS in WebRTC from Chrome and Firefox.  I extracted these using Wireshark and displayed them on my Mac.  As you can see, there is nothing to verify.  As such, the DTLS-SRTP key agreement is vulnerable to an active MitM attack.

    The original design of DTLS-SRTP relied on exchanging fingerprints (essentially a SHA-256 hash of the certificate, e.g. a=fingerprint:sha-256 C7:4A:8A:12:F8:68:9B:A8:2A:95:C9:5E:7A:2A:CE:64:3D:0A:95:8E:E9:93:AA:81:00:97:CE:33:C3:91:50:DB) in the SIP SDP offer/answer exchange, and then verifying that the certificates used in the DTLS Handshake matched the certificates in the SDP.  Of course, this assumes no MitM is present in the SIP signaling path.   The protection against a MitM in signaling recommended by DTLS-SRTP is to use RFC 4474 SIP Enhanced Identity for integrity protection of the SDP in the offer/answer exchange.  Unfortunately, there were major problems with RFC 4474 when it came to deployment, and the STIR Working Group in the IETF (https://tools.ietf.org/wg/stir/) is currently trying to fix these problems.  For now, there is no SIP Enhanced Identity and no protection against a MitM when DTLS-SRTP is used with SIP.   Of course, WebRTC doesn’t mandate SIP or any signaling protocol, so even this approach is not available.

    For WebRTC, a new identity mechanism, known as Identity Provider, is currently proposed (https://tools.ietf.org/html/draft-ietf-rtcweb-security-arch).  I will hold off on an analysis of this protocol for now, as it is still under development in an Internet-Draft, and is also not available yet.  Firefox Nightly has some implementation, but I’m not aware of any Identity Service Providers, either real or test, that can be used to try it out yet.  I do have serious concerns about this approach, but that is a topic for another day.

    So are we out of luck with MitM protection for WebRTC for now?  Fortunately, we aren’t.

    There is a security protocol for real-time communications which was designed with protection against MitM – it is ZRTP (https://tools.ietf.org/html/rfc6189) invented by Phil Zimmermann, the inventor of PGP.   ZRTP was designed to not rely on and not trust the signaling channel, and uses a variety of techniques to protect against MitM attacks.

    Two years ago, I described how ZRTP, implemented in JavaScript and run over a WebRTC data channel, could be used to provide WebRTC the MitM protection it currently lacks (https://tools.ietf.org/html/draft-johnston-rtcweb-zrtp).  During TADHack 2015(http://tadhack.com/2015/), if my team sacrifices enough sleep and drinks enough coffee, we hope to have running code to show how ZRTP can detect exactly this MitM attack.

    But that also is a subject for another post…

    {“author”: “Alan Johnston“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post WebRTC and Man in the Middle Attacks appeared first on webrtcHacks.

    Facetime doesn’t face WebRTC

    Tue, 06/09/2015 - 16:03

    This is the next decode and analysis in Philipp Hancke’s Blackbox Exploration series conducted by &yet in collaboration with Google. Please see our previous posts covering WhatsApp and Facebook Messenger for more details on these services and this series. {“editor”: “chad“}

     

    FaceTime is Apple’s answer to video chat, coming preinstalled on all modern iPhones and iPads. It allows audio and video calls over WiFi and, since 2011, 3G too. Since Apple does not talk much about WebRTC (or anything else), maybe we can find out if they are using WebRTC behind the scenes?

    As part of the series of deconstructions, the full analysis (another sixteen pages) is available for download here, including the Wireshark dumps.
    If you prefer watching videos, check out the recording of this talk I did at Twilio’s Signal conference where I touch on this analysis and the others in this series.

    In a nutshell, FaceTime

    • is quite impressive in terms of quality,
    • requires an open port (16402) in your firewall as documented here,
    • supports iOS and MacOS devices only,
    • supports simultaneous ring on multiple devices,
    • is separate from the messaging application, unlike WhatsApp and Facebook Messenger,
    • announces itself by sending metrics over an unencrypted HTTP connection (Dear Apple, have you heard about pervasive monitoring?)
    • presumably still uses SDES (no signs of DTLS handshakes, but I have not seen a=crypto lines in the SDP either).

    Since privacy is important, it is sad to see a complete lack of encryption in the HTTP metrics call like this one:

    Example of an unencrypted keep alive packet that could be intercepted by a 3rd party to track a user

    Details

    FaceTime has been analyzed earlier- first when it was introduced back in 2010 and more recently in 2013. While the general architecture is still the same, FaceTime has evolved over the years like adding new codecs like H.265 when calling over cellular data.

    What else has changed? And how much of the changes can we observe? Is there anything those changes tell us about potential compatibility with WebRTC?

    Still using SDES

    It is sad that Apple continuing to use SDES almost two years after the IETF at it’s Berlin meeting where it was decided that WebRTC MUST NOT Support SDES. The consensus on this topic during the meeting was unanimous. For more background information, see either Victor’s article on why SDES should not be used or dive into Eric Rescorla’s presentation from that meeting comparing the security properties of both systems.

    NAT traversal

    Like WebRTC, FaceTime is using the ICE protocol to work around NATs and provide a seamless user experience. However, Apple is still asking users to open a certain number of ports to make things works. Yes, in 2015.

    Their interpretation of ICE is slightly different from the standard. In a way similar to WhatsApp, it has a strong preference for using a TURN servers to provide a faster call setup. Most likely, SDES is used for encryption.

    Video

    For video, both the H.264 and the H.265 codecs are supported, but only H.264 was observed when making a call on a WiFi. The reason for that is probably that, while saving bandwidth, H.265 is more computationally expensive. One of the nice features is that the optimal image size to display on the remote device is negotiated by both clients.

    Audio

    For audio, the AAC-ELD codec from Fraunhofer is used as outlined on the Fraunhofer website.
    In nonscientific testing, the codec did show behaviour of playing out static noise during wifi periods of packet loss between two updated iPhone 6 devices.

    Signaling

    The signaling is pretty interesting, using XMPP to establish a peer-to-peer connection and then using SIP to negotiate the video call over that peer-to-peer connection (without encrypting the SIP negotiation).

    This is a rather complicated and awkward construct that I have seen in the past when people tried to avoid making changes to their existing SIP stack. Does that mean Apple will take a long time to make the library used by FaceTime generally usable for the variety of use cases arising in the context of WebRTC? That is hard to predict, but this seems overly complex.

    Quality of Experience

    FaceTime offers an impressive quality and user experience. Hardware and software are perfectly attuned to achieve this. As well as the networking stack as you can see in the full story.

     

    {“author”: “Philipp Hancke“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post Facetime doesn’t face WebRTC appeared first on webrtcHacks.

    Why put WebRTC in Webkit? Q&A with the webrtcinwebkit team

    Tue, 05/26/2015 - 15:45

    The world of browsers and how they work is both complex and fascinating. For those that are new to the browser engine landscape, Google, Apple, and many others collaborated on an open source web rendering engine for many years known as WebKit.  WebKit has active community with many less well known browsers that use it, so the WebKit community was shocked when Google announced they would fork WebKit into a new engine for Chrome called Blink.

    Emphasis for implementing WebRTC shifted with Google into Blink at the expense of WebKit. To date, Apple has not given any indications it was going to add  WebRTC into WebKit (see this post for an idea on nudging them). This is not good for the eclectic WebKit development community that would like to start working with WebRTC or those hoping for WebRTC support in Apple’s browsers.

    Then an interesting project by Ericsson, Temasys, and Igalia was announced –  webrtcinwebkit. As the name implies, the goal of this project was to put WebRTC in WebKit, by themselves. Many questions come to mind, but the most important may be is this a worthwhile effort and one that other WebRTC developers should be involved in?

    I asked webrtcinwebkit contributors Alex Gouaillard and Stefan Håkansson to give some more background and help answer who should care about this project and why they are working on this.

    {“intro-by”, “chad“}

     

     

     

    webrtcHacks: For those who are not familiar, can you describe the webrtcinwebkit project?

    Alex:  webrtcinwebkit is a project aiming at bringing WebRTC support in WebKit which is the foundation of a lot of web browsers like Safari and web browsing frameworks like WebView used extensively in Mobile applications.

    webrtcHacks: How did this come together? Who are the major contributors to this project?

    Alex:  in 2014, Ericsson had an initial implementation to flesh out and was thinking about open sourcing the underlying engine now known as OpenWebRTC. Igalia’s Philippe Normand had been trying to integrate WebRTC in WebKitgtk+ using Ericsson’s 2011 code base for some time, Temasys was trying to find a way to enable WebRTC in iOS and remove the need for a plugin in desktop Safari. At a W3C meeting in Santa Clara in November 2014, a discussion happened with an Apple employee that make the parties believe that there was a way. The project was born.

    webrtcHacks: Why do you think Apple has not implemented WebRTC already? How will webrtcinwebkit change that?

    Alex:  Those who know can’t talk and those who talk likely don’t know.

    Our answer is: we don’t know for sure.

    webrtcHacks: How much of this project is about Apple vs. other WebKit users?

    Alex:  Not at all about Apple. We’re working on adding WebRTC API’s to WebKit, and to back it with OpenWebRTC for the WebKitgtk+ port. To what extent other ports make use of the WebRTC API’s is up to them, but of course we hope they will use them.

    webrtcHacks: was there a lot of WebRTC in WebKit already from before Google forked blink?

    Stefan:  Yes, there was quite a bit of WebRTC in there, contributed by Ericsson, Google, Igalia and others. Some parts can be re-used, but largely the standard has evolved so much that we had to start over.

     

    WebKit commits per month after Blink. Source: Juan J Sanchez, Igalia (May 2014)

    webrtcHacks: Can you comment on how your perspective of how Google’s forking of webkit has impacted the webkit community? Isn’t everyone just moving to blink now?

    Stefan:  Just looking at the revision history of the WebKit source code will immediately show that the WebKit project is very much alive.

    webrtcHacks: Is there value in webrtcinwebkit for other developers even if Apple ignores this?

    Alex:  the WebKit community itself is pretty big, and again as far as mobile and embedded devices are concerned, this is the main way of handling web pages. Whether Apple adopt the changes or not, this is good news for many people.

    Stefan: I agree fully. I would also like to add the value for the WebRTC community and the standardization effort of a second, independent implementation, as the initial back end used will be OpenWebRTC. The current WebRTC implementations in Chrome, Firefox and Opera all to a large extent use the same webrtc.org backend so we think it is a point in using OpenWebRTC as the backend as it gives a second, truly independent, implementation of the standard.

    webrtcHacks: How much effort is  going into this project? Is it staffed adequately for success?

    Alex:  Everybody is working on separate projects, so depending for example if you count people working on OpenWebRTC as being part of this project or not, the count will change. There are roughly the following Full-time-equivalent (FTE) staff:

    • 4  at Ericsson
    • 2 Ericsson contractors
    • 2 Temasys
    • 1 Igalia

    Not all FTE are equal contributors though and the main technical leaders (for WebKit) are with Ericsson and Igalia.

    webrtcHacks: How long will it take to complete?

    Alex:  We have a version that includes getUserMedia already working a demoable in a separate fork. Ericsson is polishing the PeerConnection, while Temasys is handling Datachannel. It then takes a little bit of time to submit patches to upstream WebKit, so the nightly version of WebKit is still behind. OpenWebRTC and webrtc in WebKit is based on a very recent version of GStreamer, and updating that component in WebKit as far reaching consequences. It touches all the media functionalities of WebKit. We think this will take some time to get in, then the following patches should be self contained and easier to push. We aim at a June timeline.

    webrtcHacks: I have heard many comments about GStreamer not being appropriate for real time applications. Can you address that topic?

    Stefan:  That does not match our experience at all. We’ve been using GStreamer for many real time applications since back in 2010 when we showed our first implementation of what eventually became WebRTC, and we’ve consistenly got good real time performance. The fact that there is a lively GStreamer community is also a big plus, that enabled OpenWebRTC to quickly make use of the HW accelerated video codec that became available in iOS 8 – something I have heard webrtc.org still doesn’t for example.

    webrtcHacks: How tightly coupled is webrtcinwebkit with OpenWebRTC? Is webrtcinwebkit really a derivative and dependant on OpenWebRTC?

    Stefan: Although our initial implementation is backended by OpenWebRTC, the project is designed to fully support other implementations, so no, webrtcinwebkit is not a derivative on OpenWebRTC.

    Alex:  Given enough time (and resource), the original plan was to have two back ends for webrtcinwebkit: both OWR from OpenWebRTC and libwebrtc from webrtc.org. It’s not clear yet if we will be able to make it happen, but it’s still a wish.

    Browser Engine Share Chrome Blink 41% IE Trident 14% Safari Webkit 14% Android Browser Webkit 7% Firefox Gecko 12% Opera Blink 4% UC Browser ?? 3% Nokia Webkit 1% Browser Usage Share – May 2014 to Apr 2015
    Source: http://gs.statcounter.com/#all-browser-ww-monthly-201405-201504-bar

    webrtcHacks: How has the community developed in the month since you launched this project? How many contributors are outside of Ericsson and Temasys?

    Alex:  We announced the project before before it was ready for others to contribute. Most of the WebRTC ecosystem was under the impression there was nothing being done on that side, which was just not true. We wanted to let people know something was brewing, and that they would be able to contribute soon. We also wanted to welcome early feedback.

    We received a lot of interest by e-mail, from different companies. We eventually made our fork of WebKit, our working copy if you wish, public on GitHub. Today anybody can make a pull request. That being said, we believe it would be better to wait until we have a first implementation of getUserMedia, PeerConnection and DataChannel before they jump in. For most of the work, you need a first implementation of Offer/Answer, and the capacity to see on screen the video streams, or hear the audio stream before you can start to debug. That should happen in a matter of weeks now.

    webrtcHacks: What does success of this project look like? How will you know when this succeeds?

    Alex:  As soon as WebKitgtk+, one of the default Linux browsers, supports WebRTC, we think it will be a success. Eventually seeing other ports – other browsers using WebKit  adopting those changes will be very gratifying too.

    {“interviewer”, “chad“}

    {“interviewees”, [“Alex Gouaillard“, “Stefan Håkansson“]}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post Why put WebRTC in Webkit? Q&A with the webrtcinwebkit team appeared first on webrtcHacks.

    Facebook Messenger likes WebRTC

    Mon, 05/11/2015 - 12:37

    Two weeks ago Philipp Hancke,  lead WebRTC developer of Talky and part of the &yet‘s WebRTC consulting team, started a series of posts about detailed examinations he is doing on several major VoIP deployments to see if and how they may be using WebRTC. Please see that post on WhatsApp for some background on the series and below for another great analysis – this time on Facebook Messenger. {“editor”: “chad“}

     

    Last week, Facebook announced support for video chats in their Messenger app. Given that Messenger claims to account for 10% of global mobile VoIP traffic, this made in a very interesting target for further investigation. As part of the series of deconstructions, the full analysis (another fifteen pages, using the full range of analysis techniques demonstrated earlier) is available for download here, including the wireshark dumps.

     

    Facebook Messenger likes WebRTC

    Facebook Messenger is an extremely popular messaging and communications app. It works in Chrome / Firefox and Opera as well as Android and iOS. The mobile versions use the WebRTC.org library and Messenger is optimized heavily for mobile use cases:

    • Instead of DTLS, the older SDES encryption scheme is used between mobile clients. While not offering features such as perfect forward secrecy, it provides faster call setup time.
    • The audio codec depends on platform and peer. Opus, iSAC and iSAC Low Complexity (LC) are used, with a packetization that prefers fewer, larger packets.
    • VP8 is used as video codec on web, iOS and Android. Interestingly, Facebook chose VP8 over H.264 for iOS, even though VP8 has no hardware acceleration capability on Apple iOS devices.
    Comparison with WebRTC  Feature WebRTC/RTCWeb Specifications Messenger SDES MUST NOT offer SDES uses SDES between mobile clients, DTLS with browsers ICE RFC 5245 RFC 5245 with some old google quirks TURN usage used as last resort TURN and old Google relay Audio codec Opus or G.711 Opus, iSAC, ISAC Low Complexity Video codec VP8 or H264 VP8 WebRTC fulfilling its promise: No plugins required

    Back in 2011, Facebook launched Skype-powered video calling. It used a plugin. The need for a plugin is now gone, probably along with other parts of the integration as described in this announcement. Most platforms are supported without requiring users to install something. WebRTC is starting to fulfill its promise: no plugins needed.

    The WebRTC rollout at Facebook has been done gradually, starting in early 2015. Chad Hart was one of the first people to notice in mid-January. I took a quick look and was not very impressed by what I found. Basically it was a simple 1-1 webchat akin to Google’s apprtc sample.

    Recently, there has been a lot of news on Facebook’s Messenger. They launched messenger.com as a standalone website with support for voice and video between browsers. As Tsahi Levent-Levi pointed out already, it is using WebRTC.

    On the mobile side, the Messenger client had been voice only. Video calling is available in the apps as well. All of this warrants a closer look. With WhatsApp and Messenger coming from the same company, we were expecting similarities, making this report easy. Well… that’s not how it ended up.

    It turns out that unlike WhatsApp:

    • Messenger uses the webrtc.org library provided by Google
    • VP8 video codec is used for video calls
    • Choice of the audio codec, which includes ISAC and Opus, varies based on the devices used and the other party in the call
    • DTLS is used for encrypting the media between the app and the browsers
    • DTLS is not used in calls between two mobile devices (surprisingly)
    Mobile

    While Messenger looks pretty standard, there are quite a number of optimizations here. Some gems are hidden to anyone glancing through the details. Chrome’s webrtc-internals are easily available and were used in the first scenarios tested. It allows looking at the implementation from a number of different angles, all the way from the captured packets, via the signaling protocol and up to the WebRTC API calls.

    The iOS application offers a number of non-standardized codecs. In particular, when calling Chrome the iSAC audio codec will be used. Opus is used with Firefox, which does not implement iSAC. In both cases, the app tweaks the codec usage by sending larger frames than the browser. That is quite an interesting hack reminiscent of the WhatsApp behaviour of switching to a larger packetization. I can only speculate that this may show larger packets to be more robust on wireless networks?

    Calls between two mobile devices turn out to be more interesting. Here, the optimizations for calling a browser come to bear fully.

    Currently, video calls are only supported between either mobile devices or browsers, which will surely change soon. VP8 is used as the codec.

    Unlike WhatsApp, Messenger can be run on multiple devices. So when calling someone who is logged in on multiple devices, all devices ring. This is nice behavior, as it allows the called user to choose where to take the call. This makes a lot more sense than previous approaches, e.g. Google Talk attempting to let the caller determine the recipients “most available device” from a UX point of view. In a world where mobile devices are disconnected frequently and need to be woken up via push messages, this is even more important than it was a decade ago.

    Security

    The most important takeaway is that instead of using DTLS, Messenger uses the older SDES encryption scheme between two mobile clients. This means media can flow as soon as ICE is done, eliminating a round-trip for the DTLS handshake. This shows a desire to reduce the session startup time, similar to what we saw with WhatsApp.
    The widely known downside of that is that the encryption keys are sent via the signaling servers and can be used to retroactively decrypt traffic. Government agencies will surely rejoice at the news of this being applicable to 10% of the mobile VoIP traffic…

     

    {“author”: “Philipp Hancke“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post Facebook Messenger likes WebRTC appeared first on webrtcHacks.

    What’s up with WhatsApp and WebRTC?

    Wed, 04/22/2015 - 17:14

    One of our first posts was a Wireshark analysis of Amazon’s Mayday service to see if it was actually using WebRTC. In the very early days of WebRTC, verifying a major deployment like this was an important milestone for the WebRTC community. More recently, Philipp Hancke – aka Fippo – did several great posts analyzing Google Hangouts and Mozilla’s Hello service in Firefox. These analyses validate that WebRTC can be successfully deployed by major companies at scale. They also provide valuable insight for developers and architects on how to build a WebRTC service.

    These posts are awesome and of course we want more.

    I am happy to say many more are coming. In an effort to disseminate factual information about WebRTC, Google’s WebRTC team has asked &yet – Fippo’s employer – to write a series of publicly available, in-depth, reverse engineering and trace analysis reports. Philipp has agreed to write summary posts outlining the findings and implications for the WebRTC community here at webrtcHacks. This analysis is very time consuming. Making it consumable for a broad audience is even more intensive, so webrtcHacks is happy to help with this effort in our usual impartial, non-commercial fashion.

    Please see below for Fippo’s deconstruction of WhatsApp voice calling.

    {“editor”: “chad“}

     

    Philipp Hancke deconstructs WhatsApp to search for WebRTC

     

    After some rumors (e.g. on TechCrunch), WhatsApp recently launched voice calls for Android. This spurred some interest in the WebRTC world with the usual suspects like Tsahi Levent-Levi chiming in and starting a heated debate. Unfortunately, the comment box on Tsahi’s BlogGeek.Me blog was too narrow for my comments so I came back here to webrtchacks.

    At that point, I had considered doing an analysis of some mobile services already and, thanks to support from the Google WebRTC team, I was able to spend a number of days looking at Wireshark traces from WhatsApp in a variety of scenarios.

    Initially, I was merely trying to validate the capture setup (to be explained in a future blog post) but it turned out that there is quite a lot of interesting information here and even some lessons for WebRTC. So I ended up writing a full fifteen page report which you can get here. It is a long story of packets (available for download here) which will be very boring if you are not an engineer so let me try to summarize the key points here.

    Summary

    WhatsApp is using the PJSIP library to implement Voice over IP (VoIP) functionality. The captures shows no signs of DTLS, which suggests the use of SDES encryption (see here for Victor’s past post on this).  Even though STUN is used, the binding requests do not contain ICE-specific attributes. RTP and RTCP are multiplexed on the same port.

    The audio codec can not be fully determined. The sampling rate is 16kHz, the codec bandwidth of about 20kbit/s and the bandwidth was the same when muted.

    An inspection of the binary using the strings tool shows both PJSIP and several strings hinting at the use of elements from the webrtc.org voice engine such as the acoustic echo cancellation (AEC), AECM, gain control (AGC), noise suppression and the high-pass filter.

    Comparison with WebRTC  Feature WebRTC/RTCWeb Specifications WhatsApp SDES MUST NOT offer SDES probably uses SDES ICE RFC 5245 no ICE, STUN connectivity checks TURN usage used as last resort uses a similar mechanism first Audio codec Opus or G.711 unclear, 16khz with 20kbps bitrate Switching from a relayed session to a p2p session

    The most impressive thing I found is the optimization for a fast call setup by using a relay initially and then switching to a peer-to-peer session. This also opens up the possibility for a future multi-party VoIP call which would certainly be supported by this architecture. The relay server is called “conf bridge” in the binary.

    Lets look at the first session to illustrate this (see the PDF for the full, lengthy description):

    1. The session kicks off (in packet #70) by sending TURN ALLOCATE requests to eight different servers. This request doesn’t use any standard STUN attributes which is easy to miss.
    2. After getting a response the client is exchanging some signaling traffic with the signaling server, so this is basically gathering a relayed candidate and sending an offer to the peer.
    3. Packet #132 shows the client sending something to one of those TURN servers. This turns out to be an RTCP packet, followed by some RTP packets, which can be seen by using Wiresharks “decode as” functionality. This is somewhat unusual and misleading, as it is not using standard TURN functionality like send or data indications. Instead, it just does raw RTP on that.
    4. Packet #146 shows the first RTP packet from the peer. For about three seconds, the RTP traffic is relayed over this server.
    5. In the mean time, packet #294 shows the client sending a STUN binding request to the peer’s public IP address. Using a filter (ip.addr eq 172.16.42.124 and ip.addr eq 83.209.197.82) and (udp.port eq 45395 and udp.port eq 35574)  clearly shows this traffic.
    6. The first response is received in packet #300.
    7. Now something really interesting happens. The client switches the destination of the RTP stream between packets #298 and #305. By decoding those as RTP we can see that the RTP sequence number increases just by one. See this screenshot:

    Now, if we have decoded everything as RTP (which is something Wireshark doesn’t get right by default so it needs a little help), we can change the filter to rtp.ssrc == 0x0088a82d  and see this clearly. The intent here is to try a connection that is almost guaranteed to work first (I used a similar rationale in the minimal viable SDP post recently even) and then switch to a peer-to-peer connection in order to minimize the load on the TURN servers.

    Wow, that is pretty slick. It likely reduces the call setup time the user perceives. Let me repeat that: this is a hack which makes the user experience better!

    By how much is hard to quantify. Only a large-scale measurement of both this approach and the standard approach can answer that.

    Lessons for WebRTC

    In WebRTC, we can do something similar, but it is a little more effort right now. We can setup the call with iceTransports: ‘relay’ which will skip host and server-reflexive candidates. Also, using a relay helps to guarantee the connetion will work (in conditions where WebRTC will work at all).

    There are some drawbacks to this approach in terms of round-trip-times due to TURN’s permission mechanism. Basically when creating a TURN-relayed candidate the following happens (in Chrome; Firefox’s behavior differs slightly):

    1. Chrome tries to create an allocation without authentication
    2. the TURN server asks for authentication
    3. Chrome retries to create an allocation with authentication
    4. the TURN server tells chrome the address and port of the candidate.
    5. Chrome signals the candidate to the JavaScript layer via the onicecandidate callback. That is two full round-trip times.
    6. after adding a remote candidate, Chrome will create a TURN permission on the server before the server will relay traffic from the peer. This is a security mechanism described here.
    7. now STUN binding requests can happen over the relayed address. This uses TURN send and data indications. These add the peer’s address and port to each packet received.
    8. when agreeing on a candidate, Chrome creates a TURN channel for the peer’s address which is more efficient in terms of overhead.

    Compared to this, the proprietary mechanism used by Whatsapp saves a number of roundtrips.

    this is a hack which makes the user experience better!

    If we started with just relay candidates, then, since this hides the IP addresses of the parties involved from each other, we might even establish the relayed connection and do the DTLS handshake before the callee accepts the call. This is known as transport warmup, it reduces the perceived time until media starts flowing.

    Once the relayed connection is established, we can call setConfiguration (formerly known as updateIce; which is currently not implemented) to remove the restriction to relay candidates and do an ICE restart by calling createOffer again with the iceRestart flag set to true. This would trigger an ICE restart which might determine that a P2P connection can be established.

    Despite updateIce not being implemented, we can still switch from a relay to peer-to-peer today. ICE restarts work in Chrome so the only bit we’re missing is the iceTransports ‘relay’ which just generates relay candidates. Now the same effect can be simulated in Javascript by dropping any non-relay candidates during the first iteration. It was pretty easy to implement this behaviour in my favorite sdp munging sample. The switch from relayed to P2P just works. The code is committed here.

    While ICE restart is inefficient currently, the actual media switch (which is hard) happens very seamlessly.

     

    In my humble opinion

    Whatsapp’s usage of STUN and RTP seems a little out of date. Arguably, the way STUN is used is very straightforward and makes things like implementing the switch from relayed calls to P2P mode easier. But ICE provides methods to accomplish the same thing, in a more robust way. Using a custom TURN-like functionality that delivers raw RTP from the conference bridge saves some bytes’ overhead for TURN channels, but that overhead is typically negligible.

    Not using DTLS-SRTP with ciphers capable of perfect forward secrecy is a pretty big issue in terms of privacy. SDES is known to have drawbacks and can be decrypted retroactively if the key (which is transmitted via the signaling server) is known. Note that the signaling exchange might still be protected the same way it is done for text messages.

    In terms of user experience, the mid-call blocking of P2P showed that this scenario had been considered which shows quite some thought. Echo cancellation is a serious problem though. The webrtc.org echo cancellation is capable of a much better job and seems to be included in the binary already. Maybe the team there would even offer their help in exchange for an acknowledgement… or awesome chocolate.

     

    {“author”: “Philipp Hancke“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post What’s up with WhatsApp and WebRTC? appeared first on webrtcHacks.

    The WebRTC Troubleshooter: test.webrtc.org

    Mon, 04/20/2015 - 09:00

    WebRTC-based services are seeing new and larger deployments every week. One of the challenges I’m personally facing is troubleshooting as many different problems might occur (network, device, components…) and it’s not always easy to get useful diagnostic data from users.

    troubleshooting (Image source: google)

    Earlier this week, Tsahi, Chad and I participated at the WebRTC Global Summit in London and had the chance to catch up with some friends from Google, who publicly announced the launch of test.webrtc.org. This is great diagnostic tool but, to me, the best thing is that it can be easily integrated into your own applications; in fact, we are already integrating this in some of our WebRTC apps.

    Sam, André and Christoffer from Google are providing here a brief description of the tool. Enjoy it and happy troubleshooting!

    {“intro-by”: “victor“}

    The WebRTC Troubleshooter: test.webrtc.org (by Google) Why did we decide to build this?

    We have spent countless hours debugging things when a bug report comes in for a real-time application. Besides the application itself, there are many other components (audio, video, network) that can and will eventually go wrong due to the huge diversity among users’ system configurations.

    By running small tests targeted at each component we hoped to identify issues and create the possibility to gather information on the system reducing the need for round-trips between developers and users to resolve bug reports.

    Test with audio problem


    What did we build?

    It was important to be able to run this diagnostic tool without installing any software and ideally one should be able to integrate very closely with an application, thus making it possible to clearly identify bugs in an application from the components that power it.

    To accomplish this, we created a collection of tests that verify basic real-time functionality from within a web page: video capture, audio capture, connectivity, network limitations, stats on encode time, supported resolutions, etc… See details here. 

    We then bundled the tests on a web page that enables the user to download a report, or make it available via a URL that can be shared with developers looking into the issue.

    How can you use it?

    Take a look at test.webrtc.org and find out what tests you could incorporate in your app to help detect or diagnose user issues. For example, simple tests to distinguish application failures from system components failures, or more complex tests such as detecting if the camera is delivering frozen frames, or tell the user that their network signal quality is weak. 

    https://webrtchacks.com/wp-content/uploads/2015/04/test.webrtc.org_.mp4

    You are encouraged by us to take ideas and code from GitHub and integrate similar functionality in your own UX. Using test.webrtc.org should be part of any “support request” flow for real-time applications. We encourage developers to contribute! 

    In particular we’d love some help getting a uniform getStats API between browsers.

    test.webrtc.org repo

    What’s next?

    Working on adding more tests (e.g. network analysis detecting issues that affect audio and video performance is on the way).

    We want to learn how developers integrate our tests into their apps and we want to make them easier to use!

    {“authors”: [“Sam“, “André“, “Christoffer”]}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post The WebRTC Troubleshooter: test.webrtc.org appeared first on webrtcHacks.

    Put in a Bug in Apple’s Apple – Alex Gouaillard’s Plan

    Tue, 04/14/2015 - 13:08

    Apple Feast photo courtesy of flikr user Overduebook. Licensed under Creative Commons NC2.0.

    One of the biggest complaints about WebRTC is the lack of support for it inside Safari and iOS’s webview. Sure you can use a SDK or build your own native iOS app, but that is a lot of work compared to Android which has Chrome and WebRTC inside the native webview on Android 5 (Lollipop) today. Apple being Apple provides no external indications on what it plans to do with WebRTC. It is unlikely they will completely ignore a W3C standard, but who knows if iOS support is coming tomorrow or in 2 years.

    Former guest webrtcHacks interviewee Alex Gouillard came to me with an idea a few months ago for helping to push Apple and get some visibility. The idea is simple – leverage Apple’s bug process to publicly demonstrate the desire for WebRTC support today, and hopefully get some kind of response from them. See below for details on Alex’s suggestion and some additional Q&A at the end.

    Note: Alex is also involved in the webrtcinwebkit project – that is a separate project that is not directly related, although it shares the same goal of pushing Apple. Stay tuned for some coverage on that topic.

    {“intro-by”: “chad“}

    Plan to Get Apple to support WebRTC The situation

    According to some polls, adding WebRTC support to Safari, especially on iOS and in native apps in iOS, is the most wanted WebRTC item today.

    The technical side of the problem is simple: any native app has to follow Apple’s store rules to be accepted in the store. These rules state that any apps that “browse the web” need to use Apple provided WebView [rule 2.17] based on the WebKit framework. Safari is also based on WebKit. WebKit does not Support WebRTC… yet!

    First Technical step

    The webrtcinwebkit.org project aims at addressing the technical problem within the first half of 2015. However, bringing WebRTC support to WebKit is just part of the overall problem. Only Apple can decide to use it in their products, and they are not commenting about products that have not been released.

    There have been lots of signs though that Apple is not opposed to WebRTC in WebKit/Safari.

    • Before the Chrome fork of WebKit/WebCore in what became known as blink, Apple was publicly working on parts of the WebRTC implementation (source)
    • Two umbrella bugs to accept implementation of WebRTC in WebKit are still open and active in WebKit’s bugzilla, with an Apple media engineer in charge (Bug 124288 &  Bug 121101)
    • Apple Engineers, not the usual Apple standard representative, joined the W3C WebRTC working group early 2014 (public list), and participated to the technical plenary meeting in November 2014 (w3c members restricted link)
    • Finally, an early implementation of Media Streams and GetUserMedia API in WebKit was contributed late 2014 (original bug & commit).

    So how to let Apple know you want it and soon – potentially this year?

    Let Apple know!

    Chrome and Internet Explorer (IE), for example, have set up pages for web developers to directly give their feedback about which feature they want to see next (WebRTC related items generally rank high by the way).  There is no such thing yet for Apple’s product.

    The only way to formally provide feedback to Apple is through the bug process. One needs to have or create a developer account, and open a bug to let Apple know they want something.  Free accounts are available, so there is no financial cost associated with the process. One can open a bug in any given category, the bugs are then triaged and will end up in “WebRTC” placeholder internally.

    Volume counts. The more people will ask for this feature, the most likely Apple is to support it. The more requests the better.

    But that is not the only thing that counts. Users of WebRTC libraries, or any third party who has a business depending on WebRTC can also raise their case with Apple that their business would profit from Apple supporting WebRTC in their product. Here too, volume (of business) counts.

    As new releases of Safari are usually made with new releases of the OS, and generally in or around September, it is very unlikely to see WebRTC in Safari (if ever) before the next release, late 2015.

    We need you

    You want WebRTC support on iOS? You can help. See below for a step-by-step guide on how.

    How to Guide Step-by-step guide
    1. Register a free Apple Developer account. Whether you are a developer or not does not matter eventually. You will need to make an Apple ID if you do not have one already.
    2. Sign in to the Bug Reporter:
    3. Once signed in, you should see the following screen:
    4. Click on Open, then select Safari:
    5. Go ahead and write the bug report:

    It is very important here that you write WHY, in your own words, you want WebRTC support in Safari. There are a multiple of different reasons you might want it:

    • You’re a developer  you have developed a website that requires WebRTC support, and you cannot use it on Safari. If your users are requesting it, please share the volume of request, and/or share the volume of usage you’re getting on non-safari browsers to show the importance of the this for Apple.
    • You’re a company with a WebRTC product or service. You have the same problem as above, and the same suggestions apply.
    • You’re a user of a website that requires WebRTC, and owner of many Apple devices. You would love to be able to use your favorite WebRTC product or service on your beloved device.
    • You’re a company that propose a plugin for WebRTC in Safari, and you would love to get rid of it.
    • others

    Often times, some communities organize “bug writing campaigns” that include boilerplate text to include in a bug.  It’s a natural tendency for reviewers to discount those bugs somewhat because they feel like more of a “me too” than a bug filed by someone that took 60 seconds to write up a report in their own words.

    {“author”, “Alex Gouaillard“}

    {“editor”, “chad“}

    Chad’s follow-up Q&A with Alex

    Chad: What is Apple’s typical response to these bug filing campaigns?

    Alex: I do not have the direct answer to this, and I guess only Apple has. However, here are two very clear comments by an Apple representative:

    The only way to let Apple know that a feature is needed is through bug filling.

    I would just encourage people to describe why WebRTC (or any feature) is important to them in their own words. People sometimes start “bug writing campaigns” that include boilerplate text to include in a bug, and I think people here have a natural tendency to discount those bugs somewhat because they feel like more of a “me too” than a bug filed by someone that took 60 seconds to write up a report in their own words.”

    So my initiative here is not to start a bug campaign per say, where everybody would copy paste the same text, or click the same report to increment a counter. My goal here is to let the community know they can let Apple know their opinion in a way that counts.

    [Editor’s note: I was not able to get a direct confirmation from Apple (big suprise) – I did directly confirm  evidence that at least one relevant Apple employee agrees with the sentiment above.]

    Chad: Do you have any examples of where this process has worked in the past to add a whole new W3C-defined capability like WebRTC?

    Alex: I do not. However, the comment #1 above by Apple representative was very clear that whether it will eventually work or not, there is no other way.

    Chad: Is there any kind of threshold on the number of bug filings you think the community needs to meet?

    Alex: My understanding is that it’s not so much about the number of people that send bugs, it’s more about the case they make. It’s a blend between business opportunities and number of people. I guess volume counts – whether it is people or dollars. This is why it is so important that people use they own words and describe their own case. 

    Let’s say my friends at various other WebRTC Platform-as-a-Service providers desire to show the importance for them of having WebRTC in iOS or Safari- one representative of the company could go in and explain their use case and their numbers for the platform / service. They could also ask their devs to file a bug describing their application they developed on top of their WebRTC platform. They could also ask their users to describe why as users of the WebRTC app that they feel segregated against their friends who owns a Samsung tablet and who can enjoy WebRTC while they cannot on their iPad. (That is just an example, and I do not suggest that they should write exactly this. Again, everybody should use their own word.)

    If I understand correctly, it does not matter whether one or several employees of the above named company fill only one or several bugs for the same company use case.

    Chad: Are you confident this will be a good use of the WebRTC developer’s community’s time?

    Alex: Ha ha. Well, let’s put it that way, the whole process takes around a couple of minutes in general, and maybe just a little bit more for companies that have a bigger use case and want to weight in the balance. Less than what you are spending reading this blog post. If you don’t have a couple of minute to fill a bug to Apple, then I guess you don’t really need the feature.

    More seriously, I have been contacted by enough people that just wanted to have a way, anyway, to make it happen, that I know this information will be useful. For the cynics out there, I’m tempted to say, worse case scenario you lost a couple of minutes to prove me wrong. Don’t miss the opportunity.

    Yes, I’m positive this will be a good use of everybody’s time.

    {“interviewer”, “chad“}

    {“interviewee”, “Alex Gouaillard“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post Put in a Bug in Apple’s Apple – Alex Gouaillard’s Plan appeared first on webrtcHacks.

    The Minimum Viable SDP

    Tue, 03/31/2015 - 13:30

    Unnatural shrinkage. Photo courtesy Flikr user Ed Schipul

     

    One evening last week, I was nerd-sniped by a question Max Ogden asked:

    That is quite an interesting question. I somewhat dislike using Session Description Protocol (SDP)  in the signaling protocol anyway and prefer nice JSON objects for the API and ugly XML blobs on the wire to the ugly SDP blobs used by the WebRTC API.

    The question is really about the minimum amount of information that needs to be exchanged for a WebRTC connection to succeed.

     WebRTC uses ICE and DTLS to establish a secure connection between peers. This mandates two constraints:

    1. Both sides of the connection need to send stuff to each other
    2. You need at minimum to exchange ice-ufrag, ice-pwd, DTLS fingerprints and candidate information

    Now the stock SDP that WebRTC uses (explained here) is a rather big blob of text, more than 1500 characters for an audio-video offer not even considering the ICE candidates yet.

    Do we really need all this?  It turns out that you can establish a P2P connection with just a little more than 100 characters sent in each direction. The minimal-webrtc repository shows you how to do that. I had to use quite a number of tricks to make this work, it’s a real hack.

    How I did it Get some SDP

    First, we want to establish a datachannel connection. Once we have this, we can potentially use it negotiate a second audio/video peerconnection without being constrained in the size of the offer or the answer. Also, the SDP for the data channel is a lot smaller to start with since the is no codec negotiation. Here is how to get that SDP:

    var pc = new webkitRTCPeerConnection(null); var dc = pc.createDataChannel('webrtchacks'); pc.createOffer( function (offer) { pc.setLocalDescription(offer); console.log(offer.sdp); }, function (err) { console.error(err); } );

    The resulting SDP is slightly more than 400 bytes. Now we need also some candidates included, so we wait for the end-of-candidates event:

    pc.onicecandidate = function (event) { if (!event.candidate) console.log(pc.localDescription.sdp); };

    The result is even longer:

    v=0 o=- 4596489990601351948 2 IN IP4 127.0.0.1 s=- t=0 0 a=msid-semantic: WMS m=application 47299 DTLS/SCTP 5000 c=IN IP4 192.168.20.129 a=candidate:1966762134 1 udp 2122260223 192.168.20.129 47299 typ host generation 0 a=candidate:211962667 1 udp 2122194687 10.0.3.1 40864 typ host generation 0 a=candidate:1002017894 1 tcp 1518280447 192.168.20.129 0 typ host tcptype active generation 0 a=candidate:1109506011 1 tcp 1518214911 10.0.3.1 0 typ host tcptype active generation 0 a=ice-ufrag:1/MvHwjAyVf27aLu a=ice-pwd:3dBU7cFOBl120v33cynDvN1E a=ice-options:google-ice a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24 a=setup:actpass a=mid:data a=sctpmap:5000 webrtc-datachannel 1024

    Only take what you need

    We are only interested in a few bits of information here: 

    1. the ice-ufrag: 1/MvHwjAyVf27aLu
    2. the ice-pwd: 3dBU7cFOBl120v33cynDvN1E
    3. the sha-256 DTLS fingerprint: 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24
    4. the ICE candidates

    The ice-ufrag is 16 characters due to randomness security requirements from RFC 5245. While it is possible to reduce that, it’s probably not worth the effort. The same applies to the 24 characters of the ice-pwd. Both are random so there is not much to gain from compressing them even.

    The DTLS fingerprint is a representation of the 256 bytes of the sha-256 hash. It’s length can easily be reduced from 95 characters to almost optimal (assuming we want to be binary-safe) 44 characters: 

    var line = "a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24"; var hex = line.substr(22).split(':').map(function (h) { return parseInt(h, 16); }); console.log(btoa(String.fromCharCode.apply(String, hex))); // yields dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=

    So we have So we’re at 84 characters now. We can hardcode everything else in the application.

    Dealing with candidates

    Let’s look at the candidates. Wait, we got only host candidates. This is not going to work unless people are on the same network. STUN does not help much either since it only works in approximately 80% of all cases.

    So we need candidates that were gathered from a TURN server. In Chrome, the easy way to achieve this is to set the iceTransports constraint to ‘relay’ which will not even gather host and srflx candidates. In Firefox, you need to ignore all non-relay candidates currently.

    If you use the minimal-webrtc demo you need to use your own TURN credentials, the ones in the repository will no longer work since they’re using the time-based credential scheme. Here is what happened on my machine was that two candidates were gathered:

    a=candidate:1211076970 1 udp 41885439 104.130.198.83 47751 typ relay raddr 0.0.0.0 rport 0 generation 0 a=candidate:1211076970 1 udp 41819903 104.130.198.83 38132 typ relay raddr 0.0.0.0 rport 0 generation 0

    I believe this is a bug in chrome which gathers a relay candidate for an interface which is not routable, so I filed an issue.

    Lets look at the first candidate using the grammar defined in RFC 5245: 

    1. the foundation is 1211076970
    2. the component is 1. Another reason for using the datachannel, there are no RTCP candidates
    3. the transport is UDP
    4. the priority is 41885439
    5. the IP address is 104.130.198.83 (the ip of the TURN server I used)
    6. the port is 47751
    7. the typ is relay
    8. the raddr and rport are set to 0.0.0.0 and 0 respectively in order to avoid information leaks when iceTransports is set to relay
    9. the generation is 0. This is a Jingle extension of vanilla ICE that allows detecting ice restarts

    If we were to simply append both candidates to the 84 bytes we already have we would end up with 290 bytes. But we don’t need most of the information in there.

    The most interesting information is the IP and port. For IPv4, that is 32bits for the IP and 16 bits for the port. We can encode that using btoa again which yields 7 + 4 characters per candidate. Actually, if both candidates share the same IP, we can skip encoding it again, reducing the size.

    After consulting RFC 5245 it turned out that the foundation and priority can actually be skipped, even though that requires some effort. And everything else can be easily hard-coded in the application. 

    sdp.length = 106

    Let’s summarize what we have so far: 

    1. the ice-ufrag: 16 characters
    2. the ice-pwd: 22 characters
    3. the sha-256 DTLS fingerprint: 44 characters
    4. the ip and port: 11 characters for the first candidate, 4 characters for subsequent candidates from the same ip.

    Now we also want to encode whether this is an offer or an answer. Let’s use uppercase O and A respectively. Next, we concatenate this and separate the fields with a ‘,’ character. While that is less efficient than a binary encoding or one that relies on fixed field lengths, it is flexible. The result is a string like:

    O,1/MvHwjAyVf27aLu,3dBU7cFOBl120v33cynDvN1E, dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=, 1k85hij,1ek7,157k

    106 characters! So that is tweetable. Yay!

    You better be fast

    Now, if you try this it turns out it does not usually work unless you are fast enough pasting stuff.

    ICE is short for Interactive Connectivity Establishment. If you are not fast enough in transferring the answer and starting ICE at the Offerer, it will fail. You have less than 30 seconds between creating the answer at the Answerer and setting it at the Offerer. That’s pretty tough for humans doing copy-paste. And it will not work via twitter.

    What happens is that the Answerer is trying to perform connectivity checks as explained in RFC 5245. But those never reach the Offerer since we are using a TURN server. The TURN server does not allow traffic from the Answerer to be relayed to the Offerer before the Offerer creates a TURN permission for the candidate, which it can only do once the Offerer receives the answer. Even if we could ignore permissions, the Offerer can not form the STUN username without the Answerer’s ice-ufrag and ice-pwd. And if the Offerer does not reply to the connectivity checks by Answerer, the Answerer will conclude that ICE has failed.

     

    So what was the point of this?

    Now… it is pretty hard to come up with a use-case for this. It fits into an SMS. But sending your peer an URL where you both connect using a third-party signaling server is a lot more viable most of the time. Especially given that to achieve this, I had to make some tough design decisions like forcing a TURN server and taking some shortcuts with the ICE candidates which are not really safe. Also, this cannot use trickle ice.

    ¯\_(ツ)_/¯

    (thanks, Max)

    So is this just a case study in arcane signaling protocols? Probably. But hey, I can now use IRC as a signaling protocol for WebRTC. IRC has a limit of 512 characters so one can include more candidates and information even. CTCP WEBRTC anyone?

    {“author”: “Philipp Hancke“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post The Minimum Viable SDP appeared first on webrtcHacks.

    Avoiding Contact Center IVR Hell with WebRTC

    Mon, 03/02/2015 - 14:08

    A couple of decades ago if you bought something of any reasonable complexity, odds are it came with a call center number you had to call in case something went wrong. Perhaps like the airline industry, economic pressures on contact centers shifted their modus operandi from customer delight to cost reduction. Unsurprisingly this has not done well for contact center public sentiment. Its no wonder the web came along to augment and replace much of this experience –  but no where near all it. Today, WebRTC offers a unique opportunity for contact centers to combine their two primary means of customer interaction – the web and phone calls – and entirely change the dynamic to the benefit of both sides.

    To delve into what this looks like, we invited Rob Welbourn to walk us through a typical WebRTC-enabled contact center infrastructure. Rob has been working on the intersection of telephony and web technologies for more than 8 years, starting at Covergence. Rob continued this work which eventually coalesced into deep enterprise and contact center WebRTC expertise at Acme Packet, Oracle, Cafe X, and now as an consultant for hire.

    Please see Rob’s great technology brief on WebRTC architectures in the Contact Center below.

    {“intro-by”: “chad“}

    Robert Welbourn

    Introduction

    If ever there was an area where WebRTC is expected to have a major impact, it is surely the contact center.  By now most readers of this blog have seen the Amazon Kindle Fire commercials, featuring the get-help-now Mayday button and Amy, the annoyingly perky call center agent:

    Those in the industry know that Mayday’s voice and video capability use WebRTC, as detailed by Chad and confirmed by Google WebRTC development lead Justin Uberti.   When combined with screen sharing, annotation and co-browsing, this makes for a compelling package. Executives in charge of call centers have taken notice, and are looking to their technology suppliers to spice up their call centers in the same way.

    Indeed, the contact center is a very instructive example of how WebRTC can be used to enhance a well-established, existing system.  For those who doubt that the technology isn’t mature enough for widespread deployment, I’ll let you into a dirty little secret: WebRTC on the consumer side of the call center isn’t happening in web browsers, it’s happening in mobile apps.  I’ll say more about this later.

    What a Contact Center looks like

    Before we examine how we can turbocharge a contact center with WebRTC, let’s take a look at the main component parts, and some of the pain points that both customers and call center staff encounter in their daily lives.

    (Disclaimer:  This sketch is a simplified caricature of a call center, drawn from the author’s experience with a number of different systems.   The same is true for the descriptions of WebRTC gateways in the following sections, which should be viewed as idealized and not a description of any one vendor’s offerings.)

    Generic Contact Center Architecture

    The web-to-call correlation problem

    Let’s imagine that we’re a consumer, calling our auto insurance company.  Perhaps we’ve been to their website, or maybe we’re using their shiny new mobile app on our smartphone.  Either way, we’ve logged into the insurer’s web portal, to get an update on an insurance claim, update our coverage, or whatever.  (And yes, even if we’re using a mobile app, we’re most likely still communicating with a web server.  It’s only the presentation layer that’s different.)

    Now suppose that we actually want to talk to a human being who can help us.  If we’re lucky, the web site will provide a phone number in an easy-to-find place, or maybe our mobile app will automatically bring up the phone’s dialer to make the call.  However, at this point, all of our contextual information, such as our identity and the web page we were on, gets lost.

    The main problem here is that it is not easy to correlate the web session with the phone call.  The PSTN provides no way of attaching a context identifier from a web session to a phone call, leaving the caller ID or dialed number as the only clues in the call signaling.   That leaves us with the following possibilities:

    • Use the caller ID.  This is ambiguous at best, in that a phone number doesn’t definitively identify a person, and mobile device APIs in any case forbid apps from harvesting a device’s phone number, so it can’t be readily passed into the contact center by the app.
    • Use the called number.  Some contact centers use the concept of the steering pool, where a phone number from a pool is used to temporarily identify a particular session, which could potentially be used by a mobile app.  However, the redial list is the enemy of this idea; since the number is temporarily allocated to a session, you wouldn’t want a customer mistakenly thinking they could use the same number to call back later.
    • Have the contact center call the customer back when it’s their turn, and an agent is about to become available.  This is in fact a viable approach, but complex to implement, largely for reasons of not tying up an agent while an attempt is made to reach the customer and verify they still want the call.
    • Use WebRTC for in-app, contextual communications.
    Customer-side interaction

    But let’s continue with the premise that the customer has made a regular phone call to the contact center.  From the diagram above, we can see that the first entity the call hits is the ingress gateway (if via TDM) or Session Border Controller (if via a SIP trunk).  This will most likely route the call directly to an Interactive Voice Response (IVR) system, to give the caller an opportunity to perform self-service actions, such as looking up their account balance.  Depending on the vendor, the ingress gateway or SBC may itself take part in the interaction, by hosting a VoiceXML browser, as is the case with Cisco products; or else the IVR may be an application running on a SIP-connected media server platform.

    Whatever the specific IVR architecture, it will certainly connect to the same customer database used by the web portal, but using DTMF to input an account number and PIN, rather than a username and password.  If the customer is lucky, they have managed to find an account statement that tells them what their account number is; if not, the conversation with agent is going to start by having them spell their name, give the last four digits of their Tax ID, and so on.  Not only that, but if a PIN is used, it is doubtless the same one used for their bank card, garage door opener and everything else, which hardly promotes security.  This whole process is time-consuming for both customer and agent, error-prone, and generally frustrating.

    At this point the IVR has determined who the caller is, and why they are calling – “Press 1 for auto claims, 2 for household claims…”; the call now needs to be held in a queue, awaiting a suitably qualified agent.  The job of managing the pool of agents with their various skills, and the queues of incoming calls, is the job of the Automated Call Distributor (ACD).  An ACD typically has a well-defined but proprietary interface or protocol by which it interacts with an IVR.  The IVR will submit various data items to the ACD, notably the caller ID, called number, customer identity and required skill group.  The ACD may then itself interrogate the customer database, perhaps to determine whether this is a customer who gets priority service, or whether they have a delinquent account and need to be handled specially, and so on, so that the call can be added to the appropriate queue.  The ACD may also be able to provide the IVR with the estimated wait time for an agent, for feedback to the caller.

    Agent-side interaction

    Let’s turn for a moment to the agent’s side of the contact center.  An agent will invariably have a phone (whether a physical device or a soft client), an interface to the ACD (possibly a custom “thick client”, but increasingly a web-based one in modern contact centers) and a view into the customer database.  For business-to-business contact centers, the agent may also be connected to a CRM system: Salesforce.com, Siebel, Oracle CRM, Microsoft Dynamics, and so on.

    For the purposes of our discussion, the agent’s phone is connected to a PBX, the PBX will provide call status information to the ACD using a standard telephony interface such as JTAPI, and the ACD will in turn use the same interface to direct incoming calls to agents.  This would typically be the case where an organization has a Cisco or Avaya PBX, for example, and the use of standard JTAPI allows for the deployment of a multi-vendor call center.  Other vendors, notably Genesys, have taken the approach of building their call center software using a SIP proxy as a key component, and the agents register their phones directly with the ACD rather than with a PBX.

    The agent will log into the ACD at the beginning of their shift, signaling that they are available.  Call handling is then directed by the ACD, and when a call is passed to an agent, the ACD pushes down the customer ID to the agent’s desktop, which is then used to automatically do a “screen pop” of the customer’s account details from the customer database or CRM system.

    Call handling in a contact center is thus a complex orchestration between an IVR, ACD, PBX and various pieces of enterprise software, usually requiring the writing of custom scripts and plugins to make everything work together  Not only this, but contact centers also make use of analytics software, call recording systems, and so on.

    The caller experience

    Let’s return to our caller, parked on the IVR and being played insipid on-hold music.  When the call eventually reaches the head of the queue, the ACD will instruct the IVR to transfer the call to a specific agent.  The agent gets the screen pop, asks the caller to verify their identity, and then begins the process of asking why they called.

    To summarize, the contact center experience typically involves:

    • Loss of contextual information from an existing web or mobile app session.
    • Navigating IVR Hell.
    • Waiting on hold.
    • Re-establishing identity and context.
    • A voice-only experience with a faceless representative, and lack of collaboration tools.

    It’s no wonder this is judged a poor experience, for customers and contact center agents alike.

    Adding WebRTC to the Contact Center

    WebRTC is part of what is called in the contact center business, the “omnichannel experience”, in which multiple modalities of communication between a customer and the contact center all work together seamlessly.  An interaction may start on social media, be escalated to chat, from there to voice and video, and possibly be accompanied by screen sharing and co-browsing.  But how is this accomplished?

    Contact Center with WebRTC Gateways and Co-browse Server

    The key thing to hold in mind is that voice and video are integrated into the contact center app, and that context is at all times preserved.  As a customer, you have already established your identity with the contact center’s web portal; there’s no need to have the PSTN strip that away when you want to talk to a human being.  And when you do get put through to an agent, why shouldn’t they be able to view the same web page that you do?  (Subject to permission, of course.)

    To do this, we need the following components (shown colored purple in the above diagram):

    • A back-end to the web portal that is capable of acting as a pseudo-IVR.  As far as the ACD is concerned, it’s getting a regular incoming call, which has to be queued and transferred to an agent as usual.  The fact that this is a WebRTC call and not from the PSTN is totally transparent to the ACD.
    • A co-browsing server – this acts as a rendezvous point between the customer and the agent for a particular co-browsing session, where changes to the customer’s web page are published over a secure WebSockets (WSS) connection, and the agent subscribes to those changes.  The actual details of how this works are proprietary and vary between vendors; however, the DOM Mutation Observer API is generally at the heart of the toolkit used.  When the agent wishes to navigate on behalf of the customer, mouse-click  events are sent back over the WSS connection from the agent and injected into the customer’s web page using a JavaScript or jQuery simulated mouse click event.  Annotation works similarly, with a mousedown event being passed over the WSS connection and used to paint on an HTML canvas element overlaying the customer’s web page.
    • A WebRTC-to-SIP signaling gateway (as webrtcHacks has covered here).
    • A media gateway, which transforms the SRTP used by WebRTC to the unencrypted RTP used by most enterprise telephony systems, and vice-versa.  This element may also carry out H.264 to VP8 video transcoding and audio codec transcoding if required.

    The signaling and media gateways are common components for vendors selling WebRTC add-ons for legacy SIP-based systems, and are functionally equivalent in the network to a Session Border Controller.  Indeed, several such products are based on SBCs, or a combination of an SBC for the media and a SIP application server for the signaling gateway.  On the other hand, the pseudo-IVR and co-browse servers are rather more specialized elements, designed for contact center applications.

    The work of this array of network elements is coordinated by the web portal, using their APIs and supporting SDKs.  The sequence diagrams in the next section show how the web portal and the ACD between them orchestrate a WebRTC call from its creation to being handed off as a SIP call to an agent, and how it is correlated with a co-browsing session.

    Finally, it should be noted that a reverse HTTP proxy is generally required to protect the web servers in this arrangement, which reside within the inner firewall.  The media gateway would normally be placed within the DMZ.  The use of multiplexing to allow the media streams of multiple calls to use a single RTP port is a particularly noteworthy feature of WebRTC, which is deserving of appreciation by those whose job it is to manage firewalls.

    Call Flows

    In the diagrams that follow, purple lines indicate web-based interactions, often based on REST APIs. Some interactions may use WebSockets because of their asynchronous, bidirectional nature, which is particularly useful for call signaling and event handling.

    Preparing for a call

    Let us start at the point where the customer has already been authenticated by the web portal, and has been perusing their account details. Seeing the big friendly ‘Get Help’ button on their mobile app (remember, this is a mobile-first deployment), they decide they want to talk to a human. Inevitably, an agent is never just sitting around waiting for the call, so there is work to be done to make this happen.

    WebRTC Contact Center Call Flow, Part 1

    The first step in preparing for the call is for the web portal code to allocate a SIP identity for the caller, in other words, the ‘From’ URI or the caller id. This could be any arbitrary string or number, but it should be unique, since we’re also going to use it to identify the co-browse session. Next, the portal requests the WebRTC signaling gateway to authorize a session for this particular URI, because, well, you don’t want people hacking into your PBX and committing toll fraud using WebRTC. The signaling gateway obliges, and passes back to the web portal a one-time authorization token. Armed with the token, the portal instructs the client app (or browser) to prepare for the WebRTC call. It provides the token, the From URI, the location of a STUN server and information on how to contact the signaling gateway.

    While the client is being readied, the portal makes a web services call to the ACD to see when an agent is expected to become available, given the customer’s identity and the nature of their inquiry. (The nature of the inquiry will be determined by what page of the website or app they were on when they pressed the ‘Get Help’ button.) Assuming an agent is not available at that very moment, the portal passes back the estimated wait time to be displayed by the client.

    But what about the insipid on-hold music I mentioned earlier? Don’t we need to transfer the customer to a media server to play this? Well, no, we don’t. This is the Web we’re talking about, and we can readily tell the client to play a video from YouTube, or wherever, while they are waiting.

    Next, the web portal submits the not-yet-created call to the ACD for queuing, via the pseudo-IVR component. Key pieces of information submitted are the From URI, the customer ID and the queue corresponding to the reason for the call. When the call reaches the head of the queue, the ACD instructs the call to be transferred to the selected agent.

    (Side-note: Pseudo-IVR adapters for contact centers are used for a variety of purposes. They may be used to dispatch social media “tweets”, inbound customer service emails and web-chat sessions, as well as WebRTC calls.)

    For modern deployments, agent desktop software may be constructed from a web-based framework, which allows third-party plugin components to pull information from the customer database, to connect to a CRM system, and in our case, to connect to the co-browse server. The screen pop to the agent uses the customer URI to connect to the correct session on the co-browse server.

    Making the call

    Now that the customer and agent are both ready, the web portal instructs the WebRTC client to call the agent’s URI. The actual details of how this is done depend on the vendor-specific WebRTC signaling protocol supported by the gateway; however, on the SIP side of the gateway they are turned into the standard INVITE, with the SDP payload reflecting the media gateway’s IP address and RTP ports.

    WebRTC Contact Center Call Flow, Part 2

    The fact that this is a video call is transparent to the ACD. The pool of agents with video capability can be put in their own skill group for the purposes of allocating them to customers using WebRTC.  The agents could be using video on suitably equipped phone handsets, or they could themselves be using WebRTC clients.  Indeed, some contact center vendors with whom I have spoken point to the advantages of delivering the entire agent experience within a browser: delivering updates to a SIP soft client or thick-client agent desktop software then becomes a thing of the past.

    After the video session has been established, the customer may assent to the sharing of their mobile app screen or web-browsing session.  The co-browse server acts as the rendezvous point, with the customer’s unique URI acting as the session identifier.

    Concluding thoughts: It’s all about mobile, stupid!

    The fact that WebRTC is not ubiquitous, that it is not currently supported in major browsers such as Safari and Internet Explorer, might be thought an insurmountable barrier to deploying it in a contact center.  But this is not the case.  The very same infrastructure that works for web browsers also works for mobile apps, which in many cases are simply mobile UI elements placed on top of a web application, making the same HTTP calls on a web server.

    All that is required is a WebRTC-based SDK that works in Android or iOS.  Happily for us, Google has made its WebRTC code readily available through the Chromium project.  Several vendors have made that code the basis of their mobile SDKs, wrapping them with Java and Objective-C language bindings equivalent to the JavaScript APIs found in browsers.

    For contact center executives, a mobile-first approach offers the following advantages:

    • You don’t want your customers messing around trying to install WebRTC browser plugins for Safari and IE.  If they’re going to download anything, it may as well be your mobile app.
    • Mobile devices are near-ubiquitous.  Both Pew and Nielsen report their popularity amongst older demographics in particular, where regular PCs might not be used.
    • Microphones and cameras on mobile devices are near-universal and of excellent quality, and echo cancellation works well.  That old PC with a flaky webcam?  Perhaps not so much.
    • If your customer is having a real-world problem, then the back-facing camera on a phone or tablet is a great way of showing it.  The auto insurance industry comes readily to mind.
    • Although the Great Video Codec Compromise now promises H.264 support in browsers, mobile SDKs have been able to take advantage of those devices’ hardware support for H.264 video encoding for some time.  When your contact center agents have sleek enterprise-class, video-capable phones that don’t support VP8, you don’t want to have to buy a pile of servers simply to do video transcoding.

    In the call center industry, Amazon and American Express have shown the way in supporting video in their tablet apps, and both these services use WebRTC under the hood.  Speaking at the 2014 Cisco Live! event in San Francisco, Amex executive Todd Walthall related how users of the Amex iPad app who used the video feature had greater levels of customer satisfaction, through a more personal experience.  This should not surprise us, as it’s much easier to empathize with a customer service representative if they’re not just a disembodied voice.

    For companies deploying WebRTC, it’s an incremental approach that doesn’t require significant architectural change or the replacement of existing systems. Early adopters are seeing shorter calls, as context is preserved and co-browsing allows problems to be resolved more quickly. One day we will look back at IVR Hell, waiting on endless hold with only a lo-fi rendition of Mantovani for company, trying in vain to find our account number and PIN, as if it were a childhood nightmare.

    {“author”: “Robert Welbourn“}

    Want to keep up on our latest posts? Please click here to subscribe to our mailing list if you have not already. We only email post updates. You can also follow us on twitter at @webrtcHacks for blog updates and news of technical WebRTC topics or our individual feeds @chadwallacehart@reidstidolph, @victorpascual and @tsahil.

    The post Avoiding Contact Center IVR Hell with WebRTC appeared first on webrtcHacks.

    Pages

    Using the greatness of Parallax

    Phosfluorescently utilize future-proof scenarios whereas timely leadership skills. Seamlessly administrate maintainable quality vectors whereas proactive mindshare.

    Dramatically plagiarize visionary internal or "organic" sources via process-centric. Compellingly exploit worldwide communities for high standards in growth strategies.

    Get free trial

    Wow, this most certainly is a great a theme.

    John Smith
    Company name

    Startup Growth Lite is a free theme, contributed to the Drupal Community by More than Themes.