TL;DR – YES.
Do I need a media server for a one-to-many WebRTC broadcast?
That’s the question I was asked on my chat widget this week. The answer was simple enough – yes.
Then I received a follow up question that I didn’t expect:
That caught me off-guard. Not because I don’t know the answer. Because I didn’t know how to explain it in a single sentence that fits nicely in the chat widget. I guess it isn’t such a simple question either.
The simple answer is a limit in resources, along with the fact that we don’t control most of these resources.
The Hard Upper Limit
Whenever we want to connect one browser to another with a direct stream, we need to create and use a peer connection.
Chrome 65 includes an upper limit on the number of peer connections, which is there for garbage collection purposes. Chrome is not going to allow more than 500 concurrent peer connections to exist.
500 is a really large number. If you plan on more than 10 concurrent peer connections, you should be one of those who know what they are doing (and don’t need this blog). Going above 50 seems like a bad idea for all use cases that I can remember taking part in.
Understand that resources are limited. Free and implemented in the browser doesn’t mean that there aren’t any costs associated with it, or that you won’t need to implement stuff and sweat while doing so.
Bitrates, Speeds and Feeds
This is probably the main reason why you can’t broadcast without a media server, be it with WebRTC or with any other technology.
We are looking at a challenging domain with WebRTC. Media processing is hard. Real time media processing is harder.
Assume we want to broadcast a video at a low VGA resolution. We checked and decided that 500kbps of bitrate offers good results for our needs.
What happens if we want to broadcast our stream to 10 people?
Broadcasting our stream to 10 people requires an uplink bitrate of 5Mbps (10 × 500kbps).
If we’re on an ADSL connection, then we can find ourselves with only 1-3Mbps of uplink, so we won’t be able to broadcast the stream to our 10 viewers.
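To make the arithmetic concrete, here is a minimal sketch. It assumes the broadcaster sends one full copy of the stream to every viewer and ignores protocol overhead and audio; the function names are made up for illustration.

```python
def required_uplink_kbps(stream_kbps: int, viewers: int) -> int:
    """Uplink needed when the broadcaster sends one copy of the stream per viewer."""
    return stream_kbps * viewers


def max_direct_viewers(uplink_kbps: int, stream_kbps: int) -> int:
    """How many full-bitrate copies fit in a given uplink (overhead ignored)."""
    return uplink_kbps // stream_kbps


STREAM_KBPS = 500  # the VGA bitrate chosen in the text

# 10 viewers at 500kbps each
print(required_uplink_kbps(STREAM_KBPS, 10), "kbps needed for 10 viewers")

# What a 1Mbps and a 3Mbps ADSL uplink can carry
for uplink_mbps in (1, 3):
    viewers = max_direct_viewers(uplink_mbps * 1000, STREAM_KBPS)
    print(f"{uplink_mbps}Mbps uplink fits {viewers} viewers")
```

Under these assumptions a 1-3Mbps uplink tops out at somewhere between 2 and 6 viewers, well short of the 10 we wanted.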
For the most part, we don’t control where our broadcasters are going to be. Over ADSL? WiFi? 3G network with poor connectivity? The moment we start dealing with broadcast we will need to make such assumptions.
That’s for 10 viewers. What if we’re looking for 100 viewers? 1,000? A million?
With a media server, we decide the network connectivity, the machine type of the server, etc. We can decide to cascade media servers to grow our scale of the broadcast. We have more control over the situation.
Broadcasting a WebRTC stream requires a media server.
Sender Uniformity
I see this one a lot in the context of a mesh group call, but it is just as relevant towards broadcast.
When we use WebRTC for a broadcast type of service, a lot of decisions end up taking place in the media server. If a viewer has a bad network, this will result in packet loss being reported to the media server. What should the media server do in such a case?
While there’s no simple answer to this question, the alternatives here include:
- Asking the broadcaster to send a new I-frame, which will affect all viewers and increase bandwidth use for the near future (you don’t want to do it too much as a media server)
- Asking the broadcaster to reduce bitrate and media quality to accommodate the packet losses, affecting all viewers and not only the one on the bad network
- Ignoring the issue of packet loss, sacrificing the user for the “greater good” of the other viewers
- Using Simulcast or SVC, and moving the viewer to a lower “layer” with lower media quality, without affecting other users
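The last alternative can be sketched roughly as follows. This is not how any specific media server implements it; the layer names, bitrates and per-viewer bandwidth estimates are all made-up assumptions, and a real server would derive the estimates from receiver feedback.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


# Hypothetical simulcast layers published by the broadcaster
LAYERS: List[Layer] = [Layer("low", 150), Layer("medium", 500), Layer("high", 1200)]


def pick_layer(estimated_downlink_kbps: int, layers: List[Layer] = LAYERS) -> Layer:
    """Return the highest-bitrate layer that fits the viewer's estimated bandwidth.

    Falls back to the lowest layer when nothing fits.
    """
    ordered = sorted(layers, key=lambda layer: layer.kbps)
    best = ordered[0]
    for layer in ordered:
        if layer.kbps <= estimated_downlink_kbps:
            best = layer
    return best


# Hypothetical bandwidth estimates, in kbps, for three viewers
estimates: Dict[str, int] = {"alice": 2000, "bob": 600, "carol": 100}
choices = {viewer: pick_layer(kbps).name for viewer, kbps in estimates.items()}
print(choices)
```

Each viewer gets a decision of their own, so the one on the bad network drops to a lower layer while everyone else keeps the quality they had.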
You can’t do most of these in a browser. The browser will tend to send the same single encoded stream as is to all others, and it won’t do a good job of estimating bandwidth properly across multiple users. It is just not designed or implemented to do that.
You Need a Media Server
In most scenarios, you will need a media server in your implementation at some point.
If you are broadcasting, then a media server is mandatory. And no. Google doesn’t offer such a free service or even open source code that is geared towards that use case.
It doesn’t mean it is impossible – just that you’ll need to work harder to get there.
Looking to learn more about WebRTC? In the coming weeks, I’ll be refreshing my online WebRTC training. Join now so you don’t miss out.
The post Do I Need a Media Server for a One-to-Many WebRTC Broadcast? appeared first on BlogGeek.me.
Time to stop playing things on the internet and start building the internet of things.
We’ve been using that stupid IOT acronym for quite some time. Probably a decade. The idea and notion that every object can be network enabled, share its collected data and receive its commands remotely is quite exciting. I think we’re far from that vision.
It isn’t that we’re not making progress. We are. The apartment building I now live in is 3 years old. It is more automated than the previous apartment building I lived in, which was 15 years old. I wouldn’t call it IOT or a smart building quite yet. And I don’t think there’s a simple way to turn a dumb building into a smart one either.
When we moved to our new apartment we renovated a bit. There was this opportunity to add smart-home capabilities into the apartment. There was just a teeny set of problems here:
- There’s no real business case for us yet. As a family, we really don’t need a smart-home, and frankly – I still haven’t seen one to appreciate the added benefit
- Since we’re in a highrise, the need for an apartment security/surveillance system seemed like overkill. The most we ended up with is a peephole camera for the door. Mainly to empower our kids to see who’s knocking (no IOT or smarts in it)
- Talking to the electrician who ended up dealing with our power outlets at home, I understood that there aren’t enough electricians available who know how to install a smart-home kit here in Israel
And to top it all, it felt like a one time undertaking that will be hard/impossible to upgrade or modify later on without a complete overhaul. That wasn’t what I was aiming for.
Mozilla just announced their Things Gateway that can be installed on a Raspberry Pi 3. It is a rather interesting project, especially since its learnings are then applied to the W3C Web of Things Interest Group with the intent of reducing the fragmentation of IOT. They’ve got their hands full of work.
IOT today is a patchwork of devices and companies, each trying to become a dominant player. The end result is that we’re living in a world where things can be placed on the internet, but they don’t amount to an internet of things.
Here are a few questions/hurdles that I think we’ll need to answer as an industry before we can reach that vision of IOT.
Security
I am putting security here first. Here’s why:
- We all know it is mandatory
- We all know it is left as a backlog item if it is considered at all
I’ve seen it happen with VoIP and it is definitely happening today with IOT.
Until this becomes a priority, IOT will not really happen.
Security has many different aspects to it:
- Encryption of the communications, to maintain privacy and allow for authorization and authentication of it
- Upgradability, which itself should be secure, straightforward and automated
- Audit logs that are hard to tamper with, so we can investigate hacks
Most vendors won’t be able to get these done properly to begin with. And they don’t have any real incentive to do that either.
Standardization
There’s a need for standardization in this space. One that tackles all levels of the IOT food-chain.
Off the top of my head, here are a few areas:
- Physical – Wi-Fi, Zigbee, Bluetooth – all are standards for the underlying network layer to be used. There’s also RFID and other types of connections that can be used. And we need to factor in 5G at some point. We’ve got wireless ones and wireline ones. A total mess. Just look at the Mozilla Things Gateway announcement for the set of connectors they support and how these get supported. Too much information to get things done easily
- Transport – once we get communications, and assume (naively) that we have IP communications going, do we then run our data over TCP? Or TLS? Or maybe UDP? Or should we go for QUIC? Or HTTP/2? Should we do it over MQTT maybe? Over a WebSocket? There are too many alternatives here
- Signaling – What are the types of messages we’re going to allow? What controls what sensor data? How do we describe it in a way that can be easily extendable and unambiguous? I’ve been there with VoIP and it was hard enough. Doing it for IOT is an order of magnitude harder (more players, more devices, more everything)
- Processing – this relates to the next topic of automation. Once we can collect, control and make decisions over a single device, can we do it in aggregate, and in ways that won’t lock us in to a single vendor?
I don’t believe we’ll get this thing standardized properly in our industry for quite some time.
Automation
I’ve seen a lot of rules engines when it comes to IOT. You can program them to create sequences of events – if the density sensor indicates someone is at home, turn on the lights.
The problem is that you need to program them. This can’t scale.
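To show what “programming them” means in practice, here is a toy sketch of a rules engine with the single rule described above. It is not modeled on any real product; the state keys and function names are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[Callable[[State], bool], Callable[[State], None]]


def someone_is_home(state: State) -> bool:
    """Condition: the (hypothetical) sensor reports that someone is at home."""
    return state.get("someone_home", False)


def turn_on_lights(state: State) -> None:
    """Action: switch the (hypothetical) lights on."""
    state["lights_on"] = True


# Every behavior has to be written down by hand as a (condition, action) pair
RULES: List[Rule] = [(someone_is_home, turn_on_lights)]


def run_rules(state: State, rules: List[Rule] = RULES) -> State:
    """Evaluate each rule once against the current state, in order."""
    for condition, action in rules:
        if condition(state):
            action(state)
    return state


print(run_rules({"someone_home": True}))
```

One rule is easy. The list grows with every device, every room and every exception, and somebody has to write and maintain all of it.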
The other problem is the issue of what to do with all that sensor data? Someone needs to collect it, aggregate it, process it, analyze it and make decisions out of it.
Simple rule engines are nice, but they won’t get us far down the IOT path.
We also need to add machine learning and AI into the mix.
The end result? Probably similar in nature to AWS DeepLens. The only problem is that it needs to be really generic and flexible.
Different Industries, Different Requirements and Ecosystems
There are different markets in IOT. They have different needs and different customers. They will have different ecosystems around them.
In broad strokes, we can split to consumer and enterprise. Enterprise here includes industrial, smart cities, etc. The consumer is all about the home, the car and the self.
Who will be the players here?
From Smartphones to Smart Speakers
This is where I think we made the most progress.
Up until a year ago, IOT was something you end up delivering to customers via apps on a smartphone. You purchase a lightbulb, you get an app. You get a new TV, there’s an app. Refrigerator? App.
Amazon Alexa did something miraculous. It moved the discussion over the home from an app towards a stationary home device with voice activation and control. No screen or touch screen needed.
Since then, Google and Apple have joined and voice assistants in the home are all the rage now.
In some ways, I expect this to find its way into the enterprise as well. First via conference rooms and later – who knows?
This is one more piece in the IOT puzzle.
Where do we go from here?
I have no clue.
To me, it seems that we’re still at the things-on-the-internet stage, and we will be there for a lot longer.
There are things you don’t want to do when you are NIH’ing your way to a stellar WebRTC application.
Here’s a true, sad story. This month, the unimaginable happened. Rain (!) dropped from the sky here in Israel. The end of it was that 6 apartments in my building are suffering from moisture due to a leakage from a balcony of the penthouse. Being a new building, we’re at the mercies of the contractor to fix it.
Nothing in the construction market moves fast in Israel – or without threats, so we had to start sending official sounding letters to the contractor about the leak. I took charge, and immediately said we need to lawyer up and have a professional assist us in writing a letter from us to the contractor. Others were of the opinion that we can do it on our own, as we need a lawyer only if he signs the document directly.
And then it hit me. The reason I wanted to lawyer up is that I see many smart people failing with WebRTC. They are making rookie mistakes, and I didn’t want to make rookie mistakes when it comes to the moisture problems in my apartment.
Why are we Failing with WebRTC?
I am not sure that smart people fail a lot more with WebRTC technology than they do with other technologies, but it certainly feels that way.
A famous Mark Twain quote goes like this:
“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. We keep on turning and making new combinations indefinitely; but they are the same old pieces of colored glass that have been in use through all the ages.”
Many of the rookie mistakes people make with WebRTC stem from this. WebRTC is this kind of new. It is simply a lot of old ideas meshed into a new and curious combination. So we know it. And we assume we know how to handle ourselves around it.
Entrepreneurs? Skype is 14 years old. It shouldn’t be that hard to build something like Skype today.
VoIP developers? SIP we know. WebRTC is just SIP without the signaling. So we force SIP onto it and we’re done.
Web developers? WebRTC is part of HTML5. A few lines of JS code and we’re practically ready to go live.
Video developers? We can just take the WebRTC video feeds and put them on a CDN. Can’t we?
- Smart people decide they know enough to go it alone. And end up making some interesting mistakes
- People put their faith in one of the above personas… only to fail
My biggest gripe recently is people who decide in 2018 that peerJS is what they need for their WebRTC application. A project with 402 lines of code, last updated in 2015 (!). You can’t use such code with WebRTC. Code older than a year is stale or dead already. WebRTC is still too new and too dynamic.
That said, it isn’t as if you have a choice anymore. Flash is dying, and there’s no other serious alternative to WebRTC. If you’re thinking of adopting WebRTC, then here are five mistakes to avoid.
Mistake #1: Failing to Configure STUN/TURN
You wouldn’t believe how often developers fail to configure NAT traversal servers. Just yesterday I had someone ask me over the chat widget of my website how he can run his application by hosting his signaling and web servers on HostGator without any STUN/TURN servers. It just doesn’t work.
The simple answer is that you can’t – barring some esoteric use cases, you will definitely need STUN servers. And for most use cases, TURN servers will also be mandatory if you want sessions to connect.
In the past month, I found myself explaining quite a lot about NAT traversal:
- You must use STUN and TURN servers
- Don’t rely on free STUN servers, and definitely don’t use “free” TURN servers
- Don’t force all sessions via TURN unless you absolutely know what you’re doing
- Using TURN doesn’t add any security
- You don’t need more than 1 STUN server and 3 TURN servers (UDP, TCP and TLS) in your servers configuration in WebRTC
- Use temporary/ephemeral passwords in your TURN configuration
- STUN doesn’t affect media quality
- coturn or restund are great options for STUN/TURN servers
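A few of these points can be sketched in code: one STUN entry, three TURN entries (UDP, TCP and TLS), and ephemeral credentials. The sketch assumes a TURN server configured for shared-secret, time-limited credentials (coturn’s `use-auth-secret` mode is one such option), where the username is an expiry timestamp plus a user id and the password is a base64-encoded HMAC-SHA1 of that username. The hostname, ports, secret and function names are placeholders, not a recommendation.

```python
import base64
import hashlib
import hmac
import time
from typing import Dict, List, Optional, Tuple


def turn_credentials(shared_secret: str, user_id: str,
                     ttl_seconds: int = 3600,
                     now: Optional[float] = None) -> Tuple[str, str]:
    """Build a time-limited TURN username/password pair from a shared secret."""
    expiry = int(time.time() if now is None else now) + ttl_seconds
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def ice_servers(host: str, username: str, credential: str) -> List[Dict[str, str]]:
    """One STUN entry and three TURN entries (UDP, TCP, TLS) for the client config."""
    auth = {"username": username, "credential": credential}
    return [
        {"urls": f"stun:{host}:3478"},
        {"urls": f"turn:{host}:3478?transport=udp", **auth},
        {"urls": f"turn:{host}:3478?transport=tcp", **auth},
        {"urls": f"turns:{host}:5349?transport=tcp", **auth},
    ]


# Placeholder values: your signaling server would do this per session
user, password = turn_credentials("replace-with-your-secret", "alice")
for server in ice_servers("turn.example.com", user, password):
    print(server["urls"])
```

The resulting list is what your signaling server would hand to the client to use as the `iceServers` of its peer connection configuration. The credentials expire on their own, so a leaked password is only useful for a short while.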
There’s more, but this should get you started.
Mistake #2: Selecting the WRONG Signaling Framework
PeerJS anyone? PeerJS feels like a tourist trap:
With 1,693 stars and 499 forks, PeerJS is one of the most popular WebRTC projects on github. What can go wrong?
Maybe the fact that it is older than the internet?
A WebRTC project that had its last commit 3 years ago can’t be used today.
Same goes for using Muaz Khan’s code snippets and expecting them to be commercial grade, stable, highly scalable products. They’re not. They’re just very useful code snippets.
Planning to use some open source project? Then:
- Make sure it was updated recently (=the last couple of months)
- Make sure it is popular enough
- Make sure you can understand the framework’s code and can maintain it on your own if needed
- Try to check if there’s someone behind it that can help you in times of trouble
Don’t take the selection process here lightly. Not when it comes to a signaling server and not when it comes to a media server.
Mistake #3: Not Using Media Servers When You Should
I know what you’re thinking. WebRTC is peer to peer so there’s no need for servers. Some think that even signaling and web servers aren’t needed – I hope they can explain how participants are going to find each other.
To some, this peer to peer concept also means that you can run these ridiculously large scale sessions with no servers that carry on media.
Here are two such “architectures” I come across:
Mesh. It’s great. Don’t assume you can get it to run properly this year or the next. Move on.
Live broadcasting by forwarding content. It can be done, but most probably not the way you expect it to grow to a million users with no infrastructure and zero latency.
For many of the use cases out there, you will need a media server to process and route the media for you. Now that you are aware of it, go search for an open source media server. Or a commercial one.
Mistake #4: Thinking Short-Term
You get an outsourcing vendor. Write him a nice requirements doc. Pay him. Get something implemented. And you’re done.
WebRTC is still in its infancy. The spec is changing. Browser implementations are changing. It is all in flux all the time. If you’re going to use WebRTC, either:
- Use some WebRTC API platform (here are a few), and you’ll be able to invest a bit less on an ongoing basis. There will be maintenance work, but not much
- Develop on your own or by outsourcing. In this case, you will need to continue investing in the project for at least the next 3 years or more
WebRTC code rots faster than most other HTML5 code. It will eventually change, but we’re not there yet.
It is also the reason I started testRTC with a few colleagues a few years ago. To help with the lifecycle of WebRTC applications, especially in the area of testing and monitoring.
Mistake #5: Failing to Understand WebRTC
They say assumption is the mother of all mistakes. Google seems to agree with it. Almost.
WebRTC isn’t trivial. It sits somewhere between VoIP and the web. It is new, and the information out there on the Internet about it is scattered and somewhat dynamic (which means lots of it isn’t accurate).
If you plan on using WebRTC, make sure you first understand it and its intricacies. Understand the servers that are needed to deploy a WebRTC application. Understand the signaling mechanisms that are built into WebRTC. Understand how media is processed and sent over the network. Understand the rich ecosystem of solutions that can be used with WebRTC to build a production ready system.
Lots of things to learn here. Don’t assume you know WebRTC just because you know web development or because you know VoIP or video processing.
If you are looking to seriously learn WebRTC, why not enroll in my Advanced WebRTC Architecture course?
What about my apartment? We’ve lawyered up, and now I have someone review and fix all the official sounding letters we’re sending out. Hopefully, it will get us faster to a resolution.
The post 5 Mistakes to Avoid When Developing WebRTC Applications appeared first on BlogGeek.me.
For WebRTC, Mobile and PC are moving in different directions. In the desktop, WebRTC Electron apps are gaining momentum.
In the good old days, people used to complain that WebRTC isn’t available on all browsers. Mobile was less of an issue for most as mobile application developers port WebRTC and use it natively on both iOS and Android.
How times change.
Need to know where WebRTC is available? Download this free WebRTC Device Cheat Sheet.
The challenge? None of the browsers are ready:
- Chrome uses Plan B, switching to Unified Plan
- Firefox is doing fine, but isn’t high on the priority list
- Edge doesn’t support the data channel, and its market share isn’t that great
- Safari doesn’t support VP8 and breaks a wee bit too often at the moment
What’s a developer to do?
Or maybe, just maybe, you should treat PCs and laptops the same way you do mobile? And build an app.
If that’s what you plan on doing then you’re not alone.
Here are 3 vendors making use of Electron (and WebRTC) for their desktop application:
#1 – Slack
Slack is a popular team collaboration application. I’ve been using it in the browser for the last 3 years, but switched to their desktop Electron app on both my Ubuntu desktop and my Windows 10 laptop.
Why didn’t I use the app for so long? Because I don’t like installing things.
Why have I installed it now? Because I need to track 3+ slack accounts in parallel at all times now. This means a tab per slack account in my browser. On the desktop app, they don’t “eat up” multiple tabs. It isn’t a matter of memory or performance for me. Just one of “esthetics” – trying to preserve a tabs diet on my Chrome.
And that’s how Slack likes it. During the last Kranky Geek, the Slack team gave an interesting presentation about their current plans. It had about a minute dedicated to Electron, at 2:30 into the session:
This recording lacks the Q&A part of the session. In an answer to a question regarding browser support, Andrew MacDonald of Slack said their focus is on their desktop app – not the browser. They make sure everything works on Chrome. Invest less time and effort on the other browsers. And focus a lot on their Slack desktop application.
It was telling.
If you are looking for desktop-application-only-features in Slack, then besides having a single window for all projects, there’s the collaboration they offer during screen sharing that isn’t available in the browser (yet another reason for me to switch – to check it out).
At 2:30 into that session, Andrew explains why Electron is so useful to Slack. It comes down to cross platform development and time to market – with their team size, they can’t update as fast as Electron does, so they took its built-in WebRTC implementation “as is”.
#2 – Discord
Discord is a kind of Slack but different. A social network targeting gamers. You can also find there non-gaming groups. Discord is doing all it can to get you from the comfort of your browser right into their native application.
Here’s what the homepage looks like:
From the get go their call to action is to either Open Discord (in the browser) or Download for your operating system. On mobile, if you’re curious, the only alternative is to download the app.
Here’s the interesting part, though.
Discord’s call to action, through its use of a green button, suggests that you open Discord in the browser. That’s a lower friction action. You select a user name. Then pick an email and password (or use an unclaimed channel until you add your username and password). And now that you’re signed up for the service, it is time to suggest again that you use their app:
And… if you skip this one, you’ll get a top bar reminder as well (that orange strip at the top):
You can do with Discord almost anything inside the browser, but they really really really want to get you off that damn internet and into their desktop app.
And it is working for them!
#3 – TalkDesk
TalkDesk has its own reason for adopting Electron.
TalkDesk is a contact center solution that integrates with CRMs and third party systems. Towards that goal, you can:
- Use the TalkDesk application (=browser web app)
- Install the TalkDesk extension from Chrome, and have it latch on to other CRM systems
- Install the Chrome Callbar app, so you can use it as a standalone without the need to have the browser opened at all
That third option is going the way of the dodo, along with Chrome apps. TalkDesk solved that by introducing Callbar Electron.
What we see here differs slightly from the previous two examples.
Where Slack and Discord try getting people off the web and into their desktop application, TalkDesk is just trying to be everywhere for them. Using HTML5 and Electron means they need not write yet-another-application for the desktop – they can reuse parts of their web app.
They are NOT Alone
There are other vendors I know of that are using Electron for their WebRTC applications. They do it for one of the following reasons:
- It is an easy way to support Internet Explorer by not supporting it (or Safari)
- They want a “native” app because they need more control than what a browser could ever offer, but still want to work with cross platform development, and HTML5/JS seems like the cleanest approach
- Their users work in front of the service all day, so the browser isn’t the best interface for them
- They don’t want to tether themselves or limit themselves to the browser. Using web technology is just how they want to develop
- It brings with it “stability”, as it is up to you to decide when to push an update to your users, as opposed to having browser vendors do it on their own timeframe. It is only a semblance of stability, as most would still support both browsers and applications in parallel
This shift towards Electron apps makes it harder to estimate the real usage base of WebRTC. If most communications are shifting from the Chrome browser (let’s face it, most WebRTC comms happen in Chrome today if you only care about browsers) towards applications, then the statistics and trends collected by Google about WebRTC use are skewed. That said, it makes Chrome all the more dominant, as Electron use can be attributed back to Chromium.
Expect vendors to continue adopting Electron for their WebRTC applications. This trend is on .
Are AI cameras in our future?
In last year’s AWS re:Invent event, which took place at the end of November, Amazon unveiled an interesting product: AWS DeepLens
There’s decent information about this new device on Amazon’s own website but very little of anything else out there. I decided to put my own thoughts on “paper” here as well.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter
What is AWS DeepLens?
AWS DeepLens is the combination of 3 components: hardware (camera + machine), software and cloud. These 3 come in a tight integration that I haven’t seen before in a device that is first and foremost targeting developers.
With DeepLens, you can handle inference of video (and probably audio) inputs in the camera itself, without shipping the captured media towards the cloud.
The hype words that go along with this device? Machine Vision (or Computer Vision), Deep Learning (or Machine Learning), Serverless, IoT, Edge Computing.
It is all these words and probably more, but it is also somewhat less. It is a first tentative step of what a camera module will look like 5 years from today.
I’d like to go over the hardware and software and see how they combine into a solution.
AWS DeepLens Hardware
AWS DeepLens hardware is essentially a camera that has been glued to an Intel NUC device:
Neither the camera nor the compute are on the higher end of the scale, which is just fine considering where we’re headed here – a gazillion low cost devices that can see.
The device itself was built in collaboration with Intel. Like all chipset vendors, Intel is plunging into AI and deep learning as well. More on AWS+Intel vs Google later.
Here’s what’s in this package, based on the AWS blog post on DeepLens:
- 4 megapixel camera with the ability to capture 1080p video resolution
  - Nothing is said about the frame rate at which this can run. I’d assume 30 fps
  - The quality of this camera hasn’t been detailed either. In many cases, I’d say these devices will need to work in rather extreme lighting conditions
- 2D microphone array
  - It is easy to understand why such a device needs a microphone, but a 2D microphone array is very intriguing in this one
  - This allows for better handling of things like directional sound and noise reduction algorithms
  - None of the deep learning samples provided by Amazon seem to make use of the microphone inputs. I hope these will come later as well
- Intel Atom X5 processor
  - This one has 4 cores and 4 threads
- 8GB of memory and 16GB of storage – this is meant to run workloads and not store them for long periods of time
- Intel Gen9 graphics engine
  - If you are into numbers, then this does over 100 GFLOPS – quite capable for a “low end” device
  - Remember that 1080p@30fps produces more than 62 million pixels a second to process, so we get ~1600 operations per pixel here
  - You can squeeze out more “per pixel” by reducing frame rate or reducing resolution (both are probably done for most use cases)
- Like most Intel NUC devices, it has Wi-Fi, USB and micro HDMI ports. There’s also a micro SD port for additional memory based on the image above
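The per-pixel budget quoted for the graphics engine is simple arithmetic, sketched here. It takes the 100 GFLOPS figure at face value and assumes every operation is available for video processing, which is optimistic; the 720p at 15 fps comparison is my own example of reducing resolution and frame rate.

```python
GPU_FLOPS = 100e9  # "over 100 GFLOPS", taken as a round lower bound


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw number of pixels the camera produces each second."""
    return width * height * fps


def ops_per_pixel(width: int, height: int, fps: int, flops: float = GPU_FLOPS) -> float:
    """Floating point operations available per pixel at a given resolution and frame rate."""
    return flops / pixels_per_second(width, height, fps)


full_hd = pixels_per_second(1920, 1080, 30)
print(f"1080p@30fps: {full_hd:,} pixels/s, "
      f"{ops_per_pixel(1920, 1080, 30):.0f} ops per pixel")

# Lower resolution and frame rate leave more operations for each pixel
print(f"720p@15fps: {ops_per_pixel(1280, 720, 15):.0f} ops per pixel")
```

Dropping to 720p at 15 fps multiplies the per-pixel budget by 4.5, which is why reducing both is the obvious first move.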
The hardware tries to look somewhat polished, but it isn’t. Although this isn’t written anywhere, this is:
- The first version of what will be an iterative process for Amazon
- A reference design. Developers are expected to build the proof of concept with this, later shifting to their own form factor – I don’t see this specific device getting sold to end customers as a final product
In a way, this is just a more polished hardware version of Google’s computer vision kit. The real difference comes with the available tooling and workflow that Amazon baked into AWS DeepLens.
AWS DeepLens Software
The AWS DeepLens software is where things get really interesting.
Before we get there, we need to understand a bit how machine learning works. At its most basic, machine learning is about giving a “machine” a large dataset, letting it learn the data in one way or another, and then when you introduce similar new data, it will be able to classify it.
Dumbing down the whole process and theory, at the end of the day, machine learning is built out of two main steps:
- TRAINING: You take a large set of data and use it for training purposes. You curate and classify it so the training process has something to check itself against. Then you pass the data through a process that ends up generating a trained model. This model is the algorithm we will be using later
- DEPLOY: When new data comes in (in our case, this will probably be an image or a video stream), we use our trained model to classify that data or even to run an algorithm on the data itself and modify it
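To make the two steps tangible, here is a deliberately tiny sketch using a nearest-centroid classifier in plain Python. It has nothing to do with MXNet, SageMaker or deep learning; the labels and the two-number “feature vectors” are invented, and real models are far larger. The point is only the split between producing a model and using it.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]
Model = Dict[str, Tuple[float, ...]]


def train(samples: List[Sample]) -> Model:
    """TRAINING: turn curated, labelled data into a model (one centroid per label)."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(model: Model, features: Sequence[float]) -> str:
    """DEPLOY: label new data with the nearest centroid; no training data needed."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up labelled data; in the DeepLens flow this step would run in the cloud
training_data: List[Sample] = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((4.0, 4.2), "dog"), ((3.8, 4.0), "dog"),
]
model = train(training_data)

# Inference only needs the trained model, so it can run on the device itself
print(classify(model, (1.1, 0.9)))
print(classify(model, (4.1, 3.9)))
```

Notice that `classify` never touches the training data. That separation is what lets the trained model be shipped to a device and used there.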
With AWS DeepLens, the intent is to run the training in the AWS cloud (obviously), and then run the deployment step for real time classification directly on the AWS DeepLens device. This also means that we can run this while being disconnected from the cloud and from any other network.
How does all this come into play in the AWS DeepLens software stack?
On device
On the device, AWS DeepLens runs two main packages:
- AWS Greengrass Core SDK – Greengrass enables running AWS Lambda functions directly on devices. If Lambda is called serverless, then Greengrass can truly run serverless
- Device optimized MXNet package – an Apache open source project for machine learning
Why MXNet and not TensorFlow?
- TensorFlow comes from Google, which makes it less preferable for Amazon, a direct cloud competitor. It is also preferable by Intel (see below)
- MXNet is considered faster and more optimized at the moment. It uses less memory and less CPU power to handle the same task
The main component here is the new Amazon SageMaker:
SageMaker takes the effort out of managing machine learning training, streamlining the whole process. The last step in that process, Deploy, takes place in this case directly on AWS DeepLens.
Besides SageMaker, when using DeepLens you will probably make use of Amazon S3 for storage, AWS Lambda when running serverless in the cloud, as well as other AWS services. Amazon even suggests using AWS DeepLens along with the newly announced Amazon Rekognition Video service.
To top it all, Amazon has a few pre-trained models and sample projects, shortening the path from getting a hold of an AWS DeepLens device to seeing it in action.
AWS+Intel vs Google
So we’ve got AWS DeepLens. With its set of on-device and cloud software tools. Time to see what that means in the bigger picture.
I’d like to start with the main players in this story. Amazon, Intel and Google. Obviously, Google wasn’t part of the announcement. Its TensorFlow project was mentioned in various places and can be made to work with AWS DeepLens. But that’s about it.
Google is interesting here because it is THE company today that is synonymous with AI. And there’s the increasing rivalry between Amazon and Google that seems to be playing out on multiple fronts.
When Google came out with TensorFlow, it was with the intent of creating a baseline for artificial intelligence modeling that everyone will be using. It open sourced the code and let people play with it. That part succeeded nicely. TensorFlow is definitely one of the first projects developers would try to dabble with when it comes to machine learning. The problem with TensorFlow seems to be the amount of memory and CPU it requires for its computations compared to other frameworks. That is probably one of the main reasons why Amazon decided to place its own managed AI services on a different framework, ending up with MXNet which is said to be leaner with good scaling capabilities.
Google did one more thing though. It created its own special Tensor processing unit, calling it TPU. This is an ASIC type of a chip, designed specifically for high performance of machine learning calculations. In a research paper released by Google earlier last year, they show how their TPUs perform better than GPUs when it comes to TensorFlow machine learning workloads:
And if you’re wondering – you can get Cloud TPU on the Google Cloud Platform, albeit this is still in alpha stage.
This gives Google an advantage in hosting managed TensorFlow jobs, posing a threat to AWS when it comes to AI heavy applications (which is where we’re all headed anyway). So Amazon couldn’t really pick TensorFlow as its winning horse here.
Intel? They don’t sell TPUs at the moment. And like any other chip vendor, they are banking on AI and investing heavily in it. That made working with AWS on optimizing end-to-end machine learning solutions for the internet of things, in the form of AWS DeepLens, an obvious choice.
Artificial Intelligence and Vision
These days, it seems that every possible action or task is being scrutinized to see if artificial intelligence can be used to improve it. Vision is no different. You can find it referred to as either computer vision or machine vision, and it covers a broad set of capabilities and algorithms.
Roughly speaking, there are two types of use cases here:
- Classification – with classification, the images or video stream are analyzed to find certain objects or things. This ranges from distinguishing certain objects, through person and face detection, to face recognition and on to recognizing activities and intents
- Modification – AWS DeepLens Artistic Style Transfer example is one such scenario. Another one is fixing the nagging direct eye contact problem in video calls (hint – you never really experience it today)
As with anything else in artificial intelligence and analytics, none of this is workable at the moment for a broad spectrum of classifications. You need to be very specific in what you are searching and aiming for, and this isn’t going to change in the near future.
On the other hand, there are many many cases where what you need is a camera to classify a very specific and narrow vision problem. The usual things include person detection for security cameras, counting people at an entrance to a store, etc. There are other areas you hear about today such as using drones for visual inspection of facilities and robots being more flexible in assembly lines.
We’re at a point where we already have billions of cameras out there. They are in our smartphones and are considered a commodity. These cameras and sensors are now headed into a lot of devices to power the IoT world and allow it to “see”. The AWS DeepLens is one such tool that just happened to package and streamline the whole process of machine vision.
Pricing
On the price side, the AWS DeepLens is far from a cheap product.
The baseline cost of an AWS DeepLens camera? $249
But as with other connected devices, that’s only a small part of the story. The device is intended to be connected to the AWS cloud and there the real story (and costs) takes place.
The two leading cost centers after the device itself are going to be AWS Greengrass and Amazon SageMaker.
AWS Greengrass starts at $1.49 per year per device. Amazon SageMaker costs 20-25% on top of the usual AWS EC2 machine prices. To that, add the usual bandwidth and storage pricing of AWS, and higher prices for certain regions and discounts on large quantities.
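Here’s a rough back of the napkin sketch of how these numbers add up. The device price, the Greengrass fee and the SageMaker markup are the figures quoted above; the EC2 hourly rate, the number of training hours and the `first_year_cost` name are placeholders I made up, and bandwidth, storage, regional pricing and volume discounts are left out entirely.

```python
DEVICE_PRICE_USD = 249.00              # AWS DeepLens camera
GREENGRASS_USD_PER_DEVICE_YEAR = 1.49  # AWS Greengrass starting price


def first_year_cost(devices, ec2_usd_per_hour, training_hours, sagemaker_markup=0.25):
    """Estimate first-year cost; the EC2 rate and training hours are assumptions."""
    hardware = devices * DEVICE_PRICE_USD
    greengrass = devices * GREENGRASS_USD_PER_DEVICE_YEAR
    # SageMaker is priced as the EC2 machine price plus 20-25% on top
    sagemaker = training_hours * ec2_usd_per_hour * (1 + sagemaker_markup)
    return {
        "hardware": round(hardware, 2),
        "greengrass": round(greengrass, 2),
        "sagemaker": round(sagemaker, 2),
        "total": round(hardware + greengrass + sagemaker, 2),
    }


# 10 cameras, a hypothetical $1.00/hour EC2 machine, 100 hours of training
estimate = first_year_cost(devices=10, ec2_usd_per_hour=1.00, training_hours=100)
print(estimate)
```

With these placeholder inputs the hardware dominates the bill, but training hours grow with every model iteration while the camera is paid for once.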
It isn’t cheap.
This is a new service that is quite generic and is aimed at tinkerers. Startups looking to try out and experiment with new ideas. It is also the first iteration of Amazon with such an intriguing device.
I, for one, can’t wait to see where this is leading us.
3 Different Compute Models for Machine Vision
AWS DeepLens is one of 3 different compute models that I see in this space of machine vision.
Here are all 3 of them:
#1 – Cloud
In a cloud based model, the expectation is that the actual media is streamed towards the cloud:
- In real time
- Or at some future point in time
- When events occur; like motion being detected; or sound picked up on the mic
The data can be a video stream, or more often than not, it is just a set of captured images.
And that data gets classified in the cloud.
Here are two recent examples from a domain close to my heart – WebRTC.
At the last Kranky Geek event, Philipp Hancke shared how appear.in is trying to determine NSFW (Not Safe For Work):
The way this is done is by using Yahoo’s Open NSFW open source package. They had to resize images, send them to a server and there, using Python, classify each image to determine if it is safe for work or not. Watch the video – it is really insightful about how to tackle such a project in the real world.
The other one comes from Chad Hart, who wrote a lengthy post about connecting WebRTC to TensorFlow for machine vision. The same technique was used – one of capturing still images from the stream and sending them towards a server for classification.
These approaches are nice, but they have their challenges:
- They are gravitating towards still images and not video streams at the moment. This relates to the costs and bandwidth involved in shipping and then analyzing such streams on a server. To give you an understanding of the costs – using Amazon Rekognition for one minute of video stream analysis costs $0.12. For a single minute. It is high, and the reason is that it really does require some powerful processing to achieve
- Sometimes, you really need to classify and make faster decisions. You can’t wait those extra hundreds of milliseconds or more for the classification to take place. Think augmented reality type of scenarios
- At least with WebRTC, I haven’t seen anyone who figured out how to do this classification on the server side in real time for a video stream and not still images. Yet
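To get a feel for the cost argument in the first bullet, here’s a small sketch. The only real number in it is the $0.12 per minute quoted above; the camera counts, the hours and the `video_analysis_cost` name are assumptions for illustration.

```python
REKOGNITION_VIDEO_USD_PER_MINUTE = 0.12  # figure quoted in the text above


def video_analysis_cost(cameras, hours_per_day, days=30):
    """Cost of streaming everything to the cloud for analysis."""
    minutes = cameras * hours_per_day * 60 * days
    return round(minutes * REKOGNITION_VIDEO_USD_PER_MINUTE, 2)


# One always-on camera for a month vs. the same camera analyzed only
# during an assumed 30 minutes a day of detected motion
always_on = video_analysis_cost(cameras=1, hours_per_day=24)
on_events = video_analysis_cost(cameras=1, hours_per_day=0.5)
print(always_on, on_events)
```

That gap is why the event-triggered and still-image approaches are so common in the cloud model.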
This alternative is what we have today in smartphones and probably in modern room based video conferencing devices.
The camera is just the optics, but the heavy lifting takes place in the main processor that is doing other things as well. Most modern CPUs today already have GPUs embedded as part of the SoC, and chip vendors are actively working on AI specific additions to chips (think Apple’s AI chip in the iPhone X or Google’s computational photography packed into the Pixel X phones).
The underlying concept here is that the camera is always tethered or embedded in a device that is powerful enough to handle the machine learning algorithms necessary.
They aren’t part of the camera but rather the camera is part of the device.
This works rather well, but you end up with a pricy device which doesn’t always make sense. Remember that our purpose here is to aim at having a larger number of camera sensors deployed and having an expensive computing device attached to it won’t make sense for many of the use cases.
#3 – In the Camera
This is the AWS DeepLens model.
The computing power needed to run the classification algorithms is made part of the camera instead of taking place on another CPU.
We’re talking about $249 right now, but assuming this approach becomes popular, prices should go down. I can easily see such devices retailing at $49 on the low end in 2-3 technology cycles (5 years or so). And when that happens, the use cases developers will be able to create are endless.
Think about a home surveillance system that costs below $1,000 to purchase and install. It is smart enough to have far fewer false positives in alerting its users. AND can be upgraded in its classification as time goes by. There can be a service put in place behind it with a monthly fee that includes such things. You can add face detection and classification of certain people – alerting you when the kids come home or leave for example. Ignoring a stray cat that came into view of the camera. And this system is independent of an external network to run on a regular basis. You can update it when an external network is connected, but other than that, it can live “offline” quite nicely.
No Winning Model
All of the 3 models have their place in the world today. Amazon just made it a lot easier to get us to that third alternative of “in the camera”.
IoT and the Cloud
Edge computing. Fog computing. Cloud computing. You hear these words thrown in the air when talking about the billions of devices that will comprise the internet of things.
For IoT to scale, there are a few main computing concepts that will need to be decided sooner rather than later:
- Decentralized – with so many devices, IoT services won’t be able to be centralized. It won’t be about scaling out servers to meet the demand, but rather about the edges becoming smarter – doing at least part of the necessary analysis. Which is why the concept of AWS DeepLens is so compelling
- On net and off net – IoT services need to be able to operate without being connected to the cloud at all times. Think of an autonomous car that needs to be connected to the cloud at all times – a no go for me
- Secured – it seems like the last thing people care about in IoT at the moment is security. The many data breaches and the ease with which devices can be hijacked point that out all too clearly. Something needs to be done there and it can’t be on the individual developer/company level. It needs to take place a lot earlier in the “food chain”
I was reading The Meridian Ascent recently. A science fiction book in a long series. There’s a large AI machine there called Big John which sifts through the world’s digital data:
“The most impressive thing about Big John was that nobody comprehended exactly how it worked. The scientists who had designed the core network of processors understood the fundamentals: feed sufficient information to uniquely identify a target, and then allow Big John to scan all known information – financial transactions, medical records, jobs, photographs, DNA, fingerprints, known associates, acquaintances, and so on.
But that’s where things shifted into another realm. Using the vast network of processors at its disposal, Big John began sifting external information through its nodes, allowing individual neurons to apply weight to data that had no apparent relation to the target, each node making its own relevance and correlation calculations.”
I’ve emphasized that sentence. To me, this shows the view of the same IoT network looking at it from a cloud perspective. There, the individual sensors and nodes need to be smart enough to make their own decisions and take their own actions.
All these words for a device that will only be launched in April 2018…
We’re not there yet when it comes to IoT and the cloud, but developers are working on getting the pieces of the puzzle in place.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign up for my newsletter.
Get my free content
The post AWS DeepLens and the Future of AI Cameras and Vision appeared first on BlogGeek.me.
As many as you like. You can cram anywhere from one to a million users into a WebRTC call.
You’ve been asked to create a group video call, and obviously, the technology selected for the project was WebRTC. It is almost the only alternative out there and certainly the one with the best price-performance ratio. Here’s the big question: How many users can we fit into that single group WebRTC call?
Need to understand your WebRTC group calling application backend? Take this free video mini-course on the untold story of WebRTC’s server side.
At least once a week I get approached by someone saying WebRTC is peer-to-peer and asking me if you can use it for larger groups, as the technology might not fit for such use cases. Well… WebRTC fits well into larger group calls.
You need to think of WebRTC as a set of technological building blocks that you mix and match as you see fit, and the browser implementation of WebRTC is just one building block.
The most common building block today in WebRTC for supporting group video calls is the SFU (Selective Forwarding Unit), a media router that receives media streams from all participants in a session and decides who to route that media to.
What I want to do in this article is review a few of the aspects and decisions you’ll need to take when trying to create applications that support large group video sessions using WebRTC.
Analyze the Complexity
The first step in our journey today will be to analyze the complexity of our use case.
With WebRTC, and real time video communications in general, it all boils down to speeds and feeds:
- Speeds – the resolution and bitrate we’re expecting in our service
- Feeds – the stream count of the single session
Let’s start with an example.
Assume you want to run a group calling service for the enterprise. It runs globally. People will join work sessions together. You plan on limiting group sessions to 4 people. I know you want more, but I am trying to keep things simple here for us.
The illustration above shows you what a 4-participant conference would look like.
Magic Squares: 720p
If the layout you want for this conference is the magic squares one, we’re in the domain of:
You want high quality video. That’s what everyone wants. So you plan on having all participants send out 720p video resolution, aiming for WQHD monitors (that’s 2560×1440). Say that eats up 1.5Mbps (I am stingy here – it can take more), so:
- Each participant in the session sends out 1.5Mbps and receives 3 streams of 1.5Mbps
- Across 4 participants, the media server needs to receive 6Mbps and send out 18Mbps
Summing it up in a simple table, we get:
| Resolution | 720p |
| Bitrate | 1.5Mbps |
| User outgoing | 1.5Mbps (1 stream) |
| User incoming | 4.5Mbps (3 streams) |
| SFU outgoing | 18Mbps (12 streams) |
| SFU incoming | 6Mbps (4 streams) |
Magic Squares: VGA
If you’re not interested in resolution that much, you can aim for VGA resolution and even limit bitrates to 600Kbps:
| Resolution | VGA |
| Bitrate | 600Kbps |
| User outgoing | 0.6Mbps (1 stream) |
| User incoming | 1.8Mbps (3 streams) |
| SFU outgoing | 7.2Mbps (12 streams) |
| SFU incoming | 2.4Mbps (4 streams) |
The thing you may want to avoid when going VGA is the need to upscale the resolution on the display – it can look ugly, especially on the larger 4K displays.
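The magic squares arithmetic is simple enough to put in a few lines of code, which is handy once you start playing with other group sizes. This is only a sketch: the `magic_squares_load` name is mine, and it assumes every participant sends one stream at the same bitrate and receives everyone else’s stream untouched.

```python
def magic_squares_load(participants, bitrate_mbps):
    """Speeds and feeds for an SFU call where everyone sees everyone."""
    user_in_streams = participants - 1
    user_in = user_in_streams * bitrate_mbps
    return {
        "user_out_mbps": round(bitrate_mbps, 2),
        "user_in_mbps": round(user_in, 2),
        "sfu_in_mbps": round(participants * bitrate_mbps, 2),
        "sfu_in_streams": participants,
        "sfu_out_mbps": round(participants * user_in, 2),
        "sfu_out_streams": participants * user_in_streams,
    }


hd = magic_squares_load(participants=4, bitrate_mbps=1.5)   # 720p
vga = magic_squares_load(participants=4, bitrate_mbps=0.6)  # VGA
print(hd)
print(vga)
```

Note how the outgoing side of the SFU grows with `participants * (participants - 1)`: that quadratic term is what makes larger calls expensive.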
With crude back of the napkin calculations, you can potentially cram 3 VGA conferences for the “price” of 1 720p conference.
Hangouts Style
But what if our layout is a bit different? A main speaker and smaller viewports for the other participants:
I call it Hangouts style, because Hangouts is well known for this layout and was one of the first to use it exclusively without offering a larger set of additional layouts.
This time, we will be using simulcast, with the plan of having everyone send out high quality video and the SFU deciding which incoming stream to use as the dominant speaker, picking the higher resolution for it and the lower resolution for everyone else.
You will be aiming for 720p, because after a few experiments, you decided that lower resolutions when scaled to the larger displays don’t look that good. You end up with this:
- Each participant in the session sends out 2.2Mbps (that’s 1.5Mbps for the 720p stream and roughly 700Kbps more for the other resolutions you’ll be simulcasting with it)
- Each participant in the session receives 1.5Mbps from the dominant speaker and 2 additional incoming streams of ~300Kbps for the smaller video windows
- Across 4 participants, the media server needs to receive 8.8Mbps and send out 8.4Mbps
| User incoming | 1.5Mbps (1 stream) + 0.3Mbps each (2 streams) |
| SFU outgoing | 8.4Mbps (12 streams) |
| SFU incoming | 8.8Mbps (4 streams) |
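The same kind of sketch works for the Hangouts style numbers. The `hangouts_style_load` name and its parameters are assumptions of mine, and I am keeping the simplification used in the bullets above: every participant receives one high resolution stream plus low resolution streams for the rest.

```python
def hangouts_style_load(participants, send_total_mbps, main_mbps, thumb_mbps):
    """SFU load with simulcast: one big dominant speaker, small thumbnails."""
    thumbnails = participants - 2  # everyone except yourself and the speaker
    user_in = main_mbps + thumbnails * thumb_mbps
    return {
        "user_out_mbps": round(send_total_mbps, 2),
        "user_in_mbps": round(user_in, 2),
        "sfu_in_mbps": round(participants * send_total_mbps, 2),
        "sfu_out_mbps": round(participants * user_in, 2),
    }


load = hangouts_style_load(
    participants=4,
    send_total_mbps=2.2,  # all simulcast layers together
    main_mbps=1.5,        # 720p layer for the dominant speaker
    thumb_mbps=0.3,       # low resolution layer for the small windows
)
print(load)
```

Compared to the 720p magic squares layout, the SFU takes in more data but sends out far less of it.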
This is what we have learned:
Different use cases of group video with the same number of users translate into different workloads on the media server.
And if it wasn’t mentioned specifically, simulcast works great and improves the effectiveness and quality of group calls (simulcast is what we used in our Hangouts Style meeting).
Across the 3 scenarios we depicted here for 4-way video call, we got this variety of activity in the SFU:
| | Magic Squares: 720p | Magic Squares: VGA | Hangouts Style |
| SFU outgoing | 18Mbps | 7.2Mbps | 8.4Mbps |
| SFU incoming | 6Mbps | 2.4Mbps | 8.8Mbps |
Here’s your homework – now assume we want to do a 2-way session that gets broadcast to 100 people over WebRTC. Now calculate the number of streams and bandwidths you’ll need on the server side.
How Many Users Can be Active in a WebRTC Call?
That’s a tough one.
If you use an MCU, you can get as many users on a call as your MCU can handle.
If you are using an SFU, it depends on 3 different parameters:
- The level of sophistication of your media server, along with the performance it has
- The power you’ve got available on the client devices
- The way you’ve architected your infrastructure and worked out cascading
We’re going to review them in a sec.
Same Scenario, Different Implementations
Anything above 8-10 users in a single call becomes complicated. Here’s an example of a publicly available service I want to share here.
- 9 participants in a single session, magic squares layout
- I use testRTC to get the users into the session, so it is all automated
- I run it for a minute. After that, it kills the session since it is a demo
- It takes into account that there are 9 people on the screen and reduces the resolution for all of them to VGA, but it allocates 1.3Mbps for that resolution
- Leading to the browsers receiving 10Mbps of data to process
The media server decided here how to limit and gauge traffic.
And here’s another service with an online demo running the exact same scenario:
Now the incoming bitrate on average per browser was only 2.7Mbps – almost a fourth of the other service.
Same scenario. Different implementations.
What About Some Popular Services?
What about some popular services that do video conferencing in an SFU routed model? What kind of size restrictions do they put on their applications?
Here’s what I found browsing around:
- Google Hangouts – up to 25 participants in a single session. It was 10 in the past. When I did my first-ever office hour for my WebRTC training, I maxed out at 10, which got me to start using other services
- Hangouts Meet – placed its maximum number at 50 participants in a single session
- Houseparty – decided on 8 participants
- Skype – 25 participants
- appear.in – their PRO accounts support up to 12 participants in a room
- Amazon Chime – 16 participants on the desktop and up to 8 participants on iOS (no Android support yet)
Does this mean you can’t get above 50?
My take on it is that there’s an increasing degree of difficulty as the meeting size increases:
The CPaaS Limit on Size
When you look at CPaaS platforms, those supporting video and group calling often have limits to their meeting size. In most cases, they give out an arbitrary number they have tested against or are comfortable with. As we’ve seen, that number is suitable for a very specific scenario, which might not be the one you are thinking about.
In CPaaS, these numbers vary from 10 participants to hundreds of participants in a single session. Usually, if you can go higher, the additional participants will be view-only.
Key Points to Remember
A few things to keep in mind:
- The higher the group size the more complicated it is to implement and optimize
- The browser needs to run multiple decoders, which is a burden in itself
- Mobile devices, especially older ones, can be brought down to their knees quite quickly in such cases. Test on the oldest, puniest devices you plan on supporting before determining the group size to support
- You can build the SFU in a way that it doesn’t route all incoming media to everyone but rather picks partial data to send out. For example, maybe only a single speaker on the audio channels, or the 4 loudest streams
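That last point, forwarding only the loudest streams, can be sketched in a few lines. This is an illustration and not how any specific SFU does it: the `pick_streams_to_forward` name, the participant ids and the audio level scale (0.0 to 1.0) are all assumptions.

```python
import heapq


def pick_streams_to_forward(audio_levels, receiver_id, max_streams=4):
    """Return the ids of the loudest participants, excluding the receiver."""
    candidates = {
        pid: level for pid, level in audio_levels.items() if pid != receiver_id
    }
    return heapq.nlargest(max_streams, candidates, key=candidates.get)


# Hypothetical audio levels measured by the SFU for each incoming stream
levels = {
    "alice": 0.9, "bob": 0.1, "carol": 0.5,
    "dave": 0.7, "erin": 0.3, "frank": 0.05,
}
print(pick_streams_to_forward(levels, receiver_id="alice"))
```

A real implementation would also smooth the levels over time, otherwise a single cough can reshuffle what everyone receives.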
Sizing and media servers is something I have been doing lately at testRTC. We’ve played a bit with Kurento in the past and are planning to tinker with other media servers. I get this question on every other project I am involved with:
How many sessions / users / streams can we cram into a single media server?
Given what we’ve seen above about speeds and feeds, it is safe to say that it really really really depends on what it is that you are doing.
If what you are looking for is group calling where everyone’s active, you should aim for 100-500 participants in total on a single server. The numbers will vary based on the machine you pick for the media server and the bitrates you are planning per stream on average.
If what you are looking for is a broadcast of a single person to a larger audience, all done over WebRTC to maintain low latency, 200-1,000 is probably a better estimate. Maybe even more.
Big Machines or Small Machines?
Another thing you will need to address is which machines you are going to host your media server on. Will that be the biggest baddest machines available or will you be comfortable with smaller ones?
Going for big machines means you’ll be able to cram larger audiences and sessions into a single machine, so the complexity of your service will be lower. If something crashes (media servers do crash), more users will be impacted. And when you’ll need to upgrade your media server (and you will), that process can cost you more or become somewhat more complicated as well.
The bigger the machine, the more cores it will have. Which results in media servers that need to run in multithreaded mode. Which means they are more complicated to build, debug and fix. More moving parts.
Going for small machines means you’ll hit scale problems earlier and they will require algorithms and heuristics that are more elaborate. You’ll have more edge cases in the way you load balance your service.
Scale Based on Streams, Bandwidth or CPU?
How do you decide that your media server has reached full capacity? How do you decide whether the next session should be placed on the media server you’re currently using or on a new machine? If you use the current one, and new participants want to join a session actively running in this media server, will there be room enough for them?
These aren’t easy questions to answer.
I’ve seen 3 different metrics used to decide on when to scale out from a single media server to others. Here are the general alternatives:
Based on CPU – when the CPU hits a certain percentage, it means the machine is “full”. It works best when you use smaller machines, as CPU would be one of the first resources you’ll deplete.
Based on Bandwidth – SFUs eat up lots of networking resources. If you are using bigger machines, you probably won’t hit the CPU limit, but you’ll end up eating too much bandwidth. So you’ll end up determining the capacity available by way of bandwidth monitoring.
Based on Streams – the challenge sometimes with CPU and Bandwidth is that the number of sessions and streams that can be supported may vary, depending on dynamic conditions. Your scaling strategy might not be able to cope with that and you may want more control over the calculations. Which will lead to you sizing the machine using either CPU or bandwidth, but putting rules in place that are based on the number of streams the server can support.
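Here’s a minimal sketch of what such an admission check could look like when you combine the three metrics. Every threshold in it is a made-up placeholder (you’ll need your own sizing tests to find real ones), and the `ServerLoad`, `Limits` and `has_room` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    cpu_percent: float
    bandwidth_mbps: float
    streams: int


@dataclass
class Limits:
    # Placeholder thresholds, not recommendations
    max_cpu_percent: float = 70.0
    max_bandwidth_mbps: float = 800.0
    max_streams: int = 500


def has_room(load, limits, extra_streams, mbps_per_stream):
    """Can this media server take extra_streams more streams?"""
    projected_bandwidth = load.bandwidth_mbps + extra_streams * mbps_per_stream
    return (
        load.cpu_percent < limits.max_cpu_percent
        and projected_bandwidth <= limits.max_bandwidth_mbps
        and load.streams + extra_streams <= limits.max_streams
    )


limits = Limits()
quiet = ServerLoad(cpu_percent=40, bandwidth_mbps=300, streams=200)
busy = ServerLoad(cpu_percent=40, bandwidth_mbps=790, streams=200)

# A new 4-way magic squares call adds 12 outgoing streams of 1.5Mbps
print(has_room(quiet, limits, extra_streams=12, mbps_per_stream=1.5))
print(has_room(busy, limits, extra_streams=12, mbps_per_stream=1.5))
```

The stream and bandwidth checks are projections of the load after the new session joins, while CPU can only be checked as it is right now, which is part of why stream-based rules give you more control.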
The challenge here is that whatever scenario you pick, sizing is something you’ll need to be doing on your own. I see many who come to use testRTC when they need to address this problem.
Cascading a Single Session
Cascading is the process of connecting one media server to another. The diagram below shows what I mean:
We have a 4-way group video call that is spread across 3 different media servers. The servers route the media between them as needed to get it connected. Why would you want to do this?
#1 – Geographical Distribution
When you run a global service and have SFUs as part of it, the question that immediately comes up for every new session is which SFU to allocate for it, and in which of the data centers. Since we want to get our media servers as close as possible to the users, we either have pre-knowledge about the session and know where to allocate it, or decide by some reasonable means, like geolocation – we pick the data center closest to the user that created the meeting.
Assume 4 people are on a call. 3 of them join from New York, while the 4th person is from France. What happens if the French guy joins first?
The server will be hosted in France. 3 out of 4 people will be located far from the media server. Not the best approach…
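A tiny sketch makes the problem visible, and contrasts it with allocating per participant. The region names, the participants and both function names are invented for the example, and "nearest data center" is reduced to a simple lookup table.

```python
def allocate_single(join_order, region_of):
    """The whole meeting lives in the data center nearest to the first joiner."""
    return region_of[join_order[0]]


def allocate_per_participant(join_order, region_of):
    """Each participant connects to the nearest data center."""
    return {user: region_of[user] for user in join_order}


# Hypothetical mapping of users to their closest data center
region_of = {"pierre": "eu-west", "ann": "us-east", "bob": "us-east", "cat": "us-east"}
join_order = ["pierre", "ann", "bob", "cat"]

server = allocate_single(join_order, region_of)
far_away = [user for user in join_order if region_of[user] != server]
print(server, far_away)

assignment = allocate_per_participant(join_order, region_of)
servers_used = sorted(set(assignment.values()))
print(servers_used)
```

In the second case nobody is far from their media server, but the two servers now have to route media between them.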
One solution is to conduct the meeting by spreading it across servers closest to each of the participants:
We use more server resources to get this session served, but we have a lot more control over the media routes so we can optimize them better. This improves media quality for the session.
#2 – Fragmented Allocations
Assume that we can connect up to 100 participants in a single media server. Furthermore, every meeting can hold up to 10 participants. Ideally, we won’t want to assign more than 10 meetings per media server.
But what if I told you the average meeting size is 2 participants? It can get us to this type of an allocation:
This causes a lot of wasted server resources. How can we solve that?
- By having people commit in advance to the maximum meeting size. Not something you really want to do
- Taking a risk: allocate new meetings to a server only up to 50% of its capacity, and leave the rest of the capacity for existing meetings to grow into. You still have wasted resources, but to a lower degree. There will be edge cases where meetings won’t be able to fill up due to a lack of server resources
- Migrating sessions across media servers in an effort to “defragment” the servers. It is as ugly as it sounds, and probably just as disrupting to the users
- Cascade sessions. Allow them to grow across machines
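To put numbers on the waste, and on the 50% option from the list above, here’s a short sketch. The capacity and meeting sizes are the ones from the example; the function names and the exact fill threshold are assumptions.

```python
SERVER_CAPACITY = 100  # participants per media server
MAX_MEETING_SIZE = 10  # participants per meeting


def naive_utilization(avg_meeting_size):
    """Utilization when every meeting is allocated room for its maximum size."""
    meetings_per_server = SERVER_CAPACITY // MAX_MEETING_SIZE
    return meetings_per_server * avg_meeting_size / SERVER_CAPACITY


def pick_server(used_per_server, new_meeting_fill=0.5):
    """Index of the first server still below the fill threshold, else None.

    Capacity above the threshold is kept for existing meetings to grow into.
    """
    for index, used in enumerate(used_per_server):
        if used < SERVER_CAPACITY * new_meeting_fill:
            return index
    return None  # time to spin up another media server


print(naive_utilization(avg_meeting_size=2))
print(pick_server([60, 49, 10]))
print(pick_server([50, 80]))
```

With an average meeting size of 2, the naive approach leaves 80% of each server idle, which is the waste the other options are trying to claw back.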
As for that last option of cascading sessions, you can do that by reserving some of a media server’s resources for cascading existing sessions to other media servers.
#3 – Larger Meetings
Assuming you want to create meetings larger than what a single media server can handle, your only choice is to cascade.
If your media server can hold 100 participants and you want meetings at the size of 5,000 participants, then you’ll need to be able to cascade to support them. This isn’t easy, which explains why there aren’t many such solutions available, but it definitely is possible.
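As a rough sizing sketch for that example: the 100 and 5,000 figures come from the text, while the idea of setting aside a fixed number of slots per server for the server-to-server links, and the number 10, are assumptions of mine.

```python
from math import ceil


def servers_for_meeting(participants, server_capacity=100, reserved_for_cascading=0):
    """How many cascaded media servers a single large meeting needs."""
    usable = server_capacity - reserved_for_cascading
    if usable <= 0:
        raise ValueError("nothing left for participants after the reservation")
    return ceil(participants / usable)


print(servers_for_meeting(5000))
print(servers_for_meeting(5000, reserved_for_cascading=10))
```

The reservation is what turns the neat 50 servers into a less neat number, and it ignores how the servers are wired to each other, which is where most of the real difficulty lives.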
Mind you, in such large meetings, the media flow won’t be bidirectional. You’ll have fewer participants sending media and a lot more only receiving media. For the pure broadcasting scenario, I’ve written a guest post on the scaling challenges on Red5 Pro’s blog.
Recap
We’ve touched a lot of areas here. Here’s what you should do when trying to decide how many users can fit in your WebRTC calls:
- Whatever meeting size you have in mind, it is possible to support it with WebRTC
  - It will be a matter of costs and aligning it with your business model that will make or break that one
  - The larger the meeting size, the more complex it will be to get it done right, and the more limitations and assumptions you’ll need to add to the equation
- Analyze the complexity you need to support
  - Count the incoming and outgoing streams to each device and media server
  - Decide on the video quality (resolution and bitrate) for each stream
- Define the media server you’ll be using
  - Select a machine type to run the media server on
  - Figure out the sizing needed before you reach scale out
  - Check if the growth is linear on the server’s resources
  - Decide if you scale out based on bandwidth, CPU, streams count or anything else
- Figure how cascading fits into the picture
  - Use it to offer better geolocation support
  - Use it to deal with resource fragmentation on the cloud infrastructure
  - Or use it to grow meetings beyond a single media server’s capacity
What’s the size of your WebRTC meetings?
Here are CPaaS trends you should be expecting this year.
There’s no doubt about it. CPaaS is growing and it is doing so rapidly. It is a multi-billion dollar industry, and while still small, there’s no sign of its growth stopping anytime soon. You’ll see the numbers $4 billion and $8 billion a year appearing in different reports and estimates that are flying around when talking about the near future of the CPaaS market size and growth potential. I have no clue if the numbers are correct – I’ve never been one to play with estimates.
What I do know, is that we’ve got multiple CPaaS vendors now with ARR (Annual Run Rate) higher than $100 million. Most of it may still come from good old SMS and phone calls, but I think this will change along with how consumers communicate.
This change will make CPaaS a lot more interesting and diversified than the boring race to the bottom that seems to be prevalent in some of the players’ offering and messaging in this market. The problem with CPaaS today is twofold:
- SMS and voice are somewhat commoditized. There is a finite way in which you can send and receive SMS and phone calls over phone numbers, and we’ve exhausted them and how to express them in a simple API for developers to use years ago. Since then, the game we played was one of scalability, stability and price points
- Developers are resistant to paying for IP based communications services at the moment. They somehow believe that these are a lot easier to develop. While that is correct for the “hello world” implementation, once you need to provide long term maintenance and scalability capabilities this can grow into a huge headache – especially when you couple this with some of the trends in communication that are being introduced
Which brings me to what you can expect in 2018. Here are 7 CPaaS trends that will grow and become important this year – and more importantly – what they mean.
Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:
Get the shortlist
#1 – Serverless
Serverless is also known as Functions.
You might know about serverless from AWS Lambda, Azure Functions, Google’s Cloud Functions and Apache’s OpenWhisk. The list here isn’t random – it goes to show that all big cloud platforms are now offering serverless capabilities.
This still isn’t prevalent in CPaaS, where for the most part, developers are expected to develop, maintain and operate their own servers that communicate with the CPaaS vendor’s infrastructure. But we do see signs of serverless making its way here.
I covered that last year, when I took a deeper look into the Twilio Functions offering and what that means to the CPaaS market.
At the time, Twilio stated that Functions is already Twilio’s fastest growing product ever. Here’s where they explain what it does:
Twilio being the market leader in CPaaS, and Functions being a fast growing product of theirs means that other CPaaS vendors will follow. Simply because demand here is obvious.
#2 – Omnichannel
When SMS just isn’t enough.
Not sure when you last used SMS for personal reasons – I know that I rarely end up inside that app on my smartphone. The way things are going, SMS can be considered the spam channel of 2018. Or maybe the channel used by businesses who’ve been told that this is the best way to reach customers and interrupt them.
While I definitely see value in SMS, I also think that businesses should strive to communicate with their customers on other channels – channels their users are now focusing on with their social life. In Israel that would be Whatsapp. In the US probably a mixture of Facebook and iMessage will work better. Telegram would be the choice for Russia.
Whatever that channel is, to support it, someone needs to integrate with it. And then decide which channel to use for which customer and for what interaction. For CPaaS, that’s what Omnichannel is about. Enabling developers, and by extension businesses to communicate with their customers on the customer’s preferred channel.
2018 is going to be the year Omnichannel becomes a serious requirement.
Because now we can actually use it.
Apple’s own Business Chat service is planned to make its public debut this year.
Facebook has its own APIs already, and Whatsapp announced business accounts (=APIs).
That alone covers a large majority of customer bases.
Throw in SMS, mix and choose the ones you want. And voila! Omnichannel.
For businesses, relying on CPaaS for Omnichannel makes sense, as the hassle of adding all of these channels and maintaining them is expensive. Omnichannel CPaaS APIs will abstract that away.
For CPaaS vendors, this is a way to differentiate and make switching between vendors harder.
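To make the "which channel for which customer" decision a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `pickChannel` helper, the channel names and the SMS fallback are mine, not any CPaaS vendor's API.

```javascript
// Hypothetical helper: pick the first channel the customer prefers
// that the business has actually enabled.
// Assumption: SMS is always available as the fallback channel.
function pickChannel(customer, enabledChannels) {
  const preferred = customer.preferredChannels || [];
  const match = preferred.find((channel) => enabledChannels.includes(channel));
  return match || 'sms';
}

// Made-up customer record and channel list.
const customer = { name: 'Dana', preferredChannels: ['whatsapp', 'imessage'] };
const enabled = ['sms', 'imessage', 'facebook'];

console.log(pickChannel(customer, enabled));
```

An Omnichannel API would hide this kind of decision (and the per-channel integrations behind it) from the developer.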
#3 – Visual / IDE
From code, to REST, to point-and-click.
We used to use DOS as an “operating system”. I worked at a small computer shop as a kid when I grew up. For a couple of years, my role was to go to people’s homes and explain to them how to use the new computer they just purchased. How to put the DOS disk inside the floppy drive, list the files in a floppy, run games and other applications.
Then came Windows (along with Mac and OS/2 and others) and we all just moved to using a visual operating system and a mouse.
As a kid, I programmed using Logo and Basic. Then Turbo Pascal – in a decent IDE for the first time. At university, I got acquainted with Tcl/Tk. And then UI development seemed fun. Even if it was by writing code by hand. Then one day, vtcl came to life – a visual editor. Things got easier.
Developing communications is taking the same path now.
It started by needing to build your own stuff from scratch, then with open source frameworks and later CPaaS and REST (or god forbid SOAP) APIs.
In 2017, Twilio Studio was announced – a visual IDE to use on top of the Twilio functionality. In that corner, you can also count Amazon Connect, though not CPaaS but still in the domain of communications – it has a visual IDE of its own.
The concept of using visual tools requiring less coding can greatly increase productivity and the target audience of these tools. They are no longer restricted to developers “who code”. Hell – I can use these tools. I played with Twilio Studio a bit – it was fun and intuitive. It guides the way you think about what needs to be done. About the flow of the service.
I really can’t see how other CPaaS vendors are going to ignore this trend and not work on their own visual offerings during 2018.
#4 – Machine Learning and Artificial Intelligence
It is time to be smart about communications
When I worked at Amdocs some years ago, we looked into the area of Big Data Analytics. It was all about how you take the boatloads of information telecommunication companies have and do something with it. You start by analyzing and visualizing it, moving towards the domain of the actionable.
It frustrated the hell out of me to understand how little communication vendors are doing with their data compared to enterprises in other markets. Or at least that was my impression looking from inside a vendor.
Fast forward to today, and what you find with CPaaS vendors is that they are offering a well oiled machine that provides generic communications. You can do whatever you want with it, and the smart ones are adding analytics on top for their own needs.
But what about the CPaaS vendors themselves? Shouldn’t they be doing something about analytics? Or its better branded colleague known as machine learning?
Gustavo Garcia wrote a good article about it – improving real time communications with machine learning. This is where most CPaaS vendors are probably looking today, optimizing their network to offer a better service.
But it is just scratching the surface.
The obvious is adding things around NLP – speech to text, text to speech, translation. All those are being done by integrating with third parties today, and many of the CPaaS vendors offer these out of the box.
To move the needle and differentiate, more needs to be done:
- The internal structure of the CPaaS vendors should take into account the need for researching data. Data scientists and machine learning people have to be part of the development and product teams for this to ever happen
- CPaaS vendors need to start thinking on what they can offer by analyzing their own data (and their customer’s communications) beyond just optimizing it
If you are a CPaaS vendor and you don’t have at least a data scientist, a machine learning developer and a product manager savvy in this domain yet, then start recruiting.
#5 – AR/VR
Time to connect ARKit and ARCore to communications.
Augmented reality and virtual reality have been around for the better part of the last decade or two. But somehow, they are only now becoming interesting.
I guess the popularity of AR has grown a lot, and where it fits directly in smartphones today (and not the bulky 3D headsets) is with things like Pokemon Go and camera filters (popularized by Snapchat and found everywhere today).
With the introduction of Apple ARKit and Google ARCore, this is only going to get more commonplace. And what we see now is CPaaS vendors finding their way around this technology.
The most interesting one yet is Twilio’s work with ARKit, which they showcased at last year’s Kranky Geek event:
With all the focus put in this domain, I am sure we’ll see more CPaaS vendors looking into it.
#6 – Bots
Omnichannel + Machine Learning + Automation = Bots
Chat bots are all the rage. Search the internet and you’ll be thinking that humans no longer talk to customers. It is all taken care of by bots.
I’ve added a chat widget to certain pages on my website. And every once in awhile I get a question there asking if that’s a human they’re interacting with.
Bots require integration and APIs. They are also about communications. Which is probably why CPaaS vendors are taking a step towards this direction as well. The ones adding Omnichannel offerings across multiple channels are in effect enabling bots to be created there across channels.
That’s a first step though, as the next would be to cater this market better by enabling conversational interfaces and easing the part of packaging the bots for the various channels.
Expect to see a few announcements around bots to be made by CPaaS vendors this year. A lot of it will revolve around Amazon Alexa and Google Home.
#7 – GDPR
The governance headache we’ve all been waiting for.
GDPR stands for General Data Protection Regulation. It is a new set of EU rules that have been put in place to protect the data related to EU citizens that is collected and stored.
While it is easy to assume that CPaaS vendors store no data – they “live” in the real time, that isn’t accurate.
Stored meta data and logs may fall into the GDPR black hole, and definitely recording services. With the introduction of Omnichannel and Bots comes chat history storage.
Twilio jumped on this bandwagon last year with a GDPR program. Other vendors such as MessageBird indicated future support of GDPR. All global CPaaS vendors will need to support GDPR, and since these regulations come into force this year, 2018 will be the year GDPR gets more attention and focus from CPaaS vendors.
2018 – The Year CPaaS Vendors Differentiated
In the past few years, we’ve seen CPaaS vendors struggling in two directions:
- Increasing their customer base, mainly around SMS and voice offerings – which is where most of the revenue is these days
- Growing from a telecom focused player to a global player
That second point is important. Up until recently, CPaaS equated to running one or two data centers (or the equivalent of running from a small number of cloud based data centers), connecting developers via REST APIs to the telecom backend. With the introduction of IP based communications (and WebRTC), there was a growing need for client side SDKs along with more points of presence closer to the end user.
We seem to be past that hurdle for most CPaaS vendors. Most of them have grown their footprint to include a global infrastructure.
The next frontier is going to happen elsewhere:
- Serverless – in making the services easier for developers to adopt by reducing the requirement for customers to deploy their own machines
- Omnichannel – extending the reach beyond the telecom channels of SMS and voice into social networks
- Visual / IDE – grow the service beyond developers, making it easier to use and faster to deploy with
- Machine Learning and Artificial Intelligence – add intelligence and analytics based services
- AR/VR – capture the new world of augmented and virtual reality and enhance it with communications
- Bots – align with the A2P model of businesses communicating with customers through automation
- GDPR – provide support for the new EU initiative, adding governance and regulation as another added value of choosing CPaaS instead of in-house development
CPaaS will move at a rapid pace in the next few years. Vendors who won’t invest and grow their offerings and business will not stay with us for long.
adapter.js is the glue that sticks your code to the different browser implementations of WebRTC.
This article was co-written with Philipp Hancke. He has been the driving force behind adapter.js in the last two years, so it seemed like the best approach to have him contribute large portions of it. You can follow his writing here.
One of the visuals I created when I started out with WebRTC was this one:
It had several incarnations, and the main concept here is to show how WebRTC is different than traditional VoIP.
With traditional VoIP, you have multiple vendors implementing the specification, in hopes (as well as active interoperability testing) that the implementations will work with each other. Knowing one VoIP implementation said nothing about your ability to wield another.
WebRTC was different. It brought to the table the concept of free, but also HTML5; and by that, I mean having a single API that every developer can use to add interactive voice and video to his application.
getUserMedia, PeerConnection and the data channel are all APIs specified in WebRTC. We’re now all speaking the same language when we’re implementing applications. And that, in turn, creates an ecosystem around it. One that was never there with such force with traditional VoIP.
Problem is, you can think of the WebRTC API as a suggestion only. That’s because today, version 1.0 of the specification isn’t yet a reality. We’ve got a candidate for it, but that says nothing about the implementations. Browser implementations of WebRTC are more like dialects of the same language. When you speak one, you understand another, but not fully. Not its nuances. And bad things can happen if two people with different dialects try to talk to each other without patience or understanding.
Which is probably where adapter.js comes into play.
Before we ask ourselves if adapter.js is needed today (it is), it would be worthwhile to understand how it came to be.
adapter.js Origin Story
adapter.js has been around since the early days of WebRTC in late 2012 and early 2013. It was originally part of Google’s apprtc sample application. The original version can still be found in the Chrome tree. It was a very small project, less than 150 lines. The main job was to hide prefix differences like webkitRTCPeerConnection and mozRTCPeerConnection and to provide helper functions to attach a MediaStream to an HTML <audio> or <video> element.
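To give a feel for what hiding prefix differences meant back then, here is a minimal sketch of the idea. It is not the original adapter.js code; `resolvePeerConnection` is a hypothetical helper name, and it takes the window-like object as a parameter so it can be tried outside a browser.

```javascript
// Sketch of the prefix-hiding idea, not the actual adapter.js source.
// `resolvePeerConnection` is a hypothetical name used for illustration.
function resolvePeerConnection(win) {
  // Prefer the standard name, then fall back to the old vendor prefixes.
  return win.RTCPeerConnection ||
    win.webkitRTCPeerConnection ||
    win.mozRTCPeerConnection ||
    null;
}

// In a browser you would pass `window`; a plain object stands in for
// an old prefixed Chrome here.
const fakeOldChrome = { webkitRTCPeerConnection: function FakePC() {} };
const PeerConnection = resolvePeerConnection(fakeOldChrome);

console.log(PeerConnection ? 'peer connection available' : 'no WebRTC support');
```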
During those wild west days of WebRTC, everyone wrote their own library to make WebRTC easier. This started to change in mid-2015 when Microsoft Edge came along. While Edge did not require prefixes for getUserMedia, attaching the MediaStream to a video element still worked in three different ways in as many implementations. This showed that there was a need to move to standardized behaviour. Also, as Microsoft’s Bernard Aboba pointed out, books were printed that showed the prefixed versions of the APIs — which is the wrong thing to teach.
Preferring ORTC over the WebRTC 1.0 API, Microsoft was extremely happy to support the addition of a shim of the RTCPeerConnection API on top of ORTC. This enabled early interoperability tests and allowed ironing out some bugs before the first public ORTC-enabled Edge version.
— Justin Uberti (@juberti) April 4, 2016
When Safari started shipping WebRTC they contributed a shim for the “legacy” bits of the WebRTC API that they did not want to ship. This was an interesting attempt to get developers to write modern, promise-based WebRTC code. However, it does not seem to have worked out as sadly the release version shipped with the legacy API enabled by default.
With growing complexity (currently over 2,200 lines of code) and being in the “hot path”, testing of changes to the adapter.js code itself became more of an issue. Initially powered by Selenium, the tests have been split up into unit tests and end-to-end tests that use standard testing tools like karma, mocha and chai to make assertions while running in a multitude of browsers on Travis-CI for every pull request and compare the results to previous runs. This shows the state of the art for testing WebRTC libraries and has been adopted by other projects as well.
For a quick and dirty project you can simply include https://webrtc.github.io/adapter/adapter-latest.js in your code.
This will give you the latest published version. Note however that your application will automatically pull any changes so this is not recommended for larger applications.
Since it is a polyfill, it transparently modifies the window object by default. The adapter object gives you information about the browser variant and version it detected in the browserDetails object:
console.log(adapter.browserDetails.browser);
console.log(adapter.browserDetails.version);
This is slightly different from a version detection library like platform as it treats Chromium-based browsers like Opera as Chrome — since they run the same WebRTC engine that makes sense.
You can use the detected browser and version to add your own logic for working around bugs present in certain Chrome versions (e.g. the Chrome 61/Android video freeze or the Chrome 58 TURN/TCP issue).
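As a rough sketch of what such logic could look like: the `pickWorkarounds` helper and its flag names below are hypothetical, and the exact version ranges are my assumption based on the two bugs mentioned above. It only relies on the `browser` and `version` fields of `adapter.browserDetails`.

```javascript
// Hypothetical helper: decide which workarounds to enable.
// `details` has the same shape as adapter.browserDetails: { browser, version }.
function pickWorkarounds(details) {
  const isChrome = details.browser === 'chrome';
  return {
    // Assumption: the TURN/TCP issue is limited to Chrome 58.
    avoidTurnTcp: isChrome && details.version === 58,
    // Assumption: the video freeze is limited to Chrome 61. It was seen on
    // Android, which the caller has to detect separately, since
    // browserDetails carries no operating system information.
    watchForVideoFreeze: isChrome && details.version === 61,
  };
}

// In a page that loaded adapter.js you would pass adapter.browserDetails.
console.log(pickWorkarounds({ browser: 'chrome', version: 58 }));
```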
To check WebRTC support you will need to check that RTCPeerConnection is defined:
!!window.RTCPeerConnection
and, if your use-case requires it, getUserMedia:
!!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia)
or the createDataChannel method of the RTCPeerConnection:
'createDataChannel' in RTCPeerConnection.prototype
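The three checks can be wrapped into a single helper. This is a minimal sketch: `detectWebRTCSupport` is a hypothetical name, and it takes the window object as a parameter so the example also runs outside a browser.

```javascript
// Hypothetical helper combining the three checks above.
function detectWebRTCSupport(win) {
  const nav = win.navigator || {};
  const PC = win.RTCPeerConnection;
  return {
    peerConnection: !!PC,
    getUserMedia: !!(nav.mediaDevices && nav.mediaDevices.getUserMedia),
    // Only look at the prototype when the constructor exists at all.
    dataChannel: !!PC && 'createDataChannel' in PC.prototype,
  };
}

// A stand-in for `window`, so the sketch can be tried in Node.
function FakePeerConnection() {}
FakePeerConnection.prototype.createDataChannel = function () {};
const fakeWindow = {
  RTCPeerConnection: FakePeerConnection,
  navigator: { mediaDevices: {} }, // no getUserMedia on purpose
};

console.log(detectWebRTCSupport(fakeWindow));
```

In a real page you would call `detectWebRTCSupport(window)` after adapter.js has loaded.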
After that you can simply write your WebRTC code as shown in the specification:
The official WebRTC samples are a great way to get started as they show a lot of use-cases and the maintainers ensure that they are semantically correct. Most of the shims are written in such a way that they will become inactive when the native variant is available.
Moving Forward
There are 4 forces at play with adapter.js:
- The WebRTC specification itself. This is what we expect and suggest developers build against.
- The browser’s implementation of WebRTC. At the moment, this is lagging behind the WebRTC specification and will take time to catch up. Until that time, use of adapter.js is suggested (you can write your own, but why bother maintaining it?)
- The adapter.js implementation, where you’ll need to keep an eye on newer versions, adopt them and test against them
- Your own implementation, and how it interacts with the other 3 forces
Will a day come when we no longer need adapter.js?
But don’t wait up for it.
If the lifespan of jQuery is any indication (11 years and still going strong, with the last 4 of them accompanied by articles on why we don’t need jQuery), we will be using adapter.js for many years to come.
WebRTC is… everywhere.
WebRTC started some 6 years ago. It was just another VoIP protocol specification that just happened to be targeted at browsers.
Six years in, and now WebRTC is everywhere. There are still those who believe it has failed, or hasn’t lived up to its expectations. I’d say the vendors who failed to adopt it are the ones that have failed.
How do I know?
It has to do with those that are using it. Here are 10 massive applications that are making use of WebRTC. These companies trust WebRTC to offer them the leverage they need to deliver the user experience they strive for.
Looking for more vendors using WebRTC? Here are 10 interviews with inspiring vendors using WebRTC.
Download the eBook
What’s Massive in WebRTC Land?
Before we start though, I want to say a word about what massive is.
It is really hard to know what’s massive. How do you count it? Especially when none of the vendors are willing to share their numbers in meaningful ways here.
So let’s do a back-of-the-napkin kind of calculation here for a sec –
In the recent Kranky Geek event, Google shared in their session an interesting statistic:
Over 1.5 billion of weekly audio/video minutes.
That’s easily upwards of 214 million minutes a day.
And that’s only on Chrome.
This number doesn’t include:
- Other browsers. Today that means Firefox, Edge and Safari
- Usage through plugins. Which covers Internet Explorer
- Electron and CEF based applications. And there are a few very popular ones I can think of
- Mobile applications, making use of WebRTC
- Those who take the bits and pieces of WebRTC that they need, integrate it with their service, and then just make use of it (not always with proper attribution)
So the numbers are larger. Much larger.
The Google Machine and its Leftovers
Back to that more than 214 million minutes a day.
During March 2017, Serge Lachapelle, the person in charge of WebRTC in the past and now of Google Hangouts and Meet, shared some numbers about video conferencing at Google during Google Cloud Next 2017:
9+ years daily translates to over 4.7 million minutes daily.
That’s the amount of use Google makes internally of Hangouts.
It is safe to assume that external use of non-Googlers can double that number with little effort to over 9 million minutes a day.
And continuing this lenient calculation, Hangouts accounts for 4-5% of all voice and video traffic in WebRTC.
Consider the fact that I counted Hangouts over multiple devices, browsers and applications while comparing it to Chrome only numbers, so I am fudging here a bit. On the other hand, I took non-Googlers to account for only half the usage, which is probably way too little.
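For those who want to redo the back-of-the-napkin math, here it is as a small sketch. The inputs are the figures quoted above; the doubling for non-Googlers is the same lenient assumption I made in the text.

```javascript
// Back-of-the-napkin math using the numbers quoted above.
const MINUTES_PER_DAY = 24 * 60;

// Google's figure: 1.5 billion audio/video minutes a week on Chrome.
const chromeDailyMinutes = 1.5e9 / 7; // a bit over 214 million

// "9+ years daily" of Hangouts use inside Google, expressed in minutes.
const googleInternalDaily = 9 * 365 * MINUTES_PER_DAY; // a bit over 4.7 million

// Assumption from the text: external use doubles the internal number.
const hangoutsDaily = googleInternalDaily * 2;

const share = hangoutsDaily / chromeDailyMinutes;
console.log('Chrome minutes per day:', Math.round(chromeDailyMinutes));
console.log('Hangouts minutes per day:', hangoutsDaily);
console.log('Hangouts share:', (share * 100).toFixed(1) + '%');
```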
Anyways, let’s look at these 10 massive applications that are already using WebRTC.
1. Google Meet and Google Hangouts
9+ years daily. Inside Google alone.
Google Meet (or more accurately, Hangouts) is most probably one of the main reasons we have WebRTC.
Google had their own video conferencing service, working from Gmail, but it needed a plugin. Real time video just wasn’t there in the browser, which is where and why WebRTC started. And it started with a contribution by Google which we now know as webrtc.org.
To date, Google Meet (or Hangouts), is a massive application that makes use of WebRTC.
2. Facebook Messenger
Here’s something I wrote some 5 years ago. It is about Skype vs Facebook. Here’s how I phrased it then:
Facebook can adopt WebRTC and provide a calling experience that surpasses most VoIP players.
The rest of the analysis then is kinda funny. Facebook did end up adopting WebRTC wholeheartedly into Messenger, but none of my suggestions were implemented (which in hindsight was probably for the best).
Here’s where Facebook have integrated WebRTC already:
- Messenger – video chat and group video chat, mobile and browser
- Facebook Live – when co-broadcasting
- VR Chat – video calls in Oculus
- Then there’s Workplace by Facebook and Instagram Live Video Chat
All using WebRTC. I am even ignoring WhatsApp here (not sure what parts of WebRTC they use exactly).
At the recent Kranky Geek, we had Li-Tal Mashiach of Facebook talk about what it is they are doing with WebRTC and how they scale their service.
No minutes here, but 400 million people using WebRTC every month. That’s 13+ million people a day on average. With only a minute each this is already massive.
3. Discord
I came across Discord and its use of WebRTC in July 2016. That’s when I added them to my dataset, through a message I saw on Facebook somewhere. As with any other vendor that gets onto my radar, I continued to follow them closely.
Discord is a social platform for gamers (for lack of a better term). They have been around for only 2.5 years. This month, they shared a few numbers. Specifically:
Nothing here about voice and video, but I do know that the numbers here are impressive.
4. Amazon Chime
Amazon Chime is new to the scene of unified communications and already big.
Chime started only a year ago with the acquisition of a company called Biba. It was probably already well underway to become a replacement for Amazon’s own internal video conferencing services. At Amazon’s re:Invent event last month, Amazon shared a few numbers on how they use Chime internally:
24.8 million minutes a month. That’s almost a million minutes a day. From Amazon’s internal meetings only. Not including any of their Chime customers.
Not as massive as the others, but still quite large.
One thing to note – this isn’t “pure” WebRTC. Amazon took the approach of supporting legacy video conferencing systems first, so they “did” something to WebRTC to make it work. Their roadmap for next year is to add direct browser support for users as well. What we do know is that this uses WebRTC technologies inside today already.
Oh… and I didn’t even mention Amazon Connect, Alexa and Mayday – all making use of WebRTC.
5. Houseparty
Houseparty is huge. Especially if you’re a teen. My daughter will probably start using it in a few years… once she grows out of Whatsapp and Musical.ly. Or so I’ve been told.
Houseparty makes use of WebRTC, although it is a mobile only service.
There aren’t many numbers going around about Houseparty this year, so I’ll stick to the ones we know from a year ago.
20 million minutes a day.
Enough said.
6. Appear.in
Appear.in started as a summer internship project at Telenor Digital and grew from there to where it is now. Today, it got acquired by Videonor.
The service is a favorite of many in the WebRTC community (and elsewhere – they are doing millions of minutes a day).
If you haven’t tried it yet, then you should: appear.in
And yes. It is in the league of the other vendors here when it comes to size.
7. GoToMeeting
There are many traditional VoIP vendors (interesting that VoIP can now be considered traditional) that have started adding WebRTC to their offerings.
Most can probably make it into this list of massive applications.
Out of them, I decided to choose GoToMeeting. Why? Because the integration they’ve done was quite a natural one. I’ve been using it for well over a year now whenever someone invited me into a meeting over GoToMeeting – in most cases, they weren’t even aware of the browser option.
8. Peer5
I wanted to add a company that doesn’t do voice and video. Or rather, one that makes use of WebRTC’s data channel.
The one I picked here was Peer5. It was the easiest for me to get numbers from (I am an advisor there).
The P2P CDN scene is getting quite interesting lately. Alongside the startups like Peer5 that are pushing the envelope we now see companies like Akamai who stated publicly that they are headed this way with WebRTC as well.
In this year’s Kranky Geek event, Hadar Weiss, Co-founder and CEO of Peer5, shared a few of their numbers:
1 billion connections a day is large. Compared to millions of minutes a day. But we have to remember – a lot of these connections are short-lived in nature (viewers reaching out to peers they might stream data from or to) and that the more interesting number, which isn’t publicly available yet, is about actual data traffic.
9. CPaaS vendors
CPaaS vendors drive this industry forwards. They do so for the smaller vendors as well as the largest ones.
In 2016, Twilio claimed to process “more than a billion minutes of WebRTC calls made through Twilio” as part of their launch of Voice Insights.
TokBox has stated this year that they power social video apps including Monkey, Houseparty, Fam and live.ly.
And they are not alone with it. There are 20+ such vendors catering to the needs of other developers.
Some of the CPaaS vendors can definitely be considered massive when it comes to the WebRTC traffic they generate.
10. Back to you now
I most definitely forgot a vendor or two here.
Scroll down and comment below with your 10th candidate for the massive application using WebRTC.
WebRTC is Still Minuscule
Let’s look at some other engagement metrics out there.
Netflix shared their numbers for the year this month:
Netflix members around the world watched more than 140 million hours per day
Hours. Not minutes. In minutes? That’s 8.4 billion minutes a day. For a single vendor. Compared to WebRTC’s 214 million minutes a day on Chrome.
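The same comparison as a quick sketch, using only the two figures quoted in this article (Netflix's 140 million hours a day and Chrome's 1.5 billion WebRTC minutes a week):

```javascript
// Netflix: 140 million hours a day, converted to minutes.
const netflixMinutesPerDay = 140e6 * 60;

// Chrome WebRTC: 1.5 billion minutes a week, converted to a daily figure.
const chromeWebrtcMinutesPerDay = 1.5e9 / 7;

// How many times larger Netflix viewing is than Chrome's WebRTC usage.
const ratio = netflixMinutesPerDay / chromeWebrtcMinutesPerDay;

console.log('Netflix minutes per day:', netflixMinutesPerDay);
console.log('Netflix vs. Chrome WebRTC:', ratio.toFixed(1) + 'x');
```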
I’d say WebRTC has room to grow.
Here’s for a bigger 2018.
Are you doing your WebRTC pricing per minute? per gigabyte? per device?
You’re a developer. You decide it is time to build an application. But you don’t really want to do everything from scratch. Hell – you don’t even want to maintain and update all of that media backend – what do you really know about video? So you go look for someone to do it for you, finding a nice set of vendors offering WebRTC PaaS services. You can easily plug into their SDK and in no time have your service do group calling.
You probably won’t be conquering the world as the next Whatsapp with such an approach, but getting that healthcare service, education application or visual contact center up and running is now within easy reach.
And you won’t be alone in this either. About a third of the dataset of vendors using WebRTC that I am tracking is using third parties. Most of them use managed services.
But here comes the question. Do you know how much you’re going to pay for that WebRTC PaaS service?
I get requests to assist in vendor selection on a weekly basis. This has been going on for a few years now. This year, one of the main focus areas in this process has been pricing. Or more accurately, understanding the pricing schemes of the different vendors, and comparing the costs of these vendors.
There’s no easy way to get that done…
- Because vendors have different pricing models
- Because you need to fully understand your scenario
- Because it just isn’t straightforward
Let’s review the 3 leading pricing parameters that are going to dictate your costs:
Minutes
This one may seem easy.
You are going to pay for the number of minutes you use in a service.
It should be easy to calculate. Easy to understand the value (the more you use the more you pay).
But somehow, people translate minutes to the “old” days of telecom, where you paid top dollars to make phone calls. By the minute of course.
The devil is in the details here.
Here are a few differences you’ll see between vendors.
- Is there a minimum allowance of minutes? In many cases, a baseline monthly fee will be requested. That monthly fee will include pre-calculated minutes that you can use. They will usually be priced at their cost value. This fee serves several purposes:
  - Seriousness fee. You pay so the vendor will spend the time necessary in answering your nagging support questions
  - Signal to customers. If that fee is high (hundreds of dollars or more), it is meant to signal to you that they are interested in businesses with money to spend – probably enterprises: “we’re taking only premium customers”. The alternative of a very low monthly fee indicates a stance of “we cater to all developers and are happy to embrace the long tail”
  - Reduce noise. Non-paying “free-tier” customers are noise. Lots and lots of noise. They ask the largest number of questions, and usually these questions (and demands) won’t lead to a sale anyway. So vendors put some built-in must-pay price point to filter out the free riders who probably won’t help their bottom line anyway
- Flat rate? Tiered? Pre-commit? Call us? Different vendors offer different methods to offer better price points (discounts) based on usage. Here’s what I’ve seen vendors do:
  - Flat rate. There’s a single price point. Take it or leave it. You just take the number, multiply it by the minutes and voila! You get your costs. It always comes with text saying that high volume pricing is available
  - Tiered. First X minutes are free (included in the plan). Next Y minutes come at a certain price. Z following minutes are at a lower price point and so on. Later minutes cost you less
  - Pre-commit. Commit in advance (and pay) for a certain number of minutes. If you pass that number, the low price point you already committed to will continue to apply
  - Call us. Almost always there in all plans. For big enough customers, we will negotiate deals suitable for both sides
- What gets counted? Saying the price is per minute is nice, but what are these minutes counted against? Here are a few examples:
  - Actual media minutes. This is a common approach. You got an SDK of the vendor connected to a session, the time starts ticking
  - Connected devices. Then there’s the approach of connected devices. You are connected – you pay. Even if you send or receive nothing. This isn’t a common approach, but it does exist when the price per minute is low and combined with bandwidth payment (see below). It can also be tiered
  - Subscriptions. See below
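To see how flat and tiered schemes play out, here is a minimal sketch of both calculations. All the numbers in it (tier sizes and prices per minute) are made up for illustration and do not come from any specific vendor.

```javascript
// Flat rate: one price point multiplied by the minutes used.
function flatCost(minutes, pricePerMinute) {
  return minutes * pricePerMinute;
}

// Tiered: each tier prices only the minutes that fall inside it.
// `tiers` must be sorted by `upTo` (cumulative minutes), ascending.
function tieredCost(minutes, tiers) {
  let cost = 0;
  let previousLimit = 0;
  for (const tier of tiers) {
    if (minutes <= previousLimit) break;
    const minutesInTier = Math.min(minutes, tier.upTo) - previousLimit;
    cost += minutesInTier * tier.pricePerMinute;
    previousLimit = tier.upTo;
  }
  return cost;
}

// Hypothetical plan: first 10,000 minutes included, then two price points.
const hypotheticalTiers = [
  { upTo: 10000, pricePerMinute: 0 },
  { upTo: 100000, pricePerMinute: 0.004 },
  { upTo: Infinity, pricePerMinute: 0.003 },
];

const usedMinutes = 150000;
console.log('Flat:', flatCost(usedMinutes, 0.004).toFixed(2));
console.log('Tiered:', tieredCost(usedMinutes, hypotheticalTiers).toFixed(2));
```

Running your own expected monthly minutes through each vendor's published scheme this way is usually the quickest way to compare them.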
The great thing about minutes? They are easy to comprehend and count.
If you have 10 people in a call for 10 minutes – that’s 100 minutes (assuming we count per device here).
The downside is that with minutes, there’s usually less regard to what is done in that minute. A video minute is the same as a voice minute on most platforms when it comes to pricing. And a low resolution video minute is the same as a high resolution video minute.
Subscriptions
Subscriptions is related to minutes, and deals with the question of what it is you count the minutes against?
The two most common practices here are to count devices or count subscriptions.
Some of the WebRTC PaaS services work off the notion of a publish subscribe mechanism. Devices can publish media streams into a session, and devices can subscribe to media streams from the session. This is an elegant approach that can nicely be used when describing a complex scenario with asymmetric behaviors.
In an SFU group video call model, where each user publishes his own media streams and subscribes to the media streams of all other participants, the number of subscriptions grows quadratically: with N active users in a session, you’ll be counting N*(N-1) subscribed media streams.
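Here is a small sketch comparing the two ways of counting for the same call. The two prices are hypothetical, picked only to show where the crossover between the schemes can fall.

```javascript
// Device minutes: every connected participant counts once per minute.
function deviceMinutes(participants, durationMinutes) {
  return participants * durationMinutes;
}

// Subscribed minutes in an SFU call where everyone subscribes to everyone
// else: N * (N - 1) subscriptions, each running for the whole call.
function subscribedMinutes(participants, durationMinutes) {
  return participants * (participants - 1) * durationMinutes;
}

// Hypothetical price points, for illustration only.
const PRICE_PER_DEVICE_MINUTE = 0.004;
const PRICE_PER_SUBSCRIBED_MINUTE = 0.001;

for (const n of [2, 4, 10]) {
  const perDevice = deviceMinutes(n, 10) * PRICE_PER_DEVICE_MINUTE;
  const perSubscription = subscribedMinutes(n, 10) * PRICE_PER_SUBSCRIBED_MINUTE;
  console.log(n + ' participants:',
    'per device $' + perDevice.toFixed(2) + ',',
    'per subscription $' + perSubscription.toFixed(2));
}
```

With these made-up prices, per-subscription pricing is cheaper below 5 participants and more expensive above that. The real crossover depends entirely on the vendor's actual price points.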
In WebRTC PaaS, paying per subscribed minutes tends to be cheaper than paying per device minutes for lower group sizes (and vice versa)
It makes sense for a vendor to apply a per-subscription price as in many cases, his own costs are probably tightly coupled with the number of media subscriptions in the system.
Subscriptions are slightly harder to count than devices, but they still give you a solid number and an easy estimate.
Bandwidth
The main complaint about per minute pricing is that it is a reminder of the old telecom days. The notion was that once we go for VoIP, cloud, web, WebRTC or whatever you want to call it, you can price it closer to the usage and not stay at the high level of a minute concept.
If it was limited only to the difference between audio and video then so be it. Give two price points per minute and you’re done. But video is different. It becomes more of a hassle with video. You can probably get video going with as little as 300kbps with 10-20mbps being applicable to 4K video resolutions. That’s not including things like 360 videos and other crazy trends like 8K or 10K resolutions that were just added to the HDMI spec.
So vendors are now looking into taking the route that is so common in IaaS – pricing per bandwidth processed.
Usually, that would be subscribed bandwidth. The reason for that is that cloud providers usually charge the vendor for the bandwidth it sends to browsers and mobile devices and not for the bandwidth it receives on its cloud servers.
Here are a few quick things to validate in these pricing schemes:
- Is price calculated on subscribed bandwidth only or on both send and receive?
- If media gets routed towards the vendor (recording or SFU usually) AND the session needs to be relayed via a TURN server, do you pay for both the TURN related traffic AND the server processing traffic?
Note that if you’re doing peer-to-peer sessions (that means doing a 1-on-1 session where you don’t want media to go through the vendor’s servers), you won’t be paying for bandwidth at all – unless the media gets relayed via TURN. TURN relay depends on network conditions and can’t be estimated properly (highly reliant on your users), but a rule of thumb of 15-20% of the sessions is usually used here.
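As a rough illustration of that rule of thumb, the sketch below estimates how much peer-to-peer traffic ends up going through TURN in a month. The function name and parameters are hypothetical. It assumes every session has the same duration and bitrate, and it counts each relayed stream once, while a vendor may count relay traffic differently.

```typescript
// Rough estimate only, under the assumptions stated above.
function relayedGigabytes(
  sessionsPerMonth: number,  // number of 1-on-1 sessions
  minutesPerSession: number,
  kbpsPerDirection: number,  // bitrate each user sends
  relayShare: number         // share of sessions that need TURN, e.g. 0.15 to 0.2
): number {
  const relayedSessions = sessionsPerMonth * relayShare;
  const seconds = minutesPerSession * 60;
  // Two users, each sending one stream through the relay.
  const kilobits = relayedSessions * seconds * kbpsPerDirection * 2;
  // kilobits -> kilobytes -> decimal gigabytes
  return kilobits / 8 / 1e6;
}

// 1,000 sessions of 10 minutes at 500kbps per direction, with 20% relayed,
// comes to about 15 GB of relayed media per month.
```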
Paying per bandwidth will tend to be cheaper than by minute. The reason is that the end result will be tailored to your exact usage pattern. That said, there are several downsides here:
- It is usually hard to estimate in advance, as translating minutes of use to bandwidth isn’t straightforward
- Different services will give different bitrates for seemingly the same service (I am working for a customer now, looking into the differences across many group video services, and it is devilishly hard to find commonality across the applications)
- It is harder to calculate than the rest, and it usually also includes a per minute count alongside the bandwidth calculation
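To see why translating minutes into bandwidth isn’t straightforward, here is a small sketch that converts one group call into subscribed gigabytes. The helper name is made up, and it assumes an SFU call where every user receives one stream of the same constant bitrate from each other user, which real services rarely match.

```typescript
// Hypothetical helper: subscribed bandwidth of a single SFU group call.
function subscribedGigabytes(users: number, minutes: number, kbpsPerStream: number): number {
  const subscribedStreams = users * (users - 1);
  // streams * seconds * kbps = kilobits received by all users together
  const kilobits = subscribedStreams * minutes * 60 * kbpsPerStream;
  // kilobits -> kilobytes -> decimal gigabytes
  return kilobits / 8 / 1e6;
}

// The same 4-user, 60-minute call:
//   at 300kbps per stream  -> about 1.62 GB
//   at 500kbps per stream  -> about 2.7 GB
//   at 2000kbps per stream -> about 10.8 GB
```

The minutes are identical in all three cases. Only the bitrate changes, and with it the bill.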
Going for this IaaS type of a model is a great way to lower price points for customers, but at the same time it is a great way of handing them a huge headache.
At testRTC, I’ve been trying for some time now with my colleagues there to estimate what our costs are/should be. How much will we end up paying our IaaS vendors every month? It is so hard, that I usually can’t even understand the detailed invoices we receive at the end of each month. I fear that the same is/will occur with per bandwidth pricing in WebRTC PaaS.
Where Do We Go From Here?
In the latest update to my WebRTC PaaS report I’ve included a new appendix explaining pricing models in this space.
But the coolest thing yet was the inclusion of a new tool – a price calculator.
It is probably the 4th or 5th that I’ve created in 2017, each with its own nuances, target use cases and complexities.
This one was meant to be as generic and as simple as possible.
You enter the expected number of sessions you plan to have on a monthly basis, the number of users and the bandwidth per stream (there are a few suggested values in there).
Then you enter the pricing model and the price points of the vendors you want to compare, and the result will be the expected monthly cost you’ll have for each vendor.
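The general idea of such a comparison can be sketched in a few lines of TypeScript. This is not the calculator from the report: the type names, the three pricing models and any prices you feed it are assumptions made for illustration, and it assumes every user subscribes to every other user at a constant bitrate.

```typescript
// Illustrative only; not the report's calculator.
type PricingModel =
  | { kind: "perDeviceMinute"; price: number }
  | { kind: "perSubscribedMinute"; price: number }
  | { kind: "perSubscribedGigabyte"; price: number };

interface MonthlyUsage {
  sessions: number;          // sessions per month
  usersPerSession: number;
  minutesPerSession: number;
  kbpsPerStream: number;     // bandwidth per stream
}

function monthlyCost(usage: MonthlyUsage, model: PricingModel): number {
  const n = usage.usersPerSession;
  const deviceMin = usage.sessions * n * usage.minutesPerSession;
  const subscribedMin = usage.sessions * n * (n - 1) * usage.minutesPerSession;
  if (model.kind === "perDeviceMinute") {
    return deviceMin * model.price;
  }
  if (model.kind === "perSubscribedMinute") {
    return subscribedMin * model.price;
  }
  // perSubscribedGigabyte: minutes -> seconds -> kilobits -> decimal gigabytes
  const gigabytes = (subscribedMin * 60 * usage.kbpsPerStream) / 8 / 1e6;
  return gigabytes * model.price;
}
```

With 100 sessions a month, 4 users per session, 30 minutes each and 500kbps per stream, made-up prices of 0.004 per device minute, 0.002 per subscribed minute and 0.30 per gigabyte work out to roughly 48, 72 and 40.5 respectively.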
Need something a bit more tailored? Reach out to me and I’ll help you out.
This latest update of my WebRTC PaaS report brings with it new vendors as well as a new price calculator.
It is becoming a ritual. Every 8 months or so I update the WebRTC PaaS (or CPaaS) report.
Every time I am surprised by the changes that occur. They come in 4 different areas:
- There are new vendors joining this market
- There are old vendors leaving this market
- There are changes in the feature set of existing vendors already covered in the report
- There are new trends that need to be covered
How did we do since last time?
New Vendors Covered
ECLWebRTC by NTT Communications
I’ve been watching the work done by NTT Communications for quite some time. It started as a project that has signaling capabilities in it. At the time, they called it SkyWay.
Later on, they developed and added an SFU into the mix.
In September 2017 they decided to open up their platform globally. That’s the point where it made sense to add them to the report.
Phenix
Phenix has been an enigma to me in the past two years.
From afar, it looked like a vendor trying to go after the broadcast market with a low latency technology based on WebRTC. Recently they approached me to explain what it is that they do and to check if it fits into this report.
And it did.
Phenix is focused on large scale interactive streaming sessions. Places where you want to pick one or a few broadcasters and have their interactions shared with a larger audience.
Vendors Closing Doors
We had those as well.
Tropo by Cisco
The acquisition of a WebRTC CPaaS vendor is sometimes beneficial and sometimes terrible for its customers.
TokBox’ acquisition by Telefonica was a good thing.
Tropo’s acquisition by Cisco… not so much.
Two years after its acquisition, Tropo closed doors to new customers. The signs were out there, since the platform didn’t really evolve. The service is still up and running, but I don’t think Tropo customers are happy to be using Tropo right now, and I don’t think Tropo/Cisco are happy to be needing to serve these customers. A lose-lose situation here.
Cisco simply pivoted. They decided that Tropo was not the right strategy and wanted to double down on Cisco Spark APIs and developer ecosystem.
forge by Xura
Forge is another sad story of our industry.
Starting life as Crocodile RCS, it was acquired by Acision. Acision was acquired by Comverse. Which got rebranded to Xura. Which was taken off the market by Siris Capital.
Forge, and probably other assets of Xura, were just collateral damage in this process.
M&A and Pivots in WebRTC PaaS
Apidaze acquired by VoIP Innovations
VoIP Innovations acquired Apidaze. This is a good signal for the platform’s health. Looking at the investment section of Apidaze’ 4-pager in my report shows the story:
A lot of the attention and focus was taken from Apidaze API platform and put towards Ottspot, a “slack business phone app”.
This acquisition by VoIP Innovations might mean a renewed focus on the Apidaze platform and the developers who use it.
TrueVoice is now Voxeet
TrueVoice was added to the report earlier this year. At the time, Voxeet added it as another product offering. This time around, Voxeet is making the APIs the main product.
This caused the TrueVoice brand to be removed, and Voxeet to be the actual thing.
Building a platform for developers is an all consuming process. Larger companies might be able to cope with doing that in parallel to other activities, but the smaller vendors will struggle. The fact that Voxeet decided to pivot and focus on developers is a good sign.
Putting it all in a Visual
Here’s what it means visually:
2 in. 2 out. A few minor changes elsewhere.
The report shows the transitions in this market since 2014.
What’s in the report?
The report is quite long. It now contains 223 pages. This includes:
- The explanation of WebRTC from the point of view of someone who has a build vs buy decision to make
- KPIs to use in the selection process – and why they should matter to you
- Vendor sections (20 of them) – 4 pages per vendor
- Old vendors – to give an understanding of why they “left” the market, and maybe use it as signals to the existing vendors and their future stability
- Appendixes. 9 of them
Want to get a sneak peek into the report? You can check out these two PDF resources:
As you can see, this time, TokBox were kind enough to sponsor their 4-pager of the report and have it publicly available.
Here’s what Badri Rajasekar, TokBox CTO had to say:
2017 has been a big year for WebRTC. In what many considered a very significant piece of the puzzle, Apple announced support for WebRTC in Safari, finally allowing developers to use WebRTC on any browser platform. At the same time, we’ve seen a surge in adoption of live video communications driven in part by consumer demand. BlogGeek.me’s evaluation of this market is a valuable read for those looking for snapshot of this year’s trends in WebRTC.
Check out the TokBox 4-pager from the report. You can expect to see 19 other such detailed profiles of the other vendors that the report covers.
Report Tools
The report doesn’t come only as a “standalone” PDF file. You also get access to a few additional tools:
- Price calculator – an Excel sheet designed to make it easier to estimate your costs using different vendors
- Online vendors comparison matrix – an online comparison matrix you can use to quickly validate which vendors offer the feature set and capabilities you need
- Vendor selection blueprint – an Excel sheet and Word workbook with a step-by-step guide on how to narrow down and score vendors for your application
- Presentation visuals – the presentation visuals from the report, easily available for use in your own internal or external presentations
There’s a ton more in the report, and in the work I do with vendors in this space – those offering such services, looking to offer such services or wanting to use these services.
WebRTC API Platforms are different than the classic/legacy/common CPaaS.
As I am working on getting the final TBDs in my upcoming report update on Choosing a WebRTC API Platform, I wanted to share something that may seem obvious, but probably isn’t.
When talking about CPaaS, WebRTC brings with it something more than just accessibility from the browser.
Here’s the makeup of a CPaaS platform:
There’s backend telephony in there, built out of some VoIP server components, connected to the carriers to handle things like phone numbers and actual calling.
Developers connect to that backend via REST APIs, or some other form of scripting interface.
Latencies and wait times aren’t important for the most part, so the CPaaS vendor doesn’t need to be spread across the globe to provide the service. A couple of data centers for redundancy and some reduction in latencies is usually enough.
Here’s what a WebRTC API platform looks like:
There might or might not be REST APIs. They are important, but definitely aren’t the main way developers interact with the system. That’s done via the SDKs. The SDKs are wrappers around the REST APIs or some other interface (probably WebSocket based), allowing getting the actual media and processing it as part of the SDK – either in the browser or on a mobile device.
And then there’s the backend. Signaling and NAT traversal are rather mandatory. Without them, this won’t be a WebRTC API platform. In the majority of the cases, you’ll also have access to an SFU, allowing you to support group video calls. All that backend? Especially the media parts of NAT traversal and SFU? They have to be as close to the end user as possible, so these platforms often deploy globally, on all possible data centers of a cloud provider (think AWS or GCE) and sometimes running on multiple cloud providers to increase their reach.
The difference then?
- SDK that handles actual media processing; with less focus on REST APIs
- Globally spread backend, to reduce latencies
There’s a challenge selling to developers. They tend to underestimate the effort involved. And they usually prefer building new shiny toys over polishing and maintaining something that’s working. This is made worse by the seemingly “easy” fashion by which you can get a WebRTC peer-to-peer call to happen inside a browser between two tabs. It gives the impression that developing and running WebRTC at scale is trivial.
Especially when you compare it to connecting to a phone number and dialing it. Doing this via an API is easy. But how do you go about dialing out a number on your own without the assistance of CPaaS? Is there a really simple example of this? Not really. This requires more than just programming – the value here is the accessibility to the phone network, which is considered a royal ongoing headache. So it is easy to outsource and to understand its value.
Here’s how the thinking goes:
SDKs? Sure. We can write them.
Signaling? I found a project on github that looks popular enough.
NAT Traversal? Everyone’s already using coturn. Should be simple enough to get it up and running.
SFU? Just passing data around. Can be written in a weekend.
Will WebRTC API Platform vendors be able to overcome this challenge? How can this be explained to developers? There is a lot that goes into building such a platform. More than the mere initial technical hurdles.
Browsers are changing. There are now 4 of them that have “support” for WebRTC. That support is different between browsers. New browser versions break things that used to work before. The specification is being finalized now, but no browser supports it yet.
Media backends need to be maintained. Monitored. Updated. Secured. On an ongoing basis.
In the coming years we will see a shift from H.264 and VP8 video codecs to VP9, HEVC and/or AV1 video codecs. This will require additional investment in the infrastructure.
And still it is believed to be easy and simple.
It isn’t.
Planning on Launching Your Own WebRTC API Platform?
If you are planning to launch your own WebRTC API Platform, then you should know what you’re up against.
In the past 4 years I’ve been looking at this market, analyzing it. Seeing it grow and mature. The report covers 20+ vendors offering WebRTC API Platforms. Most of them are active. A few died or got acquired and taken off market.
One of the things to note is how new WebRTC API Platform vendors make their decision to launch their service. What do they decide to include in their initial launch? What do they use as differentiating factors from the existing players?
The space is rather crowded already, even if no clear winner exists yet.
Make sure to do your homework here. Understand what you’re up against and why developers should come to you and not to others. And plan for the long run.
Planning to Use a WebRTC API Platform?
If you are in the build vs buy decision point, then think of the alternative costs of each approach. Also figure out the time to market of each and the risk of failure. For new projects, I tend to suggest a platform instead of self development. It reduces risk and upfront costs, but more than that, it enables experimenting and proving the business before committing too much into the project.
If you decided to build on your own, make sure your reasoning is rock solid. If the only reason is cost, then I suggest you recalculate.
If you decided to buy into a platform instead, then pick a platform that fits your need. But make sure it is here to stay as much as you can – this market is dynamic and is bound to stay that way for a few more years.
The Report Update
The updated report will get published later this week.
If you want to learn more about it, just contact me.
WebRTC Index has been around for 3 years now. Are you listed?
The idea behind it was quite simple. We create a place where someone can come and publish his company and its services – assuming they are related to WebRTC. The list grew, and now stands at 250 published vendors.
What we also did, was make sure the site is sustainable (there’s work to be done to keep it up to date). We chose the sponsorship approach:
Vendors can be listed freely in the index, but if you are a sponsor, then you get a bit of extra juice. You appear on the main page as a sponsor, get listed first on relevant search results, and get a few more ways to express what it is you offer on your own page.
What the WebRTC Index turned into is a place to search for relevant vendors, to assist people in understanding the industry and to pick someone to work with.
And here comes my question to you:
Are you listed in the WebRTC Index?
Go check – http://webrtcindex.com/
I’ll sit and wait here. In the dark. Next to the nameless virtual machine that is hosting this website of mine.
Not there? Then read on…
How can you join the WebRTC Index?
The system is easy and works as a manual process.
- Go to https://webrtcindex.com and check if your company is already listed
- If it isn’t, then just press the red button saying “Add your company”:
- Fill out the Google Form you reached
- Wait a couple of days (a week tops – I promise) – until you get an email with your listing
It really is that simple.
And it is a free process – no need to pay anything to join the list.
So why wait?
Twilio isn’t the first CPaaS vendor to offer serverless. And it definitely won’t be the last. Expect serverless CPaaS offerings in the future.
When I started researching for my first WebRTC API platforms report, one of the vendors I looked at was Voximplant. One of the things they referred me to was something they call VoxEngine. As its web page describes it, it is “an application engine that runs your apps inside the VoxImplant cloud” = Serverless.
I liked the idea, but didn’t think much of it at the time. It was rather new anyway.
What is Serverless Computing?
If you haven’t been following the API scene, then you might have missed the notion of serverless computing. It is a concept where the code you write gets executed by the cloud. Directly. No need to run your own OS, VM or whatever container. Write the code. And it runs. Magically.
If you look at the compute models of XaaS, here’s the picture you’ll probably find:
- If you use On Premise, then you’re in charge of EVERYTHING
- With IaaS, everything up to the operating system is something “someone else” is taking care of. Amazon, Google, Microsoft or someone else entirely
- Then there’s PaaS. With it, everything up to the runtime is something “done for you”. Your data and application are yours to worry about. You connect with the runtime via APIs (not always, but quite common)
- SaaS is just getting the whole thing out of the box. Not our worry here
Where would Serverless fit in?
With Serverless, you write the “Application” but it and its data get handled and maintained by someone else.
What do you gain out of it?
- Scalability – you no longer need to care about it. Someone else now does that for you. You wrote the core logic of what you want to achieve, and the platform hosting your code is the one that needs to sweat it when it comes to scaling the thing as needed
- Maintenance – less code means you have less to maintain. And you’re shedding here all the boring work of getting the thing to work. In a way, you’re writing the initial prototype, and have it run in production
- Security – assuming the PaaS vendor handles the headache of security well, then you have less to deal with here
- Time to market – less to write also means faster time to market. It will take you less time to get that application in the hands of customers
- Latency – since the code runs directly on top of the PaaS APIs, on the servers of the same vendor, there is a lot less latency involved in the API calls. Might be important, or might not be – just a fact
What do we have here then? Economies of scale at play. The vendor doing PaaS is already handling scalability, maintenance and security for you and a lot of other customers, so theoretically, he is doing and can do a better job of it than you can in the long run. This frees you up to focus more on the user experience, ending with a better application and faster time to market. And there’s the added benefit of where the code is running (closer to the rest of the code).
Serverless = Functions
While Serverless is the popular name, there’s another one that has been coined – FaaS – Functions as a Service; which then made it into the names of many of these products: Google Cloud Functions, PubNub Functions and Twilio Functions to name a few.
Many API vendors now are starting to offer these serverless capabilities – so now you no longer need to have a server of your own connected to their service – you can just run your code in their XXX Functions product instead.
In some cases, using these Functions products is free, while in most cases, there’s a usage based payment model on running these Functions.
Serverless CPaaS
Back to CPaaS and where serverless fits.
I think there are only two vendors in the CPaaS market today who are offering serverless (If I missed anyone – please share in the comments below):
In the last Twilio Signal event in London, Jeff Lawson mentioned that Functions was Twilio’s fastest growing product since its launch, so there must be a market for that.
CPaaS is slightly more complex these days, so it is important to see where serverless fits first. Let’s split CPaaS into a couple of API layers and products:
- SMS and voice (via phone numbers)
- IP messaging, chat and omnichannel messaging
- VoIP (voice and video via WebRTC)
In some ways, the proprietary scripting language API layer can be viewed as a crude form of serverless. You state your needs inside a piece of script that indicates the flow of actions to take on events, offering it as response to webhooks from the CPaaS vendor.
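The “flow of actions to take on events” idea can be sketched as a plain function that receives an event and returns the actions to run. Everything in this sketch is hypothetical: the event shapes, the action names and the handler signature are invented for illustration and don’t match any specific vendor’s scripting language or Functions product.

```typescript
// Hypothetical event and action shapes, invented for this sketch.
type InboundEvent =
  | { type: "incomingCall"; from: string }
  | { type: "incomingSms"; from: string; text: string };

type FlowAction =
  | { action: "say"; text: string }
  | { action: "dial"; to: string }
  | { action: "reply"; text: string };

// The vendor would invoke this on each event (the webhook) and then
// execute the list of actions it returns.
function onCpaasEvent(event: InboundEvent, supportNumber: string): FlowAction[] {
  if (event.type === "incomingCall") {
    return [
      { action: "say", text: "Connecting you to support." },
      { action: "dial", to: supportNumber }
    ];
  }
  // Otherwise it is an incoming SMS: acknowledge it.
  return [{ action: "reply", text: "Thanks, we got your message." }];
}
```

Whether this logic runs on your own server behind a webhook or inside the vendor’s cloud, the function itself stays the same. Serverless only changes who hosts and scales it.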
The REST APIs are those that are easily usable within a serverless environment. Instead of making remote calls via APIs from one server to another, handling things like security, authentication and scale, you just run the call as close as possible to its destination.
And then there’s the client SDKs. These run on the target devices themselves, and it is hard to see how you can translate them into serverless – they are already built to communicate with the CPaaS vendor’s backend, so they’re out of scope here.
Since CPaaS products are roughly aligned by the types of API layers that are used for them, we can reach the following conclusions:
A few things to note here:
- IP Messaging makes more sense to run in serverless computing when traffic is high and latency is important
- Latency is usually less of an issue when it comes to SMS and voice
- VoIP has its own set of solutions other than serverless. These usually come in the form of pre-built widgets and iframes (but that’s for another article)
From a vendor’s perspective, serverless is now becoming important.
Simply because it is part of Twilio’s runtime offering. And one that Twilio states is growing rapidly. I wouldn’t want to be left behind as a competitor.
Why not use an IaaS vendor’s FaaS offering?
Just had to put these two in the same sentence.
Since the dominant IaaS vendors (Azure, AWS and Google Cloud) all have a serverless offering, why do you need one in CPaaS? Can’t you just connect the IaaS one to the CPaaS one?
You most certainly can. But you will be using two different vendors now. And to some extent, using something like AWS Lambda only makes sense if you are already making use of multiple AWS services.
Assuming what you do gravitates around communications, then using a Serverless CPaaS product makes more sense. It will bring with it reduced latency and improved security over using an external serverless product.
Serverless is coming to CPaaS
Like it or not, serverless is coming to CPaaS.
If you are a CPaaS vendor and you are asking yourself what’s next – make sure you’ve got serverless in your offering or your immediate roadmap.
If you are a developer using CPaaS – see if serverless can help you develop your application faster.
Selecting a CPaaS vendor for your WebRTC application? Check out my WebRTC APIs report
WebRTC has many moving parts in it.
When WebRTC works it seems like magic. You point your browser to a URL. Get someone else to point his browser to a URL – and – you now see each other.
How cool can that be?
If you look below the hood, there’s a lot going on in there.
Looking for a WebRTC course to dig deeper and build a solid architecture for your product?
I’ll try to give the explanation of how WebRTC works in a few different angles here. Together, they should create a pretty good picture of what’s going on.
WebRTC Basic Concept
Here’s the first thing I usually say about WebRTC:
WebRTC is the means to drive real time communications (voice, video and arbitrary data) directly inside a web browser. No need for any plugin or download to do that.
Somehow, that’s not saying much.
So let’s start with what makes WebRTC truly unique from a browser perspective.
If up until now, when you thought of a web application you were thinking client and server –
You have the browser as a client. It connects to the server to ask for stuff. Let’s call these things requests. And the server obliges by sending responses. We’ve grown beyond that using WebSockets, but it still is rather the same. If I want to send a message to a friend who is looking at his own browser just now, the message needs to go to the server and from there to my friend. Much like the post office works.
WebRTC is where browsers and HTML diverge from this paradigm:
While we still need to somehow signal from one browser to the other so we will be able to locate each other, once that signaling is over, we can send messages directly between the two browsers – without the web server ever touching the messages. Magic.
This is why many refer to WebRTC as a peer-to-peer technology. Or P2P in short. Because browsers can communicate directly.
Separation of Signaling and Media
When loading web pages, we are now used to the fact that the browser goes fetching 100 different resources just to render a web page. These resources can come from various different servers – the host of the page, a CDN holding static files and a few third party sites. That said, this will mostly boil down to three types of files:
- HTML and CSS, which make up the main content of the site and its style
- JS, which is usually there to run the interactive part of the website
- Image files and other similar resources
It ends up being a mixture of static stuff and a bit of code to hold it all together.
WebRTC is… different.
It requires two types of interactions that go over the network. Signaling and media.
Signaling takes place over an HTTPS connection or a WebSocket. It is implemented via JS code. What you do in signaling is decide how the users are going to find each other and start a conversation.
One important thing about signaling – it isn’t part of WebRTC itself. The developer is left to decide how to pass the information needed to create a WebRTC session. WebRTC will generate the bits of information it needs to send and process the bits of information it receives, but it won’t really do anything over the network about them. These bits of information are packed into SDP messages by WebRTC today.
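Since the transport is left to the developer, a signaling service can be as small as a relay that forwards opaque messages between users. The sketch below is one possible shape for it, kept in memory so it stays self-contained. The message format, the `SignalingRelay` class and its method names are assumptions for illustration. Only the offer, answer and ICE candidate payloads come from WebRTC itself.

```typescript
// WebRTC does not define this message format or this relay.
interface SignalingMessage {
  from: string;
  to: string;
  kind: "offer" | "answer" | "candidate";
  payload: string; // SDP text or a serialized ICE candidate
}

type Deliver = (message: SignalingMessage) => void;

// In a real service, join() and send() would sit behind a WebSocket or
// HTTPS endpoint instead of being called directly.
class SignalingRelay {
  // Object without a prototype, so user ids can't clash with built-in keys.
  private peers: { [id: string]: Deliver } = Object.create(null);

  join(id: string, deliver: Deliver): void {
    this.peers[id] = deliver;
  }

  // Forwards the message untouched; the relay never inspects the SDP.
  send(message: SignalingMessage): boolean {
    const deliver = this.peers[message.to];
    if (!deliver) {
      return false; // recipient is not connected
    }
    deliver(message);
    return true;
  }
}
```

The relay only cares about who the message is for. What is inside the payload is between the two browsers.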
Media takes a different route than signaling over the network and behaves very differently. This is true for the browser, the network AND the servers you need to make it work.
Audio and Video
Audio and video is the main thing you’ll notice with WebRTC. It is also what gets showcased in almost all demos and examples of WebRTC.
The reason for that is simple – video is VERY visual and interactive.
Audio and video in WebRTC works by using codecs. These are known algorithms that are used to compress and decompress audio and video data. There are different codecs you can use in WebRTC and I won’t get into it now.
Audio and video also get interesting because they are sent with low latency in mind. If packets get lost along the way due to network issues – it might not be worth retransmitting them (another first for HTML).
WebRTC uses known VoIP techniques to get media processed and sent through the network, and this is all done over SRTP – the secure and encrypted version of RTP. WebRTC did make some minor changes by using specific mechanisms in SRTP that were not in wide use before, making it a bit harder to interoperate with if you have a VoIP service deployed already.
Data too
You can also send arbitrary data with WebRTC. This is done over what’s called the data channel in WebRTC.
The data channel can be used when what you want to do is send direct messages between browsers without going through any server (you may still need to relay it through a TURN server though).
NAT Traversal
Being able to communicate directly across browsers is great, but it doesn’t always work.
The internet was built on the client-server paradigm some 30-40 years ago. Since then it has changed somewhat. Today, most users access the internet from behind a firewall or a NAT. These devices usually change the IP address of the user’s device and mask it from the open web. This masking can be just that, or it can also offer some measure of “protection” where unsolicited traffic is not allowed towards the user’s device. The problem with this approach, is that WebRTC uses different mediums for signaling and media so understanding what’s solicited and what’s unsolicited traffic isn’t easy.
Furthermore, there are enterprises who make it a point not to let any type of traffic into (or out of) their network without vetting it.
Which brings us to these types of scenarios:
The guy there on the left? He now might actually know the public IP address of the guy on the right due to that STUN request that was made. But the public IP address might only be opened to the STUN server and having anyone else try to connect through that “pinhole” that was created may still fail.
In such cases, a user’s device will not be able to communicate directly with another device located inside some other private network. The workaround is to relay the blocked media through a public server. This is the whole purpose of TURN servers:
You can expect anywhere between 5-20% of your sessions to require the use of TURN servers.
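On the browser side, STUN and TURN servers are handed to the peer connection as configuration. Here is a sketch of what that looks like. The hostnames and credentials are placeholders, and the interface is redeclared locally (mirroring the standard `iceServers` configuration) so the snippet compiles outside a browser as well.

```typescript
// Placeholder hostnames and credentials; replace with your own servers.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

const iceConfig: { iceServers: IceServerEntry[] } = {
  iceServers: [
    // STUN: lets the browser discover its public address.
    { urls: "stun:stun.example.com:3478" },
    // TURN: relays media when no direct path can be established.
    {
      urls: [
        "turn:turn.example.com:3478?transport=udp",
        "turn:turn.example.com:443?transport=tcp"
      ],
      username: "demo-user",
      credential: "demo-secret"
    }
  ]
};

// In a browser: const pc = new RTCPeerConnection(iceConfig);
```

Offering TURN over TCP on port 443 next to UDP is a common choice for reaching users behind restrictive enterprise firewalls.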
Due to this complication, a WebRTC session takes the following steps:
- Send out an SDP offer to a web server. This SDP message outlines which media channels the device wants to exchange and how to find them
- Receive an SDP answer via the web server from the other device. Remember that that other device may be a media server
- Initiate a procedure called ICE negotiation, meant to find out if the devices are reachable directly, peer-to-peer, or whether they require media relay via TURN. This process is best done using trickle ICE, but that’s for another day
- Once done, media flows between the devices, directly or through the TURN relay
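The outcome of the ICE step above can be illustrated with a deliberately simplified sketch. Real ICE pairs local and remote candidates, runs connectivity checks and ranks pairs by a priority formula. The function below skips all of that and only shows the final direct-versus-relay decision. The candidate type names (`host`, `srflx`, `prflx`, `relay`) are the standard ICE ones, while `CheckedPair` and `mediaPath` are made up for illustration.

```typescript
// Simplified illustration of the outcome of ICE, not the real algorithm.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface CheckedPair {
  localType: CandidateType;
  remoteType: CandidateType;
  reachable: boolean; // did the connectivity check for this pair succeed?
}

// "direct" when some working pair avoids TURN, "relayed" when only pairs
// involving a relay candidate work, "failed" when nothing works.
function mediaPath(pairs: CheckedPair[]): "direct" | "relayed" | "failed" {
  const working = pairs.filter((p) => p.reachable);
  if (working.length === 0) {
    return "failed";
  }
  const hasDirect = working.some(
    (p) => p.localType !== "relay" && p.remoteType !== "relay"
  );
  return hasDirect ? "direct" : "relayed";
}
```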
Oftentimes, developers won’t develop directly against the WebRTC APIs and will use third party frameworks and modules to do that for them – open source or commercial.
Quick Recap
- WebRTC sends data directly across browsers – P2P
- It can send audio, video or arbitrary data in real time
- It needs to use NAT traversal mechanisms for browsers to reach each other
- Sometimes, P2P must go through a relay server (TURN)
- With WebRTC you need to think about signaling and media. They are separate from one another
- P2P is not mandated. It is just possible. You can place media servers if and when you need them. It “breaks” P2P, but we’re looking to solve problems, not write an academic dissertation
- Servers you’ll need in a WebRTC product:
- Signaling server (either as part of your application server or as a separate entity)
- STUN/TURN servers (that’s what gets used for NAT traversal)
- Media servers (optional. Only if your use case calls for it)
WebRTC has 3 main API groups:
- getUserMedia
- PeerConnection
- DataChannel
getUserMedia is in charge of giving the application access to the user’s camera, microphone and screen. It alone gives value for those who need to do things locally, without implementing real time conversations.
Here are a few uses of standalone-getUserMedia:
- Take a user’s profile picture
- Collect audio samples and send them to a speech to text engine
- Record audio and video with no quality degradation due to packet loss
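As a rough illustration of the first two uses, here is a small TypeScript sketch of the constraints you might hand to `navigator.mediaDevices.getUserMedia()`. The helper names and the chosen resolution are my own assumptions, and the constraint shape is declared locally so the sketch doesn't depend on browser type definitions.

```typescript
// Local, simplified shape of the constraints object that
// navigator.mediaDevices.getUserMedia() accepts.
interface CaptureConstraints {
  audio: boolean;
  video:
    | false
    | {
        width: { ideal: number };
        height: { ideal: number };
        facingMode: "user" | "environment";
      };
}

// Camera only, front facing, VGA-sized: enough for a profile picture.
function profilePictureConstraints(): CaptureConstraints {
  return {
    audio: false,
    video: {
      width: { ideal: 640 },
      height: { ideal: 480 },
      facingMode: "user",
    },
  };
}

// Microphone only: enough for collecting audio samples for speech to text.
function speechCaptureConstraints(): CaptureConstraints {
  return { audio: true, video: false };
}

// In a browser this would be used roughly as:
//   const stream = await navigator.mediaDevices.getUserMedia(profilePictureConstraints());
//   videoElement.srcObject = stream;
```

Nothing here touches the network. The stream stays local until you decide to do something with it.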
I am sure you can come up with more uses for it.
PeerConnection
PeerConnection is at the heart of WebRTC and the most complex to implement and to understand. In a way, it does EVERYTHING.
- It handles all the SDP message exchange (not sending them through the network itself, but generating them and processing the incoming ones).
- It implements ICE in order to connect the media channels, going through TURN relays if needed
- It encodes and decodes the audio and video data in realtime
- It sends and receives the media over the network
- It handles network issues by employing adaptive jitter buffer, bandwidth estimation, packet loss concealment, forward error correction and other algorithms that you really don’t want to know, but eventually will need to learn
- It handles local audio issues using algorithms such as acoustic echo cancellation
Much of what goes on inside the peer connection that affects the resulting media quality is based on heuristics. A specific set of arbitrary rules. Different implementations may have different behaviors and different media quality due to this.
DataChannel
I’ve discussed the data channel somewhat earlier.
The only thing to add here is that:
- Data channels can be configured to be reliable or unreliable. If you set them to unreliable then messages will not be automatically retransmitted on them. Sometimes, that would be your preference. They can also be configured to be ordered or unordered in the way they deliver messages
- Data channels were designed to work on the API level similar to WebSocket, so once you open it, you can think about it in a similar fashion.
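The reliable/unreliable and ordered/unordered choices above map to the options you pass when creating the channel. Below is a minimal TypeScript sketch of that mapping. `channelOptions` and `DeliveryMode` are hypothetical names, and the options type is a locally declared subset of the browser's `RTCDataChannelInit` dictionary.

```typescript
type DeliveryMode = "reliable" | "unreliable";

// Subset of the RTCDataChannelInit dictionary, declared locally.
interface ChannelOptions {
  ordered: boolean;
  maxRetransmits?: number;
}

function channelOptions(delivery: DeliveryMode, ordered: boolean): ChannelOptions {
  if (delivery === "unreliable") {
    // maxRetransmits: 0 asks the browser never to resend a lost message.
    return { ordered, maxRetransmits: 0 };
  }
  // Leaving maxRetransmits (and maxPacketLifeTime) unset keeps the channel reliable.
  return { ordered };
}

// In a browser this would be used roughly as:
//   const channel = pc.createDataChannel("game-state", channelOptions("unreliable", false));
//   channel.onmessage = (event) => console.log(event.data);
```

An unreliable, unordered channel is what you would usually pick for frequently refreshed state where a late message is worthless. Reliable and ordered behaves much like a WebSocket.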
You can find a few ideas of what people are doing with data channels here. There are more ways you can make use of it.
The WebRTC Implementer’s Viewpoint
If what you’re looking for is to implement an application that makes use of WebRTC, then here are some activities you’ll need to deal with:
- Client side
- NAT traversal
Before you continue, you may want to check out this article about programming languages in WebRTC.
Client Side
The client side can be a browser, mobile application, PC application or an embedded device.
For mobile applications, this is mostly about finding an SDK you’re comfortable with. There are a few available on GitHub, along with the official ones coming from Google for iOS and Android. There are also some commercial mobile SDKs out there that are pretty good.
You can go for a PC application. Most do it by using Electron. And there’s also the embedded approach, which means either taking the official Google WebRTC codebase and porting it to whatever device you have or developing something on your own – I’ve seen both approaches work.
Signaling
You will need a signaling server. The first thing a WebRTC client will do is call the mothership. That is used to coordinate whatever session you have in mind for it.
The signaling server isn’t in the scope of the WebRTC specification so it is up to you to figure out what to use here. Most of the code you’ll find on GitHub for the browser client is actually going to be an implementation of a signaling server.
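Since the wire format is yours to define, here is one possible shape for it: a minimal TypeScript sketch of a JSON envelope that carries offers, answers and ICE candidates over whatever transport you already have. The `webrtc` wrapper key, the message kinds and the function names are all assumptions of mine, not something WebRTC prescribes.

```typescript
// Hypothetical signaling messages exchanged between two clients.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null };

// Wraps a message so it can share a channel with other application messages.
function encodeSignal(message: SignalMessage): string {
  return JSON.stringify({ webrtc: message });
}

// Returns the signaling message, or null if the payload is something else.
function decodeSignal(raw: string): SignalMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const body = (parsed as Record<string, unknown>)["webrtc"];
  if (typeof body !== "object" || body === null) {
    return null; // some other application message
  }
  const fields = body as Record<string, unknown>;
  const kind = fields["kind"];
  const sdp = fields["sdp"];
  const candidate = fields["candidate"];
  const sdpMid = fields["sdpMid"];
  if (kind === "offer" && typeof sdp === "string") {
    return { kind: "offer", sdp };
  }
  if (kind === "answer" && typeof sdp === "string") {
    return { kind: "answer", sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return {
      kind: "candidate",
      candidate,
      sdpMid: typeof sdpMid === "string" ? sdpMid : null,
    };
  }
  return null;
}
```

The receiving side would pass a decoded offer or answer to `setRemoteDescription()` and a decoded candidate to `addIceCandidate()` on its peer connection.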
Remember that the signaling server can be separate from your web server or they can reside within the same process – up to you. And in any case, the first thing to do is to check if there’s already some kind of a signaling mechanism that you have in place for your application for things that aren’t WebRTC. You might be able to piggyback your SDP messages and other WebRTC related signaling over that mechanism (I know that’s what I’d try to do first).
NAT Traversal
For NAT traversal you will need to deploy STUN/TURN servers.
We’ll first start with what NOT to do:
- Don’t assume you won’t be needing TURN
- Don’t use public STUN servers
- Don’t have a single server for everything
- Don’t start by building a world-class global network of servers. You’ll get there, but it can wait
Now what you should do:
- Deploy STUN and TURN on the same server, in the same process
- Use coturn. That’s what everyone else is using
- Or instead, just get a hosted NAT traversal service from someone. XirSys and Twilio are good alternatives
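On the client side, whichever route you pick ends up as a list of ICE servers handed to the peer connection. Here is a minimal TypeScript sketch, assuming a single self-hosted STUN/TURN server listening on the standard ports (3478 for STUN and TURN, 5349 for TURN over TLS). The host name, the credentials and the `iceServersFor` helper are placeholders, and the entry type is a locally declared stand-in for the browser's `RTCIceServer` dictionary.

```typescript
// Local stand-in for the browser's RTCIceServer dictionary.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

// Builds the iceServers list for one host that serves both STUN and TURN.
function iceServersFor(
  host: string,
  username: string,
  credential: string
): IceServerEntry[] {
  return [
    // STUN needs no credentials.
    { urls: [`stun:${host}:3478`] },
    // TURN over UDP first, then TCP and TLS for restrictive networks.
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:5349?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}

// In a browser this would be used roughly as:
//   const pc = new RTCPeerConnection({
//     iceServers: iceServersFor("turn.example.com", "alice", "secret"),
//   });
```

In production you would not ship a long-lived password to the browser. Hosted services and self-hosted TURN servers are typically set up to hand out short-lived credentials through your signaling server instead.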
If you are planning on group voice and video sessions, connectivity to PSTN or other networks, recording or other fancy features, then media servers are in your immediate future.
Look for something that fits well with your use case.
I’d even say start here before picking anything else in your technology stack.
There are a few open source and commercial alternatives out there. They are different from one another in many ways.
Looking for a WebRTC Training?
The purpose of this article is to get you the most basic understanding of WebRTC if you’re a newb. I didn’t want to take the approach of building a “hello world” application – you can find many of these on the internet already. What I wanted to do instead is go somewhat higher and take a look at the bigger picture – you’ll be needing it soon enough.
In many cases, people start with a “hello world” implementation of WebRTC and try to fit it to their own scenario. I find that it is the wrong way in many cases, as it all depends on what it is you are trying to build – it will dictate the starting point you’ll need to make in your journey.
Spend the time to read this article, and then go read a “hello world” manual or two for WebRTC. It will make it a lot more effective if you do.
Looking for a WebRTC course to dig deeper and build a solid architecture for your product?
An interview with Jeff Lawson, Co-founder and CEO of Twilio.
After going to the Twilio Signal event in London in September, I was asked by Twilio’s analyst relations about the event. I shared my thoughts in a lengthy article already, so it was easy to send out a link.
I did one more thing.
I decided to ask her if I could interview Jeff Lawson in person the next time I was in San Francisco (which happened to be the following month during Kranky Geek). My expectation was to be ignored, or to just be declined.
But when she came back with an approval… I was clueless as to how to proceed.
We ended up deciding together on a recorded video interview.
I was given free rein as to what questions to ask, with the request to share them if possible before the interview. No restrictions were placed. I reached out to a few friends asking for their thoughts of good questions, added a few of mine and prepared for the interview.
Jeff gave me his full attention for the better part of an hour. I ended up using everything we recorded – not removing any of the answers.
The result? A longish interview of around 37 minutes. I’ve added the transcript below the interview as well, if you’re more of a textual person.
I’d like to thank Jeff and the team at Twilio that made this one happen.
Transcript
Tsahi Levent-Levi: Good morning, Jeff.
Jeff Lawson: Good morning.
Tsahi: Okay. I’d like to start with something, a question that I was very interested in. You have two kids, right?
Tsahi: Are they young?
Tsahi: How do you explain to them what you do every day?
Jeff: That’s a great question. It’s hard to explain to a young kid what Twilio is, but here’s what I’ve found is they use their phones … They don’t use their phones. They steal our phones, but the only thing we really let them do is communicate. If you think about it, that’s the very first thing that a kid wants to do. Call Grandma, and I’ll FaceTime Grandma from the phone. I explain that Twilio … Twilio is a technology. We let everybody who wants to be able to build things that communicate, we let them do that.
Tsahi: Okay. So that’s CPaaS in a way, right?
Jeff: CPaaS. Yeah. In an essence, we let companies call Grandma.
Tsahi: Yes. Okay. Letting companies call Grandma. I’ll tell that to my daughter.
Jeff: If Grandma is your customer and you need to engage with her.
Tsahi: Yes. When you started Twilio, like nine or 10 years ago, what was the original vision behind it? I guess it was slightly different than what it is today.
Jeff: It’s actually pretty similar to what it is today, I have to say. We started Twilio because I’m a software developer. I’ve been a developer for 20 years, and I also started multiple companies prior to Twilio. At each company, a common thread arose. At every single one of those companies, first of all, we were using the power of software to build a customer experience that was better than anything in the industry that had come before us.
I had started a variety of companies. An academic content company for college students online, StubHub, the online ticket exchange for secondhand tickets, and a brick and mortar retailer, of all things. The common thread among all of these was we were using software to build a great customer experience. We were using software to build amazing web applications, to represent the business, to enable us to touch customers. StubHub is the whole ability just to be able to connect folks together to buy and sell tickets. Software was key to that, and the key of software is agility. The ability to constantly iterate, constantly listen to your customers, put something out there in the world that you think solves a problem for them, get feedback and iterate. Sprint over sprint, every couple of weeks, you’re putting out something better, learning from your customers. That’s the super power of software. In every one of those companies, I had another problem. At some point or another, I had always needed to reach out and communicate with my customers. Just makes sense. Every time it happened, I said, “Well, that’s neat, but I’m a software developer. What do I know about making the phone ring?” That’s like magic. I have no idea how that works.
So I’d go to the industry, and I’d say, “How are we supposed to build this idea that we have?” We want to integrate with these systems. I have this idea for how I want to touch our customers, and the industry would say, “Oh, okay. Yeah, yeah. We think we can help you with that. First thing, let’s pull a bunch of copper wires from the carrier to your data center. Then we’re going to rack up a bunch of carrier gear in your data center, and then, let’s see. None of this was designed to do this idea you have, so we’re going to bring in this professional services army. They need to come integrate it, and they’re going to beat up all that equipment and get it to work and do exactly what you want. That will take about two million bucks, and it will take a couple of years to build. Sign here.”
Every time, I remember thinking, “Huh. First of all, millions of dollars for this one part of my customer experience? That’s a lot of money. I don’t think I have that, but if it’s not for the money, though, what’s much more important? The time.” Think about it. Two years before I get version one in front of my customers, before I get that prototype in front of my customers? Get any feedback whatsoever? That’s insane. To software people, to spend two years before you get anything in front of a customer? That’s crazy.
After having that experience at three companies in a row over the course of 10 years, I realized, “Huh. The ethos of communications is diametrically opposed to the ethos of software.” It kind of makes sense. If I was shooting satellites into the air and laying down millions of miles of wire everywhere, I would operate slowly and methodically, and that’s what I would do. That’s what the communications industry has done for 100 years. The thing is, how you and I, how individuals, how companies, get value out of these networks has shifted. It’s no longer about the physical networks. It’s about the software that’s running that defines how we get value out of that network, what we can do, what’s possible. That’s all about software.
So we started Twilio in 2008 to solve the problem of bringing communications out of its legacy in hardware and physical networks and into its future, which is software. Now, we do that with a powerful set of APIs that run in the cloud that let any software developer be able to start building that future.
Tsahi: I’d say you succeeded in that.
Jeff: Oh, well, thank you. We feel like we’ve just started.
Tsahi: Okay. In all of these years, what would be one of the most surprising use cases that you can say that you’ve seen or come in front was like, “Whoa. That’s cool. That’s neat”?
Jeff: There’s so many. We build the platform. We never know what people are going to build. In fact, one of the little Easter eggs in Twilio’s history is that in every press release when we launch a new product, my quote ends with the words, “We can’t wait to see what you build.” Every press release, year after year after year, that was always the line. Nobody ever caught on.
There’s so many use cases. There’s the obvious ones. The whole on demand economy. Things like Uber and Lyft and Airbnb, where Twilio is not only notifying you that your car is arriving, but also connecting drivers and riders together. That whole idea that I would use the internet and my phone to get a stranger to pull a car up and get in the car, I was always told to not get in strangers’ cars. But now, that’s what we do every day, and use cases around how communications, and Twilio has made that safe, made that convenient, made that easy. I never would have thought of those the day we launched Twilio, because really, mobile phones, their current incarnation, smart phones, were just getting started, and that whole idea of it; the applications of it were still completely unknown.
But then there’s the crazy use cases that I still can’t imagine. One of my favorite crazy use cases is there’s some researchers in the United States who study the migratory habits of bears.
Jeff: Right? It turns out that if you study the migratory habits of bears, you spend your days in a helicopter flying around looking for bears with binoculars. When you see a bear, you land your helicopter. You shoot the bear with a tranquilizer, then you climb up on the bear. You hope it’s tranquilized, and you put a collar on its neck that’s going to track its location. Then you run away very quickly, hopefully before the bear wakes up. Then a year later, you’re circling in your helicopter. You spot the bear again. You land. You shoot it with a tranquilizer again. You climb up on the bear again, hoping it’s actually tranquilized. You pull the data card out of the collar. You put a new one in, and you run away before the bear wakes up.
They’re like, “There’s got to be a better way. We would love to stop shooting bears with tranquilizers.” So they built a collar that had a 2G radio in it that collects all the data. When the bear wanders into an area with some cell service … They don’t exactly walk around in shopping malls. When it wanders in, it picks up coverage, and it texts all that data off the collar to a receptor they built on Twilio. That was, I thought, such a cool use case, because they’re using this technology, 2G radios. They’re low power. They’ve got maximum range, and it is texting the data off to build an app. You’re like, “Who would have thought of this?” We call this the internet of bears. I’m like, this is a use case I never would have imagined that there were people whose days were spent doing this. They found a use case for Twilio to solve this problem.
Here’s another crazy use case I love. There’s a researcher in the UK who built an app that allows you to call a phone number, and based on taking a recording of your voice, can detect with a very high degree of accuracy whether you’re likely to be predisposed to Parkinson’s disease.
Tsahi: I should use that one.
Jeff: You’ve done it?
Tsahi: No, but do you have the number?
Jeff: It’s a medical trial. They ran this trial. They found it to be an incredibly accurate way of assessing whether or not you are likely to develop Parkinson’s just by calling a phone number on Twilio and recording your voice for about 30 seconds. What’s amazing, as a researcher, he said trials like this would have usually cost millions of dollars to set up and run, because you would have needed all this sort of expertise and specialization. The doctor and his staff built it in a couple of weeks using Twilio for less than $1,000. They ran the whole trial, so it’s amazing.
Tsahi: Yes it is. I want to talk to you a little bit about the market itself and the different players in that market. The main ones that you would have thought that you would have lead or be part of that are the actual Telcos, the carriers, the ones that offer the phone service to the consumers. When you look at what they are doing in CPaaS and in APIs, they have services, but none of them are quite as successful as the other vendors out there. Why do you think that is?
Jeff: Well, I love the carriers. They have a very valuable product in that they are building out all the infrastructure that we all use every day to communicate in every way we can. I would say, though, that the carriers are not well situated to solve these software problems. Historically, carriers have not been software organizations. They’ve been very effective at ground operations, at getting infrastructure out in the field, repairing it, installing it. They’re very good at sales and marketing and servicing customers, but they historically have not been great software organizations, and that’s why I think a new type of company has been needed to come and solve this problem. A company that is a software company.
Twilio, half of our company is our software R&D group. That’s a different ethos. Building a world class software engineering organization, one that can ship and be agile and build resiliency with agility, which is what we call that process of having a high velocity of innovation but also achieving five nines of availability and things like that. That is a hard software problem, and so it takes a different kind of company to solve that.
Tsahi: Okay. What about all of the IaaS vendors? AWS, Google Cloud Platform, Microsoft Azure? They offer infrastructure. They give you compute and storage and databases today, and it’s like shouldn’t they also do communications? It’s the next step. Why do you think that they aren’t there yet or aren’t there today?
Jeff: I think two things. First is, these companies have been primarily focused in the communications for online consumers. A lot of them have a consumer play, whether it’s Microsoft with Skype or Google with Hangouts and things like that. Then on the infrastructure side, I think they’ve gone to the things that they do particularly well on the infrastructure to build, which is to say it’s compute and storage, the most common areas of software computation, which has been a huge meaty market to go after, which has meant that communications hasn’t been the focus of theirs.
I think companies like Twilio, we focus on communications all day every day. That’s what we wake up to do, and so I think we’re uniquely situated to be able to build out great services that target exactly the use cases of communications while the other platforms have been really focused more on compute and storage and the key areas of general purpose computation.
Tsahi: Okay. Another trend that I’ve seen in the last year or so is around UCaaS, Unified Communication as a Service. These companies that offer you desk phones, the video conferencing systems, the things that you need in order to run and operate your enterprise internally. Communication between people inside the enterprise. It seems that all or most of these vendors today start offering APIs. They bundle APIs on top of their service. When you go and talk to them, they usually say, “We’ve got APIs just like Twilio. When you use us, you don’t need to pay for blah, blah, blah, whatever.” It’s like they compare themselves and position themselves as direct competitors to Twilio. Where do you see these two markets going? UCaaS and CPaaS. Where do they meet?
Jeff: Yeah. It’s a very different thing. If you think about Unified Communications as a Service, you’ve got an application. When you build an application, you make all sorts of assumptions about how the world works. You have a domain. You’ve got models. You’ve got all the core components of unified communications. Then when you add APIs to it, which by the way, it makes a ton of sense. Every SaaS product has APIs. In fact, UCaaS has been a little late to that game, I actually believe. Most SaaS companies have had APIs for 10 years. But when you add APIs to a software application, those APIs bring with it all the assumptions that you made about that application. That’s both good for some things … If you want to extend the application in a certain way and you want APIs to do it, that’s what those kinds of APIs are good for.
Twilio is designed from the ground up to be a set of APIs, to be ultimate flexibility. To not make all those assumptions about the one application that the end user is going to use it for, but rather to say these APIs are designed like building blocks to be put together in any way you see fit. That’s why we can address a wide variety of use cases, whether it’s two-factor authentication, identity verification, call centers, anonymous communications, notifications, alerts, anything you can imagine, you can build with Twilio. That’s because we were created from the ground up for this recombination of these building blocks as opposed to taking something that’s already built and fixed in place and then saying, “We’re going to add APIs to it.” It’s just a different way of approaching the API problem. Both of them have merits, but I like our approach, because it gives us the ultimate flexibility to really enter any of these use cases in a really wide breadth of things.
Tsahi: Do you see a unified communication platform as a service; A vendor that does such a service deciding not to build the whole communication infrastructure on its own, but instead using someone like Twilio, a communication platform as a service, to build on top what it is that he is doing?
Jeff: Yeah. I believe that companies whose primary business is communications can and definitely should and would get competitive advantage by using a platform like Twilio to build upon. The reason why is this. It used to be when those UC companies started, their core competency was making the phone ring. Then they’d add some software functionality on top of it, sure, but the vast majority of what they worried about was how do I make the phone ring? The problem is Twilio has democratized that ability.
Every developer … Every mobile developer, every web developer … now has the ability to make the phone ring in 100 countries around the world where we have phone numbers and touch every phone on the planet … Mobile, landline, et cetera … with an API that is reliable, that is scalable, that is global. Now, you’ve got developers out there who get to focus solely on customer experience, features, integration, UX, mobile. Build the things customers really care about and bring this core competency of focusing on user experience that software developers do so well. A one or two developer team can actually create a customer experience that is better than some large company that is focused purely on Unified Communications as a Service.
The existing UCaaS vendors, they would be wise to build on top of the same platform that any developer in the world can come and start to compete with them on. If they don’t, those independent software developers, they can actually start and build companies that are really compelling competitors, because they don’t have to focus on the low level bits. They’re focused on the things customers really care about, which is features, functionality, and the user experience that matters.
We have seen this play out, for example, in the call center market. We’ve seen … At our first conference back in 2011, Tiago was the founder of the company TalkDesk. One developer. Do you know Tiago?
Jeff: Back in 2011, Tiago was the founder of TalkDesk. Single developer. He was a web developer. He knew web development really well and focused on building a product that he thought would be really compelling. Because of Twilio, he didn’t have to worry about any of the underlying infrastructure. Now, TalkDesk is hundreds of employees, has raised a lot of venture capital, has Fortune 1000 companies running call centers on them all because he was able to focus on the things customers really care about, is the features and functionality of the application. He did not have to worry about making the phone ring. That’s a really powerful competitive dynamic, as new players come in fundamentally uplevelled, because they’re building on platforms.
Tsahi: When I look at the feature set that you have at Twilio, the different types of functions that you offer, at the end of the day, that is something that is always commented when people talk about Twilio and they’re trying to attack Twilio as a company. They say, “All of the money comes at the end of the day from SMS and voice. That’s what they do, and at the end of the day, that’s too competitive as a market today.” If you actually look and search all of the CPaaS vendors, all of the direct competitors that you have, almost all of them have the same type of characteristics. They make most of their revenue today from SMS and voice and a lot less from the IP based services that they have, from the new things that come out. How do you as the leader in the CPaaS space deal with that and meet that challenge?
Jeff: I think there’s two things. First of all, most mature products for any company are generally going to be the largest contributors of revenue. Especially with developer products. We have a very long commitment to developers, and that takes a little longer than other products to adopt, because you launch a product, then developers have to see that product, understand it, and build their product, and then bring their product to market. You’ve got a little bit of an extra delay as a developer-focused company before products become commercially viable.
That is a long commitment, and that, quite frankly, is why a lot of companies don’t have the stomach to serve developers, because it’s a long commitment to developers to get those products to grow and be large. But we have that commitment. The way we look at developer products is that they have a slower start but then a fantastic ramp up capability. So I wouldn’t worry about the short term. We’re planning for the long term. In the long term, it is blatantly obvious that the software APIs and software communications are going to win. We’re there with all the products that developers need to build it. We see developers building amazing things using our software products, our video SDKs, Twilio Clients for Voice Over IP, the rest of our software products.
The other thing I’ll point out is that our software products often drive usage and adoption of our voice and SMS products as well. They don’t exist in a vacuum. When a customer builds a call center using Twilio’s TaskRouter product, which is a globally scalable cloud-based ACD … When you use TaskRouter to build a call center, guess what? It drives more voice revenue. When you use Twilio Client as the basis of your call center, it drives more PSTN revenue, generally, as well, because you’ve got an inbound phone number.
It’s interesting is that these new technologies, software-based communications, are actual drivers of competitive advantage for our customers who adopt them, whereas if you think about the customers of ours who’ve adopted Twilio Client to allow any computer with a web browser to be able to now become a call center by just plugging in a headset and using our Twilio Client product that’s powered by WebRTC, that has leveled the playing field because you no longer have to manufacture or sell hardware phones or PBXs in a closet. These new software technologies have been huge drivers of a new set of players to arise in this industry who previously wouldn’t have been able to do it. That’s creating a new market dynamic here of new players entering the field and new products entering the field that wouldn’t have existed 10 years ago.
That’s really exciting, and it’s creating a huge market shift, but it also draws more usage of the PSTN right along with it. The same thing you can say for our Twilio Chat product. The same thing you can say for a number of our products, Twilio Studio. So all of these products together, you usually don’t use them in a vacuum. You use them together with other products. That’s part of the nature of APIs. But having them all together and being able to plug them in together to do these interesting things is fundamentally changing the landscape of the companies and the products that are out there that are really pushing the ball forward on communications.
Tsahi: I think I saw the first thing that you said when I worked at RADVISION years ago, but in the opposite sense. At RADVISION, you had two business units. One of them was a technology business unit. We sold SDKs to others to build their own products. The second business unit dealt with selling videoconferencing equipment. Whenever there was a downturn in the company because of the market, the CEO came out and said, “We have this business unit that sells videoconferencing. It’s now slow because of the market. Then the TBU, the technology business unit, we’re still going strong because we see that this will go upstream three years from now when developers actually launch it.”
There, the business model was flipped. We usually licensed the software in advance so developers had to invest when they started, and not when they saw the revenue. What you are saying is that today, in order to be in the developer space, you don’t make the money up front from developers that build stuff in the future. You wait and you grow with them. That waiting for that growth is what makes a company big at the end, is being patient.
Jeff: Exactly right. It’s the combination of our usage-based revenue model that tightly aligns us with our customer’s success. This is key. When we think about what is the driver of innovation, what makes developers be successful in building their next idea, it is experimentation. Experimentation is the prerequisite to innovation. Everything that we do is about lowering the barriers to a developer getting started and running as many experiments as they can for an idea that they want to try out. That’s why we have such a low upfront. You get started … Every developer who has used Twilio started by spending their first penny to make that first phone call, send that first text message, fire up that first video session.
You never know which one of these ideas that developers are building is going to be the next great big idea. Our job is to make it so developers can try as many of these ideas and run as many experiments as they can until they find product market fit with the thing that they’re building. That’s why it’s a long commitment to developers, because you need to give them the runway. You need to have that patience, but you also need to have that attitude that it’s not about, “Hey, a developer came to our door. I’m here to get all the money from you today.” You’re like, “No. We’ll do well if you do well. I’m just here to make sure you do well. I’m here to do everything I can to make you successful in building your ideas.” Ultimately, that’s how I’m going to be successful, but it’s a long commitment.
We like to say, though, it is a compounding interest business, essentially. You invest in developers, and they build. With the usage-based model, as they grow, as they’re successful, that, then, turns into our success. For us, that means customer success is the very first thing. It’s the prerequisite to our own success. Everyone at Twilio is always focused on customer success first.
Tsahi: I’ve been to two Twilio SIGNAL events, both very interesting events. I really loved them. What I noticed is that you know exactly what the product does. When there is a product launch, you play with it. You do it on stage. You use it. You’re a developer yourself. How can you do that and still be a CEO of more than 900 employees?
Jeff: I think as an API developer-first company, I have to do that. That’s how I can make sure that we’re building the right things, and that’s how I can make sure I’m close to our customers and I’m close to our products. I love playing around with the new Twilio products. I am the first person they give access to when we build stuff, or at least, I hope I am, because that’s how I love playing around. I just dive in there. I read the docs. I started building stuff. That’s really exciting.
Recently, I was building something for Halloween with my kids with some Arduinos. I love building internal things at Twilio. A few years ago, I built our goal-setting software that we were using at the time. I just dove in. They don’t let me touch production code anymore, which is probably a good thing, but I just love being a developer. Even though I’m a CEO, I love continuing to invest in that part of my life. Obviously, I don’t get to do it as much as I used to, but it would make me very sad if I had to stop. I’ve just arranged my schedule and arranged my life so that I always make sure I’ve got some time to stay current on new stuff, both inside Twilio and outside Twilio and build. I’ve always thought that just building, just having a project idea in mind and committing yourself to building it and picking even some new technologies you’ve never used before, that’s a great way to keep learning and keep building and keeping your skills up.
Tsahi: I can easily relate to that. Talking about products and what is it you do, the last year it seems that you have somewhat shifted. If up until now, you could have said that when Twilio launches a new product or introduces a new product, that would be yet another building block that you can use to do some kind of communication. A new communication service that you couldn’t build before. It seems that you’ve started moving upstream. There is the Engagement Cloud with Notify and Authy. Then there is even Twilio Studio that goes for me even one level above that. Why did you make that move? Why the shift?
Jeff: Well, we don’t see it as a shift, because to us, it’s always about having the right API for a developer to get the job done. As a platform, you start off with a set of building blocks that provide maximum flexibility, because you don’t necessarily know what developers are going to want to build. As you learn from developers what are the most common things that they want to get done, but also what was really hard? What did they think would be easy to build and it turned out was very hard?
We view our job as making our customers successful. When we see the things that we can do to make their lives easier, help them get the job done faster or not have to reinvent the wheel because they’re trying to figure out, “Hey, how do I figure out how to distribute calls?” and I see every other customer trying to figure that out, too, as they’re building a call center, it becomes obvious. You say, “Wow. My job is to make my customer’s life easier and make them more successful. Why don’t I build a product that does that thing?” So you end up with Twilio TaskRouter, for example.
In the case of Studio, we view it as making the developer’s job even easier and allowing more people to participate in the development and the maintenance of these applications they’re building. Why? Because we saw developers build an application, and certain parts of it are really exciting, like how do I figure out the exact experience I want? How do I integrate all this stuff? Then parts of it are really boring and become a tax to the developer and to the whole organization, such as when folks are saying, “Hey.” Product manager says, “Hey, can we update the text? We’re going to run an A/B test. Can you try 50% on this and 50% on that? Can you change the SMS text? Can you change how the call center greets the people coming in?”
The developers don’t see that as exciting. They see that as, “Oh, it’s continual maintenance. It keeps pulling story points off of me every week, because I’ve got to keep maintaining the thing.” We said, “Isn’t there a way that we can allow the developer to do the really important parts, the parts that are about integrating systems and things like that, and then take the other parts that are a little more standard and make it so not only the developer doesn’t have to write it … They can just drag and drop and build it easily … but they can also hand some of that off to other people in the organization.” Maybe the marketing people have ideas about how they want the content to work. Maybe the ops people want to change how the IVR call flow works. There’s all sorts of different people who are invested in these communications applications, because customer engagement touches so many parts of the company.
If we can offload a bunch of that work from the developer, that ultimately will accelerate our customer’s roadmap and make them more successful. Again, you go back. That’s our goal. By the way, when we make our customer successful, that makes us successful, so we’re all aligned in this. Studio is a great way to do that. So we keep listening to customers, hearing the things that they love about the API approach, the flexibility it gives them, the fact that they can now build things that they were never able to do in the past because pre-built software applications weren’t flexible enough. But then we say, “Great. How do I make it so that you can get that flexibility faster and easier than ever before?” You do that by listening to your customers and solving the most common pain points.
Tsahi: I really love Studio. I’ve played with it. It’s a great tool. Really.
Tsahi: How do you make the definition of it? Going … Building a UI tool, an IDE that can mix and match stuff and do this logic is never easy. I’ve used tools before that are similar. Some of them are good. Most of them are not that good. How did you nail that experience in a way that, at least for me, was just spot on?
Jeff: I think there have been fits and starts in the history of computation around visual designing of programming. Sometimes they work. Sometimes they don’t. To us, there were two things that were involved in that. Number one is working with a lot of customers and a lot of users. We actually started with paper and sticky notes and starting to design with them how they would want to design something like an IVR or an SMS bot or a chat bot, things like that. We actually did it with sticky notes before we wrote a single line of code. To us, that was the equivalent of for APIs, it’s writing the API docs first, putting them in front of a user and saying, “Hey, is this the API you would want?” We do that before we build the product. We did the same. We applied the same logic to building a user interface for drag and drop development.
Then the second thing was I think we constrained it down a bit to say, “This isn’t about general purpose computation,” because you get in all sorts of hairy things. We’re focused on the customer engagement. If we scope it down and we say, “We want to make the very best visual designer for Twilio for customer engagement. What are the things it should encompass?” I think that the key of building both power and simplicity is really understanding your domain that your customers are operating in and then designing the perfect thing for that domain.
I think that obviously, we’re just at the very beginning. We launched it just over a month ago, and so we’re continuing to learn from customers and get that feedback, but that’s our approach that I think has helped us to build something that customers find both powerful but also easy to adopt and easy to use. That comes from the same approach we’ve used to design APIs that I think customers would articulate in the same way. They’re powerful and easy to use.
Tsahi: What’s the feedback that you get about the engagement cloud? It’s out there for what, half a year now?
Jeff: Mm-hmm (affirmative). Look, when we talk to customers and we take a step back and we say, “What is Twilio all about? Why is Twilio important to you, ING Bank? Why is Twilio important to you, Morgan Stanley bank?” Some of these very large organizations, so obviously have a lot of options and a lot of legacy systems they could have kept using. The answer we get is, first of all, flexibility. With Twilio, we get this unprecedented flexibility.
When you think about the importance of customer engagement to a company, almost nothing is more important. When I talk to a CEO of a bank, and you ask them, “What’s important?” they are so concerned about, “How can I maintain my relationship with my customer?” That’s the biggest fear that C-level executives have. That is done with customer engagement. How do you keep up? If you think about the problem space here, it’s insane.
As consumers, the technology that we use has advanced incredibly rapidly in the last five to 10 years. We’ve got a wide variety of new applications that we use. We use video. I use video almost daily. I would have thought that was crazy 10 years ago. I would have thought that was stupid, and now here we are. We use video on a daily basis. We’ve got great chat applications. We’ve got apps in our chat and chat in our apps. It’s amazing. Yet, for companies to communicate to their customers, it is incredibly broken. Why? Because companies can’t keep up with the pace at which our expectations are changing for how communications is going to work and how great of an experience it’s going to be.
We’re still stuck in the days where you essentially call an IVR of a company and they don’t know who you are. You enter your 40-digit account number and then you talk to an agent. They’re still asking your name five times. You’re like, if I had that experience with a friend, if I called my friend and they asked me my name five times during the call, I would think there was something medically wrong with them. Yet when you call a company, that’s the experience you expect. Nothing is more broken about communications than how companies talk to their customers. We want to fix that.
When you talk to executives at companies and you say, “What keeps you up at night?” It’s, “Yeah. I’m worried about losing my connection to my customer. Being disintermediated by all these other technologies that are coming out. I need to keep the connection in order to stay top of mind and stay relevant to my customer.” When I think about how that works, it’s like, “Well, you’ve got rapidly proliferating ways in which you need to reach your customer.”
10, 15 years ago, talking to your customer generally meant you had a phone number and customers could call it. Now, you’ve got not just phone calls. You’ve got text messaging, you’ve got chat, you’ve got mobile apps with push notifications. You’ve got WeChat, WhatsApp, Facebook Messenger. You’ve got so many different … Now Alexa, Google Home, personal assistants. You have so many ways and very finite development resources to keep up with this changing world. By the way, it’s not just the ways in which you need to communicate that is proliferating. Think about all the departments in a company that need to actually keep up. You’ve got sales, marketing, customer support, onboarding, product teams. Every part of the company is trying to keep up with every part of this changing technology landscape. It is an unsolvable problem for most companies.
That’s what the engagement cloud is here to solve. We want to provide one system that allows companies to keep building, keep iterating, but to reduce the barriers, reduce the time to do that and give one tool to all these different teams who need to touch customers, to be able to keep up with this rapidly changing landscape and constantly iterating on those customer experiences with easy to use tools and infrastructure that they don’t have to worry about scaling. They don’t have to worry about reliability. They don’t have to worry about onboarding new platforms. We’re going to do that for them as the world is changing. They get all that stuff from us, and so they focus on, “Okay, what’s my special sauce? What’s the thing that makes my brand and my company engaging to my customer?” I’m going to focus on that last bit, and we’re going to iterate on that constantly, and I’m going to empower all these different teams inside the company to be able to have that at their fingertips. That’s what the engagement cloud vision is all about.
Tsahi: Thank you for your time, Jeff.
Jeff: Thank you, Tsahi.
Tsahi: I thoroughly enjoyed it.
The post Jeff Lawson on the Past, Present and Future of Programmable Communications appeared first on BlogGeek.me.
Vidyo has made several announcements in the past couple of weeks. Time to see why the time is right for RTC across markets.
It has been a busy month for Vidyo. It has made two interesting announcements:
- The introduction of VP9 into its products
- Streamlining its product line
Vidyo has been known for their video routing technologies for many years. Well before WebRTC came into the ring. It is great to see how far they have come in merging the two, along with how they are trying to fit their business model to the realities of WebRTC.
Vidyo, WebRTC, VP9 and SVC
How do you compete in a world where WebRTC is becoming the dominant media engine? Especially when the baseline implementation is dictated by what you get by default in the browser?
Vidyo has always had its own proprietary codec implementations. Ones that are optimized for SVC – Scalable Video Coding. Alex Eleftheriadis guest posted here last year with an explanation of SVC. To simplify, SVC gives two big advantages:
- Better error resiliency on poor network conditions
- Better support for multiparty and broadcast interactions
In many cases, you can get these things done without SVC and the end result would be good enough. But there are times when this extra kick to quality and optimization of how the network gets used makes all the difference.
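To make the multiparty advantage a bit more concrete, here is a minimal sketch of how a media server could pick which SVC layers to forward to each receiver from a single scalable stream. The layer ladder, its bitrates and the `pickLayer` helper are assumptions made up for illustration; they are not Vidyo's or Chrome's actual values or APIs.

```typescript
// One scalable stream, described as cumulative layers:
// receiving a layer means receiving it plus every layer below it.
interface SvcLayer {
  name: string;
  cumulativeKbps: number; // bitrate needed for this layer and all lower ones
}

// Hypothetical ladder, sorted from lowest to highest.
// Real bitrates depend on the codec and the encoder settings.
const layers: SvcLayer[] = [
  { name: "base (180p, 15fps)", cumulativeKbps: 150 },
  { name: "mid (360p, 30fps)", cumulativeKbps: 500 },
  { name: "top (720p, 30fps)", cumulativeKbps: 1500 },
];

// Pick the highest layer that fits the receiver's estimated downlink.
// Falls back to the base layer when even that does not fit.
function pickLayer(ladder: SvcLayer[], availableKbps: number): SvcLayer {
  let chosen = ladder[0];
  for (const layer of ladder) {
    if (layer.cumulativeKbps <= availableKbps) {
      chosen = layer;
    }
  }
  return chosen;
}

// Three receivers with different downlinks all get served from the same
// encoded stream; the sender never has to encode more than once.
const forMobile = pickLayer(layers, 200);
const forLaptop = pickLayer(layers, 600);
const forDesktop = pickLayer(layers, 5000);
```

The point of the sketch is that the server only drops layers, it never transcodes, which is what keeps this approach cheap for multiparty and broadcast sessions.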
When it comes to current browser implementations of WebRTC, the only video codec that has any kind of SVC support is VP9 and that takes place in Chrome. To take advantage of SVC, there are only two routes a company can take:
- Rely on the browser implementation and exposure of VP9/SVC features, and then implement these capabilities in its application
- Build its own XXX/SVC implementation into a non-browser application
Option (1) is great, but it assumes that:
- Browsers prioritize VP9/SVC over other features. The challenge here is that things like aligning with the upcoming WebRTC 1.0 spec are most likely a lot more important
- VP9/SVC will be implemented soon, and controlling its SVC capabilities will be exposed to the developers via JS APIs or additional SDP parameters
- The existence of media servers that support SVC and optimize and fine-tune well for it
Reality is that on Chrome, the VP9 implementation in WebRTC supports SVC on the decoder side, but it doesn’t yet support it on the encoder side.
Vidyo took the middle ground here, trying to enjoy both worlds: It always had its own SVC implementation in H.264 but allowed using WebRTC. Now, with its VP9/SVC implementation, it gets the freedom to improve video quality of its sessions in ways that others can’t.
If you use Vidyo.io today (and its other products in the near future), then Vidyo will try and prioritize the use of VP9 over other video codecs. And if some of the users in the session are making use of Vidyo’s SDKs instead of the native browser WebRTC implementation (i.e., joining from mobile or a desktop app), they will encode VP9 with SVC capabilities, and Chrome will be able to decode the bitstream, though the browser’s own encoded bitstream won’t be using SVC (at least not for now).
This places Vidyo ahead of the pack in SVC support that plays well with WebRTC.
Vidyo’s Product Line
Here’s the gist of the new product line view from Vidyo:
Vidyo has taken the approach of offering a single technical infrastructure to host and run all of its products. This is the right move forward and an embrace of the cloud. In a way, Vidyo is continuing its shift from on premise deployments towards a Vidyo hosted and managed cloud platform.
Vidyo.io can be defined as CPaaS, a Communication Platform as a Service; while its VidyoCloud can be defined as UCaaS, a Unified Communication Platform as a Service.
Vidyo started life in the UC business, moving to the cloud and then adding an API platform. In many other cases, UC / UCaaS vendors take the approach of adding an API on top of their UCaaS product and then just calling it CPaaS. Vidyo decided on “separating” the two which feels to me as the better approach. It casts a wider net over the potential target market and the types of use cases that Vidyo can now cater for.
To this product line, Vidyo has added earlier this year VidyoEngage, its answer to video based contact centers.
The end result? Vidyo can now be used in the 3 biggest domains for visual communications:
- Unified Communications, with its VidyoCloud offering; providing a complete video communications platform
- Contact Centers, with VidyoEngage; providing a higher level abstraction of the call center model to its customers
- All the rest, through its Vidyo.io platform for developers
You can use Vidyo.io to build a UC or a CC application if that’s your need, or you can just pick up VidyoCloud or VidyoEngage to get there.
What’s Next?
The challenge for Vidyo will be in competing on 3 different fronts at the same time, and the threat of losing focus. I am guessing this is one of the reasons for this streamlining – it is meant to simplify its internal infrastructure that is used in these 3 products on the technical level.
Managing these separate businesses and keeping abreast of all 3 markets will be hard, but Vidyo is off to a good start here.
When it comes to Vidyo.io, the addition of VP9/SVC support positions Vidyo as the technology leader in its space with the ability to offer the best media quality. Its competitors will require
Jitsi is getting a boost in its development.
When a developer-focused company gets acquired, it is time to start worrying.
Was the acquisition due to the technology, the customers or the business model?
Will the product continue to grow and flourish in the new regime?
Are the current signed agreements going to be renewed?
For open source, there are even more questions.
How will the community that was created around the open source project be treated?
Will existing business models around support, customization and dual licensing be maintained or will they be killed?
Two and a half years ago or so we had 3 popular open source media servers for WebRTC: Janus, Jitsi and Kurento.
The progress made around Kurento since its acquisition was minimal at best. My guess is that Twilio is just too busy in getting its own multiparty video ready for GA to focus on the Kurento open source project itself. It also hasn’t quite acquired everything that is Kurento – parts of it were left for the community and the original parent company Naevatec. The time that has passed is making a lot of the Kurento adopters frustrated and in search of different alternatives.
Best time to join my WebRTC Course? Today. Office hours are starting next week, and there’s a great bonus ebook of how meet.jit.si built its scalable infrastructure.
So time to ask –
How did Jitsi fare since its acquisition?
And it seems to be getting a lot more interesting lately.
In the past 4 months, I’ve been adding almost on a weekly basis a post about Jitsi into the WebRTC Weekly. The team there has been continuously churning out new features into the project.
Here’s what was announced on the Jitsi blog since June when it comes to new features:
- New Layouts in Jitsi Meet
- Control the Volume for Every Meet Participant
- Speaker Times in Jitsi Meet
- Telephony Support on meet.jit.si
There’s a mix of announcements here. They range from the addition of UX features to some deep optimizations of the media server itself. And part of it is due to GSoC, Google Summer of Code, a project started by Google some years ago where university students can join open source projects as interns. Jitsi has been part of this project for some time now.
UX Improvements
In a way, these are the least interesting features when it comes to a media server, but the ones that make it easier to use.
What Jitsi did in this round was tweak the UI to be a bit more modern and easier to use. For video layouts, there was a decision to better cater for 1:1 scenarios and to move video thumbnails from the bottom of the page to the right side of the page. This is also what Google decided to do once they shifted away from Hangouts to Meet. This makes for a more modern approach that sits well with the wider displays we have in recent years.
An audio only button was added to the UI. I am assuming it is just a shortcut to muting incoming and outgoing video. Having this UI element there makes it easier for users to operate (and easier for adopters of the Jitsi Videobridge to customize).
The interesting addition to me is the speaker times one.
I am intrigued in this case to know how easy it would be for an application to get that information from the Jitsi Videobridge – is this supported via the signaling offered by Jitsi towards the web client or is it also available as a backend-to-backend REST API? I can see this being used later in various ways, assuming the API is detailed enough and easy to use.
Integrations
A WebRTC media server is but a part of what you need to run a full application. While central and important, there are other aspects to it. In recent months, Jitsi have added a few additional integrations, making it easier to use and connect to.
Three such integration points were announced:
1. Mobile SDK
Jitsi had mobile applications for quite some time. While nice, it is different than having a mobile SDK.
Something I’ve been telling media server vendors for a few years now, is that they should offer a mobile SDK as part of their media server. In WebRTC, it is an important part of their offering and one that is hard to ignore.
In the case of Jitsi, users had to use the mobile application as a reference and modify it to their heart’s content. The problem with this approach starts when you need to maintain the codebase in the long run. When a new version of the mobile app comes out – how do you know which parts are critical to upgrade (without them the app will break with the new Jitsi Videobridge) and which ones are just UI fixes that you can ignore or just pass since you’ve created your own UI experience already?
This is exactly why an SDK is such an important aspect of the solution:
With a mobile SDK, application developers can now just use the Jitsi Meet mobile application as a reference or even write something from scratch on top of the mobile SDK itself. Each is independently updated and maintained, making it easier to upgrade to newer releases.
2. Speech to text
Translation and NLP seem all the rage these days.
The way you get these things connected to WebRTC varies, but follows a similar approach for media servers:
You somehow collect the audio streams on the media server, mix and process them to the format supported by a 3rd party speech-to-text engine (Google Cloud speech-to-text seems quite popular these days), and once you get the resulting text, you do something with it.
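The "mix" step in that flow can be sketched roughly as follows. This is only an illustration under some loud assumptions: every stream is already decoded to 16-bit PCM at the same sample rate, frames are aligned in time, and `mixPcm16` is a made-up helper name, not something taken from Jitsi. Sending the mixed audio to a speech-to-text engine is left out, since that part depends entirely on the engine you pick.

```typescript
// Mix several 16-bit PCM frames into a single frame.
// Assumes all frames share the same sample rate and start at the same time;
// no resampling or jitter handling is done here.
function mixPcm16(frames: Int16Array[]): Int16Array {
  if (frames.length === 0) {
    return new Int16Array(0);
  }
  // Only mix the part that every frame covers.
  const length = Math.min(...frames.map((frame) => frame.length));
  const mixed = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (const frame of frames) {
      sum += frame[i];
    }
    // Clamp to the 16-bit range so loud passages clip instead of wrapping around.
    mixed[i] = Math.max(-32768, Math.min(32767, sum));
  }
  return mixed;
}

// Two participants talking at once: samples are summed, and the loud one is clipped.
const participantA = new Int16Array([1000, 30000, -20000]);
const participantB = new Int16Array([2000, 30000, -20000]);
const mixedFrame = mixPcm16([participantA, participantB]);
```

A real deployment might prefer to transcribe each participant separately instead of mixing, so the resulting text can be attributed to a speaker.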
In the case of Jitsi, this was a GSoC project. Information about its current status can be found on the developer’s website – Nik Vaessen.
This probably requires some more improvements and polish, but offers a good starting point for developers.
I’d wager that in GSoC 2018, the Jitsi team is planning on adding translation and text-to-speech to it.
3. Telephony
Telephony was already available in Jitsi before. It is implemented via a Jigasi server (JItsi GAteway to SIP). Now Atlassian is eating its own dog food, not only with its internal HipChat service but also in its free meet.jit.si showcase service.
In the case of meet.jit.si, the length of calls was limited to 2 minutes, which is enough for hunting down meeting participants who haven’t joined the session.
This serves two purposes:
- Show that Jigasi works and showcase its use
- Work out the kinks of getting this into the UX
At the heart of Jitsi is the media server itself. This is what developers aim for to begin with and the additions there are quite interesting.
The first one is that Jitsi now supports peer to peer media traversal for 1:1 sessions – in effect – no media server. The reasoning being that many of the calls end up being 1:1 and it is far easier and cost effective to share media directly between the participants.
In the past, supporting such a thing with Jitsi required running a separate signaling mechanism for 1:1 sessions and then once the need arose to grow, shift and renegotiate everything in front of Jitsi. It was tedious at best.
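The decision itself is simple; the tedious part is the renegotiation. A rough sketch of the decision logic, assuming a made-up `SessionRouter` class that only tracks who is in the session and tells you when the media path has to change (this is not Jitsi's actual code, and switching back to peer to peer when people leave is my own assumption):

```typescript
type MediaPath = "p2p" | "videobridge";

// Direct media only makes sense for two participants (or fewer);
// anything larger goes through the media server.
function chooseMediaPath(participantCount: number): MediaPath {
  return participantCount <= 2 ? "p2p" : "videobridge";
}

// Tracks session membership and reports when the media path must change.
class SessionRouter {
  private participants = new Set<string>();
  private path: MediaPath = "p2p";

  // Returns the new path when a renegotiation is needed, or null otherwise.
  join(id: string): MediaPath | null {
    this.participants.add(id);
    return this.update();
  }

  // Same contract as join(): null means the current path is still fine.
  leave(id: string): MediaPath | null {
    this.participants.delete(id);
    return this.update();
  }

  currentPath(): MediaPath {
    return this.path;
  }

  private update(): MediaPath | null {
    const next = chooseMediaPath(this.participants.size);
    if (next === this.path) {
      return null;
    }
    this.path = next;
    return next;
  }
}

// A 1:1 call stays peer to peer until a third participant joins.
const router = new SessionRouter();
const afterFirst = router.join("alice");
const afterSecond = router.join("bob");
const afterThird = router.join("carol");
const afterLeave = router.leave("carol");
```

Whenever `join` or `leave` returns a non-null value, the application would have to renegotiate media for everyone in the session, which is the part that used to require separate signaling.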
The other work effort is way more interesting.
Bandwidth estimation is nasty. Network conditions are varying and dynamic. You can start a session with 2Mbps and have it considerably drop throughout the session, coming back up again and changing characteristics.
To get that right, WebRTC (and any other VoIP alternative) needs to use bandwidth estimation. This is a process where the device tries to understand how much bandwidth is available to it at any given point in time. The algorithm can be naive, smart, complex, whatever. And a lot of the perceived quality of a call would rely on the quality of the algorithm used for bandwidth estimation.
WebRTC has its own built in bandwidth estimation mechanism. It works. But you need your own algorithm in a media server. Jitsi has its algorithm, and it is a work in progress.
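To show what the naive end of that spectrum might look like, here is a small loss-based estimator sketch. The thresholds, step sizes and the `nextEstimateKbps` name are assumptions picked for illustration; this is neither WebRTC's built in estimator nor Jitsi's algorithm.

```typescript
// Configuration for a naive loss-based estimator.
// All values are illustrative assumptions, not taken from WebRTC or Jitsi.
interface EstimatorConfig {
  minKbps: number;  // never estimate below this
  maxKbps: number;  // never estimate above this
  lowLoss: number;  // below this loss fraction, probe for more bandwidth
  highLoss: number; // above this loss fraction, back off
}

// Given the current estimate and the packet loss fraction (0..1) reported
// for the last interval, return the estimate to use for the next interval.
function nextEstimateKbps(
  currentKbps: number,
  lossFraction: number,
  cfg: EstimatorConfig
): number {
  let next = currentKbps;
  if (lossFraction > cfg.highLoss) {
    // Heavy loss: back off in proportion to how much was lost.
    next = currentKbps * (1 - 0.5 * lossFraction);
  } else if (lossFraction < cfg.lowLoss) {
    // Almost no loss: carefully probe upwards by 5%.
    next = currentKbps * 1.05;
  }
  // In between the two thresholds the estimate is left unchanged.
  return Math.min(cfg.maxKbps, Math.max(cfg.minKbps, next));
}

const cfg: EstimatorConfig = { minKbps: 100, maxKbps: 2000, lowLoss: 0.02, highLoss: 0.1 };

const afterHeavyLoss = nextEstimateKbps(1000, 0.2, cfg);   // backs off
const afterCleanInterval = nextEstimateKbps(1000, 0, cfg); // probes upwards
const afterMildLoss = nextEstimateKbps(1000, 0.05, cfg);   // holds steady
```

An estimator like this only reacts to loss after the fact, which is one reason smarter approaches also look at delay and at what to do with the estimate once they have it.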
The Jitsi team are now taking it to the next level, trying to not only understand availability of bandwidth but also what the best course of action should be – it is trying to discern if it is better to reduce bitrate or add forward error correction instead.
It also does that with the coolest set of tech tools available to us today – TensorFlow and Machine Learning.
Here’s what Emil Ivov shared during our Kranky Geek event last month:
Where to Next?
Looking for an open source alternative for your media server?
The most popular approaches out there for you are Janus and Jitsi.
Which one to pick out of the two seems to be based on personal taste more than anything else.
Kranky Geek 2017 has been a roller coaster event for me. Time to discuss what I learned about WebRTC last week.
Yap. We had a full room.
Well… More like 2 full rooms.
When talking to Lawrence some time in the afternoon, he joked with me, saying that apparently we have a problem – the overflow room is overflowing.
The best problem an event organizer could ever ask for.
If you are looking for the event videos, then they are already on YouTube.
I want to share some of my thoughts prior to and during the event. And if possible, try and shed some light on where we’re headed from here.
Want to keep abreast of the WebRTC ecosystem? Join the WebRTC Weekly
Challenges Abound
Putting up an event is a stressful undertaking. There are a lot of aspects that need to be covered with this constant worry that you’ll end up forgetting something or that something will screw you over. Both are guaranteed to happen no matter how much planning and effort you put into it.
This time, our challenges started early on. It was somewhat harder than usual to decide how to price the event to make it worthwhile doing. Kranky Geek events are expensive to run. From the beginning, we’ve aimed for events that are free to attend (I consider a $10 admission fee that gets donated as a free to attend event). This left us relying on sponsors to cover our expenses and make some revenue out of it.
Kranky Geek is all about quality content. High quality content. Top notch. The best you can find.
Which means that we select the topics we want. We then hunt for the speakers that fit into that. And we work with our speakers to make them shine.
This process doesn’t always work with sponsors… it is sometimes hard to explain how we operate and why. And at times, sponsors can focus on hard selling their warez, which doesn’t fit into the Kranky Geek spirit (and definitely not to our audience).
This time, it took us slightly longer than usual to get the sponsors onboard and to be certain that we can pull off the event.
We don’t always agree, but somehow we fit well together, each one covering the other one’s shortcomings. We make a good team for getting these events done. I hope
Why am I sharing all this?
To set the stage for what comes next for Kranky Geek, but also to explain the amount of work, effort, time, stress, pain and love that has been put into the Kranky Geek events in general and to this one in particular.
It hasn’t been all happy, but I am proud of the result and happy that we did this.
We Had a Fire Drill!
During the day, we’ve had our share of technical challenges.
The projectors in the main room didn’t work at the beginning (that was before we started the day), and then a few other issues cropped up on us.
Doing this event in Google’s San Francisco office meant we had the best A/V team in the world on site to help us. The crew Google is working with there is top notch. The best I worked with. They made the problems seem easy to solve.
We had this to deal with…
— Lawrence Byrd (@LawrenceByrd) October 27, 2017
A week before the event we were told we would have a fire drill in the building on the day of the event. The time kept moving around, settling at 2pm. We scheduled our breaks and sessions around it, with a huge worry of having people leave once the fire drill started.
(that’s Kranky going down the staircase during the drill)
We decided to embrace the fire drill and tried to celebrate it with our audience, and I hope we succeeded. Back from the fire drill, we had almost everyone back.
We should probably make fire drills an integral part of Kranky Geek events.
Time to stop rambling.
The Event Recordings
The recordings are available online.
You can find them here.
We’ve had to reorder the sessions from our original agenda due to constraints we had with some of our speakers – late arrivals and early exits.
So I’ve reordered the sessions here. What follows are the 13 sessions we had, in the original order we wanted (not that it really mattered).
I added some of my commentary on what I liked and learned in each of the sessions.
Kranky Geek Team
Nothing to say here really, besides the fact that I envy Chad’s ability to create slides and present them.
Facebook
This is the first time we had Facebook join us and share a story at Kranky Geek. We had the pleasure to have Li-Tal Mashiach, an Engineering Manager at Facebook, do the talk.
The numbers there are impressive as hell. 400 million monthly active users doing voice and video calls on Facebook Messenger using WebRTC. 400 million.
The next one who asks me if WebRTC is being adopted – I’ll just say 400 million. And then he’ll complain that this isn’t an enterprise application…
Anyways, what I found really interesting is how Facebook is dealing with optimization. The effort placed in the decision making process around video codecs, bitrates, etc.
WebRTC comes in a neat open source package that anyone can use. But it needs a lot more love and care when it comes to making it work at scale – just like any other technology.
TokBox
Badri Rajasekar, CTO of TokBox, shared an experiment that TokBox has been running recently. It was about using head tracking technology to improve video quality.
The idea behind it is that you can scale up a region of interest in an image at the expense of other regions, so more of the encoded pixels end up in the region you care about.
The great thing here is that you do it without touching the encoder or the decoder. Why do we want that? Because the more generic you can make an encoder, the easier it is to implement it in hardware.
VoiceBase
Walter Bachtiger, Co-founder and CEO of VoiceBase, talked about NLP (Natural Language Processing), and how great insights can be derived out of voice.
It was a bit creepy, understanding how accurate machine learning can be at scale in a contact center.
The part I liked best in this one was how a contact center can decide within 30 seconds how likely you are to buy – if only the people who call me had used it… it would have saved me a lot of time as a customer.
Atlassian
Emil Ivov, Chief Video Architect at Atlassian and a serial speaker at Kranky Geek, gave a very interesting talk about machine learning and bandwidth estimation.
The team at Jitsi now uses TensorFlow to sift through the metadata they have of calls to try and understand how the network behaves and what strategy would work best in improving network quality.
It seems like reducing bitrate doesn’t always have the necessary effect on things, and FEC might end up working better.
Vidyo
Roi Sasson, CTO of Vidyo, talked about scale.
This wasn’t about how to scale a service, but rather how to scale a single call. Want 10 people on a call? You may not need to worry, but if you go to 100 or 1,000 – you need to think differently about it.
Which is where taking SFUs and cascading them, both within a single data center and geographically, starts making a lot of sense.
WebKit
For the first time, we had a representative from Safari. We got to hear what Apple’s default browser does with WebRTC, and how, from Youenn Fablet, a contributor to WebKit.
It was great to have WebKit join us at Kranky Geek, and to hear their fresh thinking about privacy in WebRTC and how they’ve taken care of that in Safari.
Peer5
Hadar Weiss, Co-founder and CEO of Peer5, talked about P2P CDN and using the WebRTC data channel.
We never did have a focused talk on the data channel at Kranky Geek, so this was a first.
I found it really interesting how Peer5 does things differently than the rest of the WebRTC community. Mostly because they care less about call setup times and TURN connectivity and a lot more about throughput.
Hadar showed a few techniques I really liked, like the simple compression of SDP messages (which starts to make sense when you process and send millions of these a day).
Slack
From Slack we had Lynsey Haynes and Andrew MacDonald.
Two interesting things about this session:
- The shift they made from a custom WebRTC implementation towards the use of Electron with a vanilla WebRTC implementation in Chromium – all due to maintenance costs
- Switching from a custom Janus media server towards a self-developed one written in Elixir
During the Q&A (which didn’t make it to the recording), Slack were asked about their support of Firefox. Andrew answered that support for Firefox is unlikely to come due to the shift of Slack towards focusing on fewer browsers and on their Electron-based desktop application. I see this thought process taking place elsewhere as well – it doesn’t bode well for the future of browsers.
Twilio
Rob Brazier from Twilio showed an AR (Augmented Reality) use case.
I’ve never been a fan of these acronyms such as IOT, AR, VR. Marrying them with WebRTC always seemed to me somewhat forced.
That said, Rob did a great job in making a case for AR in communication interactions. I am sure more exist.
Frozen Mountain
Anton Venema, CTO of Frozen Mountain, was there to give an interesting demo.
He cobbled together text to speech, translation and speech to text on top of their media server platform, doing a demo of live language translation taking place in a WebRTC session.
Google
Niklas Blum, Huib Kleinhout and Justin Uberti from Google shared the progress made in WebRTC towards WebRTC 1.0.
This one had a lot of details for developers about things they need to know with the latest versions of Chrome and what to prepare for moving forward.
Appear.in
This year’s closing session was given by Philipp Hancke of appear.in. He’s a repeat speaker at Kranky Geek.
Philipp delved into NSFW (Not Safe For Work) related technologies, experimenting with recognizing such content and deciding what to do with it.
It was an interesting mix of technologies, human behavior and compromises.
Our Event Sponsors
Did I already say that Kranky Geek relies on its sponsors?
This year we had 6 of them:
I’d like to again thank our sponsors.
Diversity and Kranky Geek
For the first time, we had female speakers. Great female speakers.
I want more of this.
If you are a woman, or know of a woman, with technical WebRTC chops and a desire to share experiences, contact me…
What’s Next for Kranky Geek?
We weren’t sure whether we would have another Kranky Geek event. But due to the success of the one we just had, there’s a high probability that we will do another one next year.
Get ready for Kranky Geek 2018.
With more great content, and maybe – a fire drill.
And while at it, if you want to increase your visibility in the market, know that sponsoring a Kranky Geek is a great way to go about it. So put some budget aside for it. Q3/Q4 2018 is when it will take place.
Want to keep abreast of the WebRTC ecosystem? Join the WebRTC Weekly
The post Kranky Geek 2017: What Does the Pulse of WebRTC Tells Us? appeared first on BlogGeek.me.
How can you make a living from WebRTC? You offer WebRTC developer tools.
One of the interesting questions is around monetizing WebRTC. The truth is, it is hard to monetize a concept, or a piece of technology. Kranky said it well over 3 years ago – WebRTC Market Size (is 0).
What does this mean? That you can either make money by selling tools to developers who need WebRTC. Or you make money by offering a service that makes use of WebRTC, but we can now debate if that’s WebRTC or not.
Anything that isn’t WebRTC developer tools falls into other market niches – healthcare, education, gaming, … all these compete and create business far from the WebRTC core itself.
Want to learn who’s offering WebRTC Developer Tools? Check out my WebRTC Developer Tools Landscape infographic.
WebRTC developer tools though – that’s where a small WebRTC market niche exists. And there are several ways to make money in this market. Here are 6 different types of services you can offer to sell WebRTC to developers – some will offer multiple services.
#1 – Sell a Managed Service (SaaS)
You can sell a managed service.
Find something that developers need.
Create a service that offers that solution.
Sell it in an XaaS model.
- We do it at testRTC for testing and monitoring WebRTC services.
- Callstats.io does that for monitoring.
- XirSys and a few others offer a managed service for NAT Traversal (=someone else hosts the TURN and STUN servers that your application uses)
- Mobilinq and others offer a customized hosted offering
- And then there are CPaaS vendors, many of them offering WebRTC as well (check out this report on WebRTC CPaaS)
This market is rather challenging, as the name of the game is scale, and getting there is hard. For some reason, this is also where most customers end up penny pinching.
#2 – License Software
You can develop a product that others need and offer it under a commercial license.
There are those who want or need to run their own service, not relying on managed services. And at times, they are happy to pay for a commercial license that comes with an SLA and someone you can shout at and threaten.
The best thing about most commercially licensed software is that the people behind it work on that software. And once they have paying customers, they are bound by contracts to support and maintain it, usually for long periods of time.
Open Source doesn’t mean free.
People need to be able to make money out of their work – even if they are idealists who are just contributing to the community as a whole.
The way to go about doing that is by writing software that then gets distributed freely under an open source license. This allows anyone to take that software, use it, modify it and even try and contribute back to it and improve upon it.
For popular open source projects, this creates a nice feedback loop that everyone enjoys. For the most obscure projects, it remains the work of a single maintainer.
So how can someone make a living out of open source? By offering one of three different alternatives (usually a mix of them):
- Support contracts – if you’re the owner and main maintainer of the open source, then you can sell support contracts. Those who use your open source project may have questions, and giving them priority support can be an income source. For companies, having support available on the open source projects they use can be an important aspect of choosing one open source project over another
- Customization work – companies who adopt open source projects sometimes need modifications to these projects. They can attempt to do it on their own, or they can just have the main maintainer of the project do it for them at a price
- Commercial license – LGPL, GPL, AGPL and other open source licenses are often considered as cancerous licenses for commercial products. The reason for that is that they “contaminate” the code written around them forcing their license terms on that code as well. There are other open source licenses that are more tolerable to companies (more about it here). Which is why in many cases, a company would prefer paying to get a commercial license instead of using the free open source licenses of a project. Dual licensing is another way of making a living
Jitsi, for example, was distributed under an LGPL license. This allowed the team behind it to make a living through all 3 approaches: support contracts, customization work and offering commercial licenses. After its acquisition by Atlassian, it switched from LGPL to the more lenient Apache license. The main reason? Atlassian had other objectives for Jitsi and they weren’t about deriving direct monetary value from it. The Jitsi team no longer offers paid support or customization – it doesn’t mean they don’t support the code base, it just means that you can’t pay them for priority support.
Kurento got acquired by Twilio. Naevatec, the company behind Kurento, made most of its direct revenue from Kurento by offering support and customization work. After the acquisition, Naevatec was left without its engineers that were experienced with Kurento and has since been struggling to maintain the Kurento codebase.
Janus is still an open source project. The company behind it offers support and customization work if someone needs it.
To be able to make a living out of an open source project, it needs to be one that is mission critical to the companies who use it, and it needs to be popular enough. If you plan on taking that route, remember that maintaining such a project can make you proud of the number of companies that end up adopting it, but may well frustrate you if you look at how many of these companies won’t be willing to pay for it at all.
#4 – Conduct Analysis
This is something I wasn’t aware of up until several months ago.
There’s this interesting market niche in WebRTC, and I am not sure how prevalent it is with other technologies.
It is made up of companies and entrepreneurs who set out to build a product without enough knowledge and experience in WebRTC. They try to learn as they go along, floundering while at it. There are many reasons why this happens:
- They are doing it with an internal team that doesn’t have the skill set
- They outsourced the project to an outsourcing vendor who knows nothing about WebRTC, but knows how to build a mobile app, a website or even a VoIP service
- They outsourced the project but didn’t scope it properly, getting a product that isn’t what they really wanted – and then blaming the outsourcing company for it
When this happens, companies start looking for alternatives. And there really are only 4 things to do here:
- Close shop and go home. Consider this a failure and just move on to other projects
- Reboot. Look at all of it as sunk costs and start from scratch
- Fix. Get your team or pay the outsourcing vendor (or other outsourcing vendors) to continue working on the project until it is working
- Salvage. Get an expert to look at the existing codebase, analyse it, offer his advice and even let him do the fixing
Salvage is somewhat different from fixing, as it focuses on analyzing the whole architecture along with the implementation instead of just diving right in and continuing with the same approach that brought you to where you are in the first place.
You’re good with coding and know WebRTC?
Outsource it to others.
Many of the people who contact me are after developers with WebRTC experience. Some of them want to have these developers work as freelancers. Others want to outsource to a company. Others still are looking to recruit skilled workers, but understand they may end up outsourcing anyway.
There are quite a few companies and individuals who offer their outsourcing services around WebRTC.
The known freelancers who do WebRTC work are usually fully booked. It is hard to get their attention and time for new projects, but it is worth a try.
The outsourcing companies come in different shapes and sizes. Many don’t have the relevant skillset. Some will place inexperienced developers on your project. Some will do the best work for you.
Quality here varies greatly, so you should take the time to pick the right outsourcing vendor to work with.
In many cases, my role in such projects is to assist in deciding on the exact requirements, selecting the outsourcing vendor and “translating” the requirements between the company and the outsourcing vendor.
#6 – Consult
There are those who simply offer consulting (I do that by the way).
Their role is to assist in the thought processes – be it the initial phases of helping in fleshing out the product’s roadmap and differentiation, assisting in the competitive analysis, in writing down the RFPs (or the response to an RFP), selecting vendors, suggesting architecture, etc.
Many of the experienced outsourcing vendors will usually add a consulting component into their service, and their customers will usually benefit from that consulting.
What’s Next?
Looking to start a WebRTC project? Trying to understand how to get that done? Know that the market is dynamic and always changes.
Which is why I am in the process of updating two resources on my site:
- Choosing a WebRTC API Platform report
  - If you think a vendor that isn’t in the report needs to be added to it – tell me
  - If you plan on purchasing this report, then the best time would be from now until the publication of the update (see below)
- WebRTC Developer Tools Landscape will be updated soon – if vendors are missing here – tell me