Yes. We do need WebRTC events. Which is why you should join us at Kranky Geek next week.
I’ve been asked a few times in the past several months by people about events to go to.
Should I go to that event? Will it help me with my current WebRTC project?
What event should I go to, considering I am in need of WebRTC technology?
Where can I travel to learn about WebRTC? Is there a specific event?
Which event will guide me towards what I need with WebRTC? Have me understand the market dynamics? Be a place to mingle with the industry?
Register for a Kranky Geek AMA webinar – a week ahead of our event, Chad Hart will be joining me to discuss WebRTC statistics and what to expect from this year’s Kranky Geek event
If you’re in telecom, then this is how you see WebRTC:
For telecom, WebRTC is just a piece of telecom. An evolution of it. Some way of getting the telecom and VoIP infrastructure into a web browser.
If you’re in web development, then this is how you see WebRTC:
For web developers, WebRTC seems just like another piece of the HTML5 technology stack. You learn a few JS APIs. Maybe some nifty CSS and a few HTML5 tags and you’re done.
And this is how I see WebRTC:
Now, most WebRTC related events so far have been initiated by people in the telecom industry. The end result is usually a very narrow prism of what WebRTC is and what it is capable of achieving. And the side tracks done in the web related events? Most of them end up explaining what WebRTC is, not going nearly deep enough.
The end result has been unsatisfying. At least for me.
4 years into it, the question starts to crop up – do we still need WebRTC events?
Why do we still need WebRTC events?
Is there still room for an event with a WebRTC centric theme to it?
Shouldn’t WebRTC just be wrapped into all the telecom, communications and web events out there and be done with it?
I mean, we’ve got enough meetup groups around the world for this technology, but who wants to attend a longer event on WebRTC?
I think it boils down to that illustration up there – the one where WebRTC is smack in the middle of VoIP (telecom) and the web (internet). In a way, we’re still figuring out what that means exactly. How does the infrastructure of such a thing need to be designed; how do you scale it; what kind of monitoring mechanisms do you need to have in place; what are the team sizes, resources and time needed to get something from a proof of concept to production.
WebRTC might not be new, but the fact that it relies on a mix of technologies and disciplines makes for a rather complex and interesting ecosystem.
Join us at Kranky Geek SF 2017
Our next Kranky Geek event takes place on October 27 in San Francisco.
Kranky Geek is about WebRTC developers. Our role is to educate and share the experience coming from developers to developers.
The theme we’ve selected this time is twofold: implementation and beyond RTC.
- Implementation: Production ready systems. Those that have battle scars and live to tell their story. We have companies who’ve been running WebRTC in production, at scale for quite some time, and now they are here to explain what they are doing – the challenges they faced and the solutions they came up with
- Beyond RTC: You’ve probably heard a word or two about VR, AR, NLP, AI – acronyms that seem to be capturing the news and the imagination lately. We’ve decided to bring in a few experts in this field to explain how that fits into the story of WebRTC
We reached out to Youenn Fablet, who works on the WebKit WebRTC implementation. He will be speaking about iOS and Safari support of WebRTC.
Google will talk about their progress and roadmap of WebRTC.
Talking about implementations, we will have Atlassian, Facebook, Peer5, Slack and Vidyo – each talking about different aspects of implementations and scaling.
Affectiva, TokBox, Twilio and VoiceBase will cover issues beyond RTC.
For our end-of-day session, we will have a repeat speaker at Kranky Geek – Philipp Hancke from appear.in – working his way around NSFW. Knowing Philipp (and seeing his draft slides), you definitely want to stick around for this one.
There’s a token admission fee in place, to control headcount and showups (free events tend to be under-attended, and we’re shifting away from that). The way this event ends up being funded is by our sponsors, who make this thing happen at all. They are part of our speakers and play an important role in the event itself.
See you at Kranky Geek.
How does Twilio Studio fit into Twilio’s Ask Your Developer campaign?
Last month I participated in Twilio’s Signal event that took place in London. I was invited to speak there on test automation in WebRTC. You can watch my video session on YouTube. That isn’t the point of this article though.
Signal is where Twilio announces most of its major new releases. Last time, earlier this year, it was all about the engagement cloud – a restructuring of how Twilio explains its services – and a migration from a single channel world into an omnichannel one. I’ve written at length about it in Is Twilio Redefining CPaaS (hint: it is). I wrote there:
Twilio has introduced a new paradigm for the way it is layering its product offerings.
In the process, it repositioned all of its higher level APIs as the Engagement Cloud. It stitched these APIs to use its lower Programmable Communications APIs, adding business logic and best practices. And it is now looking into machine learning as well.
It is a powerful package with nothing comparable on the market.
Twilio are the best of suite approach of CPaaS – offering the largest breadth of support across this space. And it is making sure to offer powerful building blocks to make developers think twice before going for an alternative.
I think that at Signal London 2017, they outdid that with the introduction of Twilio Studio.
Trying to figure out the best approach for developing your application? Check out this free WebRTC Development Paths Matrix to understand your alternatives
Get your WebRTC Development Paths Matrix
Before We Begin
You might want to take the time to watch the Signal London 2017 keynote by Jeff Lawson.
A large part of the London keynote was a rehash of what was said in San Francisco earlier this year. It was about the shift towards omnichannel and the engagement cloud. The words that stuck with me when explaining the engagement cloud were BEST PRACTICES, BUSINESS PROCESSES, REINVENT THE WHEEL (=what not to do).
In this article, I’d like to touch on a few main themes and approaches that Twilio is taking, which are shaping its vision and execution at the moment.
“Ask Your Developer” is The Wrong Approach
I’ll start with where I think Twilio is missing the mark.
Ask Your Developer took center stage. Jeff Lawson wanted companies and the business people inside them to go ask their developers what they can do. How they can improve the business.
It gives us developers a great feeling of being in control. Of being valued. But for the most part, and for most developers, this is probably the wrong approach.
Most developers would be happy to work by spec.
The few that aren’t will be promoted quite fast to system architects, managerial roles in development or god forbid to product managers. Why? Because they can see the big picture.
They are the people that get asked. Or the people that answer without asking.
We should be asking our developers, but it should not be our strategy.
Which is where the miss came.
Later on in the keynote, Twilio announced Twilio Studio. A tool that takes some of that control from developers, putting it in the hands of decision makers.
You no longer have to ask your developer. You can work with him. Together.
More about this later.
The Code that Counts
Some 20 minutes into the keynote, Jeff Lawson invited Patrick Malatack. He started with this:
It was core to how Twilio approaches its customers. Patrick explained that this is the most important code – it is the code that counts.
The idea being that your life as a developer should be made easy, so Twilio is adding not only APIs that serve the functions you need, but also a runtime behind it to facilitate rapid development and deployment – from helper libraries, to logging and debugging facilities, the new Twilio Functions, etc.
I think the code that counts here is developers focusing on their specific business problem – abstracting everything else.
It ended up being a concept of what Twilio Runtime is:
The yellow parts in that screenshot above are the newest announcements. The rest were there earlier. Twilio isn’t only adding more features to its platform – it is beefing up its runtime, making it another competitive advantage over many others when it comes to pure SMS and voice capabilities.
The message here is an interesting one, but it wasn’t polished enough. I think this is where we will see more in future Signal events from Twilio.
Twilio Studio
It starts by explaining that building is fun but maintaining isn’t (he is correct).
The goal, according to Jeff Lawson, is to massively accelerate the roadmaps of Twilio’s customers.
I think it is a lot more than that.
Because this is so new and fresh, still in developer preview (and something I’ve started playing with a bit), it is hard to write this in an ordered fashion. Which means I’ll be going for a bulleted list instead:
- This is a really cool tool. From the demos and the time I’ve spent with Twilio Studio, it is really powerful
- Getting UI tools that handle state machines for developers is not easy. The Twilio Studio experience has a nice feel to it – I liked the experience
- Twilio Studio reminds me of Zapier. But where Zapier has a 1D linear approach to tooling and integration, Studio is its big brother, offering 2D visualization to communication state machines
- There’s no support for the visible communication parts in Twilio Studio. Yet
  - You can send and receive programmable SMS and voice with it
  - A bit of messaging as well
  - But you can’t connect it to the voice in your SDK or manage a video chat room with it
  - This will need to be added later at some point to complete the puzzle
- Is Twilio Studio the centerpoint of a customer’s flow or a corner piece of it?
  - Twilio Studio can be used to express your whole business process, fleshing out the important parts and branching away to your integrations
  - It can also be used to solve a minor piece of your bigger puzzle
  - It is up to you to decide how you use it
- In the hands of an experienced architect, Twilio Studio will offer super powers
  - There are many ways to define and template what you need
  - Some approaches will work better, offering more flexibility
  - The focus should be around inclusion of as many stakeholders in the company as possible – being able to show them and interact with them by looking at a Twilio Studio Flow
- Here’s a question: Is Twilio Studio a tool for Developers? Designers? Implementers? Analysts?
  - Twilio Studio today is fit for developers, but it won’t stay that way long
  - It can be used by implementers that know a bit about code but aren’t developers
  - It can be used to open a discussion between a developer and a business analyst
  - This is a way of expanding the target market within a Twilio customer from solely one of developers towards a larger audience. The motto is no longer “Ask your developer”
- Twilio Studio can be enhanced
  - It is a great first step, but the next ones are a lot more interesting
  - They are also a lot more threatening to competitors
  - If Twilio succeeds here, it will dominate this space with the companies that matter the most
- Twilio Studio is the ultimate vendor lock-in
  - Enterprises will adopt it, due to its many benefits
  - They will find it hard to switch because of these benefits
  - Enterprises won’t want to switch… Twilio Studio will be too valuable. Too transformative
This tool can do to contact centers what marketing automation is doing to email newsletters. If I were a contact center vendor… I’d consider Twilio Studio my biggest threat moving forward.
Pricing
There were 3 price points for Studio:
- FREE – up to 1,000 Engagements. To get developers hooked on this tool and make them not bother with actually “developing” using “code”. It is also a great way of getting developers to NOT look at other competing vendors
- The minimal plan, at +$100/month price point. Covers up to 20,000 Engagements. This is probably where most small companies will be “living”, which is just fine
- The enterprise, unlimited plan, at $10,000/month or more. Expensive, but it depends how much traffic you’re handling
Then there’s the question of what an Engagement is exactly. Is it a single event in a Flow? Is it a widget being accessed inside a Flow? In a 2-way bot conversation, I am assuming that each message exchange is probably an Engagement – the more talkative your app, the more Engagements it will eat up.
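To get a feel for how quickly a talkative app can move between these price points, here is a minimal sketch. It uses only the tier limits and prices quoted above; the idea that every message exchange counts as one Engagement is my assumption, and the function names are hypothetical.

```python
# Tier limits and prices as quoted in the text above. How Twilio actually
# counts an "Engagement" is not known here, so the per-exchange counting
# below is an assumption made purely for illustration.
FREE_LIMIT = 1_000          # Engagements included in the FREE tier
PLUS_LIMIT = 20_000         # Engagements covered by the "+$100/month" plan
PLUS_PRICE = 100            # USD per month, as quoted
ENTERPRISE_PRICE = 10_000   # USD per month, quoted as "or more"


def pick_tier(engagements_per_month: int) -> tuple[str, int]:
    """Return (tier name, minimum monthly price in USD) for a given volume."""
    if engagements_per_month <= FREE_LIMIT:
        return ("FREE", 0)
    if engagements_per_month <= PLUS_LIMIT:
        return ("PLUS", PLUS_PRICE)
    return ("ENTERPRISE", ENTERPRISE_PRICE)


def monthly_engagements(conversations: int, exchanges_per_conversation: int) -> int:
    """Assumed worst case: every message exchange is one Engagement."""
    return conversations * exchanges_per_conversation


if __name__ == "__main__":
    # 2,500 bot conversations of 10 exchanges each is 25,000 Engagements,
    # which is already past the 20,000 covered by the middle plan.
    volume = monthly_engagements(2_500, 10)
    print(volume, pick_tier(volume))
```

Under that counting assumption, a modest bot crosses from the $100 range into the $10,000 range with only 25% more traffic than the middle plan covers.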
Not sure if I am missing a tier between PLUS and ENTERPRISE here. There seems to be too big of a gap in there.
Positioning
One last thing – Twilio Studio has been positioned by Jeff Lawson inside the Engagement Cloud, below all of its current logical components:
I’d place it as a vertical bar next to the whole Twilio stack. Probably adding Functions right next to it:
My guess? Product management had a lot of internal discussions on this one, trying to decide where to place Studio – inside the engagement cloud, above it, right next to it. They ended up picking inside it.
A Word About GDPR
GDPR stands for General Data Protection Regulation. It is a piece of legislation that will become effective May 2018, in less than a year. A period of two years of grace has been given to reach that date.
It deals with the protection and processing of private information of citizens of the EU, which practically covers any global player out there, and even many who aren’t.
In a nutshell, it is a headache. Especially if you’re making use of analytics, personalization, automation, chat bots, AI or any other big data related technology. It is also relevant if you just hold an SQL database of your customers.
If you were working in a specific regulated vertical, such as healthcare or finance, then you might be used to such things. If you’re not, then you should start paying attention. Especially with the communication part of whatever it is that you do – this is where personal information gets passed along with the metadata that needs to be handled with care.
Twilio pushing GDPR this early on means two things to me:
- They are looking at the enterprise, and making sure their platform is fit for their purpose (large multinational enterprises will be the first to adopt and adhere to something like GDPR)
- They are making sure that they are leading the CPaaS pack here. I am unaware of any other CPaaS vendor who has been pushing GDPR besides stating that they will be ready by May 2018. Twilio is trying to make sure it is synonymous with “GDPR compliant CPaaS”.
It also means that communication – telecom or IP based – is becoming slightly harder to handle. Something that works well for a vendor like Twilio whose purpose in life is simplifying complexity (=the more complexity the more value derived by Twilio).
Where do we go from here?
Twilio was and still is the undisputed CPaaS king. They are bigger than anyone else by a large margin and they are working hard on maintaining a technology edge on everyone else.
The two main announcements here were Studio and GDPR. Studio brings Twilio to a larger audience and increases their vendor lock-in, thereby reducing the effectiveness of their competition. GDPR is put in place as another headache Twilio solves for its customers – the more regulation and bureaucracy like GDPR, the better for a company like Twilio – it reduces the competition from in-house developers – which is doubly important now.
These two announcements are there to deal with its perceived vulnerability. They make developing using Twilio easier than ever – almost risk-free. And they make it harder for competition to succeed in future land grabs trying to go after Twilio’s bigger accounts.
It will be interesting to see how competitors react to this in the long run, and even more interesting to see what Twilio Studio will grow into.
The post Thoughts about Twilio Studio and the Future of CPaaS appeared first on BlogGeek.me.
No simple answer.
Apple recently announced that Safari will be supporting WebRTC. That support isn’t there yet to the point where it is stable enough, but we already know one thing:
Safari supports only the H.264 video codec.
Codec wars are over? 2 MTI (mandatory to implement) codecs in the form of VP8 and H.264?
Reality is that Apple decided at this stage not to support VP8 – and it hasn’t said anything about plans to support or not support VP8 in the future. That said, all signals indicate that support for VP8 in Safari is unlikely to happen.
This brings us to a simple yet challenging question:
When writing a WebRTC application, should you make use of VP8 or H.264?
The answer isn’t a simple one. Choosing VP8 will leave you without Safari. Choosing H.264 will leave you without other important features and capabilities, as well as create a potential legal headache.
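The interoperability side of that tradeoff can be sketched as a simple set intersection. This is a minimal illustration, not a real capability check: Safari being H.264-only comes from the text above, while the entries for the other browsers and the function name are assumptions.

```python
# Video codecs per browser. Safari's H.264-only entry is taken from the text
# above; the Chrome and Firefox entries are assumptions for illustration.
BROWSER_VIDEO_CODECS = {
    "safari": {"H264"},
    "chrome": {"VP8", "H264"},
    "firefox": {"VP8", "H264"},
}


def common_codecs(browsers: list[str]) -> set[str]:
    """Return the video codecs that every listed browser can use."""
    supported = [BROWSER_VIDEO_CODECS[name] for name in browsers]
    if not supported:
        return set()
    return set.intersection(*supported)


if __name__ == "__main__":
    # Without Safari in the session both codecs remain on the table.
    print(common_codecs(["chrome", "firefox"]))
    # Add Safari and only H.264 is left.
    print(common_codecs(["chrome", "firefox", "safari"]))
```

The sketch only answers "which codec can everyone speak"; it says nothing about the feature and licensing considerations mentioned above, which is where the decision gets harder.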
This is why I decided to create a new free video mini course – to guide you through the process and help you make the best decision here.
This video course, Picking a WebRTC Video Codec, is free and includes 4 lessons and a cheat sheet.
Find out which codec to use: VP8 or H.264
Looking for a WebRTC training? Search no more. My online WebRTC course is here.
I will be relaunching my Advanced WebRTC Architecture Course next week, so it is time to see what you’ll find in this WebRTC training program I’ve created and fine-tuned for over a year now.
Prefer watching and listening more than reading? Join my free webinar on Wednesday for a quick lesson on WebRTC architecture related topics, where I’ll also be explaining the WebRTC training course and its contents.
The sections below explain the various parts of this unique WebRTC training. These are decidedly focused on delivering the best learning experience possible.
WebRTC Training Main Modules
The course is designed and built around 7 main modules:
Each module includes multiple lessons, and each lesson is a recorded video session of anywhere between 10-40 minutes in length. Most lessons also include additional links and some written content in them.
Module 1 gives you the baseline information about what WebRTC is. Consider it your introduction to the topic.
Modules 2-3 focus on signaling. They’ll take you from an understanding of UDP and TCP up to deciding what signaling protocol to use in each case and why.
Modules 4-5 are all about media. They explain voice and video codecs – in the context of their relevance to WebRTC. They also deal with the various media architectures available in group calling and recording scenarios.
Module 6 is all about the ecosystem. It lists the different strategies developers have in front of them when designing a WebRTC application, and then goes into details of each one of these.
Module 7 brings it all together. It takes different scenarios and use cases, analyzes them and builds the necessary architectures to support each use case. This is where the theory comes into practice.
The total length of the recordings in all modules and lessons? Over 15 hours.
You progress with the material at your own pace, jumping between lessons as you see fit, or through the original order they were laid out in.
If you’re looking for something to print and share, there’s a PDF version of the WebRTC course syllabus available.
Get Your WebRTC Questions Answered in the Course Forum
The course itself is supplemented with an online forum.
I’ve been contemplating making that forum a Slack channel or a Facebook group. Decided against it. While that may change with time, the course does have a forum built into it.
When you enroll in the course, you also gain access to the forum, which is where you can ask questions and get answers to them.
At any point in time.
Be it about a specific lesson, or a challenge you have in what you’re currently doing at work with WebRTC.
And if sharing openly isn’t your thing, you can always just email me directly.
WebRTC Training Office Hours
Twice a year, a series of office hours are provided for the course.
There are 12 such live sessions, taking place on roughly a weekly basis. They happen at 2 different times of the day, to fit different timezones.
These office hours include two parts in them:
- Me rambling about a topic. Call it a live lesson. It can be something from the actual course, or just thoughts and updates on what’s been going on lately with WebRTC out there
- Q&A. In this part, those enrolled in the course can ask anything they want. It is a part of the course which not many use, but those that do seem to enjoy it and derive benefit from it
The office hours are recorded and available for playback as well, so if you miss a session – you can always return to it and play it back.
WebRTC Course Bonus Materials
Besides the 7 course modules, I’ve added a bonus module.
This one contains some extra lessons, and it also gathers the cheat sheets and templates that are spread all over my site into one easy to reach location.
What lessons are in the bonus materials?
4 recorded lessons
- WebRTC standardization
- Writing RFP requirements for WebRTC
- Media algorithms
- Using testRTC
The media algorithms lesson is really important. It covers topics that I touch only lightly during the course, such as echo cancellation and jitter buffer.
2 recorded guest lessons
In my last round of the course, appear.in, who took the corporate plan, were also kind enough to share two new guest lessons:
- Video Quality in WebRTC
- Deploying (co)TURN on AWS
Philipp Hancke and Bradley T. Hughes were the instructors for these two and I found myself learning a lot in these lessons as well. Now, they are part of the course bonus materials.
What’s New in This Round of the WebRTC Course?
This is the third time I am running this course, and the second round of updates to it.
- I’ve updated some of the materials where appropriate (someone told me recently that Apple is doing something with WebRTC, so it had to find its way to the course)
- I also recorded a session from scratch because apparently, the audio recording of that one wasn’t the best
- The bonus materials (described above), are going to go away. They will be available only during course launch periods (=this week) or for corporate plans
- There’s a new eBook that is going to be added as a bonus to the course. It is called “Built to Scale”, and it is a look behind the scenes of how meet.jit.si is… built to scale
I am now adding an option to take my WebRTC training as part of every consulting project I take. Sometimes, the customer takes me up on the offer, and other times they don’t. There are questions that get asked almost all the time about the course by these customers, so I decided to answer the most common ones here.
How long will it take to work through the WebRTC course?
It is entirely up to you.
There’s over 15 hours of recorded content in the course. More if you start going through the links, external slide decks and videos that I share in the course lessons.
But at the end of the day:
- You decide on the pace of your WebRTC studies
- You decide which lessons to start with first
- You decide if there are lessons you prefer skipping
- You decide if you want to watch a specific lesson again
If you take a lesson in each working day, then 2 months is approximately what you’ll need to get from start to end.
Is there any prerequisite to taking this WebRTC training?
This WebRTC training program assumes you have some good understanding of technology. The rest – it fills in with the various modules of the course.
You don’t need to have knowledge in VoIP to take this course. You don’t need to be a web developer either. What you do need, is to have some technical grasp and understanding.
If you already have prior knowledge, then that’s fine – this WebRTC course isn’t forcing you to take its modules and lessons by their order, so you can skip to the relevant topics that interest you.
Is there a certificate?
As most online learning courses go, so too the WebRTC course offers a certificate.
Once you’ve completed the course, you will be receiving a WebRTC certificate indicating you’ve passed the course.
For companies, there’s a separate plan, which enables them to hold a badge of the WebRTC course. You can find the vendors that have taken this plan in the corporate partners page.
What’s Next?
Want to learn more about media in WebRTC? Join this free webinar to see an analysis of a real case study I came across recently. What did the company have in mind to build and how they botched their architecture along the way.
And if you’re really serious, enroll in my Advanced WebRTC Architecture Course.
Media in WebRTC.
What makes it so challenging?
I guess it can be attributed to the many disciplines and different areas of knowledge that you are expected to grok.
My last two articles? They were about the differences between VoIP, WebRTC and the web.
By now, you probably recognize this:
If you’ve got some VoIP background, then you should know how WebRTC is different than VoIP.
If you’ve got a solid web background, then you should know why WebRTC development is different than web development.
When it comes to media, media flows and media related architectures, there seems to be an even bigger gap. People with VoIP background might have some understanding of voice, but little in the way of video. People with web background are usually clueless about real time media processing.
The result is that in too many cases, I see WebRTC architectures that make no sense in how they fit to what the vendor had in mind to create.
Here are 4 reasons why media is so challenging:
#1 – Media is as Real Time as it Gets
Page load speed is important. People leave if your site doesn’t load fast. Google incorporates it as an SEO ranking parameter.
This is how it is depicted today:
So… every second counts. And the post slug is “your-website-design-should-load-in-4-seconds”.
From a WebRTC point of view, here’s what I have to say about that:
If I were given a full second to get things done with WebRTC I’d be… (fill in the blank)
Seriously though, we’re talking about real time conversations between people.
Not this conversation:
But the one that requires me to be able to hold a real, live one. With a person that needs to listen to me with his ears, see me with his eyes, and react back by talking to me directly.
400 milliseconds of a roundtrip or less (that’s 200 milliseconds to get media from your camera to the display on the other side) is what we’re aiming for. A full second would be disastrous and not really usable.
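One way to make that target concrete is to treat the 200 milliseconds as a budget and add up what each stage of the media path costs. A minimal sketch follows; the stage names and their numbers are hypothetical, only the 400/200 millisecond targets come from the text above.

```python
# Targets taken from the text above.
ROUND_TRIP_BUDGET_MS = 400
ONE_WAY_BUDGET_MS = ROUND_TRIP_BUDGET_MS / 2  # camera to remote display


def one_way_latency_ms(stages: dict[str, float]) -> float:
    """Sum the delay contributed by each stage of the one-way media path."""
    return sum(stages.values())


def within_budget(stages: dict[str, float]) -> bool:
    """True when the one-way path fits inside the 200 ms target."""
    return one_way_latency_ms(stages) <= ONE_WAY_BUDGET_MS


# Hypothetical per-stage delays in milliseconds, for illustration only.
example_path = {
    "capture_and_encode": 40,
    "network": 80,
    "jitter_buffer": 50,
    "decode_and_render": 20,
}

if __name__ == "__main__":
    print(one_way_latency_ms(example_path), within_budget(example_path))
    # Spend a full second on the network leg and the budget is blown.
    slow_path = {**example_path, "network": 1000}
    print(one_way_latency_ms(slow_path), within_budget(slow_path))
```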
For real.
#2 – Media Requires Bandwidth. Lots and Lots of Bandwidth
This one seems obvious but it isn’t.
Here’s a typical ADSL line:
Most people live in countries where this is the type of connection you have into your home. You’ll have a 20, 40 or maybe 100Mbps downlink – that’s the maximum bitrate you can receive. And then you’ll have a 1, 2 or god forbid 3Mbps uplink – that’s the maximum bitrate you can send.
You see, most of the home use of the internet is based on the premise that you consume more than you generate. But with WebRTC, you’re generating media at all times (if it isn’t a live streaming type of a use case). And that media generation is going to eat into your bandwidth.
Here’s how much it takes to deliver this page to your browser (text+code, text+code+images) versus running 5 minutes of audio (I went for 40kbps) and 5 minutes of video (I went for 1Mbps). I made sure the browser wasn’t caching any page elements.
There’s no competition here.
Especially if you remember that with the page it is you who is downloading it, while with audio and video you’re both sending and receiving. And it is relentless: as long as the conversation goes on, the data use will grow.
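The media side of that comparison is plain arithmetic, sketched below with the 40kbps audio and 1Mbps video bitrates used above. The function name is hypothetical, and the doubling for "both sending and receiving" assumes the same bitrates in each direction.

```python
def megabytes_transferred(bitrate_kbps: float, seconds: float) -> float:
    """Convert a constant bitrate held for a duration into megabytes.

    kilobits per second * seconds = kilobits; / 8 = kilobytes; / 1000 = MB.
    """
    return bitrate_kbps * seconds / 8 / 1000


FIVE_MINUTES = 5 * 60  # seconds

audio_mb = megabytes_transferred(40, FIVE_MINUTES)     # 1.5 MB
video_mb = megabytes_transferred(1000, FIVE_MINUTES)   # 37.5 MB

# Assumption: the call is symmetric, so the same amount flows each way.
both_directions_mb = 2 * (audio_mb + video_mb)         # 78.0 MB

if __name__ == "__main__":
    print(audio_mb, video_mb, both_directions_mb)
```

Five minutes of one-way video at 1Mbps is already 37.5 MB, and the number keeps growing linearly for as long as the call lasts.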
Three more things to consider here:
- Usually, the assumption is that you need twice the bandwidth available than what you’re going to effectively send or receive (overheads, congestion and pure magic)
- You’re not alone on your network. There are more activities running on your devices competing over the same bandwidth. There can be more people in your house competing over the same bandwidth
- If you’re connecting over WiFi, you need to factor in stupid issues such as reception, air interferences, etc. These affect the effective bandwidth you’ll have as well as the quality of the network
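The "twice the bandwidth" rule of thumb from the first point can be turned into a quick check against the uplink figures mentioned earlier (read as megabits per second). This is a rough sketch: the function names are hypothetical, and the factor of 2 is the rule of thumb from the list, not a measured value.

```python
HEADROOM_FACTOR = 2.0  # rule of thumb from the list above


def required_kbps(media_kbps: float) -> float:
    """Bandwidth to have available for a given media bitrate."""
    return media_kbps * HEADROOM_FACTOR


def uplink_is_enough(uplink_kbps: float, media_kbps: float) -> bool:
    """True when the uplink covers the media bitrate plus headroom."""
    return uplink_kbps >= required_kbps(media_kbps)


# 1Mbps of video plus 40kbps of audio, the bitrates used earlier.
SENT_MEDIA_KBPS = 1000 + 40

if __name__ == "__main__":
    for uplink_mbps in (1, 2, 3):
        ok = uplink_is_enough(uplink_mbps * 1000, SENT_MEDIA_KBPS)
        print(f"{uplink_mbps}Mbps uplink enough: {ok}")
```

With these numbers only the 3Mbps uplink clears the bar, and that is before anyone else in the house starts using the same connection.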
So it’s real time and it eats bandwidth. But that’s only half the story.
The second half involves anything else running on your device.
To encode and decode you’re going to need resources on that device.
CPU. Something capable. Usable hardware acceleration for the codecs to assist is welcome.
Memory. Encoding and decoding are taxing processes. They need lots and lots of memory to work well. And also remember that the higher the resolution and frame rate of the video you’re pumping out – the higher the amount of memory you’ll be needing to be able to process it.
Bus. Usually neglected, there’s the device’s bus. Data needs to flow through your device. And video processing takes its toll.
Doing this in real time, means opening dedicated threads, running algorithms that are time sensitive (acoustic echo cancellation for example), synchronizing devices (lip syncing). This is hard. And doing it while maintaining a sleek UI and letting other unrelated processes run in the background as well makes it a tad harder.
So if you are thinking of running multiple encoders and decoders on the device, working in mesh topologies in front of a large number of other users, or any other tricks, your plans need to account for these challenges. They also need to factor in the fact that browser vendors need to be aware of these topologies and use cases and take their time to optimize WebRTC to support them.
#4 – Media is Just… Different
Then there’s this minor fact of media just being so darn different.
It isn’t TCP, like HTTP and WebSocket.
It requires 3 (!) different servers to just get a peer to peer session going (and they dare call it peer to peer).
Here’s how most websites would indicate their interaction with the browser:
And this is what a basic one would look like for WebRTC:
We’ve got here two browsers to make it interesting. Then there’s the web server and a STUN/TURN server.
It gets more complicated when we want to add some media servers into the mix.
In essence, it is just different than what we’re used to in the web – or in VoIP (who decided to do signaling with HTTP anyway? Or rely on STUN and TURN instead of placing an SBC?).
What’s Next?
These reasons for media being challenging? Real time, bandwidth-needy, resource hog and being different. And that’s on the browser/client side only. Servers that need to process media suffer from the same challenges and a few more. One that comes to mind is handling scale.
So we’ve only touched the tip of the iceberg here.
This is why I created my Advanced WebRTC Architecture Course a bit over a year ago. It is a WebRTC training that aims at improving the WebRTC understanding of developers (and the semi-technical people around them).
In the coming weeks, I’ll be relaunching the office hours that run alongside the course for its third round. Towards that goal, I’ll be hosting a free webinar about media in WebRTC.
I’ll be doing something different this time.
I had an interesting call recently with a company moving away from CPaaS towards self development. The mistake they made was that they made that decision with little understanding of WebRTC.
Here’s what we’ll do during the webinar:
- Introduce the requirements they had
- Explain the architecture and technology stack they selected
- Show what went wrong
- Suggest an alternate route
Similar to my last launch, there will be a couple of time limited bonuses available to those who decide to enroll for the course.
Want to learn more about media in WebRTC? Join this free webinar to see an analysis of a real case study I came across recently: what the company had in mind to build and how they botched their architecture along the way.
And if you’re really serious, enroll in my Advanced WebRTC Architecture Course.
The post Grokking Media in WebRTC (a free webinar for my WebRTC Course) appeared first on BlogGeek.me.
Soda and Mentos.
Last week I wrote about the difference between WebRTC and VoIP development. This week let’s see how WebRTC development is different from web development.
Let’s start by saying this:
WebRTC is about Web Development
Well, mostly. It is more about doing RTC (real time communications). And enabling to do it over the web. And elsewhere. And not necessarily RTC.
WebRTC is quite powerful and versatile. It can be used virtually everywhere and it can be used for things other than VoIP or web.
When we do want to develop WebRTC for a web application, there are still differences – in the process, tools and infrastructure we will need to use.
Why is that?
Because real time media is different and tougher than most of the rest of the things you happen to be doing on the browser itself.
It boils down to this illustration (from last week):
So yes. WebRTC happens to run in the web browser. But it does a lot of things the way VoIP works (it is VoIP after all).
WebRTC dev != Web dev. And one of the critical parts is the servers we need to make it work. Join my free mini video WebRTC course that explains the server story of WebRTC.
If you plan on doing anything with WebRTC besides a quick hello world page, then there’s lots of new things for you to learn if you’re coming from a web development background. Which brings me to the purpose of this article.
Here are 10 major differences between developing with WebRTC and web development:
#1 – WebRTC is P2P
Seriously. You can send voice, video and any other arbitrary data you wish directly from one browser to another. On a secure connection. Not going through any backend server (unless you need a relay – more on that in #6).
That triangle you see there? For VoIP that’s obvious. But for the web that’s magical. It opens up a lot of avenues for new types of services that are unrelated to VoIP – things like WebTorrent and Peer5, the ability to send direct private messages, low latency game controllers. The possibilities here are endless.
But what does this triangle mean exactly?
It means that you are not going to send your media through a web server. You are going to either send it directly between the browsers. Or you are going to send it to a media server – dedicated to this task.
Media servers for example are almost always developed using C/C++ or Java. If you’ll need to debug them (and the serious companies do that), then you’ll need to understand these languages as well.
I guess this is the start of the following points as well, so here we go.
Today, the web is built on top of TCP. It started with HTTP. Moved to Websockets (also on top of TCP). And now HTTP/2 (also TCP).
There are attempts to allow for UDP type of traffic – QUIC is an example of it. But that isn’t there yet. And for most web developers that’s just under the hood anyway.
With WebRTC, all media is sent over UDP as much as possible. It can work over TCP if needed (I sent you to #6 didn’t I?), but we try to refrain from it – you get better media quality with UDP.
The table above shows the differences between UDP and TCP. This lies at the heart of how media is sent. We use unreliable connections with best effort.
#4 – Compromise is the Name of the Game
That UDP thing? It adds unreliability into the mix. Which also means that what you send isn’t what you get. Coupled with the fact that codecs are resource hogs, we get into a game of compromise.
In VoIP (and WebRTC), almost any decision we make to improve things in one axis will end up costing us in another axis.
Want better compression? Lose quality.
Don’t want to lose quality? Use more CPU to compress.
Want to lower the latency? Lose quality (or invest more CPU).
On and on it goes.
While CPUs are getting better all the time, and available bandwidth seems to be getting higher as well, our demands on our media systems are growing too. At times even a lot faster.
That ends up with the need to compromise.
All the time.
You’ll need to know and understand media and networking in order to be able to decide where to compromise and where to invest.
#5 – Best Effort is the Other Name
Here’s something I heard once in a call I had:
“We want our video quality to be a lot better than Skype and Hangouts”.
I am fine with such an approach.
But this is something I heard from:
- 2 entrepreneurs with no experience or understanding of video compression
- For a use case that needs to run in developing countries, with choppy cellular reception at best
- And they assumed they will be able to do it all by themselves using WebRTC
It just doesn’t work.
WebRTC (and VoIP) are a best effort kind of a play.
You make do with what you get, trying to make the best of it.
This is why WebRTC tries to estimate the bandwidth available to it, and will then commence eating up all that available bandwidth to improve the video quality.
This is why when the network starts to act up (packet loss), WebRTC will reduce the bitrate it uses and reduce the media quality in order to accommodate what is now available to it.
Sometimes these approaches work well. Other times not so well.
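To make the "eat up what is available, back off on packet loss" behavior concrete, here is a minimal sketch of a loss-based bitrate rule. It is loosely modeled on the loss-based controller described in the Google Congestion Control draft; the function name, the thresholds and the min/max limits are assumptions for illustration. Real browsers combine a rule like this with delay-based estimation and a lot more tuning.

```python
def adjust_bitrate(current_bps: float, loss_fraction: float,
                   min_bps: float = 50_000, max_bps: float = 2_500_000) -> float:
    """Return the next target bitrate given the fraction of packets lost.

    Illustrative thresholds (assumptions, not a browser's actual values):
      - more than 10% loss: back off in proportion to the loss
      - less than 2% loss: probe upwards by 5%
      - anything in between: hold the current bitrate
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0.0 and 1.0")

    if loss_fraction > 0.10:
        target = current_bps * (1.0 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        target = current_bps * 1.05
    else:
        target = current_bps

    # Never go below what is needed for usable media or above the configured cap.
    return max(min_bps, min(max_bps, target))
```

Feeding it a 1 Mbps stream with 20% loss lowers the target by 10%, while a clean network lets the target creep upwards until it hits the cap.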
And yes. A lot of the end result will be reliant on how well you’ve designed and laid out your infrastructure for the service.
#6 – NAT Traversal Rules Your Life
Networks have NATs and Firewalls. These are nothing new, but if you are a web developer, then most likely they never made life difficult for you.
That’s because in the “normal” web, the browser will reach out to the server to connect to it. And being the main concept of our current day web, NATs and Firewalls expect that and allow this to happen.
Peer to peer communications, going directly between browsers the way WebRTC operates, and with the use of UDP no less (again, something that isn’t usually done in the web browser)… these are things that firewalls and the IT personnel configuring them usually don’t need to contend with.
For WebRTC, this means the addition of STUN/TURN servers. Sometimes, you’ll hear the word ICE. ICE is an algorithm and not a server. ICE makes use of STUN and TURN. STUN and TURN are two protocols for NAT traversal, each using its own server. And usually, STUN and TURN servers are implemented in the same code and deployed using a single process.
WebRTC puts a lot of effort into making sure its sessions get connected. But at the end of the day, even that isn’t always enough. There are times when sessions just can’t get connected – whoever configured the firewall made sure of it.
#7 – Server Scaling is Ridiculous
Server scaling with WebRTC is slightly different than that of regular web.
There are two main reasons for that:
- The numbers are usually way smaller. While web servers can handle 5 digit connections or more, their WebRTC counterparts will often struggle with the higher end of 3 digits. There’s a considerable cost of hosting HD video and media server processing
- WebRTC requires statefulness. Severing a connection and restarting it will always be noticeable – a lot more than in most other web related use cases. This makes high availability, fault tolerance, upgrading and similar activities harder to manage with WebRTC
You’ll need to understand how each of the WebRTC servers works in order to understand how to scale it.
#8 – Bandwidth is Expensive
With web pages things are rather simple. The average web page size is growing year to year. We’ve got above 2.3MB in 2016. But that page is constructed out of different resources pulled from different servers. Some can be cached locally in the browser.
A 5 minute HD video at 2Mbps (not unheard of and rather common) will take up 75 MB during that 5 minutes.
If you are just doing 1:1 video calls with a 10% TURN relay factor, that can be quite taxing – running just 1,000 calls a day with an average of 5 minutes each will eat up 15 GB a day in your TURN server bandwidth costs. You probably want more calls a day and you want them running for longer periods of time as well.
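The arithmetic behind these numbers is easy to reproduce. The sketch below is a back-of-the-envelope calculator, not a billing model. The function names are made up for illustration, and it assumes decimal units (1 GB = 1,000 MB) and that each relayed 1:1 call pushes two 2Mbps streams (one per direction) through the TURN server.

```python
def stream_megabytes(bitrate_mbps: float, minutes: float) -> float:
    """Size of a single media stream in MB (megabits divided by 8)."""
    return bitrate_mbps * minutes * 60 / 8


def daily_turn_gigabytes(calls_per_day: int, minutes: float, bitrate_mbps: float,
                         relay_fraction: float, streams_per_call: int = 2) -> float:
    """Estimated TURN traffic per day in GB.

    Assumption: only `relay_fraction` of the calls are relayed, and each
    relayed call carries `streams_per_call` streams through the TURN server.
    """
    relayed_calls = calls_per_day * relay_fraction
    total_mb = relayed_calls * streams_per_call * stream_megabytes(bitrate_mbps, minutes)
    return total_mb / 1000


print(stream_megabytes(2, 5))                   # one 5 minute stream at 2Mbps, in MB
print(daily_turn_gigabytes(1000, 5, 2, 0.10))   # 1,000 calls a day, 10% relayed, in GB
```

Changing the call length or the relay factor in the last line shows how quickly the daily total grows.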
Using a media server for group calling or recording makes this even higher.
As an example, at testRTC we can end up with tests that run into the 100’s of GBs of data per test. Easily…
When you start to work out your business model, be sure to factor in your bandwidth costs.
#9 – Geography is Everything for Media Delivery
For the most part, and for most services, you can get away with running your service off a specific data center.
This website of mine is hosted somewhere in the US (I don’t even care where) and hooked up to CDN services that take care of the static files. It has never been an issue for me. And performance is reasonable.
When it comes to real time live media, which is where WebRTC comes in, this won’t always do.
Getting data from New York to Paris can easily take 100 milliseconds or more, and since one of the things we’re striving for is real time – we’d like to be able to reduce that as much as we can.
Which gets us to the illustration above. Imagine two people in Paris having a WebRTC conversation that gets relayed through a TURN server in New York. Not even mentioning the higher possibility of packet losses, there’s clearly a degradation in the quality of the call just by the added delay of this route taken.
WebRTC, even for a small scale service, may need a global deployment of its infrastructure servers.
#10 – Different Browsers Behave Differently
Well… you know this one.
As a web developer, I am sure you’ve bumped into browsers acting differently with your HTML and CSS. Just recently, I tried to use <button> outside of a form element, only to find out the link that I placed inside it got ignored by Firefox.
The same is true for WebRTC. The difference is that it is a lot easier to bump into and it messes things up on two different levels:
- The API behavior – not all browsers support the exact same set of APIs (WebRTC isn’t really an official standard specification yet – just a draft; and browser implementations mostly adhere to recent variants of that draft)
- The network behavior – WebRTC means you communicate between browsers. At times, you might not get a session connected properly from one browser to another if they are different. They process SDP differently, they may not support the same codecs, etc.
As time goes by, this should get resolved. Browser vendors will shift focus from adding features and running after the specification towards making sure things interoperate across browsers.
But until then, we as developers will need to run after the browsers and expect things to break from time to time.
#11 – You Know More Than You Think
The majority of WebRTC is related to VoIP. That’s because at the end of the day, it is a variant of VoIP (one of many). This means that VoIP developers have a huge head start on you when it comes to understanding WebRTC.
The problem for them is that they have a different education than you do. Someone taught them that a call has a caller and a callee. That you need to be able to put a call on hold. To transfer the call. To support blind transfer. Lots and lots of notions that are relevant to telephony but not necessarily to communications.
You aren’t “tainted” in this way. You don’t have to unlearn things – so that nagging part of an ego telling you how things are done with VoIP – it doesn’t exist. I had my share of training sessions where most of my time was spent on this unlearning part.
This means that in a way you already know one important thing with WebRTC – that there’s no right and wrong in how sessions are created – and you are free to experiment and break things with it before coming to a conclusion of how to use it.
That’s powerful.
What’s Next?
If you have a web development background, then there’s much you need to learn about how VoIP is done in order to understand WebRTC better.
WebRTC looks simple when you start with it. Most web developers will complain after a day or two of how complex it is. What they don’t really understand is how much more complicated VoIP is without WebRTC. We’ve been given a very powerful and capable tool with WebRTC.
Need to warm up to WebRTC? Try my free WebRTC server side mini course.
And if you’re really serious, enroll in my Advanced WebRTC Architecture Course.
The post Why Developing With WebRTC is Different than Web Development? appeared first on BlogGeek.me.
Water and oil?
Let’s start by saying this:
WebRTC is VoIP
That said, it is different than VoIP in the most important of ways:
- In the ways entrepreneurs make use of it to bring their ideas to life
- In the ways developers wield it to build applications
Why is that?
Because WebRTC lends itself to two very different worlds, all running over the Internet: The World Wide Web. And VoIP.
And these two worlds? They don’t mix much. Besides the fact that they both run over IP, there’s not a lot of resemblance between them. Well, that and the fact that both SIP and HTTP have a 200 OK message.
Everyone is focused on the browser implementation of WebRTC. But what of the needed backend? Join my free mini video WebRTC course that explains the server story of WebRTC.
If you ever developed anything in the world of VoIP, then you know how calls get connected. You’re all about ring tones and the many features that comprise a Class 5 softswitch. The truth of the matter is that this kind of knowledge can often be your undoing when it comes to WebRTC.
Here are 10 major differences between developing with WebRTC and developing with VoIP:
#1 – You are No Longer in Control
With VoIP, life was simple. All pieces of the solution were yours.
The server, the clients, whatever.
When something didn’t work, you’d go in, analyze it, fix the relevant piece of software, and be done with it.
WebRTC is different.
You’ve got this nagging thing known as the “browser”.
4 of them.
And they change. And update. A lot.
Here’s what happened in the past year with Chrome and Firefox:
A version every 6-8 weeks. For each of them.
And these versions? They tend to change how the browsers behave when it comes to WebRTC. These changes may cause services to falter.
These changes mean that:
- You are not in control over the whole software running your service
- You are not in control of when pieces of your deployment get upgraded (browsers will upgrade without you having a say in it)
VoIP doesn’t work this way.
You develop, integrate, deploy and then you decide when to upgrade or modify things. With WebRTC that isn’t the case any longer.
My pedigree comes from VoIP.
I am a VoIP developer.
I did development, project management and product management, and then was the CTO of a business unit where what we did was develop VoIP software SDKs that were used (and are still used) in many communication products.
I am a great developer. Really. One of the best I know. At least when it comes to coding in C.
VoIP was traditionally developed in C/C++ and Java.
Three main reasons I can see for it:
- Fashion. Node.js is fashionable and new. WebRTC is also new, so there’s a fit
- Asynchronous. The signaling in WebRTC needs to be snappy and interactive. It needs to have a backend that can fit nicely with its model of asynchronous interactions and interfaces. Node.js offers just that and makes it easier to think of signaling on the frontend and backend at the same time. Which leads us to the third and probably most important reason –
VoIP is all about interoperability. A big happy family of vendors. All collaborating and cooperating. The idea is that if you purchase a phone from one vendor, you *should* be able to dial another vendor’s phone with it via a third vendor’s PBX. It works. Sometimes. And it requires a lot of effort in interoperability testing and tweaking. An ongoing arduous task. The end result though is a system where you end up testing a small set of vendors that are approved to work within a certain deployment.
VoIP and interoperability abhors the idea of islands. Different communication services that can’t connect to each other.
WebRTC is rather different. You no longer build one VoIP product or device that is designed to communicate with VoIP devices of other vendors. You build the whole shebang.
An island of sorts, but a rather big one. One where you can offer access through all browsers, operating systems and mobile devices.
You no longer care about interoperability with other vendors – just about interoperability of your service with the browsers you are relying on. It simplifies things some while complicating the whole issue of being in control (see #1 above).
#4 – It is Cloudy
It seems like VoIP was always meant to run in local deployments. There are a few cases where you see it deployed globally, but they aren’t many. Usually, there’s a geography that goes into the process.
This is probably rooted with the origins of VoIP – as a replacement / digital copy of what you did in telecom before. It also relates to the fact that the world was bigger in the past – the cloud as we know it today (AWS and the many other cloud providers that followed) didn’t really exist.
Skype is said to have succeeded as much as it did because it had a great speech codec at the time that was error resilient (it had FEC built in at a time when companies were still bickering in the IETF and ITU standards bodies about adding FEC in the RTP layer). It also had NAT traversal that just worked (again, when STUN and TURN were just ideas). The rest of the world? We were all happy enough to instruct customers to install their gatekeepers and B2BUAs in the DMZ.
Since then VoIP has evolved a lot. It turned towards the SBC (more on this in #10).
WebRTC has bigger challenges and requirements ahead of it.
For the most part, and with most deployments of WebRTC, there are three things that almost always are apparent:
- Deployments are global. You never know from where the users will be joining. Not globally and not their type of network
- Networks are unmanaged. This is similar to the above. You have zero control over the networks, but your users will still complain about the quality (just check out any of Fippo’s analysis posts)
- We deploy them on AWS. All the time. On virtual machines. Inside Docker containers. Layers and layers of abstraction. For a real time service. And it seems to work
VoIP is easy. It is standardized. Be it SIP, H.323, XMPP or whatever you bring to the table. You are meant to use a signaling protocol. Something someone else has thought of in the far dark rooms in some standards organization. It is meant to keep you safe. To support the notion and model of interoperability. To allow for vendor agnostic deployments.
WebRTC did away with all this, opting to not have a signaling protocol at all out of the box.
Some complain about it (mostly VoIP people). I’ve written about it some 4 years ago – about the death of signaling.
With WebRTC you make the decision on what signaling protocol you will be using. You can decide to go for a standards based solution such as SIP over WebSocket, XMPP over BOSH or WebSocket – or you can use a newly created signaling protocol invented only for your specific scenario – or use whatever you already have in your app to signal people.
As with anything in WebRTC, it opens up a few immediate questions:
- Should you use a standards based signaling protocol or a proprietary one?
- Should you build it on your own from scratch or use a third party framework for it?
- Should you host and manage it on your own or use it as a service instead?
All answers are now valid.
#6 – Encryption and Privacy are MANDATORY
With VoIP, encryption was always optional. Seldom used.
I remember going to these interoperability events as a developer. The tests that almost never really succeeded were the ones that used security. Why? You got to them last during the week long event, and nobody got that part quite the same as others.
That has definitely changed over the years, but the notion of using encryption hasn’t. VoIP products are shipped to customers and deployed without encryption. The encryption piece is an optional configuration that many skip. Encryption makes it hard to use Wireshark to understand what goes on in the network, it takes up CPU (not much anymore, but conceptually it still does), it complicates things.
WebRTC, on the other hand, has only encryption configured into it. No way to use it with clear RTP. Even if you really really want to. Even if you swear all browsers and their communications run inside a secure network. Nope. Can’t take security out of WebRTC.
#7 – If it is New, WebRTC Will be Using it
When WebRTC came out, it made use of the most recent VoIP related RFCs in the media domain.
Ability to bundle RTP and RTCP on the same stream? Check.
Ability to multiplex audio and video on the same stream? Check.
Ability to send FIR commands over RTCP and not signaling? Check.
Ability to negotiate keys over DTLS-SRTP instead of SDES? Check.
There are many other examples for it.
And in many cases, WebRTC went to the extreme of banning the other, more common, older mechanisms of doing things.
VoIP was always made with options in mind. You have at least 10 different ways in the standard to do something. And all are acceptable.
WebRTC takes what makes sense to it, throwing the rest out the window, leaving the standard slightly cleaner at the end of it.
Just recently, a decision was made about supporting non-multiplexed streams. This forced Asterisk and all of its users to upgrade.
VoIP and SIP were never really that important to WebRTC. Live with it.
#8 – Identity Management and Authorization are Tricky
There’s no identity management in WebRTC.
There’s also no clear authorization model to be heard of.
Here’s a simple one:
With SIP, the way you handle users is giving them usernames and passwords.
The user types that into the client and this gets used to sign up to the service.
With regular apps, it is easy to set that username/password as your TURN credentials as well. But doing it with WebRTC inside a browser opens up a world of pain with the potential of harvesting that information to piggyback on your TURN servers, costing you money.
So instead you end up using ephemeral passwords in TURN with WebRTC. Here’s an explanation how to do just that.
In many other cases, you simply don’t care. If the user already logged into the page, and identified and authenticated himself in front of your service, then why have an additional set of credentials for him? You can just as easily piggyback a mechanism such as Facebook connect, Twitter, LinkedIn or Google accounts to get the authentication part going for you.
#9 – Route. Don’t Mix
If you come from VoIP, then you know that for more than two participants in a call you mix the media. You do it usually for audio, but also for the video. That’s just how things are (were) done.
But for WebRTC, routing media through an SFU is how you do things.
It makes the most sense because of a multitude of reasons:
- For many use cases, this is the only thing that can work when it comes to meeting your business model. It strikes that balance between usability and costs
- This in turn, brings a lot of developers and researchers to this domain, improving media routing and SFU related technologies, making it even better as time goes by
- In WebRTC, the client belongs to the server – the server sends the client as HTML/JS code. With the added flexibility of getting multiple media streams, comes an added flexibility to the UI’s look and feel as well as behavior
There are those who are still resistant to the routing model. When these people have a VoIP pedigree, they’ll lean towards the mixing model of an MCU, calling it superior. It will usually cost 10 times or more to deploy an MCU instead of an SFU.
Be sure to know and understand SFUs if you plan on using WebRTC.
#10 – SBCs are Useless
Or at least not mandatory anymore.
Every. SBC. vendor. out. there. is. adding. WebRTC.
And I get it. If you’re building an SBC – a Session Border Controller – then you should also make sure it supports WebRTC so all these pesky people looking to get access through the browser can actually get it.
An SBC was an abomination added to VoIP. It was a necessary evil.
It served the purpose of sitting in the DMZ, making sure your internal network is protected against malicious VoIP access. A firewall for VoIP traffic.
Later people bolted on that SBC the ability to handle interoperability, because different vendor products never really worked well with one another (we’ve already seen that in #3). Then transcoding was added, because we could. And then other functions.
And at some point, it was just obvious to place SBCs in VoIP infrastructure. Well… WebRTC doesn’t need an SBC.
VoIP needs an SBC that handles WebRTC. But if you’re planning on doing a WebRTC based application that doesn’t have much of VoIP in it, you can skip the SBC.
#11 – Ecosystem Created by the API and Not the Specification
Did I say 10 differences? So here’s a bonus difference.
Ecosystems in VoIP are created around the network protocol.
You get people to understand the standard specification of the network protocol, and from there you build products.
In WebRTC, the center is not the network protocol (yes, it is important and everything) – it is the WebRTC APIs. The ones implemented in the browsers that enable you to build a client on top. One that theoretically should run across all browsers.
That’s a huge distinction.
Many of the developers in WebRTC are clueless about the network, which is a shame. On the other hand, many VoIP developers think they understand the network but fail to understand the nuanced differences between how the network works in VoIP and in WebRTC.
What’s Next?
If you have a VoIP background, then there are things for you to learn when shifting your focus towards WebRTC. And you need to come at it with an open mind.
WebRTC seems very similar to VoIP – and it is – because it is VoIP. But it is also very different. In the ways it is designed, thought of and used.
Knowing VoIP, you should have a head start on others. But only if you grok the differences.
Need to warm up to WebRTC? Try my free WebRTC server side mini course.
And if you’re really serious, enroll in my Advanced WebRTC Architecture Course.
The post Why Developing With WebRTC is Different than VoIP Development? appeared first on BlogGeek.me.
See you in September.
Time for some downtime for me.
Not from work – got too many projects going on at the moment – updating my course, testRTC and some interesting customer projects I am involved with. I am also working on an offering around APIs. More on that later.
This means – no new writing here for the next couple of weeks.
See you all once I am back.
In the meantime, if you have any questions or needs around the things I write about, feel free to contact me. I’ll gladly help you find your way around this tech (and even focus my writing in the areas you are interested in).
I say it doesn’t matter what the technique is as long as you go through the motion of upgrading your WebRTC Media Servers…
Here’s the thing. In many cases, you end up with a WebRTC deployment built for you. Or you invest in a project until its launch.
And that’s it.
Why Upgrade WebRTC Media Servers?
With WebRTC, things become interesting. WebRTC is still a moving target. Yes. I am promised that WebRTC 1.0 will be complete and published by the end of the year. I have been hearing that promise since 2015. It might actually happen in 2017, but it seems browser vendors are still moving fast with WebRTC, improving and optimizing their implementations. And breaking stuff at times as they move along.
Add to that the fact that media servers are complex, and they have their own fixes, patches, security updates, optimizations and features – and you find yourself with the need to upgrade them from time to time.
Upgrade as a non-functional feature is important for your WebRTC requirements. I just updated my template, so you don’t forget it:
I’ll take it a bit further still:
- With WebRTC, the browser (your client) will get upgraded automatically. It is for your own safety. This, in turn, may force you to upgrade the rest of your infrastructure. And the piece most prone to that?
- Your WebRTC media server needs to be upgraded. First to keep pace with the browsers, and no less important, to improve. But there is also
- The signaling server you use for WebRTC. That one may need some polish and fine tuning because of the browser. It may also need to get some care and attention – especially if and when you start expanding your service and need to scale out – locally or geographically
- Your TURN/STUN servers. These tend to go through the least amount of updates (and they are also relatively easy to upgrade in production)
Great. So we need to upgrade our backend servers. And we must do it if we want our service to be operational next year.
Talking Production
But what about a production system? One that is running and has active users on it.
How do you upgrade it exactly?
Gustavo García in a recent tweet gave the techniques available and asked to see them by popularity:
Just curious about how do you upgrade your #WebRTC mediaservers?
— Gustavo García (@anarchyco) August 4, 2017
I’d like to review these alternatives and see why developers opt for “Draining first”. I’ll be using Gustavo’s naming convention here as well. I will introduce them in a different order though.
#1 – Immediate Kill+Reconnect
This one is the easiest and most straightforward alternative.
If you want to upgrade WebRTC media servers, you take the following steps:
- Kill the existing server(s)
- Upgrade their software (or outright replace their machines – virtual or bare metal)
- Reconnect the sessions that got interrupted – or don’t…
This is by far the simplest solution for developers and DevOps. But it is the most disruptive for the users.
That third step is also something of a choice – you can decide to not reconnect existing sessions, which means users will now have to reconnect on their own (refresh that web page or whatever), or you might have them reconnected, either by invoking it from the server somehow or having the clients implement some persistency in them to make them automatically retry on service interruption.
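If you go for the automatic retry route, the client side usually boils down to a retry loop with growing delays, so that a freshly restarted server isn’t hammered by every client at once. Here is a minimal sketch of that idea. The function names, delays and jitter are assumptions for illustration, and `connect` and `sleep` stand in for whatever your application uses to rejoin a session and to wait.

```python
import random
from typing import Callable, Iterator, Optional


def backoff_delays(max_attempts: int = 6, base_seconds: float = 0.5,
                   cap_seconds: float = 10.0, jitter: float = 0.2,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield exponentially growing delays, capped, with +/- `jitter` randomness."""
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        delay = min(cap_seconds, base_seconds * (2 ** attempt))
        yield delay * (1 + rng.uniform(-jitter, jitter))


def reconnect(connect: Callable[[], bool], sleep: Callable[[float], None],
              **backoff_options) -> Optional[int]:
    """Call `connect` until it succeeds; return the attempt number or None."""
    for attempt, delay in enumerate(backoff_delays(**backoff_options), start=1):
        if connect():
            return attempt
        sleep(delay)  # wait before the next try
    return None
```

The jitter matters more than it looks: without it, all the clients that were kicked off the same media server would retry at exactly the same moments.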
This is also the easiest way to maintain a single version of your backend running at all times (more on that later).
#2 – Active/Passive Setup
In an active/passive setup you’ll have idle machines sitting and waiting to pick up traffic when the active WebRTC media servers are down (usually for whatever reasons and not only on upgrades).
This alternative is great for high availability – offering uptime when machines or whole data centers break, as the time to migrate or maintain service continuity will be close to instantaneous.
The downside here is cost. You pay for these idle machines that do nothing but sit and wait.
There are variations of this approach, such as active-active and clustering of machines. Not going to go into the details here.
In general, there are two ways to handle this approach:
- Upgrade the passive machines (maybe even create them just before the upgrade). Once all are upgraded, divert new traffic to them. Kill the old machines one by one as the traffic on them wanes
- Employ a rolling upgrade, where you upgrade one (or more) machines each time and continue to “roll” the upgrade across your infrastructure. This will reduce your costs somewhat if you don’t plan on keeping a 1:1 active/passive setup at all times
(1) above is the classic active/passive setup. (2) is somewhat of an optimization that gets more relevant as your backend increases in its size – it is damn hard to replace everything at the same time, so you do it in stages instead.
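As a rough illustration of option (2), a rolling upgrade is mostly about splitting your machines into batches and upgrading one batch at a time. The sketch below only plans the batches. The server names and the batch size are made up for the example.

```python
def rolling_batches(servers, batch_size):
    """Split a list of servers into consecutive upgrade batches.

    Only one batch is taken out of rotation at a time, so the rest of
    the fleet keeps serving traffic while that batch is upgraded.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [servers[i:i + batch_size]
            for i in range(0, len(servers), batch_size)]


# Hypothetical fleet of five media servers, upgraded two at a time.
plan = rolling_batches(["ms1", "ms2", "ms3", "ms4", "ms5"], 2)
```

The smaller the batch, the less spare capacity you pay for, but the longer the upgrade takes to roll across the whole infrastructure.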
Note that in all cases from here on you are going to have at least two versions of your WebRTC media servers running in your infrastructure during the upgrade. You also don’t really know when the upgrade is going to complete – it depends on when people will close their ongoing sessions.
In some ways, the next two cases are actually just answering the question – “but what do we do with the open sessions once we upgrade?”
#3 – Sessions Migration First
Sessions migration first means that we aren’t going to wait for the current sessions to end before we kill the WebRTC media server they are on. But we aren’t going to just immediately kill the session either (as we did in option #1).
What we are going to do is have some means of persistence for the sessions. Once a new upgraded WebRTC media server machine is up and running, we are going to instruct the sessions on the old machine to migrate to the new one. There are a few ways to do that:
- We can add some control message and send it via our signaling channel to the clients in that session so they’ll know that they need to “silently” reconnect
- We can have the client persistently try to reconnect the moment the session is severed with no explanation
- We can try and replicate the machine in full and have the load balancer do the switchover from old to new (don’t try this at home, and probably don’t waste your time on it – too much of a headache and effort to deal with anyways)
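The first technique, a control message over the signaling channel, could look something like the sketch below. WebRTC doesn't define signaling, so the message format here (`type`, `session`, `target`) and the function names are purely hypothetical. Your own signaling protocol will have its own shape.

```python
import json


def build_migrate_message(session_id, new_server):
    """Server side: serialize a hypothetical 'migrate' control message."""
    return json.dumps({
        "type": "migrate",
        "session": session_id,
        "target": new_server,
    })


def handle_signaling(raw, reconnect_to):
    """Client side: react to a 'migrate' message, ignore everything else.

    reconnect_to: hypothetical callback that tears down the current
    connection and silently rejoins the session on the given server.
    Returns True if a migration was triggered.
    """
    msg = json.loads(raw)
    if msg.get("type") == "migrate":
        reconnect_to(msg["target"])
        return True
    return False
```

The media servers themselves need no changes for this. The work is in the client and the signaling server, which is exactly the extra development effort this alternative requires.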
Whatever the technique, the result is that you are going to be able to migrate rather quickly from one version to the next – simply because once the upgrade is done, there won’t be any sessions left in the old machine and you’ll be able to decommission it – or upgrade it as well as part of a rolling upgrade mechanism.
#4 – Draining First
Draining first is actually draining last… let’s see why.
What we are going to do here is bring up our new upgraded WebRTC media servers, route all new traffic to them and… that’s about it.
We will keep the old machines up and running until they drain out of the sessions that they are handling. This can take a couple of minutes. An hour. A couple of hours. A day. Indefinitely. The type of service you have and how users interact with it will determine how long, on average, it takes for a WebRTC media server to drain its sessions with no service interruption.
A few things to ponder here (some came from the replies to that original tweet):
- WebRTC media servers can’t hold too much traffic (they don’t scale to millions of sessions in parallel)
- With a large service, you can easily get to hundreds of these machines
- Having two installations running in parallel, one with the new version and one with the old will be very expensive to operate
- The more servers you’ll have, the more you’ll want to practice a rolling upgrade, where not all servers are upgraded at the same time
- You can have more than two versions of the WebRTC media server running in parallel in your deployment. Especially if you have some really long lived sessions
- You can be impatient if you like. Let sessions drain for an hour. Or two. Or more. And then kill what’s left on the old WebRTC media server
- Media servers might be connected to other types of services – not only WebRTC clients. In such a case, you’ll need to figure out what it means to kill long lived sessions – and maybe decouple your WebRTC media server into smaller servers
Gustavo’s poll garnered only 6 answers, but they somehow feel right. They make sense from what I’ve seen and heard from the discussions I’ve had with many vendors out there.
And the reasons for this are simple:
- There’s no additional development on the client or WebRTC media servers. It is mostly DevOps scripts that need to reroute new incoming traffic and some monitoring logic to decide when to kill an empty old WebRTC media server
- There’s no service disruption. Old sessions keep running until they naturally die. New sessions get the upgraded WebRTC media servers to work on
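Those DevOps scripts boil down to two decisions: where to send a new session, and when an old machine can be killed. Here is a minimal sketch of both, including the “impatient” variant with a drain deadline. The `MediaServer` fields and function names are assumptions made for this example, not part of any real media server or orchestration tool.

```python
from dataclasses import dataclass


@dataclass
class MediaServer:
    name: str
    version: str
    sessions: int                # sessions currently running on the machine
    draining_for_s: float = 0.0  # time since new traffic was diverted away


def pick_for_new_session(servers, current_version):
    """Route new sessions only to upgraded machines, least loaded first."""
    candidates = [s for s in servers if s.version == current_version]
    if not candidates:
        raise RuntimeError("no upgraded media server available")
    return min(candidates, key=lambda s: s.sessions)


def should_kill(server, current_version, max_drain_s=None):
    """Decide whether an old machine can be decommissioned.

    An old server is killed once it is empty. If max_drain_s is set,
    it is also killed when that deadline passes, sessions or not.
    """
    if server.version == current_version:
        return False
    if server.sessions == 0:
        return True
    return max_drain_s is not None and server.draining_for_s >= max_drain_s
```

Leaving `max_drain_s` as `None` gives you the pure draining behavior with no service disruption. Setting it trades some disruption for a bounded upgrade time.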
If you are planning on deploying your own infrastructure for WebRTC (or have it outsourced), you should definitely add into the mix the upgrade strategy for that infrastructure.
This is something I overlooked in my WebRTC Requirements How To – so I just added it into that template.
In recent years, we’ve seen a lot of hysteria going on around WebRTC. Mainly about it being unsafe to use. So much so, that there are tutorials out there explaining how to disable it in every conceivable browser.
If you are developing a WebRTC application AND you care about the security of your service and the privacy of your users, make sure to review my WebRTC Security Checklist.
WebRTC is a real time communication technology that is embedded in the browser. It can access your camera and your microphone as well as share the contents of your screen. As such, it gives a browser (and web developers) access to a lot more resources on the device of an end user.
This boils down to two main risks:
- Your data can be stolen by nefarious people
- Your privacy can be breached by knowing more about your device
Here are a few scary ideas:
- If I can access your microphone, I’ll be able to record all of your conversations
- If I can access your camera, I’ll be able to snoop on you. Maybe take a nice recording of your intimate moments
- If I can access your screen remotely, I’ll be able to record what you’re doing. Maybe even control your mouse and keyboard remotely while at it?
With all the goodness WebRTC brings, who wants to be spied on by his own device?
Now, that said, we also need to understand two things here:
- The browser isn’t the only game in town for gaining this access to your data and actions
- There are measures put in place to limit the ability to conduct such activities
This one I guess is mostly about tracking you over the internet. Which is what ad networks are doing most of the time.
WebRTC gives access to more elements that are unique, which makes fingerprinting of the device (and you) a lot more accurate. Or so they say.
The main concerns here are around the exposure of private IP addresses to web servers. There are many out there who see these “IP leaks” as a serious threat. For most of humanity, I believe it isn’t, which is why I’ll gladly publish my private IP address here: 10.0.0.9.
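If you want to check for yourself that an address like this one is a private address that isn’t routable on the public internet, Python’s standard `ipaddress` module can tell you. This is just a small illustration. Note that `is_private` also returns True for a few other reserved ranges (loopback and link-local addresses, for example), not only the classic private ranges.

```python
import ipaddress


def is_private(addr):
    """Return True if the address is in a private or otherwise reserved range."""
    return ipaddress.ip_address(addr).is_private


# 10.0.0.9 sits inside the 10.0.0.0/8 private block.
in_ten_block = ipaddress.ip_address("10.0.0.9") in ipaddress.ip_network("10.0.0.0/8")
```

Countless home and office networks hand out addresses from the same handful of private blocks, which is why I don’t mind publishing mine.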
There are other, more nuanced ways in which WebRTC can be used for fingerprinting, such as enumerating the device list as part of your device’s unique identity. Which is a concern, until you review the accuracy of fingerprinting without even using WebRTC. Here are two resources for you to enjoy:
- Fingerprintjs2 – one of the many libraries available to fingerprint your browser. It doesn’t use WebRTC, although there’s an “intent” in there to add support for it
In this area, Apple with their new WebRTC support in Safari is leading the way in maintaining privacy. You can read about it in a recent article in the WebKit blog. Look specifically at the sections titled “ICE Candidate Restrictions” and “Fingerprinting”.
Why is WebRTC the Safest Alternative?
If you are a developer looking for a real time communications technology to use in your application, or you are an IT person trying to decide what to deploy in your company, then WebRTC should be your first alternative. Always.
Here’s why.
1. Browser vendors take security seriously
There are 4 major browser vendors: Apple, Google, Microsoft and Mozilla.
All of these vendors are taking care of security and patching their browsers continuously. In some cases, they even roll out new versions at breakneck speeds of 6-8 weeks, with security patches in-between.
If a security threat is found, it gets fixed fast.
While many other vendors can say that they are fixing and patching security threats fast – do they deploy them fast? Do they have the means to do so?
Since browsers get updated and upgraded so frequently, and to hundreds of millions of users, getting a security patch to the field happens rather fast. Philipp Hancke showed and explained some browser upgrade stats last year. This is from real users conducting appear.in sessions. I asked him to share a more recent graph, and here’s what they’ve had in the last two large browser version cycles for Chrome:
Look at the point in time when each Chrome version got ramped up from less than 30% to over 80% in a span of a couple of days. Chrome 59 is especially interesting. Also note that at most 2 versions of Chrome make up over 95% of use. Since they routinely do it, deploying patches for security issues is “easy”.
The only other vendors who can roll out and deploy patches so fast? Operating system vendors (again we end up with Apple, Google and Microsoft), and application developers, through mobile app stores (which sums up to Apple and Google).
Nothing comes close to it.
Takeaway: Assume there will be security breaches or at the very least the need to patch security issues. Which means you should also plan for upgrade policies. Browsers are the best at upgrading these days.
2. You don’t need to redeploy the client software
Let’s face it – most users don’t disable the automatic update policy of their browsers. If you’re even remotely interested in security, you shouldn’t disable automatic update policies of ANYTHING.
Manual updates bring with them a world of pain:
(a) When do you upgrade?
Here’s the thing.
How do you know an upgrade is in order? Are you on the list of threat alerts of all the software and middleware you are using in your company? Once a threat is announced and a patch is available – do you immediately upgrade?
When we leave this decision to a human, then he might just miss the alert. Or fail to upgrade. Or decide to delay. Just because… he’s human.
Most software can get updated, but usually won’t do it automatically or won’t do it silently. Then there is automation in this area that is done externally, such as the Kaspersky Software Updater. It works, but only up to a point, and it adds another headache to contend with and manage.
If a browser does that for you freely, why not use it?
(b) What if this fails?
Did you ever have a software update fail?
What about doing that in a company with 100+ employees?
If software fails to update 1% of the time, it means that every time you update something – someone will complain or just fail to update, making you revert back to a manual process.
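A quick back-of-the-envelope calculation shows why. The sketch below assumes each machine fails independently with the same 1% probability, which is a simplification I am making for the sake of the example.

```python
def p_any_failure(machines, p_fail):
    """Probability that at least one machine fails to update."""
    return 1 - (1 - p_fail) ** machines


def expected_failures(machines, p_fail):
    """Average number of failed updates per rollout."""
    return machines * p_fail


# 100 employees, 1% failure rate per update
p = p_any_failure(100, 0.01)          # roughly 0.63
avg = expected_failures(100, 0.01)    # about 1 failed machine per rollout
```

So with 100 machines, roughly two out of every three rollouts will leave at least one machine behind, and on average you’ll be chasing one failed update each time.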
There are tons of reasons why these processes fail, and most are due to the fact that we all have different firmware, software and device drivers on our machines (see fingerprinting above). This fact alone means that if a piece of software isn’t running on millions of devices already, it will fail for some. I’ve seen this too many times when the company I worked for developed a plugin for browsers.
Anyone not using WebRTC and deploying via software installation will cause you grief here. If this is only in front of employees, then maybe that’s fine. But oftentimes this is also with end user devices – and you don’t want to mess with those.
Browser upgrades will fail a lot less often, so better use that and just make use of WebRTC instead of rolling your own proprietary solution.
(c) What about edge cases?
You can’t control your employees and their whereabouts for your upgrades.
People working from home.
People traveling abroad.
People using BYOD and… not having tight enterprise policies on their own home laptop.
If you want less headache in this department, then again – using WebRTC will give you peace of mind that security patches get updated.
Why?
Look at it this way, the engine of WebRTC will always stay secure when you rely on the browser and its updates.
You have control over the backend (or rely on a cloud service provider with an SLA you are paying for exactly for this reason). The backend gets updated for security patches all the time (or as much as you care). The browsers get updated automatically so you can think less about it.
Using proprietary software or legacy VoIP vendor software means you’ll need to patch both backend and client software. This is harder to do and maintain – and easier to miss.
3. WebRTC has inherent security measures in place
This should probably be the first reason…
One thing you hear many complain about is why WebRTC is always encrypted. Somehow, developers decided that sending media in the clear is a good thing. While there might be some reasons to do that, most of them are rather irrelevant for something like WebRTC, meant to be used on unmanaged networks.
WebRTC took the approach of placing its security measures first. This means:
- There’s no way to send media in the clear. Everything is always encrypted. In other VoIP solutions, you can configure encryption on and off (if encryption is even there)
- There’s no way to use WebRTC in websites that aren’t served over HTTPS. This means WebRTC forces developers to use secure connections for signaling – and for the whole site. And no. Using iframes won’t work either
- Users are asked to allow access to their media inputs. Each browser handles this one slightly differently, and these models also change over time, but suffice to say that the idea here is again – to balance privacy of the users and the usability of the service
Me? I’d rather rely on the security measures placed in browsers. These go through the scrutiny of lots of people who are all too happy to announce these security flaws. Software from vendors that is specific to communications? A lot less so.
And yes. This isn’t enough. WebRTC is the building block used to build an application. A lot of what goes to the security of the finished service will rely on the developers who developed the application – but at least they got a head start by using WebRTC.
Ads and WebRTC
There’s an angle that isn’t much discussed about WebRTC. And that’s the uses it finds in the ad business.
The Bad
Two main scenarios that I’ve seen here:
- Fingerprinting. You get better means to know more about who the user behind the browser is
- Serving ads themselves. Theoretically, you might be able to serve ads via WebRTC, and that at the moment has the potential to circumvent ad blockers
There’s the second part of it. When ads are served today, the companies paying for these ads being served like to get their ROI. On the other hand, there are those who would like the money spent on ads to be wasted. So they use bots to click ads. Probably by automating Selenium processes.
This is similar in concept to the “I am not a robot” type of entry measures and captchas out there. WebRTC gives another layer of understanding about the user and his behavior – and enables us to know if he is a human or a bot inside that browser. And yes. We can use it for things other than ad serving.
Where do we go from here?
There are two main approaches to security:
- Security by obscurity – relying on people not knowing the protocol in place. It works great when you’re small and insignificant, so no one is going to care about you anyway. It falls apart when you become popular
- Kerckhoffs’ principle – a system needs to be secure even when we know everything about the system. It works best when many people scrutinize, analyze and try to hack such systems, making it better and more robust through time
WebRTC is in the second category (the first one – security by obscurity – is often criticized for being insecure by nature).
With all the resources put into WebRTC from all angles, security is also being taken care of and not left behind.
WebRTC is safe to adopt as developers. IT and security people in the enterprise shouldn’t shy away from it either – just make sure the vendor you pick did a decent job with his implementation.
Are you doing what it takes to improve the security of your WebRTC application?
When it comes to different verticals and market niches, it seems like WebRTC can fit anywhere.
6 years in, and there are many who still question if WebRTC is the way to go with their use case. This is one of the reasons why I started the WebRTC Dataset. The idea behind it all was to showcase all the variations and services where WebRTC is being used.
Here’s an example for you.
Musicians of all kinds make use of WebRTC. They have services today that are geared towards their specific needs. And I am not talking only about replacing Skype with a marketplace or a searchable directory of experts that can help you take private guitar lessons online.
When I bumped into Profound Studio, I knew this is an area I’d like to write a bit more about, so here it goes.
What I will be doing in this article, is go over some of the vendors found in the WebRTC Dataset, collected over the years, who are playing a role in the sound/music industry in one way or another.
I won’t be picking favorites here – my own experience with music is rather dull – I like to hear music just like anyone else, but I don’t consider myself an expert or a fan of anything really. This means that we’ll be going in alphabetical order of the vendors.
Care 2 Rock
Care 2 Rock is that we-teach-guitar-lessons use case with a twist.
The basic premise is teaching music lessons of any kind online, through a video call. The twist is that this is a paid/voluntary act on the side of the teacher, who ends up teaching and mentoring a foster care kid in his community.
Profound Studio
Profound Studio connects musicians with recording experts.
This is a marketplace for professionals – not for hobbyists. You can run live classes there or do consultation calls.
sofasession
sofasession is all about musicians making music together online. The majority of it is done asynchronously, where each musician contributes and edits tracks of the final masterpiece, but they don’t need to be live together at the same time. And still, this kind of a use case can use WebRTC.
Here’s a job posting they had from two years ago:
We will use Kurento as media server and extending the service for multitrack mixing and reducing latency by developing latency reducing algorithms for serving content to clients connected via the webRTC protocol.
For the layman, handling audio in realtime in the browser to handle that mixing module that’s inside sofasession requires low latency. The best way to get there is to have something like WebRTC manage it – we are talking about real time here.
I am not sure if and how far they went with their WebRTC support. They do support live jam sessions, but there’s a need to download dedicated software for that. It does use UDP to work, so there still might be some WebRTC in there.
Soundtrap
Soundtrap is similar to sofasession. It too focuses on musicians collaborating online.
Being based in Stockholm, where some of the Google WebRTC team are located, even got it an appearance at Google I/O in 2014:
StreetJelly
StreetJelly is about live performances. It allows artists to stream their music live to a global audience.
At the moment, performing and viewing is free. Viewers can tip performers if they want.
On the technical side, StreetJelly uses HTML5 video playback for the viewers and either Flash (now dead) or WebRTC (the new method for StreetJelly) to be able to broadcast the performances. They explain this further here.
Shots in the dark
As with any set of startups, the vendors in this space don’t always succeed.
How will others fare? Time will tell.
The Appeal
Music and WebRTC. There’s an appeal there.
VoIP was crap most of the time up until recently. There were two main reasons for it:
- The selection of voice codecs, with the dominant ones being narrowband codecs like G.711 or G.729. When you did get a better sounding codec, it was either protected by patents, not suitable for real time or just stuck in wideband but still focusing on speech and not music
- These codecs aren’t usually error resilient. The moment you introduce packet loss (and these happen regularly), the audio quality suffers. So something had to be done on that front
WebRTC comes with Opus “out of the box”. It is a wideband codec suitable for music, not only speech, and it is royalty free, low latency and error resilient. To top it all – it is mandatory in WebRTC and one of only two such codecs (the other one being the ridiculously crappy G.711). What’s not to like as a musician?
Why is this important?
Well… here’s the kicker.
None of it is new.
VoIP and video calling could have done this all before WebRTC.
But it didn’t.
Costs. Barriers of entry. Finding talent.
WebRTC solves all that, which is why I categorize all of these vendors and many others as WebRTC vendors.
They don’t care about WebRTC as a technology – for them it is just a means to an end, which is just fine.
But what about you?
Want to learn more about a specific market niche where real time communications can be of use? Want to instantly find who’s there already and what they are doing?
You’d better take a look at the WebRTC Dataset. Especially today, before the earlybird discount ends.
Knowledge=Power. Which is why the WebRTC Dataset might be just what you need.
A Quick History Lesson
You see numbers flying around about WebRTC all the time. One of them is the number of vendors using WebRTC. 1,200 might sound familiar in that context. Well… it comes from the WebRTC Dataset that I am maintaining.
It all started ages ago. I think it was Alan Quayle who made a shortlist of the companies that were using WebRTC that he knew about. That was somewhere in 2012. Which made me start my own Excel sheet. Which was then converted into a Google sheet. Which was then converted into a whole operation of how to find, catalog and update a dataset.
The reason? One of the main companies who are influencers in WebRTC wanted access to it and were willing to pay, so I made it into a product. Since then it has had a few more customers who got exclusive ongoing access to this dataset, and now I have decided to repackage it in a different fashion, making it more accessible to more companies.
What’s in the WebRTC Dataset?
The WebRTC Dataset itself is a collection of vendors and projects who are making use of WebRTC in one way or another. It can be anything from a healthcare service to an outsourcing vendor to a live streaming service or a contact center.
The list today includes around 1,200 vendors and counting – it grows and gets updated on a monthly basis.
You’ll find in the dataset vendors large and small. Anything from Google, Cisco and Facebook to small startups and even individual projects that are popular or interesting enough.
You’ll find there acquisitions made in the industry, with reasons behind them and my own indication of how related they are to WebRTC.
You’ll find there vendors who have shut down. Those who have pivoted and changed their focus.
When the information is publicly known, available or can be found online – the suppliers that are used by a vendor are also indicated.
Here’s an example vendor’s information from the WebRTC Dataset:
The page is split into several parts:
- The top part, where general information about the company/project can be found. Including things like size, ranking, status and external sources such as Crunchbase
- Then there’s the verbal description and notes, which gets updated through time as the offering evolves
- After that, different classifications. These are parameters that you can easily use to filter out or find similarly typed vendors in the dataset
- Then come links from other sources as well as my own blog and the latest tweets from that vendor
- Last but not least, a quick form in the end allows you to ask anything you have about this vendor and get direct answers from me
All over the web.
Since I am actively working on projects like WebRTC Index and WebRTC Weekly, I get to keep tabs on anything related to WebRTC. I go over the blogs of all the vendors using WebRTC and investigate anything that looks like RTC that I bump into in whatever it is that I am reading. On top of that, I use additional sources like Google Alerts and a few other trade secrets.
And I’ve been doing this since 2013.
The data in the WebRTC Dataset got created along the way. First as a resource for me to use whenever I need research information on certain domains. And then because it made sense to package it as a distinct product of its own.
Whatever is in the WebRTC Dataset is something you can go and find out on your own. But it will take you time. Lots and lots of time.
What can you DO With the WebRTC Dataset?
Lots of things actually. It all depends on what it is you’re trying to gain.
Here are a few ideas and uses that people have been using it for already:
- Mark potential companies as leads for your salespeople – if you have a cool solution or service that can fit a certain segment, then you can find some interesting companies who might be interested in what you have to offer in this dataset
- Check out your competitors – find who they are. See who their known customers are. Compare them to your own company
- Find target markets and their size – need to decide where to put your focus? Should it be Healthcare or should it be Education? Should you offer a click to dial button as a service or go for a video chat widget instead? Who are the competitors in the niche you’re trying to carve for yourself? What are they doing?
- Understand market trends – here’s what Serge Lachapelle, who was Group Product Manager at Google heading their WebRTC efforts, had to say of the time he used the WebRTC Dataset:
The dataset enables me to understand where the WebRTC platform is going and make strategic roadmap decisions based on where the innovation and heavy usage lies. Being able to get an updated complete view of the market at any given point in time over a large set of criteria makes it easy to see trends in different industries and verticals that make use of WebRTC.
I am sure you’ll be able to find other ways to use it if you only think about it.
I use this WebRTC Dataset all the time. One of the things I use it for is my annual “WebRTC State of the Market” infographic.
If you want to see what the WebRTC Dataset feels like to use, then here’s a short video:
I’m interested. Now what?
Access to the WebRTC Dataset comes at $2,400.
The WebRTC Dataset access gives you 1 month of access to all the vendors there. You’ll be able to download the main worksheet and use it after that month is up.
You can decide to purchase it at any point in time, just head to the WebRTC Dataset page.
While we’re at it – if you decide to purchase before the end of July (even if you plan on using it later on), there’s an early bird discount of $400. Just use coupon code DATASET-EARLYBIRD.
Where are we headed with WebRTC?
Google made an interesting announcement recently. It was about WebRTC 1.0 and Google’s own commitment to it. It seems we’ve come to a point in time when WebRTC is considered a done deal and the rest are just details – getting bugs fixed and polishing its performance.
I wanted to understand a bit more where we are headed, from the point of view of the company that led the effort up until now. So I reached out to Niklas Blum, who is leading product management for WebRTC at Google, to answer a few of my questions.
How is it like to manage something like WebRTC at Google?
WebRTC is an exciting project. It is one of these kind of projects that are only possible at companies like Google and a few other places when you think of scale and impact of the technology. We started about 6 years ago as an open source project in Chrome and now WebRTC is providing the stack for an ecosystem for real-time communication services on Web. From a product management perspective there are tons of requirements impacting the platform – ranging from enterprise multi-party communications to p2p video calling on bad networks and even streaming services. It’s a very challenging and exciting time, with so many opportunities to further evolve the product.
What metrics do you use to gauge WebRTC’s success?
We have very practical metrics like number of API requests and amount of media/data being consumed in Chrome from users that opt-in to share this data with us. From a product perspective, I like to measure the impact of the technology on the Internet. You are tracking for example the number of projects and services that build with WebRTC. The latest update I got from you was around 1200 projects and companies. I think this is a great metric reflecting the success of WebRTC and the impact we achieved by open sourcing it.
You recently made an announcement in discuss-webrtc around WebRTC 1.0. Why now?
We have reached our goal of having all the standards defined, and the technology is now stable enough for everyone to use. The web-based RTC ecosystem is becoming mature as more and more services that build on top of WebRTC are getting massive reach.
With Chrome, Edge, Firefox and Safari supporting WebRTC, about 80% of all installed browsers globally now have WebRTC built in. This is a big milestone for us as we are achieving our initial goal of making audio and video available in all browsers, through a uniform standardized set of APIs. Additionally, formerly application-focused communication services are transitioning towards the Web platform and adopting WebRTC.
We believe that interoperability between different WebRTC browsers is now of key importance to continue growing the adoption of WebRTC. It’s also of key importance to provide stability and a common ground to services and companies so they can continue growing a user base and eventually a flourishing business.
6 years in. What would you say worked great with WebRTC and what needs some improvement?
Our original mission to bring secure p2p real-time communication to the web has become real. This by itself is a major contribution to the Web platform and the team is incredibly proud of this achievement. Our current efforts can be split into two main categories:
- Finalize the specs in Chrome
- Provide enterprise-grade reliability
We are working very hard on performance and to iron out remaining reliability issues in Chrome to make WebRTC the solution of choice for enterprise-grade communication services. These efforts address bugs like missing audio-input from the microphone or when the camera is not detected. We are also getting close to launching a completely new echo canceller in Chrome for desktop. This should significantly improve the call quality when no headset is used on various devices. Additionally, we have major projects aiming at removing glitches in the audio and video capture and rendering processes. We are porting these time and resource critical processes to Mojo, a new process framework in Chrome. This will allow us to have a much better real-time performance in Chrome.
Looking 2 years ahead. What should we expect to see coming to WebRTC? AV1? Support for AR? …
Google is a founding member of AOMedia and very active in defining the AV1 bitstream. Once AV1 is finalized we will start work on adding it to WebRTC. AR/VR/Mixed Reality is a completely new technology space with the potential to change how we consume services and media fundamentally. But the platform and overall product/market-fit is still unclear. But adding AR/VR functionality to WebRTC is definitely an interesting idea.
An interesting opportunity for evolving WebRTC is to replace RTP with QUIC. Experimenting with QUIC as media transport protocol could reduce the transport-layer protocol overhead and provide integrated congestion control. We also consider using QUIC for the DataChannel that is being used a lot by p2p CDNs for content distribution. Generally, we believe that there are still quite a few opportunities for reinventing real-time communications.
Looking a bit further ahead, a new mobile network generation (5G) is emerging. Which role WebRTC will play here still needs to be identified. But generally, having more bandwidth and lower latency available will open the door to explore video resolutions >4K and massive parallel connections. Additionally, having new software-defined networking functionality exposed to the application-layer seems to be an interesting option to improve real-time communication services. It will be very interesting to see the opportunities for WebRTC here.
During your time as the product manager of WebRTC at Google. What was the thing that surprised you the most?
I am still surprised every day by the creativity of developers building great services on top of WebRTC and the value those provide to users. A company called Qbtech, for example, uses WebRTC in a product that assesses symptoms of ADHD. While traditional methods for assessing ADHD typically use subjective rating scales from physicians, Qbtech provides objective measurements by analyzing motion tracking over video. After implementing WebRTC, they went from specialized hardware to a web application that could run on a normal computer – opening up access to this technology to smaller clinics, schools, and even rural providers that might not have the resources for more specialized solutions.
Of course, there are many other great services that use WebRTC, but it’s this kind of out of the box thinking to apply WebRTC beyond its original audio/video calling use case and the value that is created by it that always surprises me.
How can developers contribute to WebRTC?
We have received thousands of user feedback reports and feature requests in the WebRTC and Chromium trackers from the growing WebRTC developer community. This feedback has been extremely valuable to improve WebRTC overall and especially to make it more stable for production usage. Generally, developers can provide feedback at bugs.webrtc.org by filing bugs or feature requests. And if you want to do more – you can become a contributor and help us polish the codebase – either as an individual or a company.
Are you sure you’re ready with your WebRTC product launch?
Here’s the thing. If you want to have a successful launch at the end of the project, you should focus on the planning phase in the beginning. Oh – and your plan should be different if you are going to self develop it all in-house or have the communication parts of it outsourced to external vendors.
Too often people contact me when they have already budgeted the project, spent the money, have a “product” in hand but it is lacking.
Two extreme cases recently:
- A startup hired a development company who said they know WebRTC. They were so good that they said there is no need to use a media server for 8 participants in a session unless the session is being recorded (if you think that way too, then you are more than welcome to take my WebRTC course)
- Another startup got delivered a finished product. Just to find out it didn’t even come with a TURN server
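To see why the "no media server for 8 participants" claim in the first case is a red flag, here is a back-of-the-envelope sketch. It is only an illustration under stated assumptions: every participant sends one video stream at a fixed bitrate (the 1000 kbps default is an assumed figure, not a measurement), and the function names are hypothetical. In a mesh each client uploads its stream once per remote peer, while with a forwarding media server (an SFU) it uploads once.

```python
def mesh_load(participants: int, stream_kbps: int = 1000) -> dict:
    """Rough per-client cost of a full mesh (no media server).

    stream_kbps is an assumed per-stream video bitrate, for illustration only.
    """
    peers = participants - 1
    return {
        # every pair of participants needs its own peer connection
        "peer_connections_total": participants * peers // 2,
        # each client encodes/sends its video once per remote peer
        "uplink_streams_per_client": peers,
        "uplink_kbps_per_client": peers * stream_kbps,
    }


def sfu_load(participants: int, stream_kbps: int = 1000) -> dict:
    """Same session routed through a single forwarding media server (SFU)."""
    return {
        # one connection per participant, to the server
        "peer_connections_total": participants,
        # each client sends its video once; the server fans it out
        "uplink_streams_per_client": 1,
        "uplink_kbps_per_client": stream_kbps,
    }


if __name__ == "__main__":
    for name, load in (("mesh", mesh_load(8)), ("sfu", sfu_load(8))):
        print(name, load)
```

With these assumptions, 8 participants in a mesh means 28 peer connections overall and 7 outgoing video streams per client, which is more than most home uplinks and laptops will handle gracefully. Real numbers depend on resolution, codec and bandwidth adaptation, so treat this as a sanity check rather than a capacity plan.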
We see this even more at testRTC, where I am a co-founder. Companies come to us because they think the product a third party developed for them doesn’t work as expected, and too often it breaks in ways that are unacceptable (like stressing the service with 20 browsers).
The problem is finding these issues too late in the game, and paying dearly for them.
There are lots and lots of images out there that illustrate the issue. I’ll use the one from Raygun:
We can dispute the multipliers. They don’t really matter. But here’s how a typical WebRTC product with outsourced development takes place:
- Someone writes down the requirements (amount of detail varies wildly here, and of course, you can use my requirements template)
- He sends a few development companies the RFP. Mostly, these will be sent to local developers, but sometimes to other vendors as well
- Once responses are back, he will pick one. Almost at random… (I know it isn’t random, but it will mostly lean towards cost, as the one in charge knows little in this domain anyway)
- Now we wait. Just like placing a cake in the oven and waiting for it to cook. Once the big day arrives, the customer plays with what he gets back and finds out there are some holes in it. Areas left uncovered, or just an impression of poor media quality
Here’s the problem. Steps (1), (2) and (3) are the Design phase. And no one took any real ownership of it in this scenario.
Step (4) is probably Development, Testing and Staging. And they were all left to an outsourcing vendor, who is most likely looking at this project from a cost perspective as well – but he doesn’t really care if this gets launched or not. Not really.
The customer got to step (5) immediately. With no milestones along the way. No checkpoints to see if everything is done correctly.
And please don’t sell me the story of agile development and how that will save the day for this customer. With agile he sees the “results” every week or two. In every sprint. But does he really know if that first Design phase was done properly?
Do you think you are getting a stable product that can scale to the millions (or even thousands) of users you plan on having? Are you sure your contractor feels the same way and didn’t build you a proof of concept instead?
Two things to do NOW about your WebRTC project:
#1 – Make sure you have a solid WebRTC architecture
Do you trust your vendor to build you the architecture you need?
You should do your homework or bring someone on “your side” that knows what he is doing.
Go now and look at the architecture you’ve been promised.
Take that architecture you’ve been given by the vendor and get a second opinion. It will be worth the time and effort.
#2 – Register to this free webinar
At testRTC we’ve decided to host a free webinar that deals with these issues exactly. And me? I decided to announce it here as well because I think it fits my readers AND it gets more people to know about testRTC, which is another thing I am a part of.
The webinar will take place on July 25 at 14:30 EDT.
It will be recorded and available for playback, so register now.
See you then!
Time for someone to offer an updated view of the WebRTC Developer Tools Landscape.
Want to learn more about the WebRTC Developer Tools Landscape?
As I am on vacation this week with my family (in Barcelona – ping me if you’re there this week and want to meet and say hi), I will keep this one short.
A few years back, Brad Bush of Genband released a WebRTC Landscape infographic. While he has updated it a couple of times, he since then stopped, leaving the WebRTC space altogether. His site was also lost in the abyss that is the Internet of un-renewed domain names, now hosting a Japanese website promising “services by married women” (Google’s translation).
I figured it was time to create another WebRTC Landscape infographic, but decided to be a bit more focused. Covering the whole WebRTC scene is rather hard, especially trying to do it on the first attempt. Which is why I decided to cover only the WebRTC Developer Tools Landscape in my infographic.
So without further ado, here it is:
A few quick notes:
- Permanent, high resolution, updated version can be found here
- Need it as a PDF? Get it here
- Feel free to share this and use it where needed
- Did I miss you? Ping me and we’ll remedy that – I plan on updating this infographic every couple of months
- Want to learn more? There’s a free Virtual Coffee session next week that will be focused on this Infographic. Register now
See you next week in my Virtual Coffee
A Video chat widget is something you may want to add to your website. And while WebRTC is the only alternative at the moment, there are many ways in which this can get done.
For all intents and purposes, I will assume we’re going to focus on a WordPress site (like my own, which has no video chat widget because it isn’t needed for my business).
I will base this article on this Quora question and the brief answer I’ve given there:
Which 3rd party software should I use for a video conferencing feature in my company’s intranet?
Company’s intranet using wordpress and we are looking for somehow a whitelabel solution (obviously not skype) which allow employee to be able to do video conferencing on their laptops and mobile devices. (Though it is intranet, it is hosted on the internet just that only for private use)
My answer on Quora:
You need something based on WebRTC (more about WebRTC here).
One way to go, is to search the WordPress plugins directory – webrtc – WordPress Plugins
But – most of the plugins there are too old or dead already.
Here’s a suggestion of 3 different vendors that should work well (even if they don’t have a plugin for WordPress, these can be easily created for them – either by you, a freelancer or the vendor himself once you contact him):
- RumbleTalk – RumbleTalk: Online group chat room for websites
- Coviu – Coviu video consultations
- Gruveo – The World’s Easiest Video Calls – Gruveo
These services are very different and very similar to one another at the same time.
Now, if you fail to find something that meets your exact needs, contact me directly and I’ll be able to help you further.
Intranet or Internet?
The question itself is about a company’s intranet.
Two options here:
- The deployment itself is on premise, hosted on the company’s data center and accessible only within the company’s local network or via a VPN
- The deployment itself is cloud based and the use of the term Intranet is just to indicate that this is for the company employees’ own consumption and not for customers or other people
The question’s clarification mentions that this is of the second type, but… I don’t really think it matters. The answers here will apply to both.
Also remember that:
- You can have WebRTC installed and operated within a company’s local network
- You can have a local installation configured to be accessed externally if and when needed (ugly, but possible)
- You can decide to have the WebRTC video chat part executed and hosted over the “cloud” even though everything else (including maybe signaling) is hosted in the company’s local network
Video chat in a website can take different shapes and sizes.
Here are a few of these:
- 1:1 video chat widget for potential prospects
- Video “phone” directory of employees, accessible to people coming to the website. This can be used as a meeting point of the employee and his online “smart” business card
- Meeting point of users. Not necessarily your employees, where they can discuss the content on the page or a specific topic
- Virtual conference rooms for group calls. Either internal for employees only or more likely to collaborate with external people as well
This list is probably missing a few more use cases – feel free to suggest them in the comments.
I intentionally left out the more glaring contact center use cases, as these got covered as part of the story of O2’s contact center from a few months ago.
What Features do you need?
There are different requirements that you might be after.
Here are some for you to consider:
- 1:1 or group calls? Group calls would require investing more. I’ve seen too many instances recently where mesh was being marketed, promoted and sold as a great and usable solution for group calls (it isn’t. If someone tells you that, know you aren’t dealing with someone with any experience or enough scars)
- Screen sharing. Want it? Need it?
- Recording. Are you interested in recording these video calls? All of them? Pick and choose? Voice only? Not at all?
- Video message. Can a user record and leave a message if no one’s there?
- Lobby. Do you need a lobby where people can select the room they want to join? Create new rooms? Find someone specific they want to chat with?
- Waiting room. Do you need a virtual waiting room? A place where people need to wait until a moderator or someone verifiable joins?
- Authentication. How do people authenticate? Is it part of the solution or will that be done by the fact that they are already on the website? Do you want to be able to have identifiers for these people (you do know we all have names – right?)
- Escalate from text. Do you need the ability to escalate a text chat conversation to a video one or the fact that people are on the page and pressed a button is enough to get them in?
- Sessions log. Is there a sessions log somewhere where you can see the sessions that were made? Might be something you need
- Downtime. Any status page of the service? Any ability to know when that part of the service is down? This is especially important when this is going to be one of your sales or support channels (it means money is involved)
- Working hours. Do you need this to appear only in specific hours of the day? Either during working hours or on off-hours
- Branding. Do you need this to be white labeled and carry your logo, colors, look and feel, etc. Or are you content with having the logo of a third party in there (with a link to their site of course)
- Mobile app. Do you need a mobile app for employees for this? For visitors? What should be the functionality of that mobile app?
Not exhaustive, but should get you started.
If you need to write that down, then you can use my requirements template for it.
How do you want this done?
There are a few WordPress plugins already available out there that use WebRTC and do “things”. Will these be suitable for your needs? I am not sure, but it is worth a shot.
If there aren’t, it doesn’t mean you can’t get it there. For a few 100’s of dollars, you can probably get a developer to wrap an existing solution as a WordPress plugin that is tailored for your needs.
And if you can’t find any existing solution that fits – you can have that developed for a larger sum of money and have it embedded as well.
I guess these are your choices if I had to put them on a graph:
Where do we go from here?
Yes. WebRTC enables putting video chat into websites.
Yes. There are solutions for it.
But somehow, people have an appetite once they see the capabilities.
When you plan on adding WebRTC video chat into your website, decide on the features you need and the budget you have. If they don’t match, either increase the budget or accommodate to it by changing your feature set.
I will be talking later this month in my Virtual Coffee about the WebRTC Developer Tools Landscape. I won’t be touching WordPress directly, but this is something you don’t want to miss as it will touch the various development strategies and the tools out there at your disposal.
Join me for a free Virtual Coffee on the WebRTC Developer Tools Landscape. Register now
The post Video Chat for a Website with WebRTC. Where to Start? appeared first on BlogGeek.me.
It seems like we’re in an ongoing roller coaster when it comes to CPaaS. Tropo has just closed its doors. At least if you want to be a new customer (which is doubtful at best at this point in time).
Looking for the public announcement that got deleted? You can find it here
Tropo and Cisco – Where it all started?
Two years ago (give or take a month), Cisco acquired Tropo. I wrote about it at the time, trying to figure out what Cisco would focus on when it comes to Tropo’s roadmap. I guess I was quite wrong in where I saw this headed and where it went to…
The two questions I did ask in that article that were probably the most important were:
Will developers be attracted to Tropo or will they look for their solution elsewhere?
Was this acquisition about the technology? Was it about the existing customer base? Was it about the developer ecosystem?
We’ve got an answer to the first question – they went elsewhere. If this was a growing business, then developers would be attracted, but it wasn’t.
Looking back at my recent update to my WebRTC API report, it made perfect sense that this will be shut down soon enough. Just look at the investment made by Tropo and then Cisco into the platform:
Nothing serious was done for developers on the WebRTC part of the platform since September 2014, and then after the acquisition, focus of the Tropo developers went to Cisco Spark.
This answers my second question – this wasn’t about technology or customers or even the existing developer ecosystem of Tropo. It was about the developers of Tropo and their understanding of developer ecosystems. It was about talent acquisition for Cisco Spark. Why kill the running business that is Tropo? I really don’t know or understand. I also fail to see how such a signal can be a good one to developers who plan on using other Cisco APIs.
I don’t know what was the rationale of Cisco to acquire Tropo in the first place. If all they wanted was the talent, then this seems like too big of a deal to make. If it was for the platform, then they sure didn’t invest in it the needed resources to make it grow and flourish.
Now, here’s why I think this move took place, and it probably happened at around the time of the acquisition and in parallel to it.
UCaaS or CPaaS? The Shifting Focus
We’ve got two worlds in a collision course.
UCaaS – Unified Communications as a Service – the “modern” day voice and video communications sold to enterprises.
CPaaS – Communications Platform as a Service – the APIs used by developers to embed communications into their products.
UCaaS has a problem. Its problem is that it always saw the world in monochrome. If you were an organization, then ALL of your communication needs get met by UCaaS (or UC, or convergence or whatever name it had before). And it worked. Up to a point. Go to most UC or UCaaS vendors’ websites and look at their solutions or industries section.
The screenshot above was taken from Polycom’s homepage (couldn’t find one on Cisco since their move to Cisco Spark). Now, notice the industries? Education, Healthcare, …
All of these industries are now shifting focus. Many of the solutions found in these industries that used to go to expensive UC vendors are now going to niche vendors offering solutions specifically for these niches. And they are doing it by way of WebRTC. Here are a few examples from interviews I’ve had done over the years on this site:
You’ll even find some of these as distinct verticals in my updated WebRTC for Business People free report.
The result? The UCaaS market is shrinking because of WebRTC and CPaaS.
Why is this happening? Three reasons that come to play here:
- Cloud. And the availability of CPaaS to build whatever it is that I want
- WebRTC (and CPaaS), which is downsizing this thing we call communications from a service into a feature – something to be embedded in other services
- The rise of the gig economy. In many cases, a business is offering communications not between its employees at all, but rather between its users. Think Uber – drivers and riders. Airbnb – hosts and travelers. Upwork – employers and freelancers.
With the rising popularity of the dominant CPaaS vendors, there’s a threat to UCaaS: at the end of the day, UCaaS is just a single example of a service that can be developed using CPaaS. And you start to see a few entrepreneurs actually trying to build UCaaS on top of CPaaS vendors.
If I had to draw the relationship here, I’d do it in this way probably:
Just the opposite of what UC vendors have imagined in the last two decades. And a really worrying possibility of what the future may look like.
What should a UCaaS vendor do in such a case?
So now UCaaS vendors are trying to add APIs to serve as their CPaaS layer on top of their existing business. Most of them do it with the intent of making sure their UCaaS customers just don’t go elsewhere for communications.
Think of a bank that has an internal UCaaS deployment. He now wants to build a contact center. He has 3 ways of doing that:
- Go to a contact center vendor. Probably a cloud based one – CCaaS
- Build something using CPaaS. By himself. Using external developers. Or just someone that is already integrated with CPaaS
- Try and do something with his UCaaS vendor
If (3) happens, then great.
If (1) happens, then it is fine, assuming the UCaaS vendor offers contact center capabilities as well. And if he doesn’t, then there’s at least a very high probability that there’s no real competition between him and that CCaaS vendor
If (2) happens… well… that can be a threat in the long run.
And did we talk about Uber, Upwork and Airbnb? Yes. They have internal communications between their employees. Maybe there’s some UCaaS in there. But most of the communications generated by these companies don’t even involve their employees – only their customers and users. Where the hell do you fit UCaaS in there?
The APIs are there as a moat against the CPaaS vendors. Which is just where Cisco decided to place Tropo. As Cisco’s moat for its UCaaS offering – Cisco Spark. The Tropo team are there to make sure the APIs are approachable and create the buzz and developer ecosystem around it.
Quite an expensive moat if the end result was just killing Tropo in the process (the asset that was acquired).
The Surprise Announcement
Last week, Tropo had a blog post published on their site and had it removed after a few hours.
Since Tropo publishes the full post content into their RSS, and Feedly grabs the content and has it archived, the information in that post was not lost. I copied it from Feedly and placed it in a shared Google Doc – you can find it here.
The post is no longer on Tropo’s website. A few reasons why this may be:
- Someone higher up probably decided to do this quietly by contacting the most active customers only
- The post was published too early, and Cisco wanted to break the news a bit later
- The post got too many queries from angry developers, so they took it down, until they rewrite it to a more positive note
That last one is what I think is the real reason. And that’s because after reading it more than once, I still find it rather confusing and not really clear.
This announcement from Tropo has two parts to it:
- We’ve got your back with our great Cisco Spark commitment to developers (an API portal, a marketplace, ambassadors program and an innovation fund)
- Tropo is dead if you aren’t on to Cisco Spark. But we will honor our SLA terms (because we must)
That first part is about CPaaS serving UCaaS. Cisco Spark is UCaaS with a layer of APIs that Cisco is now trying to grow.
It is also about damage control – if you’re a Tropo user, then this whole acquisition thing by Cisco probably ended badly for you. So they want to remind you about the alternatives they now offer.
The second part is about how they see current Tropo developers:
- If you want to onboard to Tropo as a new customer, then tough luck. Doors are closed. Go elsewhere. Or go enroll to the Cisco Spark ISV Program
- If you are just trying the system out and have an account in there that has no apps in production environment (i.e – you are a freeloader and not a paying customer) – you will be thrown out, no questions asked
- If you are an existing paying customer running in production, then Tropo “will honor all contractual obligations”
This reads to me like the acquisition of AddLive by Snapchat. First, the service was acquired and closed its doors to new customers. It honored existing contracts. And it shut down its platform to all existing customers this year – once all contracts have ended.
- Don’t expect Tropo to renew existing paying contracts
- Expect Cisco to push paying Tropo customers to Cisco Spark instead
- Tropo won’t add new features, and the existing ones, including any support to them will deteriorate, but will still fit into the SLA they have in place
Tropo was predominantly about voice and SMS.
It was also the biggest competitor of Twilio some 6-7 years ago, when both were comparable in size and focus. Or at least until Tropo shifted its focus from developers to carriers.
There are two places where Tropo customers can go:
- Cisco Spark. I think this won’t happen in droves. Some will be headed there, but most will probably go look for a different home
- Other CPaaS vendors. Mostly Twilio, but some will go to Nexmo, Plivo or Sinch. Maybe even TeleSign
The big winner here is Twilio.
The big losers are the developers who put their fate in Tropo.
CPaaS: Buyers Beware
As I always say, the CPaaS market is a buyers beware one.
Developers can and should use CPaaS to build their applications. It will reduce a lot of the upfront costs and the risks involved in starting a new initiative. It gives you the ability to explore an idea, fine tuning and tweaking it until you find the right path in the market. But it comes at a price. And I don’t mean the money you need to put in order to use CPaaS. I mean the risk involved in the fluidity of this market.
The trick then, is to pick a CPaaS vendor that doesn’t only fit your feature requirements, but is also here to stay. And it is hard to do that.
Large companies are not necessarily the right approach – Cisco/Tropo as an example. Or AT&T shutting down its WebRTC APIs.
Small startups aren’t necessarily the answer either – they might shut down or pivot. SightCall pivoted. OpenClove has silently closed doors.
Dominant players might not be a viable alternative – if you consider Tropo such a player, and Cisco as a sure bet of an acquirer – and where it now ended.
You can reduce these risks if you know who these CPaaS vendors are and where they are headed. If you follow them for quite some time and understand if and what they are investing in. And if you need help with that, just follow my blog or contact me.
The post Another One Bites the Dust: Tropo Closes its Doors to New Customers appeared first on BlogGeek.me.
Too many WebRTC open source projects. Not enough good ones.
Ever went to github to search for something you needed for your WebRTC project? Great. Today, there’s almost as many WebRTC github projects as there are WebRTC LinkedIn profiles.
Some of these code repositories really are popular and useful while others? Less so.
Here’s the most glaring example for me –
When you just search for WebRTC on github, and let it select the “Best match” by default for you, you’ll get PubNub’s sample of using PubNub as your signaling for a simple 1:1 video call using WebRTC. And here’s the funny thing – it doesn’t even work any longer. Because it uses an old PubNub WebRTC SDK. And that’s for an area that requires less of an effort from you anyway (signaling).
Let’s assume you actually did find a WebRTC open source media server that you like (on github of course). How do you know that it is any good? Here are 10 different signals (not WebRTC ones) that you can use to make that decision.
Need to pick a WebRTC media server framework? Why not use my Free Media Server Framework Selection Worksheet when checking your alternatives?
Get the worksheet
1. Do You Grok the Code?
If you are going to adopt an open source media server for your WebRTC project then expect to need to dive into the code every once in a while.
This is something you’ll have to do either to get the darn thing to work, fix a bug, tweak a setting or even write the functionality you need in a plugin/add-on/extension or whatever name that media server uses for making it work.
In many of the cases I see when vendors rely on an open source WebRTC media server framework, they end up having to dig in its code, sometimes to the point of taking full ownership of it and forking it from its baseline forever.
To make sure you’re making the right decision – check first that you understand what you’re getting yourself into and try to grok the code first.
My own personal preference would be code that has comments in it (I know I have high standards). I just can’t get myself behind the notion of code that explains itself (it never does). So be sure to check if the non-obvious parts (think bandwidth estimation) are commented properly while you’re at it.
2. Is the Code Fresh?
Apple just landed with WebRTC. And yes. We’re all happyyyyy.
But now we all need to shift our focus to H.264. Including that WebRTC media server you’ve been planning to use.
Oh – and Google? They just announced they will be migrating slowly from Plan B to Unified Plan. Don’t worry about the details – it just changes the way multiple streams are handled. Affecting most group calling implementations out there.
And there was that getStats() API change recently, since the draft specification of WebRTC finally decided on the correct definition of it, which was different from its Chrome implementation.
The end result?
Code written a year or two ago has serious issues in actually running.
Without even talking about upgrades, updates, security patches, etc.
Just baseline “make it work” stuff.
When you check that github page of the WebRTC media server you plan on adopting – make sure to look when it was last updated. Bonus points if you check what that update was about and how frequently updates take place.
3. Anyone Using It?
Nothing like making the same mistakes others are making.
Err… wrong expression.
What you want is a popular open source WebRTC media server. Anything else has a reason, and that reason won’t be that you found a diamond in the rough.
Go for a popular framework. One that is battle tested. Preferably used by big names. In production. Inside commercial products.
Why? Because it gives you two things you need:
It gives you validation that this thing is worth something – others have already used it.
And it gives you an ecosystem of knowledge and experience around it. This can be leveraged sometimes for finding some freelancers who have already used it or to get assistance from more people in the “community”.
I wouldn’t pick a platform only based on popularity, but I would use it as a strong signal.
4. Is This Thing Documented?
What you get in a media framework for WebRTC is a sort of an engine. Something that has to be connected to your whole app. To do that, you need to integrate with it somehow – either through its APIs when you link against it – or a lot more commonly these days by REST APIs (or GraphQL or whatever). You might need both if you plan on writing your own addon/extension to it.
And to do that, you need to know the interface that was designed especially for you.
Which means something written. Documentation.
When it comes to open source frameworks, documentation is not guaranteed to be of specific quality – it will vary greatly from one project to another.
Just make sure the one you’re using is documented to a level that makes it… understandable.
If possible, check that the documentation includes:
- Some introductory material to the makeup and architecture of the project
- An API reference
- A few demos and examples of how to use this thing
- Some information about installation, configuration, maintenance, scaling, …
The more documentation the better off you will be a year down the road.
5. Is It Debuggable?
WebRTC is real time and real time is hard to debug.
It gets harder still if what you need to look at isn’t the signaling part but rather the media part.
I know you just LOVE adding your own printf and cout statements in your C++ code and try reproducing that nagging bug. Or better yet – start collecting PCAP files and… err… read them.
It would be nice though if some of that logging, debugging, etc would be available without you having to always add these statements into the code. Maybe even have a mechanism in place with different log levels – and have sensible information written into the logs so the time you’ll need to invest in finding bugs will be reduced.
Also – make sure it is easy to plug a monitoring system to the “application” metrics of the media server you are going to use. If there is no easy way to test the health of this thing in production, it is either not used in production or requires some modifications to get there. You don’t want to be in that predicament.
While at it – make sure the code itself is well documented. There’s nothing as frustrating (and as stupid) as the self explanatory code notion. People – code can’t explain itself – it does what it does. I know that the person who wrote that media server was god incarnate, but the rest of us aren’t. Your programmers are excellent, but trust me – not that good. Pick something maintainable. Something that is self explanatory because someone took the time to write some damn good comments in the tricky parts of the code. I know this one is also part of grokking the code, but it is that important – check it twice.
For me, the ability to debug, troubleshoot and find issues faster is usually a critical one for something that is going to get into my own business. Just a thought.
6. Does it Scale?
Media servers are resource hogs (check this video mini series for a quick explanation).
This means that in all likelihood, if your business becomes successful in any way, you will need more than a single media server to run at “scale”.
You can always crank it up on Amazon AWS from m4.large up to m4.16xlarge, but then what’s next?
At the end of the day, scaling of media servers comes down to adding more machines. Which is simple enough until you start dealing with group calls.
Here’s an example.
- Assume a single machine can handle 100 participants, split into any group type (I am simplifying here)
- And we have 10 participants on average in each call
- Each group call can have 2 participants, up to… 50 participants
Now… how do we scale this thing?
How many machines do we need to put out there? When do we decide that we don’t add new calls into a running machine? Do we cascade these machines when they are fully booked? Throw out calls and try to shift them to other machines?
I don’t know, but you should. At least if you’re serious about your product.
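One way to start reasoning about these questions is to write the placement rule down. The sketch below is not taken from any media server; it is a first-fit placement under the numbers of the example above (100 participants per machine, calls of 2 to 50 participants). The names `Machine`, `place_call` and `reserve_per_call` are made up for illustration. The idea is to count each call as at least `reserve_per_call` participants, so a call that starts small still has room to grow on the machine it landed on.

```python
from dataclasses import dataclass, field

MACHINE_CAPACITY = 100  # participants per machine (from the example above)
MAX_CALL_SIZE = 50      # largest group call allowed (from the example above)


@dataclass
class Machine:
    """Hypothetical media server instance tracking the calls it hosts."""
    name: str
    calls: dict = field(default_factory=dict)  # call_id -> current participants

    def reserved(self, reserve_per_call: int) -> int:
        # Count every call as at least reserve_per_call participants,
        # so small calls keep some headroom to grow.
        return sum(max(size, reserve_per_call) for size in self.calls.values())


def place_call(machines, call_id, size, reserve_per_call=10):
    """First-fit: put the call on the first machine with enough reserved
    capacity left, or add a new machine when none has room."""
    if not 2 <= size <= MAX_CALL_SIZE:
        raise ValueError("call size must be between 2 and 50 participants")
    needed = max(size, reserve_per_call)
    for machine in machines:
        if machine.reserved(reserve_per_call) + needed <= MACHINE_CAPACITY:
            machine.calls[call_id] = size
            return machine
    machine = Machine(name=f"media-{len(machines) + 1}")
    machine.calls[call_id] = size
    machines.append(machine)
    return machine
```

Reserving the average (10) packs ten calls onto a machine but lets it overflow if several calls grow towards 50. Reserving the worst case (50) never overflows but fits only two calls per machine. Where you land between the two, and whether you cascade or migrate calls when a machine fills up, is exactly the decision this section is about.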
You probably won’t find the answers to this in the open source WebRTC media server’s documentation, but you should at least make sure you have some reasonable documentation of how to run it at scale, and not as a one-off instance only.
7. What Languages Does it Use?
They wrote it in Kotlin.
Because that’s the greatest language ever. Here. Even Google just made it an official one for Android.
What can go wrong?
Two things I look for in the language being used:
- That the end result will be highly performant. Remember it’s a resource hog we’re dealing with here, so better start with something that is even remotely optimized
- That I know how to use and have it covered by my developers
Some of these things are Node.js based while others are written in Java. Which one are your developers more comfortable with? Which one fits better with your technology plans for your company in the next 5 years?
If you need to make a decision between two media servers and the main difference between them for you is the language – then go with the one that works better for your organization.
8. Does It Fit Your Signaling Paradigm?
Three things you need in your WebRTC product:
- Signaling server
- STUN/TURN server
- Media server
That 3rd one isn’t mandatory, but it is if you’re reading this.
That media server needs to interact with the signaling server and the STUN/TURN server.
Sure, you can use the signaling capabilities of the media server, but they aren’t really meant for that, and my own suggestion is not to put the media server publicly out there for everything – have it controlled internally in your service. Exposing it directly just doesn’t make architectural sense to me.
So you’ll need to have it interacting with your signaling server. Check that they both share similar paradigms and notions, otherwise, this in itself is going to be quite a headache.
While at it – check that there’s easy integration between the media server you’re selecting and the STUN/TURN server that you’ve decided to use. This one is usually simple enough, but just make sure you’re not in for a surprise here.
9. Is the License Right For You?
BSD? MIT? APL? GPL? AGPL?
What license does this open source WebRTC media server framework come with exactly?
Interestingly, some projects switch their license somewhere along the way. Jitsi did it after its acquisition by Atlassian – moving from LGPL to a more permissive APL.
What your business model looks like, and the way you plan to deploy the service, are going to affect the types of open source licenses you will want and will be able to adopt inside your service.
There are different types of free when it comes to open source licenses.
Every piece of code you pick and use needs to adhere to these internal requirements and constraints that you have, and that also includes the media server. Since the media server isn’t something you’ll be replacing in and out of place like a used battery, my suggestion is to pick something that comes with a permissive open source license – or go for a commercial product instead (it will cost you, but will solve this nagging issue).
I’ve written about open source licenses before – check it out as well.
10. Is Anyone Offering Paid Support For It?
Yes. I know.
You’re using open source to avoid paying anyone.
And yet. If you check many of the successful and well maintained open source projects – especially the small and mid-sized ones – you will see a business model behind them. A way for those who invest their time in it to make a living. In many cases, that business model is in support and custom development.
Having paid support as an option means two things:
- Someone is willing to take ownership and improve this thing and he is doing it as a day job – and not as a hobby
- If you’ll need help ASAP – you can always pay to get it
If no one is offering any paid support, then who is maintaining this code and to what end? What would encourage them to continue with this investment? Will they just decide to stop investing in it next month? Last month? Next year?
Making the Decision
I am not sure about you, but I love making decisions. I really do.
Taking in the requirements and the constraints, understanding that there are always unknowns and partial information. And from there distilling a decision. A decisive selection of what to go with.
You can find more technical information on media servers in this great compilation made by Gustavo Garcia.
After you take the functional requirements that you have, and find a few suitable open source WebRTC media frameworks that can fit these requirements, go over this list.
See how each of them addresses the points raised. It should help you get to the answer you are seeking about which framework to go with.
Towards that goal, I also created a Media Server Framework Selection Sheet. Use it when the need comes to select an open source WebRTC media server framework for your project.
The post 10 Tips for Choosing the Right WebRTC Open Source Media Server Framework appeared first on BlogGeek.me.
Last week WWDC was a happy occasion for those who deal with WebRTC. For the first time, we got the official word – and code – of WebRTC on Apple devices – WebRTC iOS and WebRTC Safari support is finally here.
I spent the time since then talking to a couple of developers and managers who either tinkered with Safari 11 already or have to make plans in their products for this development.
I came out a bit undecided about this whole thing. Here is where things stand.
Apple’s Coverage
The WebKit site has its own announcement – as close as we’ll ever get to an official announcement from Apple with any detail it seems.
What I find interesting with this one (go read it – I’ll wait here):
- Google isn’t mentioned (god forbid). But their “LibWebRTC open source framework” is. Laughed my pants off of this one. The lengths companies would go to not mention one another is ridiculous
- WebRTC in WebKit isn’t mentioned. Two years ago I opined it wouldn’t move the needle. And it seems like it didn’t – or at least not enough for Apple to mention it in this WebKit announcement
- TokBox and BlueJeans are mentioned as early beta. TokBox I understand. Can’t miss one of the dominant players in this space. But BlueJeans? That was a surprise to me
First things first, here are some posts that were published already about Apple’s release of WebRTC Safari (in alphabetical order):
- Peer5 – great for the industry. And how it makes sense when Flash is dying and an alternative low latency solution is needed in the browser
- Quobis – checked their Sippo status. Interesting to see what works and doesn’t “out of the box”
- The New Dial Tone – hold on with the party. Amir Zmora details what’s included in Apple’s WebRTC as we know it now
- TokBox – excited about WebRTC in iOS 11 and Safari. Mostly due to the fact that this solves spontaneous use cases
- WebRTC.is – WebRTC ships in macOS and iOS 11. Erik Lagerway wakes up after a year of non-activity on his blog
- WebRTC.ventures – WebRTC support in Safari 11. Arin Sime weighs in on what to expect in the short-mid term (=nothing yet), but sees a bright future
I’ve ignored a few generic posts that just indicated WebRTC is out there.
Most relevant Twitter thread on this?
— Saúl Ibarra Corretgé (@saghul) June 6, 2017
Who’s Missing?
Well… the general media.
I haven’t really seen the keynote. I find the Apple keynotes irrelevant to my needs, so I prefer not to waste my time on them. My understanding is that WebRTC appeared there as a word on a slide. And that’s HUGE! Especially considering the fact that none of the general technology media outlets cared to mention it in any way.
Not even the general developers media.
What was mentioned instead were things like the iOS file manager. That seems a lot more transformative.
This isn’t a rant about general media and how they miss WebRTC. It is a rant about how WebRTC as a technology has not caught the attention at all, outside a little circle of people.
Is it because communications is no longer interesting?
At a large developers event here in Israel on that same week, there were 6 different tracks: Architecture, Full stack, Philosophy, Big Data, IOT, Security. No comms.
Not sure what the answer is…
On one hand, WebRTC is transformative and important. On the other hand, it is mostly ignored.
WebRTC Safari Support
So what did get included in the WebRTC Safari implementation by Apple?
- The basics. GetUserMedia and PeerConnection
- One-on-one voice and video calls seem to work well across browsers AND devices (including things like Safari to Edge)
- Opus is there for audio
- H.264 only for video. There is no VP8 or VP9. More on that later
- The code supports Plan B, if you are into multiparty media routing
- Data Channel is supported, but too buggy to be used apparently
- No screen sharing. Yet
- This is both for the desktop and mobile:
- WebRTC Safari support is there for macOS in the new version of Safari
- WebRTC iOS support is there for iOS 11 – via the Safari web browser, and maybe(?) inside WebViews
This is mostly expected. See below why.
WebKit, Blink and WebRTC
A long time ago, in a galaxy far far away. There was a browser rendering engine called WebKit. Everyone used WebKit. Especially Google and Apple. Google used WebKit for Chrome. And Apple used WebKit for Safari.
And then one day, in the middle of the development of WebRTC, Google decided enough is enough. They forked WebKit, renamed it to Blink, removed all excess code baggage and never looked back.
Apple never did care about WebRTC. At least not enough to make this thing happen until last week. I hope this is a move forward and a change of pace in Apple’s adoption of WebRTC.
Here’s what I think Apple did:
- Seems like they just took the Google code for WebRTC and hammered at it until it fit nicely back into WebKit (ignoring WebRTC in WebKit in the process)
- How did they modify it? Remove VP8. Add H.264 by hooking it up with the hardware codec in iOS and on the Mac
- And did the rest of the porting work – connecting devices, etc.
- Plan B is there because. Well. Google uses Plan B at the moment. And it just stands to reason that the code Apple had was Plan B
When it comes to WebRTC, the question is one of browser interoperability. There aren’t many browser vendors (I am counting four major ones).
The basics seem to work fine. If you run a simple peer-to-peer call between any of the 4 browsers, you’ll get a call going. Voice and video. The lowest common denominator for that seems to be Opus+H.264 due to Safari. Otherwise, Opus+VP8 would also be a possibility.
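To make the "lowest common denominator" point concrete, here is a small sketch that pulls the video codec names out of two SDP blobs and intersects them. The SDP snippets are hand-written for illustration and heavily trimmed; they are not captured from real browsers, and the helper name `video_codecs` is made up. Only the `a=rtpmap` lines under `m=video` are looked at, and helper payloads such as `rtx` are skipped.

```python
import re

# Matches lines such as "a=rtpmap:96 VP8/90000" and captures the codec name.
RTPMAP = re.compile(r"^a=rtpmap:\d+ ([^/\s]+)/\d+")
# Payload types that are not video codecs in their own right.
NOT_CODECS = {"rtx", "red", "ulpfec"}


def video_codecs(sdp: str) -> set:
    """Return the upper-cased codec names listed in the m=video section."""
    codecs = set()
    in_video = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            in_video = line.startswith("m=video")
        elif in_video:
            match = RTPMAP.match(line)
            if match and match.group(1).lower() not in NOT_CODECS:
                codecs.add(match.group(1).upper())
    return codecs


# Illustrative, trimmed SDP: a browser offering VP8, VP9 and H.264.
CHROME_LIKE = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 100
a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=rtpmap:98 VP9/90000
a=rtpmap:100 H264/90000
"""

# Illustrative, trimmed SDP: a browser offering H.264 only.
SAFARI_LIKE = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 rtx/90000
"""

common = video_codecs(CHROME_LIKE) & video_codecs(SAFARI_LIKE)
```

With these two samples `common` ends up as `{"H264"}`, which is the situation described above: as soon as an H.264-only endpoint joins, H.264 is the only video codec left to negotiate.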
The challenge starts when what you’re trying to do is multiparty. While H.264 is supported by all browsers, the ability to use simulcast with H.264 isn’t. At the moment, Chrome doesn’t support simulcast with H.264. The current market perception is that multiparty video without simulcast is meh.
Doing group calling?
- Go for audio only
- Force everyone to use H.264 if you need Safari (either as a general rule or when the specific session has someone using Safari) – and understand that simulcast won’t be available to you
Now it is going to become a matter of who blinks first: Google by adding H.264 simulcasting to Chrome; or Apple by adding VP8 to Safari.
The Next Video Codec War
Which leads us to the next frontier in the video codec wars.
If you came in late to the game, then know that we had over 3 years of continuous bickering regarding the mandatory video codec in WebRTC. Here’s the last I wrote about the codec wars when the Alliance for Open Media formed some two years ago.
At the time, both VP8 and H.264 were defined as mandatory video codecs in WebRTC. The trajectory was also quite obvious:
After H.264 and VP8, there was going to be a shift towards VP9 and then a leap towards AV1 – the new video codec currently being defined by the Alliance for Open Media.
Who isn’t in the alliance? Apple.
And it seems that Apple decided to bank on HEVC (H.265) – the successor of H.264. This is true for both iOS and macOS. This is done through hardware acceleration for both the encoder and the decoder, with the purpose of reducing storage requirements for video files and reducing bandwidth requirements for streaming.
But it goes to show where Apple will be going with WebRTC next. They will be adding HEVC support to it, ignoring VP9 altogether.
There’s a hefty cost in taking that route:
- H.264 is simple yet expensive – when you use it, you need to pay up royalties to a single patent pool – MPEG-LA
- HEVC is complex AND expensive – when you use it, you need to pay up royalties for MULTIPLE patent pools – MPEG-LA, HEVC Advance, Velos Media. Wondering which one you’ll need to pay for and how much? Me too
Which is why I think Apple is taking this route in the first place.
Apple has its own patents in HEVC, and is part of the MPEG-LA patent pool.
And it knows royalties is going to be complex and expensive. Which makes this a barrier for other vendors. Especially those who aren’t as vertically integrated – who needs to pay royalties here? The chipset vendor? The OS vendor? The handset manufacturer?
By embedding HEVC in iOS 11 and macOS High Sierra, Apple is doing what it does best – differentiating itself from everyone else based on quality:
- It has hardware acceleration for HEVC. Both encoding and decoding
- It starts using it today on its devices, and “magically” media quality improves and/or storage/network requirements decrease
Google, and Android by extension, won’t be adding HEVC. Google has taken the VP9 route. But in VP9 most solutions are software based – especially for the encoder. Which means that using VP9 eats up CPU. Results look just as good as HEVC, but the cost is higher on CPU and memory needs. Which means an “inferior” solution.
Which is exactly what Apple wants and needs. It just doesn’t care if interoperability with other devices requires lowering quality as the perception of who’s at fault will fall flat on Android and Google and not on Apple.
Don’t expect to see VP9 or AV1 anytime soon in Apple’s roadmap. Not for WebRTC and not for anything else.
Dan Rayburn covers the streaming side (non-WebRTC) of this HEVC decision quite nicely on StreamingMedia.
Oh, and while at it, Jeremy Noring wrote a well thought out rant about this lack of support for VP8 and VP9. His suggestion? Go vote for bug #173141 on WebKit. It probably won’t help, but it will make you feel a bit better.
The only way I see this being resolved? If Google retracts their support for H.264 and just blatantly removes it from Chrome until Apple adds VP8 to Safari. I’ll be happy to see this happening.
FaceTime, Multiparty and WebRTC
Apple has FaceTime.
And FaceTime probably doesn’t use WebRTC. I am not sure if it ever will.
When Google came out with WebRTC, they had the Hangouts (now Meet) team about a year behind in its adoption of WebRTC as their underlying technology – but the intent and later execution was there.
When Microsoft came out with WebRTC, Skype didn’t support WebRTC. But they did launch Skype for Linux which is built somewhat on top of WebRTC, and Skype for Web which is taking the same route. Call it ORTC instead of WebRTC – they are one and the same as they are set to merge anyway.
Apple? Will they place FaceTime on top of WebRTC? I see no incentive there whatsoever.
Can Cisco change this? Rowan Trollope broke the news titled “Cisco and Apple Announce New Features” that WebEx and Cisco Spark now work on Safari – no download needed. I’ll translate it for you by adding the missing keyword here – WebRTC. Cisco is using WebRTC to do that. And since their stack is built atop H.264, they got that working on Safari.
Cisco and Apple here is interesting. While Cisco mentions this in the headline as if these new features were done jointly, Apple isn’t really acknowledging it. There’s no quote from an Apple representative, and at the same time, Cisco isn’t mentioned in the WebKit announcement – TokBox and BlueJeans are.
Back to FaceTime. Which is a 1:1 video chat service.
And the fact that many look into group video chat and other multiparty video configurations.
Will Apple care enough to support it well in WebRTC? Will it move from Plan B to Unified Plan? Will it care about simulcast? Will it invest in SVC? Will it listen and work with Cisco on its multiparty needs for the benefit of us all?
Older Devices
Apple will not be upgrading iPhone 5 devices to iOS 11. That’s a 5-year-old device.
In many requirement documents I see a request to support iPhone 4.
Will this bump the general audience out there to focus on iPhone 6 and upwards, seeing what Apple is doing as well? Will this mean vendors will need to port WebRTC on their own to support older devices?
Time will tell, but I think that switching to iPhone 6 and above and focusing there makes sense.
Chrome/Firefox support on iOS
Here’s a question for you.
If you use Chrome or Firefox on iOS. And open a URL to a site using WebRTC. Will that work?
Here’s the catch.
The reason there was no real browser for iOS that supported WebRTC until today? Apple mandates WebKit as the rendering engine on any browser app that comes to its App Store.
Now that WebKit is getting official WebRTC support – will Chrome and Firefox add WebRTC support to their iOS browsers?
And if they do – you’ll be getting the Apple restrictions. I can just see the WebRTC developer teams at Google and Mozilla cringing at this thought.
This can get really interesting if and when Apple decides to add HEVC support to WebRTC in its WebKit implementation of iOS. You’ll get Chrome on iOS with H.264 and HEVC and Chrome everywhere else with H.264, VP8 and VP9.
Fun times.
What Should Developers Do?
Here’s what you’ve been waiting for. The question I’ve been asked multiple times already:
Do I need to build an app? Should I wait?
The suggestion at the moment is to wait. Question is for what and until when exactly.
If you are looking for a closed app and planning on developing native, then go with whatever worked for you until today. This news item just isn’t for you.
If you are looking for browser support on iOS, then go with Safari and plan on enabling H.264 video codec in your service. Don’t wait up for VP8.
If you want something that will be cross platform app development using HTML5, then wait. WebView WebRTC support in iOS is still an unknown. If it gets there in the coming months then waiting a few more minutes probably won’t make a real difference for you anyway.
My Updated Cheat Sheet
As it is, this change with Safari, iOS and macOS required some necessary updates in my resources.
First to update is the WebRTC Device Cheat sheet. You can find the updated one in the same download page.
One last thing –
Join me and Philipp Hancke for Virtual Coffee
I planned for a different Virtual Coffee session this month. One about developer tools. It got bumped into July.
The session takes place on Monday, June 19 at 15:30 EDT.
It is free to join, but will not be available later as a recording (unless you are a customer).
And while at it – don’t mix signaling with NAT traversal.
Somehow, many people are asking this question in different phrasings, taking different angles and approaches to it. The thing is, if you want to build a robust, production-worthy service using WebRTC, you need to split these three entities.
If you haven’t already, then I suggest you check out my free 3-part video mini course on WebRTC servers.
Now, let’s dive into the details a bit –
Signaling Servers
Signaling servers are something we all have in our WebRTC products.
Because without them there’s no call. At all. Not even a Hello World example.
It is that simple.
You can co-locate the signaling server with your application server.
Here are a few things that you probably surmised about these servers already:
- You can scale a single server to handle 1000’s or even 100,000’s of connections and sessions in parallel
- These servers must maintain state for each user connected to them, making them hard to scale out
- Oftentimes, decisions that take place in these servers rely on external databases
- Latency of a couple 100’s of milliseconds is fine for these servers, but it is rather easy to be abusive and have that blown out of proportion if not designed and implemented properly (a few high profile services that I use daily come to mind here)
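The point about per-user state is easier to see in code. The sketch below is a deliberately tiny, in-memory signaling relay; the class name `InMemorySignaling` and its methods are invented for illustration and do not come from any framework. It keeps a room membership table in process memory and forwards each message to everyone else in the room.

```python
from collections import defaultdict


class InMemorySignaling:
    """Hypothetical single-process signaling relay, for illustration only."""

    def __init__(self):
        # room name -> {user_id: list of pending messages for that user}
        self._rooms = defaultdict(dict)

    def join(self, room, user_id):
        self._rooms[room][user_id] = []

    def leave(self, room, user_id):
        self._rooms[room].pop(user_id, None)
        if not self._rooms[room]:
            del self._rooms[room]

    def send(self, room, sender, message):
        """Relay a message (offer, answer, ICE candidate...) to everyone
        else in the room. Returns how many users received it."""
        delivered = 0
        for user_id, inbox in self._rooms.get(room, {}).items():
            if user_id != sender:
                inbox.append({"from": sender, **message})
                delivered += 1
        return delivered

    def inbox(self, room, user_id):
        return self._rooms.get(room, {}).get(user_id, [])
```

Everything the relay knows lives in `_rooms`, inside one process. Start a second instance behind a load balancer and it has no idea who joined the first one, which is why scaling out usually means moving that state into something shared, such as the external databases mentioned in the list above.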
Did I mention signaling servers are written in higher level languages? Java, Node.js, Rails, Python, PHP (god forbid), …
NAT Traversal Servers
STUN and TURN are what I mean here.
And yes, we usually cram STUN along with TURN. TURN is the resource hog out of the two, but STUN can be attached to the same server just because they both have the same general purpose in life (to get the media flowing properly).
This is why I will ignore STUN here and focus on TURN.
Sometimes, people forget to TURN. They do so because WebRTC works great between two browser tabs or two people in the same office without the need for TURN, and putting Google’s STUN server URL is just so simple to do… that this is how they “ship” the product. Until all hell breaks loose.
TURN ends up relaying media between session participants. It does that when the participants can’t reach each other directly for one reason or another. This kind of a relay mechanism dictates two things:
- TURN will eat up bandwidth. And a lot of it
- Your preference is to place your TURN server as close to the participant as possible. It is the only way to improve media quality and reduce latency, as from that TURN server, you have more control over the network (you can pay for better routes for example)
While you might not need many TURN servers, you probably want one at each availability zone of the cloud provider you are using.
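A cheap guard against shipping with STUN only is to check the ICE server list before it goes out to clients. The sketch below assumes the list is shaped like the `iceServers` array passed to a peer connection (entries with `urls`, and for TURN also `username` and `credential`). The helper names and the `turn.example.com` host and credentials are placeholders, not a real service.

```python
def ice_schemes(ice_servers):
    """Collect the URL schemes (stun, turn, turns) used in an
    iceServers-style list. `urls` may be a string or a list of strings."""
    schemes = set()
    for server in ice_servers:
        urls = server["urls"]
        if isinstance(urls, str):
            urls = [urls]
        for url in urls:
            schemes.add(url.split(":", 1)[0].lower())
    return schemes


def has_turn(ice_servers):
    """True when at least one TURN (or TURN over TLS) entry is configured."""
    return bool(ice_schemes(ice_servers) & {"turn", "turns"})


# What often gets "shipped": a public STUN server and nothing else.
stun_only = [{"urls": "stun:stun.l.google.com:19302"}]

# Same list plus a TURN entry; host and credentials are placeholders.
with_turn = stun_only + [
    {
        "urls": [
            "turn:turn.example.com:3478?transport=udp",
            "turns:turn.example.com:5349?transport=tcp",
        ],
        "username": "demo-user",
        "credential": "demo-secret",
    }
]
```

Running `has_turn(stun_only)` in a deployment check or unit test would flag the configuration before the first user behind a restrictive NAT does.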
Oh – and most NAT traversal servers I know are written in C/C++.
Media Servers
Media Servers are optional. So much so that they aren’t really a part of the specification – they’re just something you’d add in order to support certain functions. Group calls and recording are good examples of features that almost always translate into needing media servers.
The problem is that media servers are resource hogs compared to any of the other servers you’ll be needing with WebRTC.
This means that they end up scaling quite differently – a lot faster to be exact. And when they fail or crash (which happens), you still want to be able to reconnect the session nicely in front of the customer.
But the main thing is that it has different specs than the other servers here.
Which is why in most cases, media servers are placed in “isolation”.
There’s a point in placing media servers co-located with TURN servers – they scale somewhat together when TURN is needed. But I am not in favor of this approach most times, because TURN is a lot more Internet facing than the media server. And while I haven’t seen any publicity around hackers attacking media servers, it is probably only a matter of time.
And guess what? Media Servers? They are usually implemented in C/C++. They say it’s for speed.
Why Split them up?
Because they are different.
They serve different purposes.
And most likely, they need to be located in different parts of your deployment.
So just don’t mix them. Place them in separate machines. Or VMs. Or Docker. Or whatever. Just have them logically separated and be prepared to separate them physically when the need arises.
If you want to understand more about WebRTC servers, then try out my free WebRTC server side mini course. You won’t regret it.
The post With WebRTC, Don’t Never Ever Mix Media and Signaling appeared first on BlogGeek.me.