Google I/O 2018 and the Future of Computing

Mon, 05/21/2018 - 12:00

Google in 2018 is all about AI. But not only…

In November 2015, Google released TensorFlow, an open source machine learning framework. While we’ve had machine learning before that, at Google and elsewhere, this probably marks the date when machine learning, and by extension AI, got its current spurt of growth.

Some time between that day and the recent Google I/O event, Sundar Pichai, CEO of Google, probably brought his management team together, knocked on the table and told them: “We are now an AI company. I don’t care what it is that you are doing, come back next week and make sure you show me a roadmap of your product that has AI in it.”

I don’t know if that meeting happened in such a form or another, but I’d bet that’s what has been going on at Google for over a year now, culminating at Google I/O 2018.

After the obligatory icebreaker about the burger emoji crisis, Pichai immediately went to the heart of the keynote – AI.

Google announced AI at last year’s Google I/O event, and it was time to show what came out of it a year later. Throughout the 106-minute keynote, AI was mentioned time and time again.

That said, there was more to that Google I/O 2018 keynote than just AI.

Google touched on 3 main themes in its keynote:

  1. AI
  2. Wellbeing
  3. Fake news

I’d like to expand on each of these, as well as discuss the Smart Displays, Android P and Google Maps pieces of the keynote.

I’ll try in each section to highlight my own understanding and insights.

Before we begin

Many of the features announced are not released yet. Most of them will be available only closer to the end of the year.

Google’s goal was to show its AI power versus its competition more than anything else they wanted to share in this I/O event.

This is telling in a few ways:

  1. Google weren’t ready with real product announcements for I/O that were interesting enough to fill 100 minutes of content. Or more accurately, they were more interested in showing off the upcoming AI stuff NOW and not wait for next year or release it later
  2. Google either knows its competitors are aware of all the progress it is making, or doesn’t care if they know in advance. They are comfortable enough in their dominance in AI to announce work-in-progress as they feel the technology gap is wide enough

When it comes to AI, Google is most probably the undisputed king today. Runners up include Amazon, Microsoft, IBM, Apple and Facebook (probably in that order, though I am not sure about that part).

If I try to put into a diagram the shift that is happening in the industry, it is probably this one:

Not many companies can claim AI. I’ll be using ML (Machine Learning) and AI (Artificial Intelligence) interchangeably throughout the rest of this article. I leave it to you to decide which of the two I mean.

AI was featured in 4 different ways during the keynote:

  1. Feature enhancer
  2. Google Assistant (=voice/speech)
  3. Google Lens (=vision)
  4. HWaaS

Feature Enhancer

In each and every thing that Google does today, attention is paid to how AI can improve it. During the keynote, AI related features in Gmail, Google Photos and Android were announced.

It started off with four warm-up feel-good type use cases that weren’t exactly product announcements, but were setting the stage on how positive this AI theme is:

  • Diagnosing diseases by analyzing human retina images in healthcare
  • Predicting probability of rehospitalization of a patient in the next 24 hours
  • Producing speaker based transcription by “watching” a video’s content
  • Predictive morse typing for accessibility

From here on, most sections of the keynote had an AI theme to them.

Moving forward, product managers should think long and hard about what AI related capabilities and requirements they need to add to the features of their products. What are you adding to your product that is making it SMARTER?

Google Assistant (=voice and speech)

Google Assistant took center stage at I/O 2018. This is how Google shines and differentiates itself from its main 3 competitors: Apple, Amazon and Facebook.

In March, Forbes broke some interesting news: at the time, Amazon was hiring more developers for Alexa than Google was hiring altogether. Alexa is Amazon’s successful voice assistant. And while Google didn’t talk at all about Google Home, Alexa’s main competitor, it did emphasize its technology differentiation. This emphasis at I/O was important not only for Google’s customers but also for its potential future workforce. AI developers are super hard to come by these days. Expertise is scarce and competition between companies for talent is fierce. Google needs to make itself attractive for such developers, and showing it is ahead of competition helps greatly here.

Google Assistant got some major upgrades this time around:

  1. WaveNet. Google now offers an improved text to speech engine that makes its speech generator feel more natural. This means:
    1. Getting new “voices” now requires Google to have fewer samples of a person speaking
    2. Which allowed it to introduce 6 new voices to its Assistant (at a lower effort and cost)
    3. To make a point of it, they started working with John Legend to get his voice to Assistant – his time is more expensive, and his voice “brand” is important to him, so letting Google use it shows his endorsement of Google’s text-to-speech technology
    4. This is the first step towards the ability to mimic the user’s own voice. More on that later, when I get to Google Duplex
  2. Additional languages and countries. Google promised support for 30 languages and 80 countries for Assistant by year end
  3. Naturally Conversational. Google’s speech to text engine now understands subtleties in conversations based not only on what is said but also how it is said, taking into account pitch, pace and pauses when people speak to it
  4. Continued conversation. “Hey Google”. I don’t need to say these action words anymore when engaging in a back and forth conversation with you. And you maintain context between the questions I ask
  5. Multiple actions. You can now ask the assistant to do multiple things at once. The assistant will now parse them properly

Besides these additions, where each can be seen as a huge step forward in its own right, Google came out with a demo of Google Duplex, something that is best explained with an audio recording straight from the keynote:

If you haven’t watched anything from the keynote, be sure to watch this short 4-minute video clip.

There are a few things here that are interesting:

  • This isn’t a general purpose “chatbot”/AI. It won’t pass a Turing test. It won’t do anything but handle appointments
  • And yet. It is better than anything we’ve seen before in doing this specific task
  • It does that so naturally, that people can’t distinguish it from a real person, at least not easily
  • It is also only a demo. There’s no release date for it. It stays in the domain of “we’ve got the best AI and we’re so sure of it that we don’t mind telling our competitors about it”
  • People were interested in the ethical parts of it, which caused Google to backtrack somewhat later and indicate Duplex will announce itself as such at the beginning of an interaction
    • Since we’re still in concept stage, I don’t see the problem
    • I wouldn’t say Google were unethical – their main plan on this one was to: 1. Show supremacy; 2. Get feedback
    • Now they got feedback and are acting based on it
  • Duplex takes WaveNet to the next level, adding vocal cues to make the chatbot sound more natural when in a conversation. The result is uncanny, as you can tell by the laughter of the crowd at I/O
  • Duplex is a reversal of the contact center paradigm
    • Contact center software, chatbots, ML and AI are all designed to help a business talk better with its customers. Usually through context and automation
    • Duplex is all about helping a person talk better to businesses. First use case is scheduling, but if it succeeds, it won’t be limited to that
    • What’s there to stop Google from reversing it back and putting this at the hands of the small businesses, allowing them to field calls of customers more efficiently?
    • And what happens once you put Duplex in both ends of the call? An AI assistant for a user trying to schedule an appointment with an AI assistant of a business
  • When this thing goes to market, Google will have access to many more calls, which will end up improving their own services:
    • An improvement to the accuracy and scenarios Duplex will be relevant for
    • Ability to dynamically modify information based on the content of these calls (it showed an example of how it does that for opening hours on Google Maps during the keynote)
    • Can Google sell back a service to businesses for insights about their contact centers based on people’s requests and the answers they get? Maybe even offer a unique workforce optimization tool that no one else can
  • I’d LOVE to see cases where Duplex botches these calls in Google’s field trials. Should be hilarious

You may want to read what Chad Hart has to write about Duplex as well.

For me, Duplex and Assistant are paving the way to where we are headed with voice assistants, chatbots and AI. Siri, Cortana and Lex seem like laggards here. It will be interesting to see how they respond to these advancements.

Current advancements in speech recognition and understanding make it easier than ever to adopt these capabilities into your own products. If you plan on doing anything conversational in nature, look first at the cloud vendors and what they offer. As this topic is wide, no single vendor covers all use cases and capabilities.

While at it, make sure you have access to a data set to be able to train your models when the time comes.

Google Lens (=vision)

Where Google Assistant is all (or mostly) about voice, Google Lens is all about vision.

Google Lens is progressing in its classification capabilities. Google announced the following:

  • Lens now recognizes and understands words it “sees”, allowing use cases where you can copy+paste text from a photo – definitely a cool trick
  • Lens now handles style matching for clothing, and is able to bring suggestions of similar styles
  • Lens offers points of interest and real time results by offering on-device ML, coupled with cloud ML

That last one is interesting, and it is where Google has taken the same approach as Amazon did with DeepLens, one that should be rather obvious based on the requirements here:

  1. You collect and train datasets in the cloud
  2. You run the classification itself on the edge device – or in the cloud

It took it a step further, offering it also programmatically through ML Kit – Google’s answer to Apple’s Core ML and Amazon’s SageMaker.
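To make the cloud-versus-edge split above a bit more concrete, here is a minimal sketch in plain Python. It is not how ML Kit, Core ML or DeepLens work internally; the function names `train_in_cloud` and `classify_on_device`, the toy dataset and the JSON model format are all my own assumptions for illustration. The point is only that training produces a small artifact, and the device needs nothing but that artifact to classify.

```python
import json
import math


def train_in_cloud(samples, labels, epochs=200, lr=0.5):
    """Hypothetical 'cloud' step: fit a tiny logistic regression with
    full-batch gradient descent and return the model as a JSON string."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # prediction minus label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    # This serialized blob is the only thing shipped to the device.
    return json.dumps({"weights": weights, "bias": bias})


def classify_on_device(model_blob, x):
    """Hypothetical 'edge' step: classify locally from the downloaded
    parameters, with no call back to the backend."""
    model = json.loads(model_blob)
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if z >= 0 else 0


# Toy dataset: the label is 1 when the first feature is the larger one.
samples = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.3, 0.8]]
labels = [1, 1, 0, 0]

model_blob = train_in_cloud(samples, labels)
print(classify_on_device(model_blob, [0.7, 0.2]))
print(classify_on_device(model_blob, [0.2, 0.7]))
```

In a real product the training side would be a framework such as TensorFlow running on the backend, and the blob would be a converted model file rather than JSON, but the division of labor stays the same.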

Here’s a table summarizing the differences between these three offerings:

  • ML Framework
    • Google: TensorFlow
    • Apple: Core ML + converters
    • Amazon: MXNet & TensorFlow
  • Cloud component
    • Google: Google Firebase
    • Apple: none
    • Amazon: AWS SageMaker
  • Edge component
    • Google: ML Kit
    • Apple: Core ML
    • Amazon: AWS DeepLens
  • Edge device types
    • Google: Android & iOS
    • Apple: iOS
    • Amazon: DeepLens
  • Base use cases
    • Google: Image labeling, text recognition, face detection, barcode scanning, landmark detection, smart reply
    • Apple: Handpicked samples from open source repositories
    • Amazon: Samples: object detection, hot dog not hot dog, cat and dog, artistic style transfer, activity recognition, face detection
  • Proprietary parts
    • Google: Cloud TPUs and productized use cases
    • Apple: iOS only
    • Amazon: AWS ecosystem only
  • Open parts
    • Google: Devices supported
    • Apple: Machine learning frameworks
    • Amazon: Machine learning frameworks


Apple Core ML is a machine learning SDK available and optimized for iOS devices by Apple. You feed it your trained model, and it runs on the device.

  • It is optimized for iOS and exists nowhere else
  • It has converters to all popular machine learning frameworks out there
  • It comes with samples from across the internet, pre-converted to Core ML for developers to play with
  • It requires the developers to figure out the whole cloud backend on their own


AWS DeepLens is the first ML enabled Amazon device. It is built on top of Amazon’s Rekognition and SageMaker cloud offerings.

  • It is a specific device that has ML capabilities in it
  • It connects to the AWS cloud backend along with its ML capabilities
  • It is open to whatever AWS has to offer, but focused on the AWS ecosystem
  • It comes with several baked samples for developers to use


Google ML Kit is Google’s machine learning solution for mobile devices, and has now launched in beta.

  • It runs on both iOS and Android
  • It makes use of TensorFlow Lite on the device side and TensorFlow on the backend
  • It is tied into Google Firebase to rely on Google’s cloud for all backend ML requirements
  • It comes with real productized use cases and not only samples
  • It runs its models both on the device and in the cloud

This started as Google Lens and escalated to an ML Kit explanation.

Need to run ML? You need to think where training the model occurs and where classification takes place. These seem to be split these days between cloud and devices. In many cases, developers are pushing the classification algorithms towards the devices at the edge to gain speed and reduce costs and load on the backend.

HWaaS

With everything moving towards the cloud, so does hardware in some sense. While the cloud started from hardware hosting of virtualized Linux machines, we’ve been seeing a migration towards different types of hardware recently:

We’re shifting from general purpose computing done by CPUs towards specialized hardware that fits specific workloads, such as FPGAs and custom chips.

The specialized hardware in the illustration above is Google’s TPU. TPU stands for Tensor Processing Unit. These are custom chips (ASICs rather than FPGAs) that have been designed and optimized to handle the mathematical functions TensorFlow relies on.

TensorFlow is said to be slow on CPUs and GPUs compared to other alternatives, and somehow Google is using it to its advantage:

  • It open sourced TensorFlow, making it the most popular machine learning framework out there in a span of less than 3 years
  • It is now in its third generation of TPUs on Google Cloud for those who need to train large datasets quickly
  • TPUs are out of the reach of Amazon and other cloud providers. It is proprietary hardware designed, hosted and managed by Google, so any performance gains coming from it are left in the hands of Google for its customers to enjoy

Google’s TPUs got their fair share of time at the keynote in the beginning and were stitched throughout the keynote at strategic points:

  • Google Lens uses TPUs to offer the real time capabilities that it does
  • Waymo makes use of these TPUs to get to autonomous cars

Pichai even spent time boasting with big terms like liquid cooling…

It is a miracle that these TPUs aren’t plastered all over the ML Kit landing page.

Going with TensorFlow? You’ll need to decide on the cloud platform you are going to use, especially when it comes to dataset processing and training. Google is working hard to differentiate itself there.

Wellbeing

I am assuming you are just as addicted to your smartphone as I am. There are so many jokes, memes, articles and complaints about it that we can no longer ignore it. There are talks about responsibility and its place in large corporations.

Apple and Google are being placed in the spotlight on this one in 2018, and Google took the first step towards a solution. They are doing it in a long term project/theme named “Wellbeing”.

Wellbeing is similar to the AI initiative at Google in my mind. Someone came to the managers and told them one day something like this: “Our products are highly addictive. Apple are getting skewered in the news due to it and we’re next in line. Let’s do something about it to show some leadership and a differentiation versus Apple. Bring me ideas of how we can help our Android users with their addiction. We will take the good ideas and start implementing them”.

Here are a few things that came under Wellbeing, and one that didn’t but should have:

  • Dashboard – Google is adding to Android P an activity dashboard to surface insights to the users on what they do on their smartphones
  • YouTube includes a new feature to remind you to take a break when a configured amount of time passes. You can apply the same to other third party apps as well
  • Smarter do not disturb feature, coupled with Shush – all in an effort to reduce notifications load and anxiety from the user
  • Wind down – switching to grayscale mode when a predetermined time of day arrives
  • Pretty Please – Google Assistant can be configured to respond “better” and offer positive reinforcements when asked nicely. This one should help parents make their kids more polite (I know I need it with my kids at home)

In a way, this is the beginning of a long road that I am sure will improve over time. It shows the maturity of mobile platforms.

Not sure how responsibility, accountability and wellbeing-like aspects lend themselves to other products. If you are aiming at machine learning, think of the biases in your models – these are getting attention recently as well.

Fake News

Under responsibility there’s the whole Fake News issue of recent years.

While Wellbeing targets mainly Apple, the Google News treatment in the keynote was all about addressing Facebook’s weakness. I am not talking about the recent debacle with Cambridge Analytica – this one and anything else related to users’ data privacy was carefully kept away from the keynote. What was addressed is Fake News, where Google gets way more favorable attention than Facebook (just search Google for “google fake news” and “facebook fake news” and look at the titles of the articles that bubble up – check it also on Bing out of curiosity).

What Google did here is create a new Google News experience. And what is interesting is that it tried to bring something to market that skims nicely between objectivity and personalization – things that don’t often correlate when it comes to opinion and politics. It comes with a new layer of visualization that is more inviting, but most of what it does is rooted in AI (as anything else in this I/O keynote).

Here’s what I took out of it:

  • AI is used to decide which sources are quality sources for certain news topics. This is designed to build trust in the news and to take the “fake” part out of it
  • Personalized news is offered at the “category” level. Google will surface topics that interest you
  • Next to personalized news, there’s local news as well as trending news, which gets surfaced, probably without personalization though the choice of topics is most probably machine learning driven
  • Introduced Newscast – a presentation layer of a topic, enabling readers to get the gist of a topic and later drill down if they wish in what Google calls Full Coverage – an unfiltered view of an event – in an unpersonalized way

One more thing Google did? Emphasized that they are working with publishers on subscriptions, being publisher-friendly, where Facebook is… er… not. Will this hold water and help publishers enough? Time will tell.

AI and Machine Learning lend themselves well to this approach. It ends up being a mixture of personalization, trending and other capabilities that are surfaced when it comes to news. Can you see similar approaches suitable for your product offering?

Smart Displays

Smart displays are a rather new category. Besides Android as an operating system for smartphones and the Waymo AI piece, there was no other device featured in the keynote.

Google Home wasn’t mentioned, but Smart Displays actually got their fair share of minutes in the keynote. The only reason I see for it is that it is coupled nicely with the Google Assistant.

The two features mentioned that are relevant?

  • It can now show visuals that relate to what goes on in the voice channel
    • This is similar in a way to what MindMeld tried doing years back, before its Cisco acquisition
    • The main difference is that this involves a person and a chatbot. Adding a visual element makes a lot of sense and can be used to enhance the experience
  • It offers rich and interactive responses, which goes hand in hand with the visuals part of it

I am unsure why Google gave smart displays the prominence it did at Google I/O. I really have no good explanation for it, besides being a new device category where Apple isn’t operating at all yet – and where Amazon Alexa poses a threat to Google Home.

Android P

10 years in, and Android P was introduced.

There were two types of changes mentioned here: smarts and polish.

Smarts was all about AI (but you knew that already). It included:

  • Adaptive Battery
  • Adaptive Brightness
  • ML Kit (see the Lens section above)

Polish included:

  • App Actions and Slices, both offering faster and better opportunities for apps to interact with users outside of the app itself
  • UI/UX changes all around that are just part of the gradual evolution of Android

There was really not much to say about Android P. At least not after counting all the AI work that Google has been doing everywhere anyway.

App Actions and Slices are important if you develop Android Apps. ML Kit is where the true value is and it works on both Android and iOS – explore it first.

Google Maps

Google Maps was given the stage at the keynote. It is an important application and getting more so as time goes by.

Google Maps is probably our 4th search destination:

  1. Google Search
  2. Google Assistant
  3. YouTube
  4. Google Maps

This is where people look for information these days.

In Search, Google has been second to none for years. It wasn’t even part of the keynote.

Google Assistant was front and center in this keynote, most probably superior to its competitors (Siri, Cortana and Lex).

YouTube is THE destination for videos, with Facebook there, but with other worries at this point in time. It is also safe to say that younger generations and more visual audiences search YouTube more often than they do anything else.

Maps is where people search to get from one place to another, and probably searching even more these days – more abstract searches.

In a recent trip to the US, I made quite a few searches that were open ended on Google Maps and was quite impressed with the results. Google is taking this a step further, adding four important pillars to it:

  1. Smarts
  2. Personalization
  3. Collaboration
  4. Augmented Reality

Smarts comes from its ML work. Things like estimating arrival times, more commute alternatives (they’ve added motorcycle routes and estimates for example), etc.

Personalization was added by the introduction of a recommendation engine to Maps. Mostly around restaurants and points of interest. Google Maps can now actively recommend places you are more likely to like based on your past preferences.

On the collaboration front, Google is taking its first steps by adding the ability to share locations with friends so you can reach a decision together on a place to go to.

AR was about improving walking directions and “fixing” the small gripes with maps around orienting yourself with that blue arrow shown on the map when you start navigating.

Where are we headed?

That’s the big question I guess.

More machine learning and AI. Expect Google I/O 2019 to be on the same theme.

If you don’t have it in your roadmap, time to see how to fit it in.

Choosing a Live Video Platform – a new video series

Mon, 05/14/2018 - 12:00

If you are contemplating build versus buy for your live video platform, or just undecided on which one to pick, check out this 10-part video series.

My consulting projects these days tend to be in one of 3 domains:

  1. “We need more marketing exposure, and would like you to help us” (=marketing)
  2. “We want to talk about our strategy, differentiation and roadmap” (=product)
  3. “We want to make sure we’re building the product properly” (=architecture/development)

I like doing all of these types of projects simply because it keeps me interested. Especially since there’s no specific one that I like more than the others here. It does sometimes confuse potential customers, and probably doesn’t help me with “niching” or “focusing”, but it does give me a very wide view of the communications market.

I want to focus on the 3rd project type, the one where developers want assistance in making sure they pick the right technology, architect the solution and get it to market with as little risk as possible. This is where things get interesting.

The first thing I do in such projects? Check for NIH.

NIH stands for Not Invented Here, and it is a syndrome of all developers. I know, because I suffer from it as well. Developers are builders and tinkerers. They like to make things work – not get them readymade, which is why when they have the opportunity of building something – they’ll go ahead and do it. The problem though, is that economies of scale as well as time to market aren’t in their favor. In many cases, it would be easier to just pick a CPaaS vendor and build your live video product on top of its platform instead of building it all from scratch.

There are many reasons why people go build their own video platform:

  1. They think it will cost them less in the long run (usually coupled with a feeling that the price points of the CPaaS vendors are too high and a dislike of paying per usage/minute and not a fixed fee)
  2. They have a unique scenario that isn’t quite covered by CPaaS vendors they tried out
  3. They want to own the video technology that they are using
  4. They need to run on premise due to their customers, regulation or any other reason/excuse

I spend some time uncovering and better understanding the reasons for the decision. Sometimes I feel they make sense, while other times less so.

Which is why when I sat down with Vidyo to think about an interesting project to do together some months back, the decision was made to put out a series of short videos explaining different aspects of live video platforms. I tried to cover as much ground as possible. From network impairments, through video coding technologies, through scale, devices and lots of other topics as well.

The purpose was to get developers and entrepreneurs acquainted with what is necessary when you go build your own infrastructure, and if you decide on buying a platform, to know what to look for.

The series is packed with content. And I’d love to get your candid opinion of it. Check it out here:

What to Look for in a Live Video Platform


What Comes Next in Communications?

Mon, 05/07/2018 - 12:00

There are opposite forces at play when it comes to the next wave of communication technologies.

There are a lot of changes going on at the moment, being introduced into the world of communications. If I had to make a shopping list of these technologies, I’d probably end up with something like this:

  1. Cloud, as a Service
  2. APIs and programmability
  3. Business messaging, social messaging
  4. “Teams”, enterprise messaging
  5. Contextual everything
  6. Artificial Intelligence, NLP, NLU, ML
  7. X Reality – virtual, augmented, mixed, …

Each item is worthy of technobabble marketing in its own right, but the thing is, they do affect communications. The only question is in what ways.

I have been looking at it lately a lot, trying to figure out where things are headed, building different models to explain things. And looking at a few suggested models by other industry experts.

Communication domains – simplified

Ignoring outliers, there are 3 main distinct communication domains within enterprises:

  1. UC – Unified Communications
  2. CC – Contact Center
  3. CP – Communications Platform

Usually, we will be appending the obligatory “aaS” to them: UCaaS, CCaaS and CPaaS.

I’ll give my own simplified view on each of these acronyms before we proceed.


Unified Communications looks inwardly inside the company.

A company has employees. They need ways and means to communicate with each other. They also need to communicate with external entities such as suppliers, partners and customers. But predominantly, this is about internal communications. External communications usually take a second-class citizen position, with limited capabilities and accessibility; oftentimes, they will be limited to email, phone calls and SMS.

What will interest us here will be collaboration and communication.


Contact Centers are about customers. Or leads, which are potential customers.

We’ve got agents in the contact center, be it sales or customer care (=support), and they need to talk to customers.

Things we care about in contact centers? Handling time, customer satisfaction, …


Communication Platform as a Service is different.

It is a recent entry to the communications space, even if some would argue it has always been there.

CPaaS is a set of building blocks that enable us to use communications wherever we may need them. Both CCaaS and UCaaS can be built on top of CPaaS. But CPaaS is much more flexible than that. It can fit itself to almost any use case and scenario where communications is needed.

Communications in Consolidation

There’s a consolidation occurring in communications. One where vendors in different parts of communications are growing their offerings into the adjacent domains.

We are in a migration from analog to digital when it comes to communications. And from pure telecom/telephony towards browser based, internet communications. Part of it is the introduction of WebRTC technology (couldn’t hold myself back from mentioning WebRTC).

This migration opens up a lot of opportunities and even contemplation on how we should define these communication domains and whether they are even separate at all.

There have been some interesting moves lately in this space. Here are a few examples of where these lines get blurred and redefined:

  • Dialpad just introduced a contact center, tightly integrated and made a seamless part of its unified communications platform
  • Vonage acquired Nexmo, which is one of the leading CPaaS vendors. Other UC vendors have added APIs and developer portals to their UC offerings
  • Twilio just announced Flex, its first foray out of CPaaS and into the contact center realm

These are just examples. There are other vendors in the communication space who are going after adjacent domains.

The idea here is communication vendors looking into the communications Venn diagram and reaching out to an adjacency, with the end result being a consolidation throughout the whole communications space.

External disruption to communications

This is where things get really interesting. The forces at play are pushing communications outwards:

UCaaS, CCaaS, CPaaS. It was almost always about real time. Communications happening between people in real time. When the moment is over, the content of that communications is lost – or more accurately – it becomes another person’s problem. Like a contact center recording calls for governance or quality reasons only, or having the calls transcribed to be pushed towards a CRM database.

Anything that isn’t real time and transient isn’t important with communications. Up until now.

We are now connecting the real time with the asynchronous communications. Adding messaging and textual conversations. We are thinking about context, which isn’t just the here and now, but also the history of it all.

Here’s what’s changing though:

UC and Teams

Unified Communications is ever changing. We’ve added collaboration to it, calling it UC&C. Then we’ve pushed it to the cloud and got UCaaS. Now we’re adding messaging to it. Well… we’re mostly adding UC to messaging (it goes the other way around). So we’re calling it Teams. Or Team Collaboration. Or Workstream Collaboration (WSC). Or Workstream Communication and Collaboration (WCC). I usually call it Enterprise Messaging.

The end result is simple. We focus on collaboration between teams in an organization, and we do that via group chat (=messaging) as our prime mode of communication.

Let’s give it a generic name that everyone understands: Slack

The question now is this: will UC gobble up Team communication vendors such as Slack (and now Workplace by Facebook; as well as many other “project management” and messaging type tools) OR will Slack and the likes of it gobble up UC?

I don’t really know the answer.

CC and CRMs

What about contact centers? These live in the world of CRM. The most important customer data resides in CRMs. And now, with the introduction of WebRTC, and to an extent CPaaS vendors, a CRM vendor can decide to add contact center capabilities as part of its offering. Not through partnerships, but through direct implementation.

Can contact centers do the same? Can they expand towards the CRM domain, starting to handle the customer data itself?

If Salesforce starts offering a solid contact center solution in the cloud as part of its offering, that is highly integrated with the Salesforce experience, adding to it a layer of sophistication that contact center vendors will find hard to implement – what will customers do? NOT use it in favor of another contact center vendor or source it all from Salesforce? Just a thought.

There’s an additional trend taking place. That’s one of context and analytics. We’re adding context and analytics into “customer journeys”, sales funnels and marketing campaigns. These buzzwords happen to be part of what contact centers are, what modern CRMs can offer, and what dedicated tools do.

For example, most chat widget applications for websites today offer a backend CRM-like dashboard that also acts like a messaging contact center, and at the same time, these same tools act similarly to Google Analytics by following users as they visit your website trying to derive insights from their journey so the contact center agent can use it throughout the conversation. Altocloud did something similar and got acquired recently by Genesys, a large contact center vendor.

CP and PaaS

CPaaS is a bit different. We’re dealing with communication APIs here.

The CPaaS market is evolving and changing. There are many reasons for it:

  1. SMS and voice are commoditized, with a lot of vendors offering these services
  2. IP based services are considered “easier” to implement, eroding their price point and popularity
  3. UCaaS vendors adding APIs, at times wanting to capture some of the market due to Twilio’s success
  4. As the market grows, there’s a looming sense of what would tech giants do – would Amazon add more CPaaS capabilities into AWS?

That last one is key. We’ve seen the large cloud vendors enhancing their platforms. Moving from pure CPU and storage services up the food chain. Amazon AWS has so many services today that it is hard to keep up. The question here is when will we reach an inflection point where AWS, GCE and Azure start adding serious CPaaS capabilities to their cloud platforms and compete directly with the CPaaS vendors?

Where is CPaaS headed anyway?

  • Does the future of CPaaS lie in attacking adjacent communication markets like Twilio is doing with Flex?
  • Will CPaaS end up being wrapped and baked into UC and “be done with it”?
  • Is CPaaS bound to be gobbled up by cloud providers as just another set of features?
  • Will CPaaS stay a distinct market on its own?

The Future of Communications

The future can unfold in three different ways when it comes to communications:

  1. Specialization in different communication domains continues and deepens
    • UC, CC and CP remain distinct domains
    • Maybe a 4th domain comes in (highly unlikely to happen)
  2. Communication domains merge and we refer to it all as communications
    • UC does CC
    • CP used to build UC and CC
    • Customers going for best of suite (=single vendor) who can offer UC, CC and CP in a single platform
  3. Communication domains get gobbled up by their adjacencies
    • CC gets wrapped into CRM tools
    • UC being eaten by messaging and teams experiences (probably to be called UC again at the end of the process)
    • CP becoming part of larger, more generic cloud platforms

How do you think the future will unfold?

The post What Comes Next in Communications? appeared first on

In Search of WebRTC Developers

Mon, 04/30/2018 - 12:00

WebRTC developers are really hard to come by. I want to improve my ability to help companies in search of such skill.

If there’s something that occurs time and again, it is entrepreneurs and vendors who ask me if I know of anyone who can build their application. Some are looking to outsource the project as a whole or part of it, and then they are looking for agencies to work with. Others are looking for a single expert to work with on a specific task, or someone they could hire for long stretches of time who has WebRTC skills.

You a WebRTC Developer?


I’d like to know more about you IF you are looking for projects or for a new employer.

Here are a few things first:

  1. Even if you think I know you, please fill out the form
  2. No agencies. If you are an agency, contact me and we can have a chat. I know a few that I am comfortable working with
  3. Only starting out with WebRTC? Don’t fill out the form. Mark this page, get some experience and then fill it out
  4. The form is short, so it shouldn’t take more than 5 minutes of your time to fill out
  5. Don’t beautify things more than they are – that will just get you thrown off my radar. Tell things as they are

Fill out this form for me please (or via this link):


I won’t be reaching out to you immediately (or at all). I’ll be using this list when others ask for talent that fits your profile.

You looking for WebRTC Developers?

Got a need for developers that have WebRTC skills?

I am not sure exactly how to find them and where, but I am trying to get there.

Two ways to get there:

  1. I am thinking of opening up a job listing on WebRTC Weekly
    1. Payment will be needed to place a listing on the WebRTC Weekly, which reaches over 2,500 subscribers at the moment
    2. Cost will be kept low, especially considering the cost of talent acquisition elsewhere and the lack of available WebRTC developers out there
    3. I had a job listing sub-site in the past, didn’t work – this is another attempt I am trying out. If you want to try this one with me, I’ll be happy to take the leap
    4. Interested? Contact me
  2. Need a bit more than just finding a developer? I offer consulting services
    1. There are hourly rates available, as well as one-off consulting sessions
    2. I’ll be using the list of WebRTC developers I am collecting above to match you up with a candidate if you need – or just connect you with the agencies I am comfortable working with


The post In Search of WebRTC Developers appeared first on

RCS now Google Messages. What’s Next in Consumer Messaging?

Mon, 04/23/2018 - 12:00

Chat won’t bring carriers to their SMS-glory days.

The Verge came out with an exclusive last week that everyone out there is regurgitating. This is my attempt at doing the same.

We’re talking about Google unveiling its plans for the consumer chat experience. To put things in quick bulleted points:

  • There’s a new service called “Chat”, which is supposed to be Google’s and the carriers’ answer to Apple iMessage, Facebook Messenger and the rest
  • Google’s default messages app on Android for SMS is getting an upgrade to support RCS, turning it into a modern messaging application
  • The moment this happens will vary between the different carriers, who are, by the way, those who make the decision and control and own the service
  • Samsung and other Android handset manufacturers will probably come out with their own messaging app instead of the one provided by Google
  • This is a risky plan with a lot of challenges ahead of it

I’d like to share my viewpoints and where things are going to get interesting.

SMS is dead

I liked Mashable’s title for their take on this:

Google’s plan to fix texting on Android is really about the death of SMS

While an apt title, my guess is that beyond carriers and reports written for them, we all know that already.

SMS has long been dead. The A2P (Application to Person) SMS messages are all that’s left of it. Businesses texting us either their PIN codes and passwords for 2FA (Two-Factor Authentication) and OTP (One Time Passwords) or just sending us marketing junk for us to ignore.

I asked a few friends of mine on a group chat yesterday (over Whatsapp, of course) when, how and why they use SMS. Here are the replies I got (I translated them to English):

  • I prefer Whatsapp. It is the most lightweight and friendly alternative. I only use SMS when they are automatically sent to me on missed calls
  • Whatsapp is accessible. It has quick indicators and it is lightweight. It remembers everything in an orderly fashion
  • I noticed that people take too long to respond on SMS while they respond a lot faster over Whatsapp. Since SMS is more formal to me, I use it when sending messages for the first time to people I don’t know
  • I send SMS only to people I don’t know. I feel that Whatsapp is more personal
  • I use iMessage only with my boss. She’s ultra religious so she doesn’t have Whatsapp installed. For everything else I use Whatsapp
  • I mostly use Whatsapp for messages. I text via SMS only with my wife when I am flooded with Whatsapp messages and just want her notifications to be more prominent
  • SMS is dead for me. I don’t even have it on my home screen, and that says everything. I use SMS only to receive PIN codes from businesses
  • SMS is the new fax

These are 40 year olds in Israel. Most working out of the IT domain. The answers will probably vary elsewhere, but here in Israel, most will give you similar answers. Whatsapp has become the go-to app for communications. So much so, that we were forced to give our daughter her first smartphone at the age of 8 only so she can communicate with her friends via Whatsapp and won’t stay behind. Everyone uses it here in Israel.

You should also know that plans upwards of 2GB of monthly data including unlimited voice and SMS cost less than $15 a month in Israel, so this has nothing to do with price pressure anymore. It has to do with network effects and simple user experience.

SMS is no longer ubiquitous across the globe. I can’t attest to other countries, but I guess Israel isn’t alone in this. SMS is just the last alternative to use when all else has failed.

Why is SMS interesting in this context?

Because a lot of what’s at stake here for Google relates to the benefits and characteristics of SMS.

RCS is (still) dead

RCS is the successor of SMS for getting carriers into the 21st century. It has been discussed for many years now, and it will most definitely, utterly, completely, unquestionably get people away from their Messenger, WhatsApp and WeChat and back to the clutches of the carriers. NOT.

RCS is a design-by-committee solution, envisioned by people my age and older, targeting a younger audience across the globe in an attempt to kill fast moving social networks with a standardized, ubiquitous, agreed upon specification that then needs to be implemented by multiple vendors, handset manufacturers and carriers globally to make any sense.

Not going to happen.

Google’s take on this was to acquire an RCS vendor – Jibe – two years ago for this purpose. The idea was probably to provide a combination of an infrastructure and a mobile client to speed up RCS deployments around the globe and make them interoperable faster than the carriers will ever achieve on their own.

Two years passed, and we’ve got nothing but a slide (and the article on The Verge) to show for this effort:

An impressive list of operators, OEMs and OS providers that are behind this RCS initiative. Is that due to Google? To some part, probably so.

In a way, this reminds me also of Google’s other industry initiative – the Alliance for Open Media, where it is one of 7 original founding members that just recently came out with AV1, a royalty free video codec. It is a different undertaking:

  • RCS will be controlled by carriers, who were never kind or benevolent to their users
  • For carriers, the incentive can be found in the GSMA’s announcement: “GSMAi estimate that this will open up an A2P RCS business worth an estimated $74bn by 2021”
    • This is about securing A2P SMS revenues by migrating to RCS
    • The sentences before this one in that announcement explain how they plan on reaching there: “The Universal Profile ensures the telecoms industry remains at the centre of digital communications by enabling Operators, OEMs and OS Providers to deliver this exciting new messaging service consistently, quickly and simply.”
    • Problem is, they are not the centre of digital communications, so this isn’t about ensuring or remaining. It is about winning back. And you can’t do that if your focus is A2P
  • This isn’t about an open platform for innovation. Or a level playing field for all. And that makes it starkly different from the AV1 initiative. It is probably closer to MPEG-LA’s response in a way of a new video codec initiative

Why is Google getting into bed with the carriers on this one?

Google had no choice

The Verge had an exclusive interview with Anil Sabharwal, the Google VP leading this effort. This led to the long article about this initiative. The numbers that Anil shared were eye opening as to the abysmal state of Google’s messaging efforts thus far.

I went ahead and placed these numbers next to other announced messaging services for comparison:

A few things to note here:

  • Telegram, Facebook Messenger and Whatsapp are all apps users make a decision to install, and they are making that decision en masse
  • Apple has upwards of 1.3 billion active devices, which indicates the general size of its iMessage service
  • Google Messages is the default app on Android for SMS, unless:
    • Carriers replace it with their own app
    • Handset manufacturers replace it with their own app
    • Users replace it with another app they install
  • Google Messages sees around 100 million monthly active users – the table-stakes entry number to be relevant in this market, but rather low for a ubiquitous, default app
  • Google Allo has less than 50 million downloads. That’s not even monthly active users
  • Google Hangouts stopped announcing its user base years ago, and frankly, they stopped investing in it as well. The mobile app has been defunct (for me) for quite some time now, with unusual slowness and unresponsiveness

Google failed to entice its billion+ Android users to install or even use its messaging applications.

Without the numbers, it couldn’t really come up with a strategy similar to Apple iMessage, where it essentially hijacks the messaging traffic from carriers, onboarding the users to its own social messaging experience.

Trying to do that would alienate the carriers, which Google relies on for Android device sales. Some would argue that Google has the clout and size to do that, but that is not the case.

Android is open, so handset manufacturers and carriers could use it without Google’s direct approval, throwing away the default messaging app. Handset manufacturers and carriers would do that in an effort to gain more control over Android, which would kill the user experience, as most such apps by handset manufacturers and carriers do. The end result? More users purchasing iPhones, as carriers try to punish Google for the move.

What could Google do?

  1. Double down on their own social messaging app – hasn’t worked multiple times now. What can they do differently?
  2. Build their own iMessage – alienate the Android ecosystem, with the risk of failing to attract users as they failed in the past
  3. Partner with carriers on RCS

Two years ago, Google decided to go for alternatives (1) and (3). Allo was their own social messaging app. Had it succeeded, my guess is that Google would have gone towards approach (2). In parallel, Google acquired Jibe in an effort to take route (3), which is now the strategy the company is behind for its consumer messaging.

The big risk here is that the plan itself relies on carriers and their decisions. We don’t even know when this will get launched. Reading between the lines of The Verge’s article, Google already completed the development and got the mobile client ready and deployed. It just isn’t enabled unless the carrier being used approves. Estimates indicate 6-12 months until that happens, but for which of the carriers? And will they use the stock Android app for that or their own ambitious better-than-whatsapp app?

E2EE can kill this initiative and hurt Google

The biggest risk to Google is the lack of E2EE (end to end encryption).

In each and every regurgitated post of The Verge article and in The Verge itself this is emphasized. Walt Mossberg’s tweet was mentioned multiple times as well:

Bottom line: Google builds an insecure messaging system controlled by carriers who are in bed with governments everywhere at exactly the time when world publics are more worried about data collection and theft than ever.

— Walt Mossberg (@waltmossberg) April 20, 2018

The problem for Google is that the news outlets are noticing and giving it a lot of publicity. And it couldn’t come at a less convenient time, when Facebook is being scrutinized for its malpractice of how it uses and protects user data in the Cambridge Analytica scandal. Google, for the most part, has come out of it unscathed, but will this move put more of the spotlight on Google?

The other problem is that all the other messaging apps already have E2EE supported in one way or another. The apps usually mentioned here are Apple iMessage, Signal and Telegram. Whatsapp switched to E2EE by default two years ago. And Facebook Messenger has it as an option (though you do need to enable it manually per conversation).

Will customers accept using “Chat” (=RCS) when they know it isn’t encrypted end to end?

On the other hand, Russia is attempting to close Telegram by blocking millions of IP addresses in the country, and taking down with it other large services. If this succeeds, then Russia will do the same to all other popular messaging applications. And then other countries will follow. The end result will be the need to use the carrier (and Google’s) alternative instead. Thankfully, Russia is unsuccessful. For the time being.

Who owns the data?

Carriers do.

With RCS, the carriers are the ones that are intercepting, processing and forwarding the messages. In a way, this implies that Google isn’t going to be the one reading these messages, at least not from the server.

This means that either Google decided there’s not enough value in these messages and in monetizing them – or – that they have other means to gain access to these messages.

Here are a few alternatives Google can use to access these messages:

  1. Through licensing and operating the servers on behalf of carriers. Not all carriers will roll their own and may prefer using Google as a service here. Having the messages in unencrypted format on the server side is beneficial for Google in a way, especially when they can “blame” the carriers and regulations
  2. Via Google’s Messages app. While messages might be sent via a carrier’s network, the client sending and receiving these messages is developed and maintained by Google, giving them the needed access. This can be coupled with features like backing up the messages in Google Drive or letting Google read the messages to improve its services for the user
  3. By coupling features such as Google Assistant and Smart Replies into it, which means Google needs to read the messages to offer the service

Google might have figured it has other means to get to the messages besides owning and controlling the whole experience – similar to how Google Photos is one of the top camera apps in Apple iTunes.

By offering a better experience than other RCS client competitors, it might entice users to download its stock Chat app on devices that don’t have it by default. Who knows? It might even be able to get people to download and use it on an iPhone one day.

The success of Google here will translate into RCS being a vehicle for Google to get back to messaging more than the means for carriers to gain relevance again.

Ubiquity is here already, but not via SMS or RCS

I’ll put the graph here again – to make a point.

1.5 billion people is ubiquitous enough for me. Especially when the penetration rates in Israel are 100% in my network of connections.

People tend to talk about the ubiquity of SMS and how RCS will inherit that ubiquity.

They fail to take into account the following:

  1. SMS is ubiquitous, but it took it many years to get there
  2. It is used for marketing and 2FA mostly
  3. The marketing part is less valuable
    1. It can be treated as spam by consumers for the most part
    2. It is one way in nature, whereas social networks revolve around conversations
    3. Spam and unsolicited messages don’t work that well in social networks
  4. 2FA will be shifting away from SMS (see here)
    1. Google does a lot of its 2FA without SMS today
    2. Google can open it up to third parties at any point in time
    3. Apple can do the same with the iPhone
  5. The shift towards RCS won’t be done in a single day. It will be done in a patchwork fashion across the globe by different carriers

Think about it.

You can now send out an RCS message from your device. To anyone. If the other party has no RCS installed, the message gets converted to SMS. Sweet.

But what happens when the person you are sending that RCS message to is located abroad? Are you seriously happy with getting a payment request from your carrier for a stupid international SMS message, or a full conversation of such, for a thing you could have easily used Whatsapp for instead? And for free.
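The fallback described above can be sketched roughly as follows. This is only an illustration of the decision, not how any RCS client or carrier network actually implements it: the `Recipient` class, its `supports_rcs` flag and the `pick_transport` function are made-up names, and whether an international SMS ends up being billed depends entirely on your carrier plan.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Recipient:
    number: str
    country: str        # country code, e.g. "IL"
    supports_rcs: bool  # hypothetical flag; real clients discover this via the network


def pick_transport(sender_country: str, recipient: Recipient) -> Tuple[str, bool]:
    """Return (transport, may_be_billed_as_international_sms)."""
    if recipient.supports_rcs:
        # Both sides can use RCS, so no SMS is involved.
        return ("rcs", False)
    # No RCS on the other side: the message falls back to plain SMS.
    international = recipient.country != sender_country
    return ("sms", international)


if __name__ == "__main__":
    # Placeholder number, not a real one.
    friend_abroad = Recipient(number="+10000000000", country="US", supports_rcs=False)
    print(pick_transport("IL", friend_abroad))
```

The point of the sketch is that the sender never makes this choice explicitly. The downgrade to SMS happens silently, which is exactly where the surprise charge can come from.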

Ubiquity isn’t the word that comes to my mind when thinking about RCS.

The holy grail is business messaging

Consumer messaging is free these days. There is no direct monetary value to be gained by offering this service to consumers. Carriers won’t be able to put that genie back into its bottle and start collecting money from users. Their only approach here might be to zero-rate RCS traffic, but that also isn’t very interesting to most consumers – at least not here in Israel.

The GSMA already suggested where the money is – in business messaging. They see this as a $74bn opportunity by 2021. The problem is that rolling out RCS 6-12 months from now, by only some of the carriers, isn’t going to cut it. Apple Business Chat was just released, vertically integrated, with a lot of thought put into businesses, their discovery process and free of charge.

Then there’s the rest of the social networks opening their APIs towards the businesses, and contact center solutions driving the concept of omnichannel experiences for customers.

Carriers are getting into this game late and unprepared. On top of that, they will try to get money out of this market similar to how they do with SMS. But the price points they are used to make no sense anymore. Something will need to change for the carriers to be successful here.

Will carriers be able to succeed with RCS? I doubt it.

Will Google be able to succeed with Chat? Maybe. But it is up to the carriers to allow that to happen.

The post RCS now Google Messages. What’s Next in Consumer Messaging? appeared first on

WebRTC 1.0 Training and Free Webinar Tomorrow (on Tuesday)

Sun, 04/08/2018 - 12:00

Join Philipp Hancke and me for a free training on WebRTC 1.0, prior to the relaunch of my advanced WebRTC training.

Here’s something that I get at least once a week through my website’s chat widget:

It is one of the main reasons why I’ve created my advanced WebRTC course. It is a paid WebRTC course that is designed to fill in the gaps and answer the many questions developers face when needing to deal with WebRTC.

Elephants, blind Men, alligators and WebRTC

I wanted to connect it to the parable of the six blind men and an elephant, explaining how wherever you go on the Internet, you are going to get a glimpse of WebRTC and never a full clear picture. I even searched for a good illustration to use for it. Then I bumped into this illustration:

It depicts what happens with WebRTC and developers all too well.

If you haven’t guessed it, the elephants here are WebRTC and the requirements of the application and that flat person is the developer.

This fits well with another joke I heard yesterday from a friend’s kid:

Q: Why can’t you go into the woods between 14:00-16:00?

A: Because the elephants are skydiving

There’s a follow up joke as well:

Q: Why are the alligators flat?

A: Because they entered the woods between 14:00-16:00

WebRTC development has a lot of rules. Many of which are unwritten.

WebRTC 1.0

There are a lot of nuances to WebRTC. A lot of written material, old and new – some of it irrelevant now, the rest might be correct but jumbled. And WebRTC is a moving target. It is hard to keep track of all the changes. There’s a lot of knowledge around WebRTC that is required – knowledge that doesn’t look like an API call and isn’t written in the standard specification.

This means that I get to update my course every few months just to keep up.

With WebRTC 1.0, there’s both a real challenge as well as an opportunity.

It is a challenge:

  • WebRTC 1.0 still isn’t here. There’s a working draft, which should get standardized *soon* (=soon started in 2015, and probably ends in 2018, hopefully)
  • Browser implementations lag behind the latest WebRTC 1.0 draft
  • Browser implementations don’t behave the same, or implement the same parts of the latest WebRTC 1.0 draft

It is an opportunity:

We might actually get to a point where we have a stable API with stable implementations.

But we’re still not there.

Should you wait?


We’re 6-7 years in with WebRTC (depending on who does the counting), and this hasn’t stopped well over 1,000 vendors from jumping in and making use of WebRTC in production services.

There’s already massive use of WebRTC.

Me and WebRTC 1.0

For me, WebRTC 1.0 is somewhat of a new topic.

I try to avoid the discussions going on around WebRTC in the standardization bodies. The work they do is important and critical, but often tedious. I had my fair share of it in the past with other standards and it isn’t something I enjoy these days.

This caused a kind of a challenge for me as well. How can I teach WebRTC, in a premium course, without explaining WebRTC 1.0 – a topic that needs to be addressed as developers need to prepare for the changes that are coming?

The answer was to ask Philipp Hancke to help out here, and create a course lesson for me on WebRTC 1.0. I like doing projects with Philipp, and do so on many fronts, so this is one additional project. It isn’t the first time either – the bonus materials of my WebRTC course include a recorded lesson by Philipp about video quality in WebRTC.

Free WebRTC 1.0 Webinar

Tomorrow, we will be recording the WebRTC 1.0 lesson together for my course. I’ll be there, and this time, partially as a student.

To make things a bit more interesting, as well as promoting the whole course, this lesson will be given live in the form of a free webinar:

  • Anyone can join for free to learn about WebRTC 1.0
  • The recording will only be available as part of the advanced WebRTC course

This webinar/lesson will take place on

Tuesday, April 10

2-3PM EST (view in your timezone)

Save your seat →

The session’s recording will NOT be available after the event itself. While this lesson is free to attend live, the recording will become an integral part of the course’s lessons.

The post WebRTC 1.0 Training and Free Webinar Tomorrow (on Tuesday) appeared first on

AV1 Specification Released: Can we kiss goodbye to HEVC and royalty bearing video codecs?

Mon, 04/02/2018 - 12:00

AV1 for video coding is what Opus is for audio coding.

The Alliance for Open Media (AOMedia) issued a press release last week announcing its public release of the AV1 specification.

Last time I wrote about AOMedia was over a year ago. AOMedia is a very interesting organization. Which got me to sit down with Alex Eleftheriadis, Chief Scientist and Co-founder of Vidyo, for a talk about AV1, AOMedia and the future of real time video codecs. It was really timely, as I’ve been meaning to write about AV1 at some point. The press release, and my chat with Alex pushed me towards this subject.


  • We are moving towards a future of royalty free video codecs
  • This is due to the drastic changes in our industry in the last decade
  • It won’t happen tomorrow, but we won’t be waiting too long either

Before you start, if you need to make a decision today on your video codec, then check out this free online mini video course

H.264 or VP8?

Now let’s start, shall we?

AOMedia and AV1 are the result of greed

When AOMedia was announced I was pleasantly surprised. It isn’t that apparent that the founding members of AOMedia would actually find the strength to put their differences aside for the greater good of the video coding industry.

Video codec royalties 101

You see, video codecs at that point in time were a profit center for companies. You invested in research around video coding with the main focus on inventing new patents that will be incorporated within video codecs that will then be globally used. The vendors adopting these video codecs would pay royalties.

With H.264, said royalties came with a cap – if you distributed above a certain number of devices that use H.264, you didn’t have to pay more. And the same scheme was put in place when it came to HEVC (H.265) – just with a higher cap.

Why do we need this cap?

  1. Companies want to cap their commitment and expense. In many cases, you don’t see direct revenue per device, so no cap means it is harder to match with asymmetric business models and applications that scale today to hundreds of millions of users
  2. If a company needs to pay based on the number of devices they sell, then the one holding the patents and getting the payment for royalties knows that number exactly – something which is considered trade secret for many companies

So how much money did MPEG-LA take in?

Being a private company, this is hard to know. I’ve seen estimates of $10M-50M, as well as $17.5B on Quora. The truth is probably somewhere in the middle. Which is still a considerable amount of money that was funnelled to the patent owners.

With royalty revenues flowing in, is it any wonder that companies wanted more?

An interesting tidbit about this greed (or shall we say rightfulness) can be found in the Wikipedia page of VP8:

In February 2011, MPEG LA invited patent holders to identify patents that may be essential to VP8 in order to form a joint VP8 patent pool. As a result, in March the United States Department of Justice (DoJ) started an investigation into MPEG LA for its role in possibly attempting to stifle competition. In July 2011, MPEG LA announced that 12 patent holders had responded to its call to form a VP8 patent pool, without revealing the patents in question, and despite On2 having gone to great lengths to avoid such patents.

So… we have a licensing company whose members are after royalty payments on patents. They are blinded by the success of H.264 and its royalty scheme and payments, so they go after anything and everything that looks and smells like competition. And they are working towards maintaining their market position and revenue in the upcoming HEVC specification.

The HEVC/H.265 royalties mess

Leonardo Chiariglione, founder and chairman of MPEG, attests in a rather revealing post:

Good stories have an end, so the MPEG business model could not last forever. Over the years proprietary and “royalty free” products have emerged but have not been able to dent the success of MPEG standards. More importantly IP holders – often companies not interested in exploiting MPEG standards, so called Non Practicing Entities (NPE) – have become more and more aggressive in extracting value from their IP.

HEVC, being a new playing ground, meant that there were new patents to be had – new areas where companies could claim having IP. And MPEG-LA found itself one of many patent holder groups:

MPEG-LA indicated its wish to take home $0.2 per device using HEVC, with a high cap of around $25M.

HEVC Advance started with a ridiculously greedy target of $0.8 per device AND 0.5% of the gross margin of streaming services (unheard of at the time) – with no cap. It has since backed down, making things somewhat better. It did so a bit too late in the game though.

Velos Media spent money on a clean and positive website. Their Q&A indicates that they haven’t yet made a decision on royalties, caps and content royalties. Which gives great confidence to those wanting to use HEVC today.

And then there are the unaffiliated. Companies claiming patents related to HEVC who are not in any pool. And if you think they won’t be suing anyone then think again – Blackberry just sued Facebook for messaging related patents – easy to see them suing for HEVC patents in their current position. Who can blame them? They have been repeatedly sued by patent trolls in the past.

HEVC is said to be the next big thing in video coding. The successor of our aging H.264 technology. And yet, there are too many unknowns about the true price of using it. Should one pay royalties to MPEG-LA, HEVC Advance and Velos Media or only one of them? Would paying royalties protect from patent litigation?
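To put numbers on what a cap means, here is a minimal sketch using only the MPEG-LA figures quoted above ($0.2 per device, a cap of around $25M). The function name and the choice to work in whole cents are illustrative assumptions, and the other pools are left out because their terms are unknown:

```typescript
// Capped per-device royalty, computed in US cents to keep the arithmetic exact.
// The $0.2 per device and ~$25M cap are the MPEG-LA figures quoted in the text;
// the function name and the integer-cents convention are illustrative choices.
function cappedRoyaltyCents(
  devices: number,
  perDeviceCents: number,
  capCents: number
): number {
  return Math.min(devices * perDeviceCents, capCents);
}

const PER_DEVICE_CENTS = 20;   // $0.2 per device
const CAP_CENTS = 2500000000;  // $25M cap

// Number of devices at which the cap kicks in: 125,000,000.
const breakEvenDevices = CAP_CENTS / PER_DEVICE_CENTS;
```

Under these terms a vendor shipping 1 million devices would owe $200,000, while anything beyond 125 million devices costs the same $25M. With an uncapped scheme, the bill simply keeps growing with the device count.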

Is it even economically viable to use HEVC?

Yes. Apple has introduced HEVC in iOS 11 and iPhone X. My guess is that they are willing to pay the price as long as this keeps the headache and mess in the Android camp (I can’t see the vendors there coming to terms on who in the value chain will end up paying the royalties for it).

With such greed and uncertainty, a void was left. One that got filled by AOMedia and AV1.

AOMedia – The who’s who of our industry

AOMedia is a who’s who list of our industry. It started small, with just 7 big names, and now has 12 founding members and 22 promoter members.

Some of these members are members of MPEG-LA or already have patents in HEVC and video coding. And this is important. Members of AOMedia effectively allow free access to essential patents in the implementation of AOMedia related specifications. I am sure there are restrictions applied here, but the intent is to have the codecs coming out of AOMedia royalty free.

A few interesting things to note about these members:

  • All browser vendors are there: Google, Mozilla, Microsoft and Apple
  • All large online streaming vendors are there: Google (=YouTube), Amazon and Netflix
  • From that same streaming industry, we also have Hulu, Bitmovin and Videolan
  • Most of the important chipset vendors are there: Intel, AMD, NVidia, Arm and Broadcom
  • Facebook is there
  • Of the enterprise video conferencing vendors we have Cisco, Vidyo and Polycom
  • Qualcomm is missing

AOMedia is at a point where stopping it will be hard.

Here’s how AOMedia visualizes its members’ products:

What’s in AV1?

AV1 is a video codec specification, similar to VP8, H.264, VP9 and HEVC.

AV1 is built out of 3 main premises:

  1. Royalty free – what gets baked into the specification is either based on patents of the members of AOMedia or uses techniques that aren’t patented. It doesn’t mean that companies can’t claim IP on AV1, but as far as the effort on developing AV1 goes, they aren’t knowingly letting in patents
  2. Open source reference implementation – AV1 comes with an open source implementation that you can take and start using. So it isn’t just a specification that you need to read and then build a codec from scratch
  3. Simple – similar to how WebRTC is way simpler than other real time media protocols, AV1 is designed to be simple

Simple probably needs a bit more elaboration here. It is probably the best news I heard from Alex about AV1.

Simplicity in AV1

You see, in standardization organizations, you’ll have competing vendors vying for an advantage over one another. I’ve been there during the glorious days of H.323 and 3G-324M. What happens there is that companies come up with a suggestion. Oftentimes, they will have patents on that specific suggestion. So other vendors will try to block it from getting into the spec. Or at the very least delay it as much as they can. Another vendor will come up with a similar but different enough approach, with their own patents, of course. And now you’re in a deadlock – which one do you choose? Coalitions start emerging around each approach, with the end result being that both approaches will be accepted with some modifications and get added into the specification.

But do we really need both of these approaches? The more alternatives we have to do something similar, the more complex the end result. The more complex the end result, the harder it is to implement. The harder it is to implement, well… the more it looks like HEVC.

Here’s the thing.

From what I understand, and I am not privy to the intricate details, but I’ve seen specifications in the past, and been part of making them happen, HEVC is your standard design-by-committee specification. HEVC came out of MPEG, which in the last 20 years has given us MPEG-2, H.264 and HEVC. The number of members there with an interest in getting some skin in this game is large and growing. I am sure that HEVC was a mess of a headache to contend with.

This is where AV1 diverges. I think there’s a lot less politics going on in AOMedia at the moment than in MPEG. Probably due to 2 main reasons:

  1. It is a newer organization, starting fresh. There’s politics there as there are multiple companies and many people, but since it is newer, the amount of politics involved will be lower than an organization that has been around for 20+ years
  2. There’s less money involved. No royalties means no pie to split between patent holders. So fewer fights about who gets their tools and techniques incorporated into the specification

The end result? The design is simpler, which makes for better implementations that are just easier to develop.


In real life, we’re yet to see if AV1 performs better than HEVC and in what ways.

Current estimates are that AV1 performs equal to or better than HEVC when it comes to real time. That’s because AV1 has better tools for a similar computation load than what can be found in HEVC.

So… if you have all the time in the world to analyze the video and pick your tools, HEVC might end up with better compression quality, but for the most part, we can’t really wait that long when we encode video – unless we encode the latest movie coming out from Hollywood. For the rest of us, faster will be better, so AV1 wins.

The exact comparison isn’t there yet, but I was told that experiments done on the implementations of both AV1 and HEVC show that AV1 is equal to or better than HEVC.

Streaming, Real Time and SVC

There is something to be said about real time, which brings me back to WebRTC.

Real time low delay considerations of AV1 were discussed from the onset. There are many who focus on streaming and offline encoding of videos within AOMedia, like Netflix and Hulu. But some of the founding members are really interested in real time video coding – Google, Facebook, Cisco, Polycom and Vidyo to name a few.

Polycom and Vidyo are chairing the real time work group, and SVC is considered a first class citizen within AV1. It is being incorporated into the specification from the start, instead of being bolted onto it as was done with H.264 and VP9.

Low bitrate

Then there’s the aspect of working at low bitrates.

With the newer codecs, you see a real desire to push the envelope. In many cases, this means increasing the resolution and frame rates a video codec supports.

As far as I understand, there’s a lot of effort being put into AV1 at the other end of the scale – in working at low resolutions and doing that really well. This is important for Google for example, if you look at what they decided to share about VP9 on YouTube:

For YouTube, it isn’t only about 4K and UHD – it is about getting videos to be streamed everywhere.

Based on many of the projects I am involved with today, I can say that there are a lot of developers out there who don’t care too much about HD or 4K – they just want to get decent video being sent and that means VGA resolutions or even less. Being able to do that with lower bitrates is a boon.

Is AV1 “next gen”?

I have always considered AV1 to be the next next generation:

We have H.264 and VP8 as the current generation of video codecs, then HEVC and VP9 as the next generation, and then there’s AV1 as the next next generation.

In my mind, this is what you’d get when it comes to compression vs power requirements:

Alex opened my eyes here, explaining that reality is slightly different. If I try translating his words to a diagram, here’s what I get:

AV1 is an improvement over HEVC but probably isn’t a next generation video codec. And this is an advantage. When you start working on a new generation of a codec, the work necessary is long and arduous. Look at H.261, H.263, H.264 and HEVC codec generations:

Here are some interesting things that occurred to me while placing the video codecs on a timeline:

  • The year indicated for each codec is the year in which an initial official release was published
  • Understand that each video codec went through iterations of improvements, annexes, appendices and versions (HEVC already has 4 versions)
  • It takes 7-10 years from one version until the next one gets released. On the H.26x track, the number of years between versions has grown over time
  • VP8 and VP9 have only 4 years between one and the other. It makes sense, as VP8 came late in the game, playing catch-up with H.264 and VP9 is timed nicely with HEVC
  • AV1 comes only 6 years after HEVC. Not enough time for research breakthroughs that would suggest a brand new video codec generation, but probably enough to make improvements on HEVC and VP9

About the latest press release

AOMedia has been working towards this important milestone for quite some time – the 1.0 version specification of AV1.

The first thing I thought when seeing it was: they got there faster than WebRTC 1.0. WebRTC was announced 6 years ago and we’re still just about to have its 1.0 announced (and have been since 2015, that is). AOMedia started in 2015 and it now has its 1.0 ready.

The second one? I was interested in the quotes at the end of that release. They show the viewpoints of the various members involved.

  • Amazon – great viewing experience
  • Arm – bringing high-quality video to mobile and consumer markets
  • Cisco – ongoing success of collaboration products and services
  • Facebook – video being watched and shared online
  • Google – future of media experiences consumers love to watch, upload and stream
  • Intel – unmatched video quality and lower delivery costs across consumer and business devices as well as the cloud’s video delivery infrastructure
  • NVIDIA – server-generated content to consumers. […] streaming video at a higher quality […] over networks with limited bandwidth
  • Mozilla – making state-of-the-art video compression technology royalty-free and accessible to creators and consumers everywhere
  • Netflix – better streaming quality
  • Microsoft – empowering the media and entertainment industry
  • Adobe – faster and higher resolution content is on its way at a lower cost to the consumer
  • AMD – best media experiences for consumers
  • Amlogic – watch more streaming media
  • Argon Design – streaming media ecosystem
  • Bitmovin – greater innovation in the way we watch content
  • Broadcom – enhance the video experience across all forms of viewing
  • Hulu – Improving streaming quality
  • Ittiam Systems – the future of online video and video compression
  • NGCodec – higher quality and more immersive video experiences
  • Vidyo – solve the ongoing WebRTC browser fragmentation problem, and achieve universal video interoperability across all browsers and communication devices
  • Xilinx – royalty-free video across the entire streaming media ecosystem

Apple decided not to share a quote in the press release.

Most of the quotes there are about media streaming, with only a few looking at collaboration and social. This somewhat saddens me when it comes from the likes of Broadcom.

I am glad to see Intel and Arm taking active roles. Both as founding members and in their quotes to the press release. It is bad that Qualcomm and Samsung aren’t here, but you can’t have it all.

I also think Vidyo are spot-on. More about that later.

What’s next for AOMedia?

There’s work to be done within AOMedia with AV1. This is but a first release. There are bound to be some updates to it in the coming year.

Current plans are to have some meaningful software implementation of an AV1 encoder/decoder by the end of 2018, and somewhere during 2019 (most probably towards its end) have hardware implementations available. Here’s the announced timeline from AOMedia:

Rather ambitious.

Realistically, mass adoption would happen somewhere in 2020-2022. Until then, we’ll be chugging along with VP8/H.264 and fighting it out around HEVC and VP9.

There are talks about adding a still image format based on the work done in AV1, which makes sense. It wouldn’t be farfetched to also incorporate future voice codecs into AOMedia. This organization has shown it can bring the industry leaders to the table and come up with royalty free codecs that benefit everyone.

AV1 and WebRTC

Will we see AV1 in WebRTC? Definitely.

When? Probably after WebRTC 1.0. Or maybe not.

It will take time, but the benefits are quite clear, which is what Alex of Vidyo alluded to in the quote given in the press release:

“solve the ongoing WebRTC browser fragmentation problem, and achieve universal video interoperability across all browsers and communication devices”

We’re still stuck in the challenge of which video codec to select in WebRTC applications.

  • Should we go for VP8, just because everyone does, it is there and it is royalty free?
  • Or should we opt for H.264, because Safari supports it, and it has better hardware support?
  • Maybe we should go for VP9 as it offers better quality, and “suffer” the computational hit that comes with it?
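Whichever way that choice goes, expressing it in code is the easy part. Here is a minimal sketch, assuming a browser that supports `RTCRtpTransceiver.setCodecPreferences()`; the helper name `preferCodec` and the `CodecLike` shape are made up for illustration and are not part of any library:

```typescript
// Minimal codec shape, declared locally so the sketch does not rely on
// browser typings. Real capability objects carry more fields than this.
interface CodecLike {
  mimeType: string;
}

// Move every codec matching the preferred MIME type to the front of the list,
// keeping the relative order of everything else unchanged.
function preferCodec<T extends CodecLike>(codecs: T[], preferredMimeType: string): T[] {
  const wanted = preferredMimeType.toLowerCase();
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const others = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return [...preferred, ...others];
}

// In a browser, the reordered list would be applied along these lines
// (transceiver being an existing RTCRtpTransceiver for video):
//   const caps = RTCRtpSender.getCapabilities("video");
//   if (caps) transceiver.setCodecPreferences(preferCodec(caps.codecs, "video/VP9"));
```

The hard part remains deciding which MIME type to pass in, since the other side still has to support the codec you put first.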

AV1 for video coding is what Opus is to audio coding. That article I wrote in 2013? It is now becoming true for video. Once adoption of AV1 hits – and it will in the next 3-5 years – the dilemma of which video codec to select will be gone.

Until then, check out this free mini course on how to select the video codec for your application

The post AV1 Specification Released: Can we kiss goodbye to HEVC and royalty bearing video codecs? appeared first on

Get trained to be your company’s WebRTC guy

Mon, 03/26/2018 - 12:00

Demand for WebRTC developers is stronger than supply.

My inbox is filled with requests for experienced WebRTC developers on a daily basis. They range from entrepreneurs looking for a technical partner to managers searching for outsourcing vendors to help them out. My only challenge here is that developers and testers who know a thing or two about WebRTC are hard to find. Developers who are aware of the media stack in WebRTC, and haven’t just dabbled with a GitHub “hello world” demo, are truly rare.

This is why I created my WebRTC course almost 2 years ago. The idea was to try and share my knowledge and experience around VoIP, media processing and of course WebRTC, with people who need it. This WebRTC training has been a pleasant success, with over 200 people who took it already. And now it is time for the 4th round of office hours for this course.

Who is this WebRTC training for?

This WebRTC course is for anyone who is using WebRTC in their daily work directly or indirectly. Developers, testers, software architects and product managers will be those who benefit from it the most.

It has been designed to give you the information necessary from the ground up.

If you are clueless about VoIP and networking, then this course will guide you through the steps needed to get to WebRTC. It explains what TCP and UDP are and how HTTP and WebSockets fit on top of them, and goes through the acronyms used by WebRTC (SRTP, STUN, TURN and many others).

If you have VoIP knowledge and experience, then this course will cover the missing parts – where WebRTC fits into your world, and what to pay special attention to, assuming a VoIP background (WebRTC brings with it a different mindset to the development process).

What I didn’t want to do, is have a course that is so focused on the specification that: (1) it becomes irrelevant the moment the next Chrome browser is released; (2) it doesn’t explain the ecosystem around WebRTC or give you design patterns of common use cases. Which is why I baked into the course a lot of materials around higher level media processing, the WebRTC ecosystem and common architectures in WebRTC.

TL;DR – if you follow this blog and find it useful, then this course is for you.

Why take it?

The question should be why not?

There are so many mistakes and bad decisions I see companies making with WebRTC. From deciding how to model their media routes, to where to place their TURN servers (or how to configure them). From how to design scale out, to which open source frameworks to pick. Such mistakes end up a lot more expensive than any online course would ever be.

In April, next month, I will be starting the next round of office hours.

While the course is pre-recorded and available online, I conduct office hours for a span of 3-4 months twice a year. In these live office hours I go through parts of the course, share new content and answer any questions.

What does it include?

The course includes:

  • 40+ lessons split into 7 different modules with an additional bonus module
  • 15 hours of video content, along with additional links for extra reading material
  • Several e-books available only as part of the course, like how the Jitsi team scales Jitsi Meet, and what are sought after characteristics in WebRTC developers
  • A private online forum
  • The office hours

In the past two months I’ve been working on refreshing some of the content, getting it up to date with recent developments. We’ve seen Edge and Safari introducing WebRTC during that time for example. These refreshed lessons will be added to the course before the official launch.

When can I start?

Whenever you want. In April, I will be officially launching the office hours for this course round. At that point in time, the updated lessons will be part of the course.

What’s more, there will be a new lesson added – this one about WebRTC 1.0. Philipp Hancke was kind enough to host this lesson with me as a live webinar (free to attend live) that will become an integral lesson in the course.

If you are interested in joining this lesson live:

Free WebRTC 1.0 Live Lesson

What if I am not ready?

You can always take it later on, but I won’t be able to guarantee pricing or availability of the office hours at that point in time.

If you plan on doing anything with WebRTC in the next 6 months, you should probably enroll today.

And by the way – if you need to come as a team to up the knowledge and experience in WebRTC in your company, then there are corporate plans for the course as well.

CONTENT UPGRADE: If you are serious about learning WebRTC, then check out my online WebRTC training:

Enroll to course

The post Get trained to be your company’s WebRTC guy appeared first on

How WebRTC Statistics and Performance Monitoring Changed VoIP Monitoring

Mon, 03/19/2018 - 12:00

Monitoring focus is shifting from server-side to client-side in WebRTC statistics collection.

WebRTC happens to decentralize everything when it comes to VoIP. We’re on a journey here to shift the weight from the backend to the edge devices. While the technology in WebRTC isn’t any different than most other VoIP solutions, the way we end up using it and architecting our services around it is vastly different.

One of the prime examples here is how we shifted focus for group calling from an MCU mixing model to an SFU routing model. Suddenly, almost overnight, the notion of deploying an MCU started to seem ridiculous. And believe me – I should know – I worked at a company where 60%+ of the revenue came from MCUs.

The shift towards SFU means we’re leaning more on the capabilities and performance of the edge device, giving it more power in the interaction when it comes to how to lay out the display, instead of doing all the heavy lifting in the backend using an MCU. The next step here will be to build mesh networks, though I can’t see that future materializing any time soon.
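To make that shift of weight concrete, here is a back-of-the-envelope sketch of how many video streams each client handles in the three models. It assumes one stream per participant (no simulcast or SVC layers, which would change the SFU numbers), and the type and function names are illustrative only:

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamCount {
  sent: number;      // streams this client encodes and uploads
  received: number;  // streams this client downloads and decodes
}

// Per-client stream counts under the simplifying assumption of
// one video stream per participant.
function streamsPerClient(topology: Topology, participants: number): StreamCount {
  const others = Math.max(participants - 1, 0);
  switch (topology) {
    case "mesh":
      // Every client sends to, and receives from, every other client directly.
      return { sent: others, received: others };
    case "sfu":
      // One upload to the server, which forwards everyone else's streams as-is.
      return { sent: 1, received: others };
    case "mcu":
      // One upload, one mixed stream back; the server does the heavy lifting.
      return { sent: 1, received: 1 };
  }
}
```

For a 5-way call this gives 1 up and 4 down per client with an SFU, against 1 up and 1 down with an MCU and 4 up and 4 down in a mesh, which is why each step along that path asks more of the edge device.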

VoIP != WebRTC. Maybe not from a direct technical point, but definitely from how we end up using it. If you need to learn more about WebRTC, then my WebRTC training is exactly what you need:

Enroll to course

What I wanted to mention here is something else that is happening, playing towards the same trend exactly – we are moving the collection of VoIP performance statistics (or more accurately WebRTC statistics) from the backend to the edge – we now prefer doing it directly from the browser/device.

VoIP Statistics Collection and Monitoring

If you are not familiar with VoIP statistics collecting and monitoring, then here’s a quick explainer for you:

VoIP is built out of the notion of interoperability. Developers build their products and then test them against the spec and in interoperability events. Then those deploying them integrate, install and run a service. Sometimes this ends up using a single vendor, but more often than not, multiple vendor products run in the same deployment.

There is no real specification or standard for how monitoring needs to happen or what kind of statistics can, should or are collected. There are a few means of collecting that data, and one of the most common approaches is by employing HEP/EEP. As the specification states:

The Extensible Encapsulation protocol (“EEP”) provides a method to duplicate an IP datagram to a collector by encapsulating the original datagram and its relative header properties (as payload, in form of concatenated chunks) within a new IP datagram transmitted over UDP/TCP/SCTP connections for remote collection. Encapsulation allows for the original content to be transmitted without altering the original IP datagram and header contents and provides flexible allocation of additional chunks containing additional arbitrary data. The method is NOT designed or intended for “tunneling” of IP datagrams over network segments, and best serves as vector for passive duplication of packets intended for remote or centralized collection and long term storage and analysis.

Translating this to plain English: media packets are duplicated for the purpose of sending them off to be analyzed via a monitoring service.

The duplication of the packets happens in the backend, through the different media servers that can be found in a VoIP network. Here’s how it is depicted on HOMER/SIPCAPTURE’s website:

HOMER collects its data directly from the servers – OpenSIPS, FreeSWITCH, Asterisk, Kamailio – there are no user devices here – just backend servers.

Other systems rely on the switches, routers and network devices that again reside in the backend infrastructure. Since in VoIP production networks, we almost always route the media through the backend servers, the assumption is that it is easier to collect it here where we have more control than from the devices.

This works great, but it isn’t really needed or helpful with WebRTC.

WebRTC Statistics Collection and Monitoring

With WebRTC, there are only a handful of browsers (4 to be exact), and they all adhere to the same API (that would be WebRTC). And they all have that thing called getStats() implemented in them. It exposes the same information you find in chrome://webrtc-internals.

Many deployments end up running peer-to-peer, having the media traverse directly through the internet and not through the backend of the service itself. Google Hangouts decided to take that route two years ago. Jitsi added this capability under the name Jitsi P2P4121. How do these services control and understand the quality of their users?

If you look at other media servers out there, most of them are only a few years old. WebRTC is just 6 years old now. So everyone’s focused on features and stability right now. Quality and monitoring are not in their focus area just yet.

Last, but not least, WebRTC is encrypted. Always. And everywhere. So sniffing packets and deducing quality from them isn’t that easy or accurate any longer.

This led WebRTC applications to focus on gathering WebRTC statistics from the browsers and devices directly, and not trying to get that information from the media servers.

The end result? Open source projects such as rtcstats, as well as commercial services. At the heart of these, WebRTC statistics get collected using the getStats() API at an interval of one or more seconds and sent over to a monitoring server, where they are collected, stored, aggregated and analyzed. We use a similar mechanism at testRTC to collect, analyze and visualize the results of our own probes.
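The collection loop described here can be sketched roughly as follows. This is not how rtcstats or testRTC implement it, just a minimal illustration: the interfaces are declared locally to mirror the shape of `RTCPeerConnection.getStats()`, and `snapshotStats`, `startStatsCollector` and the `send` transport are hypothetical names:

```typescript
// Minimal shapes, declared locally so the sketch does not depend on browser typings.
interface StatsEntry {
  id: string;
  type: string;
  timestamp: number;
  [key: string]: unknown;
}

interface StatsSource {
  // RTCPeerConnection.getStats() resolves to a map-like report exposing forEach().
  getStats(): Promise<{ forEach(cb: (entry: StatsEntry) => void): void }>;
}

// Hypothetical transport: in a browser this could wrap fetch() or a WebSocket.
type SendFn = (payload: string) => void;

// Take one snapshot and flatten the report into a plain array.
async function snapshotStats(pc: StatsSource): Promise<StatsEntry[]> {
  const report = await pc.getStats();
  const entries: StatsEntry[] = [];
  report.forEach((entry) => {
    entries.push(entry);
  });
  return entries;
}

// Poll at a fixed interval and hand each snapshot to the transport.
// Returns a function that stops the polling.
function startStatsCollector(pc: StatsSource, send: SendFn, intervalMs = 1000): () => void {
  const timer = setInterval(() => {
    snapshotStats(pc)
      .then((entries) => send(JSON.stringify({ collectedAt: Date.now(), entries })))
      .catch(() => {
        // The connection may already be closed; skip this tick.
      });
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a real application you would pass in the actual peer connection, call the returned stop function when the call ends, and let the server side do the storing and aggregating.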

What does that give us?

  1. The most accurate indication of performance for the end user – since the statistics are collected directly on the user’s device, there’s no loss of information from backend collection
  2. Easy access to the information – there’s a uniform means of data collection here taking place. One you can also implement inside native mobile and desktop apps that use WebRTC
  3. Increased reliance on the edge, a trend we see everywhere with WebRTC anyway

What’s Next?

WebRTC changes a lot of things when it comes to how we think about and architect VoIP networks. The part of how and why this is done on statistics and monitoring is something I haven’t seen discussed much, so I wanted to share it here.

The reason for that is threefold:

  1. Someone asked me a similar question on my contact page in the last couple of days, so it made sense to write a longform answer as well
  2. We’re contemplating at testRTC offering a passive monitoring product to use “on premise”. If you want to collect, store and analyze your own WebRTC statistics without giving it to any third party cloud service, then ping us at testRTC
  3. My online WebRTC training is getting a refresher and a new round of office hours. This all starts in April. Time to enroll if you want to educate yourself on WebRTC


The post How WebRTC Statistics and Performance Monitoring Changed VoIP Monitoring appeared first on

Twilio Flex = Twilio Flexing its Flexibility (or the programmable contact centers)

Wed, 03/14/2018 - 12:00

Twilio Flex is a peek into the future of enterprise software.

This week, Twilio announced a new product called Flex. The name and the broad strokes about what Flex is found their way to TechCrunch some two weeks ago. I wanted to share my thoughts about Twilio Flex.

A few notes before I start

  • Twilio isn’t paying me for writing this
    • They are a customer in other areas, but this one is all me. I think Flex (as well as Studio, Engagement Cloud, Functions, etc.) are interesting products coming from Twilio, and they are worth a long form analysis and review
    • Articles here are never paid for. Neither are guest posts or interviews. If something interests me, I’ll write about it
  • The information here is based mainly on a briefing I received about Flex and what I found since then on other sites (and on Twilio’s website)
  • Flex is a departure from many things Twilio has been doing, making it an interesting initiative to analyze

What is Twilio Flex?

Twilio Flex is CCaaS (Contact Center as a Service). It isn’t the first one. Twilio is touting it as a Programmable Contact Center, which is how they are referring to all of their products.

Here’s Jeff Lawson’s keynote from Enterprise Connect (as usual, Jeff’s keynotes are worth the time and attention):

Where Twilio tried to differentiate Flex from existing solutions is by making it a fully functional contact center solution that is Flexible enough to customize and modify. It has APIs, but the day-to-day users won’t see them, and a lot of the customizations needed don’t require digging deep into the API layer either. That’s at least the intent (I didn’t have the chance to see the integration and API layers of Flex yet).

Twilio highlights 5 main benefits with Flex:

  • Unlimited customization – through the lower layers of Twilio’s product portfolio, along with a new addition to it, the Flex UI (not a lot/enough was explained about it thus far)
  • Instant omnichannel – support for multiple communication channels. More on this later
  • Contextual intelligence – Twilio’s ML/AI roadmap lies here
  • Trusted scale – due to its use of the Twilio infrastructure
  • 2 million developers – that’s the number of Twilio registered developers

Flex fits well into one of Twilio’s largest market segments – the contact center. And there, Twilio are aiming for contact centers with 1,000+ seats. The big boyz.

As it was working to move up the food chain, offering ever larger components, migrating away from developers towards end users in the B2B space and in contact centers made sense.

Flex and the Twilio Portfolio

If I had to map the road Twilio is taking with its portfolio, it would end up being something like this (I’ve removed a lot of the products for simplicity):

Transactional: It started with SMS and Voice, adding VoIP services and later on expanding horizontally to other components and building blocks such as IP Messaging and others. In this layer, and to some extent in Omnichannel, Twilio’s focus is on horizontal expansion towards a “Best of Suite” offering.

Omnichannel: In 2017, Twilio added the Twilio Engagement Cloud. It placed a few existing products from its portfolio in that layer and added Notify and Proxy to them. They stated that these are “Declarative APIs” talking about general intent while including logic of their own. At the end of the day, many of the products/APIs in this layer are Omnichannel – they work across channels using the one available/preferred/whatever for the task at hand.

Visual: This is where the story became really interesting. Twilio added Studio to its portfolio. It went up the food chain again, this time, with a visual IDE and a message that Twilio is no longer a company that serves only developers, but one that can be used by others within the organization.

Programmable Enterprise Software: This is where Flex comes in, going up the food chain again. This time, offering a solution that doesn’t interact with the end users only as a consequence (a phone rings), but rather has a new set of users – people who aren’t developers or planners, but who sit in front of the tool every day and use it. The contact center agents and personnel.

Flex was defined to me in the domain of “Programmable Applications”. Twilio is, in a way, trying to do two things with this definition:

  1. Programmable means it isn’t diverging from its roots completely, just taking the obvious next step in its evolution. All of its core products are Programmable X (X being SMS, Voice, Video, …)
  2. It allows it to position Flex not as another contact center, but rather as something new that is different

To me it is about the future of enterprise software and how to make it programmable and flexible in ways that are still impossible today. The closest to that we’ve got is probably having so many vendors integrate with Zapier.

I am sold on that kind of a future, but I am not sure others will be.

Flex Channels Proposition

Flex leans on a lot of other products in Twilio’s portfolio. One of its core values lies in omnichannel, and the fact that Twilio is already investing in a programmable layer that handles that (the Engagement Cloud). The proposition here is that whatever Twilio adds as a channel for developers, gets almost automatically added to Flex for its contact center customers.

Out the door, Flex comes with support for Voice, SMS, Chat, Video, Email, Fax, Twitter DM, Google RCS, Facebook Messenger and LINE. It also includes Screen Sharing and Co-Browsing as additional capabilities within the interactions. Developers can add additional channels to customize their contact center as well.

The list of channels is impressive, but somehow Apple Business Chat is missing from that list. Apple’s launch partners in this case were contact center vendors (LivePerson, Nuance, Genesys and Salesforce). Twilio, which is still recognized solely as a CPaaS vendor, didn’t make the cut. I am sure Twilio tried becoming a partner, so this is more likely a decision made by Apple. I am also sure that once Apple opens up Business Chat to more developers, Twilio will be adding support for it.

The biggest promise here? Twilio is already committed to omnichannel in its products, and Flex will benefit from that commitment, as will Flex’s customers.

Think you know how WebRTC fits in a contact center? Check out The Complete WebRTC Contact Center Uses Swipefile

Get the swipefile

Machine Learning and Artificial Intelligence in Flex

A year or two ago, ML and AI in CPaaS was science fiction. Twilio as well as its competitors dealt in real time, in transactional and transient communications. If any machine learning work was taking place, it was in the operational layers – in an effort to optimize cost and deliverability of its service to its customers.

Last year, Twilio launched Understand, a layer built on top of Google’s Natural Language Processing (NLP) capabilities. Understand is where Twilio started looking at ML and AI in the context of actual services for its customers. It looks at the problem domain of its customers (mainly contact centers) and tries to offer higher level APIs that are easier to use and are targeted at NLU (Natural Language Understanding). This then gets focused on the specific domain of the customer’s needs, and you get something that is usable today (as opposed to building a general purpose AI such as Siri, Alexa or Google Assistant).

The result in Understand is a way to simplify the development processes and requirements for Twilio’s customers when it comes to NLU.

That also got wrapped into Flex, at least on slides.

My feelings? The AI story of Flex is built out of two parts:

  1. Collecting all the existing ML/AI/intelligent related capabilities of Twilio and wrapping them inside Flex. This is done through internal APIs as well as via partners
  2. Having a roadmap vision / story of what AI means in Flex moving forward

AI being the holy grail that it is, you can’t ignore it when launching a new service these days.

Flex Pricing is Key

Pricing for Flex hasn’t been announced, but one thing was made clear – it will be based on a per seat price and not usage based like other Twilio products.

This is where things get somewhat challenging for Twilio, and here’s why:

  • Twilio has so far been comfortable offering a usage based model. Switching to a per seat model will change how it calculates its revenue and margins
  • By opting for per seat pricing, Twilio falls into the contact center industry “comfort zone” – the model is known and accepted already
  • But this also makes comparing Twilio Flex pricing to other contact centers rather “easy”. It means I can now compare apples to apples when selecting between Flex and any other vendor
  • We don’t have price points, but if the price point is based on the industry average or accepted standard, then many analysts and experts will end up saying that there’s no disruption or anything new in Twilio Flex. For the pundits, Flex may seem like an ordinary contact center and without price disruption there can be no disruption with that mindset
  • If the price points are too high, then Twilio will be going after its own contact center customers, who will see this as direct competition. Such a move can signal to others that Twilio is willing to go into their turf as well. It will call into question the potential and attractiveness of joining the Flex marketplace
  • If the price points are lower, then where will Twilio’s margins come from?

My guess is that Twilio is still looking for price validation and it is doing so this week at Enterprise Connect and planning to continue doing so in the coming weeks until it is ready to announce the price points publicly.

Who is Twilio Flex for?

This is the main question, and one whose answer I am not sure of.

Twilio is saying the target audience is 1,000+ seats contact centers. It makes sense to go for the larger contact centers at a time when the transition towards the cloud and digital transformations of contact centers is happening more.

But would I be using it in my business or go through a third party?

Should a Twilio customer that built a contact center on its own on top of Twilio migrate to Flex?

Should a Twilio customer that built a contact center for others to use on top of Twilio see Flex as a threat or as an opportunity to improve its own contact center offering?

Twilio stated that 89% of contact centers today are still deployed on premise, and that the market is large enough. This statement was made to answer two questions:

  1. The market is big enough for both its existing customers and for Flex, so it isn’t competing directly with its customers (I guess its customers will have to decide if that’s true for them or not)
  2. The market is big enough for Twilio to grow in. Twilio is relying on that to keep growing

Twilio was already trending upwards when word on Flex was leaked by TechCrunch on Feb 17, and has been increasing since:

source: Google

Whether that is related to Flex or not, I can’t say. To me, going to contact centers as an adjacent market and eating up more of the pie there is a bold move. If it succeeds, then Twilio will be much bigger than it is today.

The Unknowns

There are things that are still unknown to me here. They are technical ones, but important for my own perspective and analysis. They are related to what wasn’t directly in the briefing or the materials I’ve seen since the official announcement.

Here are a few things I am really interested in:

  • What are the exact integration points for Flex?
  • How are developers expected to integrate with it?
  • Where do you use Twilio APIs? Where will you be making use of Twilio Studio? Where do you write a Twilio Function? How about Twilio Understand?
  • Flex UI is brand new. How does it fare as a standalone product enabler? What can developers do with it?
  • What will it mean to integrate Flex with a CRM? Does it make more sense to integrate the CRM into the Flex UI or does it make more sense to integrate Flex into the CRM UI?
  • What parts of “contextual intelligence” really exist in Flex today? How does it compare to existing market offerings?
  • What do contact center vendors using Twilio think about Flex? How will they react to it?

Is CPaaS Eating CCaaS?


Here’s one way to map the communications landscape:

And here’s another:

What’s your worldview here?


The post Twilio Flex = Twilio Flexing its Flexibility (or the programmable contact centers) appeared first on

WebRTC 1.0 – What on earth is it anyway? (register to the webinar)

Mon, 03/12/2018 - 12:00

TL;DR – register to this webinar about WebRTC 1.0

As I am prepping for another launch of my Advanced WebRTC Architecture Course, I went through the content to make sure it is up to date. This is by far the hardest thing about a course on something like WebRTC – what was right on Chrome 63 might not be correct anymore for Chrome 64. Or is it 65 now?

I spent time updating and refreshing some of the lessons with new material, but I ended up with one area where the course is weak. And that’s WebRTC 1.0 information.

The problem there is that while I can tell some of the story, I definitely can’t tell it to the level I wanted. It got me to partner again with Philipp Hancke, whom I love working with on lots of mini-projects. I asked Philipp if he would be willing to host such a lesson for me as a live webinar and he said yes (yippie).

What’s in the Webinar?

So here’s what we’re going to do:

Next month, right after Passover, and because Philipp asked for April, we’re going to host a lesson/webinar about WebRTC 1.0.

Philipp will skim quickly over the backstory of WebRTC 1.0, where we are today and more importantly where we’re headed with it. What we will cover in more detail will include answers to questions like:

  • What should you change in your app due to WebRTC 1.0?
  • What new tricks did 1.0 teach the “old” WebRTC dog?
  • Do you need to update your app to be compliant and work in Chrome next year?
  • How much effort is involved in this migration to WebRTC 1.0 anyway?
  • If you pick out a WebRTC project on github, how would you know if it supports WebRTC 1.0 or not?

What I want here is for you (and me) to really understand the impact WebRTC 1.0 is going to have on all of us in 2018 and on.


This webinar/lesson will take place on

Tuesday, April 10

1-2PM EST (view in your timezone)

Save your seat →

The session’s recording will NOT be available after the event itself. While this lesson is free to attend live, the recording will become an integral part of the course’s lessons.

The post WebRTC 1.0 – What on earth is it anyway? (register to the webinar) appeared first on

You Better Ignore the Default Protocol Ports You Implement

Mon, 03/05/2018 - 12:00

Default protocol ports are great, but ones that will work in the real world are better.

If you want something done properly, you should probably ignore the specification of the protocols you use every once in a while. When I worked years ago in implementing protocols directly, there was this notion – you need to send messages in the strictest format possible but be very lenient in how you enable receiving them. The reason behind that is that by being strict on the sender side, you will achieve higher interoperability (more devices will be able to “decipher” what you sent) and by being lenient on the receiving side, you achieve the same (being able to understand messages from more devices). Somehow, it isn’t worth being right here – it just makes more sense to be smart.

The same applies to default protocol ports.

Assume for the sake of argument that we have a theoretical protocol that requires the use of port number 5349. You set up the server, configure it to listen on that port (after all, we want to be standard compliant), and you run your service.
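To make the strict-sender, lenient-receiver notion concrete, here is a minimal sketch in Python. The "name: value" header line and its canonical form are assumptions made up for this illustration, not taken from any specific protocol:

```python
from typing import Tuple


def format_header(name: str, value: str) -> str:
    """Sender side: always emit exactly one canonical form."""
    # Canonical form (an assumption of this toy protocol):
    # lowercase name, a single ": " separator and a CRLF line ending.
    return f"{name.strip().lower()}: {value.strip()}\r\n"


def parse_header(line: str) -> Tuple[str, str]:
    """Receiver side: accept sloppy but unambiguous input."""
    # Tolerate a bare LF, stray whitespace and any letter case in the name.
    name, sep, value = line.strip().partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a header line: {line!r}")
    return name.strip().lower(), value.strip()
```

The sender has a single way to write a header, while the receiver still understands something like "  MAX-SIZE :1200" followed by a bare newline.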

Will that work well for you?

For the most part, as the illustration above shows, yes it will.

The protocol is probably client-server based. A client somewhere from inside his private network is accessing the Internet, going to the public IP of your server to that specific port and connects. Life is good.

Only sometimes it isn’t.

Hmm… what’s going on here now? Someone in the IT department decided to block outgoing traffic to port 5349. Or maybe, just maybe, he decided to open outgoing traffic solely for ports 80 and 443. And why would he do that? Because that’s where HTTP and HTTPS traffic goes, to the web servers that our browsers connect to. And I don’t know any blue collar employee today who would be able to do his job without connecting to the Internet with his browser. Writing this draft of an article requires such a connection (I do it on Google Docs and then copy it to WordPress once done).

So the same scenario, with the same requirements won’t work if our server decides to use the default port 5349.

What if we decide to pass it through port 443?

Now it has a better chance of working. Why? Because port 443 is the default port for HTTPS, which runs over TLS and is encrypted. This means that beyond the destination of the data, the firewall we’re dealing with can’t know a thing about what’s being sent, so it will usually treat it as “HTTPS” type of traffic and will just pass it along.

There are caveats here. If the enterprise is enforcing a local trusted web proxy, it actually acts as a man in the middle and opens all packets, which means it now sees the traffic and might decide not to pass it since it can’t understand it.

What we’re aiming for is best coverage. And port 443 will give us that. It might get blocked, but there’s less of a chance for that to happen.
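One way to see this from inside a given network is to try opening a TCP connection to each candidate port and note which one gets through. The sketch below uses only Python's standard socket module. It checks plain TCP reachability only (no UDP, no TLS handshake), and the function names are made up for this example:

```python
import socket
from typing import Iterable, Optional


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, filtered by a firewall or failed DNS lookup.
        return False


def first_reachable_port(host: str, ports: Iterable[int],
                         timeout: float = 3.0) -> Optional[int]:
    """Return the first port in `ports` that accepts a TCP connection."""
    for port in ports:
        if can_connect(host, port, timeout):
            return port
    return None
```

Calling first_reachable_port("relay.example.com", [5349, 443]) (a placeholder hostname) from behind a restrictive firewall would be expected to skip 5349 and return 443, assuming the server listens on both.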

Here are a few examples where ignoring your protocol default ports is suggested:


The reason for this article is TURN. TURN is used by WebRTC (and other protocols) to get your media session connected in case you can’t send it directly peer-to-peer. It acts as a relay to the media that sits in the public internet with the sole purpose of punching holes in NATs and traversing firewalls.

TURN runs over UDP, TCP and TLS. And yes. You WANT to configure and run it on UDP, TCP and TLS (don’t be lazy – configure them all – it won’t cost you more).

Want to learn more about WebRTC in general and NAT traversal specifically? Enroll to my WebRTC training today to become a pro WebRTC developer.

Enroll to course

The default ports for your STUN and TURN servers (you’re most probably going to deploy them in the same process) are:

  • 3478 for STUN (over UDP)
  • 3478 for TURN over UDP – same as STUN
  • 3478 for TURN over TCP – same as STUN and as TURN over UDP
  • 5349 for TURN over TLS

A few things that come to mind from this list above:

  1. We’re listening to the same port for both UDP and TCP, and for both STUN and TURN – which is just fine
  2. Remember that 5349 from my story above?

Here’s the thing. If you deploy only STUN, then many WebRTC sessions won’t connect. If you also deploy TURN/UDP then some sessions still won’t connect (mainly because of IT admins blocking UDP altogether). TURN/TCP might not connect either. And guess what – TURN/TLS on 5349 can still be blocked.

What is a developer to do in such a case?

Just point your WebRTC devices towards port 443 for ALL of your STUN/TURN traffic and be done with it. This approach has no real downsides versus deploying with the default ports and all the potential upsides.
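As a rough sketch of what that could look like, here is a Python helper that builds the iceServers list a backend might hand to its WebRTC clients as JSON. The hostname, credentials and function name are placeholders. Whether a single TURN server can serve UDP, TCP and TLS on the same port depends on the server you deploy, so treat that as an assumption to verify:

```python
import json
from typing import Dict, List


def build_ice_servers(host: str, username: str, credential: str,
                      port: int = 443) -> List[Dict]:
    """Build an iceServers list that points STUN and TURN at one port."""
    return [
        {"urls": [f"stun:{host}:{port}"]},
        {
            "urls": [
                f"turn:{host}:{port}?transport=udp",   # TURN over UDP
                f"turn:{host}:{port}?transport=tcp",   # TURN over TCP
                f"turns:{host}:{port}?transport=tcp",  # TURN over TLS
            ],
            "username": username,
            "credential": credential,
        },
    ]


if __name__ == "__main__":
    # turn.example.com and the credentials below are placeholders.
    servers = build_ice_servers("turn.example.com", "user", "secret")
    print(json.dumps({"iceServers": servers}, indent=2))
```

The resulting JSON has the same shape as the iceServers member of the configuration object passed to RTCPeerConnection in the browser.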

Here’s how a couple of services I checked almost at random do this properly (I’ve used chrome://webrtc-internals to get them):

Hangouts Meet

Or Google Hangouts. Or Google Meet. Or whatever name it now has. I did use the Meet one:

{ iceServers: [,,,,], iceTransportPolicy: all, bundlePolicy: max-bundle, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {enableDtlsSrtp: {exact: false}, enableRtpDataChannels: {exact: true}, advanced: [{googHighStartBitrate: {exact: 0}}, {googPayloadPadding: {exact: true}}, {googScreencastMinBitrate: {exact: 400}}, {googCpuOveruseDetection: {exact: true}}, {googCpuOveruseEncodeUsage: {exact: true}}, {googCpuUnderuseThreshold: {exact: 55}}, {googCpuOveruseThreshold: {exact: 85}}]}

Google Meet comes with STUN:19302 with 5 different subdomain names for the server. There’s no TURN here because the service uses ICE-TCP directly from their media servers.

The selection of port 19302 is quaint. I couldn’t find any reference to that number or why it is interesting (not even a mathematical one).

Google AppRTC

You’d think Google’s showcase of WebRTC would be an exemplary citizen of a solid STUN/TURN configuration. Well… here’s what it got me:

{ iceServers: [turn:, turn:[2a00:1450:400c:c08::7f]:19305?transport=udp, turn:, turn:[2a00:1450:400c:c08::7f]:443?transport=tcp,], iceTransportPolicy: all, bundlePolicy: max-bundle, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }

It had TURN/UDP at 19305, TURN/TCP at 443 and STUN at 19302. Unlike others, it had explicit IPv6 addresses. It had no TURN/TLS.

Jitsi Meet

{ iceServers: [,,,,,,,,], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{googHighStartBitrate: {exact: 0}}, {googPayloadPadding: {exact: true}}, {googScreencastMinBitrate: {exact: 400}}, {googCpuOveruseDetection: {exact: true}}, {googCpuOveruseEncodeUsage: {exact: true}}, {googCpuUnderuseThreshold: {exact: 55}}, {googCpuOveruseThreshold: {exact: 85}}, {googEnableVideoSuspendBelowMinBitrate: {exact: true}}]}

Jitsi shows multiple locations for STUN and TURN – eu-central, eu-west with STUN:443, TURN/UDP:443 and TURN/TCP:443. No TURN/TLS.

{ iceServers: [,,], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{googCpuOveruseDetection: {exact: true}}]}

The service behind this configuration went for TURN/UDP:443, TURN/TCP:443 and TURN/TLS:443. STUN is implicit here via the use of TURN.

Facebook Messenger

{ iceServers: [, turn:, turn:, turn:], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{enableDtlsSrtp: {exact: true}}]}

Messenger uses port 3478 for STUN, TURN over UDP on port 40002, TURN over TCP on port 3478. It also uses TURN over TCP on port 443. No TURN/TLS for Messenger.

Here’s what I’ve learned here:

  • People don’t use the default STUN/TURN ports in their deployments
  • Even if they don’t use ports that make sense (443), they may not use the default ports (See Google Meet)
  • With something as seemingly straightforward as STUN/TURN, everyone ends up implementing it differently

We’ve looked at NAT Traversal and its STUN and TURN servers. But what about some signaling protocols? The first one that came to mind when I thought about other examples was MQTT.

MQTT is a messaging protocol that is used in the IOT and M2M space. Others use it as well – Facebook for example:

They explained how MQTT is used as part of their Messenger backend for the WebRTC signaling (and I guess all other messages they send over Messenger).

MQTT can run over TCP listening on port 1883 and over TLS on port 8883. But then when you look at the AWS documentation for AWS IOT, you find this:

There’s no port 1883 at all, and now port 443 can be used directly if needed.


It would be interesting to know if Facebook Messenger’s mobile app uses MQTT over port 443 or 8883 – and if it is port 443, is it MQTT over TLS or MQTT over WebSocket. If what they do with their STUN and TURN servers is any indication, any port number here is a good guess.


SIP is the most common VoIP signaling protocol out there. I didn’t remember the details, so I checked in Wikipedia:

SIP clients typically use TCP or UDP on port numbers 5060 or 5061 for SIP traffic to servers and other endpoints. Port 5060 is commonly used for non-encrypted signaling traffic whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS).

Port 5060 for UDP and TCP traffic. And port 5061 for TLS traffic.

Then I asked a friend who knows a thing or two about SIP (he’s built more than his share of production SIP networks). His immediate answer?


He remembered 5060 was UDP, 5061 was TCP and 443 is for TLS.

When you want to deploy a production SIP network, you configure your servers to do SIP over TLS on port 443.

Next Steps

If you are looking at protocol implementations and you happen to see some default ports that are required, ask yourself if using them is in your best interest. To get past firewalls and other nasty devices along the route, you might want to consider using other ports.

While you’re at it, I’d avoid sending stuff in the clear if possible and opt for TLS on the connection, which brings us back to 443. Possibly the most important port on the Internet.

If you are serious about learning WebRTC, then check out my online WebRTC training:

Enroll to course

The post You Better Ignore the Default Protocol Ports You Implement appeared first on

“Open Source” SDK for SaaS and CPaaS are… Meh

Mon, 02/26/2018 - 12:00

Open Source SDKs from SaaS vendors aren’t interesting.

Every once in a while, I see a SaaS vendor boasting about having open source SDKs. The assumption is that if you say “open source” about something you are doing, it immediately makes the thing free and open. The truth is far from it.

Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:

Get the shortlist

Open Source Today

I want to start with an explanation of open source today.

Open source is a way for a vendor or a single developer to share his code with the “community” at large. There are many reasons why a vendor would do such a thing:

  1. To get others in the industry to assist in the effort of building and maintaining that code base (in most cases, such initiatives fail to meet their objective)
  2. To show technical savviness as a company. This is good for the brand’s name and when a company wants to attract top notch developers
  3. To showcase one’s technical abilities. An individual developer can use his github account to attract potential employers and projects
  4. To offer a reference implementation or a helper library for integrating with the company’s application

The above reasons are related to companies with proprietary software that they want protected. What they end up doing, is share modules or parts of their codebase as open source. Usually ones they assume won’t help a competitor copy and compete with them directly.

The other approach is to use open source as a full fledged business model:

  1. Releasing a project as open source, then offering a non-open source license
  2. Or offering support and an SLA to it
  3. Or offering a hosted version of it
  4. Or offering customization work around it

A good example here is FreeSWITCH. They are offering support and customization work around this popular open source project. And now, there’s SignalWire, an upcoming hosted version of FreeSWITCH.

You see, for a company to employ open source, there needs to be an upside. Philanthropy isn’t a business model for most.

Cloud versus On-premise when Consuming Open Source

SaaS changes the equation a bit.

I tried placing different open source licenses on a kind of a graph, alongside different deployment models. Here’s what I got:

(if you’re interested here’s where to learn more about open source licenses)

CPaaS and SaaS in general are cloud deployments. They enable the company more leeway in the type of open source licenses it can consume. An on-premise type of business better beware of using GPL, whereas a cloud deployment one is just fine using GPL.

This isn’t to say that GPL can’t be used by on premise deployments – just that it complicates things to a point that oftentimes the risks of doing so outweigh the potential reward.

CPaaS / SaaS vendors and Interfaces

On the other end of the equation you’ll find how customers interact with CPaaS vendors.

Towards that goal, the main approach today is by way of an API. And APIs today are almost always defined using REST.

In the illustration above, we have a SaaS or CPaaS vendor exposing a REST API. On top of that API, customers can build their own applications. The vendor wants to make life easier for them, to increase adoption, so he ends up implementing helper libraries. The helper libraries can be official ones or unofficial ones, either created by third parties or the vendor himself. They can just be reference implementations on top of the API, offered as starting points to customers with no real documentation or interface of their own.

For the most part, helper libraries are something I’d expect customers to deploy and run on their servers, to make it easier for them to connect from whatever language and framework they want to use to the vendor’s service.
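To illustrate how thin such a helper library can be, here is a minimal sketch in Python. Everything vendor specific in it is hypothetical: the api.example.com base URL, the /messages endpoint, the bearer token scheme and the class name are made up for this example and do not describe any real vendor’s API:

```python
import json
import urllib.request


class MessagingClient:
    """Hypothetical helper library wrapping a vendor's REST API."""

    def __init__(self, api_key: str,
                 base_url: str = "https://api.example.com/v1"):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def build_send_message(self, to: str, body: str) -> urllib.request.Request:
        """Translate a method call into the underlying REST request."""
        payload = json.dumps({"to": to, "body": body}).encode("utf-8")
        return urllib.request.Request(
            f"{self.base_url}/messages",
            data=payload,
            method="POST",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

    def send_message(self, to: str, body: str) -> dict:
        """Send the request and decode the JSON response."""
        with urllib.request.urlopen(self.build_send_message(to, body)) as resp:
            return json.load(resp)
```

The customer calls send_message() from their own server code and never has to deal with URLs, headers or JSON encoding directly.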

On a client device, we have SDKs. In some ways, SDKs are just like helper libraries. They connect to the backend REST API, though sometimes they may have a more direct/optimized connection to the platform (proprietary, undocumented WebSocket connection for example).

SDKs are something you’ll find with most of the services where a state machine needs to be maintained on the client side. In the context of most of the things I write here, this includes CPaaS platforms deciding to offer VoIP calling (voice or video) by way of WebRTC or by other means over non-browser implementations. In many of these cases, the developers never actually implement REST calls – they just use the SDK’s interface to get things done.

Which is where the notion of open source SDKs sometimes comes up.

The Open Source SDK

If we’re talking about a SaaS platform, then having the source code of the SDK has its benefits, but none of them relate to “open source”. There’s no ecosystem or adoption at play for the open source code.

The reasons why we’d like to have the source code of an SDK are varied:

  1. Reading the code can give us better understanding of how the service works
  2. Being able to run the code step by step in a debugger makes it easier to troubleshoot stuff
  3. Stack traces are more meaningful in crashes

Here’s the thing though –

Trying to market the SDK as open source is kinda misleading as to what you’re getting out of your end of the deal.

When it comes to CPaaS and WebRTC, there’s this added complexity: vendors will “open source” or give the source code of their JS SDK (because there’s no real alternative today, at least not until WebAssembly becomes commonplace). As for the Android and iOS SDKs, I don’t remember seeing one that is offered in source code form – probably because all vendors are tweaking and modifying the baseline WebRTC code.

SaaS and Open Source

In a way, SaaS has changed the models and uses of open source. When it was first introduced to the world, software was executed on premise only. There was no cloud, and SDKs and frameworks were commercially licensed. If you wanted something done, you either had to license it or build it yourself.

Open source came and changed all that by enabling vendors to build on top of open source code. Vendors came out with business models around dual licensing of code as well as support and customization models.

SaaS vendors today use open source in three different ways:

  1. They use it to build their platform. Due to their model, they are less restricted as to the type of open source licenses they can live with
  2. They open source code modules. Either by forking and sharing modified open source modules they use or by open sourcing specific modules
    1. Mostly because their developers push towards that goal
    2. And because they believe these modules won’t give away any of their competitive advantages
    3. Or to attract potential customers
  3. They may open source their whole platform. Not common, but it does happen. Idea here is to make revenue out of hosting the service at scale and giving away the baseline service for free (think WordPress for example)


Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:

Get the shortlist

The post “Open Source” SDK for SaaS and CPaaS are… Meh appeared first on

Do I Need a Media Server for a One-to-Many WebRTC Broadcast?

Tue, 02/20/2018 - 12:00


Do I need a media server for a one-to-many WebRTC broadcast?

That’s the question I was asked on my chat widget this week. The answer was simple enough – yes.

Decided you need a media server? Here are a few questions to ask yourself when selecting an open source media server alternative.

Get the Selection Sheet

Then I received a follow up question that I didn’t expect:


That caught me off-guard. Not because I don’t know the answer. Because I didn’t know how to explain it in a single sentence that fits nicely in the chat widget. I guess it isn’t such a simple question either.

The simple answer is a limit in resources, along with the fact that we don’t control most of these resources.

The Hard Upper Limit

Whenever we want to connect one browser to another with a direct stream, we need to create and use a peer connection.

Chrome 65 includes an upper limit to that which is used for garbage collection purposes. Chrome is not going to allow more than 500 concurrent peer connections to exist.

500 is a really large number. If you plan on more than 10 concurrent peer connections, you should be one of those who know what they are doing (and don’t need this blog). Going above 50 seems like a bad idea for all use cases that I can remember taking part in.

Understand that resources are limited. Free and implemented in the browser doesn’t mean that there aren’t any costs associated with it or a need for you to implement stuff and sweat while doing so.

Bitrates, Speeds and Feeds

This is probably the main reason why you can’t broadcast with WebRTC, or with any other technology.

We are looking at a challenging domain with WebRTC. Media processing is hard. Real time media processing is harder.

Assume we want to broadcast a video at a low VGA resolution. We checked and decided that 500kbps of bitrate offers good results for our needs.

What happens if we want to broadcast our stream to 10 people?


Broadcasting our stream to 10 people requires 10 × 500kbps = 5Mbps of uplink bitrate.

If we’re on an ADSL connection, then we can find ourselves with 1-3Mbps uplink only, so we won’t be able to broadcast the stream to our 10 viewers.

For the most part, we don’t control where our broadcasters are going to be. Over ADSL? WiFi? 3G network with poor connectivity? The moment we start dealing with broadcast we will need to make such assumptions.

That’s for 10 viewers. What if we’re looking for 100 viewers? 1,000? A million?
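The arithmetic is simple enough to put in a few lines of Python. This sketch assumes the broadcaster sends a separate copy of the 500kbps stream to every viewer, with no media server in between. The function names are made up for this example:

```python
def required_uplink_kbps(stream_kbps: int, viewers: int) -> int:
    """Uplink needed when every viewer gets its own copy of the stream."""
    return stream_kbps * viewers


def max_viewers(uplink_kbps: int, stream_kbps: int) -> int:
    """How many full copies of the stream fit in a given uplink."""
    return uplink_kbps // stream_kbps


if __name__ == "__main__":
    for viewers in (10, 100, 1_000, 1_000_000):
        mbps = required_uplink_kbps(500, viewers) / 1000
        print(f"{viewers:>9,} viewers need {mbps:,.0f} Mbps of uplink")
```

With a 3Mbps ADSL uplink, max_viewers(3000, 500) comes out at 6 viewers, and that is before accounting for any overhead.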

With a media server, we decide the network connectivity, the machine type of the server, etc. We can decide to cascade media servers to grow our scale of the broadcast. We have more control over the situation.

Broadcasting a WebRTC stream requires a media server.

Sender Uniformity

I see this one a lot in the context of a mesh group call, but it is just as relevant towards broadcast.

When we use WebRTC for a broadcast type of a service, a lot of decisions end up taking place in the media server. If a viewer has a bad network, this will result in packet loss being reported to the media server. What should the media server do in such a case?

While there’s no simple answer to this question, the alternatives here include:

  • Asking the broadcaster to send a new I-frame, which will affect all viewers and increase bandwidth use for the near future (you don’t want to do it too much as a media server)
  • Asking the broadcaster to reduce bitrate and media quality to accommodate the packet losses, affecting all viewers and not only the one on the bad network
  • Ignoring the issue of packet loss, sacrificing the user for the “greater good” of the other viewers
  • Using Simulcast or SVC, and moving the viewer to a lower “layer” with lower media quality, without affecting other users

You can’t do most of these in a browser. The browser will tend to use the same single encoded stream as is to send to all others, and it won’t do a good job at estimating bandwidth properly in front of multiple users. It is just not designed or implemented to do that.
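The last alternative in the list above, moving a single viewer to a lower layer, is the kind of per-viewer decision a media server can make. Here is a minimal sketch of it in Python. The layer names, their bitrates and the 80% headroom factor are illustrative assumptions, not values from any specific media server:

```python
from typing import Dict, Optional

# Hypothetical simulcast layers and their bitrates in kbps.
LAYERS_KBPS: Dict[str, int] = {"low": 150, "medium": 500, "high": 1500}


def pick_layer(estimated_kbps: float,
               layers: Dict[str, int] = LAYERS_KBPS,
               headroom: float = 0.8) -> Optional[str]:
    """Pick the highest layer that fits in a viewer's estimated bandwidth."""
    # Leave some headroom so the chosen layer doesn't saturate the link.
    budget = estimated_kbps * headroom
    fitting = [name for name, kbps in layers.items() if kbps <= budget]
    if not fitting:
        return None  # Not even the lowest layer fits this viewer.
    return max(fitting, key=lambda name: layers[name])
```

Each viewer gets its own answer, so a viewer on a bad network can drop to "low" while everyone else stays on "high".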

You Need a Media Server

In most scenarios, you will need a media server in your implementation at some point.

If you are broadcasting, then a media server is mandatory. And no. Google doesn’t offer such a free service or even open source code that is geared towards that use case.

It doesn’t mean it is impossible – just that you’ll need to work harder to get there.

Looking to learn more about WebRTC? In the coming weeks, I’ll be refreshing my online WebRTC training. Join now so you don’t miss out.

Enroll to the WebRTC course


The post Do I Need a Media Server for a One-to-Many WebRTC Broadcast? appeared first on

The Internet of Things or Things on the Internet?

Mon, 02/12/2018 - 12:00

Time to stop playing things on the internet and start building the internet of things.

We’ve been using that stupid IOT acronym for quite some time. Probably a decade. The idea and notion that every object can be network enabled, share its collected data and receive its commands remotely is quite exciting. I think we’re far from that vision.

It isn’t that we’re not making progress. We are. The apartment building I now live in is 3 years old. It is more automated than the previous apartment building I lived in, which was 15 years old. I wouldn’t call it IOT or a smart building quite yet. And I don’t think there’s a simple way to turn a dumb building into a smart one either.

When we moved to our new apartment we renovated a bit. There was this opportunity to add smart-home capabilities into the apartment. There was just a teeny set of problems here:

  1. There’s no real business case for us yet. As a family, we really don’t need a smart-home, and frankly – I still haven’t seen one to appreciate the added benefit
  2. Since we’re in a highrise, the need for an apartment security/surveillance system seemed like overkill. The most we ended up with is a peephole camera for the door. Mainly to empower our kids to see who’s knocking (no IOT or smarts in it)
  3. Talking to the electrician who ended up dealing with our power outlets at home, I understood that there aren’t enough electricians available who know how to install a smart-home kit here in Israel

And to top it all, it felt like a one time undertaking that will be hard/impossible to upgrade or modify later on without a complete overhaul. That wasn’t what I was aiming for.

Mozilla just announced their Things Gateway that can be installed on a Raspberry Pi 3. It is a rather interesting project, especially since its learnings are then applied to the W3C Web of Things Interest Group with the intent of reducing the fragmentation of IOT. They’ve got their hands full of work.

IOT today is a patchwork of devices and companies, each trying to become a dominant player. The end result is that we’re living in a world where things can be placed on the internet, but they don’t amount to an internet of things.

Here are a few questions/hurdles that I think we’ll need to answer as an industry before we can reach that vision of IOT.

Security

I am putting security here first. Here’s why:

  1. We all know it is mandatory
  2. We all know it is left as a backlog item if it is considered at all

I’ve seen it happen with VoIP and it is definitely happening today with IOT.

Until this becomes a priority, IOT will not really happen.

Security has many different aspects to it:

  • Encryption of the communications, to maintain privacy and allow for authorization and authentication of it
  • Upgradability, which itself should be secure, straightforward and automated
  • Audit logs that are hard to tamper with, so we can investigate hacks

Most vendors won’t be able to get these done properly to begin with. And they don’t have any real incentive to do that either.

Standardization

There’s a need for standardization in this space. One that tackles all levels of the IOT food-chain.

Off the top of my head, here are a few areas:

  • Physical – Wi-Fi, Zigbee, Bluetooth – all are standards for the underlying network layer to be used. There’s also RFID and other types of connections that can be used. And we need to factor in 5G at some point. We’ve got wireless ones and wireline ones. A total mess. Just look at the Mozilla Things Gateway announcement for the set of connectors they support and how these get supported. Too much information to get things done easily
  • Transport – once we get communications, and assume (naively) that we have IP communications going, do we then run our data over TCP? Or TLS? Or maybe UDP? Or should we go for QUIC? Or HTTP/2? Should we do it over MQTT maybe? Over a WebSocket? There are too many alternatives here
  • Signaling – What are the types of messages we’re going to allow? What controls what sensor data? How do we describe it in a way that can be easily extendable and unambiguous? I’ve been there with VoIP and it was hard enough. Doing it for IOT is an order of magnitude harder (more players, more devices, more everything)
  • Processing – this relates to the next topic of automation. Once we can collect, control and make decisions over a single device, can we do it in aggregate, and in ways that won’t lock us in to a single vendor?

I don’t believe we’ll get this thing standardized properly in our industry for quite some time.

Automation

I’ve seen a lot of rules engines when it comes to IOT. You can program them to create sequences of events – if the density sensor indicates someone is at home, turn on the lights.
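
For a sense of what such a rule looks like, here is a toy sketch of a rules engine. It isn't modeled on any specific IOT product; the `Rule` type, the state keys and the rule itself are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # inspects the current sensor/device state
    action: Callable[[dict], None]     # mutates the state (stands in for a device command)


def run_rules(rules, state):
    """Evaluate every rule once and return the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(state):
            rule.action(state)
            fired.append(rule.name)
    return fired


rules = [
    Rule(
        "lights on when someone is home",
        condition=lambda s: s["someone_home"] and not s["lights_on"],
        action=lambda s: s.update(lights_on=True),
    ),
]

state = {"someone_home": True, "lights_on": False}
print(run_rules(rules, state), state)
```

Every behavior has to be spelled out by hand as a condition and an action, one rule at a time.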

The problem is that you need to program them. This can’t scale.

The other problem is the issue of what to do with all that sensor data? Someone needs to collect it, aggregate it, process it, analyze it and make decisions out of it.

Simple rule engines are nice, but they won’t get us far down the IOT path.

We also need to add machine learning and AI into the mix.

The end result? Probably similar in nature to AWS DeepLens. The only problem is that it needs to be really generic and flexible.

Different Industries, Different Requirements and Ecosystems

There are different markets in IOT. They have different needs and different customers. They will have different ecosystems around them.

In broad strokes, we can split to consumer and enterprise. Enterprise here includes industrial, smart cities, etc. The consumer is all about the home, the car and the self.

Who will be the players here?

From Smartphones to Smart Speakers

This is where I think we made the most progress.

Up until a year ago, IOT was something you end up delivering to customers via apps on a smartphone. You purchase a lightbulb, you get an app. You get a new TV, there’s an app. Refrigerator? App.

Amazon Alexa did something miraculous. It moved the discussion over the home from an app towards a stationary home device with voice activation and control. No screen or touch screen needed.

Since then, Google and Apple have joined and voice assistants in the home are all the rage now.

In some ways, I expect this to find its way into the enterprise as well. First via conference rooms and later – who knows?

This is one more piece in the IOT puzzle.

Where do we go from here?

I have no clue.

To me, it seems that we’re still at the things on the internet stage, and we will be there for a lot longer.

The post The Internet of Things or Things on the Internet? appeared first on

5 Mistakes to Avoid When Developing WebRTC Applications

Mon, 02/05/2018 - 12:00

There are things you don’t want to do when you are NIH’ing your way to a stellar WebRTC application.

Here’s a true, sad story. This month, the unimaginable happened. Rain (!) dropped from the sky here in Israel. The end of it was that 6 apartments in my building are suffering from moisture due to a leakage from a balcony of the penthouse. Being a new building, we’re at the mercies of the contractor to fix it.

Nothing in the construction market moves fast in Israel – or without threats, so we had to start sending official sounding letters to the contractor about the leak. I took charge, and immediately said we need to lawyer up and have a professional assist us in writing a letter from us to the contractor. Others were of the opinion we can do it on our own, as we need a lawyer only if he is signed directly on the document.

And then it hit me. The reason I wanted to lawyer up is that I see many smart people failing with WebRTC. They are making rookie mistakes, and I didn’t want to make rookie mistakes when it comes to the moisture problems in my apartment.

Why are we Failing with WebRTC?

I am not sure that smart people fail a lot more around WebRTC technology than they do with other technologies, but it certainly feels that way.

A famous Mark Twain quote goes like this:

“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. We keep on turning and making new combinations indefinitely; but they are the same old pieces of colored glass that have been in use through all the ages.”

Many of the rookie mistakes people make with WebRTC stem from this. WebRTC is this kind of new. It is simply a lot of old ideas meshed into a new and curious combination. So we know it. And we assume we know how to handle ourselves around it.

Entrepreneurs? Skype is 14 years old. It shouldn’t be that hard to build something like Skype today.

VoIP developers? SIP we know. WebRTC is just SIP without the signaling. So we force SIP onto it and we’re done.

Web developers? WebRTC is part of HTML5. A few lines of JS code and we’re practically ready to go live.

Video developers? We can just take the WebRTC video feeds and put them on a CDN. Can’t we?

The result?

  1. Smart people decide they know enough to go it alone. And end up making some interesting mistakes
  2. People put their faith in one of the above personas… only to fail

My biggest gripe recently is people who decide in 2018 that peerJS is what they need for their WebRTC application. A project with 402 lines of code, last updated in 2015 (!). You can’t use such code with WebRTC. Code older than a year is stale or dead already. WebRTC is still too new and too dynamic.

That said, it isn’t as if you have a choice anymore. Flash is dying, and there’s no other serious alternative to WebRTC. If you’re thinking of adopting WebRTC, then here are five mistakes to avoid.

Mistake #1: Failing to Configure STUN/TURN

You wouldn’t believe how often developers fail to configure NAT traversal servers. Just yesterday I had someone ask me over the chat widget of my website how he can run his application by hosting his signaling and web servers on HostGator without any STUN/TURN servers. It just doesn’t work.

The simple answer is that you can’t – barring some esoteric use cases, you will definitely need STUN servers. And for most use cases, TURN servers will also be mandatory if you want sessions to connect.

In the past month, I found myself explaining quite a lot about NAT traversal:

  • You must use STUN and TURN servers
  • Don’t rely on free STUN servers, and definitely don’t use “free” TURN servers
  • Don’t force all sessions via TURN unless you absolutely know what you’re doing
  • Using TURN doesn’t add any security
  • You don’t need more than 1 STUN server and 3 TURN servers (UDP, TCP and TLS) in your servers configuration in WebRTC
  • Use temporary/ephemeral passwords in your TURN configuration
  • STUN doesn’t affect media quality
  • coturn or restund are great options for STUN/TURN servers

There’s more, but this should get you started.
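
As a rough sketch of the temporary/ephemeral passwords and the 1 STUN + 3 TURN (UDP, TCP and TLS) points above, here is the commonly used time-limited credential scheme that coturn supports in its shared-secret mode: the username carries an expiry timestamp and the password is an HMAC-SHA1 of that username. The host name, ports, secret and function names below are placeholders I made up; check your TURN server's documentation before relying on any of it.

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(shared_secret, user_id, ttl_seconds=3600, now=None):
    """Build a time-limited TURN username/password pair from a shared secret."""
    expiry = int(time.time() if now is None else now) + ttl_seconds
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def ice_servers(host, username, credential):
    """One STUN entry plus TURN over UDP, TCP and TLS (ports are assumptions)."""
    auth = {"username": username, "credential": credential}
    return [
        {"urls": f"stun:{host}:3478"},
        {"urls": f"turn:{host}:3478?transport=udp", **auth},
        {"urls": f"turn:{host}:3478?transport=tcp", **auth},
        {"urls": f"turns:{host}:443?transport=tcp", **auth},
    ]


username, credential = turn_credentials("not-a-real-secret", "alice")
for server in ice_servers("turn.example.com", username, credential):
    print(server["urls"])
```

Your signaling server would generate these per session and hand the resulting list to the client as its ICE servers configuration, so the shared secret itself never reaches the browser.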

Mistake #2: Selecting the WRONG Signaling Framework

PeerJS anyone? PeerJS feels like a tourist trap:

With 1,693 stars and 499 forks, PeerJS is one of the most popular WebRTC projects on github. What can go wrong?

Maybe the fact that it is older than the internet?

A WebRTC project that had its last commit 3 years ago can’t be used today.

Same goes for using Muaz Khan’s code snippets and expecting them to be commercial grade, stable, highly scalable products. They’re not. They’re just very useful code snippets.

Planning to use some open source project? Then:

  • Make sure it was updated recently (=the last couple of months)
  • Make sure it is popular enough
  • Make sure you can understand the framework’s code and can maintain it on your own if needed
  • Try to check if there’s someone behind it that can help you in times of trouble

Don’t take the selection process here lightly. Not when it comes to a signaling server and not when it comes to a media server.

Mistake #3: Not Using Media Servers When You Should

I know what you’re thinking. WebRTC is peer to peer so there’s no need for servers. Some think that even signaling and web servers aren’t needed – I hope they can explain how participants are going to find each other.

To some, this peer to peer concept also means that you can run these ridiculously large scale sessions with no servers that carry on media.

Here are two such “architectures” I come across:

Mesh. It’s great. Don’t assume you can get it to run properly this year or the next. Move on.

Live broadcasting by forwarding content. It can be done, but most probably not the way you expect it to: growing to a million users with no infrastructure and zero latency.

For many of the use cases out there, you will need a media server to process and route the media for you. Now that you are aware of it, go search for an open source media server. Or a commercial one.

Mistake #4: Thinking Short-Term

You get an outsourcing vendor. Write him a nice requirements doc. Pay him. Get something implemented. And you’re done.

Not really.

WebRTC is still in its infancy. The spec is changing. Browser implementations are changing. It is all in flux all the time. If you’re going to use WebRTC, either:

  1. Use some WebRTC API platform (here are a few), and you’ll be able to invest a bit less on an ongoing basis. There will be maintenance work, but not much
  2. Develop on your own or by outsourcing. In this case, you will need to continue investing in the project for at least the next 3 years or more

WebRTC code rots faster than most other HTML5 code. It will eventually change, but we’re not there yet.

It is also the reason I started testRTC with a few colleagues a few years ago. To help with the lifecycle of WebRTC applications, especially in the area of testing and monitoring.

Mistake #5: Failing to Understand WebRTC

They say assumption is the mother of all mistakes. Google seems to agree with it. Almost.

WebRTC isn’t trivial. It sits somewhere between VoIP and the web. It is new, and the information out there on the Internet about it is scattered and somewhat dynamic (which means lots of it isn’t accurate).

If you plan on using WebRTC, make sure you first understand it and its intricacies. Understand the servers that are needed to deploy a WebRTC application. Understand the signaling mechanisms that are built into WebRTC. Understand how media is processed and sent over the network. Understand the rich ecosystem of solutions that can be used with WebRTC to build a production ready system.

Lots of things to learn here. Don’t assume you know WebRTC just because you know web development or because you know VoIP or video processing.

If you are looking to seriously learn WebRTC, why not enroll in my Advanced WebRTC Architecture course?

Enroll to course

What about my apartment? We’ve lawyered up, and now I have someone review and fix all the official sounding letters we’re sending out. Hopefully, it will get us faster to a resolution.


The post 5 Mistakes to Avoid When Developing WebRTC Applications appeared first on

WebRTC Electron Implementations are on 🔥

Mon, 01/29/2018 - 12:00

For WebRTC, Mobile and PC are moving in different directions. On the desktop, WebRTC Electron apps are gaining momentum.

In the good old days, people used to complain that WebRTC isn’t available on all browsers. Mobile was less of an issue for most as mobile application developers port WebRTC and use it natively on both iOS and Android.

How times change.

Need to know where WebRTC is available? Download this free WebRTC Device Cheat Sheet.

Get the Cheat Sheet

Today? All modern browsers support WebRTC. We’ve got Chrome, Firefox, Edge and Safari with official WebRTC implementations.

The challenge? None of the browsers are ready:

  • Chrome uses Plan B, switching to Unified Plan
  • Firefox is doing fine, but isn’t high on the priority list
  • Edge doesn’t support the data channel, and its market share isn’t that great
  • Safari doesn’t support VP8 and breaks a wee bit too often at the moment

What’s a developer to do?

Use adapter.js. Or go for a plugin. Or just ignore a few browsers.

Or maybe. Just maybe you should treat PCs and laptops the same way you do mobile? And build an app.

If that’s what you plan on doing then you’re not alone.

The most popular way to build an app for the desktop is by using Electron. There are other ways, like CEF and actual native development, but Electron is by far the most common approach.

Here are 3 vendors making use of Electron (and WebRTC) for their desktop application:

#1 – Slack

Slack is a popular team collaboration application. I’ve been using it in the browser for the last 3 years, but switched to their desktop Electron app on both my Ubuntu desktop and my Windows 10 laptop.

Why didn’t I use the app for so long? Because I don’t like installing things.

Why have I installed it now? Because I need to track 3+ slack accounts in parallel at all times now. This means a tab per slack account in my browser. On the desktop app, they don’t “eat up” multiple tabs. It isn’t a matter of memory or performance for me. Just one of “esthetics” – trying to preserve a tabs diet on my Chrome.

And that’s how Slack likes it. During the last Kranky Geek, the Slack team gave an interesting presentation about their current plans. It had about a minute dedicated to Electron, at 2:30 into the session:

This recording lacks the Q&A part of the session. In an answer to a question regarding browser support, Andrew MacDonald of Slack said their focus is on their desktop app – not the browser. They make sure everything works on Chrome. Invest less time and effort on the other browsers. And focus a lot on their Slack desktop application.

It was telling.

If you are looking for desktop-application-only-features in Slack, then besides having a single window for all projects, there’s the collaboration they offer during screen sharing that isn’t available in the browser (yet another reason for me to switch – to check it out).

At 2:30 into that session, Andrew explains why Electron is so useful to Slack: it comes down to cross platform development and time to market. With their team size, they can’t update as fast as Electron does, so they took its built-in WebRTC implementation “as is”.

#2 – Discord

Discord is a kind of Slack but different. A social network targeting gamers. You can also find there non-gaming groups. Discord is doing all it can to get you from the comfort of your browser right into their native application.

Here’s what the homepage looks like:

From the get go their call to action is to either Open Discord (in the browser) or Download for your operating system. On mobile, if you’re curious, the only alternative is to download the app.

Here’s the interesting part, though.

Discord’s call to action uses green buttons to suggest that you open Discord in the browser. That’s a lower friction action. You select a user name. Then pick an email and password (or use an unclaimed channel until you add your username and password). And now that you’re signed up for the service, it is time to suggest again you use their app:

And… if you skip this one, you’ll get a top bar reminder as well (that orange strip at the top):

You can do with Discord almost anything inside the browser, but they really really really want to get you off that damn internet and into their desktop app.

And it is working for them!

#3 – TalkDesk

TalkDesk has its own reason for adopting Electron.

TalkDesk is a contact center solution that integrates with CRMs and third party systems. Towards that goal, you can:

  • Use the TalkDesk application (=browser web app)
  • Install the TalkDesk extension from Chrome, and have it latch on to other CRM systems
  • Install the Chrome Callbar app, so you can use it as a standalone without the need to have the browser opened at all

That third option is going the way of the dodo, along with Chrome apps. TalkDesk solved that by introducing Callbar Electron.

What we see here differs slightly from the previous two examples.

Where Slack and Discord try getting people off the web and into their desktop application, TalkDesk is just trying to be everywhere for them. Using HTML5 and Electron means they need not write yet-another-application for the desktop – they can reuse parts of their web app.

They are NOT Alone

There are other vendors I know of that are using Electron for their WebRTC applications. They do it for one of the following reasons:

  • It is an easy way to support Internet Explorer by not supporting it (or Safari)
  • They want a “native” app because they need more control than what a browser could ever offer, but still want to work with cross platform development, and HTML5/JS seems like the cleanest approach
  • Their users work in front of the service all day, so the browser isn’t the best interface for them
  • They don’t want to tether themselves or limit themselves to the browser. Using web technology is just how they want to develop
  • It brings with it “stability”, as it is up to you to decide when to push an update to your users as opposed to having browser vendors do it on their own timeframe. It is only semblance as most would still support both browsers and applications in parallel

Add to that CPaaS vendors officially supporting Electron. and TokBox are such examples. They do it not because they think it is nice, but because there’s customer demand for it.

This shift towards Electron apps makes it harder to estimate the real usage base of WebRTC. If most communications are shifting from the Chrome browser (let’s face it, most WebRTC comms happen in Chrome today if you only care about browsers) towards applications, then the statistics and trends collected by Google about WebRTC use are skewed. That said, it makes Chrome all the more dominant, as Electron use can be attributed back to Chromium.

Expect vendors to continue adopting Electron for their WebRTC applications. This trend is on 🔥.


The post WebRTC Electron Implementations are on 🔥 appeared first on

AWS DeepLens and the Future of AI Cameras and Vision

Mon, 01/22/2018 - 12:00

Are AI cameras in our future?

At last year’s AWS re:Invent event, which took place at the end of November, Amazon unveiled an interesting product: AWS DeepLens

There’s decent information about this new device on Amazon’s own website but very little of anything else out there. I decided to put my own thoughts on “paper” here as well.

Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign up for my newsletter

Get my free content

What is AWS DeepLens?

AWS DeepLens is the combination of 3 components: hardware (camera + machine), software and cloud. These 3 come in a tight integration that I haven’t seen before in a device that is first and foremost targeting developers.

With DeepLens, you can handle inference of video (and probably audio) inputs in the camera itself, without shipping the captured media towards the cloud.

The hype words that go along with this device? Machine Vision (or Computer Vision), Deep Learning (or Machine Learning), Serverless, IoT, Edge Computing.

It is all these words and probably more, but it is also somewhat less. It is a first tentative step of what a camera module will look like 5 years from today.

I’d like to go over the hardware and software and see how they combine into a solution.

AWS DeepLens Hardware

AWS DeepLens hardware is essentially a camera that has been glued to an Intel NUC device:

Neither the camera nor the compute are on the higher end of the scale, which is just fine considering where we’re headed here – a gazillion low cost devices that can see.

The device itself was built in collaboration with Intel. Like all chipset vendors, Intel is plunging into AI and deep learning as well. More on AWS+Intel vs Google later.

Here’s what’s in this package, based on the AWS blog post on DeepLens:

  • 4 megapixel camera with the ability to capture 1080p video resolution
    • Nothing is said about the frame rate in which this can run. I’d assume 30 fps
    • The quality of this camera hasn’t been detailed either. In many cases, I’d say these devices will need to work in rather extreme lighting conditions
  • 2D microphone array
    • It is easy to understand why such a device needs a microphone, but a 2D microphone array is an intriguing choice here
    • This allows for better handling of things like directional sound and noise reduction algorithms to be used
    • None of the deep learning samples provided by Amazon seem to make use of the microphone inputs. I hope these will come later as well
  • Intel Atom X5 processor
    • This one has 4 cores and 4 threads
    • 8GB of memory and 16GB of storage – this is meant to run workloads and not store them for long periods of time
  • Intel Gen9 graphics engine (here)
    • If you are into numbers, then this does over 100 GFLOPS – quite capable for a “low end” device
    • Remember that 1080p@30fps produces more than 62 million pixels a second to process, so we get ~1600 operations per pixel here
    • You can squeeze out more “per pixel” by reducing frame rate or reducing resolution (both are probably done for most use cases)
  • Like most Intel NUC devices, it has Wi-Fi, USB and micro HDMI ports. There’s also a micro SD port for additional memory based on the image above
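
The per-pixel budget mentioned in the list can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it takes the 100 GFLOPS figure at face value and assumes every operation goes into pixel processing, which won't be the case in practice. The 720p@15fps comparison is my own example.

```python
GPU_FLOPS = 100e9  # "over 100 GFLOPS", taken at face value


def pixels_per_second(width: int, height: int, fps: int) -> int:
    return width * height * fps


def ops_per_pixel(flops: float, width: int, height: int, fps: int) -> float:
    return flops / pixels_per_second(width, height, fps)


print(pixels_per_second(1920, 1080, 30))  # 62208000
print(f"1080p@30fps: {ops_per_pixel(GPU_FLOPS, 1920, 1080, 30):,.0f} ops per pixel")
print(f" 720p@15fps: {ops_per_pixel(GPU_FLOPS, 1280, 720, 15):,.0f} ops per pixel")
```

Dropping either the resolution or the frame rate raises the per-pixel budget proportionally.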

The hardware tries to look somewhat polished, but it isn’t. Although this isn’t written anywhere, this is:

  1. The first version of what will be an iterative process for Amazon
  2. A reference design. Developers are expected to build the proof of concept with this, later shifting to their own form factor – I don’t see this specific device getting sold to end customers as a final product

In a way, this is just a more polished hardware version of Google’s computer vision kit. The real difference comes with the available tooling and workflow that Amazon baked into AWS DeepLens.

AWS DeepLens Software

The AWS DeepLens software is where things get really interesting.

Before we get there, we need to understand a bit how machine learning works. At its most basic, machine learning is about giving a “machine” a large dataset, letting it learn the data in one way or another, and then when you introduce similar new data, it will be able to classify it.

Dumbing down the whole process and theory, at the end of the day, machine learning is built out of two main steps:

  1. TRAINING: You take a large set of data and use it for training purposes. You curate and classify it so the training process has something to check itself against. Then you pass the data through a process that ends up generating a trained model. This model is the algorithm we will be using later
  2. DEPLOY: When new data comes in (in our case, this will probably be an image or a video stream), we use our trained model to classify that data or even to run an algorithm on the data itself and modify it
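
The two steps above can be illustrated with a toy example. To be clear about what this is: a nearest-centroid classifier in plain Python with made-up two-number “features” and labels, nothing like the deep learning models DeepLens actually runs. It only shows the split between producing a model and using it.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """TRAINING: turn curated, labeled feature vectors into a model (one centroid per label)."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(model, features):
    """DEPLOY: use the trained model to label new, unseen data."""
    return min(model, key=lambda label: dist(model[label], features))


training_data = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"), ((4.8, 5.2), "dog"),
]
model = train(training_data)       # done once, on the big dataset
print(classify(model, (0.9, 1.1)))  # done repeatedly, on each new input
print(classify(model, (5.1, 4.9)))
```

Note that `classify` needs only the model, not the training data, which is what makes it possible to run that step somewhere other than where the training happened.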

With AWS DeepLens, the intent is to run the training in the AWS cloud (obviously), and then run the deployment step for real time classification directly on the AWS DeepLens device. This also means that we can run this while being disconnected from the cloud and from any other network.

How does all this come to play in AWS DeepLens software stack?

On device

On the device, AWS DeepLens runs two main packages:

  1. AWS Greengrass Core SDK – Greengrass enables running AWS Lambda functions directly on devices. If Lambda is called serverless, then Greengrass can truly run serverless
  2. Device optimized MXNet package – an Apache open source project for machine learning

Why MXNet and not TensorFlow?

  • TensorFlow comes from Google, which makes it less preferable for Amazon, a direct cloud competitor. It is also less preferable for Intel (see below)
  • MXNet is considered faster and more optimized at the moment. It uses less memory and less CPU power to handle the same task

In the cloud

The main component here is the new Amazon SageMaker:

SageMaker takes the effort away from the management of training machine learning, streamlining the whole process. That last step in the process of Deploy takes place in this case directly on AWS DeepLens.

Besides SageMaker, when using DeepLens you will probably make use of Amazon S3 for storage, AWS Lambda when running serverless in the cloud, as well as other AWS services. Amazon even suggests using AWS DeepLens along with the newly announced Amazon Rekognition Video service.

To top it all, Amazon has a few pre-trained models and sample projects, shortening the path from getting a hold of an AWS DeepLens device to seeing it in action.

AWS+Intel vs Google

So we’ve got AWS DeepLens. With its set of on-device and cloud software tools. Time to see what that means in the bigger picture.

I’d like to start with the main players in this story. Amazon, Intel and Google. Obviously, Google wasn’t part of the announcement. Its TensorFlow project was mentioned in various places and can be made to work with AWS DeepLens. But that’s about it.

Google is interesting here because it is THE company today that is synonymous with AI. And there’s the increasing rivalry between Amazon and Google that seems to be going on across multiple fronts.

When Google came out with TensorFlow, it was with the intent of creating a baseline for artificial intelligence modeling that everyone will be using. It open sourced the code and let people play with it. That part succeeded nicely. TensorFlow is definitely one of the first projects developers would try to dabble with when it comes to machine learning. The problem with TensorFlow seems to be the amount of memory and CPU it requires for its computations compared to other frameworks. That is probably one of the main reasons why Amazon decided to place its own managed AI services on a different framework, ending up with MXNet which is said to be leaner with good scaling capabilities.

Google did one more thing though. It created its own special Tensor processing unit, calling it TPU. This is an ASIC type of a chip, designed specifically for high performance of machine learning calculations. In a research paper released by Google earlier last year, they show how their TPUs perform better than GPUs when it comes to TensorFlow machine learning workloads:

And if you’re wondering – you can get Cloud TPU on the Google Cloud Platform, albeit this is still in alpha stage.

This gives Google an advantage in hosting managed TensorFlow jobs, posing a threat to AWS when it comes to AI heavy applications (which is where we’re all headed anyway). So Amazon couldn’t really pick TensorFlow as its winning horse here.

Intel? They don’t sell TPUs at the moment. And like any other chip vendor, they are banking and investing heavily in AI. Which made working with AWS here on optimizing and working on end-to-end machine learning solutions for the internet of things in the form of AWS DeepLens an obvious choice.

Artificial Intelligence and Vision

These days, it seems that every possible action or task is being scrutinized to see if artificial intelligence can be used to improve it. Vision is no different. You can find it under computer vision or machine vision, and it covers a broad set of capabilities and algorithms.

Roughly speaking, there are two types of use cases here:

  1. Classification – with classification, the images or video stream are analyzed to find certain objects or things. From being able to distinguish certain objects, through person and face detection, to face recognition, and on to recognition of activities and intents
  2. Modification – AWS DeepLens Artistic Style Transfer example is one such scenario. Another one is fixing the nagging direct eye contact problem in video calls (hint – you never really experience it today)

As with anything else in artificial intelligence and analytics, none of this is workable at the moment for a broad spectrum of classifications. You need to be very specific in what you are searching and aiming for, and this isn’t going to change in the near future.

On the other hand, there are many many cases where what you need is a camera to classify a very specific and narrow vision problem. The usual things include person detection for security cameras, counting people at an entrance to a store, etc. There are other areas you hear about today such as using drones for visual inspection of facilities and robots being more flexible in assembly lines.

We’re at a point where we already have billions of cameras out there. They are in our smartphones and are considered a commodity. These cameras and sensors are now headed into a lot of devices to power the IOT world and allow it to “see”. The AWS DeepLens is one such tool that just happened to package and streamline the whole process of machine vision.


On the price side, the AWS DeepLens is far from a cheap product.

The baseline cost of an AWS DeepLens camera? $249

But as with other connected devices, that’s only a small part of the story. The device is intended to be connected to the AWS cloud and there the real story (and costs) takes place.

The two leading cost centers after the device itself are going to be AWS Greengrass and Amazon SageMaker.

AWS Greengrass starts at $1.49 per year per device. Amazon SageMaker costs 20-25% on top of the usual AWS EC2 machine prices. To that, add the usual bandwidth and storage pricing of AWS, and higher prices for certain regions and discounts on large quantities.
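To get a feel for how these line items add up, here is a rough back of the napkin sketch. The device price, the Greengrass fee and the SageMaker markup come from the numbers above. The EC2 hourly price and the number of SageMaker hours in the example are made-up placeholders, so plug in your own. Bandwidth, storage, regional pricing and volume discounts are left out.

```python
DEVICE_PRICE_USD = 249.00          # AWS DeepLens camera, from the text
GREENGRASS_PER_DEVICE_YEAR = 1.49  # AWS Greengrass, from the text


def first_year_cost(devices, ec2_hourly_usd, sagemaker_hours, sagemaker_markup=0.25):
    """Rough first-year cost in USD for a small DeepLens deployment.

    sagemaker_markup is the 20-25% premium over plain EC2 pricing;
    ec2_hourly_usd and sagemaker_hours are assumptions you supply.
    """
    hardware = devices * DEVICE_PRICE_USD
    greengrass = devices * GREENGRASS_PER_DEVICE_YEAR
    sagemaker = sagemaker_hours * ec2_hourly_usd * (1 + sagemaker_markup)
    return round(hardware + greengrass + sagemaker, 2)


# Hypothetical example: 10 cameras, 200 hours on a $1.00/hour machine
estimate = first_year_cost(devices=10, ec2_hourly_usd=1.00, sagemaker_hours=200)
print(estimate)
```

In this made-up example the hardware dominates the bill, but the SageMaker part grows with every hour of training and hosting while the hardware is a one-time purchase.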

It isn’t cheap.

This is a new service that is quite generic and is aimed at tinkerers. Startups looking to try out and experiment with new ideas. It is also the first iteration of Amazon with such an intriguing device.

I, for one, can’t wait to see where this is leading us.

3 Different Compute Models for Machine Vision

AWS DeepLens is one of 3 different compute models that I see in this space of machine vision.

Here are all 3 of them:

#1 – Cloud

In a cloud based model, the expectation is that the actual media is streamed towards the cloud:

  • In real time
  • Or at some future point in time
  • When events occur; like motion being detected; or sound picked up on the mic

The data can be a video stream, or more often than not, it is just a set of captured images.

And that data gets classified in the cloud.

Here are two recent examples from a domain close to my heart – WebRTC.

At the last Kranky Geek event, Philipp Hancke shared how is trying to determine NSFW (Not Safe For Work):

The way this is done is by using Yahoo’s Open NSFW open source package. They had to resize images, send them to a server and there, using Python classify the image, determining if it is safe for work or not. Watch the video – it really is insightful at how to tackle such a project in the real world.

The other one comes from Chad Hart, who wrote a lengthy post about connecting WebRTC to TensorFlow for machine vision. The same technique was used – one of capturing still images from the stream and sending them towards a server for classification.

These approaches are nice, but they have their challenges:

  1. They are gravitating towards still images and not video streams at the moment. This relates to the costs and bandwidth involved in shipping and then analyzing such streams on a server. To give you an understanding of the costs – using Amazon Rekognition for one minute of video stream analysis costs $0.12. For a single minute. It is high, and the reason is that it really does require some powerful processing to achieve
  2. Sometimes, you really need to classify and make faster decisions. You can’t wait that extra 100’s of milliseconds or more for the classification to take place. Think augmented reality type of scenarios
  3. At least with WebRTC, I haven’t seen anyone who figured how to do this classification on the server side in real time for a video stream and not still images. Yet
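To put that per-minute price in perspective, here is a small sketch that extends it to an hour and a day of continuous analysis. The only input taken from the text is the $0.12 per minute figure; the idea of a camera streaming around the clock is my own assumption for illustration.

```python
REKOGNITION_CENTS_PER_MINUTE = 12  # $0.12 per minute of video, from the text


def video_analysis_cost_usd(minutes, cameras=1):
    """Cost of analyzing `minutes` of video from each of `cameras` streams."""
    cents = REKOGNITION_CENTS_PER_MINUTE * minutes * cameras
    return cents / 100


hourly = video_analysis_cost_usd(60)        # one camera, one hour
daily = video_analysis_cost_usd(60 * 24)    # one camera, around the clock
print(hourly, daily)
```

That is $7.20 an hour and $172.80 a day for a single always-on camera, which explains why people gravitate towards sending still images only when something interesting happens.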

#2 – In the Box

This alternative is what we have today in smartphones and probably in modern room based video conferencing devices.

The camera is just the optics, but the heavy lifting takes place in the main processor that is doing other things as well. Most modern CPUs today already have GPUs embedded as part of the SoC, and chip vendors are actively working on AI specific additions to chips (think Apple’s AI chip in the iPhone X or Google’s computational photography packed into the Pixel X phones).

The underlying concept here is that the camera is always tethered or embedded in a device that is powerful enough to handle the machine learning algorithms necessary.

They aren’t part of the camera but rather the camera is part of the device.

This works rather well, but you end up with a pricy device which doesn’t always make sense. Remember that our purpose here is to aim at having a larger number of camera sensors deployed and having an expensive computing device attached to it won’t make sense for many of the use cases.

#3 – In the Camera

This is the AWS DeepLens model.


The computing power needed to run the classification algorithms is made part of the camera instead of taking place on another CPU.

We’re talking about $249 right now, but assuming this approach becomes popular, prices should go down. I can easily see such devices retailing at $49 on the low end in 2-3 technology cycles (5 years or so). And when that happens, the use cases developers will be able to create are endless.

Think about a home surveillance system that costs below $1,000 to purchase and install. It is smart enough to raise far fewer false positives when alerting its users, AND it can be upgraded in its classification as time goes by. There can be a service put in place behind it with a monthly fee that includes such things. You can add face detection and classification of certain people – alerting you when the kids come home or leave, for example, while ignoring a stray cat that came into view of the camera. And this system doesn’t depend on an external network to run on a regular basis. You can update it when an external network is connected, but other than that, it can live “offline” quite nicely.

No Winning Model


All of the 3 models have their place in the world today. Amazon just made it a lot easier to get us to that third alternative of “in the camera”.

IoT and the Cloud

Edge computing. Fog computing. Cloud computing. You hear these words thrown in the air when talking about the billions of devices that will comprise the internet of things.

For IoT to scale, there are a few main computing concepts that will need to be decided sooner rather than later:

  • Decentralized – with so many devices, IoT services won’t be able to be centralized. It won’t be around scale out of servers to meet the demands, but rather on the edges becoming smarter – doing at least part of the necessary analysis. Which is why the concept of AWS DeepLens is so compelling
  • On net and off net – IoT services need to be able to operate without being connected to the cloud at all times. Think of an autonomous car that needs to be connected to the cloud at all times – a no go for me
  • Secured – it seems like the last thing people care about in IoT at the moment is security. The many data breaches and the ease at which devices can be hijacked point that out all too clearly. Something needs to be done there and it can’t be on the individual developer/company level. It needs to take place a lot earlier in the “food chain”

I was reading The Meridian Ascent recently. A science fiction book in a long series. There’s a large AI machine there called Big John which sifts through the world’s digital data:

“The most impressive thing about Big John was that nobody comprehended exactly how it worked. The scientists who had designed the core network of processors understood the fundamentals: feed sufficient information to uniquely identify a target, and then allow Big John to scan all known information – financial transactions, medical records, jobs, photographs, DNA, fingerprints, known associates, acquaintances, and so on.

But that’s where things shifted into another realm. Using the vast network of processors at its disposal, Big John began sifting external information through its nodes, allowing individual neurons to apply weight to data that had no apparent relation to the target, each node making its own relevance and correlation calculations.”

I’ve emphasized that sentence. To me, this shows the view of the same IoT network looking at it from a cloud perspective. There, the individual sensors and nodes need to be smart enough to make their own decisions and take their own actions.

All these words for a device that will only be launched April 2018…

We’re not there yet when it comes to IoT and the cloud, but developers are working on getting the pieces of the puzzle in place.

Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter

Get my free content

The post AWS DeepLens and the Future of AI Cameras and Vision appeared first on

How Many Users Can Fit in a WebRTC Call?

Mon, 01/15/2018 - 12:00

As many as you like. You can cram anywhere from one to a million users into a WebRTC call.

You’ve been asked to create a group video call, and obviously, the technology selected for the project was WebRTC. It is almost the only alternative out there and certainly the one with the best price-performance ratio. Here’s the big question: How many users can we fit into that single group WebRTC call?

Need to understand your WebRTC group calling application backend? Take this free video mini-course on the untold story of WebRTC’s server side.

Enroll now

At least once a week I get approached by someone saying WebRTC is peer-to-peer and asking me if you can use it for larger groups, as the technology might not fit for such use cases. Well… WebRTC fits well into larger group calls.

You need to think of WebRTC as a set of technological building blocks that you mix and match as you see fit, and the browser implementation of WebRTC is just one building block.

The most common building block today in WebRTC for supporting group video calls is the SFU (Selective Forwarding Unit), a media router that receives media streams from all participants in a session and decides who to route that media to.

What I want to do in this article, is review a few of the aspects and decisions you’ll need to take when trying to create applications that support large group video sessions using WebRTC.

Analyze the Complexity

The first step in our journey today will be to analyze the complexity of our use case.

With WebRTC, and real time video communications in general, it all boils down to speeds and feeds:

  1. Speeds – the resolution and bitrate we’re expecting in our service
  2. Feeds – the stream count of the single session

Let’s start with an example.

Assume you want to run a group calling service for the enterprise. It runs globally. People will join work sessions together. You plan on limiting group sessions to 4 people. I know you want more, but I am trying to keep things simple here for us.

The illustration above shows you what a 4-participant conference would look like.

Magic Squares: 720p

If the layout you want for this conference is the magic squares one, we’re in the domain of:

You want high quality video. That’s what everyone wants. So you plan on having all participants send out 720p video resolution, aiming for WQHD monitors (that’s 2560×1440). Say that eats up 1.5Mbps (I am stingy here – it can take more), so:

  • Each participant in the session sends out 1.5Mbps and receives 3 streams of 1.5Mbps
  • Across 4 participants, the media server needs to receive 6Mbps and send out 18Mbps

Summing it up in a simple table, we get:

Resolution        720p
Bitrate           1.5Mbps
User outgoing     1.5Mbps (1 stream)
User incoming     4.5Mbps (3 streams)
SFU outgoing      18Mbps (12 streams)
SFU incoming      6Mbps (4 streams)

Magic Squares: VGA

If you’re not interested in resolution that much, you can aim for VGA resolution and even limit bitrates to 600Kbps:

Resolution        VGA
Bitrate           600Kbps
User outgoing     0.6Mbps (1 stream)
User incoming     1.8Mbps (3 streams)
SFU outgoing      7.2Mbps (12 streams)
SFU incoming      2.4Mbps (4 streams)
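Both magic squares tables come from the same arithmetic, so here is a small sketch that reproduces them. It assumes every participant sends a single stream at a fixed bitrate and the SFU forwards it untouched to everyone else. The function and key names are mine, picked for illustration.

```python
def magic_squares_load(participants, bitrate_mbps):
    """Speeds and feeds for a magic squares layout routed through an SFU.

    Everyone sends one stream; the SFU forwards it to all other participants.
    """
    incoming_streams = participants
    outgoing_streams = participants * (participants - 1)
    return {
        "user_out_mbps": bitrate_mbps,
        "user_in_mbps": round(bitrate_mbps * (participants - 1), 2),
        "sfu_in_mbps": round(bitrate_mbps * incoming_streams, 2),
        "sfu_out_mbps": round(bitrate_mbps * outgoing_streams, 2),
        "sfu_in_streams": incoming_streams,
        "sfu_out_streams": outgoing_streams,
    }


hd = magic_squares_load(participants=4, bitrate_mbps=1.5)    # 720p
vga = magic_squares_load(participants=4, bitrate_mbps=0.6)   # VGA
print(hd)
print(vga)
```

Notice that the outgoing side grows with the square of the number of participants, which is why the SFU outgoing number is the one to watch as meetings get bigger.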


The thing you may want to avoid when going VGA is the need to upscale the resolution on the display – it can look ugly, especially on the larger 4K displays.

With crude back of the napkin calculations, you can potentially cram 3 VGA conferences for the “price” of 1 720p conference.

Hangouts Style

But what if our layout is a bit different? A main speaker and smaller viewports for the other participants:

I call it Hangouts style, because Hangouts is pretty known for this layout and was one of the first to use it exclusively without offering a larger set of additional layouts.

This time, we will be using simulcast, with the plan of having everyone send out high quality video and the SFU deciding which incoming stream to use as the dominant speaker, picking the higher resolution for it and the lower resolution for everyone else.

You will be aiming for 720p, because after a few experiments, you decided that lower resolutions when scaled to the larger displays don’t look that good. You end up with this:

  • Each participant in the session sends out 2.2Mbps (that’s 1.5Mbps for the 720p stream and roughly an additional 700Kbps for the other resolutions you’ll be simulcasting with it)
  • Each participant in the session receives 1.5Mbps from the dominant speaker and 2 additional incoming streams of ~300Kbps for the smaller video windows
  • Across 4 participants, the media server needs to receive 8.8Mbps and send out 8.4Mbps

Resolution        720p highest (in Simulcast)
Bitrate           150Kbps – 1.5Mbps
User outgoing     2.2Mbps (1 stream)
User incoming     1.5Mbps (1 stream)
                  0.3Mbps (2 streams)
SFU outgoing      8.4Mbps (12 streams)
SFU incoming      8.8Mbps (4 streams)
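The Hangouts style numbers can be sketched the same way. This is a simplification under the assumptions used above: every participant uploads 2.2Mbps of simulcast layers, and every participant receives one high quality stream plus low quality streams for the remaining participants. The function and parameter names are hypothetical.

```python
def hangouts_style_load(participants, simulcast_up_mbps=2.2,
                        speaker_mbps=1.5, thumbnail_mbps=0.3):
    """Speeds and feeds for a dominant speaker layout with simulcast.

    Each participant sees one dominant speaker in high quality and the
    remaining participants (excluding themselves) as small thumbnails.
    """
    thumbnails = participants - 2
    user_in = speaker_mbps + thumbnails * thumbnail_mbps
    return {
        "user_out_mbps": simulcast_up_mbps,
        "user_in_mbps": round(user_in, 2),
        "sfu_in_mbps": round(participants * simulcast_up_mbps, 2),
        "sfu_out_mbps": round(participants * user_in, 2),
    }


load = hangouts_style_load(participants=4)
print(load)
```

Compared to magic squares at 720p, the SFU takes in more (8.8Mbps instead of 6Mbps) but sends out less than half (8.4Mbps instead of 18Mbps).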


This is what we have learned:

Different use cases of group video with the same number of users translate into different workloads on the media server.

And if it wasn’t mentioned specifically, simulcast works great and improves the effectiveness and quality of group calls (simulcast is what we used in our Hangouts Style meeting).

Across the 3 scenarios we depicted here for 4-way video call, we got this variety of activity in the SFU:

                  Magic Squares: 720p    Magic Squares: VGA    Hangouts Style
SFU outgoing      18Mbps                 7.2Mbps               8.4Mbps
SFU incoming      6Mbps                  2.4Mbps               8.8Mbps


Here’s your homework – now assume we want to do a 2-way session that gets broadcast to 100 people over WebRTC. Calculate the number of streams and the bandwidth you’ll need on the server side.

How Many Users Can be Active in a WebRTC Call?

That’s a tough one.

If you use an MCU, you can get as many users on a call as your MCU can handle.

If you are using an SFU, it depends on 3 different parameters:

  1. The level of sophistication of your media server, along with the performance it has
  2. The power you’ve got available on the client devices
  3. The way you’ve architected your infrastructure and worked out cascading

We’re going to review them in a sec.

Same Scenario, Different Implementations

Anything above 8-10 users in a single call becomes complicated. Here’s an example of a publicly available service I want to share here.

The scenario:

  • 9 participants in a single session, magic squares layout
  • I use testRTC to get the users into the session, so it is all automated
  • I run it for a minute. After that, it kills the session since it is a demo
  • The service takes into account that there are 9 people on the screen and reduces the resolution for all of them to VGA, but it allocates 1.3Mbps for that resolution
  • Leading to the browsers receiving 10Mbps of data to process

The media server decided here how to limit and gauge traffic.

And here’s another service with an online demo running the exact same scenario:

Now the incoming bitrate on average per browser was only 2.7Mbps – almost a fourth of the other service.

Same scenario. Different implementations.

What About Some Popular Services?

What about some popular services that do video conferencing in an SFU routed model? What kind of size restrictions do they put on their applications?

Here’s what I found browsing around:

  • Google Hangouts – up to 25 participants in a single session. It was 10 in the past. When I did my first-ever office hour for my WebRTC training, I maxed out at 10, which got me to start using other services
  • Hangouts Meet – placed its maximum number at 50 participants in a single session
  • Houseparty – decided on 8 participants
  • Skype – 25 participants
  • – their PRO accounts support up to 12 participants in a room
  • Amazon Chime – 16 participants on the desktop and up to 8 participants on iOS (no Android support yet)

Does this mean you can’t get above 50?

My take on it is that there’s an increasing degree of difficulty as the meeting size increases:

The CPaaS Limit on Size

When you look at CPaaS platforms, those supporting video and group calling often have limits to their meeting size. In most cases, they give out an arbitrary number they have tested against or are comfortable with. As we’ve seen, that number is suitable for a very specific scenario, which might not be the one you are thinking about.

In CPaaS, these numbers vary from 10 participants to 100’s of participants in a single session. Usually, if you can go higher, the additional participants will be view-only.

Key Points to Remember

Few things to keep in mind:

  • The higher the group size the more complicated it is to implement and optimize
  • The browser needs to run multiple decoders, which is a burden in itself
  • Mobile devices, especially older ones, can be brought down to their knees quite quickly in such cases. Test on the oldest, puniest devices you plan on supporting before determining the group size to support
  • You can build the SFU in a way that it doesn’t route all incoming media to everyone but rather picks partial data to send out. For example, maybe only a single speaker on the audio channels, or the 4 loudest streams
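That last point can be sketched in a few lines. This is a toy illustration only: it assumes the SFU already has a current audio level per participant (the dictionary below is made up), and it ignores the smoothing and hysteresis a real media server would need to keep the selection from flapping.

```python
import heapq


def pick_streams_to_forward(audio_levels, max_streams=4):
    """Return the ids of the loudest participants, loudest first.

    audio_levels maps a participant id to its current audio level.
    """
    return heapq.nlargest(max_streams, audio_levels, key=audio_levels.get)


# Hypothetical snapshot of audio levels in a 6-way call
levels = {"alice": 0.82, "bob": 0.10, "carol": 0.55,
          "dave": 0.03, "erin": 0.61, "frank": 0.47}
forwarded = pick_streams_to_forward(levels)
print(forwarded)  # ['alice', 'erin', 'carol', 'frank']
```

Everyone else’s media simply isn’t routed, which caps the number of decoders each browser has to run no matter how large the meeting gets.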

Sizing Your Media Server

Sizing and media servers is something I have been doing lately at testRTC. We’ve played a bit with Kurento in the past and are planning to tinker with other media servers. I get this question on every other project I am involved with:

How many sessions / users / streams can we cram into a single media server?

Given what we’ve seen above about speeds and feeds, it is safe to say that it really really really depends on what it is that you are doing.

If what you are looking for is group calling where everyone’s active, you should aim for 100-500 participants in total on a single server. The numbers will vary based on the machine you pick for the media server and the bitrates you are planning per stream on average.

If what you are looking for is a broadcast of a single person to a larger audience, all done over WebRTC to maintain low latency, 200-1,000 is probably a better estimate. Maybe even more.

Big Machines or Small Machines?

Another thing you will need to address is which machines you are going to host your media server on. Will that be the biggest baddest machines available or will you be comfortable with smaller ones?

Going for big machines means you’ll be able to cram larger audiences and sessions into a single machine, so the complexity of your service will be lower. If something crashes (media servers do crash), more users will be impacted. And when you’ll need to upgrade your media server (and you will), that process can cost you more or become somewhat more complicated as well.

The bigger the machine, the more cores it will have. Which results in media servers that need to run in multithreaded mode. Which means they are more complicated to build, debug and fix. More moving parts.

Going for small machines means you’ll hit scale problems earlier and they will require algorithms and heuristics that are more elaborate. You’ll have more edge cases in the way you load balance your service.

Scale Based on Streams, Bandwidth or CPU?

How do you decide that your media server achieved full capacity? How do you decide if the next session needs to be crammed into a new machine or another one or be placed on the current media server you’re using? If you use the current one, and new participants want to join a session actively running in this media server, will there be room enough for them?

These aren’t easy questions to answer.

I’ve seen 3 different metrics used to decide on when to scale out from a single media server to others. Here are the general alternatives:

Based on CPU – when the CPU hits a certain percentage, it means the machine is “full”. It works best when you use smaller machines, as CPU would be one of the first resources you’ll deplete.

Based on Bandwidth – SFUs eat up lots of networking resources. If you are using bigger machines, you probably won’t hit the CPU limit, but you’ll end up eating too much bandwidth. So you’ll end up determining the capacity available by way of bandwidth monitoring.

Based on Streams – the challenge sometimes with CPU and Bandwidth is that the number of sessions and streams that can be supported may vary, depending on dynamic conditions. Your scaling strategy might not be able to cope with that and you may want more control over the calculations. Which will lead to you sizing the machine using either CPU or bandwidth, but putting rules in place that are based on the number of streams the server can support.
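Here is a minimal sketch of what such a “is this server full?” check could look like for the three metrics. All of the thresholds are placeholders I made up; the real ones are exactly what you need to find out through your own sizing tests.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    cpu_percent: float
    bandwidth_mbps: float
    streams: int


@dataclass
class Limits:
    # Placeholder thresholds; derive yours from load testing
    cpu_percent: float = 70.0
    bandwidth_mbps: float = 800.0
    streams: int = 400


def is_full(load, limits, metric):
    """Decide if a media server reached capacity based on a single metric."""
    if metric == "cpu":
        return load.cpu_percent >= limits.cpu_percent
    if metric == "bandwidth":
        return load.bandwidth_mbps >= limits.bandwidth_mbps
    if metric == "streams":
        return load.streams >= limits.streams
    raise ValueError(f"unknown metric: {metric}")


# A big machine: plenty of CPU left, but the network is saturated
load = ServerLoad(cpu_percent=35.0, bandwidth_mbps=850.0, streams=310)
for metric in ("cpu", "bandwidth", "streams"):
    print(metric, is_full(load, Limits(), metric))
```

The same machine is “full” by one metric and has room by the other two, which is why picking the metric that matches your machine size matters.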

The challenge here is that whatever scenario you pick, sizing is something you’ll need to be doing on your own. I see many who come to use testRTC when they need to address this problem.

Cascading a Single Session

Cascading is the process of connecting one media server to another. The diagram below shows what I mean:

We have a 4-way group video call that is spread across 3 different media servers. The servers route the media between them as needed to get it connected. Why would you want to do this?

#1 – Geographical Distribution

When you run a global service and have SFUs as part of it, the question that is raised immediately is for a new session, which SFU will you allocate for it? In which of the data centers? Since we want to get our media servers as close as possible to the users, we either have pre-knowledge about the session and know where to allocate it, or decide by some reasonable means, like geolocation – we pick the data center closest to the user that created the meeting.

Assume 4 people are on a call. 3 of them join from New York, while the 4th person is from France. What happens if the French guy joins first?

The server will be hosted in France. 3 out of 4 people will be located far from the media server. Not the best approach…

One solution is to conduct the meeting by spreading it across servers closest to each of the participants:

We use more server resources to get this session served, but we have a lot more control over the media routes so we can optimize them better. This improves media quality for the session.
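The difference between the two allocation approaches can be shown with a toy sketch. The region names and the location to region mapping are hypothetical; a real service would derive them from geolocation or latency measurements.

```python
from collections import Counter

# Hypothetical mapping from a participant's location to the nearest data center
NEAREST_REGION = {"New York": "us-east", "France": "eu-west"}


def single_server_region(join_order):
    """Naive approach: host the whole session where the first participant is."""
    return NEAREST_REGION[join_order[0]]


def cascaded_regions(join_order):
    """Cascaded approach: each participant connects to their nearest region."""
    return Counter(NEAREST_REGION[location] for location in join_order)


session = ["France", "New York", "New York", "New York"]
print(single_server_region(session))  # eu-west
print(cascaded_regions(session))      # 3 participants on us-east, 1 on eu-west
```

With cascading, the session spans two media servers, and the only long haul is the server to server link between them.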

#2 – Fragmented Allocations

Assume that we can connect up to 100 participants in a single media server. Furthermore, every meeting can hold up to 10 participants. Ideally, we won’t want to assign more than 10 meetings per media server.

But what if I told you the average meeting size is 2 participants? It can get us to this type of an allocation:

This causes a lot of wasted server resources. How can we solve that?

  1. By having people commit in advance to the maximum meeting size. Not something you really want to do
  2. Taking a risk: only allocate new meetings to a server until it reaches, say, 50% of its capacity, leaving the rest for existing meetings to grow into. You still have wasted resources, but to a lower degree. There will be edge cases where meetings won’t be able to fill up because the server runs out of resources
  3. Migrating sessions across media servers in an effort to “defragment” the servers. It is as ugly as it sounds, and probably just as disrupting to the users
  4. Cascade sessions. Allow them to grow across machines

That last one of cascading? You can do that by reserving some of a media server’s resources for cascading existing sessions to other media servers.
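Here is a rough sketch of how options 2 and 4 can work together. The 100 participants capacity comes from the example above. The 50% limit for new meetings, the class and the function names are assumptions for illustration, and the sketch doesn’t model the resources the cascading links themselves consume.

```python
SERVER_CAPACITY = 100    # participants per media server, from the example
NEW_MEETING_LIMIT = 50   # stop placing new meetings past 50% (assumption)


class MediaServer:
    def __init__(self, name):
        self.name = name
        self.participants = 0

    def has_room(self):
        return self.participants < SERVER_CAPACITY


def place_new_meeting(servers):
    """Pick the first server that is still under the new-meeting limit."""
    for server in servers:
        if server.participants < NEW_MEETING_LIMIT:
            return server
    return None  # nothing suitable; time to spin up another server


def place_participant(home, servers):
    """Join on the meeting's home server, or cascade to another one if full."""
    target = home if home.has_room() else next(
        (s for s in servers if s is not home and s.has_room()), None)
    if target is None:
        raise RuntimeError("no capacity left on any media server")
    target.participants += 1
    return target


a, b = MediaServer("a"), MediaServer("b")
a.participants = SERVER_CAPACITY           # server "a" is completely full
chosen_for_new = place_new_meeting([a, b])
joined_on = place_participant(a, [a, b])   # meeting lives on "a", grows onto "b"
print(chosen_for_new.name, joined_on.name)
```

The meeting that started on the full server keeps growing, just on a different machine, with the two servers routing media between them.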

#3 – Larger Meetings

Assuming you want to create meetings larger than a single media server can handle, your only choice is to cascade.

If your media server can hold 100 participants and you want meetings at the size of 5,000 participants, then you’ll need to be able to cascade to support them. This isn’t easy, which explains why there aren’t many such solutions available, but it definitely is possible.
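As a crude lower bound on what that means in machines, here is a short sketch. The `cascade_slots` parameter is my own assumption: it stands for capacity on each server that is set aside for the links to other media servers instead of for participants.

```python
import math


def servers_needed(meeting_size, server_capacity, cascade_slots=0):
    """Minimum number of media servers for a single cascaded meeting."""
    usable = server_capacity - cascade_slots
    if usable <= 0:
        raise ValueError("cascade_slots must be smaller than server_capacity")
    return math.ceil(meeting_size / usable)


no_overhead = servers_needed(5000, 100)                      # 50
with_overhead = servers_needed(5000, 100, cascade_slots=4)   # 53
print(no_overhead, with_overhead)
```

So that 5,000 participants meeting needs at least 50 media servers working in concert, and more once you account for the cascading itself.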

Mind you, in such large meetings, the media flow won’t be bidirectional. You’ll have fewer participants sending media and a lot more only receiving media. For the pure broadcasting scenario, I’ve written a guest post on the scaling challenges on Red5 Pro’s blog.


We’ve touched a lot of areas here. Here’s what you should do when trying to decide how many users can fit in your WebRTC calls:

  1. Whatever meeting size you have in mind it is possible to support with WebRTC
    1. It will be a matter of costs and aligning it with your business model that will make or break that one
    2. The larger the meeting size, the more complex it will be to get it done right, and the more limitations and assumptions you’ll need to add to the equation
  2. Analyze the complexity you need to support
    1. Count the incoming and outgoing streams to each device and media server
    2. Decide on the video quality (resolution and bitrate) for each stream
  3. Define the media server you’ll be using
    1. Select a machine type to run the media server on
    2. Figure out the sizing needed before you reach scale out
    3. Check if the growth is linear on the server’s resources
    4. Decide if you scale out based on bandwidth, CPU, streams count or anything else
  4. Figure how cascading fits into the picture
    1. Offer with it better geolocation support
    2. Assist in resource fragmentation on the cloud infrastructure
    3. Or use it to grow meetings beyond a single media server’s capacity

What’s the size of your WebRTC meetings?

The post How Many Users Can Fit in a WebRTC Call? appeared first on

7 CPaaS Trends to Follow in 2018

Mon, 01/08/2018 - 12:00

Here are CPaaS trends you should be expecting this year.

There’s no doubt about it. CPaaS is growing and it is doing so rapidly. It is a multi-billion dollar industry, and while still small, there’s no sign of its growth stopping anytime soon. You’ll see the numbers $4 billion and $8 billion a year appearing in different reports and estimates that are flying around when talking about the near future of the CPaaS market size and growth potential. I have no clue if the numbers are correct – I’ve never been one to play with estimates.

What I do know, is that we’ve got multiple CPaaS vendors now with ARR (Annual Run Rate) higher than $100 million. Most of it may still come from good old SMS and phone calls, but I think this will change along with how consumers communicate.

This change will make CPaaS a lot more interesting and diversified than the boring race to the bottom that seems to be prevalent in some of the players’ offering and messaging in this market. The problem with CPaaS today is twofold:

  1. SMS and voice are somewhat commoditized. There is a finite way in which you can send and receive SMS and phone calls over phone numbers, and we’ve exhausted them and how to express them in a simple API for developers to use years ago. Since then, the game we played was one of scalability, stability and price points
  2. Developers are resistant to paying for IP based communications services at the moment. They somehow believe that these are a lot easier to develop. While that is correct for the “hello world” implementation, once you need to provide long term maintenance and scalability capabilities this can grow into a huge headache – especially when you couple this with some of the trends in communication that are being introduced

Which brings me to what you can expect in 2018. Here are 7 CPaaS trends that will grow and become important this year – and more importantly – what they mean.

Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:

Get the shortlist

#1 – Serverless

Serverless is also known as Functions.

You might know about serverless from AWS Lambda, Azure Functions, Google’s Cloud Functions and Apache’s OpenWhisk. The list here isn’t random – it goes to show that all big cloud platforms are now offering serverless capabilities.

This still isn’t prevalent in CPaaS, where for the most part, developers are expected to develop, maintain and operate their own servers that communicate with the CPaaS vendor’s infrastructure. But we do see signs of serverless making its way here.

I’ve covered that last year, when I took a deeper look into the Twilio Functions offering and what that means to the CPaaS market.

At the time, Twilio stated that Functions is already Twilio’s fastest growing product ever. Here’s where they explain what it does:

Twilio being the market leader in CPaaS, and Functions being a fast growing product of theirs means that other CPaaS vendors will follow. Simply because demand here is obvious.

#2 – Omnichannel

When SMS just isn’t enough.

Not sure when you last used SMS for personal reasons – I know that I rarely end up inside that app on my smartphone. The way things are going, SMS can be considered the spam channel of 2018. Or maybe the channel used by businesses who’ve been told that this is the best way to reach customers and interrupt them.

While I definitely see value in SMS, I also think that businesses should strive to communicate with their customers on other channels – channels their users are now focusing on with their social life. In Israel that would be Whatsapp. In the US probably a mixture of Facebook and iMessage will work better. Telegram would be the choice for Russia.

Whatever that channel is, to support it, someone needs to integrate with it. And then decide which channel to use for which customer and for what interaction. For CPaaS, that’s what Omnichannel is about. Enabling developers, and by extension businesses to communicate with their customers on the customer’s preferred channel.
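The “which channel for which customer” decision can be as simple as the sketch below. The channel names and the function are hypothetical and not tied to any CPaaS vendor’s API; an actual Omnichannel API would hide this logic along with the integrations themselves.

```python
def choose_channel(preferred, supported, fallback="sms"):
    """Return the first of the customer's preferred channels we can use."""
    for channel in preferred:
        if channel in supported:
            return channel
    return fallback


# Channels our (imaginary) business has integrated with
supported = {"sms", "whatsapp", "facebook"}

first = choose_channel(["telegram", "whatsapp"], supported)
second = choose_channel(["imessage"], supported)
print(first, second)  # whatsapp sms
```

SMS stays in the picture as the fallback that reaches everyone, while the preferred channels take over wherever they are available.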

2018 is going to be the year Omnichannel becomes a serious requirement.


Because now we can actually use it.

Apple’s own Business Chat service is planned to make its public debut this year.

Facebook has its own APIs already, and Whatsapp announced business accounts (=APIs).

That alone covers a large majority of customer bases.

Throw in SMS, mix and choose the ones you want. And voila! Omnichannel.

For businesses, relying on CPaaS for Omnichannel makes sense, as the hassle of adding all of these channels and maintaining them is expensive. Omnichannel CPaaS APIs will abstract that away.

For CPaaS vendors, this is a way to differentiate and make switching between vendors harder.

A win-win.

The ones offering that already? Nexmo with their Chat App and Twilio through their Engagement Cloud.

#3 – Visual / IDE

From code, to REST, to point-and-click.

We used to use DOS as an “operating system”. As a kid, I worked at a small computer shop. For a couple of years, my role was to go to people’s homes and explain to them how to use the new computer they had just purchased: how to put the DOS disk inside the floppy drive, list the files on a floppy, and run games and other applications.

Then came Windows (along with Mac and OS/2 and others) and we all just moved to using a visual operating system and a mouse.

As a kid, I programmed using Logo and Basic. Then Turbo Pascal – in a decent IDE for the first time. At the university, I got acquainted with Tcl/Tk. And then UI development seemed fun. Even if it was by writing code by hand. Then one day, vtcl came to life – a visual editor. Things got easier.

Developing communications is taking the same path now.

It started by needing to build your own stuff from scratch, then with open source frameworks and later CPaaS and REST (or god forbid SOAP) APIs.

In 2017, Twilio Studio was announced – a visual IDE to use on top of the Twilio functionality. In that corner, you can also count Amazon Connect, though not CPaaS but still in the domain of communications – it has a visual IDE of its own.

In a recent VoxImplant event I was invited to speak at in Russia, VoxImplant introduced a new service in beta called Smartcalls – a visual IDE on top of their CPaaS offering. Albeit… in Russian.

The concept of using visual tools requiring less coding can greatly increase productivity and the target audience of these tools. They are no longer restricted to developers “who code”. Hell – I can use these tools. I played with Twilio Studio a bit – it was fun and intuitive. It guides the way you think about what needs to be done. About the flow of the service.
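One way to think about what these visual tools produce is a flow described as data, which the platform then runs for you. The sketch below is only an illustration of that idea. The node types ("say", "gather", "hangup"), the FLOW layout and run_flow() are all made up here and are not the format used by Twilio Studio, Amazon Connect or Smartcalls.

```python
from typing import Dict, List

# A hypothetical flow: each node is the kind of box you would drag onto a canvas.
FLOW: Dict[str, dict] = {
    "start": {"type": "say", "text": "Thanks for calling.", "next": "menu"},
    "menu": {
        "type": "gather",
        "text": "Press 1 for sales, 2 for support.",
        "branches": {"1": "sales", "2": "support"},
        "default": "menu_retry",
    },
    "menu_retry": {"type": "say", "text": "Sorry, I did not get that.", "next": "end"},
    "sales": {"type": "say", "text": "Connecting you to sales.", "next": "end"},
    "support": {"type": "say", "text": "Connecting you to support.", "next": "end"},
    "end": {"type": "hangup"},
}


def run_flow(flow: Dict[str, dict], digits: List[str], start: str = "start") -> List[str]:
    """Walk the flow, consuming one caller input per 'gather' node."""
    inputs = iter(digits)
    transcript: List[str] = []
    node_id = start
    while True:
        node = flow[node_id]
        if node["type"] == "hangup":
            transcript.append("<hangup>")
            return transcript
        transcript.append(node["text"])
        if node["type"] == "say":
            node_id = node["next"]
        elif node["type"] == "gather":
            # Missing or unexpected input follows the node's default branch.
            pressed = next(inputs, None)
            node_id = node["branches"].get(pressed, node["default"])
        else:
            raise ValueError(f"unknown node type: {node['type']}")
```

Calling run_flow(FLOW, ["2"]) walks the greeting, the menu and the support branch before hanging up. The point is that whoever edits FLOW is describing the service, not writing the code that runs it.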

I really can’t see how other CPaaS vendors are going to ignore this trend and not work on their own visual offerings during 2018.

#4 – Machine Learning and Artificial Intelligence

It is time to be smart about communications

When I worked at Amdocs some years ago, we looked into the area of Big Data Analytics. It was all about how you take the boatloads of information telecommunication companies have and do something with it. You start by analyzing and visualizing it, then move towards making it actionable.

It frustrated the hell out of me to understand how little communication vendors are doing with their data compared to enterprises in other markets. Or at least that was my impression looking from inside a vendor.

Fast forward to today, and what you find with CPaaS vendors is that they are offering a well oiled machine that provides generic communications. You can do whatever you want with it, and the smart ones are adding analytics on top for their own needs.

But what about the CPaaS vendors themselves? Shouldn’t they be doing something about analytics? Or its better branded colleague known as machine learning?

Gustavo Garcia wrote a good article about it – improving real time communications with machine learning. This is where most CPaaS vendors are probably looking today, optimizing their network to offer a better service.

But it is just scratching the surface.

The obvious step is adding things around NLP – speech to text, text to speech, translation. All of those are being done by integrating with third parties today, and many of the CPaaS vendors offer these out of the box.

To move the needle and differentiate, more needs to be done:

  1. The internal structure of the CPaaS vendors should take into account the need for researching data. Data scientists and machine learning people have to be part of the development and product teams for this to ever happen
  2. CPaaS vendors need to start thinking about what they can offer by analyzing their own data (and their customers’ communications) beyond just optimizing it

If you are a CPaaS vendor and you don’t have at least a data scientist, a machine learning developer and a product manager savvy in this domain yet, then start recruiting.

#5 – AR/VR

Time to connect ARKit and ARCore to communications.

Augmented reality and virtual reality have been around for the better part of the last decade or two. But somehow, they are only now becoming interesting.

I guess the popularity of AR has grown a lot, and where it fits directly in smartphones today (and not the bulky 3D headsets) is with things like Pokemon Go and camera filters (popularized by Snapchat and found everywhere today).

With the introduction of Apple ARKit and Google ARCore, this is only going to get more commonplace. And what we see now is CPaaS vendors finding their way around this technology.

The most interesting one yet is Twilio’s work with ARKit, which they showcased at last year’s Kranky Geek event:

With all the focus put in this domain, I am sure we’ll see more CPaaS vendors looking into it.

#6 – Bots

Omnichannel + Machine Learning + Automation = Bots

Chat bots are all the rage. Search the internet and you’ll be thinking that humans no longer talk to customers. It is all taken care of by bots.

I’ve added a chat widget to certain pages on my website. And every once in a while I get a question there asking if that’s a human they’re interacting with.

Bots require integration and APIs. They are also about communications. Which is probably why CPaaS vendors are taking a step in this direction as well. The ones adding Omnichannel offerings are in effect enabling bots to be created across those channels.

That’s a first step though, as the next would be to cater to this market better by enabling conversational interfaces and easing the part of packaging the bots for the various channels.

Expect to see a few announcements around bots to be made by CPaaS vendors this year. A lot of it will revolve around Amazon Alexa and Google Home.

#7 – GDPR

The governance headache we’ve all been waiting for.

GDPR stands for General Data Protection Regulation. It is a new set of EU rules that have been put in place to protect the data related to EU citizens that is collected and stored.

While it is easy to assume that CPaaS vendors store no data because they “live” in real time, that isn’t accurate.

Stored metadata and logs may fall into the GDPR black hole, and recording services definitely do. With the introduction of Omnichannel and Bots comes chat history storage.

Twilio jumped on this bandwagon last year with a GDPR program. Other vendors such as MessageBird indicated future support of GDPR. All global CPaaS vendors will need to support GDPR, and since these regulations come into force this year, 2018 will be the year GDPR gets more attention and focus from CPaaS vendors.

2018 – The Year CPaaS Vendors Differentiated

In the past few years, we’ve seen CPaaS vendors struggling in two directions:

  1. Increasing their customer base, mainly around SMS and voice offerings – which is where most of the revenue is these days
  2. Growing from a telecom focused player to a global player

That second point is important. Up until recently, CPaaS equated to running one or two data centers (or the equivalent of running from a small number of cloud based data centers), connecting developers via REST APIs to the telecom backend. With the introduction of IP based communications (and WebRTC), there was a growing need for client side SDKs along with more points of presence closer to the end user.

We seem to be past that hurdle for most CPaaS vendors. Most of them have grown their footprint to include a global infrastructure.

The next frontier is going to happen elsewhere:

  1. Serverless – in making the services easier for developers to adopt by reducing the requirement for customers to deploy their own machines
  2. Omnichannel – extending the reach beyond the telecom channels of SMS and voice into social networks
  3. Visual / IDE – grow the service beyond developers, making it easier to use and faster to deploy with
  4. Machine Learning and Artificial Intelligence – add intelligence and analytics based services
  5. AR/VR – capture the new world of augmented and virtual reality and enhance it with communications
  6. Bots – align with the A2P model of businesses communicating with customers through automation
  7. GDPR – provide support for the new EU initiative, adding governance and regulation as another added value of choosing CPaaS instead of in-house development

CPaaS will move at a rapid pace in the next few years. Vendors who won’t invest and grow their offerings and business will not stay with us for long.


The post 7 CPaaS Trends to Follow in 2018 appeared first on

