Social messaging is killing RCS in all the places that matter.
When looking at messaging in the context of communications and people, we can probably split the story into 3 distinct models:
I’ll quickly sift through the first two and focus on the third.
Consumer Centric
Consumer centric is easy. That’s where Apple iMessage, WhatsApp, Facebook Messenger, Telegram, WeChat and a bunch of others are competing. The approach there today is to deliver a rich messaging experience that includes text, images, video, voice and video calling, location, groups, … – the list goes on. And on. And on.
They have won the war against SMS. We still have SMS, and some mistakenly call it ubiquitous (on my phone it is used for spam and 2FA messages only). They also won the war against RCS, a war that never really started.
To give you a clue – Israel is a WhatsApp country. If you don’t have WhatsApp, you don’t exist. This holds from the age of 8. I just purchased the first smartphone for my 8-year-old son. Not so he can play games or make calls with it, but so he can send messages to his classmates and stay part of the social fabric of his class. The same happened with my daughter when she reached that age. I am now part of multiple WhatsApp groups: family, close friends, parents from my kids’ classes and after-school activities, work related, etc.
How easy would it be to move people in Israel away from entrenched groups that hold their history, images and videos? And to what end? How would RCS offer a better experience?
Business Centric
Business centric is Slack. It used to be all about calling and the PBX. Slack changed the game. Everyone is talking about “team messaging” today. I used the term enterprise messaging years ago.
What Slack did was find a good balance between functionality and user experience that no other player has been able to copy properly so far, though everyone is trying.
WhatsApp is unlikely to penetrate businesses in a meaningful way. Facebook built Workplace instead of trying to introduce Facebook or Messenger directly.
Where’s SMS in this orgy of messaging? Meaningful conversations happen in IP messaging services and not over SMS anymore. Some solutions, like VonageFlow, offer a seamless experience that encompasses both messaging as we know it today and SMS, though I’d argue that capability is a business-to-consumer one.
For all intents and purposes, SMS is non-existent when it comes to business centric messaging.
Business to Consumer
Back to RCS. RCS was supposed to be the future of SMS when we all move to IP based packet networks. Guess what? We’re all on IP based packet networks, and RCS isn’t really here yet in any meaningful way.
In the past couple of years, RCS’s proponents have given it a new tune. The strategy changed from winning consumers back from social networks to being the one ubiquitous network – the ring to rule them all. Here’s the idea: you get RCS on all smartphones worldwide. Now carriers have the ubiquity they had with SMS, and businesses would pay for such access to customers’ phones.
Not going to happen.
Why? Because Apple and Facebook have other plans for us.
Apple now has Apple Business Chat. It is built into the iPhone, making businesses discoverable and reachable over iMessage from the Safari browser, Spotlight search, the Siri assistant and Apple Maps. I wrote extensively about it on SearchUC when it was introduced: Apple Business Chat looks to polish customer messaging
WhatsApp came out with its own offering, called WhatsApp Business API. Similar to Apple Business Chat, it lets businesses communicate with consumers. Apple does that by focusing on contact center vendors, while WhatsApp partners with CPaaS vendors. The goal? Get wider exposure without working directly with long-tail developers in the initial release.
What drove me to even start writing this article? This title of a TechCrunch post: Wish, Netflix, Uber and ~100 others testing WhatsApp’s new Business API
Businesses aren’t waiting for RCS. They are trying to figure out how to communicate with their customers via WhatsApp.
They already had Line, WeChat and Facebook Messenger. And they’re still aiming for WhatsApp – a messaging service that isn’t even a US thing.
Which brings me to the main thing – business to consumer is now a social messaging realm. Carriers have lost that domain as well.
1 Billion Defines the Moat
Remember ubiquity? Here’s what it takes to be interesting:
1 Billion Monthly Active Users
Who has that number today?
Facebook (WhatsApp + Messenger), Apple Business Chat and WeChat. These, with WhatsApp as the biggest, are redefining this market. You hear a lot about how customers still phone businesses and how chat isn’t catching up with contact centers. That might be true, but only partially.
Today’s chat solutions usually require being on the company’s website. SMS hasn’t proven itself at large scale for anything other than notifications to customers on orders and transactions. WhatsApp can change that, and so, to that extent, can any of the other 1B+ MAU social messaging apps.
RCS? With what billion users exactly?
With the large social networks, 100 million monthly active users seems like a rounding error.
Focus is on Customer Care – Not Marketing
Another interesting aspect (and difference) is that social networks are keeping user identity and access close to their chest. While WhatsApp is using phone numbers for identity, piggybacking on carriers in a way, they are not allowing anyone access to a user without the user’s permission. This means:
What these networks are trying to do is get businesses and consumers off SMS and shift their communications to their own networks. To do so, they plan on offering a superior experience. They are doing that not only by adding richness over the limited 160-character experience of SMS, but also by making sure this will be a useful service to their user base and won’t be considered spammy.
Will there be other avenues opened to businesses on social networks to interact with users through marketing campaigns and outbound messaging? Sure. But it isn’t the first priority. The market needs to be created first.
Where Can We Go Next?
We are headed towards an omnichannel interaction model.
To me that means that a business will meet a customer wherever it is comfortable for the customer in the context of that specific interaction.
A customer may prefer a phone call at one interaction, but a chat over WhatsApp on another.
The challenge here is that different customers may prefer different social networks, or may not be reachable at all on some of them. This isn’t going to change any time soon either. The number of social networks is still growing, and while we have a few huge players, others are important to specific populations.
Businesses will need to rely on multiple such channels if they want to reach out to a larger target audience of potential customers.
Back to RCS
It is coming. In some carriers. On some devices. In some form.
Is it going to take back ownership of the interactions from social networks? No.
What it can be, is just another channel. Right next to the rest. It will only become important if it can make that 1 billion monthly active users mark.
Oh, and it will need to succumb to the rules of engagement laid out by social networks today, around business-to-user permissions.
The post Social Messaging != Carrier Messaging (the stories of Whatsapp Business API & Apple Business Chat) appeared first on BlogGeek.me.
It has been more than a year since Apple first added WebRTC support to Safari. My original post reviewing the implementation continues to be popular here, but it does not reflect some of the updates since the first limited release. More importantly, given its differences and limitations, many questions still remained on how to best develop WebRTC applications for Safari.
I ran into Chad Phillips at ClueCon (again) this year and we ended up talking about his arduous experience making WebRTC work on Safari.
Visual design tools in CPaaS are now a part of the offering.
In October 2017, almost a year ago, Twilio announced Studio. At the time, I wrote a lengthy article about my thoughts on Twilio Studio and CPaaS. My closing paragraph then was this one:
It will be interesting to see how competitors would react to this in the long run, and even more interesting to see what will Twilio Studio grow into.
Then in January 2018, I wrote about the 7 CPaaS Trends to Follow in 2018. The ones I zeroed in on:
Not sure which CPaaS vendor to use? Check out my free CPaaS Vendor Selection Matrix. It will give you the KPIs to look for.
Download the CPaaS Vendor Selection Matrix
Guess what has happened since then with Visual/IDE?
Messagebird introduced Flow Builder: “The power of our Voice and SMS solutions at your fingertips, without writing a single line of code.”
Plivo announced PHLO in August: “A whole new visual way of integrating communications that would empower developers to design collaboratively, build visually and deploy instantly.”
Voximplant came out with Smartcalls: “a smart and flexible tool that helps you create outbound call campaigns in no time”
All of these CPaaS players invested in a Twilio Studio-like tool.
Let’s check out what each player did and why.
Twilio Studio
Where it all started (even if there were tools before or in parallel to it).
Studio’s entry point is either an incoming message, an incoming call or a REST API call. From there, the actions include things you do with messages and phone calls, along with the ability to execute generic functions.
A nice touch to Studio is its revision control system – it saves past changes made to the flows you built, allowing switching back and forth between revisions. It would be nice to have named revisions, some automated verbose explanation of changes made, etc.
Messagebird Flow Builder
Messagebird Flow Builder is focused on SMS. The inputs you can use for it are either an incoming SMS or an incoming webhook API call. Once in the “flow”, you can branch the flow based on the time and date or other conditions related to the contents of the message. The end result? An outgoing SMS, email or webhook. There’s a bit more to it than that, like the ability to manage subscriptions in Messagebird or wait for certain replies inside the flow.
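To make the shape of such a flow concrete, here is a minimal Python sketch of a flow as a small graph. It starts from an entry event (an incoming SMS or webhook), passes through branch nodes that test the time or the message content, and ends at output nodes for SMS, email or webhook. This is an illustration under stated assumptions, not Messagebird's API or data model. `Event`, `Action`, `Branch` and `run_flow` are hypothetical names, and outputs are recorded as strings rather than sent.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Union


@dataclass
class Event:
    """Incoming trigger: an SMS or a webhook call (hypothetical shape)."""
    source: str          # "sms" or "webhook"
    body: str
    received_at: datetime


@dataclass
class Action:
    """Output step: an SMS, email or webhook, recorded rather than sent."""
    channel: str
    template: str
    then: Optional[Node] = None   # next step in the flow, if any


@dataclass
class Branch:
    """Pick the next node by evaluating a condition on the event."""
    predicate: Callable[[Event], bool]
    if_true: Node
    if_false: Node


Node = Union[Action, Branch]


def run_flow(entry: Node, event: Event) -> list[str]:
    """Walk the flow from its entry node and return the outputs it would produce."""
    outputs: list[str] = []
    node: Optional[Node] = entry
    while node is not None:
        if isinstance(node, Branch):
            node = node.if_true if node.predicate(event) else node.if_false
        else:
            outputs.append(f"{node.channel}: {node.template.format(body=event.body)}")
            node = node.then
    return outputs


def mentions_stop(event: Event) -> bool:
    return "STOP" in event.body.upper()


def during_business_hours(event: Event) -> bool:
    # Hypothetical rule: weekdays, 09:00-17:00 local time.
    return event.received_at.weekday() < 5 and 9 <= event.received_at.hour < 17


# An example flow: handle opt-outs first, then route by time of day.
flow = Branch(
    predicate=mentions_stop,
    if_true=Action("sms", "You have been unsubscribed."),
    if_false=Branch(
        predicate=during_business_hours,
        if_true=Action("webhook", "forward to agent: {body}"),
        if_false=Action("sms", "We're closed; we'll reply tomorrow.",
                        then=Action("email", "after-hours message: {body}")),
    ),
)

print(run_flow(flow, Event("sms", "hello", datetime(2018, 9, 3, 10, 0))))
```

Each box on a visual canvas roughly corresponds to one node here. A visual builder lets you edit this kind of graph without writing the predicates and templates by hand.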
What I like about the Messagebird Flow Builder is that it is rigid in how it lays out the boxes and their connections: it doesn’t let you move boxes around (a cool feature in other tools here, Studio and PHLO, that got tiresome rather quickly for me).
Plivo PHLO
Plivo PHLO is a me-too Twilio Studio tool.
It has the same entry points, node types and capabilities, assuming, that is, that you’re interested in SMS and voice calls. Where Twilio Studio offers more generic “Messages”, Plivo has only SMS. This is probably fine for most users.
The only thing I couldn’t find in PHLO is the ability to execute an arbitrary JS function. There’s also no revision control as of yet. Other than that, PHLO is a rather straightforward tool to use.
Voximplant Smartcalls
The Voximplant Smartcalls service is different in nature. Where the rest of the pack here is focused on incoming events that trigger actions, Smartcalls is all about campaigns. And all about voice.
You can create a scenario. A scenario in Smartcalls is a visual decision tree of what to do with an outgoing call. You dial, someone answers, you play a specific recording, maybe ask them to press digits, etc.
You can do things like send an email or call a REST webhook, but the purpose of it all is to drive an automated outbound voice campaign: once you have a scenario, you create a campaign. A campaign is a time window, a scenario and a list of phone numbers to dial out to. Smartcalls does the rest, automating the scenario across all the phone numbers within the specified time window.
On Pricing
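The campaign model described above can be sketched as follows: a scenario tree, plus a time window and a list of numbers. This is a hypothetical Python model, not Voximplant's API. `Step`, `Campaign`, `dial_plan` and `run_scenario` are illustrative names, the phone numbers are fictional, and the window is treated as a simple daily time range in local time.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Optional


@dataclass
class Step:
    """One scenario node: play a prompt, then branch on the digit pressed."""
    prompt: str
    on_digit: dict[str, Step] = field(default_factory=dict)


@dataclass
class Campaign:
    """A time window, a scenario and the numbers to dial (hypothetical model)."""
    start: time
    end: time
    scenario: Step
    numbers: list[str]

    def is_open(self, now: datetime) -> bool:
        """True if `now` falls inside the daily calling window."""
        return self.start <= now.time() < self.end


def dial_plan(campaign: Campaign, now: datetime) -> list[str]:
    """Numbers to dial right now; empty outside the campaign window."""
    return list(campaign.numbers) if campaign.is_open(now) else []


def run_scenario(root: Step, pressed: list[str]) -> list[str]:
    """Simulate one answered call and return the prompts played."""
    played: list[str] = []
    keys = iter(pressed)
    step: Optional[Step] = root
    while step is not None:
        played.append(step.prompt)
        if not step.on_digit:
            break                       # leaf node: the call ends here
        digit = next(keys, None)
        step = step.on_digit.get(digit)  # no digit or an unknown one ends the call
    return played


reminder = Step(
    "Press 1 to confirm your appointment, 2 to cancel.",
    {"1": Step("Thank you, see you then."),
     "2": Step("Your appointment was cancelled.")},
)
campaign = Campaign(time(9, 0), time(18, 0), reminder,
                    ["+15555550100", "+15555550101"])

print(dial_plan(campaign, datetime(2018, 9, 3, 10, 30)))
print(run_scenario(reminder, ["2"]))
```

Separating the scenario from the campaign is what lets one decision tree be reused across different number lists and time windows.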
Here things get somewhat murkier.
Do you pay for using the designer tool itself when it gets invoked? (you do with Twilio Studio)
Do you need to pay for the communications used within the flows created? (you don’t with Voximplant Smartcalls).
Plivo, being the shadow of Twilio for voice and SMS, decided not to price the use of PHLO at all, and made that an important part of their announcement as well:
“That’s why, in addition to bringing in 100% Plivo-API support out-of-the-box, we are also making it FREE to build using PHLO. This is not just a commercial decision. This is our stake in the ground — as we truly believe this is how the communication capabilities of the future will be built.”
Here’s the visual from the product page:
Will this create pressure on Twilio? I doubt it, but who am I to say?
A Comparison Table
I put these tools in a table, to see where each one is focused:
| | Twilio Studio | Messagebird Flow Builder | Plivo PHLO | Voximplant Smartcalls |
|---|---|---|---|---|
| Focus | Inbound | Inbound | Inbound | Outbound |
| Medium | Voice, SMS, Omnichannel messages | SMS | Voice, SMS | Voice |
| Cool factor | Revision control | Really easy to use | | Campaign management |
| Flow pricing | Per flow invoked | Free | Free | Per minute charges |
| Communications pricing | Not included | Not included | Not included | Included |

A Word about iPaaS
Maybe a few paragraphs…
iPaaS stands for Integration Platform as a Service. The poster child service here is probably Zapier, allowing the connectivity of one service to another. I use it daily in my own business to power many of the integrations on this website.
Many of the CPaaS players have been working on enabling their use via Zapier, so a user doesn’t need to be a developer to send a message for example. Being able to build more complex communication flows using a visual builder sits well with this approach.
What will be interesting to see is how the two play out with each other, if at all. Will these visual builders get integrated into Zapier? Will these visual builders include easier integration points to other services besides what they themselves offer and a rudimentary capability of invoking a REST call?
Welcome to Visual CPaaS
CPaaS is more than making communication API calls or offering GitHub repositories. In the past two years, we’ve seen some interesting movements and innovations coming out of this space.
I can’t wait to see what will come next.
The post The CPaaS Version of iPaaS: MessageBird & Plivo Join the Twilio Studio Bandwagon appeared first on BlogGeek.me.
WebRTC isn’t the only cool media API on the Web Platform. The Web Virtual Reality (WebVR) spec was introduced a few years ago to bring support for virtual reality devices in a web browser. It has since been migrated to the newer WebXR Device API Specification.
I was at ClueCon earlier this summer, where Dan Jenkins gave a talk showing that it is relatively easy to bring WebRTC video conference streams from FreeSWITCH into a virtual reality environment using WebVR.
A web survey says… that you need to join in to learn more about real time video technology.
I’ve partnered up with Vidyo on a survey they are working on with Hanover Research. This one is focused on how real time video technology gets used in different industries, as well as how decisions are made when choosing the technology stack to use.
I worked as a programmer during my time at school. It was fun, but it is hard to call it professional work (although the last place was a startup focused on medical patient records in the Israel healthcare system). My first “grownup” job as a developer was at a video conferencing company. You can say I’ve been spending my time in front of a webcam for more than half of my lifetime, communicating with peers and colleagues.
In the last several years, as a consultant, much of my work is conducted online. At times with customers that I have never met face to face – only through a video conference.
At testRTC, almost all of our sales are done through video conferencing. Recently, we had a conference call on one of the web conferencing platforms selected by our customer (we tend to use Google Meet by default, but are flexible enough to use whatever the customer is comfortable with). People from that company always join with their video turned off. I forgot mine on for a couple of seconds, which gave me an excuse to ask the person I had been working with for several months to turn hers on as well. She obliged, and for a brief few seconds it felt more human. Now it is a lot easier for me to have a mental image of her when she speaks. This adds volumes to the connection between us humans.
For me video isn’t a gimmick. It is a critical tool.
Are all my calls video calls? No. Just like I use messaging but still use voice calling. Different tools for different jobs.
When Vidyo asked me to join them for the survey, I automatically said yes. As someone who uses video on a daily basis, I am always interested in understanding how others are making use of video if at all.
The survey Vidyo is doing sets out to answer one main question: how (and why) does video get embedded into different businesses?
For me, one of the more interesting questions relates to the applications businesses develop, and if they don’t plan on adding communication functions into them, then why. Understanding what barriers and challenges people see in these technologies can help us as an industry decide where to put our focus.
If you are reading this blog and want to help me out in understanding the industry better, would you be so kind as to fill out this online survey? If you do, you’ll have my thanks as well as a copy of the research findings.
The post Understanding video tech in the enterprise: a web survey appeared first on BlogGeek.me.
Our AI in RTC report got published, and I am proud of the results. Purchase it now while it is still at its launch price.
It has been quite a ride to get this report completed. We spent many hours interviewing vendors, researching individually, sifting through web survey results, discussing topics between us and writing. Lots of writing.
When Chad said he estimated the report would be in the range of 60 pages – 80 tops – I laughed. It seemed ridiculous that the report would be “that short”. My own estimate was 100, give or take a couple of pages.
We ended up with 147 pages. And not because we increased the font size or used double spacing.
There was just so much to cover and so much we wanted to discuss. We ended up with almost 30,000 words.
The report has 37 figures and 23 tables. We added them to make some of the concepts easier to understand and to put some order and methodology into the data provided.
Each chapter has its own set of recommendations, to help you move forward. We wanted to have an actionable report and not a lukewarm one.
Initial Feedback
Last week, we delivered the final report to our prepublication customers – those who were willing to trust us with our work before even knowing it was complete.
I talked to one such customer two days later. He said he already read the whole report once, but will surely dive into it at least twice more. He had to digest all the information in it and see how it fits with his product roadmap.
Artificial Intelligence and … Your Company
Here is something I am more sure of today than ever.
Machine learning and artificial intelligence are here to stay. They are going to be integrated into products and services across all industries, and communications is not going to be any different here.
There are 3 ways this can play out for a vendor in our industry:
From what we’ve seen in our interviews for this report, along with the discussions we had with customers who purchased it, I know that this is the right time to look into this domain and plan for the future.
I’d like to invite you on this journey – we’ve created a report preview, which contains the executive summary, scope and methodologies and the table of contents. You can download the preview from the research page on Kranky Geek:
There’s a special launch price at the moment, which will not be available once we hit September. So if you are interested, there’s no better time than the present.
Video, in the hands of the right company, can be a powerful thing.
In 2012, Telefonica acquired TokBox. I wrote about it at the time, almost 6 years ago. It is sad reading that piece about the TokBox acquisition again. I suggested three areas where Telefonica could make a difference with TokBox. Let’s see what happened.
What Could Telefonica do with TokBox?
What I said in 2012:
Will Telefonica wait the same amount of time it did with Jajah until it does something with this acquisition? I hope they will move faster this time…
Telefonica did nothing with TokBox. They haven’t integrated them into anything. They decided to leave TokBox independent.
This helped TokBox grow over those 6 years into one of the dominant players in video APIs for real time communications. Almost every developer and initiative I talk to that decided to go for a 3rd party platform chose TokBox. I see others as well, but not as frequently.
Since the acquisition, TokBox:
Telefonica failed to make use of TokBox. It didn’t go into video with it. It didn’t try to figure out VoIP. It didn’t try to understand why developers chose TokBox. Telefonica did nothing other than let TokBox continue on its trajectory. It is probably why Telefonica lost interest and decided to sell TokBox to Vonage.
Telefonica plans on folding TokBox into BlueVia, but how will they combine TokBox, if at all, with their Tu Me VoIP OTT service?
Telefonica made no use of its strengths to find synergies with TokBox. Would doing so have killed TokBox altogether, or could it have made them stronger?
What will Telefonica do about voice? Their main API set doesn’t seem to include voice calling, but now it has video… will they be going for Twilio or Voxeo for that one? Or will they roll out their own? Will they skip voice altogether?
TokBox doubled down on video, beefing up its capabilities in that domain. It has a SIP connector, but nothing more than that. It is a missed opportunity.
Where is TokBox today?
TokBox offers video communication APIs. There are other vendors out there doing that today: Twilio, Vidyo.io, Agora, Sinch, Voximplant, Temasys and probably a few others I forgot to mention (sorry for missing out on you).
TokBox is the market leader here when it comes to breadth of features in the video space.
It just wasn’t enough to bring them more customers and garner more than $35 million in the acquisition. I’d attribute this to:
Does this say anything about the market of video APIs? The viability of it to other vendors? The importance of video in the bigger picture?
I don’t really know.
Where are we with Video CPaaS?
Video CPaaS (and, in a way, WebRTC CPaaS vendors in general, those who don’t dabble much in PSTN voice and/or SMS) is a finicky market. Vendors in this space either get acquired and gobbled up never to be seen again (think AddLive or Requestec), or they just don’t grow fast enough to become as big as their PSTN voice/SMS counterparts.
IDC maintains that the U.S. programmable video market will be a $7.4 billion opportunity by 2022, representing more than a 140% four-year CAGR. Assuming only 10% of that becomes a reality, the question becomes who will be the winners in programmable video?
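For a sense of scale, the arithmetic behind those figures can be worked through. This assumes "140% four-year CAGR" means 140% annual growth compounded over the four years to 2022, which is my reading and not IDC's stated method. Under that assumption, the forecast implies a starting market of roughly $223 million, and the 10% scenario comes to $740 million. `implied_base` is a hypothetical helper name.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Invert compound growth, final = base * (1 + cagr) ** years, to get the base."""
    return final_value / (1 + cagr) ** years


FORECAST_2022 = 7.4e9  # USD, the IDC figure quoted above

# Assumption: 140% per year, compounded over four years.
base = implied_base(FORECAST_2022, cagr=1.40, years=4)
ten_percent = 0.10 * FORECAST_2022

print(f"Implied starting market: ${base / 1e6:,.0f}M")        # $223M
print(f"10% of the 2022 forecast: ${ten_percent / 1e6:,.0f}M")  # $740M
```

Since the growth rate is stated as "more than" 140%, the implied starting point under this reading is at most about $223 million.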
What types of services do they need to offer? What products? Are these lower level APIs, or higher level abstractions? Maybe we’re looking at almost complete solutions with a nice API lipstick on top that get calculated in that $7.4 billion.
Video is here to stay.
It won’t be replacing every voice call. But it definitely has its place.
Otherwise, why did Apple go for group video calls in FaceTime with 32 participants in its latest iOS?
And why did WhatsApp just add group video calls? And why did Instagram add group video calls?
Are they doing it just for fun? Is the market bound to be focused only on larger social networks?
I can’t believe that will be the case.
I came from a video conferencing company. Every year I was promised by management that this year will be the year of video. It never happened.
For the last 5 years, I’ve been using video so much that the year of video has already passed.
I guess the next question is what year will be the year of video CPaaS?
The difference in these two questions is that the year of video is the year when video became a widespread service. The year of video CPaaS will be the year when video becomes a widespread feature. We’re not there yet, but we’re heading in that direction.
In many ways, TokBox is one of the vendors figuring out how to get there.
Where are we with CPaaS?
CPaaS seems to be different, but only slightly.
Growth in this space, as far as I understand, comes from SMS and PSTN voice. That’s it.
VoIP? WebRTC? IP messaging? Social omnichannel aggregation? Video? All nice-to-have features for now that don’t affect the bottom line enough. And at the moment, they don’t seem to be big enough to fill the gap when SMS and PSTN voice fall out of favor.
To be a successful CPaaS vendor today, you need to:
The thing about that third point is that it won’t be as simple to achieve as what CPaaS did with SMS and PSTN. In SMS and PSTN, CPaaS vendors needed to act as aggregators of carriers behind a simple API. No one wants to deal with carriers (which is also why carriers fail with their own API initiatives when it comes to WebRTC and video services), so friendly CPaaS vendors are a great alternative.
What is the moat/barrier that CPaaS vendors are building in the IP world? Answering this question holds the key to the future of CPaaS.
What will Vonage do with TokBox?
Not have it as a standalone business.
Doing that would mean perpetuating what happened at Telefonica. While not all of it was bad, it didn’t bring the expected growth with it.
Vonage is uniquely positioned here – more than any other vendor in the market, which is probably why it ended up acquiring TokBox.
I’ll go back to my venn diagrams for an explanation here:
The opportunity space:
Telefonica was never a serious competitor in video CPaaS.
Nexmo and by extension Vonage is.
Nexmo is probably second only to Twilio.
TokBox is probably first in video CPaaS.
They combine nicely and offer Nexmo a capability that its competitors don’t have if you look at the breadth of their video offering.
If Vonage executes this well, the end result will be a better CPaaS offering, a better Nexmo and a better Vonage.
If you’re new to WebRTC, Jitsi was the first open source Selective Forwarding Unit (SFU) and continues to be one of the most popular WebRTC platforms. They were in the news last week because their parent group inside Atlassian was sold off to Slack but the team clarified this does not have any impact on the Jitsi […]
The post Suspending Simulcast Streams for Savvy Streamlining (Brian Baldino) appeared first on webrtcHacks.
Simulcast is one of the more interesting aspects of WebRTC for multiparty conferencing. In a nutshell, it means sending three different resolutions (spatial scalability) and different frame rates (temporal scalability) at the same time. Oscar Divorra’s post contains the full details. Usually, one needs an SFU to take advantage of simulcast. But there is a […]
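To illustrate why simulcast pairs naturally with an SFU, here is a hedged Python sketch of the per-receiver decision. Given the layers the sender publishes, it forwards the highest one that fits the receiver's bandwidth and display size. The `Layer` type, `pick_layer` and the resolutions, frame rates and bitrates are illustrative assumptions, not values from any browser or SFU. Real SFUs also juggle temporal layers, keyframes and bandwidth estimation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Layer:
    """One simulcast encoding published by the sender (illustrative values)."""
    rid: str            # identifier for the encoding, like a WebRTC rid
    height: int         # pixels
    fps: int
    bitrate_kbps: int


LAYERS = (
    Layer("q", 180, 15, 150),    # quarter resolution, lower frame rate
    Layer("h", 360, 30, 500),    # half resolution
    Layer("f", 720, 30, 1500),   # full resolution
)


def pick_layer(layers: Sequence[Layer], available_kbps: int,
               max_height: int) -> Optional[Layer]:
    """Return the richest layer that fits both the receiver's bandwidth and display.

    None means nothing fits, so the SFU would stop forwarding video to this receiver.
    """
    fitting = [layer for layer in layers
               if layer.bitrate_kbps <= available_kbps and layer.height <= max_height]
    return max(fitting, key=lambda layer: layer.bitrate_kbps, default=None)


print(pick_layer(LAYERS, available_kbps=800, max_height=720))
```

A receiver showing a thumbnail gets the low layer even with plenty of bandwidth, while a full-screen viewer on a constrained link falls back to the middle one. The sender encodes each layer once, and the SFU only chooses which one to forward.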
Our AI in RTC report is just about ready. Here are all of its price points.
If you aren’t interested in AI and RTC, then move on – this one isn’t for you.
For the past several months, I’ve added a new task to my daily activities: creating a new report, one about AI in RTC.
It has taken its toll – I’ve slept a bit less. Read a bit less. Turned down and postponed a few clients. All in order to get this project going. I’ve partnered with Chad Hart on it, one of my partners in crime at Kranky Geek and a fellow consultant.
We wanted to work on something new and interesting and this seemed to be the right thing to do.
After countless hours in interviews with vendors and suppliers in this space, discussions we had with one another and time spent just looking at the ceiling of my office and thinking, I can say that we’re almost ready with the report. Most of it is already written, and what is left will be completed really soon.
What will you find in this report?
Publication is scheduled for the end of July. We might miss it by a few days due to editing and some last minute changes.
We’re accepting payment via PayPal and wire transfer inside the US. We don’t have a digital shopping cart, as this is a first for us through Kranky Geek Research. It also means we’re treating each and every purchaser as royalty.
Why wait for the price to rise? Join those who’ve already purchased at our discounted prepublication price. Interested? Just email us.
The post AI in RTC: Final Price Points and End of Prepublication Discount appeared first on BlogGeek.me.
Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere.
I went to the data science summit in Israel a month or so back. It was an interesting day. But somehow, I had to make sure to dodge all the boring autonomous car sessions. They just weren’t meant for me, as I was wandering around trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this, right?).
After countless interviews this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.
There were two interesting themes that relate to the use of AI in video – again – focus is on real time communications:
Guess what – we’re about to incorporate the responses to our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC ebook” and a chance to win one of five $100 Amazon gift cards, along with our appreciation for helping us out. Why wait?
In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own.
As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise.
The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category, means you need 3 types of skills:
Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train its internal workforce in machine learning because it was impossible to hire experienced people in the Valley – Google, Apple, Facebook and Amazon are the main recruiters for that position and they are too competitive in what they offer employees.
Data engineers are probably easier to find and train, but what is it you need them to do exactly?
And then there are product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly, partly by asking vendors, through our current research, how they end up adding AI to their products. The answers vary and are quite interesting.
Anyway – lots of hype. Less in the way of real skills out there you can hire for the job.
Autonomous driving is where computer vision is today
If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:
The third one is a very distinct use case, and it is probably the one eating away a lot of the talent in computer vision. The industry as a whole is, for some reason, interested in taking a stab at making cars drive on their own. This is quite a challenge, and it is probably why so many researchers are flocking towards it. A lot of the data being processed in order to get us there is visual data.
The importance of vision in autonomous cars cannot be overstated. This ABC News clip about the recent Uber accident drives that point home. Look at these few seconds explaining things:
“These vehicles are trained to see pedestrians, to see cyclists, to see redlights. So it’s really unclear what went wrong here”
And then you ask a data scientist to deal withboring video meeting recordings to do whatever it is we need to do in real time communications with AI. Not enough fame in it as opposed to self driving cars. Not enough of a good story to tell your friends when you meet them after work.Computer vision in video meetings is nascent
Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings.
I’d like to break this down into a table:Computer vision Video meeting AI
Why the difference? Two main reasons:
As we move forward, companies will start figuring this one out – deciding what the data pipeline for computer vision needs to look like in video meetings AND which use cases are best addressed with computer vision.
Where are we headed?
The communication market is changing. We are seeing tremendous shifts in our market – cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come.
On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus of communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report – there’s a discount associated with purchasing it before it gets published:
You can download our report prospectus here.
WebRTC H.264 hardware acceleration is no guarantee for anything. Not even for hardware acceleration.
There was a big war going on when it came to the video codec in WebRTC. Should we all be using VP8 or should we be using H.264? A lot of digital ink was spilled on this topic (here as well as in other places). The final decision that was made?
Both VP8 and H.264 became mandatory to implement in browsers.
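“Mandatory to implement” does not tell you what a specific endpoint will actually offer. One quick way to check is to read the `a=rtpmap` lines in the video section of an SDP offer. The sketch below assumes you already have the SDP text (for example, logged from your signaling); the function names and the trimmed sample offer are hypothetical, and this is a deliberately minimal parser, not a full SDP implementation.

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>] (RFC 4566 syntax)
RTPMAP = re.compile(r"a=rtpmap:(\d+) ([^/\s]+)/(\d+)")


def offered_video_codecs(sdp: str) -> dict[int, str]:
    """Map payload type -> encoding name for every rtpmap line in video m-sections."""
    codecs: dict[int, str] = {}
    in_video = False
    for line in sdp.splitlines():
        if line.startswith("m="):
            # A new media section starts; remember whether it is a video one
            in_video = line.startswith("m=video")
        elif in_video:
            match = RTPMAP.match(line)
            if match:
                codecs[int(match.group(1))] = match.group(2)
    return codecs


def supports(sdp: str, codec: str) -> bool:
    """True if any video payload type in the SDP uses the given encoding name."""
    return codec.upper() in {name.upper() for name in offered_video_codecs(sdp).values()}


# Hypothetical, trimmed-down offer for illustration only
SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=rtpmap:111 opus/48000/2",
    "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
    "a=rtpmap:96 VP8/90000",
    "a=rtpmap:102 H264/90000",
    "a=fmtp:102 packetization-mode=1;profile-level-id=42e01f",
])

print(offered_video_codecs(SAMPLE_OFFER))  # {96: 'VP8', 102: 'H264'}
print(supports(SAMPLE_OFFER, "vp8"), supports(SAMPLE_OFFER, "AV1"))  # True False
```

A real browser offer lists more entries (rtx, red, ulpfec and several H.264 variants), and this sketch simply reports them all by name.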
So… which of these video codecs should you use in your application? Here’s a free mini video course to help you decide.
Fast forward to today, and you have this interesting conundrum:
Leaving aside the question of what mandatory really means in English (I’ll leave that one for the good people at Apple to review), that is only a fraction of the whole story.
There are reasons why one would like to use VP8:
There are reasons why one would like to use H.264:
I want to open up the challenges here, especially in leveraging hardware-based encoding in WebRTC H.264 implementations. Before we dive into them, though, there’s one more thing I want to make clear:
You can use a mobile app with VP8 (or H.264) on iOS devices.
The fact that Apple decided NOT to implement VP8 doesn’t bar your own mobile app from supporting it.
WebRTC H.264 Challenges
Before you decide to go for a WebRTC H.264 implementation, you need to take into consideration a few of the challenges associated with it.
I want to start by explaining one thing about video codecs – they come with multiple features, knobs, capabilities, configurations and profiles. These additional doozies are there to improve the final quality of the video, but they aren’t always available. To use them, BOTH the encoder and the decoder need to support them, which is where a lot of the problems you’ll be facing stem from.
#1 – You might not have access to a hardware implementation of H.264
In the past, developers had no access to the H.264 codec on iOS. You could only use it to record or play back a file, not to stream media in real time. This has changed, and now that’s possible.
But there’s also Android to contend with. And in Android, you’re living in the wild wild west and not the world wide web.
It would be safe to say that all modern Android devices today have H.264 encoder and decoder available in hardware acceleration, which is great. But do you have access to it?
The illustration above shows the value chain of the hardware acceleration. Who’s in charge of exposing that API to you as a developer?
The silicon designer? The silicon manufacturer? The one who built the hardware acceleration component and licensed it to the chipset vendor? Maybe the handset manufacturer? Or is it Google?
The answer is all of them and none of them.
WebRTC is a corner case of a niche of a capability inside the device. No one cares about it enough to make sure it works out of the factory gate. Which is why in some of the devices, you won’t have access to the hardware acceleration for H.264 and will be left to deal with a software implementation.
Which brings us to the next challenge:
#2 – Software implementations of H.264 encoders might require royalty payments
Since you may need a software implementation of H.264, you might end up needing to pay royalties for using this codec.
I know there’s this thing called OpenH264. I am not a lawyer, though my understanding is that you can’t really compile it on your own if you want to keep it “open” in the sense of no royalty payments. And you’ll probably need to compile it, or statically link it with your code, for it to work.
This being the case, tread carefully here.
Oh, and if you’re using a third-party CPaaS, you might want to ask that vendor if he is taking care of that royalty payment for you – my guess is that he isn’t.
#3 – Simulcast isn’t really supported. At least not everywhere
Simulcast is how most of us do group video calls these days. At least until SVC becomes more widely available.
Simulcast allows devices to send multiple resolutions/bitrates of the same video towards the server. This removes the need for an SFU to transcode media and, at the same time, lets the SFU offer the most suitable experience for each participant without resorting to lowest-common-denominator strategies.
The problem is that simulcast in H.264 isn’t available yet in any of the web browsers. It is coming to Chrome, but that’s about it for now. And even when it is, there’s no guarantee that Apple will be so kind as to add it to Safari.
It is better than nothing, though not as good as VP8 simulcast support today.
#4 – H.264 hardware implementations aren’t always compatible with WebRTC
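To make the SFU side of this concrete: because every sender already provides several encodings, the SFU only has to pick which one to forward to each receiver. The sketch below assumes the SFU has a bandwidth estimate per receiver; the layer names (`q`, `h`, `f`), resolutions and bitrates are hypothetical, and a real SFU would also weigh things like the receiver’s window size and CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulcastLayer:
    rid: str           # simulcast stream identifier (hypothetical naming: q/h/f)
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target bitrate of this encoding


def pick_layer(layers: list[SimulcastLayer], available_kbps: float) -> SimulcastLayer:
    """Forward the highest-bitrate layer that fits the receiver's estimated bandwidth.

    If even the lowest layer does not fit, send it anyway rather than nothing.
    The SFU only selects which packets to forward here; it never transcodes.
    """
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    chosen = ordered[0]
    for layer in ordered:
        if layer.bitrate_kbps <= available_kbps:
            chosen = layer
    return chosen


# Hypothetical three-layer simulcast sent by one participant
LAYERS = [
    SimulcastLayer("q", 180, 150),
    SimulcastLayer("h", 360, 500),
    SimulcastLayer("f", 720, 1500),
]

for receiver_kbps in (2000, 800, 100):
    print(receiver_kbps, pick_layer(LAYERS, receiver_kbps).rid)
# 2000 f
# 800 h
# 100 q
```

Each receiver gets the richest layer it can handle, so one participant on a weak link does not drag the whole call down to the lowest layer.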
Here’s the kicker – I learned this one last month, from a thread in discuss-webrtc – the implementation requirements of H.264 in WebRTC are such that it isn’t always easy to use hardware acceleration even if and when it is available.
Read this from that thread:
Remember to differentiate between the encoder and the decoder.
The Chrome software encoder is OpenH264 – https://github.com/cisco/openh264
Contributions are welcome, but the encoder currently doesn’t support either High or Main (or even full Baseline), according to the README file.
Hardware encoders vary greatly in their capabilities.
Harald Alvestrand from Google offers here a few interesting statements. Let me translate them for you:
And then comes this nice reply from the good guys at Fuze:
@Harald: we’ve actually been facing issues related to the different profiles support with OpenH264 and the hardware encoders. Wouldn’t it make more sense for Chrome to only offer profiles supported by both? Here’s the bad corner case we hit: we were accidentally picking a profile only supported by the hardware encoder on Mac. As a result, when Chrome detected CPU issues for instance, it would try to reduce quality to a level not supported by the hardware encoder which actually led to a fallback to the software encoder… which didn’t support the profile. There didn’t seem to be a good way to handle this scenario as the other side would just stop receiving anything.
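Fuze’s suggestion, offering only profiles that both the hardware and the software encoder support, can be sketched in a few lines. In SDP, H.264 profiles show up as the `profile-level-id` parameter of the `a=fmtp` line: three bytes in hex holding `profile_idc`, the constraint flags and `level_idc` (RFC 6184). The capability lists below are hypothetical, the Constrained Baseline detection is simplified, and level 1b is not handled; treat it as an illustration of the idea, not of how Chrome builds its offer.

```python
from dataclasses import dataclass

# profile_idc values for the profiles discussed in this post
PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}


@dataclass(frozen=True)
class ProfileLevelId:
    profile_idc: int
    profile_iop: int  # constraint_set flags; 0x40 is constraint_set1_flag
    level_idc: int    # level * 10, e.g. 31 means level 3.1 (level 1b not handled)

    @classmethod
    def parse(cls, value: str) -> "ProfileLevelId":
        """Parse the 6-hex-digit profile-level-id from an SDP a=fmtp line."""
        if len(value) != 6:
            raise ValueError(f"profile-level-id must be 6 hex digits: {value!r}")
        raw = bytes.fromhex(value)
        return cls(raw[0], raw[1], raw[2])

    @property
    def profile(self) -> str:
        # Simplification: RFC 6184 lists other routes to Constrained Baseline;
        # only profile_idc 66 with constraint_set1_flag is checked here
        if self.profile_idc == 66 and self.profile_iop & 0x40:
            return "Constrained Baseline"
        return PROFILE_NAMES.get(self.profile_idc, f"profile_idc {self.profile_idc}")


def max_levels(profile_level_ids: list[str]) -> dict[str, int]:
    """Highest level_idc an encoder advertises for each profile."""
    levels: dict[str, int] = {}
    for value in profile_level_ids:
        parsed = ProfileLevelId.parse(value)
        levels[parsed.profile] = max(levels.get(parsed.profile, 0), parsed.level_idc)
    return levels


def fallback_safe_profiles(hardware: list[str], software: list[str]) -> dict[str, int]:
    """Profiles both encoders support, capped at the lower of the two levels.

    Offering only these means a hardware -> software fallback mid-call
    cannot leave the stream in a profile the new encoder can't produce.
    """
    hw, sw = max_levels(hardware), max_levels(software)
    return {name: min(hw[name], sw[name]) for name in hw.keys() & sw.keys()}


# Hypothetical capability lists: a hardware encoder offering High and
# Constrained Baseline, and a software encoder limited to Constrained Baseline
HARDWARE = ["640c1f", "42e01f"]
SOFTWARE = ["42e01f"]

print(fallback_safe_profiles(HARDWARE, SOFTWARE))  # {'Constrained Baseline': 31}
```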
If I may translate this one as well for your entertainment:
So. Got hardware encoder and/or decoder. Might not be able to use it.
#5 – For now, H.264 video quality is… lower than VP8
That implementation of H.264 in WebRTC? It isn’t as good as the VP8 one. At least not in Chrome.
This is for the same scenario running on the same machines encoding the same raw video. The outgoing bitrate variance for VP8 is 0.115 while it is 0.157 for H.264 (the lower the better). Not such a big difference. The framerate of H.264 seems to be somewhat lower at times.
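The post doesn’t say how that variance figure is calculated, and values like 0.115 suggest some normalized measure rather than a raw variance in kbps squared. As an assumption, one common choice is the coefficient of variation: the standard deviation of the bitrate samples divided by their mean. The sketch below computes that; it is not necessarily testRTC’s formula, and the sample series are made up for illustration.

```python
import statistics


def bitrate_variation(samples_kbps: list[float]) -> float:
    """Coefficient of variation of outgoing bitrate samples (lower = steadier).

    Assumption: this is one common normalization (population std dev / mean),
    not necessarily the formula testRTC uses for its reported figure.
    """
    if not samples_kbps:
        raise ValueError("need at least one bitrate sample")
    mean = statistics.fmean(samples_kbps)
    if mean == 0:
        raise ValueError("mean bitrate is zero")
    return statistics.pstdev(samples_kbps) / mean


# Hypothetical per-second samples, not measurements from the runs above
steady = [1000, 1000, 1000, 1000]
jumpy = [500, 1500, 500, 1500]
print(bitrate_variation(steady), bitrate_variation(jumpy))  # 0.0 0.5
```

Normalizing by the mean lets you compare runs at different target bitrates, which a raw variance would not.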
I tried out our new scoring system in testRTC that is available in beta on both these test runs, and got these numbers:
The 9.0 score was given to the VP8 test run while H.264 got an 8.8 score.
There’s a bit of a difference in how stable VP8’s implementation is versus the H.264 one. It isn’t that Cisco’s H.264 code is bad. It might just be that the way it got integrated into WebRTC isn’t as optimized as VP8’s integration.
Then there’s this from the same discuss-webrtc thread:
We tried h264 baseline at 6mbps. The problem we ran into is the bitrate drastically jumped all over the place.
I am not sure if this relates to the fact that it is H.264 or just to trying to use WebRTC at such high bitrates, or the machine or something else entirely. But the encoder here is suspect as well.
I also have a feeling that Google’s own telemetry and stats about the video codecs being used would show VP8 accounting for a larger share of ongoing WebRTC sessions.
#6 – The future lies in AV1
After VP8 and H.264 come VP9 and H.265, respectively.
H.265 is nowhere to be found in WebRTC, and I can’t see it getting there.
And then there’s AV1, developed by the Alliance for Open Media, whose members include Apple, Google, Microsoft and Mozilla (who all happen to be the companies behind the major web browsers).
The best trajectory for video codecs in WebRTC will look something like this:
Why doesn’t this happen in VP8?
It does. To some extent. But a lot less.
The challenges in VP8 are limited as it is mostly software based, with a single main implementation to baseline against – the one coming from Google directly. Which happens to be the one used by Chrome’s WebRTC as well.
Since everyone works against the same codebase, using the same bitstreams and software to test against, you don’t see the same set of headaches.
There’s also the limitation of available hardware acceleration for VP8, which ends up being an advantage here – hardware acceleration is hard to upgrade. Software is easy. Especially if it gets automatically upgraded every 6-8 weeks like Chrome does.
Hardware beats software at speed and performance. But software beats hardware on flexibility and agility. Every. Day. Of. The. Week.
What’s Next?
The current situation isn’t a healthy one, but it is all we’ve got to work with.
I am not advocating against H.264, just against using it blindly.
How the future will unfold depends greatly on the progress made in AV1 as well as the steps Apple will be taking with WebRTC and their decisions of the video codecs to incorporate into WebKit, Safari and the iOS ecosystem.
Whatever you end up deciding to go with, make sure you do it with your eyes wide open.
The post The Challenging Path to WebRTC H.264 Video Codec Hardware Support appeared first on BlogGeek.me.