Video, in the hands of the right company, can be a powerful thing.
In 2012, Telefonica acquired TokBox. I wrote about it at the time, almost 6 years ago. It is sad reading that piece about the TokBox acquisition again. I suggested three areas where Telefonica could make a difference with TokBox. Let’s see what happened.
What Could Telefonica do with TokBox?
What I said in 2012:
Will Telefonica wait the same amount of time it did with Jajah until it does something with this acquisition? I hope they will move faster this time…
Telefonica did nothing with TokBox. They haven’t integrated them into anything. They decided to leave TokBox independent.
This helped TokBox grow over those 6 years into one of the dominant players in video APIs for real time communications. Almost every developer and initiative I talk to that has decided to go for a 3rd party platform chose TokBox. I see others as well, but not as frequently.
Since the acquisition, TokBox:
- Switched to WebRTC fully, killing its Flash based solution
- Increased its session sizes to fit thousands of parallel streams per session
- Added recording and broadcasting
- Created their Inspector tool, one of the best I’ve seen on the market for debugging sessions after the fact
- Cleaned, beefed up and curated their documentation. Again – one of the best I’ve seen on the market for communication APIs
- Gained customers as well. Per the press release, over 2,300 customers
Telefonica failed to make use of TokBox. It didn’t go into video with it. It didn’t try to figure out VoIP. It didn’t try to understand why developers chose TokBox. Telefonica did nothing other than let TokBox continue on its trajectory. This is probably why Telefonica lost interest and decided to sell TokBox to Vonage.
Telefonica plans on folding TokBox into BlueVia, but how will they combine TokBox, if at all, with their Tu Me VoIP OTT service?
- Didn’t happen
- BlueVia died somewhere between 2013-2014
- Along with Jajah, Tu Me and Tu whatever that Telefonica built
- VoIP is not a thing for carriers
- appear.in was sold by Telenor to Videonor
- AT&T started and stopped its WebRTC APIs initiative
- What will happen with Deutsche Telekom’s immmr?
Telefonica made no use of its strengths to find synergies with TokBox. Would doing so have killed TokBox altogether, or could it have made them stronger?
What will Telefonica do about voice? Their main API set doesn’t seem to include voice calling, but now it has video… will they be going for Twilio or Voxeo for that one? Or will they roll out their own? Will they skip voice altogether?
TokBox doubled down on video, beefing up their capabilities in that domain. It has a SIP connector, but nothing more than that. It is a missed opportunity.
Where is TokBox today?
TokBox is video communication APIs. There are other vendors out there doing that today: Twilio, Vidyo.io, Agora, Sinch, Voximplant, Temasys and probably a few others I forgot to mention (sorry for missing out on you).
TokBox is the market leader here when it comes to breadth of features in the video space.
It just wasn’t enough to win them more customers or garner more than $35 million in the acquisition. I’d attribute this to:
- They weren’t operating as a startup. Being part of Telefonica meant stability, which probably took away from the focus on revenue and growth you see in other CPaaS vendors. The end result is expenses that were too high relative to revenue, or to the potential to raise money in the VC world. Vonage will need to handle this, and a change in direction and DNA is never an easy one
- Telefonica probably wanted out. They weren’t interested in continuing with this, so any amount above $0 was a good number for them
Does this say anything about the market of video APIs? The viability of it to other vendors? The importance of video in the bigger picture?
I don’t really know.
Where are we with Video CPaaS?
Video CPaaS (and by extension WebRTC CPaaS vendors, those who don’t dabble much in PSTN voice and/or SMS) is a finicky market. The vendors that get acquired in this space are gobbled up never to be seen again (think AddLive or Requestec), or they just don’t grow fast enough or become as big as their PSTN voice/SMS counterparts.
IDC maintains that the U.S. programmable video market will be a $7.4 billion opportunity by 2022, representing more than a 140% four-year CAGR. Assuming only 10% of that becomes a reality, the question becomes who will be the winners in programmable video?
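A quick back-of-the-envelope check on those IDC numbers helps put them in perspective. This is a minimal Python sketch, assuming that “four-year CAGR” means four compounding periods ending in 2022; the helper names are illustrative, not IDC’s methodology:

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Starting value implied by a final value and a compound annual growth rate."""
    return final_value / (1 + cagr) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above, in billions of USD.
# Assumption: four compounding periods, i.e. 2018 -> 2022.
FINAL_2022 = 7.4
GROWTH = 1.40  # 140% per year

base = implied_base(FINAL_2022, GROWTH, 4)
ten_percent = 0.10 * FINAL_2022

print(f"Implied starting market: ${base:.2f}B")
print(f"10% of the 2022 projection: ${ten_percent:.2f}B")
```

At 140% a year the market multiplies roughly 33-fold over four years, so the implied starting point is only about $0.22 billion, and even the 10% scenario (about $0.74 billion) would be more than three times that.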
What types of services do they need to offer? What products? Are these lower level APIs, or higher level abstractions? Maybe we’re looking at almost complete solutions with a nice API lipstick on top that get calculated in that $7.4 billion.
Video is here to stay.
It won’t be replacing every voice call. But it definitely has its place.
Otherwise, why did Apple go for group video calls in FaceTime with 32 participants in their latest iOS?
And why did WhatsApp just add group video calls? And why did Instagram add group video calls?
Are they doing it just for fun? Is the market bound to be focused only on larger social networks?
I can’t believe that will be the case.
I came from a video conferencing company. Every year management promised me that this year would be the year of video. It never happened.
Over the last 5 years, I have been using video so much that the year of video has already passed.
I guess the next question is what year will be the year of video CPaaS?
The difference between these two questions is that the year of video is when video became a widespread service, while the year of video CPaaS will be when video becomes a widespread feature. We’re not there yet, but we’re heading in that direction.
In many ways, TokBox is one of the vendors figuring out how to get there.
Where are we with CPaaS?
CPaaS seems to be different, but only slightly.
Growth in this space, as far as I understand, comes from SMS and PSTN voice. That’s it.
VoIP? WebRTC? IP messaging? Social omnichannel aggregation? Video? All nice-to-have features for now that don’t affect the bottom line enough. And at the moment, they don’t seem big enough to fill the gap when SMS and PSTN voice fall out of favor.
To be a successful CPaaS vendor today, you need to:
- Look into the future and execute the future
- Rely on SMS and PSTN revenue – AND improve your services in that domain
- Cultivate multiple IP based solutions and services, preparing to reap rewards once that market grows exponentially
The thing about that third point is that it won’t be as simple to achieve as what CPaaS did with SMS and PSTN. There, CPaaS vendors needed to act as aggregators of carriers behind a simple API. No one wants to deal with carriers (which is also why carriers fail with their own API initiatives for WebRTC and video services), so friendly CPaaS vendors are a great alternative.
What is the moat/barrier that CPaaS vendors are building in the IP world? Answering this question holds the key to the future of CPaaS.
What will Vonage do with TokBox?
Not have it as a standalone business.
Doing that would mean perpetuating what happened at Telefonica. While not all of it was bad, it didn’t bring the expected growth with it.
Vonage is uniquely positioned here – more than any other vendor in the market, which is probably why it ended up acquiring TokBox.
I’ll go back to my Venn diagrams for an explanation here:
The opportunity space:
- VBC at Vonage deals with UCaaS
- Nexmo and TokBox are all about CPaaS
- TokBox will probably be merged with Nexmo, bringing a single offering to developers
- Nexmo has voice, SMS, IP messaging and omnichannel aggregation, with video just launched. TokBox has video
- Together, that closes the gap in communication services for developers, bringing Vonage on par with its biggest CPaaS competitor, Twilio
- This means the threat of customers leaving TokBox to Twilio because they want to deal with a single vendor and need other telephony services is now lessened
- It also means that the threat of customers leaving Nexmo to Twilio because Nexmo lacks a good video service is now lessened as well
- If you are a TokBox customer that also uses Twilio, it might make sense for you to switch to Nexmo. I am sure Nexmo will be running through the roster of TokBox customers to find Twilio customers they can convert
- TokBox had time to flesh out their service in a unique way. The time Telefonica gave them was put to good use when it comes to infrastructure and developer-related capabilities (look at Inspector and their documentation). Next, Vonage can cherry-pick the best pieces of Nexmo and TokBox and combine them to give a better user experience across the board for the developers using their CPaaS platform
- On the UCaaS front, Vonage is using Amazon Chime today. The challenge with Chime is that it is a complete standalone product – something that is harder to embed and integrate into an existing experience. Vonage isn’t alone here – RingCentral is relying on Zoom. Such integrations are nice, but they can’t go deep
- TokBox brings APIs that are far superior and more flexible than what Zoom, Chime or any other video conferencing player can bring with its integration APIs. Using these to bake video right into its UCaaS VBC app makes sense, and puts Vonage in a better position than its UCaaS competitors
- Especially if video is the next frontier
Telefonica was never a serious competitor in video CPaaS.
Nexmo and by extension Vonage is.
Nexmo is probably second only to Twilio.
TokBox is probably first in video CPaaS.
They combine nicely and offer Nexmo a capability that its competitors don’t have if you look at the breadth of their video offering.
If Vonage executes this well, the end result will be a better CPaaS offering, a better Nexmo and a better Vonage.
Our AI in RTC report is just about ready. Here are all of its price points.
If you aren’t interested in AI and RTC, then move on – this one isn’t for you.
In the past several months I’ve been adding into my daily activities the creation of a new report – one about AI in RTC.
It has taken its toll – I’ve slept a bit less. Read a bit less. Turned down and postponed a few clients. All in order to get this project going. I’ve partnered with Chad Hart on it, one of my partners in crime at Kranky Geek and a fellow consultant.
We wanted to work on something new and interesting and this seemed to be the right thing to do.
After countless hours of interviews with vendors and suppliers in this space, discussions we had with one another and time spent just looking at the ceiling of my office and thinking, I can say that we’re almost ready with the report. Most of it is already written, and what is left will be completed really soon.
What will you find in this report?
- An introduction to machine learning and artificial intelligence. A high level one, which should be suitable for people who are less conversant in it
- Speech Analytics. A thorough chapter looking at how speech analytics is used in real time communications, including use cases, vendors and a lot more. I’d say the majority of the writing is here, as most of the focus of our industry is here
- Voice Bots. While a lot is said about chatbots, we decided to skip them (it would have de-focused us) and instead look at the domain of voice bots. Think Google Duplex, but for the enterprise
- Computer Vision. You probably saw, just like me, how autonomous driving is taking the life out of computer vision elsewhere. That said, there are still vendors and places in RTC where you can find computer vision, which is what’s in this chapter of our report
- Cost and Quality Optimization. That’s the silent participant in every VoIP session you have. And it is slowly moving towards AI as well. We’ve found those who use it today and talked to those who don’t, trying to figure out both sides of the equation
- Survey summary. Remember that online survey? We’re still collecting the final responses, so be sure to fill it out if you haven’t. That’s where we will be writing our analysis of the responses we’ve received
- Other things?
- The introductory ebook on AI in RTC (still not written), that is also given for free to ALL those filling the online survey
- Glossary of terms related to RTC
- A powerpoint deck of all the illustrations from the report
Publication is scheduled for the end of July. We might miss it by a few days due to editing and some last minute changes.
- Prepublication price: $1,170 (available until publication)
- Launch discount: $1,950 (available until September 7)
- Official price: $2,950
We’re allowing payment via PayPal and wire transfer inside the US. We don’t have a digital shopping cart, as this is a first for us through Kranky Geek Research. It also means we’re treating each and every purchaser as royalty.
Why wait for the price to rise? Join those who’ve already purchased at our discounted prepublication price. Interested? Just email us.
The post AI in RTC: Final Price Points and End of Prepublication Discount appeared first on BlogGeek.me.
Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere
I went to the data science summit in Israel a month or so back. It was an interesting day. But I had to make sure to dodge all the boring autonomous car sessions. They just weren’t meant for me, as I was wandering around trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this, right?).
After countless interviews this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.
There were two interesting themes that relate to the use of AI in video (again, the focus is on real time communications):
- There’s a lot less expertise to go around in the industry, where the industry is real time comms and not machine learning or computer vision in general
- The general computer vision industry’s standards and capabilities seem higher and better than what we see in RTC today
Guess what – we’re about to incorporate the responses we got on our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC ebook” and a chance to win one of 5 $100 Amazon gift cards, along with our appreciation for helping us out. Why wait?
In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own.
As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise.
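To make the “source it” path concrete, here is a minimal sketch that counts faces in a single video frame using Amazon Rekognition’s DetectFaces API via boto3. The `count_faces` helper, the 90% confidence threshold and the `frame.jpg` file name are assumptions for illustration; the client is passed in so the logic can be exercised without AWS credentials:

```python
from typing import Any


def count_faces(client: Any, image_bytes: bytes, min_confidence: float = 90.0) -> int:
    """Count faces Rekognition reports at or above a confidence threshold.

    `client` is expected to behave like boto3.client("rekognition").
    The 90.0 default threshold is an arbitrary, hypothetical choice.
    """
    response = client.detect_faces(Image={"Bytes": image_bytes}, Attributes=["DEFAULT"])
    faces = response.get("FaceDetails", [])
    return sum(1 for face in faces if face.get("Confidence", 0.0) >= min_confidence)


# Example usage (requires boto3 and AWS credentials; "frame.jpg" is hypothetical):
# import boto3
# with open("frame.jpg", "rb") as f:
#     print(count_faces(boto3.client("rekognition"), f.read()))
```

No in-house data science is needed here, but every analyzed frame becomes a network round trip and a billable request, which is worth keeping in mind for anything real time.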
The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category means you need 3 types of skills:
- Data scientists – people who can look at hordes of data, check out different algorithms and decide on what works best – what pieces of data to look at and what model to build
- Data engineers – these are the devops of this field. They are there to connect the dots of the different elements in the system and build a kind of a pipeline where data gets processed and handled. They don’t need to know the details of algorithms, but they do need to know the jargon and concepts
- Product managers – these are the guys who need to decide what to do. Without them, engineers will play without any focus or oversight, wasting time and resources instead of working towards value creation. These product managers need to know a thing or two about data science, machine learning and how it works
Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train their internal workforce in machine learning because it was impossible to hire experienced people in the Valley. Google, Apple, Facebook and Amazon are the main recruiters for that position, and they are too competitive in what they offer employees.
Data engineers are probably easier to find and train, but what is it you need them to do exactly?
And then there’s product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly. Part of that comes from asking, through our current research, how vendors end up adding AI into their products. The answers vary and are quite interesting.
Anyways, lots of hype. Less in the way of real skills out there you can hire for the job.
Autonomous driving is where computer vision is today
If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:
- AI and job displacement
- The end of privacy (coupled with fake news in some ways)
- Autonomous cars
The third one is a very distinct use case, and it is probably the one eating away most of the computer vision talent. For some reason, the industry as a whole wants to take a stab at making cars drive on their own. This is quite a challenge, which is probably why so many researchers are flocking to it. A lot of the data being processed to get us there is visual data.
The importance of vision in autonomous cars cannot be overstated. This ABC News clip of the recent Uber accident drives that point home. Look at these few seconds explaining things:
“These vehicles are trained to see pedestrians, to see cyclists, to see redlights. So it’s really unclear what went wrong here”
And then you ask a data scientist to deal with boring video meeting recordings to do whatever it is we need to do in real time communications with AI. There isn’t enough fame in it compared to self driving cars. Not enough of a good story to tell your friends when you meet them after work.
Computer vision in video meetings is nascent
Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings.
I’d like to break this down into a table:
Computer vision vs. video meeting AI
- Count faces/people
- Speaker identification
- Facial recognition
- Gesture control
- Emotion detection
- Auto-frame participants
Why the difference? Two main reasons:
- Video meetings are real time in nature and limited in the available compute power. There’s more on that in our upcoming report. But the end result is that adopting the latest and greatest that computer vision has to offer isn’t trivial
- We haven’t figured out as an industry where’s the ROI in most of the computer vision capabilities when it comes to video meetings – there are lower hanging fruit these days in the form of transcription, translation and what you can do with speech
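The compute constraint in the first point is easy to put into numbers: at a given frame rate, everything that happens to a frame has to fit into one frame interval. A minimal sketch with hypothetical timings (the 15 ms of “other work” and the 50 ms model are made-up figures, not measurements, and the stride calculation ignores the other work for simplicity):

```python
import math


def per_frame_budget_ms(fps: float, other_work_ms: float = 0.0) -> float:
    """Milliseconds left per frame for analysis after other per-frame work."""
    frame_interval_ms = 1000.0 / fps
    return max(frame_interval_ms - other_work_ms, 0.0)


def analysis_stride(fps: float, model_ms: float) -> int:
    """Analyze every Nth frame when the model is slower than the frame interval."""
    return max(1, math.ceil(model_ms * fps / 1000.0))


# Hypothetical: 30 fps, 15 ms per frame already spent on capture, codec and rendering.
print(f"{per_frame_budget_ms(30, other_work_ms=15.0):.1f} ms left per frame")
# Hypothetical: a vision model that needs 50 ms per frame on this device.
print(f"analyze every {analysis_stride(30, 50.0)} frames")
```

Skipping frames is one way to live within the budget; the alternatives (smaller models, lower resolution input, offloading to a server) each trade away accuracy, latency or cost.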
As we move forward, companies will start figuring this one out, deciding what the data pipeline for computer vision needs to look like in video meetings AND which use cases are best addressed with computer vision.
Where are we headed?
The communication market is changing. We are seeing tremendous shifts in our market – cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come.
On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus of communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report – there’s a discount associated with purchasing it before it gets published:
You can download our report prospectus here.
WebRTC H.264 hardware acceleration is no guarantee for anything. Not even for hardware acceleration.
There was a big war going on when it came to the video codec in WebRTC. Should we all be using VP8 or should we be using H.264? A lot of digital ink was spilled on this topic (here as well as in other places). The final decision that was made?
Both VP8 and H.264 became mandatory to implement by browsers.
So… which of these video codecs should you use in your application? Here’s a free mini video course to help you decide.
Fast forward to today, and you have this interesting conundrum:
- Chrome, Firefox and Edge implement VP8 and H.264
- Safari implements H.264. No VP8
Leaving aside the question of what mandatory really means in English (leaving it here for the good people at Apple to review), that is only a fraction of the whole story.
There are reasons why one would like to use VP8:
- It has been there from the start, so its implementation is highly optimized already
- Royalty free, so no need to deal with patents and payments and whatnot. I know there’s FUD around patents in VP8, but for the most part, 100% of the industry is treating it as free
- It nicely supports simulcast, so quite friendly to video group calling scenarios
There are reasons why one would like to use H.264:
- You already have H.264 equipment and don’t want to transcode – be it cameras, video conferencing gear or the need to broadcast via HLS or RTMP
- You want to support Safari
- You want to leverage hardware based encoding and decoding to increase battery life on your mobile devices
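One way to turn the two lists above into a decision is a small helper that intersects what each participant’s browser supports and then weighs your requirements. This is a hypothetical sketch: the support matrix reflects this post’s description of the browsers, and the order of the rules is a judgment call, not a recommendation from any vendor:

```python
from dataclasses import dataclass
from typing import List

# Browser support as described in this post (Safari: H.264 only).
# Assumption: this matrix changes over time, so keep it as data, not logic.
BROWSER_CODECS = {
    "chrome": {"VP8", "H264"},
    "firefox": {"VP8", "H264"},
    "edge": {"VP8", "H264"},
    "safari": {"H264"},
}


@dataclass
class Requirements:
    needs_simulcast: bool = False     # group calls routed through an SFU
    has_h264_equipment: bool = False  # cameras, room systems, HLS/RTMP output


def pick_codec(browsers: List[str], req: Requirements) -> str:
    """Hypothetical helper: choose one video codec for a whole session."""
    common = set.intersection(*(BROWSER_CODECS[b] for b in browsers))
    if not common:
        raise ValueError("no video codec shared by all participants")
    if common == {"H264"}:
        return "H264"  # e.g. a Safari participant leaves no other choice
    if req.has_h264_equipment and not req.needs_simulcast:
        return "H264"  # avoid transcoding toward existing H.264 gear
    return "VP8"  # royalty free, mature in WebRTC, simulcast friendly
```

Battery life and hardware acceleration are deliberately left out of the rules, since having H.264 hardware doesn’t guarantee you can actually use it.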
I want to open up the challenges here. Especially in leveraging hardware based encoding in WebRTC H.264 implementations. Before we dive into them though, there’s one more thing I want to make clear:
You can use a mobile app with VP8 (or H.264) on iOS devices.
The fact that Apple decided NOT to implement VP8 doesn’t bar your own mobile app from supporting it.
WebRTC H.264 Challenges
Before you decide to go for a WebRTC H.264 implementation, you need to take into consideration a few of the challenges associated with it.
I want to start by explaining one thing about video codecs: they come with multiple features, knobs, capabilities, configurations and profiles. These additional doozies are there to improve the final quality of the video, but they aren’t always available. To use them, BOTH the encoder and the decoder need to support them, which is where a lot of the problems you’ll be facing stem from.
#1 – You might not have access to a hardware implementation of H.264
In the past, developers had no access to the H.264 codec on iOS. You could only get it to record a file or playback one. Not use it to stream media in real time. This has changed and now that’s possible.
But there’s also Android to contend with. And in Android, you’re living in the wild wild west and not the world wide web.
It would be safe to say that all modern Android devices today have H.264 encoder and decoder available in hardware acceleration, which is great. But do you have access to it?
The illustration above shows the value chain of the hardware acceleration. Who’s in charge of exposing that API to you as a developer?
The silicon designer? The silicon manufacturer? The one who built the hardware acceleration component and licensed it to the chipset vendor? Maybe the handset manufacturer? Or is it Google?
The answer is all of them and none of them.
WebRTC is a corner case of a niche of a capability inside the device. No one cares about it enough to make sure it works out of the factory gate. Which is why in some of the devices, you won’t have access to the hardware acceleration for H.264 and will be left to deal with a software implementation.
Which brings us to the next challenge:
#2 – Software implementations of H.264 encoders might require royalty payments
If you end up needing a software implementation of H.264, you might also end up needing to pay royalties for using this codec.
I know there’s this thing called OpenH264. I am not a lawyer, though my understanding is that you can’t really compile it on your own if you want to keep it “open” in the sense of no royalty payments. And you’ll probably need to compile it or link it with your code statically to work.
This being the case, tread carefully here.
Oh, and if you’re using a 3rd party CPaaS, you might want to ask that vendor whether they are taking care of that royalty payment for you. My guess is that they aren’t.
#3 – Simulcast isn’t really supported. At least not everywhere
Simulcast is how most of us do group video calls these days. At least until SVC becomes more widely available.
Simulcast allows devices to send multiple resolutions/bitrates of the same video towards the server. This removes the need for the SFU to transcode media, while letting the SFU offer the most suitable experience for each participant without resorting to lowest common denominator strategies.
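Here is roughly what that per-participant choice looks like on the SFU side. A minimal sketch, assuming a three-layer ladder with hypothetical bitrates and a bandwidth estimate per receiver; real SFUs also weigh layout size, CPU and packet loss:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


# Hypothetical simulcast ladder sent by one publisher.
LAYERS = [Layer("180p", 150), Layer("360p", 500), Layer("720p", 1500)]


def select_layer(layers: List[Layer], available_kbps: int) -> Layer:
    """Pick the highest layer that fits a receiver's estimated bandwidth.

    Falls back to the lowest layer rather than dropping the video entirely.
    """
    ordered = sorted(layers, key=lambda layer: layer.kbps)
    fitting = [layer for layer in ordered if layer.kbps <= available_kbps]
    return fitting[-1] if fitting else ordered[0]


for receiver_kbps in (100, 800, 3000):
    print(receiver_kbps, select_layer(LAYERS, receiver_kbps).name)
```

Receivers at 100, 800 and 3,000 kbps get 180p, 360p and 720p respectively, all from the same publisher and without any transcoding in the SFU.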
The problem is that simulcast in H.264 isn’t available yet in any of the web browsers. It is coming to Chrome, but that’s about it for now. And even when it will be, there’s no guarantee that Apple will be so kind as to add it to Safari.
It is better than nothing, though not as good as VP8 simulcast support today.
#4 – H.264 hardware implementations aren’t always compatible with WebRTC
Here’s the kicker – I learned this one last month, from a thread in discuss-webrtc – the implementation requirements of H.264 in WebRTC are such that it isn’t always easy to use hardware acceleration even if and when it is available.
Read this from that thread:
Remember to differentiate between the encoder and the decoder.
The Chrome software encoder is OpenH264 – https://github.com/cisco/openh264
Contributions are welcome, but the encoder currently doesn’t support either High or Main (or even full Baseline), according to the README file.
Hardware encoders vary greatly in their capabilities.
Harald Alvestrand from Google offers here a few interesting statements. Let me translate them for you:
- H.264 encoders and decoders are different kinds of pain. You need to solve the problem of each of these separately (more about that later)
- Chrome’s encoder is based on Cisco’s OpenH264 project, which means this is what Google spends the most time testing against when it looks at WebRTC H.264 implementations. Here’s an illustration of what that means:
- OpenH264’s encoder implementation isn’t really High profile or Main profile or even Baseline profile. It just implements something in-between that fits well into real time communications
- And if you decide not to use it and use a hardware encoder, then be sure to check what that encoder is capable of. This is the wild wild west, as we said, so even if the encoder is accessible, it is going to be like a box of chocolates – you never know what they’re going to support
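Which profile is on offer shows up in the SDP as the H.264 `profile-level-id` parameter defined in RFC 6184: three hex-encoded bytes holding `profile_idc`, the constraint flags and `level_idc`. A simplified Python sketch for checking what the other side offers; the function names are hypothetical, and it ignores cases (level 1b, Constrained High and others) that libwebrtc’s own parser handles:

```python
from typing import Set, Tuple

PROFILE_NAMES = {0x42: "Baseline", 0x4D: "Main", 0x64: "High"}


def parse_profile_level_id(value: str) -> Tuple[str, float]:
    """Decode an SDP H.264 profile-level-id (RFC 6184): 3 bytes, hex encoded.

    Simplified: only Baseline/Main/High are named, and level 1b is ignored.
    """
    if len(value) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    raw = bytes.fromhex(value)  # raises ValueError on non-hex input
    profile_idc, profile_iop, level_idc = raw[0], raw[1], raw[2]
    name = PROFILE_NAMES.get(profile_idc, f"unknown(0x{profile_idc:02x})")
    if profile_idc == 0x42 and profile_iop & 0x40:  # constraint_set1_flag
        name = "Constrained Baseline"
    return name, level_idc / 10


def is_acceptable(offered: str, supported_profiles: Set[str], max_level: float) -> bool:
    """Hypothetical check: can our encoder/decoder handle what was offered?"""
    name, level = parse_profile_level_id(offered)
    return name in supported_profiles and level <= max_level
```

For example, the common `42e01f` decodes to Constrained Baseline at level 3.1, while `4d001f` is Main at level 3.1, which a Constrained Baseline-only implementation should refuse.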
And then comes this nice reply from the good guys at Fuze:
@Harald: we’ve actually been facing issues related to the different profiles support with OpenH264 and the hardware encoders. Wouldn’t it make more sense for Chrome to only offer profiles supported by both? Here’s the bad corner case we hit: we were accidentally picking a profile only supported by the hardware encoder on Mac. As a result, when Chrome detected CPU issues for instance, it would try to reduce quality to a level not supported by the hardware encoder which actually led to a fallback to the software encoder… which didn’t support the profile. There didn’t seem to be a good way to handle this scenario as the other side would just stop receiving anything.
If I may translate this one as well for your entertainment:
- You pick a profile for the encoder which might not be available in the decoder. And Chrome doesn’t seem to be doing the matchmaking here (not sure if that’s true, or if Chrome could even do that if it really wanted to)
- Mac’s hardware H.264 encoder, like any other Apple product, has its very own configuration that only it supports. When Chrome lowers quality to a level the hardware encoder doesn’t support, it falls back to the software encoder, which doesn’t support that configuration either, and the other side simply stops receiving video
- This is one edge case, but there are probably more like it lurking around
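The fix Fuze suggests boils down to set intersection: only offer profiles that the hardware encoder, the software fallback and the remote decoder all support, so a mid-call fallback can’t land on an unsupported profile. A hypothetical sketch; the capability sets are made up to mirror the scenario in the thread:

```python
from typing import Iterable, Optional, Set


def safe_profiles(hw_encoder: Set[str], sw_encoder: Set[str], remote_decoder: Set[str]) -> Set[str]:
    """Profiles that keep working if the hardware encoder falls back to software."""
    return hw_encoder & sw_encoder & remote_decoder


def choose_profile(hw_encoder: Set[str], sw_encoder: Set[str], remote_decoder: Set[str],
                   preference: Iterable[str]) -> Optional[str]:
    """Return the most preferred profile that survives a fallback, if any."""
    candidates = safe_profiles(hw_encoder, sw_encoder, remote_decoder)
    for profile in preference:
        if profile in candidates:
            return profile
    return None  # nothing safe: better to negotiate a different codec


# Hypothetical capability sets, loosely modeled on the Fuze report above.
HW = {"Constrained Baseline", "Main", "High"}
SW = {"Constrained Baseline"}
REMOTE = {"Constrained Baseline", "High"}

print(choose_profile(HW, SW, REMOTE, ["High", "Main", "Constrained Baseline"]))
```

This prints `Constrained Baseline`: picking from the intersection gives up High profile in this example, but the call keeps flowing when the encoder switches.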
So. Got hardware encoder and/or decoder. Might not be able to use it.
#5 – For now, H.264 video quality is… lower than VP8
That implementation of H.264 in WebRTC? It isn’t as good as the VP8 one. At least not in Chrome.
This is for the same scenario running on the same machines encoding the same raw video. The outgoing bitrate variance for VP8 is 0.115 while it is 0.157 for H.264 (the lower the better). Not such a big difference. The framerate of H.264 seems to be somewhat lower at times.
I tried out testRTC’s new scoring system, now available in beta, on both of these test runs and got these numbers:
The 9.0 score was given to the VP8 test run while H.264 got an 8.8 score.
There’s a bit of a difference with how stable VP8’s implementation is versus the H.264 one. It isn’t that Cisco’s H.264 code is bad. It might just be that the way it got integrated into WebRTC isn’t as optimized as the VP8’s integration.
Then there’s this from the same discuss-webrtc thread:
We tried h264 baseline at 6mbps. The problem we ran into is the bitrate drastically jumped all over the place.
I am not sure if this relates to the fact that it is H.264 or just to trying to use WebRTC at such high bitrates, or the machine or something else entirely. But the encoder here is suspect as well.
I also have a feeling that Google’s own telemetry and stats about the video codecs being used will point to VP8 having a larger portion of ongoing WebRTC sessions.
#6 – The future lies in AV1
After VP8 and H.264 come VP9 and H.265, respectively.
H.265 is nowhere to be found in WebRTC, and I can’t see it getting there.
And then there’s AV1, from the Alliance for Open Media, whose founding members include Apple, Google, Microsoft and Mozilla (who all happen to be the companies behind the major web browsers).
The best trajectory to video codecs in WebRTC will look something like this:
Why doesn’t this happen in VP8?
It does. To some extent. But a lot less.
The challenges in VP8 are limited as it is mostly software based, with a single main implementation to baseline against – the one coming from Google directly. Which happens to be the one used by Chrome’s WebRTC as well.
Since everyone works against the same codebase, using the same bitstreams and software to test against, you don’t see the same set of headaches.
There’s also the limitation of available hardware acceleration for VP8, which ends up being an advantage here – hardware acceleration is hard to upgrade. Software is easy. Especially if it gets automatically upgraded every 6-8 weeks like Chrome does.
Hardware beats software at speed and performance. But software beats hardware on flexibility and agility. Every. Day. Of. The. Week.
What’s Next?
The current situation isn’t a healthy one, but it is all we’ve got to work with.
I am not advocating against H.264, just against using it blindly.
How the future will unfold depends greatly on the progress made in AV1, as well as the steps Apple takes with WebRTC and its decisions on which video codecs to incorporate into WebKit, Safari and the iOS ecosystem.
Whatever you end up deciding to go with, make sure you do it with your eyes wide open.
The post The Challenging Path to WebRTC H.264 Video Codec Hardware Support appeared first on BlogGeek.me.
Parallax, or eye contact in video conferencing is a problem that should be solved, and AI is probably how we end up solving it.
I worked at a video conferencing company about 20 years ago. Since then, a lot has changed:
- Resolutions and image quality have increased dramatically
- Systems migrated from on prem to the cloud
- Our focus changed from large room systems, to mobile, to desktop and now to huddle rooms
- We went from designed hardware to running it all on commodity hardware
- And now we’re going after commodity software with the help of WebRTC
One thing hasn’t really changed in all that time.
I still see straight into your nose or straight at your forehead. I can never seem to look you in the eye. When I do, it ends up being me gazing straight at my camera, which feels unnatural to me as well.
The reason for this is known as the parallax problem in video conferencing. Parallax. What a great word.
If you believe Wikipedia, then “Parallax is a displacement or difference in the apparent position of an object viewed along two different lines of sight, and is measured by the angle or semi-angle of inclination between those two lines.”
A mouthful. Let me illustrate the problem:
What happens here is that as I watch the eyes of the person on the screen, my camera is capturing me. But I am not looking at my camera. I am looking at a point on the screen, at an angle away from it. And in a group call with a few people laid out in Hollywood Squares, who should I be looking at anyway?
So you end up with either my nose.
Or my forehead.
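The Wikipedia definition comes down to simple trigonometry. The sketch below is a minimal illustration. It assumes the viewer looks straight at the remote person's eyes on screen, and that the camera sits a fixed distance away from that point, in the plane of the display. The 10 cm offset and 60 cm viewing distance are illustrative assumptions, not measurements, and `parallax_angle_deg` is a hypothetical helper.

```python
import math

def parallax_angle_deg(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle between the line of sight to the on-screen eyes and the line to the camera.

    Assumes the camera sits camera_offset_cm away from the point the viewer is
    looking at, in the plane of the display, and the viewer is
    viewing_distance_cm away from the display.
    """
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))

# Hypothetical laptop setup: webcam 10 cm above the remote person's eyes, viewer 60 cm away.
print(round(parallax_angle_deg(10, 60), 1))  # 9.5
```

The angle only drops to zero when the offset does, that is, when the camera sits right behind the eyes on screen.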
What we really want/need is to have that camera right behind the eyes of the person we’re looking at on our display – be it a smartphone, laptop, desktop or room system.
Over the years, the notion was to “ignore” this problem as it is too hard to solve. The solution to it usually required the use of mirrors and an increase in the space the display needed.
Here’s an example from a failed Kickstarter project that wanted to solve this for tablets – the eTeleporter:
The result is usually cumbersome and expensive. Which is why it never caught on.
There are those who suggest tilting the monitor. This may work well for static devices in meeting rooms, but then again, who would do the work needed, and would the same angle work on every room size and setup?
Years ago, when I worked at a video conferencing company, we participated in a European research project that included 3D imaging, 3D displays, telepresence and a few high end cameras. The idea was to create a better telepresence experience that also got eye contact right. It never saw the light of day.
Today, multiple cameras and depth sensors just might work.
Let’s first take it to the extreme. Think of Intel True View. Pepper a stadium with enough cameras, and you can decide to synthetically re-create any scene from that football game.
Since we’re not going to have 20+ 5K cameras in our meeting rooms, we will need to make do with one. Or two. And some depth information. Gleaned via a sensor, dual camera contraption or just by using machine learning.
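For the dual camera option, depth comes from the disparity between the two views. Below is a minimal sketch of the textbook stereo relation. It assumes two rectified cameras with parallel optical axes. The focal length, baseline and disparity values are made up for illustration, and `depth_from_disparity` is a hypothetical helper. The hard part in practice is finding the disparity for every pixel, and the sketch leaves that out.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a point seen by two rectified, parallel cameras.

    Pinhole stereo relation: Z = f * B / d, where f is the focal length in
    pixels, B the distance between the camera centres in metres, and d the
    horizontal shift (disparity) of the same point between the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical dual-camera device: f = 1000 px, cameras 1.2 cm apart, 20 px disparity.
print(depth_from_disparity(1000, 0.012, 20))  # roughly 0.6 m
```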
Two recent advancements give a clue to where we’re headed:
- Apple Memoji (and earlier Bitmoji). iPhone X can 3D scan your face and recognize facial movements and expressions
- Facebook can now open eyes in selfie images with the help of AI
The idea? Analyze and “map” what the camera sees, and then tweak it a bit to fit the need. You won’t be getting the real, raw image, but what you’ll get will be eye contact.
Back to AI in RTC
In our interviews this past month, we’ve been talking to many vendors who make use of machine learning and AI in their real time communication products. We’ve doubled down on computer vision in the last week or two, trying to understand where the technology is today – what’s in production and what’s coming in the next release or two.
Nothing I’ve seen was about eye contact, and computer vision in real time communication is still quite nascent, solving simpler problems. But you do see the steps taken towards that end game, just not from the video communication players yet.
The post Can AI and Computer Vision solve the video conferencing eye contact problem? appeared first on BlogGeek.me.
Is it machine learning or artificial intelligence? It depends on who you ask and what it is you care about.
There are multiple ways to think about and look at machine learning and artificial intelligence. And just like with any other hyped technology, people seem to mix the two and use them interchangeably.
I’ll let you in on a little secret: we’re doing the same with our upcoming AI in RTC report.
Want to help us with our research AND get a free ebook AND have a chance to win one of five $100 Amazon gift cards?
We could have just as easily used the title “ML in RTC” instead of “AI in RTC”. The way we’d approach and cover the space, and end up writing this market research, would be… the same – in both cases.
- I’ve never been a stickler for such details, especially when so many are mixing them up
- This is the same as having VoIP, Convergence, UC and now Teams mean the exact same things – just slightly differently
- Or why WebRTC is both a standard specification (almost at least) and an open source project implementing an approximation of that standard specification
- And it is why people mix up ML and AI. The distinctions aren’t big enough for most of the population to care – or understand
- Whenever a new technology or term becomes interesting and gets hyped, overzealous marketing and sales people would start using it and abusing it
- Which is what we see with this whole AI thing that is just everywhere now
- So why not us with our new report about AI in RTC?
Which brings me to this article.
Machine Learning and Artificial Intelligence are somewhat different from one another. The problem is to decide what that difference is.
Here are 4 ways to think about ML and AI:
#1 – ML = AI
Let’s start with the easiest one: ML is AI. There’s no difference between the two and they can be used interchangeably.
This is the viewpoint of the marketer, and today, of the market itself.
When everyone talks about AI, you can’t not talk about AI. Even if what you do is just ML. Or BigData. Or analytics. Or… whatever. Just say you’re doing AI. It is good for the health of your stock price.
While at it, make sure to say you’re doing AI in an ICO cryptocurrency fashion. What can go wrong?
Someone tells you he is doing AI? Assume ML, and ask for more information. Make your own judgement.
#2 – The road to AI
From Operational to BI
We’ve had databases in our products for many years now. We use them to store data, run transactions and take actions. These are known as operational databases. For many years we’ve had another set of databases – the analytical ones, used in data warehouses. We needed them because they worked better for questions requiring aggregations over long series of historical data.
The analytical side got the marketing terms of BI (Business Intelligence) and even Analytics.
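To make the operational/analytical split concrete, here is a toy sketch using Python's built-in `sqlite3` module. It uses a hypothetical call-records table. Both queries run against the same tiny database here. The point is the shape of the question, not the engine. At warehouse scale, the aggregate kind is what dedicated analytical databases were built to answer efficiently.

```python
import sqlite3

# In-memory database with a hypothetical call-records table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, customer TEXT, month TEXT, minutes INTEGER)")
db.executemany(
    "INSERT INTO calls (customer, month, minutes) VALUES (?, ?, ?)",
    [("acme", "2018-01", 30), ("acme", "2018-02", 45),
     ("globex", "2018-01", 10), ("globex", "2018-02", 20)],
)

# Operational query: look up one specific record to serve a transaction.
row = db.execute("SELECT customer, minutes FROM calls WHERE id = ?", (2,)).fetchone()
print(row)  # ('acme', 45)

# Analytical query: aggregate across the whole history to answer a business question.
totals = db.execute(
    "SELECT month, SUM(minutes) FROM calls GROUP BY month ORDER BY month"
).fetchall()
print(totals)  # [('2018-01', 40), ('2018-02', 65)]
```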
BI because we’re selling now to the business (at a higher price point of course). And what we’re selling is value.
Analytics because it sounds harder than the operational stuff.
From BI to BigData
The next leg of that journey started about a decade ago with BigData.
Storage started costing close to nothing, so it made sense to store everything. But now data warehouses from the good-ol’ BI days got too expensive and limiting. So we came out with BigData. Things like Hadoop and Cassandra came to be and we were happy again.
Now we could just throw all our data into Hadoop and run batch processes on it using MapReduce, which ended up replacing/augmenting our data warehouses.
BigData was in big hype for some time. While it is very much alive today, it seems to have run out of steam for marketers. They moved on to Machine Learning.
From BigData to ML
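The MapReduce model itself is small enough to sketch in plain Python. A map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group into a result. This is a single-process illustration of the programming model, not Hadoop's Java API. On a cluster, the framework distributes the map and reduce steps across machines and handles the shuffle itself. The function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    # Mapper: emit a (word, 1) pair for every word in the document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key (Hadoop does this between phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the values for each key into a single result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["WebRTC video", "video codecs", "WebRTC video codecs"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in documents)))
print(counts)  # {'webrtc': 2, 'video': 3, 'codecs': 2}
```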
This step is a bit more nuanced, and maybe it isn’t a step at all.
Machine Learning covers the research area of getting machines to decide on their own algorithm – or more accurately – decide on how an algorithm will be used based on a given dataset.
Machine learning algorithms have been around well before machines. If you check the notes on Wikipedia for Linear Regression, you’ll find the earliest methods for it were published in 1805. And to be fair, these algorithms are used in BI as well.
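Linear regression fits the description above of ML as deciding "how an algorithm will be used based on a given dataset". The model shape (a straight line) is fixed, and the data determines its parameters. Below is a minimal sketch of the classic least squares fit in plain Python. The data points are illustrative, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values must not all be equal")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The "algorithm" (a line) is fixed; the data decides its parameters.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```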
The leap from BigData to ML happened mostly because of Deep Learning, which I am keeping as a separate leap from ML. Why? Because many of the things we do today end up being simpler ML algorithms. We call it AI (or ML) just because.
Deep Learning got everyone on the ML bandwagon.
From ML to Deep Learning
Deep Learning is a branch of Machine Learning. A certain type of machine learning algorithms.
They became widely popular in recent years because they significantly increased the accuracy of certain tasks.
There are two things we can now achieve due to deep learning:
- Better image classification
- Better accuracy in speech to text
Here’s how Google fares now (taken from KPCB Internet Trends):
We were at around 70% accuracy in 2010, after a gradual rise from 50% over the previous 40 years or so.
This steep rise in accuracy in this decade is attributed to the wide use of machine learning and the amount of data available as training material to the algorithms.
Deep learning is usually explained as neural networks, making it akin to human thinking (at least until the next wave of algorithms, even more akin to human thinking, gets invented).
From Deep Learning to AI
And then there’s artificial intelligence.
Less a specific algorithm and more a target. To replace humans. Or to do what humans can do.
Or my favorite:
AI is a definition of what we can’t do with machines today.
Once we figure that out, we’ll just put AI on the next pedestal so we’ll have a target to conquer.
#3 – Learning or Imitating?
Here’s one that is slightly different. I heard it at a data science event a couple of weeks ago.
Machine Learning is about getting machines to select their own algorithm by presenting them with a set of rules and outcomes:
- You give a machine voice recordings, along with their transcriptions, and let it work out from that input what the transcription of a new voice recording should be
- You give a machine the rules to play a game, and let it play many times (millions?) until it gets better at it, devising its own algorithm and strategy
Artificial Intelligence is about doing something a human can do. Probably with the intent to replace them by automating the specific task. Think about autonomous driving – we’re not changing the roads or the rules of driving, we just want a car to drive itself the way a human would (we actually want the machine to drive better than humans).
- Machine Learning is about letting a machine devise its own algorithm based on data we give it
- Artificial Intelligence is about doing a task the way a human would
This one I saw at a recent event, which got me on this track of ML vs AI in the first place.
Machine Learning is about Predictions, while Artificial Intelligence is about Actions.
You can use machine learning to understand things, to classify them, predict and estimate. But once the time comes to act upon it, we’re in the realm of artificial intelligence.
It also indicates that any AI system needs ML to operate.
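In this framing, the useful split is between the component that predicts and the component that acts on the prediction. Below is a toy sketch of a hypothetical RTC case: forecast available bandwidth, then pick a video resolution. Both functions are illustrative assumptions. The rolling mean stands in for a real trained model, and the thresholds are made up rather than recommended values.

```python
def predict_bandwidth_kbps(history_kbps, window=3):
    # Prediction step: estimate the next value from recent history.
    # A rolling mean stands in for a trained model in this toy example.
    if not history_kbps:
        raise ValueError("need at least one bandwidth sample")
    recent = history_kbps[-window:]
    return sum(recent) / len(recent)

def choose_resolution(predicted_kbps):
    # Action step: decide what to do based on the prediction.
    # Thresholds are illustrative assumptions, not recommendations.
    if predicted_kbps >= 1500:
        return "720p"
    if predicted_kbps >= 600:
        return "360p"
    return "180p"

history = [2000, 1800, 900, 700, 650]  # hypothetical bandwidth samples in kbps
forecast = predict_bandwidth_kbps(history)
print(forecast, choose_resolution(forecast))  # 750.0 360p
```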
I am sure you can poke holes in this framing, but it is useful in many ways.
Why do we care?
While I am not a stickler for such details, words do have meaning. It becomes an issue when everyone everywhere is doing AI but some end up with a Google Duplex while others show a rolling average on a single metric value.
If you are using communications and jumpstarting an AI initiative, then be sure to check out our upcoming report: AI in RTC.
The post ML vs AI: What’s the difference between machine learning and artificial intelligence? appeared first on BlogGeek.me.
An interview with Alan Masarek, CEO of Vonage.
Doing these video interviews is fun, so when the opportunity arose to be at the Vonage headquarters in Holmdel, New Jersey, it made sense to ask for a video interview with Alan Masarek, the CEO of Vonage.
In this interview, I wanted to get Alan’s viewpoint about the space he is operating in, especially now, some two years after the acquisition of Nexmo. It is quite common to find UCaaS vendors heading towards the contact center. Many will even add APIs on top. Vonage is the only one that decided to acquire a dominant CPaaS vendor (Nexmo).
As usual, you’ll find the transcript right below the video.
I enjoyed the interview and the hospitality. I’d like to thank Alan and the team at Vonage for setting this one up.
Transcript
Tsahi: Hi. So I have got here today, Alan Masarek, CEO of Vonage at the Holmdel, Vonage Technology Center.
Alan: That’s correct. We’re thrilled to be here at our Vonage Technology Center. It’s a pleasure to be with you, Tsahi. Thank you.
Tsahi: Thank you for having me here. I have a question before we start and this really bugged me a bit during the time that I’ve learnt about you and about the company: You came from Google to Vonage.
Alan: Well, first of all, if that’s the only thing that’s bugged you, that would be exceptional. But in all seriousness, what excited me when I was presented this opportunity when I was at Google … And I’d gotten to Google from selling my earlier company to them back in 2012. So I was a director in the Chrome and apps group and I was very involved in the whole rollout of what is now today, G Suite. We used to call it Google for Enterprise.
What intrigued me about coming here was the opportunity to take this almost iconic consumer brand company that built this amazing level of awareness around providing residential phone service and how you could take the brand and the network asset as well as the cash flow from consumer candidly, and use that to pivot into business. I always look at markets the same way. You sort of sit back and you say, “Is that market worth winning and do you have the assets to give you an ability to win it?”
So when you look at the broader business communications market, it’s a massive TAM growing very quickly. And then even when you look at the competitive set, I found the big companies in this set were pretty unfocused. Most of the competitors were smaller companies, had less brand awareness, less sort of national scope, less profitability. So you have this huge TAM, a surmountable competitive set, then you have these assets from consumer that we felt we could bring to bear to win and that’s exactly what we’ve been executing on, that’s what we saw when I was at Google, that’s what I came here to do.
Tsahi: So you’re actually staying in this area between consumer and enterprise. You did that at Google with acquisition and now here at Vonage, moving from consumer to businesses.
Alan: That’s correct. So the company that I sold to Google focused really in the prosumer and enterprise segment. So we were a productivity solution that individuals would use and corporations would use. Here, we obviously have moved very specifically from our roots in consumer, in residential, focused in business. When we began that pivot, we started with small companies because that’s where the action was and the move to cloud, but now we’ve moved very purposefully upmarket to larger and larger corporate customers.
Last year, we signed what I think is the largest deal ever done in cloud communications with the largest residential real estate company in the United States. 21,000 corporate seats moving from prem to cloud and another 125,000 franchise seats.
Tsahi: Interesting. And what gets you up in the morning?
Alan: Well, this morning at 5 o’clock, my alarm clock but … What I’m excited about and I’ve continued … The reason I came here to begin with is I want to build a remarkable company here. It’s not just the transformation from moving from a residential-focused company to a business-focused company. We’re clearly executing on all those elements, whether it’s the technology platform itself, sales execution, the post-sales experience we provide our customers, all those things that we’re doing. But as important and in some respects if not more important, it’s the cultural transformation as well.
What I find really sort of stimulating is to create that switched-on Silicon Valley mindset culture. I like to think that we’re a billion dollar startup; that’s how we talk about it. Last year, we finally crossed the billion dollar in revenue threshold. But I want to have the agility, the speed, the openness, the transparency, the honesty, all that, in order for Vonage to be … The way I describe it is I want Vonage to be that destination place to work the way Google was, and everybody celebrates when they get a job at Google. I want them to feel the same way getting a job here.
Tsahi: Okay. And you’re a cloud communication company at the end of the day, and cloud communication has gotten a lot of attention in the last few years, especially this last year. How come most of the businesses today are still on-premise when it comes to their communication needs?
Alan: On the communication side, the move to cloud has happened more slowly than CRM and ERP and HRM software, things like that. I think because the nature of dial tone has been about as reliable as the sun coming up tomorrow and there’s a great degree of risk that’s associated with it. Companies sit back and they say, “My goodness. It works. I don’t necessarily want to change it.” Now, the reality is when you move from the traditional prem-based solutions and the old PSTN network and such to IP-based, cloud-based solutions, you have infinite scalability, much, much more functionality, the whole notion of unified communications and communications platform as a service all stems from that. But I just think there’s been a fear factor that has caused it to migrate to the cloud more slowly than some of these other verticals.
But you see this amazing tipping point as recently as five years ago, only small companies for the most part were moving to the cloud. Now it has moved all the way up to major enterprises. And there are just example after example of other huge companies, global multinationals moving to cloud. It’s sort of no longer in dispute that cloud will supplant prem. It’s just like anything takes time.
Tsahi: What triggers them to do that shift, that migration from on-prem to cloud?
Alan: There are several trigger points. A couple of them are the comfort of moving to cloud. The cloud was scary just a few years ago and so it was to be avoided by bigger companies. But beyond that, it’s the productivity that they can get. Every company out there is going through their own digital transformation of one form or the other. Everybody is looking over their shoulder, scared to death that a more digitally transformed competitor has a bullseye on their back and is coming after their business. Obviously, we can always cite the example of physical retail stores versus Amazon eCommerce. Everyone has to go through that digital transformation, and I think what’s happened is that up until very recently, communications has been sort of the underappreciated element of digital transformation.
I always have this sort of visual metaphor in my mind that you can picture somebody on the old black rotary dial phone talking to a colleague saying, “We got to get that eCommerce site up.” Not realizing that the problem itself or a major piece of the problem itself is their communications infrastructure, how people work differently with one another, how they collaborate, et cetera, et cetera. All those elements of what we’re providing with these cloud communications solutions are fueling their digital transformations. I think that’s now being seen. Folks are more aware of that all the time and that’s why you’re seeing kind of everything change and move to cloud so quickly.
Tsahi: When you look at the communication market, for me, it’s like a Venn diagram with different parts of it. There’s unified communications, then contact centers and, recently, we see APIs, these CPaaS communication platform as a service. When I look at what competitors do in this space, your competitors in unified communications, they end up going and doing something or adding stuff in the contact center. And then when they look at the APIs, they usually go and say, “Well, we just put an API”; obviously they do, because in 2018 everybody uses an API on top of what they do. But you did something differently. You went and acquired a company called Nexmo, and then their APIs, you haven’t even touched them in a way, and you left that to be a separate part of the business or a business all its own, with and without relationship to what you’re doing in unified communications.
Alan: The reason that we bought Nexmo is we have a view of what business communications is and will be that’s different than most. Most in the example have hosted PBX which has really been the principal use case of UCaaS or hosted contact center which has been the principal use case of CCaaS. In our view, those are just applications. Hosted PBX, moving your prem-based PBX to the cloud, is a big TAM unto itself but it’s not necessarily an industry. The same applies to contact center. It’s not an industry. It’s simply an application or a use case which is really large and really important. But by the same token, the whole now new acronym of CPaaS, Communications Platform as a Service, says, “Well, there are other elements of communications that I want to simply program into my workflow, my mobile app, my business process, my website.” What have you. But they have nothing to do with the contact center or the PBX.
Our view has been that we’re building a communications platform company. The whole notion of it is it’s a microservices architected platform. So we’re taking the Nexmo platform and our own Vonage Business Cloud platform and bringing those together. We refer to that internally as 1V, One Vonage. From that microservices architecture, you’re just going to serve customers in those big use cases. So whether you bundle several hundred of those microservices together in a use case called PBX or in a use case called contact center, or sell them one at a time that just get embedded into something else via the software APIs, it doesn’t matter. It’s the same platform. You’re just feeding where the needs are the greatest.
And the notion of this is that there aren’t different industries, UCaaS, CCaaS, CPaaS. It’s simply communication elements, how they get deployed. The way I like to think about it is I go back to the music industry. When we grew up, there were songs and we could buy them only one way. Packaged, pre-published on an album. Apple came along with the cloud and said, “I’m going to unbundle the model and you can buy a song one at a time.” And then streaming services and subscription services have come along and the ability to mash up your music. They’re just different delivery models of the same song. It’s the way I think about cloud communications. There are communication elements, audio, video, messaging. Whether you package them in big applications like PBX or unbundle them as microservices, which is the CPaaS model, it doesn’t really matter. It’s just where the needs are the greatest.
Because at the end of the day, communication only serves a purpose. Does it make the company more productive? Does it connect my customers in a more personalized way with me as a company? And does it drive better business outcomes for my business? If it doesn’t do that, it doesn’t really matter whether you call it UCaaS or CCaaS or CPaaS. It simply has to drive those better business outcomes and that’s the approach that we’re taking.
Tsahi: Talking about Nexmo, they are now 12, 18 months part of Vonage now.
Alan: Almost two years. June 5th will be two years.
Tsahi: What synergies have you seen since the acquisition, up until today?
Alan: There’s been a great deal of synergies. You mentioned before about the Venn diagrams where much of the industry has developed as if the segments, UCaaS, CCaaS, CPaaS have been separate. We reject that. If they were all Venn diagrams, they all will be separate. Our view is they’re coming together all the time. So increasingly, the purchaser at a company, Acme company, is the line of business manager. The conventional wisdom used to be that if I’m buying UCaaS, I’m the CIO or the head of IT and if I’m buying CCaaS contact center, I’m the help center. And if I’m buying communications platform as a service, I’m an individual developer, perhaps even the CMO. What you’re finding now is it’s coming together as lines of business. Given that trend from a synergy point of view, we’ve organized since the acquisition, completely functionally so that the entire engineering team, Vonage traditional or Nexmo reports up to the same CTO. The product organization up to the same chief product officer. Sales under the same chief revenue officer, same with marketing.
And they’re already doing tremendous amounts of lead sharing within the groups, operational sharing, sales enablement, sales training and things like that. Because what we’re finding is that in the cloud PBX world, your salespeople don’t want to go out there and go to a customer and say, “Buy me because my hunt group or my auto attendant is better than the other guys.” Because that’s very much baseline functionality. What you want to do is go into your customer and have a conversation about better business outcomes. So they’re just naturally carrying Nexmo into the discussion with every prospect out there. You can look at every one of our large company wins. Interestingly, each began with a Nexmo conversation, more than just the feature set of the PBX or the contact center. So you’re seeing very, very natural synergies happen. Now, it’s not a cost synergy issue for us in terms of people. When we bought Nexmo, it was about 175 people. I think it’s above 300 today and as I recall, last time I was in our London office, there were 140 open jobs for Nexmo this calendar year, so we’re growing in a big hurry.
Tsahi: We’ve talked about the cloud, we’ve talked about API. There is another big buzzword these days around communications and that’s “Teams”. The notion of what Slack started in a way. Messaging inside groups, smaller groups which is more ad hoc than the usual grounded structured way of communications. And you see today Microsoft going there, Cisco going there. All the big companies are headed there and then next to you, you got Google and Amazon joining this specific space. How is Vonage preparing towards that future of team collaboration, enterprise messaging, whatever you want to call it?
Alan: So not to sort of disclose all the goodies that are coming but within our roadmap, we have some very, very interesting developments around the collaboration and work stream messaging space that will be coming out later this year. And that’s tightly integrated as a single app whether you’re mobile, desktop or browser, with the experience in the communications system. Now, it also will integrate well with the major players that you just talked about. Slack, Stride, Teams, et cetera. Or it’s going to be WebEx, et cetera. Because it has to.
In our view, we can’t play king maker and say, “Oh. Mr. Customer, Mrs. Customer, you cannot use these other collaboration tools.” That’s ultimately going to be the decision of the customer. So we have to have our own solution that is built-in in a fully integrated way, but then the ability to integrate with the others, and that’s the approach that we’re taking.
Tsahi: Can I ask a question that just occurred to me?
Tsahi: What about contact centers?
Alan: I think contact center is incredibly important as part of the integrated solution. And so today, we have a contact center built into Vonage Business Cloud which is our own proprietary call processing stack. And for our Vonage Enterprise Solution, we use BroadWorks contact center functionality. Then, in those situations where they need an advanced contact center solution, then we are a reseller of inContact. But again, it’s integrated fully in with our solution, so it appears like it’s a single experience. And then we serve it as if it’s a single experience so the contract is on our paper, the support is ours, things like that.
Contact center though becomes very, very important in the CPaaS market because so much of how communications get embedded in through some software API into that website, that mobile app, business process, what have you, is about customer experience. And so think of it as task routing. Somebody is on my website and they’re looking at my product and they have a question. Today, they may pick up the phone and call and have to start over because there was no context to what they were doing on the website, and these CPaaS type tools are all about the contextual. The software identifies the context to what I was doing.
So if I was on the Delta Airlines site trying to book a flight, and I was 10 minutes into booking the itinerary and all of a sudden it had a problem, in the past I’d pick up the phone and just call and have to start over, because no one had any idea of the itinerary I was just trying to book. These new contextual tools that you can embed understand the itinerary, so the call routes through the appropriate IVR into the contact center. So think of it as a task, an intelligent task. It knows I was trying to book a flight from Tokyo to Shanghai next Thursday and it will route me through the appropriate IVR to the person on the help desk for the international Asia markets.
And so you can envision from a customer personalization or a customer intimacy, rather than me having to start over which is what happens today, which is very frustrating to all of us. You can imagine the agent picking the phone up and saying, “Hi, Mr. Masarek. I see you’re trying to book a flight next Thursday from Tokyo to Shanghai. How can I help?” That’s a direct connection between the customer experience, routing the task into the contact center. We think that’s very important.
Tsahi: Let’s look a little bit into the future.
Tsahi: What do you think is the biggest challenge for the modern businesses moving forward from now on? When it comes to communications of course.
Alan: I’m not sure it’s a challenge. I don’t want to sort of split words between challenge and opportunity, but I actually think communications is going to fundamentally change by virtue of we’re no longer tethered to a physical device. We think about communications, I’m on a call, either a landline or a desk. In our vision for it, communications is in everything. So whether it’s a click-to-call or click-to-communicate functionality in the website or … Pick whatever app you want. You’re on Salesforce, I’m on an Excel spreadsheet, someone else is in G Suite or in Gmail, or in Google Sheets. Doesn’t matter. There will be click-to-communicate functionality everywhere and naturally, these microservices that are going to be created increasingly by these CPaaS type solutions. So you’re going to have I think this explosion in communications the way I think about it because you’re no longer tethered to anything physical. You’re in an app or a website or what have you.
And the way I think about it is your decision of how you communicate is simply going to be a function of the limitations of the physical device that you got onto the internet with. So for instance, if the device doesn’t have a camera, you’re not going to do video. If it doesn’t have a speaker and microphone, you’re only going to do messaging; that’s all you can do. But the mode, video, audio or messaging, is going to be a function of the limitations of the device and your personal preference, also kind of situational. If you just stepped out of the shower, you’re likely not going to do video. So the point is, regardless of how you’re interacting in some sort of app or website, you’re going to have communication everywhere. So I think the notion of the challenge to companies is less the challenge and more that I think it’s going to change the way we work, because the notion of how we collaborate, how we share, the tightness of the communication, sort of that feedback loop, is going to get tighter, and tighter, and tighter. That’s the way I think about it.
I actually think about communication, this renaissance or this explosion in communication, a little bit like the internet 10 years ago. 10 years ago, there was no video flying around the internet. It was kind of more flat files and such. There wasn’t full-motion video. There certainly wasn’t virtual reality and things like that, and self-driving cars and all this stuff that is just massive quantities of data going around the internet. When that began, look what happened with all the content delivery networks. They just kind of went like this in terms of the volume of capacity they have on the internet. I think communications is going to go through a similar renaissance or explosion, because if communications are everywhere, not just on specific devices, you’re going to be communicating all the time, and so I think you’re going to see this massive uplift in it. If there’s a challenge out there, it’s going to create sort of communication overload, perhaps, but maybe smarter people than us will figure out how to make it simpler.
Tsahi: And moving forward, would businesses end up building their communication needs on top of APIs, go pick a UCaaS or a communication solution to do that for them or go for even a very specific niche SaaS product to get what they need?
Alan: I think that increasingly, communications will be built on top of the platform, the PaaS product, not by going and buying some monolithic application. Like you said earlier, everybody’s got APIs. The old way we used to write software, we’d write a big monolithic solution from the UI, the user interface, all the way down to the metal, called PBX in our example. I can open up APIs to the PBX, but it’s not programmable. It’s simply an API into that monolithic solution. Where we sit today is a microservices architecture where it’s fully programmable.
And I think what you’ll see, and this is exactly the strategy we’re building to, is whether you want to use that big chunk of microservices in a particular use case as a big application like PBX or a big application like contact center, it’s just a function of what’s the best way to deliver it to a customer. Do I think people are going to build their own PBX all the time? No. Because to me it’s analogous to how the vast majority of people don’t build their own computer. You certainly could. You could be a hobbyist and build your own PC and buy the motherboard and the chassis and the whole bit, but very few people do that when you can go out and buy a computer for $400. So I think the PBX distribution model, where it’s something you’re going to subscribe to, a SaaS solution, will persist, but I think the microservices are really going to take over where communications get woven into everything else.
Tsahi: Vonage in 5 to 10 years from now, where do you see the company itself? What are you going to sell to businesses, to consumers? What kind of services are going to be there?
Alan: Vonage in the next five years will be an extraordinarily different company than it is today. Let me go backwards first. Four years ago, we were 100% consumer. Now, this year in 2018, roughly 60% of the revenue is business. Business is growing really quickly: as of last quarter, 22% growth organically, nothing to do with acquisitions. And consumer has been declining, by roughly 12%, as residential home phone usage is in decline. Now that business is the larger of the two segments and growing at twice the rate that consumer’s declining, you can imagine the lines separating in a very big hurry. So the whole focus of the organization is on business. It already is. Consumer is still a meaningful piece, it’s 40%, but it’s getting smaller all the time as a percentage of the total.
What’s interesting, from a how-we’re-going-to-serve-customers standpoint, is that it’s precisely the way we do it today. Our whole approach from a platform perspective, the way I described it where irrespective of whether it’s UCaaS, CCaaS or CPaaS, coming out of a common platform, we will continue to execute on that. What’s interesting where I think a value unlock happens for the company is you’re now going to have … We’re already having consolidated revenue growth.
Last year, we did just above a billion dollars in revenue. This year, Wall Street has us close to a billion fifty. Again, as the smaller piece, consumer, gets smaller and smaller, its mitigating impact on overall growth declines. Therefore, we’re more and more of a consolidated growth company. Again, unrelated to any acquisitions, just purely organically. The notion then of, “Oh my goodness. You’re in the midst of a transformation” goes away because you’ve now transformed.
So where I can see us in pretty short order is serving our customers in this differentiated way, which I think will withstand the test of time and will withstand competitive entrants because, at the end of the day, we’re just rooted in how we provide better business outcomes for our customers. But now you’re going to have this increasingly fast growing consolidated company, well greater than a billion dollars in revenue, still highly profitable, and I think that’s going to be a value unlock for the story. When I go back to many transformational stories, in the early days there’s a lot of investor skepticism about transformational stories because most of them don’t work. This one’s worked, and that’s why we’ve had sort of an almost quadrupling of our stock price over the last four years.
Alan: All right.
Tsahi: Thanks for your time, Alan.
Alan: My pleasure. Thanks so much. I enjoyed it.
Tsahi: Me too.
Alan: Sure. Thank you.
Tsahi: Thank you.
The post UCaaS, CCaaS & CPaaS: An interview with Alan Masarek, Vonage CEO appeared first on BlogGeek.me.
ML in RTC can fit anywhere – from low level optimization to the higher application layers.
TL;DR – I am working with Chad Hart on a new ML in RTC report. If you are interested in it, scroll down to the end of this article.
Machine Learning (ML), Artificial Intelligence (AI), Big Data Analytics. Call it what you will. You’ll find it everywhere. Autonomous cars, ecommerce websites, healthcare – the list goes on. In recent years we’ve seen a flourish in this domain due to the increase in memory and processing power, but also due to some interesting breakthroughs in machine learning algorithms – breakthroughs that have rapidly increased the accuracy of what a machine can now do.
My ML Origin Story
I’ve been looking at and dealing with machine learning for many years now. Never directly calling it that, but always in the vicinity of the communications industry.
It probably started in university. I decided to do an M.Sc because I was somewhat bored at work. I took a course in computational linguistics, which ended with me doing research in backward transliteration, looking at phonemic similarities between English and Spanish (#truestory). That was in 2005, and we used a variant of dynamic programming and the Viterbi algorithm. That and other topics such as hidden Markov models were part and parcel of my work at the time.
Later on, I researched the domain of Big Data and Analytics at Amdocs. I was part of a larger group trying to understand what these mean in telecommunications. Since then, that effort grew into a full business group within Amdocs (as well as the acquisition of Pontis, well after I left Amdocs for independent consulting).
Which is why when I talked to Chad Hart about what we can do together, we came to an agreement that something around ML and AI made a lot of sense for both of us, and taking it through the prism of RTC (real time communications), placed it in the comfort zone of both of us.
During that period, we thought a lot about what domains we wished to cover and what ML in RTC really means.
Categorizing ML in RTC
Communications is a broad enough topic, even when limited to the type that involves humans. So we limited even further to real time communications – RTC. And while at it, threw text out the window (or at the very least decided that it must include voice and video).
Why do that? So we don’t have to deal with the chatbots craze. That’s too broad of a topic on its own, and we figured there should be quite a few reports there already – and a few snake oil sellers as well. Not our cup of tea.
This still left the interesting question – what exactly can you do with AI and ML in RTC?
We set out to look at the various vendors out there and understand what they are doing when it comes to ML in RTC.
Our decision was to model it around 4 domains: Speech Analytics, Computer Vision, Voice Bots / Assistants and RTC quality / cost optimization.
1. Speech Analytics
Speech Analytics deals a lot with Natural Language Processing (NLP) and Natural Language Understanding (NLU).
Each has a ton of different use cases and algorithms to it.
Think of a contact center and what you can do there with speech analytics:
- Employ speech-to-text for transcription of the sessions
- Go further with sentiment analysis by analyzing voice cues and not only the transcribed text
- Extract meaning from the transcription and derive actionable insights based on that meaning
You will find a lot of speech analytics related RTC ML taking place in contact centers. A bit less of it in unified communications, though that might be changing if you factor in Dialpad’s acquisition of TalkIQ.
2. Computer Vision
Computer Vision deals a lot with object classification and face detection, with all the derivative use cases you can bring to bear from it.
“Simple” things like face recognition or emotion recognition can be used in real time communications for a multitude of communication applications. Object detection and classification can be used in augmented reality scenarios, where you want to mark and emphasize certain elements in the scene.
Compared to speech analytics, computer vision is still nascent, though moving rapidly forward. You’ll find a growing number of startups in this domain as well as the cloud platform giants.
3. Voice Bots & Assistants
To me, voice bots and assistants are the tier that comes right above speech analytics.
Speech analytics gets you NLP and NLU: the ability to convert speech to text and from there move to intent. Voice bots are about conversations – moving from a single request to a fluid interaction. The best example? Probably the Google Duplex demo – the future of what conversational AI may feel like.
Voice bots and assistants are rather new to the scene and they bring with them another challenge – do you build them as a closed application or do you latch on to the new voice bot ecosystems that have been rapidly making headway? How do you factor in the likes of Amazon Alexa, Google Home, Google Assistant, Siri and Cortana into your planning? Are they going to be the interaction points of your customers? Does building your own independent voice bot even make sense?
Whatever the answers are, I am pretty sure there’s a voice bot in the future of your communications application. Maybe not in 2018, but down the road this is something you’ll need to plan for.
4. RTC Quality & Cost Optimizations
While the previous 3 machine learning domain areas revolve around new use cases, scenarios and applications through enabling technologies, this one is all about optimization.
There are many areas in real time communication that are built around heuristics or simple rule engines. To give an example, when we compress and decompress media we do so using a codec. The encoding process (=compression) is lossy in nature. We don’t keep all the data from the original media, but rather throw away stuff we assume won’t be noticed anyway (sounds outside the human hearing range, small changes in color tones, etc) and then we compress the data.
The codecs we use for that purpose are defined by the decoder – by what you do when you receive a compressed bitstream. No one defines what an encoder needs to look like or how it should behave. That is left to developers to decide, and encoders differ in many ways. They can’t brute-force their way to the best possible media quality, especially not in real time – there’s not enough time to do that. So they end up being built around guesswork and heuristics.
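To make "heuristics" a bit more concrete, here is a minimal sketch of the kind of hand-tuned rule an encoder might use. It assumes a toy encoder whose only knob is a quantization step: larger steps throw away more detail (the lossy part) and produce fewer bits. The 1.1 / 0.7 thresholds and the doubling/halving rule are illustrative assumptions, not taken from any real codec. They are exactly the sort of hand-picked constants that machine learning could replace.

```python
from typing import List


def quantize(samples: List[int], step: int) -> List[int]:
    """Lossy step: snap each sample to the nearest multiple of `step`.

    Detail finer than `step` is thrown away and cannot be recovered.
    """
    return [round(s / step) * step for s in samples]


def next_step(step: int, frame_bits: int, target_bits: int,
              min_step: int = 1, max_step: int = 64) -> int:
    """Hypothetical rule-of-thumb rate control for a toy encoder.

    The 1.1 / 0.7 thresholds and the doubling/halving are guesses,
    which is the point: real encoders are full of constants like these.
    """
    if frame_bits > target_bits * 1.1:
        # Overshot the bit budget: quantize more coarsely next frame.
        return min(max_step, step * 2)
    if frame_bits < target_bits * 0.7:
        # Well under budget: spend the spare bits on finer detail.
        return max(min_step, step // 2)
    return step  # Close enough to the target: keep the current setting.


if __name__ == "__main__":
    print(quantize([0, 3, 5, 13], 4))  # [0, 4, 4, 12]
    print(next_step(8, frame_bits=1200, target_bits=1000))  # 16
```

A learned model would take the place of `next_step`, predicting the setting that gives the best perceived quality for the content at hand instead of reacting to a fixed threshold.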
Can we improve this with machine learning? Definitely.
Can we improve network routing, bandwidth estimation, echo cancellation and the myriad of other algorithms necessary in real time communications using machine learning? Sure we can.
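Bandwidth estimation is a good example of such a rule engine. The sketch below is modeled on the loss-based controller described in the Google Congestion Control draft for WebRTC (draft-ietf-rmcat-gcc): cut the estimate when packet loss goes above 10%, probe upward by 5% when it is below 2%, and hold otherwise. The function names and the loss sequence in the usage example are my own illustration, not code from libwebrtc. The fixed thresholds are the part a learned model could adapt to the actual network.

```python
from typing import Iterable, List


def loss_based_update(prev_bps: float, loss_fraction: float) -> float:
    """One step of a loss-based sender bitrate estimate.

    Thresholds follow the loss-based rule in the GCC draft:
    more than 10% loss -> multiplicative decrease, under 2% loss -> +5%.
    """
    if loss_fraction > 0.10:
        return prev_bps * (1 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        return prev_bps * 1.05
    return prev_bps  # Between 2% and 10%: keep the current estimate.


def run(start_bps: float, loss_reports: Iterable[float]) -> List[float]:
    """Apply the rule to a sequence of (illustrative) loss reports."""
    estimates = []
    bps = start_bps
    for loss in loss_reports:
        bps = loss_based_update(bps, loss)
        estimates.append(bps)
    return estimates


if __name__ == "__main__":
    for bps in run(1_000_000, [0.0, 0.0, 0.05, 0.20, 0.01]):
        print(f"{bps / 1000:.0f} kbps")
```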
The result is that you get better media quality and user experience by optimizing under the hood. Not many do it, as the work isn’t as high profile as the other domains. That said, it is necessary.
Interested in ML in RTC?
Here are a few things you can do:
Fill out our survey
This will get factored into the quantitative part of our report. If you fill it out, you will also receive a complimentary e-book we’re writing titled Intro to AI in RTC.
Interested in the report itself? Thinking of purchasing it? Great! We have a special launch discount.
You can find more information about the report itself in our research page.
Doing something interesting in this space? Share your thoughts with us.
Contact us via firstname.lastname@example.org to participate in our study.
The post Where does Machine Learning fit in Real Time Communication (ML in RTC)? appeared first on BlogGeek.me.
What should you be doing about the upcoming WebRTC 1.0 release?
That comic strip above? I think it embodies nicely what comes next.
We’ve started with WebRTC somewhere in 2011 or 2012. Depends who’s counting. So we’re 6 or 7 years in now.
I was promised WebRTC 1.0 in 2015, I think.
Then again in 2016.
In 2017, I was told that WebRTC 1.0 is just around the corner. Definitely going to happen before year end.
Guess what? We’re now almost halfway through 2018. And no WebRTC 1.0. Yet.
But it is coming.
To give you the gist, Google will be ripping out some code, adding new code. Removing APIs. Modifying others. The timeline stated for all this in that posting?
- End of April 2018: “Unified Plan” and the new APIs stabilize
- July 2018: Default SdpSemantics changes to UnifiedPlan
- No earlier than end of year 2018: PlanB semantics removed and UnifiedPlan becomes the only option
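A practical first step in testing for this change is knowing which flavor of SDP your application actually produces and receives. Chrome exposed the switch through a non-standard `sdpSemantics` field in the RTCPeerConnection configuration (`"plan-b"` or `"unified-plan"`). The helper below is a hypothetical heuristic of my own, not part of any WebRTC API. It relies on the main structural difference: Unified Plan puts each track in its own m= section, while Plan B packs several tracks of the same kind into one m= section and tells them apart through `a=ssrc:<ssrc> msid:<stream> <track>` lines. With just one audio and one video track the two can look identical, so it answers "ambiguous" in that case.

```python
from typing import List, Set, Tuple


def sdp_semantics_hint(sdp: str) -> str:
    """Guess whether an SDP blob looks like Plan B or Unified Plan.

    Heuristic only:
    - Unified Plan: one m= section per track, so a media kind with
      several m= sections is a strong hint.
    - Plan B: one m= section per media kind, with several tracks
      signalled via "a=ssrc:<ssrc> msid:<stream> <track>" lines.
    """
    sections: List[Tuple[str, Set[str]]] = []  # (media kind, track ids)
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            kind = line[2:].split(" ", 1)[0]
            sections.append((kind, set()))
        elif sections and line.startswith("a=ssrc:") and " msid:" in line:
            fields = line.split(" msid:", 1)[1].split()
            if len(fields) >= 2:
                sections[-1][1].add(fields[1])  # the track id

    kinds = [kind for kind, _ in sections]
    if any(kinds.count(kind) > 1 for kind in set(kinds)):
        return "unified-plan"
    if any(len(tracks) > 1 for _, tracks in sections):
        return "plan-b"
    return "ambiguous"  # e.g. a single audio track and a single video track


if __name__ == "__main__":
    plan_b = "\r\n".join([
        "v=0",
        "m=video 9 UDP/TLS/RTP/SAVPF 96",
        "a=ssrc:111 msid:stream1 camera",
        "a=ssrc:222 msid:stream1 screen",
    ])
    unified = "\r\n".join([
        "v=0",
        "m=video 9 UDP/TLS/RTP/SAVPF 96",
        "a=msid:stream1 camera",
        "m=video 9 UDP/TLS/RTP/SAVPF 96",
        "a=msid:stream1 screen",
    ])
    print(sdp_semantics_hint(plan_b))   # plan-b
    print(sdp_semantics_hint(unified))  # unified-plan
```

Logging this hint for both local and remote descriptions during test runs is a cheap way to spot when a browser update flips the default under you.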
Change is in the air…
That change is going to affect developers and testers everywhere, and the end result is going to be uncertainties and surprises in the coming months. How many months? Many months.
There’s not much you can do about it besides allocating resources to the problem in the short and mid term future. These resources should be in development and testing.
I touched on development in a previous webinar I did with Philipp Hancke when I launched the next round of my WebRTC training.
Now I want to talk about preparation aspects in the testing domain – what exactly you should be expecting moving forward.
To that end, I am a guest on the upcoming WebRTC Standards webinar series. The webinar takes place later today –
Building an interactive application? There’s more than one WebRTC programming language that can fit your needs.
The last time I wrote about WebRTC programming languages was some two years ago. My focus then was how programming languages fit different WebRTC components. That article is still relevant, so I suggest you read it as well.
This time, I want to focus on something slightly different. In recent months I’ve had the pleasure of watching, as well as consulting for, teams of developers who are using programming languages that don’t always make sense to me when it comes to WebRTC. And yet, once they explained their reasoning, the decision and path they took would be one I couldn’t just dismiss if I were in their shoes.
JavaScript
There are two main places where you see JavaScript in WebRTC apps today: client side and the signaling server.
Clients
Client side is simple enough. JavaScript is how you use WebRTC in browsers, but it can also be seen in cross platform mobile development (though not that much) or when using Electron for WebRTC.
Signaling
For signaling servers, it’s all about Node.js. And yes, the guest article from 2013 (!) is as relevant now as it was then. Twelephone (mentioned in that article) is long dead, and Chris has moved on to IoT and later decentralized networks. But the use of Node.js as a signaling server for WebRTC is going strong. If you take that route, just make sure to pick a popular framework.
Media Servers
I’ve seen Node.js being used in media servers as well:
- mediasoup for example, is an open source SFU that was built as a Node.js module to fit into a larger application
- SwitchRTC, a commercial SFU that got acquired by YouNow, was a combination of C/C++ and Node.js
- appear.in’s SFU (available for their PRO accounts) was built using Node.js
C/C++
I’ll be using C and C++ together here, as I don’t see a huge distinction between the two (and in most cases, those that develop code in C create C++ abstractions, and those developing in C++ end up writing C code anyways).
C/C++ is a kind of lowest common denominator. My guess is that a lot of the languages here have their compilers/interpreters written in C/C++ anyways. It is also a language that is available everywhere, though not always accessible directly.
Clients
Go to webrtc.org and you’ll find that the WebRTC code Google open sourced is written in C++. The easiest thing to do if you want to support WebRTC on an embedded device is probably to start by taking that code and hammering it until it fits your device.
It isn’t the only way to get WebRTC into a device, but it sure is a popular one.
Signaling
Signaling with C++ isn’t common. Where you will find it is when SIP meets WebRTC.
Interestingly, all main open source SIP servers are written in C/C++: Asterisk, OpenSIPS and FreeSWITCH.
I am assuming this is because they are older than other WebRTC signaling implementations, which tend to use higher level languages.
STUN/TURN
NAT traversal servers are written today in C/C++.
Not all of them. But the most popular one is (coturn).
Media Servers
Media servers need to be highly performant, which is why most of them also end up being written in C/C++ – at least the parts that matter for performance.
- Janus, a popular media server is written in C
- SwitchRTC mentioned in relation to Node.js has a C++ component to it handling all media networking
- Kurento’s core is written in C/C++
- CPaaS vendors are using C/C++ in their media servers – at least those whose programming languages I know
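For a sense of what the SFU (Selective Forwarding Unit) logic itself decides, regardless of whether it ends up in C, C++, Java or Node.js, here is a minimal Python sketch of the routing decision. It assumes the sender publishes simulcast (several encodings of the same video at different bitrates) and that the SFU knows each subscriber’s available bandwidth; the class names and the layer ladder are hypothetical. What makes it an SFU is that packets are selected and forwarded, never decoded or mixed, which is also why the per-packet forwarding path is where C/C++ earns its keep.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Subscriber:
    peer_id: str
    available_bps: int  # e.g. taken from that subscriber's bandwidth estimate


@dataclass
class Session:
    # Hypothetical simulcast ladder: layer name -> bitrate in bps.
    layers: Dict[str, int]
    subscribers: List[Subscriber] = field(default_factory=list)

    def pick_layer(self, available_bps: int) -> str:
        """Highest layer that fits the subscriber's bandwidth, else the lowest."""
        ordered = sorted(self.layers.items(), key=lambda kv: kv[1])
        chosen = ordered[0][0]
        for name, bps in ordered:
            if bps <= available_bps:
                chosen = name
        return chosen

    def route(self, sender_id: str) -> List[Tuple[str, str]]:
        """Which layer of `sender_id`'s video each other participant receives.

        Packets of the chosen layer are forwarded as-is; nothing is decoded.
        """
        return [
            (sub.peer_id, self.pick_layer(sub.available_bps))
            for sub in self.subscribers
            if sub.peer_id != sender_id
        ]


if __name__ == "__main__":
    session = Session(
        layers={"low": 150_000, "mid": 500_000, "high": 1_500_000},
        subscribers=[Subscriber("alice", 2_000_000),
                     Subscriber("bob", 600_000),
                     Subscriber("carol", 100_000)],
    )
    print(session.route("alice"))  # [('bob', 'mid'), ('carol', 'low')]
```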
Java
Java is the most popular language based on the TIOBE index.
I don’t like Java that much, probably due to its verbosity and all the pain that its garbage collection causes to real time apps. But I don’t get to decide what others use.
Java is probably also one of the most popular programming languages in WebRTC backend development, but not only there.
Clients
Android requires Java, which means Android native development with WebRTC also requires Java coding.
Besides this obvious use, there’s also the option of writing WebRTC clients in Java from scratch.
While you won’t find any open source Java client for WebRTC, I know of two separate WebRTC client implementations that use Java.
Signaling
There are many signaling servers out there that end up using Java when it comes to WebRTC. You’ll find it in enterprise software, but not only there.
A few of them are also open sourced, though there isn’t a specific one that is widely popular or highly recommended as far as I can tell.
Media Servers
Several of the media servers out there use Java. Either for everything or for the higher level abstractions.
Here are two of them:
- Jitsi is written in Java
- OpenVidu, an SFU implemented on top of Kurento by the Kurento maintainers is written in Java
Swift
If Java is needed for development of WebRTC in Android, then Swift is what you need for iOS.
Unless you’re fine with using Objective-C and not adopting Swift.
Other than iOS, I haven’t seen Swift used anywhere else when it comes to WebRTC implementations (or otherwise for that matter).
Python
As a higher level language, Python gets high marks. I was introduced to it about a decade ago and have loved it since, though I can’t say I used it too much myself.
When it comes to WebRTC, Python has its role in signaling servers, as do many other higher level languages.
The most notable Python project related to WebRTC is [matrix]. Its open source Synapse server implementation is written in Python.
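Whatever the language, the core of a WebRTC signaling server is small: WebRTC doesn’t define signaling, so the server’s job is mostly to pass offers, answers and ICE candidates between the peers of a session. Here is a minimal sketch of that routing logic in Python. The class name, the JSON message shape (`type`, optional `to`, plus whatever payload such as `sdp`) and the room model are assumptions for illustration; the transport (WebSocket or otherwise) is left out.

```python
import json
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class SignalingRooms:
    """Hypothetical in-memory room registry for WebRTC signaling.

    It only decides who receives each offer/answer/ICE candidate;
    delivering the payloads is up to whatever transport you use.
    """

    ALLOWED = {"offer", "answer", "candidate"}

    def __init__(self) -> None:
        self.rooms: DefaultDict[str, Set[str]] = defaultdict(set)

    def join(self, room: str, peer: str) -> None:
        self.rooms[room].add(peer)

    def leave(self, room: str, peer: str) -> None:
        self.rooms[room].discard(peer)
        if not self.rooms[room]:
            del self.rooms[room]  # drop empty rooms

    def relay(self, room: str, sender: str, raw: str) -> List[Tuple[str, str]]:
        """Return (recipient, payload) pairs for one incoming message."""
        msg = json.loads(raw)
        if not isinstance(msg, dict) or msg.get("type") not in self.ALLOWED:
            raise ValueError("unexpected signaling message")
        if sender not in self.rooms.get(room, set()):
            raise PermissionError(f"{sender} is not in room {room}")
        # Stamp the sender last so a client cannot spoof "from".
        payload = json.dumps({**msg, "from": sender})
        recipients = sorted(p for p in self.rooms[room] if p != sender)
        target = msg.get("to")
        if target is not None:  # direct message, e.g. an answer
            recipients = [p for p in recipients if p == target]
        return [(p, payload) for p in recipients]
```

Keeping the routing logic separate from the transport like this makes it easy to unit test, and the same shape carries over if you later move the service to Node.js or anything else.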
There are others using Python for their signaling. My guess is that this is just because of familiarity with the language.
Ruby
You can’t mention Python without mentioning Ruby.
I am guessing that Ruby can fit in signaling servers just as well as Python. The only difference is that I know of no one who is doing that.
C#
If someone is making use of a Windows based development stack, he is more likely to use C# than anything else.
In such cases, you will see the use of C# for all aspects of WebRTC – from native WebRTC client implementations, through signaling, to NAT traversal and media servers.
I am not a big fan of backend development on Windows, but if you are one of those who need it, then know you are not alone.
PHP
Having such huge market share means more haters, especially when the language itself isn’t the most modern one out there.
What surprised me (it shouldn’t, but it did), was that there are companies who use PHP for their signaling server when it comes to WebRTC. I would never have thought that would be the case, but it is.
If you want to use PHP for your signaling server, then go for it. Just make sure you understand the limitations and implications of it.
Erlang
We’re getting into the more exotic alternatives.
Erlang is such a programming language to me. Created by Ericsson and open sourced ages ago, Erlang offers some interesting capabilities (go read Wikipedia).
There are a few projects out there that make use of Erlang for signaling and one for NAT traversal. I am not aware of anyone using Erlang in a production service, though I am sure there are.
Elixir
Elixir is built on top of Erlang (or at least on its virtual machine).
The only one I know of that makes use of it for WebRTC is Slack.
In last year’s Kranky Geek event, Slack shared their plans of migrating from their Janus implementation to an in-house developed Elixir media server. You can watch that here:
Go
Go is a programming language created by Google. It is somewhere between C and C++ as far as I can tell (never been an expert in Go).
This one came to my attention with the recent implementation of STUN/TURN in Go – Pion TURN.
Not sure how popular Go is with WebRTC developers.
Where do these languages fit?
I tried taking the information above and placing it in an easy to use table, to give a quick summary of the state of WebRTC programming languages.
- Green indicates a popular choice
- Orange indicates an alternative that I’ve seen being used in the wild (or in production)
- Gray is something I haven’t seen used at all
I am sure I missed a language or two. And I am also sure some people are using WebRTC programming languages differently than I’ve described here. Feel free to share in the comments for this article or by emailing me about it – I’d love to learn more.
What programming language should you use?
That’s a different question. I’d say it depends on several factors:
- What is it you are implementing and for which devices?
- What do your current developers know and are comfortable with?
- What is your operational envelope for the service?
- Are there any popular open source or commercial products for WebRTC in that language?
- How easy will it be to find experienced developers for that language? How about developers who know both this language and WebRTC?
Google in 2018 is all about AI. But not only…
In November 2015, Google released TensorFlow, an open source machine learning framework. While we’ve had machine learning before that – at Google and elsewhere, this probably marks the date when machine learning, and by extension AI, got its current growth spurt.
Some time, between that day and the recent Google I/O event, Sundar Pichai, CEO of Google, probably brought his management team, knocked on the table and told them: “We are now an AI company. I don’t care what it is that you are doing, come back next week and make sure you show me a roadmap of your product that has AI in it.”
I don’t know if that meeting happened in such a form or another, but I’d bet that’s what has been going on at Google for over a year now, culminating at Google I/O 2018.
After the obligatory icebreaker about the burger emoji crisis, Pichai immediately went to the heart of the keynote – AI.
Google announced AI at last year’s Google I/O event, and it was time to show what came out of it a year later. Throughout the 106-minute keynote, AI was mentioned time and time again.
That said, there was more to that Google I/O 2018 keynote than just AI.
Google touched on 3 main themes in its keynote:
- Fake news
I’d like to expand on each of these, as well as discuss parts of Smart Displays, Android P and Google Maps pieces of the keynote.
I’ll try in each section to highlight my own understanding and insights.
Before we begin
Many of the features announced are not released yet. Most of them will be available only closer to the end of the year.
Google’s goal was to show its AI power versus its competition more than anything else they wanted to share in this I/O event.
This is telling in a few ways:
- Google weren’t ready with real product announcements for I/O that were interesting enough to fill 100 minutes of content. Or more accurately, they were more interested in showing off the upcoming AI stuff NOW rather than waiting for next year or releasing it later
- Google either knows its competitors are aware of all the progress it is making, or doesn’t care if they know in advance. They are comfortable enough in their dominance in AI to announce work-in-progress as they feel the technology gap is wide enough
When it comes to AI, Google is most probably the undisputed king today. Runners up include Amazon, Microsoft, IBM, Apple and Facebook (probably in that order, though I am not sure about that part).
If I try to put into a diagram the shift that is happening in the industry, it is probably this one:
Not many companies can claim AI. I’ll be using ML (Machine Learning) and AI (Artificial Intelligence) interchangeably throughout the rest of this article. I leave it to you to decide which of the two I mean.
AI was featured in 5 different ways during the keynote:
- Feature enhancer
- Google Assistant (=voice/speech)
- Google Lens (=vision)
In each and every single thing that Google does today, there’s attention to how AI can improve that thing that needs doing. During the keynote, AI related features in Gmail, Google Photos and Android were announced.
It started off with four warm-up feel-good type use cases that weren’t exactly product announcements, but were setting the stage on how positive this AI theme is:
- Diagnosing diseases by analyzing human retina images in healthcare
- Predicting probability of rehospitalization of a patient in the next 24 hours
- Producing speaker based transcription by “watching” a video’s content
- Predictive morse typing for accessibility
From here on, most sections of the keynote had an AI theme to them.
Moving forward, product managers should think hard and long about what AI related capabilities and requirements they need to add to the features of their products.
What are you adding to your product that is making it SMARTER?
Google Assistant (=voice and speech)
Google Assistant took center stage at I/O 2018. This is how Google shines and differentiates itself from its main 3 competitors: Apple, Amazon and Facebook.
In March, Forbes broke some interesting news: at the time, Amazon was hiring more developers for Alexa than Google was hiring altogether. Alexa is Amazon’s successful voice assistant. And while Google didn’t talk about Google Home, Alexa’s main competitor, at all, it did emphasize its technology differentiation. This emphasis at I/O was important not only for Google’s customers but also for its potential future workforce. AI developers are super hard to come by these days. Expertise is scarce and competition between companies on talent is fierce. Google needs to make itself attractive for such developers, and showing it is ahead of competition helps greatly here.
Google Assistant got some major upgrades this time around:
- WaveNet. Google now offers an improved text to speech engine that makes its speech generator feel more natural. This means:
- Getting new “voices” now requires fewer samples of a person speaking
- This allowed Google to introduce 6 new voices to its Assistant (with lower effort and cost)
- To make a point of it, they started working with John Legend to get his voice to Assistant – his time is more expensive, and his voice “brand” is important to him, so letting Google use it shows his endorsement of Google’s text-to-speech technology
- This is the first step towards the ability to mimic the user’s own voice. More on that later, when I get to Google Duplex
- Additional languages and countries. Google promised support for 30 languages and 80 countries for Assistant by year end
- Naturally Conversational. Google’s speech to text engine now understands subtleties in conversations based not only on what is said but also how it is said, taking into account pitch, pace and pauses when people speak to it
- Continued conversation. “Hey Google”: I don’t need to say these wake words anymore when engaging in a back and forth conversation with you. And you maintain context between the questions I ask
- Multiple actions. You can now ask the assistant to do multiple things at once. The assistant will now parse them properly
Besides these additions, each of which can be seen as a huge step forward in its own right, Google came out with a demo of Google Duplex, something that is best explained with an audio recording straight from the keynote:
If you haven’t watched anything from the keynote, be sure to watch this short 4-minute video clip.
There are a few things here that are interesting:
- This isn’t a general purpose “chatbot”/AI. It won’t pass a Turing test. It won’t do anything but handle appointments
- And yet. It is better than anything we’ve seen before in doing this specific task
- It does that so naturally, that people can’t distinguish it from a real person, at least not easily
- It is also only a demo. There’s no release date to it. It stays in the domain of “we’ve got the best AI and we’re so sure of it that we don’t care of telling our competitors about it”
- People were interested in the ethical parts of it, which caused Google to backtrack somewhat later and indicate Duplex will announce itself as such at the beginning of an interaction
- Since we’re still in the concept stage, I don’t see the problem
- I wouldn’t say Google were unethical – their main plan on this one was to: 1. Show supremacy; 2. Get feedback
- Now they got feedback and are acting based on it
- Duplex takes WaveNet to the next level, adding vocal cues to make the chatbot sound more natural when in a conversation. The result is uncanny, as you can tell from the laughs of the crowd at I/O
- Duplex is a reversal of the contact center paradigm
- Contact center software, chatbots, ML and AI are all designed to help a business talk better with its customers. Usually through context and automation
- Duplex is all about helping a person talk better to businesses. First use case is scheduling, but if it succeeds, it won’t be limited to that
- What’s there to stop Google from reversing it back and putting this at the hands of the small businesses, allowing them to field calls of customers more efficiently?
- And what happens once you put Duplex in both ends of the call? An AI assistant for a user trying to schedule an appointment with an AI assistant of a business
- When this thing goes to market, Google will have access to many more calls, which will end up improving their own services:
- An improvement to the accuracy and scenarios Duplex will be relevant for
- Ability to dynamically modify information based on the content of these calls (it showed an example of how it does that for opening hours on Google Maps during the keynote)
- Can Google sell back a service to businesses for insights about their contact centers based on people’s requests and the answers they get? Maybe even offer a unique workforce optimization tool that no one else can
- I’d LOVE to see cases where Duplex botches these calls in Google’s field trials. Should be hilarious
You might also want to read what Chad Hart has written about Duplex.
For me, Duplex and Assistant are paving the way to where we are headed with voice assistants, chatbots and AI. Siri, Cortana and Lex seem like laggards here. It will be interesting to see how they respond to these advancements.
Current advancements in speech recognition and understanding make it easier than ever to adopt these capabilities into your own products. If you plan on doing anything conversational in nature, look first at the cloud vendors and what they offer. As this topic is wide, no single vendor covers all use cases and capabilities.
While at it, make sure you have access to a data set to be able to train your models when the time comes.Google Lens (=vision)
Where Google Assistant is all (or mostly) about voice, Google Lens is all about vision.
Google Lens is progressing in its classification capabilities. Google announced the following:
- Lens now recognizes and understands words it “sees”, allowing use cases where you can copy+paste text from a photo – definitely a cool trick
- Lens now handles style matching for clothing, able to bring up suggestions of similar styles
- Lens offers points of interest and real time results by running ML on the device, coupled with cloud ML
That last one is interesting, and it is where Google has taken the same approach as Amazon did with DeepLens, one that should be rather obvious based on the requirements here:
- You collect and train datasets in the cloud
- You run the classification itself on the edge device – or in the cloud
Google took it a step further, also offering this programmatically through ML Kit – Google’s answer to Apple’s Core ML and Amazon’s SageMaker.
Here’s a summary of the differences between these three offerings:
- ML framework: TensorFlow (Google); Core ML + converters (Apple); MXNet & TensorFlow (Amazon)
- Cloud component: Google Firebase (Google); none (Apple); AWS SageMaker (Amazon)
- Edge component: ML Kit (Google); Core ML (Apple); AWS DeepLens (Amazon)
- Edge device types: Android & iOS (Google); iOS (Apple); DeepLens (Amazon)
- Base use cases (Google): image labeling, text recognition, face detection, barcode scanning, landmark detection, smart reply
- Base use cases (Amazon): object detection, hot dog not hot dog, cat and dog, artistic style transfer, activity recognition, face detection
Apple Core ML is a machine learning SDK by Apple, available for and optimized for iOS devices. You feed it your trained model, and it runs on the device.
- It is optimized for iOS and exists nowhere else
- It has converters for all popular machine learning frameworks out there
- It comes with samples from across the internet, pre-converted to Core ML for developers to play with
- It requires the developers to figure out the whole cloud backend on their own
AWS DeepLens is the first ML enabled Amazon device. It is built on top of Amazon’s Rekognition and SageMaker cloud offerings.
- It is a specific device that has ML capabilities in it
- It connects to the AWS cloud backend along with its ML capabilities
- It is open to whatever AWS has to offer, but focused on the AWS ecosystem
- It comes with several baked-in samples for developers to use
Google ML Kit is Google’s machine learning solution for mobile devices, and has now launched in beta.
- It runs on both iOS and Android
- It makes use of TensorFlow Lite on the device side and TensorFlow on the backend
- It is tied into Google Firebase to rely on Google’s cloud for all backend ML requirements
- It comes with real productized use cases and not only samples
- It runs its models both on the device and in the cloud
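To make the device/cloud split concrete, here is a minimal Python sketch of a hybrid classification flow: a compact model runs on the device first, and the request is escalated to a larger cloud model only when the local result isn’t confident enough and the device is online. This is not ML Kit’s API. The function names, the `Prediction` type and the 0.8 confidence threshold are all hypothetical, and the training step (which happens in the cloud) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical signature: a classifier takes raw image bytes and
# returns a (label, confidence) pair.
Classifier = Callable[[bytes], Tuple[str, float]]


@dataclass
class Prediction:
    label: str
    confidence: float
    source: str  # "device" or "cloud"


def classify(image: bytes, on_device: Classifier, cloud: Classifier,
             threshold: float = 0.8, online: bool = True) -> Prediction:
    """Run the small on-device model first; fall back to the cloud model
    only if the local result is below `threshold` and we are online."""
    label, confidence = on_device(image)
    if confidence >= threshold or not online:
        # Fast path: no network round trip and no backend load
        return Prediction(label, confidence, "device")
    label, confidence = cloud(image)
    return Prediction(label, confidence, "cloud")


# Hypothetical stand-ins for real models, for illustration only
def tiny_device_model(image: bytes) -> Tuple[str, float]:
    return ("dog", 0.9) if image.startswith(b"dog") else ("unknown", 0.3)


def big_cloud_model(image: bytes) -> Tuple[str, float]:
    return ("cat", 0.97)


if __name__ == "__main__":
    print(classify(b"dog-photo", tiny_device_model, big_cloud_model))
    print(classify(b"cat-photo", tiny_device_model, big_cloud_model))
```

The threshold is the design knob in this sketch: raise it and more requests go to the cloud (potentially better accuracy, but more latency and backend load); lower it and more of the work stays on the device.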
This started as Google Lens and escalated to an ML Kit explanation.
Need to run ML? You need to think about where training the model occurs and where classification takes place. These seem to be split these days between cloud and devices. In many cases, developers are pushing the classification algorithms towards the devices at the edge to gain speed and reduce costs and load on the backend.
HWaaS
With everything moving towards the cloud, so does hardware in some sense. While the cloud started from hardware hosting of virtualized Linux machines, we’ve been seeing a migration towards different types of hardware recently:
We’re shifting from general purpose computing done by CPUs towards specialized hardware that fits specific workloads, in the form of FPGAs and custom chips.
The prime example here is Google’s TPU. TPU stands for Tensor Processing Unit. These are custom ASICs that Google designed and optimized to handle the mathematical operations at the heart of TensorFlow.
TensorFlow is said to be slow on CPUs and GPUs compared to other alternatives, and somehow Google is using it to its advantage:
- It open sourced TensorFlow, making it the most popular machine learning framework out there in a span of less than 3 years
- It is now in its third generation of TPUs on Google Cloud for those who need to train large datasets quickly
- TPUs are out of reach for Amazon and other cloud providers. They are proprietary hardware designed, hosted and managed by Google, so any performance gains coming from them remain in Google’s hands, for its customers to enjoy
Google’s TPUs got their fair share of time at the keynote in the beginning and were stitched throughout the keynote at strategic points:
- Google Lens uses TPUs to offer the real time capabilities that it does
- Waymo makes use of these TPUs to get to autonomous cars
Pichai even spent time boasting about things like liquid cooling…
It is a miracle that these TPUs aren’t plastered all over the ML Kit landing page.
Going with TensorFlow? You’ll need to decide on the cloud platform you are going to use, especially when it comes to dataset processing and training. Google is working hard to differentiate itself there.
Wellbeing
I am assuming you are just as addicted to your smartphone as I am. There are so many jokes, memes, articles and complaints about it that we can no longer ignore it. There is talk about responsibility and its place in large corporations.
Apple and Google are being placed in the spotlight on this one in 2018, and Google took the first step towards a solution. It is doing so in a long term project/theme named “Wellbeing”.
Wellbeing is similar to the AI initiative at Google in my mind. Someone came to the managers and told them one day something like this: “Our products are highly addictive. Apple are getting skewered in the news due to it and we’re next in line. Let’s do something about it to show some leadership and a differentiation versus Apple. Bring me ideas of how we can help our Android users with their addiction. We will take the good ideas and start implementing them”.
Here are a few things that came under Wellbeing, and one that didn’t but should have been:
- Dashboard – Google is adding to Android P an activity dashboard to surface insights to the users on what they do on their smartphones
- YouTube includes a new feature to remind you to take a break when a configured amount of time passes. You can apply the same to other third party apps as well
- Smarter do not disturb feature, coupled with Shush – all in an effort to reduce notifications load and anxiety from the user
- Wind down – switching to grayscale mode when a predetermined time of day arrives
- Pretty Please – Google Assistant can be configured to respond “better” and offer positive reinforcements when asked nicely. This one should help parents make their kids more polite (I know I need it with my kids at home)
In a way, this is the beginning of a long road that I am sure will improve over time. It shows the maturity of mobile platforms.
Not sure how responsibility, accountability and wellbeing-like aspects lend themselves to other products. If you are aiming at machine learning, think of the biases in your models – these are getting attention recently as well.
Fake News
Responsibility also covers the whole Fake News issue of recent years.
While Wellbeing targets mainly Apple, the Google News treatment in the keynote was all about addressing Facebook’s weakness. I am not talking about the recent debacle with Cambridge Analytica – this one and anything else related to users’ data privacy was carefully kept away from the keynote. What was addressed is Fake News, where Google gets way more favorable attention than Facebook (just search Google for “google fake news” and “facebook fake news” and look at the titles of the articles that bubble up – check it also on Bing out of curiosity).
What Google did here is create a new Google News experience. What is interesting is that it tries to bring something to market that strikes a balance between objectivity and personalization – things that don’t often go together when it comes to opinion and politics. It comes with a new layer of visualization that is more inviting, but most of what it does is rooted in AI (as anything else in this I/O keynote).
Here’s what I took out of it:
- AI is used to decide which sources are of high quality for certain news topics. This is designed to build trust in the news and to take the “fake” part out of it
- Personalized news is offered at the “category” level. Google will surface topics that interest you
- Next to personalized news, there’s local news as well as trending news, which gets surfaced, probably without personalization though the choice of topics is most probably machine learning driven
- Introduced Newscast – a presentation layer of a topic, enabling readers to get the gist of a topic and later drill down if they wish in what Google calls Full Coverage – an unfiltered view of an event – in an unpersonalized way
One more thing Google did? Emphasized that they are working with publishers on subscriptions, being publisher-friendly, where Facebook is… er… not. Will this hold water and help publishers enough? Time will tell.
AI and Machine Learning lend themselves well to this approach. It ends up being a mixture of personalization, trending and other capabilities that are surfaced when it comes to news. Can you see similar approaches suitable for your product offering?
Smart Displays
Smart displays are a rather new category. Besides Android as an operating system for smartphones and the Waymo AI piece, there was no other device featured in the keynote.
Google Home wasn’t mentioned, but Smart Displays actually got their fair share of minutes in the keynote. The only reason I see for it is that it is coupled nicely with the Google Assistant.
The two features mentioned that are relevant?
- It can now show visuals that relate to what goes on in the voice channel
- This is similar in a way to what MindMeld tried doing years back, before its Cisco acquisition
- The main difference is that this involves a person and a chatbot. Adding a visual element makes a lot of sense and can be used to enhance the experience
- It offers rich and interactive responses, which goes hand in hand with the visuals part of it
I am unsure why Google gave smart displays the prominence it did at Google I/O. I really have no good explanation for it, besides being a new device category where Apple isn’t operating at all yet – and where Amazon Alexa poses a threat to Google Home.
Android P
10 years in, and Android P was introduced.
There were two types of changes mentioned here: smarts and polish.
Smarts was all about AI (but you knew that already). It included:
- Adaptive Battery
- Adaptive Brightness
- ML Kit (see the Lens section above)
- App Actions and Slices, both offering faster and better opportunities for apps to interact with users outside of the app itself
- UI/UX changes all around that are just part of the gradual evolution of Android
There was really not much to say about Android P. At least not after counting all the AI work that Google has been doing everywhere anyway.
App Actions and Slices are important if you develop Android apps. ML Kit is where the true value is and it works on both Android and iOS – explore it first.
Google Maps
Google Maps was given the stage at the keynote. It is an important application and getting more so as time goes by.
Google Maps is probably our 4th search destination:
- Google Search
- Google Assistant
- YouTube
- Google Maps
This is where people look for information these days.
In Search, Google has been second to none for years. It wasn’t even part of the keynote.
Google Assistant was front and center in this keynote, most probably superior to its competitors (Siri, Cortana and Lex).
YouTube is THE destination for videos, with Facebook there, but with other worries at this point in time. It is also safe to say that younger generations and more visual audiences search YouTube more often than they do anything else.
Maps is where people search to get from one place to another, and increasingly for more abstract things as well.
In a recent trip to the US, I made quite a few open ended searches on Google Maps and was quite impressed with the results. Google is taking this a step further, adding four important pillars to it:
- Smarts
- Personalization
- Collaboration
- Augmented Reality
Smarts comes from its ML work. Things like estimating arrival times, more commute alternatives (they’ve added motorcycle routes and estimates for example), etc.
Personalization was added by the introduction of a recommendation engine to Maps. Mostly around restaurants and points of interest. Google Maps can now actively recommend places you are more likely to like based on your past preferences.
On the collaboration front, Google is taking its first steps by adding the ability to share locations with friends so you can reach a decision together on a place to go to.
AR was about improving walking directions and “fixing” the small gripes with Maps around orienting yourself with that blue arrow shown on the map when you start navigating.
Where are we headed?
That’s the big question I guess.
More machine learning and AI. Expect Google I/O 2019 to be on the same theme.
If you don’t have it in your roadmap, time to see how to fit it in.
If you are contemplating build versus buy for your live video platform, or just undecided on which one to pick, check out this 10-part video series.
My consulting projects these days tend to be in one of 3 domains:
- “We need more marketing exposure, and would like you to help us” (=marketing)
- “We want to talk about our strategy, differentiation and roadmap” (=product)
- “We want to make sure we’re building the product properly” (=architecture/development)
I like doing all of these types of projects simply because it keeps me interested. Especially since there’s no specific one that I like more than the others here. It does sometimes confuse potential customers, and probably doesn’t help me with “niching” or “focusing”, but it does give me a very wide view of the communications market.
I want to focus on the 3rd project type, the one where developers want assistance in making sure they pick the right technology, architect the solution and get it to market with as little risk as possible. This is where things get interesting.
The first thing I do in such projects? Check for NIH.
NIH stands for Not Invented Here, and it is a syndrome common to all developers. I know, because I suffer from it as well. Developers are builders and tinkerers. They like to make things work – not get them readymade, which is why when they have the opportunity to build something, they’ll go ahead and do it. The problem, though, is that economies of scale as well as time to market aren’t in their favor. In many cases, it would be easier to just pick a CPaaS vendor and build your live video product on top of its platform instead of building it all from scratch.
There are many reasons why people go build their own video platform:
- They think it will cost them less in the long run (usually coupled with a feeling that the price points of the CPaaS vendors are too high and a dislike of paying per usage/minute and not a fixed fee)
- They have a unique scenario that isn’t quite covered by CPaaS vendors they tried out
- They want to own the video technology that they are using
- They need to run on premise due to their customers, regulation or any other reason/excuse
I spend some time uncovering and better understanding the reasons for the decision. Sometimes I feel they make sense, while other times less so.
Which is why when I sat down with Vidyo to think about an interesting project to do together some months back, the decision was made to put out a series of short videos explaining different aspects of live video platforms. I tried to cover as much ground as possible. From network impairments, through video coding technologies, through scale, devices and lots of other topics as well.
The purpose was to get developers and entrepreneurs acquainted with what is necessary when you go build your own infrastructure, and if you decide on buying a platform, to know what to look for.
The series is packed full with content. And I’d love to get your candid opinion of it. Check it out here:
The post Choosing a Live Video Platform – a new video series appeared first on BlogGeek.me.
There are opposite forces at play when it comes to the next wave of communication technologies.
A lot of changes are being introduced into the world of communications at the moment. If I had to make a shopping list of these technologies, I’d probably end up with something like this:
- Cloud, as a Service
- APIs and programmability
- Business messaging, social messaging
- “Teams”, enterprise messaging
- Contextual everything
- Artificial Intelligence, NLP, NLU, ML
- X Reality – virtual, augmented, mixed, …
Each item is worthy of technobabble marketing in its own right, but the thing is, they do affect communications. The only question is in what ways.
I have been looking at this a lot lately, trying to figure out where things are headed, building different models to explain things, and looking at a few models suggested by other industry experts.
Communication domains – simplified
Ignoring outliers, there are 3 main distinct communication domains within enterprises:
- UC – Unified Communications
- CC – Contact Center
- CP – Communications Platform
Usually, we will append the obligatory “aaS” to them: UCaaS, CCaaS and CPaaS
I’ll give my own simplified view on each of these acronyms before we proceed.
UCaaS
Unified Communications looks inwardly inside the company.
A company has employees. They need ways and means to communicate with each other. They also need to communicate with external entities such as suppliers, partners and customers. But predominantly, this is about internal communications. External communications usually take a second-class position, with limited capabilities and accessibility; oftentimes, external communications will be limited to email, phone calls and SMS.
What will interest us here will be collaboration and communication.
CCaaS
Contact Centers are about customers. Or leads, which are potential customers.
We’ve got agents in the contact center, be it sales or customer care (=support), and they need to talk to customers.
Things we care about in contact centers? Handling time, customer satisfaction, …
CPaaS
Communication Platform as a Service is different.
It is a recent entry to the communications space, even if some would argue it has always been there.
CPaaS is a set of building blocks that enable us to use communications wherever we may need them. Both CCaaS and UCaaS can be built on top of CPaaS. But CPaaS is much more flexible than that. It can fit itself to almost any use case and scenario where communications is needed.
Communications in Consolidation
There’s a consolidation occurring in communications. One where vendors in different parts of communications are growing their offering into the adjacent domains.
We are in a migration from analog to digital when it comes to communications. And from pure telecom/telephony towards browser based, internet communications. Part of it is the introduction of WebRTC technology (couldn’t hold myself back from mentioning WebRTC).
This migration opens up a lot of opportunities and even raises the question of how we should define these communication domains, and whether they are even separate at all.
There have been some interesting moves lately in this space. Here are a few examples of where these lines get blurred and redefined:
- Dialpad just introduced a contact center, tightly integrated and made a seamless part of its unified communications platform
- Vonage acquires Nexmo, which is one of the leading CPaaS vendors. Other UC vendors have added APIs and developer portals to their UC offerings
- Twilio just announced Flex, its first foray out of CPaaS and into the contact center realm
These are just examples. There are other vendors in the communication space who are going after adjacent domains.
The idea here is communication vendors looking at the communications Venn diagram and reaching out to an adjacency, with the end result being a consolidation throughout the whole communications space.
External disruption to communications
This is where things get really interesting. The forces at play are pushing communications outwards:
UCaaS, CCaaS, CPaaS. It was almost always about real time. Communications happening between people in real time. When the moment is over, the content of that communications is lost – or more accurately – it becomes another person’s problem. Like a contact center recording calls for governance or quality reasons only, or having the calls transcribed to be pushed towards a CRM database.
Anything that wasn’t real time and transient wasn’t important in communications. Up until now.
We are now connecting the real time with the asynchronous communications. Adding messaging and textual conversations. We are thinking about context, which isn’t just the here and now, but also the history of it all.
Here’s what’s changing though:
UC and Teams
Unified Communications is ever changing. We’ve added collaboration to it, calling it UC&C. Then we’ve pushed it to the cloud and got UCaaS. Now we’re adding messaging to it. Well… we’re mostly adding UC to messaging (it goes the other way around). So we’re calling it Teams. Or Team Collaboration. Or Workstream Collaboration (WSC). Or Workstream Communication and Collaboration (WCC). I usually call it Enterprise Messaging.
The end result is simple. We focus on collaboration between teams in an organization, and we do that via group chat (=messaging) as our prime modality for communications.
Let’s give it a generic name that everyone understands: Slack
The question now is this: will UC gobble up Team communication vendors such as Slack (and now Workplace by Facebook; as well as many other “project management” and messaging type tools) OR will Slack and the likes of it gobble up UC?
I don’t really know the answer.
CC and CRMs
What about contact centers? These live in the world of CRM. The most important customer data resides in CRMs. And now, with the introduction of WebRTC, and to an extent CPaaS vendors, a CRM vendor can decide to add contact center capabilities as part of his offering. Not through partnerships, but through direct implementation.
Can contact centers do the same? Can they expand towards the CRM domain, starting to handle the customer data itself?
If Salesforce starts offering a solid contact center solution in the cloud as part of its offering, that is highly integrated with the Salesforce experience, adding to it a layer of sophistication that contact center vendors will find hard to implement – what will customers do? NOT use it in favor of another contact center vendor, or source it all from Salesforce? Just a thought.
There’s an additional trend taking place. That’s one of context and analytics. We’re adding context and analytics into “customer journeys”, sales funnels and marketing campaigns. These buzzwords happen to be part of what contact centers are, what modern CRMs can offer, and what dedicated tools do.
For example, most chat widget applications for websites today offer a backend CRM-like dashboard that also acts like a messaging contact center, and at the same time, these same tools act similarly to Google Analytics by following users as they visit your website, trying to derive insights from their journey so the contact center agent can use it throughout the conversation. Altocloud did something similar and got acquired recently by Genesys, a large contact center vendor.
CP and PaaS
CPaaS is a bit different. We’re dealing with communication APIs here.
The CPaaS market is evolving and changing. There are many reasons for it:
- SMS and voice are commoditized, with a lot of vendors offering these services
- IP based services are considered “easier” to implement, eroding their price point and popularity
- UCaaS vendors adding APIs, at times wanting to capture some of the market due to Twilio’s success
- As the market grows, there’s a looming sense of what would tech giants do – would Amazon add more CPaaS capabilities into AWS?
That last one is key. We’ve seen the large cloud vendors enhancing their platforms. Moving from pure CPU and storage services up the food chain. Amazon AWS has so many services today that it is hard to keep up. The question here is when will we reach an inflection point where AWS, GCE and Azure start adding serious CPaaS capabilities to their cloud platforms and compete directly with the CPaaS vendors?
Where is CPaaS headed anyway?
- Does the future of CPaaS lie in attacking adjacent communication markets like Twilio is doing with Flex?
- Will CPaaS end up being wrapped and baked into UC and “be done with it”?
- Is CPaaS bound to be gobbled up by cloud providers as just another set of features?
- Will CPaaS stay a distinct market on its own?
The future can unfold in three different ways when it comes to communications:
- Specialization in different communication domains continues and deepens
- UC, CC and CP remain distinct domains
- Maybe a 4th domain comes in (highly unlikely to happen)
- Communication domains merge and we refer to it all as communications
- UC does CC
- CP used to build UC and CC
- Customers going for best of suite (=single vendor) who can offer UC, CC and CP in a single platform
- Communication domains get gobbled up by their adjacencies
- CC gets wrapped into CRM tools
- UC being eaten by messaging and teams experiences (probably to be called UC again at the end of the process)
- CP becoming part of larger, more generic cloud platforms
How do you think the future will unfold?
WebRTC developers are really hard to come by. I want to improve my ability to help companies in search of such skill.
If there’s something that occurs time and again, it is entrepreneurs and vendors who ask me if I know of anyone who can build their application. Some are looking to outsource the project as a whole or part of it, and then they are looking for agencies to work with. Others are looking for a single expert to work with on a specific task, or someone with WebRTC skills they could hire for long stretches of time.
You a WebRTC Developer?
I’d like to know more about you IF you are looking for projects or for a new employer.
Here are a few things first:
- Even if you think I know you, please fill out the form
- No agencies. If you are an agency, contact me and we can have a chat. I know a few that I am comfortable working with
- Only starting out with WebRTC? Don’t fill out the form. Mark this page, get some experience and then fill it out
- The form is short, so shouldn’t take more than 5 of your minutes to fill
- Don’t beautify things more than they are – that will just get you thrown out of my radar. Tell things as they are
Fill out this form for me please (or via this link):
I won’t be reaching out to you immediately (or at all). I’ll be using this list when others ask for talent that fits your profile.
You looking for WebRTC Developers?
Got a need for developers that have WebRTC skills?
I am not sure exactly how to find them and where, but I am trying to get there.
Two ways to get there:
- I am thinking of opening up a job listing on WebRTC Weekly
- Payment will be needed to place a listing on the WebRTC Weekly, which reaches over 2,500 subscribers at the moment
- Cost will be kept low, especially considering the cost of talent acquisition elsewhere and the lack of available WebRTC developers out there
- I had a job listing sub-site in the past, didn’t work – this is another attempt I am trying out. If you want to try this one with me, I’ll be happy to take the leap
- Interested? Contact me
- Need a bit more than just finding a developer? I offer consulting services
- There are hourly rates available, as well as one-off consulting sessions
- I’ll be using the list I’ll be collecting of the WebRTC developers above to match you up with a candidate if you need – or just connect you with the agencies I am comfortable working with
Chat won’t bring carriers to their SMS-glory days.
The Verge came out with an exclusive last week that everyone out there is regurgitating. This is my attempt at doing the same.
We’re talking about Google unveiling its plans for the consumer chat experience. To put things in quick bulleted points:
- There’s a new service called “Chat”, which is supposed to be Google’s and the carriers’ answer to Apple iMessage, Facebook Messenger and the rest
- Google’s default messages app on Android for SMS is getting an upgrade to support RCS, turning it into a modern messaging application
- The moment this happens will vary between the different carriers, who are, by the way, those who make the decision and control and own the service
- Samsung and other Android handset manufacturers will probably come out with their own messaging app instead of the one provided by Google
- This is a risky plan with a lot of challenges ahead of it
I’d like to share my viewpoints and where things are going to get interesting.
SMS is dead
I liked Mashable’s title for their take on this:
While an apt title, my guess is that beyond carriers and reports written to them, we all know that already.
SMS has long been dead. A2P (Application 2 Person) SMS messages are all that’s left of it: businesses texting us either their PIN codes and passwords for 2FA (2 Factor Authentication) and OTP (One Time Passwords), or just sending us marketing junk for us to ignore.
I asked a few friends of mine on a group chat yesterday (over Whatsapp, of course) when and how do they use SMS and why. Here are the replies I got (I translated them to English):
- I prefer Whatsapp. It is the most lightweight and friendly alternative. I only use SMS when they are automatically sent to me on missed calls
- Whatsapp is accessible. It has quick indicators and it is lightweight. It remembers everything in an orderly fashion
- I noticed that people take too long to respond on SMS while they respond a lot faster over Whatsapp. Since SMS is more formal to me, I use it when sending messages for the first time to people I don’t know
- I send SMS only to people I don’t know. I feel that Whatsapp is more personal
- I use iMessage only with my boss. She’s ultra religious so she doesn’t have Whatsapp installed. For everything else I use Whatsapp
- I mostly use Whatsapp for messages. I text via SMS only with my wife when I am flooded with Whatsapp messages and just want her notifications to be more prominent
- SMS is dead for me. I don’t even have it on my home screen, and that says a lot. I use SMS only to receive PIN codes from businesses
- SMS is the new fax
These are 40 year olds in Israel. Most working out of the IT domain. The answers will probably vary elsewhere, but here in Israel, most will give you similar answers. Whatsapp has become the go-to app for communications. So much so, that we were forced to give our daughter her first smartphone at the age of 8 only so she can communicate with her friends via Whatsapp and won’t stay behind. Everyone uses it here in Israel.
You should also know that plans upwards of 2GB of monthly data including unlimited voice and SMS cost less than $15 a month in Israel, so this has nothing to do with price pressure anymore. It has to do with network effects and simple user experience.
SMS is no longer ubiquitous across the globe. I can’t attest to other countries, but I guess Israel isn’t alone in this. SMS is just the last alternative to use when all else has failed.
Why is SMS interesting in this context?
Because a lot of what’s at stake here for Google relates to the benefits and characteristics of SMS.
RCS is (still) dead
RCS is the successor of SMS for getting carriers into the 21st century. It has been discussed for many years now, and it will most definitely, utterly, completely, unquestionably get people back from their Messenger, WhatsApp and WeChat to the clutches of the carriers. NOT.
RCS is a design-by-committee solution, envisioned by people my age and older, targeting a younger audience across the globe in an attempt to kill fast-moving social networks with a standardized, ubiquitous, agreed upon specification that then needs to be implemented by multiple vendors, handset manufacturers and carriers globally to make any sense.
Not going to happen.
Google’s take on this was to acquire an RCS vendor – Jibe – two years ago for this purpose. The idea was probably to provide a combination of an infrastructure and a mobile client to speed up RCS deployments around the globe and make them interoperable faster than the carriers will ever achieve on their own.
Two years passed, and we’ve got nothing but a slide (and the article on The Verge) to show for this effort:
An impressive list of operators, OEMs and OS providers is behind this RCS initiative. Is that due to Google? In part, probably so.
In a way, this reminds me also of Google’s other industry initiative – the Alliance of Open Media, where it is one of 7 original founding members that just recently came out with AV1, a royalty free video codec. It is a different undertaking:
- RCS will be controlled by carriers, who were never kind or benevolent to their users
- For carriers, the incentive can be found in the GSMA’s announcement: “GSMAi estimate that this will open up an A2P RCS business worth an estimated $74bn by 2021”
- This is about securing A2P SMS revenues by migrating to RCS
- The sentences before this one in that announcement explain how they plan on reaching there: “The Universal Profile ensures the telecoms industry remains at the centre of digital communications by enabling Operators, OEMs and OS Providers to deliver this exciting new messaging service consistently, quickly and simply.”
- Problem is, they are not the centre of digital communications, so this isn’t about ensuring or remaining. It is about winning back. And you can’t do that if your focus is A2P
- This isn’t about an open platform for innovation, or a level playing field for all. And that makes it starkly different from the AV1 initiative. It is probably closer to how MPEG-LA would approach a new video codec initiative
Why is Google going into bed with the carriers on this one?
Google had no choice
The Verge had an exclusive interview with Anil Sabharwal, the Google VP leading this effort. This led to the long article about this initiative. The numbers that Anil shared were eye opening as to the abysmal state of Google’s messaging efforts thus far.
I went ahead and placed these numbers next to other announced messaging services for comparison:
A few things to note here:
- Telegram, Facebook Messenger and Whatsapp are all apps users make a decision to install, and they are making that decision en masse
- Apple has upwards of 1.3 billion active devices, which indicates the general size of its iMessage service
- Google Messages is the default app on Android for SMS, unless:
- Carriers replace it with their own app
- Handset manufacturers replace it with their own app
- Users replace it with another app they install
- Google Messages sees around 100 million monthly active users – the table-stakes number to be relevant in this market, but rather low for a ubiquitous, default app
- Google Allo has less than 50 million downloads. That’s not even monthly active users
- Google Hangouts stopped announcing its user base years ago, and frankly, they stopped investing in it as well. The mobile app has been defunct (for me) for quite some time now, with unusual slowness and unresponsiveness
Google failed to entice its billion+ Android users to install or even use its messaging applications.
Without the numbers, it couldn’t really come up with a strategy similar to Apple iMessage, where it essentially hijacks the messaging traffic from carriers, onboarding the users to its own social messaging experience.
Trying to do that would alienate the carriers, which Google relies on for Android device sales. Some would argue that Google has the clout and size to do that, but that is not the case.
Android is open, so handset manufacturers and carriers could use it without Google’s direct approval, throwing away the default messaging app. Handset manufacturers and carriers would do that in an effort to gain more control over Android, which would kill the user experience, as most such apps by handset manufacturers and carriers do. The end result? More users purchasing iPhones, as carriers try to punish Google for the move.
What could Google do?
1. Double down on their own social messaging app – hasn’t worked multiple times now. What can they do differently?
2. Build their own iMessage – alienate the Android ecosystem, with the risk of failing to attract users as they failed in the past
3. Partner with carriers on RCS
Two years ago, Google decided to go for alternatives (1) and (3). Allo was their own social messaging app. Had it succeeded, my guess is that Google would have gone towards approach (2). In parallel, Google acquired Jibe in an effort to take route (3), which is now the strategy the company is behind for its consumer messaging.
The big risk here is that the plan itself relies on carriers and their decisions. We don’t even know when this will launch. Reading between the lines of The Verge’s article, Google has already completed the development and has the mobile client ready and deployed. It just isn’t enabled unless the carrier being used approves. Estimates indicate 6-12 months until that happens, but for which of the carriers? And will they use the stock Android app for that or their own ambitious better-than-Whatsapp app?
E2EE can kill this initiative and hurt Google
The biggest risk to Google is the lack of E2EE (end to end encryption).
Each and every post regurgitating The Verge article, as well as The Verge itself, emphasizes this. Walt Mossberg’s tweet was mentioned multiple times as well:
Bottom line: Google builds an insecure messaging system controlled by carriers who are in bed with governments everywhere at exactly the time when world publics are more worried about data collection and theft than ever.
— Walt Mossberg (@waltmossberg) April 20, 2018
The problem for Google is that the news outlets are noticing and giving it a lot of publicity. And it couldn’t come at a less convenient time, with Facebook being scrutinized over how it uses and protects user data in the Cambridge Analytica scandal. Google, for the most part, has come out of it unscathed, but will this move put more of the spotlight on Google?
The other problem is that all the other messaging apps already have E2EE supported in one way or another. The apps usually mentioned here are Apple iMessage, Signal and Telegram. Whatsapp switched to E2EE by default two years ago. And Facebook Messenger has it as an option (though you do need to enable it manually per conversation).
Will customers accept using “Chat” (=RCS) when they know it isn’t encrypted end to end?
On the other hand, Russia is attempting to shut down Telegram by blocking millions of IP addresses in the country, taking down other large services along with it. If this succeeds, Russia will do the same to all other popular messaging applications, and other countries will follow. The end result will be the need to use the carrier’s (and Google’s) alternative instead. Thankfully, Russia has been unsuccessful. For the time being.
Who owns the data?
With RCS, the carriers are the ones that are intercepting, processing and forwarding the messages. In a way, it alludes to the fact that Google isn’t going to be the one reading these messages, at least not from the server.
This means that either Google decided there’s not enough value in these messages and in monetizing them – or – that they have other means to gain access to these messages.
Here are a few alternatives Google can use to access these messages:
- Through licensing and operating the servers on behalf of carriers. Not all carriers will roll their own and may prefer using Google as a service here. Having the messages in unencrypted format on the server side is beneficial for Google in a way, especially when they can “blame” the carriers and regulations
- Via Google’s Messages app. While messages might be sent via a carrier’s network, the client sending and receiving these messages is developed and maintained by Google, giving them the needed access. This can be coupled with features like backing up the messages in Google Drive or letting Google read the messages to improve its services for the user
- By coupling features such as Google Assistant and Smart Replies into it, which means Google needs to read the messages to offer the service
Google might have figured it has other means to get to the messages besides owning and controlling the whole experience – similar to how Google Photos is one of the top photo apps in Apple’s App Store.
By offering a better experience than competing RCS clients, it might entice users to download its stock Chat app on devices that don’t have it by default. Who knows? It might even be able to get people to download and use it on an iPhone one day.
If Google succeeds here, RCS will become a vehicle for Google to get back into messaging more than a means for carriers to regain relevance.
Ubiquity is here already, but not via SMS or RCS
I’ll put the graph here again – to make a point.
1.5 billion people is ubiquitous enough for me. Especially when the penetration rates in Israel are 100% in my network of connections.
People tend to talk about the ubiquity of SMS and how RCS will inherit that ubiquity.
They fail to take into account the following:
- SMS is ubiquitous, but it took it many years to get there
- It is used for marketing and 2FA mostly
- The marketing part is less valuable
- It can be treated as spam by consumers for the most part
- It is one-way in nature, whereas social networks revolve around conversations
- Spam and unsolicited messages don’t work that well in social networks
- 2FA will be shifting away from SMS (see here)
- Google does a lot of its 2FA without SMS today
- Google can open it up to third parties at any point in time
- Apple can do the same with the iPhone
- The shift towards RCS won’t be done in a single day. It will be done in a patchwork fashion across the globe by different carriers
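For a sense of what SMS-free 2FA looks like, here is a minimal sketch of a time-based one-time password (TOTP, RFC 6238, built on HOTP from RFC 4226) using only Python’s standard library. The code is computed on the user’s own device from a shared secret and the clock, so nothing travels over the carrier’s network. It assumes the RFC’s default parameters (HMAC-SHA1, 30-second steps). It is not a claim about how Google runs its own 2FA, and a production system would also need secret provisioning, clock-drift windows and rate limiting.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # The counter is encoded as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    # RFC 6238 test secret, used here only for illustration, never as a real key.
    shared_secret = b"12345678901234567890"
    print(totp(shared_secret, at=59, digits=8))
```

Both sides only need to agree on the secret once; after that, every login code is derived locally, which is why this style of 2FA doesn’t depend on SMS ubiquity at all.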
Think about it.
You can now send out an RCS message from your device. To anyone. If the other party has no RCS installed, the message gets converted to SMS. Sweet.
But what happens when the person you are sending that RCS message to is located abroad? Are you seriously happy getting charged by your carrier for a stupid international SMS message, or for a full conversation of them, over something you could have easily used Whatsapp for instead? And for free.
Ubiquity isn’t the word that comes to my mind when thinking about RCS.
The holy grail is business messaging
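The fallback behavior described above boils down to a simple routing decision. The sketch below is a hypothetical illustration, not the GSMA Universal Profile or Google’s actual client logic: the `Recipient` fields (`rcs_capable`, `is_international`) and the `Route` type are assumptions made for the example. It shows how “reach anyone” via fallback still lands on SMS pricing whenever the recipient lacks RCS.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    RCS = "rcs"
    SMS = "sms"


@dataclass
class Recipient:
    number: str
    rcs_capable: bool       # hypothetical: result of some capability lookup
    is_international: bool  # hypothetical: recipient is abroad relative to the sender's carrier


def choose_route(recipient: Recipient) -> Route:
    """Send over RCS when the recipient supports it, otherwise fall back to SMS."""
    return Route.RCS if recipient.rcs_capable else Route.SMS


def billed_as_international_sms(recipient: Recipient) -> bool:
    """True when the fallback ends up as a paid international SMS."""
    return choose_route(recipient) is Route.SMS and recipient.is_international


if __name__ == "__main__":
    friend_abroad = Recipient("+15550100", rcs_capable=False, is_international=True)
    print(choose_route(friend_abroad), billed_as_international_sms(friend_abroad))
```

Every conversation with a non-RCS contact abroad takes the SMS branch, which is exactly where carriers charge the most.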
Consumer messaging is free these days. There is no direct monetary value to be gained by offering this service to consumers. Carriers won’t be able to put that genie back into its bottle and start collecting money from users. Their only approach here might be to zero-rate RCS traffic, but that also isn’t very interesting to most consumers – at least not here in Israel.
The GSMA already suggested where the money is – in business messaging. They see this as a $74bn opportunity by 2021. The problem is that rolling out RCS 6-12 months from now, by only some of the carriers, isn’t going to cut it. Apple Business Chat was just released: vertically integrated, with a lot of thought put into businesses and their discovery process, and free of charge.
Then there’s the rest of the social networks opening their APIs towards the businesses, and contact center solutions driving the concept of omnichannel experiences for customers.
Carriers are getting into this game late and unprepared. On top of that, they will try to get money out of this market similar to how they do with SMS. But the price points they are used to make no sense anymore. Something will need to change for the carriers to be successful here.
Will carriers be able to succeed with RCS? I doubt it.
Will Google be able to succeed with Chat? Maybe. But it is up to the carriers to allow that to happen.
The post RCS now Google Messages. What’s Next in Consumer Messaging? appeared first on BlogGeek.me.
Join Philipp Hancke and me for a free training on WebRTC 1.0, prior to the relaunch of my advanced WebRTC training.
Here’s something that I get at least once a week through my website’s chat widget:
It is one of the main reasons why I’ve created my advanced WebRTC course. It is a paid WebRTC course that is designed to fill in the gaps and answer the many questions developers face when needing to deal with WebRTC.
Elephants, blind men, alligators and WebRTC
I wanted to connect it to the parable of the six blind men and an elephant, explaining how wherever you go on the Internet, you are going to get a glimpse of WebRTC and never a full, clear picture. I even searched for a good illustration to use for it. Then I bumped into this illustration:
It depicts what happens with WebRTC and developers all too well.
If you haven’t guessed it, the elephants here are WebRTC and the requirements of the application and that flat person is the developer.
This fits well with another joke I heard yesterday from a friend’s kid:
Q: Why can’t you go into the woods between 14:00-16:00?
A: Because the elephants are skydiving
There’s a follow up joke as well:
Q: Why are the alligators flat?
A: Because they entered the woods between 14:00-16:00
WebRTC development has a lot of rules, many of which are unwritten.
WebRTC 1.0
There are a lot of nuances to WebRTC. There is a lot of written material, old and new – some of it irrelevant now, the rest possibly correct but jumbled. And WebRTC is a moving target. It is hard to keep track of all the changes. A lot of the knowledge required around WebRTC doesn’t look like an API call and isn’t written in the standard specification.
This means that I get to update my course every few months just to keep up.
With WebRTC 1.0, there’s both a real challenge as well as an opportunity.
It is a challenge:
- WebRTC 1.0 still isn’t here. There’s a working draft, which should get standardized *soon* (=soon started in 2015, and probably ends in 2018, hopefully)
- Browser implementations lag behind the latest WebRTC 1.0 draft
- Browser implementations don’t behave the same, or implement the same parts of the latest WebRTC 1.0 draft
It is an opportunity:
We might actually get to a point where we have a stable API with stable implementations.
But we’re still not there.
Should you wait?
We’re 6-7 years in with WebRTC (depending on who does the counting), and this hasn’t stopped well over 1,000 vendors from jumping in and making use of WebRTC in production services.
There’s already massive use of WebRTC.
Me and WebRTC 1.0
For me, WebRTC 1.0 is somewhat of a new topic.
I try to avoid the discussions going on around WebRTC in the standardization bodies. The work they do is important and critical, but often tedious. I had my fair share of it in the past with other standards and it isn’t something I enjoy these days.
This posed a challenge for me as well. How can I teach WebRTC in a premium course without explaining WebRTC 1.0 – a topic that needs to be addressed, as developers need to prepare for the changes that are coming?
The answer was to ask Philipp Hancke to help out here and create a course lesson on WebRTC 1.0. I like doing projects with Philipp, and do so on many fronts, so this is one additional project. It isn’t the first time either – the bonus materials of my WebRTC course include a recorded lesson by Philipp about video quality in WebRTC.
Free WebRTC 1.0 Webinar
Tomorrow, we will be recording the WebRTC 1.0 lesson together for my course. I’ll be there, and this time, partially as a student.
To make things a bit more interesting, as well as promoting the whole course, this lesson will be given live in the form of a free webinar:
- Anyone can join for free to learn about WebRTC 1.0
- The recording will only be available as part of the advanced WebRTC course
This webinar/lesson will take place on
Tuesday, April 10
2-3PM EST (view in your timezone)
The session’s recording will NOT be available after the event itself. While this lesson is free to attend live, the recording will become an integral part of the course’s lessons.
The post WebRTC 1.0 Training and Free Webinar Tomorrow (on Tuesday) appeared first on BlogGeek.me.
AV1 for video coding is what Opus is for audio coding.
The Alliance of Open Media (AOMedia) issued last week a press release announcing its public release of the AV1 specification.
The last time I wrote about AOMedia was over a year ago. AOMedia is a very interesting organization, which got me to sit down with Alex Eleftheriadis, Chief Scientist and Co-founder of Vidyo, for a talk about AV1, AOMedia and the future of real time video codecs. It was really timely, as I’ve been meaning to write about AV1 at some point. The press release and my chat with Alex pushed me towards this subject.
- We are moving towards a future of royalty free video codecs
- This is due to the drastic changes in our industry in the last decade
- It won’t happen tomorrow, but we won’t be waiting too long either
Before you start, if you need to make a decision today on your video codec, then check out this free online mini video course
H.264 or VP8?
Now let’s start, shall we?
AOMedia and AV1 are the result of greed
When AOMedia was announced I was pleasantly surprised. It wasn’t obvious that the founding members of AOMedia would actually find the strength to put their differences aside for the greater good of the video coding industry.
Video codec royalties 101
You see, video codecs at that point in time were a profit center for companies. You invested in research around video coding with the main focus on inventing new patents that would be incorporated into video codecs that would then be used globally. The vendors adopting these video codecs would pay royalties.
With H.264, said royalties came with a cap – if you distributed above a certain number of devices that use H.264, you didn’t have to pay more. And the same scheme was put in place when it came to HEVC (H.265) – just with a higher cap.
Why do we need this cap?
- Companies want to cap their commitment and expense. In many cases, you don’t see direct revenue per device, so having no cap makes it harder to fit asymmetric business models and applications that scale today to hundreds of millions of users
- If a company needs to pay based on the number of devices they sell, then the one holding the patents and getting the payment for royalties knows that number exactly – something which is considered trade secret for many companies
So how much money did MPEG-LA take in?
Being a private company, this is hard to know. I’ve seen estimates of $10M-50M, as well as $17.5B on Quora. The truth is probably somewhere in the middle. Which is still a considerable amount of money that was funnelled to the patent owners.
With royalty revenues flowing in, is it any wonder that companies wanted more?
An interesting tidbit about this greed (or shall we say rightfulness) can be found in the Wikipedia page of VP8:
In February 2011, MPEG LA invited patent holders to identify patents that may be essential to VP8 in order to form a joint VP8 patent pool. As a result, in March the United States Department of Justice (DoJ) started an investigation into MPEG LA for its role in possibly attempting to stifle competition. In July 2011, MPEG LA announced that 12 patent holders had responded to its call to form a VP8 patent pool, without revealing the patents in question, and despite On2 having gone to great lengths to avoid such patents.
So… we have a licensing company whose members are after royalty payments on patents. They are blinded by the success of H.264 and its royalty scheme and payments, so they go after anything and everything that looks and smells like competition. And they are working towards maintaining their market position and revenue in the upcoming HEVC specification.
The HEVC/H.265 royalties mess
Leonardo Chiariglione, founder and chairman of MPEG, attests in a rather revealing post:
Good stories have an end, so the MPEG business model could not last forever. Over the years proprietary and “royalty free” products have emerged but have not been able to dent the success of MPEG standards. More importantly IP holders – often companies not interested in exploiting MPEG standards, so called Non Practicing Entities (NPE) – have become more and more aggressive in extracting value from their IP.
HEVC, being a new playing ground, meant that there were new patents to be had – new areas where companies could claim having IP. And MPEG-LA found itself one of many patent holder groups:
MPEG-LA indicated its wish to take home $0.2 per device using HEVC, with a high cap of around $25M.
HEVC Advance started with a ridiculously greedy target of $0.8 per device AND 0.5% of the gross margin of streaming services (unheard of at the time) – with no cap. It has since walked these terms back, making things somewhat better. It did so a bit too late in the game though.
Velos Media spent money on a clean and positive website. Their Q&A indicates that they haven’t yet made a decision on royalties, caps and content royalties. Which gives great confidence to those wanting to use HEVC today.
And then there are the unaffiliated. Companies claiming patents related to HEVC who are not in any pool. And if you think they won’t be suing anyone then think again – BlackBerry just sued Facebook for messaging related patents – easy to see them suing for HEVC patents in their current position. Who can blame them? They have been repeatedly sued by patent trolls in the past.
HEVC is said to be the next biggest thing in video coding. The successor of our aging H.264 technology. And yet, there are too many unknowns about the true price of using it. Should one pay royalties to MPEG-LA, HEVC Advance and Velos Media or only one of them? Would paying royalties protect from patent litigation?
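To make the stacking question concrete, here is a back-of-the-envelope sketch of per-device royalties with an optional cap, using only the figures quoted above: MPEG-LA’s $0.2 per device capped at around $25M, and HEVC Advance’s original $0.8 per device with no cap. Velos Media and the unaffiliated patent holders are left out because their terms weren’t published, and HEVC Advance’s streaming-margin royalty and later revisions are ignored. The 100 million device volume is an arbitrary assumption, so treat this as an illustration, not a licensing estimate.

```python
from typing import Dict, Optional, Tuple


def device_royalty(devices: int, per_device: float, cap: Optional[float]) -> float:
    """Per-device royalty, optionally capped at a maximum total payment."""
    total = devices * per_device
    return total if cap is None else min(total, cap)


# (rate per device in USD, cap in USD or None) as quoted in the post.
POOLS: Dict[str, Tuple[float, Optional[float]]] = {
    "MPEG-LA": (0.2, 25_000_000),
    "HEVC Advance (original terms)": (0.8, None),
}


def stacked_royalties(devices: int) -> Dict[str, float]:
    """What a vendor would owe if it had to license from every pool listed."""
    return {name: device_royalty(devices, rate, cap) for name, (rate, cap) in POOLS.items()}


if __name__ == "__main__":
    devices = 100_000_000  # hypothetical volume
    owed = stacked_royalties(devices)
    for name, amount in owed.items():
        print(f"{name}: ${amount:,.0f}")
    print(f"Total: ${sum(owed.values()):,.0f}")
```

With these numbers the MPEG-LA cap only kicks in at 125 million devices ($25M / $0.2), while the uncapped line keeps growing linearly with volume, which is exactly the open-ended commitment companies try to avoid.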
Is it even economically viable to use HEVC?
Yes. Apple has introduced HEVC in iOS 11 and iPhone X. My guess is that they are willing to pay the price as long as this keeps the headache and mess in the Android camp (I can’t see the vendors there coming to terms on who in the value chain will end up paying the royalties for it).
With such greed and uncertainty, a void was left. One that got filled by AOMedia and AV1.
AOMedia – The who’s who of our industry
AOMedia is a who’s who list of our industry. It started small, with just 7 big names, and now has 12 founding members and 22 promoter members.
Some of these members are members of MPEG-LA or already have patents in HEVC and video coding. And this is important. Members of AOMedia effectively allow free access to essential patents in the implementation of AOMedia related specifications. I am sure there are restrictions applied here, but the intent is to have the codecs coming out of AOMedia royalty free.
A few interesting things to note about these members:
- All browser vendors are there: Google, Mozilla, Microsoft and Apple
- All large online streaming vendors are there: Google (=YouTube), Amazon and Netflix
- From that same streaming industry, we also have Hulu, Bitmovin and Videolan
- Most of the important chipset vendors are there: Intel, AMD, NVidia, Arm and Broadcom
- Facebook is there
- Of the enterprise video conferencing vendors we have Cisco, Vidyo and Polycom
- Qualcomm is missing
AOMedia is at a point that stopping it will be hard.
Here’s how AOMedia visualizes its members’ products:
What’s in AV1?
AV1 is a video codec specification, similar to VP8, H.264, VP9 and HEVC.
AV1 is built out of 3 main premises:
- Royalty free – what gets baked into the specification is either covered by patents of AOMedia members or uses techniques that aren’t patented. It doesn’t mean that companies can’t claim IP on AV1, but as far as the effort on developing AV1 goes, they aren’t knowingly letting in patents
- Open source reference implementation – AV1 comes with an open source implementation that you can take and start using. So it isn’t just a specification that you need to read before building a codec from scratch
- Simple – similar to how WebRTC is way simpler than other real time media protocols, AV1 is designed to be simple
Simple probably needs a bit more elaboration here. It is probably the best news I heard from Alex about AV1.
Simplicity in AV1
You see, in standardization organizations, you’ll have competing vendors vying for an advantage on one another. I’ve been there during the glorious days of H.323 and 3G-324M. What happens there, is that companies come up with a suggestion. Oftentimes, they will have patents on that specific suggestion. So other vendors will try to block it from getting into the spec. Or at the very least delay it as much as they can. Another vendor will come up with a similar but different enough approach, with their own patents, of course. And now you’re in a deadlock – which one do you choose? Coalitions start emerging around each approach, with the end result being that both approaches will be accepted with some modifications and get added into the specification.
But do we really need both of these approaches? The more alternatives we have for doing something similar, the more complex the end result. The more complex the end result, the harder it is to implement. The harder it is to implement, well… the more it looks like HEVC.
Here’s the thing.
I am not privy to the intricate details, but I’ve seen specifications in the past and been part of making them happen, and from what I understand, HEVC is your standard design-by-committee specification. HEVC was developed by MPEG together with the ITU-T, the same bodies that gave us MPEG-2 and H.264 over the last 20+ years, with MPEG-LA handling the patent licensing side. The number of companies with an interest in getting some skin in this game is large and growing. I am sure that HEVC was a mess of a headache to contend with.
This is where AV1 diverges. I think there’s a lot less politics going on in AOMedia at the moment than in MPEG. Probably due to 2 main reasons:
- It is a newer organization, starting fresh. There’s politics there as there are multiple companies and many people, but since it is newer, the amount of politics involved will be lower than an organization that has been around for 20+ years
- There’s less money involved. No royalties means no pie to split between patent holders, so there are fewer fights about whose tools and techniques get incorporated into the specification
The end result? The design is simpler, which makes for better implementations that are just easier to develop.
AV1 IRL
In real life, we’re yet to see if AV1 performs better than HEVC and in what ways.
Current estimates are that AV1 performs equal to or better than HEVC when it comes to real time. That’s because AV1 has better tools for a similar computation load than what can be found in HEVC.
So… if you have all the time in the world to analyze the video and pick your tools, HEVC might end up with better compression quality, but for the most part, we can’t really wait that long when we encode video – unless we encode the latest movie coming out from Hollywood. For the rest of us, faster will be better, so AV1 wins.
The exact comparison isn’t there yet, but I was told that experiments done on the implementations of both AV1 and HEVC show that AV1 is equal to or better than HEVC.
Streaming, Real Time and SVC
There is something to be said about real time, which brings me back to WebRTC.
Real time low delay considerations of AV1 were discussed from the onset. There are many who focus on streaming and offline encoding of videos within AOMedia, like Netflix and Hulu. But some of the founding members are really interested in real time video coding – Google, Facebook, Cisco, Polycom and Vidyo to name a few.
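Scalable Video Coding (SVC) lets a server or receiver thin out a single encoded stream simply by dropping layers, instead of re-encoding. The sketch below illustrates the general idea for temporal scalability only, using the common dyadic three-layer pattern (layer IDs 0, 2, 1, 2 repeating) seen in real-time codecs. It is a conceptual illustration, not AV1 bitstream syntax; the `forward` helper standing in for an SFU (selective forwarding unit) and the 30 fps capture rate are assumptions.

```python
from typing import List


def temporal_layer(frame_index: int, num_layers: int = 3) -> int:
    """Temporal layer ID for a dyadic hierarchical pattern.

    With 3 layers the repeating pattern is 0, 2, 1, 2. In such a pattern, frames
    are meant to reference only equal or lower layers, so the highest layers can
    be dropped without breaking decoding of the rest.
    """
    period = 1 << (num_layers - 1)  # 4 frames per cycle for 3 layers
    pos = frame_index % period
    if pos == 0:
        return 0
    # The number of trailing zero bits of the position decides the layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def forward(frames: range, max_layer: int, num_layers: int = 3) -> List[int]:
    """Frame indices a hypothetical SFU would forward to a receiver limited to max_layer."""
    return [f for f in frames if temporal_layer(f, num_layers) <= max_layer]


if __name__ == "__main__":
    fps = 30  # assumed capture rate
    one_second = range(fps)
    for layer in range(3):
        print(f"up to layer {layer}: {len(forward(one_second, layer))} frames per second")
```

Each extra layer roughly doubles the frame rate, so a constrained receiver gets a smooth but lower frame rate from the same stream, which matters for the low-delay, multi-party use cases the real time members care about.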
Polycom and Vidyo are chairing the real time work group, and SVC is considered a first-class citizen within AV1. It is being incorporated into the specification from the start, instead of being bolted on later as was done with H.264 and VP9.
Low bitrate
Then there’s the aspect of working at low bitrates.
With the newer codecs, you see a real desire to push the envelope. In many cases, this means increasing the resolutions and frame rates a video codec supports.
As far as I understand, a lot of effort is being put into AV1 on the other side of the scale – working at low resolutions and doing that really well. This is important for Google, for example, if you look at what they decided to share about VP9 on YouTube:
For YouTube, it isn’t only about 4K and UHD – it is about getting videos streamed everywhere.
Based on many of the projects I am involved with today, I can say that there are a lot of developers out there who don’t care too much about HD or 4K – they just want decent video to get through, and that means VGA resolutions or even less. Being able to do that with lower bitrates is a boon.
Is AV1 “next gen”?
I have always considered AV1 to be the next next generation:
We have H.264 and VP8 as the current generation of video codecs, then HEVC and VP9 as the next generation, and then there’s AV1 as the next next generation.
In my mind, this is what you’d get when it comes to compression vs power requirements:
Alex opened my eyes here, explaining that reality is slightly different. If I try translating his words to a diagram, here’s what I get:
AV1 is an improvement over HEVC but probably isn’t a next generation video codec. And this is an advantage. When you start working on a new generation of a codec, the work necessary is long and arduous. Look at H.261, H.263, H.264 and HEVC codec generations:
Here are some interesting things that occurred to me while placing the video codecs on a timeline:
- The year indicated for each codec is the year in which an initial official release was published
- Understand that each video codec went through iterations of improvements, annexes, appendices and versions (HEVC already has 4 versions)
- It takes 7-10 years from one generation until the next one gets released. On the H.26x track, the number of years between generations has grown over time
- VP8 and VP9 have only 4 years between one and the other. It makes sense, as VP8 came late in the game, playing catch-up with H.264 and VP9 is timed nicely with HEVC
- AV1 comes only 6 years after HEVC. Not enough time for research breakthroughs that would suggest a brand new video codec generation, but probably enough to make improvements on HEVC and VP9
AOMedia has been working towards this important milestone for quite some time – the 1.0 version specification of AV1.
The first thing I thought when seeing it was: they got there faster than WebRTC 1.0. WebRTC was announced 6 years ago, and its 1.0 is only now about to be finalized (it has been “about to be” since 2015). AOMedia started in 2015 and it now has its 1.0 ready.
The second one? I was interested in the quotes at the end of that release. They show the viewpoints of the various members involved.
- Amazon – great viewing experience
- Arm – bringing high-quality video to mobile and consumer markets
- Cisco – ongoing success of collaboration products and services
- Facebook – video being watched and shared online
- Google – future of media experiences consumers love to watch, upload and stream
- Intel – unmatched video quality and lower delivery costs across consumer and business devices as well as the cloud’s video delivery infrastructure
- NVIDIA – server-generated content to consumers. […] streaming video at a higher quality […] over networks with limited bandwidth
- Mozilla – making state-of-the-art video compression technology royalty-free and accessible to creators and consumers everywhere
- Netflix – better streaming quality
- Microsoft – empowering the media and entertainment industry
- Adobe – faster and higher resolution content is on its way at a lower cost to the consumer
- AMD – best media experiences for consumers
- Amlogic – watch more streaming media
- Argon Design – streaming media ecosystem
- Bitmovin – greater innovation in the way we watch content
- Broadcom – enhance the video experience across all forms of viewing
- Hulu – Improving streaming quality
- Ittiam Systems – the future of online video and video compression
- NGCodec – higher quality and more immersive video experiences
- Vidyo – solve the ongoing WebRTC browser fragmentation problem, and achieve universal video interoperability across all browsers and communication devices
- Xilinx – royalty-free video across the entire streaming media ecosystem
Apple decided not to share a quote in the press release.
Most of the quotes there are about media streaming, with only a few looking at collaboration and social. This somewhat saddens me when it comes from the likes of Broadcom.
I am glad to see Intel and Arm taking active roles. Both as founding members and in their quotes to the press release. It is bad that Qualcomm and Samsung aren’t here, but you can’t have it all.
I also think Vidyo are spot-on. More about that later.
What’s next for AOMedia?
There’s work to be done within AOMedia with AV1. This is but a first release. There are bound to be some updates to it in the coming year.
Current plans are to have a meaningful software implementation of the AV1 encoder/decoder by the end of 2018, and hardware implementations available sometime during 2019 (most probably toward the end of it). Here’s the announced timeline from AOMedia:
Realistically, mass adoption would happen somewhere in 2020-2022. Until then, we’ll be chugging along with VP8/H.264 and fighting it out around HEVC and VP9.
There are talks about adding a still image format based on the work done in AV1, which makes sense. It wouldn’t be far-fetched to also incorporate future voice codecs into AOMedia. This organization has shown it can bring the industry leaders to the table and come up with royalty-free codecs that benefit everyone.
AV1 and WebRTC
Will we see AV1 in WebRTC? Definitely.
When? Probably after WebRTC 1.0. Or maybe not.
It will take time, but the benefits are quite clear, which is what Alex of Vidyo alluded to in the quote given in the press release:
“solve the ongoing WebRTC browser fragmentation problem, and achieve universal video interoperability across all browsers and communication devices”
We’re still stuck in the challenge of which video codec to select in WebRTC applications.
- Should we go for VP8, just because everyone does, it is there and it is royalty free?
- Or should we opt for H.264, because Safari supports it, and it has better hardware support?
- Maybe we should go for VP9 as it offers better quality, and “suffer” the computational hit that comes with it?
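Whichever way the decision goes, it ends up as an ordering of the codecs the browser offers. Here is a minimal TypeScript sketch, assuming a list shaped like the `codecs` array returned by `RTCRtpReceiver.getCapabilities("video")`. It moves the preferred codec to the front. `preferCodec` is a hypothetical helper, not part of any WebRTC API.

```typescript
// Hypothetical helper: reorder a codec list (shaped like the entries of
// RTCRtpReceiver.getCapabilities("video").codecs) so that every entry with
// the preferred mimeType comes first. The relative order inside each group
// is kept, since the browser may list several profiles of the same codec.
function preferCodec<T extends { mimeType: string }>(
  codecs: readonly T[],
  preferredMimeType: string,
): T[] {
  const wanted = preferredMimeType.toLowerCase(); // compare mime types case-insensitively
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const others = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return [...preferred, ...others];
}
```

In browsers that implement `RTCRtpTransceiver.setCodecPreferences()`, the result could be applied as `transceiver.setCodecPreferences(preferCodec(RTCRtpReceiver.getCapabilities("video")!.codecs, "video/VP9"))`. Support for that method has differed between browsers, so check each target browser before relying on it.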
AV1 is to video coding what Opus is to audio coding. That article I wrote in 2013? It is now becoming true for video. Once adoption of AV1 hits – and it will in the next 3-5 years – the dilemma of which video codec to select will be gone.
Until then, check out this free mini course on how to select the video codec for your application
The post AV1 Specification Released: Can we kiss goodbye to HEVC and royalty bearing video codecs? appeared first on BlogGeek.me.
Demand for WebRTC developers is stronger than supply.
My inbox is filled with requests for experienced WebRTC developers on a daily basis. These range from entrepreneurs looking for a technical partner to managers searching for outsourcing vendors to help them out. My only challenge here is that developers and testers who know a thing or two about WebRTC are hard to find. Developers who are aware of the media stack in WebRTC, and haven’t just dabbled with a GitHub “hello world” demo, are truly rare.
This is why I created my WebRTC course almost 2 years ago. The idea was to share my knowledge and experience around VoIP, media processing and of course WebRTC with people who need it. This WebRTC training has been a pleasant success, with over 200 people having taken it already. And now it is time for the 4th round of office hours for this course.
Who is this WebRTC training for?
This WebRTC course is for anyone who is using WebRTC in their daily work, directly or indirectly. Developers, testers, software architects and product managers will be those who benefit from it the most.
It has been designed to give you the information necessary from the ground up.
If you are clueless about VoIP and networking, then this course will guide you through the steps needed to get to WebRTC. It explains what TCP and UDP are and how HTTP and WebSockets fit on top of TCP, and then moves on to the acronyms used by WebRTC (SRTP, STUN, TURN and many others).
If you have VoIP knowledge and experience, then this course will cover the missing parts – where WebRTC fits into your world, and what to pay special attention to, assuming a VoIP background (WebRTC brings with it a different mindset to the development process).
What I didn’t want to do is have a course so focused on the specification that: (1) it becomes irrelevant the moment the next Chrome browser is released; (2) it doesn’t explain the ecosystem around WebRTC or give you design patterns for common use cases. This is why I baked into the course a lot of materials around higher level media processing, the WebRTC ecosystem and common architectures in WebRTC.
TL;DR – if you follow this blog and find it useful, then this course is for you.
Why take it?
The question should be why not?
There are so many mistakes and bad decisions I see companies making with WebRTC. From deciding how to model their media routes, to where to place their TURN servers (or how to configure them). Through how to design for scale-out, to which open source frameworks to pick. Such mistakes end up a lot more expensive than any online course would ever be.
In April, next month, I will be starting the next round of office hours.
While the course is pre-recorded and available online, I conduct office hours for a span of 3-4 months twice a year. In these live office hours I go through parts of the course, share new content and answer any questions.
What does it include?
The course includes:
- 40+ lessons split into 7 different modules with an additional bonus module
- 15 hours of video content, along with additional links for extra reading material
- Several e-books available only as part of the course, like how the Jitsi team scales Jitsi Meet, and what characteristics are sought after in WebRTC developers
- A private online forum
- The office hours
In the past two months I’ve been working on refreshing some of the content, getting it up to date with recent developments. We’ve seen Edge and Safari introduce WebRTC during that time, for example. These lessons will be updated in the course before the official launch.
When can I start?
Whenever you want. In April, I will be officially launching the office hours for this course round. At that point in time, the updated lessons will be part of the course.
What’s more, there will be a new lesson added – this one about WebRTC 1.0. Philipp Hancke was kind enough to host this lesson with me as a live webinar (free to attend live) that will become an integral lesson in the course.
If you are interested in joining this lesson live:
You can always take it later on, but I won’t be able to guarantee pricing or availability of the office hours at that point in time.
If you plan on doing anything with WebRTC in the next 6 months, you should probably enroll today.
And by the way – if you need to come as a team to up the knowledge and experience in WebRTC in your company, then there are corporate plans for the course as well.
CONTENT UPGRADE: If you are serious about learning WebRTC, then check out my online WebRTC training:
Monitoring focus is shifting from server-side to client-side in WebRTC statistics collection.
WebRTC happens to decentralize everything when it comes to VoIP. We’re on a journey here to shift the weight from the backend to the edge devices. While the technology in WebRTC isn’t any different than most other VoIP solutions, the way we end up using it and architecting our services around it is vastly different.
One of the prime examples here is how we shifted focus for group calling from an MCU mixing model to an SFU routing model. Suddenly, almost overnight, the notion of deploying an MCU started to seem ridiculous. And believe me – I should know – I worked at a company where over 60% of the business came from MCUs.
The shift towards SFU means we’re leaning more on the capabilities and performance of the edge device, giving it more power in the interaction when it comes to how to lay out the display, instead of doing all the heavy lifting in the backend using an MCU. The next step here will be to build mesh networks, though I can’t see that future materializing any time soon.
VoIP != WebRTC. Maybe not from a purely technical point of view, but definitely from how we end up using it. If you need to learn more about WebRTC, then my WebRTC training is exactly what you need:
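One way to see where the load lands in each model is to count streams per participant. The sketch below assumes every participant publishes a single video stream (no simulcast), that an SFU forwards each stream to everyone else, and that an MCU sends each participant one composed stream. `streamCounts` is illustrative only.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamCounts {
  clientUp: number;   // streams each client sends
  clientDown: number; // streams each client receives
  serverIn: number;   // streams the media server receives (0 for mesh)
  serverOut: number;  // streams the media server sends (0 for mesh)
}

// Stream counts for a call with `participants` people, under the
// one-stream-per-participant assumption stated above.
function streamCounts(topology: Topology, participants: number): StreamCounts {
  const n = participants;
  if (n < 2) throw new RangeError("need at least two participants");
  switch (topology) {
    case "mesh": // every client sends its stream directly to every other client
      return { clientUp: n - 1, clientDown: n - 1, serverIn: 0, serverOut: 0 };
    case "sfu": // one upload per client; the server fans each stream out
      return { clientUp: 1, clientDown: n - 1, serverIn: n, serverOut: n * (n - 1) };
    case "mcu": // the server decodes, mixes and re-encodes one stream per client
      return { clientUp: 1, clientDown: 1, serverIn: n, serverOut: n };
  }
}
```

For a 5-person call this gives each client 1 stream up and 4 down with an SFU (the server forwards 20), 4 up and 4 down in a mesh, and 1 up and 1 down with an MCU. The MCU’s light client load is paid for by decoding and re-encoding every stream on the server.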
What I wanted to mention here is something else that is happening, playing towards the same trend exactly – we are moving the collection of VoIP performance statistics (or more accurately WebRTC statistics) from the backend to the edge – we now prefer doing it directly from the browser/device.
VoIP Statistics Collection and Monitoring
If you are not familiar with VoIP statistics collecting and monitoring, then here’s a quick explainer for you:
VoIP is built around the notion of interoperability. Developers build their products and then test them against the spec and in interoperability events. Then those deploying them integrate, install and run a service. Sometimes this ends up using a single vendor, but more often than not, multiple vendor products run in the same deployment.
There is no real specification or standard for how monitoring needs to happen or what statistics can, should or are collected. There are a few means of collecting that data, and one of the most common approaches is employing HEP/EEP. As the specification states:
The Extensible Encapsulation protocol (“EEP”) provides a method to duplicate an IP datagram to a collector by encapsulating the original datagram and its relative header properties (as payload, in form of concatenated chunks) within a new IP datagram transmitted over UDP/TCP/SCTP connections for remote collection. Encapsulation allows for the original content to be transmitted without altering the original IP datagram and header contents and provides flexible allocation of additional chunks containing additional arbitrary data. The method is NOT designed or intended for “tunneling” of IP datagrams over network segments, and best serves as vector for passive duplication of packets intended for remote or centralized collection and long term storage and analysis.
Translating this to plain English: media packets are duplicated for the purpose of sending them off to be analyzed via a monitoring service.
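To make the encapsulation concrete, here is a minimal TypeScript sketch of building a HEPv3 packet. It writes a `HEP3` marker and a 16-bit total length, then a series of chunks. Each chunk holds a 16-bit vendor ID, a type ID and a length (counting the 6-byte chunk header), followed by the value. The chunk type numbers come from my reading of the HEPv3 spec and should be verified against it. `duplicateForCollector` is a hypothetical helper. A real capture agent would also add IP family, address, timestamp and protocol-type chunks, and then send the result to the collector over UDP/TCP/SCTP.

```typescript
// Generic (vendor 0x0000) chunk type IDs, as I read them in the HEPv3 spec.
// Treat these values as assumptions and check them against the spec.
const CHUNK_SRC_PORT = 0x0007;
const CHUNK_DST_PORT = 0x0008;
const CHUNK_CAPTURE_ID = 0x000c;
const CHUNK_PAYLOAD = 0x000f; // the duplicated packet itself

interface Chunk { type: number; value: Uint8Array; }

function u16(n: number): Uint8Array {
  const b = new Uint8Array(2);
  new DataView(b.buffer).setUint16(0, n); // DataView defaults to big-endian (network order)
  return b;
}

function u32(n: number): Uint8Array {
  const b = new Uint8Array(4);
  new DataView(b.buffer).setUint32(0, n);
  return b;
}

// Wrap chunks in a HEP3 envelope: "HEP3", total length, then per chunk
// [vendor id, type id, chunk length including its 6-byte header, value].
function encodeHep3(chunks: Chunk[]): Uint8Array {
  const total = 6 + chunks.reduce((sum, c) => sum + 6 + c.value.length, 0);
  if (total > 0xffff) throw new RangeError("HEP3 packet too large");
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  out.set([0x48, 0x45, 0x50, 0x33], 0); // ASCII "HEP3"
  view.setUint16(4, total);
  let offset = 6;
  for (const c of chunks) {
    view.setUint16(offset, 0x0000);              // vendor: generic
    view.setUint16(offset + 2, c.type);          // chunk type
    view.setUint16(offset + 4, 6 + c.value.length);
    out.set(c.value, offset + 6);
    offset += 6 + c.value.length;
  }
  return out;
}

// Hypothetical helper: duplicate one captured packet for a remote collector.
function duplicateForCollector(
  packet: Uint8Array, srcPort: number, dstPort: number, captureId: number,
): Uint8Array {
  return encodeHep3([
    { type: CHUNK_SRC_PORT, value: u16(srcPort) },
    { type: CHUNK_DST_PORT, value: u16(dstPort) },
    { type: CHUNK_CAPTURE_ID, value: u32(captureId) },
    { type: CHUNK_PAYLOAD, value: packet }, // original bytes, untouched
  ]);
}
```

The original packet travels as an opaque chunk value, which matches the spec’s point that encapsulation does not alter the original datagram.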
The duplication of the packets happens in the backend, through the different media servers that can be found in a VoIP network. Here’s how it is depicted on HOMER/SIPCAPTURE’s website:
HOMER collects its data directly from the servers – OpenSIPS, FreeSWITCH, Asterisk, Kamailio – there are no user devices here, just backend servers.
Other systems rely on the switches, routers and network devices that again reside in the backend infrastructure. Since in VoIP production networks, we almost always route the media through the backend servers, the assumption is that it is easier to collect it here where we have more control than from the devices.
This works great, but it isn’t really needed or helpful with WebRTC.
WebRTC Statistics Collection and Monitoring
With WebRTC, there are only a handful of browsers (4 to be exact), and they all adhere to the same API (that would be WebRTC). And they all have that thing called getStats() implemented in them. It returns the same information you find in chrome://webrtc-internals.
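As a rough illustration of what can be derived from getStats(), the sketch below compares two snapshots and computes the packet loss fraction of each incoming RTP stream over that interval. It reads only the `type`, `packetsReceived` and `packetsLost` fields of `inbound-rtp` entries. It relies on the report being maplike (stats id to stats object). `lossFraction` and the `InboundRtpLike` shape are assumptions of this sketch.

```typescript
// Only the fields this sketch reads from an RTCStatsReport entry.
interface InboundRtpLike {
  id: string;
  type: string;             // e.g. "inbound-rtp", "outbound-rtp", "candidate-pair"
  packetsReceived?: number; // cumulative since the stream started
  packetsLost?: number;     // cumulative since the stream started
}

// Packet loss per incoming RTP stream between two getStats() snapshots.
// The counters are cumulative, so the interval's values are the differences.
function lossFraction(
  prev: ReadonlyMap<string, InboundRtpLike>,
  curr: ReadonlyMap<string, InboundRtpLike>,
): Map<string, number> {
  const result = new Map<string, number>();
  curr.forEach((stat, id) => {
    if (stat.type !== "inbound-rtp") return;
    const before = prev.get(id);
    if (before === undefined) return; // stream appeared during the interval
    const received = (stat.packetsReceived ?? 0) - (before.packetsReceived ?? 0);
    // packetsLost can decrease (e.g. duplicated packets), so clamp at 0.
    const lost = Math.max(0, (stat.packetsLost ?? 0) - (before.packetsLost ?? 0));
    const expected = received + lost;
    result.set(id, expected > 0 ? lost / expected : 0);
  });
  return result;
}
```

In a browser, this could be fed two reports taken a second apart: `const before = await pc.getStats();`, wait, `const after = await pc.getStats();`, then `lossFraction(before, after)`, where `pc` is your `RTCPeerConnection`.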
Many deployments end up running peer-to-peer, having the media traverse directly through the internet and not through the backend of the service itself. Google Hangouts decided to take that route two years ago. Jitsi added this capability under the name Jitsi P2P4121. How do these services control and understand the quality their users experience?
If you look at other media servers out there, most of them are only a few years old. WebRTC is just 6 years old now. So everyone’s focused on features and stability right now. Quality and monitoring are not in their focus area just yet.
Last, but not least, WebRTC is encrypted. Always. And everywhere. So sniffing packets and deducing quality from them isn’t that easy or accurate any longer.
This led WebRTC applications to focus on gathering WebRTC statistics directly from the browsers and devices, and not on trying to get that information from the media servers.
The end result? Open source projects such as rtcstats and commercial services such as callstats.io. At the heart of these, WebRTC statistics get collected using the getStats() API at an interval of one or more seconds and sent over to a monitoring server, where they are collected, stored, aggregated and analyzed. We use a similar mechanism at testRTC to collect, analyze and visualize the results of our own probes.
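Here is a simplified sketch of such a collection loop. It assumes a `send` callback that ships data to your own monitoring server, for example over a WebSocket. To keep each upload small, it sends only the fields that changed since the previous sample. This is similar in spirit to what rtcstats does, but it is not rtcstats’ actual code or wire format. Stats objects that disappear between samples are not reported here.

```typescript
type StatsObject = Record<string, unknown>;
type Snapshot = Record<string, StatsObject>; // stats id -> stats fields

// Keep only the fields that changed since the previous snapshot.
function deltaCompress(prev: Snapshot, curr: Snapshot): Snapshot {
  const delta: Snapshot = {};
  for (const [id, stat] of Object.entries(curr)) {
    const before = prev[id] ?? {};
    const changed: StatsObject = {};
    for (const [key, value] of Object.entries(stat)) {
      if (before[key] !== value) changed[key] = value;
    }
    if (Object.keys(changed).length > 0) delta[id] = changed;
  }
  return delta;
}

// Poll getStats() every `intervalMs`, pass each delta to `send`,
// and return a function that stops polling.
function startStatsCollection(
  getStats: () => Promise<ReadonlyMap<string, StatsObject>>,
  send: (delta: Snapshot) => void,
  intervalMs = 1000,
): () => void {
  let prev: Snapshot = {};
  const timer = setInterval(async () => {
    try {
      const report = await getStats();
      const curr: Snapshot = {};
      report.forEach((stat, id) => {
        curr[id] = { ...stat }; // copy into a plain, serializable object
      });
      const delta = deltaCompress(prev, curr);
      prev = curr;
      if (Object.keys(delta).length > 0) send(delta);
    } catch {
      // Skip this tick if getStats() fails (e.g. the connection was closed).
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Usage could look like `const stop = startStatsCollection(() => pc.getStats(), (d) => ws.send(JSON.stringify(d)));`, where `pc` is an `RTCPeerConnection` and `ws` an open `WebSocket` (both placeholders). Call `stop()` when the call ends. The server side then stores, aggregates and analyzes these samples.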
What does that give us?
- The most accurate indication of performance for the end user – since the statistics are collected directly on the user’s device, there’s no loss of information from backend collection
- Easy access to the information – data is collected in a uniform way, one you can also implement inside native mobile and desktop apps that use WebRTC
- Increased reliance on the edge, a trend we see everywhere with WebRTC anyway
WebRTC changes a lot of things when it comes to how we think about and architect VoIP networks. How and why this plays out for statistics and monitoring is something I haven’t seen discussed much, so I wanted to share it here.
The reason for that is threefold:
- Someone asked me a similar question on my contact page in the last couple of days, so it made sense to write a longform answer as well
- We’re contemplating at testRTC offering a passive monitoring product to use “on premise”. If you want to collect, store and analyze your own WebRTC statistics without giving it to any third party cloud service, then ping us at testRTC
- My online WebRTC training is getting a refresher and a new round of office hours. This all starts in April. Time to enroll if you want to educate yourself on WebRTC
The post How WebRTC Statistics and Performance Monitoring Changed VoIP Monitoring appeared first on BlogGeek.me.
Twilio Flex is a peek into the future of enterprise software.
This week, Twilio announced a new product called Flex. The name and the broad strokes about what Flex is found their way to TechCrunch some two weeks ago. I wanted to share my thoughts about Twilio Flex.
A few notes before I start
- Twilio isn’t paying me for writing this
- They are a customer in other areas, but this one is all me. I think Flex (as well as Studio, Engagement Cloud, Functions, etc.) are interesting products coming from Twilio, and they are worth a long form analysis and review
- Articles on BlogGeek.me are never paid for. Neither are guest posts or interviews. If something interests me, I’ll write about it
- The information here is based mainly on a briefing I received about Flex and what I found since then on other sites (and on Twilio’s website)
- Flex is a departure from many things Twilio has been doing, making it an interesting initiative to analyze
Twilio Flex is CCaaS (Contact Center as a Service). It isn’t the first one. Twilio is touting it as a Programmable Contact Center, in line with the “Programmable” naming of its core products.
Here’s Jeff Lawson’s keynote from Enterprise Connect. As usual, Jeff’s keynotes are worth the time and attention:
Twilio tries to differentiate Flex from existing solutions by making it a fully functional contact center solution that is Flexible enough to customize and modify. It has APIs, but the day-to-day users won’t see them, and many of the needed customizations don’t require digging deep into the API layer either. That’s at least the intent (I haven’t had the chance to see the integration and API layers of Flex yet).
Twilio highlights 5 main benefits with Flex:
- Unlimited customization – through the lower layers of Twilio’s product portfolio, along with a new addition to it, the Flex UI (not a lot/enough was explained about it thus far)
- Instant omnichannel – support for multiple communication channels. More on this later
- Contextual intelligence – Twilio’s ML/AI roadmap lies here
- Trusted scale – due to its use of the Twilio infrastructure
- 2 million developers – that’s the number of Twilio registered developers
Flex fits well into one of Twilio’s largest market segments – the contact center. And there, Twilio are aiming for the contact centers sizing 1,000+ seats. The big boyz.
As Twilio was working to move up the food chain, offering ever larger components, moving beyond developers towards end users in the B2B space, and into contact centers, made sense.
Flex and the Twilio Portfolio
If I had to map the road Twilio is taking with its portfolio, it would end up being something like this (I’ve removed a lot of the products for simplicity):
Transactional: It started with SMS and Voice, adding VoIP services and later on expanding horizontally to other components and building blocks such as IP Messaging and others. In this layer, and to some extent in Omnichannel, Twilio’s focus is in a horizontal expansion towards “Best of Suite” offering.
Omnichannel: In 2017, Twilio added the Twilio Engagement Cloud. It placed a few existing products from its portfolio in that layer and added Notify and Proxy to them. They stated that these are “Declarative APIs” that express general intent while including logic of their own. At the end of the day, many of the products/APIs in this layer are Omnichannel – they work across channels using the one available/preferred/whatever for the task at hand.
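To illustrate what a declarative, omnichannel call can hide behind the scenes, here is a hypothetical TypeScript sketch of channel selection. The caller states the intent (reach this person), and the logic picks the channel. This is not the Notify or Proxy API. The `Recipient` shape, the channel names and the selection rule are all assumptions made for the example.

```typescript
// Hypothetical channel set and recipient model, for illustration only.
type Channel = "voice" | "sms" | "chat" | "email" | "messenger";

interface Recipient {
  preferred: Channel[];      // the user's own preference order
  reachableOn: Set<Channel>; // channels we hold an address or handle for
}

// Pick the first preferred channel that is both reachable and supported;
// otherwise fall back to the first supported channel the user is reachable on.
function pickChannel(r: Recipient, supported: readonly Channel[]): Channel | undefined {
  const usable = (c: Channel) => r.reachableOn.has(c) && supported.includes(c);
  return r.preferred.find(usable) ?? supported.find((c) => r.reachableOn.has(c));
}
```

Adding a new channel then means extending `Channel` and the `supported` list, while callers keep expressing the same intent. That is the property the layer above builds on.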
Visual: This is where the story became really interesting. Twilio added Studio to its portfolio. It went up the food chain again, this time, with a visual IDE and a message that Twilio is no longer a company that serves only developers, but one that can be used by others within the organization.
Programmable Enterprise Software: This is where Flex comes in, going up the food chain again. This time, offering a solution that doesn’t interact with the end users only as a consequence (a phone rings), but rather has a new set of users – people who aren’t developers or planners who sit in front of the tool every day and use it. The contact center agents and personnel.
Flex was defined to me in the domain of “Programmable Applications”. Twilio is, in a way, trying to do two things with this definition:
- Programmable means it isn’t diverging from its roots completely, just taking the obvious next step in its evolution. All of its core products are Programmable X (X being SMS, Voice, Video, …)
- It allows it to position Flex not as another contact center, but rather as something new that is different
To me it is about the future of enterprise software and how to make it programmable and flexible in ways that are still impossible today. The closest to that we’ve got is probably having so many vendors integrate with Zapier.
I am sold on that kind of a future, but I am not sure others will be.
Flex Channels Proposition
Flex leans on a lot of other products in Twilio’s portfolio. One of its core values lies in omnichannel, and the fact that Twilio is already investing in a programmable layer that handles that (the Engagement Cloud). The proposition here is that whatever Twilio adds as a channel for developers, gets almost automatically added to Flex for its contact center customers.
Out the door, Flex comes with support for Voice, SMS, Chat, Video, Email, Fax, Twitter DM, Google RCS, Facebook Messenger and LINE. It also includes Screen Sharing and Co-Browsing as additional capabilities within the interactions. Developers can add additional channels to customize their contact center as well.
The list of channels is impressive, but somehow Apple Business Chat is missing from that list. Apple’s launch partners in this case were contact center vendors (LivePerson, Nuance, Genesys and Salesforce). Twilio, which is still recognized solely as a CPaaS vendor, didn’t make the cut. I am sure Twilio tried becoming a partner, so this is more likely a decision made by Apple. I am also sure that once Apple opens up Business Chat to more developers, Twilio will be adding support for it.
The biggest promise here? Twilio is already committed to omnichannel in its products, and Flex will benefit from that commitment, as will Flex’s customers.
Think you know how WebRTC fits in a contact center? Check out The Complete WebRTC Contact Center Uses Swipefile.
Machine Learning and Artificial Intelligence in Flex
A year or two ago, ML and AI in CPaaS was science fiction. Twilio, as well as its competitors, dealt in real time, in transactional and transient communications. If any machine learning work was taking place, it was in the operational layers – in an effort to optimize the cost and deliverability of its service to its customers.
Last year, Twilio launched Understand, a layer built on top of Google’s Natural Language Processing (NLP) capabilities. Understand is where Twilio started looking into ML and AI in the context of actual services for its customers. It looks at the problem domain of its customers (mainly contact centers) and tries to offer higher level APIs that are easier to use and are targeted at NLU (Natural Language Understanding). This then gets focused on the specific domain of the customer’s needs, and you get something that is usable today (as opposed to building a general purpose AI such as Siri, Alexa or Google Assistant).
The result in Understand is a way to simplify the development processes and requirements for Twilio’s customers when it comes to NLU.
That also got wrapped into Flex, at least on slides.
My feelings? The AI story of Flex is built out of two parts:
- Collecting all the existing ML/AI/intelligent related capabilities of Twilio and wrapping them inside Flex. This is done through internal APIs as well as via partners
- Having a roadmap vision / story of what AI means in Flex moving forward
AI being the holy grail that it is, you can’t ignore it when launching a new service these days.
Flex Pricing is Key
Pricing for Flex hasn’t been announced, but one thing was made clear – it will be based on a per seat price and not usage based like other Twilio products.
This is where things get somewhat challenging for Twilio, and here’s why:
- So far, Twilio has been comfortable offering a usage based model. Switching to a per seat model will change how it calculates its revenue and margins
- By opting for per seat pricing, Twilio falls into the contact center industry “comfort zone” – the model is known and accepted already
- But this also makes comparing Twilio Flex pricing to other contact centers rather “easy”. It means I can now compare apples to apples when selecting between Flex and any other vendor
- We don’t have price points, but if the price point is based on the industry average or accepted standard, then many analysts and experts will end up saying that there’s no disruption or anything new in Twilio Flex. For the pundits, Flex may seem like an ordinary contact center, and with that mindset, without price disruption there can be no disruption
- If the price points are too high, then Twilio will be going after its own contact center customers, who will see this as direct competition. Such a move can signal to others that Twilio is willing to go into their turf as well. It will call into question the potential and attractiveness of joining the Flex marketplace
- If the price points are lower, then where will Twilio’s margins come from?
My guess is that Twilio is still looking for price validation; it is doing so this week at Enterprise Connect and plans to continue doing so in the coming weeks until it is ready to announce the price points publicly.
Who is Twilio Flex for?
This is the main question, and one I am not sure I know the answer to.
Twilio is saying the target audience is 1,000+ seats contact centers. It makes sense to go for the larger contact centers at a time when the transition towards the cloud and digital transformations of contact centers is happening more.
But would I be using it in my business or go through a third party?
Should a Twilio customer that built a contact center on its own on top of Twilio migrate to Flex?
Should a Twilio customer that built a contact center for others to use on top of Twilio see Flex as a threat or as an opportunity to improve its own contact center offering?
Twilio stated that 89% of contact centers today are still deployed on premise, and that the market is large enough. These statements were made to answer two questions:
- The market is big enough for both its existing customers and for Flex, so it isn’t competing directly with its customers (I guess its customers will have to decide if that’s true for them or not)
- The market is big enough for Twilio to grow in. Twilio is relying on that to keep growing
Twilio was already trending upwards when word of Flex was leaked by TechCrunch on Feb 17, and it has been rising since:
Whether that is related to Flex or not, I can’t say. To me, going to contact centers as an adjacent market and eating up more of the pie there is a bold move. If it succeeds, then Twilio will be much bigger than it is today.
The Unknowns
There are things that are still unknown to me here. They are technical ones, but important for my own perspective and analysis. They are related to what wasn’t directly in the briefing or the materials I’ve seen since the official announcement.
Here are a few things I am really interested in:
- What are the exact integration points for Flex?
- How are developers expected to integrate with it?
- Where do you use Twilio APIs? Where will you be making use of Twilio Studio? Where do you write a Twilio Function? How about Twilio Understand?
- Flex UI is brand new. How does it fare as a standalone product enabler? What can developers do with it?
- What will it mean to integrate Flex with a CRM? Does it make more sense to integrate the CRM into the Flex UI or does it make more sense to integrate Flex into the CRM UI?
- What parts of “contextual intelligence” really exist in Flex today? How does it compare to existing market offerings?
- What do contact center vendors using Twilio think about Flex? How will they react to it?
Here’s one way to map the communications landscape:
And here’s another:
What’s your worldview here?
The post Twilio Flex = Twilio Flexing its Flexibility (or the programmable contact centers) appeared first on BlogGeek.me.