Video, in the hands of the right company, can be a powerful thing.
In 2012, Telefonica acquired TokBox. I wrote about it at the time – almost 6 years ago. It is sad to read that piece about the TokBox acquisition again. I suggested three areas where Telefonica could make a difference with TokBox. Let’s see what happened.
What Could Telefonica do with TokBox?
What I said in 2012:
Will Telefonica wait the same amount of time it did with Jajah until it does something with this acquisition? I hope they will move faster this time…
Telefonica did nothing with TokBox. They haven’t integrated them into anything. They decided to leave TokBox independent.
This has helped TokBox grow over these 6 years into one of the dominant players in video APIs for real time communications. Almost every developer and initiative I talk to that decided to go for a 3rd party platform chose TokBox. I see others as well, but not as frequently.
Since the acquisition, TokBox:
Telefonica failed to make use of TokBox. It didn’t go into video with it. It didn’t try to figure out VoIP. It didn’t try to understand why developers chose TokBox. Telefonica did nothing other than let TokBox continue on its trajectory. That is probably why Telefonica lost interest and decided to sell TokBox to Vonage.
Telefonica plans on folding TokBox into BlueVia, but how will they combine TokBox, if at all, with their Tu Me VoIP OTT service?
Telefonica made no use of its strengths to find synergies with TokBox. Would doing so have killed TokBox altogether, or could it have made them stronger?
What will Telefonica do about voice? Their main API set doesn’t seem to include voice calling, but now it has video… will they be going for Twilio or Voxeo for that one? Or will they roll out their own? Will they skip voice altogether?
TokBox doubled down on video, beefing up their capabilities in that domain. It has a SIP connector, but nothing more than that. It is a missed opportunity.
Where is TokBox today?
TokBox is video communication APIs. There are other vendors out there doing that today: Twilio, Vidyo.io, Agora, Sinch, Voximplant, Temasys and probably a few others I forgot to mention (sorry for missing out on you).
TokBox is the market leader here when it comes to breadth of features in the video space.
It just wasn’t enough to get them to more customers and garner more than $35 million in the acquisition. I’d attribute this to:
Does this say anything about the market of video APIs? The viability of it to other vendors? The importance of video in the bigger picture?
I don’t really know.
Where are we with Video CPaaS?
Video CPaaS, and in a way we can extend it to WebRTC CPaaS vendors – those who don’t dabble too much with PSTN voice and/or SMS – is a finicky market. The vendors that get acquired in this space are either gobbled up, never to be seen again (think AddLive or Requestec), or they just don’t grow fast enough or become as big as their PSTN voice/SMS counterparts.
IDC maintains that the U.S. programmable video market will be a $7.4 billion opportunity by 2022, representing more than a 140% four-year CAGR. Assuming only 10% of that becomes a reality, the question becomes who will be the winners in programmable video?
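To put those figures in perspective, here is a back-of-the-envelope calculation. It assumes the growth compounds annually for exactly four years, that the CAGR is exactly 140% rather than "more than" 140%, and that "10% of that" means 10% of the $7.4 billion figure. All three are my reading of the numbers, not something IDC spells out here.

```python
# Back-of-the-envelope check of the IDC projection quoted above.
# Assumptions (mine, not IDC's): growth compounds annually for exactly
# four years, and the CAGR is exactly 140% rather than "more than" 140%.

def implied_base(target: float, cagr: float, years: int) -> float:
    """Starting market size implied by a target size and a CAGR."""
    return target / (1 + cagr) ** years


def implied_cagr(base: float, target: float, years: int) -> float:
    """CAGR needed to grow from base to target in the given years."""
    return (target / base) ** (1 / years) - 1


TARGET_BILLIONS = 7.4
base = implied_base(TARGET_BILLIONS, cagr=1.40, years=4)
pessimistic_target = 0.10 * TARGET_BILLIONS
pessimistic_cagr = implied_cagr(base, pessimistic_target, years=4)

print(f"Implied starting market: ${base * 1000:.0f} million")
print(f"10% scenario: ${pessimistic_target:.2f} billion, "
      f"CAGR of {pessimistic_cagr:.0%}")
```

Under these assumptions the starting point is roughly $223 million, and even the 10% scenario still implies growth of about 35% a year.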
What types of services do they need to offer? What products? Are these lower level APIs, or higher level abstractions? Maybe we’re looking at almost complete solutions with a nice API lipstick on top that get calculated in that $7.4 billion.
Video is here to stay.
It won’t be replacing every voice call. But it definitely has its place.
Otherwise, why did Apple go for group video calls in FaceTime with 32 participants in their latest iOS?
And why did WhatsApp just add group video calls? And why did Instagram add group video calls?
Are they doing it just for fun? Is the market bound to be focused only on larger social networks?
I can’t believe that will be the case.
I came from a video conferencing company. Every year I was promised by management that this year will be the year of video. It never happened.
For the last 5 years, I have been using video so much that the year of video has already passed.
I guess the next question is what year will be the year of video CPaaS?
The difference in these two questions is that the year of video is the year when video became a widespread service. The year of video CPaaS will be the year when video becomes a widespread feature. We’re not there yet, but we’re heading in that direction.
In many ways, TokBox is one of the vendors figuring out how to get there.
Where are we with CPaaS?
CPaaS seems to be different, but only slightly.
Growth in this space, as far as I understand, comes from SMS and PSTN voice. That’s it.
VoIP? WebRTC? IP messaging? Social omnichannel aggregation? Video? All nice-to-have features for now that don’t affect the bottom line enough. And at the moment, they don’t seem to be big enough to fill in the gap when SMS and PSTN voice fall out of favor.
To be a successful CPaaS vendor today, you need to:
The thing about that third point is that it won’t be as simple to achieve as what CPaaS did with SMS and PSTN. In SMS and PSTN, CPaaS needed to act as an aggregator of carriers with a simple API. No one wants to deal with carriers (which is why they fail with these API initiatives when it comes to WebRTC and video services), so friendly CPaaS vendors are a great alternative.
What is the moat/barrier that CPaaS vendors are building in the IP world? Answering this question holds the key to the future of CPaaS.
What will Vonage do with TokBox?
Not have it as a standalone business.
Doing that would mean perpetuating what happened at Telefonica. While not all of it was bad, it didn’t bring the expected growth with it.
Vonage is uniquely positioned here – more than any other vendor in the market, which is probably why it ended up acquiring TokBox.
I’ll go back to my venn diagrams for an explanation here:
TBD – IMAGE HERE
The opportunity space:
Telefonica was never a serious competitor in video CPaaS.
Nexmo, and by extension Vonage, is.
Nexmo is probably second only to Twilio.
TokBox is probably first in video CPaaS.
They combine nicely and offer Nexmo a capability that its competitors don’t have if you look at the breadth of their video offering.
If Vonage executes this well, the end result will be a better CPaaS offering, a better Nexmo and a better Vonage.
If you’re new to WebRTC, Jitsi was the first open source Selective Forwarding Unit (SFU) and continues to be one of the most popular WebRTC platforms. They were in the news last week because their parent group inside Atlassian was sold off to Slack but the team clarified this does not have any impact on the Jitsi […]
The post Suspending Simulcast Streams for Savvy Streamlining (Brian Baldino) appeared first on webrtcHacks.
Simulcast is one of the more interesting aspects of WebRTC for multiparty conferencing. In a nutshell, it means sending three different resolutions (spatial scalability) and different frame rates (temporal scalability) at the same time. Oscar Divorra’s post contains the full details. Usually, one needs an SFU to take advantage of simulcast. But there is a […]
Our AI in RTC report is just about ready. Here are all of its price points.
If you aren’t interested in AI and RTC, then move on – this one isn’t for you.
In the past several months I’ve been adding into my daily activities the creation of a new report – one about AI in RTC.
It has taken its toll – I’ve slept a bit less. Read a bit less. Turned down and postponed a few clients. All in order to get this project going. I’ve partnered with Chad Hart on it, one of my partners in crime at Kranky Geek and a fellow consultant.
We wanted to work on something new and interesting and this seemed to be the right thing to do.
After countless hours in interviews with vendors and suppliers in this space, discussions we had with one another and time spent just looking at the ceiling of my office and thinking, I can say that we’re almost ready with the report. Most of it is already written, and what is left will be completed really soon.
What will you find in this report?
Publication date is scheduled for the end of July. We might miss it by a few days due to editing and some last minute changes.
We’re allowing payment via PayPal and wire transfer inside the US. We don’t have any digital shopping cart, as this is a first for us through Kranky Geek Research. It also means we’re treating each and every purchaser as royalty.
Why wait for the price to rise? Join those who’ve already purchased at our discounted prepublication price. Interested? Just email us.
The post AI in RTC: Final Price Points and End of Prepublication Discount appeared first on BlogGeek.me.
Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere.
I went to the data science summit in Israel a month or so back. It was an interesting day. But somehow, I had to make sure to dodge all the boring autonomous cars sessions. They just weren’t meant for me, as I was wandering around, trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this – right?).
After countless interviews done this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.
There were two interesting themes that relate to the use of AI in video – again – focus is on real time communications:
Guess what – we’re about to incorporate the responses we got on our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC ebook” and a chance to win one of 5 $100 Amazon gift cards – along with our appreciation for helping us out. Why wait?
In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own.
As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise.
The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category means you need 3 types of skills:
Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train their internal workforce for machine learning because it was impossible to hire experienced people in the Valley – Google, Apple, Facebook and Amazon are the main recruiters for that position and they are too competitive in what they offer employees.
Data engineers are probably easier to find and train, but what is it you need them to do exactly?
And then there are product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly. Part of it is by asking, through our current research, how vendors end up adding AI into their products. The answers vary and are quite interesting.
Anyways – lots of hype. Less in the way of real skills out there you can hire for the job.
Autonomous driving is where computer vision is today
If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:
The third one is a very distinct use case. And it is the one that is probably eating away a lot of the talent when it comes to computer vision. The industry as a whole is interested, for some reason, in taking a stab at making cars drive on their own. This is quite a challenge, and it is probably why so many researchers are flocking towards it. A lot of the data being processed in order to get us there is visual data.
The importance of vision in autonomous cars cannot be overstated. This ABC News clip of the recent Uber accident drives that point home. Look at these few seconds explaining things:
“These vehicles are trained to see pedestrians, to see cyclists, to see red lights. So it’s really unclear what went wrong here”
And then you ask a data scientist to deal with boring video meeting recordings to do whatever it is we need to do in real time communications with AI. Not enough fame in it as opposed to self driving cars. Not enough of a good story to tell your friends when you meet them after work.
Computer vision in video meetings is nascent
Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings.
I’d like to break this down into a table:
Computer vision | Video meeting AI
Why this difference? Two main reasons:
As we move forward, companies will start figuring this one out – deciding what the data pipeline for computer vision needs to look like in video meetings AND deciding which use cases are best addressed with computer vision.
Where are we headed?
The communication market is changing. We are seeing tremendous shifts in our market – cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come.
On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus of communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report – there’s a discount associated with purchasing it before it gets published:
You can download our report prospectus here.
WebRTC H.264 hardware acceleration is no guarantee for anything. Not even for hardware acceleration.
There was a big war going on when it came to the video codec in WebRTC. Should we all be using VP8 or should we be using H.264? A lot of digital ink was spilled on this topic (here as well as in other places). The final decision that was made?
Both VP8 and H.264 became mandatory to implement by browsers.
So… which of these video codecs should you use in your application? Here’s a free mini video course to help you decide.
Enroll to free course
Fast forward to today, and you have this interesting conundrum:
Leaving aside the question of what mandatory really means in English (leaving it here for the good people at Apple to review), that is only a fraction of the whole story.
There are reasons why one would like to use VP8:
There are reasons why one would like to use H.264:
I want to open up the challenges here. Especially in leveraging hardware based encoding in WebRTC H.264 implementations. Before we dive into them though, there’s one more thing I want to make clear:
You can use a mobile app with VP8 (or H.264) on iOS devices.
The fact that Apple decided NOT to implement VP8 doesn’t bar your own mobile app from supporting it.
WebRTC H.264 Challenges
Before you decide to go for a WebRTC H.264 implementation, you need to take into consideration a few of the challenges associated with it.
I want to start by explaining one thing about video codecs – they come with multiple features, knobs, capabilities, configurations and profiles. These additional doozies are there to improve the final quality of the video, but they aren’t always there. To use them, BOTH the encoder and the decoder need to support them, which is where a lot of the problems you’ll be facing stem from.
#1 – You might not have access to a hardware implementation of H.264
In the past, developers had no access to the H.264 codec on iOS. You could only use it to record a file or play one back, not to stream media in real time. This has changed and now that’s possible.
But there’s also Android to contend with. And in Android, you’re living in the wild wild west and not the world wide web.
It would be safe to say that all modern Android devices today have H.264 encoder and decoder available in hardware acceleration, which is great. But do you have access to it?
The illustration above shows the value chain of the hardware acceleration. Who’s in charge of exposing that API to you as a developer?
The silicon designer? The silicon manufacturer? The one who built the hardware acceleration component and licensed it to the chipset vendor? Maybe the handset manufacturer? Or is it Google?
The answer is all of them and none of them.
WebRTC is a corner case of a niche of a capability inside the device. No one cares about it enough to make sure it works out of the factory gate. Which is why in some of the devices, you won’t have access to the hardware acceleration for H.264 and will be left to deal with a software implementation.
Which brings us to the next challenge:
#2 – Software implementations of H.264 encoders might require royalty payments
Since you will be needing a software implementation of H.264, you might end up needing to pay royalties for using this codec.
I know there’s this thing called OpenH264. I am not a lawyer, though my understanding is that you can’t really compile it on your own if you want to keep it “open” in the sense of no royalty payments. And you’ll probably need to compile it or link it with your code statically to work.
This being the case, tread carefully here.
Oh, and if you’re using a 3rd party CPaaS, you might want to ask that vendor if he is taking care of that royalty payment for you – my guess is that he isn’t.
#3 – Simulcast isn’t really supported. At least not everywhere
Simulcast is how most of us do group video calls these days. At least until SVC becomes more widely available.
What simulcast does is allow devices to send multiple resolutions/bitrates of the same video towards the server. This removes the need for an SFU to transcode media and, at the same time, lets the SFU offer the most suitable experience for each participant without resorting to lowest common denominator type of strategies.
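As a rough illustration of that last point, here is a sketch of how an SFU might pick one simulcast layer per receiver instead of transcoding. The layer names, resolutions and bitrates are made up for the example, and `pick_layer` is a hypothetical helper, not an API from any real SFU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str          # hypothetical label for the simulcast stream
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # bitrate the sender uses for this layer


# Illustrative values only; real senders negotiate their own layers.
LAYERS = [
    Layer("low", 180, 150),
    Layer("medium", 360, 500),
    Layer("high", 720, 1500),
]


def pick_layer(layers: list[Layer], available_kbps: int) -> Layer:
    """Return the highest-bitrate layer that fits the receiver's estimated
    bandwidth, falling back to the lowest layer when nothing fits."""
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    fitting = [layer for layer in ordered if layer.bitrate_kbps <= available_kbps]
    return fitting[-1] if fitting else ordered[0]


# Each receiver gets the layer that suits it; nobody is dragged down to
# the weakest participant's quality, and the SFU never re-encodes.
receivers = {"laptop on fiber": 4000, "phone on 3G": 300, "tablet on wifi": 800}
for who, kbps in receivers.items():
    print(f"{who}: forward '{pick_layer(LAYERS, kbps).name}' layer")
```

A production SFU would also react to changing bandwidth estimates and wait for a keyframe before switching layers; this sketch covers only the selection step.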
The problem is that simulcast in H.264 isn’t available yet in any of the web browsers. It is coming to Chrome, but that’s about it for now. And even when it arrives, there’s no guarantee that Apple will be so kind as to add it to Safari.
It is better than nothing, though not as good as VP8 simulcast support today.
#4 – H.264 hardware implementations aren’t always compatible with WebRTC
Here’s the kicker – I learned this one last month, from a thread in discuss-webrtc – the implementation requirements of H.264 in WebRTC are such that it isn’t always easy to use hardware acceleration even if and when it is available.
Read this from that thread:
Remember to differentiate between the encoder and the decoder.
The Chrome software encoder is OpenH264 – https://github.com/cisco/openh264
Contributions are welcome, but the encoder currently doesn’t support either High or Main (or even full Baseline), according to the README file.
Hardware encoders vary greatly in their capabilities.
Harald Alvestrand from Google offers here a few interesting statements. Let me translate them for you:
And then comes this nice reply from the good guys at Fuze:
@Harald: we’ve actually been facing issues related to the different profiles support with OpenH264 and the hardware encoders. Wouldn’t it make more sense for Chrome to only offer profiles supported by both? Here’s the bad corner case we hit: we were accidentally picking a profile only supported by the hardware encoder on Mac. As a result, when Chrome detected CPU issues for instance, it would try to reduce quality to a level not supported by the hardware encoder which actually led to a fallback to the software encoder… which didn’t support the profile. There didn’t seem to be a good way to handle this scenario as the other side would just stop receiving anything.
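The corner case in that quote can be sketched in a few lines. The profile sets below are hypothetical placeholders rather than what any given encoder actually supports, and `safe_profiles` and `survives_fallback` are illustrative helpers, not part of Chrome or OpenH264.

```python
# Hypothetical capability sets; real values depend on the device and build.
HARDWARE_ENCODER_PROFILES = {"constrained-baseline", "main", "high"}
SOFTWARE_ENCODER_PROFILES = {"constrained-baseline"}


def safe_profiles(hardware: set[str], software: set[str]) -> set[str]:
    """Profiles that survive a fallback from hardware to software encoding."""
    return hardware & software


def survives_fallback(profile: str, software: set[str]) -> bool:
    """True if the stream can keep flowing once the hardware encoder
    is abandoned and the software encoder takes over."""
    return profile in software


negotiated = "high"  # in this example, supported by the hardware encoder only
print(survives_fallback(negotiated, SOFTWARE_ENCODER_PROFILES))
print(sorted(safe_profiles(HARDWARE_ENCODER_PROFILES, SOFTWARE_ENCODER_PROFILES)))
```

Offering only the intersection gives up the hardware encoder's richer profiles in exchange for a fallback that doesn't leave the other side without video.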
If I may translate this one as well for your entertainment:
So. Got hardware encoder and/or decoder. Might not be able to use it.
#5 – For now, H.264 video quality is… lower than VP8
That implementation of H.264 in WebRTC? It isn’t as good as the VP8 one. At least not in Chrome.
This is for the same scenario running on the same machines encoding the same raw video. The outgoing bitrate variance for VP8 is 0.115 while it is 0.157 for H.264 (the lower the better). Not such a big difference. The framerate of H.264 seems to be somewhat lower at times.
I tried out our new scoring system in testRTC that is available in beta on both these test runs, and got these numbers:
The 9.0 score was given to the VP8 test run while H.264 got an 8.8 score.
There’s a bit of a difference in how stable VP8’s implementation is versus the H.264 one. It isn’t that Cisco’s H.264 code is bad. It might just be that the way it got integrated into WebRTC isn’t as optimized as VP8’s integration.
Then there’s this from the same discuss-webrtc thread:
We tried h264 baseline at 6mbps. The problem we ran into is the bitrate drastically jumped all over the place.
I am not sure if this relates to the fact that it is H.264 or just to trying to use WebRTC at such high bitrates, or the machine or something else entirely. But the encoder here is suspect as well.
I also have a feeling that Google’s own telemetry and stats about the video codecs being used will point to VP8 having a larger portion of ongoing WebRTC sessions.
#6 – The future lies in AV1
After VP8 and H.264 there’s VP9 and H.265 respectively.
H.265 is nowhere to be found in WebRTC, and I can’t see it getting there.
And then there’s AV1, backed by the Alliance for Open Media, which includes as its founding members Apple, Google, Microsoft and Mozilla (who all happen to be the companies behind the major web browsers).
The best trajectory for video codecs in WebRTC will look something like this:
Why doesn’t this happen in VP8?
It does. To some extent. But a lot less.
The challenges in VP8 are limited as it is mostly software based, with a single main implementation to baseline against – the one coming from Google directly. Which happens to be the one used by Chrome’s WebRTC as well.
Since everyone works against the same codebase, using the same bitstreams and software to test against, you don’t see the same set of headaches.
There’s also the limitation of available hardware acceleration for VP8, which ends up being an advantage here – hardware acceleration is hard to upgrade. Software is easy. Especially if it gets automatically upgraded every 6-8 weeks like Chrome does.
Hardware beats software at speed and performance. But software beats hardware on flexibility and agility. Every. Day. Of. The. Week.
What’s Next?
The current situation isn’t a healthy one, but it is all we’ve got to work with.
I am not advocating against H.264, just against using it blindly.
How the future will unfold depends greatly on the progress made in AV1 as well as the steps Apple will be taking with WebRTC and their decisions on the video codecs to incorporate into WebKit, Safari and the iOS ecosystem.
Whatever you end up deciding to go with, make sure you do it with your eyes wide open.
The post The Challenging Path to WebRTC H.264 Video Codec Hardware Support appeared first on BlogGeek.me.
Parallax, or eye contact in video conferencing is a problem that should be solved, and AI is probably how we end up solving it.
I worked at a video conferencing company about 20 years ago. Since then a lot has changed:
One thing hasn’t really changed in all that time.
I still see straight into your nose or straight at your forehead. I can never seem to be able to look you in the eye. When I do, it ends up being me gazing straight at my camera, which is unnatural for me as well.
The reason for this is known as the parallax problem in video conferencing. Parallax. What a great word.
If you believe Wikipedia, then “Parallax is a displacement or difference in the apparent position of an object viewed along two different lines of sight, and is measured by the angle or semi-angle of inclination between those two lines.”
A mouthful. Let me illustrate the problem:
What happens here is that as I watch the eyes of the person on the screen, my camera is capturing me. But I am not looking at my camera. I am looking at an angle above or beyond it. And with a group call with a couple of people in it in Hollywood squares, who should I be looking at anyway?
So you end up with either my nose.
Or my forehead.
What we really want/need is to have that camera right behind the eyes of the person we’re looking at on our display – be it a smartphone, laptop, desktop or room system.
Over the years, the notion was to “ignore” this problem as it is too hard to solve. The solution to it usually required the use of mirrors and an increase in the space the display needed.
Here’s an example from a failed Kickstarter project that wanted to solve this for tablets – the eTeleporter:
The result is usually cumbersome and expensive. Which is why it never caught on.
There are those who suggest tilting the monitor. This may work well for static devices in meeting rooms, but then again, who would do the work needed, and would the same angle work on every room size and setup?
When I worked years ago at a video conferencing company, we had a European research project we participated in that included 3D imaging, 3D displays, telepresence and a few high end cameras. The idea was to create a better telepresence experience that got eye contact properly as well. It never saw the light of day.
Today, multiple cameras and depth sensors just might work.
Let’s first take it to the extreme. Think of Intel True View. Pepper a stadium with enough cameras, and you can decide to synthetically re-create any scene from that football game.
Since we’re not going to have 20+ 5K cameras in our meeting rooms, we will need to make do with one. Or two. And some depth information. Gleaned via a sensor, dual camera contraption or just by using machine learning.
Which is where two recent advancements give a clue to where we’re headed:
The idea? Analyze and “map” what the camera sees, and then tweak it a bit to fit the need. You won’t be getting the real, raw image, but what you’ll get will be eye contact.
Back to AI in RTC
In our interviews this past month we’ve been talking to many vendors who make use of machine learning and AI in their real time communication products. We’ve doubled down on computer vision in the last week or two, trying to understand where the technology is today – what’s in production and what’s coming in the next release or two.
Nothing I’ve seen was about eye contact, and computer vision in real time communication is still quite nascent, solving simpler problems. But you do see the steps taken towards that end game, just not from the video communication players yet.
The post Can AI and Computer Vision solve the video conferencing eye contact problem? appeared first on BlogGeek.me.
Is it machine learning or artificial intelligence? It ends up depending on who you ask and what it is you care about.
There are multiple ways to think and look at machine learning and artificial intelligence. And just like any other hyped technologies, people seem to mix the two and use them interchangeably.
I’ll let you in on a little secret: we’re doing the same with our upcoming AI in RTC report.
Want to help us with our research AND get a free ebook AND have a chance to win one of five $100 Amazon gift cards?
We could have just as easily used the title “ML in RTC” instead of “AI in RTC”. The way we’d approach and cover the space and end up writing this market research would be… the same – in both cases.
Which brings me to this article.
Machine Learning and Artificial Intelligence are somewhat different from one another. The problem is to decide what that difference is.
Here are 4 ways to think about ML and AI:
#1 – ML = AI
Let’s start with the easiest one: ML is AI. There’s no difference between the two and they can be used interchangeably.
This is the viewpoint of the marketer, and today, of the market itself.
When everyone talks about AI, you can’t not talk about AI. Even if what you do is just ML. Or BigData. Or analytics. Or… whatever. Just say you’re doing AI. It is good for the health of your stock price.
While at it, make sure to say you’re doing AI in an ICO cryptocurrency fashion. What can go wrong?
Someone tells you he is doing AI? Assume ML, and ask for more information. Make your own judgement.
#2 – The road to AI
From Operational to BI
We’ve had databases in our products for many years now. We use them to store data, run transactions and take actions. These are known as operational databases. For many years we’ve had another set of databases – the analytical ones, used in data warehouses. The reason we needed them is because they worked better when asking questions requiring aggregations that look at large series of historical data.
That got the marketing terms of BI (Business Intelligence) and even Analytics.
BI because we’re selling now to the business (at a higher price point of course). And what we’re selling is value.
Analytics because it sounds harder than the operational stuff.
From BI to BigData
The next leg of that journey started about a decade ago with BigData.
Storage started costing close to nothing, so it made sense to store everything. But now data warehouses from the good-ol’ BI days got too expensive and limiting. So we came out with BigData. Things like Hadoop and Cassandra came to be and we were happy again.
Now we could just throw all our data into Hadoop and run batch processes on it called MapReduce that ended up replacing/augmenting our data warehouses.
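For those who never touched it, the MapReduce idea fits in a few lines. This is a single-process toy in plain Python, not Hadoop’s actual API, and the call records are invented for the example.

```python
from collections import defaultdict

# Invented sample data: (country, call duration in seconds).
CALL_RECORDS = [("US", 120), ("IL", 45), ("US", 300), ("ES", 60), ("IL", 15)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for country, seconds in records:
        yield country, seconds


def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single aggregate."""
    return {key: sum(values) for key, values in grouped.items()}


totals = reduce_phase(shuffle(map_phase(CALL_RECORDS)))
print(totals)
```

In a real cluster the map and reduce steps run in parallel across many machines, which is the part that mattered once the data no longer fit in a data warehouse.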
BigData was in big hype for some time. While it is very much alive today, it seems to have run out of steam for marketers. They moved on to Machine Learning.
From BigData to ML
This step is a bit more nuanced, and maybe it isn’t a step at all.
Machine Learning covers the research area of getting machines to decide on their own algorithm – or more accurately – decide on how an algorithm will be used based on a given dataset.
Machine learning algorithms have been around well before machines. If you check the notes on Wikipedia for Linear Regression, you’ll find the earliest methods for it were published in 1805. And to be fair, these algorithms are used in BI as well.
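To show how unglamorous this can be, here is ordinary least squares for a single variable in plain Python. The data points are invented, and the closed-form formulas are the textbook ones rather than anything specific to a product.

```python
# Invented training data: concurrent sessions vs. CPU load in percent.
sessions = [10, 20, 30, 40, 50]
cpu_load = [12.0, 22.0, 32.0, 42.0, 52.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sessions, cpu_load)
# The "learning" is just these two numbers; prediction is a multiplication.
predicted = slope * 60 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"load at 60 sessions={predicted:.1f}%")
```

The dataset decides the slope and intercept, and nobody hand-codes them. That is about as small as "the machine decides how the algorithm will be used" gets.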
The leap from BigData to ML happened mostly because of Deep Learning. Which I am keeping as a separate leap from ML. Why? Because many of the things we do today end up being simpler ML algorithms. We just call it AI (or ML) just because.
Deep Learning got everyone on the ML bandwagon.
From ML to Deep Learning
Deep Learning is a branch of Machine Learning. A certain type of machine learning algorithms.
They became widely popular in recent years since they enabled the accuracy of certain tasks to increase significantly.
There are two things we can now achieve due to deep learning:
Here’s how Google fares now (taken from KPCB internet trends):
We were at around 70% accuracy in 2010, after a gradual rise from 50% over the previous 40 years or so.
This steep rise in accuracy in this decade is attributed to the wide use of machine learning and the amount of data available as training material to the algorithms.
Deep learning is usually explained as neural networks, making it akin to human thinking (at least until the next wave of better algorithms, ones more akin to human thinking, gets invented).
From Deep Learning to AI
And then there’s artificial intelligence.
Less a specific algorithm and more a target. To replace humans. Or to do what humans can do.
Or my favorite:
AI is a definition of what we can’t do with machines today.
Once we figure that out, we’ll just put AI on the next pedestal so we’ll have a target to conquer.
#3 – Learning or Imitating?
Here’s one that is slightly different. I heard it at a data science event a couple of weeks ago.
Machine Learning is about getting machines to select their own algorithm by presenting them a set of rules and outcomes:
Artificial Intelligence is about doing something a human can do. Probably with the intent to replace him by automating the specific task. Think about autonomous driving – we’re not changing the roads or the rules of driving, we just want a car to drive itself the way a human would (we actually want the machine to drive better than humans).
This one I saw at a recent event, which got me on this track of ML vs AI in the first place.
Machine Learning is about Predictions, while Artificial Intelligence is about Actions.
You can use machine learning to understand things, to classify them, predict and estimate. But once the time comes to act upon it, we’re in the realm of artificial intelligence.
It also indicates that any AI system needs ML to operate.
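Here is a minimal Python sketch of that split, assuming a video calling service as the setting. The function names, the 4% threshold and the chosen actions are all hypothetical, and the "model" is a stand-in for whatever a trained ML model would return.

```python
def predict_call_quality(packet_loss_percent: float, threshold: float = 4.0) -> str:
    """Prediction step (the "ML" part): classify the call.

    A stand-in for a trained model; the threshold is an assumed value.
    """
    return "bad" if packet_loss_percent >= threshold else "good"


def act_on_prediction(prediction: str) -> str:
    """Action step (the "AI" part under this definition): decide what to do."""
    if prediction == "bad":
        return "switch to audio only"
    return "keep video on"


# The action depends on the prediction, never the other way around.
decision = act_on_prediction(predict_call_quality(6.0))
```

The second function cannot do its job without the first, which is the dependency this definition implies.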
I am sure you can poke holes in this one, but it is useful in many ways.
Why do we care?
While I am not a stickler for such details, words do have meaning. It becomes an issue when everyone everywhere is doing AI, but some end up with a Google Duplex while others show a rolling average on a single metric value.
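For a sense of scale, a rolling average on a single metric is only a few lines of Python and involves no learning at all. This is a sketch; the function name and the sample values are assumptions for illustration.

```python
from collections import deque
from typing import Iterable, List


def rolling_average(values: Iterable[float], window: int) -> List[float]:
    """Average of the last `window` values, emitted after each new value."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # old values fall off automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical metric samples, e.g. round trip time in milliseconds.
smoothed = rolling_average([10, 20, 30, 40], window=2)
```

It is useful, but there are no training data, no model and no prediction in it.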
If you are using communications and jumpstarting an AI initiative, then be sure to check out our upcoming report: AI in RTC.
Want to help us with our research AND get a free ebook AND have a chance to win one of five $100 Amazon gift cards?
The post ML vs AI: What’s the difference between machine learning and artificial intelligence? appeared first on BlogGeek.me.
An interview with Alan Masarek, CEO of Vonage.
Doing these video interviews is fun, so when the opportunity arose to be at the Vonage headquarters in Holmdel, New Jersey, it made sense to ask for a video interview with Alan Masarek, the CEO of Vonage.
In this interview, I wanted to get Alan’s viewpoint about the space he is operating in, especially now, some two years after the acquisition of Nexmo. It is quite common to find UCaaS vendors that are heading towards the contact center. Many will even add APIs on top. Vonage is the only one who decided to acquire a dominant CPaaS vendor (Nexmo).
As usual, you’ll find the transcript right below the video.
I enjoyed the interview and the hospitality. I’d like to thank Alan and the team at Vonage for setting this one up.
Transcript
Tsahi: Hi. So I have got here today, Alan Masarek, CEO of Vonage at the Holmdel, Vonage Technology Center.
Alan: That’s correct. We’re thrilled to be here at our Vonage Technology Center. It’s a pleasure to be with you, Tsahi. Thank you.
Tsahi: Thank you for having me here. I have a question before we start and this really bugged me a bit during the time that I’ve learnt about you and about the company: You came from Google to Vonage.
Alan: Well, first of all, if that’s the only thing that’s bugged you, that would be exceptional. But in all seriousness, what excited me when I was presented this opportunity when I was at Google … And I’d gotten to Google from selling my earlier company to them back in 2012. So I was a director in the Chrome and apps group and I was very involved in the whole rollout of what is now today, G Suite. We used to call it Google for Enterprise.
What intrigued me about coming here was the opportunity to take this almost iconic consumer brand company that built this amazing level of awareness around providing residential phone service and how you could take the brand and the network asset as well as the cash flow from consumer candidly, and use that to pivot into business. I always look at markets the same way. You sort of sit back and you say, “Is that market worth winning and do you have the assets to give you an ability to win it?”
So when you look at the broader business communications market, it’s a massive TAM growing very quickly. And then even when you look at the competitive set, I found the big companies in this set were pretty unfocused. Most of the competitors were smaller companies, had less brand awareness, less sort of national scope, less profitability. So you have this huge TAM, a surmountable competitive set, then you have these assets from consumer that we felt we could bring to bear to win and that’s exactly what we’ve been executing on, that’s what we saw when I was at Google, that’s what I came here to do.
Tsahi: So you’re actually staying in this area between consumer and enterprise. You did that at Google with acquisition and now here at Vonage, moving from consumer to businesses.
Alan: That’s correct. So the company that I sold to Google focused really in the prosumer and enterprise segment. So we were a productivity solution that individuals would use and corporations would use. Here, we obviously have moved very specifically from our roots in consumer, in residential, focused in business. When we began that pivot, we started with small companies because that’s where the action was and the move to cloud, but now we’ve moved very purposefully upmarket to larger and larger corporate customers.
Last year, we signed what I think is the largest deal ever done in cloud communications with the largest residential real estate company in the United States. 21,000 corporate seats moving from prem to cloud and another 125,000 franchise seats.
Tsahi: Interesting. And what gets you up in the morning?
Alan: Well, this morning at 5 o’clock, my alarm clock but … What I’m excited about and I’ve continued … The reason I came here to begin with is I want to build a remarkable company here. It’s not just the transformation from moving from a residential-focused company to a business-focused company. We’re clearly executing on all those elements, whether it’s the technology platform itself, sales execution, the post-sales experience we provide our customers, all those things that we’re doing. But as important and in some respects if not more important, it’s the cultural transformation as well.
What I find that is really sort of stimulating to me is to create that switched-on Silicon Valley mindset culture. I like to think that we’re a billion dollar startup is what we talk about it. Last year, we finally crossed the billion dollar in revenue threshold. But I want to have the agility, the speed, the openness, the transparency, the honesty, all that, in order for Vonage to be … The way I describe it is I want Vonage to be that destination place to work the way Google was and everybody celebrates when they get a Google. I want them to feel the same way getting a job here.
Tsahi: Okay. And you’re a cloud communication company at the end of the day and cloud communication in the last few years have got a lot of attention, especially this last year. How come most of the businesses today are still on-premise when it comes to their communication needs?
Alan: On the communication side, the move to cloud has happened more slowly than CRM and ERP and HRM software, things like that. I think because the nature of dial tone has been about as reliable as the sun coming up tomorrow and there’s a great degree of risk that’s associated with it. Companies sit back and they say, “My goodness. It works. I don’t necessarily want to change it.” Now, the reality is when you move from the traditional prem-based solutions and the old PSTN network and such to IP-based, cloud-based solutions, you have infinite scalability, much, much more functionality, the whole notion of unified communications and communications platform as a service all stems from that. But I just think there’s been a fear factor that has caused it to migrate to the cloud more slowly than some of these other verticals.
But you see this amazing tipping point as recently as five years ago, only small companies for the most part were moving to the cloud. Now it has moved all the way up to major enterprises. And there are just example after example of other huge companies, global multinationals moving to cloud. It’s sort of no longer in dispute that cloud will supplant prem. It’s just like anything takes time.
Tsahi: What triggers them to do that shift, that migration from on-prem to cloud?
Alan: There are several trigger points. A couple of them are the comfort of moving to cloud. The cloud was scary just a few years ago and so it was to be avoided by bigger companies. But beyond that, it’s the productivity that they can get. Every company out there is going through their own digital transformation of one form or the other. Everybody is looking over their shoulder, scared to death of the more digitally transformed competitor has a bullseye on their back, is coming after their business. Obviously, we can always cite the example of physical retail stores versus Amazon eCommerce. That notion of digital transformation everyone has to go through and I think what’s happened is up until very recently, communications has been sort of the underappreciated element of digital transformation.
I always have this sort of visual metaphor in my mind that you can picture somebody on the old black rotary dial phone talking to a colleague saying, “We got to get that eCommerce site up.” Not realizing that the problem itself or a major piece of the problem itself is their communications infrastructure, how people work differently with one another, how they collaborate, et cetera, et cetera. All those elements of what we’re providing with these cloud communications solutions are fueling their digital transformations. I think that’s now being seen. Folks are more aware of that all the time and that’s why you’re seeing kind of everything change and move to cloud so quickly.
Tsahi: When you look at the communication market, for me, it’s like a Venn diagram with different parts of it. There’re unified communication and then contact centers and recently, we see APIs, these CPaaS communication platform as a service. When I look at what competitors do in this space, your competitors and unified communications, they end up going and doing something or adding stuff in the contact center. And then when they look at the APIs, usually go and say, “Well, we just put an API”; obviously they do because 2018, everybody uses an API on top of what they do. But you did something differently. You went and acquired the company called Nexmo and then their APIs, haven’t even touched it in a way and you left that to be a separate part of the business or a business all its own, with and without relationship to what you’re doing in unified communications.
Alan: The reason that we bought Nexmo is we have a view of what business communications is and will be that’s different than most. Most in the example have hosted PBX which has really been the principal use case of UCaaS or hosted contact center which has been the principal use case of CCaaS. In our view, those are just applications. Hosted PBX, moving your prem-based PBX to the cloud is a big TAM onto itself but it’s not necessarily an industry. The same applies to contact center. It’s not an industry. It’s simply an application or a use case which is really large and really important. But at the same token, the whole now new acronym of CPaaS, Communications Platform as a Service, says, “Well, there are other elements of communications that I want to simply program into my workflow, my mobile app, my business process, my website.” What have you. But have nothing to do with the contact center or the PBX.
Our view has been that we’re building a communications platform company. The whole notion of it is it’s a microservices architected platform. So we’re taking the Nexmo platform and our own Vonage Business Cloud platform and bringing those together. We refer to that internally as 1V, One Vonage. From that microservices architecture, you’re just going to serve customers in those big use cases. So whether you bundle several hundred of those microservices together in a use case called PBX or in a use case called contact center, or sell them one at a time that just get embedded into something else via the software APIs, it doesn’t matter. It’s the same platform. You’re just feeding where the needs are the greatest.
And the notion of this is that there’s not different industries, UCaaS, CCaaS, CPaaS. It’s simply communication elements, how they get deployed. The way I like to think about it is I go back to the music industry. We grew up, here’s songs and we can buy it only one way. Packaged, pre-published on an album. Apple came along and the cloud and said, “I’m going to unbundle the model and you can buy a song one at a time.” And then streaming services and subscription services have come along and the ability to mash up your music. They’re just different delivery models of the same song. It’s the way I think about cloud communications. There are communication elements, audio, video, messaging. Whether you package them in big applications like PBX or unbundle them as microservices, which is the CPaaS model, it doesn’t really matter. It’s just where the needs are the greatest.
Because at the end of the day, communication only serves a purpose. Does it make the company more productive? Does it connect my customers in a more personalized way with me as a company? And does it drive better business outcomes for my business? If it doesn’t do that, it doesn’t really matter whether you call it UCaaS or CCaaS or CPaaS. It simply has to drive those better business outcomes and that’s the approach that we’re taking.
Tsahi: Talking about Nexmo, they are now 12, 18 months part of Vonage now.
Alan: Almost two years. June 5th will be two years.
Tsahi: What synergies have you seen since the acquisition, up until today?
Alan: There’s been a great deal of synergies. You mentioned before about the Venn diagrams where much of the industry has developed as if the segments, UCaaS, CCaaS, CPaaS have been separate. We reject that. If they were all Venn diagrams, they all will be separate. Our view is they’re coming together all the time. So increasingly, the purchaser at a company, Acme company, is the line of business manager. The conventional wisdom used to be that if I’m buying UCaaS, I’m the CIO or the head of IT and if I’m buying CCaaS contact center, I’m the help center. And if I’m buying communications platform as a service, I’m an individual developer, perhaps even the CMO. What you’re finding now is it’s coming together as lines of business. Given that trend from a synergy point of view, we’ve organized since the acquisition, completely functionally so that the entire engineering team, Vonage traditional or Nexmo reports up to the same CTO. The product organization up to the same chief product officer. Sales under the same chief revenue officer, same with marketing.
And they’re already doing tremendous amounts of lead sharing within the groups, operational sharing, sales enablement, sales training and things like that. Because what we’re finding is that in the cloud PBX world, your salespeople don’t want to go out there and go to a customer and say, “Buy me because my hunt group or my auto attendant is better than the other guys.” Because this very sort of baseline functionality. What you want to do is go into your customer and have a conversation about better business outcomes. So they’re just naturally carrying Nexmo into the discussion with every prospect out there. You can look at every one of our large company wins. It began with a Nexmo conversation interestingly, more than just the feature set of the PBX or the contact center. So you’re seeing very, very natural synergies happen. Now, it’s not a cost synergy issue for us in terms of people. When we bought Nexmo, it was about 175 people. I think it’s above 300 today and as I recall last time when I was in our London office, there was 140 open jobs for Nexmo this calendar year, so we’re growing in a big hurry.
Tsahi: We’ve talked about the cloud, we’ve talked about API. There is another big buzzword these days around communications and that’s “Teams”. The notion of what Slack started in a way. Messaging inside groups, smaller groups which is more ad hoc than the usual grounded structured way of communications. And you see today Microsoft going there, Cisco going there. All the big companies are headed there and then next to you, you got Google and Amazon joining this specific space. How is Vonage preparing towards that future of team collaboration, enterprise messaging, whatever you want to call it?
Alan: So not to sort of disclose all the goodies that are coming but within our roadmap, we have some very, very interesting developments around the collaboration and work stream messaging space that will be coming out later this year. And that’s tightly integrated as a single app whether you’re mobile, desktop or browser, with the experience in the communications system. Now, it also will integrate well with the major players that you just talked about. Slack, Stride, Teams, et cetera. Or it’s going to be WebEx, et cetera. Because it has to.
In our view, we can’t play king maker and say, “Oh. Mr. Customer, Mrs. Customer, you cannot use these other collaboration tools.” That’s ultimately going to the decision of the customer. So we have to have our own solution that is built-in in a fully integrated way but then the ability to integrate in with the others and that’s the approach that we’re taking.
Tsahi: Can I ask a question that just occurred to me?
Tsahi: What about contact centers?
Alan: I think contact center is incredibly important as part of the integrated solution. And so today, we have a contact center built into Vonage Business Cloud which is our own proprietary call processing stack. And for our Vonage Enterprise Solution, we use BroadWorks contact center functionality. Then, in those situations where they need an advanced contact center solution, then we are a reseller of inContact. But again, it’s integrated fully in with our solution, so it appears like it’s a single experience. And then we serve it as if it’s a single experience so the contract is on our paper, the support is ours, things like that.
Contact center though becomes very, very important in the CPaaS market because so much of how communications get embedded in through some software API into that website, that mobile app, business process, what have you, is about customer experience. And so think of it as task routing. Somebody is on my website and they’re looking at my product and they have a question. Today, they may pick up the phone and call and have to start over because there was no context to what they were doing on the website, and these CPaaS type tools are all about the contextual. The software identifies the context to what I was doing.
So if I was on Delta Airlines’ site trying to book a flight and I was 10 minutes into booking the itinerary and all of a sudden it had a problem, in the past, I’d pick up the phone and just call and have to start over because no one had any idea of the itinerary I was just trying to book. These new contextual tools that you can embed in, understand the itinerary so that it routes through the appropriate IVR into the contact center. So think of it as a task, an intelligent task. It knows I was trying to book a flight from Tokyo to Shanghai next Thursday and it will route me through the appropriate IVR to the person on the help desk for the international Asia markets.
And so you can envision from a customer personalization or a customer intimacy, rather than me having to start over which is what happens today, which is very frustrating to all of us. You can imagine the agent picking the phone up and saying, “Hi, Mr. Masarek. I see you’re trying to book a flight next Thursday from Tokyo to Shanghai. How can I help?” That’s a direct connection between the customer experience, routing the task into the contact center. We think that’s very important.
Tsahi: Let’s look a little bit into the future.
Tsahi: What do you think is the biggest challenge for the modern businesses moving forward from now on? When it comes to communications of course.
Alan: I’m not sure it’s a challenge. I don’t want to sort of split words between challenge and opportunity, but I actually think communications is going to fundamentally change by virtue of we’re no longer tethered to a physical device. We think about communications, I’m on a call, either a landline or a desk. In our vision for it, communications is in everything. So whether it’s a click-to-call or click-to-communicate functionality in the website or … Pick whatever app you want. You’re on Salesforce, I’m on an Excel spreadsheet, someone else is in G Suite or in Gmail, or in Google Sheets. Doesn’t matter. There will be click-to-communicate functionality everywhere and naturally, these microservices that are going to be created increasingly by these CPaaS type solutions. So you’re going to have I think this explosion in communications the way I think about it because you’re no longer tethered to anything physical. You’re in an app or a website or what have you.
And the way I think about it is your decision of how you communicate is simply going to be a function of the limitations of the physical device that you got onto the internet with. So for instance, if the device doesn’t have a camera, you’re not going to do video. If it doesn’t have a speaker and microphone, you’re only going to do messaging, that’s all you can. But the mode, video, audio or messaging is going to be the limitations of the device and your personal preference, also kind of situational. If you just stepped out of the shower, you’re not going to do video likely. So the point is regardless of how you’re interacting in some sort of app or website, you’re going have communication everywhere. So I think the notion of the challenge to companies is less the challenge and more that I think it’s going to change the way we work because the notion of how we collaborate, how we share, the tightness of the communication, sort of that feedback loop is going to get tighter, and tighter, and tighter is the way I think about it.
I actually think about communication, this renaissance or this explosion in communication a little bit like the internet 10 years ago. 10 years ago, there was no video flying around the internet. It was kind of more flat files and such. There wasn’t full-motion video. There certainly wasn’t virtual reality and things like that, and self-driving cars and all these stuff that is just massive quantities of data that are going around the internet. When that began, look what happened with all the content delivery networks. They just kind of went like this in terms of the volume of capacity they have on the internet. I think communications is going to go through this similar renaissance or explosion in the sense because if communications are everywhere, not just on specific devices, you’re going to be communicating all the time, and so I think you’re going to see this massive uplift in it. If it’s a challenge out there, it’s going to create sort of communication overload, perhaps, but maybe smarter people than us will figure it out on how to make it simpler.
Tsahi: And moving forward, would businesses end up building their communication needs on top of APIs, go pick a UCaaS or a communication solution to do that for them or go for even a very specific niche SaaS product to get what they need?
Alan: I think that increasingly, communications will be built on top of the platform, the PaaS product, not going and buying some monolithic application. Like you said earlier, everybody’s got APIs. The old way we used to write software, we write a big monolithic solution from the UI, the user interface, all the way down to the metal called PBX, in our example. I can open up APIs to the PBX but it’s not programmable. It’s simply an API into that monolithic solution. Where we sit today is a microservices architecture where it’s fully programmable.
And I think what you’ll see, and this is exactly the strategy we’re building to, is whether you want to use that big chunk of microservices in a particular use case that is as a big application like PBX or a big application like contact center, it’s just a function of what’s the best way to deliver it to a customer. Do I think people are going to build their own PBX all the time? No. Because I think to me it’s analogous to the vast majority of people don’t build their own computer. You certainly could. You could be a hobbyist and build your own PC and buy the motherboard and the chassis and the whole bit, but very few people do that when you go out and buy a computer for $400. So I think the PBX distribution model where it’s something you’re going to subscribe to, it’s a SaaS solution, will persist, but I think the microservices are really going to take over where communications get woven into everything else.
Tsahi: Vonage in 5 to 10 years from now, where do you see the company itself? What are you going to sell to businesses, to consumers? What kind of services are going to be there?
Alan: Vonage in the next five years will be an extraordinarily different company than it is today. Let me go backwards first. Four years ago, we were 100% consumer. Now, this year in 2018, roughly 60% of the revenue is business. Business is growing really quickly. So as of last quarter, 22% growth organically, nothing to do with acquisitions. And consumer has been declining as residential home phone usage is in decline, by 12% roughly. Now that business is the larger of the two segments and growing at twice the rate that consumer’s declining, you can imagine where the lines separate in a very big hurry. So the whole focus of the organization is on business. It already is. Consumer is still a meaningful piece, it’s 40% but it’s getting smaller all the time as a percentage of the total.
What’s interesting from a how we’re going to serve customers is precisely the way we do it today. Our whole approach from a platform perspective, the way I described it where irrespective of whether it’s UCaaS, CCaaS or CPaaS, coming out of a common platform, we will continue to execute on that. What’s interesting where I think a value unlock happens for the company is you’re now going to have … We’re already having consolidated revenue growth.
Last year, we did just above a billion dollars in revenue. This year, Wall Street has us close to a billion fifty. Again, as the smaller piece, consumer, gets smaller and smaller, its mitigating impact on overall growth declines. Therefore, we’re sort of more and more of a consolidated growth company. Again, unrelated to any acquisitions, just purely organically. The notion then of, “Oh my goodness. You’re in the midst of a transformation” goes away because you’ve now transformed.
So where I can see us in pretty short order is serving our approach to our customers in this differentiated way which I think will withstand the test of time, will withstand competitive entrants because, the end of the day, we’re just rooted in how do we provide better business outcomes for our customers. But now you’re going to have this increasingly fast growing consolidated company, well greater than a billion dollars in revenue, highly profitable still and I think that’s going to be a value unlock for the story. When I go back to many transformational stories in the early days, there’s a lot of investor skepticism about transformational stories is most of them don’t work. This one’s worked and that’s why we’ve had sort of an almost quadrupling of our stock price over the last four years.
Alan: All right.
Tsahi: Thanks for your time, Alan.
Alan: My pleasure. Thanks so much. I enjoyed it.
Tsahi: Me too.
Alan: Sure. Thank you.
Tsahi: Thank you.
The post UCaaS, CCaaS & CPaaS: An interview with Alan Masarek, Vonage CEO appeared first on BlogGeek.me.
The Chrome Webstore has decided to stop allowing inline installation for Chrome extensions. This has quite an impact on WebRTC applications since screensharing in Chrome currently requires an extension. Will the getDisplayMedia API come to the rescue?
Screensharing in Chrome
When screensharing was introduced in Chrome 33, it required implementation via an extension as a way to […]
The post Chrome Screensharing Blues – preparing for getDisplayMedia appeared first on webrtcHacks.
Now that it is getting relatively easy to set up video calls (most of the time), we can move on to doing fun things with the video stream. With new advancements in Machine Learning (ML) and a growing number of APIs and libraries out there, computer vision is also getting easier to do. Google’s ML Kit is […]
The post Smile, You’re on WebRTC – Using ML Kit for Smile Detection appeared first on webrtcHacks.
ML in RTC can fit anywhere – from low level optimization to the higher application layers.
TL;DR – I am working with Chad Hart on a new ML in RTC report. If you are interested in it, scroll down to the end of this article.
Machine Learning (ML), Artificial Intelligence (AI), Big Data Analytics. Call it what you will. You’ll be finding it everywhere. Autonomous cars, ecommerce websites, healthcare – the list goes on. In recent years we’ve seen this domain flourish due to the increase in memory and processing power, but also due to some interesting breakthroughs in machine learning algorithms – breakthroughs that have rapidly increased the accuracy of what a machine can now do.
My ML Origin Story
I’ve been looking at and dealing with machine learning for many years now. Never directly calling it that, but always in the vicinity of the communications industry.
It probably started in university. I decided to do an M.Sc because I was somewhat bored at work. I took a course in computational linguistics which then ended with me doing research in backward transliteration, looking at phonemic similarities between English and Spanish (#truestory). That was in 2005, and we used a variant of dynamic programming and the Viterbi algorithm. That and other topics such as hidden Markov models were part and parcel of my work at the time.
Later on, I researched the domain of Big Data and Analytics at Amdocs. I was part of a larger group trying to understand what these mean in telecommunications. Since then, that effort grew into a full business group within Amdocs (as well as the acquisition of Pontis, well after I left Amdocs for independent consulting).
Which is why when I talked to Chad Hart about what we can do together, we came to an agreement that something around ML and AI made a lot of sense for both of us, and taking it through the prism of RTC (real time communications), placed it in the comfort zone of both of us.
During that period, we thought a lot about what domains we wish to cover and what ML in RTC really means.
Categorizing ML in RTC
Communications is a broad enough topic, even when limited to the type that involves humans. So we limited even further to real time communications – RTC. And while at it, threw text out the window (or at the very least decided that it must include voice and video).
Why do that? So we don’t have to deal with the chatbots craze. That’s too broad of a topic on its own, and we figured there should be quite a few reports there already – and a few snake oil sellers as well. Not our cup of tea.
This still left the interesting question – what exactly can you do with AI and ML in RTC?
We set out to look at the various vendors out there and understand what they are doing when it comes to ML in RTC.
Our decision was to model it around 4 domains: Speech Analytics, Computer Vision, Voice Bots / Assistants and RTC quality / cost optimization.
1. Speech Analytics
Speech Analytics deals a lot with Natural Language Processing (NLP) and Natural Language Understanding (NLU).
Each has a ton of different use cases and algorithms to it.
Think of a contact center and what you can do there with speech analytics:
You will find a lot of speech analytics related RTC ML taking place in contact centers. A bit less of it in unified communications, though that might be changing if you factor in Dialpad’s acquisition of TalkIQ.
2. Computer Vision
Computer Vision deals a lot with object classification and face detection, with all the derivative use cases you can bring to bear from it.
“Simple” things like face recognition or emotion recognition can be used in real time communications for a multitude of communication applications. Object detection and classification can be used in augmented reality scenarios, where you want to mark and emphasize certain elements in the scene.
Compared to speech analytics, computer vision is still nascent, though moving rapidly forward. You’ll find a growing number of startups in this domain as well as the cloud platform giants.
3. Voice Bots & Assistants
To me, voice bots and assistants is the tier that comes right above speech analytics.
If speech analytics gets you to NLP and NLU, the ability to convert speech to text and from there move on to intent, then voice bots are about conversations – moving from a single request to a fluid interaction. The best example? Probably the Google Duplex demo – the future of what conversational AI may feel like.
Voice bots and assistants are rather new to the scene and they bring with them another challenge – do you build them as a closed application or do you latch on to the new voice bot ecosystems that have been rapidly making headway? How do you factor in the likes of Amazon Alexa, Google Home, Google Assistant, Siri and Cortana into your planning? Are they going to be the interaction points of your customers? Does building your own independent voice bot even make sense?
Whatever the answers are, I am pretty sure there’s a voice bot in the future of your communications application. Maybe not in 2018, but down the road this is something you’ll need to plan for.
4. RTC Quality & Cost Optimizations
While the previous 3 machine learning domain areas revolve around new use cases, scenarios and applications through enabling technologies, this one is all about optimization.
There are many areas in real time communication that are built around heuristics or simple rule engines. To give an example, when we compress and decompress media we do so using a codec. The encoding process (=compression) is lossy in nature. We don’t keep all the data from the original media, but rather throw away stuff we assume won’t be noticed anyway (sounds outside the human hearing range, small changes in color tones, etc) and then we compress the data.
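To make the "throw away, then compress" idea concrete, here is a toy sketch in Python. It is not how any real audio or video codec works; the sample values, the step size and the run-length coding are all assumptions picked for illustration. Quantization is the lossy part (small differences are discarded), and run-length encoding is the lossless part that benefits from it.

```python
def quantize(samples, step):
    """Lossy step: snap each sample to the nearest multiple of `step`."""
    return [round(s / step) for s in samples]


def dequantize(levels, step):
    """Decoder side: rebuild approximate samples from quantized levels."""
    return [q * step for q in levels]


def run_length_encode(levels):
    """Lossless step: collapse runs of equal values into [value, count] pairs."""
    runs = []
    for q in levels:
        if runs and runs[-1][0] == q:
            runs[-1][1] += 1
        else:
            runs.append([q, 1])
    return runs


# Hypothetical "media" samples: small wobbles around two levels.
samples = [100, 101, 99, 100, 102, 140, 141, 139, 140, 100]
step = 10  # assumed step size; larger means more loss and better compression

levels = quantize(samples, step)
runs = run_length_encode(levels)
restored = dequantize(levels, step)
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(runs)       # three runs instead of ten samples
print(max_error)  # the detail we threw away, never more than step / 2
```

Ten samples shrink to three runs, and the price is a small error in the restored values. Picking the step size is exactly the kind of decision an encoder has to make on the fly.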
The codecs we use for that purpose are defined by the decoder – by what you do if you receive a compressed bitstream. No one defines what an encoder needs to look like or how it should behave. That is left to developers to decide, and encoders differ in many ways. They can’t brute-force their way to the best possible media quality, especially not in real-time – there’s not enough time to do that. So they end up being built around guesswork and heuristics.
Can we improve this with machine learning? Definitely.
Can we improve network routing, bandwidth estimation, echo cancellation and the myriad of other algorithms necessary in real time communications using machine learning? Sure we can.
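For a feel of what a "simple rule engine" looks like in this area, here is a minimal Python sketch of a loss-based bitrate rule. The function name `next_bitrate`, the thresholds (2% and 10% loss), the 5% ramp-up and the bitrate limits are all assumptions chosen for illustration, similar in spirit to published loss-based congestion control rules but not taken from any specific WebRTC implementation.

```python
def next_bitrate(current_bps, loss_fraction, min_bps=50_000, max_bps=2_500_000):
    """Pick the next target bitrate from the last reported packet loss.

    All constants here are illustrative assumptions, not tuned values.
    """
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to how much was lost.
        proposed = current_bps * (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        # Almost no loss: probe for more bandwidth, 5% at a time.
        proposed = current_bps * 1.05
    else:
        # In between: hold steady.
        proposed = current_bps
    # Keep the result inside the allowed range.
    return round(min(max_bps, max(min_bps, proposed)))


print(next_bitrate(1_000_000, 0.20))  # heavy loss, so the target drops
print(next_bitrate(1_000_000, 0.00))  # clean network, so the target creeps up
print(next_bitrate(1_000_000, 0.05))  # moderate loss, so the target is held
```

Every number in there is a hand-picked guess that works "well enough" on average. That is the kind of logic a model trained on real network data could, in principle, replace or tune.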
The result is that you get better media quality and user experience by optimizing under the hood. Not many do it, as the work isn’t as high profile as the other domains. That said, it is necessary.
Interested in ML in RTC?
Here are a few things you can do:
Fill out our survey
This will get factored into the quantitative part of our report. If you fill it out, you will also receive a complimentary e-book we’re writing titled Intro to AI in RTC.
Interested in the report itself? Thinking of purchasing it? Great! We have a special launch discount.
You can find more information about the report itself in our research page.
Doing something interesting in this space? Share your thoughts with us.
Contact us via firstname.lastname@example.org to participate in our study.
The post Where does Machine Learning fit in Real Time Communication (ML in RTC)? appeared first on BlogGeek.me.
What should you be doing about the upcoming WebRTC 1.0 release?
That comic strip above? I think it embodies nicely what comes next.
We started with WebRTC somewhere in 2011 or 2012. Depends who’s counting. So we’re 6 or 7 years in now.
I was promised WebRTC 1.0 in 2015, I think.
Then again in 2016.
In 2017, I was told that WebRTC 1.0 is just around the corner. Definitely going to happen before year end.
Guess what? We’re now almost halfway through 2018. And no WebRTC 1.0. Yet.
But it is coming.
To give you the gist, Google will be ripping out some code, adding new code. Removing APIs. Modifying others. The timeline stated for all this in that posting?
Change is in the air…
That change is going to affect developers and testers everywhere, and the end result is going to be uncertainties and surprises in the coming months. How many months? Many months.
There’s not much you can do about it besides allocating resources to the problem in the short and mid-term future. These resources should be in development and testing.
I touched on development in a previous webinar I did with Philipp Hancke when I launched the next round of my WebRTC training.
Now I want to talk about preparation aspects in the testing domain – what exactly you should be expecting moving forward.
To that end, I am a guest on the upcoming WebRTC Standards webinar series. The webinar takes place later today –