bloggeek

The leading authority on WebRTC

Kranky Geek 2018. A post event post

Mon, 11/19/2018 - 12:00

For me, Kranky Geek 2018 was a tremendously fun experience.

We had our fourth Kranky Geek event in San Francisco last week. As usual, it is a nerve-racking experience up until the point it ends. And it doesn’t start on the day of the event itself – we’ve been busy with content curation, handling presentation drafts and doing dry runs for a few weeks.

The result is quite satisfying. We’ve decided this time to dig even deeper into the domain of artificial intelligence and machine learning and its role in real time communications. As I’ve been saying, WebRTC is ready – so what would be the point of doing an event about WebRTC? We have a lot of WebRTC topics already covered from our past events – and they are all available in the Kranky Geek YouTube channel.

The way we see it, there are 4 domains we had to cover: speech analytics, voicebots, computer vision and RTC optimization.

So we went hunting for the event. In the end, we were able to cover all four domains and squeeze in a few WebRTC-specific topics as well.

The Sessions

This year, we had the biggest number of sessions. The event has become a full day event from a shorter one over the years. The people I talked to noted that the day was long and tiring, but somehow, almost everyone stayed to the end. Here’s what we had this year:

Our own welcome

Kranky Geek SF 2018: AI in RTC from Tsahi Levent-levi

One thing to note here – our AI in RTC report got a promotional discount of ~33%, which will be available until the end of the month. If this space interests you, then definitely check it out.

Discord

Discord operates a large chat operation for gamers. Part of that service includes voice and video calling. At peak, they handle 2.8 million concurrent voice connections to their service.

What they shared was the changes they have made to the vanilla WebRTC code base in order to fit their needs.

Facebook

Facebook were kind enough to give a presentation around Facebook Portal – their new home device that is capable of handling video calls (using WebRTC of course). The device uses machine learning to track the people in the room during a call. They talked about the challenges that come with automating the camera’s zoom and with connecting calls from Portal devices to mobile phones.

This was the first time they shared that information publicly at a conference.

Intel

Intel announced the open sourcing of their media server – the Intel Collaboration Suite for WebRTC – under the name of Open Media Streamer. They also shared information about svt-hevc, their open source HEVC encoder.

Voicebase

Voicebase talked about Paralinguistics – the way we speak as opposed to the words we are saying. They shared the path they took charting that space, and understanding what makes more sense or less sense in terms of value.

Voicera

Voicera discussed virtual assistants and how they need to understand transcriptions.

IBM

IBM explained the notion of voicebots and how they fit into contact centers. They explained the need to be able to hand off from a voicebot to a human agent.

Nexmo

Nexmo showed a demo using Dialogflow, connected to a voice service for ordering a pizza. It stressed the need to be able to connect communication services to various machine learning ones.

Dialpad

Dialpad explained how to take an open source speech to text engine and add some custom words into it in order to improve the accuracy of the transcription.

Callstats

Callstats clustered the sessions they are collecting, trying to infer from that information the type of each call and the root cause of any issues it may have.

RingCentral

RingCentral normalized MOS scores of audio calls across its network and devices, to be able to give a clear indication of call quality – it appears that while there’s a standard specification for MOS, asking device manufacturers to follow it to the letter is rather challenging, so using machine learning they are “fixing” that issue.

Google

Google talked about the current status and efforts in getting Chrome’s WebRTC implementation to the 1.0 specification. It also shared the work being done to improve audio stability and performance in Chrome (lots of architecture changes in how devices get accessed in order to reduce the number of threads used and get a stable delay model for its acoustic echo canceller). There was also a look at what comes after 1.0 – WebRTC NV and what role WebAssembly may play there (I’ll write more about it in the future).

Agora

Agora showed how they use super resolution to improve video quality in calls, and what it means to run super resolution on a mobile device.

Houseparty

Houseparty used machine learning to improve video quality as well, taking a different approach. They shared the work they are doing and the effort it takes to bring it to production.

Microsoft

Microsoft shared the work done on WebRTC on UWP and explained how AR/VR fits into the story and the enterprise use cases they are seeing in the market.

Session Recordings

As always, all the sessions were recorded and are available online.

Kranky Geek in 2019

Every year we’ve done a Kranky Geek event, we came in with the notion that this is the last one. Not sure why, but that was always the case. Then about 9 months after the event, we started discussing the next event with Google.

We’ve changed that this time. We are going to do an event in 2019, and we have a name for it:

Kranky Geek SF 2019

We have a tentative date for the event: November 15, 2019

Put it in your calendar.

We don’t yet know what the theme for next year will be, but I have a hunch that it will include WebRTC and machine learning.

If you want to speak – contact me

If you want to sponsor – contact me

If you have feedback on what we should improve – you know – contact me

Oh – and if you are interested in AI in WebRTC, check out our report – there’s a discount available for it until the end of the month.

The post Kranky Geek 2018. A post event post appeared first on BlogGeek.me.

8×8 Acquires Jitsi From Atlassian. Winners and Losers

Thu, 11/08/2018 - 12:00

Jitsi was just acquired by 8×8, shifting hands from Atlassian. Here’s what to expect.

It seems that Jitsi has now switched hands, moving from Atlassian to 8×8.

Three months ago, Atlassian made a bold (desperate?) decision. It put up a white flag and decided to kill Stride after investing huge amounts of money and resources in it, throw out Hipchat along with it, and “sell” both to Slack, who “acquired” them.

The weird thing in this acquisition was that Jitsi was left behind.

Jitsi is an open source media framework. One of the most popular WebRTC frameworks out there. I wrote about that acquisition in 2015. The reason behind it was Atlassian’s need to own the video communications technology that powered Hipchat. And now that Hipchat is gone, what would Atlassian need Jitsi for?

The last 3 years

The last 3 years have been good for Jitsi in Atlassian.

The team of developers it had was big, considering its scope (and open-sourceness). Especially if you factor in the fact that everything that Hipchat (and Stride) needed from Jitsi was implemented directly inside Jitsi. Not on a private branch of the project available only to Atlassian.

Compare it to how Twilio treated Kurento after its acquisition… Atlassian did a great job at keeping Jitsi’s momentum and community. At the very least, it didn’t hurt the project, letting it grow and flourish, paying the salaries of its developers.

The interesting initiative that took place alongside the Jitsi open source project is Jitsi Meet – a free version of a group video calling service. One that wasn’t limited to a small number of participants or lower video resolutions.

Jitsi is in a better place than it was 3 years ago, prior to its acquisition.

Leaving Atlassian

Leaving Atlassian was a matter of time.

There was no room in today’s Atlassian for an open source project like Jitsi that brings no added value to its commercial products.

Jitsi didn’t go to Slack as part of the Hipchat/Stride deal. Slack were already using Janus, and moving on to their own homegrown media server – something they shared with us at Kranky Geek 2017 (hint: come and join us this year at Kranky Geek 2018). There was no reason for them to further invest in yet another migration – or they might have wanted to migrate to Jitsi and acquihire the team but it didn’t pan out.

That left Atlassian with one of 3 alternatives:

  1. Kill the project and be done with it. Send the developers home or integrate them into some other parts of Atlassian. It would work nicely, but if the asset can be sold, then why not recoup some money?
  2. Spin out the project. Let the team go, giving them back ownership of the code, and have them go scrape for a livelihood around Jitsi. Probably by offering a commercial license, support and customization services, etc. – this isn’t that far out as an idea – it is how Janus (another open source media framework) operates today and how Jitsi operated prior to its acquisition by Atlassian
  3. Sell it to someone who’s interested in it. This is what it ended up doing. Given the other alternatives in front of them, I tend to agree with Andy’s statement that this is a mercy sale

Joining 8×8

8×8 acquiring Jitsi is an interesting choice.

Here’s where things get interesting:

8×8 already has a WebRTC based web conferencing solution called “8×8 Virtual Office Meetings Online”. Somewhere in 2016, this service got rewritten. At some point between then and now, guest access on Chrome was introduced. From the looks of it, based on WebRTC.

Why would 8×8 need/want Jitsi when it had a solution already?

I can think of three possible reasons for it:

  1. Their WebRTC solution isn’t that good, too expensive, and they were looking for a better alternative. Jitsi was a catch in such a case
  2. 8×8 is looking to own its video technology and not use third party software, commercial or open source
  3. They were using Jitsi for their 8×8 meetings thingy and Atlassian selling that asset was an opportunity for them to control the tech stack without relying on a third party – probably on the cheap

What would 8×8 do with Jitsi?

The obvious thing is to integrate the tech into its meetings service. If it is already there, then use the Jitsi team of developers to tweak and fine-tune the thing for the 8×8 use case.

If it isn’t there yet, then integrate it and replace its current WebRTC tech in the meetings app. This is a more challenging undertaking, as Jitsi will need to meet the current feature list of what 8×8 already has in that domain, along with integrating to an existing codebase of a service and an application.

Jitsi probably has most of the needed features to make this happen. It wouldn’t have been acquired otherwise.

In a different area, 8×8 has no real open source activity at the moment. Its GitHub account is mostly forked repos. Search results for “8×8 open source” are dominated by the Jitsi acquisition news:

(the rest are comparisons to other vendors, who are leaning more heavily on open source)

If 8×8 is interested in embracing open source, then it just got an interesting opportunity to do just that. Which brings me to the last topic –

The future of Jitsi

What will become of Jitsi?

Here we need to look at Jitsi and Jitsi Meet separately.

Jitsi

The Jitsi Videobridge, along with its derivatives, add ons, plugins, extensions and client-side SDKs.

That’s the open source part of the project. At Atlassian, there was nothing kept for internal use of Hipchat/Stride. Everything found its way back to the open source project.

Will 8×8 continue in that path?

Their focus in the coming months is going to be the integration of Jitsi into their 8×8 meetings service. They are bound to use the resources of the Jitsi team to do that.

Managers may decide to implement some of the features in the 8×8 meetings service moving forward and not invest in adding it to the Jitsi open source project. Or they might decide to add everything via Jitsi.

8×8 might end up taking the extreme path – ditching the Jitsi project as an open source one – embedding it into their meetings app and from there on, investing in that private branch only. I see that as a highly unlikely outcome in the next 2-3 years.

Time will tell which direction is taken.

Jitsi Meet

Jitsi Meet is a different story altogether.

It is a group video meeting service. One which doesn’t limit the users’ bitrate in sessions, doesn’t limit the number of users in a session, offers mobile apps, Slack and calendar integration and scales globally. All for free.

Would 8×8 see it as competition to their own 8×8 meetings app? If it grows in popularity and its maintenance costs increase, how happy would 8×8 be in paying the bills? Would it see Jitsi Meet as a sales tool for its other services? How would it measure the success of this service?

WhatsApp’s founders just left Facebook this year. It was over disputes about data, privacy and such. Most of all, it was probably a dispute around the future of WhatsApp and Facebook’s intent of monetizing the asset. The same (at a much smaller scale) can happen here at some point.

How would 8×8 monetize Jitsi Meet? Should it? If it doesn’t, should it kill it?

I don’t know the answers. I am sure 8×8 doesn’t either. It is just too early to tell.

Last Words

Jitsi is an open source success story in WebRTC. There’s no doubt about it.

It is now entering a new chapter in its life, under 8×8.

I wish the team the best of luck, and I hope that we as an industry will still have the option to use Jitsi for our future projects.

Media Frameworks are part of the picture of the backend story of WebRTC. Care to learn the rest? Try out my free mini-video series on WebRTC backend servers:

Register to the video series

The post 8×8 Acquires Jitsi From Atlassian. Winners and Losers appeared first on BlogGeek.me.

Meet me @ Kranky Geek San Francisco 2018

Mon, 11/05/2018 - 12:00

Kranky Geek is happening this year again, the date is Nov 16, and we’ve got the best lineup of speakers for you.

Kranky Geek started almost by mistake. Like most good things that happened to me. It wasn’t planned. The result though is becoming a tradition by now, where I get to work with Chris Koehncke and Chad Hart for a period of time that can be considered quite intense (we’re all too opinionated).

Google, along with our other sponsors, makes this event happen. We only curate the content to make sure the end result is great.

In last year’s event, we started looking at the domain of AI. You can find the recordings of that event on YouTube. The feedback we got was positive, so this year we’re taking a step further here. Many of the sessions will focus on machine learning and AI and its impact on real time communications.

What’s on the Agenda?

AI in RTC.

As always, our intent here is to focus as much as possible on services and applications that are running in production already. It won’t be theories about what can be done but what people are doing. Today.

The updated agenda can be found online. It might change a bit in its ordering, but it is mostly ready.

This year, we have some brand new speakers for you:

  • Discord will be giving a session about their service and what they had to do with WebRTC to make it work for their use case. My suggestion? Read their post to get ready for this session – it will be really interesting
  • Houseparty are joining us for the first time as well. Tinkering with machine learning on device. One of the main challenges these days is deciding where to run inference with machine learning – on device or in the cloud. We will see both options throughout the day
  • Agora will explain what they are doing to improve video quality in real time on mobile devices by using machine learning
  • Voicera will be talking about the challenges in speech recognition when it comes to handling meetings
  • Dialpad are there to talk custom vocabularies. Every company has that. How do you transcribe Kranky Geek? That’s a question I’ll ask in the Q&A of this session…
  • Intel will discuss newly open sourced visual processing tools to help you build out your application
  • RingCentral is joining us late in the game. We’re figuring out with them a stellar topic for the event

We also have some “repeat” speakers:

  • Facebook this year will give us a sneak peek at the technology (and AI) behind their new Facebook Portal device. What I am really keen on hearing is what decisions they made to get their “follow you around” feature to work
  • Voicebase will focus on paralinguistics this time. The nuances of speech that aren’t text – and how to capture their meaning
  • Callstats will be discussing this time the use of looking at ongoing call data using… machine learning
  • IBM will be all over voicebots and their uses in contact centers. We will get to look under the hood on how these get implemented
  • Nexmo are going to show us the complexity of connecting real time voice streams to cloud based speech to text engines. (technically, they are a new speaker, but I figured that now that TokBox is part of Vonage which also owns Nexmo, they are repeat speakers)
  • Google will give an update on Chrome’s implementation of WebRTC, with a focus on 1.0. They will also give a deep-dive into the upcoming architectural changes in Chrome’s audio processing engine
  • Microsoft is going to give us a demo of WebRTC, Mixed/Augmented Reality and HoloLens. And we’re saving this for last so you’ll stick around

We are expanding our family of Kranky Geek speakers and Kranky Geek companies, which is a true joy. I can’t wait to hear your feedback once the day is over.

Our sponsors this year

As always, the event is practically free to attend (there’s a $10 admission fee that gets donated to Girl Develop It).

The companies that made this event happen this year are Google, Intel, Agora.io and Nexmo who are our premium partners for the event; Callstats.io, Voicebase and RingCentral who are our silver partners for the event.

No fire drill

I am not sure if this is good or bad. We had a surprise fire drill last year. We knew about it about a week or two before the event. It caused us so much headache. And a lot of worries.

It ended up pretty well, with our audience and speakers getting a one hour break outside on a beautiful sunny day. Almost all of them came back after the drill, which isn’t obvious or even expected.

Many were happy for the break – and the smalltalk that ensued during it.

Hopefully, there will only be pleasant surprises this year as well.

What are we looking for in Kranky Geek?

We had to turn down a few vendors who wanted to speak. This is a process that takes place every year.

There’s no specific set of rules of what we approve or don’t as a session in Kranky Geek, but for me it boils down to this:

  1. Something new that wasn’t discussed at Kranky Geek before
  2. Preference to something running in production at scale
  3. An interesting topic that would appeal to developers
  4. Related to real time communications
  5. A speaker that can “hold a room”

While the lineup of speakers for this year is full, if you want to speak in future Kranky Geek events – be sure to catch me during the event for a chat.

Should you travel just for this single day?

I got this question a few times in the past few weeks.

My guess is that if this is the only thing you’re doing in San Francisco and coming for, then skip it. Especially if you are traveling from abroad.

That said, if you want to feel where WebRTC is headed, talk to many of the people who deal with it daily in the real world, then this is the place to be. So many discussions take place during the breaks that it might be worth coming only for the breaks… I know a person or two that are coming only for that.

We try to make Kranky Geek special and unique. We work hard to select the speakers and work with them on their presentations. All to make it worth your travel, wherever you come from.

Can non-developers attend?

We received this question recently.

There is no easy answer to this one. On one hand, the event and its sessions are technical in nature as our focus is developers. On the other hand, the sessions are short (20 minutes all-in-all), so our speakers tend to focus on the essence and not dive too deep into the nitty gritty details. So a tough call.

My suggestion? Check out some of the session recordings on YouTube from past events and make your decision based on that.

Register now

Yes. There’s this minor detail.

You need to register to attend. There’s limited room capacity, and at some point, we will need to close the registration.

We’re already half full in our registration list, so save your spot now and don’t wait.

Register NOW

Do you want to meet me prior to the event?

I’ll be in San Francisco Nov 12-17. Nov 15-16 are reserved for Kranky Geek. The rest for meetings with people – around WebRTC, CPaaS, testRTC, my WebRTC course, consulting and just catching up.

If you want to meet me during that week, leave me a note.

The post Meet me @ Kranky Geek San Francisco 2018 appeared first on BlogGeek.me.

Are Embeddable Video Experiences Necessary?

Mon, 10/29/2018 - 12:00

There’s no one size fits all in communications. In video, that means that embeddable video experiences are necessary and they are here to stay – they aren’t a passing trend.

Source: Vidyo

Years ago, before WebRTC came into our lives, I worked at a video conferencing company. My role there at the time was CTO of the business unit dealing with licensing VoIP technology to others. The leading product at the time was a video conferencing client that could fit into a device and interoperate over SIP and H.323. As a CTO, I was given the initiative of getting us into the cloud, which ended up involving something that was meant to become a CPaaS (just not using that term as it didn’t exist). It never came to fruition since I left the company a bit after WebRTC was announced and I knew where the future was headed.

Anyway, one day I was asked to take a business trip to the US, to meet with customers and potential customers. One of these customers was a vendor involved in the prison industry (not sure what the whitewashed term for that is, so just using prison industry).

Video Conferencing in Prisons

To clarify: I am not taking a stand here around prisons, prisoners or video conferencing in prisons. Just sharing this as a requirement that I’ve seen in the past.

What they were doing was building “phone booths” for prisoners so they could call home and talk to friends and family. They were in the process of shifting towards video calling, and were using at the time one of the known brands – I don’t remember which. Think of Polycom or Cisco video conferencing systems for reference.

Source (somehow, the happy faces seem exaggerated for the use case)

The challenge was in the fact that these vendors and their solutions were geared towards video conferencing in the enterprise – what we now wrap under the term of unified communications. This meant that a lot of the features and requirements that a vendor developing a communications service for prisoners needed were hard or impossible to meet:

  • Full moderation of the call by a third party at all times
  • Ability to join the session as a silent or known participant (that’s the moderator)
  • Ability to manage and control session length
  • Knowing the identity of both people in the call, but having the system flexible enough to accommodate new users and guests in the system
  • Wrap the whole experience with other features (browsing) that prisoners might want to use

They ended up licensing our technology to build it all, at prices that today would seem ridiculously high, though they made sense in those days, when real time communications technology wasn’t a commodity and wasn’t open sourced.

While we’re in the domain of anecdotes, funnily enough, we were using GIPS for the audio codecs at that time on PCs. The same company that Google acquired and built WebRTC out of.

Back to Embeddable Video Experiences

Prisons and prisoners aren’t the real story here.

Embeddable video is.

Communications between humans is something that can’t really be placed into a set of known rules.

Yes. We’ve had the telephone companies around for 120 years or so, explaining and educating us on how to communicate with each other remotely.

Unified communications has a gazillion features dealing with telephony, trying to accommodate each and every eventuality that a customer may want and need. Which is nice, but from a certain point, it is really hard to scale across customers with different needs.

Video conferencing has been the hardest of all. Video is hard, so everything about it is hard as well.

This all meant that communications was always a service. Something you get “out of the box” as is. Or something you can customize if you are big enough, with enough money to pay.

WebRTC, cloud, virtualization, SaaS and a few other terms came into our lives. What they essentially did was reduce the barrier of entry for those who need video communications. This meant that scenarios that weren’t catered for with enterprise video conferencing were now possible to achieve at lower price points.

The end result?

We are now seeing video communications being embedded in places where it never really existed.

Are these new?

They are and they aren’t.

They aren’t because the need was always there.

They are because only now can they be satisfied commercially.

The only question that remains is where do you see embeddable video contributing to your business and how do you go about implementing it. In the last few months, I’ve been working with Vidyo on research around exactly this topic.

Interested in the state of embedded video in 2018? Download the free report here. There’s also a joint webinar on the topic coming up – be sure to register for it:

Register to the free webinar

The post Are Embeddable Video Experiences Necessary? appeared first on BlogGeek.me.

WebRTC is Ready. Now What? (a look at the state of WebRTC in 2019)

Mon, 10/22/2018 - 12:00

There should be no doubt about WebRTC anymore. It is here and it is ready for everyone. The question is: “now what?” Where are we headed with WebRTC in 2019?

Is WebRTC Ready Yet?

That was the name of a website that tracked how well WebRTC was adopted by the various browser vendors.

Apparently, it is also the most common question on Google about WebRTC:

It is time we say it out loud (I don’t believe anyone has done that up until now):

WebRTC is READY

I was asked to speak at Apidays Amsterdam last week, which was a true joy. The topic I was tasked with was around WebRTC being a standard, and well… where are we headed next. So I decided to rephrase it a bit and ignore that tiny bit of a fact that WebRTC 1.0 still isn’t an official standard (nobody but those in standardization organizations and those opposed to adopting WebRTC seems to care anyway).

So I sat down to think about what it means that WebRTC is ready. Which led to this question:

Why do I think that WebRTC is ready?

The best way for me to answer that question was to give 3 recent examples of things happening with WebRTC (and I don’t mean Uber doing VoIP using WebRTC):

#1 – VP8 Supported by Safari

I’ve been a critic of Apple’s non-support of WebRTC and then Apple’s non-support of VP8.

The fact that Apple decided at the time to support only H.264, a royalty bearing video codec, and ignore VP8, the royalty free alternative, wasn’t a good sign.

In the past two weeks, tweets and webkit bug links have been flying around, indicating that if the mountain won’t come to Muhammad, then Muhammad must go to the mountain. Or more accurately, that Apple decided to do a Microsoft and support VP8.

Do a Microsoft because these are the same steps Microsoft took when going WebRTC. Starting with H.264 and only later adding VP8.

So Apple has started with H.264 and is only now adding VP8.

When will this be available for all? Ask Apple.

What’s important is that ALL modern browsers now support both VP8 and H.264. More on that in a sec.

It doesn’t stop there either. Apple joined the Alliance for Open Media as a founding member. This alliance is behind the future video codec AV1, and now has 40 members in it.

#2 – H.264 Simulcast Support

The second example is H.264. It is now becoming a first class citizen.

H.264 on Chrome didn’t have simulcast support. The “fix” for that was available for quite some time, but was never incorporated into Chrome. Simulcast increases the quality of group video calls, so not supporting it in H.264 made H.264 useless for group video calls.
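To make the simulcast point concrete, here is a minimal sketch of the idea: the sender encodes the same video at several bitrates, and the media server forwards to each participant the best layer that participant's downlink can carry. The layer names, resolutions, bitrates and the `pick_layer` helper are all assumptions made up for illustration; this is not how Chrome or any specific media server implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    rid: str           # hypothetical label for the simulcast layer
    width: int
    height: int
    bitrate_kbps: int  # hypothetical target bitrate


# Hypothetical simulcast ladder: one camera feed, encoded three times.
LAYERS = [
    Layer("low", 320, 180, 150),
    Layer("mid", 640, 360, 500),
    Layer("high", 1280, 720, 1500),
]


def pick_layer(available_kbps: int, layers=LAYERS) -> Optional[Layer]:
    """Return the highest-bitrate layer that fits, or None if none fits."""
    fitting = [layer for layer in layers if layer.bitrate_kbps <= available_kbps]
    if not fitting:
        return None
    return max(fitting, key=lambda layer: layer.bitrate_kbps)


# Hypothetical receivers and their estimated downlink bandwidth in kbps.
receivers = {"alice": 2000, "bob": 600, "carol": 100}
choices = {name: pick_layer(kbps) for name, kbps in receivers.items()}

for name, layer in choices.items():
    print(name, layer.rid if layer else "audio only")
```

Without simulcast there is a single encoding, so everyone in the call gets whatever the weakest participant can handle, or that participant gets nothing usable.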

There can be two reasons for this foot-dragging by Google:

  1. Timing and priorities. Google didn’t really care enough to add that in and deal with the headaches of pushing code from a third party with the fix and validating it
  2. The push towards VP8. Increasing the quality of H.264 would get more developers to adopt it, especially when Apple supports only H.264 on Safari

Since VP8 is coming to Safari, the reason to give it an edge over H.264 isn’t there anymore. Especially considering the healthy growth of the Alliance for Open Media.

The end result?

  • All modern browsers support VP8 (Safari support is imminent)
  • All modern browsers support H.264; and simulcast will soon be possible for it
  • VP9 is available only in Chrome and Firefox for WebRTC – but who cares? The future will be AV1. And ALL browser vendors are part of the Alliance for Open Media where AV1 is getting specified (YouTube is already testing AV1 decoding in Chrome and Firefox)

This media codec disparity between browsers was the main challenge for the WebRTC community. It is now behind us.
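A rough way to see what codec parity buys you is to intersect the codec sets of the browsers in a call. The table below only encodes what this post states, and it assumes the Safari VP8 support has already shipped; the `common_codecs` helper is hypothetical and has nothing to do with the real SDP negotiation browsers perform.

```python
# Video codec support for WebRTC per browser, as described in this post.
# Assumption: Safari's VP8 support (imminent at the time of writing) has landed.
SUPPORT = {
    "Chrome": {"VP8", "H.264", "VP9"},
    "Firefox": {"VP8", "H.264", "VP9"},
    "Safari": {"VP8", "H.264"},
    "Edge": {"VP8", "H.264"},
}


def common_codecs(browsers, support=SUPPORT):
    """Return the set of codecs every listed browser can use."""
    sets = [support[browser] for browser in browsers]
    return set.intersection(*sets) if sets else set()


print(sorted(common_codecs(["Chrome", "Safari"])))
print(sorted(common_codecs(SUPPORT)))
```

Any mix of browsers ends up with both VP8 and H.264 in common, so the choice between them becomes a product decision and not an interoperability constraint.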

#3 – Google Shifts Focus

The third reason why I believe WebRTC is ready?

Google is shifting focus. It is doing what is needed to support WebRTC and the migration to the 1.0 specification (unified plan for example), but its heart and mind are already elsewhere:

At the beginning of this month, Google announced Project Stream – a cloud based service that streams high end games from resource intensive cloud based machines to low end devices in real time.

There’s not a lot to go on about the technology, but it seems to be based on WebRTC.

Project Stream official gameplay capture: 1080p@60fps https://t.co/SjznbRCBAP

— Justin Uberti (@juberti) October 2, 2018

Why else would Justin Uberti from Google’s WebRTC team publish this? 1080p resolution at 60 frames per second with low latency for gaming. This type of a use case is different from real time communications. It requires a different focus and optimizations. And yet… the WebRTC team at Google have probably spent some cycles on supporting it.

Why is that a good thing?

Because for Google, WebRTC is ready when it comes to real time communications, and beyond optimizations and housekeeping, it is time to move on and look at other use cases where WebRTC can be beneficial.

What’s Next?

So. WebRTC is here:

  1. Apple supports it now; and there’s codec parity across browsers
  2. H.264 is a first class citizen in WebRTC
  3. And Google has moved on to other use cases for WebRTC

What’s next for WebRTC?

The answer I gave in that presentation at Apidays was Machine Learning.

I like that slide above. I like it because you can take RTC out of it, replace it with whatever word/term/industry you want and it will STILL be true.

In the rest of that presentation, I went over the research report that Chad Hart and I have written, sharing some of our findings.

I went into the 4 domains we’ve mapped in our research, in each giving an example of the impact and use cases that are now possible:

  1. Speech analytics, and how we’re shifting from offline processing to real time
  2. Voicebots, and how work in that area is accelerating
  3. Computer vision, where use cases are vastly different between consumer and enterprise settings
  4. Media optimization, and the shift from heuristics to machine learning

That Deck from Amsterdam

That slide deck from Amsterdam is now available online as well. You can view it here:

WebRTC is READY. What's Next? from Tsahi Levent-levi

Machine Learning and Real Time Comms

If you are interested in learning more about machine learning, so you can make smart decisions in your own company about the use and introduction of machine learning and artificial intelligence in a communications application, then definitely check out our report: AI in RTC

The post WebRTC is Ready. Now What? (a look at the state of WebRTC in 2019) appeared first on BlogGeek.me.

Can Google RCS Win the Messaging Game Through AI?

Mon, 10/15/2018 - 12:00

RCS is being brought back from the dead by Google, and its next play will probably be with AI.

Carriers have a problem

SMS won’t stay here forever. In fact, most of the messaging traffic is happening on social networks now.

Voice is shifting as well. Migrating to these same social networks. With the ability to upgrade these calls to video calls. With stickers. And silly hats, cat lenses and whatnots.

Want to learn more about the use of silly hats and other AI features in communications? Check out our AI in RTC report preview

Download the preview

Their circuit switched network technology is decaying, left in its 80’s or probably 50’s. Most of what goes on there is spam or one-time passwords anyways. Nobody cares.

So much so that Google is planning on diverting incoming calls to its assistant (but more about it later).

The solution, in the form of IMS and later RCS (or call it Joyn or whatever other branding it was given throughout the years), has been some 20 years in the making. And it doesn’t seem to be coming any time soon. At least not if left to the arduous processes of carriers and their suppliers.

Google has a problem

A VERY different problem.

Google has no messaging clout.

For consumers?

Apple iMessage wins on iOS. It acts as a chameleon, catching your messages and deciding if they should be demoted to SMS or use modern messaging via iMessage instead.

Facebook with Messenger and Whatsapp reigns supreme on Android, and in many cases on iPhones as well. Where they aren’t as strong, you’ve got a slew of other social players with 100+ million monthly active users. None of them looks like a carrier. And none of them is Google.

Google has Allo, Duo, Chat, Meet, Hangouts, Messages and probably a few more apps that I’ve forgotten to mention. All in different states and capabilities; but none which is dominant compared to its competitors. Actual monthly active users and amount of real messages going between users? Not shared. Probably not stellar.

And Google has RCS.

For businesses?

Apple, Facebook and others are adding APIs. Introducing bot platforms. Building marketplaces. And they are doing it slowly, fearful of becoming the spam cesspit that is the good ol’ carrier communications tech today.

Slack is killing it. And the rest of the cadre of UCaaS and enterprise communications players are trying to move into their space.

Google has Meet and Hangouts Chat. Part of G Suite. Meet gets used. Hangouts Chat I don’t really know. But it seems that most just skip it and move on to Slack or some other tool.

Google also has no business angle to its consumer facing communications applications yet, or at least nothing popular enough.

What’s new in RCS land?

Nothing really.

I wrote in April about RCS being still dead. For some reason, Google is still hammering away at it. Similar to Google+ if I need something to compare it to.

A press release last month by Samsung and Google brings Samsung to the RCS graveyard. New Samsung devices, and maybe later older ones, will come -gasp- with a Samsung Messages app that will work seamlessly with the Android Messages app using each other’s RCS technology!

This interoperability nightmare of the carriers will continue on, leaving RCS dead.

Adding new carriers or smartphones or chipset makers into the fold won’t help either.

And it isn’t as if Apple is making any noises of being interested in RCS, and why should they be?

That said, there are those who will be adopting RCS.

We are shifting towards an omnichannel world. No single protocol to rule them all. No single vendor to rule them all. You want to send your message as a business to a consumer?

You can use SMS. Or better do it over Messenger or Whatsapp or Apple Business Chat – there’s more context and richness in those, and consumers actually care about these channels. Which brings us to a place where businesses just need to support wherever their customers are with no decent common denominator.

And wouldn’t it be great if we could throw SMS and use RCS instead? At least where we can?

So CPaaS vendors are adding support for RCS and announcing it in their arms race to world domination by collecting as many social messaging icons as they can.

That’s great, but not enough to save RCS.

Can Google change RCS’s predicament?

Not really.

There are just too many players and this is a domain where Google has been struggling to go it alone as it is.

Here’s what it takes to bring RCS properly to the masses:

Chipset vendors

Chipset vendors are at the bottom of the food chain, but they need to offer their support to make RCS happen.

Unlike other messaging services, RCS is “bolted” on to the identity of the user and his device. The SIM card. The ability to connect the end user, through an application, to the SIM card, and from there to the carrier network is what presumably makes RCS different. But for that to happen, chipset vendors need to pave the way, even if just a little bit.

Handset manufacturers

Handset manufacturers need to make sure that the RCS application is there: implemented, supported and pre-installed on the device.

Without being pre-installed, users will need to pick and choose between an RCS app from a handset manufacturer or a carrier (the word bloatware comes to mind) OR pick Whatsapp instead. The choice is a simple one for most.

They need to make the application attractive and sleek. Things they can’t really do. Competing with current successful social messaging apps requires a lot of investment. Nailing the user experience is a lot harder than it looks.

Carriers

Carriers need to actually support RCS. As a service. In their network. And have these things called mobile phones that support RCS. And enough people that have these devices so they can actually talk to each other.

Preferably, all carriers within a country should flip the switch on RCS simultaneously.

How likely is that to happen?

Single, very complex specification

And all of these players need to do so for a very complex IMS/RCS specification.

Testing the combinations of devices and networks is going to be hellish, especially for those who aren’t going to just select the default Google implementation of RCS client/server.

Which is exactly what Samsung decided to do. Have its own service and then interoperate it with Google’s. I can easily see other big players – chipset vendors, handset vendors and carriers who would be either scared shitless of ceding control to Google or not magnanimous enough in letting Google take control over that piece.

This headache also suggests something really important:

If RCS succeeds, it won’t move as fast as any of the other social networks in introducing new features, services and capabilities

There are too many moving parts, controlled by different players, some of which are doing the same things.

Network effects

Then there are the network effects.

When can I use RCS on my phone?

It needs to be installed there. Probably pre-installed.

The people I communicate with should have it as well.

Our networks should support it.

Oh – and there’s this minor detail of me actually going into that app to send a message.

How many times this week have you clicked on this icon on your Android phone?

What about these icons?

Enter Artificial Intelligence

I’ve been thinking about it for quite some time.

How can Google become relevant in messaging?

It is unlikely to come from features and capabilities at the core of social messaging. None of its services stick:

  • Google+ was “shut down” publicly this month. Google found a great excuse – a potential security flaw
  • Duo was supposed to compete head-on with Apple FaceTime, offering things like faster connections and the knock knock feature. But what have we seen from Duo since its launch? And are you using it at all?
  • Allo was interesting, but got no adoption. It got halted in April if you believe the news
  • Hangouts is being replaced by Meet, at least for the enterprise. Will it be shut down for consumers? Time will tell
  • Hangouts Chat is only starting its way, though I haven’t heard anything at all since its public launch
  • Meet works just fine. For the enterprise. If you have a Google account
  • The Google Messages app is purely for SMS. And it is crappy to say the least. It doesn’t respond as fast or as fluid as other social messaging apps, and frankly, I don’t really care about the technical reasons for it

The one thing Google has going for it is AI. In droves.

Which is probably why Google Duplex is reportedly rolling out next month, helping phone users book tables at restaurants – on their behalf.

It is also why Google is now adding to its Assistant the ability to screen spam calls:

These AI features have a potential to actually succeed. They don’t really relate to RCS or even messaging, but they are about telephony.

Allo was about messaging. As reported on The Verge in the April Allo pause:

As part of that effort, Google says it’s “pausing” work on its most recent entry into the messaging space, Allo. It’s the sort of “pause” that involves transferring almost the entire team off the project and putting all its resources into another app, Android Messages.

Google won’t build the iMessage clone that Android fans have clamored for, but it seems to have cajoled the carriers into doing it for them. In order to have some kind of victory in messaging, Google first had to admit defeat.

That’s the Google RCS effort right there.

If you take the AI related features in Allo, and think of them as getting Google Assistant into Messages, the Google RCS app, then it makes sense in a way. But not enough sense.

The Google Assistant doesn’t feel like a product by now. It is a large set of features and capabilities that can be used to add smarts into phones. It is a window to the phone’s (and Google’s) AI for the consumer.

Limiting it to run for RCS only doesn’t seem like the right thing to do. Would it be enough to save RCS? Would it be enough for Google to gain back users from other messaging apps?

It is too early to say, as none of it has come to fruition in an app customers can use.

Google could have tried to do with Allo the same things it is doing with its Contact Center AI:

Provide the whole AI for communication part as an API, a set of building blocks for others to use and embed. It worked so well for them that it got many in the industry lining up to partner with it in contact centers. Launch partners for the Contact Center AI include Mitel, Genesys, Vonage, Cisco, RingCentral, Five9 and Twilio to name a few.

Would such a thing work with social messaging apps?

Apple wouldn’t touch it with a long stick for its iMessage.

Facebook wouldn’t either. So no Messenger or Whatsapp.

Telegram? I don’t see that happening.

WeChat? Chinese.

Who would they be left with? The smaller players, who might grow, but none seem to be rising above white noise level.

Which gets us back to Google itself. With Messages/RCS/Chat.

What Google needs to do is find the sticky features that will get users to use its app. Those that can get value out of it even when the other participant isn’t using the same app. Add smarts into SMS itself, while providing a rich experience to the user when interacting with others who have that app.

The real question is why limit this to RCS and carriers? Why not just offer it as the out of the box Android experience to everyone? Have it there by default. Let people download and install it on older devices and on iPhones.

Probably because Google still believes it relies on carriers for its Android success. Which is what’s been holding it back in mobile social messaging since Android came to our lives.

Want to learn more about the use of silly hats and other AI features in communications? Check out our AI in RTC report preview

Download the preview

The post Can Google RCS Win the Messaging Game Through AI? appeared first on BlogGeek.me.

WebRTC vs Zoom. Who has Better Video Quality?

Mon, 10/08/2018 - 12:00

WebRTC vs Zoom? WebRTC is actually quite good. But you knew that already – didn’t you?

They say quality is in the eye of the beholder. So behold.

We’ve all been told time and again that this video conferencing vendor or that video conferencing vendor works great. They offer the best quality. The best experience. They work in conditions that others don’t.

I even had a call once with an entrepreneur that explained to me how he is going to offer a service that is better in its 1:1 video quality than Skype and Google Hangouts. And he is going to do it with WebRTC. I spent the better part of that call to get him off that idea (something about his logic was off there).

But I am digressing.

Like many others, I’ve been told time and again how Zoom is great. How in spite of the fact that it doesn’t work in the browser and forces you to download its client (some even refer to it as a virus), it gets traction and adoption. It feels like it is the best game in town. And then they mention the reasons:

  1. It’s free (until it isn’t, which is a great business model if you can make it work, and Zoom is making it work)
  2. It has better video quality than the competition. Especially WebRTC

I am not the only one who needs to listen to it, and even believe it to some extent. The guys at Jitsi got curious – why not put it to the test?

So they took a Mac device, placed it on a WiFi network, added a network limiter so they can fiddle with the network configuration, and did a 1:1 call. Once with Zoom. And once with WebRTC.

The idea is this – start with as much bandwidth as the video call wants. Then limit it to 500kbps. Check how much time it takes to adapt. Remove the limit and check how much time it takes to adapt back. More about it in Jitsi’s blog.

Essentially – testing for these network conditions:

The longer the marked areas, the worse the experience is going to be for the users.

And guess what? Zoom fared worse than WebRTC. Not a little, but a lot worse.

Full adaptation to limiting the bandwidth took WebRTC 20 seconds. It took Zoom 156 seconds (!).

Ramp up back to 2mbps took WebRTC 32 seconds. It took Zoom 62 seconds.
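
To make “time to adapt” concrete, here is a minimal sketch of how such a number could be derived from a series of bitrate samples. The sample format, the 10% tolerance band and the function name are assumptions made for illustration; this is not a description of how Jitsi measured it.

```javascript
// Hypothetical helper: given bitrate samples [{ t: seconds, kbps: number }],
// return how many seconds after `changeAt` the bitrate enters the tolerance
// band around `targetKbps` and stays there for the rest of the samples.
// Returns null if the bitrate never settles.
function timeToAdapt(samples, changeAt, targetKbps, tolerance = 0.1) {
  const within = (s) => Math.abs(s.kbps - targetKbps) <= targetKbps * tolerance;
  const after = samples.filter((s) => s.t >= changeAt);

  // Walk backwards to find the last sample that is still outside the band.
  let lastOutside = -1;
  for (let i = after.length - 1; i >= 0; i--) {
    if (!within(after[i])) {
      lastOutside = i;
      break;
    }
  }
  if (lastOutside === after.length - 1) return null; // never settled
  return after[lastOutside + 1].t - changeAt;
}

// Made-up samples: a 500kbps limit is applied at t = 5 seconds.
const samples = [
  { t: 0, kbps: 2000 }, { t: 5, kbps: 2000 },
  { t: 10, kbps: 1400 }, { t: 15, kbps: 900 },
  { t: 20, kbps: 520 }, { t: 25, kbps: 500 }, { t: 30, kbps: 490 },
];
const adaptSeconds = timeToAdapt(samples, 5, 500);
console.log(adaptSeconds); // 15
```

The choice of tolerance changes the result quite a bit, which is one reason why numbers from different tests are hard to compare directly.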

Now here’s my analysis of this.

WebRTC Rocks

Yap. It really does.

The screen capture from that Zoom blog post that was pasted by Jitsi?

Stating that “web-RTC is a very limited solution that would not allow us to provide all the excellent features that our users have come to expect from us”?

That’s from 2015.

A lot has improved in WebRTC since then, if that explanation was even correct in 2015 to begin with.

Without the need for most of us to do anything, we’re getting updates to a top notch media engine in the form of WebRTC inside the browsers we use. The code used in Chrome is open sourced, so it is accessible to all to embed in their own applications as well.

Security fixes? New codecs? Improved media algorithms? They just “happen”. Out of thin air. For most of us.

Defending Zoom

If I look at it from Zoom’s point of view, besides the fact of being a dominant player in the market with or without WebRTC, here are the challenges with such a test scenario:

  • It was done once, or a few times. But it is still only one scenario
  • It wasn’t a real life scenario. Just something concocted for this. Jitsi could have rigged it and tweaked it so that WebRTC would shine, but in real life, that doesn’t happen, and at Zoom we’re optimizing for real life scenarios
    • (that isn’t really so. From my experience and knowledge of the Jitsi team, I’d estimate they tried to be VERY careful here to not fall into that trap)
    • (and what’s real life scenarios anyway?)
  • The network limiter used changes behavior in ways that aren’t close enough to reality
    • (that I can understand and live with. We see faster uptake of the same type of scenarios for WebRTC at testRTC – more on that later)
  • Zoom might be working through external remote servers for that same session while WebRTC is going peer to peer on the local network. Servers behave differently than clients, so the results seem somewhat “off”
  • In other scenarios, Zoom might actually be better than WebRTC

Which leads us to the fact that more tests are needed to know which one is best and in which scenarios.

This starts to sound like the VP8 vs H.264 quality comparisons of the past (I never could tell the difference).

It’s the Infrastructure, Stupid

With WebRTC, it all boils down to the infrastructure. The one with the better deployment wins the quality game.

  1. Do you go peer to peer for 1:1 sessions and seamlessly switch to SFU architecture when more participants join?
  2. Where are your media servers located?
  3. Do you cascade the session across media servers to improve quality?
  4. Do you provide feedback to the user about the network conditions?
  5. Do you switch video off when there’s not enough bandwidth?
  6. How are you managing things like FEC, simulcast, SVC, … ?
  7. What about mobile and native app support?

And the list goes on.

With vendors who use proprietary codecs and transport protocols, this is doubly so, as they need to cater for the browser once they reach WebRTC. So while their native apps might be optimized, it might all go down the drain once they transcode or just “translate” to reach the browser using WebRTC.

Need to understand WebRTC and how to design and architect real world solutions with it? A first step is to understand the servers used to connect WebRTC.

Join a free video course on WebRTC servers

Which brings us to why someone like Zoom should use WebRTC and think about the quality issues once connecting to it:

You Need WebRTC

Zoom already supports WebRTC. I just found out when I searched for stuff to write this article: there’s a Zoom Web Client

It runs on Chrome and enables using audio in Chrome when joining meetings. No video, probably because transcoding the proprietary video codec Zoom uses to the ones in WebRTC is too complicated, but using G.711 or Opus in the browser and transcoding or using the same in Zoom is way simpler.

Zoom is going through the same phases that Amazon did with Chime:

  • Amazon Chime started with a downloadable client
  • They then added limited browser support that enabled users to view the screen shared in the browser and connect via the phone without the need to download the client
  • Later on, audio support was added to the web client
  • And recently, video got supported
  • Screen sharing and remote desktop control still don’t work. I’d say it is a matter of time

This exact same path has been happening to other vendors in one way or another.

Why not Check Your Own Service?

While writing this article, it dawned on me, that this is one of these scenarios that is ridiculously easy to simulate using testRTC, so I went ahead and created a script that does just that:

  • Loads up Jitsi with 2 participants. That should cause them to work peer-to-peer
  • Run the call for 1 minute unhindered
  • Limit bitrate to 500kbps and run for 2 more minutes
  • Remove bitrate limit and run for 2 more minutes

Here’s what the main part of the script looks like:

    // Wait for 1 minute
    client
        .pause(60*sec)
        .rtcScreenshot('ALL GOOD');

    if (probeType === 1) {
        client
            .rtcEvent('Start limit', 'global')
            .rtcSetNetworkProfile('custom', 'bandwidth', 500000, 'both', 'both');
    }

    // 2 minutes with bandwidth limits
    client
        .pause(60*sec)
        .rtcScreenshot('LIMITED')
        .pause(60*sec);

    if (probeType === 1) {
        client
            .rtcSetNetworkProfile('') // back to pristine network conditions
            .rtcEvent('Stop limit', 'global');
    }

    client
        // 2 more minutes unlimited
        .pause(60*sec)
        .rtcScreenshot('BACK TO NORMAL')
        .pause(60*sec);

The .rtcEvent() calls are there to place vertical lines on the graphs while the .rtcSetNetworkProfile() is there to fiddle around with the network conditions.

There were two probes here, each one a participant in the call. The first one is the one I limited while the second one was left “untouched”.

Here’s what the graphs look like on the second probe:

The above graph shows the outgoing bitrate. Within a span of 5 seconds, WebRTC finds out the new effective bitrate and adapts to it. Ramping back up takes some 20 seconds.

The above graph shows the incoming frame rate. You can see how frame rate reporting in WebRTC takes a bit of time to get back to its usual self – also some 20 seconds or so.
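
If you want to plot a similar bitrate graph for your own service, one common approach is to sample getStats() periodically and compute the bitrate from the difference between two snapshots. The sketch below assumes snapshots shaped like the standard outbound-rtp stats entries (with `bytesSent` and a `timestamp` in milliseconds); it is an illustration only, not a description of how testRTC produces its graphs.

```javascript
// Compute the outgoing bitrate in kbps between two stats snapshots.
// Each snapshot is assumed to carry `bytesSent` (cumulative bytes) and
// `timestamp` (milliseconds), as outbound-rtp entries from getStats() do.
function bitrateKbps(prev, curr) {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  if (seconds <= 0) return 0; // guard against duplicate or reordered samples
  const bits = (curr.bytesSent - prev.bytesSent) * 8;
  return bits / 1000 / seconds;
}

// Made-up snapshots taken one second apart.
const previous = { timestamp: 1000, bytesSent: 0 };
const current = { timestamp: 2000, bytesSent: 62500 };
const kbps = bitrateKbps(previous, current);
console.log(kbps); // 500
```

In a browser, the snapshots would come from calling `getStats()` on the peer connection on a timer and picking the `outbound-rtp` entries out of the report.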

I wanted to check how the Jitsi SFU would behave, so I tweaked the test URL for that. The results? Still better than the Zoom one. 20 seconds to hit 30 frames per second and around 50 seconds to get back to full bitrate.

If you want to try it yourself, just import the JSON file in this Google Drive folder to your testRTC account and modify it to fit your needs.

Where to now?

WebRTC is more than good enough.

Making it better is usually about thinking your way through the best possible architecture, along with media servers that take care of network conditions properly.

As for Zoom… please make sure your next call with me is on something that has WebRTC. The machine I regularly use for calls runs Linux. Zoom doesn’t work there… it doesn’t really support Chrome or Linux. Yet.

The post WebRTC vs Zoom. Who has Better Video Quality? appeared first on BlogGeek.me.

WebRTC FAQ: The 2018 Version

Mon, 09/17/2018 - 12:00

An updated WebRTC FAQ for those who wish to understand this tech somewhat better.

It is 2018, and it seems like there’s no good FAQ for WebRTC. Nowhere. They’re just not up to date. That, coupled with my own need to be the best source of information on the web about WebRTC (and the fact that my last few articles were more about CPaaS and messaging than WebRTC), got me to write this one.

What is WebRTC?

WebRTC is both a standard specification and an open source project.

WebRTC allows sending and receiving of real time voice, video and arbitrary data across browsers and other devices. This means we now have an easy way as users to conduct voice and video conferences from a browser or from our mobile devices. WebRTC can do a lot more than that, but voice and video in real time is the basis of what you get out of it.

There’s a short video explaining What is WebRTC on my site.

Who is behind WebRTC?

WebRTC originated from Google. It started by an acquisition of a few companies, whose technology was then repackaged and released as open source under the name of WebRTC.

Google is still the main vendor behind WebRTC. That’s because its own WebRTC engine is the main WebRTC open source project out there and it is also the one that gets integrated into the Chrome browser.

Mozilla, Microsoft and Apple all contribute to WebRTC and have their own implementations of WebRTC in their browsers (some of these implementations are derived from the Google code).

Other vendors and individuals contribute to the specification through the IETF and W3C, where the standardization process of WebRTC takes place.

My own contribution to WebRTC is this site, which publishes a lot of free information around WebRTC as well as the Kranky Geek event, WebRTC Index and WebRTC Glossary.

Is WebRTC ready for commercial use?

Yes.

WebRTC is used today by commercial services (here are 10 such examples).

Some complain and gripe that WebRTC isn’t ready for commercial use. This stems from the many changes that the codebase and specification are undergoing. It also means that if you plan on using WebRTC, either do that through a third party managed service (a CPaaS vendor – list here) or make sure to have a team of savvy developers that can keep up with the pace.

The changes introduced to the WebRTC codebase itself oftentimes break backward compatibility and features, probably by sticking to a “move fast and break things” motto to some extent.

Why should I use WebRTC?

If you don’t need real time voice and video then you might not need to use WebRTC at all.

If you do, then it is a matter of capability, resources and time to market:

  • If you want your service to work inside a web browser, then WebRTC is your only way of getting real time voice and video into a browser
  • If you want it elsewhere, then in almost all cases, using WebRTC will cost you less and get you there faster than the alternatives

What codecs are used in WebRTC?

For voice, the mandatory codecs are G.711 and Opus. Out of these two, be sure to use Opus (G.711 is old and crappy).

For video, the mandatory codecs are VP8 and H.264. Apple’s Safari browser doesn’t support VP8. And on Android, Chrome won’t support H.264 on *some* devices (I’ll let you go figure out on which ones). More about that in this video mini-series.

VP9 is supported by Chrome and Firefox. AV1 seems to be the future.
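
To see which codecs a given browser actually offers, you can look at the rtpmap lines in the SDP it generates. Here is a minimal sketch that pulls the codec names out of an SDP string. The sample SDP is hand-written for illustration (payload type numbers vary between browsers and versions), and the function name is an assumption.

```javascript
// List the codecs announced in an SDP blob by reading its a=rtpmap lines,
// which have the form: a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]
function listCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
      });
    }
  }
  return codecs;
}

// Hand-written SDP fragment, for illustration only.
const sampleSdp = [
  'm=audio 9 UDP/TLS/RTP/SAVPF 111 0',
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:0 PCMU/8000',
  'm=video 9 UDP/TLS/RTP/SAVPF 96 102',
  'a=rtpmap:96 VP8/90000',
  'a=rtpmap:102 H264/90000',
].join('\r\n');

const codecNames = listCodecs(sampleSdp).map((codec) => codec.name);
console.log(codecNames.join(', ')); // opus, PCMU, VP8, H264
```

In a browser you would feed it the `sdp` field of the offer returned by `createOffer()`. PCMU and PCMA are the two flavors of G.711.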

What browsers support WebRTC?

All of them. Almost. But not exactly. And there are differences.

  • Chrome is where most developers focus. It isn’t 100% aligned with the specification yet (none of the browsers are)
  • Firefox is the next that gets focus from developers. Close enough to Chrome in its implementation
  • Edge doesn’t support data channels. And many skip it when it comes to testing due to its low market adoption
  • Safari is what everyone wants (Apple you know), but it is still buggy and doesn’t have support for VP8. Most need Safari support for iOS but are fine with not supporting Safari on Mac. Read this webrtcHacks post for more

There’s a devices cheat sheet on my website.

And then there’s adapter.js which you should definitely use.

Can I use WebRTC on mobile devices?

Yes.

On Android, on official Chrome and Firefox browsers, WebRTC is available.

On iOS, Safari offers something usable if you are willing to invest the energy to get it working well.

On both Android and iOS you can take the WebRTC source code and integrate it inside your native application. Google even releases prebuilt packages for both Android and iOS.

If you want to use a Webview inside your app, then this is easy with Android, restrictive with iOS for now (you won’t be able to access the camera or the microphone there).

Do I need special servers to run WebRTC?

Yes.

You definitely need a signaling server. And STUN/TURN server. You might need a media server.
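
The STUN/TURN part is the piece that shows up directly in your client code, as the configuration handed to the peer connection. Below is a minimal sketch of building that configuration. The server hostnames and credentials are placeholders, and the helper function is hypothetical; signaling is not covered here because WebRTC leaves it entirely up to you.

```javascript
// Build the configuration object that RTCPeerConnection expects.
// All URLs and credentials passed in below are placeholders.
function buildRtcConfig({ stunUrl, turnUrl, turnUsername, turnCredential }) {
  const iceServers = [];
  if (stunUrl) {
    iceServers.push({ urls: stunUrl });
  }
  if (turnUrl) {
    // TURN servers require credentials, STUN servers usually don't.
    iceServers.push({
      urls: turnUrl,
      username: turnUsername,
      credential: turnCredential,
    });
  }
  return { iceServers };
}

const rtcConfig = buildRtcConfig({
  stunUrl: 'stun:stun.example.com:3478',
  turnUrl: 'turn:turn.example.com:3478',
  turnUsername: 'demo-user',
  turnCredential: 'demo-secret',
});
console.log(JSON.stringify(rtcConfig, null, 2));

// In a browser you would then create the connection with:
//   const pc = new RTCPeerConnection(rtcConfig);
```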

WebRTC is said to be peer-to-peer. It is when it comes to the media as much as possible. But developers can make use of it in server centric environments. And there are some scenarios where it makes no technical sense to use peer-to-peer (for example if you want to broadcast something to a million people or conduct a video conference with 20 participants).
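
The 20 participants example is easy to quantify. Here is a rough back-of-the-envelope sketch comparing how many media streams each participant handles in a full peer-to-peer mesh versus through a single media server (SFU). It assumes every participant sends exactly one stream and ignores things like simulcast; the function name is made up for this example.

```javascript
// Streams each participant has to send (up) and receive (down)
// for a call with n participants, under a simplified model:
// - mesh: everyone sends a separate copy to every other participant
// - sfu:  everyone sends once to the server, which forwards to the rest
function streamsPerParticipant(n, topology) {
  if (n < 2) return { up: 0, down: 0 };
  if (topology === 'mesh') return { up: n - 1, down: n - 1 };
  if (topology === 'sfu') return { up: 1, down: n - 1 };
  throw new Error(`unknown topology: ${topology}`);
}

const meshLoad = streamsPerParticipant(20, 'mesh');
const sfuLoad = streamsPerParticipant(20, 'sfu');
console.log(meshLoad); // { up: 19, down: 19 }
console.log(sfuLoad);  // { up: 1, down: 19 }
```

Encoding and uploading 19 copies of your video is what kills the mesh approach on most devices and uplinks, long before the downloads do.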

There’s a free video mini series explaining WebRTC servers on this site.

Can WebRTC be used to create large conferences?

Yap.

Think of WebRTC as a basic building block that gives you superpowers. With it you have the ability to send and receive voice and video in real time virtually on every device and browser.

Now what you do with this superpower, how you interact with it, architect your solution around it – that’s up to you.

There are vendors offering video conferencing that uses WebRTC and gets to 10’s of participants. Webinars with 100’s of live viewers in the audience.

You can read more about scale and size of WebRTC.

Is WebRTC posing a security threat for me?

No.

And yes.

Depending on who you are and what your needs are.

I wrote a lot about WebRTC security in the past. It gets tiring.

WebRTC comes with security in mind. It encrypts everything. Can’t remove that encryption. And browsers get security updates faster than any other software you have.

The one sticking issue is probably the fact that it exposes the local IP address of your machine when it is used. VPNs that are implemented properly solve that as well. More about that over at webrtcHacks and VPN leaks.
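
To see what is being exposed, you can inspect the ICE candidates your browser gathers. The sketch below parses a candidate string and extracts the address and candidate type; a “host” candidate is the one that carries a local address. The candidate strings are hand-written samples using documentation and private address ranges, and the function name is an assumption.

```javascript
// Parse an ICE candidate string of the form:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
// Returns null if the string doesn't match that layout.
function parseCandidate(candidate) {
  const parts = candidate.replace(/^candidate:/, '').trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') return null;
  return {
    transport: parts[2].toLowerCase(),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx, prflx or relay
  };
}

// Hand-written sample candidates, for illustration only.
const hostCandidate = parseCandidate(
  'candidate:842163049 1 udp 2122260223 192.168.1.23 54321 typ host generation 0'
);
const reflexiveCandidate = parseCandidate(
  'candidate:1 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.23 rport 54321'
);
console.log(hostCandidate.type, hostCandidate.address);           // host 192.168.1.23
console.log(reflexiveCandidate.type, reflexiveCandidate.address); // srflx 203.0.113.7
```

In a browser, the strings come from the `candidate` field of the events fired on `onicecandidate`.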

What does WebRTC 1.0 mean?

WebRTC 1.0 is the first time that WebRTC will have an official specification.

Up until now, we had drafts and browser implementations that were an approximation of the drafts. Now we have an approximation of the WebRTC 1.0 specification and approximations of implementations to it in browsers.

Confused?

Don’t be. Assume WebRTC is good to go commercially (check that part of my FAQ) and just go read Jan-Ivar’s explanation @ Mozilla’s Advancing WebRTC blog.

Oh – and be sure to use adapter.js.

How much does WebRTC cost?

It doesn’t. And it does.

WebRTC is freely available in browsers.

The source code is also freely available.

The servers you will need to use it – someone will need to pay for them. That payment can be to a managed service, or to a cloud vendor and developers who will develop, install and maintain them. Up to you to decide.

Oftentimes, developers assume everything should be free with WebRTC, whereas reality is different. And for some reason, most perceive development costs as free or sunk costs (they will call it investment) as opposed to paying a third party for doing the hard stuff for you.

A bit more on this here.

How can I learn more about WebRTC?

If you are into free, then try reading the specs, playing with the official samples, reading this blog and webrtcHacks.

There are a few courses on Coursera, Pluralsight and elsewhere. Never tried them, but read their agendas. Take a look for yourself and decide what’s for you.

There are books, but none of them is up to date with the specification.

Best place? Hands down? My paid course. Advanced WebRTC Architecture Course

Can I help you?

Maybe.

There’s my course. There’s testRTC where I am a co-founder (we do testing and monitoring of WebRTC apps).

I also consult. Around architecture, vendor selection, defining requirements, setting roadmaps, working on differentiation and doing pure marketing related work. What can I say?

I like the variety.

You can reach out to me here.

Got a question about WebRTC that needs to go into this FAQ? Add it below in the comments.

The post WebRTC FAQ: The 2018 Version appeared first on BlogGeek.me.

Social Messaging != Carrier Messaging (the stories of Whatsapp Business API & Apple Business Chat)

Thu, 09/13/2018 - 12:00

Social messaging is killing RCS in all the places that matter.

When looking at messaging in the context of communications and people, we can probably split the story into 3 distinct models:

  1. Consumer centric
  2. Business centric
  3. Businesses to consumers (and vice versa)

I’ll quickly sift through the first two and focus on the third.

Consumer Centric

Consumer centric is easy. That’s where Apple iMessage, WhatsApp, Facebook Messenger, Telegram, WeChat and a bunch of others are competing. The approach there today is to deliver a rich messaging experience that includes text, images, video, voice and video calling, location, groups, … – the list goes on. And on. And on.

They have won the war against SMS. We still have SMS. Some mistakenly call it ubiquitous (on my phone it is used for spam and 2FA messages only). They won the war against RCS that never really started.

To give you a clue – Israel is a WhatsApp country. If you don’t have WhatsApp you don’t exist. It is true from the age of 8. I just purchased the first smartphone for my 8 year old boy. Not so he can play or call with the phone – just so he can send messages to his classmates and stay part of the social fabric of his class. It happened to my daughter when she reached that age. I am now a part of multiple WhatsApp groups: family, close friends, parents of my kids’ classes and after classes, work related, etc.

How easy would it be to move people in Israel from entrenched groups that hold history, images and videos? And to what end? How would RCS be any better in its experience?

Business Centric

Business centric is Slack. It used to be all about calling and the PBX. Slack changed the game. Everyone is talking about “team messaging” today. I used the term enterprise messaging years ago.

What Slack did was find a good balance between functionality and user experience that no other player has been able to copy properly so far, but everyone is after.

WhatsApp is unlikely to penetrate businesses in a meaningful way. Facebook built Workplace instead of trying to introduce Facebook or Messenger directly.

Where’s SMS in this orgy of messaging? Meaningful conversations happen in IP messaging services and not over SMS anymore. Some solutions, like VonageFlow, offer a seamless experience that encompasses both messaging as we know it today and SMS, though I’d argue that capability is a business to consumer one.

For all intents and purposes, SMS is non-existent when it comes to business centric messaging.

Business to Consumer

Back to RCS. RCS was supposed to be the future of SMS when we all move to IP based packet networks. Guess what? We’re all on IP based packet networks, and RCS isn’t really here yet in any meaningful way.

In the past couple of years, RCS got a new tune by its proponents. The strategy changed from getting consumers back from social networks towards being the one ubiquitous network – the ring to rule them all. Here’s the idea: you get RCS on all smartphones worldwide. Now carriers have the ubiquity they had with SMS. And businesses would pay for such access to customers’ phones.

Not going to happen.

Why? Because Apple and Facebook have other plans for us.

Apple now has Apple Business Chat. It is built into the iPhone, making businesses discoverable and reachable over iMessage from the Safari browser, Spotlight search, Siri assistant and Apple Maps. I wrote extensively about it on SearchUC when it was introduced: Apple Business Chat looks to polish customer messaging

WhatsApp came out with their own offering called WhatsApp Business API. Similarly to Apple Business Chat, it offers the ability for businesses to communicate with consumers. Apple does that by focusing on contact center vendors while WhatsApp partners with CPaaS vendors. The goal? Get higher exposure without working directly with long-tail developers in the initial release.

What drove me to even start writing this article? This title of a TechCrunch post: Wish, Netflix, Uber and ~100 others testing WhatsApp’s new Business API

Businesses aren’t waiting for RCS. They are trying to figure out how to communicate with their customers via WhatsApp.

They had Line, WeChat, Facebook Messenger. And they’re still aiming for WhatsApp – a messaging service that isn’t even a US-thing.

Which brings me to the main thing – business to consumer is now a social messaging realm. Carriers have lost that domain as well.

1 Billion Defines the Moat

Remember ubiquity? Here’s what it takes to be interesting:

1 Billion Monthly Active Users

Who has that number today?

Facebook (WhatsApp + Messenger), Apple Business Chat and WeChat. WhatsApp, being the biggest one, is redefining this market. You hear a lot about how customers still phone businesses and chat isn’t catching up with contact centers. That might be true, but only partially.

Today’s chat solutions usually require being on the company’s website. SMS hasn’t proven itself at a large scale for anything other than notifications to customers on orders and transactions. WhatsApp can change that – and to that extent, so can any of the other 1B+ MAU social messaging apps.

RCS? With what billion users exactly?

With the large social networks, 100 million monthly active users seems like a rounding error.

Focus is on Customer Care – Not Marketing

Another interesting aspect (and difference) is that social networks are keeping user identity and access close to their chest. While WhatsApp is using phone numbers for identity, piggybacking on carriers in a way, they are not allowing anyone access to a user without the user’s permission. This means:

  1. Businesses can’t “spam” users by sending them unsolicited messages just because they know their phone number or user name
  2. A user must first approach the business. Inbound use cases are the focus here, which lends itself nicely to support and purchasing activities
  3. Outbound marketing campaigns, ads, promotions – these aren’t encouraged at the moment
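
The three rules above boil down to one gate: a business can message a user only after that user has made first contact. Here is a minimal sketch of that rule in Python. It is a toy model of the idea, not the actual WhatsApp or Apple API; `PermissionGate`, `record_inbound` and `can_send` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionGate:
    """Toy model: remembers which users contacted which business first."""
    # Set of (business_id, user_id) pairs where the user initiated contact.
    opted_in: set = field(default_factory=set)

    def record_inbound(self, business_id: str, user_id: str) -> None:
        """A user messaged the business, which opens the conversation."""
        self.opted_in.add((business_id, user_id))

    def can_send(self, business_id: str, user_id: str) -> bool:
        """A business may only message users who approached it first."""
        return (business_id, user_id) in self.opted_in


gate = PermissionGate()
print(gate.can_send("acme", "user-1"))   # False: no inbound message yet
gate.record_inbound("acme", "user-1")
print(gate.can_send("acme", "user-1"))   # True: the user made first contact
```

Knowing a phone number is not enough here. The only way into the set is an inbound message, which is what keeps the channel focused on customer care.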

What these networks are trying to do is to move businesses and consumers off SMS and onto their own networks. To do so, they plan on offering a superior experience. They are doing that not only by adding richness over the limited 160 character experience of SMS, but they are also making sure this will be a useful service to their user base and won’t be considered spammy.

Will there be other avenues opened to businesses on social networks to interact with users through marketing campaigns and outbound messaging? Sure. But it isn’t the first priority. The market needs to be created first.

Where Can We Go Next?

We are headed towards an omnichannel interaction model.

To me that means that a business will meet a customer wherever it is comfortable for the customer in the context of that specific interaction.

A customer may prefer a phone call at one interaction, but a chat over WhatsApp on another.

The challenge here is that different customers may prefer different social networks. Or aren’t even approachable on some of the social networks. This isn’t going to change any time soon either. The number of social networks is still growing, and while we have a few huge players, others are important to specific populations.

Businesses will need to rely on multiple such channels if they want to reach out to a larger target audience of potential customers.
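
To make the omnichannel idea concrete, here is a small Python sketch of how a business might pick a channel for a given customer. It assumes the business knows the customer's channels in order of preference, which is my own simplification; `pick_channel` and the channel names are hypothetical.

```python
from typing import List, Optional, Set


def pick_channel(customer_prefs: List[str], business_channels: Set[str]) -> Optional[str]:
    """Return the customer's most preferred channel that the business supports."""
    for channel in customer_prefs:  # ordered from most to least preferred
        if channel in business_channels:
            return channel
    return None  # customer isn't reachable on any channel the business offers


supported = {"phone", "whatsapp", "sms"}
print(pick_channel(["wechat", "whatsapp", "phone"], supported))  # whatsapp
print(pick_channel(["line"], supported))                          # None
```

The second call is the interesting one: every channel a business doesn't support is a group of customers it can't reach.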

Back to RCS

It is coming. In some carriers. On some devices. In some form.

Is it going to take back ownership of the interactions from social networks? No.

What it can be, is just another channel. Right next to the rest. It will only become important if it can make that 1 billion monthly active users mark.

Oh, and it will need to succumb to the rules of engagement laid out by social networks today, around business-to-user permissions.

The post Social Messaging != Carrier Messaging (the stories of Whatsapp Business API & Apple Business Chat) appeared first on BlogGeek.me.

The CPaaS Version of iPaaS: MessageBird & Plivo Join the Twilio Studio Bandwagon

Tue, 09/04/2018 - 12:00

Visual design tools in CPaaS are now a part of the offering.

In October 2017, almost a year ago, Twilio announced Studio. I wrote at the time a lengthy article about my thoughts on Twilio Studio and CPaaS. My closing paragraph then was this one:

It will be interesting to see how competitors would react to this in the long run, and even more interesting to see what will Twilio Studio grow into.

Then in January 2018, I wrote about the 7 CPaaS Trends to Follow in 2018. The ones I zeroed in on:

  1. Serverless – a few more CPaaS vendors now offer serverless
  2. Omnichannel – more about that in one of my next articles
  3. Visual/IDE – guess why I wrote this article?
  4. Machine learning and Artificial Intelligence – Got a whole new report covering AI in RTC if you are interested
  5. AR/VR – planning to write about this one a bit later
  6. Bots – they’re already everywhere, directly linked to both omnichannel and AI
  7. GDPR – everyone covers that now in CPaaS

Not sure which CPaaS vendor to use? Check out my free CPaaS Vendor Selection Matrix. It will give you the KPIs to look for.

Download the CPaaS Vendor Selection Matrix

Guess what happened since with Visual/IDE?

Messagebird introduced Flow Builder: “The power of our Voice and SMS solutions at your fingertips, without writing a single line of code.”

Plivo announced PHLO in August: “A whole new visual way of integrating communications that would empower developers to design collaboratively, build visually and deploy instantly.”

Voximplant came out with Smartcalls: “a smart and flexible tool that helps you create outbound call campaigns in no time”

All of these CPaaS players invested in a Twilio Studio-like tool.

Let’s check out what each player did and why.

Twilio Studio

Where it all started (even if there were tools before or in parallel to it).

Studio’s entry point is either an incoming message, an incoming call or a REST API call. From there, the actions include things you do with messages and phone calls, along with the ability to execute generic functions.

A nice touch to Studio is its revision control system – it saves past changes made to the flows you built, allowing switching back and forth between revisions. It would be nice to have named revisions, some automated verbose explanation of changes made, etc.

Messagebird Flow Builder

Messagebird Flow Builder is focused around SMS. The inputs you can use for it are either an incoming SMS or an incoming webhook API call. Once in the “flow”, you can branch the flow based on the time and date or other conditions related to the contents of the message. The end result? An outgoing SMS, email or webhook. There’s a bit more to it than that, like the ability to manage subscriptions in Messagebird or wait for certain replies inside the flow.
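
For those who think better in code, a flow like this is essentially a trigger, a list of conditions and an action per branch. Below is a rough Python sketch of that shape. It is not Messagebird's API or file format; `Step`, `run_flow`, the URL and the email address are all made up for illustration, and it branches only on message contents, not on time and date.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One branch of a flow: a condition on the message and the resulting action."""
    matches: Callable[[str], bool]
    action: str   # "sms", "email" or "webhook"
    target: str   # hypothetical destination (reply text, address or URL)


def run_flow(message: str, steps: List[Step], fallback: Step) -> Step:
    """Return the first step whose condition matches the incoming SMS text."""
    for step in steps:
        if step.matches(message):
            return step
    return fallback


steps = [
    Step(lambda text: text.strip().upper() == "STOP", "webhook", "https://example.com/unsubscribe"),
    Step(lambda text: "order" in text.lower(), "email", "support@example.com"),
]
fallback = Step(lambda text: True, "sms", "Thanks, we got your message.")

print(run_flow("stop", steps, fallback).action)                # webhook
print(run_flow("Where is my order?", steps, fallback).action)  # email
print(run_flow("hello", steps, fallback).action)               # sms
```

A visual builder lets a non-developer assemble the `steps` list by dragging boxes instead of writing the lambdas.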

What I like about the Messagebird Flow Builder is that it is rigid in how it outlines the boxes and their connections – it doesn’t let you move boxes around (a cool feature that got tiresome rather quickly on me in other tools here – Studio and PHLO).

Plivo PHLO

Plivo PHLO is a me-too Twilio Studio tool.

It has the same entry points, node types and capabilities, assuming you’re interested in SMS and voice calls that is. Where Twilio Studio offers more generic “Messages”, Plivo has only SMS. This is probably fine for most users.

The only thing I couldn’t find in PHLO is the ability to execute an arbitrary JS function. There’s also no revision control as of yet. Other than that, PHLO is a rather straightforward tool to use.

Voximplant Smartcalls

The Voximplant Smartcalls service is different in nature. Where the rest of the pack here is focused on incoming events that trigger action, Smartcalls is all about campaigns. And all about voice.

You can create a scenario. A scenario in Smartcalls is a visual decision tree of what to do with an outgoing call. You dial, someone answers, you play a specific recording, maybe ask them to press digits, etc.

You can do things like send email or call a REST webhook, but the purpose of it all is to drive an automated outbound voice campaign: once you have a scenario, you create a campaign. A campaign is a time window, a scenario and a list of phone numbers to dial out to. Smartcalls does the rest to automate the scenario created across all phone numbers at the specified time window.
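
The campaign definition above maps nicely to a small data structure. Here is a Python sketch of it, assuming a daily time window; it is my own illustration and not how Voximplant implements or exposes Smartcalls. The `Campaign` class, `numbers_to_dial` and the scenario name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List


@dataclass
class Campaign:
    """A campaign as described above: a scenario, a time window and numbers to dial."""
    scenario: str        # name of the decision tree to run on each call
    window_start: time   # earliest time of day to place calls
    window_end: time     # latest time of day to place calls
    numbers: List[str]

    def numbers_to_dial(self, now: datetime) -> List[str]:
        """Return the numbers to call if `now` falls inside the window, else none."""
        if self.window_start <= now.time() <= self.window_end:
            return list(self.numbers)
        return []


campaign = Campaign("appointment-reminder", time(9, 0), time(17, 0),
                    ["+15550100", "+15550101"])
print(campaign.numbers_to_dial(datetime(2018, 9, 4, 10, 30)))  # both numbers
print(campaign.numbers_to_dial(datetime(2018, 9, 4, 20, 0)))   # []
```

Everything else the service does (dialing, retries, running the scenario per call) hangs off this triplet.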

On Pricing

Here things get somewhat murkier.

Do you pay for using the designer tool itself when it gets invoked? (you do with Twilio Studio)

Do you need to pay for the communications used within the flows created? (you don’t with Voximplant Smartcalls)

Plivo, being the shadow of Twilio for voice and SMS, decided not to price the use of PHLO at all, and make that an important part of their announcement as well:

“That’s why, in addition to bringing in 100% Plivo-API support out-of-the-box, we are also making it FREE to build using PHLO. This is not just a commercial decision. This is our stake in the ground — as we truly believe this is how the communication capabilities of the future will be built.”

Here’s the visual from the product page:

Will this create pressure on Twilio? I doubt it, but who am I to say?

A Comparison Table

I put these tools in a table, to see where each one is focused:

|                        | Twilio Studio                    | Messagebird Flow Builder | Plivo PHLO   | Voximplant Smartcalls |
| Focus                  | Inbound                          | Inbound                  | Inbound      | Outbound              |
| Medium                 | Voice, SMS, Omnichannel messages | SMS                      | Voice, SMS   | Voice                 |
| Cool factor            | Revision control                 | Really easy to use       |              | Campaign management   |
| Flow pricing           | Per flow invoked                 | Free                     | Free         | Per minute charges    |
| Communications pricing | Not included                     | Not included             | Not included | Included              |

A Word about iPaaS

Maybe a few paragraphs…

iPaaS stands for Integration Platform as a Service. The poster child service here is probably Zapier, allowing the connectivity of one service to another. I use it daily in my own business to power many of the integrations on this website.

Many of the CPaaS players have been working on enabling their use via Zapier, so a user doesn’t need to be a developer to send a message for example. Being able to build more complex communication flows using a visual builder sits well with this approach.

What will be interesting to see is how the two play out with each other, if at all. Will these visual builders get integrated into Zapier? Will these visual builders include easier integration points to other services besides what they themselves offer and a rudimentary capability of invoking a REST call?

Welcome to Visual CPaaS

CPaaS is more than making communication API calls or offering GitHub repositories. In the past two years we’ve seen some interesting movements in this space and innovations coming out.

I can’t wait to see what will come next.

The post The CPaaS Version of iPaaS: MessageBird & Plivo Join the Twilio Studio Bandwagon appeared first on BlogGeek.me.

Understanding video tech in the enterprise: a web survey

Mon, 08/20/2018 - 13:00

A web survey says… that you need to join in to learn more about real time video technology.

I’ve partnered up with Vidyo on a survey they are working on with Hanover Research. This one is focused on how real time video technology gets used in different industries, as well as how decisions are made when choosing the technology stack to use.

Fill out the survey

I worked as a programmer during my time at school. It was fun, but it is hard to call it professional work (although the last place was a startup focused on medical patient records in the Israel healthcare system). My first “grownup” job as a developer was at a video conferencing company. You can say I’ve been spending my time in front of a webcam for more than half of my lifetime, communicating with peers and colleagues.

In the last several years, as a consultant, much of my work has been conducted online. At times with customers that I have never met face to face – only through a video conference.

At testRTC, almost all of our sales are done through video conferencing. Recently, we had a conference call on one of the web conferencing platforms that our customer selected (we tend to use Google Meet by default, but are flexible and will use whatever the customer is comfortable with). People from that company always join with their video turned off. I forgot mine on for a couple of seconds, which gave me an excuse to ask the person I had been working with for several months to turn hers on as well. She obliged, and for a brief few seconds it felt more human. Now it is a lot easier for me to have a mental image of that person when she speaks. This adds volumes to the connection between us humans.

For me video isn’t a gimmick. It is a critical tool.

Are all my calls video calls? No. Just like I use messaging but still use voice calling. Different tools for different jobs.

When Vidyo asked me to join them for the survey, I automatically said yes. As someone who uses video on a daily basis, I am always interested in understanding how others are making use of video if at all.

The survey Vidyo is doing comes to answer one main question: How (and why) does video get embedded into different businesses?

For me, one of the more interesting questions relates to the applications businesses develop, and if they don’t plan on adding communication functions into them, then why. Understanding what barriers and challenges people see in these technologies can help us as an industry decide where to put our focus.

If you are reading this blog and want to help me out in understanding the industry better, would you be so kind as to fill out this online survey? If you do, you’ll have my thanks as well as a copy of the research findings.

Fill out the survey

The post Understanding video tech in the enterprise: a web survey appeared first on BlogGeek.me.

AI in RTC: Report Preview

Mon, 08/13/2018 - 12:00

Our AI in RTC report got published, and I am proud of the results. Purchase it now while it is under its launch price.

The Report

It has been quite a ride to get this report completed. We spent many hours interviewing vendors, researching individually, sifting through web survey results, discussing topics between us and writing. Lots of writing.

When Chad said he estimates the report to be in the range of 60 pages – 80 tops – I laughed. It seemed ridiculous that the report will be “that short”. My own estimate was 100. Give or take a couple of pages.

We ended up with 147 pages. And not because we’ve increased the fonts or used double line spacing.

There was just so much to cover and so much we wanted to discuss. We ended up with almost 30,000 words.

The report has 37 figures and 23 tables. We added them to make some of the concepts easier to understand and to put some order and methodology into the data provided.

Each chapter has its own set of recommendations, to help you move forward. We wanted to have an actionable report and not a lukewarm one.

Initial Feedback

Last week, we delivered the final report to our prepublication customers – those who were willing to trust us with our work before even knowing it was complete.

I talked to one such customer two days later. He said he already read the whole report once, but will surely dive into it at least twice more. He had to digest all the information in it and see how it fits with his product roadmap.

Artificial Intelligence and … Your Company

Here is something that I am sure of today more than ever.

Machine learning and artificial intelligence are here to stay. They are going to be integrated into products and services across all industries, and communications is not going to be any different here.

There are 3 ways this can play out for a vendor in our industry:

  1. You take the leap and start on your road towards smarter communications by adding AI functionality to your company
  2. You wait until you get dragged into AI by competitors who are now way smarter than you (thanks to AI)
  3. You resist and die. It won’t happen immediately, but it will happen

From what we’ve seen in our interviews for this report, along with the discussions we had with customers who purchased the report, I know that this is the right time to look into this domain and plan for the future.

I’d like to invite you on this journey – we’ve created a report preview, which contains the executive summary, scope and methodologies and the table of contents. You can download the preview from the research page on Kranky Geek:

Learn more about the AI in RTC report

There’s a special launch price at the moment, which will not be available once we hit September. So if you are interested, there’s no better time than the present.

The post AI in RTC: Report Preview appeared first on BlogGeek.me.

Vonage acquires TokBox. Where do we go from here?

Mon, 08/06/2018 - 12:00

Video, in the hands of the correct company can be a powerful thing.

In 2012 Telefonica acquired TokBox. I wrote about it at the time – almost 6 years ago. It seems sad reading that piece about the TokBox acquisition again. I suggested three areas where Telefonica can make a difference with TokBox. Let’s see what happened.

What Could Telefonica do with TokBox?

What I said in 2012:

Will Telefonica wait the same amount of time it did with Jajah until it does something with this acquisition? I hope they will move faster this time…

Telefonica did nothing with TokBox. They haven’t integrated them into anything. They decided to leave TokBox independent.

This has helped grow TokBox in the 6 years since into one of the dominant players in video APIs for real time communications. Almost any developer and initiative that I talk to which has decided to go for a 3rd party platform decided to use TokBox. I see others as well, but not as frequently.

Since the acquisition, TokBox:

  • Switched to WebRTC fully, killing its Flash based solution
  • Increased its session sizes to fit thousands of parallel streams per session
  • Added recording and broadcasting
  • Created their Inspector tool, one of the best I’ve seen on the market for debugging sessions after the fact
  • Cleaned, beefed up and curated their documentation. Again – one of the best I’ve seen on the market for communication APIs
  • Gained customers as well. Per the press release, over 2,300 customers

Telefonica failed to make use of TokBox. It didn’t go into video with it. It didn’t try to figure out VoIP. It didn’t try to understand why developers chose TokBox. Telefonica did nothing other than let TokBox continue on its trajectory. It is probably why Telefonica lost interest and decided to sell TokBox to Vonage.

Telefonica plans on folding TokBox into BlueVia, but how will they combine TokBox, if at all, with their Tu Me VoIP OTT service?

  • Didn’t happen
  • BlueVia died somewhere between 2013-2014
  • Along with Jajah, Tu Me and Tu whatever that Telefonica built
  • VoIP is not a thing for carriers
  • appear.in was sold by Telenor to Videonor
  • AT&T started and stopped its WebRTC APIs initiative
  • What will happen with Deutsche Telekom’s immmr?

Telefonica made no use of its strengths to find synergies with TokBox. Would doing so have killed TokBox altogether, or could it have made them stronger?

What will Telefonica do about voice? Their main API set doesn’t seem to include voice calling, but now it has video… will they be going for Twilio or Voxeo for that one? Or will they roll out their own? Will they skip voice altogether?

TokBox doubled down on video, beefing up their capabilities in that domain. It has a SIP connector, but nothing more than that. It is a missed opportunity.

Where is TokBox today?

TokBox is video communication APIs. There are other vendors out there doing that today: Twilio, Vidyo.io, Agora, Sinch, Voximplant, Temasys and probably a few others I forgot to mention (sorry for missing out on you).

TokBox is the market leader here, when it comes to breadth of features in the video space.

It just wasn’t enough to get them to more customers and garner more than $35 million in the acquisition. I’d attribute this to:

  1. They weren’t operating as a startup. Being part of Telefonica meant stability, which probably took away their focus on revenue and growth in the way you see in other CPaaS vendors. The end result of such a thing is expenses that were too high when aligned to revenue or to the potential to raise money in the VC world. Vonage will need to handle this, and a change in direction and DNA is never an easy one
  2. Telefonica probably wanted out. They weren’t interested in continuing with this, so any amount above $0 was a good number for them

Does this say anything about the market of video APIs? The viability of it to other vendors? The importance of video in the bigger picture?

I don’t really know.

Where are we with Video CPaaS?

Video CPaaS (and in a way we can extend it to WebRTC CPaaS vendors, those who don’t dabble too much with PSTN voice and/or SMS) is a finicky market. The vendors in this space either get acquired and gobbled up, never to be seen again (think AddLive or Requestec), or they just don’t grow fast enough or become as big as their PSTN voice/SMS counterparts.

And yet.

IDC maintains that the U.S. programmable video market will be a $7.4 billion opportunity by 2022, representing more than a 140% four-year CAGR. Assuming only 10% of that becomes a reality, the question becomes who will be the winners in programmable video?
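
It helps to run the numbers behind that claim. The Python sketch below inverts the standard compound growth formula, end = start × (1 + rate)^years, to see what starting market size IDC's figures imply, and what 10% of the 2022 number looks like. The `implied_start` helper is my own, and reading "four-year" as the four years leading up to 2022 is an assumption.

```python
def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Invert end = start * (1 + cagr) ** years to get the starting value."""
    return end_value / (1.0 + cagr) ** years


target_2022 = 7.4e9   # IDC's figure, in US dollars
cagr = 1.40           # 140% per year; IDC says "more than", so this is a lower bound
years = 4             # assumption: the four years run up to 2022

start = implied_start(target_2022, cagr, years)
ten_percent = 0.10 * target_2022

print(f"Implied starting market: ${start / 1e9:.2f} billion")        # about $0.22 billion
print(f"10% of the 2022 figure:  ${ten_percent / 1e9:.2f} billion")  # $0.74 billion
```

Since the growth rate is quoted as "more than" 140%, the implied starting point is at most roughly $0.22 billion. Even the pessimistic 10% scenario is more than three times that.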

What types of services do they need to offer? What products? Are these lower level APIs, or higher level abstractions? Maybe we’re looking at almost complete solutions with a nice API lipstick on top that get calculated in that $7.4 billion.

Video is here to stay.

It won’t be replacing every voice call. But it definitely has its place.

Otherwise, why did Apple go for group video calls in FaceTime with 32 participants in their latest iOS?

And why did WhatsApp just add group video calls? And why did Instagram add group video calls?

Are they doing it just for fun? Is the market bound to be focused only on larger social networks?

I can’t believe that will be the case.

I came from a video conferencing company. Every year I was promised by management that this year will be the year of video. It never happened.

For the last 5 years, I have been using video so much that the year of video has already passed.

I guess the next question is what year will be the year of video CPaaS?

The difference in these two questions is that the year of video is the year when video became a widespread service. The year of video CPaaS will be the year when video becomes a widespread feature. We’re not there yet, but we’re heading in that direction.

In many ways, TokBox is one of the vendors figuring out how to get there.

Where are we with CPaaS?

CPaaS seems to be different, but only slightly.

Growth in this space, as far as I understand, comes from SMS and PSTN voice. That’s it.

VoIP? WebRTC? IP messaging? Social omnichannel aggregation? Video? All nice to have features for now that don’t affect the bottom line enough. And at the moment, they don’t seem to be big enough to fill in the gap when SMS and PSTN voice fall out of favor.

To be a successful CPaaS vendor today, you need to:

  1. Look into the future and execute the future
  2. Rely on SMS and PSTN revenue – AND improve your services in that domain
  3. Cultivate multiple IP based solutions and services, preparing to reap rewards once that market grows exponentially

The thing about that third point is that it won’t be as simple to achieve as doing what CPaaS did with SMS and PSTN. In SMS and PSTN, CPaaS needed to act as an aggregator of carriers with a simple API. No one wants to deal with carriers (which is why they fail with these API initiatives when it comes to WebRTC and video services), so friendly CPaaS vendors are a great alternative.

What is the moat/barrier that CPaaS vendors are building in the IP world? Answering this question holds the key to the future of CPaaS.

What will Vonage do with TokBox?

Not have it as a standalone business.

Doing that would mean perpetuating what happened at Telefonica. While not all of it was bad, it didn’t bring the expected growth with it.

Vonage is uniquely positioned here – more than any other vendor in the market, which is probably why it ended up acquiring TokBox.

I’ll go back to my Venn diagrams for an explanation here:

TBD – IMAGE HERE

The opportunity space:

  • VBC at Vonage deals with UCaaS
  • Nexmo and TokBox are all about CPaaS

CPaaS:

  • TokBox will probably be merged with Nexmo, bringing a single offering to developers
  • Nexmo has voice, SMS, IP messaging and omnichannel aggregation, with video just launched. TokBox has video
  • Together, that completes the gap in communication services for developers, bringing Vonage on par with its biggest CPaaS competitor – Twilio
  • This means the threat of customers leaving TokBox to Twilio because they want to deal with a single vendor and need other telephony services is now lessened
  • It also means that the threat of customers leaving Nexmo to Twilio because Nexmo lacks a good video service is now lessened as well
  • If you are a TokBox customer that also uses Twilio, it might make sense for you to switch to Nexmo. I am sure Nexmo will be going through the roster of TokBox customers to see if there are Twilio customers among them that they can convert
  • TokBox had time to flesh out their service in a unique way – the time Telefonica gave them was put to good use when it comes to infrastructure and developer related capabilities (look at Inspector and their documentation). Next, Vonage can decide to cherry pick the best pieces of Nexmo and TokBox to combine them and give a better user experience across the board for the developers using their CPaaS platform

UCaaS:

  • On the UCaaS front, Vonage is using Amazon Chime today. The challenge with Chime is that it is a completely standalone product – something that is harder to embed and integrate into an existing experience. Vonage isn’t alone here – RingCentral is relying on Zoom. Such integrations are nice, but they can’t go deep
  • TokBox brings APIs that are far superior and more flexible than what Zoom, Chime or any other video conferencing player can bring with its integration APIs. Using these to bake video right into its UCaaS VBC app makes sense, and puts Vonage in a better position than its UCaaS competitors
  • Especially if video is the next frontier

What does this mean to TokBox competitors?

Telefonica was never a serious competitor in video CPaaS.

Nexmo, and by extension Vonage, is.

Nexmo is probably second only to Twilio.

TokBox is probably first in video CPaaS.

They combine nicely and offer Nexmo a capability that its competitors don’t have if you look at the breadth of their video offering.

If Vonage executes this well, the end result will be a better CPaaS offering, a better Nexmo and a better Vonage.

The post Vonage acquires TokBox. Where do we go from here? appeared first on BlogGeek.me.

AI in RTC: Final Price Points and End of Prepublication Discount

Mon, 07/23/2018 - 12:00

Our AI in RTC report is just about ready. Here are all of its price points.

If you aren’t interested in AI and RTC, then move on – this one isn’t for you.

Still here?

Good.

In the past several months I’ve been adding into my daily activities the creation of a new report – one about AI in RTC.

It has taken its toll – I’ve slept a bit less. Read a bit less. Turned down and postponed a few clients. All in order to get this project going. I’ve partnered with Chad Hart on it, one of my partners in crime at Kranky Geek and a fellow consultant.

We wanted to work on something new and interesting and this seemed to be the right thing to do.

After countless hours in interviews with vendors and suppliers in this space, discussions we had with one another and time spent just looking at the ceiling of my office and thinking, I can say that we’re almost ready with the report. Most of it is already written, and what is left will be completed really soon.

What will you find in this report?
  • An introduction to machine learning and artificial intelligence. A high level one, which should be suitable for people who are less conversant in it
  • Speech Analytics. A thorough chapter looking at how speech analytics is used in real time communications, including use cases, vendors and a lot more. I’d say the majority of the writing is here, as most of the focus of our industry is here
  • Voice Bots. While a lot is said about chatbots, we decided to skip them (it would have de-focused us) and instead look at the domain of voice bots. Think Google Duplex, but for the enterprise
  • Computer Vision. You probably saw, just like me, how autonomous driving is sucking the life out of computer vision elsewhere. That said, there are still vendors and places in RTC where you can find computer vision, which is what’s in this chapter of our report
  • Cost and Quality Optimization. That’s the silent participant in every VoIP session you have. And it is slowly moving towards AI as well. We’ve found those who use it today and talked to those who don’t, trying to figure out both sides of the equation
  • Survey summary. Remember that online survey? We’re still collecting the final responses, so be sure to fill it out if you haven’t. That’s where we will be writing our analysis of the responses we’ve received
  • Other things?
    • The introductory ebook on AI in RTC (still not written), that is also given for free to ALL those filling the online survey
    • Glossary of terms related to RTC
    • A powerpoint deck of all the illustrations from the report
Where can you learn more about the report?

Three places:

How much does it cost?

Publication date is scheduled for the end of July. We might miss it by a few days due to editing and some last minute changes.

  • Prepublication price: $1,170 (available until publication)
  • Launch discount: $1,950 (available until September 7)
  • Official price: $2,950

We’re allowing payment via PayPal and wire transfer inside the US. We don’t have any digital shopping cart, as this is a first for us through Kranky Geek Research. It also means we’re treating each and every purchaser as royalty.

Why wait for the price to rise? Join those who’ve already purchased at our discounted prepublication price. Interested? Just email us.

 

The post AI in RTC: Final Price Points and End of Prepublication Discount appeared first on BlogGeek.me.

Autonomous Cars Are Killing Video AI in RTC

Mon, 07/16/2018 - 12:00

Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere

I went to the data science summit in Israel a month or so back. It was an interesting day. But somehow, I had to make sure to dodge all the boring autonomous cars sessions. They just weren’t meant for me, as I was wandering around, trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this – right?).

After countless interviews done this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.

There were two interesting themes that relate to the use of AI in video – again – focus is on real time communications:

  1. There’s a lot less expertise to go around in the industry, where the industry is real time comms and not machine learning or computer vision in general
  2. The industry’s standards and capabilities seem higher and better than what we see in RTC today

Guess what – we’re about to incorporate the responses we got on our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC” ebook and a chance to win one of 5 $100 Amazon gift cards – along with our appreciation for helping us out. Why wait?

Fill out the web survey

Knowledge in AI is lacking

In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own.

As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise.
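To give a feel for how little code the “source it” route can take, here is a minimal sketch in Python. It assumes the detect_labels operation of the AWS SDK for Python (boto3) and that you already have credentials set up. The helper name top_labels and the thresholds are made up for illustration, and the client is passed in so the function can be tried with a stand-in.

```python
from typing import Any, List, Tuple


def top_labels(client: Any, image_bytes: bytes,
               max_labels: int = 5,
               min_confidence: float = 80.0) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for one image, highest confidence first.

    `client` is expected to behave like boto3.client("rekognition").
    top_labels itself is a hypothetical helper, not part of any SDK.
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,          # illustrative default
        MinConfidence=min_confidence,  # illustrative default
    )
    labels = [(item["Name"], item["Confidence"]) for item in response["Labels"]]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)
```

With a real client you would call it as top_labels(boto3.client("rekognition"), frame_bytes). What you do with the labels afterwards is still on you.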

The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category means you need 3 types of skills:

  1. Data scientists – people who can look at hordes of data, check out different algorithms and decide on what works best – what pieces of data to look at and what model to build
  2. Data engineers – these are the devops of this field. They are there to connect the dots of the different elements in the system and build a kind of a pipeline where data gets processed and handled. They don’t need to know the details of algorithms, but they do need to know the jargon and concepts
  3. Product managers – these are the guys who need to decide what to do. Without them, engineers will play without any focus or oversight, wasting time and resources instead of working towards value creation. These product managers need to know a thing or two about data science, machine learning and how it works

Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train its internal workforce in machine learning because it was impossible to hire experienced people in the Valley – Google, Apple, Facebook and Amazon are the main recruiters for that position and their offers are too hard to compete with.

Data engineers are probably easier to find and train, but what is it you need them to do exactly?

And then there’s product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly. Part of it by asking through our current research how do vendors end up adding AI into their products. The answers vary and are quite interesting.

Anyways – lots of hype. Less in the way of real skills out there you can hire for the job.

Autonomous driving is where computer vision is today

If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:

  1. AI and job displacement
  2. The end of privacy (coupled with fake news in some ways)
  3. Autonomous cars

The third one is a very distinct use case. And it is the one that is probably eating away a lot of the talent when it comes to computer vision. The industry as a whole is interested, for some reason, in taking a stab at making cars drive on their own. This is quite a challenge, and it is probably why so many researchers are flocking towards it. A lot of the data being processed in order to get us there is visual data.

The importance of vision in autonomous cars cannot be overstated. This ABC News clip of the recent Uber accident drives that point home. Look at these few seconds explaining things:

“These vehicles are trained to see pedestrians, to see cyclists, to see redlights. So it’s really unclear what went wrong here”

And then you ask a data scientist to deal with boring video meeting recordings to do whatever it is we need to do in real time communications with AI. Not enough fame in it as opposed to self driving cars. Not enough of a good story to tell your friends when you meet them after work.

Computer vision in video meetings is nascent

Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings.

I’d like to break this down into a table:

Computer vision Video meeting AI
  • Count faces/people
  • Speaker identification
  • Facial recognition
  • Gesture control
  • Emotion detection
  • Auto-frame participants

Why this difference? Two main reasons:

  1. Video meetings are real time in nature and limited in the available compute power. There’s more on that in our upcoming report. But the end result is that adopting the latest and greatest that computer vision has to offer isn’t trivial
  2. We haven’t figured out as an industry where’s the ROI in most of the computer vision capabilities when it comes to video meetings – there are lower hanging fruit these days in the form of transcription, translation and what you can do with speech

As we move forward, companies will start figuring this one out – deciding what the data pipeline for computer vision needs to look like in video meetings AND which use cases are best addressed with computer vision.

Where are we headed?

The communication market is changing. We are seeing tremendous shifts in our market – cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come.

On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus of communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report – there’s a discount associated with purchasing it before it gets published:

AI in RTC

You can download our report prospectus here.

The post Autonomous Cars Are Killing Video AI in RTC appeared first on BlogGeek.me.

The Challenging Path to WebRTC H.264 Video Codec Hardware Support

Mon, 07/09/2018 - 12:00

WebRTC H.264 hardware acceleration is no guarantee for anything. Not even for hardware acceleration.

There was a big war going on when it came to the video codec in WebRTC. Should we all be using VP8 or should we be using H.264? A lot of digital ink was spilled on this topic (here as well as in other places). The final decision that was made?

Both VP8 and H.264 became mandatory to implement by browsers.

So… which of these video codecs should you use in your application? Here’s a free mini video course to help you decide.

Enroll to free course

Fast forward to today, and you have this interesting conundrum:

  • Chrome, Firefox and Edge implement VP8 and H.264
  • Safari implements H.264. No VP8

Leaving aside the question of what mandatory really means in English (leaving it here for the good people at Apple to review), that is only a fraction of the whole story.

There are reasons why one would like to use VP8:

  1. It has been there from the start, so its implementation is highly optimized already
  2. Royalty free, so no need to deal with patents and payments and whatnot. I know there’s FUD around patents in VP8, but for the most part, the industry is treating it as free
  3. It nicely supports simulcast, so quite friendly to video group calling scenarios

There are reasons why one would like to use H.264:

  1. You already have H.264 equipment, so don’t want to transcode – be it cameras, video conferencing gear or the need to broadcast via HLS or RTMP
  2. You want to support Safari
  3. You want to leverage hardware based encoding and decoding to increase battery life on your mobile devices

I want to open up the challenges here. Especially in leveraging hardware based encoding in WebRTC H.264 implementations. Before we dive into them though, there’s one more thing I want to make clear:

You can use a mobile app with VP8 (or H.264) on iOS devices.

The fact that Apple decided NOT to implement VP8, doesn’t bar your own mobile app from supporting it.

WebRTC H.264 Challenges

Before you decide to go for a WebRTC H.264 implementation, you need to take into consideration a few of the challenges associated with it.

I want to start by explaining one thing about video codecs – they come with multiple features, knobs, capabilities, configurations and profiles. These additional doozies are there to improve the final quality of the video, but they aren’t always there. To use them, BOTH the encoder and the decoder need to support them, which is where a lot of the problems you’ll be facing stem from.
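To make the “profiles” part less abstract: in WebRTC, the H.264 profile and level each side offers travel in the SDP as a profile-level-id parameter on an a=fmtp line (RFC 6184). Below is a minimal sketch that pulls those values out of an SDP blob and decodes them. The function names are mine, and the decoding is deliberately simplified: it only knows Baseline, Constrained Baseline, Main and High, and ignores special cases such as level 1b.

```python
import re
from typing import Dict, Optional

# profile_idc values defined by the H.264 specification
PROFILE_NAMES = {0x42: "Baseline", 0x4D: "Main", 0x64: "High"}


def describe_profile_level_id(value: str) -> Optional[Dict[str, object]]:
    """Decode the 6-hex-digit profile-level-id (simplified, see lead-in)."""
    if not re.fullmatch(r"[0-9a-fA-F]{6}", value):
        return None
    profile_idc = int(value[0:2], 16)
    profile_iop = int(value[2:4], 16)  # constraint flags
    level_idc = int(value[4:6], 16)
    name = PROFILE_NAMES.get(profile_idc, "Unknown")
    # constraint_set1_flag (0x40) on top of Baseline means Constrained Baseline
    if profile_idc == 0x42 and profile_iop & 0x40:
        name = "Constrained Baseline"
    return {"profile": name, "level": level_idc / 10}


def h264_profiles_in_sdp(sdp: str) -> Dict[str, Dict[str, object]]:
    """Map payload type -> decoded profile for each fmtp line with a profile-level-id."""
    found = {}
    for line in sdp.splitlines():
        match = re.match(
            r"a=fmtp:(\d+) .*profile-level-id=([0-9a-fA-F]{6})", line.strip()
        )
        if match:
            decoded = describe_profile_level_id(match.group(2))
            if decoded:
                found[match.group(1)] = decoded
    return found
```

Running this over both the offer and the answer is a quick way to see whether the two sides even agree on a profile before you start blaming the network.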

#1 – You might not have access to a hardware implementation of H.264

In the past, developers had no access to the H.264 codec on iOS. You could only use it to record a file or play one back, not to stream media in real time. This has changed, and now that’s possible.

But there’s also Android to contend with. And in Android, you’re living in the wild wild west and not the world wide web.

It would be safe to say that all modern Android devices today have a hardware-accelerated H.264 encoder and decoder available, which is great. But do you have access to it?

The illustration above shows the value chain of the hardware acceleration. Who’s in charge of exposing that API to you as a developer?

The silicon designer? The silicon manufacturer? The one who built the hardware acceleration component and licensed it to the chipset vendor? Maybe the handset manufacturer? Or is it Google?

The answer is all of them and none of them.

WebRTC is a corner case of a niche of a capability inside the device. No one cares about it enough to make sure it works out of the factory gate. Which is why in some of the devices, you won’t have access to the hardware acceleration for H.264 and will be left to deal with a software implementation.

Which brings us to the next challenge:

#2 – Software implementations of H.264 encoders might require royalty payments

Since you will be needing a software implementation of H.264, you might end up needing to pay royalties for using this codec.

I know there’s this thing called OpenH264. I am not a lawyer, though my understanding is that you can’t really compile it on your own if you want to keep it “open” in the sense of no royalty payments. And you’ll probably need to compile it or link it with your code statically to work.

This being the case, tread carefully here.

Oh, and if you’re using a 3rd party CPaaS, you might want to ask that vendor if he is taking care of that royalty payment for you – my guess is that he isn’t.

#3 – Simulcast isn’t really supported. At least not everywhere

Simulcast is how most of us do group video calls these days. At least until SVC becomes more widely available.

Simulcast allows devices to send multiple resolutions/bitrates of the same video towards the server. This removes the need for an SFU to transcode media and, at the same time, lets the SFU offer the most suitable experience for each participant without resorting to lowest common denominator type of strategies.
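The SFU side of this boils down to a selection problem: for each receiver, forward the best layer that fits its downlink. Here is a minimal sketch of that idea. The Layer type, the field names and the fallback rule are my own assumptions for illustration, not how any specific SFU does it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    rid: str            # label for the simulcast layer, e.g. "low" / "mid" / "high"
    height: int         # vertical resolution in pixels
    bitrate_kbps: int   # bitrate the sender produces for this layer


def pick_layer(layers: List[Layer], available_kbps: int) -> Optional[Layer]:
    """Pick the highest-bitrate layer that fits the receiver's estimated bandwidth.

    Falls back to the lowest layer when nothing fits, so the receiver still gets video.
    """
    if not layers:
        return None
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    fitting = [layer for layer in ordered if layer.bitrate_kbps <= available_kbps]
    return fitting[-1] if fitting else ordered[0]
```

No transcoding happens anywhere in there. The SFU just switches which of the incoming streams it forwards, which is why simulcast support in the sending codec matters so much.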

The problem is that simulcast in H.264 isn’t available yet in any of the web browsers. It is coming to Chrome, but that’s about it for now. And even when it is, there’s no guarantee that Apple will be so kind as to add it to Safari.

It is better than nothing, though not as good as VP8 simulcast support today.

#4 – H.264 hardware implementations aren’t always compatible with WebRTC

Here’s the kicker – I learned this one last month, from a thread in discuss-webrtc – the implementation requirements of H.264 in WebRTC are such that it isn’t always easy to use hardware acceleration even if and when it is available.

Read this from that thread:

Remember to differentiate between the encoder and the decoder.

The Chrome software encoder is OpenH264 – https://github.com/cisco/openh264

Contributions are welcome, but the encoder currently doesn’t support either High or Main (or even full Baseline), according to the README file.

Hardware encoders vary greatly in their capabilities.

Harald Alvestrand from Google offers here a few interesting statements. Let me translate them for you:

  • H.264 encoders and decoders are different kinds of pain. You need to solve the problem of each of these separately (more about that later)
  • Chrome’s encoder is based on Cisco’s OpenH264 project, which means this is what Google spends the most time testing against when it looks at WebRTC H.264 implementations. Here’s an illustration of what that means:
  • The OpenH264 encoder isn’t really High profile or Main profile or even full Baseline profile. It just implements something in-between that fits well into real time communications
  • And if you decide not to use it and use a hardware encoder, then be sure to check what that encoder is capable of, as this is the wild wild west as we said, so even if the encoder is accessible, it is going to be like a box of chocolates – you never know what it is going to support

And then comes this nice reply from the good guys at Fuze:

@Harald: we’ve actually been facing issues related to the different profiles support with OpenH264 and the hardware encoders. Wouldn’t it make more sense for Chrome to only offer profiles supported by both? Here’s the bad corner case we hit: we were accidentally picking a profile only supported by the hardware encoder on Mac. As a result, when Chrome detected CPU issues for instance, it would try to reduce quality to a level not supported by the hardware encoder which actually led to a fallback to the software encoder… which didn’t support the profile. There didn’t seem to be a good way to handle this scenario as the other side would just stop receiving anything.

If I may translate this one as well for your entertainment:

  • You pick a profile for the encoder which might not be available in the decoder. And Chrome doesn’t seem to be doing the matchmaking here (not sure if that’s true and if Chrome could even do that if it really wanted to)
  • Mac’s hardware acceleration for the H.264 encoder, like any other Apple product, has its very own configuration, which only it supports. When Chrome lowers quality to a level the hardware encoder can’t handle, it falls back to the software encoder, which doesn’t support that profile at all, so the stream just stops
  • This is one edge case, but there are probably more like it lurking around

So. Got hardware encoder and/or decoder. Might not be able to use it.
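The suggestion in that Fuze reply, offering only what both encoders can handle, is easy to express in code. A minimal sketch, assuming you have somehow obtained the list of profile-level-id values each encoder supports (getting those lists is the hard part, and the function name is hypothetical):

```python
from typing import Iterable, List


def safe_profiles(hardware: Iterable[str], software: Iterable[str]) -> List[str]:
    """Profiles (as profile-level-id strings) that survive a hardware-to-software fallback."""
    software_set = {profile.lower() for profile in software}
    # Keep the hardware encoder's order, drop anything the software encoder can't produce
    return [profile for profile in hardware if profile.lower() in software_set]
```

If the result is empty, you are back to picking one encoder and living without a fallback.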

#5 – For now, H.264 video quality is… lower than VP8

That implementation of H.264 in WebRTC? It isn’t as good as the VP8 one. At least not in Chrome.

I’ve taken testRTC for a spin on this one, running AppRTC with it. Once with VP8 and another time with H.264. Here’s what I got:

VP8

Bitrate:

Framerate:

H.264

Bitrate:

Framerate:

This is for the same scenario running on the same machines encoding the same raw video. The outgoing bitrate variance for VP8 is 0.115 while it is 0.157 for H.264 (the lower the better). Not such a big difference. The framerate of H.264 seems to be somewhat lower at times.
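If you want a comparable “how jumpy is my bitrate” number from your own getStats samples, one common unitless measure is the coefficient of variation (standard deviation divided by the mean). To be clear, this is my assumption of a reasonable stand-in, not necessarily how the variance figures above were calculated.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Standard deviation divided by the mean; 0.0 means a perfectly flat series."""
    if not samples:
        raise ValueError("need at least one sample")
    average = mean(samples)
    if average == 0:
        raise ValueError("mean is zero, ratio undefined")
    return pstdev(samples) / average
```

Feed it the per-second outgoing bitrate of each test run and compare the two results. Lower is steadier.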

I tried out our new scoring system in testRTC that is available in beta on both these test runs, and got these numbers:

The 9.0 score was given to the VP8 test run while H.264 got an 8.8 score.

There’s a bit of a difference in how stable VP8’s implementation is versus the H.264 one. It isn’t that Cisco’s H.264 code is bad. It might just be that the way it got integrated into WebRTC isn’t as optimized as VP8’s integration.

Then there’s this from the same discuss-webrtc thread:

We tried h264 baseline at 6mbps. The problem we ran into is the bitrate drastically jumped all over the place.

I am not sure if this relates to the fact that it is H.264 or just to trying to use WebRTC at such high bitrates, or the machine or something else entirely. But the encoder here is suspect as well.

I also have a feeling that Google’s own telemetry and stats about the video codecs being used will point to VP8 having a larger portion of ongoing WebRTC sessions.

#6 – The future lies in AV1

After VP8 and H.264 there’s VP9 and H.265 respectively.

H.265 is nowhere to be found in WebRTC, and I can’t see it getting there.

And then there’s AV1, backed by the Alliance for Open Media, whose founding members include Apple, Google, Microsoft and Mozilla (who all happen to be the companies behind the major web browsers).

The best trajectory to video codecs in WebRTC will look something like this:

Why doesn’t this happen in VP8?

It does. To some extent. But a lot less.

The challenges in VP8 are limited as it is mostly software based, with a single main implementation to baseline against – the one coming from Google directly. Which happens to be the one used by Chrome’s WebRTC as well.

Since everyone works against the same codebase, using the same bitstreams and software to test against, you don’t see the same set of headaches.

There’s also the limitation of available hardware acceleration for VP8, which ends up being an advantage here – hardware acceleration is hard to upgrade. Software is easy. Especially if it gets automatically upgraded every 6-8 weeks like Chrome does.

Hardware beats software at speed and performance. But software beats hardware on flexibility and agility. Every. Day. Of. The. Week.

What’s Next?

The current situation isn’t a healthy one, but it is all we’ve got to work with.

I am not advocating against H.264, just against using it blindly.

How the future will unfold depends greatly on the progress made in AV1 as well as the steps Apple will be taking with WebRTC and their decisions on the video codecs to incorporate into WebKit, Safari and the iOS ecosystem.

Whatever you end up deciding to go with, make sure you do it with your eyes wide open.

The post The Challenging Path to WebRTC H.264 Video Codec Hardware Support appeared first on BlogGeek.me.

Can AI and Computer Vision solve the video conferencing eye contact problem?

Mon, 07/02/2018 - 12:00

Parallax, or eye contact in video conferencing is a problem that should be solved, and AI is probably how we end up solving it.

I worked at a video conferencing company about 20 years ago. Since then a lot has changed:

  • Resolutions and image quality have increased dramatically
  • Systems migrated from on prem to the cloud
  • Our focus changed from large room systems, to mobile, to desktop and now to huddle rooms
  • We went from designed hardware to running it all on commodity hardware
  • And now we’re going after commodity software with the help of WebRTC

One thing hasn’t really changed in all that time.

I still see straight into your nose or straight at your forehead. I can never seem to be able to look you in the eye. When I do, it ends up being me gazing straight at my camera, which is unnatural for me as well.

The reason for this is known as the parallax problem in video conferencing. Parallax. What a great word.

If you believe Wikipedia, then “Parallax is a displacement or difference in the apparent position of an object viewed along two different lines of sight, and is measured by the angle or semi-angle of inclination between those two lines.”

A mouthful. Let me illustrate the problem:

What happens here is that as I watch the eyes of the person on the screen, my camera is capturing me. But I am not looking at my camera. I am looking at an angle above or beyond it. And with a group call with a couple of people in it in Hollywood squares, who should I be looking at anyway?

So you end up with either my nose.

Or my forehead.

What we really want/need is to have that camera right behind the eyes of the person we’re looking at on our display – be it a smartphone, laptop, desktop or room system.
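To put a rough number on it, the angle in that Wikipedia definition is simple trigonometry: the offset between where I look and where the camera sits, against how far I am from the screen. A minimal sketch, with the function name and the example distances being my own illustration:

```python
import math


def gaze_offset_degrees(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle between the line of sight to the on-screen eyes and the line to the camera."""
    if viewing_distance_cm <= 0:
        raise ValueError("viewing distance must be positive")
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))
```

For a laptop camera sitting 10 cm above the eyes I am looking at, viewed from 60 cm away, gaze_offset_degrees(10, 60) comes out at roughly 9.5 degrees. Move the camera behind the eyes on the display and the angle drops to zero, which is exactly what we are after.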

Over the years, the notion was to “ignore” this problem as it is too hard to solve. The solution to it usually required the use of mirrors and an increase in the space the display needed.

Here’s an example from a failed Kickstarter project that wanted to solve this for tablets – the eTeleporter:

The result is usually cumbersome and expensive. Which is why it never caught on.

There are those who suggest tilting the monitor. This may work well for static devices in meeting rooms, but then again, who would do the work needed, and would the same angle work for every room size and setup?

When I worked years ago at a video conferencing company, we had a European research project we participated in that included 3D imaging, 3D displays, telepresence and a few high end cameras. The idea was to create a better telepresence experience that got eye contact properly as well. It never saw the light of day.

Today, multiple cameras and depth sensors just might work.

Let’s first take it to the extreme. Think of Intel True View. Pepper a stadium with enough cameras, and you can decide to synthetically re-create any scene from that football game.

Since we’re not going to have 20+ 5K cameras in our meeting rooms, we will need to make do with one. Or two. And some depth information. Gleaned via a sensor, dual camera contraption or just by using machine learning.

Which is where two recent advancements give a clue to where we’re headed:

  1. Apple Memoji (and earlier Animoji). iPhone X can 3D scan your face and recognize facial movements and expressions
  2. Facebook can now open eyes in selfie images with the help of AI

The idea? Analyze and “map” what the camera sees, and then tweak it a bit to fit the need. You won’t be getting the real, raw image, but what you’ll get will be eye contact.

Back to AI in RTC

In our interviews this past month we’ve been talking to many vendors who make use of machine learning and AI in their real time communication products. We’ve doubled down on computer vision in the last week or two, trying to understand where the technology is today – what’s in production and what’s coming in the next release or two.

Nothing I’ve seen was about eye contact, and computer vision in real time communication is still quite nascent, solving simpler problems. But you do see the steps taken towards that end game, just not from the video communication players yet.

Interested in AI and RTC? Check out our upcoming report and be sure to assist us with our web survey (there’s an ebook you’ll receive and 5 $100 Amazon Gift cards we will raffle).

The post Can AI and Computer Vision solve the video conferencing eye contact problem? appeared first on BlogGeek.me.

ML vs AI: What’s the difference between machine learning and artificial intelligence?

Mon, 06/25/2018 - 12:00

Is it machine learning or artificial intelligence? It ends up depending on who you ask and what it is you care about.

There are multiple ways to think and look at machine learning and artificial intelligence. And just like any other hyped technologies, people seem to mix the two and use them interchangeably.

I’ll let you in on a little secret: we’re doing the same with our upcoming AI in RTC report.

Want to help us with our research AND get a free ebook AND have a chance to win one of five $100 Amazon gift cards?

Fill out our AI in RTC survey

We could have just as easily used the title “ML in RTC” instead of “AI in RTC”. The way we’d approach and cover the space and end up writing this market research would be… the same – in both cases.

Why?

  1. I’ve never been a stickler for such details, especially when so many are mixing them up
    1. This is the same as having VoIP, Convergence, UC and now Teams mean the exact same things – just slightly differently
    2. Or why WebRTC is both a standard specification (almost at least) and an open source project implementing an approximation of that standard specification
    3. And it is why people mix between ML and AI. The distinctions aren’t big enough for most of the population to care – or understand
  2. Marketing
    1. Whenever a new technology or term becomes interesting and gets hyped, overzealous marketing and sales people would start using it and abusing it
    2. Which is what we see with this whole AI thing that is just everywhere now
    3. So why not us with our new report about AI in RTC?

Which brings me to this article.

Machine Learning and Artificial Intelligence are somewhat different from one another. The problem is to decide what that difference is.

Here are 4 ways to think about ML and AI:

#1 – ML = AI

Let’s start with the easiest one: ML is AI. There’s no difference between the two and they can be used interchangeably.

This is the viewpoint of the marketer, and today, of the market itself.

When everyone talks about AI, you can’t not talk about AI. Even if what you do is just ML. Or BigData. Or analytics. Or… whatever. Just say you’re doing AI. It is good for the health of your stock price.

While at it, make sure to say you’re doing AI in an ICO cryptocurrency fashion. What can go wrong?

Someone tells you he is doing AI? Assume ML, and ask for more information. Make your own judgement.

#2 – The road to AI

From Operational to BI

We’ve had databases in our products for many years now. We use them to store data, run transactions and take actions. These are known as operational databases. For many years we’ve had another set of databases – the analytical ones, used in data warehouses. The reason we needed them is because they worked better when asking questions requiring aggregations that look at large series of historical data.
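The difference shows up in the shape of the queries more than in the data itself. Here is a minimal sketch using Python’s built-in sqlite3, with a made-up calls table. SQLite is obviously no data warehouse, so take this as an illustration of the two kinds of questions and nothing more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, day TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO calls (day, minutes) VALUES (?, ?)",
    [("2018-06-01", 12), ("2018-06-01", 30), ("2018-06-02", 5)],
)

# Operational: fetch (or update) one specific record
one_call = conn.execute("SELECT minutes FROM calls WHERE id = ?", (2,)).fetchone()

# Analytical: aggregate over the whole history
minutes_per_day = conn.execute(
    "SELECT day, SUM(minutes) FROM calls GROUP BY day ORDER BY day"
).fetchall()
```

The first query touches one row. The second has to read them all, which is why it gets painful on an operational database once the history grows.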

That got the marketing terms of BI (Business Intelligence) and even Analytics.

BI because we’re selling now to the business (at a higher price point of course). And what we’re selling is value.

Analytics because it sounds harder than the operational stuff.

From BI to BigData

The next leg of that journey started about a decade ago with BigData.

Storage started costing close to nothing, so it made sense to store everything. But now data warehouses from the good-ol’ BI days got too expensive and limiting. So we came out with BigData. Things like Hadoop and Cassandra came to be and we were happy again.

Now we could just throw all our data into Hadoop and run batch processes on it called MapReduce that ended up replacing/augmenting our data warehouses.
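For those who never wrote one, the MapReduce idea fits in a few lines. Below is a toy, single-process sketch of the classic word count. The function names are mine, and a real Hadoop job spreads the map and reduce steps across many machines, which is the whole point.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count(["call dropped", "call connected"]) counts “call” twice and the other two words once each.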

BigData was in big hype for some time. While it is very much alive today, it seems to have run out of steam for marketers. They moved on to Machine Learning.

From BigData to ML

This step is a bit more nuanced, and maybe it isn’t a step at all.

Machine Learning covers the research area of getting machines to decide on their own algorithm – or more accurately – decide on how an algorithm will be used based on a given dataset.

Machine learning algorithms have been around well before machines. If you check the notes on Wikipedia for Linear Regression, you’ll find the earliest methods for it were published in 1805. And to be fair, these algorithms are used in BI as well.
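To show how un-magical this end of machine learning is, here is a minimal sketch of fitting a straight line with ordinary least squares, in plain Python. The function name is mine. Any real project would reach for a library, but the math is just this.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Give it the points (1, 3), (2, 5), (3, 7), (4, 9) and it “learns” a slope of 2 and an intercept of 1 from the data, with nobody hard-coding either number.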

The leap from BigData to ML happened mostly because of Deep Learning. Which I am keeping as a separate leap from ML. Why? Because many of the things we do today end up being simpler ML algorithms. We just call it AI (or ML) just because.

Deep Learning got everyone on the ML bandwagon.

From ML to Deep Learning

Deep Learning is a branch of Machine Learning. A certain type of machine learning algorithms.

They became widely popular in recent years since they enabled the accuracy of certain tasks to increase significantly.

There are two things we can now achieve due to deep learning:

  1. Better image classification
  2. Better accuracy in speech to text

Here’s how Google fares now (taken from KPCB internet trends):

We were at around 70% accuracy in 2010, after a gradual rise over the previous 40 years or so from 50%.

This steep rise in accuracy in this decade is attributed to the wide use of machine learning and the amount of data available as training material to the algorithms.

Deep learning is usually explained as neural networks, making it akin to human thinking (at least until the next wave of better algorithms is invented that is even more akin to human thinking).

From Deep Learning to AI

And then there’s artificial intelligence.

Less a specific algorithm and more a target. To replace humans. Or to do what humans can do.

Or my favorite:

AI is a definition of what we can’t do with machines today.

Once we figure that out, we’ll just put AI on the next pedestal so we’ll have a target to conquer.

#3 – Learning or Imitating?

Here’s one that is slightly different. I heard it at a data science event a couple of weeks ago.

Machine Learning is about getting machines to select their own algorithm by presenting them a set of rules and outcomes:

  • You give a machine voice recordings, along with the transcription, and let it decide from that input what the transcription of a new voice recording should be
  • You give a machine the rules to play a game, and let it play many times (millions?) until it gets better at it, devising its own algorithm and strategy

Artificial Intelligence is about doing something a human can do. Probably with the intent to replace him by automating the specific task. Think about autonomous driving – we’re not changing the roads or the rules of driving, we just want a car to drive itself the way a human would (we actually want the machine to drive better than humans).

So:

  • Machine Learning is about letting a machine devise its own algorithm based on data we give it
  • Artificial Intelligence is about doing a task the way a human would

#4 – Predictions vs Actions

This one I saw at a recent event, which got me on this track of ML vs AI in the first place.

Machine Learning is about Predictions, while Artificial Intelligence is about Actions.

You can use machine learning to understand things, to classify them, predict and estimate. But once the time comes to act upon it, we’re in the realm of artificial intelligence.

It also indicates that any AI system needs ML to operate.

I am sure you can poke holes in this one, but it is useful in many ways.

Why do we care?

While I am not a stickler for such details, words do have meaning. It becomes an issue when everyone everywhere is doing AI, but some end up with a Google Duplex while others show a rolling average on a single metric value.

If you are using communications and jumpstarting an AI initiative, then be sure to check out our upcoming report: AI in RTC.

Want to help us with our research AND get a free ebook AND have a chance to win one of five $100 Amazon gift cards?

Fill out our AI in RTC survey

The post ML vs AI: What’s the difference between machine learning and artificial intelligence? appeared first on BlogGeek.me.

UCaaS, CCaaS & CPaaS: An interview with Alan Masarek, Vonage CEO

Thu, 06/21/2018 - 12:00

An interview with Alan Masarek, CEO of Vonage.

Doing these video interviews is fun, so when the opportunity arose to be at the Vonage headquarters in Holmdel, New Jersey, it made sense to ask for a video interview with Alan Masarek, the CEO of Vonage.

In this interview, I wanted to get Alan’s viewpoint about the space he is operating in, especially now, some two years after the acquisition of Nexmo. It is quite common to find UCaaS vendors that are heading towards the contact center. Many will even add APIs on top. Vonage is the only one who decided to acquire a dominant CPaaS vendor (Nexmo).

As usual, you’ll find the transcript right below the video.

I enjoyed the interview and the hospitality. I’d like to thank Alan and the team at Vonage for setting this one up.

Transcript

Tsahi: Hi. So I have got here today, Alan Masarek, CEO of Vonage at the Holmdel, Vonage Technology Center.

Alan: That’s correct. We’re thrilled to be here at our Vonage Technology Center. It’s a pleasure to be with you, Tsahi. Thank you.

Tsahi: Thank you for having me here. I have a question before we start and this really bugged me a bit during the time that I’ve learnt about you and about the company: You came from Google to Vonage.

Alan: Yup.

Tsahi: Why?

Alan: Well, first of all, if that’s the only thing that’s bugged you, that would be exceptional. But in all seriousness, what excited me when I was presented this opportunity when I was at Google … And I’d gotten to Google from selling my earlier company to them back in 2012. So I was a director in the Chrome and apps group and I was very involved in the whole rollout of what is now today, G Suite. We used to call it Google for Enterprise.

What intrigued me about coming here was the opportunity to take this almost iconic consumer brand company that built this amazing level of awareness around providing residential phone service and how you could take the brand and the network asset as well as the cash flow from consumer candidly, and use that to pivot into business. I always look at markets the same way. You sort of sit back and you say, “Is that market worth winning and do you have the assets to give you an ability to win it?”

So when you look at the broader business communications market, it’s a massive TAM growing very quickly. And then even when you look at the competitive set, I found the big companies in this set were pretty unfocused. Most of the competitors were smaller companies, had less brand awareness, less sort of national scope, less profitability. So you have this huge TAM, a surmountable competitive set, then you have these assets from consumer that we felt we could bring to bear to win and that’s exactly what we’ve been executing on, that’s what we saw when I was at Google, that’s what I came here to do.

Tsahi: So you’re actually staying in this area between consumer and enterprise. You did that at Google with acquisition and now here at Vonage, moving from consumer to businesses.

Alan: That’s correct. So the company that I sold to Google focused really in the prosumer and enterprise segment. So we were a productivity solution that individuals would use and corporations would use. Here, we obviously have moved very specifically from our roots in consumer, in residential, focused in business. When we began that pivot, we started with small companies because that’s where the action was and the move to cloud, but now we’ve moved very purposefully upmarket to larger and larger corporate customers.

Last year, we signed what I think is the largest deal ever done in cloud communications with the largest residential real estate company in the United States. 21,000 corporate seats moving from prem to cloud and another 125,000 franchise seats.

Tsahi: Interesting. And what gets you up in the morning?

Alan: Well, this morning at 5 o’clock, my alarm clock but … What I’m excited about and I’ve continued … The reason I came here to begin with is I want to build a remarkable company here. It’s not just the transformation from moving from a residential-focused company to a business-focused company. We’re clearly executing on all those elements, whether it’s the technology platform itself, sales execution, the post-sales experience we provide our customers, all those things that we’re doing. But as important and in some respects if not more important, it’s the cultural transformation as well.

What I find that is really sort of stimulating to me is to create that switched-on Silicon Valley mindset culture. I like to think that we’re a billion dollar startup is what we talk about it. Last year, we finally crossed the billion dollar in revenue threshold. But I want to have the agility, the speed, the openness, the transparency, the honesty, all that, in order for Vonage to be … The way I describe it is I want Vonage to be that destination place to work the way Google was and everybody celebrates when they get a Google. I want them to feel the same way getting a job here.

Tsahi: Okay. And you’re a cloud communication company at the end of the day and cloud communication in the last few years have got a lot of attention, especially this last year. How come most of the businesses today are still on-premise when it comes to their communication needs?

Alan: On the communication side, the move to cloud has happened more slowly than CRM and ERP and HRM software, things like that. I think because the nature of dial tone has been about as reliable as the sun coming up tomorrow and there’s a great degree of risk that’s associated with it. Companies sit back and they say, “My goodness. It works. I don’t necessarily want to change it.” Now, the reality is when you move from the traditional prem-based solutions and the old PSTN network and such to IP-based, cloud-based solutions, you have infinite scalability, much, much more functionality, the whole notion of unified communications and communications platform as a service all stems from that. But I just think there’s been a fear factor that has caused it to migrate to the cloud more slowly than some of these other verticals.

But you see this amazing tipping point as recently as five years ago, only small companies for the most part were moving to the cloud. Now it has moved all the way up to major enterprises. And there are just example after example of other huge companies, global multinationals moving to cloud. It’s sort of no longer in dispute that cloud will supplant prem. It’s just like anything takes time.

Tsahi: What triggers them to do that shift, that migration from on-prem to cloud?

Alan: There are several trigger points. A couple of them are the comfort of moving to cloud. The cloud was scary just a few years ago and so it was to be avoided by bigger companies. But beyond that, it’s the productivity that they can get. Every company out there is going through their own digital transformation of one form or the other. Everybody is looking over their shoulder, scared to death of the more digitally transformed competitor has a bullseye on their back, is coming after their business. Obviously, we can always cite the example of physical retail stores versus Amazon eCommerce. That notion of digital transformation everyone has to go through and I think what’s happened is up until very recently, communications has been sort of the underappreciated element of digital transformation.

I always have this sort of visual metaphor in my mind that you can picture somebody on the old black rotary dial phone talking to a colleague saying, “We got to get that eCommerce site up.” Not realizing that the problem itself or a major piece of the problem itself is their communications infrastructure, how people work differently with one another, how they collaborate, et cetera, et cetera. All those elements of what we’re providing with these cloud communications solutions are fueling their digital transformations. I think that’s now being seen. Folks are more aware of that all the time and that’s why you’re seeing kind of everything change and move to cloud so quickly.

Tsahi: When you look at the communication market, for me, it’s like a Venn diagram with different parts of it. There’re unified communication and then contact centers and recently, we see APIs, these CPaaS communication platform as a service. When I look at what competitors do in this space, your competitors and unified communications, they end up going and doing something or adding stuff in the contact center. And then when they look at the APIs, usually go and say, “Well, we just put an API”; obviously they do because 2018, everybody uses an API on top of what they do. But you did something differently. You went and acquired the company called Nexmo and then their APIs, haven’t even touched it in a way and you left that to be a separate part of the business or a business all its own, with and without relationship to what you’re doing in unified communications.

Alan: The reason that we bought Nexmo is we have a view of what business communications is and will be that’s different than most. Most in the example have hosted PBX which has really been the principal use case of UCaaS or hosted contact center which has been the principal use case of CCaaS. In our view, those are just applications. Hosted PBX, moving your prem-based PBX to the cloud is a big TAM onto itself but it’s not necessarily an industry. The same applies to contact center. It’s not an industry. It’s simply an application or a use case which is really large and really important. But at the same token, the whole now new acronym of CPaaS, Communications Platform as a Service, says, “Well, there are other elements of communications that I want to simply program into my workflow, my mobile app, my business process, my website.” What have you. But have nothing to do with the contact center or the PBX.

Our view has been that we’re building a communications platform company. The whole notion of it is it’s a microservices architected platform. So we’re taking the Nexmo platform and our own Vonage Business Cloud platform and bringing those together. We refer to that internally as 1V, One Vonage. From that microservices architecture, you’re just going to serve customers in those big use cases. So whether you bundle several hundred of those microservices together in a use case called PBX or in a use case called contact center, or sell them one at a time that just get embedded into something else via the software APIs, it doesn’t matter. It’s the same platform. You’re just feeding where the needs are the greatest.

And the notion of this is that there’s not different industries, UCaaS, CCaaS, CPaaS. It’s simply communication elements, how they get deployed. The way I like to think about it is I go back to the music industry. We grew up, here’s songs and we can buy it only one way. Packaged, pre-published on an album. Apple came along and the cloud and said, “I’m going to unbundle the model and you can buy a song one at a time.” And then streaming services and subscription services have come along and the ability to mash up your music. They’re just different delivery models of the same song. It’s the way I think about cloud communications. There are communication elements, audio, video, messaging. Whether you package them in big applications like PBX or unbundle them as microservices, which is the CPaaS model, it doesn’t really matter. It’s just where the needs are the greatest.

Because at the end of the day, communication only serves a purpose. Does it make the company more productive? Does it connect my customers in a more personalized way with me as a company? And does it drive better business outcomes for my business? If it doesn’t do that, it doesn’t really matter whether you call it UCaaS or CCaaS or CPaaS. It simply has to drive those better business outcomes and that’s the approach that we’re taking.

Tsahi: Talking about Nexmo, they are now 12, 18 months part of Vonage now.

Alan: Almost two years. June 5th will be two years.

Tsahi: What synergies have you seen since the acquisition, up until today?

Alan: There’s been a great deal of synergies. You mentioned before about the Venn diagrams where much of the industry has developed as if the segments, UCaaS, CCaaS, CPaaS have been separate. We reject that. If they were all Venn diagrams, they all will be separate. Our view is they’re coming together all the time. So increasingly, the purchaser at a company, Acme  company, is the line of business manager. The conventional wisdom used to be that if I’m buying UCaaS, I’m the CIO or the head of IT and if I’m buying CCaaS contact center, I’m the help center. And if I’m buying communications platform as a service, I’m an individual developer, perhaps even the CMO. What you’re finding now is it’s coming together as lines of business. Given that trend from a synergy point of view, we’ve organized since the acquisition, completely functionally so that the entire engineering team, Vonage traditional or Nexmo reports up to the same CTO. The product organization up to the same chief product officer. Sales under the same chief revenue officer, same with marketing.

And they’re already doing tremendous amounts of lead sharing within the groups, operational sharing, sales enablement, sales training and things like that. Because what we’re finding is that in the cloud PBX world, your salespeople don’t want to go out there and go to a customer and say, “Buy me because my hunt group or my auto attendant is better than the other guys.” Because this very sort of baseline functionality. What you want to do is go into your customer and have a conversation about better business outcomes. So they’re just naturally carrying Nexmo into the discussion with every prospect out there. You can look at every one of our large company wins. It began with a Nexmo conversation interestingly, more than just the feature set of the PBX or the contact center. So you’re seeing very, very natural synergies happen. Now, it’s not a cost synergy issue for us in terms of people. When we bought Nexmo, it was about 175 people. I think it’s above 300 today and as I recall last time when I was in our London office, there was 140 open jobs for Nexmo this calendar year, so we’re growing in a big hurry.

Tsahi: We’ve talked about the cloud, we’ve talked about API. There is another big buzzword these days around communications and that’s “Teams”. The notion of what Slack started in a way. Messaging inside groups, smaller groups which is more ad hoc than the usual grounded structured way of communications. And you see today Microsoft going there, Cisco going there. All the big companies are headed there and then next to you, you got Google and Amazon joining this specific space. How is Vonage preparing towards that future of team collaboration, enterprise messaging, whatever you want to call it?

Alan: So not to sort of disclose all the goodies that are coming but within our roadmap, we have some very, very interesting developments around the collaboration and work stream messaging space that will be coming out later this year. And that’s tightly integrated as a single app whether you’re mobile, desktop or browser, with the experience in the communications system. Now, it also will integrate well with the major players that you just talked about. Slack, Stride, Teams, et cetera. Or it’s going to be WebEx, et cetera. Because it has to.

In our view, we can’t play king maker and say, “Oh. Mr. Customer, Mrs. Customer, you cannot use these other collaboration tools.” That’s ultimately going to the decision of the customer. So we have to have our own solution that is built-in in a fully integrated way but then the ability to integrate in with the others and that’s the approach that we’re taking.

Tsahi: Can I ask a question that just occurred to me?

Alan: Sure.

Tsahi: What about contact centers?

Alan: I think contact center is incredibly important as part of the integrated solution. And so today, we have a contact center built into Vonage Business Cloud which is our own proprietary call processing stack. And for our Vonage Enterprise Solution, we use BroadWorks contact center functionality. Then, in those situations where they need an advanced contact center solution, then we are a reseller of inContact. But again, it’s integrated fully in with our solution, so it appears like it’s a single experience. And then we serve it as if it’s a single experience so the contract is on our paper, the support is ours, things like that.

Contact center though becomes very, very important in the CPaaS market because so much of how communications get embedded in through some software API into that website, that mobile app, business process, what have you, is about customer experience. And so think of it as task routing. Somebody is on my website and they’re looking at my product and they have a question. Today, they may pick up the phone and call and have to start over because there was no context to what they were doing on the website, and these CPaaS type tools are all about the contextual. The software identifies the context to what I was doing.

So if I was on Delta Airlines site trying to book a flight and I was 10 minutes into booking the itinerary and all of a sudden it had a problem, in the past, I’d pick up the phone and just call and have to start over because no one had any idea of the itinerary I was just trying to book. These new contextual tools that you can embed in, understand the itinerary so that it routes through the appropriate IVR into the contact center. So think of it as a task, an intelligent task. It knows I was trying to book a flight from Tokyo to Shanghai next Thursday and it will route me through the appropriate IVR to the person on the help desk for the international Asia markets.

And so you can envision from a customer personalization or a customer intimacy, rather than me having to start over which is what happens today, which is very frustrating to all of us. You can imagine the agent picking the phone up and saying, “Hi, Mr. Masarek. I see you’re trying to book a flight next Thursday from Tokyo to Shanghai. How can I help?” That’s a direct connection between the customer experience, routing the task into the contact center. We think that’s very important.

Tsahi: Let’s look a little bit into the future.

Alan: Okay.

Tsahi: What do you think is the biggest challenge for the modern businesses moving forward from now on? When it comes to communications of course.

Alan: I’m not sure it’s a challenge. I don’t want to sort of split words between challenge and opportunity, but I actually think communications is going to fundamentally change by virtue of we’re no longer tethered to a physical device. We think about communications, I’m on a call, either a landline or a desk. In our vision for it, communications is in everything. So whether it’s a click-to-call or click-to-communicate functionality in the website or … Pick whatever app you want. You’re on Salesforce, I’m on an Excel spreadsheet, someone else is in G Suite or in Gmail, or in Google Sheets. Doesn’t matter. There will be click-to-communicate functionality everywhere and naturally, these microservices that are going to be created increasingly by these CPaaS type solutions. So you’re going to have I think this explosion in communications the way I think about it because you’re no longer tethered to anything physical. You’re in an app or a website or what have you.

And the way I think about it is your decision of how you communicate is simply going to be a function of the limitations of the physical device that you got onto the internet with. So for instance, if the device doesn’t have a camera, you’re not going to do video. If it doesn’t have a speaker and microphone, you’re only going to do messaging, that’s all you can. But the mode, video, audio or messaging is going to be the limitations of the device and your personal preference, also kind of situational. If you just stepped out of the shower, you’re not going to do video likely. So the point is regardless of how you’re interacting in some sort of app or website, you’re going have communication everywhere. So I think the notion of the challenge to companies is less the challenge and more that I think it’s going to change the way we work because the notion of how we collaborate, how we share, the tightness of the communication, sort of that feedback loop is going to get tighter, and tighter, and tighter is the way I think about it.

I actually think about communication, this renaissance or this explosion in communication a little bit like the internet 10 years ago. 10 years ago, there was no video flying around the internet. It was kind of more flat files and such. There wasn’t full-motion video. There certainly wasn’t virtual reality and things like that, and self-driving cars and all these stuff that is just massive quantities of data that are going around the internet. When that began, look what happened with all the content delivery networks. They just kind of went like this in terms of the volume of capacity they have on the internet. I think communications is going to go through this similar renaissance or explosion in the sense because if communications are everywhere, not just on specific devices, you’re going to be communicating all the time, and so I think you’re going to see this massive uplift in it. If it’s a challenge out there, it’s going to create sort of communication overload, perhaps, but maybe smarter people than us will figure it out on how to make it simpler.

Tsahi: And moving forward, would businesses end up building their communication needs on top of APIs, go pick a UCaaS or a communication solution to do that for them or go for even a very specific niche SaaS product to get what they need?

Alan: I think that increasingly, communications will be built on top of the platform, the PaaS product, not going and buying some monolithic application. Like you said earlier, everybody’s got APIs. The old way we used to write software, we write a big monolithic solution from the UI, the user interface, all the way down to the metal called PBX, in our example. I can open up APIs to the PBX but it’s not programmable. It’s simply an API into that monolithic solution. Where we sit today is a microservices architecture where it’s fully programmable.

And I think what you’ll see, and this is exactly the strategy we’re building to, is whether you want to use that big chunk of microservices in a particular use case that is as a big application like PBX or a big application like contact center, it’s just a function of what’s the best way to deliver it to a customer. Do I think people are going to build their own PBX all the time? No. Because I think to me it’s analogous to the vast majority of people don’t build their own computer. You certainly could. You could be a hobbyist and build your own PC and buy the motherboard and the chassis and the whole bit, but very few people do that when you go out and buy a computer for $400. So I think the PBX distribution model where it’s something you’re going to subscribe to, it’s a SaaS solution, will persist, but I think the microservices are really going to take over where communications get woven into everything else.

Tsahi: Vonage in 5 to 10 years from now, where do you see the company itself? What are you going to sell to businesses, to consumers? What kind of services are going to be there?

Alan: Vonage in the next five years will be an extraordinarily different company than it is today. Let me go backwards first. Four years ago, we were 100% consumer. Now, this year in 2018, roughly 60% of the revenue is business. Business is growing really quickly. So as of last quarter, 22% growth organically, nothing to do with acquisitions. And consumer has been declining as residential home phone usage is in decline, by 12% roughly. Now that business is the larger of the two segments and growing at twice the rate that consumer’s declining, you can imagine where the lines separate in a very big hurry. So the whole focus of the organization is on business. It already is. Consumer is still a meaningful piece, it’s 40% but it’s getting smaller all the time as a percentage of the total.

What’s interesting from a how we’re going to serve customers is precisely the way we do it today. Our whole approach from a platform perspective, the way I described it where irrespective of whether it’s UCaaS, CCaaS or CPaaS, coming out of a common platform, we will continue to execute on that. What’s interesting where I think a value unlock happens for the company is you’re now going to have … We’re already having consolidated revenue growth.

Last year, we did just above a billion dollars in revenue. This year, Wall Street has us close to a billion fifty. Again, as the smaller piece, consumer, gets smaller and smaller, its mitigating impact on overall growth declines. Therefore, we’re sort of more and more of a consolidated growth company. Again, unrelated to any acquisitions, just purely organically. The notion then of, “Oh my goodness. You’re in the midst of a transformation” goes away because you’ve now transformed.

So where I can see us in pretty short order is serving our approach to our customers in this differentiated way which I think will withstand the test of time, will withstand competitive entrants because, the end of the day, we’re just rooted in how do we provide better business outcomes for our customers. But now you’re going to have this increasingly fast growing consolidated company, well greater than a billion dollars in revenue, highly profitable still and I think that’s going to be a value unlock for the story. When I go back to many transformational stories in the early days, there’s a lot of investor skepticism about transformational stories is most of them don’t work. This one’s worked and that’s why we’ve had sort of an almost quadrupling of our stock price over the last four years.

Tsahi: Okay.

Alan: All right.

Tsahi: Thanks for your time, Alan.

Alan: My pleasure. Thanks so much. I enjoyed it.

Tsahi: Me too.

Alan: Sure. Thank you.

Tsahi: Thank you.

The post UCaaS, CCaaS & CPaaS: An interview with Alan Masarek, Vonage CEO appeared first on BlogGeek.me.

Where does Machine Learning fit in Real Time Communication (ML in RTC)?

Mon, 06/11/2018 - 12:00

ML in RTC can fit anywhere – from low level optimization to the higher application layers.

TL;DR – I am working with Chad Hart on a new ML in RTC report. If you are interested in it, scroll down to the end of this article.

Machine Learning (ML), Artificial Intelligence (AI), Big Data Analytics. Call it what you will. You’ll be finding it everywhere. Autonomous cars, ecommerce websites, healthcare – the list goes on. In recent years we’ve seen this domain flourish due to the increase in memory and processing power, but also due to some interesting breakthroughs in machine learning algorithms – breakthroughs that have rapidly increased the accuracy of what a machine can now do.

My ML Origin Story

I’ve been looking at and dealing with machine learning for many years now. Never directly calling it that, but always in the vicinity of the communications industry.

It probably started in university. I decided to do an M.Sc because I was somewhat bored at work. I took a course in computational linguistics which then ended with me doing research in backward transliteration, looking at phonemic similarities between English and Spanish (#truestory). That was in 2005, and we used a variant of dynamic programming and the Viterbi algorithm. That and other topics such as hidden Markov models were part and parcel of my work at the time.

Later on, I researched the domain of Big Data and Analytics at Amdocs. I was part of a larger group trying to understand what these mean in telecommunications. Since then, that effort grew into a full business group within Amdocs (as well as the acquisition of Pontis, well after I left Amdocs for independent consulting).

Which is why when I talked to Chad Hart about what we can do together, we came to an agreement that something around ML and AI made a lot of sense for both of us, and taking it through the prism of RTC (real time communications), placed it in the comfort zone of both of us.

We molded that effort under the Kranky Geek roof, calling it Kranky Geek Research. We created a landing page for our research, a brochure and a survey (more on that later).

During that period, we thought a lot about what domains we wish to cover and what ML in RTC really means.

Categorizing ML in RTC

Communications is a broad enough topic, even when limited to the type that involves humans. So we limited even further to real time communications – RTC. And while at it, threw text out the window (or at the very least decided that it must include voice and video).

Why do that? So we don’t have to deal with the chatbots craze. That’s too broad of a topic on its own, and we figured there should be quite a few reports there already – and a few snake oil sellers as well. Not our cup of tea.

This still left the interesting question – what exactly can you do with AI and ML in RTC?

We set out to look at the various vendors out there and understand what they are doing when it comes to ML in RTC.

Our decision was to model it around 4 domains: Speech Analytics, Computer Vision, Voice Bots / Assistants and RTC quality / cost optimization.

1. Speech Analytics

Speech Analytics deals a lot with Natural Language Processing (NLP) and Natural Language Understanding (NLU).

Each has a ton of different use cases and algorithms to it.

Think of a contact center and what you can do there with speech analytics:

  • Employ speech-to-text for transcription of the sessions
  • Go further with sentiment analysis by analyzing voice cues and not only the transcribed text
  • Glean meaning out of the transcription and derive actionable insights based on that meaning

You will find a lot of speech analytics related RTC ML taking place in contact centers. A bit less of it in unified communications, though that might be changing if you factor in Dialpad’s acquisition of TalkIQ.

2. Computer Vision

Computer Vision deals a lot with object classification and face detection, with all the derivative use cases you can bring to bear from it.

“Simple” things like face recognition or emotion recognition can be used in real time communications for a multitude of communication applications. Object detection and classification can be used in augmented reality scenarios, where you want to mark and emphasize certain elements in the scene.

Compared to speech analytics, computer vision is still nascent, though moving rapidly forward. You’ll find a growing number of startups in this domain as well as the cloud platform giants.

3. Voice Bots & Assistants

To me, voice bots and assistants is the tier that comes right above speech analytics.

If speech analytics gets you to NLP and NLU, the ability to convert speech to text and from there to move to intent, then voice bots are about conversations – moving from a single request to a fluid interaction. The best example? Probably the Google Duplex demo – the future of what conversational AI may feel like.

Voice bots and assistants are rather new to the scene and they bring with them another challenge: do you build them as a closed application or do you latch on to the new voice bot ecosystems that have been rapidly making headway? How do you factor the likes of Amazon Alexa, Google Home, Google Assistant, Siri and Cortana into your planning? Are they going to be the interaction points of your customers? Does building your own independent voice bot even make sense?

Whatever the answers are, I am pretty sure there’s a voice bot in the future of your communications application. Maybe not in 2018, but down the road this is something you’ll need to plan for.

4. RTC Quality & Cost Optimizations

While the previous 3 machine learning domains revolve around new use cases, scenarios and applications through enabling technologies, this one is all about optimization.

There are many areas in real time communication that are built around heuristics or simple rule engines. To give an example, when we compress and decompress media we do so using a codec. The encoding process (compression) is lossy in nature. We don’t keep all the data from the original media, but rather throw away stuff we assume won’t be noticed anyway (sounds outside the human hearing range, small changes in color tones, etc.) and then we compress the data.
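To make the "throw away, then compress" idea concrete, here is a minimal sketch. It is not how any real codec works: it assumes a made-up stream of 8-bit "color" samples, uses plain quantization as the lossy step, and uses zlib as a stand-in for the entropy coding stage. The function names and the step size of 8 are illustrative assumptions.

```python
import random
import zlib


def make_noisy_gradient(length: int, seed: int = 0) -> bytes:
    """Build a hypothetical row of 8-bit samples: a smooth ramp plus small noise."""
    rng = random.Random(seed)
    return bytes(
        min(255, max(0, (i * 256) // length + rng.randint(-2, 2)))
        for i in range(length)
    )


def quantize(samples: bytes, step: int) -> bytes:
    """Lossy step: snap every sample down to a multiple of `step`.

    Small tone differences (anything below `step`) are thrown away.
    """
    return bytes((s // step) * step for s in samples)


original = make_noisy_gradient(4096)
lossy = quantize(original, step=8)

# zlib stands in for the lossless compression stage of a codec.
lossless_size = len(zlib.compress(original))
lossy_size = len(zlib.compress(lossy))

# Largest per-sample error introduced by the lossy step.
max_error = max(abs(a - b) for a, b in zip(original, lossy))

print(f"compressed without loss: {lossless_size} bytes")
print(f"compressed after quantization: {lossy_size} bytes")
print(f"largest per-sample error: {max_error}")
```

The quantized data compresses better because the small variations that were discarded are exactly what made the original hard to compress. How large a step you can get away with before people notice is the kind of decision an encoder has to make.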

The codecs we use for that purpose are defined by the decoder, by what you do if you receive a compressed bitstream. No one defines what an encoder needs to look like or how it should behave. That is left to developers to decide, and encoders differ in many ways. They can’t brute-force their way to the best possible media quality, especially not in real time, where there isn’t enough time to do that. So they end up being built around guesswork and heuristics.

Can we improve this with machine learning? Definitely.

Can we improve network routing, bandwidth estimation, echo cancellation and the myriad of other algorithms necessary in real time communications using machine learning? Sure we can.
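As an illustration of the kind of hand-tuned rule a learned model could replace, here is a minimal sketch of a loss-based bitrate controller. The function name, the 2% and 10% loss thresholds, the 5% ramp-up and the bitrate limits are all assumptions, loosely modeled on publicly described loss-based congestion control designs. They are not taken from any specific product.

```python
def next_bitrate(
    current_bps: float,
    loss_fraction: float,
    min_bps: float = 30_000,
    max_bps: float = 2_500_000,
) -> float:
    """Pick the next send bitrate from the last reported packet loss.

    A simple three-rule heuristic (all thresholds are illustrative):
      - heavy loss (> 10%): back off in proportion to the loss
      - light loss (< 2%): probe upwards by 5%
      - anything in between: hold the current bitrate
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")

    if loss_fraction > 0.10:
        candidate = current_bps * (1.0 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        candidate = current_bps * 1.05
    else:
        candidate = current_bps

    # Keep the result inside the configured operating range.
    return max(min_bps, min(max_bps, candidate))


# Example: feed a sequence of loss reports through the rule.
bitrate = 1_000_000.0
for loss in (0.0, 0.0, 0.05, 0.20):
    bitrate = next_bitrate(bitrate, loss)
    print(f"loss={loss:.0%} -> {bitrate:,.0f} bps")
```

The constants here are fixed guesses that treat every network the same way. A machine learning approach would aim to learn that mapping from observed conditions instead of hard-coding it.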

The result is that you get better media quality and user experience by optimizing under the hood. Not many do it, as the work isn’t as high profile as the other domains. That said, it is necessary.

Interested in ML in RTC?

Here are a few things you can do:

Fill out our survey

This will get factored into the quantitative part of our report. If you fill it out, you will also receive a complimentary e-book we’re writing titled Intro to AI in RTC.

Take the ML in RTC Survey

Learn more about the report

Interested in the report itself? Thinking of purchasing it? Great! We have a special launch discount.

You can find more information about the report itself in our research page.

Download the report prospectus here

Share your opinion on AI in RTC

Doing something interesting in this space? Share your thoughts with us.

Contact us via research@krankygeek.com to participate in our study.

The post Where does Machine Learning fit in Real Time Communication (ML in RTC)? appeared first on BlogGeek.me.
