What Is WebRTC? How to Build Your Own WebRTC Platform
Today we will learn how to build our own platforms in a more elegant way using the statistics API from WebRTC.
I’m going to begin with a small refresher on the protocols for multimedia. When we talk about multimedia systems, especially something like WebRTC, there are a few things a multimedia system needs to do. First, it needs to capture audio and video, and these come in a couple of codecs.
What Are Video and Audio Codecs?
For video we have H.264, VP8, and VP9, with more codecs coming in the future; for audio there is G.711, G.722, and Opus. Typically an endpoint, which is a multimedia system, first captures the audio and video frames, then packetizes them and sends them over the network. On the receiving side, the endpoint receives these packets, depacketizes them, uses a jitter buffer to make sure everything plays out correctly, and then renders the media.
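As a minimal sketch, you can ask the browser which of these codecs it can actually send; RTCRtpSender.getCapabilities() is a standard WebRTC call, though the exact codec list varies per browser and platform (G.711 shows up as PCMU/PCMA):

```javascript
// Listing the codecs this browser can send. getCapabilities() is a
// standard static method on RTCRtpSender; the list varies per browser.
const videoCaps = RTCRtpSender.getCapabilities('video');
const audioCaps = RTCRtpSender.getCapabilities('audio');

console.log(videoCaps.codecs.map(c => c.mimeType)); // e.g. ["video/VP8", "video/VP9", "video/H264", ...]
console.log(audioCaps.codecs.map(c => c.mimeType)); // e.g. ["audio/opus", "audio/G722", "audio/PCMU", ...]
```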
What Is RTP (Real-time Transport Protocol)?
Just a bit about how we do this: typically we use a protocol called RTP, which stands for Real-time Transport Protocol. In the next few charts I am only going to talk about senders and receivers, just to make sure you understand how this works.
In these cases the media is only flowing in one direction, but with WebRTC media flows in both directions. So in this case you have a sender and a receiver. The sender is encapsulating the audio and video packets and sending them across the network; along with the media packets it might send protection packets, retransmissions, repair packets, and so on.
How Are Lost Packets Handled?
In case packets get lost: the receiver is receiving these packets and playing them out, and it needs some information to play them out in a synchronized manner, especially when it is getting both audio and video. So it has a jitter buffer, and it also does some monitoring and reporting, so that endpoints can look at the data locally and see whether the media is being rendered properly. With RTP there is also a control protocol associated with it, called RTCP, the RTP Control Protocol.
RTCP exists to create a feedback loop between the sender and the receiver. As I said, the receiver needs to play back audio and video in a synchronized manner, so the sender sends timing information to the receiver so that it can play the media out accurately and smoothly, without A/V sync problems. The receiver, in turn, needs to send data back to the sender so that the sender can adapt; typically this is carried in RTCP receiver reports, also called RTCP RRs.
These carry rough statistics and some congestion cues, which help the sender do short-term adaptation. When I talk about short-term adaptation, this can be automatic bandwidth control, or adapting the media bitrate, resolution, or frame rate based on the network characteristics observed at the receiver. So that was a quick recap of the protocols associated with multimedia.
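The browser's built-in congestion controller normally performs this adaptation on its own from RTCP feedback; the sketch below only illustrates what one such knob, bitrate capping, looks like at the application level, assuming pc is an existing RTCPeerConnection:

```javascript
// A sketch of short-term adaptation at the application level: capping
// the video bitrate on the sender. The browser's congestion controller
// does this automatically from RTCP feedback; this only shows the
// mechanism. `pc` is an assumed RTCPeerConnection.
async function capVideoBitrate(pc, maxBitrateBps) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
  if (!sender) return;

  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) return; // nothing negotiated yet

  params.encodings[0].maxBitrate = maxBitrateBps; // e.g. 500000 for 500 kbps
  await sender.setParameters(params);
}
```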
Today with WebRTC we do all this mainly because every application and service wants really high-quality video and audio, and typically that requires, especially in the WebRTC case, the latencies of these systems to be low, so that you can hear and interact with people in real time.
The previous talk, for example, covered what happens when this doesn't work out, and one of the proposals was to do a pre-test. The pre-test is all good because you make sure the call starts off well, but if calls are long enough, there may be issues people encounter during the call: someone turns on the microwave, or people walk out the door and lose connectivity, and so on.
How WebRTC Provides the Highest Quality Audio and Video:
When we talk about WebRTC platforms, we always want to say we need the highest quality of audio and video, but the most important thing is not actually the highest quality. You want optimal audio quality and optimal video quality, but the most important part is interactivity, and that is guaranteed in large part by having low latency.
Low latency means lower one-way delay or lower RTTs; these are roughly synonyms for the same metric. For example, in this chart you can see a call that lasts about two hundred sixty seconds; the y-axis is the one-way delay measured by the endpoints.
What you also see is a red line, which is the ITU-T threshold for audio latency: typically the audio latency should be below this line for a call to be considered good. In the chart you can see that the audio latency from one endpoint to the other is spiky at the beginning of the call, then lower, but towards the end of the call, around 200 seconds in, you see a three-second spike, and it is not an isolated one; the spikes continue afterwards.
How WebRTC Provides a High Quality Call Experience:
One of the most important things is being able to measure this, so you can respond to customer support tickets when users say they had a bad quality of experience. Quality of experience is the ability to measure the user experience, and when we talk about WebRTC, the most important thing is the call experience: the duration of the call from start to end.
We want to measure the quality of experience by collecting metrics, and we also want to collect user interactions. For example: did they turn off their audio or their video at some point? Did they switch from the camera to screen sharing, and did they switch back? We want to be able to correlate the metrics we see during the call with user actions.
For example, if you see a video bitrate drop, you want to know whether it was because the user actually turned off their video; then you can speculate later about why they may have done that.
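Here is a hypothetical sketch of such an interaction log; logEvent, muteButton, audioTrack, and videoSender are assumed application-side names, not WebRTC APIs:

```javascript
// A hypothetical interaction log so media metrics can later be
// correlated with user actions. `logEvent`, `muteButton`, `audioTrack`,
// and `videoSender` are assumed application-side names.
function logEvent(name) {
  console.log(Date.now(), name); // or POST to your analytics backend
}

// Wrap your own mute button handler:
muteButton.onclick = () => {
  audioTrack.enabled = !audioTrack.enabled;
  logEvent(audioTrack.enabled ? 'audio-unmuted' : 'audio-muted');
};

// Switching from the camera to screen sharing via replaceTrack:
async function shareScreen() {
  const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
  await videoSender.replaceTrack(screen.getVideoTracks()[0]);
  logEvent('switched-to-screenshare');
}
```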
Important Factors from Google's Talk:
Another important factor, covered in the talk by Google earlier on by Daniel Peterson, is call setup time and how important it is when you build applications. You want to be able to measure when the call was started, when the interaction was made, and when the first media frame arrived, so that you can say the call has begun; that is an important aspect.
If the call takes too much time to set up, people might get annoyed or just leave, and you might see a larger drop-off in the usage of your product.
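A minimal sketch of measuring call setup time, assuming callButton and remoteVideo are your own DOM elements, startCall() is your own signaling logic, and pc is your RTCPeerConnection:

```javascript
// Timestamp the user's click, then note when the first remote video
// frame can actually be rendered. All names except the WebRTC and DOM
// APIs are assumed application-side names.
let callStartedAt = 0;

callButton.onclick = () => {
  callStartedAt = performance.now();
  startCall(); // your own signaling / offer-answer logic
};

pc.ontrack = ({ streams }) => {
  remoteVideo.srcObject = streams[0];
  // 'loadeddata' fires once the first frame is available to render.
  remoteVideo.addEventListener('loadeddata', () => {
    console.log('call setup took', (performance.now() - callStartedAt).toFixed(0), 'ms');
  }, { once: true });
};
```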
At the end of the call, you have typically seen with Skype and various other applications that they ask the end user how the quality of the call was. It is really important to ask the right question, because if you do not ask the right question, you might get really strange feedback.
Being able to ask "did you experience specific annoying issues with the video or with the audio?" is better than just asking "please rate the quality of the call", because otherwise you might get answers that are hard to act on, as we have seen from end users more than once.
User experience also depends on the type of service you build. For example, in Hangouts and similar services, when you pass a URL to a person, the video immediately starts rendering; in those cases you want to measure page load times and so on, to make sure the call starts immediately, in a snappy manner.
You want to make sure that all the libraries the call depends on are loaded ahead of time, and if you have any issues with caching or with some JavaScript, you want to be able to measure those at the beginning of the call. Not all calls work that way, though: in other flows, for example on Slack, you press a button to initiate a call, and you want to see what happens from the moment you press that button, so you want to capture all those aspects of the call initialization process.
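One way to measure this, sketched here with the standard Navigation Timing and Resource Timing APIs:

```javascript
// Navigation Timing: how long the page itself took to load.
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  console.log('page load:', Math.round(nav.loadEventEnd), 'ms'); // startTime is 0 for navigations
}

// Resource Timing: whether individual scripts were slow or uncached.
for (const res of performance.getEntriesByType('resource')) {
  if (res.initiatorType === 'script') {
    console.log(res.name, Math.round(res.duration), 'ms');
  }
}
```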
All of this is to make sure you have the data available to measure user experience. As I said, there is a lot you can measure. Typically you can do that at the network level, where you get bits per second and RTT; those are network statistics. The second level of information you want is the multimedia statistics.
You want to know if a frame was rendered, whether there were bursts of frame losses, and whether the audio and the video were played out synchronously, so that you know the pipeline is working correctly. Above all, once you have these network metrics and multimedia pipeline metrics, you want to be able to build models out of them.
We call these annoyances, so that you can say, for example, that the resolution changed by too much: you started at 4K or 1080p, the video resolution dropped to 640p or 480p very quickly, over a very short period of time, and maybe went back up to HD shortly after. Those things are annoying for users, because the quality of the video has changed dramatically over a very short period, and by measuring them you can come up with your own quality metrics depending on the kind of service you are building.
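A small sketch of what such an annoyance model could look like, assuming you already poll getStats once per second and keep the frame-height samples in an array; the window size and any flagging threshold here are illustrative, not from a standard:

```javascript
// One "annoyance" signal: resolution flapping. Counts how often the
// resolution changed within a short trailing window of samples.
function resolutionFlaps(frameHeights, windowSize = 10) {
  const recent = frameHeights.slice(-windowSize);
  let flaps = 0;
  for (let i = 1; i < recent.length; i++) {
    if (recent[i] !== recent[i - 1]) flaps++;
  }
  return flaps;
}

// e.g. [1080, 1080, 480, 480, 1080, ...] -> flag if flaps > 2 in 10 s
```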
There are some services where you want low frame rates, and others where you want high frame rates and perhaps optimal resolutions. You can measure all of this today using the getStats API. The getStats API is an interface on the peer connection, and the response to a getStats call is asynchronous.
We call the API, and a short time later we asynchronously get a callback with the data. We can call it at the peer connection level, or, if we know which stream we are interested in, on a particular audio or video stream.
How to Get Stats?
We can call getStats on that particular stream, and we can call the API as often as we want; typically, if you call it more often than every 150 milliseconds, you will probably get the same response back as a moment ago. By calling the API regularly, you get a series of data points that can show you a trend.
For example: is the bitrate increasing or decreasing? The charts you saw before were produced by calling getStats on the peer connection about every one second. Here is an example where we are interested in the audio latency, or rather the audio track latency: we call getStats on the peer connection, and since this is a promise API, the data becomes available asynchronously.
What Is the then Function?
As soon as the data is available, the then function is called; if it fails, the catch is called instead. So if there was some error in getStats, for example if it is not implemented, the error would be logged in the catch statement, as you can see in this case.
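The pattern looks roughly like this; pc is an assumed RTCPeerConnection and audioTrack an assumed MediaStreamTrack used as the selector:

```javascript
// The promise pattern described above. The selector (a track) narrows
// the report to one stream; without it you get connection-wide stats.
pc.getStats(audioTrack)
  .then(report => {
    report.forEach(stat => console.log(stat.type, stat)); // one entry per stats object
  })
  .catch(err => console.error('getStats failed:', err)); // e.g. not implemented
```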
How to Measure the RTT?
Here I am logging almost all the data out of the outbound statistics for the audio, and the reason is that RTCP's RTT, or round-trip time, is measured at the sender: you send a packet, the other side sends something back, and you measure the time taken for that interaction. So if you want to measure the RTT, you look at the outbound RTP statistics; you can see the output on the right side, where it shows the packets sent, bytes sent, and round-trip time.
In this case the round-trip time is in milliseconds, so it is 31 milliseconds. Another way of looking at metrics is to consider the media flowing between Alice and Bob; again, for simplicity, I am only showing media in one direction. You can gather statistics across the whole pipeline: on the sender side there are audio and video tracks that are fed into an RTP sender, which sends the data over the network through an ICE transport; on the receiver side the data is received on the ICE transport and given to the RTP receiver, which depacketizes it and sends it to the right track. In most cases you can call getStats on all of these objects to get accurate information related to that particular stream.
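A sketch of pulling sender-side statistics for one audio stream; note that in the current spec the round-trip time derived from RTCP receiver reports is exposed on the remote-inbound-rtp entry (in seconds), while older implementations reported it alongside the outbound statistics:

```javascript
// Pulling per-sender stats for the audio stream. `pc` is an assumed
// RTCPeerConnection; sender.getStats() is a standard per-object call.
async function logAudioSenderStats(pc) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === 'audio');
  if (!sender) return;

  const report = await sender.getStats();
  report.forEach(stat => {
    if (stat.type === 'outbound-rtp') {
      console.log('packetsSent:', stat.packetsSent, 'bytesSent:', stat.bytesSent);
    }
    if (stat.type === 'remote-inbound-rtp' && stat.roundTripTime !== undefined) {
      console.log('RTT:', stat.roundTripTime * 1000, 'ms'); // spec reports seconds
    }
  });
}
```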
Let me walk you through a quick example of a call between two people: there is user 0 and user 1, and I am only using the data going from user 0 to user 1, shown by the orange line. We are showing the frame width in this case (the chart title says frame height, apologies for that), and as you can see, at the beginning of the call the frame width is 1080, and it quickly drops to 640,
I believe. It then remains constant for most of the call, and around 1600 to 1800 seconds it switches between the current value and a higher one, then comes back down. So at the track level you can see data related to the tracks, in this case the resolution; at the RTP sender level you can look at the throughput, for example in this case.
We can see that the throughput is not always constant; it varies. At the beginning of the call it goes up to 2,600 kbps, then drops, and there are quieter moments where it drops below 1 megabit per second. Towards 1600 and 1800 seconds, you notice that the bandwidth drops quite a lot very quickly, then goes back up and comes back down. This is similar to the frame width chart, and might be one of the reasons why the frame width was changing in those periods.
It is also interesting that we see drops in the middle, between 600 and 1200 seconds, that do not cause any variation in the frame width; the chart on the previous slide was basically flat there from beginning to end. So one of the things we did was go down to the ICE transport and look at the losses we were seeing on that path, and what we noticed is that there are periods of forty percent losses, which roughly coincide with the drops, at 400, 800, and 1200 seconds.
We noticed these peaks in packet loss, which perhaps indicate why the media throughput dropped in those periods. The other thing we noticed is that around 1200 and 1400 seconds you see spikes again, which, along with the fact that the bandwidth also fluctuated in this period, indicates that the congestion control tried to change the frame width and the resolution to overcome the losses and the lack of capacity.
So these are the things you can do if you have data from across the pipeline: you can go and investigate after the fact. I am going to walk you through another example, of a simplified E-model. This is for G.711; it is a recommendation from the ITU-T, from maybe 20, 25, 30 or even more years ago, and what it basically says is that the one-way delay from the mouth to the ear should be within bounds.
At the highest level, if you want users to be very satisfied, the mouth-to-ear delay should be within about 250 milliseconds, and in the worst case, if it is above 500 milliseconds, it is considered bad, or the user is dissatisfied. This is fairly easy to implement today because we have the getStats API; remember that in the charts at the beginning I talked about the getStats API, and we are just repurposing that data again. You have a selector which is only using the audio tracks.
Why Use the getStats API:
We are looking at the getStats API only for the audio track, and in this case we are polling it at a one-second interval; that is why there is a timeout that fetches the RTT measurements every second, so we can look at the variation over a longer period of time. The next thing we want to do is look at the outbound RTP: as I said before, the round-trip time is only measured at the sender side, so we want to look at the metrics at the sender side, which is the outbound RTP and the local statistics for that audio stream.
Every time we get a round-trip time measurement, we put it into an array so that we can calculate an average over the lifetime of the call. You can either take it from the beginning of the call, or take 10-second samples, 20-second samples, or longer sample windows, depending on how robust you want the metric to be. Once you have calculated the round-trip time average, you divide it by two and pass it to the simplified E-model, which compares it to the graph I showed at the beginning. You can log the result locally, or send it to your logging service, where you can show the variation of call quality for that particular person over a period of time.
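Putting those steps together, a minimal sketch of this flow might look like the following, with the thresholds taken from the satisfaction bands mentioned above and sender assumed to be the audio RTCRtpSender:

```javascript
// Poll the sender-side audio stats every second, average the RTT
// samples, halve the average to approximate the one-way (mouth-to-ear)
// delay, and map it onto the satisfaction bands from the talk.
const rttSamples = [];

async function pollOneWayDelay(sender) {
  const report = await sender.getStats();
  report.forEach(stat => {
    if (stat.type === 'remote-inbound-rtp' && stat.roundTripTime !== undefined) {
      rttSamples.push(stat.roundTripTime * 1000); // seconds -> milliseconds
    }
  });

  if (rttSamples.length > 0) {
    const avgRtt = rttSamples.reduce((a, b) => a + b, 0) / rttSamples.length;
    const oneWayMs = avgRtt / 2;
    if (oneWayMs <= 250) console.log('users very satisfied');
    else if (oneWayMs > 500) console.log('users dissatisfied');
    else console.log('somewhere in between');
  }

  setTimeout(() => pollOneWayDelay(sender), 1000); // one-second interval
}
```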
What Is ICE State?
That was just an example, and you may be wondering what more you can do. It is not just about the getStats API: you can also attach yourself to the ICE states. There is an event called ICE connection state change, and you want to attach to it, because whenever the connection goes to the disconnected state, or if it fails, you want to be aware at the application level that you lost connectivity, or that connectivity was only momentarily lost, because you can build on that.
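A minimal sketch of attaching to the ICE connection state, assuming pc is your RTCPeerConnection:

```javascript
// React to ICE connection state changes so the application knows when
// connectivity is lost, even momentarily.
pc.oniceconnectionstatechange = () => {
  const state = pc.iceConnectionState;
  console.log('ICE state:', state);
  if (state === 'disconnected') {
    // often transient: start a timer and warn the user only if it persists
  } else if (state === 'failed') {
    // connectivity is gone: consider an ICE restart or ending the call
  }
};
```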
What Is the Disruption Metric?
For example, a user may complain that they did not have very good quality or could not hear another person. In our case, what we have done is build a visualization where you have three participants in a call, present for about 16 to 18 minutes, and we have a metric called disruption, which means either they lost connectivity (the network address changed), there was low available capacity, very high RTT, or very high losses. You can see these as the grey periods on the charts.
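A sketch of what such a disruption classifier could look like; the exact thresholds below are assumptions for illustration, as the talk does not give precise values:

```javascript
// Classify one per-interval metrics sample as "disrupted" using the
// definition above. All thresholds are assumed, not from the talk.
function isDisrupted(sample) {
  return (
    sample.connectionLost ||        // e.g. ICE went to 'disconnected'/'failed'
    sample.availableKbps < 100 ||   // low available capacity (assumed threshold)
    sample.rttMs > 1000 ||          // very high RTT (assumed threshold)
    sample.lossFraction > 0.4       // very high losses (assumed threshold)
  );
}
```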
The most important thing to notice in the visualization is that the third user is disrupted for about eight minutes, while the other users are disrupted for about six and four minutes; in this case it is because that user was experiencing very high latencies during that period. So at a high level, if we go and look at our logs, we can easily spot this disruption.
If a user then complains, you are already aware that you are seeing the same behavior the user is complaining about, and as a second step you can go and try to diagnose the reasons. A common occurrence we hypothesized about earlier is that whenever you have low-bandwidth situations or losses and you cannot hear the other person, people try to fix it themselves: at least I go and switch off my video, and sometimes turn off my audio, and then turn them back on to see if the other person can hear me better. In this chart there are three people in a call again, all three disrupted at the same time (using the same definition of disrupted as before), and what we see now is that the users are actually turning their audio or video off and back on.
Each line represents their interactions, and this is another way of seeing how people self-diagnose their issues by turning off video. This is how we do analytics, and it is why you do not only need data from the getStats API; you want to leverage as much information and context as you can gather from your application. If you are interested in more about getStats: I am one of the co-authors, with Harald Alvestrand from Google, of the getStats API specification, which is available at the URL listed there. We recently updated the document, maybe six to eight weeks ago.
We are going to update it again over the next few weeks, before the end of the year, so if you are interested in using the getStats API, you can have a look there. The other thing we learned over the last few years is that our customers needed to deploy and monitor more TURN servers, and some of them needed to go as far as introducing TCP support.