Integrating WebRTC with your app

Thank you so much, Chris. And thank you, Sahid. Thank you both for inviting me to speak. And thanks to all of you for taking a few minutes to listen. I think this talk is going to be a little bit different from the ones we’ve heard so far, but I hope this provides an interesting perspective. As Chris mentioned, I work at Atlassian, and I work on the HipChat product. I’m the product manager there, and I’ve been with that team for about three months. HipChat, if you are not familiar with it, is a chat tool built for your team at work. Another way to think about it is it’s like IRC, except everybody at your work is willing to use it, instead of just the devs.

But it’s fantastic at bringing your whole community into one place. You can have persistent chat rooms for teams, projects, deploys, bugs, anything you could think of. And of course, you can chat one-on-one with anybody on your team. You can share files, images, links, and HipChat records everything and makes it searchable and easy to find later. HipChat runs on pretty much everything. Of course, we started on the web, but we also have native clients for Windows, Mac, and Linux; and you can get it on iOS and Android. So pretty much anywhere you are at any time, you can be in contact with your team using HipChat. We found– and this was certainly the case with us before we discovered HipChat– that most people at work are using some sort of consumer tool to communicate with their colleagues. I found that I did this all the time. So we decided to build a better solution. HipChat is built for business, because your whole team is already there. There are tens of thousands of teams around the world who are using it to collaborate today.

It’s the first thing they sign into in the morning. They leave it open on their desktop all day long. They converse in it with their colleagues until they go home, and even after. And it is the heartbeat of those teams. This was certainly the case for us, but we found that there was a problem. Any time we found ourselves moving the conversation outside of HipChat, we got frustrated. It happened over and over again. So as conversations got longer, someone would inevitably say, “well hey, let’s just talk about this rather than typing for an hour.” Then we proceeded to the negotiation phase. So do you want to use a real phone? Or how about Skype? What about a Hangout over here? We could just go to the conference room and use a VC system? Oh wait, OK, what room are you in? What’s your login on that system? I invited you, have you seen my invitation? I don’t see you logged in, let me restart.

So inevitably, it would take 10 minutes to actually get set up. And by the time you actually connected with the person you wanted to talk to, you’d forgotten what you wanted to talk about in the first place. So that’s why we decided to build HipChat Video. You can now move a conversation from text chat to video chat with a single click inside HipChat. You can do video calls, screen sharing, and audio calls all right within the application. And so far, our customers have loved it. So that’s really what I’m here to talk about: how we built HipChat Video. But here’s the tricky bit: we really didn’t build very much of it, actually. We ended up using a video provider to power all of HipChat Video, and all we did was integrate it into the application. We decided that running a video platform was not our core competency, not where we wanted to spend all of our energy and effort. We wanted to build a great experience, and a great chat tool, and a great collaboration tool, of which video was a feature.

And we weren’t convinced that we could do it better than the many great companies that are starting up right now, devoted to that very problem. We wanted to take advantage of what they had done. So we set out on a journey about a year ago to choose a provider to work with, and to integrate that feature into our application. I’m going to talk a little bit about what we thought about as we chose somebody and what we discovered as we rolled it out, and I’m going to leave you with a few of the lessons we learned. To start with, the most important thing, and one of the daunting things that caused us to want to work with a provider to begin with, was platform support. Because we have to work on all those platforms (native apps, plus all the browsers, including IE and Safari), we knew that vanilla WebRTC was not going to be enough.

So we found somebody who could provide us a library, plus a plug-in for the browsers without native support, and they would abstract away all of that difference. We could program to one API, no matter which browser we were targeting, and it worked great. They use WebRTC everywhere they can, they install a plug-in where necessary, and then I think they have some fallback methods for the very old versions of IE. In the same way, we weren’t just building for the web; we had five other platforms.
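
A rough sketch of the kind of capability check behind that single-API approach might look like this. To be clear, the tier names, version cutoffs, and function here are my own illustration, not the actual provider’s SDK:

```javascript
// Pick a transport tier for a given browser, roughly the way a
// cross-browser video SDK might: prefer native WebRTC, fall back to a
// plug-in where it's missing, and use a last-resort method on very old
// browsers. All names and cutoffs here are illustrative assumptions.
function pickTransport(browser) {
  const { name, version, hasWebRTC } = browser;
  if (hasWebRTC) {
    return 'webrtc';    // Chrome, Firefox, Opera, etc.
  }
  if (name === 'ie' && version >= 8) {
    return 'plugin';    // install a plug-in where WebRTC is missing
  }
  if (name === 'safari') {
    return 'plugin';    // Safari had no WebRTC support at the time
  }
  return 'fallback';    // very old IE: last-resort fallback method
}
```

The point of a wrapper like this is that the calling code never branches on browser again; it just asks for a transport and programs to the one API.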

Luckily, we found a provider whose library supported Mac, Windows, Linux, iOS, and Android, all natively. So we had an SDK for each of those. They abstracted away a lot of pain that we would have had to go through if we were trying to do this entirely on our own. That API is actually the second thing that we thought really long and hard about. We spent a lot of time experimenting with multiple providers to make sure that they had an API that was complete, well documented, well supported, and that supported what we wanted to do.

And it was actually amazing how many providers failed this simple test. Third, the features that we cared about were obviously another major consideration. We’ve started now with one-to-one video chat, but we know that we want to go much further. We want to move to group calls next, and broadcast in the future, so we made sure that we found somebody who was heading in that same direction and provided a lot of those features. We looked at how they handle multi-party, how widely they could scale, and what technology they use behind it, and there have been some great explanations today about the architectures that support that. We also looked at things like how they collected analytics, and whether we could have access to them. Could we actually see how things were performing inside our application? And quality measures as well. Fourth, we cared a lot about user experience.

We were trying to solve a problem where there was a lot of UX friction. And so we wanted to make as smooth and painless an experience as possible, to make it feel like a 100 percent native part of the app. It was interesting, because there were some providers we looked at that had a standalone UX you couldn’t really modify very much. Sometimes there was branding that you couldn’t get rid of. Sometimes it would launch a new process. We found one that we could integrate wholly into the app, and as far as our customers are concerned, there is no difference between HipChat and HipChat Video. So not every provider gives you the latitude you might want to build the user interface that fits best in your application. And lastly, we were looking for a company whose primary business model was providing video as a service to us and others. We were happy to pay for this. We just wanted to make sure that our incentives were aligned with theirs as a company, and that we were going to go down this road together, because you are making a long-term commitment.

And you want the team to be responsive, and fortunately the folks we worked with proved to be very responsive. So how did it go? It worked out fantastically well from our perspective. Once we had selected a provider, we had a prototype up and running in a matter of weeks, and it took several months after that to go from prototype to shipping. But I guarantee you, it would have taken possibly an order of magnitude longer to actually build support for all five platforms, for all the backend, for multi-party support. That was all stuff where we just got to take advantage of someone else’s great work. And when we launched, we supported all those platforms: all the browsers, all the native apps, and all the mobile apps as well. And that was a fantastic thing, because video as a feature in your app is sort of a network-effects kind of thing. It’s more valuable the more people I can call. So if we had limited it and said, well, it works in the Mac app and the Windows app, but not in iOS yet, then you’re automatically kneecapping your feature to begin with.

So launching on all platforms simultaneously was really important to us, and with our partner we were able to do it. We launched HipChat Video three months ago, as I mentioned, and user response has been really, really good. Here are just a few of the reactions that we’ve seen in that first week, and since. People are really valuing the ability to stay within the app, to use video in the context where they already are. We’ve had over 150,000 video calls placed in three months, 2.5 million minutes of conversation. I didn’t really know what to expect when we launched this thing. I didn’t have an idea of the scale. But when I looked this number up, it blew my mind. That is a lot of people talking to each other about a lot of really important things.

The average call is about 15 minutes long, and there are plenty of calls that are an hour or more. So clearly, people are taking advantage of this for meetings at work, which is sort of how we were using it as well, so that makes sense. In HipChat, we are really doubling down on this idea of making great meetings. We think that having a tool like this, especially for distributed teams, can make meetings more efficient, less painful, and just more useful by bringing all your communication channels into one context. So that’s what we’re thinking about. So what did we learn? Or what have we learned so far in the year or so that we spent selecting, then building, then releasing this feature inside our chat application? Well first, we figured out that video is actually really hard in the wild. And we were insulated from some of it because we got to work with somebody, but still there were all sorts of user errors that we weren’t expecting.

I mean, we were all web developers, and all of a sudden you’re now dealing with hardware. It’s not complicated hardware, but it’s still hardware, and it’s not something that I was used to ever having to deal with as a web developer. But you’ve got to worry about cameras, you’ve got to worry about drivers, you’ve got to worry about the microphone, you’ve got to worry about other applications that have stolen focus. And these are things that, no matter what happens, the user is going to blame your application for. So we had to work really hard to try and smooth that experience out. And of course, with video, especially with peer-to-peer video like WebRTC, you’re at the mercy of the network.

And again, the user doesn’t care what the network looks like; they just want the thing to work. And if it doesn’t work, it’s your fault. So that led us to a second lesson, which was that we had to monitor everything. We want a constantly updating sense of how the application is performing out in the field. So we started looking at things like calls attempted, calls completed, and calls dropped, and looking at those ratios. We looked for problem signs, like repeated short calls between the same two people, or lots of drops, or even short calls that were legitimately hung up; if somebody’s only on there for five seconds, it’s probably because they can’t hear or they can’t see or something like that. We monitored network quality and audio quality and video quality during the call, to see how they change over time and with load.
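
The kind of ratio-and-pattern checks described above could be sketched roughly like this. The record shape, field names, and thresholds are my own assumptions for illustration, not HipChat’s actual telemetry:

```javascript
// Summarize call telemetry into the health signals described above:
// completion and drop ratios, plus repeated short calls between the
// same pair of users, which often mean "I can't hear you, let me
// redial". Thresholds (30s "short", 3 repeats) are assumptions.
function callHealth(calls, shortSecs = 30, repeatThreshold = 3) {
  const completed = calls.filter(c => c.completed).length;
  const dropped = calls.length - completed;

  // Count short calls per unordered caller/callee pair.
  const shortByPair = {};
  for (const c of calls) {
    if (c.durationSecs < shortSecs) {
      const pair = [c.caller, c.callee].sort().join('|');
      shortByPair[pair] = (shortByPair[pair] || 0) + 1;
    }
  }
  const suspectPairs = Object.keys(shortByPair)
    .filter(p => shortByPair[p] >= repeatThreshold);

  return {
    attempted: calls.length,
    completionRate: calls.length ? completed / calls.length : 0,
    dropRate: calls.length ? dropped / calls.length : 0,
    suspectPairs, // pairs who keep making short calls: likely A/V trouble
  };
}
```

A dashboard watching these numbers over time is what turns “the user will blame you anyway” into something you can actually act on.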

And then lastly, we also made sure that we asked the users how things were going. So this is a pretty standard UI, but as I’m sure you’ve seen with many other tools, when the call ends we say, “how was it? How was the quality? Please give us your one to five rating. If there’s a problem, you can leave some notes.” And we track and record that over time as well to make sure that we’re staying at a high level of quality.
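
Tracking those one-to-five ratings over time might be sketched like this. The window size and the 4.0 quality floor are illustrative numbers I’ve made up, not HipChat’s actual alerting rules:

```javascript
// Keep a running view of post-call quality ratings (1-5) and flag when
// the recent average dips below a floor. Window size, floor, and the
// minimum sample count are all illustrative assumptions.
function qualityTracker(windowSize = 100, floor = 4.0) {
  const recent = [];
  return {
    record(rating) {
      recent.push(rating);
      if (recent.length > windowSize) recent.shift(); // drop oldest
    },
    average() {
      return recent.reduce((a, b) => a + b, 0) / (recent.length || 1);
    },
    alert() {
      // Only alert once there's enough data to be meaningful.
      return recent.length >= 10 && this.average() < floor;
    },
  };
}
```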

And fortunately, things have been pretty good so far. But thirdly, the first two lessons led to this third one, which is that it’s really important to help users help themselves. Because the user is going to blame your tool no matter what goes wrong, it’s best to give them as many ways out of a problem as possible. So we built things into the application like a Test Sound button to make sure that the speakers are turned on, or the headset is plugged in and working correctly. There are indicators for your mic, and a camera preview so you can actually see what the camera is seeing, to make sure it’s actually working the way you think it is.
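
One way that kind of self-diagnosis could be backed in a browser client is to turn the list returned by `navigator.mediaDevices.enumerateDevices()` into user-facing warnings. The `kind` values below come from the MediaDevices spec; the function and the message text are my own sketch:

```javascript
// Turn a list of media devices (the shape returned by
// navigator.mediaDevices.enumerateDevices()) into user-facing warnings,
// the kind of self-diagnosis hints described above. The 'audioinput',
// 'videoinput', and 'audiooutput' kinds are standard; everything else
// here is an illustrative assumption.
function deviceWarnings(devices) {
  const warnings = [];
  if (!devices.some(d => d.kind === 'audioinput')) {
    warnings.push('No microphone detected — is one plugged in?');
  }
  if (!devices.some(d => d.kind === 'videoinput')) {
    warnings.push('No camera detected — audio-only calls will still work.');
  }
  if (!devices.some(d => d.kind === 'audiooutput')) {
    warnings.push('No speakers or headset detected — try the Test Sound button.');
  }
  return warnings;
}
```

Surfacing these before the call starts is much cheaper than fielding a support ticket afterwards.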

And then lastly, we made sure that there was really detailed logging that a user can turn on and access, and give to us if necessary. And that’s helped our support folks a whole lot. So I’ll leave you with three key takeaways that we pulled out of this process. The first is the choice of building versus buying. What we asked ourselves was: was video the core of our application? Was it the most important thing we were going to build? And we decided that it was a feature, but it was not the core.

We were also not in a position to invest in doing it differently, or better, than the whole ecosystem of you guys out there who were already building these services. So we found a great partner to work with. Second, before you choose, look hard at the SDK and the API before you make a commitment. You’re going to be using that API for a long time, so make sure it’s something you want to program against.

And lastly, remember that the real world is messy. So make it as easy as possible for your users to self-diagnose, to give you information, and to help you help them. So that’s how we built HipChat Video. I really appreciate you guys listening, and I hope you’ll go check it out. Thank you. [APPLAUSE]