Capturing Audio & Video in HTML5

Introduction

Audio/Video capture has been the "Holy Grail" of web development for a long time. For many years we had to rely on browser plugins (Flash or Silverlight) to get the job done. Come on!

HTML5 to the rescue. It might not be apparent, but the rise of HTML5 has brought a surge of access to device hardware. Geolocation (GPS), the Orientation API (accelerometer), WebGL (GPU), and the Web Audio API (audio hardware) are perfect examples. These features are ridiculously powerful, exposing high-level JavaScript APIs that sit on top of the system's underlying hardware capabilities.

This tutorial introduces navigator.mediaDevices.getUserMedia(), which allows web apps to access a user's camera and microphone.

The road to getUserMedia()

If you're not aware of its history, the way we arrived at the getUserMedia() API is an interesting tale.

Several variants of "Media Capture APIs" have evolved over the past few years. Many folks recognized the need to be able to access native devices on the web, but that led everyone and their mom to put together a new spec. Things got so messy that the W3C finally decided to form a working group. Their sole purpose? Make sense of the madness! The Device APIs Policy (DAP) Working Group has been tasked to consolidate + standardize the plethora of proposals.

I'll try to summarize what happened in 2011...

Round 1: HTML Media Capture

HTML Media Capture was the DAP's first go at standardizing media capture on the web. It works by overloading <input type="file"> and adding new values for the accept attribute.

If you wanted to let users take a snapshot of themselves with the webcam, that's possible with capture=camera:

<input type="file" accept="image/*;capture=camera">

Recording video or audio is similar:

<input type="file" accept="video/*;capture=camcorder">
<input type="file" accept="audio/*;capture=microphone">

Kinda nice, right? I particularly like that it reuses a file input. Semantically, it makes a lot of sense. Where this particular "API" falls short is the ability to do realtime effects (e.g. render live webcam data to a <canvas> and apply WebGL filters). HTML Media Capture only allows you to record a media file or take a snapshot in time.
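
Even though this API can't do live effects, working with the captured file is straightforward. Here's a minimal sketch (the preview logic is my own illustration, not part of the spec) that displays the snapshot once the user takes it:

<input type="file" accept="image/*;capture=camera">
<script>
// Preview the captured snapshot. Assumes the input above is the
// only file input on the page.
const input = document.querySelector('input[type="file"]');
input.addEventListener('change', (e) => {
  const file = e.target.files[0];
  if (!file) return;
  const img = document.createElement('img');
  img.src = URL.createObjectURL(file);
  document.body.appendChild(img);
});
</script>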

Support:

  • Android 3.0 browser - one of the first implementations. Check out this video to see it in action.
  • Chrome for Android (0.16)
  • Firefox Mobile 10.0
  • iOS 6 Safari and Chrome (partial support)

Round 2: device element

Many thought HTML Media Capture was too limiting, so a new spec emerged that supported any type of (future) device. Not surprisingly, the design called for a new element, the <device> element, which became the predecessor to getUserMedia().

Opera was among the first browsers to create initial implementations of video capture based on the <device> element. Soon after (the same day, to be precise), the WHATWG decided to scrap the <device> tag in favor of another up-and-comer, this time a JavaScript API called navigator.getUserMedia(). A week later, Opera put out new builds that included support for the updated getUserMedia() spec. Later that year, Microsoft joined the party by releasing a Lab for IE9 supporting the new spec.

Here's what <device> would have looked like:

<device type="media" onchange="update(this.data)"></device>
<video autoplay></video>
<script>
  function update(stream) {
    document.querySelector('video').src = stream.url;
  }
</script>

Support:

Unfortunately, no released browser ever included <device>. One less API to worry about I guess :) <device> did have two great things going for it though: 1.) it was semantic, and 2.) it was easily extensible to support more than just audio/video devices.

Take a breath. This stuff moves fast!

Round 3: WebRTC

The <device> element eventually went the way of the Dodo.

The search for a suitable capture API accelerated thanks to the larger WebRTC (Web Real-Time Communications) effort. That spec is overseen by the W3C WebRTC Working Group. Google, Opera, Mozilla, and a few others have implementations.

getUserMedia() is related to WebRTC because it's the gateway into that set of APIs. It provides the means to access the user's local camera/microphone stream.

Support:

getUserMedia() has been available since Chrome 21, Opera 18, and Firefox 17. Support was initially provided by the Navigator.getUserMedia() method, but this has been deprecated.

You should now use the navigator.mediaDevices.getUserMedia() method, which is widely supported.

Getting started

With getUserMedia(), we can finally tap into webcam and microphone input without a plugin. Camera access is now a call away, not an install away. It's baked directly into the browser. Excited yet?

Feature detection

Feature detection is a simple check for the existence of navigator.mediaDevices.getUserMedia:

function hasGetUserMedia() {
  return !!(navigator.mediaDevices &&
    navigator.mediaDevices.getUserMedia);
}

if (hasGetUserMedia()) {
  // Good to go!
} else {
  alert('getUserMedia() is not supported by your browser');
}

Gaining access to an input device

To use the webcam or microphone, we need to request permission. The parameter to getUserMedia() is an object specifying the details and requirements for each type of media you want to access. For example, if you want to access the webcam, the parameter should be {video: true}. To use both the microphone and camera, pass {video: true, audio: true}:

<video autoplay></video>

<script>
const constraints = {
  video: true
};

const video = document.querySelector('video');

navigator.mediaDevices.getUserMedia(constraints).
  then((stream) => {video.srcObject = stream});
</script>

OK. So what's going on here? Media capture is a perfect example of HTML5 APIs working together. It works in conjunction with our other HTML5 buddies, <audio> and <video>. Notice that we're not setting a src attribute or including <source> elements on the <video> element. Instead of feeding the video the URL of a media file, we're giving it a MediaStream from the webcam.

I'm also telling the <video> to autoplay; otherwise, it would be frozen on the first frame. Adding controls also works as you'd expect.
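
For example, to get the browser's built-in playback UI on top of the live stream:

<video autoplay controls></video>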

Setting media constraints (resolution, height, width)

The parameter to getUserMedia() can also be used to specify more requirements (or constraints) on the returned media stream. For example, instead of just indicating you want basic access to video (e.g. {video: true}), you can additionally require the stream to be HD:

const hdConstraints = {
  video: {width: {min: 1280}, height: {min: 720}}
};

navigator.mediaDevices.getUserMedia(hdConstraints).
  then((stream) => {video.srcObject = stream});

...

const vgaConstraints = {
  video: {width: {exact: 640}, height: {exact: 480}}
};

navigator.mediaDevices.getUserMedia(vgaConstraints).
  then((stream) => {video.srcObject = stream});

If the resolution isn't supported by the currently selected camera, getUserMedia() will be rejected with an OverconstrainedError and the user will not be prompted to grant permission to access their camera.
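
You can detect this case in the promise's rejection handler. A minimal sketch, reusing the hdConstraints and video variables from above:

navigator.mediaDevices.getUserMedia(hdConstraints).
  then((stream) => {video.srcObject = stream}).
  catch((error) => {
    if (error.name === 'OverconstrainedError') {
      console.error('The camera cannot satisfy the requested constraints.');
    } else {
      console.error('Error: ', error);
    }
  });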

For more configurations, see the constraints API.

Selecting a media source

The navigator.mediaDevices.enumerateDevices() method provides information about available input and output devices, and makes it possible to select a camera or microphone. (The MediaStreamTrack.getSources() API has been deprecated.)

This example enables the user to choose an audio and video source:

const videoElement = document.querySelector('video');
const audioSelect = document.querySelector('select#audioSource');
const videoSelect = document.querySelector('select#videoSource');

navigator.mediaDevices.enumerateDevices()
  .then(gotDevices).then(getStream).catch(handleError);

audioSelect.onchange = getStream;
videoSelect.onchange = getStream;

function gotDevices(deviceInfos) {
  for (let i = 0; i !== deviceInfos.length; ++i) {
    const deviceInfo = deviceInfos[i];
    const option = document.createElement('option');
    option.value = deviceInfo.deviceId;
    if (deviceInfo.kind === 'audioinput') {
      option.text = deviceInfo.label ||
        'microphone ' + (audioSelect.length + 1);
      audioSelect.appendChild(option);
    } else if (deviceInfo.kind === 'videoinput') {
      option.text = deviceInfo.label || 'camera ' +
        (videoSelect.length + 1);
      videoSelect.appendChild(option);
    } else {
      console.log('Found another kind of device: ', deviceInfo);
    }
  }
}

function getStream() {
  if (window.stream) {
    window.stream.getTracks().forEach(function(track) {
      track.stop();
    });
  }

  const constraints = {
    audio: {
      deviceId: {exact: audioSelect.value}
    },
    video: {
      deviceId: {exact: videoSelect.value}
    }
  };

  navigator.mediaDevices.getUserMedia(constraints).
    then(gotStream).catch(handleError);
}

function gotStream(stream) {
  window.stream = stream; // make stream available to console
  videoElement.srcObject = stream;
}

function handleError(error) {
  console.error('Error: ', error);
}

Check out Sam Dutton's great demo of how to let users select the media source.

Security

getUserMedia() can only be called from an HTTPS URL, localhost, or a file:// URL. Otherwise, the promise from the call will be rejected. getUserMedia() also doesn't work for cross-origin calls from iframes: see Deprecating Permissions in Cross-Origin Iframes for more detail.
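
One quick way to check whether the page is running in a context where getUserMedia() can succeed is window.isSecureContext; a minimal sketch:

if (!window.isSecureContext) {
  console.warn('getUserMedia() requires a secure context (HTTPS or localhost).');
}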

All browsers will throw up an infobar upon calling getUserMedia(), which gives users the option to grant or deny access to their camera or microphone. Here is Chrome's permission dialog:

[Image: Permission dialog in Chrome]

This permission will be persistent. That is, users won't have to grant/deny access every time. If users change their mind later, they can update their camera access options per origin from the browser settings.
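
If you want to inspect the current state programmatically, the Permissions API can help. A minimal sketch (note that the 'camera' permission name isn't recognized by every browser):

navigator.permissions.query({name: 'camera'})
  .then((status) => {
    // status.state is 'granted', 'denied', or 'prompt'.
    console.log('Camera permission: ' + status.state);
  })
  .catch(() => {
    // This browser doesn't support the 'camera' permission name.
  });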

An active MediaStreamTrack keeps the camera open (and the camera light on), which takes resources. When you are no longer using a track, make sure to call track.stop() so that the camera can be closed.
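
For example, assuming stream is the MediaStream returned by getUserMedia(), you can release every device at once:

// Stop all tracks so the browser can release the camera and microphone.
stream.getTracks().forEach((track) => track.stop());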

Taking screenshots

The <canvas> API's ctx.drawImage(video, 0, 0) method makes it trivial to draw <video> frames to <canvas>. Of course, now that we have video input via getUserMedia(), it's just as easy to create a photo booth application with realtime video:

<div id="screenshot">
  <video autoplay></video>
  <button class="capture-button">Capture video</button>
  <button id="screenshot-button" disabled>Take screenshot</button>
  <img src="">
</div>

<script>
const constraints = {
  video: true
};

const captureVideoButton =
  document.querySelector('#screenshot .capture-button');
const screenshotButton = document.querySelector('#screenshot-button');
const img = document.querySelector('#screenshot img');
const video = document.querySelector('#screenshot video');

// An offscreen canvas for grabbing frames from the video.
const canvas = document.createElement('canvas');

captureVideoButton.onclick = function() {
  navigator.mediaDevices.getUserMedia(constraints).
    then(handleSuccess).catch(handleError);
};

screenshotButton.onclick = video.onclick = function() {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  // Other browsers will fall back to image/png
  img.src = canvas.toDataURL('image/webp');
};

function handleSuccess(stream) {
  screenshotButton.disabled = false;
  video.srcObject = stream;
}

function handleError(error) {
  console.error('Error: ', error);
}
</script>

Applying Effects

CSS Filters

Using CSS Filters, we can apply some gnarly effects to the <video> as it is captured:


<div id="cssfilters">
  <video autoplay></video>
  <p><button class="capture-button">Capture video</button></p>
  <p><button id="cssfilters-apply">Apply CSS filter</button></p>
</div>

<script>
const constraints = {
  video: true
};

const captureVideoButton =
  document.querySelector('#cssfilters .capture-button');
const cssFiltersButton =
  document.querySelector('#cssfilters-apply');
const video =
  document.querySelector('#cssfilters video');

let filterIndex = 0;
// Each entry is a CSS class name; the matching rules live in the
// page's stylesheet (see the sketch below).
const filters = [
  'grayscale',
  'sepia',
  'blur',
  'brightness',
  'contrast',
  'hue-rotate',
  'hue-rotate2',
  'hue-rotate3',
  'saturate',
  'invert',
  ''
];

captureVideoButton.onclick = function() {
  navigator.mediaDevices.getUserMedia(constraints).
    then(handleSuccess).catch(handleError);
};

cssFiltersButton.onclick = video.onclick = function() {
  video.className = filters[filterIndex++ % filters.length];
};

function handleSuccess(stream) {
  video.srcObject = stream;
}

function handleError(error) {
  console.error('Error: ', error);
}
</script>
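
The script cycles through CSS class names, so the page needs matching rules in its stylesheet. Here's one way those classes might be defined (the exact filter values are illustrative guesses, not the original stylesheet):

<style>
video.grayscale { filter: grayscale(1); }
video.sepia { filter: sepia(1); }
video.blur { filter: blur(3px); }
video.brightness { filter: brightness(5); }
video.contrast { filter: contrast(4); }
video.hue-rotate { filter: hue-rotate(90deg); }
video.hue-rotate2 { filter: hue-rotate(180deg); }
video.hue-rotate3 { filter: hue-rotate(270deg); }
video.saturate { filter: saturate(5); }
video.invert { filter: invert(1); }
</style>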

WebGL Textures

One amazing use case for video capture is to render live input as a WebGL texture. Since I know absolutely nothing about WebGL (other than it's sweet), I'm going to suggest you give Jerome Etienne's tutorial and demo a look. It talks about how to use getUserMedia() and Three.js to render live video into WebGL.

Using getUserMedia with the Web Audio API

One of my dreams is to build AutoTune in the browser with nothing more than open web technology!

Chrome supports live microphone input from getUserMedia() to the Web Audio API for real-time effects. Piping microphone input to the Web Audio API looks like this:

window.AudioContext = window.AudioContext ||
                      window.webkitAudioContext;

const context = new AudioContext();

navigator.mediaDevices.getUserMedia({audio: true}).
  then((stream) => {
    const microphone = context.createMediaStreamSource(stream);
    const filter = context.createBiquadFilter();
    // microphone -> filter -> destination
    microphone.connect(filter);
    filter.connect(context.destination);
});
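
One caveat: modern autoplay policies usually start an AudioContext in the suspended state until a user gesture occurs. A minimal sketch that resumes it on a click (the button is hypothetical, not part of the original demo):

document.querySelector('button').addEventListener('click', () => {
  if (context.state === 'suspended') {
    context.resume();
  }
});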

For more information, see Chris Wilson's post.

Conclusion

Historically, device access on the web has been a tough nut to crack. Many tried, few succeeded. Most of the early ideas never gained widespread adoption or took hold outside of a proprietary environment. Perhaps the main problem has been that the web's security model is very different from the native world. In particular, you probably don't want every Joe Shmoe website to have random access to your video camera or microphone. It's been difficult to get right.

Since then, driven by the increasingly ubiquitous capabilities of mobile devices, the web has begun to provide much richer functionality. We now have APIs to take photos and control camera settings, record audio and video, and access other types of sensor data such as location, motion, and device orientation. The Generic Sensor API ties all this together, alongside generic APIs that enable web applications to access USB and interact with Bluetooth devices.

getUserMedia() was but the first wave of hardware interactivity.
