Capture Audio and Video in HTML5

HTML5 Rocks

Introduction

The ability to capture audio and video has been the Holy Grail of web development for a long time. For many years, you had to rely on browser plugins (Flash or Silverlight) to get the job done. Come on!

HTML5 to the rescue. It might not be apparent, but the rise of HTML5 has brought a surge of access to device hardware. Geolocation (GPS), the Orientation API (accelerometer), WebGL (GPU), and the Web Audio API (audio hardware) are perfect examples. These features are ridiculously powerful and expose high-level JavaScript APIs that sit on top of the system's foundational hardware capabilities.

This tutorial introduces navigator.mediaDevices.getUserMedia(), which allows web apps to access a user's camera and microphone.

The road to the getUserMedia() API

If you're not aware of its history, the road to the getUserMedia() API is an interesting tale.

Several variants of media-capture APIs evolved over the past few years. Many people recognized the need to access native devices on the web, which led to a flood of competing spec proposals. Things got so messy that the W3C finally decided to form a working group. Their sole purpose? Make sense of the madness! The Devices and Sensors Working Group was tasked with consolidating and standardizing the plethora of proposals.

Here's a summary of what happened in 2011.

Round 1: HTML Media Capture

HTML Media Capture was the group's first go at standardizing media capture on the web. It overloads the <input type="file"> element and adds new values for the accept attribute.

If you want to let users take a snapshot of themselves with the webcam, that's possible with capture=camera:

<input type="file" accept="image/*;capture=camera">

Pretty nice, right? Semantically, it makes a lot of sense. Where this particular API falls short is the ability to do real-time effects, such as render live webcam data to a <canvas> and apply WebGL filters. HTML Media Capture only allows you to record a media file or take a snapshot in time.

Support

  • Android 3.0 browser—one of the first implementations. Check out this video to see it in action.
  • Google Chrome for Android (0.16)
  • Firefox Mobile 10.0
  • iOS 6 Safari and Chrome (partial support)

Round 2: Device element

Many thought HTML Media Capture was too limited, so a new spec emerged that supported any type of (future) device. Not surprisingly, the design called for a new element, the <device> element, which became the predecessor to getUserMedia().

Opera was among the first browsers to create initial implementations of video capture based on the <device> element. Soon after (the same day, to be precise), the WHATWG decided to scrap the <device> tag in favor of another up-and-comer, this time a JavaScript API called navigator.getUserMedia(). A week later, Opera put out new builds that included support for the updated getUserMedia() spec. Later that year, Microsoft joined the party by releasing a Lab for IE9 supporting the new spec.

Here's what <device> would have looked like:

<device type="media" onchange="update(this.data)"></device>
<video autoplay></video>
<script>
  function update(stream) {
    document.querySelector('video').src = stream.url;
  }
</script>

Support:

Unfortunately, no released browser ever included <device>. One less API to worry about. <device> did have two great things going for it, though:

  • It was semantic.
  • It was easily extensible to support more than audio and video devices.

Take a breath. This stuff moves fast!

Round 3: WebRTC

The <device> element eventually went the way of the dodo.

The pace to find a suitable capture API accelerated thanks to the larger WebRTC (web real-time communication) effort. That spec is overseen by the Web Real-Time Communications Working Group. Google, Opera, Mozilla, and a few others have implementations.

getUserMedia() is related to WebRTC because it's the gateway into that set of APIs. It provides the means to access the user's local camera and microphone stream.

Support:

getUserMedia() has been available since Chrome 21, Opera 18, and Firefox 17. Support was initially provided by the Navigator.getUserMedia() method, but this has been deprecated.

You should now use the navigator.mediaDevices.getUserMedia() method, which is widely supported.
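If you still need to reach browsers that only expose the older callback-based variants, a small promise-returning wrapper can paper over the difference. This is a minimal sketch (the function name getUserMediaCompat and the nav parameter are ours; a library like adapter.js handles many more edge cases):

```javascript
// Minimal compatibility sketch: prefer the modern promise-based API and
// fall back to the deprecated callback-based variants if present.
// The nav parameter exists so the function can be exercised with a stub;
// in the browser you'd simply omit it.
function getUserMediaCompat(constraints, nav = navigator) {
  if (nav.mediaDevices && nav.mediaDevices.getUserMedia) {
    return nav.mediaDevices.getUserMedia(constraints);
  }
  const legacy =
    nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (!legacy) {
    return Promise.reject(new Error("getUserMedia is not supported"));
  }
  // Wrap the old success/error callback signature in a promise.
  return new Promise((resolve, reject) =>
    legacy.call(nav, constraints, resolve, reject)
  );
}
```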

Get started

With getUserMedia(), you can finally tap into webcam and microphone input without a plugin. Camera access is now a call away, not an install away. It's baked directly into the browser. Excited yet?

Feature detection

Feature detection is a simple check for the existence of navigator.mediaDevices.getUserMedia:

function hasGetUserMedia() {
  return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
if (hasGetUserMedia()) {
  // Good to go!
} else {
  alert("getUserMedia() is not supported by your browser");
}

Gain access to an input device

To use the webcam or microphone, you need to request permission. The parameter to getUserMedia() is an object specifying the details and requirements for each type of media you want to access. For example, if you want to access the webcam, the parameter should be {video: true}. To use both the microphone and camera, pass {video: true, audio: true}:

<video autoplay></video>

<script>
const constraints = {
  video: true,
};

const video = document.querySelector("video");

navigator.mediaDevices
  .getUserMedia(constraints)
  .then((stream) => {
    video.srcObject = stream;
  })
  .catch((error) => {
    console.error("Error accessing media devices.", error);
  });
</script>

Okay. So what's going on here? Media capture is a perfect example of HTML5 APIs working together. It works in conjunction with your other HTML5 buddies, <audio> and <video>. Notice that you don't set a src attribute or include <source> elements on the <video> element. Instead of the URL of a media file, you give the video a MediaStream from the webcam.

You also tell the <video> to autoplay, otherwise it would be frozen on the first frame. The addition of controls also works as you'd expect.

Set media constraints (resolution, height, and width)

The parameter to getUserMedia() can also be used to specify more requirements (or constraints) on the returned media stream. For example, instead of only basic access to video (for example, {video: true}), you can additionally require the stream to be HD:

const hdConstraints = {
  video: { width: { min: 1280 }, height: { min: 720 } },
};

navigator.mediaDevices.getUserMedia(hdConstraints).then((stream) => {
  video.srcObject = stream;
});

Or, to require an exact VGA resolution instead:

const vgaConstraints = {
  video: { width: { exact: 640 }, height: { exact: 480 } },
};

navigator.mediaDevices.getUserMedia(vgaConstraints).then((stream) => {
  video.srcObject = stream;
});

If the resolution isn't supported by the currently selected camera, getUserMedia() is rejected with an OverconstrainedError and the user isn't prompted to grant permission to access their camera.
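
One way to handle this is to catch the rejection and retry with looser constraints. The helper names below (startVideo, relaxConstraints) and the fallback strategy are illustrative, not part of the API:

```javascript
// Illustrative fallback: request a specific resolution first, and if the
// camera rejects the constraints, retry with plain { video: true }.
function relaxConstraints(constraints) {
  return { ...constraints, video: true };
}

// getUserMedia is passed in as a function so the flow is easy to test;
// in the browser you'd pass a version bound to navigator.mediaDevices.
function startVideo(getUserMedia, constraints) {
  return getUserMedia(constraints).catch((error) => {
    if (error.name === "OverconstrainedError") {
      return getUserMedia(relaxConstraints(constraints));
    }
    throw error; // Some other failure, e.g. the user denied permission.
  });
}
```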

For more configurations, see the Constraints API.
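
As a taste of what else the Constraints API covers, here's an illustrative constraints object. The properties (frameRate, facingMode, aspectRatio, echoCancellation) are real constraint names, but the specific values are arbitrary examples:

```javascript
// Other commonly used constraints; each property follows the same
// plain value / { min, max, ideal, exact } pattern as width and height.
const constraints = {
  video: {
    frameRate: { ideal: 30, max: 60 }, // prefer 30 fps, cap at 60
    facingMode: "user", // front-facing camera on mobile devices
    aspectRatio: { ideal: 16 / 9 },
  },
  audio: {
    echoCancellation: true,
  },
};
```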

Select a media source

The navigator.mediaDevices.enumerateDevices() method provides information about available input and output devices, and makes it possible to select a camera or microphone. (The MediaStreamTrack.getSources() API has been deprecated.)

This example enables the user to choose an audio and video source:

const videoElement = document.querySelector("video");
const audioSelect = document.querySelector("select#audioSource");
const videoSelect = document.querySelector("select#videoSource");

navigator.mediaDevices
  .enumerateDevices()
  .then(gotDevices)
  .then(getStream)
  .catch(handleError);

audioSelect.onchange = getStream;
videoSelect.onchange = getStream;

function gotDevices(deviceInfos) {
  for (let i = 0; i !== deviceInfos.length; ++i) {
    const deviceInfo = deviceInfos[i];
    const option = document.createElement("option");
    option.value = deviceInfo.deviceId;
    if (deviceInfo.kind === "audioinput") {
      option.text =
        deviceInfo.label || "microphone " + (audioSelect.length + 1);
      audioSelect.appendChild(option);
    } else if (deviceInfo.kind === "videoinput") {
      option.text = deviceInfo.label || "camera " + (videoSelect.length + 1);
      videoSelect.appendChild(option);
    } else {
      console.log("Found another kind of device: ", deviceInfo);
    }
  }
}

function getStream() {
  if (window.stream) {
    window.stream.getTracks().forEach(function (track) {
      track.stop();
    });
  }

  const constraints = {
    audio: {
      deviceId: { exact: audioSelect.value },
    },
    video: {
      deviceId: { exact: videoSelect.value },
    },
  };

  navigator.mediaDevices
    .getUserMedia(constraints)
    .then(gotStream)
    .catch(handleError);
}

function gotStream(stream) {
  window.stream = stream; // make stream available to console
  videoElement.srcObject = stream;
}

function handleError(error) {
  console.error("Error: ", error);
}

Check out Sam Dutton's great demo of how to let users select the media source.

Security

getUserMedia() can only be called from an HTTPS URL or localhost. Otherwise, the promise from the call is rejected. getUserMedia() also doesn't work for cross-origin calls from iframes. For more information, see Deprecating Permissions in Cross-Origin Iframes.

All browsers generate an infobar upon the call to getUserMedia(), which gives users the option to grant or deny access to their cameras or microphones. Here's the permission dialog from Chrome:

Permission dialog in Chrome.

This permission is persistent. That is, users don't have to grant or deny access every time. If users change their mind later, they can update their camera access options per origin from the browser settings.

The MediaStreamTrack actively uses the camera, which takes resources and keeps the camera open (and camera light on). When you no longer use a track, call track.stop() so that the camera can be closed.
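
As a sketch, a small helper (the name stopStream is ours) that releases every track on a stream:

```javascript
// Stop every track (video and audio) on a MediaStream so the browser can
// release the hardware and turn off the camera indicator light.
function stopStream(stream) {
  stream.getTracks().forEach((track) => track.stop());
}
```

In the earlier device-selection example, this is exactly what getStream() does before requesting a new stream.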

Take screenshots

The <canvas> API's ctx.drawImage(video, 0, 0) method makes it trivial to draw <video> frames to <canvas>. Of course, now that you have video input through getUserMedia(), it's just as easy to create a photo-booth app with real-time video:

<div id="screenshot">
  <video autoplay></video>
  <button class="capture-button">Capture video</button>
  <button id="screenshot-button" disabled>Take screenshot</button>
  <img src="">
</div>

<script>
const captureVideoButton = document.querySelector(
  "#screenshot .capture-button"
);
const screenshotButton = document.querySelector("#screenshot-button");
const img = document.querySelector("#screenshot img");
const video = document.querySelector("#screenshot video");

const canvas = document.createElement("canvas");
const constraints = {
  video: true,
};

captureVideoButton.onclick = function () {
  navigator.mediaDevices
    .getUserMedia(constraints)
    .then(handleSuccess)
    .catch(handleError);
};

screenshotButton.onclick = video.onclick = function () {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d").drawImage(video, 0, 0);
  // Browsers that don't support WebP fall back to image/png.
  img.src = canvas.toDataURL("image/webp");
};

function handleSuccess(stream) {
  screenshotButton.disabled = false;
  video.srcObject = stream;
}

function handleError(error) {
  console.error("Error: ", error);
}
</script>

Apply effects

CSS Filters

With CSS filters, you can apply some gnarly effects to the <video> as it is captured:

<div id="cssfilters">
  <video autoplay></video>
  <p><button class="capture-button">Capture video</button></p>
  <p><button id="cssfilters-apply">Apply CSS filter</button></p>
</div>

<script>
const captureVideoButton = document.querySelector(
  "#cssfilters .capture-button"
);
const cssFiltersButton = document.querySelector("#cssfilters-apply");
const video = document.querySelector("#cssfilters video");

const constraints = {
  video: true,
};

let filterIndex = 0;
// Each entry is a CSS class name defined in an accompanying stylesheet,
// for example: .grayscale { filter: grayscale(1); }
const filters = [
  "grayscale",
  "sepia",
  "blur",
  "brightness",
  "contrast",
  "hue-rotate",
  "hue-rotate2",
  "hue-rotate3",
  "saturate",
  "invert",
  "",
];

captureVideoButton.onclick = function () {
  navigator.mediaDevices
    .getUserMedia(constraints)
    .then(handleSuccess)
    .catch(handleError);
};

cssFiltersButton.onclick = video.onclick = function () {
  video.className = filters[filterIndex++ % filters.length];
};

function handleSuccess(stream) {
  video.srcObject = stream;
}

function handleError(error) {
  console.error("Error: ", error);
}
</script>

WebGL textures

One amazing use case for video capture is to render live input as a WebGL texture. Give Jerome Etienne's tutorial and demo a look. It talks about how to use getUserMedia() and Three.js to render live video into WebGL.

Use the getUserMedia API with the Web Audio API

Chrome supports live microphone input from getUserMedia() to the Web Audio API for real-time effects. It looks like this:

window.AudioContext = window.AudioContext || window.webkitAudioContext;

const context = new AudioContext();

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const microphone = context.createMediaStreamSource(stream);
  const filter = context.createBiquadFilter();
  // microphone -> filter -> destination
  microphone.connect(filter);
  filter.connect(context.destination);
});

For more information, see Chris Wilson's post.

Conclusion

Historically, device access on the web has been a tough nut to crack. Many tried, few succeeded. Most of the early ideas never gained widespread adoption or took hold outside of a proprietary environment. Perhaps the main problem has been that the web's security model is very different from the native world. In particular, you probably don't want every Joe Shmoe website to have random access to your video camera or microphone. It's been difficult to get right.

Since then, driven by the increasingly ubiquitous capabilities of mobile devices, the web has begun to provide much richer functionality. You now have APIs to take photos and control camera settings, record audio and video, and access other types of sensor data, such as location, motion, and device orientation. The Generic Sensor framework ties all this together, alongside generic APIs to enable web apps to access USB and interact with Bluetooth devices.

getUserMedia() was only the first wave of hardware interactivity.
