Client-Side Storage

HTML5 Rocks

Introduction

This is an overview of client-side storage, a general term for several separate but related APIs: Web Storage, Web SQL Database, Indexed Database, and File Access. Each of these techniques provides a distinct way to store data on the user's hard drive, instead of the server, where data usually resides. There are two main reasons to do this: (a) to make the web app available offline; (b) to improve performance. For a detailed explanation of the use cases for client-side storage, see the HTML5 Rocks article "Offline": What does it mean and why should I care?

The APIs share a similar scope and similar principles, so let's first understand what they have in common before launching into the specifics of each.

Common Features

Storage on the Client Device

In practice, "client-side storage" means data is passed to the browser's storage API, which saves it on the local device in the same area as it stores other user-specific information, e.g. preferences and cache. Beyond saving data, the APIs let you retrieve data, and in some cases, perform searches and batch manipulations.

Sandboxed

All four storage APIs tie data to a single "origin". For example, if http://abc.example.com saves some data, the browser will only permit http://abc.example.com to access that data in the future. The domain must match exactly, so http://example.com and http://def.example.com are both disqualified. The port must match too, so http://abc.example.com:123 cannot see data saved by http://abc.example.com (which defaults to port 80). The protocol (http versus https, etc.) must match as well.
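
The matching rule can be illustrated with the standard URL class, whose origin property combines protocol, host, and port. This is only a sketch: sameOrigin is a helper name made up for illustration, and browsers perform the real check internally.

```javascript
// sameOrigin is a hypothetical helper, not part of any storage API.
// URL.origin serializes protocol + host + port, omitting a default port.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

var base = "http://abc.example.com";
var checks = {
  samePage: sameOrigin(base, "http://abc.example.com/page"),
  otherSubdomain: sameOrigin(base, "http://def.example.com"),
  parentDomain: sameOrigin(base, "http://example.com"),
  otherPort: sameOrigin(base, "http://abc.example.com:123"),
  otherProtocol: sameOrigin(base, "https://abc.example.com")
};
console.log(checks); // only samePage is true
```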

Quotas

You can imagine the chaos if any website were allowed to populate unsuspecting hard drives with gigabytes of data! Thus, browsers impose limits on storage capacity. When your app attempts to exceed that limit, the browser will typically show a dialog to let the user confirm the increase. You might expect the browser to enforce a single limit for all storage an origin can use, but most enforce limits separately for each storage mechanism. This will change as the Quota API is adopted, but for now, you should think of the browser as maintaining a 2-D matrix, with "origin" in one dimension and "storage" in the other. For example, "http://abc.example.com" may be allowed to store up to 5MB of Web Storage and 25MB of Web SQL Database storage, and be forbidden from using Indexed Database because the user denied access. The Quota API brings this into a central location and lets you query how much space is available and in use.

There are also environments where the user can see upfront how much storage will be used, e.g. in the case of the Chrome Web Store, when a user installs an app, they will be prompted upfront to accept its permissions, which include storage limits. One possible value in the manifest is "unlimited_storage".
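
Whatever the limit is, code that writes data should expect a write to fail. For Web Storage, setItem() throws when the write cannot be stored, typically with a QuotaExceededError. The following is a minimal sketch of guarding against that; trySave is a hypothetical helper name, and the storage argument is assumed to be localStorage or an object with the same setItem() method.

```javascript
// trySave is a hypothetical helper. `storage` is any object with a
// Web Storage style setItem(), e.g. window.localStorage in a browser.
function trySave(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (err) {
    // Browsers throw (typically a QuotaExceededError) when the write
    // would take the origin past its limit.
    return false;
  }
}
```

A caller might use the return value to warn the user or to evict old data, e.g. if (!trySave(localStorage, "checkins", json)) { ... }.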

Transactions

The two "database" storage formats support transactions. The aim is the same as in regular relational databases: to ensure the integrity of the database. Transactions prevent "race conditions", a phenomenon where two sequences of operations are applied to the database at the same time, leading to unpredictable results and a database whose state is of dubious accuracy.
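
To make the race condition concrete, here is a small simulation, not real database code. Two "sequences" each read a counter and then write back the value plus one. The plain object standing in for a database and all function names are assumptions made for illustration.

```javascript
// Each sequence reads the counter first and writes counter + 1 later.
function makeIncrement(db) {
  var seen;
  return [
    function read() { seen = db.count; },
    function write() { db.count = seen + 1; }
  ];
}

// No transaction: the steps of the two sequences interleave, so both
// read 0 before either writes, and one increment is lost.
function runInterleaved(db) {
  var a = makeIncrement(db), b = makeIncrement(db);
  a[0](); b[0](); a[1](); b[1]();
  return db.count;
}

// Transaction-like: each sequence runs to completion before the next starts.
function runSerialized(db) {
  [makeIncrement(db), makeIncrement(db)].forEach(function(steps) {
    steps.forEach(function(step) { step(); });
  });
  return db.count;
}

console.log(runInterleaved({ count: 0 })); // 1: one update was lost
console.log(runSerialized({ count: 0 }));  // 2: both updates applied
```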

Synchronous and Asynchronous Modes

Most of the storage formats support both synchronous and asynchronous modes. Synchronous mode is blocking, meaning that the storage operation will be executed to completion before the next line of JavaScript is executed. Asynchronous mode will cause the next lines of JavaScript to be executed before the storage operation completes. The storage operation will be performed in the background, and the application will be notified when it has finished through a callback function, which must be specified when the call is made.
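
The difference in execution order can be sketched without any real storage API. Here a plain object stands in for the store, and setTimeout stands in for work done in the background; readSync and readAsync are hypothetical names.

```javascript
var store = { mood: "happy" };

// Blocking style: the value is available on the very next line.
function readSync(key) {
  return store[key];
}

// Non-blocking style: the value arrives later, through the callback.
function readAsync(key, callback) {
  setTimeout(function() { callback(store[key]); }, 0);
}

var log = [];
log.push("sync result: " + readSync("mood"));
readAsync("mood", function(value) {
  log.push("async result: " + value);
});
log.push("next line runs before the async result arrives");
```

Once the timer has fired, log holds the three entries in the order: sync result, next line, async result.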

Synchronous mode should be avoided at all costs. It may seem like a simpler API, but it blocks rendering on the page while the operation completes, and in some cases freezes the whole browser. You've probably noticed when sites and even apps do this: you click on a button, everything freezes, you wonder whether it has crashed, and then it springs back to life.

Some APIs, such as localStorage, don't have an async mode. You should carefully monitor the performance of your use of these APIs, and be prepared to switch to one of the async APIs if it becomes an issue.

Overview and Comparison of APIs

Web Storage

Web Storage is basically a single persistent object called localStorage. You can set values using localStorage.foo = "bar" and retrieve them later on — even when the browser has been closed and re-opened — as localStorage.foo. There's also a second object called sessionStorage available, which works the same way, but clears when the window is closed.

Web Storage is an example of a NoSQL key-value store.

Strengths of Web Storage
  1. Supported on all modern browsers, as well as on iOS and Android, for several years (IE since version 8).
  2. Simple API signature.
  3. Simple call flow, being a synchronous API.
  4. Semantic events available to keep other tabs/windows in sync.

Weaknesses of Web Storage
  1. Poor performance for large/complex data, when using the synchronous API (which is the most widely supported mode).
  2. Poor performance when searching large/complex data, due to lack of indexing. (Search operations have to manually iterate through all items.)
  3. Poor performance when storing and retrieving large/complex data structures, because it's necessary to manually serialize and de-serialize to/from string values. The major browser implementations only support string values (even though the spec says otherwise).
  4. Need to ensure data consistency and integrity, since data is effectively unstructured.
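
The lack of indexing means a search has to visit every item, using the length property and the key() and getItem() methods that Web Storage objects provide. The sketch below shows the shape of such a loop. createMemoryStorage is a hypothetical in-memory stand-in so that the example runs outside a browser, and findKeys is likewise an illustrative name; in a page you would pass localStorage instead.

```javascript
// Hypothetical in-memory stand-in for localStorage. It mimics length,
// key(), getItem() and setItem(), including coercion of values to strings.
function createMemoryStorage() {
  var keys = [], values = {};
  return {
    get length() { return keys.length; },
    key: function(i) { return i < keys.length ? keys[i] : null; },
    getItem: function(k) {
      return Object.prototype.hasOwnProperty.call(values, k) ? values[k] : null;
    },
    setItem: function(k, v) {
      if (!Object.prototype.hasOwnProperty.call(values, k)) keys.push(String(k));
      values[k] = String(v);
    }
  };
}

// With no index available, a search has to look at every stored item.
function findKeys(storage, predicate) {
  var matches = [];
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    if (predicate(storage.getItem(key), key)) matches.push(key);
  }
  return matches;
}

var storage = createMemoryStorage();
storage.setItem("a", JSON.stringify({ mood: "happy" }));
storage.setItem("b", JSON.stringify({ mood: "sad" }));
storage.setItem("c", JSON.stringify({ mood: "happy" }));
var happyKeys = findKeys(storage, function(value) {
  return JSON.parse(value).mood === "happy";
});
console.log(happyKeys); // ["a", "c"]
```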

Web SQL Database

Web SQL Database is a structured database with all the functionality - and complexity - of a typical SQL-powered relational database. Indexed Database, covered in the next section, sits somewhere between Web Storage and Web SQL Database. It has free-form key-value pairs, like Web Storage, but also the capability to index fields from those values, so searching is much faster.

Strengths of Web SQL Database
  1. Supported on major mobile browsers (Android Browser, Mobile Safari, Opera Mobile) as well as several desktop browsers (Chrome, Safari, Opera).
  2. Good performance generally, being an asynchronous API. Database interaction won't lock up the user interface. (Synchronous API is also available for WebWorkers.)
  3. Good search performance, since data can be indexed according to search keys.
  4. Robust, since it supports a transactional database model.
  5. Easier to maintain integrity of data, due to rigid data structure.

Weaknesses of Web SQL Database
  1. Deprecated. Will not be supported on IE or Firefox, and will probably be phased out from the other browsers at some stage.
  2. Steep learning curve, requiring knowledge of relational databases and SQL.
  3. Suffers from object-relational impedance mismatch.
  4. Diminishes agility, as database schema must be defined upfront, with all records in a table matching the same structure.

Indexed Database (IndexedDB)

So far, we have seen that Web Storage and Web SQL Database both have major strengths as well as major weaknesses. Indexed Database has arisen from experiences with both of those earlier APIs, and can be seen as an attempt to combine their strengths without incurring their weaknesses.

An Indexed Database is a collection of "object stores" which you can just drop objects into. The stores are something like SQL tables, but in this case, there are no constraints on the object structure and so no need to define anything upfront. So this is similar to Web Storage, with the advantage that you can have as many databases as you like, and as many stores within each database. But unlike Web Storage, there are important performance benefits: the API is asynchronous, and you can create indexes on stores to improve search speed.

Strengths of IndexedDB
  1. Good performance generally, being an asynchronous API. Database interaction won't lock up the user interface. (Synchronous API is also available for WebWorkers.)
  2. Good search performance, since data can be indexed according to search keys.
  3. Supports versioning.
  4. Robust, since it supports a transactional database model.
  5. Fairly easy learning curve, due to a simple data model.
  6. Decent browser support: Chrome, Firefox, mobile FF, IE10.

Weaknesses of IndexedDB
  1. Very complex API resulting in large amounts of nested callbacks.

FileSystem

The previous formats are all suitable for text and structured data, but when it comes to large files and binary content, we need something else. Fortunately, we now have a FileSystem API standard. It gives each domain a full hierarchical filesystem, and in Chrome at least, these are real files sitting on the user's hard drive. For reading and writing of individual files, the API builds on the existing File API.

Strengths of FileSystem API
  1. Can store large content and binary files, so it's suitable for images, audio, video, PDFs, etc.
  2. Good performance, being an asynchronous API.

Weaknesses of FileSystem API
  1. Very early standard. Only available in Chrome and Opera.
  2. No transaction support.
  3. No built-in search/indexing support.

Show Me the Code

This section compares how the various APIs tackle the same problem. The example is a "geo-mood" check-in system, where you can track your mood across time and place. The interface lets you switch between database types. Of course, this is slightly contrived as in real world situations, one database type will always make more sense than the rest, and FileSystem API is not suited to this kind of application at all! But for demonstration purposes, it's helpful indeed if we can see the different means we can use to achieve the same end. Note too that some of the code snippets have been refactored for readability.

Try the Geo-Mood demo now.

To make the demo interesting, we'll isolate the data storage aspects using standard object-oriented design techniques. The UI logic will only know there is a "store"; it won't need to know how the store is implemented, because each store has exactly the same methods on it. So the UI code can just call store.setup(), store.count(), and so on. In reality, there are four implementations of the store, one for each storage type. When the app starts up, it inspects the URL and instantiates the right store.

To keep the API consistent, all the methods are asynchronous, i.e. they pass results back to the caller through a handler callback instead of returning them. This is true even for the Web Storage implementation, where the underlying calls are local and return immediately.
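
As a rough sketch of what such a store could look like for Web Storage, the following wraps a storage object behind handler-based methods. It is not the demo's actual code: createLocalStore and save are assumed names, the storage argument is expected to be localStorage or an object with the same getItem() and setItem() methods, and handlers are invoked directly rather than deferred.

```javascript
// Hypothetical store backed by a Web Storage style object. setup() and
// count() follow the method names mentioned above; save() is assumed.
function createLocalStore(storage) {
  function readAll() {
    return JSON.parse(storage.getItem("checkins"));
  }
  return {
    setup: function(handler) {
      if (storage.getItem("checkins") === null) {
        storage.setItem("checkins", JSON.stringify([]));
      }
      handler();
    },
    save: function(checkin, handler) {
      var checkins = readAll();
      checkins.push(checkin);
      storage.setItem("checkins", JSON.stringify(checkins));
      handler();
    },
    count: function(handler) {
      // The result goes to the handler, never to the return value.
      handler(readAll().length);
    }
  };
}
```

Because every implementation reports through the handler, UI code written against this interface does not need to change when a different store is swapped in.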

In the walkthroughs below, we'll skip the UI and geolocation logic to focus on the storage techniques.

Setting up the Store

For localStorage, we do a simple check to see if the store exists. If not, we'll create a new array and store it against the localStorage "checkins" key. We use JSON to convert the structure to a string first, since, in most browsers, localStorage only stores strings.

if (!localStorage.checkins) localStorage.checkins = JSON.stringify([]);

For Web SQL Database, we need to create the database structure if it doesn't exist. Fortunately, openDatabase creates the database automatically if it doesn't exist and, likewise, we use the SQL phrase "if not exists" to ensure the checkins table is not overwritten if it is already present. We have to define the structure of the data upfront, i.e. the name and type of each column in the checkins table. Each row will represent a single checkin.

this.db = openDatabase('geomood', '1.0', 'Geo-Mood Checkins', 8192);
this.db.transaction(function(tx) {
  tx.executeSql("create table if not exists " +
    "checkins(id integer primary key asc, time integer, latitude float," +
              "longitude float, mood string)",
    [],
    function() { console.log("success"); }
  );
});

Indexed Database setup takes some work, because it enforces a database version system. When we make a connection to our database, we specify which version we're expecting. If the current database uses a previous version, or hasn't been created yet, the onupgradeneeded event is fired, and onsuccess is called once the upgrade is complete. If no upgrade is needed, onsuccess is called straight away.

We also create a mood index here, so we will later be able to quickly search for all checkins matching a particular mood.

var db;
var version = 1;

window.indexedStore = {};

window.indexedStore.setup = function(handler) {
  // attempt to open the database
  var request = indexedDB.open("geomood", version);

  // upgrade/create the database if needed
  request.onupgradeneeded = function(event) {
    // local handle on the database being upgraded; the shared `db`
    // variable is assigned in onsuccess below
    var upgradeDb = request.result;
    if (event.oldVersion < 1) {
      // Version 1 is the first version of the database.
      var checkinsStore = upgradeDb.createObjectStore("checkins", { keyPath: "time" });
      checkinsStore.createIndex("moodIndex", "mood", { unique: false });
    }
    if (event.oldVersion < 2) {
      // In future versions we'd add the version 2 upgrade steps here.
      // It is empty for now, because the current version is 1.
    }
  };

  request.onsuccess = function(ev) {
    // assign the database for access outside
    db = request.result;
    handler();
    db.onerror = function(ev) {
      console.log("db error", arguments);
    };
  };
};
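
The chain of oldVersion checks can also be written as a table of upgrade steps. This is a sketch of one way to organise it, not part of the IndexedDB API: applyUpgrades and the upgrades array are hypothetical, while createObjectStore, createIndex, and event.oldVersion are the same calls used in the setup code.

```javascript
// applyUpgrades is a hypothetical helper. upgrades[n] holds the function
// that moves a database from version n to version n + 1.
function applyUpgrades(db, oldVersion, newVersion, upgrades) {
  for (var v = oldVersion; v < newVersion; v++) {
    upgrades[v](db);
  }
}

var upgrades = [
  function toVersion1(db) {
    var store = db.createObjectStore("checkins", { keyPath: "time" });
    store.createIndex("moodIndex", "mood", { unique: false });
  }
];

// Intended use inside the upgrade handler:
// request.onupgradeneeded = function(event) {
//   applyUpgrades(request.result, event.oldVersion, version, upgrades);
// };
```

A brand-new database reports an oldVersion of 0, so every step runs in order; an up-to-date database runs none.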

Finally, FileSystem setup. We'll store each checkin in its own file, JSON-encoded, and all of them inside a "checkins/" directory. Again, this is not the most appropriate use of FileSystem API, but good for demonstration purposes.

The setup gets a handle on the overall FileSystem, using it to check for the "checkins" directory. If it's not there, we create it with getDirectory.

setup: function(handler) {
  requestFileSystem(
    window.PERSISTENT,
    1024*1024,
    function(fs) {
      fs.root.getDirectory(
        "checkins",
        {}, // no "create" option, so this is a read op
        function(dir) {
          checkinsDir = dir;
          handler();
        },
        function() {
          fs.root.getDirectory(
            "checkins",
            {create: true},
            function(dir) {
              checkinsDir = dir;
              handler();
            },
            onError
          );
        }
      );
    },
    function(e) {
      console.log("error " + e.code + " initialising - see http://goo.gl/YW0TI");
    }
  );
}

Saving a Check-in

With localStorage, we simply pull the check-in array out, add a new one to the end, and save it again. We also have to do the JSON dance to store it in string form.

var checkins = JSON.parse(localStorage["checkins"]);
checkins.push(checkin);
localStorage["checkins"] = JSON.stringify(checkins);

With Web SQL Database, we run everything inside a transaction. We're going to create a new row in the checkins table. It's a straightforward SQL call, and instead of including the checkin data in the "insert" command, we use "?" syntax because it's cleaner and more secure. The actual data - the four values we want to store as columns in the new checkins row - is specified in the second argument. The "?" elements will be replaced by those values (checkin.time, checkin.latitude, etc.). The next two arguments are functions that will be called when the operation has completed, one for success and one for failure. In this app, we use the same generic error handler for all transactions. The success function is simply the handler that was passed into the save function, so the UI logic is notified when the operation has completed (e.g. to update the count of checkins so far).

store.db.transaction(function(tx) {
  tx.executeSql(
    "insert into checkins " +
    "(time, latitude, longitude, mood) values (?,?,?,?);",
    [checkin.time, checkin.latitude, checkin.longitude, checkin.mood],
    handler,
    store.onError
  );
});

Once the store is set up, saving in IndexedDB is almost as simple as Web Storage, with the advantage of working asynchronously, in a transaction:

var transaction = db.transaction("checkins", 'readwrite');
transaction.objectStore("checkins").put(checkin);
transaction.oncomplete = handler;

With the FileSystem API, once we create a file and get a handle on it, we can use the FileWriter API to populate it:

fs.root.getFile("checkins/" + checkin.time, {create: true, exclusive: true}, function(file) {
  file.createWriter(function(writer) {
    writer.onerror = fileStore.onError;
    // notify the caller once the write has finished, not when it starts
    writer.onwriteend = function() { handler(); };
    // the Blob constructor replaces the deprecated WebKitBlobBuilder
    var blob = new Blob([JSON.stringify(checkin)], {type: "text/plain"});
    writer.write(blob);
  }, fileStore.onError);
}, fileStore.onError);

Searching for Matches

The next function fishes out all checkins matching a particular mood, so the user can see where and when they were happy recently, for example. With localStorage, we have to manually walk through each checkin and compare it to the mood, building up a list of matches. It's good practice to return clones of the data that's stored, rather than the actual objects, since searching should be a read-only operation; hence we pass each matching checkin object through a generic clone() operation.

var allCheckins = JSON.parse(localStorage["checkins"]);
var matchingCheckins = [];
allCheckins.forEach(function(checkin) {
  if (checkin.mood == moodQuery) {
    matchingCheckins.push(clone(checkin));
  }
});
handler(matchingCheckins);
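
The clone() function itself is not shown in these snippets. One possible implementation, offered as an assumption rather than as the demo's own code, is a JSON round trip, which is enough for plain checkin records:

```javascript
// One possible clone(); the demo's own implementation is not shown here.
// The JSON round trip copies plain data (strings, numbers, arrays, nested
// objects) but drops functions and undefined, and turns Dates into strings.
function clone(obj) {
  return JSON.parse(JSON.stringify(obj));
}

var original = { time: 1, latitude: 51.5, longitude: -0.1, mood: "happy" };
var copy = clone(original);
copy.mood = "sad";
console.log(original.mood); // still "happy"
```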

With Web SQL Database, we perform a query that returns only the checkin rows that we need. However, we still have to manually walk through that list to accumulate the checkin structures, as the database API returns database rows, rather than an array. (This is a good thing for large result sets, but right now, it adds some work for us to do!)

var matchingCheckins = [];
store.db.transaction(function(tx) {
  tx.executeSql(
    "select * from checkins where mood=?",
    [moodQuery],
    function(tx, results) {
      for (var i = 0; i < results.rows.length; i++) {
        matchingCheckins.push(clone(results.rows.item(i)));
      }
      handler(matchingCheckins);
    },
    store.onError
  );
});

Naturally enough, the IndexedDB solution uses an index: the index on "mood" we created earlier, called "moodIndex". We use a cursor to iterate through each checkin matching the query. Note that this cursor pattern can also be used against an entire store; so, with indexes, it's like we get a window into the store where we can only see matching objects (similar to a "view" in traditional databases).

var store = db.transaction("checkins", 'readonly').objectStore("checkins");
var request = moodQuery ?
  store.index("moodIndex").openCursor(IDBKeyRange.only(moodQuery)) :
  store.openCursor();

request.onsuccess = function(ev) {
  var cursor = request.result;
  if (cursor) {
    handler(cursor.value);
    cursor["continue"]();
  }
};

As with many traditional filesystems, there's no indexing, so a search algorithm (like that used by the Unix "grep" command) must iterate through each file. We extract a reader from the checkins directory, which lets us walk through each file via readEntries(). For each file, we create a FileReader and inspect its contents via readAsText(). As these operations are asynchronous, we need to chain the calls together, which is the job of readNextFile().

checkinsDir.createReader().readEntries(function(files) {
  var reader, fileCount=0, checkins=[];
  var readNextFile = function() {
    reader = new FileReader();
    if (fileCount == files.length) return;
    reader.onload = function(e) {
      var checkin = JSON.parse(this.result);
      if (moodQuery==checkin.mood||!moodQuery) handler(checkin);
      readNextFile();
    };
    files[fileCount++].file(function(file) { reader.readAsText(file); });
  };
  readNextFile();
});
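
The chaining idea can be separated from the file handling. Below is a generic sketch of the same pattern; eachInSeries is a hypothetical helper name, and each step must call next() when its asynchronous work is finished.

```javascript
// eachInSeries is a hypothetical helper: it calls step(item, next) for one
// item at a time, and only moves on when the step calls next().
function eachInSeries(items, step, done) {
  var index = 0;
  function next() {
    if (index === items.length) return done();
    step(items[index++], next);
  }
  next();
}

// Possible use with the directory entries from readEntries():
// eachInSeries(files, function(entry, next) {
//   entry.file(function(file) {
//     var reader = new FileReader();
//     reader.onload = function() { /* inspect this.result */ next(); };
//     reader.readAsText(file);
//   });
// }, function() { /* all files have been read */ });
```

Note that if every step calls next() synchronously, the calls nest, so this form is best suited to steps that really are asynchronous.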

Counting All Checkins

Finally, we need to count all checkins.

For localStorage, we simply de-serialize the checkins array structure and find its length.

handler(JSON.parse(localStorage["checkins"]).length);

With Web SQL Database, we could retrieve each row in the database (select * from checkins) and look at the length of the result set, but if we know our way around SQL, there's an easier - and faster - way. We can perform a special select statement to retrieve the count. It will return exactly one row, having one column containing the count.

store.db.transaction(function(tx) {
  tx.executeSql(
    "select count(*) from checkins;",
    [],
    function(tx, results) {
      handler(results.rows.item(0)["count(*)"]);
    },
    store.onError
  );
});

For Indexed Database, the demo counts by iterating through all checkins with a cursor. (Object stores and indexes in the current specification also expose a count() method, which avoids the iteration.)

var count = 0;
var request = db.transaction(["checkins"], 'readonly').objectStore("checkins").openCursor();
request.onsuccess = function(ev) {
  var cursor = request.result;
  if (cursor) {
    count++;
    cursor["continue"]();
  } else {
    handler(count);
  }
};
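
Object stores in current versions of the IndexedDB specification also provide a count() method, so the cursor loop is not strictly required. A minimal sketch, assuming db is the open database from the setup step; countCheckins is a hypothetical helper name.

```javascript
// countCheckins is a hypothetical helper; `db` is an open IDBDatabase
// containing the "checkins" object store.
function countCheckins(db, handler) {
  var request = db.transaction("checkins", "readonly")
                  .objectStore("checkins")
                  .count();
  request.onsuccess = function() {
    // request.result holds the number of records in the store
    handler(request.result);
  };
}
```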

For FileSystem, a directory reader's readEntries() method provides a list of files, so we can just return the length of that list.

checkinsDir.createReader().readEntries(function(files) {
  handler(files.length);
});
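
One caveat: a directory reader may return its listing in batches, and readEntries() should be called repeatedly until it yields an empty list. The sketch below collects every batch before reporting; readAllEntries is a hypothetical helper name.

```javascript
// readAllEntries is a hypothetical helper. readEntries() may hand back the
// directory listing in batches, so keep calling it until a batch is empty.
function readAllEntries(dirReader, done, onError) {
  var all = [];
  function readBatch() {
    dirReader.readEntries(function(batch) {
      if (batch.length === 0) return done(all);
      all = all.concat(Array.prototype.slice.call(batch));
      readBatch();
    }, onError);
  }
  readBatch();
}
```

With it, the count becomes readAllEntries(checkinsDir.createReader(), function(files) { handler(files.length); }, fileStore.onError).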

Summary

This has been a high-level overview of modern client-side storage techniques. You should also check out the overview on offline apps.
