Using HTML5 Application Cache to Create Offline Web Applications

HTML5 introduces Application Cache, a new feature that enables you to make web apps and sites available offline. The new specification also provides an easy way to prefetch some or all of your web app's assets (HTML files, images, CSS, JavaScript, and so on) while the client is still online. During this caching process, files are stored in an application cache, where they sit ready for future offline use.

Compare this to regular browser caching, in which pages that you visit are stored in the browser's cache based on server-side rules and client-side configuration. Even if web pages are cached normally, this does not give you a reliable way to access them while you're offline (in an airplane, for example). In addition, an application cache can cache pages that have not been visited at all and are therefore typically unavailable in the regular browser cache. Prefetching files can even speed up your site's performance, though you are of course using bandwidth to download those files initially.

Regular caching can lead to undesired results while you're offline.
The mechanism that makes offline web applications available is simple: create a manifest (text) file that lists your app's assets and reference it in a manifest attribute in your web pages' html elements. Sounds simple? It is. Just remember that there are some common misconceptions about how offline web apps work, and because of all the caching the browser is now doing for you, you may get a few (tricky) surprises when you try to debug and test your web apps (online and offline). This article explains what to look out for, so you don't have to learn the hard way. It also covers some lower-level object and Application Cache event functionality that you can tap into. Sounds good? Great, let's get started!

Note: a set of starter files that make up an example offline web app are available at: http://tech.kaazing.com/training/offline/peter-lubbers-html5-offline-web-apps-presentation-code.zip

The Basics
Let's start with the basics. Consider a web app that has just a few web pages: an index page, a CSS file, images, and JavaScript files. This web site can be made available offline in less than five minutes (once you get the hang of it). Here's how:

CACHE MANIFEST
index.html
cache.html
resources/img/html5.png
resources/css/html5.css
resources/js/html5.js

Note: files must be referenced relative to the manifest file. Full URLs are allowed as well.

Browser notification about offline storage

Browser Support for HTML5 Offline Web Applications
Which browsers currently support offline web applications? The following table shows the browser support that's available for this feature at the time of this writing. As you can see, HTML5 Offline Web Applications are already supported in most browsers.

Browser              Details
Firefox              Supported in version 3.5 and later
Safari               Supported in version 4.0 and later
Chrome               Supported in version 4.0 and later
Opera                Supported in version 10.6 and later
Internet Explorer    Some day... Hopefully before we're all old and gray!

Note: Always go to http://caniuse.com to find the latest and greatest browser support matrixes for HTML5 and CSS3 features.

Due to the varying levels of support, it is a good idea to first test whether HTML5 Offline Web Applications are supported before you count on them. You can do this in two ways: with or without Modernizr, as shown in the next example. I suggest using Modernizr (http://www.modernizr.com/) because it can handle certain tricky marginal cases. For example, in private browsing modes, such as Chrome's incognito mode, a check for window.applicationCache (more on this later) will succeed, but the browser won't actually be able to write files to the cache.

if(window.applicationCache) {
// this browser supports offline web apps
}

//or using Modernizr
if (Modernizr.applicationcache){
// We have offline web app support
}

Creating a Manifest File
You can simply add some files to a cache as shown in the previous example, but you can also do more than that. Let's explore these options in more detail.

To ensure HTML5 interoperability, browsers must be very strict when it comes to reading files, so you must be very careful how you specify your files. If you don't pay attention to supplying the proper case, required colons, and formatting, you'll get undesired and sometimes puzzling results. Here are some general rules:

CACHE MANIFEST
# manifest version 1.0.1
# Files to cache

There are three name spaces; all of them can appear multiple times in the file:

Let's take a look at each of these name spaces.

CACHE:
The files listed in this section will be cached in an application cache. If you only want to specify a list of files to be cached, you can simply add them under the CACHE MANIFEST directive without the CACHE: header, because this is the default behavior for files listed in the manifest file. However, if you want to flag files to be cached anywhere else in the file, you need to place them under an explicit CACHE: header (including the colon at the end, or you'll run into problems).

Here are the rules for the CACHE: section:

CACHE:
index.html
cache.html
html5.css
image1.jpg
favicon.ico

NETWORK:
This section is also called the "online whitelist." Files listed in this section will not be loaded from the application cache, but will be retrieved from the server if the browser is online. You can specify "*", which sets the online whitelist wildcard flag to "open", so that resources from other origins (an origin is the combination of a scheme, host, and port) will not be blocked. Here is an example NETWORK: section that specifies that the file network.html must always be retrieved from the server, bypassing the application cache:

# Use from network if available
NETWORK:
network.html

FALLBACK:
This section has a slightly different syntax than the other sections; it provides a way to specify a fallback resource that must be served if a specific resource cannot be found. An example of this is when the browser is offline and tries to load something that is not in the application cache, such as a page or JavaScript file listed in the NETWORK section. The following example shows how you can serve the page fallback.html when requests to server pages fail:

# Fallback content
FALLBACK:
/ fallback.html

To recap, here is our final manifest file, called offline.manifest:

CACHE MANIFEST
# manifest version 1.0.1

# Files to cache
index.html
cache.html
html5.css
image1.jpg
favicon.ico
# Use from network if available
NETWORK:
network.html
# Fallback content
FALLBACK:
/ fallback.html
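
To make the section rules concrete, here is a minimal sketch of how a manifest like the one above breaks down into its three sections. The function name parseManifest and the shape of the object it returns are assumptions made for illustration; this is a simplified reading of the format, not the parsing algorithm that browsers implement.

```javascript
// Hypothetical helper: a simplified reading of a cache manifest's sections.
// It does not implement the full parsing rules that a browser applies.
function parseManifest(text) {
  const lines = text.split(/\r?\n/).map(function (line) {
    return line.trim();
  });
  if (lines[0] !== "CACHE MANIFEST") {
    throw new Error("First line must be CACHE MANIFEST");
  }
  const sections = { CACHE: [], NETWORK: [], FALLBACK: [] };
  let current = "CACHE"; // entries before any header are cached by default
  lines.slice(1).forEach(function (line) {
    if (line === "" || line.charAt(0) === "#") {
      return; // skip blank lines and comments
    }
    const header = /^(CACHE|NETWORK|FALLBACK):$/.exec(line);
    if (header) {
      current = header[1]; // a header switches the active section
    } else if (current === "FALLBACK") {
      const parts = line.split(/\s+/); // "<url prefix> <fallback resource>"
      sections.FALLBACK.push({ prefix: parts[0], resource: parts[1] });
    } else {
      sections[current].push(line);
    }
  });
  return sections;
}
```

Run against the manifest above, a helper like this would place the five files in CACHE, network.html in NETWORK, and a single prefix and resource pair in FALLBACK. A header without its trailing colon would be treated as just another file name, which is one way the puzzling results mentioned earlier can arise.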

Now that you've created your manifest file, you just need to reference it by adding the manifest attribute to the html elements of the HTML pages that you want to cache (cache.html and index.html). You do this as follows:

<!DOCTYPE html>
<html manifest="offline.manifest">

Serving the Manifest File
Just like you want a nice pasta dish served up al dente, you want your manifest files served up with the text/cache-manifest MIME type. You will find, however, that very few web servers do this correctly out of the box. Instead, files will be served in either text or binary mode, and neither one will work. You can test this by navigating to the file in a browser and looking at the properties for the file. Most web servers provide a way to configure the MIME types for specific file extensions, so once you update the MIME type configuration on the server, you're all set.

On Apache, you can change this globally in the mime.types file:

# Apache mimetype configuration
# APACHE_HOME/conf/mime.types
text/cache-manifest manifest

Alternatively, you can update the .htaccess file for an individual web application:

# Apache mimetype configuration
AddType text/cache-manifest .manifest

Oh, and while you're changing your Apache configuration, take this good advice from Bruce Lawson and Remy Sharp in their excellent book Introducing HTML5: set the cache control headers for the manifest file to prevent the manifest file itself from being cached. If you don't, you'll wish you had, because as you'll soon see, the manifest file must change in order to trigger the download of any web app updates. In other words, the manifest file is not a file you want to cache! You can do this in your .htaccess file as well:

# Cache settings for the manifest file
<IfModule mod_expires.c>
Header set cache-control: public
ExpiresActive on
# Prevent receiving a cached manifest
ExpiresByType text/cache-manifest "access plus 0 seconds"
</IfModule>

For Python's SimpleHTTPServer (a great server for some quick testing), you can update the mimetypes section in the file mimetypes.py located in the PYTHON_HOME/Lib directory as follows:

# Python SimpleHTTPServer mimetype Configuration
'.manifest'    : 'text/cache-manifest',

Note: If you do not have a mimetypes.py file (this happens a lot on default Mac installations, in which you'll probably have a compiled mimetypes.pyc file instead), you can use the sample mimetypes.py file located in the mac-config-file directory in the starter file ZIP file. Make sure that the permissions on this file are changed to read/write. When you start Python with the new file, Python compiles it and generates a new mimetypes.pyc.

Application Cache Sequences
Let's see what's happening behind the scenes when you access a web app that uses an application cache. When you first access your web app, the following sequence of events takes place:


The initial page load


Going Offline in Firefox

Note: In Opera and Firefox you can go offline by selecting File > Work Offline, but a similar option does not exist (unfortunately) in Chrome or Safari. As a workaround, I've found that specifying a made-up proxy server in the LAN settings can give you the same effect after the browser times out looking for the non-existent proxy server.

The cache page loads from the application cache

Fallback content is served when you try to access a network resource in offline mode

You might be surprised by what happens the next time you visit the app, when the following sequence of events takes place:

Important: New files will be downloaded only when a change in the manifest file is detected. This is the key detail to remember about application caching and application cache busting.

Best Practice: if you only changed the content of an existing file (cache.html), no files were added or removed, so the list of files in the manifest does not need to change. The manifest itself must still change for the browser to download the new content, so make a trivial change such as editing a comment. As a best practice, keep a version number comment in the manifest and update it each time you make any change, to force the download of your app's files.
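
The version comment can be updated by hand, but a small script can take care of it. The following is a minimal sketch, assuming the manifest contains a comment line in the "# manifest version x.y.z" form used in this article; the function name bumpManifestVersion is hypothetical, and reading and writing the manifest file is left out.

```javascript
// Hypothetical helper: increments the last number in a
// "# manifest version x.y.z" comment line and returns the new manifest text.
// A manifest without such a comment is returned unchanged.
function bumpManifestVersion(manifestText) {
  return manifestText.replace(
    /^(# manifest version (?:\d+\.)*)(\d+)[ \t]*$/m,
    function (match, prefix, last) {
      return prefix + (parseInt(last, 10) + 1);
    }
  );
}
```

Because the browser compares the manifest's content, changing a single character in this comment is enough for it to detect an update.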

See Also: Check out the following, related proposal to enhance Application Cache for better performance by Google's Seth Ladd: http://blog.sethladd.com/2010/10/proposal-to-enhance-html5-app-cache.html

Try this out in different browsers. The current implementations are not completely interoperable yet, but it's a good start. One browser that has great support for Application Cache is Google Chrome. In the recent versions of the developer channel for this browser, there is complete support for application cache and application cache events in the storage tab as shown in the following image:

Google Chrome Developer Tools Application Cache Storage View

Application Cache Events
The window.applicationCache object fires several events related to the state of the cache. window.applicationCache.status is a numerical property that tells you the state of the cache:

0-UNCACHED
1-IDLE
2-CHECKING
3-DOWNLOADING
4-UPDATEREADY
5-OBSOLETE
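
While debugging, it can be easier to read these states by name than by number. Here is a minimal sketch that maps the numeric values listed above to their names; the helper name cacheStatusName and the "UNKNOWN" fallback are assumptions for illustration.

```javascript
// Names indexed by the numeric status values listed above.
const STATUS_NAMES = [
  "UNCACHED", "IDLE", "CHECKING", "DOWNLOADING", "UPDATEREADY", "OBSOLETE"
];

// Hypothetical helper: returns a readable name for a numeric cache status.
function cacheStatusName(status) {
  return STATUS_NAMES[status] || "UNKNOWN";
}

// Browser-only usage, guarded so the snippet also loads outside a browser.
if (typeof window !== "undefined" && window.applicationCache) {
  console.log("Cache status: " + cacheStatusName(window.applicationCache.status));
}
```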

The following table shows the window.applicationCache callback attributes you can use in your applications:

Callback Attribute    Event
onchecking            CHECKING
ondownloading         DOWNLOADING
onupdateready         UPDATEREADY
onobsolete            OBSOLETE
oncached              CACHED
onerror               ERROR
onnoupdate            NOUPDATE
onprogress            PROGRESS

The following code snippet shows how you can use these event callback attributes in your code.

Note: you can programmatically call window.applicationCache.update() to check for updates and window.applicationCache.swapCache() to switch the browser to the latest version of the updated cache. The browser does this for you under the covers when you refresh the page.

window.applicationCache.onchecking = function(e) {
log("Checking for application update");
}
window.applicationCache.onnoupdate = function(e) {
log("No application update found");
}
window.applicationCache.onupdateready = function(e) {
log("Application update ready");
//Now connect the browser to use the new cache
window.applicationCache.swapCache();
}
window.applicationCache.onobsolete = function(e) {
log("Application obsolete");
}
window.applicationCache.ondownloading = function(e) {
log("Downloading application update");
}
window.applicationCache.oncached = function(e) {
log("Application cached");
}
window.applicationCache.onerror = function(e) {
log("Application cache error");
}
window.applicationCache.onprogress = function(e) {
log("Application Cache progress");
}

You can use the application cache events to code up some cool user notification functionality. For example, while the application is downloading (and you are receiving progress events), you can show the progress of the download. And when the updateready event fires, you can swap the browser's active cache to the new cache and tell the user that (1) a new update has been downloaded, and (2) they must refresh the page to see the changes (the new cache is downloaded and ready for use, but the page was already loaded from the old application cache before the download started and will not be refreshed automatically).
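
As a rough illustration of this kind of notification, the sketch below builds a progress message and announces a finished update. It assumes the browser fills in the lengthComputable, loaded, and total properties on the progress event (counting files rather than bytes), which not every implementation may do; the helper name describeProgress and the message wording are made up for this example, and console.log stands in for real user interface code.

```javascript
// Hypothetical helper: builds a user-facing message from a progress event.
// Falls back to a generic message when no file counts are available.
function describeProgress(e) {
  if (e && e.lengthComputable && e.total > 0) {
    const percent = Math.round((e.loaded / e.total) * 100);
    return "Downloading file " + e.loaded + " of " + e.total + " (" + percent + "%)";
  }
  return "Downloading application update";
}

// Browser-only wiring, guarded so the snippet also loads outside a browser.
if (typeof window !== "undefined" && window.applicationCache) {
  window.applicationCache.addEventListener("progress", function (e) {
    console.log(describeProgress(e));
  }, false);
  window.applicationCache.addEventListener("updateready", function () {
    // Switch to the new cache, then ask the user to reload.
    window.applicationCache.swapCache();
    console.log("A new version has been downloaded. Refresh the page to use it.");
  }, false);
}
```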

See Also: Check out Ben Nadel's blog post about handling application cache events: http://www.bennadel.com/blog/2029-Using-HTML5-Offline-Application-Cache-Events-In-Javascript.htm

Detecting Online and Offline Status
HTML5 also allows you to detect whether you're online or offline. The following code shows how you can add event listeners for online and offline events.

You can use this to stop communicating with a server and to instead store things in HTML5 Web Storage (window.localStorage or window.sessionStorage) or in an HTML5 Web SQL Database while you're offline.

window.addEventListener("online", function(e) {
log("Application is now online");
// Send app data to server
}, true);
window.addEventListener("offline", function(e) {
log("Application is now offline");
window.localStorage.myLocalKey = 'Some Data';
}, true);

Since these events do not fire when the page loads, you can also detect the initial state of a page by reading the window.applicationCache.status property. To check connectivity itself at load time, you can read the navigator.onLine property.
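
One way to act on these events is to hold outgoing data in Web Storage while offline and send it once the browser is back online. The following is a minimal sketch of that idea, not a complete synchronization strategy. The names createOutbox and pendingMessages are hypothetical, the storage object is passed in so that window.localStorage or any object with the same getItem, setItem, and removeItem methods can be used, and the send callback is a placeholder for a real server request.

```javascript
// Hypothetical outbox: holds messages in a Web Storage-like object while offline.
function createOutbox(storage, send) {
  const KEY = "pendingMessages"; // assumed storage key
  function read() {
    return JSON.parse(storage.getItem(KEY) || "[]");
  }
  return {
    // Send immediately when online, otherwise keep the message for later.
    submit: function (message, isOnline) {
      if (isOnline) {
        send(message);
      } else {
        storage.setItem(KEY, JSON.stringify(read().concat([message])));
      }
    },
    // Send everything that was stored while offline, then clear the store.
    flush: function () {
      read().forEach(function (message) {
        send(message);
      });
      storage.removeItem(KEY);
    }
  };
}

// Browser-only wiring, guarded so the snippet also loads outside a browser.
if (typeof window !== "undefined" && window.localStorage) {
  const outbox = createOutbox(window.localStorage, function (message) {
    console.log("Would send to server: " + message); // placeholder for a real request
  });
  window.addEventListener("online", function () {
    outbox.flush();
  }, true);
}
```

Passing the storage object in keeps the queueing logic separate from the browser, which makes it easier to test without actually going offline.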

Accessing Application Cache Content
You might be wondering where all of those files are stored. The application cache files are actually somewhere on your device's hard disk. For example, if you open the page about:cache in Firefox it will show you the exact location. Internally, the files may be stored in a SQLite database, but how the files are stored is an implementation detail that is left up to the browsers to figure out.


The about:cache page in Firefox

Security
Since it is possible to access other people's files from an application cache on a shared computer (by navigating to the same site), it is critical that you do not store personal or sensitive data in an application cache. It is also important to keep in mind that you cannot always write to the application cache even if your browser supports the feature. This is because most private browsing modes (for example, Safari's Private Browsing mode, shown in the following image, or Chrome's Incognito Mode) prevent you from writing to an application cache for security reasons. It is therefore important to check for errors and to never assume you can access the cache.


Safari's Private Browsing Mode

Disk Quota
If you try to store a lot of data in an application cache, you may run into quota errors. Firefox and Opera provide a handy way to increase storage size for specific offline web applications while Chrome and Safari support this only through the use of special startup parameters. In your applications you should listen for errors such as the following:

Application Cache Error event: Failed to commit new cache to storage as it would exceed the quota.

In the future, browsers will hopefully have graceful, on-the-fly quota upgrade mechanisms for Application caching, like those of Opera's Web Storage, which prompts you as you are about to exceed your quota.

Clearing the Cache
In order to debug and test web apps that use application cache, it is a good idea to start with a clean slate to avoid false positives. To do this you must first blow away any existing application caches. Typically you do this by clearing the cache. Ensure that you are not doing this while a page from an offline web application is still open in the browser; that may cause problems with clearing the cache for that site.

Note: An application cache is created using the manifest's complete URL. You can have multiple manifest files in a site, which then allows you to split up loading of files. Each of these manifests will create a separate application cache.

Here is how you can clear the cache and application cache in the various browsers:

Browser              Steps to Clear a Cache
Chrome               Settings Menu > Tools > Clear Browsing Data
Firefox              Tools > Clear Recent History, and Tools > Options (Preferences on Mac OS X) > Advanced > Network > Select specific application cache > Remove
Safari               Settings Menu > Reset Safari
Opera                Tools > Preferences > Storage (plus Tools > Clear Private Data)
Internet Explorer    N/A

Best Practices
To recap, here are some of the best practices when it comes to using HTML5 offline web applications:

127.0.0.1 localhost
127.0.0.1 offline0.example.com
127.0.0.1 offline1.example.com

Checking which files are requested and served up in Python's SimpleHTTPServer server log

Summary
In this article you learnt about the ins and outs of HTML5 Application Cache, a new feature that can be used to create offline web applications. This article explained how application caching works and cleared up some common misconceptions while providing a few best practices along the way.

© 2008 SYS-CON Media