Instant Instant Messaging: Just Add Web Sockets

Chat requires full-duplex communication

Unlike traditional AJAX, in which each XMLHttpRequest is a round trip that sends data to a remote server and then receives a response, a Web Socket sends and receives asynchronously over a single connection. This allows Web Sockets to

  • Reuse the same TCP stream
  • "Push" data to the browser and
  • Skip redundant HTTP headers

The WebSocket API is straightforward, as the interface definition in the current HTML 5 specification shows:

[Constructor(in DOMString url)]
interface WebSocket {
  readonly attribute DOMString URL;

  // ready state
  const unsigned short CONNECTING = 0;
  const unsigned short OPEN = 1;
  const unsigned short CLOSED = 2;
  readonly attribute long readyState;

  // networking
  attribute EventListener onopen;
  attribute EventListener onmessage;
  attribute EventListener onclosed;
  void postMessage(in DOMString data);
  void disconnect();
};

The only twist is that the constructor also initiates the outbound connection. Due to the single-threaded nature of JavaScript, event handlers can be safely attached to the newly constructed object. For instance, the following code adds a handler for the "open" event that will never be called prematurely, even though the process of opening the socket appears to have already begun. In another language, similar code might create a race condition, but in JavaScript, it's perfectly safe.

var mySocket = new WebSocket("ws://example.com/server");
mySocket.addEventListener("open", openHandler);
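The ordering guarantee can be illustrated without a network. The following is a toy model rather than browser code: FakeSocket, taskQueue, and runEventLoop are hypothetical names, and the array stands in for the browser's event loop. It sketches why an event queued during construction cannot fire until the current script has finished running.

```javascript
// Toy model of run-to-completion. All names here are illustrative only.
const taskQueue = [];

class FakeSocket {
  constructor(url) {
    this.url = url;
    this.listeners = { open: [] };
    // Like the real constructor, connecting "starts" right away, but the
    // open event is only queued. It cannot fire while the caller is running.
    taskQueue.push(() =>
      this.listeners.open.forEach((fn) => fn({ type: "open" }))
    );
  }

  addEventListener(type, fn) {
    (this.listeners[type] = this.listeners[type] || []).push(fn);
  }
}

// Drains queued tasks one at a time, as an event loop does between scripts.
function runEventLoop() {
  while (taskQueue.length > 0) taskQueue.shift()();
}

const log = [];
const socket = new FakeSocket("ws://example.com/server");
log.push("constructed");
socket.addEventListener("open", () => log.push("open handler ran"));
log.push("handler attached");
runEventLoop();
console.log(log.join(" -> "));
```

In this model the handler is always registered before the queued open task runs, which mirrors the behavior described above.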

This usage may look confusing at first glance, but the design has a benefit. Because the connection opens when the WebSocket object is created, a Web Socket cannot be reused, nor can it listen for incoming connections. This removes some of the ambiguity found in other socket APIs.

Web Socket objects dispatch events when the connection state changes ("open" and "closed") and upon receiving a new frame ("message"). Like XMLHttpRequest, Web Sockets have the readyState property, which can have the following values:

  • CONNECTING (0)
  • OPEN (1)
  • CLOSED (2)
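One way an application might use readyState is to guard its sends. The sketch below is only an assumption about application code, not part of the specification: sendOrQueue and flush are hypothetical helpers, fakeSocket is a stand-in for a real connection, and the method name postMessage follows the interface definition quoted earlier.

```javascript
// Ready-state values as listed in the interface definition.
const CONNECTING = 0, OPEN = 1, CLOSED = 2;

// Sends immediately when open, queues while connecting, refuses when closed.
// `socket` is any object exposing readyState and postMessage.
function sendOrQueue(socket, pending, data) {
  if (socket.readyState === OPEN) {
    socket.postMessage(data);
    return "sent";
  }
  if (socket.readyState === CONNECTING) {
    pending.push(data);
    return "queued";
  }
  return "dropped";
}

// Flushes queued messages in order; meant to be called from the "open" handler.
function flush(socket, pending) {
  while (pending.length > 0) socket.postMessage(pending.shift());
}

// Stand-in socket for demonstration only.
const fakeSocket = {
  readyState: CONNECTING,
  sent: [],
  postMessage(data) { this.sent.push(data); },
};

const pending = [];
console.log(sendOrQueue(fakeSocket, pending, "hello")); // still connecting
fakeSocket.readyState = OPEN;                           // pretend "open" fired
flush(fakeSocket, pending);
console.log(sendOrQueue(fakeSocket, pending, "world"));
console.log(fakeSocket.sent);
```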

The WebSocket interface, including events, callbacks, readyState, and postMessage, is consistent with existing browser APIs. In that respect, it's like an updated XMLHttpRequest with real-time capabilities.

Cross-Domain Security
Security is a major concern with browser technology. After all, cross-site attacks have accounted for some truly dangerous exploits. The Web Socket protocol contains security mechanisms meant to stave off such attacks.

If low-level sockets were exposed directly to JavaScript, unsuspecting web visitors could be made to participate in sophisticated distributed attacks. One of the more serious potential exploits would connect browsers directly to unsuspecting mail servers to send spam. To avoid the sort of doomsday scenario that would "break the Internet," any sort of socket in the browser must impose additional security restrictions. Flash and Silverlight require the participation of a separate security policy server before allowing socket connections. This is an unfortunate compromise, but it allows direct connectivity at the TCP level. Instead of adopting this approach, Web Sockets use a single connection with an opening handshake.

The Web Socket handshake is a strict initial exchange between the browser and server. The handshake identifies the protocol, destination host name, and origin. This allows services that are not expecting connections from the web to reject attacks. Likewise, services intended for use from a particular set of origin domains can enforce strict security policies, including same-origin.

[Client Sends]
GET /services/chat HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: chat.example.com:81
Origin: http://www.example.com:80

[Server Responds]
HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://www.example.com:80
WebSocket-Location: ws://chat.example.com:81/services/chat
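The server side of this exchange can be sketched as a small origin check. This is a simplified illustration, not the matching procedure from the specification: parseHandshake, checkHandshake, and the allowedOrigins policy list are hypothetical names, and a real server would validate the request far more strictly.

```javascript
// Parses the client's opening handshake into a path and a lowercase header map.
function parseHandshake(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const [method, path] = lines[0].split(" ");
  const headers = {};
  for (const line of lines.slice(1)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip anything that is not "Name: value"
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return { method, path, headers };
}

// Decides whether to accept; `allowedOrigins` is a hypothetical server policy.
function checkHandshake(text, allowedOrigins) {
  const { method, path, headers } = parseHandshake(text);
  const isUpgrade =
    method === "GET" &&
    headers["upgrade"] === "WebSocket" &&
    headers["connection"] === "Upgrade";
  if (!isUpgrade) {
    return { accepted: false, reason: "not a Web Socket handshake" };
  }
  if (!allowedOrigins.includes(headers["origin"])) {
    return { accepted: false, reason: "origin not allowed" };
  }
  // Values the server would echo back in its origin and location headers.
  return {
    accepted: true,
    origin: headers["origin"],
    location: "ws://" + headers["host"] + path,
  };
}

const sampleRequest = [
  "GET /services/chat HTTP/1.1",
  "Upgrade: WebSocket",
  "Connection: Upgrade",
  "Host: chat.example.com:81",
  "Origin: http://www.example.com:80",
  "",
  "",
].join("\r\n");

console.log(checkHandshake(sampleRequest, ["http://www.example.com:80"]));
console.log(checkHandshake(sampleRequest, ["http://intranet.example.org"]));
```

A service that was never meant to be reached from the web would fail the first test, because its ordinary clients do not send the upgrade headers.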

The Web Socket handshake resembles HTTP, but it is explicitly not HTTP. This duality ensures compatibility with proxies and other intermediaries while allowing the specification to define behavior that falls outside of standard HTTP.

Web Sockets cannot connect directly to the same servers that TCP sockets can. In order for a Web Socket-capable browser to communicate with a server, either the server must be updated to accept Web Socket connections or a bridge must adapt the protocol. Kaazing's Web Socket server, Kaazing Enterprise Gateway, provides that bridging functionality by brokering TCP socket connections to web browsers. Kaazing Enterprise Gateway allows browsers and network servers to communicate efficiently, enforces access control, and includes client libraries for use with chat and other application protocols.

Framed Messages
The post-handshake portion of the Web Socket protocol consists of variable-length, framed messages. Users of native sockets know that the option that creates a TCP socket is SOCK_STREAM. Web Sockets, although designed to solve a problem similar to the one TCP sockets solve, are not streaming.

The Web Socket protocol sits on top of TCP and consists of framed UTF-8 strings. There are provisions in the HTML 5 specification to eventually support binary frames. On the wire, a typical Web Socket frame appears as a byte of all zeros (0x00), a string, and a byte of all ones (0xFF). The byte 0xFF never appears in valid UTF-8, and 0x00 appears only as the encoding of the NUL character, so the two bytes can act as frame delimiters. This guarantees that each message event contains the text of a complete message. This may seem like the bare minimum. After all, XMLHttpRequest returns text responses. When using TCP directly, however, the only atomic units are bytes, and the responsibility for interpreting higher-level constructs (including strings) lies on the shoulders of the developer. Because Web Socket frames contain complete UTF-8 strings, they eliminate the need to buffer and parse streams of bytes in simple text-oriented protocols.
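The text framing described above can be sketched in a few lines. This is a minimal illustration that assumes only the 0x00 / string / 0xFF layout; encodeFrame and FrameParser are hypothetical names, and bytes that arrive outside a frame are simply ignored here, which a real implementation would likely treat as an error.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Wraps one string as 0x00, its UTF-8 bytes, then 0xFF.
function encodeFrame(text) {
  const body = encoder.encode(text);
  const frame = new Uint8Array(body.length + 2);
  frame[0] = 0x00;
  frame.set(body, 1);
  frame[frame.length - 1] = 0xff;
  return frame;
}

// Accumulates raw bytes from the stream and emits only complete messages.
class FrameParser {
  constructor() {
    this.current = null; // null means "between frames"
  }

  push(chunk) {
    const messages = [];
    for (const byte of chunk) {
      if (this.current === null) {
        if (byte === 0x00) this.current = []; // start of a new frame
      } else if (byte === 0xff) {
        messages.push(decoder.decode(new Uint8Array(this.current)));
        this.current = null;                  // frame complete
      } else {
        this.current.push(byte);              // payload byte
      }
    }
    return messages;
  }
}

// A frame split across two TCP reads, in the middle of a multi-byte character.
const parser = new FrameParser();
const frame = encodeFrame("h\u00e9llo");
console.log(parser.push(frame.slice(0, 3))); // no complete message yet
console.log(parser.push(frame.slice(3)));    // the whole string arrives at once
```

The application only ever sees whole strings, even when the underlying stream delivers a frame in pieces.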

The downside is that stream-oriented protocols that do not require framing are less efficient when implemented on top of framed messages. The not-insignificant upside is that it becomes trivial to send and receive complete strings out of the box. Sending strings is a very common case, and the practice of sending JSON or XML can continue easily over Web Sockets. Even with enhanced connectivity capabilities, text encodings are likely to dominate web programming for some time.
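Because each frame carries one complete string, a JSON payload can be parsed directly in the message handler with no buffering. The sketch below assumes a hypothetical chat message shape of { from, text }; encodeChatMessage and handleMessageEvent are illustrative names, and the event object is reduced to the data property that carries the frame's text.

```javascript
// Serializes a chat message. The { from, text } shape is hypothetical.
function encodeChatMessage(from, text) {
  return JSON.stringify({ from, text });
}

// Handles a "message" event whose data is one complete JSON string.
// Returns a display line, or null when the frame is not a valid chat message.
function handleMessageEvent(event) {
  let message;
  try {
    message = JSON.parse(event.data);
  } catch (err) {
    return null; // not JSON at all
  }
  if (
    message === null ||
    typeof message.from !== "string" ||
    typeof message.text !== "string"
  ) {
    return null; // JSON, but not the expected shape
  }
  return message.from + ": " + message.text;
}

const wire = encodeChatMessage("alice", "hi there");
console.log(handleMessageEvent({ data: wire }));
console.log(handleMessageEvent({ data: "not json" }));
```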

Architecture: Simplify, Simplify
Web servers and frameworks are among the obstacles to real-time messaging on the web, and therefore also to chat. Most implementations of chat over HTTP involve running a native chat client on the server side and bridging or translating the semantics of chat into a different format for consumption in JavaScript. This bridging approach is cumbersome, inefficient, and now unnecessary. In real-time applications, web servers get in the way of simple client/server architectures. Access to a bidirectional communication API from JavaScript promotes applications with end-to-end participation. Now that web clients have emerged as legitimate platforms for rich applications, it makes sense to move some logic away from the middle tier.

There are scalability benefits from a socket-based architecture, as well. In addition to the obvious vertical scalability boost from shedding the overhead of HTTP, sockets put the responsibility for scaling out in the appropriate place. An application that connects to a cluster of chat servers scales just as well as the chat servers themselves. The fact that the connections originated from web browsers does not impose additional scalability limitations.

With a full-duplex connection, building a chat client for the web can be as straightforward as building chat for the desktop. On the desktop, the client would open a connection to the destination server and communicate using a standard chat protocol. On the web, we would like to do the same. The simple, elegant client/server approach suits web browsers with Web Sockets extremely well.

More Stories By Frank Salim

Frank Salim is a polyglot programmer with a keen interest in making life easier for his fellow coders. He leads WebSocket development at Kaazing and is the front man for Kaazing's open source project at kaazing.org. Salim is an open source advocate and a committer in several open source projects. He is a regular author and contributor to the online tech magazine Comet Daily.
