Real-World AJAX Book Preview: Retrieving and Sending File Content

This content is reprinted from Real-World AJAX: Secrets of the Masters published by SYS-CON Books. To order the entire book now along with companion DVDs for the special pre-order price, click here for more information. Aimed at everyone from enterprise developers to self-taught scripters, Real-World AJAX: Secrets of the Masters is the perfect book for anyone who wants to start developing AJAX applications.

Retrieving (and Sending) File Content
The Web client cannot, in the traditional scheme of things, provide Web content. Of course, that's not quite true - form content sent to the server, whether directly via a form post submission or via an XMLHttpRequest object, is very definitely content being "served" to the server. The difference here is that the server is a passive entity - it can only send information when it gets a request from a client, while the client is increasingly able to do both. Even that definition begins to break down when you consider that the server can itself request content from other servers.

Indeed, the recursive nature of this process points increasingly to the role of the server as primarily a waypoint, or node, in the larger network, and to the fact that Web pages, much like e-mail, are as likely to originate from an external server as they are from the requested one.

This is a considerable shift from the past, and one that has largely been driven by the increasing power of syndication. A syndicated file, whether in one of the various RSS formats or in the Atom format, is fundamentally a collection of links associated with metadata about each link. Typically, most Web browsers operate in a security sandbox that prohibits content coming in from any server but the originating one. This restriction is aimed primarily at the danger of cross-site scripting, in which a block of JavaScript is loaded from an external source outside of the requesting domain, which could in turn load in other resources and get access to sensitive information.

Unfortunately, this also makes it difficult to create browser-centric Web applications that pull content from external resources. Perhaps one of the most archetypal AJAX applications is the news reader. Such an application is comparatively simple to write, especially with the use of client-side XSLT support, but it relies upon the use of a "friendly" server that is willing to retrieve XML content from other servers and pass it on.

One of the more useful server-side scripts I've ever developed (and I've rewritten it for any number of languages over the years) is one that pulls query string parameters to retrieve both an XML datasource and an XSLT transformation, passing these and any other parameters into the transformation. An example written in PHP 5 (transform.php) is shown as follows:

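<?php

header("Content-Type: text/xml");
/*
// Do a Server Dump
foreach ($_SERVER as $key => $value){
     print("$key:$value<br/>");
     }
*/
/* use either & or ; as delimiters in query string */
$qsbase = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
/* preg_split() stands in for split(), which was removed in PHP 7 */
$qs_arr = preg_split('/[;&]/', $qsbase, -1, PREG_SPLIT_NO_EMPTY);
$qs = array();
foreach ( $qs_arr as $value ){
     /* a limit of 2 keeps any further '=' inside the value */
     $pair = explode('=', $value, 2);
     /* QUERY_STRING is raw, so decode names and values here */
     $qs[urldecode($pair[0])] = isset($pair[1]) ? urldecode($pair[1]) : '';
     }
$qs['_server'] = "http://".$_SERVER['SERVER_NAME'].$_SERVER['SCRIPT_NAME'];
/* load the xml file and stylesheet as domdocuments */
/* note: x and xt are loaded exactly as given, so restrict them to trusted sources */
$xtpath = isset($qs['xt']) ? $qs['xt'] : '';
$xpath = isset($qs['x']) ? $qs['x'] : '';
$xsl = new DomDocument();
$xsl->load($xtpath);
$inputdom = new DomDocument();
$inputdom->load($xpath);
$proc = new XsltProcessor();
/* make PHP functions callable from the stylesheet */
$proc->registerPhpFunctions();
$proc->importStylesheet($xsl);
/* pass every query string parameter into the transformation */
foreach ($qs as $key => $value){
     $proc->setParameter('', $key, $value);
     }
$newdom = $proc->transformToDoc($inputdom);
print $newdom->saveXML();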

Listing 3-1. transform.php

This particular script is useful for a number of reasons. It effectively lets you use one or more XML sources to populate a site using XSLT, which was essentially designed for the task. The conversion of query string parameters into XSLT parameters is straightforward. Most server-side XSLT implementations (such as the libXSLT used by PHP 5) include support both for the EXSLT library functions (which were the precursors to the upcoming XSLT 2.0 specification and can be found at www.exslt.org) and for extension functions in the host language. In transform.php this is indicated by $proc->registerPhpFunctions(), which makes PHP functions callable from within the stylesheet through the php: namespace. More information about this specific functionality is available at http://ca3.php.net/xsl_xsltprocessor_register_php_functions.
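The query string handling can be pulled out into a small function and tried on its own. The following is only a sketch: parseQueryString() is a hypothetical name, stub.xml and the example.org feed URL are placeholders, and the sketch URL-decodes names and values, which is an assumption about how the request is built.

```php
<?php
// Hypothetical helper: split a raw query string on '&' or ';' and
// URL-decode each name and value.
function parseQueryString(string $raw): array
{
    $params = [];
    foreach (preg_split('/[;&]/', $raw, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // A limit of 2 keeps any further '=' characters inside the value.
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $params[$name] = $value;
    }
    return $params;
}

// Placeholder request mixing both delimiters.
$params = parseQueryString(
    'x=stub.xml;xt=ProcessNewsFeed.xsl&feed=http%3A%2F%2Fexample.org%2Frss.xml'
);
print_r($params);
```

Each resulting name and value pair is what would then be handed to the XSLT processor as a stylesheet parameter.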

An important point to understand in an application such as this is that it does not carry any semantic information by itself. Instead, the semantics are supplied by parameters - the URI of a data source, the URI of a transformation, potentially secondary data sources and conditionals. When your information moves around as XML, this particular approach can prove extraordinarily powerful, especially in conjunction with the XMLHttpRequest capability increasingly resident on the client. For instance, consider the previous component, which provided not only a way of loading content from the server into a container, but also a means to "chunk" that data into pages. Combine it with a server component that is able to load information from any external XML resource, possibly cleaning that information of potentially dangerous content (script code, style content, inline event handlers, and so forth).

Such a "mashup" makes an incredible amount of sense when used for RSS newsfeeds. Such feeds, produced by Web news portals and other sites, contain article links and synopses, and are increasingly being used to carry the text content of articles directly. Moreover, the general structure shared by most RSS formats, which looks roughly as follows:

<feed>
   <header>
     <title>Title Text</title>
     <link>LinkURI</link>
     <id>headerGUID</id>
     <summary>Header Summary Information</summary>
     <item>
       <title>Item 1 Title</title>
       <link>ItemLinkURI</link>
       <id>itemGUID</id>
       <summary>Item Summary Information</summary>
     </item>
     <item>
       <title>Item 2 Title</title>
       <link>ItemLinkURI</link>
       <id>itemGUID</id>
       <summary>Item Summary Information</summary>
     </item>
     <!-- More Items -->
   </header>
</feed>

works well for transporting any bundle or list of items. As it turns out, such lists are remarkably common - news items, lists of pictures, mail, membership lists, event listings, houses (and other items) for sale, grocery lists - indeed, the lists of such lists are well-nigh endless.
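To make the "list of items" idea concrete, here is a minimal sketch that reads a document of the generic shape shown above into a plain PHP array. The sample grocery data and the example.org links are invented for illustration, and the element names simply follow the generic outline rather than any one real feed format.

```php
<?php
// Invented sample data in the generic feed shape outlined above.
$xml = <<<XML
<feed>
  <header>
    <title>Grocery List</title>
    <item>
      <title>Milk</title>
      <link>http://example.org/items/milk</link>
      <id>item-1</id>
      <summary>Two litres</summary>
    </item>
    <item>
      <title>Bread</title>
      <link>http://example.org/items/bread</link>
      <id>item-2</id>
      <summary>Whole wheat</summary>
    </item>
  </header>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Turn each <item> into an associative array of its four fields.
$items = [];
foreach ($xpath->query('/feed/header/item') as $item) {
    $items[] = [
        'title'   => $xpath->evaluate('string(title)', $item),
        'link'    => $xpath->evaluate('string(link)', $item),
        'id'      => $xpath->evaluate('string(id)', $item),
        'summary' => $xpath->evaluate('string(summary)', $item),
    ];
}

foreach ($items as $entry) {
    echo $entry['title'], ' -> ', $entry['link'], "\n";
}
```

Once the items are in an array like this, selecting, filtering or paging through them is ordinary list handling.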

Moreover, a list of items, unlike a singleton piece of Web page content, tends to suggest obvious applications. The aforementioned grocery list, for instance, opens up possibilities for creating marketing tie-ins for supermarkets - you enter a list of typical groceries that you need (or that the stores can glean from buying habits) and the store can then tie into the list of those items with promotional coupons, recipe recommendations, and so forth, all downloaded to your friendly RSS feed, ready for both linking and printing. Couple that with the ability to select from that list and you've got a very nice and remarkably simple application.

The challenge currently faced in this particular space comes from an embarrassment of riches. There are at least 14 such RSS feed formats in active use today - each with slight (or sometimes fairly profound) variations, and each used by a large enough market share that the inevitable whittling down of such alternate standards is only just beginning to be felt.

Most news feeds fit into one of three distinct families:

UserLand RSS: A modification of the original RSS specification introduced by Netscape in 1999, this encompasses RSS 0.91 through 0.94 and RSS 2.0, but not RSS 1.0. While ostensibly XML, the UserLand feeds often violate general XML standards and tend to cause headaches for developers working in that space.

RDF Based: RSS 1.0 was an attempt by a number of such developers to create an RSS specification based upon the W3C's RDF language, for which it is well suited. It was the creation of RSS 1.0 that prompted the UserLand faction to create RSS 2.0.

Atom Feeds: Recognizing the benefits of RSS feeds and especially their applicability to areas such as blogging, yet another group (which had some overlap with the second group - the politics of the Web are just as engaging as the politics of anything else) created the Atom format and a publishing API to be used with it. Atom has been gaining traction, especially in XML-oriented systems.
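In practice the three families can usually be told apart by the root element of the feed and its namespace. The sketch below assumes the feed has already been loaded into a DOMDocument; detectFeedFamily() and its return labels are hypothetical names, and the namespace URIs are the ones declared in the stylesheet in this section.

```php
<?php
// Hypothetical helper: classify a feed by its root element and namespace.
function detectFeedFamily(DOMDocument $doc): string
{
    $root = $doc->documentElement;
    if ($root === null) {
        return 'unknown';
    }
    $ns   = (string) $root->namespaceURI;   // '' when the root has no namespace
    $name = $root->localName;

    if ($name === 'RDF' && $ns === 'http://www.w3.org/1999/02/22-rdf-syntax-ns#') {
        return 'rdf';        // RSS 1.0
    }
    if ($name === 'rss' && $ns === '') {
        return 'userland';   // RSS 0.91 through 2.0
    }
    if ($name === 'feed'
        && ($ns === 'http://www.w3.org/2005/Atom' || $ns === 'http://purl.org/atom/ns#')) {
        return 'atom';       // Atom 1.0 or the older Atom 0.3
    }
    return 'unknown';
}

$doc = new DOMDocument();
$doc->loadXML('<rss version="2.0"><channel><title>Example</title></channel></rss>');
echo detectFeedFamily($doc), "\n";
```

A check of this kind mirrors what the root-level template matches in a stylesheet do, and can be handy for logging or for choosing between several transformations.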

As it turns out, sometimes the simplest solution to dealing with these various formats is to run a transformation that will map any of them to a single target format, which can then either be transformed to an appropriate output format or used as that format itself. While the best direct path would be to a format such as Atom, for purposes of this chapter I decided to focus on developing an output to an XHTML format that nonetheless contains enough information to be factored into the XInclude component described above. This transformation is given in ProcessNewsFeed.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rss="http://purl.org/rss/1.0/"
   xmlns:a="http://www.w3.org/2005/Atom" xmlns:exslt="http://exslt.org/common" xmlns=""
   xmlns:a3="http://purl.org/atom/ns#" xmlns:h="http://www.w3.org/1999/xhtml" version="1.0"
   exclude-result-prefixes="rdf rss a a3 exslt">
   <!-- Generates xml output -->
   <xsl:output method="xml" media-type="text/xml" indent="yes" cdata-section-elements="a"/>
   <!-- $x = a stub -->
   <xsl:param name="x"/>
   <!-- $xt = The transformation file - in most cases, this file -->
   <xsl:param name="xt"/>
   <!-- $feed = The URL of the feed (ampersands can be replaced with semi-colons)
-->
   <xsl:param name="feed"
   select="'http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml'"/>
   <!-- $feedDoc = An instance of the initial feed document -->
   <xsl:variable name="feedDoc" select="document($feed)"/>
   <!-- Transform works on the external feed, not the initial stub -->
   <xsl:template match="/">
     <xsl:apply-templates select="$feedDoc/*"/>
   </xsl:template>
   <!-- RSS 1.0 specification root -->
   <xsl:template match="rdf:RDF">
   <h:div class="newsfeed">
     <xsl:apply-templates select="rss:channel" mode="rss1"/>
   </h:div>
   </xsl:template>
   <xsl:template match="rss:channel" mode="rss1">
   <h:h1 class="feedtitle">
   <xsl:value-of select="rss:title"/>
   </h:h1>
   <h:ul>
     <xsl:apply-templates select="rss:items" mode="rss1"/>
   </h:ul>
   </xsl:template>
   <xsl:template match="rss:items" mode="rss1">
   <xsl:for-each select="rdf:Seq/rdf:li/@rdf:resource">
   <xsl:variable name="currentResource" select="string(.)"/>
   <h:li>
     <xsl:apply-templates
   select="/rdf:RDF/rss:item[string(@rdf:about) = $currentResource]" mode="rss1"/>
   </h:li>
   </xsl:for-each>
   </xsl:template>
   <xsl:template match="rss:item" mode="rss1">
   <h:div class="item" id="{generate-id(.)}">
   <!-- The $descr/$description variables are used to retrieve descriptive content
and sanitize this content to ensure it's safe for browser consumption
-->
   <xsl:variable name="descr">
   <xsl:value-of select="rss:description/text()" disable-output-escaping="yes"/>
   </xsl:variable>
   <xsl:variable name="description">
   <xsl:for-each select="exslt:node-set($descr)">
     <xsl:apply-templates select="*|text()" mode="sanitize"/>
   </xsl:for-each>
   </xsl:variable>
   <h:div class="item_title">
     <h:a href="{rss:link}" target="display" title="{$description}">
       <xsl:value-of select="rss:title"/>
     </h:a>
   </h:div>
   <h:div class="item_description">
     <xsl:value-of select="$description" disable-output-escaping="no"/>
       </h:div>
     </h:div>
   </xsl:template>
   <xsl:template match="rss[@version='2.0' or @version='0.91']">
   <h:div class="newsfeed">
     <xsl:apply-templates select="channel" mode="rss2"/>
   </h:div>
   </xsl:template>
   <xsl:template match="channel" mode="rss2">
   <h:h1 class="feed-title">
     <h:img src="{image/url}" height="19px" align="left" style="margin-right:3px;"/>
       <xsl:value-of select="title"/>
   </h:h1>
   <h:ul>
       <xsl:apply-templates select="item" mode="rss2"/>
   </h:ul>
   </xsl:template>
   <xsl:template match="item" mode="rss2">
   <xsl:variable name="descr">
     <xsl:value-of select="description/text()" disable-output-escaping="no"/>
   </xsl:variable>
   <xsl:variable name="description">
   <xsl:for-each select="exslt:node-set($descr)">
       <xsl:apply-templates select="*|text()" mode="sanitize"/>
   </xsl:for-each>
   </xsl:variable>
   <h:li>
   <h:div class="item" id="{generate-id(.)}">
   <h:div class="item_title">
     <h:a href="{link}" target="display" title="{$description}">
       <xsl:value-of select="title"/>
   </h:a>
   </h:div>
   <h:div class="item_description">
       <xsl:value-of select="$description" disable-output-escaping="no"/>
   </h:div>
   </h:div>
   </h:li>
   </xsl:template>
   <xsl:template match="a:feed">
   <h:div class="newsfeed">
   <h:h1 class="feed-title">
     <xsl:value-of select="a:title"/>
   </h:h1>
   <h:ul>
       <xsl:apply-templates select="a:entry" mode="atom"/>
   </h:ul>
   </h:div>
   </xsl:template>
   <xsl:template match="a:entry" mode="atom">
   <xsl:variable name="descr">
     <xsl:value-of select="a:summary/text()" disable-output-escaping="yes"/>
   </xsl:variable>
   <xsl:variable name="description">
   <xsl:for-each select="exslt:node-set($descr)">
       <xsl:apply-templates select="*|text()" mode="sanitize"/>
   </xsl:for-each>
   </xsl:variable>
   <h:li>
   <h:div class="item" id="{generate-id(.)}">
   <h:div class="item_title">
     <!-- Atom 1.0 carries the link target in the href attribute of a:link -->
     <h:a href="{a:link/@href}" target="display" title="{$description}">
       <xsl:value-of select="a:title"/>
   </h:a>
   </h:div>
   <h:div class="item_description">
     <xsl:value-of select="$description" disable-output-escaping="no"/>
   </h:div>
   </h:div>
   </h:li>
   </xsl:template>
   <xsl:template match="a3:feed">
   <h:div class="newsfeed">
   <h:h1 class="feed-title">
   <xsl:value-of select="a3:title"/>
   </h:h1>
   <h:ul>
     <xsl:apply-templates select="a3:entry" mode="atom3"/>
   </h:ul>
   </h:div>
   </xsl:template>
   <xsl:template match="a3:entry" mode="atom3">
   <xsl:variable name="descr">
     <xsl:value-of select="a3:summary/text()" disable-output-escaping="no"/>
   </xsl:variable>
   <xsl:variable name="description">
   <xsl:for-each select="exslt:node-set($descr)">
       <xsl:apply-templates select="*|text()" mode="sanitize"/>
   </xsl:for-each>
   </xsl:variable>
   <h:li>
   <h:div class="item" id="{generate-id(.)}">
   <h:div class="item_title">
     <h:a href="{a3:link/@href}" target="display" title="{$description}">
   <xsl:value-of select="a3:title"/>
   </h:a>
   </h:div>
   <h:div class="item_description">
   <xsl:copy-of select="$description"/>
   </h:div>
   </h:div>
   </h:li>
   </xsl:template>
<!-- In general everything passes through the sanitation routine, except ...
-->
   <xsl:template match="*|@*|text()" mode="sanitize">
   <xsl:copy>
       <xsl:apply-templates select="*|@*|text()" mode="sanitize"/>
   </xsl:copy>
   </xsl:template>
   <!-- script blocks -->
   <xsl:template match="*[local-name(.)='script']" mode="sanitize"/>
   <!-- style blocks (because of the possibility of behaviors and XBL bindings)
-->
   <xsl:template match="*[local-name(.)='style']" mode="sanitize"/>
   <!-- link blocks (which can pull in scripts) -->
   <xsl:template match="*[local-name(.)='link']" mode="sanitize"/>
   <!-- and any attribute beginning with 'on', which typically indicates an event handler -->
   <xsl:template match="@*[starts-with(local-name(.),'on')]" mode="sanitize"/>
</xsl:stylesheet>

This code does a number of things, but primarily it pulls in the newsfeed from the URL given in the feed parameter, then uses a series of staggered templates, one set for each feed family, to attempt to format the feed into blocks of XHTML.

The one point of complexity worth noting in this is the fact that <description>, <content>, and <summary> blocks often have inline HTML content that is carried in a CDATA section. This can be resolved into XHTML, but there is a danger in this - once the code is rendered in the browser, potentially dangerous inline script code contained in script blocks, inline event handlers, or stylesheets (via XBL or other behavioral bindings) could be executed. The above transformation should strip the content of such elements and thus render it inert.
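The same stripping rules can also be tried out directly in PHP, which is sometimes easier to test than a stylesheet. The following is only a sketch under stated assumptions and is not one of the book's listings: sanitizeFragment() is a hypothetical helper, the input is assumed to be an HTML fragment already extracted from a feed item, and character-encoding handling is left out. It removes script, style and link elements and any attribute whose name begins with "on", as the sanitize templates above do; it does not attempt to catch other risks such as javascript: URLs.

```php
<?php
// Hypothetical helper, not part of the book's listings.
function sanitizeFragment(string $html): string
{
    $doc = new DOMDocument();
    // Feed markup is often sloppy, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its content can be found again after parsing.
    $doc->loadHTML('<div id="fragment">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Collect first, then remove, so nothing is modified while iterating.
    $doomed = iterator_to_array($xpath->query('//script | //style | //link'));
    foreach ($doomed as $node) {
        $node->parentNode->removeChild($node);
    }

    // Attributes beginning with "on" typically indicate event handlers.
    $handlers = iterator_to_array($xpath->query('//@*[starts-with(name(), "on")]'));
    foreach ($handlers as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only what is inside the wrapper element.
    $wrapper = $xpath->query('//div[@id="fragment"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitizeFragment('<p onclick="steal()">Hi <script>alert(1)</script><b>there</b></p>'), "\n";
```

Because transform.php registers PHP functions with the XSLT processor, a helper along these lines could in principle be called from the stylesheet as well, though that wiring is not shown here.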

More Stories By Kurt Cagle

Kurt Cagle is a developer and author, with nearly 20 books to his name and several dozen articles. He writes about Web technologies, open source, Java, and .NET programming issues. He has also worked with Microsoft and others to develop white papers on these technologies. He is the owner of Cagle Communications and a co-author of Real-World AJAX: Secrets of the Masters (SYS-CON books, 2006).
