By Jimmy Zhang | September 14, 2008 11:00 PM EDT
My question becomes: Am I the only one seeing the elephant in the room?
How VTD-XML Changes the Picture
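Simply put, VTD-XML makes the problem go away rather than working around it: because the original XML text stays intact in memory, an application can change only the bytes that need changing instead of rebuilding and re-serializing the whole document.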
The first part of this article series introduced VTD-XML as a memory-efficient, high-performance XML parser with integrated indexing and XPath. Virtually every technical benefit of VTD-XML is, one way or another, the result of non-extractive parsing, meaning the original XML text is loaded in memory and fully preserved. However, the most important benefits of VTD-XML - the ones that truly set it apart from other XML processing models - lie in its unique ability to manipulate XML document content at the byte level. Below are three distinct, yet related, sets of capabilities available in the latest version of VTD-XML.
- Incremental XML modifier: You can modify an XML document incrementally through the XMLModifier, which defines three types of "modify" operations: inserting new content at any location (i.e., offset) in the document, deleting content (by specifying the offset and length), and replacing old content with new content, which is effectively a deletion and an insertion at the same location. To compose a new document containing all the changes, call XMLModifier's output(...) method.
- XML slicer and splicer: You can use a pair of integers (offset and length) to address a segment of XML text so your application can slice the segment from the original document and move it to another location in the same or a different document. The VTDNav class exposes two methods that allow you to address an element fragment: getElementFragment(), which returns a 64-bit integer encoding the offset and length of the current element, and getElementFragmentNs() (in the latest version), which returns an ElementFragmentNs object representing a "namespace-compensated" element fragment. The latest version also transparently supports transcoding, so you can cut and paste across documents with different encoding formats.
- XML editor: You can directly edit the in-memory copy of the XML text using VTDNav's overWrite(...) method, provided that the original tokens you're overwriting are wide enough to hold the new byte content.
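To make the offset/length arithmetic behind slicing, splicing, and in-place overwriting concrete, here is a minimal Java sketch over a plain byte array. It is not VTD-XML code: the class FragmentOps and all of its methods are hypothetical, and the offsets are computed by hand rather than obtained from VTDNav. The only detail borrowed from VTD-XML is the packing assumed for getElementFragment() (length in the upper 32 bits, offset in the lower 32 bits); namespace compensation and transcoding are left out.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of VTD-XML: it only mimics, on a plain byte
// array, the offset/length arithmetic behind slicing, splicing, and overwriting.
final class FragmentOps {
    private FragmentOps() {}

    // Pack an (offset, length) pair into one long, assuming the layout the
    // VTD-XML Javadoc gives for getElementFragment(): length in the upper
    // 32 bits, offset in the lower 32 bits.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offset(long fragment) {
        return (int) fragment; // keep the lower 32 bits
    }

    static int length(long fragment) {
        return (int) (fragment >>> 32); // keep the upper 32 bits
    }

    // Slice: copy the addressed segment out of the source document.
    static byte[] slice(byte[] doc, long fragment) {
        int off = offset(fragment);
        return Arrays.copyOfRange(doc, off, off + length(fragment));
    }

    // Splice: return a new buffer with `piece` inserted into `target` at `at`.
    static byte[] splice(byte[] target, int at, byte[] piece) {
        byte[] out = new byte[target.length + piece.length];
        System.arraycopy(target, 0, out, 0, at);
        System.arraycopy(piece, 0, out, at, piece.length);
        System.arraycopy(target, at, out, at + piece.length, target.length - at);
        return out;
    }

    // In-place edit: write `replacement` over the token at [offset, offset + length).
    // Fails if the token is too narrow; otherwise pads the leftover bytes with
    // spaces so that no other byte in the buffer moves.
    static boolean overWrite(byte[] doc, int offset, int length, byte[] replacement) {
        if (replacement.length > length) {
            return false;
        }
        System.arraycopy(replacement, 0, doc, offset, replacement.length);
        Arrays.fill(doc, offset + replacement.length, offset + length, (byte) ' ');
        return true;
    }
}
```

For "<a><b>x</b><c/></a>", the <b> element starts at offset 3 and is 8 bytes long, so slice(doc, pack(3, 8)) yields "<b>x</b>", which splice can then drop into another buffer. Padding with spaces is this sketch's own choice; inside a text node those spaces become part of the value.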
To use VTD-XML as an incremental modifier to update the text node, you basically navigate the VTD records to the right location, stick in the change, and generate a new document, exactly the way you would with Notepad. Listing 1 shows a simple application that updates a text node using VTD-XML.
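The locate, change, and compose workflow can be sketched in Java without any object model. The class IncrementalModifier below is hypothetical and far simpler than VTD-XML's XMLModifier: edits are given as raw byte offsets that, in a real application, would come from navigating the VTD records, and overlapping edits are simply rejected. What it illustrates is that output() copies every untouched byte range verbatim, so unchanged content is never re-serialized.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class, NOT com.ximpleware.XMLModifier: a toy model of
// recording byte-level edits and composing a new document from them.
final class IncrementalModifier {
    // One pending edit: drop `length` original bytes at `offset`, emit `content` there.
    private static final class Edit {
        final int offset;
        final int length;
        final byte[] content;
        final int seq; // insertion order, used to break ties

        Edit(int offset, int length, byte[] content, int seq) {
            this.offset = offset;
            this.length = length;
            this.content = content;
            this.seq = seq;
        }
    }

    private final byte[] doc;
    private final List<Edit> edits = new ArrayList<>();

    IncrementalModifier(byte[] doc) {
        this.doc = doc;
    }

    void insert(int offset, String text) { add(offset, 0, text); }

    void remove(int offset, int length) { add(offset, length, ""); }

    // A replacement is a deletion plus an insertion at the same offset.
    void replace(int offset, int length, String text) { add(offset, length, text); }

    private void add(int offset, int length, String text) {
        edits.add(new Edit(offset, length, text.getBytes(StandardCharsets.UTF_8), edits.size()));
    }

    // Compose the new document: untouched byte ranges are copied verbatim
    // and the recorded edits are spliced in; `doc` itself is never modified.
    byte[] output() {
        List<Edit> sorted = new ArrayList<>(edits);
        // By offset; at equal offsets, pure insertions (length 0) go first.
        sorted.sort(Comparator.<Edit>comparingInt(e -> e.offset)
                .thenComparingInt(e -> e.length)
                .thenComparingInt(e -> e.seq));
        ByteArrayOutputStream out = new ByteArrayOutputStream(doc.length);
        int cursor = 0; // first original byte not yet copied or skipped
        for (Edit e : sorted) {
            if (e.offset < cursor) {
                throw new IllegalStateException("overlapping edit at offset " + e.offset);
            }
            out.write(doc, cursor, e.offset - cursor); // unchanged bytes
            out.write(e.content, 0, e.content.length); // new bytes
            cursor = e.offset + e.length;              // skip deleted bytes
        }
        out.write(doc, cursor, doc.length - cursor);   // unchanged tail
        return out.toByteArray();
    }
}
```

For example, on "<a><b>old</b></a>", calling replace(6, 3, "new") and then insert(13, "<c/>") produces "<a><b>new</b><c/></a>", while the source buffer still reads "<a><b>old</b></a>".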
XML Processing: Object-Oriented vs. Document-Centric
Traditional XML processing models, such as DOM, SAX, and various object data binding tools, are designed around the notion of objects. The XML text, merely the output of object serialization, is relegated to the status of a second-class citizen. You base your applications on DOM nodes, strings, and various business objects, but rarely on the physical documents. If you have followed my analysis so far, it should be obvious that this object-oriented approach to XML processing makes little sense, because it causes performance hits from virtually all directions (an in-depth discussion of the topic can be found in "the performance woe of binary XML"). Not only are object creation and garbage collection inherently memory- and CPU-intensive, but applications also incur the cost of re-serializing the document after even the smallest change to the original text.
What is "document-centric" XML processing? In non-extractive parsing, the XML text (the persistent data format) is the starting point from which everything else follows. Whether you are parsing, evaluating XPath, modifying content, or slicing element fragments, by default you no longer work with objects; you create them only when it makes sense. More often than not, you treat documents purely as syntax and think in terms of bytes, byte arrays, integers, offsets, lengths, fragments, and namespace-compensated fragments. The first-class citizen in this paradigm is the XML text. The object-centric notions of XML processing, such as serialization and deserialization (or marshaling and unmarshaling), as shown in Figure 1, are often displaced, if not replaced, by the more document-centric notions of parsing and composition. Increasingly, you will find that your XML programming experience is getting simpler. Not surprisingly, the simpler, more intuitive way to think about XML processing is also the most efficient and powerful (see Table 1 for a technical comparison of DOM and VTD-XML).
Copyright © 2008 SYS-CON Media, Inc. — All Rights Reserved.
About the Author
Jimmy Zhang is a cofounder of XimpleWare, a provider of high-performance XML processing solutions. He has worked in electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds BS and MS degrees in EECS from U.C. Berkeley.