Microservices Expo: Article

SOA with Document-Centric XML Processing

This article introduces the concept of document-centric XML processing and a set of emerging document-centric capabilities, such as cutting, splitting, and splicing documents at the byte level. It also explains how this approach solves one of the most fundamental technical issues hampering enterprise SOA and XML application development: the redundant serialization and de-serialization imposed by object-oriented XML processing models such as DOM.

Public Enemy #1: DOM's Problem of Modifying XML
For a DOM-based application to modify a single text node of an XML document, it must perform the following steps:

  1. Decode characters
  2. Create string objects by taking apart the input document
  3. Allocate node objects to build the DOM tree
  4. Navigate to the text node (manually or by XPath)
  5. Attach a new text node
  6. Encode characters
  7. Concatenate bytes
  8. Garbage-collect node and string objects
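The eight steps above can be sketched with Python's standard `xml.dom.minidom` module. This is a minimal illustration, not any particular product's code; the function name `dom_update_text`, the sample `<order>` document, and the choice to serialize only the root element (which drops any XML declaration) are assumptions made for brevity. Notice that every byte of the input is decoded, turned into objects, and re-encoded, even though only one value changes.

```python
from xml.dom import minidom

def dom_update_text(raw: bytes, tag: str, new_text: str) -> bytes:
    """Hypothetical helper: replace the text of the first <tag> the DOM way."""
    # Steps 1-3: decode the bytes and build string and node objects for the whole document.
    doc = minidom.parseString(raw)
    # Step 4: navigate to the target element.
    elem = doc.getElementsByTagName(tag)[0]
    # Step 5: drop the old children and attach a new text node.
    while elem.firstChild is not None:
        elem.removeChild(elem.firstChild)
    elem.appendChild(doc.createTextNode(new_text))
    # Steps 6-7: encode every character again and concatenate the output bytes
    # (root element only, so no XML declaration is emitted).
    out = doc.documentElement.toxml("utf-8")
    # Step 8: release the tree; all node and string objects become garbage.
    doc.unlink()
    return out

sample = b"<order><id>42</id><price>9.99</price></order>"
print(dom_update_text(sample, "price", "12.50"))
# b'<order><id>42</id><price>12.50</price></order>'
```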

If you focus on the objective, which is to change one text node, I think many readers will realize that the process outlined above doesn't really make sense. It is, in fact, absurd. DOM processing incurs at least the following three round-trip overheads in those steps:

  • Every time a character is decoded, it eventually needs to be encoded again.
  • Every time a document is taken apart to make a change, it needs to be put back together (by concatenation).
  • Every time an object (e.g., strings, nodes, etc.) is allocated, it will eventually go out of scope and be garbage collected.
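Two of these round trips are easy to observe directly. The sketch below again uses Python's `xml.dom.minidom`; the helper `count_nodes` and the sample document are illustrative assumptions. It shows that decoding and re-encoding unchanged bytes merely reproduces the input, and it counts the node objects DOM allocates, all of which eventually become garbage, even when the goal is to change a single value.

```python
from xml.dom import minidom

def count_nodes(node) -> int:
    """Count node objects in the subtree rooted at `node`, including `node` itself."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)

raw = "<order><id>42</id><price>9.99</price></order>".encode("utf-8")

# Round trip 1: decoding and then re-encoding unchanged text yields the input bytes.
assert raw.decode("utf-8").encode("utf-8") == raw

# Round trip 3: DOM allocates an object for every node, not just the one to change.
doc = minidom.parseString(raw)
print(count_nodes(doc))  # 6: the document, 3 elements, and 2 text nodes
doc.unlink()  # release the tree; its objects are now garbage
```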

Because those round trips largely restore the document to its original state, they are nothing but a waste of CPU cycles and memory. Notice that a human with a text editor can modify a text node far more efficiently. To edit a text node, just open the document in Notepad, move the cursor to the text node, make the change in place, and save it. This time, the update is "incremental," meaning it does not touch irrelevant parts of the document. If we humans can edit XML like this, why can't XML parsers?
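The text-editor approach translates directly to bytes: find the byte offset and length of the text to change, splice in the new bytes, and copy everything else verbatim. The sketch below is a hedged illustration of that idea, not the API of any real document-centric parser. The helper `locate_text` is hypothetical and uses a naive byte search that only works when the element has no attributes and contains plain text; a real document-centric parser would record such offsets while parsing.

```python
from typing import Tuple

def locate_text(doc: bytes, tag: bytes) -> Tuple[int, int]:
    # Hypothetical helper: naive byte search. Assumes <tag> has no attributes and
    # its content is plain text (no child elements, comments, or CDATA).
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    start = doc.index(open_tag) + len(open_tag)
    end = doc.index(close_tag, start)
    return start, end - start

def splice(doc: bytes, offset: int, length: int, replacement: bytes) -> bytes:
    # Bytes outside [offset, offset + length) are copied as-is: they are never
    # decoded, turned into objects, or re-encoded.
    return doc[:offset] + replacement + doc[offset + length:]

raw = b"<order><id>42</id><price>9.99</price></order>"
offset, length = locate_text(raw, b"price")
print(splice(raw, offset, length, b"12.50"))
# b'<order><id>42</id><price>12.50</price></order>'
```

Two caveats apply to this sketch: the replacement bytes must already be encoded in the document's encoding and escaped (for example, `&` as `&amp;`), and because Python `bytes` objects are immutable, the prefix and suffix are still copied. The point is that they are copied as raw bytes rather than decoded, rebuilt as objects, and re-encoded.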

To me, the answer to this question reveals some of the deep-rooted technical problems in software development today. Below are some of my observations on this topic:

  • It significantly impacts your application performance: When applications process XML in a read-only fashion, baseline performance is determined by XML parsing. If applications both read and write XML data, baseline performance is typically cut in half, because serialization costs roughly as much as de-serialization.
  • It is a common, but deep, problem: Given that XML is ubiquitous, have you wondered why nobody seems to be complaining? One explanation is that, because things have always been done this way, everyone has grown used to it. To make matters worse, no existing solution offers the contrast that would make the problem obvious. We end up with a ubiquitous issue that is surprisingly non-obvious and from which there is almost no escape.
  • Hidden from the OO perspective: If you live in a pure OO world, the redundant de-serialization/serialization process (the textbook approach to XML processing) is very much the right thing to do, so, once again, the problem stays hidden.
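To check the performance observation against your own workload, you can time a read-only pass against a read-and-write pass. The sketch below uses `xml.dom.minidom` and `timeit` from Python's standard library; the synthetic `<items>` document and the repetition count are arbitrary assumptions, and the ratio you see will depend on the parser, the serializer, and the data, so it may not match the "cut in half" figure exactly.

```python
import timeit
from xml.dom import minidom

def parse_only(raw: bytes) -> None:
    # Read-only: build the tree, then release it.
    minidom.parseString(raw).unlink()

def parse_and_serialize(raw: bytes) -> bytes:
    # Read and write: build the tree, then serialize the whole document again.
    doc = minidom.parseString(raw)
    out = doc.toxml("utf-8")
    doc.unlink()
    return out

# Hypothetical sample document: 1,000 <item> elements.
raw = ("<items>"
       + "".join(f"<item id='{i}'>value {i}</item>" for i in range(1000))
       + "</items>").encode("utf-8")

read_only = timeit.timeit(lambda: parse_only(raw), number=5)
read_write = timeit.timeit(lambda: parse_and_serialize(raw), number=5)
print(f"parse only: {read_only:.3f}s, parse + serialize: {read_write:.3f}s")
print(f"read/write throughput relative to read-only: {read_only / read_write:.0%}")
```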

More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high-performance XML processing solutions. He has worked in electronic design automation and Voice over IP at a number of Silicon Valley high-tech companies. He holds both a BS and an MS from the EECS department at U.C. Berkeley.
