OK, so how do we store XML?
Published: 06 Mar 2003 10:34 GMT

There's no debating the trend toward widespread adoption of XML in the development industry. However, there are no industry standards for storing XML documents, which means near-zero interoperability between different vendors' products. Further, when conventional relational databases store large XML documents, storage and processing problems lead to poor system performance or meaningless search results, and the prevalent strategies for working around these problems create complications of their own.
If XML use continues to grow at its current rate, these issues will clearly have to be overcome. Two possible solutions are a more XML-friendly query language or more XML-friendly database systems. Before diving into these alternatives, allow me to first explain what's wrong with our current approaches.
XML + RDBMS = nightmare
At the risk of sounding like a bad sci-fi picture, imagine this: In the not-too-distant future, user-defined XML schemas will be widely used to describe data residing in all manner of enterprise-wide systems, from Microsoft Office documents on a central server to customer relationship management systems to business-to-business Web services. These schemas are non-standard in the extreme. Developers are forced to use SQL to search for and retrieve XML documents from the relational database management systems (RDBMSs) typically deployed for persistent data storage.
The two most common solutions for storing XML in an RDBMS, mapping the schema to database rows and storing the entire document in a single character large object (CLOB) field, both have limitations. In the mapping method, the database has no awareness of the data's context or hierarchy: parts of the XML document are spread around the database and physically occupy different parts of the server. As a result, any SQL query that needs a document involves a time-consuming search for, and reconstruction of, its parts. The CLOB method avoids these context issues. Instead of mapping the schema to rows, the database keeps the document's context and hierarchy together in one unit. However, a SQL query cannot look inside the field holding the document and interpret it; the only way to examine part of a document is to return the whole thing in a result set.
In simple terms, we're talking about a potential nightmare here. The only real solution lies in choosing either a different type of database or a different query language.