XPath is a language for selecting nodes from an XML document. XPath is used extensively in XSLT and other XML technologies. I also vastly prefer using XPath (e.g. with XPathNavigator) over the XML DOM when manipulating XML in a non-streaming fashion.
In XPath, strings must be delimited by either single or double quotes. Whichever quote character delimits a string cannot itself appear within that string, and XPath provides no escape sequence to work around this. This means that if you delimit your XPath string with single quotes, you can’t represent the string O'Reilly; use double quotes, and you can’t represent "Hello".
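One common workaround, when a string contains both quote characters, is to build it from pieces with XPath's concat() function. The following is a minimal sketch of that idea; the helper name ToXPathLiteral is hypothetical and not part of any library mentioned here.

```cpp
#include <string>

// Hypothetical helper: returns an XPath 1.0 expression that evaluates to
// the given text, choosing delimiters that do not clash with its contents.
std::string ToXPathLiteral(const std::string& text)
{
    // No single quote inside: single-quote delimiters are safe.
    if (text.find('\'') == std::string::npos)
        return "'" + text + "'";

    // No double quote inside: double-quote delimiters are safe.
    if (text.find('"') == std::string::npos)
        return "\"" + text + "\"";

    // Both quote characters occur: split the text at each double quote,
    // wrap the pieces in double quotes and each double quote in single
    // quotes, then join everything with concat().
    std::string result = "concat(";
    bool first = true;
    auto append = [&](const std::string& argument) {
        if (!first)
            result += ",";
        result += argument;
        first = false;
    };

    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find('"', start);
        const std::string piece = (pos == std::string::npos)
            ? text.substr(start)
            : text.substr(start, pos - start);
        if (!piece.empty())
            append("\"" + piece + "\"");
        if (pos == std::string::npos)
            break;
        append("'\"'");
        start = pos + 1;
    }
    return result + ")";
}
```

The result is meant to be spliced into a larger XPath expression wherever a string literal would otherwise go.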
Microsoft has created a new, lightweight C++ XML processing library called XmlLite. It includes a streaming XML writing class patterned after .NET’s System.Xml.XmlWriter.
XmlTextWriter is .NET’s class for writing XML in a forward-only streaming manner. It is highly efficient and is the preferred way to generate XML in .NET in most circumstances. I find XmlTextWriter so useful that I wrote a partial C++ implementation of it in my Implementing IXmlWriter series.
Unfortunately, XmlTextWriter isn’t quite as strict as it could be. It will let slip some invalid XML such as duplicate attributes, invalid Unicode characters in the range 0x0 to 0x20, and invalid element and attribute names. You can read about XmlTextWriter’s limitations in the article Customized XML Writer Creation.
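For the character check in particular, a stricter writer could test each code point against the Char production of the XML 1.0 recommendation before writing it. This is only a sketch; the function name IsValidXmlChar is an assumption of mine and not something XmlTextWriter or IXmlWriter provides.

```cpp
// Returns true if the code point is allowed by the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// (hypothetical helper, shown for illustration)
bool IsValidXmlChar(char32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}
```

Note that tab, line feed, and carriage return are the only characters below 0x20 that pass the check.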
When using XSLT’s format-number() function to format a decimal, consider using a zero in the least significant place of the decimal part of your format string. This will allow a number with a 0 integer part to display correctly.
This is part 14/14 of my Implementing IXmlWriter post series.
Today I will extend last time’s IXmlWriter to support writing the generated XML to a C++ stream.
Finally, the reason why I’ve insisted on calling this series IXmlWriter (instead of StringXmlWriter) should become clear: I’ve been planning on supporting writing the generated XML to more than just a string. Specifically, today I will add the ability to write the XML to a C++ ostream object, a base class in the C++ iostream library which defines a writable stream.
The idea behind the pimpl idiom is to hide as much of the class definition as possible in order to avoid requiring users of the class to recompile if the class’s private members are changed. It is accomplished by moving all private members (functions, data, etc.) into a separate class (called the implementation or pimpl class) hidden from the class definition, and replacing these members with an opaque pointer to a forward declaration of this class. It works because a C++ compiler does not need to have the full definition of a class visible in order to allocate space for a pointer to the class; a pointer to a class type has the same fixed size on a given platform regardless of the class’s contents (typically 4 bytes on 32-bit systems and 8 bytes on 64-bit systems).
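A minimal sketch of the idiom follows, with the header and source portions shown in one block for brevity. The Greeter class is a made-up example, and the use of std::unique_ptr (C++11) for the opaque pointer is my choice for the sketch; a raw pointer with a hand-written destructor works the same way.

```cpp
#include <memory>
#include <string>

// ---- what would live in the public header ----
class Greeter
{
public:
    explicit Greeter(const std::string& name);
    ~Greeter();                      // defined where Impl is a complete type
    std::string Greet() const;

private:
    class Impl;                      // forward declaration only
    std::unique_ptr<Impl> m_impl;    // opaque pointer to the hidden members
};

// ---- what would live in the .cpp file ----
class Greeter::Impl
{
public:
    explicit Impl(const std::string& name) : m_name(name) {}
    std::string m_name;              // private data users never see
};

Greeter::Greeter(const std::string& name) : m_impl(new Impl(name)) {}

Greeter::~Greeter() = default;       // Impl is complete here, so this compiles

std::string Greeter::Greet() const
{
    return "Hello, " + m_impl->m_name;
}
```

Adding or removing members of Greeter::Impl changes only the source file, so code that includes the header does not need to be recompiled.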
Pretty-printing is the addition of whitespace at predetermined locations to make the resulting XML easier to read than when it is all on one line. In the .NET Framework’s System.Xml.XmlTextWriter class, it is supported by the properties Formatting, which allows you to enable or disable pretty-printing; Indentation, which allows you to specify how many whitespace characters indentation should use; and IndentChar, which allows you to specify the whitespace character to use for indentation. For IXmlWriter, I instead chose to expose these features exclusively through the constructor. This frees me from the worry of a user trying to change these properties after IXmlWriter has already begun writing XML, which could produce awkward results. Default parameters are used to make the use of pretty-printing optional and straightforward.
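A constructor along these lines might look like the following sketch. The class name PrettyPrintOptionsSketch, the enumerator names, the IndentFor helper, and the default values are all assumptions made for illustration rather than the actual IXmlWriter interface.

```cpp
#include <string>

// Hypothetical names, loosely modelled on XmlTextWriter's Formatting values.
enum Formatting { Formatting_None, Formatting_Indented };

class PrettyPrintOptionsSketch
{
public:
    // Every parameter has a default, so callers who do not care about
    // pretty-printing can construct the object with no arguments.
    explicit PrettyPrintOptionsSketch(Formatting formatting = Formatting_None,
                                      int indentation = 2,
                                      char indentChar = ' ')
        : m_formatting(formatting),
          m_indentation(indentation),
          m_indentChar(indentChar) {}

    // Whitespace to emit before a tag nested `depth` levels deep.
    std::string IndentFor(int depth) const
    {
        if (m_formatting == Formatting_None)
            return std::string();
        const std::string::size_type count =
            static_cast<std::string::size_type>(depth * m_indentation);
        return "\n" + std::string(count, m_indentChar);
    }

private:
    // const members: the settings cannot change once writing has begun.
    const Formatting m_formatting;
    const int m_indentation;
    const char m_indentChar;
};
```

Since the settings are fixed at construction, there is no code path in which indentation changes halfway through a document.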
Namespaces are defined by the W3C recommendation Namespaces in XML. Using namespaces requires two parts: a namespace declaration, which associates a prefix with a namespace name (a user-defined, ideally globally-unique string which defines the namespace, often in the form of a URL); and the assignment of XML elements and attributes to this namespace by using the aforementioned prefix.
Comments MAY appear anywhere in a document outside other markup; in addition, they MAY appear within the document type declaration at places allowed by the grammar.
Considering this, we should allow writing comments in virtually every WriteState that the IXmlWriter can be in. In fact, some quick thought confirms that we should allow it for every WriteState but WriteState_Attribute, as a comment cannot be legally represented between the quotation marks which delimit an attribute value. With this in mind, here’s the test case I wrote:
WriteStartDocument() writes the XML declaration (i.e. <?xml version="1.0"?>) and WriteEndDocument() closes all open attributes and elements and sets the IXmlWriter back in the initial state. Adding support for these functions is straightforward. Note that I have introduced a new IXmlWriter state called WriteState_Prolog; this will be important later.
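The state handling described above can be sketched as follows. This is a simplified illustration and not the series' code: the class name DocumentWriterSketch is hypothetical, attributes are left out entirely, and the only state names taken from the text are the WriteState_ prefix and WriteState_Prolog.

```cpp
#include <string>
#include <vector>

enum WriteState
{
    WriteState_Start,     // nothing written yet
    WriteState_Prolog,    // XML declaration written, no element yet
    WriteState_Element,   // inside a start tag that is still open
    WriteState_Content    // inside element content
};

class DocumentWriterSketch
{
public:
    DocumentWriterSketch() : m_state(WriteState_Start) {}

    void WriteStartDocument()
    {
        if (m_state != WriteState_Start)
            return;                       // only valid at the very beginning
        m_xml += "<?xml version=\"1.0\"?>";
        m_state = WriteState_Prolog;
    }

    void WriteStartElement(const std::string& name)
    {
        if (m_state == WriteState_Element)
            m_xml += '>';                 // finish the parent's start tag
        m_xml += '<' + name;
        m_open.push_back(name);
        m_state = WriteState_Element;
    }

    void WriteEndElement()
    {
        if (m_open.empty())
            return;
        if (m_state == WriteState_Element)
            m_xml += "/>";                // empty element
        else
            m_xml += "</" + m_open.back() + ">";
        m_open.pop_back();
        m_state = WriteState_Content;
    }

    // Closes everything still open and returns to the initial state.
    void WriteEndDocument()
    {
        while (!m_open.empty())
            WriteEndElement();
        m_state = WriteState_Start;
    }

    const std::string& Xml() const { return m_xml; }
    WriteState State() const { return m_state; }

private:
    std::string m_xml;
    std::vector<std::string> m_open;
    WriteState m_state;
};
```

In this sketch, calling WriteEndDocument() with two elements still open closes both of them before resetting the state.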