The previous version of IXmlWriter will generate the XML string <root><element att="""/></root>, which is invalid and will be rejected by an XML parser. The rules for XML attribute escaping are given by Section 2.3 of the XML 1.0 spec, specifically the AttValue literal:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
This Backus-Naur form-like construct says that attribute values can be enclosed in either single or double quotes, and that the characters <, &, and the respective quotation character cannot appear between these quotes. However, with the exception of < (see Well-formedness constraint: No < in Attribute Values—thanks dbt), we can insert escaped versions of these characters. As we always encase attribute values in double quotes, we only need to worry about escaping the " character and not the ' character. Let’s construct a test case:
StringXmlWriter xmlWriter;
xmlWriter.WriteStartElement("root");
xmlWriter.WriteStartElement("element");
xmlWriter.WriteAttributeString("att", "\"&");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();
std::string strXML = xmlWriter.GetXmlString();
// strXML should be <root><element att="&quot;&amp;"/></root>
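To see why unescaped output fails, the AttValue rule for double-quoted values can be approximated in a few lines. This is a rough sketch and not part of IXmlWriter: the function name IsValidDoubleQuotedAttValue is hypothetical, and the reference check is deliberately simplified (it accepts any run of letters, digits, or # between & and ;, which is looser than the spec's Reference production).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if `value` could appear between double
// quotes as an attribute value. Simplification: a "reference" is taken to be
// '&', then one or more of [A-Za-z0-9#], then ';'.
bool IsValidDoubleQuotedAttValue(const std::string& value)
{
    for (std::string::size_type i = 0; i < value.size(); ++i)
    {
        const char ch = value[i];
        if (ch == '<' || ch == '"')
            return false;   // excluded outright by [^<&"]
        if (ch == '&')
        {
            // Scan the body of the reference up to the terminating ';'.
            std::string::size_type j = i + 1;
            while (j < value.size() &&
                   (std::isalnum(static_cast<unsigned char>(value[j])) || value[j] == '#'))
                ++j;
            if (j == i + 1 || j == value.size() || value[j] != ';')
                return false;   // bare '&' or unterminated reference
            i = j;              // resume scanning after the ';'
        }
    }
    return true;
}
```

Under these assumptions the raw value "& from the test case fails the check, while its escaped form &quot;&amp; passes.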
Note that we are now required to perform escaping (albeit with different characters) in two separate functions: WriteString() and WriteAttributeString(). This is a prime candidate for refactoring. We can separate the escaping code into its own function, and we can make such large changes with confidence because we have a test suite to verify that the changed code is correct. Here’s the new code:
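A minimal sketch of what such a refactoring might look like follows. The names EscapeXml, EscapeContext, kElementContent, and kAttributeValue are hypothetical and chosen for illustration; this is not the actual IXmlWriter code. The set of characters escaped for element content (& and <) is also an assumption.

```cpp
#include <string>

// Hypothetical names throughout; an illustration of a shared escaping helper.
enum EscapeContext { kElementContent, kAttributeValue };

// Escapes the characters that are special in the given context and returns
// the result as a new string.
std::string EscapeXml(const std::string& text, EscapeContext context)
{
    std::string escaped;
    escaped.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const char ch = text[i];
        if (ch == '&')
            escaped += "&amp;";    // special in both contexts
        else if (ch == '<' && context == kElementContent)
            escaped += "&lt;";     // would otherwise start a tag
        else if (ch == '"' && context == kAttributeValue)
            escaped += "&quot;";   // would otherwise end the attribute value
        else
            escaped += ch;         // '<' in attributes passes through unchanged;
                                   // that case is deferred to error handling
    }
    return escaped;
}
```

With a helper along these lines, WriteString() would call EscapeXml(text, kElementContent) and WriteAttributeString() would call EscapeXml(value, kAttributeValue), so the attribute value "& comes out as &quot;&amp;.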
Because we cannot insert a < character into an attribute value, escaped or otherwise, we should explicitly forbid this value in the function WriteAttributeString(). I will be sure to address this when I get to error handling in a future post. However, be sure to be aware of this constraint when you design your XML schemas!