Implementing IXmlWriter Part 11: Supporting Namespaces

This is part 11 of my Implementing IXmlWriter post series.

Today I will add support for namespaces to last time’s IXmlWriter.

Namespaces are defined by the W3C recommendation Namespaces in XML. Using namespaces requires two parts: a namespace declaration, which associates a prefix with a namespace name (a user-defined, ideally globally-unique string which defines the namespace, often in the form of a URL); and the assignment of XML elements and attributes to this namespace by using the aforementioned prefix.

Here’s an example of an XML document that uses namespaces:

<?xml version="1.0"?>
<bk:book xmlns:bk='urn:loc.gov:books'>
</bk:book>

The xmlns:bk='urn:loc.gov:books' is the namespace declaration, and it assigns the prefix bk: to the namespace name urn:loc.gov:books. The book element is declared as a member of the urn:loc.gov:books namespace (and not the default, empty namespace) by the usage of this prefix.

There are a few subtleties to the use of namespaces. A common stumbling block is that although you can declare a default namespace (with xmlns="...") into which unprefixed elements are automatically placed, unprefixed attributes are never placed into it; an unprefixed attribute is always in the empty namespace. In other words, the following XML fragments are not equivalent, because their title attributes are in different namespaces:

<!-- The title attribute is in the
     urn:loc.gov:books namespace -->
<bk:book bk:title='Cheaper by the Dozen'
         xmlns:bk='urn:loc.gov:books'>
</bk:book>

<!-- The title attribute is in the
     empty namespace -->
<book title='Cheaper by the Dozen'
      xmlns='urn:loc.gov:books'>
</book>

An important point (and one we will take advantage of shortly) is that the prefix itself carries no meaning: it is simply a shorthand for an XML element’s or attribute’s membership in a namespace. In other words, if I replaced bk: with foobar: everywhere in the examples above, the resulting documents would be equivalent to the originals. Therefore, for now, I choose not to let users of IXmlWriter control namespace prefixes; I will assign them automatically as ns1:, ns2:, …

To keep track of which namespaces have already been declared, I will store each element’s namespace declarations (alongside its qualified name, QName) in m_openedElements. Because I need to search the declarations of every open element, not just the innermost one, I will change m_openedElements from a std::stack to a std::vector. Furthermore, a namespace declaration is only in scope until the element that declares it is closed, so the in-scope declarations always behave like a stack. This means I can assign a new prefix by simply counting the declarations currently in scope and adding one. I will not support default namespaces at this time.

Here are the five test cases I developed for this functionality:

// Test simple namespaces
StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root", "namespace1");
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be <ns1:root xmlns:ns1="namespace1"/>

// Test attribute namespacing
StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root", "namespace1");
  xmlWriter.WriteAttributeString("att", "namespace1", "value");
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be <ns1:root xmlns:ns1="namespace1" ns1:att="value"/>

// Test child namespace declarations
StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root", "namespace1");
  xmlWriter.WriteElementString("child", "namespace2", "value");
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be (on one line):
// <ns1:root xmlns:ns1="namespace1">
//   <ns2:child xmlns:ns2="namespace2">value</ns2:child>
// </ns1:root>

// Complicated namespace test
StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root", "namespace1");
  xmlWriter.WriteStartElement("child", "namespace2");
    xmlWriter.WriteAttributeString("att1", "namespace1", "value1");
    xmlWriter.WriteAttributeString("att2", "namespace2", "value2");
    xmlWriter.WriteAttributeString("att3", "namespace3", "value3");
    xmlWriter.WriteAttributeString("att4", "value4");
    xmlWriter.WriteStartElement("child2", "namespace3");
      xmlWriter.WriteString("value");
    xmlWriter.WriteEndElement();
  xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be (on one line):
// <ns1:root xmlns:ns1="namespace1">
//   <ns2:child xmlns:ns2="namespace2" ns1:att1="value1" ns2:att2="value2"
//              xmlns:ns3="namespace3" ns3:att3="value3" att4="value4">
//     <ns3:child2>value</ns3:child2>
//   </ns2:child>
// </ns1:root>

// Test "sibling" namespace declarations
StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root");
  xmlWriter.WriteStartElement("child1", "namespace1");
  xmlWriter.WriteEndElement();
  xmlWriter.WriteStartElement("child2", "namespace1");
  xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be (on one line):
// <root>
//   <ns1:child1 xmlns:ns1="namespace1"/>
//   <ns1:child2 xmlns:ns1="namespace1"/>
// </root>

Here’s the new header file:

// StringXmlWriter.h

#include <map>
#include <string>
#include <vector>

class StringXmlWriter
{
private:
    enum WriteState
    {
        WriteState_Attribute, // An attribute value is being written
        WriteState_Content, // Element content is being written
        WriteState_Element, // An element start tag has been written (and is unclosed)
        WriteState_Prolog, // The prolog is being written
        WriteState_Start, // No Write() methods have been called
    };

    struct OpenElement
    {
        explicit OpenElement(const std::string& localName) :
            QName(localName)
        {
        }

        explicit OpenElement(const std::string& localName,
                             const std::string& prefix) :
            QName(prefix.empty() ? localName : prefix + ":" + localName)
        {
        }

        // The qualified name (including the namespace prefix, if any) of
        // the opened element
        std::string QName;
        // All namespaces declared on this element (maps namespace name
        // to prefix)
        typedef std::map<std::string, std::string> Namespaces_t;
        Namespaces_t Namespaces;
    };

    WriteState m_writeState;

    // Need to use a vector instead of a stack because we must be able
    // to iterate over each opened element in the stack to see if a
    // namespace has already been declared.
    typedef std::vector<OpenElement> OpenedElements_t;
    OpenedElements_t m_openedElements;
    std::string m_xmlStr;

public:
    StringXmlWriter();

    std::string GetXmlString() const;
    void WriteAttributeString(const std::string& localName,
                              const std::string& text);
    void WriteAttributeString(const std::string& localName,
                              const std::string& ns,
                              const std::string& text);
    void WriteComment(const std::string& text);
    void WriteElementString(const std::string& localName,
                            const std::string& text);
    void WriteElementString(const std::string& localName,
                            const std::string& ns,
                            const std::string& text);
    void WriteEndAttribute();
    void WriteEndDocument();
    void WriteEndElement();
    void WriteStartAttribute(const std::string& localName);
    void WriteStartAttribute(const std::string& localName,
                             const std::string& ns);
    void WriteStartDocument();
    void WriteStartElement(const std::string& localName);
    void WriteStartElement(const std::string& localName,
                           const std::string& ns);
    void WriteString(const std::string& text);

private:
    // Disable copy construction and assignment
    StringXmlWriter(const StringXmlWriter&);
    StringXmlWriter& operator=(const StringXmlWriter&);

    std::string GetExistingNamespacePrefix(const std::string& ns);
    std::string GetNextNamespacePrefix(const std::string& ns);
};

Here’s the new implementation file:

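// StringXmlWriter.cpp

#include "StringXmlWriter.h"

#include <algorithm>
#include <functional>
#include <sstream>
#include <string>

#ifndef ARRAYSIZE
#define ARRAYSIZE(x) ( sizeof(x) / sizeof(x[0]) )
#endif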

struct CharTranslation
{
    char OriginalChar;
    const char* ReplacementString;
};

static const CharTranslation AttributeValueTranslations[] =
{
    { '"', "&quot;" },
    { '&', "&amp;" },
    { '<', "&lt;" }, // '<' may not appear literally in an attribute value
};

static const CharTranslation CharDataTranslations[] =
{
    { '&', "&amp;" },
    { '<', "&lt;" },
    { '>', "&gt;" },
};

// Function object that matches the CharTranslation entry for one character.
// The character is stored in the object because std::binary_function and
// std::bind2nd are deprecated in C++11 and removed in C++17.
struct OriginalCharEquals
{
    explicit OriginalCharEquals(char ch) : m_ch(ch)
    {
    }

    bool operator() (const CharTranslation& translation) const
    {
        return (translation.OriginalChar == m_ch);
    }

private:
    char m_ch;
};

static std::string TranslateString(const std::string& originalStr,
                                   const CharTranslation* translations,
                                   int numTranslations)
{
    // Actually one past end, needed for proper std::find_if semantics
    const CharTranslation* endTranslations = translations + numTranslations;

    std::string translatedStr;

    for (std::string::const_iterator stringIter = originalStr.begin();
         stringIter != originalStr.end();
         ++stringIter)
    {
        char ch = *stringIter;

        const CharTranslation* translation = std::find_if
            (
            translations,
            endTranslations,
            OriginalCharEquals(ch)
            );
        if (translation != endTranslations)
        {
            translatedStr += translation->ReplacementString;
        }
        else
        {
            translatedStr += ch;
        }
    }

    return translatedStr;
}

StringXmlWriter::StringXmlWriter() : m_writeState(WriteState_Start)
{
}

std::string StringXmlWriter::GetXmlString() const
{
    return m_xmlStr;
}

void StringXmlWriter::WriteAttributeString(const std::string& localName,
                                           const std::string& text)
{
    WriteStartAttribute(localName);
    WriteString(text);
    WriteEndAttribute();
}

void StringXmlWriter::WriteAttributeString(const std::string& localName,
                                           const std::string& ns,
                                           const std::string& text)
{
    WriteStartAttribute(localName, ns);
    WriteString(text);
    WriteEndAttribute();
}

void StringXmlWriter::WriteComment(const std::string& text)
{
    switch (m_writeState)
    {
    case WriteState_Element:
        // An element is currently open.  Close the element so we can open
        // a new one.
        m_xmlStr += '>';
        m_writeState = WriteState_Content;
        // FALL THROUGH
    case WriteState_Content:
    case WriteState_Prolog:
    case WriteState_Start:
        m_xmlStr += "<!--";
        m_xmlStr += text;
        m_xmlStr += "-->";
        break;
    default:
        // It doesn’t make sense to allow writing comments when writing an
        // attribute value.
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteElementString(const std::string& localName,
                                         const std::string& text)
{
    WriteStartElement(localName);
    WriteString(text);
    WriteEndElement();
}

void StringXmlWriter::WriteElementString(const std::string& localName,
                                         const std::string& ns,
                                         const std::string& text)
{
    WriteStartElement(localName, ns);
    WriteString(text);
    WriteEndElement();
}

void StringXmlWriter::WriteEndAttribute()
{
    switch (m_writeState)
    {
    case WriteState_Attribute:
        m_xmlStr += '"';
        m_writeState = WriteState_Element;
        break;
    default:
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteEndDocument()
{
    switch (m_writeState)
    {
    case WriteState_Attribute:
        WriteEndAttribute();
        // FALL THROUGH
    case WriteState_Content:
    case WriteState_Element:
        while (!m_openedElements.empty())
        {
            WriteEndElement();
        }
        break;
    case WriteState_Start:
    case WriteState_Prolog:
        // DO NOTHING
        break;
    default:
        // TODO: Generate error
        break;
    }

    m_writeState = WriteState_Start;
}

void StringXmlWriter::WriteEndElement()
{
    switch (m_writeState)
    {
    case WriteState_Content:
        {
            m_xmlStr += "</";
            m_xmlStr += m_openedElements.back().QName;
            m_xmlStr += '>';
            m_openedElements.pop_back();
            m_writeState = WriteState_Content;
            break;
        }
    case WriteState_Element:
        {
            m_xmlStr += "/>";
            m_openedElements.pop_back();
            m_writeState = WriteState_Content;
            break;
        }
    default:
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteStartAttribute(const std::string& localName)
{
    WriteStartAttribute(localName, "");
}

void StringXmlWriter::WriteStartAttribute(const std::string& localName,
                                          const std::string& ns)
{
    switch (m_writeState)
    {
    case WriteState_Element:
        {
        std::string nsPrefix;
        bool mustDeclareNamespace = false;

        if (!ns.empty()) {
            nsPrefix = GetExistingNamespacePrefix(ns);
            if (nsPrefix.empty()) {
                nsPrefix = GetNextNamespacePrefix(ns);
                m_openedElements.back().Namespaces[ns] = nsPrefix;
                mustDeclareNamespace = true;
            }
        }

        if (mustDeclareNamespace) {
            m_xmlStr += " xmlns:";
            m_xmlStr += nsPrefix;
            m_xmlStr += "=\"";
            m_xmlStr += ns;
            m_xmlStr += '"';
        }

        m_xmlStr += ' ';
        if (!nsPrefix.empty()) {
            m_xmlStr += nsPrefix;
            m_xmlStr += ':';
        }
        m_xmlStr += localName;
        m_xmlStr += "=\"";
        m_writeState = WriteState_Attribute;
        break;
        }
    default:
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteStartDocument()
{
    switch (m_writeState)
    {
    case WriteState_Start:
        m_xmlStr += "<?xml version=\"1.0\"?>";
        m_writeState = WriteState_Prolog;
        break;
    default:
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteStartElement(const std::string& localName)
{
    WriteStartElement(localName, "");
}

void StringXmlWriter::WriteStartElement(const std::string& localName,
                                        const std::string& ns)
{
    switch (m_writeState)
    {
    case WriteState_Element:
        // An element is currently open.  Close the element so we can open
        // a new one.
        m_xmlStr += '>';
        // FALL THROUGH
    case WriteState_Content:
    case WriteState_Prolog:
    case WriteState_Start:
        {
        std::string nsPrefix;
        bool mustDeclareNamespace = false;

        if (!ns.empty()) {
            nsPrefix = GetExistingNamespacePrefix(ns);
            if (nsPrefix.empty()) {
                nsPrefix = GetNextNamespacePrefix(ns);
                mustDeclareNamespace = true;
            }
        }

        OpenElement openElement(localName, nsPrefix);
        if (mustDeclareNamespace) {
            openElement.Namespaces[ns] = nsPrefix;
        }

        m_openedElements.push_back(openElement);

        m_xmlStr += '<';
        if (!nsPrefix.empty()) {
            m_xmlStr += nsPrefix;
            m_xmlStr += ':';
        }
        m_xmlStr += localName;
        if (mustDeclareNamespace) {
            m_xmlStr += " xmlns:";
            m_xmlStr += nsPrefix;
            m_xmlStr += "=\"";
            m_xmlStr += ns;
            m_xmlStr += '"';
        }
        m_writeState = WriteState_Element;
        break;
        }
    default:
        // TODO: Generate error
        break;
    }
}

void StringXmlWriter::WriteString(const std::string& text)
{
    switch (m_writeState)
    {
    case WriteState_Attribute:
        m_xmlStr += TranslateString
            (
            text,
            AttributeValueTranslations,
            ARRAYSIZE(AttributeValueTranslations)
            );
        break;
    case WriteState_Element:
        // An element is currently open.  Close the element so we can start
        // writing the element content.
        m_xmlStr += '>';
        m_writeState = WriteState_Content;
        // FALL THROUGH
    case WriteState_Content:
        m_xmlStr += TranslateString
            (
            text,
            CharDataTranslations,
            ARRAYSIZE(CharDataTranslations)
            );
        break;
    default:
        // TODO: Generate error
        break;
    }
}

std::string StringXmlWriter::GetExistingNamespacePrefix(const std::string& ns)
{
    for (OpenedElements_t::const_iterator openElemIter = m_openedElements.begin();
         openElemIter != m_openedElements.end();
         ++openElemIter)
    {
        OpenElement::Namespaces_t::const_iterator nsIter =
            openElemIter->Namespaces.find(ns);
        if (nsIter != openElemIter->Namespaces.end())
        {
            return nsIter->second;
        }
    }

    return "";
}

std::string StringXmlWriter::GetNextNamespacePrefix(const std::string& ns)
{
    // Namespace prefixes are named ns1, ns2, ...  The next prefix is one more
    // than the number of namespaces declared by the currently open elements.

    size_t totalNumNamespaces = 0;
    for (OpenedElements_t::const_iterator iter = m_openedElements.begin();
         iter != m_openedElements.end();
         ++iter)
    {
        totalNumNamespaces += iter->Namespaces.size();
    }

    std::stringstream ss;
    ss << "ns" << (totalNumNamespaces + 1);
    return ss.str();
}
