Create XML files using DOM

Some of you may be familiar with my Random Data Generator addin. Currently, one of the output formats is Microsoft's proprietary XML Spreadsheet format.

One of my goals is to increase the number of available output formats by including standard XML. Here is one way to save worksheet data as XML.

We'll need a function that can take any number of columns and rows and output them as valid XML. Using sample code I found at Overview of the SAX to DOM Example I cribbed together a function that takes a range and either returns the XML or saves it to a file on disk.

Function to Save Worksheet Data As XML

The following function takes an array (ex: a contiguous range with no blank rows or columns), and three optional parameters:

  1. Path and filename
  2. Parent node name
  3. Whether to return the XML

If a filepath is supplied, the XML is saved to the path using the filename provided. Minimal validation is performed (path is validated).

If a parent node name is provided, the topmost node is given that name. Otherwise, "Values" is the topmost node. The first level child nodes are named using the first row of the

input array (the "header") and the individual nodes are named "Value".

If requested, the XML will be returned as a String to the calling procedure.

Note that something must be done with the XML — either saved to disk or returned to the calling procedure, otherwise it is discarded. If no filepath is provided, the XML is forcibly returned. Of course, there's really no way to force you to use the XML. If you call the function with no filepath and don't assign the return value to a String, you've simply wasted processing time but don't really have to do anything with the XML the function creates.

Function CreateXML(inputValues As Variant, _
                   Optional filePath As String, _
                   Optional parentNodeName As String = "Values", _
                   Optional returnXML As Boolean = 1) As String
  Dim pathName As String
  Dim xmlDoc As Object ' MSXML2.DOMDocument60
  Dim mxxml As Object ' MSXML2.MXXMLWriter60
  Dim cnth As Object ' MSXML2.IVBSAXContentHandler
  Dim i As Long, j As Long

  ' create new DOM Document and point XML writer to it
  Set xmlDoc = GetDomDoc

  If xmlDoc Is Nothing Then
    MsgBox "Could not create MSXML DOM Document."
    Exit Function
  End If

  Set mxxml = GetMXXMLWriter
  If mxxml Is Nothing Then
    MsgBox "Could not create MXXML Writer"
    Exit Function
  End If

  Set cnth = mxxml
  mxxml.output = xmlDoc
  mxxml.indent = True

  ' begin creating the XML document output
  cnth.startDocument
  ' add xml declaration
  cnth.processingInstruction "xml", "version='1.0' encoding='UTF-8'"

  ' create parent node using input name or default of "Values"
  cnth.startElement "", "", parentNodeName, Nothing

  ' create first-level child nodes using first row of array
  For i = LBound(inputValues, 2) To UBound(inputValues, 2)
    cnth.startElement "", "", CStr(inputValues(1, i)), Nothing
    ' loop through array and create a child node for each
    ' since parent node hasn't been closed, these will 
    ' automatically be child nodes
    For j = 2 To UBound(inputValues)
      cnth.startElement "", "", "Value", Nothing
      cnth.Characters CStr(inputValues(j, i))
      cnth.endElement "", "", "Value"
    Next j
    ' close first-level child node
    cnth.endElement "", "", CStr(inputValues(1, i))
  Next i

  ' close parent node
  cnth.endElement "", "", parentNodeName
  ' end output
  cnth.endDocument

  ' save xml?
  If Len(filePath) > 0 Then
    ' verify folder exists
    pathName = Split(filePath, ExtractFileName(filePath))(0)
    If FolderExists(pathName) Then
      ' save xml to specified filepath
      xmlDoc.Save filePath
    End If
  Else ' return xml
    returnXML = 1
  End If

  ' return xml?
  If returnXML Then
    CreateXML = xmlDoc.xml
  End If

End Function

Function FolderExists(foldername As String) As Boolean
  FolderExists = (Len(Dir(foldername)) > 0)
End Function

Following the example from MSDN, this function starts by creating a new MSXML DOM Document and XML Writer. The SAX Content Handler is pointed to the XML Writer, whose output is the DOM Document. The Content Handler is used to loop through the array and add nodes to the DOM Document. Finally we save and/or return the XML. Because of the way the loop is written, it doesn't matter how many columns or rows there are.

The resulting XML should be valid according to W3 Validator.

For the GetDomDoc and GetMXXMLWriter functions visit MSXML Object Library. For the ExtractFileName and GetFileType functions visit Filename Parsing Methods.

Sample Usage

Here is an example range we might to write to XML:

worksheet one column

I'll call the above function as follows:

Sub TestCreateXML()
  Dim xml As String
  xml = CreateXML(Range("A1").CurrentRegion.value, _
    "C:\Myxmlfile.xml", "MyValues")
End Sub

I use CurrentRegion because I can add more columns or rows without having to change at all how I call the CreateXML function. All I have to do is worry about my data — add or remove columns or rows and rerun the exact same procedures without having to change a thing!

Example output:

<?xml version='1.0' encoding='UTF-8'?>
<MyValues>
  <Values>
    <Value>ABC1</Value>
    <Value>ABC2</Value>
    <Value>ABC3</Value>
    <Value>ABC4</Value>
    <Value>ABC5</Value>
  </Values>
</MyValues>

Now suppose I have multiple columns of worksheet data:

worksheet multiple columns

As I explained above, I do not need to make any changes to either the CreateXML function or my calling procedure. CurrentRegion will pick up the additional column of data and CreateXML will dutifully loop through it and add it to the output. Simply re-run the above code and I get the following:

<?xml version='1.0' encoding='UTF-8'?>
<MyValues>
  <Values>
    <Value>ABC1</Value>
    <Value>ABC2</Value>
    <Value>ABC3</Value>
    <Value>ABC4</Value>
    <Value>ABC5</Value>
  </Values>
  <Values2>
    <Value>ABC2</Value>
    <Value>ABC3</Value>
    <Value>ABC4</Value>
    <Value>ABC5</Value>
    <Value>ABC6</Value>
  </Values2>
</MyValues>

The parent node is named "MyValues" as we specified when we called the function. The first level child nodes are named "Values" and "Values2" which are the header row values. This was also the case in the single column file, the multiple column example just makes it more obvious.

In the next article we'll review how we can read these XML files back into the DOM and parse them.

About JP

I'm just an average guy who writes VBA code for a living. This is my personal blog. Excel and Outlook are my thing, with a sprinkle of Access and Word here and there. Follow this space to learn more about VBA. Keep Reading »


Related Articles:


Share This Article:

Share and bookmark this articledelicious buttonfacebook buttonlinkedin buttonstumbleupon buttontwitter button

This article is closed to any future comments.
Random Data Generator