Working With Checkboxes and Other Controls in Word Using OpenXml
Microsoft’s adoption of the OpenXML format for its Office documents such as Word, PowerPoint and Excel has meant that developers can provide considerably improved experiences over the web and on desktop applications when working with Office documents. For example, it used to be the case that in order to serve up a Word document from a web server, you needed to have Microsoft Office installed on that server. While this might still occasionally be the case, in most instances you can now reference the OpenXml SDK and create a Word 2007 / 2010 document on the fly.
While the OpenXml framework offers a huge improvement in interoperability, it also adds a reasonable degree of complexity if you’re trying to work with attributes of a document that would generally be considered outside of the “routine” tasks of working with text and layout. My specific problem last week was trying to get references to checkbox controls I’d placed in the document and read or manipulate their states. The trick, it turns out, is knowing your checkboxes from your checkboxes. Not all checkboxes are created equal.
Understanding Control Genealogy
The Microsoft Office suite has evolved from a set of applications that employed proprietary binary file formats, Windows-centric embedded controls and a litany of other idiosyncrasies that served their purpose at the time but are little more than legacy relics today. Still, they need to be supported, and we are therefore left with quite a diverse range of possible controls that might be found in a given document or spreadsheet.
What this means is that when working with embedded controls – particularly those used in forms such as checkboxes, dropdown lists and the like – we have to be aware of what kind of control is actually being used and thereby how to access it. In working with Word, we’re faced with three different incarnations of the checkbox: the “legacy” control, the ActiveX control and the most recent “Content Checkbox”.
Let’s look at how to get to each.
Legacy Checkboxes
The following code shows how to iterate across legacy checkbox controls. The checkbox element itself only carries its state; identifying information, such as the name of the actual checkbox instance, is held by its Parent node. To get that information we need to step up the XML tree to the parent and retrieve its FormFieldName element.
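using (WordprocessingDocument doc = WordprocessingDocument.Open("c:\\checkbox.docx", true))
{
    foreach (CheckBox cb in doc.MainDocumentPart.Document.Body.Descendants<CheckBox>())
    {
        Console.Out.WriteLine(cb.LocalName);

        // The name belongs to the parent form field element, not to the checkbox itself.
        FormFieldName cbName = cb.Parent.ChildElements.First<FormFieldName>();
        if (cbName != null)
        {
            Console.Out.WriteLine(cbName.Val);
        }

        DefaultCheckBoxFormFieldState defaultState = cb.GetFirstChild<DefaultCheckBoxFormFieldState>();
        Checked state = cb.GetFirstChild<Checked>();
        if (defaultState != null && defaultState.Val != null)
        {
            Console.Out.WriteLine(defaultState.Val.ToString());
        }

        if (state == null) // No explicit state has been stored for this checkbox
        {
            Console.Out.WriteLine("NO EXPLICIT STATE");
        }
        else if (state.Val == null) // In case checkbox is checked the val attribute is null
        {
            Console.Out.WriteLine("CHECKED");
        }
        else
        {
            Console.Out.WriteLine(state.Val.ToString());
        }
    }
}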
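Since the goal was to manipulate states as well as read them, here is a minimal sketch of how reading and writing might be wrapped up. The class and method names (LegacyCheckBoxHelper, IsChecked, SetChecked) are hypothetical and not part of the SDK, and the fallback to the default state when no Checked element exists is my reading of how these fields behave, so treat it as an assumption to verify against your own documents.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the OpenXml SDK.
public static class LegacyCheckBoxHelper
{
    // Returns the effective state of a legacy form field checkbox.
    public static bool IsChecked(CheckBox cb)
    {
        Checked state = cb.GetFirstChild<Checked>();
        if (state != null)
        {
            // A Checked element with no val attribute counts as "on".
            return state.Val?.Value ?? true;
        }

        // Assumption: with no Checked element, the default state applies.
        DefaultCheckBoxFormFieldState def = cb.GetFirstChild<DefaultCheckBoxFormFieldState>();
        return def != null && (def.Val?.Value ?? true);
    }

    // Writes an explicit state, creating the Checked element if needed.
    public static void SetChecked(CheckBox cb, bool isChecked)
    {
        Checked state = cb.GetFirstChild<Checked>();
        if (state == null)
        {
            // Checked is the last child of the checkbox, so appending keeps the order valid.
            state = cb.AppendChild(new Checked());
        }
        state.Val = new OnOffValue(isChecked);
    }
}
```

Inside the loop above this would be called as LegacyCheckBoxHelper.SetChecked(cb, true). The change only reaches the file if the document was opened as editable and the main document part is saved or the document disposed.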
ActiveX Checkboxes
ActiveX controls present a more problematic situation. They are Windows-specific program building blocks, whereas OpenXML is focused on interoperability across operating systems and platforms, so the OpenXML spec is not concerned with them.
The code to find an ActiveX control is a lot more verbose and requires us to “sniff around” to check the class type of any controls found in the document. We can find ActiveX controls using the Control element type:
foreach (Control ctrl in doc.MainDocumentPart.Document.Body.Descendants<Control>())
{
    Console.Out.WriteLine(ctrl.Id);
    Console.Out.WriteLine(ctrl.Name);
    Console.Out.WriteLine(ctrl.ShapeId);
}
The SDK framework itself can't differentiate between kinds of ActiveX control, meaning we need to check the class ID of any control we find. That ID is stored in the control's own part of the package, which we reach through the Id of the Control element. The class ID for a checkbox is {8BD21D40-EC42-11CE-9E0D-00AA006002F3}.
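// ctrl is the Control element from the loop above; its Id points at the ActiveX part.
OpenXmlPart part = doc.MainDocumentPart.GetPartById(ctrl.Id);
OpenXmlReader reader = OpenXmlReader.Create(part.GetStream());
reader.Read();
OpenXmlElement el = reader.LoadCurrentElement();
// Compare without regard to case, since GUIDs may be written in either case.
string classId = el.GetAttribute("classid", el.NamespaceUri).Value;
if (string.Equals(classId, "{8BD21D40-EC42-11CE-9E0D-00AA006002F3}", StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine("Checkbox found.");
}
reader.Close();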
It’s a cumbersome and counter-intuitive approach, and ideally the developer would replace these controls with one of the other checkbox types wherever possible.
Native Content Controls
As previously mentioned, Office 2010 introduces a native set of content controls that are more manageable when working with the OpenXML SDK. Checkboxes of this kind are represented by the SdtContentCheckBox type, which lives in the DocumentFormat.OpenXml.Office2010.Word namespace.
using (WordprocessingDocument doc = WordprocessingDocument.Open(filename, true))
{
    MainDocumentPart mp = doc.MainDocumentPart;
    foreach (SdtContentCheckBox cb in mp.Document.Body.Descendants<SdtContentCheckBox>())
    {
        // The val attribute may be stored as "1" or as "true".
        string val = cb.Checked?.Val?.InnerText;
        if (val == "1" || val == "true")
        {
            Console.Out.WriteLine("CHECKED");
        }
    }
}
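Changing the state of a content checkbox has one trap worth sketching: the checked flag lives in the control's properties, while the box the reader sees is an ordinary text glyph in the control's content, so both need updating. The helper below is a rough sketch under some assumptions. ContentCheckBoxHelper and its method names are hypothetical, the fallback glyphs U+2612 and U+2610 are what I believe Word uses when no symbol is specified, and the code assumes the first text element inside the control is the glyph. It does not touch the run's font.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;
using W14 = DocumentFormat.OpenXml.Office2010.Word;

// Hypothetical helper, not part of the OpenXml SDK.
public static class ContentCheckBoxHelper
{
    public static bool IsChecked(W14.SdtContentCheckBox cb)
    {
        // Only an explicit "1" or "true" is treated as checked here.
        string val = cb.Checked?.Val?.InnerText;
        return val == "1" || val == "true";
    }

    public static void SetChecked(W14.SdtContentCheckBox cb, bool isChecked)
    {
        // 1. Update the flag stored in the control's properties.
        cb.Checked = new W14.Checked
        {
            Val = isChecked ? W14.OnOffValues.One : W14.OnOffValues.Zero
        };

        // 2. Work out which glyph represents the new state (hex code point).
        string hex = isChecked
            ? cb.CheckedState?.Val?.InnerText
            : cb.UncheckedState?.Val?.InnerText;
        if (string.IsNullOrEmpty(hex))
        {
            hex = isChecked ? "2612" : "2610"; // assumed defaults
        }
        string glyph = char.ConvertFromUtf32(Convert.ToInt32(hex, 16));

        // 3. Replace the visible glyph in the enclosing content control.
        SdtElement sdt = cb.Ancestors<SdtElement>().FirstOrDefault();
        Text text = sdt?.Descendants<Text>().FirstOrDefault();
        if (text != null)
        {
            text.Text = glyph;
        }
    }
}
```

Called from the loop above as ContentCheckBoxHelper.SetChecked(cb, true), this keeps the stored flag and the displayed symbol in step. As with any edit, the document has to be opened as editable for the change to be written back.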
Form controls are somewhat tricky to work with but hopefully these examples go some way toward clearing up their quirks and traps.


