NIST publishes Implementation Guidance for Common Data Formats

At the end of 2024, the National Institute of Standards and Technology (NIST), part of the U.S. Department of Commerce, published “Implementation Guidance for Common Data Formats”—a NIST Grant/Contractor Report I drafted for The Turnout.

Conveniently, NIST has also published a web version of the guide.

This guide covers practical topics in implementing the NIST Voting Common Data Formats (i.e., the NIST SP 1500 series). These include how the Common Data Formats (CDFs) are constructed, how to cross-reference data inside and outside CDF instances, how to handle geopolitical geography, and low-level processing.

Background of NIST Voting Common Data Formats

Interoperability is a crucial principle of the U.S. Election Assistance Commission’s (EAC) Voluntary Voting System Guidelines (VVSG). It ensures that the various parts of a voting system can exchange and interpret data accurately and reliably. The NIST Voting CDFs are standardized formats for representing and exchanging election-related data. They are used by VVSG 2.0-certified voting systems and by other election systems. They are designed to promote consistency, accuracy, and efficiency in processing election data across different jurisdictions and systems.

The first NIST Voting CDF was released in 2015. Since then, numerous systems have adopted the Voting CDFs to ease interoperability between election technology components. Drawing on nearly 10 years of user experience with these CDFs, this guide addresses key questions that implementers ask repeatedly. It also offers insights for those on the ground implementing and using the CDFs.

Guide Overview

This guide consists of several sections, each focused on a particular aspect of the subject matter. The guide:

Delineates the practical applications of each CDF within the elections ecosystem
Offers an in-depth overview of the Unified Modeling Language (UML) class diagram notation as it is used to describe each CDF
Outlines various methodologies for managing identifiers within the CDFs
Provides an extensive background on geopolitical geography and its significance in the context of elections
Addresses practical implementation details that are crucial for the successful deployment of CDFs in election systems
Covers automated tools and techniques for ensuring compliance with the standards and guidelines highlighted elsewhere in the document
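The identifier-management and cross-referencing topics above can be made concrete with a small check. The Python sketch below is a hypothetical illustration, not code from the guide. It assumes a JSON CDF instance in which objects carry an "@id" and references are properties whose names end in "Id" or "Ids", with multi-valued references stored as arrays. This layout is loosely modeled on the Election Results Reporting JSON format. The sketch reports any reference that does not resolve to an "@id" in the same instance. The function names and the trimmed sample data are made up for illustration.

```python
from typing import Any, Iterator


def iter_objects(node: Any) -> Iterator[dict]:
    """Yield every JSON object (dict) nested anywhere in the instance."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_objects(item)


def is_reference_key(key: str) -> bool:
    """Assumption: reference properties are named "...Id" or "...Ids"."""
    return key.endswith("Id") or key.endswith("Ids")


def dangling_references(instance: dict) -> list[tuple[str, str]]:
    """Return (property, value) pairs that match no "@id" in the instance."""
    defined = {obj["@id"] for obj in iter_objects(instance) if "@id" in obj}
    problems = []
    for obj in iter_objects(instance):
        for key, value in obj.items():
            if not is_reference_key(key):
                continue
            # Single-valued references are strings; multi-valued ones are lists.
            targets = value if isinstance(value, list) else [value]
            problems.extend((key, t) for t in targets if t not in defined)
    return problems


# Hypothetical, heavily trimmed instance loosely modeled on ERR JSON.
sample = {
    "@type": "ElectionResults.ElectionReport",
    "GpUnit": [{"@id": "gp-state", "@type": "ElectionResults.ReportingUnit"}],
    "Election": [{
        "@type": "ElectionResults.Election",
        "ElectionScopeId": "gp-state",
        "Candidate": [{
            "@id": "cand-1",
            "@type": "ElectionResults.Candidate",
            "PartyId": "party-x",  # no object with this @id is defined
        }],
    }],
}
print(dangling_references(sample))  # [('PartyId', 'party-x')]
```

The name-suffix rule is only a heuristic. A real validator would more likely take the list of reference properties from the schema itself than guess them from property names.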

Key Guide Points

This 82-page guide contains a great deal of deep, detailed technical material. Implementers need to absorb all of it to implement and use the Voting CDFs properly. Even so, the following five points are the key takeaways:

Development and Understanding of Design Decisions: The NIST CDFs were developed with a specific methodology and design principles starting around 2012. These principles guided their structure and content, aiming to provide interoperability and clarity for election-related data exchange. However, the rationale behind many design decisions is not readily apparent to those not involved in their inception. This creates a knowledge gap, as key insights into the compromises and choices made during development are embedded in the history and discussions of the working groups rather than explicitly documented in the specifications. For those outside this initial circle, understanding the "why" behind the structure of the CDFs can be challenging. The implementation guidance document makes many of these assumptions explicit.
Overestimation of Implementers’ Interpretive Abilities: The specifications for the CDFs were written on the assumption that implementers could interpret and apply them easily. The documents are comprehensive, but they presume a high level of familiarity both with the underlying technologies (e.g., Extensible Markup Language, JavaScript Object Notation) and with the intent and context behind the standards. This overestimation has led to varying levels of success in implementation. Some nuances of the formats are not explicitly addressed or clarified, which leaves room for divergent interpretations. The implementation guidance clarifies several aspects of the CDF specifications.
Use of Unified Modeling Language Class Models: A fundamental design choice was to represent the CDFs at a high level using UML class models. This approach aimed to depict the data relationships and structures abstractly, making the formats easier to understand and reason about conceptually. However, there is often a disconnect between these diagrammatic representations and their machine-readable implementations, such as Extensible Markup Language (XML) schemas or JavaScript Object Notation (JSON) schemas. The mapping between these high-level models and their technical artifacts is not always intuitive or explicitly described. This gap can be a barrier for implementers who must translate conceptual models into functional systems. The implementation guidance makes the mapping between UML and XML/JSON schemas explicit so that implementers can better interpret the CDF specifications.
Technology-Specific Knowledge Requirements: Implementing the CDFs does not require broad technical knowledge, but it does demand familiarity with specific standards and tools. For example, an implementer must understand XML Schema Definition (XSD) and JSON Schema to validate data structures, because these technologies form the backbone of the CDF implementations. Moreover, some nuances within these schema languages require a level of expertise that not every developer tasked with implementing the CDFs has. Examples include handling optionality, extensions, and constraints.
Lack of Documentation on Inter-CDF Strategy: While each CDF is tailored to serve specific use cases within the election systems domain (e.g., ballot definition, cast vote records, election results reporting), their interoperability was part of a broader, albeit implicit, strategy. This strategy envisioned how the various CDFs would support comprehensive workflows and data exchanges across the election lifecycle. However, no single document or cohesive narrative articulates this high-level integration strategy. As a result, implementers and stakeholders must infer these connections, which can lead to inconsistent or suboptimal integration across different CDFs.
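The UML point above is easier to grasp with a concrete mapping. The following minimal Python sketch assumes a simplified rule set. A multiplicity lower bound of 1 or more makes a property required. An upper bound greater than 1 (or *) turns the property into an array. The Attribute and class_to_schema names and the Contestant class are hypothetical. The NIST JSON schemas were not necessarily produced this way, and they contain more than this sketch emits.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """One attribute of a simplified UML class (hypothetical model)."""
    name: str
    json_type: str               # JSON Schema type of one value, e.g. "string"
    lower: int = 0               # multiplicity lower bound
    upper: Optional[int] = 1     # multiplicity upper bound; None stands for "*"


def class_to_schema(class_name: str, attributes: list[Attribute]) -> dict:
    """Map UML multiplicity onto JSON Schema "required" and array keywords."""
    properties: dict = {}
    required: list[str] = []
    for attr in attributes:
        value_schema = {"type": attr.json_type}
        if attr.upper == 1:
            # [0..1] or [1]: a single value
            properties[attr.name] = value_schema
        else:
            # [0..*], [1..*], [m..n]: an array of values
            array_schema: dict = {"type": "array", "items": value_schema}
            if attr.lower > 0:
                array_schema["minItems"] = attr.lower
            if attr.upper is not None:
                array_schema["maxItems"] = attr.upper
            properties[attr.name] = array_schema
        # A lower bound of 1 or more means the property must be present.
        if attr.lower >= 1:
            required.append(attr.name)
    schema: dict = {"title": class_name, "type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


# Hypothetical class; names and types are illustrative, not from a CDF.
schema = class_to_schema("Contestant", [
    Attribute("Name", "string", lower=1),         # [1]
    Attribute("PartyId", "string"),               # [0..1]
    Attribute("Nickname", "string", upper=None),  # [0..*]
])
print(json.dumps(schema, indent=2))
```

In an XSD, the same multiplicities appear as the minOccurs and maxOccurs attributes of an element declaration, with maxOccurs="unbounded" standing in for *.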
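The technology-specific point above mentions constraints. Rules that involve more than one property are a good example of that nuance. As we understand the CDF convention, an enumerated Type set to "other" is paired with a free-text OtherType property that names the actual type. XSD 1.0 cannot express this kind of co-occurrence constraint; XSD 1.1 added assertions for it. JSON Schema can express it with if/then from draft-07 onward. Because the constraint is missing from some schemas, an implementer may need a separate check. The Python sketch below is hypothetical. It walks a JSON instance and flags any object whose Type is "other" but that lacks an OtherType. The function name and sample data are illustrative only.

```python
from typing import Any


def other_type_problems(node: Any, path: str = "$") -> list[str]:
    """Flag objects whose Type is "other" but that carry no OtherType."""
    problems: list[str] = []
    if isinstance(node, dict):
        # Assumed rule: Type == "other" requires a non-empty OtherType.
        if node.get("Type") == "other" and not node.get("OtherType"):
            problems.append(f"{path}: Type is 'other' but OtherType is missing")
        for key, value in node.items():
            problems.extend(other_type_problems(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(other_type_problems(item, f"{path}[{index}]"))
    return problems


# Hypothetical, trimmed instance.
report = {
    "GpUnit": [
        {"@id": "gp-1", "Type": "county"},
        {"@id": "gp-2", "Type": "other"},  # OtherType missing
        {"@id": "gp-3", "Type": "other", "OtherType": "fire district"},
    ]
}
for problem in other_type_problems(report):
    print(problem)  # $.GpUnit[1]: Type is 'other' but OtherType is missing
```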

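The following table maps each step of the election process to the CDFs that support it (BD = Ballot Definition, CVR = Cast Vote Records, EEL = Election Event Logging, ERR = Election Results Reporting, VRI = Voter Records Interchange):

Election Process Steps	CDFs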
PRE-ELECTION
begin election
decide to include contest on ballot	BD
decide to include candidate on ballot	BD
register candidate for election	BD
register voter	VRI
define election	BD
define ballot	BD
implement ballot	BD
install ballot on equipment	BD
verify election equipment is ready for election	EEL
ELECTION
open polls
authenticate/identify voter	VRI
connect voter to blank ballot	VRI, BD
voter interacts with ballot via interfaces	BD
voter edits ballot (selects, deselects) contest choices	BD
voter navigates ballot	BD
voter verifies contest selections	BD
voter casts/records ballot	CVR, mCDF
voter cancels/spoils ballot	BD
POST-ELECTION
close polls
count votes	CVR
consolidate votes	CVR, ERR
transfer information (physically, electronically)	CVR, ERR, EEL
report results (intermediate, final)	ERR
track/log election status throughout	EEL
archive election information and equipment	VRI, BD, ERR, EEL
audit election information and equipment	VRI, BD, ERR, EEL
accept election results	VRI, BD, ERR, EEL
end election

By making the implicit connections between the CDFs explicit, the guide aims to streamline understanding and adoption of the CDFs across diverse use cases and stakeholders. We at The Turnout are proud to have created this guide, and I encourage you to read and digest the material. Feel free to contact me, my colleagues at The Turnout, or the Voting team at NIST with any questions or ideas for enhancement.