Database and Information Systems Development: Data Dictionaries

Ron McFarland PhD
8 min readDec 8, 2021

by Ron McFarland, Ph.D., MSc, Post-Doc, CISSP

Data Dictionary — Meta Data about an Information System/Database Project

In this series of database and systems development articles, I’ll present several salient topics that can support good development processes. An essential element that is present in many software development and database development projects is the use of a viable and robust data dictionary. While some projects that I’ve worked on, either when the project was in development or the follow-on system maintenance phase, have not sufficiently used a data dictionary, I believe that the use of a data dictionary is a critical success factor for system development and project management.

What is a data dictionary?

A data dictionary can be considered a fundamental part of an information system and database design project. Whether the information system or database is in the active phase of development or the follow-on maintenance phase (the longest phase), there is a compelling technical and project management reason for the use of a viable and up-to-date data dictionary. The data dictionary is meta data (data about data) that generally describes the source and location of the data within the system and identifies the data owner of the data.

As the name suggests, a data dictionary (DD) is a collection of names, definitions, and attributes about data elements that are used in a database or information system. A DD can include database tables and schemas, table details (i.e., owners, security constraints, creation dates, etc.), how and where tables are stored, table constraints (i.e., primary key, foreign key, default values, value ranges, attributes, etc.), and database views (Meador, 2018). A DD describes the meanings and purposes of data elements within the context of a project, and provides guidance on interpretation, accepted meanings and representation.

Data dictionaries are used to catalog and communicate the structure and content of data and provides descriptions for any named data objects. The information from data dictionary is mostly used for filling in entities and attributes for the database system. Aside from our IT-focus in this article, tangentially, a data dictionary can also be used as a part of an extensive research project (UCMerced Library, 2021). In short, a data dictionary:

1. Assists in avoiding data inconsistencies across a project.

2. Helps to define conventions used within a project or across several projects.

3. Provides consistency in the collection and use of data across multiple members of a development/maintenance (or research) teams.

4. Makes it easier to analyze data, as the data use and data typing is described and defined.

5. Enforces the use of data standards, including scope and data names, within and between projects.

Data dictionaries can facilitate the sharing of information about the data, the data structures, and associated business rules that guide the use and need for the data used in the information systems or database project. A data dictionary is a valuable resource for DB Administrators (DBAs) as it consists of data that details the data content of the database. The DD may also provide the roles or privileges and where the data is stored (a physical location). The use of a data dictionary in a development or maintenance project can help support the project quality and consistency as the dictionary describes conventions for any associated project that use the relevant data. Without the adequate use and support of a viable data dictionary, there is an increased chance of misrepresenting information in the translation and transition of data used within an associated development or maintenance effort.

A data dictionary includes several items

Data dictionaries act as a repository for metadata, or data that gives information about other data, and to help control said metadata (Bourne, 2013; Middleton, 2007). A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a “centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format”. Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS):

1. A document describing a database or collection of databases.

2. An integral component of a DBMS that is required to determine its structure.

3. A piece of middleware that extends or supplants the native data dictionary of a DBMS.

Common metadata includes names of fields, tables, databases, data types, data size, comments, privileges, triggers, and modules (Middleton, 2007). Some of the salient information that is included in a typical data dictionary are:

  • Lists of data objects used in the project along with their description and location of use.
  • Details about the properties of data elements such as data type, size, indexes, and other data-centric information.
  • Entity-relationship diagrams, data flow diagrams, and other system level diagrams relevant to the data and associated project(s).
  • Business rules including justification, data validation, data quality parameters, security parameters, and testing requirements.
  • The information contained within the data dictionary holds a variety piece of information pertaining to processes, entities, data files, etc.
  • Some of the common attributes between the different instances include names, contents, requirements for creating data, and constraints that are imposed for the data, relevant processes invoked to create, secure, delete, and/or update data.
  • A DD can contain detailed processes for the actions that the object will typically take. An example would be the internal process for creating a new customer record that is invoked by the information system/database system.

Two types of data dictionaries

There are two generally types of data dictionaries. An Active Data Dictionary is updated dynamically when a database structure and specification changes in the database product. The DBMS in which the data dictionary resides will manage these types of data dictionary updates (Meador, 2018).

A Passive Data Dictionary is maintained independently from the database/information systems development process to which it contents are stored in the data dictionary. The updates are manually completed, usually by the DBA. A manual update provides a challenge in that there will be a time delay to the completion of data dictionary updates; hence, the data dictionary will easily become out of sync from the Database structure. This approach is not desirable for a database that undergoes frequent or complicated updates (Meador, 2018).

Figure 1: Example of a passive data diction in Excel format, from “Data Dictionaries” USGS (2021).

How Data Dictionaries are Used

Data dictionaries (DD) support the full project life cycle for information systems and/or database development, as is noted below:

  • Documentation: DDs provide details about data structures relevant for users, developers, and other system stakeholders.
  • Communication: The DD equips developers and users with a common vocabulary. Definitions for shared data, data standards, data flow and exchange, will help users and developers gage impacts of any schema modifications (including additions and maintenance).
  • Application Design: A DD can support database and application developers with the appropriate creation of related forms (screens) and reports with proper data types and controls and can ensure that navigation is consistent with data relationships.
  • Systems Analysis: A DD can support the understanding of data analysts with the overall system design and data flow, and to find where data interact with various processes or components.
  • Data Integration: A DD supports clear definitions of data elements and provide the contextual understanding necessarily when mapping one data system to another, or whether to subset, merge, stack, or transform data for a specific use.
  • Decision Making: The DD can assist in planning data collection, project development, and other collaborative efforts.

The Data Dictionary and the SDLC

Data dictionaries are a major component of the SDLC that facilitate the developmental progress of the system. A data dictionary is considered a vital part of a software/database development project. For groups of people working with the same data, having a shared data dictionary facilitates standardization by documenting common data structures and provides the precise vocabulary necessary to foster effective communication about the data and processes between teams. Shared dictionaries used by development and maintenance teams ensure that the meaning, relevance, and quality of data elements are the same for all users. Data dictionaries provide information needed by those who build systems and applications that support the data.

The DD is meant to keep a detailed record of each instance in a data flow diagram and help to help serve as a reference aid to the developers during the project development. The purpose of a data dictionary is for developers to record standard information regarding the elements of the database in one place, making it accessible to many parts of a project (Dennis, 2012). The metadata in the DD is assessed by developers and users throughout the SDLC process and showcases the capability to perform a process (Dennis, 2012; Ferguson & Hebels, 2003). Also, use of data dictionaries in the business sector allows for quality control of data within an information system or database (Middleton, 2007).

For both data reviewers and data users, the data dictionary can reveal potential credibility problems within the data. Poor table organization and object naming can severely limit data understandability and ease-of-use, incomplete data definitions can render otherwise stellar data virtually useless, and failure to keep the dictionary up to date with the actual data structures suggests a lack of data stewardship. Although getting critical feedback about their data may be initially troublesome for some data creators, developing good data design and description habits is worth the effort and ultimately benefits everyone who will use the data.

Within a project, the DD equips users with a common vocabulary and definitions for share data, data flow and exchange. It can also assist in planning data collection, project development, and other collaborative efforts (USGS, 2021). The DD can be used to mitigate data inconsistencies within a given project as it serves as a repository of attribute (field) names. The DD is used as a catalog to describe the meanings and purpose of data elements within the context of a project, provides guidance on interpretation.

Summary

In summary, the use of a Data Dictionary in an Information Systems or Database Development project is essential. While the use of the DD is part and parcel of the system documentation process, the DD plays a vital role in the analysis and designing of a system as the central repository for detailed descriptions of entities, file, data flows, and processes. The data dictionary serves as a living document that stores vital information throughout the analysis and design of a system that supports discoveries of the system, assumptions made about the system, can indicate data conflicts, may identify system misconceptions, and supports the system developers and project manager with the development, implementation, and maintenance of an information system or database development project.

About the Author

Ron McFarland, PhD, CISSP is a Senior Cybersecurity Consultant at CMTC (California Manufacturing Technology Consulting) in Torrance, CA. He received his doctorate from NSU’s School of Engineering and Computer Science, MSc in Computer Science from Arizona State University, and a Post-Doc graduate research program in Cyber Security Technologies from the University of Maryland. He taught Cisco CCNA (Cisco Certified Network Associate), CCNP (Cisco Certified Network Professional), CCDA (Design), CCNA-Security and other Cisco courses and was honored with the Cisco Academy Instructor (CAI) Excellence Award in 2010, 2011, and 2012 for excellence in teaching. He also holds multiple security certifications including the prestigious Certified Information Systems Security Professional (CISSP) certification and several Cisco certifications.

CONTACT Dr. Ron McFarland, MSc, PhD, CDNA, CISSP

CMTC Email: rmcfarland@cmtc.com
Email: highervista@gmail.com
LinkedIn: https://www.linkedin.com/in/highervista/
Website: https://www.highervista.com
YouTube Channel: https://www.youtube.com/channel/UCJ57_1OgZ5H1nMVdGElcvrw

--

--

Ron McFarland PhD

Cybersecurity Consultant, Educator, State-Certified Digital Forensics and Expert Witness (California, Arizona, New Mexico)