When creating a data dictionary, it is critical not to design with it. You should not be attempting to create a database nor should you be showing the relationships of data. This will be a detailed description of the data involved in your project. It will also be one of the largest morale draining approval processes you can have for a document. They will be very time consuming an have the potential to have a good deal of churn over the data items they hold.
One of my former projects demanded a great amount of detail be gathered from the existing systems to be implemented in a new outside system. Unfortunately there were several large problems with the way Data Dictionaries were used. One of the two largest problems was that there were upwards of twenty different data dictionaries that were being worked on independently by different functional areas of the project. The other huge issue was that only fields that were visible to the business were being captured.
Having multiple data dictionaries for the same project causes many issues and can lead to delayed development and incomplete requirements. If there are 10 different Data Dictionaries that all call for a field identified as ‘Credit’, how does one figure out if they are all the same field? If they are the same field, which of the different entries has the correct business rules and attributes? If they are not the same field, how will the names be reconciled? If they are different, which one was being referred to in other documentation and models? And what about the one field labeled ‘CustCred’. Was that the same? Were all the business SME’s aware that ‘CustCred’ was included in only one of the data dictionaries?
Failure to engage the Business and IT in creating a Data Dictionary can lead to a plethora of data not being identified and documented. Unfortunately when only the business is involved in creating a Data Dictionary, it tends to only get populated with fields that can be seen, either on the screen or in reports. IT will be able to provide input around these fields. What is not being seen is that when you enter your account number and password, the system is keeping a log of your user id, the date the login attempt occurred, the outcome, and your IP address. Without IT there, you could have just lost 4 of the 6 fields required to login. Additionally, if you have to go back a separate time with the IT group, you will have to bring the Business in even more to make sure that the business name you have recorded is the same as the field IT identified. Just because it says customer name on the screen doesn’t eliminate the possibility that it is actually ‘cust_legal_name’ in the database.
I have found the following fields to be a robust baseline for any data dictionary I create:
-ID – unique ID to keep track of the data fields
-Category – Organizational field that can help group data
-Business Name – Official or most widely recognized name for the data
-Aliases – Abbreviations or alternative names the data could go by
-Data Field – Field used in the database (mostly used on gap projects)
-Description – Short description of the field for context
-Formula – If the field is calculated, include the formula used to come up with the value here
-Business Rules – Any actions or restrictions that must apply to this data
-Data Type – generic data type such as ‘alpha’, ‘numeric’, ‘alphanumeric’, ‘currency’, ‘date’, or ‘time’
- Length – Maximum allowed length of the field
-Owner – Individual or group who currently owns an action item for the field
-Status – Draft/Reviewed/Approved/Removed/etc.
- Notes – Misc notes to be taken about the field such as links or individual’s comments
-Issues – List of unresolved issues or action items with the field
Keep in mind that the Data Dictionary is best in situations where you have an undefined data model or in situation where you are involving new systems. This data model will become less useful when dealing with a smaller or more defined/understood data set or if you are short on time.
Unfortunately this model is also not a visual model. This means that it will be much harder for people consuming it to quickly learn the data being presented and it will also be difficult to quickly interpret the data being presented.
You are almost always going to find this document living as an excel spreadsheet. If you plan on having multiple large groups working on the same project, you should consider having a collaboration tool for this document such as Google Docs or SharePoint lists. Multiple lists being worked on separately can cause massive headaches and prove to be massive time sink when trying to correct and compile. As a best practice, one should never delete data either, but rather mark it as removed or do a strikethrough formatting. This will allow you to know that it was documented but deemed unnecessary instead of forgetting about that one remote field and researching it again to find out you cut it already.
Pro Tip for excel users:
Holding down Alt and pressing Enter (Alt + Enter) while typing in a cell will allow you to return
to the next line within the cell as if it were a word document. This will make it easier to format notes or add multiple aliases/data fields to a single row.
Fun tip- for those who have sympathy for making others do a data dictionary review: bring koosh balls, stress relievers or fun packs of play-doh for your attendees to keep their brains semi-active during the reviews.