The Community Blog for Business Analysts

Seilevel
Seilevel

RML® Model 4 – Data Dictionary

When creating a data dictionary, it is critical not to design with it.  You should not be attempting to create a database nor should you be showing the relationships of data.  This will be a detailed description of the data involved in your project.  It will also be one of the largest morale draining approval processes you can have for a document.  They will be very time consuming an have the potential to have a good deal of churn over the data items they hold.

 

One of my former projects demanded a great amount of detail be gathered from the existing systems to be implemented in a new outside system.  Unfortunately there were several large problems with the way Data Dictionaries were used.  One of the two largest problems was that there were upwards of twenty different data dictionaries that were being worked on independently by different functional areas of the project.  The other huge issue was that only fields that were visible to the business were being captured.

 

Having multiple data dictionaries for the same project causes many issues and can lead to delayed development and incomplete requirements.  If there are 10 different Data Dictionaries that all call for a field identified as ‘Credit’, how does one figure out if they are all the same field?  If they are the same field, which of the different entries has the correct business rules and attributes?  If they are not the same field, how will the names be reconciled?  If they are different, which one was being referred to in other documentation and models?  And what about the one field labeled ‘CustCred’.  Was that the same?  Were all the business SME’s aware that ‘CustCred’ was included in only one of the data dictionaries? 

 

Failure to engage the Business and IT in creating a Data Dictionary can lead to a plethora of data not being identified and documented.  Unfortunately when only the business is involved in creating a Data Dictionary, it tends to only get populated with fields that can be seen, either on the screen or in reports.  IT will be able to provide input around these fields.  What is not being seen is that when you enter your account number and password, the system is keeping a log of your user id, the date the login attempt occurred, the outcome, and your IP address.  Without IT there, you could have just lost 4 of the 6 fields required to login.  Additionally, if you have to go back a separate time with the IT group, you will have to bring the Business in even more to make sure that the business name you have recorded is the same as the field IT identified.  Just because it says customer name on the screen doesn’t eliminate the possibility that it is actually ‘cust_legal_name’ in the database.

 

 

I have found the following fields to be a robust baseline for any data dictionary I create:
 
 
-ID – unique ID to keep track of the data fields
-Category – Organizational field that can help group data
-Business Name – Official or most widely recognized name for the data
-Aliases – Abbreviations or alternative names the data could go by
-Data Field – Field used in the database (mostly used on gap projects)
-Description – Short description of the field for context
-Formula – If the field is calculated, include the formula used to come up with the value here
-Business Rules – Any actions or restrictions that must apply to this data
-Data Type – generic data type such as ‘alpha’, ‘numeric’, ‘alphanumeric’, ‘currency’, ‘date’, or ‘time’
- Length – Maximum allowed length of the field
-Owner – Individual or group who currently owns an action item for the field
-Status – Draft/Reviewed/Approved/Removed/etc.
- Notes – Misc notes to be taken about the field such as links or individual’s comments
-Issues – List of unresolved issues or action items with the field
 
Keep in mind that the Data Dictionary is best in situations where you have an undefined data model or in  situation where you are involving new systems.  This data model will become less useful when dealing with a smaller or more defined/understood data set or if you are short on time.
 
 Unfortunately this model is also not a visual model.  This means that it will be much harder for people consuming it to quickly learn the data being presented and it will also be difficult to quickly interpret the data being presented.
 
You are almost always going to find this document living as an excel spreadsheet.  If you plan on having multiple large groups working on the same project, you should consider having a collaboration tool for this document such as Google Docs or SharePoint lists.  Multiple lists being worked on separately can cause massive headaches and prove to be massive time sink when trying to correct and compile.  As a best practice, one should never delete data either, but rather mark it as removed or do a strikethrough formatting.  This will allow you to know that it was documented but deemed unnecessary instead of forgetting about that one remote field and researching it again to find out you cut it already.
 
Pro Tip for excel users:
 
Holding down Alt and pressing Enter  (Alt + Enter) while typing in a cell will allow you to return
to the next line within the cell as if it were a word document.  This will make it easier to format notes or add multiple aliases/data fields to a single row.
 
Fun tip- for those who have sympathy for making others do a data dictionary review: bring koosh balls, stress relievers or fun packs of play-doh for your attendees to keep their brains semi-active during the reviews.

by jheep

Want more models? check us out here.

This entry was published on Aug 05, 2010 / Seilevel. Posted in Elicitation (BABOK KA), Data Analysis & Modeling, Business Analysis. Bookmark the Permalink or E-mail it to a friend.
Like this article:
  2 members liked this article

Related Articles

COMMENTS

jm posted on Tuesday, September 21, 2010 1:16 AM
In analysis of legacy systems, some organisations migrate data from one database to one or more other databases. Depending on the scope of the project, it may be useful to use logical or physical database references in the Data Field to clearly identify different source fields, for example, Database.Table.Column. Where this type of data migration occurs, it is not unusual for the same field names to contain different data, or for the same data to be contained in fields with different names.
jm
Only registered users may post comments.

Modern Analyst Blog Latests

As we start a new year many of us will take the time to reflect on our accomplishments from 2012 and plan our goals for 2013. We can set small or large goals. goals that will be accomplished quickly or could take several years. For 2013, I think Business Analysts should look to go beyond our traditional boundaries and set audacious goals. Merriam-...
Recently, I was asked by the IIBA to present a talk at one of their chapter meetings. I am reprinting here my response to that invitation in the hope that it will begin a conversation with fellow EEPs and BAs about an area of great concern to the profession. Hi xx …. Regarding the IIBA talk, there is another issue that I am considering. It's p...
Continuing the ABC series for Business Analysts, Howard Podeswa created the next installment titled "BA ABCs: “C” is for Class Diagram" as an article rather than a blog post. You can find the article here: BA ABCs: “C” is for Class Diagram Here are the previous two posts: BA ABCs: “A” is for Activity Diagram BA ABCs: “B” is for BPMN

 



Blog Information

» What is the Community Blog and what are the Benefits of Contributing?

» Review our Blog Posting Guidelines.

» I am looking for the original Modern Analyst blog posts.

 




Copyright 2006-2025 by Modern Analyst Media LLC