A Data-Driven Approach to Specifications


"The ultimate purpose of any information system is, of course, to manipulate data in some form." Software Systems Architecture, Nick Rozanski and Eoin Wood

The Value of Data

Imagine for a moment that you're a CIO faced with an either/or decision: retain your company's multi-million-dollar ERP system, or the data contained within that system. What would you do?

Not a difficult choice. Enterprise applications exist to manage your company's data, and although costly, whether custom or commercial, they're commodities that, given time and resources, can be completely replaced. Your company's data, however, is not only irreplaceable, but is essential to maintaining your company's competitive position.

But that same data sometimes takes a back seat in requirements analysis, where the focus is very often on business processes. For a business person, it's more intuitive (and much more interesting) to describe what he does on his job than to discuss the contents of his filing cabinets. The mass digitization of business data has also encouraged the perception of data as being "technical", resulting in much of the responsibility for data details being delegated to IT staff.

Many studies have concluded, coincidentally or otherwise, that struggling IT projects are often plagued by difficulties in

  • Stating business requirements precisely and comprehensively

  • Accurately transforming business requirements into system specifications

  • Responding rapidly to change in both business requirements and technology platforms.

Business requirements are usually captured in narratives and graphics that, regardless of how detailed, structured, cross-referenced and validated, are fundamentally imprecise. A data-driven approach to specifications can help avoid these problems, reducing the risk of companies' IT investments and increasing the return on them.

Business Requirements

Enterprise applications, in contrast to personal productivity suites and video games, are much like filing cabinets. The data in an enterprise application is very similar to the data that used to be stored in filing cabinets. Computers just make it much easier to store much more of it.

The functions performed on data by an enterprise application are also relatively simple: creating, retrieving, updating and deleting records; performing calculations (creating more data); and testing for conditions in persistent or transient data.

Most of the complexity in enterprise applications is in fact a consequence of data dependencies, which are seldom explicitly and precisely expressed in requirements. But when recognized, understood and deliberately managed, dependencies allow a collection of data and functionality to be largely self-organizing [1].

Capturing data requirements, and subsequently the functions required to manage that data, clearly exposes the dependencies among data and functions. A data-centric specification metamodel allows business requirements to be expressed precisely and unambiguously in terms of data, operations and conditions which can then be accurately transformed into implementable enterprise application components.
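As a rough illustration of how explicit dependencies let components organize themselves, the sketch below derives a valid build order purely from a dependency map. The map itself is an assumption made for illustration: the component names echo the registration example that appears later in this article, but the exact dependencies (including Registration depending on Semester) are hypothetical and not taken from the author's metamodel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each component lists what it depends on.
dependencies = {
    "Student": set(),
    "Class": set(),
    "Semester": set(),
    "RegMax": set(),
    "Registration": {"Student", "Class", "Semester"},
    "RegCount": {"Registration"},
    "UnderRegMax": {"RegCount", "RegMax"},
    "CreateReg": {"UnderRegMax"},
}


def build_order(deps):
    """Return the components so that each one follows everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

Nothing in the sketch states the order by hand; it falls out of the declared dependencies, which is the sense in which the collection is self-organizing.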

Requirements to Specifications

Specifications differ from requirements in that they are specific enough to be accurately transformed into implementable application components. Performing this transformation accurately is the challenge faced by analysts and architects, who must also contend with multiple semi-compatible formats and standards for both. A specifications metamodel can solve many of these problems by providing an unambiguous target format for the transformation of requirements into specifications.

There are many ways to express business requirements, but let's look at an example of a requirement in the form of a business rule stated in natural language:

“A student must not register for more than 6 classes in a semester.” [2]

A rule such as this is implemented in an enterprise application by allowing or preventing the assignment of specified values to specified variables under specified conditions. To do this, these operations, values, variables, conditions and their inter-relationships need to be stated more precisely than is possible with natural language. How then can a requirements statement such as this business rule be transformed into such specific terms?

The first step is to identify the variables in the business requirements. Data analysts usually begin by looking for nouns, such as, in this case, Student, Class, and Semester. But the most important data in this rule is actually the verb "Register". Expressing this concept in its noun form, Registration, designates the relationship between the nouns Student and Class. This business rule is then all about a condition on Create operations for the Registration variable.

The resulting specifications are shown in the table below. The Path column shows each variable, operation and condition as a leaf of a branch of its relationship tree, clearly showing the parent-child interdependencies. The Kind column shows what the component is, and the Link column shows its relationship to its immediate parent.

[Table: specifications for the registration rule, with Path, Kind and Link columns. The table was an image in the original article and is not reproduced here.]

The names in the Kind column are examples of entity types in the specifications metamodel: Registration is a Persistent Variable, RegCount is a Transient Variable, LT is a Condition and CreateReg, an Operation. The names in the Link column are relationship types in the metamodel: Type-Variable, Dependent Variable, and three Independent Variable roles. [3]

Because perhaps next semester a student could legitimately register for 7 classes, or 5, the value "6" in the business rule, rather than being a constant, is expressed as a value that is part of a condition (not shown) constraining the variable RegMax. The variable UnderRegMax is a Boolean type variable that, if it evaluates to TRUE, allows the Create operation.

To avoid the ambiguity of "ELSE" (as Ross recommends), a violation of this rule is specified as a separate, opposite condition.
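To make the roles of these components concrete, here is a minimal Python sketch of the rule. It is not output from the prototype tool and does not reproduce the metamodel. The class name `RegistrationStore`, the method names, and the tuple layout are all hypothetical. It assumes that LT means "less than", so that UnderRegMax is TRUE when RegCount is less than RegMax.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationStore:
    """Hypothetical stand-in for the persistent Registration variable."""

    reg_max: int = 6  # RegMax: held as data so it can change between semesters
    rows: list = field(default_factory=list)  # (student, class_id, semester)

    def reg_count(self, student, semester):
        """RegCount: a transient value derived from the stored registrations."""
        return sum(1 for s, _, sem in self.rows if s == student and sem == semester)

    def under_reg_max(self, student, semester):
        """UnderRegMax: the LT condition, RegCount < RegMax."""
        return self.reg_count(student, semester) < self.reg_max

    def at_or_over_reg_max(self, student, semester):
        """The opposite condition, stated explicitly rather than as an ELSE."""
        return self.reg_count(student, semester) >= self.reg_max

    def create_reg(self, student, class_id, semester):
        """CreateReg: the Create operation, allowed only when UnderRegMax is TRUE."""
        allowed = self.under_reg_max(student, semester)
        if allowed:
            self.rows.append((student, class_id, semester))
        return allowed


store = RegistrationStore()
results = [store.create_reg("s1", f"c{i}", "spring") for i in range(7)]
print(results)  # six registrations are accepted, the seventh is refused
```

Because RegMax is a field rather than a literal in the condition, raising the limit to 7 for a later semester is a data change, not a code change.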

The table above was extracted from a prototype tool for defining specifications with sufficient precision to allow the automatic generation of data definitions and executables for an application. Specifications structured in this way can be viewed and managed much like a bill of materials, providing a common, precise and comprehensive understanding for business stakeholders, analysts, designers and developers alike, even for a very large application.

Design and Implementation

The application design process is a crafting of relationships between specifications and implementation platforms. Much of the process consists of deciding how to best separate data and operations into the most computer-digestible chunks. Data dependencies spanning these chunks become interfaces, and each interface increases the overall complexity of the application.
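One way to see how partitioning choices create interfaces is to count the dependencies that cross chunk boundaries. The sketch below does this for two candidate partitions. The dependency map and both partitions are invented for illustration, and the chunk names ("data", "rules", "a", "b") are hypothetical.

```python
# Hypothetical dependency map: each component lists what it depends on.
deps = {
    "Registration": {"Student", "Class"},
    "RegCount": {"Registration"},
    "UnderRegMax": {"RegCount", "RegMax"},
    "CreateReg": {"UnderRegMax"},
}


def interface_dependencies(deps, partition):
    """Return the (dependent, dependency) pairs that span two different chunks."""
    return sorted(
        (node, dep)
        for node, node_deps in deps.items()
        for dep in node_deps
        if partition[node] != partition[dep]
    )


# Candidate A: stored data in one chunk, rule logic in another.
partition_a = {
    "Student": "data", "Class": "data", "Registration": "data", "RegMax": "data",
    "RegCount": "rules", "UnderRegMax": "rules", "CreateReg": "rules",
}

# Candidate B: an arbitrary split that ignores the dependency structure.
partition_b = {
    "Student": "a", "RegCount": "a", "CreateReg": "a",
    "Class": "b", "Registration": "b", "RegMax": "b", "UnderRegMax": "b",
}

crossings_a = interface_dependencies(deps, partition_a)
crossings_b = interface_dependencies(deps, partition_b)
print(len(crossings_a), len(crossings_b))
```

In this toy case the split that follows the dependency structure needs two interfaces and the arbitrary one needs four, which is the kind of comparison that explicit dependencies make possible before any code is written.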

Precisely-structured, fine-grained specifications and explicit delineation of all interdependencies as described above allow for intelligent bottom-up assembly and/or top-down partitioning of implementation components that range in detail from individual operations and variables up to entire monolithic applications.

Perpetual Change

In any sort of engineering construction, a stable foundation enables flexibility in adapting structures built upon it.

Very often by the time an enterprise application goes into production, significant changes have occurred in the business it supports and/or its implementation technology. The form and content of a business's data typically change much more gradually [4]. Although it is not truly possible to "decouple" data and function because of their interdependency, it is possible to understand those interdependencies clearly, precisely and comprehensively.

In a typical application development project, text documents (requirements and specifications) are manually transformed into other text documents (source programs) that are then compiled into a form usable by computers. As a result, dependencies are obscured in semi-structured text, which effectively creates resistance to change in the system. If only a small portion of the dependencies is visible at one time, it is difficult to assemble a comprehensive picture of which components depend on which others.

An enterprise application comprising components with visible and precisely delineated dependencies is much more adaptable. A specifications bill of materials enables clear and accurate dependency analysis and assessment of the impact of potential changes. The scope and impact of any change can be accurately assessed before being implemented, and the change can be applied with predictable results.
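Impact assessment of this kind amounts to a traversal of the dependency graph. The following sketch, under the assumption that dependencies are available as a simple map from each component to the components it depends on, finds everything directly or indirectly affected by a change. The map and the function name `impacted_by` are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
deps = {
    "Registration": {"Student", "Class"},
    "RegCount": {"Registration"},
    "UnderRegMax": {"RegCount", "RegMax"},
    "CreateReg": {"UnderRegMax"},
}


def impacted_by(deps, changed):
    """Return every component that directly or indirectly depends on `changed`."""
    # Invert the map so we can walk from a component to its dependents.
    dependents = {}
    for node, node_deps in deps.items():
        for dep in node_deps:
            dependents.setdefault(dep, set()).add(node)

    seen = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over dependents
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(impacted_by(deps, "RegMax")))   # ['CreateReg', 'UnderRegMax']
print(sorted(impacted_by(deps, "Student")))
```

A change to RegMax touches only the condition and the operation it guards, while a change to Student reaches every component downstream of Registration.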

Requirements to Implementation: Common Ground

The specifications metamodel introduced in this article provides an anchor point between requirements and implementation. The model and supporting tools are being developed as part of the Data-Driven Application Engineering [5] effort underway at IBM. Watch for more details at Modern Analyst and The Data Administration Newsletter (www.tdan.com). Please email questions and comments to [email protected].

Author: Bill Lewis

Bill Lewis is a Senior IT Specialist with IBM. His experience spans the financial services, energy, health care and software industries. His current specializations are data management, metadata management and business intelligence, and he has been recognized as a thought leader on topics ranging from software development tools to IT architecture. He has contributed to numerous online and print publications, and is the author of Data Warehousing and E-Commerce. He can be reached at [email protected].


[1] Normalization is the discipline by which these natural patterns of data behavior are discovered and specified.

[2] From the Modern Analyst article "What's Wrong with If-Then Syntax for Expressing Business Rules" by Ron Ross.

[3] Some readers may be thinking at this point "this looks like just another programming language...why should I have to learn a programming language to write specifications?" Consider this: the example includes nine metamodel entity and relationship types, out of a total of fourteen in the entire metamodel. If you understand the example you've already learned nearly two-thirds of the language elements!

[4] In fact, when application data is converted from one form to another, one of the primary goals of testing is confirming the equivalence of the data before and after the conversion.

[5] U.S. Patent applied for.

Article Image © Tyler Olson - Fotolia.com



ron segal posted on Tuesday, February 15, 2011 1:16 PM
Bill, yes, I'm afraid that what you're describing has almost become a lost art, with a new generation of 'analysts' who understand only process-oriented Use Cases. We need more thought leaders such as yourself to turn this around.

Something that generally gets in the way here, I find, is that 'data' is seen as a technical design issue, whereas the fundamental concern is with unequivocally identifying and describing (classifying) the 'things' that a business operates upon and which operate upon the business (customers, products, contracts etc). In other words, describing the real-world framework in which a business exists, its 'ontology'.

Alan posted on Tuesday, February 15, 2011 5:05 PM
Both data modelling and process-focussed documentation would be involved in representing these relationships and business rules. It is not clear what deficiencies in the combination of data modelling and process documentation are being covered by the new alternative recommended here.

atv posted on Wednesday, February 16, 2011 8:25 AM
Bill, thank you for the article. On one hand, a data metamodel makes communication with developers easier. On the other, it is another instrument for eliciting hidden requirements when communicating with customers.



Copyright 2006-2024 by Modern Analyst Media LLC