
Data Management Approach

What are the principles that need to be followed to ensure a more pragmatic and productive approach to actually improve the data?

The “What” of Data Management was discussed in a previous post, Data Management Goals.

Data Management – the What

The DAMA Data Management Body of Knowledge (DMBOK) focuses on activities, roles and responsibilities organized by processes.  It also covers every subject area and approach in the industry.  More importantly, it shows that the scope of Data Management is quite broad.  DMBOK provides the art of the possible.  As a result, it has become quite large and the text has been subdivided into many sections.

I have always felt that DMBOK is an amalgamation of general best practices provided by the best-known data professionals in the industry.  It is great to be able to point to DMBOK to support your ideas on implementing Data Management.

Data Management – the How

DMBOK does not tell you what tools to buy nor the standards or methods needed to implement “the how” of data management.  That was not the intent of DMBOK.  In fact, no one would want nor expect DAMA to promote specific tools and solutions. Each organization has its own requirements and goals.

Figure 1: Data Management Program Overview

I do not feel that we have to do everything exactly as stated in DMBOK.  We also need to allow for creativity in coming up with new and better ideas, as well as tailor the Data Management Program (DMP) to each organization (org).  In my deliverables I have also added best practices and lessons learned based on over 35 years of experience developing Enterprise Data Warehouses and other major databases and Big Data stores.  Data management, data architecture, data quality, data integration and business analytics were essential for the success of these projects.

Often I had to promote data management best practices and standards when no formal program yet existed. Some of the best practices used were:

  • Follow a data-centric rather than a tool/solution-centric approach;
  • Define the data problems precisely using data profiling rather than just stating subjective opinions;
  • Develop a Data Management approach that is incremental rather than “trying to boil the ocean”;
  • Enable incremental changes to the data quality (remediation) rules to be applied to the data already loaded into the database or data warehouse;
  • Track and persist the causes and results of the data remediation directly in the database, making them available for Business Intelligence reporting.
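To make the profiling principle above concrete, here is a minimal sketch of column profiling using Python's built-in sqlite3 module. The customer table and its columns are purely illustrative, not from any client system; a real profiling pass would cover patterns, ranges and frequency distributions as well.

```python
import sqlite3

# Hypothetical staging table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, name TEXT, province TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme", "ON"), (2, "", "QC"), (3, None, "ON"), (4, "Acme", None)],
)

def profile(conn, table, column):
    """Return (rows, nulls, blanks, distinct) counts for one column."""
    return conn.execute(
        f"SELECT COUNT(*),"
        f" SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END),"
        f" SUM(CASE WHEN TRIM(COALESCE({column}, 'x')) = '' THEN 1 ELSE 0 END),"
        f" COUNT(DISTINCT {column})"
        f" FROM {table}"
    ).fetchone()

print(profile(conn, "customer", "name"))  # (rows, nulls, blanks, distinct)
```

Numbers like these replace “the names look dirty” with “25% of names are null and 25% are blank”, which is the kind of precise statement the Business can act on.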

Figure 2: Data Quality Status Dashboard

Data Management – the Why

The value sections in the detailed frameworks listed below explain the benefits to business management.  The Data Management Program Framework, value proposition briefs and presentations discuss the WHY of the overall DMP.  These deliverables would be used to gain the support of management.

Data Management Program Approach

Generally, companies develop a framework to define their program. The frameworks obtained from other organizations were 20 to 44 pages long, covering all functional areas of DMBOK.  These documents were generic and provided motherhood statements without much value.

I wanted to go way beyond platitudes and describe how to implement data management using best practices from the industry as well as my experiences.  I wanted a framework that was practical rather than just conceptual.

The approach taken was to write a separate, in-depth Framework to explain each subject area. This allows the Business and IM/IT resources to focus on the material relevant to them.  Each deliverable is an object-oriented component that stands on its own and does not repeat the information stored elsewhere.

Each DMP Framework is 30 to 60 pages in length.  It provides practical, concrete advice and best practices for specific tasks and deliverables.  A Value section explains why each item will benefit the Business and provides references to external sources (DMBOK, DGI) and internal standards documents (Guidebooks).

A Guidebook would specify best practices and standards that must be followed for a specific subject area.  Guidebooks can be created when there is a substantial amount of technical advice and standards that needs to be applied by data and IM/IT professionals.

Alternatively, information geared to technical resources could be placed in appendices. However, if the technical material is many pages long it is better to make it a separate component.  This also allows changes to be made independently to each document.

Data Management Frameworks

The diagram below, adapted from DMBOK, simplifies and consolidates the program into five subject areas.  Why did I organize it this way?  I did not agree with the placement of some subject areas in DMBOK.  I feel that the subject matter must be organized so that the best practices and standards are in one place, making it easier for resources to do their work.

Figure 3: Data Management Program Framework

The circle segments in the figure above indicate which subject areas in the DMP framework correspond to functional areas in the same position in the original DMBOK figure.  Given interest, each of the following deliverables could be the subject of another – or a few – posts:

  • Data Management Program Framework
    overview framework that discusses the WHY of the DMP
  • Data Governance Framework
    describes Data Governance roles and activities, DG Model and the interrelationships of the Data Stewards, Data Quality Office, Project Management Office, Architecture Review Board and DG Council
    describes the Data Governance processes to ensure that frameworks are followed
  • Data Architecture Framework
    explores activities, deliverables, roles and responsibilities, and value
    describes Enterprise Architecture Principles (from TOGAF) as well as the author’s architectural principles behind the best practices and standards 

  • Data Quality Framework
    examines DQ definition, activities, deliverables, DQ principles, DQ Rules, and measurement   
  • Data Integration Framework
    defines DI Architecture, cleansing approach, re-usable cleansing functions, Master Data Management approach, result tracking approach, and DQ measurement and reporting   
  • Data Warehouse-Business Intelligence Framework
    explains the pillars of a Business Analytics Framework, activities, metrics, data modelling and BI semantic layer, DQ reporting, and Centre of Excellence

In addition, you will need:

  • Guidebooks – probably for Data Architecture, Data Integration and Business Intelligence
  • Training Documents – briefs, presentations… that discuss the WHY of the DMP
  • Tool Evaluations – Data Modelling, ETL, BI and other tools

Here are some comments on the comparison of the DMP subject areas to the DMBOK functional areas.

Data Architecture

The Data Architecture Framework (DAF) combines DMBOK’s Data Architecture, Data Modelling, Metadata and Interoperability into a single document.  Data Development (Data Modelling) and Metadata Management are integral to Data Architecture, a point recognized by DAMA.  They separated them because the original chapter was getting too long.

I put them together because this is the key area and focus of data architects and modellers. Metadata and modelling are inseparable.  We need data modelling rather than the just-build-it approach still used today.  

Interoperability

DAMA places Interoperability within the Data Integration function but my DMP moved Interoperability to Data Architecture.  It is true that data exchange is data movement.  However, a well-designed data architecture is the essence of interoperability.  The most important aspect to achieving Interoperability is a common data model; i.e. an Enterprise Data Model (EDM).

Figure 4: Interoperability via Canonical Data Model

I have worked on NATO interoperability and the key ingredient is the common canonical data model, whose use is mandatory for every NATO member.  Canada’s Dept. of National Defence (DND) and the JC3IEDM model are even mentioned by the TOGAF architecture framework as the example of interoperability.

When we wanted to add system-specific entities and attributes, these were added via a separate model.  Members were not allowed to unilaterally change the canonical model or core database.  In other words, the key to Interoperability is a data architecture not data movement.
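As a rough sketch of this pattern – a fixed canonical core extended through a separate, system-specific model – consider the following, using Python's sqlite3 for illustration. The unit tables are hypothetical and deliberately simplified, not taken from JC3IEDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Canonical (shared) entity: members may read and reference it,
# but never unilaterally alter its shape.
conn.execute("""
CREATE TABLE unit (             -- canonical core table (illustrative)
    unit_id   INTEGER PRIMARY KEY,
    unit_name TEXT NOT NULL
)""")

# System-specific attributes live in their OWN extension table,
# keyed back to the canonical core.
conn.execute("""
CREATE TABLE unit_ext_sysA (
    unit_id    INTEGER PRIMARY KEY REFERENCES unit(unit_id),
    local_code TEXT               -- attribute only system A cares about
)""")

conn.execute("INSERT INTO unit VALUES (1, '1st Battalion')")
conn.execute("INSERT INTO unit_ext_sysA VALUES (1, 'A-17')")
```

The extension can be added, versioned or dropped without touching the core, which is precisely what keeps the canonical model stable across members.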

Enterprise Data Model (EDM)

DAMA stresses the need for an Enterprise Data Model (EDM) as one of the prime activities of Data Architecture as well as Data Governance.  However, just because it is called “Enterprise” does not mean that every entity relationship in the org must be analyzed and incorporated.  It would take considerable time and effort to document the As-Is Architecture, which is also a moving target that needs continual updating.  Moreover, the entities, relationships, business rules, and attribute names would not necessarily match what the To-Be Architecture would use.  After a year, management would ask what value or benefit they were getting.  This approach was rejected by at least one of my clients.

The DAF focuses on producing the EDM as the template to be inherited by all future models and thus databases.  From my experience, reverse-engineering legacy models using a data modelling tool is essential for understanding the source systems to be ingested.  But the legacy systems were rarely good models for an EDM.  We do not want to repeat the design mistakes of the past.

The EDM is particularly useful for the development of DWs, Data Marts, BI Semantic Models, and especially for Reference and Master Data.  As in an EDW, the development is phased in as subject areas are incorporated.  The Reference and Master Data have to be the first subject area.  In other words, the focus is on the standardization of the future attributes, naming conventions, business rules, entities and relationships so they can be inherited and reused by project-specific models.  

Reference and Master Data

Reference and Master Data are part of Data Architecture.  In fact, about 50% of an Enterprise Data Model (EDM) consists of the reference and master data entities. If you want consistency, accuracy, conformity, and interoperability then project-specific models must inherit these global definitions from the EDM.

The best way to ensure better data management and data quality is to build quality into the Data Architecture.  Many of the system problems I have seen were due to the lack of a well-designed approach to capturing and using Reference and Master Data. One government department built three master data repositories, and each solution failed because it did not ensure referential integrity in particular and data integrity in general.
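A minimal sketch of what such failed repositories lack: reference data defined once and enforced through referential integrity, so project tables cannot invent their own codes. The ref_province and customer tables below are hypothetical, and SQLite is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Reference data owned by the EDM; project models inherit it, never redefine it.
conn.execute("CREATE TABLE ref_province (prov_code TEXT PRIMARY KEY, prov_name TEXT)")
conn.executemany("INSERT INTO ref_province VALUES (?, ?)",
                 [("ON", "Ontario"), ("QC", "Quebec")])

conn.execute("""
CREATE TABLE customer (
    cust_id   INTEGER PRIMARY KEY,
    prov_code TEXT REFERENCES ref_province(prov_code)  -- enforced, not free text
)""")

conn.execute("INSERT INTO customer VALUES (1, 'ON')")       # valid code: accepted
try:
    conn.execute("INSERT INTO customer VALUES (2, 'ZZ')")   # unknown code: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

With the constraint in place, bad codes are stopped at load time instead of being discovered later during reconciliation.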

Data Quality

I am sure all data professionals are aware how interrelated the DMBOK functional areas are.  There is a lot of overlap.  In my approach, one of the more difficult frameworks to describe is Data Quality.  The Data Quality Framework (DQF) concerns itself with how DQ is to be measured, and describes data analysis (data profiling) techniques and best practices.

DQ is closely tied to Data Architecture and Data Integration. Rather than repeat information, I addressed best practices and standards according to how and where the work is done.

The goal is to build quality right into the database by embedding data integrity rules and other architectural best practices directly into the models and therefore into the databases.  Improved Data Architecture is essential for managing data assets, satisfying enterprise/system requirements, and improving Data Quality. This is the reason that Data Quality must incorporate Data Architecture – a best practice shared by many data quality specialists around the world.  Better quality data enables better quality business intelligence and management decision making.
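One way to embed integrity rules directly into the model is through declarative constraints in the DDL itself. The sketch below is illustrative (a hypothetical shipment table, SQLite syntax); the same idea applies in any relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integrity rules captured in the model itself, so every loader and
# every application inherits them automatically.
conn.execute("""
CREATE TABLE shipment (
    ship_id   INTEGER PRIMARY KEY,
    weight_kg REAL NOT NULL CHECK (weight_kg > 0),
    status    TEXT NOT NULL CHECK (status IN ('OPEN', 'SHIPPED', 'CLOSED'))
)""")

conn.execute("INSERT INTO shipment VALUES (1, 12.5, 'OPEN')")      # passes the rules
try:
    conn.execute("INSERT INTO shipment VALUES (2, -3.0, 'LOST')")  # violates both
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Because the rules live in the schema rather than in application code, no load path can bypass them.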

Data Integration

Data Integration is an application, so it too needs both a Data Architecture and an Application Architecture.

Data Quality

It’s important to track the success of the data cleansing and improvement in the Data Integration solution in order to demonstrate the value IM/IT is providing to the business.

One of my best practices is to persist the results of the data cleansing processes right in the database itself.  This means that the results of the DQ processes can be viewed by the end-users directly in the databases, posted on dashboards or published in BI statistical reports.  This should also increase the confidence the Business will have in the data, or indicate that the data has issues.
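A minimal sketch of this practice, assuming hypothetical dq_status and dq_rule audit columns on a warehouse table (SQLite used for illustration): the remediation rule runs as an update against data already loaded, and the same columns then feed the dashboard queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# dq_status / dq_rule are illustrative audit columns, not a standard.
conn.execute("""
CREATE TABLE customer (
    cust_id   INTEGER PRIMARY KEY,
    phone     TEXT,
    dq_status TEXT DEFAULT 'UNCHECKED',   -- e.g. CLEAN / FAILED
    dq_rule   TEXT                        -- which rule fired, for BI reporting
)""")
conn.executemany("INSERT INTO customer (cust_id, phone) VALUES (?, ?)",
                 [(1, "613-555-0101"), (2, "555-0102"), (3, None)])

# A remediation rule applied to data ALREADY loaded: flag missing phones.
conn.execute("""
UPDATE customer SET dq_status = 'FAILED', dq_rule = 'PHONE_REQUIRED'
WHERE phone IS NULL""")
conn.execute("UPDATE customer SET dq_status = 'CLEAN' WHERE phone IS NOT NULL")

# The persisted columns directly feed a DQ dashboard query.
for row in conn.execute(
        "SELECT dq_status, COUNT(*) FROM customer "
        "GROUP BY dq_status ORDER BY dq_status"):
    print(row)
```

Because the status travels with each row, end-users, dashboards and BI reports all read the same evidence of what was cleansed and why.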

Master Data Management

Master Data Management (MDM) was placed under Data Integration.  MDM requires data cleansing and enhancement (transformation), so it is just a specialized form of Data Integration.  Therefore, the same resources are involved, using the same best practices and standards, with the additional consideration of sophisticated MDM appliance/server solutions versus bespoke solutions.

However, one must walk before one runs.  Therefore, one needs to develop the basic transformations of core attributes required for entity resolution before starting either a bespoke or commercial MDM program.  If you cannot accurately match entities, then you cannot identify all duplicates.
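To illustrate why basic transformations of core attributes must come first, the sketch below shows a simple, hypothetical normalization routine. Real entity resolution uses far richer rules (addresses, phonetic keys, fuzzy matching), but without even this step the three records below would never match.

```python
import re

def normalize(name):
    """Basic standardization of a core matching attribute (illustrative rules)."""
    name = name.upper().strip()
    name = re.sub(r"[^A-Z0-9 ]", "", name)            # drop punctuation
    name = re.sub(r"\b(INC|LTD|CORP)\b", "", name)    # drop legal suffixes
    return re.sub(r"\s+", " ", name).strip()          # collapse whitespace

# Raw values that look distinct but describe the same entity.
records = ["Acme Inc.", "ACME  INC", "acme, inc"]
keys = {normalize(r) for r in records}
print(keys)  # all three resolve to a single match key
```

Only once core attributes resolve consistently like this does it make sense to layer a bespoke or commercial MDM solution on top.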

Data Warehouse-Business Intelligence

In the orgs where I have worked, I focused on what was not being done to provide the quality of data needed to help the Business find the answers they seek.  One needs good Data Quality (DQ) to provide the good Business Intelligence (BI) / Business Analytics (BA) required to successfully run the business.  The primary goal is to provide accurate and trustworthy data.  You can only achieve information delivery success with data quality success.  The key is not gathering information; it is knowing what you want to get out of it.

Grey Areas

Why did I exclude the grey areas?  It is not that they are not important.  Aspects such as Data Storage, Document Management and Security are usually well documented by IM/IT.  My clients already had apps to control these aspects.  If that is not the case then one could develop frameworks to address these areas.

As an Enterprise Architect myself, I am well aware of EA frameworks and architectural principles.  EA terms and concepts are described in the DMP overview material.  For instance, the DMP mentions that almost every framework – whether it is TOGAF, Zachman, et al. – has data as the first pillar.  Unfortunately, too many times data design is handled in a Just-Build-It manner by developers.  However, the focus of this discussion and the documents is a Data Management Program, not an EA framework.  

Conclusion

The Data Management Frameworks (DMFs) I have written are more practically focused on deliverables.  They do not repeat what is already well documented in DMBOK.

However, the DMFs explain the practical things that must be done.  They simplify DMBOK by reducing the chapters back to precise and concise subject areas.  They also provide separate documents that can be consumed independently.

Footnotes

DMP Data Management Program Framework, Terra Encounters 2015-03-06

Enterprise Architecture led Data Quality Strategy, Jay Barua, Director IT, Direct General; 16th International Conference on Information Quality, 2011

TOGAF 9.1, §29. Interoperability Requirements, The Open Group 2011


