Master Data Management

Recently I had the pleasure of discussing the bulky subject of Master
Data Management (MDM) with a prospective client. The client was
considering a variety of MDM solutions and wanted specific direction on
which technology and vendor to choose.
Now it is no secret that I tend to favor Microsoft products, but in this
case the platform already in place was Oracle, so naturally I wanted to
give them more general advice instead of just pushing MS MDM.
There are quite a few upfront activities that must be done regardless
of which product you end up using. I've listed some introductory
efforts that belong early in the process and can be completed before
product selection.
- Governance
To get the ball rolling, you really need to figure out what data you
are talking about, where it lives, who thinks they own it now, and what
organizational structure you need to support this effort. Here are some
suggestions on how to make this go smoothly:
- Identify External Data Ownership
Often there will be multiple owners identified for the same data. Being
able to identify these different uses in their various applications is
a significant lever used throughout the rest of the process. Without a
clear understanding of where the data is currently 'owned' and how
critical it is to the various applications, mismatched expectations
will cause problems.
- Formalize Ownership in MDM
This often requires significant negotiation and compromise from the
various stakeholders. In organizations where the level of trust between
groups is high, this is more straightforward. In some environments,
only a top-down directive will accomplish this. Whichever tactic is
used, establishing formal ownership is a prerequisite for success.
- Identify Data Domains
This sounds a lot easier than it usually ends up being. Attributes
utilized by disparate systems often have subtleties in their
definitions which must be accommodated. Being able to provide synonym
mapping and transition approaches is imperative to keeping comfort
levels high between organizations.
- Formalize Domain Administration Process
Having an established (and hopefully standardized!) set of processes to
perform CRUD operations on domains can take the stress out of
relationships where trust levels between application teams are subpar.
- Establish Organizational Governance
This is often rolled
up into Change Management or a similar phase, but in reality needs to
be an ongoing activity. These organizational groupings will allow for
escalation procedures, conflict resolution, and increased visibility
into the data lifecycles.
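As a rough illustration of the synonym mapping mentioned under Identify Data Domains, here is a minimal Python sketch. It is not tied to any MDM product; the class name DomainGlossary, the system names (crm, billing) and the attribute names are all hypothetical and only show how system-specific names could be mapped onto one agreed canonical name per domain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainGlossary:
    """Hypothetical glossary mapping system-specific names to canonical ones."""

    owner: str  # the formally agreed owner of this domain (assumed to be a team name)
    synonyms: dict[tuple[str, str], str] = field(default_factory=dict)

    def register(self, system: str, local_name: str, canonical: str) -> None:
        """Record that `local_name` in `system` means `canonical`."""
        key = (system, local_name.lower())
        existing = self.synonyms.get(key)
        if existing is not None and existing != canonical:
            # Conflicting definitions are a governance issue, not something to guess at.
            raise ValueError(
                f"{system}.{local_name} is already mapped to {existing!r}"
            )
        self.synonyms[key] = canonical

    def translate(self, system: str, record: dict) -> dict:
        """Rename a source record's keys to canonical names.

        Raises KeyError for any attribute that has not been mapped yet.
        """
        return {
            self.synonyms[(system, name.lower())]: value
            for name, value in record.items()
        }


# Illustrative data only: two systems that name the same attribute differently.
glossary = DomainGlossary(owner="Sales Operations")
glossary.register("crm", "cust_no", "customer_id")
glossary.register("billing", "AccountNumber", "customer_id")

crm_record = glossary.translate("crm", {"cust_no": "C-1001"})
billing_record = glossary.translate("billing", {"AccountNumber": "C-1001"})
```

Raising an error on a conflicting mapping, rather than silently overwriting it, is a deliberate choice in this sketch: a conflict means two groups disagree about a definition, which is exactly what the escalation procedures above are for.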
Once you have the basic framework to talk about data (and who owns
it, how it gets maintained, etc.) you can then delve into the
specifics of the data.
- Model Dimensions
If governance is properly addressed, this should be a fairly
straightforward exercise for data architects. They'll go through some
steps similar to the following:
- Identify Dimensions
What data are you really talking about? You need to pick a single name
for each domain. Sometimes the same type of data is called different
things in different places. You have to standardize on a common taxonomy.
- Identify Consuming Applications
There may be downstream
consumers beyond the currently perceived data owners. Make sure you
understand the data lifecycle and flows so you can estimate impacts
properly.
- Identify Entities Per Dimension
Make sure you flesh out
the taxonomy down to the most granular level necessary. The more detail
you include now, the more hours saved later.
- Identify Entity Attributes
This is a critical step that is often
treated as a second-class citizen. In reality, it is a key driver.
Without precision about attributes, the potential values and rules
(which come next) can't be properly defined. It also means your
estimates will be incorrect.
- Quantify Attribute Values
How much this matters depends on the complexity of the data rules (which
come next) and on how accurate your rule implementation estimates need
to be. If your sources already define their values well, it should be
straightforward to make this comprehensive.
- Identify Data Rules
These are sometimes loosely called Business Rules, which is imprecise.
Here we are not talking about rules governing operations or activities
(hence the term Business) but about the states, lifecycles, and value
cohorts for entities and domains.
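To make the last three steps concrete, here is a small Python sketch of how attributes, their allowed values, and a lifecycle data rule might be written down before any product is chosen. Everything in it is an assumption made for illustration: the Customer entity, the status and segment attributes, their values, and the transitions between states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeRule:
    """Hypothetical description of one entity attribute and its allowed values."""

    name: str
    allowed_values: frozenset
    required: bool = True


# Assumed lifecycle for a Customer entity: state -> states it may move to next.
CUSTOMER_LIFECYCLE = {
    "prospect": {"active"},
    "active": {"suspended", "closed"},
    "suspended": {"active", "closed"},
    "closed": set(),
}

# Assumed attribute definitions; the status values are the lifecycle states.
CUSTOMER_RULES = [
    AttributeRule("status", frozenset(CUSTOMER_LIFECYCLE)),
    AttributeRule("segment", frozenset({"retail", "wholesale"}), required=False),
]


def validate(record, rules):
    """Return a list of rule violations for one record (empty if none)."""
    errors = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                errors.append(f"{rule.name}: missing required attribute")
            continue
        if value not in rule.allowed_values:
            errors.append(f"{rule.name}: {value!r} is not an allowed value")
    return errors


def can_transition(current, target, lifecycle=CUSTOMER_LIFECYCLE):
    """Check whether the lifecycle allows moving from `current` to `target`."""
    return target in lifecycle.get(current, set())


ok_errors = validate({"status": "active", "segment": "retail"}, CUSTOMER_RULES)
bad_errors = validate({"status": "dormant"}, CUSTOMER_RULES)
missing_errors = validate({"segment": "retail"}, CUSTOMER_RULES)
```

Note that nothing here describes a business process. The rules only constrain which values and states an entity may have, which is the distinction drawn above between data rules and Business Rules.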
As you can well imagine, this is a significant undertaking for any
organization, and it can be done in a technology-agnostic way. Once
you've identified and quantified this information, you can begin to
look at transition plans, data flow planning, and infrastructure. These
are very technology- and environment-specific.
As the proverb states: "Measure twice, cut once."