"Founded in 1972, SAP has a rich history of innovation and growth as a true industry leader. SAP currently has sales and development locations in more than
50 countries worldwide and is listed on several exchanges, including the Frankfurt Stock Exchange and NYSE
under the symbol SAP."
Source: SAP
Enterprise Information Management: Strategy, Best Practices, and Technologies on Your Path to Success
- The Business Value of EIM
- Getting Started
- EIM Strategy
- What Goes into an EIM Strategy
- In Favor of Pragmatism
- EIM Best Practices
- IT and Business Collaboration
- Trusted Information
- Enterprise-wide Reuse and Standards
- Data Governance
- Taken Together
- Requirements for Information Management
- SOA Support
- Centralized Data Management
- Complete Functionality
- Seamless Integration
- Ease of Use
- Information Management Software
- Data Quality
- Metadata Management
- Master Data Management (MDM)
- In Closing
When faced with information management issues, particularly those in a cross-functional
setting, many business and IT professionals turn, albeit often unwittingly, toward Enterprise
Information Management (EIM). EIM is the effort and practice of reaching across all data
and application silos embedded in the organization’s operating infrastructure, then binding
those repositories together into one effective information management environment where
information is delivered to the person who needs it, when they need it, and how they need
it. EIM, as the term denotes, spans the entire corporation, regardless of size, from a small,
30-person garment maker to a 50,000-person, multi-national manufacturer. Agility, accuracy,
and completeness of data delivery are the three primary objectives. An EIM initiative
will often be launched well after the organization has implemented its patchwork infrastructure
of disparate repositories and applications, signifying a creeping recognition that
data integration is broader than individual systems and organizations. As data management
practices evolve and become adopted, companies realize that they can be more effective in
the use of their information if they take their overall information architecture to the next
level—one in which disparate, siloed repositories and applications are instead planned
and designed to interoperate and deliver information quickly, completely, and in the correct form.
An entire book would be needed to expose EIM to the depth and breadth that it deserves.
The goal of this paper is to paint the EIM landscape, noting its components but focusing
on the importance of an overarching EIM strategy that focuses on corporate objectives
while at the same time offering cross-functional support. Knowing that EIM exists is the
first step towards understanding how business issues fit in the information picture. With
that overall view, the business and IT manager will be better equipped to discuss, compose
requirements, and draft designs for the modern information management environment.
Given the breadth of the EIM domain, which is essentially any policy, practice, process or
technology that manages information, this paper will delve into two areas that can deliver
immediate value to the reader today: (1) EIM strategy development and (2) enabling
information management technology. Understanding these two areas is crucial to starting,
planning and executing an EIM initiative. The strategy lays out the blueprint of the EIM initiative,
communicating the vision, goals, and prioritized projects. And while there are other
important technology concepts in EIM—such as data warehousing and data security—only
a corporate-wide data management vision can bind disparate, heterogeneous data sources
together in a framework for access and sharing of data. This is a fundamental goal of EIM.
As such, we will discuss metadata management, master data management, data quality, and
data migration—all of which play important roles in integrating and managing data.
The Business Value of EIM
EIM is about managing information assets across the entire enterprise. The enterprise can
be large or small, with several divisions or business units, or it can be a single functional
entity. Whatever its scope, EIM involves fostering, creating, and maintaining practices that
allow the business to optimize data access and usage regardless of where the data resides
and what functional entity needs it. First and foremost, EIM exists to support business
objectives. This means business drivers are used to form the EIM strategy and link it tightly
to corporate goals such as profit, revenue, and share value. To aid in the
attainment of business objectives, various operational barriers must be overcome. One barrier
that EIM is uniquely suited to breach is the difference in data definitions, business rules,
and even jargon between functional entities. Resolving data anomalies such as semantic
inconsistencies, duplicate or missing data, and inaccurate values is one of the drivers of
EIM. This implies implementing processes and infrastructure that allow different business
units or functions to communicate and share data in a common vernacular. Let’s face it,
manufacturing sees a ‘product’ as a part on the shop floor. Marketing considers product
as one of many of the company’s offerings. And accounting will insist it is a line entry in
the general ledger. These are semantic differences. EIM, specifically the data integration,
metadata, and master data management (MDM) elements, seeks to bridge those semantics
through practices and technology that first expose the differences via metadata, then integrate
the diverse data entities into common objects, and then turn them into master reference
data used as the basis for information understanding across all business functions.
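The bridging sequence described above (expose the differences via metadata, integrate the entities into common objects, then treat the result as reference data) can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the department names, field names, the `FIELD_MAP` metadata, and the first-value-wins survivorship rule are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical metadata: for each department, the local field that holds
# each attribute of the common "product" object.
FIELD_MAP = {
    "manufacturing": {"sku": "part_no", "name": "part_desc"},
    "marketing": {"sku": "offering_code", "name": "offering_name"},
    "accounting": {"sku": "ledger_sku", "cost": "unit_cost"},
}


@dataclass
class Product:
    sku: str
    attributes: dict = field(default_factory=dict)


def to_common(department: str, record: dict) -> dict:
    """Rename a departmental record's fields to the common vocabulary."""
    mapping = FIELD_MAP[department]
    return {common: record[local] for common, local in mapping.items() if local in record}


def integrate(sources: list) -> dict:
    """Merge (department, record) pairs into one Product per normalized SKU."""
    products = {}
    for department, record in sources:
        common = to_common(department, record)
        sku = common.pop("sku").strip().upper()  # normalize the shared key
        product = products.setdefault(sku, Product(sku))
        for key, value in common.items():
            # Hypothetical survivorship rule: the first value seen is kept.
            product.attributes.setdefault(key, value)
    return products


# Usage with invented sample records: three views of the same part.
catalog = integrate([
    ("manufacturing", {"part_no": "ab-100", "part_desc": "Infusion pump"}),
    ("marketing", {"offering_code": "AB-100", "offering_name": "SmartFlow Pump"}),
    ("accounting", {"ledger_sku": "AB-100 ", "unit_cost": 412.50}),
])
```

All three departmental views collapse into a single `catalog["AB-100"]` entry carrying the manufacturing description and the accounting cost. Real integration would resolve conflicting values (here, the two product names) through governed rules rather than arrival order.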
Few organizations have the budget or wherewithal to implement an EIM strategy across
all lines of business and all data volumes in one fell swoop. Instead, the best approach is
to pick the problems that EIM can address, prioritize them, and then implement that portion
of the EIM strategy that delivers the highest value in the quickest timeframe. In this
way, EIM benefits can be reaped early in the initiative and used to build credibility for, justify, support,
and even fund further incremental EIM projects in the strategy portfolio.
Take, for instance, a medical equipment supplier’s first foray into EIM. It was considered a
smashing success by customers and IT practitioners alike. By first collecting the information
on their thousands of products into one master repository, and then cleansing and
standardizing the individual records, they were able to match and consolidate the products
into a hierarchical tree. Instead of the data being segmented according to specialty catalog,
which resisted vendor and product comparisons, they could now see which vendors
offered the best price performance in general and which offered the best price for unique
categories. The distributor was able to streamline catalog production, reduce the number of
catalogs, and offer a better product mix in the catalogs that remained. The ability to refine
business rules about products and vendors and to deploy data quickly not only meant better
decision making, but enhanced collaboration between product line units.
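The supplier's sequence (collect, cleanse and standardize, match and consolidate into a hierarchy, then compare vendors) might look something like the following Python sketch. It assumes hypothetical record fields and a tiny abbreviation table, and it matches records by exact equality of the standardized description; real data quality tools use far richer parsing and fuzzy matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table used during standardization.
ABBREVIATIONS = {"disp": "disposable", "syr": "syringe", "ml": "milliliter"}


def standardize(description: str) -> str:
    """Lowercase, split letters from digits, drop punctuation, expand abbreviations."""
    tokens = re.findall(r"[a-z]+|\d+", description.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def build_tree(records: list) -> dict:
    """Consolidate offers into a category -> product -> [(vendor, price)] hierarchy."""
    tree = defaultdict(lambda: defaultdict(list))
    for rec in records:
        product = standardize(rec["description"])  # records match on standardized text
        tree[rec["category"]][product].append((rec["vendor"], rec["price"]))
    return tree


def best_offers(tree: dict) -> dict:
    """Pick the lowest-priced vendor offer for every consolidated product."""
    return {
        (category, product): min(offers, key=lambda offer: offer[1])
        for category, products in tree.items()
        for product, offers in products.items()
    }


# Invented sample data: two vendors describe the same syringe differently.
records = [
    {"vendor": "Acme", "category": "Syringes", "description": "Disp. Syr 10 ml", "price": 0.42},
    {"vendor": "MedCo", "category": "Syringes", "description": "disposable syringe 10ML", "price": 0.39},
    {"vendor": "Acme", "category": "Gloves", "description": "Nitrile glove, M", "price": 0.08},
]
tree = build_tree(records)
cheapest = best_offers(tree)
```

The two syringe listings, which a catalog-by-catalog view would keep apart, consolidate into one product node, so the vendor comparison described in the paragraph becomes a simple lookup.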
Ultimately, the key benefit of an EIM initiative is the creation of an effective and dynamic
information management environment with robust facilities for data creation, collection,
summarization, sharing, and reporting. The overarching goal is maximizing business performance
through access to trustworthy and authoritative business information.
Getting Started
A common question is "How do I get started with EIM?" Interestingly, the adoption and
maturity of EIM appear to be moving in lockstep with data quality. When data quality
adoption began accelerating in the mid-2000s, practitioners changed their question from
"Why should I care about data quality?" to "How do I get started?" The same evolution is
occurring with EIM.
Creating an EIM strategy is the way to get started. With the strategy in hand, the next steps
follow classic IT project management: Build a program plan, and within the plan, begin to
drill down and define the kernels of the individual projects, as shown in Figure 1.
Ultimately, the purpose of creating the strategy and building the program is to formalize
EIM within the organization. Developing an awareness campaign informs stakeholders of
the benefits of EIM and how it will accelerate the attainment of corporate goals. As with
data quality, the success of an EIM initiative comes quickest when the organization is
already feeling business pain because of poorly understood, defined, or integrated data.
Those organizations that want to excel eventually demand a strategy for dealing with their data.
As EIM is formalized through strategy development, approval of the strategy by senior management
establishes the charter for the EIM initiative. Once approved, the program plan
aligns resources, priorities, and schedules to the individual projects. At some point during
the second or third project, it will have become clear that EIM has been operationalized. The
charter and strategy are in place. The program plan is being executed, and data governance
activities are creating and refining policies, business rules, and even metrics to measure the
success of the business. These business metrics are key, as many of these measurements will not
have been available before the initiative was started. These metrics will provide a newfound
transparency into how well the business operates. The information delivered by these metrics
should be used to highlight the EIM initiative and form the basis for new justifications
to expand the program beyond the initial pilot projects.
An individual project can be large, such as launching a CRM system, or small, such as creating a data
stewardship council. It all depends on the project scope. The detailed specifications for
each project are then developed, prioritizing each one according to business impact, return
on investment (ROI), and executive support. This structured and metrics-based prioritization
process will help bubble candidate projects to the top. If you are new to EIM, pick
the smallest projects first and schedule them to complete one after the other. To quote
Applegate, et al., in Corporate Information Strategy and Management:
Infrastructure that lends itself to incremental improvement enjoys favorable management
attributes; for example, investment and implementation risks are easier to manage when
improvements involve a series of many small steps rather than a few ‘all or nothing’ steps.
Incremental improvement also facilitates experimentation and learning.1
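The structured, metrics-based prioritization described above can be expressed as a weighted score. The sketch below is an assumption-laden illustration: the 1-to-5 ratings, the weights (5, 3, 2), the effort estimates, and the project names are hypothetical, and the advice to start small is modeled only as a tie-breaker that favors the smaller project when scores are equal.

```python
from dataclasses import dataclass

# Hypothetical integer weights for the three criteria named in the text.
WEIGHTS = {"business_impact": 5, "roi": 3, "executive_support": 2}


@dataclass
class Candidate:
    name: str
    business_impact: int    # rating from 1 (low) to 5 (high)
    roi: int                # rating from 1 (low) to 5 (high)
    executive_support: int  # rating from 1 (low) to 5 (high)
    effort_weeks: int       # rough size estimate

    def score(self) -> int:
        """Weighted sum of the three ratings."""
        return sum(getattr(self, criterion) * weight for criterion, weight in WEIGHTS.items())


def prioritize(candidates: list) -> list:
    """Highest score first; among equal scores, the smaller project comes first."""
    return sorted(candidates, key=lambda c: (-c.score(), c.effort_weeks))


portfolio = prioritize([
    Candidate("CRM rollout", 5, 4, 3, effort_weeks=40),
    Candidate("Data stewardship council", 4, 4, 4, effort_weeks=6),
    Candidate("Address cleansing", 3, 5, 2, effort_weeks=4),
    Candidate("Product data profiling", 4, 4, 4, effort_weeks=3),
])
```

A team new to EIM might instead filter out candidates above some effort threshold before ranking, so that a 40-week CRM rollout does not top the list; either choice is a policy decision rather than something the scoring settles on its own.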
Of course, achieving such a lofty objective requires not only an understanding of how
heterogeneous and disparate a company’s data is, but of the associated business impacts.
An EIM strategy, developed jointly by business and IT, is the best first step.
Not surprisingly, many organizations implement portions of EIM without realizing it. A
common example of this is a firm that was desperate to provide a sales contact and pipeline
tracking tool to its diverse and geographically distributed sales force. The firm wanted a
system that all sales people could use; all data was stored in a single centralized repository;
it included standardized and robust reporting for both contacts and weekly activity; and
it was Web-accessible. The solution was a sales force automation (SFA) application, and it
was deployed across the enterprise.
When the information demands of a corporate function are implemented in such a way
that benefits the business, the application is considered a success. That is, until the next
EIM challenge is tackled. In this example, the assumption is that the architects and planners
of the SFA application designed it to operate and integrate well within the firm’s existing
infrastructure. After the firm implemented SFA, they then turned their attention to building
a more effective marketing organization and wanted to deploy customer relationship management
(CRM). Now the question became: How will the SFA and CRM systems interoperate?
And what about the product information system that manufacturing was considering?
So far, siloed pieces and parts had been implemented without any overall vision or strategy.
Corporate or functional goals (if visible) were being addressed in isolation from each other.
Conflicts will invariably arise over funding, interfaces, roles, and objectives, and instead
of a collaborative EIM environment, the result is infighting and bickering over span of control,
budgets, and development schedules. Without a plan, any progress toward EIM will
be largely a matter of luck, given the failure rates of so many large-scale system implementations.
The answer to this problem is to create an EIM strategy.
What Goes into an EIM Strategy?
Before an organization can build any type of strategy, it needs to have a vision of where
it wants to go and a set of goals that support, drive, and measure success towards that
vision. This vision, along with goals, is absolutely crucial for forming and directing the EIM strategy.
For example, if the vision of the organization is to have a 360-degree view of the customer
so it can increase revenues through improved customer intelligence, then an EIM strategy
might include a customer data integration (CDI) effort, data quality automation, and the
acquisition of an analytical CRM tool. The information architecture planning will take into
account the data infrastructure and policies necessary to support this vision. In this case,
corporate strategy—where the vision and goals are laid out and articulated—serves as input
into the EIM strategy. From the corporate strategy, the CIO, IT director, and their business
unit counterparts analyze each directive and formulate what and how the information systems
need to change to meet those directives. Often it will be the mid-level managers who
first grapple with the concept of an EIM strategy because they are the ones who will most
likely be directed to execute on specific goals. These managers may work in either business
or IT, and will usually be the first to document the deficiencies (gaps) in the existing infrastructure.
This gap analysis and resolution planning is the first stage of EIM strategy development,
and planners need to recognize it as such, lest they architect yet another isolated data silo.
An EIM strategy addresses four quadrants: people, processes, policies, and technology.
People: Information is consumed by people. Moreover, it is the people in the organization
who establish the vision and goals for the initiative, staff the processes, dictate the policies,
and deploy the technology. Therefore, the "people" aspect of an EIM strategy considers the
roles of IT and business managers, their specific responsibilities, and how they are incented
to achieve EIM objectives. A best practice that epitomizes the People quadrant is IT and
business collaboration, which will be explored further in the Best Practices section below.
Processes: An EIM strategy will answer, at least at the high level, how a chain of information
operations should interact. An information operation is any process that uses data—
such as a direct marketing campaign, an order entry system, or a customer dashboard. The
strategy will bind together the People quadrant with the Processes quadrant to define who
manages and participates in a given workflow. A key process, and hence best practice, is the
creation and maintenance of trusted data. After all, what value is EIM if you can’t trust the
information it delivers?
Policies: Closely related to People, but in a separate quadrant are Policies. Perhaps the
quadrant with the least exposure, the Policies quadrant consists of business rules and
data governance, which has been gaining awareness of late. One reason organizations
are awash with data, processes (either broken or working), and applications is that
there are no formal or published guidelines that govern information rules and policies.
The classic question of "What is the definition of a customer?" is answered by the data
governance function. How can disparate operations efficiently cooperate on business goals
if they can’t agree on business rules and definitions? Policies and data standards set by the
organization for their unique context are the foundation upon which the people, processes,
and technology are constructed.
Technology: The last and probably most visible of the quadrants is Technology. The simple
fact is paper and pencil went the way of the buggy whip when it comes to managing
information—and today’s spreadsheets are close behind. Technology—including software
applications, databases, and middleware, among others—is the quadrant responsible for
information delivery. However, technology can quickly become inefficient and unbearably
complex if not managed, and an EIM strategy brings order to what could otherwise be chaos.
The Technology quadrant of EIM needs to define the interoperability of business applications,
how and when data should be secured and shared, and what level of complexity is
acceptable to the users. A key best practice for this quadrant is enterprise reuse and standards.
As we’ll see below, the goals for technology in an EIM strategy are ease of use,
ability to share data, complete functionality, and integration with other EIM components.
In Favor of Pragmatism
Don’t let the breadth of EIM scare you. Any organized and holistic progress you can make
is better than no progress at all. For example, an EIM strategy, especially in the beginning,
can be large or small, have multiple phases, and have a long or short horizon—but it will
always be living and dynamic. If there is one strategy that will evolve with an organization,
it is the EIM strategy. No other system employed by the business is more dynamic than its
information systems. There are several reasons for this:
- The tremendous and continuous growth of data volumes;
- The rapid advance of information technology;
- The increased rate of new systems development efforts;
- The rise of external data sources, resulting from mergers and acquisitions and from partners and customers;
- Evolving data formats, including unstructured data; and
- An increased business urgency to accelerate the pace of competitive differentiation.
The above list reflects tremendous forces on the information infrastructures. Plan a regular
review cycle, perhaps every three months and at least every six. Plan to improve, expand,
and refine the strategy. For every change to the corporation’s business strategy and goals,
there will also be corresponding changes to the EIM strategy. One is responsible for
delivering on the other.
EIM Best Practices
The best practices in EIM are as numerous as the types of benefits they deliver. In this
paper, we choose four practices—one for each information management quadrant—
that every IT and business leader should understand.
IT and Business Collaboration
If you’ve ever sat in a meeting where business managers complained that IT delivered
applications that didn’t meet their needs, or the business managers didn’t understand IT’s
project prioritization process, then you’ve been witness to the lack of business/IT collaboration.
In those situations, each side assumes it knows what the other is doing or what it
needs and goes marching off in blissful ignorance. What has been lost is the fact that one
side is the customer and the other side is the supplier, and both are partners in achieving
the organization’s goals. How can IT help the business if it doesn’t ask the business for its
goals, needs, and requirements? And how can the business ease the IT burden if it doesn’t
prioritize by explaining the goals, needs, and requirements of the business? We’re not talking
about one email message sent to the CIO from the VP of sales and marketing. We are
talking about constant and regular communication between all echelons, with the players
so enmeshed that you have to look at their business cards to tell them apart.
Collaboration between IT and business is by far the most important EIM best practice. You know
business and IT collaboration is a success when the joint team meets for its weekly project
review and the "business" asks "IT" questions, and "IT" asks the "business" questions. Each
side is completely aware of the other’s issues. While this may be the height of collaboration,
an indicator of solid progress is when the two sides can speak in shorthand and not
feel the need to explain all the minutiae of their various challenges. They’ve
gotten past it.
No EIM initiative will be a success unless some portions of business and IT communicate
back and forth regularly, in writing and in person. It is true that IT can guess at the needs
of the business without their input and, given enough tries, will deliver an application that
the business can use. Email and Internet connectivity are two such applications, but
both are commodity services and neither offers a competitive advantage.
Only through rigorous collaboration will business and IT define requirements for systems
that optimize performance for their unique organization and culture.
Trusted Information
Beyond people themselves, the foundation of any company is the knowledge used to
conduct business. It’s that fundamental. For some of us, this can be a scary thought. It
is because of this that a goal of EIM—through data quality, data profiling, data integration,
and other functions—is to enhance the measurable integrity—i.e., the trust—of the
information. How is trusted information created? It is created through the use of a series
of processes that ensures:
- The data is captured accurately (with no errors, transpositions, etc.);
- The captured data adheres to corporate data standards (formats and definitions);
- The data is moved, integrated, and summarized as needed when needed;
- The data is matched and consolidated to the hierarchical levels and context required;
- The data lineage can be traced to its origins;
- The data is maintained and cleansed over time as it ages; and
- The data serves the business requirements that drive its access and use.
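Several of the processes in this list can be automated as rule checks. The Python sketch below covers three of them (accurate capture of required fields, adherence to corporate format standards, and traceable lineage); the `STANDARDS` patterns, field names, and the `TrustedRecord` class are hypothetical examples, not any particular product's rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical corporate data standards: field name -> required format.
STANDARDS = {
    "customer_id": re.compile(r"C\d{6}"),
    "postal_code": re.compile(r"\d{5}"),
    "country": re.compile(r"[A-Z]{2}"),
}


@dataclass
class TrustedRecord:
    data: dict
    lineage: list = field(default_factory=list)  # ordered (step, source) pairs

    def record_step(self, step: str, source: str) -> None:
        """Append a lineage entry so the value can be traced to its origin."""
        self.lineage.append((step, source))


def validate(record: TrustedRecord) -> list:
    """Return a list of issues; an empty list means the record passes every check."""
    issues = []
    for name, pattern in STANDARDS.items():
        value = record.data.get(name)
        if value is None:
            issues.append(f"missing {name}")  # capture check
        elif not pattern.fullmatch(str(value)):
            issues.append(f"{name} violates standard: {value!r}")  # standards check
    if not record.lineage:
        issues.append("no lineage recorded")  # lineage check
    return issues


order = TrustedRecord({"customer_id": "C004211", "postal_code": "9410", "country": "US"})
order.record_step("captured", "web_order_form")
problems = validate(order)
```

In practice, checks like these could run at the point of capture and again after each integration step, with failures routed to data stewards rather than silently dropped.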
Without trust, the significant investment in enterprise-class IT systems, such as CRM or ERP
systems, will be squandered because the business users will instead invest in and rely on
their own private data stores—typically spreadsheets. Business productivity degrades to the
level of individual management and interpretation of data. Most companies are not only
seeking the use of sanctioned and meaningful information, they are hoping that information
will result in competitive advantage. Can the information be used to make critical
decisions? You need go no further than healthcare, with its patient treatment records and family
medical histories, to understand what trust means. When a doctor looks at online medical
records, she may make a potentially life-changing decision based on what is stored in that system.
CFOs, CEOs, and other business leaders make their decisions based on data too. Therefore,
a best practice of EIM is to ensure data integrity is maintained throughout the information lifecycle.
Enterprise-wide Reuse and Standards
The very nature of EIM dictates that the greatest value is derived from information and IT
assets when they are leveraged across the entire enterprise. This provides for economies of
scale, the sharing of data, the uniform spread of technology, and the effective use of trained
and experienced staff. A goal of any EIM initiative is to ensure that an application developed
for customer support, for example, can be accessed and used by marketing.
After all, why reinvent the wheel? It is true that wheels come in different sizes and are made
of different materials, but proper EIM planning takes that into account and ensures that a
version of the same application, with adjustments to the user interface and data model, can
be delivered to marketing with a minimum of work. In so doing, customer support and
marketing are essentially using the same system and data, but adjusted within a tolerance
the two functions can support. This means that business applications, particularly data
integration applications such as CRM, CDI, and MDM, can be deployed faster and serve a
broader range of business functions.
Information management technologies have evolved to the point where the platforms they
are built upon can support a wider range of business operations, often accessible from a
single repository (see the Information Management Software section below). The platform
approach to delivering data quality and data integration functionality, for example, standardizes
data delivery. Now marketing, customer support, and sales departments can all
expect the same behavior and consistent results from a cleansing operation. Substantial
work is invested by data stewards into data definition and business rule development.
The resulting definitions and rules are captured and stored within a system in a structured and sustainable way.
EIM practice would dictate that those business rules be made available across the enterprise
so that other functions, such as marketing, can standardize on those definitions and not
have to replicate the weeks’ or months’ worth of "pick and shovel" work to create them.
Moreover, smart EIM teams, through a common application platform, will allow marketing
to inherit those rules and change them to suit their own specific needs. Marketing can then
publish its own set of business rules to the enterprise, making the data environment deeper
and richer with managed vertical content. All of this follows corporate standards enforced
through the data systems via the user interface, rules repositories, and data models.
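Python's standard `collections.ChainMap` happens to model the inherit-then-override pattern described above: lookups fall through from marketing's own rules to the enterprise rules, while writes land only in marketing's layer. The rule names, values, and the simple `rule_registry` used for publishing are hypothetical.

```python
from collections import ChainMap

# Hypothetical enterprise-wide rules maintained by data stewards.
enterprise_rules = {
    "phone_format": "E.164",
    "postal_code_format": "5-digit ZIP",
    "email_check": "syntax",
}

# Marketing keeps only its differences; everything else is inherited.
marketing_overrides = {"email_check": "syntax + domain lookup"}
marketing_rules = ChainMap(marketing_overrides, enterprise_rules)

# New rules written through the ChainMap go into marketing's layer only.
marketing_rules["opt_in_required"] = True

# Hypothetical enterprise registry to which each function publishes its rules.
rule_registry = {"enterprise": enterprise_rules}


def publish(registry: dict, owner: str, rules: dict) -> None:
    """Publish a snapshot of a function's own rules for others to reuse."""
    registry[owner] = dict(rules)


publish(rule_registry, "marketing", marketing_overrides)
```

Because marketing stores only its deltas, a later change to an enterprise rule it has not overridden flows through automatically, which is the economy of reuse this section argues for.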
Data Governance
In the book Customer Data Integration: Reaching a Single Version of the Truth, the authors write:
The goal of data governance is to establish and maintain a corporate-wide agenda for
data, one of joint decision making and collaboration for the good of the corporation
rather than the individuals or departments, and one of balancing business innovation
and flexibility with IT standards and efficiencies.2
This goal emphasizes the importance of policy making around corporate information. If
you’ve ever heard a manager say – "We back up our data whenever we can" or "The quality
of our data is okay. It could be better, but there is no one driving that" – you have just
heard a failure of data governance. It is the purview of the data governance function to
establish, amongst a myriad of other policies, the sanctioned definitions and acceptable
level of quality for corporate data. Data governance must be done in a well-planned and
cross-functional manner. It is also implemented up and down the organizational hierarchy,
so that the data stewards who regularly manipulate and fix the data can raise their issues
and propose tactics, while business directors and executives can set goals and propose policies.
In the middle of the governance function, the proposed policies meet the nascent
tactics and the two come together, over time establishing a robust policy and rules system
that meets the needs of the organization by "…balancing business innovation and flexibility
with IT standards and efficiencies." Implicit in that quote is a refusal to restrict organizational
growth with needless straitjacket regulations, in favor of standard processes that
deliver greater value.
These best practices can be implemented separately and incrementally, but they gain
considerably more value as other practices are added to the EIM framework, gradually putting on
muscle. Bottom line: EIM is not built overnight. It is built every day, and with each sunset,
some small part has been added, and with each sunrise, there is the promise to add more.
Requirements for Information Management
In order for a suite of information management applications to support the demands of a
robust EIM environment, there is a high-level set of requirements the suite should satisfy:
- Service-Oriented Architecture (SOA) support;
- Centralized data management;
- A complete solution for a given chain of operations;
- Easy or existing integration with other applications; and
- Ease of use for all use cases.
These requirements are about deploying an easy-to-use solution for any part of the EIM
problem domain across the enterprise, and ensuring the targeted users applaud its effectiveness.
So as practitioners go about either building or buying components of their EIM
infrastructure, they should keep these five requirements firmly in mind, and bake them into
the specification process to the extent possible. Consider it, if you will, part of the standard
SOA Support
The ultimate purpose of SOA is to provide an application-independent interface layer to
IT architectures that connect multiple data silos across the enterprise. SOA is modern-day
middleware—only this instantiation is proving to be more effective and is gaining broader
adoption because it is evolving into an industry standard. Industry standards are good
for EIM because anything that eases and simplifies data sharing and operational integration
makes EIM easier to implement. In the past, attempts at EIM have been problematic
because integrating between data silos took substantial effort and time. SOA directly attacks
this decades-old problem. Moreover, SOA is not just about requesting and receiving data;
it is also bi-directional. Data sources can call published services via SOA to perform specific
functions, such as launching a series of data audit tests when an event is triggered. SOA makes
EIM operations richer because they can both pass information and invoke procedures.
In the grand scheme of things, organizations become more agile.
Organizations implementing SOA do so to reduce costs through reuse, change systems faster,
or modernize their system architecture. SOA agility means new applications or services can
be brought online and have their capabilities published; existing operations can then subscribe
to them without disrupting existing applications. Integrating disparate applications
becomes substantially easier: SOA removes significant time and cost from systems development,
its standard connectivity isolates and abstracts programmatic interfaces, and it reduces
the disruption of system maintenance and upgrades.
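The two SOA behaviors described in this section (consumers calling a published, application-independent service, and an event triggering a service such as a data audit) can be illustrated with a toy in-memory registry. This is only a sketch: `ServiceBus`, the service and event names, and the audit rule are hypothetical, and a real SOA layer would add network transport, service contracts, and security.

```python
from collections import defaultdict
from typing import Any, Callable


class ServiceBus:
    """Toy in-memory stand-in for an SOA layer (hypothetical, not a product API)."""

    def __init__(self) -> None:
        self._services: dict = {}
        self._subscribers = defaultdict(list)

    def publish_service(self, name: str, handler: Callable) -> None:
        """Register a capability under a stable name that hides its implementation."""
        self._services[name] = handler

    def call(self, name: str, *args: Any) -> Any:
        """Invoke a published service without knowing which system provides it."""
        return self._services[name](*args)

    def subscribe(self, event: str, callback: Callable) -> None:
        """Ask to be notified whenever the named event occurs."""
        self._subscribers[event].append(callback)

    def emit(self, event: str, payload: Any) -> list:
        """Notify every subscriber of the event and collect their results."""
        return [callback(payload) for callback in self._subscribers[event]]


bus = ServiceBus()

# A data quality service publishes an audit: flag rows with no email address.
bus.publish_service("data_audit", lambda rows: [r for r in rows if not r.get("email")])

# Whenever a (hypothetical) load event fires, the audit runs automatically.
bus.subscribe("orders_loaded", lambda rows: bus.call("data_audit", rows))

results = bus.emit("orders_loaded", [{"id": 1, "email": "a@example.com"}, {"id": 2}])
```

Adding a second subscriber, say a hypothetical notification service, requires no change to the loading system or to the audit service, which is the agility argument in miniature.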
Centralized Data Management
A challenge to information management is the distributed nature of the applications and
systems that generate and use company data. While substantial effort is regularly invested
in getting databases, marts, applications and warehouses to "share" their data, there
is always equal pressure to create new silos—temporary or permanent—for very good
reasons. While a company’s data systems may grow like buildings in cities, there is no
reason the management of the data in those systems should remain disjointed. Similar
to how buildings are connected by telecommunications and roads, and managed by zoning
restrictions and centralized property management firms, so too can distributed data
systems be interconnected and centrally managed. Business intelligence (BI) and data
integration competency centers, data governance councils, data stewardship programs,
metadata management, and other efforts are all components of a common data management
infrastructure. The benefits of this approach are substantial:
- Formal data management organizations are sanctioned by the company’s leadership,
and therefore, their responsibilities are more apt to be recognized by both the
business and IT.
- Because roles and responsibilities are clearly defined, an enterprise-focused data management
organization is better able to justify and absorb them.
- Policies and procedures are standardized once and practiced continually.
- Metadata and business rules have a central point of reference.
- Systems of record are identified, prioritized, and recognized as key data sources.
- Technology maintenance is streamlined and is more cost-effective.
- Data provisioning is an enterprise-based service, thus leveraging specialized skills and
data reuse across projects and systems. The resulting cost savings can be substantial.
In the pursuit of centralized data management, firms will create solutions and application
architectures that can access and manage the content of many different systems in a sustained
and repeatable way. Sometimes, as in the case of master data management, the data
will be regularly pulled from the distributed systems, cleansed, matched, consolidated, and
enhanced in a central location so that it can then be published (pushed out) to the distributed
systems as master reference content. These efforts are evolving into key components of
IT architectures, with each solution offering greater and broader capabilities.
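The pull, cleanse, match, consolidate, and publish cycle described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any particular MDM product works: the two source lists, the choice of email as the match key, and the first-value-wins survivorship rule are all hypothetical simplifications.

```python
from typing import Dict, List

# Two hypothetical distributed systems holding overlapping customer records.
crm = [{"id": "crm-1", "email": " Ann@Example.com ", "name": "Ann Lee"}]
billing = [{"id": "bill-7", "email": "ann@example.com", "name": "Ann  Lee",
            "phone": "555-0100"}]

def cleanse(rec: Dict[str, str]) -> Dict[str, str]:
    """Standardize: trim and lowercase email, collapse repeated spaces in names."""
    out = dict(rec)
    out["email"] = rec["email"].strip().lower()
    out["name"] = " ".join(rec["name"].split())
    return out

def consolidate(systems: List[List[Dict[str, str]]]) -> Dict[str, dict]:
    """Pull records from every system, cleanse them, match on email (an assumed
    match key), and merge them into one golden record per customer."""
    master: Dict[str, dict] = {}
    for system in systems:
        for rec in map(cleanse, system):
            golden = master.setdefault(rec["email"], {"source_ids": []})
            golden["source_ids"].append(rec["id"])
            for field, value in rec.items():
                # Simple survivorship rule: the first non-empty value wins.
                if field != "id" and value and field not in golden:
                    golden[field] = value
    return master

def push_master(master: Dict[str, dict], system: List[Dict[str, str]]) -> None:
    """Publish master reference content back out to a subscribing system."""
    for rec in system:
        golden = master[cleanse(rec)["email"]]
        rec.update({k: v for k, v in golden.items() if k != "source_ids"})

master = consolidate([crm, billing])
push_master(master, crm)
print(crm[0])
```

After the round trip, the CRM record carries the standardized email and the phone number that only the billing system knew.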
Moreover, the "toolbox" approach provides for simplicity. Having a platform, or set of
tools, that supports a majority of the processing needs reduces the installation footprint,
maintenance burden, training efforts, and operational complexity, while it increases the sharing
of business rules and standardizes the services offered. When combined with SOA, the toolbox
approach becomes even more powerful in three ways:
- Connecting to Web-enabled data sources is simplified. There is no
need for complex SQL scripts or knowledge of proprietary application or
database interfaces to access the data.
- Via the SOA connectivity, data management tools can be called from
other applications as a service, again eliminating the need for a
proprietary API to access the platform.
- No single platform offers all the functionality needed by an EIM
initiative. SOA allows for a blueprint to augment existing capabilities
with plug-in modules. Through this surrogate relationship, the platform
can serve as the larger framework upon which to build third-party
functionality when appropriate.
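The plug-in idea in the last point can be sketched as a small registry in which the platform owns sequencing and execution while third-party modules contribute individual steps. The `Platform` class, the step names, and the country-code mapping are hypothetical; with SOA, the registered steps would typically be remote service calls rather than local functions.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]

class Platform:
    """Hypothetical data management platform: it owns execution, plug-ins add steps."""

    def __init__(self) -> None:
        self.steps: Dict[str, Step] = {}

    def register(self, name: str, step: Step) -> None:
        """Plug in a capability, native or supplied by a third party."""
        self.steps[name] = step

    def run(self, rows: List[Row], flow: List[str]) -> List[Row]:
        """Execute the named steps in order; the platform handles sequencing."""
        for name in flow:
            rows = self.steps[name](rows)
        return rows

platform = Platform()
# Native capability: trim whitespace from every field.
platform.register("trim", lambda rows: [{k: v.strip() for k, v in r.items()} for r in rows])
# Hypothetical third-party plug-in: map country names to ISO codes.
platform.register("country_iso", lambda rows: [
    {**r, "country": {"United States": "US", "Germany": "DE"}.get(r["country"], r["country"])}
    for r in rows])

print(platform.run([{"country": " Germany "}], ["trim", "country_iso"]))
```

The calling flow only names steps, so a plug-in can be swapped for a native capability later without changing the flow definition.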
Complete Functionality
The platform leads us straight to the next ingredient: a complete solution for a given
chain of operations. Information management vendors that offer a single platform make
it significantly easier to add new functionality. All of the processing "overhead," such as
grid computing, parallel processing, the user interface (UI), rules and metadata repositories,
and processing engines, is handled by the platform. When practitioners build
out their EIM infrastructure, they look for solutions with the greatest
breadth in order to reduce data acquisition and provisioning time, complexity, and installation
costs. Moreover, the more complete the solution, the more efficient their development
efforts. Any time a separate function has to be "stitched in" to fill a processing void, costs
increase and additional failure points are introduced. So the completeness of the solution
is not only about being the most functional, but also about achieving the lowest risk of
Seamless Integration
There comes a point where the functional boundary of the platform will be reached and
a handoff to the next application is needed. Unfortunately, a technology platform is constrained
by the elegance of its design and the amount of development resources applied to
it. It can’t be expected to do everything. For example, consider the migration from a source
system to an MDM hub to a data warehouse. No single platform today combines
robust extraction and transformation with operational data
reconciliation and analytical and query support. There are, however, world-class solutions
for each of these, and vendors are providing tightly coupled integration solutions between
these separate applications and platforms. Such solutions can take a variety of forms, from
predefined SOA calls to code-level callouts. Most often, the strongest integration between
separate applications will be within the product line of a single vendor, such as SAP: its
ETL product, SAP® BusinessObjects™ Data Integrator, integrates with SAP NetWeaver®
Master Data Management, which in turn is coupled with the SAP BusinessObjects
business intelligence solutions. One advantage of steering towards products with existing
external integrations is that the practitioner can comfortably and incrementally expand
and scale the environment, knowing that for the next component, the integration point
exists and has been tested.
Ease of Use
Ease of use is a common refrain from all business application users. All enterprise software
(EIM, BI, ERP, etc.) should be easy to use. The judges are not IT, but rather the people who have
to run the application as part of their work. Consider the wide variety of applications a
typical sales operations manager uses during a typical work week. First, there is the full
Microsoft Office suite of Excel, Word, PowerPoint, Visio, Outlook, etc. Then there is the
sales force automation solution, CRM application, and the web browser. With the plethora
of applications and increasing complexity of the modern workplace, ease of use in software
becomes a matter of personal productivity.
The pressures on IT staff are no different. IT management does not want to buy yet another
product that requires intensive training and significant subsequent practice. Neither IT
nor the business wants to invest in a solution that requires a high degree of specialization.
Ultimately, ease of use is about speed to return on investment (ROI). The faster a person
can learn an application, the sooner the organization accelerates towards profit and revenue
targets. Sadly, ease of use is the most overlooked of all EIM requirements, and yet the one
with the most measurable and tangible returns.
Information Management Software
This discussion of information management software centers on applications and
technologies closely related to data integration. There are important reasons why an organization
should consider starting its EIM effort with data integration:
- Companies across industries, particularly those accustomed to frequent mergers and
acquisitions, have heterogeneous data environments. Extracting value from those data
systems demands that their data be integrated; otherwise their data is isolated to the
few users and applications with access to those silos. Data integration is the core technology
for sharing data across the enterprise.
- Integration offers a relatively quick return on IT investment. It leverages existing data
systems to extract and move data to where it is needed today. To a certain extent, a
robust data integration strategy can overcome weaknesses in the existing information
architecture (deployed repositories) until newer repositories can be implemented.
- The movement of data within an organization is constant and crucial to business
operations. Developing strong capabilities in this area increases enterprise agility, which
improves the organization’s ability to react to unforeseen circumstances.
- Any time data is moved from point A to point B, there is an opportunity to improve
it. A common complaint by government agencies is that they cannot change the data
because they don’t own the source systems. The information value chain within
government agencies can cover many departments, with the original source system
beyond their span of control. Modern data integration technology solves this dilemma
by allowing data transformations on the fly, as the data is moved. The changes to the
data can either be saved or discarded, knowing that the next time the data is moved
the same transformation can be applied.
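Transformation on the fly can be sketched as a copy step that applies a rule to each record in transit while leaving the source untouched. The record layout and the state-code rule are illustrative assumptions; the point is only that the rule lives in the movement process, so it can be reapplied every time the data is moved.

```python
import copy
from typing import Callable, Dict, List

Record = Dict[str, str]

def move(source: List[Record], transform: Callable[[Record], Record]) -> List[Record]:
    """Copy records from point A to point B, transforming each copy in flight.
    The records in `source` are never modified."""
    return [transform(dict(rec)) for rec in source]

def standardize_state(rec: Record) -> Record:
    """Illustrative rule: map spelled-out state names to postal abbreviations."""
    codes = {"wisconsin": "WI", "minnesota": "MN"}
    rec["state"] = codes.get(rec["state"].strip().lower(), rec["state"])
    return rec

# Source system owned by another department: read-only from our point of view.
source = [{"agency_id": "A1", "state": "Wisconsin "},
          {"agency_id": "A2", "state": "MN"}]
snapshot = copy.deepcopy(source)

target = move(source, standardize_state)
print(target)              # transformed copy delivered to point B
print(source == snapshot)  # True: the source data is unchanged
```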
In essence, data integration is a key building block of EIM. Yet even the data integration
technology space is broad. It includes ETL (extract/transform/load), EII (enterprise information
integration), EAI (enterprise application integration), database replication, and, the
simplest of all, FTP (file transfer protocol). Surrounding the data integration space or
closely related to it are data quality, metadata management, text analytics, and master data
management. The master reference data process shown in Figure 3 highlights the interaction
of these technologies in a typical IT environment and shows how they fit into a major
The overall purpose of the above process is to collect data from the point of capture and
load it into an MDM system where a reporting or analytical application (BI) can access the
master data and provide an enterprise-wide view of the information in the context needed.
The ETL process is at the heart of data integration. Most often, data integration entails
the movement of data, not just accessing data in place. Moreover, as can be seen in the
MRD diagram, ETL can serve as the framework upon which other EIM functionality can be
included in the process flow. In the diagram, two different source systems—the backend of
an e-commerce website and the call logs for a warranty center—are accessed. One has structured
data in the form of database tables, and the other has unstructured data in the form
of text files. The ETL program will internally route the data to the appropriate transform,
one of them being a sophisticated text analysis (unstructured data processing) program that
is linked to the ETL application through SOA or an interface API. The ability of the ETL program
to interface with external programs is one of the requirements for information management
software. After the text analysis function extracts the desired data, the ETL program
takes over and merges the two disparate data streams into one structured data stream where
a myriad of transformations can be applied. This in itself is a major boon to EIM. In years
past, practitioners had to struggle with complex and convoluted processes to extract data
from freeform textual data and then compromise on how it was stored with structured data.
With 80% of the world’s data in unstructured data sources, an EIM strategy will eventually
have to address it.
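The routing described above can be sketched in Python. The regular-expression extractor is only a stand-in for the sophisticated text analysis program the text mentions (which would be reached through SOA or an API), and the sample records, ID formats, and field names are hypothetical.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]

# Structured source: rows from the e-commerce back end (illustrative data).
orders = [{"customer_id": "C1", "product": "X200"}]
# Unstructured source: free-text warranty call logs (illustrative data).
call_logs = ["Customer C1 called about product X200: screen cracked."]

def text_analysis(note: str) -> Row:
    """Stand-in for an external text analysis program. Here it is just regular
    expressions; real entity extraction is far richer."""
    cust = re.search(r"\bC\d+\b", note)
    prod = re.search(r"\bX\d+\b", note)
    return {"customer_id": cust.group(0) if cust else "",
            "product": prod.group(0) if prod else "",
            "issue": note.split(":", 1)[-1].strip()}

def etl(structured: List[Row], unstructured: List[str],
        extractor: Callable[[str], Row]) -> List[Row]:
    """Route each source to the appropriate transform, then merge the results
    into one structured stream, tagging each row with its origin."""
    extracted = [extractor(note) for note in unstructured]       # unstructured path
    stream = [dict(r, source="orders") for r in structured]      # structured path
    stream += [dict(r, source="call_logs") for r in extracted]
    return stream

for row in etl(orders, call_logs, text_analysis):
    print(row)
```

Passing the extractor in as a parameter mirrors the requirement that the ETL program be able to call out to external programs: a different text analysis engine can be substituted without changing `etl`.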
Following the native ETL transformations, the ETL application can route the single data
stream to data quality processing in the same way it did for text analytics. Increasingly, however,
data integration vendors are building single application frameworks that natively support
greater portions of the EIM domain, and the first easy step in this direction is embedding
data quality functionality. After data quality processing, the ETL application is ready to
load the cleansed data stream into the MDM solution. Typically, the data is deposited into
a staging area isolated from the heart of the MDM repository. For EIM, ETL has served the
crucial role of moving, transforming, and loading captured data from one end of the enterprise
(for example, an order entry website) all the way to the corporate master reference data system.
Organizations will have different architectures, and some will have a data warehouse
in the process flow, but regardless, the technology of choice to perform data movement
Data Quality
Building trusted information is an EIM best practice. Organizations build and maintain
trusted data at every step in the data supply chain. The concept of the data supply chain has
no greater relevance than in the EIM context. Figure 4 shows how data quality technology
intersects with a classic data supply chain:
In Figure 4 we can see data quality operations exist at every major stage in the chain. Each
stage is an opportunity to create, enhance, or just maintain the level of trust in the data.
The sooner data quality issues are corrected in the chain, the sooner the firm benefits from
greater trust. For example, validating and standardizing data at the initial point of contact
with the customer, such as a website where they can enter their information, benefits every
downstream operation no matter how far-reaching the enterprise. You can multiply the
benefit by the count of all the subsequent operations that use the data. Conversely, the
longer an organization waits to cleanse and improve data integrity, the more upstream operations
are sub-optimized because data defects impair their effectiveness. Moreover, the
earlier the data is cleansed, the less cleansing costs later on, because the number,
variety, and, most importantly, complexity of data quality problems are smaller at that point. Rather than letting
problems build up to the point where correcting them in the data warehouse becomes
a large task, tackling the issues as they arise makes each operation simpler. Following the
incremental improvement approach, data quality operations lend themselves to pilot project
implementations. Use the success of each pilot to build out the data quality infrastructure
as part of your EIM strategy.
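Validating and standardizing at the point of entry, as in the website example above, might look like the following minimal sketch. The field names, the deliberately simple email pattern, and the rules themselves are assumptions for illustration; production systems typically rely on reference data such as postal address files rather than a regular expression.

```python
import re
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Deliberately simple pattern, assumed for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def validate_at_entry(form: Record) -> Tuple[Record, List[str]]:
    """Standardize and validate a web form submission before it enters the
    data supply chain. Returns the cleaned record and problems to show the user."""
    clean = {k: " ".join(v.split()) for k, v in form.items()}  # collapse whitespace
    clean["email"] = clean.get("email", "").lower()
    clean["postal_code"] = clean.get("postal_code", "").replace(" ", "").upper()
    problems = []
    if not EMAIL.match(clean["email"]):
        problems.append("email is not valid")
    if not clean.get("last_name"):
        problems.append("last name is required")
    return clean, problems

record, issues = validate_at_entry(
    {"last_name": "  Smith ", "email": "Jane@Example.COM", "postal_code": "54601"})
print(record, issues)
```

Rejecting or correcting the record here means every downstream consumer receives the standardized version, which is the multiplier effect the paragraph describes.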
Metadata Management
Metadata is data about data. It tells us such useful things as when a table was extracted
from a data source, what transformations were performed on each field, which user ran the
transformations, when they did it, and where the data was moved. If the CFO wants to
know how the quarterly financials became corrupted, the IT director will turn to
the migration log tables to answer the question.
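The migration log the CFO example relies on can be sketched as a transformation wrapper that writes one metadata entry per step. The log structure, the rule name, the user name, and the table names are hypothetical; real metadata repositories capture much more, but the what/who/when/where fields mirror the list above.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Row = Dict[str, float]
MIGRATION_LOG: List[Dict[str, str]] = []  # hypothetical metadata store

def transform_and_log(rows: List[Row], field: str, rule_name: str,
                      rule: Callable[[float], float], user: str,
                      source: str, target: str) -> List[Row]:
    """Apply a rule to one field and record what was done, by whom, when, and where."""
    out = [{**r, field: rule(r[field])} for r in rows]
    MIGRATION_LOG.append({
        "source": source,
        "target": target,
        "field": field,
        "transformation": rule_name,
        "run_by": user,
        "run_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "row_count": str(len(out)),
    })
    return out

ledger = [{"revenue": 1200.0}, {"revenue": 800.0}]
staged = transform_and_log(ledger, "revenue", "convert_to_thousands",
                           lambda v: v / 1000, user="etl_batch",
                           source="gl.ledger", target="dw.quarterly_financials")

# Answering "what happened to revenue?" becomes a query over the log.
for entry in MIGRATION_LOG:
    if entry["field"] == "revenue":
        print(entry["transformation"], entry["run_by"], entry["source"], "->", entry["target"])
```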
There are at least three general types of metadata³, depending on whose definition you use:
business, application, and database. Regardless of how you define the specific contexts,
metadata is the information a firm will use to decide on the usefulness of a given data set
in its decision-making and business operations. Data quality metrics that quantify the number
of defects, percentages of blank or null fields, cardinality, minimum and maximum
values, and outliers against business rules are all metadata attributes that a data steward
will use to judge the information. Capturing, storing, and analyzing this information is
fundamental to building trusted information. Metadata management software must be able
to serve this function. Moreover, to be useful, metadata needs to be tracked backwards in
the information supply chain via data lineage and tracked forwards via impact analysis.
These are the two key operations of metadata management. Data lineage allows the CIO to
see where and when the data came from and what was done to it before being used in the
financial reports. Impact analysis flips the coin over and allows the IT analyst to see what
reports use a field of data that requires a calculation change. With this visibility, the analyst
can go to report stakeholders and notify them of the pending change before they find it in
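Data lineage and impact analysis are two traversals of the same graph, one backwards and one forwards. The sketch below assumes field-level lineage has already been captured as a mapping from each derived field to its inputs; the table and field names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical field-level lineage: each derived field lists the fields it comes from.
DERIVED_FROM: Dict[str, List[str]] = {
    "staging.orders.net_amount": ["erp.orders.amount", "erp.orders.discount"],
    "dw.sales.revenue": ["staging.orders.net_amount"],
    "report.q3_financials.total_revenue": ["dw.sales.revenue"],
    "report.regional_sales.revenue": ["dw.sales.revenue"],
}

def _walk(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search returning every node reachable from `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def lineage(field: str) -> Set[str]:
    """Backwards: every upstream field that feeds `field`."""
    return _walk(field, DERIVED_FROM)

def impact(field: str) -> Set[str]:
    """Forwards: every downstream field affected if `field` changes."""
    feeds: Dict[str, List[str]] = {}
    for child, parents in DERIVED_FROM.items():
        for parent in parents:
            feeds.setdefault(parent, []).append(child)
    return _walk(field, feeds)

print(sorted(lineage("report.q3_financials.total_revenue")))
print(sorted(f for f in impact("erp.orders.discount") if f.startswith("report.")))
```

The backward walk answers the CIO's question of where a reported figure came from; the forward walk tells the analyst which report owners to notify before a calculation changes.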
Master Data Management (MDM)
At the apex of data integration software is MDM. As shown in the MRD diagram, two disparate
data sources are loaded into the MDM system; in practice, many more source systems
may be involved. The MDM system reconciles (matches, standardizes, and consolidates)
new input data with its current master reference data, and then stores the master repository
in a data model flexible enough to support multiple hierarchies.
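Reconciling new input against existing master data (standardize, match, consolidate) can be sketched in a few lines of Python. The match rule (same five-digit ZIP and a `difflib.SequenceMatcher` similarity ratio of at least 0.85) and the keep-existing-values survivorship policy are assumptions chosen for illustration; commercial MDM matching engines use far richer rules.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Record = Dict[str, str]

def standardize(rec: Record) -> Record:
    """Uppercase, drop punctuation, and truncate ZIP+4 so comparisons are like-for-like."""
    name = "".join(ch for ch in rec["name"].upper() if ch.isalnum() or ch == " ")
    return {**rec, "name": " ".join(name.split()), "zip": rec["zip"].strip()[:5]}

def is_match(a: Record, b: Record, threshold: float = 0.85) -> bool:
    """Assumed match rule: same ZIP and sufficiently similar names."""
    return (a["zip"] == b["zip"]
            and SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold)

def reconcile(master: List[Record], incoming: List[Record]) -> List[Record]:
    """Match each new record to the master (assumed already standardized);
    consolidate on a hit, add a new master record on a miss."""
    for rec in map(standardize, incoming):
        hit = next((m for m in master if is_match(m, rec)), None)
        if hit is None:
            master.append(rec)
        else:
            for field, value in rec.items():
                hit.setdefault(field, value)  # keep existing master values, fill gaps
    return master

master = [{"name": "ACME CORPORATION", "zip": "54601"}]
incoming = [{"name": "Acme Corporation.", "zip": "54601-1234", "phone": "608-555-0100"},
            {"name": "Globex Ltd", "zip": "10001"}]
for row in reconcile(master, incoming):
    print(row)
```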
Certainly, MDM is much more than technology, as it encompasses policies, practices, and
systems that create an infrastructure for collecting, storing, and managing master reference
data. However, no discussion of EIM software is complete without MDM software. MDM
software deployment can take four forms: three of them are domain-specific (customer,
product, financial), and the fourth is a generalist version that seeks to support all domains
and comes with the necessary generalizations.
For EIM, an MDM system offers great advantages. It not only serves as the system of record
for customer or product data, collecting and consolidating it from all reaches of the enterprise,
including multiple data warehouses, but it also allows the data stewards to design
and create data models that roll up to hierarchies that can be adjusted at will for a specific
view. These views, or context-sensitive hierarchies, can be saved and used by different corporate
functions as their own operations dictate. Marketing, sales, and manufacturing can
all view the product hierarchy—from suppliers to chemical composition to distribution
channel—as needed. Then, when a hierarchy is placed "in production," the master data can
be published to subscribing applications, where it is either pushed or pulled out to downstream
Along with publishing to external systems, the MDM system—through SOA or another type
of integration—can serve as the hub or repository that provisions cleansed and reconciled
data to a business intelligence (BI) environment. Indeed, one of the "entry points" for
MDM software is often to cleanse and reconcile master data to readily support improved
reporting and analytics.
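The context-sensitive hierarchies described earlier in this section amount to different roll-ups over the same master records. In the sketch below, each saved view is simply an ordered list of attribute levels, and each product is grouped under its path through those levels; the product attributes and view names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Master product records (illustrative attributes).
PRODUCTS: List[Dict[str, str]] = [
    {"sku": "P1", "supplier": "Acme", "compound": "Polymer", "channel": "Retail"},
    {"sku": "P2", "supplier": "Acme", "compound": "Steel", "channel": "Wholesale"},
    {"sku": "P3", "supplier": "Globex", "compound": "Polymer", "channel": "Retail"},
]

# Each corporate function saves its own view as an ordered list of levels (hypothetical).
VIEWS: Dict[str, Tuple[str, ...]] = {
    "marketing": ("channel", "supplier"),
    "manufacturing": ("compound", "supplier"),
}

def roll_up(view: str) -> Dict[Tuple[str, ...], List[str]]:
    """Group the same master records under the levels of the requested view."""
    tree: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for product in PRODUCTS:
        path = tuple(product[level] for level in VIEWS[view])
        tree[path].append(product["sku"])
    return dict(tree)

print(roll_up("marketing"))
print(roll_up("manufacturing"))
```

Because both views read the same master records, a correction to a product's supplier appears in every function's hierarchy at once.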
In Closing
Organizations are facing increasing complexity in their operational and data environments.
New data sources, unstructured data, and more data than ever before are creating a
perfect storm of information overload (also known as "infoglut"). New regulatory requirements
for transparency and confidentiality add a layer of rules that compounds the complexity.
Customers’ demands for faster service and more relevant conversations stress front-office
applications, while parallel requests from internal users place an even greater burden on back-office
systems. And the technologies used to implement the environment are constantly
evolving and becoming more sophisticated, but not necessarily easier to use. Meanwhile,
competitive pressures are never-ending, with competitors continually raising the bar
through their own adoption of information integration and deployment strategies.
All of these pressures have combined to render information management more urgent than
ever. Before your company discovers that its data quality and deployment practices have
been marginalized to the point of ineffectiveness, consider adopting EIM. Only through
holistic and systematic planning encompassing the best practices discussed in this paper
can your corporate data contribute to revenue growth and strategic fulfillment.
About the Author
Frank Dravis is a senior consultant at Baseline Consulting, a business analytics and data
integration services firm. Frank has twenty-one years of experience in enterprise information
management (EIM) and data quality solutions design, implementation, and consulting.
At Baseline Consulting, he serves as a senior consultant specializing in data integration,
data quality, and data governance solutions, advising key clients and industry vendors on
these and other technology strategies. Prior to joining Baseline Consulting, Frank served as
VP of EIM Strategy at Business Objects/SAP, where he researched and aided in the formulation
of EIM and data quality market strategies. Principal among those efforts was the planning
of CDI/master data management in the EIM suite. As a benefit of the research, Frank
delivered data quality best-practice advice and consulting to Business Objects’ extensive
list of industry-leading clients. He is a frequent writer, blogger, and industry speaker on
EIM topics. Prior to Business Objects, Frank held such positions as VP of Development
and VP of Information Quality at Firstlogic, Inc., where he led the IQ Assurance Strategic
Data Quality consulting program, contributing thought leadership and practice management
in addition to data profiling program management. Frank holds an M.B.A. from the
University of Wisconsin-La Crosse, and a B.S. degree in computer science.