If you receive errors when attempting to view this white paper,
please install the latest version of
"Our consultants can help you optimize your end-to-end
data migration by combining our expertise with our SAP BusinessObjects software to
integrate your data and improve the data’s quality, particularly when upgrading or
replacing legacy systems with SAP ERP. "
A Road Map to Data Migration Success
Data Migration Project
is also known as :
Road Map Data Migration Success
Data Migration Requirement
Successful Data Migration
Data Migration Projects
Implement Data Migration
ETL Extraction Transformation and Load
ETL Quality Project Plan
Managing ETL Strategies
Scope Extraction Transofrmation Load Effort,
Essential Needs Extraction Transofrmation Load,
Mapping Extraction Transofrmation Load Phases Tasks,
Cost Effective Development,
Free Documentation Extraction Transofrmation Load,
Successful Project Plan,
Essential Data Migration,
Scope Data Migration Effort.
Data migration is the one-time movement of data from a legacy source, or multiple sources,
to a new target database. This simple concept and requirement can drive a scope that is much
larger than expected. A data migration requirement can be driven by a range of initiatives,
such as an application replacement or upgrade, the need to consolidate data within a data warehouse,
or a requirement to create a single view of product within an organization. In cases where migrated
data is transformed for new uses, your project team encounters some very specific management and technical
challenges. For example, a team that has been writing extraction, transformation, and load (ETL) code for a
data warehouse faces a new set of challenges when migrating data to a live, operational system. Although a 2%
error rate may be acceptable for aggregate reporting, it is not acceptable for customer contact data—in
this example, we would fail to recognize one out of 50 customers when they call!
Many significant business initiatives and large IT projects depend upon a successful data
migration. Your goal is to minimize as much of your risk as possible through effective
planning and scoping. The objective of this paper from Business Objects, an SAP company,
is to provide insight into what issues are unique to data migration projects and to offer
advice on how to best approach them.
PLANNING AND SCOPING THE MIGRATION
As with any project, success depends upon a good plan. Data migration projects have a
distinct methodology and project approach. Too often, project managers make the costly
mistake of thinking that migrating data is a simple task within a larger project and hand
it off to a development team. Table 1 lists the migration tasks that must be considered.
Table 1. Migration Tasks
|Business needs mapping
||Ensure that your migration plan is driven by the expectations and needs of the users
||There are unique infrastructure requirements that should be sorted as early as possible in the project timeline
|Identifying data sources and targets
||Validate the system of record for your source data and identify the data needs of your target database
|Evaluating the data quality
||Assess the quality of your data to meet the target application requirements and business needs
|Gap analysis between source and target
||Identification, and plan for mitigation, of any gaps between available source data and target application data requirements
|Impact of multiple sources of data
||Estimate the challenges of consolidating similar data from several sources, or integrating dissimilar data
||Understand the effort required to accurately identify source data at column level detail, including transformation specifications
||Understand the effort required to design, code, test and implement the data migration
In addition to these tasks, the following sections detail two additional project management
considerations that are essential to any data migration - dependency management and data governance planning.
ALLOWING FOR DEPENDENCY MANAGEMENT
Don't forget that most risk is outside of your control or scope. Numerous other projects,
initiatives, and issues may influence a data migration project. Your project manager or team
leader must be a good communicator and be aware of the decisions and changes happening in the
larger environment. The following scenarios can change your plan and affect the scope of your data migration.
Scenario: Larger Projects
Data migration plans and schedules typically are driven by larger projects—for
example a master data management (MDM) initiative, a new business process,
or a data consolidation effort that supports a new reporting environment.
Any changes to the parent project schedule affect the data migration schedule,
including changes to production deployment, and the changes to test and beta deployment schedules.
The parent project may change its approach to implementation, such as moving from a
regional phased approach to a product phased approach. For example, in a cell phone company,
rather than convert customers by region, the team converts all customers internationally that
have the newest service plan, followed by a family service plan, and so on. What may be a
straightforward change for the larger project could exponentially increase the scope of the data migration effort.
It is not uncommon for resources to shift roles during a project. For example,
the priorities of the larger project could drive key players away from the migration tasks,
which are deemed less important than other development. Thus, your schedule is delayed and your risks increase.
Scenario: Know Your Data
The success of your migration project largely depends upon how well you know
the data content and business rules. Your data experts help with identifying business needs,
analyzing source systems, and testing, covering the full life cycle of your project. Experts
likely will be from the business and tasked to the migration project on an as-needed or part-time basis.
Scenario: Changing Source Systems and Business Rules
The source data and the business rules that govern it may change during the course
of your project. There could be other initiatives in the organization, new product offerings,
or business process changes that impact the data.
The source system itself could change, due to application or server upgrades. Application
upgrades are particularly risky, since the application features and data model of the source may be altered.
ESTABLISHING DATA GOVERNANCE AND STEWARDSHIP
Involving business users and data experts in the migration project is the single biggest
factor driving your success. Have you ever seen a data intensive project implemented
successfully without organized, committed business user involvement? In order to fulfill
your data migration needs, you must include a data governance strategy that provides leadership
and direction for user input. In your organization, you may not refer to the strategy as data
governance or stewardship, but regardless of what you call it, the key is that the data experts—often
business users—are involved in all data migration phases and tasks.
Your data migration project team must plan for the participation of data experts, set expectations
for resource commitments, and initiate the identification of measurable success factors. Project planning
must consider the two key roles for this team: 1) ensuring that the project team is aware of the business
needs being met by the target application, and 2) providing detailed data expertise and ongoing
monitoring (ownership) demanded by the new application. Team responsibilities include:
- Participating in the identification of "best" sources or "system of record" sources
- Actively reviewing, creating, and correcting data to support cleansing and reference (master) data requirements
- Monitoring data quality and data auditing status
- Sharing and capturing content detail about source system data and business processes
- Testing and confirming application functionality that is supported by migrated data
The approach to establishing a data governance team is specific to your organization. If possible,
leverage the talents of an existing team or committee. Often a data warehouse user's group is an
excellent source of data knowledge, or you may be able to leverage a team that supported a previous project.
Consider a metadata tool for the project team. The data migration project is a unique
opportunity to capture metadata about your source systems and to initiate a business process
for managing data in the target application. What can you leverage that you have in place now?
What are the requirements for managing the metadata and the business rules for the data migration
project? What are the requirements going forward? Consider:
- The need to share content knowledge and business rules for multiple team members
- Testing against identified business rules and mapping/transformation specifications
- Historical documentation (audit trails) of the decisions made, and logic behind them, for the data migration
- Historical documentation about the source of data, especially useful when merging data from multiple sources
- Identifying data owners and experts for the migration and future data
BUILDING THE MIGRATION TEAM
Table 2 lists roles and responsibilities for the planning and scope phase of the data migration project.
Table 2: Roles and Responsibilities for Planning and Scoping
|Project manager or team leader
||-Coordinating and communicating task status
- Managing issues
- Creating and maintaining project plan
- Managing dependencies to larger project, target,
and source application schedules
- Coordinating the creation of a data governance strategy and team
|Business users (data experts)
||- Identifying and validating business needs
- Analyzing target and source systems
- Assessing data quality
- Assessing data mapping and business rules for data transformation
|Business or data analyst
||- Infrastructure support, including database sizing, optimization, and life cycle management.
Works closely with the database analyst.
- Data support, including data analysis, data quality
assessment, mapping, and migration assessment.
||- Initial sizing for staging database(s)
- Planning for life cycle management of database objects
- Management planning for schema changes and user security
||- Architecture and methodology
- Source and target application data analysis
- Data quality assessment, mapping assessment
- Data governance planning
- Configuration management
- Configuration standards and quality assurance
|Migration lead (for planning and scoping
||- Architecture and programming standards
- Data quality, data mapping, and data migration assessment
- Source to target gap analysis and multiple systems impact assessment
Consider using consultants for key roles on a migration project, since it is a one-time,
custom effort and your team may not have the specific planning, strategy, or development
experience. Focus your project plan on growing the skills needed in your organization, which
may include future data migration, data integration, or data quality projects.
PROJECT PLANNING CONSIDERATIONS
The following sections address planning considerations for your data migration project.
MAPPING BUSINESS NEEDS
Business needs represent why you are migrating data. How many new applications go
live but do not deliver the business value expected? You must articulate the business
needs that are driving your project. In addition to ensuring that you are meeting the
needs of the consumer, you are also controlling the scope of a data migration project,
especially source systems analysis and mapping. At first glance, this may seem obvious,
but consider as an example the full functionality of a customer relationship management (CRM)
application. Organizations pick and choose from the menu of functions and features, and the
implementation may be phased over time. Rather than migrating all of the data subject areas of a
full customer database, the team can focus on only those tables and columns that support the business
needs of the CRM application. A clear understanding of business needs also provides focus for the sizing
estimates, data quality assessment, and test planning.
Additionally, as unexpected data quality and gap analysis issues arise (and they will), knowing
the business needs aids in identifying priorities for the project team. For example, if the customer
age is not fully populated, or has inaccurate data (230 years old), the team can reference the known
uses of that data to determine the extent of data cleansing needed. In another example, the medical
history for an automated patient chart must be of the highest quality when migrated. The business need,
which is driven by legal requirements, must be clearly understood by the entire migration team. Ideally,
the project team will include data experts and business users as part of a data governance strategy to
ensure that this process is as straightforward as possible.
A data migration project demands a dedicated infrastructure. Don't make the mistake
of using your existing development and test environments. For example, do not underestimate
the planning needed for a dedicated staging area or for load testing (yes, load testing!)
You might be converting many years worth of data in a single weekend. Far too many organizations
wait until the beta or production conversion to discover performance issues that can set the schedule
back by weeks or months.
What is your current architecture? Do you have data quality and data integration tools
needed to accomplish a data conversion? Is there a database or server available to host your
data and code? When and how will you test the full conversion to ensure that your performance
will meet the schedule needs? The team should evaluate what servers, databases, and software
tools are available to host a migration project. Leverage what you can and do not reinvent the wheel.
- Requirements documentation and capture
- Metadata and data integration mapping repositories
- Data integration tools
- Data quality tools
- Code management tools and processes
- Test tools and processes
Given that the data migration is a one-time project, you should evaluate what code
management processes can be bypassed or customized for this effort. This may sound like
a surprising recommendation, but consider the environment surrounding the data migration
project. Your schedule is not your own; rather, it is dictated by the larger project. Also
the priorities and phased approach can change at any time. The data migration team needs to be
very agile. The approach to analysis, coding, and testing should be lightweight and mutable.
The dedicated infrastructure for the project should be planned as part of your scope phase.
Do not make the error of waiting until the development phase of the project to discover that
you need additional resources for integration testing, such as servers or new database instances.
The planning for this infrastructure should include:
- A staging database for analysis and development. Impact on production services
is minimized by the use of a separate migration server and by the use of configured staging
and process areas within the service to manage multiple data versions, sources, and status values.
Your team can also use the staging area to created needed reference data for the target system.
- A sizing estimate for the staging database, which includes any needed replication required to
support both migration code development and application
- A code life-cycle management plan
- Performance test requirements
- Backup and recovery planning
- Scalable connectivity requirements and approaches (for example, how will you extract and load data to the
source and target systems?)
Planning for Testing
The test requirements for a data migration project are unique, as you may have multiple
conversions during parallel and beta testing of an application. Adding to the complexity
is the fact that the migration team is supporting the application test requirements of the
larger project and also testing the conversion code itself.
Your migration team needs to first consider the basics of audit and reconciliation. This
is missed more often than it should be. Are you looking at the right set of data? Can you trace
back to your source system at a record level? Include basic record counts and validation of ETL
processes. Many of these processes will be accomplished multiple times, as testing and re-loads occur.
The audit statistics need to be reported and evaluated each time. Also, if the project is a phased
implementation, you may be running similar data across parallel databases. Audits and reconciliation
are needed to ensure that the data is consistent across these applications.
You should also plan for performance testing. Many projects are delayed because the
development team has successfully converted a test set of data but waited until the last
minute to run the conversion of the full set. A customer consolidation project for one of
the largest global computer manufacturer's was delayed by months when the development team
attempted to convert (at go-live!) a year's worth of customer data - using code that had been
developed using a day's worth of test data.
For your conversion, there may be millions of records to process, with complex referential integrity.
Performance issues for data processing at this level of magnitude can require significant code changes.
What to do? Estimate your load and include this consideration when unit and integration testing.
Participate in peer reviews, or contract with experienced consultants, to address code optimization,
data query tuning, and database tuning challenges. And include these options in your earliest planning.
The primary reason for capturing the metadata around your business rules is to test the mapping and
transformations. All phases of testing, unit, integration, and target application, should include a
validation of the business rules.
As early as possible in your migration schedule, allow the business users to
test the data in the target application. Does it support the features and functionality
as expected? Data migration mapping is very abstract, especially if the users have never
been part of a similar project. They may not "get it" until the converted data is actually
supporting the application features and processes that they are implementing. Allow time in
your planning for changes to the mapping, business rules, and re-loads.
It is not uncommon for there to be confusion during beta testing and implementation as
to where errors originate. It's all too easy for "bad data" to be blamed for every test
error. What can you do as part of your planning to avoid this? Implement the migration with
as much metadata and reconciliation as possible. Not only are you ensuring the integrity of
your data, you are creating the lineage necessary to support the integration test process.
Data migration is, in fact, all about the data. Data related tasks, especially those around
quality, will be the bulk of your work. Plan enough time to evaluate the source data, to
compare the available data to the needed data, and to drill down to the detail needed for
source to target mapping. Your goal is to ensure that the data migration development is the
shortest task on your project plan. You must understand the source data and target needs before
you start coding. You don't want to experience the very costly surprise of learning that the source
data is not "fit to use" at integration testing or implementation.
Evaluate Data Sources and Targets
As part of the planning process, your team will take an initial look at the source and target systems.
The objective is to know enough about each system to make an estimate of the effort to complete mapping
and migration. At a high level, the team will identify subject areas, data entities, and supported
functionality. For example, customer address data will come from the billing system, customer phone
contact data will come from the data warehouse, and so on.
The planning team needs to look for the following risks and complexities:
- The commonly agreed to, or popular, system of record may not be the best source of data.
Consult as many data experts as possible and look for reconciliation statistics, compliance numbers,
and other forms of audit to identify a trusted source.
- Multiple sources increase your schedule exponentially. Never assume that a second source of data
will make your project twice the effort. If you have one source of data you may have to look for duplicate
data, but if you have two sources you have to look for duplicate data, harmonize reference data, and run
record-matching algorithms to determine correct mapping. (A requirement for sharing master data across
multiple applications is a project in itself. The need to harmonize reference data is complex enough to
warrant its own white paper.)
- Certain types of projects introduce their own challenges. For example, the approach to managing a
data migration that is part of an MDM project will be distinct from other kinds of migrations, with a
focus on dependency management, data quality, and data governance. An MDM implementation requires
initial data migrations, but also ongoing integration and reconciliation. The business rules, matching
criteria, and governance processes that are established for migration will be propagated to support the
Also, mergers and acquisitions require data consolidation, with a focus on reference data
and business processes (that are represented in the data structures and content). Even if the
two organizations have the same software the mapping may not necessarily be straightforward.
Another challenge is that a data warehouse migration can seem less complicated because the
data is relatively static in the target database; however, the expectations of the business
users can be extensive. In a well-designed warehouse, one data mart will support multiple
business units. Also, the data migration process is likely to be the initial load of an ongoing
ETL process and there should be reuse of many components.
- Conduct a gap analysis. The features that the business expects from the target system
may not have a data source at all, and the migration team may be the designated messenger
to carry this bad news. For example, in an MDM customer data project, one region out of five
has never captured customer contact history and, when consolidating the customer data, a
subset will not have a full history. The business requirement to "establish customer
segmentation based on contact history" is seriously impacted.
Your planning should include the time needed and the approach for resolving gaps in the
source to target data mapping (and a cost benefit of the effort). There are several
options, and each will increase the scope of your project:
- The business users and data experts could manually create data. This is not uncommon when
creating reference data, such as product types or customer status. The project plan will include
the time needed to work with the governance team and facilitate the data entry, updates, and testing.
- You may have the option to source data externally—for example, using Dun &dstreet data to create
more consistent customer data.
- You could default the data to an unknown value and plan to populate it over time, as part of
the target application. Using the customer segmentation example above, if it requires a year's
worth of customer contact history, the business will accept the 12-month delay before having a
100% correct segmentation calculation.
Perform a Data Quality Assessment
The biggest factor in your success—and your biggest risk—is the quality of the source data.
Many projects fail, or are significantly delayed, upon discovering that the data is not "fit
to use." The most important phase in a data migration project involves the tasks needed to
understand the data content. You would not put dirty petrol in your new car—why would you map
bad data into your new software application?
It's unlikely that anyone on your team really understands the current state of the data at the
level of detail needed for your project. And even if the source data is pristine, that doesn't
mean that it is fit for the requirements of the new application. Experience shows that unexpected
data conditions encountered in load programs can significantly impact the timelines of data migration
projects. Early assessment of data quality reduces project risk.
Your team will do an initial data quality assessment as part of the planning process. The
sooner that you look at the data, the more likely your success. For this initial pass, you
are looking for the big-ticket items - missing data, data that is not fully populated, columns
names that are inconsistent with content, and so on. Your goal is to know enough to estimate the
level of effort to get the migration job done.
You have done a review of business needs and this will drive the type of data you review.
For example, if the objective is to create a consolidated view of product data across six
regional SAP® applications, you may pull a subset of product data from each and attempt to
match them. How good is the data? What are the challenges? This should give you a pretty clear
idea of the project requirements for mapping and migration. Another example is pulling customer
data from two systems, such as billing and CRM. You want to run some high-level reconciliations
and audits between the two systems, looking for deltas. What if you have customers who have products
but they're not getting billed? It has happened! What if you have historical data for billing that isn't
represented in your CRM application?
When capturing metadata, a tool is a good option - but when assessing data quality, a profiling tool
is a must. Use data profiling technology to systematically scan the tables of interest to quantify
any data quality and data integrity issues, such as:
- Missing values
- Pattern analysis
- Frequency analysis
- Ranges and outliers
- Redundant data analysis
- Foreign key integrity analysis
- Statistics relevant to numeric fields
- Parsing requirements to enable mapping of source columns to the target columns
Perform a Data Mapping Assessment
Hand-in-hand with data quality is data mapping. What are we moving and what are the rules? How much
time will it take the team to complete data mapping? Given your planning, you should have a general
idea of what it will take to create the final design and implementation. The captured business needs,
the source and target system metadata, and the data profiling results from the data quality assessment
all create the information need to understand the mapping effort.
Much of the mapping will be data analysis work, if done right. It is not uncommon for teams to
accomplish this mapping based on guessing which column names match in a spreadsheet, or worse,
jotting a few notes down on a hardcopy of the DDL listing before coding. Be aware that there are
development teams capable of this behavior.
Where possible, identify the data challenges. One example illustrates how complex your
mapping can become. An insurance premium discount for a new membership system is stored
in a single column on a single record. However, in the custom source systems, it is stored
in multiple locations and averaged according to complex business rules. These rules are
captured in the source system code. There are no data experts who understand fully how the
logic works. The data migration team must recreate this logic in the data integration code
in order to populate the target system value. To make it even more complicated, the data conversion
is an ongoing process that happens at policy renewal, and getting the calculation wrong will cost the
company money. If the discount is too large, the organization loses money; if it's too small, the
organization may lose customers. Finally, there is the importance of capturing this logic in the
metadata or design of the data migration for auditors, actuaries, and the membership support teams.
Some of the things to consider when estimating the time it will take to accomplish data mapping are:
- Know that some of your target requirements will require significant transformations, or the
creation of new data content. For example, the mapping scope should include the effort to create
reference data for the target application, such as product types, customer status, and vendor
segmentation types. It's not uncommon for the team, working with the data experts, to create
these and set up some cross-reference tables to support the conversion of the data. This should
be planned early in the project, and data quality validation should be included, such that all
records are converted.
Also, matching and consolidating is required when sourcing from multiple systems. It could
be that you need to make a match decision on the best record from multiple source systems,
rather than having a single system of record. Or that your reference data is so in your target
system that you need to combine multiple records to make a "best of" instance.
Data quality matching could also be used to eliminate duplicate data within a single source.
This logic could be implemented to select a best of record that will later be evaluated by a
data expert. The logic and the process to identify the ‘good record' should be part of your mapping specification.
- Allow time to look at the data. Don't allow your team to fall into the trap of mapping based
on column names, copybooks (remember those?), or DDL. For each subject area, you should allow at least
a week of time for the analyst to gather information about the meaning of the data, how it is
populated (data profiling), and how it supports the target application.
- For high priority data content, the team should learn as much as possible about the business
processes that create the data and the target applications business processes. Data experts and
business users are an invaluable source for this information. For example, a patient record in a
hospital radiology application has some very basic requirements for radiologist review. There may
be scheduling and workflow included in the application. If that data is mapped to an "automated patient
chart," it will be used by physicians other than radiologists to make broader diagnoses and evaluations.
What are the differences in these processes—and does the data support the new needs represented by the
automated patient chart?
- The planning should include the evaluation of a mapping tool, or mapping functionality within a
data integration tool. Keys things to look for are the ability to easily connect to and analyze the
source data as part of the mapping process, the ability to share analysis results across a larger team,
and how well the mapping drives the development of code.
- Include validation and reconciliation steps in your mapping.
- For any matching that you do, or any consolidation, verify that each source
record is present in the source systems as designed. Each subject area mapped must
have a reconciliation base to the source, validating that all records extracted were
processed. For example, if I have 2,000 customers on my billing system, then I should
have 2,000 customers on my new CRM system.
External to the migration process, you should also have a process that is designed for a checks-and-balance approach.
Another approach is to create a high-level mapping to a database that is wholly external
to the migration process, thus creating a further audit validation. For example, you may map
your in-patient data in an automated chart application to hospital admissions data that is captured
by a financial system. Is this overkill? That entirely depends on your business user community and
the extent to which you want to guarantee trusted data.
Perform a Data Migration Assessment
Effective planning will allow you to mitigate the risks associated with your project.
If you have done the work of scoping the infrastructure, the mapping, and the data quality
assessment, then the migration itself will be your shortest task. When it comes time to migrate
your data, the people and the processes will make it work. It's more a team effort than another
other type of IT project. Throughout the migration process, you will have checks and balances in
place, including a team of data experts available to address the expected data content decisions and
to help resolve the unexpected. This can include team members who are clearly responsible for tracking
set of audit and reconciliation metrics to evaluate:
- Do the record counts match from source to target through each phase of your processing?
- Can you reconcile counts and reports to other systems in the organization that host the same data,
for example numbers of active customers, or product sales. This is especially important if you are running
systems in parallel. Can you track
the deltas between the parallel systems and can you understand them?
- Do the data experts have access to review the data for quality and consistency?
- Can you create an audit back to your source system that would satisfy financial compliance requirements?
- An approach to a phased project and how to keep parallel systems in synch as you migrate forward. This can
include ongoing integration or replication between applications for the duration of the project.
Agility is key. This is a custom (bespoke) development project and you should apply the rigor that you would
to any other development project. However, keep the implementation approach as lightweight as possible.
There are many dependencies to other projects and applications within the organization. Plan for the
inevitable need to change scope, schedules, and roadmaps as quickly as possible.
It‘s not unusual for a new application to source from multiple databases,
flat files, or external feeds. Additional sources will increase your complexity
exponentially. Consider your approach to rationalizing the data across these sources.
Also what is your cutover strategy? And how will these sources impact your schedule? If you
are de-duping or consolidating data across sources, consider what your record-matching requirements
are—and be certain to evaluate your data integration software with this in mind.
Finally, map all of your data integration requirements and data quality needs against an off-the-shelf
toolset. Simple things, such as populating a gender column based on customer name, come out of
the package, and save significant amounts of time. The more complicated needs, such as matching
and consolidation, are infinitely easer using dedicated tools. Don't make the mistake of thinking
you are saving money by hand coding—you are risking your schedule and the quality of your product.
Perform a Post-Migration Needs Assessment
Finally, as part of your planning process, do not forget the needs of your team going forward. These include:
- The need for archiving data that is not converted, or needs to be reported on in its historical structure.
- The ongoing reconciliation and balancing between the new application and parallel applications.
For example, if you implement a new customer data mart, it should include a frequent reconciliation back
to its source (lineage) and to another unrelated customer application (trust).
- It's likely that the data governance team has determined resolution of a set of data quality issues
can be deferred post-production. Plan to capture as much detail as possible over the course of your project.
DATA MIGRATION SUCCESS
If you have successfully implemented your data migration, you should have data governance, standards,
integration, and quality processes in place. Leverage these on your next project. Use the
migration effort as a route to growing your overall information management skills, which
will make subsequent data projects easier, faster, and cheaper.