Data Migration
Data Conversion and Migration
Proprietary Notice
All trademarks, service marks, trade names, logos, icons and other intellectual property rights (the Intellectual Property) are proprietary to Key Management Group, Inc. (KMG) or used by KMG as permitted through contractual arrangements with the respective proprietors. The content of this testing process shall at all times remain the property of KMG or its proprietors and is strictly protected by copyright and other applicable laws. Any usage, reproduction, retransmission, distribution, dissemination, sale, publication or circulation of this content without the express prior written consent of KMG is strictly prohibited. Nothing in this document shall be construed as granting, by implication or otherwise, a license or right to use such Intellectual Property, or to reproduce, decompile or sublicense it.
The users/clients expressly recognize, admit and confirm that all Intellectual Property Rights (IPR) in the development of this testing process are the sole property of KMG, and the user/client shall not claim, or in any way assist, aid, permit or abet others in claiming, any right, title or interest therein or thereto. If required by KMG, the user/client shall cooperate in having KMG register the IPR in this documentation.
Unless otherwise acknowledged, the entire IPR in this testing process and its contents vests absolutely in KMG. All rights reserved.
Table of Contents
1 Introduction
2 About KMG
3 Overview
3.1 Conversion and Migration Process
3.2 Guiding Principles/Assumptions
4 Planning
4.1 Identify Migration Requirement
4.2 Identify Team and Stakeholders
4.3 Identify System Environment
4.4 Create Project Schedule
4.5 Configuration Management Plan
5 Analysis
5.1 Source and Target Data Profiling
5.2 Data Mapping
5.3 Data Cleansing
5.4 Design Migration Architecture
5.5 Build Converted Data Test Plan
5.5.1 Error Handling and Auditing Requirements
6 Convert
6.1 Design & Develop Migration Tool
6.2 Pre-test and Recalibration of Tool
6.3 Configure Staging Area
6.4 Execute Data Conversion
6.5 Validate by Test Plans
7 Migrate
7.1 Develop Migration Statistics
7.2 DATA LOAD in Target System
Deployment Phase
7.3 Validation - Target Application
7.4 Target System Implementation
8 Risks and Mitigation
1 Introduction
From time to time, business organizations implement a new software application system to replace the functionality currently delivered by one or more legacy systems. Complications arise when they attempt to take the information currently maintained by the legacy system and transform it to fit the new system.
More often than not, the data structures of the legacy systems differ from those of the new application, and the difference is not limited to table names, field names, attributes or sizes. The database types may be different and diverse, or the entity relationship definitions in the new system may not be compatible with the older legacy application. For business organizations, all the data held in the legacy system remains critical for their business functions and decision making.
To bring the legacy system data into the new application, some Data Conversion must take place: an initiative, separate from or concurrent with the implementation of the new application, is undertaken to convert data from the structural form used by the legacy application to the structural form required by the newer application.
In a Data Conversion process, one might assume that any two similar systems that maintain the same sort of data, because they perform very similar functions, should map from one to the other without much trouble. That is rarely the case, because:
In legacy systems, data integrity checks were historically not strictly enforced, leaving orphan data.
Theoretical design differences exist between hierarchical and relational systems.
Legacy data may require some data cleansing.
It is therefore important to have a sound, methodical approach by which organizations can undertake Data Conversion projects. Such an approach helps to anticipate unpleasant surprises at later stages and to resolve those issues quickly and effectively.
Key Management Group, Inc. (KMG) often helps business organizations implement new software applications, especially in the areas of Property and Casualty (P&C) Insurance and Enterprise Resource Planning (ERP) package software. To help its business clients, KMG designed and developed a methodology and adopted it for such Data Conversion projects.
This document describes that methodology by stating the processes to be performed at each stage of the conversion and migration of legacy system data, and the roles and responsibilities of the stakeholders and participants in the project.
Although KMG has reviewed, refined and republished this document over the years based on the experience of many such projects, the methods and processes it describes are generic in nature. Each project presents its own challenges and opportunities, and the KMG personnel engaged in each project design, add or modify steps within the broad framework of this methodology for the benefit of that specific project.
2 About KMG
KMG was established in the US in 1990 and is among the top 100 outsourcing companies in the US. It has been listed among the top 100 outsourcing companies in the world, the top 10 fastest-growing Indian-owned companies in the US, and the top 50 software companies in India. It has a Dun & Bradstreet rating of Good (2A1).
Figure: KMG Services (Legacy Web-enabling, Application Support, New Development, BPO, Testing, Business Analysis)
3 Overview
3.1 Conversion and Migration Process
Data Conversion project activities start with Planning, lead to Analysis and Design, progress to Conversion of Data, and finish with Migration, in which the converted data is loaded into the target system database.
The following figure provides an overview of all the processes involved in the legacy data conversion process.
Figure 1
The tasks to be conducted at each stage of a Legacy Data Conversion project, the guiding principles for executing them, and the associated roles and responsibilities are described in the subsequent chapters.
4 Planning
To achieve a successful legacy data migration through conversion, it is imperative that substantial upfront planning take place before any data is moved, irrespective of the complexity of the task.
In this stage, KMG, in conjunction with its client, prepares a plan that is also intended to shorten the duration of the conversion/migration process and to reduce business impact and risk.
The Migration Plan, which is the end result of KMG's planning process, defines:
Requirements, i.e. what data is moved, where it is moved to, how and when it is moved, and approximately how long the move will take
Team, i.e. the Users, Business Analysts, System/Data Analysts, Testers and Target Application Users
Environment under which the Legacy (Source) and Target Application systems operate, and the Data Conversion Staging Area to be used
Configuration Management Plan to maintain control over changes needed for various reasons
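The "how long the move will take" requirement can be roughed out from the data volume and the throughput observed in a trial conversion run. The following is a minimal sketch under stated assumptions: the helper `estimate_move_hours` and its 25% contingency default are hypothetical illustrations, not figures from KMG's methodology.

```python
def estimate_move_hours(total_records: int, records_per_second: float,
                        contingency: float = 0.25) -> float:
    """Rough wall-clock estimate, in hours, for moving total_records.

    records_per_second is assumed to be measured in a trial run;
    contingency (a hypothetical 25% by default) pads for validation,
    restarts and manual checks.
    """
    if records_per_second <= 0:
        raise ValueError("throughput must be positive")
    hours = total_records / records_per_second / 3600
    return round(hours * (1 + contingency), 2)


# 9 million records at a trial-run rate of 500 records/second:
# 18,000 s = 5 h, plus 25% contingency
print(estimate_move_hours(9_000_000, 500))  # 6.25
```

The estimate is only as good as the trial-run throughput, so it would be revised as real conversion runs are timed.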
Another objective of KMG's planning process is to design and document responses to events such as:
Application downtime
Performance degradation
Data corruption/loss
KMG initiates such a project by deploying a Project Manager and an experienced Analyst to gather information and develop the Migration Plan.
Location: Client On-Site
Participants (KMG): Project Manager; Business Analyst / SME
Participants (Client): CIO / CTO; Project Manager; Business Analysts; Technical Lead (Legacy System); Technical Lead (Target System)
Output: Migration Plan
Responsibilities
Business Analysts
Technical Lead (Legacy System)
Technical Lead (Target System)
Database Administrator: If an RDBMS is used in either or both of the Legacy and Target systems, generate DDL to identify table and field properties. Create and maintain the Test Database for the Target System and the Staging Environment for the actual conversion.
System / Network Administrator: Provide help in understanding the Legacy System and Target System environments. Help set up the Staging Area (servers/network). Establish connectivity between the Legacy System and the Staging Area. Execute the DATA LOAD process in the target environment.
Project Coordinator / Manager: Manage the Data Conversion processes and reporting. Review, develop and, if required, modify the Migration Plan. Coordinate and provide the necessary resources as per the Migration Plan. Assign the right technical resources.
At KMG, all projects are overseen by a Project Owner (normally an AVP/VP of the company). In KMG projects, the PM and the Technical Lead often have interchangeable skills and act as backups for each other.
The backup developers continuously shadow the main team members. They are as capable as any other developer on the team and can take over from anyone at a day's notice. They are used in cases of attrition as well as to handle spikes in workload (if any), and they ensure that work is not disrupted if any member goes on vacation. A fresh person is added to the team as a backup on the day any backup is absorbed into the team.
The same details are gathered for the Target System. Such information helps determine whether the Legacy System data needs to be converted to another format (for example, EBCDIC to ASCII), which increases the complexity.
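To illustrate what such a format conversion involves, the sketch below decodes EBCDIC character data into ASCII with Python's built-in codecs. It assumes code page cp037 (EBCDIC US/Canada); the actual legacy code page (for example cp500 or cp1140) must be confirmed, and `ebcdic_to_ascii` is a hypothetical helper name.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> bytes:
    """Convert EBCDIC character data to ASCII bytes.

    cp037 is an assumption; confirm the legacy system's code page.
    Raises UnicodeEncodeError if a character has no ASCII equivalent,
    so such records surface as conversion errors instead of being
    silently altered.
    """
    return raw.decode(codepage).encode("ascii")


legacy_field = bytes([0xD7, 0xF0, 0xF0, 0xF0, 0xF1])  # "P0001" in cp037
print(ebcdic_to_ascii(legacy_field))  # b'P0001'
```

Only character fields can be treated this way. Packed-decimal (COMP-3) and binary fields in mainframe records are not text and need field-by-field handling driven by the record layout, which is part of why such conversions increase complexity.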
Apart from the above, technical information is gathered about the network on which the Client organization hosts and operates the Legacy and Target Systems, and about the other hardware and servers used by the organization.
There are other technology considerations, such as:
a. How old are the operating system(s) from which data is to be migrated? Some migration tools do not support legacy operating systems.
b. What staging area requirements are present, given current technologies and data migration requirements?
c. Will the Client need or want the option to recover quickly from the source disk, or to fall back to the original storage device as a fail-over? The answer determines the procedural and technological means needed to accomplish that.
d. Is a central console needed to manage data migrations across multiple servers?
e. Is there a need to control the data migration from a local server or a remote server? If remote, which
protocols must be supported?
f. Is there a requirement to throttle or control data flows between servers?
g. Which storage tiers are involved?
h. Is any change in the Target System model likely? If so, it should be taken into account when analyzing its consequences for the eventual conversion process.
Based on the responses to the queries listed above, KMG refines the Requirement definition. This information strongly influences the choice of Conversion Framework and method, and whether or not ETL tools are used.
The KMG Project Manager develops the schedule in consultation with the Client and the project team members, taking their experience and concerns into consideration.
Project schedules are revised based on the actual time taken to resolve complex issues and to procure the necessary hardware, software and team resources.
A Change Control Board (CCB), consisting of the KMG Project Manager and the Client Project Owner, can be constituted to review any proposed database change and its impact. This is critical to ensure communication among all members of the project teams, whether in the software development team or among the Data Analysts/SMEs.
Where required, KMG reviews and recommends a business freeze in multiple areas, which can be a critical and required component of such a legacy data conversion effort. Such requirements are explained to and reviewed with business data owners, management and auditors.
Business freeze requirements should be addressed in detail in a separate document and circulated well before the timelines established for the cutover plan.
5 Analysis
In this methodology, the first stage of data conversion and migration is Data Classification, performed by creating a Data Profile for the data elements used in the Legacy System(s) and the Target Application.
Figure 2
During the Analysis phase of the project, KMG produces various documents that serve as the specification guidelines for the Execution (Convert and Migrate) phases of the project.
Location: Client
Participants (KMG): Project Manager; Business Analyst; Data Analyst / SME; Technical Leads
Participants (Client): CIO / CTO; Project Manager; Business Analysts; Technical Lead (Legacy System); Technical Lead (Target System)
Output: Referential Rules; Validation Rules; Cleansing Requirement
KMG's data profiling process consists of three sequential steps, with each step building on the information produced in the previous ones. Data sources are profiled in three dimensions: down columns (column profiling), across rows (dependency profiling) and across tables (redundancy profiling).
Column Profiling. Column profiling analyzes the values in each column or field of the source data, inferring detailed characteristics for each column, including data type and size, range of values, frequency and distribution of values, cardinality, and null and uniqueness characteristics. This step allows analysts to detect and analyze data content quality problems and to evaluate discrepancies between the inferred (true) metadata and the documented metadata.
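A minimal column-profiling sketch follows, assuming the legacy extract has been read as text values (with `None` or empty strings for missing data). The function names and the simple type inference are hypothetical; range profiling is omitted for brevity.

```python
from collections import Counter
from typing import Optional, Sequence


def infer_type(value: str) -> str:
    """Deliberately simple type inference for text extracts."""
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "text"


def profile_column(values: Sequence[Optional[str]]) -> dict:
    """Column profile: nulls, cardinality, uniqueness, size, type, frequency."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    freq = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(freq),  # cardinality
        "unique": len(freq) == len(present),
        "max_length": max((len(v) for v in present), default=0),
        "inferred_types": dict(Counter(infer_type(v) for v in present)),
        "top_values": freq.most_common(3),  # frequency distribution
    }


# Hypothetical STATE column from a legacy policy extract
print(profile_column(["NY", "NJ", "NY", "", "ny", None, "CT"]))
```

In this sample, "ny" and "NY" count as separate values, which is exactly the kind of content quality problem column profiling is meant to surface.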
Dependency Profiling. Dependency profiling analyzes data across rows, comparing the values in every column with the values in every other column, and infers all dependency relationships that exist between attributes within each table. Dependency profiling identifies primary keys and whether expected dependencies (e.g. those imposed by a new application) are supported by the data. It also identifies "gray-area dependencies": those that are true most of the time, but not all of the time, and that usually indicate a data quality problem.
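The sketch below shows one way to measure whether a dependency such as ZIP code determining city holds, and whether a column combination is a candidate key. It is an illustration under assumptions: rows are plain dictionaries, the "strength" measure (share of rows agreeing with the most common dependent value) is one simple choice among several, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_profile(rows, lhs, rhs):
    """Measure how well the dependency lhs -> rhs holds in the data.

    Returns (strength, violations). Strength is the share of rows that
    agree with the most common rhs value for their lhs value: 1.0 is an
    exact dependency; slightly below 1.0 suggests a gray-area dependency.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    agreeing = sum(c.most_common(1)[0][1] for c in groups.values())
    violations = {k: dict(c) for k, c in groups.items() if len(c) > 1}
    return (agreeing / len(rows) if rows else 1.0), violations


def is_candidate_key(rows, columns):
    """True if the column combination uniquely identifies every row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


rows = [  # hypothetical legacy address rows
    {"policy": "P1", "zip": "10001", "city": "New York"},
    {"policy": "P2", "zip": "10001", "city": "New York"},
    {"policy": "P3", "zip": "10001", "city": "NYC"},
    {"policy": "P4", "zip": "07302", "city": "Jersey City"},
]
print(dependency_profile(rows, "zip", "city"))
# (0.75, {'10001': {'New York': 2, 'NYC': 1}})
print(is_candidate_key(rows, ["policy"]), is_candidate_key(rows, ["zip"]))
# True False
```

The 0.75 result is a typical gray-area dependency: the rule holds for most rows, and the violation list points directly at the records to investigate.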
Redundancy Profiling. Redundancy profiling compares data between tables of the same or different data sources, determining which columns contain overlapping or identical sets of values. It looks for repeating patterns among an organization's "islands of information". Redundancy profiling identifies attributes that contain the same information under different names (synonyms) and attributes that have the same name but a different business meaning (homonyms). It also helps determine which columns are redundant and can be eliminated, and which are necessary to connect information between tables. Eliminating redundant columns reduces processing overhead and the probability of error in the target database.
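Candidate synonyms and homonyms can be nominated by comparing the distinct value sets of columns in different tables. The following is a minimal sketch, assuming Jaccard overlap as the similarity measure and hypothetical thresholds; any pair it flags still needs a business review to confirm its meaning.

```python
from itertools import combinations


def distinct_values(column):
    return {v for v in column if v not in (None, "")}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundancy_profile(tables, high=0.8, low=0.2):
    """Flag likely synonyms and homonyms across tables.

    Synonym: different names, heavily overlapping values (>= high).
    Homonym: same name, little value overlap (<= low).
    Thresholds are hypothetical and only nominate candidates for review.
    """
    columns = [(t, c, distinct_values(vals))
               for t, table in tables.items() for c, vals in table.items()]
    synonyms, homonyms = [], []
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 == t2:
            continue  # only compare columns in different tables
        score = round(jaccard(v1, v2), 2)
        if c1 != c2 and score >= high:
            synonyms.append((f"{t1}.{c1}", f"{t2}.{c2}", score))
        elif c1 == c2 and score <= low:
            homonyms.append((f"{t1}.{c1}", f"{t2}.{c2}", score))
    return synonyms, homonyms


tables = {  # hypothetical extracts from two legacy sources
    "POLICY": {"CUST_ID": ["C1", "C2", "C3"], "STATUS": ["A", "L", "A"]},
    "CLAIM": {"CLIENT_NO": ["C1", "C2", "C3", "C2"], "STATUS": ["OPEN", "CLOSED"]},
}
print(redundancy_profile(tables))
```

Here CUST_ID and CLIENT_NO carry the same customer keys under different names, while the two STATUS columns share a name but not a value domain.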
KMG believes that developing an accurate profile of existing data sources is the essential first step in any successful data migration project. The most significant problem in this phase arises when the Target Application System data model changes frequently. The consequences of any change in the target system model for the eventual conversion process have to be analyzed. This makes the whole process iterative until the Target System data structure model is frozen or a complete understanding of the legacy system has been reached.
Data Map (Transformation specification): At the end of the data mapping process, a detailed document is in place that shows the target field identified for each legacy field. Apart from identifying the target fields to which the legacy fields are mapped, the mapping specification defines the rules to be applied in the conversion process. These rules are commonly known as Transformation rules.
Referential Rules: integrity checks and permitted domain values
Validation Rules: application-specific business rules
Data mapping is an iterative process. Any change in the design of the target system, or in the rule for setting the value of a particular field, requires the mapping specification to be amended to reflect the change in the Transformation rules.
The resulting Data Map (Transformation specification) document is later used either:
a. In conjunction with third-party data migration tools to extract, scrub, transform and load the data from the old system to the new system, or
b. To develop a customized application system that converts the Legacy/Source system data into the Target System data model. In this case, the Data Map provides essential information to the programmers creating the conversion routines that move data from the source to the target database.
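For option (b), the Data Map can be held as data that the conversion routine executes, so that amending a Transformation rule means editing one entry. The sketch below assumes hypothetical legacy fields (`POL-NO`, `EFF-DT`, `ST`, `PREM`), domain values and rules; it is an illustration of the idea, not KMG's mapping format.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical Data Map: target field -> (legacy field, transformation rule)
DATA_MAP = {
    "policy_number": ("POL-NO", str.strip),
    "effective_date": ("EFF-DT", lambda v: datetime.strptime(v, "%Y%m%d").date()),
    "state_code": ("ST", lambda v: v.strip().upper()),
    "premium": ("PREM", lambda v: Decimal(v) / 100),  # legacy field holds cents
}

# Referential rules: permitted domain values for target fields
DOMAINS = {"state_code": {"NY", "NJ", "CT"}}

# Validation rules: application-specific business rules
VALIDATIONS = [("premium must not be negative", lambda t: t["premium"] >= 0)]


def apply_data_map(legacy: dict) -> dict:
    """Transform one legacy record, then apply referential and validation rules."""
    target = {tgt: rule(legacy[src]) for tgt, (src, rule) in DATA_MAP.items()}
    for fld, allowed in DOMAINS.items():
        if target[fld] not in allowed:
            raise ValueError(f"{fld}={target[fld]!r} is not a permitted value")
    for message, check in VALIDATIONS:
        if not check(target):
            raise ValueError(message)
    return target


print(apply_data_map({"POL-NO": " P0001 ", "EFF-DT": "20190301",
                      "ST": "ny", "PREM": "125050"}))
```

`Decimal` is used for the premium so that monetary values are not subject to binary floating-point rounding during conversion.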
KMG believes that Data Cleansing is critical to the success of any Data Conversion and Migration project. If it is not undertaken, business processes will not operate as designed. Data cleansing almost always takes more time and resources than anyone anticipates, which is why data cleansing efforts are launched as early as possible to make subsequent phases of the project easier and to avoid delays.
Data cleansing can be accomplished in two different ways.
a. Cleansing at the source: This involves cleansing the data directly in the production environment of the existing legacy system or systems. The main advantage of this approach is a substantial reduction in complications, which makes the conversion process much simpler.
b. Cleansing through external means: This type of cleansing is generally accomplished with spreadsheets. A report containing the data to be cleansed is sent to the persons responsible for data cleansing. Care should be taken that the spreadsheet contains enough data for the businessperson to understand what has to be cleansed. It should also cater to the technical requirements, so that the corrections can be incorporated into the data conversion process with ease. Spreadsheet design is of utmost importance: if a deficiency in the basic design of the spreadsheet is identified at a later stage, the corrections it holds may not be accommodated in the conversion process.
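One way to satisfy both the business and the technical requirements of cleansing through external means is to give the worksheet stable record keys and field names (for re-import) alongside descriptive context (for the reviewer). A minimal sketch using a CSV file as the spreadsheet; the column layout and helper names are hypothetical.

```python
import csv
import io

# Hypothetical worksheet layout: keys the conversion needs for re-import,
# context the business reviewer needs, and a column for the correction.
COLUMNS = ["record_key", "field", "current_value", "issue", "context", "corrected_value"]


def write_cleansing_sheet(issues, stream):
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for issue in issues:
        writer.writerow({**issue, "corrected_value": ""})


def read_corrections(stream):
    """Return {(record_key, field): corrected_value} for completed rows."""
    return {
        (row["record_key"], row["field"]): row["corrected_value"].strip()
        for row in csv.DictReader(stream)
        if row["corrected_value"].strip()
    }


def apply_corrections(records, corrections, key_field):
    for rec in records:
        for (key, field), value in corrections.items():
            if rec.get(key_field) == key and field in rec:
                rec[field] = value
    return records


issues = [{"record_key": "P0001", "field": "ST", "current_value": "N.Y.",
           "issue": "invalid state code", "context": "Insured J. Smith of Albany"}]
sheet = io.StringIO()
write_cleansing_sheet(issues, sheet)
# Simulate the business reviewer filling in the corrected_value column
completed = sheet.getvalue().replace("Albany,\r\n", "Albany,NY\r\n")
corrections = read_corrections(io.StringIO(completed))
records = apply_corrections([{"POL-NO": "P0001", "ST": "N.Y."}], corrections, "POL-NO")
print(records)  # [{'POL-NO': 'P0001', 'ST': 'NY'}]
```

Because only rows with a filled-in correction are read back, partially completed worksheets can be re-imported safely across multiple cleansing cycles.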
KMG notes that data can be cleansed either in the source system or in a staging area. For audit requirements and ease of cleansing, it recommends that all data cleansing be performed in the Legacy Systems unless that is not viable.
KMG's approach to data cleansing includes working with the Client Integration/Functional teams to define and execute automated data cleansing based on the findings of the Analysis phase. Multiple cycles of cleansing may take place. Extracts are made periodically to validate the cleansing activities and can be handed off to the Technical Development team for sample loads into the Staging Area.
Key Participating Groups
a. KMG Data Conversion and migration project team
b. Client IS team - Integration/Functional
Deliverables/Outputs
a. Modified source data that increases the success of automated data conversion
b. Control metrics
c. Data Cleansing Requirement Specification (or Recommendation)
Figure 3
The generic framework for a Data Conversion process consists of the following steps:
a. Data Extraction - read and gather data from the source data store(s) into another store, convert it if required to the data format of the Target System (e.g. EBCDIC to ASCII), and load it into the Staging Area.
b. Validation and Cleansing - confirm that the content and structure of the extracted data conform to business rules and fulfil the integration rules based on the referential rules of the Target System. Data cleansing is performed at this time based on the requirements identified during the Analysis phase.
c. Transformation - convert the extracted data from its previous form into the target form. Transformation uses the Transformation Rules defined in the Data Map (Transformation specification) and lookup tables.
d. Validation - Target System - confirm that the content and structure of the transformed data are valid for the target.
e. DATA LOAD - write the data into the target database, either through scripts or by copying data using system utilities.
Steps (a), (b), (c) and (d), shown in Figure 3 above, are part of the Convert phase of the Data Conversion and Migration methodology. Step (e), i.e. DATA LOAD, is part of the Migrate phase of the project.
KMG makes the necessary changes to the tasks defined in each step of the framework, depending on the specific requirements of each project, and creates the executable process for the conversion of the legacy data.
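The five framework steps can be sketched as a small pipeline. This is a minimal illustration under assumptions: the fixed-width legacy layout, the `STATE_CROSSWALK` lookup table and the `policy` target table are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

STATE_CROSSWALK = {"NY": "US-NY", "NJ": "US-NJ"}  # hypothetical lookup table


def extract(lines):
    """(a) Read fixed-width legacy records into the staging area."""
    return [{"id": ln[0:5].strip(), "state": ln[5:7], "prem": ln[7:15].strip()}
            for ln in lines]


def validate_and_cleanse(rows, errors):
    """(b) Check business rules and apply simple cleansing."""
    clean = []
    for row in rows:
        row = {**row, "state": row["state"].upper()}  # cleansing: normalise case
        if row["id"] and row["prem"].isdigit():
            clean.append(row)
        else:
            errors.append(("source", row["id"]))
    return clean


def transform(rows):
    """(c) Apply Data Map rules and lookup tables."""
    return [(r["id"], STATE_CROSSWALK.get(r["state"]), int(r["prem"])) for r in rows]


def validate_target(rows, errors):
    """(d) Confirm transformed rows are valid for the target."""
    valid = []
    for row in rows:
        if row[1] is None:
            errors.append(("target", row[0]))  # code missing from crosswalk
        else:
            valid.append(row)
    return valid


def load(conn, rows):
    """(e) DATA LOAD into the target database (Migrate phase)."""
    conn.executemany(
        "INSERT INTO policy (id, state, premium_cents) VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the target database
conn.execute("CREATE TABLE policy (id TEXT PRIMARY KEY, state TEXT NOT NULL,"
             " premium_cents INTEGER)")
errors = []
source = ["P0001ny00125050", "P0002NJ0000abcd", "P0003TX00001000", "P0004NJ00099900"]
rows = validate_target(transform(validate_and_cleanse(extract(source), errors)), errors)
load(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0], errors)
# 2 [('source', 'P0002'), ('target', 'P0003')]
```

Keeping each step a separate function mirrors the split between the Convert phase (a to d) and the Migrate phase (e), and lets rejected records be reported by the step that caught them.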
At this stage, KMG, in consultation with the Client IT team, analyzes and determines whether to:
- Use a data transformation tool to manage the data conversion/migration effort, or
- Have a group of technical experts design and develop an application system built specifically to fulfill the conversion requirement, based on the Data Map (Transformation Specification).
KMG also determines, based on the information gathered and the Client's requirements, whether to execute the conversion all at once or to move data over in a controlled, phased series of releases. KMG analyzes the pros and cons of both options to decide which approach best fits the organization's needs, evaluating factors such as the volume of data involved and the requirements of the Target System.
Physical errors result from syntactical errors in the scripts/programs and can be easily identified and resolved.
Logical errors are identified and resolved during the Test phase. Such errors result from the quality of the mapping effort. During Implementation/Testing, the scripts/programs developed from the Data Map (Transformation specification) are executed.
Based on the Data Map (Transformation Rules), KMG creates Test Plans in which it identifies each Legacy System data element, the corresponding Target System element, and the expected results for the set of extracts to be used for testing. A test plan is prepared for each data element being converted.
The Testing team gathers and verifies responses to the following queries:
a. How many records were expected to be created by the scripts being tested?
b. Did the correct number of records get created? If not, why?
c. Has the data been loaded into the correct fields?
d. Is the data load complete, or are certain fields missing?
e. Has the data been formatted correctly?
f. Are any post-migration clean-up tasks in order?
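Several of these queries can be answered automatically from a test plan. The sketch below is illustrative only: the plan structure (expected count, cases tracing a legacy element to a target element and expected value, and format patterns) and all names in it are hypothetical.

```python
import re

# Hypothetical test plan for one converted entity: each case traces a legacy
# data element to its target element and the expected result.
PLAN = {
    "expected_count": 3,
    "cases": [
        {"key": "P0001", "legacy_field": "ST", "target_field": "state_code",
         "expected": "NY"},
        {"key": "P0002", "legacy_field": "EFF-DT", "target_field": "effective_date",
         "expected": "2019-03-01"},
    ],
    "formats": {"effective_date": r"\d{4}-\d{2}-\d{2}"},
}


def run_test_plan(plan, target_rows, key="policy_number"):
    findings = []
    # Did the correct number of records get created?
    if len(target_rows) != plan["expected_count"]:
        findings.append(f"count: expected {plan['expected_count']}, got {len(target_rows)}")
    by_key = {row.get(key): row for row in target_rows}
    # Has the data been loaded into the correct fields with the expected values?
    for case in plan["cases"]:
        row = by_key.get(case["key"])
        if row is None:
            findings.append(f"{case['key']}: record not created")
            continue
        actual = row.get(case["target_field"])
        if actual != case["expected"]:
            findings.append(f"{case['key']}.{case['target_field']} (from {case['legacy_field']}): "
                            f"expected {case['expected']!r}, got {actual!r}")
    # Is the data load complete, and has the data been formatted correctly?
    for row in target_rows:
        for field, pattern in plan["formats"].items():
            value = row.get(field)
            if value is None or not re.fullmatch(pattern, str(value)):
                findings.append(f"{row.get(key)}.{field}: bad format {value!r}")
    return findings


converted = [
    {"policy_number": "P0001", "state_code": "NY", "effective_date": "2019-02-15"},
    {"policy_number": "P0002", "state_code": "NJ", "effective_date": "03/01/2019"},
]
for finding in run_test_plan(PLAN, converted):
    print(finding)
```

An empty findings list for every entity is one objective exit criterion for the Test phase; any finding traces back to a specific legacy element and mapping rule.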
The goal of a successful data migration is to keep the length of the deploy phase(s) to a minimum.
During Pilot/Testing, KMG determines the quality of the data mapping by providing the populated target data structures to the users who assisted in the analysis and design of the conversion scripts/application system.
This helps the Client Integration/Functional team understand the data and allows users to interact directly with the new, populated data structures of the Target System.
Record counts from the legacy input data, records with errors, records without errors, and total records
Record counts for the converted output data
Counts for the anticipated number of transactions
Date and time of the start and end of the run
Any codes that were not found in the crosswalk table, and the record in which they were encountered
Any fields missing or in error and not found in the conversion table
Summarized dollar values comparable with legacy system subtotals/totals.
This information helps maintain the conversion tables and enables the Client functional teams to verify that the data loaded matches the data extracted. It also allows the KMG project team to estimate the execution time required for future conversion runs, and it provides an audit tracking mechanism.
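A conversion run can collect these audit figures as it goes. The following is a minimal sketch, assuming hypothetical legacy fields (`POL-NO`, `STAT`, `PREM`) and a hypothetical status crosswalk table; the key check is that the legacy dollar total equals the converted total plus the total of rejected records.

```python
from datetime import datetime
from decimal import Decimal

STATUS_CROSSWALK = {"A": "ACTIVE", "L": "LAPSED", "C": "CANCELLED"}  # hypothetical


def convert_with_audit(legacy_records):
    """Convert records and collect the audit figures for the run."""
    audit = {"started": datetime.now(), "legacy_total": len(legacy_records),
             "records_ok": 0, "records_in_error": 0, "missing_codes": [],
             "legacy_amount": Decimal("0"), "converted_amount": Decimal("0"),
             "rejected_amount": Decimal("0")}
    output = []
    for rec in legacy_records:
        amount = Decimal(rec["PREM"])
        audit["legacy_amount"] += amount
        status = STATUS_CROSSWALK.get(rec["STAT"])
        if status is None:
            # Code not found in the crosswalk table, and where it occurred
            audit["missing_codes"].append((rec["STAT"], rec["POL-NO"]))
            audit["records_in_error"] += 1
            audit["rejected_amount"] += amount
            continue
        output.append({"policy_number": rec["POL-NO"], "status": status,
                       "premium": amount})
        audit["records_ok"] += 1
        audit["converted_amount"] += amount
    audit["converted_total"] = len(output)
    audit["ended"] = datetime.now()
    return output, audit


legacy = [{"POL-NO": "P1", "STAT": "A", "PREM": "100.00"},
          {"POL-NO": "P2", "STAT": "X", "PREM": "50.25"},
          {"POL-NO": "P3", "STAT": "L", "PREM": "20.00"}]
converted, audit = convert_with_audit(legacy)
# Dollar totals must reconcile with the legacy subtotals
assert audit["legacy_amount"] == audit["converted_amount"] + audit["rejected_amount"]
print(audit["records_ok"], audit["records_in_error"], audit["missing_codes"])
# 2 1 [('X', 'P2')]
```

The recorded start and end times give the elapsed time per run, which is the raw material for estimating future conversion runs.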
6 Convert
6.1 Design & Develop Migration Tool
The data conversion process can be accomplished by the following methods:
Using a data conversion tool.
Scripts developed specifically for the purpose of conversion in the project.
Manual data conversion and migration.
The choice of the right conversion tool for a given project is always debatable, but the following significant factors form the basis of a rational decision:
1. Cost involved in procuring a tool and having trained personnel to run the tool. Is this cost less than the
cost of employing developers to script the conversion process?
2. Does the tool require any customization? If yes, then the cost and time scales of such a customization
effort should be ascertained.
3. The volume of data to be converted and migrated. If the volume of data is small, manual data conversion and migration may be the best option.
4. Whether the Legacy System data and the Target environment can be hosted on the same environment, since a difference in operating system or data format would involve manual extraction and conversion to the target environment.
7 Migrate
7.1 Develop Migration Statistics
There are specific goals associated with implementing an effective data migration strategy. Primarily, data
must be migrated from the source platform to the target platform completely and accurately, and according
to company and regulatory policies on information controls and security. This means no dropped or
incomplete records, and no data fields that fail validation or other quality controls in the target environment.
Another goal of data migration is that the process be done quickly, with as short a downtime window as
possible. Finally, the cost of data migration must be manageable, in terms of technology and staff
requirements.
There are many metrics that can measure the effectiveness and efficiency of data migrations:
Number of customizations required
Percentage of migrated records
Percentage of migrated tables
Percentage of data with quality problems
Number of migration errors
Migration impact on database size
Downtime due to migration
Required staging storage / hardware
Percentage of reconciliation errors
Percentage of cleansed data
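Several of these metrics are simple ratios of counts collected during the runs. A minimal sketch with hypothetical figures; the choice of denominator for each percentage (for example, cleansed records relative to records with quality problems) is an assumption that each project would need to define.

```python
def pct(part, whole):
    return round(100.0 * part / whole, 2) if whole else 0.0


def migration_metrics(c):
    """Percentage metrics from run counts (denominators are assumptions)."""
    return {
        "pct_migrated_records": pct(c["migrated_records"], c["source_records"]),
        "pct_migrated_tables": pct(c["migrated_tables"], c["source_tables"]),
        "pct_quality_problems": pct(c["problem_records"], c["source_records"]),
        # share of problem records that were cleansed
        "pct_cleansed": pct(c["cleansed_records"], c["problem_records"]),
        # reconciliation errors relative to migrated records
        "pct_reconciliation_errors": pct(c["reconciliation_errors"],
                                         c["migrated_records"]),
    }


counts = {  # hypothetical figures from one migration run
    "source_records": 120_000, "migrated_records": 119_400,
    "source_tables": 40, "migrated_tables": 40,
    "problem_records": 3_000, "cleansed_records": 2_400,
    "reconciliation_errors": 60,
}
print(migration_metrics(counts))
```

Tracking the same ratios across successive trial runs shows whether cleansing and mapping fixes are actually converging.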