Data Life Cycle
Whether you manage data initiatives, work with data professionals, or are
employed by an organization that regularly conducts data projects, a firm
understanding of what the average data project looks like can prove highly
beneficial to your career. This knowledge—paired with other data skills—is what
many organizations look for when hiring.
No two data projects are identical; each brings its own challenges, opportunities,
and potential solutions that impact its trajectory. Nearly all data projects,
however, follow the same basic life cycle from start to finish. This life cycle can
be split into eight common stages, steps, or phases:
1. Generation
2. Collection
3. Processing
4. Storage
5. Management
6. Analysis
7. Visualization
8. Interpretation
Below is a walkthrough of the processes that are typically involved in each of
them.
1. Generation
For the data life cycle to begin, data must first be generated. Otherwise, the
following steps can’t be initiated.
Data generation occurs regardless of whether you’re aware of it, especially in our
increasingly online world. Some of this data is generated by your organization,
some by your customers, and some by third parties you may or may not be
aware of. Every sale, purchase, hire, communication, interaction—
everything generates data. Given the proper attention, this data can often lead to
powerful insights that allow you to better serve your customers and become more
effective in your role.
2. Collection
Not all of the data that’s generated every day is collected or used. It’s up to your
data team to identify what information should be captured and the best means for
doing so, and what data is unnecessary or irrelevant to the project at hand. There
are several common ways to collect data:
Forms: Web forms, client or customer intake forms, vendor forms, and human
resources applications are some of the most common ways businesses
collect data.
Surveys: Surveys can be an effective way to gather vast amounts of
information from a large number of respondents.
Interviews: Interviews and focus groups conducted with customers, users, or
job applicants offer opportunities to gather qualitative and subjective data that
may be difficult to capture through other means.
Direct Observation: Observing how a customer interacts with your website,
application, or product can be an effective way to gather data that may not be
offered through the methods above.
It’s important to note that many organizations take a broad approach to data
collection, capturing as much data as possible from each interaction and storing
it for potential use. While drawing from this supply is certainly an option, it’s
always important to start by creating a plan to capture the data you know is
critical to your project.
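A collection plan can be as simple as a list of the fields a project needs, applied to every incoming record. The sketch below assumes a hypothetical plan (`COLLECTION_PLAN`) and a hypothetical survey record; it keeps only the planned fields and flags records that are missing any of them, rather than silently storing everything that was captured.

```python
from typing import Any

# Hypothetical collection plan: the fields this project has decided it needs.
COLLECTION_PLAN = {"customer_id", "product", "rating"}


def collect(raw_event: dict[str, Any], plan: set[str] = COLLECTION_PLAN) -> dict[str, Any]:
    """Keep only the planned fields from a raw interaction record."""
    missing = plan - set(raw_event)
    if missing:
        # Flag gaps instead of silently storing an incomplete record.
        raise ValueError(f"missing planned fields: {sorted(missing)}")
    return {field: raw_event[field] for field in sorted(plan)}


survey_response = {
    "customer_id": 17,
    "product": "headphones",
    "rating": 4,
    "browser": "Firefox",   # captured, but not in the plan
    "session_ms": 48210,    # captured, but not in the plan
}
print(collect(survey_response))
```

Unplanned fields such as `browser` are simply dropped here; an organization that prefers the broad approach could store them separately instead.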
3. Processing
Once data has been collected, it must be processed. Data processing can refer
to various activities, including:
Data wrangling, in which a data set is cleaned and transformed from its raw
form into something more accessible and usable. This is also known as data
cleaning, data munging, or data remediation.
Data compression, in which data is transformed into a format that can be more
efficiently stored.
Data encryption, in which data is encoded so that only authorized parties can
read it, protecting sensitive information from unauthorized access.
Even the simple act of taking a printed form and digitizing it can be considered a
form of data processing.
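Two of the processing activities above can be illustrated with Python’s standard library alone. This is a minimal sketch on hypothetical customer rows: wrangling trims whitespace, normalizes email case, parses ages, and drops a duplicate; compression then serializes the cleaned rows to JSON and compresses them losslessly with `zlib`. Encryption is left out because the standard library has no encryption primitive; a vetted third-party library would be needed for that step.

```python
import json
import zlib

# Hypothetical raw rows, e.g. typed in from paper intake forms.
raw_rows = [
    {"name": "  Ada Lovelace ", "email": "ADA@Example.com", "age": "36"},
    {"name": "Ada Lovelace", "email": "ada@example.com ", "age": "36"},
    {"name": "Alan Turing", "email": "alan@example.com", "age": ""},
]


def wrangle(rows):
    """Clean raw rows: trim whitespace, normalize case, parse ages, drop duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        email = row["email"].strip().lower()
        if email in seen:  # duplicate record for the same person
            continue
        seen.add(email)
        age = row["age"].strip()
        cleaned.append({
            "name": row["name"].strip(),
            "email": email,
            "age": int(age) if age else None,  # empty string -> missing value
        })
    return cleaned


clean_rows = wrangle(raw_rows)

# Compression: serialize to JSON bytes, then compress losslessly with zlib.
payload = json.dumps(clean_rows).encode("utf-8")
compressed = zlib.compress(payload, level=9)
restored = json.loads(zlib.decompress(compressed))
```

Because `zlib` is lossless, `restored` is identical to `clean_rows`; on a dataset this small the compressed bytes may not be smaller, but savings grow with larger, repetitive data.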
4. Storage
After data has been collected and processed, it must be stored for future use.
This is most commonly achieved through the creation of databases or datasets.
These datasets may then be stored in the cloud, on servers, or using another
form of physical storage like a hard drive, CD, cassette, or floppy disk.
When determining how to best store data for your organization, it’s important to
build in a certain level of redundancy to ensure that a copy of your data will be
protected and accessible, even if the original source becomes corrupted or
compromised.
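One common way to build in redundancy is to keep a replica alongside a checksum, so a corrupted original can be detected and repaired. The sketch below assumes two local file paths stand in for separate storage locations (in practice the replica would live on a different disk, server, or cloud region) and uses SHA-256 from `hashlib` as the checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def store_with_replica(data: bytes, primary: Path, replica: Path) -> str:
    """Write data to a primary location and a replica; return its checksum."""
    primary.write_bytes(data)
    shutil.copy2(primary, replica)
    return hashlib.sha256(data).hexdigest()


def read_verified(primary: Path, replica: Path, checksum: str) -> bytes:
    """Return the primary copy if intact, otherwise restore it from the replica."""
    if primary.exists() and file_sha256(primary) == checksum:
        return primary.read_bytes()
    if file_sha256(replica) != checksum:
        raise OSError("both copies are corrupted")
    shutil.copy2(replica, primary)  # repair the primary from the good replica
    return primary.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "sales.csv"
    replica = Path(tmp) / "sales.replica.csv"
    checksum = store_with_replica(b"id,amount\n1,19.99\n", primary, replica)
    primary.write_bytes(b"id,amo\x00\x00")  # simulate corruption of the original
    recovered = read_verified(primary, replica, checksum)
```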
5. Management
6. Analysis
Data analysis refers to processes that attempt to glean meaningful insights from
raw data. Analysts and data scientists use different tools and strategies to
conduct these analyses. Some of the more commonly used methods include
statistical modeling, algorithms, artificial intelligence, data mining, and machine
learning.
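As a small example of statistical modeling, the sketch below summarizes a hypothetical monthly sales series with Python’s `statistics` module and fits an ordinary least-squares line relating sales to ad spend. The figures are invented for illustration; real analyses would typically use dedicated tools, but the underlying calculation is the same.

```python
import statistics

# Hypothetical monthly figures, in thousands: ad spend and sales.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [110, 118, 131, 140, 152, 170]

# Descriptive statistics for the sales series.
summary = {
    "mean_sales": statistics.mean(sales),
    "median_sales": statistics.median(sales),
    "stdev_sales": statistics.stdev(sales),
}

# Simple statistical model: least-squares line  sales ~ intercept + slope * ad_spend.
mx, my = statistics.mean(ad_spend), statistics.mean(sales)
sxy = sum((x - mx) * (y - my) for x, y in zip(ad_spend, sales))
sxx = sum((x - mx) ** 2 for x in ad_spend)
slope = sxy / sxx
intercept = my - slope * mx

print(summary)
print(f"sales ~ {intercept:.1f} + {slope:.2f} * ad_spend")
```

A positive slope suggests sales rise with ad spend in this sample, but a fitted line alone does not establish cause; that judgment belongs to the interpretation stage.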
7. Visualization
While technically not a required step for all data projects, data visualization has
become an increasingly important part of the data life cycle.
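Visualizations are usually built with charting tools, but the core idea, mapping values to visual lengths on a common scale, fits in a few lines. This dependency-free sketch draws a plain-text horizontal bar chart for hypothetical quarterly sales and assumes all values are positive.

```python
def text_bar_chart(data: dict[str, float], width: int = 30) -> str:
    """Render a horizontal bar chart as plain text, scaled to the largest value."""
    longest_label = max(len(label) for label in data)
    peak = max(data.values())  # assumes every value is positive
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:<{longest_label}} | {bar} {value:g}")
    return "\n".join(lines)


quarterly_sales = {"Q1": 120, "Q2": 180, "Q3": 90, "Q4": 240}  # hypothetical data
print(text_bar_chart(quarterly_sales))
```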
8. Interpretation
Finally, the interpretation phase of the data life cycle provides the opportunity to
make sense of your analysis and visualization. Beyond simply presenting the
data, this is when you investigate it through the lens of your expertise and
understanding. Your interpretation may include not only a description or
explanation of what the data shows but also, more importantly, what its
implications may be.
A data lifecycle is the series of phases that data passes through over the course
of its useful life. Data lifecycle management (DLM) governs each phase with a set
of policies that maximize the data’s value during that stage. DLM becomes
increasingly important as the volume of data incorporated into business
workstreams grows.
Phase 1: Data collection
A new data lifecycle starts with data collection, and the sources of data are
abundant. They range from web and mobile applications to internet of
things (IoT) devices, forms, surveys, and more. While data can be generated
in a variety of ways, collecting all available data isn’t necessary for the
success of your business. New data should always be evaluated for its quality
and relevance to your business before it is incorporated.
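Evaluating a new source for quality and relevance can be made concrete with two simple scores. This sketch assumes a hypothetical set of business-relevant fields and hypothetical thresholds: relevance is the share of those fields a source offers, and quality is measured here as completeness, the share of relevant values that are actually filled in. Real evaluations would weigh more dimensions, such as accuracy and timeliness.

```python
from typing import Any

# Hypothetical fields this business actually uses.
RELEVANT_FIELDS = {"customer_id", "order_total", "order_date", "region"}


def completeness(records: list[dict[str, Any]], fields: set[str]) -> float:
    """Share of (record, field) cells that are present and non-empty."""
    cells = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / cells if cells else 0.0


def evaluate_source(records, relevant=RELEVANT_FIELDS,
                    min_relevance=0.5, min_completeness=0.9):
    """Score a candidate data source; thresholds are illustrative assumptions."""
    offered = set().union(*(r.keys() for r in records))
    useful = offered & relevant
    relevance = len(useful) / len(relevant)
    quality = completeness(records, useful)
    accept = relevance >= min_relevance and quality >= min_completeness
    return {"relevance": relevance, "completeness": quality, "accept": accept}


order_export = [
    {"customer_id": 1, "order_total": 30.0, "order_date": "2024-03-01", "coupon": "SPRING"},
    {"customer_id": 2, "order_total": 12.5, "order_date": "2024-03-02", "coupon": ""},
]
sensor_feed = [{"device_id": "t-7", "temp_c": 21.4}, {"device_id": "t-8", "temp_c": 22.0}]

print(evaluate_source(order_export))  # relevant and complete
print(evaluate_source(sensor_feed))   # offers none of the relevant fields
```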
Phase 2: Data storage
Data can also differ in the way it’s structured, which has implications for the
type of data storage a company uses. Structured data tends to be stored in
relational databases, while unstructured data typically makes use of NoSQL
or non-relational databases. Once the type of storage is identified for the
dataset, the infrastructure can be evaluated for security vulnerabilities,
and the data can undergo different types of processing, such as encryption
and transformation, to safeguard the business from malicious actors. This
processing also helps ensure that sensitive data meets the requirements of
privacy regulations such as the GDPR, allowing businesses to avoid costly
fines.
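One transformation often used to protect sensitive fields is pseudonymization: replacing identifiers with tokens that cannot be reversed without a secret key. The sketch below uses keyed hashing (HMAC-SHA256 from Python’s `hmac` module) rather than encryption, and assumes a hypothetical list of sensitive fields. The same key always yields the same token, so records can still be joined. Note that pseudonymized data generally still counts as personal data under the GDPR, so this is one safeguard among several, not a compliance guarantee.

```python
import hashlib
import hmac
import secrets

# Hypothetical list of fields treated as sensitive personal data.
SENSITIVE_FIELDS = {"email", "phone"}


def pseudonymize(record: dict, key: bytes, sensitive=SENSITIVE_FIELDS) -> dict:
    """Replace sensitive values with keyed HMAC-SHA256 digests; keep other fields."""
    out = {}
    for field, value in record.items():
        if field in sensitive and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value
    return out


key = secrets.token_bytes(32)  # in practice, keep this key in a secrets manager
customer = {"customer_id": 42, "email": "ada@example.com", "phone": "555-0100", "plan": "pro"}
safe = pseudonymize(customer, key)
```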
Phase 3: Data usage
During this phase, data becomes available to business users. DLM enables
organizations to define who can use the data and the purpose for which it
can be used. Once the data is made available it can be leveraged for a range
of analyses—from basic exploratory data analysis and data visualizations to
more advanced data mining and machine learning techniques. All of these
methods play a role in business decision-making and communication to
various stakeholders.
Additionally, data usage isn’t necessarily restricted to internal use only. For
example, external service providers could use the data for purposes such as
marketing analytics and advertising. Internal uses include day-to-day
business processes and workflows, such as dashboards and presentations.
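Defining who can use data and for what purpose can be expressed as a small policy table that is checked before access. In this sketch, the roles, dataset names, and purposes (`Grant`, `POLICY`, `can_use`) are all hypothetical; access is allowed only if some grant covers the exact role, dataset, and purpose, so an external vendor approved for advertising cannot reuse the same data for forecasting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    role: str
    dataset: str
    purposes: frozenset[str]


# Hypothetical policy: which roles may use which dataset, and for what.
POLICY = [
    Grant("analyst", "sales", frozenset({"reporting", "forecasting"})),
    Grant("marketing_vendor", "sales", frozenset({"advertising"})),
]


def can_use(role: str, dataset: str, purpose: str, policy=POLICY) -> bool:
    """Allow access only if some grant covers this role, dataset, and purpose."""
    return any(
        g.role == role and g.dataset == dataset and purpose in g.purposes
        for g in policy
    )


print(can_use("analyst", "sales", "forecasting"))           # internal use
print(can_use("marketing_vendor", "sales", "forecasting"))  # purpose not granted
```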
Phase 4: Data archiving
An organization’s DLM strategy should clearly define when, where, and for
how long data should be archived. In this stage, data undergoes an archival
process that ensures redundancy.
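The “when” part of an archiving policy is often a simple age rule. This sketch assumes a hypothetical rule that records not accessed for 365 days move to the archive, and partitions a hypothetical record set into active and archivable groups; where the archive lives and how long it is kept would be separate policy values.

```python
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=365)  # hypothetical policy value


def split_for_archive(records, today: date, archive_after=ARCHIVE_AFTER):
    """Partition records into (active, to_archive) by time since last access."""
    active, to_archive = [], []
    for rec in records:
        if today - rec["last_accessed"] >= archive_after:
            to_archive.append(rec)
        else:
            active.append(rec)
    return active, to_archive


records = [
    {"id": 1, "last_accessed": date(2024, 5, 1)},
    {"id": 2, "last_accessed": date(2022, 11, 15)},
]
active, to_archive = split_for_archive(records, today=date(2024, 6, 1))
```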
Phase 5: Data deletion
In this final stage of the lifecycle, data is purged from the records and
destroyed securely. Businesses delete data they no longer need in order to
free up storage space for active data. During this phase, data is
removed from archives when it exceeds the required retention period or no
longer serves a meaningful purpose to the organization.
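Retention periods usually differ by category of data. This sketch assumes hypothetical retention rules (seven years for invoices, 90 days for web logs) and purges archived records that have exceeded them, returning the IDs it removed so the deletion can be logged. It only filters records in memory; in a real system the purge would also need to reach backups and replicas, and secure destruction of the underlying storage is a separate concern.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category.
RETENTION = {
    "invoices": timedelta(days=7 * 365),
    "web_logs": timedelta(days=90),
}


def purge_expired(archive, today: date, retention=RETENTION):
    """Return (kept, deleted_ids): drop records older than their category's retention period."""
    kept, deleted_ids = [], []
    for rec in archive:
        if today - rec["created"] > retention[rec["category"]]:
            deleted_ids.append(rec["id"])
        else:
            kept.append(rec)
    return kept, deleted_ids


archive = [
    {"id": "inv-1", "category": "invoices", "created": date(2015, 1, 10)},
    {"id": "inv-2", "category": "invoices", "created": date(2021, 3, 3)},
    {"id": "log-1", "category": "web_logs", "created": date(2024, 1, 2)},
]
kept, deleted_ids = purge_expired(archive, today=date(2024, 6, 1))
```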