Summary of the DAMA-DMBOK (Data Management Body of Knowledge)
Data characteristics:
Metadata originates from the processes related to the creation, processing and use of data,
including architecture, modeling, administration, governance, data quality management,
systems development, IT, and business operations and analytics.
Data management is cross-functional, requiring a range of skills and expertise.
It requires a business perspective, which is why data management and data governance are
intertwined.
Different types of data have different lifespans.
As an asset, data also represents risk to the organization; organizations must consider the
ethical implications of the use of data.
Effective data management requires committed executives.
The value of data is contextual: it is not equally valuable to every organization and its value is
sometimes temporary.
Data quality is essential; storing bad data wastes money.
You must ensure that data meets the needs of the institution by working with data consumers to
define their needs, including the characteristics that make data high quality.
Using data includes learning how to apply it in order to learn and create value.
Planning for better data requires a strategy in architecture, modeling and a collaboration
strategy between business and IT leaders.
The data lifecycle is related to the lifecycle of products; in addition, data has a “lineage”
(the path along which it moves from its point of origin to a point where it is used, sometimes
called the data chain).
Life cycle and lineage intersect and can be understood in relation to each other.
Data should be classified by type of data:
o transactional,
o reference,
o master data,
o metadata;
o alternatively: category, resource, event, or detailed transaction data;
o by content (data domain, subject area);
o by the format or the level of protection required;
o and by how and where it is stored or accessed.
Because different types of data have different requirements, are associated with different risks,
and play different roles, many data management tools focus on classification and control aspects.
Availability,
Reliability,
Completeness,
Accuracy,
Consistency,
Timeliness,
Usefulness,
Understandability.
Information is frequently associated with business strategy and the operational use of data;
data is associated with IT and the processes that make it accessible for use. The model relates
four quadrants:
1. Business strategy.
2. Information technology strategy.
3. Business infrastructure, organization and processes.
4. Information technology infrastructure and processes.
Information governance sits on the axis between Business and IT.
Phase 1.
The organization purchases an application that includes a database, which means it requires data
design and modeling, storage and security. To make the system work, integration and
interoperability work is required.
Phase 2.
Once the system is up and running, data quality challenges will appear. Reliable metadata and a
consistent data architecture are required to achieve higher quality; they provide clarity when
data obtained from different systems must work together.
Phase 3.
The disciplines used to manage quality, metadata and architecture require data governance, which
provides the structure for data management activities. In addition, data governance enables the
execution of strategic initiatives such as document and content management, reference data
management, master data management, data warehousing and business intelligence.
Phase 4.
The organization obtains the benefits of well-managed data and advances its analytical
capabilities.
Chapter 2. Data Handling Ethics
Describes the central role that data ethics plays in making informed, socially responsible
decisions about data and its uses. Awareness of ethics in obtaining, analyzing and using data is
something every data professional must maintain.
Chapter 3. Data Governance
Provides direction and oversight by establishing a system of decision rights over data that
accounts for the needs of the enterprise.
Data governance is defined as the exercise of authority and control (planning, monitoring and
enforcement) over the management of data assets.
To achieve the objective, the data governance program must develop policies and procedures,
generate management practices at different levels of the organization, complementing the
organization's change process. Requires a change management process.
Activities:
Identify the data needs of the enterprise, aligned with the organization's strategy, and design
and maintain the master blueprints to meet those requirements.
Results.
Activities.
Behavior.
1. Business drivers.
The goal of data architecture is to be the bridge between business strategy and technology
execution. As part of enterprise architecture, data architecture is:
The strategy prepares the organization for rapid evolution of its products and services and
provides an advantage from the business opportunities inherent in emerging technologies.
It translates business needs into data and system requirements, so that processes have the data
they require.
Facilitates alignment between business and IT.
Acts as an agent of change, transformation and agility.
3. Essential concepts.
Enterprise architecture domains.
Enterprise data model (EDM). It includes the key enterprise data entities, their relationships,
critical business rules and critical attributes. Every other data model should be based on the EDM.
Data flow design. Definition of requirements and the master map for storage and processing
through the database, applications, platform and network. Manages data on business
processes, location, roles, and technical components.
The EDM requires a significant investment, even when purchased, because it requires the definition
and documentation of the organization's vocabulary, business rules and business knowledge.
Data flow design
Activities
The work of developing enterprise data architecture specifications for the subject area in greater
detail.
Tools
Techniques
Implementation guide
Risk assessment.
Lack of management support.
No proven track record.
Apprehensive sponsor.
Culture shock.
Inexperienced project leader.
Supervised projects.
Architectural design administration.
Definition of standards.
Creation of data artifacts.
Business drivers
Essential concepts
Explains the different types of data that can be modeled, the components of data models, the types
of data models that can be developed, and the reasons for choosing different types in different
situations.
A data model describes the organization's data as the organization understands it, or as the
organization wants it to be. It is the way to document data requirements and the data definitions
that result from the modeling process. The data model is the main instrument for communicating
business data requirements to IT and, within IT, from analysts, modelers and architects to
designers and developers.
o Category information. The use of data to classify and assign types of things, e.g., products
classified by color, model, size.
o Resource information. Basic profiles of the resources needed to conduct operational processes,
such as products, customers, suppliers.
o Business event information. Data created while operational processes are in progress.
o Detailed transaction information. Frequently generated by point-of-sale systems, social media
systems and interactions on the Internet; usually referred to as big data.
Definitions of “entities”:
They are an essential contribution to the business value of any data model.
They help business and IT professionals make intelligent business and application design
decisions. Definitions should have:
o Clarity.
o Accuracy.
o Completeness, integrity.
Relationship
Relationship aliases
Relationship cardinality
In a relationship between two entities, cardinality captures how many instances of one entity
participate in the relationship with instances of the other entity.
Arity of relationships
Attribute
o Construction keys
A surrogate key is a unique identifier for a table; it serves a technical function and should not
be visible to end users.
A business key is one or more attributes that a business professional would use to identify a
unique entity instance.
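To make the distinction concrete, here is a minimal sketch in Python using the standard sqlite3 module: the table gets an auto-generated surrogate key for technical use and a unique business key that business users would recognize. The table and column names (customer, customer_id, tax_id) are hypothetical examples, not something prescribed by the DMBOK.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, hidden from end users
        tax_id      TEXT NOT NULL UNIQUE,               -- business (natural) key
        name        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customer (tax_id, name) VALUES (?, ?)", ("X-123", "Acme Corp"))
row = conn.execute("SELECT customer_id, tax_id, name FROM customer").fetchone()
print(row)  # e.g. (1, 'X-123', 'Acme Corp')
conn.close()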
Domain
In data modeling, a domain is the complete set of possible values that can be assigned to an
attribute. It can be articulated in different ways. A domain provides a means of standardizing the
characteristics of attributes.
A domain can be restricted with additional rules, called constraints. Rules can relate to format,
logic, or both.
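A minimal sketch of domain constraints, assuming a hypothetical order_status domain (a logic rule based on a value list) and a hypothetical postal-code format rule:

import re

# Logic constraint: a value list that defines the domain.
ORDER_STATUS_DOMAIN = {"NEW", "SHIPPED", "CANCELLED"}
# Format constraint: a pattern the value must follow.
POSTAL_CODE_FORMAT = re.compile(r"^\d{5}$")

def in_domain(value: str) -> bool:
    """Check a value against the order_status domain (logic constraint)."""
    return value in ORDER_STATUS_DOMAIN

def valid_postal_code(value: str) -> bool:
    """Check a value against the format constraint."""
    return bool(POSTAL_CODE_FORMAT.match(value))

print(in_domain("SHIPPED"), in_domain("LOST"))               # True False
print(valid_postal_code("06600"), valid_postal_code("ABC"))  # True False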
Relational
Relational theory provides a systematic way to organize data that reflects its meaning. This
approach has the additional effect of reducing redundancy in data storage.
Dimensional
In dimensional models, data is structured to optimize the query and analysis of a large amount of
data.
Fact tables
In a dimensional schema, the rows of a fact table contain the measurements of a particular
process, and they are numeric, such as amounts or quantities. Some measurements are the results of
algorithms, in which case metadata is critical to understanding and using them.
Dimension table
They represent the important objects of the business and contain mainly textual descriptions.
Snowflake
It is the term used for normalizing the flat, single-table dimensional structure of a star schema
into the respective component hierarchical or network structures.
Grain
The term grain stands for the meaning or description of a single row of data in a fact table.
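The following sketch, in Python with the standard sqlite3 module, ties these ideas together: a fact table holding only numeric measures at the grain of one row per product per day, joined to dimension tables that carry the descriptive attributes. All table and column names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE fact_sales  (                    -- grain: one row per product per day
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,                      -- numeric measures only
        amount      REAL
    );
    INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01');
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
    INSERT INTO fact_sales VALUES (20240101, 1, 3, 29.97);
""")
# Typical dimensional query: slice the numeric measures by dimension attributes.
for row in conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY p.category, d.month"""):
    print(row)
conn.close()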
Dimensional modeling.
Object orientation.
The Unified Modeling Language (UML) is a graphical language for modeling software.
Fact-based modeling. This family of notations is based on the analysis of natural-language
verbalizations of facts that occur in the business domain.
Diagrams.
Definitions.
Issues and outstanding questions.
Lineage.
Build the data model:
1. Forward engineering
Tools
Better practices
1. Best practices in naming conventions.
2. Best practices in database design.
It includes the design, implementation and support of stored data to maximize its value.
Operations provide support throughout the data lifecycle, from planning for to the disposal of data.
Two subdivisions:
Companies rely on their information systems to run their operations. Business continuity is the
first driver.
Essential concepts
1. Database terms.
2. Data lifecycle management.
3. Database administrator.
Production DBA.
Application DBA.
Procedural and development DBA.
NSA (Network Storage Administrator).
4. Types of database architecture:
DBAs, in coordination with network and system administrators, need to take a systematic,
integrated approach that includes standardization, consolidation, virtualization and automation of
data backup and recovery functions, as well as the security of those functions.
5. Types of database processing
ACID (see the transaction sketch after these definitions).
o Atomicity. The operations are carried out as a single unit, so that if one part of the
transaction fails, the entire transaction fails.
o Consistency. The transaction must follow all the rules defined by the system, every time.
o Isolation. Each transaction is independent of the others.
o Durability. Once completed, the transaction cannot be undone.
BASE
o Basically available. The system guarantees some level of data availability even when nodes
fail.
o Soft state. The data is in a constant state of flux; while a response may be given, the data is
not guaranteed to be current.
o Eventually consistent. Data will eventually be consistent across all nodes, but not
all transactions will be consistent at all times.
CAP (the CAP theorem)
o Consistency. The system must operate as designed and expected at all times.
o Availability. The system must be available when required and must respond to every request.
o Partition tolerance. The system must be able to continue operating even in the event of partial
failures.
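As a concrete illustration of ACID behavior, here is a minimal sketch using Python's standard sqlite3 module: a CHECK constraint plays the role of a consistency rule, and the transaction context manager makes the two updates atomic, so a failed transfer leaves no partial changes. The account table and amounts are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100.0), ("bob", 0.0)])
conn.commit()

def transfer(amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,))
    except sqlite3.IntegrityError:
        print("transfer rolled back: the consistency rule (balance >= 0) was violated")

transfer(150.0)  # the credit to bob succeeds, the debit to alice fails, so both are undone
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('alice', 100.0), ('bob', 0.0)] -- atomicity: no partial transaction survives
conn.close()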
6. Data storage media
The most common are disk storage and storage area networks (SAN).
9. Database specialization.
10. Common database process.
Activities
It must follow the same principles and administration standards of any methodology.
Data professionals must first identify companies' characteristics before determining what
recommendation to make.
Most companies have a range of database tools installed that perform a range of data management
functions, but only a few of those tools are mandated standards.
Selecting the strategic DBMS (Database Management System) software is very important. A DBMS is a
set of programs that allows the storage, modification and extraction of information in a database,
in addition to providing tools to add, delete, modify and analyze the data; users can access the
information using specific query and reporting tools, or through applications built for that
purpose. The DBMS has a major impact on integration, application performance and business
productivity. Some factors to consider:
Database administration
Understand requirements.
o Define storage requirements.
o Identify usage patterns.
o Define access requirements.
o Business continuity planning.
o Generate backups.
o Data recovery.
o Manage access controls.
o Create data containers.
o Implement physical data models.
o Load data (several of these activities are sketched after this list).
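A minimal sketch of a few of these activities (creating a data container, implementing a physical model, loading data, and generating a backup) using Python's standard sqlite3 module; real environments would use a full DBMS and dedicated backup tooling, and the table and connection targets here are hypothetical.

import sqlite3

source = sqlite3.connect(":memory:")                                         # create the data container
source.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")    # implement the physical model
source.executemany("INSERT INTO product VALUES (?, ?)",                      # load data
                   [("A-1", 9.99), ("A-2", 19.99)])
source.commit()

backup = sqlite3.connect(":memory:")   # in practice, a file on separate media
source.backup(backup)                  # generate the backup

# Data recovery: read from the backup copy as if restoring.
print(backup.execute("SELECT COUNT(*) FROM product").fetchone()[0])  # 2
source.close()
backup.close()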
Tools
Implementation guide
Metrics
Information movement
Compliance with licensing agreements and regulatory requirements.
Ensures data privacy, confidentiality is maintained, data is not breached and access to data is
appropriate.
Business drivers
Information security begins with the classification of the organization's data to identify which data
requires protection. The following steps are considered:
2. Business growth.
The growth of e-commerce has changed the way products and services are offered. Reliable
e-commerce leads to profits and growth.
Secure and robust information management enables transactions and builds trust with the client.
3. Security as an asset.
One approach to managing data sensitivity is metadata. Developing a master repository of data
characteristics means that the entire company can know precisely what level of protection
sensitive information requires.
1. Goals
o Enable appropriate access and prevent inappropriate access to company data.
o Enable compliance with regulations and privacy, protection and confidentiality policies.
o Ensure that user requirements for privacy and reliability are met.
2. Principles
o Collaboration.
o Business focus.
o Proactive administration.
o Clear responsibilities.
o Metadata-driven.
o Risk reduction by reducing exposure.
Essential concepts
Can be prioritized:
1. Risk classification
o Critical risk (CRD).
o High risk (HRD).
o Moderate risk (MRD).
2. Organizational data security
Data stewards must be active with technology developers and cybersecurity professionals.
3. Security processes
o The four A's: Access, Audit, Authentication, Authorization (plus Entitlement).
o Monitoring.
4. Data integrity
5. Encryption. Hash, private key, public key.
6. Masking. (Hashing and masking are sketched after this list.)
7. Network security terms. Backdoor, zombie, cookie, firewall, perimeter, DMZ, super-user
account, logger, penetration test, private network, types of data security, security device,
security credentials, identity management system
8. Types of data security restrictions.
9. Security risk system.
10. Social threats to security. Identity fraud.
11. Malware.
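As referenced above, here is a minimal sketch of two of the protection techniques from the list, one-way hashing and static masking, using Python's standard hashlib and hmac modules. The field values, secret key handling and masking rule are hypothetical; production systems would rely on vetted cryptographic and tokenization services.

import hashlib
import hmac

SECRET_KEY = b"example-key-from-a-secure-vault"   # assumption: kept in a key vault, never in code

def hash_identifier(value: str) -> str:
    """Keyed one-way hash (HMAC-SHA256) so the raw identifier is never stored."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_card_number(card_number: str) -> str:
    """Static masking: keep only the last four digits visible."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

print(hash_identifier("CUST-ABC123"))
print(mask_card_number("4111111111111111"))  # ************1111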
Activities
Tools
1. Antivirus software.
2. HTTPS.
3. Identity management technology.
4. Intrusion detection and prevention software.
5. Firewalls.
6. Metadata tracking.
7. Encryption.
Techniques
Implementation guide
1. Evaluation Preparation.
2. Organization and culture of change.
3. Visibility in data use rights.
4. Data security in an outsourced world.
5. Data security in a cloud world.
It includes the processes related to the movement and consolidation of data within and between
data stores, applications and organizations.
Business drivers
The trend of organizations purchasing applications rather than developing them expands the need
for data integration and interoperability.
Data hubs such as data warehouses and master data hubs help by consolidating the data needed by
many applications and providing those applications with consistent views of the data.
Have data available, lower data warehouse costs, lower costs and complexity in managing
solutions, make sense of events/opportunities automatically, support business intelligence,
analytics, master data management, and operational efficiency efforts.
Essential concepts
Tools
Implementation guide
1. Risk evaluation.
2. Organizational and cultural change.
DII governance
It includes the planning, implementation and control of activities used to manage the lifecycle of
data and information found in a range of unstructured formats, especially documents needed for
legal and regulatory compliance support.
Business drivers
Essential concepts
1. Content.
o Content management.
o Content metadata.
o Content modeling.
o Content delivery methods.
o Controlled vocabularies.
2. Content and records
3. Information architecture
It is the process of creating structure for a body of information or content.
4. Search Engine
Activities
Tools
Implementation guides
1. Risk evaluation.
2. Organizational and cultural change.
Governance
Includes the ongoing reconciliation and maintenance of critical shared data, enabling systems to
provide the most accurate, timely and relevant version of the truth about business entities.
Business drivers
Find organizational data requirements, manage data quality, data integration cost management,
risk reduction.
o Ensure that the organization has complete, consistent, current and authoritative master and
reference data across organizational processes.
o Enable master and reference data to be shared across business functions and applications.
o Reduce the cost and reduce the complexity of data use and integration through standards,
common data models and integration patterns.
Essential concepts
1. Differences between master data and reference data.
Master data management (MDM). It entails control over master data values and identifiers that
enables the consistent use, across systems, of the most accurate and timely data about essential
business entities.
The goals of MDM include ensuring the availability of accurate and current data while reducing
the risks associated with ambiguous identifiers (identifiers that match more than one entity
instance and references that point to more than one entity).
Reference data management (RDM). It entails control over the values of defined domains and their
definitions. The goal of RDM is to ensure that the organization has access to the complete set of
accurate and current values for each concept represented.
2. Reference Data
It is any data used to characterize or classify other data, or to relate data to information
external to the organization. The most basic reference data consists of codes and descriptions,
but some reference data can be more complex, incorporating mappings and hierarchies.
Reference data management entails the control and maintenance of defined domain values, their
definitions, and the relationships within and across domain values. The goal of reference value
management is to ensure that values are consistent and current across functions and that the data
is accessible to the organization.
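A minimal sketch of what reference data looks like in practice: a code list with descriptions plus a cross-reference mapping between two systems' country codes. The codes and system names are hypothetical.

# Basic reference data: code -> description.
country_codes = {
    "MX": "Mexico",
    "US": "United States",
    "CA": "Canada",
}
# Mapping reference data between a hypothetical ERP coding scheme and the standard codes.
erp_to_iso = {
    "MEX": "MX",
    "USA": "US",
    "CAN": "CA",
}

def describe_erp_country(erp_code: str) -> str:
    """Resolve an ERP-specific code to the standard code and description via the mapping."""
    iso = erp_to_iso[erp_code]
    return f"{iso} - {country_codes[iso]}"

print(describe_erp_country("MEX"))  # MX - Mexico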
3. Master data
It is data about business entities (employees, customers, products, financial structures, assets)
that provides context for transactions and analysis.
Identify candidate sources that can provide a comprehensive view of the master data
entities.
Develop rules for accurately matching and merging data (see the sketch after this list).
Establish an approach for reliably distributing the data to systems across the enterprise.
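The matching-and-merging step mentioned above can be sketched as follows: two candidate customer records from hypothetical source systems are compared on name similarity and, if they match, merged into a single golden record. Real MDM tools apply far richer matching and survivorship rules; this uses only Python's standard difflib for illustration.

from difflib import SequenceMatcher

crm_record = {"source": "CRM", "name": "ACME Corporation", "phone": None}
erp_record = {"source": "ERP", "name": "Acme Corp.", "phone": "+52 55 0000 0000"}

def similar(a: str, b: str) -> float:
    """Simple fuzzy similarity between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

if similar(crm_record["name"], erp_record["name"]) > 0.6:
    golden = {
        "name": crm_record["name"],                           # survivorship rule: prefer the CRM name
        "phone": crm_record["phone"] or erp_record["phone"],  # fill gaps from the other source
        "sources": [crm_record["source"], erp_record["source"]],
    }
    print(golden)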
1. MDM Activities
Define MDM drivers and requirements.
Each organization has different MDM drivers and obstacles, influenced by the number and type of
its systems, their age, the business processes they support, and how data is used for transactions
and analysis.
Drivers often include opportunities to improve customer service and operational efficiency, as
well as to reduce risks related to privacy and compliance. Obstacles include differences in data
meaning and structure between systems.
It is relatively easy to define master data requirements within a single application. It is more
difficult to define standard requirements across applications.
MDM requires specially designed tools to enable entity management. MDM can be implemented through
data integration tools, data remediation tools, operational data stores (ODS), shared data hubs,
or specialized MDM applications.
Implementation guide
These solutions cannot be implemented overnight; they require specialized business and technical
knowledge. Organizations should expect to implement master and reference data solutions
incrementally through a series of projects defined in a roadmap.
It includes the planning, implementation and control of processes that support decision making and
enable knowledge workers to obtain value from data through analysis and reporting.
Business drivers
The main thing is to support operational functions, compliance with requirements and business
intelligence activities.
1. Business Intelligence.
It has two meanings. The first is a type of data analysis aimed at understanding the activities
and opportunities of the organization; the results of such analysis are used to improve the
organization's success. When people say that data holds the key to competitive advantage, they are
articulating the promise inherent in business intelligence activity: that if an organization asks
the right questions of its own data, it can gain insights into its products, services and
customers that enable it to make better decisions about how to meet its strategic objectives.
Second, business intelligence refers to the set of technologies that support this kind of data
analysis, an evolution of decision support tools. BI tools enable querying, data mining,
statistical analysis, reporting, scenario modeling, data visualization and dashboards. They are
used for everything from budgeting to advanced analytics.
2. Data warehouse
3. Data warehousing
It is the operation of the extraction, cleansing, transformation, control and loading processes
that maintain the data in a data warehouse.
7. DW architecture components
Activities
1. Understanding of requirements.
2. Define and maintain the DW/BI architecture.
3. Development of the data warehouse and data marts.
4. Data warehouse population.
5. Implementation of the business intelligence portfolio.
6. Maintenance of data products.
Tools
1. Metadata repository.
2. Data integration tools.
3. Types of business intelligence tools.
Techniques
2. BI self-service
Visualization and statistical analysis tools allow you to quickly explore and discover data.
Implementation guide
1. Risk assessment.
2. Map release.
3. Configuration management.
4. Organization and cultural change.
Governance
1. Business acceptance.
2. User satisfaction.
3. Service level agreements.
4. Reporting strategy.
5. Metrics.
It includes the planning, implementation and control of activities that enable access to
high-quality, integrated metadata, including definitions, models, data flows and other information
critical to understanding data and the systems through which it is created, maintained and
accessed.
Metadata is data about data, a definition that is deceptively broad in scope. It includes technical
and business process information, data rules and constraints, and logical and physical data
structures. It describes the data itself, the concepts the data represents (business processes,
application systems), and the connections between the data and the concepts. Metadata helps an
organization understand its data, its systems and its information flows.
Business drivers
Data cannot be managed without metadata. Moreover, metadata itself must be managed; reliable,
well-managed metadata helps manage the data.
Goals include:
Organizational commitment.
Strategy.
Business perspective.
Socialization.
Access.
Quality.
Audit.
Improvement.
Essential concepts
1. Metadata vs. data
The organization should not worry about the philosophical distinction; it should focus on its
requirements: what it needs metadata for (creating new data, understanding existing data, enabling
movement between systems, accessing data, sharing data) and on the metadata sources needed to meet
those requirements.
2. Types of metadata
Business metadata. Focused largely on the content and condition of the data; it includes details
related to data governance. It covers the non-technical side, such as names and definitions of
concepts, subject areas, entities and attributes, ranges, descriptions, calculations, algorithms
and business rules, and valid domain values and their definitions.
Technical metadata. Provides technical details of the data, of the systems that store it, and of
the processes that move it between systems.
Operational metadata. Describes details of data processing and access.
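A minimal sketch of how a single data element might carry the three kinds of metadata just described, expressed as a Python dataclass. The element, its physical location and its load details are hypothetical.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ElementMetadata:
    # Business metadata: non-technical context and rules.
    business_name: str
    definition: str
    valid_values: list = field(default_factory=list)
    # Technical metadata: where and how the data physically lives.
    table: str = ""
    column: str = ""
    data_type: str = ""
    # Operational metadata: details of processing and access.
    last_loaded: Optional[datetime] = None
    load_job: str = ""

customer_status = ElementMetadata(
    business_name="Customer Status",
    definition="Commercial relationship state of a customer",
    valid_values=["ACTIVE", "INACTIVE"],
    table="dw.customer", column="status_cd", data_type="CHAR(8)",
    last_loaded=datetime(2024, 1, 1, 2, 30), load_job="nightly_customer_load",
)
print(customer_status.business_name, "->", customer_status.table + "." + customer_status.column)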
3. ISO/IEC 11179 Metadata registration standard
Provides a framework for defining a metadata registry. It is designed to enable metadata-driven
data exchange, starting with the data elements.
4. Metadata for unstructured data
By its nature, all data has some structure; however, not all data is formally structured in rows
and columns or stored in a relational database. Any data that is not in a database or data file,
including documents and other media, is considered unstructured data.
Metadata for unstructured data includes descriptive metadata such as catalog information and
keywords; structural metadata such as tags, field structures and formats; and administrative
metadata such as sources, update schedules, access rights and navigation information.
5. Metadata sources
Metadata is collected from many different sources; it is reliable as long as those sources are
well managed. Most organizations do not manage metadata well, because metadata is often created as
a by-product of application processing rather than as an end product.
To give an idea of the breadth of metadata in an organization, the range of sources is outlined
here in alphabetical order:
The strategy includes the definition of the future state of the company's metadata architecture
and the implementation phases required to meet the strategic objectives:
Tools
Business focus. Limits lineage discovery to prioritizing data elements for the business.
Technical approach. Start with the source system and identify all immediate consumers.
2. Metadata for big data ingestion
Implementation guide
1. Risk assessment.
2. Organizational and cultural change.
Metadata governance
1. Control processes.
2. Documentation of metadata solutions.
3. Metadata standard and guides.
4. Metrics.
Effective data management involves a complex set of interrelated processes that enable an
organization to use its data to achieve strategic goals. Data management includes the ability to
design data for applications, store and access it securely, share it appropriately, learn from it,
and ensure it meets business needs. An underlying premise is that the value of data depends on the
reliability and trustworthiness of the information, in other words on "high quality."
All management disciplines contribute to data quality, and high data quality supports the
organization in achieving the goal of the data management disciplines.
Business drivers
Criticality.
Life cycle management.
Prevention.
Root cause remediation.
Governance.
Standards-driven.
Objective and transparent measurement.
Embedded in business processes.
Systematic effort.
Connection with service levels.
Essential concepts
1. Data quality.
2. Critical data.
3. Data quality dimensions.
The term dimension is used by analogy with the dimensions used in the measurement of physical
objects.
In order to measure data quality, the organization needs to establish characteristics that are
both important to business processes and measurable.
The intent of ISO 8000 is to help organizations define what is and is not quality data, enabling
them to request quality data using standard conventions and to verify that they have received
quality data using those same standards. When the standards are followed, conformance to
requirements can be checked by a computer program.
The cycle begins by identifying data that does not meet data consumers' requirements and data
issues that are obstacles to the strategic objectives of the business. The data then needs to be
assessed against key quality dimensions and business requirements.
The Plan stage assesses the scope, impact and priority of known issues and evaluates alternatives
to address them.
In the Do stage, the root causes of the issues identified in the Plan stage are addressed.
The cost of getting data right the first time is lower than the cost of using bad data and fixing
it later.
Conformance to definitions.
Presence and completeness of values.
Format conformance.
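A minimal sketch of measuring two of these aspects, completeness (presence of values) and format conformance, over a small set of hypothetical customer records; the format rules are illustrative assumptions, not DMBOK-defined rules.

import re

records = [
    {"customer_id": "C1", "email": "ana@example.com", "postal_code": "06600"},
    {"customer_id": "C2", "email": "", "postal_code": "123"},
    {"customer_id": "C3", "email": "luis@example.com", "postal_code": "01210"},
]
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
POSTAL_FORMAT = re.compile(r"^\d{5}$")

total = len(records)
email_present = sum(1 for r in records if r["email"])                      # completeness
email_conforms = sum(1 for r in records if EMAIL_FORMAT.match(r["email"]))  # format conformance
postal_conforms = sum(1 for r in records if POSTAL_FORMAT.match(r["postal_code"]))

print(f"email completeness:       {email_present / total:.0%}")   # 67%
print(f"email format conformance: {email_conforms / total:.0%}")  # 67%
print(f"postal code conformance:  {postal_conforms / total:.0%}") # 67%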
8. Common causes of data quality problems:
Problems caused by lack of leadership.
Problems caused by data loading process.
Data profiling uses statistical techniques to discover the real structure, content and quality of
data collections.
While the focus should be on error prevention, data quality can also be improved through certain
forms of data processing (standardization is sketched after this list):
Data cleansing.
Data enhancement.
Data parsing and formatting.
Data transformation and standardization.
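As referenced above, here is a minimal sketch of parsing, formatting and standardization: raw phone numbers in inconsistent formats are cleansed into one standard representation. The target format and default country code are hypothetical assumptions.

import re

raw_phones = ["55-1234-5678", "(55) 1234 5678", "+52 5512345678"]

def standardize_phone(raw: str, default_country: str = "52") -> str:
    digits = re.sub(r"\D", "", raw)      # parsing: strip everything but digits
    if len(digits) == 10:                # transformation: add a missing country code
        digits = default_country + digits
    return f"+{digits[:2]} {digits[2:4]} {digits[4:8]} {digits[8:]}"  # formatting

for raw in raw_phones:
    print(raw, "->", standardize_phone(raw))
# every variant becomes: +52 55 1234 5678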
Activities
Tools
Techniques
1. Preventive actions.
2. Corrective actions.
3. Quality check and audit code modules.
4. Effective data quality metrics.
5. Statistical process control.
6. Root cause analysis.
Implementation guide
1. Risk evaluation.
2. Cultural and organizational change.
It describes the technologies and business processes that emerge as our ability to collect and
analyze large and diverse data sets grows.
Big data refers not only to the volume of data, but also to its variety (structured and
unstructured data, documents, files, audio, video and streaming data) and to the speed at which it
is produced. The people who develop predictive and machine learning models and analytics, and who
deploy the results for stakeholders, are called data scientists.
Data scientists integrate methods from mathematics, statistics, computer science, signal
processing, probabilistic modeling, pattern recognition, machine learning, uncertainty modeling
and data visualization in order to gain insight and predict behavior from big data sets. Data
scientists have found new ways to analyze data and derive value from it.
Business drivers
The desire to find and act on business opportunities that can be discovered in the data sets
generated through a diversified range of processes.
Essential concepts
1. Data science
It combines data mining, statistical analysis and machine learning with data integration and data
modeling capabilities to build predictive models that explore the patterns contained in the data.
Developing predictive models is sometimes called data science because the data analyst or data
scientist uses the scientific method to develop and evaluate them.
3. Big data
Volume. Big data sets generally contain thousands of entities or elements with millions of records.
Velocity. The speed at which data is captured, generated and shared; it can be generated and
analyzed in real time.
Variety/variability. Refers to the forms in which the data is captured and delivered. Big data
requires storage in multiple formats; the structure is often inconsistent across or within data
sets.
Viscosity. How difficult the data is to use or integrate.
Volatility. How often the data changes and, therefore, how long the data remains useful.
Veracity. How reliable the data is.
4. Big data architecture components
The selection, installation and configuration of a big data and data science environment requires
specialized expertise. End-to-end architectures must be developed and rationalized against
existing data exploration tools and new acquisitions.
5. Sources of big data
Big data is the product of email, social media, online orders and even online video games. Data is
generated not only by phones and point-of-sale devices, but also by surveillance systems, sensors
in transportation systems, medical monitoring systems, industrial and utility monitoring systems,
satellites and military equipment.
6. Data lake
It is an environment where a vast amount of data of various types and structures can be ingested,
stored, assessed and analyzed. Data lakes can serve several purposes:
8. Machine learning
Machine learning explores the construction and study of learning algorithms. It can be viewed as
the union of unsupervised learning methods, more commonly called data mining, and supervised
learning methods deeply rooted in mathematical theory, specifically statistics, combinatorics and
optimization. A third branch, reinforcement learning, is now emerging, in which goal performance
is rewarded but not specifically taught by a teacher.
Machine learning explores the construction and study of learning algorithms, which fall into three
types:
Supervised learning.
Unsupervised learning.
Reinforcement learning.
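A minimal sketch contrasting the first two types, assuming scikit-learn is installed; reinforcement learning is omitted because it requires an environment that rewards actions. The data values are made up for illustration.

from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: labeled examples (X, y) train a model to predict y.
X = [[1], [2], [3], [4]]
y = [10, 20, 30, 40]
model = LinearRegression().fit(X, y)
print("supervised prediction for x=5:", model.predict([[5]])[0])   # ~50

# Unsupervised learning: no labels; the algorithm finds structure (clusters) on its own.
points = [[1, 1], [1, 2], [10, 10], [10, 11]]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print("unsupervised cluster labels:", clusters)                    # e.g. [0 0 1 1]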
9. Sentiment analysis
It is used to understand what people say and feel about brands, products or services. Using
natural language processing (NLP) on phrases or sentences, sentiment analysis can detect feelings,
reveal changes in feeling and predict possible scenarios (e.g., IBM Watson).
10. Data and text mining
Data mining is a type of analysis that reveals patterns in data using various algorithms. Text
mining analyzes documents with text analysis and data mining techniques to automatically classify
content into ontologies within a guided workflow.
Profiling.
Data reduction.
Association.
Clustering.
Self-organizing maps.
11. Predictive analytics.
It is a subfield of supervised learning in which users attempt to model data elements and predict
future outcomes through the evaluation of probability estimates.
12. Unstructured data analytics.
It combines text mining, association, clustering and other unsupervised learning techniques to
codify large data sets.
13. Data visualization.
It is the process of interpreting concepts, ideas and facts through the use of pictures or
graphical representations.
The big data strategy will drive the scope and timing of the organization's big data capability
roadmap.
Review the available data sources and the processes that create those sources, and manage the plan
for new sources:
Fundamental data.
Granularity.
Consistency.
Reliability.
Inspection/profile new sources.
3. Acquire and ingest data sources.
Once sources are identified, they need to be found, sometimes purchased, and ingested (Loaded)
into the big data environment.
Tools
Advances in technology have created the big data and data science industry. To understand the
industry, one must understand its enablers. The tools and techniques that allowed big data science
to emerge are explained below.
1. MPP shared-nothing technologies and architectures.
Massively parallel processing (MPP) shared-nothing database technology has become the standard
platform for data-science-oriented analysis of big data sets.
2. Distributed file-based databases.
Distributed file-based technologies, such as open source Hadoop, are an inexpensive way to store a
large amount of data in different formats. They are ideal for secure data storage, but present
challenges when structured data or SQL-style data typing is required. The language used in
file-based solutions, MapReduce, has three main steps:
Map.
Shuffle.
Reduce.
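A minimal single-process sketch of the Map, Shuffle and Reduce steps using the classic word-count example; a real framework distributes each step across many nodes, and the input documents here are made up.

from collections import defaultdict
from itertools import chain

documents = ["big data big value", "data value"]

# Map: emit (key, value) pairs from each input record.
mapped = chain.from_iterable(((word, 1) for word in doc.split()) for doc in documents)

# Shuffle: group all values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate the values of each key.
reduced = {key: sum(values) for key, values in groups.items()}
print(reduced)   # {'big': 2, 'data': 2, 'value': 2}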
3. Database algorithms.
4. Cloud database solutions.
5. Computational statistics and graphical languages.
6. Data visualization tools.
Techniques
1. Analytical modeling.
Descriptive modeling.
Explanatory modeling.
2. Big data modeling.
Implementation guide
1. Strategic alignment.
Every big data/data science program must be strategically aligned with the organization's
objectives. Establishing a big data strategy drives related strategies for data security, metadata
management (including lineage) and data quality management.
The strategy should document governance goals and principles and balance big data ambitions
against the organization's skills and capability requirements.
The strategy deliverables should address:
Life cycle information.
Metadata.
Data quality.
Data acquisition.
Data access and security.
Data governance.
Data privacy.
Learning and adoption.
Operations.
2. Risk evaluation.
Business relevance.
Business preparation.
Economic viability.
Prototype.
3. Organization and culture of change.
Governance
Sourcing, source analysis, ingestion, enrichment and publication require both business and
technical controls and procedures.
Maturity models define maturity in terms of a progression of levels that describe process
characteristics. When an organization understands these process characteristics, it can assess its
maturity level and put a plan in place to improve its capabilities. It can also measure
improvement and compare itself with other organizations, guided by the levels of the model.
Before beginning any DMMA, the organization must establish a baseline understanding of its
current capabilities, assets, goals and priorities. A certain level of maturity in the organization is
required to conduct the evaluation initially, as well as respond effectively to the expected results
by establishing objectives.
Business drivers
Regulatory.
Data governance.
Organizational preparation for implementation processes.
Organizational change.
New technology.
Problems in data management.
Goals and principles
The first goal is to evaluate the current state of critical data management activities in order to
plan for improvement. The evaluation places the organization on a maturity scale by identifying
its strengths and weaknesses, and helps it identify, prioritize and implement improvement
opportunities.
DMMA helps the organization clarify priorities, crystallize objectives and develop a comprehensive
plan to implement.
Essential concepts
1. Evaluation of levels and characteristics.
Level 0: Absence of capabilities.
Level 1. Initial. Success depends on the competence of individuals.
Level 2. Repeatable. Minimal project discipline is in place.
The roles are defined and the processes depend on a specific expert.
There is awareness in the organization about problems in the quality of data and concepts.
The concepts of master data are beginning to be known.
The evaluation criteria may include the presence of some control processes, such as a log of
data quality issues.
Level 3. Defined. Standards are in place and in use. Roles are defined.
Level 3 sees the introduction and institutionalization of scalable data management processes and a
view of DM as an organizational enabler. Characteristics include data replication across the
organization with some controls in place and a general increase in data quality, along with
coordinated policy definition and management.
Evaluation criteria may include the existence of data management policies, the use of scalable
processes, and the consistency of data models and system controls.
Level 4. Managed. Processes are quantified and controlled. The institutionalization of the
knowledge gained at levels 1 to 3 allows the organization to predict results when approaching new
projects and tasks and to begin managing data-related risks; metrics are included. Characteristics
include the standardization of data management tools from the desktop to the infrastructure,
coupled with a well-formed centralized planning and governance function.
Evaluation criteria may include metrics related to project success, system operation metrics, and
data quality metrics.
Level 5. Optimized. Process improvement goals are quantified. When data management practices are
optimized, they are highly predictable thanks to automated processes and change management
technologies.
Evaluation criteria may include change management artifacts and metrics on process improvement.
2. Evaluation criteria.
At each level, the evaluation criteria are rated on a scale such as: 1 = Not started, 2 = In
process, 3 = Functional, 4 = Effective.
Categories in the conceptual diagram:
Activity.
Tool.
Standards.
People and resources
The CMMI (Capability Maturity Model Integration) Institute has developed a data management
maturity model with the following criteria:
Activities
1. Define objectives.
Choose a frame of reference.
Define the scope of the assessment within the organization.
Define the interaction approach.
Communication plan.
2. Evaluate maturity.
Gather information.
Perform the evaluation.
3. Interpret results.
Evaluation results report.
Development of execution instructions.
4. Create a program of objectives to implement.
Identify actions and create a roadmap.
5. Re-evaluate maturity.
Tools
Techniques
Guide to DMMA
1. Risk evaluation
2. Organization and cultural change.
Provides best practices and considerations for organizing data management teams and enabling
successful data management practices.
It describes the set of principles that should be considered when putting data management and
data governance together in an organization.
Awareness, ownership and responsibility are the keys to activating and engaging people in data,
policy and process management initiatives.
Once you take a snapshot of the current state, evaluate the level of satisfaction with it, the
organization's data management needs, and their prioritization.
The data management organization must align with the organization's hierarchy and its resources.
Finding the right people requires an understanding of both the functional and the political role
of data management in the organization. The objective should be to assemble cross-functional
staff:
Identify the employees who are currently performing data management activities; recognize and
involve them first. Hire additional resources, such as data management and governance
administrators, only if required.
Examine the methods the organization uses to manage data and determine how the processes can be
improved. Determine how much change is likely to be required to implement data management
practices.
Draw a roadmap of the types of organizational change needed to meet the requirements.
It tries to determine how decisions are made, as well as how they are implemented.
The answer will provide a starting point to understand the location of the organization.
Most organizations start with a decentralized model before moving to a formal organizational
model. As the organization sees the impact on data quality, it can begin to formalize
accountability with a RACI data management matrix and move toward a network operating model.
Consider the following recommendations:
1. Executive sponsor.
2. Clear vision.
3. Proactive change management.
4. Alignment with leadership.
5. Communication.
6. Commitment to users.
7. Guidance and training.
8. Measurement adoption.
9. Adherence to guiding principles.
10. Evolution not revolution.
Once the model is established and the participants are identified, it is time to move people into
the new authorized roles.
Data governance
It is the organizational framework for establishing the strategy, objectives and policies for
effective corporate data management. It consists of the processes, policies, organization and
technologies required to manage and ensure the availability, usability, integrity, consistency,
auditability and security of data.
Enterprise architecture
Technological architecture.
Application architecture.
Data architecture.
Business architecture.
Manage a global organization
Adhere to standards.
Process synchronization.
Aligned accountability.
Training and communication.
Effective monitoring and measurement.
Development of economies of scale.
Reduction of duplication of effort.
1. Organizational roles.
2. Individual roles.
Executives.
Business.
IT.
Hybrids.
Describes how to plan through the cultural changes that are necessary to introduce data
management practices into the organization.
Laws of change
1. Communication of principles.
2. Audience evaluation and preparation.
3. The human element.
4. Communication plan.
5. Keep in contact.