
Chapter 03 - Meta Data Management - 04

Chapter Three discusses data concepts essential for data management and curation, emphasizing the role of metadata in organizing, managing, and understanding data across various contexts. It outlines the importance of metadata management for ensuring data quality, usability, and compliance, while also detailing types of metadata such as descriptive, structural, and administrative. The chapter highlights best practices, standards, and the significance of effective communication in data management to enhance data sharing and accessibility.

Uploaded by

dine mohammed

Chapter Three

Data Concepts

Data Concepts in Data Management and Curation

➢ Data concepts provide a foundational understanding of how data is structured, identified, preserved, and communicated.
➢ These concepts help ensure the efficient management, sharing, and usability of data in different domains.
What is Metadata?
➢ Metadata is often described as data about data. It is used to organize, manage, and understand information.

➢ Metadata is found in many contexts, including digital files, libraries, websites, and databases.
➢ It plays a major role in managing data, retrieving information, and organizing content across different domains.

➢ Metadata covers a wide range of information, depending on the content it describes.
➢ As a real-world example, in digital photography, metadata can include the date and time the photo was taken, the GPS coordinates that indicate where it was taken, and even information about the photographer.

➢ In a document context, metadata includes details such as the author, title, creation date, and keywords of the document.
What is Metadata?
➢ Metadata describes technical, business, or operational aspects of other data.

➢ This provides you context so you can find the information you need more easily and use your
data more effectively.

➢ What is “other data”? By other data, we mean a collection of facts which represent measurements
or descriptions of a situation.

➢ These facts can be in the form of numbers, symbols or words and are typically stored digitally.

➢ Now let’s get back to defining metadata. To help you to find, understand, and access
the information you need, this “other data” needs to have metadata associated with it.

➢ The metadata identifies other data and gives it context by providing core information
about it, such as author, creation time, file size, file type, topic, etc.
What is Metadata?
➢ Metadata is the data, or information, about the data.
➢ Data 'reporting': the Who, Where, What, How, When, and Why of your data.
• Who created the data? Who manages the data?
• Where is the study area? Where can I access the data?
• What is the data content? What source data was used?
• How was the data created? How is the data distributed?
• When is the time period of the content? When was the data created?
• Why was the data created? Why are there missing values?
A Label For Your Data
➢ In this example, the front label indicates:
➢ the title (Vzz),
➢ a short abstract (100% juice) about the content,
➢ the size (64 oz.) of the resource, and
➢ some supplemental information (sodium content) that consumers may consider valuable.
➢ The timestamp on the cap indicates the time period or freshness of the resource.
Metadata in Real Life
• Metadata is all around…
For example, a card catalogue tells us more than just the title of a book. It also tells the user:
Who is the author?
Who published the book?
What subject area does the book fall in?
And finally, where is it located in the library?

A sample catalogue record:
Author(s): Boullosa, Carmen.
Title(s): They're cows, we're pigs / by Carmen Boullosa
Place: New York : Grove Press, 1997.
Physical Descr: viii, 180 p ; 22 cm.
Subject(s): Pirates Caribbean Area Fiction.
Format: Fiction

Another example of metadata that we see in our daily lives is the nutrition and
ingredient information on food labels.
Nutrition labels answer questions such as:
What ingredients were used?
Who made the food?
How many calories per serving?
How many servings in the can?
What percentage of daily vitamins are in each serving?
What is Metadata management?
➢ Metadata management is a set of activities, technologies, and policies for collecting, storing, and organizing metadata.

➢ Its goal is to make data assets understandable and discoverable for users.

➢ In the library analogy, metadata management would involve creating a book catalog and a user guide to help library visitors find their way around the stacks.

➢ Metadata management is part of the data governance process, which, in turn, is an element of the overall data management strategy.


Why is Metadata Management Important?
➢ Metadata management helps you find the data you need and trust that the data is accurate.

➢ Your company likely has a large volume of complex data coming from many sources, and you need to be able to find, understand, and trust the right information to gain actionable insights that improve your business.

➢ The key benefits of robust metadata management as part of your data governance framework:

➢ Better data quality and data usability
➢ More accurate data insights and decisions
➢ Fewer data retrieval issues because your metadata definitions are more consistent
➢ Easier compliance with regulatory requirements
➢ A wider range of business users can interact with data
Ways to Create Metadata
• There are two ways to create metadata:

➢ Manual creation is labor-intensive but allows you to include more detail. This approach is recommended for high-value, low-volume data sets.
➢ Automatic creation, sometimes referred to as active metadata management, allows you to process massive volumes of data by leveraging machine learning. However, this approach can limit the amount of detail you add.
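The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration only: it uses simple rule-based extraction from the file system rather than machine learning, and the function names and field names are hypothetical.

```python
import mimetypes
import os
from datetime import datetime, timezone


def extract_file_metadata(path):
    """Automatic creation: collect what a program can read without human input."""
    stat = os.stat(path)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "format": mime_type or "unknown",
    }


def add_manual_fields(record, **fields):
    """Manual creation: merge human-written details (title, abstract, keywords)."""
    return {**record, **fields}
```

The automatic step scales to any number of files but only captures technical facts. The manual step adds the richer description that a person has to write.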
Ontology for Data Concepts
• Ontology in data management refers to the structured representation of concepts
and their relationships within a specific domain. It helps define the meaning, categories,
and relationships of data.
• Key Features of Ontology in Data Concepts:
• Defines entities and their relationships.
• Standardizes terms used in a domain.
• Enables semantic interoperability between systems.
• Supports data integration across multiple sources.
• Example:
A Healthcare Ontology might define:
• Concepts: Patients, Doctors, Diseases, Treatments.
• Relationships: "A patient receives treatment," "A doctor prescribes medication."
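The healthcare example above can be sketched as a small set of subject-predicate-object triples. This is a minimal Python illustration, not a full ontology language such as OWL. The helper names are hypothetical, and "Medication" is added as a concept here only because the second relationship refers to it.

```python
# Concepts (classes) in the hypothetical healthcare ontology.
CONCEPTS = {"Patient", "Doctor", "Disease", "Treatment", "Medication"}

# Relationships as (subject concept, predicate, object concept) triples.
RELATIONSHIPS = [
    ("Patient", "receives", "Treatment"),
    ("Doctor", "prescribes", "Medication"),
]


def relations_of(concept):
    """Return every (predicate, object) pair whose subject is `concept`."""
    return [(pred, obj) for subj, pred, obj in RELATIONSHIPS if subj == concept]


def is_valid(triples, concepts):
    """Check that every triple only uses concepts the ontology defines."""
    return all(subj in concepts and obj in concepts for subj, _, obj in triples)
```

Checking triples against the list of defined concepts is one simple way to enforce the standardized terms that an ontology is meant to provide.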
Identity Problems in Data
• Identity problems in data occur when there are inconsistencies in how data objects are recognized, named, or linked.
• These issues can lead to data duplication, confusion, or data loss.
• Common Identity Problems:
1. Duplicate Records – Multiple entries for the same entity in a database.
2. Ambiguous Identifiers – Different records using the same ID mistakenly.
3. Inconsistent Naming – Variations in data naming conventions.
4. Changing Identifiers – IDs that are not persistent over time.
5. Cross-System Conflicts – Incompatibility between databases using different identification schemes.
• Example:
A university might have "Solomon" recorded as:
• StudentID: 2023-001 in the student database
• EmployeeID: 56789 in the staff database
• LibraryID: SM567 in the library system
• This can create problems if there is no unified way to recognize that all three IDs refer to the same person.
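One common way to handle the university example is a crosswalk table that maps each system-specific ID to a single unified identifier. The sketch below assumes such a table exists. The system names and the unified ID "person-0001" are hypothetical.

```python
# Hypothetical crosswalk: each (system, local ID) pair points to one unified person ID.
CROSSWALK = {
    ("student_db", "2023-001"): "person-0001",
    ("staff_db", "56789"): "person-0001",
    ("library", "SM567"): "person-0001",
}


def resolve(system, local_id):
    """Return the unified person ID, or None if the local ID is unknown."""
    return CROSSWALK.get((system, local_id))


def same_person(ref_a, ref_b):
    """True when two (system, local_id) references resolve to the same person."""
    id_a, id_b = resolve(*ref_a), resolve(*ref_b)
    return id_a is not None and id_a == id_b
```

With the crosswalk in place, a question such as "is student 2023-001 the same person as library user SM567?" becomes a simple lookup.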
Types of Metadata
• 1. Descriptive metadata
• Describes a resource for purposes such as discovery and identification.
• Descriptive metadata covers the title, author/creator, subject/keywords, abstract/summary, date, format, language, identifier, spatial coverage, and rights information. In a digital context such as documents, it also includes details such as the location, creation date, format, and size of the file.
• 2. Structural metadata
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• 3. Administrative metadata
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
Descriptive Metadata
2. Structural Metadata
• Structural metadata provides information about the organization, arrangement, and relationships within a digital asset.
• It helps users navigate and understand the internal structure and hierarchical components.
• The purpose of structural metadata is to provide details about the organization and relationships within the content or dataset.
• For example, in a book, structural metadata can include the chapter titles, chapter headings, table of contents, indexes, etc.
3. Administrative Metadata
• Administrative metadata gives information about the management, administration, and governance of digital assets throughout their lifecycle.
• It serves as a foundation for resource management, traceability, and compliance with organizational policies and standards.
Metadata can enabled the efficient
• Example: This metadata can include details such as ownership of the data (author), creation and update dates, access permissions, and file formats. It also describes how the data is stored and the strategies for its preservation.
• Metadata embedded in a photo is one example of administrative metadata. It shows the type of data and the time the image was taken.
Types of metadata

Example: Descriptive metadata for a book in a library could be the ISBN, the book's author, and its title.
➢ An example of structural metadata would be how the pages of the book are put together to create different chapters.
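To make the three types concrete, the sketch below groups the metadata for one book into descriptive, structural, and administrative sections. All values, field names, and helper functions are made up for illustration.

```python
# All values below are invented for illustration only.
book = {
    "descriptive": {
        "title": "An Example Book",
        "author": "A. Author",
        "identifier": "example-book-001",
    },
    "structural": {
        # Each chapter lists the first and last page it covers, in reading order.
        "chapters": [
            {"title": "Chapter One", "first_page": 1, "last_page": 20},
            {"title": "Chapter Two", "first_page": 21, "last_page": 45},
        ],
    },
    "administrative": {
        "created": "2025-01-15",
        "file_format": "application/pdf",
        "access": ["librarian", "student"],
    },
}


def chapter_for_page(record, page):
    """Use structural metadata to find which chapter contains a page."""
    for chapter in record["structural"]["chapters"]:
        if chapter["first_page"] <= page <= chapter["last_page"]:
            return chapter["title"]
    return None


def can_access(record, role):
    """Use administrative metadata to decide who may open the resource."""
    return role in record["administrative"]["access"]
```

Each helper reads only one section of the record, which mirrors how the three types serve different purposes: discovery, navigation, and management.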
Example

➢ Generally, metadata can be embedded in a digital object or stored separately.
➢ It is often embedded in HTML documents and in the headers of image files.
➢ However, it is impossible to embed metadata in some types of objects (for example, artifacts).
➢ Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval.
➢ Therefore, metadata is commonly stored in a database system and linked to the objects described.
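Both storage options can be illustrated with Python's standard HTML parser. The sketch below reads metadata embedded in the meta tags of an HTML page and then stores it separately in a catalog keyed by an object identifier. The sample page, the identifier "doc-001", and the in-memory dictionary standing in for a database are all assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


def extract_embedded_metadata(html_text):
    """Return the embedded metadata of an HTML document as a dictionary."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    return parser.metadata


# Hypothetical page with embedded metadata.
PAGE = """<html><head>
<title>Sensor report</title>
<meta name="author" content="Data Team">
<meta name="keywords" content="sensors, temperature">
</head><body></body></html>"""

# Separate storage: the same metadata kept in a catalog and linked by identifier.
catalog = {"doc-001": extract_embedded_metadata(PAGE)}
```

Once the metadata sits in the catalog, it can be searched without opening the original object.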
Metadata allows data developers to:
• Avoid data duplication: Researchers can determine whether data already exist.
• Share reliable information: Scientists are able to share reliable information about a dataset by creating metadata and passing it along with the dataset.
• Publicize efforts: Metadata promotes the work of scientists and their contributions to a field of study. Data creators can publicize the valuable data they have collected by making the metadata available on clearinghouses and other publicly available venues.
• Save time and resources in the long run through reuse: Scientists wishing to reuse a dataset can be confident of its origins, data quality, and other valuable information about the data.
Metadata gives a user the ability to:
• Search, retrieve, and evaluate dataset information from both inside and outside an
organization
• Find data: Determine what data exists for a geographic location and/or topic

• Determine applicability: Decide if a dataset meets a particular need

• Discover how to acquire the dataset identified; process and use the dataset

• Understand the dataset, including definitions of column names or expected numerical ranges found in the data


What is the Value to Organizations?
• Metadata helps ensure an organization’s investment in data:
• Documentation of data processing steps, quality control, definitions, data uses, and
restrictions
• Ability to use data after initial intended purpose
• Allows organization to track data use and facilitates publication
• Transcends people and time:
• Offers data permanence
• Creates institutional memory
• Advertises an organization’s research:
• Creates possible new partnerships and collaborations through data sharing
Data Management: Maintenance and Update
• Metadata records can be used to track data provenance accurately

• Data Maintenance:
• Are the data current?
• Are the data in a reliable format?
• Where are the data stored?

• Data Update:
• Contact information
• Distribution policies, availability, pricing, URLs
• New derivations of the dataset
Data Preservation
• Data preservation ensures that data remains accessible and usable for the long term, even as
technology evolves.
• Challenges in Data Preservation:
• Data Degradation – Risk of data loss due to storage failures.
• Format Obsolescence – Old file formats becoming unreadable.
• Storage Medium Issues – Hard drives, CDs, and tapes degrade over time.
• Legal and Ethical Concerns – Privacy laws may change.
• Example: Digital Archives
• The Library preserves historical documents digitally.
• NASA maintains old satellite data using modern storage techniques.
• Best Practices:
• Use open and widely accepted formats (e.g., CSV instead of proprietary Excel files).
• Store multiple copies in geographically separate locations.
• Implement backup and recovery plans.
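A common companion to storing multiple copies is a fixity check: comparing a checksum of each copy against a reference value to detect silent corruption. The sketch below uses SHA-256 from Python's standard library. The site names and sample bytes are hypothetical, and a real archive would read files from disk in chunks rather than hold them in memory.

```python
import hashlib


def sha256_of(data):
    """Return the SHA-256 checksum of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(reference_checksum, copies):
    """Return the names of copies whose checksum differs from the reference."""
    return [
        name for name, data in copies.items()
        if sha256_of(data) != reference_checksum
    ]


# Hypothetical example: three copies of one small CSV file, one of them corrupted.
original = b"id,value\n1,20.1\n"
reference = sha256_of(original)
copies = {
    "site_a": original,
    "site_b": original,
    "site_c": b"id,value\n1,20.7\n",  # a single changed character
}
damaged = find_damaged_copies(reference, copies)
```

Here `damaged` lists only the copy that no longer matches, so it can be restored from one of the intact copies.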
Identifiers in Data Management
• An identifier is a unique reference assigned to data objects to distinguish them.
• Types of Identifiers:
1. Persistent Identifiers (PIDs) – Do not change over time (e.g., DOI, ORCID).
2. Hierarchical Identifiers – Structured identifiers for classification (e.g., ISBN for books).
3. Proprietary Identifiers – Internal identifiers used by organizations.
• Examples of Identifiers:
• DOI (Digital Object Identifier): Used for research papers (e.g., 10.1000/xyz123).
• ORCID: Unique researcher identifier (0000-0001-2345-6789).
• ISBN (International Standard Book Number): Identifies books globally (978-3-16-148410-0).
• Best Practices:
• Use global identifiers for research outputs.
• Ensure consistent naming schemes across systems.
• Avoid changing identifiers after publication.
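Some identifier schemes include a check digit, which lets software catch typing errors before an identifier is stored. The sketch below validates the ISBN-13 and ORCID examples above using their published check-digit rules (alternating weights of 1 and 3 for ISBN-13, ISO 7064 MOD 11-2 for ORCID). The function names are hypothetical, and the checks confirm only that an identifier is well formed, not that it is registered.

```python
def isbn13_is_valid(isbn):
    """Check the ISBN-13 check digit (weights alternate 1, 3 over the first 12 digits)."""
    digits = [int(ch) for ch in isbn if ch.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def orcid_is_valid(orcid):
    """Check the ORCID check character using ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[15].upper() == expected
```

Both example identifiers from the list above pass these checks. Changing any single digit makes them fail.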
Workflow, Provenance
• Data Workflow
• A workflow defines the sequence of steps for managing data, from
collection to analysis and storage.
• Example: Data workflow for research
1. Collect data from sensors.
2. Clean and preprocess data.
3. Store data in a database.
4. Analyze using Python.
5. Publish results in an open repository.
Workflow, Provenance (continued)
• Provenance
• Data provenance tracks the origin, history, and modifications made to a
dataset.
• Example: Data Provenance in a Lab Study
• Original Data: Collected on 2025-03-01 from field sensors.
• Processing Step 1: Outliers removed on 2025-03-02.
• Processing Step 2: Data normalized on 2025-03-03.
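The lab study example can be sketched as a small wrapper that appends a provenance entry every time a processing step runs. This is an illustration under stated assumptions: the class name, the sensor readings, and the plausible-range rule used to drop outliers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDataset:
    values: list
    provenance: list = field(default_factory=list)

    def apply(self, description, date, func):
        """Run one processing step and append a provenance entry for it."""
        rows_in = len(self.values)
        self.values = func(self.values)
        self.provenance.append(
            {"step": description, "date": date,
             "rows_in": rows_in, "rows_out": len(self.values)}
        )
        return self


def remove_outliers(values, low=0.0, high=50.0):
    """Keep only readings inside an assumed plausible range."""
    return [v for v in values if low <= v <= high]


def normalize(values):
    """Min-max scale values to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sensor readings; 95.0 stands in for a faulty measurement.
readings = [20.1, 19.8, 20.4, 95.0, 20.0]
dataset = TrackedDataset(values=readings)
dataset.provenance.append(
    {"step": "Collected from field sensors", "date": "2025-03-01",
     "rows_in": 0, "rows_out": len(readings)}
)
dataset.apply("Outliers removed", "2025-03-02", remove_outliers)
dataset.apply("Data normalized", "2025-03-03", normalize)
```

After the two steps, `dataset.provenance` holds three dated entries that mirror the example above, including how many rows each step received and produced.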
Communication in Data Management
• Effective communication ensures that data is correctly shared, interpreted, and used.
• Key Aspects of Data Communication:
• Clear Documentation – Use README files, manuals, and metadata.
• Data Visualization – Charts, graphs, dashboards.
• Standard Formats – Use CSV, JSON, XML instead of proprietary formats.
Example:
A company uses Power BI dashboards to present sales data to executives.
Best Practices in Data Management and Curation
1. Use structured metadata for documentation.
2. Implement access controls to protect sensitive data.
3. Regularly back up and archive important data.
4. Ensure compliance with data standards and regulations.
5. Encourage open data sharing while respecting privacy rules.
Standards in Data Management
• Data standards ensure consistency, interoperability, and data quality.
• Common Data Standards:
• Dublin Core – Metadata standard for digital content.
• ISO/IEC 27001 – Information security management standard.
• FAIR Principles – Data should be Findable, Accessible, Interoperable, and Reusable.
Example:
A research dataset following FAIR Principles:
• Findable: Indexed in a public repository.
• Accessible: Available under an open license.
• Interoperable: Uses standard formats (e.g., JSON, CSV).
• Reusable: Includes complete metadata and clear licensing terms.
Use Cases:
• Healthcare data exchange (e.g., HL7, FHIR standards).
• Scientific data sharing (e.g., OpenAIRE, DataCite).
• Business data governance (e.g., GDPR compliance)
What is a Metadata Standard?
• A Standard provides a structure to describe data with:
• Common terms to allow consistency between records
• Common definitions for easier interpretation
• Common language for ease of communication
• Common structure to quickly locate information
• In search and retrieval, standards provide:
• Documentation structure in a reliable and predictable format for computer
interpretation
• A uniform summary description of the dataset
Why a Metadata Standard?
• Different communities create, collect, manage, and use different types of
information.
• These communities may have different concerns and approaches to
metadata.
• There is no single standard that can satisfy all the needs across communities.
• However, in order for metadata to be understandable to the people and
software applications that use it, some sort of consistency is required.
• Organizations often define and publish metadata standards to meet needs
broadly across knowledge domains or within specialized disciplines.
• Published standards help system designers and end users accomplish their
goals effectively.
List of Metadata Standards
• The following are cross-disciplinary metadata standards:
• 1 Dublin Core, MARC, EAD, MODS, METS, and VRA Core
• The following are domain-specific metadata standards:
• 1 Social Sciences and Humanities: Categories for the Description of Works of Art (CDWA), Text Encoding Initiative (TEI)
• 2 Library & Information Science: ONline Information eXchange (ONIX) for Books, PBCore, and Preservation Metadata: Implementation Strategies (PREMIS)
• 3 Life Sciences: Darwin Core (DwC) and Ecological Metadata Language (EML)
• 4 Physical Sciences & Mathematics: Crystallographic Information Framework/File (CIF), Flexible Image Transport System (FITS), ISO 19115, and NeXus
• 5 Social & Behavioral Sciences: Data Documentation Initiative (DDI) and Friend of a Friend (FOAF)
Dublin Core Metadata Standard
• Dublin Core is one of the most widely used metadata schemas for web content.
• Dublin Core consists of 15 elements that were considered broad and generic
enough to describe a wide range of resources.
Dublin Core Metadata Element Set
• 1. Title, 2. Creator, 3. Subject, 4. Description, 5. Publisher, 6. Contributor, 7. Coverage, 8. Date, 9. Type, 10. Format, 11. Rights, 12. Source, 13. Language, 14. Relation, 15. Identifier
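As an illustration, the sketch below builds a small Dublin Core record in XML with Python's standard library, using the book from the card catalogue example earlier in the chapter. The namespace URI is the one published for the Dublin Core element set. The wrapper element name "metadata" and the helper function are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "coverage", "date", "type", "format",
    "rights", "source", "language", "relation", "identifier",
}


def build_dc_record(fields):
    """Build an XML record, rejecting any field that is not a Dublin Core element."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = build_dc_record({
    "title": "They're cows, we're pigs",
    "creator": "Boullosa, Carmen",
    "publisher": "Grove Press",
    "date": "1997",
    "subject": "Pirates Caribbean Area Fiction",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Every Dublin Core element is optional and repeatable, so a record like this one may use only a handful of the 15 elements.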
Steps to Create Quality Metadata
1. Organize your information
• Did you write a project abstract to obtain funding for your proposal? Re-use it in your
metadata!
• Did you use a lab notebook or other notes during the data development process that
define measurements and other parameters?
• Do you have the contact information for colleagues you worked with?
• What about citations for other data sources you used in your project?
2. Write your metadata using a metadata tool
3. Review for accuracy and completeness
4. Have someone else read your record
5. Revise the record, based on comments from your reviewer
6. Review once more before you publish
THANK YOU
