Chapter 03 - Meta Data Management - 04
Data Concepts
Data Concepts in Data Management and Curation
➢ Metadata is found in many contexts, including digital files, libraries, websites, and databases.
➢ It plays a major role in managing data, retrieving information, and organizing content across different domains.
➢ The information that metadata includes varies widely, depending on the content it describes.
➢ Consider a real-world example: in digital photography, metadata can include the date and time the photo was taken and the GPS coordinates of where it was taken.
➢ It can even include information about the photographer.
➢ In a document context, metadata includes details such as the document's author, title, creation date, and keywords.
What is Metadata?
➢ Metadata describes technical, business, or operational aspects of other data.
➢ It provides context so that you can find the information you need more easily and use your data more effectively.
➢ What is “other data”? By other data, we mean a collection of facts which represent measurements
or descriptions of a situation.
➢ These facts can be in the form of numbers, symbols or words and are typically stored digitally.
➢ Now let’s get back to defining metadata. To help you find, understand, and access
the information you need, this “other data” needs to have metadata associated with it.
➢ The metadata identifies other data and gives it context by providing core information
about it, such as author, creation time, file size, file type, topic, etc.
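Much of the core information listed above is technical metadata that the operating system already records for every file. The sketch below reads some of it with Python's standard library. The helper name `describe_file` is hypothetical, the file type is only guessed from the file extension, and creation time is left out because operating systems record it inconsistently.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Collect basic technical metadata the operating system keeps for a file (hypothetical helper)."""
    info = os.stat(path)
    # The MIME type is guessed from the file extension only; the content is not inspected.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is the last-modification time, reported here in UTC.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "file_type": mime_type or "unknown",
    }


if __name__ == "__main__":
    # Create a small throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello")
    print(describe_file(tmp.name))
    os.remove(tmp.name)
```

Author, topic, and similar descriptive fields are not stored by the file system, which is why they have to be recorded separately as metadata.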
What is Metadata?
➢ Metadata is data, or information, about data.
➢ Data ‘reporting’ of your data:
• Who created the data? Who manages the data?
• Where is the study area? Where can I access the data?
• What is the data content? What source data was used?
• How was the data created? How is the data distributed?
• When: what time period does the content cover, and when was the data created?
• Why was the data created? Why are there missing values?
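One way to make these reporting questions actionable is to map each one to a field in a metadata record and list the questions a record leaves unanswered. The field names below are hypothetical; a real project would take them from whichever metadata standard it follows.

```python
from typing import List

# Hypothetical mapping from each reporting question to a metadata field name.
QUESTION_FIELDS = {
    "Who created the data?": "creator",
    "Who manages the data?": "contact",
    "Where is the study area?": "spatial_coverage",
    "Where can I access the data?": "access_url",
    "What is the data content?": "description",
    "What source data was used?": "sources",
    "How was the data created?": "methods",
    "How is the data distributed?": "distribution",
    "What time period does the content cover?": "temporal_coverage",
    "When was the data created?": "created",
    "Why was the data created?": "purpose",
    "Why are there missing values?": "missing_value_notes",
}


def unanswered_questions(record: dict) -> List[str]:
    """Return the questions whose field is missing or empty in the record."""
    return [question for question, name in QUESTION_FIELDS.items() if not record.get(name)]


if __name__ == "__main__":
    record = {"creator": "A. Researcher", "created": "2025-03-01"}
    for question in unanswered_questions(record):
        print("Unanswered:", question)
```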
A Label For Your Data
➢ In this example, the front label indicates:
➢ the title (Vzz),
➢ a short abstract (100% juice) about the content,
➢ the size (64 oz.) of the resource, and
➢ some supplemental information (sodium content) that consumers may consider valuable.
➢ The timestamp on the cap indicates the time period or freshness of the resource.
Metadata in Real Life
• Metadata is all around us.
For example, a card catalogue tells us more than just the title of a book. It also tells the user:
• Who is the author?
• Who published the book?
• What subject area does the book fall in?
• And finally, where is it located in the library?
Sample catalogue record:
Author(s): Boullosa, Carmen.
Title(s): They're cows, we're pigs / by Carmen Boullosa
Place: New York : Grove Press, 1997.
Physical Descr.: viii, 180 p ; 22 cm.
Subject(s): Pirates Caribbean Area Fiction.
Format: Fiction
Another example of metadata that we see in our daily lives is the nutrition and
ingredient information on food labels.
Nutrition labels answer questions such as:
• What ingredients were used?
• Who made the food?
• How many calories are in each serving?
• How many servings are in the can?
• What percentage of daily vitamins is in each serving?
What is Metadata Management?
➢ Metadata management is a set of activities, technologies, and policies that govern how metadata is collected, stored, and organized.
➢ Its goal is to make data assets understandable and discoverable for users.
➢ Your company likely has a large volume of complex data coming from many sources, and you need to be able to find, understand, and trust the right information to gain actionable insights that improve your business.
➢ Key benefits of robust metadata management as part of a data governance framework include:
• Discovering how to acquire an identified dataset, and how to process and use it
• Data Maintenance:
• Are the data current?
• Are the data in a reliable format?
• Where are the data stored?
• Data Update:
• Contact information
• Distribution policies, availability, pricing, URLs
• New derivations of the dataset
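The Data Maintenance questions can be answered automatically when the metadata record carries the right fields. This is a minimal sketch under stated assumptions: the field names (`last_updated`, `format`, `location`), the list of "reliable" formats, and the age threshold that counts as "current" are all hypothetical policy choices, not fixed rules.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: a local policy listing formats considered reliable for long-term use.
RELIABLE_FORMATS = {"csv", "json", "xml", "txt"}


def maintenance_report(record: dict, max_age_days: int = 365, today: Optional[date] = None) -> dict:
    """Answer the Data Maintenance questions from a record with hypothetical field names."""
    today = today or date.today()
    last_updated = date.fromisoformat(record["last_updated"])  # expects "YYYY-MM-DD"
    return {
        # Are the data current? Here: updated within the last max_age_days days.
        "current": today - last_updated <= timedelta(days=max_age_days),
        # Are the data in a reliable format?
        "reliable_format": record.get("format", "").lower() in RELIABLE_FORMATS,
        # Where are the data stored?
        "stored_at": record.get("location", "unknown"),
    }


if __name__ == "__main__":
    record = {"last_updated": "2025-03-03", "format": "CSV", "location": "institutional archive"}
    print(maintenance_report(record, max_age_days=90, today=date(2025, 4, 1)))
```

Passing `today` explicitly keeps the check reproducible; in production it would default to the current date.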
Data Preservation
• Data preservation ensures that data remains accessible and usable for the long term, even as
technology evolves.
• Challenges in Data Preservation:
• Data Degradation – Risk of data loss due to storage failures.
• Format Obsolescence – Old file formats becoming unreadable.
• Storage Medium Issues – Hard drives, CDs, and tapes degrade over time.
• Legal and Ethical Concerns – Privacy laws may change.
• Example: Digital Archives
• The Library preserves historical documents digitally.
• NASA maintains old satellite data using modern storage techniques.
• Best Practices:
• Use open and widely accepted formats (e.g., CSV instead of proprietary Excel files).
• Store multiple copies in geographically separate locations.
• Implement backup and recovery plans.
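A common way to detect silent data degradation across stored copies is fixity checking: record a checksum for each file and periodically recompute it for every copy. This is a minimal sketch using SHA-256 from Python's standard `hashlib`; the helper names and the side-by-side comparison of copies are illustrative assumptions, not a prescribed procedure.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 checksum, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: Iterable[Path]) -> bool:
    """Return True if every stored copy has the same checksum, i.e. no copy has silently changed."""
    return len({sha256_of(p) for p in paths}) == 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        copies = [Path(folder) / "site_a.csv", Path(folder) / "site_b.csv"]
        for copy in copies:
            copy.write_bytes(b"id,value\n1,42\n")
        print("copies match:", copies_match(copies))  # identical content
        copies[1].write_bytes(b"id,value\n1,43\n")    # simulate a corrupted copy
        print("copies match:", copies_match(copies))
```

In practice the reference checksum is stored alongside the metadata, so a single copy can also be checked against it.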
Identifiers in Data Management
• An identifier is a unique reference assigned to data objects to distinguish them.
• Types of Identifiers:
1. Persistent Identifiers (PIDs) – Do not change over time (e.g., DOI, ORCID).
2. Hierarchical Identifiers – Structured identifiers whose segments carry meaning (e.g., ISBN, whose parts identify the registration group, publisher, and publication).
3. Proprietary Identifiers – Internal identifiers used by organizations.
• Examples of Identifiers:
• DOI (Digital Object Identifier): Used for research papers (e.g., 10.1000/xyz123).
• ORCID: Unique researcher identifier (e.g., 0000-0001-2345-6789).
• ISBN (International Standard Book Number): Identifies books globally (e.g., 978-3-16-148410-0).
• Best Practices:
• Use global identifiers for research outputs.
• Ensure consistent naming schemes across systems.
• Avoid changing identifiers after publication.
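ISBNs and ORCID iDs end in a check digit, so a system can reject mistyped identifiers before they spread. The sketch below implements the ISBN-13 weighting rule and the ISO 7064 MOD 11-2 rule that ORCID uses. It checks only the form of an identifier, not whether it is actually registered; DOIs have no check digit and are not covered. The function names are hypothetical.

```python
DIGITS = "0123456789"


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13: weight digits 1,3,1,3,...; the weighted sum of all 13 digits must be divisible by 10."""
    digits = [int(c) for c in isbn if c in DIGITS]  # hyphens and spaces are ignored
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def orcid_is_valid(orcid: str) -> bool:
    """ORCID iD: the last character is an ISO 7064 MOD 11-2 check digit ("X" stands for 10)."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or any(c not in DIGITS for c in chars[:15]):
        return False
    total = 0
    for c in chars[:15]:
        total = (total + int(c)) * 2
    check = (12 - total % 11) % 11
    return chars[15] == ("X" if check == 10 else str(check))


if __name__ == "__main__":
    print(isbn13_is_valid("978-3-16-148410-0"))
    print(orcid_is_valid("0000-0001-2345-6789"))
    print(orcid_is_valid("0000-0001-2345-6788"))  # last digit altered
```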
Workflow, Provenance
• Data Workflow
• A workflow defines the sequence of steps for managing data, from
collection to analysis and storage.
• Example: Data workflow for research
1. Collect data from sensors.
2. Clean and preprocess data.
3. Store data in a database.
4. Analyze using Python.
5. Publish results in an open repository.
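The first four steps can be sketched as a small Python pipeline in which each step is a function. The sensor readings are made-up placeholder values, the `readings` table is hypothetical, SQLite from the standard library stands in for "a database", and step 5 is omitted because publishing depends on the chosen repository.

```python
import sqlite3
import statistics
from typing import List, Optional, Tuple

Reading = Tuple[str, Optional[float]]


def collect() -> List[Reading]:
    """Step 1 (stand-in): placeholder sensor readings; a real workflow would read from devices or files."""
    return [("s1", 21.5), ("s2", None), ("s1", 22.0), ("s3", 20.5)]


def clean(readings: List[Reading]) -> List[Tuple[str, float]]:
    """Step 2: drop readings with missing values."""
    return [(sensor, value) for sensor, value in readings if value is not None]


def store(readings: List[Tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Step 3: store cleaned readings in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    conn.commit()


def analyze(conn: sqlite3.Connection) -> float:
    """Step 4: compute a summary statistic (the mean) from the stored data."""
    values = [row[0] for row in conn.execute("SELECT value FROM readings")]
    return statistics.mean(values)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(clean(collect()), conn)
    print(f"mean value: {analyze(conn):.2f}")
```

Keeping each step a separate function makes it easy to record which step produced which version of the data.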
Workflow, Provenance (continued)
• Provenance
• Data provenance tracks the origin, history, and modifications made to a
dataset.
• Example: Data Provenance in a Lab Study
• Original Data: Collected on 2025-03-01 from field sensors.
• Processing Step 1: Outliers removed on 2025-03-02.
• Processing Step 2: Data normalized on 2025-03-03.
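A simple way to keep provenance is to make every processing step append an entry to the dataset's history, so a record like the one above is produced as a side effect of the work rather than written afterwards. This is a minimal sketch: the class names, the sample values, and the 0 to 100 outlier range are hypothetical, and min-max scaling is used as one possible meaning of "normalized".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class ProvenanceEntry:
    date: str         # ISO date on which the step was applied
    description: str  # what was done to the data


@dataclass
class TrackedDataset:
    """Values plus the ordered list of steps that produced them (a simple provenance trail)."""
    values: List[float]
    history: List[ProvenanceEntry] = field(default_factory=list)

    def apply(self, step: Callable[[List[float]], List[float]], date: str, description: str) -> "TrackedDataset":
        # Return a new dataset so earlier versions (and their history) stay unchanged.
        return TrackedDataset(step(self.values), self.history + [ProvenanceEntry(date, description)])


def remove_outliers(values: List[float], low: float = 0.0, high: float = 100.0) -> List[float]:
    """Keep values inside a plausible range (the range is a hypothetical choice)."""
    return [v for v in values if low <= v <= high]


def normalize(values: List[float]) -> List[float]:
    """Min-max scale values to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = TrackedDataset([12.0, 15.0, 250.0, 18.0],
                         [ProvenanceEntry("2025-03-01", "Collected from field sensors")])
    final = (raw.apply(remove_outliers, "2025-03-02", "Outliers removed")
                .apply(normalize, "2025-03-03", "Data normalized"))
    for entry in final.history:
        print(entry.date, entry.description)
    print(final.values)
```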
Communication in Data Management
• Effective communication ensures that data is correctly shared, interpreted, and used.
• Key Aspects of Data Communication:
• Clear Documentation – Use README files, manuals, and metadata.
• Data Visualization – Charts, graphs, dashboards.
• Standard Formats – Use CSV, JSON, XML instead of proprietary formats.
Example:
A company uses Power BI dashboards to present sales data to executives.
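Writing the same records to CSV and JSON with Python's standard library shows how little is needed to share data in open, standard formats. The sales figures are invented placeholders and the function names are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

# Invented placeholder records, used only for illustration.
records: List[Dict] = [
    {"region": "North", "quarter": "Q1", "sales": 1200},
    {"region": "South", "quarter": "Q1", "sales": 950},
]


def to_csv(rows: List[Dict]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: List[Dict]) -> str:
    """Serialize rows to indented JSON text."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(records))
    print(to_json(records))
```

CSV suits flat tables, while JSON keeps value types (the `sales` numbers stay numbers) and can hold nested structures.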
Best Practices in Data Management and Curation
1. Use structured metadata for documentation.
2. Implement access controls to protect sensitive data.
3. Regularly backup and archive important data.
4. Ensure compliance with data standards and regulations.
5. Encourage open data sharing while respecting privacy rules.
Standards in Data Management
• Data standards ensure consistency, interoperability, and data quality.
• Common Data Standards:
• Dublin Core – Metadata standard for digital content.
• ISO/IEC 27001 – Information security management standard.
• FAIR Principles – Data should be Findable, Accessible, Interoperable, and Reusable.
Example:
A research dataset following FAIR Principles:
• Findable: Indexed in a public repository.
• Accessible: Retrievable through a standard, open protocol (e.g., HTTPS).
• Interoperable: Uses standard formats (e.g., JSON, CSV).
• Reusable: Includes complete metadata and clear licensing terms.
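A repository might run rough automated checks that hint at whether a record meets each principle. The sketch below is only a set of hypothetical proxies: the field names, the list of "interoperable" formats, and the DOI pattern (the common `10.<registrant>/<suffix>` shape) are assumptions, and real FAIR assessment involves much more than four boolean tests.

```python
import re

# Assumptions for illustration only.
OPEN_FORMATS = {"csv", "json", "xml"}
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def fair_check(record: dict) -> dict:
    """Rough, hypothetical proxies for each FAIR principle, evaluated on one metadata record."""
    return {
        # Findable: has a persistent identifier shaped like a DOI.
        "findable": bool(DOI_PATTERN.fullmatch(record.get("identifier", ""))),
        # Accessible: retrievable over a standard protocol.
        "accessible": record.get("access_url", "").startswith("https://"),
        # Interoperable: stored in an open, standard format.
        "interoperable": record.get("format", "").lower() in OPEN_FORMATS,
        # Reusable: has a license and a description.
        "reusable": bool(record.get("license")) and bool(record.get("description")),
    }


if __name__ == "__main__":
    record = {
        "identifier": "10.1000/xyz123",
        "access_url": "https://repository.example.org/datasets/xyz123",
        "format": "CSV",
        "license": "CC-BY-4.0",
        "description": "Hourly readings from field sensors",
    }
    print(fair_check(record))
```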
Use Cases:
• Healthcare data exchange (e.g., HL7, FHIR standards).
• Scientific data sharing (e.g., OpenAIRE, DataCite).
• Business data governance (e.g., GDPR compliance).
What is a Metadata Standard?
• A Standard provides a structure to describe data with:
• Common terms to allow consistency between records
• Common definitions for easier interpretation
• Common language for ease of communication
• Common structure to quickly locate information
• In search and retrieval, standards provide:
• Documentation structure in a reliable and predictable format for computer
interpretation
• A uniform summary description of the dataset
Why Metadata Standards?
• Different communities create, collect, manage, and use different types of
information.
• These communities may have different concerns and approaches to
metadata.
• There is no single standard that can satisfy all the needs across communities.
• However, in order for metadata to be understandable to the people and
software applications that use it, some sort of consistency is required.
• Organizations often define and publish metadata standards to meet needs
broadly across knowledge domains or within specialized disciplines.
• Published standards help system designers and end users accomplish their
goals effectively.
List of Metadata Standards
• The following are cross-disciplinary metadata standards:
• Dublin Core, MARC, EAD, MODS, METS, and VRA Core
• The following are domain-specific metadata standards:
• 1. Social Sciences and Humanities: Categories for the Description of Works of Art (CDWA) and Text Encoding Initiative (TEI)
• 2. Library & Information Science: Online Information Exchange (ONIX) for Books, PBCore, and Preservation Metadata: Implementation Strategies (PREMIS)
• 3. Life Sciences: Darwin Core (DwC) and Ecological Metadata Language (EML)
• 4. Physical Sciences & Mathematics: Crystallographic Information Framework/File (CIF), Flexible Image Transport System (FITS), ISO 19115, and NeXus
• 5. Social & Behavioral Sciences: Data Documentation Initiative (DDI) and Friend of a Friend (FOAF)
Dublin Core Metadata Standard
• Dublin Core is one of the most widely used metadata schemas for web content.
• Dublin Core consists of 15 elements that were considered broad and generic
enough to describe a wide range of resources.
Dublin Core Metadata Element Set
• 1. Title, 2. Creator, 3. Subject, 4. Description, 5. Publisher, 6. Contributor, 7. Coverage, 8. Date, 9. Type, 10. Format, 11. Rights, 12. Source, 13. Language, 14. Relation, 15. Identifier
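Dublin Core elements are often serialized as XML in the `dc` namespace (`http://purl.org/dc/elements/1.1/`). The sketch below builds such a record with Python's `xml.etree.ElementTree`, using the catalogue card shown earlier as input. The `metadata` wrapper element and the function name are arbitrary choices for illustration, not part of the standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements as dc:title, dc:creator, ...

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor", "coverage",
    "date", "type", "format", "rights", "source", "language", "relation", "identifier",
}


def to_dublin_core(record: dict) -> str:
    """Serialize a dict of Dublin Core elements to XML; list values repeat the element."""
    root = ET.Element("metadata")  # wrapper element name is an arbitrary choice
    for name, value in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_dublin_core({
        "title": "They're cows, we're pigs",
        "creator": "Boullosa, Carmen",
        "publisher": "Grove Press",
        "date": "1997",
        "type": "Text",
    }))
```

Every Dublin Core element is optional and repeatable, which is why the function accepts lists and does not require all 15 elements.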
Steps to Create Quality Metadata
1. Organize your information
• Did you write a project abstract to obtain funding for your proposal? Re-use it in your
metadata!
• Did you use a lab notebook or other notes during the data development process that
define measurements and other parameters?
• Do you have the contact information for colleagues you worked with?
• What about citations for other data sources you used in your project?
2. Write your metadata using a metadata tool
3. Review for accuracy and completeness
4. Have someone else read your record
5. Revise the record, based on comments from your reviewer
6. Review once more before you publish
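Part of step 3 (checking completeness) can be automated before a human reviewer reads the record. The sketch below flags missing or empty Dublin Core elements and dates that do not follow a simplified pattern loosely based on ISO 8601. Which elements a project actually requires, and the date rule itself, are assumptions to adapt, and no automated check can judge whether the content is accurate.

```python
import re
from typing import List

DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher", "contributor", "coverage",
    "date", "type", "format", "rights", "source", "language", "relation", "identifier",
]
# Simplified rule (an assumption): YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def review(record: dict) -> List[str]:
    """List completeness problems and a simple date-format problem in a Dublin Core-style record."""
    problems = []
    for element in DC_ELEMENTS:
        value = record.get(element)
        if value is None or not str(value).strip():
            problems.append(f"missing: {element}")
    date_value = record.get("date")
    if date_value and not DATE_PATTERN.fullmatch(str(date_value)):
        problems.append(f"date not in YYYY[-MM[-DD]] form: {date_value!r}")
    return problems


if __name__ == "__main__":
    draft = {"title": "Field sensor readings", "creator": "A. Researcher", "date": "March 2025"}
    for problem in review(draft):
        print(problem)
```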
THANK YOU