
List of Topic For ML Project

The document outlines three main topics: COVID-19, Climate and the Environment, and Emerging Researches and Technologies, each with associated datasets. For COVID-19, datasets include testing statistics, health care impacts, and ongoing research articles. The Climate and Environment section covers climate measurements and biodiversity studies, while the Emerging Researches section focuses on space exploration data and recommender systems.

Topic 1: COVID-19

DATASET A: TESTING AND MORTALITY STATISTICS


This dataset contains US reports on COVID-19 testing and cases from
the COVID-19 Data Repository by the Center for Systems Science and
Engineering (CSSE) at Johns Hopkins University and CDC (Centers for Disease
Control and Prevention). You can access all the data within the Topic
1/Dataset A directory on Google Drive:

• csse_covid_19_daily_reports_us.csv contains US daily reports.
  (documentation)
• cdc_death_counts_by_sex_age_state.csv contains US weekly reports on
  deaths involving COVID-19, pneumonia, and influenza reported to
  NCHS by sex, age group, and state. (documentation)
• cdc_death_counts_by_conditons.csv contains US weekly reports on health
  conditions and contributing causes mentioned in conjunction with
  deaths involving COVID-19. (documentation)
You must choose to work with at least 2 of the reports above in your
analysis.

DATASET B: IMPACT ON HEALTH CARE


This dataset contains reports from the Household Pulse Survey launched by
NCHS in partnership with the U.S. Census Bureau; it focuses on how COVID-19
has affected survey respondents’ mental health and their access to
health care. In addition, it provides statistics on usage of telemedicine by
healthcare providers. You can access all the data within the Topic 1/Dataset
B directory on Google Drive:

• nchs_covid_indicators_of_anxiety_depression.csv contains survey estimates
  of responses to questions that are indicators of anxiety or depression
  based on reported frequency of symptoms within the past week.
  (documentation)
• nchs_covid_mental_health_care.csv contains survey estimates of responses
  to questions that ask if participants have accessed mental health care
  in the past 4 weeks. (documentation)
• nchs_covid_health_insurance_coverage.csv contains survey estimates of
  responses to questions that ask about participants’ health insurance
  coverage. (documentation)
• nchs_covid_reduced_access_to_health_care.csv contains survey estimates of
  responses to questions that ask if participants have experienced delay
  or been refused health care due to COVID-19. (documentation)
• nchs_covid_telemedicine_usage.csv contains survey estimates of responses
  to questions that ask if healthcare providers offered telemedicine
  (including video and telephone appointments) – both during and before
  the pandemic – and about the use of telemedicine during the
  pandemic. (documentation)
You must choose to work with at least 3 of the reports above in your
analysis.

DATASET C: ONGOING RESEARCH


This dataset contains (in full-text and metadata form) scholarly articles
related to COVID-19. The data are optimized for machine readability and
made available for use by the global research community. The dataset is
intended to mobilize researchers to generate new insights from the articles
in support of the fight against this infectious disease. You can access all the
data within the Topic 1/Dataset C directory on Google Drive:

• covid_open_research_dataset.txt contains the link that will guide you to
  obtain the full-text and metadata dataset of COVID-related research
  articles. (documentation)

Topic 2: Climate and the Environment

DATASET A: GENERAL MEASUREMENTS AND STATISTICS


This dataset contains some general statistics and measurements of various
aspects of the climate and the environment. You can access all the data
within the Topic 2/Dataset A directory on Google Drive. It includes the
following reports:

• daily_global_weather_2020.csv contains data on daily temperature and
  precipitation measurements. To learn how to use the data from this
  file, please read the following section on the first report.
• us_greenhouse_gas_emissions_direct_emitter_facilities.csv and
  us_greenhouse_gas_emission_direct_emitter_gas_type.csv contain data
  reported by the EPA (Environmental Protection Agency) on greenhouse gas
  emissions, detailing the specific types of gas reported by facilities
  and general information about the facilities themselves. The dataset is
  made available through EPA’s GHGRP (Greenhouse Gas Reporting Program).
• us_air_quality_measures.csv contains data from the EPA’s AQS (Air Quality
  System) that measures air quality on a county level from
  approximately 4000 monitoring stations around the country. (source)
• aqi_data contains more data from the EPA from a number of sites
  across a multitude of different metrics. (source)
The following subsection contains more details on how to work with the first
report on global daily temperature and precipitation:

The first report contains daily temperature and precipitation measurements
taken by weather stations in the Global Historical Climatology Network
(GHCN) from January to December 2020.

The data in daily_global_weather_2020.csv is derived from the source file at
https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2020.csv.gz.
To help you get started with a dataset of manageable size, we have
preprocessed the GHCN dataset to include only the average temperature
and precipitation measurements from stations that have both
measurements. Each row in the preprocessed dataset contains both the
average temperature and precipitation measurements for a given station on
a given date.
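
The pairing step described above can be sketched roughly as follows. This is not the code from GHCN_data_preprocessing.ipynb; it is a minimal illustration that assumes the by_year source file has no header row and starts each record with station ID, date (YYYYMMDD), element code, and value, and that average temperature and precipitation use the element codes TAVG and PRCP. The function name pair_measurements and the sample records are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Assumed element codes for average temperature and precipitation.
WANTED = {"TAVG", "PRCP"}


def pair_measurements(lines):
    """Keep only (station, date) pairs that report both TAVG and PRCP."""
    by_key = defaultdict(dict)
    for row in csv.reader(lines):
        # Assumed column order: station ID, date, element, value, then flags.
        station, date, element, value = row[:4]
        if element in WANTED:
            by_key[(station, date)][element] = int(value)
    return [
        {"station": s, "date": d, "tavg": m["TAVG"], "prcp": m["PRCP"]}
        for (s, d), m in sorted(by_key.items())
        if WANTED.issubset(m)
    ]


# Hand-made sample in the assumed layout (not real observations).
sample = io.StringIO(
    "ST000001,20200101,TAVG,50,,,S,\n"
    "ST000001,20200101,PRCP,12,,,S,\n"
    "ST000002,20200101,TAVG,-30,,,S,\n"
    "ST000001,20200102,TMAX,90,,,S,\n"
)
rows = pair_measurements(sample)
print(rows)
```

For the real file, the same function could be fed a handle from gzip.open(path, "rt") instead of the in-memory sample; whether that fits in memory depends on the year chosen.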

If you wish to explore the climate data for a different year, you can use
the GHCN_data_preprocessing.ipynb notebook to download and perform the
preprocessing described above. Please be advised that depending on the
dataset size for a given year, GHCN_data_preprocessing.ipynb may not run on
DataHub. We will not be providing infrastructural support for running the
notebook, but you are welcome to run it on a different machine you have
access to or ask a GSI to dump the data for you.

The data contains only the (latitude, longitude) coordinates for the weather
stations. To map the coordinates to geographical locations, the reverse-
geocoder package mentioned in the References section might be helpful.
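
To show what mapping coordinates to locations involves, here is a small sketch of the underlying idea: pick the nearest known place by great-circle distance. It deliberately does not use the reverse-geocoder package or imitate its API; the PLACES table, its coordinates, and the function names are hypothetical and only serve the illustration.

```python
import math

# Hypothetical reference table: (label, latitude, longitude).
# A real gazetteer would hold far more places.
PLACES = [
    ("Berkeley, US", 37.8716, -122.2727),
    ("London, GB", 51.5074, -0.1278),
    ("Sydney, AU", -33.8688, 151.2093),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a 6371 km Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places=PLACES):
    """Return the label of the closest entry in the reference table."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))[0]


print(nearest_place(37.77, -122.42))
```

A linear scan like this is fine for a handful of stations; for every station in the dataset, a dedicated package with a spatial index is likely the better choice.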

DATASET B: BIODIVERSITY IN THE ECOSYSTEM


This dataset contains studies focused specifically on the impact of
environmental and climate changes on biodiversity and the local
ecosystems. You can access all the data within the Topic 2/Dataset B directory
on Google Drive. It includes the following reports:

• bioCON_plant_diversity.csv contains data collected as part of an ecological
  experiment, BioCON (Biodiversity, CO2, and Nitrogen), that started in
  1997 and focused on studying biodiversity within the plant species at
  Cedar Creek Ecosystem Science Preserve. (documentation)
• plant_pollinator_diversity_set1.csv and plant_pollinator_diversity_set2.csv
  contain ecological data collected from a long-term observation study
  from 2011 to 2018 that focuses on plant-pollinator interaction and its
  impact on local biodiversity. (documentation)
• national_parks_biodiversity_parks.csv and
  national_parks_biodiversity_species.csv contain data published by the
  National Park Service on animal and plant species identified in
  individual national parks.

Topic 3: Emerging Researches and Technologies

DATASET A: SPACE EXPLORATION


This dataset contains a set of reports from pioneering research that
explores outer space. Much of the data from these studies has provided
a rich foundation for a variety of large-scale research projects that explore
widely discussed topics such as habitable exoplanets or the search for
extraterrestrial life.

You can access all the data within the Topic 3/Dataset A directory on Google
Drive. It includes the following reports:
• kepler_exoplanet_search.csv contains data collected by NASA from the
  Kepler Space Observatory as part of a long-term study on finding
  habitable exoplanets from over 10,000 candidates. (source)
• kelper_planetary_system_composite.csv contains data collected by NASA
  from the Kepler Space Observatory as part of an ongoing study that
  tabulates all confirmed planetary systems outside the solar system.
  You are encouraged to use the composite data in conjunction with the
  exoplanet search results above. (source)
• nasa_neows.csv contains data collected from NASA’s NeoWs (Near Earth
  Object Web Service), which collects information on near-Earth asteroids.

DATASET B: RECOMMENDER SYSTEMS


A recommender system is an information filtering system that focuses on
predicting the preference a user would give to an item by predicting its rank;
it is used in a variety of areas, such as search engines, online shopping
platforms, etc. This dataset contains a set of reports on various tools using a
recommender system.
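
As a rough illustration of ranking items by predicted preference, the sketch below scores each item a user has not rated by its mean rating and returns the top few. This is only a simple item-mean baseline, much cruder than the models normally used in practice; the RATINGS triples and the recommend function are hypothetical and are not drawn from either dataset listed here.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples for illustration only.
RATINGS = [
    ("ann", "shoes", 5), ("ann", "watch", 3),
    ("bob", "shoes", 4), ("bob", "bottle", 2),
    ("cat", "watch", 4), ("cat", "bottle", 1), ("cat", "mat", 5),
]


def recommend(user, ratings, k=2):
    """Rank items the user has not rated by their mean rating, best first."""
    seen = {item for u, item, _ in ratings if u == user}
    totals = defaultdict(lambda: [0, 0])  # item -> [sum of ratings, count]
    for _, item, rating in ratings:
        if item not in seen:
            totals[item][0] += rating
            totals[item][1] += 1
    scores = {item: s / n for item, (s, n) in totals.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


print(recommend("ann", RATINGS))
```

Because every user gets the same scores for the same unseen items, this baseline is not personalized; it is mainly useful as a reference point when evaluating a more elaborate model.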

You can access all the data within the Topic 3/Dataset B directory on Google
Drive. It includes the following reports:

• fitness_recommendation.txt contains a link to access the fitness data from
  sequential sensors for various workouts. (documentation)
• amazon_reviews.txt contains a link to access the data on a subset of
  Amazon product reviews. The report includes metadata such as ratings
  and text on the reviews and general information about the product.
  (documentation)
