Datasets
Welcome to our Datasets database, which lists hundreds of datasets across categories such as computer vision, audio, and NLP. All datasets are free and ready to use on the DagsHub platform. Browse the categories to find a dataset that fits your project.
LAION-Aesthetics V2 (6.5+)
NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17 & 18
Sentinel-2 Cloud-Optimized GeoTIFFs
Radiant MLHub
Image classification – fast.ai datasets
OpenCell on AWS
ESA WorldCover
SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)
Allen Brain Observatory – Visual Coding AWS Public Data Set
Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud
Automatic Speech Recognition (ASR) Error Robustness
Helpful Sentences from Reviews
Learning to Rank and Filter – community question answering
AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore
The Klarna Product-Page Dataset
MultiCoNER Dataset
Low Context Name Entity Recognition (NER) Datasets with Gazetteer
Common Screens
WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation
Sudachi Language Resources
Japanese Tokenizer Dictionaries
Covid Job Impacts – US Hiring Data Since March 1 2020
NASA Prediction of Worldwide Energy Resources (POWER)
U.S. Census ACS PUMS
Japanese Tokenizer Dictionaries
recount3
Mars Spectrometry: Detect Evidence for Past Habitability
Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)
TSBench
Speedtest by Ookla Global Fixed and Mobile Network Performance Maps
CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset
Mars Spectrometry 2: Gas Chromatography for the Sample Analysis at Mars Data (SAM) Instrument
NOAA National Water Model Short-Range Forecast
Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud
NREL National Solar Radiation Database
The Klarna Product-Page Dataset
DigitalCorpora
Common Screens
NASA Prediction of Worldwide Energy Resources (POWER)
Geosnap Data, Center for Geospatial Sciences
Multiview Extended Video with Activities (MEVA)
Normalized Difference Urban Index (NDUI)
ComStock
Binding DB – Data Lakehouse Ready
IBL Behavioral Data on AWS
1000 Genomes
OpenCell on AWS
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 – Data Lakehouse Ready
Encyclopedia of DNA Elements (ENCODE)
Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
Allen Ivy Glioblastoma Atlas
SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)
Tabula Sapiens
COVID-19 Genome Sequence Dataset
NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17 & 18
Radiant MLHub
CMIP6 GCMs downscaled using WRF
Sentinel-2 Cloud-Optimized GeoTIFFs
Community Earth System Model Large Ensemble (CESM LENS)
NOAA Real-Time Mesoscale Analysis (RTMA)
ESA WorldCover
NOAA National Water Model Short-Range Forecast
Virginia Coastal Resilience Master Plan, Phase 1 – December 2021
HIRLAM Weather Model
Defense Meteorology Satellite Program (DMSP) Auroral Particle Flux