Datasets

Popular Datasets

data-training
Data Training Ver01

META LLAMA 3 COMMUNITY LICENSE AGREEMENT. Meta Llama 3 Version Release Date: April 18, 2024. "Meta Llama 3" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta.

NIH_Chest_X_ray
NIH Chest X-Ray

NIH Chest X-Ray is a large dataset of patient chest X-ray images collected by the U.S. National Institutes of Health (NIH).

TAL-SCQ5K
TAL-SCQ5K

TAL-SCQ5K is a collection of high-quality mathematical competition datasets created by TAL Education Group.

BLiMP
BLiMP

The Benchmark of Linguistic Minimal Pairs (BLiMP) is a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. Evaluations on BLiMP find that state-of-the-art models reliably identify morphological contrasts related to agreement but struggle with some subtle semantic and syntactic phenomena.

super-dataset-in-the-world
Super Dataset In The World

The Super Dataset in the World is a groundbreaking, all-encompassing data repository designed to empower researchers, developers, and industry professionals with an unparalleled resource for machine learning, data analytics, and AI innovation. Meticulously curated from diverse, high-quality sources across multiple domains, this dataset sets a new benchmark in data comprehensiveness, accuracy, and scalability.

AI2_Reasoning_Challenge
AI2 Reasoning Challenge

The ARC dataset consists of 7,787 science exam questions drawn from a variety of sources, including science questions provided under license by a research partner affiliated with AI2.

XQuAD
XQuAD

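XQuAD (Cross-lingual Question Answering Dataset) is a benchmark for evaluating cross-lingual question answering performance. It is built from a subset of the SQuAD v1.1 development set that was professionally translated into several languages.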

DOCCI
DOCCI

The DOCCI (Descriptions of Connected and Contrasting Images) dataset consists of comprehensive, human-annotated descriptions of 15k images that were taken specifically to evaluate text-to-image (T2I) and image-to-text (I2T) models. The descriptions capture many key details in each image.