Datasets
Popular Datasets
NIH Chest X-Ray is a large dataset of patient chest X-ray images collected by the United States National Institutes of Health (NIH).
by @AIOZNetwork

TAL-SCQ5K comprises high-quality mathematical competition datasets created by TAL Education Group.
by @AIOZNetwork

The Benchmark of Linguistic Minimal Pairs (BLiMP) is a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. Evaluations on it show that state-of-the-art models reliably identify morphological contrasts related to agreement but struggle with some subtle semantic and syntactic phenomena.
by @AIOZNetwork

The ARC dataset consists of 7,787 science exam questions drawn from a variety of sources, including science questions provided under license by a research partner affiliated with AI2.
by @AIOZNetwork

The DOCCI dataset consists of comprehensive descriptions of 15k images taken specifically for evaluating text-to-image (T2I) and image-to-text (I2T) models. The descriptions capture many key details in each image.
by @AIOZNetwork

This dataset is a resource for researchers who want to evaluate cross-lingual question answering performance.
by @AIOZNetwork

This is the repository for the subset of the PLOD dataset used for coursework (CW) in the 2023-2024 NLP module at the University of Surrey.
by @AIOZNetwork
