Models

Color Extraction

Color Extraction is a task in computer vision that involves the extraction and analysis of colors from images or videos. The objective of this task is to identify and isolate specific colors or color ranges present in the visual data.
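
One simple way to identify the colors present in an image is to quantize every pixel into a coarse color bucket and count the buckets. The sketch below assumes the image has already been decoded into a list of `(r, g, b)` tuples; the function names and the `levels` parameter are hypothetical and chosen for illustration only.

```python
from collections import Counter


def dominant_colors(pixels, levels=4, top_k=3):
    """Return the top_k most frequent quantized colors with their pixel counts.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    levels: number of buckets per channel (illustrative parameter).
    """
    step = 256 // levels

    def quantize(channel):
        # Snap the channel value to the centre of its bucket.
        bucket = min(channel // step, levels - 1)
        return bucket * step + step // 2

    counts = Counter((quantize(r), quantize(g), quantize(b)) for r, g, b in pixels)
    return counts.most_common(top_k)


def in_color_range(pixel, low, high):
    """Check whether every channel of pixel lies inside [low, high]."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, low, high))


if __name__ == "__main__":
    image = [(250, 10, 10)] * 5 + [(10, 10, 250)] * 3 + [(240, 5, 20)] * 2
    print(dominant_colors(image, top_k=2))
    # Isolate pixels that fall inside a "red" range.
    reds = [p for p in image if in_color_range(p, (200, 0, 0), (255, 60, 60))]
    print(len(reds))
```

Bucket counting is only one option; clustering methods such as k-means are a common alternative when the palette should adapt to the image.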

Document Parsing by Donut

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
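
The phrase "autoregressively generates text, conditioned on the encoding" can be illustrated with a toy greedy decoding loop. This is not Donut's implementation: `toy_step` is a hypothetical stand-in for the decoder, and the "encoding" is a plain list of strings rather than a tensor of embeddings. The point is only the control flow, where each new token depends on the encoder output and on all tokens generated so far.

```python
def greedy_decode(step_fn, encoding, bos="<s>", eos="</s>", max_len=16):
    """Generate tokens one at a time until eos or max_len is reached.

    step_fn(encoding, prefix) must return a dict mapping candidate
    tokens to scores; the highest-scoring token is appended each step.
    """
    tokens = [bos]
    while len(tokens) < max_len:
        scores = step_fn(encoding, tokens)
        next_token = max(scores, key=scores.get)
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens


def toy_step(encoding, prefix):
    """Hypothetical decoder: emits one item of the encoding per step, then eos."""
    position = len(prefix) - 1  # number of tokens generated after bos
    target = encoding[position] if position < len(encoding) else "</s>"
    vocabulary = set(encoding) | {"</s>"}
    return {token: (1.0 if token == target else 0.0) for token in vocabulary}


if __name__ == "__main__":
    print(greedy_decode(toy_step, ["total", ":", "42"]))
```

A real decoder would score its whole vocabulary with a neural network at each step, and beam search or sampling could replace the greedy `max`.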

Speech Emotion Recognition

Speech Emotion Recognition (SER) is a field of study in artificial intelligence and natural language processing that focuses on identifying and classifying human emotions through speech. By analyzing audio features such as pitch, intensity, speech rate, and spectral characteristics, SER systems can recognize emotional states like happiness, sadness, anger, surprise, fear, and more.
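
As a rough illustration of the audio features mentioned above, the sketch below computes intensity (as RMS energy), zero-crossing rate, and a pitch estimate from a list of samples. It assumes mono audio already loaded as floats; the function names and the autocorrelation-based pitch estimator are illustrative choices, not the method used by any particular SER system.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude, a simple proxy for intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz from the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


if __name__ == "__main__":
    rate = 8000
    # Synthetic 200 Hz tone standing in for a voiced speech frame.
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(rms_energy(tone), zero_crossing_rate(tone), estimate_pitch(tone, rate))
```

In practice such features are computed per short frame and passed to a classifier; the classifier itself is outside the scope of this sketch.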

Artwork Image Generator

Artwork Image Generator is an artificial intelligence model designed to generate artistic images in various styles.

Medical Diagnosis

Medical diagnosis involves the process of identifying diseases, disorders, or conditions in patients based on their symptoms, medical history, physical examinations, and diagnostic tests. It is a complex task that relies on the expertise of healthcare professionals, aided by various tools and technologies.

MediaPipe Face Detection

Face detection is a computer vision technique that involves identifying and locating human faces within an image or video. The goal of face detection is to detect the presence of faces and draw bounding boxes around them, without necessarily identifying specific facial features or landmarks.

Clip Crop

Crop sections out of your image using OpenAI's CLIP and YoloSmall.

Metal Band Logos Classification

The Metal Band Logos Classification task involves the classification or recognition of logos associated with metal music bands. It employs machine learning and computer vision techniques to analyze and categorize the visual characteristics of metal band logos.

Age Prediction

The Age Prediction task involves estimating the age or age range of an individual based on certain input features. It is a machine learning task that utilizes statistical models and algorithms to predict the age-related characteristics of a person.

ZeroShot Image Classification CLIP

ZeroShot Image Classification CLIP is a machine learning and image processing task that predicts the label of an image from a set of candidate classes the model was never explicitly trained to recognize. CLIP does this by comparing the image's embedding with text embeddings of the candidate labels and choosing the closest match.
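
The scoring step of CLIP-style zero-shot classification can be sketched with plain vectors: compute the cosine similarity between one image embedding and one text embedding per candidate label, then apply a softmax. The embeddings below are hand-made toy vectors, not real CLIP outputs, and the `temperature` value is an assumption made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Return a probability for each candidate label.

    label_embeddings: dict mapping a label prompt to its text embedding.
    temperature: scale applied to similarities before the softmax (assumed value).
    """
    logits = {
        label: temperature * cosine_similarity(image_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


if __name__ == "__main__":
    labels = {
        "a photo of a cat": [1.0, 0.0, 0.0],
        "a photo of a dog": [0.0, 1.0, 0.0],
    }
    probs = zero_shot_classify([0.9, 0.1, 0.0], labels)
    print(max(probs, key=probs.get), probs)
```

Because the classes are supplied as text at inference time, new labels can be added by embedding new prompts, without retraining the model.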

Image Generator Using SSD 1B

Image Generator Using SSD 1B is a powerful deep learning model specifically designed for image synthesis and generation.

Attention Maps Exploration by SimPool

The ViT-S model (Vision Transformer-Small) is a variant of the Vision Transformer architecture, which applies the Transformer model to image recognition tasks. SimPool, short for "Simplified Pooling," is a pooling method designed to aggregate information from the ViT-S model's attention maps and produce a fixed-size representation for downstream tasks.
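
To make the idea of pooling many patch tokens into one fixed-size vector concrete, here is a minimal sketch of generic attention-weighted pooling. It is not the SimPool algorithm itself: the choice of the token mean as the query, the scaled dot-product scores, and the function name are all assumptions made for illustration. The returned weights play the role of an attention map over patches.

```python
import math


def attention_pool(tokens):
    """Pool a list of token vectors into one vector of the same dimension.

    Returns (pooled_vector, attention_weights); one weight per token.
    """
    dim = len(tokens[0])
    # Assumed query: the average of all tokens.
    query = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    # Scaled dot-product score between the query and each token.
    scores = [
        sum(q * x for q, x in zip(query, t)) / math.sqrt(dim) for t in tokens
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
    pooled, weights = attention_pool(patches)
    print(pooled, weights)
```

The output dimension equals the token dimension regardless of how many patches go in, which is what makes the representation usable by a fixed-size downstream head.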

Anime Background Style Transfer

Anime backgrounds, also known as anime background art or anime scenery, are the visual elements that form the backdrop of animated scenes in anime. These backgrounds are carefully designed and illustrated to provide the setting, atmosphere, and context for the characters and events within the anime.

Human Activity Recognition

Human Activity Recognition (HAR) is the task of automatically identifying and classifying human activities based on sensor data or input from various sources, such as wearable devices, cameras, or microphones. The goal is to recognize and understand the actions and movements performed by individuals in order to gain insights, monitor behavior, or enable context-aware applications.

ViT ImageNet Classification

Object Classification, also known as Object Recognition, is a computer vision task that involves identifying and categorizing objects within an image or a video frame. The goal is to train a model to recognize and assign labels to different objects or classes present in the visual data.

Image to Anime

The goal of Image to Anime is to create a new version of an image that has the clean lines and characteristic feel of anime productions, capturing the artistry and aesthetics associated with this style.

Image to Sketch

Image to Sketch conversion is a fascinating process that involves transforming regular photographs or digital images into hand-drawn or pencil-like sketches. This technique has gained popularity among artists, designers, and photography enthusiasts as it offers a creative and artistic way to reinterpret and stylize images.

Image Restoration by SRMNet

Image Restoration is a computer vision task that recovers clean images from degraded ones.

Prompt Extend

Prompt Extend is an innovative approach that aims to enhance the capabilities of language models and improve their response generation. It involves extending the initial prompt or query by providing additional context or specifications to guide the model's understanding and generate more accurate and relevant responses.

Dense Prediction for Vision Transformers

Dense Prediction for Vision Transformers is a task focused on applying Vision Transformers (ViTs) to dense prediction problems, such as object detection, semantic segmentation, and depth estimation. Unlike traditional image classification tasks, dense prediction involves making predictions for each pixel or region in an image.
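
The difference from image classification can be sketched as follows: instead of one label per image, a label is chosen for every patch and then expanded to pixel resolution. The example assumes a hypothetical grid of per-patch class scores and uses nearest-neighbour upsampling purely for simplicity; real dense prediction heads typically use learned upsampling instead.

```python
def patch_labels(patch_scores):
    """Pick the highest-scoring class index for every patch.

    patch_scores[row][col] is a list of class scores for that patch.
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in patch_scores
    ]


def upsample_nearest(label_grid, patch_size):
    """Repeat each patch label patch_size times along both axes."""
    pixels = []
    for row in label_grid:
        wide_row = [label for label in row for _ in range(patch_size)]
        pixels.extend(list(wide_row) for _ in range(patch_size))
    return pixels


if __name__ == "__main__":
    # One row of two patches, two classes (0 = background, 1 = object).
    scores = [[[0.1, 0.9], [0.8, 0.2]]]
    grid = patch_labels(scores)
    for pixel_row in upsample_nearest(grid, patch_size=2):
        print(pixel_row)
```

The same pattern applies to depth estimation if the per-patch argmax is replaced by a per-patch regression value.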