Models

Background_Removal

Background Removal is an image processing technique used to separate the main object from the background of a photo. Removing the background helps highlight the product, subject, or character, bringing a professional and aesthetically pleasing look to the image.

pytorch

1.75424 AIOZ ($0.8)

208
28
14
Color_Extraction

Color Extraction is a task in computer vision that involves the extraction and analysis of colors from images or videos. The objective of this task is to identify and isolate specific colors or color ranges present in the visual data.

pytorch

0 AIOZ ($0)

168
4
5
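
As a rough illustration of the color extraction idea described in the Color_Extraction card above, the sketch below finds the dominant colors in a list of RGB pixels by coarse quantization and counting. It is not the method used by the listed model; the helper name `dominant_colors`, the `bucket` size, and the toy pixel data are all assumptions made for the example.

```python
from collections import Counter

def dominant_colors(pixels, bucket=32, top_k=2):
    """Return the top_k most frequent quantized colors as (r, g, b) tuples.

    Hypothetical helper: each channel is snapped down to a multiple of
    `bucket`, so near-identical shades are counted together.
    """
    counts = Counter(
        tuple((channel // bucket) * bucket for channel in pixel)
        for pixel in pixels
    )
    return [color for color, _ in counts.most_common(top_k)]

# Toy "image": three reddish pixels, two bluish pixels, one green pixel.
pixels = [
    (250, 10, 10), (245, 5, 20), (255, 0, 0),
    (10, 10, 240), (0, 20, 250),
    (0, 200, 0),
]
palette = dominant_colors(pixels)
print(palette)
```

A real pipeline would more likely read pixels from an image file and cluster them (for example with k-means) rather than use fixed buckets; the fixed grid is used here only to keep the sketch dependency-free.
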
Image_To_Anime

Image To Anime creates a new version of an image with the clean lines and characteristic feel of anime productions, capturing the artistry and aesthetics associated with this style.

pytorch

2.19280 AIOZ ($1)

141
19
11
ZeroShot_Image_Classification_CLIP

ZeroShot Image Classification CLIP is a machine learning and image processing task that predicts the class or label of an image even though the model was not trained on those classes.

pytorch
transformers
tf
jax

0 AIOZ ($0)

90
9
9
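
The zero-shot idea in the ZeroShot_Image_Classification_CLIP card above can be sketched as comparing one image embedding against a text embedding for each candidate label and taking a softmax over the similarities. The vectors below are hand-made toy values, not real CLIP outputs, and `zero_shot_classify`, the label prompts, and the `temperature` value are assumptions for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def zero_shot_classify(image_vec, label_vecs, temperature=0.1):
    """Return {label: probability} via softmax over scaled cosine similarities."""
    scaled = {
        label: cosine(image_vec, vec) / temperature
        for label, vec in label_vecs.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled.values())
    exps = {label: math.exp(s - top) for label, s in scaled.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy embeddings standing in for text-encoder outputs (assumed values).
label_vecs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
# Toy embedding standing in for the image-encoder output (assumed values).
image_vec = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_vec, label_vecs)
best = max(probs, key=probs.get)
print(best)
```

Because the labels enter only as text embeddings, new classes can be added by adding prompts, without retraining; that is the property the card describes.
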
Image_Restoration_by_SRMNet

Image Restoration is a computer vision task that restores clean images from degraded ones.

pytorch

0 AIOZ ($0)

82
7
9
Anime_Background_Style_Transfer

Anime backgrounds, also known as anime background art or anime scenery, are the visual elements that form the backdrop of animated scenes in anime. These backgrounds are carefully designed and illustrated to provide the setting, atmosphere, and context for the characters and events within the anime.

pytorch

0 AIOZ ($0)

63
1
7

Collections


Token Classification

Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging.
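
To make the NER case concrete, here is a minimal sketch that turns per-token labels into entity spans. It assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside), which the description above does not specify; the function name `group_entities` and the example sentence are hypothetical.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) closes the entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Ada", "Lovelace", "lived", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
entities = group_entities(tokens, tags)
print(entities)
```

In practice the tags would come from a trained model rather than a hand-written list; only the grouping step is shown here.
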

Audio Classification

Audio classification is the task of assigning a label or class to a given audio. It can be used for recognizing which command a user is giving or the emotion of a statement, as well as identifying a speaker.

Depth Estimation

Depth estimation is the task of predicting the depth of the objects present in an image.

Image Feature Extraction

Image feature extraction is the task of extracting the features learned by a computer vision model.

Image-to-Text

Image-to-text is a task at the intersection of natural language processing and computer vision. Its purpose is to convert the information in an image into readable, understandable text.

Object Detection

The Object Detection task is an important task in the fields of computer vision and artificial intelligence. Its main objective is to detect and determine the position of objects within images or videos.

Question Answering

Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!
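
For the extractive case described above, where the answer is retrieved from a given text, a common approach is to score every token as a possible answer start and end and pick the best-scoring span. The sketch below assumes such per-token scores already exist; the scores, the `max_len` limit, and the function name `best_span` are invented for illustration and do not come from any listed model.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (start, end) maximizing start + end score, with start <= end."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Question (for context): "Where is the Eiffel Tower?"
context = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
# Toy per-token scores standing in for model outputs (assumed values).
start_scores = [0.1, 0.2, 0.1, 0.0, 0.3, 2.5, 0.0]
end_scores = [0.0, 0.1, 0.3, 0.0, 0.1, 2.8, 0.2]

start, end = best_span(start_scores, end_scores)
answer = " ".join(context[start:end + 1])
print(answer)
```
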

Latest


ZeroShot_Image_Classification_CLIP

ZeroShot Image Classification CLIP is a machine learning and image processing task that predicts the class or label of an image even though the model was not trained on those classes.

90
9
9
Image_Restoration_by_SRMNet

Image Restoration is a computer vision task that restores clean images from degraded ones.

82
7
9
Jaks_Woolitize_Image_Generator

Jak's Woolitize Image Generator

Jak's Woolitize Image Generator is a text-to-image task that gives generated images a "woolitize" texture and appearance, producing images that convey warmth.

20
3
4
Image_To_Anime

Image To Anime creates a new version of an image with the clean lines and characteristic feel of anime productions, capturing the artistry and aesthetics associated with this style.

141
19
11
Document_Parsing_by_Donut

Document Parsing by Donut

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size); the decoder then autoregressively generates text conditioned on that encoding.

16
0
0
Image_to_Text_by_Pix2Struct

Image to Text by Pix2Struct

Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.

10
0
0
Document_Visual_Question_Answering

Document Visual Question Answering

Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Kim et al. Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape (batch_size, seq_len, hidden_size); the decoder then autoregressively generates text conditioned on that encoding.

11
0
0
Dense_Prediction_for_Vision_Transformers

Dense Prediction for Vision Transformers

Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021). DPT uses the Vision Transformer (ViT) as its backbone and adds a neck and head on top for monocular depth estimation. This repository hosts the "hybrid" version of the model described in the paper. DPT-Hybrid differs from DPT by using ViT-hybrid as its backbone and taking some activations from the backbone.

10
0
0
Image_Restoration_by_Deblur

Image Restoration by Deblur

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

13
0
0
Chest_X-rays_classification_with_ViT

Chest X-rays classification with ViT applies a vision transformer (ViT) to classify pneumonia from chest X-ray images.

28
2
5
Image_Restoration_by_Deraindrop

Image Restoration by Deraindrop

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

10
0
0
Speed_Recognition_by_Fairseq_S2T

Speech Recognition by Fairseq S2T

S2T (speech-to-text) is an end-to-end sequence-to-sequence transformer model. It is trained with a standard autoregressive cross-entropy loss and generates transcripts autoregressively.

15
0
0
Image_Generator_Using_SSD_1B

Image Generator Using SSD 1B

Image Generator Using SSD 1B is a deep learning model designed for image synthesis and generation.

52
5
5
Anime_Style_Image_Generator

Anime Style Image Generator

This task generates anime-style images from text descriptions or prompts.

22
5
3
Longformer_for_SQuADv2

Longformer for SQuADv2

Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.

13
0
0
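
The attention pattern described in the Longformer for SQuADv2 card above can be sketched as a boolean mask that combines a sliding local window with a few user-chosen global positions. This is an illustrative construction only, not the library's implementation; `longformer_mask`, the window size, and the choice of position 0 as global are assumptions for the example.

```python
def longformer_mask(seq_len, window, global_positions):
    """Return a seq_len x seq_len matrix of booleans.

    mask[i][j] is True when token i may attend to token j: either j lies
    within window // 2 positions of i (local attention), or i or j is a
    global position (global attention is symmetric).
    """
    half = window // 2
    global_set = set(global_positions)
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            local = abs(i - j) <= half
            is_global = i in global_set or j in global_set
            row.append(local or is_global)
        mask.append(row)
    return mask

# Six tokens, a window of 2 (one neighbor on each side), and token 0 global,
# as one might do for the question tokens in a question answering task.
mask = longformer_mask(seq_len=6, window=2, global_positions=[0])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

The number of allowed pairs grows linearly with sequence length for a fixed window and a fixed number of global positions, which is what makes this pattern attractive for long documents.
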
Text_To_Sound

Text To Sound

The Text-to-Sound task involves converting written text into audible speech. It is a technology that utilizes natural language processing and speech synthesis techniques to transform written words into a spoken form. This task plays a crucial role in various applications, such as text-to-speech (TTS) systems, accessibility tools for visually impaired individuals, voice assistants, and automated voice response systems.

0
0
0