Models

Color_Extraction
Color Extraction

Color Extraction is a task in computer vision that involves the extraction and analysis of colors from images or videos. The objective of this task is to identify and isolate specific colors or color ranges present in the visual data.

mit
Image-to-Text
PyTorch
English

by @AIOZNetwork

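As an illustration of the idea, the sketch below pulls the dominant colors out of an image by clustering its pixels with k-means. It is a generic approximation rather than the listed model's implementation; the file name, the resize step, and the choice of k = 5 clusters are assumptions.

    # Hedged sketch: dominant-color extraction via k-means over RGB pixels.
    # Not the Color_Extraction model itself; k and the resize are assumptions.
    import numpy as np
    from PIL import Image
    from sklearn.cluster import KMeans

    def dominant_colors(path, k=5):
        """Return the k most prominent RGB colors, most frequent first."""
        img = Image.open(path).convert("RGB").resize((128, 128))   # downscale for speed
        pixels = np.asarray(img).reshape(-1, 3).astype(np.float32)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
        counts = np.bincount(km.labels_, minlength=k)               # pixels per cluster
        order = np.argsort(counts)[::-1]
        return [tuple(int(c) for c in km.cluster_centers_[i]) for i in order]

    print(dominant_colors("photo.jpg"))  # e.g. [(212, 180, 140), ...]
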
Background_Removal
Background Removal

Background Removal is an image processing technique used to separate the main object from the background of a photo. Removing the background helps highlight the product, subject, or character, bringing a professional and aesthetically pleasing look to the image.

apache-2.0
Image-to-Image
PyTorch
English

by @AIOZNetwork

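For a sense of how such a tool is used, the snippet below removes a background with the open-source rembg package, which wraps a U^2-Net segmentation model. It is a stand-in, not the listed model, and the file names are placeholders.

    # Illustrative background removal with the open-source `rembg` package
    # (a U^2-Net-based matting tool), not the AIOZNetwork model itself.
    from PIL import Image
    from rembg import remove

    foreground = remove(Image.open("product.jpg"))   # RGBA image, background made transparent
    foreground.save("product_no_bg.png")             # save as PNG to keep the alpha channel
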
Image_To_Anime
Image to Anime

The goal of Image to Anime is to create a new version of an input image with the clean lines and characteristic feel of anime productions, capturing the unique artistry and aesthetics associated with this style.

mit
Image-to-Image
PyTorch
English

by @AIOZNetwork

ZeroShot_Image_Classification_CLIP
ZeroShot Image Classification CLIP

ZeroShot Image Classification CLIP is a task in machine learning and image processing that aims to predict the class or label of an image for categories the model was never explicitly trained on.

mit
Zero-Shot Image Classification
PyTorch
Transformers
TensorFlow
JAX
English

by @AIOZNetwork

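A minimal sketch of zero-shot classification with CLIP through the Hugging Face transformers API is shown below; the openai/clip-vit-base-patch32 checkpoint and the candidate labels are assumptions, used only to demonstrate how an image is scored against arbitrary text labels.

    # Hedged sketch: zero-shot image classification with CLIP via transformers.
    # The checkpoint and candidate labels are assumptions, not this model's config.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
    inputs = processor(text=labels, images=Image.open("image.jpg"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image        # similarity of the image to each label
    probs = logits.softmax(dim=-1)[0]
    print({label: round(float(p), 3) for label, p in zip(labels, probs)})
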
Anime_Background_Style_Transfer
Anime Background Style Transfer

Anime backgrounds, also known as anime backgrounds art or anime scenery, refer to the visual elements that form the backdrop of animated scenes in anime. These backgrounds are carefully designed and illustrated to provide the setting, atmosphere, and context for the characters and events within the anime.

mit
Image-to-Image
PyTorch
English

Image_Restoration_by_SRMNet
Image Restoration by SRMNet

Image Restoration is a computer vision task that restores degraded images to clean images.

apache-2.0
Image-to-Image
PyTorch
English

by @AIOZNetwork


Collections


Image-to-Text

Image-to-Text is an important task in natural language processing and computer vision. Its purpose is to convert the information within an image into readable, understandable text.

Zero-Shot Image Classification

Zero-Shot Image Classification is an important task in image processing and artificial intelligence. It aims to classify images into categories on which the model has never been trained.

Object Detection

Object Detection is an important task in computer vision and artificial intelligence. Its main objective is to detect and locate objects within images or videos.

Document Question Answering

Document Question Answering (DQA) is a task in natural language processing and information retrieval that focuses on automatically generating accurate, relevant answers to questions based on a given document.

Image Segmentation

Image Segmentation is an important task in computer vision and image processing. Its main objective is to segment and classify different regions within an image, delineating the boundaries of objects.

Image to Image

Image-to-Image is an important task in image processing in which an input image is transformed into another image, for example to change its style or restore its quality.

Text Generation

Text Generation is an important task in natural language processing and artificial intelligence. It aims to automatically generate text from input data, including descriptions, stories, articles, or other kinds of text.

Text to Image

Text-to-Image is an important task in artificial intelligence and natural language processing. It aims to create images from descriptive text.

Latest


Artwork_Image_Generator
Artwork Image Generator

Artwork Image Generator is an artificial intelligence model designed to generate artistic images in various styles.
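
As a rough usage sketch (not the listed model), artistic images can be generated from a text prompt with a diffusion pipeline from the diffusers library; the stabilityai/stable-diffusion-2-1 checkpoint, the prompt, and the use of a CUDA GPU are assumptions.

    # Hedged sketch: text-to-artwork generation with a public diffusion checkpoint.
    # Checkpoint, prompt, and GPU usage are assumptions, not this model's setup.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("an impressionist painting of a harbour at sunset").images[0]
    image.save("artwork.png")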

Attention_Maps_Exploration_by_SimPool
Attention Maps Exploration by SimPool

The ViT-S model (Vision Transformer-Small) is a variant of the Vision Transformer architecture, which applies the Transformer model to image recognition tasks. SimPool, short for "Simplified Pooling," is a pooling method designed to aggregate information from the ViT-S model's attention maps and produce a fixed-size representation for downstream tasks.

ZeroShot_Image_Classification_CLIP
ZeroShot Image Classification CLIP

ZeroShot Image Classification CLIP is a task in machine learning and image processing that aims to predict the class or label of an image for categories the model was never explicitly trained on.

Image_Restoration_by_SRMNet
Image Restoration by SRMNet

Image Restoration is a computer vision task that restores degraded images to clean images.

Prompt_Extend
Prompt Extend

Prompt Extend enhances a language model's response generation by extending the initial prompt or query with additional context or specifications, guiding the model toward more accurate and relevant responses.

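A small sketch of the idea is given below using a text-generation pipeline; the daspartho/prompt-extend checkpoint (a community model trained to append style details to image prompts) and the generation settings are assumptions.

    # Hedged sketch: extending a short prompt with extra descriptive context.
    # The checkpoint and generation parameters are assumptions.
    from transformers import pipeline

    extender = pipeline("text-generation", model="daspartho/prompt-extend")
    seed = "a castle on a hill"
    out = extender(seed, max_new_tokens=32, num_return_sequences=1)[0]["generated_text"]
    print(out)  # the seed prompt followed by generated style and detail modifiers
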
Jaks_Woolitize_Image_Generator

Jak's Woolitize Image Generator

Jak's Woolitize Image Generator is a text-to-image task that applies a woolitize texture and appearance to generated images, creating images that convey warmth.

Image_To_Anime
Image to Anime

The goal of Image to Anime is to create a new version of an input image with the clean lines and characteristic feel of anime productions, capturing the unique artistry and aesthetics associated with this style.

Clip_Crop
Clip Crop

Extract sections of an image by using OpenAI's CLIP and YoloSmall.

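The sketch below illustrates the same two-stage idea with public checkpoints: YOLOS-small proposes candidate boxes and CLIP ranks the crops against a text query. The checkpoints, the 0.5 detection threshold, and the file names are assumptions, not necessarily what Clip Crop uses.

    # Hedged sketch: detect candidate regions with YOLOS-small, then pick the crop
    # that best matches a text query according to CLIP. Checkpoints and the 0.5
    # threshold are assumptions; assumes at least one box is detected.
    import torch
    from PIL import Image
    from transformers import (AutoImageProcessor, AutoModelForObjectDetection,
                              CLIPModel, CLIPProcessor)

    image = Image.open("street.jpg").convert("RGB")
    query = "a red bicycle"

    det_processor = AutoImageProcessor.from_pretrained("hustvl/yolos-small")
    detector = AutoModelForObjectDetection.from_pretrained("hustvl/yolos-small")
    det_out = detector(**det_processor(images=image, return_tensors="pt"))
    boxes = det_processor.post_process_object_detection(
        det_out, threshold=0.5, target_sizes=[image.size[::-1]])[0]["boxes"]

    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    crops = [image.crop(tuple(map(int, box.tolist()))) for box in boxes]
    clip_inputs = clip_processor(text=[query], images=crops,
                                 return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = clip(**clip_inputs).logits_per_text[0]   # one score per crop
    crops[int(scores.argmax())].save("best_crop.png")
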
Document_Parsing_by_Donut
Document Parsing by Donut

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
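
The snippet below follows the public naver-clova-ix Donut examples to show that encoder/decoder flow; the CORD-v2 receipt-parsing checkpoint, the <s_cord-v2> task prompt, and the input file are assumptions about which fine-tuned variant is used.

    # Hedged sketch of Donut document parsing via transformers, following the public
    # naver-clova-ix example. Checkpoint, task prompt, and file name are assumptions.
    import re
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    ckpt = "naver-clova-ix/donut-base-finetuned-cord-v2"
    processor = DonutProcessor.from_pretrained(ckpt)
    model = VisionEncoderDecoderModel.from_pretrained(ckpt)

    image = Image.open("receipt.png").convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values          # vision encoder input
    decoder_input_ids = processor.tokenizer("<s_cord-v2>", add_special_tokens=False,
                                            return_tensors="pt").input_ids     # task start prompt
    outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=768)

    sequence = processor.batch_decode(outputs)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
        processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()   # drop the task start token
    print(processor.token2json(sequence))                        # structured fields as JSON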

Image_to_Text_by_Pix2Struct

Image to Text by Pix2Struct

Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.
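
As a brief illustration of the captioning use, the sketch below runs a public Pix2Struct captioning checkpoint; the google/pix2struct-textcaps-base name and the input image are assumptions.

    # Hedged sketch: image captioning with a public Pix2Struct checkpoint.
    # Checkpoint name and input file are assumptions.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained("google/pix2struct-textcaps-base")
    model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-textcaps-base")

    inputs = processor(images=Image.open("figure.png"), return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(generated[0], skip_special_tokens=True))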

Document_Visual_Question_Answering

Document Visual Question Answering

Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook Kim et al. Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
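
The question is passed to the decoder as a task prompt, as in the sketch below, which follows the public naver-clova-ix/donut-base-finetuned-docvqa model card; the checkpoint, prompt format, and file name are taken from that card and should be treated as assumptions here.

    # Hedged sketch of document VQA with Donut: the question is wrapped in the
    # DocVQA task prompt and the decoder generates the answer. Checkpoint, prompt
    # format, and file name are assumptions based on the public model card.
    import re
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    ckpt = "naver-clova-ix/donut-base-finetuned-docvqa"
    processor = DonutProcessor.from_pretrained(ckpt)
    model = VisionEncoderDecoderModel.from_pretrained(ckpt)

    image = Image.open("invoice.png").convert("RGB")
    question = "What is the invoice number?"
    prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
    decoder_input_ids = processor.tokenizer(prompt, add_special_tokens=False,
                                            return_tensors="pt").input_ids
    pixel_values = processor(image, return_tensors="pt").pixel_values

    outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=128)
    sequence = processor.batch_decode(outputs)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
        processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()   # drop the task start token
    print(processor.token2json(sequence))                        # {'question': ..., 'answer': ...}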

Dense_Prediction_for_Vision_Transformers
Dense Prediction for Vision Transformers

Dense Prediction for Vision Transformers is a task focused on applying Vision Transformers (ViTs) to dense prediction problems, such as object detection, semantic segmentation, and depth estimation. Unlike traditional image classification tasks, dense prediction involves making predictions for each pixel or region in an image.
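
One concrete dense-prediction example is monocular depth estimation with DPT (Dense Prediction Transformer); the sketch below uses the public Intel/dpt-large checkpoint, which is an assumption and not necessarily the model behind this listing.

    # Hedged sketch: per-pixel depth estimation with a DPT checkpoint via transformers.
    # The Intel/dpt-large checkpoint and input file are assumptions.
    import torch
    from PIL import Image
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
    model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

    inputs = processor(images=Image.open("scene.jpg"), return_tensors="pt")
    with torch.no_grad():
        depth = model(**inputs).predicted_depth   # one depth value per (downsampled) pixel
    print(depth.shape)                            # e.g. torch.Size([1, 384, 384])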

Image_Restoration_by_Deblur

Image Restoration by Deblur

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

Chest_X-rays_classification_with_ViT
Chest X-rays classification with ViT

Chest X-rays classification with ViT applies a Vision Transformer (ViT) to chest X-ray images, for example to classify scans as showing pneumonia or not, leveraging the strengths of transformer-based architectures in image analysis tasks.
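
For reference, the general ViT classification flow looks like the sketch below; the google/vit-base-patch16-224 checkpoint is a generic ImageNet classifier used only to show the API, not a medical model, and the file name is a placeholder.

    # Hedged sketch of the generic ViT image-classification API. The checkpoint is a
    # general-purpose ImageNet model, not a chest X-ray classifier.
    from PIL import Image
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
    model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

    inputs = processor(images=Image.open("xray.png").convert("RGB"), return_tensors="pt")
    logits = model(**inputs).logits
    print(model.config.id2label[int(logits.argmax(-1))])   # predicted class label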

Image_Restoration_by_Deraindrop

Image Restoration by Deraindrop

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

Speed_Recognition_by_Fairseq_S2T

Speech Recognition by Fairseq S2T

S2T is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
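
A minimal transcription sketch using the transformers port of fairseq S2T is shown below; the facebook/s2t-small-librispeech-asr checkpoint, the use of soundfile for loading audio, and the 16 kHz mono input are assumptions.

    # Hedged sketch: speech-to-text with the transformers port of fairseq S2T.
    # Checkpoint, audio loader, and 16 kHz mono input are assumptions.
    import soundfile as sf
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    ckpt = "facebook/s2t-small-librispeech-asr"
    model = Speech2TextForConditionalGeneration.from_pretrained(ckpt)
    processor = Speech2TextProcessor.from_pretrained(ckpt)

    speech, sampling_rate = sf.read("speech.wav")            # expects 16 kHz mono audio
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    generated_ids = model.generate(inputs["input_features"],
                                   attention_mask=inputs["attention_mask"])
    print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])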