Models

Background_Replacement
Background Replacement

Background Replacement lets users swap the background of an image for a different one, which is useful for creative edits and visual enhancement.

DPT_Depth_Estimation
DPT Depth Estimation

The Depth Estimation task involves determining the distance or depth information of objects in a given scene or image. It is a computer vision task that utilizes various techniques and algorithms to estimate the relative distances between objects and their positions in three-dimensional space.

Jaks_Woolitize_Image_Generator

Jak's Woolitize Image Generator

Jak's Woolitize Image Generator is a text-to-image task that applies a wool-like ("woolitize") texture to generated images, giving them a warm appearance.

Story_Generator

Story Generator

This task aims to create stories or paragraphs automatically based on pre-programmed patterns and rules.

Anime_Style_Image_Generator

Anime Style Image Generator

This task generates anime-style images from text descriptions or prompts.

Speed_Recognition_by_Fairseq_S2T

Speech Recognition by Fairseq S2T

S2T is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
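
As a rough illustration of what "generates the transcripts autoregressively" means, the sketch below runs greedy decoding: each new token is chosen given the encoder output and the tokens produced so far. The names `greedy_decode` and `toy_next_token_probs` are hypothetical, and the toy function is only a stand-in for a trained decoder; real systems often use beam search instead of greedy selection.

```python
def greedy_decode(next_token_probs, encoder_states, bos, eos, max_len=20):
    """Generate tokens one at a time, feeding each choice back as context."""
    tokens = [bos]
    for _ in range(max_len):
        # Distribution over the next token, given the audio encoding and
        # everything generated so far.
        probs = next_token_probs(encoder_states, tokens)
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == eos:
            break
    return tokens


# Hypothetical stand-in for a trained decoder: it always prefers the next
# word of a fixed script, regardless of the (unused) encoder states.
SCRIPT = ["hello", "world", "</s>"]


def toy_next_token_probs(encoder_states, prefix):
    step = len(prefix) - 1  # number of tokens generated after <s>
    target = SCRIPT[min(step, len(SCRIPT) - 1)]
    return {tok: (0.8 if tok == target else 0.1) for tok in SCRIPT}


transcript = greedy_decode(toy_next_token_probs, None, bos="<s>", eos="</s>")
print(transcript)
```

During training, the same per-step distributions are scored against the reference transcript with cross-entropy loss rather than being sampled from.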

TAPEX_Table_Pre-training_via_Learning_a_Neural_SQL_Executor

TAPEX: Table Pre-training via Learning a Neural SQL Executor

TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
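
To make the idea of a synthetic SQL-execution corpus concrete, here is a minimal sketch of how one training pair could be produced: a real SQL engine (here `sqlite3`) executes a query over a small table, and the query plus a flattened copy of the table becomes the model input while the execution result becomes the target. The helper names and the exact flattening format are assumptions for illustration, not the authors' pipeline.

```python
import sqlite3


def flatten_table(headers, rows):
    """Linearize a table into one string (format is an assumption)."""
    parts = ["col : " + " | ".join(headers)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


def make_pretraining_example(headers, rows, sql):
    """Execute `sql` against the table and return a (source, target) pair."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{h}"' for h in headers)
    conn.execute(f"CREATE TABLE t ({cols})")
    placeholders = ", ".join("?" for _ in headers)
    conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    # Model input: the query followed by the flattened table.
    source = sql + " " + flatten_table(headers, rows)
    # Model target: the execution result the model should learn to produce.
    target = ", ".join(str(v) for row in result for v in row)
    return source, target


headers = ["city", "population"]
rows = [("Oslo", 700000), ("Bergen", 285000)]
source, target = make_pretraining_example(
    headers, rows, "SELECT city FROM t WHERE population > 500000"
)
print(source)
print(target)
```

A model trained on many such pairs has to imitate the SQL executor, which is the sense in which it learns table reasoning.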

TAPAS_Weakly_Supervised_Table_Parsing_via_Pre-training

TAPAS: Weakly Supervised Table Parsing via Pre-training

TAPAS is a BERT-like transformer model pretrained in a self-supervised fashion on a large corpus of English data from Wikipedia. It was pretrained on raw tables and their associated text only, with no human labelling (which is why it can use large amounts of publicly available data); an automatic process generates the inputs and labels from that text.

Image_Restoration_by_Deraindrop

Image Restoration by Deraindrop

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

Image_Restoration_by_Deblur

Image Restoration by Deblur

CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.

Longformer_for_SQuADv2

Longformer for SQuADv2

Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
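
The attention pattern described above can be sketched as a boolean mask. This is an illustrative reconstruction, not the library's implementation: `longformer_attention_mask` and its parameters are hypothetical names, and the example assumes global attention on the first two tokens, standing in for question tokens in a question-answering input.

```python
def longformer_attention_mask(seq_len, window, global_positions):
    """Return a seq_len x seq_len matrix; mask[i][j] is True when token i
    may attend to token j."""
    half = window // 2  # tokens visible on each side of the current one
    global_set = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            local = abs(i - j) <= half
            # Global tokens attend to every token, and every token
            # attends to them.
            is_global = i in global_set or j in global_set
            mask[i][j] = local or is_global
    return mask


# Assumed setup: 8 tokens, window of 2, global attention on tokens 0 and 1.
mask = longformer_attention_mask(seq_len=8, window=2, global_positions=[0, 1])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because most rows contain only a fixed-width band plus a few global columns, the number of allowed pairs grows roughly linearly with sequence length instead of quadratically.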

Image_to_Text_by_Pix2Struct

Image to Text by Pix2Struct

Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.

Document_Visual_Question_Answering

Document Visual Question Answering

Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Geewook Kim et al. Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings of shape (batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoder's output.

Image_Captioning_using_ViT_and_GPT2

Image Captioning using ViT and GPT2

Image captioning refers to the process of generating a descriptive and meaningful textual description for an image. It involves utilizing computer vision techniques and natural language processing to analyze the visual content of an image and generate a coherent caption that accurately represents its content.

Filling_Mask_with_Bert

Filling Mask with BERT

Fill-mask is a masked language modeling task: one or more words in a sentence are replaced with a mask token, and the model predicts the most likely words for those positions. BERT was pretrained with this objective, which makes it useful for completing sentences with missing words.

Image-Guided_Object_Detection_with_OWL-ViT

Image-Guided Object Detection with OWL-ViT

You can use OWL-ViT to query images with text descriptions of any object or, alternatively, with an example (query) image of the target object. To use it, upload an image and a query image that contains only the object you're looking for. You can also use the score and non-maximum suppression threshold sliders to filter out low-probability and overlapping bounding box predictions.
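
The effect of the two thresholds can be sketched in a few lines. This is a generic score filter followed by greedy non-maximum suppression, written under the assumption that boxes are `(x1, y1, x2, y2)` tuples; `iou` and `filter_detections` are hypothetical helpers, and the demo's actual post-processing may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_threshold, nms_threshold):
    """Return indices of boxes that survive both thresholds."""
    # Step 1: drop low-probability predictions, best score first.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap a better kept box
    # by more than the NMS threshold.
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (200, 200, 220, 220)]
scores = [0.9, 0.8, 0.75, 0.05]
kept = filter_detections(boxes, scores, score_threshold=0.1, nms_threshold=0.3)
print(kept)
```

In this example the second box overlaps the first heavily and is suppressed, and the last box is dropped for its low score.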

DistilBERT_for_Sentiment_Analysis
DistilBERT for Sentiment Analysis

Sentiment classification is the automated process of labelling text as positive, negative, or neutral based on the opinions it expresses.

ZeroShot_Text_Classification
NLI-based Zero Shot Text Classification

Zero-shot text classification is a technique used in natural language processing (NLP) to classify text into predefined categories without requiring any labeled training data for those specific categories.
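
In the NLI-based variant, each candidate label is turned into a hypothesis sentence, a natural language inference model scores how strongly the input text entails it, and the scores are normalized across labels. The sketch below shows that flow only; `toy_entailment_score` is a hypothetical word-overlap stand-in for a real NLI model, and the hypothesis template is an assumption.

```python
import math


def toy_entailment_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model's entailment logit:
    counts the words shared by premise and hypothesis."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().rstrip(".").split())
    return float(len(premise_words & hypothesis_words))


def zero_shot_classify(text, labels, score_fn=toy_entailment_score,
                       template="This text is about {}."):
    """Rank labels by the softmax of their entailment scores."""
    # One hypothesis per candidate label; no label-specific training data.
    hypotheses = [template.format(label) for label in labels]
    logits = [score_fn(text, h) for h in hypotheses]
    # Numerically stable softmax over the labels.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


ranking = zero_shot_classify(
    "the team won the football match", ["football", "cooking", "politics"]
)
print(ranking)
```

Swapping `score_fn` for a function backed by a trained NLI model is what would make the ranking meaningful on arbitrary labels.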

Vehicle_Classification
Vehicle Classification

The Vehicle Classification task involves automatically categorizing vehicles based on their visual characteristics, such as shape, size, and appearance. It combines computer vision techniques and machine learning algorithms to analyze images containing vehicles and assign them to specific classes or categories.

Musical_Instrument_Classification
Musical Instrument Classification

Musical instrument classification is the task of automatically recognizing and categorizing different musical instruments from audio recordings or spectrograms. It involves identifying the unique characteristics and sound patterns associated with each instrument to determine its class or type.