Models

Background Removal

Background Removal is an image processing technique used to separate the main object from the background of a photo. Removing the background helps highlight the product, subject, or character, bringing a professional and aesthetically pleasing look to the image.
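In practice, a segmentation model predicts a foreground mask and the background is removed by turning that mask into an alpha channel. A minimal NumPy sketch of that last step (the mask here is hand-written; a real model such as a matting network would predict it):

```python
import numpy as np

def apply_mask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Turn an H x W x 3 image plus an H x W foreground mask (0..1)
    into an H x W x 4 RGBA image with the background made transparent."""
    alpha = (mask * 255).astype(np.uint8)           # mask becomes the alpha channel
    rgba = np.dstack([rgb.astype(np.uint8), alpha])
    return rgba

# Toy 2x2 image: left column is "foreground", right column "background".
img = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
out = apply_mask(img, mask)
print(out.shape)      # (2, 2, 4)
print(out[0, 1, 3])   # 0 -> background pixel is fully transparent
```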

Longformer for SQuADv2

Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
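The combination of local and global attention can be pictured as a boolean mask: each token attends to a sliding window of neighbours, while a few user-designated global tokens attend to (and are attended by) every position. A NumPy sketch (the window size and global positions are illustrative, not Longformer's defaults):

```python
import numpy as np

def longformer_mask(seq_len: int, window: int, global_idx: list[int]) -> np.ndarray:
    """mask[i, j] is True when token i may attend to token j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True              # sliding-window (local) attention
    mask[global_idx, :] = True             # global tokens see everything
    mask[:, global_idx] = True             # and everything sees them
    return mask

m = longformer_mask(seq_len=8, window=1, global_idx=[0])  # e.g. the question tokens
print(m[3, 2], m[3, 5], m[5, 0])   # True False True
```

For SQuAD-style QA, the question tokens are typically the ones marked global, so every passage token can condition on the question.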

Midjourney Prompt Generator

Midjourney Prompt Generator's mission is to generate creative, detailed text prompts for the Midjourney image-generation tool, helping users explore new ideas and visual possibilities.

Jak's Woolitize Image Generator

Jak's Woolitize Image Generator is a text-to-image model that applies a "woolitize" (wool-like) texture and appearance to generated images, creating pictures that convey warmth.

Image Generator Using SSD 1B

Image Generator Using SSD 1B is a powerful deep learning model specifically designed for image synthesis and generation.

Text To Sound

The Text-to-Sound task involves converting written text into audible speech. It is a technology that utilizes natural language processing and speech synthesis techniques to transform written words into a spoken form. This task plays a crucial role in various applications, such as text-to-speech (TTS) systems, accessibility tools for visually impaired individuals, voice assistants, and automated voice response systems.

Anime Style Image Generator

The task is a powerful tool that generates anime-style images based on text descriptions or prompts.

Story Generator

This task aims to create stories or paragraphs automatically based on pre-programmed patterns and rules.

Speech Recognition by Fairseq S2T

S2T is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
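"Generates the transcripts autoregressively" means the decoder repeatedly feeds its own output back in and picks the next token until it emits end-of-sequence. A toy greedy-decoding loop, where a scripted stand-in replaces the real S2T decoder conditioned on audio features:

```python
import numpy as np

BOS, EOS = 0, 1

def greedy_decode(step_logits, max_len=10):
    """Autoregressive greedy decoding: extend the prefix one argmax
    token at a time, stopping at EOS or max_len."""
    tokens = [BOS]
    for _ in range(max_len):
        logits = step_logits(tokens)       # stand-in for decoder(audio, prefix)
        nxt = int(np.argmax(logits))
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens

# Toy "model": emit tokens 5, 6, 7, then EOS, regardless of the prefix.
script = {1: 5, 2: 6, 3: 7, 4: EOS}
toy = lambda prefix: np.eye(8)[script.get(len(prefix), EOS)]
print(greedy_decode(toy))   # [0, 5, 6, 7, 1]
```

At training time, the same model is optimized with standard cross-entropy against the reference transcript rather than its own predictions.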

TAPEX: Table Pre-training via Learning a Neural SQL Executor

TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
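The synthetic-corpus recipe can be shown end-to-end: synthesize an executable SQL query over a table, run it through a real executor to get the answer, and use the (table, query, answer) triple as a pre-training example. A toy sketch using sqlite3 as the executor (the table and query are made up for illustration):

```python
import sqlite3

# A toy table; TAPEX would see it flattened into text.
rows = [("ramp", 2008), ("slide", 2012), ("swing", 2012)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE park (item TEXT, year INTEGER)")
con.executemany("INSERT INTO park VALUES (?, ?)", rows)

# A synthesized query plus its executed answer = one pre-training pair.
query = "SELECT COUNT(*) FROM park WHERE year = 2012"
answer = con.execute(query).fetchone()[0]
example = {"table": rows, "sql": query, "label": str(answer)}
print(example["label"])   # 2
```

The model is trained to map (flattened table, query) to the executor's answer, which is how it acquires table-reasoning skills without human labels.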

TAPAS: Weakly Supervised Table Parsing via Pre-training

TAPAS is a BERT-like transformer model pretrained in a self-supervised fashion on a large corpus of English tables and associated texts from Wikipedia. It was pretrained on the raw tables and texts only, with no human labelling (which is why it can use lots of publicly available data), using an automatic process to generate inputs and labels from those texts.
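To feed a table to a BERT-like model, TAPAS linearizes it: question tokens first, then every cell row by row, with extra embeddings recording each token's row and column. A simplified flattening sketch (the special tokens follow BERT conventions; the row/column bookkeeping here is illustrative, not TAPAS's exact tokenizer):

```python
def flatten_for_tapas(question: str, header: list[str], rows: list[list[str]]):
    """Produce (tokens, row_ids, col_ids) the way TAPAS-style models
    linearize a table; 0 means 'not a table cell'."""
    tokens = ["[CLS]"] + question.split() + ["[SEP]"]
    row_ids = [0] * len(tokens)
    col_ids = [0] * len(tokens)
    for r, row in enumerate([header] + rows):
        for c, cell in enumerate(row, start=1):
            for tok in cell.split():
                tokens.append(tok)
                row_ids.append(r)          # header counts as row 0
                col_ids.append(c)
    return tokens, row_ids, col_ids

toks, rids, cids = flatten_for_tapas(
    "how many goals", ["player", "goals"], [["Ada", "3"], ["Ben", "5"]])
print(toks[:5])   # ['[CLS]', 'how', 'many', 'goals', '[SEP]']
```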

Image Restoration by Deraindrop

CMFNet achieves competitive performance on three image-restoration tasks: deblurring, dehazing, and raindrop removal. This model applies it to raindrop removal.

Image Restoration by Deblur

CMFNet achieves competitive performance on three image-restoration tasks: deblurring, dehazing, and raindrop removal. This model applies it to deblurring.

Document Parsing by Donut

Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
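That data flow can be checked at the shape level: the encoder turns the image into a (batch_size, seq_len, hidden_size) tensor, and each decoder step cross-attends over it while extending the text. A NumPy sketch with random stand-ins for the real Swin/BART weights:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 1, 196, 64   # e.g. a 14x14 grid of image patches

# Stand-in for the Swin encoder: image -> tensor of embeddings.
encoder_out = rng.normal(size=(batch, seq_len, hidden))

def cross_attend(decoder_state, enc):
    """One decoder step: softmax-weighted sum over the encoder embeddings."""
    scores = enc @ decoder_state / np.sqrt(hidden)          # (batch, seq_len)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return (weights[..., None] * enc).sum(axis=1)           # (batch, hidden)

state = rng.normal(size=hidden)       # stand-in for the current decoder state
context = cross_attend(state, encoder_out)
print(encoder_out.shape, context.shape)   # (1, 196, 64) (1, 64)
```

The decoder repeats this step autoregressively, so every generated token is conditioned on the full image encoding.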

Image to Text by Pix2Struct

Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.

Document Visual Question Answering

Donut model fine-tuned on DocVQA. It was introduced in the paper "OCR-free Document Understanding Transformer" by Kim et al. As above, Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART): the encoder maps the image to a tensor of embeddings, and the decoder autoregressively generates the answer conditioned on that encoding.

Dense Prediction for Vision Transformers

Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021). DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for monocular depth estimation. This repository hosts the "hybrid" version of the model as stated in the paper. DPT-Hybrid diverges from DPT by using ViT-hybrid as a backbone and taking some activations from the backbone.
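The backbone/head split can be sketched at the shape level: the ViT backbone produces one feature vector per image patch, and a depth head maps each to a scalar, yielding a coarse depth map that the real model then fuses and upsamples. A toy sketch with a random linear head (actual DPT necks fuse features from several transformer stages):

```python
import numpy as np

rng = np.random.default_rng(1)
patches_per_side, hidden = 14, 32   # e.g. 14x14 patches from a 224x224 image

vit_features = rng.normal(size=(patches_per_side**2, hidden))  # one vector per patch
head_w = rng.normal(size=hidden)                               # toy depth head

depth = vit_features @ head_w                                  # one scalar per patch
depth_map = depth.reshape(patches_per_side, patches_per_side)  # coarse depth map
print(depth_map.shape)   # (14, 14)
```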

Image Captioning using ViT and GPT2

Image captioning refers to the process of generating a descriptive and meaningful textual description for an image. It involves utilizing computer vision techniques and natural language processing to analyze the visual content of an image and generate a coherent caption that accurately represents its content.

Filling Mask with BERT

Fill-mask is the task of predicting tokens that have been hidden (masked) in a sentence. BERT is pretrained with exactly this masked-language-modeling objective: given an input like "The capital of France is [MASK].", it scores vocabulary items for the blank and proposes the most likely fillers. This is useful for sentence completion and for probing what a language model has learned.
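With BERT, the masked element is a word, not an image region: hide a token and predict it from context. A toy illustration of the task, where a bigram count over a tiny corpus stands in for BERT's learned distribution (BERT itself scores the whole vocabulary using bidirectional context, not just the previous word):

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
# Count which word follows each word in the toy corpus.
bigrams = Counter(zip(corpus, corpus[1:]))

def fill_mask(sentence: str) -> str:
    words = sentence.split()
    i = words.index("[MASK]")
    prev = words[i - 1]
    # Pick the word most often seen after `prev` in the corpus.
    candidates = {b: n for (a, b), n in bigrams.items() if a == prev}
    words[i] = max(candidates, key=candidates.get)
    return " ".join(words)

print(fill_mask("the cat sat on the [MASK]"))
```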

Image-Guided Object Detection with OWL-ViT

You can use OWL-ViT to query images with text descriptions of any object or alternatively with an example / query image of the target object. To use it, simply upload an image and a query image that only contains the object you're looking for. You can also use the score and non-maximum suppression threshold sliders to set a threshold to filter out low probability and overlapping bounding box predictions.
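The two sliders correspond to two standard post-processing steps: drop boxes below a score threshold, then apply non-maximum suppression (NMS) to discard overlapping duplicates. A self-contained sketch (boxes are [x1, y1, x2, y2]; the threshold values are examples):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def filter_boxes(boxes, scores, score_thr=0.5, nms_thr=0.5):
    keep = []
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    for i in order:
        if scores[i] < score_thr:          # score-threshold slider
            continue
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in keep):
            keep.append(i)                 # NMS slider: suppress heavy overlaps
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.3]
print(filter_boxes(boxes, scores))   # [0] -- box 1 overlaps, box 2 scores too low
```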
