Models
Background Removal is an image processing technique used to separate the main object from the background of a photo. Removing the background helps highlight the product, subject, or character, bringing a professional and aesthetically pleasing look to the image.
by @AIOZNetwork
1.70504 AIOZ ($0.8)
Longformer for SQuADv2
Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
by @AIOZNetwork
0 AIOZ ($0)
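The combination of sliding-window and global attention described above can be pictured as a boolean attention mask. The sketch below builds such a mask from scratch; the window size and global positions are illustrative, not Longformer's actual configuration, and the real model implements this far more efficiently inside its attention kernels.

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_positions):
    """Boolean mask combining sliding-window (local) attention with
    full (global) attention at user-selected positions."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    # Local: each token attends to neighbours within the window.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
    # Global: chosen tokens attend everywhere and are attended to by all.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True
    return mask

# Example: 8 tokens, window of 1, global attention on token 0
# (for QA tasks like SQuAD, question tokens are typically made global).
m = longformer_attention_mask(8, 1, [0])
print(int(m.sum()))  # 34 attended pairs, far fewer than the dense 8*8 = 64
```

Because the local pattern grows linearly with sequence length while dense attention grows quadratically, this is what lets Longformer handle long documents.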
Midjourney Prompt Generator
Midjourney Prompt Generator creates prompt suggestions for the Midjourney image generator, promoting creative thinking and the discovery of new possibilities.
by @AIOZNetwork
0 AIOZ ($0)
Jak's Woolitize Image Generator
Jak's Woolitize Image Generator is a text-to-image model that applies a woolitize texture and appearance to generated images, creating images that convey warmth.
by @AIOZNetwork
0 AIOZ ($0)
Image Generator Using SSD 1B
Image Generator Using SSD 1B is a powerful deep learning model specifically designed for image synthesis and generation.
by @AIOZNetwork
0 AIOZ ($0)
Text To Sound
The Text-to-Sound task involves converting written text into audible speech. It is a technology that utilizes natural language processing and speech synthesis techniques to transform written words into a spoken form. This task plays a crucial role in various applications, such as text-to-speech (TTS) systems, accessibility tools for visually impaired individuals, voice assistants, and automated voice response systems.
by @AIOZNetwork
0 AIOZ ($0)
Anime Style Image Generator
Anime Style Image Generator is a powerful tool that generates anime-style images from text descriptions or prompts.
by @AIOZNetwork
0 AIOZ ($0)
Story Generator
This task aims to create stories or paragraphs automatically based on pre-programmed patterns and rules.
by @AIOZNetwork
0 AIOZ ($0)
Speech Recognition by Fairseq S2T
S2T is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
by @AIOZNetwork
0 AIOZ ($0)
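"Generates the transcripts autoregressively" means the decoder emits one token at a time, feeding the growing prefix back in as input. The toy greedy decoder below illustrates that loop; the scripted "model" is a stand-in for illustration only, not Fairseq's API.

```python
def greedy_decode(next_token_logits, bos=0, eos=1, max_len=10):
    """Autoregressive greedy decoding: feed the growing prefix back in,
    pick the highest-scoring next token, stop at EOS or max_len."""
    tokens = [bos]
    for _ in range(max_len):
        logits = next_token_logits(tokens)  # scores for the next token given the prefix
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

# Stand-in "model" over vocabulary {0: BOS, 1: EOS, 2, 3}:
# it scripts the logits by prefix length, emitting 2, then 3, then EOS.
script = {1: [0, 0, 9, 1], 2: [0, 0, 1, 9], 3: [0, 9, 1, 0]}
toy = lambda prefix: script[len(prefix)]
print(greedy_decode(toy))  # [0, 2, 3, 1]
```

In the real S2T model, `next_token_logits` would be the transformer decoder conditioned on the encoded audio; beam search is usually used instead of pure greedy decoding.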
TAPEX: Table Pre-training via Learning a Neural SQL Executor
TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
by @AIOZNetwork
0 AIOZ ($0)
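The synthetic corpus described above pairs tables and SQL queries with answers produced by a real SQL engine; the model is then trained to imitate that executor. A minimal sketch of generating one such (table, query, answer) triple, using Python's stdlib sqlite3 as the ground-truth executor — the schema and query here are invented for illustration:

```python
import sqlite3

def execute_sql(rows, query):
    """Ground-truth executor: run a SQL query against a small in-memory
    table, as used to label synthetic pre-training examples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (city TEXT, population INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result

rows = [("Hanoi", 8), ("Tokyo", 14), ("Paris", 2)]
# One synthetic training triple: the model learns to map the flattened
# table plus the query text directly to the executor's answer.
answer = execute_sql(rows, "SELECT city FROM t WHERE population > 5")
print(answer)  # [('Hanoi',), ('Tokyo',)]
```

At fine-tuning time the SQL query is replaced by a natural-language question, and the table-reasoning skills learned from the executor transfer.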
TAPAS: Weakly Supervised Table Parsing via Pre-training
TAPAS is a BERT-like transformer model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. This means it was pretrained on raw tables and associated texts only, with no human labelling (which is why it can use lots of publicly available data), using an automatic process to generate inputs and labels from those texts.
by @AIOZNetwork
0 AIOZ ($0)
Image Restoration by Deraindrop
CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.
by @AIOZNetwork
0 AIOZ ($0)
Image Restoration by Deblur
CMFNet achieves competitive performance on three tasks: image deblurring, image dehazing and image deraindrop.
by @AIOZNetwork
0 AIOZ ($0)
Document Parsing by Donut
Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
by @AIOZNetwork
0 AIOZ ($0)
Image to Text by Pix2Struct
Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering.
by @AIOZNetwork
0 AIOZ ($0)
Document Visual Question Answering
Donut model fine-tuned on DocVQA. It was introduced in the paper OCR-free Document Understanding Transformer by Kim et al. Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes the image into a tensor of embeddings (of shape batch_size, seq_len, hidden_size), after which the decoder autoregressively generates text, conditioned on the encoding of the encoder.
by @AIOZNetwork
0 AIOZ ($0)
Dense Prediction for Vision Transformers
Dense Prediction Transformer (DPT) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021). DPT uses the Vision Transformer (ViT) as backbone and adds a neck + head on top for monocular depth estimation. This repository hosts the "hybrid" version of the model as stated in the paper. DPT-Hybrid diverges from DPT by using ViT-hybrid as a backbone and taking some activations from the backbone.
by @AIOZNetwork
0 AIOZ ($0)
Image Captioning using ViT and GPT2
Image captioning refers to the process of generating a descriptive and meaningful textual description for an image. It involves utilizing computer vision techniques and natural language processing to analyze the visual content of an image and generate a coherent caption that accurately represents its content.
by @AIOZNetwork
0 AIOZ ($0)
Filling Mask with Bert
Fill-mask is the task of predicting tokens that have been masked out of a sentence. BERT is pretrained with exactly this masked-language-modelling objective: given a sentence containing a [MASK] placeholder, it scores candidate words for that position using the context on both sides, which is useful for text completion and for probing what a language model has learned.
by @AIOZNetwork
0 AIOZ ($0)
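With BERT, filling a mask means ranking candidate words for a [MASK] slot from the surrounding text. The toy below mimics that with simple bigram counts over a tiny corpus — purely illustrative; BERT instead scores its whole 30k-token vocabulary with deep bidirectional context.

```python
from collections import Counter

def fill_mask(sentence, corpus):
    """Toy fill-mask: rank candidates for the [MASK] slot by how often
    each word follows the same preceding word in a small corpus."""
    prev = sentence.split("[MASK]")[0].split()[-1]  # word just before the mask
    counts = Counter()
    for text in corpus:
        words = text.split()
        for a, b in zip(words, words[1:]):
            if a == prev:
                counts[b] += 1
    return counts.most_common()

corpus = [
    "the capital of france is paris",
    "everyone knows france is paris",
    "the capital of japan is tokyo",
]
print(fill_mask("the capital of france is [MASK]", corpus))
# [('paris', 2), ('tokyo', 1)]
```

Note the toy only looks at the word to the left; BERT's key advantage is that it conditions on both sides of the mask at once.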
Image-Guided Object Detection with OWL-ViT
You can use OWL-ViT to query images with text descriptions of any object or alternatively with an example / query image of the target object. To use it, simply upload an image and a query image that only contains the object you're looking for. You can also use the score and non-maximum suppression threshold sliders to set a threshold to filter out low probability and overlapping bounding box predictions.
by @AIOZNetwork
0 AIOZ ($0)
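The two sliders described above correspond to a score cutoff and a non-maximum suppression (NMS) IoU cutoff. A minimal sketch of that post-processing step, with boxes as (x1, y1, x2, y2) tuples and illustrative thresholds — OWL-ViT's actual post-processing lives in its own library code:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Drop low-score boxes, then greedily keep the best-scoring box and
    suppress any remaining box that overlaps a kept one above iou_thr."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - box 1 overlaps box 0 too much
```

Raising `score_thr` removes low-probability detections; raising `iou_thr` allows more overlapping boxes to survive.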