
Attention Maps Exploration by SimPool

The ViT-S (Vision Transformer-Small) model is a compact variant of the Vision Transformer architecture, which applies the Transformer model to image recognition tasks. SimPool is a simple attention-based pooling method that aggregates the ViT-S model's patch features into a fixed-size representation for downstream tasks; the attention map it produces in the process can be explored to see which image regions the model focuses on.
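To make the pooling step concrete, below is a minimal PyTorch sketch of a SimPool-style attention pooling layer, not the official implementation: the class name SimPoolSketch, the single-head attention, the LayerNorm placement, and the 384-dimensional ViT-S embedding size are assumptions made for illustration. A global average of the patch tokens serves as the query, the tokens serve as keys, and the resulting attention weights yield both the pooled vector and an attention map that can be visualized.

```python
# Minimal sketch of SimPool-style attention pooling (illustrative, not the
# official SimPool implementation; names and details are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimPoolSketch(nn.Module):
    """Pools a sequence of patch tokens into a single vector via attention."""

    def __init__(self, dim: int = 384):  # 384 = ViT-S embedding dimension
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_k = nn.LayerNorm(dim)
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_patches, dim) patch features from the backbone
        gap = tokens.mean(dim=1, keepdim=True)            # (B, 1, D) query seed
        q = self.q_proj(self.norm_q(gap))                 # (B, 1, D)
        k = self.k_proj(self.norm_k(tokens))              # (B, N, D)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (B, 1, N)
        pooled = attn @ tokens                            # (B, 1, D) weighted sum
        return pooled.squeeze(1), attn.squeeze(1)         # fixed-size vector, attention map


if __name__ == "__main__":
    x = torch.randn(2, 196, 384)          # e.g. 14x14 patches from a ViT-S
    pool = SimPoolSketch(dim=384)
    vec, attn_map = pool(x)
    print(vec.shape, attn_map.shape)      # torch.Size([2, 384]) torch.Size([2, 196])
```

The returned attention map has one weight per patch, so reshaping it to the 14x14 patch grid and upsampling to the image resolution gives the kind of attention visualization this model card refers to.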

License: MIT
Task: Image Feature Extraction
Library: PyTorch
Language: English
Author: @AIOZNetwork