Attention Maps Exploration by SimPool
ViT-S (Vision Transformer-Small) is a variant of the Vision Transformer architecture, which applies the Transformer model to image recognition tasks. SimPool, short for "Simplified Pooling," is a pooling method designed to aggregate information from the ViT-S model's attention maps and produce a fixed-size representation for downstream tasks.
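To make the idea of attention-based pooling concrete, here is a minimal PyTorch sketch. It is not the official SimPool implementation: the class name `AttentionPool`, the choice of a mean-token query, and the input size (196 patch tokens, i.e. a 14x14 grid from a 224x224 image with 16x16 patches) are assumptions made for illustration. The only detail taken from ViT-S itself is its embedding width of 384.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Hypothetical attention-based pooling layer (illustrative, not the official SimPool code)."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_patches, dim)
        x = self.norm(tokens)
        # Assumption: the query is built from the average of all patch tokens.
        q = self.to_q(x.mean(dim=1, keepdim=True))       # (batch, 1, dim)
        k = self.to_k(x)                                  # (batch, num_patches, dim)
        # One attention weight per patch; each row sums to 1 after softmax.
        attn = (q @ k.transpose(1, 2)) * self.scale       # (batch, 1, num_patches)
        attn = attn.softmax(dim=-1)
        # Weighted sum of patch tokens gives a fixed-size vector per image.
        pooled = (attn @ x).squeeze(1)                    # (batch, dim)
        return pooled, attn.squeeze(1)


torch.manual_seed(0)
tokens = torch.randn(2, 196, 384)   # stand-in for ViT-S patch tokens (assumed 14x14 grid)
pool = AttentionPool(dim=384)
pooled, attn = pool(tokens)

# Reshape the per-patch weights into a 2D map that can be shown as a heatmap.
attn_map = attn.reshape(2, 14, 14)
```

The pooled vector has the same size regardless of how many patches the image produces, and `attn_map` is the kind of per-image attention map this page explores.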
mit
Image Feature Extraction
PyTorch
English