W3AI - Attention Maps Exploration by SimPool

The ViT-S model (Vision Transformer, Small) is a variant of the Vision Transformer architecture, which applies the Transformer model to image recognition tasks. SimPool, short for "Simplified Pooling," is a pooling method designed to aggregate information from the ViT-S model's attention maps and produce a fixed-size representation for downstream tasks.
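To illustrate how attention-based pooling turns a variable number of patch features into one fixed-size vector, here is a minimal sketch in plain Python. It is not the released SimPool code. The function name simpool_sketch is hypothetical, and the query and key projections are assumed to be the identity, whereas a trained model would use learned layers. The query is initialised with global average pooling, scored against every patch, and the resulting softmax weights (the attention map) are used to average the patches.

```python
import math
from typing import List, Tuple

Vector = List[float]


def simpool_sketch(patches: List[Vector]) -> Tuple[Vector, List[float]]:
    """Attention-based pooling of n patch features of dimension d.

    Hypothetical simplification: query/key projections are the identity
    (a real model would use learned linear layers). Returns the pooled
    d-dimensional vector and the n attention weights (the attention map).
    """
    n, d = len(patches), len(patches[0])
    # Initialise the query with global average pooling over all patches.
    query = [sum(p[j] for p in patches) / n for j in range(d)]
    # Scaled dot-product score between the query and each patch (used as key).
    scores = [sum(q * k for q, k in zip(query, p)) / math.sqrt(d) for p in patches]
    # Numerically stable softmax turns the scores into an attention map.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted average of the patches: output has size d for any n.
    pooled = [sum(a * p[j] for a, p in zip(attn, patches)) for j in range(d)]
    return pooled, attn


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
pooled, attn = simpool_sketch(patches)
print(len(pooled), [round(a, 3) for a in attn])
```

The pooled vector always has the feature dimension d, regardless of how many patches the image produces, and the attention weights can be reshaped onto the patch grid to visualise which regions the pooling focuses on.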

License: MIT

Image Feature Extraction

PyTorch

English

by @AIOZNetwork

Last updated: 3 months ago
