Scenario2Vector: “Papers to Read”
Dec 18, 2020
Transformers — Foundational Papers
Transformers — Video Applications
- Attention Is All You Need For Videos: Self-Attention Based Video Summarization Using Universal Transformers
- End-to-End Object Detection with Transformers
- End-to-End Contextual Perception and Prediction with Interaction Transformer
- Advisable Learning for Self-driving Vehicles by Internalizing Observation-to-Action Rules
- Video Action Transformer Network
- COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
- Bi-modal Transformer for dense video captioning: https://arxiv.org/abs/2005.08271 (code: https://github.com/v-iashin/BMT; tutorial: https://towardsdatascience.com/dense-video-captioning-using-pytorch-392ca0d6971a)
3D CNNs for Spatio-Temporal Feature Extraction
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
- Learning Spatiotemporal Features with 3D Convolutional Networks
- SlowFast Networks for Video Recognition
- Long-Term Feature Banks for Detailed Video Understanding
- AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures