
Paper Reviews (24)
Segment Anything review. Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment anything." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015-4026. 2023.
Segmenter: Transformer for Semantic Segmentation review. Strudel, Robin, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. "Segmenter: Transformer for semantic segmentation." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7262-7272. 2021.
Vision Transformers Need Registers review. Darcet, Timothée, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. "Vision transformers need registers." arXiv preprint arXiv:2309.16588 (2023). Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. Th..
iBOT: Image BERT Pre-Training with Online Tokenizer review. Zhou, Jinghao, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. "iBOT: Image BERT pre-training with online tokenizer." arXiv preprint arXiv:2111.07832 (2021). https://arxiv.org/abs/2111.07832 The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are ..
Neural Discrete Representation Learning (VQ-VAE) review + code. Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in Neural Information Processing Systems 30 (2017).
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction review. Tian, Keyu, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction." arXiv preprint arXiv:2404.02905 (2024). I am reviewing this because it was featured as a top paper on Papers with Code. It appears to be the first(?) model to bring the autoregressive modeling used in NLP to vision. Abstract: We present a new paradigm for image generation based on autoregressive learning. Departing from the typical next-token prediction, it predicts the next resolution/scale step by step..
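The next-scale idea from the preview above can be sketched in a few lines: each autoregressive step emits an entire token map at the next, larger resolution, conditioned on all coarser maps produced so far. This is a minimal toy loop, not the paper's model; `predict_scale` is a stand-in for the transformer, and the scale schedule is illustrative.

```python
import numpy as np

scales = [1, 2, 4, 8]                   # side lengths of successive token maps


def upsample(m, size):
    # nearest-neighbor upsample of an (h, w) map to (size, size)
    reps = size // m.shape[0]
    return np.repeat(np.repeat(m, reps, axis=0), reps, axis=1)


def predict_scale(history, size):
    # placeholder for the transformer: condition on all previous scales by
    # averaging their upsampled versions; a real model would output token
    # logits for the whole (size, size) map in one step
    return np.mean([upsample(h, size) for h in history], axis=0)


history = [np.ones((1, 1))]             # coarsest map / start token
for s in scales[1:]:
    history.append(predict_scale(history, s))

final = history[-1]                     # highest-resolution token map, (8, 8)
```

The point of the structure is that the number of autoregressive steps equals the number of scales, not the number of tokens, which is what makes the approach scalable.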
DINOv2: Learning Robust Visual Features without Supervision review. Oquab, Maxime, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, et al. "DINOv2: Learning robust visual features without supervision." arXiv preprint arXiv:2304.07193 (2023). My reading order is a jumble: I reviewed DINO v1, then BYOL, and now DINO v2. Abstract: With models pretrained on large-scale datasets now available, the information needed for a target task can be obtained from the visual features of images alone. This work shows that if a model pretrained on data gathered from diverse sources exists..
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning review. Grill, Jean-Bastien, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, et al. "Bootstrap your own latent: A new approach to self-supervised learning." Advances in Neural Information Processing Systems 33 (2020): 21271-21284. This is a review of BYOL, one of the methods used in DINO v1. Abstract: Bootstrap Your Own Latent is a self-supervised learning method in which two networks, called online and target, interact..
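The online/target interplay mentioned in the preview can be sketched without any deep-learning framework. In this toy numpy version (shapes, tau, and the linear "encoders" are all illustrative, not the paper's architecture), the target weights track the online weights by an exponential moving average, only the online branch has a predictor, and the loss is the cosine distance between the two branches' normalized outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
W_online = rng.normal(size=(16, 8))     # online encoder (toy: one linear map)
W_pred = rng.normal(size=(8, 8))        # predictor, only on the online branch
W_target = W_online.copy()              # target starts as a copy of online


def normalize(z):
    return z / np.linalg.norm(z, axis=-1, keepdims=True)


def byol_loss(view1, view2):
    # two augmented views of the same batch go through different branches
    p = normalize(view1 @ W_online @ W_pred)   # online branch + predictor
    z = normalize(view2 @ W_target)            # target branch (no gradient in BYOL)
    return float((2 - 2 * (p * z).sum(axis=-1)).mean())


def ema_update(tau=0.99):
    # target <- tau * target + (1 - tau) * online; gradients never touch it
    global W_target
    W_target = tau * W_target + (1 - tau) * W_online


v1, v2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = byol_loss(v1, v2)                # in [0, 4] since both outputs are unit norm
ema_update()
```

What makes BYOL notable is visible even in this sketch: there are no negative pairs anywhere; the asymmetry (predictor plus EMA target) is what prevents the trivial collapsed solution.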