Paper Summary: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
178 views. Paper title: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimod […]...
Quantizing MiniGPT4 with AutoGPTQ/BitsAndBytes and Measuring Throughput on AWS
704 views. When deploying an LLM, the LLM component often needs to be quantized. Focusing on Vision & Language (multimodal) LLMs such as MiniGPT4, Aut […]...
Paper Summary: UniVTG: Towards Unified Video-Language Temporal Grounding
359 views. Title: UniVTG: Towards Unified Video-Language Temporal Grounding. Authors: Kevin Qinghong Lin, Pengch […]...
Paper Summary: GRiT: A Generative Region-to-text Transformer for Object Understanding
1.1k views. Title: GRiT: A Generative Region-to-text Transformer for Object Understanding. Authors: Jialian Wu, […]...
Paper Summary: EVA-02: A Visual Representation for Neon Genesis
2k views. Title: EVA-02: A Visual Representation for Neon Genesis. Authors: Yuxin Fang, Quan Sun, Xinggang Wang, […]...
Paper Summary: EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
1.7k views. Title: EVA: Exploring the Limits of Masked Visual Representation Learning at Scale. Authors: Yuxin F […]...
Paper Summary: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
5.4k views. Title: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection […]...
Paper Summary: Flamingo: a Visual Language Model for Few-Shot Learning
1.8k views. Title: Flamingo: a Visual Language Model for Few-Shot Learning. Authors: Jean-Baptiste Alayrac, Jeff […]...
Paper Summary: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
9.3k views. Title: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large […]...
Paper Summary: Domino: Discovering Systematic Errors with Cross-Modal Embeddings
328 views. Title: Domino: Discovering Systematic Errors with Cross-Modal Embeddings. Authors: Sabri Eyuboglu, Ma […]...