Paper Summary: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Paper title: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimod […]...
Paper Summary: COLE: A Hierarchical Generation Framework for Graphic Design
* Title: COLE: A Hierarchical Generation Framework for Graphic Design * Authors: Peidong Jia, Chenxu […]...
Paper Summary: Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper title: Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Authors: Sond […]...
Paper Summary: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Title: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models Authors: Hongliang […]...
Paper Summary: Gemini: A Family of Highly Capable Multimodal Models
Title: Gemini: A Family of Highly Capable Multimodal Models Authors: Gemini Team (842 additional aut […]...
Paper Summary: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision
Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision Authors: O […]...
Paper Summary: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Title: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets Authors: Stab […]...
Paper Summary: Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Paper URL: Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Authors: […]...
Paper Summary: LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Title: LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper URL: https://arxiv.org/abs […]...
Paper Summary: Improving Image Generation with Better Captions
Title: Improving Image Generation with Better Captions Authors: James Betker, Gabriel Goh, et al. (OpenAI) […]...