Closed
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Qwen has just released Qwen2-VL in 2B and 7B sizes under the Apache 2.0 License.
Motivation
- SoTA understanding of images of various resolutions and aspect ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding of videos over 20 minutes long: Qwen2-VL can understand videos longer than 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
Possible Implementation
No response