Open
Labels: enhancement (New feature or request), model (Request to add / extend support for the model.)
Search before asking
- I have searched the Multimodal Maestro issues and found no similar feature requests.
Description
As far as I know, Qwen2.5-VL is the first open-source multimodal model that can extract bounding boxes.
For example, see https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb.
It would be great for Multimodal Maestro to support this, so that the same bounding-box capability can be extended to other models as well.
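To make the request more concrete, here is a minimal sketch of post-processing the grounding output. It assumes, based on the cookbook linked above, that Qwen2.5-VL answers with a JSON list of objects carrying `bbox_2d` (`[x1, y1, x2, y2]`) and `label` keys, possibly wrapped in a markdown code fence, and that the coordinates are absolute pixels relative to the resized model input. `parse_qwen_boxes` and `rescale_box` are hypothetical helper names, not part of Maestro or any Qwen API.

```python
import json
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]

# Built at runtime so the literal fence never appears in this source.
FENCE = "`" * 3
_FENCED = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def parse_qwen_boxes(text: str) -> List[Tuple[str, Box]]:
    """Parse an (assumed) Qwen2.5-VL grounding response into (label, box) pairs.

    Assumed format: a JSON list of {"bbox_2d": [x1, y1, x2, y2], "label": "..."}
    objects, optionally wrapped in a markdown code fence.
    """
    match = _FENCED.search(text)
    payload = match.group(1) if match else text  # fall back to the raw text
    items = json.loads(payload)
    if isinstance(items, dict):  # tolerate a single object instead of a list
        items = [items]
    boxes = []
    for item in items:
        x1, y1, x2, y2 = (float(v) for v in item["bbox_2d"])
        boxes.append((item.get("label", ""), (x1, y1, x2, y2)))
    return boxes


def rescale_box(box: Box, input_size: Tuple[int, int], original_size: Tuple[int, int]) -> Box:
    """Map a box from the resized model-input resolution back to the original image.

    Assumes the model reports absolute pixels on the resized input (width, height).
    """
    sx = original_size[0] / input_size[0]
    sy = original_size[1] / input_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

Usage: `parse_qwen_boxes(response_text)` on the raw generated text yields `(label, box)` pairs, and `rescale_box` maps each box back to the original screenshot, which is what an automation tool would need to act on the detected UI elements. Keeping parsing separate from rescaling means the same parser could be reused for other models that emit the same JSON shape.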
Use case
We would use this for generative process automation in https://github.com/OpenAdaptAI/OpenAdapt
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!