Microsoft open-sources OmniParser V2, a framework that turns DeepSeek-R1 and other models into AI Agents
Odaily report: Microsoft has released OmniParser V2, the latest version of its vision-based screen-parsing framework for Agents, on its official website. The framework can turn models such as DeepSeek-R1, GPT-4o, and Qwen-2.5VL into AI Agents that can operate a computer. Compared with V1, V2 detects small interactive UI elements more accurately and runs inference faster, with 60% lower latency. On ScreenSpot Pro, a high-resolution Agent benchmark, V2 paired with GPT-4o reached 39.6% accuracy, while GPT-4o on its own scored only 0.8%. Alongside V2, Microsoft has also open-sourced OmniTool, a Docker-based Windows system that covers screen understanding, grounding, action planning, and execution, which makes it a key tool for turning large models into Agents.