
Comments (7)

Sleepychord commented on August 24, 2024

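@lplplpoooouuu @zRzRzRzRzRzRzR This feature does exist. Please download the CogAgent model from this repository; appending "(with grounding)" to your question targets exactly the task you describe. However, the model was not trained on images with numbered labels. That could be achieved by fine-tuning CogAgent.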
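For illustration, the prompt convention described in this comment can be sketched as a small helper. This is only a sketch: the function name `with_grounding` is hypothetical, and the exact spacing before the suffix is an assumption, so check the repository's own prompt examples before relying on it.

```python
GROUNDING_SUFFIX = "(with grounding)"


def with_grounding(question: str, separator: str = " ") -> str:
    """Append the grounding marker to a question unless it is already there.

    The separator between the question and the suffix is an assumption;
    pass separator="" if the model expects the suffix attached directly.
    """
    q = question.strip()
    if q.endswith(GROUNDING_SUFFIX):
        return q
    return f"{q}{separator}{GROUNDING_SUFFIX}"


if __name__ == "__main__":
    print(with_grounding("Where is the cat in the image?"))
```

The helper is idempotent, so it is safe to call on a question that already carries the suffix.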

from cogvlm.

zRzRzRzRzRzRzR commented on August 24, 2024

We did not do this training.


lplplpoooouuu commented on August 24, 2024

OK, thanks. I think this feature is very promising: once it works, an LLM could perform mouse and keyboard operations and achieve true automation.


lplplpoooouuu commented on August 24, 2024

Thanks! I tried the web demo earlier. Looking at my hardware, I probably cannot run CogAgent, which is a bit of a pity.


zRzRzRzRzRzRzR commented on August 24, 2024

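You can think of it this way:
The current CogAgent model (with "(with grounding)" added to the question) can do the task you describe, but only in relatively simple scenarios (example: "I want the position of the cat in the image"), or in a second kind of scenario ("I want you to find the search box in the image and fill it with 'search xxx'").

Finer-grained or harder scenarios are difficult, though. For example: I overlay a grid on the image and number each cell, say by cutting the image into 9 regions, and ask which region contains a person playing ball.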
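One possible workaround for the grid scenario, not something proposed in this thread, is to ask for an ordinary grounding box and compute the grid cell in post-processing. The sketch below assumes the model's reply contains a box written as `[[x0,y0,x1,y1]]` with coordinates normalized to the range 0 to 999; that format, and the helper names `parse_box` and `grid_cell`, are assumptions to verify against the model's actual output.

```python
import re
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]

# Assumed output format: [[x0,y0,x1,y1]], each value 0..999 (possibly zero-padded).
BOX_RE = re.compile(r"\[\[(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\]\]")


def parse_box(text: str) -> Optional[Box]:
    """Return the first [[x0,y0,x1,y1]] box found in text, or None."""
    m = BOX_RE.search(text)
    if m is None:
        return None
    x0, y0, x1, y1 = (int(g) for g in m.groups())
    return x0, y0, x1, y1


def grid_cell(box: Box, rows: int = 3, cols: int = 3, scale: int = 1000) -> int:
    """Map the box center to a 1-based cell number, counted row by row."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2
    # Clamp so a center on the far edge still falls in the last cell.
    col = min(int(cx * cols / scale), cols - 1)
    row = min(int(cy * rows / scale), rows - 1)
    return row * cols + col + 1


if __name__ == "__main__":
    reply = "A person playing ball [[700,050,900,300]]"  # made-up reply
    box = parse_box(reply)
    if box is not None:
        print(grid_cell(box))
```

Using the box center keeps the mapping simple; a box that straddles several cells would need an overlap-based rule instead.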


zRzRzRzRzRzRzR commented on August 24, 2024

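Fine-tuning can handle some similar but more targeted tasks. Because of the model's own limitations, this approach will not necessarily reach the ideal result. As for other approaches, I have not found any so far.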


lplplpoooouuu commented on August 24, 2024

What do you mean by the model's own limitations?

