

lihaoyang-ruc commented on July 23, 2024

We kept this part fairly simple. We just followed a prior work, PICARD (https://github.com/ServiceNow/picard).

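Specifically, when a question comes in, we compute a string-level similarity between the question and every value in the database, then append the values with high similarity after their corresponding columns. During training, the model can thus learn that "the values following a column are likely to be values that appear in the SQL".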
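A minimal sketch of the matching-and-appending step described above. It assumes `difflib.SequenceMatcher` ratio as the string-similarity score and compares each DB value against question spans with the same number of tokens; the function names, the 0.8 threshold, `top_k`, and the `table : col ( value )` serialization format are hypothetical illustrations, not the RESDSQL or PICARD code.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List


def match_score(question: str, value: str) -> float:
    """Best string similarity (0..1) between `value` and any same-length token span of `question`."""
    q, v = question.lower(), value.lower()
    if not v.strip():
        return 0.0
    if v in q:
        return 1.0  # exact (case-insensitive) substring match
    q_tokens = re.findall(r"\w+", q)  # drop punctuation so "France?" still compares as "france"
    n = max(1, len(re.findall(r"\w+", v)))  # compare against n-token windows
    best = 0.0
    for i in range(len(q_tokens) - n + 1):
        window = " ".join(q_tokens[i:i + n])
        best = max(best, SequenceMatcher(None, window, v).ratio())
    return best


def match_values(question: str, db_values: Dict[str, List[str]],
                 threshold: float = 0.8, top_k: int = 2) -> Dict[str, List[str]]:
    """Per column ("table.column"), keep the top_k DB values whose score reaches `threshold` (hypothetical defaults)."""
    matches: Dict[str, List[str]] = {}
    for column, values in db_values.items():
        scored = sorted(((match_score(question, str(v)), str(v)) for v in values),
                        key=lambda pair: pair[0], reverse=True)
        kept = [v for score, v in scored if score >= threshold][:top_k]
        if kept:
            matches[column] = kept
    return matches


def serialize_table(table: str, columns: List[str], matches: Dict[str, List[str]]) -> str:
    """Append matched values in parentheses after their column (hypothetical input format)."""
    parts = []
    for col in columns:
        values = matches.get(f"{table}.{col}")
        parts.append(f"{col} ( {' , '.join(values)} )" if values else col)
    return f"{table} : " + " , ".join(parts)


if __name__ == "__main__":
    db = {"singer.name": ["Joe Sharp", "Timbaland"],
          "singer.country": ["France", "Netherlands", "United States"]}
    m = match_values("How many singers are from Frence?", db)  # note the typo
    print(m)  # {'singer.country': ['France']}
    print(serialize_table("singer", ["name", "country", "age"], m))
    # singer : name , country ( France ) , age
```

Because the score is purely character-based, it catches spelling variants such as "Frence" but would miss a semantic match like "French" vs. a stored "FR", and it must scan every stored value, which is exactly where the costs discussed next come from.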

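On closer thought, though, this approach has several problems. First, it cannot handle semantic similarity, because the scoring is currently based only on string-matching similarity. Second, for very large tables (which may hold hundreds of GB of data), even simple string matching takes a lot of time. Finally, in many real production environments, privacy concerns mean we simply do not have permission to access all the values in the database.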

from resdsql.

ManchesterWuer commented on July 23, 2024

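@lihaoyang-ruc So selecting values presupposes reading the raw data directly, right? I had always assumed that values were picked out by learning to extract them from the natural-language question (since values in a question tend to be fairly distinctive and are often carried over into the SQL verbatim, e.g., numbers). If the raw data is very large, is it fair to say that this "index + match" approach would be slow?
If I want to recognize values directly from the natural-language question, is there an approach you would recommend?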


lihaoyang-ruc commented on July 23, 2024

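It works like this: once the model has been trained on the data, it can in fact directly recognize the values that appear in the question. Some of the values in the SQL can be copied directly from the question, and the rest can be copied directly from the DB values extracted by string similarity.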
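To make the two copy sources concrete, here is a toy sketch that labels where a SQL literal could be copied from. The helper name `copy_source`, its labels, and the example questions are hypothetical; the matched values are supplied as a hand-written dict standing in for the output of the string-similarity step described earlier. A trained model does not run such an explicit check; it learns to copy from either source.

```python
from typing import Dict, List


def copy_source(sql_value: str, question: str, matched: Dict[str, List[str]]) -> str:
    """Return "question", "db", "both", or "neither" for a SQL literal (toy, hypothetical helper)."""
    # Naive case-insensitive substring test; e.g. "3" would also match inside "30".
    in_question = sql_value.lower() in question.lower()
    # Was the literal among the DB values appended after some column?
    in_db = any(sql_value == v for values in matched.values() for v in values)
    if in_question and in_db:
        return "both"
    if in_question:
        return "question"
    if in_db:
        return "db"
    return "neither"


if __name__ == "__main__":
    # A number typed in the question can be copied straight from the question text.
    print(copy_source("30", "Which singers are older than 30?", {}))  # question
    # A misspelled value can only be recovered from the fuzzy-matched DB values.
    print(copy_source("France", "How many singers are from Frence?",
                      {"singer.country": ["France"]}))  # db
```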


ManchesterWuer commented on July 23, 2024

Could you give an example? What kind of value cannot be copied from the question and can only be extracted from the DB?

