Comments (4)
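This part of our work is fairly simple: we just followed an earlier work, PICARD (https://github.com/ServiceNow/picard).
Specifically, when a question comes in, we compute string-level similarity between the question and all values in the database, extract the values with high similarity, and append them after the corresponding columns. During training, the model can then learn that "the values following a column are likely to appear in the SQL".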
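The matching step described above can be sketched roughly as follows. This is a minimal illustration built on Python's standard `difflib`, not the actual RESDSQL or PICARD code: the function names `match_values` and `serialize_schema`, the `threshold` and `top_k` parameters, the n-gram comparison strategy, and the `table.column ( value )` output layout are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def match_values(question, column_values, threshold=0.85, top_k=2):
    """Return up to top_k values of one column that closely match a span of the question.

    Hypothetical helper: threshold and top_k are illustrative defaults.
    """
    tokens = re.findall(r"\w+", question.lower())
    scored = []
    for value in column_values:
        words = str(value).lower().split()
        if not words:
            continue  # skip empty cells, which would trivially match
        target = " ".join(words)
        best = 0.0
        # Compare the value with every question n-gram of the same word count.
        for i in range(len(tokens) - len(words) + 1):
            span = " ".join(tokens[i:i + len(words)])
            best = max(best, SequenceMatcher(None, span, target).ratio())
        if best >= threshold:
            scored.append((best, value))
    scored.sort(key=lambda pair: -pair[0])  # highest similarity first
    return [value for _, value in scored[:top_k]]


def serialize_schema(question, table, columns):
    """Append matched values after their column name (assumed layout)."""
    parts = []
    for column, values in columns.items():
        matched = match_values(question, values)
        if matched:
            joined = " , ".join(str(v) for v in matched)
            parts.append(f"{table}.{column} ( {joined} )")
        else:
            parts.append(f"{table}.{column}")
    return " | ".join(parts)


columns = {
    "name": ["Joe Sharp", "Timbaland"],
    "country": ["France", "Netherlands", "United States"],
}
print(serialize_schema("How many singers are from France?", "singer", columns))
```

Note that this sketch scans every cell value for every question, which is exactly the cost that becomes a problem on very large tables.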
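On closer inspection, though, this approach has several problems. First, it cannot handle semantic similarity, because the scoring currently relies only on string matching. Second, for very large tables (possibly storing hundreds of GB of data), even simple string matching takes a lot of time. Finally, in many real production environments, privacy constraints mean we have no permission to access all the values in the database.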
from resdsql.
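@lihaoyang-ruc So value selection presupposes reading the raw data directly, is that right? I had always assumed that values were selected by learning to parse them out of the natural-language question (since values in the question tend to be distinctive and are often carried over into the SQL verbatim, e.g. numbers). If the raw data is very large, is it fair to assume that the "index + match" approach will be slow?
If I want to identify values directly from the natural-language question, is there an approach you would recommend?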
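Yes. After training on the data, the model can in fact directly identify values that appear in the question. Some of the values in the SQL can be copied directly from the question; the rest can be copied from the values extracted from the DB by string similarity.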
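To make the two value sources concrete, here is a small hypothetical illustration (the question, SQL, and matched values are invented for the example and do not come from this thread or from RESDSQL's code). It labels each literal in a SQL query by whether it appears verbatim in the question or only among the DB values retrieved by string matching. The helper names `sql_literals` and `value_sources` are assumptions, and the literal extraction is deliberately simplified.

```python
import re

QUOTED = r"'([^']*)'|\"([^\"]*)\""


def sql_literals(sql):
    """Return quoted-string and numeric literals found in a SQL query (simplified)."""
    strings = [a or b for a, b in re.findall(QUOTED, sql)]
    unquoted = re.sub(QUOTED, " ", sql)  # avoid counting digits inside strings
    numbers = re.findall(r"\b\d+(?:\.\d+)?\b", unquoted)
    return strings + numbers


def value_sources(sql, question, db_matches):
    """Label each SQL literal as coming from the question, the DB matches, or neither."""
    labels = {}
    for literal in sql_literals(sql):
        if literal in question:        # exact, case-sensitive copy from the question
            labels[literal] = "question"
        elif literal in db_matches:    # only available via the matched DB values
            labels[literal] = "database"
        else:
            labels[literal] = "neither"
    return labels


question = "How many singers from france are older than 30?"
sql = "SELECT count(*) FROM singer WHERE country = 'France' AND age > 30"
db_matches = ["France"]  # assumed output of the string-similarity step
print(value_sources(sql, question, db_matches))
```

In this made-up case the number `30` can be copied from the question as written, while the exact surface form `France` (capitalized as stored in the table) is only available from the matched DB values.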
Could you give an example? What kind of value cannot be copied from the question and can only be extracted from the DB?