text2sql's Introduction

DuoRAT

Directory layout

├── configs
│   ├── duorat # model configuration files
│   │   └── ...
├── data
│   ├── configs # data configuration files, mainly the datasets used and their paths
│   │   └── ...
│   ├── test_data # test data: the schema, content, and test JSON files
│   ├── test_dataset # test data after splitting by database
│   ├── train_data # training data: the schema, content, and train JSON files
│   └── train_dataset # training data after splitting by database
├── logdir # where trained model files are saved
├── pretrain # pretrained models, e.g. bert, roberta
├── duorat # core source code
├── scripts # directly runnable .py scripts, such as train.py
├── requirements.txt # core dependencies
└── infer.sh train.sh infer2.sh # commonly used .sh scripts for inference and training

Environment setup

  • os: Ubuntu 18.04
  • python: 3.7
  • CUDA: 10.0
  • cudatoolkit: 9.2 (installed through conda)
  • cudnn: 7.6.5

The project runs in a virtual environment created with conda.

conda create -n duorat python=3.7

Once the environment has been created, activate it:

conda activate duorat

Install cudatoolkit and pytorch with:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=9.2 -c pytorch

Then change to the project directory and install the dependencies:

pip install -r requirements.txt
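
After installation, a quick check can confirm that the interpreter and torch match the versions listed above. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and the expected versions are taken from the list and the conda command in this section.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 7), expected_torch="1.6.0"):
    """Compare the running interpreter and installed torch with the versions above."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_ok": sys.version_info[:2] == expected_python,
        "torch": None,
        "torch_ok": False,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so a missing package is reported
    # instead of raising ImportError.
    if importlib.util.find_spec("torch") is not None:
        import torch

        version = str(torch.__version__)
        report["torch"] = version
        # Builds may carry a local suffix such as "+cu92"; compare the base version only.
        report["torch_ok"] = version.split("+")[0] == expected_torch
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False on a GPU machine, the driver and the cudatoolkit version are the first things to compare.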

Data

  • Data is in the dusql format
  • Training set: train.json; db_schema.json; db_content.json
  • Test set: test.json; db_schema.json; db_content.json
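
Before starting a run, it can help to verify that each data directory holds the three files listed above. This is a minimal sketch under assumptions: `missing_files` is a hypothetical helper, and the directories data/train_data and data/test_data are taken from the directory layout in this README.

```python
from pathlib import Path

# File names as listed above; the directories passed in are up to the caller.
REQUIRED_FILES = {
    "train": ["train.json", "db_schema.json", "db_content.json"],
    "test": ["test.json", "db_schema.json", "db_content.json"],
}


def missing_files(data_dir, split):
    """Return the required files for `split` that are absent from `data_dir`."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES[split] if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory names follow the layout described in this README (assumed here).
    for split, folder in (("train", "data/train_data"), ("test", "data/test_data")):
        print(split, "missing:", missing_files(folder, split))
```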

Workflow

Before running

Activate the environment with conda activate duorat, then change into the ~/data/duo-rat-2020 directory.

train

Prerequisite: a training set and a validation set that have already been processed and converted to the dusql format.

Steps:

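  1. Run scripts/split_dusql_bydb.py to obtain the training and validation data split by database. Edit the script first so that its file names and paths match the files under data:
# Edit the content starting at line 10 so that the file names match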
tables_json_path = "newSPDB_db_schema.json"
content_json_path = 'newSPDB_db_content.json'
examples_paths = ["spdb_train.json", "spdb_val.json"]

# Edit the content starting at line 86 to set the dataset path and the output path for the split results

parser.add_argument("--data-path", type=str, default='./data/newSPDB')
parser.add_argument("--duorat-path", type=str, default='./data/database')  
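  2. Create the data config: add the database names that the train data and the val data from the previous step depend on to train.libsonnet and val.libsonnet, respectively.

  3. Create the model config: set the model and training parameters, and in its data section point to the train.libsonnet and val.libsonnet configured in step 2.

  4. Edit train.sh to set logdir and the other parameters.

  5. Run sh train.sh. The contents of train.sh are: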

nohup python -u scripts/train.py \
    --config configs/duorat/dusql-electra-base.jsonnet \
    --logdir logdir/3.0-for-hornor  >train_best.log 2>&1 &
tail -f train_best.log
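
The idea behind step 1 (splitting examples by database) can be sketched as follows. This is not the code of scripts/split_dusql_bydb.py, which is not shown in this README. It is a minimal illustration that assumes each example is a dict carrying a `db_id` field and that the output uses one sub-directory per database; both are assumptions, and `split_by_db` and `write_groups` are hypothetical helpers.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_db(examples, db_key="db_id"):
    """Group examples by the database they refer to (db_key is an assumed field name)."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[db_key]].append(example)
    return dict(groups)


def write_groups(groups, out_dir, file_name="examples.json"):
    """Write one sub-directory per database (assumed layout) and return the database names."""
    for db_id, items in groups.items():
        db_dir = Path(out_dir) / db_id
        db_dir.mkdir(parents=True, exist_ok=True)
        with open(db_dir / file_name, "w", encoding="utf-8") as f:
            # ensure_ascii=False keeps Chinese questions readable in the output files.
            json.dump(items, f, ensure_ascii=False, indent=2)
    return sorted(groups)
```

The sorted list returned by `write_groups` corresponds to the database names that step 2 adds to train.libsonnet and val.libsonnet.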

infer

Prerequisite: a test set that has already been processed and converted to the dusql format.

Steps:

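  1. Run scripts/split_test.py to obtain the test data split by database.
  2. Create the data config: add the database names that the test data from the previous step depends on to test.libsonnet.
  3. Edit infer2.sh to set logdir and datadir. The contents of infer2.sh are: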
nohup python scripts/infer_questions_multi_proc.py --logdir ./logdir/3.0-for-hornor \
                --data-config data/test.libsonnet \
                --questions data/test_split/test.json \
                --output-spider ./logdir/3.0-for-hornor/infer_35000.json \
                --nproc 4 \
                --step 35000  >infer_3.0_35000.log 2>&1 &
tail -f infer_3.0_35000.log
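
In infer2.sh the log directory and the checkpoint step each appear in several places, so changing one by hand is easy to get wrong. The sketch below derives the full argument list from those two values. `build_infer_command` is a hypothetical helper, not part of the repository; the flags and default paths are copied from the script above, and the output file name pattern infer_<step>.json is inferred from that one example.

```python
import shlex


def build_infer_command(logdir, step, data_config="data/test.libsonnet",
                        questions="data/test_split/test.json", nproc=4):
    """Assemble the argument list for scripts/infer_questions_multi_proc.py."""
    # Output file name pattern assumed from the example in infer2.sh.
    output = f"{logdir}/infer_{step}.json"
    return [
        "python", "scripts/infer_questions_multi_proc.py",
        "--logdir", logdir,
        "--data-config", data_config,
        "--questions", questions,
        "--output-spider", output,
        "--nproc", str(nproc),
        "--step", str(step),
    ]


if __name__ == "__main__":
    cmd = build_infer_command("./logdir/3.0-for-hornor", 35000)
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```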
