GithubHelp home page GithubHelp logo

ammieqi / spider-schema-gnn Goto Github PK

View Code? Open in Web Editor NEW

This project forked from benbogin/spider-schema-gnn

0.0 2.0 0.0 68 KB

Author implementation of the paper "Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing"

Python 98.64% Jsonnet 1.36%

spider-schema-gnn's Introduction

Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing

Author implementation of this ACL 2019 paper.

Install & Configure

  1. Install pytorch version 1.0.1.post2 that fits your CUDA version

    (this repository should probably work with the latest pytorch version, but wasn't tested for it. If you use another version, you'll need to also update the versions of packages in requirements.txt)

    pip install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp37-cp37m-linux_x86_64.whl # CUDA 10.0 build
    
  2. Install the rest of required packages

    pip install -r requirements.txt
    
  3. Run this command to install NLTK punkt.

python -c "import nltk; nltk.download('punkt')"
  1. Download the dataset from the official Spider dataset website

  2. Edit the config file train_configs/defaults.jsonnet to update the location of the dataset:

local dataset_path = "dataset/";

Training

  1. Use the following AllenNLP command to train:
allennlp train train_configs/defaults.jsonnet -s experiments/name_of_experiment \
--include-package dataset_readers.spider \ 
--include-package models.semantic_parsing.spider_parser

First time loading of the dataset might take a while (a few hours) since the model first loads values from tables and calculates similarity features with the relevant question. It will then be cached for subsequent runs.

You should get results similar to the following (the sql_match is the one measured in the official evaluation test):

  "best_validation__match/exact_match": 0.3715686274509804,
  "best_validation_sql_match": 0.47549019607843135,
  "best_validation__others/action_similarity": 0.5731271471206189,
  "best_validation__match/match_single": 0.6254612546125461,
  "best_validation__match/match_hard": 0.3054393305439331,
  "best_validation_beam_hit": 0.6070588235294118,
  "best_validation_loss": 7.383035182952881
  "best_epoch": 32

Note that the hyper-parameters used in defaults.jsonnet are different than those mentioned in the paper (most importantly, 3 timesteps are used instead of 2), thanks to the following contribution from @wlhgtc. The original training config file is still available in train_configs/paper_Defaults.jsonnet.

Inference

Use the following AllenNLP command to output a file with the predicted queries:

allennlp predict experiments/name_of_experiment dataset/dev.json \
--predictor spider \
--use-dataset-reader \
--cuda-device=0 \
--output-file experiments/name_of_experiment/prediction.sql \
--silent \
--include-package models.semantic_parsing.spider_parser \
--include-package dataset_readers.spider \
--include-package predictors.spider_predictor \
--weights-file experiments/name_of_experiment/best.th \
-o "{\"dataset_reader\":{\"keep_if_unparsable\":true}}"

spider-schema-gnn's People

Contributors

benbogin avatar wlhgtc avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.