Comments (15)

lihaoyang-ruc commented on August 26, 2024

Yes

from resdsql.

lihaoyang-ruc commented on August 26, 2024

Hi!
Have you fine-tuned RESDSQL on your dataset? Or did you only use the checkpoints we provided to perform inference on your dataset?

VirendraSttl commented on August 26, 2024

I have been utilizing the provided checkpoints in RESDSQL to enhance my work. However, I am uncertain about the process of fine-tuning RESDSQL on my own dataset. Could you kindly provide guidance on how to proceed with fine-tuning RESDSQL using my specific dataset?

lihaoyang-ruc commented on August 26, 2024

RESDSQL has been fine-tuned on Spider. Therefore, you should prepare your dataset in the same format as Spider (see its home page: https://yale-lily.github.io/spider).

In fact, most Text-to-SQL datasets organize their data in Spider's format (e.g., Dr. Spider, CSpider, BIRD, Kaggle-DBQA, etc.).
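Because these datasets share Spider's tables.json layout, it may help to see how one schema entry could be derived from a SQLite database. The sketch below assumes Spider's field names (db_id, table_names, table_names_original, column_names, column_names_original, column_types, primary_keys, foreign_keys). It is a simplification: column types are mapped only to "number" or "text" (Spider also uses "time", "boolean", and "others"), and the natural-language names are approximated by lowercasing and replacing underscores. The helpers `build_tables_entry` and `natural` are hypothetical and not part of RESDSQL.

```python
import sqlite3

# Substrings that mark a SQLite declared type as numeric (a simplification).
NUMERIC_HINTS = ("int", "real", "num", "float", "double", "dec")


def natural(name: str) -> str:
    # Rough stand-in for Spider's hand-written natural-language names.
    return name.replace("_", " ").lower()


def build_tables_entry(db_path: str, db_id: str) -> dict:
    """Build one Spider-style tables.json entry from a SQLite file (sketch)."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        table_names = [
            row[0]
            for row in cur.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' "
                "AND name NOT LIKE 'sqlite_%' ORDER BY name"
            ).fetchall()
        ]
        column_names_original = [[-1, "*"]]  # index 0 is Spider's special "*" column
        column_types = ["text"]
        primary_keys = []
        fk_refs = []  # (table, column, referenced table, referenced column)
        for t_idx, table in enumerate(table_names):
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for _cid, name, decl_type, _nn, _dflt, pk in cur.execute(
                f'PRAGMA table_info("{table}")'
            ).fetchall():
                if pk:
                    primary_keys.append(len(column_names_original))
                column_names_original.append([t_idx, name])
                is_num = any(h in (decl_type or "").lower() for h in NUMERIC_HINTS)
                column_types.append("number" if is_num else "text")
            # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
            for row in cur.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
                fk_refs.append((table, row[3], row[2], row[4]))
    finally:
        conn.close()

    # Map (table, column) to its global column index so foreign keys become index pairs.
    index = {
        (table_names[t].lower(), col.lower()): i
        for i, (t, col) in enumerate(column_names_original)
        if t >= 0
    }
    foreign_keys = []
    for table, col, ref_table, ref_col in fk_refs:
        src = index.get((table.lower(), col.lower()))
        dst = index.get((ref_table.lower(), (ref_col or "").lower()))
        if src is not None and dst is not None:  # skips implicit-PK references
            foreign_keys.append([src, dst])

    return {
        "db_id": db_id,
        "table_names_original": table_names,
        "table_names": [natural(t) for t in table_names],
        "column_names_original": column_names_original,
        "column_names": [[t, natural(c)] for t, c in column_names_original],
        "column_types": column_types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }
```

A list of such entries, one per database, could then be written out with `json.dump` as tables.json. The generated natural-language names are only a starting point and are worth reviewing by hand.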

VirendraSttl commented on August 26, 2024

I have already prepared my dataset in Spider's format (i.e., tables.json).

lihaoyang-ruc commented on August 26, 2024

Just tables.json is not enough.

To train RESDSQL on your dataset, you need to prepare at least the following three items (taking Spider's files as an example):

  • database, a folder where the SQLite databases are saved.
  • train_spider.json, a JSON file containing the training examples; each example should contain three fields: db_id, query, and question.
  • tables.json, a JSON file that describes the schema of all databases.
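Before training, a quick consistency check over these three items can catch common mistakes. The sketch below assumes Spider's layout, where each database is stored as database/<db_id>/<db_id>.sqlite, and checks that every training example has the three required fields and refers to a db_id that exists both in tables.json and on disk. `check_spider_layout` is a hypothetical helper, not part of RESDSQL.

```python
import json
import os

REQUIRED_FIELDS = ("db_id", "query", "question")


def check_spider_layout(root: str, train_file: str = "train_spider.json") -> list:
    """Return a list of problems found in a Spider-style dataset folder (sketch)."""
    problems = []
    with open(os.path.join(root, "tables.json")) as f:
        schema_ids = {entry["db_id"] for entry in json.load(f)}
    with open(os.path.join(root, train_file)) as f:
        examples = json.load(f)
    for i, ex in enumerate(examples):
        missing = [k for k in REQUIRED_FIELDS if k not in ex]
        if missing:
            problems.append(f"example {i}: missing fields {missing}")
            continue
        db_id = ex["db_id"]
        if db_id not in schema_ids:
            problems.append(f"example {i}: db_id {db_id!r} not in tables.json")
        # Assumed Spider layout: database/<db_id>/<db_id>.sqlite
        sqlite_path = os.path.join(root, "database", db_id, f"{db_id}.sqlite")
        if not os.path.isfile(sqlite_path):
            problems.append(f"example {i}: missing {sqlite_path}")
    return problems
```

An empty list means every example passed these checks; it does not verify that the SQL queries themselves execute.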

To run inference and evaluation, you should prepare a separate dev_gold.sql file, in which each line contains a gold SQL query and its corresponding db_id.
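In Spider, each line of dev_gold.sql is the gold query followed by a tab and the db_id. A minimal sketch for deriving such a file from a dev.json in the same example format might look like this; `write_gold_sql` is a hypothetical helper, and collapsing whitespace inside the query is an assumption made so that multi-line queries stay on one line.

```python
import json


def write_gold_sql(dev_json_path: str, gold_sql_path: str) -> int:
    """Write one 'SQL<TAB>db_id' line per dev example; return the example count."""
    with open(dev_json_path) as f:
        examples = json.load(f)
    with open(gold_sql_path, "w") as out:
        for ex in examples:
            # Collapse newlines and tabs inside the query so each example stays on one line.
            query = " ".join(ex["query"].split())
            out.write(f"{query}\t{ex['db_id']}\n")
    return len(examples)
```

Generating the gold file from dev.json keeps the two in the same order, which matters if predictions are compared line by line.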

VirendraSttl commented on August 26, 2024

Okay, I'll try this solution.

BTW, I have a question: do I need to retrain the model every time I change my dataset?
Is there any way for RESDSQL to generate SQL queries on a hidden test set?

lihaoyang-ruc commented on August 26, 2024

No. If your training set and test set have the same (or a similar) distribution, the model can naturally generalize to the hidden test set without additional training.

VirendraSttl commented on August 26, 2024

Okay.

I followed your training steps and noticed that both train_spider.json and dev.json were required. However, I am a bit confused about their differences. Are they essentially the same file with different names, or do they serve distinct purposes in the training process?

lihaoyang-ruc commented on August 26, 2024

They are different files. train_spider.json is the training set, and dev.json is the development set.

lihaoyang-ruc commented on August 26, 2024

We use dev.json to select the best checkpoint during fine-tuning.

VirendraSttl commented on August 26, 2024

So, can I reuse the same dev.json file, or do I need to create a separate one for my dataset?

lihaoyang-ruc commented on August 26, 2024

My suggestion would be to create a separate dev.json so that you can evaluate the performance of the model on unseen data.
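One way to build such a separate dev.json is to hold out part of your own annotated examples. The sketch below offers two options: a random split, and a split by database that mimics Spider, whose dev databases do not appear in the training set. The function names, the 10% default ratio, and the fixed seed are assumptions for illustration, not RESDSQL settings.

```python
import json
import random


def split_examples(examples, dev_ratio=0.1, by_database=False, seed=42):
    """Split Spider-style examples into (train, dev) lists (sketch)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    if by_database:
        # Spider-like split: databases used for dev never appear in training.
        db_ids = sorted({ex["db_id"] for ex in examples})
        rng.shuffle(db_ids)
        n_dev = max(1, round(len(db_ids) * dev_ratio))
        dev_dbs = set(db_ids[:n_dev])
        train = [ex for ex in examples if ex["db_id"] not in dev_dbs]
        dev = [ex for ex in examples if ex["db_id"] in dev_dbs]
    else:
        # Random split at the example level (dev shares databases with train).
        shuffled = list(examples)
        rng.shuffle(shuffled)
        n_dev = max(1, round(len(shuffled) * dev_ratio))
        dev, train = shuffled[:n_dev], shuffled[n_dev:]
    return train, dev


def write_split(examples, train_path="train_spider.json", dev_path="dev.json", **kwargs):
    """Write the two halves to disk and return their sizes."""
    train, dev = split_examples(examples, **kwargs)
    for path, data in ((train_path, train), (dev_path, dev)):
        with open(path, "w") as f:
            json.dump(data, f, indent=2)
    return len(train), len(dev)
```

Which option fits depends on the target: if the hidden test set uses the same databases as training, a random split is closer to that setting; if it uses new databases, splitting by database gives a more honest estimate.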

lihaoyang-ruc commented on August 26, 2024

If you train and evaluate your model on the same training set, I don't think the result is meaningful, because the model will memorize your training data and quickly reach (close to) 100% accuracy.

VirendraSttl commented on August 26, 2024

You mean train_spider.json and dev.json are like splitting our data into two sets, i.e., a training set and a test set?
