Comments (6)
"Evaluating..." is a placeholder. It means that this checkpoint (i.e., checkpoint-61642) is under evaluation, so the other evaluation processes need to skip it. This is designed for multi-process parallel evaluation. For more details, please see lines 54-82 of evaluate_text2sql_ckpts.py.
However, when the evaluation process is terminated unexpectedly, this file still exists. If we restart the evaluation process, the new process will naturally skip this checkpoint.
Solution:
Delete this file (i.e., checkpoint-61642.txt), and run evaluate_on_spider_realistic.sh again.
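The placeholder mechanism described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from evaluate_text2sql_ckpts.py: the function names try_claim and clear_stale_placeholders are hypothetical, and the sketch assumes that each checkpoint has a result file named <checkpoint>.txt which contains only the text "Evaluating..." while the evaluation is in progress.

```python
import os

PLACEHOLDER = "Evaluating..."  # marker text described in this thread


def try_claim(results_dir, ckpt_name):
    """Hypothetical helper: return True if this process claimed the
    checkpoint, False if another process already did (so skip it)."""
    path = os.path.join(results_dir, f"{ckpt_name}.txt")
    try:
        # Mode "x" fails if the file already exists, so only one
        # process can create the placeholder for a given checkpoint.
        with open(path, "x") as f:
            f.write(PLACEHOLDER)
    except FileExistsError:
        return False
    return True


def clear_stale_placeholders(results_dir):
    """Hypothetical helper: delete result files that still contain only
    the placeholder (left behind by a killed run). Returns their names."""
    removed = []
    for name in sorted(os.listdir(results_dir)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(results_dir, name)
        with open(path) as f:
            is_placeholder = f.read().strip() == PLACEHOLDER
        if is_placeholder:
            os.remove(path)
            removed.append(name)
    return removed
```

A cleanup like clear_stale_placeholders should only be run while no evaluation process is active, because it cannot tell a stale placeholder from one that belongs to a running process.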
Please let me know if this problem persists. :)
from resdsql.
Hello! Thank you so much for your fast reply!
I re-ran it, but for some reason I still get the same results (as in the image I posted above).
Also, I noticed something interesting when running the code on my end: when I tried to print the variables "em" and "exec", nothing was printed to the console. I'm not sure whether this issue is exclusive to my machine or whether you have encountered it as well.
Again, thank you so much for your fast reply!
from resdsql.
Unfortunately, I have not encountered this problem.
As far as I know, evaluating each checkpoint with the 3B-scale model takes a long time. If your script finished quickly and without any exceptions, something has probably gone wrong.
I recommend checking line by line, and if you find any bugs, please feel free to contact me.
from resdsql.
Alternatively, we provide an inference script in scripts/inference, but this script can only evaluate one checkpoint at a time.
Running sh ./scripts/inference/infer_text2natsql.sh 3b spider-realistic reproduces our results on spider-realistic.
from resdsql.
I ran sh ./scripts/evaluate_robustness/evaluate_on_spider_realistic.sh and obtained the following output:
ckpt_names: ['checkpoint-61642', 'checkpoint-78302']
Start evaluating ckpt: checkpoint-61642
Namespace(batch_size=1, db_path='./database', dev_filepath='./data/preprocessed_data/resdsql_spider_realistic_natsql.json', device='0', eval_results_path='./eval_results/text2natsql-t5-3b-spider-realistic', mode='eval', num_beams=8, num_return_sequences=8, original_dev_filepath='./data/spider-realistic/spider-realistic.json', output='predicted_sql.txt', save_path='./models/text2natsql-t5-3b/checkpoint-61642', seed=42, tables_for_natsql='./data/preprocessed_data/spider_realistic_tables_for_natsql.json', target_type='natsql')
19%|███████████████████▉ | 94/508 [03:37<15:49, 2.29s/it]select sum(*) from cars_data where cars_data.year = 1980
wrong number of arguments to function sum()
21%|██████████████████████▌ | 107/508 [04:09<17:03, 2.55s/it]Before fix: select cars_data.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
After fix: select car_names.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
---------------
28%|█████████████████████████████▉ | 142/508 [05:24<13:09, 2.16s/it]Before fix: select count ( flights.* ) from flights where airports.airport = 'ASY' and airlines.airline = 'United Airlines'
After fix: select count ( flights.* ) from flights where airports.airportcode = 'ASY' and airlines.airline = 'United Airlines'
---------------
42%|████████████████████████████████████████████▊ | 213/508 [07:53<11:58, 2.44s/it]Before fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visit.id order by sum ( visit.total_spent ) desc limit 1
After fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visitor.id order by sum ( visit.total_spent ) desc limit 1
---------------
43%|█████████████████████████████████████████████▋ | 217/508 [08:04<12:16, 2.53s/it]Before fix: select sum ( visit.total_spent ) from visit where visit.level_of_membership = 1
After fix: select sum ( visit.total_spent ) from visit where visitor.level_of_membership = 1
---------------
52%|███████████████████████████████████████████████████████▌ | 264/508 [10:09<13:01, 3.20s/it]Before fix: select students.first_name , students.middle_name , students.last_name from student_enrolment order by student_enrolment.date_first_registered asc limit 1
After fix: select students.first_name , students.middle_name , students.last_name from student_enrolment order by students.date_first_registered asc limit 1
---------------
59%|██████████████████████████████████████████████████████████████▊ | 298/508 [11:33<08:25, 2.41s/it]Before fix: select tv_series.series_name from tv_series where tv_series.episode = 'A Love of a Lifetime'
After fix: select tv_channel.series_name from tv_series where tv_series.episode = 'A Love of a Lifetime'
---------------
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 508/508 [19:42<00:00, 2.33s/it]
Text-to-SQL inference spends 1214.2015063762665s.
exact_match score: 0.7736220472440944
exec score: 0.8188976377952756
Start evaluating ckpt: checkpoint-78302
Namespace(batch_size=1, db_path='./database', dev_filepath='./data/preprocessed_data/resdsql_spider_realistic_natsql.json', device='0', eval_results_path='./eval_results/text2natsql-t5-3b-spider-realistic', mode='eval', num_beams=8, num_return_sequences=8, original_dev_filepath='./data/spider-realistic/spider-realistic.json', output='predicted_sql.txt', save_path='./models/text2natsql-t5-3b/checkpoint-78302', seed=42, tables_for_natsql='./data/preprocessed_data/spider_realistic_tables_for_natsql.json', target_type='natsql')
7%|███████▋ | 36/508 [01:21<16:36, 2.11s/it]Before fix: select pets.weight from pets where pets.pet_type = 'dog' order by pets.pet_age asc limit 1
After fix: select pets.weight from pets where pets.pet_age = 'dog' order by pets.pet_age asc limit 1
---------------
11%|███████████▋ | 55/508 [02:04<13:48, 1.83s/it]Before fix: select student.lname from student where @.@ join has_pet.* and pets.pet_age = 3 and has_pet.pettype = 'cat'
After fix: select student.lname from student where @.@ join has_pet.* and pets.pet_age = 3 and pets.pettype = 'cat'
---------------
21%|██████████████████████▌ | 107/508 [04:06<16:56, 2.53s/it]Before fix: select cars_data.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
After fix: select car_names.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
---------------
31%|█████████████████████████████████▋ | 160/508 [05:53<10:06, 1.74s/it]Before fix: select flights.flightno from flights where airports.destairport = 'APG'
After fix: select flights.flightno from flights where flights.destairport = 'APG'
---------------
42%|████████████████████████████████████████████▊ | 213/508 [07:47<11:42, 2.38s/it]Before fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visit.id order by sum ( visit.total_spent ) desc limit 1
After fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visitor.id order by sum ( visit.total_spent ) desc limit 1
---------------
59%|██████████████████████████████████████████████████████████████▊ | 298/508 [11:14<08:25, 2.41s/it]Before fix: select tv_series.series_name from tv_series where tv_series.episode like '%A Love of a Lifetime%'
After fix: select tv_channel.series_name from tv_series where tv_series.episode like '%A Love of a Lifetime%'
---------------
75%|████████████████████████████████████████████████████████████████████████████████▍ | 382/508 [14:25<03:45, 1.79s/it]Before fix: select countrylanguage.country from countrylanguage where countrylanguage.isofficial = 'English' or countrylanguage.isofficial = 'Dutch'
After fix: select countrylanguage.countrycode from countrylanguage where countrylanguage.isofficial = 'English' or countrylanguage.isofficial = 'Dutch'
---------------
76%|█████████████████████████████████████████████████████████████████████████████████▎ | 386/508 [14:35<04:58, 2.45s/it]Before fix: select countrylanguage.language from countrylanguage where country.governmentform = 'Republic' and count ( countrylanguage.* ) = 1 group by country.language.language
After fix: select countrylanguage.language from countrylanguage where country.governmentform = 'Republic' and count ( countrylanguage.* ) = 1 group by countrylanguage.language
---------------
90%|████████████████████████████████████████████████████████████████████████████████████████████████ | 456/508 [17:09<01:46, 2.05s/it]Before fix: select highschooler.friend_name from highschooler where highschooler.name = 'Kyle'
After fix: select highschooler.name from highschooler where highschooler.name = 'Kyle'
---------------
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 508/508 [19:10<00:00, 2.27s/it]
Text-to-SQL inference spends 1177.4357948303223s.
exact_match score: 0.765748031496063
exec score: 0.8070866141732284
ckpt name: ./models/text2natsql-t5-3b/checkpoint-61642
EM: 0.7736220472440944
EXEC: 0.8188976377952756
-----------
ckpt name: ./models/text2natsql-t5-3b/checkpoint-78302
EM: 0.765748031496063
EXEC: 0.8070866141732284
-----------
Best EM ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}
Best EXEC ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}
Best EM+EXEC ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}
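The summary at the end of this log can be reproduced from the per-checkpoint scores with a few lines of Python. This is a sketch, not the repository's code: the variable names are hypothetical, and ranking "EM+EXEC" by the sum of the two scores is an assumption about how the script combines them.

```python
# Scores copied from the log above.
results = [
    {"ckpt": "./models/text2natsql-t5-3b/checkpoint-61642",
     "EM": 0.7736220472440944, "EXEC": 0.8188976377952756},
    {"ckpt": "./models/text2natsql-t5-3b/checkpoint-78302",
     "EM": 0.765748031496063, "EXEC": 0.8070866141732284},
]

# Pick the best checkpoint under each criterion.
best_em = max(results, key=lambda r: r["EM"])
best_exec = max(results, key=lambda r: r["EXEC"])
# Assumption: the combined criterion is the sum of EM and EXEC.
best_both = max(results, key=lambda r: r["EM"] + r["EXEC"])

print("Best EM ckpt:", best_em)
print("Best EXEC ckpt:", best_exec)
print("Best EM+EXEC ckpt:", best_both)
```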
I hope this will help you.
from resdsql.
Yes, this helps me a lot! Thank you so much for the information that you've provided!
I think this issue might have something to do with my machine. Thank you for your efforts and response. I really appreciate it!
from resdsql.