Comments (3)
Hi @dojoteef. Thanks for raising this issue. I'm looking into it at the moment. The regexes that we use to split the aggregate summaries are not exhaustive, and at times we had to intervene manually and modify the lines in the text rather than write a regex for every outlier scenario. I am adding some more checks to try to split as many book chapters as possible from the list you've shared. Thanks again!
from booksum.
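As an illustration of the kind of regex-based splitting described above, here is a minimal sketch. It assumes aggregate summaries are labelled with headings such as "Chapters 1-3"; the pattern, the function name `split_aggregate_heading`, and the heading format are all hypothetical and are not taken from the BookSum repository.

```python
import re

# Hypothetical pattern: matches headings such as "Chapters 1-3" or "Chapter 4".
# This is NOT the regex used in the BookSum repository, only an illustration.
CHAPTER_RANGE = re.compile(r"^chapters?\s+(\d+)(?:\s*-\s*(\d+))?$", re.IGNORECASE)


def split_aggregate_heading(heading):
    """Return the chapter numbers a heading covers, or None if unrecognized."""
    match = CHAPTER_RANGE.match(heading.strip())
    if match is None:
        # Outlier heading: no pattern covers it, so it is left for manual handling.
        return None
    start = int(match.group(1))
    # A single-chapter heading has no range end, so it covers only `start`.
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        return None
    return list(range(start, end + 1))
```

Returning `None` for unmatched headings keeps the outliers visible, which mirrors the point made in the comment: a finite set of patterns will not cover every source, so some cases still need a manual fix.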
My latest commit should have fixed this issue. I have tested the fix for all sources, and it should help with splitting some of the book chapters that we were not able to get before.
Thanks so much for looking into all these issues I've brought up! I've shifted focus for now, but will be coming back to the BookSum dataset in the near future and will try to see if I encounter any additional blockers.
Thanks again for all the help!
Related Issues (20)
- What uses the book-level summaries?
- Unnecessary in align_data_bi_encoder_paraphrase.py
- GPU error running align_data_bi_encoder_paraphrase.py
- Release a dataset snapshot
- Wrong File Open Mode in <align_data_bi_encoder_paraphrase.py>
- Feature request: Provide single script to create data
- Some books on the pinkmonkey are not free now. How to solve this problem?
- NotADirectoryError[WinError 267]
- Missing book level alignments, extra chapter level alignments
- Request for .gathered Data Alignment files
- Which scripts used to fine-tune t5 model after final alignment .jsonl files generated?
- More instructions to reproduce the baseline model results?
- Need more instructions to reproduce the extractive oracle of booksum-chapter
- Coverage & Density
- Evaluation Metric - rougeL
- EOL error needs to be changed to "literature_links.tsv.pruned"
- Licensing Question
- Request for models' summaries and human evaluation
- Using book-level booksum