Notes toward creating clean data for LLM training and interpretability:
- Generate proper class relations programmatically instead of hand-crafting them
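A minimal sketch of what programmatic class-relation generation could look like. The hierarchy and the "X is a kind of Y" template below are illustrative placeholders, not the actual dataset schema:

```python
# Hypothetical child -> parent hierarchy; all names are placeholders.
RELATIONS = {
    "dog": "animal",
    "cat": "animal",
    "animal": "living thing",
    "oak": "tree",
    "tree": "plant",
    "plant": "living thing",
}

def generate_sentences(relations):
    """Turn (child, parent) pairs into simple template statements."""
    return [f"{child} is a kind of {parent}" for child, parent in relations.items()]

sentences = generate_sentences(RELATIONS)
```

Templated generation keeps the relations consistent and makes it easy to regenerate the corpus when the hierarchy changes.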
- Shuffle the dataset before training
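A sketch of the shuffle step, using a seeded RNG so runs stay reproducible (the seed value is an arbitrary choice):

```python
import random

def shuffle_dataset(examples, seed=42):
    """Return a shuffled copy so training order differs from generation order."""
    rng = random.Random(seed)  # seeded for reproducibility
    shuffled = list(examples)  # copy; leave the original list intact
    rng.shuffle(shuffled)
    return shuffled

data = [f"example {i}" for i in range(10)]
shuffled = shuffle_dataset(data)
```

Shuffling a copy (rather than in place) keeps the original generation order available for debugging.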
- Decide how to handle '\n' and normalize statement length
- Fix the trouble with the dot '.': make generated sentences cleanly separable by adding an <eos_token> or another separator
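One way the separator fix could work: strip the ambiguous sentence-final dot and join on an explicit separator instead. `<eos_token>` below is the literal string from the note; in real training it would be the tokenizer's actual EOS token:

```python
EOS = "<eos_token>"  # placeholder; use the tokenizer's real EOS token in training

def join_with_sep(sentences, sep=EOS):
    """Drop the trailing '.' and append an explicit separator, so splitting
    the corpus does not depend on the ambiguous dot character."""
    return "".join(s.rstrip(". ") + sep for s in sentences)

def split_on_sep(corpus, sep=EOS):
    """Recover the individual sentences from the joined corpus."""
    return [s for s in corpus.split(sep) if s]

corpus = join_with_sep(["A dog is an animal.", "Version 2.0 shipped."])
parts = split_on_sep(corpus)
```

Note that the interior dot in "2.0" survives untouched, which is exactly the case that breaks naive splitting on '.'.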
- ToDo: write a custom beam search over model outputs with variable length, then check whether the generated entity names make sense (BM25 evaluation)
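The beam search step could be sketched as below. A fixed bigram table stands in for real model logits (all tokens and probabilities are illustrative placeholders), and `max_len` is the knob for varying output length:

```python
import heapq
import math

# Toy next-token distribution standing in for real LLM logits.
BIGRAMS = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.5, "cat": 0.3, "<eos>": 0.2},
    "a": {"dog": 0.4, "cat": 0.4, "<eos>": 0.2},
    "dog": {"<eos>": 1.0},
    "cat": {"<eos>": 1.0},
}

def beam_search(beam_width=2, max_len=5):
    """Keep the beam_width best partial sequences by total log-prob;
    sequences that emit <eos> are moved to the finished pool."""
    beams = [(0.0, ["<bos>"])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, p in BIGRAMS.get(seq[-1], {}).items():
                new = (score + math.log(p), seq + [tok])
                if tok == "<eos>":
                    finished.append(new)
                else:
                    candidates.append(new)
        if not candidates:
            break
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return sorted(finished, key=lambda c: -c[0])

results = beam_search()
```

The finished sequences, ranked by log-probability, would then feed the BM25 check: score each generated entity name against the reference corpus and flag low-scoring ones as likely nonsense.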