tingofurro / keep_it_simple


Codebase, data and models for the Keep it Simple paper at ACL2021

License: Apache License 2.0

Python 70.33% Jupyter Notebook 4.14% JavaScript 2.32% HTML 23.21%

acl2021 bert news reinforcement-learning simplification text-simplification unsupervised-learning

keep_it_simple's Issues

Explanation for generate_batch and other generate functions

Does generate_batch in model_generator use greedy decoding? If so, why wasn't the greedy decoding already implemented in HF (Hugging Face) used instead?

Could you use HF greedy decoding instead of generate_batch? I think generate_batch is doing greedy decoding.
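For reference, in Hugging Face transformers greedy decoding is what `model.generate(..., do_sample=False, num_beams=1)` performs. To make the comparison concrete, here is a minimal, framework-free sketch of greedy decoding. It is not the repository's generate_batch, whose internals are not shown here; `greedy_decode` and the toy scorer `toy_logits` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int = 20,
) -> List[int]:
    """Extend `prompt` one token at a time, always taking the highest-scoring token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        # Greedy choice: argmax over the vocabulary (no sampling, no beams).
        best = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


def toy_logits(tokens: Sequence[int]) -> List[float]:
    # Hypothetical 4-token "model" that always prefers the token after the last one.
    preferred = (tokens[-1] + 1) % 4
    return [1.0 if i == preferred else 0.0 for i in range(4)]


print(greedy_decode(toy_logits, [0], eos_id=3))  # [0, 1, 2, 3]
```

If generate_batch does the same argmax-per-step loop, the main practical differences from HF's built-in version would lie in batching, padding, and caching details rather than in the decoding rule itself.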

Processing in data collator

Hi Tingofurro,

Thanks for sharing a nice simplification repository.
I have a question about the processing that happens in the data collator:

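def cc_news_collate(inps):
    batch_paras = []
    for inp in inps:
        text = inp["text"]
        # Rank paragraphs by how far their space count (about words - 1) is from 35
        paragraphs = sorted(text.split("\n"), key=lambda p: abs(p.count(" ") - 35))
        # Keep only the top-ranked paragraph for this document
        batch_paras.append(paragraphs[0])
    return batch_paras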

Why are you only appending the largest paragraph (if I am correct) rather than the complete text?

Looking forward to your response.
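Judging only from the collator code quoted above, the sort key `abs(p.count(" ") - 35)` ranks paragraphs by how close their space count is to 35. So `paragraphs[0]` is the paragraph nearest roughly 36 words, which is not necessarily the longest one. The reason for this choice is not stated here. A standalone check with made-up input (`make_paragraph` is a hypothetical helper):

```python
def cc_news_collate(inps):
    batch_paras = []
    for inp in inps:
        text = inp["text"]
        paragraphs = sorted(text.split("\n"), key=lambda p: abs(p.count(" ") - 35))
        batch_paras.append(paragraphs[0])
    return batch_paras


def make_paragraph(n_words):
    # Hypothetical filler paragraph with n_words words (n_words - 1 spaces).
    return " ".join(["w"] * n_words)


long_p, short_p, mid_p = make_paragraph(120), make_paragraph(10), make_paragraph(36)
doc = {"text": "\n".join([long_p, short_p, mid_p])}
picked = cc_news_collate([doc])[0]
print(len(picked.split(" ")))  # 36
```

The 120-word paragraph scores 84, the 10-word one scores 26, and the 36-word one scores 0, so the mid-length paragraph is returned.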

utils_misc can't find freer GPU

The nvidia-smi parsing code results in an empty sequence.

>>> import utils_misc
>>> utils_misc.select_freer_gpu()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/corey/workspace/keep_it_simple/utils_misc.py", line 24, in select_freer_gpu
    freer_gpu = str(get_freer_gpu())
  File "/home/corey/workspace/keep_it_simple/utils_misc.py", line 11, in get_freer_gpu
    return np.argmax(memory_available)
  File "<__array_function__ internals>", line 200, in argmax
  File "/home/corey/workspace/keep_it_simple/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
  File "/home/corey/workspace/keep_it_simple/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/corey/workspace/keep_it_simple/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence

Changing to 'grep -A5' appears to work for running the sample on a system with one consumer GPU, but I'm not equipped to evaluate the overall impact of the change.

os.system('nvidia-smi -q -d Memory |grep -A5 GPU|grep Free >tmp_smi')
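An alternative that avoids grepping the human-readable `-q` report is nvidia-smi's CSV query mode (`--query-gpu=memory.free --format=csv,noheader,nounits`), which prints one free-memory value in MiB per GPU. The sketch below is not the repository's utils_misc code. The functions `parse_free_memory`, `pick_freer_index` and `get_freer_gpu_sketch` are hypothetical, and it returns None instead of raising when no GPU is found.

```python
import subprocess
from typing import List, Optional

QUERY = ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]


def parse_free_memory(csv_text: str) -> List[Optional[int]]:
    """One entry per GPU line, in order; non-numeric values such as "[N/A]" become None."""
    values: List[Optional[int]] = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Keep an entry for every GPU so list positions still match GPU indices.
        values.append(int(line) if line.isdigit() else None)
    return values


def pick_freer_index(free: List[Optional[int]]) -> Optional[int]:
    """Index of the GPU with the most free memory, or None if nothing usable was parsed."""
    known = [i for i, v in enumerate(free) if v is not None]
    if not known:
        # Avoids the "argmax of an empty sequence" error from the traceback above.
        return None
    return max(known, key=lambda i: free[i])


def get_freer_gpu_sketch() -> Optional[int]:
    """Query nvidia-smi and pick the freer GPU; None if nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return pick_freer_index(parse_free_memory(out))
```

One caveat that applies to the original approach too: nvidia-smi lists GPUs in PCI bus order, which can differ from CUDA's default device order unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.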

May I ask how coverage_roberta.bin and gpt2_med_cp90.bin are generated?

May I ask how coverage_roberta.bin and gpt2_med_cp90.bin are generated? Can you provide the relevant code? Thank you.


An explanation of the coverage scorer
