koomri / text-segmentation
Implementation of the paper: Text Segmentation as a Supervised Learning Task
In the train function, softmax activation is not applied but it is applied in the validate function. Why is that so? Thanks.
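One likely explanation, assuming the training loop uses a cross-entropy loss that takes raw logits (as `torch.nn.CrossEntropyLoss` does): the softmax is already folded into the loss, so applying it again during training would be redundant, while validation needs explicit probabilities to compare against a threshold. The plain-Python sketch below only illustrates that equivalence; the function names are illustrative and are not taken from the repository.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Cross-entropy computed directly from raw scores.

    Equals -log(softmax(logits)[target]), written as
    log-sum-exp(logits) - logits[target], so no separate softmax is needed.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Hypothetical two-class output for one sentence (no boundary / boundary).
logits = [2.0, -1.0]
target = 0

# Training-style loss: raw logits go straight into the loss.
loss_from_logits = cross_entropy_from_logits(logits, target)

# Validation-style path: explicit softmax, then negative log-probability.
probabilities = softmax(logits)
loss_from_probs = -math.log(probabilities[target])

print(abs(loss_from_logits - loss_from_probs) < 1e-9)
```

Under that assumption, the explicit softmax in validation exists only to produce probabilities for thresholding, not to change the loss.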
Why the need for both pad(self, s, max_length) and pad_document(self, d, max_document_length) in from_presentation.py, max_sentence_embedding.py and single_lstm.py? To me, it seems that those two methods perform the same task; they just have different variable names.
I want to know where the source of the exceptions file is.
Hey,
I was curious, how effective is this tool at segmenting text? Can it also tokenise words?
Can we download a pre-trained model or do we have to train it ourselves? If so, why is that, just out of curiosity?
Is this one of the best such tools out there at the moment or is there any standard tool or machine learning library at the moment which offers a method like this?
Thanks very much.
While trying to load the model given here, I'm facing the following problem:
.....
.....
File "/home/sid/text-segmentation/evaluate.py", line 13, in load_model
model = torch.load(f)
File "/home/sid/miniconda2/envs/textseg/lib/python2.7/site-packages/torch/serialization.py", line 261, in load
return _load(f, map_location, pickle_module)
File "/home/sid/miniconda2/envs/textseg/lib/python2.7/site-packages/torch/serialization.py", line 399, in _load
magic_number = pickle_module.load(f)
cPickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.
As I don't have a Linux OS, I'm stuck setting up the environment.
I saw that there is no torch=0.3.0 build for Windows. If I install torch=0.4.1 (cu80/torch-0.4.1-cp35-cp35m-win_amd64.whl or cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl) or the latest version instead, will it work for me?
Might not be a big deal
I trained the model on my own, with a few small changes here and there, mainly to the wiki data structure and the Python version.
It has gone smoothly so far.
I use Python 3.7 with segeval (2.0.11), and the changes I needed are:
seg.pk --> seg.window.pk.pk
seg.window_diff --> seg.window.windowdiff.window_diff
Thanks for the open source
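The two renames above could be handled without hard-coding either layout. The sketch below is one hedged way to do that: a small helper (hypothetical, not part of segeval or this repository) that tries several dotted attribute paths on a module and returns the first callable it finds. The stand-in objects only mimic the two layouts described in the post; they are not the real segeval package.

```python
from types import SimpleNamespace


def resolve_callable(module, *dotted_paths):
    """Return the first callable found at one of the dotted attribute paths."""
    for path in dotted_paths:
        obj = module
        try:
            for name in path.split("."):
                obj = getattr(obj, name)
        except AttributeError:
            continue  # this layout is not present, try the next path
        if callable(obj):
            return obj
    raise AttributeError("none of these paths resolved: %r" % (dotted_paths,))


# Stand-ins for the two layouts mentioned in the post (assumed shapes).
old_layout = SimpleNamespace(pk=lambda: "old pk")
new_layout = SimpleNamespace(
    window=SimpleNamespace(pk=SimpleNamespace(pk=lambda: "new pk"))
)

old_result = resolve_callable(old_layout, "window.pk.pk", "pk")()
new_result = resolve_callable(new_layout, "window.pk.pk", "pk")()
print(old_result, new_result)

# Intended use with the real package (untested assumption):
#   import segeval as seg
#   pk = resolve_callable(seg, "window.pk.pk", "pk")
#   window_diff = resolve_callable(seg, "window.windowdiff.window_diff", "window_diff")
```

Trying the longer path first keeps the helper working whether or not the short name is still exposed at the top level.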
Is there a pretrained model we could use directly?
ValueError Traceback (most recent call last)
<ipython-input-13-78897f9821a1> in <module>()
----> 1 cutoffs = evaluate.predict_cutoffs(sentences, model, word2vec)
/home/sid/text-segmentation/evaluate.pyc in predict_cutoffs(sentences, model, word2vec)
40 def predict_cutoffs(sentences, model, word2vec):
41 word2vec_sentences = text_to_word2vec(sentences, word2vec)
---> 42 tensored_data = prepare_tensor(word2vec_sentences)
43 batched_tensored_data = []
44 batched_tensored_data.append(tensored_data)
/home/sid/text-segmentation/evaluate.pyc in prepare_tensor(sentences)
23 tensored_data = []
24 for sentence in sentences:
---> 25 tensored_data.append(utils.maybe_cuda(torch.FloatTensor(np.concatenate(sentence))))
26
27 return tensored_data
ValueError: need at least one array to concatenate
Link to a gist containing the error is here
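`np.concatenate` raises "need at least one array to concatenate" when it receives an empty sequence, so one plausible cause (an assumption, not confirmed in this thread) is that some sentence yielded no word vectors at all, for example an empty string or a sentence whose words are all missing from the word2vec vocabulary. A minimal sketch of a guard follows; it uses plain lists and a dictionary in place of NumPy arrays and the real word2vec model, and `embed_sentences` is a hypothetical helper, not a function from evaluate.py.

```python
def embed_sentences(tokenized_sentences, lookup):
    """Map tokens to vectors and drop sentences with no known tokens.

    Returns the indices of the sentences that were kept together with their
    vectors, so predictions can be mapped back to the original sentences.
    """
    kept_indices = []
    embedded = []
    for index, tokens in enumerate(tokenized_sentences):
        vectors = [lookup[token] for token in tokens if token in lookup]
        if not vectors:
            # Nothing to concatenate for this sentence: skip it instead of
            # passing an empty list further down the pipeline.
            continue
        kept_indices.append(index)
        embedded.append(vectors)
    return kept_indices, embedded


# Toy vocabulary with one-dimensional "embeddings" (illustrative values).
lookup = {"a": [1.0], "b": [2.0]}
sentences = [["a", "b"], [], ["zzz"], ["b"]]

kept_indices, embedded = embed_sentences(sentences, lookup)
print(kept_indices)
```

Whether skipping is the right policy depends on the use case; substituting a fixed "unknown" vector would keep the sentence count unchanged.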
Can you share the processed cities and elements dataset?
Hi,
I can't reproduce the reported Pk of 26.26 on Choi's dataset with the pre-trained model.
When I run the default evaluation script
python test_accuracy.py --cuda --model model_gpu.t7
I get Pk of only 0.3667. What can be wrong here?
Running with threshold: 0.4
Loading word2vec ellapsed: 55.1196949482 seconds
running on Choi
...
2018-10-29 13:41:24,821 - INFO - Finished testing.
2018-10-29 13:41:24,821 - INFO - Average loss: 0.0
2018-10-29 13:41:24,821 - INFO - Average accuracy: 0.8987517337031901
2018-10-29 13:41:24,821 - INFO - Pk: 0.3667.
2018-10-29 13:41:24,821 - INFO - F1: 0.3511.
Seconds to execute to whole flow: 93.9438998699
text-segmentation/choiloader.py
Line 26 in 874d6ef
Does this mean we take every sentence separately? If that's the case, shouldn't we add some context sentences, for example window = 3?