koomri / text-segmentation

Implementation of the paper: Text Segmentation as a Supervised Learning Task

Python 99.89% Shell 0.11%
dataset deep-learning machine-learning neural-network nlp text-segmentation

text-segmentation's Issues

Redundant padding method in models?

Why are both pad(self, s, max_length) and pad_document(self, d, max_document_length) needed in from_presentation.py, max_sentence_embedding.py and single_lstm.py? As far as I can tell, the two methods perform the same task and differ only in their variable names.
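
To illustrate the point of the question, here is a minimal sketch of how one shared helper could serve both call sites if the two methods really are equivalent. It is not the repository's code: pad_rows is a hypothetical name, and it works on plain Python lists rather than the tensors the models use.

```python
def pad_rows(rows, max_length):
    """Pad a list of equal-width rows with zero rows up to max_length.

    Hypothetical helper: "rows" can be the word vectors of a sentence or
    the sentence vectors of a document, so one function covers both cases.
    """
    if len(rows) > max_length:
        raise ValueError("sequence is longer than max_length")
    width = len(rows[0]) if rows else 0
    padding = [[0.0] * width for _ in range(max_length - len(rows))]
    # Copy the original rows so the caller's data is not modified.
    return [list(row) for row in rows] + padding


sentence = [[0.1, 0.2], [0.3, 0.4]]   # 2 words, 2-dimensional vectors
print(pad_rows(sentence, 4))
```

If the two methods differ in some detail (for example, only one of them moves data to the GPU), they could not be merged this directly.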

Questions

Hey,

I was curious: how effective is this tool at segmenting text? Can it also tokenise words?

Can we download a pre-trained model, or do we have to train it ourselves? If we have to train it, why is that?

Is this one of the best such tools available at the moment, or is there a standard tool or machine learning library that offers a similar method?

Thanks very much.

Error in loading the pretrained model

While trying to load the model given here, I'm facing the following problem:

.....
.....
  File "/home/sid/text-segmentation/evaluate.py", line 13, in load_model
    model = torch.load(f)
  File "/home/sid/miniconda2/envs/textseg/lib/python2.7/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/home/sid/miniconda2/envs/textseg/lib/python2.7/site-packages/torch/serialization.py", line 399, in _load
    magic_number = pickle_module.load(f)
cPickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

Which PyTorch version to use for Windows?

I don't have a Linux OS, so I'm stuck setting up the environment.
I saw that there is no Windows build of torch=0.3.0. If I install torch=0.4.1 (cu80/torch-0.4.1-cp35-cp35m-win_amd64.whl or cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl) or the latest version instead,
will it work for me?

segeval version?

This might not be a big deal.
I trained the model on my own with a few small changes here and there, mainly to the wiki data structure and the Python version.
It has gone smoothly so far.
I use Python 3.7 with segeval (2.0.11), and the changes I needed were:
seg.pk --> seg.window.pk.pk
seg.window_diff --> seg.window.windowdiff.window_diff

Thanks for open-sourcing this.
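
The two renames above could be wrapped in a small compatibility shim so the same code runs against either layout. This is a sketch under the assumption that the attribute paths are exactly as reported in this issue; resolve_segeval_metrics is a hypothetical helper, and the stand-in objects below only imitate the module layout so the example runs without segeval installed.

```python
from types import SimpleNamespace


def resolve_segeval_metrics(seg):
    """Return (pk, window_diff) callables for either attribute layout."""
    try:
        # Layout reported in this issue for segeval 2.0.11.
        return seg.window.pk.pk, seg.window.windowdiff.window_diff
    except AttributeError:
        # Older layout, where both functions are top-level attributes.
        return seg.pk, seg.window_diff


# Stand-ins for the two layouts (not the real library).
old_layout = SimpleNamespace(pk=lambda: "pk", window_diff=lambda: "wd")
new_layout = SimpleNamespace(
    window=SimpleNamespace(
        pk=SimpleNamespace(pk=lambda: "pk"),
        windowdiff=SimpleNamespace(window_diff=lambda: "wd"),
    )
)

for layout in (old_layout, new_layout):
    pk_fn, wd_fn = resolve_segeval_metrics(layout)
    print(pk_fn(), wd_fn())
```

In real use, the argument would be the imported module, e.g. the result of `import segeval as seg`.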

Error in prepare_tensor in evaluate.py

ValueError                                Traceback (most recent call last)
<ipython-input-13-78897f9821a1> in <module>()
----> 1 cutoffs = evaluate.predict_cutoffs(sentences, model, word2vec)

/home/sid/text-segmentation/evaluate.pyc in predict_cutoffs(sentences, model, word2vec)
     40 def predict_cutoffs(sentences, model, word2vec):
     41     word2vec_sentences = text_to_word2vec(sentences, word2vec)
---> 42     tensored_data = prepare_tensor(word2vec_sentences)
     43     batched_tensored_data = []
     44     batched_tensored_data.append(tensored_data)

/home/sid/text-segmentation/evaluate.pyc in prepare_tensor(sentences)
     23     tensored_data = []
     24     for sentence in sentences:
---> 25         tensored_data.append(utils.maybe_cuda(torch.FloatTensor(np.concatenate(sentence))))
     26 
     27     return tensored_data

ValueError: need at least one array to concatenate

Link to a gist containing the error is here
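
One possible cause, offered as an assumption rather than a diagnosis: np.concatenate raises this ValueError when it is given an empty sequence, which would happen if some sentence ends up with no word vectors at all (for example an empty string, or a sentence whose words are all missing from the embedding vocabulary). A minimal sketch of a guard that could run before prepare_tensor follows; drop_empty_sentences is a hypothetical helper, not part of the repository.

```python
def drop_empty_sentences(word2vec_sentences):
    """Split sentences into those with at least one word vector and the rest.

    Returns (kept, dropped_indices). The indices let the caller map
    predictions back to the original sentence positions.
    """
    kept, dropped_indices = [], []
    for index, vectors in enumerate(word2vec_sentences):
        if len(vectors) > 0:
            kept.append(vectors)
        else:
            dropped_indices.append(index)
    return kept, dropped_indices


# Toy input: the second "sentence" has no vectors and would trigger the error.
sentences = [[[0.1, 0.2]], [], [[0.3, 0.4], [0.5, 0.6]]]
kept, dropped = drop_empty_sentences(sentences)
print(len(kept), dropped)
```

Dropping sentences shifts the positions of the remaining ones, so any predicted cutoffs would need to be re-aligned using the dropped indices.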

Can't reproduce Pk on Choi's dataset

Hi,

I can't reproduce the reported Pk of 26.26 on Choi's dataset with the pre-trained model.
When I run the default evaluation script

python test_accuracy.py --cuda --model model_gpu.t7 

I get a Pk of 0.3667 instead. What could be wrong here?

Running with threshold: 0.4
Loading word2vec ellapsed: 55.1196949482 seconds
running on Choi
...
2018-10-29 13:41:24,821 - INFO - Finished testing.
2018-10-29 13:41:24,821 - INFO - Average loss: 0.0
2018-10-29 13:41:24,821 - INFO - Average accuracy: 0.8987517337031901
2018-10-29 13:41:24,821 - INFO - Pk: 0.3667.
2018-10-29 13:41:24,821 - INFO - F1: 0.3511.
Seconds to execute to whole flow: 93.9438998699
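
For readers comparing the two numbers: Pk is an error rate, so lower is better, and the log prints it as a fraction (0.3667), whereas 26.26 looks like a percentage. The sketch below is a simplified pure-Python version of the metric, assuming segmentations are given as lists of segment lengths and the window size k is half the mean reference segment length (rounded down). It is not the evaluation code used by test_accuracy.py or segeval, which may differ in rounding and boundary conventions.

```python
def segment_ids(masses):
    """Expand segment lengths, e.g. [2, 3], into per-unit ids [0, 0, 1, 1, 1]."""
    ids = []
    for seg_index, length in enumerate(masses):
        ids.extend([seg_index] * length)
    return ids


def pk(reference, hypothesis, k=None):
    """Return Pk as a fraction in [0, 1]; lower is better."""
    if not reference:
        raise ValueError("reference segmentation is empty")
    ref = segment_ids(reference)
    hyp = segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of units")
    if k is None:
        # Half the mean reference segment length, but at least 1.
        k = max(1, len(ref) // (2 * len(reference)))
    windows = len(ref) - k
    if windows <= 0:
        raise ValueError("document is too short for window size k")
    errors = 0
    for i in range(windows):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        # Count windows where the two segmentations disagree.
        if same_in_ref != same_in_hyp:
            errors += 1
    return float(errors) / windows


score = pk([4, 4], [8])  # the hypothesis misses the only boundary
print("Pk = %.4f as a fraction, %.2f as a percentage" % (score, 100 * score))
```

Read this way, 0.3667 corresponds to 36.67 on a percentage scale, which is the figure to set against 26.26.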
