notai-tech / deepsegment
A sentence segmenter that actually works!
Home Page: http://bpraneeth.com/projects
License: GNU General Public License v3.0
I get this error when I try to initialize a segmenter with the pre-trained model suggested at https://github.com/bedapudi6788/DeepSegment-Models:
segmenter = deepsegment.DeepSegment(
config_path='content_creation/ai_pretrained_models/deepsegment_eng_v1/deepsegment_eng_v1/config.json'
)
Error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 5665: character maps to <undefined>
Is there an issue with the pre-trained model, or have I set something up wrong?
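A 'charmap' decode error of this kind usually means Python opened a UTF-8 file with the Windows locale encoding (for example cp1252, where byte 0x8d has no mapping). Whether that is what happens when the library reads this config.json is not confirmed by the report. The sketch below only reproduces the mechanism with a temporary file (the file contents are made up) and shows that passing encoding="utf-8" explicitly avoids it.

```python
import json
import os
import tempfile

# Write a small UTF-8 JSON file; "č" is encoded as the bytes 0xC4 0x8D.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"sample": "č"}, f, ensure_ascii=False)

# Reading it as cp1252 fails because 0x8D is undefined in that code page.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading it as UTF-8 works regardless of the platform default.
with open(path, encoding="utf-8") as f:
    config = json.load(f)

print("cp1252 read failed:", cp1252_failed)
print("utf-8 read gave:", config)
```

If this is the cause, the file itself is fine and only the encoding used to read it needs to change.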
I made my own dataset and built my own GloVe-style vectors using word2vec, and I got this error at the last step:
AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'
Can you tell me why?
Describe the bug and error messages (if any)
Found a typo:
https://github.com/bedapudi6788/deepsegment/blob/v2/deepsegment/deepsegment.py#L170
utils should be changed to params.
Bug description
I am using segment_long to segment a relatively long paragraph, and despite using this specific function I get a repeated warning that reads:
WARNING:root:Consider using segment_long for longer sentences.
Snippet which gave this bug
from deepsegment import DeepSegment
segmenter = DeepSegment('en')
...
resegmented = segmenter.segment_long(a_mod)
Specify versions of the following libraries
Expected behavior
No warnings at all, given I'm using the suggested function.
Notes
This warning message seems to bypass warnings.filterwarnings('ignore').
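The "WARNING:root:" prefix is the default format of Python's logging module, which is separate from the warnings module, so warnings.filterwarnings has no effect on it. A minimal sketch of the difference, using only the standard library (the in-memory handler is just there to make the output observable):

```python
import io
import logging
import warnings

root = logging.getLogger()
previous_level = root.level
root.setLevel(logging.WARNING)

# Capture root-logger output so the effect can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
root.addHandler(handler)

message = "Consider using segment_long for longer sentences."

# Filtering the warnings module does not touch logging records.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    logging.warning(message)
seen_with_filter = stream.getvalue()

# Raising the root logger level drops WARNING records.
root.setLevel(logging.ERROR)
logging.warning(message)
seen_after_set_level = stream.getvalue()[len(seen_with_filter):]

# Restore the logger so the demo has no lasting side effects.
root.removeHandler(handler)
root.setLevel(previous_level)

print(repr(seen_with_filter))
print(repr(seen_after_set_level))
```

Raising the root logger level hides every root-level warning, not just this one, so it is a blunt workaround rather than a fix for the repeated message.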
I have prepared the data using data_gen.py in the master branch. How do I train a new model?
Versions:
- deepsegment 2.3.1
- tensorflow 2.4.0
- keras 2.3.1
Problem:
When I try to run the basic code for sentence segmentation:
from deepsegment import DeepSegment
d = 'this is a sentence this is another sentence'
segmenter = DeepSegment('en')
d_seg = segmenter.segment(d)
I get the following error:
AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'
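The error is raised inside the TensorFlow/Keras stack rather than in the snippet itself, so the exact combination of installed versions matters (another report in this list notes that Keras 2.3.1 works where 2.4.3 does not). A small sketch for collecting the versions to paste into a report. It assumes Python 3.8+ for importlib.metadata, and the helper name is made up:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


report = installed_versions(["deepsegment", "tensorflow", "keras"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```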
Describe the bug and error messages (if any)
I trained DeepSegment on 1 GB of custom Swedish data. Training completed successfully, but when I run inference the model does not segment the text.
The code snippet which gave this error*
for line in lines[:3]:
    print(line)
print('Tot: {}'.format(len(lines)))
--------------------------------
Enligt ett pressmeddelande från Anza är Hamilton Acorn Englands ledande producent av professionella måleriverktyg.
Omsättningen är cirka 150 miljoner kronor och företaget har 136 anställda.
Det var ett paket med flera kilo hasch som hittades av tullen på Landvetters flygplats utanför Göteborg.
Tot: 10126568
--------------------------------
x, y = generate_data(lines[10000:], max_sents_per_example=6, n_examples=10000)
vx, vy = generate_data(lines[:10000], max_sents_per_example=6, n_examples=1000)
--------------------------------
100% (10000 of 10000) |##################| Elapsed Time: 0:00:01 Time: 0:00:01
100% (1000 of 1000) |####################| Elapsed Time: 0:00:00 Time: 0:00:00
--------------------------------
train(x, y, vx, vy, epochs=2, batch_size=64, save_folder='./', glove_path='cc.sv.100.vec')
--------------------------------
Epoch 1/2
157/157 [==============================] - 168s 1s/step - loss: 3.6761
- f1: 80.93
precision recall f1-score support
sent 0.97 0.69 0.81 3425
avg / total 0.97 0.69 0.81 3425
Epoch 00001: f1 improved from -inf to 0.80926, saving model to ./checkpoint
Epoch 2/2
157/157 [==============================] - 166s 1s/step - loss: 3.5520
- f1: 84.49
precision recall f1-score support
sent 0.97 0.75 0.84 3425
avg / total 0.97 0.75 0.84 3425
Epoch 00002: f1 improved from 0.80926 to 0.84494, saving model to ./checkpoint
--------------------------------
from deepsegment import DeepSegment
segmenter = DeepSegment(lang_code=None, checkpoint_path='checkpoint', params_path='params', utils_path='utils', tf_serving=False, checkpoint_name=None)
segmenter.segment('under natten har det varit inbrott i ett kontor vid bredåkra kyrka en person gripen misstänkt för inbrottet polisen skriver på sin facebooksida att en av deras hundförare lyckades spåra upp gärningsmannen och det tillgripna godset personen som är i trettiofemårsåldern greps och sitter nu anhållen ingrid elfstråhle p fyra blekinge')
--------------------------------
['under natten har det varit inbrott i ett kontor vid bredåkra kyrka en person gripen misstänkt för inbrottet polisen skriver på sin facebooksida att en av deras hundförare lyckades spåra upp gärningsmannen och det tillgripna godset personen som är i trettiofemårsåldern greps och sitter nu anhållen ingrid elfstråhle p fyra blekinge']
cc.sv.100.vec is the Swedish Facebook fastText 300-dimensional vector file reduced to 100 dimensions.
Specify versions of the following libraries
Expected behavior
I expected Deepsegment to segment the text.
Screenshots
Nope
I am trying to run the sample code on Google Colab, and I get this error:
Tensors in list passed to 'values' of 'ConcatV2' Op have types [bool, float32] that don't all match.
Here is the shared colab.
https://colab.research.google.com/drive/16dVsf_4J_HCAuBn_aNZXQ6qFd44h2gEC
Here is the code I tried. The error happens at: segmenter = DeepSegment('en')
from deepsegment import DeepSegment
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
['I am Batman', 'i live in gotham']
I want to reproduce your work with my own dataset.
Could you explain the dependencies and requirements?
setup.py requires only 'seqtag', and seqtag requires only 'numpy',
but I need more detail on the dependencies and configuration
(TensorFlow and other modules).
Thanks
Hello,
Apologies if the title isn't to the point.
I used DeepSegment (https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m#scrollTo=K9oMoDwwXgQl) to train a language model on my custom data. However, when I take the trained model (HDF format) and params (JSON format) and run the code below:
### My logic: czech-data -> DeepSegment -> train mode -> DeepCorrect -> punctuated and segmented sentences
from deepcorrect import DeepCorrect
DeepCorrect('/home/sagar/.DeepSegment_cs/params', '/home/sagar/.DeepSegment_cs/checkpoint')
I get this error:
UnpicklingError: invalid load key, '{'
As far as I understand, DeepCorrect expects the params file to be a pickle file, not a plain-text JSON file.
Is there anything wrong with my approach?
Thank You
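That reading of the error can be checked without either library: unpickling a file that starts with '{' fails in exactly this way. The sketch below reproduces it with a made-up JSON file and shows a hypothetical loader (load_params is not part of DeepSegment or DeepCorrect) that tries JSON first and falls back to pickle. It only illustrates the format mismatch and does not make a DeepSegment params file usable by DeepCorrect.

```python
import json
import os
import pickle
import tempfile


def load_params(path):
    """Load a params file that may be JSON or pickle (trusted files only)."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return pickle.loads(raw)  # never unpickle data from untrusted sources


# A made-up JSON params file, standing in for the one from training.
json_path = os.path.join(tempfile.mkdtemp(), "params")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump({"example_key": 1}, f)

# Unpickling JSON text reproduces the reported error.
try:
    with open(json_path, "rb") as f:
        pickle.load(f)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

params = load_params(json_path)
print(error_message)
print(params)
```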
Hi
I am using Keras version 2.3.1 and TensorFlow version 2.2.0.
I am using DeepSegment:
stam = 'I am Batman i live in gotham'
testText = segmenter.segment(stam)
and I am getting the following error:
File "C:\PythonTestProjects\FlaskApp\testLanguageExtraction.py", line 223, in testConceptDataExtraction
testText = segmenter.segment(stam)
File "C:\PythonTestProjects\FlaskApp\env\lib\site-packages\deepsegment\deepsegment.py", line 215, in segment
all_tags = DeepSegment.seqtag_model.predict(encoded_sents, batch_size=batch_size)
File "C:\PythonTestProjects\FlaskApp\env\lib\site-packages\keras\engine\training.py", line 1452, in predict
if self._uses_dynamic_learning_phase():
File "C:\PythonTestProjects\FlaskApp\env\lib\site-packages\keras\engine\training.py", line 382, in _uses_dynamic_learning_phase
not isinstance(K.learning_phase(), int))
File "C:\PythonTestProjects\FlaskApp\env\lib\site-packages\keras\backend\tensorflow_backend.py", line 73, in symbolic_fn_wrapper
if _SYMBOLIC_SCOPE.value:
AttributeError: '_thread._local' object has no attribute 'value'
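The last frame reads _SYMBOLIC_SCOPE.value on a thread-local object. Attributes of a threading.local exist only in the thread that set them, so one possible explanation (an assumption, since the report does not show where the model is loaded) is that the model was created in one thread and predict is called from another, as can happen inside a Flask request handler. A standard-library sketch of that mechanism, with made-up names:

```python
import threading

scope = threading.local()
scope.value = True  # set in the main thread only

errors = []


def worker():
    # The attribute was never set in this thread, so reading it fails.
    try:
        scope.value
    except AttributeError as exc:
        errors.append(str(exc))


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(errors)
```

If that is the situation here, loading the model and calling segment in the same thread would be the first thing to try.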
How should I avoid splitting on abbreviations and collocations, as NLTK does?
When I test the sentence below, DeepSegment treats i. e. as a sentence boundary.
To it Thornton adds to knock into a cocked hat, despite its English sound, and to have an ax to grind. To go for, both in the sense of belligerency and in that of partisanship, is also American, and so is to go through (i. e., to plunder). Of adjectives the list is scarcely less long.
Result:
{
"nltk_punkt": [
"To it Thornton adds to knock into a cocked hat, despite its English sound, and to have an ax to grind.",
"To go for, both in the sense of belligerency and in that of partisanship, is also American, and so is to go through (i. e., to plunder).",
"Of adjectives the list is scarcely less long."
],
"spacy en_core_web_sm": [
"To it Thornton adds to knock into a cocked hat, despite its English sound, and to have an ax to grind.",
"To go for, both in the sense of belligerency and in that of partisanship, is also American, and so is to go through (i. e., to plunder).",
"Of adjectives the list is scarcely less long."
],
"DeepSegment": [
"To it Thornton adds to knock into a cocked hat, despite its English sound, and to have an ax to grind.",
"To go for, both in the sense of belligerency and in that of partisanship, is also American, and so is to go through (i.",
"e., to plunder).",
"Of adjectives the list is scarcely less long."
]
}
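One workaround that does not require retraining is to post-process the output and rejoin neighbouring segments whenever a known abbreviation straddles the split. This is only a sketch: the helper names and the abbreviation list are made up, and it is not something DeepSegment provides.

```python
def spans_boundary(text, abbreviation, boundary):
    """True if `abbreviation` occurs in `text` across index `boundary`."""
    start = text.find(abbreviation, max(0, boundary - len(abbreviation) + 1))
    return 0 <= start < boundary


def merge_abbreviation_splits(segments, abbreviations=("i. e.", "e. g.")):
    """Rejoin neighbouring segments that were split inside an abbreviation."""
    merged = []
    for segment in segments:
        if merged:
            candidate = merged[-1] + " " + segment
            boundary = len(merged[-1])  # index of the joining space
            if any(spans_boundary(candidate, a, boundary) for a in abbreviations):
                merged[-1] = candidate
                continue
        merged.append(segment)
    return merged


deepsegment_output = [
    "To it Thornton adds to knock into a cocked hat, despite its English sound, and to have an ax to grind.",
    "To go for, both in the sense of belligerency and in that of partisanship, is also American, and so is to go through (i.",
    "e., to plunder).",
    "Of adjectives the list is scarcely less long.",
]

fixed = merge_abbreviation_splits(deepsegment_output)
for sentence in fixed:
    print(sentence)
```

The list of abbreviations has to be maintained by hand, which is the same trade-off rule-based segmenters make.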
Hello,
I'm going to expand a custom training set with new examples. These two sentences are in the set:
In this case, I need DeepSegment to keep the boundaries of the longest one (1). Given that both examples are in the training set, I wonder if the final result would be
['Good morning', 'I need help to fix this issue']
I would like to avoid this but keep both examples in the training set. Would this be possible after training the model?
Thanks
Hi,
The DeepSegment model available for download is read-only, and loading it throws this error:
PermissionError: [Errno 13] Permission denied: 'models/deepsegment_eng'
I tried removing the read-only option in the folder's Security tab and a couple of other options, but nothing works. Could you please look into it?
Thanks
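On Windows, trying to open a folder as if it were a file raises PermissionError with Errno 13 regardless of the read-only flag (on Linux the same mistake raises IsADirectoryError). Since 'models/deepsegment_eng' looks like a folder, one thing worth ruling out is that a folder path is being passed where a file path is expected. That is an assumption, as the report does not show the loading code. A quick standard-library check, with a made-up helper name:

```python
import os
import tempfile


def path_kind(path):
    """Classify a path as 'directory', 'file' or 'missing'."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "missing"


# A temporary folder stands in for the model folder from the report.
model_dir = tempfile.mkdtemp()

# Opening a folder as a file fails with an OSError subclass on every platform.
try:
    with open(model_dir, "rb"):
        pass
    open_error = None
except OSError as exc:
    open_error = type(exc).__name__

print(path_kind(model_dir), open_error)
```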
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1585986715.466000000","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2261,"referenced_errors":[{"created":"@1585986715.466000000","description":"Pick Cancelled","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":224,"referenced_errors":[{"created":"@1585986715.466000000","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}]}]}"
Hello,
I have followed the guide to train a custom DeepSegment that you published on colab.research.google.com
I want to try it on Spanish, so I used a custom corpus of sentences in Spanish and a custom vector file from this corpus.
After training, the checkpoint is saved and then I use DeepSegment as follows:
segmenter = DeepSegment(checkpoint_name='checkpoint')
I didn't specify "es" as Spanish is not supported, so I think that "english" is loaded by default in the object.
Is this the right way to train a different language?
Thanks!
I am inputting an entire paragraph to test the segmentation behavior. I am using segment_long, but when I leave n_window = 10, I receive the following error and the process seems to stall.
I have a question: is it possible to train this for other languages as well? If so, could you explain how?
Hello!
I got an "automatic" update of Keras from 2.3.1 to 2.4.3 (I'm using pipenv, and Keras was set to "*" in the Pipfile).
My code won't run anymore with Keras 2.4.3, I'm getting the error below at runtime.
No issue with 2.3.1 though.
(I don't mind using Keras 2.3.1, so this is not really a blocking issue. I just thought you'd want to know.)
Traceback (most recent call last):
File "xxx/main.py", line 23, in <module>
segmenter = DeepSegment('fr')
m = K.slice(states[3], [0, t], [-1, 2])
AttributeError: module 'keras.backend' has no attribute 'slice'
I also have an error with Keras 2.4.2:
Exception occured: in user code:
/xxx/lib/python3.8/site-packages/tensorflow/python/keras/utils/tf_utils.py:140 get_reachable_from_inputs
raise TypeError('Expected Operation, Variable, or Tensor, got ' + str(x))
TypeError: Expected Operation, Variable, or Tensor, got 0
I got this error by installing via pip in an environment using python3
Using cached https://files.pythonhosted.org/packages/93/4b/979db9e44be09f71e85c9c8cfc42f258adfb7d93ce01deed2788b2948919/logging-0.4.9.6.tar.gz
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/logging.egg-info
writing pip-egg-info/logging.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/logging.egg-info/dependency_links.txt
writing top-level names to pip-egg-info/logging.egg-info/top_level.txt
writing manifest file 'pip-egg-info/logging.egg-info/SOURCES.txt'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-sb7rjvuw/logging/setup.py", line 13, in <module>
packages = ["logging"],
File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/egg_info.py", line 278, in run
self.find_sources()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/egg_info.py", line 293, in find_sources
mm.run()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/egg_info.py", line 524, in run
self.add_defaults()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/egg_info.py", line 560, in add_defaults
sdist.add_defaults(self)
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/py36compat.py", line 34, in add_defaults
self._add_defaults_python()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/sdist.py", line 127, in _add_defaults_python
build_py = self.get_finalized_command('build_py')
File "/usr/lib/python3.5/distutils/cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/usr/lib/python3.5/distutils/dist.py", line 846, in get_command_obj
klass = self.get_command_class(command)
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/dist.py", line 635, in get_command_class
self.cmdclass[command] = cmdclass = ep.load()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2229, in load
return self.resolve()
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2235, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/command/build_py.py", line 15, in <module>
from setuptools.lib2to3_ex import Mixin2to3
File "/home/frejus/.virtualenvs/labforsims_neonat/local/lib/python3.5/site-packages/setuptools/lib2to3_ex.py", line 12, in <module>
from lib2to3.refactor import RefactoringTool, get_fixers_from_package
File "/usr/lib/python3.5/lib2to3/refactor.py", line 19, in <module>
import logging
File "/tmp/pip-install-sb7rjvuw/logging/logging/__init__.py", line 618
raise NotImplementedError, 'emit must be implemented '
^
SyntaxError: invalid syntax
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-sb7rjvuw/logging/
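The traceback shows lib2to3's import of logging resolving to the freshly downloaded PyPI package under /tmp/pip-install-sb7rjvuw/logging/ rather than the standard library, and the line it fails on uses the Python 2 form of raise. Python 3 already ships logging, so the PyPI distribution should not be needed there. The sketch below only confirms the syntax point. The two source strings are adapted from the traceback and the helper name is made up:

```python
python2_line = "raise NotImplementedError, 'emit must be implemented'"
python3_line = "raise NotImplementedError('emit must be implemented')"


def parses(source):
    """Return True if `source` compiles under the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print("Python 2 form parses:", parses(python2_line))
print("Python 3 form parses:", parses(python3_line))
```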
This work is very valuable, thank you very much. Do you plan to support Turkish?
Or is it possible for me to train a Turkish model with custom data?
I am trying to load the checkpoint locally like this:
seg = DeepSegment(lang_code=None, checkpoint_path="path/to/checkpoint")
but it does not work, because it always downloads the checkpoint.
What is the best way to do that?
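Another report in this list calls the constructor with checkpoint_path, params_path and utils_path all set alongside lang_code=None. Whether passing all three is what prevents the download is not confirmed here, but it is cheap to try. The helper below is hypothetical: it assumes the three files sit in one folder under the names 'checkpoint', 'params' and 'utils' (as in that report) and fails early if any of them is missing.

```python
import os


def local_model_kwargs(model_dir):
    """Build DeepSegment keyword arguments for model files stored in model_dir."""
    kwargs = {
        "checkpoint_path": os.path.join(model_dir, "checkpoint"),
        "params_path": os.path.join(model_dir, "params"),
        "utils_path": os.path.join(model_dir, "utils"),
    }
    missing = [p for p in kwargs.values() if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing model files: " + ", ".join(missing))
    kwargs["lang_code"] = None
    return kwargs


# Usage (not executed here; needs deepsegment and a trained model on disk):
# from deepsegment import DeepSegment
# segmenter = DeepSegment(**local_model_kwargs("path/to/model"))
```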
Describe the bug and error messages (if any)
Can't import the module
The code snippet which gave this error*
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from deepsegment import DeepSegment
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'deepsegment'
Specify versions of the following libraries
pip install --upgrade deepsegment
Successfully installed deepsegment-1.0.1 logging-0.4.9.6 numpy-1.16.6 seqtag-1.0.3
Expected behavior
I expected for the module to be imported and to be able to follow the examples in the README
Thank you!
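A successful pip install followed by ModuleNotFoundError often means pip and the python interpreter belong to different environments, which is easy to run into with Anaconda alongside a system Python. That is an assumption about this setup, not something the report shows. A quick standard-library check to run in the same interpreter that fails to import (the helper name is made up):

```python
import importlib.util
import sys


def module_available(name):
    """True if a top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("deepsegment importable:", module_available("deepsegment"))
```

If the module is not importable, installing with the interpreter itself (python -m pip install --upgrade deepsegment) makes sure the package lands in the environment that runs the code.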
I am trying to test this on my own data, which is lots of raw text with or without punctuation. A couple of examples are below. The first has no punctuation, and the second has sentences separated by commas and contains spelling mistakes.
When I run these statements through the example code, I get no split at all.
Your code probably does not expect raw statements like mine. I have no control over the incoming raw data, and I receive thousands of these statements, so editing each one manually is not an option. Is there anything I can do to make this work?
DRIVE WITH EXCESS BLOOD ALCOHOL SPEED-EXCEED BY 15 KM/HR OR LESS FAIL TO SIGNAL DRIVE UNDER DISQUALIFICATION
Breach re 17/12/06 DRIVE WHILST AUTHORISATION SUSPENDED (2 CHARGES), EX PRESC CONC 3HRS-BREATH-DRIVER VECHICLE (3 CHARGES), DRIVE WHILST DISQUALIFIED
From the blog post:
pip install --upgrade deepsegment
import deepsegment
deepsegment.download('eng_fra_ita')
This doesn't seem to work (there is no download method).
Hey!
I want to use this with TensorFlow Serving but am having trouble understanding how to generate an hd5 file. Any ideas how I can do that?
Thanks
Describe the bug and error messages (if any)
I've tried to run:
segmenter = DeepSegment('en')
text = Path('data/bandt.txt').read_text()
tokens = segmenter.segment(text)
with both Python 3.7 and Python 3.6, and I get the same error each time:
Traceback (most recent call last):
File "ds.py", line 4, in <module>
segmenter = DeepSegment('en')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/deepsegment/deepsegment.py", line 140, in __init__
DeepSegment.seqtag_model.load_weights(checkpoint_path)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/saving.py", line 492, in load_wrapper
return load_function(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/network.py", line 1221, in load_weights
with h5py.File(filepath, mode='r') as f:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 16777216, sblock->base_addr = 0, stored_eof = 80443280)
The code snippet which gave this error*
Specify versions of the following libraries
Expected behavior
For the sentences in text to be tokenized
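The h5py message says the file on disk ends at 16777216 bytes, which is exactly 16 MiB, while its header expects 80443280 bytes. That is consistent with a model download that was cut off, in which case deleting the partial file and letting it download again is the usual remedy. The report does not say where the file is cached, so the sketch below uses a temporary file as a stand-in, and is_truncated is a made-up helper.

```python
import os
import tempfile


def is_truncated(path, expected_size):
    """True if the file at `path` is smaller than `expected_size` bytes."""
    return os.path.getsize(path) < expected_size


# Numbers taken from the error message above.
eof = 16777216          # bytes actually on disk
stored_eof = 80443280   # bytes the HDF5 header expects

print("on disk is exactly 16 MiB:", eof == 2 ** 24)
print("expected size in MiB:", round(stored_eof / 2 ** 20, 1))

# Demonstrate the check on a small stand-in file of 10 bytes.
demo_path = os.path.join(tempfile.mkdtemp(), "checkpoint")
with open(demo_path, "wb") as f:
    f.write(b"\0" * 10)

print("stand-in file truncated:", is_truncated(demo_path, 20))
```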
Describe the bug and error messages (if any)
This is not a bug but a request for clarification on the optimal value of n_window relative to the string length. Is there a factor that should be considered when setting n_window? I am feeding in phrases of various lengths to be broken up, so I could see setting this dynamically.
Thank you
Describe the bug and error messages (if any)
Training loss becomes NaN after about half an epoch. Is there a problem with my params?
Also, a batch_size of 32 is as high as I can go; anything higher runs out of memory (OOM).
These params consume all 32 GB of RAM and some swap. This may be related to the warning in the log: "UserWarning: Converting sparse IndexedSlices to a dense Tensor with 120289200 elements. This may consume a large amount of memory."
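Some back-of-the-envelope arithmetic on the numbers in this report, as a sketch under stated assumptions: the embedding width of 100 is inferred from the file name cc.sv.100.vec, 4 bytes per element assumes float32, and reading the dense tensor as the embedding gradient is a guess rather than something the log states.

```python
# Element count taken from the UserWarning in the log.
elements = 120_289_200
embedding_dim = 100          # assumption: from the name cc.sv.100.vec
bytes_per_element = 4        # assumption: float32

# If the tensor is (vocabulary x embedding_dim), this is the vocabulary size.
rows, remainder = divmod(elements, embedding_dim)
dense_mib = elements * bytes_per_element / 2 ** 20

# Steps per epoch implied by n_examples and batch_size in the training call.
n_examples = 1_000_000
batch_size = 32
steps_per_epoch = n_examples // batch_size

print("rows:", rows, "remainder:", remainder)
print("dense size in MiB:", round(dense_mib, 1))
print("steps per epoch:", steps_per_epoch)
```

The element count divides evenly by 100, which fits a vocabulary of roughly 1.2 million rows, and the step count matches the 31250 shown in the progress bar.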
Hardware
Intel i7 32GB RAM + 2080Ti 11GB
The code snippet which gave this error*
Training code:
from deepsegment import train, generate_data
import unicodedata
import re
from tqdm import tqdm
lines = ...  # read all lines from the 1 GB text file
print('ok')
x, y = generate_data(lines[10000:], max_sents_per_example=6, n_examples=1000000)
vx, vy = generate_data(lines[:10000], max_sents_per_example=6, n_examples=100000)
train(x, y, vx, vy, epochs=15, batch_size=32, save_folder='./', glove_path='cc.sv.100.vec')
Log:
Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W0611 11:19:13.705538 139626754037568 deepsegment.py:22] Tensorflow serving is not installed. Cannot be used with tesnorflow serving docker images.
W0611 11:19:13.705653 139626754037568 deepsegment.py:23] Run pip install tensorflow-serving-api==1.12.0 if you want to use with tf serving.
W0611 11:19:13.706187 139626754037568 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/deepsegment/train.py:9: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
W0611 11:19:13.706330 139626754037568 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/deepsegment/train.py:11: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2020-06-11 11:19:13.719930: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-11 11:19:13.726062: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-06-11 11:19:13.850172: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:13.851079: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4d4fe50 executing computations on platform CUDA. Devices:
2020-06-11 11:19:13.851095: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-06-11 11:19:13.863673: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4008000000 Hz
2020-06-11 11:19:13.864927: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4a1f2d0 executing computations on platform Host. Devices:
2020-06-11 11:19:13.864978: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-06-11 11:19:13.865398: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:13.866497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
2020-06-11 11:19:13.869341: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 11:19:13.906978: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-11 11:19:13.925570: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-11 11:19:13.931635: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-11 11:19:13.972803: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-11 11:19:13.999081: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-11 11:19:14.073585: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-11 11:19:14.073906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:14.075861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:14.077537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-11 11:19:14.078231: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 11:19:14.081715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-11 11:19:14.081795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-11 11:19:14.081834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-11 11:19:14.082829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:14.084705: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:19:14.086398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10309 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10126601/10126601 [02:57<00:00, 57064.71it/s]
ok
100% (1000000 of 1000000) |############################################################################################################################################################################| Elapsed Time: 0:01:23 Time: 0:01:23
100% (100000 of 100000) |##############################################################################################################################################################################| Elapsed Time: 0:00:06 Time: 0:00:06
2020-06-11 11:29:57.768082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:29:57.768525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
2020-06-11 11:29:57.768556: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 11:29:57.768567: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-11 11:29:57.768577: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-11 11:29:57.768587: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-11 11:29:57.768596: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-11 11:29:57.768605: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-11 11:29:57.768615: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-11 11:29:57.768665: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:29:57.769086: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:29:57.769471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-11 11:29:57.769490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-11 11:29:57.769497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-11 11:29:57.769502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-11 11:29:57.769566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:29:57.769987: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 11:29:57.770382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10309 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
W0611 11:30:00.458791 139626754037568 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py:3794: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor with 120289200 elements. This may consume a large amount of memory.
num_elements)
W0611 11:30:08.525202 139626754037568 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Epoch 1/15
2020-06-11 11:30:10.113686: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
1/31250 [..............................] - ETA: 19:39:54 - loss: 3.1761
2/31250 [..............................] - ETA: 11:03:17 - loss: 3.0586
3/31250 [..............................] - ETA: 8:08:36 - loss: 3.3094
4/31250 [..............................] - ETA: 6:34:41 - loss: 3.4420
5/31250 [..............................] - ETA: 5:38:11 - loss: 3.3396
6/31250 [..............................] - ETA: 5:00:34 - loss: 3.1736
7/31250 [..............................] - ETA: 4:38:08 - loss: 3.1815
8/31250 [..............................] - ETA: 4:19:09 - loss: 3.2167
9/31250 [..............................] - ETA: 4:06:03 - loss: 3.2543
<Snipped>
15548/31250 [=============>................] - ETA: 1:08:47 - loss: 3.2829
15549/31250 [=============>................] - ETA: 1:08:46 - loss: 3.2828
15550/31250 [=============>................] - ETA: 1:08:46 - loss: 3.2828
15551/31250 [=============>................] - ETA: 1:08:46 - loss: 3.2828
15552/31250 [=============>................] - ETA: 1:08:46 - loss: 3.2827
15553/31250 [=============>................] - ETA: 1:08:45 - loss: 3.2828
15554/31250 [=============>................] - ETA: 1:08:45 - loss: nan
15555/31250 [=============>................] - ETA: 1:08:45 - loss: nan
15556/31250 [=============>................] - ETA: 1:08:45 - loss: nan
15557/31250 [=============>................] - ETA: 1:08:44 - loss: nan
15558/31250 [=============>................] - ETA: 1:08:44 - loss: nan
15559/31250 [=============>................] - ETA: 1:08:44 - loss: nan
<Snipped>
31248/31250 [============================>.] - ETA: 0s - loss: nan
31249/31250 [============================>.] - ETA: 0s - loss: nan
31250/31250 [==============================] - 8239s 264ms/step - loss: nan
2020-06-11 13:47:28.196777: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
- f1: 0.00
precision recall f1-score support
sent 0.00 0.00 0.00 349418
avg / total 0.00 0.00 0.00 349418
Epoch 00001: f1 improved from -inf to 0.00000, saving model to ./checkpoint
Epoch 2/15
1/31250 [..............................] - ETA: 2:40:55 - loss: nan
2/31250 [..............................] - ETA: 2:54:27 - loss: nan
3/31250 [..............................] - ETA: 2:37:31 - loss: nan
4/31250 [..............................] - ETA: 2:35:34 - loss: nan
5/31250 [..............................] - ETA: 2:28:59 - loss: nan
6/31250 [..............................] - ETA: 2:30:40 - loss: nan
7/31250 [..............................] - ETA: 2:28:10 - loss: nan
8/31250 [..............................] - ETA: 2:27:05 - loss: nan
9/31250 [..............................] - ETA: 2:27:35 - loss: nan
<Snipped>
31244/31250 [============================>.] - ETA: 1s - loss: nan
31245/31250 [============================>.] - ETA: 1s - loss: nan
31246/31250 [============================>.] - ETA: 1s - loss: nan
31247/31250 [============================>.] - ETA: 0s - loss: nan
31248/31250 [============================>.] - ETA: 0s - loss: nan
31249/31250 [============================>.] - ETA: 0s - loss: nan
31250/31250 [==============================] - 8214s 263ms/step - loss: nan
- f1: 0.00
precision recall f1-score support
sent 0.00 0.00 0.00 349418
avg / total 0.00 0.00 0.00 349418
Epoch 00002: f1 did not improve from 0.00000
Epoch 3/15
1/31250 [..............................] - ETA: 2:16:22 - loss: nan
2/31250 [..............................] - ETA: 2:17:49 - loss: nan
3/31250 [..............................] - ETA: 2:28:30 - loss: nan
4/31250 [..............................] - ETA: 2:54:15 - loss: nan
5/31250 [..............................] - ETA: 2:46:57 - loss: nan
6/31250 [..............................] - ETA: 2:53:14 - loss: nan
7/31250 [..............................] - ETA: 2:44:51 - loss: nan
8/31250 [..............................] - ETA: 2:38:22 - loss: nan
9/31250 [..............................] - ETA: 2:34:27 - loss: nan
<Snipped>
31246/31250 [============================>.] - ETA: 1s - loss: nan
31247/31250 [============================>.] - ETA: 0s - loss: nan
31248/31250 [============================>.] - ETA: 0s - loss: nan
31249/31250 [============================>.] - ETA: 0s - loss: nan
31250/31250 [==============================] - 8214s 263ms/step - loss: nan
- f1: 0.00
precision recall f1-score support
sent 0.00 0.00 0.00 349418
avg / total 0.00 0.00 0.00 349418
Epoch 00003: f1 did not improve from 0.00000
Epoch 4/15
1/31250 [..............................] - ETA: 1:57:41 - loss: nan
2/31250 [..............................] - ETA: 2:05:43 - loss: nan
3/31250 [..............................] - ETA: 2:09:32 - loss: nan
4/31250 [..............................] - ETA: 2:10:20 - loss: nan
<Snipped>
31250/31250 [==============================] - 8214s 263ms/step - loss: nan
- f1: 0.00
precision recall f1-score support
sent 0.00 0.00 0.00 349418
avg / total 0.00 0.00 0.00 349418
Epoch 00004: f1 did not improve from 0.00000
Exited here....
Specify versions of the following libraries
Expected behavior
I was hoping to get the model to improve and not have nan loss.
Screenshots
Nope
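For the NaN loss reported above, one cheap thing to rule out is non-finite or malformed values in the word vectors fed to training. The following is only a diagnostic sketch: it assumes the embeddings are stored in a GloVe-style text file (one token per line followed by whitespace-separated floats), and the function name `find_bad_vectors` is hypothetical, not part of deepsegment. It does not address other common causes such as exploding gradients.

```python
import math


def find_bad_vectors(path, encoding="utf-8"):
    """Return (line_number, token) for rows with NaN, infinite, or unparsable values.

    Assumes a GloVe-style text file: "token v1 v2 ... vN" on each line.
    """
    bad = []
    with open(path, encoding=encoding) as handle:
        for line_no, line in enumerate(handle, start=1):
            parts = line.split()
            if len(parts) < 2:
                # Blank line or a token with no values: nothing to check.
                continue
            token, values = parts[0], parts[1:]
            try:
                numbers = [float(v) for v in values]
            except ValueError:
                # A field that is not a number at all.
                bad.append((line_no, token))
                continue
            if any(math.isnan(x) or math.isinf(x) for x in numbers):
                bad.append((line_no, token))
    return bad
```

An empty result only means the vector file is clean; the training data and learning rate would still need checking.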
Hi,
According to the Medium blog post (https://medium.com/@praneethbedapudi/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f), it seems that previous commits had an option for the Italian language.
Do you plan to re-add the Italian language as an option for the package?
Thanks
from deepsegment import DeepSegment
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
It gives me ['I am Batman', 'i live in gotham'].
But when I give it
from deepsegment import DeepSegment
segmenter = DeepSegment('en')
segmenter.segment('hello what is your name how are you ')
then it gives me
'hello what is your name who are you'
What would you advise?
I was trying to set up the library to test its accuracy for my use case.
I installed the packages and created a script containing just the code from the README.
When I run it, I get the following error:
Traceback (most recent call last):
  File "correct.py", line 4, in <module>
    print(segmenter.segment('I am Batman i live in gotham'))
  File "/home/algante/tmp/deepspeech-venv/lib/python3.6/site-packages/deepsegment/deepsegment.py", line 149, in segment
    all_tags = get_tf_serving_respone(DeepSegment.seqtag_model, encoded_sents)
  File "/home/algante/tmp/deepspeech-venv/lib/python3.6/site-packages/deepsegment/deepsegment.py", line 73, in get_tf_serving_respone
    response =stub.Predict(request, 20)
  File "/home/algante/tmp/deepspeech-venv/lib/python3.6/site-packages/grpc/_channel.py", line 604, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/algante/tmp/deepspeech-venv/lib/python3.6/site-packages/grpc/_channel.py", line 506, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1570094221.600498100","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3876,"referenced_errors":[{"created":"@1570094221.600481900","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":395,"grpc_status":14}]}"
Can someone help with this?
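The `StatusCode.UNAVAILABLE` / "failed to connect to all addresses" message means the gRPC client could not reach a server at all, and the traceback shows this code path going through `get_tf_serving_respone`. A minimal sketch for checking whether anything is listening is below. The host and port are assumptions: 8500 is the default gRPC port of `tensorflow_model_server`, but the address this deepsegment version actually targets should be confirmed in its source. `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed address; replace with the one the client is configured to use.
    print(can_connect("localhost", 8500))
```

If this prints `False`, the serving process is probably not running or is bound to a different address, which would be consistent with the error above.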
Will the library be updated to support Keras 2.4?
I built a microservice based on deepsegment and Flask, and I get an error caused by Keras 2.3.1 not working with multithreading.
The error looks like this:
File "/keras/engine/training.py", line 1452, in predict
if self._uses_dynamic_learning_phase():
File "/keras/engine/training.py", line 382, in _uses_dynamic_learning_phase
not isinstance(K.learning_phase(), int))
File "/keras/backend/tensorflow_backend.py", line 73, in symbolic_fn_wrapper
if _SYMBOLIC_SCOPE.value:
AttributeError: '_thread._local' object has no attribute 'value'
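The error above involves thread-local state, so one possible workaround, sketched here with the standard library only, is to load the model and run every prediction on a single dedicated worker thread while the web server's request threads merely wait for results. `SingleThreadModel`, `load_fn`, and the stand-in functions are hypothetical names; whether this resolves the Keras 2.3.1 error depends on Keras first being imported inside `load_fn`, which is an assumption to verify.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadModel:
    """Run model loading and every call on one dedicated worker thread."""

    def __init__(self, load_fn):
        # max_workers=1 guarantees that all submitted work shares one thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # load_fn runs inside the worker, so thread-local state it creates lives there.
        self._model = self._executor.submit(load_fn).result()

    def call(self, fn, *args):
        # Block the calling (request) thread until the worker returns a result.
        return self._executor.submit(fn, self._model, *args).result()

    def close(self):
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    # Stand-ins for loading a segmenter and calling it; no Keras involved here.
    runner = SingleThreadModel(lambda: str.split)
    print(runner.call(lambda model, text: model(text), "I am Batman"))
    runner.close()
```

In a Flask app, the request handler would call `runner.call(...)` instead of touching the model directly. Calls are serialized, so throughput is limited to one prediction at a time.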
Hello, I am trying to load the eng_fra_ita model with the current v2 branch, but it is not available. Is there a way to load it?
P.S. I also downloaded the zip file from the DeepSegment-Models repository, but it contains no "utils" or "params" files.
Thank you
Is support for the German language planned?
I am trying to load a checkpoint locally like this:
seg = DeepSegment(lang_code=None, checkpoint_path="path/to/checkpoint")
but it does not work: it always downloads the checkpoint.
What is the best way to do that?