pzelasko / kaldialign

Python wrappers for Kaldi Levenshtein's distance and alignment code.

License: Apache License 2.0

Python 23.15% C++ 16.79% CMake 58.62% Shell 1.43%

kaldialign's People

Contributors

csukuangfj, desh2608, glynpu, pzelasko, somniumism

kaldialign's Issues

kaldialign-0.3 install failed

Hi,
When I run 'pip install --verbose kaldialign', version 0.3 fails to install and pip falls back to version 0.2, which installs successfully. Logs:

Using pip 21.2.4 from /opt/conda/lib/python3.9/site-packages/pip (python 3.9)
Collecting kaldialign
Downloading kaldialign-0.3.tar.gz (10 kB)
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-5vs591nx/kaldialign_3f7ed57b6e6d4a979465eb5b17e32acb/setup.py", line 111, in <module>
f.write(f"version = '{get_package_version()}'\n")
File "/tmp/pip-install-5vs591nx/kaldialign_3f7ed57b6e6d4a979465eb5b17e32acb/setup.py", line 102, in get_package_version
with open("CMakeLists.txt") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'CMakeLists.txt'
WARNING: Discarding https://files.pythonhosted.org/packages/06/4d/cddae8f15576630c55aeadd58631d400eb366358581dd7b57e0896cba6f0/kaldialign-0.3.tar.gz#sha256=071d16089f206d6c11025858fff78bb1991b2eaea718ff13a0d9d9e69a7107ef (from https://pypi.org/simple/kaldialign/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Downloading kaldialign-0.2.tar.gz (39 kB)
Running command python setup.py egg_info
running egg_info
creating /tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info
writing /tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/dependency_links.txt
writing requirements to /tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/requires.txt
writing top-level names to /tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-pip-egg-info-uvo1szfh/kaldialign.egg-info/SOURCES.txt'
Building wheels for collected packages: kaldialign
Running command /opt/conda/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5vs591nx/kaldialign_5cb4433a854e4cb6b79bdc79fdb925d8/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5vs591nx/kaldialign_5cb4433a854e4cb6b79bdc79fdb925d8/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ss39jva6
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.9
creating build/lib.linux-x86_64-3.9/extensions
copying extensions/__init__.py -> build/lib.linux-x86_64-3.9/extensions
running build_ext
building 'kaldialign' extension
creating build/temp.linux-x86_64-3.9
creating build/temp.linux-x86_64-3.9/extensions
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/tmp/pip-install-5vs591nx/kaldialign_5cb4433a854e4cb6b79bdc79fdb925d8/extensions -I/opt/conda/include/python3.9 -c extensions/align.cpp -o build/temp.linux-x86_64-3.9/extensions/align.o -std=c++11 -Wno-register -Wno-unused-function -Wno-unused-local-typedefs -funsigned-char
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/tmp/pip-install-5vs591nx/kaldialign_5cb4433a854e4cb6b79bdc79fdb925d8/extensions -I/opt/conda/include/python3.9 -c extensions/kaldi_align.cpp -o build/temp.linux-x86_64-3.9/extensions/kaldi_align.o -std=c++11 -Wno-register -Wno-unused-function -Wno-unused-local-typedefs -funsigned-char
g++ -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-3.9/extensions/align.o build/temp.linux-x86_64-3.9/extensions/kaldi_align.o -o build/lib.linux-x86_64-3.9/kaldialign.cpython-39-x86_64-linux-gnu.so
/opt/conda/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/extensions
copying build/lib.linux-x86_64-3.9/extensions/__init__.py -> build/bdist.linux-x86_64/wheel/extensions
copying build/lib.linux-x86_64-3.9/kaldialign.cpython-39-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
writing kaldialign.egg-info/PKG-INFO
writing dependency_links to kaldialign.egg-info/dependency_links.txt
writing requirements to kaldialign.egg-info/requires.txt
writing top-level names to kaldialign.egg-info/top_level.txt
reading manifest file 'kaldialign.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'kaldialign.egg-info/SOURCES.txt'
Copying kaldialign.egg-info to build/bdist.linux-x86_64/wheel/kaldialign-0.2-py3.9.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/kaldialign-0.2.dist-info/WHEEL
creating '/tmp/pip-wheel-ss39jva6/kaldialign-0.2-cp39-cp39-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'kaldialign.cpython-39-x86_64-linux-gnu.so'
adding 'extensions/__init__.py'
adding 'kaldialign-0.2.dist-info/LICENSE'
adding 'kaldialign-0.2.dist-info/METADATA'
adding 'kaldialign-0.2.dist-info/WHEEL'
adding 'kaldialign-0.2.dist-info/top_level.txt'
adding 'kaldialign-0.2.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Building wheel for kaldialign (setup.py) ... done
Created wheel for kaldialign: filename=kaldialign-0.2-cp39-cp39-linux_x86_64.whl size=34661 sha256=9207ced32458ee21917bcba83b5d369dc19c0c040a41685b338b035b51aefa30
Stored in directory: /root/.cache/pip/wheels/ab/6e/ec/548bcab54d6c93f3a31ab3283cbad1e8512fd14c05b5d1ce20
Successfully built kaldialign
Installing collected packages: kaldialign
Successfully installed kaldialign-0.2
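
The traceback above shows setup.py opening CMakeLists.txt and not finding it, which suggests the 0.3 source archive was uploaded without that file. A minimal sketch for checking this on a downloaded sdist, using only the standard library; the function name is hypothetical and this is not part of kaldialign:

```python
import tarfile
from pathlib import PurePosixPath


def sdist_has_top_level_file(sdist_path, filename):
    """Return True if `filename` sits directly under the sdist's root folder."""
    # "r:*" lets tarfile detect the compression (e.g. .tar.gz) on its own.
    with tarfile.open(sdist_path, "r:*") as archive:
        for name in archive.getnames():
            parts = PurePosixPath(name).parts
            # An sdist unpacks into a single root folder: <root>/<file>
            if len(parts) == 2 and parts[1] == filename:
                return True
    return False
```

Calling it with the path to the downloaded kaldialign-0.3.tar.gz and "CMakeLists.txt" would tell whether the file was packaged. If it was not, listing the file in MANIFEST.in is the usual way to get it into an sdist.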

Query regarding sclite_mode

I'm not sure I understand it correctly, but sclite_mode doesn't seem to do anything. I tried it with many reference-hypothesis pairs, and the results are always the same whether I set it to True or False.

For reference, here's a small script I tested:

from kaldialign import align, bootstrap_wer_ci, edit_distance

refs = [('a', 'b', 'c'), ('d', 'e', 'f')]
hyps = [('a', 's', 'x', 'c'), ('e', 'f', 'f')]
EPS = '*'
for ref, hyp in zip(refs, hyps):
    print(align(ref, hyp, EPS))
    print(edit_distance(ref, hyp, sclite_mode=False))
    print(edit_distance(ref, hyp, sclite_mode=True))

print(edit_distance(refs, hyps, sclite_mode=False))
print(edit_distance(refs, hyps, sclite_mode=True))
ans = bootstrap_wer_ci(refs, hyps)
print({"wer": ans["wer"], "ci95": ans["ci95"], "ci95min": ans["ci95min"], "ci95max": ans["ci95max"]})

and this is what gets printed:

[('a', 'a'), ('b', 's'), ('*', 'x'), ('c', 'c')]
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 0, 'sub': 1, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
[('d', '*'), ('e', 'e'), ('f', 'f'), ('*', 'f')]
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}
{'ins': 1, 'del': 1, 'sub': 0, 'total': 2, 'ref_len': 3, 'err_rate': 0.6666666666666666}

In both cases above, the penalty is the same regardless of sclite_mode. The last three print calls give:

{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'ins': 0, 'del': 0, 'sub': 2, 'total': 2, 'ref_len': 2, 'err_rate': 1.0}
{'wer': 0.6666666666667462, 'ci95': 0.0, 'ci95min': 0.6666666666667462, 'ci95max': 0.6666666666667462}
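
One possible explanation, sketched under assumptions: NIST sclite aligns with weighted costs (3 for an insertion or a deletion, 4 for a substitution) rather than unit costs. Whether kaldialign's sclite_mode uses exactly these weights is an assumption here, and the function below is a standalone illustration, not kaldialign's code. With such weights the error total often stays the same and only the ins/del/sub breakdown changes, and only for inputs where two substitutions can be traded for one deletion plus one insertion.

```python
def weighted_edit_counts(ref, hyp, ins_cost=1, del_cost=1, sub_cost=1):
    """Count ins/del/sub along one minimum-cost alignment of ref and hyp."""
    n, m = len(ref), len(hyp)
    # best[i][j] = (cost, ins, dels, subs) for aligning ref[:i] with hyp[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, 0, 0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                cost, ins, dels, subs = best[i - 1][j - 1]
                if ref[i - 1] == hyp[j - 1]:
                    candidates.append((cost, ins, dels, subs))  # match
                else:
                    candidates.append((cost + sub_cost, ins, dels, subs + 1))
            if i > 0:
                cost, ins, dels, subs = best[i - 1][j]
                candidates.append((cost + del_cost, ins, dels + 1, subs))
            if j > 0:
                cost, ins, dels, subs = best[i][j - 1]
                candidates.append((cost + ins_cost, ins + 1, dels, subs))
            # Ties are broken in favor of the first candidate (the diagonal).
            best[i][j] = min(candidates, key=lambda c: c[0])
    _, ins, dels, subs = best[n][m]
    return {"ins": ins, "del": dels, "sub": subs, "total": ins + dels + subs}


ref, hyp = ("a", "b"), ("b", "c")
print(weighted_edit_counts(ref, hyp))  # unit costs
print(weighted_edit_counts(ref, hyp, ins_cost=3, del_cost=3, sub_cost=4))
```

For this pair, the weighted run reports one insertion and one deletion with no substitutions, while the total stays at 2 in both runs. The pairs in the script above have no such trade-off available, which would be consistent with the identical outputs.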

Support conda install

@pzelasko
I think it would be good to support

conda install -c kaldialign kaldialign

All you need to do is the following two steps. I will take care of the rest.

Note: Step 1 is optional. If you don't do it, I will do it.

Step 1:

Create an account at https://anaconda.org/

The account name MUST be kaldialign. Otherwise, it is not possible to use -c kaldialign.

After creating the account, visit the following URL:

https://anaconda.org/kaldialign/settings/access

You will find the token at the bottom of that page after clicking Create.

Click view to get its value. (We don't need the name of the API token. Only its value is important).

Step 2:

Visit
https://github.com/pzelasko/kaldialign/settings/secrets/actions

Click the button "New repository secret".

Name of the secret should be: KALDIALIGN_CONDA_TOKEN
Value of the secret: Please paste the value of the API token you copied from step 1.

That is all.
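
For context, here is a rough sketch of how a CI script might consume that secret. It assumes the workflow exposes the secret as an environment variable with the same name, and that the anaconda-client command line tool is installed. The helper name and the package path are hypothetical.

```python
import os


def build_conda_upload_command(package_path, environ=None):
    """Build the anaconda-client command that uploads one built package."""
    environ = os.environ if environ is None else environ
    # Assumption: the workflow maps the repository secret to this variable.
    token = environ.get("KALDIALIGN_CONDA_TOKEN")
    if not token:
        raise RuntimeError("KALDIALIGN_CONDA_TOKEN is not set")
    # `anaconda --token <value> upload <file>` authenticates with the API token.
    return ["anaconda", "--token", token, "upload", package_path]
```

The returned list can be passed to subprocess.run(..., check=True). Only the token's value is needed, which matches the note in step 1.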

Descriptive names for edit_distance

Hello, what do you think about using descriptive names for the arguments of edit_distance?
By descriptive I mean something like reference and hypothesis.
With the "plain" edit_distance you only count the number of errors, so you can swap the arguments freely.
That changes once you report ins/del/sub.

While renaming the arguments in the code would be a breaking change, it would be nice to have the names at least in the README and the docstring. I want to use kaldialign to report ins/del/sub, but right now I have to work out which argument is which.
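
To illustrate why the order matters, here is a small standalone sketch with unit costs. It does not use kaldialign, and the function and parameter names are hypothetical. Swapping the two sequences keeps the error total but turns deletions into insertions and changes the error rate, because the rate is divided by the reference length.

```python
from functools import lru_cache


def count_errors(reference, hypothesis):
    """Count ins/del/sub of `hypothesis` relative to `reference` (unit costs)."""
    ref, hyp = tuple(reference), tuple(hypothesis)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Returns (total, ins, dels, subs) for aligning ref[i:] with hyp[j:].
        if i == len(ref):
            rest = len(hyp) - j  # leftover hypothesis tokens are insertions
            return (rest, rest, 0, 0)
        if j == len(hyp):
            rest = len(ref) - i  # leftover reference tokens are deletions
            return (rest, 0, rest, 0)
        total, ins, dels, subs = solve(i + 1, j + 1)
        if ref[i] == hyp[j]:
            return (total, ins, dels, subs)  # match costs nothing
        options = [(total + 1, ins, dels, subs + 1)]  # substitution
        total, ins, dels, subs = solve(i + 1, j)
        options.append((total + 1, ins, dels + 1, subs))  # deletion
        total, ins, dels, subs = solve(i, j + 1)
        options.append((total + 1, ins + 1, dels, subs))  # insertion
        return min(options, key=lambda o: o[0])

    total, ins, dels, subs = solve(0, 0)
    return {"ins": ins, "del": dels, "sub": subs, "total": total,
            "ref_len": len(ref), "err_rate": total / len(ref)}


print(count_errors(reference=("a", "b", "c"), hypothesis=("a", "c")))
print(count_errors(reference=("a", "c"), hypothesis=("a", "b", "c")))
```

The first call reports one deletion over a reference of length 3. The second reports one insertion over a reference of length 2.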
