johnchase / cual-id

A package for creating and managing sample identifiers in comparative -omics datasets.

License: BSD 3-Clause "New" or "Revised" License

Python 8.48% Jupyter Notebook 91.52%

cual-id's People

Contributors

ebolyen, gregcaporaso, jairideout, johnchase, kmckinnis

cual-id's Issues

Cannot run after build

Hi, I get this error:

cual-id -help
Traceback (most recent call last):
  File "/home/jroatkul/anaconda_ete/envs/cual-id/bin/cual-id", line 4, in <module>
    __import__('pkg_resources').run_script('cual-id==0.9.1', 'cual-id')
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 1506, in run_script
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/cual_id-0.9.1-py3.5.egg/EGG-INFO/scripts/cual-id", line 68, in <module>
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/core.py", line 675, in main
    _verify_python3_env()
  File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/_unicodefun.py", line 119, in _verify_python3_env
    'mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either run this under Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.

This system lists a couple of UTF-8 supporting locales that
you can pick from. The following suitable locales where
discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, af_ZA.utf8, am_ET.utf8, an_ES.utf8, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN.utf8, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SY.utf8, ar_TN.utf8, ar_YE.utf8, as_IN.utf8, ast_ES.utf8, az_AZ.utf8, be_BY.utf8, ber_DZ.utf8, ber_MA.utf8, bg_BG.utf8, bn_BD.utf8, bn_IN.utf8, bo_CN.utf8, bo_IN.utf8, br_FR.utf8, bs_BA.utf8, byn_ER.utf8, ca_AD.utf8, ca_ES.utf8, ca_FR.utf8, ca_IT.utf8, crh_UA.utf8, cs_CZ.utf8, csb_PL.utf8, cv_RU.utf8, cy_GB.utf8, da_DK.utf8, de_AT.utf8, de_BE.utf8, de_CH.utf8, de_DE.utf8, de_LU.utf8, dv_MV.utf8, dz_BT.utf8, el_CY.utf8, el_GR.utf8, en_AG.utf8, en_AU.utf8, en_BW.utf8, en_CA.utf8, en_DK.utf8, en_GB.utf8, en_HK.utf8, en_IE.utf8, en_IN.utf8, en_NG.utf8, en_NZ.utf8, en_PH.utf8, en_SG.utf8, en_US.utf8, en_ZA.utf8, en_ZW.utf8, es_AR.utf8, es_BO.utf8, es_CL.utf8, es_CO.utf8, es_CR.utf8, es_DO.utf8, es_EC.utf8, es_ES.utf8, es_GT.utf8, es_HN.utf8, es_MX.utf8, es_NI.utf8, es_PA.utf8, es_PE.utf8, es_PR.utf8, es_PY.utf8, es_SV.utf8, es_US.utf8, es_UY.utf8, es_VE.utf8, et_EE.utf8, eu_ES.utf8, fa_IR.utf8, fi_FI.utf8, fil_PH.utf8, fo_FO.utf8, fr_BE.utf8, fr_CA.utf8, fr_CH.utf8, fr_FR.utf8, fr_LU.utf8, fur_IT.utf8, fy_DE.utf8, fy_NL.utf8, ga_IE.utf8, gd_GB.utf8, gez_ER.utf8, gez_ET.utf8, gl_ES.utf8, gu_IN.utf8, gv_GB.utf8, ha_NG.utf8, he_IL.utf8, hi_IN.utf8, hne_IN.utf8, hr_HR.utf8, hsb_DE.utf8, ht_HT.utf8, hu_HU.utf8, hy_AM.utf8, id_ID.utf8, ig_NG.utf8, ik_CA.utf8, is_IS.utf8, it_CH.utf8, it_IT.utf8, iu_CA.utf8, iw_IL.utf8, ja_JP.utf8, ka_GE.utf8, kk_KZ.utf8, kl_GL.utf8, km_KH.utf8, kn_IN.utf8, ko_KR.utf8, kok_IN.utf8, ks_IN.utf8, ku_TR.utf8, kw_GB.utf8, ky_KG.utf8, lg_UG.utf8, li_BE.utf8, li_NL.utf8, lo_LA.utf8, lt_LT.utf8, lv_LV.utf8, mai_IN.utf8, mg_MG.utf8, mi_NZ.utf8, mk_MK.utf8, ml_IN.utf8, mn_MN.utf8, mr_IN.utf8, ms_MY.utf8, mt_MT.utf8, my_MM.utf8, nb_NO.utf8, nds_DE.utf8, nds_NL.utf8, ne_NP.utf8, nl_AW.utf8, 
nl_BE.utf8, nl_NL.utf8, nn_NO.utf8, no_NO.utf8, nr_ZA.utf8, nso_ZA.utf8, oc_FR.utf8, om_ET.utf8, om_KE.utf8, or_IN.utf8, pa_IN.utf8, pa_PK.utf8, pap_AN.utf8, pl_PL.utf8, ps_AF.utf8, pt_BR.utf8, pt_PT.utf8, ro_RO.utf8, ru_RU.utf8, ru_UA.utf8, rw_RW.utf8, sa_IN.utf8, sc_IT.utf8, sd_IN.utf8, se_NO.utf8, shs_CA.utf8, si_LK.utf8, sid_ET.utf8, sk_SK.utf8, sl_SI.utf8, so_DJ.utf8, so_ET.utf8, so_KE.utf8, so_SO.utf8, sq_AL.utf8, sq_MK.utf8, sr_ME.utf8, sr_RS.utf8, ss_ZA.utf8, st_ZA.utf8, sv_FI.utf8, sv_SE.utf8, ta_IN.utf8, te_IN.utf8, tg_TJ.utf8, th_TH.utf8, ti_ER.utf8, ti_ET.utf8, tig_ER.utf8, tk_TM.utf8, tl_PH.utf8, tn_ZA.utf8, tr_CY.utf8, tr_TR.utf8, ts_ZA.utf8, tt_RU.utf8, ug_CN.utf8, uk_UA.utf8, ur_PK.utf8, ve_ZA.utf8, vi_VN.utf8, wa_BE.utf8, wo_SN.utf8, xh_ZA.utf8, yi_US.utf8, yo_NG.utf8, zh_CN.utf8, zh_HK.utf8, zh_SG.utf8, zh_TW.utf8, zu_ZA.utf8
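
The error above says the environment is configured for ASCII, and the usual mitigation is to select a UTF-8 locale through the LC_ALL and LANG environment variables before Click starts. Here is a minimal Python sketch of that idea for anyone launching cual-id from a script. The helper name utf8_env is hypothetical, and en_US.utf8 is just one of the locales the message lists as available on this system.

```python
import os


def utf8_env(base_env=None, locale_name="en_US.utf8"):
    """Return a copy of the environment with a UTF-8 locale selected.

    `locale_name` is an assumption: pick any locale that the error
    message reports as available on your own system.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name
    env["LANG"] = locale_name
    return env


# Possible usage (not executed here, requires cual-id on PATH):
# import subprocess
# subprocess.run(["cual-id", "--help"], env=utf8_env(), check=True)
```

In an interactive shell, exporting the same two variables before running cual-id should have the same effect.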

Force ids to be more spreadsheet friendly?

John and I have been discussing whether this is a worthwhile thing to do. John doesn't want to end up supporting every spreadsheet program's weird data typing as it is read in, but perhaps some general things can be done without going overboard. John was hoping to get input from others.

I noticed that some generated ids are misread by Excel: those where the character "e" is followed by numbers, and those that begin with 0 or any other digit. Most people will probably create the ids and paste them into their metadata spreadsheet. How difficult would it be to add rules to the random creation that disallow a digit directly after "e" and a leading 0 or other digit? I am sure this would reduce the number of ids that can be created for a given length, but it could be an improvement.
There is also the issue of ids being read in as a date. A simple solution could be to require that every id contain at least one letter (or, perhaps even better, begin with a letter).

Thoughts on these ideas?
Arron
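
For discussion, the two proposed rules (begin with a letter, no digit directly after "e") could be checked with a small filter like the one below. This is only a sketch of the proposal, not code from cual-id. The function name is hypothetical, and it makes no claim to cover every way a spreadsheet program can misread a cell.

```python
import re

# An "e" directly followed by a digit is what can look like an exponent.
_E_THEN_DIGIT = re.compile(r"e[0-9]")


def is_spreadsheet_friendly(id_):
    """Apply the two rules proposed in this issue to a single id."""
    if not id_ or not id_[0].isalpha():
        return False  # empty, or starts with 0 or another digit
    return _E_THEN_DIGIT.search(id_.lower()) is None


candidates = ["0650e47f4b", "cff7367e41", "dd0253a688"]
print([c for c in candidates if is_spreadsheet_friendly(c)])
```

A generator could discard candidates that fail this check. As noted above, that shrinks the pool of ids available at a given length.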

Couple ideas for cual-id after using it for 2 large projects

My lab has been using the cual-id system for our barcodes in 2 large projects, and it has worked great for us. Having made over 800 barcodes and worked with them on a daily basis, we have noticed 2 things that we think could be improved.

  1. When there's an "e" as the second-to-last character in the barcode, Excel recognizes it as a number in exponential notation, and you have to manually change the format of the cell so that it displays correctly.
  2. With many students working on this project, and people having different handwriting styles, we've noticed some mix-ups between a "6" and a "b".

Just thought I would tell you about these after working with so many of your barcodes. :-) Hope it helps a bit.
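
One way to see how often the "6" versus "b" problem could bite an existing set of barcodes is to collapse the two characters into one and look for ids that become identical. The sketch below assumes only that single confusable pair. The names CONFUSABLE and handwriting_collisions are made up for illustration.

```python
from collections import defaultdict

# Treat a handwritten "b" as indistinguishable from "6" (assumed pair).
CONFUSABLE = str.maketrans({"b": "6"})


def handwriting_collisions(ids):
    """Group ids that read the same once confusable characters are merged."""
    groups = defaultdict(list)
    for id_ in ids:
        groups[id_.translate(CONFUSABLE)].append(id_)
    return [group for group in groups.values() if len(group) > 1]


print(handwriting_collisions(["a6c1", "abc1", "ffff"]))
```

Extending the translation table would cover other pairs that a lab finds hard to tell apart on handwritten labels.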

Couple of (very minor) issues with distance calculations

  1. The manuscript refers to transposition as an issue, but Hamming only captures character-by-character differences, not true edit distance (i.e., Levenshtein). In practice, this may not be a huge issue, but it does mean that a transposed ID could get confused (i.e., it would allow both "abcdef" and "bcdefg" as valid IDs, which could be easily confused if a letter was left off).
  2. The fix.py function doesn't use the same edit distance calculation as mint.py. Probably best to harmonize these, but again given the random nature of the generated IDs this is likely quite a minor issue in practice.

Opened a PR (#27) with one proposed fix. And thanks for the work - package + paper do a nice job outlining the benefits of the approach!
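
To make the first point concrete, here is a self-contained sketch of both distances applied to the example pair. These are textbook implementations written for this comment. They are not the functions in mint.py or fix.py.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x with y
            ))
        previous = current
    return previous[-1]


print(hamming("abcdef", "bcdefg"))      # 6: every position differs
print(levenshtein("abcdef", "bcdefg"))  # 2: drop "a", append "g"
```

Under Hamming the two IDs look maximally different, while under Levenshtein they are only two edits apart, which is the confusion described in point 1.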

Switch columns in output to best help with --existing-ids

Hello,
It appears that when the create-ids command is run, it writes the UUID column first and then the id column. When this file is passed via the --existing-ids option, the UUID column is read as the existing ids. This runs with no error (not that one would be expected) and may lead the user to believe it worked correctly, which can result in duplicate ids. If we swap the order of these columns in the output, the created ids.txt file could be passed directly as the existing-ids file, and no reworking of ids.txt would be needed (as simple as that may be).
Thanks,
Arron and William
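
Until the column order changes, the reworking mentioned above can be done with a few lines of Python. This sketch assumes the tab-separated "UUID, then id" layout described in this issue, and that the existing-ids file should contain one short id per line. The function name is hypothetical.

```python
def short_ids(lines):
    """Pull the second (short id) column out of tab-separated output lines."""
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            ids.append(fields[1])
    return ids


sample = ["dccc9d88-7e36-477a-ac4a-610ada8bc9fe\t0ada8bc9fe\n"]
print(short_ids(sample))

# Possible usage with real files (paths are placeholders):
# with open("ids.txt") as src, open("existing.txt", "w") as dst:
#     dst.write("\n".join(short_ids(src)) + "\n")
```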

I Cant Create Cual IDs

When I try to create cual IDs in Terminal on my Mac, it says -bash: cual-id: command not found. On my coworker's Mac it works fine, though. How can this be fixed so that I can create IDs?

-e/--existing-ids option not working?

Hi there!

This is such a great tool. After < 24 hours working with this I can tell that our group is going to get a ton of mileage out of cual-id, so thank you very much!

I'm currently running cual-id on High Sierra (10.13.6) and have only had one issue: the --existing-ids option doesn't seem to work!

I installed cual-id into its own Conda env without issue:

conda create -c https://conda.anaconda.org/johnchase -n cual-id python=3 cual-id
source activate cual-id

I get the following output from cual-id --help:

(cual-id) myID@machine:~$ cual-id --help
Usage: cual-id [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  create  Command to create barcode labels or sample...
  fix     Compare a set of possibly invalid IDs against...

And generating IDs with --length/-l and --fail-threshold/-f works without issue:

(cual-id) myID@machine:~$ time cual-id create ids -l 10 -f 0.99 10
5cfc82d2-041a-41b3-898d-fbec9b47bff7	ec9b47bff7
31abdf17-b0bd-4e5f-b0b0-e254bf8be4a7	54bf8be4a7
1f5483ba-492c-4baa-a489-e5c891db6149	c891db6149
4d07142a-669b-4513-a125-299377ac9be8	9377ac9be8
c4705cd4-623f-44eb-9195-edcff7367e41	cff7367e41
61f01f14-7403-42a7-8355-2ca629729804	a629729804
2fc45e0b-3726-491a-a969-13cf60d1b53d	cf60d1b53d
d8218270-0e52-4138-9106-5727766baef3	27766baef3
104cb1dc-24b6-4665-9624-612fb13c26e9	2fb13c26e9
ab02825b-88ed-4b6b-8a99-ac33ae581fbb	33ae581fbb

real	0m0.245s
user	0m0.200s
sys	0m0.041s

(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 10
22281f49-5be0-4465-a38f-0e0650e47f4b	0650e47f4b
b03e9988-ea9b-42fa-ac6c-eadd0253a688	dd0253a688
6b191d3b-25f5-4d73-809d-4e504bf9f15d	504bf9f15d
6d4d3097-c671-4b1d-818a-75e01d3987d0	e01d3987d0
33cd7df0-9c3b-46a0-98f1-dde555baa009	e555baa009
93fa15f8-ddcc-4611-aaad-0de4ca694c4a	e4ca694c4a
08b6c9c2-9375-4ef4-8ab5-182e90cea65d	2e90cea65d
d32d0d73-fcf0-4049-a8da-fa950016ba38	950016ba38
f6fcec23-38a4-4e9a-a256-ac682bfcc036	682bfcc036
46eaefc4-2cc1-4d48-aeb2-cc26a3c195fb	26a3c195fb

real	0m0.246s
user	0m0.201s
sys	0m0.040s

But, when I write to file and try to use a pre-existing file to create IDs, I get the following error:

(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 10 > test-ids.txt

real	0m0.247s
user	0m0.201s
sys	0m0.042s
(cual-id) myID@machine:~$ head test-ids.txt
dccc9d88-7e36-477a-ac4a-610ada8bc9fe	0ada8bc9fe
c296e7d2-fb98-4e31-a4d8-83e1e2999836	e1e2999836
c825451d-4317-4099-8ae0-7a2e4b8a7ee3	2e4b8a7ee3
ea7e6d72-c82a-4840-ad88-9a3c2c7f5435	3c2c7f5435
4df91764-4d5a-4301-8472-8c78cb3f0853	78cb3f0853
d92744b0-342b-4cdd-a084-1a462c6f39a6	462c6f39a6
555202bf-434d-4d2c-b01e-73aa52ee002d	aa52ee002d
75eaa961-9b62-4628-9071-5b02be0260cb	02be0260cb
b3a6ffed-895e-4011-aa58-c2f30ae87cd5	f30ae87cd5
75316bf9-81c7-4ca4-8517-4aefa2e9f8f3	efa2e9f8f3
(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 --existing-ids test-ids.txt 10
Error: no such option: --existing-ids

real	0m0.242s
user	0m0.198s
sys	0m0.040s

I can see that --existing-ids should be an option as it's clearly in cual-id/cual-id at master at line 36:

@click.option('-e', '--existing-ids', type=click.File('U'), default=None, required=False)

Am I missing something? Thanks a ton!

Fix bug where empty string returns nothing

If the existing list passed to create cualids contains an empty string the function returns nothing:

import cualid

for id_ in cualid.create_ids(n=7, id_length=4, existing_ids=['']):
    print(id_)

If an empty string is valid, the function should still return ids; if it is not valid, it should throw a warning. I don't feel an empty string should be considered valid.
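
One possible shape for the "warn and ignore" behaviour is a small pre-filter applied to the existing list before it is used. This is a sketch only. The function name is hypothetical and nothing here is taken from the cualid source.

```python
import warnings


def clean_existing_ids(existing_ids):
    """Drop empty or whitespace-only entries, warning about each one."""
    cleaned = []
    for id_ in existing_ids:
        if id_.strip() == "":
            warnings.warn("Ignoring empty existing ID", UserWarning)
            continue
        cleaned.append(id_)
    return cleaned


print(clean_existing_ids(["", "ab12"]))
```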

usability issues

I needed to generate some sample ids this morning and came across a few issues:

  • README.md has the wrong script name (BC_generator instead of Cual-ID)
  • Calling Cual-ID create-ids fails with traceback when called with no options - possible to just display the help text instead? Command line users shouldn't get a traceback.
  • help text for Cual-ID create-ids is confusing: the positional argument is required, but the [ ] imply that it's optional
  • when specifying a prefix with -p, the separator character that's used (:) is non MEINS-compliant, so keemei warns about it. Could this be replaced with a .?
  • optionally write the ids to stdout - this is nice if you're populating a google spreadsheet with them (when maybe you don't end up using the file that's created, but just copy/pasting from it)

create_ids should fail after n attempts

Currently create_ids will keep trying to add IDs to the list indefinitely. It may become impossible to add another ID that satisfies the minimum edit distance, and if that happens the loop will run forever.
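
A bounded version could look roughly like the sketch below. It is not the current create_ids. The name create_ids_bounded, the max_attempts and min_distance parameters, the Hamming check, and the default candidate (the tail of a UUID4 hex string, mirroring what the tool's printed output appears to show) are all assumptions made for illustration.

```python
import uuid


def _hamming(a, b):
    """Count differing positions in two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def create_ids_bounded(n, id_length, min_distance=1, max_attempts=1000,
                       make_candidate=None):
    """Collect n ids, giving up after max_attempts candidates in total."""
    if make_candidate is None:
        # Assumed candidate source: the last id_length hex characters of a UUID4.
        make_candidate = lambda: uuid.uuid4().hex[-id_length:]
    accepted = []
    attempts = 0
    while len(accepted) < n:
        if attempts >= max_attempts:
            raise RuntimeError(
                "gave up after %d attempts with %d of %d ids created"
                % (attempts, len(accepted), n))
        attempts += 1
        candidate = make_candidate()
        # Accept only candidates far enough from every id kept so far.
        if all(_hamming(candidate, other) >= min_distance for other in accepted):
            accepted.append(candidate)
    return accepted


print(len(create_ids_bounded(5, 8)))
```

Asking for 17 ids of length 1 can never succeed, because only 16 hex characters exist. With the bound in place that request raises an error instead of looping forever.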

Running validation of cual-ids when generated in batch

On the off chance that someone's system doesn't have a clock accurate to the chosen interval, we should create a running validator that compares every ID to its immediate predecessor and confirms that the two differ.
Should failure of this condition raise an exception? Or should it just retry until it succeeds, failing after a set number of retries?
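
As a starting point for that discussion, here is a sketch of the "retry, then fail" option. The generator name checked_stream, the max_retries parameter, and the make_id callable standing in for whatever produces a single ID are all hypothetical.

```python
def checked_stream(make_id, n, max_retries=3):
    """Yield n ids, retrying whenever an id equals its immediate predecessor."""
    previous = None
    for _ in range(n):
        for _attempt in range(max_retries + 1):
            candidate = make_id()
            if candidate != previous:
                break
        else:
            # Every attempt repeated the previous id: stop instead of looping.
            raise RuntimeError("id repeated its predecessor after retries")
        yield candidate
        previous = candidate


source = iter(["a", "a", "b"])
print(list(checked_stream(lambda: next(source), 2)))
```

Raising immediately on the first repeat would correspond to max_retries=0.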
