
miscscripts's People

Contributors

sejmodha

Forkers

ziels, flopezo, sofsta

miscscripts's Issues

non-GNU sed

Thank you for putting this script together! If you are on a Mac and hit a KeyError: 'taxid', simply change the subprocess.call invocations on lines 79 and 80 from

subprocess.call("sed -i '1d' assembly_summary_refseq.txt", shell=True)
subprocess.call("sed -i 's/^# //' assembly_summary_refseq.txt", shell=True)

to

subprocess.call("sed -i.bu '1d' assembly_summary_refseq.txt", shell=True)
subprocess.call("sed -i.bu 's/^# //' assembly_summary_refseq.txt", shell=True)

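Adding -i.bu gives sed an explicit backup suffix (.bu) for its in-place edit. The BSD sed that ships with macOS requires a suffix argument after -i, whereas GNU sed treats it as optional, so on a Mac the original calls fail, the header is never cleaned up, and that is what leads to the KeyError. With the suffix, both versions edit the file in place and keep a backup copy (assembly_summary_refseq.txt.bu).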
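Another option that avoids GNU/BSD sed differences entirely is to do the same two edits in plain Python. This is a minimal sketch, assuming the file fits comfortably in memory (the download here is about 68 MB); clean_assembly_summary is a hypothetical helper name, not something in the script:

```python
def clean_assembly_summary(path):
    """Hypothetical stand-in for the two sed calls:
    drop the first line, then strip a leading '# ' from the remaining lines."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()[1:]  # same effect as sed '1d'
    # same effect as sed 's/^# //': remove '# ' only at the start of a line
    cleaned = [ln[2:] if ln.startswith("# ") else ln for ln in lines]
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(cleaned)
```

In the script this would be called as clean_assembly_summary('assembly_summary_refseq.txt') in place of the two subprocess.call lines. Unlike sed -i.bu, it does not keep a backup copy.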

UpdateKrakenDatabases.py how to set target download location

Hi sejmodha,

Great work putting this script together! This might be a noob question, but you mention that the script takes an optional command-line argument that specifies the target location where the data should be downloaded and saved.

How do I do that? Can you perhaps give an example?

Cheers,

Sam
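The thread doesn't show how UpdateKrakenDatabases.py actually parses its arguments. As a hypothetical sketch of one common way a Python script handles an optional positional target directory (resolve_target_dir is an illustrative name, not taken from the script):

```python
import os


def resolve_target_dir(argv):
    """Hypothetical: use argv[1] as the download directory if given,
    otherwise fall back to the current working directory."""
    target = argv[1] if len(argv) > 1 else os.getcwd()
    os.makedirs(target, exist_ok=True)  # create the directory if it does not exist yet
    return os.path.abspath(target)
```

A script following this pattern would call resolve_target_dir(sys.argv), then os.chdir() into the result before downloading, and would be invoked as python UpdateKrakenDatabases.py /path/to/kraken_db; leaving the path off would download into the current directory.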

Error in using the database

Hi @sejmodha. I used your script to successfully set up the Kraken database, but when I run Kraken I get this error:
kraken: database ("/opt/apps/bioinfo/databases/kraken") does not contain necessary file database.kdb

This is the command I used:
DB=/opt/apps/bioinfo/databases/kraken
kraken --preload --db $DB sample.trimmed.fa --threads 15 --classified-out sample.classified --unclassified-out sample.unclassified > sample.kraken

Please advise.
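The message means kraken looked in $DB for database.kdb and did not find it. A small diagnostic sketch that reports which index files are present: the lists below use the standard file names of a built Kraken 1 database (database.kdb, database.idx, taxonomy/) and a Kraken 2 database (hash.k2d, opts.k2d, taxo.k2d), and check_kraken_db is a hypothetical helper:

```python
from pathlib import Path

# Index files each Kraken version expects in a *built* database directory.
KRAKEN1_FILES = ["database.kdb", "database.idx", "taxonomy/nodes.dmp"]
KRAKEN2_FILES = ["hash.k2d", "opts.k2d", "taxo.k2d"]


def check_kraken_db(db_dir):
    """Return {'kraken1': [...], 'kraken2': [...]} listing the files missing for each version."""
    db = Path(db_dir)
    return {
        "kraken1": [f for f in KRAKEN1_FILES if not (db / f).exists()],
        "kraken2": [f for f in KRAKEN2_FILES if not (db / f).exists()],
    }
```

If only the Kraken 2 files are present, the directory was built for kraken2 rather than kraken (and vice versa); if neither set is complete, the sequence library may have been downloaded without the database-building step having been run.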

UpdateKrakenDatabases.py kraken input format error

Hi,
I got the following error while trying to convert sequences to Kraken input format:

Traceback (most recent call last):
  File "/home/jason/Documents/Databases/UpdateKrakenDatabases.py", line 118, in <module>
    get_fasta_in_kraken_format('human_genome.fa')
  File "/home/jason/Documents/Databases/UpdateKrakenDatabases.py", line 107, in get_fasta_in_kraken_format
    outseq=">"+seq_id+"|"+taxid+"\n"+str(seq)+"\n"
  File "/home/jason/anaconda3/lib/python3.9/site-packages/Bio/Seq.py", line 369, in __str__
    return self._data.decode("ASCII")
  File "/home/jason/anaconda3/lib/python3.9/site-packages/Bio/Seq.py", line 156, in decode
    return bytes(self).decode(encoding)
  File "/home/jason/anaconda3/lib/python3.9/site-packages/Bio/Seq.py", line 2911, in __bytes__
    raise UndefinedSequenceError("Sequence content is undefined")
Bio.Seq.UndefinedSequenceError: Sequence content is undefined

I would appreciate any help, since I'm stuck trying to build the database.
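Biopython raises UndefinedSequenceError when a record has a known length but no actual sequence letters, so str(seq) fails. Which records in human_genome.fa trigger this isn't clear from the traceback. One workaround, sketched below, is to skip such records and report them instead of crashing. write_kraken_fasta is a hypothetical replacement for the loop in get_fasta_in_kraken_format, and the header follows the >ID|kraken:taxid|TAXID convention from the Kraken documentation rather than the script's own seq_id+"|"+taxid string:

```python
try:
    from Bio.Seq import UndefinedSequenceError  # recent Biopython versions
except ImportError:  # older or missing Biopython: define a stand-in so the code still runs
    class UndefinedSequenceError(Exception):
        pass


def write_kraken_fasta(records, taxid, handle):
    """Write records as FASTA with Kraken-style '|kraken:taxid|' headers.

    Records whose sequence content is undefined are skipped; their IDs are returned.
    """
    skipped = []
    for rec in records:
        try:
            seq = str(rec.seq)  # raises UndefinedSequenceError for undefined content
        except UndefinedSequenceError:
            skipped.append(rec.id)
            continue
        handle.write(f">{rec.id}|kraken:taxid|{taxid}\n{seq}\n")
    return skipped
```

With Biopython, records would typically come from SeqIO.parse('human_genome.fa', 'fasta'), and a non-empty skipped list tells you which entries to inspect or re-download.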

pandas error

Thanks for the script, but I'm getting this error:

updatekrakendb.py
/media/disk1_12TB/tallnut/db/kraken2/refseq
/media/disk1_12TB/tallnut/db/kraken2/refseq
Downloading human genome

--2021-02-18 10:07:12-- ftp://ftp.ncbi.nih.gov/genomes/refseq/assembly_summary_refseq.txt
=> ‘assembly_summary_refseq.txt’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 165.112.9.228, 2607:f220:41e:250::11, 2607:f220:41e:250::13, ...
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|165.112.9.228|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /genomes/refseq ... done.
==> SIZE assembly_summary_refseq.txt ... 66360144
==> PASV ... done. ==> RETR assembly_summary_refseq.txt ... done.
Length: 66360144 (63M) (unauthoritative)

assembly_summary_refseq.txt 100%[=================================================================================================>] 68.47M 1.13MB/s in 97s

2021-02-18 10:08:54 (724 KB/s) - ‘assembly_summary_refseq.txt’ saved [71801632]

/home/tallnutt/d/scripts/updatekrakendb.py:81: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  assembly_sum = pd.read_table('assembly_summary_refseq.txt',dtype='unicode')
Traceback (most recent call last):
  File "/home/tallnutt/d/scripts/updatekrakendb.py", line 116, in <module>
    download_refseq_genome(9606,'human_genome_url.txt')
  File "/home/tallnutt/d/scripts/updatekrakendb.py", line 81, in download_refseq_genome
    assembly_sum = pd.read_table('assembly_summary_refseq.txt',dtype='unicode')
  File "/opt/miniconda3/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 435, in _read
    data = parser.read(nrows)
  File "/opt/miniconda3/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1139, in read
    ret = self._engine.read(nrows)
  File "/opt/miniconda3/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1995, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 22 fields in line 141712, saw 36
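The ParserError says one row of assembly_summary_refseq.txt has 36 tab-separated fields where the header defines 22. A stdlib-only sketch for listing such rows before handing the file to pandas; find_malformed_rows is a hypothetical helper, and its line numbers are plain 1-based file lines:

```python
def find_malformed_rows(path, sep="\t"):
    """Return (line_number, field_count) for each non-blank row whose
    field count differs from the header row's."""
    bad = []
    with open(path, encoding="utf-8") as fh:
        expected = len(fh.readline().rstrip("\n").split(sep))
        for lineno, line in enumerate(fh, start=2):
            if not line.strip():  # pandas skips blank lines by default, so do the same
                continue
            n_fields = len(line.rstrip("\n").split(sep))
            if n_fields != expected:
                bad.append((lineno, n_fields))
    return bad
```

If the offending rows can simply be dropped, pandas 1.3 and later also accept on_bad_lines="skip" (or "warn") in pd.read_csv(..., sep="\t").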

Plasmid database

Hi @sejmodha
I noticed the database is named HumanVirusBacteria. Does that mean only these three groups are included? What about plasmids?
