steineggerlab / foldseek
Foldseek enables fast and sensitive comparisons of large structure sets.
Home Page: https://foldseek.com
License: GNU General Public License v3.0
Hi,
It is a very nice software and extremely fast, thank you! One question: would it be possible that the server provides the superimposed structures and/or the translation-rotation matrix for the query vs hits? That would be super useful!
Thank you for your great work!
I found some problems with the link you give for the benchmark data and Foldseek databases: http://wwwuser.gwdg.de/~compbiol/foldseek/
Hello! I was playing around with Foldseek and mapped the encoded structure sequences from the PDB database downloaded from you (pdb_ss.fasta) to their respective amino acid FASTA sequences from PDB. I found length inconsistencies for 25,000 IDs, such as the following:
pdb_id_chain ss_seq_len aa_seq_len
11as_B 327 328
148l_E 163 162
155c_A 134 121
1a16_A 439 441
1a7a_A 416 431
There can be several reasons: for example, the CA atom is missing in some structures, or the amino acid or side chain was not properly defined.
I was wondering whether it is possible to obtain the amino acid sequences, or to know where you retrieved them from and whether you applied some kind of filtering or cleaning.
Thank you very much!
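The length comparison described above could be reproduced with something like the following. This is only a minimal sketch: it assumes both files are plain FASTA and that the first whitespace-delimited token of each header is the shared ID (for example 11as_B). The function names are made up for illustration.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return {record_id: sequence_length} for a FASTA stream."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first token of the header is the record ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def length_mismatches(ss_lengths, aa_lengths):
    """Return sorted (id, ss_len, aa_len) tuples for shared IDs whose lengths differ."""
    shared = ss_lengths.keys() & aa_lengths.keys()
    return sorted(
        (rid, ss_lengths[rid], aa_lengths[rid])
        for rid in shared
        if ss_lengths[rid] != aa_lengths[rid]
    )


if __name__ == "__main__":
    # Toy records, not real PDB data.
    ss = read_fasta_lengths(StringIO(">toy_A\nDDD\n>toy_B\nVVVV\n"))
    aa = read_fasta_lengths(StringIO(">toy_A\nMKTA\n>toy_B\nMKTA\n"))
    for row in length_mismatches(ss, aa):
        print(*row, sep="\t")
```

For real files, pass open file handles instead of the StringIO objects.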
If I understand correctly, the VQ-VAE used by foldseek translates each amino acid into one of 20 "states". Do we have access to these, i.e. is it possible to get the "state sequence"? Like:
AVGAI -> states 1, 5, 7, 1, 13
Thanks!
With a large database containing many files, the database build step dies due to the command-line character limit.
This happens, for example, when trying to build a database of complete AlphaFold output.
It might be nice to support an input settings file listing the .cif / .pdb structures.
Dear foldseek developers,
When I search 2ekj_A against the PDB database with the default settings (3Di/AA), the HTML output shows 2ekj_A as the first hit and nowhere else. But when I download the search results by selecting "Download All", 2ekj_A is reported multiple times in the table, with different e-values.
1st hit: "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 3.285E-16 807 105 105"
569th hit: "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 9.501E-16 785 105 105"
1121st hit: "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 4.406E-16 802 105 105"
Why does it happen? I have renamed 2ekj.pdb to 2ekj.txt and attached it.
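To see how often a query/target pair repeats in a downloaded table, a small check along these lines may help. It is a sketch that assumes whitespace- or tab-separated rows with the query in the first column and the target in the second, as in the rows quoted above. It only reports the repeats and says nothing about why they occur.

```python
from collections import defaultdict


def duplicate_pairs(lines):
    """Group rows by (query, target) and return only the pairs that occur more than once."""
    rows_by_pair = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows_by_pair[(fields[0], fields[1])].append(fields)
    return {pair: rows for pair, rows in rows_by_pair.items() if len(rows) > 1}


if __name__ == "__main__":
    # The three rows quoted in the report above.
    table = [
        "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 3.285E-16 807 105 105",
        "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 9.501E-16 785 105 105",
        "job.pdb_A 2ekj_A 100.000 105 0 0 1 105 1 105 4.406E-16 802 105 105",
    ]
    for (query, target), rows in duplicate_pairs(table).items():
        evalues = [row[10] for row in rows]
        print(query, target, len(rows), evalues)
```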
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
Please make sure to also post the complete output of Spacepharer. You can use gist.github.com for large output.
Providing context helps us come up with a solution and improve our documentation for the future.
Include as many relevant details about the environment you experienced the bug in.
Add RMSD to server output
Hi,
Is there a way to request NCBI or UniProt accessions and/or the functions of the target proteins using the Foldseek API?
With the Python command result = get('https://search.foldseek.com/api/result/' + ticket['id'] + '/0').json()
I get the target accession only.
If not, is there a way to batch-request functions using AlphaFold database accessions?
Thank you for this great tool, by the way.
Run example without error
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
foldseek search queryDB targetDB aln tmpFolder -a
foldseek aln2tmscore queryDB targetDB aln aln_tmscore
foldseek createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv
Input database is not compatible with aln2tmscore.
aln2tmscore queryDB targetDB aln aln_tmscore
MMseqs Version: 188e5299e4d4e47b94614c4e8c67032f78f3ed21
Threads 96
Compressed 0
Verbosity 3
Input database "queryDB" has the wrong type (Aminoacid)
Allowed input:
- Unknown
Here is a bash script that downloads the foldseek binary, clones the git repo to get the example data, and runs the provided example, which generates the error.
# Create a fresh working directory
WORK_DIR=./foldseek_linux_avx
if [ -d "$WORK_DIR" ]; then
rm -rf "$WORK_DIR"
fi
mkdir "$WORK_DIR"
cd "$WORK_DIR" || exit 1
# Download the binary
wget https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz --no-check-certificate
tar xvzf foldseek-linux-avx2.tar.gz
foldseek_bin=./foldseek/bin/foldseek
# Get the test data
git clone https://github.com/steineggerlab/foldseek.git ./foldseek_git_src
EXAMPLE_DATA=./foldseek_git_src/example/
# Run foldseek; the aln2tmscore step generates the error
"$foldseek_bin" createdb "$EXAMPLE_DATA" targetDB
"$foldseek_bin" createdb "$EXAMPLE_DATA" queryDB
"$foldseek_bin" search queryDB targetDB aln tmpFolder -a
"$foldseek_bin" aln2tmscore queryDB targetDB aln aln_tmscore
"$foldseek_bin" createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv
I've also run the example with all 3 available compiled versions (avx, sse and conda) and got the same error: "Input database "queryDB" has the wrong type".
It would be nice to have a function that checks whether an input database is compatible with a given command.
Thank you
Hi,
I am trying to perform an all vs all search on the Alphafold/UniProt50 dataset.
Given the size of the job, my plan is to use Foldseek-MPI on a computer cluster.
The problem I am currently facing is that the job crashes whenever I try to use more than one node. It even crashes when I test it on a smaller dataset (~200k PDBs).
Since an all vs all search on the same smaller dataset works correctly with a non-MPI build of foldseek (single node), the problem must be related to MPI.
Maybe the problem is in the way I am setting up the mpi runner?
(...)
#PBS -l nodes=2:ppn=24
(...)
foldseek search data_200k data_200k alignments tmpFolder \
-s 7.5 --max-seqs 20000 --mpi-runner "mpirun -map-by ppr:1:node:pe=24" \
--split 2 --split-mode 1
This link contains the stderr and stdout of the job.
The specific error is:
Error: Alignment step died
*** Error in `foldseek': double free or corruption (out): 0x0000000002e9a050 ***
Running foldseek aln2tmscore converted_full/ converted_full/ aln tmscore should give me TM-scores.
Instead, I get this error:
Input database "converted_full/" has the wrong type (Generic)
Allowed input:
run foldseek easy-search, then run the above code.
Hi
I am looking to reduce the Alphafold indices further. I was wondering if the cluster mode in foldseek can be used to cluster structures further down than X% sequence identities? For example, using the default AA+3Di mode and evalues provided by the pairwise alignments?
Thanks for an awesome tool 👍
The database should build successfully for the 180207 files in the directory.
Database builds up to a certain point and then fails:
[=========================================================> ] 88.05% 158.66K eta 20m 25s
Segmentation fault (core dumped)
The example commands provided on the repo run without error:
foldseek createdb example/ targetDB
foldseek easy-search example/d1asha_ targetDB aln.m8 tmpFolder
Output from example:
createdb example/ targetDB
MMseqs Version: 797a5a3ab5e2b6ba7104ac5fb20cfe4f817c88d5
Threads 32
Verbosity 3
Output file: targetDB
[=================================================================] 100.00% 26 0s 2ms
Time for merging to targetDB_ss: 0h 0m 0s 0ms
Time for merging to targetDB_h: 0h 0m 0s 0ms
Time for merging to targetDB_ca: 0h 0m 0s 0ms
Time for merging to targetDB: 0h 0m 0s 0ms
Ignore 0 out of 26.
Too short: 0, incorrect 0.
Time for processing: 0h 0m 0s 22ms
foldseek easy-search example/d1asha_ targetDB aln.m8 tmpFolder
aln.m8 exists and will be overwritten
Create directory tmpFolder
easy-search example/d1asha_ targetDB aln.m8 tmpFolder
MMseqs Version: 797a5a3ab5e2b6ba7104ac5fb20cfe4f817c88d5
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace false
Include identical seq. id. false
TMscore threshold 0.5
Threads 32
Verbosity 3
Substitution matrix nucl:3di.out,aa:3di.out
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Compressed 0
Seed substitution matrix nucl:3di.out,aa:3di.out
Sensitivity 5.7
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Alignment type 0
createdb example/d1asha_ tmpFolder/2764274086546556879/query --threads 32 -v 3
Output file: tmpFolder/2764274086546556879/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 0ms
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query_ca: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Ignore 0 out of 1.
Too short: 0, incorrect 0.
Time for processing: 0h 0m 0s 17ms
Create directory tmpFolder/2764274086546556879/search_tmp
search tmpFolder/2764274086546556879/query targetDB tmpFolder/2764274086546556879/result tmpFolder/2764274086546556879/search_tmp
prefilter tmpFolder/2764274086546556879/query_ss targetDB_ss tmpFolder/2764274086546556879/search_tmp/8562932631619137899/pref --sub-mat nucl:3di.out,aa:3di.out --seed-sub-mat nucl:3di.out,aa:3di.out -s 7.5 -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 0 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 32 --compressed 0 -v 3
Query database size: 1 type: Aminoacid
Estimated memory consumption: 256M
Target database size: 26 type: Aminoacid
Index table k-mer threshold: 96 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 26 0s 1ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 100.00% 26 0s 0ms
Index statistics
Entries: 3491
DB size: 128 MB
Avg k-mer size: 0.000208
Top 10 k-mers
FFGFFF 8
FFLFFF 6
DHQFFF 6
GDQQQQ 5
EEDDQE 4
FFGFDF 4
DFQFDF 4
FFDFFF 4
GEFHFF 4
RQQREG 4
Time for index table init: 0h 0m 0s 73ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 96
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 1
Target db start 1 to 26
[=================================================================] 100.00% 1 eta -
149.986395 k-mers per position
611 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
24 sequences passed prefiltering per query sequence
24 median result list length
0 sequences with 0 size result lists
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 127ms
align tmpFolder/2764274086546556879/query_ss targetDB_ss tmpFolder/2764274086546556879/search_tmp/8562932631619137899/pref tmpFolder/2764274086546556879/search_tmp/8562932631619137899/aln --sub-mat nucl:3di.out,aa:3di.out -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 0 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --zdrop 40 --threads 32 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 1 type: Aminoacid
Target database size: 26 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln: 0h 0m 0s 0ms
24 alignments calculated
24 sequence pairs passed the thresholds (1.000000 of overall calculated)
24.000000 hits per query sequence
Time for processing: 0h 0m 0s 79ms
mvdb tmpFolder/2764274086546556879/search_tmp/8562932631619137899/aln tmpFolder/2764274086546556879/result
Time for processing: 0h 0m 0s 0ms
Removing temporary files
rmdb tmpFolder/2764274086546556879/search_tmp/8562932631619137899/pref
Time for processing: 0h 0m 0s 0ms
aln.m8 exists and will be overwritten
convertalis tmpFolder/2764274086546556879/query targetDB tmpFolder/2764274086546556879/result aln.m8 --sub-mat nucl:3di.out,aa:3di.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 0 --threads 32 --compressed 0 -v 3
[=================================================================] 100.00% 1 eta -
Time for merging to aln.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 6ms
rmdb tmpFolder/2764274086546556879/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmpFolder/2764274086546556879/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmpFolder/2764274086546556879/query_h -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmpFolder/2764274086546556879/query_ca -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmpFolder/2764274086546556879/query_ss -v 3
Time for processing: 0h 0m 0s 0ms
foldseek createdb --threads 20 mmcif_files/ pdb_db
The command is run from within the pdb_mmcif directory of the AlphaFold database download.
Git commit used (The string after "MMseqs Version:" when you execute foldseek without any parameters): 797a5a3
Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.): self-compiled version
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
Running cat /proc/cpuinfo | grep sse4_1 and cat /proc/cpuinfo | grep avx2 returns 32 lines each. The system has 16 cores/32 threads and 128 GB RAM.
Operating system and version: Ubuntu 20.04.2 LTS
Perhaps my system doesn't have the specs for a DB of this size? I tried running the SSE4.1 version and it had the same behavior.
Thank you for your time!
Dear foldseek developers,
I want to search many structures against many structures using foldseek, and I want to do local structure searching. I need to know the TM-score and the coordinates of the part of the query that aligns with the target i.e. qstart, qend, tstart, and tend.
As I need to do local alignment, I didn't use "--alignment-type 1" which is for the global alignment.
When I use the following code, it doesn't show qstart, qend, tstart, and tend.
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
foldseek search queryDB targetDB aln tmpFolder -a
foldseek aln2tmscore queryDB targetDB aln aln_tmscore
foldseek createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv
How can I do local alignment and also have the qstart, qend, tstart, and tend?
I was thinking of running the search with default mode (that reports hit based on e-value) and then merging the data from this step with the table I get from converting alignments with e-value to tm-score. However, if a query protein has multiple domains that all align with the same template protein, it would fail.
Hi,
First, thanks for this amazing tool :)
I tried using the argument '--tmscore-threshold' with several cutoffs (0.25, 0.5, 0.6, 0.75, 0.9), and I seem to get the same results every time, including hits with a TM-score lower than the threshold.
For example, I ran this command:
foldseek easy-search quary_ranked_0.pdb my_db_archaea_db out_archea.txt /tmp/ --format-mode 4 --alignment-type 2 -c 0.7 --cov-mode 0 --format-output query,target,fident,alnlen,alntmscore,evalue,bits,mismatch,gapopen,qstart,qend,tstart,tend,qcov,tcov --tmscore-threshold 0.6
and I get this:
query target fident alnlen alntmscore evalue bits mismatch gapopen qstart qend tstart tend qcov tcov
ranked_0.pdb AF-Q58952-F1-model_v1.pdb.gz 0.083 274 4.084E-01 2.546E-06 280 157 18 33 258 16 243 0.873 0.916
ranked_0.pdb AF-Q58484-F1-model_v1.pdb.gz 0.115 233 2.404E-01 8.810E-06 260 135 13 3 195 6 207 0.745 0.716
...
The values in the 'alntmscore' column are smaller than 0.6: 0.408, 0.24, ...
Do you know why this is happening?
I am using the linux-sse41 version.
Thanks!
itai
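Independently of what the flag does, the table can be filtered after the fact. The sketch below assumes the output has a header row (as with --format-mode 4 in the command above) and that columns are separated by tabs or spaces. The function name is made up for illustration.

```python
def filter_by_min_value(lines, column, minimum):
    """Keep rows whose value in `column` is >= `minimum`; the first line must be the header."""
    rows = iter(lines)
    header = next(rows).split()
    index = header.index(column)
    kept = []
    for line in rows:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip truncated lines such as "..."
        # float() also accepts scientific notation such as 4.084E-01.
        if float(fields[index]) >= minimum:
            kept.append(dict(zip(header, fields)))
    return kept


if __name__ == "__main__":
    table = [
        "query target fident alnlen alntmscore evalue bits",
        "ranked_0.pdb AF-Q58952-F1-model_v1.pdb.gz 0.083 274 4.084E-01 2.546E-06 280",
        "ranked_0.pdb AF-Q58484-F1-model_v1.pdb.gz 0.115 233 2.404E-01 8.810E-06 260",
        "...",
    ]
    for row in filter_by_min_value(table, "alntmscore", 0.4):
        print(row["target"], row["alntmscore"])
```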
Hello,
Do you provide the Foldseek AF50_Best database (selecting the best pLDDT for each cluster, or taking into account AF corrections for the 4% misfolded proteins) for the standalone version? I can see it is available on the Foldseek server (thanks so much), and you mentioned on Twitter that it will be available to download (foldseek databases), so I was wondering whether it is somewhere I did not find (I looked at https://foldseek.steineggerlab.workers.dev/ but did not find it; afdb50 seems to be the previous version uploaded in August 2022) or whether you still plan to do it. That would be awesome!
Thanks a lot in advance!
Hi,
I was running a PDB file with Foldseek against the Swiss-Prot AF models, using the TM-align option. I noticed something weird in the listing of the TM-score
(all results: https://search.foldseek.com/result/aZf9k2cw6BPxvcZSqkVczCDixXzPjNd-hhuY8A/0#result-0-0)
As I understood, the 5th column should be the TM-score, however it is not the same as the TM-score written in light blue above the structure. Is this a bug? If not, what is the difference between the 2 TM-scores?
Thank you for looking into this issue!
I cannot find much information on how to view or process .m8 files. Please advise?
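As a starting point, an .m8 file is a plain text table with one hit per line. The sketch below assumes the 12 default columns that appear in the --format-output lines of the logs in this thread (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). Files written with a custom --format-output, or downloaded from the server, may have a different layout.

```python
DEFAULT_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]
INT_COLUMNS = {"alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend"}
FLOAT_COLUMNS = {"fident", "evalue", "bits"}


def parse_m8(lines, columns=DEFAULT_COLUMNS):
    """Parse m8 lines into a list of dicts with numeric columns converted."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < len(columns):
            continue  # skip blank or truncated lines
        hit = {}
        for name, value in zip(columns, fields):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value
        hits.append(hit)
    return hits


if __name__ == "__main__":
    example = ["job.pdb_A\t2ekj_A\t100.000\t105\t0\t0\t1\t105\t1\t105\t3.285E-16\t807"]
    for hit in parse_m8(example):
        print(hit["query"], hit["target"], hit["evalue"], hit["bits"])
```

With a real file, pass the open file handle: parse_m8(open("aln.m8")).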
Hello,
I am wondering whether it is possible to extract a tabular output indicating, for each residue in the query, how far it is from the aligned residue in the target.
In the HTML output, this is represented by the blue arrows between the target and query structure (see below). I am looking for the length of those arrows.
Is there a way I can extract this information from the alignment information or even the HTML file?
Thank you very much! I am a huge fan of foldseek and am very grateful for the time you've spent putting this together.
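If the aligned C-alpha coordinates and a superposition (rotation matrix u, translation vector t) are available, the per-residue distances can be computed directly. The sketch below is only that: it assumes the transform is applied to the target coordinates to bring them onto the query (this direction is an assumption and should be checked), and it assumes the two coordinate lists are already paired residue by residue from the alignment.

```python
import math


def apply_transform(point, u, t):
    """Return u * point + t for a 3x3 rotation matrix u and a translation vector t."""
    return tuple(
        sum(u[i][j] * point[j] for j in range(3)) + t[i]
        for i in range(3)
    )


def aligned_distances(query_ca, target_ca, u, t):
    """Distance between each query CA and its superposed, aligned target CA."""
    distances = []
    for q, x in zip(query_ca, target_ca):
        moved = apply_transform(x, u, t)  # assumption: target is moved onto query
        distances.append(math.dist(q, moved))
    return distances


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Toy coordinates, not taken from a real structure.
    query = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(aligned_distances(query, target, identity, (1.0, 0.0, 0.0)))
```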
Hello team,
is the newest 200 million uniprot structure by alphafold available?
Thanks,
Jianshu
Cluster structures in a given database made using foldseek createdb.
When running easy-cluster on a very small set of 10 PDB structures, running the following command
foldseek easy-cluster ./target_db ./output_cluster ./tmp
results in this error message:
usage: foldseek createdb <i:PDB|mmCIF[.gz]> ... <i:PDB|mmCIF[.gz]> <o:sequenceDB> [options]
By Martin Steinegger <[email protected]>
options: common:
--threads INT Number of CPU-cores used (all by default) [4]
-v INT Verbosity level: 0: quiet, 1: +errors, 2: +warnings, 3: +info [3]
examples:
Convert PDB/mmCIF files to an db.
references:
- Steinegger M, Soding J: MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026-1028 (2017)
Unrecognized parameter "--dbtype". Did you mean "--threads" (Threads)?
Looks like perhaps foldseek easy-cluster is trying to run foldseek createdb with an outdated option.
I used the precompiled binary for osx from https://mmseqs.com/foldseek/
foldseek-osx-universal.tar.gz 09-Jul-2021 07:55 6998109
foldseek Version: 75bb2e3
Operating system version: macOS Catalina 10.15.7
Hi,
Thanks for your amazing program. I have noticed if the following structure is in my query database, after I search it against PDB, aln2tmscore freezes.
The problem happens if this protein is in my list of queries: https://alphafold.ebi.ac.uk/entry/Q381H2
My foldseek version: 2dd3b2f
I use a statically compiled binary (avx2).
Best,
I'm running a simple easy-search for a folder against itself (to get the pairwise tm-scores) using the following:
foldseek easy-search --alignment-type 1 --tmscore-threshold 0.0 /some/folder /some/folder aln.m8 tmpFolder
/some/folder has 3 pdb files in it (all simple, single chain pdb files for proteins of L=100).
Thus, I expected aln.m8 to have 3 * 3 (or 3*2/2) rows in it, representing the TM-scores for each possible pair. However, the aln.m8 file in my case has only 6 rows, so some pairs are missing. Setting the sensitivity (-s) flag to 10 increased the number of rows to 8, but some rows are still missing.
What's the reason not all the pairs appear in aln.m8? What configuration am I missing to enable that?
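To list exactly which pairs are absent, the expected pairs can be compared against the first two columns of aln.m8. This sketch assumes the query and target names in the file match the names passed in (for example the file names in the folder). The names in the demo are placeholders.

```python
from itertools import product


def observed_pairs(m8_lines):
    """Collect (query, target) pairs from the first two columns of each line."""
    pairs = set()
    for line in m8_lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_pairs(names, m8_lines):
    """Return the ordered pairs (including self-pairs) that never appear in the table."""
    expected = set(product(names, repeat=2))
    return sorted(expected - observed_pairs(m8_lines))


if __name__ == "__main__":
    names = ["a.pdb", "b.pdb", "c.pdb"]  # placeholder names
    table = ["a.pdb a.pdb 1.0", "a.pdb b.pdb 0.7", "b.pdb b.pdb 1.0", "c.pdb c.pdb 1.0"]
    for query, target in missing_pairs(names, table):
        print(query, target)
```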
Command
foldseek createdb pfamprocessed/ foldseekDB
directory pfamprocessed contains PDB files in format:
ATOM 7 CG LEU 1 41.236 47.777 43.697
ATOM 9 CD1 LEU 1 42.032 47.298 42.478
ATOM 13 CD2 LEU 1 39.833 48.166 43.219
ATOM 17 C LEU 1 40.159 45.638 46.822
ATOM 18 O LEU 1 40.806 45.323 47.828
ATOM 19 N SER 2 39.248 44.822 46.297
ATOM 21 CA SER 2 38.875 43.515 46.837
ATOM 23 CB SER 2 37.869 42.860 45.885
They come from the C-I-TASSER pre-computed database of models:
https://zhanglab.ccmb.med.umich.edu/C-I-TASSER/pfam/
ls -1 pfamprocessed/|head
PF00094.pdb
PF00242.pdb
PF00257.pdb
PF00260.pdb
PF00324.pdb
PF00336.pdb
PF00363.pdb
PF00379.pdb
Program output:
createdb pfamprocessed/ foldseekDB
MMseqs Version: GITDIR-NOTFOUND
Threads 128
Verbosity 3
Output file: foldseekDB
[=================================================================] 100.00% 8.27K 0s 209ms
Time for merging to foldseekDB_ss: 0h 0m 0s 2ms
Time for merging to foldseekDB_h: 0h 0m 0s 2ms
Time for merging to foldseekDB_ca: 0h 0m 0s 2ms
Time for merging to foldseekDB: 0h 0m 0s 2ms
Ignore 8266 out of 8266.
Too short: 0, incorrect 8266.
Time for processing: 0h 0m 0s 267ms
Hello,
Could you please give a couple of examples using the --db-output option? I thought it would create an output database containing the search results, but my output is only 2 files, outDB.dbtype and outDB.index, and the index file contains only one line. Did I misunderstand this option, or is there something wrong with it?
My idea was to get a searchable output database to run a new search with another query or other parameters. It would also be helpful to have search restriction options like --gilist or -seqidlist in BLAST.
Thanks,
Harut
When I run the command line proposed under "Rescore alignments using TMscore" (with my own input data):
foldseek easy-search example/ example/ aln tmp --format-output query,target,alntmscore,u,t
I am getting the error "Format code alntmscore does not exist.". Why?
I generated the taxdb for Swissprot v3, but using the taxid option causes the following error:
/remote/foldseek_db/swiss-prot/swissprot_mapping is empty. Rerun createtaxdb to recreate taxonomy mapping.
How can I generate the AlphaFold Swissprot v3 taxdb?
First of all, thanks for making the new UniProt structures available as indices so quickly! Having 70 GB to download instead of 23 TB is amazing :)
First test: I am having issues using easy-search with the Alphafold/UniProt-NO-CA database and the latest foldseek version (downloaded Aug 4 release, avx2 binaries). The search dies right after createdb (full log at bottom).
Potential explanation: could my afdb database be faulty? I could not get the foldseek download of UniProt-NO-CA (named afdb.tar.gz) to work, so I downloaded the file separately and untarred it. The untar worked fine.
I tested using random .cif.gz files from AFDB. When running, I get the following error (see full log below):
foldseek easy-search AF-Q5G6D8-F1-model_v3.cif.gz afdb --alignment-type 2 res.m8 tmp
...
Index table k-mer threshold: 78 at k-mer size 6
Index table: counting k-mers
Illegal instruction (core dumped) ] 0.00% 1 eta -
Error: Kmer matching step died
Error: Search died
foldseek easy-search AF-Q5G6D8-F1-model_v3.cif.gz afdb --alignment-type 2 res.m8 tmp
Statically compiled avx2 binaries from 4 Aug 2022.
MMseqs Version: 4002f69
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
Swissprot foldseek jobs works just fine.
Operating system and version:
Ubuntu 18.04
foldseek easy-search AF-Q5G6D8-F1-model_v3.cif.gz afdb --alignment-type 2 res.m8 tmp
easy-search AF-Q5G6D8-F1-model_v3.cif.gz afdb --alignment-type 2 res.m8 tmp
MMseqs Version: 4002f69c92a99b129a667b7399bb9d185a43a61b
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace false
TMscore threshold 0.5
TMalign fast 1
Preload mode 0
Threads 32
Verbosity 3
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 3
Alignment mode 0
E-value threshold 0.001
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 9.5
k-mer length 6
k-score seq:2147483647,prof:2147483647
Max results per query 1000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 0
Mask residues probability 0.99995
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Alignment type 2
Remove temporary files true
MPI runner
Force restart with latest tmp false
Chain name mode 0
Write lookup file 1
Tar Inclusion Regex .*
Tar Exclusion Regex ^$
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
createdb AF-Q5G6D8-F1-model_v3.cif.gz tmp/3246194427880033517/query --chain-name-mode 0 --write-lookup 1 --tar-include '.*' --tar-exclude '^$' --threads 32 -v 3
Output file: tmp/3246194427880033517/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 649ms
Time for merging to query_h: 0h 0m 0s 734ms
Time for merging to query_ca: 0h 0m 0s 874ms
Time for merging to query: 0h 0m 1s 5ms
Ignore 0 out of 1.
Too short: 0, incorrect 0.
Time for processing: 0h 0m 7s 165ms
Create directory tmp/3246194427880033517/search_tmp
search tmp/3246194427880033517/query afdb tmp/3246194427880033517/result tmp/3246194427880033517/search_tmp --alignment-mode 3 --comp-bias-corr 1 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 -s 9.5 -k 6 --mask 0 --mask-prob 0.99995 --alignment-type 2 --remove-tmp-files 1
prefilter tmp/3246194427880033517/query_ss afdb_ss tmp/3246194427880033517/search_tmp/1200766418368934916/pref --sub-mat 'aa:3di.out,nucl:3di.out' --seed-sub-mat 'aa:3di.out,nucl:3di.out' -s 9.5 -k 6 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 1000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-prob 0.99995 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 32 --compressed 0 -v 3
Query database size: 1 type: Aminoacid
Target split mode. Searching through 9 splits
Estimated memory consumption: 94G
Target database size: 214684311 type: Aminoacid
Process prefiltering step 1 of 9
Index table k-mer threshold: 78 at k-mer size 6
Index table: counting k-mers
Illegal instruction (core dumped) ] 0.00% 1 eta -
Error: Kmer matching step died
Error: Search died
Hi,
I'm interested in finding matches for a local substructure. To prevent aligning to other domains and partial overlaps, I tried removing everything from the query structure that I'm not interested in, introducing chain breaks. Essentially reducing the protein to its core only.
I observe that I stop getting any hits if I use a structure that was trimmed down too aggressively - also missing hits that have great overlap with the region to which I trimmed down, which I recover well when using the full structure.
Are there general guidelines for doing this? Should trimming structures be avoided altogether, or does a minimum number of residues need to be maintained to keep the alignment from failing?
Thanks!
Is the matrix found in foldseek/data/mat3di.out the one used to compute the Smith-Waterman structural alignment mentioned in the paper?
I'm using your 3Di structural sequences to align structures, but the common BLOSUM62 matrix produces erroneous results. I would therefore like to use the matrix you employ in foldseek and see whether different sequence alignment methods produce different structural alignments.
Thanks in advance, awesome tool and work as always!
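For experimenting with a custom substitution matrix, a plain Smith-Waterman local alignment score is easy to write down. The sketch below is generic and not Foldseek's implementation: it uses a linear gap penalty for brevity, and the matrix is a toy match/mismatch table over three letters rather than the real 3Di matrix. Loading the actual matrix file is left out.

```python
def smith_waterman_score(a, b, score, gap=-2):
    """Best local alignment score of a and b under `score[x][y]` with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + score[a[i - 1]][b[j - 1]]
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def toy_matrix(alphabet, match=3, mismatch=-1):
    """Build a toy substitution matrix; a real matrix would be loaded from a file instead."""
    return {x: {y: (match if x == y else mismatch) for y in alphabet} for x in alphabet}


if __name__ == "__main__":
    matrix = toy_matrix("DFQ")
    print(smith_waterman_score("DFQ", "DFQ", matrix))
    print(smith_waterman_score("DFQ", "DQ", matrix))
```

Swapping `matrix` for a different table (BLOSUM62, or a 3Di matrix parsed from file) changes only the `score` argument, which makes side-by-side comparisons straightforward.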
While downloading big databases, would it be possible to resume the download after an ERROR 500 or other connection error?
I have been trying for several days to complete the download of the full Alphafold/UniProt db. It seems that the server does not accept resume options, and the download always fails, whether with the foldseek program itself or externally with wget.
The last server error was just now at 91%, after downloading 554.82G!
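For context, resuming over HTTP normally relies on a Range request, which only helps if the server honours it. The sketch below shows the client side under that assumption. The URL in the demo is a placeholder, and the function names are made up for illustration.

```python
import os
import urllib.request


def resume_offset(path):
    """Number of bytes already on disk, or 0 if the file does not exist yet."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def build_range_request(url, offset):
    """Build a GET request asking the server to start at byte `offset`."""
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})


def resume_download(url, path, chunk_size=1 << 20):
    """Append the remaining bytes to `path` if the server supports Range, else restart."""
    request = build_range_request(url, resume_offset(path))
    with urllib.request.urlopen(request) as response:
        # 206 means the Range header was honoured; 200 means the server
        # is sending the whole file again from byte 0.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)


if __name__ == "__main__":
    # Placeholder URL; nothing is downloaded here.
    demo = build_range_request("https://example.org/db.tar.gz", 1024)
    print(demo.get_header("Range"))
```

If the server ignores the Range header, the sketch falls back to overwriting the file from the start, which matches the behaviour described above.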
Hi,
Could you help me understand the .m8 output format provided by the server? There are 20 columns and I am wondering what each one is. I could not find it documented anywhere; in the local version I would control this myself.
Thanks!
The default threshold for sensitivity should be 7.5 as specified in the documentation.
Easy-search by default runs with a -s value of 5.7. Is this intended? Or should this be switched to 7.5 as specified in the documentation?
Thank you!
While running foldseek easy-search I ran into the following error.
##installed foldseek
conda install -c conda-forge -c bioconda foldseek
##installed database
foldseek databases Alphafold/Proteome afdb tmp
##try to run easysearch
foldseek easy-search /home/z76r142/AlphaFold/models/SLA-04_01093-RD/ranked_1.pdb /home/z76r142/condaenv/ 1093.m8 tmpfolder
##easy search results in following error
MMseqs2 was not compiled with zlib support. Cannot read compressed input.
Error: target createdb died
foldseek Version: 2.8bd520
MMseqs2 Version: 13.45111
foldseek databases PDB pdb tmp
should set up the PDB database
Returns:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Downloaded pdb.tar.gz is empty. It looks like the target URL (http://wwwuser.gwdg.de/~compbiol/foldseek/) no longer has uploaded databases.
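A quick way to tell an empty or non-gzip download apart from a valid archive before tar is invoked is to check the file size and the two gzip magic bytes (0x1f 0x8b). This is a small sketch. The file names in the demo are placeholders.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"


def describe_download(path):
    """Classify a downloaded file as 'empty', 'gzip' or 'not gzip'."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as handle:
        return "gzip" if handle.read(2) == GZIP_MAGIC else "not gzip"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.tar.gz")
        with gzip.open(good, "wb") as handle:
            handle.write(b"payload")
        empty = os.path.join(tmp, "empty.tar.gz")
        open(empty, "wb").close()
        print(describe_download(good))
        print(describe_download(empty))
```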
I'm currently trying to align a database with 23391 structures against a database of CATH s35 PDB files in an all vs all fashion.
Segmentation fault at line 17 of structuresearch.sh
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders. - Done
Generated with createdb successfully, empty tmp folder.
Launched as programs/foldseek/bin/foldseek search species/foldseek_db/homo_sapiens cath_s35_scans/s35_database cath_s35_scans/results/homo_sapiens_s35.m8 tmp
The two databases show as complete, although the program fails at
tmp/11743319497277053650/structuresearch.sh: line 17: 56940 Segmentation fault $RUNNER "$MMSEQS" prefilter "${QUERY_PREFILTER}" "${TARGET_PREFILTER}${INDEXEXT}" "${TMP_PATH}/pref" ${PREFILTER_PAR}
Error: Kmer matching step died
Here is the full output
programs/foldseek/bin/foldseek search species/foldseek_db/homo_sapiens cath_s35_scans/s35_database cath_s35_scans/results/homo_sapiens_s35.m8 tmp
cath_s35_scans/results/homo_sapiens_s35.m8 exists and will be overwritten
Create directory tmp
search species/foldseek_db/homo_sapiens cath_s35_scans/s35_database cath_s35_scans/results/homo_sapiens_s35.m8 tmp
MMseqs Version: 75bb2e3a2718903f47008d7d8cc3be099e35d1e9
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace false
Include identical seq. id. false
TMscore threshold 0.5
Threads 48
Verbosity 3
Substitution matrix nucl:3di.out,aa:3di.out
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 0
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Compressed 0
Seed substitution matrix nucl:3di.out,aa:3di.out
Sensitivity 7.5
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 0
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Alignment type 0
prefilter species/foldseek_db/homo_sapiens_ss cath_s35_scans/s35_database_ss tmp/11743319497277053650/pref --sub-mat nucl:3di.out,aa:3di.out --seed-sub-mat nucl:3di.out,aa:3di.out -s 7.5 -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 0 --diag-score 1 --exact-kmer-matching 0 --mask 0 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 48 --compressed 0 -v 3
Query database size: 23391 type: Aminoacid
Estimated memory consumption: 359M
Target database size: 32388 type: Aminoacid
Index table k-mer threshold: 96 at k-mer size 6
Index table: counting k-mers
[=================================================================] 100.00% 32.39K 0s 225ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 100.00% 32.39K 0s 261ms
Index statistics
Entries: 4390456
DB size: 153 MB
Avg k-mer size: 0.261692
Top 10 k-mers
CCCCCC 1391
PCCCKS 1141
PCCCGI 1027
CPPPCP 913
PPPPPP 833
IPPPPP 656
MCCCKS 612
CCCCCS 557
CKKIPP 504
PCCCHS 480
Time for index table init: 0h 0m 0s 965ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 96
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 23391
Target db start 1 to 32388
tmp/11743319497277053650/structuresearch.sh: line 17: 56940 Segmentation fault $RUNNER "$MMSEQS" prefilter "${QUERY_PREFILTER}" "${TARGET_PREFILTER}${INDEXEXT}" "${TMP_PATH}/pref" ${PREFILTER_PAR}
Error: Kmer matching step died
If I use easy-search instead of search the first database shows a Query database size: 0
Include as many relevant details about the environment you experienced the bug in.
Alternatively, can Foldseek run on all the structures contained within a folder?
Thank you for all your great tools and work!
Dear foldseek developers
I’m glad to hear that the newly updated AlphaFold 210M database is now available for foldseek, but there are three types of afdb (afdb.tar.gz, afdb50.tar.gz, afdb_ca.tar.gz).
I learned that afdb50 is the AlphaFold 210M db clustered by mmseqs, but what is the difference between the other dbs? Please help me understand this. Thanks!
(Not a bug, just asking for a new feature :) )
Hi, I started using the tool, which looks very useful and efficient! thanks for all the hard work
It will be very useful for me if I could have more than one output format in one run;
To be more specific: the pretty HTML format + the output table (mode 0/4) so I could choose the columns.
If it isn't too much of a bother, could you add this option?
thanks :)
Itai
Hitting the "toggle full query" button hides the portion of the query structure that does not align to the target.
The button has no effect. (The button for the toggling full target does work.)
Expand 1nqg_A (the top pdb100 hit) in
https://search.foldseek.com/result/CkjKZC8aEOSkQc-KtPlzKZQco_oRGaK2mQb5Bg/0#result-1-0
Using Chrome 104.0.5112.101 (64-bit) on windows
A default run of foldseek search followed by convertalis (or, alternatively, easy-search) outputs an m8 file with columns as follows:
query target identity alignment_length mismatches gap_openings query_start query_end target_start target_end evalue bitscore
i.e.
3pvlA_A 3pvlA01_A 0.951 207 10 0 1 207 1 207 1.675E-142 487
Running the same two structures with the --alignment-type 1 flag turned on to use TMalign returns values that are inconsistent in position and format.
i.e.
3pvlA_A 3pvlA01_A 0.048 597 197 0 1 597 1 208 1.000E+00 100
It seems that the third column is no longer fident but something different (RMSD?), while the second-to-last column is no longer an e-value but the TMalign score (1 in this case) reported in scientific notation.
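To make the comparison concrete, both rows can be read with the 12 default column names listed above. This is only a sketch: `parse_m8_line` is a hypothetical helper, fields are assumed to be whitespace-separated, and the names describe the default layout, so for the --alignment-type 1 row they are labels of position rather than confirmed meanings.

```python
M8_COLUMNS = [
    "query", "target", "identity", "alignment_length", "mismatches",
    "gap_openings", "query_start", "query_end", "target_start",
    "target_end", "evalue", "bitscore",
]
FLOAT_COLUMNS = {"identity", "evalue"}
TEXT_COLUMNS = {"query", "target"}


def parse_m8_line(line):
    """Map one m8 row onto the default column names, converting numbers."""
    fields = line.split()
    if len(fields) != len(M8_COLUMNS):
        raise ValueError(f"expected {len(M8_COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, value in zip(M8_COLUMNS, fields):
        if name in TEXT_COLUMNS:
            record[name] = value
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = int(value)
    return record


default_row = parse_m8_line(
    "3pvlA_A 3pvlA01_A 0.951 207 10 0 1 207 1 207 1.675E-142 487")
tmalign_row = parse_m8_line(
    "3pvlA_A 3pvlA01_A 0.048 597 197 0 1 597 1 208 1.000E+00 100")

# Print only the columns whose values differ between the two runs.
for name in M8_COLUMNS:
    if default_row[name] != tmalign_row[name]:
        print(name, default_row[name], tmalign_row[name])
```

Read this way, the TMalign-mode row reports an alignment length of 597 (the full query length) and a value of 1.0 in the e-value position, which are the two discrepancies described above.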
Regarding the third column, I can't seem to find an equivalent result on the standalone TMalign output
TMalign
-bash-4.2$ ./TMalign ../data/cath.S20.v4_3_0.chainpdb/3pvlA ../data/cath.S20.v4_3_0.domainpdb/3pvlA01
*********************************************************************
* TM-align (Version 20210224): protein structure alignment *
* References: Y Zhang, J Skolnick. Nucl Acids Res 33, 2302-9 (2005) *
* Please email comments and suggestions to [email protected] *
*********************************************************************
Name of Chain_1: ../data/cath.S20.v4_3_0.chainpdb/3pvlA (to be superimposed onto Chain_2)
Name of Chain_2: ../data/cath.S20.v4_3_0.domainpdb/3pvlA01
Length of Chain_1: 597 residues
Length of Chain_2: 208 residues
Aligned length= 208, RMSD= 0.00, Seq_ID=n_identical/n_aligned= 1.000
TM-score= 0.34841 (if normalized by length of Chain_1, i.e., LN=597, d0=8.55)
TM-score= 1.00000 (if normalized by length of Chain_2, i.e., LN=208, d0=5.37)
(You should use TM-score normalized by length of the reference structure)
(":" denotes residue pairs of d < 5.0 Angstrom, "." denotes other aligned residues)
EEDLSEYKFAKFAATYFQGTTTHSYTRRPLKQPLLYHDDEGDQLAALAVWITILRFMGDLPEPKYHKIPVMTKIYETLGKKTYKRELQALQQGNSMLEDRPTSNLEKLHFIIGNGILRPALRDEIYCQISKQLTHNPSKSSYARGWILVSLCVGCFAPSEKFVKYLRNFIHGGPPGYAPYCEERLRRTFVNGTRTQPPSWLELQATKSKKPIMLPVTFMDGTTKTLLTDSATTARELCNALADKISLKDRFGFSLYIALFDKVSSLGSGSDHVMDAISQCEQYAKEQGAQERNAPWRLFFRKEVFTPWHNPSEDNVATNLIYQQVVRGVKFGEYRCEKEDDLAELASQQYFVDYGSEMILERLLSLVPTYIPDREITPLKNLEKWAQLAIAAHKKGIYAQRRTDSQKVKEDVVNYARFKWPLLFSRFYEAYKFSGPPLPKSDVIVAVNWTGVYFVDEQEQVLLELSFPEIMAVSSSRGTKMMAPSFTLATIKGDEYTFTSSNAEDIRDLVVTFLEGLRKRSKYVVALQDNPNSGFLSFAKGDLIILDHDTGEQVMNSGWANGINERTKQRGDFPTDCVYVMPTVTLPPREIVALVTM
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
EEDLSEYKFAKFAATYFQGTTTHSYTRRPLKQPLLYHDDEGDQLAALAVWITILRFMGDLPEPKYHKIPVMTKIYETLGKKTYKRELQALQQGNSMLEDRPTSNLEKLHFIIGNGILRPALRDEIYCQISKQLTHNPSKSSYARGWILVSLCVGCFAPSEKFVKYLRNFIHGGPPGYAPYCEERLRRTFVNGTRTQPPSWLELQATKS-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total CPU time is 0.57 seconds
-bash-4.2$
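The two TM-scores and d0 values in this output can be reproduced from the standard TM-score definition, TM = (1/L) * sum(1 / (1 + (d_i/d0)^2)) with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch below only evaluates the score for a given set of aligned-pair distances; it does not perform the TM-align superposition search, and the function names are illustrative.

```python
def d0(length):
    """Length-dependent distance scale used to normalize the TM-score."""
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, norm_length):
    """TM-score of aligned-pair distances, normalized by `norm_length`."""
    scale = d0(norm_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / norm_length


# 208 aligned residue pairs with RMSD 0.00, as in the output above.
perfect = [0.0] * 208

print(round(d0(597), 2), round(tm_score(perfect, 597), 5))  # normalized by Chain_1
print(round(d0(208), 2), round(tm_score(perfect, 208), 5))  # normalized by Chain_2
```

With every distance at zero, the score reduces to aligned length divided by the normalization length, which is why the same alignment scores 208/597 against the full chain and 1.0 against the domain.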
Foldseek-TMalign
./foldseek/bin/foldseek easy-search data/cath.S20.v4_3_0.chainpdb/3pvlA data/cath.S20.v4_3_0.domainpdb/3pvlA01 3pvlA.m8 tmp --alignment-type 1 -c 0.6 --cov-mode 1 -s 9
Foldseek-3di
./foldseek/bin/foldseek easy-search data/cath.S20.v4_3_0.chainpdb/3pvlA data/cath.S20.v4_3_0.domainpdb/3pvlA01 3pvlA.m8 tmp -c 0.6 --cov-mode 1 -s 9
TMalign
TMalign ../data/cath.S20.v4_3_0.chainpdb/3pvlA ../data/cath.S20.v4_3_0.domainpdb/3pvlA01
I'm attaching both structures for reference.
Structures.zip
What are the TMalign parameters used by Foldseek so I can crosscheck the results?
Thanks again for all your work on this, it's an amazing piece of software.
Dear Foldseek developers,
I wondered if making a foldseek database from a tar file is possible. The total number of structures in the latest alphafold release is more than 200M. My computational server has a limitation that doesn't allow me to have more than 1M files. It would be great if foldseek could make a database from the concatenated files (like tarred files).
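Until such an option exists, one workaround is to stream structures out of the tar archive in bounded batches, so that the number of files on disk never exceeds the limit. The sketch below uses only Python's standard `tarfile` module; `structure_batches` and the suffix list are assumptions for illustration, and how each batch is then handed to foldseek (for example, written to a temporary folder that is deleted afterwards) is left open.

```python
import tarfile

# Assumed file suffixes for structure files inside the archive.
STRUCTURE_SUFFIXES = (".pdb", ".cif", ".pdb.gz", ".cif.gz")


def structure_batches(tar_path, batch_size):
    """Yield lists of (name, bytes) pairs, at most `batch_size` per list."""
    batch = []
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(STRUCTURE_SUFFIXES):
                continue
            batch.append((member.name, archive.extractfile(member).read()))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch
```

Because the archive is read sequentially, only one batch is held in memory at a time.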
Should compile on linux without errors
Throws error about missing header file encoder_weights_3di.kerasify.h during make
normal build procedure:
clone git repo, untar, create build dir, cmake, make
Please make sure to also post the complete output of Spacepharer. You can use gist.github.com for large output.
[ 87%] Building CXX object lib/3di/CMakeFiles/3di.dir/structureto3di.cpp.o
/home/user/packages/foldseek/foldseek/lib/3di/structureto3di.cpp:7:10: fatal error: encoder_weights_3di.kerasify.h: No such file or directory
7 | #include "encoder_weights_3di.kerasify.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [lib/3di/CMakeFiles/3di.dir/build.make:76: lib/3di/CMakeFiles/3di.dir/structureto3di.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:880: lib/3di/CMakeFiles/3di.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
self-compiled from git source
commit 90b2545
RebornOS (ArchLinux)
Hi,
Thanks for your great program.
I am using foldseek to search some cif files against a database of pdb files. I see foldseek reports the alignment "one amino acid shorter than what it is supposed to report". For instance, I chopped the structure of A0A017S8D7 from position 17 to 229, and searched the whole structure against the chopped part. I saw the alignment is one amino acid shorter than what was expected.
I expected qend to be 229 and tend to be 213.
Instead, foldseek reports qend = 228 and tend = 212.
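The expected values follow from simple coordinate arithmetic, assuming 1-based inclusive positions (the convention the report implies). The helper names below are illustrative only, and the sketch does not attempt to explain where the off-by-one arises.

```python
def fragment_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def query_to_target(query_pos, fragment_start):
    """Map a full-length query position onto the chopped target."""
    return query_pos - fragment_start + 1


print(fragment_length(17, 229))   # residues in the chopped structure
print(query_to_target(229, 17))   # expected tend for qend = 229
print(query_to_target(228, 17))   # tend matching the reported qend = 228
```

The reported pair (228, 212) is self-consistent under this mapping, so the alignment appears to stop one residue early on both sequences rather than being shifted on one of them.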
A0A017S8D7_17_229.pdb.gz
I assume you have downloaded the attached pdb.gz file and it is in your current working directory.
mkdir query
mkdir target
mv A0A017S8D7_17_229.pdb.gz target
wget https://alphafold.ebi.ac.uk/files/AF-A0A017S8D7-F1-model_v3.cif
mv AF-A0A017S8D7-F1-model_v3.cif query
foldseek createdb target/ Tdb --threads 1
foldseek createdb query/ Qdb --threads 1
foldseek search Qdb Tdb aln tmpFolder -a --threads 1
foldseek convertalis Qdb Tdb aln aln_tm \
--format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,qlen,tlen,evalue,bits,alntmscore" \
--threads 1
#Please take a look at the aln_tm file
Hello! I just downloaded the PDB databases. I received those files:
pdb
pdb_ca
pdb_ca.dbtype
pdb_ca.index
pdb_h
pdb_h.dbtype
pdb_h.index
pdb_mapping
pdb_ss
pdb_ss.dbtype
pdb_ss.index
pdb_taxonomy
pdb.dbtype
pdb.index
pdb.lookup
pdb.md5sum
I was wondering which one contains the 3Di sequences.
Thank you
Thanks a lot for this wonderful tool.
Unfortunately, I couldn't find a brief description of the output format. The output I downloaded lacks column headers, so I don't know which data belongs to which column. I tried to guess some of them (seqident, ...) but without success, and looking at the mmseqs2 site didn't help. It would be very nice if the README contained a brief explanation of the output format with an example.
Thanks in advance,
Reza
When searching any protein on the web server (https://search.foldseek.com/) I continue to get failures when using the TM-align mode. I tried this with a number of proteins and loaded accessions.
Aligning human AlphaFold proteome to the Mycobacterium tuberculosis proteome
Index table: counting k-mers
/data/local/vfranke/VFranke_Structures/TMP/16012876887949026468/search_tmp/8683525511783924040/structuresearch.sh: line 17: 49775 Illegal instruction $RUNNER "$MMSEQS" prefilter "${QUERY_PREFILTER}" "${TARGET_PREFILTER}${INDEXEXT}" "${TMP_PATH}/pref" ${PREFILTER_PAR}
Error: Kmer matching step died
Error: Search died
Data is downloaded
# Use $HOME instead of a quoted ~, which the shell would not expand
foldseek="$HOME/bin/Software/Proteins/foldseek/bin/foldseek"
inpath="$HOME/Base/AlphaFold"
outpath='/data/local/VFranke_Structures'
tmpdir="$outpath/TMP"; mkdir "$tmpdir" 2> /dev/null
db_mt="$outpath/foldseek_myctu_db"
# Collect the cif files of each proteome into an array
myctu=( $(find "$inpath/MYCTU" | grep cif) )
outdir="$outpath/HUMAN-MYCTU-align"
hgfiles=( $(find "$inpath/HUMAN" | grep cif) )
MMseqs Version: 75bb2e3
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace false
Include identical seq. id. false
TMscore threshold 0.5
Threads 32
Verbosity 3
Substitution matrix nucl:3di.out,aa:3di.out
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Compressed 0
Seed substitution matrix nucl:3di.out,aa:3di.out
Sensitivity 5.7
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Alignment type 0
search /data/local/vfranke/VFranke_Structures/TMP/16012876887949026468/query /data/local/vfranke/VFranke_Structures/foldseek_myctu_db /data/local/vfranke/VFranke_Structures/TMP/16012876887949026468/result /data/local/vfranke/VFranke_Structures/TMP/16012876887949026468/search_tmp --threads 32
/data/local/vfranke/VFranke_Structures/TMP/16012876887949026468/search_tmp/8683525511783924040/structuresearch.sh: line 17: 51199 Illegal instruction $RUNNER "$MMSEQS" prefilter "${QUERY_PREFILTER}" "${TARGET_PREFILTER}${INDEXEXT}" "${TMP_PATH}/pref" ${PREFILTER_PAR}
Error: Kmer matching step died
Error: Search died
Trying to align the human proteome to the Mycobacterium tuberculosis proteome
I downloaded the pre-compiled binary. Test run executed perfectly.
The reason for this is a difference in parameters:
--max-seqs 300 -e 0.001
--max-seqs 1000 -e 0.1