TileDB
The installation guide for TileDB can be found at this GitHub Wiki.
This alternate implementation is based on the original fork from Intel-HLS and is optimized specifically for GenomicsDB, among other things.
Check out TileDB tutorials -
License: MIT License
GenomicsDB adds several non-TileDB elements to TileDB storage, which becomes a problem when GenomicsDB tries to delete an existing workspace:
10:05:08.612 INFO GenomicsDBImport - Done initializing engine
[TileDB::StorageManager] Error: Cannot delete non TileDB related element '/home/vagrant/gatk/gendb/22$4514841$4617450/genomicsdb_meta_dir'.
[TileDB::FileSystem] Error: posix: Cannot delete file; Directory not empty
10:05:08.841 INFO GenomicsDBImport - Shutting down engine
[October 17, 2018 10:05:08 AM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.
Add an option to force-delete TileDB workspaces.
Currently, only LZ4_compress_default() is supported and compression_level is ignored. We need to map TileDB compression levels to LZ4_compress_fast(); see https://github.com/lz4/lz4/blob/dev/lib/lz4.h.
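One possible mapping is sketched below. The result would be passed as the acceleration argument of LZ4_compress_fast(src, dst, srcSize, dstCapacity, acceleration), where acceleration 1 behaves like LZ4_compress_default() and larger values compress faster at a lower ratio. The level range (1 to 9, higher meaning "compress harder"), the linear mapping, and the helper name lz4_acceleration_for_level are all assumptions, not existing TileDB behavior.

```cpp
// Hypothetical helper: maps a TileDB-style compression level to the
// `acceleration` argument of LZ4_compress_fast().
//
// Assumptions (not taken from TileDB):
//   * valid levels are 1..9, higher meaning "compress harder";
//   * any other value (e.g. a -1 "default" sentinel) keeps today's behavior.
//
// LZ4 semantics: acceleration 1 == LZ4_compress_default(); each step above 1
// compresses faster with a lower ratio.
int lz4_acceleration_for_level(int compression_level) {
  constexpr int kMinLevel = 1;
  constexpr int kMaxLevel = 9;
  if (compression_level < kMinLevel || compression_level > kMaxLevel) {
    return 1;  // fall back to the current default behavior
  }
  // Level 9 -> acceleration 1 (best ratio), level 1 -> acceleration 9 (fastest).
  return kMaxLevel + 1 - compression_level;
}
```

Falling back to acceleration 1 for out-of-range levels keeps existing arrays compressing exactly as they do today.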
Right now, from GenomicsDB, we have to deconstruct the parent URI into its cloud prefix and query string, and then reconstruct it after TileDBUtils::get_dirs() and TileDBUtils::get_files() return. However, for hdfs:// these functions return the entire URI, which makes them inconsistent and cumbersome to use.
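With the current behavior, the deconstruct/reconstruct step on the GenomicsDB side amounts to something like the sketch below: split the parent URI into its scheme://authority prefix, path, and query string, then re-attach the prefix and query to each bare path that comes back. SplitUri, split_uri, and rebuild_uri are hypothetical names, and the sketch assumes the listed entries are bare paths (which, as noted above, does not hold for hdfs://).

```cpp
#include <string>

// Hypothetical decomposition of a cloud URI such as
//   az://container@account.blob.core.windows.net/vcfs?param=value
// into prefix ("az://container@account.blob.core.windows.net"),
// path ("/vcfs") and query ("?param=value").
struct SplitUri {
  std::string prefix;
  std::string path;
  std::string query;
};

SplitUri split_uri(const std::string& uri) {
  SplitUri out;
  std::string rest = uri;
  // Peel off the query string first, so a '/' inside it is not mistaken for a path.
  auto q = rest.find('?');
  if (q != std::string::npos) {
    out.query = rest.substr(q);
    rest.erase(q);
  }
  auto scheme_end = rest.find("://");
  if (scheme_end == std::string::npos) {
    out.path = rest;  // plain local path: no prefix
    return out;
  }
  auto path_start = rest.find('/', scheme_end + 3);
  if (path_start == std::string::npos) {
    out.prefix = rest;  // e.g. "az://container@account" with no path
  } else {
    out.prefix = rest.substr(0, path_start);
    out.path = rest.substr(path_start);
  }
  return out;
}

// Re-attaches the prefix and query string to a bare path, e.g. one returned by a
// directory listing.
std::string rebuild_uri(const SplitUri& parts, const std::string& path) {
  return parts.prefix + path + parts.query;
}
```

If get_dirs() and get_files() returned full URIs for every backend, as they already do for hdfs://, this round trip would no longer be needed.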
Support Azure managed identities. @mlathara mentioned:
I think this is a good overview. In addition to the AZ CLI call I mention above, we could also make a REST call.
The use case I am thinking of here is where a user may be using a node or cluster they don't directly have control over, and might be relying on managed identity to provide credential info.
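For reference, the REST variant would query the Azure Instance Metadata Service (IMDS) token endpoint from inside the VM or node, which only works where a managed identity is attached. The sketch below only builds the request (URL plus the mandatory Metadata: true header) rather than sending it, so it needs no HTTP library. https://storage.azure.com/ is the resource identifier for Azure Storage; the helper names are hypothetical. The JSON response carries an access_token field that could then be used as a bearer token for Blob requests.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encodes everything except RFC 3986 unreserved characters.
std::string url_encode(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      char buf[4];
      std::snprintf(buf, sizeof(buf), "%%%02X", c);
      out += buf;
    }
  }
  return out;
}

struct ImdsTokenRequest {
  std::string url;     // GET target
  std::string header;  // IMDS rejects requests without this header
};

// Hypothetical helper: builds the IMDS token request for Azure Storage.
// Pass a non-empty client_id to select a user-assigned managed identity.
ImdsTokenRequest make_imds_storage_token_request(const std::string& client_id = "") {
  ImdsTokenRequest req;
  req.url = "http://169.254.169.254/metadata/identity/oauth2/token"
            "?api-version=2018-02-01&resource=" +
            url_encode("https://storage.azure.com/");
  if (!client_id.empty()) {
    req.url += "&client_id=" + url_encode(client_id);
  }
  req.header = "Metadata: true";
  return req;
}
```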
I'm not sure I am constructing the URI correctly, but the following does not work:
% ./vcf2genomicsdb_init -w ws -o -S azb://<container>/vcfs
[E::hts_open_format] Failed to open file "azb://<container>/vcfs/t0.vcf.gz" : Protocol not supported
16:54:14.024 ERROR vcf2genomicsdb_init - pid=10937 tid=6368523 Could not open sample azb://<container>/vcfs/t0.vcf.gz with hts_open Protocol not supported
% ./vcf2genomicsdb_init -w ws -o -S azb://nalini/vcfs?endpoint=<account>.blob.core.windows.net
[E::hts_open_format] Failed to open file "azb://nalini/vcfs/t0.vcf.gz?endpoint=<account>.blob.core.windows.net" : Protocol not supported
16:55:41.334 ERROR vcf2genomicsdb_init - pid=10943 tid=6369128 Could not open sample azb://<container>/vcfs/t0.vcf.gz?endpoint=oda.blob.core.windows.net with hts_open Protocol not supported
whereas this works!
% ./vcf2genomicsdb_init -w ws -o -S az://<container>@<account>.blob.core.windows.net/vcfs
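Until azb:// is accepted on this path, one workaround is to rewrite the failing form into the az:// form that works. The sketch below assumes the endpoint query parameter holds the full blob host and is the only parameter; azb_to_az is a hypothetical helper, not part of TileDB or GenomicsDB.

```cpp
#include <optional>
#include <string>

// Hypothetical rewrite of
//   azb://<container>/<path>?endpoint=<account>.blob.core.windows.net
// into the form that worked above:
//   az://<container>@<account>.blob.core.windows.net/<path>
// Assumes `endpoint` is the only query parameter; returns std::nullopt when the
// input does not match that shape.
std::optional<std::string> azb_to_az(const std::string& uri) {
  const std::string scheme = "azb://";
  const std::string key = "?endpoint=";
  if (uri.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
  auto q = uri.find(key);
  if (q == std::string::npos) return std::nullopt;
  std::string endpoint = uri.substr(q + key.size());
  // "<container>/<path>" between the scheme and the query string.
  std::string rest = uri.substr(scheme.size(), q - scheme.size());
  auto slash = rest.find('/');
  std::string container = rest.substr(0, slash);
  std::string path = (slash == std::string::npos) ? "" : rest.substr(slash);
  if (container.empty() || endpoint.empty()) return std::nullopt;
  return "az://" + container + "@" + endpoint + path;
}
```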
Hello,
Using the code currently at the tip of master, I get
error: cast from ‘const char*’ to ‘mup::int_type’ {aka ‘int’} loses precision
at the location shown in the enclosed patch, which proposes a fix.
This error was spotted in Debian unstable.
Cheers,
Pierre
Hello,
I noticed a few small spelling errors in the software; you can find the corrections enclosed.
Cheers,
Pierre
spelling.txt
Two issues here:
1. tiledb_array_overflow uses the attribute id based on what was passed to tiledb_array_init, but it actually requires the id to correspond to the one in the schema. There is no direct API to map the former id to the schema id, which makes tiledb_array_overflow rather cumbersome to use.
2. tiledb_array_overflow does not return correct values when tiledb_array_read is called repeatedly to refresh exhausted buffers. We need a set of test cases to demonstrate and fix this issue.

Discovered while performance testing GenomicsDB scenarios: TILEDB_IO_READ does not function correctly with no_compression scenarios. See #107
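For the first tiledb_array_overflow issue above, a caller currently has to translate an attribute's position in the list passed to tiledb_array_init into its position in the array schema, for example by matching names as sketched below. The helper and the plain name vectors are hypothetical stand-ins; a real implementation would take the schema's attribute names from the TileDB array schema rather than from a hard-coded list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the attribute names from the array schema and the
// (possibly reordered, possibly partial) list passed to tiledb_array_init,
// returns the schema id for the attribute at position `init_id`, or -1 if the
// position is out of range or the name is not in the schema.
int schema_attribute_id(const std::vector<std::string>& schema_attributes,
                        const std::vector<std::string>& init_attributes,
                        int init_id) {
  if (init_id < 0 || init_id >= static_cast<int>(init_attributes.size())) return -1;
  const std::string& name = init_attributes[init_id];
  for (std::size_t i = 0; i < schema_attributes.size(); ++i) {
    if (schema_attributes[i] == name) return static_cast<int>(i);
  }
  return -1;
}
```

The returned schema id is the kind of id the issue says tiledb_array_overflow actually expects.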
Hello,
The code embeds a copy of Catch2, but it is several versions behind upstream, which has since fixed some issues, e.g. catchorg/Catch2#2421, which showed up for me when building GenomicsDB against the tip of master of your TileDB.
In Debian unstable, linking against the Debian-packaged Catch2 instead has solved this problem.
Cheers,
Pierre
Hello,
With the code at the tip of master, I need to make the enclosed correction so that the TILEDB_IO_* definitions are available in core/include/storage_manager/storage_manager_config.h
Cheers,
Pierre