datma-health / tiledb

TileDB

License: MIT License

CMake 2.43% C 1.55% C++ 96.02%

storage-manager arrays posix hdfs emrfs gcs cloud mpi cloud-storage azure-storage

tiledb's Introduction

TileDB

The installation guide for TileDB can be found on the project's GitHub Wiki.

This alternate implementation is based on the original fork from Intel-HLS and is specifically optimized for GenomicsDB among other things.

Check out the TileDB tutorials.

tiledb's People

Forkers

kgururaj

tiledb's Issues

Allow for a force delete of TileDB elements

GenomicsDB adds a number of non-TileDB elements to TileDB storage, so we see the following when GenomicsDB tries to delete an existing workspace:

10:05:08.612 INFO  GenomicsDBImport - Done initializing engine
[TileDB::StorageManager] Error: Cannot delete non TileDB related element '/home/vagrant/gatk/gendb/22$4514841$4617450/genomicsdb_meta_dir'.
[TileDB::FileSystem] Error: posix: Cannot delete file; Directory not empty 
10:05:08.841 INFO  GenomicsDBImport - Shutting down engine
[October 17, 2018 10:05:08 AM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.

Add an option to force-delete TileDB workspaces, including any non-TileDB elements they contain.

Allow specification for multiple compression levels for LZ4.

Currently, only LZ4_compress_default() is supported and compression_level is ignored. We need to map TileDB compression levels to LZ4_compress_fast(); see https://github.com/lz4/lz4/blob/dev/lib/lz4.h.
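One possible mapping is sketched below. It assumes TileDB levels follow a zlib-like 1..9 scale where higher means better compression, which is an assumption and not something the code base guarantees. LZ4_compress_fast() takes an acceleration factor where 1 is the default and larger values trade ratio for speed, so the scale is inverted. The function name lz4_acceleration_for_level is hypothetical.

```cpp
// Hypothetical helper: maps an assumed 1..9 compression level
// (9 = best ratio) to an LZ4 acceleration factor (1 = best ratio).
int lz4_acceleration_for_level(int level) {
  const int kMinLevel = 1;
  const int kMaxLevel = 9;
  if (level < kMinLevel) return 1;  // unset or default level: LZ4 default
  if (level > kMaxLevel) level = kMaxLevel;
  return kMaxLevel - level + 1;
}

// The result would be passed as the last argument of
//   LZ4_compress_fast(src, dst, src_size, dst_capacity, acceleration);
```

An unset or non-positive level falls back to acceleration 1, which matches the behaviour of LZ4_compress_default().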

TileDBUtils::get_dirs and TileDBUtils::get_files should return entire URIs

Right now, GenomicsDB has to deconstruct the parent URI to extract the cloud prefix and query string, and then reconstruct the full URIs after TileDBUtils::get_dirs() and TileDBUtils::get_files() return. For hdfs://, however, the entire URI is returned, which makes these functions inconsistent and cumbersome to use.
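The deconstruct-and-reconstruct step that callers currently carry could look roughly like the sketch below. It assumes the returned entries are paths relative to the container root with no scheme, authority or query string; split_uri and rebuild_uri are hypothetical names, not TileDBUtils functions.

```cpp
#include <string>

struct UriParts {
  std::string prefix;  // scheme and authority, e.g. "az://container@account"
  std::string path;    // path with its leading '/', may be empty
  std::string query;   // query string with its leading '?', may be empty
};

// Splits a URI into prefix, path and query without validating it.
UriParts split_uri(const std::string& uri) {
  UriParts parts;
  std::string rest = uri;
  const auto q = rest.find('?');
  if (q != std::string::npos) {
    parts.query = rest.substr(q);
    rest.erase(q);
  }
  const auto scheme_end = rest.find("://");
  if (scheme_end == std::string::npos) {  // plain local path
    parts.path = rest;
    return parts;
  }
  const auto path_start = rest.find('/', scheme_end + 3);
  if (path_start == std::string::npos) {
    parts.prefix = rest;
  } else {
    parts.prefix = rest.substr(0, path_start);
    parts.path = rest.substr(path_start);
  }
  return parts;
}

// Re-attaches the parent's prefix and query string to a returned entry path.
std::string rebuild_uri(const std::string& parent_uri,
                        const std::string& entry_path) {
  const UriParts parent = split_uri(parent_uri);
  std::string path = entry_path;
  if (!parent.prefix.empty() && (path.empty() || path.front() != '/')) {
    path.insert(0, "/");
  }
  return parent.prefix + path + parent.query;
}
```

If get_dirs() and get_files() returned full URIs for every backend, this helper would not be needed on the GenomicsDB side.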

Allow for multiple Azure credential scenarios

Support Azure managed identities. @mlathara mentioned:
I think this is a good overview. In addition to the AZ CLI call I mention above, we could also make a REST call.

The use case I am thinking of here is where a user may be using a node or cluster they don't directly have control over, and might be relying on managed identity to provide credential info.

azb:// URI does not work well with vcf2genomicsdb_init in GenomicsDB

I am not sure I am constructing the URI correctly, but the following does not work:

% ./vcf2genomicsdb_init -w ws -o -S azb://<container>/vcfs
[E::hts_open_format] Failed to open file "azb://<container>/vcfs/t0.vcf.gz" : Protocol not supported
16:54:14.024 ERROR vcf2genomicsdb_init - pid=10937 tid=6368523 Could not open sample azb://<container>/vcfs/t0.vcf.gz with hts_open Protocol not supported
% ./vcf2genomicsdb_init -w ws -o -S azb://nalini/vcfs?endpoint=<account>.blob.core.windows.net
[E::hts_open_format] Failed to open file "azb://nalini/vcfs/t0.vcf.gz?endpoint=<account>.blob.core.windows.net" : Protocol not supported
16:55:41.334 ERROR vcf2genomicsdb_init - pid=10943 tid=6369128 Could not open sample azb://<container>/vcfs/t0.vcf.gz?endpoint=oda.blob.core.windows.net with hts_open Protocol not supported

whereas this works!

% ./vcf2genomicsdb_init -w ws -o -S az://<container>@<account>.blob.core.windows.net/vcfs
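As a stopgap, a caller could rewrite the azb:// form into the az:// form that worked above. The sketch below assumes the two forms address the same container and that the account host is carried in an endpoint query parameter placed first in the query string, as in the failing commands. The name azb_to_az is hypothetical.

```cpp
#include <string>

// Hypothetical helper: rewrites "azb://<container>/<path>?endpoint=<host>"
// as "az://<container>@<host>/<path>". Any other input is returned unchanged.
std::string azb_to_az(const std::string& uri) {
  const std::string scheme = "azb://";
  const std::string key = "?endpoint=";
  if (uri.compare(0, scheme.size(), scheme) != 0) return uri;
  const auto q = uri.find(key);
  if (q == std::string::npos) return uri;

  std::string endpoint = uri.substr(q + key.size());
  const auto amp = endpoint.find('&');  // drop any further query parameters
  if (amp != std::string::npos) endpoint.erase(amp);

  const std::string body = uri.substr(scheme.size(), q - scheme.size());
  const auto slash = body.find('/');
  const std::string container = body.substr(0, slash);
  const std::string path =
      (slash == std::string::npos) ? "" : body.substr(slash);
  return "az://" + container + "@" + endpoint + path;
}
```

This only works around the hts_open failure on the caller side; it does not make azb:// a supported protocol.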

int_type initialization with 0 instead of ""

Hello,

Using the code currently at the tip of master, I get the following error:
error: cast from ‘const char*’ to ‘mup::int_type’ {aka ‘int’} loses precision
The location is shown in the enclosed patch, which proposes a fix.

This error was spotted in Debian unstable.

Cheers,

Pierre

int_type_initialization.txt

Tiny spelling issues

Hello,

I have found a few small spelling errors in the software; the corrections are enclosed.

Cheers,
Pierre
spelling.txt

tiledb_array_overflow not working well

Two issues here:

1. The API in tiledb.h states that tiledb_array_overflow uses the attribute id based on what was passed to tiledb_array_init. In practice, it requires the id to correspond to the attribute's position in the schema. There is no direct API to map one id to the other, which makes tiledb_array_overflow cumbersome to use.
2. tiledb_array_overflow does not return correct values when tiledb_array_read is called repeatedly to refresh exhausted buffers. We need test cases to demonstrate and fix this issue.
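Until such a mapping exists in the API, a caller can build it from the attribute names. The sketch below assumes the caller has the schema's attribute names in schema order and the subset passed to tiledb_array_init in init order. The function name map_to_schema_ids is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each attribute passed at init time, returns its
// position in the schema, or -1 when the name is not in the schema.
std::vector<int> map_to_schema_ids(
    const std::vector<std::string>& schema_attributes,
    const std::vector<std::string>& init_attributes) {
  std::vector<int> ids;
  ids.reserve(init_attributes.size());
  for (const auto& name : init_attributes) {
    int id = -1;
    for (std::size_t i = 0; i < schema_attributes.size(); ++i) {
      if (schema_attributes[i] == name) {
        id = static_cast<int>(i);
        break;
      }
    }
    ids.push_back(id);
  }
  return ids;
}
```

The entry at index i is the id to pass to tiledb_array_overflow for the i-th attribute given to tiledb_array_init, under the behaviour described in the first item.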

In Debian unstable, linking to the Debian-packaged catch2 instead has solved this problem.

Cheers,
Pierre

Missing inclusion of tiledb_constants.h

Hello,

With the code at the tip of master, I need to make the enclosed correction so that the TILEDB_IO_* definitions are available in core/include/storage_manager/storage_manager_config.h.

Cheers,
Pierre

missing_inclusion_tiledb_constants.txt
