Comments (3)
I can confirm that removing the PL field works! For future reference, you can remove it from the schema using jq:
vcf2zarr explode example.vcf.gz example.exploded
vcf2zarr mkschema example.exploded | jq 'del(.fields[] | select(.name == "call_PL"))' > example.json
vcf2zarr encode example.exploded example.zarr --schema example.json
from bio2zarr.
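If more than one field needs to go, the jq recipe above can be wrapped in two small shell functions. This is a sketch, not part of vcf2zarr. `drop_schema_fields` and `assert_fields_absent` are hypothetical names. The code assumes the schema layout the recipe relies on: a top-level `fields` list whose entries carry a `name`. It also needs jq 1.6 or newer for `IN` and `$ARGS.positional`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a vcf2zarr schema on stdin and drop every
# entry of .fields whose name matches one of the arguments.
# Needs jq 1.6+ for IN and $ARGS.positional.
drop_schema_fields() {
    jq 'del(.fields[] | select(.name | IN($ARGS.positional[])))' --args "$@"
}

# Hypothetical helper: exit 0 only if none of the named fields remain
# in the schema read on stdin (jq -e turns a false result into exit 1).
assert_fields_absent() {
    jq -e 'all(.fields[]; .name | IN($ARGS.positional[]) | not)' --args "$@" > /dev/null
}

# Usage, mirroring the recipe above:
#   vcf2zarr mkschema example.exploded | drop_schema_fields call_PL > example.json
#   assert_fields_absent call_PL < example.json
#   vcf2zarr encode example.exploded example.zarr --schema example.json
```

Calling `assert_fields_absent` before `vcf2zarr encode` makes a script stop early if the schema edit did not take effect. Because it runs under `set -e`, the script exits at that point.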
Thanks for the bug report @mufernando!
I think the problem is with your call_PL field:
DEBUG Initialised <zarr.core.Array '/call_PL' (24739, 746, 66) int32>
Your PL field leads to an inner dimension of 66! 🫢
So each uncompressed variant chunk of this array is 10000 * 746 * 66 * 4 = 1,969,440,000 bytes, right up against the 2147483647-byte (2^31 - 1) limit. The immediate bug here, then, is that we're not giving a usable error message. More generally, PL fields are a major problem (#185 and linked discussion), but we know how they should be dealt with.
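The two numbers in this comment can be reproduced with a little arithmetic. The bash sketch below uses hypothetical helper names (`genotype_count`, `chunk_bytes`). PL is a Number=G field, so the VCF specification gives it one value per unordered genotype. That is C(n + p - 1, p) values for n alleles (REF plus ALTs) at ploidy p. Assume the inner dimension of 66 is the largest PL count seen at any site. Then it matches 11 alleles with diploid calls, or 66 alleles with haploid ones. The chunk size is the product of the chunk dimensions and the 4-byte int32 item size, compared here with 2^31 - 1.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values per sample for a Number=G field such as PL: one per unordered
# genotype, i.e. C(n_alleles + ploidy - 1, ploidy), where n_alleles
# counts REF plus every ALT allele at the site.
genotype_count() {
    local n_alleles=$1 ploidy=$2
    local num=1 den=1 i
    for (( i = 1; i <= ploidy; i++ )); do
        num=$(( num * (n_alleles + ploidy - i) ))
        den=$(( den * i ))
    done
    echo $(( num / den ))
}

# Uncompressed size in bytes of one chunk: the product of all arguments
# (chunk dimension sizes followed by the bytes per item).
chunk_bytes() {
    local product=1 dim
    for dim in "$@"; do
        product=$(( product * dim ))
    done
    echo "$product"
}

# 2^31 - 1, the largest value a signed 32-bit length can hold.
INT32_MAX=$(( (1 << 31) - 1 ))

echo "diploid, 11 alleles: $(genotype_count 11 2) PL values"   # 66
echo "haploid, 66 alleles: $(genotype_count 66 1) PL values"   # 66

# call_PL chunk: 10000 variants x 746 samples x 66 values x 4 bytes (int32).
pl_bytes=$(chunk_bytes 10000 746 66 4)
echo "call_PL chunk: $pl_bytes of $INT32_MAX bytes"            # 1969440000 of 2147483647
```

The product is about 92% of the limit and grows linearly with every dimension. Suppose the widest site had one more allele (12 alleles, so 78 PL values per diploid sample). The same chunk would then be 10000 * 746 * 78 * 4 = 2,327,520,000 bytes, which is over the limit.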
In the short term, I'd suggest creating a schema and dropping the call_PL field (assuming you're not using it):
vcf2zarr mkschema chrY_mcc-746-kray-filtered.exploded > schema.json
... edit the JSON to delete the call_PL field
vcf2zarr encode chrY_mcc-746-kray-filtered.exploded -s schema.json chrY_mcc-746-kray-filtered.zarr
I just merged a fix that should raise a more helpful error message. @mufernando, would you mind trying it out on your data, please? The message will need some links to as-yet unwritten docs about PL fields, but hopefully it points people in the right direction.
Related Issues (20)
- Returning a string from `.mkschema`
- Document status of Python API
- Fixup msprime based tests when packages are fixed
- Add "what about cloud?" docs
- Add explicit warning for Mac Python 3.9
- New tool: tskit2zarr
- Document copying to cloud storage
- Refactor docs build infrastructure
- Restructure vcf2zarr docs
- Add --no-progress (or similar) to suppress progress
- Bug in dexplode-partition
- Change dexplode-init to use ``--num-parts``/``-n`` instead of positional
- Change dencode-init to use --num-partitions
- Hypothesis testing for vcf2zarr
- Pin to zarr < 3
- ValueError: could not broadcast input array
- Run tests against Zarr 3
- Run tests against numpy 2
- Set copy=True in np.array creation for numpy 2.0 compatibility
- ICF stores created with numpy 1.x won't work with numpy 2.x