Comments (16)
Hey, I think this is likely caused by choosing too coarse a resolution (6000000). To make things easier for the user, Higashi uses heuristics to decide the feature dimension, model size, etc. based on the genome reference size and the resolutions typically used for analysis. At a resolution of 6,000,000, one of the dimensions suggested by these heuristics likely becomes 1, which leads to this error. I would suggest changing it to 1Mb as a starting point. If the problem persists, I'll take another look. If for some reason it's necessary to use the 6Mb resolution, I can add a fix to the code to avoid this issue.
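As a rough illustration of why a coarse resolution leaves very little to work with (a minimal sketch, not Higashi's actual heuristic, which is not shown in this thread): the number of bins along a chromosome shrinks in proportion to the bin size. The chromosome length below is a hypothetical value chosen only for the example, and `n_bins` is an illustrative helper.

```python
def n_bins(chrom_length, resolution):
    """Number of fixed-size bins needed to cover a chromosome."""
    # Ceiling division with integers only, so no float rounding is involved.
    return -(-chrom_length // resolution)

# Hypothetical chromosome length in bp, for illustration only.
chrom_length = 122_000_000

for resolution in (6_000_000, 1_000_000, 100_000):
    print(resolution, n_bins(chrom_length, resolution))
```

Any model dimension derived from such a small bin count can plausibly end up at 1, which is the situation described above.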
from higashi.
Thank you for your response; it was very helpful. I switched to 1Mb as suggested.
But now I am getting a message like: "The 0 th chrom in your chrom_list has no sample in this generator". Can you help me understand what it means? I am working on chr11, as specified in the configuration file.
Do you think it is caused by the sparsity of my data (total_sparsity_cell 0.00040311744154797097)? I noticed that in your tutorial (Higashi/tutorials/4DN_sci-Hi-C_Kim et al.ipynb), you get something like total_sparsity_cell 0.012761184803150997.
`>>> from higashi.Higashi_wrapper import *
# config = "config_mousse.JSON"
config = "config_souris.JSON"
print("1. Config finished")
1. Config finished
# Initialize the Higashi instance
higashi_model = Higashi(config)
# Data processing (only needs to be run once)
higashi_model.process_data()
generating start/end dict for chromosome
extracting from data.txt
100%|████████████████████████████████████████████████████████████████████████████████| 39410250/39410250 [01:56<00:00, 337588.28it/s]
generating contact maps for baseline
data loaded
750 False
creating matrices tasks: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.07it/s]
total_feats_size 200
0%| | 0/1 [00:00<?, ?it/s]Done here 1
1
Done here 2
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 67.57it/s]
higashi_model.prep_model()
cpu_num 32
training on data from: ['chr11']
total_sparsity_cell 0.00040311744154797097
no contractive loss
batch_size 256
Node type num [250 122] [250 372]
start making attribute
0.994: 32%|███████████████████████████▊ | 96/300 [00:00<00:00, 433.01it/s]
loss 0.9697239995002747 loss best 0.9167578220367432 epochs 96
initializing data generator
0%| | 0/1 [00:00<?, ?it/s]
The 0 th chrom in your chrom_list has no sample in this generator
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7194.35it/s]
initializing data generator
0%| | 0/1 [00:00<?, ?it/s]
The 0 th chrom in your chrom_list has no sample in this generator
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7752.87it/s]
print("2. Process finished")
2. Process finished`
from higashi.
Um. This error is raised when there are no hyperedges to train the model. For debugging purposes, could you run the following script?
import os
import h5py
import numpy as np
temp_dir = ...  # replace ... with the path to your temp_dir
with h5py.File(os.path.join(temp_dir, "node_feats.hdf5"), "r") as input_f:
    print(len(np.array(input_f['train_data_%s' % "chr11"]).astype('int')))
Also, what are the minimum and maximum distances in your config file?
Thanks.
from higashi.
In my config file, I have:
"minimum_impute_distance": 0,
"maximum_impute_distance": -1
The fact that there are no hyperedges to train the model is probably related to my cell data. Maybe they are too sparse.
from higashi.
So maybe I have to increase these two distances?
from higashi.
That setting seems to use all the edges, so I think these two parameters are probably fine. Also, I meant minimum_distance, not minimum_impute_distance.
Could you try running the code I provided above to see whether there are any edges before the filtering step? That could help narrow down where the problem is (too few reads, reads that are mostly short-range interactions, etc.).
Thanks!
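One way to check the two possibilities mentioned above (too few reads, or mostly short-range reads) is to tally intra-chromosomal contacts by genomic distance directly from the input file. The sketch below assumes a tab-separated file with a header row and the columns cell_id, chrom1, pos1, chrom2, pos2, count; that layout, the function name `summarize_contacts`, and the 1 Mb threshold are assumptions for illustration, not something stated in this thread.

```python
import csv
import io

def summarize_contacts(handle, chrom, min_distance):
    """Count intra-chromosomal contact records on `chrom`, split by distance.

    `handle` is any text file object with tab-separated rows and a header:
    cell_id, chrom1, pos1, chrom2, pos2, count (assumed layout).
    """
    total = 0
    long_range = 0
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["chrom1"] != chrom or row["chrom2"] != chrom:
            continue  # skip other chromosomes and inter-chromosomal pairs
        total += 1
        if abs(int(row["pos2"]) - int(row["pos1"])) >= min_distance:
            long_range += 1
    return {"total": total, "long_range": long_range}

# Tiny made-up input; a real run would pass open("data.txt") instead.
example = io.StringIO(
    "cell_id\tchrom1\tpos1\tchrom2\tpos2\tcount\n"
    "0\tchr11\t100000\tchr11\t150000\t1\n"
    "0\tchr11\t100000\tchr11\t5000000\t2\n"
    "1\tchr11\t200000\tchr2\t300000\t1\n"
)
summary = summarize_contacts(example, "chr11", min_distance=1_000_000)
print(summary)
```

If `total` is small, or `long_range` is close to zero, there may be few contacts left that span more than one bin at the chosen resolution.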
from higashi.
When I run the code above, this is what I get:
`>>> temp_dir = ...
with h5py.File(os.path.join(temp_dir, "node_feats.hdf5"), "r") as input_f:
... print(len(np.array(input_f['train_data_%s' % "chr11"]).astype('int')))
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/cvmfs/samfouss/easybuild/software/2020/avx2/Core/python/3.8.10/lib/python3.8/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not ellipsis`
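The TypeError above arises because temp_dir was left as the literal ... placeholder (Python's Ellipsis object), which os.path.join cannot treat as a path. A minimal sketch of one way to fill it in, assuming the configuration JSON stores the directory under a "temp_dir" key (an assumption here; the helper name `node_feats_path` is hypothetical):

```python
import json
import os

def node_feats_path(config_path):
    """Build the path to node_feats.hdf5 from a JSON config file.

    Assumes the config has a "temp_dir" entry (not confirmed in this thread).
    """
    with open(config_path) as f:
        config = json.load(f)
    temp_dir = config["temp_dir"]
    # Fail early with a clear message instead of deep inside os.path.join.
    if not isinstance(temp_dir, (str, bytes, os.PathLike)):
        raise TypeError("temp_dir must be a path, got %r" % (temp_dir,))
    return os.path.join(temp_dir, "node_feats.hdf5")
```

Calling `node_feats_path("config_souris.JSON")` and passing the result to `os.path.exists` would also show whether the file was written at all.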
from higashi.
Also, I do not know what changed, but I can now run the "higashi_model.prep_model()" command without any problem.
`
from higashi.Higashi_wrapper import *
# Set the path to the configuration file, change it accordingly
# config = "config_mousse.JSON"
config = "config_souris.JSON"
print("1. Config finished")
1. Config finished
# Initialize the Higashi instance
higashi_model = Higashi(config)
# Data processing (only needs to be run once)
higashi_model.process_data()
generating start/end dict for chromosome
extracting from data.txt
100%|████████████████████████████████████████████████████████████████████████████████| 39410250/39410250 [01:58<00:00, 333620.38it/s]
generating contact maps for baseline
data loaded
2831250 False
creating matrices tasks: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [15:21<00:00, 921.79s/it]
total_feats_size 200
0%| | 0/1 [00:00<?, ?it/s]Done here 1
149
Done here 2
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.30it/s]
higashi_model.prep_model()
cpu_num 32
training on data from: ['chr11']
total_sparsity_cell 0.00040311744154797097
no contractive loss
batch_size 1280
Node type num [ 250 12185] [ 250 12435]
start making attribute
0.636: 100%|██████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:01<00:00, 213.18it/s]
loss 0.6364461779594421 loss best 0.6372790932655334 epochs 299
initializing data generator
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 25731.93it/s]
initializing data generator
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
print("2. Process finished")
2. Process finished
`
But "higashi_model.train_for_embeddings()" takes a very long time to run.
from higashi.
I am getting the same error; I tried 1Mb and 100Kb to no avail.
My sparsity is 0.22241997971104638, which should be quite good. Do you have any suggestions on how to debug this? In the temp_dir I can't find a node_feats.hdf5 file to dump as you suggested.
from higashi.
Hey, what version of Higashi are you using? Is it the one from conda or the GitHub + pip install?
from higashi.
I installed it by downloading the repo and running setup.py.
From my conda env export I see:
name: fasthigashi
- fasthigashi=0.1.1=py_0
from higashi.
I see. And to confirm, the error is: ValueError: Found array with 1 feature(s) (shape=(250, 1)) while a minimum of 2 is required by TruncatedSVD
from higashi.
Yes, though with a different n for my data:
'ValueError: Found array with 1 feature(s) (shape=(69, 1)) while a minimum of 2 is required by TruncatedSVD.'
from higashi.
To help with debugging:
- Are you working with a custom genome or a standard one (like hg38 / mm10)?
- Are there any chromosomes with length smaller than 1Mb in the dataset?
- Under the temp_dir, there should be some files named "cell_adj_%s.npy". Could you load one of them and print out its shape?
Thanks!
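The second question above can be checked mechanically from a chromosome sizes listing (one "name length" pair per line). This is a minimal sketch; the helper names and the example entries are illustrative assumptions, and a real check would read the sizes file used for the dataset.

```python
def read_chrom_sizes(lines):
    """Parse 'name<whitespace>length' lines into a dict of name -> length."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes

def short_chromosomes(sizes, resolution):
    """Names of chromosomes no longer than one bin at this resolution."""
    return [name for name, length in sizes.items() if length <= resolution]

# Illustrative entries only; substitute the real sizes file.
example_lines = ["chr1\t249250621", "chr21\t48129895", "chrM\t16571"]
sizes = read_chrom_sizes(example_lines)
print(short_chromosomes(sizes, 1_000_000))
```

Any chromosome reported here is covered by a single bin, so a per-chromosome feature matrix for it would have only one column.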
from higashi.
Thank you! You may have already found the problem.
I am using hg19, but I filtered out the uncharacterized chromosomes. However, chrM was still there.
I removed it, along with all entries involving it in the pairs, and now the SVD step works. Maybe a small addition to the documentation could mention that only chr1 through chrX/Y should be included; that's what I should have done in the first place!
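A minimal sketch of the cleanup described above: keeping only contact records whose two ends both fall on chr1 to chr22, chrX or chrY. The tuple layout (chrom1, pos1, chrom2, pos2) and the function name `keep_canonical` are assumptions for illustration; they would need adapting to the actual pairs format.

```python
# Canonical human chromosomes: chr1..chr22 plus chrX and chrY.
CANONICAL = {"chr%d" % i for i in range(1, 23)} | {"chrX", "chrY"}

def keep_canonical(pairs, allowed=CANONICAL):
    """Yield only pairs whose two ends both lie on allowed chromosomes."""
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 in allowed and chrom2 in allowed:
            yield (chrom1, pos1, chrom2, pos2)

# Made-up records for illustration.
pairs = [
    ("chr1", 10_000, "chr1", 2_500_000),
    ("chrM", 500, "chrM", 9_000),
    ("chr2", 40_000, "chrM", 1_200),
    ("chrX", 1_000_000, "chrX", 3_000_000),
]
filtered = list(keep_canonical(pairs))
print(filtered)
```

Since chrM in hg19 is only about 16.6 kb long, it occupies a single bin at 1Mb or 100Kb, which would be consistent with the (69, 1) array in the error message.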
Thank you for the help, and feel free to close this!
from higashi.
I see, sounds good. I will update the documentation.
from higashi.