Comments (7)
Ah. I see. The config file should not be wrapped with [[ ]].
BTW, there are duplicated entries in both "chrom_list" and "impute_list" in your shared config file.
I am also preparing a tutorial on a toy example dataset. In the meantime, here is a link to the toy example and its corresponding config file, so you can mimic the structure of the input files.
https://drive.google.com/drive/folders/1NrKGRUKzcG_jfDjXV6qiYaaWSDg-XM-S?usp=sharing
Feel free to make suggestions if the documentation looks unclear or confusing to you. Thanks!
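For the duplicated entries mentioned above, a quick check before running can help. The snippet below is a minimal sketch, assuming the config is a JSON object in which "chrom_list" and "impute_list" are plain lists of chromosome names; the helper names and the example values are hypothetical and not part of Higashi.

```python
import json
from collections import Counter


def find_duplicates(values):
    """Return the entries that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [v for v in counts if counts[v] > 1]


def check_config_lists(config, keys=("chrom_list", "impute_list")):
    """Map each list-valued key to its duplicated entries (empty list if none)."""
    return {k: find_duplicates(config.get(k, [])) for k in keys}


# Hypothetical config fragment; a real config file carries many more fields.
example = json.loads(
    '{"chrom_list": ["chr1", "chr2", "chr2"], "impute_list": ["chr1", "chr1"]}'
)
print(check_config_lists(example))
```

To check a real file, load it with json.load and pass the resulting dict to check_config_lists.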
from higashi.
A tutorial on the toy data sounds great!
Just looking at the data.txt in the toy example... why is the first column the same throughout? I suppose there are 1087 cells, but your first column has a "1" everywhere.
My goal: to see any sort of clusters in my data using only the intrachromosomal contact information from all the chromosomes.
Also, could you explain how to use the parameter "loss_mode" for my goal?
What is the ideal way to use the training step (step 3) of higashi?
And which output should I use to visualize the embedding? (I am a bit confused by what you have written in the wiki.)
Thanks again!
- You spotted a bug 😄. It should be the cell names instead of all "1". I'll fix that today.
- "rank" stands for ranking mode while "classification" stands for classification mode. If, at the given resolution, most of the non-zero entries in your dataset (e.g. > 80%) are just 1, then the classification mode is sufficient, as it just learns to predict non-zero entries. If your non-zero entries have a good continuous span, then the ranking mode would further learn the order of the values in the contact maps.
- I don't quite understand the question. Do you just want to train Higashi at step 3? Higashi trains in sequence, so the embeddings from step 1 are important for step 3 as well. We do have options to skip the imputation after step 2. I'm also testing whether skipping step 2 would affect the step 3 results much. I will update the code base once that is finished.
- The {embedding_name}_0_origin.npy at {temp_dir} is the cell embeddings. You could also use the Higashi_vis to inspect both the embeddings and imputation results. I'll change the documentation to make it clearer.
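The rule of thumb for "loss_mode" given above can be sketched as a small check. This is only an illustration: the function name is hypothetical, the 0.8 threshold is the "e.g. > 80%" figure from the comment rather than a fixed cutoff, and how you collect the contact counts at your chosen resolution is left to you.

```python
def suggest_loss_mode(counts, threshold=0.8):
    """Suggest a loss_mode from the contact counts at the chosen resolution.

    Returns (mode, fraction of non-zero entries that are exactly 1).
    """
    nonzero = [c for c in counts if c != 0]
    if not nonzero:
        raise ValueError("no non-zero contact counts to inspect")
    frac_ones = sum(1 for c in nonzero if c == 1) / len(nonzero)
    # Mostly-binary data: predicting non-zero entries is enough.
    # Otherwise the ranking mode can also use the order of the values.
    mode = "classification" if frac_ones > threshold else "rank"
    return mode, frac_ones


# Hypothetical counts: 9 of the 10 non-zero entries are 1.
print(suggest_loss_mode([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
```

Treat the result as a starting point; the threshold is a loose guide, so borderline datasets may be worth trying in both modes.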
- Haha, glad I did! (Also there is a {bottle_neck} parameter in the config_toy.json file. What is that?)
- I understand.
- Sorry, what I meant was the training step of Higashi (Step 3 from the wiki page and not -s 3). So I have started training the model giving -s 1. This will run all the steps subsequently.
- Oh I see now. In my {temp_dir} I can see the *_0_origin.npy file and also the *_origin.npy files for all of my training chromosomes. So that's what made me confused.
At present, the code is running at epoch 10 (out of 15). I will probably let them all finish before downstream analysis on the embeddings...
- It's a parameter that has been deprecated and will no longer be used in the program.
- Yes. Your understanding is correct.
- The first 15 epochs only use bias effects (distance, batch id, chromosome id, cell coverage) to regress out the bias. The following 60 epochs then produce the real embeddings.
I see. Can I then use UMAP on the {embedding_name}_0_origin.npy data?
Yes.
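A minimal sketch of that UMAP step, assuming the .npy file holds a 2-D array with one row per cell and that numpy and umap-learn are installed. The helper names, the "tmp" and "toy" values, and the UMAP settings are illustrative choices, not Higashi defaults.

```python
import os


def embedding_path(temp_dir, embedding_name):
    """Build the path to the cell embedding file named in this thread."""
    return os.path.join(temp_dir, f"{embedding_name}_0_origin.npy")


def umap_2d(path, random_state=0):
    """Load the cell embeddings and project them to 2-D with UMAP."""
    # Imported here so the path helper works without these packages.
    import numpy as np
    import umap  # provided by the umap-learn package

    embeddings = np.load(path)  # assumed shape: (n_cells, n_dims)
    reducer = umap.UMAP(n_components=2, random_state=random_state)
    return reducer.fit_transform(embeddings)


# "tmp" and "toy" stand in for your own temp_dir and embedding_name.
print(embedding_path("tmp", "toy"))
```

Calling umap_2d(embedding_path(temp_dir, embedding_name)) returns an (n_cells, 2) array that can be scatter-plotted and coloured by any cell labels you have.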