brentlab / NetProphet_2.0
A “data light” TF-network mapping algorithm using only gene expression and genome sequence data.
License: MIT License
Hi,
firstly, thanks for bringing together all of this code and these tools.
My issue is with the installation, so I was wondering whether you are planning to improve this part of the implementation. At the moment the pipeline is very hard to install and get working, because it seems specifically designed for your system, so it doesn't generalise well. I've been trying to tweak the code to make it work on our system, but it's proving difficult and suboptimal, as I'm having to remove some of the parallelisation you did with SLURM.
I appreciate that this is quite a complex pipeline with many dependencies. I'm installing it for another user, not myself, so this feedback is based on the experience of trying to install your tool, not on the results it produces.
Here are a few things I'd suggest; I hope they make sense:
Keep code separate from data
At the moment, the scripts depend on having the code (the `SRC` and `CODE` directories) and the data (`RESOURCES`, `OUTPUT` and `LOG`) within the same directory (`NetProphet_2.0`). Ideally, one would install the software in one place and keep it separate from the data and the pipeline scripts (`Snakefile` and `config.json`).
This way several users can share the same installation but work on their own data separately.
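One hypothetical way to express that split would be a small per-user config that points back at a shared install; the key names below are my invention, not anything from the repo:

```json
{
  "netprophet_dir": "/opt/NetProphet_2.0",
  "resources_dir": "/home/alice/np2_project/RESOURCES",
  "output_dir": "/home/alice/np2_project/OUTPUT",
  "log_dir": "/home/alice/np2_project/LOG"
}
```

The software directory stays read-only and shared, while each user owns their project directory.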
Remove dependency on SLURM
Although SLURM is widely used, it is not a universal job scheduler; not everyone uses it.
This would also require modifying the way your program is launched. At the moment you've made a small shell script to submit via `sbatch`. Perhaps instead you could provide instructions on how the user could write a basic submission script, which could be adapted to whichever scheduler they use.
For example, at the moment your script runs `snakemake` with 2 cores, but that could be left for the user to decide.
The package `restorepoint` used by `build_bart_network.r` is no longer available on CRAN.
Parallel options
Some parallelisation options are switched on without any choice. For example, `run_build_bart_network.sh` hard-codes `useMpi=TRUE`, which means it is no longer optional. Also, in the `Snakefile` you have turned on the `-c` flag in the `CODE/build_motif_network.py` step, again making it non-optional.
Perhaps these could be options in the `config.json` file, so the user could decide whether or not to use these features.
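For instance, the toggles could live in `config.json` along these lines (the key names are hypothetical; the `-c` flag could get a similar boolean):

```json
{
  "use_mpi": false,
  "snakemake_cores": 2
}
```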
I realise you're using SLURM to parallelise some of the jobs. Perhaps you could consider alternatives, for example GNU parallel? Although that will probably not be as efficient as issuing separate `sbatch` jobs, it is more general and doesn't depend on a specific scheduler.
There are also other parallelisation alternatives in R, like the `parallel` package (which has parallel implementations of the `*apply()` functions); that would remove the dependency on OpenMPI but could still provide some increased performance.
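To illustrate the same *apply-style fan-out without MPI or a scheduler, here is a sketch using Python's standard-library `multiprocessing` (the per-regulator function is a placeholder, not the real model fit):

```python
# Sketch: map a per-regulator job over a worker pool on a single node.
# No MPI, no scheduler; only the Python standard library.
from multiprocessing import Pool

def fit_one_regulator(tf_name):
    # Placeholder for the per-regulator model fit (illustrative only).
    return (tf_name, len(tf_name))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(fit_one_regulator, ["TF1", "TF2", "TF3"])
    print(results)
```

R's `parallel::mclapply()` follows the same pattern, which is why it can stand in for the MPI-based path on machines without OpenMPI.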
Python version
Perhaps consider converting the code to `python3` as the standard?
Because there are so many dependencies, you could also consider releasing the software as a Docker container, making the user's life even easier 😄
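A minimal Dockerfile sketch, purely illustrative: the base image and the package list are assumptions that would need checking against the real dependency set (snakemake, R and Python are the ones mentioned in this issue):

```dockerfile
# Hypothetical sketch; verify the package list against the actual dependencies.
FROM continuumio/miniconda3
RUN conda install -y -c conda-forge -c bioconda snakemake r-base
COPY . /opt/NetProphet_2.0
WORKDIR /opt/NetProphet_2.0
```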
Hope this helps!
"adding error handling" should be a general issue.
But there should be error handling specifically around microarray vs RNA-seq input -- are both expected to be logged, e.g. at entry? If not, then there should be a way to check whether the values are likely log values or not (maybe even just looking at float vs int?).
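A sketch of that check; the thresholds and the function name are my own guesses, not anything from the repo:

```python
# Heuristic sketch: guess whether an expression matrix looks log-transformed.
# The cutoffs below are illustrative assumptions, not validated thresholds.
def looks_logged(values):
    """Raw counts/intensities tend to be large non-negative numbers, often
    integers; logged values are small in magnitude and can be negative."""
    has_negative = any(v < 0 for v in values)
    all_integral = all(float(v).is_integer() for v in values)
    mx = max(values)
    if has_negative:
        return True            # raw expression data is never negative
    if all_integral and mx > 100:
        return False           # large integers look like raw counts
    return mx < 30             # log2/log10 values stay small in magnitude

print(looks_logged([5.2, -1.3, 8.7]))    # prints True
print(looks_logged([1500, 23000, 512]))  # prints False
```

A warning (or hard error) when the guess disagrees with what the pipeline expects would catch the mismatch at entry instead of producing a silently wrong network.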