brentlab / NetProphet_2.0
A “data light” TF-network mapping algorithm using only gene expression and genome sequence data.
License: MIT License
Hi,
firstly, thanks for bringing together all of this code and these tools.
My issue is with the installation, so I was wondering whether you are planning to improve this part of the implementation. At the moment the pipeline is very hard to install and get working, because it seems specifically designed for your system, so it doesn't generalise well. I've been trying to tweak the code to make it work on our system, but it's proving difficult and suboptimal, as I'm having to remove some of the parallelisation you did with SLURM.
I appreciate that this is quite a complex pipeline with many dependencies. I'm installing it for another user, not myself, so this feedback is based on the experience of trying to install your tool, not on the results it produces.
Here are a few things I'd suggest; I hope they make sense:
Keep code separate from data
At the moment, the scripts depend on having the code (the `SRC` and `CODE` directories) and the data (`RESOURCES`, `OUTPUT` and `LOG`) within the same directory (`NetProphet_2.0`). Ideally, one would install the software in one place and keep it separate from the data and the pipeline scripts (`Snakefile` and `config.json`).
This way several users can share the same installation but work on their own data separately.
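One hypothetical way to express that split would be a small per-user config that points back at a shared install; the key names below are my invention, not anything from the repo:

```json
{
  "netprophet_dir": "/opt/NetProphet_2.0",
  "resources_dir": "/home/alice/np2_project/RESOURCES",
  "output_dir": "/home/alice/np2_project/OUTPUT",
  "log_dir": "/home/alice/np2_project/LOG"
}
```

The software directory stays read-only and shared, while each user owns their project directory.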
Remove dependency on SLURM
Although SLURM is widely used, it is not a universal job scheduler; not everyone uses it.
This would also require modifying the way your program is launched. At the moment you've made a small shell script to submit via `sbatch`. Perhaps instead you could provide instructions on how the user could write a basic submission script, which could be adapted to whichever scheduler they use.
For example, at the moment your script runs `snakemake` with 2 cores, but that could be left for the user to decide.
The package `restorepoint` used by `build_bart_network.r` is no longer available on CRAN.
Parallel options
Some parallelisation options are switched on without any choice. For example, `run_build_bart_network.sh` hard-codes `useMpi=TRUE`, which means it is no longer optional. Also, in the `Snakefile` you have turned on the `-c` flag in the `CODE/build_motif_network.py` step, again making it non-optional.
Perhaps these could be options in the `config.json` file, so the user could decide whether or not to use these features.
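For instance, the toggles could live in `config.json` along these lines (the key names are hypothetical; the `-c` flag could get a similar boolean):

```json
{
  "use_mpi": false,
  "snakemake_cores": 2
}
```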
I realise you're using SLURM to parallelise some of the jobs. Perhaps you could consider alternatives, for example GNU parallel? Although that will probably not be as efficient as issuing separate `sbatch` jobs, it is more general and doesn't depend on a specific scheduler.
There are also other parallelisation alternatives in R, like the `parallel` package (which has parallel implementations of the `*apply()` functions); that would remove the dependency on OpenMPI but could still provide some increased performance.
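To illustrate the same *apply-style fan-out without MPI or a scheduler, here is a sketch using Python's standard-library `multiprocessing` (the per-regulator function is a placeholder, not the real model fit):

```python
# Sketch: map a per-regulator job over a worker pool on a single node.
# No MPI, no scheduler; only the Python standard library.
from multiprocessing import Pool

def fit_one_regulator(tf_name):
    # Placeholder for the per-regulator model fit (illustrative only).
    return (tf_name, len(tf_name))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(fit_one_regulator, ["TF1", "TF2", "TF3"])
    print(results)
```

R's `parallel::mclapply()` follows the same pattern, which is why it can stand in for the MPI-based path on machines without OpenMPI.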
Python version
Perhaps consider converting the code to `python3` as the standard?
Because there are so many dependencies, you could also consider releasing the software as a Docker container, making the user's life even easier 😄
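A minimal Dockerfile sketch, purely illustrative: the base image and the package list are assumptions that would need checking against the real dependency set (snakemake, R and Python are the ones mentioned in this issue):

```dockerfile
# Hypothetical sketch; verify the package list against the actual dependencies.
FROM continuumio/miniconda3
RUN conda install -y -c conda-forge -c bioconda snakemake r-base
COPY . /opt/NetProphet_2.0
WORKDIR /opt/NetProphet_2.0
```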
Hope this helps!
"adding error handling" should be a general issue.
But there should be error handling specifically around microarray vs RNA-seq input -- are both expected to be logged, e.g. at entry? If not, then there should be a way to check whether the values are likely log values or not (maybe even just looking at float vs int?).
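A sketch of that check; the thresholds and the function name are my own guesses, not anything from the repo:

```python
# Heuristic sketch: guess whether an expression matrix looks log-transformed.
# The cutoffs below are illustrative assumptions, not validated thresholds.
def looks_logged(values):
    """Raw counts/intensities tend to be large non-negative numbers, often
    integers; logged values are small in magnitude and can be negative."""
    has_negative = any(v < 0 for v in values)
    all_integral = all(float(v).is_integer() for v in values)
    mx = max(values)
    if has_negative:
        return True            # raw expression data is never negative
    if all_integral and mx > 100:
        return False           # large integers look like raw counts
    return mx < 30             # log2/log10 values stay small in magnitude

print(looks_logged([5.2, -1.3, 8.7]))    # prints True
print(looks_logged([1500, 23000, 512]))  # prints False
```

A warning (or hard error) when the guess disagrees with what the pipeline expects would catch the mismatch at entry instead of producing a silently wrong network.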