brentlab / netprophet_2.0


A “data light” TF-network mapping algorithm using only gene expression and genome sequence data.

License: MIT License

R 39.57% Python 30.65% Shell 28.74% Ruby 1.04%
transcription-factors transcriptional-regulatory-network gene-regulation gene-expression

netprophet_2.0's Issues

Improving ease of installation and dependencies

Hi,

Firstly, thanks for bringing together all of this code and these tools.

My issue is about installation: are you thinking of improving this part of the implementation? At the moment the pipeline is very hard to install and get working, because it seems designed specifically for your system and doesn't generalise well. I've been trying to tweak your code to make it work on our system, but it's proving difficult and suboptimal, as I'm having to remove some of the parallelisation you did with SLURM.

I appreciate that this is quite a complex pipeline, with many dependencies. I am installing this for another user, not myself, so this feedback is based on the experience of trying to install your tool, not on the results it produces.

Here are a few things I would suggest; I hope they make sense:


Keep code separate from data

At the moment, the scripts depend on having the code (SRC and CODE directories) and the data (RESOURCES, OUTPUT and LOG) within the same directory (NetProphet_2.0). Ideally, one would like to install the software in one place, and leave it separate from the data and pipeline scripts (Snakefile and config.json).

This way several users can share the same installation but work on their own data separately.
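As a rough illustration of this separation, here is a minimal Python sketch. The directory names (SRC, CODE, RESOURCES, OUTPUT, LOG) come from the description above; the function `resolve_paths` and the environment variable `NETPROPHET_WORKDIR` are hypothetical and not part of NetProphet 2.0.

```python
import os
from pathlib import Path

# Hypothetical environment variable; NetProphet 2.0 does not define it.
WORKDIR_ENV = "NETPROPHET_WORKDIR"


def resolve_paths(install_dir, work_dir=None):
    """Return code dirs under install_dir and data dirs under a per-user work_dir."""
    install = Path(install_dir).resolve()
    # Precedence: explicit argument, then environment variable, then current directory.
    work = Path(work_dir or os.environ.get(WORKDIR_ENV) or Path.cwd()).resolve()
    # Code stays with the shared installation.
    paths = {name: install / name for name in ("SRC", "CODE")}
    # Data and logs live wherever the individual user is working.
    paths.update({name: work / name for name in ("RESOURCES", "OUTPUT", "LOG")})
    return paths
```

With something like this, two users could call `resolve_paths("/opt/netprophet", "/home/alice/run1")` and `resolve_paths("/opt/netprophet", "/home/bob/run1")` against the same installation without touching each other's OUTPUT or LOG directories.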


Remove dependency on SLURM

Although SLURM is widely used, it is not the universal job scheduler used by everyone.

This would also require modifying the way your program is launched. At the moment you've made a small shell script to submit via sbatch. Perhaps instead you could provide instructions on how the user could write a basic submission script, which could be adapted to whichever scheduler they use.

For example, at the moment your script uses 2 cores with snakemake, but that could be decided by the user?
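One possible shape for this, sketched in Python under stated assumptions: the `cores` key is a hypothetical addition to config.json, and `build_snakemake_command` is an illustrative name. The `--snakefile`, `--configfile` and `--cores` options are standard snakemake command-line options.

```python
import json
import shlex


def build_snakemake_command(config, config_path="config.json",
                            snakefile="Snakefile", default_cores=1):
    """Build a scheduler-agnostic snakemake invocation from a parsed config."""
    # "cores" is a hypothetical key, not part of the existing config.json.
    cores = int(config.get("cores", default_cores))
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", config_path,
        "--cores", str(cores),
    ]


def load_config(config_path="config.json"):
    """Read the JSON config that is also passed to snakemake."""
    with open(config_path) as handle:
        return json.load(handle)
```

The resulting list can be printed with `shlex.join(...)` and pasted into an sbatch, qsub or plain shell script, so the choice of scheduler stays with the user.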


The package restorepoint used by build_bart_network.r is no longer available on CRAN.


Parallel options

Some parallelisation options are "on" without a choice. For example, run_build_bart_network.sh hard-codes useMpi=TRUE, so it is no longer optional. Also, in the Snakefile you have turned on the -c flag in the CODE/build_motif_network.py step, again making it non-optional.

Perhaps this could be an option in the config.json file, so the user could decide whether they would like to use this feature or not.
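A minimal sketch of what such switches might look like, in Python. The config keys `use_mpi` and `motif_parallel` are hypothetical, as is the function name; only `useMpi=TRUE` and the `-c` flag are taken from the existing scripts, and how the values would be wired into them is left open.

```python
def parallel_options(config):
    """Translate optional config flags into settings for the two steps named above."""
    # "use_mpi" and "motif_parallel" are hypothetical keys; both default to off.
    use_mpi = bool(config.get("use_mpi", False))
    motif_parallel = bool(config.get("motif_parallel", False))
    return {
        # Value that run_build_bart_network.sh currently hard-codes as useMpi=TRUE.
        "bart_use_mpi": "TRUE" if use_mpi else "FALSE",
        # Extra arguments for CODE/build_motif_network.py; -c is passed only on request.
        "motif_extra_args": ["-c"] if motif_parallel else [],
    }
```

Defaulting both options to off means a fresh install runs serially and users opt in to MPI or SLURM only where those are available.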

I realise you're using SLURM to parallelise some of the jobs. Perhaps you could consider alternatives, for example GNU parallel? Although that will probably not be as efficient as issuing separate sbatch jobs, it is more general, and doesn't depend on a specific scheduler.

There are also other parallelisation alternatives in R, like the parallel package (which has parallel implementations of the *apply() functions). This would remove the dependency on OpenMPI, but possibly still provide some increased performance.
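To make the scheduler-independent idea concrete, here is a small Python sketch in the spirit of GNU parallel: it runs independent commands concurrently with a user-chosen job limit, using only the standard library. It is not how NetProphet 2.0 currently works; `run_commands` and the placeholder `demo_commands` are illustrative names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, jobs=2):
    """Run independent commands, at most `jobs` at a time; return exit codes in input order."""
    def run_one(command):
        # Each command is an argument list, so no shell quoting is involved.
        return subprocess.run(command).returncode

    # Threads suffice here because the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


# Placeholder commands; in practice each would be one independent pipeline step.
demo_commands = [[sys.executable, "-c", "pass"] for _ in range(4)]
```

Calling `run_commands(demo_commands, jobs=2)` returns one exit code per command, so a wrapper can stop the pipeline if any step fails. As noted above, this is likely less efficient than separate sbatch jobs on a cluster, but it works on any machine.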


Python version

Perhaps consider converting the code to Python 3 as the standard?


Because there are so many dependencies, you could also consider releasing the software as a Docker container, making the user's life even easier 😄


Hope this helps!

error handling -- expression input

"Adding error handling" should be a general issue.

But there should be error handling specifically around microarray vs RNA-seq input: are both expected to be log-transformed, e.g. at entry? If not, then there should be a way to check whether the values are likely log values or not (maybe even just looking for float vs int?)
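A rough sketch of such a check in Python follows. It is a heuristic only: the function name and the cutoff of 30 are assumptions chosen for illustration, not values taken from NetProphet 2.0, and borderline data would still need a human decision.

```python
def looks_log_transformed(values, max_log_value=30.0):
    """Heuristic guess at whether expression values are already log-scaled."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no expression values supplied")
    # Negative values cannot be raw counts or intensities, so assume a log scale.
    if min(values) < 0:
        return True
    # Assumed cutoff: log2-scale expression values are unlikely to exceed ~30.
    if max(values) > max_log_value:
        return False
    # Small all-integer data is more likely low raw counts than logged values.
    return not all(v.is_integer() for v in values)
```

A loader could call this on the expression matrix at entry and raise an error or warning when the result disagrees with what the user declared, rather than silently transforming the data.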
