
jasongfleischer / predicting-age-from-the-transcriptome-of-human-dermal-fibroblasts

Code to generate Figure 2 of the paper "Predicting age from the transcriptome of human dermal fibroblasts" from the FPKM tables

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 95.47% Python 4.53%

predicting-age-from-the-transcriptome-of-human-dermal-fibroblasts's Issues

Could this pipeline be used for other datasets?

Hi,

This .ipynb file only generates the figure in your paper. Is there a more general version that could be used with other datasets? The paper mentions that the approach also applies to E-MTAB-3037.
If so, some guidance on how to arrange the input would help. It would also be better if the input and the other configurable options were command-line arguments, so that the script could be run in one line :)
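
A minimal sketch of what such a one-line interface could look like, using argparse from the Python standard library. Everything here is hypothetical: the repository does not define these flags, file arguments, or defaults. The option names only echo the class_size, subset_fold and subset_min keyword arguments quoted in another issue on this page.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface for training a predictor."""
    parser = argparse.ArgumentParser(
        description="Train an age predictor from an expression table (illustrative only)."
    )
    # Positional inputs: file names and formats are assumptions, not the repo's.
    parser.add_argument("fpkm_table", help="path to a samples-by-genes FPKM table")
    parser.add_argument("ages", help="path to a file with one age per sample")
    # Optional settings mirroring keyword arguments seen in the notebook call.
    parser.add_argument("--class-size", type=int, default=20)
    parser.add_argument("--subset-fold", type=float, default=5.0)
    parser.add_argument("--subset-min", type=float, default=5.0)
    parser.add_argument("--no-log", dest="log_transform", action="store_false",
                        help="skip the log transform")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["fpkm.tsv", "ages.tsv", "--class-size", "10"])
print(vars(args))
```

The parsed values would then be passed on to whatever training function the notebook exposes.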

accessing linregr genes and coefficients without running the LDA ensemble

Hi,
In your README you refer to examining the genes and coefficients used for prediction by looking at the variables in the linregr class, specifically linregr.genecolumns_ and linregr.coef_. However, without running the LDA ensemble, linregr does not contain those variables.
Specifically, if I run:
linregr.__dict__.keys()
I get:

dict_keys(['subset_min', 'subset_fold', 'subset_logT', 'convfpkmToTpm', 'verbose', 'fit_intercept', 'normalize', 'copy_X', 'n_jobs'])

Is there a way to access those details without running the LDA ensemble (or perhaps they are available elsewhere)?

Thanks,
Avital
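
For context, scikit-learn estimators follow a convention in which attributes ending in an underscore, such as coef_, are created by fit() and do not exist before it is called. The sketch below shows this with a plain LinearRegression on made-up data. It is not the repository's linregr class, and whether genecolumns_ is populated the same way is an assumption based only on its trailing underscore.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: 30 samples, 4 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 3.0

model = LinearRegression()
before = hasattr(model, "coef_")   # constructor parameters only, no fitted attributes yet
model.fit(X, y)
after = hasattr(model, "coef_")    # fit() creates coef_ and intercept_

print("coef_ present before fit:", before)
print("coef_ present after fit:", after)
print("coefficient vector shape:", model.coef_.shape)
```

By the same convention, an unfitted estimator's __dict__ lists only its constructor parameters, which matches the keys shown above.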

error when using the new version: Train your own predictor

Hi,

I tried to run the new version, "Train your own predictor",
and got several errors:

  1. When I run this line:
    "ensemble = subset_genes_ensemble(clf=clf, class_size=20, subset_fold=5,subset_min=5,subset_logT=True,verbose=True)",
    it seems that subset_logT is not defined.

    My workaround was to remove it, after which the line works. But I guess it then no longer does the log transform?

  2. For the line:
    "for train, test in crossval.split(fpkm,ages):"
    I get the error: "name 'fpkm' is not defined",

    which I don't know how to solve.

  3. Looking at the script, I also have several questions.
    a) If the age range of my data is not 0-70 but, for example, 20-100, will it still work? Will the age classifier fit as well?
    b) Can it also run other ML algorithms? The old version could also run linear regression.

Looking forward to your response! Thanks!
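
The NameError in point 2 means that no variable called fpkm exists when the loop runs, so the data has to be loaded into fpkm and ages first. A minimal sketch under stated assumptions: the arrays below are synthetic stand-ins for a real samples-by-genes table, the FPKM-to-TPM rescaling is the standard one (each sample scaled to sum to one million), and the log2 with a pseudocount of 1 is an assumed substitute for a log transform, not necessarily what subset_logT does in the repository.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-ins; real data would be read from the FPKM tables instead.
rng = np.random.default_rng(0)
fpkm = rng.gamma(shape=2.0, scale=10.0, size=(12, 50))  # 12 samples x 50 genes
ages = rng.integers(20, 101, size=12)                   # ages between 20 and 100


def fpkm_to_tpm(x):
    """Rescale each sample (row) so that its values sum to one million."""
    return x / x.sum(axis=1, keepdims=True) * 1e6


# Manual log transform (log2 and the +1 pseudocount are assumptions).
log_tpm = np.log2(fpkm_to_tpm(fpkm) + 1.0)

# With fpkm and ages defined, the cross-validation loop can run.
crossval = KFold(n_splits=4, shuffle=True, random_state=0)
fold_sizes = []
for train, test in crossval.split(log_tpm, ages):
    fold_sizes.append((len(train), len(test)))

print(fold_sizes)
```

Nothing in this sketch depends on the age range itself; whether the repository's age classes adapt to a 20-100 range is a separate question for the author.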

out of memory

Hi,
I am very interested in your work. However, I wonder how much memory I need. I have used 80 GB so far, and it is still not enough.
Best

Clarification on bestmodel pkl

Hello - many thanks for sharing this code. It is much appreciated.

I'm trying to get an intuitive understanding of what fig2_bestmodel_Ensemble LDA.pkl represents.

My guess is:

  • it's the model that's the best result found from a grid search.

  • however, the model is then re-fit to generate the scatter plot in Fig. 1 A. That is, I think you cycle through the leave-one-out set, re-fitting the model and plotting the prediction of the item left out (repeat for all 133 individuals).

So I'm assuming the actual saved fit in the .pkl file isn't especially relevant: it's the hyperparameters (class size) that matter.

Apologies if I've got that totally wrong. I'm looking for clarification on the saved model. E.g., is it a model you could use to predict age from a totally new sample produced by the same RNA-Seq process?
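
The leave-one-out procedure guessed at above can be sketched as follows. This is an illustration on synthetic data, with LinearRegression standing in for the Ensemble LDA model; it is not taken from the notebook, and whether the saved .pkl was fit on all 133 individuals is not stated here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Synthetic data: 20 individuals, 3 features, noisy linear "age" signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 40 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=1.0, size=20)

# Refit on all-but-one individual and predict the one left out, for every individual.
predictions = np.empty_like(y)
for train, test in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train], y[train])
    predictions[test] = model.predict(X[test])

# A model meant for genuinely new samples would usually be fit once on all the data.
final_model = LinearRegression().fit(X, y)

print("mean absolute LOO error:", np.mean(np.abs(predictions - y)))
```

In this scheme the per-fold fits exist only to produce the held-out predictions for the scatter plot, while a single fit on the full dataset is what one would save for later use.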

Error while running make_figs()

Hi,
Although the notebook ran successfully yesterday, after restarting my computer I tried re-running the notebook and received the error below in the cell that runs the make_figs() function:
elastregr = make_figs( 'Elastic net', model=subelast,
    #search_cval=search_cval, parameters=parameters, # uncomment these lines
    #plot_cval=LeaveOneOut(), # to rerun the analysis from scratch
    #lcurve_cval=lcurve_cval, # instead of loading results from disk
    njobs=njobs)
Is there any way around this that you can recommend?

Thanks,
Avital


UnicodeDecodeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
525 try:
--> 526 obj = unpickler.load()
527 if unpickler.compat_mode:

~\Anaconda3\lib\pickle.py in load(self)
1084 assert isinstance(key, bytes_types)
-> 1085 dispatchkey[0]
1086 except _Stop as stopinst:

~\Anaconda3\lib\pickle.py in load_short_binstring(self)
1260 data = self.read(len)
-> 1261 self.append(self._decode_string(data))
1262 dispatch[SHORT_BINSTRING[0]] = load_short_binstring

~\Anaconda3\lib\pickle.py in _decode_string(self, value)
1200 else:
-> 1201 return value.decode(self.encoding, self.errors)
1202

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 0: ordinal not in range(128)

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
in
12 #plot_cval=LeaveOneOut(), # to rerun the analysis from scratch
13 #lcurve_cval=lcurve_cval, # instead of loading results from disk
---> 14 njobs=njobs)

in make_figs(name, model, search_cval, parameters, plot_cval, lcurve_cval, njobs)
14 clf = model
15 else: # load up a previously saved model
---> 16 clf = load('fig2_bestmodel_{}.pkl'.format(name))
17
18 print("Using ",clf)

~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in load(filename, mmap_mode)
596 return load_compatibility(fobj)
597
--> 598 obj = _unpickle(fobj, filename, mmap_mode)
599
600 return obj

~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
539 'This feature is not supported by joblib.')
540 new_exc.cause = exc
--> 541 raise new_exc
542 # Reraise exception with Python 2
543 raise

ValueError: You may be trying to read with python 3 a joblib pickle generated with python 2. This feature is not supported by joblib.
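
The UnicodeDecodeError at the root of this traceback can be reproduced with the standard-library pickle module alone. The sketch below builds a tiny protocol-2 pickle by hand that holds a Python 2 8-bit string containing byte 0x9a, then loads it in Python 3 with different encoding settings. It only illustrates the mechanism: the helper name load_py2_pickle is hypothetical, and the joblib message above says joblib itself does not support reading such files.

```python
import pickle

# Hand-written protocol-2 pickle: PROTO 2, SHORT_BINSTRING of length 1 (byte 0x9a),
# BINPUT 0, STOP. Python 2 writes its 8-bit str objects with this opcode.
py2_pickle = b"\x80\x02U\x01\x9aq\x00."


def load_py2_pickle(data, encoding="latin1"):
    """Unpickle Python 2 bytes, decoding 8-bit strings with the given encoding."""
    return pickle.loads(data, encoding=encoding)


# The default encoding is ASCII, which cannot decode byte 0x9a.
try:
    pickle.loads(py2_pickle)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

as_text = load_py2_pickle(py2_pickle)             # decoded as a latin1 character
as_bytes = load_py2_pickle(py2_pickle, "bytes")   # left as a raw bytes object

print(default_failed, repr(as_text), repr(as_bytes))
```

For the .pkl files in this repository, the comments in the make_figs call point to a different route: uncommenting the search_cval, plot_cval and lcurve_cval arguments reruns the analysis from scratch instead of loading results from disk.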
