Code to generate Figure 2 of the paper "Predicting age from the transcriptome of human dermal fibroblasts" from the FPKM tables

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 95.47% Python 4.53%

predicting-age-from-the-transcriptome-of-human-dermal-fibroblasts's Issues

could this pipeline be used for other datasets

Hi,

This .ipynb file only generates the figure in your paper. I wonder if there is a general version that could be used for other datasets, since the paper mentions general applicability to E-MTAB-3037.
If so, a guide on how to arrange the input would help. It would be even better if the input and the other changeable options were arguments, so the script could run in just one line :)
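
A minimal sketch of what such a one-line entry point could look like, using argparse from the Python standard library. Nothing here comes from the repository: the option names and the file names in the usage line are assumptions made for illustration. The defaults only mirror the values that appear in the subset_genes_ensemble call quoted in a later issue on this page.

```python
import argparse


def build_parser():
    """Build the options for a hypothetical command-line wrapper."""
    parser = argparse.ArgumentParser(
        description="Train an age predictor from an FPKM table (illustrative sketch)."
    )
    # Every option name below is an assumption, not part of the repository.
    parser.add_argument("--fpkm", required=True, help="path to the FPKM table")
    parser.add_argument("--ages", required=True, help="path to the sample ages")
    parser.add_argument("--class-size", type=int, default=20)
    parser.add_argument("--subset-fold", type=float, default=5)
    parser.add_argument("--subset-min", type=float, default=5)
    parser.add_argument("--no-log", dest="subset_logT", action="store_false",
                        help="skip the log transform")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The file names are placeholders.
args = build_parser().parse_args(
    ["--fpkm", "fpkm.tsv", "--ages", "ages.tsv", "--class-size", "10"]
)
print(vars(args))
```

In a real script the parsed values would be handed to whatever function trains the model; that wiring depends on the notebook's code and is not shown here.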

accessing linregr genes and coefficients without running the LDA ensemble

Hi,
In your readme you refer to examining the genes and coefficients used for prediction by looking at the variables in the linregr class, specifically linregr.genecolumns_ and linregr.coef_. However, without running the LDA ensemble, linregr does not contain those variables.
Specifically, if I run:
linregr.__dict__.keys()
I get

dict_keys(['subset_min', 'subset_fold', 'subset_logT', 'convfpkmToTpm', 'verbose', 'fit_intercept', 'normalize', 'copy_X', 'n_jobs'])

Is there a way to access those details without running the LDA ensemble (or perhaps they are available elsewhere)?

Thanks,
Avital
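
The keys listed above are all constructor parameters, which is what an unfitted estimator looks like under the scikit-learn convention: attributes whose names end in an underscore (such as coef_) are only created by fit(). The sketch below illustrates that convention with a toy class. ToyRegressor is hypothetical and is not the repository's linregr class; whether linregr can be fitted on its own without the LDA ensemble is something only the repository's code can confirm.

```python
class ToyRegressor:
    """Hypothetical stand-in that follows the scikit-learn attribute convention."""

    def __init__(self, verbose=False):
        # Constructor arguments are stored as-is; nothing is learned here.
        self.verbose = verbose

    def fit(self, columns, X, y):
        # One-feature least squares on the first column, just to learn something.
        xs = [row[0] for row in X]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(y) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        slope = sxy / sxx
        # Learned state gets a trailing underscore and exists only after fit().
        self.genecolumns_ = list(columns[:1])
        self.coef_ = [slope]
        self.intercept_ = mean_y - slope * mean_x
        return self


model = ToyRegressor()
before = sorted(vars(model))   # constructor parameters only
model.fit(["GENE_A"], [[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0])
after = sorted(vars(model))    # learned attributes are now present
print(before)
print(after)
```

Under this convention, genecolumns_ and coef_ become readable once fit() has run on some data.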

error when using the new version: Train your own predictor

Hi,

I tried to run the new version, "Train your own predictor", and got several errors:

1) When I run this line:
"ensemble = subset_genes_ensemble(clf=clf, class_size=20, subset_fold=5,subset_min=5,subset_logT=True,verbose=True)",
it seems subset_logT is not defined.

My solution is to remove it; then it works. But I guess it will then skip the log transform?

2) For the line:
"for train, test in crossval.split(fpkm,ages):"
I get an error: "name 'fpkm' is not defined"

I don't know how to solve this one.

3) Looking at the script, I also have several questions.
a) If the age range of my data is not 0-70 but, for example, 20-100, will it still work? Will the age classifier fit as well?
b) Could it also run other ML algorithms? The old version could also run linear regression.

Looking forward to your response! Thanks!

out of memory

Hi,
I am very interested in your work. However, I wonder how much memory I need. I am currently using 80 GB, but it is still not enough.
Best

Clarification on bestmodel pkl

Hello - many thanks for sharing this code. It is much appreciated.

I'm trying to get an intuitive understanding of what fig2_bestmodel_Ensemble LDA.pkl represents.

My guess is:

it's the model that's the best result found from a grid search.
however, the model is then re-fit to generate the scatter plot in Fig. 1 A. That is, I think you cycle through the leave-one-out set, re-fitting the model and plotting the prediction of the item left out (repeat for all 133 individuals).

So I'm assuming the actual saved fit in the .pkl file isn't super relevant: it's more the hyperparameters (class size) that are important.

Apologies if I've got that totally wrong. I'm looking for clarification on the saved model. E.g., is it a model you could use to predict age from a totally new sample produced by the same RNA-Seq process?
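
The distinction drawn in this question, between leave-one-out predictions and a single saved fit, can be sketched as follows. This only illustrates the general technique under the guess stated above. It uses a one-feature least-squares line in place of the ensemble LDA model, and all numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Refit on all samples but one, then predict the sample that was left out."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Made-up data: one expression-like feature per sample and the sample ages.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ages = [12.0, 19.0, 33.0, 41.0, 48.0]

# One prediction per sample, each from a model that never saw that sample.
loo_predictions = leave_one_out_predictions(xs, ages)

# A single model fit on every sample, the kind of object one would save to disk
# and later apply to new samples.
final_slope, final_intercept = fit_line(xs, ages)
print(loo_predictions)
print(final_slope, final_intercept)
```

In this sketch the scatter-plot points would come from loo_predictions, while the final fit is the only object that could be applied to a new sample.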

out of

Error while running make_figs()

Hi,
Although the notebook ran successfully yesterday, after restarting my computer I tried re-running the notebook and received the error below in the cell that runs the make_figs() function:
elastregr = make_figs( 'Elastic net', model=subelast,
    #search_cval=search_cval, parameters=parameters, # uncomment these lines
    #plot_cval=LeaveOneOut(), # to rerun the analysis from scratch
    #lcurve_cval=lcurve_cval, # instead of loading results from disk
    njobs=njobs)
Is there any way around this that you can recommend?

Thanks,
Avital

UnicodeDecodeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
525 try:
--> 526 obj = unpickler.load()
527 if unpickler.compat_mode:

~\Anaconda3\lib\pickle.py in load(self)
1084 assert isinstance(key, bytes_types)
-> 1085 dispatch[key[0]](self)
1086 except _Stop as stopinst:

~\Anaconda3\lib\pickle.py in load_short_binstring(self)
1260 data = self.read(len)
-> 1261 self.append(self._decode_string(data))
1262 dispatch[SHORT_BINSTRING[0]] = load_short_binstring

~\Anaconda3\lib\pickle.py in _decode_string(self, value)
1200 else:
-> 1201 return value.decode(self.encoding, self.errors)
1202

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 0: ordinal not in range(128)

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
in
12 #plot_cval=LeaveOneOut(), # to rerun the analysis from scratch
13 #lcurve_cval=lcurve_cval, # instead of loading results from disk
---> 14 njobs=njobs)

in make_figs(name, model, search_cval, parameters, plot_cval, lcurve_cval, njobs)
14 clf = model
15 else: # load up a previously saved model
---> 16 clf = load('fig2_bestmodel_{}.pkl'.format(name))
17
18 print("Using ",clf)

~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in load(filename, mmap_mode)
596 return load_compatibility(fobj)
597
--> 598 obj = _unpickle(fobj, filename, mmap_mode)
599
600 return obj

~\Anaconda3\lib\site-packages\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
539 'This feature is not supported by joblib.')
540 new_exc.__cause__ = exc
--> 541 raise new_exc
542 # Reraise exception with Python 2
543 raise

ValueError: You may be trying to read with python 3 a joblib pickle generated with python 2. This feature is not supported by joblib.
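
The first exception in the traceback can be reproduced with the standard-library pickle module alone. The sketch below hand-assembles a tiny protocol-2 stream that uses the SHORT_BINSTRING opcode (the one named in the traceback) to hold the single byte 0x9a, then loads it under Python 3 with different encoding settings. It only illustrates the underlying cause. It does not load a joblib file, and the ValueError above states that joblib does not support reading Python 2 pickles from Python 3.

```python
import pickle

# Hand-assembled protocol-2 pickle of a one-byte Python 2 str:
# PROTO 2, SHORT_BINSTRING (length 1, byte 0x9a), BINPUT 0, STOP.
py2_pickle = b"\x80\x02U\x01\x9aq\x00."

# Python 3 decodes Python 2 str objects as ASCII by default, which fails here.
try:
    pickle.loads(py2_pickle)
    default_result = "loaded"
except UnicodeDecodeError as exc:
    default_result = str(exc)

# Choosing an encoding explicitly lets plain pickle read the same bytes.
as_text = pickle.loads(py2_pickle, encoding="latin1")
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(default_result)
print(repr(as_text), repr(as_bytes))
```

For the saved fig2_bestmodel_*.pkl files, the comments in the make_figs() call suggest another route: uncommenting the search_cval, plot_cval and lcurve_cval arguments reruns the analysis from scratch instead of loading results from disk, which would presumably avoid reading the old pickle at all.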

jasongfleischer / predicting-age-from-the-transcriptome-of-human-dermal-fibroblasts
