
julianspaeth / random-survival-forest


A Random Survival Forest implementation for Python inspired by Ishwaran et al. - Easily understandable, adaptable and extendable.

License: Other

Python 98.90% Shell 1.10%
python survival-analysis survival-prediction random-survival-forests random-forest machine-learning


random-survival-forest's Issues

Starting code

@julianspaeth
Hi Julian.
What are x and y, and what are x_val and y_val?
I think you mean that x and x_val are the same.
I would also like to know whether they should be DataFrames rather than arrays.
Also, what type is y_val['time']?

If you described the expected dataset format for x and y specifically, it would be easier for everyone to understand :)
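The tracebacks in the other issues on this page give some hints about the expected shapes: `compute_oob_score` reads `y.iloc[:, 0]` as the time and `y.iloc[:, 1]` as the event indicator, so `y` appears to be a pandas DataFrame with exactly that column order. A minimal sketch under that assumption follows; the toy values, the feature names, and the `event` column name are illustrative and not taken from the project.

```python
import pandas as pd

# Hypothetical toy features; column names are illustrative assumptions.
x = pd.DataFrame({
    "age": [61, 47, 55, 70],
    "biomarker": [1.2, 0.7, 3.1, 2.4],
})

# First column: observed time. Second column: event indicator
# (1 = event observed, 0 = censored). This mirrors the
# y.iloc[:, 0] / y.iloc[:, 1] access seen in the traceback.
y = pd.DataFrame({
    "time": [5, 12, 7, 3],
    "event": [1, 0, 1, 1],
}, index=x.index)

# A validation split reuses the same columns on different rows.
x_train, y_train = x.iloc[:2], y.iloc[:2]
x_val, y_val = x.iloc[2:], y.iloc[2:]
```

Integer times are used here because the default timeline is built with `range(...)`, which rejects floats (see the float survival time issue).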

Increase in running time for fit function

I am fitting this RSF module on a dataset with 12K rows. The fit function has been running for the past 2 days without producing any output. Can someone please confirm whether this is expected?

Also, how does this address an imbalanced dataset?

Memory Error:

The fit method keeps raising a MemoryError without any additional information. It looks like it is getting hung up on the OOB score. The traceback I get is below. Is there a DataFrame size that works well for you? I'm training on about 90,000 samples with about 15 feature columns. Thanks for any help you can give.

MemoryError Traceback (most recent call last)
in
1 timeline = range(0, 10, 1)
2 rsf = RandomSurvivalForest(n_estimators=10, timeline=timeline, n_jobs = -1)
----> 3 rsf.fit(X_train, y_train)
4 round(rsf.oob_score, 3)
5 0.76

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in fit(self, x, y)
59 self.bootstraps.append(self.bootstrap_idxs[i])
60
---> 61 self.oob_score = self.compute_oob_score(x, y)
62
63 return self

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in compute_oob_score(self, x, y)
99 """
100 oob_ensembles = self.compute_oob_ensembles(x)
--> 101 c = concordance_index(y_time=y.iloc[:, 0], y_pred=oob_ensembles, y_event=y.iloc[:, 1])
102 return c
103

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\scoring.py in concordance_index(y_time, y_pred, y_event)
11 """
12 oob_predicted_outcome = [x.sum() for x in y_pred]
---> 13 possible_pairs = list(combinations(range(len(y_pred)), 2))
14 concordance = 0
15 permissible = 0
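The last frame of the traceback shows `list(combinations(range(len(y_pred)), 2))`, which materializes every index pair at once. For n samples that is n(n-1)/2 tuples, roughly 4.05 billion for 90,000 rows. A minimal sketch of a lazy alternative follows, assuming a Harrell-style concordance index in which a higher predicted value means higher risk. This is not the library's scoring code; `pair_count` and `concordance_index_lazy` are hypothetical names.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered index pairs for n samples."""
    return n * (n - 1) // 2


def concordance_index_lazy(y_time, y_pred, y_event):
    """Concordance index computed without storing the pair list.

    Assumes higher y_pred means higher risk (shorter expected time).
    Pairs with tied times are skipped, which is a simplification.
    """
    concordant = 0.0
    permissible = 0
    # combinations() is a generator: pairs are produced one at a time.
    for i, j in combinations(range(len(y_time)), 2):
        if y_time[i] == y_time[j]:
            continue
        # a is the sample with the shorter observed time.
        a, b = (i, j) if y_time[i] < y_time[j] else (j, i)
        if not y_event[a]:
            continue  # shorter time is censored: pair is not comparable
        permissible += 1
        if y_pred[a] > y_pred[b]:
            concordant += 1.0
        elif y_pred[a] == y_pred[b]:
            concordant += 0.5
    return concordant / permissible if permissible else float("nan")


pairs_for_90k = pair_count(90_000)
score = concordance_index_lazy([1, 2, 3, 4], [4, 3, 2, 1], [1, 1, 0, 1])
```

This only removes the memory peak. The loop is still quadratic in the number of samples, so on 90,000 rows a pure-Python version would remain very slow; scoring a random subsample is one possible compromise.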

If the survival time is 'float' object

Hi, I have a question about fitting the model.
If the survival times are floats, fitting fails with
TypeError: 'float' object cannot be interpreted as an integer. The error is raised in "RandomSurvivalForest.py", line 46, in fit:
self.timeline = range(y.iloc[:, 0].min(), y.iloc[:, 0].max(), 1)
Is there a solution for this?
Thanks.
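`range()` accepts only integers, which is why the quoted line fails for float times. A minimal sketch of one workaround follows: build an integer timeline that covers the observed times. The helper name `integer_timeline` and the toy data are hypothetical. The constructor call in the Memory Error report passes `timeline=`, so supplying such a range explicitly may avoid the failing default, but that behaviour is an assumption and is not confirmed on this page.

```python
import math

import pandas as pd

# Hypothetical toy outcome frame with float survival times.
y = pd.DataFrame({"time": [1.5, 3.2, 7.8], "event": [1, 0, 1]})

# Reproduce the reported failure: range() rejects float bounds.
try:
    range(y["time"].min(), y["time"].max(), 1)
    reproduced = False
except TypeError:
    reproduced = True


def integer_timeline(times):
    """Integer grid from floor(min) to ceil(max), inclusive."""
    start = math.floor(times.min())
    stop = math.ceil(times.max())
    return range(start, stop + 1, 1)


timeline = integer_timeline(y["time"])
```

Another option is to round or rescale the time column to integers (for example, days instead of fractional months) before calling fit.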

Error during model fitting

rsf.fit(X_train, Y_train)

File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 54, in fit
for i in range(self.n_estimators))
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 921, in __call__
if self.dispatch_one_batch(iterator):
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "C:\Anaconda2\lib\site-packages\joblib\_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "C:\Anaconda2\lib\site-packages\joblib\_parallel_backends.py", line 549, in __init__
self.results = batch()
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 81, in create_tree
unique_deaths=self.unique_deaths, min_leaf=self.min_leaf, random_state=self.random_states[i])
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 36, in __init__
self.grow_tree()
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 45, in grow_tree
self.score, self.split_val, self.split_var, lhs_idxs_opt, rhs_idxs_opt = find_split(self)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 16, in find_split
score, split_val, lhs_idxs, rhs_idxs = find_best_split_for_variable(node, i)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 37, in find_best_split_for_variable
min_leaf=node.min_leaf)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 66, in logrank_statistics
event_observed_A=event_observed_a, event_observed_B=event_observed_b)
TypeError: logrank_test() takes at least 2 arguments (2 given)

2^32-1 too high for numpy randint "int32"

When generating random states for the fitting process, NumPy uses int32 as the dtype for the randomly generated integers on my machine, so the call on line 50 of RandomSurvivalForest.py (specifically the upper bound 2**32-1) raises an out-of-bounds error. In my own copy I changed the value to 2**16-1 and it works fine, but I wasn't sure whether you would prefer to fix this a different way.

File "C:\Program Files\Python38\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 50, in fit
self.random_states = np.random.RandomState(seed=self.random_state).randint(0, 2**32-1, self.n_estimators)
File "mtrand.pyx", line 745, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1360, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

I'm working on a Windows 10 machine with Python 3.8.1 and NumPy 1.18.1.
I downloaded random-survival-forest a couple of weeks ago, so I don't think anything relevant to this has changed since then.

Thanks!
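On Windows, NumPy releases before 2.0 default to a 32-bit integer for `randint`, so an upper bound of 2**32-1 does not fit. A minimal sketch of an alternative to lowering the bound follows: request a 64-bit dtype explicitly through the `dtype` argument of `RandomState.randint`. The helper name `draw_tree_seeds` is hypothetical and this is not the project's actual fix.

```python
import numpy as np


def draw_tree_seeds(random_state, n_estimators):
    """Draw one seed per tree, independent of the platform default int."""
    rng = np.random.RandomState(seed=random_state)
    # dtype=np.int64 keeps the 2**32 - 1 bound valid on every platform.
    return rng.randint(0, 2**32 - 1, size=n_estimators, dtype=np.int64)


seeds = draw_tree_seeds(42, 5)

# Each drawn value is itself a valid RandomState seed (0 .. 2**32 - 1).
tree_rngs = [np.random.RandomState(int(s)) for s in seeds]
```

Keeping the full 32-bit range preserves more distinct seeds than the 2**16-1 workaround, although either choice is enough for a forest with a modest number of trees.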
