julianspaeth / random-survival-forest

A Random Survival Forest implementation for Python inspired by Ishwaran et al. - Easily understandable, adaptable and extendable.

License: Other

Python 98.90% Shell 1.10%

python survival-analysis survival-prediction random-survival-forests random-forest machine-learning

random-survival-forest's Issues

Starting code

@julianspaeth
Hi Julian.
What are x and y, and what are x_val and y_val?
I think you mean x and x_val are the same.
I would also like to know whether they should be DataFrames rather than arrays.
Also, what type is y_val['time']?

If you described the expected dataset for x and y specifically, it would be easier for everyone to understand :)

Increase in running time for fit function

I am fitting this RSF module on a dataset that has 12K rows. The fit function has been running for the past 2 days without producing any output. Can someone please confirm whether this is expected?

Also, how does this address an imbalanced dataset?

Memory Error:

The fit method keeps raising a MemoryError without any additional information. It looks like it is getting hung up on the oob score. This is the traceback I get. Is there a DataFrame size that works well for you? I'm training on about 90,000 samples with about 15 feature columns. Thanks for any help you can give.

MemoryError Traceback (most recent call last)
in
1 timeline = range(0, 10, 1)
2 rsf = RandomSurvivalForest(n_estimators=10, timeline=timeline, n_jobs = -1)
----> 3 rsf.fit(X_train, y_train)
4 round(rsf.oob_score, 3)
5 0.76

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in fit(self, x, y)
59 self.bootstraps.append(self.bootstrap_idxs[i])
60
---> 61 self.oob_score = self.compute_oob_score(x, y)
62
63 return self

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in compute_oob_score(self, x, y)
99 """
100 oob_ensembles = self.compute_oob_ensembles(x)
--> 101 c = concordance_index(y_time=y.iloc[:, 0], y_pred=oob_ensembles, y_event=y.iloc[:, 1])
102 return c
103

C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\scoring.py in concordance_index(y_time, y_pred, y_event)
11 """
12 oob_predicted_outcome = [x.sum() for x in y_pred]
---> 13 possible_pairs = list(combinations(range(len(y_pred)), 2))
14 concordance = 0
15 permissible = 0
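
The traceback stops at `list(combinations(range(len(y_pred)), 2))`, which materializes every index pair at once. With about 90,000 samples that is roughly 4 billion tuples. The following is a minimal sketch of one way to avoid that allocation by iterating over the pairs lazily. `concordance_index_lazy` is a hypothetical helper written for illustration, not the library's `concordance_index`. It assumes `y_pred` already holds one risk score per sample (higher meaning an earlier expected event) and, as a simplification, skips pairs with tied times.

```python
from itertools import combinations
from math import comb


def concordance_index_lazy(y_time, y_pred, y_event):
    """Concordance index computed by streaming over index pairs.

    y_time:  observed times, one per sample
    y_pred:  risk scores, higher = event expected earlier (assumption)
    y_event: 1 if the event was observed, 0 if censored
    """
    concordance = 0.0
    permissible = 0
    # combinations() is a generator: without list(), only one pair exists at a time.
    for i, j in combinations(range(len(y_time)), 2):
        # Order the pair so that `a` is the sample with the shorter observed time.
        a, b = (i, j) if y_time[i] <= y_time[j] else (j, i)
        if y_time[a] == y_time[b]:
            continue  # simplification: tied times are skipped
        if not y_event[a]:
            continue  # shorter time is censored, so the pair is not comparable
        permissible += 1
        if y_pred[a] > y_pred[b]:
            concordance += 1.0
        elif y_pred[a] == y_pred[b]:
            concordance += 0.5
    return concordance / permissible if permissible else float("nan")


# Number of pairs the original list() call would have to hold in memory.
pairs_for_90k_samples = comb(90_000, 2)
```

This only removes the memory spike. The loop is still quadratic in the number of samples, so the run time for 90,000 rows remains large.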

invalid imports in node.py

If the survival time is 'float' object

Hi, I have a question about fitting the model.
If the survival time is a 'float' object, fitting fails with
TypeError: 'float' object cannot be interpreted as an integer
raised in "RandomSurvivalForest.py", line 46, in fit:
self.timeline = range(y.iloc[:, 0].min(), y.iloc[:, 0].max(), 1)
Is there a solution for this?
Thanks.
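
The error comes from Python's built-in `range()`, which accepts only integers. A minimal sketch of one possible workaround is shown below, assuming it is acceptable to snap the time grid to whole units. `integer_timeline` is a hypothetical helper, not part of the library. The `RandomSurvivalForest(..., timeline=timeline, ...)` call in the MemoryError traceback earlier on this page suggests that at least some versions accept an explicit timeline, but that is an assumption to check against the installed version.

```python
import math


def integer_timeline(times):
    """Return an integer range covering floor(min) .. ceil(max) of float times."""
    start = math.floor(min(times))
    stop = math.ceil(max(times))
    # range() excludes its stop value, so add 1 to keep the last time point.
    return range(start, stop + 1)


# Example with float survival times (illustrative values only).
timeline = integer_timeline([0.5, 2.3, 7.9])
```

Another option under the same assumption is to round or cast the time column to integers before calling fit.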

Error during model fitting

rsf.fit(X_train, Y_train)

File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 54, in fit
for i in range(self.n_estimators))
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 921, in __call__
if self.dispatch_one_batch(iterator):
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "C:\Anaconda2\lib\site-packages\joblib\_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "C:\Anaconda2\lib\site-packages\joblib\_parallel_backends.py", line 549, in __init__
self.results = batch()
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 81, in create_tree
unique_deaths=self.unique_deaths, min_leaf=self.min_leaf, random_state=self.random_states[i])
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 36, in __init__
self.grow_tree()
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 45, in grow_tree
self.score, self.split_val, self.split_var, lhs_idxs_opt, rhs_idxs_opt = find_split(self)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 16, in find_split
score, split_val, lhs_idxs, rhs_idxs = find_best_split_for_variable(node, i)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 37, in find_best_split_for_variable
min_leaf=node.min_leaf)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 66, in logrank_statistics
event_observed_A=event_observed_a, event_observed_B=event_observed_b)
TypeError: logrank_test() takes at least 2 arguments (2 given)

2^32-1 too high for numpy randint "int32"

When generating random states for the fitting process, numpy for whatever reason uses int32 as the dtype for the randomly generated integers, so the call on line 50 of RandomSurvivalForest.py (specifically the high bound 2**32-1) throws an out-of-bounds error. In my own copy I just changed the value to 2**16-1 and it works fine, but I wasn't sure whether you would prefer to remedy this a different way.

File "C:\Program Files\Python38\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 50, in fit
self.random_states = np.random.RandomState(seed=self.random_state).randint(0, 2**32-1, self.n_estimators)
File "mtrand.pyx", line 745, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1360, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

I'm working on a Windows 10 machine. The Python version is 3.8.1 and the numpy version is 1.18.1.
I downloaded random-survival-forest a couple of weeks ago, so I don't think anything relevant to this has changed since then.

Thanks!
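
The `_rand_int32` frame in the traceback shows the integers being drawn as 32-bit values, and the largest signed 32-bit integer is 2**31-1, so a high bound of 2**32-1 cannot be represented. A minimal sketch of drawing per-tree seeds that stay inside that range follows. It uses Python's standard `random` module as a stand-in for the numpy call, and `make_tree_seeds` is a hypothetical helper, not the library's code.

```python
import random

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def make_tree_seeds(random_state, n_estimators):
    """Draw one reproducible seed per tree, each within the signed 32-bit range."""
    rng = random.Random(random_state)
    # randrange() excludes the upper bound, so every seed is below INT32_MAX.
    return [rng.randrange(0, INT32_MAX) for _ in range(n_estimators)]


seeds = make_tree_seeds(random_state=42, n_estimators=10)
```

The 2**16-1 bound mentioned in the issue also works, but it leaves only 65,535 distinct seeds. Any bound up to 2**31-1 keeps the values valid for a 32-bit dtype.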


