julianspaeth / random-survival-forest
A Random Survival Forest implementation for Python inspired by Ishwaran et al. - Easily understandable, adaptable and extendable.
License: Other
@julianspaeth
Hi Julian.
What are x and y, and what are x_val and y_val?
I think x and x_val are meant to have the same format.
I would also like to know whether they should be DataFrames rather than arrays.
Also, what type is y_val['time']?
If you described the expected dataset for x and y in more detail, it would be easier for everyone to understand :)
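One reading of the expected inputs, based on the tracebacks in this thread (fit reads y.iloc[:, 0] as the time and y.iloc[:, 1] as the event indicator): x is a DataFrame of features and y is a two-column DataFrame. The column names `time` and `event`, the feature names, and all values below are assumptions made for illustration, not a layout the library is confirmed to require.

```python
import pandas as pd

# Hypothetical feature table: one row per subject, one column per covariate.
x = pd.DataFrame({
    "age": [63, 51, 70, 45],
    "biomarker": [1.2, 0.7, 2.1, 0.4],
})

# Hypothetical outcome table in the layout the traceback implies:
# column 0 = observed time, column 1 = event indicator (1 = event, 0 = censored).
y = pd.DataFrame({
    "time": [5, 8, 2, 9],
    "event": [1, 0, 1, 0],
})

# A validation split would use the same columns and dtypes as the training data.
x_val, y_val = x.iloc[2:], y.iloc[2:]

print(y_val)
```

Integer times are used here because another report in this thread shows fit failing on float survival times.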
I am running this RSF module on a dataset with 12K rows. The fit function has been running for the past 2 days without producing any output. Can someone please confirm whether this is expected?
Also, how does this implementation handle an imbalanced dataset?
The fit method keeps raising a MemoryError without any additional information. It looks like it is getting stuck on the OOB score computation. This is the traceback I get. Is there a DataFrame size that works well for you? I'm training on about 90,000 samples with about 15 feature columns. Thanks for any help you can give.
MemoryError Traceback (most recent call last)
in
1 timeline = range(0, 10, 1)
2 rsf = RandomSurvivalForest(n_estimators=10, timeline=timeline, n_jobs = -1)
----> 3 rsf.fit(X_train, y_train)
4 round(rsf.oob_score, 3)
5 0.76
C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in fit(self, x, y)
59 self.bootstraps.append(self.bootstrap_idxs[i])
60
---> 61 self.oob_score = self.compute_oob_score(x, y)
62
63 return self
C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\RandomSurvivalForest.py in compute_oob_score(self, x, y)
99 """
100 oob_ensembles = self.compute_oob_ensembles(x)
--> 101 c = concordance_index(y_time=y.iloc[:, 0], y_pred=oob_ensembles, y_event=y.iloc[:, 1])
102 return c
103
C:\ProgramData\Anaconda3\lib\site-packages\random_survival_forest\scoring.py in concordance_index(y_time, y_pred, y_event)
11 """
12 oob_predicted_outcome = [x.sum() for x in y_pred]
---> 13 possible_pairs = list(combinations(range(len(y_pred)), 2))
14 concordance = 0
15 permissible = 0
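The last frame materialises every index pair with list(combinations(...)). For n samples that is n(n-1)/2 tuples, roughly 4.05 billion for 90,000 rows, which is a plausible cause of the MemoryError. A minimal sketch of one alternative, iterating over the pairs lazily, follows. The function `lazy_concordance_index` and its pair rules (a simplified Harrell-style C-index that skips tied times) are assumptions for illustration, not the library's scoring code. The runtime stays quadratic; only the memory use drops.

```python
from itertools import combinations
from math import comb

def lazy_concordance_index(y_time, y_risk, y_event):
    """Simplified C-index; higher y_risk is assumed to mean an earlier event."""
    concordance = 0.0
    permissible = 0
    # combinations() is a generator: pairs are produced one at a time
    # instead of being stored in a list.
    for i, j in combinations(range(len(y_time)), 2):
        if y_time[i] == y_time[j]:
            continue  # tied times are skipped in this sketch
        # Order the pair so that `first` has the shorter observed time.
        first, second = (i, j) if y_time[i] < y_time[j] else (j, i)
        if not y_event[first]:
            continue  # shorter time is censored: pair is not comparable
        permissible += 1
        if y_risk[first] > y_risk[second]:
            concordance += 1.0
        elif y_risk[first] == y_risk[second]:
            concordance += 0.5
    return concordance / permissible if permissible else float("nan")

# Number of pairs the list() call in the traceback would have to hold in memory.
n_pairs = comb(90_000, 2)
print(n_pairs)

times, events = [1, 2, 3], [1, 1, 0]
print(lazy_concordance_index(times, [3, 2, 1], events))
```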
Hi, I have a question about fitting the model.
If the survival time is a float, an error occurs:
TypeError: 'float' object cannot be interpreted as an integer
This happens in RandomSurvivalForest.py, line 46, in fit:
self.timeline = range(y.iloc[:, 0].min(), y.iloc[:, 0].max(), 1)
Is there a solution for this?
Thanks.
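range() only accepts integers, so float survival times fail at that line. One possible workaround is to round the bounds outward before building the timeline. The helper `integer_timeline` below is hypothetical, and whether the rest of the library behaves correctly with float times after such a change is an assumption that would need checking.

```python
import math

def integer_timeline(times):
    """Build an integer range covering float survival times."""
    start = math.floor(min(times))   # round the lower bound down
    stop = math.ceil(max(times))     # round the upper bound up
    # Same shape as the quoted line: range(min, max, 1), with integer bounds.
    return range(start, stop, 1)

timeline = integer_timeline([0.5, 3.2, 7.9])
print(list(timeline))
```

The constructor call in the MemoryError report in this thread passes a timeline argument, which suggests that in at least one version a timeline built this way could be supplied directly.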
rsf.fit(X_train, Y_train)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 54, in fit
for i in range(self.n_estimators))
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 921, in call
if self.dispatch_one_batch(iterator):
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "C:\Anaconda2\lib\site-packages\joblib_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "C:\Anaconda2\lib\site-packages\joblib_parallel_backends.py", line 549, in init
self.results = batch()
File "C:\Anaconda2\lib\site-packages\joblib\parallel.py", line 225, in call
for func, args, kwargs in self.items]
File "C:\Anaconda2\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 81, in create_tree
unique_deaths=self.unique_deaths, min_leaf=self.min_leaf, random_state=self.random_states[i])
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 36, in init
self.grow_tree()
File "C:\Anaconda2\lib\site-packages\random_survival_forest\SurvivalTree.py", line 45, in grow_tree
self.score, self.split_val, self.split_var, lhs_idxs_opt, rhs_idxs_opt = find_split(self)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 16, in find_split
score, split_val, lhs_idxs, rhs_idxs = find_best_split_for_variable(node, i)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 37, in find_best_split_for_variable
min_leaf=node.min_leaf)
File "C:\Anaconda2\lib\site-packages\random_survival_forest\splitting.py", line 66, in logrank_statistics
event_observed_A=event_observed_a, event_observed_B=event_observed_b)
TypeError: logrank_test() takes at least 2 arguments (2 given)
When generating random states for the fitting process, numpy uses int32 as the dtype for the randomly generated integers on my machine, so the call on line 50 of RandomSurvivalForest.py (specifically the upper bound 2**32-1) raises an out-of-bounds error. In my own copy, I changed the value to 2**16-1 and it works fine, but I wasn't sure whether you would prefer to fix this a different way.
File "C:\Program Files\Python38\lib\site-packages\random_survival_forest\RandomSurvivalForest.py", line 50, in fit
self.random_states = np.random.RandomState(seed=self.random_state).randint(0, 2**32-1, self.n_estimators)
File "mtrand.pyx", line 745, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1360, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
I'm working on a Windows 10 machine with Python 3.8.1 and numpy 1.18.1.
I downloaded random-survival-forest a couple of weeks ago, so I don't think anything relevant to this has changed since then.
Thanks!
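On Windows, NumPy versions before 2.0 use a 32-bit default integer, which is why randint rejects a bound of 2**32-1 there. Rather than shrinking the bound, the dtype can be requested explicitly. This is a sketch of that idea, not the fix adopted by the library; the variable names mirror the quoted line, and the seed and estimator count are arbitrary.

```python
import numpy as np

n_estimators = 10
random_state = 42  # arbitrary seed for the example

# Asking for int64 makes the upper bound valid regardless of the platform's
# default integer width.
random_states = np.random.RandomState(seed=random_state).randint(
    0, 2**32 - 1, n_estimators, dtype=np.int64
)

print(random_states.dtype)
```

Every value drawn this way is still a valid seed for np.random.RandomState, which accepts integers from 0 to 2**32-1.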