Comments (5)
This R script recreates the memory bug in the R C5.0 code.
Technically, the bug is not in the R code but in the C code invoked by the
CRAN C50 package. See the handling of dynamic memory allocation (use of calloc,
malloc, realloc, and free) in that code. Running this script results in a
segmentation fault on Linux if invoked through rApache, or in a
fatal session error if run in RStudio.
I have tested this with older versions of R C5.0 and found the same results,
so this segmentation fault bug has been around for a while.
Tested with the following versions:
- C5.0 version 0.0.9
- C5.0 version 0.1.0-013
- C5.0 version 0.1.0-15
- C5.0 version 0.1.0-16
Author: Tony Boarman
Date: 02/27/2014
runMyTest <- function() {
  # In my testing, running these scenarios has always recreated the memory allocation error.
  # If it does not, call runTest with higher times-to-run values.
  runTest(7, 20)
}
runTest <- function( timesToRunType1, timesToRunType2 ) {
  # library() stops with an error if C50 is not installed; require() would only warn
  library(C50)
  control <- C5.0Control(minCases = 30)
  # Run the Type 1 Rule Model several times so that dynamic memory is mishandled
  runConsecutiveType1RuleModels( timesToRunType1, control )
  # Now change things up so that the C5.0 code tries to address the dynamic memory differently
  runConsecutiveType2RuleModels( timesToRunType2, control )
}
runConsecutiveType1RuleModels <- function( timesToRun, control ) {
  # Type 1 Rule Model uses numeric predictor fields and will result in a Rule Model with 10 Rules.
  # size = 10
  # Test Data Setup
  columnClasses <- c(FN_FN_LOYAL="factor", QC_CONTINU="numeric", QC_COMMIT="numeric", QC_ALOTDF="numeric")
  formula <- as.formula(`FN_FN_LOYAL` ~ `QC_CONTINU` + `QC_COMMIT` + `QC_ALOTDF`)
  csv <- getHappyCaseResultData()
  # Create as dataframe
  caseData <- convertCsvToCaseDataFrame( csv, columnClasses )
  runRuleModels( timesToRun, formula, caseData, control )
}
runConsecutiveType2RuleModels <- function( timesToRun, control ) {
  # Type 2 Rule Model uses factor predictor fields and will result in a Rule Model where no Rules can be calculated.
  # size = 0
  # Test Data Setup
  columnClasses <- c(FN_FN_LOYAL="factor", SC_CUST_CLASS="factor", SC_GROUP_ID="factor", SC_LANGUAGE_ID="factor", SC_SIC_CODE="factor", QC_BENCHMARK="factor", RC_LAST_SECTION="factor", QC_B_ALOTDF="factor")
  formula <- as.formula(`FN_FN_LOYAL` ~ `SC_CUST_CLASS` + `SC_GROUP_ID` + `SC_LANGUAGE_ID` + `SC_SIC_CODE` + `QC_BENCHMARK` + `RC_LAST_SECTION` + `QC_B_ALOTDF`)
  csv <- getNoRulesResultData()
  # Create as dataframe
  caseData <- convertCsvToCaseDataFrame( csv, columnClasses )
  runRuleModels( timesToRun, formula, caseData, control )
}
runRuleModels <- function( timesToRun, formula, caseData, control ) {
  str(sprintf("Running %s times using formula: ", timesToRun))
  print(formula)
  # Simple loop to repeat the Rule Model calculation.
  # The model is calculated several times so that dynamic memory is first deallocated incorrectly
  # and a later allocation then uses the mishandled memory. This results in a segmentation fault on
  # Linux if sent through rApache, or in a fatal session error if run in RStudio. The actual problem
  # is with the memory management in the C code for C5.0; see the handling of dynamic memory
  # allocation (use of calloc, malloc, realloc, and free).
  if (timesToRun > 0) {
    for (i in 1:timesToRun) {
      str(sprintf("Run #: %s", i))
      runRuleModel(formula, caseData, control)
    }
  }
}
runRuleModel <- function( formula, caseData, control ) {
  ruleModel <- C5.0(formula = formula, data = caseData, rules = TRUE, control = control)
  str(sprintf("Number of Rules: %s", ruleModel$size))
}
convertCsvToCaseDataFrame <- function( csv, columnClasses ) {
  # Parse the CSV text and return it as a data frame
  read.csv(
    header = TRUE, # first row of CSV is header row
    text = csv,
    fill = FALSE,
    comment.char = "",
    colClasses = columnClasses,
    check.names = FALSE,
    na.strings = c("")
  )
}
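Since the crash takes the hosting R session down with it, one option is to run the reproduction in a separate Rscript process and only inspect its exit status. The following is a minimal sketch using base R only; `runInChildProcess` is a hypothetical helper name, not part of the script above or of the C50 package.

```r
# Hypothetical helper (not part of the original script): evaluate R code in a
# child Rscript process and return its exit status, so that a crash in the
# child does not take down the calling session.
runInChildProcess <- function(exprText) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # system2() returns the exit status of the command when output is not captured
  system2(rscript, args = c("-e", shQuote(exprText)))
}

# A child that exits normally reports status 0
status <- runInChildProcess("quit(save = 'no', status = 0)")
```

Assuming the script and its data functions were saved to a file (say `c50_repro.R`, a made-up name), a call such as `runInChildProcess("source('c50_repro.R'); runMyTest()")` should return a non-zero status when the child crashes, while the parent session keeps running.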
from c5.0.
If you work on this issue, please contact me and I will send you the above R script with the hard-coded data functions that you will need to run the code. These seemed a bit too big to post as a comment.
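For anyone who wants to exercise the script before getting the original data, here is a minimal sketch of stand-in data functions. Only the function names and column names come from the script above; the values, the factor levels, the row count, and the `makeCsvText` helper are all assumptions. Because this is synthetic data, it may not produce the same rule counts or trigger the fault.

```r
# Hypothetical stand-ins for the data functions that were not posted.
# All values are synthetic; only the column names come from the script.
makeCsvText <- function(df) {
  # Serialize a data frame to one CSV string with a header row
  paste(capture.output(write.csv(df, row.names = FALSE, quote = FALSE)),
        collapse = "\n")
}

getHappyCaseResultData <- function(n = 500, seed = 1) {
  set.seed(seed)
  continu <- round(runif(n, 1, 5), 2)
  commit  <- round(runif(n, 1, 5), 2)
  alotdf  <- round(runif(n, 1, 5), 2)
  # Tie the class to the numeric predictors so there is something to learn
  loyal <- ifelse(continu + commit + alotdf > 9, "Y", "N")
  makeCsvText(data.frame(FN_FN_LOYAL = loyal, QC_CONTINU = continu,
                         QC_COMMIT = commit, QC_ALOTDF = alotdf))
}

getNoRulesResultData <- function(n = 500, seed = 2) {
  set.seed(seed)
  pick <- function(values) sample(values, n, replace = TRUE)
  # The class is drawn independently of the factor predictors (pure noise)
  makeCsvText(data.frame(
    FN_FN_LOYAL     = pick(c("Y", "N")),
    SC_CUST_CLASS   = pick(c("A", "B", "C")),
    SC_GROUP_ID     = pick(c("G1", "G2", "G3")),
    SC_LANGUAGE_ID  = pick(c("EN", "FR")),
    SC_SIC_CODE     = pick(c("S10", "S20", "S30")),
    QC_BENCHMARK    = pick(c("LOW", "HIGH")),
    RC_LAST_SECTION = pick(c("X", "Y", "Z")),
    QC_B_ALOTDF     = pick(c("LOW", "MID", "HIGH"))
  ))
}
```

Both functions return CSV text, which is the form `convertCsvToCaseDataFrame` expects for its `csv` argument.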
Note: If running in rApache, the segmentation fault core dump shows the error occurring while trying to invoke calloc from the standard C library:
[Fri Jan 24 13:24:36 2014] [notice] child pid 6628 exit signal Segmentation fault (11), possible coredump in /tmp
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f3c455445b0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x30a36760e6)[0x7f3c41a5f0e6]
/lib64/libc.so.6(+0x30a3679f0a)[0x7f3c41a62f0a]
/lib64/libc.so.6(__libc_calloc+0xc6)[0x7f3c41a635a6]
/usr/lib64/R/library/C50/libs/C50.so(Pcalloc+0x17)[0x7f3c341d9857]
/usr/lib64/R/library/C50/libs/C50.so(SiftRules+0x3d5)[0x7f3c341d3005]
/usr/lib64/R/library/C50/libs/C50.so(FormRules+0x346)[0x7f3c341bc656]
/usr/lib64/R/library/C50/libs/C50.so(ConstructClassifiers+0x478)[0x7f3c341b6aa8]
/usr/lib64/R/library/C50/libs/C50.so(c50main+0x269)[0x7f3c341cd279]
/usr/lib64/R/library/C50/libs/C50.so(+0x2ffed)[0x7f3c341d5fed]
/usr/lib64/R/lib/libR.so(+0x33fc293b27)[0x7f3c38898b27]
/usr/lib64/R/lib/libR.so(Rf_eval+0x6e3)[0x7f3c388cb323]
/usr/lib64/R/lib/libR.so(+0x33fc2c9328)[0x7f3c388ce328]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(+0x33fc2c9510)[0x7f3c388ce510]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(Rf_eval+0x20d)[0x7f3c388cae4d]
/usr/lib64/R/lib/libR.so(+0x33fc2c9328)[0x7f3c388ce328]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(+0x33fc2c9510)[0x7f3c388ce510]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(+0x33fc2f94e3)[0x7f3c388fe4e3]
/usr/lib64/R/lib/libR.so(+0x33fc2f99ca)[0x7f3c388fe9ca]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(Rf_eval+0x20d)[0x7f3c388cae4d]
@tboarman: you should contact the author at http://www.rulequest.com/contact.html (Ross Quinlan). I only keep a copy of the source here for my own uses. Thanks.
FYI: Contacted Max, the maintainer of C5.0 regarding this issue. It was a memory management/cleanup issue with the C5.0 software. This problem will be fixed by the new C5.0 version 0.1.0-17 soon to be released.