Comments (5)

tboarman commented on June 3, 2024

This R script will recreate the memory bug in the R C5.0 code.
Technically, the bug is not in the R code but in the C code invoked by the
CRAN C50 package. See the handling of dynamic memory allocation (use of calloc,
malloc, realloc, and free) in that code. Running this will result in a
segmentation fault on Linux if invoked through rApache, or in a
fatal session error if performed in RStudio.

I have tested this with older versions of R C5.0 and found the same results.
So this segmentation fault bug has been around for a while.
Tested with the following versions:

  • C5.0 version 0.0.9

  • C5.0 version 0.1.0-013

  • C5.0 version 0.1.0-15

  • C5.0 version 0.1.0-16

Author: Tony Boarman
Date: 02/27/2014

runMyTest <- function() {
# My testing has shown that running these scenarios always recreates the memory allocation error.
# If it does not, call runTest with higher times-to-run values.
runTest(7,20)
}

runTest <- function( timesToRunType1, timesToRunType2 ) {

library(C50)  # stops with an error if the C50 package is not installed

control <- C5.0Control(minCases= 30)

# Run the Type 1 Rule Model several times so that dynamic memory gets mishandled
runConsecutiveType1RuleModels( timesToRunType1, control )

# Now, change things up so that the C5.0 code will try to address the dynamic memory differently
runConsecutiveType2RuleModels( timesToRunType2, control )

}

runConsecutiveType1RuleModels <- function( timesToRun, control ) {

# Type 1 Rule Model uses numeric predictor fields and will result in a Rule Model with 10 Rules.
# size = 10

#Test Data Setup
columnClasses <- c(FN_FN_LOYAL="factor", QC_CONTINU="numeric", QC_COMMIT="numeric", QC_ALOTDF="numeric")

formula <- as.formula(`FN_FN_LOYAL` ~ `QC_CONTINU` + `QC_COMMIT` + `QC_ALOTDF`)

csv <- getHappyCaseResultData()

#Create as dataframe
caseData <- convertCsvToCaseDataFrame( csv, columnClasses )

runRuleModels( timesToRun, formula, caseData, control )

}

runConsecutiveType2RuleModels <- function( timesToRun, control ) {

# Type 2 Rule Model uses factor predictor fields and will result in a Rule Model where no Rules can be calculated.
# size = 0.

#Test Data Setup
columnClasses <- c(FN_FN_LOYAL="factor", SC_CUST_CLASS="factor", SC_GROUP_ID="factor", SC_LANGUAGE_ID="factor", SC_SIC_CODE="factor", QC_BENCHMARK="factor", RC_LAST_SECTION="factor", QC_B_ALOTDF="factor")

formula <- as.formula(`FN_FN_LOYAL` ~ `SC_CUST_CLASS` + `SC_GROUP_ID` + `SC_LANGUAGE_ID` + `SC_SIC_CODE` + `QC_BENCHMARK` + `RC_LAST_SECTION` + `QC_B_ALOTDF`)

csv <- getNoRulesResultData()

#Create as dataframe
caseData <- convertCsvToCaseDataFrame( csv, columnClasses )

runRuleModels( timesToRun, formula, caseData, control )

}

runRuleModels <- function( timesToRun, formula, caseData, control ) {

cat(sprintf("Running %s times using formula:\n", timesToRun))
print(formula)

# Simple loop to repeat Rule Model calculation.
# The model is calculated several times so that dynamic memory is first deallocated incorrectly and the
# mishandled memory is then allocated again.  This will result in a segmentation fault on Linux if sent
# through rApache, or in a fatal session error if performed in RStudio.  The actual problem is
# with the memory management in the C code for C5.0; see the handling of dynamic memory allocation (use of
# calloc, malloc, realloc, and free).
if (timesToRun > 0) {
    for(i in 1:timesToRun) {
        cat(sprintf("Run #: %s\n", i))
        runRuleModel(formula, caseData, control)
    }
}

}

runRuleModel <- function(formula, caseData, control ) {

ruleModel <- C5.0(formula= formula, data= caseData, rules= TRUE, control= control)

cat(sprintf("Number of Rules: %s\n", ruleModel$size))

}

convertCsvToCaseDataFrame <- function( csv, columnClasses ) {

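# Parse the CSV text into a data frame
caseData <- read.csv(
        header = TRUE,  # first row of CSV is header row
        text = csv,
        fill = FALSE,
        comment.char = "",
        colClasses = columnClasses,
        check.names = FALSE,
        na.strings = c("")
)

# Return the data frame visibly instead of relying on the invisible value of the assignment
caseData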

}
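
Because the crash takes the whole R session down with it, one way to observe it without losing an interactive session is to run the script in a child process and inspect the exit status. The following is a minimal sketch under assumptions, not part of the original script: `runInChildProcess` is a hypothetical helper, `c50_repro.R` in the usage comment is a hypothetical file name for the script above saved to disk, and a Unix-like system with `Rscript` in `R.home("bin")` is assumed.

```r
# Hypothetical helper (not part of the original script): evaluate an R
# expression in a fresh Rscript process and return that process's exit status.
runInChildProcess <- function(expr) {
    rscript <- file.path(R.home("bin"), "Rscript")
    # With stdout/stderr set to FALSE the child's output is discarded and
    # system2() returns the exit status (0 means the child finished normally).
    status <- system2(rscript,
                      c("--vanilla", "-e", shQuote(expr)),
                      stdout = FALSE, stderr = FALSE)
    status
}

# Usage sketch (file name is an assumption):
# status <- runInChildProcess('source("c50_repro.R"); runTest(7, 20)')
# if (status != 0) cat("Child R process exited abnormally, status:", status, "\n")
```

A non-zero status from the child suggests the run died (for example from the segmentation fault), while the parent session stays alive.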

from c5.0.

tboarman commented on June 3, 2024

If you work on this issue, please contact me and I will send you the above R script with the hard-coded data functions that you will need to run the code. These seemed a bit too big to post as a comment.
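
The data functions (`getHappyCaseResultData` and `getNoRulesResultData`) are not included in this thread. For anyone who only wants to exercise the script end to end, a hypothetical stand-in for the Type 1 data could generate random CSV text with the same column names. This is a sketch under assumptions: the function name `makeSyntheticType1Csv`, the class labels `Y`/`N`, and the value ranges are invented for illustration, and random data is not known to reproduce the crash.

```r
# Hypothetical stand-in for getHappyCaseResultData(). The values are random
# and are NOT the data from the original report.
makeSyntheticType1Csv <- function(nRows = 200, seed = 1) {
    set.seed(seed)
    cases <- data.frame(
        FN_FN_LOYAL = sample(c("Y", "N"), nRows, replace = TRUE),  # assumed labels
        QC_CONTINU  = round(runif(nRows, 1, 5), 2),                # assumed range
        QC_COMMIT   = round(runif(nRows, 1, 5), 2),
        QC_ALOTDF   = round(runif(nRows, 1, 5), 2)
    )
    # Serialize to CSV text (header row included) and join into one string,
    # which is the form read.csv(text = ...) expects in the script.
    paste(capture.output(write.csv(cases, row.names = FALSE)), collapse = "\n")
}

# Read the text back the same way convertCsvToCaseDataFrame() does.
columnClasses <- c(FN_FN_LOYAL = "factor", QC_CONTINU = "numeric",
                   QC_COMMIT = "numeric", QC_ALOTDF = "numeric")
caseData <- read.csv(text = makeSyntheticType1Csv(50), header = TRUE,
                     colClasses = columnClasses, check.names = FALSE)
str(caseData)
```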

tboarman commented on June 3, 2024

Note: If running in rApache, the segmentation fault core dump shows the error occurring while trying to invoke calloc from the standard C library:

[Fri Jan 24 13:24:36 2014] [notice] child pid 6628 exit signal Segmentation fault (11), possible coredump in /tmp
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f3c455445b0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x30a36760e6)[0x7f3c41a5f0e6]
/lib64/libc.so.6(+0x30a3679f0a)[0x7f3c41a62f0a]
/lib64/libc.so.6(__libc_calloc+0xc6)[0x7f3c41a635a6]
/usr/lib64/R/library/C50/libs/C50.so(Pcalloc+0x17)[0x7f3c341d9857]
/usr/lib64/R/library/C50/libs/C50.so(SiftRules+0x3d5)[0x7f3c341d3005]
/usr/lib64/R/library/C50/libs/C50.so(FormRules+0x346)[0x7f3c341bc656]
/usr/lib64/R/library/C50/libs/C50.so(ConstructClassifiers+0x478)[0x7f3c341b6aa8]
/usr/lib64/R/library/C50/libs/C50.so(c50main+0x269)[0x7f3c341cd279]
/usr/lib64/R/library/C50/libs/C50.so(+0x2ffed)[0x7f3c341d5fed]
/usr/lib64/R/lib/libR.so(+0x33fc293b27)[0x7f3c38898b27]
/usr/lib64/R/lib/libR.so(Rf_eval+0x6e3)[0x7f3c388cb323]
/usr/lib64/R/lib/libR.so(+0x33fc2c9328)[0x7f3c388ce328]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(+0x33fc2c9510)[0x7f3c388ce510]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(Rf_eval+0x20d)[0x7f3c388cae4d]
/usr/lib64/R/lib/libR.so(+0x33fc2c9328)[0x7f3c388ce328]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(+0x33fc2c9510)[0x7f3c388ce510]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(+0x33fc2f94e3)[0x7f3c388fe4e3]
/usr/lib64/R/lib/libR.so(+0x33fc2f99ca)[0x7f3c388fe9ca]
/usr/lib64/R/lib/libR.so(Rf_eval+0x4dc)[0x7f3c388cb11c]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x341)[0x7f3c388bda71]
/usr/lib64/R/lib/libR.so(Rf_eval+0x20d)[0x7f3c388cae4d]
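
To read the call chain inside the C50 library at a glance, the named `C50.so` frames can be pulled out of a backtrace like the one above. This is a small illustrative sketch; `extractC50Frames` is a hypothetical helper name, and the sample lines are copied from the backtrace in this comment.

```r
# Hypothetical helper: return the symbol names of frames that belong to C50.so.
# Frames without a symbol name, such as C50.so(+0x2ffed), are skipped.
extractC50Frames <- function(lines) {
    pattern <- "C50\\.so\\(([A-Za-z_][A-Za-z0-9_]*)\\+0x[0-9a-f]+\\)"
    matches <- regmatches(lines, regexec(pattern, lines))
    # Each match is c(full match, symbol); non-matching lines give character(0).
    unlist(lapply(matches, function(m) if (length(m) == 2) m[2] else NULL))
}

backtrace <- c(
    "/lib64/libc.so.6(__libc_calloc+0xc6)[0x7f3c41a635a6]",
    "/usr/lib64/R/library/C50/libs/C50.so(Pcalloc+0x17)[0x7f3c341d9857]",
    "/usr/lib64/R/library/C50/libs/C50.so(SiftRules+0x3d5)[0x7f3c341d3005]",
    "/usr/lib64/R/library/C50/libs/C50.so(FormRules+0x346)[0x7f3c341bc656]",
    "/usr/lib64/R/library/C50/libs/C50.so(+0x2ffed)[0x7f3c341d5fed]"
)

# Frames are listed innermost first, so Pcalloc was called from SiftRules,
# which was called from FormRules.
print(extractC50Frames(backtrace))
```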

ndawe commented on June 3, 2024

@tboarman: you should contact the author at http://www.rulequest.com/contact.html (Ross Quinlan). I only keep a copy of the source here for my own uses. Thanks.

tboarman commented on June 3, 2024

FYI: I contacted Max, the maintainer of C5.0, regarding this issue. It was a memory management/cleanup issue in the C5.0 software. The problem will be fixed in the new C5.0 version 0.1.0-17, which is soon to be released.
