asafschers / scoruby
Ruby Scoring API for PMML
License: MIT License
I trained a GBM regression tree model in R and tried loading it into Ruby with scoruby, but it seems this is not supported.
The r2pmml package produces a v4.3 .pmml file that does not contain any OutputField.
I only see:
<Targets>
<Target field="target" rescaleConstant="4.107037612571389"/>
</Targets>
The pmml package produces a v4.4 PMML file with something like this:
<Output>
<OutputField name="Predicted_target" optype="continuous" dataType="double" feature="predictedValue"/>
<OutputField name="GaussianPrediction" feature="transformedValue" dataType="double">
<Apply function="+">
<FieldRef field="Predicted_target"/>
<Constant>4.10703761257139</Constant>
</Apply>
</OutputField>
</Output>
I've looked at the scoruby source, and it seems the logic for detecting a GBM requires an OutputField. How can I help add support for regression GBMs?
Great work so far!
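To make the arithmetic behind both exports concrete, here is a minimal Ruby sketch. It assumes the ensemble combines its trees by summing their outputs (the segmentation method is not shown in the snippets above) and that the `rescaleConstant` from `<Target>` is then added, which matches the `Apply function="+"` in the second snippet. The helper name and the per-tree scores are hypothetical, not part of scoruby.

```ruby
# Hypothetical helper, not scoruby's API: combine per-tree outputs of a
# regression GBM, assuming the trees are summed and the Target's
# rescaleConstant is added afterwards.
def gbm_regression_prediction(tree_scores, rescale_constant)
  tree_scores.sum + rescale_constant
end

# Made-up per-tree scores for illustration only.
tree_scores = [0.12, -0.05, 0.03]

# Value taken from <Target field="target" rescaleConstant="..."/> above.
rescale_constant = 4.107037612571389

prediction = gbm_regression_prediction(tree_scores, rescale_constant)
puts prediction
```

Under this reading, a model with `<Targets>` but no `<Output>` still carries everything needed for a prediction, so detection would not have to depend on an OutputField.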
The classes in scoruby are not namespaced under the Scoruby module, so they interfere with generically named things in a Rails 5 project (especially the Features module under test).
Can you move all the top-level files under scoruby:
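A small Ruby sketch of what the requested namespacing would achieve. Both modules below are hypothetical stand-ins (the `owner` method exists only for illustration); the point is that a `Features` constant nested under `Scoruby` no longer collides with a top-level `Features` defined by the host application.

```ruby
# Hypothetical top-level module that a Rails project might already define.
module Features
  def self.owner
    'host application'
  end
end

# Sketch of the gem's module moved under its own namespace.
module Scoruby
  module Features
    def self.owner
      'scoruby'
    end
  end
end

# The two constants are now distinct and can coexist.
puts Features.owner
puts Scoruby::Features.owner
```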
Scoruby does not seem to tolerate the XML generated via sklearn2pmml. So far it's broken at the following steps:
model_factory
@score = xml.attribute('score').to_s
when calling .score(features)
A request to support a CART model (binary splits) rather than a CHAID model (multiple branches per split).
Here's an example of a CART model. The issue is that the exported PMML doesn't have the probability attribute,
so it's not scoring anything. Is this a non-standard format, or could scoruby support this?
Thanks and Happy New Year!
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<Header>
<Application name="JPMML-SkLearn" version="1.3.6"/>
<Timestamp>2017-12-19T22:11:27Z</Timestamp>
</Header>
<DataDictionary>
<DataField name="y" optype="categorical" dataType="integer">
<Value value="0"/>
<Value value="1"/>
</DataField>
<DataField name="total_nights" optype="continuous" dataType="float"/>
<DataField name="days_to_booking" optype="continuous" dataType="float"/>
<DataField name="ppd" optype="continuous" dataType="float"/>
<DataField name="business_traveler" optype="continuous" dataType="float"/>
<DataField name="percent_distance_avg_price" optype="continuous" dataType="float"/>
<DataField name="distance_to_city_km" optype="continuous" dataType="float"/>
</DataDictionary>
<TreeModel functionName="classification" splitCharacteristic="binarySplit">
<MiningSchema>
<MiningField name="y" usageType="target"/>
<MiningField name="total_nights"/>
<MiningField name="days_to_booking"/>
<MiningField name="ppd"/>
<MiningField name="business_traveler"/>
<MiningField name="percent_distance_avg_price"/>
<MiningField name="distance_to_city_km"/>
</MiningSchema>
<Output>
<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>
<Node id="1">
<True/>
<Node id="2">
<SimplePredicate field="ppd" operator="lessOrEqual" value="8.461737"/>
<Node id="10" score="0" recordCount="14960.0">
<SimplePredicate field="ppd" operator="lessOrEqual" value="3.2442136"/>
<ScoreDistribution value="0" recordCount="14939.0"/>
<ScoreDistribution value="1" recordCount="21.0"/>
</Node>
<Node id="11" score="0" recordCount="4071.0">
<SimplePredicate field="ppd" operator="greaterThan" value="3.2442136"/>
<ScoreDistribution value="0" recordCount="3525.0"/>
<ScoreDistribution value="1" recordCount="546.0"/>
</Node>
</Node>
<Node id="3">
<SimplePredicate field="ppd" operator="greaterThan" value="8.461737"/>
<Node id="4">
<SimplePredicate field="business_traveler" operator="lessOrEqual" value="0.5"/>
<Node id="6" score="1" recordCount="508.0">
<SimplePredicate field="percent_distance_avg_price" operator="lessOrEqual" value="6.7344605E-5"/>
<ScoreDistribution value="0" recordCount="5.0"/>
<ScoreDistribution value="1" recordCount="503.0"/>
</Node>
<Node id="7">
<SimplePredicate field="percent_distance_avg_price" operator="greaterThan" value="6.7344605E-5"/>
<Node id="8">
<SimplePredicate field="total_nights" operator="lessOrEqual" value="1.5"/>
<Node id="16" score="0" recordCount="486.0">
<SimplePredicate field="ppd" operator="lessOrEqual" value="9.76857"/>
<ScoreDistribution value="0" recordCount="287.0"/>
<ScoreDistribution value="1" recordCount="199.0"/>
</Node>
<Node id="17" score="1" recordCount="2492.0">
<SimplePredicate field="ppd" operator="greaterThan" value="9.76857"/>
<ScoreDistribution value="0" recordCount="892.0"/>
<ScoreDistribution value="1" recordCount="1600.0"/>
</Node>
</Node>
<Node id="9">
<SimplePredicate field="total_nights" operator="greaterThan" value="1.5"/>
<Node id="18" score="1" recordCount="568.0">
<SimplePredicate field="days_to_booking" operator="lessOrEqual" value="6.5"/>
<ScoreDistribution value="0" recordCount="256.0"/>
<ScoreDistribution value="1" recordCount="312.0"/>
</Node>
<Node id="19" score="0" recordCount="3390.0">
<SimplePredicate field="days_to_booking" operator="greaterThan" value="6.5"/>
<ScoreDistribution value="0" recordCount="2230.0"/>
<ScoreDistribution value="1" recordCount="1160.0"/>
</Node>
</Node>
</Node>
</Node>
<Node id="5">
<SimplePredicate field="business_traveler" operator="greaterThan" value="0.5"/>
<Node id="12" score="1" recordCount="454.0">
<SimplePredicate field="percent_distance_avg_price" operator="lessOrEqual" value="0.0038093738"/>
<ScoreDistribution value="0" recordCount="6.0"/>
<ScoreDistribution value="1" recordCount="448.0"/>
</Node>
<Node id="13">
<SimplePredicate field="percent_distance_avg_price" operator="greaterThan" value="0.0038093738"/>
<Node id="14">
<SimplePredicate field="days_to_booking" operator="lessOrEqual" value="15.5"/>
<Node id="20" score="1" recordCount="455.0">
<SimplePredicate field="ppd" operator="lessOrEqual" value="9.816237"/>
<ScoreDistribution value="0" recordCount="187.0"/>
<ScoreDistribution value="1" recordCount="268.0"/>
</Node>
<Node id="21" score="1" recordCount="2409.0">
<SimplePredicate field="ppd" operator="greaterThan" value="9.816237"/>
<ScoreDistribution value="0" recordCount="451.0"/>
<ScoreDistribution value="1" recordCount="1958.0"/>
</Node>
</Node>
<Node id="15">
<SimplePredicate field="days_to_booking" operator="greaterThan" value="15.5"/>
<Node id="22" score="1" recordCount="960.0">
<SimplePredicate field="distance_to_city_km" operator="lessOrEqual" value="4.05"/>
<ScoreDistribution value="0" recordCount="294.0"/>
<ScoreDistribution value="1" recordCount="666.0"/>
</Node>
<Node id="23" score="0" recordCount="730.0">
<SimplePredicate field="distance_to_city_km" operator="greaterThan" value="4.05"/>
<ScoreDistribution value="0" recordCount="377.0"/>
<ScoreDistribution value="1" recordCount="353.0"/>
</Node>
</Node>
</Node>
</Node>
</Node>
</Node>
</TreeModel>
</PMML>
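One possible way to score such a tree without the probability attribute is to derive each class probability from the `recordCount` values on a leaf's `ScoreDistribution` elements. The sketch below is an assumption about how that could work, not scoruby's implementation; the helper name is hypothetical, and the counts are copied from node 11 in the example above.

```ruby
# Hypothetical helper, not scoruby's API: turn ScoreDistribution record
# counts into class probabilities by dividing each count by the leaf total.
def probabilities_from_counts(counts)
  total = counts.values.sum
  return {} if total.zero?

  counts.transform_values { |count| count.to_f / total }
end

# Counts from <Node id="11"> above: value => recordCount.
node_11 = { '0' => 3525.0, '1' => 546.0 }

probabilities = probabilities_from_counts(node_11)
label, score = probabilities.max_by { |_value, probability| probability }

puts({ label: label, score: score.round(4) })
```

For node 11 this gives label "0" with a score of about 0.8659 (3525 / 4071), which agrees with the node's own `score="0"` attribute.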
Dear Asaf,
I've recently added your project to our RubyML list: https://github.com/arbox/machine-learning-with-ruby
I wonder if you want to participate in the Ruby for ML network. You could do this in one simple step by adding the rubyml
topic to your GitHub repository. You may also want to spread the word on Twitter or on other media :)
Thank you for the project!
Hey, I noticed a small issue (possibly just outdated) with the random forest wiki. The example uses:
random_forest.predict(features)
=> "0"
I think this should be:
random_forest.score(features)
=> {:label=>"0", :score=>0.882}
Hi,
Can we put in a request to add the Naive Bayes model, commonly used for spam detection?
We noticed you granted a similar request for Decision Trees :)
Cheers!
This project looks great. Can I put in a request to support decision trees? We want to use generated PMML files from large sets of hotel shopping data to score and decide on sort algorithms in real-time. Thanks!