asafschers / scoruby

Ruby Scoring API for PMML

License: MIT License

Ruby 99.74% Shell 0.26%
ruby-gem pmml random-forest classification rubyml machine-learning gradient-boosting-classifier gbm ruby gradient-boosted-models

scoruby's Issues

Support for GBM regression

I trained a GBM regression tree model in R and tried loading it into Ruby using scoruby, but it seems regression models are not supported.

The r2pmml package produces a v4.3 PMML file that does not contain any OutputField. I only see:

<Targets>
  <Target field="target" rescaleConstant="4.107037612571389"/>
</Targets>

The pmml package produces a v4.4 PMML file with something like this:

  <Output>
   <OutputField name="Predicted_target" optype="continuous" dataType="double" feature="predictedValue"/>
   <OutputField name="GaussianPrediction" feature="transformedValue" dataType="double">
    <Apply function="+">
     <FieldRef field="Predicted_target"/>
     <Constant>4.10703761257139</Constant>
    </Apply>
   </OutputField>
  </Output>

I've looked at the scoruby source, and it seems the logic for detecting a GBM requires an OutputField. How can I help add support for regression GBMs?

Great work so far!
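
For reference, a minimal Ruby sketch of how the r2pmml variant could be scored. It assumes the ensemble sums the per-tree scores and that the Target's rescaleConstant is then added, which matches the `+` Apply in the pmml package output above. The helpers `rescale_constant` and `gbm_regression_prediction` and the sample tree scores are hypothetical and are not part of scoruby.

```ruby
require 'rexml/document'

# Hypothetical helper, not part of scoruby: reads rescaleConstant from a
# <Targets> fragment, falling back to 0.0 when it is absent.
def rescale_constant(targets_xml)
  target = REXML::Document.new(targets_xml).root.elements['Target']
  return 0.0 if target.nil?

  (target.attributes['rescaleConstant'] || '0').to_f
end

# Hypothetical helper: sums the per-tree scores and adds the constant.
def gbm_regression_prediction(tree_scores, constant)
  tree_scores.sum + constant
end

targets_xml = <<~XML
  <Targets>
    <Target field="target" rescaleConstant="4.107037612571389"/>
  </Targets>
XML

constant = rescale_constant(targets_xml)
# Made-up leaf scores standing in for three scored trees.
prediction = gbm_regression_prediction([0.25, -0.5, 0.125], constant)
puts prediction
```

With these made-up tree scores the prediction is roughly 3.982, i.e. the constant shifted by the summed scores.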

Namespace the files under lib directory

The classes in scoruby are not namespaced under a Scoruby module, so they interfere with generically named constants in a Rails 5 project (especially the Features module under test).

Can you move all the top-level files under scoruby:

  • decision.rb
  • features.rb
  • models_factory.rb
  • node.rb
  • predicate_factory.rb
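
A small sketch of what the requested namespacing buys. The `Scoruby::Features` class body here is illustrative only and does not reproduce the gem's real class; the top-level `Features` module stands in for whatever the host application defines.

```ruby
# An application-level constant, as a Rails project might define (hypothetical).
module Features
  def self.source
    'application'
  end
end

# Sketch of a gem class nested under a Scoruby namespace. If this class were
# defined at the top level instead, Ruby would raise a TypeError because
# Features is already a module.
module Scoruby
  class Features
    attr_reader :formatted

    def initialize(features)
      # Normalize keys to symbols so callers may pass string or symbol keys.
      @formatted = features.transform_keys(&:to_sym)
    end
  end
end

puts Features.source
puts Scoruby::Features.new('ppd' => 9.0).formatted.inspect
```

Both constants coexist because the gem's class lives at `Scoruby::Features` and never touches the top-level name.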

Support for sklearn2pmml generated PMML

Scoruby does not seem to tolerate the XML generated via sklearn2pmml. So far it's broken at the following steps:

  1. modelName check in model_factory
  2. @score = xml.attribute('score').to_s when calling .score(features)
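
One possible way around the first failure, sketched under the assumption that the model type can be inferred from the model element itself rather than from the optional modelName attribute. `detect_model` and `MODEL_ELEMENTS` are hypothetical names, and this is not how scoruby's factory is known to work.

```ruby
require 'rexml/document'

# Model elements this sketch recognizes (an assumption, not scoruby's list).
MODEL_ELEMENTS = %w[TreeModel MiningModel].freeze

# Hypothetical helper: describes the first recognized model element under
# the PMML root without requiring a modelName attribute.
def detect_model(pmml)
  root = REXML::Document.new(pmml).root
  model = root.children.grep(REXML::Element).find do |el|
    MODEL_ELEMENTS.include?(el.name)
  end
  return nil if model.nil?

  {
    element: model.name,
    function: model.attributes['functionName'],
    model_name: model.attributes['modelName'] # nil when the exporter omits it
  }
end

pmml = <<~XML
  <PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <Header/>
    <TreeModel functionName="classification" splitCharacteristic="binarySplit"/>
  </PMML>
XML

info = detect_model(pmml)
puts info.inspect
```

Here the element name and functionName are enough to pick a scorer, and a missing modelName simply comes back as nil.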

Support for CART (binary splits) vs CHAID

A request to support CART models (binary splits) rather than only CHAID models (multiple branches per split).

Here's an example of a CART model. The issue is that the exported PMML doesn't have the probability attribute on its ScoreDistribution elements, so scoruby isn't scoring anything. Is this a non-standard format, or could scoruby support it?

Thanks and Happy New Year!

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.3.6"/>
		<Timestamp>2017-12-19T22:11:27Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="y" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
		</DataField>
		<DataField name="total_nights" optype="continuous" dataType="float"/>
		<DataField name="days_to_booking" optype="continuous" dataType="float"/>
		<DataField name="ppd" optype="continuous" dataType="float"/>
		<DataField name="business_traveler" optype="continuous" dataType="float"/>
		<DataField name="percent_distance_avg_price" optype="continuous" dataType="float"/>
		<DataField name="distance_to_city_km" optype="continuous" dataType="float"/>
	</DataDictionary>
	<TreeModel functionName="classification" splitCharacteristic="binarySplit">
		<MiningSchema>
			<MiningField name="y" usageType="target"/>
			<MiningField name="total_nights"/>
			<MiningField name="days_to_booking"/>
			<MiningField name="ppd"/>
			<MiningField name="business_traveler"/>
			<MiningField name="percent_distance_avg_price"/>
			<MiningField name="distance_to_city_km"/>
		</MiningSchema>
		<Output>
			<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
			<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
		</Output>
		<Node id="1">
			<True/>
			<Node id="2">
				<SimplePredicate field="ppd" operator="lessOrEqual" value="8.461737"/>
				<Node id="10" score="0" recordCount="14960.0">
					<SimplePredicate field="ppd" operator="lessOrEqual" value="3.2442136"/>
					<ScoreDistribution value="0" recordCount="14939.0"/>
					<ScoreDistribution value="1" recordCount="21.0"/>
				</Node>
				<Node id="11" score="0" recordCount="4071.0">
					<SimplePredicate field="ppd" operator="greaterThan" value="3.2442136"/>
					<ScoreDistribution value="0" recordCount="3525.0"/>
					<ScoreDistribution value="1" recordCount="546.0"/>
				</Node>
			</Node>
			<Node id="3">
				<SimplePredicate field="ppd" operator="greaterThan" value="8.461737"/>
				<Node id="4">
					<SimplePredicate field="business_traveler" operator="lessOrEqual" value="0.5"/>
					<Node id="6" score="1" recordCount="508.0">
						<SimplePredicate field="percent_distance_avg_price" operator="lessOrEqual" value="6.7344605E-5"/>
						<ScoreDistribution value="0" recordCount="5.0"/>
						<ScoreDistribution value="1" recordCount="503.0"/>
					</Node>
					<Node id="7">
						<SimplePredicate field="percent_distance_avg_price" operator="greaterThan" value="6.7344605E-5"/>
						<Node id="8">
							<SimplePredicate field="total_nights" operator="lessOrEqual" value="1.5"/>
							<Node id="16" score="0" recordCount="486.0">
								<SimplePredicate field="ppd" operator="lessOrEqual" value="9.76857"/>
								<ScoreDistribution value="0" recordCount="287.0"/>
								<ScoreDistribution value="1" recordCount="199.0"/>
							</Node>
							<Node id="17" score="1" recordCount="2492.0">
								<SimplePredicate field="ppd" operator="greaterThan" value="9.76857"/>
								<ScoreDistribution value="0" recordCount="892.0"/>
								<ScoreDistribution value="1" recordCount="1600.0"/>
							</Node>
						</Node>
						<Node id="9">
							<SimplePredicate field="total_nights" operator="greaterThan" value="1.5"/>
							<Node id="18" score="1" recordCount="568.0">
								<SimplePredicate field="days_to_booking" operator="lessOrEqual" value="6.5"/>
								<ScoreDistribution value="0" recordCount="256.0"/>
								<ScoreDistribution value="1" recordCount="312.0"/>
							</Node>
							<Node id="19" score="0" recordCount="3390.0">
								<SimplePredicate field="days_to_booking" operator="greaterThan" value="6.5"/>
								<ScoreDistribution value="0" recordCount="2230.0"/>
								<ScoreDistribution value="1" recordCount="1160.0"/>
							</Node>
						</Node>
					</Node>
				</Node>
				<Node id="5">
					<SimplePredicate field="business_traveler" operator="greaterThan" value="0.5"/>
					<Node id="12" score="1" recordCount="454.0">
						<SimplePredicate field="percent_distance_avg_price" operator="lessOrEqual" value="0.0038093738"/>
						<ScoreDistribution value="0" recordCount="6.0"/>
						<ScoreDistribution value="1" recordCount="448.0"/>
					</Node>
					<Node id="13">
						<SimplePredicate field="percent_distance_avg_price" operator="greaterThan" value="0.0038093738"/>
						<Node id="14">
							<SimplePredicate field="days_to_booking" operator="lessOrEqual" value="15.5"/>
							<Node id="20" score="1" recordCount="455.0">
								<SimplePredicate field="ppd" operator="lessOrEqual" value="9.816237"/>
								<ScoreDistribution value="0" recordCount="187.0"/>
								<ScoreDistribution value="1" recordCount="268.0"/>
							</Node>
							<Node id="21" score="1" recordCount="2409.0">
								<SimplePredicate field="ppd" operator="greaterThan" value="9.816237"/>
								<ScoreDistribution value="0" recordCount="451.0"/>
								<ScoreDistribution value="1" recordCount="1958.0"/>
							</Node>
						</Node>
						<Node id="15">
							<SimplePredicate field="days_to_booking" operator="greaterThan" value="15.5"/>
							<Node id="22" score="1" recordCount="960.0">
								<SimplePredicate field="distance_to_city_km" operator="lessOrEqual" value="4.05"/>
								<ScoreDistribution value="0" recordCount="294.0"/>
								<ScoreDistribution value="1" recordCount="666.0"/>
							</Node>
							<Node id="23" score="0" recordCount="730.0">
								<SimplePredicate field="distance_to_city_km" operator="greaterThan" value="4.05"/>
								<ScoreDistribution value="0" recordCount="377.0"/>
								<ScoreDistribution value="1" recordCount="353.0"/>
							</Node>
						</Node>
					</Node>
				</Node>
			</Node>
		</Node>
	</TreeModel>
</PMML>
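
A sketch of how leaf probabilities could be derived when the probability attribute is missing. As far as I understand the PMML TreeModel specification, the attribute is optional and the probability can then be computed from the record counts. `leaf_probabilities` is a hypothetical helper, not scoruby's API; the leaf is node 11 from the model above.

```ruby
require 'rexml/document'

# Hypothetical helper: maps each class value of a leaf to a probability.
# An explicit probability attribute wins; otherwise the class recordCount
# is divided by the sum of the recordCounts in the leaf.
def leaf_probabilities(node)
  dists = node.children.grep(REXML::Element).select do |el|
    el.name == 'ScoreDistribution'
  end
  total = dists.sum { |d| d.attributes['recordCount'].to_f }

  dists.to_h do |d|
    explicit = d.attributes['probability']
    prob = explicit ? explicit.to_f : d.attributes['recordCount'].to_f / total
    [d.attributes['value'], prob]
  end
end

leaf_xml = <<~XML
  <Node id="11" score="0" recordCount="4071.0">
    <SimplePredicate field="ppd" operator="greaterThan" value="3.2442136"/>
    <ScoreDistribution value="0" recordCount="3525.0"/>
    <ScoreDistribution value="1" recordCount="546.0"/>
  </Node>
XML

leaf = REXML::Document.new(leaf_xml).root
probs = leaf_probabilities(leaf)
puts probs.inspect
```

For this leaf, class "0" gets 3525 / 4071 and class "1" gets 546 / 4071, so the two probabilities sum to 1.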

RubyML list

Dear Asaf,

I've recently added your project to our RubyML list: https://github.com/arbox/machine-learning-with-ruby

I wonder if you want to participate in the Ruby for ML network. You could do this in one simple step by adding the rubyml topic to your GitHub repository. You may also want to spread the word on Twitter or other media :)

Thank you for the project!

Small wiki update

Hey, noticed a small issue (possibly just outdated) with the random forest wiki. The example uses:

random_forest.predict(features)

=> "0"

I think this should be:

random_forest.score(features)

=> {:label=>"0", :score=>0.882}

Request to add Naive Bayes model

Hi,

Can we put in a request to add a Naive Bayes model, which is commonly used for spam detection?
We noticed you granted a similar request for Decision Trees :)

Cheers!

Decision Trees

This project looks great. Can I put in a request to support decision trees? We want to use generated PMML files from large sets of hotel shopping data to score and decide on sort algorithms in real-time. Thanks!
