Chart Parser

Parses horse racing result charts into JSON/CSV/Java...

TL;DR

Given an Equibase result chart PDF file, chart-parser can turn it into machine-readable formats such as JSON or CSV, or into Java objects that can be used as code through the SDK.

Highlights

  • The entire PDF is parsed; everything you see in the chart can be used, including:

    • the race conditions and restrictions
    • lengths ahead/behind at each point of call
    • fractional times
    • wagering payoffs, pools, and carryovers
    • footnotes etc.
  • Full race card PDFs containing multiple races (including those spread over multiple pages) can be parsed.

  • An SDK comes out-of-the-box that supports full serialization to and from a JSON API.

  • Textual descriptions of race distances are converted to feet e.g. "Six Furlongs" becomes 3,960.

  • Values for lengths ahead/behind are converted to decimal formats.

  • The software adds additional features, including:

    • attempting to look up the last-raced track details and linking to them
    • calculating estimated individual fractional times and splits at each fraction for each starter in a race
    • outlining each medication and equipment used
    • providing a normalized "X-to-1" odds determination for all wagering payoffs
    • displaying the day of the week and day of the year on which a race took place
  • Thoroughbred, Quarter Horse, Arabian and Mixed breed races are all supported.

  • The software handles edge-case scenarios such as dead-heats, walkovers, non-betting races, disqualifications (including adjusting final winning positions), cancellations, claiming price information etc.
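The two conversions mentioned above (distance text to feet, and lengths text to decimals) can be illustrated with a minimal sketch. It is not chart-parser's own code: the class and method names are hypothetical, the number-word vocabulary is deliberately tiny, and only numeric lengths are handled. It relies on the standard definitions of a furlong (660 feet) and a mile (5,280 feet), which is consistent with "Six Furlongs" becoming 3,960.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: these helpers are hypothetical, not chart-parser's API.
class ChartConversionsSketch {

    private static final int FEET_PER_YARD = 3;
    private static final int FEET_PER_FURLONG = 660;
    private static final int FEET_PER_MILE = 5280;

    // Deliberately small vocabulary of number words; real chart text has more variety.
    private static final Map<String, Integer> NUMBER_WORDS = new HashMap<>();
    static {
        String[] words = {"one", "two", "three", "four", "five", "six", "seven", "eight"};
        for (int i = 0; i < words.length; i++) {
            NUMBER_WORDS.put(words[i], i + 1);
        }
    }

    // Converts simple "<number word> <unit>" text, e.g. "Six Furlongs", to feet.
    static int distanceToFeet(String description) {
        String[] parts = description.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (parts.length != 2 || !NUMBER_WORDS.containsKey(parts[0])) {
            throw new IllegalArgumentException("Unsupported distance: " + description);
        }
        int quantity = NUMBER_WORDS.get(parts[0]);
        if (parts[1].startsWith("furlong")) return quantity * FEET_PER_FURLONG;
        if (parts[1].startsWith("mile")) return quantity * FEET_PER_MILE;
        if (parts[1].startsWith("yard")) return quantity * FEET_PER_YARD;
        throw new IllegalArgumentException("Unsupported unit: " + parts[1]);
    }

    // Converts numeric lengths text such as "1 1/2" or "3/4" to a decimal.
    static double lengthsToDecimal(String text) {
        double total = 0.0;
        for (String token : text.trim().split("\\s+")) {
            int slash = token.indexOf('/');
            if (slash < 0) {
                total += Integer.parseInt(token);            // whole lengths
            } else {
                double numerator = Integer.parseInt(token.substring(0, slash));
                double denominator = Integer.parseInt(token.substring(slash + 1));
                total += numerator / denominator;            // fractional part
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(distanceToFeet("Six Furlongs")); // 3960
        System.out.println(lengthsToDecimal("1 1/2"));      // 1.5
        System.out.println(lengthsToDecimal("3/4"));        // 0.75
    }
}
```

Named margins such as "Neck" are rejected by this sketch with a NumberFormatException; handling them would need a lookup table of conventional values, which is not shown here.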

How it works

PDFs are parsed using the Apache PDFBox library.

For a given PDF file, each character present is written as a pipe-delimited String that records its x-y coordinates, height, width, scale, font size, and Unicode value within a page of the PDF.

This is done using ChartStripper, a customized PDFTextStripper instance.

Each pipe-delimited String representing a character within the PDF is converted to a custom POJO, ChartCharacter, using the Jackson CSV data format.

The list of ChartCharacters is then grouped by the line of text on which each character appears within the PDF.
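As a rough illustration of these steps, the sketch below parses pipe-delimited rows into a small value class and groups the characters into lines. Everything in it is an assumption made for the example: PdfChar is a hypothetical stand-in for ChartCharacter, the field order is guessed from the description above, plain string splitting replaces the Jackson CSV mapping, and characters are grouped by exact y coordinate, whereas real PDF text would likely need a tolerance.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch only; not the library's ChartStripper or ChartCharacter.
class CharacterGroupingSketch {

    // Hypothetical stand-in for ChartCharacter; field names and order are assumed.
    static final class PdfChar {
        final double x, y, height, width, scale, fontSize;
        final String text;

        PdfChar(double x, double y, double height, double width,
                double scale, double fontSize, String text) {
            this.x = x; this.y = y; this.height = height; this.width = width;
            this.scale = scale; this.fontSize = fontSize; this.text = text;
        }

        // Assumed row layout: x|y|height|width|scale|fontSize|character
        static PdfChar fromPipeDelimited(String row) {
            // A limit of 7 keeps a literal '|' character intact in the last field.
            String[] f = row.split("\\|", 7);
            if (f.length != 7) {
                throw new IllegalArgumentException("Expected 7 fields: " + row);
            }
            return new PdfChar(Double.parseDouble(f[0]), Double.parseDouble(f[1]),
                    Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                    Double.parseDouble(f[4]), Double.parseDouble(f[5]), f[6]);
        }
    }

    // Groups characters by y coordinate, then orders each line left to right by x.
    static List<String> toLines(List<String> rows) {
        Map<Double, List<PdfChar>> byLine = rows.stream()
                .map(PdfChar::fromPipeDelimited)
                .collect(Collectors.groupingBy(c -> c.y, TreeMap::new, Collectors.toList()));
        return byLine.values().stream()
                .map(chars -> chars.stream()
                        .sorted(Comparator.comparingDouble((PdfChar c) -> c.x))
                        .map(c -> c.text)
                        .collect(Collectors.joining()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "15.0|100.0|8.0|5.0|1.0|8.0|i",
                "10.0|120.0|8.0|5.0|1.0|8.0|!",
                "10.0|100.0|8.0|5.0|1.0|8.0|H");
        System.out.println(toLines(rows)); // [Hi, !]
    }
}
```

The TreeMap keeps the lines ordered by ascending y, so each reconstructed line can then be handed to the matching step as an ordinary String.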

Each line of text within the PDF is then tested against a series of regex matchers to identify which parts of the race domain model it represents. When matched, the information is parsed and used to create an instance of RaceResult, following the Builder pattern.

See ChartParser#parse() for more.
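The regex-then-builder approach described above might look roughly like the following sketch. The line formats, patterns, and the RaceSummary class are invented for illustration; the real domain model (RaceResult and its builder) is considerably richer, and ChartParser#parse() remains the authoritative reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; patterns and classes here are hypothetical.
class LineMatchingSketch {

    // Hypothetical, much-reduced stand-in for RaceResult.
    static final class RaceSummary {
        final String weather;
        final String trackCondition;
        final Integer purse;

        private RaceSummary(Builder builder) {
            this.weather = builder.weather;
            this.trackCondition = builder.trackCondition;
            this.purse = builder.purse;
        }

        static final class Builder {
            private String weather;
            private String trackCondition;
            private Integer purse;

            Builder weather(String value) { this.weather = value; return this; }
            Builder trackCondition(String value) { this.trackCondition = value; return this; }
            Builder purse(Integer value) { this.purse = value; return this; }
            RaceSummary build() { return new RaceSummary(this); }
        }
    }

    // Assumed, simplified line formats.
    private static final Pattern WEATHER_TRACK =
            Pattern.compile("^Weather: (\\w+) Track: (\\w+)$");
    private static final Pattern PURSE =
            Pattern.compile("^Purse: \\$([\\d,]+)$");

    // Tests each line against the known patterns and feeds matches to the builder.
    static RaceSummary parse(List<String> lines) {
        RaceSummary.Builder builder = new RaceSummary.Builder();
        for (String line : lines) {
            Matcher matcher = WEATHER_TRACK.matcher(line);
            if (matcher.matches()) {
                builder.weather(matcher.group(1)).trackCondition(matcher.group(2));
                continue;
            }
            matcher = PURSE.matcher(line);
            if (matcher.matches()) {
                builder.purse(Integer.parseInt(matcher.group(1).replace(",", "")));
            }
            // Lines that match no pattern are ignored in this sketch.
        }
        return builder.build();
    }

    public static void main(String[] args) {
        RaceSummary summary = parse(Arrays.asList(
                "Purse: $12,500", "An unrelated line", "Weather: Clear Track: Fast"));
        System.out.println(summary.weather + ", " + summary.trackCondition + ", " + summary.purse);
        // Clear, Fast, 12500
    }
}
```

Because the builder accumulates values as lines are matched, the order of lines in the PDF does not matter, and the immutable result is only created once every line has been examined.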

How to use

Chart Parser is available in the Maven Central repository:

<dependency>
    <groupId>com.robinhowlett</groupId>
    <artifactId>chart-parser</artifactId>
    <version>1.2.0.RELEASE</version>
</dependency>

Parsing a PDF file is simple and can be done in one line, e.g.:

List<RaceResult> raceResults = ChartParser.create().parse(Paths.get("ARP_2016-07-24_race-charts.pdf").toFile());

// print the winning margins
raceResults.stream()
        .flatMap(raceResult -> raceResult.getStarters().stream())
        .filter(Starter::isWinner)
        .forEach(starter -> System.out.println(
                String.format("%-20s: %10s",
                        starter.getHorse().getName(),
                        starter.getFinishPointOfCall().getRelativePosition().getLengthsAhead().getText())
        ));

// console output
Back Stop           :      1 1/2
Cowboy Cliff        :      9 1/2
Perkin Desire       :      1 3/4
Fast as Thunder     :        1/2
Takin the Blame     :      7 1/4
Acme Rocket         :      1 1/4
Magical Twist       :      3 3/4
Lady Jila           :       Neck
Prater Sixty Four   :      3 1/4

Handycapper is provided as a sample application to parse and convert PDF charts.

Compiling

IMPORTANT: This project relies on the Java 8 method parameter reflection feature, which is enabled by passing the -parameters flag to the Java compiler (javac), e.g. through your IDE's compiler settings.

chart-parser is a Maven-based Java open-source project. Running mvn clean install will compile the code, run all tests, and install the built artifact to the local repository.
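To check whether the -parameters flag is in effect for your build, a small reflection probe can help. The sketch below is not part of chart-parser; the class and the setHorseName method are hypothetical. It uses only the standard java.lang.reflect API: when a class is compiled with -parameters, Parameter#isNamePresent() returns true and getName() returns the declared name; otherwise a synthesized name such as arg0 is returned.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Illustrative probe only; class and method names are hypothetical.
class ParameterNamesSketch {

    // A method whose parameter name is inspected via reflection.
    static void setHorseName(String horseName) {
        // intentionally empty
    }

    private static Parameter firstParameter() {
        try {
            Method method =
                    ParameterNamesSketch.class.getDeclaredMethod("setHorseName", String.class);
            return method.getParameters()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if this class was compiled with the -parameters flag.
    static boolean namesAvailable() {
        return firstParameter().isNamePresent();
    }

    // "horseName" with -parameters, otherwise a synthesized name such as "arg0".
    static String firstParameterName() {
        return firstParameter().getName();
    }

    public static void main(String[] args) {
        System.out.println("Parameter names present: " + namesAvailable());
        System.out.println("First parameter name:    " + firstParameterName());
    }
}
```

If the probe reports that names are not present, the compiler settings for the module (in the IDE or in the Maven compiler plugin configuration) are the place to look.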

Notes

This software is open-source and released under the MIT License.

This project contains a single sample Equibase PDF chart included for testing, educational and demonstration purposes only.

Users of this software should be aware of any conditions of use that may apply to the PDF charts.
