donaldsawyer / pysparkjavaudfexample

Example of wrapping a java function in a java spark udf, then calling from pyspark

License: Apache License 2.0

Java 89.18% Python 10.82%

PysparkJavaUdfExample

Purpose

The purpose is to demonstrate calling a Java library from PySpark through a UDF.

  1. Start with a Java function (triple()) in a library (Multiples).
  2. Wrap the Java function in a Java Spark UDF (TripleUdf).
  3. Call the UDF from PySpark.
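
The first two steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual source: the signature of triple() (a static method taking and returning a double), the null handling, and the TripleUdfDemo class are assumptions. In the real project TripleUdf would implement org.apache.spark.sql.api.java.UDF1 from Spark; a local stand-in interface with the same call method is declared here only so the sketch compiles without Spark on the classpath.

```java
// Stand-in for org.apache.spark.sql.api.java.UDF1 (assumption: used here only
// so the sketch compiles without Spark; the real interface is also Serializable
// and declares a single call method that may throw Exception).
interface UDF1<T, R> extends java.io.Serializable {
    R call(T value) throws Exception;
}

// Step 1: a plain Java library function.
// Assumed signature; the source only names Multiples and triple().
final class Multiples {
    private Multiples() {}

    static double triple(double value) {
        return value * 3.0;
    }
}

// Step 2: wrap the library function in a Spark-style UDF.
class TripleUdf implements UDF1<Double, Double> {
    @Override
    public Double call(Double value) {
        // Assumption: pass nulls through unchanged, since Spark hands a null
        // column value to a Java UDF as a null reference.
        if (value == null) {
            return null;
        }
        return Multiples.triple(value);
    }
}

// Hypothetical driver that calls the wrapper directly, outside Spark.
class TripleUdfDemo {
    public static void main(String[] args) {
        TripleUdf udf = new TripleUdf();
        double[] inputs = {0.0, 4.111, -4.5};
        for (double input : inputs) {
            System.out.println(input + " -> " + udf.call(input));
        }
    }
}
```

Keeping the arithmetic in Multiples and the Spark-facing code in TripleUdf means the library stays free of Spark dependencies, and the wrapper only has to adapt boxed column values to the library call.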

Steps to Build and Run

This is a quickly-built POC, so it's not ready for repeatable execution. Follow the steps below to run the example and see how the functionality works.

  1. Build the Multiples library as a jar and install it into the local Maven repository: mvn package install
  2. The UDF project's pom.xml references Multiples as a dependency, so the mvn install in step 1 is required before the UDF can be built.
    <dependency>
      <groupId>org.example.functions</groupId>
      <artifactId>Multiples</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  3. Build the MultipleUdf project, which will create a fat jar in the target directory, MultipleUdf-1.0-SNAPSHOT.jar.
  4. Open the pyspark shell, adding the fat jar to the classpath with --jars: pyspark --jars /Users/donaldsawyer/git/PysparkJavaUdfExample/MultipleUdf/target/MultipleUdf-1.0-SNAPSHOT.jar
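  5. Run the pyspark commands from executePythonJavaUdf.py in the shell (spark is the session the shell provides):
     from pyspark.sql import functions as F
     from pyspark.sql.types import DoubleType

     # Register the Java class TripleUdf (from the jar passed via --jars) as the SQL function "triple"
     spark.udf.registerJavaFunction("triple", "TripleUdf")
     # Build a single-column DataFrame of doubles and name the column "value"
     df = spark.createDataFrame([0.0, 4.111, -4.5], DoubleType()).toDF("value")
     # Call the Java UDF through a SQL expression and display the result
     df.withColumn("tripled", F.expr("triple(value)")).show()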
