digilog-n / arrow

Sample Java programs and Python scripts for writing/reading Arrow data files

License: Apache License 2.0

Java 87.42% Python 12.30% Shell 0.28%

Arrow

Sample Java programs and Python scripts for writing Arrow data and reading it back

  1. ArrowPlasmaTestJava: simple Java program which writes Arrow record batches to a Plasma in-memory object store (which must be running at "/tmp/plasma")

  2. ArrowTestJava: simple Java program which writes Arrow record batches to an Arrow file

  3. CT2Arrow: Java program which reads data from a CT source and writes it as a record batch either to an Arrow output file ("*.arrow") or to a Plasma in-memory object store; this was the culmination of JPW's Java/Arrow development in the Phase I project.

  • to build CT2Arrow: ./gradlew build

  • the JAR file is located at CT2Arrow/build/libs

  • usage information is available by executing: java -jar CT2Arrow.jar -help

  • sample command; in this case, we read from CT source "PHM08", trigger on CT channel "unit.i32", write to Plasma (the "-p" option), display debug information (the "-x" option), and read a total of 7 CT channels:

        java -jar CT2Arrow.jar -s PHM08 -t unit.i32 -p -x -chans "unit.i32,time.i32,op1.f32,op2.f32,op3.f32,sensor01.f32,sensor02.f32"

  • additional information for running CT2Arrow:

      • Set the JAVA_HOME variable to /usr/lib/jvm/jdk-14.0.2 and use this Java to run CT2Arrow.
      • CTlib.jar is a dependency.
      • The Plasma libraries aren't included with the Arrow distribution; Arrow must be compiled from source (with one or two flags specified) for these libraries to be built.
      • Specifically, according to https://stackoverflow.com/questions/53231052/apache-arrow-plasma-client-cant-connect-to-memory-store-unsatisfiedlinkerror, the following 3 libraries must be available on the system library path: libplasma.so, libarrow.so, libplasma_java.so; all 3 of these are at /home/john/Apache_Arrow_repository/arrow/cpp/release/release.
      • I made changes to .profile to implement the above items, as shown below:

        export JAVA_HOME=/usr/lib/jvm/jdk-14.0.2
        # To compile CT applications
        export ctdev=/home/john/CTappsV1_1
        # Add Arrow libraries to path
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/john/Apache_Arrow_repository/arrow/cpp/release/release

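The environment setup and the sample command above can be sanity-checked from a small script before launching CT2Arrow. The following is a minimal sketch, not part of the repository: the helper names `missing_plasma_libs` and `ct2arrow_command` are hypothetical, the check assumes the three libraries are looked up only in the directories listed in LD_LIBRARY_PATH, and the command builder uses only the flags that appear in the sample command.

```python
import os

# Shared libraries that, per the notes above, must be on the library path.
REQUIRED_LIBS = ("libplasma.so", "libarrow.so", "libplasma_java.so")


def missing_plasma_libs(ld_library_path):
    """Return the required libraries not found in any directory of the path string."""
    dirs = [d for d in ld_library_path.split(os.pathsep) if d]
    return [
        lib for lib in REQUIRED_LIBS
        if not any(os.path.isfile(os.path.join(d, lib)) for d in dirs)
    ]


def ct2arrow_command(source, trigger, channels, to_plasma=False, debug=False):
    """Build the CT2Arrow argument list using only flags from the sample command."""
    cmd = ["java", "-jar", "CT2Arrow.jar", "-s", source, "-t", trigger]
    if to_plasma:
        cmd.append("-p")  # write to Plasma
    if debug:
        cmd.append("-x")  # display debug information
    cmd += ["-chans", ",".join(channels)]
    return cmd


if __name__ == "__main__":
    missing = missing_plasma_libs(os.environ.get("LD_LIBRARY_PATH", ""))
    if missing:
        print("Missing from LD_LIBRARY_PATH:", ", ".join(missing))
    print(" ".join(ct2arrow_command(
        "PHM08", "unit.i32",
        ["unit.i32", "time.i32", "op1.f32"],
        to_plasma=True, debug=True)))
```

The returned list can be passed to `subprocess.run` as-is; building it as a list rather than a string avoids shell quoting issues with the comma-separated channel names.
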
  4. OBD2Arrow: Java program which reads OBD data from an input file and writes it out as Arrow record batches to an Arrow file

  5. PHM08_to_Plasma: Java program which reads data from a PHM08 input file and writes it out to the Apache Plasma in-memory object store

  6. SamplePythonScripts:

  • read_arrow_test_file.py: Python script which reads Arrow data from a file; can use the "test.arrow" file contained in this same folder as an input file (this is the Arrow file written out by our sample "ArrowTestJava" application)

  • read_from_arrow_plasma.py: Python script which reads objects from the Plasma in-memory object store (located at "/tmp/plasma"); this script works together with the "ArrowPlasmaTestJava" Java test program (which writes data to Plasma).

  • read_OBD.py: Python script which demonstrates reading from an Arrow file; will read data written out by the Java "OBD2Arrow" application.

  • read_PHM08_from_plasma.py: Python script for reading record batches of PHM08 data from the Apache Plasma in-memory object store; works with PHM08 data that has been written to Plasma by the Java program "CT2Arrow".

  • read_PHM08_from_plasma_OLD.py: Python script for reading record batches of PHM08 data from the Apache Plasma in-memory object store; works with PHM08 data that has been written to Plasma by the Java program "PHM08_to_Plasma".

  • write_and_read_example.py: Python script which demonstrates a simple example of writing data out to an Arrow file and reading it back in.

  • write_and_read_plasma_example.py: Python script which demonstrates a simple example of writing an Arrow record batch to a Plasma in-memory object store and then reading it back out from Plasma.

A few notes on using the Plasma in-memory object store

  1. Plasma is only supported on Mac and Linux

  2. Python 3.5 or later is required

  3. Install PyArrow (https://arrow.apache.org/docs/python/install.html), e.g. pip install pyarrow

  4. Start up a Plasma store, e.g. plasma_store -m 1000000000 -s /tmp/plasma (-m sets the store's memory in bytes, -s sets its socket path)
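
The notes above lend themselves to a quick preflight check before running any of the Plasma scripts. The sketch below is a hypothetical helper, not part of the repository; it assumes that a running store shows up as a Unix socket at the path given to plasma_store -s, and it tests only the platform, Python version, and socket conditions listed above.

```python
import os
import stat
import sys


def plasma_prereq_problems(socket_path="/tmp/plasma",
                           platform=sys.platform,
                           version=sys.version_info[:2]):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Note 1: Plasma is only supported on Mac and Linux.
    if not (platform.startswith("linux") or platform == "darwin"):
        problems.append("unsupported platform: " + platform)
    # Note 2: Python 3.5 or later is required.
    if tuple(version) < (3, 5):
        problems.append("Python 3.5+ required, found %d.%d" % tuple(version))
    # Note 4 (assumption): a started store appears as a Unix socket at socket_path.
    try:
        is_socket = stat.S_ISSOCK(os.stat(socket_path).st_mode)
    except OSError:
        is_socket = False
    if not is_socket:
        problems.append("no Plasma store socket at " + socket_path)
    return problems


if __name__ == "__main__":
    for problem in plasma_prereq_problems():
        print(problem)
```

The platform and version are parameters with defaults so the function can be exercised with other values without changing the interpreter it runs on.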
