MarkLogic SAS Tools

The idea here is to store SAS7BDAT file content in MarkLogic as JSON documents, using the row-as-document convention. This Java library converts SAS data sets into Jackson objects that are easy to ingest into MarkLogic. The converted data set takes one of two forms:

  • an ArrayList of Jackson ObjectNodes (JSON objects), with each member representing a row of the original data set, or
  • a Jackson ArrayNode (a JSON array), with each member an ObjectNode representing a row of the original data set

Which method you use depends on the constraints of your project.

Additionally, you can extract the properties of a given SAS7BDAT file to a Jackson object, so it's easy to store that data in MarkLogic as well.

The dirty work is done by the epam/parso library, which turns SAS binary data into usable Java data. "Sastools" (this project) just takes parso's output and marshals it into Jackson.

Context

You bring your own:

  • SAS files
  • MarkLogic client e.g.:
    • The REST API
    • MarkLogic's DatabaseClient
    • Data Services
  • row naming strategy (e.g. row numbers, content hashes, etc.)
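As an illustration of the content-hash naming strategy mentioned above, here is a minimal sketch that derives a document URI from the SHA-256 digest of a row's serialized JSON. The class name RowHashNaming, the method uriFor, and the "/<data set name>/<hash>.json" layout are all hypothetical and not part of this library; only JDK classes are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of mlsastools: names a row by the hash of its content.
class RowHashNaming {

    // Builds "/<dataSetName>/<sha256-hex>.json" from the row's serialized JSON text.
    static String uriFor(String dataSetName, String rowJson) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(rowJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return "/" + dataSetName + "/" + hex + ".json";
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

With this scheme, two rows with identical serialized content map to the same URI, so duplicates collapse into a single document. The hash is also only stable if the row is serialized the same way every time (for example, with a consistent key order).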

Usage

To use the library, instantiate a SasFileReaderJson on an InputStream from a SAS7BDAT file:

SasFileReaderJson sas = new SasFileReaderJson(sasInputStream);

Then you can use readDataSetToObjectArray(), readDataSetToArrayNode(), and readPropertiesToObject() to prepare the data set contents and/or the file properties for ingest into MarkLogic.

ArrayList of Jackson objects: readDataSetToObjectArray

If you will do your looping in Java, it's probably easiest to use this method:

FileInputStream fis = new FileInputStream("/path/to/sasfile.sas7bdat");
SasFileReaderJson sas = new SasFileReaderJson(fis);
List<ObjectNode> oList = sas.readDataSetToObjectArray();

With that list, and an existing jsonDocumentManager from your DatabaseClient, you'd probably do something like this to insert all the JSON data:

for (ObjectNode o : oList) {
  JacksonHandle jacksonHandle = new JacksonHandle(o);
  jsonDocumentManager.write("/" + filename + UUID.randomUUID().toString() + ".json", jacksonHandle);
}

(Note that you might want something more useful than random strings for your document URIs.)
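One alternative to random UUIDs is a zero-padded row number, which keeps URIs deterministic and sortable. It is the same scheme the server-side JavaScript example further down uses. The sketch below is only an illustration: the class RowNumberUris, the method rowUri, and the prefix parameter are hypothetical names, not part of this library.

```java
import java.util.Locale;

// Hypothetical helper, not part of mlsastools: names a row by its zero-padded index.
class RowNumberUris {

    // Builds "<prefix>/<filename>/row<NNN>.json", padding the index to the
    // number of digits in totalRows so that URIs sort in row order.
    static String rowUri(String prefix, String filename, int rowIndex, int totalRows) {
        int width = String.valueOf(totalRows).length();
        String pattern = "%s/%s/row%0" + width + "d.json";
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, pattern, prefix, filename, rowIndex);
    }
}
```

In the loop above, a call such as rowUri("/sas", filename, i, oList.size()) would take the place of the UUID expression, with i being a counter that tracks the current row.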

(Jackson) ArrayNode of Jackson objects: readDataSetToArrayNode

If you will send the entire data set to MarkLogic in a single operation (e.g. with Data Services handling the writes on the server side), this method is preferable.

FileInputStream fis = new FileInputStream("/path/to/sasfile.sas7bdat");
SasFileReaderJson sas = new SasFileReaderJson(fis);
ArrayNode arrayNode = sas.readDataSetToArrayNode();

Then, for example, if you have a Data Services interface, you'd call that for the insert:

dataservicesInterface.myIngestApiFunction(arrayNode);

And then there'd be some server-side logic to write the arrayNode members to MarkLogic:

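var arrayNode; // instance of ArrayNode from Data Services client
// prefix and filename are assumed to be defined elsewhere (e.g. passed in as parameters)

let lngth = arrayNode.length;
let padding = lngth.toString().length; // number of digits needed for the largest row index

for (let r = 0; r < lngth; r++) {
  let row = arrayNode[r];
  // zero-pad the row index so the URIs sort in row order
  let rowIdx = r.toString().padStart(padding, '0');
  let rowUri = fn.stringJoin([prefix, filename, "row" + rowIdx + ".json"], "/");
  xdmp.documentInsert(rowUri, row);
}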

SAS file properties to a Jackson object: readPropertiesToObject

This one's pretty simple. It reads the SAS7BDAT file properties (not the data set) to an ObjectNode, which can then be written to MarkLogic as a flat JSON document:

ObjectNode properties = sas.readPropertiesToObject();

You'd then want to store this somewhere near the data set, to keep track of associated metadata for those rows.
