dwinkler1 / jsonlines.jl

JSONLines reader/writer for Julia

Home Page: https://danielw2904.github.io/JSONLines.jl/stable/

License: MIT License

Julia 78.88% Jupyter Notebook 21.12%
jsonlines julia-package julialang julia julia-language

jsonlines.jl's Introduction

JSONLines

A simple package to read (parts of) JSON Lines files. The main purpose is to read files that are larger than memory. The two main functions are LineIndex and LineIterator, which return an index of the rows in the given file and an iterator over the file, respectively.

A LineIndex is Tables.jl compatible and can be piped directly into, e.g., a DataFrame if every row in the result has the same schema (i.e. the same variables). See also materialize and columnwise. LineIndex allows memory-efficient loading of rows of a JSON Lines file: skip and nrows select nrows rows after skipping skip rows. The file is memory-mapped (mmap) and only the required rows are loaded into RAM.

Files must contain a valid JSON object (denoted by {"String1":ELEMENT1, "String2":ELEMENT2, ...}) on each line. JSON parsing is done using the JSON3.jl package. Lines can be separated by \n or \r\n, and some whitespace characters (basically all that can be represented as a single UInt8) are allowed at the beginning of a line before the JSON object and before the newline character. Typically a file would look like this:

{"name":"Daniel","organization":"IMSM"}
{"name":"Peter","organization":"StatMath"}
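
The row selection described above (memory-map the file, locate line boundaries, keep nrows rows after skipping skip rows) can be illustrated with a minimal sketch that uses only Julia's standard library. This is not the package's implementation: line_ranges is a hypothetical helper written for this example, and it only handles \n and \r\n separators and empty lines, not leading whitespace.

```julia
using Mmap

# Hypothetical helper (not part of JSONLines.jl): byte ranges of the selected
# non-empty lines, excluding the trailing "\n" or "\r\n".
function line_ranges(buf::AbstractVector{UInt8}; skip::Int = 0, nrows::Int = typemax(Int))
    ranges = UnitRange{Int}[]
    nrows <= 0 && return ranges
    start = 1      # first byte of the current line
    seen = 0       # non-empty lines encountered so far
    n = length(buf)
    for i in 1:(n + 1)
        # A line ends at '\n' or at the end of the buffer.
        if i == n + 1 || buf[i] == UInt8('\n')
            stop = i - 1
            # Drop the carriage return left over from a "\r\n" separator.
            if stop >= start && buf[stop] == UInt8('\r')
                stop -= 1
            end
            if stop >= start               # ignore empty lines
                seen += 1
                if seen > skip
                    push!(ranges, start:stop)
                    length(ranges) == nrows && break
                end
            end
            start = i + 1
        end
    end
    return ranges
end

# Write the two-line example file to a temporary path (first line ends in "\r\n").
path = tempname()
write(path, "{\"name\":\"Daniel\",\"organization\":\"IMSM\"}\r\n" *
            "{\"name\":\"Peter\",\"organization\":\"StatMath\"}\n")

rows = open(path) do io
    buf = Mmap.mmap(io)    # Vector{UInt8} backed by the file, not read eagerly
    # Only the selected byte ranges are copied out of the mapping.
    [String(buf[r]) for r in line_ranges(buf; skip = 1, nrows = 1)]
end
# rows now holds only the second line of the file, as a String.
```

Each selected string could then be handed to a JSON parser such as JSON3.read. Keeping byte ranges rather than parsed objects is what keeps the index small relative to the file.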

There is experimental support for files with a JSON Array on each line, where the first line after skip contains the names of the columns.

["name", "organization"]
["Daniel", "IMSM"]
["Peter", "StatMath"]

This should work but is not tested thoroughly. Please report any use case that is not working.

Getting Started

(@v1.5) pkg> add JSONLines

jsonlines.jl's Issues

Read jsonlines from stream

I have a compressed archive of jsonl files and I'd like to load one of those files straight from the compressed archive into jsonlines for reading, but jsonlines seemingly only expects to read from a file on disk. Can it read from an iterator/stream?

LineIterator fails with ERROR: MethodError: Cannot `convert` an object of type

Hi,
I'm getting the following error when trying to read a JSON Lines file:

julia> li = LineIndex("sunhotels-xmlapitraces.json")
ERROR: MethodError: Cannot convert an object of type
Type{Union{Nothing, String}} to an object of type
Union{DataType, UnionAll}
Closest candidates are:
convert(::Type{S}, ::T) where {S, T<:CategoricalArrays.CategoricalValue} at /home/gallir/.julia/packages/CategoricalArrays/0ZAbp/src/value.jl:68
convert(::Type{T}, ::T) where T at essentials.jl:171
Stacktrace:
[1] setindex!(::OrderedCollections.OrderedDict{Symbol,Union{DataType, UnionAll}}, ::Type{T} where T, ::Symbol) at /home/gallir/.julia/packages/OrderedCollections/BvIBz/src/ordered_dict.jl:291
[2] LineIndex(::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Nothing, ::UnitRange{Int64}, ::Int64) at /home/gallir/.julia/packages/JSONLines/a4kId/src/LineIndex.jl:51
[3] #LineIndex#6 at /home/gallir/.julia/packages/JSONLines/a4kId/src/LineIndex.jl:76 [inlined]
[4] LineIndex(::String) at /home/gallir/.julia/packages/JSONLines/a4kId/src/LineIndex.jl:76
[5] top-level scope at REPL[2]:1

Find attached sample of the first json file: json_sample.txt

LineIterator works, but it's very slow for reading JSON records into a DataFrame.
julia> lit = LineIterator("sunhotels-xmlapitraces.json")
LineIterator(UInt8[0x7b, 0x22, 0x53, 0x74, 0x61, 0x72, 0x74, 0x54, 0x69, 0x6d … 0x79, 0x22, 0x3a, 0x6e, 0x75, 0x6c, 0x6c, 0x7d, 0x0d, 0x0a], 1, 108104445, nothing)
julia> lit[1]
┌ Warning: Indexin LineIterators is slow. Consider using LineIndex instead.
└ @ JSONLines ~/.julia/packages/JSONLines/a4kId/src/LineIterator.jl:43
JSON3.Object{SubArray{UInt8,1,Array{UInt8,1},Tuple{UnitRange{Int64}},true},Array{UInt64,1}} with 32 entries:
:StartTime => 1602806403
:StopTime => 1602806403
:ResponseTimeMillis => 24
:HotelsToSearch => 1
:HotelsReturned => 1
:ReturnedHotels => [14997]
:RoomsReturned => 8
:BrandID => 15
:AgentID => 28554
:AgentImpersonatingID => -1
:NumberOfRooms => 1
:NumberOfAdults => 2
:NumberOfChildren => 0
:NumberOfInfants => 0
:B2C => true
:HasLiveSearches => false
:AgentIsTestUser => false
:CustomerCountry => "1"
:UserHostName => "expedia-search.sunhotels.net"
:UserIP => "34.216.110.57, 10.190.190.193"
:Language => "en"
:CheckInDate => "2020-11-08T00:00:00.000"
:CheckOutDate => "2020-11-13T00:00:00.000"
:SearchedDestinations => [15]
:SearchedHotels => [14997]
:SearchedResorts => Union{}[]
:SearchedDestinationId => nothing
:Server => "WIN-AESG5RM48VS"
:DataCenter => "aws-eu-west-1"
:ErrorDetails => nothing
:HasErrors => false
:SearchKey => nothing
