azrazalea-debtbook / creek

This project forked from pythonicrubyist/creek

Ruby library for parsing large Excel files.

Home Page: http://rubygems.org/gems/creek

License: MIT License

Creek - Stream parser for large Excel (xlsx and xlsm) files.

Creek is a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.

Installation

Creek can be used from the command line or as part of a Ruby web framework. To install the gem from a terminal, run:

gem install creek

To use it in Rails, add this line to your Gemfile:

gem 'creek'

Basic Usage

Creek can simply parse an Excel file by looping through the rows enumerator:

require 'creek'
creek = Creek::Book.new 'spec/fixtures/sample.xlsx'
sheet = creek.sheets[0]

sheet.rows.each do |row|
  puts row # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}
end

sheet.simple_rows.each do |row|
  puts row # => {"A"=>"Content 1", "B"=>nil, "C"=>nil, "D"=>"Content 3"}
end

sheet.rows_with_meta_data.each do |row|
  puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}}
end

sheet.simple_rows_with_meta_data.each do |row|
  puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A"=>"Content 1", "B"=>nil, "C"=>nil, "D"=>"Content 3"}}
end

sheet.state   # => 'visible'
sheet.name    # => 'Sheet1'
sheet.rid     # => 'rId2'
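
The rows and simple_rows outputs above differ only in whether the row number is part of each cell key. As a rough illustration (not Creek's internal code), the sketch below converts a hash shaped like the rows output into the simple_rows shape. The helper name strip_row_numbers is hypothetical and not part of Creek.

```ruby
# Hypothetical helper, not part of Creek: drops the trailing row number
# from each cell reference ("A1" -> "A", "AB12" -> "AB").
def strip_row_numbers(row)
  row.each_with_object({}) do |(cell_ref, value), simple|
    simple[cell_ref.sub(/\d+\z/, '')] = value
  end
end

# Sample row copied from the rows example above.
row = { 'A1' => 'Content 1', 'B1' => nil, 'C1' => nil, 'D1' => 'Content 3' }
simple_row = strip_row_numbers(row)
p simple_row
```

In practice, calling sheet.simple_rows directly avoids this extra pass over each row.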

Filename considerations

By default, Creek will ensure that the file extension is either *.xlsx or *.xlsm, but this check can be circumvented as needed:

path = 'sample-as-zip.zip'
Creek::Book.new path, :check_file_extension => false
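
For intuition, an extension check of this kind amounts to comparing the end of the path against the two accepted extensions. The sketch below is an assumption about the general idea, not Creek's actual implementation; excel_extension? is a hypothetical helper.

```ruby
# Hypothetical helper, not part of Creek: true when the path ends in
# .xlsx or .xlsm (compared case-insensitively).
def excel_extension?(path)
  %w[.xlsx .xlsm].include?(File.extname(path).downcase)
end

excel_extension?('spec/fixtures/sample.xlsx') # => true
excel_extension?('sample-as-zip.zip')         # => false
```

A file such as sample-as-zip.zip fails a check like this, which is why the example above passes check_file_extension: false.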

By default, the Rails file_field_tag uploads to a temporary location and stores the original filename with the StringIO object. (See the file upload section of the Rails Guides for more information.)

Creek can parse this file directly, without file upload gems such as Carrierwave or Paperclip, by passing the temporary file's path and disabling the extension check:

# Import endpoint in Rails controller
def import
  file = params[:file]
  Creek::Book.new file.path, check_file_extension: false
end

Parsing images

Creek does not parse images by default. To parse images, call the with_images method before iterating over rows so that image information is preloaded. If you don't call this method, Creek will not return images anywhere.

A cell with images is returned as an array of Pathname objects. If an image spans multiple cells, the same Pathname object is returned for each of those cells.

sheet.with_images.rows.each do |row|
  puts row # => {"A1"=>[#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>], "B2"=>"Fluffy"}
end

Images for a specific cell can be obtained with the images_at method:

puts sheet.images_at('A1') # => [#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>]

# no images in a cell
puts sheet.images_at('C1') # => nil

Creek will most likely return nil for a cell with images if no other cell in that row contains text. In that case, use the images_at method to retrieve the images in that cell.
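
Because image cells arrive as arrays of Pathname objects while text cells arrive as strings, they can be separated by type. The following is a minimal sketch on a made-up row in the shape shown above; image_cells is a hypothetical helper, not part of Creek, and the image path is invented for illustration.

```ruby
require 'pathname'

# Hypothetical helper, not part of Creek: keeps only the cells whose value
# is a non-empty array of Pathname objects, i.e. the cells holding images.
def image_cells(row)
  row.select do |_cell_ref, value|
    value.is_a?(Array) && !value.empty? && value.all? { |item| item.is_a?(Pathname) }
  end
end

# Made-up row in the shape yielded by sheet.with_images.rows.
row = { 'A1' => [Pathname.new('/tmp/image1.jpeg')], 'B2' => 'Fluffy' }
images = image_cells(row)

images.each do |cell_ref, paths|
  puts "#{cell_ref}: #{paths.map { |path| path.basename.to_s }.join(', ')}"
end
```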

Remote files

remote_url = 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
Creek::Book.new remote_url, remote: true

Mapping cells with header names

By default, Creek keys cells by column letter and row number (A1, B3, etc.). To look up cell values by header name instead, pass with_headers when creating the book. This option works only with the #simple_rows method. Note: the header row is the first row of the sheet.

creek = Creek::Book.new file.path, with_headers: true
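
To illustrate the idea of header mapping, the sketch below does the equivalent by hand on hashes shaped like simple_rows output: the first row supplies the header names and every later row is re-keyed by them. This is not Creek's implementation, and the exact output produced with with_headers may differ; each_row_with_headers and the sample data are hypothetical.

```ruby
# Hypothetical helper, not part of Creek: treats the first row as the header
# row and yields each later row keyed by header name instead of column letter.
# Columns without a header keep their letter as the key.
def each_row_with_headers(simple_rows)
  return enum_for(__method__, simple_rows) unless block_given?

  header = nil
  simple_rows.each do |row|
    if header.nil?
      header = row
      next
    end

    mapped = row.each_with_object({}) do |(column, value), result|
      result[header[column] || column] = value
    end
    yield mapped
  end
end

# Made-up data in the shape yielded by sheet.simple_rows.
rows = [
  { 'A' => 'Name', 'B' => 'Species' },
  { 'A' => 'Fluffy', 'B' => 'Cat' }
]
mapped_rows = each_row_with_headers(rows).to_a
p mapped_rows
```

The helper consumes rows one at a time, so it can wrap a streaming enumerator without loading the whole sheet into memory.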

Contributing

Contributions are welcome. Fork the repository, add your changes to a branch of your fork, make sure all existing unit tests pass, add unit tests that cover your changes, and then open a pull request.

After forking and cloning the repository locally, install Bundler and use it to install the development gem dependencies:

gem install bundler
bundle install

Once this is complete, you should be able to run the test suite:

rake

Some remote tests are excluded by default. To run them, use:

bundle exec rspec --tag remote

Bug Reporting

Please use the Issues page to report bugs or suggest enhancements.

License

Creek is published under the MIT License.
