beatrichartz / csv

CSV Decoding and Encoding for Elixir

License: MIT License

Topics: csv, decoder, decoding, elixir, encoder, encoding, hex, parser, parsing, rfc-4180, stream


csv's Issues

Unexpected token error

I am getting an error when parsing this:

 Message
"prof·li·gate\ˈprä-fli-gət, -ˌgāt\
adjective
: carelessly and foolishly wasting money, materials, etc. : very wasteful
Full Definition
1 : wildly extravagant <profligate spending>
2 : completely given up to dissipation and licentiousness <leading a profligate life>
prof·li·gate·ly adverb
Origin: Latin profligatus, from past participle of profligare to strike down, from pro- forward, down + -fligare (akin to fligere to strike); akin to Greek phlibein to squeeze.
First use: 1617
Synonyms: extravagant, high-rolling, prodigal, spendthrift, squandering, thriftless, unthrifty, wasteful
Antonyms: conserving, economical, economizing, frugal, penny-pinching, scrimping, skimping, thrifty
Synonyms: fritterer, high roller, prodigal, spender, spendthrift, squanderer, waster, wastrel
Antonyms: economizer, penny-pincher
2
prof·li·gate\ˈprä-fli-gət, -ˌgāt\
noun
: a person given to wildly extravagant and usually grossly self-indulgent expenditure
Origin: (see 1profligate ).
First use: 1709
Synonyms: extravagant, high-rolling, prodigal, spendthrift, squandering, thriftless, unthrifty, wasteful
Antonyms: conserving, economical, economizing, frugal, penny-pinching, scrimping, skimping, thrifty
Synonyms: fritterer, high roller, prodigal, spender, spendthrift, squanderer, waster, wastrel
Antonyms: economizer, penny-pincher"
"46984364136: Beamish
adjective

and this is the error:

%SyntaxError{description: "unexpected token: \":\" (column 61, codepoint U+003A)",
 file: "nofile", line: 4}

and this is my code:

defmodule SpellingListParser do
  File.stream!("SpellingList.csv") |> CSV.decode :headers: true |> Enum.map fn row ->
    Enum.each(row, IO.puts)
  end
end

Running in production/release slow

Hi, I'm having a problem:

full_rows |> CSV.encode |> Enum.each(&IO.write(file, &1))
is really slow, and won't write all the data to the file.

It works very well when I'm running mix phoenix.server, but as a release it's not working well.

I have consolidate_protocols: true in mix.exs

Any idea what it might be?

Decoder uses number of schedulers at compile time, rather than runtime

https://github.com/beatrichartz/csv/blob/master/lib/csv/decoder.ex#L13

This tells the decoder to use the number of schedulers that were present at compile time, which may not match the number available at runtime. If I were to build a release on a small machine and deploy it on a large machine, it wouldn't use the resources as well as you intend.

It's not hugely important, but it struck me as incorrect. Hope this isn't a bothersome thing to bring up! :)
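
A minimal sketch of the compile-time-vs-runtime distinction (SchedulerInfo is a hypothetical module, not the library's code):

defmodule SchedulerInfo do
  # A module attribute is evaluated when the module is compiled, so this
  # captures the scheduler count of the build machine.
  @compile_time_schedulers :erlang.system_info(:schedulers)

  def at_compile_time, do: @compile_time_schedulers

  # A function body is evaluated on every call, so this reflects the
  # machine the release is actually running on.
  def at_runtime, do: :erlang.system_info(:schedulers)
end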

Ignore lines with errors

Is there a way I could ignore lines that have invalid encoding? Or at the very least, know what this invalid encoding is? All I currently get is:

** (CSV.Lexer.EncodingError) Invalid encoding on line 3642
             lib/csv/decoder.ex:161: CSV.Decoder.handle_error_for_result!/1
    (elixir) lib/stream.ex:454: anonymous fn/4 in Stream.map/2
    (elixir) lib/enum.ex:2744: Enumerable.List.reduce/3
    (elixir) lib/stream.ex:732: Stream.do_list_transform/9
    (elixir) lib/stream.ex:1247: Enumerable.Stream.do_each/4
    (elixir) lib/enum.ex:1477: Enum.reduce/3
    (elixir) lib/enum.ex:609: Enum.each/2

I'm working with a very large file, so it's pretty hard to pick out a line that might be encoded incorrectly.
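
A possible user-side workaround (an assumption on my part, not a library feature): filter out lines that are not valid UTF-8 before they reach the decoder. Note that this silently drops data and can break quoted fields that span multiple lines:

File.stream!("big.csv")
|> Stream.filter(&String.valid?/1)
|> CSV.decode
|> Enum.each(&IO.inspect/1)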

Ability to specify header values and ignore headers in the csv file

Thanks for the CSV library!

In order to use the Map parsed from each row with Ecto, I need to define the headers as atoms that exactly match my model:

File.stream!("/Users/wsmoak/Downloads/transactions.csv")
  |> CSV.decode(headers: [:date, :description, :original_description, :amount, :transaction_type, :category, :account_name, :labels, :notes])

When I do this, the first line of the file that contains the headers is picked up as a row of values, which I don't want.

Is there a way to skip that first row the way headers: false would, and define the values for the headers for the map?

Would having two different attributes work better? Maybe :headers could be true/false only, to say whether the csv file contains headers, and then a separate attribute could be used to optionally specify the header values to use?

(My workaround is to define a function that matches on one of the values I know will be in the Map for that first unwanted record, and ignore it: def store_it(%{:date => "Date"}) do ... end.)
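
One way to get the asked-for behavior with the current API might be to drop the file's own header row before decoding (a sketch, not an official recommendation):

File.stream!("/Users/wsmoak/Downloads/transactions.csv")
|> Stream.drop(1)   # skip the header row contained in the file
|> CSV.decode(headers: [:date, :description, :original_description, :amount,
                        :transaction_type, :category, :account_name, :labels, :notes])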

Performance Issues

I have a tab-delimited file that's ~2.6GB. I'm attempting to do the following in iex, but it never completes:

file = File.open!("output.csv", [:write])
File.stream!("input.csv")
  |> CSV.Decoder.decode(separator: ?\t)
  |> CSV.encode 
  |> Enum.each(&(IO.write(file, &1)))

I thought this was because of the number of rows, but even if I do Enum.take(700) it doesn't complete. If I only take 500, however, it completes almost immediately. Any idea what's going on, or what I could do to debug this? I'm using Elixir 1.3.0.

Question (ErlangError) erlang error: :no_translation

Hi, thanks for this project 👍. It has been a big help for beginners like me. I'm getting a strange error and have no clue how to fix it.

I only get this error if I try to encode the field "Hotel 10 Aparecida de Goiânia"; without it, encoding works just fine. Do you have any tip or direction to give me?

table_data |> CSV.encode |> Enum.each(&IO.write(file, &1))
** (ErlangError) erlang error: :no_translation
(stdlib) :io.put_chars(#PID<0.248.0>, :unicode, "27,SinnPDV Atualização Goiânia,6125,Hotel 10 Aparecida de Goiânia,Felipe Borges Ferreira,Bruno\r\n")
(elixir) lib/enum.ex:657: anonymous fn/3 in Enum.each/2
(elixir) lib/enum.ex:1637: anonymous fn/3 in Enum.reduce/3
(elixir) lib/enum.ex:2843: Enumerable.List.reduce/3
(elixir) lib/stream.ex:769: Stream.do_list_transform/9
(elixir) lib/enum.ex:1636: Enum.reduce/3
(elixir) lib/enum.ex:656: Enum.each/2
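
For what it's worth, :no_translation from :io.put_chars usually indicates the output device was opened without Unicode translation. A hedged sketch of a possible fix (out.csv is a placeholder path):

# Assumption: the file was opened with File.open!("out.csv", [:write]).
# Opening it in :utf8 mode lets :io translate Unicode data on write.
file = File.open!("out.csv", [:write, :utf8])
table_data |> CSV.encode |> Enum.each(&IO.write(file, &1))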

Q: Does `num_workers` process the file in parallel?

Just as the title says: if I want to process a file with the rows processed independently of each other (non-sequentially), does setting num_workers to, say, 6 process the file in 6 independent processes?


iex> "../test/fixtures/docs/valid.csv"
iex> |> Path.expand(__DIR__)
iex> |> File.stream!
iex> |> CSV.decode(num_workers: 6)

Docs

Hi, I downloaded, installed, and started using your library, but I had to give up when I couldn't work out how to pattern match on a row in order to do a database insert. I've solved it with another library, but thought it worth recommending you beef up the examples for beginners like me.
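
For instance, a short pattern-matching example might help (users.csv and its three columns are hypothetical):

# Each decoded row is a list, so it can be pattern matched directly in
# the anonymous function head.
File.stream!("users.csv")
|> CSV.decode
|> Enum.each(fn [name, email, age] ->
  IO.puts("inserting #{name} <#{email}>, age #{age}")
end)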

UTF-8 with BOM

Hi,

I have a CSV saved as UTF-8 with a BOM. The decoder keeps the BOM attached to the first header. Would this library try to strip out the BOM, or is it the user's responsibility to decide whether to have the BOM in the CSV?
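
A possible user-side workaround, assuming the library does not strip it (with_bom.csv is a placeholder name):

File.stream!("with_bom.csv")
|> Stream.with_index()
|> Stream.map(fn
  # The UTF-8 BOM only ever appears at the start of the first line.
  {line, 0} -> String.replace_prefix(line, "\uFEFF", "")
  {line, _} -> line
end)
|> CSV.decode(headers: true)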

Allow stray quotes in unescaped fields

Quote escaping is a recommendation in RFC 4180; however, CSV should allow stray quotes in fields that do not start with a quote.

That means this should be valid:

A,B"C,D

Whereas this should still not be valid:

A,"B"C",D

Make row length check optional

Hello,

thank you for the wonderful library; it's a pleasure to use, except for one pain point: the hard check on row length.
If there are no strong objections, making this check optional would be a wonderful addition.

CSV.decode(headers: true) breaks when there is more than one column with a matching header

This is a subtle but important implementation problem when using a straight-up map as your target data structure: map keys cannot be reused.

Assume you have a CSV like this:

id,name,address,address,city,province
1,joe,123 city st,,moosejaw,SK

This will result in a map %{"id" => 1, "name" => "joe", "address" => "", "city" => "moosejaw", "province" => "SK"}. While this seems reasonable, this is completely incorrect and results in data loss. The Ruby CSV library handles this correctly, in that it returns a data structure that acts like a hash but is not quite a hash so that you can ask for row['address'] # => "123 city st" and row['address', 1] # => nil (the first can be row['address', 0], too).

I suspect that to fix this, you will need to return a specialized form that can be handled properly. The interface would be something like CSV.Row.get(row, header, index \\ 0) where is_binary(header), and CSV.Row.get(row, index) (because row[3] always returns the 4th column).
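
A minimal sketch of why a plain map loses data here (illustration only, not the library's implementation):

headers = ["id", "name", "address", "address", "city", "province"]
values  = ["1", "joe", "123 city st", "", "moosejaw", "SK"]

# Building a map from duplicate keys keeps only the last value per key,
# so the first "address" is silently dropped.
Enum.zip(headers, values) |> Map.new()
#=> %{"address" => "", "city" => "moosejaw", "id" => "1", "name" => "joe", "province" => "SK"}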

`CSV.encode` seems extremely slow

defmodule CSVTest do
  use ExUnit.Case, async: true

  for rows <- [1, 10, 100, 500] do
    @rows rows
    test "generating #{rows} rows" do
      data = List.duplicate([1, 2, 3, 4, 5], @rows)
      data |> CSV.encode |> Stream.run
    end
  end
end
➜ mix test test/csv_test.exs --trace

CSVTest
  * generating 100 rows (1858.3ms)
  * generating 1 rows (16.4ms)
  * generating 10 rows (173.0ms)
  * generating 500 rows (8899.1ms)


Finished in 10.9 seconds (0.05s on load, 10.9s on tests)
4 tests, 0 failures

It appears to be linear in the number of rows (as you'd expect) but with a huge constant factor -- about 17ms per row.

Any idea why the encoder is so slow?

Add an option to disable the row_length check

I'm trying to process an invalid CSV with 10 headers in the first row, followed by rows with 10 or fewer values.

decode (rightfully) fails with Row has length 8 - expected length 10 on line X

I could probably try to pre-process the rows and add the missing columns (append the right number of commas to each line), but that would require handling all the escaping cases you already handle in your code in order to count the columns correctly.

Probably easier (for me) would be to have an option to disable this check, maybe something like:

row_length: false, and maybe even row_length: 10 to enforce a certain row length.

Would you accept such a merge request?

Headers as atoms

Hello,

Is there any way to read the headers from the first line, but transform them into atoms before converting rows to maps?

What I want is to have atoms as keys in maps, not strings.
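
A possible workaround sketch (data.csv is a placeholder; note that String.to_atom/1 on untrusted input can exhaust the atom table, so a fixed whitelist of known headers is safer):

File.stream!("data.csv")
|> CSV.decode(headers: true)
|> Enum.map(fn row ->
  # Rebuild each row map with atom keys instead of string keys.
  Map.new(row, fn {k, v} -> {String.to_atom(k), v} end)
end)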

Thank you!

Crashes on unrecognized letters?

Here are two examples of individual lines that led the parser to barf and stop:

frações,1,M,12-Sep,,,,,,,,,,,,,,,,,,,
matemática financeira,1,M,12-Sep,,,,,,,,,,,,,,,,,,,

This is in a file generated by Excel (exported as XLS). Not sure what encoding it was in; saving it in Vim as UTF-8 resolved the issue. However, it seems like the library could either accept the file, even if some characters are messed up, or at least give a useful error message ("Invalid (non-UTF-8) character found on line x", etc.).

Error message:

17:41:09.397 [error] Error in process <0.112.0> with exit value: {function_clause,[{'Elixir.CSV.Lexer',lex,[411,<<37 bytes>>,<0.120.0>,{content,<<5 bytes>>},44],[{file,"lib/csv/lexer.ex"},{line,35}]},{'Elixir.CSV.Lexer',lex_into,2,[{file,"lib/csv/lexer.ex"},{line,24}]}]}

Add context and field number to syntax errors

CSV should add context to syntax errors in order to make the error messages more useful. That means an error like this:

Invalid escape sequence on line 418

should become:

Invalid escape sequence on line 418, field 4 near "B, \"C"

When there are headers, the field could be named:

Invalid escape sequence on line 418, field "ABC" near "B, \"C"

First row skipped when parsing StringIO

I'm trying to parse a CSV string, using StringIO.open/1 and IO.binstream/2 to convert the string to a stream first. When the :headers option is false, the decoder skips the first row of the CSV. When :headers is true, the CSV is decoded as expected.

Add this test to test/csv_test.exs:

  test "decodes from StringIO stream" do
    {:ok, out} =
      "a,b,c\nd,e,f"
      |> StringIO.open

    stream = out |> IO.binstream(:line)

    assert stream |> CSV.decode! |> Enum.map(&(&1)) == [~w(a b c), ~w(d e f)]
  end

Fails with:

  1) test decodes from StringIO stream (CSVTest)
     test/csv_test.exs:30
     Assertion with == failed
     code: stream |> CSV.decode!() |> Enum.map(&(&1)) == [~w"a b c", ~w"d e f"]
     lhs:  [["d", "e", "f"]]
     rhs:  [["a", "b", "c"], ["d", "e", "f"]]
     stacktrace:
       test/csv_test.exs:37

I spent quite a bit of time digging into the code and debugging, but I'm out of time. The thing that looks suspicious is the Enum.take(1) call in CSV.Decoder.get_first_row/2. It would seem to me that you can't pluck an item out of a stream like that without disturbing the stream.

But then again, if the stream comes from a file, it's not a problem. For files, maybe the stream is opened (?) twice? For example with a file stream, you can inspect the stream, then send it to the decoder without creating a new stream:

IO.puts "stream: #{stream |> Enum.map(&(&1)) |> inspect}"
assert stream |> CSV.decode! |> Enum.map(&(&1)) == [~w(a b c), ~w(d e f)]

But with a stream created from StringIO.open and IO.binstream, you can't inspect the stream before sending it to the decoder. The decoder outputs an empty stream.

I don't know if this is a bug in CSV, or a bug in StringIO or IO.binstream, or if I'm just doing it wrong. How does one reliably decode a CSV string? I like to do this in tests a lot. Fortunately, I've been using headers: true, but sometimes I'd like to have headers: false.
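
For what it's worth, one workaround sketch for decoding an in-memory string is to split it into lines yourself, so the decoder gets a plain, restartable enumerable instead of a one-shot IO stream:

# Assumes the decoder accepts any enumerable of lines, not just file streams.
"a,b,c\nd,e,f"
|> String.split("\n")
|> CSV.decode!
|> Enum.to_list
#=> [["a", "b", "c"], ["d", "e", "f"]]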

Just FYI - hexdocs.pm/csv is broken

Using hexdocs to search for csv leads us to an overview page which may be missing or something.

It seems minor; I can get to your docs through direct links to functions like decode.

CSV encode double-escapes escape codes like CR

When encoding a line which contains a field with an embedded escape code, CSV escapes the backslash, so the output contains a literal \r\n instead of a CRLF. This is an issue for double-quoted fields, which are allowed to have CRLFs embedded. See below:

iex(9)> IO.puts "foo\r\nbar"
foo
bar
:ok
iex(10)> [["foo\r\nbar", "foo", "bar"]] |> CSV.encode |> Enum.to_list
["\"foo\\r\\nbar\",foo,bar\r\n"]

I can work around it by removing the escaped backslashes with a regex, but unless I'm missing something, this is incorrect behavior.

Takes way too long for large files

Maybe I'm using the library wrong, but I noticed something strange. I have a large CSV file (~500MB). With streams it is very easy to output this file line by line with a very low memory footprint:

File.stream!("large.csv") |> Stream.each(&IO.inspect/1) |> Stream.run    

So I assumed it would be very easy to plug the CSV decoder in between:

File.stream!("large.csv") |> CSV.decode |> Stream.each(&IO.inspect/1) |> Stream.run    

I assumed this would print out the Elixir data structures that csv emits, but it turns out there is some kind of eager behavior in between: when I run this, there is no output at first, and only after a while is the output shown (and much more slowly). Memory usage also goes through the roof running this code.

Is there something I'm missing?

Thinking about it, wouldn't it be best to have the possibility of providing some kind of callback to the decode method to run side effects?

File.stream!("large.csv") |> CSV.decode(&IO.inspect/1)

Adding an option to set up the quote_char

Hi! I'm currently porting ETL Ruby code to Elixir and I have a problem with some files.

I'm on the master branch and I use | as the separator. I get a CSV.EscapeSequenceError exception when there is only one " in a line, which can happen because this character is not intended to be used for escaping in my case. Here is a line that triggers the exception:
TEST|RES "LES PRES LE ROY||ZZ

I was expecting, on the master branch, to receive a tuple for this error, {:error, EscapeSequenceError...}; can I do something to get that instead of the exception?
The error message in the exception was wrong about the line number of the problem. I'll look into why, but there may be a bug here.

In the Ruby csv library I can set a different character for escaping (the quote_char option), which solved my problem when I used a character I'm sure is not in the file: https://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html#method-c-new

Do you think that kind of option could be introduced in the library? I can try to submit a PR for it if that's OK.

if multiple options are specified, then strip_cells is ignored

When specifying both headers and strip_cells, strip_cells was ignored. I changed it to be consistent with the way Lex works, by checking whether the option is set.

File.stream!("data/user_1/data.csv") |> CSV.decode(strip_cells: true, headers: true) |> Enum.each(fn(row) -> IO.inspect(row) end)

Encoder becomes corrupted when it encounters an error

Playing around with your library in iex yielded some pretty odd results:

iex(1)> ["a"] |> CSV.encode |> CSV.decode |> Enum.to_list
** (Protocol.UndefinedError) protocol Enumerable not implemented for "a"
    (elixir) lib/enum.ex:1: Enumerable.impl_for!/1
    (elixir) lib/enum.ex:112: Enumerable.reduce/3
    (elixir) lib/enum.ex:981: Enum.map/2
             lib/csv/encoder.ex:47: CSV.Encoder.encode_row/3
             lib/csv/encoder.ex:42: anonymous fn/4 in CSV.Encoder.encode/2
    (elixir) lib/stream.ex:650: Stream.do_transform/7
    (elixir) lib/enum.ex:1740: Enum.take/2
             lib/csv/decoder.ex:120: CSV.Decoder.produce/2
iex(1)> [["b"]] |> CSV.encode |> CSV.decode |> Enum.to_list
[]
iex(2)> [["b"]] |> CSV.encode |> CSV.decode |> Enum.to_list
[["b"], ["b"]]
iex(3)> [["b"]] |> CSV.encode |> CSV.decode |> Enum.to_list
[]
iex(4)> [["b"]] |> CSV.encode |> CSV.decode |> Enum.to_list
[["b"]]

Notice that [["b"]] |> CSV.encode |> CSV.decode |> Enum.to_list resulted in [] the first time, then [["b"], ["b"]] and [], and then finally, on the 4th try, [["b"]], as expected. The mistake made initially causes the encoder to become corrupted apparently. If exit the iex session and re-do it without the bad line the problem does not occur.

Accept string input

Does this library accept a simple string input? Sometimes the entire csv string already exists in memory and it would be nice to simply pass it into this library.

(CSV.Parser.SyntaxError) Unterminated escape sequence

Hi,

I've recently exported data to CSV from Postgres, so I assume there is nothing wrong with the data from an RFC perspective.

I am using the latest csv release, 1.4.2, and I am facing the error mentioned in the subject during parsing.

A sample which gives me the error (with CSV header) is attached:

sample.csv.zip

id,question_id,user_id,year,locked,text,locale,inserted_at,updated_at,image_file_name,image_content_type,image_file_size,image_updated_at,comment_count
170,144,8,2015,f,"ООО...Неее... это не ко мне((
Но мое любимое вот это:
""Тихо-тихо ползи,
Улитка, по склону Фудзи,
Вверх, до самых высот.""",ru,2015-03-05 17:12:55,2015-03-05 17:12:55,,,,,0

P.S. I don't think the Russian text makes any difference, right? :)

Allow line breaks in fields

According to RFC 4180, page 2, fields that are enclosed in double quotes can have newlines in them. When trying to parse such a row:

first,row,here
one,two,"three
and newline"

The parser throws a syntax error:

(CSV.Parser.SyntaxError) Unterminated escape sequence. on line 1

Notice that the reported line number is incorrect as well, which is confusing.

I'll take a look to see if I can fix it, but I may need some direction :)

does not work with mac os legacy line feed CR

With a CSV file whose line separator is CR (legacy Mac OS), the module does not work: it cannot decode the file, treats everything as one line, and returns only a blank list when headers: true. As soon as the CSV file is rewritten with LF line endings, everything works as expected. This is with the latest stable Elixir on a Mac.

You get the legacy Mac OS CR line separator as soon as you create the CSV from Excel on a Mac, for instance.

Thanks.

New release?

Hi there,

thank you for this nice library!

It hasn't seen a release in quite some time and some of the new features on master are pretty nice and we'd like to use them, so a new release would be highly appreciated :)

Thanks!
Tobi

missing entry if parsing IO.stream

It turns out that a different result is obtained if IO.stream is used instead of File.stream!, e.g.:

iex(1)> "sample.csv" |> File.stream! |> CSV.Decoder.decode(headers: true)|>Enum.to_list
[%{"login" => "A", "name" => "Mr A", "shell" => "/bin/bash"},
 %{"login" => "admin", "name" => " Administrator", "shell" => " /bin/bash"},
 %{"login" => "root", "name" => " Root", "shell" => " /bin/sh"}]
iex(2)> f = File.open!("sample.csv")
#PID<0.176.0>
iex(3)> IO.stream(f, :line) |> CSV.Decoder.decode(headers: true) |> Enum.to_list 
[%{"login" => "admin", "name" => " Administrator", "shell" => " /bin/bash"}, 
 %{"login" => "root", "name" => " Root", "shell" => " /bin/sh"}]

content of "sample.csv":

login,name,shell
A,Mr A,/bin/bash
admin, Administrator, /bin/bash
root, Root, /bin/sh

Could you let me know if you have any idea what the problem is? Thanks.

Module version: 1.4.0

Calling `decode` with `num_pipes: 1` multiple times on the same stream yields different orderings each time

I'm noticing some weird stuff happening with the decoder when I call it multiple times in an IEx session.

Given a file my_file.csv with ~200 lines, doing

File.stream!("my_file.csv") |> CSV.Decoder.decode(num_pipes: 1) |> Enum.take(2)

yields the first 2 lines of the file the first time I run it, but subsequent invocations yield seemingly random / out-of-order lines.

I also get strange behavior when working with a simple stream (although I'm not able to reproduce the ordering issue):

iex(1)> stream = ~w(1,2,3,4 5,6,7,8 9,10,11,12 13,14,15,16) |> Stream.map(&(&1))
#Stream<[enum: ["1,2,3,4", "5,6,7,8"],
 funs: [#Function<45.113986093/1 in Stream.map/2>]]>
iex(2)> stream |> Enum.take(2)
["1,2,3,4", "5,6,7,8"]
iex(3)> stream |> Enum.take(2)
["1,2,3,4", "5,6,7,8"]
iex(4)> CSV.Decoder.decode(stream, num_pipes: 1) |> Enum.take(2)
#Function<25.113986093/2 in Stream.resource/3>
iex(5)> CSV.Decoder.decode(stream, num_pipes: 1) |> Enum.take(2)
[["1", "2", "3", "4"], ["5", "6", "7", "8"]]
iex(6)> CSV.Decoder.decode(stream, num_pipes: 1) |> Enum.take(2)
[]
iex(7)> CSV.Decoder.decode(stream, num_pipes: 1) |> Enum.take(2)
[["1", "2", "3", "4"], ["5", "6", "7", "8"]]
iex(8)> CSV.Decoder.decode(stream, num_pipes: 1) |> Enum.take(2)
[]

Decode is slow when using (headers: true)

When asking the CSV parser to add headers, it seems to take a while to start. Once going, it is generally fast. See the attached gif for a demo; the only difference in the code between the two panes is (headers: true).
[attached gif: slowcsv]

ParallelStream is not available

Hi!

I recently created a release with edeliver and I'm seeing this:

** (exit) an exception was raised:
    ** (UndefinedFunctionError) function ParallelStream.map/3 is undefined (module ParallelStream is not available)
        ParallelStream.map(#Function<57.89908360/2 in Stream.transform/3>, #Function<1.67996581/1 in CSV.Decoder.decode/2>, [num_workers: 3])

A naive question, but would requiring ParallelStream in the Decoder module fix the runtime error I am seeing?

Consider \r as a valid delimiter by default?

Files coming from Windows users will just contain \r as a newline delimiter, and I think it's good if the default decode method supports this out of the box. Otherwise there's no good/fast way to parse a CSV like that: we'd need to read each line, trim it, and then decode it. Using \r as a delimiter doesn't work because if \r\n appears in the file then \n will be part of the next line. \n could work, but then we have to trim each column's value.
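
A hedged preprocessing sketch (legacy.csv is a placeholder; this reads the whole file into memory, so it only suits files that fit in RAM):

File.read!("legacy.csv")
|> String.splitter(["\r\n", "\r", "\n"])   # longest pattern wins, so \r\n is one separator
|> Stream.reject(&(&1 == ""))              # drop the empty trailing chunk, if any
|> CSV.decode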

What do you think?

Support for csv in strings?

I'm getting an error when trying to use csv on strings rather than reading from a file:

iex(23)> string = "a,1
...(23)> b,4"
"a,1\nb,4"
iex(24)> string |> CSV.decode  |> Enum.map fn r -> Enum.map(r, &String.upcase/1) end
** (Protocol.UndefinedError) protocol Enumerable not implemented for "a,1\nb,4"
    (elixir) lib/enum.ex:1: Enumerable.impl_for!/1
    (elixir) lib/enum.ex:112: Enumerable.reduce/3
    (elixir) lib/stream.ex:1240: Enumerable.Stream.do_each/4
    (elixir) lib/stream.ex:700: Stream.do_transform/8
    (elixir) lib/enum.ex:1400: Enum.reduce/3
    (elixir) lib/enum.ex:1047: Enum.map/2
iex(24)> File.stream!("/Users/username/test_data.csv") |> CSV.decode |>
...(24)> Enum.map fn r -> Enum.map(r, &String.upcase/1) end
[["A", "1"], ["B", "4"]]
iex(26)>

The contents of test_data.csv are the same as the string used. Is there anything wrong I'm doing, or are strings not supported?

P.S.: I'm fairly new to the functional programming/Elixir world.

Make decode return errors in a tuple and decode! raise errors

The decoder interface could be separated into a function that raises errors and another that returns them in a tuple with the result:

myfile
|> File.stream!
|> CSV.decode! # potentially raises errors

myfile
|> File.stream!
|> CSV.decode # returns rows as { :ok, row } and errors as { :error, "Error message" }

Please tag v1.0.1

I'm adding a FreeBSD port, and it would be great to use the official tag :)

Too many processes

I'm processing a large number of CSV files, around 105GB in total. Thirty minutes after starting, it reaches "Too many processes", even if I configure ELIXIR_ERL_OPTIONS="+P 134217727".

The only way I can fix it is by changing num_workers to 1. I guess it is not cleaning up finished processes.

Add configurable maximum number of lines aggregated by line aggregator

Currently, when there is an escaped field that does not have a proper ending, the line aggregator will continue to aggregate lines indefinitely. This is problematic when working with large files that have syntactic errors.

CSV should aggregate up to n lines and then throw an error, but make this number configurable so that users can still decode files with large fields.

Explicit Ordering of Headers

Provide the ability for the client to decide the ordering of the headers.

Current Behavior: When headers are provided, the final output orders the headers in alphabetical order.

Desired Behavior: When headers are provided as a list in the options to CSV.Encoding.Encoder.encode/2, the headers in the final output should be ordered as in the list.
