Comments (7)

jbax commented on September 17, 2024

If you are parsing a file with a different line separator sequence than what your OS uses, then you must set it manually in the format configuration.

The parser only treats \r\n as the newline sequence if you configure it explicitly, or if you are running on Windows (where the format defaults to '\r\n' as the line separator sequence).

You got an exception because the parser reached the end of the input without finding a newline sequence (\r\n) after the comment. A lone \n is not the end of the line according to your format specification.

The normalized line separator is used internally to replace the \r\n sequence when parsing/writing, so our parsers/writer and whoever writes a parser/writer won't have to test for \r and \n sequences (or just \r or just \n). All your parser has to do is to test for the normalized newline character, and it will represent whatever sequence you defined in the format configuration.

So you must do this:

CsvParserSettings settings = new CsvParserSettings();
// The line separator is a character sequence, so pass a String rather than a char literal.
settings.getFormat().setLineSeparator("\n");
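
To make the normalization idea above more concrete, here is a minimal, self-contained sketch in plain Java. It is not the library's actual implementation; the class name `LineSeparatorSketch` and the methods `normalize` and `countSeparators` are hypothetical and exist only for illustration. It assumes the configured separator is a non-empty string and shows why input that uses only \n contains no line endings at all when the configured sequence is \r\n.

```java
public class LineSeparatorSketch {

    // Hypothetical helper: replaces each occurrence of the configured
    // separator sequence with a single normalized character. Anything that
    // is not a complete separator sequence is copied through unchanged.
    static String normalize(String input, String lineSeparator, char normalizedNewline) {
        if (lineSeparator.isEmpty()) {
            throw new IllegalArgumentException("line separator must not be empty");
        }
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(lineSeparator, i)) {
                out.append(normalizedNewline);
                i += lineSeparator.length();
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Hypothetical helper: counts how many complete separator sequences
    // appear in the input.
    static int countSeparators(String input, String lineSeparator) {
        if (lineSeparator.isEmpty()) {
            throw new IllegalArgumentException("line separator must not be empty");
        }
        int count = 0;
        int i = 0;
        while ((i = input.indexOf(lineSeparator, i)) >= 0) {
            count++;
            i += lineSeparator.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String unixInput = "# this is a comment line\nA,B,C\n1,2,3\n";

        // With "\r\n" configured, the \n-only input has no separator sequences.
        System.out.println(countSeparators(unixInput, "\r\n")); // prints 0

        // With "\n" configured, each of the three lines is terminated.
        System.out.println(countSeparators(unixInput, "\n")); // prints 3

        // A \r\n input collapses to the normalized character.
        System.out.println(normalize("A,B,C\r\n1,2,3\r\n", "\r\n", '\n').equals("A,B,C\n1,2,3\n")); // prints true
    }
}
```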

from univocity-parsers.

adessaigne commented on September 17, 2024

That's what I thought. Then I realized that you can parse this file properly (with the normalized newline) when the comment line is absent.

So the behavior should be consistent whether or not the file contains a comment. If I had to choose, I would prefer being able to read input whose newlines are already normalized.

jbax commented on September 17, 2024

So we have a bug that occurs on Windows:

If you have this file:

# this is a comment line\n
A,B,C\n
1,2,3\n

When reading this on Windows, the reader returns \r\n for each new line instead of just \n.

Now, if you set the line separator to be \n (which you should), it parses the values as:

A,B and C\r
1, 2 and 3\r

Apparently the JVM introduces the \r character automatically when reading files on Windows whose line terminators are \n only.

adessaigne commented on September 17, 2024

For testing I used a plain String that has the new line symbols I wanted.

Also, Git has an option to automatically convert the line endings of files.

https://help.github.com/articles/dealing-with-line-endings/

jbax commented on September 17, 2024

Oh, we don't want Git to guess the line terminator for us. We have tests around files with different line terminators.

jbax commented on September 17, 2024

So apparently many other parsers suffer from the excessive helpfulness of the JVM (guessing what characters we need). This test produces different outcomes when executed on Windows and on Linux.

jbax commented on September 17, 2024

Closing the issue as it is not a bug.

Just tested everything again and included a test case (see commit ca3a11b).

I was getting errors before because my files with \n were being converted automatically to use \r\n by my git installation on Windows.
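
For anyone hitting the same confusion, one way to confirm which terminators a file really contains on disk is to count them in the raw bytes. The following is a minimal sketch, not part of univocity-parsers; the class name `LineEndingCheck` and the method `countLineEndings` are hypothetical. It relies on `Files.readAllBytes`, which returns the file's bytes without any line-ending translation, and falls back to an in-memory sample when no path is given.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineEndingCheck {

    // Hypothetical helper: returns {crlfCount, loneLfCount, loneCrCount}
    // for the given raw bytes. A "\r\n" pair is counted once as CRLF.
    static int[] countLineEndings(byte[] data) {
        int crlf = 0;
        int lf = 0;
        int cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) throws Exception {
        // Read the file given on the command line, or use a \n-only sample.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "# this is a comment line\nA,B,C\n1,2,3\n".getBytes(StandardCharsets.US_ASCII);

        int[] counts = countLineEndings(data);
        System.out.println("CRLF: " + counts[0] + ", LF only: " + counts[1] + ", CR only: " + counts[2]);
    }
}
```

If a file that was committed with \n shows a non-zero CRLF count in a Windows checkout, the conversion happened before the parser ever saw the data.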
