Comments (7)
If you are parsing a file with a different line separator sequence than what your OS uses, then you must set it manually in the format configuration.
The parser will only treat \r\n as the newline sequence if you configure it that way, or if you are running on Windows (in which case the format is automatically set to use '\r\n' as the line separator sequence).
You got an exception because the parser reached the end of the input without finding a newline sequence (\r\n) after the comment. A lone \n is not the end of a line according to your format specification.
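To make that failure concrete, here is a minimal sketch of how a comment line could be skipped by searching for the configured line separator. The class `CommentSkipper` and its methods are hypothetical illustrations, not univocity-parsers' API; the real parser works on a buffered character reader and reports its own exception type.

```java
/**
 * Hypothetical, simplified illustration (not univocity-parsers code):
 * skips a comment line by searching for the configured line separator.
 */
final class CommentSkipper {

    private CommentSkipper() {
    }

    /**
     * Returns the index just after the line separator that ends the comment
     * starting at {@code start}. Throws if the input ends first, mirroring the
     * "reached the end of the input" failure described above.
     */
    static int skipComment(String input, int start, String lineSeparator) {
        int end = input.indexOf(lineSeparator, start);
        if (end < 0) {
            throw new IllegalStateException(
                    "End of input reached while skipping comment: no "
                            + describe(lineSeparator) + " found");
        }
        return end + lineSeparator.length();
    }

    /** Renders a separator with visible escapes, e.g. CR LF as the text \r\n. */
    static String describe(String separator) {
        return separator.replace("\r", "\\r").replace("\n", "\\n");
    }
}
```

With the separator set to "\r\n", a file whose lines end in a lone \n never contains the separator, so the "comment" runs to the end of the input; with "\n" the skip lands on the header row.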
The normalized newline character is used internally to replace the configured line separator sequence (\r\n in your case) when parsing/writing, so our parsers/writers, and anyone who writes a custom parser/writer, don't have to test for \r\n, \r or \n sequences. All your parser has to do is test for the normalized newline character, which represents whatever sequence you defined in the format configuration.
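The normalization idea can be sketched in a few lines: replace every occurrence of the configured separator with one normalized character up front, then let the record-splitting code test only for that character. `NewlineNormalizer` is a hypothetical name, and this eager string replacement is a simplification of what a streaming parser would do while reading.

```java
/**
 * Hypothetical sketch (not univocity-parsers code) of line separator
 * normalization: each occurrence of the configured separator becomes a single
 * normalized newline character, so later code checks for one char only.
 */
final class NewlineNormalizer {

    private NewlineNormalizer() {
    }

    /** Replaces each occurrence of {@code lineSeparator} with {@code normalizedNewline}. */
    static String normalize(String input, String lineSeparator, char normalizedNewline) {
        return input.replace(lineSeparator, String.valueOf(normalizedNewline));
    }

    /** Counts line terminators by testing only for the normalized character. */
    static int countLines(String normalized, char normalizedNewline) {
        int count = 0;
        for (int i = 0; i < normalized.length(); i++) {
            if (normalized.charAt(i) == normalizedNewline) {
                count++;
            }
        }
        return count;
    }
}
```

Because the replacement happens before any record logic runs, `countLines` stays the same whether the input used \r\n, \r or \n, as long as the configured separator matches what the data actually contains.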
So you must do this:
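import com.univocity.parsers.csv.CsvParserSettings;
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n"); // a lone \n now ends a line, whatever the OS default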
from univocity-parsers.
That's what I thought. But then I realized that this file parses correctly (with the normalized newline) when the comment line is removed.
So the behavior should be consistent whether or not the file has a comment. If I had to choose, I would rather be able to read input that already uses the normalized newline symbol.
So we have a bug that occurs on Windows:
If you have this file:
# this is a comment line\n
A,B,C\n
1,2,3\n
When reading this on Windows, the reader returns \r\n for each new line instead of just \n.
Now, if you set the line separator to be \n (which you should), it parses the values as:
A,B and C\r
1, 2 and 3\r
Apparently the JVM introduces the \r character automatically when reading files on Windows whose line terminators are \n only.
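The symptom can be reproduced without any file I/O: if the data really contains \r\n but the configured separator is \n, each line keeps its \r, which ends up glued to the last field. This sketch uses plain JDK string splitting rather than univocity-parsers, and the class name `TrailingCarriageReturnDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical demo (plain JDK, not univocity-parsers): splits rows on a
 * configured separator, then splits fields on commas. No quoting support.
 */
final class TrailingCarriageReturnDemo {

    private TrailingCarriageReturnDemo() {
    }

    static List<List<String>> parse(String input, String lineSeparator) {
        List<List<String>> rows = new ArrayList<>();
        // Pattern.quote matches the separator literally; String.split also
        // drops the trailing empty string after the final separator.
        for (String line : input.split(Pattern.quote(lineSeparator))) {
            rows.add(Arrays.asList(line.split(",")));
        }
        return rows;
    }
}
```

Parsing "A,B,C\r\n1,2,3\r\n" with "\n" yields A, B and C\r, then 1, 2 and 3\r; with "\r\n" the fields come out clean.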
For testing I used a plain String containing exactly the newline symbols I wanted.
Also, git has an option to automatically convert the line endings of files:
https://help.github.com/articles/dealing-with-line-endings/
Oh, we don't want GitHub to guess the line terminator for us. We have tests around files with different line terminators.
So apparently many other parsers suffer from the excessive helpfulness of the JVM (guessing what characters we need). This test produces different outcomes when executed on Windows and on Linux.
Closing the issue as it is not a bug.
Just tested everything again and included a test case (see commit ca3a11b).
I was getting errors before because my files with \n were being converted automatically to use \r\n by my git installation on Windows.
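When a fixture behaves differently on Windows and Linux, it can help to check which line endings the checked-out file really contains before suspecting the parser. A minimal sketch, assuming the file content has already been read into a String (for example with `java.nio.file.Files.readString`, Java 11+); `LineEndingStats` is a hypothetical helper, not part of any library.

```java
/** Hypothetical helper that counts the line terminators present in some text. */
final class LineEndingStats {

    final int crlf;   // "\r\n" pairs
    final int loneLf; // "\n" not preceded by "\r"
    final int loneCr; // "\r" not followed by "\n"

    private LineEndingStats(int crlf, int loneLf, int loneCr) {
        this.crlf = crlf;
        this.loneLf = loneLf;
        this.loneCr = loneCr;
    }

    static LineEndingStats of(String text) {
        int crlf = 0;
        int loneLf = 0;
        int loneCr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this pair
                } else {
                    loneCr++;
                }
            } else if (c == '\n') {
                loneLf++;
            }
        }
        return new LineEndingStats(crlf, loneLf, loneCr);
    }

    @Override
    public String toString() {
        return "CRLF=" + crlf + ", LF=" + loneLf + ", CR=" + loneCr;
    }
}
```

If a file committed with \n endings reports CRLF counts after checkout, git's line-ending conversion (core.autocrlf or a .gitattributes rule) is the likely cause.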
Related Issues (20)
- Tutorial site is not available
- Wrong result for FixedWidthParser
- CSV Reader does not escape ASCII control characters
- Differentiating NULL, EMPTY-STRING and SPACE in CSV files from a ResultSet
- New Realese
- The headers cache (StringCache) cause a memory leak
- Incorrect header parsing
- Add small (4 Gb limit) clob support
- `parseRecord(bean)` does not work
- parseRow not working in (CentOS Linux release 7.6.1810 (Core) )
- Integrating uniVocity into OSS-Fuzz
- Website unavailable
- is there is any way to get the MaxDataColumnCount in the TSV File which has no text Qualifier and Headers.
- Incorrect column pruning
- CsvWriterSettings - add option to not create the final line separator
- AbstractCharInputReader throws ArrayIndexOutOfBoundsException when buffer starts with whitespace
- Date format issue "MM/dd/YYYY" in csv download file
- THIS REPO IS ABANDONED – IT IS USELESS TO OPEN ISSUES OR PULL REQUESTS
- Memory not released after processing
- StringIndexOutOfBoundsException when quoted space/s are present in first row with parse option Ignore trailing whitespaces as true and Ignore leading whitespaces as false