
csv's People

Contributors

algorithmsarecool, azuxirenleadguy, claasd, estebanz01, fossabot, jburman, joeskeen, prof79, renovate[bot], stevehansen

csv's Issues

How to iterate headers? Property...

Hi there,
I'm using your library to read data from a CSV file (as expected), with HeaderMode.HeaderPresent active.
The iteration through the items starts with the second line, where the data starts...
How can the headers be iterated?

I don't see any Header property holding the list of all column names.
It would be good to have direct (read) access to the headers.

Thank you for any hint.
// SiL

PS: Got it working, when switched the HeaderMode to HeaderAbsent ;)

Wrong escaping logic with nested double quotes and a comma at the end

This csv row:

"Normal,\"quoted with nested \"double\" quotes, and comma at the end,\",normal 3,normal 4,normal 5"

is counted as 2 columns instead of 5.

Note that the row contains nested double quotes. With those alone, the test passes correctly (5 columns), but if you add a comma at the very end of the column (the 2nd column in this case), the row evaluates to only 2 columns instead of 5.

ReadFromText reads the file correctly, but headers are incorrect with HeaderMode.HeaderAbsent

My code:

string importDataString = "Test;\"A\nB\nC\nD\nE\nF\nG\nH\";testing with very long string;123123";
var options = new CsvOptions
{
    Separator = ";",
    HeaderMode = HeaderMode.HeaderAbsent,
    AllowNewLineInEnclosedFieldValues = true,
    AllowBackSlashToEscapeQuote = false,
};

CsvReader.ReadFromText(importDataString, options)

The result of my reader is a list with one record.

Record:

  • ColumnCount = 4 (Ok)
  • Headers = ["Column1", "Column2"] (Not ok)
  • Values = ["Test","A\nB\nC\nD\nE\nF\nG\nH","testing with very long string","123123"] (Ok)

Why is Headers an array of 2 elements and not 4? ColumnCount is actually 4, and Values also contains 4 elements.

No CsvWriter implementation?

Am I correct in that this library currently only reads csv files? I don't see any tests or implementation for writing, but I wanted to make sure I wasn't missing something....

Thanks!

How to write \n instead of \r\n inside cell

Hi, Steve! Thank you for the great tool!
Could you please explain how to change (if possible) the newline character (NLC) used by CsvWriter.WriteToText? Something similar is possible through CsvOptions in CsvReader.ReadFromText, and it is very useful. I read my tables with \r\n as the general NLC and \n as the NLC inside multiline cells. I would like to write them the same way, but CsvWriter.WriteToText uses \r\n in both cases. This is not wrong and the data stays consistent, but it does mess up the appearance of my tables a bit.

HeaderMode.HeaderAbsent multi blank columns

Hi,

When I read a CSV file with the HeaderMode.HeaderAbsent option and the first row has two blank columns (I assume the same happens with any two columns that have identical content), I get the error "an item with the same key has already been added".

I assume this is because the values of the first row are used as the field/column names. The names should instead be generated as "Column1", "Column2", and so on to avoid this.
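
The naming scheme suggested above can be sketched as follows. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and only the "Column1", "Column2" pattern comes from the reports on this page.

```csharp
using System.Linq;

public static class DefaultHeaders
{
    // Hypothetical helper: builds "Column1".."ColumnN" for a row with N fields,
    // so header names never depend on (possibly duplicated) cell values.
    public static string[] Generate(int columnCount) =>
        Enumerable.Range(1, columnCount).Select(i => $"Column{i}").ToArray();
}
```

Because the names are derived from the position only, two blank cells in the first row can no longer collide as dictionary keys.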

Support headerless CSV files (all rows are data)

Thanks for this little library. I'm working with some CSV files that will not have any header rows, so I'm modifying your project to optionally include the first row as data.

Does it sound like an idea you would care for in csv?

I love this simplicity

CSV is a simple format, but also one of the most difficult for developers to handle.
I love this simplicity and it helps me a lot.

Thanks,

Error: An item with the same key has already been added.

I can't see why I would get an error like this. My file has no key of any kind, so why would the assembly assume that it does? Is it assuming that the first column is some kind of unique key? It shouldn't assume anything.

Doesn't handle cells with return very well

For the input data (with CR-LF line endings)

key,value
test1,test2
test3,"test4

test5"
test6,test7

which is expected to be parsed to

key value
test1 test2
test3 test4\r\n\r\ntest5
test6 test7

is actually parsed to

key value
test1 test2
test3 "test4
test5"
test6 test7

Blank Fields

Parsing breaks when reading blank fields.

example:

,,First

public ReadLineFromStream

I have a huge csv file.
I would like to read it line by line so it would not take a bunch of memory.
Are you thinking about adding that functionality?
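
As a library-independent illustration of the request, the sketch below yields one line at a time from a TextReader, so only the current line is held in memory. The class name is hypothetical, and the sketch assumes that no field contains an embedded line break; it says nothing about how this library buffers its input.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LazyLines
{
    // Yields lines one by one; the file is never loaded as a whole.
    // Assumption: records never span multiple lines.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            yield return line;
        }
    }
}
```

A caller would wrap a StreamReader over the file and consume the result in a foreach loop. The built-in File.ReadLines method offers the same lazy behaviour for files on disk.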

Doesn't detect separator

I tried reading a CSV file with this package, but the separator is not detected. Even when I pass in options that specify the separator, only one column is detected.

Note: This happened after opening the file in Excel. It works fine if the file wasn't opened in Excel.

No documentation

I tried to use this package but I can't find any documentation to direct me.

Switch to state based parser

It seems most of our performance issues come from the Regex-based parser: the expression itself is complex, it is invoked for each line, and we also need to clean up afterward when it splits quoted values incorrectly.

Going over the reader/memory character by character will allow us to do trimming at once, will handle quoting correctly, can escape correctly, and will probably handle multiline better as well.
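
A minimal sketch of the character-by-character approach, under simplifying assumptions: it handles a single record, treats "" inside a quoted field as an escaped quote, and does no trimming or backslash escaping. The class and method names are hypothetical and this is not the parser the library ships.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StateSplitter
{
    // Splits one CSV record into fields by tracking a single piece of state:
    // whether the current position is inside a quoted field.
    public static List<string> Split(string record, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < record.Length; i++)
        {
            var c = record[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < record.Length && record[i + 1] == '"')
                {
                    current.Append('"'); // "" is an escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    current.Append(c); // separators and line breaks are plain data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing separator
        return fields;
    }
}
```

With this sketch, the record one,"two-a,two-b,""two-c"",two-d",three from another report on this page splits into three fields, and ,,First yields two empty fields followed by First.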

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

github-actions
.github/workflows/codeql-analysis.yml
  • actions/checkout v3
  • github/codeql-action v2
  • github/codeql-action v2
  • github/codeql-action v2
.github/workflows/stale.yml
  • actions/stale v7
nuget
Csv.Tests/Csv.Tests.csproj
  • MSTest.TestAdapter 2.2.10
  • MSTest.TestFramework 2.2.10
  • Microsoft.NET.Test.Sdk 17.6.0
Csv/Csv.csproj
  • Microsoft.SourceLink.GitHub 1.1.1

  • Check this box to trigger a request for Renovate to run again on this repository

Headers vs Values, and HasColumn

I test line.HasColumn(x) to see if a column exists, and then query line[x].
line.Headers is 55 elements wide, and line.Values is 40 elements wide. So, I get:

System.InvalidOperationException: Invalid row, missing line_item_1 header, expected 55 columns, got 40 columns.

I'd expect it to return "".

Backslash before quote with AllowBackSlashToEscapeQuote = false fails

When a field contains the combination \" at the end, parsing will fail.
The following test will provoke it in the second section:

    [TestMethod]
    public void BackslashBeforeQuote()
    {
        var withSpace = CsvReader.ReadFromText("\"A\";\"B\";\"C\"\n\"A \"\"\";\"B \\\"\" \";\"C\"", new CsvOptions { AllowNewLineInEnclosedFieldValues = true, AllowBackSlashToEscapeQuote = false, AllowSingleQuoteToEncloseFieldValues = false }).ToArray();
        Assert.AreEqual(1, withSpace.Length);
        Assert.AreEqual(3, withSpace[0].Headers.Length);
        Assert.AreEqual("A \"", withSpace[0][0]);
        Assert.AreEqual("B \\\" ", withSpace[0][1]);
        Assert.AreEqual("C", withSpace[0][2]);
        var withoutSpace = CsvReader.ReadFromText("\"A\";\"B\";\"C\"\n\"A \"\"\";\"B \\\"\"\";\"C\"", new CsvOptions { AllowNewLineInEnclosedFieldValues = true, AllowBackSlashToEscapeQuote = false, AllowSingleQuoteToEncloseFieldValues = false }).ToArray();
        Assert.AreEqual(1, withoutSpace.Length);
        Assert.AreEqual(3, withoutSpace[0].Headers.Length);
        Assert.AreEqual("A \"", withoutSpace[0][0]);
        Assert.AreEqual("B \\\"", withoutSpace[0][1]);
        Assert.AreEqual("C", withoutSpace[0][2]);
    }

AllowNewLineInEnclosedFieldValues incorrectly detected column count when last attribute is empty

Hi, I encountered this problem in V1.0.51. I noticed there is a new commit regarding a multiline issue, but I am not sure whether it is relevant.

The code I used:

 static void Main(string[] args)
        {
            // V1.0.51
            string content =
@"A1,A2,A3,A4
1, 1,""Hello
World"",""""
2, 2,""Hello
World"",""""";
            // Case 1
            Console.WriteLine("Case 1:");
            foreach (var line in CsvReader.ReadFromText(content, 
                new CsvOptions() { AllowNewLineInEnclosedFieldValues = false }))
            {
                Console.WriteLine($"{line.ColumnCount}");
            }
            Console.ReadKey();

            // Case 2
            Console.WriteLine("Case 2:");
            foreach (var line in CsvReader.ReadFromText(content,
                new CsvOptions() { AllowNewLineInEnclosedFieldValues = true }))
            {
                Console.WriteLine($"{line.ColumnCount}");
                Console.WriteLine($"{line.Values}");
            }
            Console.ReadKey();
        }

The csv:

A1,A2,A3,A4
1, 1,"Hello
World",""
2, 2,"Hello
World",""

This is what I expect (as in Excel import csv):

Excel Import

This is what I am getting from program output:

Case 1:
3
2
3
2
Case 2:
7
System.String[]

Csv Reader

Please check!
Thanks.

Visual Studio Solution: CsvMultilineIssue.zip

Double quotes in unquoted columns is breaking parsing

I have a file that uses tabs as separators. The columns are unquoted, but quotes may exist in the data. The parser puts all remaining data into the first column that contains a double quote. In C#, you can reproduce it with this:

var header = "h1\th2\th3\r\n1\t\"2\" is 2\t3"; var res = CsvReader.ReadFromText(header);

This results in 3 headers but only 2 data elements in the first row. There isn't (or at least I can't find) a way to disable quote parsing.

Problem with Detecting Quote Escape Sequences When Followed by Separator Character in Quote-Enclosed Fields

As can be seen below, when the read text contains a quote-enclosed field containing the quote escape sequence "", followed by the separator character ,, the APIs for reading column values parse the data incorrectly.

image

Also, as can be seen below, even when the AllowSingleQuoteToEncloseFieldValues flag is set, the incorrect parsing behaviour remains unchanged. I tested this flag in order to see if, for some reason, quote-enclosed fields were only detected with this mode activated, even though I know it is specifically to be used when fields are enclosed with the single quote character '.

image

To summarize, it is valid for CSV data to contain fields enclosed by the " character, while also containing quote escape sequences such as "" followed by separator characters such as the comma ,, but this library does not seem to be able to parse this formatting correctly.

Support for encoding?

Is there going to be support for other types of encoding?
For example, letters such as ø, æ, and å?
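
One way to approach this independently of the library is to decode the bytes with an explicit encoding first and then hand the resulting string to the parser. The sketch below assumes the file is ISO-8859-1 (Latin-1) encoded purely for illustration; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class EncodingAwareRead
{
    // Decodes the whole stream with the given encoding before any CSV parsing.
    public static string ReadAllText(Stream stream, Encoding encoding)
    {
        using var reader = new StreamReader(stream, encoding);
        return reader.ReadToEnd();
    }
}
```

The decoded string could then be passed to CsvReader.ReadFromText, which appears in other reports on this page. For example, the bytes 0xF8 0xE6 0xE5 decoded with Encoding.GetEncoding("iso-8859-1") give the text øæå.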

Need to support custom separators for CsvWriter

De facto, there are at least three versions of the CSV format with different separators: comma (the canonical separator), semicolon, and tab. CsvReader supports them through CsvOptions, but CsvWriter doesn't.
As CsvOptions was intended for import only, I suggest not using it and instead adding a char parameter to the Write() method like this:

public static void Write(TextWriter writer, string[] headers, IEnumerable<string[]> lines, char separator=',')
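
A minimal sketch of how a method with the proposed signature could behave. It is an illustration only, not the library's implementation: the class name and the Escape helper are hypothetical, and the quoting rule (quote a field only when it contains the separator, a quote, or a line break) is an assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SeparatorWriterSketch
{
    // Quotes a field only when it contains the separator, a quote, or a line break.
    private static string Escape(string field, char separator) =>
        field.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    public static void Write(TextWriter writer, string[] headers,
        IEnumerable<string[]> lines, char separator = ',')
    {
        var sep = separator.ToString();
        writer.WriteLine(string.Join(sep, headers.Select(h => Escape(h, separator))));
        foreach (var line in lines)
        {
            writer.WriteLine(string.Join(sep, line.Select(v => Escape(v, separator))));
        }
    }
}
```

In this sketch the row terminator comes from the TextWriter's NewLine property, so a caller can choose "\n" or "\r\n" there, while line breaks inside field values are written unchanged.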

Strongly named assembly

Would you consider signing the assembly (making it strongly named)?
When trying to use the library in a strongly named assembly, I'm getting the following error:
Unhandled Exception: System.IO.FileLoadException: Could not load file or assembly 'Csv, Version=1.0.58.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. A strongly-named assembly is required. (Exception from HRESULT: 0x80131044)
More info: https://docs.microsoft.com/en-us/dotnet/standard/library-guidance/strong-naming

Quoted commas are parsed as individual columns if preceded by space.

Issue:

A, ",," is parsed as four columns.

However, A,",," (no space) is correctly parsed as two columns.

Expected:

The line to be parsed as two columns.

Example Code:

        public static void ColumnBug()
        {
            string data =
"""
A, ",,"
A,",,"
""";

            using var stream = new MemoryStream(Encoding.UTF8.GetBytes(data));
            IEnumerable<ICsvLine> csvLines = Csv.CsvReader.ReadFromStream(stream, new CsvOptions()
            {
                HeaderMode = HeaderMode.HeaderAbsent,
                TrimData = true,
            });

            foreach (ICsvLine csvLine in csvLines)
            {
                Console.WriteLine($"Column Count {csvLine.ColumnCount}");
                Console.WriteLine($"Parsed:\t{string.Join("|", csvLine.Values)}");
                Console.WriteLine($"Raw:\t{csvLine.Raw}");
                Console.WriteLine();
            }
        }

Output:

Column Count 4
Parsed: A|"||"
Raw:    A, ",,"

Column Count 2
Parsed: A|,,
Raw:    A,",,"

Doesn't handle "no data" well

Assuming the CSV has a header but no data, there ought to be some way of telling that that has happened. As it stands, the best I can do is

        var hasData = false;
        var csvText = File.ReadAllText("c:\\tmp\\norecords.csv");
        foreach (var line in CsvReader.ReadFromText(csvText))
        {
            hasData = true;
        }
        if (hasData)
        {

        }

[Syntax] `# comments are ignored`

Hi, I know this is a well-intended feature and I support it most of the time, but lately I have been parsing CSV files whose real values start with #. Because there is no way to turn off comment skipping, such CSV files cannot be parsed.

I understand that the authoring program should have exported the values with " to "escape" them properly, but it does not do so in this case, and it would not be easy to escape those values in place.

Hypothetical data would look like this:

HashTag,Author
#New,Charles
#Feature,Tom

Ideally there would be an option on Csv.CsvOptions:

Csv.CsvReader.ReadFromText(content, new Csv.CsvOptions()
{
    IgnoreComments = false
})

Appreciated!

HeaderMode.HeaderPresent columns with duplicate name

When there is a header and it contains duplicate names, the reader throws InvalidOperationException("Duplicate headers detected in HeaderPresent mode. If you don't have a header you can set the HeaderMode to HeaderAbsent.");

Since the CSV is provided to me and can't be changed prior to processing, I added an option FixDuplicateColumnNames and some code to fix duplicate name issues.

If there is another way that doesn't require code changes, please let me know.

I would be happy to check in the changes. Please let me know if you wish to have the changes checked in.

Thanks

Fields that have a comma after an escaped quote are still split.

Using this as the input line: one,"two-a,two-b,""two-c"",two-d",three
The actual Lines property is [
"one",
""two-a, two-b, ""two-c"",
"two-d"",
"three" ]
Expected Lines property is [
"one",
"two-a,two-b,"two-c",two-d",
"three" ]

Test code that shows the issue:

    [TestMethod]
    public void TestInternalSeperatorAfterEscapedQuote()
    {
        CsvOptions options = new CsvOptions();
        options.HeaderMode = HeaderMode.HeaderAbsent;
        foreach (ICsvLine line in CsvReader.ReadFromText("one,\"two - a, two - b, \"\"two - c\"\", two - d\",three", options))
            Assert.AreEqual(3, line.Values.Length);
    }
