
markpflug / sylvan.data.excel


The fastest .NET library for reading Excel data files.

License: MIT License

C# 100.00%
data dotnet dotnet-core excel xls xlsb xlsx

sylvan.data.excel's People

Contributors

0xced, caiusjard, djshaw01, markpflug, roncrush, sgunkel


sylvan.data.excel's Issues

How to read data on first row if schema unknown or first row has data?

Hello,

Thank you for an interesting library; after initial testing, it looks fast compared with similar alternatives. I have a few questions that I couldn't find answered in any examples or docs.

  1. In the sample Excel file, TestExcel.xlsx, the code below skips row 1, so the line var value = edr.GetString(i); first reads cells A2 and then B2. Is there a way to actually read row 1? I won't know the schema in advance, and row 1 may not actually contain field names. How can I read the data in row 1?
    TestExcel.xlsx

  2. Is there a way to specify a single sheet to read by name, rather than iterating over all sheets?

  3. Is there a way to specify a fixed range to read, rather than iterating over all rows and cells? For example, if I want to read the Excel range XD1000:XZ5000, can I limit the iterations to that range only?

Below is the code I've been testing with.

using ExcelDataReader edr = ExcelDataReader.Create(streamCopy, ExcelWorkbookType.ExcelXml);

string importSheetName = "Sheet1";
do
{
    var sheetName = edr.WorksheetName;
    if (sheetName.Equals(importSheetName))
    {
        // enumerate rows in current sheet.
        while (edr.Read())  // <- Seems to skip row 1, first edr.RowNumber == 2 ?
        {
            // iterate cells in row.
            for (int i = 0; i < edr.RowFieldCount; i++)
            {
                var value = edr.GetString(i); // <- First 2 read values are "", 1 (cells A2 and B2) ?
            }
            // Can use other strongly-typed accessors
            // bool flag = edr.GetBoolean(0);
            // DateTime date = edr.GetDateTime(1);
            // decimal amt = edr.GetDecimal(2);
        }

        break;
    }

    // iterates sheets
} while (edr.NextResult());
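For question 3, the reader still has to move through rows sequentially, but the work inside the loop can be limited to a range. A minimal sketch, assuming you translate the A1-style range into numeric bounds yourself; ColumnIndex, ParseCell and ParseRange are hypothetical helpers (not part of Sylvan), written as top-level statements for a .NET 6+ console app.

```csharp
using System;
using System.Globalization;

// Converts Excel column letters ("A", "Z", "AA", "XD", ...) to a zero-based ordinal.
static int ColumnIndex(string letters)
{
    int index = 0;
    foreach (char ch in letters.ToUpperInvariant())
    {
        if (ch < 'A' || ch > 'Z')
            throw new FormatException($"Invalid column letters: '{letters}'");
        index = index * 26 + (ch - 'A' + 1); // base 26 with no zero digit
    }
    return index - 1;
}

// Splits a cell reference such as "XD1000" into (zero-based column, one-based row).
static (int Col, int Row) ParseCell(string cell)
{
    int split = 0;
    while (split < cell.Length && char.IsLetter(cell[split])) split++;
    if (split == 0 || split == cell.Length)
        throw new FormatException($"Invalid cell reference: '{cell}'");
    return (ColumnIndex(cell[..split]),
            int.Parse(cell[split..], NumberStyles.None, CultureInfo.InvariantCulture));
}

// Parses "XD1000:XZ5000" (spaces tolerated) into inclusive bounds:
// zero-based column ordinals and one-based Excel row numbers.
static (int FirstCol, int LastCol, int FirstRow, int LastRow) ParseRange(string range)
{
    string[] parts = range.Replace(" ", "").Split(':');
    if (parts.Length != 2)
        throw new FormatException($"Expected a range like 'A1:B2': '{range}'");
    var (c0, r0) = ParseCell(parts[0]);
    var (c1, r1) = ParseCell(parts[1]);
    return (Math.Min(c0, c1), Math.Max(c0, c1), Math.Min(r0, r1), Math.Max(r0, r1));
}
```

Inside the edr.Read() loop, you could skip rows whose edr.RowNumber is below FirstRow, break once it passes LastRow, and read only ordinals FirstCol through Math.Min(LastCol, edr.RowFieldCount - 1). This assumes RowNumber is the one-based Excel row number, which fits the observation that the first RowNumber is 2 when row 1 has been consumed as headers. For question 1, the reader would need to be opened without a header row; I believe that is Schema = ExcelSchema.NoHeaders on ExcelDataReaderOptions, but treat the name as an assumption and check it against your version.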

Sylvan.Data.Excel.ExcelFormulaException

Hi,

When I try to load an Excel worksheet into a table, I get the following error:

05/31/2022 10:06:07 : Sylvan.Data.Excel.ExcelFormulaException: Exception of type 'Sylvan.Data.Excel.ExcelFormulaException' was thrown.
   at Sylvan.Data.Excel.XlsbWorkbookReader.GetString(Int32 ordinal)
   at Sylvan.Data.Excel.ExcelDataReader.GetValue(Int32 ordinal)
   at Sylvan.Data.Excel.ExcelDataReader.GetValues(Object[] values)
   at System.Data.ProviderBase.SchemaMapping.LoadDataRow()
   at System.Data.Common.DataAdapter.FillLoadDataRow(SchemaMapping mapping)
   at System.Data.Common.DataAdapter.FillFromReader(DataSet dataset, DataTable datatable, String srcTable, DataReaderContainer dataReader, Int32 startRecord, Int32 maxRecords, DataColumn parentChapterColumn, Object parentChapterValue)
   at System.Data.Common.DataAdapter.Fill(DataTable[] dataTables, IDataReader dataReader, Int32 startRecord, Int32 maxRecords)
   at System.Data.DataTable.Load(IDataReader reader, LoadOption loadOption, FillErrorEventHandler errorHandler)
   at System.Data.DataTable.Load(IDataReader reader)

I'm using this code:

var table = new DataTable(edr.WorksheetName);
table.Load(edr);

It works fine when there are no formulas included.

Thanks a lot!

Binder Issue

OK, after a long time without hearing from me, I have an issue with the binder.

In binding, I assign column values to the string property DateOfArrival_Str, and only after my .FixDateTime() function has run is the value assigned to the DateTime property DateOfArrival. I do this because of mismatches in date formats; this way I can keep improving .FixDateTime(). This has worked with other files and classes, but I'm getting errors with this file and I'm not sure why.

As far as I could locate it, the binder seems to have an issue with the first line.
When I modify the binder (attached below) to:

Modified binder (seems to work for this issue, but it's slow)

        public static IEnumerable<T> GetList<T>(this DbDataReader reader)
        {
            var binder = DataBinder.Create<T>(reader);
            while (reader.Read())
            {
                var item = (T)Activator.CreateInstance(typeof(T));
                try
                {
                    binder.Bind(reader, item);
                }
                catch
                {
                    // swallow the binding error; item keeps whatever was bound before it
                }
                yield return item;
            }
        }

It works, but it slows down drastically. From what I could see in the debugger, the item is partially bound at the point of the exception, but I cannot locate where the issue lies.

The file is downloaded from SharePoint.

Exception - DataBinder encountered an exception.
Inner - Specified cast is not valid.

at lambda_method21(Closure , DbDataReader , BinderContext , MDMMailRequest )
at Infinity.extension_methods.SylvanExtension.GetList[T](DbDataReader reader)+MoveNext() in C:\Users\KRZMAS\Desktop\Projects\Infinity\Infinity_UAT\Infinity\extension methods\SylvanExtension.cs:line 23
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Infinity.main_functionalities.connections.excel.ExcelConnection.SylvanReader[T]() in C:\Users\KRZMAS\Desktop\Projects\Infinity\Infinity_UAT\Infinity\global functionalities\connections\excel\ExcelConnection.cs:line 114

I extracted data from both sheets, Sheet1 and out of MDC scope requests.

Each of them uses a different setup in the extract function:

Sheet1

  • Setup.SheetName = "Sheet1"
  • Setup.StartRow = 0
  • Setup.JoinHeaders() = "RequestType,1,OperatingUnit,RequestNumber,RequestSystem,RequestorName,CompletedBy,DateOfArrival_Str,DateOfCompletion_Str,9,NumberOfRecords,Comment"// -> returns string
  • Setup.NeededHeaders() -> returns an array (shown as a screenshot in the original issue).

out of MDC scope requests

  • Setup.SheetName = "out of MDC scope requests"
  • Setup.StartRow = 0
  • Setup.JoinHeaders() = "RequestType,OperatingUnit,RequestNumber,RequestSystem,RequestorName,CompletedBy,DateOfArrival_Str,DateOfCompletion_Str,8,NumberOfRecords,RequestID,Comment,RequestorEmail"// -> returns string
  • Setup.NeededHeaders() -> returns an array (shown as a screenshot in the original issue).

File:
Sent by email.

Standard Binder - Extension (this is the one I generally use)

        public static IEnumerable<T> GetList<T>(this DbDataReader reader)
        {
            var binder = DataBinder.Create<T>(reader);
            while (reader.Read())
            {
                var item = (T)Activator.CreateInstance(typeof(T));
                binder.Bind(reader, item);
                yield return item;
            }
        }

T - class

    public class MDMMailRequest
    {
        public DateTime? DateOfArrival { get; set; }
        public DateTime? DateOfCompletion { get; set; }
        public int? ID { get; set; }
        public string? RequestType { get; set; }
        public string? OperatingUnit { get; set; }
        public string? RequestNumber { get; set; }
        public string? RequestSystem { get; set; }
        public string? RequestID { get; set; }
        public string? RequestorName { get; set; }
        public string? CompletedBy { get; set; }
        public string? DateOfArrival_Str
        {
            get { return DateOfArrival.ToString(); }
            set
            {
                DateOfArrival = value.FixDateTime();
            }
        }
        public string? DateOfCompletion_Str
        {
            get { return DateOfCompletion.ToString(); }
            set
            {
                DateOfCompletion = value.FixDateTime();
            }
        }
        public int? ProcessingTime { get; set; }
        public int? NumberOfRecords { get; set; }
        public string? Comment { get; set; }
        public string? RequestorEmail { get; set; }
        public bool OutOfScope { get; set; }
    }

Extract Function

        private List<T>? SylvanReader<T>()
        {
            try
            {
                //sylvan option, get error lines as null (example : 20202/02/02 will be extracted as null date)
                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))
                {
                    //loop to locate needed sheet from passed setup
                    if (!string.IsNullOrWhiteSpace(Setup.SheetName))
                    {
                        while (excelDataReader.WorksheetName != Setup.SheetName)
                        {
                            excelDataReader.NextResult();

                            if (excelDataReader.WorksheetName == null)
                            {
                                //throw exception, that sheet was not located
                                throw new Exception("didnt find the sheet");
                            }
                        }
                    }

                    //loop to find headers, this will allow to skip unwanted rows, such as comments, empty lines
                    for (int i = 0; i < Setup.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }
                    //get all columns in the file, up to the column specified as max
                    var schema = Schema.Parse(Setup.JoinHeaders());

                    excelDataReader.InitializeSchema(schema.GetColumnSchema(), useHeaders: true);

                    var dataReader = excelDataReader.Select(Setup.NeededHeaders());

                    var binderOptions = new DataBinderOptions
                    {
                        InferColumnTypeFromMember = true,
                        BindingMode = DataBindingMode.Any
                    };
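                    // note: the binder created here is discarded; GetList<T> creates its own
                    // binder without binderOptions, so these options are never applied
                    DataBinder.Create<T>(dataReader, binderOptions);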

                    return dataReader.GetList<T>().ToList();
                }
            }
            catch(Exception ex)
            {
                // keep the original exception (and its stack trace) as InnerException
                throw new Exception(ex.Message, ex);
            }
        }

Date handler function for T class

        public static DateTime? FixDateTime(this string? _string)
        {
            DateTime test;

            if (string.IsNullOrWhiteSpace(_string)) return null;
            //initial parse
            if (DateTime.TryParse(_string, out test)) return test;

            _string = _string.Trim();

            //collapse runs of spaces into a single space
            while (_string.Contains("  "))
            {
                _string = _string.Replace("  ", " ");
            }

            //test 1
            if(DateTime.TryParse(_string, out test)) return test;


            //luke format is "-", change it to needed date
            if (_string.Contains("-"))
            {
                try
                {
                    int len = _string.Length;
                    int year = int.Parse(_string.Substring(7, 4));
                    int month = int.Parse(_string.Substring(4, 2));
                    int day = int.Parse(_string.Substring(1, 2));
                    int hour = int.Parse(_string.Substring(12, 2));
                    int minute = int.Parse(_string.Substring(15, 2));
                    int second = len > 17 ? int.Parse(_string.Substring(18, 2)) :  0;
                    test = new DateTime(year: year, month: month, day: day, hour: hour, minute: minute, second: second);
                    return test;
                }
                catch
                {
                    return null;
                }
            }

            //Salesforce format
            if (_string.Contains(",") && _string.Contains("."))
            {
                _string = _string.Replace(",", "");

                try
                {
                    int len = _string.Length;
                    int year = int.Parse(_string.Substring(7, 4));
                    int month = int.Parse(_string.Substring(4, 2));
                    int day = int.Parse(_string.Substring(1, 2));
                    int hour = int.Parse(_string.Substring(12, 2));
                    int minute = int.Parse(_string.Substring(15, 2));
                    int second = len > 17 ? int.Parse(_string.Substring(18, 2)) : 0;
                    test = new DateTime(year: year, month: month, day: day, hour: hour, minute: minute, second: second);
                    return test;
                }
                catch
                {
                    return null;
                }
            }

            return null;

        }
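FixDateTime relies on fixed Substring offsets and on DateTime.TryParse, whose result depends on the machine's regional settings. A hedged alternative, not the reporter's code: normalise the text, then try an explicit list of layouts with DateTime.TryParseExact and the invariant culture. ParseFlexibleDate and its format list are hypothetical; the layouts the Substring offsets were written for can't be recovered from the code, so replace the list with the ones your files actually contain.

```csharp
using System;
using System.Globalization;

// Hypothetical alternative to FixDateTime: normalise the text, then try an explicit
// list of layouts with the invariant culture, so the result does not depend on the
// machine's regional settings.
static DateTime? ParseFlexibleDate(string? text)
{
    if (string.IsNullOrWhiteSpace(text)) return null;

    // Treat commas as separators and collapse any run of whitespace to one space.
    string normalized = string.Join(' ',
        text.Replace(',', ' ').Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries));

    // Assumed layouts; replace with the ones your exports actually produce.
    string[] formats =
    {
        "dd/MM/yyyy HH:mm:ss", "dd/MM/yyyy HH:mm", "dd/MM/yyyy",
        "dd-MM-yyyy HH:mm:ss", "dd-MM-yyyy HH:mm", "dd-MM-yyyy",
        "dd.MM.yyyy HH:mm:ss", "dd.MM.yyyy HH:mm", "dd.MM.yyyy",
        "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd",
    };

    return DateTime.TryParseExact(normalized, formats, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, out DateTime result)
        ? result
        : (DateTime?)null;
}
```

Text that matches none of the layouts (for example 20202/02/02) returns null rather than throwing, so adding a new export format means adding one string to the list instead of another block of Substring arithmetic.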

It's quite lengthy. I saw that you are short on time, so I'll try to find the issue myself to the best of my ability. Let me know if there is something I might have missed.

XlsxDataWriter - problem with specific cultures (0.4.1)

Output Excel files contain errors if they were created under a culture that uses a comma as the decimal separator (e.g. Polish).
I used the following code to generate such a corrupted file:

using Sylvan.Data.Excel;
using System.Data;
using System.Globalization;

var dataTable = new DataTable();

dataTable.Columns.Add("DECIMAL", typeof(decimal));
dataTable.Columns.Add("DATETIME", typeof(DateTime));

dataTable.Rows.Add(new object[] { 1.7m, DateTime.Now });

var reader = dataTable.CreateDataReader();

CultureInfo.CurrentCulture = new CultureInfo("en-US");

using var edrEng = ExcelDataWriter.Create("english.xlsx");
edrEng.Write(dataTable.CreateDataReader());

CultureInfo.CurrentCulture = new CultureInfo("pl-PL");

using var edrPol = ExcelDataWriter.Create("polish.xlsx");
edrPol.Write(dataTable.CreateDataReader());

When I tried to open the file created with the Polish culture, Excel displayed the following error:

(screenshot of the Excel error message in the original issue)

Note that Excel interprets 1,7 as a string here, and the time part of DATETIME was lost. I suppose the DateTime problem has the same cause.

My suggested solution would be to pass System.Globalization.CultureInfo.InvariantCulture as the IFormatProvider to decimal.TryFormat, for example:

var scratch = c.GetScratch();
if (val.TryFormat(scratch.AsSpan(), out var sl, provider: System.Globalization.CultureInfo.InvariantCulture))
{
    w.Write(scratch, 0, sl);
}

It may also be worth replacing StreamWriter with a class that derives from StreamWriter and overrides its public virtual IFormatProvider FormatProvider property.
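A minimal sketch of the suggested fix outside the library: format cell numbers with CultureInfo.InvariantCulture so the decimal separator is always '.', whatever the current culture. In xlsx files a date cell normally holds an Excel serial number with a date number format applied, so writing DateTime.ToOADate() the same way keeps the fractional part that carries the time of day. FormatDecimalCell and FormatDateCell are hypothetical helpers, not Sylvan's internal code.

```csharp
using System;
using System.Globalization;

// Formats a decimal for a cell value using the invariant culture, so the
// separator is always '.', whatever CultureInfo.CurrentCulture is.
static string FormatDecimalCell(decimal value)
{
    Span<char> buffer = stackalloc char[64]; // ample for any decimal
    return value.TryFormat(buffer, out int written, provider: CultureInfo.InvariantCulture)
        ? buffer.Slice(0, written).ToString()
        : value.ToString(CultureInfo.InvariantCulture);
}

// Formats a DateTime as an Excel serial number (OLE Automation date). The
// fractional part carries the time of day, so it must also be written invariantly.
static string FormatDateCell(DateTime value) =>
    value.ToOADate().ToString("R", CultureInfo.InvariantCulture);
```

With the Polish culture active, 1.7m.ToString() produces "1,7", while FormatDecimalCell(1.7m) still produces "1.7".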

Read Exception

Exception: The ReadValueAsChunk method is not supported on node type EndElement

            var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
            //collect data
            using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))

The exception occurs on Create.

File sent via email.

.xls buffer management bug

There appears to be a bug in the buffer management code that can crop up with large files. It manifests as an InvalidDataException that is thrown for valid files when the buffer state gets corrupted, and the invalid state is processed.

Create Issue

I get an exception: "Found invalid data while decoding."

It's thrown on using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(filepath, options))

The data is actually readable, but the file itself, as extracted, is corrupted.

Salesforce ISO up to 30.06.2022 - Copy.xls

How can I start reading from first row in Excel Sheet?

while (edr.Read())
{
    rownum = edr.RowNumber; //this is row number 1
    //I would like to start from row 0 (that is 1 in Excel)
    // iterate cells in row.
    for (int i = 0; i < edr.FieldCount; i++)
    {
        var value = edr.GetString(i);
        Console.WriteLine(value);
        ExcelValues.Add(value);
    }
    // Can use other strongly-typed accessors
    // bool flag = edr.GetBoolean(0);
    // DateTime date = edr.GetDateTime(1);
    // decimal amt = edr.GetDecimal(2);
}

System.ArgumentOutOfRangeException: 'Non-negative number required. Arg_ParamName_Name'

trace :

System.Text.UnicodeEncoding.GetString(byte[], int, int)
    Sylvan.Data.Excel.XlsbWorkbookReader.RecordReader.GetString(int, out int)
    Sylvan.Data.Excel.XlsbWorkbookReader.XlsbWorkbookReader(System.IO.Stream, Sylvan.Data.Excel.ExcelDataReaderOptions)
    Sylvan.Data.Excel.ExcelDataReader.Create(System.IO.Stream, Sylvan.Data.Excel.ExcelWorkbookType, Sylvan.Data.Excel.ExcelDataReaderOptions)
    Sylvan.Data.Excel.ExcelDataReader.Create(string, Sylvan.Data.Excel.ExcelDataReaderOptions)

Please fix this.

While reading, Date values are converted into numbers (version: 0.3.1)

@MarkPflug: We are getting another issue with the Excel reader. Date values are not read properly: instead of dates, we get numbers.
We have Pstg Date and Doc. date columns as date fields in the attached Excel file. While reading the data, we get numbers, as in the example below.

Example:

Original data:
2,Data,0,NOV0085,6,2022,09/06/2022 00:00:00,73125,25/04/2022 00:00:00,034984,A,I,822.65,164.53,,STARK,GBP
Data.xlsx

Converted data:
2,Data,0,NOV0085,6,2022,44721,73125,44676,034984,A,I,822.65,164.53,,STARK,GBP

NOTE: The same file works as expected with NPOI and EPPlus, which return dates.
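The numbers are Excel serial dates: Excel stores a date as a count of days in its 1900 date system and relies on the cell's number format to display it as a date, so a reader returning the raw value as text yields 44721 for 09/06/2022. DateTime.FromOADate uses the same epoch and agrees with Excel for dates on or after 1 March 1900. A minimal sketch for values you already have as text, assuming the workbook uses the default 1900 date system (not the 1904 one); FromExcelSerialText is a hypothetical name. Where it works for the column, edr.GetDateTime(i) may already do this conversion for you.

```csharp
using System;
using System.Globalization;

// Converts cell text holding an Excel serial date (e.g. "44721") to a DateTime.
// Assumes the 1900 date system; returns null for text that is not a serial.
static DateTime? FromExcelSerialText(string? text)
{
    if (!double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out double serial))
        return null;

    // FromOADate throws outside its supported range; 0 and below are not
    // meaningful calendar dates here.
    if (serial <= 0 || serial >= 2958466)
        return null;

    return DateTime.FromOADate(serial);
}
```

Apply it only to columns you know hold dates: in the example row, 73125 would convert just as readily.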

How do I get the values of the merged cells

Only the first cell of a merged range returns the value; the other cells in the range are empty, and I haven't found a solution for this.

Could the reader indicate whether the current cell is part of a merged range, and provide the merged range's data?
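Excel keeps the value only in the top-left cell of a merged range; the other cells are stored empty, which matches what is described here. In .xlsx files the ranges are listed in the worksheet XML as mergeCell elements (for example ref="A1:B2"). Nothing in this issue shows the reader exposing those ranges, so this sketch assumes you already have them as zero-based row/column bounds and copies the top-left value into the rest of each range within a grid of values you have read. FillMergedRange is a hypothetical helper.

```csharp
using System;

// Copies the top-left value of a merged range into every other cell it covers.
// Bounds are zero-based and inclusive; rows may have different lengths, and
// cells past the end of a row are skipped.
static void FillMergedRange(string?[][] grid, int firstRow, int firstCol, int lastRow, int lastCol)
{
    if (firstRow >= grid.Length || firstCol >= grid[firstRow].Length) return;

    string? value = grid[firstRow][firstCol];
    for (int r = firstRow; r <= Math.Min(lastRow, grid.Length - 1); r++)
    {
        for (int c = firstCol; c <= Math.Min(lastCol, grid[r].Length - 1); c++)
        {
            grid[r][c] = value;
        }
    }
}
```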

Ole2Package+Ole2Stream abruptly quits when reading an XLS from a FileStream in a Blazor Server app

I've a problem reading a particular, reasonably sized (700kb, 2600 rows) file in a Blazor server app. I've done various tests of changing the problem file and project type:

  • a new "hello world" XLS file reads OK in the BS app
  • the problem file works OK in a Console app
  • but the combination of problem file+Blazor Server fails.

It's as if an exception is thrown that kills the background reading thread, and the exception is silently swallowed. When debugging, I get to line 136 of the Ole2Stream class, where there's a ReadAsync; stepping over it manifests the problem: the yellow bar disappears and never returns, no matter how long it's left.

Here's the code, and debug prints (some added by me):

(screenshot in the original issue)

The red box delineates the first time round that position increments from 0; this operation finishes OK. Another read occurs with a bigger buffer, and position starts from 0 again; the code reaches the ReadAsync and starts doing it (I assume) but the call never returns and no exception is thrown. The app remains running.

To repro the problem create a new Blazor Server project, and Install/add/reference Sylvan Excel by package or code. Open the Counter.razor page and add these two lines to the IncrementCount() method.

        using var fs = new FileStream(@"C:\temp\problem.xls", FileMode.Open, FileAccess.Read, FileShare.Read, 128 * 1024);
        using var edr = ExcelDataReader.Create(fs, ExcelWorkbookType.Excel);

I'll email the problem.xls file over.

Encoding.Default and xlsx

Mark,

I created another pull request. I fixed an issue with the xlsx XML reader when using Encoding.Default; however, the underlying issue with non-ASCII data is still present in xls/xlsb. I didn't want to change that, as my understanding of it is poor.

Unit test added

C

=Today() formula returns last saved date rather than current date

Note I'm using DD/MM/YYYY format in my example.

I had a cell with following formula:
=Today() - AB2 - 10 (15/08/2022 - 02/06/2022 - 10)
In Excel it was displaying the correct value of 65 (days difference). However, both Sylvan and OpenXML were returning 46. It turns out that 46 days was the difference from the last saved date (28/07/2022). Saving the file again fixed the problem and correct value was returned by both Sylvan and OpenXML.
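This is expected behaviour for readers that don't evaluate formulas: Excel writes the last calculated result of each formula into the file, and both Sylvan and the Open XML SDK return that stored value. TODAY() is volatile, so Excel recalculates it when the workbook is opened, which is why Excel itself showed the current result. A minimal sketch of recomputing this particular formula in code from the input date; DaysSinceStartMinusTen is a hypothetical name, and today is a parameter so the result can be checked against a fixed date.

```csharp
using System;

// Recomputes =TODAY() - AB2 - 10 in code instead of trusting the value cached in the file.
// 'start' is the date read from AB2; pass DateTime.Today as 'today' in real use.
static int DaysSinceStartMinusTen(DateTime today, DateTime start) =>
    (today.Date - start.Date).Days - 10;
```

Passing 28/07/2022 (the last saved date) as today, with 02/06/2022 as the start date, reproduces the cached 46.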

Extract XLSX

I have another mismatch, again on only some of the rows. For example, the InvoiceNumber on row 95 appears as "0" instead of "125060". I tried converting to xlsm and xlsb (same errors).

FILE: Email

Setup.JoinHeaders() returns:
"0,CustomerNumber,2,3,SupplierNumber,SupplierCountry,6,InvoiceNumber,8,9,DueDate,11,12,InvoiceType,14,15,16,17,OutstandingDays,19,OriginalInvoiceAmount,21,CurrentBalanceInvoiceAmount,23,OriginalInvoiceCurrency"

Setup.NeededHeaders() returns:
(screenshot in the original issue)

Code and class like in previous issues.

How to search for a string of text in header that isn't on row 1?

I've got a report that is a daily data extract. The location of the column header I'm looking for keeps changing, depending on how the export table was defined. It also follows the metadata, so it's never on row 1. How can I quickly search the data and extract an entire column, based on the cell in which my query term is found?
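A sketch of one way to do this over any IDataReader: read every physical row, find the first cell whose text matches the header you want, then collect that column from the rows below it. It assumes the reader was opened so that row 1 is returned as data rather than consumed as headers (for Sylvan, possibly Schema = ExcelSchema.NoHeaders; treat that option name as an assumption). ExtractColumnUnderHeader is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical helper: finds the first cell whose trimmed text equals headerText
// (case-insensitive) and returns that column's values from every row below it.
// Assumes the reader returns every physical row, including the metadata rows.
static List<string?> ExtractColumnUnderHeader(IDataReader reader, string headerText)
{
    var values = new List<string?>();
    int column = -1; // ordinal of the header cell, once found

    while (reader.Read())
    {
        if (column < 0)
        {
            // Still searching: scan this row for the header text.
            for (int i = 0; i < reader.FieldCount; i++)
            {
                if (reader.IsDBNull(i)) continue;
                string? cell = Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture)?.Trim();
                if (string.Equals(cell, headerText, StringComparison.OrdinalIgnoreCase))
                {
                    column = i;
                    break;
                }
            }
            continue; // the header row itself is not data
        }

        values.Add(column < reader.FieldCount && !reader.IsDBNull(column)
            ? Convert.ToString(reader.GetValue(column), CultureInfo.InvariantCulture)
            : null);
    }
    return values;
}
```

Because ExcelDataReader derives from DbDataReader, the same helper should accept it directly; blank cells below the header come back as null so row positions are preserved.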

Formats Issue

  1. I have an issue when using the .Load() function to create a DataTable: some of my Excel tables contain int values for the first 10,000 rows, and later on the values are mixed between int and string. This causes an exception, because a string value cannot be loaded into a column whose type was set from the first rows. Is there a way to either consider the entire range of rows, or to force all data types to string, since some cells are even empty? This is just an example; the file is a mess: a column might appear as a date and later have blank cells or even strings, or hold doubles and amounts that appear as ints.
  2. Additionally, in the code below, RowCount actually returns the column count.
                using (ExcelDataReader excelDataReader = ExcelDataReader.Create(_filePath))
                {
                    int WsCount = excelDataReader.WorksheetCount;
                    int ColumnCount = excelDataReader.FieldCount;
                    int RowCount = excelDataReader.RowCount;
                    DataTable dataTable = new DataTable();
                    dataTable.Load(excelDataReader);
                }
  3. Additionally, some of my reports start the table on the 3rd row (headers from A3 to K3): the 1st row is an automated message (only cell A1) and the 2nd is empty. The .Load() function then creates the table with the first column only, although .FieldCount is set correctly. It would be helpful to be able to pass an int declaring the row from which extraction should start, if that's possible. I'll test later whether iteration works in this case.

I'll wait for the NuGet package with 0.1.7, as I don't know how to compile the code into a .dll; I'm too new to this :)

But other than that, the claim that it's the fastest reader seems to be warranted. Great work :)
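To sidestep type inference entirely, one option is to build the DataTable yourself with every column typed as string instead of calling DataTable.Load, which creates typed columns from the reader's schema. A minimal sketch over any IDataReader; LoadAsStrings is a hypothetical helper, and the blank or duplicate header handling is my own choice.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper: copies any IDataReader into a DataTable whose columns are
// all typed as string, so mixed int/string/date/blank values cannot conflict
// with a column type inferred from the first rows. Blank cells become DBNull.
static DataTable LoadAsStrings(IDataReader reader, string tableName = "Sheet")
{
    var table = new DataTable(tableName);
    for (int i = 0; i < reader.FieldCount; i++)
    {
        // Excel headers can be blank or repeated; DataTable requires unique names.
        string name = reader.GetName(i);
        string baseName = string.IsNullOrWhiteSpace(name) ? $"Column{i + 1}" : name;
        string unique = baseName;
        for (int n = 2; table.Columns.Contains(unique); n++) unique = $"{baseName}_{n}";
        table.Columns.Add(unique, typeof(string));
    }

    var row = new object[reader.FieldCount];
    while (reader.Read())
    {
        for (int i = 0; i < row.Length; i++)
        {
            row[i] = reader.IsDBNull(i)
                ? DBNull.Value
                : Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture) ?? (object)DBNull.Value;
        }
        table.Rows.Add(row); // Add copies the values, so the array can be reused
    }
    return table;
}
```

Converting to the real types afterwards (per column, with your own rules for blanks) keeps the messy-data decisions in your code rather than in the loader.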

IndexOutOfBounds in XlsxWorkbookReader

I've a file here (that I cannot supply publicly, but can work out a way to provide if needed) that causes EDR to throw an IOOB from the constructor for the EDR:

(screenshot in the original issue)

The xfselem doesn't have a count attribute, so this ends up with c as 0, and the xfMap is 0 long. I'm wondering if an alternative value for c could sensibly be xfsElem.ChildNodes.Count ?

The file itself has some issues, but I'm raising a separate ticket with a feature request for those.
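A sketch of the suggested fallback, assuming xfsElem is a System.Xml XmlElement (as the ChildNodes suggestion implies): use the count attribute when it is present and numeric, otherwise count the child xf elements. Counting elements rather than using ChildNodes.Count avoids including whitespace or comment nodes when whitespace is preserved. GetXfCount is a hypothetical name, not the library's code.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Returns the number of <xf> entries in a styles <cellXfs> element. The count
// attribute can be missing (as in the reported file), so fall back to counting
// child <xf> elements, ignoring whitespace and comment nodes.
static int GetXfCount(XmlElement xfsElem)
{
    string countAttr = xfsElem.GetAttribute("count"); // "" when absent
    if (int.TryParse(countAttr, NumberStyles.None, CultureInfo.InvariantCulture, out int count))
        return count;

    int n = 0;
    foreach (XmlNode child in xfsElem.ChildNodes)
    {
        if (child.NodeType == XmlNodeType.Element && child.LocalName == "xf") n++;
    }
    return n;
}
```

Counting the elements every time, and ignoring the attribute, would be slightly more robust still if a file ever states a wrong count.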

ExcelDataReader Create suggestions

For the Create(string path) overload, allow passing an ExcelWorkbookType, so that we can pass paths with any extension (like .tmp from Path.GetTempFileName) together with an overriding type.

Add a message to the thrown ArgumentException that says something like "The file extension {Path.GetExtension(filename)} is not supported. Pass a path with a supported extension: [{string.Join(", ", FileTypeMap.Keys)}], or use the fileType argument to override extension-based file-type detection."

(Motivation for raising this issue was, I put a .tmp filename in and just got a bland "System.ArgumentException: 'Arg_ArgumentException Arg_ParamName_Name'" error out of it - had to read the code to determine that extension was used to determine type)

How to specify custom header row?

I have the following function to read an xlsx file into a DataTable:

public static DataTable GetDataTableFromExcelFile(Stream stream, string sheetName, ExcelWorkbookType format)
{
    var dt = new DataTable();
    using var dataReader = ExcelDataReader.Create(stream, format);
    // advance until the requested sheet is current
    while (dataReader.WorksheetName != sheetName && dataReader.NextResult()) ;
    dt.Load(dataReader);
    return dt;
}

This uses the first row as the header. How can I specify a custom row number to be used as the header?
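A sketch of a custom header row over any IDataReader: skip the rows above the header, use the header row's text as column names, and load the remaining rows as strings. It assumes the reader returns row 1 as data, i.e. it was opened without headers (for Sylvan, possibly Schema = ExcelSchema.NoHeaders; treat that name as an assumption). LoadWithHeaderRow is a hypothetical helper; headerRowIndex is zero-based, so Excel row 3 is index 2.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper: skips headerRowIndex rows (zero-based), uses the next row's
// text as column names, and loads the remaining rows as strings.
static DataTable LoadWithHeaderRow(IDataReader reader, int headerRowIndex)
{
    var table = new DataTable();

    // Skip titles, notes and blank lines above the header, then read the header row.
    for (int i = 0; i <= headerRowIndex; i++)
        if (!reader.Read()) return table;

    int width = reader.FieldCount;
    for (int i = 0; i < width; i++)
    {
        string name = reader.IsDBNull(i)
            ? ""
            : Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture)?.Trim() ?? "";
        string baseName = name.Length == 0 ? $"Column{i + 1}" : name;
        string unique = baseName; // DataTable requires unique column names
        for (int n = 2; table.Columns.Contains(unique); n++) unique = $"{baseName}_{n}";
        table.Columns.Add(unique, typeof(string));
    }

    var values = new object[width];
    while (reader.Read())
    {
        for (int i = 0; i < width; i++)
        {
            values[i] = i < reader.FieldCount && !reader.IsDBNull(i)
                ? Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture) ?? (object)DBNull.Value
                : DBNull.Value;
        }
        table.Rows.Add(values);
    }
    return table;
}
```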

Excel xlsx reader returns blank values for rows

Hi Mark,

I have an xlsx file whereby the read returns no values. I've debugged the issue and it's deep in the xml reading. I need to transfer the file privately, do you have a contact email, perhaps you could message me privately? charliedurrant at gmail dot com

Charlie

Headers not in first row, when reading xlsb files

By default, it will expect the first row of each sheet to contain headers. This can be disabled however. When disabled, columns can only be accessed by ordinal.

As per your comment, this should be the default behaviour, but in the case of my xlsb file the headers are not in the first row.
To check whether headers are being read, I tried
_reader.GetColumnSchema()[0].ColumnName

which successfully gives the column header name.

Warning is displayed when opening ExcelDataWriter's result file in MS Excel

I'm using Sylvan.Data.Excel's version 0.4.0. After creating an .xlsx file and opening it with MS Excel (specifically MS Excel 2019 MSO Version 2210 Build 16.0.15726.20188) I get the following message box:

(screenshot of the Excel warning message in the original issue)

Clicking 'Yes' opens the file properly and no data is lost or corrupted but I believe this window should not be displayed.

Here is the code I used to produce the file:

using Sylvan.Data.Excel;
using System.Data;

DataTable table = new DataTable();

DataColumn idColumn = table.Columns.Add("ID", typeof(int));
table.Columns.Add("Name", typeof(string));

table.PrimaryKey = new DataColumn[] { idColumn };

table.Rows.Add(new object[] { 1, "Mary" });
table.Rows.Add(new object[] { 2, "Andy" });
table.Rows.Add(new object[] { 3, "Peter" });
table.Rows.Add(new object[] { 4, "Russ" });

DataTableReader reader = table.CreateDataReader();
using (var edw = ExcelDataWriter.Create("data.xlsx"))
{
    var result = edw.Write(reader, "My sheet");
    Console.WriteLine($"Is complete: {result.IsComplete}, rows written: {result.RowsWritten}");
    reader.Close();
}

After some investigating I discovered that the culprit is docProps/app.xml file - AppVersion property specifically. After changing its value manually to e.g. 16.0300 (or deleting the whole element) the file opens with no issues.

Duplicate key crash, when creating xlsx reader

Stack trace:

System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException<T>(T)
 System.Collections.Generic.Dictionary<TKey, TValue>.TryInsert(TKey, TValue, System.Collections.Generic.InsertionBehavior)
 System.Collections.Generic.Dictionary<TKey, TValue>.Add(TKey, TValue)
 Sylvan.Data.Excel.XlsxWorkbookReader.XlsxWorkbookReader(System.IO.Stream, Sylvan.Data.Excel.ExcelDataReaderOptions)
 Sylvan.Data.Excel.ExcelDataReader.Create(System.IO.Stream, Sylvan.Data.Excel.ExcelWorkbookType, Sylvan.Data.Excel.ExcelDataReaderOptions)
 Sylvan.Data.Excel.ExcelDataReader.Create(string, Sylvan.Data.Excel.ExcelDataReaderOptions)

I can provide the file, but it contains some sensitive information.

Not seeing last column

I tested this library and it works fine in most cases. However, I've noticed one issue.
I have one test file that looks like this:
(screenshot of the test file in the original issue)

The library is not aware of the last (E) column. ColumnCount returns 4, and trying to retrieve ColumnName at index 5 does not work. However, it is possible to retrieve the value through edr.GetString(5). So using <= edr.FieldCount in the loop that retrieves values fixes the problem for this particular file, but it also means I'd be trying to read one column too many in other files.

I am not sure what causes this problem, so I don't know whether it is limited to the last column in certain files, or whether it might fail to recognise any column, or several columns.

testLoader.xlsx

Extract XLSB

Setup.JoinHeaders() returns: "SupplierCountry,SupplierRegion,SupplierNumber,3,4,5,CustomerNumber,CustomerCountry,CustomerRegion,9,10,11,InvoiceNumber,InvoiceType,DueDate,OriginalInvoiceCurrency,OriginalInvoiceAmount,CurrentBalanceInvoiceAmount,18,19,20,21,22,23,24,25,26,27,28,WHTSmallDifference,30,31,32,33,34,35"

Setup.NeededHeaders() returns:
(screenshot in the original issue)

File:
On Email

Issue:
Example: for InvoiceNumber "353010003330", the extracted number is 10728114.41, which differs from the value in the file.
When the file is converted to xlsm/xlsx, it works fine.
I checked against previous versions: they have the same issue.

Code:

                //sylvan option, get error lines as null (example : 20202/02/02 will be extracted as null date)
                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))
                {
                    //loop to locate needed sheet from passed setup
                    while (excelDataReader.WorksheetName != Setup.SheetName)
                    {
                        excelDataReader.NextResult();

                        if (excelDataReader.WorksheetName == null)
                        {
                            //throw exception, that sheet was not located
                            throw new Exception("didnt find the sheet");
                        }
                    }
                    //loop to find headers, this will allow to skip unwanted rows, such as comments, empty lines
                    for (int i = 0; i < Setup.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }
                    //get all columns in the file, up to the column specified as max
                    var schema = Schema.Parse(Setup.JoinHeaders());

                    excelDataReader.InitializeSchema(schema.GetColumnSchema(), useHeaders: true);

                    var dataReader = excelDataReader.Select(Setup.NeededHeaders());
                    var binderOptions = new DataBinderOptions
                    {
                        InferColumnTypeFromMember = true,
                        BindingMode = DataBindingMode.Any
                    };
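                    // note: the binder created here is discarded; GetList<T> creates its own
                    // binder without binderOptions, so these options are never applied
                    DataBinder.Create<T>(dataReader, binderOptions);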
                    return dataReader.GetList<T>().ToList();
                }

Class of T:

    public class ExcelInvoice
    {
        public string? HashKey { get; set; }
        public bool? IsActive { get; set; }
        public string? GenKey { get; set; }
        public int? InsertID { get; set; }
        public string? SupplierNumber { get; set; }
        public string? CustomerNumber { get; set; }
        public string? InvoiceNumber { get; set; }
        public DateTime? DueDate { get; set; }
        public decimal? OriginalInvoceAmount { get; set; }
        public string? OriginalInvoceCurrency { get; set; }
        public DateTime? PaymentDate { get; set; }
        public string? BaswareStatus { get; set; }
        public string? Reference { get; set; }
        public string? SupplierCountry { get; set; }
        public string? SupplierRegion { get; set; }
        public string? CustomerCountry { get; set; }
        public string? CustomerRegion { get; set; }
        public string? InvoiceType { get; set; }
        public decimal? InvoiceBalance { get; set; }
        public string? InvoiceBalanceCurrency { get; set; }
        public string? WHTSmallDifference { get; set; }
        public decimal? LocalAmount { get; set; }
        public int? DaysOutstanding { get; set; }
        //additional data to be added
        public string ReportType { get; set; }
        public string FileHash { get; set; }
    }

Unable to read some excel files - Error: Specified argument was out of the range of valid values. Parameter name: i

Team,

We are unable to read some Excel files and get the error below while reading data.
Message:
System.AggregateException : One or more errors occurred.
----> System.ArgumentOutOfRangeException : Specified argument was out of the range of valid values.
Parameter name: i

Stack Trace:
AggregateException.Handle(Func`2 predicate)
ExcelFileSylvanFileReader.ctor(Import import) line 39
FileReaderFactory.GetFileReader(Import import) line 58
Converter.ConvertFile(Import currentImport) line 68
ConvertFile_ExcelReaderBaseTests.IndexOutOfRange() line 378
--ArgumentOutOfRangeException
XlsxWorkbookReader.GetSharedString(Int32 i)
XlsxWorkbookReader.ParseRowValues()
XlsxWorkbookReader.InitializeSheet()
XlsxWorkbookReader.NextResult()
XlsxWorkbookReader.ctor(Stream iStream, ExcelDataReaderOptions opts)
ExcelDataReader.Create(Stream stream, ExcelWorkbookType fileType, ExcelDataReaderOptions options)
ExcelSlyvanToBlockingCollection.Read() line 115
<.ctor>b__0()โ€‰lineโ€‰29
Task.InnerInvoke()
Task.Execute()

Data.xlsx

Improve strongly-typed accessors

I have scenarios where imported Excel columns contain empty or unknown values stored as strings rather than the required data type, e.g. "", "-", etc. (Unfortunately, we can't always control what people enter in Excel.)

When using strongly-typed accessors such as DateTime dt = edr.GetDateTime(i);, this throws the error "Input string was not in a correct format." I thought GetErrorAsNull might fix this, but it seems to handle only actual cell errors, not casting errors.

Is it possible to either add nullable types or the ability to handle failed casts without throwing errors?

Something like this...

DateTime? dt = edr.GetDateTime(i, returnNullOnFail: true);

or

DateTime? dt = edr.GetDateTimeNullable(i);

At the moment I have to return all data as string and handle parsing manually.
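Until the library offers something like this, a caller-side workaround is possible with only the standard IDataRecord members (IsDBNull, GetValue); ExcelDataReader is passed around as a DbDataReader elsewhere in these issues. The sketch below is hypothetical: GetDateTimeOrNull and GetDecimalOrNull are not part of Sylvan, and it assumes text cells should be parsed with the invariant culture. Empty cells and text that fails to parse (such as "-" or "NA") come back as null instead of throwing.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical caller-side helpers; these are not part of Sylvan.Data.Excel.
public static class NullableRecordExtensions
{
    // Returns null for empty cells and for text that does not parse as a date.
    public static DateTime? GetDateTimeOrNull(this IDataRecord record, int ordinal)
    {
        if (record.IsDBNull(ordinal)) return null;
        object value = record.GetValue(ordinal);
        if (value is DateTime dt) return dt;
        string text = Convert.ToString(value, CultureInfo.InvariantCulture);
        return DateTime.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime parsed)
            ? parsed
            : (DateTime?)null;
    }

    // Returns null for empty cells and for text such as "-" or "NA" that is not a number.
    public static decimal? GetDecimalOrNull(this IDataRecord record, int ordinal)
    {
        if (record.IsDBNull(ordinal)) return null;
        object value = record.GetValue(ordinal);
        if (value is decimal d) return d;
        string text = Convert.ToString(value, CultureInfo.InvariantCulture);
        return decimal.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out decimal parsed)
            ? parsed
            : (decimal?)null;
    }
}
```

Usage would then be DateTime? due = edr.GetDateTimeOrNull(i); with no try/catch around each cell.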

33 - System.IO.InvalidDataException: 'Found invalid data while decoding.'

Yet another one; sorry, you must hate me at this point.

Crash at create:

                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))

Exception:

System.IO.InvalidDataException: 'Found invalid data while decoding.'

File
Copy of dmsg-11-50-a3-a.xls

Additionally, there is an error mapping the class on ExRate due to the Venezuelan "NA" value. I fixed that with ExRateStr, but just to let you know :)

    public class ExRateItem
    {
        //fixer: keep the raw text and fall back to 0 when it is not a number (e.g. "NA")
        private string _ExRateStr;
        public string ExRateStr
        {
            get { return _ExRateStr; }
            set
            {
                _ExRateStr = value;
                ExRate = decimal.TryParse(value, out decimal rate) ? rate : 0;
            }
        }

        public string Country { get; set; }
        public string CurrencyName { get; set; }
        public string BaseCurrency { get; set; }
        public string LocalCurrency { get; set; }
        public int Year { get; set; }
        public int Month { get; set; }
        public decimal? ExRate { get; set; }
        public bool isActive { get; set; }
        public string AddedBy { get; set; }
        public DateTime? AddedDate { get; set; }
        public string ModifiedBy { get; set; }
        public DateTime? ModifiedDate { get; set; }
    }

System.ArgumentNullException: 'Value cannot be null. Arg_ParamName_Name'

This exception was originally thrown at this call stack:
System.ThrowHelper.ThrowArgumentNullException(System.ExceptionArgument)
System.Collections.Generic.Dictionary<TKey, TValue>.FindValue(TKey)
System.Collections.Generic.Dictionary<TKey, TValue>.this[TKey].get(TKey)
Sylvan.Data.Excel.XlsbWorkbookReader.XlsbWorkbookReader(System.IO.Stream, Sylvan.Data.Excel.ExcelDataReaderOptions)
Sylvan.Data.Excel.ExcelDataReader.Create(System.IO.Stream, Sylvan.Data.Excel.ExcelWorkbookType, Sylvan.Data.Excel.ExcelDataReaderOptions)
Sylvan.Data.Excel.ExcelDataReader.Create(string, Sylvan.Data.Excel.ExcelDataReaderOptions)

Mapping to class

Setup.JoinHeaders() returns: "0,CustomerNumber,2,3,SupplierNumber,SupplierCountry,6,InvoiceNumber,8,9,DueDate,11,12,InvoiceType,14,15,16,17,DaysOutstanding,19,LocalAmount,21,InvoiceBalance,InvoiceBalanceCurrency,OriginalInvoceCurrency"

Setup.NeededHeaders() returns:
image

I even tried putting them in the same order but encountered the same issue.
image

File:
AP ageing 202108 - Copy.xlsx

Issue:
This previously mapped correctly too, but this time it seems to mismatch values. For example, CustomerNumber is taken from column "C" instead of "B", and Invoice Number from "O" instead of "H".

Additional info: when converted to .xlsb, it worked fine.

So I checked against previous versions: 1.9 has the current issue, 1.8 crashed on .Create, and 1.7 mapped correctly.
Code:

                //sylvan option, get error lines as null (example : 20202/02/02 will be extracted as null date)
                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))
                {
                    //loop to locate needed sheet from passed setup
                    while (excelDataReader.WorksheetName != Setup.SheetName)
                    {
                        excelDataReader.NextResult();

                        if (excelDataReader.WorksheetName == null)
                        {
                            //throw exception, that sheet was not located
                            throw new Exception("Didn't find the sheet");
                        }
                    }
                    //loop to find headers, this will allow to skip unwanted rows, such as comments, empty lines
                    for (int i = 0; i < Setup.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }
                    //get all columns in the file up to the column specified as max
                    var schema = Schema.Parse(Setup.JoinHeaders());

                    excelDataReader.InitializeSchema(schema.GetColumnSchema(), useHeaders: true);

                    var dataReader = excelDataReader.Select(Setup.NeededHeaders());
                    var binderOptions = new DataBinderOptions
                    {
                        InferColumnTypeFromMember = true,
                        BindingMode = DataBindingMode.Any
                    };
                    DataBinder.Create<T>(dataReader, binderOptions);
                    return dataReader.GetList<T>().ToList();
                }

Class of T:

    public class ExcelInvoice
    {
        public string? HashKey { get; set; }
        public bool? IsActive { get; set; }
        public string? GenKey { get; set; }
        public int? InsertID { get; set; }
        public string? SupplierNumber { get; set; }
        public string? CustomerNumber { get; set; }
        public string? InvoiceNumber { get; set; }
        public DateTime? DueDate { get; set; }
        public decimal? OriginalInvoceAmount { get; set; }
        public string? OriginalInvoceCurrency { get; set; }
        public DateTime? PaymentDate { get; set; }
        public string? BaswareStatus { get; set; }
        public string? Reference { get; set; }
        public string? SupplierCountry { get; set; }
        public string? SupplierRegion { get; set; }
        public string? CustomerCountry { get; set; }
        public string? CustomerRegion { get; set; }
        public string? InvoiceType { get; set; }
        public decimal? InvoiceBalance { get; set; }
        public string? InvoiceBalanceCurrency { get; set; }
        public string? WHTSmallDifference { get; set; }
        public decimal? LocalAmount { get; set; }
        public int? DaysOutstanding { get; set; }
        //additional data to be added
        public string ReportType { get; set; }
        public string FileHash { get; set; }
    }

Weird thing: other .xlsx files seem to work OK.

Working File:
Setup.JoinHeaders() returns: "0,SupplierCountry,SupplierRegion,SupplierNumber,4,5,6,7,CustomerNumber,CustomerCountry,CustomerRegion,11,12,InvoiceNumber,InvoiceType,Reference,DueDate,OriginalInvoceCurrency,OriginalInvoceAmount,InvoiceBalance,InvoiceBalanceCurrency,21,22"

Setup.NeededHeaders() returns:
image

Working File:
matrix conversion no + reference Oracle vs Oracle ( + LE wave 12)TEST.xlsx

Error #N/A #N/D

Hi, I found a bug: when a cell contains #N/D, an exception is thrown.
image

Kind regards

Forced Int32

Hello Mark,

I noticed the exception below: {"Value must be non-negative and less than or equal to Int32.MaxValue. (Parameter 'count')"}

This is caused by one value being far outside the Int32 range, although I thought it would be mapped as string?; those values are decimal in the .xlsb spreadsheet. Value: "-11001000000"

I tried mapping the fields both as decimal? and as decimal in my JoinHeaders() method; neither worked. Once the row was manually removed, everything worked fine.

        private List<T>? SylvanReader<T>()
        {
            try
            {
                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.Default, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(Setup.FilePath, options))
                {
                    //loop to locate needed sheet from passed setup
                    while (excelDataReader.WorksheetName != Setup.SheetName)
                    {
                        excelDataReader.NextResult();

                        if (excelDataReader.WorksheetName == null)
                        {
                            //throw exception, that sheet was not located
                            throw new Exception(string.Format("Didn't find the sheet: {0}", Setup.SheetName));
                        }
                    }
                    //loop to find headers, this will allow to skip unwanted rows, such as comments, empty lines etc.
                    for (int i = 0; i < Setup.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }
                    //get all columns in the file up to the column specified as max
                    //JoinHeaders is a method that returns the names of the needed columns, with unneeded columns given as their index.
                    //example: Data1, Data2, 2, 3, Data3, 5 as string
                    var schema = Schema.Parse(Setup.JoinHeaders());

                    excelDataReader.InitializeSchema(schema.GetColumnSchema(), useHeaders: true);

                    //NeededHeaders is a method that returns only the needed column names.
                    //example: Data1, Data2, Data3 as string[]
                    var dataReader = excelDataReader.Select(Setup.NeededHeaders());
                    var binderOptions = new DataBinderOptions
                    {
                        InferColumnTypeFromMember = true,
                        BindingMode = DataBindingMode.Any
                    };
                    DataBinder.Create<T>(dataReader, binderOptions);
                    return dataReader.GetList<T>().ToList();
                }
            }
            catch(Exception ex)
            {
                throw new Exception(ex.Message);
            }
        }
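The comments above describe JoinHeaders() and NeededHeaders() only by example. Assuming both are built from a map of 0-based column index to property name (the real Setup class is not shown, so HeaderSpec and its signatures are hypothetical), a minimal version that produces strings in the documented shape could look like this:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical versions of the Setup.JoinHeaders()/Setup.NeededHeaders() helpers;
// the real Setup class is not shown in the issue.
public static class HeaderSpec
{
    // Builds a positional header list such as "Data1,Data2,2,3,Data3,5":
    // needed columns by property name, all other columns by their 0-based index.
    public static string JoinHeaders(int columnCount, IReadOnlyDictionary<int, string> needed)
    {
        var parts = new string[columnCount];
        for (int i = 0; i < columnCount; i++)
        {
            parts[i] = needed.TryGetValue(i, out string name)
                ? name
                : i.ToString(CultureInfo.InvariantCulture);
        }
        return string.Join(",", parts);
    }

    // Returns only the needed names, in column order, e.g. { "Data1", "Data2", "Data3" }.
    public static string[] NeededHeaders(IReadOnlyDictionary<int, string> needed) =>
        needed.OrderBy(kv => kv.Key).Select(kv => kv.Value).ToArray();
}
```

Generating both strings from one map keeps the positional schema and the Select list consistent, which matters because a name that appears in NeededHeaders() but not in JoinHeaders() cannot be selected.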

ExcelSchema should allow remapping columns

Looking at #92, the column mapping component should be handled by the default ExcelSchema implementation. Currently, it only looks at ColumnName, and not BaseColumnName. The behavior should match Sylvan.Data.Csv.CsvSchema, which allows mapping. At the same time, I should add an ExcelSchemaProvider abstract base class.
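As a rough illustration of the behavior being proposed (not the library's implementation), remapping means looking up each header read from the file as a base column name and exposing the mapped column name instead; headers with no mapping keep their original name. ColumnRemapping.ResolveColumnNames and the (BaseColumnName, ColumnName) pair list are hypothetical, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnRemapping
{
    // Maps each file header (the "base" name) to the name the consumer wants,
    // comparing headers case-insensitively; unmapped headers keep their file name.
    public static string[] ResolveColumnNames(
        IReadOnlyList<string> fileHeaders,
        IEnumerable<(string BaseColumnName, string ColumnName)> mappings)
    {
        var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var (baseName, name) in mappings)
            lookup[baseName] = name;

        var result = new string[fileHeaders.Count];
        for (int i = 0; i < fileHeaders.Count; i++)
            result[i] = lookup.TryGetValue(fileHeaders[i], out var mapped) ? mapped : fileHeaders[i];
        return result;
    }
}
```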

edr.GetString(i) fails to pick up underscore '_' and replaces with 'U'

I use the function below to read each sheet, row and column from an Excel file. When doing so, I found that underscores are not preserved when the value is read as a string. If I open and save the file with Excel, it reads correctly the next time, but not when the file was created with the library.

I have also attached an example file that fails with this function. The file itself was created entirely with this library and works really well in general!

Parts.xlsx

Before
image

After
image

public static void LoadDataTables() {
            //0.35s for 16k rows
            Stopwatch sw = new Stopwatch();
            sw.Start();

            dataTables.Clear();
            bsTables.Clear();

            FileInfo fi = new FileInfo(Config.GetDatabasePath());
            Stream st = null;
            try {
                st = fi.Open(FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
            }
            catch (Exception) {
                return;
            }
            ExcelWorkbookType type = ExcelWorkbookType.Unknown;

            if (Config.GetDatabasePath().EndsWith(".xls"))
                type = ExcelWorkbookType.Excel;
            else if (Config.GetDatabasePath().EndsWith(".xlsb"))
                type = ExcelWorkbookType.ExcelBinary;
            else if (Config.GetDatabasePath().EndsWith(".xlsx"))
                type = ExcelWorkbookType.ExcelXml;

            bool success = true;

            try {
                using ExcelDataReader edr = ExcelDataReader.Create(st, type);
                do {
                    var sheetName = edr.WorksheetName;
                    bool firstRow = true;
                    //int rowID = 2;  //follow same indexing as excel + skip header row

                    // enumerate rows in current sheet.
                    DataTable dt = new DataTable();

                    //while (get next row)
                    do {
                        DataRow toInsert = dt.NewRow();
                        //if (firstRow)
                        //    dt.Columns.Add("rowID");
                        //else
                        //    toInsert[0] = rowID++;

                        // can use edr.RowFieldCount when sheet contains jagged, non-rectangular data
                        for (int i = 0; i < edr.FieldCount; i++) {
                            var value = edr.GetString(i);
                            if (firstRow) {
                                if (!string.IsNullOrEmpty(value)) {
                                    dt.Columns.Add(value);
                                }
                                else {
                                    dt.Columns.Add("");
                                }
                            }
                            else {
                                toInsert[i] = value;
                                //toInsert[i + 1] = value;  //allow space for rowID if used.
                            }
                        }

                        if (!firstRow)
                            dt.Rows.Add(toInsert);

                        firstRow = false;
                    } while (edr.Read());

                    //double check header structure. raise warning with non-matching columns that it will corrupt any NEW data.
                    //if the columns are correct, but new ones were added, pad out the columns
                    bool mismatched_columns = false;
                    if (dt.Columns.Count != Types.ComponentInfo.COLUMN_HEADERS.Length) {
                        mismatched_columns = true;
                    }
                    else {
                        for (int i = 0; i < dt.Columns.Count; i++) {
                            if (dt.Columns[i].ColumnName != Types.ComponentInfo.COLUMN_HEADERS[i]) {
                                mismatched_columns = true;
                                break;
                            }
                        }
                    }

                    if (mismatched_columns) {
                        MainForm.Log.Warning($"Columns mismatched on sheet '{sheetName}'. Either the file is old and columns have since been added, or it is corrupted. Modifying existing data is fine, though adding new data lines will corrupt the file.");
                    }

                    // iterates sheets
                    dataTables.Add(sheetName, dt);
                    var bs = new BindingSource();
                    bs.DataSource = dataTables[sheetName];
                    bsTables.Add(sheetName, bs);
                } while (edr.NextResult());
            }
            //catch (InvalidDataException) {
            catch (Exception e) {
                success = false;
                if (e is InvalidDataException) {
                    MainForm.Log.Error($"Couldn't open Excel database; file is corrupt: '{Config.GetDatabasePath()}'");
                }
                else {
                    MainForm.Log.Error($"Couldn't open Excel database: unknown error");
                }
            }

            if (st != null) {
                st.Dispose();
            }
            sw.Stop();
            if (success) {
                MainForm.Log.Debug($"Opened DB in {sw.ElapsedMilliseconds / 1000}.{sw.ElapsedMilliseconds % 1000:000} seconds");
            }
        }
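The extension checks in LoadDataTables use string.EndsWith(string), which is case-sensitive, so a file saved as Parts.XLSX would stay ExcelWorkbookType.Unknown. A minimal sketch of a case-insensitive check follows; WorkbookKind is a hypothetical stand-in for ExcelWorkbookType (reusing the member names from the code above) so that the snippet compiles without the library.

```csharp
using System;
using System.IO;

// Stand-in for Sylvan's ExcelWorkbookType so the sketch compiles on its own;
// in real code, return the library enum instead.
public enum WorkbookKind { Unknown, Excel, ExcelBinary, ExcelXml }

public static class WorkbookKindDetector
{
    // Chooses the workbook kind from the file extension, ignoring case,
    // so "PARTS.XLSX" and "parts.xlsx" are treated the same.
    public static WorkbookKind FromPath(string path)
    {
        string ext = Path.GetExtension(path);
        if (ext.Equals(".xls", StringComparison.OrdinalIgnoreCase)) return WorkbookKind.Excel;
        if (ext.Equals(".xlsb", StringComparison.OrdinalIgnoreCase)) return WorkbookKind.ExcelBinary;
        if (ext.Equals(".xlsx", StringComparison.OrdinalIgnoreCase)) return WorkbookKind.ExcelXml;
        return WorkbookKind.Unknown;
    }
}
```

Comparing the whole extension from Path.GetExtension also avoids accidental suffix matches that EndsWith could allow.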

Skip Blank Rows in .Read() while loop

I am running into an issue where there are blank rows in an Excel worksheet. It appears that in the first row after any blank row, some columns read as "" when they should be populated. The row is populated in the file. Is this a known issue?
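The misread values themselves look like a reader problem, but skipping blank rows on the caller side only needs the standard IDataRecord members. BlankRows.IsBlank below is a hypothetical helper, and treating whitespace-only text as blank is an assumption; it does not fix the misread columns, it only filters out empty rows.

```csharp
using System.Data;

public static class BlankRows
{
    // True when every field in the current row is DBNull or whitespace-only text.
    public static bool IsBlank(IDataRecord record)
    {
        for (int i = 0; i < record.FieldCount; i++)
        {
            if (record.IsDBNull(i)) continue;
            if (record.GetValue(i) is string s && string.IsNullOrWhiteSpace(s)) continue;
            return false;
        }
        return true;
    }
}
```

Inside the loop this becomes while (edr.Read()) { if (BlankRows.IsBlank(edr)) continue; ... }.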

System.ArgumentOutOfRangeException: 'Index was out of range. (xlsx file)

This exception was originally thrown at this call stack:
System.ThrowHelper.ThrowArgumentOutOfRange_IndexException()
System.Collections.Generic.List.this[int].get(int)
System.Collections.ObjectModel.ReadOnlyCollection.this[int].get(int)
Sylvan.Data.Excel.XlsxWorkbookReader.IsDBNull(int)
Sylvan.Data.Excel.ExcelDataReader.GetValue(int)

Sylvan.Data.Excel.ExcelFormulaException, when reading xlsx values

Stack trace:

Sylvan.Data.Excel.XlsxWorkbookReader.GetString(int)
Sylvan.Data.Excel.ExcelDataReader.GetValue(int)

I can provide the file. If there is any way to turn off formula processing, that could be a valid fix too, and I guess it would also improve performance?

Is there a way to read a specific worksheet with ExcelDataReaderOptions

I'm trying to use SqlBulkCopy, but the file has a schema different from the target table. For example, the file has multiple worksheets; the one I need has 1 million rows and 10 columns, and I only need 3 of those columns but all of the rows. I also need to rename the three columns.

I'm thinking I should prepare the Excel data first and then run SqlBulkCopy. It would be great if I could open a specific worksheet.

This is my code in Visual Basic:
edited code

Using edr = ExcelDataReader.Create(sourcePath)
                Do
                    Dim hoja = edr.WorksheetName
                    If hoja.ToLower().Contains("validworksheetname") Then

                        Dim binderOpc = New DataBinderOptions With {
                            .BindingMode = DataBindingMode.Any,
                            .InferColumnTypeFromMember = True
                        }
                        Dim binder = DataBinder.Create(Of HojaSap)(edr, binderOpc)
                        Dim dt = New DataTable()
                        dt.Columns.Add("Id")
                        dt.Columns.Add("value")
                        dt.Columns.Add("year")
                        dt.Columns.Add("month")
                        dt.Columns.Add("CompleteDate")
                        dt.Columns.Add("UpdateDate")
                        Dim dr As DataRow = Nothing
                        Dim UpdateDate = DateTime.Now
                        While edr.Read()
                            Dim item = New HojaSap()
                            binder.Bind(edr, item)
                            dr = dt.NewRow()
                            dr("Id") = item.Account
                            dr("value") = item.Value
                            dr("CompleteDate") = item.ProcessingTime
                            dr("year") = CInt(item.ProcessingTime.Substring(0, 4))
                            dr("month") = CInt(item.ProcessingTime.Substring(4, 2))
                            dr("UpdateDate") = UpdateDate 
                            dt.Rows.Add(dr)
                        End While

                        Dim dtReader = dt.CreateDataReader()
                        Using cdw As CsvDataWriter = CsvDataWriter.Create(FinalPath)
                            cdw.Write(dtReader)
                        End Using

                    End If
                Loop While edr.NextResult()
            End Using
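One way to get only the three needed columns, already renamed, is to project them by name while reading. The sketch below is hypothetical (ColumnProjection.Project is not a Sylvan API) and assumes the reader is already positioned on the wanted worksheet with its header row applied. It resolves each source column once with GetOrdinal and copies the values into a DataTable whose columns carry the target names.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ColumnProjection
{
    // Copies only the listed source columns into a new DataTable, renaming each
    // one to its target name. Source names are resolved once with GetOrdinal.
    public static DataTable Project(IDataReader reader, IReadOnlyList<(string Source, string Target)> columns)
    {
        var ordinals = new int[columns.Count];
        var table = new DataTable();
        for (int i = 0; i < columns.Count; i++)
        {
            ordinals[i] = reader.GetOrdinal(columns[i].Source);
            table.Columns.Add(columns[i].Target, reader.GetFieldType(ordinals[i]));
        }

        // Rows.Add copies the array contents, so one buffer can be reused per row.
        var values = new object[columns.Count];
        while (reader.Read())
        {
            for (int i = 0; i < ordinals.Length; i++)
                values[i] = reader.GetValue(ordinals[i]);
            table.Rows.Add(values);
        }
        return table;
    }
}
```

For a million rows, materializing a DataTable is memory-heavy. SqlBulkCopy.WriteToServer also accepts an IDataReader, and SqlBulkCopy.ColumnMappings can map source column names to differently named destination columns, which avoids the intermediate table entirely.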

Is there a way to skip rows of an Excel sheet in order to reach the header row? Feature request?

I've some files that look like this:

image

The two blank rows above the dark grey one are irrelevant to me, but EDR throws an index-out-of-bounds exception if they are present (raised in a separate ticket), I presume because it runs into problems building the column schema. The dark grey row is the header row. The light grey row is a summary row that I can handle normally by skipping the read, but is there a way to skip the first two rows so that EDR parses the headers from the third row?

This "has two useless rows" problem only happens on the first sheet of a multi-sheet file, so perhaps there could be a callback such as Func<object[], bool> IsTheHeaderRow that I can attach and that EDR calls repeatedly until it returns true for the current sheet; I'd inspect the object[] data to check, e.g., "the header I expect in column 1".Equals(myObject[0]). Something like this would cope with a variable number of junk rows before the header, but I could also track a bool in my own code and set a "HeaderRowIndex" integer option differently for the first sheet than for the others.
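The callback idea can be approximated on the caller side. The sketch below is hypothetical (HeaderLocator is not a Sylvan option) and assumes the reader returns the junk rows as ordinary data rows, i.e. it was opened without treating the first row as headers. It reads forward until the predicate recognizes the header row; the caller can then capture the column names from that row and continue reading data.

```csharp
using System;
using System.Data;

public static class HeaderLocator
{
    // Advances the reader until isHeader returns true for the current row.
    // Rows before the header (titles, blank lines) are discarded.
    // Returns false if the end of the result set is reached first.
    public static bool SkipToHeader(IDataReader reader, Func<IDataRecord, bool> isHeader)
    {
        while (reader.Read())
        {
            if (isHeader(reader)) return true;
        }
        return false;
    }
}
```

A predicate such as r => !r.IsDBNull(0) && (r.GetValue(0) as string) == "Account" handles a variable number of junk rows, which a fixed HeaderRowIndex would not.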

System.Memory and netstandard2.0

Hi Mark,

I've just put in a pull request and thought I'd give you some background.

I work with huge Excel files; 1 million rows with more than 300 columns is not uncommon, and as you already know, processing them with other libraries is very slow and memory-hungry. Luckily, I came across your library just as I was about to start using the Microsoft libraries to SAX-parse xlsx files.

I integrated your library into our code. We have a configurable importer that can use different Excel extractors, as we've used a number of them over the years and each has its own set of issues. We have a mix of .NET 4.8 and .NET Standard code, and on adding your library I got a runtime error related to System.Memory, rightly so, since using Span forces a dependency on a later version.

Unfortunately, with the tangled web of references we have, that fixed dependency was not possible to handle, so I changed the code to no longer have the dependency. It's not a complex change, and it keeps the benefits of Span on netstandard2.1.
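The pull request itself is not shown here. As a general illustration of the pattern described (keeping Span on newer targets while dropping the System.Memory dependency on netstandard2.0), conditional compilation on the target-framework symbols lets one method use a span overload where it is built in and fall back to strings elsewhere. This assumes the project multi-targets (for example netstandard2.0 and netstandard2.1) and is built with the .NET 5 or later SDK, which defines the *_OR_GREATER symbols; SpanCompat.ParseDigits is a hypothetical example, not code from the library.

```csharp
using System;
using System.Globalization;

public static class SpanCompat
{
    // Parses a run of digits inside a larger string.
    // Where span-based int.Parse is built in, no substring is allocated;
    // on netstandard2.0 (without System.Memory) it falls back to Substring.
    public static int ParseDigits(string text, int start, int length)
    {
#if NETSTANDARD2_1_OR_GREATER || NETCOREAPP2_1_OR_GREATER
        return int.Parse(text.AsSpan(start, length), NumberStyles.None, CultureInfo.InvariantCulture);
#else
        return int.Parse(text.Substring(start, length), NumberStyles.None, CultureInfo.InvariantCulture);
#endif
    }
}
```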

Once I had it working, I ran our unit tests against the code and found some issues. Many passed, which was great, but some failed when compared with the other Excel extractors we have. This was useful, and I'd like to submit further pull requests to fix the issues or post the files with an explanation of each issue. However, to be of use, I need the code not to depend on the System.Memory version specified for netstandard2.0. Selfish, I know...

I also wondered about writing some code to compare the output of, say, EPPlus (which for us is the most reliable library) against this library's output. This could be added to each unit test.

Charlie
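The test suite is not shown here, so the following is only a sketch of such a comparison, assuming both libraries' output can be exposed as an IDataReader (for example by loading the EPPlus result into a DataTable and calling CreateDataReader). ReaderComparer is a hypothetical helper; it compares cells positionally by their invariant string form, so a cell read as 1.0m (decimal) by one library and 1.0 (double) by another would be reported as a difference.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical test helper: compares two readers row by row and cell by cell.
// Nulls and empty strings compare equal.
public static class ReaderComparer
{
    public static List<string> Compare(IDataReader expected, IDataReader actual)
    {
        var diffs = new List<string>();
        for (int row = 0; ; row++)
        {
            bool hasExpected = expected.Read();
            bool hasActual = actual.Read();
            if (!hasExpected && !hasActual) break;
            if (hasExpected != hasActual)
            {
                diffs.Add($"row {row}: one reader has more rows than the other");
                break;
            }

            int columns = Math.Max(expected.FieldCount, actual.FieldCount);
            for (int col = 0; col < columns; col++)
            {
                string e = col < expected.FieldCount ? CellText(expected, col) : "<missing>";
                string a = col < actual.FieldCount ? CellText(actual, col) : "<missing>";
                if (e != a) diffs.Add($"row {row}, col {col}: expected '{e}', actual '{a}'");
            }
        }
        return diffs;
    }

    private static string CellText(IDataRecord record, int ordinal) =>
        record.IsDBNull(ordinal)
            ? ""
            : Convert.ToString(record.GetValue(ordinal), CultureInfo.InvariantCulture) ?? "";
}
```

A unit test could then assert that the returned list is empty for each sample file.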

Skipping columns when from start some rows are empty (not all)

It happens on excelDataReader.Initialize(), which eventually leads to the exception "Specified argument was out of the range of valid values. (Parameter 'names')" on var dataReader = excelDataReader.Select(NeededHeaders());

It stops at NumberOfRecords (a column in the file: number of records created/updated/checked). Is that because it is empty for a long run of rows?

Example file sent by email.

Extract Class:

        private List<T>? SylvanReader<T>()
        {
            try
            {
                var schema = Sylvan.Data.Schema.Parse(JoinHeaders());
                var options = new SylExcel.ExcelDataReaderOptions { Schema = new SylExcel.ExcelSchema(true, schema), GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(filepath, options))
                {
                    //loop to locate needed sheet from passed setup - not used in this example
                    if (!string.IsNullOrWhiteSpace(columninfo.SheetName))
                    {
                        while (excelDataReader.WorksheetName != columninfo.SheetName)
                        {
                            excelDataReader.NextResult();

                            if (excelDataReader.WorksheetName == null)
                            {
                                //throw exception, that sheet was not located
                                throw new Exception("Didn't find the sheet");
                            }
                        }
                    }

                    //loop to find headers, this will allow to skip unwanted rows, such as comments, empty lines - not used in this example
                    for (int i = 0; i < columninfo.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }

                    excelDataReader.Initialize();

                    AllRecords = excelDataReader.RowCount;

                    var dataReader = excelDataReader.Select(NeededHeaders());

                    var binderOptions = new DataBinderOptions
                    {
                        InferColumnTypeFromMember = true,
                        BindingMode = DataBindingMode.Any
                    };
                    DataBinder.Create<T>(dataReader, binderOptions);

                    List<T> extracts = dataReader.GetList<T>().ToList();

                    ExtractedRecords = extracts.Count;

                    return extracts;
                }
            }
            catch (Exception ex)
            {
                throw new Exception(ex.Message);
            }
        }

JoinHeaders() returns : "TypeOfRequest,IsMass,OutOfScope,OperatingUnit,Country,Other,AccountNumber,RequestID,RequestSystem,RequestorName,RequestorEmail,CompletedBy,DateOfArrival,DateOfCompletion,NumberOfRecords,Comment,RootCause,Priority"
NeededHeaders() returns:
image
image
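When Select fails with ArgumentOutOfRangeException (parameter 'names'), it can help to check which requested names the reader actually exposes after Initialize(). ColumnCheck.EnsureColumns is a hypothetical diagnostic helper that assumes exact, case-sensitive name matching (the library's own matching rules may differ); it could be called right before Select(NeededHeaders()) to produce an error that names the missing columns.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnCheck
{
    // Throws a descriptive error listing every requested column the reader does not
    // expose, instead of a bare ArgumentOutOfRangeException later on.
    public static void EnsureColumns(IDataRecord record, IEnumerable<string> needed)
    {
        var available = new HashSet<string>(StringComparer.Ordinal);
        for (int i = 0; i < record.FieldCount; i++)
            available.Add(record.GetName(i));

        var missing = needed.Where(n => !available.Contains(n)).ToList();
        if (missing.Count > 0)
            throw new InvalidOperationException(
                "Missing columns: " + string.Join(", ", missing) +
                ". Available: " + string.Join(", ", available));
    }
}
```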
