modelica-tools / csv-compare

Tool to compare curves from one CSV file with curves from other CSV files using an adjustable tolerance

Home Page: https://www.modelica.org

License: Other

Makefile 0.05% C# 99.56% CSS 0.07% Shell 0.22% Batchfile 0.10% Dockerfile 0.01%
csv-files fmu hacktoberfest

csv-compare's Introduction

csv-compare

About

The CSV Result Compare tool compares curves from one CSV file with curves from other CSV files using a tube-based algorithm.

The files are not compared point by point. Instead, a tube is generated around the curve and the values are checked to lie inside the tube. The size of the tube can be changed by setting a tolerance on the command line (--tolerance). The tolerance is a double value that sets the width of the tube at discontinuities in the x-direction.

Documents

For build instructions see: Build.md

License information is provided in: LICENSE

Usage

To see current usage information start the tool with the command line argument --help or -h.

Arguments can be passed in any of the following forms:

  • --argument=value
  • --argument value
  • -a=value
  • -a value

To check two CSV files (compare "file_compare.csv" with the reference file "file_base.csv") in verbose mode and save the result to an HTML file in "C:\temp\test001", use the following command line:

compare.exe --mode=csvFileCompare --verbosity=1 --reportdir="C:\temp\test001" "file_compare.csv" "file_base.csv"

To check a folder recursively, comparing its CSV files with reference files using a tolerance of 0.004 and writing the test results to the folder "C:\temp\test002", use the following command line:

compare.exe --mode csvTreeCompare --reportdir "C:\temp\test002" --tolerance 4e-3 "C:\test\2013-05-17-Testrun" "C:\test\comparison-base-files"

To check a tree of FMU files against reference CSV files (in the same directory with the same name as the FMU or the name "protocol.csv") use the following command:

compare.exe --mode FmuChecker -c "C:\Program Files\FMUChecker-1.0.2-win64\fmuCheck.win64.exe" -r "C:\temp\test003" "C:\test\FMus\2013-05-17_Test"

The tool returns 0 if all results were valid and no errors occurred during the validation, 1 if there were invalid results and 2 if there were exceptions or errors during the program run.
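
The exit codes above lend themselves to scripted checks, for example in a CI job. The following is a minimal sketch, assuming the tool is started through Python's subprocess module; the function names and label strings are illustrative and not part of the tool.

```python
import subprocess

# Exit codes as documented above; the label strings are illustrative only.
EXIT_MEANINGS = {
    0: "all results valid",
    1: "invalid results found",
    2: "exception or error during the run",
}


def describe_exit_code(code):
    """Map a Compare exit code to a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_compare(args):
    """Run the tool with the given argument list and describe the outcome.

    `args` is the full command line as a list, starting with the path to
    the executable.
    """
    completed = subprocess.run(args, check=False)
    return describe_exit_code(completed.returncode)
```

A caller would pass the same arguments as in the command lines above, split into a list, and branch on the returned label or on the raw return code.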

Modes

CsvFileCompare

This is the default operation mode. It compares a given compare file with a base file, where the compare file contains the values that are to be checked against the base file.

CsvTreeCompare

A given compare directory is browsed recursively for CSV files. For each compare CSV file found, the tool searches the second argument, the base directory, for a CSV file with the same name as the compare CSV. If that file is not found, the tool searches for a base CSV in the same tree as the compare CSV. For example:

compare.exe -m csvtreecompare c:\temp\compare c:\temp\base

When the file "c:\temp\compare\foobar\blubb.csv" has been found, the tool first searches for "c:\temp\base\blubb.csv". If that file is not found, it searches for "c:\temp\base\foobar\blubb.csv". That way you can compare whole directory trees.
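
The lookup order described above can be sketched as follows. This is an illustration of the documented behaviour only, not the tool's actual implementation; the function name and parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_base_csv(compare_file: Path, compare_root: Path,
                  base_root: Path) -> Optional[Path]:
    """Locate the base CSV for a compare CSV, following the documented order.

    1. A file with the same name directly in the base directory.
    2. A file at the same relative path inside the base directory.
    """
    flat = base_root / compare_file.name
    if flat.is_file():
        return flat
    mirrored = base_root / compare_file.relative_to(compare_root)
    if mirrored.is_file():
        return mirrored
    return None
```

For the example above, compare_root would be c:\temp\compare and base_root would be c:\temp\base.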

FmuChecker

In this mode the tool uses the FMU checker tool. The path to the checker has to be given on the command line (optionally together with arguments for the checker). The output of the FMU checker is saved to a temporary directory, and the directory containing the FMU is searched for a base CSV to compare the output with. First the tool searches for a CSV file with the same name as the FMU. If none is found, it searches for a file called "protocol.csv".
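
As a rough illustration of the reference lookup described above, the sketch below checks for a CSV next to the FMU and falls back to "protocol.csv". The function name is hypothetical and the code is not taken from the tool.

```python
from pathlib import Path
from typing import Optional


def find_fmu_reference(fmu_file: Path) -> Optional[Path]:
    """Find the base CSV for an FMU in the FMU's own directory.

    Tries "<fmu name>.csv" first, then "protocol.csv".
    """
    same_name = fmu_file.with_suffix(".csv")
    if same_name.is_file():
        return same_name
    protocol = fmu_file.parent / "protocol.csv"
    if protocol.is_file():
        return protocol
    return None
```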

If no report directory has been set, the compare tool uses the directory of the compare file as the location for the master report and saves the reports for the individual checks next to the compare CSV. If a report directory has been set, all HTML files are saved in a flat structure in that directory and link to each other with relative hyperlinks, which makes it possible to distribute the reports.

Hints

CSV (comma-separated values) is a file format that is used by many tools but lacks proper standardization. RFC 4180 describes the basic definitions, which this comparison tool supports as well.

The default settings for CSV compare to read a CSV file are:

  1. one dataset per line (CRLF or LF does not matter)
  2. first column contains the time and its heading is either "time" or "t"
  3. The default separator is ";"; you can set the delimiter with the -d flag
  4. When the CSV contains double values and the delimiter is set to ",", the double values have to be saved with "." as decimal separator and/or they have to be enclosed in quotation marks (e.g. 0.001 or "0,001")
  5. Column titles are parsed as they are
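
To make the defaults above concrete, here is a small reader that follows the same conventions. It is a sketch using Python's csv module, not the tool's parser; the function name is hypothetical, and accepting a quoted decimal comma by replacing it with "." is an assumption about how such values are meant to be read.

```python
import csv
import io


def read_result_csv(text, delimiter=";"):
    """Parse CSV text into a dict mapping column title to a list of floats.

    The first column must be titled "time" or "t". Column titles are kept
    exactly as they appear in the header line.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    if header[0] not in ("time", "t"):
        raise ValueError('first column must be titled "time" or "t"')
    columns = {name: [] for name in header}
    for row in reader:
        if not row:
            continue  # ignore empty lines
        for name, cell in zip(header, row):
            # A quoted value such as "0,001" arrives here as 0,001.
            columns[name].append(float(cell.replace(",", ".")))
    return columns
```

With the default delimiter, read_result_csv("time;x\r\n0.0;1.5\r\n") yields one time value and one x value; CRLF and LF line endings are both handled by the csv module.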

To obtain optimal validation results, use a base file with a higher resolution than the compare file, so that the tube can be calculated more accurately.

The tool generates HTML reports that contain scalable vector graphics of all results that are found in both files. Keep that in mind when you compare CSV files with many results, as the generated HTML reports can grow too big to be viewed in a browser. The best approach is to prepare your base file so that it contains only the results you really want to compare.

To run the tool you need either Microsoft Windows with Microsoft .NET Framework 4.0, or Mono (stable) on a Mono-supported platform.

Alternatively a Docker container can be built and used to run the tool.

csv-compare's People

Contributors

bastianbin, beutlich, harmanpa, sjoelund, svenruetz, tbeu, thorade

csv-compare's Issues

Absolute vs Relative URLs

The generated reports use absolute file:// URLs. This makes it impossible to move the files or to serve them via http. An option should be available to use relative URLs instead.
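
What the requested option would amount to can be sketched in a few lines. This is only an illustration of computing a relative link between two report files, assuming forward-slash paths as used in URLs; the function name is hypothetical and nothing here reflects the tool's report generator.

```python
import posixpath


def relative_link(from_report, to_report):
    """Return a link to `to_report` relative to the directory of `from_report`."""
    return posixpath.relpath(to_report, start=posixpath.dirname(from_report))
```

A link computed this way stays valid when the whole report directory is moved or served over HTTP, because it no longer encodes the absolute location.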

Test on Linux machine

Compile and test the current code on a Linux test system, comparing the MSL regression testing tree with itself.

The --tolerance option doesn't work as expected (or: documentation needs to be improved)

I am unable to use the --tolerance option in a meaningful way, as the mapping of the specified value to the actual shape of the tube is completely opaque (to my mind at least).

Here's a capture of a tube created with --tolerance=0.05:

[image: screenshot of the tube created with --tolerance=0.05]

Can you please explain a bit more how the --tolerance works? The short mention in the help is hardly informative enough...

Trouble running the Compare tool due to locale settings

from: https://trac.modelica.org/Modelica/ticket/1646

I tried running the Compare tool (r8061) with a simple csv file using the command

Compare.exe Example.csv Example.csv

This however gives an error

2015-01-20Z09:45:36 [       Error ] Exception during run: Input string was not in a correct format.

where the problematic part seems to be the tolerance as the below command works fine,

Compare.exe -t 0,001 Example.csv Example.csv

This needs to be fixed so that it works out of the box. Additionally this dependency on the locale makes it difficult to use the Compare tool in other tools.

Incorrect handling of result points at the same timepoint

We previously created this ticket. It was fixed with the new algorithm. However, we still see some odd behaviour, probably related to input handling.
[figure]
If you check the result points, you see that the result points at t = 0.1 have been swapped.

Both the base and the result have been swapped in the example above; however, there are other places in that signal where only the result is swapped: t=~0.148, t=~0.1965, t=~2446, etc.

Command:

Compare.exe -o -m csvFileCompare -d , actual.csv expected.csv
Files can be found here
This was found using this version

Result compare error

When this CSV file is processed by CSV Compare as both base and target, CSV Compare reports errors. The upper and lower bounds start to behave incorrectly.

Command used:

Compare.exe -o -m csvFileCompare -t 1e-3 -d , res.csv res.csv

Version:

Compare.exe -v
CSV File Comparison Tool v2.0.0-rel
Copyright © 2015 ITI GmbH

Few error points causes strange error bar

First off, the new version of CSV Compare is much better!

The behaviour of the error graph has changed since previous versions of csv-compare (pre 2.0?).
Now it only plots the graph for the part that has an error.
I don't know if this is intended or not, but it causes problems when there are only one or two error points:
[figure 1]
[figure 2]

All files that were used can be found here, and the command was:

Compare.exe -o -m csvFileCompare -r . -t 0.001 --inline actual.csv target.csv
2015-03-20Z07:53:27 [ Warning ] aimc.idq_rr[2] is invalid! 1 errors have been found during validation.
2015-03-20Z07:53:27 [ Warning ] aimc.idq_rr[1] is invalid! 2 errors have been found during validation.
2015-03-20Z07:53:27 [ Warning ] aimc.idq_sr[1] is invalid! 1 errors have been found during validation.

Version:

Compare.exe --help
CSV File Comparison Tool 2.0.0.1
Copyright © 2015 - ITI GmbH

It would be really nice if the two graphs were synced, so that zooming in one also zooms the other, at least along the x-axis.

File remains in temporary folder after run

Running Compare.exe with the following input:

Compare.exe -r . --override result.csv reference.csv

where result.csv contains:

time;x
0.0;0.0
1.0;0.0
2.0;0.0

and reference.csv contains:

time;x
0.0;0.0
2.0;0.0

will give the following output:

2018-04-12Z15:51:42 [     Warning ] The resolution of the base x-axis is smaller
than the compare x-axis. The better the base resolution is, the better the
validation result will be!

It will also generate an empty file in the Windows temporary folder. An example file name is: tmpB053.tmp

Expected behavior is to not get any file in the temporary folder.

This was found in version 2.0.0.1
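
The expected behaviour corresponds to the usual pattern of removing a scratch file even when the work using it fails. The following is a generic sketch of that pattern, not a description of how the tool creates its temporary files; the function name is hypothetical.

```python
import os
import tempfile


def with_scratch_file(work):
    """Create a temporary file, pass its path to `work`, and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)  # only the path is needed; close the OS-level handle
    try:
        return work(path)
    finally:
        # Runs on success and on exceptions, so nothing is left behind.
        if os.path.exists(path):
            os.remove(path)
```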

Missing variables causes crash

Since r8061 (on the MA svn) we get the following errors when a variable is missing in the result file.
Command used is

Compare.exe -o -m csvFileCompare res.csv base.csv

Output is:

2015-01-23Z15:02:43 [ Warning ] No report directory has been set, using ""
2015-01-23Z15:02:43 [ Warning ] actuator.armature.mass.der(s) not found in "
C:\Users\jon_041\Downloads\res.csv", skipping checks.
2015-01-23Z15:02:43 [ Error ] Exception during run: Object reference not set to an instance of an object.
CSV File Comparison Tool 2.0.0.8061
Copyright © 2014 - ITI GmbH

This software is free software and released under The BSD 3-Clause License
...

Report directory with trailing backslash (Win)

On Win8, Compare.exe reports "Exception during run: Illegales Zeichen im Pfad." (German for "Illegal characters in path.") in the default --mode=CsvFileCompare if the report directory specified by --reportDir has a trailing backslash. It would be nice if this were accepted as valid input.
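
One way to accept such input is to normalize the directory argument before using it. The sketch below assumes, without confirmation from the issue, that the illegal character is a stray double quote left over when a quoted argument ends in a backslash; the function name is hypothetical.

```python
def normalize_report_dir(raw):
    """Strip stray quotes and trailing path separators from a directory argument.

    A bare drive root such as "C:\\" keeps its separator.
    """
    cleaned = raw.strip().strip('"')
    stripped = cleaned.rstrip("\\/")
    if len(stripped) == 2 and stripped[1] == ":":
        return stripped + "\\"  # keep drive roots intact
    return stripped or cleaned  # do not reduce "/" to an empty string
```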

Error while building using Visual Studio

Hello,

I have an issue that is almost certainly due to my lack of experience with Visual Studio. I've downloaded the files from GitHub, and when I open the solution and build the project, I get the error "The program '[21536] Compare.exe' has exited with code 2 (0x2)."

Could you explain in a bit more detail how to run the application?

Thanks

Incorrect handling of trajectory endpoints

We have been seeing false negatives at the endpoints of some trajectory comparisons. It usually happens when the signal has a "high" derivative.
For example:
[figure]
You can see that the bounds calculation doesn't account for the derivative of the trajectory at the end point. The two results are very close and should be accepted.

Example files can be found here:
https://gist.github.com/jon-modelon/656e05349415bb2776a8

Command:

Compare.exe -o -m csvFileCompare -r . -t 0.001 --inline actual.csv expected.csv

Version:

Compare.exe --help
CSV File Comparison Tool v2.0.0-rel
Copyright © 2015 ITI GmbH

Compare.exe needs a lot of RAM in tree compare mode

source: https://trac.modelica.org/Modelica/ticket/1486

If I compare all test cases of MSL in tree mode, compare.exe needs about 5 GB of RAM.
On my 8 GB machine, I have to close applications before being able to start the tree compare operation.

If the number of test cases increases in future, it's going to get critical on standard PCs.

It should be investigated whether RAM usage can be reduced, e.g. by writing reports to hard disk during the tree compare.
Currently the tool seems to write all reports at the end.

Used version: compare.exe 64bit r7267

Used call:

compare.exe --mode csvTreeCompare --tolerance 2e-3 --delimiter "," --verbosity 2 C:\Work\upload\RegressionTesting\MSL\ReferenceResults\v3.2.1+build.2.release\Modelica C:\Work\upload\RegressionTesting\MSL\ReferenceResults\v3.2.1+build.2.release\Modelica --logfile log.txt --comparisonflag --reportdir 07_r7267_Modelica

Index out of bounds exception

When I run Compare.exe such as:
Compare.exe -o -m csvFileCompare -r ".\output" ".\data\actual.csv" ".\data\expected.csv"

The files can be found at: https://gist.github.com/JohanYli/cc6a9bb6f8d7e106ac9f

I get the error:
[ Error ] Exception during run: Index was outside the bounds of the array.

I use version 2.0.0.1 of the CSV File Comparison Tool and run it on a Windows 7 machine.

csvFileCompare fails since 2.0.1

In csvFileCompare mode the files are compared, but before the report is written the error "Paths do not have the same base" occurs even though the paths are OK. The same setup works with version 2.0.0.

Compare.exe does not accept relative .csv files from current working directory

> compare.exe file.csv file_base.csv
2019-07-18Z15:00:35 [       Error ] Exception during run: The path is not of a legal form.

The problem is that the tool tries to deduce the report directory from the first .csv file. Here there is no leading directory, so Path.GetDirectoryName(options.Items[0]) returns an empty string, and Path.GetFullPath("") then throws an exception. This case should be handled: when an empty string is passed to GetFullPathWithEndingSlashes, it should fall back to the current working directory, Path.GetFullPath(".").

The current logic from Program.cs that needs fixing is:

options.ReportDir = GetFullPathWithEndingSlashes(Path.GetDirectoryName(options.Items[0]));
....
private static string GetFullPathWithEndingSlashes(string input)
{
    string fullPath = Path.GetFullPath(input);
    return fullPath.TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar) + Path.DirectorySeparatorChar;
}
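
The proposed fallback can be illustrated with an analogous function in Python. This is a sketch of the idea only (use the current working directory when the file name has no directory part) and is not the C# fix itself; the function name is hypothetical.

```python
import os


def report_dir_for(compare_file):
    """Derive the report directory from the compare file's location.

    A bare file name has an empty directory part, so fall back to ".".
    The result always ends with exactly one path separator.
    """
    directory = os.path.dirname(compare_file) or "."
    return os.path.abspath(directory).rstrip("/\\") + os.sep
```

With this fallback, a call such as report_dir_for("file.csv") resolves to the current working directory instead of failing on an empty path.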

Missing variables are not always skipped

CSV Compare does not always skip variables that aren't present in both CSV files.
For example, take the following base.csv:

time;a;b;c
0;1;2;3
1;2;3;4
2;3;4;5
3;4;5;6
4;5;6;7

and actual.csv:

time;a;x;c
0;1;0;3
1;2;0;4
2;3;0;5
3;4;0;6
4;5;0;7

Note that b exists in base.csv but not in actual.csv, and x is missing from base.csv but exists in actual.csv. In the generated report the variable b is listed with a 100% success rate. This is a rather severe bug, since the user gets the impression that everything is fine and the variable produces the correct result! The tool does, however, leave a warning in the log saying that b has been skipped (which it apparently hasn't).
Log:

2015-07-07Z13:16:51 [     Warning ] b not found in "actual.csv", skipping checks.

Command used:

Compare.exe actual.csv base.csv -r out

Version:

CSV File Comparison Tool v2.0.0-rel
Copyright © 2015 ITI GmbH
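
The behaviour the report should reflect can be stated as a simple partition of the column titles. The sketch below is illustrative only and says nothing about where the bug lies in the tool; the function name is hypothetical.

```python
def split_variables(base_header, actual_header):
    """Partition variable names by presence in the base and actual headers.

    The first column (time) is ignored. Returns a tuple
    (common, only_in_base, only_in_actual), each in header order.
    """
    base_vars = base_header[1:]
    actual_vars = actual_header[1:]
    common = [name for name in base_vars if name in actual_vars]
    only_in_base = [name for name in base_vars if name not in actual_vars]
    only_in_actual = [name for name in actual_vars if name not in base_vars]
    return common, only_in_base, only_in_actual
```

For the two files above, only a and c are common; b and x should each be reported as skipped rather than as successful.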

Use one time vector for results in html reports

We have to check whether the jqplot library in use supports a different way of reading its data sets, so that the time values can be stored in a single variable and reused instead of repeating the vectors in the report.

Maybe the series to plot could be initialized separately.

JUnit support

Since CSV Compare is typically used in CI toolchains to verify non-regression of results, it would make sense to also generate a JUnit XML summary of the comparisons.
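
To give an idea of what such a summary might look like, the sketch below turns per-variable error counts into a minimal JUnit-style XML document. The input shape (a dict of variable name to error count), the suite name and the function name are assumptions made for illustration; the tool does not provide this output.

```python
import xml.etree.ElementTree as ET


def junit_summary(results, suite_name="csv-compare"):
    """Build a minimal JUnit-style XML string from {variable: error_count}."""
    failures = sum(1 for errors in results.values() if errors > 0)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for variable, errors in results.items():
        case = ET.SubElement(suite, "testcase", name=variable)
        if errors > 0:
            # One <failure> element marks the test case as failed.
            ET.SubElement(
                case,
                "failure",
                message=f"{errors} errors found during validation",
            )
    return ET.tostring(suite, encoding="unicode")
```

Most CI servers that read JUnit reports would display each variable as one test case under the suite.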

Index out of bound exception

We're getting "Exception during run: Index was outside the bounds of the array."
Command used is:

Compare.exe -o -m csvFileCompare -r . -t 0.001 --inline res.csv base.csv

And the files can be found here

Exact error message is:

2015-05-27Z10:58:36 [ Warning ] The resolution of the base x-axis is smaller than the compare x-axis. The better the base resolution is, the better the validation result will be!
2015-05-27Z10:58:36 [ Warning ] a is invalid! 219 errors have been found during validation.
2015-05-27Z10:58:36 [ Error ] Exception during run: Index was outside the bounds of the array.

The two warnings are rather strange: the two result files have different resolutions at different times (due to events).
It would also be rather interesting to know why "a" is invalid...

Version of CSV Compare:

CSV File Comparison Tool v2.0.0-rel

Locale-dependent error value in compare_failed.log

CSV Compare Version 2.0.0.1 (X86)
Comparison result file for C:\temp\actual.csv
. Time:        2015-04-15T14:02:28.7713039+02:00
. Operation:   CsvFileCompare
. Tolerance:   0.001
. Result:      failed
. Biggest error: actual.a=>2,49843247137984E-07
. Failed values:
actual.a=>2,49843247137984E-07

The tolerance is written locale-independently (as expected), but the error value is not.
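
The desired behaviour is that numbers in the log always use "." as the decimal separator, whatever the system locale. As a language-neutral illustration (the tool itself is not written in Python), the sketch below formats and parses floats without consulting the locale; both function names are hypothetical.

```python
def format_invariant(value):
    """Format a float with "." as decimal separator, independent of locale."""
    return repr(float(value))


def parse_invariant(text):
    """Parse a float written with "." as decimal separator.

    A decimal comma is rejected explicitly instead of being guessed at.
    """
    if "," in text:
        raise ValueError(f"decimal comma is not accepted: {text!r}")
    return float(text)
```

Applied to the value from the log above, format_invariant(2.49843247137984e-07) gives "2.49843247137984e-07", which parses back to the same number.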

Set tolerance of tube in y-axis

I am working with a set of signals of varying magnitude, where one signal is very small (and has very small variation). Since there seems to be some limitation on the ratio of the sides of the rectangle used to calculate the tube (I saw a max statement with 0.0004 in the ratio calculation), this allows really big relative differences in small results.

As I understand the tolerance setting, it only affects the width of the tube on the x-axis. Is there a way to specify the width of the tube on the y-axis instead, preferably on a signal-by-signal basis (not all signals are small)? Or can you improve the tube algorithm? There seems to be an attempt at this in the function 'SetStandardBaseAndRatio' in Modelica_ResultCompare/CurveCompare/TubeSize.cs, but for some reason 'SetFormerBaseAndRatio' is used instead.

NaN and Infinity support

Is there any chance that the support for NaN and Infinity will improve in the near future?
In the case of NaN an error is thrown, but for Infinity the behavior differs depending on where the Infinity is present: in base.csv or in to_compare.csv.
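
A symmetric treatment would classify non-finite values the same way regardless of which file they come from. The sketch below shows one possible policy (non-finite values match only an identical non-finite value); this policy and the function name are assumptions for illustration and do not describe the tool's current or planned behaviour.

```python
import math


def classify_pair(base, compare):
    """Classify a pair of samples before any tube check is applied.

    Returns "finite" when both values are ordinary numbers, otherwise
    "match" or "mismatch" for pairs involving NaN or Infinity.
    """
    if math.isnan(base) or math.isnan(compare):
        both_nan = math.isnan(base) and math.isnan(compare)
        return "match" if both_nan else "mismatch"
    if math.isinf(base) or math.isinf(compare):
        # Equal only if both are infinite with the same sign.
        return "match" if base == compare else "mismatch"
    return "finite"
```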

Detailed documentation

@svenruetz Is there any detailed documentation on the CSV-compare tool? It would be a great help in understanding the purpose of each function.
