GithubHelp home page GithubHelp logo

Comments (18)

dankelley avatar dankelley commented on August 17, 2024 1

I'm closing this now, because I now know that we cannot expect a high level of agreement, given that the Matlab system is different from the GSW-C system.

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Sorry, I forgot to tag @PaulMBarker in previous comment.

from gsw-r.

PaulMBarker avatar PaulMBarker commented on August 17, 2024

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Thanks, Paul. I am relaxing the built-in tests for GSW-R. Dan.

from gsw-r.

efiring avatar efiring commented on August 17, 2024

See TEOS-10/GSW-C#36, which I merged just now. I had forgotten
that it was still sitting there unmerged. The current C code, and the Python code, are
using pchip interpolation, which is better than the buggy old interpolator in the original
C code, but which does not match either the old or the new Matlab code. Implementing
the newest Matlab algorithm in C will be a big job.

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

@efiring thanks for the note on the GSW-C update. Might I ask a question? I see (snapshot of git diff HEAD^1 below) that there are some comments about not using gsw_data_v3_0.nc, and I'm wondering if that means GSW-R should avoid that file. At the moment, I use the GSW-Fortran/test/gsw_data_v3_0.nc file and I'd like to know which (if either) I ought to use.

Please note that I do not incorporate this netcdf file directly into GSW-R. Doing so would put it over the package-size limit. Instead, I read it in and save a file in what's called "rda" R-data-archive format, which shrinks it from 4.7M to 1.4M. (I've no idea how it can compress so much better than netcdf, but I think it is doing it right, since all test values from matlab work, except for the one that this issue is about.)

Screen Shot 2021-07-06 at 11 06 11 AM

from gsw-r.

efiring avatar efiring commented on August 17, 2024

See also TEOS-10/GSW-C#35.
The current C version is using the v3_06_11 matfile data and check
values for everything except geo_strf_dyn_height; for the latter,
I patched in values that match the pchip interpolator being used.
In the C code, the check values are all stored in ascii as C source
code.
Can you use that C source? Or read it in and save it in your "rda"
format? Or make a modified version of "make_data_from_mat.py" that
would generate some other format you can read and then save as
"rda"?

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

@efiring Just to check, do you mean the following file, which I got from my minutes-old download from http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip?

$ pwd
/Users/kelley/src/gsw_matlab_v3_06_12/library
$ md5 *mat
MD5 (gsw_data_v3_0.mat) = 25b840e6f8bdd72522e28d99eccd9e99

from gsw-r.

efiring avatar efiring commented on August 17, 2024

No, the base is from the Matlab version before that, gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat.
And then the check values for strf_dyn_height have to be adjusted.

from gsw-r.

efiring avatar efiring commented on August 17, 2024

You can see the adjustment in https://github.com/TEOS-10/GSW-C/pull/36/files, in the diff
for gsw_check_data.c.

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Any idea how I can obtain gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat? The website only provides the more recent version, gsw_matlab_v3_06_12, at http://www.teos-10.org/software.

Things are quite tricky without git for versioning.

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Oh, sorry @efiring -- I just realized that I can download http://www.teos-10.org/software/gsw_matlab_v3_06_11.zip even though it's not linked-to in that webpage. The rest of this comment refers to http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip ...

As a matter of interest, I tried reading the matlab file within http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip and comparing with the netcdf file in https://github.com/TEOS-10/GSW-Fortran and I get identical values for delta_SA, but differing values for SAAR_ref. The graph below shows a histogram of the difference in SAAR_ref between netcdf and matlab (top panel) and, for context, shows a histogram of SAAR_ref from netcdf. The top panel is trimmed of a very few values that are far from this central cloud. Also, as shown in the text above that top panel, the number of "bad" data is quite different between these two files.

In summary, the SAAR_ref data are somewhat different between the matlab file and the gsw-fortran netcdf file. A rough comparison of the two graphs shows that the difference is of order 2e-6, while the typical values are fairly broadly distributed from zero to about 3e-4. In other words, the difference in SAAR is of roughly a few percent.

My next step will be to compare http://www.teos-10.org/software/gsw_matlab_v3_06_11.zip with http://www.teos-10.org/software/gsw_matlab_v3_06_12.zip

create_saar_new

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Hm, interesting. The matlab versions 3.06-11 and 3.06-12 are quite different. In 3.06-11, there are 46 latitude values, spanning the range -90 to 90, but in 3.06-12, there are 45 lats, spanning the range -86 to 90. And of course there are corresponding changes in the dimensionality of the arrays.

I thank @efiring for emphasizing that GSW-C is referenced to 3.06-11, not 3.06-12. It clearly will make a big difference, as C does not have a proper array type, so indexing is done by computing locations in a vector, knowing the dimensions of the array to which it corresponds. (For example, in GSW-C/gsw_saar_data.c line 5, we have a definition that gsw_ny is 46, which is right for 3.06-11 but would be wrong for 3.06-12. Indexing past the end of a vector is a big problem in C, and so it would have been a big mistake for me to have used a 3.06-12 matlab file with GSW-C code. (I won't bother checking the netcdf in GSW-Fortran ... at some point, it just gets too confusing, trying to keep up with all these files, and the decision of GSW-R is to key on GSW-C.

# Compare data in matlab versions 3.06-11 and 3.06-12

library(R.matlab)
#> R.matlab v3.6.2 (2018-09-26) successfully loaded. See ?R.matlab for help.
#> 
#> Attaching package: 'R.matlab'
#> The following objects are masked from 'package:base':
#> 
#>     getOption, isOpen
m11 <- readMat("~/src/gsw_matlab_v3_06_11/library/gsw_data_v3_0.mat")
str(m11, 1)
#> List of 20
#>  $ CT.ref.cast     : num [1:32, 1] 28.8 28.8 28.7 28.4 27.8 ...
#>  $ SAAR.ref        : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref.cast     : num [1:32, 1] 34.7 34.7 34.8 34.9 35 ...
#>  $ SR.ref          : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref     : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref2    : num [1:45, 1:46, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ gamma.n.ref.cast: num [1:32, 1] 21.8 21.8 21.9 22.1 22.4 ...
#>  $ gsw.cv          :List of 791
#>   ..- attr(*, "dim")= int [1:3] 791 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ gsw.demo.data   :List of 11
#>   ..- attr(*, "dim")= int [1:3] 11 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ lat.ref.cast    : num [1, 1] 4
#>  $ lats.ref        : num [1:46, 1] -90 -86 -82 -78 -74 -70 -66 -62 -58 -54 ...
#>  $ long.ref.cast   : num [1, 1] 188
#>  $ longs.ref       : num [1:91, 1] 0 4 8 12 16 20 24 28 32 36 ...
#>  $ ndepth.ref      : num [1:46, 1:91] NaN NaN NaN NaN NaN 39 41 42 43 38 ...
#>  $ ocean.ref       : num [1:46, 1:91] NaN NaN NaN NaN NaN 6 6 6 6 6 ...
#>  $ p.ref           : num [1:45, 1] 0 10 20 30 40 50 76 101 126 151 ...
#>  $ p.ref.cast      : num [1:32, 1] 10 20 30 50 75 100 125 150 200 250 ...
#>  $ sigma.2.ref.cast: num [1:32, 1] 30 30.1 30.1 30.3 30.6 ...
#>  $ version.date    : chr [1, 1] "15th_May_2011"
#>  $ version.number  : chr [1, 1] "3.0.11"
#>  - attr(*, "header")=List of 3

m12 <- readMat("~/src/gsw_matlab_v3_06_12/library/gsw_data_v3_0.mat")
str(m12, 1)
#> List of 20
#>  $ CT.ref.cast     : num [1:32, 1] 28.8 28.8 28.7 28.4 27.8 ...
#>  $ SAAR.ref        : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref          : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ SA.ref.cast     : num [1:32, 1] 34.7 34.7 34.8 34.9 35 ...
#>  $ SR.ref          : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ deltaSA.ref     : num [1:45, 1:45, 1:91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
#>  $ gamma.n.ref.cast: num [1:32, 1] 21.8 21.8 21.9 22.1 22.4 ...
#>  $ gsw.cv          :List of 816
#>   ..- attr(*, "dim")= int [1:3] 816 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ gsw.demo.data   :List of 11
#>   ..- attr(*, "dim")= int [1:3] 11 1 1
#>   ..- attr(*, "dimnames")=List of 3
#>  $ lat.ref.cast    : num [1, 1] 4
#>  $ lats.ref        : num [1:45, 1] -86 -82 -78 -74 -70 -66 -62 -58 -54 -50 ...
#>  $ long.ref.cast   : num [1, 1] 188
#>  $ longs.ref       : num [1:91, 1] 0 4 8 12 16 20 24 28 32 36 ...
#>  $ ndepth.ref      : num [1:45, 1:91] NaN NaN NaN NaN 39 41 42 43 38 37 ...
#>  $ ocean.ref       : num [1:45, 1:91] NaN NaN NaN NaN 6 6 6 6 6 6 ...
#>  $ p.ref           : num [1:45, 1] 0 10 20 30 40 50 76 101 126 151 ...
#>  $ p.ref.cast      : num [1:32, 1] 10 20 30 50 75 100 125 150 200 250 ...
#>  $ sigma.2.ref.cast: num [1:32, 1] 30 30.1 30.1 30.3 30.6 ...
#>  $ version.date    : chr [1, 1] "15th_May_2011"
#>  $ version.number  : chr [1, 1] "3.06.12"
#>  - attr(*, "header")=List of 3

Created on 2021-07-06 by the reprex package (v2.0.0)

from gsw-r.

ocefpaf avatar ocefpaf commented on August 17, 2024

and the decision of GSW-R is to key on GSW-C.

Some for the Python wrapper. It is nice to share a common code base and hopefully that will make both libs in different languages more robust.

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

A bit along this line, and to @efiring -- the R system emits the following warnings. This is for code that I've modified to fit R, so the line numbers are wrong. In terms of GSW-C/gsw_saar.c, the relevant line numbers are 59, 62, 177 and 180. Indenting GSW-C/gsw_saar.c lines 59-60 and 177-178 will make this warning go away. (I can make a PR request if that's warranted.)

gsw_saar.c: In functiongsw_saar’:
gsw_saar.c:63:5: warning: thisifclause does not guard... [-Wmisleading-indentation]
   63 |     if (isnan(lat) || isnan(lon) || isnan(p))
      |     ^~
gsw_saar.c:66:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by theif66 |         if (lat  <  -86.0  ||  lat  >  90.0)
      |         ^~
gsw_saar.c: In functiongsw_deltasa_atlas’:
gsw_saar.c:181:5: warning: thisifclause does not guard... [-Wmisleading-indentation]
  181 |     if (isnan(lat) || isnan(lon) || isnan(p))
      |     ^~
gsw_saar.c:184:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by theif184 |         if (lat < -86.0  ||  lat  >  90.0)
      |         ^~

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Oh, sorry, and I forgot to point out that the GSW-R version uses isnan, which is built-in to the R system, rather than the ISNAN macro that is defined in GSW-C/gsw_saar.c.

from gsw-r.

efiring avatar efiring commented on August 17, 2024

Dan, since the file is short and badly formatted, I made a PR with a complete reformatting:
TEOS-10/GSW-C#37. I will leave it out for a bit to see whether
anyone objects; if so, another PR with a minimal reformatting to remove the warnings
can be made.
Regarding isnan, I would think that using the macro would actually use plain isnan in
R--or does R define its own non-standard version of isnan for some reason?

from gsw-r.

dankelley avatar dankelley commented on August 17, 2024

Eric, thanks for the note.

R does some things to avoid conflicts between namespaces. I think you're right -- it would use the else part of the macro that starts at line 11 of GSW-C/gsw_saar.c, so likely I could have retained that part. I removed it because we know for sure that this isna() function is available in C code built into R, because of how the C-within-R system is set up.

I submitted GSW-R to the R build system. It has passed the two preliminary hurdles, and within a day I will likely either see it start appearing in the archive, or I'll get a message explaining a problem. The system is really quite nice. Submitted packages have to pass a lot of tests, on a lot of different machine types (see https://cran.r-project.org/web/checks/check_results_gsw.html for the previous version). Not only must things work on different machines, but also on the present version of R, the just-previous one and the upcoming one. And all packages that rely on the submitted package are also tested to be sure that they still work as intended. It takes about a week for a package to get into the system, with all this testing. And, sometimes, the tests reveal interesting problems, since there are lots of different C/C++ and Fortran compilers being used. All of this makes it a bit tricky to get a clean submission, but it also means that you can have a pretty high degree of trust in R packages (which number 17819 at the moment).

from gsw-r.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.