I'd like to suggest to include (at least) two particles from the ARTS SSDB in the ARTS

Include particle(s) from ARTS SSDB in test data about arts HOT 14 CLOSED

simonpf commented on July 26, 2024

Include particle(s) from ARTS SSDB in test data

from arts.

Comments (14)

olemke commented on July 26, 2024

@simonpf I'm in the process of downloading v1.1.0 from Zenodo and putting it on the FTP server.

olemke commented on July 26, 2024

> @olemke, I cannot find any official data volume policy for GitHub CI. Is there something I am missing? It's about 1 GB extra to download per merge/push/PR to add this data. I think this is fine, but it's getting to the level (i.e., a round number) where knowing the policy is good.

I couldn't find anything in the GitHub docs on CI bandwidth limits either. However, considering that setting up CI environments already downloads quite some data (packages, Docker containers or conda environments), it's probably not an issue to have a few more data files to download. We should still keep in mind that every file is downloaded for every single CI job, which multiplies the bandwidth demand, and thus make an effort to keep the number of files we need to download to the minimum necessary.
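
As a rough illustration of how per-job downloads multiply, here is a small sketch. The file sizes and the number of CI jobs are made-up placeholders, not measurements of the ARTS CI, and the function name is hypothetical.

```python
def ci_download_volume_mb(file_sizes_mb, n_jobs):
    """Total data pulled in one CI run when every job downloads every file."""
    return sum(file_sizes_mb) * n_jobs


# Hypothetical example: 10 files of 8 MB each, fetched by 12 CI jobs.
files_mb = [8] * 10
print(ci_download_volume_mb(files_mb, n_jobs=12))  # 960 (MB per CI run)
```

Halving either the number of files or the number of jobs that need them halves the total, which is why trimming the file list pays off.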

erikssonpatrick commented on July 26, 2024

Not sure why you feel that you need to ask. The test data in ARTS is already a mess. Some of it is in controlfiles/testdata, but many folders still contain local test data. So you will not ruin anything!
Or do you worry about file sizes? A bit of TRO data is no problem, I assume. But what about ARO? I assume you want that as well.
Or shall we make ARTS dependent on the database? It is already a bit hard to use ARTS without arts-cat-data and arts-xml-data, so one more data repository does not feel like a big deal.
Sorry if this is more questions than answers.

simonpf commented on July 26, 2024

Thanks, @erikssonpatrick. I included you in the discussion mainly to provide more substance to my request. But the discussion regarding the relation between ARTS and the SSDB is probably also worth having.

However, what I need right now is the ability to efficiently and flexibly load scattering data in a way that is non-obsolete and reproducible. I think the easiest way to achieve this would be to add some data from the SSDB to the ARTS testdata.

riclarsson commented on July 26, 2024

I would prefer this data to be downloaded on the fly for the tests. Is there a server that holds it with reasonably fine file granularity and a really good chance of being up at all times? For the XML and CAT data, we have the mechanisms to download on the fly in place. For SVN stuff, give me a link if this is how the SSDB stuff is stored and I can arrange for it to work there as well. Otherwise, Python provides download tools we can use.
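
A minimal sketch of what such an on-the-fly download could look like with the Python standard library only. This is not the existing ARTS mechanism for the XML and CAT data; `fetch_cached` and the cache layout are hypothetical names chosen for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` unless a copy is already there."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Write to a temporary name first so that an interrupted
        # download never leaves a truncated file under the final name.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)
    return target
```

`urllib.request.urlopen` handles `http://`, `https://`, `ftp://` and `file://` URLs, so the same helper would work whichever server ends up hosting the files. A test would call it once per required file and read the returned path; repeated runs on the same machine reuse the cached copy.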

erikssonpatrick commented on July 26, 2024

We distribute the data via Zenodo (https://doi.org/10.5281/zenodo.1175572). Only the code behind the data is kept in repositories; the data themselves are too large for that. I don't know if there is a solution. Let's discuss it at the next ARTS meeting.
@simonpf Go ahead and add some testdata, so this does not delay you. Progress in the coding is priority 1, 2 and 3. How to handle the data we can sort out later.

simonpf commented on July 26, 2024

@erikssonpatrick This is what I am trying to do, but since this is not a one-man project, it probably makes sense to coordinate with the other developers.

@riclarsson @olemke Can you comment on what you consider the best way of including test data? I see at least two different folders with test data. I thought that the approach you took with the catalog data in tests/testdata looked quite nice. Then it would be enough to just replace the scattering data in the ARTS XML data with the SSDB data.

olemke commented on July 26, 2024

Replacing the scattering data in arts-xml-data (or, for now, adding some set of SSDB data to it) sounds good to me. I'm just worried about the size of the scattering data. How big are the ~10 files you would need to add?

olemke commented on July 26, 2024

We also do have an unpacked version of the scattering data on our projects FTP server. The testdata framework could be used to download selected files from there. See lftp ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.0.0/StandardHabits/FullSet
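
A minimal sketch of picking selected files from such a directory with Python's standard `ftplib` and `fnmatch` modules. It assumes the server allows anonymous login; the function names and the glob pattern are hypothetical and not part of the ARTS testdata framework.

```python
import fnmatch
from ftplib import FTP


def list_remote(host, directory):
    """Return the entry names in `directory` on an anonymous FTP server."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (assumed to be allowed)
        ftp.cwd(directory)
        return ftp.nlst()


def select_files(names, patterns):
    """Keep only the names that match at least one glob pattern."""
    return [
        n for n in names
        if any(fnmatch.fnmatchcase(n, p) for p in patterns)
    ]


# Usage (needs network access, so it is left commented out):
# names = list_remote("ftp-projects.cen.uni-hamburg.de",
#                     "/arts/ArtsScatDbase/v1.0.0/StandardHabits/FullSet")
# wanted = select_files(names, ["*.xml"])  # the pattern is a placeholder
```

Keeping the selection as an explicit pattern list makes it easy to see, and to review, exactly which files each CI job will pull.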

simonpf commented on July 26, 2024

The files are 6 to 10 MB each, so the FTP server sounds like a good option. Can you update it to include the latest version?

Nonetheless, it probably makes sense to get rid of the scattering data in the ARTS XML data, as it will be obsolete.

riclarsson commented on July 26, 2024

> The files are 6 to 10 MB. So the FTP sounds like a good option. Can you update it to include the latest version?

@olemke, I cannot find any official data volume policy for GitHub CI. Is there something I am missing? It's about 1 GB extra to download per merge/push/PR to add this data. I think this is fine, but it's getting to the level (i.e., a round number) where knowing the policy is good.

I think the amount of data sounds reasonable since it is testing a quite adaptive system.

olemke commented on July 26, 2024

@simonpf v1.1.0 is now available at ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.1.0/StandardHabits/FullSet

simonpf commented on July 26, 2024

Thank you, @olemke, for updating the SSDB data.

Currently the downloaded data amounts to 40 MB. It can probably be reduced further, but that would require modifying the SSDB data, which increases the risk of the test data becoming obsolete with the next update of the SSDB.

Ultimately it may not be necessary to test the SSDB compatibility in the CI; on-the-fly generated scattering data could be used instead. Nonetheless, I think it is useful to have reproducible tests for this available.

olemke commented on July 26, 2024

40 MB should be no problem.
