Comments (14)
@simonpf I'm in the process of downloading v1.1.0 from Zenodo and putting it on the FTP server.
from arts.
@olemke, I cannot find any official data volume policy for GitHub CI. Is there something I am missing? Adding this data means about 1 GB of extra downloads per merge/push/PR. I think this is fine, but it is getting to the level (i.e., a round number) where knowing the policy would be good.
Couldn't find anything in the GitHub docs on CI bandwidth limits either. However, considering that setting up CI environments already downloads quite a lot of data (packages, Docker containers, or conda environments), it's probably not an issue to have a few more data files to download. We should still be aware that every file is downloaded for every single CI job, which multiplies the bandwidth demand, and thus make an effort to keep the number of files we need to download to the minimum necessary.
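To make the multiplication concrete, here is a rough back-of-the-envelope sketch in Python. The file sizes, job count, and run count are made-up placeholders for illustration, not measured values for the ARTS CI.

```python
def ci_download_volume_mb(file_sizes_mb, jobs_per_run, runs):
    """Total download volume in MB when every job fetches every file."""
    per_job = sum(file_sizes_mb)
    return per_job * jobs_per_run * runs


# Hypothetical numbers for illustration only: five 8 MB files,
# 6 CI jobs per run, 100 runs per month.
monthly_mb = ci_download_volume_mb([8] * 5, jobs_per_run=6, runs=100)
print(f"{monthly_mb / 1000:.1f} GB per month")
```

The point is only that the per-job size is the factor we control directly, so trimming the file list pays off across every job and every run.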
Not sure why you feel that you need to ask. The test data in ARTS is already a mess. Some is in controlfiles/testdata, but many folders still contain local test data. So you will not ruin anything!
Or do you worry about file sizes? I assume a bit of TRO data is no problem. But what about ARO? I assume you want that as well.
Or shall we make ARTS dependent on the database? It is already a bit hard to use ARTS without arts-cat-data and arts-xml-data. So one more data repository does not feel like a big deal.
Sorry if this is more questions than answers.
Thanks, @erikssonpatrick, I included you in the discussion mainly to provide more substance to my request. The discussion regarding the relation between ARTS and the SSDB is probably also worth having.
However, what I need right now is the ability to efficiently and flexibly load scattering data in a way that is non-obsolete and reproducible. I think the easiest way to achieve this would be to add some data from the SSDB to the ARTS testdata.
I would prefer this data to be downloaded on the fly for the tests. Is there a server holding it with reasonably fine file granularity and a really good chance of being up at all times? For the XML and CAT data, we already have the mechanisms in place to download on the fly. For SVN stuff, give me a link if this is how the SSDB stuff is stored and I can arrange for it to work there as well. Otherwise, Python provides download tools we can use.
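As a minimal sketch of what such an on-the-fly download could look like with only the Python standard library: the function name `fetch_test_file` and the cache layout are hypothetical and are not part of the existing ARTS download mechanism. It skips the transfer when a cached copy already exists.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def fetch_test_file(url, cache_dir, downloader=urlretrieve):
    """Download url into cache_dir unless a copy is already there.

    Hypothetical helper for illustration; `downloader` can be swapped
    out, e.g. for testing without network access.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    target = cache_dir / Path(urlparse(url).path).name
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # does not leave a truncated file that looks like a cache hit.
        partial = target.with_name(target.name + ".part")
        downloader(url, str(partial))
        partial.replace(target)
    return target
```

`urlretrieve` accepts `https://`, `ftp://`, and `file://` URLs, so the same helper would work whichever server ends up hosting the files. Which files to request is left to the caller.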
We distribute the data via Zenodo (https://doi.org/10.5281/zenodo.1175572). We only have the code behind the data in repositories; the data are too large for that. Don't know if there is a solution. Let's discuss at the next ARTS meeting.
@simonpf Go ahead and add some testdata, so this does not delay you. Progress in the coding is priority 1, 2 and 3. How to handle the data we can sort out later.
@erikssonpatrick This is what I am trying to do, but since this is not a 1-man-project, it probably makes sense to coordinate with the other developers.
@riclarsson @olemke Can you comment on what you consider the best way of including test data? I see at least two different folders with test data. I thought that the approach you took with the catalog data in tests/testdata looked quite nice. Then it would be enough to just replace the scattering data in the ARTS XML data with the SSDB data.
Replacing the scattering data in arts-xml-data with SSDB data (or, for now, adding a set of SSDB data to it) sounds good to me. I'm just worried about the size of the scattering data. How big are the ~10 files you would need to add?
We also do have an unpacked version of the scattering data on our projects FTP server. The testdata framework could be used to download selected files from there. See lftp ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.0.0/StandardHabits/FullSet
The files are 6 to 10 MB. So the FTP sounds like a good option. Can you update it to include the latest version?
Nonetheless, it probably makes sense to get rid of the scattering data in the ARTS XML data, as it will be obsolete.
The files are 6 to 10 MB. So the FTP sounds like a good option. Can you update it to include the latest version?
@olemke, I cannot find any official data volume policy for GitHub CI. Is there something I am missing? Adding this data means about 1 GB of extra downloads per merge/push/PR. I think this is fine, but it is getting to the level (i.e., a round number) where knowing the policy would be good.
I think the amount of data sounds reasonable since it is testing a quite adaptive system.
@simonpf v1.1.0 is now available at ftp://ftp-projects.cen.uni-hamburg.de:/arts/ArtsScatDbase/v1.1.0/StandardHabits/FullSet
Thank you, @olemke, for updating the SSDB data.
Currently the downloaded data amounts to 40 MB. It can probably be reduced further, but that would require modifying the SSDB data, increasing the risk of the test data becoming obsolete with the next update of the SSDB.
Ultimately it may not be necessary to test the SSDB compatibility in the CI; on-the-fly generated scattering data could be used instead. Nonetheless, I think it is useful to have reproducible tests for this available.
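One way to notice when downloaded test data no longer matches what the tests were written against is to pin checksums. The sketch below is only an assumption about how that could look, not something the ARTS test framework is known to do; `expected` is a hypothetical manifest mapping file names to SHA-256 digests.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(data_dir, expected):
    """Return names in `expected` that are missing or whose digest differs."""
    data_dir = Path(data_dir)
    changed = []
    for name, digest in sorted(expected.items()):
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != digest:
            changed.append(name)
    return changed
```

A test run could call `changed_files` once after downloading and fail early with the list of affected files, which makes an SSDB update show up as a clear data mismatch instead of a numerical difference deep inside a test.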
40 MB should be no problem.