desihub / desibackup
Contains wrapper scripts for backing up DESI data using the hpsspy package.
License: BSD 3-Clause "New" or "Revised" License
Examine files in the spectro directory that are newer than existing backups.
spectro/redux/sjb/dogwood has new data, but even the new data are from 2015! Maybe just re-back up.
Is spectro/redux/sjb archival?
redux/oak1/bricks/3127p090/blat.h5: just one file. Not sure who originally owned it.
@sbailey, is the desi/cmx/ci data set now static and ready for backup?
Please add /project/projectdirs/desi/datachallenge/quicklook/review-19.1 to the backups, while excluding review-19.1/redux/preproc/ (114G of easily reproduced preprocessed data).
The LGal_spectra directory has new data. @weaverba137 will contact rtojeiro and chahah.
Add tools for validating the backup specifications in etc/desi.json, in particular:
A --manifest option (or some equivalent name) that would generate a manifest of what files would go into what tarballs. For example, it could point to a directory and create fake *.tar files in the same hierarchy and with the same naming that would end up on HPSS, but the contents of those files would be a listing of file paths rather than the contents of the files themselves (like tar -tf blat.tar would report). This would allow someone to do a dry run and check whether the results are intended before actually trying to send TB of data to HPSS.
Note that this validation would need to be run at NERSC, since it needs access to the actual files; this isn't just the JSON syntax check of the Travis test.
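As a rough sketch of the fake-tarball idea: the function below takes a precomputed mapping of planned tarball paths to member files and writes plain-text listings in place of real archives. The name write_manifests and the mapping structure are hypothetical; the real grouping logic would come from the hpsspy configuration.

```python
import os


def write_manifests(mapping, outdir):
    """Write one fake *.tar file per planned tarball.

    Each "tar" file is a plain-text listing of the member paths,
    mimicking what ``tar -tf`` would report for the real archive.
    ``mapping`` is {tarball_relative_path: [member paths]} -- a
    hypothetical structure, not part of hpsspy.
    """
    written = []
    for tarname, members in mapping.items():
        dest = os.path.join(outdir, tarname)
        # Reproduce the HPSS directory hierarchy under outdir.
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        with open(dest, "w") as f:
            f.write("\n".join(members) + "\n")
        written.append(dest)
    return written
```

The fake files are tiny, so the full hierarchy can be inspected (or diffed against expectations) before any data moves to HPSS.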
I can't recall now whether desi/spectro/data/QL is archival. If it is, it would need not only backup configuration, but a data model as well.
Motivated by #2, add more documentation about how to add a new entry to etc/desi.json, e.g.
Since JSON files don't allow comments, the existing etc/desi.json is only partially useful for deriving these patterns.
Create a system for easily monitoring what has already been backed up, what needs to be backed up, what backups are stale, etc.
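One possible core for such a monitor: compare modification times of files on disk against a parsed listing of files on HPSS. The function and input shapes below are hypothetical; real inputs might come from os.scandir() and a parsed `hsi ls -l` listing.

```python
def classify(disk_files, hpss_files):
    """Classify disk files against HPSS backups by mtime.

    ``disk_files`` and ``hpss_files`` map name -> mtime in seconds;
    both are hypothetical inputs for this sketch.
    """
    status = {}
    for name, mtime in disk_files.items():
        if name not in hpss_files:
            status[name] = "needs backup"
        elif mtime > hpss_files[name]:
            status[name] = "stale backup"
        else:
            status[name] = "backed up"
    return status
```

A periodic report built on something like this would show at a glance what is backed up, what is missing, and what has gone stale.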
Please add tape backup configurations for $DESI_ROOT/spectro/redux/daily/tiles/archive/TILEID/ARCHIVEDATE. These are the basis of the MTL ledger update decisions and should be archived as part of the history of operations.
Unlike the rest of the daily prod, the tiles/archive/TILEID/ARCHIVEDATE directories are guaranteed to be frozen once written and thus are safe to back up. New TILEID/ARCHIVEDATE directories may appear (including for the same TILEID that had previously been archived under an earlier ARCHIVEDATE), but the contents of the existing ones won't change once written.
These are ~5 GB each, which is a bit on the small side for htar files. If we need to go to a larger bundling, all tiles on a given ARCHIVEDATE could be put together, though that's a little "unnatural" given the TILEID/ARCHIVEDATE organization instead of ARCHIVEDATE/TILEID, but I don't think that's a blocking factor if we need to go for bigger bundles.
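The "bundle by ARCHIVEDATE" option could be sketched as a simple grouping step. The function name and the flat list-of-relative-paths input are hypothetical, and the example TILEID/ARCHIVEDATE values are made up.

```python
from collections import defaultdict


def bundle_by_archivedate(archive_dirs):
    """Group TILEID/ARCHIVEDATE directories into one bundle per
    ARCHIVEDATE, to get larger htar files.

    ``archive_dirs`` is a list of relative paths like
    "1234/20210501" (hypothetical example values).
    """
    bundles = defaultdict(list)
    for d in archive_dirs:
        tileid, archivedate = d.split("/")
        bundles[archivedate].append(d)
    return dict(bundles)
```

Each resulting bundle would hold every tile archived on a given date, trading the natural TILEID/ARCHIVEDATE organization for fewer, larger htar files.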
Please add the fuji and guadalupe reductions to desi.json and back them up to HPSS. They likely could follow the same structure as everest for how directories are split into individual htar files.
In case I don't get this done today...
Configure the mocks directory for tape backups. Note that there is a working area in the lya_forest directory that probably shouldn't be backed up.
And make sure the backups actually take place once they are configured.
Update the backup configuration for guadalupe and check on other public VACs.
hsi ls -l shows older files with only day-level precision:
-rw-r----- 1 desi desi 3332536320 Oct 18 2019 desi_spectro_data_20191017.tar
hpsspy deals with this by assigning an arbitrary time of day to the backup, but desiBackup assumes second-level precision when comparing files on disk to files on HPSS, and this leads to spurious warnings about files on disk being newer than files on HPSS.
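A fix along these lines could compare timestamps with a day-sized tolerance instead of exact seconds. This is a sketch of the idea, not desiBackup's actual comparison code; the function name is made up.

```python
def newer_than_backup(disk_mtime, hpss_mtime, tolerance=86400):
    """Treat a disk file as newer than its HPSS copy only if it
    postdates the backup by more than ``tolerance`` seconds
    (default one day), since hsi ls -l reports only day-level
    precision for older files and hpsspy assigns an arbitrary
    time of day to such backups.
    """
    return disk_mtime - hpss_mtime > tolerance
```

With a one-day tolerance, files whose apparent mtime difference comes only from the arbitrary time-of-day assignment would no longer trigger spurious warnings.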
While preparing this code to back up the oak1 reduction, I was thinking about how to handle the configuration of the backups. The low-level backup system, in the hpsspy package, needs a JSON file which specifies how files on disk map to files on HPSS. desiBackup has the necessary file in etc/desi.json, and the shell script desiBackup.sh passes this file to the low-level backup system.
The question is, do we:
etc/desi.json as needed to back up stuff?
PS: while dusting off & testing desiBackup, I've already made a backup of oak1, but this could be deleted at will for further testing.
Examine files in the datachallenge directory that are newer than existing backups. In some cases there is just one file that changed.
surveysim2018/weather/README; owner sjbailey
dc17a-twopct/spectro/redux/dc17a2/dc17a2_qa.json; owner sjbailey
dc17a-twopct/spectro/redux/dc17a2/exposures/NIGHT/EXPID/qa-SPECTROGRAPH-EXPID.yaml; owner sjbailey
dc17a-twopct/spectro/redux/dc17a2/exposures/NIGHT/EXPID/qa-(sky|flux)-SPECTROGRAPH-EXPID.png; owner sjbailey
dc17a-twopct/spectro/redux/dc17a2/calib2d/NIGHT/qa-z5-EXPID.yaml; owner sjbailey
reference_runs/18.2/survey/test-tiles.fits; owner mmagana
This is a nice one because once these issues are resolved, we should be able to go directly from red to green/Complete.
Assigning to @sbailey since he is the owner of most of these files.