metaarchive / bagit-split
Tools for handling the splitting (and unsplitting) of BagIt archives. Made with Python.
License: Other
https://github.com/MetaArchive/bagit-split/blob/master/bag-split.py#L21 provides API documentation for the 'unsplit()' method. Should similar docs for 'verify_split()' (function at https://github.com/MetaArchive/bagit-split/blob/master/bag-split.py#L79) be added as well? E.g.,
API Usage:
bag = unsplit("/parent/dir/of/bags")
is_split = verify_split("/path/to/dir/of/bags")
LOC is removing the CLI tools from bagit-java in version 5, essentially breaking this script. Since bagit-python, bagit ruby (https://github.com/tipr/bagit), and Bagger all lack bag-splitting capability, the only way for institutions to split bags is to write Java code, which is not feasible for all dev shops. Any thought to updating this script to make direct use of the Java library?
In both the split and unsplit operations, it appears that manifest mismatches are always reported:
python bag-split.py split /home/mark/Downloads/bagsplittest/marksbag
Verifying bag marksbag_2... success.
Verifying bag marksbag_1... success.
Verifying bag marksbag_0... success.
Verifying original bag integrity... success.
Original manifest does NOT appear consistent with the split manifests! Diff:
set([])
set([])
Creating metadata bag: marksbag_metadata
Found file: manifest-md5.txt
Found file: bag-info.txt
Found file: data
Found file: bagit.txt
Found file: tagmanifest-md5.txt
and
python bag-split.py unsplit /home/mark/Downloads/bagsplittest/marksbag_split/
Validating bag marksbag_2... success.
Validating bag marksbag_1... success.
Validating bag marksbag_0... success.
Copying payload from marksbag_0...
Copying payload from marksbag_1...
Copying payload from marksbag_2...
New manifest does NOT appear consistent with the split manifests!
Traceback (most recent call last):
File "bag-split.py", line 344, in <module>
unsplit(bag_path, args.output_dir, args.no_verify)
File "bag-split.py", line 237, in unsplit
raise RuntimeError("merged bag manifest inconsistent with split " \
RuntimeError: merged bag manifest inconsistent with split manifests
(I'll open a separate issue for the RuntimeError).
The code that compares the manifests lives around line 130.
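Since both diffs print as empty sets while the mismatch is still reported, one possibility (not confirmed against the actual code) is that the check does not test whether the differences are empty. Below is a minimal sketch of a set-based comparison. The function names `parse_manifest` and `manifests_consistent` are hypothetical and are not part of bag-split.py, and the sketch assumes each manifest line has the form "checksum path".

```python
def parse_manifest(text):
    """Return a set of (checksum, path) pairs parsed from manifest text.

    Assumes each non-empty line is "<checksum><whitespace><path>".
    """
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, path = line.split(None, 1)
        entries.add((checksum.lower(), path))
    return entries


def manifests_consistent(original_text, split_texts):
    """Compare the original manifest with the union of the split manifests.

    Returns (ok, missing, extra): ok is True only when both differences
    are empty.
    """
    original = parse_manifest(original_text)
    combined = set()
    for text in split_texts:
        combined |= parse_manifest(text)
    missing = original - combined  # in the original, absent from the splits
    extra = combined - original    # in the splits, absent from the original
    return (not missing and not extra), missing, extra
```

With a check like this, two empty differences yield `ok == True`, so no mismatch message would be printed for the run shown above.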
When running bag-split.py in unsplit mode, a RuntimeError is raised and exposed to the user:
python bag-split.py unsplit /home/mark/Downloads/bagsplittest/marksbag_split/
Validating bag marksbag_2... success.
Validating bag marksbag_1... success.
Validating bag marksbag_0... success.
Copying payload from marksbag_0...
Copying payload from marksbag_1...
Copying payload from marksbag_2...
New manifest does NOT appear consistent with the split manifests!
Traceback (most recent call last):
File "bag-split.py", line 344, in <module>
unsplit(bag_path, args.output_dir, args.no_verify)
File "bag-split.py", line 237, in unsplit
raise RuntimeError("merged bag manifest inconsistent with split " \
RuntimeError: merged bag manifest inconsistent with split manifests
It might be worth using a try/except around lines 321-238 or some other way of printing a more friendly form of the message to the user (similar to the "New manifest does NOT appear consistent with the split manifests!" message).
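A minimal sketch of the try/except idea, assuming the call site looks like the one in the traceback. The wrapper name `run_unsplit` is hypothetical, and the real script may prefer a different exit path or message format.

```python
import sys


def run_unsplit(unsplit_func, *args):
    """Call unsplit_func(*args) and report a RuntimeError without a traceback.

    Returns 0 on success and 1 on failure, suitable for passing to sys.exit().
    """
    try:
        unsplit_func(*args)
    except RuntimeError as err:
        # Print only the message, in the same spirit as the existing
        # "New manifest does NOT appear consistent..." output.
        sys.stderr.write("Error: %s\n" % err)
        return 1
    return 0
```

In the script's main block this might be used as `sys.exit(run_unsplit(unsplit, bag_path, args.output_dir, args.no_verify))`, which keeps the non-zero exit status while hiding the traceback.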