
ami-tools's People

Contributors

bturkus, cgmcnamara, genfhk, hshiue, nkrabben

ami-tools's Issues

survey_drive fail: improper bags

hey nick,

we've been using survey_drive.py to do another overall count of everything we've got on HDDs and servers, and I noticed the script seems to fail when it encounters improperly formed bags. If you'd like to recreate the error, you can simply copy a bag manifest into a data folder. The traceback will read:

Traceback (most recent call last):
  File "/Users/pamiaudio/Desktop/ami-tools/bin/survey_drive.py", line 50, in survey_bag
    bag = ami_bag(bag_path)
  File "/usr/local/lib/python3.6/site-packages/ami_bag/ami_bag.py", line 26, in __init__
    super(ami_bag, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/ami_bag/update_bag.py", line 34, in __init__
    super(Repairable_Bag, self).__init__(*args, **kwargs)
  File "/usr/local/bin/bagit.py", line 172, in __init__
    self.path = abspath(path)
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/posixpath.py", line 369, in abspath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/survey_drive.py", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/pamiaudio/Desktop/ami-tools/bin/survey_drive.py", line 139, in <module>
    main()
  File "/Users/pamiaudio/Desktop/ami-tools/bin/survey_drive.py", line 108, in main
    bag_info = survey_bag(bag_path)
  File "/Users/pamiaudio/Desktop/ami-tools/bin/survey_drive.py", line 55, in survey_bag
    bag = bagit.Bag(bag_path)
  File "/usr/local/bin/bagit.py", line 177, in __init__
    self._open()
  File "/usr/local/bin/bagit.py", line 188, in _open
    raise BagError("No bagit.txt found: %s" % bagit_file_path)
bagit.BagError: No bagit.txt found: /Volumes/video_repository/Working_Storage/Transfer/4_RTG/4_RTG_Audio/2017_117_mellon_004_NYPL_16700_pilotAsProgress/342465/data/bagit.txt

When we're doing this type of thing, our needs are simpler, and I'm wondering if there might be a way to add some argparse options for just doing the survey_files piece of this. Just an idea.

Thanks!
Ben
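
A minimal sketch of one way a survey could avoid this failure, assuming a directory should only be treated as a bag when it contains its own bagit.txt. The helper names `looks_like_bag` and `find_bag_roots` are hypothetical and not part of ami-tools; only the standard library is used.

```python
import os


def looks_like_bag(path):
    """Return True only if the directory holds a bagit.txt declaration."""
    return os.path.isfile(os.path.join(path, "bagit.txt"))


def find_bag_roots(root):
    """Yield directories under root that look like bag roots.

    A stray manifest copied into a data folder is ignored, because that
    folder has no bagit.txt of its own.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if "bagit.txt" in filenames:
            yield dirpath
            # Do not descend into a bag's payload looking for nested bags.
            dirnames[:] = []
```

With a check like this, a data folder that merely contains a manifest would be counted as ordinary files rather than passed to a bag loader.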

survey_drive.py: ModuleNotFoundError: No module named 'bagit'

MY-PAMI-061930:ami-tools genevievehavemeyerking$ python3 /Users/genevievehavemeyerking/ami-tools/bin/survey_drive.py -d /Volumes/NYPL_16650/ -o /Users/genevievehavemeyerking/Desktop/drop_folder
Traceback (most recent call last):
  File "/Users/genevievehavemeyerking/ami-tools/bin/survey_drive.py", line 4, in <module>
    import bagit
ModuleNotFoundError: No module named 'bagit'

new feature: reporting for get_file

hi,

I was hoping we could add another new feature to the get_file script, some form of post_getting reporting that would indicate files successfully pulled/not. Logic could be copied from validate_ami_bags, and ideal reporting would be both in terminal and in a newly created csv log.

thanks!
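
A minimal sketch of what such reporting could look like, assuming the script knows the list of requested filenames and the destination folder. `write_pull_report` and the CSV column names are hypothetical, and this does not reproduce the logic in validate_ami_bags.

```python
import csv
import os


def write_pull_report(requested, destination, report_path):
    """Record which requested files exist in destination after a pull.

    Writes one CSV row per file and returns (found, missing) lists.
    """
    found, missing = [], []
    with open(report_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "status"])
        for name in requested:
            present = os.path.isfile(os.path.join(destination, name))
            (found if present else missing).append(name)
            writer.writerow([name, "pulled" if present else "missing"])
    # Terminal summary alongside the CSV log.
    print(f"{len(found)} of {len(requested)} files pulled")
    for name in missing:
        print(f"not pulled: {name}")
    return found, missing
```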

INFO not reporting which bags are not ready for media ingest

root: 2021-06-29 16:09:46,737 - INFO - 254 of 258 bags are ready for ingest
root: 2021-06-29 16:09:46,737 - INFO - The following bags are ready for media ingest:.....

(above output example lists only which bags are ready, but not the 4 that apparently are not.)
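
A minimal sketch of logging both lists, assuming the validator already has the full list of bags and the subset that passed. `report_readiness` is a hypothetical helper, not the existing ami-tools code.

```python
import logging

LOGGER = logging.getLogger(__name__)


def report_readiness(all_bags, ready_bags):
    """Log the ready count, then name both the ready and not-ready bags."""
    ready_set = set(ready_bags)
    ready = [bag for bag in all_bags if bag in ready_set]
    not_ready = [bag for bag in all_bags if bag not in ready_set]
    LOGGER.info("%d of %d bags are ready for ingest", len(ready), len(all_bags))
    if ready:
        LOGGER.info("The following bags are ready for media ingest: %s",
                    ", ".join(ready))
    if not_ready:
        LOGGER.warning("The following bags are not ready for media ingest: %s",
                       ", ".join(not_ready))
    return not_ready
```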

Add Tests!

!!!!!!!
WTF! Why no tests? This maintainer is the worst.

validate_ami_bags: additional json test?

Interesting recent case: an in-house audio project passes json validation and is considered "ready for ingest" by validate_ami_bags, but the actual information inside many of the json files is incorrect (it was a rare situation in which we made a series of mistakes that resulted in correct json filenames but incorrect data within).

I know it's a little out of scope, but could we consider adding some sort of test that looks inside json files and matches on filenames or technical metadata?
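
A minimal sketch of the simplest version of such a test: compare the filename recorded inside a json file with the json file's own name. The default `key_path` of `("technical", "filename")` is an assumption about where the schema stores the filename and should be adjusted to the real schema; `json_filename_matches` is a hypothetical helper.

```python
import json
import os


def json_filename_matches(json_path, key_path=("technical", "filename")):
    """Compare the filename recorded in a JSON file with the file's own name.

    key_path is an assumed location for the recorded filename.
    Returns False when the key is missing or the names differ.
    """
    with open(json_path, encoding="utf-8") as handle:
        value = json.load(handle)
    for key in key_path:
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    expected = os.path.splitext(os.path.basename(json_path))[0]
    return value == expected
```

Matching on technical metadata would need a comparison against values read from the media file itself, which this sketch does not attempt.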

survey_drive.py module not found error

hi,

I'm getting the following traceback when attempting to run survey_drive.py:

Traceback (most recent call last):
  File "/Users/pamiaudio/Desktop/ami-tools/bin/survey_drive.py", line 5, in <module>
    from ami_bag.ami_bag import ami_bag
  File "/usr/local/lib/python3.6/site-packages/ami_bag/ami_bag.py", line 8, in <module>
    from ami_md.ami_excel import ami_excel
  File "/usr/local/lib/python3.6/site-packages/ami_md/ami_excel.py", line 14, in <module>
    import ami_md.ami_json as ami_json
  File "/usr/local/lib/python3.6/site-packages/ami_md/ami_json.py", line 8, in <module>
    import ami_files.ami_file as ami_file
ModuleNotFoundError: No module named 'ami_files'

Not sure if it's my computer/installation, but I'd love some help. Thanks!

pamidb_to_json.py: line tabulation kill?

Hi,
I've noticed that, from time to time, we've been accidentally adding extra lines to database fields without noticing. These pose a problem because, when exported in merge files and transformed into JSON, they end up as weird new line characters that fail validation.

So this in a mer:
[screenshot, 2018-02-23 9:00:50 AM]
Ends up as this in JSON:
[screenshot, 2018-02-21 9:53:17 AM]

We can certainly try to impose more order in the database, but I see this being a recurring problem, and if we could add a kill mechanism of some sort (without losing information), that'd be wonderful.
Let me know what you think...
Thanks,
Ben
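
A minimal sketch of such a mechanism, assuming the offending character is the vertical tab (U+000B, named "line tabulation" in Unicode) along with ordinary carriage returns and newlines. `flatten_line_breaks` is a hypothetical helper; it swaps the break for a separator so the text on both sides is kept.

```python
import re

# Vertical tab (U+000B, "line tabulation") plus carriage returns and newlines.
_LINE_BREAKS = re.compile(r"[\x0b\r\n]+")


def flatten_line_breaks(value, replacement=" "):
    """Replace in-field line breaks with a separator, keeping the text."""
    if not isinstance(value, str):
        return value  # leave numbers, None, etc. untouched
    return _LINE_BREAKS.sub(replacement, value).strip()
```

Applied to every string field before the JSON is written, this would turn "Side A", a line break, "Side B" into "Side A Side B".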

new feature: add pami dance storage to get repo file

pretty self-explanatory, but I believe the steps involved would be (1) adjusting the assets list to include this AWS bucket and (2) tweaking the script to allow for pulling either preservation or service files from this location

virtual environment for ami-tools

Hi,

Recently I've been experiencing some frustrating python/pip issues. I don't know exactly what's causing the problem or how to resolve it, but it appears that a brew update/upgrade to python 3.9 is triggering this kind of error:

MY-PAMI-038536:bin benjaminturkus$ python3 validate_ami_bags.py -h
Traceback (most recent call last):
  File "/Users/benjaminturkus/Desktop/ami-tools/bin/validate_ami_bags.py", line 7, in <module>
    from ami_bag.ami_bag import ami_bag
  File "/usr/local/lib/python3.9/site-packages/ami_bag/ami_bag.py", line 8, in <module>
    from ami_md.ami_excel import ami_excel
  File "/usr/local/lib/python3.9/site-packages/ami_md/ami_excel.py", line 14, in <module>
    import ami_md.ami_json as ami_json
  File "/usr/local/lib/python3.9/site-packages/ami_md/ami_json.py", line 8, in <module>
    import ami_files.ami_file as ami_file
  File "/usr/local/lib/python3.9/site-packages/ami_files/ami_file.py", line 3, in <module>
    from pymediainfo import MediaInfo
ModuleNotFoundError: No module named 'pymediainfo'

As far as I can tell, everything is installed correctly and my python path is as it should be, but there's some disconnect that's preventing python from finding the right packages. I'd like to (a) actually learn how to resolve what's going on, but (b) also discuss the possibility of setting up a virtual environment for ami-tools that might potentially future proof us a little bit more and prevent this kind of thing from cropping up.

Any advice or thoughts would be appreciated.

Thanks,

Ben
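
A minimal sketch covering both points, using only the standard library. `locate_modules` reports where the interpreter that is currently running finds each module (or None if it cannot), which helps show whether python3 and pip are pointing at different site-packages. `make_env` creates an isolated environment with the stdlib `venv` module. Both helper names are hypothetical, and the module names are simply the ones mentioned in the tracebacks on this page.

```python
import importlib.util
import sys
import venv


def locate_modules(names):
    """Map each top-level module name to the file it loads from, or None."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec else None
    return found


def make_env(env_dir, with_pip=True):
    """Create an isolated environment at env_dir with the stdlib venv module."""
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Show which interpreter is running and what it can import.
print(sys.executable)
print(locate_modules(["bagit", "pymediainfo", "ami_bag", "ami_files"]))
```

After creating an environment, installing ami-tools with that environment's own pip keeps the interpreter and its packages together, so a later brew upgrade of the system python should not affect it.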

validate_ami_bags.py / ami_bagError questions

  1. This error doesn't get conveyed in the error/warning 'summary' (are all errors supposed to be summarized?): 
    ERROR - Following error encountered while loading /Volumes/NYPL357540/Video/492776: 'AMI bag must contain either Excel or JSON metadata'
  2. In one instance, a bag threw a "Missing metadata files" message which was not also caught by the "...must contain either Excel or JSON metadata" ami_bagError - so although it was missing metadata for some SCs, this bag only threw a 'warning', rather than an error.
    WARNING - Error in bag structure: Filenames for media files do not match filenames for metadata. ... Missing metadata files: 
  3. Warning that should be an error for most video formats (I think optical video would be the only exception? Should we ask vendors to quarantine optical video as a workaround?) : 
    Error in asset balance: Mismatch of PM's and SC's

General thing: Use of the word "error" in Warning message makes parsing difficult.

validate_ami_bags dateCreated json test

I think there might be something a little off in the way that validate_ami_bags is checking for alignment of the dateCreated in Mediainfo vs. as reported in the JSON. I'm getting a lot of this:

ami_md.ami_json: 2021-02-03 13:25:08,102 - WARNING - dateCreated in JSON and from file disagree. JSON: 2021-01-12, From file: 2021-02-03.

But in running the file thru Mediainfo, I'm not seeing the "from file" (in this case 2021-02-03) anywhere:

MY-LPAMI-056430:~ benjaminturkus$ mediainfo -F /Volumes/lpasync/\!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters/mao_323942_v01f01_em.flac 
General
Count                                    : 331
Count of stream of this kind             : 1
Kind of stream                           : General
Kind of stream                           : General
Stream identifier                        : 0
Count of audio streams                   : 1
Audio_Format_List                        : FLAC
Audio_Format_WithHint_List               : FLAC
Audio codecs                             : FLAC
Complete name                            : /Volumes/lpasync/!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters/mao_323942_v01f01_em.flac
Folder name                              : /Volumes/lpasync/!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters
File name extension                      : mao_323942_v01f01_em.flac
File name                                : mao_323942_v01f01_em
File extension                           : flac
Format                                   : FLAC
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Format/Url                               : https://xiph.org/flac/
Format/Extensions usually used           : fla flac
Commercial name                          : FLAC
Internet media type                      : audio/x-flac
File size                                : 275218861
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262.5 MiB
Duration                                 : 1542803
Duration                                 : 25 min 42 s
Duration                                 : 25 min 42 s 803 ms
Duration                                 : 25 min 42 s
Duration                                 : 00:25:42.803
Duration                                 : 00:25:42.803
Overall bit rate mode                    : VBR
Overall bit rate mode                    : Variable
Overall bit rate                         : 1427111
Overall bit rate                         : 1 427 kb/s
Stream size                              : 0
Stream size                              : 0.00 Byte (0%)
Stream size                              :  Byte0
Stream size                              : 0.0 Byte
Stream size                              : 0.00 Byte
Stream size                              : 0.000 Byte
Stream size                              : 0.00 Byte (0%)
Proportion of this stream                : 0.00000
File last modification date              : UTC 2021-01-12 23:25:16
File last modification date (local)      : 2021-01-12 18:25:16

Audio
Count                                    : 280
Count of stream of this kind             : 1
Kind of stream                           : Audio
Kind of stream                           : Audio
Stream identifier                        : 0
Format                                   : FLAC
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Format/Url                               : https://xiph.org/flac/
Commercial name                          : FLAC
Internet media type                      : audio/x-flac
Duration                                 : 1542803
Duration                                 : 25 min 42 s
Duration                                 : 25 min 42 s 803 ms
Duration                                 : 25 min 42 s
Duration                                 : 00:25:42.803
Duration                                 : 00:25:42.803
Bit rate mode                            : VBR
Bit rate mode                            : Variable
Bit rate                                 : 1426752
Bit rate                                 : 1 427 kb/s
Channel(s)                               : 1
Channel(s)                               : 1 channel
Channel positions                        : Front: C
Channel positions                        : 1/0/0
Channel layout                           : C
Sampling rate                            : 96000
Sampling rate                            : 96.0 kHz
Samples count                            : 148109088
Bit depth                                : 24
Bit depth                                : 24 bits
Compression mode                         : Lossless
Compression mode                         : Lossless
Stream size                              : 275149593
Stream size                              : 262 MiB (100%)
Stream size                              : 262 MiB
Stream size                              : 262 MiB
Stream size                              : 262 MiB
Stream size                              : 262.4 MiB
Stream size                              : 262 MiB (100%)
Proportion of this stream                : 0.99975
Writing library                          : reference libFLAC 1.3.3 20190804
Writing library                          : libFLAC 1.3.3 (UTC 2019-08-04)
Encoded_Library_Name                     : libFLAC
Encoded_Library_Version                  : 1.3.3
Encoded_Library_Date                     : UTC 2019-08-04


MY-LPAMI-056430:~ benjaminturkus$ 
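
The Mediainfo report above only shows a file modification date of 2021-01-12. A minimal sketch for listing the dates the filesystem itself reports for a file, to compare against the "From file" value in the warning. This does not reproduce how validate_ami_bags derives its date, which is not shown here; `filesystem_dates` is a hypothetical helper.

```python
import datetime
import os


def filesystem_dates(path):
    """Return UTC calendar dates for the timestamps the OS reports for path."""
    info = os.stat(path)
    stamps = {"modified": info.st_mtime, "changed_or_created": info.st_ctime}
    # st_birthtime is only available on some platforms, such as macOS.
    if hasattr(info, "st_birthtime"):
        stamps["birth"] = info.st_birthtime
    return {
        label: datetime.datetime.fromtimestamp(
            stamp, datetime.timezone.utc
        ).date().isoformat()
        for label, stamp in stamps.items()
    }
```

If one of these dates matches the "From file" value, the mismatch may come from a filesystem timestamp that changed when the file was copied, rather than from anything embedded in the file.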

software dependencies

validate_ami_bags.py requires the modules xlrd and openpyxl. Add to README as prereq's or update setup.py to include these as well?

pamidb_to_json.py not in usr/local/bin

not sure I fully understand why, but I just cloned and installed ami-tools, and the new pamidb_to_json script didn't seem to make it to my usr/local/bin. am I missing something?

JSON bag rules don't allow "Images" directory?

2017-06-28 16:16:49,533 - ERROR - Error in AMI bag type: JSON bags may only have the following directories - {'ArchiveOriginals', 'PreservationMasters', 'ServiceCopies', 'EditMasters'}
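
A minimal sketch of the kind of check behind this message, using the four directory names from the error above. `ALLOWED_JSON_DIRS` and `unexpected_dirs` are hypothetical names; the real rule in ami-tools may be structured differently.

```python
# Directory names taken from the error message above.
ALLOWED_JSON_DIRS = {
    "ArchiveOriginals",
    "PreservationMasters",
    "ServiceCopies",
    "EditMasters",
}


def unexpected_dirs(found_dirs, allowed=ALLOWED_JSON_DIRS):
    """Return the directory names that the allowed set does not cover."""
    return sorted(set(found_dirs) - set(allowed))
```

Under this sketch, `unexpected_dirs(["PreservationMasters", "Images"])` flags "Images", while passing `allowed=ALLOWED_JSON_DIRS | {"Images"}` would accept it.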

undetermined installation issue

not sure what's causing this...help!

NYPLPAMIC04sPro:~ nypl_pami_c04$ pamidb_to_json.py -i /Users/nypl_pami_c04/Documents/JSON_Export/2019_003_pami_090_wilson2.mer -o /Volumes/NYPL_15968/2019_003_pami_090_wilson2/PreservationMasters
/usr/local/bin/pamidb_to_json.py: line 1: import: command not found
/usr/local/bin/pamidb_to_json.py: line 2: import: command not found
/usr/local/bin/pamidb_to_json.py: line 3: import: command not found
/usr/local/bin/pamidb_to_json.py: line 5: dtypes: command not found
/usr/local/bin/pamidb_to_json.py: line 6: digitizationProcess.analogDigitalConverter.serialNumber:: command not found
/usr/local/bin/pamidb_to_json.py: line 7: digitizationProcess.captureSoftware.version:: command not found
/usr/local/bin/pamidb_to_json.py: line 8: digitizationProcess.playbackDevice.serialNumber:: command not found
/usr/local/bin/pamidb_to_json.py: line 9: digitizationProcess.timeBaseCorrector.serialNumber:: command not found
/usr/local/bin/pamidb_to_json.py: line 10: digitizationProcess.phonoPreamp.serialNumber:: command not found
/usr/local/bin/pamidb_to_json.py: line 11: physicalDescription.properties.stockProductID:: command not found
/usr/local/bin/pamidb_to_json.py: line 12: syntax error near unexpected token `}'
/usr/local/bin/pamidb_to_json.py: line 12: `}'
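
Messages of the form "import: command not found" come from a shell, which suggests the installed file was run as a shell script rather than by Python. Line 1 is reported as an import statement, so the installed copy apparently has no `#!` interpreter line. A minimal sketch for checking that, using a hypothetical helper `has_python_shebang`:

```python
def has_python_shebang(path):
    """Return True if the first line is an interpreter line naming python."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    return first_line.startswith(b"#!") and b"python" in first_line
```

If the check returns False for the installed script, running it explicitly with python3, or adding an interpreter line to the script and reinstalling, would be the likely workarounds.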

Reverse dtypes (str-->int) for pamidb_to_json

Hey nick,

Could you add something to the pamidb script that'll force values to be integers? I'm having recurring weirdness (I think with pandas...) that I've been unable to resolve within the DB, and it'd be nice to just hardcode a solution.

Thinking specifically about digitizationProcess.speed.measure and source.designatedSpeed.measure

Thanks!
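
A minimal sketch of one way to do this without pandas, assuming each record is a flat dict keyed by the dotted column names from the merge file. Whole numbers such as "15.0" become 15, while fractional values such as "7.5" are left alone so no information is lost. `to_int_if_whole` and `coerce_speed_fields` are hypothetical helpers.

```python
# The two fields named in this issue.
SPEED_FIELDS = (
    "digitizationProcess.speed.measure",
    "source.designatedSpeed.measure",
)


def to_int_if_whole(value):
    """Convert values such as "15.0" or 15.0 to 15; leave anything else alone."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return value
    return int(number) if number.is_integer() else value


def coerce_speed_fields(record, fields=SPEED_FIELDS):
    """Return a copy of a flat record with whole-number speeds as ints."""
    cleaned = dict(record)
    for field in fields:
        if field in cleaned:
            cleaned[field] = to_int_if_whole(cleaned[field])
    return cleaned
```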

pamidb_to_json.py: str coercion error w/ mult. digitizers

Hi,
I just noticed that the pamidb script fails when a mer export has multiple digitizers (a rare, but not totally uncommon occurrence). Error reads as follows:

pamis-iMac:~ pamiaudio$ pamidb_to_json.py -i /Users/pamiaudio/Desktop/scm2451.mer -o /Users/pamiaudio/Desktop/scm2451 
/usr/local/lib/python3.6/site-packages/ami_md/ami_json.py:4: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import Timestamp
Traceback (most recent call last):
  File "/Users/pamiaudio/Desktop/ami-tools/bin/pamidb_to_json.py", line 46, in <module>
    main()
  File "/Users/pamiaudio/Desktop/ami-tools/bin/pamidb_to_json.py", line 41, in main
    schema_version = args.schema)
  File "/usr/local/lib/python3.6/site-packages/ami_md/ami_json.py", line 71, in __init__
    self.coerce_strings()
  File "/usr/local/lib/python3.6/site-packages/ami_md/ami_json.py", line 163, in coerce_strings
    for key, item in self.dict["digitizer"]["organization"]["address"].items():
KeyError: 'digitizer'

I'll share a representative mer via email.
Thanks!
Ben
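
A minimal sketch of guarding the lookup that raises the KeyError, so records without a single "digitizer" block are skipped instead of crashing. The traceback shows only the loop header, so the string conversion inside the loop is an assumption, as are the helper names `get_nested` and `coerce_address_strings`.

```python
def get_nested(data, *keys, default=None):
    """Walk nested dicts by key, returning default if any level is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


def coerce_address_strings(record):
    """Stringify digitizer address values in place, if that block exists."""
    address = get_nested(record, "digitizer", "organization", "address")
    if isinstance(address, dict):
        for key, item in address.items():
            address[key] = str(item)
    return record
```

This only avoids the crash. How multiple digitizers should be represented in the output is a separate question that the sketch does not answer.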

pamidb_to_json.py : utf-8 UnicodeDecodeError

Hi,

I've got another kind of unusual pamidb_to_json issue: when trying to convert a mer, I'm getting the following error:

pamis-iMac:~ pamiaudio$ pamidb_to_json.py -i /Users/pamiaudio/Desktop/scm2144.mer -o /Users/pamiaudio/Desktop/scm2144 
/usr/local/lib/python3.6/site-packages/ami_md/ami_json.py:4: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import Timestamp
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1162, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:14858)
  File "pandas/_libs/parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas/_libs/parsers.c:17119)
  File "pandas/_libs/parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas/_libs/parsers.c:17347)
  File "pandas/_libs/parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas/_libs/parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 66: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pamiaudio/Desktop/ami-tools/bin/pamidb_to_json.py", line 46, in <module>
    main()
  File "/Users/pamiaudio/Desktop/ami-tools/bin/pamidb_to_json.py", line 33, in main
    md = pd.read_csv(args.input, dtype = dtypes)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
  File "pandas/_libs/parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:12175)
  File "pandas/_libs/parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas/_libs/parsers.c:14136)
  File "pandas/_libs/parsers.pyx", line 1169, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:14972)
  File "pandas/_libs/parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas/_libs/parsers.c:17119)
  File "pandas/_libs/parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas/_libs/parsers.c:17347)
  File "pandas/_libs/parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas/_libs/parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 66: invalid continuation byte

I can't figure out exactly what's different about these records (I thought for a minute the issue was due to the way I duped PM records to make EM records, exporting out of Filemaker, opening in LibreOffice, changing PMs to EMs, and re-importing into Filemaker, but I don't believe this has been an issue in the past). I'll happily share the offending mer...

Thanks!

Ben
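
The error means the file contains a byte (0xd0) that is not valid UTF-8 at that position, so the mer is probably in another encoding. A minimal sketch of a fallback reader, assuming Mac OS Roman as the second guess because the export round-tripped through desktop software on a Mac. That guess is unconfirmed, and `read_text_with_fallback` is a hypothetical helper.

```python
def read_text_with_fallback(path, encodings=("utf-8", "mac_roman")):
    """Decode a file with the first encoding that works.

    Returns (text, encoding_used).
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

In Mac OS Roman the byte 0xd0 is an en dash, which would fit a dash typed or pasted into a text field. Once the encoding is known, `pandas.read_csv` accepts it through its `encoding` argument.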

better error messages for validate_ami_bags.py

Suggestions:

  1. for complex bags, instead of 'invalid', give 'warning' to differentiate from more critical errors.
  2. for 'ready' vs. 'not ready' for ingest, give percentage or fraction ("1450 of 1500 bags are ready for ingest")
    2b. this tool is meant to evaluate 'ingestability' - so complex bags are flagged as 'not ready for ingest'. For QC, this makes it hard to pinpoint actual fixity/structure failures in the giant haystack of 'not ready for ingest' bags, as there are many complex bags. Can complex bags be their own category in the report (i.e. 'too complex for ingest')?
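
A minimal sketch of a summary that reports the fraction and keeps complex bags in their own category, assuming each bag has already been assigned one status. The status labels "ready", "complex" and "failed" and the helper `summarize_statuses` are hypothetical.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Build summary lines from a dict mapping bag name to a status label."""
    counts = Counter(statuses.values())
    lines = [f"{counts['ready']} of {len(statuses)} bags are ready for ingest"]
    categories = (
        ("complex", "too complex for ingest"),
        ("failed", "failed validation"),
    )
    for label, heading in categories:
        names = sorted(name for name, status in statuses.items() if status == label)
        if names:
            lines.append(f"{len(names)} {heading}: {', '.join(names)}")
    return lines
```

Keeping "failed" separate from "complex" lets QC read the fixity and structure failures directly instead of searching for them among the complex bags.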
