fennekki / cdparacord

2.0 2.0 1.0 281 KB

A quick and dirty cdparanoia wrapper

License: BSD 2-Clause "Simplified" License

Python 100.00%

cd musicbrainz

cdparacord's Issues

Make music directory and track template configurable.

Use string.Template
Some stuff already in config.py
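Since the issue suggests string.Template, here is a minimal sketch of how a configurable music directory and track template could be combined into a target path. The setting names (music_dir, track_template), their defaults and the placeholder names are hypothetical and are not taken from config.py.

```python
import os
from string import Template

# Hypothetical setting names and defaults; config.py may use others.
DEFAULT_CONFIG = {
    'music_dir': '~/Music',
    'track_template': '$albumartist/$album/$tracknumber - $title.mp3',
}


def track_path(config, tags):
    """Return the target path of one track.

    Template.substitute raises KeyError when the template names a
    placeholder that tags lacks, so a typo in the configured template
    fails loudly instead of producing an odd filename.
    """
    relative = Template(config['track_template']).substitute(tags)
    music_dir = os.path.expanduser(config['music_dir'])
    return os.path.join(music_dir, relative)
```

Tag values are substituted verbatim, so in practice they would still need to pass through the filename filter first.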

Albumartist omitted on multi-artist album where artist matches albumartist

When ripping a CD that has tracks with an artist tag matching the albumartist tag, and the setting to always tag albumartist is set to false, whether an albumartist tag is generated is decided per track, based on whether the two values would be the same.

This logic works for single-artist albums where no albumartist is wanted for whatever reason, but on multi-artist albums it means the tracks with artist == albumartist will not have albumartist tags created. While this is not a problem in all software, I personally use some pathological software that treats the parts of the album with albumartist tagged, and the parts without, as two separate albums with the same name from the same albumartist. This needs to stop.
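One way to fix this is to make the decision once per album instead of once per track. The sketch below assumes tracks are dicts with an 'artist' key. The function name and the always_tag_albumartist parameter are hypothetical stand-ins for the existing setting. The same flag also covers the request to always create albumartist tags.

```python
def albumartist_tags(tracks, albumartist, always_tag_albumartist=False):
    """Return copies of tracks, with albumartist set on all or none.

    The decision looks at every track at once, so on a multi-artist
    album the tracks whose artist equals the albumartist are tagged
    too, instead of being split off into a separate album.
    """
    multi_artist = any(t['artist'] != albumartist for t in tracks)
    tag_all = always_tag_albumartist or multi_artist
    result = []
    for track in tracks:
        tags = dict(track)
        if tag_all:
            tags['albumartist'] = albumartist
        result.append(tags)
    return result
```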

Make it possible to always create albumartist tags

Currently albumartist tags are only created when an album has several artists. Depends on #5 because this is currently the way I want it to be.

Improve/add test docstrings where missing/deficient

Several tests in e.g. test_albumdata.py are missing docstrings or have incomplete ones. This issue is completed when, at the time of closing it, all docstrings exist.

Allow disabling/configuring tagging

This is a key issue with making the encoder configurable: It will prove difficult to encode flac or ogg if the software is hardcoded to use MP3 tools. Other file formats need their metadata too, though, and this should be facilitated perhaps with extended use of Mutagen.

Concepts

configurable output format
match on output format, tag only if it is recognised (Mutagen supports basically everything)
if no match is found, write tagging information somewhere accessible for custom scripts

Currently the code uses Mutagen's abstraction to guess what the output file is (though it explicitly tries EasyID3 first - I assume this will fail on non-ID3 files so it should be OK?) after which it generates a generic tag object. I believe I should be using a format that works for all kinds of outputs, but this needs to be checked.

We might have to test the filename manually, but it's possible that Mutagen is good enough to guess the format correctly most of the time based on the file itself?
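The three concepts above can be sketched like this. Mutagen guesses the format from the file. If it cannot, or if Mutagen is not installed, the tags are written to a JSON sidecar that custom scripts can read. mutagen.File(path, easy=True) is real Mutagen API and returns None for unrecognised files. The function name, the sidecar naming and the fallback on a missing Mutagen install are assumptions. Note that easy-style keys are format-specific (EasyID3 rejects keys it does not know), so the tag names still need checking per format.

```python
import json


def write_tags(path, tags):
    """Tag path with Mutagen if it recognises the format, else export.

    tags maps easy-style keys such as 'title' or 'artist' to strings.
    Returns True if the file itself was tagged, False if the tags went
    to a JSON sidecar file for custom scripts instead.
    """
    try:
        import mutagen
    except ImportError:
        audio = None
    else:
        # easy=True gives dict-like access with common key names;
        # mutagen.File returns None when no format matches the file.
        audio = mutagen.File(path, easy=True)

    if audio is not None:
        for key, value in tags.items():
            audio[key] = value
        audio.save()
        return True

    # Hypothetical sidecar naming: track01.raw -> track01.raw.tags.json
    with open(path + '.tags.json', 'w', encoding='utf-8') as f:
        json.dump(tags, f, ensure_ascii=False, indent=2)
    return False
```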

The existence of cdparanoia is checked relatively late.

This can cause issues related to other bugs where you lose data due to cdparacord crashing.

However, it feels like it might require significant restructuring of the code? If so, this might be postponed after some investigation.
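The check itself does not need much restructuring. It can be a single call at startup, before any album data is requested or directories are created. This is a minimal sketch using shutil.which. The function and exception names are hypothetical. Because shutil.which also accepts an explicit path, the same call would work if the cdparanoia location became configurable.

```python
import shutil


class MissingDependencyError(Exception):
    """Raised when a required external program cannot be found."""


def require_program(name):
    """Return the full path to an executable, or fail immediately.

    name may be a bare command such as 'cdparanoia', which is looked up
    on PATH, or an explicit path taken from the configuration.
    """
    path = shutil.which(name)
    if path is None:
        raise MissingDependencyError(
            'Could not find {}; is it installed?'.format(name))
    return path
```

Calling require_program('cdparanoia') (and the same for the encoder) as the first thing the program does means a missing binary is reported before the user has typed in any data that could be lost.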

Don't move files into target directory until everything has been ripped and encoded

This is to make resuming rips easier. Depends on #7
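A minimal sketch of the final step, assuming tracks are encoded inside the rip directory and moved only once all of them are done. The function name is hypothetical. The target directory is created here rather than when the rip starts.

```python
import os
import shutil


def move_into_target(ripdir, target_dir, filenames):
    """Move finished files from the rip directory into the target.

    Meant to be called only after every track has been ripped and
    encoded, so an interrupted rip leaves the target untouched and
    the partial results in ripdir for a later resume.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in filenames:
        destination = os.path.join(target_dir, name)
        shutil.move(os.path.join(ripdir, name), destination)
        moved.append(destination)
    return moved
```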

Create issue template

This issue template should note that you need to add all your stuff to the continuous backlog project, and label it correctly.

INVESTIGATE: Under Python 3.7, a branch is missed

Under Python 3.7, coverage 4.5.1, pytest 3.10.0:
A branch that only contains "continue" in rip.py is missed in tests. Adding any operation to this branch (inside the if body) seems to fix this issue.

Assumption: pytest or coverage is faulty in that they don't recognise the "continue" is effectively reached.
Alternate assumption: Python has a fault where it does in fact eliminate the continue

Current solution: Recommending use of Python 3.6, waiting for fixes in testing tools, not considering this a bug in cdparacord.

Investigate if possible

Write a pull request template

This template should contain some information about how pull requests should be structured, what kinds of test coverage they require etc.

Store tag data and ripping progress in rip directory

Ripping progress should be stored in a manner that doesn't allow for false positives (For instance: creating a file called track{:d}.done only when said track has been ripped and encoded). Depends on #7
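Here is a sketch of the marker-file scheme described above, using the track{:d}.done name from the issue. The helper function names are hypothetical. A marker is created only after encoding succeeds, so a crash can at worst leave a finished track unmarked (and ripped again), never falsely marked as done.

```python
import os


def done_marker(ripdir, track_number):
    """Return the path of the marker file for a finished track."""
    return os.path.join(ripdir, 'track{:d}.done'.format(track_number))


def mark_done(ripdir, track_number):
    """Record that a track has been ripped and encoded.

    Call this only after the encoder has exited successfully.
    """
    open(done_marker(ripdir, track_number), 'w').close()


def remaining_tracks(ripdir, track_count):
    """Return the track numbers that still need ripping."""
    return [n for n in range(1, track_count + 1)
            if not os.path.exists(done_marker(ripdir, n))]
```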

Unify coding conventions

Generally

Follow PEP 8 if you're looking for guidance, but don't sweat it.

Comments

Avoid inline comments
Format comments to 72 characters wide

Docstrings

Write docstrings for all functions, but only if you can come up with a useful description of them.
Use """triple quoted heredoc""" for docstrings.
Follow PEP 257
Format to 72 characters wide

Strings

Use 'single quotes' for one-line strings. Use """double quote heredoc""" for docstrings. Do not use other quoting styles unless there is a syntactical necessity to do so. (UPDATE 2019-08-05: This might be changed to "double quotes every time" at some later time because for some reason I physically can't stop myself from always using double quotes by default.)
Use str.format instead of f"formatted string literals". The latter allow variable interpolation in strings without a method call, but this software started with str.format and should keep with it for consistency. If these are to be changed, they will all be changed at once.

Testing

Test your code with an actual CD, and without a CD in the drive. This is a program to rip CDs. If it does that, everything's pretty good.
Mark lines that do not need to be coverage tested as such. Note that very few lines require this, and if you're not sure, it's better to have the line marked uncovered.
Try to put code in functions and classes that can be unit tested, but don't contort it to fit that. Unit tests provide automatic code coverage.
If unit tests cannot be provided, e.g. where two modules are too intertwined to be tested separately, provide a hybrid test. Note that these might require hacks like forcing a re-import to work properly (see tests/test_config.py).
Covering code paths is good, but don't get stuck. Prioritise testing the actual expected success and failure paths, and add tests as issues are discovered.
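As an illustration only, this small hypothetical function follows the conventions above: a PEP 257 docstring wrapped at 72 columns, single-quoted one-line strings, and str.format instead of f-strings. It also shows coverage.py's default exclusion comment, "# pragma: no cover", which is how a line is marked as not needing coverage. That pragma is the one place where an inline comment is unavoidable.

```python
def describe_track(number, title):
    """Return a short human-readable label for a track.

    The number is zero-padded to two digits so labels sort correctly.
    """
    return '{:02d}. {}'.format(number, title)


if __name__ == '__main__':  # pragma: no cover
    print(describe_track(1, 'Intro'))
```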

Crash when musicbrainz disc output lacks a date

Test case: Gorillaz - Demon Days

  File "cdparacord/albumdata.py", line 155, in _albumdata_from_disc
    albumdata['date'] = release['date']
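The crash comes from indexing a key that MusicBrainz does not always return. A minimal sketch of the fix reads optional fields with dict.get. The function below is a simplified, hypothetical stand-in for _albumdata_from_disc and handles only two fields.

```python
def albumdata_from_release(release):
    """Build album data from a MusicBrainz release dict.

    'date' is optional in MusicBrainz data, so a release without one
    gets an empty date for the user to fill in instead of a KeyError.
    """
    return {
        'title': release['title'],
        'date': release.get('date', ''),
    }
```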

Add branch coverage measurements

Add branch coverage (if our test system supports it) and then also ensure that we have total branch coverage in addition to total line coverage.

Check that target directory exists when starting to rip, but don't create it until we start moving the files in

Currently the existence of the target directory is only checked because we try to create it and fail if we can't. The issue with this is that it requires us to remove the directory if we want to restart a rip that has been terminated, whether or not anything has been ripped yet!
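One possible reading of the requested check is sketched below: at startup, only read the filesystem to look for files the rip would overwrite, and leave directory creation to the final move. The function name, and the choice to accept an existing target directory as long as nothing in it conflicts, are assumptions.

```python
import os


def check_target(target_dir, filenames):
    """Fail early if the rip would overwrite existing files.

    Nothing is created here, so an aborted rip leaves nothing behind
    that has to be removed before restarting.
    """
    conflicts = [name for name in filenames
                 if os.path.exists(os.path.join(target_dir, name))]
    if conflicts:
        raise FileExistsError(
            'Target files already exist: {}'.format(', '.join(conflicts)))
```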

Filter a drastically higher amount of characters in filenames

Ideally, something like this

possibly: Put individual non-letter-non-number-non-whatever symbols through unidecode to match them to ASCII punctuation
Put it through the current punctuation filter (and expand upon the current punctuation filter)
discard everything that isn't specifically allowed punctuation, or written letters

This should allow us to copy files between platforms more easily. If #5 is fixed, we could additionally make this configurable so you could still have correct filenames (though in my personal opinion filenames are only really useful on the command line, and the tags are what you'll be looking at; this means you want filenames to be typeable instead of correct).

Primarily we want to reduce the characters to a set that is actually typeable on a keyboard. For this purpose minute marks (such as seen on the eponymous album by Franz Ferdinand) and other look-typeable-but-aren't symbols are rather problematic.

Secondarily, we want to support filesystems like FAT32, which are still in common use on memory cards and the like and have a much more limited character set, as well as operating systems where the set of symbols disallowed in filenames is much larger (see e.g. this). If we limit ourselves to typeable punctuation minus those characters, we should be much better off, even when dealing with Unicode.
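Here is a sketch of the filter steps listed above. Because unidecode is a third-party package, this version uses a substitute: stdlib NFKC normalisation plus a small hand-written look-alike table. The allowed punctuation set is an assumption. It excludes the characters Windows rejects in filenames (" * / : < > ? \ |). Letters and digits from any script are kept, following "written letters" above.

```python
import re
import unicodedata

# Look-alike symbols mapped to typeable ASCII. A small hand-written
# stand-in for unidecode, not a complete table.
LOOKALIKES = str.maketrans({
    '\u2018': "'",
    '\u2019': "'",
    '\u201c': "'",
    '\u201d': "'",
    '\u2032': "'",
    '\u2013': '-',
    '\u2014': '-',
})

# Punctuation that survives; it leaves out " * / : < > ? \ |
ALLOWED_PUNCTUATION = " '-_.,()&!+[]"


def safe_filename(name):
    """Reduce name to letters, digits and allowed punctuation."""
    # NFKC folds compatibility forms such as fullwidth letters, and
    # turns a double prime into two primes, before the mapping.
    name = unicodedata.normalize('NFKC', name).translate(LOOKALIKES)
    kept = ''.join(ch for ch in name
                   if ch.isalnum() or ch in ALLOWED_PUNCTUATION)
    # Removed characters can leave runs of spaces; Windows also
    # rejects trailing dots and spaces.
    return re.sub(' {2,}', ' ', kept).strip(' .')
```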

Make ripping directory predictable based on discid and username

This would make it possible to detect if we already have data for this disc and to potentially continue a rip.
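A minimal sketch, assuming the rip directory lives under the system temporary directory. The directory naming scheme and the function name are hypothetical. With the user name in the path, users on a shared machine do not collide. With the disc ID in it, an existing directory signals an earlier, possibly unfinished, rip of the same disc.

```python
import getpass
import os
import tempfile


def ripdir_for(discid, user=None, base=None):
    """Return a rip directory path that is stable per disc and user."""
    if user is None:
        user = getpass.getuser()
    if base is None:
        base = tempfile.gettempdir()
    return os.path.join(base, 'cdparacord-{}-{}'.format(user, discid))
```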

Refactor test_config.py

It was the first test file created and has massive interdependencies with xdg.py that I don't think it needs. See if that can be fixed. A chief issue is that importing xdg will crash unless the test environment is specially set up, but that should be possible to work around by somehow mocking the entire xdg module, whose contents are very small.
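A sketch of mocking the whole module with unittest.mock.patch.dict on sys.modules, which restores sys.modules when the block exits. The attribute names XDG_CONFIG_HOME and XDG_CACHE_HOME are hypothetical and should mirror whatever xdg.py actually exports. The stub for the parent cdparacord package exists only so this sketch runs on its own. In the real test suite, patching the cdparacord.xdg entry (and forcing config to be re-imported) may be enough.

```python
import sys
import types
from unittest import mock


def fake_xdg_modules(config_home, cache_home):
    """Build stand-ins for cdparacord and cdparacord.xdg."""
    xdg = types.ModuleType('cdparacord.xdg')
    xdg.XDG_CONFIG_HOME = config_home
    xdg.XDG_CACHE_HOME = cache_home
    package = types.ModuleType('cdparacord')
    # An empty __path__ marks the stub as a package.
    package.__path__ = []
    package.xdg = xdg
    return {'cdparacord': package, 'cdparacord.xdg': xdg}


def fake_xdg(config_home, cache_home):
    """Return a context manager that swaps the fakes into sys.modules."""
    return mock.patch.dict(sys.modules,
                           fake_xdg_modules(config_home, cache_home))
```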

Make cdparanoia location configurable.

See config.py

No way to terminate cdparacord before it starts ripping

After cdparacord puts you in vim, there's no way to terminate cdparacord before it starts ripping other than inputting erroneous data (which you will subsequently lose). There should be a Y/N question instead.
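A minimal sketch of the Y/N question, asked after the editor exits. The function name is hypothetical. The prompt function is a parameter so the question can be unit tested without a terminal.

```python
def confirm(question, default=False, ask=input):
    """Ask a yes/no question and return the answer as a bool.

    An empty answer returns default; anything unrecognised asks again.
    """
    hint = '[Y/n]' if default else '[y/N]'
    while True:
        answer = ask('{} {} '.format(question, hint)).strip().lower()
        if not answer:
            return default
        if answer in ('y', 'yes'):
            return True
        if answer in ('n', 'no'):
            return False
```

If confirm('Start ripping?') returns False, the program can exit before anything is ripped. A default of "no" keeps an accidental Enter from starting the rip.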

Let the user disable MusicBrainz lookup

Since Click is being adopted in the refactor branch, this could be done with e.g. --musicbrainz/--no-musicbrainz, with the default being true.
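Click supports this directly with a boolean on/off flag, declared as '--musicbrainz/--no-musicbrainz' with default=True. This is a minimal sketch; the command name main and its body are hypothetical.

```python
import click


@click.command()
@click.option('--musicbrainz/--no-musicbrainz', default=True,
              help='Look up album data from MusicBrainz.')
def main(musicbrainz):
    """Rip a CD, optionally fetching album data from MusicBrainz."""
    click.echo('musicbrainz={}'.format(musicbrainz))
```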

Tagging data is lost if ripping fails

If something fails after entering disc information in a text editor, for example, you input data in an incorrect format or you terminate the program after you've input the tag data, you lose the data you inputted. You can manually save it elsewhere to mitigate.

Perhaps there should be some place where the data is stored? If resuming is implemented, this is an obvious necessity.
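One option is to save the editor contents verbatim to the rip directory before parsing them. That way, even input in an incorrect format survives and can be offered again on restart. The file name and function names below are hypothetical. The write goes to a temporary file first and is then renamed with os.replace, so a kill during the save does not leave a half-written file.

```python
import os


def _saved_path(ripdir):
    # Hypothetical file name inside the rip directory.
    return os.path.join(ripdir, 'albumdata.txt')


def save_edited_text(ripdir, text):
    """Save the editor contents verbatim, before parsing them."""
    path = _saved_path(ripdir)
    tmp = path + '.tmp'
    with open(tmp, 'w', encoding='utf-8') as f:
        f.write(text)
    os.replace(tmp, path)
    return path


def load_edited_text(ripdir):
    """Return previously saved editor contents, or None if absent."""
    try:
        with open(_saved_path(ripdir), encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        return None
```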

Add customisable post-rip operations

Depends on #5

For instance, running mp3gain (or preferably an equivalent; mp3gain has a lot of problems and hasn't been maintained for what I think is close to a decade) on all the files after they've been ripped and copied to the target directory.

Different possible actions

file-post-rip: Run something on an individual file, before encoding. Would complicate the "pipeline" as it'd add more dependencies before we can actually encode each track, but doable. Also, if we keep track of the rip/encode status (#8) this could be integrated into that.
file-post-encode: Same as earlier but after encode. If we implement #10 this should be relatively easy. Also, it'd only be run after encode, so it wouldn't interfere with the parts we already know work.
file-post-copy: Same, except after the file is copied to the target directory. Currently they're directly encoded to the target directory, but this would be a thing after #10.
album-post-rip: Bad idea. Stalls encoding until after everything is ripped. While that isn't actually a catastrophe in and of itself, it would significantly increase rip times: currently the effective total time to completion is "rip time + encode time for the last file", because encoding a track usually seems to be somewhat faster than ripping one, and we can run them in parallel. Therefore, the net contribution of encoding to the process is relatively small. This would increase that contribution to the full time of encoding each file. On the other hand, if you really need to apply some transformation to the PCM before it is encoded, this is necessary anyway.
album-post-encode: Actually a good idea: Before copying the files to the target directory (pending #10) run something on all of them (or some of them: These actions would ideally give you the filenames but you could still just choose to run something completely different)
album-post-copy: If the final file layout needs to be changed somehow, or permissions need to be changed on the files and you copied them over filesystem boundaries or did something else that ended up not preserving file permissions.

Proposed functionality

Some way of configuring actions. YAML or raw Python configuration files would be suitable.
Each action should be a list of programs to run (directly, not in a shell) after the action has been taken.
In the command string, there should be a way to specify placeholders for:
- All arguments (run command once for all filenames)
- Single filename (run command separately for each filename)
- Optionally, the tasks could be objects:
```
tasks:
    "chmod":
        args:
            - "u+x"
            - placeholder.ALLFILES
        run: "once-per-file"
```
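The placeholder expansion could be sketched like this. The ALL_FILES and FILE markers are hypothetical counterparts of placeholder.ALLFILES above. Commands run directly through subprocess.run, not in a shell. One design choice differs from the YAML sketch: there is no separate run key. The mode is inferred from which placeholder appears, so FILE means once per file and ALL_FILES means once for all files.

```python
import subprocess

# Hypothetical placeholder markers for configured argument lists.
ALL_FILES = object()
FILE = object()


def expand_action(args, filenames):
    """Turn one configured action into concrete argument lists."""
    if FILE in args:
        return [[name if a is FILE else a for a in args]
                for name in filenames]
    expanded = []
    for a in args:
        if a is ALL_FILES:
            expanded.extend(filenames)
        else:
            expanded.append(a)
    return [expanded]


def run_action(args, filenames):
    """Run an action without a shell, stopping on the first failure."""
    for argv in expand_action(args, filenames):
        subprocess.run(argv, check=True)
```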

For multi-cd releases, only the first CD is considered when fetching sources

For example, the Outkast double album Speakerboxxx/The Love Below contains two CDs (labeled, appropriately enough, Speakerboxxx and The Love Below). The first CD has 19 tracks and the second one 20.

If you try ripping the first CD, everything works out fine and MusicBrainz data is correctly fetched. However, if you're ripping the second CD, you get this warning:

Warning: Source MusicBrainz dropped for wrong track count (Got 19, 20 expected)

This seems to suggest only the first CD is considered when fetching sources, making it currently impossible to get MusicBrainz data properly for multi-CD releases.
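If that diagnosis is right, the fix is to choose the medium that contains the inserted disc instead of always using the first one. This sketch assumes release data shaped like the dictionaries musicbrainzngs returns for a disc ID lookup ('medium-list', each medium with 'disc-list' and 'track-list'). The key names should be checked against what cdparacord actually receives, and the function name is hypothetical.

```python
def medium_for_disc(release, discid):
    """Return the medium of release that lists the given disc ID.

    A multi-CD release has one medium per disc, each with its own
    track list, so the track count should be compared against this
    medium rather than the first one. Returns None if no medium
    lists the disc ID.
    """
    for medium in release.get('medium-list', []):
        for disc in medium.get('disc-list', []):
            if disc.get('id') == discid:
                return medium
    return None
```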

(Curses?) editor for albumdata

Instead of launching a text editor, integrate an album data editor into cdparacord. This album data editor should prefill data as the text editor does now, and it should have the following features:

At the top, the data pertaining to the entire album (album title, album artist, release date)
Beneath that, three columns
- Song title
- Song artist
- Song target filename
Beneath that, "Abort" and "Rip" buttons (and possibly "Save and Abort")
There should be the following controls
- Tab and Shift+Tab move sideways between columns
- Up and Down move up and down to other rows
- Enter/Return moves to the next column (left-to-right, top-to-bottom) and presses buttons
- A hotkey to revert a column to its default value (Whatever value the editor initially loaded with)
POSSIBLY: The albumdata selection from musicbrainz could be folded into this interface somehow (dunno how)

Update README to be better structured

Currently it mostly tries to explain the design decisions made here instead of how to actually use or develop the tool

Add command line options for most configuration options

Not all of them need to be available, but some of them seem like they could be useful.

Typing

Might require making stubs for dependencies

Could not find "parameters".

Under unknown circumstances, with at least one specific album (lnTlccHq6F8XNcvNFafPUyaw1mA-), a fresh git clone fails to encode the first track and crashes.

The script correctly rips track 1, but encoding it fails. After the rip, it prints:

Could not find "parameters".
Can't init infile 'parameters'

And then raises:

File "cdparacord/rip.py", line 104, in _encode_track
    raise RipError('Failed to encode track {}'.format(track.filename))

Print discid submission url in the text

We could print it only if the disc ID is not found in MusicBrainz, along with an additional message stating "terminate and restart the rip if you want to use the musicbrainz information".

Allow resuming rip

Add a question before editing album data for whether you want to resume or restart. Depends on #7 #8 #9 #10

Editor is fixed to vim

Should respect EDITOR, probably.
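Here is a minimal sketch that respects EDITOR and falls back to vim only when it is unset or empty. The function names are hypothetical. shlex.split allows EDITOR values that carry arguments, such as 'nano -w'.

```python
import os
import shlex
import subprocess


def editor_command(environ=None):
    """Return the user's preferred editor as an argument list."""
    if environ is None:
        environ = os.environ
    return shlex.split(environ.get('EDITOR') or 'vim')


def edit_file(path):
    """Open path in the editor and return its exit status."""
    return subprocess.call(editor_command() + [path])
```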

New UI

My concept for the UI of this program has historically been pretty different from what it looks like now. It should get a new UI, maybe one done in curses or urwid or something.

Allow restarting rip with existing album data input

Specifically, ask before we would fetch from musicbrainz, if data already exists.

Depends on #8

Nothing is currently configurable

There should be a configuration file of some kind. $XDG_CONFIG_HOME/cdparacord/config, maybe.
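Resolving that path per the XDG Base Directory specification, where XDG_CONFIG_HOME defaults to $HOME/.config when it is unset or empty, could look like this. This is a sketch; the function name is hypothetical, and the project's existing xdg.py may already cover part of it.

```python
import os


def config_path(environ=None):
    """Return the path of the cdparacord configuration file."""
    if environ is None:
        environ = os.environ
    config_home = environ.get('XDG_CONFIG_HOME') or os.path.join(
        os.path.expanduser('~'), '.config')
    return os.path.join(config_home, 'cdparacord', 'config')
```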

Restructure project for future improvements

While I did consider the current state of the code an improvement over the original, the structure is kind of awful for extensions.

Investigate what could be done about refactoring, at the very least. If there's something definite, create new tickets and close this.

Actually add a way to get the MusicBrainz submission URL

#13 was supposed to add a submission URL to the data displayed to the user, and was apparently fixed at some point. However, it seems that since the big refactor (half a year ago!) it's again been impossible to get this URL easily. I have no idea when I last used it, which could explain this.

Make encoder customisable

Currently it's fixed to lame -V2 which probably isn't ideal for everyone.

Action plan:

Call the option encoder instead of lame
Add a placeholder (templatable with string.Template) for the source and target files in the params
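Here is a sketch of the action plan. The setting layout and the $in/$out placeholder names are hypothetical. The default reproduces the current lame -V2 behaviour. Each parameter is substituted on its own, so a filename containing spaces stays a single argument and no shell quoting is needed.

```python
from string import Template

# Hypothetical default matching the current hardcoded encoder.
DEFAULT_ENCODER = {
    'encoder': 'lame',
    'params': ['-V2', '$in', '$out'],
}


def encoder_argv(config, source, target):
    """Build the encoder command line for one track."""
    mapping = {'in': source, 'out': target}
    return [config['encoder']] + [
        Template(param).substitute(mapping) for param in config['params']]
```

With this layout, a FLAC setup would be something like {'encoder': 'flac', 'params': ['-o', '$out', '$in']}.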

Add setting to not delete ripdir after the rip is done

This might be useful for someone, or debugging.