cdparacord's People

Contributors

fennekki

Forkers

thwaller

cdparacord's Issues

Albumartist omitted on multi-artist album where artist matches albumartist

When ripping a CD that has tracks whose artist tag matches the albumartist tag, and the setting to always tag albumartist is set to false, whether an albumartist tag is generated is decided per track, based on whether the two values would be the same.

This logic works for single-artist-albums where no albumartist is wanted for whatever reason, but on multi-artist albums it means the tracks with artist == albumartist will not have albumartist tags created. While this is not a problem in all software, I personally use some pathological cases that actually treat the parts of the album with albumartist tagged, and the parts without, as two separate albums with the same name from the same albumartist. This needs to stop.
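One possible fix is to make the decision once per album instead of once per track. The following is a minimal sketch under that assumption; the function names and the dictionary layout are hypothetical and are not taken from the cdparacord code.

```python
def needs_albumartist(album_artist, track_artists, always_tag=False):
    """Decide once per album whether albumartist tags are written."""
    if always_tag:
        return True
    # If even one track artist differs from the album artist, every
    # track gets the tag so players see a single album.
    return any(artist != album_artist for artist in track_artists)


def build_tags(album_artist, tracks, always_tag=False):
    """Return one tag dictionary per track (hypothetical layout)."""
    tag_all = needs_albumartist(
        album_artist, [track['artist'] for track in tracks], always_tag)
    tagged = []
    for track in tracks:
        tags = {'artist': track['artist'], 'title': track['title']}
        if tag_all:
            tags['albumartist'] = album_artist
        tagged.append(tags)
    return tagged
```

With this approach a single-artist album still gets no albumartist tags when the setting is false, while a multi-artist album gets them on every track.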

Allow disabling/configuring tagging

This is a key issue with making the encoder configurable: It will prove difficult to encode flac or ogg if the software is hardcoded to use MP3 tools. Other file formats need their metadata too, though, and this should be facilitated perhaps with extended use of Mutagen.

Concepts

  • configurable output format
  • match on output format, tag only if it is recognised (Mutagen supports basically everything)
  • if no match is found, write tagging information somewhere accessible for custom scripts

Currently the code uses Mutagen's abstraction to guess what the output file is (though it explicitly tries EasyID3 first - I assume this will fail on non-ID3 files so it should be OK?) after which it generates a generic tag object. I believe I should be using a format that works for all kinds of outputs, but this needs to be checked.

We might have to do a manual filename test, but it's possible that Mutagen is good enough to guess correctly most of the time based on the file.
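The "match on output format, otherwise write the information somewhere accessible" concept could look roughly like the sketch below. It uses only the standard library: the `taggers` mapping, the function name and the `.tags.json` sidecar suffix are all assumptions made for illustration. In practice each callable in the mapping might wrap Mutagen.

```python
import json
import os


def tag_or_dump(path, tags, taggers):
    """Tag path if its extension is recognised, else write a sidecar.

    taggers maps a lowercase extension such as '.mp3' to a callable
    taking (path, tags). Returns the sidecar path if one was written,
    otherwise None.
    """
    extension = os.path.splitext(path)[1].lower()
    tagger = taggers.get(extension)
    if tagger is not None:
        tagger(path, tags)
        return None
    # Unknown format: leave the data next to the file so a custom
    # script can pick it up.
    sidecar = '{}.tags.json'.format(path)
    with open(sidecar, 'w', encoding='utf-8') as handle:
        json.dump(tags, handle, ensure_ascii=False, indent=2)
    return sidecar
```

Matching on the extension is the "manual filename test" variant. Letting the tagging library inspect the file contents would replace the lookup in the first two lines of the function body.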

The existence of cdparanoia is checked relatively late.

This can cause issues related to other bugs where you lose data due to cdparacord crashing.

However, it feels like it might require significant restructuring of the code? If so, this might be postponed after some investigation.
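If restructuring turns out to be too invasive, a cheap interim measure might be to check for the external programs before anything else happens. This is a sketch only; the exception class and function name are hypothetical, and `shutil.which` is used as the lookup.

```python
import shutil


class MissingDependencyError(Exception):
    """Raised when a required external program is not on PATH."""


def check_dependencies(programs=('cdparanoia',), which=shutil.which):
    """Fail early, before any user input or ripping has happened."""
    missing = [name for name in programs if which(name) is None]
    if missing:
        raise MissingDependencyError(
            'Required programs not found: {}'.format(', '.join(missing)))
```

Calling `check_dependencies()` at the very start of the program would mean no data has been entered yet when the failure is reported. The `which` parameter exists only so the check can be unit tested without touching the real PATH.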

Create issue template

This issue template should note that you need to add all your stuff to the continuous backlog project, and label it correctly.

INVESTIGATE: Under Python 3.7, a branch is missed

Under Python 3.7, coverage 4.5.1, pytest 3.10.0:
A branch that only contains "continue" in rip.py is missed in tests. Adding any operation to this branch (inside the if body) seems to fix this issue.

Assumption: pytest or coverage is faulty in that they don't recognise the "continue" is effectively reached.
Alternate assumption: Python has a fault where it does in fact eliminate the continue.

Current solution: Recommending use of python 3.6, waiting for fixes in testing tools, not considering this a bug in cdparacord.

Investigate if possible
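For reference, the affected code has roughly the following shape. This is a hypothetical reduction, not the actual code from rip.py, and it is not confirmed to reproduce the problem.

```python
def tracks_to_rip(tracks, already_ripped):
    """Return the tracks that still need ripping."""
    pending = []
    for track in tracks:
        if track in already_ripped:
            # The body of this branch is only a continue, which is
            # the pattern reported as missed.
            continue
        pending.append(track)
    return pending
```

A test that passes at least one already-ripped track does reach the branch, so a "missed" report for it would point at the tooling or the interpreter rather than at the test.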

Write a pull request template

This template should contain some information about how pull requests should be structured, what kinds of test coverage they require etc.

Unify coding conventions

Generally

  • Follow PEP 8 if you're looking for guidance, but don't sweat it.

Comments

  • Avoid inline comments
  • Format comments to 72 characters wide

Docstrings

  • Write docstrings for all functions, but only if you can come up with a useful description of them.
  • Use """triple quoted heredoc""" for docstrings.
  • Follow PEP 257
  • Format to 72 characters wide

Strings

  • Use 'single quotes' for one-line strings. Use """double quote heredoc""" for docstrings. Do not use other quoting styles unless there is a syntactical necessity to do so. (UPDATE 2019-08-05: This might be changed to "double quotes every time" at some later time because for some reason I physically can't stop myself from always using double quotes by default.)
  • Use string.format instead of f"formatted string literals". The latter allow variable interpolation in strings without a method call but this software started with string.format and should keep with it for consistency. If these are to be changed, they will all be changed at once.
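A short illustration of the docstring and string conventions above; the function itself is made up for the example.

```python
def describe_track(number, title):
    """Return a one-line description of a track."""
    return 'Track {}: {}'.format(number, title)
```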

Testing

  • Test your code with an actual CD, and without a CD in the drive. This is a program to rip CDs. If it does that, everything's pretty good.
  • Mark lines that do not need to be coverage tested as such. Note that very few lines require this, and if you're not sure, it's better to have the line marked uncovered.
  • Try to put code in functions and classes that can be unit tested, but don't contort it to fit that. Unit tests provide automatic code coverage.
  • If unit tests cannot be provided eg. where two modules are too intertwined to be separately tested, provide a hybrid test. Note that these might require hacks like forcing re-import to work properly (see tests/test_config.py)
  • Covering code paths is good, but don't get stuck. Prioritise testing the actual expected success and failure paths, and add tests as issues are discovered.
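As an example of marking code as exempt from coverage, the sketch below assumes coverage.py's default exclusion marker, `# pragma: no cover`. The project's actual coverage configuration may use a different marker. Both functions are hypothetical.

```python
def default_device():  # pragma: no cover
    """Return the CD device path, which depends on the machine."""
    return '/dev/cdrom'


def describe_device(device=None):
    """Return a status line naming the device that will be used."""
    if device is None:
        device = default_device()
    return 'Ripping from {}'.format(device)
```

Placing the marker on the `def` line excludes the whole function, which keeps the exemption limited to the part that cannot be tested without real hardware.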

Add branch coverage measurements

Add branch coverage (if our test system supports it) and then also ensure that we have total branch coverage in addition to total line coverage.

Filter a drastically higher amount of characters in filenames

Ideally, something like this

  • possibly: Put individual non-letter-non-number-non-whatever symbols through unidecode to match them to ASCII punctuation
  • Put it through the current punctuation filter (and expand upon the current punctuation filter)
  • discard everything that isn't specifically allowed punctuation, or written letters

This should make it easier to copy files between platforms. If #5 is fixed, we could additionally make this configurable so you could still have correct filenames (though in my personal opinion filenames are only really useful on the command line, and the tags are what you'll be looking at; this means you want filenames to be typeable instead of correct).

Primarily we want to reduce the characters to a set that is actually typeable on a keyboard. For this purpose minute marks (such as seen on the eponymous album by Franz Ferdinand) and other look-typeable-but-aren't symbols are rather problematic.

Secondarily, we want to support filesystems like FAT32, which are still in common use on memory cards and the like, and have a much more limited character set, and operating systems where the symbols disallowed in filenames constitute a much larger set (see eg. this). If we limit ourselves to typeable punctuation minus those characters we should be much better off, even when dealing with Unicode.
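The filtering steps could be sketched as follows. The small `PUNCTUATION_MAP` table is a stand-in for what unidecode would provide, and both it and the `ALLOWED_PUNCTUATION` set are illustrative assumptions rather than a proposed final list. The allowed set deliberately leaves out the characters FAT32 and Windows reject (`\ / : * ? " < > |`).

```python
# Stand-in for unidecode: map look-alike symbols to ASCII punctuation.
PUNCTUATION_MAP = {
    '\u2032': "'",  # prime, used as a minute mark
    '\u2018': "'",  # left single quotation mark
    '\u2019': "'",  # right single quotation mark
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
}

# Hypothetical whitelist of typeable, filesystem-safe punctuation.
ALLOWED_PUNCTUATION = frozenset(" -_.,()&'!+")


def sanitise_filename(name):
    """Keep letters, digits and explicitly allowed punctuation."""
    kept = []
    for char in name:
        if char.isalnum():
            # Written letters and digits pass through unchanged.
            kept.append(char)
            continue
        char = PUNCTUATION_MAP.get(char, char)
        if char in ALLOWED_PUNCTUATION:
            kept.append(char)
    return ''.join(kept)
```

For example, `sanitise_filename('40\u2032')` gives `40'`, and the slash, colon and question mark in `'AC/DC: Live?'` are discarded. Non-ASCII letters are kept here because only symbols are meant to go through the ASCII mapping.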

Refactor test_config.py

It was the first test file created and has massive interdependencies with xdg.py that I don't think it needs. See if that can be fixed. A chief issue is that importing xdg will crash unless the test environment is specially set up, but that should be possible to work around by somehow mocking the entire xdg module, whose contents are very small.

Tagging data is lost if ripping fails

If something fails after you enter disc information in the text editor (for example, you input data in an incorrect format, or you terminate the program after entering the tag data), you lose the data you entered. You can manually save it elsewhere to mitigate this.

Perhaps there should be some place where the data is stored? If resuming is implemented, this is an obvious necessity.
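One way to store the data is to write it to a per-disc file as soon as the editor closes, and to look for that file on the next run. This is a minimal sketch: the JSON format, the function names and the idea of keying the file by disc ID are assumptions.

```python
import json
import os


def save_albumdata(albumdata, cache_dir, discid):
    """Write the edited album data to disk and return the file path."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, '{}.json'.format(discid))
    temporary = '{}.tmp'.format(path)
    with open(temporary, 'w', encoding='utf-8') as handle:
        json.dump(albumdata, handle, ensure_ascii=False, indent=2)
    # Rename last so a crash mid-write never leaves a truncated file.
    os.replace(temporary, path)
    return path


def load_albumdata(cache_dir, discid):
    """Return previously saved album data, or None if there is none."""
    path = os.path.join(cache_dir, '{}.json'.format(discid))
    try:
        with open(path, encoding='utf-8') as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

The same file could later serve as the starting point for resuming an interrupted rip.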

Add customisable post-rip operations

Depends on #5

For instance, running mp3gain (or preferably an equivalent; mp3gain has a lot of problems and hasn't been maintained for what I think is close to a decade) on all the files after they've been ripped and copied to the target directory.

Different possible actions

  • file-post-rip: Run something on an individual file, before encoding. Would complicate the "pipeline" as it'd add more dependencies before we can actually encode each track, but doable. Also, if we keep track of the rip/encode status (#8) this could be integrated into that.
  • file-post-encode: Same as earlier but after encode. If we implement #10 this should be relatively easy. Also, it'd only be run after encode, so it wouldn't interfere with the parts we already know work.
  • file-post-copy: Same, except after the file is copied to the target directory. Currently they're directly encoded to the target directory, but this would be a thing after #10.
  • album-post-rip: Bad idea. Stalls encoding until after everything is ripped. While that isn't actually a catastrophe in and of itself, it would significantly increase rip times as currently the effective total time to completion is "rip time + encode time for the last file" because usually it seems encoding a track is somewhat faster than ripping one, and we can run them in parallel. Therefore, the net contribution of encoding to the process is relatively small. This would increase that contribution to the full time of encoding each file. On the other hand, if you really need to apply some transformation to the PCM before it is encoded, this is necessary anyway.
  • album-post-encode: Actually a good idea: Before copying the files to the target directory (pending #10) run something on all of them (or some of them: These actions would ideally give you the filenames but you could still just choose to run something completely different)
  • album-post-copy: If the final file layout needs to be changed somehow, or permissions need to be changed on the files and you copied them over filesystem boundaries or did something else that ended up not preserving file permissions.

Proposed functionality

  • Some way of configuring actions. YAML or raw python configuration files would be suitable.
  • Each action should be a list of programs to run (directly, not in a shell) after the action has been taken.
  • In the command string, there should be a way to specify placeholders for:
    • All arguments (run command once for all filenames)
    • Single filename (run command separately for each filename)
    • Optionally, the tasks could be objects,
      tasks:
          "chmod":
              args:
                  - "u+x"
                  - placeholder.ALLFILES
              run: "once-per-file"
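The placeholder expansion could be sketched as below. The `ALLFILES` and `ONEFILE` sentinels and both function names are hypothetical, and the commands are run directly through `subprocess.run` with an argument list rather than through a shell.

```python
import subprocess

# Hypothetical placeholder objects a configuration could refer to.
ALLFILES = object()
ONEFILE = object()


def expand_command(program, args, filenames):
    """Return the list of argv lists to run for one configured task."""
    has_all = any(arg is ALLFILES for arg in args)
    has_one = any(arg is ONEFILE for arg in args)
    if has_all and has_one:
        raise ValueError('Cannot mix ALLFILES and ONEFILE in one task')
    if has_one:
        # Run the command separately for each filename.
        return [
            [program] + [name if arg is ONEFILE else arg for arg in args]
            for name in filenames
        ]
    # Run the command once, with every filename spliced in.
    argv = [program]
    for arg in args:
        if arg is ALLFILES:
            argv.extend(filenames)
        else:
            argv.append(arg)
    return [argv]


def run_task(program, args, filenames):
    """Run a task directly, not in a shell, stopping on failure."""
    for argv in expand_command(program, args, filenames):
        subprocess.run(argv, check=True)
```

With two files, `expand_command('chmod', ['u+x', ALLFILES], files)` yields a single command, while using `ONEFILE` instead yields one command per file.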

For multi-cd releases, only the first CD is considered when fetching sources

For example, the Outkast double album Speakerboxxx/The Love Below contains two CDs (labeled, appropriately enough, Speakerboxxx and The Love Below). The first CD has 19 tracks and the second one 20.

If you try ripping the first CD, everything works out fine and MusicBrainz data is correctly fetched. However, if you're ripping the second CD, you get this warning:

Warning: Source MusicBrainz dropped for wrong track count (Got 19, 20 expected)

This seems to suggest only the first CD is considered when fetching sources, making it currently impossible to get MusicBrainz data properly for multi-CD releases.
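A possible direction is to look at every medium of a release instead of only the first. The sketch below assumes a simplified release dictionary with `media` and `tracks` keys; these names are made up for the example and are not the layout returned by any particular MusicBrainz client.

```python
def find_matching_medium(release, track_count):
    """Return the first medium with the expected track count, or None."""
    for medium in release.get('media', []):
        if len(medium.get('tracks', [])) == track_count:
            return medium
    return None
```

For the example above, a release whose media have 19 and 20 tracks would match the second medium when the disc in the drive has 20 tracks. Track count alone is ambiguous when two discs of a release are the same length, so matching on the disc ID would likely be more robust where that information is available.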

(Curses?) editor for albumdata

Instead of launching a text editor, integrate an album data editor into cdparacord. This album data editor should prefill data as the text editor does now, and it should have the following features:

  • At the top, the data pertaining to the entire album (album title, album artist, release date)
  • Beneath that, three columns
    • Song title
    • Song artist
    • Song target filename
  • Beneath that, "Abort" and "Rip" buttons (and possibly "Save and Abort")
  • There should be the following controls
    • Tab and Shift+Tab move sideways between columns
    • Up and Down move up and down to other columns
    • Enter/Return moves to the next column (left-to-right, top-to-bottom) and presses buttons
    • A hotkey to revert a column to its default value (Whatever value the editor initially loaded with)
  • POSSIBLY: The albumdata selection from musicbrainz could be folded into this interface somehow (dunno how)
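The focus movement described in the controls could be modelled independently of curses, which would also make it unit testable. The class below is a sketch only: its name is hypothetical, and wrapping around on Tab and stopping at the edges on Up and Down are assumptions the description does not specify.

```python
class GridCursor:
    """Track which cell of a rows x columns grid has focus."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns
        self.row = 0
        self.column = 0

    def tab(self, backwards=False):
        """Move sideways; wraps around within the row (assumption)."""
        step = -1 if backwards else 1
        self.column = (self.column + step) % self.columns

    def vertical(self, step):
        """Move up (-1) or down (+1), stopping at the edges."""
        self.row = min(max(self.row + step, 0), self.rows - 1)

    def enter(self):
        """Advance left-to-right, top-to-bottom.

        Returns False when already on the last cell, so the caller
        can move focus on to the buttons.
        """
        if self.column + 1 < self.columns:
            self.column += 1
        elif self.row + 1 < self.rows:
            self.row += 1
            self.column = 0
        else:
            return False
        return True
```

A curses or urwid front end would then only translate key presses into calls on this object and redraw the focused cell.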

Typing

Might require making stubs for dependencies

Could not find "parameters".

Under some quite unknown circumstances, with at least one specific album (lnTlccHq6F8XNcvNFafPUyaw1mA-), a fresh git clone fails to encode on the first track and crashes.

The script correctly rips track 1 and when it starts to encode there's an issue. After the rip...

Prints:

Could not find "parameters".
Can't init infile 'parameters'

And then raises:

File "cdparacord/rip.py", line 104, in _encode_track
    raise RipError('Failed to encode track {}'.format(track.filename))

Print discid submission url in the text

We could therefore print it only if the discid is not in musicbrainz, and with an additional message that states "terminate and restart the rip if you want to use the musicbrainz information"

New UI

My concept for the UI of this program has historically been pretty different from what it looks like now. It should get a new UI, maybe one done in curses or urwid or something.

Restructure project for future improvements

While I did consider the current state of the code an improvement over the original, the structure is kind of awful for extensions.

Investigate what could be done about refactoring, at the very least. If there's something definite, create new tickets and close this.

Actually add a way to get the MusicBrainz submission URL

#13 was supposed to add a submission URL to the data displayed to the user, and was apparently fixed at some point. However, it seems that since the big refactor (half a year ago!) it's again been impossible to get this URL easily. I have no idea when I last used it, which could explain this.

Make encoder customisable

Currently it's fixed to lame -V2 which probably isn't ideal for everyone.

Action plan:

  • Call the option encoder instead of lame
  • Add a placeholder (templatable with string.Template) for the source and target files in the params
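The second step of the plan might look like this, using `string.Template` from the standard library. The function name and the placeholder names `source` and `target` are assumptions.

```python
from string import Template


def build_encoder_command(encoder, params, source, target):
    """Return the argv list for encoding one track."""
    values = {'source': source, 'target': target}
    # substitute() raises KeyError for unknown placeholders, so typos
    # in the configuration are caught before the encoder is run.
    return [encoder] + [
        Template(param).substitute(values) for param in params
    ]
```

The current behaviour would correspond to an encoder of `lame` with params `['-V2', '${source}', '${target}']`. Substituting each argument separately, rather than one long command string, means filenames containing spaces need no quoting.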
