
plex_dupefinder's Introduction

Plex DupeFinder

License: GPL v3



Introduction

Plex DupeFinder is a Python script that finds duplicate versions of media (TV episodes and movies) in your Plex Library and tells Plex to remove the lowest-rated files/versions (based on user-specified scoring), leaving behind a single file/version.

Duplicates can be removed either in bulk (automatically) or one-by-one (interactively).

Requirements

  1. Python 3.6+.

  2. Required Python modules (see below).

Installation

Note: Steps below are for Debian-based distros (other operating systems will require tweaking to the steps).

  1. Install Python 3 and PIP

    sudo apt install python3 python3-pip
    
  2. Clone the Plex DupeFinder repo.

    sudo git clone https://github.com/l3uddz/plex_dupefinder /opt/plex_dupefinder
    
  3. Find your user & group.

    id
    
  4. Fix permissions of the Plex DupeFinder folder (replace user/group with yours).

    sudo chown -R user:group /opt/plex_dupefinder
    
  5. Go into the Plex DupeFinder folder.

    cd /opt/plex_dupefinder
    
  6. Install the required Python modules.

    sudo python3 -m pip install -r requirements.txt
    
  7. Create a shortcut for Plex DupeFinder.

    sudo ln -s /opt/plex_dupefinder/plex_dupefinder.py /usr/local/bin/plex_dupefinder
    
  8. Generate a config.json file.

    plex_dupefinder
    
  9. Fill in the Plex URL and credentials at the prompt to generate a Plex Access Token (optional).

    Dumping default config to: /opt/plex_dupefinder/config.json
    Plex Server URL: http://localhost:32400
    Plex Username: your_plex_username
    Plex Password: your_plex_password
    Auto Delete duplicates? [y/n]: n
    Please edit the default configuration before running again!
    
  10. Configure the config.json file.

    nano config.json
    

Configuration

Sample

{
  "AUDIO_CODEC_SCORES": {
    "Unknown": 0,
    "aac": 1000,
    "ac3": 1000,
    "dca": 2000,
    "dca-ma": 4000,
    "eac3": 1250,
    "flac": 2500,
    "mp2": 500,
    "mp3": 1000,
    "pcm": 2500,
    "truehd": 4500,
    "wmapro": 200
  },
  "AUTO_DELETE": false,
  "FIND_DUPLICATE_FILEPATHS_ONLY": false,
  "FILENAME_SCORES": {
    "*.avi": -1000,
    "*.ts": -1000,
    "*.vob": -5000,
    "*1080p*BluRay*": 15000,
    "*720p*BluRay*": 10000,
    "*HDTV*": -1000,
    "*PROPER*": 1500,
    "*REPACK*": 1500,
    "*Remux*": 20000,
    "*WEB*CasStudio*": 5000,
    "*WEB*KINGS*": 5000,
    "*WEB*NTB*": 5000,
    "*WEB*QOQ*": 5000,
    "*WEB*SiGMA*": 5000,
    "*WEB*TBS*": -1000,
    "*WEB*TROLLHD*": 2500,
    "*WEB*VISUM*": 5000,
    "*dvd*": -1000
  },
  "PLEX_LIBRARIES": [
    "Movies",
    "TV"
  ],
  "PLEX_SERVER": "https://plex.your-server.com",
  "PLEX_TOKEN": "",
  "SCORE_FILESIZE": true,
  "SKIP_LIST": [],
  "VIDEO_CODEC_SCORES": {
    "Unknown": 0,
    "h264": 10000,
    "h265": 5000,
    "hevc": 5000,
    "mpeg1video": 250,
    "mpeg2video": 250,
    "mpeg4": 500,
    "msmpeg4": 100,
    "msmpeg4v2": 100,
    "msmpeg4v3": 100,
    "vc1": 3000,
    "vp9": 1000,
    "wmv2": 250,
    "wmv3": 250
  },
  "VIDEO_RESOLUTION_SCORES": {
    "1080": 10000,
    "480": 3000,
    "4k": 20000,
    "720": 5000,
    "Unknown": 0,
    "sd": 1000
  }
}

Foreword

Scoring is based on both non-configurable and configurable parameters.

  • Non-configurable parameters are: bitrate, duration, height, width, and audio channel.

  • Configurable parameters are: audio codec scores, video codec scores, video resolution scores, filename scores, and file sizes (can only be toggled on or off).

  • Note: bitrate, duration, height, width, audio channel, audio and video codecs, video resolutions (e.g. SD, 480p, 720p, 1080p, 4K, etc), and file sizes are all taken from the metadata Plex retrieves during media analysis.
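As a rough illustration of how the configurable parameters could combine into one score, here is a minimal sketch. The function name `configurable_score`, the keys of the `media` dictionary, and the file-size weighting are assumptions made for this example and are not taken from the script itself; the non-configurable parameters (bitrate, duration, and so on) are left out entirely.

```python
# Hypothetical sketch: sums the configurable scores for one media item.
# The dictionary keys and the file-size divisor are illustrative assumptions.

CONFIG = {
    "AUDIO_CODEC_SCORES": {"Unknown": 0, "aac": 1000, "dca-ma": 4000},
    "VIDEO_CODEC_SCORES": {"Unknown": 0, "h264": 10000, "hevc": 5000},
    "VIDEO_RESOLUTION_SCORES": {"Unknown": 0, "1080": 10000, "4k": 20000},
    "SCORE_FILESIZE": True,
}


def configurable_score(media, cfg):
    """Return the sum of the configurable scores for one media item."""
    score = 0
    # Codecs or resolutions missing from the config contribute nothing.
    score += cfg["AUDIO_CODEC_SCORES"].get(media["audio_codec"], 0)
    score += cfg["VIDEO_CODEC_SCORES"].get(media["video_codec"], 0)
    score += cfg["VIDEO_RESOLUTION_SCORES"].get(media["resolution"], 0)
    if cfg["SCORE_FILESIZE"]:
        # Assumed weighting: one point per 100,000 bytes.
        score += media["file_size"] // 100_000
    return score


item = {
    "audio_codec": "dca-ma",
    "video_codec": "hevc",
    "resolution": "4k",
    "file_size": 2_000_000_000,
}
print(configurable_score(item, CONFIG))
```

Because every configurable value is a plain lookup, raising or lowering a number in config.json shifts the ranking of every file that uses that codec or resolution.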

Details

Audio Codec Scores

  • You can set AUDIO_CODEC_SCORES to your preference.

  • The default settings should be sufficient for most.

Auto Delete

  • Under AUTO_DELETE, set your desired option.

    • "AUTO_DELETE": true, - Plex DupeFinder will run in automatic mode.

    • "AUTO_DELETE": false, - Plex DupeFinder will run in interactive mode. (Default)

      • Options:

        • Skip (i.e. keep both): 0

        • Choose the best one (and delete the rest): b

        • Select the item to keep (and delete the rest): # (i.e. 1, 2, 3, etc).
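The three interactive answers can be pictured with a small sketch. The function `items_to_delete` and the idea of passing in a best-first list of media ids are assumptions for illustration; the script's real prompt handling may differ.

```python
def items_to_delete(choice, ranked_ids):
    """Map an interactive answer to the media ids that would be removed.

    ranked_ids is ordered best-first; choice is the raw text typed by the user.
    """
    choice = choice.strip().lower()
    if choice == "0":
        return []                      # skip: keep everything
    if choice == "b":
        return ranked_ids[1:]          # keep the best (first) item
    if choice.isdecimal() and 1 <= int(choice) <= len(ranked_ids):
        keep = ranked_ids[int(choice) - 1]
        return [i for i in ranked_ids if i != keep]
    return []                          # unexpected input: delete nothing
```

Treating unexpected input as "delete nothing" is the conservative choice in this sketch, since a deletion cannot be undone from the prompt.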

Find Duplicate File Paths Only

  • Only finds duplicates that share the same file path.

    "FIND_DUPLICATE_FILEPATHS_ONLY": false,
  • This option has a very limited use case, i.e. in instances where Plex may have glitched and created multiple duplicates of the same media item.

  • If using this setting, we recommend using UnionFS-Fuse that can generate whiteout files (*_HIDDEN~) to prevent the deletion of the actual file on the system. The _HIDDEN~ files can then be removed afterwards or even during the dupe cleanup (e.g. watch -n 5 rm -rf /mnt/local/.unionfs-fuse/*).

  • The default settings should be sufficient for most.

Filename Scores

  • You can set FILENAME_SCORES to your preference.

  • The default settings should be sufficient for most.
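The FILENAME_SCORES keys look like shell-style wildcard patterns. A minimal sketch of how such patterns could be applied with Python's standard `fnmatch` module follows; the function `filename_score`, the case-insensitive comparison, and the rule that every matching pattern adds its points are assumptions for illustration.

```python
from fnmatch import fnmatchcase

FILENAME_SCORES = {
    "*.avi": -1000,
    "*1080p*BluRay*": 15000,
    "*Remux*": 20000,
    "*PROPER*": 1500,
}


def filename_score(path, patterns):
    """Add up the score of every pattern that matches the file path."""
    lowered = path.lower()
    return sum(
        points
        for pattern, points in patterns.items()
        # Assumption: matching ignores case, so both sides are lowercased.
        if fnmatchcase(lowered, pattern.lower())
    )


print(filename_score("/movies/Film (2019)/Film.2019.1080p.BluRay.PROPER.mkv", FILENAME_SCORES))
```

In `fnmatch` patterns, `*` also matches `/`, so a pattern such as `*Remux*` would match a folder name as well as a file name.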

Plex Libraries

  1. Go to Plex and get all the names of your Plex Libraries you want to find duplicates in.

    • Example Library:

  2. Under PLEX_LIBRARIES, list your Plex Libraries exactly as they are named in your Plex.

    • Format:

      "PLEX_LIBRARIES": [
        "LIBRARY_NAME_1",
        "LIBRARY_NAME_2"
      ],

      or

      "PLEX_LIBRARIES": ["LIBRARY_NAME_1", "LIBRARY_NAME_2"],
    • Example:

      "PLEX_LIBRARIES": [
        "Movies",
        "TV"
      ],

Plex Server URL

Plex Token

  1. Obtain a Plex Access Token:

  2. Add the Plex Access Token to "PLEX_TOKEN" so that it now appears as "PLEX_TOKEN": "abcd1234",.

    • Note: Make sure it is within the quotes (") and there is a comma (,) after it.
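A missing quote or comma makes config.json invalid JSON, so it can help to check the file before running the script. The sketch below uses only the standard `json` module; the function name `config_problems` and the specific checks are assumptions for this example, not part of Plex DupeFinder.

```python
import json


def config_problems(text):
    """Return a list of human-readable problems found in a config.json string."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["invalid JSON at line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)]
    if not isinstance(cfg, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not isinstance(cfg.get("PLEX_TOKEN"), str) or not cfg["PLEX_TOKEN"]:
        problems.append("PLEX_TOKEN must be a non-empty string")
    libraries = cfg.get("PLEX_LIBRARIES")
    if not isinstance(libraries, list) or not all(isinstance(n, str) for n in libraries):
        problems.append("PLEX_LIBRARIES must be a list of library names")
    return problems
```

To check a real file, read its text and pass it in, for example `config_problems(open("config.json").read())`; an empty list means none of these checks failed.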

Filesize Scores

  • "SCORE_FILESIZE": true will add more points to the overall score based on the actual file size.

  • The default settings should be sufficient for most.

  • Note: In some situations (e.g. a bad encode resulting in a large file size), you may want to turn this off (i.e. false).

Skip List

  • In Auto Delete mode, any file paths matching the patterns (i.e. folders) listed in SKIP_LIST will be ignored.

  • Example:

    "SKIP_LIST": ["/Movies4K/"]
  • The default settings should be sufficient for most.
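One way to picture the SKIP_LIST check is as a substring test against each file path. This is a minimal sketch under that assumption; the function name `is_skipped` is hypothetical and the script's actual matching rule may differ.

```python
def is_skipped(file_paths, skip_list):
    """Return True if any path contains any SKIP_LIST entry as a substring."""
    return any(pattern in path for path in file_paths for pattern in skip_list)


print(is_skipped(["/mnt/media/Movies4K/Film (2019)/film.mkv"], ["/Movies4K/"]))
```

With substring matching, the leading and trailing slashes in an entry such as "/Movies4K/" matter: they restrict the match to a folder with exactly that name.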

Video Codec Scores

  • You can set VIDEO_CODEC_SCORES to your preference.

  • The default settings should be sufficient for most.

Video Resolution Scoring

  • You can set VIDEO_RESOLUTION_SCORES to your preference.

  • The default settings should be sufficient for most.

Plex

You will need to make sure that Allow media deletion is enabled in Plex.

  1. In Plex, click the Settings icon -> Server -> Library.

  2. Set the following:

    • Allow media deletion: enabled
  3. Click SAVE CHANGES.

Usage

Simply run the script/command:

plex_dupefinder

Donate

If you find this project helpful, feel free to make a small donation to the developer:

plex_dupefinder's People

Contributors

desimaniac, l3uddz, rickygrassmuck, saltydk, zenjabba

plex_dupefinder's Issues

README

Hey dude, could you please post a quick README sometime on how to use this? I'm really interested in using this project but I'm unsure how to get started.

traceback Error -> for 1 specific section

Describe the bug
I got this error while scanning my TV folder.

It works great for my Movies section but, for some reason, it suddenly doesn't work for my TV folder.

I manually switch the config file from my Movies folder to my TV folder, since I don't use the same config for both.

To Reproduce
I don't know why it doesn't work :/

Logs

       _                 _                   __ _           _
 _ __ | | _____  __   __| |_   _ _ __   ___ / _(_)_ __   __| | ___ _ __
| '_ \| |/ _ \ \/ /  / _` | | | | '_ \ / _ \ |_| | '_ \ / _` |/ _ \ '__|
| |_) | |  __/>  <  | (_| | |_| | |_) |  __/  _| | | | | (_| |  __/ |
| .__/|_|\___/_/\_\  \__,_|\__,_| .__/ \___|_| |_|_| |_|\__,_|\___|_|
|_|                             |_|

#########################################################################
# Author:   l3uddz                                                      #
# URL:      https://github.com/l3uddz/plex_dupefinder                   #
# --                                                                    #
#         Part of the Cloudbox project: https://cloudbox.works          #
#########################################################################
#                   GNU General Public License v3.0                     #
#########################################################################

Initialized
Finding dupes...
Found 2130 dupes for section 'Séries TV'
Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 352, in <module>
    item.grandparentTitle, int(item.parentIndex), int(item.index), item.title)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

System Information

  • Plex DupeFinder Version: Master branch
  • Operating System: Debian 4.9.144-3 (2019-02-02) x86_64

thanks for help :)
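The traceback above shows `int()` being called on a value that is `None`, which suggests an episode in the section has no season or episode number in Plex. A defensive version of the label formatting could look like the sketch below. The function `episode_label` is hypothetical, the format is inferred from labels such as 'The 100 - 05x01 - Eden' shown in logs elsewhere on this page, and falling back to 0 for a missing number is an assumption.

```python
def episode_label(show, season, episode, title):
    """Build a 'Show - 01x02 - Title' label, tolerating missing numbers."""
    def as_int(value):
        # A missing season or episode index arrives as None.
        return int(value) if value is not None else 0
    return "%s - %02dx%02d - %s" % (show, as_int(season), as_int(episode), title)


print(episode_label("The 100", "5", "1", "Eden"))
print(episode_label("Some Show", None, None, "Special"))
```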

Plex Versions not shown as duplicate

I create "Plex Versions" with the location set to a subfolder of the original file.
Plex "Get Info" shows both versions, but ./plexdupes.py does not. I don't have anything in the SKIP_LIST.
Using Plex v1.16.5.1554

Example of movie and Plex Version:

.../movies/my Movie (2019)/my.Movie.WEB-DL.DDP5.1.H.264-NTG English.mkv
.../movies/my Movie (2019)/Plex Versions/Optimized for Mobile/my Movie (2019).mp4

Figuring out how filepath works

I am trying to make the script prefer files that are inside of directories, instead of files that are on their own.

For example:
/movies/Movie Name/movie.mkv
vs
/movies/movie.mkv

I have looked through the code for 'filepath_score' in plexdupes.py but it seems like it is just matching the start of the filename, and I am not sure how to make it variable based on the particular file it is looking at. Any advice on how to use this feature?

Recycle Bin instead of Delete

Describe the problem
If you pick the wrong "best" movie to keep then it deletes the other movies.

Describe any solutions you think might work
It would be nice if, instead of deleting the files, it moved them to a separate "Recycle Bin" path.

Additional context
Running this from Unraid 6.8.3 in a docker

Mismatches in plex can get deleted

Wrong matches in plex have the potential to get deleted if running with Auto Delete enabled. This can be avoided by comparing the base path for the files before deleting by adding

            m = ""		
            fail = False		
            for row in data:		
                n = row[3][0].split("/")		
                n = n[0:6]		
                out = "/".join(n)		
                if m == "":		
                    m = out		
                else:		
                    if m != out:		
                        fail = True		
                        break		
                        		
            if fail == True:		
                print("breaking..")		
                continue		

Between lines 390 and 391, and by adding

           		
            		
            partz = {}		
            media_items = {}		
            best_item = None		
            pos = 0		
            for media_id, part_info in collections.OrderedDict(		
                    sorted(parts.items(), key=lambda x: x[1]['score'], reverse=True)).items():		
                pos += 1		
                if pos == 1:		
                    best_item = part_info		
                media_items[pos] = media_id		
                partz[media_id] = part_info		
            headers, data = build_tabulated(partz, media_items)		
            		
            m = ""		
            fail = False		
            for row in data:		
                n = row[3][0].split("/")		
                n = n[0:6]		
                out = "/".join(n)		
                if m == "":		
                    m = out		
                else:		
                    if m != out:		
                        fail = True		
                        break		
                        		
            if fail == True:		
                print("breaking..")		
                continue		
            		
            		

Between lines 417 and 418.

The only downside is that you have to change n = n[0:6] to match the directory depth of your own Plex setup.
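The same base-path comparison can be written as a small standalone helper, which makes the depth an explicit parameter instead of a hard-coded slice. This is a sketch of the idea only; the name `same_base_path` is hypothetical and it is not wired into the script.

```python
def same_base_path(file_paths, depth):
    """Return True if all paths share the same first `depth` '/'-separated parts."""
    bases = {"/".join(path.split("/")[:depth]) for path in file_paths}
    return len(bases) <= 1


paths = [
    "/mnt/media/Movies/Film (2019)/film.1080p.mkv",
    "/mnt/media/Movies/Film (2019)/film.720p.mkv",
]
print(same_base_path(paths, 5))
```

Note that an absolute path starts with an empty part before the first `/`, so a depth of 5 here covers `/mnt/media/Movies/Film (2019)`.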

[Feature Request] Ability to keep all "Plex Version" files

I convert 4K files down to 1080p using Plex Versions. Ideally, I would like everything in the "Plex Versions" folder to be left alone, with only the files outside of that folder being considered. Is this possible?

For example

Which media item do you wish to keep for '2001: A Space Odyssey' ?

  choice  score        id  file                                                                                                           size      duration    bitrate     resolution         codecs
--------  -------  ------  -------------------------------------------------------------------------------------------------------------  --------  ----------  ----------  -----------------  --------------------
       1  355,597  798202  ['/mnt/Movies/2001 A Space Odyssey (1968)/2001 A Space Odyssey 1968 [WEBDL-2160p][DTS-HD MA 5.1][x265].mkv']   22.09 GB  02:28:49    20.75 Mbps  4k (3840 x 1746)   hevc, dca-ma x 6
       2  213,288  798225  ['/mnt/Movies/2001 A Space Odyssey (1968)/Plex Versions/TV - 20 Mbps 15269/2001_ A Space Odyssey (1968).mp4']  11.92 GB  02:28:49    11.20 Mbps  1080 (1920 x 874)  h264, aac x 6
       3  12,608   525431  ['/mnt/Movies/2001 A Space Odyssey (1968)/2001 A Space Odyssey (1968) Bluray-1080p.mkv']                       379.9 MB  00:43:07    0 Kbps      Unknown (0 x 0)    Unknown, Unknown x 0

Choose item to keep (0 = skip | b = best): 0
Unexpected response, skipping deletion(s) for '2001: A Space Odyssey'

I would want to keep 1 and 2 in this case

Odd character/diaeresis in file name crashes script

Describe the bug
The character/diaeresis "ö" in a file name causes plex_dupefinder to crash.

To Reproduce
Have a file with "ö" in its name, such as move-Auflösung.mkv.
Run plex_dupefinder. When it reaches this file, it crashes with:

Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 433, in <module>
    print("\nDetermining best media item to keep for %r ..." % item)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 81: ordinal not in range(128)
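The error message says the text is being encoded with the 'ascii' codec, which usually means the terminal or environment that Python writes to is not set to UTF-8. Setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is one common workaround. Another option is to escape characters the output encoding cannot represent, as in this minimal sketch; the helper `printable` is hypothetical and not part of the script.

```python
def printable(text, encoding="ascii"):
    """Return text with characters the encoding cannot represent escaped."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


print(printable("move-Auflösung.mkv"))
```

With an ASCII target the "ö" is written as an escape sequence instead of raising `UnicodeEncodeError`; with a UTF-8 target the text passes through unchanged.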

Program doesn't finish

Describe the bug
Plex DupeFinder runs and identifies my duplicates, but doesn't go into interactive mode to delete them.

To Reproduce
Steps to reproduce the behavior:

  1. Follow instructions
  2. Run program

Expected behavior
I had expected it to score my files and offer them for deletion

Screenshots
Screenshot_20230530_075403_JuiceSSH.jpg

Logs

You can enable debug mode by adding --loglevel=DEBUG to the run command.

System Information

  • Plex DupeFinder Version: 61e63c8
  • Operating System: [e.g. Ubuntu Server 22.04.2

* Run git rev-parse --short HEAD in the folder to get the

Additional context
Add any other context about the problem here.

Report Generation

Is there a way for this to just generate a text file that would say XX Movie with the path and the score? This way you could manually delete and do a verification process instead?
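A report like the one requested could be produced with Python's standard `csv` module. The sketch below is only an illustration of the idea: the function `write_report`, the column names, and the sample rows are assumptions, and nothing here reflects an existing option in the script.

```python
import csv
import io


def write_report(rows, stream):
    """Write (title, path, score) rows as CSV, highest score first."""
    writer = csv.writer(stream)
    writer.writerow(["title", "path", "score"])
    for title, path, score in sorted(rows, key=lambda r: r[2], reverse=True):
        writer.writerow([title, path, score])


# Sample rows with made-up paths and scores.
rows = [
    ("Example Movie", "/data/Movies/Example Movie (2019)/example.720p.mp4", 85891),
    ("Example Movie", "/data/Movies/Example Movie (2019)/example.1080p.mkv", 115099),
]
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```

Writing to a file instead of a `StringIO` buffer only requires passing an object opened with `open("report.csv", "w", newline="")`.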

Exclude from

Is there any way to exclude a path from being looked at?
I.e. I have /mnt/Movies/4k and /mnt/Movies/4k/shared. I want things in /mnt/Movies/4k/shared to not be scanned, because that path is read-only and I don't have access to delete files there.

unable to start plex_dupefinder.py

The readme page is very detailed and updated often. It should be the first thing used to find the answers to your questions.

If you still have a support question, use the Discord chat server to post your question in the #misc channel.

Support questions or requests will be redirected there and the issue ticket will be closed.

"Plex Versions" Trancode are considered as duplicate

Describe the bug
I have activated some preset "Conversions" from within Plex.
These transcoded versions are always stored in a subfolder "Plex Versions" of the movie folder.
I'd like these versions to be ignored during the duplicate search.

I have set the skip list param as:
"SKIP_LIST": ["/Plex Versions/],

But unfortunately, the Plex versions are still considered as duplicates:

Which media item do you wish to keep for 'Adults in the Room' ?

  choice  score       id  file                                                                                                       size     duration    bitrate    resolution         codecs
--------  -------  -----  ---------------------------------------------------------------------------------------------------------  -------  ----------  ---------  -----------------  -------------
       1  115,099  15116  ['/data/Movies/Adults in the Room (2019)/Adults.in.the.Room.2019.WEBDL-1080p.EVO.[EN].tt7493370.mkv']      4.4 GB   02:07:14    4.84 Mbps  1080 (1920 x 806)  h264, ac3 x 6
       2  85,891   15117  ['/data/Movies/Adults in the Room (2019)/Plex Versions/TV-720p@4Mbps 1925/Adults in the Room (2019).mp4']  2.68 GB  02:07:14    2.94 Mbps  720 (1280 x 538)   h264, ac3 x 6

System Information

  • Plex DupeFinder Version: 900d4a0
  • Operating System: Ubuntu Server 18.04 LTS with cloudbox master

* Run git rev-parse --short HEAD in the folder to get the GIT COMMIT ID.
900d4a0

Docker deployment

It would be great if someone could set this up to use in docker with all of the python requirements already ready to go. Or if at docker run time the requirements could be pulled and installed.

I'm running unRaid and I would love to have this run there with all of my other docker containers instead of on a different machine.

Not Skipping Files with pattern in the Path of SKIP_LIST

Describe the bug
Even though the SKIP_LIST variable is set, files with the pattern specified are being considered when finding duplicates, and the 1080p/720p copy is then deleted when running in automatic mode (which, based on the docs, is the only mode in which SKIP_LIST is considered).

To Reproduce
Steps to reproduce the behavior:

  1. Run plex_dupefinder once to generate the initial config file.
  2. Edit the config file to have the settings from the Additional Context section below.
  3. Run plex_dupefinder with the configuration set to automatic mode.
  4. Kill the script after the first file or two to avoid trashing valid copies of files.

Expected behavior
That the SKIP_LIST be considered when the script is run in automatic mode and paths containing the pattern(s) from this list be excluded from the results that are evaluated for deletion.

Screenshots
N/A

Logs

$ plex_dupefinder --loglevel=DEBUG

       _                 _                   __ _           _
 _ __ | | _____  __   __| |_   _ _ __   ___ / _(_)_ __   __| | ___ _ __
| '_ \| |/ _ \ \/ /  / _` | | | | '_ \ / _ \ |_| | '_ \ / _` |/ _ \ '__|
| |_) | |  __/>  <  | (_| | |_| | |_) |  __/  _| | | | | (_| |  __/ |
| .__/|_|\___/_/\_\  \__,_|\__,_| .__/ \___|_| |_|_| |_|\__,_|\___|_|
|_|                             |_|

#########################################################################
# Author:   l3uddz                                                      #
# URL:      https://github.com/l3uddz/plex_dupefinder                   #
# --                                                                    #
#         Part of the Cloudbox project: https://cloudbox.works          #
#########################################################################
#                   GNU General Public License v3.0                     #
#########################################################################

Initialized
Finding dupes...
Found 1059 dupes for section 'Movies'
Found 5982 dupes for section 'TV Shows'

Determining best media item to keep for '3 from Hell' ...
        Keeping  : 365248 - ['/mnt/media/4K/4K-Movies/3 from Hell (2019)/3 from Hell 2019 Remux-2160p.mkv']
        Removing : 497434 - ['/mnt/media/Movies/3 from Hell (2019)/3 from Hell 2019 Remux-1080p.mkv']
                Deleted media item: 497434

Determining best media item to keep for '6 Underground' ...
        Keeping  : 365160 - ['/mnt/media/4K/4K-Movies/Six Underground (0)/6 Underground 2019 WEBRip-2160p.mkv']
        Removing : 365382 - ['/mnt/media/Movies/Six Underground (0)/6 Underground 2019 WEBDL-1080p.mkv']
^CTraceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 465, in <module>
    delete_item(part_info['show_key'], media_id)
  File "/usr/local/bin/plex_dupefinder", line 203, in delete_item
    if requests.delete(delete_url, headers={'X-Plex-Token': cfg.PLEX_TOKEN}).status_code == 200:
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 152, in delete
    return request('delete', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 272, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

System Information

  • Plex DupeFinder Version: [Master, 900d4a0]
  • Operating System: [e.g. Ubuntu Server 18.04 LTS]

Additional context
Full config.json with sensitive bits redacted.

{
  "AUDIO_CODEC_SCORES": {
    "Unknown": 0,
    "aac": 1000,
    "ac3": 1000,
    "dca": 2000,
    "dca-ma": 4000,
    "eac3": 1250,
    "flac": 2500,
    "mp2": 500,
    "mp3": 1000,
    "pcm": 2500,
    "truehd": 4500,
    "wmapro": 200
  },
  "AUTO_DELETE": true,
  "FIND_DUPLICATE_FILEPATHS_ONLY": false,
  "FILENAME_SCORES": {
    "*.avi": -1000,
    "*.ts": -1000,
    "*.vob": -5000,
    "*1080p*BluRay*": 15000,
    "*720p*BluRay*": 10000,
    "*HDTV*": -1000,
    "*PROPER*": 1500,
    "*REPACK*": 1500,
    "*Remux*": 20000,
    "*WEB*CasStudio*": 5000,
    "*WEB*KINGS*": 5000,
    "*WEB*NTB*": 5000,
    "*WEB*QOQ*": 5000,
    "*WEB*SiGMA*": 5000,
    "*WEB*TBS*": -1000,
    "*WEB*TROLLHD*": 2500,
    "*WEB*VISUM*": 5000,
    "*dvd*": -1000
  },
  "PLEX_LIBRARIES": [
    "Movies",
    "TV Shows"
  ],
  "PLEX_SERVER": "http://127.0.0.1:32400",
  "PLEX_TOKEN": "<REDACTED>",
  "SCORE_FILESIZE": true,
  "SKIP_LIST": ["/4K/"],
  "VIDEO_CODEC_SCORES": {
    "Unknown": 0,
    "h264": 10000,
    "h265": 5000,
    "hevc": 5000,
    "mpeg1video": 250,
    "mpeg2video": 250,
    "mpeg4": 500,
    "msmpeg4": 100,
    "msmpeg4v2": 100,
    "msmpeg4v3": 100,
    "vc1": 3000,
    "vp9": 1000,
    "wmv2": 250,
    "wmv3": 250
  },
  "VIDEO_RESOLUTION_SCORES": {
    "1080": 10000,
    "480": 3000,
    "4k": 20000,
    "720": 5000,
    "Unknown": 0,
    "sd": 1000
  }
}

UnionFS

When using UnionFS and Google Drive, this will not actually delete the file from the system; it just removes it from the Plex database. So if you ever do a manual scan, Plex will find all the dupes again.

Option to remove the worst, instead of keep the best?

I'd like to see an option that would remove the worst quality item and leave the best.

So for example, let's say I have a 4K item, a 1080p, a Plex-optimized copy at 720p, and then an old xvid/SD copy of the same item. I'd like to just remove the xvid/SD and leave all the others.

Will it work with Python under windows?

When attempting to run, after pip installing the items listed in requirements, I get the following:

C:\Python27\Scripts\plex_dupefinder>.\plexdupes.py
Traceback (most recent call last):
  File "C:\Python27\Scripts\plex_dupefinder\plexdupes.py", line 11, in <module>
    from config import cfg
  File "C:\Python27\Scripts\plex_dupefinder\config.py", line 146, in <module>
    tmp = load_config()
  File "C:\Python27\Scripts\plex_dupefinder\config.py", line 99, in load_config
    return AttrConfig(json.load(fp))
  File "C:\Python27\Scripts\plex_dupefinder\config.py", line 33, in __init__
    super().__init__(config)
TypeError: super() takes at least 1 argument (0 given)

C:\Python27\Scripts\plex_dupefinder>

Is this a Windows not supported issue, or did I screw something up along the way?
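The paths in the traceback (C:\Python27) indicate the script was run with Python 2.7, while the Requirements section asks for Python 3.6+. Calling `super()` without arguments only works in Python 3, so this error points at the interpreter version rather than at Windows itself. A version guard along the following lines would make that failure clearer; it is a sketch, and the function `python_is_supported` is hypothetical rather than something the script provides.

```python
import sys

MINIMUM = (3, 6)  # taken from the Requirements section


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MINIMUM


if not python_is_supported():
    sys.exit("Plex DupeFinder needs Python %d.%d or newer" % MINIMUM)
```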

Installed using instructions getting mapping from collections error

Describe the bug
I tried installing on two different VMs running Ubuntu 22.04.1 LTS. I made sure everything was updated using apt update/upgrade. I receive the following message on both servers:

Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 11, in <module>
    from config import cfg
  File "/opt/plex_dupefinder/config.py", line 8, in <module>
    from attrdict import AttrDict
  File "/usr/local/lib/python3.10/dist-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/usr/local/lib/python3.10/dist-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)

To Reproduce
Steps to reproduce the behavior:

  1. follow the installation instructions

Expected behavior
the plex_dupefinder to show the output text

You can enable debug mode by adding --loglevel=DEBUG to the run command.

System Information

  • Plex DupeFinder Version: current release as of 8/28/22
  • Operating System: Ubuntu Server 22.04.1 LTS

* Run git rev-parse --short HEAD in the folder to get the GIT COMMIT ID.
COMMIT ID is 6650e39

ImportError: cannot import name 'Mapping' from 'collections'

Logs

  File "/tools/plex_dupefinder/plex_dupefinder.py", line 11, in <module>
    from config import cfg
  File "/tools/plex_dupefinder/config.py", line 8, in <module>
    from attrdict import AttrDict
  File "/usr/local/lib/python3.10/dist-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/usr/local/lib/python3.10/dist-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)

Additional context
I recently started getting this error. Any ideas why? Maybe a Plex update?
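The paths in the log show Python 3.10. In that release the abstract base class aliases such as `Mapping` were removed from the `collections` module and are only available from `collections.abc`, so the import inside the `attrdict` package fails regardless of the Plex version. The sketch below shows the import form that works on current Python versions; the helper `is_mapping` is only there to make the example runnable.

```python
try:
    from collections.abc import Mapping  # Python 3.3 and newer
except ImportError:
    from collections import Mapping      # very old interpreters only


def is_mapping(obj):
    """Return True for dict-like objects."""
    return isinstance(obj, Mapping)


print(is_mapping({"PLEX_TOKEN": "abcd1234"}))
```

Since the failing line lives in a third-party dependency, the practical options are running the script under an older Python 3 release or replacing the dependency; which of these suits a given install is left open here.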

Feature Request: Write to .plexignore instead of delete

For servers that have multiple shared paths a writable .plexignore-file could help with keeping duplicates hidden. An option for writing the filepath for the lowest quality file as a line in a .plexignore-file could possibly make this work better.

Doesn't work with spaces in file path

Describe the bug

When plex_dupefinder is set to delete, it correctly detects file paths that contain spaces but fails to delete them.

To Reproduce
Steps to reproduce the behavior:

  1. Set plex_dupefinder to auto delete
  2. Check for dups
  3. Watch as it fails to delete paths and files that have spaces in them

Expected behavior

I would expect it to correctly handle deletion of files that have spaces in their full file path (and file name).

Screenshots
alex_internal1___

Logs
Initialized
Finding dupes...
Found 0 dupes for section 'Movies'
Found 150 dupes for section 'TV Shows'

Determining best media item to keep for 'The 100 - 05x01 - Eden' ...
        Keeping  : 690455 - ['/shared/tv/The 100/Season 5/The 100.S5.E1.Eden.m4v']
        Removing : 690467 - ['/shared/tv/The 100/Season 5/The 100.S5.E1.Eden-1.m4v']
                Error deleting media item: 690467

System Information

  • Plex DupeFinder Version: commit 0430d84
  • Operating System: Ubuntu Server 18.04

Perfect !

Thx, it's awesome !
Would it be possible to add a "Dissociate" option alongside 0 and b (for when Plex is wrong)?

Thx.

Keep best version of each resolution ?

Hey.
Nice work. But is it possible to keep the best quality for each resolution? E.g. keep the best version of 4K, 1080p, as well as 720p, but delete everything else? To keep Plex transcodes as low as possible, of course.

thanks
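The request amounts to grouping the duplicates by resolution and keeping the top-scoring item in each group. A minimal sketch of that grouping follows; the function `best_per_resolution` and the dictionary layout of each item are assumptions for illustration and do not describe an existing feature.

```python
def best_per_resolution(items):
    """Return {resolution: item}, keeping the highest-scoring item of each."""
    best = {}
    for item in items:
        current = best.get(item["resolution"])
        if current is None or item["score"] > current["score"]:
            best[item["resolution"]] = item
    return best


# Made-up ids and scores for one title with four copies.
dupes = [
    {"id": 1, "resolution": "4k", "score": 355597},
    {"id": 2, "resolution": "1080", "score": 213288},
    {"id": 3, "resolution": "1080", "score": 12608},
    {"id": 4, "resolution": "720", "score": 85891},
]
keep_ids = sorted(item["id"] for item in best_per_resolution(dupes).values())
print(keep_ids)
```

Everything not in `keep_ids` would then be the candidate list for deletion.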

Prioritize by path vs bitrate

Thanks for creating this awesome utility!

Feature request for consideration:
In the config, provide the option to prioritize by path vs quality, e.g. when a dupe is found in 3 paths, always keep the duplicate in /path/to/media/a/ and always delete the media in /path/to/media/b/ and /path/to/media/c/.

This would be useful for those that have a "read-only" mount of content + a read/write mount of content.

unicode character issues

Determining best media item to keep for 'A.I. Rising' ...
Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 461, in <module>
    print("\tRemoving : %r - %r" % (media_id, part_info['file']))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 55: ordinal not in range(128)

Configure by Bitrate

It would be great to be able to configure it to "Keep Best Bitrate". Sometimes it is possible to have, for example, a 720p file at a higher bitrate (i.e. a better rip) than a 1080p file. In these cases bitrate is often more important than resolution. Similarly, if they are the same resolution, then you would simply want to keep the higher-bitrate file. While the "File Size" option is available, this also may not be the best when you account for h.265 vs h.264, where h.265 provides a much better compression rate and therefore may have better quality at a smaller file size.

I would imagine this would be less "Scoring" and more of just a direct comparison - greater bitrate wins.

Add language as important criteria

Describe the problem
Compare / list language and make it available for auto delete

Describe any solutions you think might work

Something like that might work... important is "item.reload()", otherwise AudioStream info is not provided.
...
# loop returned duplicates
for item in dupes:
    item.reload()
    languages = []
    for media in item.media:
        #print(media)
        for part in media.parts:
            #print(part)
            for stream in part.streams:
                if type(stream) is plexapi.media.AudioStream:
                    #print(type(stream))
                    #print(stream, end=', ')
                    #print(stream.codec , end=', ')
                    #print(stream.languageCode)
                    languages.append(stream.languageCode)
....

        elif 'languages' in k:
            tmp.append(parts[item_id][k])

...

#add language to score
if 'ger' in media_info["languages"]:
    score+=20000

...

Additional context
Nice tool you have created. I find it very useful...

Add music scan capacity

Hi there,

I would like to use plex dupefinder to cleanup my music library.
I currently have several versions of some albums (sometimes literally 4 or 5 versions of one album) and would like to remove all the lower-bitrate versions.

I understand it is already possible in the current state to score based on the file type, e.g. mp3 (1000) vs. flac (2500). But is there any way to go a little further and tell dupefinder to score the mp3/flac files based on their bitrate?

Example :

mp3 128 kbit (100)
mp3 192 kbit (200)
mp3 (V0) (1000)
flac 16bit (2000)
flac 24 bit (3000)

Of course, as we are talking about music albums, the structure is as follows:

/album name folder/files.mp3 (or flac)

Therefore it would be good to treat each item as an "album": analyze the content of each folder, then delete the whole album based on this analysis (rather than each file independently).

This would help me a lot, and I'm pretty sure other users would benefit from it too.

Can you please let me know if this is feasible? As Plexamp can now display the bitrate, I guess this information is stored somewhere in the Plex library database.

Thanks!
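
To make the album-level idea concrete, here is a small sketch that groups tracks by their folder and compares the average bitrate per folder. The input format (pairs of file path and bitrate in kbit/s) and both function names are assumptions for illustration; how the bitrate would be read from Plex is left open.

```python
import os
from collections import defaultdict


def album_bitrates(tracks):
    """Map each album folder to the average bitrate of its tracks.

    tracks is an iterable of (file_path, bitrate_kbps) pairs.
    """
    per_album = defaultdict(list)
    for file_path, bitrate in tracks:
        per_album[os.path.dirname(file_path)].append(bitrate)
    return {
        folder: sum(values) / len(values)
        for folder, values in per_album.items()
    }


def best_album(tracks):
    """Return the folder whose tracks have the highest average bitrate."""
    averages = album_bitrates(tracks)
    return max(averages, key=averages.get)
```

Every folder other than the one returned by `best_album` would then be a candidate for removal as a whole.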

Type Error

Found 4109 dupes for section 'TV Shows'
Traceback (most recent call last):
  File "C:/Users/bbaker/Desktop/plex_dupefinder-master/plex_dupefinder.py", line 353, in <module>
    item.grandparentTitle, int(item.parentIndex), int(item.index), item.title)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
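
The traceback shows `int()` being called on a value that is `None`, which would happen if an episode has no season or episode index. A hedged sketch of a guard follows; the function name, the fallback value of 0, and the exact format string are assumptions modeled on the titles seen in logs on this page.

```python
def episode_label(show, season_index, episode_index, title):
    """Build a 'Show - 01x02 - Title' label, tolerating missing indexes."""

    def as_int(value):
        # Fall back to 0 when Plex reports no index (assumed fallback).
        return int(value) if value is not None else 0

    return "%s - %02dx%02d - %s" % (
        show,
        as_int(season_index),
        as_int(episode_index),
        title,
    )
```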

Dump to CSV/XLSX file for offline editing

Describe the problem
I've merged all my movies into my plex system, now I have about 3000 movies that are "duplicates". I could auto delete files, but Plex has mismatched a lot of movies and that would just delete good movies.

Describe any solutions you think might work
Export a CSV/XLSX so that I can edit the files I want to keep and those I want to delete, it will also allow me to see the mismatched movies. After the file is edited, importing that in the plex_dupefinder to batch process removals.
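
A minimal sketch of the round trip using only the standard `csv` module. The column names and the `keep`/`delete` values in the `action` column are hypothetical; nothing here reflects an existing plex_dupefinder feature.

```python
import csv

# Hypothetical column layout for the exported file.
FIELDS = ["media_id", "title", "file", "action"]


def export_rows(rows, fp):
    """Write duplicate rows to an open text file, defaulting action to 'keep'."""
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(dict(row, action=row.get("action", "keep")))


def import_deletions(fp):
    """Return the media ids whose action column was edited to 'delete'."""
    return [
        int(row["media_id"])
        for row in csv.DictReader(fp)
        if row["action"].strip().lower() == "delete"
    ]
```

Defaulting every row to `keep` means nothing is removed unless the file is edited on purpose, which suits the mismatched-movies concern above.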

plexapi.exceptions.NotFound: Invalid library section: TV

Describe the bug
On start it stops with this error:

# plex_dupefinder

       _                 _                   __ _           _
 _ __ | | _____  __   __| |_   _ _ __   ___ / _(_)_ __   __| | ___ _ __
| '_ \| |/ _ \ \/ /  / _` | | | | '_ \ / _ \ |_| | '_ \ / _` |/ _ \ '__|
| |_) | |  __/>  <  | (_| | |_| | |_) |  __/  _| | | | | (_| |  __/ |
| .__/|_|\___/_/\_\  \__,_|\__,_| .__/ \___|_| |_|_| |_|\__,_|\___|_|
|_|                             |_|

#########################################################################
# Author:   l3uddz                                                      #
# URL:      https://github.com/l3uddz/plex_dupefinder                   #
# --                                                                    #
#         Part of the Cloudbox project: https://cloudbox.works          #
#########################################################################
#                   GNU General Public License v3.0                     #
#########################################################################

Initialized
Finding dupes...
Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 344, in <module>
    dupes = get_dupes(section, section_type)
  File "/usr/local/bin/plex_dupefinder", line 51, in get_dupes
    dupes = plex.library.section(plex_section_name).search(duplicate=True, libtype=sec_type)
  File "/usr/local/lib/python3.5/dist-packages/plexapi/library.py", line 55, in section
    raise NotFound('Invalid library section: %s' % title)
plexapi.exceptions.NotFound: Invalid library section: TV

To Reproduce
Steps to reproduce the behavior:
Fresh install on Debian 9 as described in readme.md

System Information

MASTER cloned, debian 9
f0a79ce

Additional context
I have Movie and TV libraries.
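
The traceback indicates that the section name passed to plexapi ("TV") did not match any library title on the server. A small sketch of a pre-check that compares configured names against the titles actually available; the function is hypothetical, and the example values are made up. In practice the available titles could probably be collected from plexapi, for example from the `title` of each entry returned by `plex.library.sections()`.

```python
def missing_sections(configured_names, available_titles):
    """Return configured section names that the server does not have.

    The comparison is exact and case-sensitive.
    """
    available = set(available_titles)
    return [name for name in configured_names if name not in available]
```

For example, `missing_sections(["Movies", "TV"], ["Movies", "TV Shows"])` reports `["TV"]`, which could be printed before the search starts instead of failing with a traceback.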

unable to delete files

It deleted a few files successfully, but for most of them I get "Error deleting media item" when I try to delete the file.

The log doesn't say much other than the same thing. I'd appreciate any help. I am running Ubuntu 16.04.

something wrong on my Mac OS

Hi,

When I run plex_dupefinder I get these errors:

python plexdupes.py
Traceback (most recent call last):
  File "plexdupes.py", line 11, in <module>
    from config import cfg
  File "/Users/martinoroberto/Scripts/plex_dupefinder/config.py", line 146, in <module>
    tmp = load_config()
  File "/Users/martinoroberto/Scripts/plex_dupefinder/config.py", line 99, in load_config
    return AttrConfig(json.load(fp))
  File "/Users/martinoroberto/Scripts/plex_dupefinder/config.py", line 33, in __init__
    super().__init__(config)
TypeError: super() takes at least 1 argument (0 given)

this is my config file

cat config.json

{
  "AUDIO_CODEC_SCORES": {
    "Unknown": 0,
    "aac": 1000,
    "ac3": 1000,
    "dca": 2000,
    "dca-ma": 4000,
    "eac3": 1250,
    "flac": 2500,
    "mp2": 500,
    "mp3": 1000,
    "pcm": 2500,
    "truehd": 4500,
    "wmapro": 200
  },
  "AUTO_DELETE": false,
  "FILENAME_SCORES": {
    "*.avi": -1000,
    "*.ts": -1000,
    "*.vob": -5000,
    "*1080p*BluRay*": 15000,
    "*720p*BluRay*": 10000,
    "*HDTV*": -1000,
    "*PROPER*": 1500,
    "*REPACK*": 1500,
    "*Remux*": 20000,
    "*WEB*CasStudio*": 5000,
    "*WEB*KINGS*": 5000,
    "*WEB*NTB*": 5000,
    "*WEB*QOQ*": 5000,
    "*WEB*SiGMA*": 5000,
    "*WEB*TBS*": -1000,
    "*WEB*TROLLHD*": 2500,
    "*WEB*VISUM*": 5000,
    "*dvd*": -1000
  },
  "FILEPATH_SCORES": {},
  "PLEX_SECTIONS": {
    "Film": 0
  },
  "PLEX_SERVER": "https://plex.myDomain.com",
  "PLEX_TOKEN": "myToken",
  "SCORE_FILESIZE": true,
  "SKIP_LIST": [],
  "VIDEO_CODEC_SCORES": {
    "Unknown": 0,
    "h264": 10000,
    "h265": 5000,
    "hevc": 5000,
    "mpeg1video": 250,
    "mpeg2video": 250,
    "mpeg4": 500,
    "msmpeg4": 100,
    "msmpeg4v2": 100,
    "msmpeg4v3": 100,
    "vc1": 3000,
    "vp9": 1000,
    "wmv2": 250,
    "wmv3": 250
  },
  "VIDEO_RESOLUTION_SCORES": {
    "1080": 10000,
    "480": 3000,
    "4k": 20000,
    "720": 5000,
    "Unknown": 0,
    "sd": 1000
  }
}

Can you help me?
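
The message `super() takes at least 1 argument (0 given)` is what Python 2 reports for a zero-argument `super()` call, which is only valid in Python 3, so the `python` command here is probably a Python 2 interpreter. A small guard along the following lines could make that failure clearer; the helper name is hypothetical and the minimum version is taken from the Requirements section.

```python
import sys

MIN_VERSION = (3, 6)  # from the Requirements section of the README


def is_supported(version_info=None):
    """Return True when the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


if not is_supported():
    sys.exit("Python %d.%d+ is required; try running with python3." % MIN_VERSION)
```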

Unicode errors abound

Getting a lot of these errors:

--- Logging error ---
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 80: character maps to <undefined>
Call stack:
  File "plexdupes.py", line 359, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ("The 100 - 01x01 - The 100 - S01E01 'Pilot' 720p  \u2605L@\u266bBerT\u2605",)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 56: character maps to <undefined>
Call stack:
  File "plexdupes.py", line 359, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ('Hawaii Five-0 - 05x16 - N\u0101nahu',)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 61: character maps to <undefined>
Call stack:
  File "plexdupes.py", line 359, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ("Hawaii Five-0 - 05x20 - 'Ike H\u0101nau",)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u016b' in position 61: character maps to <undefined>
Call stack:
  File "plexdupes.py", line 359, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ('Hawaii Five-0 - 09x08 - Lele p\u016b n\u0101 manu like',)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c0' in position 61: character maps to <undefined>
Call stack:
  File "plexdupes.py", line 359, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ('Person of Interest - 02x11 - 2πR',)

Some processing does occur, then it crashes with the error:

Traceback (most recent call last):
  File "plexdupes.py", line 377, in <module>
    print("\nWhich media item do you wish to keep for %r ?\n" % item)
  File "C:\Program Files\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 93: character maps to <undefined>

Any way to fix this?
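
One possible workaround, sketched under the assumption that escaping unprintable characters is acceptable: encode the text with the console's encoding using the `backslashreplace` error handler before printing. `safe_text` is a hypothetical helper, not part of plex_dupefinder.

```python
import sys


def safe_text(text, encoding=None):
    """Return text with characters the target encoding lacks escaped."""
    if encoding is None:
        # Fall back to ASCII when the stream does not report an encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    encoded = text.encode(encoding, errors="backslashreplace")
    return encoded.decode(encoding)
```

Something like `print(safe_text(title))` would then show `N\u0101nahu` on a cp1252 console instead of raising `UnicodeEncodeError`.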

unexpected entry causes ValueError: invalid literal for int() with base 10:

Describe the bug
In keep_item, it expects "s", "b" or a number. If you type "ss" by mistake, you get:
Choose item to keep (0 or s = skip | 1 or b = best): ss
Traceback (most recent call last):
File "./plex_dupefinder.py", line 413, in
if (keep_item.lower() != 's') and (keep_item.lower() == 'b' or 0 < int(keep_item) <= len(media_items)):
ValueError: invalid literal for int() with base 10: 'ss'

To Reproduce

  1. Start plex_dupefinder
  2. Get to an entry
  3. Enter "ss" for example

Expected behavior
Either validate the input and prompt again (ideally), or skip the item when the input is unexpected.
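
A sketch of input validation that avoids calling `int()` on arbitrary text. The function name and its return values are assumptions; only the accepted inputs (`0` or `s` for skip, `b` for best, or an item number) come from the prompt quoted above.

```python
def parse_choice(raw, item_count):
    """Interpret the answer to the 'Choose item to keep' prompt.

    Returns 'skip', 'best', a 1-based item number, or None when the
    input is not valid and the caller should ask again.
    """
    text = raw.strip().lower()
    if text in ("s", "0"):
        return "skip"
    if text == "b":
        return "best"
    # isdecimal() guarantees that int() will not raise here.
    if text.isdecimal() and 1 <= int(text) <= item_count:
        return int(text)
    return None
```

The caller could loop on `input()` until the result is not `None`, which gives the re-prompt behaviour described above.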

Screenshots
Choose item to keep (0 or s = skip | 1 or b = best): ss
Traceback (most recent call last):
File "./plex_dupefinder.py", line 413, in
if (keep_item.lower() != 's') and (keep_item.lower() == 'b' or 0 < int(keep_item) <= len(media_items)):
ValueError: invalid literal for int() with base 10: 'ss'

Logs
Link to debug or trace log files.

You can enable debug mode by adding --loglevel=DEBUG to the run command.

System Information

  • Plex DupeFinder Version: 900d4a0
  • Operating System: MacOS 10.15.5
    Python 3.8.2 (v3.8.2:7b3ab5921f, Feb 24 2020, 17:52:18)

900d4a0


Remove Subtitle(s) When Present with Duplicate Media

Let me start off by saying this script is AWESOME. Works perfectly!

One feature that would be great is to implement the ability to also delete any corresponding subtitle file(s), if present, when deleting the original duplicate media file.

Example.

/movies/this is a movie (2019)
/this is a movie 2018 - 1080p.mkv
/this is a movie 2018 - 1080p.en.srt
/this is a movie 2018 - 1080p.eng.srt
/this is a movie 2018 - 720.mkv <<<<<delete this file
/this is a movie 2018 - 720.en.srt <<<<<delete this file
/this is a movie 2018 - 720.eng.srt <<<<<delete this file
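
A sketch of how matching subtitle files could be located next to a media file, assuming sidecar subtitles share the media file's name up to the extension. The extension list and the function name are assumptions for illustration.

```python
import os

# Assumed set of subtitle extensions; adjust as needed.
SUBTITLE_EXTS = (".srt", ".sub", ".ass", ".ssa", ".vtt")


def sidecar_subtitles(media_path):
    """List subtitle files in the same folder that share the media file's stem."""
    folder, name = os.path.split(media_path)
    stem = os.path.splitext(name)[0]
    matches = []
    for entry in sorted(os.listdir(folder or ".")):
        if entry.startswith(stem + ".") and entry.lower().endswith(SUBTITLE_EXTS):
            matches.append(os.path.join(folder, entry))
    return matches
```

This only works when the script can see the same filesystem as Plex, which is not guaranteed because deletions currently go through the Plex server.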

Set config from environment

Describe the problem
At the moment (I believe) the config needs to be specified in the config.json file. It would be useful to be able to override key/value pairs from the environment (when they exist). This would assist with dockerising the script.

Describe any solutions you think might work
After the file has been loaded, merge in the environment dict so that any variables set there override the values in the config.
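
A minimal sketch of that merge, assuming environment variables use the same names as the top-level config keys and that values are parsed as JSON where possible. Both choices are assumptions, as is the function name.

```python
import json
import os


def apply_env_overrides(config, environ=None):
    """Return a copy of config with matching environment variables applied."""
    if environ is None:
        environ = os.environ
    merged = dict(config)
    for key in config:
        if key not in environ:
            continue
        raw = environ[key]
        try:
            # Lets "true", "[]", or "{...}" become real JSON values.
            merged[key] = json.loads(raw)
        except ValueError:
            # Plain strings such as URLs are kept as-is.
            merged[key] = raw
    return merged
```

Only keys already present in the config are considered, so unrelated environment variables are ignored.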

Unable to exclude remuxes from scope

Describe the bug
I've been unable to have plex_dupefinder ignore anything with [R|r]emux in the filename. For some of my favourite films I like to keep as high a quality as possible (i.e. remux) whilst also having an encode available (e.g. 720p WEBDL for iPad).

I've tried to add these into the SKIP_LIST as a pattern but for some reason this doesn't seem to work.

To Reproduce
My config.json contains the following for SKIP_LIST and I've tried a number of different combinations.

"SKIP_LIST": ["/Plex Versions/", "Remux", "remux"]

Expected behavior
I would expect nothing to be identified as a dupe when I have an encode and a remux.

e.g.

  • My.Film.Remux.1080p.AVC.AtmosTrueHD.8ch.mkv
  • My.Film.WEB-DL.720p.AVC.AC3.6ch.mkv

Instead these are both flagged and I'm prompted to pick one to keep.
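
For illustration only, the expected behaviour could be sketched as follows: drop any file whose path contains a skip pattern (case-insensitively), then treat the item as a dupe only if more than one file remains. This does not describe how SKIP_LIST is actually handled by the script, and both function names are hypothetical.

```python
def filter_skipped(file_paths, skip_list):
    """Remove files whose path contains any skip pattern, ignoring case."""
    patterns = [pattern.lower() for pattern in skip_list]
    return [
        path
        for path in file_paths
        if not any(pattern in path.lower() for pattern in patterns)
    ]


def is_duplicate_set(file_paths, skip_list):
    """A set counts as duplicates only if two or more files survive the skip list."""
    return len(filter_skipped(file_paths, skip_list)) > 1
```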

System Information

  • Plex DupeFinder Version: Master [900d4a0]
  • Operating System: Ubuntu Server 18.04 LTS

Crash when parsing "…" character

Describe the bug
Plex_Dupefinder can't handle the movie Once Upon a Time… in Hollywood. The "…" character generates the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 59: ordinal not in range(128)

To Reproduce
Steps to reproduce the behavior:

  1. Have two files named with the full movie title
  2. Run plex_dupefinder

Expected behavior
Normal functionality

Logs

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 47: ordinal not in range(128)
Call stack:
  File "/usr/local/bin/plex_dupefinder", line 366, in <module>
    log.info("Processing: %r", title)
Message: 'Processing: %r'
Arguments: ('Once Upon a Time\u2026 in Hollywood',)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 339: ordinal not in range(128)
Call stack:
  File "/usr/local/bin/plex_dupefinder", line 375, in <module>
    part_info)
Message: 'ID: %r - Score: %s - Meta:\n%r'
Arguments: (19252, 290870, {'id': 19252, 'video_bitrate': 15811, 'audio_codec': 'dca-ma', 'audio_channels': 6, 'video_codec': 'h264', 'video_resolution': '1080', 'video_width': 1920, 'video_height': 800, 'video_duration': 9689686, 'file': ['/p/movies/Once Upon a Time in Hollywood (2019)/Once Upon a Time\u2026 in Hollywood (2019) Bluray-1080p.mkv'], 'multipart': False, 'file_size': 19151003257, 'score': 290870, 'show_key': '/library/metadata/14156'})
Found 4 dupes for section 'Movies 4K'
Traceback (most recent call last):
  File "/usr/local/bin/plex_dupefinder", line 385, in <module>
    print("\nWhich media item do you wish to keep for %r ?\n" % item)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 59: ordinal not in range(128)
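
The logging errors above come from the stream's ASCII encoding rejecting "…". One way to keep logging working, sketched here with a hypothetical helper, is to hand the logging handler a text stream that escapes unencodable characters instead of raising.

```python
import io
import logging


def make_escaping_handler(binary_stream, encoding="ascii"):
    """Create a StreamHandler that escapes characters the encoding lacks."""
    text_stream = io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",
        line_buffering=True,
    )
    return logging.StreamHandler(text_stream)
```

Setting the `PYTHONIOENCODING=utf-8` environment variable before launching the script is another common way to avoid this class of error when the terminal itself can display UTF-8.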

File size score - Ascending or Descending

This is a cool tool, nice work!

Describe the problem
I sometimes see duplicates of H265 videos and I would like to keep the smaller file.

Describe any solutions you think might work
Implement an ASC or DESC (large/small) option for the file size score so that it's possible for smaller files to get a larger score.
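
A sketch of what such a switch could look like. The `prefer_smaller` flag and the `bytes_per_point` divisor are hypothetical, and the way the script actually converts file size into a score is not shown on this page.

```python
def filesize_score(file_size_bytes, prefer_smaller=False, bytes_per_point=100000):
    """Turn a file size into a score, optionally rewarding smaller files."""
    points = file_size_bytes // bytes_per_point
    # Negating the points makes the smaller file rank higher.
    return -points if prefer_smaller else points
```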

Restart from where you left off

Describe the problem
If you have hundreds of duplicates and need to skip over a bunch but then stop, you have to go through the whole list again and skip them a second time.

Describe any solutions you think might work
A method that saves the current position in the file list, and the ability to restart from that point.
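
One way this could be sketched is a small JSON file that records which items have already been handled, so a later run can filter them out. The file format, the use of item keys as identifiers, and all three function names are assumptions.

```python
import json
import os


def load_done(state_path):
    """Return the set of item keys handled in earlier runs."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as fp:
        return set(json.load(fp))


def save_done(state_path, done):
    """Persist the handled item keys as a sorted JSON list."""
    with open(state_path, "w", encoding="utf-8") as fp:
        json.dump(sorted(done), fp)


def pending(item_keys, done):
    """Keep only the items that still need a decision, in original order."""
    return [key for key in item_keys if key not in done]
```

Tracking handled keys rather than a list position keeps the resume point valid even if the duplicate list changes between runs.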

