
kairohm / tikatree



License: MIT License



tikatree

Directory tree metadata parser using Apache Tika

tikatree parses every file in a directory and creates:

  • _metadata.json - A single file with the metadata from each file that was parsed
  • _file_tree.json and _file_tree.csv - A list of all files and directories with some basic information. The JSON file is easy to read; the CSV file is for importing into Excel and similar tools
  • _directory_tree.txt - A graphical representation of the directory
  • .sfv - An SFV (Simple File Verification) file containing CRC32 checksums
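An SFV file pairs each file's name with its CRC32 checksum, one file per line. The following is a minimal sketch, assuming Python's standard `zlib.crc32`, of how such lines could be produced. It is not tikatree's own code, and the helper names `crc32_of_file` and `sfv_lines` are hypothetical.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC32 of a file as an 8-digit uppercase hex string."""
    crc = 0
    with open(path, "rb") as fh:
        # Read in chunks so large files never have to fit in memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def sfv_lines(root):
    """Yield 'relative/path CRC32' lines for every file under root (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk visits subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Use forward slashes so the output looks the same on every OS
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            yield f"{rel} {crc32_of_file(full)}"
```

Writing the yielded lines to a text file, one per line, gives the usual SFV layout; tikatree's actual file name and ordering may differ.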

Installation

pip install tikatree

tikatree uses tika-python to talk to Apache Tika. Tika runs on Java, so a Java runtime must be installed. If you run into problems with Tika, refer to the tika-python documentation.
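To illustrate how per-file Tika output could be gathered into a single `_metadata.json`-style mapping, here is a minimal sketch. It assumes tika-python's `parser.from_file`, which returns a dict with a `"metadata"` entry; the function name `collect_metadata`, the injectable `parse` argument, and treating `exclude` entries as paths relative to the root are all hypothetical choices, not tikatree's implementation.

```python
import os


def collect_metadata(root, parse=None, exclude=()):
    """Map each file's path (relative to root) to the metadata Tika reports (hypothetical)."""
    if parse is None:
        # tika-python: parser.from_file(path) returns a dict with a "metadata" key.
        # It contacts (or starts) a Tika server, which needs Java installed.
        from tika import parser
        parse = parser.from_file
    # Assumption: exclusions are directory paths relative to root
    excluded = {os.path.normpath(os.path.join(root, e)) for e in exclude}
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into excluded
        # directories, so their subdirectories are skipped as well
        dirnames[:] = sorted(
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        )
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            parsed = parse(full) or {}
            # Fall back to an empty dict if Tika returned no metadata
            results[rel] = parsed.get("metadata") or {}
    return results
```

Passing the result to `json.dump(results, fh, indent=2)` would produce a single JSON file; injecting `parse` keeps the walk testable without a running Tika server.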

Usage

Open a command line and type tikatree <directory>. By default it creates all of the output files listed above, at or above that directory. To target multiple directories, separate them with spaces on the command line.

usage: tikatree [-h] [-v] [-d] [-e EXCLUDE [EXCLUDE ...]] [-f] [-m] [-nm] [-s] [-y] DIRECTORY [DIRECTORY ...]

A directory tree metadata parser using Apache Tika, by default it runs arguments: -d, -f, -m, -s

positional arguments:
  DIRECTORY             directory(s) to parse

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -d, --directorytree   create directory tree
  -e EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        directory(s) to exclude, includes subdirectories
  -f, --filetree        creates a json and csv file tree
  -m, --metadata        parse metadata
  -nm, --newmetadata    create individual metadata files in a 'tikatree' directory
  -s, --sfv             create sfv file
  -y, --yes             automatically overwrite older files
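The help text above reads like Python `argparse` output. As a hypothetical sketch only (not tikatree's source), a parser with the same options, plus the documented fallback to -d, -f, -m, -s when no output flag is given, could look like this. Whether -nm on its own also disables the defaults is an assumption, and -v is omitted because its version string is not known here.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the tikatree help text."""
    p = argparse.ArgumentParser(prog="tikatree")
    p.add_argument("directory", metavar="DIRECTORY", nargs="+",
                   help="directory(s) to parse")
    p.add_argument("-d", "--directorytree", action="store_true",
                   help="create directory tree")
    p.add_argument("-e", "--exclude", nargs="+", default=[],
                   help="directory(s) to exclude, includes subdirectories")
    p.add_argument("-f", "--filetree", action="store_true",
                   help="creates a json and csv file tree")
    p.add_argument("-m", "--metadata", action="store_true", help="parse metadata")
    p.add_argument("-nm", "--newmetadata", action="store_true",
                   help="create individual metadata files in a 'tikatree' directory")
    p.add_argument("-s", "--sfv", action="store_true", help="create sfv file")
    p.add_argument("-y", "--yes", action="store_true",
                   help="automatically overwrite older files")
    return p


def parse_args(argv):
    args = build_parser().parse_args(argv)
    # No output flag given: fall back to the documented defaults -d, -f, -m, -s
    if not (args.directorytree or args.filetree or args.metadata
            or args.newmetadata or args.sfv):
        args.directorytree = args.filetree = args.metadata = args.sfv = True
    return args
```

In this sketch, -e takes every following argument until the next option, so directories should come before -e (for example `docs -e tmp cache -m`).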

Example

I've included some output examples in the output_examples folder.

Windows Fixes

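When files are parsed too quickly, connections to Apache Tika can fail. To work around this, run these commands in PowerShell as Administrator. They raise the highest port number Windows will hand out for outgoing connections and shorten how long closed connections stay in the TIME_WAIT state to 30 seconds, so the many short-lived connections to Tika are less likely to exhaust the available ports.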

# Registry key that holds the Windows TCP/IP settings
$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
# Raise the highest ephemeral (outgoing) port number available
Set-ItemProperty -Path $KeyPath -Name "MaxUserPort" -Value 65534
# Keep closed connections in TIME_WAIT for 30 seconds
Set-ItemProperty -Path $KeyPath -Name "TcpTimedWaitDelay" -Value 30

Part of the Keep Dreaming Project

Contributors

zeigren
