radhetians / classification-and-analysis-of-unstructured-data-for-ubuntu

This is a tool for Ubuntu 19.04 machines that reads data from files available in multiple SoRs (Systems of Record) and suggests what type of record each file is, how old it is, and when a user last accessed it. It also extracts file metadata.

Home Page: https://github.com/RadheTians/Classification-And-Analysis-of-Unstructured-Data-For-Ubuntu

License: MIT License

Python 100.00%

Classification-And-Analysis-of-Unstructured-Data-For-Ubuntu

Project statement:

Classification and analysis of unstructured data: develop a tool/system that reads data from files available in multiple SoRs (Systems of Record) and suggests what type of record each file is, how old it is, and when a user last accessed it. The tool should also extract file metadata, etc. (Ref. https://patents.google.com/patent/US8266148B2)

Terabytes of unstructured data are available in various storage systems such as servers, local drives, and online portals. This data comes in the forms below, but is not limited to them: MS Office formats (.xlsx, .docx, .pptx, .txt, .msg, etc.); PDF (scanned and text-recognizable); images (.jpg, .png, etc.); engineering drawings (.dwg, .dgn, etc.); databases (.accdb, .dgf, .xml, etc.); and non-engineering documents such as emails, presentations, photos, videos, and circulars.

Hints: Metadata-based classification using MongoDB
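The three questions in the project statement (what type of record, how old, when last accessed) can be answered for a single file from the filesystem alone. The following is a minimal sketch, not the tool's actual code: the function name `describe_file`, the `CATEGORY_BY_EXTENSION` table, and the category labels are hypothetical, and the modification time is assumed to be an acceptable proxy for a file's age.

```python
import os
import time

# Hypothetical mapping from file extension to a record category. The
# extensions come from the project statement; the labels are assumptions.
CATEGORY_BY_EXTENSION = {
    ".xlsx": "office", ".docx": "office", ".pptx": "office",
    ".txt": "office", ".msg": "office",
    ".pdf": "pdf",
    ".jpg": "image", ".png": "image",
    ".dwg": "engineering drawing", ".dgn": "engineering drawing",
    ".accdb": "database", ".xml": "database",
}


def describe_file(path, now=None):
    """Return record type, age, and last-access information for one file."""
    now = time.time() if now is None else now
    info = os.stat(path)
    extension = os.path.splitext(path)[1].lower()
    return {
        "path": path,
        "category": CATEGORY_BY_EXTENSION.get(extension, "unknown"),
        # st_mtime (last content modification) stands in for "how old",
        # because a true creation time is not portable across filesystems.
        "age_days": (now - info.st_mtime) / 86400.0,
        "last_accessed": time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(info.st_atime)
        ),
    }
```

On Linux, the recorded access time depends on mount options such as relatime, so `last_accessed` may lag behind the most recent read.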

META-DATA EXTRACTOR

META-DATA EXTRACTOR is a tool for the classification and analysis of unstructured data. It extracts the metadata of the files that make up the unstructured data, using MongoDB.

This enables metadata-based classification using MongoDB.
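One way metadata-based classification with MongoDB could look is sketched below. It is an illustration under stated assumptions, not the tool's implementation: the function names, the `metadata_db` database, the `files` collection, and the rule "category = top-level part of the MIME type" are all hypothetical. The `MIMEType` key is the tag name exiftool uses in its output; `insert_many` and `find` are standard pymongo collection methods.

```python
def categorize(metadata):
    """Return a copy of a metadata dict with a coarse 'category' field.

    Assumption: the category is the top-level part of the MIME type,
    e.g. 'image/jpeg' -> 'image'. Real classification rules may differ.
    """
    mime = metadata.get("MIMEType", "")
    record = dict(metadata)
    record["category"] = mime.split("/")[0] if mime else "unknown"
    return record


def store_records(collection, metadata_list):
    """Categorize each metadata dict and insert the results in one batch."""
    records = [categorize(m) for m in metadata_list]
    if records:  # insert_many rejects an empty list
        collection.insert_many(records)
    return len(records)


def find_by_category(collection, category):
    """Fetch every stored record that was assigned the given category."""
    return list(collection.find({"category": category}))


def connect(uri="mongodb://localhost:27017", db="metadata_db", name="files"):
    """Open a collection; the URI, database, and collection are placeholders."""
    from pymongo import MongoClient  # imported lazily so the helpers above
    return MongoClient(uri)[db][name]  # work without a running server
```

With a reachable MongoDB instance, `store_records(connect(), metadata_list)` followed by `find_by_category(connect(), "image")` would return the image records.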

USAGE

Python2, PyQt5, MongoDB cluster.

REQUIREMENT

Environment:

The source code of this tool is written in Python2, and its GUI runs in a PyQt5 environment.

Packages:

subprocess, os, pymongo.

Command Line Interface (CLI):

exiftool.
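The listed requirements suggest that exiftool is invoked through the subprocess module. A minimal sketch of such a call follows; the function names are hypothetical and the tool's own invocation may differ. The `-json` option asks exiftool for JSON output, which is a list holding one object per input file.

```python
import json
import subprocess


def exiftool_command(paths):
    """Build the argument list; -json requests machine-readable output."""
    return ["exiftool", "-json"] + list(paths)


def parse_exiftool_output(raw):
    """Decode exiftool's JSON output into a list of metadata dicts."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def extract_metadata(paths):
    """Run exiftool on the given files and return their metadata.

    Raises subprocess.CalledProcessError if exiftool exits with an error,
    and OSError if the exiftool executable is not installed.
    """
    raw = subprocess.check_output(exiftool_command(paths))
    return parse_exiftool_output(raw)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.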

INSTALLATION

Use the package manager pip to install pymongo, and set up the environment for Python2. The subprocess and os modules are part of the Python standard library and do not need to be installed.

Packages:

$ pip install pymongo

Python2 environment:

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install python2.7 python-pip

Setting up the PyQt5 environment for Python2:

$ sudo apt-get install python-pyqt5

Installing the command line interface (CLI) exiftool:

$ sudo apt install libimage-exiftool-perl

USER INSTRUCTIONS

  1. Open a terminal (Ctrl+Alt+T) and change to the directory that contains mdata.py, i.e. mini.
$ cd /'PATH'/mini
  2. Start the tool by executing mdata.py.
$ python mdata.py
  3. Enter the pathname for metadata extraction in the tool, then click the button for the type of sorted output you want.
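The internals of mdata.py are not shown here, so the following is only a sketch of what "enter a pathname, get sorted output" could involve: walking the directory tree under the given path, collecting one record per file, and sorting by a chosen field. The function names and record fields are hypothetical.

```python
import os


def collect_records(root):
    """Walk the tree under root and gather basic facts about every file."""
    records = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            info = os.stat(path)
            records.append({
                "path": path,
                "extension": os.path.splitext(filename)[1].lower(),
                "size": info.st_size,
                "modified": info.st_mtime,   # seconds since the epoch
                "accessed": info.st_atime,   # seconds since the epoch
            })
    return records


def sorted_records(records, key, newest_first=False):
    """Return the records ordered by one field, e.g. 'modified' or 'size'."""
    return sorted(records, key=lambda r: r[key], reverse=newest_first)
```

For example, `sorted_records(collect_records("/some/path"), "accessed", newest_first=True)` lists the most recently accessed files first; each button in a GUI could map to one such sort key.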
