joao-conde / feup-lapd

Repository hosting the project for Markup Languages and Document Processing, a fourth-year course at FEUP

Python 87.55% HTML 11.06% Dockerfile 1.39%

feup-lapd's People

Contributors

cyrilico, msramalho

Forkers

msramalho

feup-lapd's Issues

Pitch Rehearsal

I suggest Wednesday afternoon for a rehearsal of the Pitch and live demo.

Public

Make this beauty public

Avoid duplicate datasets

In structure_the_unstructured, a new dataset is always created, even when it duplicates an existing one. This should be handled by, for instance, hashing all the files in the dataset folder and checking whether that hash already exists. The check should also be optional, enabled through a boolean value in the argparse options (see the --verbose case).
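
A minimal sketch of the hashing idea, assuming SHA-256 over every file in the dataset folder. The function names and the --skip-duplicates flag are hypothetical, not taken from the project; only --verbose is mentioned in the issue.

```python
import argparse
import hashlib
from pathlib import Path


def hash_dataset_folder(folder):
    """Return one SHA-256 digest covering every file under `folder`."""
    digest = hashlib.sha256()
    root = Path(folder)
    # Sort so the digest does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so renamed files change the digest.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def is_duplicate(folder, known_hashes):
    """True when the folder's digest is already in `known_hashes`."""
    return hash_dataset_folder(folder) in known_hashes


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")
    # Hypothetical flag name; off by default so the check stays optional.
    parser.add_argument("--skip-duplicates", action="store_true")
    return parser
```

The digest of each created dataset would need to be stored somewhere (a file next to the dataset, or a field in the database) so that later runs can compare against it.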

Implement Metrics, according to the script

This involves updating the extract_metrics method in the sensor class and, if necessary, implementing it in child classes that call the parent method. The metrics are those listed in the report and those mentioned in the FRAUNHOFER/FH meeting.
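
One possible shape for this, as a sketch only: a base class with shared metrics and a child class that extends them through super(). The class names other than the sensor class, the readings attribute, and the metrics themselves (count, mean, peak) are placeholders; the real metrics are the ones defined in the report.

```python
from statistics import mean


class Sensor:
    """Base class; `readings` is a hypothetical list of numeric samples."""

    def __init__(self, readings):
        self.readings = list(readings)

    def extract_metrics(self):
        # Placeholder metrics only; the real ones come from the report.
        if not self.readings:
            return {"count": 0}
        return {"count": len(self.readings), "mean": mean(self.readings)}


class Accelerometer(Sensor):
    """Hypothetical child class that extends the shared metrics."""

    def extract_metrics(self):
        # Reuse the parent's metrics, then add sensor-specific ones.
        metrics = super().extract_metrics()
        if self.readings:
            metrics["peak"] = max(abs(r) for r in self.readings)
        return metrics
```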

Correct ID on inserted documents

Sample code:

from bson.binary import JAVA_LEGACY
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from uuid import uuid4

DB_NAME = "demdata_db"  # the database name in our case

client = MongoClient(...)  # connection arguments elided
# JAVA_LEGACY makes the driver store UUIDs as BinData subtype 3
db = client.get_database(DB_NAME, codec_options=CodecOptions(uuid_representation=JAVA_LEGACY))
# When inserting new documents into the DB, set the _id property as done here
inserted_id = db.cenas.insert_one({'_id': uuid4(), 'other': '...'}).inserted_id
print(inserted_id)  # printed as a UUID (converted by the driver); in the mongo shell it appears as BinData(3, ...)

Manual Documentation Cleanup

The README.md file of each "subproduct" (api, app) needs a cleanup, and the packages in use need a review so that only the strictly necessary ones remain:

  • review the README instructions for installation/deployment
  • ensure only required packages/modules are specified in the requirements/packages

Setup with pymongo

  • Improve the README instructions for deploying the db and state on which port to access it
  • make a PoC of a document inserted in a test-collection in the database
  • tell the others
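
The proof of concept could look roughly like the sketch below. The host, port, database name, and collection name are assumptions (27017 is only MongoDB's default port, and the project's actual port is not stated here), and insert_poc_document is a hypothetical helper.

```python
def insert_poc_document(collection):
    """Insert one throwaway document and read it back by its _id."""
    result = collection.insert_one({"poc": True, "note": "inserted via pymongo"})
    return collection.find_one({"_id": result.inserted_id})


def main():
    # Imported here so the helper above can be exercised without a driver.
    from pymongo import MongoClient

    # Assumed host and port; 27017 is MongoDB's default.
    client = MongoClient("localhost", 27017)
    collection = client["test_db"]["test_collection"]  # hypothetical names
    print(insert_poc_document(collection))
```

Calling main() with the database container running should print the stored document, including the _id generated by the driver.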

Give context of the user in dispatcher

Because we process users separately, the message "No dispatcher found for File.txt" or "Dispatcher found" appears N times, where N is the number of users. Maybe give the user context in each message, or show it only once?

[Image: Screenshot from 2019-05-09 18-19-50]
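
Both options could be sketched as follows, assuming the messages go through Python's logging module. The report_dispatcher function, its parameters, and the logger name are hypothetical and not taken from the project.

```python
import logging

logger = logging.getLogger("dispatcher")
_already_reported = set()


def report_dispatcher(filename, found, user=None, once=False):
    """Log the dispatcher lookup result; return True if a line was logged."""
    message = "Dispatcher found" if found else "No dispatcher found"
    message = f"{message} for {filename}"
    if once:
        # Show each (file, outcome) pair a single time across all users.
        key = (filename, found)
        if key in _already_reported:
            return False
        _already_reported.add(key)
    elif user is not None:
        # Otherwise prefix the user so repeated lines can be told apart.
        message = f"[user={user}] {message}"
    logger.info(message)
    return True
```

With once=True the second and later users produce no output for the same file; with a user argument every line is kept but becomes attributable.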

Submit

  • Paper
  • Code (without .git, .ARCHIVE, data folders and report folders)
  • updated docs
  • presentation

Sugarcoat with API

  • Extra docker image
  • Small Flask server with some GET routes focusing on metrics/DB expansion
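
A small Flask server along these lines might start from the sketch below. The route paths and the in-memory FAKE_METRICS dictionary are assumptions for illustration; a real version would read the metrics from the database.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for metrics that would come from the DB.
FAKE_METRICS = {"sensor-1": {"count": 3, "mean": 0.5}}


@app.route("/metrics", methods=["GET"])
def list_metrics():
    """Return every known metric set."""
    return jsonify(FAKE_METRICS)


@app.route("/metrics/<sensor_id>", methods=["GET"])
def get_metrics(sensor_id):
    """Return the metrics of one sensor, or 404 if it is unknown."""
    metrics = FAKE_METRICS.get(sensor_id)
    if metrics is None:
        return jsonify({"error": "unknown sensor"}), 404
    return jsonify(metrics)
```

Inside the extra docker image, the server could be started with app.run(host="0.0.0.0") so that the routes are reachable from outside the container.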
