drc's People

Contributors

claesn, fsteeg, matana

drc's Issues

Design and implement user profiles

  • What information is required (name, contact, location, access rights, age, language, region)?
  • How to store user data?
  • What else do we need to gather useful statistics?

Evaluate Mylyn WikiText editor for OCR result editing

Some related questions:

  • How will we transform the raw text to basic wiki markup (and which markup language should we use)?
  • Should we store the markup in the index or preprocess before displaying?
  • How does the WikiText editor work with e4/RAP?

This basically depends on #5 (index) and #8 (highlighting).

Store OCR results in search index

Based on our plan to use Lucene, we could:

  • Use Solr to store the OCR result PDF directly
  • Get the raw text out of the OCR result PDF with iText and store that

This issue is related to getting and storing positional information, which needs to be associated with the texts stored in the index.

Get positional information from OCR result PDFs

The OCR result PDFs contain line-based position information. It could be retrieved using regular expressions or line-based parsing (which is probably easier here).

The positional information should be associated with the actual index (#5).
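
The line-based parsing idea could be sketched roughly as follows. This is only an illustration: the issue does not say how positions are encoded in the PDFs, so the input format here (one record per line, "<x> <y> <text>") and the names LinePositionParser and PositionedLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LinePositionParser {

    /** One OCR text line together with the position where it starts. */
    public static final class PositionedLine {
        public final float x;
        public final float y;
        public final String text;

        PositionedLine(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Parses a hypothetical dump with one record per line: "<x> <y> <text>".
     * The real encoding in the OCR result PDFs may differ; this format is an
     * assumption made for the sketch. Lines of any other shape are skipped.
     */
    static List<PositionedLine> parse(String dump) {
        List<PositionedLine> result = new ArrayList<>();
        for (String raw : dump.split("\\R")) {
            // Split into at most three parts so the text keeps its inner spaces.
            String[] parts = raw.trim().split("\\s+", 3);
            if (parts.length < 3) {
                continue;
            }
            try {
                float x = Float.parseFloat(parts[0]);
                float y = Float.parseFloat(parts[1]);
                result.add(new PositionedLine(x, y, parts[2]));
            } catch (NumberFormatException e) {
                // Not a positioned text line; ignore it.
            }
        }
        return result;
    }
}
```

Each parsed entry keeps the line text next to its start position, which is the pairing that would have to be carried over into the index (#5).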

Find a solution for tilted lines

As we only have starting positions of lines, we get wrong highlighting for words distant from the line start if the lines are tilted.
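
To get a feeling for the size of the error, here is a small sketch of the geometry. It assumes that a tilt angle for the line is available or can be estimated, which the issue does not promise (only line starting positions are known); the class and method names are hypothetical.

```java
final class TiltCorrection {

    /**
     * Vertical displacement of a point that lies dx units to the right of the
     * line start, for a line tilted by tiltDegrees (counter-clockwise, with
     * the y axis pointing up).
     */
    static double verticalOffset(double dx, double tiltDegrees) {
        return dx * Math.tan(Math.toRadians(tiltDegrees));
    }

    /**
     * Estimated y position of a word on a tilted line, derived from the line
     * start position and an assumed tilt angle.
     */
    static double wordY(double lineStartX, double lineStartY,
                        double wordX, double tiltDegrees) {
        return lineStartY + verticalOffset(wordX - lineStartX, tiltDegrees);
    }
}
```

With a tilt of only one degree, a word 400 units from the line start is already displaced by about 7 units vertically, which shows why highlighting drifts for words far from the line start.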
