
tesseract-shadow / tesseract-ocr-compilation


Tesseract 4 OCR Compilation - Docker Container

License: Apache License 2.0

Shell 67.38% Dockerfile 32.62%
compilation tesseract-ocr

tesseract-ocr-compilation's People

Contributors

grooverdan, hoto17296, kinjelom


tesseract-ocr-compilation's Issues

SSH into the container is exposed to the world

@wildloop

Because the Dockerfile declares EXPOSE 22 and Docker automatically modifies firewall rules once that port is published, running this container means SSH gets exposed on your public IP.

The Dockerfile also enables root login over SSH (bad) and sets the root password to root (quite bad) 🤦

I recently left this running overnight, and in the morning I found that someone was mining Monero on my machine.

Is SSH really necessary? Why can't we just use docker exec instead?
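
As a rough sketch of the docker exec alternative raised above: the helper below opens a shell in a running container without any SSH daemon, root password, or published port. The function name container_shell and the container name in the usage comment are hypothetical; docker exec -it is the standard Docker CLI form.

```shell
#!/bin/sh
# Open an interactive shell inside a running container via the Docker CLI,
# so no sshd, root password, or published port 22 is needed.
# container_shell is a hypothetical helper name.
container_shell() {
    container="$1"
    if [ -z "$container" ]; then
        echo "usage: container_shell <container-name-or-id>" >&2
        return 2
    fi
    docker exec -it "$container" /bin/bash
}

# Usage (the container name is an assumption):
#   container_shell tesseract-compiler
```

If SSH has to stay for some reason, publishing the port on the loopback address only (for example -p 127.0.0.1:4022:22) should keep it off the public interface.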

Calling tesseract from ruby app gives "sh: tesseract: not found"

I am using an adapted version of the Dockerfile provided in this repo, with my app added. When I call tesseract from my Ruby app, I get "sh: tesseract: not found". My Dockerfile looks like this and builds fine:

` FROM ubuntu:16.04

RUN apt-get update && apt-get install -y \
    autoconf \
    autoconf-archive \
    automake \
    build-essential \
    checkinstall \
    cmake \
    g++ \
    git \
    libcairo2-dev \
    libicu-dev \
    libjpeg8-dev \
    libpango1.0-dev \
    libpng12-dev \
    libtiff5-dev \
    libtool \
    pkg-config \
    wget \
    xzgv \
    zlib1g-dev


# SSH for diagnostic
RUN apt-get update && apt-get install -y --allow-downgrades --allow-remove-essential --allow-change-held-packages openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:root' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

# Directories
ENV SCRIPTS_DIR /home/scripts
ENV PKG_DIR /home/pkg
ENV BASE_DIR /home/workspace
ENV LEP_REPO_URL https://github.com/DanBloomberg/leptonica.git
ENV LEP_SRC_DIR ${BASE_DIR}/leptonica
ENV TES_REPO_URL https://github.com/tesseract-ocr/tesseract.git
ENV TES_SRC_DIR ${BASE_DIR}/tesseract
ENV TESSDATA_PREFIX /usr/local/share/tessdata

RUN mkdir ${SCRIPTS_DIR}
RUN mkdir ${PKG_DIR}
RUN mkdir ${BASE_DIR}
RUN mkdir ${TESSDATA_PREFIX}

COPY ./container-scripts/* ${SCRIPTS_DIR}/
RUN chmod +x ${SCRIPTS_DIR}/*
RUN ${SCRIPTS_DIR}/repos_clone.sh
RUN ${SCRIPTS_DIR}/tessdata_download.sh

RUN groupadd -r tesseract && useradd -r -g tesseract tesseract
USER tesseract

FROM iron/ruby

WORKDIR /app

ADD . /app
ADD ./bin/textcleaner /usr/local/bin

ENTRYPOINT ["ruby", "app.rb"]`

My app looks like this:

`
require 'sinatra'
require 'fileutils'
require "carrierwave"
require 'carrierwave/datamapper'
require "carrierwave/orm/activerecord"
require_relative 'models/image'
require_relative 'data_mapper_setup'

set :protection, except: [ :json_csrf ]

port = ENV['PORT'] || 8080
puts "STARTING SINATRA on port #{port}"
set :port, port
set :bind, '0.0.0.0'

CarrierWave.configure do |config|
  config.root = File.dirname(__FILE__)
end

get '/' do
  ({"Hello" => "World!"}).to_json
end

post '/extractText' do
  begin
    path = File.dirname(__FILE__)
    billID = params[:billID]
    image = Image.new(file: params[:file])
    file = File.new("#{path}#{image.file.url}")
    system("tesseract #{file} --psm 6 resultsFile.txt")
    results = File.read("resultsFile.txt")
  rescue
    status 402
    return "Error reading image"
  end
  status 200
  return results.to_json
end
`

The error occurs on this line, after the file has been passed in: system("tesseract #{file} --psm 6 resultsFile.txt")

Any help would be great
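
One thing worth checking, offered as a guess rather than a confirmed diagnosis: the Dockerfile above contains a second FROM iron/ruby line, and in a multi-stage build only the last stage ends up in the final image unless files are copied across with COPY --from. The sketch below is a small check that could be run inside the final container to see whether tesseract is on the PATH at all; require_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a command can be resolved on PATH in the current environment.
# require_cmd is a hypothetical helper name.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1 is not on PATH ($PATH)" >&2
        return 1
    fi
}

# Usage inside the running app container:
#   require_cmd tesseract
```

If the check reports the binary as missing, the Ruby code is probably not at fault and the image itself lacks tesseract.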

Newbie question....

Hello!
I have a virtual machine (Hyper-V) with Lubuntu installed
(Ubuntu 21.10 "impish" base).
I installed Docker and ran scripts 1 to 6. I can do the OCR test (although I had to modify the script 6-test-ocr.sh, because phototest.tif is no longer available at the location specified in the script).
But I cannot connect via SSH to localhost on port 4022. If I understand correctly, the user should be root and the password root, but the connection is always refused. What could be the reason?
Kind regards,
Micha

Error: No "autobuild" file in leptonica source folder

It looks like the "autobuild" file has been removed from the leptonica source, but the compile script still calls it.

# ${SCRIPTS_DIR}/compile_leptonica.sh
... ...
parallel-tests: installing 'config/test-driver'
autoreconf: Leaving directory `.'
/home/scripts/compile_leptonica.sh: line 5: ./autobuild: No such file or directory
make: *** No targets specified and no makefile found.  Stop.

Listing the leptonica folder confirms that the file is gone:

# ls ${LEP_SRC_DIR}
CMakeLists.txt  Makefile.in  aclocal.m4    autom4te.cache  configure     lept.pc.cmake          lok.lua        make-for-local  src              version-notes.html
Doxyfile        README.html  appveyor.yml  cmake           configure.ac  lept.pc.in             m4             moller52.jpg    style-guide.txt
Makefile.am     README.md    autogen.sh    config          cppan.yml     leptonica-license.txt  make-for-auto  prog            sw.cpp

Removing the && ./autobuild command from that script should safely fix the issue.
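
The edit described above could be scripted roughly as follows. This is a sketch that assumes the call appears in the script chained as "&& ./autobuild", as the issue text suggests; the actual contents of compile_leptonica.sh are not shown here, and strip_autobuild is a hypothetical helper name.

```shell
#!/bin/sh
# Strip a chained "&& ./autobuild" call from a build script, since the
# leptonica checkout listed above no longer ships that file.
# strip_autobuild is a hypothetical helper name.
strip_autobuild() {
    script="$1"
    tmp="$(mktemp)" || return 1
    # Remove the chained call together with the whitespace around "&&".
    sed 's|[[:space:]]*&&[[:space:]]*\./autobuild||' "$script" > "$tmp" || return 1
    # Write back into the original file so it keeps its executable bit.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Usage inside the container (path taken from the error message above):
#   strip_autobuild /home/scripts/compile_leptonica.sh
```

Writing the result back with cat rather than mv is deliberate: the script was made executable with chmod +x during the image build, and replacing the file would drop that permission.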
