tesseract-shadow / tesseract-ocr-compilation
Tesseract 4 OCR Compilation - Docker Container
License: Apache License 2.0
@wildloop
Because the Dockerfile runs `EXPOSE 22` and Docker automatically modifies firewall rules, running this container means SSH gets exposed on your public IP.
The Dockerfile also enables root login over SSH (bad) and sets the root password to `root` (quite bad) 🤦
I recently left this running overnight, and in the morning I found out that someone was mining Monero on my machine.
Is SSH really necessary? Why can't we just use `docker exec` instead?
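A minimal sketch of the `docker exec` alternative raised in the question above. The function name `container_shell` and the `DOCKER` override variable are hypothetical conveniences, not part of this repo, and the sketch assumes the image provides `/bin/bash`.

```shell
# Hypothetical helper: open an interactive shell in a running container
# through the Docker daemon instead of an SSH server inside the container.
# DOCKER can be overridden (for example with "echo") to preview the command.
container_shell() {
  "${DOCKER:-docker}" exec -it "$1" /bin/bash
}
```

Called as `container_shell <container-name>`, this needs no published port and no root password inside the container.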
I am using an adapted version of the Dockerfile provided in this repo that adds my app. When I call tesseract from my Ruby app I get `sh: tesseract: not found`. My Dockerfile looks like this and builds fine:
```dockerfile
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
    autoconf \
    autoconf-archive \
    automake \
    build-essential \
    checkinstall \
    cmake \
    g++ \
    git \
    libcairo2-dev \
    libcairo2-dev \
    libicu-dev \
    libicu-dev \
    libjpeg8-dev \
    libjpeg8-dev \
    libpango1.0-dev \
    libpango1.0-dev \
    libpng12-dev \
    libpng12-dev \
    libtiff5-dev \
    libtiff5-dev \
    libtool \
    pkg-config \
    wget \
    xzgv \
    zlib1g-dev
# SSH for diagnostic
RUN apt-get update && apt-get install -y --allow-downgrades --allow-remove-essential --allow-change-held-packages openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:root' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
# Directories
ENV SCRIPTS_DIR /home/scripts
ENV PKG_DIR /home/pkg
ENV BASE_DIR /home/workspace
ENV LEP_REPO_URL https://github.com/DanBloomberg/leptonica.git
ENV LEP_SRC_DIR ${BASE_DIR}/leptonica
ENV TES_REPO_URL https://github.com/tesseract-ocr/tesseract.git
ENV TES_SRC_DIR ${BASE_DIR}/tesseract
ENV TESSDATA_PREFIX /usr/local/share/tessdata
RUN mkdir ${SCRIPTS_DIR}
RUN mkdir ${PKG_DIR}
RUN mkdir ${BASE_DIR}
RUN mkdir ${TESSDATA_PREFIX}
COPY ./container-scripts/* ${SCRIPTS_DIR}/
RUN chmod +x ${SCRIPTS_DIR}/*
RUN ${SCRIPTS_DIR}/repos_clone.sh
RUN ${SCRIPTS_DIR}/tessdata_download.sh
RUN groupadd -r tesseract && useradd -r -g tesseract tesseract
USER tesseract
FROM iron/ruby
WORKDIR /app
ADD . /app
ADD ./bin/textcleaner /usr/local/bin
ENTRYPOINT ["ruby", "app.rb"]
```
My app looks like this:
```ruby
require 'sinatra'
require 'fileutils'
require "carrierwave"
require 'carrierwave/datamapper'
require "carrierwave/orm/activerecord"
require_relative 'models/image'
require_relative 'data_mapper_setup'
set :protection, except: [ :json_csrf ]
port = ENV['PORT'] || 8080
puts "STARTING SINATRA on port #{port}"
set :port, port
set :bind, '0.0.0.0'
CarrierWave.configure do |config|
  config.root = File.dirname(__FILE__)
end
get '/' do
  ({"Hello" => "World!"}).to_json
end
post '/extractText' do
  begin
    path = File.dirname(__FILE__)
    billID = params[:billID]
    image = Image.new(file: params[:file])
    file = File.new("#{path}#{image.file.url}")
    system("tesseract #{file} --psm 6 resultsFile.txt")
    results = File.read("resultsFile.txt")
  rescue
    status 402
    return "Error reading image"
  end
  status 200
  return resultsFile.to_json
end
```
The line where I get the error, after passing the file, is `system("tesseract #{file} --psm 6 resultsFile.txt")`.
Any help would be great.
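The `sh: tesseract: not found` message means the shell started by `system` could not find a `tesseract` binary on its `PATH`. One thing worth checking, offered as an assumption rather than a confirmed diagnosis: each `FROM` line starts a new build stage and the final image comes from the last one, so nothing from the `ubuntu:16.04` stage reaches the `iron/ruby` image unless it is copied across with `COPY --from`. A small sketch for confirming what the final image actually contains; `check_cmd` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a command is reachable on PATH.
# Prints the resolved path when found; returns non-zero when missing.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "missing: $1"
    return 1
  fi
}
```

Running `check_cmd tesseract` in a shell inside the final container shows whether the binary is present before the Ruby app ever calls it.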
Hello!
I have a virtual machine (Hyper-V) with Lubuntu installed (Ubuntu 21.10 "impish" base).
I installed Docker and ran scripts 1 to 6. I can do the OCR test, although I had to modify the script `6-test-ocr.sh` because `phototest.tif` is no longer available at the location specified in the script.
But I cannot connect over SSH to localhost at port 4022. If I understand correctly, it should be user `root` and password `root`, but the connection is always refused. What could be the reason?
Kind regards,
Micha
It looks like the `autobuild` file has already disappeared from the leptonica source, but the compile script is still using it.
```
# ${SCRIPTS_DIR}/compile_leptonica.sh
... ...
parallel-tests: installing 'config/test-driver'
autoreconf: Leaving directory `.'
/home/scripts/compile_leptonica.sh: line 5: ./autobuild: No such file or directory
make: *** No targets specified and no makefile found. Stop.
```
And checking the leptonica folder:
```
# ls ${LEP_SRC_DIR}
CMakeLists.txt Makefile.in aclocal.m4 autom4te.cache configure lept.pc.cmake lok.lua make-for-local src version-notes.html
Doxyfile README.html appveyor.yml cmake configure.ac lept.pc.in m4 moller52.jpg style-guide.txt
Makefile.am README.md autogen.sh config cppan.yml leptonica-license.txt make-for-auto prog sw.cpp
```
Removing the `&& ./autobuild` command in that script should safely fix the issue.
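A minimal sketch of that edit, assuming the `./autobuild` call is chained with `&&` on a single line of the script. The function name `strip_autobuild` is hypothetical, and the actual contents of `compile_leptonica.sh` are not shown in this thread.

```shell
# Hypothetical helper: delete the "&& ./autobuild" step from a script.
# The result is written back with cat so the file keeps its permissions.
strip_autobuild() {
  script="$1"
  tmp="$(mktemp)"
  sed 's/[[:space:]]*&&[[:space:]]*\.\/autobuild//' "$script" > "$tmp" \
    && cat "$tmp" > "$script"
  rm -f "$tmp"
}
```

It would be called as `strip_autobuild "${SCRIPTS_DIR}/compile_leptonica.sh"` before the compile step runs.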
With error:
```
libtool: error: REVISION '00' must be a nonnegative integer
libtool: error: '4:00' is not valid version information
```