husseinyoussef / arabic-ocr
OCR system for Arabic language that converts images of typed text to machine-encoded text.
License: MIT License
Hey Hussein, you have made a good project, but I want to ask one thing: how do I train the model on Arabic images or handwritten images, and how did you train the model?
Answers are appreciated :)
Thank you.
Hello,
I tried to run the training but I get the error below:
File "train.py", line 214, in <module>
train()
File "train.py", line 143, in train
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8)
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2120, in train_test_split
default_test_size=0.25)
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
train_size)
ValueError: With n_samples=0, test_size=None and train_size=0.8, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Please advise!
Thank you in advance.
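The `n_samples=0` in the error message means that `X` was empty when it reached `train_test_split`, so no training samples were loaded. A minimal sketch of a check that could run before training, assuming (hypothetically) that the dataset is a folder of image files. The function names and the extension list are placeholders, not the repository's actual layout:

```python
from pathlib import Path

# Hypothetical extensions; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images(dataset_dir):
    """Return the number of image files found under dataset_dir (recursively)."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def require_samples(dataset_dir):
    """Raise a readable error instead of letting train_test_split fail on empty input."""
    n = count_images(dataset_dir)
    if n == 0:
        raise FileNotFoundError(
            f"No images found under {dataset_dir!r}; check the dataset path before training."
        )
    return n
```

Calling `require_samples(...)` on the dataset folder before the split would surface a wrong or empty dataset path directly, rather than as a `ValueError` deep inside scikit-learn.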
How do I train the model?
Challenge
I am experiencing an issue with the ocr.py file in my cloned repository. After following the provided instructions and placing the images I want to convert to text in the test folder, I ran the ocr.py file, but no output appears in the "truth" folder or anywhere else.
Steps to Reproduce
Clone the repository.
Follow the instructions to prepare the images in the test folder.
Run the ocr.py file using python path/ocr.py
Observe that no output is generated in the "truth" folder
Hey, when I load one of your test images I get this error. Any idea how to solve it?
After reading all images from the dataset, the training process is stuck on scores[ ] Pdb. Could you explain whether the training is still running in the background, or whether the debugger is unable to continue the process? Also, how long will it take to train the models?
Hey man
I want to ask you a few questions
Do you have a Skype account or anything like that where I can message you?
I will not take a lot of your time.
Thanks in advance
I just looked at the datasets that you used and found that they only contain computer-typed text, not text handwritten by a human. Is this correct?
Hi,
During installation it says:
pip3 install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /home/mohsen/.local/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (1.18.3)
Requirement already satisfied: opencv-python in /home/mohsen/.local/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (4.2.0.34)
Requirement already satisfied: scikit-learn in /usr/lib64/python3.6/site-packages (from -r requirements.txt (line 3)) (0.22.2.post1)
Requirement already satisfied: scikit-image in /home/mohsen/.local/lib/python3.6/site-packages (from -r requirements.txt (line 4)) (0.16.2)
Requirement already satisfied: pandas in /home/mohsen/.local/lib/python3.6/site-packages (from -r requirements.txt (line 5)) (0.25.3)
Requirement already satisfied: matplotlib in /home/mohsen/.local/lib/python3.6/site-packages (from -r requirements.txt (line 6)) (3.2.1)
Collecting multiprocessing
Using cached multiprocessing-2.6.2.1.tar.gz (108 kB)
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-oe37fbft/multiprocessing/setup.py'"'"'; __file__='"'"'/tmp/pip-install-oe37fbft/multiprocessing/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-oe37fbft/multiprocessing/pip-egg-info
cwd: /tmp/pip-install-oe37fbft/multiprocessing/
Complete output (6 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-oe37fbft/multiprocessing/setup.py", line 94
print 'Macros:'
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Macros:')?
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
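The `multiprocessing` package on PyPI is a backport for Python 2, which is why its `setup.py` fails on a Python 2 `print` statement. On Python 3 the module is part of the standard library and does not need to be installed. One possible workaround, sketched below, is to drop that line from `requirements.txt` before installing. The helper name and the set of skipped names are illustrative assumptions:

```python
import re

# Names that ship with Python 3 and should not be installed from PyPI.
# This set is an illustrative assumption; extend it as needed.
STDLIB_NAMES = {"multiprocessing"}


def strip_stdlib_requirements(lines):
    """Return requirement lines with standard-library package names removed."""
    kept = []
    for line in lines:
        # The package name is everything before a version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0].lower()
        if name in STDLIB_NAMES:
            continue
        kept.append(line)
    return kept
```

Applied to the lines of `requirements.txt`, this keeps numpy, opencv-python, and the other third-party packages and drops only `multiprocessing`. Deleting that one line by hand has the same effect.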
I'm trying to use this code, but I get empty output. I think there is something wrong with my image, but I'm not sure. Can you provide a sample test image that works?
I have tried to detect Arabic text in Aljazeera news snapshots, but the code failed to read the file. The code is restricted to reading 8-bit depth JPEG or PNG images. However, I have tried a 24-bit depth JPEG image and an 8-bit bitmap image (I added .bmp to the list) and still get the same uint error. Does your code support only a specific Arabic font? And how can I resolve this bit depth issue? Please advise, thanks.
C:\Users\AMR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator MLPClassifier from version 0.22 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
0%| | 0/3 [00:02<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\AMR\Scrapy_projects\Arabic-OCR\Arabic-OCR\src\OCR.py", line 23, in run2
char_imgs = segment(line, word)
File "C:\Users\AMR\Scrapy_projects\Arabic-OCR\Arabic-OCR\src\character_segmentation.py", line 659, in segment
valid = filter_regions(binary_word, no_dots_copy, SRL, VP, upper_base, lower_base, MTI, MFV, top_line)
File "C:\Users\AMR\Scrapy_projects\Arabic-OCR\Arabic-OCR\src\character_segmentation.py", line 401, in filter_regions
cc, l = cv.connectedComponents(1-(no_dots_copy[:, end_idx:start_idx+1]), connectivity=4)
cv2.error: Unknown C++ exception from OpenCV code
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\AMR\Scrapy_projects\Arabic-OCR\Arabic-OCR\src\OCR.py", line 98, in <module>
running_time.append(run(images_path))
File "C:\Users\AMR\Scrapy_projects\Arabic-OCR\Arabic-OCR\src\OCR.py", line 47, in run
predicted_words = pool.map(run2, words)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\multiprocessing\pool.py", line 771, in get
raise self._value
cv2.error: Unknown C++ exception from OpenCV code
Hello,
Thanks for the great tool. I am trying to build this on an M1 MacBook but I get errors during installation:
error: the clang compiler does not support 'faltivec', please use -maltivec and include altivec.h explicitly
I tried adding a Dockerfile but still got the same issue:
FROM python:3.8-slim-buster
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python", "OCR.py"]