Comments (12)
While we're investigating this issue, can you try the Leptonica methods that determine the skew angle? If they yield more consistent and accurate results, you may want to go that route; however, the image format conversion, from Java BufferedImage to Leptonica Pix and back, will incur some overhead. Please do some analysis, and submit a PR if needed. Thanks.
http://tess4j.sourceforge.net/docs/index.html
from tess4j.
Could it be that the image has some invisible artifacts (lines) that skewed the results?
The existing Java native method has been in use since the library's inception, and no one has complained about it.
from tess4j.
I attached the file. No lines that I could see. Again, the document is not skewed at all. Not sure if anyone has ever tried it on an image like that. Do you see the same results?
I can try the Leptonica library. What exactly is that for?
I see the findSkew() method. I assume that pangle is the skew angle it finds. But what is pconf?
from tess4j.
The lines may be invisible to human eyes.
Leptonica is the image processing library that Tesseract directly depends on. You will need to consult its documentation for usage.
https://tpgit.github.io/Leptonica/skew_8c.html
from tess4j.
I don't think that's likely. Are you able to try it and determine if this is a bug? I thought this was a place I could get support.
from tess4j.
I tried your image in the VietOCR GUI. Deskewing the entire image did incorrectly skew it. If I split it in half (top/bottom) and trim the empty space, it works correctly. The large empty space between the header and footer seems to have thrown it off.
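The split-and-trim workaround described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `TextBandSplitter`, the method `findBands`, the luminance threshold of 128, and the `minGap` parameter are all hypothetical and are not part of Tess4J or Lept4J. It assumes dark text on a light background and returns one tight bounding box per block of text, splitting wherever the run of blank rows is at least `minGap` rows tall.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Tess4J or Lept4J. */
final class TextBandSplitter {

    /** True if the pixel's luminance (0-255) is below the threshold. */
    static boolean isInk(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000 < threshold;
    }

    /**
     * Returns the tight bounding box of each block of rows that contain ink.
     * Blocks separated by fewer than minGap blank rows are merged into one.
     */
    static List<Rectangle> findBands(BufferedImage img, int threshold, int minGap) {
        List<Rectangle> bands = new ArrayList<>();
        int w = img.getWidth(), h = img.getHeight();
        int top = -1, lastInkRow = -1, left = w, right = -1;
        for (int y = 0; y < h; y++) {
            int rowLeft = -1, rowRight = -1;
            for (int x = 0; x < w; x++) {
                if (isInk(img, x, y, threshold)) {
                    if (rowLeft < 0) rowLeft = x;
                    rowRight = x;
                }
            }
            if (rowLeft < 0) continue; // blank row, nothing to record
            // Enough blank rows since the last ink row: close the current band.
            if (top >= 0 && y - lastInkRow - 1 >= minGap) {
                bands.add(new Rectangle(left, top, right - left + 1, lastInkRow - top + 1));
                top = -1; left = w; right = -1;
            }
            if (top < 0) top = y;
            left = Math.min(left, rowLeft);
            right = Math.max(right, rowRight);
            lastInkRow = y;
        }
        if (top >= 0) {
            bands.add(new Rectangle(left, top, right - left + 1, lastInkRow - top + 1));
        }
        return bands;
    }

    public static void main(String[] args) {
        // Synthetic page: white background with a "header" and a "footer" block.
        BufferedImage page = new BufferedImage(100, 200, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 200; y++)
            for (int x = 0; x < 100; x++)
                page.setRGB(x, y, 0xFFFFFF);
        for (int y = 10; y < 20; y++)
            for (int x = 20; x < 80; x++)
                page.setRGB(x, y, 0x000000);
        for (int y = 170; y < 180; y++)
            for (int x = 30; x < 60; x++)
                page.setRGB(x, y, 0x000000);
        for (Rectangle band : findBands(page, 128, 50)) {
            // Each band could be cropped and deskewed on its own.
            BufferedImage crop = page.getSubimage(band.x, band.y, band.width, band.height);
            System.out.println(band + " -> " + crop.getWidth() + "x" + crop.getHeight());
        }
    }
}
```

Each cropped band could then be passed to the skew estimation separately, so the large blank area between header and footer no longer takes part in the measurement. Whether that gives a better angle on real scans would still need to be checked against the actual images.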
from tess4j.
Yes, that is exactly my point. Is there anything I can do to improve this? I have several images like this (pages from a PDF file) that have an address at the top or a few other lines of text, with a lot of white space. But the lines of text are clearly horizontal.
from tess4j.
Try posting your question and images on SO. There are image processing experts there who could help.
Or you may want to try the Leptonica methods first; if need be, post on the Leptonica site for help.
from tess4j.
@peterkronenberg Any luck (better results) with Leptonica methods?
from tess4j.
Sorry, haven't had an opportunity to try it yet.
from tess4j.
Ran a test case for Lept4J:
/**
 * Test of pixFindSkew method, of class Leptonica1.
 */
@Test
public void testPixFindSkew() {
    System.out.println("pixFindSkew");
    File input = new File("C:\\Temp\\samplerotatedimage-Redacted.tif");
    Pix pixs = Leptonica1.pixRead(input.getPath());
    // Binarize the image with a threshold of 128 before measuring skew.
    Pix pix1pp = Leptonica1.pixConvertTo1(pixs, 128);
    // One-element buffers that receive the angle and confidence outputs.
    FloatBuffer pangle = FloatBuffer.allocate(1);
    FloatBuffer pconf = FloatBuffer.allocate(1);
    int expResult = 0;
    int result = Leptonica1.pixFindSkew(pix1pp, pangle, pconf);
    float conf = pconf.get();
    float angle = pangle.get();
    System.out.println("Confidence: " + conf + " Angle: " + angle);
    assertEquals(expResult, result);
}
Output:
Running net.sourceforge.lept4j.Leptonica1Test
pixFindSkew
Confidence: 2.8027375 Angle: 0.21875
Documentation: https://tpgit.github.io/Leptonica/skew_8c.html
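As a rough illustration of how the two output values might be used, the sketch below gates a deskew step on both of them. It follows the linked Leptonica documentation in treating a return value of 0 as success, pangle as the deskew angle in degrees, and pconf as a confidence score. The class `SkewDecision`, the method `shouldDeskew`, and both threshold values are hypothetical assumptions chosen for this example; they do not come from this thread or from the Tess4J API and would need tuning on real documents.

```java
/** Hypothetical helper for illustration; not part of Tess4J or Lept4J. */
final class SkewDecision {

    // Illustrative thresholds (assumptions for this sketch, tune per document set).
    static final float MIN_ANGLE_DEGREES = 0.1f;
    static final float MIN_CONFIDENCE = 3.0f;

    /**
     * Decides whether a measured skew is worth correcting.
     *
     * @param returnCode   value returned by the skew finder (0 is treated as success)
     * @param angleDegrees measured skew angle in degrees
     * @param confidence   confidence score reported with the angle
     */
    static boolean shouldDeskew(int returnCode, float angleDegrees, float confidence) {
        if (returnCode != 0) {
            return false; // measurement failed, leave the image alone
        }
        if (Float.isNaN(angleDegrees) || Float.isNaN(confidence)) {
            return false; // unusable values
        }
        // Rotate only when the angle is non-trivial and the estimate is trusted.
        return Math.abs(angleDegrees) >= MIN_ANGLE_DEGREES
                && confidence >= MIN_CONFIDENCE;
    }

    public static void main(String[] args) {
        // Values taken from the test output above.
        System.out.println(shouldDeskew(0, 0.21875f, 2.8027375f));
    }
}
```

With these particular placeholder thresholds, the measurement reported above would be left uncorrected because its confidence falls below the cutoff. A different cutoff would change that outcome, so the values should be treated as a starting point only.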
from tess4j.
Thanks for trying this out!
from tess4j.