Comments (2)
This is an example of a script I wrote to parse the TSV that Tesseract generates:
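import { execFile } from "child_process"
import tempy from "tempy"
import Papa from "papaparse"
import fs from "fs/promises"

// inputFilePath (the image to OCR) and the RChar type are defined elsewhere.
const outputFilePath = tempy.file({ extension: "tsv" })

// Run tesseract with the "tsv" config. Tesseract appends ".tsv" to the
// output base itself, so the extension is stripped from the path passed in.
await new Promise((resolve, reject) => {
  // execFile passes the arguments without going through a shell, so paths
  // containing spaces or shell metacharacters do not break the command.
  execFile(
    "tesseract",
    [
      "--psm", "12", "--oem", "2", "-l", "chi_tra",
      inputFilePath,
      outputFilePath.replace(/\.tsv$/, ""),
      "tsv",
    ],
    (err) => {
      if (err) reject(err)
      else resolve(null)
    }
  )
})

// Parse the TSV using its header row, then convert the numeric columns.
const recognizedChars: Array<RChar> = Papa.parse(
  (await fs.readFile(outputFilePath)).toString(),
  {
    header: true,
    skipEmptyLines: true,
  }
).data.map((a) => ({
  ...a,
  block_num: parseInt(a.block_num, 10),
  left: parseInt(a.left, 10),
  top: parseInt(a.top, 10),
  width: parseInt(a.width, 10),
  height: parseInt(a.height, 10),
}))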
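For anyone who wants to work with the rows without an extra dependency, here is a minimal sketch of the same parsing step. It assumes the TSV has Tesseract's usual header (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) and that level 5 marks word rows. `TsvRow`, `parseTesseractTsv`, and `textLines` are hypothetical names made up for this example, not part of node-tesseract-ocr or Tesseract. It splits on tabs directly, so a quote character in the text column is kept literally instead of being treated as a CSV quote.

```typescript
// Hypothetical row shape; field names follow the header row of Tesseract's TSV output.
interface TsvRow {
  level: number
  page_num: number
  block_num: number
  par_num: number
  line_num: number
  word_num: number
  left: number
  top: number
  width: number
  height: number
  conf: number
  text: string
}

// Hypothetical helper: parse TSV text into rows, converting every column
// except "text" to a number. Blank lines (such as a trailing newline) are skipped.
function parseTesseractTsv(tsv: string): TsvRow[] {
  const lines = tsv.split(/\r?\n/).filter((line) => line.trim() !== "")
  const header = (lines.shift() ?? "").split("\t")
  return lines.map((line) => {
    const cells = line.split("\t")
    const row: Record<string, string | number> = {}
    header.forEach((name, i) => {
      const cell = cells[i] ?? ""
      row[name] = name === "text" ? cell : Number(cell)
    })
    return row as unknown as TsvRow
  })
}

// Hypothetical helper: join word-level rows (level 5) that share the same
// page, block, paragraph, and line into one string per line of text.
function textLines(rows: TsvRow[]): string[] {
  const grouped = new Map<string, string[]>()
  for (const row of rows) {
    if (row.level !== 5 || row.text.trim() === "") continue
    const key = [row.page_num, row.block_num, row.par_num, row.line_num].join("-")
    const words = grouped.get(key) ?? []
    words.push(row.text)
    grouped.set(key, words)
  }
  return Array.from(grouped.values()).map((words) => words.join(" "))
}

// Made-up sample data in the assumed column layout.
const sample = [
  "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
  "1\t1\t0\t0\t0\t0\t0\t0\t200\t100\t-1\t",
  "5\t1\t1\t1\t1\t1\t10\t12\t40\t20\t96\tHello",
  "5\t1\t1\t1\t1\t2\t55\t12\t60\t20\t91\t\"world\"",
  "5\t1\t1\t1\t2\t1\t10\t40\t30\t20\t88\tbye",
  "",
].join("\n")

console.log(textLines(parseTesseractTsv(sample)))
```

For a language like chi_tra you would probably join with an empty string instead of a space.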
from node-tesseract-ocr.
Thank you so much, exactly what I was looking for!