
Comments (48)

dbashford avatar dbashford commented on September 4, 2024

What version of textract are you using?

from textract.

nishvs avatar nishvs commented on September 4, 2024

0.20.0. I got it from npm.

dbashford avatar dbashford commented on September 4, 2024

Bump to 1.0 and you won't need unzip.

If you have catdoc installed then it's installed such that textract can't use it. Give me a few and I'll whip a script up that should help figure the issue out.

nishvs avatar nishvs commented on September 4, 2024

Yes, I just saw it; my mistake. Thanks for the help, David. Appreciated.

dbashford avatar dbashford commented on September 4, 2024

All is good then?

nishvs avatar nishvs commented on September 4, 2024

Yeah, I am able to read text from PDF and DOCX files. But when I try to read text from a DOC file, I still get problems.

Warning at project startup:
textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.

And at runtime I get this error:

{ [Error: textract does not currently extract files of type [[ application/msword ]]] typeNotFound: true }

Note: I am using a catdoc that was ported to Windows:
http://blog.brush.co.nz/2009/09/catdoc-windows/

I am using this port because on the site they say they don't support Windows.

dbashford avatar dbashford commented on September 4, 2024

Create a file, call it doc, and copy this text into it. Then run it from the command line. What does it say?

#!/usr/bin/env node
var exec = require("child_process").exec;
exec('catdoc ' + __filename,
  function (error, stdout, stderr) {
    if (error) {
      console.log("catdoc cannot be found/executed by this script, errors to follow.")
      console.log("**************ERROR*****************");
      console.log(error);
      console.log("**************stderr*****************")
      console.log(stderr);
      console.log("**************stdout*****************")
      console.log(stdout);
    } else {
      console.log("Found catdoc, textract should be able to use it.")
    }
  }
);

nishvs avatar nishvs commented on September 4, 2024

I pasted the code into a method and called the method in the application.
I am using a Windows environment.

catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

{ [Error: textract does not currently extract files of type [[ application/msword ]]] typeNotFound: true }
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

nishvs avatar nishvs commented on September 4, 2024

I think the input file was not provided. One second, I will do it again.

dbashford avatar dbashford commented on September 4, 2024

That script is just trying to catdoc itself. Is home.js where you put the text I put above?

nishvs avatar nishvs commented on September 4, 2024

Yes, in home.js I put the script in a method, which I then called.

nishvs avatar nishvs commented on September 4, 2024

I got this output when I gave the correct input file path:

Found catdoc, textract should be able to use it.

dbashford avatar dbashford commented on September 4, 2024

How is it getting the path wrong?

dbashford avatar dbashford commented on September 4, 2024

I don't have a Windows machine to test on I'm afraid, so I may need your help to debug this really quickly.

If you go into node_modules/textract/lib/extractors/doc.js, please add those console.logs from above at this line: https://github.com/dbashford/textract/blob/master/lib/extractors/doc.js#L49

That'll help me figure out what might be happening.

nishvs avatar nishvs commented on September 4, 2024

This is the change I made to your script. I removed __filename and added "./Resume.doc" after catdoc:

var exec = require("child_process").exec;
// exec('catdoc ' + __filename,
exec('catdoc ./Resume.doc',
  function (error, stdout, stderr) {
    if (error) {
      console.log("catdoc cannot be found/executed by this script, errors to follow.")
      console.log("**************ERROR*****************");
      console.log(error);
      console.log("**************stderr*****************")
      console.log(stderr);
      console.log("**************stdout*****************")
      console.log(stdout);
    } else {
      console.log("Found catdoc, textract should be able to use it.")
    }
  }
);

dbashford avatar dbashford commented on September 4, 2024

I wonder if it has an issue with the file not being a .doc and it's just giving a very bad error.

If you change the path again and point it at the script itself the way you pointed at Resume.doc what happens?

nishvs avatar nishvs commented on September 4, 2024

I got the following output when I gave the path '../controllers/home.js':

textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc ../controllers/home.js"
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc ../controllers/home.js"' }
**************stderr*****************
catdoc: No such file or directory

**************stdout*****************

catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc ../controllers/home.js"
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc ../controllers/home.js"' }
**************stderr*****************
catdoc: No such file or directory

**************stdout*****************

nishvs avatar nishvs commented on September 4, 2024

There was a small mistake in the path: I gave an extra ".".
Now I gave the path "./controllers/home.js"
and I got the following output.

textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
Found catdoc, textract should be able to use it.

dbashford avatar dbashford commented on September 4, 2024

Ok, so it seems to work whether it is a .doc or not.

And it seems to work as long as you provide relative paths.

But when you provide the full path it does not? Can you try putting in the full path to the file?
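
A side note on the full-path case, offered as a guess from the logs rather than anything confirmed in this thread: the full path contains a space ("Node Samples"), and catdoc prints "No such file or directory" twice for it but only once for ../controllers/home.js. That pattern is what you would expect if the unquoted path reached catdoc as two separate arguments. A minimal sketch of wrapping the path in double quotes before handing it to exec; buildCatdocCommand and tryCatdoc are hypothetical helpers, not part of textract:

```javascript
var exec = require("child_process").exec;

// Hypothetical helper (not part of textract): wrap the path in double quotes
// so the shell passes it to catdoc as a single argument, spaces included.
// Assumes the path contains no double quotes, which Windows forbids in file names.
function buildCatdocCommand(filePath) {
  return 'catdoc "' + filePath + '"';
}

// Run catdoc on a file and report whether it could be executed.
function tryCatdoc(filePath) {
  exec(buildCatdocCommand(filePath), function (error, stdout, stderr) {
    if (error) {
      console.log("catdoc failed:", stderr);
    } else {
      console.log("Found catdoc, textract should be able to use it.");
    }
  });
}

// Show the command that would be run for a path containing a space.
console.log(buildCatdocCommand("E:\\Node Samples\\home.js"));
```

Calling tryCatdoc(__filename) would then exercise the full Windows path. child_process.execFile, which takes its arguments as an array and does not go through a shell by default, is another way to sidestep quoting.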

dbashford avatar dbashford commented on September 4, 2024

Oh, wait.

When you use __filename it was using windows slashes:

catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js

But when you are putting them in by hand, you are using unix slashes.

./controllers/home.js

Node.js, when it creates __filename, is going to build an OS-specific filename for you. Here it is building a Windows filename.
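
To see the difference without a Windows machine, Node's path module exposes both flavours as path.win32 and path.posix. A small sketch (the segment names are just examples):

```javascript
var path = require("path");

// The same segments joined with each platform's rules.
var windowsStyle = path.win32.join("controllers", "home.js");
var unixStyle = path.posix.join("controllers", "home.js");

console.log(windowsStyle); // controllers\home.js
console.log(unixStyle);    // controllers/home.js

// path.sep is the separator for the platform the script is running on,
// which is what __filename and __dirname are built with.
console.log(JSON.stringify(path.sep));
```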

nishvs avatar nishvs commented on September 4, 2024

So I am putting __filename+'/Resume.doc' in the path line. Is that OK?

dbashford avatar dbashford commented on September 4, 2024

It would be __dirname not __filename, but I think the slashes might be throwing it off.

nishvs avatar nishvs commented on September 4, 2024

I gave the path with a double \ before Resume.doc:

exec('catdoc ' + __dirname + '\\Resume.doc',

I got the following output:

textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\Resume.doc"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\Resume.doc"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

dbashford avatar dbashford commented on September 4, 2024

Something else to try. Think the slash mismatch is screwing things up.

var fileName = __filename.replace(/(.+):/, "/$1")

Then use that fileName, so...

exec('catdoc ' + fileName,

nishvs avatar nishvs commented on September 4, 2024

Sorry, but I did not understand what you want me to do.

dbashford avatar dbashford commented on September 4, 2024

#!/usr/bin/env node
var exec = require("child_process").exec;
var fileName = __filename.replace(/(.+):/, "/$1").replace(/\\/g, "/");
exec('catdoc ' + fileName,
  function (error, stdout, stderr) {
    if (error) {
      console.log("catdoc cannot be found/executed by this script, errors to follow.")
      console.log("**************ERROR*****************");
      console.log(error);
      console.log("**************stderr*****************")
      console.log(stderr);
      console.log("**************stdout*****************")
      console.log(stdout);
    } else {
      console.log("Found catdoc, textract should be able to use it.")
    }
  }
);
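
What the two replace calls in the script above do, shown on the path from the earlier error output. This is only a walk-through of the string handling; toUnixStylePath is a hypothetical name, and whether catdoc accepts the result depends on the shell it runs under:

```javascript
// Hypothetical helper mirroring the two replace calls in the script.
function toUnixStylePath(windowsPath) {
  return windowsPath
    .replace(/(.+):/, "/$1")   // drive prefix: "E:" becomes "/E"
    .replace(/\\/g, "/");      // every backslash becomes a forward slash
}

var sample = "E:\\NewStartup\\Code\\Tutorial\\Node Samples\\SampleNodeProject\\ClientApp\\controllers\\home.js";

console.log(toUnixStylePath(sample));
// /E/NewStartup/Code/Tutorial/Node Samples/SampleNodeProject/ClientApp/controllers/home.js
```

With only the first replace applied, the result keeps its backslashes and starts with /E\NewStartup.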

nishvs avatar nishvs commented on September 4, 2024

I copied the code above in the method in home.js and ran the application.

I got the following output

textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc /E\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc /E\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

dbashford avatar dbashford commented on September 4, 2024

I updated the script above; please try again.

nishvs avatar nishvs commented on September 4, 2024

I got the following output:

textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc /E\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc /E\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

nishvs avatar nishvs commented on September 4, 2024

One minute, but what did you update in the script?

dbashford avatar dbashford commented on September 4, 2024

I'm hoping you are able to see what I am doing and try to do it yourself. You are using unix paths for your tests that are working. Please transform this into a unix path: /E\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\controllers\home.js

dbashford avatar dbashford commented on September 4, 2024

/E/NewStartup/Code/Tutorial/Node Samples/SampleNodeProject/ClientApp/controllers/home.js

nishvs avatar nishvs commented on September 4, 2024

OK, I will do this. And I need to do the same in the "node_modules/textract/lib/extractors/doc.js" file, right?

dbashford avatar dbashford commented on September 4, 2024

Just do it in the script to see if that works =)

dbashford avatar dbashford commented on September 4, 2024

Just hardcode the full unix path to home.js, and if that works, let me know what that looks like.

nishvs avatar nishvs commented on September 4, 2024

I gave the following path:
var fileName = '/E/NewStartup/Code/Tutorial/Node Samples/SampleNodeProject/ClientApp/controllers/home.js'

Output:

textract: 'drawingtotext' does not appear to be installed, so it will be unable to extract DXFs.
textract: 'catdoc' does not appear to be installed, so it will be unable to extract DOCs.
catdoc cannot be found/executed by this script, errors to follow.
**************ERROR*****************
{ [Error: Command failed: C:\Windows\system32\cmd.exe /s /c "catdoc /E/NewStartup/Code/Tutorial/Node Samples/SampleNodeProject/ClientApp/controllers/home.js"
catdoc: No such file or directory
catdoc: No such file or directory
]
killed: false,
code: 1,
signal: null,
cmd: 'C:\Windows\system32\cmd.exe /s /c "catdoc /E/NewStartup/Code/Tutorial/Node Samples/SampleNodeProject/ClientApp/controllers/home.js"' }
**************stderr*****************
catdoc: No such file or directory
catdoc: No such file or directory

**************stdout*****************

dbashford avatar dbashford commented on September 4, 2024

You have proven relative paths work. ./controllers/home.js.

In order for me to help you I'll need you to figure out how to make a full path work.

What are you using to make Windows unix-y? GitBash? Cygwin?

nishvs avatar nishvs commented on September 4, 2024

I am sorry, but I don't understand your question.
I am using the normal Node.js command prompt.

dbashford avatar dbashford commented on September 4, 2024

Let me try to go at this differently.

If you type catdoc and nothing else at the command line, what does it output?
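
The same check can be scripted. A rough sketch using child_process.spawnSync, which reports a missing executable through result.error rather than an exit code; canRun is a hypothetical helper, and this is not necessarily how textract itself detects catdoc:

```javascript
var spawnSync = require("child_process").spawnSync;

// Hypothetical helper: true if the executable could be started at all,
// regardless of the exit code it returned.
function canRun(command, args) {
  var result = spawnSync(command, args || []);
  return !result.error;
}

// The node binary running this script is always available, so it makes a safe baseline.
console.log("node:", canRun(process.execPath, ["--version"]));
console.log("catdoc:", canRun("catdoc"));
```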

nishvs avatar nishvs commented on September 4, 2024

I get the following output:

Usage:
catdoc [-vu8btawxlV] [-m number] [-s charset] [-d charset] [ -f format] files

If I provide a file path, it shows the text of the file.

I already mentioned that I am using a catdoc that was ported to Windows. I hope that is not the problem.

dbashford avatar dbashford commented on September 4, 2024

I just published a new version of textract, 1.0.1, please give it a shot?

nishvs avatar nishvs commented on September 4, 2024

I am getting this error when I start the project:

  if (error !== null && error.indexOf( "catdoc: No such file or directory"
                              ^

TypeError: undefined is not a function
at E:\NewStartup\Code\Tutorial\Node Samples\SampleNodeProject\ClientApp\node_modules\textract\lib\extractors\doc.js:48:35
at ChildProcess.exithandler (child_process.js:758:5)
at ChildProcess.emit (events.js:110:17)
at maybeClose (child_process.js:1015:16)
at Socket.<anonymous> (child_process.js:1183:11)
at Socket.emit (events.js:107:17)
at Pipe.close (net.js:485:12)

dbashford avatar dbashford commented on September 4, 2024

Install 1.0.2. Introduced a problem there, should be fixed.
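
For context on the TypeError above: the error passed to an exec callback is an Error object, not a string, so it has no indexOf method. A hedged sketch of one way to look for catdoc's message (not necessarily the change that went into 1.0.2); isMissingFileError is a hypothetical name:

```javascript
var MISSING = "catdoc: No such file or directory";

// Hypothetical helper: look for catdoc's message in the Error's message
// string and in stderr, instead of calling indexOf on the Error itself.
function isMissingFileError(error, stderr) {
  var message = error && error.message ? error.message : "";
  var output = stderr ? String(stderr) : "";
  return (message + output).indexOf(MISSING) !== -1;
}

var sample = new Error("Command failed: catdoc x\n" + MISSING + "\n");
console.log(isMissingFileError(sample, "")); // true
console.log(isMissingFileError(null, ""));   // false
```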

nishvs avatar nishvs commented on September 4, 2024

Yes, it is working. Thanks for your help; I could not have done it without your guidance. Please accept my LinkedIn request if it is not too much trouble. :)

nishvs avatar nishvs commented on September 4, 2024

Should I close the issue now? OK, you did it already. Thank you again. :)

dbashford avatar dbashford commented on September 4, 2024

I took care of it =)

dbashford avatar dbashford commented on September 4, 2024

Which invite is you? Nitish?

nishvs avatar nishvs commented on September 4, 2024

Yes, I am Nitish Kumar.
