cosmocode / docsearch
Search through uploaded documents in DokuWiki
Home Page: http://www.dokuwiki.org/plugin:docsearch
DocSearch Plugin for DokuWiki

Extends the default search and appends a search through uploaded document files.

All documentation for this plugin can be found at http://www.dokuwiki.org/plugin:docsearch

If you install this plugin manually, make sure it is installed in lib/plugins/docsearch/ - if the folder is named differently it will not work! Please refer to http://www.dokuwiki.org/plugins for additional info on how to install plugins in DokuWiki.

----

Copyright (C) Dominik Eckelmann <[email protected]>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the COPYING file in your DokuWiki folder for details.
I run a DokuWiki with the docsearch plugin. The plugin itself works, but it does not display search snippets for documents in the search results. I would be thankful for any ideas on how to solve this.
I uploaded the document http://www.easa.europa.eu/ws_prod/g/doc/Agency_Mesures/Certification_Spec/decision_ED_2003_02_RM.pdf to a fresh wiki install and searched for "European" and "Agency", with good results when searching for only one of them. If I search for "European Agency" (without the quotation marks) I do not get any results. Does this plugin not allow this kind of search, or is there some other syntax needed to search for two or more words?
Andreas
Here is a workaround:
Can the plugin open PDF results with the search terms highlighted? This should simply be a matter of passing the search terms into the link for the PDF URL. The format is documented here:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
Thanks!
Jason
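For illustration: assuming a PDF viewer that honours Adobe's open parameters, a result link could carry the query in the URL fragment like this (the file name and query below are placeholders):

http://example.org/_media/docs/manual.pdf#search="European Agency"

Whether the terms actually get highlighted depends on the PDF plugin the browser uses, so this would only be a best-effort hint rather than something docsearch itself can guarantee.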
I updated to PHP 5.3 (Debian Lenny) and now see the following notices:
Deprecated: Directive 'register_long_arrays' is deprecated in PHP 5.3 and greater in Unknown on line 0
Deprecated: Directive 'magic_quotes_gpc' is deprecated in PHP 5.3 and greater in Unknown on line 0
The cronjob always loops over all files instead of only the new files.
It would use fewer resources when running the cron, especially when using image scanning for PNG or JPG files.
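A minimal sketch of the idea, assuming pdftotext as the only converter and a flat text layout (the real cron.php works through the configured converters, so the paths and naming below are placeholders, not the plugin's actual internals):

# hypothetical: re-convert only media files that are new or newer than their extracted text
find /var/lib/dokuwiki/data/media -name '*.pdf' | while read -r f; do
  txt="/var/lib/dokuwiki/data/docsearch/$(basename "$f").txt"
  if [ ! -e "$txt" ] || [ "$f" -nt "$txt" ]; then
    pdftotext "$f" "$txt"
  fi
done

The explicit existence check makes sure newly uploaded files are still converted on the first run.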
First I have to thank you so much. This is the plugin I always wanted, but due to my limited PHP knowledge I was not able to code it myself.
I have two feature requests to make this plugin even better.
First:
On the search result page it would be nice to see the backlinks of the media file, i.e. the pages it is linked from, either directly on the result page or on a second, auto-generated page. With the backlinks one could easily find out in what context a document is used. This could greatly enhance the search, as one could find related documents even if they do not match the keyword. Of course the ACL should be checked and only backlinks with read rights should be displayed.
A user could copy the full filename and search for it by hand, but I think backlinks would be a lot more intuitive to most users.
Second:
It would be nice if there were additional HTML previews (with the keywords marked), like Google does with its cache. Tools like OpenOffice can easily be used to generate these documents. That way people could read at least a cached version without needing readers for all file formats installed on their PC.
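As a sketch of how such a preview could be generated in a converter step, assuming an office suite whose soffice binary supports headless conversion (the paths are placeholders, and marking the keywords would still need an extra post-processing step):

soffice --headless --convert-to html --outdir /var/lib/dokuwiki/data/docsearch/preview /var/lib/dokuwiki/data/media/report.doc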
Andreas
Thanks again for this great, great plugin!
Hello,
I would be very interested in using your plugin on at least one animal of my farm.
You wrote that if we run a DokuWiki farm, we need to run the cronjob for each animal separately, passing the animal's name as the first parameter to the script, and I am sadly at a loss with this.
Perhaps it is due to the peculiar architecture of the Debian DokuWiki farm.
So I have made a few tests.
First, to see if everything works, I put some PDFs in /var/lib/dokuwiki/data/media/ and ran
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php
to see how it goes...
And indeed a docsearch folder was created, with an index folder inside it correctly built...
\o/
(That's great, by the way; we use a lot of PDFs here and it is so helpful to be able to search inside them.)
BUT I have no clue how to make it work for my animals.
I tried
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal=animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal=animal1
None of them work :(
Any clue?
Thanks for all :)
When the search does not find any document matching the query, the heading for the document results should not be displayed.
When a search returns a lot of page hits, the user has to scroll down quite a bit to see the results for documents because the page hits take up a lot of space. It would be nice if there were a link at the top of the page to the document results, perhaps together with a note on how many documents have been found.
I wrote the zip2txt.sh script (which can be found on the plugin page) to index zip files. Because it joins all converted files from a given zip file into one big monster txt file, the indexer will consume a lot of memory while working on that file. On our wiki (4 years old, around 600 pages, 9 GB in size) we have a zip file which contains scientific literature in multiple PDF documents. After joining the conversions together, the indexer/lexer needs a huge amount of memory. I had to set my PHP memory limit to 250 MB to avoid a crash on the generated text file. Here is the output of wc for this file:
wc ./literatur.zip.txt
lines words bytes
78897 1242650 8856762 ./literatur.zip.txt
That means the indexer had to handle one huge 8.8 MB txt file, which is of course not easy because the current indexing process tries to index the file in one big step. Is there a way for this plugin to allow indexing of each file found in a zip file but still return the zip file as the origin in a search? An unpacker script could return a list of file names instead of one big file. Or would it be better to change the indexing process of DokuWiki to handle such big files with less memory consumption?
Andreas
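A rough sketch of the per-file idea, assuming the unpacker receives the zip file as its first argument and that unzip and pdftotext are available (the output is only an illustration; the current indexer would not understand a file list like this yet):

# hypothetical zip2txt variant: one text file per entry instead of one joined file
tmp=$(mktemp -d)
unzip -o "$1" -d "$tmp"
find "$tmp" -name '*.pdf' | while read -r f; do
  pdftotext "$f" "${f%.pdf}.txt"
  echo "${f%.pdf}.txt"    # emit the list of generated text files
done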
I tested the plugin on a fresh new wiki with success. After that I gave it a try on our main wiki using only the PDF converter given in the example config. After manually calling cron.php it crashes after a second and says:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 8 bytes) in /var/www//inc/indexer.php on line 224
It did completely convert one PDF to txt, which was stored in the data section. It has a size of 1597616 bytes and contains 221296 words in 9792 text lines. Currently I cannot tell whether it crashed while indexing this file or while trying to convert the next one. Does the converter itself get affected by the PHP memory limit as well when it gets called from inside a PHP script?
Andreas
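For what it is worth, the limit being hit here is PHP's memory_limit, and it can be raised just for the cron run on the command line (256M is only an example value):

php -d memory_limit=256M /var/lib/dokuwiki/lib/plugins/docsearch/cron.php

Converter binaries that the script starts run as separate processes, so they are not themselves bound by PHP's memory_limit.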