cosmocode / docsearch
Search through uploaded documents in DokuWiki
Home Page: http://www.dokuwiki.org/plugin:docsearch
DocSearch Plugin for DokuWiki

Extends the default search and appends a search through uploaded document files.

All documentation for this plugin can be found at http://www.dokuwiki.org/plugin:docsearch

If you install this plugin manually, make sure it is installed in lib/plugins/docsearch/ - if the folder is named differently it will not work! Please refer to http://www.dokuwiki.org/plugins for additional info on how to install plugins in DokuWiki.

----

Copyright (C) Dominik Eckelmann <[email protected]>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the COPYING file in your DokuWiki folder for details.
I run a DokuWiki with the docsearch plugin. The plugin itself works, but it does not display search snippets for documents in the search results. I would be thankful for any ideas on how to solve this.
I uploaded the document http://www.easa.europa.eu/ws_prod/g/doc/Agency_Mesures/Certification_Spec/decision_ED_2003_02_RM.pdf to a fresh wiki install and searched for "European" and "Agency", with good results when searching for only one of them. If I search for "European Agency" (without the quotation marks) I do not get any results. Does this plugin not allow this kind of search, or is there some other syntax needed to search for two or more words?
Andreas
Here is a workaround:
Can the plugin open PDF results with the search terms highlighted? This should simply be a matter of passing the search terms into the link for the PDF URL. The format is documented here:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
Thanks!
Jason
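For illustration: assuming a PDF viewer that honours Adobe's open parameters, a result link could carry the query in the URL fragment like this (the file name and query below are placeholders):

http://example.org/_media/docs/manual.pdf#search="European Agency"

Whether the terms actually get highlighted depends on the PDF plugin the browser uses, so this would only be a best-effort hint rather than something docsearch itself can guarantee.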
I updated to PHP 5.3 (Debian Lenny) and now see the following notices:
Deprecated: Directive 'register_long_arrays' is deprecated in PHP 5.3 and greater in Unknown on line 0
Deprecated: Directive 'magic_quotes_gpc' is deprecated in PHP 5.3 and greater in Unknown on line 0
The cronjob always loops over all files instead of only the new files.
It would use fewer resources when running the cron, especially when using image scanning for PNG or JPG files.
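A minimal sketch of the idea, assuming pdftotext as the only converter and a flat text layout (the real cron.php works through the configured converters, so the paths and naming below are placeholders, not the plugin's actual internals):

# hypothetical: re-convert only media files that are new or newer than their extracted text
find /var/lib/dokuwiki/data/media -name '*.pdf' | while read -r f; do
  txt="/var/lib/dokuwiki/data/docsearch/$(basename "$f").txt"
  if [ ! -e "$txt" ] || [ "$f" -nt "$txt" ]; then
    pdftotext "$f" "$txt"
  fi
done

The explicit existence check makes sure newly uploaded files are still converted on the first run.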
First I have to thank you so much. This is the plugin I always wanted, but due to my limited PHP knowledge I was not able to code it myself.
I have two feature requests to make this plugin even better.
First:
On the search result page it would be nice to see the backlinks of the media file, i.e. the pages it is linked from, either directly on the result page or on a second, auto-generated page. With the backlinks one could easily find out in what context a document is used. This could greatly enhance the search, as one could find related documents even if they do not match the keyword. Of course the ACL should be checked and only backlinks with read rights should be displayed.
A user could copy the full filename and search for it by hand, but I think backlinks would be a lot more intuitive to most users.
Second:
It would be nice if there were additional HTML previews (with the keywords marked), like Google does with its cache. Tools like OpenOffice can easily be used to generate these documents. That way people could read at least a cached version without needing readers for all file formats installed on their PC.
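As a sketch of how such a preview could be generated in a converter step, assuming an office suite whose soffice binary supports headless conversion (the paths are placeholders, and marking the keywords would still need an extra post-processing step):

soffice --headless --convert-to html --outdir /var/lib/dokuwiki/data/docsearch/preview /var/lib/dokuwiki/data/media/report.doc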
Andreas
Thanks again for this great, great plugin!
Hello,
I would be very interested in using your plugin on at least one animal of my farm.
You wrote that if we run a DokuWiki farm, we need to run the cronjob for each animal separately, passing the animal's name as the first parameter to the script, and I am sadly at a loss with this.
Perhaps it is due to the peculiar architecture of the Debian DokuWiki farm.
So I have made a few tests.
First, to see if everything works, I put some PDFs in /var/lib/dokuwiki/data/media/ and ran
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php
to see how it goes...
And indeed a docsearch folder was created, with an index folder inside it correctly built...
\o/
(That's great, by the way; we use a lot of PDFs here and it is so helpful to be able to search inside them.)
BUT I have no clue how to make it work for my animals.
I tried
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal=animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal=animal1
None of them work :(
Any clue?
Thanks for all :)
When the search does not find any document matching the query, the heading for the document results should not be displayed.
When a search returns a lot of page hits, the user has to scroll down quite a bit to see the results for documents because the page hits take up a lot of space. It would be nice if there were a link at the top of the page to the document results, perhaps together with a note on how many documents have been found.
I wrote the zip2txt.sh script (which can be found on the plugin page) to index zip files. Because it joins all converted files from a given zip file into one big monster txt file, the indexer will consume a lot of memory while working on that file. On our wiki (4 years old, around 600 pages, 9 GB in size) we have a zip file which contains scientific literature in multiple PDF documents. After joining the conversions together, the indexer/lexer needs a huge amount of memory. I had to set my PHP memory limit to 250 MB to avoid a crash on the generated text file. Here is the output of wc for this file:
wc ./literatur.zip.txt
lines words bytes
78897 1242650 8856762 ./literatur.zip.txt
That means the indexer had to handle one huge 8.8 MB txt file, which is of course not easy because the current indexing process tries to index the file in one big step. Is there a way for this plugin to allow indexing of each file found in a zip file but still return the zip file as the origin in a search? An unpacker script could return a list of file names instead of one big file. Or would it be better to change the indexing process of DokuWiki to handle such big files with less memory consumption?
Andreas
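A rough sketch of the per-file idea, assuming the unpacker receives the zip file as its first argument and that unzip and pdftotext are available (the output is only an illustration; the current indexer would not understand a file list like this yet):

# hypothetical zip2txt variant: one text file per entry instead of one joined file
tmp=$(mktemp -d)
unzip -o "$1" -d "$tmp"
find "$tmp" -name '*.pdf' | while read -r f; do
  pdftotext "$f" "${f%.pdf}.txt"
  echo "${f%.pdf}.txt"    # emit the list of generated text files
done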
I tested the plugin on a fresh new wiki with success. After that I gave it a try on our main wiki using only the PDF converter given in the example config. After manually calling cron.php it crashes after a second and says:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 8 bytes) in /var/www//inc/indexer.php on line 224
It did completely convert one PDF to txt, which was stored in the data section. It has a size of 1597616 bytes and contains 221296 words in 9792 text lines. Currently I cannot tell whether it crashed while indexing this file or while trying to convert the next one. Does the converter itself get affected by the PHP memory limit as well when it gets called from inside a PHP script?
Andreas
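For what it is worth, the limit being hit here is PHP's memory_limit, and it can be raised just for the cron run on the command line (256M is only an example value):

php -d memory_limit=256M /var/lib/dokuwiki/lib/plugins/docsearch/cron.php

Converter binaries that the script starts run as separate processes, so they are not themselves bound by PHP's memory_limit.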