
cosmocode / docsearch

Search through uploaded documents in DokuWiki

Home Page: http://www.dokuwiki.org/plugin:docsearch


docsearch's Introduction

DocSearch Plugin for DokuWiki

Extends the default search and appends a search through uploaded document files.

All documentation for this plugin can be found at
http://www.dokuwiki.org/plugin:docsearch

If you install this plugin manually, make sure it is installed in
lib/plugins/docsearch/ - if the folder is named differently, it
will not work!

Please refer to http://www.dokuwiki.org/plugins for additional info
on how to install plugins in DokuWiki.
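For orientation, the plugin converts uploaded documents to plain text with external tools configured per file extension and feeds the result to DokuWiki's indexer via cron.php. An illustrative converter line (see the plugin page above for the authoritative file name and syntax; the %in%/%out% placeholders and this pdftotext call are given as an example, not guaranteed verbatim):

pdf pdftotext -enc UTF-8 "%in%" "%out%"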

----
Copyright (C) Dominik Eckelmann <[email protected]>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

See the COPYING file in your DokuWiki folder for details

docsearch's People

Contributors

araname, cziehr, dom-mel, gamma, klap-in, one-mb, rneej, samwilson, sawachan, sergey-art82, splitbrain


docsearch's Issues

No search snippets in search results

I run a DokuWiki with the docsearch plugin. The plugin itself works, but it does not display search snippets for documents in the search results. I am thankful for any ideas on how to solve this.
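For context: DokuWiki's ft_snippet() builds snippets from a page's stored wiki text, which converted documents do not have, so a snippet for a document would have to come from the extracted text file instead. A minimal PHP sketch of that idea (the text-file path convention and function name are assumptions, not the plugin's actual code):

<?php
// Sketch: build a snippet for a converted document from its extracted
// text, the way ft_snippet() does for wiki pages. The text-file path
// passed in here is an assumption, not the plugin's actual layout.
function docsearch_snippet_sketch($txtfile, $query) {
    if (!is_readable($txtfile)) return '';
    $text = file_get_contents($txtfile);
    $pos  = stripos($text, $query);     // first case-insensitive match
    if ($pos === false) return '';
    $start = max(0, $pos - 80);         // ~80 chars of leading context
    return '…' . substr($text, $start, 200) . '…';
}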

cron.php error with php 5.3

I updated to PHP 5.3 (Debian Lenny) and now get:

Deprecated: Directive 'register_long_arrays' is deprecated in PHP 5.3 and greater in Unknown on line 0
Deprecated: Directive 'magic_quotes_gpc' is deprecated in PHP 5.3 and greater in Unknown on line 0

Cron Job

The cron job always loops over all files instead of only the new ones. It would use fewer resources when running the cron, especially when using image scanning for PNG or JPG files.
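A minimal PHP sketch of the requested behaviour, skipping files whose converted text is already up to date (the directory layout and the md5-based naming are assumptions for illustration, not the plugin's real scheme):

<?php
// Sketch: only (re)convert media files whose converted text is missing
// or older than the source file.
$media = '/var/lib/dokuwiki/data/media';
$cache = '/var/lib/dokuwiki/data/docsearch';
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($media));
foreach ($it as $file) {
    if (!$file->isFile()) continue;
    $txt = $cache . '/' . md5($file->getPathname()) . '.txt';
    if (is_file($txt) && filemtime($txt) >= $file->getMTime()) {
        continue; // converted text is still current: skip this file
    }
    // ... run the configured converter for $file here (placeholder) ...
}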

Feature request: Add backlinks and HTML previews to search results

First I have to thank you so much. This is the plugin I always wanted, but due to my limited PHP knowledge I was not able to code it myself.
I have two feature requests to make this plugin even better.

First:
On the search result page it would be nice to see backlinks from the media file to the pages where it is embedded, either directly on the result page or on a second, auto-generated page. With the backlinks one could easily find out in what context the document is used. This could greatly enhance the search, as one could find related documents even if they do not match the keyword. Of course the ACL should be checked, and only backlinks with read rights should be displayed.
A user could copy the full filename and search for it by hand, but I think backlinks would be a lot more intuitive for most users.
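For what it's worth, DokuWiki core already tracks media usage, so a sketch could combine ft_mediause() with an ACL check, roughly like this (treat the exact signatures as assumptions, since they vary across DokuWiki releases):

<?php
// Sketch: list readable pages that use a given media file, so results
// could show backlinks. ft_mediause() and auth_quickaclcheck() are
// DokuWiki core functions.
function docsearch_backlinks_sketch($mediaid) {
    $readable = array();
    foreach (ft_mediause($mediaid) as $page) {
        if (auth_quickaclcheck($page) >= AUTH_READ) {
            $readable[] = $page;        // honour the ACL, as requested
        }
    }
    return $readable;
}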

Second:
It would be nice if there were additional HTML previews (with the keywords highlighted), like Google does with its cache. Tools like OpenOffice can easily be used to generate these documents. This would allow people to read at least a cached version without needing readers for all file formats installed on their PCs.
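As one hedged example of the conversion step, a modern LibreOffice can produce such HTML in headless mode (the output path is arbitrary, and keyword highlighting would still need extra work on top):

soffice --headless --convert-to html --outdir /path/to/previews document.odt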

Andreas

Thanks Again for this great great plugin!

plugin in a Debian Farm context

Hello,

I would be very interested in deploying your plugin on at least one of the animals in my farm.
You wrote that if we run a DokuWiki farm, we need to run the cronjob for each animal separately, passing the animal's name as the first parameter to the script, and I am sadly at a loss with this.

Perhaps it is due to the peculiar architecture of the Debian DokuWiki farm (

  • var/lib/dokuwiki/data/media/
  • var/lib/dokuwiki/data/index/
    ...
  • var/lib/dokuwiki/farm/
  • var/lib/dokuwiki/farm/animal1/
  • var/lib/dokuwiki/farm/animal1/data/media/
  • var/lib/dokuwiki/farm/animal1/data/index/
    )

So I made a few tests.
First, to see if everything works, I put some PDFs in var/lib/dokuwiki/data/media/ and ran
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php

to see how it goes...
And indeed a docsearch folder was created, with an index folder inside it correctly built...
\o/
(That's great, by the way; we use a lot of PDFs here and it is so helpful to be able to search inside them.)

BUT I have no clue how to make it work for my animals.
I tried

strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php?animal=animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal1
strace php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal=animal1

None of them work :(

Any clue?

Thanks for all :)
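For reference, the documentation quoted above implies a plain CLI argument, not a ?query string (cron.php runs through the PHP CLI, not a web server, so the first two attempts cannot work). A crontab sketch of that documented form, with assumed paths and schedule; whether it resolves Debian's split farm layout is exactly the open question here:

0 3 * * * php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal1
0 4 * * * php /var/lib/dokuwiki/lib/plugins/docsearch/cron.php animal2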

Add info about the number of found documents and a link at the top of the result page

When a search returns a lot of page hits, the user has to scroll down quite a bit to see the results for documents, because the page hits take a lot of space. It would be nice if there were a link at the top of the page to the document results, maybe together with info about how many documents have been found.

Memory consumption for extracted zip files

I wrote the zip2txt.sh script (which can be found on the plugin page) to index zip files. Because it joins all converted files from a given zip file into one big monster txt file, the indexer will consume a lot of memory while working on that file. On our wiki (4 years old, around 600 pages, 9 GB in size) we have a zip file which contains scientific literature in multiple PDF documents. After joining the conversions together, the indexer / lexer needs a huge amount of memory. I had to set my PHP memory limit to 250 MB to avoid a crash on the generated text file. Here is the output of wc (lines, words, bytes) for this file:

wc ./literatur.zip.txt
78897 1242650 8856762 ./literatur.zip.txt

That means the indexer had to handle one huge 8.8 MB txt file, which is of course not easy, because the current indexing process tries to index the file in one big step. Is there a way for this plugin to allow indexing of each file found in a zip file but still return the zip file as origin in a search? An unpacker script could return a list of file names instead of one big file. Or would it be better to change the indexing process of DokuWiki to handle such big files with less memory consumption?
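A hedged PHP sketch of the per-file idea: extract and convert each zip member separately, so the indexer never sees one giant concatenated file while the zip itself stays the reported origin (ZipArchive is standard PHP; the convert/index steps are placeholders):

<?php
$zip = new ZipArchive();
if ($zip->open('./literatur.zip') === true) {
    $tmp = sys_get_temp_dir() . '/docsearch_zip';
    @mkdir($tmp);
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $zip->extractTo($tmp, $name);   // extract one member at a time
        // ... convert and index $tmp/$name here, attributing hits to
        //     the zip file itself (placeholder) ...
        @unlink($tmp . '/' . $name);    // free space after each member
    }
    $zip->close();
}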

Andreas

Memory problem

I tested the plugin successfully on a fresh new wiki. After that I gave it a try on our main wiki, using only the PDF converter given in the example config. After manually calling cron.php it crashes after a second and says:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 8 bytes) in /var/www//inc/indexer.php on line 224

It did completely convert one PDF to txt, which was stored in the data section. It has a size of 1597616 bytes and contains 221296 words in 9792 text lines. Currently I cannot tell whether it crashed while indexing this file or while trying to convert the next one. Does the converter itself also get affected by the PHP memory limit when it is called from inside a PHP script?

Andreas
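A note on the question above: converters launched as external commands run in their own processes, so they are not bound by PHP's memory_limit; only the PHP-side indexing is. A minimal sketch of one workaround, raising the limit for the cron run (the 256M value is arbitrary):

<?php
// External converters are separate processes and unaffected by PHP's
// memory_limit; the indexer, however, runs inside PHP. Raising the
// limit for this run is one workaround (value chosen arbitrarily).
ini_set('memory_limit', '256M');
// ... continue with conversion and indexing ...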
