
eklem / search-index-cookbook


A collection of recipes and how to's on interesting use cases with search-index

License: MIT License

norch cookbooks search search-engine javascript

search-index-cookbook's Introduction

search-index and norch cookbook

Join the chat at https://gitter.im/fergiemcdowall/search-index

NOT COMPATIBLE WITH LATEST SEARCH-INDEX !!!

Versions 0.8.x and 0.9.x of search-index introduced changes that make a good part of this guide outdated. Comment on this issue to get notified when an update is out.

A collection of recipes and how-tos on interesting use cases with search-index. Feel free to suggest a new topic you want explained. Check the prerequisites to get the most out of the cookbook.

Topics

Pitfalls

References

Get up and running with Node and npm

TODO

Very much a work in progress. There is a long list of things yet to do, and you're more than welcome to suggest new recipes & topics.

search-index-cookbook's People

Contributors

eklem, fergiemcdowall


search-index-cookbook's Issues

Hit highlighting

How to get hit highlighting to work with the teaser field. Describe the limitations and the best strategies to make it work for you (i.e. only on one field).
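As a rough illustration of the idea, and not of how search-index itself produces teasers, here is a minimal sketch that wraps query terms found in a single plain-text teaser field. The function names `escapeRegExp` and `highlightTeaser` and the `<mark>` wrapper are assumptions made for this example.

```javascript
// Hypothetical helpers, not part of the search-index API.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word occurrence of a query term in the teaser.
// Assumes the teaser is plain text (no HTML) and covers one field only.
function highlightTeaser(teaser, queryTerms, open = '<mark>', close = '</mark>') {
  const terms = queryTerms.filter((term) => term.length > 0).map(escapeRegExp);
  if (terms.length === 0) return teaser;
  const pattern = new RegExp(`\\b(?:${terms.join('|')})\\b`, 'gi');
  return teaser.replace(pattern, (match) => open + match + close);
}

console.log(highlightTeaser('Search engines search text', ['search']));
```

Matching is case-insensitive and limited to whole words, so a term like "search" does not light up inside "researching". Whether that is the right behaviour depends on how the index tokenizes and stems, which is one of the limitations worth describing.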

Cache search index file/data for index in the browser app

When running search-index in the browser, with a web server that only holds static files, it would be good to show how to cache the usual assets (HTML, CSS, JS) and also the actual index, or the index data as JSON.

This will ensure two things:

  • You won't need a lot of CPU since the app is running in the browser.
  • You won't need a lot of bandwidth since the index file (that changes every now and then) is cached in various network equipment.

Strategies for weighting fields indexing and query side

Different strategies when weighting document fields for batches, and how to weigh some fields for certain documents higher or lower.

And the benefits of query side weighting:

  • No re-indexing
  • Seasonal changes (through a day, a week, a month, a year)
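To make the "no re-indexing" point concrete, here is a minimal sketch of query-side weighting done as a re-ranking step in plain JavaScript. The hit shape (`id` plus a `fieldScores` object) and the function names are assumptions made for this example. search-index may not expose per-field scores in this form, so treat it as an illustration of the principle only.

```javascript
// Hypothetical hit shape: { id: 'doc1', fieldScores: { title: 1.2, body: 0.4 } }

// Combine per-field scores using weights supplied at query time.
// Fields without an explicit weight count with weight 1.
function scoreHit(fieldScores, weights) {
  return Object.entries(fieldScores).reduce(
    (total, [field, score]) => total + score * (weights[field] ?? 1),
    0
  );
}

// Return a new array of hits sorted by weighted score, highest first.
function rerank(hits, weights) {
  return hits
    .map((hit) => ({ ...hit, score: scoreHit(hit.fieldScores, weights) }))
    .sort((a, b) => b.score - a.score);
}

const hits = [
  { id: 'a', fieldScores: { title: 1, body: 3 } },
  { id: 'b', fieldScores: { title: 2, body: 1 } },
];

// Same index, two different weightings, no re-indexing in between.
console.log(rerank(hits, {}).map((hit) => hit.id));
console.log(rerank(hits, { title: 3 }).map((hit) => hit.id));
```

Because the weights live in the query and not in the index, they can be swapped per season, per weekday, or per user without touching the indexed data.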

Feature tradeoffs for memory and CPU

Take a look at memory use while indexing. See what makes it differ:

  • Indexing into an existing index
  • nGramLength
  • fielded search
  • searchable false/true
  • batchSize

Just look at top to get some ballpark numbers. Could also check whether memwatch-next would help.
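As an alternative to watching top, Node's built-in `process.memoryUsage()` can give ballpark numbers from inside the indexing script. The sketch below is a minimal example. The helper names `memorySnapshotMb` and `memoryDeltaMb` are made up here, and the indexing call itself is left out because it depends on the search-index version.

```javascript
// Hypothetical helpers built on Node's process.memoryUsage().

// Take a snapshot of resident set size and used heap, in megabytes.
function memorySnapshotMb() {
  const { rss, heapUsed } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return { rssMb: toMb(rss), heapUsedMb: toMb(heapUsed) };
}

// Difference between two snapshots, for before/after comparisons.
function memoryDeltaMb(before, after) {
  return {
    rssMb: Math.round((after.rssMb - before.rssMb) * 10) / 10,
    heapUsedMb: Math.round((after.heapUsedMb - before.heapUsedMb) * 10) / 10,
  };
}

const before = memorySnapshotMb();
// ... run one indexing batch here with the settings under test ...
const after = memorySnapshotMb();
console.log(memoryDeltaMb(before, after));
```

Running the same batch with different values for one setting at a time (for example nGramLength or batchSize) should show which settings make memory use differ. Garbage collection makes single readings noisy, so several runs per setting give a fairer picture.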

Getting the data: Crawling

How to get the data and what you have to figure out if crawling the data

  • How to figure out which URLs to actually crawl (site crawl or list pages)
  • When is a paginated list finished (tests do run)
  • New content
  • Updated content
  • What content to crawl on a page
  • Preparing for filtering on buckets or categories

The main parts of a search engine

  1. Getting the documents (crawling)
  2. Document processing & enrichment
  3. Index (add/remove + take query/return results)
  4. Query pipeline
  5. User queries

Make reference into one document

To avoid creating documentation that competes with search-index's own, keep the reference to one document, and make it mostly explanatory text, not code and how-tos.

Get data

The data can be kept outside this repo, but the repo should have code to index some existing repos on GitHub. Need one or more indexers inside this repo.

  1. Use Zapier-recipe on GitHub events for search-index.
  2. Food recipes
  3. Reuters data set

How to use the cookbook interactively?

For best use of this cookbook:

npm install search-index-cookbook

Should have HTML files with browserified search-index JavaScript ready to index different data sets. It should be easy to play around in the developer tools' JavaScript console. This goes for both the indexing side and the query side, and there should be example files for each recipe.

This should be possible, @fergiemcdowall ?

Update for new search-index API

Search API changed in 0.8.0 to allow for NOT and OR. Facets are now 'categories' and 'buckets'

Need to update the examples here

opensearch based on matcher running in the browser

A matcher-only search-index running in the browser for OpenSearch. Not sure if it is possible to reach an in-browser index from the browser search box, but I think it's doable.

Investigate: Set up browser demo from search-index on a server w/ a domain and add an opensearch.xml

Base it on search-index and the ngraminator.

When to use buckets and when to use categories

Short answer: When the list of categories grows too long. If you feel the list of available filters is almost as long as, or as long as, the result list, then you need to group them into buckets.

You filter to split the result list into a more manageable size, not to pinpoint a result with a single filter.
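A minimal sketch of the grouping step, in plain JavaScript and independent of how search-index computes buckets internally. The function name `toBuckets`, the range objects, and the price values are all made up for this example.

```javascript
// Hypothetical helper: count how many values fall into each range.
// Each range is { label, min, max } with both bounds inclusive.
function toBuckets(values, ranges) {
  return ranges.map(({ label, min, max }) => ({
    label,
    count: values.filter((value) => value >= min && value <= max).length,
  }));
}

// Five distinct prices would give five category filters...
const prices = [5, 12, 18, 25, 40];

// ...but only three bucket filters.
const priceRanges = [
  { label: '0-9', min: 0, max: 9 },
  { label: '10-19', min: 10, max: 19 },
  { label: '20+', min: 20, max: Infinity },
];

console.log(toBuckets(prices, priceRanges));
```

With real data the gap is larger: hundreds of distinct values collapse into a handful of ranges, which keeps the filter list shorter than the result list.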

"phrased search"

How to let the user do a "phrased search".

This comes out of the box, but none of the frontends we've created have used this feature. It should be explained that you only need to NOT split the query string into separate words.
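On the frontend side, "not splitting" could look like the sketch below: quoted parts of the query string are kept together as one term, and everything else is split on whitespace. The function name `parseQuery` and the quoting convention are assumptions for this example. How the resulting terms are handed to search-index depends on the version and is not shown.

```javascript
// Hypothetical frontend helper: keep "quoted phrases" as single terms
// and split the rest of the query string on whitespace.
function parseQuery(queryString) {
  const terms = [];
  const pattern = /"([^"]+)"|(\S+)/g;
  let match;
  while ((match = pattern.exec(queryString)) !== null) {
    // match[1] is a quoted phrase, match[2] is a single word.
    terms.push((match[1] ?? match[2]).toLowerCase());
  }
  return terms;
}

console.log(parseQuery('"search engine" node'));
```

A frontend that always splits on whitespace would turn the same input into three separate words and lose the phrase.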

Create your first search engine backend and frontend

On your own server/laptop:

Easiest steps to get something up and running:

  • dataset + config: JSON Gist
  • data in: search-index-indexer
  • norch: point to index (and accept calls from other IP)
  • norch-angular-app: config your frontend

Cloud

Heroku / norch.io

Show how you can sort search results other than tf-idf

Numerical and alphabetical sorting of search results, facets etc.

Use cases:

  • Numeric sorting could lead to geographic proximity sorting, and other fun stuff.
  • When the type and amount of facets are known, you could use alphabetical sort
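Both kinds of sorting can be sketched in plain JavaScript on an already fetched list of results or facets. The function names and the `distanceKm` and `name` fields are hypothetical, and this is a client-side illustration, not a description of search-index's own sort options.

```javascript
// Hypothetical helpers that sort copies of the input, leaving it untouched.

// Numeric sort on a field, ascending by default.
function sortNumeric(items, field, direction = 'asc') {
  const sign = direction === 'desc' ? -1 : 1;
  return [...items].sort((a, b) => sign * (a[field] - b[field]));
}

// Alphabetical sort on a field, using locale-aware comparison.
function sortAlphabetical(items, field) {
  return [...items].sort((a, b) => String(a[field]).localeCompare(String(b[field])));
}

// Numeric: nearest result first, given a precomputed distance per hit.
const results = [
  { id: 'x', distanceKm: 30 },
  { id: 'y', distanceKm: 4 },
  { id: 'z', distanceKm: 12 },
];
console.log(sortNumeric(results, 'distanceKm').map((hit) => hit.id));

// Alphabetical: a small, known set of facet values.
const facets = [{ name: 'pear' }, { name: 'apple' }, { name: 'fig' }];
console.log(sortAlphabetical(facets, 'name').map((facet) => facet.name));
```

For geographic proximity, the distance would first have to be computed per document from stored coordinates. After that it is an ordinary numeric sort.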

The query object

All the bells and whistles of the query object: List all features, and create an example that the user can play around with.

CPU and file size tradeoffs when indexing

search-index has a lot of nice features, but some of them come at a resource price.

  • fielded search
  • nGramLength
  • searchable: false
  • batchSize
  • ...

Document what the tradeoffs are.
