karmi / retire

A rich Ruby API and DSL for the Elasticsearch search engine

Home Page: http://karmi.github.com/retire/

License: MIT License

Ruby 100.00%

retire's Introduction

Tire


NOTICE: This library has been renamed and retired in September 2013 (read the explanation). It is not considered compatible with Elasticsearch 1.x.

Have a look at the http://github.com/elasticsearch/elasticsearch-rails suite of gems, which provides a set of features for ActiveModel/Record and Rails integration similar to Tire's.


Tire is a Ruby (1.8 or 1.9) client for the Elasticsearch search engine/database.

Elasticsearch is a scalable, distributed, cloud-ready, highly-available, full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Lucene, written in Java.

This Readme provides a brief overview of Tire's features. The more detailed documentation is at http://karmi.github.com/retire/.

Both of these documents contain a lot of information. Please set aside some time to read them thoroughly, before you blindly dive into „somehow making it work“. Just skimming through it won't work for you. For more information, please see the project Wiki, search the issues, and refer to the integration test suite.

Installation

OK. First, you need a running Elasticsearch server. Thankfully, it's easy. Let's define easy:

$ curl -k -L -o elasticsearch-0.20.6.tar.gz http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz
$ tar -zxvf elasticsearch-0.20.6.tar.gz
$ ./elasticsearch-0.20.6/bin/elasticsearch -f

See, easy. On a Mac, you can also use Homebrew:

$ brew install elasticsearch

Now, let's install the gem via Rubygems:

$ gem install tire

Of course, you can install it from the source as well:

$ git clone git://github.com/karmi/tire.git
$ cd tire
$ rake install

Usage

Tire exposes an easy-to-use domain-specific language for fluent communication with Elasticsearch.

It easily blends with your ActiveModel/ActiveRecord classes for convenient usage in Rails applications.

To test-drive the core Elasticsearch functionality, let's require the gem:

    require 'rubygems'
    require 'tire'

Please note that you can copy these snippets from the much more extensive and heavily annotated file in examples/tire-dsl.rb.

Also, note that we're doing some heavy JSON lifting here. Tire uses the multi_json gem as a generic JSON wrapper, which allows you to use your preferred JSON library. We'll use the yajl-ruby gem in the full on mode here:

    require 'yajl/json_gem'

Let's create an index named articles and store/index some documents:

    Tire.index 'articles' do
      delete
      create

      store :title => 'One',   :tags => ['ruby']
      store :title => 'Two',   :tags => ['ruby', 'python']
      store :title => 'Three', :tags => ['java']
      store :title => 'Four',  :tags => ['ruby', 'php']

      refresh
    end

We can also create the index with custom mapping for a specific document type:

    Tire.index 'articles' do
      delete

      create :mappings => {
        :article => {
          :properties => {
            :id       => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
            :title    => { :type => 'string', :boost => 2.0,            :analyzer => 'snowball'  },
            :tags     => { :type => 'string', :analyzer => 'keyword'                             },
            :content  => { :type => 'string', :analyzer => 'snowball'                            }
          }
        }
      }
    end

Of course, we may have large amounts of data, and it may be impossible or impractical to add them to the index one by one. We can use Elasticsearch's bulk storage. Notice that collection items must have an id property or method, and should have a type property if you've set any specific mapping for the index.

    articles = [
      { :id => '1', :type => 'article', :title => 'one',   :tags => ['ruby']           },
      { :id => '2', :type => 'article', :title => 'two',   :tags => ['ruby', 'python'] },
      { :id => '3', :type => 'article', :title => 'three', :tags => ['java']           },
      { :id => '4', :type => 'article', :title => 'four',  :tags => ['ruby', 'php']    }
    ]

    Tire.index 'articles' do
      import articles
    end

We can easily manipulate the documents before storing them in the index, by passing a block to the import method, like this:

    Tire.index 'articles' do
      import articles do |documents|

        documents.each { |document| document[:title].capitalize! }
      end

      refresh
    end

If this declarative notation does not fit well in your context, you can use Tire's classes directly, in a more imperative manner:

    index = Tire::Index.new('oldskool')
    index.delete
    index.create
    index.store :title => "Let's do it the old way!"
    index.refresh

OK. Now, let's go search all the data.

We will be searching for articles whose title begins with letter “T”, sorted by title in descending order, filtering them for ones tagged “ruby”, and also retrieving some facets from the database:

    s = Tire.search 'articles' do
      query do
        string 'title:T*'
      end

      filter :terms, :tags => ['ruby']

      sort { by :title, 'desc' }

      facet 'global-tags', :global => true do
        terms :tags
      end

      facet 'current-tags' do
        terms :tags
      end
    end

(Of course, we may also page the results with from and size query options, retrieve only specific fields or highlight content matching our query, etc.)

Let's display the results:

    s.results.each do |document|
      puts "* #{ document.title } [tags: #{document.tags.join(', ')}]"
    end

    # * Two [tags: ruby, python]

Let's display the global facets (distribution of tags across the whole database):

    s.results.facets['global-tags']['terms'].each do |f|
      puts "#{f['term'].ljust(10)} #{f['count']}"
    end

    # ruby       3
    # python     1
    # php        1
    # java       1

Now, let's display the facets based on current query (notice that count for articles tagged with 'java' is included, even though it's not returned by our query; count for articles tagged 'php' is excluded, since they don't match the current query):

    s.results.facets['current-tags']['terms'].each do |f|
      puts "#{f['term'].ljust(10)} #{f['count']}"
    end

    # ruby       1
    # python     1
    # java       1
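To make the difference between the two facets concrete, here is a small plain-Ruby simulation that reproduces the numbers above without talking to Elasticsearch. It assumes that facets are counted over the documents matching the query, while the filter only narrows the returned hits; the `count_tags` helper is made up for this illustration.

```ruby
# Plain-Ruby simulation of the example; it does not talk to Elasticsearch.
DOCS = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

# Hypothetical helper: count how many documents carry each tag.
def count_tags(documents)
  counts = Hash.new(0)
  documents.each { |doc| doc[:tags].each { |tag| counts[tag] += 1 } }
  counts
end

# Documents matching the query string title:T*
matching_query = DOCS.select { |doc| doc[:title].start_with?('T') }
# Hits left after applying the filter on the 'ruby' tag
hits = matching_query.select { |doc| doc[:tags].include?('ruby') }

puts hits.map { |doc| doc[:title] }.inspect  # the returned documents
puts count_tags(DOCS).inspect                # facet with :global => true
puts count_tags(matching_query).inspect      # facet scoped to the query
```

The 'java' count comes from “Three”, which matches the query but is removed by the filter; “Four” never matches the query, so 'php' is not counted.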

Notice that only local variables from the enclosing scope are accessible inside the block. If we want to access instance variables or methods from the outer scope, we have to use a slight variation of the DSL, passing the search and query objects around as block arguments.

    @query = 'title:T*'

    Tire.search 'articles' do |search|
      search.query do |query|
        query.string @query
      end
    end
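The reason lies in how Ruby evaluates DSL blocks. Here is a minimal sketch, assuming the DSL runs argument-less blocks with instance_eval and yields itself to blocks that take an argument. The `MiniSearch` and `SearchController` classes are invented for this illustration and are not Tire's code.

```ruby
# Hypothetical mini-DSL illustrating block scoping; not Tire's implementation.
class MiniSearch
  attr_reader :query_string

  def initialize(&block)
    # No block argument: evaluate the block with self set to this object.
    # With a block argument: call the block normally and pass self in.
    block.arity < 1 ? instance_eval(&block) : block.call(self)
  end

  def string(value)
    @query_string = value
  end
end

class SearchController
  def initialize
    @query = 'title:T*'
  end

  def without_argument
    # @query is looked up on the MiniSearch instance here, so it is nil.
    MiniSearch.new { string @query }.query_string
  end

  def with_argument
    # self is still the controller, so its @query is visible.
    MiniSearch.new { |search| search.string @query }.query_string
  end
end

controller = SearchController.new
p controller.without_argument
p controller.with_argument
```

Local variables survive in both forms because blocks are closures; only `self`, and with it instance variables and method lookup, changes under instance_eval.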

Quite often, we need complex queries with boolean logic. Instead of composing long query strings such as tags:ruby OR tags:java AND NOT tags:python, we can use the bool query. In Tire, we build them declaratively.

    Tire.search 'articles' do
      query do
        boolean do
          should   { string 'tags:ruby' }
          should   { string 'tags:java' }
          must_not { string 'tags:python' }
        end
      end
    end

The best thing about boolean queries is that we can easily save these partial queries as Ruby blocks, to mix and reuse them later. So, we may define a query for the tags property:

    tags_query = lambda do |boolean|
      boolean.should { string 'tags:ruby' }
      boolean.should { string 'tags:java' }
    end

And a query for the published_on property:

    published_on_query = lambda do |boolean|
      boolean.must   { string 'published_on:[2011-01-01 TO 2011-01-02]' }
    end

Now, we can combine these queries for different searches:

    Tire.search 'articles' do
      query do
        boolean &tags_query
        boolean &published_on_query
      end
    end

Note that you can configure queries, facets, etc. by passing a Hash of options as the last argument to the method call:

    Tire.search 'articles' do
      query do
        string 'ruby python', :default_operator => 'AND', :use_dis_max => true
      end
    end

You don't have to define the search criteria in one monolithic Ruby block -- you can build the search step by step, until you call the results method:

    s = Tire.search('articles') { query { string 'title:T*' } }
    s.filter :terms, :tags => ['ruby']
    p s.results

If configuring the search payload with blocks feels somehow too weak for you, you can pass a plain old Ruby Hash (or JSON string) with the query declaration to the search method:

    Tire.search 'articles', :query => { :prefix => { :title => 'fou' } }

If this sounds like a great idea to you, you are probably able to write your application using just curl, sed and awk.

Do note again, however, that you're not tied to the declarative block-style DSL Tire offers to you. If it makes more sense in your context, you can use the API directly, in a more imperative style:

    search = Tire::Search::Search.new('articles')
    search.query  { string('title:T*') }
    search.filter :terms, :tags => ['ruby']
    search.sort   { by :title, 'desc' }
    search.facet('global-tags') { terms :tags, :global => true }
    # ...
    p search.results

To debug the query we have laboriously set up like this, we can display the full query JSON for close inspection:

    puts s.to_json
    # {"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}

Or, better, we can display the corresponding curl command to recreate and debug the request in the terminal:

    puts s.to_curl
    # curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}'

However, we can simply log every search query (and other requests) in this curl-friendly format:

    Tire.configure { logger 'elasticsearch.log' }

When you set the log level to debug:

    Tire.configure { logger 'elasticsearch.log', :level => 'debug' }

the JSON responses are logged as well. This is not a great idea for production environment, but it's priceless when you want to paste a complicated transaction to the mailing list or IRC channel.

The Tire DSL tries hard to provide a strong Ruby-like API for the main Elasticsearch features.

By default, Tire wraps the results collection in an enumerable Results::Collection class, and result items in a Results::Item class, which looks like a child of Hash and OpenStruct, for smooth iterating over and displaying the results.

You may wrap the result items in your own class by setting the Tire.configuration.wrapper property. Your class must take a Hash of attributes on initialization.
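As a rough sketch, such a wrapper only needs a constructor that accepts a Hash. The `ArticleResult` class below is hypothetical, and the registration call is shown in a comment only (as an assumption modelled on the `Tire.configure { logger ... }` form above), so the snippet runs without Tire or a server.

```ruby
# Hypothetical wrapper class: all it needs is to accept a Hash of attributes.
class ArticleResult
  attr_reader :attributes

  def initialize(attributes = {})
    # Normalize keys to Strings so symbol and string keys behave the same.
    @attributes = {}
    attributes.each { |key, value| @attributes[key.to_s] = value }
  end

  def title
    @attributes['title']
  end

  def tags
    @attributes['tags'] || []
  end
end

# Registration would presumably look like this (assumption, not executed here):
#   Tire.configure { wrapper ArticleResult }

result = ArticleResult.new('title' => 'Two', :tags => ['ruby', 'python'])
puts "* #{result.title} [tags: #{result.tags.join(', ')}]"
```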

If that seems like a great idea to you, there's a big chance you already have such a class.

One would bet it's an ActiveRecord or ActiveModel class, holding a model of your Rails application.

Fortunately, Tire makes blending Elasticsearch features into your models trivially possible.

ActiveModel Integration

If you're the type with no time for lengthy introductions, you can generate a fully working example Rails application, with an ActiveRecord model and a search form, to play with (it even downloads Elasticsearch itself, generates the application skeleton and leaves you with a Git repository to explore the steps and the code):

$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

For the rest of us, let's suppose you have an Article class in your Rails application.

To make it searchable with Tire, just include it:

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

When you now save a record:

    Article.create :title =>   "I Love Elasticsearch",
                   :content => "...",
                   :author =>  "Captain Nemo",
                   :published_on => Time.now

it is automatically added into an index called 'articles', because of the included callbacks.

The document attributes are indexed exactly as when you call the Article#to_json method.

Now you can search the records:

    Article.search 'love'

OK. This is where the search game stops, often. Not here.

First of all, you may use the full query DSL, as explained above, with filters, sorting, advanced facet aggregation, highlighting, etc:

    Article.search do
      query             { string 'love' }
      facet('timeline') { date   :published_on, :interval => 'month' }
      sort              { by     :published_on, 'desc' }
    end

Second, dynamic mapping is a godsend when you're prototyping. For serious usage, though, you'll definitely want to define a custom mapping for your models:

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      mapping do
        indexes :id,           :index    => :not_analyzed
        indexes :title,        :analyzer => 'snowball', :boost => 100
        indexes :content,      :analyzer => 'snowball'
        indexes :content_size, :as       => 'content.size'
        indexes :author,       :analyzer => 'keyword'
        indexes :published_on, :type => 'date', :include_in_all => false
      end
    end

In this case, only the defined model attributes are indexed. The mapping declaration creates the index when the class is loaded or when the importing features are used, and only when it does not yet exist.

You can define different analyzers, boost levels for different properties, or any other configuration for Elasticsearch.

You're not limited to 1:1 mapping between your model properties and the serialized document. With the :as option, you can pass a string or a Proc object which is evaluated in the instance context (see the content_size property).

Chances are you also want to declare custom settings for the index, such as the number of shards and replicas, or create elaborate analyzer chains, such as the hipster's choice: ngrams. In this case, just wrap the mapping method in a settings one, passing it the settings as a Hash:

    class URL < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      settings :number_of_shards => 1,
               :number_of_replicas => 1,
               :analysis => {
                 :filter => {
                   :url_ngram  => {
                     "type"     => "nGram",
                     "max_gram" => 5,
                     "min_gram" => 3 }
                 },
                 :analyzer => {
                   :url_analyzer => {
                      "tokenizer"    => "lowercase",
                      "filter"       => ["stop", "url_ngram"],
                      "type"         => "custom" }
                 }
               } do
        mapping { indexes :url, :type => 'string', :analyzer => "url_analyzer" }
      end
    end

Note that the index will be created with settings and mappings only when it doesn't exist yet. To re-create the index with the correct configuration, delete it first with URL.index.delete and create it afterwards with URL.create_elasticsearch_index.

It may well be reasonable to wrap the index creation logic declared with Tire.index('urls').create in a class method of your model, in a module method, etc, to have better control on index creation when bootstrapping the application with Rake tasks or when setting up the test suite. Tire will not hold that against you.

You may have just stopped wondering: what if I have my own settings class method defined? Or what if some other gem defines settings, or some other Tire method, such as update_index? Things will break, right? No, they won't.

In fact, all this time you've been using only proxies to the real Tire methods, which live in the tire class and instance methods of your model. Only when not trampling on someone's foot — which is the majority of cases —, will Tire bring its methods to the namespace of your class.

So, instead of writing Article.search, you could write Article.tire.search, and instead of @article.update_index you could write @article.tire.update_index, to be on the safe side. Let's have a look at an example with the mapping method:

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      tire.mapping do
        indexes :id, :type => 'string', :index => :not_analyzed
        # ...
      end
    end

Of course, you could also use the block form:

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      tire do
        mapping do
          indexes :id, :type => 'string', :index => :not_analyzed
          # ...
        end
      end
    end

Internally, Tire uses these proxy methods exclusively. When you run into issues, use the proxied method, eg. Article.tire.mapping, directly.
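The proxy idea can be sketched in plain Ruby. This is a hypothetical illustration of the pattern, not Tire's source: the `MiniSearchable` module, its `ClassProxy` and the return values are all made up. It assumes that shortcuts are added only when the class does not already respond to the name.

```ruby
# Hypothetical sketch of the proxy idea; the names are illustrative, not Tire's source.
module MiniSearchable
  # The proxy object holds the "real" implementation.
  class ClassProxy
    def initialize(klass)
      @klass = klass
    end

    def search(query)
      "searching #{@klass.name} for #{query.inspect}"
    end

    def settings
      { :number_of_shards => 1 }
    end
  end

  def self.included(base)
    base.extend ClassMethods
    # Add a shortcut only when the class does not already respond to that name.
    [:search, :settings].each do |name|
      next if base.respond_to?(name)
      base.define_singleton_method(name) { |*args| tire.send(name, *args) }
    end
  end

  module ClassMethods
    def tire
      @tire ||= ClassProxy.new(self)
    end
  end
end

class Article
  # A pre-existing method with a conflicting name.
  def self.settings
    'my own settings'
  end

  include MiniSearchable
end

puts Article.search('love')         # shortcut, delegates to the proxy
puts Article.settings               # the class's own method is left alone
puts Article.tire.settings.inspect  # the proxy is always available
```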

When you want a tight grip on how the attributes are added to the index, just implement the to_indexed_json method in your model.

The easiest way is to customize the to_json serialization support of your model:

    class Article < ActiveRecord::Base
      # ...

      self.include_root_in_json = false
      def to_indexed_json
        to_json :except => ['updated_at'], :methods => ['length']
      end
    end

Of course, it may well be reasonable to define the indexed JSON from the ground up:

    class Article < ActiveRecord::Base
      # ...

      def to_indexed_json
        names      = author.split(/\W/)
        last_name  = names.pop
        first_name = names.join

        {
          :title   => title,
          :content => content,
          :author  => {
            :first_name => first_name,
            :last_name  => last_name
          }
        }.to_json
      end
    end
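To see what this serialization produces, here is the same method on a stand-in object, so the snippet runs without Rails. `ArticleStub` is a hypothetical Struct used only for this illustration; the attribute values are taken from the Article.create example above.

```ruby
require 'json'

# Stand-in for the model, so the snippet runs without Rails (hypothetical).
ArticleStub = Struct.new(:title, :content, :author) do
  def to_indexed_json
    names      = author.split(/\W/)  # split the name on non-word characters
    last_name  = names.pop           # the last part becomes the last name
    first_name = names.join          # the remaining parts are concatenated

    {
      :title   => title,
      :content => content,
      :author  => { :first_name => first_name, :last_name => last_name }
    }.to_json
  end
end

article = ArticleStub.new('I Love Elasticsearch', '...', 'Captain Nemo')
puts article.to_indexed_json
```

Note that names.join uses no separator, so an author with several given names would get them run together in first_name.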

Notice that you may want to skip including the Tire::Model::Callbacks module in special cases, like when your records are indexed via some external mechanism, let's say a CouchDB or RabbitMQ river, or when you need better control on how the documents are added to or removed from the index:

    class Article < ActiveRecord::Base
      include Tire::Model::Search

      after_save do
        update_index if state == 'published'
      end
    end

Sometimes, you might want to have complete control over the indexing process. In such situations, just drop down one layer and use the Tire::Index#store and Tire::Index#remove methods directly:

    class Article < ActiveRecord::Base
      acts_as_paranoid
      include Tire::Model::Search

      after_save do
        if deleted_at.nil?
          self.index.store self
        else
          self.index.remove self
        end
      end
    end

Of course, in this way, you're still performing an HTTP request during your database transaction, which is not optimal for large-scale applications. In these situations, a better option would be processing the index operations in background, with something like Resque or Sidekiq:

    class Article < ActiveRecord::Base
      include Tire::Model::Search

      after_save    { Indexer::Index.perform_async(document) }
      after_destroy { Indexer::Remove.perform_async(document) }
    end

When you're integrating Tire with ActiveRecord models, you should use the after_commit and after_rollback hooks to keep the index in sync with your database.

The results returned by Article.search are wrapped in the aforementioned Item class by default. This gives us fast and flexible access to the properties returned from Elasticsearch (via the _source or fields JSON properties), so we can index whatever JSON we like in Elasticsearch and retrieve it simply, via the dot notation:

    articles = Article.search 'love'
    articles.each do |article|
      puts article.title
      puts article.author.last_name
    end

The Item instances masquerade themselves as instances of your model within a Rails application (based on the _type property retrieved from Elasticsearch), so you can use them carefree; all the url_for or dom_id helpers work as expected.

If you need to access the “real” model (eg. to access its associations or methods not stored in Elasticsearch), just load it from the database:

    puts article.load(:include => 'comments').comments.size

You can see that Tire stays as far from the database as possible. That's because it believes you have most of the data you want to display stored in Elasticsearch. When you need to eagerly load the records from the database itself, for whatever reason, you can do it with the :load option when searching:

    # Will call `Article.find [1, 2, 3]`
    Article.search 'love', :load => true

Instead of simple true, you can pass any options for the model's find method:

    # Will call `Article.find [1, 2, 3], :include => 'comments'`
    Article.search :load => { :include => 'comments' } do
      query { string 'love' }
    end

If you would like to access properties returned by Elasticsearch (such as _score), in addition to model instance, use the each_with_hit method:

    results = Article.search 'One', :load => true
    results.each_with_hit do |result, hit|
      puts "#{result.title} (score: #{hit['_score']})"
    end

    # One (score: 0.300123)

Note that Tire search results are fully compatible with WillPaginate and Kaminari, so you can pass all the usual parameters to the search method in the controller:

    @articles = Article.search params[:q], :page => (params[:page] || 1)

OK. Chances are, you have lots of records stored in your database. How will you get them to Elasticsearch? Easy:

    Article.index.import Article.all

This way, however, all your records are loaded into memory, serialized into JSON, and sent down the wire to Elasticsearch. Not practical, you say? You're right.

When your model is an ActiveRecord::Base or Mongoid::Document one, or when it implements some sort of pagination, you can just run:

    Article.import

Depending on the setup of your model, either find_in_batches, limit..skip or pagination is used to import your data.

Are we saying you have to fiddle with this thing in a rails console or silly Ruby scripts? No. Just call the included Rake task on the command line:

    $ rake environment tire:import:all

You can also force-import the data by deleting the index first (and creating it with correct settings and/or mappings provided by the mapping block in your model):

    $ rake environment tire:import CLASS='Article' FORCE=true

When you spend more time with Elasticsearch, you'll notice how index aliases are the best idea since the invention of the inverted index. You can index your data into a fresh index (and possibly update an alias once everything's fine):

    $ rake environment tire:import CLASS='Article' INDEX='articles-2011-05'

Finally, consider the Rake importing task just a convenient starting point. If you're loading substantial amounts of data, want better control on which data will be indexed, etc., use the lower-level Tire API with eg. ActiveRecord::Base#find_in_batches directly:

    Article.where("published_on > ?", Time.parse("2012-10-01")).find_in_batches(:include => :authors) do |batch|
      Tire.index("articles").import batch
    end
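The batching idea itself does not depend on ActiveRecord. The following is a minimal sketch under stated assumptions: `FakeIndex` is a hypothetical stand-in that only records the size of each bulk request, and `import_in_batches` is an invented helper, not part of Tire.

```ruby
# Hypothetical stand-in for an index; it only records the size of each bulk request.
class FakeIndex
  attr_reader :bulk_sizes

  def initialize
    @bulk_sizes = []
  end

  def import(documents)
    @bulk_sizes << documents.size
  end
end

# Send records to the index in fixed-size batches instead of all at once.
def import_in_batches(records, index, batch_size = 1000)
  records.each_slice(batch_size) { |batch| index.import(batch) }
  index
end

records = (1..2500).map { |id| { :id => id.to_s, :type => 'article', :title => "Article #{id}" } }
index   = import_in_batches(records, FakeIndex.new)
p index.bulk_sizes
```

Each call to import corresponds to one bulk request, so no single request has to carry the whole data set.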

If you're using a different database, such as MongoDB, or another object mapping library, such as Mongoid or MongoMapper, things stay mostly the same:

    class Article
      include Mongoid::Document
      field :title, :type => String
      field :content, :type => String

      include Tire::Model::Search
      include Tire::Model::Callbacks

      # These Mongo guys sure do get funky with their IDs in +serializable_hash+, let's fix it.
      #
      def to_indexed_json
        self.to_json
      end

    end

    Article.create :title => 'I Love Elasticsearch'

    Article.tire.search 'love'

Tire does not care what your primary data storage solution is, as long as it has an ActiveModel-compatible adapter. But there's more.

Tire implements not only searchable features, but also persistence features. This means you can use a Tire model instead of your database, not just for searching your database. Why would you like to do that?

Well, because you're tired of database migrations and lots of hand-holding with your database to store stuff like { :name => 'Tire', :tags => [ 'ruby', 'search' ] }. Because all you need, really, is to just dump a JSON-representation of your data into a database and load it back again. Because you've noticed that searching your data is a much more effective way of retrieval than constructing elaborate database query conditions. Because you have lots of data and want to use Elasticsearch's advanced distributed features.

All good reasons to use Elasticsearch as a schema-free and highly-scalable storage and retrieval/aggregation engine for your data.

To use the persistence mode, we'll include the Tire::Model::Persistence module in our class and define its properties; we can add the standard mapping declarations, set default values, or define casting for the property to create lightweight associations between the models.

    class Article
      include Tire::Model::Persistence

      validates_presence_of :title, :author

      property :title,        :analyzer => 'snowball'
      property :published_on, :type => 'date'
      property :tags,         :default => [], :analyzer => 'keyword'
      property :author,       :class => Author
      property :comments,     :class => [Comment]
    end

Please be sure to peruse the integration test suite for examples of the API and ActiveModel integration usage.

Extensions and Additions

The tire-contrib project contains additions and extensions to the core Tire functionality — be sure to check them out.

Other Clients

Check out other Elasticsearch clients.

Feedback

You can send feedback via e-mail or via Github Issues.


Karel Minarik and contributors

retire's People

Contributors

chromeragnarok, cjbottaro, clemens, crx, dylanahsmith, erickt, fabn, fcheung, filiptepper, floere, grantr, greglearns, jonkarna, jwaldrip, karmi, martinciu, michaelklishin, moktin, msonnabaum, nz, phoet, phungleson, pshoukry, ralph, redbeard, romanbsd, rubish, timoschilling, vhyza, woahdae


retire's Issues

Tire requires old version of rake

When I try to install Tire, I get the following error message:

Bundler could not find compatible versions for gem "rake":
  In Gemfile:
    tire depends on
      rake (~> 0.8.0)

    rails (>= 3.0.7) depends on
      rake (0.9.1)

"bundle update" of course solves the problem and installs rake-0.8, but if I'm not wrong, Rails 3.1 will depend on rake-0.9. Would it be possible to change the dependency?

How to define mappings for nested Data?

Currently I have Yellowpages, each containing an address. When searching for a Yellowpage, I want the search to cover those nested addresses too, so I embedded the address in the to_indexed_json method.

    class Yellowpage < ActiveRecord::Base
      has_one :address

      mapping do
        indexes :id,      :type => 'integer', :analyzed => false
        indexes :title,   :type => 'string',  :boost => 2
        indexes :body,    :type => 'string',  :analyzer => 'snowball'
      end

      def to_indexed_json
        {
          id:     id,
          title:  title,
          body:   body,
          address: {
            id:     address.id,
            name:   address.name,
            street: address.street,
            ...
          }
        }.to_json
      end
    end

Is there a way to define the nested mapping (yellowpage.address) in an ActiveRecord model with Tire? And is this a good way to solve my problem, or is there another best practice/usage pattern I should look into?

Multi Word Facet issue

I am incorporating the Tire gem into one of my Rails applications. I have a derived field called channel_name which I need to use for faceting. My model looks like this:

    include Tire::Model::Search
    include Tire::Model::Callbacks

    mapping do
      indexes :channel_name, :type => 'string', :index => :not_analyzed
    end

    def to_indexed_json
      channel_name = self.group_channel.name

      {
        :title        => title,
        :description  => description,
        :channel_name => channel_name,
        :created_at   => created_at
      }.to_json
    end

And I build index like this Show.import :per_page => 500

And when I search using this:

    q = 'series'
    s = Show.search do
      query { string q }
      facet('channel_name') { terms :channel_name }
      sort { created_at 'desc' }
    end

I am getting facets as single words. Ex: I have a channel name 'This is India' , but facets comes as

term=>'this',count=>2
term=>'is',count=>2
term=>'India',count=>2

But I need this term=>'this is India',count=>2

Could you please tell me where I am going wrong? I checked the index created in the data folder of Elasticsearch, and my channel_name got indexed properly there.

facet counts of entire index

Not sure if this one is an issue, but I wanted to do a facet of commercial_type with no query.

    s = EmailMessage.search do
      facet 'commercial_type' do
        terms :commercial_type
      end
    end

The above results in this request failure:

[REQUEST FAILED] curl -X POST "http://localhost:9200/email_messages/_search?pretty=true" -d '{"query":null,"facets":{"commercial_type":{"terms":{"field":"commercial_type","size":10,"all_terms":false}}}}'

I am able to get it working by using a wildcard string search:

    s = EmailMessage.search do
      query { string '*' }

      facet 'commercial_type' do
        terms :commercial_type
      end
    end

Thanks!
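For comparison, the failed request above sends "query":null. A body that matches every document can instead use Elasticsearch's match_all query. The sketch below only builds that JSON with the standard library; the `facet_only_payload` helper is hypothetical, and nothing here claims to show how Tire itself should generate the request.

```ruby
require 'json'

# Build a search body that matches all documents and asks for one terms facet.
def facet_only_payload(field)
  {
    :query  => { :match_all => {} },
    :facets => { field => { :terms => { :field => field } } }
  }
end

puts facet_only_payload('commercial_type').to_json
```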

Method name conflict with Mongoid: index

I'm using Mongoid and running into an issue when I attempt to use tire:

    /home/chris/.rvm/gems/ruby-1.9.2-p180/gems/tire-0.1.1/lib/tire/model/search.rb:54:in `index': wrong number of arguments (2 for 0) (ArgumentError)

This is being thrown from my model where I've got the Mongoid "index" method for specifying that I want a field in my model indexed in MongoDB.

I'm relatively new to Ruby - is there a way I can namespace the class and instance methods from the Tire gem so I don't run into this?

ActiveRecord fields not in database

I am trying to define a custom mapping block on my ActiveRecord object for fields that are not in my database. I noticed that Tire is explicitly deleting these fields when I call to_indexed_json.

else
     self.serializable_hash.
       reject { |key, value| ! self.class.mapping.keys.map(&:to_s).include?(key.to_s) }.
       to_json
end

What is the proper way of defining fields that are not attributes within ActiveRecord?

Thanks!
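One workaround to consider is overriding to_indexed_json so the document is built by calling each mapped field as a method, which also picks up fields that are not database columns. The sketch below uses a plain Ruby class; `Article`, `MAPPED_FIELDS` and `word_count` are hypothetical names, and this is not how Tire itself builds the document.

```ruby
require 'json'

# Hypothetical model: `title` stands in for a database column,
# `word_count` for a field that exists only as a method.
class Article
  MAPPED_FIELDS = [:title, :word_count]

  attr_reader :title

  def initialize(title)
    @title = title
  end

  def word_count
    title.split.size
  end

  # Build the indexed document by calling each mapped field as a method,
  # rather than filtering a column-only serializable hash.
  def to_indexed_json
    document = {}
    MAPPED_FIELDS.each { |field| document[field.to_s] = send(field) }
    document.to_json
  end
end

puts Article.new('One two three').to_indexed_json
```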

Undefined method facets

I have a feeling I am doing something wrong :-). Trying to follow the faceted search example here:

http://karmi.github.com/tire/

My code looks like:

https://gist.github.com/33754cdc422b9dfb70eb

And from irb, I get:

irb(main):014:0* s.results.facets['tags']['terms'].each do |f|
irb(main):015:1*   puts "#{f['term'].ljust(10)} #{f['count']}"
irb(main):016:1> end
NoMethodError: undefined method `facets' for #<Array:0x00000003adc070>
from (irb):14
from /usr/bin/irb:12:in `<main>'
irb(main):017:0> 

Thanks!

multiple filter bug

In the Filter#to_json method, it is possible to have an array of filters but when converting the filters to a hash, the last filter in the array will be the one that is used.
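The symptom can be reproduced in plain Ruby, assuming the filters are merged into a single Hash (an assumption about the cause, not a reading of the actual Filter#to_json code). When both filters share a top-level key such as "terms", the later one overwrites the earlier one. Wrapping the list in an Elasticsearch "and" filter keeps all of them.

```ruby
filters = [
  { 'terms' => { 'tags' => ['ruby'] } },
  { 'terms' => { 'prices' => ['1'] } }
]

# Merging into one Hash: both entries use the "terms" key, so the last wins.
merged = filters.inject({}) { |hash, filter| hash.merge(filter) }

# Wrapping the list in an "and" filter keeps every entry.
combined = filters.size > 1 ? { 'and' => filters } : filters.first

p merged
p combined
```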

JSON encoding broken when using MultiJSON without Yajl?

This is an example request from my app, as generated by Tire:

"{"query":{"match_all":{}},"sort":[{"last_name":"asc"}],"size":10}"

Since the switch to MultiJSON, I now get a rogue "value" key in my sort param, which makes the whole thing invalid:

"{"query":{"match_all":{}},"sort":{"value":[{"last_name":"asc"}]},"size":10}"

This is with MultiJson.engine == :json_gem, the default. If I set the engine to :yajl in the debugger, it works. If I set it back to :json_gem, it breaks again. Oddly, if I restore the require 'yajl/json_gem' in lib/tire.rb (alongside the MultiJSON require), it works with either engine value, as if Yajl is really doing the work even when it's set to :json_gem.

I have no problem just forcing the use of Yajl, but when using Tire in a Rails app, what's the proper way to do that? gem 'yajl-ruby' alone in the Gemfile doesn't cut it. Should I put MultiJson.engine = :yajl in an initializer? Or is this better fixed in Tire?

URL with slash breaks tire

Doing

Tire.configure do
  url "http://localhost:9200/"
end

results in a bunch of:

[ERROR] 500 Internal Server Error:{"error":"ElasticSearchException[String index out of range: 0]; nested: StringIndexOutOfBoundsException[String index out of range: 0]; ","status":500}, retrying (1)...

Given the error looks javaish I wasn't suspecting it to be the configuration - also because I could see it did connect to ES.

So I propose that Tire either checks the format of the string or runs it through URI.parse.
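A minimal sketch of what such a check could look like, using only Ruby's standard library. `normalize_url` is a hypothetical helper, not an existing Tire method.

```ruby
require 'uri'

# Hypothetical helper: validate the configured URL and drop trailing slashes,
# so that paths appended later do not start with a double slash.
def normalize_url(value)
  uri = URI.parse(value)
  raise ArgumentError, "Invalid URL: #{value}" unless uri.is_a?(URI::HTTP)
  value.sub(%r{/+\z}, '')
end

puts normalize_url('http://localhost:9200/')
puts normalize_url('http://localhost:9200')
```

Both calls print the same URL without the trailing slash.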

created_at and updated_at are returned as nil

I indexed a basic User ActiveRecord model and although the created_at fields and updated_at fields are indexed properly, as shown with the Elastic Search Head browser, they are being returned as nil from User.search. A few other fields are coming back as nil as well. This is probably a user error, my apologies. I am using Rails 3.05.

undefined method `paginate' for MyClass:Class, when importing

I'm using Mongo, but I'm not sure that's the problem here. When I try to run tire:import on my model, it fails with undefined method `paginate' for Recipe:Class:

rake environment tire:import CLASS='Recipe' PARAMS='{:page => 1}' FORCE=1 --trace
** Execute environment
** Execute tire:import
# ...
--------------------------------------------------------------------------------
rake aborted!
undefined method `paginate' for Recipe:Class

And here is the stack trace for that:

rake aborted!
undefined method `paginate' for Recipe:Class
/Users/jpsilvashy/.rvm/gems/ruby-1.9.2-p180/gems/tire-0.1.14/lib/tire/index.rb:136:in `import'
Tasks: TOP => tire:import

Any ideas? I've tried it with and without will_paginate installed also with no success.

Possible pagination issue when importing with mongoid >=2.1.2

Take a look at https://github.com/mongoid/mongoid/issues/1131.

Doing the importing like described here: #48 (comment) started to result in these errors:

[ERROR] 400 Bad Request:{"error":"Failed to derive xcontent from []"}, retrying (1)...
[ERROR] 400 Bad Request:{"error":"Failed to derive xcontent from []"}, retrying (2)...
[ERROR] 400 Bad Request:{"error":"Failed to derive xcontent from []"}, retrying (3)...
[ERROR] 400 Bad Request:{"error":"Failed to derive xcontent from []"}, retrying (4)...
[ERROR] 400 Bad Request:{"error":"Failed to derive xcontent from []"}, retrying (5)...

I worked around it by adding ".all.to_a" to my paginate call as seen here: https://github.com/mongoid/mongoid/issues/1131#issuecomment-1808706

Bug?

Define multilevel nested mapping

Creating this ticket from the dialog in #56

Tire currently supports only a single level of nested mapping and is not capable of creating multi-level nested mappings.

In searchable ActiveModel integration, add a proxy object to every result item, pointing to the model instance

Due to the problems with ActiveRecord models with associations (see the original issue #10), add a proxy object to every result item allowing the access to the underlying model instance.

This way, we would keep the performance of ElasticSearch, for most cases, and allow to access the original model instance (and its methods), where needed. So, instead of result.comments.first.my_complicated_method, you would write result.object.comments.first.my_complicated_method.
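A rough sketch of the idea in plain Ruby: the result item answers from the indexed fields and only calls back to the database when `object` is requested. `ResultItem` and the loader block are hypothetical stand-ins, not the eventual Tire implementation.

```ruby
# Hypothetical result item: serves indexed fields from the search hit and
# only touches the database when #object is called.
class ResultItem
  def initialize(attributes, &loader)
    @attributes = attributes
    @loader = loader
  end

  # Lazily load (and memoize) the underlying model instance.
  def object
    @object ||= @loader.call(@attributes['id'])
  end

  def method_missing(name, *args)
    @attributes.key?(name.to_s) ? @attributes[name.to_s] : super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

loads = 0
item = ResultItem.new({ 'id' => 1, 'title' => 'One' }) do |id|
  loads += 1
  "record-#{id}" # stands in for a database lookup by ID
end

puts item.title
puts item.object
```

Reading `item.title` never runs the loader; only `item.object` does, and only once.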

Index aliases

Built-in support for index aliases would be a very nice feature to have. I would suggest something along the lines of escargot (https://github.com/angelf/escargot).

Considerations:

  1. Make indexes and aliases different for different environments
  2. Provide a rake task which creates a new index and, on successful completion, updates the alias the model uses for reads to point to the new index
  3. While the new index is being created, send updates to both the old and the new index. Maybe maintain a write alias for each model, which points to all indexes currently in ES for that model.
  4. Provide a rake task to delete unused indexes, i.e. ones which are no longer used by any alias
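The alias switch in step 2 could be sketched as follows. The helper only builds the request body for Elasticsearch's _aliases endpoint, where a "remove" and an "add" action are sent in one request. `alias_swap` and the index names are hypothetical; nothing here is an existing Tire feature.

```ruby
require 'json'

# Hypothetical helper: body for the _aliases endpoint that moves an alias
# from the old index to the new one in a single request.
def alias_swap(alias_name, old_index, new_index)
  {
    'actions' => [
      { 'remove' => { 'index' => old_index, 'alias' => alias_name } },
      { 'add'    => { 'index' => new_index, 'alias' => alias_name } }
    ]
  }
end

# Environment-prefixed names, as in consideration 1 (names are made up).
body = alias_swap('development-articles',
                  'development-articles-v1',
                  'development-articles-v2')
puts JSON.pretty_generate(body)
```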

Tire::Model::Callbacks prevents object deletion

When I include Tire::Model::Callbacks in a model, the records are no longer removed from the database on destroy(). All associations are deleted and the Elasticsearch index gets updated, but the final delete statement is never run.

So far I had no luck narrowing it down. Any ideas where to look?

This is with Rails 3.0.9 and tire 0.1.16.

Dual interface for methods

I would like to push this to the top of the queue. I started digging into the source and it seems like the simplest solution would be to add another method like "method_hash". I am not sure this is the best way to do it; alternatively, if the method is called with just a hash, the type could be set to nil, with some logic in the method to handle that case.

too much data for sort() with no index. add an index or specify a smaller limit

When running import for my model, if I have a lot of records I get the following:

[IMPORT] Starting import for the 'Recipe.all' class
--------------------------------------------------------------------------------
rake aborted!
too much data for sort() with no index.  add an index or specify a smaller limit

Now, I think this is more of a Mongo problem, but I'm not sure what index to create to satisfy Tire, or maybe it's a tire problem, I can't really tell.

will_paginate errors

Trying to create a simple search form with will_paginate:

https://gist.github.com/35b576c02ce480177ec7

But I get this error:

ActionView::Template::Error (undefined method `offset' for #<Tire::Results::Collection:0x00000105314ce0>):
    1: <%= page_entries_info @email_messages %> #-> Displaying posts 6 - 10 of 26 in total
  app/views/simple_searches/create.html.erb:1:in `_app_views_simple_searches_create_html_erb__1619823318822797195_2155602960_2692340048174035920'

Thanks!
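The error says the collection has no `offset` method. As a sketch, the value can be derived from the current page and page size. `PagedResults` is a hypothetical stand-in for the results collection, and the numbers are chosen to match the "6 - 10 of 26" message in the template comment.

```ruby
# Hypothetical stand-in for a paginated results collection.
class PagedResults
  attr_reader :current_page, :per_page, :total_entries

  def initialize(current_page, per_page, total_entries)
    @current_page = current_page
    @per_page = per_page
    @total_entries = total_entries
  end

  def total_pages
    (total_entries.to_f / per_page).ceil
  end

  # Number of entries skipped before the current page starts.
  def offset
    (current_page - 1) * per_page
  end
end

page = PagedResults.new(2, 5, 26)
puts "Displaying #{page.offset + 1} - #{page.offset + page.per_page} of #{page.total_entries}"
```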

Latest Release Breaks STI

I had some trouble tracking this down, but in 0.2.0, if I add Tire to an ActiveRecord model that is inherited via STI, it breaks Rails' ability to correctly set the type column when new records are created. Search still works.

Chicken & Egg problem with <Model>.index_name

I have an initializer, which looks like this:

Tire.index(Show.index_name) do
  create(
    :settings => {
      "analysis" => {
        "analyzer" => {
          "show_name_analyzer" => {
            "type" => "custom",
            "tokenizer" => "lowercase",
            "filter" => ["name_ngram"]
          }
        },
        "filter" => {
          "name_ngram" => {
            "type" => 'edgeNGram',
            "min_gram" => 2,
            "max_gram" => 7,
            "side" => "front"
          }
        }
      }
    }
  )
end

However the problem is that when autoloading Show, in order to call index_name, the index is created first, with the mapping as defined in the model, I see this in the (verbose) tire log:

#2011-07-13 17:40:30:412 [CREATE] ("test-wit-shows")
#
curl -X POST "http://localhost:9200/test-wit-shows" -d '{"mappings":{"show":{"properties":{"id":{"type":"string","analyzer":"not_analysed"},"name":{"type":"string","analyzer":"show_name_analyzer","boost":500},"overview":{"type":"string"},"started":{"type":"integer"}}}}}'

#2011-07-13 17:40:30:412 [400 Bad Request]

#2011-07-13 17:40:30:537 [CREATE] ("test-wit-shows")
#
curl -X POST "http://localhost:9200/test-wit-shows" -d '{"settings":{"analysis":{"analyzer":{"show_name_analyzer":{"type":"custom","tokenizer":"lowercase","filter":["name_ngram"]}},"filter":{"name_ngram":{"type":"edgeNGram","min_gram":2,"max_gram":7,"side":"front"}}}}}'

#2011-07-13 17:40:30:538 [200]

My suggestion would be to allow settings to be defined alongside the mapping inside the model, an issue I have discussed with you on IRC; thus ensuring when the model is auto-loaded from anywhere,

The workaround, of course, is to define the index name explicitly, in my case #{Rails.env}-wit-shows.

Perhaps I am misusing index_name on models. For me the naming scheme is clear: it's always Rails.env plus the name of the project and the model.

Facet Counts

When filtering by facets only, the facet counts do not seem to reflect the search results.

@s = Tire.search "14" do
  query do
    string 'title:*'
  end
  filter :terms, :prices => ["#{price}"] if price
  filter :terms, :tags => ["#{tag}"] if tag
  facet 'current-tags' do
    terms :tags
  end
  facet 'current-prices' do
    terms :prices
  end
end

Empty search returns

Tags
1 - 48 (corresponds to "Kitchen")
Prices
1 - 2282
2 - 1455
4 - 258
3 - 138

Filtering on tag 1 appears to select the correct documents, but the facet counts remain as above. However, when editing the query to title :kitchen

the facets look correct

Tags
1 - 48
Prices
1 - 48
2 - 47
4 - 14
3 - 4

Hope this makes sense.

Regards
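This matches how Elasticsearch treats a top-level filter: it narrows the returned hits but not the facets, which are computed from the query alone. Moving the filter inside a "filtered" query makes the facets follow it. The sketch below only builds the request body in plain Ruby; `filtered_search` is a hypothetical helper, not Tire DSL.

```ruby
require 'json'

# Hypothetical helper: put the filter inside a "filtered" query so that
# facet counts are computed on the filtered documents.
def filtered_search(query, filter, facets)
  {
    'query'  => { 'filtered' => { 'query' => query, 'filter' => filter } },
    'facets' => facets
  }
end

body = filtered_search(
  { 'query_string' => { 'query' => 'title:*' } },
  { 'terms' => { 'tags' => ['1'] } },
  { 'current-prices' => { 'terms' => { 'field' => 'prices' } } }
)
puts JSON.pretty_generate(body)
```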

rspec 2 empty result with capybara dsl

I've set up RSpec with the Capybara DSL. When running an integration spec, I keep getting an empty search result from the Tire search. But when I use the same curl command in a shell, I'm able to pull back results. Is anyone else seeing this phenomenon?

[REQUEST FAILED] curl -X POST "http://localhost:9200/test_events/_search?pretty=true" -d '...'

Indexing appears to be fine.

gist of my spec_helper.rb looks like:

require 'capybara/rspec'
require 'capybara/rails'

RSpec.configure do |config|
    config.before :all do
      # namespace  elastic search index with test_
      # to avoid clashing with development data
      SuperDuperModel.index_name('test_' + SuperDuperModel.model_name.plural)
    end

    # clear DB before scenario
    config.before :each do
      Tire.index(SuperDuperModel.index_name).delete
    end
end

Models with Associations

I am trying to create an index with associations and running into errors.

For example, I have a Video model that has many categories. I want to be able to search on the video information as well as the category name.

def to_indexed_json
    { :title => title,
      :description => description,
      :categories => categories.collect { |c| c.name },
      :released_at => released_at,
    }.to_json
end

Though when the object gets reloaded I get an unknown attribute error from Rails being caused by this line in Results::Collection

object = Configuration.wrapper.new(document)

I see what the problem is, so my question is: are there plans to change how this is implemented in the future, or am I just going about this the wrong way?

ActiveRecord mapping ":analyzed => false" broken

In your documentation you have this example:

  mapping do
    indexes :id,           :type => 'string',  :analyzed => false
    ...

For me that doesn't create the right mapping in elasticsearch. It should be:

  mapping do
    indexes :id,           :type => 'string',  :index => :not_analyzed

Tire.index

instance_eval takes over self when this is given a block; there is no option other than that.

Well, I cannot pass instance variables from my original class to register_percolator_query or any other method, as instance_eval is called on the index and all the way down to the query.
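A common Ruby pattern for this is to check the block's arity: a block that takes no argument is evaluated with instance_eval, while a block that takes an argument is called normally and keeps the caller's self. The sketch is plain Ruby; `Index`, `register` and `Caller` are hypothetical stand-ins, not Tire's classes.

```ruby
# Hypothetical DSL object supporting both block styles.
class Index
  attr_reader :name, :calls

  def initialize(name, &block)
    @name = name
    @calls = []
    return unless block
    # No block argument: evaluate in the context of the index.
    # With a block argument: call it, leaving the caller's self intact.
    block.arity < 1 ? instance_eval(&block) : block.call(self)
  end

  def register(query_name)
    @calls << query_name
  end
end

class Caller
  def initialize
    @query_name = 'alerts'
  end

  def build
    # @query_name is visible because the block takes an argument.
    Index.new('events') { |index| index.register(@query_name) }
  end
end

p Caller.new.build.calls
p Index.new('events') { register('inline') }.calls
```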

Questions about ActiveRecord branch

I noticed that the activerecord branch has the :load option for loading results from the database but master does not. Is this something that is going to be merged or will it stay in a separate branch?

I'm trying to figure out if I can use this functionality or if I should change my index to include all of the fields I want to display in my app.

Make Tire HTTP-Client Agnostic

Currently, Tire is closely coupled with the RestClient HTTP library on the implementation level, although it is prepared for pluggable backends on the architecture level.

To make this architectural support, such as Tire.configure { client MyClient }, really useful, and HTTP backend pluggable, the following must be implemented:

  • Create a wrapper class Tire::HTTP::Response for HTTP responses, with uniform interface for getting response body, code, headers.
  • Use the wrapper class in the default client implementation, Tire::Client::RestClient
  • Use the wrapper class throughout the code (eg. in Tire::Search::Search.perform method)
  • Create another client implementation for testing purposes, preferably for Curb

The goal is to make the implementation of another HTTP client as easy as providing an interface in Tire or Tire Contrib.
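To illustrate the first bullet, here is a minimal sketch of a uniform response wrapper plus a fake client that returns it. The `Sketch` module and `FakeClient` are hypothetical; the real classes proposed above would live under the Tire namespace.

```ruby
module Sketch
  # Uniform wrapper around whatever the underlying HTTP library returns.
  Response = Struct.new(:body, :code, :headers) do
    def success?
      code.to_i > 0 && code.to_i < 400
    end

    def failure?
      !success?
    end
  end

  # Hypothetical client for tests: any backend only has to return a Response.
  class FakeClient
    def self.get(url)
      Response.new('{"ok":true}', 200, { 'content-type' => 'application/json' })
    end
  end
end

response = Sketch::FakeClient.get('http://localhost:9200/')
puts response.code
puts response.success?
```

Code that consumes responses then depends only on `body`, `code` and `headers`, regardless of the HTTP library in use.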

In searchable ActiveModel integration, do not wrap the results in the model class, but in Tire::Configuration.wrapper

Due to the problem with indexing and retrieving ActiveRecord models with associations (such as Article has_many :comments), do not wrap the results in the model class, but in the Tire::Configuration.wrapper class. See the #10 issue for explanation.

This way, we do not break the model initialization with unknown attributes, while preserving easy and elastic access to result's properties.

Default facet size?

I'm just wondering why the default facet size is set to 10. It doesn't seem like it is necessary and should probably only be set if explicitly requested.

Add a :load option to search methods to load records from database

In the new ActiveModel implementation (in the activerecord branch), there's an option to load the "real", underlying model with the load method. This allows for loading the records only one-by-one, though.

An option :load => true shall be implemented for all the search methods, which will eagerly load all the records from the database (in Tire::Results::Collection#results), based on IDs returned from ElasticSearch.

This will allow using Tire (and ElasticSearch) to retrieve batches of records at once, when necessary (e.g. in export jobs).
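The eager-loading step could be sketched like this: collect the IDs from the hits, fetch all records in one batch, and put them back in the order of the search results. `DATABASE`, `find_all` and `load_records` are hypothetical stand-ins for the model's finder and for the proposed :load behaviour.

```ruby
# Hypothetical stand-in for a database table keyed by ID.
DATABASE = { 1 => 'Article 1', 2 => 'Article 2', 3 => 'Article 3' }

# One batched lookup for all IDs; rows come back in table order.
def find_all(ids)
  DATABASE.select { |id, _record| ids.include?(id) }.to_a
end

# Load records for the hits in a single query, keeping the search order
# and skipping hits whose record no longer exists.
def load_records(hits)
  ids = hits.map { |hit| hit['_id'].to_i }
  found = Hash[find_all(ids)]
  ids.map { |id| found[id] }.compact
end

hits = [{ '_id' => '3' }, { '_id' => '1' }, { '_id' => '9' }]
p load_records(hits)
```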

Cannot index virtual attributes

In my User model I have the following mapping specified:

mapping do
  indexes :id, :type => 'integer', :index => :not_analyzed
  indexes :lat_lon, :type => 'geo_point'
end

lat_lon is just a virtual attribute that takes the actual latitude and longitude attributes and puts them in the correct geo_point format. However, Tire doesn't seem to try to index it (I know I can get around this by overriding to_indexed_json). Is there any way this can be done without overriding to_indexed_json? Or is there an easier way to index geo_points?

Relations in Tire::Model::Persistence?

I want to throw away MongoDB and I need relations. So the question is: are belongs_to, has_many, has_and_belongs_to_many, has_one, and has_many :xxx, :as => :polymorphic planned for Tire::Model::Persistence?

Thank you.

how to do unsupported queries

I'd like to use the elastic search geo distance facet, but this search feature is not yet part of the Tire DSL. Is there a recommended way of doing an arbitrary elastic search query? Or to be more specific, to use the elastic search geo distance facet? Ideally I'd like a solution that can be combined in a single query with the elastic search features that are supported by Tire.

Best,
Andy

Kaminari support

It would be nice if you guys added Kaminari support to the pagination module. Kaminari seems to be replacing will_paginate as the preferred pagination solution for Rails.

I haven't looked at it in detail, but I believe this is pretty easy and would just involve aliasing some methods on the pagination module.
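The aliasing idea might look roughly like this. The alias names (`limit_value`, `total_count`, `num_pages`) are assumptions about what Kaminari's helpers call and should be checked against the Kaminari version in use. `SearchPage` is a hypothetical stand-in for the pagination module.

```ruby
# Hypothetical paginated collection exposing will_paginate-style names,
# with aliases for the names Kaminari is assumed to expect.
class SearchPage
  attr_reader :current_page, :per_page, :total_entries

  def initialize(current_page, per_page, total_entries)
    @current_page = current_page
    @per_page = per_page
    @total_entries = total_entries
  end

  def total_pages
    (total_entries.to_f / per_page).ceil
  end

  # Assumed Kaminari-style names; verify before relying on them.
  alias_method :limit_value, :per_page
  alias_method :total_count, :total_entries
  alias_method :num_pages, :total_pages
end

page = SearchPage.new(1, 10, 95)
puts page.limit_value
puts page.num_pages
```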

My code does not work under new 0.2.0 release

I have some simple Tire search code within my Rails 3.0.10 app. Today, after I updated to 0.2.0, the code does not work anymore. There are no errors or warnings. My code snippet is like this:

  q = request.GET[:q]
  @s = Tire.search 'index_name' do
    query do
      string q
    end
  end

Infer document type from document itself

Currently, the Index#store, Index#remove, etc methods take either a single argument (document ID), or double argument (document type, document ID). The type should be inferred from the document itself, from type/_type Hash key or object method, as stated in the TODO.
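A sketch of what the inference could look like in plain Ruby. `document_type` is a hypothetical helper, and the fallback value 'document' is an assumption for illustration rather than something stated in the TODO.

```ruby
# Hypothetical helper: infer the document type from a Hash key or from an
# object method, falling back to an assumed default.
def document_type(document)
  type =
    if document.is_a?(Hash)
      document[:_type] || document['_type'] || document[:type] || document['type']
    elsif document.respond_to?(:_type)
      document._type
    elsif document.respond_to?(:type)
      document.type
    end
  (type || 'document').to_s
end

Article = Struct.new(:type, :title)

puts document_type(:_type => 'article', :title => 'One')
puts document_type(Article.new('article', 'Two'))
puts document_type(:title => 'Three')
```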
