opakalex / couch_normalizer

This project forked from zatvobor/couch_normalizer

Data migration appliance that works inside a CouchDB node

Home Page: https://github.com/datahogs/couch_normalizer

License: Other

Elixir 49.27% Erlang 50.73%

Couch Normalizer: A convenience for massive document migration

At Datahogs, we have been using Couch Normalizer (it grew out of our own challenges) as part of our data-driven engineering for a music startup.

Couch Normalizer is designed as a standard Apache CouchDB httpd handler (which means you get a RESTful interface for interoperability) and follows the Rails database migration approach. It is written in both Erlang and Elixir, works well in production, and has great I/O performance.

As a result, it lets a developer deploy migration scripts (aka scenarios) and change a large number of documents as fast as possible (without HTTP overhead or any kind of 'delayed jobs') via internal CouchDB functions such as couch_db:open_doc/2, couch_db:update_doc/3, and so on.

Indicative performance (non-trivial case)

# CouchDB: 21_634_631 documents, 67.19Kb (average document size)
# runtime: 43 minutes and 41 seconds (2621)
[{ "docs_normalized":347144, "docs_read":21634631, "started_on":1357306518, "finished_on":1357309139, "num_workers":20 }]

# runtime: 27 minutes and 45 seconds (1665)
[{ "docs_normalized":17, "docs_read":21634631, "started_on":1357656809, "finished_on":1357658474, "num_workers":20 }]
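The reported runtimes follow directly from the started_on and finished_on Unix timestamps in each status entry. A minimal sketch in plain Elixir that derives the elapsed time and an average read rate for the first run above; the RunStats module and its function names are hypothetical helpers for illustration and are not part of Couch Normalizer:

```elixir
# Hypothetical helper, not part of Couch Normalizer: derives run statistics
# from the status fields shown above.
defmodule RunStats do
  # Elapsed wall-clock time in seconds between the two Unix timestamps.
  def elapsed_seconds(%{"started_on" => started, "finished_on" => finished}) do
    finished - started
  end

  # Average number of documents read per second (integer division).
  def docs_read_per_second(%{"docs_read" => docs_read} = status) do
    div(docs_read, elapsed_seconds(status))
  end

  # Splits a number of seconds into {minutes, seconds}.
  def minutes_and_seconds(total_seconds) do
    {div(total_seconds, 60), rem(total_seconds, 60)}
  end
end

first_run = %{
  "docs_normalized" => 347_144,
  "docs_read" => 21_634_631,
  "started_on" => 1_357_306_518,
  "finished_on" => 1_357_309_139,
  "num_workers" => 20
}

elapsed = RunStats.elapsed_seconds(first_run)
IO.inspect(elapsed)                                   # 2621
IO.inspect(RunStats.minutes_and_seconds(elapsed))     # {43, 41}
IO.inspect(RunStats.docs_read_per_second(first_run))  # 8254
```

The read rate is only an average over the whole run; it says nothing about how the load was spread across the 20 workers.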

Imagine you manage a big database that contains 'user', 'track', 'album', and 'artist' documents, and you have to improve or change the structure of all 'user' documents at once, or of just a few recent (daily) documents, as quickly as possible, without worrying about network issues or latency, fallbacks, and execution monitoring.

Let's consider an example:

  use CouchNormalizer.Scenario

  CouchNormalizer.Registry.acquire "1-example-scenario", fn(db, _doc_id, _rev, body) ->
    # 0. retrieves field value from the given document body.
    # `field/1`, `field/2` just another convenience for getting values from the body.
    if body["type"] == "user" do
      # 1. updates/improves document structure.
      update_field :field, String.upcase(body["field"])
      rename_field :old_name, :new_name

      # 2. removes unused/deprecated fields.
      remove_field  :unused_a
      remove_fields [:unused_b, :unused_c, :unused_d]

      # 3. creates new fields.
      create_field :string, "string"
      create_field :array,  ["hello", "world"]
      create_field :hash,   {[{"key", "value"}, {"key_1", "value_1"}]}
      create_field :integer, 10

      # 4. reads some value from external document.
      create_field :link, doc_field("ddoc", :link)
      # 4.1 reads some doc and value from external db.
      create_field :link1, doc_field("db", "ddoc", :link)
      # 4.2 reads the external document once and caches it,
      # so further calls (during the same normalization session) return the cached value.
      create_field :link2, doc_field(db, "ddoc", :link, :cached!)

      if field(body, :no_longer_available) == true do
        # 5. removes current document (mark as _deleted = true).
        mark_as_deleted!
        # 5.1 removes some external document.
        remove_document!("db", field(:ticket_id))
      end

      # Finally, notifies the normalizer engine about changes which should be applied (it updates a document).
      {:update, body}
    end
  end

All you need to do is submit a POST request:

curl -v -XPOST -H"Content-Type: application/json" http://127.0.0.1:5984/db/_normalize
=> {"ok":"Normalization process has been started (<0.174.0>)."}

As a result, Couch Normalizer starts an internal iterator over the db database and tries to apply the 1-example-scenario scenario to each document.

In short, CouchNormalizer.Registry.acquire/2 accepts a scenario title and a callback function that is applied to each document in your CouchDB database. If the callback returns {:update, body}, Couch Normalizer updates the document immediately.

Each normalized document has a special rev_history_ field that contains the most recent normalization info:
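The callback contract can be illustrated outside CouchDB. The following is a minimal sketch under stated assumptions: CallbackSketch is a hypothetical stand-in for the normalizer engine, and plain Elixir maps stand in for CouchDB document bodies. It only shows that a document changes when the callback returns {:update, body} and stays untouched otherwise.

```elixir
# Hypothetical stand-in for the normalizer engine: it applies a scenario
# callback to every document and keeps the new body only when the callback
# returns {:update, body}. Real scenarios run inside CouchDB instead.
defmodule CallbackSketch do
  def run(docs, callback) do
    Enum.map(docs, fn doc ->
      case callback.("db", doc["_id"], doc["_rev"], doc) do
        {:update, new_body} -> new_body
        _other -> doc
      end
    end)
  end
end

# Callback with the same four arguments as a scenario callback.
# For non-user documents the `if` yields nil, so nothing is updated.
upcase_user_field = fn _db, _doc_id, _rev, body ->
  if body["type"] == "user" do
    {:update, Map.put(body, "field", String.upcase(body["field"]))}
  end
end

docs = [
  %{"_id" => "u1", "_rev" => "1-a", "type" => "user", "field" => "hello"},
  %{"_id" => "t1", "_rev" => "1-b", "type" => "track", "field" => "hello"}
]

result = CallbackSketch.run(docs, upcase_user_field)
IO.inspect(Enum.map(result, &(&1["field"])))  # ["HELLO", "hello"]
```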

  "rev_history_" => {"title" => "1-example-scenario", "normpos" => 1}

In effect, normpos is an anchor: it records that the document has already been processed by a given scenario. On further normalization runs, only 'user' documents without "normpos" => 1 will be processed and updated.

All further migrations should be named 2-..., 3-..., and so on. When Couch Normalizer processes a document that does not yet have a normpos, the processing engine tries to apply every scenario from 1-... to X-... to it.
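One way to picture the normpos rule is as a filter over scenario titles. This is a sketch only: the NormposSketch module, its function names, the extra scenario titles, and the use of plain maps are all assumptions for illustration, and the real engine may select scenarios differently. In particular, treating a stored normpos as "skip every scenario up to that position" is an extrapolation from the description above.

```elixir
# Hypothetical sketch of the normpos rule; the module, function names, and
# the use of plain maps are assumptions, not Couch Normalizer internals.
defmodule NormposSketch do
  # Reads the numeric prefix of a scenario title: "2-rename-fields" -> 2.
  def position(title) do
    {pos, _rest} = Integer.parse(title)
    pos
  end

  # The anchor stored in the document, or 0 when it was never normalized.
  def current_normpos(doc) do
    get_in(doc, ["rev_history_", "normpos"]) || 0
  end

  # Titles of the scenarios that still have to run, in ascending order.
  def pending(doc, titles) do
    titles
    |> Enum.filter(fn title -> position(title) > current_normpos(doc) end)
    |> Enum.sort_by(&position/1)
  end
end

titles = ["3-cleanup", "1-example-scenario", "2-rename-fields"]

fresh_doc = %{"type" => "user"}

seen_doc = %{
  "type" => "user",
  "rev_history_" => %{"title" => "1-example-scenario", "normpos" => 1}
}

IO.inspect(NormposSketch.pending(fresh_doc, titles))
# ["1-example-scenario", "2-rename-fields", "3-cleanup"]
IO.inspect(NormposSketch.pending(seen_doc, titles))
# ["2-rename-fields", "3-cleanup"]
```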

Check more advanced examples:

Scenario DSL API

The CouchNormalizer.Scenario module is a good starting point.

Normalization HTTP API

Check the couch_normalizer_httpd_db module for examples and documentation.

Installation Quickstart

After downloading, type:

make setup              # get-deps compile test
make get-couchdb-deps   # Optional: clone the CouchDB 1.2.x git repository from Apache if you want to use CouchDB as a dependency
make setup-dev-couchdb  # Optional: install the CouchDB development version, which gives you `deps/couchdb/utils/./run -i`

Once the tests pass, you are ready for the final configuration step:

Add the Elixir and couch_normalizer ebin paths to the couchdb bash script:

ELIXIR_PA_OPTIONS="-pa /var/www/couch_normalizer/current/deps/elixir/lib/elixir/ebin"
COUCH_NORMALIZER_PA_OPTIONS="-pa /var/www/couch_normalizer/current/ebin"
ERL_START_OPTIONS="$ERL_OS_MON_OPTIONS -sasl errlog_type error +K true +A 4 $ELIXIR_PA_OPTIONS $COUCH_NORMALIZER_PA_OPTIONS"

Configure the CouchDB local.ini file:

[httpd_db_handlers]
_normalize = {couch_normalizer_httpd_db, handle_normalize_req}

[daemons]
couch_normalizer_manager={couch_normalizer_manager, start_link, [[{seed_labeled, [{scenarios_path, "/path/to/scenarios"}, {num_workers, 5}]}]]}

That is it. Stay tuned!

License

Couch Normalizer source code is released under the Apache 2 License. Check the LICENSE and NOTICE files for more details.

Contributors

opakalex, zatvobor
