
mosuka / phalanx

352 stars, 2 watchers, 26 forks, 15.98 MB

Phalanx is a cloud-native distributed search engine that provides endpoints through gRPC and traditional RESTful API.

License: Apache License 2.0

Shell 0.51% Makefile 0.94% Go 96.64% Dockerfile 0.16% HTML 0.25% JavaScript 0.48% CSS 1.01%
cloud-native distributed search engine go golang grpc restful-api gossip-protocol object-storage

phalanx's Introduction

Phalanx

Phalanx is a cloud-native distributed search engine written in Go and built on top of Bluge. It provides endpoints through gRPC and a traditional RESTful API.
Phalanx forms clusters with hashicorp/memberlist and manages index metadata in etcd, which makes it easy to bring up a fault-tolerant cluster.
Metrics for system operation can also be exported in the Prometheus exposition format, so the cluster can be monitored with Prometheus right away.
Phalanx uses object storage for the storage layer and is itself responsible only for the computation layer, such as indexing and retrieval. This makes scaling easy: you simply add new nodes to the cluster.
Phalanx is currently an alpha version and supports only Amazon S3 and MinIO as the storage layer; support for Google Cloud Storage and Azure Blob Storage is planned.

Architecture

Phalanx is a masterless distributed search engine that separates the computation layer for searching and indexing from the storage layer for persisting the index. The storage layer is designed to use object storage on public clouds such as Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Phalanx makes it easy to bring up a distributed search engine cluster. When a Phalanx cluster runs out of resources, you simply add nodes; nodes that are no longer needed can just as simply be shut down. Because indexes are managed in object storage, there is no need to worry about index placement, and no complex operations are required. This makes clusters flexible and scalable.

Phalanx stores index metadata in etcd. The metadata records each index and the paths of the shards that belong to it. Each node processes the distributed index based on the metadata stored in etcd.
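To make the metadata concrete, here is a hypothetical model of one index record: the index name and URI plus a map from shard name to shard URI, serialized as JSON under an assumed etcd key. The type, field, and key names are illustrative only and are not Phalanx's actual schema; the example URIs follow the `minio://phalanx/indexes/product/shard-2qpDDnjJ` form that appears in a log further down this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"sort"
)

// IndexMetadata is a HYPOTHETICAL shape for a per-index record kept in etcd.
type IndexMetadata struct {
	IndexName string            `json:"index_name"`
	IndexURI  string            `json:"index_uri"`
	Shards    map[string]string `json:"shards"` // shard name -> shard URI
}

// etcdKey builds an ASSUMED key layout: <root>/indexes/<index name>.
func etcdKey(root, indexName string) string {
	return path.Join(root, "indexes", indexName)
}

// shardNames returns the shard names of the index in a stable (sorted) order.
func (m IndexMetadata) shardNames() []string {
	names := make([]string, 0, len(m.Shards))
	for name := range m.Shards {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	meta := IndexMetadata{
		IndexName: "product",
		IndexURI:  "minio://phalanx/indexes/product",
		Shards: map[string]string{
			"shard-2qpDDnjJ": "minio://phalanx/indexes/product/shard-2qpDDnjJ",
		},
	}
	value, err := json.Marshal(meta) // the value a node would read back from etcd
	if err != nil {
		panic(err)
	}
	fmt.Println(etcdKey("/phalanx", meta.IndexName)) // /phalanx/indexes/product
	fmt.Println(string(value))
	fmt.Println(meta.shardNames()) // [shard-2qpDDnjJ]
}
```

Keeping only paths in the metadata, rather than index data, is what lets any node open any shard directly from object storage.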

Phalanx also uses etcd as a distributed lock manager to ensure that updates to a single shard are not made on multiple nodes at the same time.
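Phalanx delegates this locking to etcd. The in-process sketch below only illustrates the invariant (at most one node updates a given shard at a time) using a hypothetical `shardLocks` type guarded by a local mutex; it is not Phalanx's implementation. A real cluster needs the lock held in etcd itself, for example via the `Mutex` offered by the etcd Go client's `concurrency` package, so that it is visible to every node.

```go
package main

import (
	"fmt"
	"sync"
)

// shardLocks is a HYPOTHETICAL single-process stand-in for an etcd-backed lock manager.
// It maps a shard name to the ID of the node currently allowed to update it.
type shardLocks struct {
	mu     sync.Mutex
	owners map[string]string
}

func newShardLocks() *shardLocks {
	return &shardLocks{owners: make(map[string]string)}
}

// TryLock grants the shard to node unless a different node already holds it.
func (l *shardLocks) TryLock(shard, node string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if owner, held := l.owners[shard]; held && owner != node {
		return false
	}
	l.owners[shard] = node
	return true
}

// Unlock releases the shard, but only if node is the current owner.
func (l *shardLocks) Unlock(shard, node string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owners[shard] == node {
		delete(l.owners, shard)
	}
}

func main() {
	locks := newShardLocks()
	fmt.Println(locks.TryLock("shard-2qpDDnjJ", "node1")) // true
	fmt.Println(locks.TryLock("shard-2qpDDnjJ", "node2")) // false: node1 holds it
	locks.Unlock("shard-2qpDDnjJ", "node1")
	fmt.Println(locks.TryLock("shard-2qpDDnjJ", "node2")) // true
}
```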

(Figure: Phalanx architecture)

Build

Build Phalanx as follows:

% git clone https://github.com/mosuka/phalanx.git
% cd phalanx
% make build

If the build succeeds, you can see the binary file in ./bin:

% ls ./bin
phalanx


phalanx's People

Contributors

dependabot[bot], mosuka, wolfeidau


phalanx's Issues

Allow the DynamoDB metadata store to send changes made to DynamoDB as storage events.


Implement the following watch function.

func (m *DynamodbStorage) watch() error {
	// Watch DynamoDB change events.
	go func() {
		for {
			select {
			case cancel := <-m.stopWatching:
				// Stop watching when a cancel signal is received.
				if cancel {
					return
				}
				// TODO: implement
				// case DynamoDB events:
				// Catch changes made to the database and send storage events to the event channel.
			}
		}
	}()
	return nil
}

It would be better if it could be implemented in the same way as the etcd metadata store.

func (m *EtcdStorage) watch() error {
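A minimal sketch of how the TODO case might be filled in, assuming DynamoDB changes (for example from DynamoDB Streams, whose records carry an `eventName` of `INSERT`, `MODIFY`, or `REMOVE`) have already been adapted into a Go channel. `dynamodbStorage`, `changeRecord`, `StorageEvent`, and the event constants are hypothetical names, not Phalanx's types, and fetching stream records from AWS is left out entirely.

```go
package main

import "fmt"

// EventType and its constants are HYPOTHETICAL; Phalanx's real event types may differ.
type EventType int

const (
	EventPut EventType = iota
	EventDelete
)

// StorageEvent is a HYPOTHETICAL event delivered to metastore consumers.
type StorageEvent struct {
	Type EventType
	Key  string
}

// changeRecord is a HYPOTHETICAL, already-decoded DynamoDB change record.
type changeRecord struct {
	EventName string // "INSERT", "MODIFY" or "REMOVE", as in DynamoDB Streams
	Key       string
}

type dynamodbStorage struct {
	stopWatching chan bool
	changes      <-chan changeRecord // ASSUMED adapter over the DynamoDB change feed
	events       chan StorageEvent
}

// watch forwards DynamoDB changes as storage events until it is cancelled.
func (m *dynamodbStorage) watch() error {
	go func() {
		for {
			select {
			case cancel := <-m.stopWatching:
				if cancel {
					return
				}
			case rec, ok := <-m.changes:
				if !ok {
					return // the change feed was closed
				}
				ev := StorageEvent{Type: EventPut, Key: rec.Key}
				if rec.EventName == "REMOVE" {
					ev.Type = EventDelete
				}
				m.events <- ev
			}
		}
	}()
	return nil
}

func main() {
	changes := make(chan changeRecord, 2)
	m := &dynamodbStorage{
		stopWatching: make(chan bool),
		changes:      changes,
		events:       make(chan StorageEvent, 2),
	}
	if err := m.watch(); err != nil {
		panic(err)
	}
	changes <- changeRecord{EventName: "INSERT", Key: "/indexes/example"}
	changes <- changeRecord{EventName: "REMOVE", Key: "/indexes/example"}
	fmt.Println(<-m.events)
	fmt.Println(<-m.events)
	m.stopWatching <- true
}
```

This keeps the structure of the skeleton in the issue: a single goroutine selects over the stop channel and the change feed, so the DynamoDB source is just one extra `case`.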

Index Update Call

Great project. Is there a mechanism for updating an index, for example adding a field or changing the index type?

[BUG] Panic: runtime error on invalid index creation

Context: phalanx with file storage configured
When I made a call to create an index with MinIO and etcd, the whole phalanx process went down with a panic:

phalanx {"_level_":"info","_timestamp_":"2022-09-28T08:19:20.986Z","_name_":"phalanx","_caller_":"server/index_service.go:198","_message_":"opening index writers"}
phalanx {"_level_":"error","_timestamp_":"2022-09-28T08:19:20.986Z","_name_":"phalanx.manager.writer.directory","_caller_":"directory/directory_minio.go:48","_message_":"Endpoint: does not follow ip address or domain name standards.","uri":"minio://phalanx/indexes/product/shard-2qpDDnjJ"}
phalanx panic: runtime error: invalid memory address or nil pointer dereference
phalanx [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x11ea50d]
phalanx
phalanx goroutine 49 [running]:
phalanx github.com/mosuka/phalanx/directory.(*MinioDirectory).exists(0x0)
phalanx /go/src/github.com/mosuka/phalanx/directory/directory_minio.go:77 +0x4d
phalanx github.com/mosuka/phalanx/directory.(*MinioDirectory).Setup(0x0, 0xb)
phalanx /go/src/github.com/mosuka/phalanx/directory/directory_minio.go:95 +0x53
phalanx github.com/blugelabs/bluge/index.OpenWriter({{0x15b6c0b, 0x3}, 0x1, 0xc0002263c0, 0x0, 0x0, 0x0, {0xa, 0x4c4b40, 0x4024000000000000, ...}, ...})
phalanx /go/pkg/mod/github.com/blugelabs/bluge@…/index/writer.go:85 +0x394
phalanx github.com/blugelabs/bluge.OpenWriter({{{0x15b6c0b, 0x3}, 0x1, 0xc0002263c0, 0x0, 0x0, 0x0, {0xa, 0x4c4b40, 0x4024000000000000, ...}, ...}, ...})
phalanx /go/pkg/mod/github.com/blugelabs/bluge@…/writer.go:36 +0xb8
phalanx github.com/mosuka/phalanx/index.(*IndexWriters).open(0xc00020b4a0, {0xc0005fc88a, 0x7}, {0xc0003a9a62, 0xe}, 0xc0005fe5a0, 0xc0001a8b00)
phalanx /go/src/github.com/mosuka/phalanx/index/writer.go:107 +0x3b8
phalanx github.com/mosuka/phalanx/index.(*IndexWriters).Open(0xc00020b4a0, {0xc0005fc88a, 0x7}, {0xc0003a9a62, 0xe}, 0x0, 0x0)
phalanx /go/src/github.com/mosuka/phalanx/index/writer.go:128 +0xe7
phalanx github.com/mosuka/phalanx/server.(*IndexService).assignShardsToNode(0xc000124a80)
phalanx /go/src/github.com/mosuka/phalanx/server/index_service.go:216 +0x3ce5
phalanx github.com/mosuka/phalanx/server.(*IndexService).Start.func1()
phalanx /go/src/github.com/mosuka/phalanx/server/index_service.go:114 +0x692
phalanx created by github.com/mosuka/phalanx/server.(*IndexService).Start
phalanx /go/src/github.com/mosuka/phalanx/server/index_service.go:86 +0x5b

The expected behaviour is that an appropriate HTTP error code is returned, no index creation is attempted, and the application keeps working.

Client CLI generator

https://github.com/NathanBaulch/protoc-gen-cobra

Would you consider a CLI/CTL for the server?

The Go library above takes a gRPC proto and generates the Go CLI code.
Because this is gRPC, you can make it both a CLI and a CTL: the generated Go code can call the server in process or out of process.

I have used it on other projects and it works great.

So you can then do forward engineering: update the proto and regenerate the client CLI.

Search of documents in cloud-native (Google Kubernetes Cluster) fails

We are trying to use Phalanx as a search engine, but after inserting documents, a search POST call does not return them. This happens both in a cloud-native deployment (GKE) and when following the standalone Docker steps.

I see the following error during initialization of Phalanx after a few documents are inserted:
{"_level_":"error","_timestamp_":"2023-06-21T23:05:42.632-0700","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:144","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-4x76b7O1"}

During the search, I see the following warning, and no documents are returned:
{"_level_":"warn","_timestamp_":"2023-06-21T23:16:02.958-0700","_name_":"phalanx","_caller_":"server/index_service.go:1241","_message_":"no index readers are assigned","index_name":"example","shard_names":["shard-1TrQbp3v","shard-BALwDsbc","shard-M9UIob2J","shard-linNfk1K","shard-cgllS7mr","shard-Rjjugv2u","shard-4x76b7O1","shard-0AY1YYWM","shard-p5JJAniy","shard-ns5Tb3lf"]}

Our setup is as follows; we have tried both GKE and standalone Docker:

  • Google Kubernetes Cluster - running etcd, minIO and phalanx as Pods

When running in Docker, the search actually crashes with SIGSEGV:

  • Steps:
      • Run the docker command posted in the documentation.
      • Add the index with index_uri=file:///tmp/phalanx/metadata.
      • Add a few documents using curl -XPUT. They are added to the index, and I can see them in the index metastore.
      • Search using a curl POST call; this fails with a SIGSEGV error:
curl -X POST http://localhost:8000/v1/indexes/example/_search -d '{
    "query": "text:example",
    "boost": 1.0,
    "start": 0,
    "num": 10,
    "sort_by": "-_score",
    "fields": [
        "id",
        "text"
    ]
}'

api_1 | {"_level_":"info","_timestamp_":"2023-06-22T20:33:01.158Z","_name_":"phalanx","_caller_":"server/index_service.go:103","_message_":"shard metadata has been created","metastore_event":{"Type":3,"Index":"example","Shard":"shard-pWJ2XGkd"}}
api_1 | {"_level_":"info","_timestamp_":"2023-06-22T20:33:01.158Z","_name_":"phalanx","_caller_":"server/index_service.go:198","_message_":"opening index writers"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.161Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-IGKNAfrL"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.161Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-IGKNAfrL"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.165Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-m65GeEjC"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.165Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-m65GeEjC"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.168Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-Bx05hbMk"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.168Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-Bx05hbMk"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.171Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-yFbRK33L"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.171Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-yFbRK33L"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.174Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-Hp3BVksK"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.174Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-Hp3BVksK"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.177Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-DSwnLR02"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.177Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-DSwnLR02"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:33:01.180Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:129","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-ecaa880o"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:33:01.180Z","_name_":"phalanx","_caller_":"server/index_service.go:301","_message_":"error opening index: unable to find a usable snapshot","index_name":"example","shard_name":"shard-ecaa880o"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-m65GeEjC"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-m65GeEjC"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-Bx05hbMk"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-Bx05hbMk"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-IGKNAfrL"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-IGKNAfrL"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-DSwnLR02"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-DSwnLR02"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-ecaa880o"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-ecaa880o"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-yFbRK33L"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-yFbRK33L"}
api_1 | {"_level_":"error","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx.manager.reader","_caller_":"index/reader.go:162","_message_":"shard does not exist","index_name":"example","shard_name":"shard-Hp3BVksK"}
api_1 | {"_level_":"warn","_timestamp_":"2023-06-22T20:34:23.968Z","_name_":"phalanx","_caller_":"server/index_service.go:1088","_message_":"shard does not exist","index_name":"example","shard_name":"shard-Hp3BVksK"}
api_1 | panic: runtime error: invalid memory address or nil pointer dereference
api_1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x122640b]
api_1 |
api_1 | goroutine 75 [running]:
api_1 | github.com/mosuka/phalanx/server.(*IndexService).Search.func1()
api_1 | /go/src/github.com/mosuka/phalanx/server/index_service.go:1108 +0x120b
api_1 | golang.org/x/sync/errgroup.(*Group).Go.func1()
api_1 | /go/pkg/mod/golang.org/x/sync@…/errgroup/errgroup.go:57 +0x67
api_1 | created by golang.org/x/sync/errgroup.(*Group).Go
api_1 | /go/pkg/mod/golang.org/x/sync@…/errgroup/errgroup.go:54 +0x92
phalanx_api_1 exited with code 2
Any help is appreciated.

Golang search ecosystem opportunity?

Hey @mosuka and @prabhatsharma and @mschoch

I raised an issue that I would like you to have a look at, if you don't mind: opensearch-project/opensearch-go#82

As a gopher, I really like to run Go everywhere, and I see a great synergy and opportunity here to get a strong search ecosystem going for Go.

I want to stress that text search and Elasticsearch/opensearch-go have different needs, of course.
But they both need a solution that is easy for gophers to run.
So I wonder if there is a possibility for some happy harmony here?
Zinc provides the API as a single-node, non-HA solution (maintainer: @prabhatsharma).
Phalanx provides an HA solution for Bluge (maintainer: @mosuka).

Zinc provides the API and a single-server solution.
Phalanx could match the Zinc API and thereby provide an HA solution, while still offering its faceted text search.

I am probably missing lots of details, I know, but it would be great to hear your thoughts.
Maybe my proposed design is not optimal, but I think you can see my intent.
