abhi-bit / gouch
Couchbase view-engine prototype in Golang
A runtime.MemStats dump reports too much memory allocated for an index file with 10M KV pairs (350M size on disk):
2015/10/12 23:10:12 bytes allocated and not yet freed: 280744
2015/10/12 23:10:12 bytes allocated (even if freed): 336824
2015/10/12 23:10:12 bytes allocated and not yet freed: 280744
2015/10/12 23:10:12 bytes obtained from system: 884736
2015/10/12 23:10:12 number of mallocs: 722
2015/10/12 23:10:12 number of frees: 179
2015/10/12 23:10:12 Number of GC runs: 5
2015/10/12 23:10:12 GC pause(in ns): 419113
2015/10/12 23:11:36 bytes allocated and not yet freed: 699120
2015/10/12 23:11:36 bytes allocated (even if freed): 16715914808
2015/10/12 23:11:36 bytes allocated and not yet freed: 699120
2015/10/12 23:11:36 bytes obtained from system: 1949696
2015/10/12 23:11:36 number of mallocs: 272587154
2015/10/12 23:11:36 number of frees: 272581662
2015/10/12 23:11:36 Number of GC runs: 46154
2015/10/12 23:11:36 GC pause(in ns): 9995754085
2015/11/17 14:25:51 Sub-query call failed against http://apple:9094/query?limit=10: Get http://apple:9094/query?limit=10: dial tcp 172.16.12.33:9094: cannot assign requested address
2015/11/17 14:25:51 Sub-query call failed against http://apple:9093/query?limit=10: Get http://apple:9093/query?limit=10: dial tcp 172.16.12.33:9093: cannot assign requested address
Seeing the above errors when firing queries using:
wrk -c 1000 -t 1000 -d 10 --timeout 10 http://apple:9091/query?limit=10
9091 is where the cluster manager is running.
5.9GB of garbage in less than a minute after serving 1K queries with limit=10000.
Memory profile with GOGC=off: https://s3.amazonaws.com/uploads.hipchat.com/55340/599331/bAGQH9A7ZWOoBDK/pprof004.svg
Compare and benchmark gouch against the CGO implementation of couchstore: B-Tree traversal of the key and value fields in the beer-sample view index. Options to try: the couchstore implementation of couch_viewdump from https://github.com/abhi-bit/couchstore/tree/view_dump. Print out ASCII values and compare between the C and Golang (gouch) versions. :)
Right now, support is only there for a single node. Need to implement logic to handle a multi-node cluster like in Couchbase and fire sub-queries accordingly for requests from clients.
CPU profile shows the majority of time being spent in snappy decode, file reads (from disk or memory) and GC. sync.Pool might help in lowering CPU cycles in these areas by reusing byte buffers.
Right now traversal doesn't support limit and crashes with StartKey/End
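One way the limit support could look, sketched with a hypothetical callback-based walk (walkWithLimit and kv are illustrative names, not gouch APIs):

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a sentinel the visit callback can return to halt the
// traversal early; it is not propagated to the caller as a real error.
var errStop = errors.New("stop traversal")

// kv is a stand-in for a decoded B-Tree key/value pair.
type kv struct{ Key, Value string }

// walkWithLimit visits rows in order and stops after limit rows
// instead of walking the whole tree (limit <= 0 means no limit).
func walkWithLimit(rows []kv, limit int, visit func(kv) error) error {
	seen := 0
	for _, r := range rows {
		if limit > 0 && seen >= limit {
			return nil
		}
		if err := visit(r); err != nil {
			if err == errStop {
				return nil
			}
			return err
		}
		seen++
	}
	return nil
}

func main() {
	rows := []kv{{"a", "1"}, {"b", "2"}, {"c", "3"}}
	walkWithLimit(rows, 2, func(r kv) error {
		fmt.Println(r.Key) // prints "a" then "b"
		return nil
	})
}
```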
Incorporate tests to make sure responses between the standalone Erlang view-query engine and the Go view-query prototype are consistent.
(pprof) top20
60110ms of 86060ms total (69.85%)
Dropped 132 nodes (cum <= 430.30ms)
Showing top 20 nodes out of 80 (cum >= 5800ms)
flat flat% sum% cum cum%
9430ms 10.96% 10.96% 19910ms 23.14% bytes.(*Buffer).WriteByte
9350ms 10.86% 21.82% 9350ms 10.86% scanblock
9000ms 10.46% 32.28% 15270ms 17.74% bytes.(*Buffer).grow
5350ms 6.22% 38.50% 33680ms 39.14% encoding/json.Indent
4960ms 5.76% 44.26% 6320ms 7.34% runtime.mallocgc
2910ms 3.38% 47.64% 2910ms 3.38% runtime.MSpan_Sweep
2390ms 2.78% 50.42% 2450ms 2.85% encoding/json.stateInString
2240ms 2.60% 53.02% 2240ms 2.60% runtime.memmove
1930ms 2.24% 55.26% 1930ms 2.24% runtime.writebarrierptr
1910ms 2.22% 57.48% 5540ms 6.44% encoding/json.(*encodeState).string
1700ms 1.98% 59.46% 1810ms 2.10% encoding/json.stateEndValue
1630ms 1.89% 61.35% 4230ms 4.92% bytes.(*Buffer).WriteString
1170ms 1.36% 62.71% 1790ms 2.08% encoding/json.stateBeginValue
1110ms 1.29% 64.00% 14210ms 16.51% encoding/json.(*structEncoder).encode
910ms 1.06% 65.06% 2280ms 2.65% runtime.deferreturn
900ms 1.05% 66.11% 1290ms 1.50% encoding/json.state1
880ms 1.02% 67.13% 970ms 1.13% github.com/golang/snappy.Decode
830ms 0.96% 68.09% 1100ms 1.28% strconv.formatBits
760ms 0.88% 68.98% 2320ms 2.70% runtime.makeslice
750ms 0.87% 69.85% 5800ms 6.74% runtime.newobject
(pprof)
Possible options: use json.NewEncoder and see if it helps in lowering the CPU footprint (by passing smaller buffers), and revisit the Value2JSON conversion. Currently, on a single-node setup, results are already sorted because the B-Tree stores them that way. With multi-node support, we would require "scatter-gather" logic like the existing Erlang implementation.
Will require reviewing the existing algorithm in Erlang.
Add test cases that verifies the sanity of exported functions from gouch
library and makes sure code checkins aren't breaking existing APIs
Options:
Output useful data from B-Tree:
The objective of this is to see if the throughput limitation is a manifestation of the HTTP protocol used by default. I think we "might" even be able to get close to 10x throughput using a server over a binary protocol compared to the HTTP interface.
With every git push, run all tests with different limit options for 60s or so and store the results in benchdb. Right now this is all manual and can be painful.
Compare the Erlang and Go versions for view index files with 10/100M KV pairs.
To track wiring up the C-based merger via the CGO interface correctly.
Articles to look at:
Tool to plot STW, sweeping, marking & waiting using GC Trace
Right now there isn't any automated way to verify the sanity of responses from the Go version of view-query. Something similar to the Random Query Generator utility for MySQL and N1QL would be helpful.
As it stands right now, gouch throughput looks stuck at around 1.2K requests per sec, whereas the standalone Erlang version using couchdb master does close to 4K requests per sec.
| limit  | Erlang (Req/sec) | Erlang (MB/sec) | Golang (Req/sec) | Golang (MB/sec) |
|-------:|-----------------:|----------------:|-----------------:|----------------:|
| 10     | 3710             | 2.51            | 1191             | 0.868           |
| 50     | 2413             | 7.14            | 1146             | 3.42            |
| 100    | 1245             | 7.24            | 1087             | 6.32            |
| 200    | 649              | 7.5             | 995              | 11.71           |
| 500    | 276              | 7.93            | 800              | 23.71           |
| 1000   | 129              | 7.48            | 592              | 35.24           |
| 2000   | 60               | 7.22            | 393              | 46.93           |
| 3000   | 40               | 7.05            | 293              | 52.75           |
| 5000   | 20               | 7.14            | 191              | 58.1            |
| 10000  | 10               | 8.08            | 100              | 62.17           |
| 50000  | 0                | 0               | 17               | 66.54           |
| 100000 | 0                | 0               | 7                | 66.82           |
Need to understand the bottleneck: is it an inefficient implementation of limit in "gouch", or is something else slowing things down at smaller limit values?
Right now, the gouch library only uses bytes.Compare as the comparator function. Need to switch to a robust JSON collation mechanism.
Options: the couchstore collate JSON implementation.
Use benchdb to store benchmark results and tag them with the git SHA. Maybe it can be added as a git hook or something.
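On the JSON collation point: a sketch contrasting today's byte-wise comparator with the first step of CouchDB-style type-ordered collation (jsonTypeRank is illustrative; a real collator also needs ICU/UTF-8 string ordering within types):

```go
package main

import (
	"bytes"
	"fmt"
)

// CompareFunc is the comparator shape the library would accept; today
// the equivalent of bytes.Compare is hard-wired.
type CompareFunc func(a, b []byte) int

// rawCompare is the current behaviour: plain byte-wise ordering,
// which sorts e.g. the number 10 before 9.
var rawCompare CompareFunc = bytes.Compare

// jsonTypeRank sketches the first step of CouchDB-style collation:
// values sort by type first (null < booleans < numbers < strings <
// arrays < objects) before comparing within a type.
func jsonTypeRank(v interface{}) int {
	switch v.(type) {
	case nil:
		return 0
	case bool:
		return 1
	case float64:
		return 2
	case string:
		return 3
	case []interface{}:
		return 4
	default: // map[string]interface{} and anything else
		return 5
	}
}

func main() {
	fmt.Println(rawCompare([]byte("10"), []byte("9"))) // prints -1: byte-wise, "10" < "9"
	fmt.Println(jsonTypeRank(false) < jsonTypeRank(42.0)) // prints true
}
```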
go test has a download option, i.e. to download test data from somewhere, so we could use that to download index files with 10M or 100M rows.
When we talk about a cluster of nodes, there is a need to merge responses from multiple nodes in UTF-8 sort order and send the merged result back to the client that made the query request.
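The scatter-gather merge can be sketched as a k-way merge over per-node sorted streams (plain string comparison stands in for UTF-8 collation here):

```go
package main

import (
	"container/heap"
	"fmt"
)

// item holds the head row from one node's sorted response stream.
type item struct {
	key  string
	node int // which node's stream this row came from
}

type minHeap []item

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeSorted k-way merges per-node sorted row lists, the core of
// the scatter-gather step.
func mergeSorted(streams [][]string) []string {
	h := &minHeap{}
	next := make([]int, len(streams)) // cursor into each stream
	for n, s := range streams {
		if len(s) > 0 {
			heap.Push(h, item{s[0], n})
			next[n] = 1
		}
	}
	var out []string
	for h.Len() > 0 {
		it := heap.Pop(h).(item)
		out = append(out, it.key)
		if next[it.node] < len(streams[it.node]) {
			heap.Push(h, item{streams[it.node][next[it.node]], it.node})
			next[it.node]++
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([][]string{
		{"ale", "stout"},
		{"bock", "porter"},
	}))
	// prints [ale bock porter stout]
}
```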
Right now, curl --raw shows that chunking isn't at the row level; instead the response seems to be flushed after every ~800 chars or so. The existing Erlang version already does per-row chunking.
Given some promising numbers from exposing the view-engine over a binary protocol on a TCP server, we need a standard utility to benchmark the TCP interface (which would probably be exposed alongside the existing HTTP interface).