
pyr / cyanite

cyanite stores your metrics

Home Page: http://cyanite.io

License: Other

Python 0.59% Shell 5.43% Clojure 89.63% Makefile 1.72% Java 2.62%

cyanite's Introduction

Cyanite

Cyanite is a daemon which provides services to store and retrieve timeseries data. It aims to serve as a drop-in replacement for Graphite/Graphite-web.

Getting Started

Before you begin, make sure you have a Java runtime installed.

You can download the latest distribution of Cyanite from GitHub releases and start it with:

java -jar <path-to-cyanite-jar>.jar --path <path-to-cyanite-config>.yaml

See default configuration and basic configuration options.

For advanced usage and information on possible Cyanite optimisations, refer to configuration guide.
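As a starting point, a minimal cyanite.yaml might look like the following sketch. The section names (carbon, http, logging, store) mirror the configuration shown in the issues further down this page; the values here are illustrative assumptions, not documented defaults.

```yaml
# Minimal illustrative cyanite.yaml (values are assumptions, not defaults)
carbon:
  host: "127.0.0.1"      # address for the carbon line-protocol listener
  port: 2003
  rollups:
    - period: 60480      # points kept at this resolution
      rollup: 10         # resolution in seconds
http:
  host: "127.0.0.1"      # address for the query API used by graphite-web
  port: 8080
logging:
  level: info
  console: true
store:
  cluster: 'localhost'   # Cassandra contact point
  keyspace: 'metric'
```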

Getting help

You can get help by creating an issue or by asking in the #cyanite IRC channel on Freenode.

For more information, refer to http://cyanite.io

cyanite's People

Contributors

addisonj, bfritz, brutasse, eckardt, emanuelis, ifesdjeen, jshook, kirillk77, luckyswede, michaelhierweck, mpenet, neilprosser, ogg1e, petecheslock, pyr, ryan-williams, sanmai, tcoupland, tehlers320, zbintliff, zx-zheng


cyanite's Issues

Why?

Hello,

Sorry for the subject, I figured it would be eye-catching ;). So I have a question: this project is intriguing, but I am curious why it exists. What problem are you solving, and how well are you solving it? Is this about performance, and if so, what sort of improvements have you seen? Thanks, and looking forward to hearing from you.

-John

Add a statsd input

It would not take much to have a statsd listener in addition to the carbon one. Our storage schema in Cassandra makes the listener part stateless, which would make this a nice addition.

commit 645af

Seems like the master branch is broken at this time? I just tried the commit in the subject and I am getting the following errors. Is this a known issue?

"Exception: " #<FileNotFoundException java.io.FileNotFoundException: Could not locate org/spootnik/cyanite/logging__init.class or org/spootnik/cyanite/logging.clj on classpath: >
Exception in thread "main" clojure.lang.ExceptionInfo: no such namespace: org.spootnik.cyanite.logging/start-logging {}
at clojure.core$ex_info.invoke(core.clj:4554)
at org.spootnik.cyanite.config$instantiate.invoke(config.clj:95)
at org.spootnik.cyanite.config$get_instance.invoke(config.clj:102)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.core$apply.invoke(core.clj:628)
at clojure.core$update_in.doInvoke(core.clj:5853)
at clojure.lang.RestFn.invoke(RestFn.java:467)
at org.spootnik.cyanite.config$init.invoke(config.clj:122)
at org.spootnik.cyanite$_main.doInvoke(cyanite.clj:31)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at org.spootnik.cyanite.main(Unknown Source)

High CPU usage and no metrics

I'm using the latest (as of now) clone of Cyanite and Cassandra 2.0.5 on a single node. I've created the metric namespace by doing bin/cqlsh < doc/schema.cql and cyanite has been packaged using leiningen 2.3.4.

When I start up cyanite using the configuration given in README.md, I can see it using a large amount of CPU, and after I've sent metrics to it using echo "test.metric $RANDOM $(date +%s)" | nc 127.0.0.1 2003 I'm not seeing anything when I do SELECT * FROM metric.metric; on Cassandra.
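For reference, the plaintext carbon protocol that cyanite's listener speaks is simply one "path value timestamp" line per metric, which is what the echo/nc one-liner above produces. A small Python sketch of the same thing (host and port are assumptions matching the example; send_metric is a hypothetical helper, not part of cyanite):

```python
import socket
import time

def carbon_line(path, value, timestamp=None):
    """Format one metric in the plaintext carbon protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metric(path, value, host="127.0.0.1", port=2003):
    """Open a TCP connection to the carbon listener and send a single metric line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(carbon_line(path, value).encode("ascii"))

# Same line the nc one-liner would send (fixed timestamp for illustration):
print(carbon_line("test.metric", 42, 1393213200))  # → test.metric 42 1393213200
```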

If I crank up the logging level to trace in cyanite.yaml I can see the following fly by multiple times a second in cyanite.log:


---
TRACE [2014-03-09 16:59:32,852] clojure-agent-send-off-pool-0 - com.datastax.driver.core.Connection - [localhost/127.0.0.1-2] writing request QUERY SELECT path from metric;
TRACE [2014-03-09 16:59:32,852] New I/O worker #2 - com.datastax.driver.core.Connection - [localhost/127.0.0.1-2] request sent successfully
TRACE [2014-03-09 16:59:32,860] New I/O worker #2 - com.datastax.driver.core.Connection - [localhost/127.0.0.1-2] received: ROWS [path(metric, metric), org.apache.cassandra.db.marshal.UTF8Type]

---

I've never used Cassandra before but set it up using these instructions and everything looked good. I was able to insert data and retrieve it.

This has happened to me on Mac and Linux (both using Java 1.7.0_51). I'm hoping there's something basic that I've missed. Any ideas?

pickle receiver support?

I am currently running a carbon-relay -> carbon-aggregator -> carbon-cache setup, and since carbon-cache performs poorly I am looking into alternative backends.

To be able to use cyanite as a carbon-cache replacement, it needs to accept metrics as Python pickles, since this is how carbon-aggregator forwards metrics to carbon-cache.

I have found your interesting blog-post [1] about the python pickle format but I wasn't able to find out if it is already possible, and if so how, to configure cyanite to accept carbon's pickle output.

It would be great if you could give me a hint. I'm happy to contribute documentation updates as a pull request if you could point out here how to achieve this.

[1] http://spootnik.org/entries/2014/04/05_diving-into-the-python-pickle-format.html

Rollups Clarification

So my test cluster has been filling up its disks at a rate MUCH higher than I expected. I went digging to see how the data was actually being stored in Cassandra. I was pretty surprised to see that the lower-resolution "rollups" are simply rows with an array, "data", containing every single value for that path during the time period. For example, given stats sent every 10s (i.e. with statsd) and these rollups defined in cyanite.yaml:

10s:1d = 8,640 rows (each row with 1 value in "data")
1m:7d = 10,080 rows (each row with 6 values in "data")
5m:365d = 525,600 rows (each row with 30 values in "data")

So for each unique metric, at the end of a year I would have 544,320 rows, with 15,837,120 values total in the "data" arrays.

When querying a "lower resolution" value (i.e.: 5m in my example), I believe cyanite is returning the average of the values in that row's "data" array.

Is this correct?
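Assuming one row per rollup point, the row counts above can be sanity-checked as retention ÷ rollup, and the values per row as rollup ÷ base sample interval. A quick check of that arithmetic (note it yields 105,120 rows for 5m:365d, i.e. 365·86,400/300, rather than the 525,600 quoted above):

```python
def rollup_rows(rollup_s, retention_s):
    """Rows stored for one path: one row per rollup point over the retention window."""
    return retention_s // rollup_s

def values_per_row(rollup_s, base_interval_s):
    """Raw samples aggregated into each row's "data" array."""
    return rollup_s // base_interval_s

DAY = 86400
BASE = 10  # stats sent every 10s, as in the example above

for name, rollup, retention in [("10s:1d", 10, DAY),
                                ("1m:7d", 60, 7 * DAY),
                                ("5m:365d", 300, 365 * DAY)]:
    print(name, rollup_rows(rollup, retention), "rows,",
          values_per_row(rollup, BASE), "values per row")
```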

[ Webservice ] Cassandra has data but the API does not return results

Hi,
I set up cyanite and it works OK when I push some metrics. But after pushing a lot of data over 2-3 days, cyanite stopped displaying new metrics in graphite-web. So I restarted cyanite, and now no metrics display in graphite-web at all. I checked the cyanite API: http://10.30.12.133:8080/paths?query=* returns [], and fetching data with http://10.30.12.133:8080/metrics?path=UP_ZME_Test_30_14_10_30_12_42.loadavg.1min&from=1393150441&to=1393200441 returns {"error":"LIMIT must be strictly positive"}. Please help me. I also don't understand the rollups ("period" and "rollup") in the config, and why there are 2 rollups. I have set up cyanite 3 times and hit this webservice error every time.

Result Cassandra query:
cqlsh:metric> select path,data,time from metric where path in ('UP_ZME_Test_30_14_10_30_12_42.loadavg.1min') and rollup = 600 and period = 105120 and time >= 1393212735 and time <= 1394212735 order by time asc limit 1;

path | data | time
--------------------------------------------+------+------------
UP_ZME_Test_30_14_10_30_12_42.loadavg.1min | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 1393213200
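Regarding the "LIMIT must be strictly positive" error: judging from the fetch log in the graphite-web issue further down (where an 86,400-second window at a 10s rollup yields a limit of 8641), cyanite appears to request (to - from) / rollup + 1 points. A hedged reconstruction of that arithmetic, showing how a from later than to would drive the LIMIT non-positive:

```python
def query_limit(from_ts, to_ts, rollup_s):
    """Hypothetical reconstruction of the point count cyanite requests for a fetch;
    it matches the '8641' seen in the fetch log of the graphite-web issue below."""
    return (to_ts - from_ts) // rollup_s + 1

# The graphite-web fetch logged below: 10s rollup over one day
print(query_limit(1393321164, 1393407564, 10))  # → 8641

# Reversed bounds yield a non-positive limit, which would trigger
# Cassandra's "LIMIT must be strictly positive" error:
print(query_limit(1393200441, 1393150441, 600))
```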

Can't fetch data from graphite-web to cyanite

When querying from graphite-web to cyanite I get the following stacktrace :

DEBUG [2014-02-26 09:39:24,730] New I/O worker #7 - so.grep.cyanite.http - got request:  {:remote-addr 127.0.0.1, :scheme :http, :request-method :get, :query-string path=d-dbsinf-0001_adm_dev10_aub_example_net.df-boot.df_inodes-free&from=1393321164&to=1393407564, :action :metrics, :content-type nil, :keep-alive? false, :uri /metrics, :server-name localhost, :params {:path d-dbsinf-0001_adm_dev10_aub_example_net.df-boot.df_inodes-free, :from 1393321164, :to 1393407564}, :headers {user-agent Python-urllib/2.7, connection close, host 127.0.0.1:8080, accept-encoding identity}, :content-length nil, :server-port 8080, :character-encoding nil, :body nil}
DEBUG [2014-02-26 09:39:24,730] New I/O worker #7 - so.grep.cyanite.http - fetching paths:  d-dbsinf-0001_adm_dev10_aub_example_net.df-boot.df_inodes-free
DEBUG [2014-02-26 09:39:24,735] New I/O worker #7 - so.grep.cyanite.store - fetching paths from store:  (d-dbsinf-0001_adm_dev10_aub_example_net.df-boot.df_inodes-free) 10 60480 1393321164 1393407564 8641
ERROR [2014-02-26 09:39:24,763] New I/O worker #7 - so.grep.cyanite.http - could not process request
java.lang.NullPointerException
        at clojure.lang.Numbers.ops(Numbers.java:942)
        at clojure.lang.Numbers.lt(Numbers.java:219)
        at clojure.core$_LT_.invoke(core.clj:859)
        at clojure.core$range$fn__4269.invoke(core.clj:2668)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:484)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core$map$fn__4207.invoke(core.clj:2479)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:60)
        at clojure.lang.RT.seq(RT.java:484)
        at clojure.core$seq.invoke(core.clj:133)
        at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
        at clojure.core.protocols$fn__6026.invoke(protocols.clj:54)
        at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13)
        at clojure.core$reduce.invoke(core.clj:6177)
        at so.grep.cyanite.store$fetch.invoke(store.clj:232)
        at so.grep.cyanite.http$fn__14101.invoke(http.clj:79)
        at clojure.lang.MultiFn.invoke(MultiFn.java:227)
        at so.grep.cyanite.http$wrap_process$fn__14111.invoke(http.clj:97)
        at so.grep.cyanite.http$wrap_process.invoke(http.clj:93)
        at so.grep.cyanite.http$start$handler__14122.invoke(http.clj:115)
        at aleph.http.netty$start_http_server$fn$reify__13496$stage0_13482__13497.invoke(netty.clj:77)
        at aleph.http.netty$start_http_server$fn$reify__13496.run(netty.clj:77)
        at lamina.core.pipeline$fn__3666$run__3673.invoke(pipeline.clj:31)
        at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
        at lamina.core.pipeline$start_pipeline.invoke(pipeline.clj:78)
        at aleph.http.netty$start_http_server$fn$reify__13496.invoke(netty.clj:77)
        at aleph.http.netty$start_http_server$fn__13479.invoke(netty.clj:77)
        at lamina.connections$server_generator_$this$reify__13275$stage0_13261__13276.invoke(connections.clj:376)
        at lamina.connections$server_generator_$this$reify__13275.run(connections.clj:376)
        at lamina.core.pipeline$fn__3666$run__3673.invoke(pipeline.clj:31)
        at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
        at lamina.core.pipeline$start_pipeline.invoke(pipeline.clj:78)
        at lamina.connections$server_generator_$this$reify__13275.invoke(connections.clj:376)
        at lamina.connections$server_generator_$this__13258.invoke(connections.clj:376)
        at lamina.connections$server_generator_$this__13258.invoke(connections.clj:371)
        at lamina.trace.instrument$instrument_fn$fn__6374$fn__6408.invoke(instrument.clj:140)
        at lamina.trace.instrument$instrument_fn$fn__6374.invoke(instrument.clj:140)
        at clojure.lang.AFn.applyToHelper(AFn.java:161)
        at clojure.lang.RestFn.applyTo(RestFn.java:132)
        at clojure.lang.AFunction$1.doInvoke(AFunction.java:29)
        at clojure.lang.RestFn.invoke(RestFn.java:408)
        at lamina.connections$server_generator$fn$reify__13322.run(connections.clj:407)
        at lamina.core.pipeline$fn__3666$run__3673.invoke(pipeline.clj:31)
        at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
        at lamina.core.pipeline$subscribe$fn__3699.invoke(pipeline.clj:118)
        at lamina.core.result.ResultChannel.success_BANG_(result.clj:388)
        at lamina.core.result$fn__1349$success_BANG___1352.invoke(result.clj:37)
        at lamina.core.queue$dispatch_consumption.invoke(queue.clj:111)
        at lamina.core.queue.EventQueue.enqueue(queue.clj:327)
        at lamina.core.queue$fn__1980$enqueue__1995.invoke(queue.clj:131)
        at lamina.core.graph.node.Node.propagate(node.clj:282)

Error when starting cyanite

I use the following command to start:
sudo java -jar cyanite/target/cyanite-0.1.0.jar -f /etc/cyanite.yaml

I get the following errors:

Exception in thread "main" clojure.lang.ExceptionInfo: Query prepare failed {:query "UPDATE metric USING TTL ? SET data = data + ? WHERE tenant = '' AND rollup = ? AND period = ? AND path = ? AND time = ?;", :type :qbits.alia/prepare-error, :exception #<SyntaxError com.datastax.driver.core.exceptions.SyntaxError: line 1:24 mismatched input '?' expecting INTEGER>}
at clojure.core$ex_info.invoke(core.clj:4403)
at qbits.alia$ex__GT_ex_info.invoke(alia.clj:168)
at qbits.alia$prepare.invoke(alia.clj:185)
at org.spootnik.cyanite.store$insertq.invoke(store.clj:33)
at org.spootnik.cyanite.store$cassandra_metric_store.invoke(store.clj:133)
at clojure.lang.Var.invoke(Var.java:379)
at org.spootnik.cyanite.config$instantiate.invoke(config.clj:91)
at org.spootnik.cyanite.config$get_instance.invoke(config.clj:99)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.core$apply.invoke(core.clj:626)
at clojure.core$update_in.doInvoke(core.clj:5698)
at clojure.lang.RestFn.invoke(RestFn.java:467)
at org.spootnik.cyanite.config$init.invoke(config.clj:121)
at org.spootnik.cyanite$_main.doInvoke(cyanite.clj:29)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at org.spootnik.cyanite.main(Unknown Source)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:24 mismatched input '?' expecting INTEGER
at com.datastax.driver.core.Responses$Error.asException(Responses.java:94)
at com.datastax.driver.core.SessionManager$2.apply(SessionManager.java:209)
at com.datastax.driver.core.SessionManager$2.apply(SessionManager.java:184)
at com.google.common.util.concurrent.Futures$1.apply(Futures.java:720)
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:859)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Any help appreciated.

~Thanks

Scaling Cyanite

How can I run multiple instances of Cyanite on the same machine with different settings? What configuration do I need to be aware of? Also, I am using Codahale metrics in my code, with the Graphite publisher. Does anyone have experience configuring multiple Cyanite/Graphite endpoints in the Codahale Graphite publisher?

Thanks in advance.


Rollups and inserts

If I have a rollup value of 10s but insert data into Cassandra in 1s increments, I notice the data field is an array where I am actually inserting 10 records into the one rollup. Is this expected? Looking at the code it is, but was this a way of handling more of a statsd approach in cyanite?

Poor performance when retrieving metrics on large dataset

When retrieving metrics the response time is quite slow.

I have about 8 million metrics, each one being updated about once every 60 seconds. This comes out to about 600 writes/sec according to OpsCenter, with average write latencies significantly less than 1 ms; the Cassandra cluster seems to have no trouble keeping up.

However, when hitting the API endpoint with a request that retrieves the last hour of metrics, it takes about 4 or 5 seconds to get the data. Often it can be much worse and result in timeouts from the graphite-web frontend, where the graphs fail to render.

Looking at opscenter, the read latencies don't seem to go above a few milliseconds, and the average read latency as reported by nodetool is just over 6ms.

Any clues as to where the bottleneck might be?

I have one cyanite being used just for writes, another just for reads, but it doesn't seem to make too big a difference.

Here is my config:

carbon:
  host: "0.0.0.0"
  port: 2003
  rollups:
    - period: 60480
      rollup: 10
    - period: 105120
      rollup: 600
http:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  console: true
store:
  cluster: 'mycluster.com'
  keyspace: 'metric'
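Reading this config, "rollup" appears to be the resolution in seconds and "period" the number of points kept at that resolution (an inference from the values, not something documented here): 60,480 points × 10s is 7 days, and 105,120 points × 600s is 2 years. A quick check:

```python
def retention_days(period_points, rollup_s):
    """Retention span implied by a rollup entry, assuming period = number of points."""
    return period_points * rollup_s / 86400

print(retention_days(60480, 10))    # → 7.0 days
print(retention_days(105120, 600))  # → 730.0 days (~2 years)
```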

Losing metrics

I have recently been losing lots of metrics when rendering data from graphite-api or grafana.

http://graphite-api:8000/render?target=collectd.server.memory.memory-used&from=-1h

(screenshot: cyanite-data-losr)

The loss rate is higher for recent metrics than for older ones.

I also noticed that the loss happens when I have this error in the cyanite logs (not sure if it's related):

ERROR [2014-07-11 11:24:55,209] New I/O worker #63 - lamina.core.utils - error on inactive probe: tcp-server:error
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at aleph.netty.core$cached_thread_executor$reify__8830$fn__8831.invoke(core.clj:78)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Unknown Source)

I should also mention that I gather data from 25 servers with collectd installed, and the server where cyanite/cassandra run has 12 GB of RAM, a good processor and SSDs.

I'm using cyanite commit e708113

The infra layout/config:

(diagram: infra)

Can you help me identify/resolve the issue, please? Thanks!

TTL for metric tree objects (ElasticSearch)

Setting/updating TTLs in ES was planned; however, after recent patch #36, 'index/present' is used to avoid excessive updates. This logic breaks the ability to maintain tree objects' TTLs properly.

Any ideas on how to maintain the tree index now?

Messages since library updates

Seeing a lot of this since 470ab39 (multiple times a minute):

WARN [2014-03-26 17:00:45,965] Cassandra Java Driver worker-0 - com.datastax.driver.core.Cluster - Re-preparing already prepared query UPDATE metric USING TTL ? SET data = data + ? WHERE tenant = '' AND rollup = ? AND period = ? AND path = ? AND time = ?;. Please note that preparing the same query more than once is generally an anti-pattern and will likely affect performance. Consider preparing the statement only once.
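The warning is about calling prepare repeatedly for the same CQL string; the usual remedy is to prepare once and cache the prepared statement. A generic sketch of that pattern (raw_prepare simulates a driver's prepare round-trip so the caching logic is self-contained; it is not a real driver API):

```python
# Sketch of the "prepare once, reuse" pattern the driver warning asks for.
# raw_prepare stands in for an expensive driver-side prepare round-trip;
# the cache ensures each distinct CQL string is prepared exactly once.

prepare_calls = {}

def raw_prepare(cql):
    """Simulated driver prepare call; counts how often each query is prepared."""
    prepare_calls[cql] = prepare_calls.get(cql, 0) + 1
    return ("prepared", cql)

_statement_cache = {}

def prepare_cached(cql):
    """Return a cached prepared statement, preparing only on first use."""
    stmt = _statement_cache.get(cql)
    if stmt is None:
        stmt = raw_prepare(cql)
        _statement_cache[cql] = stmt
    return stmt

INSERT = "UPDATE metric USING TTL ? SET data = data + ? WHERE path = ? AND time = ?;"
for _ in range(1000):
    prepare_cached(INSERT)
print(prepare_calls[INSERT])  # → 1: prepared once despite 1000 requests
```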

Also seeing this:

0    [main] 2014-03-26 16:52:41,148 WARN  com.datastax.driver.core.FrameCompressor  - Cannot find Snappy class, you should make sure the Snappy library is in the classpath if you intend to use it. Snappy compression will not be available for the protocol.
3    [main] 2014-03-26 16:52:41,151 WARN  com.datastax.driver.core.FrameCompressor  - Cannot find LZ4 class, you should make sure the LZ4 library is in the classpath if you intend to use it. LZ4 compression will not be available for the protocol.

... on stdout at startup.

Should mention that I'm using Cassandra 2.0.6.

What's missing in comparison to carbon and "production ready"?

In looking over the documentation and README, it's not clear to me what functionality cyanite has in comparison to carbon.

I was able to get it going yesterday along with graphite-web and everything seems to be working (and it's pretty awesome so far), but it's murky whether the rollup and pruning functionality is already working. In other words, what maintenance should I expect to have to do?

Also, is this being used in production anywhere? I am about to create a new graphite cluster and really like the idea and would love to hear how your experience has been using it.

Questions about Cyanite features

Hi, we are working with the Graphite-Cyanite-Cassandra stack, comparing it with other systems like InfluxDB or Carbon/Whisper. We would like to know some things about Cyanite:

- Is there any maximum temporal retention in the rollups? Days, months, years?

- We know that Whisper is a fixed-size database because it reserves space for each retention group, and if there is no data, Whisper saves null values. Does Cyanite work like that, or does it simply not reserve any space, with the database growing in proportion to the data it receives?

Questions about the effectiveness of cluster management:

We run clusters of Cyanite and Elasticsearch on one side and a Cassandra cluster on the other. Maybe these questions relate more to the Cassandra cluster, but we would appreciate any advice:

  • Is it possible to obtain only the data contained on a given node of the cluster?
  • Is it possible to restrict the maximum size per node?

Thank you very much for any help that you can give us!

Constant high CPU usage

Hi there,

Currently working with a carbon relay receiving about 1.2 million metrics per minute.

For testing purposes, I have deployed a two-node cassandra cluster, with each cassandra node having a cyanite process attempting to write metrics. This setup seems to function fine when I throw some simple stress-test metrics at it.

However, when I direct a portion of our production metrics at the cluster, CPU utilisation for the cyanite process hops to near 100% across all available cores (currently 8 per instance), and it continues to spin at this usage even after I cease sending metrics. Cassandra writes and CPU utilisation remain very low throughout (~5% usage on a single core).

I initially thought that @addisonj had a pull request (#37) that would address this issue, as there were a number of exceptions being thrown in the cyanite.log file relating to badly formed metrics. However, after manually merging the pull request and retrying the issue persists (although the formatting exceptions are now being handled elegantly!)

Any pointers for this one? Quite excited to get cyanite working on our production metric volume!

-Paul

Expose interval information for paths

Hi,

graphite-api finders (like graphite-cyanite) expose a method, get_intervals(), to provide hints about when a given path is valid from and to. Currently graphite-cyanite doesn't seem to have a way to query this information from cyanite, and returns that a given path is always valid for any given range.

If cyanite exposed this information, graphite-cyanite could make use of it, and the resulting frontends could make more sensible display information.

Cheers,

Query execution failed

Just doing a /metrics request for something which exists (although I've tweaked the name in the log message below), with a from querystring parameter, gives "Query execution failed" in the response error message and the following entry in cyanite.log:

ERROR [2014-03-26 17:08:12,651] New I/O worker #7 - org.spootnik.cyanite.http - could not process request
clojure.lang.ExceptionInfo: Query execution failed {:values [("redacted.*.metrics") 60 4320 1395770400 1395853692 5556], :query #<BoundStatement com.datastax.driver.core.BoundStatement@3d8c3e90>, :type :qbits.alia/execute, :exception #<InvalidQueryException com.datastax.driver.core.exceptions.InvalidQueryException: Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query>}
    at clojure.core$ex_info.invoke(core.clj:4403)
    at qbits.alia$ex__GT_ex_info.invoke(alia.clj:125)
    at qbits.alia$ex__GT_ex_info.invoke(alia.clj:127)
    at qbits.alia$execute.doInvoke(alia.clj:251)
    at clojure.lang.RestFn.invoke(RestFn.java:457)
    at org.spootnik.cyanite.store$fetch.invoke(store.clj:231)
    at org.spootnik.cyanite.http$fn__15785.invoke(http.clj:80)
    at clojure.lang.MultiFn.invoke(MultiFn.java:227)
    at org.spootnik.cyanite.http$wrap_process$fn__15797.invoke(http.clj:102)
    at org.spootnik.cyanite.http$wrap_process.invoke(http.clj:98)
    at org.spootnik.cyanite.http$start$handler__15808.invoke(http.clj:120)
    at aleph.http.netty$start_http_server$fn$reify__15180$stage0_15166__15181.invoke(netty.clj:77)
    at aleph.http.netty$start_http_server$fn$reify__15180.run(netty.clj:77)
    at lamina.core.pipeline$fn__3632$run__3639.invoke(pipeline.clj:31)
    at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
    at lamina.core.pipeline$start_pipeline.invoke(pipeline.clj:78)
    at aleph.http.netty$start_http_server$fn$reify__15180.invoke(netty.clj:77)
    at aleph.http.netty$start_http_server$fn__15163.invoke(netty.clj:77)
    at lamina.connections$server_generator_$this$reify__14959$stage0_14945__14960.invoke(connections.clj:376)
    at lamina.connections$server_generator_$this$reify__14959.run(connections.clj:376)
    at lamina.core.pipeline$fn__3632$run__3639.invoke(pipeline.clj:31)
    at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
    at lamina.core.pipeline$start_pipeline.invoke(pipeline.clj:78)
    at lamina.connections$server_generator_$this$reify__14959.invoke(connections.clj:376)
    at lamina.connections$server_generator_$this__14942.invoke(connections.clj:376)
    at lamina.connections$server_generator_$this__14942.invoke(connections.clj:371)
    at lamina.trace.instrument$instrument_fn$fn__6340$fn__6374.invoke(instrument.clj:140)
    at lamina.trace.instrument$instrument_fn$fn__6340.invoke(instrument.clj:140)
    at clojure.lang.AFn.applyToHelper(AFn.java:154)
    at clojure.lang.RestFn.applyTo(RestFn.java:132)
    at clojure.lang.AFunction$1.doInvoke(AFunction.java:29)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at lamina.connections$server_generator$fn$reify__15006.run(connections.clj:407)
    at lamina.core.pipeline$fn__3632$run__3639.invoke(pipeline.clj:31)
    at lamina.core.pipeline$resume_pipeline.invoke(pipeline.clj:61)
    at lamina.core.pipeline$subscribe$fn__3665.invoke(pipeline.clj:118)
    at lamina.core.result.ResultChannel.success_BANG_(result.clj:388)
    at lamina.core.result$fn__1315$success_BANG___1318.invoke(result.clj:37)
    at lamina.core.queue$dispatch_consumption.invoke(queue.clj:111)
    at lamina.core.queue.EventQueue.enqueue(queue.clj:327)
    at lamina.core.queue$fn__1946$enqueue__1961.invoke(queue.clj:131)
    at lamina.core.graph.node.Node.propagate(node.clj:282)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.graph.node.Node.propagate(node.clj:282)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.channel.Channel.enqueue(channel.clj:63)
    at lamina.core.utils$fn__1070$enqueue__1071.invoke(utils.clj:74)
    at lamina.core$enqueue.invoke(core.clj:107)
    at aleph.http.core$collapse_reads$fn__14021.invoke(core.clj:229)
    at lamina.core.graph.propagator$bridge$fn__2919.invoke(propagator.clj:194)
    at lamina.core.graph.propagator.BridgePropagator.propagate(propagator.clj:61)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.graph.node.Node.propagate(node.clj:282)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.channel.SplicedChannel.enqueue(channel.clj:111)
    at lamina.core.utils$fn__1070$enqueue__1071.invoke(utils.clj:74)
    at lamina.core$enqueue.invoke(core.clj:107)
    at aleph.netty.server$server_message_handler$reify__9192.handleUpstream(server.clj:135)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentEncoder.messageReceived(HttpContentEncoder.java:81)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at aleph.netty.core$upstream_traffic_handler$reify__8884.handleUpstream(core.clj:258)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at aleph.netty.core$connection_handler$reify__8877.handleUpstream(core.clj:240)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at aleph.netty.core$upstream_error_handler$reify__8867.handleUpstream(core.clj:199)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at aleph.netty.core$cached_thread_executor$reify__8830$fn__8831.invoke(core.clj:78)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:724)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:96)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:228)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:354)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more

It's Cassandra 2.0.6.

Internal Metrics

When a carbon process is spun up, it reports a bunch of useful metrics that can be used to monitor the health of the process (see screenshot for a few).

Does a cyanite process expose any of its internal metrics?

(screenshot: carbon's internal metrics, 2014-04-02)

Unable to bind to port

After killing the cyanite process, it takes a very long time before starting it again stops producing the error below. There are no other running instances and no other processes listening on TCP port 2003.

starting with configuration:  nil
DEBUG [2014-04-18 14:46:44,840] main - org.spootnik.cyanite.config - building  :store  with  org.spootnik.cyanite.store/cassandra-metric-store
INFO [2014-04-18 14:46:44,841] main - org.spootnik.cyanite.store - connecting to cassandra cluster
INFO [2014-04-18 14:46:45,326] main - org.spootnik.cyanite.carbon - starting carbon handler
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to bind to: VALID_DOMAIN/VALID_IP:2003
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        at aleph.netty.server$start_server.invoke(server.clj:68)
        at aleph.tcp$start_tcp_server.invoke(tcp.clj:31)
        at org.spootnik.cyanite.carbon$start.invoke(carbon.clj:38)
        at org.spootnik.cyanite$_main.doInvoke(cyanite.clj:31)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at org.spootnik.cyanite.main(Unknown Source)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at aleph.netty.core$cached_thread_executor$reify__8828$fn__8829.invoke(core.clj:78)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.lang.Thread.run(Thread.java:724)
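Not cyanite-specific, but the symptom (a long wait before the port can be rebound, with no live listener) is typical of a listening socket lingering in TIME_WAIT after an unclean shutdown. Servers usually sidestep this by setting SO_REUSEADDR before bind; a minimal sketch in Python (port 0 stands in for 2003 so the example always finds a free port):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow rebinding an address whose previous listener is still in TIME_WAIT.
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))  # cyanite would bind port 2003 here
srv.listen(5)
reuse = srv.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
print(reuse)  # non-zero once the option is set
srv.close()
```

In Netty (which aleph wraps) the equivalent is the "reuseAddress" bootstrap option; whether aleph exposes it through cyanite's configuration, I'm not sure.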

Support for multiple elasticsearch nodes in index

Is there a way to configure cyanite to use more than one node for elasticsearch?

It looks like the yaml file only supports passing in one node for elasticsearch when I follow the yaml convention. That is, this works:

index:
  use: "io.cyanite.es_path/es-native"
  index: "cyanite_stats_paths" # defaults to "cyanite_paths"
  host: "192.168.1.103"

But this gives the below exception:

index:
  use: "io.cyanite.es_path/es-native"
  index: "cyanite_stats_paths" # defaults to "cyanite_paths"
  host:
    - "192.168.1.103"
    - "192.168.1.102"

(Additionally, it would be nice to be able to pass the port number in as part of the host, so one can do "localhost:8300, localhost:9300" to test failover on a local machine.)

Thanks,
Jeff

Exception in thread "main" java.lang.ClassCastException: clojure.lang.LazySeq cannot be cast to java.lang.String
    at clojurewerkz.elastisch.native.conversion$__GT_socket_transport_address.invokePrim(conversion.clj:174)
    at clojurewerkz.elastisch.native$connect.invoke(native.clj:250)
    at io.cyanite.es_path$es_native.invoke(es_path.clj:198)
    at clojure.lang.Var.invoke(Var.java:379)
    at io.cyanite.config$instantiate.invoke(config.clj:94)
    at io.cyanite.config$get_instance.invoke(config.clj:102)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invoke(core.clj:628)
    at clojure.core$update_in.doInvoke(core.clj:5853)
    at clojure.lang.RestFn.invoke(RestFn.java:467)
    at io.cyanite.config$init.invoke(config.clj:129)
    at io.cyanite$_main.doInvoke(cyanite.clj:31)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at io.cyanite.main(Unknown Source)
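The ClassCastException is consistent with the YAML sequence (a Clojure LazySeq) reaching code that expects a single host string. Until list support exists, the shape of a workaround, and roughly what such support would need to do, is to accept a comma-separated host:port string and split it. A sketch in Python; parse_hosts is purely illustrative, not a cyanite function:

```python
def parse_hosts(spec, default_port=9300):
    """Turn 'h1:8300, h2:9300' or 'h1,h2' into a list of (host, port) pairs."""
    pairs = []
    for item in spec.split(","):
        host, _, port = item.strip().partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs

print(parse_hosts("localhost:8300, localhost:9300"))
# [('localhost', 8300), ('localhost', 9300)]
print(parse_hosts("192.168.1.103,192.168.1.102"))
# [('192.168.1.103', 9300), ('192.168.1.102', 9300)]
```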

How to set cassandra hosts and port?

Hi,

I'm not seeing how to set the cassandra hosts and port for cyanite to connect to. When I try running cyanite with cluster set to anything other than an IP address (e.g. either host:port or host,host), it fails; and attempting to give cluster a list of hosts, yaml style, also fails.

Thanks,
Jeff

Scaling Recommendations

Currently, we use a series of carbon processes to handle all of our metrics. We have "relays" and "caches" and we have to spin up multiple of these processes to take advantage of the cores in the server.

In preparing cyanite to receive a subset of our production metrics (~1M per minute), do you recommend I follow the same pattern of multiple processes? Is there some benchmark of how many metrics per second a single cyanite process could handle?

Path DB update fails

I think it's possible that an exception while attempting to update path-db in store.clj can stop the whole process from continuing.

If I replace update-path-db-every with:

(defn update-path-db-every
  "At each interval, fetch all known paths, and store the
   resulting set in path-db"
  [session interval]
  (while true
    (try
      (->> (alia/execute session pathq)
           (map :path)
           (set)
           (reset! path-db))
      (catch Exception e
        (error e "failure while updating path db")))
    (Thread/sleep (* interval 1000))))

... I get:

ERROR [2014-03-26 10:22:57,505] clojure-agent-send-off-pool-0 - org.spootnik.cyanite.store - failure while updating path db
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /1.1.1.1 (Timeout during read), /2.2.2.2 (Timeout during read))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183)
    at com.datastax.driver.core.Session.execute(Session.java:111)
    at qbits.alia$execute.doInvoke(alia.clj:190)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at org.spootnik.cyanite.store$update_path_db_every$fn__86.invoke(store.clj:123)
    at org.spootnik.cyanite.store$update_path_db_every.invoke(store.clj:122)
    at org.spootnik.cyanite.store$cassandra_metric_store$fn__93.invoke(store.clj:140)
    at clojure.core$binding_conveyor_fn$fn__4145.invoke(core.clj:1910)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /1.1.1.1 (Timeout during read), /2.2.2.2 (Timeout during read))
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:170)
    ... 3 more

I think the actual exception is my problem - not sure whether my Cassandra setup is working properly (this is my first time playing with Cassandra). Though it could be an issue with the number of paths it's attempting to pull back. nodetool cfstats tells me that my metric keyspace has approximately 273k rows.

@pyr is this what you were referring to in #12?

In-memory metric store not being updated

Using latest master, it looks as if the in-memory metric store no longer fills its cache from the DB.

Specifically, update-path-db-every looks like it's gone?

I see how the path store is used for new metrics, but I have a cyanite process for reads only that won't get any paths.

Any ideas on this one? (I realize this might still be work in progress stuff, just trying to figure out the direction)

Also, as an aside, after realizing this I downgraded back, and it seems that with the old version the cache isn't getting reliably updated... I have about 8 million rows, so perhaps it's just taking a long time to fill the cache?

Lein uberjar failure

When trying to build the latest cyanite, I get the following error:

Could not transfer artifact cc.qbits:alia:pom:2.2.0 from/to clojars (https://clojars.org/repo/): Checksum validation failed, expected 7495cbbed368dee884510d3329979f53baa013b1 but is bc2637df032b359c6baf9bd847d5157cb3706a45

Value out of range for int

With the following cyanite.yaml...

carbon:
  rollups:
    - period: 21600
      rollup: 15
    - period: 259200
      rollup: 60
    - period: 1209600
      rollup: 300
    - period: 31536000
      rollup: 600
    - period: 94608000
      rollup: 3600

... my intention was to try and match these retentions (hopefully I've got the right idea):

retentions = 15s:6h,1m:72h,5m:2w,10m:1y,1h:3y

However, I get the following exception while sending data to cyanite:

ERROR [2014-03-17 10:33:17,565] New I/O worker #5 - lamina.core.utils - Error in permanent callback.
java.lang.IllegalArgumentException: Value out of range for int: 18921600000
    at clojure.lang.RT.intCast(RT.java:1115)
    at clojure.lang.RT.intCast(RT.java:1085)
    at org.spootnik.cyanite.store$channel_for$fn__11999.invoke(store.clj:154)
    at lamina.core.graph.propagator.CallbackPropagator.propagate(propagator.clj:42)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.graph.node.Node.propagate(node.clj:282)
    at lamina.core.graph.core$fn__1875$propagate__1880.invoke(core.clj:34)
    at lamina.core.channel.Channel.enqueue(channel.clj:63)
    at lamina.core.utils$fn__1070$enqueue__1071.invoke(utils.clj:74)
    at lamina.core$enqueue$fn__4919.invoke(core.clj:111)
    ... lots more lines ...

Looking at the number 18921600000 (which is 31536000 * 600): does :values in store.clj:154 need more longs in it, along with a change to the cyanite Cassandra schema? Sorry, I'd submit a pull request, but my lack of understanding at this point would probably break something.
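For reference, the arithmetic does point at a plain Java int overflow: assuming cyanite multiplies period by rollup when computing the value that blows up (which matches the number in the exception), the largest configured pair no longer fits in 32 bits. If period is instead read as a number of points rather than a number of seconds, the same retentions stay well inside int range:

```python
INT_MAX = 2**31 - 1  # Java int upper bound

period, rollup = 31536000, 600
product = period * rollup
print(product)            # 18921600000, the value in the exception
print(product > INT_MAX)  # True: does not fit in a Java int

# Interpreting period as a point count (retention / rollup),
# 10-minute points kept for one year becomes:
points = 31536000 // 600
print(points)                      # 52560
print(points * rollup <= INT_MAX)  # True
```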

Cyanite and multi-DC support

Hi Pierre,

I'm hoping to get cyanite working for showing stats from multiple data centers.

We're writing data into cassandra (via graphite-api) using something like the following:

  • stats.us_east.app.XXX
  • stats.us_west_1.app.XXX
  • stats.us_west_2.app.XXX

So, in us-east, all stats are being stored in cassandra under statsd buckets starting with stats.us_east.app, etc.

I'm able to do a "select * from metric limit 100" and can see the data from the other DCs is in cassandra -- i.e. data is replicating between the cassandra nodes, cross-DC, just fine.

However, when I read /metrics/index.json from graphite-api, only the data from the local data center is showing up.

How is cyanite providing a list of metrics to graphite-api?

Thanks,
Jeff

Basic metric administration with cyanite-cassandra?

Hi, I'm testing cyanite and evaluating how difficult administration of this Graphite backend is compared to the default carbon/whisper system.

I would like to know how to do some basic things with metrics:

a) rename metrics which match a pattern.
b) delete metrics which match a pattern.
c) change the global roll-up aggregation without affecting the current data.

But when I try a simple query, this happens:

cqlsh:metric> select * from metric where path='collectd.dades.graphite0.graphitews0.system.cpu.percent-active';
Bad Request: partition key part path cannot be restricted (preceding part rollup is either not restricted or by a non-EQ relation)

Can you help me to learn more on metric administration?
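For what it's worth, the error means Cassandra needs every partition key component before path fixed by equality. Assuming cyanite's schema keys the metric table on something like ((tenant, rollup, period, path), time), which matches the WHERE clause of the update statement cyanite itself issues, a point query has to pin all four. The values below are illustrative and must match one of your configured rollups:

```sql
SELECT time, data FROM metric
WHERE tenant = ''
  AND rollup = 15
  AND period = 21600
  AND path = 'collectd.dades.graphite0.graphitews0.system.cpu.percent-active';
```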

Request: on startup, when keyspace isn't inited, either exit or retry connections

We're seeing a minor bug when running cyanite in a testing setup, where the testing setup resets and re-inits everything.

Specifically, if the metric keyspace doesn't exist right when cyanite starts (because our init scripts run both at the same time, so we can work around it by delaying cyanite's startup), the stack trace below occurs and the process hangs. I'd suggest having cyanite either exit, or retry on some interval and continue once it succeeds. As it is now, the process spins up but never reaches the point where it listens on any ports for traffic.

-J

DEBUG [2014-09-19 10:15:58,545] main - org.spootnik.cyanite.config - building :store with org.spootnik.cyanite.store/cassandra-metric-store
INFO [2014-09-19 10:15:58,558] main - org.spootnik.cyanite.store - creating cassandra metric store
at org.spootnik.cyanite.config$init.invoke(config.clj:124)
at org.spootnik.cyanite$_main.doInvoke(cyanite.clj:31)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at org.spootnik.cyanite.main(Unknown Source)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'metric' does not exist
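Agreed that hanging is the worst of the possible behaviours here. A retry-with-backoff wrapper around session/keyspace setup would cover the startup race; a minimal sketch in Python to show the shape (the retry helper and the connect call are illustrative, not cyanite code):

```python
import time

def retry(fn, attempts=10, delay=1.0, backoff=2.0):
    """Call fn until it succeeds; sleep between attempts, re-raise when exhausted."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # give up and let the process exit noisily
            time.sleep(delay)
            delay *= backoff

# e.g. session = retry(lambda: cluster.connect("metric"))
```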

Very divergent performance-angled fork

I've got a very divergent fork going that I've been working on for some time to improve performance. Some highlights from the changes are:

  • ES path lookup to query for complete paths before breaking them down for creation.
  • ES requests switched to batched versions, using http-kit for async requests.
  • Cassandra updates are now done in asynchronous prepared batches.
  • String processing has been improved.
  • Removed Lamina and moved it all over to core.async.

I'd like to know what the appetite is for taking this rather large change set wholesale. That would obviously be simplest for me, but I appreciate that the original author might not want that. If not, then I'll see if I can separate out some bits to give back.

Import existing whisper files?

Is there already a way to import existing whisper files into cyanite? I guess it would make cyanite an even more attractive alternative if there was a way to migrate an existing carbon deployment.

If it is not possible yet, would you mind pointing out how an import could be achieved? I suppose it should be fairly easy to write a Python script that uses whisper to read and decode the whisper files and then simply streams the data into Cassandra?
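Roughly, yes. The whisper Python package exposes whisper.fetch(path, fromTime), which returns ((start, end, step), values); turning that into carbon plaintext lines that can be replayed into cyanite's port 2003 is only a few lines. A sketch of the formatting half (the metric-name-from-file-path mapping is left to you):

```python
def to_carbon_lines(metric, start, step, values):
    """Render whisper fetch output as 'metric value timestamp' carbon lines."""
    lines = []
    for i, value in enumerate(values):
        if value is not None:  # whisper reports missing points as None
            lines.append("%s %s %d" % (metric, value, start + i * step))
    return lines

# With the whisper package installed, this would be driven by something like:
#   (start, end, step), values = whisper.fetch("/path/to/metric.wsp", 0)
#   then send "\n".join(to_carbon_lines(name, start, step, values)) to port 2003
print(to_carbon_lines("a.b", 100, 10, [1.5, None, 2.0]))
# ['a.b 1.5 100', 'a.b 2.0 120']
```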

Error when starting cyanite

Received this error when starting cyanite:

ERROR [2014-04-12 17:34:44,381] clojure-agent-send-off-pool-0 - org.spootnik.cyanite.store - could not update path database
clojure.lang.ExceptionInfo: Query execution failed {:values nil, :query #<SimpleStatement SELECT distinct tenant, path, rollup, period from metric;>, :type :qbits.alia/execute, :exception #<SyntaxError com.datastax.driver.core.exceptions.SyntaxError: line 1:16 no viable alternative at input 'tenant'>}
at clojure.core$ex_info.invoke(core.clj:4403)
at qbits.alia$ex__GT_ex_info.invoke(alia.clj:125)
at qbits.alia$ex__GT_ex_info.invoke(alia.clj:127)
at qbits.alia$execute.doInvoke(alia.clj:251)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at org.spootnik.cyanite.store$update_path_db_every$fn__13702.invoke(store.clj:123)
at org.spootnik.cyanite.store$update_path_db_every.invoke(store.clj:122)
at org.spootnik.cyanite.store$cassandra_metric_store$fn__13709.invoke(store.clj:140)
at clojure.core$binding_conveyor_fn$fn__4145.invoke(core.clj:1910)
at clojure.lang.AFn.call(AFn.java:18)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:16 no viable alternative at input 'tenant'
at com.datastax.driver.core.Responses$Error.asException(Responses.java:94)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:228)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:354)
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more

There are two wrong CQL statements.

One uses DISTINCT, which is not allowed in CQL without selecting the partition key columns:

SELECT distinct tenant .... from metrics;

The other is the update, where tenant, which is part of the partition key, is empty:

UPDATE metric USING TTL ? SET data = data + ? WHERE tenant = '' AND rollup = ? AND period = ? AND path = ? AND time = ?;

Can't build cyanite

Hello, I'm getting the following during my build:

(Could not transfer artifact org.apache.httpcomponents:httpcore:pom:4.3.2 from/to central (http://repo1.maven.org/maven2/): Checksum validation failed, could not read expected checksum: Failed to transfer file: http://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.2/httpcore-4.3.2.pom.sha1. Return code is: 500 , ReasonPhrase:Domain Not Found.)
(Could not transfer artifact org.apache.httpcomponents:httpcore:pom:4.3.1 from/to central (http://repo1.maven.org/maven2/): Checksum validation failed, could not read expected checksum: Failed to transfer file: http://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.1/httpcore-4.3.1.pom.sha1. Return code is: 500 , ReasonPhrase:Domain Not Found.)
(Could not transfer artifact org.apache.httpcomponents:httpmime:pom:4.3.2 from/to central (http://repo1.maven.org/maven2/): Checksum validation failed, could not read expected checksum: Failed to transfer file: http://repo1.maven.org/maven2/org/apache/httpcomponents/httpmime/4.3.2/httpmime-4.3.2.pom.sha1. Return code is: 500 , ReasonPhrase:Domain Not Found.)
(Could not transfer artifact io.netty:netty-parent:pom:4.0.19.Final from/to central (http://repo1.maven.org/maven2/): Failed to transfer file: http://repo1.maven.org/maven2/io/netty/netty-parent/4.0.19.Final/netty-parent-4.0.19.Final.pom. Return code is: 500 , ReasonPhrase:Domain Not Found.)
This could be due to a typo in :dependencies or network issues.
If you are behind a proxy, try setting the 'http_proxy' environment variable.
Uberjar aborting because jar failed: Could not resolve dependencies

It's downloaded other dependencies but not the above.

I'm using Leiningen 2.4.3.

I know nothing of Clojure or Leiningen except that Clojure is a Lisp dialect and Leiningen is a build system. I would find it extremely helpful if there were a build of cyanite that I could download.

Binds graphite listener to all interfaces

Cyanite binds the graphite listener to all interfaces. The "host" configuration setting is ignored:

Configuration:

carbon:
  host: "127.0.0.1"
  port: 2004
  rollups:
    - "1m:30d"
    - "5m:90d"
http:
  host: "127.0.0.1"
  port: 8000
logging:
  level: warn
  console: false
  files:
    - "/var/log/cyanite/cyanite.log"
store:
  cluster: 'localhost'
  keyspace: 'metric'
index:
  use: "io.cyanite.es_path/es-native"
  index: "cyanite_paths"
  host: "127.0.0.1"
  port: 9300
  cluster_name: "Monitoring"

Test:

$ netstat -tulpen |grep 2004
tcp 0 0 0.0.0.0:2004 0.0.0.0:* LISTEN 107 1763123 14763/java

where 14763 is the PID of the cyanite process.
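For comparison, a listener that honours its configured host shows that address in getsockname (and netstat) rather than 0.0.0.0. A minimal Python sketch of the expected behaviour (port 0 picks a free port so the example always runs):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # loopback only, as the "host" setting intends
srv.listen(5)
bound_addr = srv.getsockname()[0]
print(bound_addr)  # 127.0.0.1, not 0.0.0.0
srv.close()
```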

Add kafka support

I think this would bring cyanite closer to the features of carbon with the exception of python pickle format. We are looking at use cases of sending data to brokers to fan out messaging to several cyanite instances.

Problems with Cassandra

Using Cassandra version 2.0.6

I start up graphite-api, cyanite and apache-cassandra. I start sending metrics from a DropWizard application and within minutes my Cassandra process dies. Just trying to find out if you are aware of any issues writing to Cassandra.

Sometimes I am able to write metrics for a few minutes, other times the crash is more immediate.

In Cassandra:

 INFO [OptionalTasks:1] 2014-04-07 09:03:53,102 MeteredFlusher.java (line 63) flushing high-traffic column family CFS(Keyspace='metric', ColumnFamily='metric') (estimated 72791577 bytes)
 INFO [OptionalTasks:1] 2014-04-07 09:03:53,103 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-metric@133887464(20720419/72791577 serialized/live bytes, 339679 ops)
 INFO [FlushWriter:12] 2014-04-07 09:03:53,104 Memtable.java (line 331) Writing Memtable-metric@133887464(20720419/72791577 serialized/live bytes, 339679 ops)
 INFO [FlushWriter:12] 2014-04-07 09:03:54,526 Memtable.java (line 371) Completed flushing /datos/monitoring/apache-cassandra/data/metric/metric/metric-metric-jb-27-Data.db (6649345 bytes) for commitlog position ReplayPosition(segmentId=1396879560863, position=548607

In Cyanite:

ERROR [2014-04-07 22:00:42,062] New I/O worker #25 - lamina.core.utils - error on inactive probe: tcp-server:error
clojure.lang.ExceptionInfo: Query prepare failed {:query "UPDATE metric USING TTL ? SET data = data + ? WHERE tenant = '' AND rollup = ? AND period = ? AND path = ? AND time = ?;", :type :qbits.alia/prepare-error, :exception #<NoHostAvailableException com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)>}
    at clojure.core$ex_info.invoke(core.clj:4403)
    at qbits.alia$ex__GT_ex_info.invoke(alia.clj:125)
    at qbits.alia$prepare.invoke(alia.clj:141)
    at org.spootnik.cyanite.store$insertq.invoke(store.clj:34)
    at org.spootnik.cyanite.store$channel_for.invoke(store.clj:150)
    at org.spootnik.cyanite.carbon$handler$fn__13780.invoke(carbon.clj:29)
    at aleph.tcp$start_tcp_server$fn__9355$fn__9357.invoke(tcp.clj:34)
    at aleph.netty.server$server_message_handler$initializer__9141.invoke(server.clj:111)
    at aleph.netty.server$server_message_handler$reify__9192.handleUpstream(server.clj:131)
    at aleph.netty.core$upstream_traffic_handler$reify__8884.handleUpstream(core.clj:258)
    at aleph.netty.core$connection_handler$reify__8877.handleUpstream(core.clj:240)
    at aleph.netty.core$upstream_error_handler$reify__8867.handleUpstream(core.clj:199)
    at org.jboss.netty.channel.Channels.fireChannelOpen(Channels.java:170)
    at org.jboss.netty.channel.socket.nio.NioAcceptedSocketChannel.<init>(NioAcceptedSocketChannel.java:42)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.registerAcceptedChannel(NioServerBoss.java:137)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:104)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at aleph.netty.core$cached_thread_executor$reify__8830$fn__8831.invoke(core.clj:78)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:724)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:100)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:417)
    at com.datastax.driver.core.SessionManager.prepareAsync(SessionManager.java:124)
    at com.datastax.driver.core.SessionManager.prepare(SessionManager.java:108)
    at qbits.alia$prepare.invoke(alia.clj:139)
    ... 20 more

Help in Rollup Def

I am new to Carbon/Cyanite and am trying to understand the meaning of the rollups settings:

rollups:
  - period: 60480
    rollup: 10
  - period: 105120
    rollup: 600

What does the 105120/600 mean? I'd appreciate any help on this.

Thanks
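For reference: in cyanite's configuration, rollup is the resolution in seconds per point and period is the number of points kept, so total retention is rollup * period seconds. Worked out for the two entries above (assuming that reading of the config, which matches cyanite's documented examples):

```python
def retention_days(rollup, period):
    """rollup: seconds per point; period: number of points kept."""
    return rollup * period / 86400.0

print(retention_days(10, 60480))    # 7.0   -> one week at 10-second resolution
print(retention_days(600, 105120))  # 730.0 -> two years at 10-minute resolution
```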

Problems with Cyanite configuration

Hi everyone,

We are working with Cyanite to store metrics in Cassandra, keep a path cache in Elasticsearch, and read them through Graphite-web, all in a multi-node cluster. After upgrading Cassandra to 2.1 and Cyanite to 0.1.3, we have problems with the Cyanite configuration. When we want to view the metrics, Graphite-web doesn't find them.

cyanite.yaml:

carbon:
  host: "192.168.150.111"
  port: 2003
  rollups:
   - "60s:30d"
   - "5m:180d"
   - "1h:300d"
   - "1d:1y"

http:
  host: "192.168.150.111"
  port: 8080

logging:
  level: debug
  console: true
  files:
    - "/var/log/cyanite.log"

store:
  cluster: 'localhost'
  keyspace: 'metric'

index:
  use: "io.cyanite.es_path/es-native" 
  index: "my_paths" #defaults to "cyanite_paths"
  host: "localhost" # defaults to localhost
  port: 9300 # defaults to 9300
  cluster_name: "es_4_cyanite" #REQUIRED! this is specific to your cluster and has no sensible default

/var/log/cyanite.log:

ERROR [2014-10-13 11:06:36,034] async-dispatch-27 - io.cyanite.es_path - No node available
org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:196)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:94)
    at org.elasticsearch.client.support.AbstractClient.get(AbstractClient.java:172)
    at org.elasticsearch.client.transport.TransportClient.get(TransportClient.java:375)
    at clojurewerkz.elastisch.native$get.invoke(native.clj:63)
    at clojurewerkz.elastisch.native.document$get.invoke(document.clj:136)
    at clojurewerkz.elastisch.native.document$present_QMARK_.invoke(document.clj:164)
    at clojure.core$partial$fn__4328.invoke(core.clj:2503)
    at io.cyanite.es_path$es_native$reify__5158$fn__5302$state_machine__4698__auto____5303$fn__5305.invoke(es_path.clj:219)
    at io.cyanite.es_path$es_native$reify__5158$fn__5302$state_machine__4698__auto____5303.invoke(es_path.clj:217)
    at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940)
    at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944)
    at clojure.core.async.impl.ioc_macros$take_BANG_$fn__4714.invoke(ioc_macros.clj:953)
    at clojure.core.async.impl.channels.ManyToManyChannel$fn__1714.invoke(channels.clj:102)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Do you have an idea of what is going wrong? Is the configuration correct? With the previous versions this worked fine.

Thank you very much!
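One common cause of `NoNodeAvailableException` with the native transport client is a `cluster_name` mismatch between the client configuration and the Elasticsearch node. A quick check, assuming Elasticsearch's HTTP API is listening on the default port 9200:

```shell
# Ask the node for its cluster name over the HTTP API (port 9200),
# then compare it with the cluster_name in cyanite.yaml.
curl -s http://localhost:9200/ | grep cluster_name
# The transport client (port 9300) will refuse to join a node whose
# cluster name differs from the one it was configured with.
```

If the names differ, updating `cluster_name` in the `index` section of cyanite.yaml to match the node's reported value is the first thing to try.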
