sourcegraph / appdash

Application tracing system for Go, based on Google's Dapper.

Home Page: https://sourcegraph.com

License: Other

Go 75.81% Python 7.87% HTML 16.32%

appdash's Introduction

appdash (view on Sourcegraph)

Appdash is an application tracing system for Go, based on Google's Dapper and Twitter's Zipkin.

Appdash allows you to trace the end-to-end handling of requests and operations in your application (for perf and debugging). It displays timings and application-specific metadata for each step, and it displays a tree and timeline for each request and its children.

To use appdash, you must instrument your application with calls to an appdash recorder. You can record any type of event or operation. Recorders and schemas for HTTP (client and server) and SQL are provided, and you can write your own.

Usage

To install appdash, run:

go get -u sourcegraph.com/sourcegraph/appdash/cmd/...

A standalone example using Negroni and Gorilla packages is available in the examples/cmd/webapp folder.

A demo / pure net/http application (which is slightly more verbose) is also available at cmd/appdash/example_app.go, and it can be run easily using appdash demo on the command line.

Community

Questions or comments? Join us on #sourcegraph in the Gophers slack!

Development

Appdash uses vfsgen to package HTML templates with the appdash binary for distribution. This means that if you want to modify the template data in traceapp/tmpl, you can first build using the dev build tag, which makes the template data reload live from disk.

After you're finished making changes to the templates, always run go generate sourcegraph.com/sourcegraph/appdash/traceapp/tmpl so that the data_vfsdata.go file is updated for normal Appdash users who aren't interested in modifying the template data.

Components

Appdash follows the design and naming conventions of Google's Dapper. You should read that paper if you are curious about why certain architectural choices were made.

There are 4 main components/concepts in appdash:

Span: A span refers to an operation and all of its children. For example, an HTTP handler handles a request by calling other components in your system, which in turn make various API and DB calls. The HTTP handler's span includes all downstream operations and their descendants; likewise, each downstream operation is its own span and has its own descendants. In this way, appdash constructs a tree of all of the operations that occur during the handling of the HTTP request.
Event: Your application records the various operations it performs (in the course of handling a request) as Events. Events can be arbitrary messages or metadata, or they can be structured event types defined by a Go type (such as an HTTP ServerEvent or an SQLEvent).
Recorder: Your application uses a Recorder to send events to a Collector (see below). Each Recorder is associated with a particular span in the tree of operations that are handling a particular request, and all events sent via a Recorder are automatically associated with that context.
Collector: A Collector receives Annotations (which are the encoded form of Events) sent by a Recorder. Typically, your application's Recorder talks to a local Collector (created with NewRemoteCollector). This local Collector forwards data to a remote appdash server (created with NewServer) that combines traces from all of the services that compose your application. The appdash server in turn runs a Collector that listens on the network for this data, and it then stores what it receives.
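To make the relationship between these four concepts concrete, here is a minimal, self-contained sketch. It does not use the appdash package: every type and method below (ID, SpanID, Annotation, memoryCollector, Recorder.Child, Recorder.Msg) is a simplified stand-in written for illustration, and the real API may differ in names and signatures.

```go
package main

import "fmt"

// ID is an illustrative identifier type.
type ID uint64

// SpanID locates one span in a trace using a Dapper-style triple.
type SpanID struct {
	Trace  ID // shared by every span that belongs to the same request
	Span   ID // unique to this operation
	Parent ID // zero for the root span
}

// Annotation is the encoded key/value form of an event.
type Annotation struct {
	Key, Value string
}

// Collector receives annotations for a span.
type Collector interface {
	Collect(id SpanID, anns ...Annotation) error
}

// memoryCollector stores everything it receives, keyed by span.
type memoryCollector struct {
	bySpan map[SpanID][]Annotation
}

func (c *memoryCollector) Collect(id SpanID, anns ...Annotation) error {
	c.bySpan[id] = append(c.bySpan[id], anns...)
	return nil
}

// Recorder is bound to one span and forwards events to a Collector.
type Recorder struct {
	ID SpanID
	C  Collector
}

// Child returns a Recorder for a new sub-span of r's span.
func (r *Recorder) Child(span ID) *Recorder {
	return &Recorder{ID: SpanID{Trace: r.ID.Trace, Span: span, Parent: r.ID.Span}, C: r.C}
}

// Msg records a free-form message event on r's span.
func (r *Recorder) Msg(msg string) error {
	return r.C.Collect(r.ID, Annotation{Key: "Msg", Value: msg})
}

func main() {
	c := &memoryCollector{bySpan: map[SpanID][]Annotation{}}
	root := &Recorder{ID: SpanID{Trace: 1, Span: 1}, C: c}
	db := root.Child(2) // e.g. a DB call made while handling the request
	root.Msg("handling request")
	db.Msg("running query")
	fmt.Println(len(c.bySpan), db.ID.Parent == root.ID.Span)
}
```

Because every child carries its parent's span ID, a store can later rebuild the tree of operations from the flat set of collected annotations.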

Language Support

Appdash has clients available for Go, Python (see python/ subdir) and Ruby (see https://github.com/bsm/appdash-rb).

OpenTracing Support

Appdash supports the OpenTracing API. Please see the opentracing subdir for the Go implementation, or see the GoDoc for API documentation.

Acknowledgments

appdash was influenced by, and uses code from, Coda Hale's lunk.


appdash's Issues

(long term) consider making OpenTracing API the primary interface

I've yet to try it out for real, but if the OpenTracing API turns out to be nice in real applications, we could in theory make Appdash just operate as if it was just an implementation of the OpenTracing API. I.e. there would not be an Appdash API, it would just be that Appdash is a tracer designed specifically for OpenTracing.

I've marked this issue as long-term, because this would be a serious change and would probably be done in multiple phases (unexporting the current API, simplification of the internals w.r.t marshaling events, etc). I'm also not 100% convinced yet that this would be most ideal or appropriate, but it is something I am considering longer term.

`appdash demo` sub-spans _appear_ to have duplicate/double metadata

EDIT: Updated (Mar 7) with recent findings.

Running appdash demo or examples/cmd/webapp and clicking on a sub-span from the trace brings you to the page for that sub-span. Strangely, the metadata appears to contain duplicate entries:

I tracked down why and how this occurs, and in fact it's not duplicate data at all (it's data from the client sending the request and the server responding to it) -- it's mostly working-as-intended:

Client sends a request to /endpoint with a Span-ID header (correct).
Server handling /endpoint checks for the Span-ID header and records directly to it (correct).
Both sets of data end up in the same span ID (also correct).

The issue is that the keys do not make a distinction between Client versus Server -- which is confusing when looking at the data. Consider trying to track down a bug in which the Go HTTP client for some unknown reason always reports that the server responded with a 404 status code:

Key	Value
`Response.StatusCode`	`200`
`Response.StatusCode`	`404`

It's not clear to the reader whether the server responded with 404 or the client received a 404. A better presentation would be:

Key	Value
`Server.Response.StatusCode`	`200`
`Client.Response.StatusCode`	`404`

Aha! Now we know the server responded with 200, but the client for some reason got a 404.

I've put the Verbose Data View into a gist for prying eyes (see here).

TL;DR: We should probably prefix client vs. server annotation keys to make that distinction clear.
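A minimal sketch of that prefixing idea follows. The Annotation type and the withSide helper are hypothetical and exist only for this illustration; they are not part of the appdash API.

```go
package main

import "fmt"

// Annotation is a simplified key/value pair; the field names are illustrative.
type Annotation struct {
	Key, Value string
}

// withSide returns a copy of anns with every key prefixed by side
// (for example "Client" or "Server"), leaving the input untouched.
func withSide(side string, anns []Annotation) []Annotation {
	out := make([]Annotation, len(anns))
	for i, a := range anns {
		out[i] = Annotation{Key: side + "." + a.Key, Value: a.Value}
	}
	return out
}

func main() {
	server := withSide("Server", []Annotation{{Key: "Response.StatusCode", Value: "200"}})
	client := withSide("Client", []Annotation{{Key: "Response.StatusCode", Value: "404"}})
	for _, a := range append(server, client...) {
		fmt.Printf("%s\t%s\n", a.Key, a.Value)
	}
}
```

With keys rewritten this way, both sets of data can still live in the same span while remaining distinguishable in the table.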

make releases and container available

I want to try out appdash but encountered a bug in influxdb influxdata/influxdb#6445

It would be great if there were release binaries available and a container up on quay.io for me to use as well.

Thanks for the great looking project, looking forward to trying it out.

Can appdash trace Go gRPC requests?

I use gRPC in my microservices. Can appdash be used to trace gRPC requests?

InfluxDBStore: incorrect span hierarchy

If my assumptions are correct that both webapp and webapp-influxdb do the same thing and should generate a basically identical trace structure, I think the hierarchy that InfluxDBStore reassembles traces/spans in is incorrect.

Steps to reproduce:

Run webapp and visit http://localhost:8699, click on the trace, click on Verbose Data View tab.
Kill webapp, run webapp-influxdb.
In a new tab, visit http://localhost:8699, click the trace, go to Verbose Data View tab.
Switch back and forth between the tabs and compare the data displayed.

What happens?

webapp-influxdb displays Client.Foo fields when viewing the root of the trace (at e.g. http://localhost:8700/traces/419d201ab0ce01d4)

What should happen?

It should in reality only display those Client.Foo fields if you have clicked to view a sub-span of the trace, e.g. by clicking localhost:8699/endpoint (290ms) / on the http://localhost:8700/traces/<trace-id>/<sub-span-id> page.

Notes:

This is not a bug in the web UI / front-end code. The frontend displays the trace it is given correctly, but that trace's root vs. sub-span hierarchy is incorrect.

Make httptrace.responseInfoRecorder compatible with http.Hijacker

The httptrace server middleware writes to the HTTP response at

appdash/httptrace/server.go

Line 85 in 9dd479d

SetSpanIDHeader(rr.Header(), *spanID)

even if it has been hijacked (with http.Hijacker, for WebSockets for example). This has undefined behavior. It should detect if it has been hijacked and not write to it. Same for Flush.
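One way to approach this is sketched below, as an assumption rather than the project's actual fix. The wrapper type hijackAwareWriter and its setHeaderUnlessHijacked method are hypothetical names; the sketch only shows the idea of remembering that Hijack succeeded and refusing to touch the response afterwards.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// hijackAwareWriter wraps an http.ResponseWriter and remembers whether the
// connection has been hijacked.
type hijackAwareWriter struct {
	http.ResponseWriter
	hijacked bool
}

// Hijack forwards to the underlying writer when it supports http.Hijacker.
func (w *hijackAwareWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
	h, ok := w.ResponseWriter.(http.Hijacker)
	if !ok {
		return nil, nil, errors.New("underlying ResponseWriter is not an http.Hijacker")
	}
	conn, rw, err := h.Hijack()
	if err == nil {
		w.hijacked = true
	}
	return conn, rw, err
}

// Flush forwards to the underlying writer unless the connection was hijacked.
func (w *hijackAwareWriter) Flush() {
	if w.hijacked {
		return
	}
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// setHeaderUnlessHijacked sets a response header only while net/http still
// owns the response, and reports whether it did so.
func (w *hijackAwareWriter) setHeaderUnlessHijacked(key, value string) bool {
	if w.hijacked {
		return false
	}
	w.Header().Set(key, value)
	return true
}

// newDemoWriter wraps an in-memory recorder so the example needs no server.
func newDemoWriter() *hijackAwareWriter {
	return &hijackAwareWriter{ResponseWriter: httptest.NewRecorder()}
}

func main() {
	w := newDemoWriter()
	fmt.Println(w.setHeaderUnlessHijacked("Span-ID", "example"))
	w.hijacked = true // simulate a handler that took over the connection
	fmt.Println(w.setHeaderUnlessHijacked("Span-ID", "example"))
}
```

The in-memory recorder does not implement http.Hijacker, so main flips the flag directly; with a real server connection the flag would be set by a successful Hijack call.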

InfluxDBStore: point batching during Collect is suboptimal

Issue

Ideally, InfluxDBStore.Collect is as fast as reasonably possible. With examples/cmd/webapp-influxdb, I've noticed Collect times in the range of 60-200ms (just eyeballing it, I could be off by a bit).

If Collect cannot complete in under 50ms we lose trace data, because we cannot have trace data build up in memory forever (memory leak), nor can it block pending HTTP requests. 50ms for Collect is, ideally, an upper time bound (hopefully most Collect are much quicker).

To measure this, I've added some hacky timing debug information to influxdb_store.go and changed the webapp-influxdb command, you can try my test branch issue131 for example, or see my changes here: https://github.com/sourcegraph/appdash/compare/issue131 (note: this branch is just PR #127 and #131 merged, then f552611 applied on top).

Reproducing

Run the example app cleanly:

go install ./examples/cmd/webapp-influxdb/ && rm -rf ~/.influxdb/ && webapp-influxdb

Then, using the vegeta HTTP profiling tool, perform 1 HTTP request/sec for 8s:

echo "GET http://localhost:8699/" | vegeta attack -duration=8s -rate=1 | vegeta report

You should observe some logs that look like:

InfluxDBStore.Collect -> in.con.Write took 364.948424ms
InfluxDBStore.Collect -> took 367.329577ms
appdash: 2016/04/05 10:55:54 ChunkedCollector: queue entirely dropped (trace data will be missing)
appdash: 2016/04/05 10:55:54 ChunkedCollector: queueSize:3 queueSizeBytes:2133

...

InfluxDBStore.Collect -> in.con.Write took 65.655258ms
InfluxDBStore.Collect -> took 68.063186ms
appdash: 2016/04/05 10:56:00 ChunkedCollector: queue entirely dropped (trace data will be missing)
appdash: 2016/04/05 10:56:00 ChunkedCollector: queueSize:3 queueSizeBytes:3440

Possible solution

Note that in.con.Write takes most of the time spent during Collect, i.e. the Collect function itself is not very expensive, but writing to InfluxDB via in.con.Write is!

I think this is because Collect is inherently a very small operation, at most it will be writing a single InfluxDB data point. Consider our code:

    // A single point represents one span.
    pts := []influxDBClient.Point{*p}
    bps := influxDBClient.BatchPoints{
        Points:   pts,
        Database: in.dbName,
    }
    _, writeErr := in.con.Write(bps)
    if writeErr != nil {
        return writeErr
    }
    return nil

We only write a single point to InfluxDB, and this becomes very expensive because InfluxDB does not handle small writes efficiently and adds a large overhead (around 50-200ms) to each one. However, in my tests InfluxDB can write a very large batch of points (500+) in almost the same amount of time (50-200ms).

I think the solution here is to make InfluxDBStore.Collect append to an internal slice, such that it queues up an entire batch of points, and then after some period of time writes them to InfluxDB in a background goroutine. Important aspects would be:

Allow flush time, maximum queue size, etc to be configurable.
Ensure that the queue size cannot grow forever, causing memory leaks.
For inspiration, see ChunkedCollector here
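The batching idea above could look roughly like the following sketch. It is not the InfluxDBStore implementation: Point is a placeholder type, and batcher, maxQueue, and the write callback are hypothetical names chosen for this example. The write callback stands in for a single batched write to InfluxDB.

```go
package main

import (
	"fmt"
	"sync"
)

// Point is a placeholder for one InfluxDB data point.
type Point struct{ Name string }

// batcher queues points in memory and writes them in batches.
type batcher struct {
	mu       sync.Mutex
	queue    []Point
	maxQueue int                 // hard cap so memory use stays bounded
	dropped  int                 // points discarded because the queue was full
	write    func([]Point) error // hypothetical batched write to the database
}

// Collect only appends to the in-memory queue, so it returns quickly.
func (b *batcher) Collect(p Point) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.queue) >= b.maxQueue {
		b.dropped++
		return
	}
	b.queue = append(b.queue, p)
}

// Flush writes everything queued so far as a single batch.
func (b *batcher) Flush() error {
	b.mu.Lock()
	pts := b.queue
	b.queue = nil
	b.mu.Unlock()
	if len(pts) == 0 {
		return nil
	}
	return b.write(pts)
}

func main() {
	var writes, written int
	b := &batcher{maxQueue: 500, write: func(pts []Point) error {
		writes++
		written += len(pts)
		return nil
	}}
	for i := 0; i < 600; i++ {
		b.Collect(Point{Name: fmt.Sprintf("span-%d", i)})
	}
	if err := b.Flush(); err != nil {
		fmt.Println("flush failed:", err)
	}
	fmt.Println(writes, written, b.dropped)
}
```

In a real store, a background goroutine driven by a time.Ticker would call Flush at the configured interval, and both the interval and maxQueue would be exposed as options.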

InfluxDBStore: "retention policy conflicts with an existing policy"

Track issue reported by @chris-ramon, with a fix already sent upstream at influxdata/influxdb#6202 -- close once merged.

'appdash serve': should use InfluxDBStore instead of MemoryStore by default

GET /dashboard/data?start=0&end=72: error: handler panic

runtime error: invalid memory address or nil pointer dereference

goroutine 22 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
C:/Go/src/runtime/debug/stack.go:24 +0x87
sourcegraph.com/sourcegraph/appdash/traceapp.handlerFunc.ServeHTTP.func1(0x16a6ff0, 0xc08219f860, 0xc0824d89a0)
E:/RDLL/Golang/src/sourcegraph.com/sourcegraph/appdash/traceapp/handler.go:18 +0x5a
panic(0xa95e60, 0xc082008060)
C:/Go/src/runtime/panic.go:426 +0x4f7
sourcegraph.com/sourcegraph/appdash/traceapp.(*App).serveDashboardData(0xc08200fa90, 0x16a7070, 0xc082194000, 0xc0824d89a0, 0x0, 0x0)
E:/RDLL/Golang/src/sourcegraph.com/sourcegraph/appdash/traceapp/dashboard.go:60 +0x295
sourcegraph.com/sourcegraph/appdash/traceapp.(*App).(sourcegraph.com/sourcegraph/appdash/traceapp.serveDashboardData)-fm(0x16a7070, 0xc082194000, 0xc0824d89a0, 0x0, 0x0)
E:/RDLL/Golang/src/sourcegraph.com/sourcegraph/appdash/traceapp/app.go:65 +0x53
sourcegraph.com/sourcegraph/appdash/traceapp.handlerFunc.ServeHTTP(0xc08215a640, 0x16a6ff0, 0xc08219f860, 0xc0824d89a0)
E:/RDLL/Golang/src/sourcegraph.com/sourcegraph/appdash/traceapp/handler.go:22 +0xbf
github.com/gorilla/mux.(*Router).ServeHTTP(0xc08200f220, 0x16a6ff0, 0xc08219f860, 0xc0824d89a0)
E:/RDLL/Golang/src/github.com/gorilla/mux/mux.go:103 +0x277
sourcegraph.com/sourcegraph/appdash/traceapp.(*App).ServeHTTP(0xc08200fa90, 0x16a6ff0, 0xc08219f860, 0xc0824d89a0)
E:/RDLL/Golang/src/sourcegraph.com/sourcegraph/appdash/traceapp/app.go:76 +0x4c
net/http.serverHandler.ServeHTTP(0xc0820b3080, 0x16a6ff0, 0xc08219f860, 0xc0824d89a0)
C:/Go/src/net/http/server.go:2081 +0x1a5
net/http.(*conn).serve(0xc0821ac980)
C:/Go/src/net/http/server.go:1472 +0xf35
created by net/http.(*Server).Serve
C:/Go/src/net/http/server.go:2137 +0x455

Golang 1.6
Windows 10
PowerShell

InfluxDBStore: Aggregate method needs extensive tests

(inclusive of continuous queries)

Upgrade TravisCI to use Go 1.4 and Go 1.5

CI currently runs with just 1.3 and 1.4; we can drop 1.3 now that it's older.

Custom events cannot render in Appdash (cannot call RegisterEvent)

When running appdash on the command line, custom events don't render. This is because nobody in the entire application can invoke RegisterEvent with the new event type.

I have a few ideas on how we can fix this.

As a fast-and-easy hack for now: you can copy the source for cmd/appdash (or just modify it directly) such that a call to RegisterEvent for your type is made.

Reported by @joeshaw over Slack

Here is what running cmd/appdash should look like:

And here is what it ends up looking like (because the custom event type is not registered):

Do not load template data from disk by default.

Reported on Slack by @rhastamasta who had trouble running Appdash.

Currently we load the template data from disk by default, but it doesn't seem to work in all cases, as he was getting:

template [root.html layout.html]: Asset root.html can't read by error: Error reading asset root.html at ../go/src/sourcegraph.com/sourcegraph/appdash/traceapp/tmpl/root.html: open ../go/src/sourcegraph.com/sourcegraph/appdash/traceapp/tmpl/root.html: no such file or directory

MacBook-Pro:src adu$ pwd
/Users/adu/Workspace/go/src
MacBook-Pro:src adu$ l
total 0
0 drwxr-xr-x   6 adu  1955793164   204B Apr  3 10:47 .
0 drwxr-xr-x   5 adu  1955793164   170B Mar 11 14:20 ..
0 drwxr-xr-x   3 adu  1955793164   102B Mar 11 14:20 bitbucket.org
0 drwxr-xr-x   3 adu  1955793164   102B Mar 11 14:20 code.google.com
0 drwxr-xr-x  21 adu  1955793164   714B Apr  6 10:09 github.com
0 drwxr-xr-x   3 adu  1955793164   102B Apr  3 10:47 sourcegraph.com

The solution in this case is to determine why Appdash doesn't load assets relative to the proper $GOPATH variable (/Users/adu/Workspace/go above) or simply require go generate to be run during development / modification of template files.

[Ideas] User vs machine generated traces

Observation: A lot of traces we have in Sourcegraph originate from machine actions (bots) rather than user actions. Machine traces are less valuable than traces of direct user actions (i.e. a page load/XHR request trace is usually more valuable than a trace of a worker asking for a job). Machine traces are also higher volume. We get into situations of appdash service degradation or traces expiring too soon. Dashboards / lists of traces can be overpopulated with traces originating from machines.

What can we do so we get a better experience when investigating or discovering traces from user actions?

Replace AggregateStore with InfluxDBStore

What is AggregateStore?

AggregateStore is the most complex of the Appdash storage backends. Unlike MemoryStore which just collects traces, AggregateStore both collects traces and aggregates some data to provide some useful stats within the Appdash Dashboard page (like slowest average trace, etc).

Why we should use InfluxDBStore instead

It became clear to me after @bg451's comment that I have not done a great job conveying the overall direction or problems of storage backends in Appdash. The major issues seen today are:

MemoryStore is very simple / lightweight, but doesn't support the Dashboard page (no aggregated data / metrics about traces). It's good, for example, in testing or within a lightweight CLI application.
AggregateStore has a number of serious problems:
- After about ~130k traces for a given name (e.g. HTTP route), it becomes so slow that it can no longer store traces at all. In fact, this caused a few serious memory leaks to crop up in production for us, albeit in unrelated code.
- It is extremely complex: for what is, at a high level, a simple operation, the implementation is extremely convoluted. But don't take my word for it, just check out the high level overview.
- Implementing more features, like more complex or exact queries, would be nearly impossible to manage given the code complexity.
- It is so slow that the Dashboard "time range selection" bar appears broken in real applications (load time is so bad).

In contrast, InfluxDBStore:

Has the potential to be significantly more performant than AggregateStore.
Like AggregateStore, it can be embedded within your Go process entirely (no external InfluxDB setup is required); this is the default setup.
Can in the future support connecting to an externally hosted InfluxDB server (enabling clusters, etc).
Is a real time-series database, supporting complex queries in an SQL-like language. This will let us make the Dashboard even cooler and answer more important questions about your application's performance in the future.

Conclusion

Due to the above reasons, and after much hard thought, I can only come to the conclusion that AggregateStore would slow us down by making the codebase more complex, would mislead new users into using it and thinking Appdash isn't for real-world work, etc.

The intent is to bring this project forward for all Appdash users, and make app tracing better than ever before. I don't take the decision to remove existing code in an incompatible way lightly, but do find this to be the best path forward.

Potentially flaky CI test (TestCancelRequest)

From CI logs:

--- FAIL: TestCancelRequest (0.02s)
    client_test.go:142: got &url.Error{Op:"Get", URL:"http://example.com/foo", Err:(*http.httpError)(0xc82000b860)}, want Get http://example.com/foo: net/http: request canceled while waiting for connection

If it pops up again we can investigate.

Use from other languages? (Scala, Python, Ruby, etc)

Would it be possible to implement a tracing client in other languages? e.g. Could I use this in a Scala/Finatra app? What would it take to implement that sort of client?

Usage from a Martini app

Currently it's very tricky to utilize Appdash from a Martini application because httptrace primarily exposes a Negroni HTTP middleware which Martini doesn't seem to support at all.

We need to determine: what is the best way to use a Negroni HTTP middleware from within a Martini app? I imagine many others will run into this question in the future.

A hacky workaround for now: https://gist.github.com/slimsag/a7e1de60844656ec6a65

Consider exposing OpenTracing-compatible gRPC API

I haven't run very far with this idea, so this is just a thought dump for now. But I've been considering how Appdash could reach the most languages. It stands to reason that we would gather support for a large number of languages by exposing a gRPC service which would literally be the OpenTracing API itself.

Then, from a user's perspective, they could either directly use this gRPC client from their language of choice OR we could even implement opentracing-python, opentracing-java, etc by simply calling out to this gRPC service.

This is interesting, because it could give many other tracers that do not wish to spend a significant amount of time implementing tracing clients in various languages automatic support if they were to expose the same OpenTracing-compatible gRPC service. This could also be done over, e.g. HTTP or others, I just chose gRPC because I am most familiar with it.

Connection Pooling/http.Client usage

Perhaps I'm misunderstanding the code, or for some other reason it's not a big deal, but it seems to me that the model being used to collect traces and keep track of spans means that connection pooling with persistent connections would not be easy to do.

I'm looking at the example application at https://github.com/sourcegraph/appdash/blob/master/examples/cmd/webapp/main.go#L103

httpClient := &http.Client{
    Transport: &httptrace.Transport{
        Recorder: appdash.NewRecorder(span, collector),
        SetName:  true,
    },
}

and a new client is being created each time the handler is called, and I don't see any way around this because the span is tightly tied in with the transport.
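One general Go pattern that may help here is to keep a single long-lived base transport, which owns the connection pool, and wrap it in a cheap per-span RoundTripper. The sketch below illustrates only that pattern: tracingTransport, clientForSpan, roundTripperFunc, and stubBase are hypothetical names, and whether httptrace.Transport can be layered this way is not confirmed by this thread.

```go
package main

import (
	"fmt"
	"net/http"
)

// tracingTransport is a per-request wrapper: it carries span-specific state
// but delegates the actual I/O (and connection pooling) to a shared base.
type tracingTransport struct {
	spanID string
	base   http.RoundTripper
}

func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// RoundTrippers must not modify the caller's request, so work on a clone.
	req = req.Clone(req.Context())
	req.Header.Set("Span-ID", t.spanID)
	return t.base.RoundTrip(req)
}

// clientForSpan builds a lightweight http.Client on top of the shared base.
func clientForSpan(base http.RoundTripper, spanID string) *http.Client {
	return &http.Client{Transport: &tracingTransport{spanID: spanID, base: base}}
}

// roundTripperFunc adapts a function to http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubBase records the Span-ID header of each request instead of using the
// network, which keeps this example self-contained.
func stubBase(seen *[]string) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		*seen = append(*seen, r.Header.Get("Span-ID"))
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
}

func main() {
	// In a real program base would be one long-lived *http.Transport
	// (or http.DefaultTransport) shared by every handler invocation.
	var seen []string
	base := stubBase(&seen)
	for _, id := range []string{"span-1", "span-2"} {
		resp, err := clientForSpan(base, id).Get("http://example.invalid/endpoint")
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		resp.Body.Close()
	}
	fmt.Println(seen)
}
```

Creating the wrapper and the http.Client per request is inexpensive; idle connections live in the base transport, so they survive across handler calls as long as the base is reused.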

Traces page order changes upon refresh

I noticed when visiting the /traces page that the order of traces changes randomly upon refreshing the page. It would be nice if it was consistent upon multiple reloads, so long as the data doesn't change.

This is probably due to a map being used somewhere within the template (maps have no specific iteration order). Sorting would be a good option to fix this.
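For illustration, the usual fix for nondeterministic map iteration in Go is to copy the values into a slice and sort it before rendering. The Trace type and sortedTraces helper below are hypothetical and only demonstrate that technique.

```go
package main

import (
	"fmt"
	"sort"
)

// Trace is a simplified stand-in for a stored trace.
type Trace struct {
	ID   uint64
	Name string
}

// sortedTraces flattens a map into a slice ordered by ID, so repeated calls
// with the same data always yield the same order.
func sortedTraces(byID map[uint64]Trace) []Trace {
	out := make([]Trace, 0, len(byID))
	for _, t := range byID {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	traces := map[uint64]Trace{
		3: {ID: 3, Name: "c"},
		1: {ID: 1, Name: "a"},
		2: {ID: 2, Name: "b"},
	}
	for _, t := range sortedTraces(traces) {
		fmt.Println(t.ID, t.Name)
	}
}
```

Sorting by a timestamp instead of by ID would have the same shape; only the comparison function changes.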

Traces page ordered randomly

The traces on the traces page are ordered randomly and change their order after refreshes. This was fixed once (issue #47) but it's broken again. If I had to guess, I'd start by looking at 1ddd075 which removed the call to sort.Sort.

Related, I'm not sure if sorting by ID is the best approach, or if it should be sorted by date instead?

InfluxDBStore: use a CQ to make the Dashboard faster

Use a continuous query to downsample our data and inherently make the Dashboard much much faster.

Slider in dashboard works the wrong way around

When the slider reads "X-Y hours ago", the query that is sent to the server is really "(72-X)-(72-Y)" hours ago. So for example, when the slider is set to "0-2 hours ago" the data that is reported is from 72 to 70 hours ago.
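The intended mapping can be written down as a small worked example. The function rangeFromHoursAgo and the fixed demoNow timestamp are assumptions made for this sketch; the actual dashboard query parameters are not shown in this report.

```go
package main

import (
	"fmt"
	"time"
)

// rangeFromHoursAgo converts a slider selection such as "0-2 hours ago" into
// an absolute [start, end] window relative to now. nearHours is the bound
// closest to the present, farHours the bound furthest in the past.
func rangeFromHoursAgo(now time.Time, nearHours, farHours int) (start, end time.Time) {
	start = now.Add(-time.Duration(farHours) * time.Hour)
	end = now.Add(-time.Duration(nearHours) * time.Hour)
	return start, end
}

// demoNow returns a fixed timestamp so the example is deterministic.
func demoNow() time.Time {
	return time.Date(2016, time.April, 5, 12, 0, 0, 0, time.UTC)
}

func main() {
	start, end := rangeFromHoursAgo(demoNow(), 0, 2)
	fmt.Println(start.Format(time.RFC3339), end.Format(time.RFC3339))
}
```

With this mapping, "0-2 hours ago" covers the most recent two hours, whereas the reported behavior corresponds to subtracting each bound from 72 first.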

Adhere to gokit.io tracing spec

Depending on how it evolves, it'd be great to adhere to the gokit tracing spec. Looks like appdash hits all of the points so far except Zipkin compat, which should be doable.

This issue can remain open as a tracking issue.

Libraries shouldn't vendor code or leak vendored types

The top-level appdash package is vendoring a bunch of libraries, including InfluxDB. Since the appdash package is a library, and not a command, it shouldn't vendor libraries. Vendoring is the sole responsibility of the project owner. For example, it would be fine for appdash/cmd/appdash to vendor the libraries it needs.

The problem with libraries that vendor code is that one dependency might be introduced with different versions to the final binary. This results in two possible symptoms.

A library might affect global state. A DB driver, for example, might register itself with database/sql. If two versions of the same driver do this, it causes problems.
The package might be leaking vendored types. In the case of appdash, that's for example the influxDBServer.Config type. The type vendored in appdash will be different from the type I vendored myself in my project.

Now, it's of course possible for me to pull your vendored libs into my vendor directory, and delete yours. However that means that I'm not just tracking your repository anymore, but that I am maintaining a patched version of it. This creates a maintenance burden for me.

build error

sourcegraph.com/sourcegraph/appdash/cmd/appdash $ go build

github.com/influxdata/influxdb/tsdb/engine/tsm1

../../../../../github.com/influxdata/influxdb/tsdb/engine/tsm1/int.go:260: cannot use d.values[:](type []uint64) as type *[240]uint64 in argument to simple8b.Decode
../../../../../github.com/influxdata/influxdb/tsdb/engine/tsm1/timestamp.go:204: cannot use simple8b.NewDecoder(nil) (type *simple8b.Decoder) as type simple8b.Decoder in field value

InfluxDBStore: use InfluxDB client v2

We are not currently using v2 of the client API.

examples/cmd/webapp-influxdb: demonstrates 1d retention policy but Dashboard needs 72hr

The example demonstrates a 1d retention policy, but the Dashboard needs 72hr (as in, it has a hard-coded default of a 72hr timeline / GUI widgets).

Go 1.5 issues with package internal

In Go 1.5, the internal/ package rule is enforced. When trying to build appdash, the build errors with a message:

package sourcegraph.com/sourcegraph/appdash/cmd/appdash
    imports github.com/cznic/mathutil
    imports github.com/elazarl/go-bindata-assetfs
    imports github.com/gogo/protobuf/io
    imports github.com/gogo/protobuf/proto
    imports github.com/gorilla/context
    imports github.com/gorilla/mux
    imports github.com/jessevdk/go-flags
    imports sourcegraph.com/sourcegraph/appdash
    imports sourcegraph.com/sourcegraph/appdash/internal/wire
    imports sourcegraph.com/sourcegraph/appdash/internal/wire
    imports sourcegraph.com/sourcegraph/appdash/internal/wire: use of internal package not allowed

I think this means you'd have to move the cmd/appdash somewhere to be a root of internal. Perhaps the easiest thing is to move ./internal to ./cmd/appdash/internal although it's not very elegant.

Impossible to view a whole trace's graph at once

When looking at a trace (/traces/<id>), the graph only shows two spans (the root, and one child). Clicking on the child loads a new page that shows the child and its child. There is no way to view the entire trace's graph at once. Screenshot with example attached:

InfluxDBStore: multi-depth traces not reassembled due to error "maximum number of retries"

Steps to reproduce:

Unpack the provided .influxdb directory (download influxdb.tar.gz) into user home directory. This DB was generated by starting a Sourcegraph server instance and visiting the homepage of the app, i.e. it contains legitimate trace data (which works with MemoryStore etc).
Start webapp-influxdb and visit http://localhost:8700/traces

What is seen?

Error message maximum number of retries from InfluxDBStore.

What is expected?

No error / proper traces.

Notes:

findTraceParent returns nil: https://github.com/sourcegraph/appdash/blob/master/influxdb_store.go#L628
It returns nil because root.Sub does not yet exist! root.Sub does not yet exist because that is what addChildren is trying to solve :) (a contradiction)
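One way to avoid this kind of circular dependency is to link spans in two passes: first index every span by its ID, then attach each span to its parent through that index. The sketch below is an illustration of that general technique, not the code in influxdb_store.go; Span and buildTree are simplified, hypothetical names.

```go
package main

import "fmt"

// Span is a simplified flat record as it might come back from storage.
type Span struct {
	ID, Parent uint64 // Parent is 0 for the root span
	Sub        []*Span
}

// buildTree links flat spans into a tree in two passes. The parent lookup in
// the second pass uses only the ID index, so it never depends on children
// having been attached already.
func buildTree(flat []*Span) (root *Span, orphans []*Span) {
	byID := make(map[uint64]*Span, len(flat))
	for _, s := range flat {
		byID[s.ID] = s
	}
	for _, s := range flat {
		if s.Parent == 0 {
			root = s
			continue
		}
		if p, ok := byID[s.Parent]; ok {
			p.Sub = append(p.Sub, s)
		} else {
			orphans = append(orphans, s) // parent missing from the result set
		}
	}
	return root, orphans
}

func main() {
	// Deliberately out of order: the grandchild appears before its parent.
	flat := []*Span{{ID: 3, Parent: 2}, {ID: 1}, {ID: 2, Parent: 1}}
	root, orphans := buildTree(flat)
	fmt.Println(root.ID, root.Sub[0].ID, root.Sub[0].Sub[0].ID, len(orphans))
}
```

Because the order of the input no longer matters, no retry loop is needed for traces of any depth.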

rename dashboard column "Timespans" to "Count"

It was not clear to me what timespans meant.

Rename apptrace to appdash

The apptrace project is being renamed to appdash, to avoid naming conflicts with other things out there.

CCing everyone who's posted an issue or contributed so far. Sorry for the abrupt change, but we figured it's better to do it quick and early instead of waiting.

The new import path is sourcegraph.com/sourcegraph/appdash.

I will close this issue when the rename is complete (in an hour or so).

/cc @ernesto-jimenez @thoward @slimsag @samertm @beyang @gbbr

Identify profiling spans with identical names.

If multiple spans have identical names, it's rather hard (impossible?) to tell which one you're looking at. For example, which one is localhost:8699?

Possible solutions are:

Add a Span ID column to the table.
When clicking on a table row, bring the user to the appropriate sub-span page.

Fuzzy search fails on keys containing dots.

I just tested to ensure fuzzy searching works with most HTTP-based metadata, and it looks like it's broken when keys contain periods e.g. Response.StatusCode. It seems to be because Fuse identifies those keys as accessors into sub-data. I.e. obj.Response.StatusCode in JS.

Probably need to replace periods in the keys with underscores, e.g. Response_StatusCode, so that Fuse doesn't misinterpret our intent.
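If the key rewrite were done on the Go side before the data is handed to the page, it could be as small as the sketch below. The searchKey helper is hypothetical, and doing the replacement server-side rather than in the JavaScript is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// searchKey rewrites an annotation key so that a search library treating "."
// as a path separator sees a single flat field name.
func searchKey(key string) string {
	return strings.ReplaceAll(key, ".", "_")
}

func main() {
	for _, k := range []string{"Response.StatusCode", "Request.URL.Path", "Msg"} {
		fmt.Println(k, "->", searchKey(k))
	}
}
```

One caveat: two distinct keys such as A.B and A_B would map to the same search key, so the original key should be kept alongside for display.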

httptrace middleware for normal HTTP handleFunc

Hi, if I'm not misreading anything, the httptrace package only supports Negroni? Is there a way to get a plain HTTP HandlerFunc directly, or do I have to wrap it myself?
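For reference, a Negroni-style middleware is a function that takes the next handler as a third argument, so adapting one to a plain net/http handler is a small wrapper. The sketch below uses only the standard library; middlewareFunc, wrap, and tagMiddleware are hypothetical names, and tagMiddleware merely stands in for a tracing middleware.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// middlewareFunc has the Negroni-style shape: it receives the next handler
// as a third argument.
type middlewareFunc func(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc)

// wrap turns a Negroni-style middleware plus a final handler into a plain
// http.Handler that can be passed to http.Handle or a router.
func wrap(mw middlewareFunc, final http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		mw(rw, r, final.ServeHTTP)
	})
}

// tagMiddleware is a stand-in for a tracing middleware; it only sets a header.
func tagMiddleware(rw http.ResponseWriter, r *http.Request, next http.HandlerFunc) {
	rw.Header().Set("X-Traced", "1")
	next(rw, r)
}

// helloHandler is the application handler being wrapped.
var helloHandler = http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
	fmt.Fprint(rw, "hello")
})

// serveOnce runs h against an in-memory request and returns the response.
func serveOnce(h http.Handler) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec
}

func main() {
	rec := serveOnce(wrap(tagMiddleware, helloHandler))
	fmt.Println(rec.Header().Get("X-Traced"), rec.Body.String())
}
```

The same wrapper works for any middleware with this three-argument signature, so no dependency on Negroni itself is needed at the call site.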

Thanks!

Documentation: Ruby client

Hi, not really an issue, but we have created a Ruby client and were wondering if you wanted to mention that in your documentation. Cheers, dim

`apptrace serve --sample-data` is broken

I'm trying to get a handle on what the timeline UI looks like when there are a lot of cascading spans and sub-spans; it seems that running:

apptrace serve --sample-data

does not work (the timeline doesn't show at all). Chrome's console outputs:

Uncaught TypeError: Cannot read property 'forEach' of null    d3-timeline.js:82
(anonymous function)                                                               d3-timeline.js:73
(anonymous function)                                                               d3.js:884
(anonymous function)                                                               d3.js:890
d3_selection_each                                                                   d3.js:883
d3_selectionPrototype.each                                                     d3-timeline.js:72
timeline                                                                                     d3.js:897
d3_selectionPrototype.call                                                       00b74083bb122981:119
timelineHover                                                                           00b74083bb122981:122 (anonymous function)

I will report back here with more info as I find it.

Add README and docs

Right now there are no details or examples about how apptrace works and how it is implemented.

It would be good to have some docs so people like me can check how useful the package would be for our use cases and what it would take to integrate it.

Add important event support to the OpenTracing impl

It would be nice to be able to add important annotations via the opentracing API.

I haven't spent much time thinking about it, but it would work by prefixing a tag's key, i.e. span.SetTag("impt:status", resp.StatusCode). This would add an important annotation with key="status".
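The prefix handling described above could be sketched as follows. The splitTagKey helper is hypothetical; it shows only the key parsing, not how the OpenTracing implementation would store the resulting annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// importantPrefix marks a tag whose annotation should be flagged as important.
const importantPrefix = "impt:"

// splitTagKey reports whether key carries the "impt:" prefix and returns the
// key with that prefix removed.
func splitTagKey(key string) (name string, important bool) {
	if strings.HasPrefix(key, importantPrefix) {
		return strings.TrimPrefix(key, importantPrefix), true
	}
	return key, false
}

func main() {
	for _, k := range []string{"impt:status", "status"} {
		name, important := splitTagKey(k)
		fmt.Printf("%q -> key=%q important=%v\n", k, name, important)
	}
}
```

A tag set with the key impt:status would therefore be recorded under the key status and marked important, while an unprefixed key passes through unchanged.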

Build instructions for dummies

Hi, there. I think it would be helpful to have a BUILD file (or similar) that gives folks who are new to Go a quick start on how to build this. For example: check that the version is at least X, watch out for Y (like package internal), etc.

LimitStore/RecentStore functionality in AggregateStore

It would be very useful if AggregateStore also provided functionality similar to LimitStore/RecentStore.

Investigate flaky CI test

Every once in a while Travis CI fails due to a flaky test; we should investigate why this is. It seems that most PRs fail due to this right off the bat, but requeuing the build seems to fix it.

Degrade gracefully when Flash isn't available.

This is a possible enhancement, non-critical and low priority.

Right now it seems there's no handling of the case when Flash is not available in the user's browser (either it's not installed, not available, or disabled).

Clicking "Copy as JSON" does nothing in such a scenario.

If Flash is not available, perhaps something else should happen. Disable copying as JSON? Display a message saying "Sorry, Flash is needed for this functionality"? Display a text box popup that the user can copy/paste from? Do something else?

As an example of one way to handle such a scenario, this is what GitHub looks like when you don't have Flash:

And this is how it appears when you do: