Implement OpenTelemetry integration - client/server and explicit token latency #296
ppalucki wants to merge 1 commit into opea-project:main
Conversation
Force-pushed from 9bf7090 to d15ac8c (Compare)
```go
// bag := baggage.FromContext(ctx)
// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
```
Hi @ppalucki, what's the purpose of this response?
@zhlsunshine I accidentally copied it from some working branch of streaming, later rebased - now it is removed.
```go
// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
ctx, cancel := context.WithTimeout(req.Context(), 3600*time.Minute)
```
@zhlsunshine fixed (the long timeout was used for my debugging sessions)
Hi @ppalucki, thanks for your effort on this PR! I think there should be a measurement after it. I think you also can support the measurement metrics in it.
```go
//mux.Handle("/dataprep", otelhttp.NewHandler(http.HandlerFunc(mcDataHandler), "mcDataHandler"))

// Export metric using Prometheus client built-in handler.
mux.Handle("/metrics", promhttp.Handler())
```
Hi @ppalucki, is it possible to register this router's metrics via the function handleFunc as well?
@zhlsunshine fixed (added)
Yes, it is possible; initially I didn't want to include /metrics in the stats, but you're right - it should be covered and excluded later in visualization/post-processing.
I quickly added it.
I need to test it more carefully. Here is example output:
Force-pushed from 735d3be to d1f3b04 (Compare)
Small issues; also collect stats from the /metrics router. Remove examples + add allTokenLatency metric. Fix description of histograms. Pipeline latency histogram before token measurements.
Force-pushed from d1f3b04 to 5548dfd (Compare)
Hi @ppalucki, I think you may need to fix the DCO error by adding the
```go
provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// ppalucki: Own metrics defintion bellow
```
Typos:
```diff
- // ppalucki: Own metrics defintion bellow
+ // ppalucki: Own metrics definition below
```
```go
var (
	firstTokenLatencyMeasure metric.Float64Histogram
	nextTokenLatencyMeasure  metric.Float64Histogram
	allTokenLatencyMeasure   metric.Float64Histogram
	pipelineLatencyMeasure   metric.Float64Histogram
)
```
IMHO this would look nicer as:
```diff
-var (
-	firstTokenLatencyMeasure metric.Float64Histogram
-	nextTokenLatencyMeasure  metric.Float64Histogram
-	allTokenLatencyMeasure   metric.Float64Histogram
-	pipelineLatencyMeasure   metric.Float64Histogram
-)
+type latencyMeasureT struct {
+	firstToken, nextToken, allTokens, pipeline metric.Float64Histogram
+}
+
+var latencyMeasure latencyMeasureT
```
```go
firstTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.first.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of first token generation."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register first token histogram measure")
}
nextTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.next.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of generating all but first tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register next token histogram measure")
}

allTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.all.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration to generate response with all tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register all token histogram measure")
}

pipelineLatencyMeasure, err = meter.Float64Histogram(
	"llm.pipeline.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of going through pipeline steps until the first token is generated (including time to read data from the client)."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register pipeline histogram measure")
}
```
There's a bit of duplication. Maybe this could iterate over a slice like (untested code):
```go
metrics := []struct {
	metric *metric.Float64Histogram
	name   string
	desc   string
}{
	{&latencyMeasure.firstToken, "llm.first.token.latency", "..."},
	// ...
}
for _, item := range metrics {
	*item.metric, err = meter.Float64Histogram(
		item.name,
		metric.WithUnit("ms"),
		metric.WithDescription(item.desc),
		api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
	)
	if err != nil {
		log.Error(err, "metrics: cannot register histogram measure", "name", item.name)
	}
}
```
See also: opea-project/GenAIComps#845
This PR is going to be superseded by new work - please check #644. Also some issues were found:
Here is the proposed change to the semantics of generated metrics.
Currently (wrong):
and should be (fixed):
where "ntl" stands for "next token latency".
IMHO first and next token latency are also OK as metric names, at least as the OTEL spec does not offer any official names for them. The proposed semantics look good.
Description
Integrate OpenTelemetry (metrics only) and expose it using the Prometheus exporter for:
An additional feature is integrated OpenTelemetry trace context propagation (not tested).
Metrics are exposed on /metrics with the Prometheus handler. Example output for metrics:
1) server metrics example
2) client metrics example
3) custom metrics example
4) Prometheus client default Go language metrics:
Issues
n/a
Type of change
Dependencies
- go.opentelemetry.io/otel
- go.opentelemetry.io/otel/exporters/prometheus
- go.opentelemetry.io/otel/metric
- go.opentelemetry.io/otel/sdk/metric
- github.com/prometheus/client_golang/prometheus/promhttp
- go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp

Tests
Manually using curl (examples shown above) after running only with the ChatQnA sample graph.
TODO: