Add Distributed Tracing support #177
Force-pushed from 6c6fb74 to 5606d42.
Excited for distributed tracing 👏
```swift
@@ -161,16 +176,48 @@ public final actor ValkeyConnection: ValkeyClientProtocol, Sendable {

    @inlinable
    func _execute<Command: ValkeyCommand>(command: Command) async throws -> RESPToken {
        #if DistributedTracingSupport
        let span = startSpan(Command.name, ofKind: .client)
        defer { span.end() }
```
If you wanted to get rid of the unstructured span, could also do something like:

```swift
func _execute<Command: ValkeyCommand>(command: Command) async throws -> RESPToken {
    #if DistributedTracingSupport
    try await _executeWithTracing(command: command)
    #else
    try await _execute0(command: command)
    #endif
}

func _execute0<Command: ValkeyCommand>(command: Command) async throws -> RESPToken {
    // existing execute logic
}

#if DistributedTracingSupport
func _executeWithTracing<Command: ValkeyCommand>(command: Command) async throws -> RESPToken {
    try await withSpan(Command.name, ofKind: .client) { span in
        // update the span
        do {
            return try await _execute0(command: command)
        } catch {
            // extract info from the error into the span
            throw error
        }
    }
}
#endif
```
Thanks for the suggestion. This is similar to what I originally went with. I eventually switched to the more imperative API because I thought it was easier to read. I'm happy to explore `withSpan` again, though.
I'll let Adam/Fabian steer you here, I'm also happy with either.
Pretty sure the client fast path API will not use structured concurrency. See #158.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##         main    #177   +/- ##
=======================================
  Coverage    ?   39.90%
=======================================
  Files       ?       86
  Lines       ?    15277
  Branches    ?        0
=======================================
  Hits        ?     6097
  Misses      ?     9180
  Partials    ?        0
```

View full report in Codecov by Sentry.
The API breakage check rightfully complains about the added protocol requirement:

```swift
extension ValkeyCommand {
    public static var name: String { String(describing: Self.self) }
}
```
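The default implementation above can be exercised on its own. A minimal sketch (the `GET` and `CLUSTERINFO` command types here are illustrative stand-ins, not the library's real conformances):

```swift
// Stand-in protocol mirroring the added requirement plus its default.
protocol ValkeyCommand {
    static var name: String { get }
}

extension ValkeyCommand {
    // Default: derive the command name from the conforming type's name.
    static var name: String { String(describing: Self.self) }
}

struct GET: ValkeyCommand {}               // falls back to the type name
struct CLUSTERINFO: ValkeyCommand {        // overrides the default
    static var name: String { "CLUSTER INFO" }
}

print(GET.name)          // "GET"
print(CLUSTERINFO.name)  // "CLUSTER INFO"
```

Because the extension supplies a default, existing `ValkeyCommand` conformances keep compiling without changes.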
Force-pushed from a67131f to c57a3b5.
API breakages are fine.
A couple of initial comments.
It's a shame I haven't got the benchmarks running for non-members of the repo; it fails on adding the comment to the PR. I imagine this will add allocations. Can you run the benchmarks locally (run main and compare this PR with main) and post the results as a comment, cheers.
Here's the benchmark diff:

swift-driver version: 1.127.10
Apple Swift version 6.2 (swiftlang-6.2.0.14.8 clang-1700.3.14.6)
Target: arm64-apple-macosx15.0

===============================================================
Threshold deviations for ValkeyBenchmarks:Client: GET benchmark
===============================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Malloc (total) (K, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 74 │ 81 │ 10 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 77 │ 83 │ 8 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 78 │ 85 │ 8 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
=========================================================================================================
Threshold deviations for ValkeyBenchmarks:Client: GET benchmark | parallel 20 | 20 concurrent connections
=========================================================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Malloc (total) (K, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75 │ 117 │ 125 │ 6 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
=========================================================================================================
Threshold deviations for ValkeyBenchmarks:Client: GET benchmark | parallel 50 | 20 concurrent connections
=========================================================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Malloc (total) (K, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 5 │ 7 │ 30 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 8 │ 11 │ 45 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 11 │ 18 │ 59 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Instructions (M, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 247 │ 269 │ 8 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 249 │ 271 │ 9 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 250 │ 272 │ 9 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
===================================================================
Threshold deviations for ValkeyBenchmarks:Connection: GET benchmark
===================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Throughput (# / s) (#, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 29 │ 27 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 29 │ 27 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 29 │ 27 │ 6 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (wall clock) (ms, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 34 │ 37 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 35 │ 37 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 35 │ 38 │ 8 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (total CPU) (ms, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 29 │ 31 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 29 │ 31 │ 6 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 30 │ 32 │ 9 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Instructions (M, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 225 │ 247 │ 9 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 225 │ 247 │ 9 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75 │ 225 │ 247 │ 9 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
========================================================================
Threshold deviations for ValkeyBenchmarks:Connection: Pipeline benchmark
========================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Throughput (# / s) (#, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 17 │ 16 │ 5 │ 5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50 │ 17 │ 16 │ 5 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Malloc (total) (K, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25 │ 14 │ 26 │ 83 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
====================================================================
Threshold deviations for ValkeyBenchmarks:HashSlot – {user}.whatever
====================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Throughput (# / s) (#, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75 │ 57 │ 53 │ 7 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (wall clock) (ms, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75 │ 18 │ 19 │ 7 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
=====================================================================================
Threshold deviations for ValkeyBenchmarks:ValkeyCommandEncoder – Command with 7 words
=====================================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (wall clock) (μs, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75 │ 420 │ 441 │ 5 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (total CPU) (μs, %) │ main │ pr │ Difference % │ Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75 │ 421 │ 443 │ 5 │ 5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛
New baseline 'pr' is WORSE than the 'main' baseline thresholds.
error: benchmarkThresholdRegression
@slashmo Thank you for taking a stab at this one! Really like the direction. We will need to dive deep on some of the aspects.
```swift
while prefixEndIndex < message.endIndex, message[prefixEndIndex] != " " {
    message.formIndex(after: &prefixEndIndex)
}
let prefix = message[message.startIndex..<prefixEndIndex]
```
Pretty sure all of this is in the UTF8 space. Can we operate on the utf8 view instead?
Maybe break the code into a separate function as well, to make it clear what you are doing
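Both suggestions could be combined into something like the following sketch: scan the UTF-8 view (avoiding grapheme-cluster overhead) inside a named helper. `errorPrefix(of:)` is a hypothetical name, not from the PR:

```swift
// Extract the error-code prefix (everything before the first space)
// by scanning the UTF-8 view instead of the Character view.
func errorPrefix(of message: String) -> Substring {
    // 0x20 is the UTF-8 code unit for " ". ASCII code-unit boundaries
    // are always valid String indices, so the slice below is safe.
    let prefixEnd = message.utf8.firstIndex(of: 0x20) ?? message.utf8.endIndex
    return message[message.startIndex..<prefixEnd]
}

print(errorPrefix(of: "WRONGTYPE Operation against a key"))  // "WRONGTYPE"
```

Returning `Substring` keeps the helper allocation-free, matching the original loop's behavior.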
```swift
attributes["network.peer.address"] = channel.remoteAddress?.ipAddress
attributes["network.peer.port"] = channel.remoteAddress?.port
attributes["server.address"] = address?.hostOrSocketPath
attributes["server.port"] = address?.port == 6379 ? nil : address?.port
```
I think we should check first if we have a value. Also please use explicit self. Can we make all those labels configurable?
These look like the OTel Semantic Conventions, this could also take an explicit dependency on that library to make these typesafe.
> I think we should check first if we have a value.

Functionally there's no difference between setting a `nil` value for an attribute and not setting it at all. Or do you see a performance optimization in not going through the subscript unless necessary?
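The claim about `nil` assignment can be checked in isolation with a plain dictionary (a stand-in here; the real span-attribute storage type may differ, though its subscript follows the same convention):

```swift
// Assigning nil through a dictionary subscript is the same as never
// setting the key: nothing is inserted.
var attributes: [String: String] = [:]
attributes["server.port"] = nil

print(attributes.isEmpty)  // true: the key was never inserted
```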
After discussing this PR in more detail, @fabianfett and I defined the following next actions:
This list looks fairly good. You did forget to find a solution for pipelined commands.
We also briefly discussed this, but forgot to mention it here. What I'd ideally like to do is to pass along whether the pipelined execute was preceded by a …
Great to see this happening, thank you @slashmo, @adam-fowler, @fabianfett! Was just discussing OTel plans with Moritz and the thread came up. I'm really glad to see this approach does:

It's somewhat concerning to adopt a pattern of "pass `Tracer?`" though:

```swift
// bad, always need to run the nested code
tracer?.withSpan(...) { span in code } // ofc wrong, always need to run the 'code'

// so what... duplication like this is absolutely not ok
if tracer {
    tracer.withSpan(...) { span in code }
} else {
    code
}
```

While with noop-tracers this "just works" because:

```swift
tracer.withSpan(...) { noopSpan in code } // noopspan is a struct
```

Are we absolutely certain this optional-nil vs "noop" is a must here? Is the difference actually significant enough to warrant the pain, given this is in tracing-is-enabled mode? If so, we must answer the question what libraries should be doing when using a tracer like this. Because what would the pattern look like for adopter code of tracing that does use structured concurrency? I understand your specific lib is not using structured concurrency in the core, but if we start recommending "pass `Tracer?`" we should think through the implications.

We have a weekly call we could purpose for designing the answer here, @slashmo will try to organize the invites! :-)
Seems we had a minor misunderstanding here! Coming off a chat with @fabianfett: we clarified this isn't trying to establish "the" pattern we'll ask other libraries to adopt. So technically it's really up to this lib to do whatever you want with passing the tracer -- optional or noop. So my review isn't as strict really; we were only concerned about end users adopting this pattern in their own code, but this isn't what this valkey PR intended to be a showcase of. The optional vs no-op difference is rather minimal in performance, but it does cause a few rare allocations, whereas without it valkey is solidly at 0 allocations. The way this sets up the tracer "once" in the config object is also great and I have no concerns there. We should bring the "in memory recording" tracer to swift-distributed-tracing as discussed up-thread though; I look forward to accepting that over there :-) In case it didn't get across in the first comment: this is GREAT and I'm really happy with this PR, I just had some usability concerns if we were to recommend this style to everyone (but we don't, so there's no problem) 👍
Force-pushed from 9129908 to 3d01310.
Looks good! I think that's a nice approach, maybe just add some comments about it
```swift
@@ -117,6 +120,10 @@ public struct ValkeyConnectionConfiguration: Sendable {
    /// Default value is `nil` (no client name is set).
    public var clientName: String?

    #if DistributedTracingSupport
    public var tracing: ValkeyTracingConfiguration = .init()
```
On this one, maybe just add a small note that it picks up the globally bootstrapped tracer at this point in time and allows configuring tracing?
```swift
#if DistributedTracingSupport
public struct ValkeyTracingConfiguration: Sendable {

    public var tracer: (any Tracer)? = InstrumentationSystem.tracer
```
Probably worth some docs here on how this:

- by default picks up the global tracer bootstrapped in `InstrumentationSystem`
- can be set to `nil` to disable tracing entirely, even if a global tracer is bootstrapped
- can be set to a specific tracer, which takes precedence over the global bootstrapped tracer
If the global tracer is a NoOp tracer, is it worth setting `tracer` to `nil`?
I had asked about this over here #177 (comment) and had a chat with @fabianfett about it.

So there seem to be minor differences, and although the performance is roughly the same it seems to have caused an odd allocation here and there, whereas without it the allocation count in valkey was a solid 0... The NoopTracer is definitely nicer to use, but if you truly want to go with the `Tracer?` here I think that's fine -- @fabianfett strongly preferred that at least for this lib.

If we were to suggest this to people writing libraries, I think you're very right -- using a non-optional `Tracer` defaulting to a NoopTracer is preferable to `Tracer?` because of the usability aspect, but in this specific lib I think it's fine... since Fabian really wanted to have the allocs count a strict zero here heh :)

It's not that the allocations come from the noop-spans btw, they are structs... so I'm uncertain where they were coming from; we'd need to do more digging.
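The opt-out ergonomics under discussion can be sketched with stand-in types (names mimic the PR's configuration surface but are assumptions, not the library's verbatim API):

```swift
// Stand-ins for the tracing protocol and the PR's configuration type.
protocol Tracer: Sendable {}
struct GlobalTracer: Tracer {}

struct ValkeyTracingConfiguration: Sendable {
    // In the PR this defaults to the globally bootstrapped tracer.
    var tracer: (any Tracer)? = GlobalTracer()
}

var tracing = ValkeyTracingConfiguration()
print(tracing.tracer != nil)  // true: traced by default

tracing.tracer = nil          // opt out, even if a global tracer is bootstrapped
print(tracing.tracer == nil)  // true: this client emits no spans
```

With the optional, `nil` is an unambiguous "tracing off" signal, which is what lets the untraced path skip span work entirely.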
@slashmo Can we get some comments on `ValkeyTracingConfiguration` and its members? This is currently a public symbol without any comments.
Yes, let me add some.
Force-pushed from 8030400 to 56ed070.
In general I think we are good, added a few more comments.
```swift
        self.channelHandler.write(command: command, continuation: continuation, requestID: requestID)
    } catch let error as ValkeyClientError {
        #if DistributedTracingSupport
        span?.recordError(error)
```
Maybe wrap all this code in `if let span`.
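A minimal sketch of that suggestion: group the span calls under one `if let` so the untraced path performs no span work at all. `StubSpan` stands in for the real tracing span type:

```swift
// Stand-in span that records errors, mimicking Span.recordError(_:).
final class StubSpan {
    private(set) var recordedErrors: [any Error] = []
    func recordError(_ error: any Error) { recordedErrors.append(error) }
}

struct ValkeyClientError: Error {}

let span: StubSpan? = StubSpan()  // nil when tracing is disabled

// One optional-binding check instead of repeated `span?.` chains.
if let span {
    span.recordError(ValkeyClientError())
}

print(span?.recordedErrors.count ?? 0)  // 1
```

When `span` is `nil`, the whole block is skipped in a single branch rather than evaluating each optional chain individually.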
```swift
#if DistributedTracingSupport
public struct ValkeyTracingConfiguration: Sendable {

    public var tracer: (any Tracer)? = InstrumentationSystem.tracer
```
If the global tracer is a NoOp tracer, is it worth setting `tracer` to `nil`?
```swift
while prefixEndIndex < message.endIndex, message[prefixEndIndex] != " " {
    message.formIndex(after: &prefixEndIndex)
}
let prefix = message[message.startIndex..<prefixEndIndex]
```
Maybe break the code into a separate function as well, to make it clear what you are doing
You need to ensure everything runs with the DistributedTracingSupport trait disabled:

```shell
swift test --disable-default-traits
```
Signed-off-by: Moritz Lang <[email protected]>
Signed-off-by: Fabian Fett <[email protected]>
Signed-off-by: Fabian Fett <[email protected]>
Force-pushed from 6ca1b36 to 34db440.
One minor thing and we're good to go
```swift
#if DistributedTracingSupport
public struct ValkeyTracingConfiguration: Sendable {

    public var tracer: (any Tracer)? = InstrumentationSystem.tracer
```
@slashmo Can we get some comments on `ValkeyTracingConfiguration` and its members? This is currently a public symbol without any comments.
Force-pushed from 9643e52 to faf9dcc.
Thanks @slashmo this is great
Motivation
To make the most of Distributed Tracing, users may want to view operations performed by their database clients as part of a larger request to gain more insight into their distributed systems.
Modifications
- `DistributedTracingSupport` trait, enabled by default
- `db.operation.name` span attribute
- `open-telemetry` example to showcase Distributed Tracing support

When implementing the span and span attributes I closely followed the OpenTelemetry semantic conventions for Redis and general database client spans.
I believe that this provides the highest chance of the span being recognized as a database client operation by various observability backends.
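As an illustration of what that convention-following looks like in practice, here is the rough shape of a `GET` client span. The keys come from the OTel database semantic conventions; the concrete values and the plain-dictionary representation are assumptions for this sketch, not the PR's actual storage type:

```swift
// Illustrative attribute set for one traced command.
let spanName = "GET"  // spans are named after the command, kind .client
let attributes: [String: String] = [
    "db.system": "valkey",           // database system identifier (assumed value)
    "db.operation.name": spanName,   // attribute added by this PR
    "server.address": "localhost",
    "network.peer.address": "127.0.0.1",
]

print(spanName, attributes["db.operation.name"]!)  // GET GET
```

Matching these well-known keys is what lets observability backends classify the span as a database client operation.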
Result
Users can now use Distributed Tracing spans automatically created by the library to more deeply observe their systems.
Closes #176