-
Notifications
You must be signed in to change notification settings - Fork 1.4k
OpenTelemetry metric plugin and dual emitting #16665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9dcedc0
to
edbb88c
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #16665 +/- ##
============================================
- Coverage 63.42% 63.35% -0.07%
- Complexity 1400 1424 +24
============================================
Files 3054 3067 +13
Lines 178766 179400 +634
Branches 27399 27450 +51
============================================
+ Hits 113378 113666 +288
- Misses 56656 57008 +352
+ Partials 8732 8726 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be some Opentelemetry Collector processor which can help achieve the same thing we are achieving here?
178b9b2
to
0f8a2e5
Compare
The major concern is: pinot metric name is NOT Opentelemetry friendly (or more specifically, not multi-dimension friendly), meaning: whenever pinot core call the metric SPI to emit metric, it has concatenated all attributes/dimensions into metric name. The concatenating happens at every caller place, not inside plugin implementation. So if using Opentelemetry Collector processor to collect the metric from JMX, it's still NOT multi-dimension metric. This make monitoring dashboard un-maintainable: we have to create a dashboard for each table, because table name has been concatenated into metric name. This PR emit multi-dimension metric, where table name is a dimension (or "attribute" in Opentelemetry's terminology). It will make dashboard more maintainable: you will be bale to create one dashboard, then on your metric platform UI, there will be a dropdown list to let you select the table (because table name is now a dimension, you will be able to search metrics using filters etc). |
0c5afb9
to
e368f08
Compare
0e8a8be
to
7b8ca0d
Compare
f0839ff
to
5d7eb08
Compare
This PR add OpenTelemetry metric plugin.
Problem and Challenge:
The Pinot metric SPI is not OpenTelemetry friendly, when call the metric SPI to emit metric, it has concatenated all attributes/dimensions into metric name. The concatenating happens at every caller place, not inside plugin implementation.
For different metrics, the order of the concatenating are different.
Example of metric naming pattern includes:
pinot.broker.<rawTableName>.<queryPhaseName>
pinot.controller.<metricName>.<taskType>
pinot.controller.<metricName><tableNameWithType>.<taskType>
pinot.controller.<metricName>.<rawTableName>.<topicName>
pinot.controller.<resourceName>.<metricName>
pinot.minion.<tableName>.<taskType>.<metricName>
pinot.minion.<taskType>.<metricName>
pinot.minion.<tableName>.<metricName>
Proposed solution
The idea of this PR is using
PinotMatricName
to carry aString simplifiedMericName
andMap<Stting, String> attributes
, Yammer/Dropwizard will just ignore those two extra filed, but OpenTelemetry will use thesimplifiedMericName
as metric name and attach attributes when updating metric value.For example, the metric of
pinot.controller.Purge.NumMinionSubtasksWaiting
will be treated as:This approach has minimal impact to existing Yammer and Dropwizard metrics.
Alternative considered
Another approach is updating the Pinot metric SPI to be dimension friendly and concatenating metic name with dimensions at Yammer/Dropwizard plugin implementation layer. The concern of this approach is that many pinot users has dashboard/alert created on top of existing Yammer/Dropwizard metric. This approach will touch huge number of places and has very high risk of breaking existing Yammer/Dropwizard metric and causing regression.
Local Testing
Step 1: Launch a Opentelemetry-Collector.
Step 2: Build and Run the Pinot Quick Demo cluster
Tweak DEFAULT_HTTP_METRIC_EXPORTER
First please change
DEFAULT_HTTP_METRIC_EXPORTER
endpoint ashttp://127.0.0.1:4318/v1/metrics
. The Opentelemetry-Collector that we use to collect/display metrics is listening tolocalhost:4318
instead of0.0.0.0:4318
, so we have to emit/report/export metrics to127.0.0.1:4318
.Please note that Docker port forwarding (meaning
docker run -p 127.0.0.1:22784:4318 otel/opentelemetry-collector-contrib:0.133.0
)will not work. As the /opentelemetry-collector container be configured to listen only on 127.0.0.1 (localhost) instead of 0.0.0.0 (all interfaces). Docker's port forwarding relies on the application binding to an address accessible from outside the container's localhost. This is a bug/feature-gap that opentelemetry-collector need to fix.Enable OTel metric plugin
Change
DEFAULT_METRICS_FACTORY_CLASS_NAME
in pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java asorg.apache.pinot.plugin.metrics.opentelemetry.CompoundPinotMetricsFactory
so that we can tryout Yammer and OTel metrics dual emitting.There are huge number of testing are based on Yammer metric, if we make OpenTelemetry as default, it will bring down many testing.
Build and launch a Pinot cluster
Step 3: Verifying metric being reported/exported
On the terminal where we launch the docker Opentelemetry-Collector container, we can see Pinot metric has been emitted and successfully reported/exported:
For example, we can see that
pinot.server.realtimeBytesDropped
has been parsed as