2 changes: 2 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -346,6 +346,8 @@ jobs:
test:
- name: Cluster ZK/ES
config: test/e2e-v2/cases/cluster/zk/es/e2e.yaml
- name: Cluster ZK/BanyanDB
config: test/e2e-v2/cases/cluster/zk/banyandb/e2e.yaml

- name: Agent NodeJS Backend
config: test/e2e-v2/cases/nodejs/e2e.yaml
50 changes: 50 additions & 0 deletions docs/en/changes/changes.md
@@ -12,6 +12,28 @@
* Add `CLAUDE.md` as AI assistant guide for the project.
* Upgrade Groovy to 5.0.3 in OAP backend.
* Bump up nodejs to v24.13.0 for the latest UI(booster-ui) compiling.
* Add `library-batch-queue` module — a partitioned, self-draining queue with type-based dispatch,
adaptive partitioning, idle backoff, and throughput-weighted drain rebalancing (`DrainBalancer`).
Designed to replace DataCarrier in high-fan-out scenarios.
* Replace DataCarrier with BatchQueue for L1 metrics aggregation, L2 metrics persistence, TopN persistence,
all three exporters (gRPC metrics, Kafka trace, Kafka log), and gRPC remote client.
All metric types (OAL + MAL) now share unified queues instead of separate OAL/MAL pools.
Each exporter keeps its own dedicated queue with 1 thread, preserving original buffer strategies.
Thread count comparison on an 8-core machine (gRPC remote client excluded — unchanged 1 thread per peer):

| Queue | Old threads | Old channels | Old buffer slots | New threads | New partitions | New buffer slots | New policy |
|-------|-------------|--------------|------------------|-------------|----------------|------------------|------------|
| L1 Aggregation (OAL) | 24 | ~1,240 | ~12.4M | 8 (unified) | ~330 adaptive | ~6.6M | `cpuCores(1.0)` |
| L1 Aggregation (MAL) | 2 | ~100 | ~100K | (unified above) | | | |
| L2 Persistence (OAL) | 2 | ~620 | ~1.24M | 3 (unified) | ~330 adaptive | ~660K | `cpuCoresWithBase(1, 0.25)` |
| L2 Persistence (MAL) | 1 | ~100 | ~100K | (unified above) | | | |
| TopN Persistence | 4 | 4 | 4K | 1 | 4 adaptive | 4K | `fixed(1)` |
| Exporters (gRPC/Kafka) | 3 | 6 | 120K | 3 (1 per exporter) | — | 60K | `fixed(1)` each |
| **Total** | **36** | **~2,070** | **~13.9M** | **15** | **~664** | **~7.3M** | |

* Remove `library-datacarrier-queue` module. All usages have been replaced by `library-batch-queue`.
* Enable throughput-weighted drain rebalancing for L1 aggregation and L2 persistence queues (10s interval).
Periodically reassigns partitions across drain threads to equalize load when metric types have skewed throughput.
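
The thread counts in the table above can be checked against the policy names on an 8-core machine. The sketch below is an illustration only: the class `ThreadPolicySketch`, its method signatures, and the rounding rule are assumptions made for this example, not the `library-batch-queue` API. It reads `cpuCores(m)` as cores × m and `cpuCoresWithBase(b, m)` as b + cores × m, which reproduces the 8 and 3 threads listed for L1 aggregation and L2 persistence.

```java
// Hypothetical helper, not part of library-batch-queue: it only mirrors
// the policy names used in the changelog table to show the arithmetic.
class ThreadPolicySketch {

    // Assumed meaning: threads = cores * multiplier, rounded, at least 1.
    static int cpuCores(double multiplier, int cores) {
        return Math.max(1, (int) Math.round(cores * multiplier));
    }

    // Assumed meaning: threads = base + cores * multiplier, rounded.
    static int cpuCoresWithBase(int base, double multiplier, int cores) {
        return Math.max(1, base + (int) Math.round(cores * multiplier));
    }

    // Fixed policy: the thread count does not depend on the core count.
    static int fixed(int threads) {
        return threads;
    }

    public static void main(String[] args) {
        int cores = 8; // the machine size used in the changelog table
        System.out.println(cpuCores(1.0, cores));            // 8
        System.out.println(cpuCoresWithBase(1, 0.25, cores)); // 3
        System.out.println(fixed(1));                         // 1
    }
}
```

Under these assumptions the new total is 8 + 3 + 1 (TopN) + 3 (one per exporter) = 15 threads, matching the **Total** row.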

#### OAP Server

@@ -29,6 +51,34 @@
* Replace BanyanDB Java client with native implementation.
* Remove `bydb.dependencies.properties` and set the compatible BanyanDB API version number in `${SW_STORAGE_BANYANDB_COMPATIBLE_SERVER_API_VERSIONS}`.
* Fix trace profiling query time range condition.
* Add named ThreadFactory to all `Executors.newXxx()` calls to replace anonymous `pool-N-thread-M` thread names
with meaningful names for easier thread dump analysis. Complete OAP server thread inventory
(counts on an 8-core machine, exporters and JDBC are optional):

| Catalog | Thread Name | Count | Policy | Partitions |
|---------|-------------|-------|--------|------------|
| Data Pipeline | `BatchQueue-METRICS_L1_AGGREGATION-N` | 8 | `cpuCores(1.0)` | ~330 adaptive |
| Data Pipeline | `BatchQueue-METRICS_L2_PERSISTENCE-N` | 3 | `cpuCoresWithBase(1, 0.25)` | ~330 adaptive |
| Data Pipeline | `BatchQueue-TOPN_PERSISTENCE-N` | 1 | `fixed(1)` | ~4 adaptive |
| Data Pipeline | `BatchQueue-GRPC_REMOTE_{host}_{port}-N` | 1 per peer | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_GRPC_METRICS-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_KAFKA_TRACE-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_KAFKA_LOG-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-JDBC_ASYNC_BATCH_PERSISTENT-N` | 4 (configurable) | `fixed(N)` | `fixed(N)` |
| Scheduler | `RemoteClientManager` | 1 | scheduled | — |
| Scheduler | `PersistenceTimer` | 1 | scheduled | — |
| Scheduler | `PersistenceTimer-prepare-N` | 2 (configurable) | fixed pool | — |
| Scheduler | `DataTTLKeeper` | 1 | scheduled | — |
| Scheduler | `CacheUpdateTimer` | 1 | scheduled | — |
| Scheduler | `HierarchyAutoMatching` | 1 | scheduled | — |
| Scheduler | `WatermarkWatcher` | 1 | scheduled | — |
| Scheduler | `AlarmCore` | 1 | scheduled | — |
| Scheduler | `HealthChecker` | 1 | scheduled | — |
| Scheduler | `EndpointUriRecognition` | 1 (conditional) | scheduled | — |
| Scheduler | `FileChangeMonitor` | 1 | scheduled | — |
| Scheduler | `BanyanDB-ChannelManager` | 1 | scheduled | — |
| Scheduler | `GRPCClient-HealthCheck-{host}:{port}` | 1 per client | scheduled | — |
| Scheduler | `EBPFProfiling-N` | configurable | fixed pool | — |
* Fix BanyanDB time range overflow in profile thread snapshot query.
* `BrowserErrorLog`: the OAP server now generates a UUID to replace the original client-side ID, because browser scripts can't guarantee that generated IDs are globally unique.
* MQE: fix multiple labeled metric query and ensure no results are returned if no label value combinations match.
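
The named `ThreadFactory` entry above replaces default `pool-N-thread-M` names with readable ones. A minimal sketch of that technique follows, using only standard `java.util.concurrent` calls. The class `NamedThreadFactorySketch`, the zero-based counter, and the daemon flag are assumptions for illustration and are not taken from the OAP source; only the `PersistenceTimer-prepare` prefix comes from the thread inventory table.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: names each new thread "<prefix>-<n>".
class NamedThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(0);

    NamedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // The thread name is what appears in a thread dump.
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        thread.setDaemon(true); // assumption: background workers should not block JVM exit
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass the factory to the executor instead of relying on the default one.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(
            2, new NamedThreadFactorySketch("PersistenceTimer-prepare"));
        pool.schedule(
            () -> System.out.println(Thread.currentThread().getName()),
            0, TimeUnit.MILLISECONDS);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

With a factory like this, a thread dump lists the pool's threads under the chosen prefix, so each row of the inventory table can be matched to its threads by name.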
222 changes: 11 additions & 211 deletions docs/en/setup/backend/grafana-cluster.json
@@ -6693,17 +6693,17 @@
"uid": "$datasource"
},
"editorMode": "code",
"expr": "topk(10,avg by(metricName)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"1\",kind=\"OAL\"}))",
"expr": "avg by(slot)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"1\"})",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{metricName}}",
"legendFormat": "{{slot}}",
"range": true,
"refId": "A"
}
],
"title": "OAL L1 Aggregation Queue Percentage (%)",
"title": "L1 Aggregation Queue Percentage (%)",
"type": "timeseries"
},
{
@@ -6793,17 +6793,17 @@
"uid": "$datasource"
},
"editorMode": "code",
"expr": "topk(10,avg by(metricName)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"1\",kind=\"MAL\"}))",
"expr": "avg by(slot)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"2\"})",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{metricName}}",
"legendFormat": "{{slot}}",
"range": true,
"refId": "A"
}
],
"title": "MAL L1 Aggregation Queue Percentage (%)",
"title": "L2 Aggregation Queue Percentage (%)",
"type": "timeseries"
},
{
@@ -6871,206 +6871,6 @@
"x": 8,
"y": 122
},
"id": 150,
"interval": "1m",
"options": {
"dataLinks": [],
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "11.3.1",
"targets": [
{
"datasource": {
"uid": "$datasource"
},
"editorMode": "code",
"expr": "topk(10,avg by(metricName)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"2\",kind=\"OAL\"}))",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{metricName}}",
"range": true,
"refId": "A"
}
],
"title": "OAL L2 Aggregation Queue Percentage (%)",
"type": "timeseries"
},
{
"datasource": {
"uid": "$datasource"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 6,
"w": 8,
"x": 16,
"y": 122
},
"id": 151,
"interval": "1m",
"options": {
"dataLinks": [],
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "11.3.1",
"targets": [
{
"datasource": {
"uid": "$datasource"
},
"editorMode": "code",
"expr": "topk(10,avg by(metricName)(metrics_aggregation_queue_used_percentage{job=\"$job\",level=\"2\",kind=\"MAL\"}))",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{metricName}}",
"range": true,
"refId": "A"
}
],
"title": "MAL L2 Aggregation Queue Percentage (%)",
"type": "timeseries"
},
{
"datasource": {
"uid": "$datasource"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 6,
"w": 8,
"x": 0,
"y": 128
},
"id": 149,
"interval": "1m",
"options": {
@@ -7168,8 +6968,8 @@
"gridPos": {
"h": 6,
"w": 8,
"x": 8,
"y": 128
"x": 16,
"y": 122
},
"id": 152,
"interval": "1m",
@@ -7269,7 +7069,7 @@
"gridPos": {
"h": 6,
"w": 8,
"x": 16,
"x": 0,
"y": 128
},
"id": 146,
@@ -7421,8 +7221,8 @@
"gridPos": {
"h": 6,
"w": 8,
"x": 0,
"y": 134
"x": 8,
"y": 128
},
"id": 145,
"interval": "1m",