This software is pre-production and should not be deployed to production servers.
The WCA project contains a simple built-in dependency injection framework that allows you to extend existing functionality or add new functionality.
This document contains examples of:
- a simple Runner that outputs "Hello World!",
- an HTTP-based Storage component that saves metrics in an external HTTP-based service, using the requests library.
To provide new functionality using an external component, the operator of WCA has to:
- provide a new component defined as a Python class,
- register this Python class at startup with the extra command-line --register parameter in the form package_name.module_name:class_name (the package name is optional),
- reference the component name in the configuration file (using the name of the class),
- make the Python module accessible to the Python interpreter for import (PYTHONPATH and PEX_INHERIT_PATH environment variables).
In this document, "component" means a simple Python class that has been registered and can therefore be used in the configuration file.
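The module_name:class_name format used for registration can be resolved with the standard importlib module. The following is a minimal sketch of that idea, not WCA's actual implementation: load_component, register and registry are hypothetical names introduced only for illustration.

```python
import importlib

# Hypothetical registry keyed by class name, since the configuration file
# refers to components by the name of the class.
registry = {}


def load_component(spec: str):
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_name, _, class_name = spec.partition(':')
    if not module_name or not class_name:
        raise ValueError('expected module_name:class_name, got %r' % spec)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def register(spec: str):
    """Import the class and remember it under its class name."""
    cls = load_component(spec)
    registry[cls.__name__] = cls
    return cls


# A standard library class is used here so the sketch runs anywhere.
register('collections:OrderedDict')
print(sorted(registry))
```

The import only succeeds if the module is on the interpreter's import path, which is why the module has to be made accessible through PYTHONPATH.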
All WCA features (detection/CMS integration) are based on internal components and use the same mechanism for initialization.
From a high-level standpoint, the main entry point of the application is only responsible for
instantiating the Python classes defined in the YAML configuration, then parsing and preparing the logging infrastructure, and finally calling the generic run method on the already created Runner instance.
The Runner class is the main vehicle that integrates all other dependent objects.
For example, MeasurementRunner implements a simple loop
that uses a Node subclass instance (e.g. MesosNode) to discover locally running tasks, then collects metrics for those tasks
and then uses a Storage subclass to store those metrics somewhere (e.g. KafkaStorage or LogStorage).
To illustrate that, when someone uses WCA with a configuration file like this:

runner: !MeasurementRunner
  node: !MesosNode # subclass of Node
  metric_storage: !LogStorage # subclass of Storage
    output_filename: /tmp/logs.txt

it effectively means running the equivalent of this Python code:

runner = MeasurementRunner(
    node=MesosNode(),
    metric_storage=LogStorage(
        output_filename='/tmp/logs.txt'),
)
runner.run()

For example, to provide measure-only mode, anomaly detection mode or resource allocation mode, WCA contains the following components:
- MeasurementRunner that is only responsible for collecting metrics,
- DetectionRunner that extends MeasurementRunner to allow anomaly detection and generate additional metrics,
- AllocationRunner that allows configuring resources based on the provided Allocator component instance.
It is important to note that configuration-based objects (components) are static singletons that are available throughout the whole application lifetime and are only accessible by their parent objects.
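The loop that MeasurementRunner is described as implementing (discover tasks through a Node, collect metrics, pass them to a Storage) can be sketched as follows. This is an illustrative sketch only: FixedNode, InMemoryStorage, SketchMeasurementRunner, the get_tasks method and the metric names are all assumptions made for this example and are not taken from the WCA source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metric:
    # Reduced stand-in for the WCA Metric class.
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)


class FixedNode:
    """Hypothetical node that always reports the same task ids."""

    def __init__(self, task_ids: List[str]):
        self.task_ids = task_ids

    def get_tasks(self) -> List[str]:  # method name is an assumption
        return list(self.task_ids)


class InMemoryStorage:
    """Hypothetical storage that keeps metrics in a list."""

    def __init__(self):
        self.stored: List[Metric] = []

    def store(self, metrics: List[Metric]) -> None:
        self.stored.extend(metrics)


class SketchMeasurementRunner:
    """Owns its node and storage; nothing else can reach them."""

    def __init__(self, node, metrics_storage, iterations: int = 1):
        self.node = node
        self.metrics_storage = metrics_storage
        self.iterations = iterations

    def run(self) -> None:
        for _ in range(self.iterations):
            tasks = self.node.get_tasks()
            # One per-task metric plus one summary metric per iteration.
            metrics = [Metric('task_up', 1.0, {'task_id': t}) for t in tasks]
            metrics.append(Metric('tasks_count', float(len(tasks))))
            self.metrics_storage.store(metrics)


storage = InMemoryStorage()
runner = SketchMeasurementRunner(FixedNode(['task-a', 'task-b']), storage, iterations=2)
runner.run()
print(len(storage.stored))
```

Because the runner receives its collaborators as constructor arguments, swapping FixedNode or InMemoryStorage for another class requires no change to the loop itself, which is the property the configuration mechanism relies on.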
Let's start with a very basic example and create HelloWorldRunner, which just outputs the 'Hello world!' string.
With the Python module hello_world_runner.py containing HelloWorldRunner, a subclass of Runner:

from wca.runners import Runner


class HelloWorldRunner(Runner):
    def run(self):
        print('Hello world!')

you need to start WCA with the following example config file:

runner: !HelloWorldRunner

and then start WCA like this:

PYTHONPATH=$PWD/examples PEX_INHERIT_PATH=fallback ./dist/wca.pex -c $PWD/configs/extending/hello_world.yaml -r hello_world_runner:HelloWorldRunner

Tip: You can just copy-paste this command; all required example files are already in the project, but you have to build the PEX file first with make.

which should output:

Hello world!

To integrate with a custom monitoring system, it is enough to provide a definition of a custom Storage class.
The Storage class is a simple interface that exposes just one method, store, as defined below:

from typing import List


class Storage:
    def store(self, metrics: List[Metric]) -> None:
        """store metrics; may throw FailedDeliveryException"""
        ...

where Metric is a simple class with a structure influenced by the Prometheus metric model and the OpenMetrics initiative:

from dataclasses import dataclass
from typing import Dict


@dataclass
class Metric:
    name: str
    value: float
    labels: Dict[str, str]
    type: str  # gauge/counter
    help: str

This is a simple Storage class that can be used to post metrics serialized as JSON to
an external HTTP web service using the POST method:
(full source code here)
from dataclasses import dataclass

import requests

from wca.storage import Storage


@dataclass
class HTTPStorage(Storage):
    http_endpoint: str = 'http://127.0.0.1:8000'

    def store(self, metrics):
        # Post a flat {metric name: value} mapping as the JSON body.
        requests.post(
            self.http_endpoint,
            json={metric.name: metric.value for metric in metrics}
        )

Then it can be used with MeasurementRunner with the following configuration file:

runner: !MeasurementRunner
  config: !MeasurementRunnerConfig
    node: !StaticNode
      tasks: [] # this disables any tasks metrics
    metrics_storage: !HTTPStorage

To verify that the data was posted to the HTTP service correctly, start a naive service
using socat:

socat - tcp4-listen:8000,fork

and then run WCA like this:

sudo env PYTHONPATH=$PWD/examples PEX_INHERIT_PATH=fallback ./dist/wca.pex -c $PWD/configs/extending/measurement_http_storage.yaml -r http_store:HTTPStorage --root --log http_storage:info

The expected output is:

# from WCA:
2019-06-14 21:51:17,862 INFO {MainThread} [http_storage] sending!
# from socat:
POST / HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: python-requests/2.21.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 240
Content-Type: application/json
{"wca_up": 1560541957.1652732, "wca_tasks": 0, "wca_memory_usage_bytes": 50159616,
"memory_usage": 1399689216, "cpu_usage_per_cpu": 1205557,
"wca_duration_seconds": 1.0013580322265625e-05,
"wca_duration_seconds_avg": 1.0013580322265625e-05}

Note:
- sudo is required to enable perf and resctrl based metrics,
- the --log parameter allows you to specify the log level for custom components.
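The Storage docstring says that store may throw FailedDeliveryException, so a custom storage may want to translate transport errors into that exception. The sketch below shows one possible way to do this using only the standard library. It is an assumption-laden illustration: CheckedHTTPStorage, post_json and the injectable send parameter are hypothetical names, and FailedDeliveryException is defined locally as a stand-in rather than imported from WCA.

```python
import json
import urllib.request
from dataclasses import dataclass


class FailedDeliveryException(Exception):
    """Local stand-in for the exception named in the Storage docstring."""


@dataclass
class Metric:
    # Reduced stand-in for the WCA Metric class.
    name: str
    value: float


def post_json(url: str, body: bytes) -> None:
    """Send the body with an HTTP POST; raises OSError subclasses on failure."""
    request = urllib.request.Request(
        url, data=body, headers={'Content-Type': 'application/json'}, method='POST')
    with urllib.request.urlopen(request, timeout=5):
        pass


class CheckedHTTPStorage:
    """Hypothetical storage that reports delivery problems explicitly."""

    def __init__(self, http_endpoint='http://127.0.0.1:8000', send=post_json):
        self.http_endpoint = http_endpoint
        self.send = send  # injectable so the class can be exercised offline

    def store(self, metrics) -> None:
        body = json.dumps({m.name: m.value for m in metrics}).encode('utf-8')
        try:
            self.send(self.http_endpoint, body)
        except OSError as error:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            raise FailedDeliveryException(
                'cannot deliver to %s' % self.http_endpoint) from error


# Offline demonstration: record what would have been sent.
sent = []
CheckedHTTPStorage(send=lambda url, body: sent.append((url, body))).store(
    [Metric('wca_tasks', 0)])
print(sent[0])
```

Passing the transport function as a parameter is a design choice made for this sketch so that the error path can be checked without a running HTTP service.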
Depending on the Runner component, different kinds of metrics are produced and sent to different instances of Storage components:

- MeasurementRunner uses the Storage instance under the metrics_storage property to store:
  - platform-level resource usage (CPU/memory usage) metrics,
  - internal WCA metrics: number of monitored tasks, number of errors/warnings, health-checks, WCA memory usage,
  - (per-task) perf system based metrics, e.g. instructions, cycles,
  - (per-task) Intel RDT based metrics, e.g. cache usage, memory bandwidth,
  - (per-task) cgroup based metrics, e.g. CPU/memory usage.

  Each of those metrics has additional metadata attached (in the form of labels) about:
  - platform topology (sockets/cores/cpus),
  - extra labels defined in the WCA configuration file (e.g. own_ip),
  - labels to identify the WCA version (wca_version), the host name (host) and the host CPU model (cpu_model),
  - (only for per-task metrics) the task id (task_id) and metadata acquired from the orchestration system (Mesos task or Kubernetes pod labels).

- DetectionRunner uses Storage subclass instances:
  - in the metrics_storage property:
    - the same metrics as sent to MeasurementRunner in metrics_storage above,
  - in the anomalies_storage property:
    - number of anomalies detected by the Allocator class,
    - individual instances of detected anomalies encoded as metrics (more details here).

- AllocationRunner uses Storage subclass instances:
  - in the metrics_storage property:
    - the same metrics as sent to MeasurementRunner in metrics_storage above,
  - in the anomalies_storage property:
    - the same metrics as sent to DetectionRunner in anomalies_storage above,
  - in the allocations_storage property:
    - number of resource allocations performed during the last iteration,
    - details about performed allocations, such as the number of CPU shares, CPU quota or cache allocation,
    - more details here.
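Attaching common metadata to every metric in the form of labels, as described for MeasurementRunner, can be illustrated with a small helper. This is only a sketch of the idea: with_common_labels is a hypothetical function, the label values are made up, and the rule that a metric's own labels take precedence over the common ones is an assumption of this example rather than documented WCA behavior.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Metric:
    # Reduced stand-in for the WCA Metric class.
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)


def with_common_labels(metrics: List[Metric],
                       common_labels: Dict[str, str]) -> List[Metric]:
    """Return copies of the metrics with the common labels merged in.

    Labels already present on a metric win over common labels
    (an assumption made for this sketch).
    """
    return [
        replace(metric, labels={**common_labels, **metric.labels})
        for metric in metrics
    ]


# Example values are invented for illustration.
common = {'host': 'node-1', 'own_ip': '10.0.0.5'}
metrics = [
    Metric('memory_usage', 1399689216),
    Metric('cycles', 1000, {'task_id': 'task-a'}),
]
for metric in with_common_labels(metrics, common):
    print(metric.name, metric.labels)
```

The helper returns new Metric objects instead of mutating the inputs, so the original list can still be sent elsewhere without the extra labels.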
Note that YAML anchors and aliases make it possible to configure the same Storage instance to store all kinds of metrics:

runner: !AllocationRunner
  config: !AllocationRunnerConfig
    metrics_storage: &kafka_storage_instance !KafkaStorage
      topic: all_metrics
      broker_ips:
      - 127.0.0.1:9092
      - 127.0.0.2:9092
      max_timeout_in_seconds: 5.
    anomalies_storage: *kafka_storage_instance
    allocations_storage: *kafka_storage_instance

This approach can help to save resources (like connections), share state or simplify configuration (no need to repeat the same arguments).
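In Python terms, reusing one anchored Storage for several properties corresponds to passing the same object more than once. The sketch below illustrates only that equivalence: RecordingStorage and the plain dictionary standing in for the runner configuration are hypothetical and are not part of WCA.

```python
class RecordingStorage:
    """Hypothetical storage that counts how often it was constructed and used."""

    instances_created = 0

    def __init__(self):
        RecordingStorage.instances_created += 1
        self.batches = []

    def store(self, metrics) -> None:
        self.batches.append(list(metrics))


# One instance, referenced three times, like one anchor with two aliases.
shared = RecordingStorage()
config = {
    'metrics_storage': shared,
    'anomalies_storage': shared,
    'allocations_storage': shared,
}

config['metrics_storage'].store(['cpu_usage'])
config['anomalies_storage'].store(['anomaly_count'])
config['allocations_storage'].store(['allocations_count'])

# All three kinds of metrics ended up in the same object.
print(RecordingStorage.instances_created, len(shared.batches))
```

With a real network-backed storage, the single instance would typically mean a single set of connections shared by all three properties.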
If a component requires additional dependencies and you do not want to pollute the system interpreter's libraries, the best way to bundle a new component is to use a PEX file that packages all source code together with its dependencies.
(The requests library from the previous example was available because it is already required by WCA itself.)

pex -D examples python-dateutil==2.8.0 -o hello_world.pex -v

where examples/hello_world_runner_with_dateutil.py contains:

from wca.runners import Runner
from dateutil.utils import today


class HelloWorldRunner(Runner):
    def run(self):
        print('Hello world! Today is %s' % today())

Then it is possible to combine the two PEX files into a single environment by using
the PEX_PATH environment variable:

PEX_PATH=hello_world.pex ./dist/wca.pex -c $PWD/configs/extending/hello_world.yaml -r hello_world_runner_with_dateutil:HelloWorldRunner

which outputs:

Hello world! Today is 2019-06-14 00:00:00

Note that this method works well only if there are no conflicting sub-dependencies (the diamond dependency problem), because only one version of each package will be available at runtime. If there are conflicts, you need to consolidate WCA and your component into a single project (with common requirements) so that the conflicts are resolved during the requirements gathering phase. You can check the Platform Resource Manager (prm) component as an example of this approach.
Any child object used by any runner can be replaced with an external component, but WCA was primarily designed to be extended by providing the following components:
- Node class, used by all Runners to perform task discovery,
- Storage classes, used to enable persistence for internal metrics (*_storage properties),
- Detector class, to provide anomaly detection logic,
- Allocator class, to provide anomaly detection and anomaly mitigation logic (by resource allocation).