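Monitoring Python Applications with OpenTelemetry
This article describes how to monitor Python applications with OpenTelemetry zero-code instrumentation. It covers: 1) quickly deploying the Docker-based Grafana LGTM as the monitoring backend; 2) installing the opentelemetry-distro package and the auto-instrumentation packages so a Python application is monitored automatically; 3) configuring environment variables and starting the application with opentelemetry-instrument; 4) a Grafana dashboard showing key indicators such as throughput, latency, and service call relationships.
1. Introduction
Recently I wrote several articles on monitoring applications in a few programming languages (e.g. Java and Node.js) with OpenTelemetry, plus one on the language-agnostic OBI approach. All of them cover only the non-intrusive approach, which OpenTelemetry used to call Automatic instrumentation and now calls Zero-code instrumentation. Almost every language also supports an intrusive approach that programs directly against the OpenTelemetry SDK. That approach suits platform developers, commercial software vendors, and developers who want to build polished applications; see the OpenTelemetry documentation for details.
This article shows how to monitor Python applications with OpenTelemetry. Again it covers only the non-intrusive approach, which is very easy to set up.
2. Starting the OpenTelemetry backend
An OpenTelemetry backend is a set of databases and UI tools that can store and display OpenTelemetry metrics/traces/logs/profiles. The most common setup is still to put an OpenTelemetry Collector in front of the various backend tools. If you are a beginner or your system is small, you can simply use the Docker-based Grafana LGTM, which installs in practically one step. The only prerequisite is an environment that supports Docker.
Suppose you want to install LGTM into /opt/lgtm (any directory works). The commands are as follows (assuming Linux):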
docker pull grafana/otel-lgtm
mkdir /opt/lgtm
cd /opt/lgtm
wget https://raw.githubusercontent.com/grafana/docker-otel-lgtm/main/run-lgtm.sh
chmod +x run-lgtm.sh
sed -i 's/3000:3000/3100:3000/' run-lgtm.sh
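Note the last command. Grafana inside LGTM is exposed on host port 3000 by default, and that port often conflicts with other applications, so I changed it to 3100.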
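Before picking a replacement port, you may want to check which host ports are actually taken. Below is a minimal sketch using only the Python standard library. The function name `is_port_free` is my own, and it only tests whether a TCP port can be bound on the given address right now. That roughly matches what Docker needs when publishing a port, but it is not a guarantee.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Address already in use (or binding not permitted): treat as taken.
            return False
    return True


if __name__ == "__main__":
    # 3000 is LGTM's default Grafana port, 3100 the replacement used in this article.
    for candidate in (3000, 3100):
        state = "free" if is_port_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```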
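In my previous articles I enabled OBI when starting LGTM, which adds more metrics. If LGTM and the monitored application are on different machines, though, OBI needs extra configuration steps. This time I simply use the default mode (OBI disabled). Starting LGTM takes a single command: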
cd /opt/lgtm
./run-lgtm.sh
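LGTM listens on the following ports:
- 4317/4318: OTLP ports (gRPC on 4317, HTTP on 4318) for receiving metrics/traces/logs
- 3100: Grafana UI
- 9090: Prometheus, useful for debugging
- 4040: Pyroscope, for receiving profiles; this will eventually be merged into 4317/4318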
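To confirm that LGTM came up and these ports are actually accepting connections, you can run a quick TCP connect test. This is a sketch with only the standard library. The helper name `check_ports` is hypothetical, and the port map assumes the 3100 remapping made earlier. An open port only shows that something is listening, not that it is the right service.

```python
from __future__ import annotations

import socket

# Ports from the list above; 3100 assumes the sed change made earlier.
LGTM_PORTS = {
    4317: "OTLP gRPC",
    4318: "OTLP HTTP",
    3100: "Grafana UI",
    9090: "Prometheus",
    4040: "Pyroscope",
}


def check_ports(host: str, ports: dict[int, str], timeout: float = 1.0) -> dict[int, bool]:
    """Try a TCP connect to each port; True means something accepted the connection."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable or timed out
            results[port] = False
    return results


if __name__ == "__main__":
    for port, ok in check_ports("localhost", LGTM_PORTS).items():
        print(f"{port:>5} {LGTM_PORTS[port]:<12} {'open' if ok else 'closed'}")
```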
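3. Configuring and starting the Python application with OpenTelemetry monitoring
1) Add the Python packages opentelemetry-distro and opentelemetry-exporter-otlp
First, activate the virtual environment the application uses (venv, Conda, etc.).
Then install the packages with: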
pip install opentelemetry-distro opentelemetry-exporter-otlp
A more standard approach is to add two lines to the application's requirements.txt:
opentelemetry-distro
opentelemetry-exporter-otlp
and then rerun:
pip install -r requirements.txt
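A common mistake is installing into a different interpreter than the one that runs the app. The sketch below is meant to be run with the application's own Python, and it reports which of the two distributions that interpreter can see. It relies only on `importlib.metadata` from the standard library. The `installed_versions` helper is illustrative, not part of OpenTelemetry.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# The two distributions this step asks you to install.
REQUIRED = ["opentelemetry-distro", "opentelemetry-exporter-otlp"]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```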
2) Install the auto-instrumentation packages
Run:
opentelemetry-bootstrap -a install
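This command installs the auto-instrumentation packages that correspond to the Python packages currently installed. Running opentelemetry-bootstrap on its own only lists these packages. Below is the output of opentelemetry-bootstrap in one of my virtual environments: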
$ opentelemetry-bootstrap
opentelemetry-instrumentation-asyncio==0.61b0
opentelemetry-instrumentation-dbapi==0.61b0
opentelemetry-instrumentation-logging==0.61b0
opentelemetry-instrumentation-sqlite3==0.61b0
opentelemetry-instrumentation-threading==0.61b0
opentelemetry-instrumentation-urllib==0.61b0
opentelemetry-instrumentation-wsgi==0.61b0
opentelemetry-instrumentation-asgi==0.61b0
opentelemetry-instrumentation-click==0.61b0
opentelemetry-instrumentation-fastapi==0.61b0
opentelemetry-instrumentation-grpc==0.61b0
opentelemetry-instrumentation-requests==0.61b0
opentelemetry-instrumentation-sqlalchemy==0.61b0
opentelemetry-instrumentation-starlette==0.61b0
opentelemetry-instrumentation-tortoiseorm==0.61b0
opentelemetry-instrumentation-urllib3==0.61b0
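The listing above uses plain `name==version` requirement lines, and every package carries the same version, 0.61b0. If you want to post-process this output, for example to pin it into a requirements file or to spot mixed versions after an upgrade, a small parser is enough. This is a sketch. The helper names are hypothetical, and it assumes the output format shown above.

```python
from __future__ import annotations

import subprocess


def parse_requirements(text: str) -> dict[str, str]:
    """Parse 'name==version' lines (as printed by opentelemetry-bootstrap) into a dict."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "$")):
            continue  # skip blanks, comments and a pasted shell prompt
        name, sep, ver = line.partition("==")
        if sep:
            pins[name.strip()] = ver.strip()
    return pins


def distinct_versions(pins: dict[str, str]) -> set[str]:
    """More than one element here usually means a partially upgraded environment."""
    return set(pins.values())


if __name__ == "__main__":
    out = subprocess.run(["opentelemetry-bootstrap"], capture_output=True,
                         text=True, check=True).stdout
    pins = parse_requirements(out)
    print(f"{len(pins)} instrumentation packages, versions: {sorted(distinct_versions(pins))}")
```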
3) Start the application with opentelemetry-instrument
Suppose you currently start the application with:
python app.py
Change it to the following to start the application with OpenTelemetry enabled:
export OTEL_SERVICE_NAME=python-demo
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
cd /opt/dev/otel/otel-python-demo/
opentelemetry-instrument python app.py
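If you would rather keep these settings in one place than in shell exports, a small Python launcher can set the same environment variables and then start the application through opentelemetry-instrument. This is a sketch. The `build_otel_env` helper is hypothetical, the values are copied from the commands above, and the app directory and `app.py` should be adapted to your project. It uses `sys.executable` so the app runs on the same interpreter (and virtual environment) as the launcher, and it assumes `opentelemetry-instrument` is on that environment's PATH.

```python
from __future__ import annotations

import os
import subprocess
import sys


def build_otel_env(service_name: str, endpoint: str,
                   base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment with the OTEL_* variables used above."""
    env = dict(os.environ if base is None else base)
    env.update({
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
    })
    return env


if __name__ == "__main__":
    env = build_otel_env("python-demo", "http://localhost:4317")
    # Equivalent to: opentelemetry-instrument python app.py
    cmd = ["opentelemetry-instrument", sys.executable, "app.py"]
    sys.exit(subprocess.call(cmd, env=env, cwd="/opt/dev/otel/otel-python-demo/"))
```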
You can also use uvicorn or gunicorn. Suppose you start the application with:
uvicorn app:app --host 0.0.0.0 --port 8000
Change it to the following to start the application with OpenTelemetry enabled:
export OTEL_SERVICE_NAME=python-demo
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
cd /opt/dev/otel/otel-python-demo/
opentelemetry-instrument uvicorn app:app --host 0.0.0.0 --port 8000
Notes:
- Set OTEL_SERVICE_NAME to match your own application.
- If the OpenTelemetry backend (LGTM in this example) is not on the local machine, say it is at 1.2.3.4, set OTEL_EXPORTER_OTLP_ENDPOINT accordingly:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://1.2.3.4:4317
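The port in the endpoint has to match the OTLP transport. 4317 is OTLP/gRPC and 4318 is OTLP/HTTP. As far as I know, opentelemetry-distro defaults OTEL_EXPORTER_OTLP_PROTOCOL to grpc, which is why 4317 is used above. If you switch the protocol to http/protobuf, point the endpoint at 4318 instead. Below is a sketch of a hypothetical helper that builds the endpoint and sanity-checks an existing value. It assumes an IPv4 address or hostname, since IPv6 literals would need brackets.

```python
from urllib.parse import urlsplit

# Standard OTLP ports: 4317 for gRPC, 4318 for HTTP/protobuf.
OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def otlp_endpoint(host: str, protocol: str = "grpc") -> str:
    """Build an OTEL_EXPORTER_OTLP_ENDPOINT value for the given backend host."""
    try:
        port = OTLP_PORTS[protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol!r}") from None
    return f"http://{host}:{port}"


def endpoint_matches_protocol(endpoint: str, protocol: str = "grpc") -> bool:
    """Does the endpoint's port match the usual port for the chosen protocol?"""
    return urlsplit(endpoint).port == OTLP_PORTS.get(protocol)


if __name__ == "__main__":
    print(otlp_endpoint("1.2.3.4"))                   # value for the export above
    print(otlp_endpoint("1.2.3.4", "http/protobuf"))  # if you switch to OTLP/HTTP
```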
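4. Dashboard for monitoring the Python application
Open http://localhost:3100 to reach LGTM's Grafana UI (for a remote host, replace localhost with its hostname or IP) and log in with admin/admin. LGTM can store every signal type (metrics/traces/logs/profiles), but OpenTelemetry's automatic instrumentation for Python currently supports only metrics and traces. The Grafana dashboard therefore contains only these two kinds of data.
Here is a screenshot of the dashboard I built: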

And here is a screenshot after clicking one of the traces:

A brief walkthrough of the dashboard:
Row 1, left: throughput. The main PromQL is:
sum by (http_target, http_status_code) (
rate(http_server_duration_milliseconds_count[2m])
)
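`rate()` turns the monotonically increasing `_count` series of the duration histogram into requests per second. As a rough illustration, the per-second rate over a window is the counter increase divided by the elapsed time, with counter resets (for example after an app restart) accounted for. This is a simplification. As I understand it, Prometheus additionally extrapolates to the window boundaries, which the sketch below does not do.

```python
from __future__ import annotations


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Handles counter resets the way Prometheus does (a drop means the counter
    restarted from zero) but, unlike Prometheus, does not extrapolate.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the new value is the whole increase since the restart.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # 60 requests per minute, then the app restarts and counts from zero again.
    print(simple_rate([(0, 100), (60, 160), (90, 30)]))
```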
Row 1, right: HTTP server latency (P90). The main PromQL is:
histogram_quantile(0.90, sum by (le, http_target) (rate(http_server_duration_milliseconds_bucket[2m])))
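`histogram_quantile()` only sees cumulative bucket counts (`le` = upper bound), so the P90 is an estimate. It finds the bucket in which the 90th-percentile rank falls and interpolates linearly inside that bucket. The sketch below mirrors that interpolation for illustration. It assumes non-negative bucket bounds (true for latencies), a final `+Inf` bucket, and cumulative counts, and it skips some edge cases PromQL handles.

```python
from __future__ import annotations

import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one with le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank.
    idx = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
               len(buckets) - 1)
    if idx == len(buckets) - 1:
        # Rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - below) / in_bucket


if __name__ == "__main__":
    inf = float("inf")
    # 50 requests <= 100 ms, 90 <= 200 ms, 100 in total.
    print(histogram_quantile(0.90, [(100, 50), (200, 90), (inf, 100)]))
```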
Row 2, left: response size (P90). The main PromQL is:
histogram_quantile(0.90, sum by (le, http_target) (rate(http_server_response_size_bytes_bucket[2m])))
Row 2, right: HTTP client latency (P90). The main PromQL is:
histogram_quantile(0.90, sum without (http_flavor,http_scheme,job) (rate(http_client_duration_milliseconds_bucket[2m])))
Row 3, left: active HTTP requests. The main PromQL is:
sum without (__name__,job,http_flavor,http_scheme) (http_server_active_requests)
Row 3, right: DB connections. The main PromQL is:
avg without (__name__,job) (
db_client_connections_usage
)
Row 4, left: the call relationships between services (service map).
Row 4, right: all traces.
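Before putting a PromQL expression into a panel, it can help to run it directly against the Prometheus instance inside LGTM (port 9090 in the list above). Prometheus exposes instant queries at `/api/v1/query`. The sketch below uses only the standard library, and the helper names are my own. Note that Prometheus answers malformed queries with an HTTP error status, which `urlopen` raises as an exception.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://localhost:9090"  # port 9090 from the LGTM port list


def instant_query_url(base: str, expr: str) -> str:
    """Build a Prometheus /api/v1/query URL with the PromQL properly encoded."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def instant_query(base: str, expr: str, timeout: float = 5.0) -> list[dict]:
    """Run an instant query and return the 'result' list from the JSON response."""
    with urlopen(instant_query_url(base, expr), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


if __name__ == "__main__":
    expr = ("sum by (http_target, http_status_code) "
            "(rate(http_server_duration_milliseconds_count[2m]))")
    for series in instant_query(PROMETHEUS, expr):
        # Each vector sample is {"metric": {...labels}, "value": [timestamp, "value"]}.
        print(series["metric"], series["value"][1])
```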
Below is the complete dashboard JSON, which can be imported directly.
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineStyle": {
"fill": "solid"
},
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum by (http_target, http_status_code) (\r\n rate(http_server_duration_milliseconds_count[2m])\r\n)",
"instant": false,
"interval": "",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Throughput",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "histogram_quantile(0.90, sum by (le, http_target) (rate(http_server_duration_milliseconds_bucket[2m])))",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "HTTP Server Latency (P90)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "histogram_quantile(0.90, sum by (le, http_target) (rate(http_server_response_size_bytes_bucket[2m])))",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Response Size (P90)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "histogram_quantile(0.90, sum without (http_flavor,http_scheme,job) (rate(http_client_duration_milliseconds_bucket[2m])))",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "HTTP Client Latency (P90)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
},
"id": 6,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum without (__name__,job,http_flavor,http_scheme) (http_server_active_requests)",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Active HTTP Requests",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "avg without (__name__,job) (\r\n db_client_connections_usage\r\n)\r\n",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "DB Connections",
"type": "timeseries"
},
{
"datasource": {
"type": "tempo",
"uid": "tempo"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 24
},
"id": 7,
"options": {
"edges": {},
"layoutAlgorithm": "layered",
"nodes": {},
"zoomMode": "cooperative"
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "tempo",
"uid": "tempo"
},
"key": "Q-00d81243-72c6-4f8c-b817-2f3b91dacad4-0",
"limit": 20,
"metricsQueryType": "range",
"queryType": "serviceMap",
"refId": "A",
"serviceMapUseNativeHistograms": false,
"tableType": "traces"
}
],
"title": "Service Map",
"type": "nodeGraph"
},
{
"datasource": {
"type": "tempo",
"uid": "tempo"
},
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"footer": {
"reducers": []
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 24
},
"id": 8,
"options": {
"cellHeight": "sm",
"showHeader": true
},
"pluginVersion": "12.4.1",
"targets": [
{
"datasource": {
"type": "tempo",
"uid": "tempo"
},
"filters": [
{
"id": "0d6f56a3",
"operator": "=",
"scope": "span"
}
],
"limit": 20,
"metricsQueryType": "range",
"queryType": "traceqlSearch",
"refId": "A",
"serviceMapUseNativeHistograms": false,
"tableType": "traces"
}
],
"title": "Traces",
"type": "table"
}
],
"preload": false,
"schemaVersion": 42,
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Python Dashboard",
"uid": "admnssk",
"version": 20,
"weekStart": ""
}
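You can import the JSON through the Grafana UI, or push it with Grafana's HTTP API by POSTing to `/api/dashboards/db`. Below is a sketch of the latter. It assumes the JSON above is saved as `python-dashboard.json` (a hypothetical filename), Grafana listens on localhost:3100, and the default admin/admin login is still in place. It sets `overwrite` so re-running it updates the dashboard with the same `uid`.

```python
from __future__ import annotations

import base64
import json
from urllib.request import Request, urlopen

GRAFANA = "http://localhost:3100"   # host port chosen earlier with sed
USER, PASSWORD = "admin", "admin"   # LGTM default login


def build_import_request(base: str, dashboard: dict, user: str, password: str) -> Request:
    """Prepare a POST to /api/dashboards/db that creates or overwrites the dashboard."""
    dashboard = dict(dashboard, id=None)  # let Grafana assign the numeric id; uid is kept
    payload = json.dumps({"dashboard": dashboard, "overwrite": True}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{base.rstrip('/')}/api/dashboards/db",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    with open("python-dashboard.json", encoding="utf-8") as f:
        dash = json.load(f)
    with urlopen(build_import_request(GRAFANA, dash, USER, PASSWORD), timeout=10) as resp:
        print(json.load(resp))  # Grafana's reply, including the dashboard uid and url
```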
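5. Summary
This article described how to monitor Python applications with OpenTelemetry zero-code instrumentation. It covered: 1) quickly deploying the Docker-based Grafana LGTM as the monitoring backend; 2) installing the opentelemetry-distro package and the auto-instrumentation packages so a Python application is monitored automatically; 3) configuring environment variables and starting the application with opentelemetry-instrument; 4) a Grafana dashboard showing key indicators such as throughput, latency, and service call relationships. The approach provides broad monitoring of Python applications without any code changes, and it suits small and medium-sized systems that need to set up monitoring quickly.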