
How much cardinality can I have in my prometheus metrics? - Stack Overflow


I am trying to understand the limitations on ingesting custom metrics into Prometheus.

I understand that each metric is an active time series; for example, these would be two series: container_image_size{base="java", name="myapp"} and container_image_size{base="java", name="myotherapp"}.
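For concreteness, here is a minimal sketch of how those two series come about, assuming the official prometheus_client Python library (any client works the same way):

```python
# Minimal sketch, assuming prometheus_client (pip install prometheus-client).
# Each distinct combination of label values is a separate active time series.
from prometheus_client import Gauge, start_http_server

container_image_size = Gauge(
    "container_image_size",
    "Container image size in bytes",
    ["base", "name"],
)

# Two label combinations under one metric name -> two active series.
container_image_size.labels(base="java", name="myapp").set(123_000_000)
container_image_size.labels(base="java", name="myotherapp").set(456_000_000)

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
```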

I understand that Prometheus cannot deal with high cardinality, where cardinality means the number of distinct label values and metrics. What counts as "high"? How much is too much?

Thank you.



1 Answer


In my experience a Prometheus server can easily handle on the order of 1,000 servers each sending 1,000 metrics. That's about a million active time series.
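You can check where your own server stands using its self-monitoring metrics; for example, in the expression browser (assuming the server scrapes itself, which is the default setup):

```
# Number of active series currently held in the TSDB head block:
prometheus_tsdb_head_series
```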

A Prometheus server can reach 10 million active series if you're willing to allocate the hardware for it and tune the collection. Infrastructure metrics from the node exporter are often scraped on a 5s to 60s interval, whereas a 5-minute interval may be sufficient for custom metrics. A small adjustment in scrape interval (and retention period) has a massive impact on load and storage.
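As a sketch of that tuning, scrape intervals can be set per job in prometheus.yml (the job names and targets below are made up):

```yaml
# Hypothetical prometheus.yml fragment: frequent scrapes for
# infrastructure metrics, a much slower interval for custom metrics.
global:
  scrape_interval: 30s          # default for jobs without an override

scrape_configs:
  - job_name: "node"            # infrastructure: scrape every 15s
    scrape_interval: 15s
    static_configs:
      - targets: ["node-exporter.internal:9100"]

  - job_name: "custom-app"      # custom metrics: 5m is often enough
    scrape_interval: 5m
    static_configs:
      - targets: ["myapp.internal:8080"]
```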

A typical server sends 1k to 10k metrics, depending on which collectors are enabled. A small cloud server is closer to 1k metrics; a large physical server is closer to 10k. The most notable are the per-core CPU metrics, which generate a ton of series on physical servers with 100+ cores nowadays; they are enabled out of the box in the node exporter.
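If you want to see which metrics dominate on your own server, a cardinality query in the expression browser will show it (node_cpu_seconds_total usually tops the list on big machines):

```
# Top 10 metric names by number of active series:
topk(10, count by (__name__) ({__name__=~".+"}))
```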

As a rule of thumb: if you make an app that sends a thousand active series, that's fine; it's like one more server to monitor.

If you make an app that sends a million active series, that's not fine; it's basically the size of the entire infrastructure estate, and it will topple the Prometheus server.

A few million series is not a lot, and that upper limit is a strong constraint on what Prometheus can be used for in practice. Let's consider some examples:

  • disk_usage_per_volume{disk="/etc"} -> fine: there are only a dozen disks or volumes on a machine
  • disk_usage_per_dir{dir="/home/username/subdir/..."} -> not fine: it can easily run into millions of directories
  • http_request{domain="example", status="200"} -> fine: there are only a few domains and about a hundred HTTP status codes
  • http_request{domain="example", status="200", url="/order/cart/"} -> not fine: far too many distinct URLs (see the sketch after this list)
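To make the last two bullets concrete, here is a client-side sketch of the bounded pattern (again assuming prometheus_client; the "route" label is my own illustration of the usual fix):

```python
# Sketch, assuming prometheus_client: keep every label value bounded.
from prometheus_client import Counter

# Fine: each label draws from a small, fixed set of values, so the
# series count is |domains| x |statuses| x |route templates|.
http_requests = Counter(
    "http_requests_total",
    "HTTP requests served",
    ["domain", "status", "route"],
)

# Label with the route *template*, never the raw URL: raw paths and
# query strings would mint a new series per distinct value.
http_requests.labels(domain="example", status="200", route="/order/cart").inc()
http_requests.labels(domain="example", status="404", route="/order/cart").inc()
```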

Things that have high cardinality, like web server requests, typically go to a logging system such as Elasticsearch, which is optimized for storing individual messages with repeated fields.
