I am trying to understand the limitations on ingesting custom metrics in Prometheus.

I understand that each metric is an active time series. For example, these would be two series:

container_image_size{base="java", name="myapp"}
container_image_size{base="java", name="myotherapp"}

I understand that Prometheus cannot deal with high cardinality, where high cardinality means too many labels and metrics. What is "high"? How much is too high?

Thank you.
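To make "cardinality" concrete: the number of active series for one metric is the number of distinct label-value combinations actually exposed, bounded above by the Cartesian product of the label values. A minimal sketch, using made-up label values for the `container_image_size` metric above:

```python
# Hypothetical sketch: series count for one metric is bounded by the
# product of the number of values each label can take.
from itertools import product

label_values = {
    "base": ["java", "python"],
    "name": ["myapp", "myotherapp", "thirdapp"],
}

# Worst case: the full Cartesian product of label values.
worst_case = 1
for values in label_values.values():
    worst_case *= len(values)

series = [
    'container_image_size{base="%s", name="%s"}' % combo
    for combo in product(*label_values.values())
]

print(worst_case)   # 6
print(len(series))  # 6
```

In practice only the combinations your app actually reports become series, but every new label multiplies the worst case.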
asked Nov 18, 2024 at 17:46 by user5994461 · 1 Answer
In my experience a Prometheus server can easily handle on the order of 1000 servers each sending 1000 metrics. That's about a million active time series.
A Prometheus server can get to 10 million active series if you're willing to allocate the hardware for it and tune the collectors. Infrastructure metrics from the Prometheus exporters are often scraped on a 5 s to 60 s interval, whereas a 5-minute interval may be sufficient for custom metrics. A small adjustment to the interval (and retention period) has a massive impact on load and storage.
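To see why the interval matters so much, here is a rough sketch (illustrative numbers, not measurements): stored samples scale as series × retention ÷ scrape interval, so moving custom metrics from a 15 s to a 5-minute interval cuts their sample volume 20×.

```python
# Rough model: samples stored = series * (retention / scrape interval).
def stored_samples(series: int, interval_s: int, retention_days: int) -> int:
    samples_per_series = retention_days * 24 * 3600 // interval_s
    return series * samples_per_series

series = 1_000_000  # one million active series, 15 days retention
infra = stored_samples(series, interval_s=15, retention_days=15)    # 15 s scrape
custom = stored_samples(series, interval_s=300, retention_days=15)  # 5 min scrape

print(infra // custom)  # 20 -> 20x fewer samples at a 5-minute interval
```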
A typical server sends 1k to 10k metrics, depending on which collectors are enabled. A small server in the cloud is closer to 1k metrics; a large physical server is closer to 10k. The most notable are the per-core CPU metrics, which generate a ton of series on physical servers with 100+ cores nowadays; they are enabled out of the box in the Prometheus node exporter.
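The per-core blow-up is easy to quantify: the node exporter's `node_cpu_seconds_total` emits one series per (cpu, mode) pair. A sketch, assuming the eight CPU modes commonly reported on Linux:

```python
# node_cpu_seconds_total produces one series per core per CPU mode.
# The mode list below matches common Linux node_exporter output; treat
# the exact core counts as illustrative.
MODES = ["user", "system", "idle", "iowait", "irq", "softirq", "steal", "nice"]

def cpu_series(cores: int) -> int:
    return cores * len(MODES)

print(cpu_series(4))    # small cloud VM: 32 series
print(cpu_series(128))  # large physical server: 1024 series
```

So a single metric on a 128-core box already accounts for about a thousand series, a meaningful chunk of that server's 10k budget.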
As a rule of thumb: if you make an app that sends one thousand active series, that's fine; it's like one more server to monitor. If you make an app that sends one million active series, that's not fine; it's basically the size of the entire infrastructure estate, and it will topple the Prometheus server.
A few million series is not a lot. That upper limit is a strong limitation on what Prometheus can be used for in practice. Let's consider some examples:

disk_usage_per_volume{disk="/etc"}
-> fine, there are only a dozen disks or volumes on a machine

disk_usage_per_dir{dir="/home/username/subdir/..."}
-> not fine, it will easily run into millions of directories

http_request{domain="example", status="200"}
-> fine, there are only a few domains and about a hundred HTTP status codes

http_request{domain="example", status="200", url="/order/cart/"}
-> not fine, too many URLs
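A common way out of the URL case is to collapse raw paths into a fixed set of route templates before using them as a label value. A hypothetical sketch (the route table and `route_label` helper are made up for illustration):

```python
# Hypothetical normalizer: map unbounded raw URLs to a small, fixed set of
# route templates so the label stays low-cardinality.
import re

ROUTES = [
    (re.compile(r"^/order/cart/?$"), "/order/cart"),
    (re.compile(r"^/order/\d+$"), "/order/:id"),
    (re.compile(r"^/user/[^/]+/profile$"), "/user/:name/profile"),
]

def route_label(url: str) -> str:
    for pattern, template in ROUTES:
        if pattern.match(url):
            return template
    return "other"  # catch-all bucket keeps cardinality bounded

print(route_label("/order/12345"))      # /order/:id
print(route_label("/a/b/c/unbounded"))  # other
```

With this, `http_request{route="/order/:id", status="200"}` has a cardinality of (routes × status codes) instead of (distinct URLs × status codes).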
Things that have high cardinality like web server requests typically go to a logging system like ElasticSearch, which is optimized to store individual messages with repeated fields.