I want to understand the current memory usage of my pod running on an EKS cluster. I have Metrics Server and Prometheus installed.
When I run a "kubectl top pods", I get a memory usage of 2.5 GB
sh-4.2$ kubectl top pods liink-goquorum-node-learner-quorum-0 -n my-namespace
NAME                                   CPU(cores)   MEMORY(bytes)
liink-goquorum-node-learner-quorum-0   128m         2577Mi
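(If it helps, the number kubectl top prints is just what the metrics.k8s.io API serves, which is also what the HPA consumes for resource metrics. A sketch of reading it directly, using the pod above, with jq only for pretty-printing:)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/my-namespace/pods/liink-goquorum-node-learner-quorum-0" | jq .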
The Metrics Server documentation advises against using it as an accurate source of metrics.
When I go inside the pod and read the cgroup memory usage directly, the value is vastly different:
root@container[liink-goquorum-node-learner-quorum-0]:/sys/fs/cgroup/memory# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
8921739264
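From what I can tell (this is an assumption on my part), the working-set value that cAdvisor/Metrics Server report is roughly this cgroup usage minus the inactive file cache, which I can approximate inside the container (cgroup v1, as in the path above) like this:
USAGE=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
INACTIVE_FILE=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
echo $((USAGE - INACTIVE_FILE))   # roughly what the working-set metric should report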
When I check in Prometheus (as advised on the Metrics Server GitHub page), the memory utilization is closer to the value I get inside the container:
container_memory_usage_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"} => 8923037696
However, when I run the query below, the value is closer to what Metrics Server reports:
container_memory_working_set_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"} => 2704633856
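My assumption is that the gap between the two metrics is mostly page cache; queries along these lines (assuming cAdvisor's container_memory_cache metric is also being scraped) should show what accounts for the difference:
container_memory_usage_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"}
  - container_memory_working_set_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"}
container_memory_cache{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"}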
So which one is the best source for the current memory usage of the pod? Also, if Metrics Server can't be trusted for accurate usage, are the autoscaling decisions based on it wrong? What am I missing here?
Below are the explanations of these metrics from the cAdvisor documentation, which are not very clear to me:
container_memory_usage_bytes: Current memory usage, including all memory regardless of when it was accessed
container_memory_working_set_bytes: Current working set
I also read the cgroups documentation on memory usage (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt), which leads me to believe that the current container utilization is closer to what I get from the cat command.
Can someone help me understand the correct way to get the current memory utilization and whether I am missing something here?
Both values are correct; they just describe different categories of memory utilization.
My understanding:
container_memory_working_set_bytes includes the memory allocated to your application. This memory is critical for the application to function, so this metric is the one monitored most of the time and of the most interest. OOM kills and autoscaling decisions are based on its value.
container_memory_usage_bytes includes, in addition to the memory allocated to your application, all other kinds of memory used by the pod: filesystem (page) cache, shared memory, and so on. In theory this memory is not critical and can be reclaimed when needed.
I believe container_memory_usage_bytes is closer to "current memory usage for the pod", but you should remember that it is not the same as the memory usage of the application in the pod.
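To make that concrete: since the OOM killer and memory-based autoscaling effectively react to the working set, a rough way to watch a pod's headroom against its limit is a query like the one below. This is only a sketch: it assumes the container has a memory limit set, that cAdvisor's container_spec_memory_limit_bytes series is scraped, and it filters out the pod-level series with the empty container label:
container_memory_working_set_bytes{namespace="<my-namespace>", pod="liink-goquorum-node-learner-quorum-0", container!="", container!="POD"}
  / container_spec_memory_limit_bytes{namespace="<my-namespace>", pod="liink-goquorum-node-learner-quorum-0", container!="", container!="POD"}
A value approaching 1 means the container is close to its limit and therefore close to being OOM-killed.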