I'm encountering a problem with scheduling Prometheus pods on AWS EKS Fargate within my Kubernetes cluster. Here are the details of my setup:
I have configured an AWS EKS Fargate profile named terraform_eks_fargate_profile_monitoring for my Prometheus namespace (monitoring). Here's the relevant Terraform configuration:
resource "aws_eks_fargate_profile" "terraform_eks_fargate_profile_monitoring" {
fargate_profile_name = "monitoring"
cluster_name = aws_eks_cluster.terraform_eks_cluster.name
pod_execution_role_arn = aws_iam_role.terraform_eks_fargate_pods.arn
subnet_ids = aws_subnet.terraform_eks_vpc_private_subnets[*].id
selector {
namespace = "monitoring"
}
}
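For what it's worth, the profile's status can be checked with the AWS CLI (with <cluster-name> standing in for the actual cluster name):

aws eks describe-fargate-profile --cluster-name <cluster-name> --fargate-profile-name monitoring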
I've deployed Prometheus using Helm with the following command:
helm install prometheus prometheus-community/prometheus -n monitoring
However, when I describe the node-exporter pods, they are stuck in Pending with a FailedScheduling warning:
Name:             prometheus-prometheus-node-exporter-4kvv7
Namespace:        monitoring
Priority:         0
Service Account:  prometheus-prometheus-node-exporter
Node:             <none>
Labels:           app.kubernetes.io/component=metrics
                  app.kubernetes.io/instance=prometheus
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=prometheus-node-exporter
                  app.kubernetes.io/part-of=prometheus-node-exporter
                  app.kubernetes.io/version=1.7.0
                  controller-revision-hash=9bd9c77f
                  helm.sh/chart=prometheus-node-exporter-4.31.0
                  pod-template-generation=1
Annotations:      cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:           Pending
IP:
IPs:              <none>
Controlled By:    DaemonSet/prometheus-prometheus-node-exporter
Containers:
  node-exporter:
    Image:      quay.io/prometheus/node-exporter:v1.7.0
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --path.rootfs=/host/root
      --path.udev.data=/host/root/run/udev/data
      --web.listen-address=[$(HOST_IP)]:9100
    Liveness:   http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOST_IP:  0.0.0.0
    Mounts:
      /host/proc from proc (ro)
      /host/root from root (ro)
      /host/sys from sys (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
QoS Class:         BestEffort
Node-Selectors:    kubernetes.io/os=linux
Tolerations:       :NoSchedule op=Exists
                   node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                   node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                   node.kubernetes.io/not-ready:NoExecute op=Exists
                   node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                   node.kubernetes.io/unreachable:NoExecute op=Exists
                   node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  21m                default-scheduler  0/5 nodes are available: 1 Too many pods. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
  Warning  FailedScheduling  15s (x6 over 20m)  default-scheduler  0/7 nodes are available: 1 Too many pods. preemption: 0/7 nodes are available: 7 No preemption victims found for incoming pod.
I get this on all of the node-exporter pods; the pushgateway and metrics pods run fine. I'm not sure whether the resource requests and limits specified in the Prometheus pod configuration are compatible with Fargate.
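For reference, the requests and limits actually set on a pending pod can be inspected with something like this (pod name taken from the describe output above):

kubectl get pod prometheus-prometheus-node-exporter-4kvv7 -n monitoring \
  -o jsonpath='{.spec.containers[0].resources}'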
Since I installed with

helm install prometheus prometheus-community/prometheus -n monitoring

the chart uses its default resource specifications for the Prometheus server container.
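If explicit requests and limits turn out to matter for Fargate scheduling, my understanding is that they can be overridden through the chart values; a minimal sketch, assuming the server.resources key from the chart's values.yaml (the numbers are placeholders, not values I've tested):

# values.yaml
server:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi

applied with:

helm upgrade prometheus prometheus-community/prometheus -n monitoring -f values.yaml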