Amazon EKS

Data Collection Setup

Metrics are collected via the Amazon EKS API endpoint and Kubernetes cluster API. See the Network Requirements for more details.

Network Requirements

The BindPlane Collector MUST have access to the following:

The EKS API Server: Port: 443 (TCP) HTTPS to the Amazon CloudWatch API
Kubelet Port: 10250 (TCP)
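Before deploying, it can help to confirm that the required ports are reachable from the host where the collector will run. The following is a minimal sketch using only the Python standard library; the function name `can_connect` and the host values in the comments are hypothetical placeholders, not part of BindPlane.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Hypothetical targets: substitute your API server host and a worker node IP.
# can_connect("EXAMPLE.gr7.us-east-1.eks.amazonaws.com", 443)   # EKS API server
# can_connect("10.0.1.23", 10250)                               # kubelet on a worker node
```

A `False` result for port 10250 usually points at the worker node security group rather than at the collector itself.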

🚧
For users intending to deploy a BindPlane Collector outside the VPC in which the EKS cluster resides, please be aware of the following:
The security group used to control access to the individual networks must open port 10250 to the collector. This change is usually made in the CloudFormation template used to build the worker nodes, or in whatever other system manages the EKS worker nodes.
In the Kubernetes Source configuration, the External IPs selection may need to be made if the collector resides on a network segment that does not have access to the private worker node IPs.

Least Privileged User

Amazon User

Navigate to the AWS console and create an IAM user with programmatic access. The user needs the permissions listed below. You can create a policy containing only these permissions and attach it to the user.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*",
                "ec2:Get*",
                "ec2:Search*",
                "eks:Get*",
                "eks:Describe*"
            ],
            "Resource": "*"
        }
    ]
}
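If the policy is later edited by hand, a quick check that it still grants only read-style actions can catch accidental privilege creep. The sketch below is illustrative only: the helper name `non_read_only_actions` and the idea of treating the `Describe`, `Get`, and `Search` prefixes as the read-only set are assumptions drawn from the policy above, not an AWS or BindPlane feature.

```python
import json

# Action prefixes used by the policy in this document.
READ_ONLY_VERBS = ("Describe", "Get", "Search")

POLICY = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["ec2:Describe*", "ec2:Get*", "ec2:Search*", "eks:Get*", "eks:Describe*"],
            "Resource": "*"
        }
    ]
}
"""


def non_read_only_actions(policy_json: str) -> list:
    """Return every allowed action whose verb does not start with a read-only prefix."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        for action in actions:
            verb = action.split(":", 1)[-1]  # "ec2:Describe*" -> "Describe*"
            if not verb.startswith(READ_ONLY_VERBS):
                flagged.append(action)
    return flagged


print(non_read_only_actions(POLICY))
```

An empty list means every allowed action falls under one of the three read-only prefixes.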

EKS Cluster: Setting Up A Monitoring User

To set up a monitoring user on the EKS cluster:

Ensure that you have the aws-iam-authenticator installed. See its installation documentation for more details.

📘
Where do I install the aws-iam-authenticator?
The installation of the aws-iam-authenticator must be done on your main management workstation.

Install the bindplane-monitoring role. To do this, run the following commands against your Kubernetes Cluster:

kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-clusterrole.yaml

kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-clusterrolebinding.yaml

kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-role.yaml

kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-rolebinding.yaml

kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-serviceaccount.yaml

Obtain the token to use with the monitoring user. The example below is for a Linux system.

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep bluemedora | awk '{print $1}')
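The command above prints the full secret description, and the bearer token is the value of its `token:` field. As a rough sketch, the token could be pulled out of that text as follows. The function name `extract_token` is hypothetical, and `SAMPLE` is an illustrative, shortened description rather than real output from a cluster.

```python
def extract_token(describe_output: str) -> str:
    """Return the value of the first `token:` field in `kubectl describe secret` output."""
    for line in describe_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "token":
            return value.strip()
    raise ValueError("no 'token:' field found in the secret description")


# Illustrative sample only; a real token is a much longer JWT.
SAMPLE = """\
Name:         bluemedora-token-abcde
Namespace:    kube-system
Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiJ9.example.signature
"""

print(extract_token(SAMPLE))
```

If the `grep bluemedora` filter matches more than one secret, this sketch returns the token of the first one described.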

BindPlane Collector & Kubernetes Source Setup

Gather information from your EKS cluster

From the main EKS configuration page, the following information will be needed:

The API Server endpoint
VPC
Subnet(s)
Security Group
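The same four values are also returned by the EKS DescribeCluster API (for example, via `aws eks describe-cluster --name <cluster>`). The sketch below shows one way to pick them out of that JSON response; the function name `cluster_network_summary` and every value in `SAMPLE_RESPONSE` are hypothetical.

```python
import json


def cluster_network_summary(describe_cluster_json: str) -> dict:
    """Extract the endpoint, VPC, subnets, and security groups from a DescribeCluster response."""
    cluster = json.loads(describe_cluster_json)["cluster"]
    vpc_config = cluster["resourcesVpcConfig"]
    return {
        "endpoint": cluster["endpoint"],
        "vpc_id": vpc_config["vpcId"],
        "subnet_ids": vpc_config["subnetIds"],
        "security_group_ids": vpc_config["securityGroupIds"],
    }


# Hypothetical, trimmed response used only to illustrate the shape of the data.
SAMPLE_RESPONSE = """
{
    "cluster": {
        "name": "example-cluster",
        "endpoint": "https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com",
        "resourcesVpcConfig": {
            "vpcId": "vpc-0example",
            "subnetIds": ["subnet-0aaa", "subnet-0bbb"],
            "securityGroupIds": ["sg-0example"]
        }
    }
}
"""

print(cluster_network_summary(SAMPLE_RESPONSE))
```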

Deploy a collector

Select the AWS CloudFormation platform when deploying a collector in BindPlane:

Use the same VPC, Subnet, and Security Group collected in the previous step
"Assign a public IP" can be left off unless you need it for administrative purposes
Select a Key Pair to allow for the management of the system
Click Deploy

Configure the Kubernetes Source

Host: The API Server endpoint collected previously, without the leading https://
Port: 443
Kubelet Port: 10250
Credentials: Use the monitoring token from above
Use External or Internal IPs: Use Internal IPs
SSL Configuration: No Verify
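The values listed above can be assembled from the endpoint and token gathered earlier. The sketch below is only an illustration of that mapping: the helper names and the dictionary layout are hypothetical and do not represent a BindPlane API or configuration file format.

```python
def host_from_endpoint(endpoint: str) -> str:
    """Strip the leading https:// (and any trailing slash) from an API server endpoint."""
    host = endpoint.strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    return host.rstrip("/")


def kubernetes_source_settings(endpoint: str, token: str) -> dict:
    """Collect the Kubernetes Source field values described in this section."""
    return {
        "Host": host_from_endpoint(endpoint),
        "Port": 443,
        "Kubelet Port": 10250,
        "Credentials": token,
        "Use External or Internal IPs": "Internal IPs",
        "SSL Configuration": "No Verify",
    }


# Hypothetical endpoint and token placeholders.
print(kubernetes_source_settings("https://EXAMPLE.gr7.us-east-1.eks.amazonaws.com", "<monitoring token>"))
```

The prefix is removed with plain string handling rather than a URL parser so that the case of the endpoint hostname is preserved exactly as EKS reports it.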

Connection Parameters

Name	Required?	Description
Region
Access Key ID	Required
Secret Access Key	Required
Additional Threads		The number of additional threads allowed to be utilized during collection.
Request Timeout (seconds)		The number of seconds to allow for the API to return a response.
Host	Required	The Kubernetes host to connect to.
Kubelet Port		The port for communication to the Kubernetes kubelet.
Bearer Token	Required
Use External or Internal IPs		Determines if Node external or internal IP addresses are used for connection.
SSL Configuration		The SSL mode to use when connecting to the target. Can be configured to not use SSL (No SSL), use SSL but do not verify the target's certificate (No Verify), and use SSL and verify the target's certificate (Verify).
Cluster Name	Required	The name of the cluster as defined in EKS.
Event Cutoff Time		Only events with a 'last updated' timestamp that is after this timestamp will be collected.
Connection Timeout		The number of seconds to allow for connecting to the target.
Max Simultaneous Kubelet Requests		Sets the maximum simultaneous requests to Kubelets. Set to 0 to disable throttling.
Collect Containers		Toggles the collection of containers.
Collect Volumes		Toggles the collection of volumes.
Collect from Kubelet API		Toggles the collection from the cAdvisor API used by the Kubelets.
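To make the three SSL Configuration modes concrete, the sketch below shows how a generic Python client might interpret them using the standard `ssl` module. This is an analogy under stated assumptions, not a description of how the BindPlane Collector implements the setting; the function name `ssl_context_for` is hypothetical.

```python
import ssl
from typing import Optional


def ssl_context_for(mode: str) -> Optional[ssl.SSLContext]:
    """Map an SSL Configuration mode name to a client-side SSL context."""
    if mode == "No SSL":
        # Plain connection: no TLS context at all.
        return None
    context = ssl.create_default_context()
    if mode == "Verify":
        # Defaults already require a valid certificate and a matching hostname.
        return context
    if mode == "No Verify":
        # Traffic is still encrypted, but the server certificate is not checked.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    raise ValueError(f"unknown SSL Configuration mode: {mode!r}")
```

No Verify keeps encryption but accepts any certificate, so it does not protect against an impersonated endpoint; Verify adds that protection when the certificate chain is trusted by the client.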

Metrics

Cluster

Name	Description
ARN	The Amazon Resource Name (ARN) of the cluster.
Average CPU Usage (Nanocores)	The current average CPU usage per node in this Kubernetes Cluster.
Average Filesystem Space (Bytes)	The average filesystem space per node in this Kubernetes Cluster.
Average Filesystem Used (Bytes)	The average used filesystem space per node in this Kubernetes Cluster.
Average Memory Usage (Bytes)	The current average memory usage per node in this Kubernetes Cluster.
Certificate Authority	The base64 encoded certificate data required to communicate with your cluster.
Client Request Token	Unique, case-sensitive identifier that you provide to ensure the idempotency of the request.
Created At (Seconds)	The Unix epoch timestamp in seconds for when the cluster was created.
Endpoint	The endpoint for your Kubernetes API server.
Failed Pod Count	The current number of pods in this Kubernetes Cluster in a Failed phase.
Name	The name of the cluster.
Node Count	The current number of nodes in this Kubernetes Cluster.
Not Ready Node Count	The current number of nodes in this Kubernetes Cluster in a Not Ready condition.
Not Ready Pod Count	The current number of pods in this Kubernetes Cluster in a Not Ready condition.
Pending Pod Count	The current number of pods in this Kubernetes Cluster in a Pending phase.
Platform Version	The platform version of your Amazon EKS cluster.
Pod Capacity	The total pod capacity of this Kubernetes Cluster.
Pod Count	The current number of pods in this Kubernetes Cluster.
Ready Node Count	The current number of nodes in this Kubernetes Cluster in a Ready condition.
Ready Pod Count	The current number of pods in this Kubernetes Cluster in a Ready condition.
Region	The AWS Region this object belongs to.
Role ARN	The Amazon Resource Name (ARN) of the IAM role that provides permissions for the Kubernetes control plane to make calls to AWS API operations on your behalf.
Running Not Ready Pod Count	The current number of pods in this Kubernetes Cluster in a Running phase, but not a Ready condition.
Running Pod Count	The current number of pods in this Kubernetes Cluster in a Running phase.
Status	The current status of the cluster.
Succeeded Pod Count	The current number of pods in this Kubernetes Cluster in a Succeeded phase.
Total CPU Capacity (Nanocores)	The total CPU capacity over all nodes in this Kubernetes Cluster.
Total CPU Usage (Nanocores)	The current CPU usage over all nodes in this Kubernetes Cluster.
Total CPU Usage Ratio (%)	The current CPU usage over all nodes in this Kubernetes Cluster, expressed as a percentage.
Total Filesystem Space (Bytes)	The total filesystem space over all nodes in this Kubernetes Cluster.
Total Filesystem Usage (%)	The current filesystem usage over all nodes in this Kubernetes Cluster.
Total Filesystem Used (Bytes)	The total used filesystem space over all nodes in this Kubernetes Cluster.
Total Memory (Bytes)	The total memory capacity over all nodes in this Kubernetes Cluster.
Total Memory Usage (Bytes)	The current memory usage over all nodes in this Kubernetes Cluster.
Total Memory Usage Ratio (%)	The current memory usage over all nodes in this Kubernetes Cluster, expressed as a percentage.
Total Received Data (Bytes)	The total received data over all nodes in this Kubernetes Cluster.
Total Transmitted Data (Bytes)	The total transmitted data over all nodes in this Kubernetes Cluster.
Unknown Node Count	The current number of nodes in this Kubernetes Cluster in an Unknown condition.
Unknown Pod Count	The current number of pods in this Kubernetes Cluster in an Unknown condition.
Version	The Kubernetes server version for the cluster.
VPC ID	The VPC associated with your cluster.
VPC Security Group IDs	The security groups associated with the cross-account elastic network interfaces that are used to allow communication between your worker nodes and the Kubernetes control plane.
VPC Subnet IDs	The subnets associated with your cluster.

Container

Name	Description
Available Space (Bytes)	The storage space available for the filesystem.
CPU Limit (Nanocores)	The maximum CPU usage this container is allowed to consume.
CPU Limit Usage (%)	The total CPU Usage in proportion to the CPU Limit.
CPU Request (Nanocores)	The minimum CPU usage this container requires for it to be scheduled.
CPU Request Usage (%)	The total CPU Usage in proportion to the CPU Request.
CPU Time (Nanoseconds)	Cumulative CPU time (sum of all cores) since object creation.
CPU Usage (Nanocores)	Total CPU usage (sum of all cores) averaged over the sample window.
Free inodes	The number of free inodes in the filesystem.
Image	The Image name of this container.
Major Page Faults	Cumulative number of major page faults.
Memory Limit (Bytes)	The maximum Memory usage this container is allowed to consume.
Memory Limit Usage (%)	The total Memory Usage in proportion to the Memory Limit.
Memory Request (Bytes)	The minimum memory this container requires for it to be scheduled.
Memory Request Usage (%)	The total Memory Usage in proportion to the Memory Request.
Memory Usage (Bytes)	Total memory in use. This includes all memory regardless of when it was accessed.
Minor Page Faults	Cumulative number of minor page faults.
Name	Name of the Container.
Pod Name	Name of the Pod the Container is running on.
Pod Namespace	Namespace of the Pod the Container is running on.
Ready	The Container is Ready when it is ready to start accepting traffic.
Region	The AWS Region this object belongs to.
Restart Count	The number of times this container has restarted.
RSS Usage (Bytes)	The amount of anonymous and swap cache memory (includes transparent hugepages).
Start Time	The time that this container started running.
Total inodes	The total number of inodes in the filesystem.
Total Space (Bytes)	The total capacity of the filesystem's underlying storage.
Used inodes	The number of inodes used by the filesystem.
Used Space (Bytes)	The storage space used on the filesystem.
Used Space Ratio (%)	The storage space used on the filesystem, expressed as a percentage.
Working Set Usage (Bytes)	The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory.

Deployment

Name	Description
Available Replicas	Total number of available pods (ready for at least minReadySeconds) targeted by this deployment.
Collision Count	Count of hash collisions for the Deployment. The Deployment controller uses this field as a collision avoidance mechanism when it needs to create the name for the newest ReplicaSet.
Desired Replicas	Number of desired pods. This is a pointer to distinguish between explicit zero and not specified.
Minimum Ready (Seconds)	Minimum number of seconds for which a newly created pod should be ready without any of its containers crashing, for it to be considered available.
Name	Name of the Deployment.
Namespace	Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated.
Observed Generation	The generation observed by the deployment controller.
Paused	Indicates that the deployment is paused.
Progress Deadline (Seconds)	The maximum time in seconds for a deployment to make progress before it is considered to be failed.
Ready Replicas	Total number of ready pods targeted by this deployment.
Region	The AWS Region this object belongs to.
Replicas	Total number of non-terminated pods targeted by this deployment (their labels match the selector).
Revision History Limit	The number of old ReplicaSets to retain to allow rollback. This is a pointer to distinguish between explicit zero and not specified.
Strategy	Type of deployment.
Unavailable Replicas	Total number of unavailable pods targeted by this deployment. This is the total number of pods that are still required for the deployment to have 100% available capacity. They may either be pods that are running but not yet available or pods that still have not been created.
Updated Replicas	Total number of non-terminated pods targeted by this deployment that have the desired template spec.

Job

Name	Description
Active Deadline (Seconds)	Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it. The value must be a positive integer.
Active Pods	The number of actively running pods.
Backoff Limit	Specifies the number of retries before marking this job failed.
Completion Time	Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
Completions	Specifies the desired number of successfully finished pods the job should be run with. Setting to nil means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value. Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job.
Failed Pods	The number of pods which reached phase Failed.
Manual Selector	Controls generation of pod labels and pod selectors. Leave `manualSelector` unset unless you are certain what you are doing. When false or unset, the system picks labels unique to this job and appends those labels to the pod template. When true, the user is responsible for picking unique labels and specifying the selector. Failure to pick a unique label may cause this and other jobs to not function correctly.
Name	Name of the Job.
Namespace	Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated.
Parallelism	Specifies the maximum desired number of pods the job should run at any given time. The actual number of pods running in steady state will be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism), i.e. when the work left to do is less than max parallelism.
Region	The AWS Region this object belongs to.
Start Time	Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
Succeeded Pods	The number of pods which reached phase Succeeded.
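The Parallelism description above implies a simple rule for how many pods to expect in steady state: the smaller of the configured parallelism and the work remaining. The following sketch expresses that rule; the function name `expected_running_pods` is hypothetical, and treating an unset Completions value as "parallelism is the only cap" is an assumption based on the Completions description.

```python
from typing import Optional


def expected_running_pods(parallelism: int, completions: Optional[int], succeeded: int) -> int:
    """Estimate the steady-state number of running pods for a Job."""
    if completions is None:
        # Completions unset: only the parallelism setting bounds the pod count.
        return parallelism
    remaining = max(completions - succeeded, 0)  # work left to do
    return min(parallelism, remaining)


print(expected_running_pods(parallelism=5, completions=12, succeeded=0))
print(expected_running_pods(parallelism=5, completions=12, succeeded=10))
```

In the second call only two completions remain, so fewer pods run than the configured parallelism of five.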

Namespace

Name	Description
Average CPU Usage (Nanocores)	The current average CPU usage per pod in this Namespace.
Average Memory Usage (Bytes)	The current average memory usage per pod in this Namespace.
Failed Pod Count	The current number of pods hosted by this Namespace in a Failed phase.
Name	Name of the Namespace.
Not Ready Pod Count	The current number of pods hosted by this Namespace in a Not Ready condition.
Pending Pod Count	The current number of pods hosted by this Namespace in a Pending phase.
Phase	Current condition of the Namespace.
Pod Count	The current number of pods hosted by this Namespace.
Ready Pod Count	The current number of pods hosted by this Namespace in a Ready condition.
Region	The AWS Region this object belongs to.
Running Not Ready Pod Count	The current number of pods hosted by this Namespace in a Running phase, but not a Ready condition.
Running Pod Count	The current number of pods hosted by this Namespace in a Running phase.
Succeeded Pod Count	The current number of pods hosted by this Namespace in a Succeeded phase.
Total CPU Usage (Nanocores)	The current CPU usage over all pods in this Namespace.
Total Memory Usage (Bytes)	The current memory usage over all pods in this Namespace.
Unknown Pod Count	The current number of pods hosted by this Namespace in an Unknown condition.

Node

Name	Description
Allocatable CPU (Nanocores)	The amount of CPU on the Node available for scheduling.
Allocatable Memory (Bytes)	The amount of memory on the Node available for scheduling.
Allocatable Pods	The number of pods available for scheduling.
Architecture	The Architecture reported by the Node.
Available Memory (Bytes)	Available memory for use. This is defined as (Total Memory - Working Set Usage).
Available Space (Bytes)	The storage space available for the filesystem.
Container Runtime Version	ContainerRuntime Version reported by the Node through runtime remote API.
Containers CPU Limit (Nanocores)	The maximum CPU usage the containers on this node are allowed to consume.
Containers CPU Request (Nanocores)	The minimum CPU usage required by the containers on this node.
Containers Memory Limit (Bytes)	The maximum Memory usage the containers on this node are allowed to consume.
Containers Memory Request (Bytes)	The minimum memory required by the containers on this node.
CPU Capacity (Nanocores)	Total CPU capacity of the Node.
CPU Time (Nanoseconds)	Cumulative CPU time (sum of all cores) since object creation.
CPU Usage (Nanocores)	Total CPU usage (sum of all cores) averaged over the sample window.
CPU Usage Ratio (%)	Total CPU usage (sum of all cores) averaged over the sample window, expressed as a percentage.
Creation Timestamp	A timestamp representing the server time when Node object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value.
External ID	External ID of the Node assigned by some machine database (e.g. a cloud provider).
External IP	The external IP address of the Node.
Failed Pod Count	The current number of pods hosted by this Node in a Failed phase.
Five Minute Evictions	The number of Pods evicted from this node in the last five minutes.
Free inodes	The number of free inodes in the filesystem.
Hostname	The hostname of the Node.
Internal IP	The internal IP address of the Node.
Is Master	Indicates if the Node is hosting master components (kube-apiserver, kube-controller-manager, kube-scheduler).
Kernel Version	Kernel Version reported by the Node from 'uname -r'.
Kubelet Version	Kubelet Version reported by the Node.
KubeProxy Version	KubeProxy Version reported by the Node.
Machine ID	MachineID reported by the Node. For unique machine identification in the cluster this field is preferred.
Major Page Faults	Cumulative number of major page faults.
Memory Usage (Bytes)	Total memory in use. This includes all memory regardless of when it was accessed.
Memory Usage Ratio (%)	Total memory in use, expressed as a percentage. This includes all memory regardless of when it was accessed.
Minor Page Faults	Cumulative number of minor page faults.
Not Ready Pod Count	The current number of pods hosted by this Node in a Not Ready condition.
Operating System	The Operating System reported by the Node.
OS Image	OS Image reported by the Node from /etc/os-release.
Pending Pod Count	The current number of pods hosted by this Node in a Pending phase.
Pod Capacity	Total pod capacity of the Node.
Pod Count	The current number of pods hosted by this Node.
Provider ID	ID of the Node assigned by the cloud provider in the format: <ProviderName>://<ProviderSpecificNodeID>.
Ready	True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last 40 seconds.
Ready Pod Count	The current number of pods hosted by this Node in a Ready condition.
Receive Errors (Errors)	Cumulative count of receive errors encountered.
Received Data (Bytes)	Cumulative amount of data received.
Region	The AWS Region this object belongs to.
RSS Usage (Bytes)	The amount of anonymous and swap cache memory (includes transparent hugepages).
Running Not Ready Pod Count	The current number of pods hosted by this Node in a Running phase, but not a Ready condition.
Running Pod Count	The current number of pods hosted by this Node in a Running phase.
Succeeded Pod Count	The current number of pods hosted by this Node in a Succeeded phase.
Total inodes	The total number of inodes in the filesystem.
Total Memory (Bytes)	Total memory capacity of the Node.
Total Space (Bytes)	The total capacity of the filesystem's underlying storage.
Transmit Errors (Errors)	Cumulative count of transmit errors encountered.
Transmitted Data (Bytes)	Cumulative amount of data transmitted.
UID	UID of the Node.
Unknown Pod Count	The current number of pods hosted by this Node in an Unknown condition.
Used inodes	The number of inodes used by the filesystem.
Used Space (Bytes)	The storage space used on the filesystem.
Used Space Ratio (%)	The storage space used on the filesystem, expressed as a percentage.
Working Set Usage (Bytes)	The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory.

Pod

Name	Description
Available Ephemeral Space (Bytes)	The ephemeral storage space available.
Available Memory (Bytes)	Available memory for use. This is defined as (Memory Limit - Working Set Usage). If Memory Limit is undefined, this metric is not returned.
Component	The component label of the Pod.
CPU Limit (Nanocores)	The maximum CPU usage the containers on this pod are allowed to consume.
CPU Limit Usage (%)	The total CPU Usage in proportion to the CPU Limit.
CPU Request (Nanocores)	The minimum CPU usage required by the containers on this pod.
CPU Request Usage (%)	The total CPU Usage in proportion to the CPU Request.
CPU Time (Nanoseconds)	Cumulative CPU time (sum of all cores) since object creation.
CPU Usage (Nanocores)	Total CPU usage (sum of all cores) averaged over the sample window.
DNS Policy	The DNS policy for the Pod. Defaults to "ClusterFirst". Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to 'ClusterFirstWithHostNet'. Note that 'None' policy is an alpha feature introduced in v1.9 and CustomPodDNS feature gate must be enabled to use it.
Free Ephemeral inodes	The number of free ephemeral inodes.
Host IP Address	IP address of the host to which the pod is assigned. Empty if not yet scheduled.
Major Page Faults	Cumulative number of major page faults.
Memory Limit (Bytes)	The maximum memory usage the containers on this pod are allowed to consume.
Memory Limit Usage (%)	The total Memory Usage in proportion to the Memory Limit.
Memory Request (Bytes)	The minimum memory required by the containers on this pod.
Memory Request Usage (%)	The total Memory Usage in proportion to the Memory Request.
Memory Usage (Bytes)	Total memory in use. This includes all memory regardless of when it was accessed.
Minor Page Faults	Cumulative number of minor page faults.
Name	Name of the Pod.
Namespace	Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated.
Phase	Current condition of the Pod.
Pod IP Address	IP address allocated to the pod. Routable at least within the cluster. Empty if not yet allocated.
QoS Class	The Quality of Service (QoS) classification assigned to the Pod based on resource requirements.
Ready	True if all containers on this Pod are in a ready state.
Region	The AWS Region this object belongs to.
Restart Count	The number of times Containers running on this Pod have restarted.
Restart Policy	Restart policy for all containers within the Pod. One of Always, OnFailure, Never. Default to Always.
RSS Usage (Bytes)	The amount of anonymous and swap cache memory (includes transparent hugepages).
Total Ephemeral inodes	The total number of ephemeral inodes.
Total Ephemeral Space (Bytes)	The total ephemeral storage capacity.
Used Ephemeral inodes	The number of ephemeral inodes used.
Used Ephemeral Space (Bytes)	The ephemeral storage space used.
Used Ephemeral Space Ratio (%)	The ephemeral storage space used, expressed as a percentage.
Working Set Usage (Bytes)	The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory.
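Several Pod metrics in the table above are derived from others. As a worked sketch of two of them, the code below computes Available Memory as (Memory Limit - Working Set Usage) and a limit-usage percentage as usage in proportion to the limit. The function names are hypothetical, and returning `None` when the limit is undefined mirrors the note that the metric is not returned in that case.

```python
from typing import Optional


def available_memory(memory_limit: Optional[int], working_set: int) -> Optional[int]:
    """Available Memory (Bytes) = Memory Limit - Working Set Usage; None if no limit is set."""
    if memory_limit is None:
        return None
    return memory_limit - working_set


def usage_percent(usage: float, bound: Optional[float]) -> Optional[float]:
    """Usage as a percentage of a limit or request; None if the bound is missing or zero."""
    if not bound:
        return None
    return 100.0 * usage / bound


MIB = 1024 ** 2
print(available_memory(512 * MIB, 128 * MIB))     # bytes left under the memory limit
print(usage_percent(250_000_000, 1_000_000_000))  # CPU usage vs. a 1-core limit, in nanocores
```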

Volume

Name	Description
Available Space (Bytes)	The storage space available on this volume.
Free inodes	The number of free inodes in this volume.
Name	Name of the Volume.
Region	The AWS Region this object belongs to.
Total inodes	The total number of inodes in this volume.
Total Space (Bytes)	The total storage capacity of this volume.
Type	The type of the volume.
Used inodes	The number of inodes used in this volume.
Used Space (Bytes)	The storage space used on this volume.
