Kubernetes
Monitor Container Orchestration and Scaling
Data Collection Setup
Metrics are collected via the Kubernetes Host API and Kubelet API.
Blue Medora has created a Least Privileged User (LPU) file for secure data collection. See the Least Privileged User section below for detailed instructions.
In-depth information about Kubernetes authentication mechanisms can be found here: https://kubernetes.io/docs/reference/access-authn-authz/authentication/
Network Requirements
Ports for Kubernetes Host:
- Non-SSL: 8080 (TCP) Default
- SSL: 443 (TCP) Default
Ports for Kubelet API:
- Non-SSL: 10255 (TCP) Default
- SSL: 10250 (TCP) Default
Configurable Ports
The network ports for the Kubernetes servers may differ depending on your deployment.
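Before configuring the source, it can help to confirm that the ports listed above are reachable from the collector machine. The following is a minimal sketch, not part of BindPlane: the `DEFAULT_PORTS` names and the `port_is_open` helper are hypothetical, and the port numbers are the defaults listed above, which may differ in your deployment.

```python
import socket

# Default ports from the list above; the dictionary keys are illustrative names.
DEFAULT_PORTS = {
    "host_non_ssl": 8080,
    "host_ssl": 443,
    "kubelet_non_ssl": 10255,
    "kubelet_ssl": 10250,
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("kubernetes.example.com", DEFAULT_PORTS["host_ssl"])` (with a hypothetical host name) reports whether the SSL port of the Kubernetes host accepts TCP connections. A successful TCP connection does not confirm that authentication will succeed.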
Least Privileged User
Authentication is supported through a Bearer Token.
Install the bindplane-monitoring role
- Verify that you are pointed to the correct cluster and that your kubectl environment is set up properly:
kubectl cluster-info
- Next, run the following commands to install the bindplane-monitoring role:
kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-clusterrole.yaml
kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-clusterrolebinding.yaml
kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-role.yaml
kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-rolebinding.yaml
kubectl apply -f https://raw.githubusercontent.com/BlueMedoraPublic/bm-kube-lpu/master/bluemedora-lpu/bm-serviceaccount.yaml
Obtain the Bearer Token
- After the bindplane-monitoring role has been installed, retrieve the Bearer Token needed to configure the source in BindPlane by running:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep bluemedora | awk '{print $1}')
Use this Bearer Token in the Source configuration.
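The describe command prints several fields, and only the value after `token:` is needed. The sketch below shows one way to pull that value out of the command output. The `extract_token` helper and the `SAMPLE` text are hypothetical and only illustrate the general shape of `kubectl describe secret` output; real output contains a much longer token.

```python
from typing import Optional

# Illustrative output only; names and token value are made up.
SAMPLE = """\
Name:         bluemedora-token-abcde
Namespace:    kube-system
Type:         kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiJ9.example.signature
"""


def extract_token(describe_output: str) -> Optional[str]:
    """Return the value of the 'token' field, or None if it is absent."""
    for line in describe_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "token":
            return value.strip()
    return None


print(extract_token(SAMPLE))
```

Paste the returned value into the Bearer Token field without surrounding whitespace.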
Enable Role-Based Access Control (RBAC) authorization mode for Custom Installations
For manually installed Kubernetes clusters, ensure that you have RBAC enabled. Most Kubernetes deployments will have this enabled by default.
Before you can install the bluemedora-k8s-lpu.tar.gz file, your cluster must have RBAC authorization enabled.
1. Make sure the apiserver has been started with --authorization-mode=RBAC
2. If you have to change your configuration in order to enable this, you will also need to pass --authorization-rbac-super-user=admin, replacing 'admin' with the name of your admin account.
3. Most Kubernetes deployment tools have RBAC enabled by default. If yours does not, you will need to do some additional setup in order to get everything running smoothly. This blog post has a step-by-step walkthrough: http://blog.screwdriver.cd/post/150999692902/how-we-got-rbac-working-in-kubernetes
4. You will need to restart the apiserver if you had to update the configuration to enable this.
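Step 1 above can be checked against the apiserver command line. The sketch below assumes you have already obtained the apiserver arguments as a list of strings (for example from a process listing or a static pod manifest). The `rbac_enabled` helper is hypothetical, and it accepts RBAC appearing alongside other modes in a comma-separated list.

```python
from typing import List


def rbac_enabled(apiserver_args: List[str]) -> bool:
    """Return True if --authorization-mode is present and includes RBAC."""
    for i, arg in enumerate(apiserver_args):
        if arg.startswith("--authorization-mode="):
            modes = arg.split("=", 1)[1]
        elif arg == "--authorization-mode" and i + 1 < len(apiserver_args):
            # Flag and value passed as two separate arguments.
            modes = apiserver_args[i + 1]
        else:
            continue
        return "RBAC" in [m.strip() for m in modes.split(",")]
    return False


print(rbac_enabled(["kube-apiserver", "--authorization-mode=Node,RBAC"]))
```

If the function returns False for your apiserver arguments, update the configuration and restart the apiserver as described in the steps above.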
Supported Versions
Kubernetes: 1.16 - 1.20
How to Leverage BindPlane Metrics for Kubernetes HPA Setup
Pre-Requisites
Must be using the BindPlane for Stackdriver offering
The Stackdriver Project being used for the BindPlane Destination must contain the GKE cluster.
The Stackdriver Destination in BindPlane must have the bindplane_id label enabled. For more information, please read "Enabling the bindplane_id in Google Stackdriver".
Setup Kubernetes for GKE Metrics
1. Make sure your user has admin privileges on the GKE cluster.
You can add yourself with:
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
2. Verify the cluster has the correct access
For GKE cluster <my_cluster>, use the following command:
gcloud container clusters describe <my_cluster>
For each node pool, check the oauthScopes section. The https://www.googleapis.com/auth/monitoring scope should be listed there.
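With many node pools, reading the scopes by eye is error-prone. The sketch below assumes the describe command was run with gcloud's `--format=json` option and that each node pool exposes its scopes under `config.oauthScopes`. The `pools_missing_monitoring_scope` helper is hypothetical, and it only looks for the exact monitoring scope named above.

```python
import json
from typing import List

MONITORING_SCOPE = "https://www.googleapis.com/auth/monitoring"


def pools_missing_monitoring_scope(cluster_json: str) -> List[str]:
    """Return the names of node pools whose oauthScopes lack the monitoring scope."""
    cluster = json.loads(cluster_json)
    missing = []
    for pool in cluster.get("nodePools", []):
        scopes = pool.get("config", {}).get("oauthScopes", [])
        if MONITORING_SCOPE not in scopes:
            missing.append(pool.get("name", "<unnamed>"))
    return missing
```

An empty result means every node pool in the parsed output lists the monitoring scope.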
3. Deploy the Stackdriver Collector
Using the legacy resource model (under your GKE settings, titled "Legacy Stackdriver Monitoring"), run the following:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
Using the new resource model, run the following:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
4. Once deployed, you can view all metrics by running:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
5. Record the path given by this API to build the HPA configuration.
You can verify an individual metric by running the following command and reviewing the data that comes back:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/external.googleapis.com|bluemedora|generic_node|aerospike_server|namespace|memory|usage" | jq
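The metric name in the path above uses `|` where the Stackdriver metric type uses `/`. The sketch below shows how such a path could be assembled. Both helper names are hypothetical, and the `/`-separated metric type passed in is an assumption about how the metric appears in Stackdriver.

```python
def to_adapter_metric_name(metric_type: str) -> str:
    """Convert a '/'-separated metric type to the '|'-separated form used in the API path."""
    return metric_type.replace("/", "|")


def external_metric_path(metric_name: str, namespace: str = "*") -> str:
    """Build the external metrics API path for a metric name and namespace."""
    return (
        "/apis/external.metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/{metric_name}"
    )


name = to_adapter_metric_name(
    "external.googleapis.com/bluemedora/generic_node/aerospike_server/namespace/memory/usage"
)
print(external_metric_path(name))
```

The printed path can be passed to `kubectl get --raw`, quoted so that the shell does not interpret the `|` and `*` characters.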
6. Create an HPA resource.
Make sure you create the HPA using the v2beta1 API so that external metrics can be leveraged. An example configuration is below, but the details will depend heavily on the specific metric and resource.
Example Configuration
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice
  namespace: mynamespace
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metricName: external.googleapis.com|bluemedora|generic_node|aerospike_server|namespace|memory|usage
      metricSelector:
        matchLabels:
          metric.labels.bindplane_id: 35691764da932e9cabd24ae124de5577
      targetAverageValue: 300M
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
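To get a feel for how the example configuration scales, the sketch below works through the arithmetic for a targetAverageValue target: the total metric value is divided by the target and rounded up, then clamped to minReplicas and maxReplicas. This is a simplified illustration that ignores the autoscaler's tolerance and stabilization behavior. The helper names are hypothetical, and the quantity parser handles only a subset of Kubernetes suffixes.

```python
import math

# Subset of Kubernetes quantity suffixes (decimal and binary).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> float:
    """Parse a quantity such as '300M' or '1Gi' into a plain number."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(quantity)


def desired_replicas(total_value: float, target_average: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replica count needed so the per-replica average does not exceed the target."""
    raw = math.ceil(total_value / target_average)
    return max(min_replicas, min(max_replicas, raw))


target = parse_quantity("300M")  # 300,000,000
print(desired_replicas(750_000_000, target, 1, 5))
```

With the example's limits, a total metric value of 750,000,000 against a 300M target gives 3 replicas, and any total above 1,500,000,000 is capped at 5.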
Connection Parameters
Name | Required? | Description |
---|---|---|
Host | Required | The Kubernetes host to connect to. |
Port | | The port for communication to the Kubernetes host. |
Kubelet Port | | The port for communication to the Kubernetes kubelet. |
Bearer Token | Required | |
SSL Configuration | | The SSL mode to use when connecting to the target. Can be configured to not use SSL (No SSL), use SSL but do not verify the target's certificate (No Verify), and use SSL and verify the target's certificate (Verify). |
Cluster Name | | |
Event Cutoff Time | | Only events with a 'last updated' timestamp that is after this timestamp will be collected. |
Connection Timeout | | The number of seconds to allow for connecting to the target. |
Max Simultaneous Kubelet Requests | | Sets the maximum simultaneous requests to Kubelets. Set to 0 to disable throttling. |
Collect Containers | | Toggles the collection of containers. |
Collect Volumes | | Toggles the collection of volumes. |
Collect from Kubelet API | | Toggles the collection from the cAdvisor API used by the Kubelets. |
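The three SSL Configuration modes correspond to common TLS client behaviors. The sketch below shows how a client written in Python might map them onto the standard `ssl` module. It illustrates the concept only and is not how BindPlane implements the setting; the `ssl_context_for` helper is hypothetical.

```python
import ssl
from typing import Optional


def ssl_context_for(mode: str) -> Optional[ssl.SSLContext]:
    """Map an SSL Configuration mode to an SSLContext (None means plain TCP)."""
    if mode == "No SSL":
        return None
    context = ssl.create_default_context()
    if mode == "No Verify":
        # Hostname checking must be disabled before certificate verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    if mode == "Verify":
        # The default context already verifies the certificate and hostname.
        return context
    raise ValueError(f"unknown SSL mode: {mode!r}")
```

No Verify still encrypts traffic but does not authenticate the server, so Verify is the safer choice when the target's certificate chain is trusted by the collector machine.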
Metrics
Cluster
Name | Description |
---|---|
Average CPU Usage (Nanocores) | The current average CPU usage per node in this Kubernetes Cluster. |
Average Filesystem Space (Bytes) | The average filesystem space per node in this Kubernetes Cluster. |
Average Filesystem Used (Bytes) | The average used filesystem space per node in this Kubernetes Cluster. |
Average Memory Usage (Bytes) | The current average memory usage per node in this Kubernetes Cluster. |
Failed Pod Count | The current number of pods in this Kubernetes Cluster in a Failed phase. |
Node Count | The current number of nodes in this Kubernetes Cluster. |
Not Ready Node Count | The current number of nodes in this Kubernetes Cluster in a Not Ready condition. |
Not Ready Pod Count | The current number of pods in this Kubernetes Cluster in a Not Ready condition. |
Pending Pod Count | The current number of pods in this Kubernetes Cluster in a Pending phase. |
Pod Capacity | The total pod capacity of this Kubernetes Cluster. |
Pod Count | The current number of pods in this Kubernetes Cluster. |
Ready Node Count | The current number of nodes in this Kubernetes Cluster in a Ready condition. |
Ready Pod Count | The current number of pods in this Kubernetes Cluster in a Ready condition. |
Running Not Ready Pod Count | The current number of pods in this Kubernetes Cluster in a Running phase, but not a Ready condition. |
Running Pod Count | The current number of pods in this Kubernetes Cluster in a Running phase. |
Succeeded Pod Count | The current number of pods in this Kubernetes Cluster in a Succeeded phase. |
Total CPU Capacity (Nanocores) | The total CPU capacity over all nodes in this Kubernetes Cluster. |
Total CPU Usage (Nanocores) | The current CPU usage over all nodes in this Kubernetes Cluster. |
Total CPU Usage Ratio (%) | The current CPU usage over all nodes in this Kubernetes Cluster, as a percentage of total CPU capacity. |
Total Filesystem Space (Bytes) | The total filesystem space over all nodes in this Kubernetes Cluster. |
Total Filesystem Usage (%) | The current filesystem usage over all nodes in this Kubernetes Cluster, as a percentage of total filesystem space. |
Total Filesystem Used (Bytes) | The total used filesystem space over all nodes in this Kubernetes Cluster. |
Total Memory (Bytes) | The total memory capacity over all nodes in this Kubernetes Cluster. |
Total Memory Usage (Bytes) | The current memory usage over all nodes in this Kubernetes Cluster. |
Total Memory Usage Ratio (%) | The current memory usage over all nodes in this Kubernetes Cluster, as a percentage of total memory capacity. |
Total Received Data (Bytes) | The total received data over all nodes in this Kubernetes Cluster. |
Total Transmitted Data (Bytes) | The total transmitted data over all nodes in this Kubernetes Cluster. |
Unknown Node Count | The current number of nodes in this Kubernetes Cluster in an Unknown condition. |
Unknown Pod Count | The current number of pods in this Kubernetes Cluster in an Unknown condition. |
Container
Name | Description |
---|---|
Available Space (Bytes) | The storage space available for the filesystem. |
CPU Limit (Nanocores) | The maximum CPU usage this container is allowed to consume. |
CPU Limit Usage (%) | The total CPU Usage in proportion to the CPU Limit. |
CPU Request (Nanocores) | The minimum CPU usage this container requires for it to be scheduled. |
CPU Request Usage (%) | The total CPU Usage in proportion to the CPU Request. |
CPU Time (Nanoseconds) | Cumulative CPU time (sum of all cores) since object creation. |
CPU Usage (Nanocores) | Total CPU usage (sum of all cores) averaged over the sample window. |
Free inodes | The number of free inodes in the filesystem. |
Image | The Image name of this container. |
Major Page Faults | Cumulative number of major page faults. |
Memory Limit (Bytes) | The maximum Memory usage this container is allowed to consume. |
Memory Limit Usage (%) | The total Memory Usage in proportion to the Memory Limit. |
Memory Request (Bytes) | The minimum memory this container requires for it to be scheduled. |
Memory Request Usage (%) | The total Memory Usage in proportion to the Memory Request. |
Memory Usage (Bytes) | Total memory in use. This includes all memory regardless of when it was accessed. |
Minor Page Faults | Cumulative number of minor page faults. |
Name | Name of the Container. |
Pod Name | Name of the Pod the Container is running on. |
Pod Namespace | Namespace of the Pod the Container is running on. |
Ready | The Container is Ready when it is ready to start accepting traffic. |
Restart Count | The number of times this container has restarted. |
RSS Usage (Bytes) | The amount of anonymous and swap cache memory (includes transparent hugepages). |
Start Time | The time that this container started running. |
Total inodes | The total number of inodes in the filesystem. |
Total Space (Bytes) | The total capacity of the filesystem's underlying storage. |
Used inodes | The number of inodes used by the filesystem. |
Used Space (Bytes) | The storage space used on the filesystem. |
Used Space Ratio (%) | The storage space used on the filesystem, as a percentage of total space. |
Working Set Usage (Bytes) | The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory. |
Namespace
Name | Description |
---|---|
Average CPU Usage (Nanocores) | The current average CPU usage per pod in this Namespace. |
Average Memory Usage (Bytes) | The current average memory usage per pod in this Namespace. |
Failed Pod Count | The current number of pods hosted by this Namespace in a Failed phase. |
Name | Name of the Namespace. |
Not Ready Pod Count | The current number of pods hosted by this Namespace in a Not Ready condition. |
Pending Pod Count | The current number of pods hosted by this Namespace in a Pending phase. |
Phase | Current condition of the Namespace. |
Pod Count | The current number of pods hosted by this Namespace. |
Ready Pod Count | The current number of pods hosted by this Namespace in a Ready condition. |
Running Not Ready Pod Count | The current number of pods hosted by this Namespace in a Running phase, but not a Ready condition. |
Running Pod Count | The current number of pods hosted by this Namespace in a Running phase. |
Succeeded Pod Count | The current number of pods hosted by this Namespace in a Succeeded phase. |
Total CPU Usage (Nanocores) | The current CPU usage over all pods in this Namespace. |
Total Memory Usage (Bytes) | The current memory usage over all pods in this Namespace. |
Unknown Pod Count | The current number of pods hosted by this Namespace in an Unknown condition. |
Node
Name | Description |
---|---|
Allocatable CPU (Nanocores) | The amount of CPU on the Node available for scheduling. |
Allocatable Memory (Bytes) | The amount of memory on the Node available for scheduling. |
Allocatable Pods | The number of pods available for scheduling. |
Architecture | The Architecture reported by the Node. |
Available Memory (Bytes) | Available memory for use. This is defined as (Total Memory - Working Set Usage). |
Available Space (Bytes) | The storage space available for the filesystem. |
Container Runtime Version | ContainerRuntime Version reported by the Node through runtime remote API. |
Containers CPU Limit (Nanocores) | The maximum CPU usage the containers on this node are allowed to consume. |
Containers CPU Request (Nanocores) | The minimum CPU usage required by the containers on this node. |
Containers Memory Limit (Bytes) | The maximum Memory usage the containers on this node are allowed to consume. |
Containers Memory Request (Bytes) | The minimum memory required by the containers on this node. |
CPU Capacity (Nanocores) | Total CPU capacity of the Node. |
CPU Time (Nanoseconds) | Cumulative CPU time (sum of all cores) since object creation. |
CPU Usage (Nanocores) | Total CPU usage (sum of all cores) averaged over the sample window. |
CPU Usage Ratio (%) | Total CPU usage (sum of all cores) averaged over the sample window, as a percentage of the Node's CPU capacity. |
Creation Timestamp | A timestamp representing the server time when Node object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. |
External ID | External ID of the Node assigned by some machine database (e.g. a cloud provider). |
External IP | The external IP address of the Node. |
Failed Pod Count | The current number of pods hosted by this Node in a Failed phase. |
Five Minute Evictions | The number of Pods evicted from this node in the last five minutes. |
Free inodes | The number of free inodes in the filesystem. |
Hostname | The hostname of the Node. |
Internal IP | The internal IP address of the Node. |
Is Master | Indicates if the Node is hosting master components (kube-apiserver, kube-controller-manager, kube-scheduler). |
Kernel Version | Kernel Version reported by the Node from 'uname -r'. |
Kubelet Version | Kubelet Version reported by the Node. |
KubeProxy Version | KubeProxy Version reported by the Node. |
Machine ID | MachineID reported by the Node. For unique machine identification in the cluster this field is preferred. |
Major Page Faults | Cumulative number of major page faults. |
Memory Usage (Bytes) | Total memory in use. This includes all memory regardless of when it was accessed. |
Memory Usage Ratio (%) | Total memory in use, as a percentage of the Node's total memory. This includes all memory regardless of when it was accessed. |
Minor Page Faults | Cumulative number of minor page faults. |
Not Ready Pod Count | The current number of pods hosted by this Node in a Not Ready condition. |
Operating System | The Operating System reported by the Node. |
OS Image | OS Image reported by the Node from /etc/os-release. |
Pending Pod Count | The current number of pods hosted by this Node in a Pending phase. |
Pod Capacity | Total pod capacity of the Node. |
Pod Count | The current number of pods hosted by this Node. |
Provider ID | ID of the Node assigned by the cloud provider in the format: <ProviderName>://<ProviderSpecificNodeID>. |
Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last 40 seconds. |
Ready Pod Count | The current number of pods hosted by this Node in a Ready condition. |
Receive Errors (Errors) | Cumulative count of receive errors encountered. |
Received Data (Bytes) | Cumulative amount of data received. |
RSS Usage (Bytes) | The amount of anonymous and swap cache memory (includes transparent hugepages). |
Running Not Ready Pod Count | The current number of pods hosted by this Node in a Running phase, but not a Ready condition. |
Running Pod Count | The current number of pods hosted by this Node in a Running phase. |
Succeeded Pod Count | The current number of pods hosted by this Node in a Succeeded phase. |
Total inodes | The total number of inodes in the filesystem. |
Total Memory (Bytes) | Total memory capacity of the Node. |
Total Space (Bytes) | The total capacity of the filesystem's underlying storage. |
Transmit Errors (Errors) | Cumulative count of transmit errors encountered. |
Transmitted Data (Bytes) | Cumulative amount of data transmitted. |
UID | UID of the Node. |
Unknown Pod Count | The current number of pods hosted by this Node in an Unknown condition. |
Used inodes | The number of inodes used by the filesystem. |
Used Space (Bytes) | The storage space used on the filesystem. |
Used Space Ratio (%) | The storage space used on the filesystem, as a percentage of total space. |
Working Set Usage (Bytes) | The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory. |
Pod
Name | Description |
---|---|
Available Ephemeral Space (Bytes) | The ephemeral storage space available. |
Available Memory (Bytes) | Available memory for use. This is defined as (Memory Limit - Working Set Usage). If Memory Limit is undefined, this metric is not returned. |
Component | The component label of the Pod. |
CPU Limit (Nanocores) | The maximum CPU usage the containers on this pod are allowed to consume. |
CPU Limit Usage (%) | The total CPU Usage in proportion to the CPU Limit. |
CPU Request (Nanocores) | The minimum CPU usage required by the containers on this pod. |
CPU Request Usage (%) | The total CPU Usage in proportion to the CPU Request. |
CPU Time (Nanoseconds) | Cumulative CPU time (sum of all cores) since object creation. |
CPU Usage (Nanocores) | Total CPU usage (sum of all cores) averaged over the sample window. |
DNS Policy | The DNS policy for the Pod. Defaults to "ClusterFirst". Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to 'ClusterFirstWithHostNet'. Note that 'None' policy is an alpha feature introduced in v1.9 and CustomPodDNS feature gate must be enabled to use it. |
Free Ephemeral inodes | The number of free ephemeral inodes. |
Host IP Address | IP address of the host to which the pod is assigned. Empty if not yet scheduled. |
Major Page Faults | Cumulative number of major page faults. |
Memory Limit (Bytes) | The maximum memory usage the containers on this pod are allowed to consume. |
Memory Limit Usage (%) | The total Memory Usage in proportion to the Memory Limit. |
Memory Request (Bytes) | The minimum memory required by the containers on this pod. |
Memory Request Usage (%) | The total Memory Usage in proportion to the Memory Request. |
Memory Usage (Bytes) | Total memory in use. This includes all memory regardless of when it was accessed. |
Minor Page Faults | Cumulative number of minor page faults. |
Name | Name of the Pod. |
Namespace | Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated. |
Phase | Current condition of the Pod. |
Pod IP Address | IP address allocated to the pod. Routable at least within the cluster. Empty if not yet allocated. |
QoS Class | The Quality of Service (QoS) classification assigned to the Pod based on resource requirements. |
Ready | True if all containers on this Pod are in a ready state. |
Restart Count | The number of times Containers running on this Pod have restarted. |
Restart Policy | Restart policy for all containers within the Pod. One of Always, OnFailure, Never. Defaults to Always. |
RSS Usage (Bytes) | The amount of anonymous and swap cache memory (includes transparent hugepages). |
Total Ephemeral inodes | The total number of ephemeral inodes. |
Total Ephemeral Space (Bytes) | The total ephemeral storage capacity. |
Used Ephemeral inodes | The number of ephemeral inodes used. |
Used Ephemeral Space (Bytes) | The ephemeral storage space used. |
Used Ephemeral Space Ratio (%) | The ephemeral storage space used, as a percentage of total ephemeral space. |
Working Set Usage (Bytes) | The amount of working set memory. This includes recently accessed memory, dirty memory, and kernel memory. |
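Several Pod metrics above are derived from others: the limit and request usage percentages relate usage to the configured value, and Available Memory is Memory Limit minus Working Set Usage, omitted when no limit is defined. The sketch below works through that arithmetic. The helper names are hypothetical, and the handling of a missing or zero limit is an assumption.

```python
from typing import Optional

NANOCORES_PER_CORE = 1_000_000_000  # 1 CPU core = 10^9 nanocores


def usage_percent(usage: float, reference: Optional[float]) -> Optional[float]:
    """Usage as a percentage of a limit or request; None if the reference is unset or zero."""
    if not reference:
        return None
    return 100.0 * usage / reference


def pod_available_memory(memory_limit: Optional[int], working_set: int) -> Optional[int]:
    """Memory Limit minus Working Set Usage; None when no Memory Limit is defined."""
    if memory_limit is None:
        return None
    return memory_limit - working_set


# A pod using a quarter of a core against a one-core limit.
print(usage_percent(250_000_000, NANOCORES_PER_CORE))
```

For example, a pod with a 512 MiB memory limit and a 128 MiB working set has 384 MiB of available memory.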
Volume
Name | Description |
---|---|
Available Space (Bytes) | The storage space available in this volume. |
Free inodes | The number of free inodes in this volume. |
Name | Name of the Volume. |
Total inodes | The total number of inodes in this volume. |
Total Space (Bytes) | The total storage capacity of this volume. |
Type | The type of the volume. |
Used inodes | The number of inodes used in this volume. |
Used Space (Bytes) | The storage space used in this volume. |