Microsoft Azure Batch
Azure Batch Computing Monitoring
This source has been deprecated
observIQ is in the process of transitioning a subset of BindPlane's monitoring capabilities to the observIQ OpenTelemetry Collector. As a result, this Source is no longer publicly available in BindPlane. If you need access to this Source, please reach out to our support via chat or via [email protected].
Please refer to the Microsoft Azure Sources topic for additional information on how to configure the Least Privileged User (LPU), and for general Azure data collection setup details.
Least Privileged User
Steps:
- Using the Azure CLI Client, find the Subscription ID and Tenant ID from your account list
- Create a custom RBAC role using the JSON provided. Replace [Subscription ID] with your Subscription ID and save the file as azure.json
- Create an Active Directory Service Principal and assign the custom RBAC role to it.
Creating custom roles using the Azure CLI:
https://docs.microsoft.com/en-us/azure/role-based-access-control/custom-roles
Assigning roles using the Azure CLI:
https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal
{
"Name": "LPU Batch",
"Description": "LPU for Batch",
"Actions": [
"Microsoft.Batch/batchAccounts/*/read",
"Microsoft.Insights/metrics/*/read",
"Microsoft.Authorization/*/read"
],
"AssignableScopes": [
"/subscriptions/[Subscription ID]"
]
}
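
The substitution step above can also be scripted. The following is a minimal sketch, not part of BindPlane: the function names `build_lpu_role` and `write_role_file` are hypothetical, and the default description text is an assumption (the field is free text). It fills the Subscription ID into the role definition shown above and writes it to azure.json.

```python
import json
import uuid


def build_lpu_role(subscription_id: str,
                   description: str = "LPU for Azure Batch monitoring") -> dict:
    """Return the custom role definition with the subscription filled in.

    The description default is an assumption; any free text works.
    """
    # Raises ValueError if the value is not a GUID, e.g. when the
    # "[Subscription ID]" placeholder was left in by mistake.
    uuid.UUID(subscription_id)
    return {
        "Name": "LPU Batch",
        "Description": description,
        "Actions": [
            "Microsoft.Batch/batchAccounts/*/read",
            "Microsoft.Insights/metrics/*/read",
            "Microsoft.Authorization/*/read",
        ],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


def write_role_file(subscription_id: str, path: str = "azure.json") -> str:
    """Write the role definition as JSON and return the file path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_lpu_role(subscription_id), handle, indent=2)
    return path
```

With the file written, the role would typically be created with `az role definition create --role-definition azure.json`, and the service principal with `az ad sp create-for-rbac --role "LPU Batch" --scopes /subscriptions/[Subscription ID]`. Check the linked Azure documentation for the exact options supported by your CLI version.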
Connection Parameters
Name | Required? | Description |
---|---|---|
Subscription ID | Required | GUID Subscription ID |
Tenant ID | Required | GUID Tenant ID (also known as Directory ID) |
Client ID | Required | GUID Client ID (also known as Application ID) |
Client Secret | Required | The Secret (also known as Key) corresponding to the Client ID. |
Maximum HTTP Retry Time (seconds) | | The maximum amount of time in seconds to retry each API request when the API is throttling. |
HTTP Request Timeout (seconds) | | The maximum amount of time in seconds before a single HTTP request will fail. |
Monitor Metric Collection Level | | Selects which monitor metrics should be collected. |
Filter By Resource Group Type | | Selects whether to use a whitelist or blacklist when filtering by Resource Groups. |
Filter By Resource Group Whitelist | | A comma-separated list of resource groups to explicitly allow. A '*' character is used to represent 'all', and a blank string is used for 'none'. |
Filter By Resource Group Blacklist | | A comma-separated list of resource groups to filter out. A '*' character is used to represent 'all', and a blank string is used for 'none'. |
Filter By Tags Group Type | | Selects whether to use a whitelist or blacklist when filtering by Tags. |
Filter By Tags Group Whitelist | | A comma-separated list of tags to explicitly allow. Tags must follow the format <key:value>. Instead of a specific tag, or tag value, a '*' character is used to represent 'all'. A blank entry is treated as 'none'. |
Filter By Tags Group Blacklist | | A comma-separated list of tags to filter out. Tags must follow the format <key:value>. Instead of a specific tag, or tag value, a '*' character is used to represent 'all'. A blank entry is treated as 'none'. |
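
The whitelist and blacklist parameters can be easier to reason about with a small model. The sketch below is only an illustration of the semantics described in the table ('*' for 'all', blank for 'none', tags as key:value). It is not BindPlane's implementation, all function names are hypothetical, and treating an entry without a ':' as matching any value is an assumption.

```python
def parse_list(raw: str) -> list[str]:
    """Split a comma-separated parameter value, dropping blank entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def group_allowed(group: str, filter_type: str,
                  whitelist: str, blacklist: str) -> bool:
    """Decide whether a resource group passes the configured filter.

    filter_type is "whitelist" or "blacklist" and selects which list applies.
    """
    entries = parse_list(whitelist if filter_type == "whitelist" else blacklist)
    # '*' matches every group; an empty list matches none.
    matched = "*" in entries or group in entries
    # A whitelist keeps matches; a blacklist drops them.
    return matched if filter_type == "whitelist" else not matched


def tag_matches(tags: dict[str, str], entries: list[str]) -> bool:
    """Return True if any 'key:value' entry matches one of the resource's tags."""
    for entry in entries:
        key, sep, value = entry.partition(":")
        if not sep:
            value = "*"  # assumption: an entry without ':' matches any value
        for tag_key, tag_value in tags.items():
            if key in ("*", tag_key) and value in ("*", tag_value):
                return True
    return False
```

Under this model, a whitelist of `prod-rg, dev-rg` admits only those two groups, a blank whitelist admits nothing, and a blank blacklist filters nothing out.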
Metrics
Account
Name | Description |
---|---|
Active Job And Job Schedule Quota | The active job and job schedule quota for this batch account |
Auto Storage | The properties and status of any auto-storage account associated with the Batch account |
Creating Node Count | Number of nodes being created |
Dedicated Core Count | Total number of dedicated cores in the batch account |
Dedicated Core Quota | The dedicated core quota for this batch account |
Dedicated Node Count | Total number of dedicated nodes in the batch account |
Endpoint | The account endpoint used to interact with the Batch service |
ID | The ID of the batch account |
Idle Node Count | Number of idle nodes |
Last Key Sync | The time at which the auto-storage key was last synced |
Leaving Pool Node Count | Number of nodes leaving the Pool |
Location | The location of the batch account |
Low-Priority Core Count | Total number of low-priority cores in the batch account |
Low Priority Core Quota | The low-priority core quota for this batch account |
Low-Priority Node Count | Total number of low-priority nodes in the batch account |
Name | The name of the batch account |
Offline Node Count | Number of offline nodes |
Pool Allocation Mode | The allocation mode for creating pools in the batch account |
Pool Create Events | Total number of pools that have been created |
Pool Delete Complete Events | Total number of pool deletes that have completed |
Pool Delete Start Events | Total number of pool deletes that have started |
Pool Quota | The pool quota for this batch account |
Pool Resize Complete Events | Total number of pool resizes that have completed |
Pool Resize Start Events | Total number of pool resizes that have started |
Preempted Node Count | Number of preempted nodes |
Provisioning State | The provisioned state of the batch account |
Re-imaging Node Count | Number of reimaging nodes |
Rebooting Node Count | Number of rebooting nodes |
Resource Group | The Resource Group of the Azure resource. |
Running Node Count | Number of running nodes |
Start Task Failed Node Count | Number of nodes where the Start Task has failed |
Starting Node Count | Number of nodes starting |
Storage Account ID | The storage account id of the auto-storage account associated with the batch account |
Task Complete Events | Total number of tasks that have completed |
Task Fail Events | Total number of tasks that have completed in a failed state |
Task Start Events | Total number of tasks that have started |
Type | The type of the batch account |
Unusable Node Count | Number of unusable nodes |
Waiting For Start Task Node Count | Number of nodes waiting for the Start Task to complete |
API Usage
Name | Description |
---|---|
Average Pages | The average number of pages needed for a paged resource type. |
Average Request Retries | The average number of retry requests per unique request made. |
Average Retry Attempts | The average number of retry requests made per unique request that was retried. |
Average Retry Wait (Milliseconds) | The average amount of time retried requests spent waiting. |
Client ID | The client ID used to make API calls. |
Failed Requests | The total number of requests that returned a failure response. |
Maximum Pages | The largest number of pages needed for a paged resource type. |
Maximum Retries | The highest number of retries made for a single request. |
Maximum Retry Wait (Milliseconds) | The longest time a retried request spent waiting. |
Minimum Pages | The smallest number of pages needed for a paged resource type. |
Minimum Retry Wait (Milliseconds) | The shortest time a retried request spent waiting. |
Other Status Responses | The total number of successful requests that responded with some other accepted status. |
Request Timeouts | The total number of requests that timed out waiting for a response. |
Requests Retried | The number of unique requests that were retried. |
Retry Status Responses | The total number of successful requests that responded with the status TOO MANY REQUESTS (429). |
Retry Timeouts | The total number of requests that needed to be retried, but the request retry time exceeded the maximum retry time. |
Status OK Responses | The total number of successful requests that responded with the status OK (200). |
Subscription ID | The subscription ID used to make API calls. |
Successful Requests | The total number of requests that returned a successful response. |
Tenant ID | The tenant ID used to make API calls. |
Total Monitor Requests | The total number of requests made to get monitor metrics. |
Total Paged Requests | The total number of resource types that required paging. |
Total Requests | The total number of requests made during collection. |
Total Retries | The total number of retry requests that were made. |
Unique Monitor Requests | The number of unique requests made to get monitor metrics. |
Unique Requests | The number of requests made with unique endpoints. |
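
The two retry averages in this table are easy to confuse, so a worked example may help. This is a sketch based only on the descriptions above: the class and method names are hypothetical, and the assumption is that both averages divide Total Retries, once by Unique Requests and once by Requests Retried.

```python
from dataclasses import dataclass


@dataclass
class ApiUsage:
    unique_requests: int   # "Unique Requests"
    requests_retried: int  # "Requests Retried"
    total_retries: int     # "Total Retries"

    def average_request_retries(self) -> float:
        """Retries spread over every unique request, retried or not."""
        if self.unique_requests == 0:
            return 0.0
        return self.total_retries / self.unique_requests

    def average_retry_attempts(self) -> float:
        """Retries spread over only the requests that were retried."""
        if self.requests_retried == 0:
            return 0.0
        return self.total_retries / self.requests_retried


usage = ApiUsage(unique_requests=200, requests_retried=10, total_retries=25)
print(usage.average_request_retries())  # 0.125
print(usage.average_retry_attempts())   # 2.5
```

In this example only 10 of 200 requests were throttled, so the per-request average stays low (0.125) while each throttled request needed 2.5 retries on average.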
Application
Name | Description |
---|---|
Allow Updates | A value indicating whether packages within the application may be overwritten using the same version string |
Default Version | The package to use if a client requests the application but does not specify a version |
Display Name | The display name for the application |
ID | Resource ID of the application |
Parent ID | The id of the parent resource. |
Pool
Name | Description |
---|---|
Allocation State | Whether the pool is resizing |
Allocation State Transition Time | The time at which the pool entered its current allocation state |
Application Licenses | The list of application licenses the Batch service will make available on each compute node in the pool |
Creation Time | The creation time of the pool |
Current Dedicated Nodes | The number of dedicated compute nodes currently in the pool |
Current Low Priority Nodes | The number of low priority compute nodes currently in the pool |
Entity Tag | The ETag of the resource, used for concurrency statements |
ID | Resource ID of the pool |
Inter Node Communication | Whether the pool permits direct communication between nodes. This imposes restrictions on which nodes can be assigned to the pool. Enabling this value can reduce the chance of the requested number of nodes being allocated in the pool. If not specified, this value defaults to 'Disabled' |
Last Modified | This is the last time at which the pool level data, such as the targetDedicatedNodes or autoScaleSettings, changed. It does not factor in node-level changes such as a compute node changing state |
Maximum Tasks Per Node | The maximum number of tasks that can run concurrently on a single compute node in the pool |
Name | Resource name of the pool |
Provisioning State | The current state of the pool |
Provisioning State Transition Time | The time at which the pool entered its current state |
Task Scheduling Policy | How tasks are distributed across compute nodes in a pool |
Type | Microsoft Azure resource type |
VM Size | The size of virtual machines in the pool. All VMs in a pool are the same size |