SQL Server Monitoring – Agent jobs performance

This blog is part of the blog series named SQL Server Monitoring.

Purpose

To get a baseline of the SQL agent jobs and their executions over time, how long time they took, and did they succeed, or ended with errors, in failure or other outcomes.

Sources

This feature uses the tables msdb.dbo.sysjobs and msdb.dbo.sysjobhistory which returns information of the individual steps executed including a final outcome step (step id 0) which contains the status for the job as well as the overall execution job.

The service also checks if it is running at a too slow frequency resulting in data loss as the rows in msdb.dbo.sysjobhistory are purged before being read.

Report

The report is divided into 2 parts.

The upper part is showing a line and stacked column chart which shows the agent jobs executions over time, where the columns shows the number of executions, and where the line (secondary y-axis) show the execution time in seconds. Execution times of 0 second will be reported as 1 second.

The lower part shows 2 tables, where the left table shows the jobs executed on the selected date, showing the number of executions in the categories “Succeeded”, “Ended with errors”, “Failed” and “Other errors”.
The “Missing data” is showing how many times it has been detected that records was purged before being read.
The right table shows the error details for the jobs that failed or ended with errors on the selected date.

Model

The model for the report uses the facts [Agent jobs performance] and [Agent jobs performance errors], and the dimensions [Date], [Time], [Agent job] and [Agent job status].

Configuration

The feature accepts a filter specification that excludes certain errors.

This can be needed in e.g. an Always On setup, where there agent jobs are replicated onto the standby servers, but where the agent jobs step 1 checks if they are executing on the primary node, and if not the steps fails with an error.

The filter must be written such that it can be added as an “AND” block.

E.g.:

(sjh.message NOT LIKE ‘%Not primary%’)

Scheduling

The default schedule for the feature is every 10 minutes (600 secinds).

Retention

The default retention for the tables dbo.fhsmAgentJobsPerformance and dbo.fhsmAgentJobsPerformanceLatestError is 365 days.