Help – Troubleshooting High DTU Utilisation
One of the very common topic for support cases which we receive at Microsoft is around customers trying to understand the cause of high DTU for their Azure SQL Database.
In this blog, I want to share the steps we follow, almost always, to approach a high DTU issue.
The most important thing to understand is that DTU is a blended measure of CPU, Data IO, Log Writes and Memory. If you have ever received an alert for your database consuming high DTU, then the first step to investigate the cause would be to find out which factor or component of DTU actually went high i.e. CPU, Data IO or Log Write.
If the DTU spiked any time within the last 14 days then the utilisation history can be fetched from the master database by using below script. The data is aggregated every 5 minutes and retained for 14 days.
The start and end time in the output is in Coordinated Universal Time (UTC).
SELECT [start_time] ,[end_time] ,[database_name] ,[avg_cpu_percent] ,[avg_data_io_percent] ,[avg_log_write_percent] ,( SELECT Max(v) FROM ( VALUES ([avg_cpu_percent]) ,([avg_data_io_percent]) ,([avg_log_write_percent]) ) AS value(v) ) AS [avg_DTU_percent] FROM sys.resource_stats --WHERE [database_name] = 'DB Name Here' ORDER BY start_time desc
Alternatively, if the issue is occurring right now then you can use below script to get more granular data (aggregated every 15 seconds), by running it against the impacted database.
SELECT end_time ,[avg_cpu_percent] ,[avg_data_io_percent] ,[avg_log_write_percent] ,( SELECT Max(v) FROM ( VALUES ([avg_cpu_percent]) ,([avg_data_io_percent]) ,([avg_log_write_percent]) ) AS value(v) ) AS [avg_DTU_percent] FROM sys.dm_db_resource_stats GO SELECT AVG(avg_cpu_percent) AS 'Average CPU Utilization In Percent' , MAX(avg_cpu_percent) AS 'Maximum CPU Utilization In Percent' , AVG(avg_data_io_percent) AS 'Average Data IO In Percent' , MAX(avg_data_io_percent) AS 'Maximum Data IO In Percent' , AVG(avg_log_write_percent) AS 'Average Log Write Utilization In Percent' , MAX(avg_log_write_percent) AS 'Maximum Log Write Utilization In Percent' FROM sys.dm_db_resource_stats GO
Note: The DTU graph in Azure Portal is plotted using the same data as the one being queried by the DMVs in above two scripts.
Once you have identified which component (CPU/Data IO/Log Write) is resulting in high DTU, it’s now time to review the culprit queries. This can be checked by querying the Query Store data, please check this blog for the script and more information on it.
By now, you should have the list of expensive queries from Query Store which resulted in the DTU spike.