The Silence of the Plan Cache Purge

I’m going to have to blog very quietly tonight since I don’t want the plan cache to hear me. This is a summary of a very silent plan cache purge on a production SQL Server that was whispering to us that it needed a new RAM configuration.

This purge was happening on an old SQL Server 2012 instance that I inherited from a contractor. It didn’t have all the settings that I put in my builds, especially for memory. On my new builds I follow Glenn Berry’s formulas for setting ‘max server memory (MB)’.  see here (for general formula)  or also here (see step 26.g)

Silent Rumors!

A colleague and I first discovered the silent purge when talking very quietly about a stored procedure that was just run three weeks prior and only runs once a year. We were talking very quietly since there were rumors floating around. Other coworkers were whispering that someone may have run the annual procedure more than once … scandalous! The first place I went to look for proof of the rumor was at the last_execution_time in sys.dm_exec_procedure_stats. You can imagine my horror when I found out that the plan cache had silently been purged the day before. I wanted to scream, but the plan cache may have heard me if I had done that.

Why the Silence?

Arron Bertrand nicely summarizes some reasons for a silent purge here on dba.stackexchange.com .  I carefully considered each of the items in Aaron’s list. We had recently started running a daily automated sp_Blitz on the server, so I knew that a manual DBCC FREEPROCCACHE (CheckID = 208) had not happened, and CheckID = 125 was never detected either. So it seemed as if memory pressure and memory intensive queries were the causes, even though I wasn’t getting CheckID = 51 either.

Listening for clues.

I wished that the plan cache would have spoken up. I looked for its cries for help in the default trace, and for its little voice in the system health file, but found nothing. Over the course of two weeks the plan cache purged four more times, and always at the same time in the early morning during a GIS batch job. A few times I woke up to listen for it with an extended events trace, but never heard the job purging the cache. Each time the last_execution_time in sys.dm_exec_procedure_stats always indicated that the same GIS procedure was the oldest one in the cache.

It seemed pretty certain that this GIS job was the cause of the silence, until it turned out that it wasn’t. (I’ll write more about what I found in this job in another post.) The fifth time the plan cache silent cleared was during a weekly billing job.

Then the plan cache stopped purging … for a month.

A noisy solution.

For a month the plan cache went along quietly doing its job without a silent purge until the middle of a Tuesday afternoon. This time I found a softly spoken clue in the system health ring buffer. I shredded the XML from the query and sure enough, there was a high number of page faults. We took the opportunity at the end of the day to finally take the server down, triple the ram after confirming available resources in the VMWare cluster, and finally correct the ‘max server memory (MB)’. A month later and it hasn’t repeated.

Reading & Meeting Rick Lowe.

Since then I’ve been reading more about the plan cache. This past week a Santa Fe SQL PASS user group meeting featured a presentation by Rick Lowe (blog|twitter) titled “Why should I care about … the plan cache?”  A slide from his deck has a nice query for politely asking the plan cache for recent query issues.  His query is similar to the plan cache details that I’ve been recently exploring from sp_Blitz with the optional flag @OutputProcedureCache = 1.

Rick had some great advice, but the most valuable takeaway was that the plan cache data can now be persisted in Query Store. This won’t help me in the old 2012 instance, but we’ll be able to fix this soon when we upgrade this speechless instance to 2016.

Plan Cache on a Pedestal.

I’ve since gone through a series of SQLPASS presentations on configuring Query Store. We’ve now enabled Query Store on all of our development 2016 instances. After I monitor the log growth and CPU impact, we’ll do the same in production. In one of her presentations (media|post), Erin Stellato was asked about its readiness, her response was “There is no reason *not* to use Query Store.” So now our silent little plan cache will get a much bigger voice with Query Store, and it seems the correctly provisioned RAM helps as well.