Are you sure? Last good DBCC CHECKDB over 2 weeks old.

So this past week I had a little confusion over a new CheckID that was suddenly firing every day in sp_Blitz. This new CheckId was number 68, ‘Last good DBCC CHECKDB over 2 weeks old.’ What? How could that be? I run it daily from Ola’s maintenance plans.

My first check was to verify if it was a recent occurrence only:

SELECT *
 FROM dbo.BlitzResults
 WHERE
 CheckID = 68;

Yup, in my history of BlitzResults it had only happened in the past day. So I kept looking since this was something new and weird.

Next check … were there errors in the Agent log for the dbcc script? No, nothing there, even though the agent truncates the script output anyway making that log fairly useless.

Next, check Ola’s log table. Clear. There were no errors in the maintenance plan logs for any DBCC_CHECKDB runs for the unit_test_1 database over its entire history.

SELECT *
 FROM [dbo].[CommandLog]
 WHERE
 CommandType = 'DBCC_CHECKDB' AND
 DatabaseName = 'unit_test_1';

Something was fishy, but was it a red herring?

red herring   https://www.flickr.com/photos/24263322@N05/2403978660

Spoiler alert. The database name should have been a dead giveaway.

In the scrum meeting that morning I mentioned to the lead developer that there were strange errors in the logs for that database, but that I
thought the errors were a red herring.

Fortunately, I didn’t create any panic for either the developer or myself. I told him I would get back to him if I discovered something really fishy.

I reviewed the job for Ola’s DatabaseIntegrityCheck, and found nothing out-of-the-ordinary for the procedure call:

EXECUTE [dbo].[DatabaseIntegrityCheck]
 @Databases = 'USER_DATABASES',
 @LogToTable = 'Y'

I checked the code in Ola’s stored procedure to double check the parameters in use, which were fine:

DBCC CHECKDB ([unit_test_1]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

So, the next step was to open up the sp_Blitz stored procedure to see why the error was firing.  I found out the script was running this:

DBCC DBINFO('unit_test_1') WITH TABLERESULTS

This command generates kind of a set of key-value pairs. The key sp_Blitz looks for is ‘dbi_dbccLastKnownGood‘.  Sure enough, it was over two weeks old, 2018-11-14 17:00:49.890, even though I knew I had been running DBCC CHECKDB nightly without errors.  This didn’t make any sense.

I had a lot going on, so I put it on the back burner for a day, but then the next day the same thing happened for two other databases, but not unit_test_1.  Why were these values jumping around? I began to worry that something really bad was happening.

I checked the SQL-Server-First-Responder-Kit issue log for more history on CheckID 68.  There wasn’t much in there, except the issue that became the origin for the 2-week check:

https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/issues/1466

I ran into an article by Erin Stellato that proved that not all DBCC commands update dbi_dbccLastKnownGood.  That didn’t help my cause, it only opened the door to a wider range of possibilities.

Then I ran into a proof of concept blog written by Jason Brimhall discussing issues involving dbi_dbccLastKnownGood and database restore commands.

Viola!  Once again, Jason to the rescue. After reading his blog post the lights came on. I must have been running the sp_Blitz check on recently recovered databases before I was running the daily dbcc checkdb. Sure enough, I then discovered that the team of developers initiated a new set of unit tests this week.  Part of their process was restoring a fresh, but three-week old baseline backup file and rerunning a set of unit tests.

At this point I think I’ll submit an issue on the SQL-Server-First-Responder-Kit issue log, and recommend expanding the description of checkid 68.  I think they should change the wording from:

'Last good DBCC CHECKDB over 2 weeks old' AS Finding

to:

'Last good DBCC CHECKDB over 2 weeks old, or unit testers just restored old baseline backups' AS Finding