JMeter, SQL Server and Thread Safety.

It’s been a busy summer and I’ve missed making a lot of posts. This post is about an interesting event from June — interesting from a technical point of view, but also because it highlights a testing trap that I fall into repeatedly, which is reporting errors from tests that are slightly out of scale.

Using Apache JMeter for SQL Server concurrency testing.

I’ve used Apache JMeter in the past for stress testing J2EE web apps. Vendors don’t like me very much when I tell them we’re going to run a stress test on their apps. Oh, and if your network group is paranoid you may need to reassure them that you have mission-critical testing requirements before you start using JMeter. This time around I’m using it as a shell around a SQL script for calling a stored procedure that I strongly suspected of concurrency issues.

Install JMeter

There’s not much to do to install JMeter. I had already installed the version 1.8 JRE, so I just grabbed the java package for JMeter from the Apache website: https://jmeter.apache.org I usually just extract the zip file into a folder here: c:\downloads\jmeter\apache-jmeter-4.0\
Then I make a shortcut to: c:\downloads\jmeter\apache-jmeter-4.0\bin\jmeterw.cmd

For my use case today, I’ll also need the SQL Server JDBC JAR file. I searched for the Microsoft JDBC driver “Microsoft JDBC Driver 6.4 for SQL Server” This should take you to this website today: https://www.microsoft.com
The downloaded file is sqljdbc_6.4.0.0_enu.exe. This extracts to a folder with several *.JAR files for different versions of Java. So, I located mssql-jdbc-6.4.0.jre8.jar in sqljdbc_6.4\enu and
copied it into the lib folder for JMeter: C:\downloads\jmeter\apache-jmeter-4.0\lib\

The Test and the Issue.

Backtracking a little to the real issue. One of our departments uses unique sequential numbers for each employee’s reports. This business rule was a real design headache and was unchangeable. Let that sink in … each employee has their own sequence number series. When the vendor designed it they stored it as a non-key column and rolled their own sequence generator on that column. For good discussions on this subject search for “sqlserver roll your own identity max”. I personally liked the response by Erland Sommarskog [1] from Microsoft on rolling your own sequence in this link: https://social.msdn.microsoft.com
We’ve been repeatedly handling errors in one area of the vendor’s app when multiple employees are added to an Assignment.  The application has been interchanging the max sequence number of one employee with another employee.

During triage, we evaluate if the issue stems from the code in the web app, or in the JavaScript, or the stored procedure. A lot of the code in this application is in stored procedures. There is one 500 line stored procedure that is responsible for setting this sequence number. The issue seemed very likely to originate from the procedure when looking at the code. The first thing that seemed odd was that it used a NOLOCK hint when calculating the next number.[1] Another issue is that the queries are not wrapped in a transaction.   The vendor is rolling not one, but two sequence numbers in their own custom identity code. The code first generates its primary key ID for the record, then inserts a record without the second sequence number, then queries for the max employee sequence number, then updates the record with the second sequence number.  Sadly, the procedure also lacked a try-catch block and error handling. Most of the time things worked ok, but the error repeated every month or two. The vendor insisted that the stored procedure was fine, not the source of the error.

Bring in the Clones.

To really test how it handled under concurrent use I decided to use JMeter to control the concurrency test.  There are other ways to run a concurrency test. I love the approach that Aaron Bertrand uses in this post:  http://sqlperformance.com However, JMeter was built for testing and has a lot of free functionality like result graphs, and can be extended into other languages.

The last time I used JMeter I used the proxy recording feature to script out a sequence of 100 AJAX and HTTP requests (calls to a J2EE ControllerServlet) of a web application. For that J2EE test I edited the JMeter .JMX file for the test carefully, partially through the GUI and partially in a text editor. Some of the powerful things I’ve used JMeter for in the past for website concurrency testing is for being able to do things like:

  • crafting ${COOKIE_SESSIONID} values into the path
  • running javascript on the fly during the test ${__javaScript(new Date().getTime();)}
  • using static data to insert from a CSV file with a “CSV Data Set Config”
  • using XPath queries and an “XPath Extractor” in Jmeter to strip data out of the AJAX response.

We found that a single Windows 10 PC running JMeter could easily handle simulated loads of up to 250 users on the web application to find Oracle queries that needed adjustments.

This time my Jmeter task is much simpler. I basically need only two items in the Thread Group for the test instead of 100 HTTP requests and a dozen other items in the Thread Group.   I got the information for running a SQL Server script file in JMeter from this SO post:  https://stackoverflow.com

I need a window.

I needed a window function to look for the gaps in the roll-your-own sequence code before running the test and then again after the test.

SELECT
 UserID,
 pregap_Seq_No,
 postgap_Seq_No
 FROM
 (
 SELECT
 UserID,
 Seq_No AS postgap_Seq_No,
 LAG(Seq_No) OVER (ORDER BY UserID, Seq_No) pregap_Seq_No
 FROM dbo.RequestAssignment
 ) q
 WHERE
 pregap_Seq_No <> postgap_Seq_No - 1
 ORDER BY
 UserID,
 postgap_Seq_No

The code for the clone.

I also needed to make the same kind of call as the web app. For that I came up with a SQL script to make the same kind of call to the troubled sproc as the web app was doing.

---------------------------------------------------
 --
 -- name:
 -- \2018-06-26-concurrency-test\insert_new_assignment.sql
 --
 -- choose a user at random from the same group of users on previous Assignments
 -- choose an open request without that user on the Assignment at random
 -- call the troubled sproc with this new data to insert
 --
 -- syntax:
 -- insert_new_assignment.sql
 --
 -- dependencies:
 -- running on an isolated instance and database [STRESSDB]
 --
 -- updated:
 -- 2018-06-26 original version
 --
 -- todo:
 --

-- choose a user at random from previous Assignments

DECLARE @RAND_USER INT =
 (
 SELECT TOP(1)
 UserID
 FROM [STRESSDB].[dbo].[RequestAssignment]
 ORDER BY CHECKSUM(NEWID())
 );

SELECT @RAND_USER;

-- choose an open request without that user on the Assignment at random

DECLARE @RAND_RID INT =
 (
 SELECT TOP(1)
 fr.RID
 FROM STRESSDB.dbo.Request fr
 INNER JOIN [STRESSDB].[dbo].[RequestAssignment] fra
 ON fr.FRID = fra.FRID
 WHERE
 fr.StatusID IN (1,2,3,4,6,10) AND
 fra.UserID <> @RAND_USER
 ORDER BY CHECKSUM(NEWID())
 );

SELECT @RAND_RID;

-- call the sproc with this new data to insert

DECLARE @CURRENT_ASNDATE VARCHAR(20) = CAST(getdate() AS VARCHAR(20));

EXEC troubled_Sproc
 ...snip...
 @UserId=@RAND_USER,
 @StatusID='6',
 @Asndate=@CURRENT_ASNDATE,
 @Comments='this is a small comment',
 @AuditUser='stresstest',
 @Action='I',

Back to the JMeter .JMX file.

With all that done I could setup the JMeter test.

As seen in the image below this particular test consists of a small Thread Group with four components. The thread group is where we set the number of concurrent users (I like to call them clones) and the number of times they will repeat the test.

The JDBC connection Configuration component is where we wire up the SQL Server .JAR file to JMeter.

The JDBC Request component is where we tell JMeter to use its FileToString() function to find the SQL Server script that the concurrent user will run. In this image below we can see it make the call to the script above called insert_new_assignment.sql

The testing trap.

If you scroll back up to the Thread Group image you can see where I fall into the testing trap. For some reason, I tend to focus on a larger than needed number of concurrent users and end up finding a different, but similar bug. This was no exception, and I didn’t even focus on a very large number of concurrent users, 10 users running the test 10 times. I quickly reported my findings before I realized my sample size was too large. Yes, I found a thread safety error. Yes, it was repeatable. Yes, I could add a transaction block, remove the nolock hint, add deadlock detection and the thread safety error would go away. However, I missed one key point. Only two managers would only ever always be the only two users for this stored procedure. I caught my own mistake, and re-ran the test with two users over 10000 simulated times … and the error never occurred. Same thing for 3 and 4 users. Thousands of tests later I found the first case where the error would occur, five users, which was more than twice the number of users who would ever use it.

I had fallen into a similar testing trap with the much longer J2EE test I described earlier. That trap was a little different. In that JMeter test, I had to simulate a workflow with 250 users across the state, but my test data wasn’t distributed across the entire state, but only on one area of the state. The Oracle server was collapsing, but the data skew was causing the error. When I went back and spread the data across the entire state the errors went away.

We eventually found another query in the stored procedure which was causing the bug.  We’ll eventually circle back to eliminate the source of the thread safety error, but probably as a refactoring exercise where we address all the other areas where the vendor rolled their own sequence numbers.

[1] Note that Erland recommends a UPDLOCK hint when you need to roll your own identity generator.  https://social.msdn.microsoft.com
Our vendor code currently contains over 9200 NOLOCK hints, so there’s a large refactoring project ahead of us.  It’s widely discussed that overuse of NOLOCK hints is a bad practice. A recent discussion on this topic was during the last 24 Hours of PASS. http://www.pass.org