Parallel Unit Testing via SFDX CLI

In the few years I have been working with Salesforce, one of the things that has bugged me a lot is how long unit test suites take to run. I am going to explain here how we have finally made progress on this, rather excellent progress as Bill & Ted might have put it.

The short version is that we have automated what devs have been doing for a while by building a custom SFDX CLI command. It first runs the tests in parallel and then mops up any tests that fail with the infamous UNABLE_TO_LOCK_ROW by running them again. The automation makes things easier, but also gives us the chance to add some extra resilience to the process.

Prompted by a question on Salesforce StackExchange I have made some code that follows this approach available on GitHub. It’s a bit embedded in something else, but you can either chase through to see how it works or carry on reading and I will explain. Mostly what it does is orchestrate the running of standard sfdx commands to get the right output. If you just want to play with a pre-packaged version, see the instructions at the bottom.

Progress

As test runs can still take a little time, the sfdx cli command reports progress, so let’s start with that.

This shows a test run of a suite that, if run sequentially, would take 8-9 hours to complete. I have changed the method names just to protect the innocent but everything else is accurate, including that I had a few hours free late on a Saturday that I chose to spend working on this ;-(

There are two stages being executed here. The first just runs all local tests on the org in parallel. While this runs, the command reports progress roughly every minute so you can see what is happening. After that, the tests that failed due to locking issues are identified and re-run sequentially. If they pass during this phase they are removed from the failure list. At the end of the run a JSON file is generated with any tests that have not passed.
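The two stages above can be sketched roughly as follows. This is a minimal illustration rather than the command's actual code: the function names, the callables passed in, and the "Class.method to failure message" dictionary shape are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical shape: "ClassName.methodName" -> failure message.
Failures = Dict[str, str]


def run_with_retest(
    run_parallel: Callable[[], Failures],
    run_sequential: Callable[[List[str]], Failures],
    is_lock_failure: Callable[[str], bool],
) -> Failures:
    """Run everything in parallel, then re-run lock failures sequentially.

    The callables are placeholders for whatever actually starts the test
    runs and collects the results; they are not part of any real API.
    """
    # Stage 1: the parallel run returns every test that failed.
    failures = dict(run_parallel())

    # Pick out the failures that look like row-locking problems.
    locked = sorted(n for n, msg in failures.items() if is_lock_failure(msg))

    # Stage 2: re-run only those tests, one after another.
    if locked:
        still_failing = run_sequential(locked)
        for name in locked:
            if name in still_failing:
                failures[name] = still_failing[name]  # keep the latest message
            else:
                del failures[name]  # passed on the re-run, so drop it
    return failures
```

Anything left in the returned dictionary is what would end up in the JSON file of tests that have not passed.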

Workarounds

It has taken a bit of effort to make this CLI stable, but the approach taken is being used on a fairly busy CI pipeline with very few issues. If you have spent much time working with the SFDX CLI you will know it’s still a bit on the immature side, and some of the effort in writing this has gone into trying to anticipate when things might fail and dealing with them.

The first issue we had to deal with is that force:apex:test:report would not handle this many tests; it has its limits. To work around that we use force:apex:test:run to start a test run and then run force:apex:test:report, but only as a means of being notified when the test run has completed. Unfortunately this was not that stable, so there is some handling in the code to restart waiting if it dies unexpectedly, alongside dealing with it failing due to too many tests.
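The restart-on-failure idea might look something like the sketch below. It assumes two hypothetical callables: `wait_once`, which would wrap a blocking call such as force:apex:test:report and return True if it exits cleanly, and `is_run_complete`, an independent check (for example a query against the org) used when the waiter exits badly. The names and the retry limit are illustrative only.

```python
import time
from typing import Callable


def wait_for_run(
    wait_once: Callable[[], bool],
    is_run_complete: Callable[[], bool],
    max_restarts: int = 5,
    pause_seconds: float = 0.0,
) -> bool:
    """Block until the test run finishes, restarting the waiter if it dies.

    Returns True once the run is known to be complete, or False if the
    waiter kept failing and the run never showed up as complete.
    """
    for _ in range(max_restarts + 1):
        try:
            if wait_once():
                return True
        except Exception:
            # Treat a crash of the waiter like any other unexpected exit.
            pass
        # The waiter may have failed only because there were too many
        # results to report, so check whether the run itself has finished.
        if is_run_complete():
            return True
        time.sleep(pause_seconds)
    return False
```

Separating "the waiter exited" from "the run is complete" is what lets a report failure caused by too many tests still count as a completed run.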

Without being able to use force:apex:test:report, the command implements its own reporting by running SOQL queries on the results. We also use SOQL queries to report progress every minute; the data returned by those queries is only used for the progress display.
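As an illustration of reporting from SOQL, the sketch below builds a query against the standard ApexTestResult object and summarises the JSON that a query command such as force:data:soql:query --json prints. It assumes the test run id can be used as the AsyncApexJobId filter and that the output has a result.records list; the function names are made up for the example and the actual command may query differently.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def results_query(test_run_id: str) -> str:
    """SOQL for all results of one test run (assumes run id == AsyncApexJobId)."""
    return (
        "SELECT ApexClass.Name, MethodName, Outcome, Message "
        f"FROM ApexTestResult WHERE AsyncApexJobId = '{test_run_id}'"
    )


def summarise(query_json: str) -> Tuple[Counter, Dict[str, str]]:
    """Count outcomes and collect failure messages from query output.

    Expects the assumed shape {"result": {"records": [...]}}.
    """
    records = json.loads(query_json)["result"]["records"]
    outcomes = Counter(r["Outcome"] for r in records)
    failures = {
        f'{r["ApexClass"]["Name"]}.{r["MethodName"]}': r.get("Message") or ""
        for r in records
        if r["Outcome"] in ("Fail", "CompileFail")
    }
    return outcomes, failures
```

The outcome counts alone are enough for a progress line, while the failures dictionary is the input for deciding what to re-run.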

Tests that fail due to locking are identified by looking for a couple of patterns in the result message. Once these are identified they are run again via force:apex:test:run, but with the --synchronous flag set and naming the methods we want to re-test.
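A rough sketch of the pattern matching and the re-run command lines follows. UNABLE_TO_LOCK_ROW is the pattern named in this post; the second pattern is only a guess at what else might be matched. The commands are grouped by class on the assumption that a synchronous run accepts methods from a single class at a time, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

LOCK_PATTERNS = [
    re.compile(r"UNABLE_TO_LOCK_ROW"),
    re.compile(r"deadlock detected", re.IGNORECASE),  # assumed, not from the post
]


def lock_failures(failures: Dict[str, str]) -> List[str]:
    """Names of failed tests whose message matches a locking pattern."""
    return sorted(
        name for name, msg in failures.items()
        if any(p.search(msg) for p in LOCK_PATTERNS)
    )


def retest_commands(test_names: List[str]) -> List[List[str]]:
    """One synchronous force:apex:test:run command line per test class."""
    by_class = defaultdict(list)
    for name in test_names:
        # Everything before the last dot is treated as the class name.
        by_class[name.rsplit(".", 1)[0]].append(name)
    return [
        ["sfdx", "force:apex:test:run", "--synchronous",
         "--tests", ",".join(sorted(names))]
        for _, names in sorted(by_class.items())
    ]
```

Each returned list can be handed to something like subprocess.run as-is, which avoids having to quote the comma-separated test names for a shell.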

In the implementation I linked to above the final act is to save the list of failure details in a file as JSON. We have found it useful here to duplicate how force:apex:test:report prepares output to make CI integration easier. I have not included that handling in this code as it’s a bit involved, maybe I will add it later. If you want to see how this is done in force:apex:test:report have a look at testResults.js in the salesforce-alm package of the cli installation.

Quicker?

We have looked at a couple of ways you might make this run quicker. Initially we were quite excited by the possibilities of annotating test classes with @isTest(isParallel=true). Sadly, after rather a lot of time spent on this, we concluded it has no significant impact on our tests, and it’s a bit of a pain to do since not all tests can use it. Maybe yours will be different.

What does help though is removing batch tests. If you look at the progress information you might note that the number of tests executed per period falls during the run. We spotted this, tried removing the batch tests, and found the execution time on one suite halved. I don’t really have any insight into why batch jobs hold everything up, but it is something to be aware of.

Install via sfdx cli

The code for this is available as a sfdx cli extension on npm. To install run:

sfdx plugins:install apexlink

This might take a little while as the package has some Java code in it that I use for parsing Apex. To run the command make sure you are in a sfdx project and authenticated to an org and then do:

sfdx apexlink:retest

There are some arguments on the command but none are functional just yet.