Continuous performance testing using GitHub Actions

developer
performance
github
Author

Anirban Chetia

Published

November 11, 2024

In an effort to address the need for continuous performance benchmarking in data.table, I created a GitHub Action1 to facilitate testing the time/memory-based performance of the incoming changes that are introduced via Pull Requests (PRs) to the official GitHub repository.

My motivation in taking this initiative was to help ensure that data.table consistently upholds its high performance standards as PRs keep getting integrated. Incoming contributions need to be monitored for their performance impact, especially to avoid regressions, and an automatic way to do that is ideal, noh?

Through this post, I aim to share some insights regarding my action and discuss some implementation details. Before that, I’m happy to convey that it has been live for over seven months now! There are numerous examples of it generating diagnostic performance plots for PRs that change the C and R files in the codebase, which can be found through the ‘Pull requests’ section of data.table on GitHub (aside from the ‘Actions’ tab, where jobs keep popping up as new PRs and commits involving code changes emerge).

Key features

  • Predefined flexible tests
    The action runs test cases (utilizing the atime2 package) from the setup defined in .ci/atime/tests.R (the path can be customized) on different versions of data.table (or the R package being tested). These tests are based on either documented historical regressions or performance improvements.

  • Automated commenting
    Using cml3, the action posts information/results in a comment on the pull request thread. The comment is automatically edited with each new push to avoid clutter, so only one comment (the latest) exists per PR.
      - The comment is authored by a GitHub bot, which uses the GITHUB_TOKEN I provide to authenticate itself and interact with the GitHub API within the scope of the workflow.
      - If multiple commits are pushed in quick succession or before the previous job finishes, only the most recent one among them is fully run, to save CI time.

  • Versioning
    The action computes the tests on different data.table versions that can be visually compared on the resultant plot. These versions carry various labels, as listed in the table below:

    | Label Name | R Package Version Description |
    | --- | --- |
    | base | PR target |
    | HEAD | PR source |
    | merge-base | Common ancestor between base and HEAD |
    | CRAN | Latest version on the platform |
    | Before | Pre-regression commit (predates the onset of the performance regression) |
    | Regression | Commit that is either responsible for the degradation in performance or is affected by it |
    | Fixed | Commit where the performance has been restored or even improved to exceed the ‘Before’ version |
    | Slow | Older version with slower performance (non-regression) when compared to the latest developments |
    | Fast | Newer version that demonstrates noticeable performance improvement over the ‘Slow’ version |
  • Diagnostic visualization
    Plots are uploaded within the comment, containing subplots for each test case that show the time and memory trends across different data.table versions. The plot shown in the PR thread is generated as a well-proportioned preview: it is condensed to show only the top 4 tests (this number can be configured using the N.tests.preview variable in the tests.R file), ranked by the most significant differences between HEAD and min (the fastest version for that test). The full version (all tests) is shown on clicking/tapping the plot.

  • Timing information
    The time taken for executing various tasks (such as setting up R, installing different data.table versions, running and plotting the test cases) is measured (in seconds) and organized in a table within the comment.

  • Links
    A download link that retrieves a zipped file of the artifact containing all the atime-generated results is provided, alongside a hyperlink to the commit that triggered the workflow and generated that particular comment.
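To give a concrete sense of the predefined tests mentioned above, here is a minimal sketch of what a .ci/atime/tests.R file can look like, based on atime’s documented interface. The test name, sizes, and commit placeholders below are illustrative, not taken from data.table’s actual suite:

```r
# .ci/atime/tests.R (sketch): atime_pkg() reads a test.list object from this file.
test.list <- atime::atime_test_list(
  # Data sizes and setup code shared by every test case:
  N = 10^seq(3, 7),
  setup = {
    library(data.table)
    DT <- data.table(x = sample(N))
  },
  # One test case, comparing historical commits (placeholders, not real SHAs):
  "setkey on an integer column" = atime::atime_test(
    expr = setkey(DT, x),
    Slow = "<commit SHA>",
    Fast = "<commit SHA>"
  )
)
```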

Usage

The workflow can be directly fetched from the Marketplace4 for use in any GitHub repository of an R package. For example, one can use this template for their .github/workflows/<workflowName>.yml:

```yaml
name: Autocomment atime-based performance analysis on PRs

on:
  pull_request:
    types:
      - opened
      - reopened
      - synchronize
    # Modify path filters as per need:
    paths:
      - 'R/**'
      - 'src/**'
      - '.ci/atime/**'

jobs:
  comment:
    runs-on: ubuntu-latest
    container: ghcr.io/iterative/cml:0-dvc2-base1
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      repo_token: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: Anirban166/Autocomment-atime-results@v1.4.1
```

The example I provided above can be customized further as needed, as long as a few things are kept intact:

- The workflow runs on a pull_request event
- GITHUB_PAT is supplied (required to authenticate git operations, have higher rate limits, etc.)
- The container and repo_token fields are specified as I did above (required for cml functionality)

Note

The action is not constrained to be OS-specific, and the workflow consists of a single job (one set of steps) that executes on the same runner.

Steps

Interested to learn more about the code behind this? Fret not! In this section, I’ll walk you through the steps I have in my workflow, one by one - right from the actions, software, and snippets involved to how they fit into the overall logic.

To begin, I use the checkout action to fetch the repository’s current contents into the runner’s file system. This allows my workflow to access and work with the target project’s source code in the right branch. Note that I set fetch-depth to 0 as I want all commits, branches, and tags (the entire history, basically). This is essential for running other versions of the target R package, as otherwise the default value of 1 would only fetch the latest commit on the checked-out branch.
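As a sketch, that step looks something like this in workflow YAML (the action’s version tag here is an assumption on my part):

```yaml
- name: Checkout the target repository
  uses: actions/checkout@v4
  with:
    # 0 means no shallow clone: fetch all commits, branches, and tags
    fetch-depth: 0
```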

Next, I disable the safe directory check on my repository to bypass the restriction on running commands within foreign directories that git by default enables for security purposes.
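That check boils down to a single git command along these lines (a sketch; the exact invocation in my action may differ):

```yaml
- name: Disable the safe directory check
  run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
```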

I then use two git switches (rationale5) to ensure local branch references exist and can be found by atime when using git2r::revparse_single to pick up the right environment variables for HEAD and base.
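The gist of those switches, as a sketch (the branch names come from the environment variables that GitHub provides for pull_request events; the exact commands in my action may differ):

```yaml
- name: Create local branch references for base and HEAD
  run: |
    git switch "$GITHUB_BASE_REF" || git switch -c "$GITHUB_BASE_REF" "origin/$GITHUB_BASE_REF"
    git switch "$GITHUB_HEAD_REF" || git switch -c "$GITHUB_HEAD_REF" "origin/$GITHUB_HEAD_REF"
```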

For a standard R setup, I use the RStudio Package Manager (RSPM) to install the latest version of R.
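One common way to get such a setup (a sketch; my action may pin or install things differently) is via the r-lib actions with public RSPM binaries enabled:

```yaml
- name: Set up R
  uses: r-lib/actions/setup-r@v2
  with:
    use-public-rspm: true
```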

Next up, I perform an up-to-date system-wide installation of libgit2 (requisite for git2r operations; in turn, atime requires git2r) from source.
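The gist of that installation step, assuming an apt-based runner and a hypothetical libgit2 release tag (the version number and build flags below are illustrative):

```yaml
- name: Install libgit2 from source
  run: |
    sudo apt-get update && sudo apt-get install -y cmake wget
    wget -q https://github.com/libgit2/libgit2/archive/refs/tags/v1.8.1.tar.gz
    tar -xzf v1.8.1.tar.gz && cd libgit2-1.8.1
    cmake . && make -j2 && sudo make install
```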

I then proceed to install the required R packages from a CRAN mirror. These include atime with its hard dependencies plus the packages required for generating the diagnostic visualizations. I follow up (within the same step) by running atime::atime_pkg (using the tests.R file from the .ci directory of the target package) in the workspace as allocated by my checkout step ($GITHUB_WORKSPACE environment variable pertains to the default checkout directory).
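Stripped of the workflow plumbing, that step is roughly equivalent to the following sketch (the package list is abbreviated, and the argument usage is my reading of atime’s documented interface):

```yaml
- name: Install R packages and run the atime tests
  run: |
    Rscript -e 'install.packages(c("atime", "ggplot2"), repos = "https://cloud.r-project.org")'
    Rscript -e 'atime::atime_pkg(Sys.getenv("GITHUB_WORKSPACE"), tests.dir = ".ci")'
```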

All of the generated results are then uploaded as an artifact using the upload-artifact action. v4 of that action brings this feature6 (allowing artifacts to be identifiable within a workflow) to the table, as its API can now create ID variables that are available (after the artifacts have been generated and uploaded) within the succeeding steps of the workflow (I use them to construct the artifact retrieval URLs).
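A sketch of that step, using the artifact-url output that v4 of upload-artifact exposes (the artifact name and path below are illustrative):

```yaml
- name: Upload atime results
  id: results-upload
  uses: actions/upload-artifact@v4
  with:
    name: atime-results
    path: .ci/atime/
- name: Use the generated artifact URL in a later step
  run: echo "Download link: ${{ steps.results-upload.outputs.artifact-url }}"
```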

Finally, it’s time to publish the results within a comment in the PR thread via the GitHub Actions bot! Everything goes into a markdown file: two plots (one hyperlinked to the other), the SHA for the commit everything is based upon, the link to download the artifact (which, again, is concocted from various environment variables), and an organized table with timing information for the different measured phases (the calculations for which run within this step; the timestamp recording points are distributed accordingly throughout the workflow and collected in $GITHUB_ENV for ease of access in subsequent steps).
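The timing bookkeeping can be simulated locally with plain shell. The variable names here are illustrative; in the real workflow, $GITHUB_ENV is a file provided by the runner that persists variables between steps:

```shell
# Simulate the $GITHUB_ENV file that GitHub Actions provides to each step:
GITHUB_ENV=$(mktemp)

# Record a timestamp before the measured phase...
echo "phase_start=$(date +%s)" >> "$GITHUB_ENV"
sleep 1   # ...the actual work (e.g. installing package versions) happens here...
echo "phase_end=$(date +%s)" >> "$GITHUB_ENV"

# A later step reads the recorded values and computes the elapsed seconds:
. "$GITHUB_ENV"
echo "Phase took $((phase_end - phase_start)) seconds"
```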

This specific order and segregation of tasks above can also be found in my old slides7.

Future work

My action has come a long way since the time I created an issue8 to introduce it to the data.table community, and subsequently the PR9 through which it got integrated into the project (the follow-up to that also included my first atime test!). The main goal in updating it from time to time (e.g. v1.4.110 and v1.3.111) has remained constant: to maintain the current functionality of automatically and actively monitoring changes in PRs for noticeable impact on performance (avoiding regressions is the highlighted focus, but the same enthusiasm applies to detecting improvements or observing stability). As and when required, the GHA can be expected to receive updates (or break out of a potential plateau) as long as this approach remains useful and aligns with the needs of the data.table project/community.

If reading so far has piqued your curiosity enough that you would like to contribute to optimizing the workflow, and if by the laws of coincidence you also happen to be a student, I would recommend checking out the Google Summer of Code12 program, as I recently wrote up a detailed project13 (primarily based on minifying package versions and caching/reusing them based on historical references to save CI resources/time) for extending work on this action. Until next time, happy coding!


Footnotes

  1. https://github.com/Anirban166/Autocomment-atime-results/blob/main/action.yml↩︎

  2. https://github.com/tdhock/atime↩︎

  3. https://github.com/iterative/cml↩︎

  4. https://github.com/marketplace/actions/autocomment-atime-results↩︎

  5. https://github.com/Anirban166/Autocomment-atime-results/issues/33#issuecomment-2038431272↩︎

  6. https://github.com/Anirban166/Autocomment-atime-results/issues/17↩︎

  7. https://drive.google.com/file/d/1_uD0k6vJMpxw9jiLQQSt2H8Da5kdCUb5/view↩︎

  8. https://github.com/Rdatatable/data.table/issues/6065↩︎

  9. https://github.com/Rdatatable/data.table/pull/6078↩︎

  10. https://github.com/Rdatatable/data.table/pull/6597↩︎

  11. https://github.com/Rdatatable/data.table/pull/6545↩︎

  12. https://summerofcode.withgoogle.com/↩︎

  13. https://github.com/rstats-gsoc/gsoc2025/wiki/Optimizing-a-performance-testing-workflow-by-reusing-minified-R-package-versions-between-CI-runs↩︎