Introducing OMSF Self-Hosted Runners
Written by: Ethan Holz
Today marks the official launch of almost a year of work from the OMSF Ecosystem Infrastructure team to deliver a solution for Self-Hosted GitHub Actions runners. This directive was set out in the OMSF's NSF POSE grant: implement a solution that enables GPU-accelerated testing on cloud infrastructure for the OMSF's hosted projects and the broader molecular software space. The solution has been in production over the last few months, during which we learned about the demands of maintaining software that provisions such critical infrastructure for projects that rely heavily on GPUs.
This idea is not a new one; in fact, we reviewed 6 potential solutions in this space, sourced from the awesome-runners repository. Many of these were out of scope for us, but we narrowed down testing to the following:
- ansible-github_actions_runner
- docker-github-actions-runner
- ec2-github-runner
- actions-runner-controller
- terraform-aws-github-runner
- github-runner
These projects provide numerous ways of provisioning self-hosted runners. Solutions like actions-runner-controller and terraform-aws-github-runner focus on Kubernetes-based or AWS-native scaling to enable rapid deployment of CI environments, whereas docker-github-actions-runner and github-runner use Docker to enable deployment on any container platform. What we really wanted was a solution that was:
- Configurable in GitHub Actions
- Able to support multiple clouds
- Easy for the research software community to contribute to
One stood out to us for the right mix of workflow and scale: ec2-github-runner. This project focuses on ephemeral deployments using the GitHub Actions syntax we all know and love. To integrate it, a developer only needs to sandwich their existing workflow between a start block and a stop block, which create and tear down the runner. In fact, this almost met our goals! There were a few problems, though:
- The project was not very actively maintained
- It only supports running on AWS and not other providers
- Contributing might be difficult for our projects, since it is a JavaScript-based project
With these drawbacks in mind, we decided to write a Python-based solution, architected to be as easy to use as ec2-github-runner but flexible enough to support other clouds.
What we built
The result is gha-runner, a PyPI-published library aimed at making it easier for developers to create custom provisioning for GitHub Actions runners. This library is a set of abstract classes and helper functions that make creating ephemeral self-hosted runners easy and maintainable. Additionally, we implemented start-aws-gha-runner and stop-aws-gha-runner to showcase how this library can be leveraged to deploy on AWS.
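To give a feel for the abstract-class approach, here is a minimal sketch of how a provider interface and the start/stop "sandwich" could fit together. It is an illustration only and not the gha-runner API: the names `RunnerProvider`, `InMemoryProvider`, `create_instances`, `remove_instances`, and `ephemeral_runners` are all hypothetical, and the "cloud" here is just an in-memory set.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class RunnerProvider(ABC):
    """Hypothetical provider interface (NOT the actual gha-runner classes)."""

    @abstractmethod
    def create_instances(self, count: int) -> list[str]:
        """Provision `count` runners and return their identifiers."""

    @abstractmethod
    def remove_instances(self, instance_ids: list[str]) -> None:
        """Tear down the runners with the given identifiers."""


class InMemoryProvider(RunnerProvider):
    """Stand-in for a real cloud: tracks 'running' instances in a set."""

    def __init__(self) -> None:
        self.running: set[str] = set()
        self._counter = 0

    def create_instances(self, count: int) -> list[str]:
        ids = []
        for _ in range(count):
            self._counter += 1
            instance_id = f"runner-{self._counter}"
            self.running.add(instance_id)
            ids.append(instance_id)
        return ids

    def remove_instances(self, instance_ids: list[str]) -> None:
        for instance_id in instance_ids:
            self.running.discard(instance_id)


@contextmanager
def ephemeral_runners(provider: RunnerProvider, count: int = 1):
    """Start runners, hand them to the caller, and always stop them after."""
    ids = provider.create_instances(count)
    try:
        yield ids
    finally:
        # Runs even if the job fails, so no runner is left behind.
        provider.remove_instances(ids)


if __name__ == "__main__":
    provider = InMemoryProvider()
    with ephemeral_runners(provider, count=2) as ids:
        print("running jobs on", ids)
    print("still running:", provider.running)
```

In a design like this, supporting another cloud would mean writing one more subclass of the provider interface, while the start/stop logic stays the same.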
We delivered this with excellent unit testing and performed numerous integration tests, and a few of the OMSF-hosted projects are using it today, not only for GPU testing but also for job submission. We also have full documentation for the base library, as well as llms.txt support to provide LLMs with excellent context on how to build using this library.
Where do we go from here?
We aim to provide support for a few other cloud providers in the near future, giving our developers (and the community at large) more flexibility when it comes to compute. We would also love for the community to try this project out and show us how you might leverage more versatile compute when developing research software. Happy testing!