Manual Deployment
If you contribute to the GATK-SV codebase, we recommend you ensure that affected Docker images build successfully and function as intended. The process involves two steps:
- Build: Create Docker images from Dockerfiles.
- Publish: Upload the built Docker images to container registries (e.g., Google or Azure container registries, GCR and ACR, respectively) to make them available for use in Terra or Cromwell. You may skip this step unless you would like to host the images you built on your own container registry.
To streamline the process, we have developed a script that automates both the build and publish steps. This section provides guidelines on setting up the environment and running the script with a minimal example.
Only Linux machines (dedicated or virtual) are supported for building GATK-SV Docker images. In addition, images created on non-Intel processor architectures (e.g., Apple M1) may not function as intended, even if the build process runs successfully.
Set up an Ubuntu VM
This section outlines steps to follow in order to create and connect to a Linux virtual machine (VM) on a cloud service provider. You may skip to the next section if you are using a dedicated Linux machine (e.g., a laptop running Ubuntu).
1. Set environment variables
- GCP
export PROJECT_ID="<GOOGLE PROJECT ID>"
export ZONE_ID="<ZONE ID>"
# Make sure no machine with the following name exists,
# and that you follow VM naming conventions, e.g., all lower-case characters.
export INSTANCE_NAMES="<VM NAME>"
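The naming check mentioned in the comment above can be sketched as a small shell function. This is only an illustration: `is_valid_vm_name` is a hypothetical helper (it is not part of GATK-SV or `gcloud`), and the pattern assumes the usual Compute Engine rule of 1 to 63 characters, starting with a lowercase letter, followed by lowercase letters, digits, or hyphens, and not ending with a hyphen.

```shell
# Hypothetical helper (not part of GATK-SV or gcloud): returns success when
# the given name matches the assumed Compute Engine naming pattern.
is_valid_vm_name() {
  # A lowercase letter first, then up to 61 lowercase letters, digits, or
  # hyphens, and a final lowercase letter or digit (63 characters at most).
  printf '%s\n' "$1" | grep -Eq '^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$'
}

# Example usage, once INSTANCE_NAMES is set:
# is_valid_vm_name "$INSTANCE_NAMES" || echo "Invalid VM name" >&2
```

The function only checks the format of the name; it does not check whether a machine with that name already exists in your project.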
2. Create an Ubuntu VM
You may skip to the next step if you have already created a VM.
- GCP
gcloud compute instances create $INSTANCE_NAMES \
--project=$PROJECT_ID \
--zone=$ZONE_ID \
--machine-type=e2-standard-2 \
--create-disk=auto-delete=yes,boot=yes,device-name=$INSTANCE_NAMES,image=projects/ubuntu-os-cloud/global/images/ubuntu-2310-mantic-amd64-v20240213,mode=rw,size=100
Note that this command creates a VM with a 100 GiB disk to accommodate the disk space requirements of GATK-SV Docker images.
You may follow the documentation on this page for more details on creating a virtual machine on GCP.
The firewall rules of your institute may require you to be on-site or connected to the institute's VPN before you can access the cloud resources billed to your institute.
3. Connect to the VM
- GCP
gcloud compute ssh $INSTANCE_NAMES --project $PROJECT_ID
Follow the on-screen prompts for authorizing access to SSH credentials.
Errors running this command
If you get any of the following error messages when you try to connect to the VM immediately after creating it, the VM may not be ready yet; wait a few minutes before retrying.
ssh: connect to host [IP address] port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
username@[IP address]: Permission denied (publickey).
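If you prefer not to retry by hand, the wait-and-retry advice above can be sketched as a generic shell function. `retry` is a hypothetical helper introduced here for illustration, and the attempt count and delay in the usage comment are arbitrary assumptions rather than recommended values.

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping
# DELAY_SECONDS after each failed attempt.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay_seconds}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example: try to connect up to 5 times, waiting 30 seconds between attempts.
# retry 5 30 gcloud compute ssh "$INSTANCE_NAMES" --project "$PROJECT_ID"
```

Retrying only helps when the VM is still starting up. If the error persists after several minutes, the cause is more likely related to credentials or firewall rules.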
4. Install Docker
You may skip to the next step if you have already installed and configured Docker on this VM.
- Install prerequisites
sudo apt-get update && \
sudo apt-get install ca-certificates curl && \
sudo install -m 0755 -d /etc/apt/keyrings && \
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc && \
sudo chmod a+r /etc/apt/keyrings/docker.asc && \
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
sudo apt-get update
- Install Docker
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin && \
sudo usermod -aG docker ${USER} && \
newgrp docker
You may follow the Docker documentation for details on installing Docker on Ubuntu.
- Log in to Docker
  - GCP
    - Run the following command on the VM.
      gcloud auth login
    - Follow the on-screen prompts; the command displays a URL that you need to copy and paste into the browser on your computer (not the VM).
    - Follow the prompts in your browser, and log in with an account that provides access to the GCR repository. If you plan to publish the images you build to GCR, make sure your account has sufficient access to GCR.
    - Configure Docker with your credentials.
      gcloud auth configure-docker
You may refer to this page for more details on configuring Docker to access GCR.
Check out the codebase
- Clone the repository, or the fork of it, that contains the branch with the changes you want to build the Docker images from.
git clone https://github.com/broadinstitute/gatk-sv && cd gatk-sv
- Check out the branch containing your changes.
git checkout <BRANCH_NAME>
Build and Publish Docker Images
In its minimal setup, you may use the following command to build and publish GATK-SV Docker images.
python3 scripts/docker/build_docker.py \
--targets <IMAGES> \
--image-tag <TAG> \
--docker-repo <CONTAINER_REGISTRY>
The arguments are explained below.
--targets
You may follow either of the following approaches to determine which images to rebuild.
- Manual: You may refer to the table in this section to determine which Docker images to rebuild based on the changed files. For instance, if you modified any of the files under the gatk-sv/src/svtk/ directory, you will need to rebuild the sv-pipeline Docker image. You can set the list of images to rebuild using the --targets argument. For instance:
python3 scripts/docker/build_docker.py \
--targets sv-pipeline
You may specify multiple images to rebuild by providing a list of their names. For instance, the following command builds the sv-pipeline and the str Docker images.
python3 scripts/docker/build_docker.py \
--targets sv-pipeline str
- Automatic (advanced): You may refer to this page for details on this method. Briefly, you may take the following steps.
  - git commit the changes.
  - Identify BASE_SHA and HEAD_SHA using git log or GitHub. You may use the following commands to get these SHAs.
export \
HEAD_SHA=$(git log -1 --pretty=format:"%H") \
BASE_SHA=$(git merge-base main $(git branch --show-current))
Note that you may need to modify these commands if your branch has a complicated git history.
  - Run the script using --base-git-commit and --current-git-commit instead of --targets.
python3 scripts/docker/build_docker.py \
--base-git-commit <BASE_SHA> \
--current-git-commit <HEAD_SHA>
Please note that the --targets option and the --base-git-commit and --current-git-commit options are mutually exclusive. In other words, you can either manually specify the images to rebuild, or let the script determine them automatically using commit SHAs; combining the two approaches, or omitting both, is not currently supported.
Following the steps above, the script builds the specified Docker images and all the images derived from them. You may add the --skip-dependent-images flag to build only the explicitly specified images.
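The mutual-exclusivity rule described above can be sketched as a small shell wrapper that emits the arguments for exactly one selection mode. `select_build_args` is a hypothetical helper for illustration only; it is not part of GATK-SV, and it merely mirrors the documented rule instead of replacing any validation the build script itself may perform.

```shell
# Hypothetical wrapper (not part of GATK-SV): prints build_docker.py arguments
# for exactly one image-selection mode, and fails for any other combination.
#   $1: space-separated image names for --targets (may be empty)
#   $2: base commit SHA (may be empty)
#   $3: head commit SHA (may be empty)
select_build_args() {
  targets=$1
  base_sha=$2
  head_sha=$3
  if [ -n "$targets" ] && [ -z "$base_sha" ] && [ -z "$head_sha" ]; then
    # Manual mode: only the target images are given.
    echo "--targets ${targets}"
  elif [ -z "$targets" ] && [ -n "$base_sha" ] && [ -n "$head_sha" ]; then
    # Automatic mode: both commit SHAs are given and no targets.
    echo "--base-git-commit ${base_sha} --current-git-commit ${head_sha}"
  else
    echo "Provide either target images or both commit SHAs, not a mix." >&2
    return 1
  fi
}

# Example usage (the command substitution is left unquoted on purpose, so the
# printed arguments are split into separate words):
# python3 scripts/docker/build_docker.py $(select_build_args "sv-pipeline str" "" "")
```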
--image-tag
You may use any naming convention for the Docker image tags. GATK-SV Docker images are tagged using the following template (you may refer to this section for details).
[Date]-[Release Tag]-[Head SHA 8]
For example:
--image-tag 2023-07-28-v0.28.1-beta-e70dfbd7
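A tag following this template can be composed with a short shell function, sketched below. `make_image_tag` is a hypothetical helper, and the commands in the usage comment (the `date` format and `git describe` for the release tag) are assumptions about how you might obtain the three parts, not a description of how the GATK-SV team generates its tags.

```shell
# Hypothetical helper: compose a tag as "[Date]-[Release Tag]-[Head SHA 8]".
#   $1: date, e.g., 2023-07-28
#   $2: release tag, e.g., v0.28.1-beta
#   $3: full or abbreviated head commit SHA
make_image_tag() {
  build_date=$1
  release_tag=$2
  head_sha=$3
  # Keep only the first 8 characters of the commit SHA.
  short_sha=$(printf '%s\n' "$head_sha" | cut -c1-8)
  printf '%s-%s-%s\n' "$build_date" "$release_tag" "$short_sha"
}

# Example usage inside the repository (assumed commands):
# make_image_tag "$(date +%Y-%m-%d)" "$(git describe --tags --abbrev=0)" "$(git rev-parse HEAD)"
```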
--docker-repo
If you are only testing the GATK-SV Docker image build, you may skip this section and omit --docker-repo <registry>. However, you need to provide this argument if you need to push images to container registries, need images for WDL testing, or need to host the images on a container registry other than those maintained by the GATK-SV team.
The build_docker.py script automatically pushes Docker images to a container registry when --docker-repo <registry> is provided, replacing <registry> with the container registry you want to use. When providing this argument, ensure that you are logged into Docker with credentials granting push access to the registry. You may configure and set the registry as follows.
- ACR
  - You may follow these steps if you have not configured a container registry.
  - Once configured, you may set <registry> in the following template.
    <REGISTRY>.azurecr.io/<REPOSITORY>/<IMAGE>
    Example:
    myregistry.azurecr.io/gatk-sv
- GCR
  - You may follow these steps if you have not configured a container registry.
  - Once configured, you may set <registry> in the following template.
    <HOST_NAME>/<REPOSITORY>/<IMAGE>
    Example:
    us.gcr.io/my-repository/gatk-sv
Post-build
- GATK-SV Docker images are mainly intended for use in WDLs. Therefore, it is good practice to run the related WDLs with the updated images to verify that the images function as expected.
- If you used a Linux VM to build the Docker images, ensure you either stop or delete the VM after building the images. Stopping the VM won't delete the disk, and you may continue to incur disk usage charges. If you plan on re-using the VM, stopping is preferred as it preserves the configuration; otherwise, you may delete the VM and all the associated resources (attached disks in particular).
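For a VM created on GCP as in the setup section, stopping or deleting it can be sketched with the two shell functions below. The function names `stop_build_vm` and `delete_build_vm` are hypothetical, and the sketch assumes that `INSTANCE_NAMES`, `PROJECT_ID`, and `ZONE_ID` are still set as in the setup section and that you run the functions from your own machine, not from the VM.

```shell
# Hypothetical helpers wrapping gcloud; they assume INSTANCE_NAMES, PROJECT_ID,
# and ZONE_ID are set as in the setup section.

# Stop the VM: keeps the disk and configuration, so disk charges continue.
stop_build_vm() {
  gcloud compute instances stop "$INSTANCE_NAMES" \
    --project="$PROJECT_ID" \
    --zone="$ZONE_ID"
}

# Delete the VM together with all of its attached disks.
delete_build_vm() {
  gcloud compute instances delete "$INSTANCE_NAMES" \
    --project="$PROJECT_ID" \
    --zone="$ZONE_ID" \
    --delete-disks=all
}
```

Call `stop_build_vm` if you plan to re-use the VM, or `delete_build_vm` otherwise. Deleting a VM cannot be undone, so check that the images you need have been pushed before running it.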