Source code and related materials for the O'Reilly book

This project is maintained by broadinstitute


Source code and related materials for Genomics in the Cloud, an O’Reilly book by Geraldine A. Van der Auwera and Brian D. O’Connor.

This site is a work in progress; we will continue to add content here now that the book has been released.

Find the electronic version of the book today at or on Amazon (Kindle version), or pre-order the paperback version on Amazon.

Book overview

Data in the genomics field is booming. In just a few years, organizations such as the National Institutes of Health (NIH) will host 50+ petabytes—or 50 million gigabytes—of genomic data, and they’re turning to cloud infrastructure to make that data available to the research community. How do you adapt analysis tools and protocols to access and analyze that data in the cloud?

With this practical book, researchers will learn how to work with genomics algorithms using open source tools including the Genome Analysis Toolkit (GATK), Docker, WDL, and Terra. Geraldine Van der Auwera, longtime custodian of the GATK user community, and Brian O’Connor of the UC Santa Cruz Genomics Institute guide you through the process. You’ll learn by working with real data and genomics algorithms from the field.

This book takes you through:


List of commands

See the commands folder for text files that let you easily copy and paste the commands from the hands-on exercises.


For those of you reading the print version of the book, which does not include color figures, we’ve made the figures available in the figures directory of the GCS bucket.
You may use all figures except 3-3 and 6-15 in your own non-commercial work, preferably with a notice of attribution referring to the book. For commercial use, please contact Figures 3-3 and 6-15 do not belong to us, so you must request permission from their respective owners, which are noted in the book.


We’re developing a blog for the book at where we will publish blog posts, additional tutorials, errata for the book, and regular updates on new features that you may be interested in. Feel free to suggest blog topics by reaching out to us on Twitter or LinkedIn (see contact info below).

Reporting errors

If you encounter errors or broken links in the book, please file an issue on O’Reilly’s Errata page. Anything reported there that we can verify will get fixed and updated in both the electronic versions and subsequent printing runs of the book, so others won’t run into the same problems.

We don’t use GitHub Issues for this project, to avoid confusion and redundancy with the O’Reilly Errata page.

Getting help

If you run into problems while working through the hands-on exercises, or if you have follow-up questions about the topics we discuss in the book, please post your questions in either the GATK forum or the Terra forum. The frontline support team will most likely be able to address your questions, and for anything else they will loop us into the conversation if you mention that your question is related to our book. If you’re not sure which forum to use, just flip a coin; it’s the same team that maintains both communities.

Remember also that you can often save yourself some time by searching the GATK documentation or Terra documentation before posting a question – that way you don’t have to wait for someone to get back to you.

Getting in touch with us

If you’d like to get in touch, you can reach us on Twitter (@VdAGeraldine and @boconnor) and on LinkedIn (Geraldine and Brian). We look forward to hearing what you think of the book! If you like it, please consider posting a review on Amazon.