About

This is the blog for the book “Genomics in the Cloud” by Geraldine Van der Auwera and Brian O’Connor, published by O’Reilly Media in May 2020. The purpose of this blog is to provide supplemental information about updates, context and materials that either did not fit in the book or arose after its original publication.


About the book

This book takes you through:

  • Essential genomics and computing technology background
  • Basic cloud computing operations
  • Getting started with GATK, the Broad Institute’s industry-leading variant calling software
  • Three major GATK Best Practices pipelines for variant discovery
  • Automating analysis with scripted workflows using WDL and Cromwell
  • Scaling up workflow execution in the cloud, including parallelization and cost optimization
  • Interactive analysis in the cloud using Jupyter notebooks
  • Secure collaboration and computational reproducibility using Terra

Where to find it / Available formats

You can find the electronic version of the book in the O’Reilly Learning Library at https://oreil.ly/genomics-cloud or from your preferred e-book vendor. There is a Kindle version available from Amazon.

The print version is available on Amazon and from all major online booksellers. We do encourage you to support an independent bookseller if you are able.

The e-book versions feature full-color figures, while the print version is grayscale only. See the “Figures & companion booklet” section under Resources below for free access to the original full-color figure files.


Resources

Code, list of commands, etc.

See the GitHub repository for access to the book materials.

Figures & companion booklet

All figures from the book are available in the figures directory of the GCS bucket.
You may use all figures except 3-3 and 6-15 in your own non-commercial work, preferably with a notice of attribution referring to the book. For commercial use, please contact permissions@oreilly.com. Figures 3-3 and 6-15 do not belong to us, so you must request permission from their respective owners, which are noted in the book.

In addition, we made a PDF booklet containing the detailed table of contents and all the figures (organized by chapter) to make it easy to download, browse and optionally print figures in full color.

Reporting errors

If you encounter errors or broken links in the book, please file an issue on O’Reilly’s Errata page. Anything reported there that we can verify will get fixed and updated in both the electronic versions and subsequent printing runs of the book, so others won’t run into the same problems.

To avoid confusion and redundancy with the O’Reilly Errata page, we don’t use GitHub Issues for this project.

Getting help

If you run into problems while working through the hands-on exercises, or if you have follow-up questions about the topics we discuss in the book, please post your questions in either the GATK forum or the Terra forum. The frontline support team will most likely be able to address your questions, and for anything else they will loop us into the conversation if you mention that your question is related to our book. If you’re not sure which forum to use, just flip a coin; it’s the same team that maintains both communities.

Remember also that you can often save yourself some time by searching the GATK documentation or Terra documentation before posting a question – that way you don’t have to wait for someone to get back to you.

Getting in touch with us

If you’d like to get in touch, you can reach us on Twitter (@VdAGeraldine and @boconnor) and on LinkedIn (Geraldine and Brian). We look forward to hearing what you think of the book! If you like it, please consider posting a review on the Amazon listing or on O’Reilly’s site.