This hands-on course introduces students to modern computational methods for biology, with an emphasis on emerging standards of Open Science and Reproducible Research. No prior programming experience is assumed. Developed in coordination with the Queen’s University Centre for Advanced Computing (CAC), the course teaches core competencies in the following areas:

  1. Programming basics: regular expressions, flow control, file manipulation, custom functions.
  2. Unix/Linux command-line programming and shell scripting on CAC infrastructure (SLURM), plus R with GitHub, R Markdown, and the tidyverse (dplyr, the %>% pipe, tidyr, ggplot2) for seamless, reproducible data management, analysis, and production of publication-ready graphics.
  3. Introduction to standard file formats (e.g. FASTA, FASTQ, SAM, BAM, BED, VCF), programs (e.g. SAMtools, HISAT2), and pipelines in Unix/Linux, R, and Python for reproducible analysis of data from next-generation sequencing platforms.

Content includes an introduction to state-of-the-art methods in genomics and metagenomics, including sequence assembly, alignment, variant detection, and gene annotation. However, most of the course (about three-quarters) is devoted to basic computational tools that may be useful to biologists working with large datasets of any kind.