INTRODUCTION TO BIOINFORMATICS RESOURCES FOR VECTOR GENOMICS STUDIES

  • Date: Monday 24 – Friday 28 September 2018
  • Venue: Santa Chiara Lab, Università di Siena 1240, Via Valdimontone 1, 53100 Siena (Italy)
  • Hosted by: Polo d’Innovazione di Genomica, Genetica e Biologia
  • Application opens: 8 May 2018
  • Application deadline extended to 11 July 2018: CLOSED
  • Availability: 15 places
  • Participation: By application; an independent expert panel selects candidates for course attendance and travel grant eligibility

Course overview

A course on the principles and practices of vector genomics and bioinformatics aimed at junior scientists (students and postdocs), and/or applicants with a research capacity strengthening component. This course will provide an overview of the bioinformatics tools and resources that will help you in your vector studies, to get more from your data and enable you to further explore the Infravec2 facilities that can support your future research.

No prior experience of bioinformatics is required.

This is a 5-day course organized in modules delivered by both national and international trainers. Each module includes a practical, hands-on session.

Registration and accommodation costs are covered by the course. All successful candidates will receive funding in the form of travel grants to attend the course. These grants are calculated on the basis of the country where the applicant’s host laboratory / institution is established, regardless of the applicant’s nationality:

  • €460 to successful applicants working/studying in an EU Member State or Horizon 2020 Associated Country;
  • €750 to successful applicants working/studying in any other country.

A list of Horizon 2020 Associated Countries can be found in the attached document [PDF].

Any other costs not specified in the course overview (such as visa fees, stationery, telephone charges, etc.) will not be covered. The fixed-amount travel grant will be paid to selected candidates upon registration on the first day of the course. Successful applicants will be asked to provide a photocopy of their passport/ID and bank details before the course starts. No receipts, proof of purchase or expense claims are necessary for the fixed-amount travel grant.

For more information, please contact the Course Manager.

Syllabus, tools and resources

Coursework material will be distributed by the trainers during the course. Any preparatory or recommended reading will be shared with successful applicants in advance of the course. Successful applicants must bring their own laptop to the course (minimum requirements: 8 GB RAM and a recent CPU) and ensure that a UNIX system is supported and running on it. Successful candidates may be instructed to install dedicated software, such as VirtualBox, in advance of or during the course.

Programme

Morning sessions: 9.00-12.30, afternoon sessions: 14.00-18.00, coffee breaks are included in each session


DAY 1

TOPIC 1.1: INTRODUCTION TO BASH AND R

  • Bash environment
  • Basic bash commands to manage files, folders and simple scripts
  • Practical session
  • R environment and RStudio
  • CRAN and Bioconductor packages
  • Working with biological data in R: basic tasks and simple plots

Trainer: Roberto Semeraro, University of Florence


DAY 2

TOPIC 2.1: RESEQUENCING

  • Sequence quality and pre-processing: FastQC, SAMStats
  • Sequence alignment: BWA, Bowtie, SOAP, mrFAST
  • Alignment visualization and manipulation: IGV, SAMtools
  • Small variant calling: BCFtools, GATK

Trainer: Luca Pinello, Harvard University


TOPIC 2.2: DE NOVO ASSEMBLY

  • Sequence quality and pre-processing: FastQC, SAMStats
  • Sequence error correction (self-correction): PBcR, Canu, PoreSeq
  • Sequence error correction (hybrid error correction): Nanocorr, NaS, PBcR

Trainer: Alberto Magi, University of Florence


DAY 3

TOPIC 3.1: RNA SEQUENCING

  • Introduction to RNA sequencing
  • Process raw data and quality control
  • Alignment and visualization
  • Quantification of expression of transcripts
  • Differential analysis

Trainer: Nace Kranjc, Imperial College London


TOPIC 3.2: smallRNA SEQUENCING

  • Introduction to RNA Sequencing
  • Process raw data and quality control
  • Identification of known and novel miRNAs – miRBase database
  • Expression quantification of miRNAs
  • Differential expression analysis across different conditions

Trainer: Romina D’Aurizio, National Research Council


DAY 4

TOPIC 4.1: METAGENOMICS

  • Introduction to metagenomic sequencing
  • Preprocessing raw data and quality control (remove primers, demultiplex, quality filter, decontamination)
  • Analysis of 16S rRNA marker gene data, OTU picking – SILVA and Greengenes databases
  • Taxonomic assignment, OTU table building
  • Differential analysis of taxonomic abundance across sample groups

Trainer: Marco Bruttini, Polo GGB


DAY 5

TOPIC 5.1: LONG READS: NANOPORE SEQUENCING

  • Introduction to Nanopore sequencing: sequence size, sequence quality and error rate
  • Long reads for resequencing (alignment): BWA, BLASR, LAST, GraphMap
  • Long reads and small variant calling: Nanopolish, MarginCaller

Trainer: Alberto Magi, University of Florence

 

Trainers’ biographies


Dr. Roberto Semeraro, University of Florence
Roberto Semeraro graduated in Biotechnology at the University of Florence, with a thesis on the composition and evolution of Sinorhizobium meliloti metabolism. He later completed his PhD in Information Engineering, working with complex biological systems. During this period, Roberto focused mainly on humans, and particularly on cancer research, developing methods for the analysis of tumor sequencing data. His current post-doctoral project involves working with cancer genomes, studying the impact of single nucleotide variations (SNVs), InDels and structural variants on disease onset and progression. Roberto is also engaged in developing methods for the visualization and processing of long-read sequencing data from Oxford Nanopore Technologies.


Dr Luca Pinello, Harvard University
Luca Pinello is a computational biologist studying the role of chromatin structure/dynamics and of non-coding regions in gene regulation. The mission of his lab is to integrate omics data to explore and better understand the functional mechanisms of the non-coding genome, and to provide accessible tools for the community to accelerate discovery in this field. Luca fully embraced the revolution in functional genomics made possible by novel genome editing approaches such as CRISPR/Cas9 and has developed computational tools to quantify and visualize the outcome of genome editing experiments that are now the de facto standard for the community. He is also actively involved in the single-cell community and is part of the Human Cell Atlas initiative, proposing computational strategies to model gene expression variability and its relationship with chromatin accessibility, and to reconstruct developmental trajectories. The long-term goal of his research is to develop computational approaches and to use cutting-edge experimental assays, such as single-cell and genome editing techniques, to systematically analyze sources of genetic and epigenetic variation that affect gene regulation in different human traits and diseases. “I believe this will provide a foundation for the development of new drugs and more targeted treatments”.


Dr Alberto Magi, University of Florence
After obtaining a master’s degree in Environmental Engineering (2002) and a PhD in Nonlinear Dynamics and Complex Systems (2006) in Florence, Dr. Magi carried out his postdoctoral research, focusing on genomic informatics and network biology. His fascination with genetic variation led him to develop computational approaches that use next-generation sequencing and microarray data to study human genetic variation, publishing the first algorithm for the detection of common CNVs (Copy Number Variants) from whole-genome sequencing data.
He was among the first authors to demonstrate the capability of the read count (RC) approach to predict the boundaries and the exact number of DNA copies of genomic regions involved in CNVs.
Drawing on his experience with the RC approach, he developed and published EXCAVATOR, a software tool for the detection of germline and somatic copy number alterations (CNAs) from whole-exome sequencing data. EXCAVATOR is currently considered a state-of-the-art tool for somatic CNA detection; with more than 4000 downloads, it has been used in several cancer genomics papers published by different research groups.
In 2015, he discovered the presence of almost 100,000 errors in the sequence of the GRC human reference genome, demonstrating that currently available somatic variant callers are not capable of detecting variants in these loci, and developed a tool, RAREVATOR, for calling SNVs and small InDels that takes these errors into account.
Recently, he has focused on long-read sequencing, publishing the first and most comprehensive analysis of the capability of third-generation sequencing data (Nanopore and PacBio) to be used for the detection of SNVs (Single Nucleotide Variants), small InDels and structural variants in re-sequencing studies.
He is the author of more than 40 papers in international peer-reviewed journals and a reviewer for Nucleic Acids Research, Bioinformatics, BMC Bioinformatics, PLOS ONE, Journal of Biomedicine and Biotechnology, Human Heredity, and Communications in Nonlinear Science and Numerical Simulations.
At present, he is the principal investigator of a research grant funded by the Italian Ministry of Health for the development of novel computational strategies for the detection of structural variants from whole-exome sequencing data applied to acute myeloid leukemia. He is also the co-principal investigator of a research grant (Italian Ministry of Health) for the study of “Clonal heterogeneity and clonal progression in the leukemic transformation of chronic myelo-proliferative neoplasms”.
Since 2014 he has been an assistant professor at the Department of Experimental and Clinical Medicine of the University of Florence, where he is head of the Laboratory of Bioinformatics for Omic Sciences.


Mr. Nace Kranjc, Imperial College London
Nace Kranjc is a PhD student at Imperial College London in the renowned laboratory of Professor Andrea Crisanti. He is focusing on improving gene drive nucleases for use in the mosquito and on understanding the consequences of their activity in the mosquito genome using a bioinformatics approach. Before commencing his doctoral studies, Nace worked with Genialis, a Slovenian provider of bioinformatics products and consulting services, where he gained experience in developing interactive web tools for RNA-seq data exploration and visualization for both private-sector and academic applications.


Dr. Romina D’Aurizio, National Research Council (CNR)
Romina D’Aurizio is a mathematician with wide expertise in the analysis of high-throughput data, especially deep sequencing data (whole-genome and exome sequencing, RNA-seq and small RNA-seq). During her PhD programme she actively collaborated with the Research Department of Novartis Vaccines and Diagnostics in Siena. Her main activities were developing computational methods to predict the cellular localization of Gram+/- bacterial proteins and to identify proteins in mass spectrometry experiments. She was involved in comparative analysis studies of raw genomes, their normalization and functional annotation to identify proteins interacting with host cell targets. As a postdoctoral researcher, she developed expertise in deep sequencing data analysis for de novo assembly of prokaryotic genomes and variant characterization, as a foundation for investigating the relationship between genotype and phenotype and for comparative studies. In 2012 she joined the National Research Council (CNR) in Pisa, where she is involved in various projects aimed at characterizing the functional role of microRNAs and other ncRNAs as biomarkers, in resistance mechanisms to anticancer drugs and in cardiac pathogenesis.


Dr Marco Bruttini, Polo d’Innovazione di Genomica, Genetica e Biologia
Marco Bruttini is an NGS data analyst with experience in bioinformatics workflows for genomics, transcriptomics and metagenomics applications. He is also involved in the creation and use of data algorithms, especially in the field of targeted resequencing approaches. He completed his PhD in Molecular Biology in a collaboration between GSK Vaccines and the University of Siena, working on a project to characterize the immune IgG response following serogroup B Neisseria meningitidis vaccination of healthy subjects. Marco’s preferred software environment for statistical computing is R, and he is currently working towards the release of a package for the analysis of NGS raw data from CRISPR/Cas9 experiments.