Bioinformatics Team

MRC Clinical Sciences Centre

Thomas Carroll

The Bioinformatics Team.

  • Tom Carroll 
  • Gopuraja Dharmalingam
  • Sanjay Khadayate
  • Yi-Fang Wang
  • Marion Dore
  • TBD

Websites

Where to find the team.

  • ICTEM
  • 2nd floor, MRC.
  • Central aisle.
  • Behind the printers.

Role

  • Analysis
  • Experimental design.
  • Bioinformatics Infrastructure.
  • Training.
  • Bioinformatics Seminar Series - June 2016.

Experimental Design

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”

Fisher RA, 1938

  • Work closely with Genomics Team to help with design questions
    • Replicate number.
    • Sequencing depth.
    • Sequencing strategy.

Nice example of experimental design

  • RNA-seq experiment (2014)
  • Graph shows major sources of variation.
  • Samples from same groups close together.
  • Samples from different experimental conditions separate. 

Nice example of experimental design

  • Smaller sources of variance relating to other metadata.
  • Samples group according to the day on which RNA was extracted.
  • Known effects can be removed from analysis.
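As a minimal sketch of removing a known effect, assuming the batch variable (here the RNA extraction day) is balanced across the experimental groups, one simple option is to centre each gene within each batch. The toy data, effect sizes and function names below are illustrative assumptions, not taken from the experiment shown; in practice the known effect is often included as a term in the statistical model instead.

```python
import numpy as np

def remove_batch_means(expr, batch):
    """Centre each gene within each batch, then restore the overall gene mean.

    expr  : samples x genes array of log-scale expression values
    batch : one batch label per sample (e.g. RNA extraction day)
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    overall = expr.mean(axis=0)
    corrected = expr.copy()
    for label in np.unique(batch):
        rows = batch == label
        corrected[rows] -= expr[rows].mean(axis=0)
    return corrected + overall

def day_gap(values, day):
    """Largest per-gene difference between the mean of day d1 and day d2."""
    d1 = values[day == "d1"].mean(axis=0)
    d2 = values[day == "d2"].mean(axis=0)
    return np.abs(d1 - d2).max()

# Toy data (hypothetical): 8 samples, 50 genes, two conditions,
# two extraction days, with every condition represented on every day.
rng = np.random.default_rng(0)
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
day = np.array(["d1", "d2", "d1", "d2", "d1", "d2", "d1", "d2"])
expr = rng.normal(size=(8, 50))
expr[condition == 1, :10] += 3.0   # condition effect on the first 10 genes
expr[day == "d2", 10:30] += 2.0    # extraction-day effect on genes 10 to 29

corrected = remove_batch_means(expr, day)
print("day gap before:", round(day_gap(expr, day), 2))
print("day gap after: ", round(day_gap(corrected, day), 2))
```

Because the design is balanced, centring by extraction day leaves the difference between conditions untouched. With an unbalanced design this shortcut would also remove part of the condition effect.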

Analysis

  • Initial data processing and QC.
  • Advice and support as needed.
  • Support throughout project.

Increased demand for long-term support.

Authorships in 19 publications since 2014

12 in 2015.

Analysis support

  • Increased use of high throughput techniques in projects.
  • Greater use of bioinformatics in projects.
  • Analysis across project lifetime or individual elements.
  • Requires reproducible research.

Reproducible research

  • Reproducible results from computational methods should be straightforward.
  • Common problems.
    • Version and software changes.
    • Lack of analysis documentation.
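One way to guard against both problems is to save the software versions alongside every set of results, the same idea as R's sessionInfo(). The sketch below is a hedged Python illustration using only the standard library; the function name and the package list are assumptions made for the example.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata

def session_info(packages):
    """Collect interpreter, platform and package versions for an analysis log."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }

# The package names are examples only; list whatever the analysis imports.
info = session_info(["pip", "numpy"])
report = json.dumps(info, indent=2, sort_keys=True)
print(report)
```

Writing this record to a file next to the results documents which versions produced them, so a later change in software can be spotted when results differ.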

rMarkdown

  • rMarkdown converts R code into dynamic reports.
  • Code, results and versions are reported within the same page.
  • HTML allows for inclusion of dynamic elements.

A do-it-yourself guide

Project tracking

  • Use Redmine software.
  • Multi-user interface to record project information.
  • Repository to version control scripts (SVN).
  • Wiki for internal documentation.

Infrastructure

  • Analysis pipelines.
  • Data delivery.
  • Software development.

ChIP-seq and RNA-seq pipelines

  • Common analysis steps can be automated.
  • Optimised for local resources.
  • Reproducible and comparable.
  • ChIP-seq and RNA-seq pipeline to automate alignment and quality control.
  • Freely available for use or customisation on GitHub.

http://mrccsc.github.io/
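To illustrate how common steps can be automated so that every sample is processed identically, here is a minimal sketch of a pipeline planner. It is not the team's pipeline: the tool names "qc_tool" and "aligner", their flags, and the sample sheet are hypothetical placeholders, and the code only builds the commands rather than running them.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Step:
    name: str
    command: list  # argument vector; tool names and flags are placeholders

def plan_sample(sample, fastq, genome_index, outdir):
    """Return the ordered steps for one sample: sequence QC, then alignment."""
    out = Path(outdir) / sample
    return [
        Step("sequence_qc",
             ["qc_tool", "--input", str(fastq), "--outdir", str(out / "qc")]),
        Step("alignment",
             ["aligner", "--index", str(genome_index), "--reads", str(fastq),
              "--output", str(out / f"{sample}.bam")]),
    ]

def plan_run(sample_sheet, genome_index, outdir):
    """Build the same steps for every sample, in a fixed (sorted) order."""
    return {
        sample: plan_sample(sample, fastq, genome_index, outdir)
        for sample, fastq in sorted(sample_sheet.items())
    }

# Hypothetical sample sheet mapping sample names to FASTQ files.
plan = plan_run({"s2": "s2.fastq.gz", "s1": "s1.fastq.gz"},
                genome_index="genome_index", outdir="results")
for sample, steps in plan.items():
    for step in steps:
        print(sample, step.name, " ".join(step.command))
```

Keeping the plan as data makes it easy to log exactly what was run for each sample, which supports the reproducibility and comparability goals above.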

RNA-seq and others in the pipeline

  • Internal RNA-seq pipeline
    • Written in R.
    • Easily installed and maintained.
    • Allows Core to move between systems easily.
    • To be released soon.
  • Genomics pipeline.
    • R based.
    • Automate basecalling and sequence QC capture.
    • Development version on GitHub site.
  • ChIP-seq R pipeline.
  • Basecalling to ChIP/RNA-seq QC.

UCSC genome browser

  • UCSC allows for visualisation of a range of genomics data types.
  • Public instances can be very slow.
  • CSC public instance maintained by Bioinformatics team.
  • Web: http://ucsc
  • FTP: ftp://ucsc

Software

  • Develop and maintain software relevant to our work.
  • R packages and JavaScript toolsets.
  • Release software to public (peer-reviewed) repositories.
    • Collaborative feedback.
    • Automated build reports and checking.

ChIPQC

  • Lack of suitable R/Bioconductor quality control tools for ChIP-seq.
  • Require methods to assess quality across high volumes of samples.
  • ChIPQC developed and tested on 500 public datasets.

Package

Bioc2014 Tutorial

  • IGV is a popular alternative to UCSC.
  • Allows for inclusion of per-sample metadata and complex sample display types.
  • Tracktables creates standalone and rMarkdown-compliant tables.

Tracktables

  • Visualising genomics data over regions of the genome.
  • Allows for rapid generation of profiles and subsetting by IDs or other regions.
  • Arithmetic operations between and within profiles allow for rapid, iterative investigation of hypotheses.
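The kind of profile arithmetic described here can be sketched with plain arrays. This is not the Soggi interface: the sketch assumes a profile is a regions x positions matrix of signal, and the region IDs, signal values and pseudocount of 1 are invented for illustration.

```python
import numpy as np

# Hypothetical profiles: one row per region, one column per bin around its centre.
region_ids = np.array(["geneA", "geneB", "geneC", "geneD"])
positions = np.arange(-2, 3)  # bins relative to the region centre

chip = np.array([[1, 2, 8, 2, 1],
                 [0, 1, 4, 1, 0],
                 [2, 2, 2, 2, 2],
                 [1, 3, 9, 3, 1]], dtype=float)
control = np.ones_like(chip)

# Arithmetic between profiles: log2 enrichment of ChIP over control.
# The pseudocount of 1 avoids division by zero and log of zero.
log2_ratio = np.log2((chip + 1) / (control + 1))

# Arithmetic within a profile: scale each region to its own maximum.
scaled = chip / chip.max(axis=1, keepdims=True)

# Subsetting by IDs, then averaging over the selected regions.
keep = np.isin(region_ids, ["geneA", "geneD"])
average = log2_ratio[keep].mean(axis=0)

for pos, value in zip(positions, average):
    print(f"{pos:+d}\t{value:.2f}")
```

Because each operation returns another matrix of the same layout, steps such as ratio, scaling and subsetting can be chained in any order while exploring a hypothesis.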

Soggi

  • Peak calling in R is convenient.
  • Many peak callers in R have unwieldy input and are far from optimised.
  • triform contains a reliable peak calling algorithm in need of optimisation for speed and long marks.
  • MRC CSC took over maintenance of triform in 2015.

triform

Training

  • Aim to develop courses to meet MRC Clinical Sciences requirements.
    • R
    • Python
    • High throughput sequencing analysis.

CSC Bioinformatics Course

  • Current and upcoming Bioinformatics training material can be found at our site

http://mrccsc.github.io/training.html

Training Collaborations

Develop and share courses with other bioinformatics teams.

https://github.com/bioinformatics-core-shared-training.html

Training on the cloud.

  • Awarded grant from Amazon Web Services.
  • Use virtual Linux servers to host R and RStudio pre-loaded with course material.
  • Allow for larger, real-world analysis tasks during training.
  • No need for a dedicated classroom - train from anywhere.

Bioinformatics Seminar Series

 

  • Discuss methodology behind bioinformatics analyses.
    • Laurent Gatto - Head of Computational Proteomics, Cambridge Proteomics Centre.
    • Simon Andrews - Head of Bioinformatics, Babraham Institute.
  • Starts again June 2016.
  • More information on previous and upcoming speakers can be found on our website.

Have a great week!

Contacts and thanks

Bioinformatics Team

Tom - thomas.carroll@imperial.ac.uk

Gopu - gopuraja.dharmalingam@imperial.ac.uk

Sanjay - sanjay.khadayate@imperial.ac.uk

Yi-Fang - yi-fang.wang@imperial.ac.uk

Marion - marion.dore@imperial.ac.uk