Topic: environmental biotechnology
Order Description
 Aim: Using publicly available sequence data, provide an analysis comparing two or more metagenomes.
 Background:
 MG-RAST is a project run by Argonne National Laboratory (US). It is a helpful and relatively simple tool for analysing metagenomic data. Its pipeline provides quality control, protein prediction and the bioinformatic tools needed to annotate protein function and assign taxonomic classifications to sequences. The data used in this practical have been made publicly available by the teams who published research using these sequences. This means that they have already collected and extracted the DNA for sequencing. Once the sequencing files were obtained, they were uploaded to MG-RAST, where they underwent quality control protocols.
 To find out exactly what has been done to the sequences you select for your assignment, you will need to read the dataset report within MG-RAST. You may also find the paper(s) associated with these sequences useful.
 Task:
 To write your full scientific report, you need to obtain full metagenomic datasets from your chosen biomes.
Everyone will be working with different sequences, which you select using the search function in MG-RAST.
 You need to know what question(s) you are going to address in your paper. This will have a big impact on what data is relevant to you.
 Some things to consider when selecting the appropriate sequence files:
 – Are you looking to compare specific environments?
 – Are you going to look only at specific phyla or classes of organism? That is, what taxonomic resolution will you use, and how many groups will you include?
 – Do you want to include replication?
 o This is a particularly important point, as not all metagenomic data are replicated: some datasets include replicates, but others do not.
 o Besides the cost, replicating metagenomic data is also time-intensive for the investigators.
 o However, if you have little or no metadata to compare your data against, replicated samples will be more convincing to journal reviewers.
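 The considerations above (which environments to compare and at what taxonomic resolution) can be illustrated with a minimal Python sketch. It is not part of the MG-RAST pipeline and is only one possible way to compare two samples: it assumes you have already exported phylum-level read counts for each metagenome, converts them to relative abundances so that samples of different sequencing depth are comparable, and computes the Bray-Curtis dissimilarity between them. The sample names and counts are hypothetical and are used for illustration only.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Returns 0.0 for identical profiles and 1.0 when no taxa are shared.
    """
    taxa = set(a) | set(b)
    difference = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    total = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return difference / total


# Hypothetical phylum-level read counts for two sediment metagenomes.
sediment_a = {"Proteobacteria": 600, "Bacteroidetes": 300, "Firmicutes": 100}
sediment_b = {"Proteobacteria": 200, "Actinobacteria": 500, "Firmicutes": 300}

profile_a = relative_abundance(sediment_a)
profile_b = relative_abundance(sediment_b)
dissimilarity = bray_curtis(profile_a, profile_b)

print(f"Bray-Curtis dissimilarity: {dissimilarity:.2f}")
```

 With replicated samples, the same calculation could be repeated for each pair of replicates, giving a sense of how much variation exists within an environment compared with the variation between environments.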
Example of MG-RAST search:
 I conduct a search and find two metagenomic projects that sampled surface sediments in two very different environments. The projects are unrelated.