Goals:
I) Learn how to use the basic sequence download interface at the NCBI (National Center for Biotechnology Information or ‘GenBank’)
II) Learn how to design forward and reverse primers to PCR amplify an open reading frame (ORF), a basic skill in biotechnology
III) Understand the properties of the PCR product these primers would produce
IV) Learn how to modify the base sequence of the primers so that the PCR product can be cut by restriction endonuclease cleavage
V) Learn how to modify the base sequence of the reverse “stop” primer to eliminate the stop codon, so that you can make translational protein fusions
VI) Try out webtools for translating proteins and predicting the properties of proteins. This will show you some of the molecular biology software resources available on the web.
Primer design
You are working on a project in which you want to express the major outer membrane porin OmpA from Liberibacter asiaticus in E. coli to study it further (see paper [Duan et al., 2009] on the Liberibacter asiaticus genome in assignment folder). The outer membrane proteins of Gram negative bacteria can be extremely important for bacterial survival in different environments. You want to express this protein in E. coli. If you can manipulate the expression so that it is properly inserted into the outer membrane, you will use this to look for drugs that can interact with this protein and influence its conformation in the membrane. A drug that does this might destabilize the membrane or block the OmpA pore.
Your first goal is to design PCR primers to ‘amplify’ the DNA coding for this protein, so you can ligate it into an expression plasmid. First you have to obtain the sequence of the entire open reading frame (ORF) for the protein you want to express, and then design primers to amplify it by PCR.
Use this Web procedure to get the nucleotide sequence (ompA) encoding OmpA protein from Liberibacter asiaticus
a) Go to the NCBI (National Center for Biotechnology Information) website at http://www.ncbi.nlm.nih.gov/
b) Paste NC_012985.3, the Accession number for the Liberibacter asiaticus strain psy62 genome, into the search field and click “Search”. (psy62 means that the genome was assembled from metagenomic DNA collected from the psyllid insect host of L. asiaticus.)
c) You should see a list of databases. Scroll down until you see the one hit in the database “Nucleotide”. Click on that.
d) This page will direct you to the nucleotide sequence of the entire genome of Liberibacter asiaticus strain psy62.
e) So that you can view the whole sequence, check the box that says ‘Show sequence’ under ‘Customize view’ on the right side menu. Then click ‘Update view’. It will take a few minutes for all of it to display. Don’t click on anything else while you wait.
When it completes, the genome sequence itself will be at the bottom of the page. At the top of the page will be the meta-information for the bacterial isolate and below that will be a list of open reading frames and the amino acid sequences of those ORFs.
f) You want only the coding region for the ompA porin open reading frame. It is encoded by the nucleotides from position 205404..206459. The protein ID of the protein is WP_012778533.
g) Search within the page for WP_012778533. This should take you directly to the correct entry.
h) In the left margin of the sequence beside the entry for this gene will be the term ‘CDS’, which stands for ‘coding sequence’. Click on ‘CDS’. Be sure you click on CDS for WP_012778533 and not for the one above or below it.
i) Next, a pop-up window should appear at the bottom of the page where the nucleotide sequence is located. It should say ‘complement (205404..206459)’ at the top of the pop-up window. A region of DNA sequence should also be highlighted in brown. The highlighted sequence should read ‘ttagaaa….’
Note: If you didn’t check ‘Show sequence’ under ‘Customize view’ back in step (e), it will say ‘Warning: Cannot highlight feature because no sequence is shown.’ If you see that, click on the prompt that says ‘Show the sequence’ and wait for it to load.
j) At the bottom left of the window, it should say ‘CDS’ ‘Feature 183 of 1046’. At the bottom right of the window, it should say ‘Display: FASTA GenBank Help’. Click on the ‘FASTA’ link.
k) This should take you to a page that displays the nucleotide sequence of only the 205404..206459 region with a tagline at the beginning that says “>gi|346722692:205404-206459 Candidatus Liberibacter asiaticus str. psy62, complete genome”. If this is not what the tagline says, you have selected the wrong ORF.
l) Select all of this DNA sequence, and copy it (control C on PC; command C on Mac)
m) You need to get the ‘reverse complement’ of the sequence as it is given in NCBI. This makes it easier to view the open reading frame start and stop codons.
Go to this website http://www.bioinformatics.org/sms/rev_comp.html
This is a quick and easy webtool for getting the reverse complement of a DNA sequence.
Clear the sample sequence in the window and paste the sequence you copied from NCBI into the window and hit ‘Submit’. Copy the sequence from the output window.
n) Now paste the sequence from the output window into Word or other text editor program. It will be easier to work with if you make sure that all the spaces and paragraph returns in the DNA sequence have been eliminated. (In Word, use Advanced Find/Replace under the Edit menu to do this. To find spaces using Advanced Find/Replace, just type in a space. To find paragraph returns search for ^p)
You should now have the sequence you will be working with to choose primers.
If you have the sequence in the correct orientation, it will start with an ‘ATG’ start codon and end with a ‘TAA’ stop codon.
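If you would rather script this step than use the webtool, the reverse complement can be computed in a few lines of Python. This is only a minimal sketch: the function name reverse_complement and the short test sequence are made up for illustration (it is not the real ompA sequence), and the code assumes the input contains only A, C, G and T.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    # Remove spaces and line breaks, as you would in Word.
    cleaned = "".join(seq.split())
    # Complement every base, then reverse the string.
    return cleaned.translate(complement)[::-1]


# Toy sequence: a strand that reads TTA...CAT becomes ATG...TAA
# once it is reverse-complemented.
print(reverse_complement("TTAGAAACCCAT"))  # prints ATGGGTTTCTAA
```

The same check described above applies to the output: it should start with ATG and end with a stop codon.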
Selecting primers
To amplify the ompA ORF by PCR, you need to choose DNA primers, each with a Tm of 68°C. 68°C, or slightly lower, will be a good temperature to choose for your ‘annealing temperature’ in the PCR. (In real life, after designing these, you would have them chemically synthesized by a company.) You would then use these primers to amplify this open reading frame by PCR, using isolated genomic DNA from citrus greening-diseased trees as the template.
There are many different formulas for calculating the Tm of primers. The simplest is:
Tm = 4°C x (number of G’s and C’s in the primer) + 2°C x (number of A’s and T’s in the primer)
[For example, a primer 5′ ACGAAAT 3′ would have a Tm for binding template DNA of (4°C x 2 GorC) + (2°C x 5 AorT) = 18°C]
Use this formula for this exercise. (If you do not do the calculations correctly, your primer will not have the correct length.)
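To check your hand calculations, the formula above can be written as a short Python function. This is a sketch for checking arithmetic only; the function name simple_tm is an illustrative choice, and the code assumes the primer contains only A, C, G and T.

```python
def simple_tm(primer: str) -> int:
    """Tm in degrees C using 4 x (G + C) + 2 x (A + T)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at


# The worked example from the text: 2 G/C bases and 5 A/T bases.
print(simple_tm("ACGAAAT"))  # prints 18
```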
Refer to your primers as “start, forward primer” and “stop, reverse primer”.
You cannot use a program such as Primer BLAST to generate these primers unless you know how to use it very, very well. Your choice for the position of the primers is completely constrained: the start primer must begin with the start codon, and the stop primer must begin with the reverse complement of the stop codon.
(For entering the primer sequences in Bb, the ‘start codon’ primer must be the forward primer and the ‘stop codon’ primer must be the reverse primer. Both should be entered starting at the 5′ end. You can use the same reverse complement webtool that you used before to get the reverse complement. http://www.bioinformatics.org/sms/rev_comp.html )
If this does not make sense to you, review the PCR ppt or PCR movie posted in lecture pdfs. Look very carefully at how the primers interact with the template sequence during the PCR.
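The logic of choosing each primer can be sketched in Python: start at the fixed 5′ end and add one base at a time until the simple Tm formula reaches the target. Everything here is an illustration under stated assumptions: toy_orf is an invented sequence (not ompA), the function names are hypothetical, and the loop stops at the first length whose Tm is at least 68°C, so it can overshoot to 70°C when the last base added is a G or C. You still need to check the result by hand.

```python
def simple_tm(primer: str) -> int:
    """Tm in degrees C using 4 x (G + C) + 2 x (A + T)."""
    primer = primer.upper()
    return 4 * (primer.count("G") + primer.count("C")) + 2 * (
        primer.count("A") + primer.count("T")
    )


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def grow_primer(strand_5_to_3: str, target_tm: int = 68) -> str:
    """Extend from the 5' end of the strand until the Tm reaches target_tm."""
    strand = strand_5_to_3.upper()
    for length in range(1, len(strand) + 1):
        candidate = strand[:length]
        if simple_tm(candidate) >= target_tm:
            return candidate
    raise ValueError("sequence is too short to reach the target Tm")


# Invented ORF for illustration only: starts with ATG, ends with TAA.
toy_orf = "ATGGCTAAAGGTCTGCGTGCAGCTATCGCTGGTCTGGCTCTGTCTGCTGGTGTTGCTTAA"

# Forward primer: read straight off the 5' end of the coding strand.
forward = grow_primer(toy_orf)
# Reverse primer: read off the 5' end of the reverse complement,
# so it begins with the reverse complement of the stop codon.
reverse = grow_primer(reverse_complement(toy_orf))

print("forward 5'", forward, "Tm", simple_tm(forward))
print("reverse 5'", reverse, "Tm", simple_tm(reverse))
```

Note that both primers are written 5′ to 3′, and that the reverse primer is taken from the opposite strand, which is the orientation the answer fields expect.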
– To submit your answers, type or paste your primer sequences into the fields in the Blackboard assignment. (Type in only the DNA sequence of the primer. Eliminate all spaces and extra characters from the text)
– You MUST enter the primer sequences starting with the 5′ end and enter them in the correct orientation. (Refer to the PCR lecture PowerPoint and come to the Help Sessions if you do not understand how to do this.) If the orientation is incorrect or they are not on the correct strand, the PCR would fail. There is no partial credit for incorrectly entered primer sequences (wrong orientation, wrong template strand, etc.). Getting the orientation correct indicates that you understand how PCR works, which is the point of this part of the exercise.
1) “Start” forward primer, Tm = 68°C: 5′
2) “Stop” reverse primer, Tm = 68°C: 5′
More PCR questions:
3) How many hydrogen bonds between an A/T base pair?
4) How many hydrogen bonds between a G/C base pair?
5) How many base pairs long will the PCR product generated with these primers be?
6) How many amino acids is the predicted protein encoded by this open reading frame?
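The arithmetic behind questions 5 and 6 can be sketched as follows. The coordinates in the example are invented, not those of ompA, and the sketch assumes the primers sit exactly at the two ends of the ORF with no extra bases added, so the PCR product is the same length as the ORF. The function names are illustrative.

```python
def orf_length_bp(start: int, end: int) -> int:
    """Length of a feature given inclusive 1-based coordinates."""
    return end - start + 1


def protein_length_aa(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF that includes its stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("an ORF must be a whole number of codons")
    # The final codon is the stop codon, which does not encode an amino acid.
    return orf_bp // 3 - 1


# Invented coordinates for illustration only.
length = orf_length_bp(1001, 1300)
print(length)                     # prints 300
print(protein_length_aa(length))  # prints 99
```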
7) Translate the DNA sequence into the amino acid sequence. (Do not do this by hand! Use a webtool. Directions for using one of them are below.)
–An excellent webtool for this is the ExPASy Translate tool. Go to the ExPASy Bioinformatics Resource Portal http://www.expasy.org/ (There are many useful tools here. Check them out when you have time!)
–In the left-hand menu panel, click on ‘proteomics’. In this category, click on ‘protein sequence and identification’
–On the page that opens, look under ‘Tools’ and click on ‘Translate’ at the bottom of the list
–Paste the ompA DNA sequence into the box
–Under ‘Output format’ choose ‘Compact’ from the pulldown menu and leave ‘Genetic code’ as Standard
–Click the ‘Translate sequence’ button
–This will give you translations in all 6 reading frames (why are there 6 reading frames? Think about this.)
–Only one reading frame will be a single coding sequence from the beginning to the end of the sequence.
–Copy and paste this sequence into a text editor program such as Word.
–Use ‘Advanced Find and Replace’ to eliminate spaces and paragraph returns at the end of lines
Paste the translated sequence into the Number 7) answer space in Bb
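To see what a six-frame translation does under the hood, here is a minimal Python sketch using the standard genetic code. It is not the ExPASy implementation; the function names and the toy sequence are assumptions made for illustration, and the code expects only A, C, G and T in the input. Stop codons are shown as *.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; leftover bases at the 3' end are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Three frames on the given strand plus three on the reverse complement."""
    seq = "".join(seq.split()).upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy ORF for illustration only (not ompA).
for name, protein in six_frames("ATGGCTAAATAA").items():
    print(name, protein)
```

For the toy sequence, frame +1 is the only one that runs from M to a single terminal stop, which mirrors what you should see for the real ORF.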
8) Approximately how many kiloDaltons (kDa) in molecular weight (MW) is your protein predicted to be?
(Be very careful about using online programs to calculate this. Some are just wrong.) I suggest continuing to use the tools at ExPASy. Find the tool called ‘ProtParam’. This is a webtool that calculates predicted physical and chemical properties of proteins. Give the answer in kiloDaltons. (ProtParam gives the answer in Daltons. You will need to convert to kDa.)
Another way to calculate approximate molecular weight is to use the molecular weight of the AVERAGE amino acid, which is 110 Da = 0.110 kDa
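As a quick sanity check on the ProtParam value, the average-residue estimate can be written as a one-line calculation. This sketch uses the 110 Da average given above; the function name and the 200-residue example are illustrative, and the result is only an approximation.

```python
AVERAGE_RESIDUE_DA = 110  # average amino acid mass from the text, in Daltons


def approx_mw_kda(num_amino_acids: int) -> float:
    """Rough molecular weight in kDa from the number of amino acids."""
    return num_amino_acids * AVERAGE_RESIDUE_DA / 1000


# A hypothetical 200-residue protein.
print(approx_mw_kda(200))  # prints 22.0
```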
9) What is the theoretical pI (isoelectric point) of the protein? (I suggest using ProtParam) This is the point of neutral charge in a pH gradient.
10) How does the pI of OmpA compare with proteins used as examples in lecture when we have talked about isoelectric focusing? (Look in Lecture 4 or Lecture 12)
(multiple choice)
OmpA has a ____ pI than the proteins shown on 2D gels in lecture.
11) The two most abundant amino acids in OmpA are ____ and ____?
FYI, look at the total number of negatively charged amino acid ‘residues’ and the total number of positively charged residues that ProtParam detects in OmpA. This might help you understand why OmpA has this pI. On an isoelectric focusing gel (pH gradient), proteins migrate to the pH where they have a neutral charge. You don’t need to answer this, but think about why OmpA would have a neutral charge at the pH equivalent to its pI.
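The counts that ProtParam reports can be reproduced with a short Python sketch. It counts Asp and Glu (D, E) as negatively charged and Arg and Lys (K, R) as positively charged, which is how ProtParam groups them. The function name and the toy peptide are made up for illustration; paste in your own translated sequence (without the trailing stop) to compare with the webtool.

```python
from collections import Counter


def composition_summary(protein: str):
    """Return the two most abundant residues and the charged-residue totals."""
    counts = Counter(protein.upper())
    negative = counts["D"] + counts["E"]  # Asp + Glu
    positive = counts["K"] + counts["R"]  # Lys + Arg
    return counts.most_common(2), negative, positive


# Toy peptide for illustration only.
top_two, negative, positive = composition_summary("MKKKKLLLADE")
print(top_two)   # prints [('K', 4), ('L', 3)]
print(negative)  # prints 2
print(positive)  # prints 4
```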
12) OmpA is predicted to be a transmembrane protein. Most ‘porin’ proteins like this are composed of ‘transmembrane beta-barrels’. The protein snakes back and forth across the outer membrane multiple times. How many transmembrane strands is L. asiaticus OmpA predicted to have?
A good webtool for this is ‘PRED-TMBB’ for prediction of transmembrane beta-barrels http://biophysics.biol.uoa.gr/PRED-TMBB/
–Go to this website and read the description of the software, then click on ‘Proceed with PRED-TMBB’
–Paste the OmpA amino acid sequence in the box and click ‘Submit Query’
–The easiest way to visualize the result is the ‘2D Representation’. Scroll down until you see the 2D representation box and click on ‘Create’
–This will give a schematic view of how the protein is predicted to pass back and forth across the membrane.
Now you can answer the question: How many transmembrane strands is L. asiaticus OmpA predicted to have? (Type in the number only)
13) You decide that you want to eliminate the stop codon at the end of the ORF and replace it with a leucine codon, so that you can make a C-terminally tagged version of the protein. You want to attach a 6 histidine tag (6xHis) to the protein. If you eliminate the stop codon between ompA and the DNA sequence coding for the tag, the ribosome will translate these sequences as one continuous ORF.
What will the DNA sequence of your “Mutated stop primer” sequence be with a leucine codon substituted for the stop codon? (You have more than one option for leucine codons)
Don’t worry about the change in Tm of the primer. Leave the primer length the same as what you chose in Question 2. (Enter it in the same 5′ to 3′ format as the original primer.) Mutated stop primer: 5′
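The substitution can be sketched in Python. Because the stop primer is the reverse complement of the coding strand, its first three bases are the reverse complement of the stop codon, so they are replaced with the reverse complement of a leucine codon. The toy primer below is invented (it is not the ompA primer), the function names are hypothetical, and CTG is just one of the six leucine codons you could choose.

```python
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mutate_stop_primer(stop_primer: str, leucine_codon: str = "CTG") -> str:
    """Swap the stop codon in a reverse primer for a leucine codon."""
    primer = stop_primer.upper()
    # The primer's first 3 bases, read on the coding strand, must be a stop.
    if reverse_complement(primer[:3]) not in STOP_CODONS:
        raise ValueError("primer does not begin with a reverse-complemented stop codon")
    if leucine_codon not in LEUCINE_CODONS:
        raise ValueError("not a leucine codon")
    # Primer length is unchanged: 3 bases out, 3 bases in.
    return reverse_complement(leucine_codon) + primer[3:]


# Invented reverse primer for illustration only; TTA is the reverse
# complement of the stop codon TAA.
print(mutate_stop_primer("TTAAGCAACACCAGC"))  # prints CAGAGCAACACCAGC
```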
14) For cloning purposes, add an XbaI restriction site to the 5′ end of the Start primer (enter it in the same 5′ to 3′ format as the original primer)
You will need to look up the XbaI restriction site sequence (New England Biolabs website has good tools for this)
https://www.neb.com/products/restriction-endonucleases
Don’t worry about the change in Tm. The new part of the sequence won’t bind your template.
15) Add a NarI restriction site to the 5′ end of the new “mutated stop primer” from Question 13. (Enter it in the same 5′ to 3′ format as the original primer.) You will need to look up the NarI restriction site sequence.
Don’t worry about the change in Tm. The new part of the sequence won’t bind your template.
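Adding a restriction site to a primer is simply a matter of writing the recognition sequence in front of the 5′ end. The sketch below illustrates this with the EcoRI site (GAATTC) as a stand-in, so that you still look up the XbaI and NarI sites yourself. The toy primer and the function names are assumptions for illustration.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site.upper() == reverse_complement(site)


def add_5prime_site(primer: str, site: str) -> str:
    """Return the primer with the restriction site added at its 5' end."""
    return site.upper() + primer.upper()


ECORI_SITE = "GAATTC"  # stand-in example, not one of the assigned enzymes

# Invented primer for illustration only.
print(is_palindromic(ECORI_SITE))                     # prints True
print(add_5prime_site("ATGGCTAAAGGT", ECORI_SITE))    # prints GAATTCATGGCTAAAGGT
```

Because such sites are palindromic, the same 5′ to 3′ sequence is used whether it is added to the forward or the reverse primer.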
16) Since you have mutated the stop primer to eliminate the stop codon, the NarI restriction site will add two amino acids to the protein. The amino acids encoded by the sequence of the NarI site are ____ and ____? (Must be entered in the correct order as they would read in the sequence)
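To work out which amino acids a 6-base site adds, read it on the coding strand as two codons. The sketch below does this for the EcoRI site (GAATTC) as a stand-in example, leaving the NarI answer for you. It assumes the site sits directly after the leucine codon and in the same reading frame; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_amino_acids(site_in_reverse_primer: str) -> str:
    """Amino acids encoded by a 6-base site placed on a reverse primer.

    The ribosome reads the coding strand, which is the reverse complement
    of the primer. For a palindromic site the two are identical.
    """
    coding = reverse_complement(site_in_reverse_primer)
    return CODON_TABLE[coding[:3]] + CODON_TABLE[coding[3:6]]


# EcoRI stand-in: GAA TTC on the coding strand.
print(site_amino_acids("GAATTC"))  # prints EF
```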