Difference between revisions of "Reference annotation yeast Mbp1"

From "A B C"
Line 1: Line 1:
 
<div id="BIO">
 
<div id="BIO">
==Mbp1 reference annotation==
+
=Mbp1 protein reference annotation=
  
This is a reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates a number of annotation sources we encounter throughout the course.
 
  
===Links===
+
This is a reference annotation of the ''Saccharomyces cerevisiae'' Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.
 +
 
 +
 
 +
==Links==
  
 
* [http://www.ncbi.nlm.nih.gov/protein/6320147 NP_010227.1] at Genbank
 
* [http://www.ncbi.nlm.nih.gov/protein/6320147 NP_010227.1] at Genbank
Line 15: Line 17:
 
* [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?&uid=146823 cl04494 (KilA-N superfamily domain)] at CDD
 
* [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?&uid=146823 cl04494 (KilA-N superfamily domain)] at CDD
  
===FASTA===
+
 
 +
==FASTA sequences==
  
 
  >gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
 
  >gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
Line 47: Line 50:
  
  
===Annotations===
+
==Annotations==
====NCBI CDD APSES domain boundaries====
+
===NCBI CDD APSES domain boundaries===
  
 
:Derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi '''CDD'''] search with the RefSeq ID <tt>NP_010227</tt>. There is one KilA-N superfamily domain alignment. This superfamily contains the APSES domains.
 
:Derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi '''CDD'''] search with the RefSeq ID <tt>NP_010227</tt>. There is one KilA-N superfamily domain alignment. This superfamily contains the APSES domains.
Line 64: Line 67:
  
  
====NCBI CDD Ankyrin domain boundaries====
+
===NCBI CDD Ankyrin domain boundaries===
  
 
:Derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi '''CDD'''] search with the RefSeq ID (NP_010227). There are two partially overlapping alignments with the profile, each containing 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#CDConsensus CDD output documentation] for details).
 
:Derived from the results of a [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi '''CDD'''] search with the RefSeq ID (NP_010227). There are two partially overlapping alignments with the profile, each containing 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#CDConsensus CDD output documentation] for details).
Line 89: Line 92:
  
  
====SMART Annotations====
+
===SMART Annotations===
 
A [http://smart.embl-heidelberg.de/ '''SMART'''] search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, annotated a number of low-complexity regions and a stretch of coiled coil. Annotations have been consolidated  below.
 
A [http://smart.embl-heidelberg.de/ '''SMART'''] search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, annotated a number of low-complexity regions and a stretch of coiled coil. Annotations have been consolidated  below.
  
====SAS Annotations====
+
===SAS Annotations===
  
 
A [http://www.ebi.ac.uk/thornton-srv/databases/sas/ '''SAS'''] FASTA search with the yeast Mbp1 protein sequence retrieved the homologous Ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. Annotations have been consolidated below.
 
A [http://www.ebi.ac.uk/thornton-srv/databases/sas/ '''SAS'''] FASTA search with the yeast Mbp1 protein sequence retrieved the homologous Ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. Annotations have been consolidated below.
Line 99: Line 102:
 
While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.
 
While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.
  
===Consolidated Annotation===
+
==Consolidated Annotation==
  
 
  MBP1_SACCE
 
  MBP1_SACCE
Line 199: Line 202:
  
  
=== Orthologues ===
+
 
 +
== Orthologues ==
  
 
The Mbp1 orthologues in the six fungal reference species.
 
The Mbp1 orthologues in the six fungal reference species.
Line 251: Line 255:
 
<td class="sc">''Neurospora crassa''</td>
 
<td class="sc">''Neurospora crassa''</td>
 
<td class="sc">NEUCR</td>
 
<td class="sc">NEUCR</td>
<td class="sc">XXX</td>
+
<td class="sc"> NCU07246 </td>
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XXX XXX] <small>[http://www.ncbi.nlm.nih.gov/protein/XXX?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
+
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_955821 XP_955821] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_955821?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
<td class="sc">[http://www.uniprot.org/uniprot/XXX XXX]</td>
+
<td class="sc">[http://www.uniprot.org/uniprot/Q7RW59 Q7RW59]</td>
 
</tr>
 
</tr>
  
Line 269: Line 273:
 
<td class="sc">''Ustilago maydis''</td>
 
<td class="sc">''Ustilago maydis''</td>
 
<td class="sc">USTMA</td>
 
<td class="sc">USTMA</td>
<td class="sc">XXX</td>
+
<td class="sc">hypothetical protein UM06196.1</td>
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XXX XXX] <small>[http://www.ncbi.nlm.nih.gov/protein/XXX?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
+
<td class="sc">[http://www.ncbi.nlm.nih.gov/protein/XP_762343 XP_762343] <small>[http://www.ncbi.nlm.nih.gov/protein/XP_762343?report=fasta&log$=seqview&format=text (FASTA)]</small></td>
<td class="sc">[http://www.uniprot.org/uniprot/XXX XXX]</td>
+
<td class="sc">[http://www.uniprot.org/uniprot/Q4P117 Q4P117]</td>
 
</tr>
 
</tr>
  
  
 
</table>
 
</table>
 +
  
 
&nbsp;
 
&nbsp;
 +
 
[[Category:Bioinformatics]]
 
[[Category:Bioinformatics]]
 
</div>
 
</div>

Revision as of 15:08, 6 October 2012

Mbp1 protein reference annotation

This is a reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.


Links


FASTA sequences

>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA
>PDB:1MB1
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH

Note: the sequence segments colored grey are disordered in the protein structure. This generally means they do not contribute significant energy to the fold of the domain. The six histidines at the C-terminus, colored in firebrick, were added for purification and are not part of the Mbp1 sequence.
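The relationship between the 1MB1 construct and the native protein can be checked directly from the two records above. The following is a minimal Python sketch and not part of the original annotation; the helper name `strip_his_tag` and its fixed tag length of six are illustrative assumptions.

```python
# First 140 residues of Mbp1 (NP_010227.1), copied from the FASTA record above.
MBP1_N_TERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# Sequence of the 1MB1 construct, including the C-terminal purification tag.
PDB_1MB1 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH"
)


def strip_his_tag(seq: str, tag_length: int = 6) -> str:
    """Remove a C-terminal poly-histidine tag of exactly tag_length residues.

    Hypothetical helper: the sequence is returned unchanged if it does not
    end with the tag, so internal histidines are never touched.
    """
    tag = "H" * tag_length
    return seq[:-tag_length] if seq.endswith(tag) else seq


native = strip_his_tag(PDB_1MB1)
print(len(native))                     # native Mbp1 residues present in 1MB1
print(MBP1_N_TERM.startswith(native))  # True if 1MB1 matches the Mbp1 N-terminus
```

Removing exactly six residues, rather than all trailing histidines, avoids trimming a native histidine that happens to precede the tag in other constructs.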

>1SW6:A|PDBID|CHAIN|SEQUENCE
NDDINKGPSGDNENNGTDDNDRTAGPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFPEIQEMPT
SLNNDSSNRNSEGGSSNQQQQHVSFDSLLQEVNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYG
DNMGESCLVKAVKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGWIVKKQNRPI
QSGTNEKESKPNDKNGERKDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALLDYGADPFIANKSGLRPV
DFGAGLE

Note: this sequence is a part of the Saccharomyces cerevisiae Swi6 protein, which is homologous to Mbp1 but does not contain an APSES domain. Its ankyrin domains have been structurally defined in the 1SW6 PDB file; they do not conform in all details to the canonical ankyrin domain structure. The sequence segments colored grey are disordered in the protein structure.


Annotations

NCBI CDD APSES domain boundaries

Derived from the results of a CDD search with the RefSeq ID NP_010227. There is one KilA-N superfamily domain alignment. This superfamily contains the APSES domains.
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1MB1            19 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ----------------GGFGKYQGTWVPLNI 82
Cdd:pfam04383    3 YNDFEIIIRRDKDGYINATKLCKAAGATK-RFRNWLRLESTKELIEELSkennidvliievenkkGKNGRLQGTYVHPDL 81


                           90
                   ....*....|....*
1MB1            83 AKQLA----EKFSVY 93
Cdd:pfam04383   82 ALAIAswisPEFALK 96
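The domain boundaries reported in this alignment can be cross-checked against the full-length sequence by removing the gap characters from the query rows and searching for the result. This is a minimal Python sketch under that assumption; `locate_aligned_segment` is a hypothetical helper name, and lowercase letters in the alignment are simply treated as ordinary residues.

```python
# First 140 residues of Mbp1 (NP_010227.1), copied from the FASTA record.
MBP1_N_TERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# The two query rows (labelled 1MB1) of the CDD alignment, joined in order.
ALIGNED_QUERY = (
    "IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ----------------GGFGKYQGTWVPLNI"
    "AKQLA----EKFSVY"
)


def locate_aligned_segment(full_seq: str, aligned_row: str) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) of an aligned row in full_seq."""
    ungapped = aligned_row.replace("-", "").upper()
    index = full_seq.find(ungapped)
    if index < 0:
        raise ValueError("aligned segment not found in the sequence")
    return index + 1, index + len(ungapped)


apses_start, apses_end = locate_aligned_segment(MBP1_N_TERM, ALIGNED_QUERY)
print(apses_start, apses_end)  # 19 93, matching the numbers printed in the alignment
```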


NCBI CDD Ankyrin domain boundaries

Derived from the results of a CDD search with the RefSeq ID (NP_010227). There are two partially overlapping alignments with the profile, each containing 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the CDD output documentation for details).
Alignment 1 - E-value = 1.69e-08
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
MBP1_SACCE     76 IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT 152
Cdd:cd00204     3 RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT 75

                          90       100       110       120       130
                  ....*....|....*....|....*....|....*....|....*....|....*....
MBP1_SACCE    153 VIHHIVKRKSTtpSAVYYLdvvLSKIKDfspqyriellLNTQDKNGDTALHIASKNGDV 211
Cdd:cd00204    76 PLHLAARNGNL--DVVKLL---LKHGAD----------VNARDKDGRTPLHLAAKNGHL 119


Alignment 2 - E-value=8.66e-05
                          10        20        30        40        50        60        70        80        
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*...
MBP1_SACCE    192 NTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLT----ANEIMNQQYEQMMIQNGTNQHV--NSSNTDLNIHVNTNNIET 273 
Cdd:cd00204     1 NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTplhlAAKNGHLEIVKLLLEKGADVNArdKDGNTPLHLAARNGNLDV 88
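The residue numbers printed in these two alignments do not appear to follow full-length Mbp1 numbering: the segment IDPELHTAFH is numbered from 76 here, whereas it occupies residues 391 to 400 in the full-length FASTA record. The Python sketch below, which is an illustration and not part of the original annotation, estimates the offset by searching for the ungapped first row of alignment 1 in the full-length sequence. The variable names are hypothetical, and the interpretation that the search was run on a subsequence is an assumption.

```python
# Residues 1-490 of Mbp1 (NP_010227.1), copied from the FASTA record.
MBP1_1_490 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
    "KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL"
    "PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ"
    "QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV"
    "NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS"
    "IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL"
)

# First query row of alignment 1 and the start coordinate printed next to it.
ALIGNED_ROW = (
    "IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT"
)
REPORTED_START = 76

ungapped = ALIGNED_ROW.replace("-", "").upper()
index = MBP1_1_490.find(ungapped)
if index < 0:
    raise ValueError("aligned segment not found in the sequence")

full_length_start = index + 1                  # 1-based position in full-length Mbp1
offset = full_length_start - REPORTED_START    # add this to the printed coordinates
print(full_length_start, offset)
```

Adding the offset to the printed coordinates converts them to full-length numbering, which is the numbering used in the consolidated annotation.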


SMART Annotations

A SMART search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, annotated a number of low-complexity regions and a stretch of coiled coil. Annotations have been consolidated below.

SAS Annotations

A SAS FASTA search with the yeast Mbp1 protein sequence retrieved the homologous Ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. Annotations have been consolidated below.


While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.
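The extent of such a disagreement can be expressed as the overlap between two residue ranges. The following Python sketch compares the APSES domain boundaries from CDD (19 to 93, as printed in the CDD alignment) with those from SMART. The SMART range of 15 to 111 is an assumption read off the `=` track of the consolidated annotation, and the function names are illustrative.

```python
Range = tuple[int, int]  # 1-based, inclusive (start, end)


def shared_residues(a: Range, b: Range) -> int:
    """Number of residues covered by both ranges (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def jaccard(a: Range, b: Range) -> float:
    """Shared residues divided by residues covered by either range."""
    shared = shared_residues(a, b)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - shared
    return shared / union


APSES_CDD = (19, 93)     # from the CDD alignment coordinates
APSES_SMART = (15, 111)  # assumption: read off the consolidated annotation track

print(shared_residues(APSES_CDD, APSES_SMART))    # residues both sources agree on
print(round(jaccard(APSES_CDD, APSES_SMART), 3))  # 1.0 would mean identical boundaries
```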

Consolidated Annotation

MBP1_SACCE
Annotations based on 
- CDD domain analysis,
- SAS structure annotation and
- literature data on binding region

Keys:

=   domain annotation
C   Coiled coil regions predicted by Coils2 program
x   Low complexity region
*   Proposed binding region
+   positively charged residues, oriented for possible DNA binding interactions
-   negatively charged residues, oriented for possible DNA binding interactions 

E   beta strand
H   alpha helix
t   beta turn
    Sequence that was invisible in the 1SW6 structure is listed in lowercase.


                  10         20         30         40         50         60 
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
                                                               * *+**-+**** Proposed DNA binding 
pfam04383                    == ========== ========== ========== ========== (CDD alignment)
pfam04383                ====== ========== ========== ========== ========== (SMART alignment) 

                  70         80         90        100        110        120 
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-                       
          **+*+***** ****                                                   Proposed DNA binding
pfam04383 ========== ========== ========== ===                              (CDD alignment)
pfam04383 ========== ========== ========== ========== ========== =          (SMART alignment) 

                 130        140        150        160        170        180 
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR
                                                                                      


                 190        200        210        220        230        240 
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
Low compl.                                                            xxxxx (SMART SEG)


                 250        260        270        280        290        300 
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
Low compl.x                                        xx xxxxxxxxxx xxxxxxxxxx (SMART SEG)


                 310        320        330        340        350        360 
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
          xxxxxxx                                                           (SMART SEG)
Swi6                       GPII TFTHDLTSDF LSSPLKIMKA LPSPVVNDNE QKM--KL-EA (SAS alignment: 1SW6)
1SW6                       -EEE --tt---ttt ------EE-- ---t---HHH HHH--HH-HH (SAS 2° structure)


                                                370        380        390        400        410        420 
          FISNEMK-------------------------------SNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
Swi6      FLQRLLFpeiqemptslnndssnrnseggssnqqqqhvSFD SLLQEVNDAF PNTQLNLNIP VDEHGNTPLH WLTSIANLEL VKHLVKHGSN (SAS alignment: 1SW6)
1SW6      HHHHHH-                               -HH HHHHHHHHH- t-----t--- --t----HHH HHHH--tHHH HHHHHH---- (SAS 2° structure)


                 430        440        450        460        470           480 
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTT---P
Swi6      RLYGDNMGES CLVKAVKSVN NYDSGTFEAL LDYLYPCLIL EDSMNRTILH HIIITSGMTGCSA (SAS alignment: 1SW6)
1SW6      t---tt---- HHHHHHH--H HHH---HHHH HHHHHHHHHE E-t----HHH HHHHHH--t--HH (SAS 2° structure)


                 490                                      500        510        520        530        540 
          SAVYYLDVVL-------------------------------SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
Swi6      AAKYYLDILMGWIVKKQNRPIQSGtnekeskpndkngerkDSILENLDLKW IIANMLNAQD SNGDTCLNIA ARLGNISIVD ALLDYGADPF (SAS alignment: 1SW6)
1SW6      HHHHHHHHHHHHHHHHHH--EEE-                -HHHHHt-HHH HHHH------ t----HHHHH HHH--HHHHH HHHH----t- (SAS 2° structure)
 

                 550        560        570        580        590        600 
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
Swi6      IANKSGLRPV DFGAG                                                 (SAS alignment: 1SW6)
1SW6      ---t----HH HH---                                                 (SAS 2° structure)


                 610        620        630        640        650        660 
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
Coiled c.                                    CCCCCCCC CCCCCCCCCC CCCCC      (SMART COILS2)
 

                 670        680        690        700        710        720 
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
Low compl.                                          x xxxxxxxxxx xxxxxxx    (SMART SEG)

                 730        740        750        760        770        780 
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK


                 790        800        810        820        830 
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA
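The annotation tracks above can be converted into residue ranges programmatically. The Python sketch below assumes the layout used in the ungapped rows: a 10-column label, followed by blocks of 10 residues separated by a single space. These layout values, the function name `track_to_ranges`, and the exact spacing of the example line are assumptions. The source label at the end of a track line, such as (SMART COILS2), has to be removed before parsing, and the rows with inserted gaps (the SAS alignment with Swi6) do not follow this layout.

```python
def track_to_ranges(track_line, first_residue, label_width=10, block=10):
    """Convert one annotation track line into (symbol, start, end) ranges.

    first_residue is the number of the first residue on the matching
    sequence line; ranges are 1-based and inclusive.
    """
    ranges = []
    for col, symbol in enumerate(track_line):
        if col < label_width or symbol == " ":
            continue  # skip the label columns and unannotated positions
        block_index, within = divmod(col - label_width, block + 1)
        if within == block:
            continue  # separator column between two 10-residue blocks
        residue = first_residue + block_index * block + within
        if ranges and ranges[-1][0] == symbol and ranges[-1][2] == residue - 1:
            ranges[-1] = (symbol, ranges[-1][1], residue)  # extend current range
        else:
            ranges.append((symbol, residue, residue))      # start a new range
    return ranges


# Coiled-coil track of the row starting at residue 601, with the trailing
# "(SMART COILS2)" label removed. The 36 spaces are an assumed layout.
COILS_TRACK = "Coiled c." + " " * 36 + "CCCCCCCC CCCCCCCCCC CCCCC"

print(track_to_ranges(COILS_TRACK, first_residue=601))
```

Under these assumptions the coiled-coil prediction maps to a single range, which can then be compared with other annotations in the same residue numbering.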


Orthologues

The Mbp1 orthologues in the six fungal reference species.

  • Saccharomyces cerevisiae (SACCE)
  • Aspergillus nidulans (ASPNI)
  • Candida albicans (CANAL)
  • Neurospora crassa (NEUCR)
  • Schizosaccharomyces pombe (SCHPO)
  • Ustilago maydis (USTMA)


Species                    Code   Name                            RefSeq             UniProt
Saccharomyces cerevisiae   SACCE  Mbp1p                           NP_010227 (FASTA)  P39678
Aspergillus nidulans       ASPNI  XXX                             XXX (FASTA)        XXX
Candida albicans           CANAL  XXX                             XXX (FASTA)        XXX
Neurospora crassa          NEUCR  NCU07246                        XP_955821 (FASTA)  Q7RW59
Schizosaccharomyces pombe  SCHPO  XXX                             XXX (FASTA)        XXX
Ustilago maydis            USTMA  hypothetical protein UM06196.1  XP_762343 (FASTA)  Q4P117
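The links in this table follow two URL patterns that appear in the page's wiki source. The Python sketch below rebuilds the links from the accession numbers and lists the species whose entries are still XXX placeholders. The data structure and function names are illustrative assumptions, and the URL patterns are those of this revision, which the NCBI and UniProt sites may since have changed.

```python
PLACEHOLDER = "XXX"  # marks entries that have not been filled in yet

# (species code, RefSeq accession, UniProt accession), copied from the table.
ORTHOLOGUES = [
    ("SACCE", "NP_010227", "P39678"),
    ("ASPNI", "XXX", "XXX"),
    ("CANAL", "XXX", "XXX"),
    ("NEUCR", "XP_955821", "Q7RW59"),
    ("SCHPO", "XXX", "XXX"),
    ("USTMA", "XP_762343", "Q4P117"),
]


def refseq_fasta_url(accession: str) -> str:
    """URL of the plain-text FASTA view, following the pattern in the wiki source."""
    return (
        "http://www.ncbi.nlm.nih.gov/protein/"
        f"{accession}?report=fasta&log$=seqview&format=text"
    )


def uniprot_url(accession: str) -> str:
    """URL of the UniProt entry, following the pattern in the wiki source."""
    return f"http://www.uniprot.org/uniprot/{accession}"


# Species that still need their orthologue identified.
missing = [code for code, refseq, _ in ORTHOLOGUES if refseq == PLACEHOLDER]

for code, refseq, uniprot in ORTHOLOGUES:
    if refseq != PLACEHOLDER:
        print(code, refseq_fasta_url(refseq), uniprot_url(uniprot))
print("still missing:", missing)
```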