Difference between revisions of "Reference annotation yeast Mbp1"
Revision as of 04:42, 6 December 2014
Yeast Mbp1 protein reference annotation
This page provides a reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates the annotation sources we encounter throughout the course.
Links
- NP_010227.1 at GenBank
- P39678 at UniProtKB
- 1MB1 at PDB
- 1BM8 at PDB
- 1L3G at PDB
- 1SW6 at PDB
- cd00204 (Ankyrin domain) at CDD
- cl04494 (KilA-N superfamily domain) at CDD
 
FASTA sequences
>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA
>1BM8_A
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA
KQLAEKFSVYDQLKPLFDF
>1MB1_A
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
Note: the sequence segments colored grey are disordered in the protein structure. This generally means they do not contribute significant energy to the fold of the domain. The six histidines at the C-terminus, colored in firebrick, were added for purification and are not part of the Mbp1 sequence.
>1L3G_A
SNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLN
IAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDKLAAALEHHHHHH
Note: here too the C-terminus is disordered and colored grey, and the protein has a purification tag colored in firebrick. However, because this is an NMR structure, disordered segments are included in the PDB coordinate file, and defining the extent of disorder required evaluating the superposed set of models. There are no more structured residues than in the 1MB1 structure, even though the sequence used in the experiment was longer.
>1SW6_A
NDDINKGPSGDNENNGTDDNDRTAGPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFPEIQEMPT
SLNNDSSNRNSEGGSSNQQQQHVSFDSLLQEVNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYG
DNMGESCLVKAVKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGWIVKKQNRPI
QSGTNEKESKPNDKNGERKDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALLDYGADPFIANKSGLRPV
DFGAGLE
Note: this sequence is a part of the Saccharomyces cerevisiae Swi6 protein, which is homologous to Mbp1 but does not contain an APSES domain. Its ankyrin domains have been structurally defined in the 1SW6 PDB file; they do not conform in all details to the canonical ankyrin domain structure. The sequence segments colored grey are disordered in the protein structure.
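The relationship between the PDB chains and the native sequence described in the notes above can be checked programmatically. The following is a minimal Python sketch and not part of the original annotation. The function name `split_native_and_tail` is hypothetical, `NATIVE_NTERM` holds only the first 140 residues of NP_010227.1, and the approach (anchor on the first ten residues of the chain, then extend while residues match) assumes that a chain differs from the native sequence only by a C-terminal tail.

```python
# First 140 residues of NP_010227.1, copied from the FASTA record above.
NATIVE_NTERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# PDB chain sequences, copied from the FASTA records above.
CHAINS = {
    "1MB1_A": (
        "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
        "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH"
    ),
    "1L3G_A": (
        "SNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLN"
        "IAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDKLAAALEHHHHHH"
    ),
}


def split_native_and_tail(native, chain, seed=10):
    """Return (start, end, tail) for a chain.

    start and end are 1-based coordinates in the native sequence covered by
    the chain; tail is whatever follows the first mismatch (hypothetical
    helper, assumes a single non-native C-terminal tail).
    """
    offset = native.find(chain[:seed])
    if offset < 0:
        raise ValueError("chain start not found in native sequence")
    n = 0
    while (n < len(chain) and offset + n < len(native)
           and chain[n] == native[offset + n]):
        n += 1
    return offset + 1, offset + n, chain[n:]


RESULTS = {name: split_native_and_tail(NATIVE_NTERM, seq)
           for name, seq in CHAINS.items()}

for name, (start, end, tail) in RESULTS.items():
    print(f"{name}: native residues {start}-{end}, non-native tail {tail!r}")
```

The sketch only separates matching from non-matching sequence. Which part of a non-native tail is linker and which is purification tag still has to be taken from the structure's documentation.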
Annotations
NCBI CDD APSES domain boundaries
APSES domain boundaries can be derived from the results of a CDD search with the ID 1BM8_A. The KilA-N superfamily domain alignment is returned. This superfamily contains the APSES domains.
- (pfam 04383): KilA-N domain; The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.
                         10        20        30        40        50        60        70        80
                 ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A         16 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ---------------GGFGKYQGTWVPLNIA 80
Cdd:pfam04383  3 YNDFEIIIRRDKDGYINATKLCKAAGETK-RFRNWLRLESTKELIEELSeennvdkseiiigrkGKNGRLQGTYVHPDLA 81
                         90
                 ....*....|....
1BM8A         81 KQLA----EKFSVY 90
Cdd:pfam04383 82 LAIAswisPEFALK 95
Note that this domain definition begins at position 16 of the 1BM8_A sequence, but virtually all fungal APSES domains have a longer N-terminus. Blindly applying this domain definition would lose important information. For most purposes we will prefer the sequence spanned by the 1BM8_A structure. The sequence is given below; the KilA-N domain is coloured dark green. By this definition the APSES domain is 99 amino acids long and comprises residues 4 to 102 of the NP_010227 sequence.
                 10        20        30        40        50        60        70        80
         ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A  1 QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA 80
                 90
         ....*....|....*....
1BM8A 81 KQLAEKFSVYDQLKPLFDF 99
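
The residue range quoted above can be reproduced by locating the 1BM8_A sequence within NP_010227. A minimal Python sketch follows. The names `locate`, `NATIVE_NTERM` and `APSES_1BM8` are illustrative, and only the first 140 residues of the RefSeq sequence are included, which is enough to cover the domain.

```python
# First 140 residues of NP_010227.1, copied from the FASTA section.
NATIVE_NTERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# Sequence of PDB chain 1BM8_A, copied from the FASTA section.
APSES_1BM8 = (
    "QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA"
    "KQLAEKFSVYDQLKPLFDF"
)


def locate(native, fragment):
    """Return the 1-based (start, end) of an exact match of fragment in native."""
    idx = native.find(fragment)
    if idx < 0:
        raise ValueError("fragment not found in native sequence")
    return idx + 1, idx + len(fragment)


start, end = locate(NATIVE_NTERM, APSES_1BM8)

# Shift the KilA-N alignment range (positions 16-90 of 1BM8_A) into
# NP_010227 coordinates.
offset = start - 1
kila_n = (16 + offset, 90 + offset)

print(f"APSES domain (1BM8_A): residues {start}-{end}, length {len(APSES_1BM8)}")
print(f"KilA-N aligned region: residues {kila_n[0]}-{kila_n[1]}")
```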
 
NCBI CDD Ankyrin domain boundaries
- Derived from the results of a CDD search with the RefSeq ID (NP_010227). There are two partially overlapping alignments with the profile, which contains 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the CDD output documentation for details).
- Alignment 1 - E-value = 1.69e-08
                        10        20        30        40        50        60        70        80
                ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
MBP1_SACCE   76 IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT 152
Cdd:cd00204   3 RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT 75
                        90       100       110       120       130
                ....*....|....*....|....*....|....*....|....*....|....*....
MBP1_SACCE  153 VIHHIVKRKSTtpSAVYYLdvvLSKIKDfspqyriellLNTQDKNGDTALHIASKNGDV 211
Cdd:cd00204  76 PLHLAARNGNL--DVVKLL---LKHGAD----------VNARDKDGRTPLHLAAKNGHL 119
- Alignment 2 - E-value = 8.66e-05
                        10        20        30        40        50        60        70        80
                ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*...
MBP1_SACCE  192 NTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLT----ANEIMNQQYEQMMIQNGTNQHV--NSSNTDLNIHVNTNNIET 273
Cdd:cd00204   1 NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTplhlAAKNGHLEIVKLLLEKGADVNArdKDGNTPLHLAARNGNLDV 88
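
A quick consistency check for alignment blocks like those above: the end coordinate printed for a row should equal its start coordinate plus the number of residues in the row, not counting gap characters. The Python sketch below applies this to the first block of Alignment 1. The function name `row_end` is hypothetical; only `-` is treated as a gap, and lowercase letters are counted as residues.

```python
def row_end(start, aligned, gap="-"):
    """Return the coordinate of the last residue in an aligned row."""
    residues = sum(1 for ch in aligned if ch != gap)
    return start + residues - 1


# (label, start, aligned sequence, end as printed), first block of Alignment 1.
ROWS = [
    ("MBP1_SACCE", 76,
     "IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT",
     152),
    ("Cdd:cd00204", 3,
     "RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT",
     75),
]

CHECKS = [(label, row_end(start, seq), end) for label, start, seq, end in ROWS]

for label, computed, printed in CHECKS:
    status = "ok" if computed == printed else "MISMATCH"
    print(f"{label}: computed end {computed}, printed end {printed} ({status})")
```

A mismatch would point to a copying error in the row or to a different gap convention in the source output.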
SMART Annotations
A SMART search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, and annotated a number of low-complexity regions and a stretch of coiled coil. Annotations have been consolidated below.
SAS Annotations
A SAS FASTA search with the yeast Mbp1 protein sequence retrieved the homologous ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. Annotations have been consolidated below.
While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.
Consolidated Annotation
MBP1_SACCE
Annotations based on 
- CDD domain analysis,
- SAS structure annotation and
- literature data on binding region
Keys:
=   domain annotation
C   Coiled coil regions predicted by Coils2 program
x   Low complexity region
*   Proposed binding region
+   positively charged residues, oriented for possible DNA binding interactions
-   negatively charged residues, oriented for possible DNA binding interactions 
E   beta strand
H   alpha helix
t   beta turn
    Sequence that was invisible in the 1SW6 structure is listed in lowercase.
                  10         20         30         40         50         60 
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
                                                               * *+**-+**** Proposed DNA binding 
pfam04383                    == ========== ========== ========== ========== (CDD alignment)
pfam04383                ====== ========== ========== ========== ========== (SMART alignment) 
                  70         80         90        100        110        120 
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-                       
          **+*+***** ****                                                   Proposed DNA binding
pfam04383 ========== ========== ========== ===                              (CDD alignment)
pfam04383 ========== ========== ========== ========== ========== =          (SMART alignment) 
                 130        140        150        160        170        180 
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR
                                                                                      
                 190        200        210        220        230        240 
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
Low compl.                                                            xxxxx (SMART SEG)
                 250        260        270        280        290        300 
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
Low compl.x                                        xx xxxxxxxxxx xxxxxxxxxx (SMART SEG)
                 310        320        330        340        350        360 
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
          xxxxxxx                                                           (SMART SEG)
Swi6                       GPII TFTHDLTSDF LSSPLKIMKA LPSPVVNDNE QKM--KL-EA (SAS alignment: 1SW6)
1SW6                       -EEE --tt---ttt ------EE-- ---t---HHH HHH--HH-HH (SAS 2° structure)
                                                370        380        390        400        410        420 
          FISNEMK-------------------------------SNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
Swi6      FLQRLLFpeiqemptslnndssnrnseggssnqqqqhvSFD SLLQEVNDAF PNTQLNLNIP VDEHGNTPLH WLTSIANLEL VKHLVKHGSN (SAS alignment: 1SW6)
1SW6      HHHHHH-                               -HH HHHHHHHHH- t-----t--- --t----HHH HHHH--tHHH HHHHHH---- (SAS 2° structure)
                 430        440        450        460        470           480 
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTT---P
Swi6      RLYGDNMGES CLVKAVKSVN NYDSGTFEAL LDYLYPCLIL EDSMNRTILH HIIITSGMTGCSA (SAS alignment: 1SW6)
1SW6      t---tt---- HHHHHHH--H HHH---HHHH HHHHHHHHHE E-t----HHH HHHHHH--t--HH (SAS 2° structure)
                 490                                      500        510        520        530        540 
          SAVYYLDVVL-------------------------------SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
Swi6      AAKYYLDILMGWIVKKQNRPIQSGtnekeskpndkngerkDSILENLDLKW IIANMLNAQD SNGDTCLNIA ARLGNISIVD ALLDYGADPF (SAS alignment: 1SW6)
1SW6      HHHHHHHHHHHHHHHHHH--EEE-                -HHHHHt-HHH HHHH------ t----HHHHH HHH--HHHHH HHHH----t- (SAS 2° structure)
 
                 550        560        570        580        590        600 
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
Swi6      IANKSGLRPV DFGAG                                                 (SAS alignment: 1SW6)
1SW6      ---t----HH HH---                                                 (SAS 2° structure)
                 610        620        630        640        650        660 
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
Coiled c.                                    CCCCCCCC CCCCCCCCCC CCCCC      (SMART COILS2)
 
                 670        680        690        700        710        720 
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
Low compl.                                          x xxxxxxxxxx xxxxxxx    (SMART SEG)
                 730        740        750        760        770        780 
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK
                 790        800        810        820        830 
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA
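
The `=` domain tracks in the listing above can be converted into residue ranges. The following Python sketch assumes the layout used on this page: a 10-column label followed by blocks of 10 residue columns separated by single spaces, with trailing comments such as "(CDD alignment)" removed. The helpers `track_to_ranges` and `merge_adjacent` are hypothetical, and the two example rows are rebuilt by hand from the CDD pfam04383 track for residues 1 to 60 and 61 to 120.

```python
def track_to_ranges(track, first_residue, label_width=10, block=10, mark="="):
    """Convert one annotation track row into 1-based (start, end) ranges."""
    body = track[label_width:]
    # Drop the separator column that follows every block of residue columns.
    cols = "".join(body[i:i + block] for i in range(0, len(body), block + 1))
    ranges, run_start = [], None
    for j, ch in enumerate(cols + " "):  # the sentinel closes a trailing run
        if ch == mark and run_start is None:
            run_start = j
        elif ch != mark and run_start is not None:
            ranges.append((first_residue + run_start, first_residue + j - 1))
            run_start = None
    return ranges


def merge_adjacent(ranges):
    """Merge ranges that touch or overlap, e.g. across row boundaries."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


# CDD pfam04383 track rows, rebuilt block by block (an assumption about the
# exact column layout of the listing).
ROW_1_60 = "pfam04383 " + " ".join(
    [" " * 10, " " * 8 + "==", "=" * 10, "=" * 10, "=" * 10, "=" * 10])
ROW_61_120 = "pfam04383 " + " ".join(["=" * 10, "=" * 10, "=" * 10, "==="])

CDD_RANGES = merge_adjacent(
    track_to_ranges(ROW_1_60, 1) + track_to_ranges(ROW_61_120, 61))
print(CDD_RANGES)
```

Under these layout assumptions the CDD track corresponds to residues 19 to 93, which agrees with positions 16 to 90 of 1BM8_A shifted by three residues.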
Orthologs
The Mbp1 orthologs, identified by reciprocal best match (RBM), in the ten fungal reference species:
- Aspergillus nidulans (ASPNI)
- Bipolaris oryzae (BIPOR)
- Coprinopsis cinerea (COPCI)
- Cryptococcus neoformans (CRYNE)
- Neurospora crassa (NEUCR)
- Puccinia graminis (PUCGR)
- Saccharomyces cerevisiae (SACCE)
- Schizosaccharomyces pombe (SCHPO)
- Ustilago maydis (USTMA)
- Wallemia sebi (WALSE)
| Species | Code | Name | RefSeq | UniProt |
| Aspergillus nidulans | ASPNI | AN3154 | XP_660758 (FASTA) | Q5B8H6 |
| Bipolaris oryzae | BIPOR | | XP_007682304 (FASTA) | ???? |
| Coprinopsis cinerea | COPCI | | XP_001837394 (FASTA) | ???? |
| Cryptococcus neoformans | CRYNE | | XP_570545 (FASTA) | ???? |
| Neurospora crassa | NEUCR | NCU07246 | XP_955821 (FASTA) | Q7RW59 |
| Puccinia graminis | PUCGR | NCU07246 | XP_003327086 (FASTA) | ????? |
| Saccharomyces cerevisiae | SACCE | Mbp1p | NP_010227 (FASTA) | P39678 |
| Schizosaccharomyces pombe | SCHPO | MBF transcription factor complex subunit Res2 | NP_593032 (FASTA) | P41412 |
| Ustilago maydis | USTMA | hypothetical protein UM06196.1 | XP_762343 (FASTA) | Q4P117 |
| Wallemia sebi | WALSE | | XP_762343 (FASTA) | Q4P117 |
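
The five-letter codes used in the list and table above are consistent with a simple pattern: the first three letters of the genus followed by the first two letters of the species epithet, in upper case. This is an observation about these ten entries rather than a rule stated on this page. The Python sketch below checks it; the function name `species_code` is hypothetical.

```python
# Species names and codes as listed above.
REFERENCE_SPECIES = [
    ("Aspergillus nidulans", "ASPNI"),
    ("Bipolaris oryzae", "BIPOR"),
    ("Coprinopsis cinerea", "COPCI"),
    ("Cryptococcus neoformans", "CRYNE"),
    ("Neurospora crassa", "NEUCR"),
    ("Puccinia graminis", "PUCGR"),
    ("Saccharomyces cerevisiae", "SACCE"),
    ("Schizosaccharomyces pombe", "SCHPO"),
    ("Ustilago maydis", "USTMA"),
    ("Wallemia sebi", "WALSE"),
]


def species_code(binomial):
    """Build a five-letter code from a binomial name (assumed pattern)."""
    genus, epithet = binomial.split()[:2]
    return (genus[:3] + epithet[:2]).upper()


MISMATCHES = [(name, code) for name, code in REFERENCE_SPECIES
              if species_code(name) != code]
print("all codes follow the pattern" if not MISMATCHES else MISMATCHES)
```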
Ortholog APSES domains
The ortholog APSES domains can be aligned without gaps. They comprise the following sequences:
 
