Difference between revisions of "Reference annotation yeast Mbp1"

Revision as of 18:08, 16 October 2012

Mbp1 protein reference annotation

This is a reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.

see also: Reference sequences of APSES domains

FASTA sequences

>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA

>PDB:1MB1
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH

Note: the sequence segments colored grey are disordered in the protein structure. This generally means they do not contribute significant energy to the fold of the domain. The six histidines at the C-terminus, colored in firebrick, were added for purification and are not part of the Mbp1 sequence.
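The relationship between the two records above can be checked directly. The following is a minimal Python sketch, not part of the original annotation: it removes the trailing histidine tag from the 1MB1 sequence and confirms that the remainder matches the N-terminus of Mbp1. Only the first 140 residues of NP_010227 are included to keep the example short, and the helper name strip_his_tag is hypothetical. Removing every trailing H is safe here only because the native residue preceding the tag is not a histidine.

```python
# First 140 residues of Mbp1 (NP_010227), copied from the FASTA record above.
MBP1_N_TERM = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# Full 1MB1 construct, including the C-terminal purification tag.
PDB_1MB1 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH"
)


def strip_his_tag(seq: str) -> str:
    """Remove a run of histidines from the C-terminus (hypothetical helper)."""
    return seq.rstrip("H")


native = strip_his_tag(PDB_1MB1)
tag_length = len(PDB_1MB1) - len(native)
is_prefix = MBP1_N_TERM.startswith(native)

print(f"tag length: {tag_length}")
print(f"native residues in construct: 1-{len(native)}")
print(f"matches Mbp1 N-terminus: {is_prefix}")
```

With the sequences as listed, the construct covers residues 1 to 124 of Mbp1 followed by six histidines.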

>1SW6:A|PDBID|CHAIN|SEQUENCE
NDDINKGPSGDNENNGTDDNDRTAGPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFPEIQEMPT
SLNNDSSNRNSEGGSSNQQQQHVSFDSLLQEVNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYG
DNMGESCLVKAVKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGWIVKKQNRPI
QSGTNEKESKPNDKNGERKDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALLDYGADPFIANKSGLRPV
DFGAGLE

Note: this sequence is part of the Saccharomyces cerevisiae Swi6 protein, which is homologous to Mbp1 but does not contain an APSES domain. Its ankyrin domains have been structurally defined in the 1SW6 PDB file; they do not conform in all details to the canonical ankyrin domain structure. The sequence segments colored grey are disordered in the protein structure.

Annotations

NCBI CDD APSES domain boundaries

Derived from the results of a CDD search with the RefSeq ID NP_010227. There is one KilA-N superfamily domain alignment. This superfamily contains the APSES domains.

                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1MB1            19 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ----------------GGFGKYQGTWVPLNI 82
Cdd:pfam04383    3 YNDFEIIIRRDKDGYINATKLCKAAGATK-RFRNWLRLESTKELIEELSkennidvliievenkkGKNGRLQGTYVHPDL 81


                           90
                   ....*....|....*
1MB1            83 AKQLA----EKFSVY 93
Cdd:pfam04383   82 ALAIAswisPEFALK 96
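The coordinates printed at either end of an alignment row can be sanity-checked by counting residues: the end coordinate should equal the start coordinate plus the number of non-gap characters, minus one. The sketch below applies this to the first block of the alignment above. The function name row_end is hypothetical, and the code assumes that "-" is the only gap symbol and that lowercase letters (unaligned residues in the CDD display) still count as residues.

```python
def row_end(start: int, aligned: str) -> int:
    """Last residue number of an alignment row that begins at `start`.

    Every character other than the gap symbol '-' consumes one residue,
    including lowercase (unaligned) residues.
    """
    residues = sum(1 for ch in aligned if ch != "-")
    return start + residues - 1


# First block of the 1MB1 / pfam04383 alignment, copied from above.
query = (
    "IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ"
    "----------------GGFGKYQGTWVPLNI"
)
profile = (
    "YNDFEIIIRRDKDGYINATKLCKAAGATK-RFRNWLRLESTKELIEELS"
    "kennidvliievenkkGKNGRLQGTYVHPDL"
)

# Both rows must span the same number of alignment columns.
assert len(query) == len(profile)

print("1MB1 row ends at", row_end(19, query))
print("pfam04383 row ends at", row_end(3, profile))
```

The computed end coordinates agree with the values 82 and 81 printed at the end of the two rows.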

NCBI CDD Ankyrin domain boundaries

Derived from the results of a CDD search with the RefSeq ID NP_010227. There are two partially overlapping alignments with the profile, each of which contains 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the CDD output documentation for details).

Alignment 1 - E-value = 1.69e-08

                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
MBP1_SACCE     76 IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT 152
Cdd:cd00204     3 RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT 75

                          90       100       110       120       130
                  ....*....|....*....|....*....|....*....|....*....|....*....
MBP1_SACCE    153 VIHHIVKRKSTtpSAVYYLdvvLSKIKDfspqyriellLNTQDKNGDTALHIASKNGDV 211
Cdd:cd00204    76 PLHLAARNGNL--DVVKLL---LKHGAD----------VNARDKDGRTPLHLAAKNGHL 119

Alignment 2 - E-value = 8.66e-05

                          10        20        30        40        50        60        70        80        
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*...
MBP1_SACCE    192 NTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLT----ANEIMNQQYEQMMIQNGTNQHV--NSSNTDLNIHVNTNNIET 273 
Cdd:cd00204     1 NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTplhlAAKNGHLEIVKLLLEKGADVNArdKDGNTPLHLAARNGNLDV 88
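The residue numbers in the two ankyrin alignments above (76 to 211 and 192 to 273) do not appear to refer to positions in the full-length protein: the segment IDPELHTAFH that opens Alignment 1 is found at residue 391 of the NP_010227 FASTA record. The following sketch locates that segment and derives the offset needed to convert the alignment numbering to full-length coordinates. Only residues 351 to 420 (the sixth sequence line of the record) are included, and the names to_full_length and offset are hypothetical.

```python
# Residues 351-420 of Mbp1 (sixth sequence line of the NP_010227 record).
SEGMENT_START = 351
SEGMENT = (
    "NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS"
)

ALIGNMENT_START = 76            # number printed at the start of Alignment 1
ALIGNED_PREFIX = "IDPELHTAFH"   # first ten aligned residues of Alignment 1

index = SEGMENT.find(ALIGNED_PREFIX)
if index < 0:
    raise ValueError("aligned segment not found in the reference sequence")

# 1-based position of the aligned segment in the full-length protein.
full_length_position = SEGMENT_START + index
offset = full_length_position - ALIGNMENT_START


def to_full_length(alignment_position: int) -> int:
    """Convert an alignment residue number to a full-length Mbp1 position."""
    return alignment_position + offset


print("Alignment 1 starts at full-length residue", to_full_length(76))
print("Alignment 2 starts at full-length residue", to_full_length(192))
```

Under this offset, the start of Alignment 2 maps to the asparagine of NTQDKNGD in the full-length sequence, which is consistent with the FASTA record.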

SMART Annotations

A SMART search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, and annotated a number of low-complexity regions and a stretch of coiled coil. The annotations have been consolidated below.

SAS Annotations

A SAS FASTA search with the yeast Mbp1 protein sequence retrieved the homologous ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. The annotations have been consolidated below.

While CDD, SMART and SAS all annotate the same general regions, they disagree on the details of the domain boundaries and on the precise alignment.

Consolidated Annotation

MBP1_SACCE
Annotations based on 
- CDD domain analysis,
- SAS structure annotation and
- literature data on binding region

Keys:

=   domain annotation
C   Coiled coil regions predicted by Coils2 program
x   Low complexity region
*   Proposed binding region
+   positively charged residues, oriented for possible DNA binding interactions
-   negatively charged residues, oriented for possible DNA binding interactions 

E   beta strand
H   alpha helix
t   beta turn
    Sequence that was invisible in the 1SW6 structure is listed in lowercase.

                  10         20         30         40         50         60 
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
                                                               * *+**-+**** Proposed DNA binding 
pfam04383                    == ========== ========== ========== ========== (CDD alignment)
pfam04383                ====== ========== ========== ========== ========== (SMART alignment) 

                  70         80         90        100        110        120 
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-                       
          **+*+***** ****                                                   Proposed DNA binding
pfam04383 ========== ========== ========== ===                              (CDD alignment)
pfam04383 ========== ========== ========== ========== ========== =          (SMART alignment) 

                 130        140        150        160        170        180 
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR
                                                                                      


                 190        200        210        220        230        240 
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
Low compl.                                                            xxxxx (SMART SEG)


                 250        260        270        280        290        300 
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
Low compl.x                                        xx xxxxxxxxxx xxxxxxxxxx (SMART SEG)


                 310        320        330        340        350        360 
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
          xxxxxxx                                                           (SMART SEG)
Swi6                       GPII TFTHDLTSDF LSSPLKIMKA LPSPVVNDNE QKM--KL-EA (SAS alignment: 1SW6)
1SW6                       -EEE --tt---ttt ------EE-- ---t---HHH HHH--HH-HH (SAS 2° structure)


                                                370        380        390        400        410        420 
          FISNEMK-------------------------------SNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
Swi6      FLQRLLFpeiqemptslnndssnrnseggssnqqqqhvSFD SLLQEVNDAF PNTQLNLNIP VDEHGNTPLH WLTSIANLEL VKHLVKHGSN (SAS alignment: 1SW6)
1SW6      HHHHHH-                               -HH HHHHHHHHH- t-----t--- --t----HHH HHHH--tHHH HHHHHH---- (SAS 2° structure)


                 430        440        450        460        470           480 
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTT---P
Swi6      RLYGDNMGES CLVKAVKSVN NYDSGTFEAL LDYLYPCLIL EDSMNRTILH HIIITSGMTGCSA (SAS alignment: 1SW6)
1SW6      t---tt---- HHHHHHH--H HHH---HHHH HHHHHHHHHE E-t----HHH HHHHHH--t--HH (SAS 2° structure)


                 490                                      500        510        520        530        540 
          SAVYYLDVVL-------------------------------SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
Swi6      AAKYYLDILMGWIVKKQNRPIQSGtnekeskpndkngerkDSILENLDLKW IIANMLNAQD SNGDTCLNIA ARLGNISIVD ALLDYGADPF (SAS alignment: 1SW6)
1SW6      HHHHHHHHHHHHHHHHHH--EEE-                -HHHHHt-HHH HHHH------ t----HHHHH HHH--HHHHH HHHH----t- (SAS 2° structure)
 

                 550        560        570        580        590        600 
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
Swi6      IANKSGLRPV DFGAG                                                 (SAS alignment: 1SW6)
1SW6      ---t----HH HH---                                                 (SAS 2° structure)


                 610        620        630        640        650        660 
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
Coiled c.                                    CCCCCCCC CCCCCCCCCC CCCCC      (SMART COILS2)
 

                 670        680        690        700        710        720 
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
Low compl.                                          x xxxxxxxxxx xxxxxxx    (SMART SEG)

                 730        740        750        760        770        780 
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK


                 790        800        810        820        830 
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA
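The domain tracks in the consolidated annotation can be converted back into residue ranges. The sketch below is one way to do this in Python, under stated assumptions: each row has a 10-character label column followed by six blocks of ten residues separated by single spaces, which holds for the rows without inserted gap columns but not for the wider Swi6 rows. The function name track_intervals and its parameters are hypothetical. The two CDD pfam04383 rows are rebuilt explicitly so that the spacing is unambiguous.

```python
def track_intervals(row, symbol="=", row_start=1,
                    label_width=10, block=10, blocks_per_row=6):
    """Return 1-based (start, end) residue ranges marked by `symbol`.

    Assumes a fixed layout: a label column, then blocks of `block`
    residues separated by one space. Text after the last block
    (for example a trailing source label) is ignored.
    """
    body = row[label_width:label_width + blocks_per_row * (block + 1)]
    hits = []
    for col, ch in enumerate(body):
        block_index, within = divmod(col, block + 1)
        if within == block:          # separator column between blocks
            continue
        if ch == symbol:
            hits.append(row_start + block_index * block + within)

    # Merge consecutive residue numbers into closed intervals.
    intervals = []
    for pos in hits:
        if intervals and pos == intervals[-1][1] + 1:
            intervals[-1][1] = pos
        else:
            intervals.append([pos, pos])
    return [tuple(iv) for iv in intervals]


# CDD pfam04383 track for residues 1-60 and 61-120, rebuilt from the rows above.
row_1 = "pfam04383 " + " " * 10 + " " + " " * 8 + "==" + (" " + "=" * 10) * 4
row_2 = "pfam04383 " + " ".join(["=" * 10] * 3) + " " + "===" + "  (CDD alignment)"

print(track_intervals(row_1, row_start=1))
print(track_intervals(row_2, row_start=61))
```

Taken together, the two rows give residues 19 to 93, which matches the boundaries of the CDD APSES alignment listed earlier.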

Orthologues

This section lists the Mbp1 orthologues in the six fungal reference species.

Saccharomyces cerevisiae (SACCE)
Aspergillus nidulans (ASPNI)
Candida albicans (CANAL)
Neurospora crassa (NEUCR)
Schizosaccharomyces pombe (SCHPO)
Ustilago maydis (USTMA)

Species	Code	Name	RefSeq	UniProt
Saccharomyces cerevisiae	SACCE	Mbp1p	NP_010227 (FASTA)	P39678
Aspergillus nidulans	ASPNI	AN3154	XP_660758 (FASTA)	Q5B8H6
Candida albicans	CANAL	potential DNA binding component of MBF	XP_723071 (FASTA)	Q5ANP5
Neurospora crassa	NEUCR	NCU07246	XP_955821 (FASTA)	Q7RW59
Schizosaccharomyces pombe	SCHPO	MBF transcription factor complex subunit Res2	NP_593032 (FASTA)	P41412
Ustilago maydis	USTMA	hypothetical protein UM06196.1	XP_762343 (FASTA)	Q4P117
