Difference between revisions of "Reference annotation yeast Mbp1"
Revision as of 18:08, 16 October 2012
Mbp1 protein reference annotation
This is a reference annotation of the Saccharomyces cerevisiae Mbp1 protein sequence that integrates annotation sources we encounter throughout the course.
Links
- NP_010227.1 at Genbank
- P39678 at UniProtKB
- 1MB1 at PDB
- 1BM8 at PDB
- 1L3G at PDB
- 1SW6 at PDB
- cd00204 (Ankyrin domain) at CDD
- cl04494 (KilA-N superfamily domain) at CDD
- see also: Reference sequences of APSES domains
FASTA sequences
>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA

>PDB:1MB1
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
Note: the sequence segments colored grey are disordered in the protein structure. This generally means they do not contribute significant energy to the fold of the domain. The six histidines at the C-terminus, colored firebrick, were added for purification and are not part of the Mbp1 sequence.
>1SW6:A|PDBID|CHAIN|SEQUENCE
NDDINKGPSGDNENNGTDDNDRTAGPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFPEIQEMPT
SLNNDSSNRNSEGGSSNQQQQHVSFDSLLQEVNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYG
DNMGESCLVKAVKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGWIVKKQNRPI
QSGTNEKESKPNDKNGERKDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALLDYGADPFIANKSGLRPV
DFGAGLE
Note: this sequence is part of the Saccharomyces cerevisiae Swi6 protein, which is homologous to Mbp1 but does not contain an APSES domain. Its ankyrin domains have been structurally defined in the 1SW6 PDB file; they do not conform in all details to the canonical ankyrin domain structure. The sequence segments colored grey are disordered in the protein structure.
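The records above can be read programmatically. The following is a minimal sketch, assuming plain FASTA text with one header line per record; the function name `parse_fasta` and the constant `HIS_TAG` are illustrative choices and not part of any library. It parses the 1MB1 record and strips the six C-terminal histidines that the note above describes as a purification tag.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the header without the ">" marker.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of every record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


# The 1MB1 record as listed on this page.
fasta_text = """>PDB:1MB1
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
"""

records = parse_fasta(fasta_text)
construct = records["PDB:1MB1"]

# Assumption for illustration: the tag is exactly six trailing histidines.
HIS_TAG = "HHHHHH"
native = construct[:-len(HIS_TAG)] if construct.endswith(HIS_TAG) else construct

print(len(construct), len(native))
```

Stripping the tag leaves the first 124 residues of Mbp1, which end in SKVD just as residues 121 to 124 of the full-length RefSeq sequence do.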
Annotations
NCBI CDD APSES domain boundaries
- Derived from the results of a CDD search with the RefSeq ID NP_010227. There is one KilA-N superfamily domain alignment. This superfamily contains the APSES domains.
                          10        20        30        40        50        60        70        80
                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1MB1           19 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ----------------GGFGKYQGTWVPLNI 82
Cdd:pfam04383   3 YNDFEIIIRRDKDGYINATKLCKAAGATK-RFRNWLRLESTKELIEELSkennidvliievenkkGKNGRLQGTYVHPDL 81

                          90
                  ....*....|....*
1MB1           83 AKQLA----EKFSVY 93
Cdd:pfam04383  82 ALAIAswisPEFALK 96
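Domain boundaries such as the ones in this alignment are given as 1-based, inclusive residue numbers, whereas most programming languages index strings from 0. As a small sketch (the helper name `subsequence` is hypothetical), the following extracts residues 19 to 93, the range covered by the CDD alignment above, from the N-terminal Mbp1 sequence.

```python
def subsequence(seq, start, end):
    """Return residues start..end of seq, using 1-based inclusive numbering."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    # Python slices are 0-based and exclude the end index,
    # so only the start needs to be shifted by one.
    return seq[start - 1:end]


# First 124 residues of Mbp1 (the 1MB1 construct without its His-tag).
mbp1_nterm = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVD"
)

# Boundaries as printed in the CDD alignment: residues 19 to 93.
apses_cdd = subsequence(mbp1_nterm, 19, 93)

print(apses_cdd)
print(len(apses_cdd))
```

The extracted segment begins with IHSTGSIM and ends with EKFSVY, matching the first and last aligned residues of the 1MB1 row.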
NCBI CDD Ankyrin domain boundaries
- Derived from the results of a CDD search with the RefSeq ID NP_010227. There are two partially overlapping alignments with the profile, each containing 4 ANK repeats. The aligned sequence is the consensus sequence for the profile (see the CDD output documentation for details).
- Alignment 1 - E-value = 1.69e-08
                        10        20        30        40        50        60        70        80
                ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
MBP1_SACCE   76 IDPELHTAFHWACSMGNLPIAEALYEAGTSIRSTNSQGQTPLMRSSLFHNsytrrtfPRIFQLLHETVFDIDSQS---QT 152
Cdd:cd00204   3 RDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGH-------LEIVKLLLEKGADVNARDkdgNT 75

                        90       100       110       120       130
                ....*....|....*....|....*....|....*....|....*....|....*....
MBP1_SACCE  153 VIHHIVKRKSTtpSAVYYLdvvLSKIKDfspqyriellLNTQDKNGDTALHIASKNGDV 211
Cdd:cd00204  76 PLHLAARNGNL--DVVKLL---LKHGAD----------VNARDKDGRTPLHLAAKNGHL 119

- Alignment 2 - E-value = 8.66e-05
                        10        20        30        40        50        60        70        80
                ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*...
MBP1_SACCE  192 NTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLT----ANEIMNQQYEQMMIQNGTNQHV--NSSNTDLNIHVNTNNIET 273
Cdd:cd00204   1 NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTplhlAAKNGHLEIVKLLLEKGADVNArdKDGNTPLHLAARNGNLDV 88
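The extent to which the two ankyrin alignments overlap can be worked out from their start and end coordinates. This is a minimal sketch using the MBP1_SACCE residue numbers exactly as printed in the two alignments above (76 to 211 and 192 to 273); the function name `overlap` is an illustrative choice.

```python
def overlap(a, b):
    """Return the shared (start, end) of two inclusive intervals, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Coordinates of the MBP1_SACCE rows as printed in the CDD output.
alignment_1 = (76, 211)
alignment_2 = (192, 273)

shared = overlap(alignment_1, alignment_2)
# Inclusive intervals: the number of residues is end - start + 1.
shared_length = shared[1] - shared[0] + 1 if shared else 0

print(shared, shared_length)
```

With these coordinates the shared region is the segment that begins NTQDKNGDTALHIASKNGDV, which closes alignment 1 and opens alignment 2.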
SMART Annotations
A SMART search with the yeast Mbp1 protein sequence retrieved the APSES domain and three regions of similarity to ankyrin domains, and annotated a number of low-complexity regions and a stretch of coiled coil. The annotations have been consolidated below.
SAS Annotations
A SAS FASTA search with the yeast Mbp1 protein sequence retrieved the homologous ankyrin sequence from Swi6 (PDB: 1SW6), together with secondary structure annotations. This structural annotation is based on homology to a protein of known structure. The annotations have been consolidated below.
While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and on the precise alignment.
Consolidated Annotation
MBP1_SACCE

Annotations based on
- CDD domain analysis,
- SAS structure annotation and
- literature data on binding region

Keys:
=  domain annotation
C  Coiled coil regions predicted by Coils2 program
x  Low complexity region
*  Proposed binding region
+  positively charged residues, oriented for possible DNA binding interactions
-  negatively charged residues, oriented for possible DNA binding interactions
E  beta strand
H  alpha helix
t  beta turn

Sequence that was invisible in the 1SW6 structure is listed in lowercase.
                  10         20         30         40         50         60
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
                                                               * *+**-+****  Proposed DNA binding
pfam04383                    == ========== ========== ========== ==========  (CDD alignment)
pfam04383                ====== ========== ========== ========== ==========  (SMART alignment)

                  70         80         90        100        110        120
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-
          **+*+***** ****                                                    Proposed DNA binding
pfam04383 ========== ========== ========== ===                               (CDD alignment)
pfam04383 ========== ========== ========== ========== ========== =           (SMART alignment)

                 130        140        150        160        170        180
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR

                 190        200        210        220        230        240
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
Low compl.                                                            xxxxx  (SMART SEG)

                 250        260        270        280        290        300
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
Low compl.x                                        xx xxxxxxxxxx xxxxxxxxxx  (SMART SEG)

                 310        320        330        340        350        360
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
          xxxxxxx                                                            (SMART SEG)
Swi6                       GPII TFTHDLTSDF LSSPLKIMKA LPSPVVNDNE QKM--KL-EA  (SAS alignment: 1SW6)
1SW6                       -EEE --tt---ttt ------EE-- ---t---HHH HHH--HH-HH  (SAS 2° structure)

                                                370        380        390        400        410        420
          FISNEMK-------------------------------SNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
Swi6      FLQRLLFpeiqemptslnndssnrnseggssnqqqqhvSFD SLLQEVNDAF PNTQLNLNIP VDEHGNTPLH WLTSIANLEL VKHLVKHGSN  (SAS alignment: 1SW6)
1SW6      HHHHHH-                               -HH HHHHHHHHH- t-----t--- --t----HHH HHHH--tHHH HHHHHH----  (SAS 2° structure)

                 430        440        450        460        470           480
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTT---P
Swi6      RLYGDNMGES CLVKAVKSVN NYDSGTFEAL LDYLYPCLIL EDSMNRTILH HIIITSGMTGCSA  (SAS alignment: 1SW6)
1SW6      t---tt---- HHHHHHH--H HHH---HHHH HHHHHHHHHE E-t----HHH HHHHHH--t--HH  (SAS 2° structure)

                 490                                      500        510        520        530        540
          SAVYYLDVVL-------------------------------SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
Swi6      AAKYYLDILMGWIVKKQNRPIQSGtnekeskpndkngerkDSILENLDLKW IIANMLNAQD SNGDTCLNIA ARLGNISIVD ALLDYGADPF  (SAS alignment: 1SW6)
1SW6      HHHHHHHHHHHHHHHHHH--EEE-                -HHHHHt-HHH HHHH------ t----HHHHH HHH--HHHHH HHHH----t-  (SAS 2° structure)

                 550        560        570        580        590        600
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
Swi6      IANKSGLRPV DFGAG                                                   (SAS alignment: 1SW6)
1SW6      ---t----HH HH---                                                   (SAS 2° structure)

                 610        620        630        640        650        660
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
Coiled c.                                    CCCCCCCC CCCCCCCCCC CCCCC       (SMART COILS2)

                 670        680        690        700        710        720
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
Low compl.                                          x xxxxxxxxxx xxxxxxx     (SMART SEG)

                 730        740        750        760        770        780
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK

                 790        800        810        820        830
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA
Orthologues
The Mbp1 orthologues in the six fungal reference species.
- Saccharomyces cerevisiae (SACCE)
- Aspergillus nidulans (ASPNI)
- Candida albicans (CANAL)
- Neurospora crassa (NEUCR)
- Schizosaccharomyces pombe (SCHPO)
- Ustilago maydis (USTMA)
Species | Code | Name | RefSeq | UniProt |
Saccharomyces cerevisiae | SACCE | Mbp1p | NP_010227 (FASTA) | P39678 |
Aspergillus nidulans | ASPNI | AN3154 | XP_660758 (FASTA) | Q5B8H6 |
Candida albicans | CANAL | potential DNA binding component of MBF | XP_723071 (FASTA) | Q5ANP5 |
Neurospora crassa | NEUCR | NCU07246 | XP_955821 (FASTA) | Q7RW59 |
Schizosaccharomyces pombe | SCHPO | MBF transcription factor complex subunit Res2 | NP_593032 (FASTA) | P41412 |
Ustilago maydis | USTMA | hypothetical protein UM06196.1 | XP_762343 (FASTA) | Q4P117 |