Reference APSES domains (reference species)
- Multi-FASTA file of all APSES domains in fungal proteins.
Executing the PSI-BLAST search
A PSI-BLAST search was run with default parameters against the RefSeq protein database, restricted to Fungi. The query sequence, the Mbp1 APSES domain, was defined as follows:
>Yeast Mbp1 APSES domain (AA 24..107 of NP_010227)
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
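The header gives the domain boundaries as 1-based, inclusive residue numbers (AA 24..107), so the domain is 107 - 24 + 1 = 84 residues long. A minimal Python sketch of cutting such a range out of a full-length protein and writing it as a FASTA record; the function names are illustrative, and the full-length NP_010227 sequence is not reproduced here.

```python
def extract_domain(full_seq: str, start: int, end: int) -> str:
    """Return residues start..end of full_seq.

    Coordinates are 1-based and inclusive, as in "AA 24..107".
    """
    if not 1 <= start <= end <= len(full_seq):
        raise ValueError(
            f"range {start}..{end} lies outside a sequence of length {len(full_seq)}")
    # Python slices are 0-based and end-exclusive: shift the start only.
    return full_seq[start - 1:end]


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence every `width` residues."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

With the full NP_010227 sequence in a variable, `to_fasta("Yeast Mbp1 APSES domain (AA 24..107 of NP_010227)", extract_domain(full_seq, 24, 107), width=50)` would produce a record laid out like the query above.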
By the fifth iteration, the search had returned 81 hits with significant E-values. Five of these were from Chaetomium globosum and were removed from the list, since this is not one of the organisms we are studying. Six hits were aligned along only part of the APSES domain. For five of these, reasonable similarity to the whole APSES domain was verified independently by computing a Needleman-Wunsch optimal global alignment with the Mbp1 APSES domain sequence (EMBOSS needle, EBLOSUM30 matrix, default gap parameters).
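The recurrence behind a Needleman-Wunsch global alignment fits in a few lines of Python. This is a simplified illustration, not what EMBOSS needle computes: it scores identities as +1 and mismatches as -1 with a linear gap penalty, whereas needle looks up each residue pair in a substitution matrix (here EBLOSUM30) and uses separate gap-open and gap-extension penalties.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                 # a[:i] against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap                 # b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # a[i-1] against a gap
                          F[i][j - 1] + gap)     # b[j-1] against a gap
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 1
print(top)     # ACGT
print(bottom)  # A-GT
```

Replacing the `match`/`mismatch` test with a lookup into a substitution matrix gives matrix-based scoring; affine gap penalties of the kind needle uses require the three-matrix (Gotoh) formulation instead of the single matrix `F`.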
The sixth partial match, to the Neurospora crassa protein XP_962373, instead suggested an incorrect gene model. Consider the alignment:
QUERY        1 SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY 50
                                           .:.:.:.||:....:.||..|.
XP_962373    1 ----------------------------MLNQNPGLKDIAYSITGGAIKA 22

QUERY       51 QGTWVPLNIAKQLAEKF--SVYDQLKPLF--DFTQ---TDG--------- 84
               ||.|.|:..||::...|  .:..:|.|||  ||..   :.|
XP_962373   23 QGYWMPYACAKAVCATFCYQIAGALIPLFGPDFPSECISPGEPRYGIMII 72
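In this situation one should suspect that the gene-finding algorithm missed part of the N-terminus, or that the sequence was derived from a partial mRNA: the first residue of XP_962373 aligns with position 29 of the query, so the first 28 residues of the APSES domain have no counterpart in this protein. This sequence was removed from the analysis.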
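The suspicion rests on where the hit begins relative to the query. A small Python sketch that reads this off the two gapped rows of one alignment block; the function name is hypothetical, and the rows are the first block of the XP_962373 alignment, with the leading end gap written as dashes.

```python
GAP_CHARS = {"-", " "}


def first_aligned_query_position(query_row: str, subject_row: str,
                                 query_start: int = 1) -> int:
    """Return the query residue number facing the subject's first residue.

    query_row and subject_row are the two gapped rows of one pairwise
    alignment block; '-' and ' ' are both treated as gaps. If the subject's
    first residue faces a query gap (an insertion), the preceding query
    residue is reported.
    """
    q_pos = query_start - 1
    for q, s in zip(query_row, subject_row):
        if q not in GAP_CHARS:
            q_pos += 1          # advance the query residue counter
        if s not in GAP_CHARS:
            return q_pos        # first subject residue found
    raise ValueError("subject row contains no residues")


# First block of the XP_962373 alignment (end gap written as '-').
query_row = "SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY"
subject_row = "-" * 28 + "MLNQNPGLKDIAYSITGGAIKA"
pos = first_aligned_query_position(query_row, subject_row)
print(f"hit starts at query position {pos}; "
      f"{pos - 1} N-terminal domain residues unmatched")
```

This prints `hit starts at query position 29; 28 N-terminal domain residues unmatched`. A full-length hit would start at or near position 1.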
Further, XP_712876 and XP_712970 were found to be identical sequences from the same organism. Only one of these duplicates (XP_712876) was kept.
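A sketch of the duplicate check in Python, assuming each hit is available as a hypothetical (accession, organism, full-length sequence) tuple. It keys on organism plus full-length sequence, matching the criterion above. Keying on the APSES domain alone would also merge different accessions that happen to carry an identical domain; the domains listed below for XP_714197 and XP_714237, for example, are identical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (accession, organism, full-length sequence)


def drop_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep the first record of each (organism, sequence) pair; drop later copies."""
    seen = set()
    kept: List[Record] = []
    for accession, organism, seq in records:
        key = (organism, seq.upper())   # identical sequence from the same organism
        if key in seen:
            continue
        seen.add(key)
        kept.append((accession, organism, seq))
    return kept
```

The first occurrence wins, so the order of the input list decides which accession survives.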
After these steps, 74 APSES domain sequences remained for analysis (81 hits, minus five from C. globosum, minus the truncated XP_962373, minus one duplicate).
A multi-FASTA file
Since we are interested only in the APSES domain, the search results need to be displayed in a suitable format. On the page from which the BLAST query was submitted, there are several options for displaying the results:
- Pairwise: the default.
- Pairwise with identities: shows only the differences from the query sequence.
- Query-anchored with/without identities: resembles a multiple sequence alignment, with hyphens for gaps; insertions relative to the query are displayed below the sequence.
- Flat query-anchored with/without identities: looks like a multiple sequence alignment, and in fact is one: all sequences are aligned to the profile.
- Hit table: gives only the numerical parameters describing the quality of the matches.
With the flat query-anchored option, it is reasonably straightforward to obtain the aligned sequences: copy and paste them into a Word document and convert them into multi-FASTA format with a few Edit > Replace commands. The sequences for which only partial matches were found must, of course, be completed by hand, using the pairwise Needleman-Wunsch alignments that were computed above to validate them.
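The Edit > Replace step can also be scripted. A minimal Python sketch, assuming the alignment was copied with letters (not dots) for identities and that each alignment line has the shape `name start residues end`; lines that do not match that shape, such as headers, rulers and blank lines, are skipped. The function name and the sample line in the comment are illustrative.

```python
import re
from typing import Dict, List

# Assumed line shape in the copied alignment: "<name> <start> <residues> <end>",
# e.g. "XP_955821  37  VMRRRHDDWV-NATHILK  53" (illustrative). Blocks repeat.
ALIGN_LINE = re.compile(r"^(\S+)\s+(\d+)\s+([A-Za-z-]+)\s+(\d+)\s*$")


def flat_anchored_to_fasta(text: str, strip_gaps: bool = True) -> str:
    """Join each sequence's segments across blocks and emit multi-FASTA."""
    segments: Dict[str, List[str]] = {}   # dicts keep insertion order (3.7+)
    for line in text.splitlines():
        m = ALIGN_LINE.match(line)
        if m is None:
            continue                      # headers, rulers, blank lines
        name, residues = m.group(1), m.group(3)
        segments.setdefault(name, []).append(residues)
    records = []
    for name, parts in segments.items():
        seq = "".join(parts)
        if strip_gaps:
            seq = seq.replace("-", "")    # unaligned domain sequence
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"
```

Gaps are stripped by default because the final list holds unaligned domain sequences. If a sequence has no line in some block, keeping gaps would shift its columns, so `strip_gaps=False` is only safe when every sequence appears in every block.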
Renaming sequences
To support the interpretation of alignments and gene trees, the Mbp1 orthologues of all species were named accordingly (e.g. MBP1_ASPFU). All yeast genes were given their yeast gene name (e.g. SOK2_SACCE). All other sequences were named with the last four digits of their RefSeq ID and a five-character species code. This is a pain to do by hand, so I wrote a little Perl script to parse this information from the original BLAST report and modify the headers in the multi-FASTA file accordingly. Note, however, that renaming sequences is somewhat "cosmetic": it does not change the data or its interpretation.
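The Perl script itself is not shown. As an illustration only, here is a Python sketch of the header-rewriting step, assuming the accession-to-name mapping has already been built (typed by hand here; the original parsed it from the BLAST report). It prepends the new name and keeps the accession and range, which is the header layout used in the final list.

```python
from typing import Dict


def rename_headers(fasta: str, names: Dict[str, str]) -> str:
    """Prefix headers whose first word is a known accession with the new name.

    ">XP_748947 105..184" becomes ">MBP1_ASPFU XP_748947 105..184"; headers
    without an entry in `names` and all sequence lines are left unchanged.
    """
    out = []
    for line in fasta.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and fields[0] in names:
                line = f">{names[fields[0]]} {line[1:]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Mapping entry taken from the final list; a real run needs the full mapping.
names = {"XP_748947": "MBP1_ASPFU"}
```

Only the headers change; the sequences are passed through untouched, in line with the point that renaming is cosmetic.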
The final 74 sequences
>MBP1_SACCE NP_010227 024..107
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
>MBP1_YARLI XP_500257 022..105
AVMRRKSDGWVNATHILKVAGFDKPQRTRILEKEVQKGVHEKVQGGYGKYQGTWVPLERAREIATLYDVDSHLAPIFNYDDEDG
>XP_955821 037..118
VMRRRHDDWVNATHILKAAGFDKPARTRILEREVQKDTHEKIQGGYGRYQGTWIPLEQAEALARRNNIYERLKPIFEFQPGN
>XP_569090 036..117
AVMRRRSDAYLNATQILKVAGFDKPQRTRVLEREVQKGEHEKVQGGYGKYQGTWIPIERGLALAKQYGVEDILRPIIDYVPT
>MBP1_ASPNI XP_660758 028..110
SVMRRRSDDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIFDYVAGD
>MBP1_KLULA XP_454189 025..108
SIMKRKADNWVNATHILKAAKFPKAKRTRILEKEVITDTHEKVQGGFGKYQGTWIPLELASKLAEKFEVLDELKPLFDFTQQEG
>MBP1_GIBZE XP_384396 045..129
AVMRRRNDSWLNATQILKVAGVDKGKRTKILEKEIQTGEHEKVQGGYGKYQGTWIKFERGLQVCRQYGVEELLRPLLTYDMGQDG
>MBP1_ASPTE XP_001213217 028..110
SVMRRRADDWINATHILKVAGFDKPARTRILEREVQKGVHEKVQGGYGKYQGTWIPLPEGRLLAERNNIIDKLRPIFDYVAGD
>MBP1_CANAL XP_723071 026..108
IMRRKKDSWINATHILKIAKFPKAKRTRILEKDVQTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVYDVLKPIFEFQYIEG
>MBP1_CANGL XP_445458 024..107
SIMKRKNDGWVNATHILKAANFAKAKRTRILEKEVLKEMHEKVQGGFGKYQGTWVPLNIAINLAEKFDVYQDLKPLFDFSEENG
>XP_501770 036..116
AVMRRRTDSSLNATQILKVAGVEKSKRTKILEKEILTGAHEKVQGGYGKYQGTWIPYERGVDLCRQYSVYDVLQPLLAFDP
>XP_362974 121..199
VMRRRVDDWINATHILKAAGFDKPARTRILEREVQKDQHEKVQGGYGKYQGTWIPLEAGEALAHRNNIFDRLRPIFEFS
>XP_761485 182..262
AVMRRRGDGWLNATQILKIAGIEKTRRTKILEKSILTGEHEKIQGGYGKFQGTWIPLQRAQQVAAEYNVSHLLQPILEFDP
>MBP1_USTMA XP_762343 026..107
AVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREIQKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPITSYVPS
>XP_390560 040..120
VMRRRSDDWINATHILKAAGFDKPARTRILERDVQKDVHEKIQGGYGKYQGTWIPLESGQALAERHSVIDRLRPIFEYVQG
>XP_754232 001..081
MRRRGDDWINATHILKVAGFDKPARTRILEREVQKGTHEKVQGGYGKYQGTWIPLHEGRLLAERNNIIDKLRPIFDYVAGD
>MBP1_CRYNE XP_570545 133..214
SVMRRASDSWVNATQILKVAGVHKSARTKILEKEVLNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVFDFVPS
>MBP1_NEUCR XP_962967 071..155
AVMRRQKDGWVNATQILKVANIDKGRRTKILEKEIQIGEHEKVQGGYGKYQGTWIPFERGLEVCRQYGVEELLSKLLTHNRGQEG
>MBP1_DEBHA XP_458784 027..109
IMRRKLDSWINATHILKIAKFPKAKRTRILEKDVQTGVHEKVQGGYGKYQGTYVPLDLGADIAKNFGVFDSLRPIFEFTYVEG
>XP_712876 006..088
SIMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKVQGGFGRFQGTWIPLEDARRLAKTYGVTEELAPVLFLDFSD
>MBP1_MAGGR XP_365024 131..210
AVMKRIGDSKLNATQILKVAGVEKGKRTKILEKEIQTGEHEKVQGGYGKYQGTWIKYERALEVCRQYGVEELLRPLLEYN
>XP_664319 119..198
AVMKRRSDGWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCREYHVEELLRPLLEYD
>MBP1_ASPFU XP_748947 105..184
AVMKRRSDSWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVELCREYHVEELLRPLLEYD
>MBP1_SCHPO NP_593032 027..110
SVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSLDIDEG
>XP_001215548 007..086
AVMKRRSDSWLNATQILKVAGVVKARRTKTLEKEIAAGEHEKVQGGYGKYQGTWVNYQRGVDLCREYHVEELLRPLLEYD
>NP_595496 026..106
LMKRCHDNWLNATQILKIAELDKPRRTRILEKFAQKGLHEKIQGGCGKYQGTWVPSERAVELAHEYNVFDLIQPLIEYSGS
>XP_457246 028..109
IMRRCKDDWVNATQILKCCNFPKAKRTKILEKGVQQGLHEKIQGGYGRFQGTWIPLADAQRLAASYGVTPDLAPVLYLDASD
>MBP1_EREGO NP_986147 031..114
SIMKRKADDWVNATHILKAAKFAKAKRTRILEKEVIKDTHEKVQGGFGKYQGTWVPLDIARRLAQKFEVLEELRPLFDFTRRDG
>NP_986370 043..124
VMRRLHDDWVNITQVFKVATFSKTQRTKILEKESADISHEKIQGGYGRFQGTWIPLDSAKGLVAKYEITDIVVLTVINFQPD
>SWI4_SACCE NP_011036 060..141
VMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSAKFLVNKYEIIDPVVNSILTFQFD
>XP_454890 119..200
IMRRCNDNWLNITQVFKAGSFTKAQRTKILEKEANEIKHEKIQGGYGRFQGTWIPWESTKYLVEKYNINNKVVKRIVEFIPD
>XP_444966 062..140
VMRRTMDDWVNVTQVFKIAQFSKTQRTKILEKESTNMKHEKVQGGYGRFQGTWVPLEAAKFMTTKYNIDNPVVNTILSF
>XP_459785 307..380
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREGIVDLLYPLF
>XP_663009 131..216
TVMWDYNIGLVRTTHLFKCNDYSKTTPAKMLNQNPGLRDICHSITGGALAAQGYWMPYEAAKAIAATFCWKIRFALTPLFGDNFPD
>SOK2_SACCE NP_013729 436..509
SVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALAIAQREKIADYLYPLF
>XP_449680 143..216
TVVRRADNDMVNGTKLLNVTGMTRGRRDGILKNEPVRDVVKGGPMTLKGVWIPIDRARAIARQEGIEQWLYPLF
>NP_983001 352..425
SVVRRADNDMINGTKLLNVAKMTRGRRDGILKAEKVRHVVKIGSMHLKGVWIPFERALALAQREKIVDMLFPLF
>XP_714197 227..300
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQIVDMLYPLF
>XP_714237 228..301
SVVRRADNNMINGTKLLNVAQMTRGRRDGILKSEKVRHVVKIGSMHLKGVWIPFERALAMAQREQIVDMLYPLF
>XP_001218256 139..211
VARREDNSMINGTKLLNVAGMTRGRRDGILKSEKIRHVVKIGPMHLKGVWIPFERALEFANKEKITDLLYPLF
>XP_663440 152..224
VARREDNGMINGTKLLNVAGMTRGRRDGILKSEKVRNVVKIGPMHLKGVWIPFDRALEFANKEKITDLLYPLF
>XP_502292 285..357
VARREDNDMINGTKLLNVAGMTRGRRDGILKGEKLRHVVKAGAMHLKGVWIPYDRALEFANKEKIIDLLFPLF
>XP_501102 130..202
VARREDNNMINGTKLLNVVGMTRGRRDGILKTEKIRHVVKIGAMHLKGVWIPYERALAFAQRERIVDVLYPLF
>XP_755125 152..224
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALEFANKEKITDLLYPLF
>PHD1_SACCE NP_012881 208..281
SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKGVWIPFERAYILAQREQILDHLYPLF
>XP_448847 224..297
SVVRRADNDMINGTKLLNVTKMTRGKRDGILRSEKYRKVVKIGSMHLKGVWIPFERALFIAKREKIVDLLYPLF
>XP_505499 080..165
IIWDYHTGYVHLTGLWKAIGNSKADIVKLIDNSPDLEAVIRRVRGGYLKIQGTWVPYDIARALASRTCYFIRFALIPLFGQDFPGT
>XP_455299 386..459
SVVRRADNDMINGTKLLNVTRMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALVMAQREKIVDLLYALF
>XP_390305 226..298
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPYDRALDFANKEKITELLYPLF
>XP_960837 139..211
VARREDNAMINGTKLLNVAGMTRGRRDGILKSEKVRHVVKIGPMHLKGVWIPFERALDFANKEKITELLYPLF
>XP_368552 127..199
VARREDNHMINGTKLLNVAGMTRGRRDGILKSEKMRHVVKIGPMHLKGVWIPFERALDFANKEKITELLYPLF
>XP_460447 213..285
VSRREDTNYVNGTKLLNVAGMTRGKRDGILKTEKTKSVVKVGAMNLKGVWIPFERASEIARNEGIDGLLYPLF
>XP_389978 139..218
AVMWDYNIGLVRMTPFFKCRGYGKTIPAKMLGLNPGLKEITHSITGGSIAAQGYWMPYRCAKAICATFCHPIAGALIPIF
>XP_711513 469..541
VSRREDTNYINGTKLLNVIGMTRGKRDGILKTEKIKNVVKVGSMNLKGVWIPFDRAYEIARNEGVDSLLYPLF
>NP_596132 088..165
LRRCPDSYFNISQILRLAGTSSSENAKELDDIIESGDYENVDSKHPQIDGVWVPYDRAISIAKRYGVYEILQPLISFN
>XP_751244 151..230
VMWDYNIGLVRTTHLFKCNDYSKMLNANPGLREICHSITGGALAAQGYWMPYEAAKAVAATFCWKIRHALTPLFGLDFPS
>XP_760925 057..143
TMMIDVDTSFVRFTSITQALGKNKVNFGRLVKTCPALDPHITKLKGGYLSIQGTWLPFDLAKELSRRIAWEIRDHLVPLFGYDFPST
>XP_001212599 130..218
IMWDYNIGLVRTTPLFRSQNYSKTTPAKVLDANPGLREISHSITGGAIVAQDKPGYWIPFEAAKAVAATFCWRIRYALTPIFGLDFPSQ
>XP_459773 187..274
IIWDYETGFVHLTGIWKASINDEVNTHRNLKADIVKLLESTPKQYHQHIKRIRGGFLKIQGTWLPFDLCKMLAKRFCYHIRFQLIPIF
>XP_710918 256..352
VIWDYETGWVHLTGIWKASLTIDGSNVSPSHLKADIVKLLESTPKEYQQYIKRIRGGFLKIQGTWLPYKLCKILARRFCYYLRYSLIPIFGTDFPDS
>XP_459901 067..158
ILRRVQDSYINISQLFSILLKIGHLSEAQLTNFLNNEILTNTQYLSSGGSNPQFNDLRNHEVRDLRGLWIPYDRAVSLALKFDIYELAKSLF
>XP_657766 089..163
LMRRSKDGYVSATGMFKIAFPWAKLEEERSEREYLKTRPETSEDEIAGNVWISPVLALELAAEYKMYDWVRALLD
>XP_385459 077..154
LMRRSYDGFVSATGMFKASFPYAEASDEDAERKYIKSLPTTSHEETAGNVWIPPEQALILAEEYKISPWIRALLDPTP
>XP_962267 085..162
LMRRSQDGYISATGMFKATFPYASQEEEEAERKYIKSIPTTSSEETAGNVWIPPEQALILAEEYQITPWIRALLDPSD
>XP_753510 089..163
LMRRSKDGYVSATGMFKIAFPWAKLEEEKAEREYLKTREGTSEDEIAGNIWVSPLLALELAKEYQMYDWVRALLD
>XP_363762 084..161
LMRRSSDGYVSATGMFKATFPYADAEDEEAERNYIKSLPATSKEETAGNVWISPDQALALAEEYSIATWIRALLDPTD
>XP_723412 087..178
VLRRVQDSFVNVTQLFQILIKLEVLPTSQVDNYFDNEILSNLKYFGSSSNTPQYLDLRKHQNIYLQGIWIPYDKAVNLALKFDIYEITKKLF
>NP_596166 062..140
LMRMAKDSSISATSMFRSAFPKATQEEEDLEMRWIRDNLNPIEDKRVAGLWVPPADALALAKDYSMTPFINALLEASST
>XBP1_SACCE NP_012165 314..415
RDLICQSYKDFLINELGPDQIDLPNLNPANFTKRIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFPKDCESWYLAHQNVTFASSTTGAGAAT
>XP_001216355 084..197
TYFLMDGYVSATGMFKIAFPWAKLDEERSEREYLKSREETSEDEIAGNVWISPKLALELAGEYQMYNWVRALLDPTDIVQSPSSAKKQITPPPRYDLPPIEAPTQLTATSTRS
>XP_369301 092..188
EEYTVMWDYGCGLVRMTHFFKCRGYTKTVPGKVLNQNHGLKDITYSITGGSISAQESPNFGRMVIDRELVAHATREAESMYGRSMQAQAQQQGPLR
>XP_455262 289..388
YGKLDKPSKKDSQQKWNKWFQRESFSTYIDLHWHKLNPTLSTLLGQSYDAKIPFERMVKRIRGGYIKIQGTWLPYPVSKELCSRFCYPLRYLLVPLFGPDFPEKCEYWY
>NP_983869 277..365
YTDVHWNQVDPTWKQRLCRLYQQEKNLDFTPEFQDCYKRIRGGYIKIQGTWLPMEICKRLCIRFCFPIRYFLVPIFGEGFLQECHNWYF
>XP_446482 295..390
STSNSSVNYLDFHWFDISEKVRSQIFEQFKQHLEKDRNVDCSTIPKAEEYIQRIRGGYIKIQGTWVPWYIAKLICIRFCFPIRYLLVPIFGEQFPV
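Because each header records the domain boundaries, the list can be checked for internal consistency: a sequence's length should equal end - start + 1. A minimal Python sketch (the helper names are hypothetical) that parses multi-FASTA text and reports the records where the two disagree.

```python
import re
from typing import List, Tuple

RANGE = re.compile(r"(\d+)\.\.(\d+)")   # the "start..end" field of a header


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; sequence lines are concatenated."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def range_mismatches(records: List[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Return (header, expected, observed) wherever len(seq) != end - start + 1."""
    bad = []
    for header, seq in records:
        m = RANGE.search(header)
        if m is None:
            continue                    # no range in this header: nothing to check
        start, end = int(m.group(1)), int(m.group(2))
        expected = end - start + 1      # inclusive, 1-based coordinates
        if len(seq) != expected:
            bad.append((header, expected, len(seq)))
    return bad
```

`range_mismatches(parse_fasta(text))` returns an empty list when every header range matches its sequence length; for example, `MBP1_SACCE NP_010227 024..107` expects, and has, 84 residues.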