Difference between revisions of "Reference APSES domains (yeast)"

Latest revision as of 16:03, 26 September 2015

APSES domains in yeast

All APSES domains from APSES domain proteins in Saccharomyces cerevisiae, annotated.

What is the APSES domain?

The APSES domain is a well-defined type of DNA-binding domain that is ubiquitous in fungi and unique to that kingdom. Structurally, it is a member of the winged helix-turn-helix family. More recently it was found to be homologous to the somewhat shorter, prokaryotic KilA-N domain; the APSES domain was therefore retired from Pfam and its instances were merged into the KilA-N family. InterPro, however, has a KilA-N entry but still recognizes the APSES domain as a separate entry.

KilA-N domain boundaries in Mbp1 can be derived from the results of a CDD search with the ID 1BM8_A (the crystal structure of the Mbp1 DNA-binding domain). The search returns the KilA-N superfamily domain description and alignment reproduced here.

(pfam 04383): KilA-N domain; The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.

                            10        20        30        40        50        60        70        80
                    ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A          16 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ---------------GGFGKYQGTWVPLNIA 80
Cdd:pfam04383   3 YNDFEIIIRRDKDGYINATKLCKAAGETK-RFRNWLRLESTKELIEELSeennvdkseiiigrkGKNGRLQGTYVHPDLA 81
 
                            90
                    ....*....|....
1BM8A          81 KQLA----EKFSVY 90
Cdd:pfam04383  82 LAIAswisPEFALK 95

Note that CDD and SMART are not consistent in how they apply Pfam 04383 to the Mbp1 sequence. See annotation below.

The CDD KilA-N domain definition begins at position 16 of the 1BM8 sequence, but virtually all fungal APSES domains have a longer, structurally defined, conserved N-terminus. Blindly applying the KilA-N domain definition to these proteins would lose important information. For most purposes we will therefore prefer the sequence spanned by the 1BM8_A structure. That sequence is given below; the KilA-N domain (coloured dark green on the original page) corresponds to positions 16 to 90 in the CDD alignment above. By this definition the APSES domain is 99 amino acids long and comprises residues 4 to 102 of the NP_010227 sequence.

                            10        20        30        40        50        60        70        80
                    ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1BM8A           1 QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA 80
 
                            90
                    ....*....|....*....
1BM8A          81 KQLAEKFSVYDQLKPLFDF 99
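
The relationship between the two numbering schemes can be sketched in a few lines of Python. This is only an illustration: the names `APSES_1BM8`, `MBP1_OFFSET`, `subsequence` and `to_mbp1` are made up for this example, and the offset of 3 is an assumption derived from the statement that the 99 residues of 1BM8_A correspond to residues 4 to 102 of NP_010227.

```python
# The 1BM8_A sequence as given above (99 residues).
APSES_1BM8 = (
    "QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI"
    "LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF"
)

# Assumption: 1BM8_A residue 1 corresponds to Mbp1 (NP_010227) residue 4.
MBP1_OFFSET = 3


def subsequence(seq, start, end):
    """Return residues start..end of seq, using 1-based inclusive positions."""
    return seq[start - 1:end]


def to_mbp1(pos_1bm8):
    """Convert a 1BM8_A position to the corresponding Mbp1 position."""
    return pos_1bm8 + MBP1_OFFSET


# Region covered by the CDD alignment shown above (1BM8_A positions 16 to 90).
kila_n = subsequence(APSES_1BM8, 16, 90)

print(len(APSES_1BM8))            # length of the structure-based APSES domain
print(kila_n[:5], kila_n[-5:])    # boundary residues of the aligned region
print(to_mbp1(16), to_mbp1(90))   # the same boundaries in Mbp1 numbering
```

Using 1-based inclusive positions in the helper keeps the code in step with the residue numbers used on this page, at the cost of one subtraction when slicing.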

Yeast APSES domain sequence in FASTA format

>APSES_MBP1 Residues 4-102 of S. cerevisiae Mbp1
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
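
A FASTA record like the one above can be read back with a small parser. The following is a minimal sketch that assumes plain FASTA text with `>` header lines; `parse_fasta` is a hypothetical helper written for this page, not part of any library.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may wrap over many lines
    return {h: "".join(parts) for h, parts in records.items()}


fasta_text = """>APSES_MBP1 Residues 4-102 of S. cerevisiae Mbp1
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
"""

records = parse_fasta(fasta_text)
for header, seq in records.items():
    print(header, len(seq))
```

Joining the wrapped lines before measuring the length is the point of the exercise: the record is wrapped at 50 characters, so the sequence has to be reassembled before it can be compared with the 99 residues stated above.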

Synopsis of ranges

Domain	Link	Length	Boundary	Range (Mbp1)	Range (1BM8)

KilA-N: pfam04383 (CDD)	CDD alignment	72	`STGSI ... KFSVY`	21 - 93	18 - 90
KilA-N: pfam04383 (SMART)	SMART main page	79	`IHSTG ... YDQLK`	19 - 97	16 - 94
KilA-N: SM01252 (SMART)	SMART main page	84	`TGSIM ... DFTQT`	22 - 105	19 - 99...
APSES: InterPro IPR003163	(InterPro)	130	`QIYSA ... IRSAS`	3 - 133	1 - 99...
APSES (1BM8)	–	99	`QIYSA ... PLFDF`	4 - 102	1 - 99
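
The Length and Range (Mbp1) columns can be cross-checked against each other. The sketch below assumes that ranges are inclusive at both ends, so that a range a - b spans b - a + 1 residues; the values are copied from the table, and the names `rows`, `span_length` and `mismatches` are illustrative only.

```python
# (domain definition, listed length, (start, end) in Mbp1 numbering),
# copied from the synopsis table above.
rows = [
    ("KilA-N: pfam04383 (CDD)", 72, (21, 93)),
    ("KilA-N: pfam04383 (SMART)", 79, (19, 97)),
    ("KilA-N: SM01252 (SMART)", 84, (22, 105)),
    ("APSES: InterPro IPR003163", 130, (3, 133)),
    ("APSES (1BM8)", 99, (4, 102)),
]


def span_length(start, end):
    """Number of residues in an inclusive range start..end."""
    return end - start + 1


def mismatches(table):
    """Return rows whose listed length differs from the inclusive span."""
    return [
        (name, listed, span_length(*rng))
        for name, listed, rng in table
        if listed != span_length(*rng)
    ]


for name, listed, computed in mismatches(rows):
    print(f"{name}: listed {listed}, range spans {computed}")
```

Under the inclusive-range assumption, the CDD and InterPro rows each come out one residue longer than their listed length. The table alone does not say whether the length or the range is the intended value, so such rows are best checked against the source databases.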

To be newly created

@@ Line 1: / Line 1: @@
-__NOTOC__
+<div id="BIO">
+<div class="b1">
+APSES domains in yeast
+</div>
-;Multi FASTA file of all ''saccharomyces cerevisiae'' APSES domains.
+<section begin=contents_summary />
+All APSES domains from APSES domain proteins in ''saccharomyces cerevisiae'' - annotated.
+<section end=contents_summary />
-====Executing the PSI-BLAST search====
+__TOC__
-The starting point of this list is a PSI-BLAST search with '''one''' known APSES domain sequence. This query sequence - the Mbp1 APSES domain - was defined as follows, based on [http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=66020 Pfam profile 02292: APSES].
- >Yeast Mbp1 APSES domain (AA 24..102 of NP_010227)
+&nbsp;
- SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
+==What is the APSES domain?==
- QGTWVPLNIAKQLAEKFSVYDQLKPLFDF
-Even though we are only interested in yeast genes, including other sequences into the PSI-BLAST search will allow us to identify distant homologues as well. A PSI-BLAST search was executed, searching in the '''refseq''' subset of GenPept and selecting an Organism restriction to use only '''Fungi (taxid: 4751)'''. The default parameters for PSI-BLAST were used, except for using the BLOSUM45 matrix and an E-value threshold of 0.1, not 10.
+{{#lst:Reference annotation yeast Mbp1|CDD_APSES}}
-The search converged after 6 iterations, i.e. in the 6th iteration PSI-BLAST found no additional new hits above the inclusion threshold E-value of 0.005. Clicking on the '''Taxonomy reports''' link lists all hits sorted by the species they originate from. Clicking on the ''Saccharomyces cerevisiae'' link identifies the yeast genes that were found:
-  Saccharomyces cerevisiae S288c [ascomycetes] taxid 559292
- ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]           126  1e-37
- ref|NP_011036.1| Swi4p [Saccharomyces cerevisiae S288c]           103  9e-30
- ref|NP_013729.1| Sok2p [Saccharomyces cerevisiae S288c]            98  9e-28
- ref|NP_012881.1| Phd1p [Saccharomyces cerevisiae S288c]            94  2e-27
- ref|NP_012165.1| Xbp1p [Saccharomyces cerevisiae S288c]            55  1e-12
-One of these (Xbp1) is only a partial match with an alignment length of 55 amino acids. There is a somewhat complicated story here, therefore, for the purposes of the course I have removed Xbp1 from consideration as an APSES transcription factor. We will work from the four yeast gene families '''Mbp1, Swi4, Sok2, and Phd1'''.
+&nbsp;
-====Constructing the multi-FASTA file====
-A multi-FASTA file is the default input format for many MSA programs, it is simply a file that contains more than one FASTA formatted sequence.
-The PSI-BLAST search has already defined the sequences from each source protein that are similar to the APSES search profile. We only need to extract them in a convenient way from the search results. NCBI offers options to format the result page: they are presented from a link at the top of the BLAST results page: " Reformat these Results": the principal options for the format are:
+;To be newly created
-*'''Pairwise''': the default
-*'''Pairwise with identities''': showing only differences to the query sequence
-*'''query anchored with/without letters for identities''': looks something like a multiple sequence alignment, hyphens for gaps, insertions relative to the query  are displayed ''below'' the sequence
-*'''flat-query anchored with/without letters for identities''': This now looks like a multiple sequence alignment (in fact it '''is''' one - all sequences aligned to the profile).
-I have selected the  '''flat-query anchored with letters for identities''' option and restricted the output to ''saccharomyces cerevisiae'' sequences.
+<!--
+==Exercises==
+<section begin=exercises />
+<section end=exercises />
-; This is what I have received:
- Query      1    SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA  60
+&nbsp;
- NP_010227  24   SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIA  83
+==Notes==
- NP_011036  60    VMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSA  118
+<references />
- NP_013729  436  SVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHV---VKIGSMHLKGVWIPFERA  492
- NP_012881  208  SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREV---VKIGSMHLKGVWIPFERA  264
- NP_012165  343                                       NFTKRIRGGYIKIQGTWLPMEIS  365
- Query      61   KQLAEKFS--VYD-QLKPLFDF  79
- NP_010227  84   KQLAEKFS--VYD-QLKPLFDF  102
- NP_011036  119  KFLVNKYE--IIDPVVNSILTF  138
- NP_013729  493  LAIAQREK--IAD-YLYPLF    509
- NP_012881  265  YILAQREQ--ILD-HLYPLF    281
- NP_012165  366  RLLCLRFCFPIRY-FLVPIFG   385
-I can then simply copy the alignment, remove hyphens and create FASTA headers to which I have manually added some useful information. The "Query" itself (being identical to the original Mbp1 protein) and the Xbp1 partial match are not included.
+-->
+&nbsp;
+==Further reading and resources==
+<!-- {{#pmid:21627854}} -->
+<!-- {{WWW|WWW_UniProt}} -->
+<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
-====Yeast APSES domains====
+&nbsp;
+[[Category:Bioinformatics]]
-This is the final yeast APSES domain reference sequence set in multi-FASTA format.
+</div>
- >Mbp1_SACCE (79  ids)  NP_010227    (024..102)
- SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
- >Sok2_SACCE (74  ids)  NP_013729     (436..509)
- SVVRRADNDMVNGTKLLNVTKMTRGRRDGILKAEKIRHVVKIGSMHLKGVWIPFERALAIAQREKIADYLYPLF
- >Phd1_SACCE (74  ids)  NP_012881    (208..281)
- SVVRRADNNMINGTKLLNVTKMTRGRRDGILRSEKVREVVKIGSMHLKGVWIPFERAYILAQREQILDHLYPLF
- >Swi4_SACCE (79  ids)  NP_011036    (060..138)
- VMRRTKDDWINITQVFKIAQFSKTKRTKILEKESNDMQHEKVQGGYGRFQGTWIPLDSAKFLVNKYEIIDPVVNSILTF
