Difference between revisions of "Relational database example 1"
m |
m |
||
Line 15: | Line 15: | ||
<table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div> | <table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div> | ||
<ul> | <ul> | ||
− | <li><span class="toctext">[[ | + | <li><span class="toctext">[[Data modelling]]</span></li> |
<li><span class="toctext">[[Relational database example 2-Perl interface]]</span></li> | <li><span class="toctext">[[Relational database example 2-Perl interface]]</span></li> | ||
<li><span class="toctext">[[Relational database example 3-join tables]]</span></li> | <li><span class="toctext">[[Relational database example 3-join tables]]</span></li> |
Revision as of 12:46, 27 September 2015
Relational database example 1: task and model
Summary ...
Contents
Related Pages
|
Introductory reading
A worked database example
In this example, we will create a datamodel for a protein-protein interaction network, create the database as a MySQL database, load data into the database and perform interactive queries and updates.
To prepare -
- read the section on Database_introduction and make sure you understand the concepts of an ERD (Entity relationship diagram) and a database schema.
- read the section on MySQL_installation_notes, install MySQL on your computer and start the mysqld database server.
Task 1: the data
- Access the Microbial Protein Interaction Database and select interactions for Escherichia coli.
- Download the results to your computer as a tab-delimited file, name it mpidbRaw.txt.
- The file is a tab-delimited text file and we would like to extract only two columns from it:
- The gene-name part of "Alias(es) interactor A"
- The gene-name part of "Alias(es) interactor B"
- The number of different "Interaction detection method(s)" with which this interaction has been detetcted, as a stand-in for a confidence score
- Without further explanation, here is a Perl one-line script to extract the interactions from the raw data. Copy it and enter it on the commandline.
- perl -e '<STDIN>;while(<STDIN>){s/uniprotkb://g;split("\t");if($_[4]&&$_[5]){print"$_[4]\t$_[5]\t",scalar(split(":",$_[13])),"\n"}}' < mpidbRaw.txt > EcoIntx.txt
- Without further explanation, here is a Perl one-line script to extract the gene name aliases from the interactions. Copy it and enter it on the commandline.
- perl -e 'while(<STDIN>){split("\t");$g{$_[0]}++;$g{$_[1]}++;}foreach(keys(%g)){print"$_\n"};' < EcoIntx.txt > EcoGenes.txt
- Check your results - they should look something like this:
shell> head EcoIntx.txt glnL glnL 1 kdpD kdpD 1 zraS zraR 2 creC creB 2 torR torS 2 yedW yedV 2 hnr arcB 1 cheY arcB 1 basS basR 2 rstB rstA 2 shell> head EcoGenes.txt cvpA recX rfaS dapE emrY ycjZ papK2 ccmD yhgF murE shell>
Task 2: the schema
- draw up an ERD for an interaction database based on this data. Your datamodel should be able to store information on each gene and on each interaction. This requires two separate tables.
- define a schema for your datamodel. Remember to define a primary key - don't rely on either gene names or interactions to be unique.
Task 3: create the database
- Refer to the explanations and examples in the Database_introduction section and run mysql on the commandline (as root) to CREATE a database called ecoint and GRANT all privileges on that database to a user named ecodev; also GRANT FILE privileges to that user.
Tip: Do not name your interaction table "int". This is a reserved word that would need to be quoted to be used in this way. Otherwise it will result in an error. Better: use a different table name.
Task 4: create the tables
- run mysql on the commandline as user ecodevand USE the ecoint database. Use the CREATE TABLE command to create the tables you have defined in your schema. Use the DESCRIBE command to check that the tables exist and have the intended attributes. For example, one output could look like this:
mysql> describe gene; +-----------+-------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------+-------------+------+-----+---------+----------------+ | Gene_ID | int(11) | NO | PRI | NULL | auto_increment | | Gene_Name | varchar(10) | NO | | NULL | | +-----------+-------------+------+-----+---------+----------------+ 2 rows in set (0.01 sec)
Task 5: load the data
- Use the LOAD DATA INFILE filename INTO table (column_1, column_2, ...); syntax to load the data you have prepared into the database. Remember to enter the full path for your file and to quote the filename. Specify only the datacolumns you need - the identifiers (with autoincrement) will be automatically set. If you get a privilege error at that point, you probably forgot to give FILE privileges to your account.
On some Linux distros the default file permissions for mysql are very strict and set globally. If your data file is in a directory mysql doesn't like (for ubuntu8 it seems to be anything in /home), the load data operation might return something like "ERROR 13 (HY000): Can't get stat of /path/file..." even if you are logged into mysql as root or a user with read/write access to the file. It probably involves this bug.
An easier work around is to save your data somewhere in /tmp which circumvents error 13.
Task 6: try some SELECT statements
- Use SELECT statements to verify that your data has been loaded. For example you could try the following (or similar, depending on the column/table names you have actually used):
SELECT * FROM gene WHERE Gene_Name='fimC';
or
SELECT * FROM inx WHERE weight > 6;
or
SELECT * FROM inx WHERE (int_A LIKE 'mal%' OR int_B LIKE 'mal%');
Further reading
Further reading and resources