Difference between revisions of "Relational database example 1"

Revision as of 12:46, 27 September 2015

Relational database example 1: task and model

The contents of this page has recently been imported from an older version of this Wiki. This page may contain outdated information, information that is irrelevant for this Wiki, information that needs to be differently structured, outdated syntax, and/or broken links. Use with caution!

Summary ...

Introductory reading

A worked database example

In this example, we will create a datamodel for a protein-protein interaction network, create the database as a MySQL database, load data into the database and perform interactive queries and updates.

To prepare -

read the section on Database_introduction and make sure you understand the concepts of an ERD (Entity relationship diagram) and a database schema.
read the section on MySQL_installation_notes, install MySQL on your computer and start the mysqld database server.

Task 1: the data

Access the Microbial Protein Interaction Database and select interactions for Escherichia coli.

Download the results to your computer as a tab-delimited file, name it mpidbRaw.txt.
The file is a tab-delimited text file and we would like to extract only two columns from it:
- The gene-name part of "Alias(es) interactor A"
- The gene-name part of "Alias(es) interactor B"
- The number of different "Interaction detection method(s)" with which this interaction has been detetcted, as a stand-in for a confidence score
Without further explanation, here is a Perl one-line script to extract the interactions from the raw data. Copy it and enter it on the commandline.
- perl -e '<STDIN>;while(<STDIN>){s/uniprotkb://g;split("\t");if($_[4]&&$_[5]){print"$_[4]\t$_[5]\t",scalar(split(":",$_[13])),"\n"}}' < mpidbRaw.txt > EcoIntx.txt
Without further explanation, here is a Perl one-line script to extract the gene name aliases from the interactions. Copy it and enter it on the commandline.
- perl -e 'while(<STDIN>){split("\t");$g{$_[0]}++;$g{$_[1]}++;}foreach(keys(%g)){print"$_\n"};' < EcoIntx.txt > EcoGenes.txt

Check your results - they should look something like this:

shell> head EcoIntx.txt
glnL    glnL    1
kdpD    kdpD    1
zraS    zraR    2
creC    creB    2
torR    torS    2
yedW    yedV    2
hnr     arcB    1
cheY    arcB    1
basS    basR    2
rstB    rstA    2
shell> head EcoGenes.txt
cvpA
recX
rfaS
dapE
emrY
ycjZ
papK2
ccmD
yhgF
murE
shell>

Task 2: the schema

draw up an ERD for an interaction database based on this data. Your datamodel should be able to store information on each gene and on each interaction. This requires two separate tables.
define a schema for your datamodel. Remember to define a primary key - don't rely on either gene names or interactions to be unique.

Task 3: create the database

Refer to the explanations and examples in the Database_introduction section and run mysql on the commandline (as root) to CREATE a database called ecoint and GRANT all privileges on that database to a user named ecodev; also GRANT FILE privileges to that user.

Tip: Do not name your interaction table "int". This is a reserved word that would need to be quoted to be used in this way. Otherwise it will result in an error. Better: use a different table name.

Task 4: create the tables

run mysql on the commandline as user ecodevand USE the ecoint database. Use the CREATE TABLE command to create the tables you have defined in your schema. Use the DESCRIBE command to check that the tables exist and have the intended attributes. For example, one output could look like this:

mysql> describe gene;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| Gene_ID   | int(11)     | NO   | PRI | NULL    | auto_increment | 
| Gene_Name | varchar(10) | NO   |     | NULL    |                | 
+-----------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

Task 5: load the data

Use the LOAD DATA INFILE filename INTO table (column_1, column_2, ...); syntax to load the data you have prepared into the database. Remember to enter the full path for your file and to quote the filename. Specify only the datacolumns you need - the identifiers (with autoincrement) will be automatically set. If you get a privilege error at that point, you probably forgot to give FILE privileges to your account.

On some Linux distros the default file permissions for mysql are very strict and set globally. If your data file is in a directory mysql doesn't like (for ubuntu8 it seems to be anything in /home), the load data operation might return something like "ERROR 13 (HY000): Can't get stat of /path/file..." even if you are logged into mysql as root or a user with read/write access to the file. It probably involves this bug.

An easier work around is to save your data somewhere in /tmp which circumvents error 13.

Task 6: try some SELECT statements

Use SELECT statements to verify that your data has been loaded. For example you could try the following (or similar, depending on the column/table names you have actually used):

SELECT * FROM gene WHERE Gene_Name='fimC';

or

SELECT * FROM inx WHERE weight > 6;

or

SELECT * FROM inx WHERE (int_A LIKE 'mal%' OR int_B LIKE 'mal%');

@@ Line 15: / Line 15: @@
 <table id="toc" class="toc"><tr><td><div id="toctitle"><span style="font-size:125%; ">Related Pages</span></div>
 <ul>
-<li><span class="toctext">[[Relational database principles]]</span></li>
+<li><span class="toctext">[[Data modelling]]</span></li>
 <li><span class="toctext">[[Relational database example 2-Perl interface]]</span></li>
 <li><span class="toctext">[[Relational database example 3-join tables]]</span></li>

Difference between revisions of "Relational database example 1"

Revision as of 12:46, 27 September 2015

Contents

Introductory reading

A worked database example

Task 1: the data

Task 2: the schema

Task 3: create the database

Task 4: create the tables

Task 5: load the data

Task 6: try some SELECT statements

Further reading

Further reading and resources

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Sections

Tools