Relational database example 3-join tables
Relational database example 2-join tables
Summary ...
Contents
Related Pages
|
Introductory reading
In our database example we have used a shortcut when we loaded the data: the interactors int_A and int_B are identified by their gene names - not by their primary key. If we can somehow guarantee that the gene names are unique and will be unique in all future, that is all good. But since it is not we who decide on gene names, we can make no such guarantee and are thus placing the integrity of our datamodel into the hands of an unknown third party. It would be better to use our gene table's primary keys to identify the genes in our inx table. Here is how to use MySQL to achieve this with a join operation. Generally speaking, a join is an operation that compares the contents of two tables, where there is(are) some key(s) that rows in aech table have in common.
In our case, the gene.Gene_Name is the same as inx.int_A or inx.int_B. Therefore one can retrieve identifiers like in the following example.
SELECT gene.Gene_ID,inx.int_A FROM gene JOIN inx ON gene.Gene_Name=inx.int_A;
This joining of the data in two tables allows us to modify our interactions table.
Digression: backing up and restoring the database
Before we change the structure of our database, it is a really good idea to back-up. MySQL provides an easy tool for this. Enter the following (or similar, depending on your file/ directory names ) on the UNIX commandline:
/usr/local/mysql/bin/mysqldump -u ecodev -p --add-drop-table ecoint > ~/bak/ecoint_backup.01.sql
You can restore your database with:
/usr/local/mysql/bin/mysql -u ecodev -p ecoint < ~/bak/ecoint_backup.01.sql
continued: modifying the table structure
First we add two new columns to the table inx;
ALTER TABLE inx ADD COLUMN int_A_ID int; ALTER TABLE inx ADD COLUMN int_B_ID int;
This gives us:
mysql> describe inx; +----------+-------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+-------------+------+-----+---------+----------------+ | int_ID | int(11) | NO | PRI | NULL | auto_increment | | int_A | varchar(10) | NO | | NULL | | | int_B | varchar(10) | NO | | NULL | | | weight | smallint(6) | NO | | NULL | | | int_A_ID | int(11) | YES | | NULL | | | int_B_ID | int(11) | YES | | NULL | | +----------+-------------+------+-----+---------+----------------+ 6 rows in set (0.00 sec)
Now we can collect the ID information and insert it into this table.
UPDATE inx JOIN gene ON inx.int_A=gene.Gene_Name SET inx.int_A_ID=gene.Gene_ID; UPDATE inx JOIN gene ON inx.int_B=gene.Gene_Name SET inx.int_B_ID=gene.Gene_ID;
Then we can verify that this was correct - first by checking that no NULL fields remin in either new column:
mysql> select * from inx where int_A_ID=NULL OR int_B_ID=NULL; Empty set (0.08 sec)
There are many further tests we could do, for example, we can join the table information for all self-interactions:
mysql> SELECT inx.int_ID, inx.int_A_ID, gene.gene_ID, inx.int_A, gene.gene_Name -> FROM inx JOIN gene ON inx.int_A=gene.Gene_Name -> WHERE inx.int_A=inx.int_B; +--------+----------+---------+--------+-----------+ | int_ID | int_A_ID | gene_ID | int_A | gene_Name | +--------+----------+---------+--------+-----------+ | 1 | 99 | 99 | glnL | glnL | | 2 | 871 | 871 | kdpD | kdpD | | 60 | 796 | 796 | hns | hns | | 80 | 661 | 661 | glnB | glnB | [...] | 1853 | 182 | 182 | napC | napC | | 1854 | 636 | 636 | fliM | fliM | | 1875 | 1432 | 1432 | Q83W87 | Q83W87 | +--------+----------+---------+--------+-----------+ 80 rows in set (0.39 sec) mysql>
That done and confirmed there is a little bit of housekeeping to do. First of all, we want to drop the int_A and int_B columns from the inx table. Why? Because we have declared them to be NOT NULL and they would prevent us from inserting new interactions without specifying gene names. Secondly we want to change the int_A_ID and int_B_ID columns to be NOT NULL. Here is how.
ALTER TABLE inx DROP COLUMN int_A; ALTER TABLE inx DROP COLUMN int_B;
You will notice that this is awfully terse syntax for very significant effect - and there is no confirm, or undo. Make it a habit to pause and think every time before you hit return!
ALTER TABLE inx CHANGE int_A_ID int_A_ID INT NOT NULL; ALTER TABLE inx CHANGE int_B_ID int_B_ID INT NOT NULL;
Note: it is not a typo that the column name appears twice. The first mention defines which column the change applies to, the second mention is part of the statement that defines the new column's characteristics. The name appears twice because it is not changed. Compare this to:
ALTER TABLE flub CHANGE foo bar INT NOT NULL;
This concludes our changes. To get at the original information (if needed), we can perform join statements like:
mysql> SELECT inx.int_ID, a.Gene_Name, b.Gene_Name, inx.weight -> FROM ((inx JOIN gene AS a ON inx.int_A_ID=a.Gene_ID) -> JOIN gene AS b ON inx.int_B_ID=b.Gene_ID) -> WHERE inx.int_ID>=100 AND inx.int_ID<=109; +--------+-----------+-----------+--------+ | int_ID | Gene_Name | Gene_Name | weight | +--------+-----------+-----------+--------+ | 100 | uvrB | uvrC | 2 | | 101 | dnaK | tig | 1 | | 102 | groL | hchA | 1 | | 103 | carB | carB | 3 | | 104 | holD | holB | 2 | | 105 | holC | dnaX | 2 | | 106 | dnaX | dnaQ | 1 | | 107 | rpsG | hrpA | 1 | | 108 | pnp | rluB | 1 | | 109 | pssA | ycbY | 1 | +--------+-----------+-----------+--------+ 10 rows in set (0.14 sec) mysql>
Note that we are using two different aliases of the same table gene AS a and gene AS b, otherwise we would not be able to keep the results apart in the SELECT statement.
Further reading
Further reading and resources