CGI
CGI - the Common Gateway Interface allows Web pages to interact with server-side programs.
The Common Gateway Interface (CGI) is a standard for connecting external applications (programs) with information servers, such as Web servers. A plain HTML document is static: it is a text file that doesn't change unless it is replaced. But often we want dynamic content, content that changes according to the needs of the client. Client-side execution of code can be achieved with scripts that are supported by a browser, such as JavaScript. However, access to your own databases must typically be done on the server. A CGI script is executed on the server; it can serve dynamic information and have access to all resources on your laboratory server that you wish to give it. Essentially, the Web server invokes the program and serves the program's output in response to a client request, rather than a Web-page file as it would normally do. This makes the construction of dynamic Web pages as easy as putting an executable script into the right directory. Many programming languages can be used for scripts that use CGI in client-server interactions.
Apache and CGI
Apache needs to be configured to permit CGI execution. In a default installation, a directive called ScriptAlias in httpd.conf maps a URL prefix to a directory whose contents Apache treats as CGI programs: it will attempt to execute them, rather than serve them as files.
$ grep " ScriptAlias " /usr/local/apache/conf/httpd.conf
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
$
Thus in order to execute a script ...
- Apache will look for the requested program in the cgi-bin directory
- the program must exist
- the program must be executable by Apache
- the output will be returned to the requesting Web client.
More detail
In a default installation of Apache, if you type a filename into a URL, the file will be served, not executed, even if it is an executable file. In order to actually run it as a script, you have to configure Apache.
- Navigate to the Apache configuration directory. If this is a default installation, type
$ cd /usr/local/apache2/conf
(for Mac/UNIX) or
$ cd /etc/apache2
(for Ubuntu/Debian style Linux).
- Make a backup copy of the file httpd.conf
$ sudo cp httpd.conf httpd.conf.01
- Note: on Ubuntu/Debian systems the httpd.conf file in /etc/apache2/ is blank. While you can add config settings here, they are global in nature. Configuration of Apache on Ubuntu/Debian systems is organized in apache2.conf, where settings are controlled by directory with separate permissions for global, main server, and virtual hosts. Adding handlers or directives to httpd.conf may be redundant with parameters in apache2.conf, which will throw warnings when Apache starts or restarts. Add the line "ServerName (yourservername)" to httpd.conf or the appropriate directory in apache2.conf to address the warning: "Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName".
- Open the original httpd.conf for editing. (You will need the appropriate privileges to do that.)
$ sudo pico httpd.conf
- First, you need to allow execution of files. Apache allows this on a per-directory basis. Some administrators like to keep all executable files in a single directory called cgi-bin. Here, we simply allow execution of files from all directories. Look for the section that defines the Directory options for the Web-root directory. For a default installation this section will read ...
<Directory "/usr/local/apache2/htdocs">
... below this, there is a line that defines the allowed options. Add the option ExecCGI. After the change, in a default installation it would read
Options Indexes FollowSymLinks ExecCGI
- Note: on Ubuntu/Debian, enabling CGI in the localhost (/var/www/) directory for all users can be accomplished by adding "Options Indexes FollowSymLinks ExecCGI" to the desired directory (/var/www/) in /etc/apache2/sites-available/default.
- You then need to find the section that discusses the "AddHandler" directive and edit it to tell Apache to execute Perl scripts as CGI scripts. Un-comment the line that reads ...
# AddHandler cgi-script .cgi
... and change it to read .pl instead of .cgi:
AddHandler cgi-script .pl
- Save the modified httpd.conf and exit the editor.
- Confirm that you did not inadvertently change the owner or the permissions of the file.
- Restart Apache to use the new configuration
$ sudo /usr/local/apache2/bin/apachectl restart
Read more about details and alternatives, troubleshooting and security issues in the Apache documentation!
The simplest example
As long as the output of the program looks like an HTML page, all is well. There is one subtle and important difference however: while Apache will add the correct MIME type header to an .html file, based on its extension, it has no way of knowing what the output from your CGI program will be. HTML? Plain text? An audio clip? A Flash game? Thus you have to provide the correct header as the first line of your program output.
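To make the header requirement concrete, here is a minimal sketch in Perl. The helper name cgi_response is hypothetical (it is not part of any library); it simply puts a Content-type header and the mandatory blank line in front of a payload, so that the same payload can be declared as HTML or as plain text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not a library call):
# build a complete CGI response from a MIME type and a payload.
sub cgi_response {
    my ($mime_type, $payload) = @_;
    # Header line, then an empty line, then the payload.
    return "Content-type: $mime_type\n\n$payload";
}

# The payload is declared as plain text here, so a browser
# would show the tags literally instead of rendering a heading.
print cgi_response("text/plain", "<h2>Hello</h2>\n");
```

Changing the first argument to "text/html" would make the browser render the same payload as a heading.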
$ cd /usr/local/apache/cgi-bin
$ sudo pico testscript.pl
Enter the following code, then save.
#!/usr/bin/perl -w
print "Content-type: text/html\n"; # MIME header
print "\n"; # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Occam\'s Razor</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Entia non sunt multiplicanda sine necessitatem</h2>\n";
print "</body>\n";
print "</html>\n";
exit;
Then:
$ sudo chown root testscript.pl
$ sudo chmod 755 testscript.pl
$ ls -l testscript.pl
-rwxr-xr-x 1 root wheel 368 Mar 22 18:31 testscript.pl
$
The program does not have to be owned by root, but in this way it is not possible for anyone to change it (after all, you are executing code on behalf of a complete stranger somewhere on the Web) unless she has root privileges.
Now open your Web browser and type the following into the address field:
http://localhost/cgi-bin/testscript.pl
Voila. The program executes and sends the result to the client. All this takes is to request this script from the Web server. Of course, this could have been inserted as a clickable link into an HTML page, such as in this example:
<a href="http://localhost/cgi-bin/testscript.pl">Click here</a>
Generating a static file like this is perhaps not very useful, since we could have achieved the same by simply retrieving this file; no program code need be involved. Here is an example of something actually happening.
A minimal CGI example
- a URL is formatted to request an executable program
- the Web server executes the program
- the program runs and sends data to STDOUT
- the Web server accepts this data and sends it to the requesting browser
Note: since this requires an active role of the remote Web server, this will not work by simply opening an HTML file in your browser locally. You actually have to run a Web server.
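Code example
#!/usr/bin/perl
# serverTime.pl - CGI example script.
# Boris Steipe
use strict;
use warnings;

my $time = `date`;   # run the unix date command and capture its output

print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"10\">\n";
print "</head>\n";
print "<body>\n";
print "<h1>$time</h1>\n";
print "</body>\n";
print "</html>\n";
exit();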
Notes on the code
- date is a unix command to convert the system's internal clock time (in seconds since the epoch) into a formatted output.
- date is enclosed in "backticks": reverse slanted apostrophes. This is the Perl code to run the enclosed string as a system command; the output from STDOUT is the result of this operation and can be assigned to a variable.
- Content-type: text/html defines the MIME type of the output that our program is writing. It must be followed by a blank line, therefore \n\n.
- <META HTTP-EQUIV="Refresh" CONTENT="10"> causes the page to reload itself in 10-second intervals. This is the so called "client-pull" mechanism to refresh contents (as opposed to "server-push").
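As an alternative to the backticks, Perl can format the time itself. The following sketch assumes you would rather not start an external process for every request; it uses strftime from the core POSIX module. The sub name server_time and the format string are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Illustrative helper (hypothetical name): format a Unix timestamp in the
# server's local time zone without calling the external date command.
sub server_time {
    my ($epoch) = @_;
    return strftime("%Y-%m-%d %H:%M:%S", localtime($epoch));
}

print "Content-type: text/html\n\n";
print "<html><body><h1>", server_time(time), "</h1></body></html>\n";
```

The trade-off is that the output format is now fixed by the format string rather than by the locale settings of the date command.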
How to run this script:
- Copy this code, edit it at leisure and save it e.g. under the name serverTime.pl.
- Upload the file to a directory on your server. For example I have a directory called test in my Web-root directory where I can store code I am just playing around with and that I may delete at any time. You might put it into a directory called course, or whatever.
- chmod the file to be executable. For example
chmod 755 /usr/local/apache2/htdocs/test/serverTime.pl
- Then you can paste the URL http://localhost/test/serverTime.pl into a Web browser. The local time for your server should now be boldly displayed in the page that is returned.
- Reload the page. You notice that the time changes. This is dynamically created content.
Recapitulate the process
- Your browser has sent a request to the server.
- The server has recognized that this is an executable cgi file.
- It has executed the script and has captured the output the script sent to STDOUT.
- It has sent the output back to the requesting browser.
- The browser has parsed the HTML and has formatted the output as a Web page.
Troubleshooting this script
- The server reports: "File not found".
  - Are you sure the file is in the right place relative to document root? A file with the physical location /usr/local/apache2/htdocs/test/serverTime.pl has the URL http://localhost/test/serverTime.pl.
- The server reports "Internal server error". This can mean the file could not be executed, or it did not produce a valid output.
  - Has the Directory directive been set as discussed?
  - Does the server (i.e. "others") have read privileges in the directory?
  - Does the server have execute privileges on the file? Set permissions as: chmod 755 filename.
  - Does the script have the extension .pl for which you have issued an AddHandler directive?
  - Does the script run at all? Does it run correctly when you run it from the commandline?
  - Does the script output the correct header - Content-type: text/html\n\n - a declaration followed by a blank line?
- The server prints the source-code rather than executing the script. This means the server has treated the file as a text-file, rather than a script.
  - Has the AddHandler directive been set in httpd.conf?
  - Did you restart Apache after editing httpd.conf?
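Some of the checks in this list can be automated. The following is a rough sketch, meant to be run on the server from the command line with script paths as arguments. The sub names check_cgi_file and has_valid_header and the messages are hypothetical, and the extension test assumes the .pl handler configured earlier. Note that -r and -x test what the current user may do, which can differ from what the Apache user is allowed to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of likely problems for a CGI script file.
sub check_cgi_file {
    my ($path) = @_;
    return ("file not found: $path") unless -e $path;
    my @problems;
    push @problems, "not readable"         unless -r $path;
    push @problems, "not executable"       unless -x $path;
    push @problems, "extension is not .pl" unless $path =~ /\.pl\z/;
    return @problems;
}

# Hypothetical helper: does the output begin with a Content-type header
# that is followed by a blank line?
sub has_valid_header {
    my ($output) = @_;
    return $output =~ /\AContent-type:[^\r\n]+\r?\n\r?\n/i ? 1 : 0;
}

# Report on every file named on the command line.
foreach my $file (@ARGV) {
    my @problems = check_cgi_file($file);
    print "$file: ", (@problems ? join("; ", @problems) : "looks ok"), "\n";
}
```

The header check can be applied to whatever the script prints when you run it from the command line.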
A dynamic script
In order for client and server to communicate, they have to agree on a number of variables for the transaction that set the process's environment. Here is a program that will print out all the environment variables on the server side. This is easy since they are made available to the Perl process in the hash %ENV. Type ...
$ sudo cp testscript.pl env2html.pl
$ sudo pico env2html.pl
then enter the following program code:
#!/usr/bin/perl -w
use strict;

print "Content-type: text/html\n"; # MIME header
print "\n"; # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Current environment from %ENV</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Environment Variables:</h2>\n";
print "<table border=\"1\">\n";
print " <tr><td><b>Variable</b></td><td><b>Value</b></td></tr>\n";
foreach my $key (keys %ENV) {
    print " <tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
}
print "</table>\n";
print "</body>\n";
print "</html>\n";
exit;
The file (once you save) should still be owned by root, executable and in the cgi-bin directory. Then type
http://localhost/cgi-bin/env2html.pl
into the address field of your browser ...
Next, we want to change the behaviour of the program, depending on input from the client. For this, we need to pass parameters into the program.
Passing parameters: GETting a QUERY_STRING
One of the environment variables is called QUERY_STRING and this constitutes one mechanism to pass parameters into the program: type the following into the browser address field:
http://localhost/cgi-bin/env2html.pl?My Special Data
You should notice two things: one, the QUERY_STRING variable has changed to what you typed after the question mark, and two, the blank space has been encoded as "%20". This URL encoding is a common mechanism to cast arbitrary characters into the valid character space for Internet URLs. Using this string is simply a question of retrieving it from %ENV, for example as
my $request = $ENV{'QUERY_STRING'};
However, this is not advisable: it is far better and safer to use the methods available in the CGI package for such tasks. One more note: passing parameters in this way passes them through an HTTP GET request. GET requests are not supposed to modify the server state; multiple GET requests should always result in the same behaviour. However, there is nothing that prevents you from writing CGI code that changes the server state - imagine something like:
http://localhost/cgi-bin/env2html.pl?command=delete_user&name=Blaise%20Pascal
If a data item is changed in such a way, a second GET request for this URL will fail. Nonetheless, the Web is full of such examples. For many developers simplicity trumps correctness.
See also: CGI GET on this wiki.
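To illustrate what URL decoding involves (and what the CGI package handles for you), here is a minimal sketch that splits a query string into name/value pairs by hand. The sub names url_decode and parse_query are hypothetical. The sketch assumes "&"-separated pairs and lets a later pair overwrite an earlier one with the same name, so it is meant for understanding only, not as a replacement for the CGI package.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: undo URL encoding ("+" and %XX escapes).
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                              # "+" encodes a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %20 becomes " ", etc.
    return $text;
}

# Hypothetical helper: split "a=1&b=2" into a hash of decoded pairs.
# Assumption: a later pair with the same name overwrites an earlier one.
sub parse_query {
    my ($query) = @_;
    my %params;
    foreach my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;         # "?flag" has no value part
        $params{ url_decode($name) } = url_decode($value);
    }
    return %params;
}

my $query  = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
my %params = parse_query($query);

print "Content-type: text/plain\n\n";
print "$_ = $params{$_}\n" foreach sort keys %params;
```

For the query string command=delete_user&name=Blaise%20Pascal this yields the pairs command = delete_user and name = Blaise Pascal. Form data of type application/x-www-form-urlencoded uses the same encoding.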
Passing parameters: POSTing a form
A (more correct and) more versatile way to pass parameters is to use HTML forms. See also: CGI POST on this wiki.
For example, type the following into your text editor and save it as /usr/local/apache/htdocs/testform.html:
<html>
<head><title>Form test</title></head>
<body>
<h2>My Form</h2>
<form action="http://localhost/cgi-bin/env2html.pl" method="post">
<input type="hidden" name="occams_razor" value="entia non sunt multiplicanda sine necessitatem">
<input type="text" name="free text" size="20"><br>
Enter some text here.
<p>
<input type="submit" value="Click to submit">
</form>
</body>
</html>
Open this html file in your browser ...
http://localhost/testform.html
... and type something into the field and submit it. The output doesn't change a whole lot, but you can see that the REQUEST_METHOD has changed to POST instead of GET and that the CONTENT_TYPE was defined as application/x-www-form-urlencoded. This means we have uploaded some data to the server but it would take a bit more programming to actually get at it. This is where we start using the CGI package. Change the program to the following, which includes a separate table for POSTed data:
#!/usr/bin/perl -w
use strict;
use CGI;

print "Content-type: text/html\n"; # MIME header
print "\n"; # Blank line: payload begins here
print "<html>\n";
print "<head>\n";
print "<title>Current environment from %ENV</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Environment Variables:</h2>\n";
print "<table border=\"1\">\n";
print " <tr><td><b>Variable</b></td><td><b>Value</b></td></tr>\n";
foreach my $key (keys %ENV) {
    print " <tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
}
print "</table>\n";

# String comparison needs eq; the numeric == would be true for any method.
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    print "<p>\n";
    print "<h2>POSTed Contents:</h2>\n";
    print "<table border=\"1\">\n";
    my $form = CGI->new();
    my @fields = $form->param(); # list of all posted parameters
    foreach my $field (@fields) {
        my $value = $form->param($field);
        print " <tr>";
        print "<td>$field</td>";
        print "<td>$value</td>";
        print "</tr>\n";
    }
    print "</table>\n";
}
print "</body>\n";
print "</html>\n";
exit;
Then type something into your html form and submit it. Once this runs, we have all the elements for dynamic client-server transactions via HTTP in place:
- accepting input in a form in a Web browser page
- sending the input as a request to a server
- accepting the input, invoking a program and passing it the input
- analysing the input and using it in a program
- generating output and returning it to the requesting client
- displaying output in a Web browser
GBrowse does nothing else.
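One caveat about the final script: it prints submitted values straight into the page, so any markup a client types into the form is interpreted by the browser. A minimal sketch of escaping values before printing them follows. The sub name escape_html is hypothetical, and it covers only the five characters that are special in HTML text and attribute values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or the entities below get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

# Example: a value as it might arrive from the free text form field.
my $value = '<b>bold?</b> & "quoted"';
print " <tr><td>", escape_html($value), "</td></tr>\n";
```

In the script above, the same call would wrap $field and $value (and $key and $ENV{$key}) where they are printed into table cells.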