Perl: Ugly programming
Feel free to add to this page or its discussion. I write about my personal style and preferences. Most things are debatable; sometimes there may even be valid reasons not to write code the way it is proposed here. Tom Christiansen's Perl style (on Perl.com) usually contradicts what I write here. My overriding principles are:
Code is read much more often than it is written.
and
Communication means making implicit knowledge explicit.
Pretty code communicates. Ugly code hisses. When you write code, you are communicating with someone. Trivially, that someone may be the compiler; more importantly, it may be a human, often yourself. Writing for the compiler is straightforward. The language has syntax and keywords, and if you arrange them in a way that the compiler understands and can translate into machine-level instructions, your code runs. Otherwise, an error is reported that often points to where you went wrong. There are no shades of understanding: it either runs, or it doesn't. Human reading has a very different purpose: a human reader wants to understand the code at a higher level, gaining insight into an algorithm, because she intends to debug, modify or extend the code in some way. Writing code for readability and maintainability is quite a different challenge from simply writing it to run.
As the author of code that someone will read at a later time, it is your responsibility to communicate what you wanted to happen. If you make your intent explicit in the structure and syntax of your code, you are communicating. If you rely on quirks of the language, magical shortcuts (only because you can), or little-used (usually for good reason) constructs, you are not communicating. Your code may run, but it is not really useful. Useful code can grow and can be changed. Some people find coding shortcuts cute; some believe they will look like experts because they write code that only an expert can make sense of; others believe they should use some operator only because it exists, or because they can. I disagree.
Structure of code: the position of subroutines
I can't for the life of me figure out why some programmers start out with their subroutine code and only after everything else is presented write the main body of their program. If you know why, I'd love to learn. It simply doesn't work for me. I want the high-level description of the program flow first (the main code), so I have some context for the details (often I don't even need to read these). For me, readable code gets written like this:
#!/usr/bin/perl
use strict;
use warnings;

# initializations first
my $thingy = 1.0;

# main body next
my $i = doSomething($thingy);
print("$i\n");
exit();    # always use exit() to make it explicit

# finally, subroutines
sub doSomething {
    my ($x) = @_;
    my $i = $x * 2;    # ... do something ...
    return ($i);
}
Action at a distance
One person's terse code is another person's enigma (or misery). $_, the default variable in Perl, can be a powerful tool, or a desperately abused means of information hiding. In particular, if you pass data through $_ instead of exposing it in dedicated variables, and anything ever happens to $_ in between, things blow up in a bad way. And something may well happen: $_ is a global variable, and as your code develops you may put other code between two functions that were originally adjacent. You introduce a side effect that may not become apparent where you were actually coding. You think it's easy to keep track of what changes $_ and what doesn't? Are you sure you know if and when all the modules you use change $_? You can't be. Unexpected modifications of $_ are one example of action at a distance (but there are much nastier examples, like changing $/ or even $[). Things that cause action at a distance are ugly.
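To make the danger concrete, here is a minimal sketch; the helper count_lines() and its input are hypothetical, made up for illustration. The helper reads with while (<$fh>), which assigns every line to the global $_ without localizing it, so the caller's $_ is silently overwritten:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts the lines in a string. Its while (<$fh>)
# loop assigns each line to the global $_ and does not localize it.
sub count_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die("cannot open in-memory handle: $!");
    my $n = 0;
    while (<$fh>) {
        $n++;
    }
    close($fh);
    return ($n);
}

$_ = "data the caller still needs";
my $count = count_lines("one\ntwo\n");

# The last read inside count_lines() hit end-of-file, so $_ is now undef.
if (defined($_)) {
    print("\$_ survived: $_\n");
} else {
    print("\$_ was clobbered by count_lines()\n");
}
```

Putting local $_; at the top of count_lines() would confine the damage to the helper, but nothing at the call site tells the caller whether a module author remembered to do that.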
Consider $_ in foreach loops:
Most Perl enthusiasts use foreach a lot; many consider it a paradigm of good, idiomatic Perl programming (maybe because some other common languages do not have this construct). I disagree. For one, in the way it is most often written ...
foreach (@array) { ... do something }
... it relies on $_, which is a Bad Thing. But even worse, it is usually assumed, and described, to assign (set) array elements to a variable. No, it doesn't assign! It aliases! $_ does not hold a copy of an array element; it is the element itself, under a different name.
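The aliasing is a property of foreach itself, not only of $_: a named loop variable is an alias too. A minimal sketch, with array contents chosen for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = ("jay", "crow", "raven");

# $word is not a copy: it is another name for each element in turn.
foreach my $word (@words) {
    $word = uc($word);    # this writes straight into @words
}

my @lower;
foreach my $word (@words) {
    my $copy = $word;     # an explicit copy breaks the alias
    $copy = lc($copy);
    push(@lower, $copy);
}

print("@lower\n");        # jay crow raven
print("@words\n");        # JAY CROW RAVEN (changed by the first loop only)
```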
Consider the following code: print the contents of an array, but when you encounter a lowercase "a", change it to uppercase.
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("jay ","crow ","raven");
print @array, "\n";
foreach (@array) {
    s/a/A/;
    print;
}
print "\n";
print @array;
exit();

It prints:
jay crow raven
jAy crow rAven
jAy crow rAven
... the contents of the array got changed in the loop! The substitution s/a/A/ operates on $_. Next, print; without arguments takes $_ as its default argument, so the foreach loop prints one element of the array in each iteration:
jAy crow rAven
Finally, print @array; outside the loop prints the same array. But the elements have been changed! Clearly, if you assumed foreach worked like the assignment
$_ = $array[0] etc.
this would have been a surprise. But it gets worse. What do you think the following does? All I change is from
s/a/A/; print;
to:
print s/a/A/;
like this:
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("jay ","crow ","raven");
print @array, "\n";
foreach (@array) {
    print s/a/A/;
}
print "\n";
print @array;
exit();
... it prints:
jay crow raven
11
jAy crow rAven
Now, why is that? The substitution still operates on $_, and $_ is still aliased to the current array element, so the array is changed exactly as before. What has changed is what print receives: no longer $_, but the return value of s///, which is the number of substitutions made (1 here), or an empty string if nothing matched. (foreach does implicitly localize $_, restoring its old value after the loop, but that plays no part here.)
So in the example the substitution succeeds once in jay, fails in crow, and succeeds once in raven: the loop prints 1, an empty string and 1, which run together as 11. Nothing at the call site tells you this; you have to know what s/// returns and what print falls back to when it has no argument. Two implicit behaviors, two surprises.
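Writing the substitution against named variables makes both results of s/// visible: the changed string and the returned count. The sketch below also uses the /r modifier (available since Perl 5.14), which returns a modified copy and leaves the original alone; whether /r suits a taste for explicitness is a judgment call.

```perl
#!/usr/bin/perl
use 5.014;    # s///r needs Perl 5.14 or later
use strict;
use warnings;

my @array = ("jay ", "crow ", "raven");
my @changed;
my @counts;

foreach my $word (@array) {
    # With /r, s/// returns the modified copy and does not touch $word.
    my $copy = ($word =~ s/a/A/r);
    push(@changed, $copy);

    # Without /r, s/// changes its target and returns the number of
    # substitutions made, or an empty string if nothing matched.
    my $target = $word;
    my $count = ($target =~ s/a/A/);
    push(@counts, ($count ? $count : 0));
}

print(@changed, "\n");             # jAy crow rAven
print(join(",", @counts), "\n");   # 1,0,1
print(@array, "\n");               # jay crow raven (unchanged)
```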
Of course, if you don't care whether your code is readable as long as it's Perlish and idiomatic and whatever, you can do interesting stuff with this kind of behavior. Personally, I use dry, explicit loops that look a lot like C: I explicitly iterate over the array, I explicitly assign its elements, I spell out my operations. I don't care for guesswork when I come back to Perl code after I've spent a month writing C, a week coding PHP and a term auditing a course in Python. My code has to explain itself to me, not vice versa:
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("jay ","crow ","raven");
print (@array,"\n");
for (my $i=0; $i < scalar(@array); $i++) {
    my $tmp = $array[$i];    # an explicit copy: @array is left untouched
    $tmp =~ s/a/A/;
    print ($tmp);
}
print "\n";
print (@array);
exit();
Wordy? Perhaps. Could it be written much more tersely? No doubt. But it does exactly what I mean, and when I reread this a year from now I will not have to hesitate for a moment, or even reread the code, to understand exactly what is happening here. In the end, this is what counts. And finally, is it less efficient? Perl does not compile either variant to machine code; it compiles both to an internal op tree that its interpreter runs, and a C-style loop with an extra copy may be marginally slower than foreach. For most programs I doubt the difference is noticeable. Importantly, it is clear. Clear is pretty.
Until and unless
You can always use until in place of while not, and unless in place of if not. So you can write things like
unless ($string =~ /^#/) { ... do something }
It drives me crazy. Every time. I look at it, then scratch my head, read it first silently, then aloud... something somehow never seems right about this. I think this is useless, redundant cruft. Completely superfluous. Ugly as hell. Why anyone saw the need to formalize a way of expressing double negation really beats me. Because that's what this is: simply a double negation.
Consider the way an if statement works:
if (expression evaluates to TRUE) { ... do something }
whereas the same thing written as an unless construct reads (note that "unless" means "except if"):
except if (expression evaluates to not TRUE) { ... do something }
Huh? Get it? I don't. I don't want to, it's ugly. Why not write:
if ($string !~ /^#/) { ... do something }
I suspect that my brain is simply trained to recognize the if and then ask for the one case when the (expression) is TRUE. The unless, by contrast, seems to want me to figure out when the expression is TRUE and when it is FALSE, remember that, and then ask what it means that this is inverted by the keyword. It seems to want me to consider a 4-way truth table. It confuses me. Ugly. Straightforward is pretty. Coding with fewer operators is also pretty. unless you are into Perl poetry.
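For the record, the two spellings select exactly the same lines. A small sketch, with made-up sample lines, that keeps the non-comment lines both ways:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("# a comment", "code();", "#another comment", "more();");

my @kept_unless;
my @kept_if;
foreach my $string (@lines) {
    unless ($string =~ /^#/) {
        push(@kept_unless, $string);
    }
    if ($string !~ /^#/) {
        push(@kept_if, $string);
    }
}

print(join(" | ", @kept_unless), "\n");   # code(); | more();
print(join(" | ", @kept_if), "\n");       # code(); | more();
```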
In a Web page advocating good Perl style I read
usage() unless (@ARGV);
which means that if the program is invoked without arguments, it should print a usage statement and exit. To understand this, you have to remember that a condition is evaluated in scalar context, that an array evaluated in scalar context gives its length, and that a length of zero is equivalent to a boolean FALSE. And then I need to work out that unless invokes usage() precisely when this condition is FALSE. I find this ludicrous. I write:
if (scalar(@ARGV) == 0) {
    usage();
}
To understand this version you have to remember nothing at all, because it tells you in the most obvious way that if the length of @ARGV is zero, usage() gets invoked. That's what I call pretty.
Notation
- use parentheses on function calls. Always!
Usually you don't need to. But it's easier to read. Easy is pretty. And it's a good reminder of what is a function and what is not. exit(); is a function. return(); is a function. Even print(); is a function, even though most people use it like an incantation: just speak print; and something will happen. Only sometimes that something is not what you think. Or what do you think the following prints:
print (1+2)*3, "\n";
???
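Perl parses print (1+2)*3, "\n"; as (print(1+2)) * 3, "\n";: it prints 3, with no newline, then multiplies print's return value by 3 and throws the result away. Under use warnings you at least get warnings about print (...) being interpreted as a function and about a useless multiplication in void context. The sketch below captures the output in a string through an in-memory filehandle so both versions can be compared; putting all arguments inside the parentheses gives the intended 9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Send print's output to an in-memory string so it can be inspected.
my $captured = '';
open(my $buf, '>', \$captured) or die("cannot open in-memory handle: $!");
my $old = select($buf);

{
    no warnings;              # this line would otherwise trigger warnings
    print (1+2)*3, "\n";      # parsed as (print(1+2)) * 3, "\n": prints 3
}
print((1+2)*3, "\n");         # all arguments inside the parentheses: prints 9 and a newline

select($old);
close($buf);

(my $shown = $captured) =~ s/\n/\\n/g;
print("captured: $shown\n");  # captured: 39\n
```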
TRUE or FALSE?
Another good way to hide what is really happening is the following popular construct:
if (function) { ... do something }
To read this, you have to remember when an expression evaluates as TRUE. In fact Perl does not have a built-in Boolean type, so its functions and operators never return TRUE or FALSE but some value that is then interpreted according to the following conventions:
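- "" is FALSE
- 0 is FALSE
- "0" is FALSE
- undef is FALSE
- 1 is TRUE
- -1 is TRUE
- "0.0" is TRUE
- "FALSE" is TRUE
- $color = "white" is TRUE (but only because "=" is the assignment operator, which returns the assigned value "white", and you probably meant "eq"; with two string literals, as in "black" = "white", Perl refuses to compile the line at all)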
Why do I need to memorize this? Only so I can save some typing? And it's especially error-prone, since in shell programming it works the other way around: a return value of zero means all was good, and any other value is an error code. I prefer:
if ( function() == 1 ) { ... do something }
or
if ( not defined($array[$i]) ) { ... do something }
I state explicitly that I am evaluating a return value and what I expect the return value to be. Did I consider the case that my return value could be undefined? In the "Perlish" version I could not be sure, because that case is implicitly included in Perl's notion of truth. I could have considered it, or I could have overlooked the possibility. In my version I have stated what I expect, and there is no room for guesswork. It is more explicit. Explicit is pretty.
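These conventions can be checked mechanically, including two edge cases worth knowing: the string "0" is FALSE, while "0.0" is TRUE. A minimal sketch; the helper truth() is made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how Perl treats a value in boolean context.
sub truth {
    my ($value) = @_;
    return ($value ? "TRUE" : "FALSE");
}

my @cases = (
    [ '""',      ""      ],
    [ '0',       0       ],
    [ '"0"',     "0"     ],
    [ 'undef',   undef   ],
    [ '1',       1       ],
    [ '-1',      -1      ],
    [ '"0.0"',   "0.0"   ],
    [ '"FALSE"', "FALSE" ],
);

foreach my $case (@cases) {
    my ($label, $value) = @{$case};
    printf("%-8s is %s\n", $label, truth($value));
}
```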
Context
Many operators silently change their behavior according to the context they are used in. Sometimes they depend on list or scalar context; sometimes they depend on other context. Some people like implicit behavior because it allows them to write terse code. I think it is ugly. It causes surprises, and that usually means things blow up. How do you guard against unexpected behavior? Making list and scalar context explicit is straightforward; you can usually write:
(something)
when it should be evaluated in list context (usually to force an assignment in list context); write
scalar(something)
when it should be evaluated in scalar context. But other operator behavior is more tricky. Consider the following examples (after "Sins of Perl Revisited"):
while (<FILE>) {
    my $line = <FILE>;
    # ...
}
This is probably not what was intended: when the condition (<FILE>) is evaluated, the diamond operator pulls a line from FILE. Then another line is requested and assigned to $line. This means only every other line ends up in $line, and the rest are skipped. However, the following is also wrong:
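The skipping is easy to reproduce. This sketch reads four made-up lines from an in-memory filehandle standing in for FILE, and records which lines end up in $line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "line 1\nline 2\nline 3\nline 4\n";
open(my $fh, '<', \$text) or die("cannot open in-memory handle: $!");

my @seen;
while (<$fh>) {              # the condition reads lines 1 and 3 into $_
    my $line = <$fh>;        # the body reads lines 2 and 4 into $line
    chomp($line);
    push(@seen, $line);
}
close($fh);

print(join(", ", @seen), "\n");   # line 2, line 4
```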
while (something) {
    <FILE>;
    print "The next line is ", $_, "\n";
}
Because it is only in the condition of a while loop that <FILE> magically works as an assignment: while (<FILE>) is quietly treated as while (defined($_ = <FILE>)). Anywhere else, for example as a statement on its own as above, or in if (<FILE>), <FILE> consumes a line from FILE but does not set $_. It's again the curse of relying implicitly on $_. Written explicitly, the loop reads:
while (my $line = <FILE>) {
    # ...
}
That is, the condition is met as long as the assignment yields a line, and $line is available in the loop. At end of file, or if an error occurs, <FILE> returns undef and the loop terminates. (Strictly speaking, Perl quietly wraps this condition in a defined() test too, so a last line consisting of just 0 does not end the loop early.) This avoids any assumptions about the context of <FILE> and its behavior. Assumptions are ugly. Directions are pretty.
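A runnable sketch of the explicit loop, with an in-memory filehandle standing in for FILE and made-up input whose last line is 0 with no trailing newline. Here the defined() test is written out, so nothing about the loop condition is left implicit:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first\nsecond\n0";     # last line is "0", no trailing newline
open(my $fh, '<', \$text) or die("cannot open in-memory handle: $!");

my @lines;
while (defined(my $line = <$fh>)) {
    chomp($line);
    push(@lines, $line);
}
close($fh);

print(scalar(@lines), " lines: ", join(", ", @lines), "\n");   # 3 lines: first, second, 0
```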
Resources
Further reading and resources