Extreme Perl - Chapter 10: Coding Style

Extreme Perl: Chapter 10: Coding Style An Evolving Book
about Extreme Programming
with Perl

dot

Previous: Chapter 9: Logistics		Next: Chapter 11: Test-Driven Design

Language requires consensus. -- Larry Wall[1] Code is the primary means of communication in an XP team. A uniform coding style greatly facilitates code comprehension, refactoring, pair programming, collective ownership and testing. An XP team agrees on a coding style before development starts. Coding style is the first problem and XP team has to work out as a group. The solution to the problem needs to be clear and unambiguous. It's amazing how far this can be, and with some teams it's impossible. Coding style discussions are like a lightning rod. If there's a storm brewing within the team, coding style will usually attract the first lightning strike. If you can't reach agreement on a style, your team is going to have difficulty building an application together. Tension around style choices is natural in some ways. We are all individuals. Programmers take pride in their own work, further motivating their own success, just as individual athletes value their own accomplishments. However, not even the best pitcher, quarterback, or forward in the world can win a game alone. It takes a team and teamwork to win a game or write a large application. Programming is a team sport.[2] If you are a programmer, you may find yourself gritting your teeth at my coding style. It wouldn't surprise me. Athletes and programmers on different teams sometimes bristle at each other's style. However, if we were to join the same team, we'd work out a compromise on coding style to ensure the success of the project. This chapter explains the need for a coding style in XP and discusses how to go about creating one. I also explain and demonstrate the coding style used in this book through a comparative example. There's More Than One Way To Do It Perl is a rich and complex language. If you ask a question about how to do something in Perl on the Internet, you'll probably get several different answers. And, the answers will often include the caveat: TMTOWTDI. This is the acronym for Perl's motto: There's more than one way to do it. The solution you choose will depend on the way you program Perl. So how do you program Perl? Larry Wall et al present a coding style in the `perlstyle` man page.[3] Yet there are myriad divergent styles on CPAN and in the Perl literature. In the Perl community, diversity is seen as a strength, and no one is going to tell you how to program Perl. Well, even if they did, you wouldn't listen to them. Give Me Consistency or Give Me Death Your team still needs to pick a style. This isn't just XP dogma; it's human nature. In the anthropology classic, The Silent Language, Edward Hall wrote, "The drive toward congruity would seem to be as strong a human need as the will to physical survival." I conclude from this that if you don't pick a Perl coding style, you'll die. If that isn't a good enough reason, stop reading now. Seriously, consistency is not an end in itself, it is the means to facilitate testing, collective ownership, pair programming, and refactoring. If you are developing a small application (a few thousand lines of Perl), it's easy to keep the code consistent, or to clean it up in an afternoon or two. For large applications (tens or hundreds of thousands of lines spread over hundreds or thousands of files), quick fixes are impossible. You would never have the time to reformat the entire codebase. The code changes too quickly. You don't get to ask everybody working on a large application to stop while you fix some style issue. However, for some necessary refactorings, such as, a change in a widely used API, you may have no choice but to dive into tens or possibly hundreds of files. You'll want this to happen as quickly as possible so you'll automate the refactoring. With a consistent style, you can probably do this fairly easily. If you have to account for the many ways you can do things in Perl, you'll probably resort to hand editing each file. Not only is this labor intensive, but it's error prone, too. Even when you aren't making global changes, you and your partner still have to read unfamiliar code. A programming pair's ability to read and to communicate through the code is affected directly by its consistency. Communication is hard enough without you and your partner having to wade through several code dialects. And, allowing for style variations when writing code opens up too many unnecessary thoughts. Do I adapt to my partner's style? Should we adopt the style of the code we're editing now? Or perhaps, I should insist on my style. After all, it has worked well for me over the years. Starting out with an agreed upon style, frees our minds of such distractions and allows us to focus on the important bit: solving the customer's problem. Team Colors In Extreme Programming Explained, Kent Beck wrote, "The standard must be adopted voluntarily by the whole team." This may not be so simple. Establishing consensus requires work on everybody's part. If your team has coded together before, you'll probably have an easy time agreeing on a style. For newly formed teams, use the style guide as a team building exercise. Everyone should be encouraged to contribute. If a particular point is too contentious, drop it until after the first iteration or so. The goal is to get full consensus on the entire guide. If someone is particularly inflexible during the discussions, it's a warning sign that a team adjustment may be necessary. Better sooner than later. A style guide can be highly motivating, however. It's like your team's colors. It's something relatively insignificant which provides significant cohesion. If even one team member is coerced into agreement, the team isn't sticking together, and the rift may grow into a chasm. When everybody voluntarily accepts the style choices, you are functioning as a team, and you are ready to code. An Example Rather than discuss style in the abstract, I'd like to demonstrate my opinion of good style through a comparative example. Here's an excerpt from the popular `Test` package on CPAN: [4] package Test; use strict; sub plan { croak "Test::plan(%args): odd number of arguments" if @_ & 1; croak "Test::plan(): should not be called more than once" if $planned; local($\, $,); # guard against -l and other things that screw with # print _reset_globals(); _read_program( (caller)[1] ); my $max=0; for (my $x=0; $x < @_; $x+=2) { my ($k,$v) = @_[$x,$x+1]; if ($k =~ /^test(s)?$/) { $max = $v; } elsif ($k eq 'todo' or $k eq 'failok') { for (@$v) { $todo{$_}=1; }; } elsif ($k eq 'onfail') { ref $v eq 'CODE' or croak "Test::plan(onfail => $v): must be CODE"; $ONFAIL = $v; } else { carp "Test::plan(): skipping unrecognized directive '$k'" } } my @todo = sort { $a <=> $b } keys %todo; if (@todo) { print $TESTOUT "1..$max todo ".join(' ', @todo).";\n"; } else { print $TESTOUT "1..$max\n"; } ++$planned; print $TESTOUT "# Running under perl version $] for $^O", (chr(65) eq 'A') ? "\n" : " in a non-ASCII world\n"; print $TESTOUT "# Win32::BuildNumber ", &Win32::BuildNumber(), "\n" if defined(&Win32::BuildNumber) and defined &Win32::BuildNumber(); print $TESTOUT "# MacPerl verison $MacPerl::Version\n" if defined $MacPerl::Version; printf $TESTOUT "# Current time local: %s\n# Current time GMT: %s\n", scalar( gmtime($^T)), scalar(localtime($^T)); print $TESTOUT "# Using Test.pm version $VERSION\n"; # Retval never used: return undef; } The routine `plan` is used to set up a unit test, for example: use Test; use strict; BEGIN { plan(tests => 2); } ok(1 + 1 == 2); ok(2 * 2 == 4); This unit test calls `plan` to declare that there are two test cases. The `ok` function checks the result after each case executes, and prints success or failure. Here's what I think is good about this implementation of `plan`. The routine: is well-used and mature. We can be relatively sure it addresses the needs of its users and is relatively stable. has had several authors. The more eyes on a problem, the better the solution. addresses type safety. `Test` has a broad user base, so it makes sense to put extra effort into argument validation. fails fast, that is, if `plan` encounters an unexpected argument or state, it terminates execution (calls `croak`) in all but one case. The sooner a programming error is detected, the less damage the errant program can do. is backwards compatible. The parameters `test` and `failok` are deprecated, that is, they shouldn't be used in new tests, but existing tests that use them still work. comes with a thorough unit test suite that describes expected behavior and enables refactoring. uses fine-granularity, feature-driven portability. It uses narrow feature checks (for example, `defined $MacPerl::Version` and `chr(65) eq 'A')`) instead of broad general checks, such as, checking the operating system (for example, `$^0 eq 'MacOS'`). Powerful and easy to use introspection is one of the reasons Perl and CPAN packages are usable on so many platforms. You Say, "`if else`", And I Say, "`? :`" While indentation, lining up braces, or other formatting is important to ease automated refactoring, you won't find much discussion about them in this book. However, the more strictly you follow your coding style, the more easily you can automate refactorings. For example, if function arguments are always surrounded by parentheses, you can rename functions or reorder parameters using a simple editor macro.[5] And speaking of editors, most programmers' editors have style formatters. If yours doesn't, you can always use `perltidy`, a very flexible Perl code reformatter.[6] Automatic formatters improve your team's efficiency and adherence to your style guidelines. That's all I'm going to say about formatters and editors. The only thing worse than coding style discussions are editor wars. Like editors, style choice is defined by your experience and personal taste, and the details matter. I follow the `perlstyle` man page for the most part, but I disagree with some of their parentheses and alignment choices. You may not like my style, so go ahead and indent by three spaces if you like.[7] Once And Only Once Parentheses and indentation aside, the important bit of my style is that I refactor ruthlessly. I don't like redundancy, especially in the form of single use temporary variables and repetitive calls. In XP, we call this the once and only once (OAOO) rule, and it's what you do when you refactor. For new code, I try to do the simplest thing that could possibly work (DTSTTCPW). This is XP's most important coding guideline. First I get it working simply. I might have to copy and paste some code or create a temporary variable. Once it passes the tests, I look at the design and simplify it so that each concept is expressed once and only once. There are some time and planning trade offs here, and the Refactoring chapter discusses them. The relevant point is that once and only once is an overarching style guidline, and one that I value highly. When concepts are expressed once and only once, the code is more robust, more easily extensible, and performs better than code with needless duplication. Refactored Example The code that follows has been changed to demonstrate once and only once and other style choices I value. The formatting matches the style used in this book. More importantly, the rewritten code is more cohesive. Not only should each concept be expressed only once, but each routine should implement only one concept. Strong cohesion allows people to comprehend and to abstract code entities (routines and packages) easily. By isolating and naming each behavior, we make it easy to understand what each piece of the software puzzle does. The four new routines are also loosely coupled. This means their inputs and outputs are few and well-defined. Loose coupling is important when isolating behavior, because it is difficult to understand and to test a routine with many inputs and outputs. In effect, the routine's identity is a combination of its name and its inputs and outputs, which is commonly know as its signature. We remember shorter signatures and the behavior they identify more easily than longer ones. That's enough theory for now, here's the my version of `plan`: package Test; use strict; sub plan { my($args) = {@_}; Carp::croak('should not be called more than once') if $_TODO; _reset_globals(); _read_program((caller)[1]); _plan_print(_plan_args($args)); return; } sub _plan_args { my($args) = @_; $_ONFAIL = _plan_arg_assert($args, ['onfail'], 'CODE'); my($max) = _plan_arg_assert($args, ['tests', 'test'], 'integer') \|\| 0; # $_TODO is the initialization sentinel, so it's the last value set $_TODO = {map {$_ => 1} @{_plan_arg_assert($args, ['todo', 'failok'], 'ARRAY') \|\| []}}; Carp::carp("@{[sort(keys(%$args))]}: skipping unrecognized or", ' deprecated directive(s)') if %$args; return $max; } sub _plan_arg_assert { my($args, $names, $type) = @_; foreach my $n (@$names) { next unless exists($args->{$n}); Carp::croak("$n: parameter must not be undef") unless defined($args->{$n}); Carp::croak("$args->{$n}: $n must be $type") unless $type eq 'integer' ? $args->{$n} =~ /^\d+$/ : ref($args->{$n}) eq $type; return delete($args->{$n}) } return undef; } sub _plan_print { my($max) = @_; _print(join("\n# ", "1..$max" . (%$_TODO ne '' && " todo @{[sort {$a <=> $b} keys(%$_TODO)]};"), "Running under perl version $] for $^O" . (chr(65) ne 'A' && ' in a non-ASCII world'), defined(&Win32::BuildNumber) && defined(Win32::BuildNumber()) ? 'Win32::BuildNumber ' . Win32::BuildNumber() : (), defined($MacPerl::Version) ? "MacPerl version $MacPerl::Version" : (), 'Current time local: ' . localtime($^T), 'Current time GMT: ' . gmtime($^T), "Using Test.pm version $VERSION\n")); return; } sub _print { local($\, $,); return print($TESTOUT @_); } Change Log The following is a detailed list of changes, and why I made them. Most of the changes are refactorings, that is, they do not modify the way `plan` works from the caller's perspective. A few changes improve the behavior ever so slightly, and are noted below. This list is ordered from most important to trivial: The four main behaviors: control flow, validating arguments, type checking, and printing are contained in separate routines. Each routine is responsible for one and only one behavior. The `localtime` and `gmtime` calls are now in the correct order. This defect in the original version only became apparent to me when I separated the two output lines. Argument type validation is consistent, because it has been isolated into a single routine (`_plan_arg_assert`) that is used for all three parameters. Several new cases are caught. For example, passing `undef` to `tests` or passing both `tests` and `test` (deprecated form) is not allowed. Carp::croak unrecognized directive warning is printed once instead of a warning per unrecognized directive. The check for unrecognized directives still does not fail fast (`croak` or `die`). I would have liked to correct this, because passing an invalid directives to `plan` probably indicates a broken test. However, the broad user base of `Test` makes this change infeasible. Somebody may be depending on the behavior that this is only a warning. Two temporary variables (`@todo` and `$x`) were eliminated by using a functional programming style. By avoiding temporary variables, we simplify algorithms and eliminate ordering dependencies. See the It's a SMOP chapter for a longer example of functional programming. `$planned` was eliminated after `$_TODO` was converted to a reference. `$planned` is known as a denormalization, because it can be computed from another value (`$_TODO` in this case). Normal form is when data structures and databases store the sources of all information once and only once. `_plan_print` writes a single string. The seven calls `print` were unnecessary duplication. I often use logical operators instead of imperative statements to avoid the use of temporary variables, which are another form of duplication (denormalization). The return value from `plan` is better represented as an empty `return`, because it handles list and scalar return contexts correctly. This is a subtle point about `return`, and it actually involves an interface change. The following use assigns an empty list: my(@result) = Test::plan(tests => 1); In the old version, `@result` would contain the list `(undef)`, that is, a list with a single element containing the value `undef`. The check for an odd number of arguments is unnecessary, because the assignment to a hash will yield a warning and the argument parsing is more rigorous (no argument may be `undef`, for example). `_print` encapsulates the output function that is used throughout `Test`. The concept that the output is directed to `$TESTOUT` is only expressed once. The global variables are named consistently (`$_ONFAIL` and `$_TODO`). I name global variables in uppercase. I use a leading underscore to identify variables and routines which are to be used internally to the package only. `$TESTOUT` was not renamed, because it is exported from the package `Test`. In general, variables should never be exported, but this would be an interface change, not a refactoring. I fully qualify all names defined outside a package (`Carp::carp` and `Carp::croak`). This helps the reader to know what is defined locally as well as enabling him to find the implementation of or documentation for external functions quickly. I apply this guideline to perl modules. In specialized Perl scripts, such as, templates and tests, I prefer the brevity of the unqualified form. For example, in the unit test example above, I used `ok`, not `Test::ok`. `carp` and `croak` print the file and line number for you, so including `Test::plan` in the error string is unnecessarily redundant. The spelling error (`verison`) in the `$MacPerl::Version` output string was corrected. The two calls to `sprintf` and `scalar` are unnecessary. The concatenation operator (dot) is sufficient, more succinct, and used consistently. The old style call syntax (`&Win32::BuildNumber()`) was eliminated, because it was not used in all places (`_reset_globals()`). The comment `# Retval never used:` was removed, because it is superfluous, and it indicates an unprovable assertion. You can't know that the return value won't be used. The comment `# guard against -l and...` was removed, because the context of `_print` is enough to explain why the `local` call is needed.[8] Even if you don't know what `$,` and `$\` are, you know they are relevent only to the call to `print`, since that's the only thing that it could possibly affect. Refactoring Now kids, don't try this at work. Refactorings and small corrections are not an end to themselves. The do not add business value--unless you are writing your coding style guideline as is the case here. Refactorings need to be related to the task at hand. For example, if there I was given a story to fix the minor defects in `plan` or to add automatic test case counting, then I would have refactored the code to allow for those changes, and possibly a bit more. However, random and extensive refactoring as I've done here is worthless. The original version works just fine, and all the corrections are minor. If you spend your days refactoring and making small corrections without explicit customer approval, you'll probably lose your job. The new `plan` is also not just a refactoring. When an interface changes, it's only a refactoring if all its uses are changed simultaneously. For public APIs like this one, that's impossible to do. In this particular case, I took a chance that the return value of `plan` was not used in this rather obscure way, that is, expecting a single list containing `undef`. Input Validation Perl is a dynamically typed language. The routine `plan` contains a set of type assertions, and the refactored version expanded on them. Is this the best way to write dynamically typed code? but It depends. In this case, explicit type checking is possibly overkill. For example, the `$_TODO` and `$_ONFAIL` are dereferenced elsewhere in the package. Dereferencing a non-reference terminates execution in Perl, so the error will be caught anyway. Since `Test` is only used in test programs, it's probably sufficient to catch an error at any point. On the other hand, `Test` is a very public API, which means it has a broad and unknown user base. Explicit type checking almost always yields more easily understood error messages than implicit error checks. This helps users debug incorrect parameters. `plan` is only called once during a test execution so the performance impact of the additional checking is insignificant. Here are some guidelines we use to determine when to add type assertions: Always validate data from untrusted sources, for example, users or third party services. It's important to give informative error messages to end users. This type of validation occurs at the outermost level of the system, where meaningful error messages can be returned with the appropriate context. Add type assertions to low level modules that define the data types, and leave them out at the middle levels where they would be redundant. There may be a performance trade off here. In general, the more public the API, the more important validation is. For example, `plan` defines and asserts that the test count is positive integer. Assert what is likely to be wrong. Write deviance tests, that is, tests which result in exceptions or type validation errors. Add assertions if the tests don't pass. The appropriateness of a particular type assertion is often hard to assess. Don't sweat it. You'll learn what's appropriate as your system evolves. Don't expect to get it right, and think about the consequences if you get it wrong. The more that's at stake, the more important assertions are.[9] Writing robust code is hard. If you add too many assertions, their sheer volume will introduce more defects than they were intended to prevent. Add too few assertions, and one day you'll find a cracker who has compromised your system, or worse. Expect the code to evolve as it gets used. You'd Rather Die Nothing is more boring than reading someone's opinion about coding style. Rather than kill off my readership, I'll stop here. When you get up to stretch your legs, I'd like you to walk away with five points: An XP team needs a consistent coding style. It doesn't matter what the style is, as long as everyone agrees to adhere to it. Take refactoring into consideration when determining your coding style. Do the simplest thing that could possibly work when writing new code. Simplify your design so that concepts are expressed once and only once. Footnotes Open Sources: Voices from the Open Source Revolution, DiBona et al, 1999, O'Reilly, p. 127. Available online at http://www.oreilly.com/catalog/opensources/book/larry.html And more generally, "Business is a team sport." Rich Kid, Smart Kid, Kiyosaki et al, Warner Books, 2001, p. 224-225. http://www.perl.com/doc/manual/html/pod/perlstyle.html http://search.cpan.org/src/SBURKE/Test-1.22/lib/Test.pm You can download some refactoring functions for Emacs from http://www.bivio.biz/f/bOP/b-perl.el Available for free from http://perltidy.sourceforge.net Even though it's a sin. In XP, "we comment methods only after doing everything possible to make the method not need a comment." See http://xp.c2.com/ExtremeDocuments.html for a document about documentation by XP's founders. Thanks to Ged Haywood for reminding me of this one.

Previous: Chapter 9: Logistics		Next: Chapter 11: Test-Driven Design

dot

dot

Copyright © 2004 Robert Nagler
Licensed under a Creative Commons Attribution 4.0 International License. back to top