Coding conventions (code style)

by Michael Ernst

March, 2011
Last updated: September 3, 2020

Contents:

Order of procedures
Comments
Code copying
Local variables
Initialization
- Initialization for variables
- Initialization for fields
Code formatting and whitespace
Version control
- Diffs before checkin
- Paragraph justification
Java
Perl

Everyone has opinions about coding style. This document contains some high-level advice. It doesn't go into minutiae like how many spaces per indentation level or whether curly braces belong on the same line as a conditional or on their own line. Details like that don't matter, so long as they are consistent. (If they are inconsistent, then the code becomes much harder to read!) Rather, this document focuses on more important issues.

Many other coding convention documents are available. Even more valuable are descriptions of good ways to design and write code. For Java programmers, I highly recommend Josh Bloch's book Effective Java.

Order of procedures

When you add a new procedure to a file, don't just type it wherever your cursor happens to be. Instead, place related procedures together. It is often helpful to put a block comment (e.g., starting with a row of 75 asterisks, or whatever style you use, so long as you are consistent) at the beginning of each group of related procedures. Such block comments divide the file into sections that are readily apparent to readers.

In general, put public methods before private ones in your files. Organize your file with helper methods (whether public or private) after the main entry points. This permits readers to read your code top-down, which is more comprehensible: the purpose of each piece of code, and how it fits into the whole, is obvious. A reader can forward-reference to just the specification, not the whole implementation, of a helper method. This doesn't mean you necessarily have to write your code in a top-down order, but do organize it that way for readers.
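
As a rough sketch of this layout, here is a small Java file with the entry point first and its helpers afterward, each group under a block comment. The class and method names (WordStats, averageLength, removeBlanks, totalLength) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes simple statistics over words. (Hypothetical example class.) */
final class WordStats {

  // ************************************************************
  // Main entry points
  // ************************************************************

  /**
   * Computes the average length of the non-blank words.
   *
   * @param words the words to examine; may contain blank strings
   * @return the mean length of the non-blank words, or 0 if there are none
   */
  public static double averageLength(List<String> words) {
    List<String> kept = removeBlanks(words);
    if (kept.isEmpty()) {
      return 0;
    }
    return (double) totalLength(kept) / kept.size();
  }

  // ************************************************************
  // Helper methods
  // ************************************************************

  /** Returns a new list holding the non-blank elements of {@code words}. */
  private static List<String> removeBlanks(List<String> words) {
    List<String> result = new ArrayList<>();
    for (String word : words) {
      if (!word.trim().isEmpty()) {
        result.add(word);
      }
    }
    return result;
  }

  /** Returns the sum of the lengths of the strings in {@code words}. */
  private static int totalLength(List<String> words) {
    int total = 0;
    for (String word : words) {
      total += word.length();
    }
    return total;
  }
}
```

A reader who starts at averageLength needs only the one-line specifications of the helpers to follow it; the helper bodies can be read later or not at all.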

Comments

Every code file that you write (Java classes, Perl scripts, etc.) needs to have a comment at the top explaining exactly what it does and, if applicable, how to run it. Otherwise it will be a mystery to others, and perhaps to you when you return to it. In some cases one or two sentences will do; in many other cases, the description needs to be more complete. Every non-trivial procedure should also contain a brief comment saying what it does. In the case of Java, there should be valid Javadoc comments for every method (both public and private). Each parameter should be described as well, unless its meaning is completely obvious. Don't add useless comments that just repeat the name of the parameter and its type, or in which the @return clause is essentially identical to the procedure summary. Comments should enlighten, not merely repeat.
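
To make the contrast concrete, here is a sketch of one method with an unhelpful Javadoc comment (shown commented out) and a more informative one. The class ArraySearch and its method are hypothetical, invented for this illustration.

```java
/** Array search utilities. (Hypothetical example class.) */
final class ArraySearch {

  // Unhelpful: every clause merely restates a name or a type.
  //
  //   /**
  //    * Index of.
  //    *
  //    * @param values the int array values
  //    * @param target the target
  //    * @return the index
  //    */

  /**
   * Finds the first occurrence of a value in an array.
   *
   * @param values the array to search; need not be sorted
   * @param target the value to search for
   * @return the smallest index i such that values[i] == target, or -1 if
   *     target does not occur in values
   */
  static int indexOf(int[] values, int target) {
    for (int i = 0; i < values.length; i++) {
      if (values[i] == target) {
        return i;
      }
    }
    return -1;
  }
}
```

The second comment tells the reader things the signature cannot: which occurrence is found, that the array need not be sorted, and what happens when the value is absent.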

When a comment is a sentence, start it with a capital letter, end it with a period, and use correct grammar. Strive to keep lines of comments and code to 80 characters whenever possible. (Don't be slavish; an exception here and there is OK, but lots of violations lead to less-readable code. Don't assume that everyone uses the same width screen as you do, because I assure you they do not; 80 columns is a generally accepted industry standard.) This makes it possible to print the code in a readable fashion and also to read the code in a standard-sized window.

Do not comment out large blocks of code with /* ... */; instead, prefix each line by // . Among other things, this does not lose the indentation of the original code; without that indentation, the commented-out code is much too hard to read and understand. It also makes it clear what is commented out and what is not, even when the code is printed or is viewed without color highlighting.
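
For instance, a disabled block might look like the following sketch. The Greeter class is hypothetical, as are the names FALLBACKS and log that appear inside the disabled code.

```java
/** Produces greetings. (Hypothetical example class.) */
final class Greeter {

  /** Returns a greeting for {@code name}. */
  static String greet(String name) {
    // Temporarily disabled. Every line carries its own // prefix, so the
    // original indentation survives and the extent of the disabled code
    // is unambiguous.
    // if (name.isEmpty()) {
    //   for (String fallback : FALLBACKS) {
    //     log(fallback);
    //   }
    //   return "Hello, stranger!";
    // }
    return "Hello, " + name + "!";
  }
}
```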

Code copying

In general, you should not copy code. It is easy to make a mistake when copying, even easier to forget to update some of the copies when editing other copies, and difficult for readers to understand the distinction (or lack thereof) among the versions. Rather than copying, it is often better to use hooks or to generalize the original version.
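
Here is one possible sketch of generalizing rather than copying. Two counting loops that would otherwise be near-duplicates share a single loop, and the part that differs is passed in as a hook. The class and method names are hypothetical.

```java
import java.util.function.IntPredicate;

/** Counting utilities. (Hypothetical example class.) */
final class Counting {

  /**
   * Counts the elements that satisfy a test.
   *
   * @param values the array to examine
   * @param test the hook that decides whether an element is counted
   * @return the number of elements of values for which test is true
   */
  static int count(int[] values, IntPredicate test) {
    int result = 0;
    for (int value : values) {
      if (test.test(value)) {
        result++;
      }
    }
    return result;
  }

  /** Returns the number of even elements of {@code values}. */
  static int countEven(int[] values) {
    return count(values, v -> v % 2 == 0);
  }

  /** Returns the number of negative elements of {@code values}. */
  static int countNegative(int[] values) {
    return count(values, v -> v < 0);
  }
}
```

A fix to the loop now needs to be made in only one place, and the difference between the two variants is visible at a glance.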

If you are forced to copy code, then it is essential that you indicate where you got the original version from. This is important for understanding the code and for giving credit where credit is due; not to do so is intellectually dishonest. Furthermore, you should indicate the reason for the copying and how this version differs from the original. Clearly indicate every change that you have made, perhaps with a distinctive comment that is described in the prefatory comment, or perhaps by giving a command that can be run to get a diff of your version of the code against the original. If the original code is still being maintained, you should periodically update your code with respect to the upstream version, and you should document how to do this.

Local variables

Local variables should have the most restrictive scope possible. For instance, don't do this:

  int x;
  ...
  for () {
    ...
    x = ...;
    ...
  }

Instead, do this:

  for () {
    ...
    int x = ...;
    ...
  }

A loop-carried dependence is when a variable is (sometimes) set on one loop iteration and used on the next iteration. A loop-carried dependence is the only reason to declare, external to a loop, a variable that is set in the loop. Reducing scopes makes it clear that there are no loop-carried dependences. Likewise, if two loops both use a temporary variable, you should declare two separate variables rather than reusing the same one, to indicate that there are no inter-loop dependences.
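
The following sketch shows both kinds of variable in one loop. The class Steps and its method are hypothetical.

```java
/** Utilities for sequences of numbers. (Hypothetical example class.) */
final class Steps {

  /**
   * Computes the largest increase between consecutive elements.
   *
   * @param values the sequence to examine; must have at least 2 elements
   * @return the maximum of values[i] - values[i-1] over all valid i
   */
  static int largestStep(int[] values) {
    // Loop-carried: each of these is set on one iteration and read on the
    // next, so each must be declared outside the loop.
    int largest = values[1] - values[0];
    int previous = values[1];
    for (int i = 2; i < values.length; i++) {
      // Not loop-carried: computed and consumed within a single
      // iteration, so it is declared inside the loop.
      int step = values[i] - previous;
      largest = Math.max(largest, step);
      previous = values[i];
    }
    return largest;
  }
}
```

Because step is declared inside the loop body, a reader can tell at once that no iteration depends on the step computed by an earlier one.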

Initialization

Every variable and field should be explicitly initialized (set to an initial value). However, it should never be redundantly initialized to a temporary value that will not be read.

Initialization for variables

If a variable is initialized after its declaration but before it is used, it should not be initialized to a temporary value that will never be read. An example is

  int x;    // it would be bad style to initialize x to a dummy value
  if (p) {
    x = someValue;
  } else {
    x = otherValue;
  }

It is clearer, when possible, not to reassign values immediately. Prefer the above construct with an else clause over

  int x = otherValue; // this is confusing; put it in an else clause instead
  if (p) {
    x = someValue;
  }
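
In Java, one way to get the compiler to check this discipline is to declare the variable final, as in the sketch below. This is a suggestion of mine rather than a requirement of the guideline, and the class and method names are hypothetical.

```java
/** Shipping-cost calculations. (Hypothetical example class.) */
final class Shipping {

  /**
   * Computes the cost of shipping a package.
   *
   * @param weightInGrams the weight of the package, in grams
   * @param express true for express delivery, false for standard
   * @return the shipping cost in cents, rounded down
   */
  static int costInCents(int weightInGrams, boolean express) {
    // Final and without an initializer: javac rejects the method if any
    // path reads centsPerKilogram before assigning it, or assigns it
    // more than once.
    final int centsPerKilogram;
    if (express) {
      centsPerKilogram = 900;
    } else {
      centsPerKilogram = 400;
    }
    return centsPerKilogram * weightInGrams / 1000;
  }
}
```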

Initialization for fields

Fields should also be initialized exactly once. If a field is initialized by the constructor, then its declaration should not initialize it to a temporary value that will never be read.

If a declaration has an initializer, that should be its initial value.
If a declaration has no initializer, that is a signal to the programmer to look elsewhere for the initial value (and that the initial value differs for different instantiations).

In some languages (for example, Java), it is possible to omit the initializer for a field: boolean myField; is equivalent to boolean myField = false;. The short version is no more efficient, but it is more confusing. A reader must waste time searching the code (including in subclasses) to determine the initial value. The code is clearer if the initializer is explicit. Do so for all datatypes, including objects whose default value (in the absence of an initializer) is null.
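
A minimal sketch of both cases follows: fields whose declaration gives the initial value, and fields whose missing initializer sends the reader to the constructor. The Connection class and its members are hypothetical.

```java
/** A connection to a server. (Hypothetical example class.) */
final class Connection {
  // Explicit initializers: these are the initial values for every
  // instance, even though they match the language defaults.
  private boolean connected = false;
  private String lastError = null;

  // No initializers: the values differ per instance, so a reader knows
  // to look in the constructor.
  private final String host;
  private final int port;

  /** Creates an unconnected connection to the given host and port. */
  Connection(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /** Records that the connection has been established. */
  void markConnected() {
    connected = true;
  }

  /** Returns a description such as "example.org:80 (closed)". */
  String describe() {
    String state = connected ? "open" : "closed";
    String error = (lastError == null) ? "" : " last error: " + lastError;
    return host + ":" + port + " (" + state + ")" + error;
  }
}
```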

Code formatting and whitespace

When code has a consistent style, particularly within a single file but also over an entire project, the code is much easier to read and understand. I don't wish to spend an excessive amount of time or energy promulgating coding guidelines, but here are a few things you should pay attention to.

Use a consistent indentation style. When editing an existing file, adopt its indentation style rather than writing your additions in an incompatible style. This means that you must set your editor to respect the current indentation style. You can do this by hand, but that's error-prone and programmers hate to perform tasks manually. You should be able to find a customization package that does this for you. For example, Emacs users can use dtrt-indent, which causes Emacs to set its indentation parameters to whatever the file already happens to use (for C and Java code).

In general, do not re-format existing files to suit your own personal indentation style. That does you very little good (you should be comfortable using any consistent indentation style), it destroys the version control history information by modifying every line of code, and it annoys others who wrote or maintain the code.

Ensure that whitespace makes keywords easy to read: do not jam punctuation against other entities, which makes the program hard to read. Here are some examples of this rule:

Place whitespace around the outside of grouping punctuation: use "if (foo) bar;" rather than "if(foo)bar;".
Use "} else {" rather than "}else{".
Place a space after // that starts a comment.

Additionally, make keywords and procedure calls visually distinct. Do not place whitespace between a procedure name and its arguments. Thus, you would write "while (x != 0)" but "myProcedure(x != 0)".
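
Put together, these rules give code like the following sketch. The Collatz class is hypothetical and serves only to show the spacing.

```java
/** Collatz-sequence utilities. (Hypothetical example class.) */
final class Collatz {

  /**
   * Counts the Collatz steps needed to reach 1.
   *
   * @param n the starting value; must be at least 1
   * @return the number of halving or tripling steps taken to reach 1
   */
  static int steps(long n) {
    int count = 0;
    // A space follows the keywords "while" and "if", but no space
    // separates the procedure name isEven from its argument list.
    while (n != 1) {
      if (isEven(n)) {
        n = n / 2;
      } else {
        n = 3 * n + 1;
      }
      count++;
    }
    return count;
  }

  /** Returns true if {@code n} is divisible by 2. */
  static boolean isEven(long n) {
    return n % 2 == 0;
  }
}
```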

Do not use tabs in code files. They display differently in different editors, and they often print differently than they display in an editor. Always use spaces instead.

If you use Eclipse, then here's how to make it use spaces instead of tabs (as of Eclipse 3.1M6):
1. Open the Preferences > Java > Code Style > Formatter dialog.
2. Press Edit...
3. In the Indentation tab, unselect the "Use Tab Character" checkbox and press Apply. It will ask you to save this as a new profile; say Yes and give it a name.
You can set this for the whole workspace or per-project. You can export the profile to a file and share it with other developers who use Eclipse via a version control repository or however you want (or just use it on your other projects). In the same Formatter dialog, you can also specify how many spaces should be inserted for a tab. (You might have to do this by hand for each project.)

In Emacs, you can add this to your ~/.emacs file:

(defun unset-indent-tabs-mode ()
  (setq indent-tabs-mode nil))
(add-hook 'java-mode-hook 'unset-indent-tabs-mode)
(add-hook 'c-mode-hook 'unset-indent-tabs-mode)
(add-hook 'perl-mode-hook 'unset-indent-tabs-mode)
(add-hook 'cperl-mode-hook 'unset-indent-tabs-mode)

Version control

Diffs before checkin

Before you commit a change, you should always run the status and diff commands, such as svn status or hg status and svn diff or hg diff, to see exactly what changes you have made. It is far too easy to inadvertently check in changes that you didn't intend to, and it is far too easy to fail to check in part of a larger change. (This advice is equally applicable to papers as it is to code.)

Paragraph justification

When editing a file of LaTeX source that is under version control, you should ordinarily not refill paragraphs (e.g., M-q in Emacs), particularly toward the end of the edit cycle or when others might want to see what you have done. Refilling paragraphs makes the diffs large, and readers must examine whole paragraphs that may or may not contain a change.

If you must refill paragraphs, make a separate checkin that changes no content except for paragraph formatting, so that readers can ignore that checkin (only).

If it is early in the editing cycle, or if you are not collaborating with anyone, or if every line of a given paragraph has changed, or if for some other reason readers will have to reread the entire document (rather than viewing the diffs), then refilling paragraphs is fine.

Java

Java code should compile without warnings using javac -g -Xlint.

In general, do not use \n in strings in Java code; when output, \n produces a line separator on Unix, but \n is not the line separator on other platforms. If you wish to output a line separator, use println, or use printf with the %n specifier. If you need the platform-specific line separator (e.g., because you are building a string that will be output), use String.format with the %n specifier, or use

  private static final String lineSep = System.lineSeparator();
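
As a small sketch of the string-building case, the class below uses both the lineSep constant and String.format with %n. The class name and its two methods are hypothetical.

```java
/** Builds multi-line strings portably. (Hypothetical example class.) */
final class LineSeparators {
  private static final String lineSep = System.lineSeparator();

  /**
   * Joins lines into one string.
   *
   * @param lines the lines to join, without line terminators
   * @return the lines in order, each followed by the platform's line
   *     separator
   */
  static String joinLines(String... lines) {
    StringBuilder sb = new StringBuilder();
    for (String line : lines) {
      sb.append(line).append(lineSep);
    }
    return sb.toString();
  }

  /**
   * Formats a one-line report.
   *
   * @param name the label to print before the colon
   * @param count the value to print after the colon
   * @return the text "name: count" followed by the platform's line
   *     separator, which is what %n produces
   */
  static String report(String name, int count) {
    return String.format("%s: %d%n", name, count);
  }
}
```

When writing directly to a stream, System.out.println(text) or System.out.printf("%d items%n", n) achieves the same effect without any explicit separator.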

Perl

Use #!/usr/bin/env perl as the first line, to permit independence from the specific location where perl is installed. If you need a particular version of Perl, then use the appropriate command, such as use 5.6.0; or require 5.6.0;. Do not hard-code a path to the perl executable; that path may not exist on other systems and may change even on a given system.
Include the following code, which enables checking for common perl errors, near the top of the file (but usually after the brief comment that describes the script and how to use it):
```
  use English;
  use strict;
  $WARNING = 1;
```
Always check the status for every use of system() or backticks. Better yet, use the system_or_die() and backticks_or_die() functions defined in system_or_die.pm, which automatically do the checking. To use them, put use system_or_die; near the top of your Perl script (after the use strict; block).
Avoid system() and backticks when possible; instead, use Perl procedures, which tend to be more portable, more efficient, and usually clearer. For instance, use @files = glob("*.c"); in preference to @files = split('\n', `ls *.c`);.
When writing a subroutine that takes arguments, consider using the checkargs facility to ensure that you have passed the correct number of them. Put use checkargs; near the top of your Perl script, then, as the first line of user-written subroutine foo, do one of the following:
```
  my ($arg1, $arg2) = check_args(2, @_);
  my ($arg1, @rest) = check_args_range(1, 4, @_);
  my ($arg1, @rest) = check_args_at_least(1, @_);
  my @args = check_args_at_least(0, @_);
```
