Creating and grading exams

by Michael Ernst

May, 2011
Last updated: September 3, 2020

This document describes one successful approach to creating a good exam or quiz — such as a midterm or final for a class.

Exam process

The lecturers are ultimately responsible for the exam, including selecting problems, editing and assembling the exam, correcting problems that are discovered during playtesting, and many other tasks. TAs are asked to contribute questions, test the exam, and assist in other ways.

As with any document, keep the exam under version control; email is a poor coordination mechanism, because it is error-prone and suffers time lags.

As a related point, write the exam in a format (such as LaTeX or HTML) that can be conveniently edited by multiple people, can be diffed, and is compatible with version control systems. As a minor advantage, these formats also allow code examples to be included directly from source files, which makes it easy to verify that the source code compiles.

Keep the answer key up to date at all times. Whenever the exam is tested by a TA, test the answer key by grading the completed exam. TAs shouldn't look at the answer key until after they have playtested the exam; even so, the answer key should be kept under version control.

Obviously, the exam needs to be playtested multiple times, long enough before it is given to students to permit corrections and more playtesting.

Exam questions

I prefer a closed-book exam, for several reasons.

The exam should test whether students have internalized and can apply concepts, not whether they can look up facts. The exam should test deep understanding rather than taking off points for niggling details.
A closed-book exam rewards lecture attendance.
An open-book exam that uses paper too often turns into a “tree-killer”: students print out everything from the course, but generally don't refer to it. (An alternative type of open-book exam permits access to the Internet, but this creates the risk that test-takers will communicate with accomplices who help them.)

Exam questions should

Test real understanding. They should not be about nitpicky details, irrelevant issues, or tricks, but should quantify whether students understand the core concepts.
Be easy to grade. Aim for as many true-false and multiple-choice questions as practical. However, such problems are hard to devise, even if easy to grade. It takes quite a bit of work to formulate a question that tests real understanding, even in the presence of a limited number of answers.
Discriminate among students. My opinion is that on a perfectly-formulated test, the scores form a uniform distribution between 0 and 100. Other teachers don't go this far, but it is clear that a wide distribution (large standard deviation) is desirable. Otherwise, it is difficult to make distinctions between various grades, and (worse) a few silly mistakes can send a student far down in the rankings. Some questions should be easy, but many others should be answered correctly by about 50% of the class. (I do not aim to formulate a test such that anyone who gets 90% or better earns an A, anyone who gets 80-90% gets a B, etc. — I am not smart enough to predict how well students will do, and certainly not smart enough to achieve such a goal. The test discriminates among the students, and afterward we map those numbers to grades.)
Encourage particular behavior. Especially on the first test, asking questions from lecture and from the book which were not on other handouts will send the message that students should pay attention to all aspects of the course.
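
The discrimination criterion above can be checked once an exam has been graded. The following is a minimal sketch, not part of the original process: it assumes total scores on a 0 to 100 scale and per-question right/wrong records, and the function names and data are made up for illustration. It reports the spread of the scores and the fraction of the class that answered each question correctly.

```python
from statistics import mean, pstdev


def score_summary(scores):
    """Return (mean, population standard deviation) of the total scores."""
    return mean(scores), pstdev(scores)


def fraction_correct(results):
    """Map each question name to the fraction of students who answered it correctly.

    `results` maps a question name to a list of booleans, one per student.
    """
    return {q: sum(answers) / len(answers) for q, answers in results.items()}


# Made-up data for illustration only.
scores = [35, 50, 65, 80, 95]
results = {
    "Q1": [True, True, True, True],    # an easy question
    "Q2": [True, False, True, False],  # answered correctly by half the class
}

avg, spread = score_summary(scores)
print(f"mean={avg}, standard deviation={spread:.1f}")
print(fraction_correct(results))
```

A small standard deviation, or many questions that nearly everyone answers correctly, suggests that the exam is not separating students well.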

When you make up a question, you need to write down three types of information:

The question itself, which the student will see on the exam.
A sample solution, which you must write at the same time as you produce each question itself. This is a crucial step in verifying that you have written a good question. If you do not have an easy time writing a solution — for instance, if you find yourself writing sentence fragments or notes to yourself — then the question probably isn't a good one. The sample solution should be distributed to the students after the exam. Sample solutions are really important in both solidifying student understanding (now and in future quarters) and in saving the staff substantial time in fielding common student questions after the exam.
Information for the staff, including what the question is meant to test, how difficult you expect it to be, and its source of inspiration (book, lectures, etc.). This will help to ensure that, when you make up a question, you know what you are trying to test and your question tests that. Also produce, at the same time as the question and the sample solution, a grading key, or rubric. (This goes in the comments, and is not handed out to students, lest it be a blueprint for whining.) The rubric lists the common errors you expect students to make, and how much partial credit (if any) each one is worth. While grading the exam, you will enhance the rubric; be sure to type the new information into the grading key, for future reference. An example of a bad rubric is:

If the student got the wrong answer but gave correct reasoning, give one or two points (depending on how close their answer is and how good their reasoning is).

The above is bad because it leaves the score to each grader's judgment, so it gives no guidance regarding how to regrade an exam if a regrade is necessary. It also gives no help in understanding the range of errors that students made, which is something that future staffs may want to know.
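
By contrast, a rubric that names each expected error and its point value can be applied the same way by any grader and at any later time. The following is one possible sketch of such a rubric as data. The class, the function, the error codes, and the 4-point question are all hypothetical, not taken from an actual exam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    code: str         # short label a grader writes on the exam
    description: str  # the specific error this item covers
    deduction: int    # points lost for this error


def score_answer(max_points, rubric, marked_codes):
    """Compute a score from the error codes a grader marked on one answer.

    An unknown code raises KeyError, which signals that the rubric
    needs a new entry before grading continues.
    """
    by_code = {item.code: item for item in rubric}
    lost = sum(by_code[code].deduction for code in marked_codes)
    return max(0, max_points - lost)


# Hypothetical rubric for a 4-point question about a loop invariant.
RUBRIC = [
    RubricItem("A", "States the postcondition instead of the invariant.", 2),
    RubricItem("B", "Invariant does not hold on loop entry.", 1),
    RubricItem("C", "No justification given.", 1),
]

print(score_answer(4, RUBRIC, ["A", "C"]))
```

Because the score depends only on the recorded codes, a regrade or a makeup exam can be scored consistently, and the codes marked across all exams show which errors were most common.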

Don't just take phrases from the book or slides, and ask students to fill in a word. Instead, think about the concept that is being conveyed, and how you can test the concept rather than testing whether a student can regurgitate a phrase.

For multiple-choice questions, always indicate in the question how many answers students should circle.

As in all technical writing, be precise. For instance, short-answer questions should be precise about the length of answer required. Don't say, “answer briefly”. Instead, be specific; for instance, “one sentence”.

Whenever possible, write questions so that they have only one possible answer. For instance, don't ask for any example of a particular phenomenon; instead, ask for the shortest or best example. This makes grading much, much easier: it is easier both to tell whether an answer is right and to understand what is wrong with an incorrect one. Furthermore, solutions should always be as simple as possible; a short solution is less likely to mislead students with a red herring, and we want students to be able to understand the essentials of a solution rather than getting caught up in inessential matters.

Try to avoid questions that require students to read or write non-trivial amounts of source code. The assignments evaluate students' ability to read, write, and debug code; students will have had plenty of experience with such activities. The exam should be used to evaluate students on other aspects of the course. Code-related questions can be very frustrating to students; for instance, “find a bug in the following code” can be an “aha” experience that is not well-suited to a limited-time exam (especially with the pressure of an exam). Asking for the result of running a piece of code is ill-motivated, since a programmer would just run it. Questions about code tend to be very long, since you need to provide specifications for every library routine that might be called; students can't be expected to have memorized these. Finally, there are so many possible answers to a coding question that they tend to be quite difficult to grade.

If you write code, use good code style. For instance, comments should be in English, not pidgin: use full sentences, started by capital letters and terminated by periods (or other appropriate punctuation). Comments that are not in clear English are much harder to read, and they set a bad example to the students.

Make up exam questions throughout the term; do not wait until just before the exam to create it. An excellent way to make up exam questions is to pay attention during lecture or section and write down anything that pops into your mind. Or, if there is a common misperception that you notice one week in office hours, make that into an exam question too. If a student asks a question (in lecture or office hours), that is often an excellent exam question as well, since it was something that could confuse a student but that was covered in class. If you follow this process, then creating an exam requires very little extra work — it essentially comes for free.

Print the exam on only one side of the page. If you use figures (whether code or otherwise) that are referenced by a question not on the same page, you should duplicate the figure on a tear-out page at the end of the exam, so that students can see everything relevant at once rather than being forced to flip pages.

Exam reviews

My style is to prepare no material to present to students during an exam review. I only answer questions that students bring to the exam review session. (Naturally, tell students that this will be the case, so that they can prepare for the review!) Questions such as “Can you explain this whole section of the course?” inspire no respect, and you needn't answer them directly. But when students have specific questions, often that can segue into a broader discussion, and that is a quite productive way to run a review session.

Exam grading

Immediately after giving an exam, the course staff will gather to grade the exam. My policy is that no one leaves until all the exams are graded, though if you have a conflict you can pop out for a class and return afterward. Typically, this takes 3-4 hours, but it can range from 2 to 8.

The grading time depends almost entirely on the quality of the exam. The worst situation is when the staff disagrees about the best answer to a question (even a multiple-choice one) and has to work it out in the grading room. The next-worst is when a question required students to write a long explanation; pithy ones are easiest to grade.

If you grade a question, you are responsible for making up a grading key or improving an existing one, and you are responsible for recording that (typically in comments in the exam document) for use when dealing with makeup exams, regrade requests, etc. Furthermore, you are responsible for improvements to the solutions, in particular explanations of any issues that many students got wrong.

After the exam, distribute both the original version of the exam and a version with solutions. This lets future students test their understanding, and it explains any of their misunderstandings.
