Program Committees and Paper Selection
Posted: October 25, 2013. Filed under: advice, general, research, reviewing.
In this post, I’ll discuss paper selection: how a program committee considers a set of papers, each with a collection of reviews, and produces a program for a conference.
Paper Reviewing vs. Paper Selection
It’s important to note first that paper reviewing and paper selection are related but separate processes. There is a human process behind paper selection that involves distilling the reviews of a paper and ultimately deciding whether the paper should be published, but reviewing and selection have many independent aspects. Sometimes, a paper’s reviews may appear positive, but the paper is still ultimately rejected (or the reviews may appear negative, but the paper is accepted in spite of them). More commonly, papers tend towards the middle of the review distribution: when reviewers score papers, they tend to regress towards middling scores such as “weak reject” and “weak accept”, rather than taking a strong stance. A significant fraction of papers thus end up with almost the same average score, with only a handful of papers that should clearly be accepted. The number of clear-accept papers is never enough to fill the entire program. Suppose, for the sake of example, that a conference can accept 40 papers. The conference may receive several hundred submissions; in most cases I’ve observed, about ten of these submissions receive uniformly glowing reviews. The next best 30–40 papers often fall into a rough equivalence class with the 30–40 papers ranked just below them, leaving anywhere from 60–80 papers that are “good enough to be published” competing for about 30 spots in the conference program. The job of a program committee is to create a smaller program from this larger set of papers.
Whenever a larger number of roughly equivalent candidates competes for a smaller number of positions, some amount of randomness can and will ensue. Much of this randomness is completely beyond the author’s control. As an author, the best way to avoid being subjected to the randomness of a program committee is to write your paper so that it is among the ten best submissions to the conference. Unfortunately, this is far easier said than done, and it’s not really possible to engineer this, although following various research and writing tips can increase the likelihood that your paper ends up in this class. (Speaking from experience, I can say that those tips aren’t fail-proof—even the best researchers I know routinely have papers rejected—but you can at least improve your odds!)
Unless you can guarantee that your paper always falls among the very best papers (a secret that nobody has yet succeeded in unlocking), much of the randomness in the paper selection process is out of your control as an author. Other factors, however, can help create a sane selection process. The program committee chair (or chairs) has tremendous power to create a sane process. The chair’s roles include setting the right tone for reviewing and mindset for paper selection, ensuring that papers are reviewed by the right program committee members, selecting the papers that will be discussed at the program committee meeting, and reducing the effects of program committee psychology (e.g., paper discussion order, the weight of one biased or incorrect review, and so forth). I will describe more details and tips for program committee chairs below, based on my own experiences and observations.
The Intermediate Dispositions of Papers
Before the program committee meeting itself takes place, the program committee chairs must determine whether each paper should be accepted or rejected without any discussion, or whether the paper should be discussed at the meeting itself.
Before the meeting: Selecting papers for discussion. Program committee chairs typically select papers for discussion at the program committee meeting based on their assessment of each paper’s overall quality, which draws on review scores, their interpretation of the paper’s reviews, and their assessment of any subsequent “online discussion” that may take place before the physical program committee meeting (most conference reviewing systems allow reviewers to discuss each other’s reviews in the system itself via asynchronous, email-like messages). Program committee chairs will often ask reviewers to reach consensus, before the meeting, about whether a paper should be discussed there.
Best case: Acceptance without discussion. If you are very lucky, your paper will be accepted on the strength of high review scores, without being discussed at the program committee meeting. I think this is the best possible outcome, because a paper always has flaws, and the more a paper is discussed, the more likely it is that someone at the meeting will hear something they don’t like about the paper and raise an issue with it. I have seen instances where a paper with uniformly high scores is rejected because it is brought up for discussion and someone (often someone who hasn’t yet read the paper but might read it during the meeting) hears something they don’t like. Discussion doesn’t mean that your paper will be rejected (a good PC chair will ensure that these dynamics don’t result in the unjust rejection of an otherwise good paper), but it does mean that your paper may be subjected to some of the dynamics I discuss below. Unfortunately, as I mentioned, you can’t guarantee that your paper falls into this category, but you can sometimes be lucky, and you can certainly take many steps to increase your odds.
Next best case: Discussion. Otherwise, if you are still reasonably lucky, your paper will be discussed at the program committee meeting. Discussion represents hope for acceptance: because a large number of papers typically have roughly equal scores or ranks, a paper that is being discussed generally has as good a chance of being accepted as most of the other papers being discussed. The program committee chair will set the order in which these papers are discussed, which can sometimes play a surprisingly significant role in whether a paper is accepted or rejected.
The Dynamics of the Program Committee Meeting
The program committee meeting itself involves the winnowing of a larger set of papers (say, 60–80 papers) down to a final program of, say, 30–40 papers. Typical math I’ve seen in systems and networking conferences is that while overall acceptance rates hover between 10–20%, about 50% of the papers that are discussed will be accepted.
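To make this arithmetic concrete, here is a minimal sketch. The specific numbers (300 submissions, 40 slots, 10 clear accepts, 70 papers discussed) are illustrative assumptions chosen to fall within the ranges mentioned in this post; they are not data from any particular conference, and the function name `selection_odds` is hypothetical.

```python
def selection_odds(submissions, slots, clear_accepts, discussed):
    """Return (overall acceptance rate, acceptance rate among discussed papers).

    Assumes the clear accepts take their slots without discussion, so the
    discussed papers compete only for the slots that remain.
    """
    remaining_slots = slots - clear_accepts
    overall = slots / submissions
    among_discussed = remaining_slots / discussed
    return overall, among_discussed


# Illustrative numbers only (assumptions, not real conference data).
overall, among_discussed = selection_odds(
    submissions=300, slots=40, clear_accepts=10, discussed=70
)
print(f"overall: {overall:.0%}, among discussed: {among_discussed:.0%}")
```

Under these assumed numbers, the overall rate comes out near 13% while a discussed paper has a bit under even odds, which matches the rough picture above: once a paper reaches discussion, its chances are far better than the headline acceptance rate suggests.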
Attendance. An in-person meeting is more effective than the alternatives (e.g., conference calls), if and when it is feasible. Most top-ranked conferences, and even some competitive workshops, have in-person program committee meetings. When program committee chairs select the members of the program committee, they typically confirm that each member can attend the program committee meeting and, if the meeting is in person, that the member can attend in person. If not all PC members are present at the meeting, it is nearly impossible to have anything close to a fair paper selection process. Consider that a paper under discussion might have anywhere from 3–6 reviews, and that most of those reviews might hover around average scores, but one or two reviews may be outliers. The fate of a paper can shift dramatically if the PC member who feels strongly (either positively or negatively) about a paper is not in attendance. Without a champion present, the paper may simply fade into the mix; if a reviewer who raises major concerns about a paper is not present, the rest of the committee may be more likely to discount those concerns, since they cannot hear about them first-hand. Some of these attendance-related quirks can also materialize when program committee members depart early (e.g., to catch a flight).
Calibration. Every reviewer has a different opinion about what constitutes a reasonable threshold for acceptance. These differences in calibration are exacerbated by the fact that every program committee member has a unique set of papers to review. That is, while many papers have reviewers in common, no two PC members review exactly the same set of papers. Therefore, calibrating the PC is incredibly important for controlling the meeting’s dynamics. A poorly calibrated PC may prematurely reject good papers early in the meeting, only to accept weaker papers later on. A good PC chair will spend considerable effort to calibrate the PC to reduce randomness. There are various approaches to calibration, which I will discuss below in my advice for PC chairs. One of them is setting an appropriate order for paper discussion, since the order in which papers are discussed will affect how the PC calibrates itself at various points in the meeting.
Discussion order. There is no accepted standard for setting the order in which papers are discussed, and every program committee chair seems to have a slightly different approach. That said, discussion order has a significant effect on the ultimate disposition of the large number of papers that sit on the borderline between acceptance and rejection. Perhaps the most striking example of ordering effects arises when a PC discusses papers in decreasing order of score (i.e., from highest ranked to lowest ranked), which is, in my opinion, the worst possible ordering. People arrive at a meeting highly energized and eager for discussion. What often happens is that papers discussed in the morning are subjected to a fine-toothed comb and vigorous discussion, which sometimes results in the premature rejection of a good paper. Later in the day, PC members become more tired and are less likely to pick apart a paper’s flaws or to insist that a paper will be accepted only “over his or her dead body”. Similarly, excessive aggression in the early parts of the meeting can create a situation where, by the end of the day, there are not enough papers to fill the conference program (quite an irony, given the starting point of twice as many acceptable papers as slots!). As a result, papers that are discussed later in the day sometimes have a better chance of being accepted. If papers are discussed strictly in order from best to worst, there is the potential for “inversion”, where better papers are rejected early in the meeting, making room for weaker papers to be accepted later on, as PC members acquiesce and PC chairs frantically try to fill the program. PC chairs can take various steps to reduce this randomness. One tweak I have seen (and used) is to simply treat all papers under discussion as roughly equivalent, banning all discussion or consideration of “scores”. In these cases, the discussion order can be set such that papers on related topics are discussed together. Another tweak that I have used is a two-pass approach to discussion, where every paper that is not accepted immediately or upon initial discussion is discussed a second time; in other words, a paper is never rejected on the first pass. A third tweak that is sometimes used is to discuss a few “highly ranked” papers first (sometimes even the papers that are supposed to be “accepted without discussion” are used as quick calibration examples), followed by a few “low ranked” papers, and so forth.
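The third tweak, alternating between highly ranked and low ranked papers, can be sketched in a few lines. This is only an illustration: the batch size of two, the paper identifiers, and the function name `interleaved_order` are my assumptions, not a procedure that any particular chair follows.

```python
def interleaved_order(ranked_papers, batch=2):
    """Alternate batches from the top and the bottom of a ranked list.

    ranked_papers is ordered from highest ranked to lowest ranked.
    The batch size is an assumed, tunable parameter.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    remaining = list(ranked_papers)
    order = []
    take_from_top = True
    while remaining:
        if take_from_top:
            # Discuss the highest-ranked papers still pending.
            order.extend(remaining[:batch])
            remaining = remaining[batch:]
        else:
            # Discuss the lowest-ranked papers still pending, worst first.
            order.extend(reversed(remaining[-batch:]))
            remaining = remaining[:-batch]
        take_from_top = not take_from_top
    return order


# Hypothetical paper identifiers, ordered from highest to lowest ranked.
papers = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
print(interleaved_order(papers))
# → ['P1', 'P2', 'P8', 'P7', 'P3', 'P4', 'P6', 'P5']
```

With this ordering, the committee sees examples from both ends of the range early on, so the truly borderline papers in the middle are discussed only after the PC has had a chance to calibrate.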
Timing of discussion. A PC meeting does not leave time for extended consideration of any single paper. If there are 80 papers to discuss, and a PC meeting has eight hours of real work (discounting breaks, lunch, and so forth), then each paper sees, on average, 6 minutes of attention. Thus, discussion must be crisp and focused. A “discussion lead” for a paper (typically one of the reviewers) will summarize the paper and its strengths and weaknesses, effectively condensing all of the reviews. This summary should last no more than 90 seconds, which is not a lot of time. Therefore, a good PC member who is leading discussion on a paper will prepare that summary ahead of time, so that it is as efficient as possible. The paper’s ultimate fate is typically decided in the remaining time (less than five minutes). Many disputes and disagreements about papers cannot be resolved in five minutes. This is the most common downfall of a PC meeting: a single contentious paper runs the risk of monopolizing discussion time, leaving less time for the remaining papers and thereby increasing randomness, acquiescence, and so on for papers that are discussed towards the end of the day. I find that using the “two-pass approach” for paper discussion helps mitigate this problem. I’ll discuss this approach in more detail below.
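The time budget above is simple enough to check with a short calculation. The sketch below reproduces the 6-minute and 4.5-minute figures and then shows what happens to everyone else when a few papers overrun. The overrun scenario (three papers taking 30 minutes each) is purely an assumed example, and both function names are hypothetical.

```python
def per_paper_budget(papers, working_hours, summary_seconds=90):
    """Return (minutes per paper, minutes left for the decision after the summary)."""
    minutes_per_paper = working_hours * 60 / papers
    decision_minutes = minutes_per_paper - summary_seconds / 60
    return minutes_per_paper, decision_minutes


def budget_after_overruns(papers, working_hours, contentious, minutes_each):
    """Average minutes left for each remaining paper after some papers overrun.

    contentious and minutes_each describe an assumed scenario in which a few
    papers each consume a long block of discussion time.
    """
    total_minutes = working_hours * 60
    minutes_left = total_minutes - contentious * minutes_each
    return minutes_left / (papers - contentious)


print(per_paper_budget(80, 8))  # → (6.0, 4.5)
print(round(budget_after_overruns(80, 8, contentious=3, minutes_each=30), 2))
```

Even in this mild scenario the average for the remaining papers drops to about five minutes, and in practice the loss is not spread evenly: it tends to fall on whichever papers are discussed last.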
Personalities. A paper that has a single champion tends to fare much better than a paper with middling reviews, even if both papers have the same average score. In some sense, that is exactly the right outcome: a paper that someone really likes is likely to be appreciated by others, whereas an average paper might not excite anyone in particular. However, strong personalities also hold tremendous sway over a paper’s outcome. A paper stands a very good chance if it has a champion on the PC who is articulate, confident, and strong-willed enough to persistently argue for the paper’s acceptance in spite of its flaws and detractors. Every paper has flaws; a strong-willed PC member can convince the rest of the PC to overlook those flaws in favor of other bright spots. Likewise, a single similarly strong-willed PC member can amplify the weaknesses of a paper and send the paper to the reject pile. These effects are exacerbated if the strong-willed PC member asserts expertise over the topic area. Sometimes, a PC member’s expertise is as important as (if not more important than) the volume or persistence of the PC member’s argument. Simple statements from a domain expert such as “I learned a lot by reading this paper” can be enough to tilt a paper towards acceptance.
Psychology. The paper selection process is a human process that involves a lot of psychology. Rather than recount the various psychological factors that can play a role in program committee meetings, I refer you to Matt Welsh’s blog post on the topic.
Tips for Program Committee Sanity
Everything I learned about how to run a program committee, I learned from Jeff Mogul, who co-chaired NSDI this past year with me. Jeff is a veteran program committee chair who has chaired pretty much every major systems and networking conference at some point in his career. He is meticulous, thorough, ethical, and fair, which are indispensable qualities for any program committee chair. Working through the process with him, I picked up several tactics and strategies, which I can recommend to others and will undoubtedly use if and when I chair another major conference program committee. Being a program committee chair involves countless tasks; some of these tasks are incredibly important, and which ones are paramount wasn’t obvious to me until after I’d gone through the whole process. Below, I highlight tips for what I think are the most important steps of running a program committee.
- Pick your program committee carefully. Your program committee members need to be thoughtful, reliable, and conscientious. Ultimately, the program committee will write the reviews that authors of submitted papers see, and they will also determine the fate of each submitted paper. Don’t simply pick your friends or people you know well; they may not always make the best reviewers. Take extra time to do homework on program committee members’ performance on past committees. Did they write thorough reviews? Were they active participants in discussion at the PC meeting? Did they turn in reviews on time (or, if not, were they communicative enough to help the chairs plan around timing hiccups)? Are they considered experts in a particular area that will see a lot of submitted papers for the conference? Beyond selecting individuals, the chairs also need to ensure that every area that may see submitted papers has significant coverage. For example, if the conference will see many submissions on (say) wireless networking, it is incumbent on the chairs to ensure that there are enough reviewers for that particular topic.
- Make sure that every program committee member “bids” on papers, and assign reviews manually based on preferences and expertise. Most conference reviewing systems allow program committee members to look over all submitted papers and express preferences for which papers they would like to review. Some conference systems also enable chairs to automatically assign reviews. Do not rely exclusively on auto-assignment. When Jeff and I chaired NSDI, we used the auto-assignment feature in HotCRP (after ensuring every PC member filled in review preferences for each submitted paper) and then manually checked the assignments for each paper to ensure that no reviewer was assigned a paper for which they had expressed a negative preference. This process is painstaking, but it is perhaps the most important step in the entire paper selection process, because it ensures that reviewers are assigned papers that they are capable of and willing to review. A reviewer who knows a paper’s topic can typically write a thoughtful review (and often do so quickly). In contrast, a PC member who reviews a paper for which they lack expertise or enthusiasm will typically invest only minimal effort and write a superficial review (thereby increasing randomness and likely reducing overall conference quality).
- Insist on (and monitor) online discussion in advance of the meeting. Selecting discussion leads and asking each lead to type a summary in the online discussion helps keep the meeting organized; the discussion lead can essentially read this typed summary (or notes) at the meeting itself, ensuring that the summary concludes quickly and on time. Online discussion can also encourage reviewers to identify contentious issues before the meeting (where there is limited time to resolve disputes or disagreements), and potentially resolve them ahead of time, thereby averting protracted and unfocused arguments at the PC meeting itself. Sometimes, consensus on a paper can be reached in online discussion before the meeting occurs, saving precious time at the meeting itself.
- Use a two-pass approach to paper discussion. Every other program committee I’ve served on tries to reach a conclusion about a paper’s disposition after a (short) discussion. In the best case, this works OK; in the average case, hasty and incorrect decisions can result; in the worst case, the discussion monopolizes meeting time, and the meeting is derailed and ends in a frantic rush to accept papers. The two-pass approach we used worked as follows. (1) In the first pass, a discussion lead would summarize the paper and its reviews; in the remaining 3–4 minutes, the reviewers would try to agree on an outcome for the paper, but none of the possible outcomes was reject. Rather, the possible outcomes were: accept, incremental/boring (indicating that the paper was technically correct, but not particularly interesting or groundbreaking), risky (indicating that the paper could be groundbreaking but that reviewers had concerns about correctness or something else), and discuss (indicating that there was absolutely no agreement on the paper, and that more time would be needed to reach an outcome). (2) In the second pass, our original intent was to re-discuss everything, but effectively what happened was that all papers labeled as risky were accepted by default unless someone wanted to argue against one of them. Similarly, all papers labeled as incremental were rejected by default unless someone wanted to advocate for one of them. (We had papers in each of these categories.) Pulling reject off the table on the first pass kept the meeting going (we could reasonably end discussion after a fixed amount of time because people knew we could come back to discussion later), and it also reduced the “ordering effects” that I described above, since every paper that wasn’t quickly accepted received two passes, with the second pass occurring after the PC had seen the complete set of papers.
- Read every paper that is being discussed. Although it is not possible to carefully read 80 papers in advance of a meeting, two PC chairs can split the workload and each at least perform a quick read of about 40 papers (and take notes on them) in advance of the meeting. Jeff and I did this, and it proved to be incredibly useful, for several reasons. Having your own opinion about a paper helps you, as chair, moderate the PC meeting discussion by preventing a strong personality (or, sometimes, a PC member who has not even read the paper!) from unfairly swaying the discussion or perception of the paper. In the end game, sometimes PC members will never come to agreement about whether a paper should be accepted or rejected. In these cases, the chair becomes the “tie breaker” and can accept or reject a paper by fiat. This typically happens at least once in every PC meeting. A PC chair who has not read a paper cannot exercise fiat effectively, so having read every paper is critical in these situations.
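The two-pass approach from the list above can be sketched as a small triage routine. The label names follow the description in this post; everything else (the data layout, the function name `second_pass`, and representing objections and advocacy as sets of paper identifiers) is a hypothetical simplification. In a real meeting, papers labeled discuss, and any paper whose default is contested, go back to the committee for a human decision, which this sketch only flags as "rediscuss".

```python
from enum import Enum


class FirstPass(Enum):
    """First-pass labels. Note that there is deliberately no REJECT."""
    ACCEPT = "accept"
    INCREMENTAL = "incremental"
    RISKY = "risky"
    DISCUSS = "discuss"


def second_pass(labels, objections=frozenset(), advocates=frozenset()):
    """Apply the second-pass defaults to first-pass labels.

    labels:     dict mapping paper id -> FirstPass label
    objections: ids of risky papers that someone wants to argue against
    advocates:  ids of incremental papers that someone wants to argue for
    Returns a dict mapping paper id -> "accept", "reject", or "rediscuss".
    """
    outcome = {}
    for paper, label in labels.items():
        if label is FirstPass.ACCEPT:
            outcome[paper] = "accept"
        elif label is FirstPass.RISKY:
            # Accepted by default unless someone objects.
            outcome[paper] = "rediscuss" if paper in objections else "accept"
        elif label is FirstPass.INCREMENTAL:
            # Rejected by default unless someone advocates.
            outcome[paper] = "rediscuss" if paper in advocates else "reject"
        else:
            # No agreement on the first pass: needs more committee time.
            outcome[paper] = "rediscuss"
    return outcome


# Hypothetical first-pass results for five papers.
labels = {
    "A": FirstPass.ACCEPT,
    "B": FirstPass.RISKY,
    "C": FirstPass.RISKY,
    "D": FirstPass.INCREMENTAL,
    "E": FirstPass.DISCUSS,
}
result = second_pass(labels, objections={"C"})
print(result)
```

The design point worth noticing is that a rejection can only come out of the second pass, after the committee has seen every paper, which is what dampens the ordering effects described earlier.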
The Worst Process, Except for Every Other Process
The program committee process for paper selection is far from perfect; it is an inherently human process, and in the final analysis, a small handful of gatekeepers (indeed, sometimes even a single gatekeeper) can determine whether a paper is accepted to a top-tier conference and read by many others, or shelved. I personally would like to see my own community have a long discussion about (and experiment with) ways to improve this process. In the meantime, I hope that this post can help authors (particularly students) understand some of the dynamics of the process and also help chairs and PC members ensure that the process is as fair as possible. Although the process is far from perfect, and it is unlikely that a perfect process exists, understanding the mechanics that go on “behind the closed doors” of a program committee meeting should also make clear that the outcome of the process (either acceptance or rejection) should not be interpreted as universal praise or condemnation, but rather as the result of the opinions of a small number of people and the outcome of a human process. Ultimately, the proof of a research idea lies not in the outcome of a single program committee, but in the idea’s ultimate acceptance, adoption, and impact (which is, in itself, a topic for a future post).