This document describes the full results of a survey of authors of computer systems papers from 2017.

Background

Peer review is a cornerstone of modern scientific research. Yet understanding and improving the process is challenging, because realistic controlled experiments on peer review are nearly impossible (for example, most conferences disallow parallel submission). Consequently, peer-review decisions and policies for conferences are driven more by the opinions of the steering committee and chairs than by hard data and evidence.

This survey aims to provide insight into the authors’ perspective on the peer-review process, specifically for computer systems conferences. In computer systems, as in most fields of computer science, the main venue for publishing research results is peer-reviewed conferences (journals play a lesser role). In a typical reputable conference, each paper receives at least three blind reviews, where the identities of the specific reviewers are hidden from the authors. Some conferences have different review policies, such as double-blind reviewing (authors’ identities are hidden from reviewers), rebuttals, two-phase reviews, etc. Such policies can affect the quality of the reviews, as well as the experience of the authors undergoing the review. This survey focused primarily on the latter, in an effort to quantify the differences between such policies. By limiting our scope to conferences in a single field (computer systems), we avoid the variability that spans different disciplines. At the same time, by including numerous conferences and authors we aim for a comprehensive survey and analysis, to increase the statistical validity and robustness of our measurements.

Methodology

Our data set includes 56 reputable systems and systems-related conferences, all peer-reviewed and all from 2017 (see table below). These conferences range in age, size, location, audience, impact factor, and topics, but nearly all the papers surveyed were reviewed by at least three reviewers, and most by four or more. These conferences represent 2439 papers and 8196 unique authors. Of these, we were able to scrape 5918 valid email addresses from the papers’ full text or the authors’ websites. During the summer of 2018, we sent an email survey to all these addresses, and 918 of these authors responded to at least some of the questions (a response rate of 15.5% by authors). In all, responses covered 1910 reviews of 810 unique papers, although some coauthors submitted redundant responses on the same reviews.
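For reference, here is a minimal Python sketch of the response-rate arithmetic, using the counts quoted in this paragraph; the per-unique-author rate is an extra derived figure, shown only for context.

```python
# Response-rate arithmetic, using the counts quoted in the text above.
unique_authors = 8196
emails_found = 5918      # valid addresses scraped from papers or author websites
respondents = 918        # authors who answered at least some questions

rate_by_contacted = respondents / emails_found      # ~15.5%, the rate reported above
rate_by_all_authors = respondents / unique_authors  # ~11.2%, derived for context only

print(f"response rate (contacted authors): {rate_by_contacted:.1%}")
print(f"response rate (all unique authors): {rate_by_all_authors:.1%}")
```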

Conferences used for this data set, listed alphabetically; response rates are computed by papers
Name Start date URL Papers Response rate
ASPLOS 2017-04-08 http://novel.ict.ac.cn/ASPLOS2017/ 56 41%
ATC 2017-07-12 https://www.usenix.org/conference/atc17 60 22%
CCGrid 2017-05-14 https://www.arcos.inf.uc3m.es/wp/ccgrid2017/ 72 14%
CCS 2017-10-31 https://www.sigsac.org/ccs/CCS2017/ 151 32%
CIDR 2017-01-08 http://cidrdb.org/cidr2017/ 32 44%
CLOUD 2017-06-25 http://www.thecloudcomputing.org/2017/ 29 28%
Cluster 2017-09-05 https://cluster17.github.io/ 65 22%
CoNEXT 2017-12-13 http://conferences2.sigcomm.org/co-next/2017/#!/home 32 31%
EuroPar 2017-08-30 http://europar2017.usc.es/ 50 34%
EuroSys 2017-04-23 https://eurosys2017.github.io/ 41 39%
FAST 2017-02-27 https://www.usenix.org/conference/fast17/ 27 56%
HCW 2017-05-29 http://hcw.eecs.wsu.edu/ 7 29%
HiPC 2017-12-18 http://hipc.org/ 41 37%
HotCloud 2017-07-10 https://www.usenix.org/conference/hotcloud17 19 58%
HotI 2017-08-28 http://www.hoti.org/hoti25/archives/ 13 0%
HotOS 2017-05-07 https://www.sigops.org/hotos/hotos17/ 29 34%
HotStorage 2017-07-10 https://www.usenix.org/conference/hotstorage17 21 29%
HPCA 2017-02-04 http://hpca2017.org 50 54%
HPCC 2017-12-18 http://hpcl.seas.gwu.edu/hpcc2017/ 77 29%
HPDC 2017-06-28 http://www.hpdc.org/2017/ 19 37%
ICAC 2017-07-18 http://icac2017.ece.ohio-state.edu/ 14 36%
ICDM 2017-11-19 http://icdm2017.bigke.org/ 72 26%
ICPE 2017-04-22 https://icpe2017.spec.org/ 29 45%
ICPP 2017-08-14 http://www.icpp-conf.org/2017/index.php 60 25%
IGSC 2017-10-23 http://igsc.eecs.wsu.edu/ 23 35%
IISWC 2017-10-02 http://www.iiswc.org/iiswc2017/index.html 31 48%
IMC 2017-11-01 http://conferences.sigcomm.org/imc/2017/ 28 50%
IPDPS 2017-05-29 http://www.ipdps.org/ipdps2017/ 116 28%
ISC 2017-06-18 http://isc-hpc.com/id-2017.html 22 45%
ISCA 2017-06-24 http://isca17.ece.utoronto.ca/doku.php 54 31%
ISPASS 2017-04-24 http://www.ispass.org/ispass2017/ 24 38%
KDD 2017-08-15 http://www.kdd.org/kdd2017/ 64 28%
MASCOTS 2017-09-20 http://mascots2017.cs.ucalgary.ca/ 20 25%
MICRO 2017-10-16 https://www.microarch.org/micro50/ 61 43%
Middleware 2017-12-11 http://2017.middleware-conference.org/ 20 35%
MobiCom 2017-10-17 https://sigmobile.org/mobicom/2017/ 35 49%
NDSS 2017-02-26 https://www.ndss-symposium.org/ndss2017/ 68 54%
NSDI 2017-03-27 https://www.usenix.org/conference/nsdi17/ 42 21%
OOPSLA 2017-10-25 https://2017.splashcon.org/track/splash-2017-OOPSLA 66 12%
PACT 2017-09-11 https://parasol.tamu.edu/pact17/ 25 24%
PLDI 2017-06-18 http://pldi17.sigplan.org/home 47 32%
PODC 2017-07-25 https://www.podc.org/podc2017/ 38 26%
PODS 2017-05-14 http://sigmod2017.org/pods-program/ 29 24%
PPoPP 2017-02-04 http://ppopp17.sigplan.org/ 29 48%
SC 2017-11-14 http://sc17.supercomputing.org/ 61 41%
SIGCOMM 2017-08-21 http://conferences.sigcomm.org/sigcomm/2017/ 36 50%
SIGIR 2017-08-07 http://sigir.org/sigir2017/ 78 29%
SIGMETRICS 2017-06-05 http://www.sigmetrics.org/sigmetrics2017 27 30%
SIGMOD 2017-05-14 http://sigmod2017.org/ 96 31%
SLE 2017-10-23 http://www.sleconf.org/2017/ 24 4%
SOCC 2017-09-25 https://acmsocc.github.io/2017/ 45 36%
SOSP 2017-10-29 https://www.sigops.org/sosp/sosp17/ 39 59%
SP 2017-05-22 https://www.ieee-security.org/TC/SP2017/index.html 60 38%
SPAA 2017-07-24 http://spaa.acm.org/2017/index.html 31 26%
SYSTOR 2017-05-22 https://www.systor.org/2017/ 16 12%
VEE 2017-04-09 http://conf.researchr.org/home/vee-2017 18 44%

We asked authors a few demographic questions, followed by several questions specific to each of the papers they wrote in our collection (up to three randomly selected papers; some authors published more than three in this set). The complete set of questions, as well as the distributions of the responses, are listed in the next section.

Survey questions

Each of the following segments starts with the actual question asked, followed by the distribution of responses. In all cases, “NA” stands for non-response to this question.

Demographic questions

The survey started with three demographic questions, which could help us identify response bias and stratify responses by population.


Which best describes your position during 2017?

We can see that about one third (36.2%) of the respondents were students in 2017, another third or so were professors (34.2%), and the rest (29.6%) are distributed among all other categories, including unknown.

We can check for response bias in industry or government responses by looking at another data source: the affiliation of an author’s email address (or their Google Scholar profile if we couldn’t find an email address in the paper). Of our 8196 unique authors, 7042 had an identifiable affiliation. Of these, 14% had an industry affiliation, compared to 13.6% of the non-NA survey respondents (\(\chi{}^2=0.057\), \(p=0.811\)). The difference for government researchers is a little larger (6.4% among survey respondents vs. 4.8% in author emails; \(\chi{}^2=3.137\), \(p=0.0765\)), but still not significant enough to reject the null hypothesis of independence.
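For concreteness, the sketch below shows how such a chi-squared test of independence could be computed with scipy. The respondent cell counts are illustrative only (the exact non-NA respondent counts are not quoted above), and scipy applies Yates’ continuity correction to 2x2 tables by default, so the statistic may differ slightly from the reported value.

```python
# Illustrative chi-squared test of independence (sector x data source).
# Cell counts for survey respondents are made up for the example;
# only the ~14% vs. ~13.6% industry proportions are taken from the text.
import numpy as np
from scipy.stats import chi2_contingency

#                  industry  not industry
table = np.array([[  986,      6056],    # authors with identifiable affiliations (~14% industry)
                  [  120,       762]])   # hypothetical non-NA survey respondents (~13.6% industry)

chi2, p, dof, expected = chi2_contingency(table)  # Yates correction is on by default for 2x2
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```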

As for the ratios within academia, it’s unclear whether this response distribution is representative of the field’s academic population, or even of just our set of 8196 authors. It is plausible, for example, that students are less inclined than professors to respond to this survey (especially in the summer), introducing a bias in the responses. On the other hand, it is also plausible that the senior researcher on an academic team (the professor) is the corresponding author, and therefore the coauthor best informed to respond to the survey, which would improve the accuracy of the responses. Unfortunately, we don’t have a separate data source to compare these ratios against, so the existence of this bias remains unknown.


What is your English level proficiency?

Of the non-NA answers, 69% of respondents chose “Non-native” for their English proficiency level. But of our available email addresses, approximately 59% are based in the US, UK, or Canada alone. This large discrepancy could suggest a response bias in the survey, although it is also quite likely that a large proportion of the authors in these immigration-heavy countries are non-native speakers.
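The country attribution behind the 59% figure was based on email addresses. A rough, heavily simplified sketch of how such an attribution could work is shown below; treating generic TLDs such as .com and .edu as US-based is a heuristic assumption for illustration, not necessarily the rule used in our analysis.

```python
# Heuristic country attribution from an email address.
# Generic TLDs (.com, .edu, .org, .gov) are mapped to the US as a rough assumption;
# country-code TLDs are used directly as a country proxy.
def email_country(address: str) -> str:
    tld = address.rsplit(".", 1)[-1].lower()
    if tld in {"com", "edu", "org", "gov", "mil", "us"}:
        return "US"
    if tld == "uk":
        return "UK"
    return tld.upper()  # e.g., "CA", "DE", "CN"

print(email_country("alice@cs.example.edu"))   # US (by the heuristic above)
print(email_country("bob@cs.example.ac.uk"))   # UK
```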


What is your gender?

Of the non-NA answers, 10.3% of respondents chose “Female”. We have manually verified the gender of 7929 unique authors on the web, and found the percentage of women among those to be 11.1%. These two proportions are very close (\(\chi{}^2=0.336\), \(p=0.562\)), leading us to believe there is no significant selection bias in respondent gender.

We compared the values of the last two questions and found no significant difference between the English proficiency of women and men.

Paper overview questions

For each paper an author wrote in our data set (up to three), we asked them to tell us a little about the paper and its submission process, as listed in the following questions.


How many months did it take to research and write this paper?

Authors were given a choice of 3-month duration ranges, including the choice to skip this question. Overall, we received responses for 678 unique papers. In a few cases, two authors gave different answers to the same question for the same paper. The chart shows the distributions of these responses, arbitrarily split into a first and a second respondent per paper (papers with a single response are included in both distributions).

The distributions are very similar overall (\(\chi{}^2=0.56\), \(p=0.97\)), so we’ll arbitrarily stick with the first non-NA response per paper in our future analyses.
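As a concrete illustration of this rule, the following pandas snippet keeps the first non-NA duration response per paper; the column names are hypothetical.

```python
# Keep the first non-NA duration response per paper (column names are hypothetical).
import pandas as pd

responses = pd.DataFrame({
    "paper_id": [1, 1, 2, 3, 3],
    "months":   [None, "4-6", "1-3", "7-9", "10-12"],
})

first_non_na = (responses
                .dropna(subset=["months"])            # drop NA answers first
                .groupby("paper_id", as_index=False)
                .first())                             # then take the first remaining answer
print(first_non_na)
```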

Note that the distribution appears to be monotonically increasing, with the exception of a local mode at the 4-6 months range. It would be interesting to see how the 12+ months group would have broken down had we asked for more detail. But since we did not, we unfortunately cannot tell at what research duration the paper count would start going down.

We are, however, able to explore other relationships with research duration using additional data.

Relationship with team size

We can extract the number of co-authors on each of the papers for which we have a survey response, and compare it to the length of time it took to write the paper. As the chart below shows, there doesn’t appear to be a significant relationship, with a median of 4-5 co-authors for all papers. It’s possible that all authors contribute the same amount of time and therefore papers with more authors took significantly more effort overall. But the independence of these two factors is probably better explained by an uneven distribution of effort among authors. In other words, the lead author(s) can spend more or fewer months on a paper regardless of how many other people contribute to it.

Relationship with lead author

To dive a little deeper into this question, we’ll look specifically at the statistics of the first author of each paper, assuming they’re the lead author (only 12% of the 2046 papers in our data set with three or more authors have alphabetical author order).
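The alphabetical-order check can be approximated as follows; surname extraction here is naive (the last whitespace-separated token), an assumption that ignores compound and non-Latin surnames.

```python
# Naive check for alphabetical author order: surnames must be non-decreasing.
def is_alphabetical(authors: list[str]) -> bool:
    surnames = [name.split()[-1].lower() for name in authors]
    return all(a <= b for a, b in zip(surnames, surnames[1:]))

print(is_alphabetical(["Alice Aho", "Bob Baker", "Carol Cruz"]))   # True
print(is_alphabetical(["Dana Young", "Alice Aho", "Bob Baker"]))   # False
```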

The first question we explored was whether the amount of time it takes to research a paper is related to the lead author’s experience. We have no direct measure of this experience, but many authors in our set maintain a uniquely identifiable Google Scholar (GS) profile (68% of the lead authors of the 678 papers for which we have months of research). From the GS profile we can extract the number of prior publications of each lead author at the time their current paper was published, and use this metric as a proxy for publication experience.
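The experience proxy boils down to counting a lead author’s publications dated before the surveyed paper. A minimal sketch is below; the profile data structure is hypothetical and would, in practice, be populated from the scraped GS profile.

```python
# Count a lead author's publications that appeared before the surveyed paper.
# The list-of-dicts profile format here is hypothetical.
from typing import Dict, List


def prior_publications(profile_pubs: List[Dict], paper_year: int) -> int:
    """Number of publications dated strictly before paper_year."""
    return sum(1 for pub in profile_pubs if pub.get("year", paper_year) < paper_year)


pubs = [{"title": "Old paper", "year": 2012},
        {"title": "Another paper", "year": 2016},
        {"title": "Surveyed paper", "year": 2017}]
print(prior_publications(pubs, 2017))  # 2
```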

As the next chart shows, the relationship is not clear. On the one hand, the 75th-percentile experience is higher for the fastest group of papers (1-3 months of research), compared to the other groups. But the median and extreme experience levels vary only a little between groups. In particular, there is no monotonic relationship between experience and months of research.

We also looked at the sector of the lead author (as inferred from their affiliation), but found no meaningful relationship to the length of time the paper took to research (\(\chi{}^2=5.454\), \(p=0.708\)). Gender also didn’t seem to play a role in duration of research (not depicted, \(\chi{}^2=2.766\), \(p=0.598\)).

In summary, the duration of research leading to a publication cannot be explained by team size or by the lead author’s experience, gender, or sector. It’s possible that the survey responses are too noisy, but the fact that multiple authors of the same paper typically gave very similar answers to the duration question weakens this possibility. This suggests, therefore, that the duration of research is independent of these factors.


How many conferences/journals was it submitted to prior to this publication?

Since we couldn’t directly survey authors of rejected papers (that information is not usually available outside the program committee), we asked this question to get an indirect signal about papers that were rejected at some point. In our survey data set, such papers represent 41.2% of the non-NA responses. Of those, the majority had only been submitted once or twice before, although 46 tenacious papers were previously submitted three or more times. One was even submitted 12 times prior to this accepted version (curiously, it reportedly took only 1-3 months of research, far less than it took to get published).

Again, there are only small differences between the distributions of the first response and the second, so we’ll stick with the first non-NA response for later analysis.

We can plot the relationship between the last two questions as a distribution of prior submissions vs. duration of research. Here, we can easily identify a monotonically increasing relationship (in a one-way ANOVA test, only the mean differences between the 1-3 / 4-6 and the 7-9 / 10-12 groups had an adjusted p-value above 0.05). This monotonic relationship is easy to explain if respondents included the time involved in prior submissions in the total months spent on research.
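One common way to obtain such adjusted pairwise p-values after a one-way ANOVA is Tukey’s HSD test; the sketch below shows how it could be run with statsmodels on hypothetical data (the exact adjustment procedure used above may differ).

```python
# Tukey HSD pairwise comparisons of prior submissions across duration groups.
# The data frame below is hypothetical, for illustration only.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    "months":            ["1-3"] * 5 + ["4-6"] * 5 + ["7-9"] * 5,
    "prior_submissions": [0, 0, 1, 0, 1,  0, 1, 1, 2, 0,  1, 2, 1, 2, 3],
})

result = pairwise_tukeyhsd(endog=df["prior_submissions"],
                           groups=df["months"], alpha=0.05)
print(result.summary())  # adjusted p-value for each pair of duration groups
```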

As before, we also looked at the relationships between prior resubmissions and team size, as well as the lead author’s sector, gender, and experience. Again, we found no significant links between these four factors and the number of resubmissions of a paper (the number of rejections, really). This lack of a clear relationship may help explain why several of our respondents expressed that the peer-review process feels very random or noisy.


Please type in their names

Authors were asked to list (in free-form text) the prior venues to which each paper was submitted. Because the responses are unstructured, quantitative analysis is challenging, but the most common venues can be visualized with a word cloud, sized by frequency.
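For those interested in reproducing such a visualization, a minimal sketch using the third-party wordcloud package is shown below; the venue counts are illustrative, not the actual survey frequencies.

```python
# Render a word cloud from a {venue: count} frequency table.
# Requires the third-party "wordcloud" package (pip install wordcloud).
from wordcloud import WordCloud

venue_counts = {"SOSP": 25, "OSDI": 18, "EuroSys": 12, "ATC": 10, "FAST": 7}  # illustrative
cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(venue_counts)
cloud.to_file("prior_venues_wordcloud.png")
```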