Does peer review fail to spot outstanding research?

A paper by Siler et al, published last week, attracted quite a bit of attention among those of us who take an interest in scientific publishing and the peer review process. It looked at the citation counts of papers that had been submitted to 3 high-impact medical journals and subsequently published, either in one of those 3 journals or, if rejected by them, in another journal.

The accompanying press release from the publisher told us that “scientific peer review may have difficulties identifying unconventional and/or outstanding work”. This wasn’t too far off what was claimed in the paper, where Siler et al concluded that their work suggested that peer review “had difficulties in identifying outstanding or breakthrough work”.

The press release was reported uncritically by several organisations that should have known better, including Science, Nature, and Retraction Watch.

It’s an interesting theory. The theory goes that peer reviewers don’t like to get out of their comfort zone, and while they may give good reviews to small incremental advances in their field, they don’t like radical new research that breaks new ground, so such research may be rejected.

The only problem with this theory is that Siler et al’s paper provides absolutely no data to support it.

Let’s look at what they did. They looked at 1008 manuscripts that were submitted to 3 top-tier medical journals (Annals of Internal Medicine, British Medical Journal, and The Lancet). Most of those papers were rejected, but subsequently published in other journals. Siler et al tracked the papers to see how many times each paper was cited.

Now, there we have our first problem. Using the number of times a paper is cited as a measure of groundbreaking research is pretty crude. Papers can be highly cited for many reasons, and presenting groundbreaking research is only one of them. I am writing this blogpost on the same day that I found that the 6th most important paper of the year according to “Altmetrics” (think of it as citation counting for the Facebook generation) was about how long it takes for boxes of chocolates on hospital wards to be eaten. A nicely conducted and amusing piece of research, to be sure, but hardly breaking new frontiers in science.

There’s also something rather fishy about the numbers of citations reported in the paper. The group of papers with the lowest citation rate reported in the paper were cited an average of 69.8 times each. That’s an extraordinarily high number. Of the 3 top-tier journals studied, The Lancet has the highest impact factor, at 39.2. That means that papers in The Lancet are cited an average of 39.2 times each. Doesn’t it seem rather odd that papers rejected from it are cited almost twice as often? I’m not sure what to make of that, but it does make me wonder if there is a problem with data quality.

Anyway, the main piece of evidence used to support the idea that peer review was bad at recognising outstanding research is that the 14 most highly cited papers of the 1008 papers examined were rejected by the 3 top journals. The first problem with that is that 12 of those 14 were rejected by the journals’ in-house editorial staff without being sent for peer review. So even if there were no further problems with the paper, we couldn’t draw any conclusions about failings of peer review: the failings would be down to journals’ in-house staff.

Another problem is that those 14 papers were not, of course, rejected by the peer review system. They were all published in peer reviewed journals: just not the first journal that the authors tried. So we really can’t conclude that peer review is preventing groundbreaking work from being published.

But in any case, if we ignore those flaws and ask ourselves whether it is nonetheless true that groundbreaking (or at least highly cited) research is being rejected, I think we’d want to know whether highly cited research is more likely to be rejected than other research.

And I’m afraid the evidence for that is totally lacking.

Rejecting the top 14 papers sounds bad. But it’s important to realise that the overall rejection rate was very high: only 6.2% of the papers submitted were accepted. If the probability of accepting each of the top 14 papers was 6.2%, like all the others, and the decisions were independent of each other, then there is about a 40% chance that all 14 of them would be rejected. And that is ignoring the fact that looking specifically at the top 14 papers is a post-hoc analysis. The only robust way to see if the more highly cited papers were more likely to be rejected would have been to specify a hypothesis in advance, rather than to focus on whatever came out of the data as the most impressive statistic.
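
If you want to check that 40% figure for yourself, here is a minimal sketch in Python. It assumes that each of the 14 papers is accepted independently, with the same 6.2% probability as any other submission; that is my simplifying assumption, and the variable names are mine rather than anything from the paper.

```python
# Chance that all 14 top-cited papers are rejected, ASSUMING each one is
# accepted independently with the overall acceptance rate (a simplification).
acceptance_rate = 0.062   # overall acceptance rate quoted in the text
n_top_papers = 14         # number of most highly cited papers

# Probability that one paper is rejected, raised to the power of 14.
p_all_rejected = (1 - acceptance_rate) ** n_top_papers

print(f"P(all {n_top_papers} rejected) = {p_all_rejected:.2f}")
```

The result comes out at a little over 0.4, so seeing all 14 rejected is not far off a coin toss even if highly cited papers are treated exactly like all the others.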

So, to recap, this paper used a crude measure of whether papers were groundbreaking, did not look at what peer reviewers thought of them, found precisely zero high impact articles that were rejected by the peer review system, and found no evidence whatsoever that high-impact articles were more likely to be rejected than any others.

Call me a cynic if you like, but I’m not convinced. The peer review process is not perfect, of course. But if you want to convince me that one of its flaws is that it is biased against groundbreaking research, you’re going to have to come up with better evidence than Siler et al’s paper.


Clinically proven

My eye was caught the other day by this advert:


Quite a bold claim, I thought. “Defends against cold and flu” would indeed be impressive, if it were true. Though I also noticed the somewhat meaningless verb “defend”. What does that mean exactly? Does it stop you getting a cold or flu in the first place? Or does it just help you recover faster if you get a cold or flu?

I had a look at the relevant page on the Boots website to see if I could find out more. It told me

“Boots Pharmaceuticals Cold & Flu Defence Nasal Spray is an easy to use nasal spray with antiviral properties containing clinically proven Carragelose to defend against colds and flu, as well as help shorten the duration and severity of both colds and flu.”

It then went on to say

“Use three times a day to help prevent a cold or flu, or several times a day at the first signs helping reduce the severity and duration of both colds and flu.”

OK, so Boots obviously want us to think that it can do both: prevent colds and flu and help treat them.

So what is the evidence? Neither the advert nor the web page had any links to any of the evidence backing up the claim that these properties were “clinically proven”. So I tweeted to Boots to ask them.

To their credit, Boots did reply to me (oddly by direct message, in case you’re wondering why I’m not linking to their tweets) with 4 papers in peer reviewed journals.

So how does the evidence stack up?

Well, the first thing to note is that although there were 4 papers, there were only 3 clinical trials: one of the papers is a combined analysis of 2 of the others. The next thing to note is that all 3 trials were of patients in the early stages of a common cold. So right away we can see that we have no evidence whatsoever that the product can help prevent a cold or flu, and no evidence whatsoever that it can treat flu.

The “clinically proven” claim is starting to look a little shaky.

But can it at least treat a common cold? That would be pretty impressive if it could. The common cold has proved remarkably resilient to anything medical science can throw at it. A treatment that actually worked against the common cold would indeed be good news.

The first of the trials was published in 2010. It was an exploratory study in 35 patients who were in the first 48 hours of a cold, but otherwise healthy. It was randomised and double-blind, and as far as I can tell from the paper, seems to have been reasonably carefully conducted. The study showed a significant benefit of the nasal spray on the primary outcome measure, namely the average of a total symptom score on days 2 to 4 after the start of dosing.

Well, I say significant. It met the conventional level of statistical significance, but only just, at P = 0.046 (that means that there’s about a 1 in 20 chance you could have seen results like this if the product were in fact completely ineffective: not a particularly high bar). The size of the effect also wasn’t very impressive: the symptom score was 4.6 out of a possible 24 in the active treatment group and 6.3 in the placebo group. Not only that, but it seems symptom scores were higher in the placebo group at baseline as well, and no attempt was made to adjust for that.
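
To put the size of that effect in context, here is a quick back-of-the-envelope sketch in Python using only the scores quoted above. The variable names are mine, and this is just arithmetic on the reported group scores, not a reanalysis of the trial data.

```python
# Primary outcome of the 2010 trial, as quoted in the text.
max_score = 24        # maximum possible total symptom score
active_score = 4.6    # score in the active treatment group
placebo_score = 6.3   # score in the placebo group

# Absolute difference between groups, and that difference as a share
# of the full range of the symptom scale.
difference = placebo_score - active_score
share_of_scale = difference / max_score

print(f"Difference: {difference:.1f} points "
      f"({share_of_scale:.1%} of the {max_score}-point scale)")
```

That is a difference of under 2 points on a 24-point scale, before allowing for the fact that the placebo group started with higher scores.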

So not wholly convincing, really. On the other hand, the study did show quite an impressive effect on the secondary outcome of viral load, with a 6-fold increase from baseline to day 3 or 4 in the placebo group, but a 92% decrease in the active group. This was statistically significant at P = 0.009.

So we have some preliminary evidence of efficacy, but with such a small study and such unconvincing results on the primary outcome of symptoms, I think we’re going to have to do a lot better.

The next study was published in 2012, and included children (ages 1 to 18 years) in the early stages of a common cold. It was also randomised and double blind. The study randomised 213 patients, but only reported efficacy data for 153 of them, so that’s not a good start. It also completely failed to show any difference between the active and placebo treatments on the primary outcome measure, the symptom score from days 2 to 7. Again, there was a significant effect on viral load, but given the lack of an effect on the symptom score, it’s probably fair to say the product doesn’t work very well, if at all, in children.

The final study was published in 2013. It was again randomised and double blind, and like the first study included otherwise healthy adults in the first 48 h of a common cold. The primary endpoint was different this time, and was the duration of disease. This was a larger study than the first one, and included 211 patients.

The results were far from impressive. One of the big problems with this study was that they restricted their efficacy analysis to the subset of 118 patients with laboratory-confirmed viral infection. Losing almost half your patients from the analysis like this is a huge problem. If you have a cold and are tempted to buy this product, you won’t know whether you have a laboratory-confirmed viral infection, so the results of this study may not apply to you.

But even then, the results were distinctly underwhelming. The active and placebo treatments were only significantly different in the virus-positive per-protocol population, a set of just 103 patients: less than half the total number recruited. And even then, the results were only just statistically significant, at P = 0.037. The duration of disease was reduced from 13.7 days in the placebo group to 11.6 days in the active group.
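
Here is the same sort of rough arithmetic for the 2013 trial, again sketched in Python from the numbers quoted above. The variable names are mine, and nothing here goes beyond the figures reported in the paper.

```python
# Patient numbers and disease duration from the 2013 trial, as quoted in the text.
recruited = 211        # patients in the study
virus_positive = 118   # patients with laboratory-confirmed viral infection
per_protocol = 103     # virus-positive per-protocol population
placebo_days = 13.7    # duration of disease, placebo group
active_days = 11.6     # duration of disease, active group

# Share of recruited patients who made it into each analysis set.
virus_positive_share = virus_positive / recruited
per_protocol_share = per_protocol / recruited

# Absolute and relative reduction in duration of disease.
reduction_days = placebo_days - active_days
relative_reduction = reduction_days / placebo_days

print(f"Virus-positive subset: {virus_positive_share:.1%} of recruits")
print(f"Per-protocol subset:   {per_protocol_share:.1%} of recruits")
print(f"Reduction in duration: {reduction_days:.1f} days ({relative_reduction:.1%})")
```

So the one statistically significant result comes from just under half of the patients recruited, and amounts to about 2 days off a cold lasting nearly 2 weeks.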

So, do I think that Boots Cold and Flu Defence is “clinically proven”? Absolutely not. There is no evidence whatsoever that it prevents a cold. There is no evidence whatsoever that it either prevents or treats flu.

There is some evidence that it may help treat a cold. It’s really hard to know whether it does or not from the studies that have been done so far. Larger studies will be needed to confirm or refute the claims. If it does help to treat a cold, it probably doesn’t help very much.

The moral of this story is that if you see the words “clinically proven” in an advert, please be aware that that phrase is completely meaningless.

I’m back!

Well, 2014 has been an “interesting” year for me. The company that I’d run for 15 years, Dianthus Medical, went bust in July, and I rather suddenly and unexpectedly found myself unemployed. That was not a fun experience.

Happily, I didn’t stay unemployed for long, and in the autumn I started a new job. I must confess to having been a bit nervous about this after having been my own boss since the days when we all used to think Tony Blair was one of the good guys. But I needn’t have worried: my new job has turned out to be a real joy.

Running a business was pretty damn hard work. The silver lining of the cloud that was my business going tits up is that I no longer have to worry about all that business stuff: dealing with endless government-mandated red tape, chasing customers who don’t pay on time, trying to find new business, and all that sort of thing. Now I can just get on with doing all the interesting statistical consultancy that I enjoy.

You may remember that I used to write a blog on the Dianthus Medical website. That site is sadly now defunct, but I do have a backup of all the blogposts and I will get round to putting them back on the internet one of these days.

But anyway, after a little break from blogging while I sorted my life out, I’m back. I hope you’ll come back and visit my new blog and see what interesting things from the world of statistics, medicine, and science move me to write something.