Chocolate, clueless reporting, and ethics

I have just seen a report of a little hoax pulled on the media by John Bohannon. What he did was to run a small and deliberately badly designed clinical trial, the results of which showed that eating chocolate helps you lose weight.

The trial showed no such thing, of course, as Bohannon points out. It just used bad design and blatant statistical trickery to come up with the result, which should not have fooled anyone who read the paper even with half an eye open.

Bohannon then sent press releases about the study to various media outlets, many of which printed the story completely uncritically. Here’s an example from the Daily Express.

This may be a lovely little demonstration of how lazy and clueless the media are, but I have a nasty feeling it’s actually highly problematic.

The problem is that neither Bohannon’s description of the hoax nor the paper publishing the results of the study makes any mention of ethical review. Let’s remember that although the science was deliberately flawed, there was still a real clinical trial here with real human participants.

What were those participants told? Were they deceived about the true nature of the study? According to Bohannon,

“They used Facebook to recruit subjects around Frankfurt, offering 150 Euros to anyone willing to go on a diet for 3 weeks. They made it clear that this was part of a documentary film about dieting, but they didn’t give more detail.”

That certainly sounds to me like deception. It is an absolutely essential feature of clinical research that all research must be approved by an independent ethics committee. This is all the more important if participants are being deceived, which is always a tricky ethical issue. There is no rule that gives an exception to research done as a hoax.

The research was apparently done under the supervision of a German doctor, Gunter Frank. While I can’t claim to be an expert in professional requirements of German doctors, I would be astonished if running a clinical trial without ethical approval was not a serious disciplinary matter.

And yet there is no mention anywhere of ethical approval for this study. I really, really hope that’s just an oversight. Recruiting human participants to a clinical trial without proper ethical approval is absolutely not acceptable.

Update 29 May:

According to the normally reliable Retraction Watch, my fears about this study were justified. They report that Bohannon has confirmed to them that the study did not have ethical approval.

Also, the paper has mysteriously disappeared from the journal’s website, so I’ve replaced the link to the paper with a link to a copy of it preserved thanks to Google’s web cache and Freezepage.

Are strokes really rising in young people?

I woke up to the news this morning that there has been an alarming increase in the number of strokes in people aged 40-54.

My first thought was “this has been sponsored by a stroke charity, so they probably have an interest in making the figures seem alarming”. So I wondered how robust the research was that led to this conclusion.

The article above did not link to a published paper describing the research. So I looked on the Stroke Association’s website. There, I found a press release. This press release also didn’t link to any published paper, which makes me think that there is no published paper. It’s hard to believe a press release describing a new piece of research would fail to tell you if it had been published in a respectable journal.

The press release describes data on hospital admissions provided by the NHS, which shows that the number of men aged 40 to 54 admitted to hospital with strokes increased from 4260 in the year 2000 to 6221 in 2014, and the equivalent figures for women were an increase from 3529 to 4604.
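To put rough numbers on the size of those rises, here is the arithmetic in a few lines of Python, using the admission counts quoted in the press release:

```python
# Percentage increases in stroke admissions, ages 40-54, from the press
# release's figures (2000 vs 2014).
men_2000, men_2014 = 4260, 6221
women_2000, women_2014 = 3529, 4604

men_rise = (men_2014 - men_2000) / men_2000 * 100
women_rise = (women_2014 - women_2000) / women_2000 * 100

print(f"Men: up {men_rise:.0f}%")      # up about 46%
print(f"Women: up {women_rise:.0f}%")  # up about 30%
```

Note that these are raw counts of admissions, not rates: they take no account of anything else that changed between 2000 and 2014.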

Well, yes, those figures are certainly substantial increases. But there could be various different reasons for them, some worrying, others reassuring.

It is possible, as the press release certainly wants us to believe, that the main reason for the increase is that strokes are becoming more common. However, it is also possible that recognition of stroke has improved, or that stroke patients are more likely now to get the hospital treatment they need than in the past. Both of those latter explanations would be good things.

So how do the Stroke Association distinguish among those possibilities?

Well, they don’t. The press release says “It is thought that the rise is due to increasing sedentary and unhealthy lifestyles, and changes in hospital admission practice.”

“It is thought that”? Seriously? Who thinks that? And why do they think it?

It’s nice that the Stroke Association acknowledge the possibility that part of the reason might be changes in hospital admission practice, but given that the title of the press release is “Stroke rates soar among men and women in their 40s and 50s” (note: not “Rates of hospital admission due to stroke soar”), there can be no doubt which message the Stroke Association want to emphasise.

I’m sorry, but they’re going to need better evidence than “it is thought that” to convince me they have teased out the relative contributions of different factors to the rise in hospital admissions.

Obesity and dementia

It’s always difficult to draw firm conclusions from epidemiological research. No matter how large the sample size and how carefully conducted the study, it’s seldom possible to be sure that the result you have found is what you were looking for, and not some kind of bias or confounding.

So when I heard in the news yesterday that overweight and obese people were at reduced risk of dementia, my first thought was “I wonder if that’s really true?”

Well, the paper is here. Sadly behind a paywall (seriously guys? You know it’s 2015, right?), though luckily the researchers have made a copy of the paper available as a Word document here.

In many ways, it’s a pretty good study. Certainly no complaints about the sample size: they analysed data on nearly 2 million people. With a median follow-up time of over 9 years, their analysis was based on a long enough time period to be meaningful. They had also thought about the obvious problem with looking at obesity and dementia, namely that obese people may be less likely to get dementia not because obesity protects them against dementia, but just because they are more likely to die of an obesity-related disease before they are old enough to develop dementia.

The authors did a sensitivity analysis in which they assumed that patients who died during the observation period would, had they lived, have been at twice the risk of developing dementia as patients who survived to the end of follow-up. Although that weakened the negative association between overweight and dementia, it was still present.

There are, of course, other ways to do this. Perhaps it might have been appropriate to use a competing risks survival model instead of the Poisson model they used for their statistical analysis, and if you were going to be picky, you could say their choice of statistical analysis was a bit fishy (sorry, couldn’t resist).

But I don’t think the method of analysis is the big problem here.

For a start, although some of the most obvious confounders (age, sex, smoking, drinking, relevant medication use, diabetes, and previous myocardial infarction) were adjusted for in the analysis, there was no adjustment for socioeconomic status or education level, which is a big omission.

But more importantly, I think the major limitation of these results comes from what is known as the healthy survivor effect.

Let me explain.

The people followed up in the study were all aged over 40 at the start. But there was no upper age limit. Some people were aged over 90 at the start. And not surprisingly, most of the cases of dementia occurred in older people.  Only 18 cases of dementia occurred in those aged 40-44, whereas over 12,000 cases were observed in those aged 80-84. So it’s really the older age groups who are dominating the analysis. Over half the cases of dementia occurred in people aged > 80, and over 90% occurred in people aged > 70.

Now, let’s think about those 80+ year olds for a minute.

There is reasonably good evidence that obese people die younger, on average, than those of normal weight. So the obese people who were aged > 80 at the start of the study are probably not normal obese people. They are probably healthier than average obese people. Many obese people who are less healthy than average would be dead before they are 80, so would never have the chance to be included in that age group of the study.

So in other words, the old obese people in the study are not typical obese people: they are unusually healthy obese people.

That may be because they have good genes or it may be because something about their lifestyle is keeping them healthy, but one way or another, they have managed to live a long life despite their obesity. This is an example of the healthy survivor effect.

There will also be a healthy survivor effect at play in the people of normal weight at the upper end of the age range, but that will probably be less marked, as they haven’t had to survive despite obesity.

I think it is therefore possible that this healthy survivor effect may have skewed the results. The people with obesity may have been at less risk of dementia not because their obesity protected them, but because they were a biased subset of unusually healthy obese people.
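To see how that selection could produce an apparent protective effect out of nothing, here is a toy simulation. Everything in it (the latent “robustness” trait, the logistic form, the effect sizes) is my own invention for illustration, not anything from the study. Crucially, dementia risk in this model depends only on robustness, never on obesity, and yet the obese survivors come out looking protected:

```python
# Toy model of the healthy survivor effect: dementia risk depends only on a
# latent "robustness" trait, never on obesity. Obesity just makes it harder
# to survive to 80. All the numbers are invented purely for illustration.
import math
import random

random.seed(12345)

def dementia_rate_in_survivors(obese, n=100_000):
    """Simulate n people; return the dementia rate among survivors to 80."""
    survivors = cases = 0
    for _ in range(n):
        robustness = random.gauss(0, 1)  # latent health trait
        # Obesity shifts the survival curve: an obese person needs to be
        # more robust to have the same chance of reaching 80.
        p_survive = 1 / (1 + math.exp(-(robustness - (1.5 if obese else 0.0))))
        if random.random() < p_survive:
            survivors += 1
            # Dementia depends ONLY on robustness -- obesity plays no part.
            p_dementia = 1 / (1 + math.exp(robustness))
            if random.random() < p_dementia:
                cases += 1
    return cases / survivors

print(dementia_rate_in_survivors(obese=False))  # higher dementia rate
print(dementia_rate_in_survivors(obese=True))   # lower: spurious "protection"
```

Because the obese people in this model need more robustness to reach 80, the obese survivors are a more strongly selected, healthier-than-average subset, and their lower dementia rate is pure selection bias.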

This does not, of course, mean that obesity doesn’t protect against dementia. Maybe it does. One thing that would have been interesting would be to see the results broken down by the type of dementia. It is hard to believe that obesity would protect against vascular dementia, when on the whole it is a risk factor for other vascular diseases, but the hypothesis that it could protect against Alzheimer’s disease doesn’t seem so implausible.

What it does mean is that we have to be really careful when interpreting the results of epidemiological studies such as this one. It is always extremely hard to know to what extent the various forms of bias that can creep into epidemiological studies have influenced the results.

Psychology journal bans P values

I was rather surprised to see recently (OK, it was a couple of months ago, but I do have a day job to do as well as writing this blog) that the journal Basic and Applied Social Psychology has banned P values.

That’s quite a bold move. There are of course many problems with P values, about which David Colquhoun has written some sensible thoughts. Those problems seem to be particularly acute in the field of psychology, which suffers from something of a problem when it comes to replicating results. It’s undoubtedly true that many published papers with significant P values haven’t really discovered what they claimed to have discovered, but have just made type I errors, or in other words, have obtained significant results just by chance, rather than because what they claim to have discovered is actually true.

It’s worth reminding ourselves what the conventional test of statistical significance actually means. If we say we have a significant result with P < 0.05, that means that if there were really no effect at all, there would have been less than a 1 in 20 chance of seeing a result as extreme as the one we observed. A 1 in 20 chance is not at all rare, particularly when you consider the huge number of papers that are published every day. Many of them are going to have type I errors.
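If you want to see that 1-in-20 false-positive rate in action, a few lines of Python will do it: simulate lots of comparisons of two groups of pure noise and count how many come out “significant”. (I’ve used a simple two-sample z-test with a known standard deviation rather than a t-test, just to keep the example self-contained.)

```python
# Simulate many comparisons of two groups of pure noise and count how many
# are "significant" at the conventional P < 0.05 threshold. Roughly 1 in 20
# should be, purely by chance.
import math
import random

random.seed(42)

n_per_group, n_trials = 30, 10_000
false_positives = 0
for _ in range(n_trials):
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]
    # Two-sample z-test for a difference in means, with the SD known to be 1.
    z = (sum(a) - sum(b)) / n_per_group / math.sqrt(2 / n_per_group)
    if abs(z) > 1.96:  # |z| > 1.96 corresponds to two-sided P < 0.05
        false_positives += 1

print(false_positives / n_trials)  # close to 0.05
```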

Clearly, something must be done.

However, call me a cynic if you like, but I’m not sure how banning P values (and confidence intervals as well, if you thought just banning P values wasn’t radical enough) is going to help. Perhaps if all articles in Basic and Applied Social Psychology in future had robust Bayesian analyses, that would be an improvement. But I hardly think that’s likely to happen. What is more likely is that researchers will claim to have discovered effects even when they are not conventionally statistically significant, which surely is even worse than where we were before.

I suspect one of the problems with psychology research is that much research, particularly negative research, goes unpublished. It’s probably a lot easier to get a paper published showing that you have just demonstrated some fascinating psychological effect than if you have just demonstrated that the effect you had hypothesised doesn’t in fact exist.

This is a problem we know well in my world of clinical trials. There is abundant evidence that positive clinical trials are more likely to be published than negative ones. This is a problem that the clinical research community has become very much aware of, and has been working quite hard to solve. I wouldn’t say it is completely solved yet, but things are a lot better now than they were a decade or two ago.

One relevant factor is the move to prospective trial registration.  It seems that prospectively registering trials is helping to solve the problem of publication bias. While clinical research doesn’t yet have a 100% publication record (though some recent studies do show disclosure rates of > 80%), I suspect clinical research is far ahead of the social sciences.

Perhaps a better solution to the replication crisis in psychology would be a system for prospectively registering all psychology experiments and a commitment by researchers and journals to publish all results, positive or negative. That wouldn’t necessarily mean more results get replicated, of course, but it would mean that we’d be more likely to know about it when results are not replicated.

I’m not pretending this would be easy. Clinical trials are often multi-million dollar affairs, and the extra bureaucracy involved in trial registration is trivial in comparison with the overall effort. Many psychology experiments are done on a much smaller scale, and the extra bureaucracy would probably add proportionately a lot more to the costs. But personally, I think we’d all be better off with fewer experiments done and more of them being published.

I don’t think the move by Basic and Applied Social Psychology is likely to improve the quality of reporting in that journal. But if it gets us all talking about the limitations of P values, then maybe that’s not such a bad thing.

Vaping among teenagers

Vaping, or use of e-cigarettes, has the potential to be a huge advance in public health. It provides an alternative to smoking that allows addicted smokers to get their nicotine fix without exposing them to all the harmful chemicals in cigarette smoke. This is a development that should be welcomed with open arms by everyone in the public health community, though oddly, it doesn’t seem to be. Many in the public health community are very much against vaping. The reasons for that might make an interesting blogpost for another day.

But today, I want to talk about a piece of research into vaping among teenagers that’s been in the news a lot today.

Despite the obvious upside of vaping, there are potential downsides. The concern is that it may be seen as a “gateway” to smoking. There is a theoretical risk that teenagers may be attracted to vaping and subsequently take up smoking. Obviously that would be a thoroughly bad thing for public health.

Clearly, this is an important area to research, so that we can better understand what the downsides of vaping might be.

So I was interested to see that a study has been published today that looks specifically at vaping among teenagers. Can that help to shed light on these important questions?

Looking at some of the stories in the popular media, you might think it could. We are told that e-cigs are the “alcopops of the nicotine world“, that there are “high rates of usage among secondary school pupils” and that e-cigs are “encouraging people to take up smoking“.

Those claims are, to use a technical term, bollocks.

Let’s look at what the researchers actually did. They used cross sectional questionnaire data in which a single question was asked about vaping: “have you ever tried or purchased e-cigarettes?”

The first thing to note is that the statistics are about the number of teenagers who have ever tried vaping. So they will be included in the statistics if they tried it once. Perhaps they were at a party and they had a single puff on a mate’s e-cig. The study gives us absolutely no information on the proportion of teenagers who vaped regularly. So to conclude “high rates of usage” just isn’t backed up by any evidence. Overall, about 1 in 5 of the teenagers answered yes to the question. Without knowing how many of those became regular users, it becomes very hard to draw any conclusions from the study.

But it gets worse.

The claim that vaping is encouraging people to take up smoking isn’t even remotely supported by the data. To do that, you would need to know what proportion of teenagers who hadn’t previously smoked try vaping, and subsequently go on to start smoking. Given that the present study is a cross sectional one (ie participants were studied only at a single point in time), it provides absolutely no information on that.

Even if you did know that, it wouldn’t tell you that vaping was necessarily a gateway to smoking. Maybe teenagers who start vaping and subsequently start smoking would have smoked anyway. To untangle that, you’d ideally need a randomised trial of areas in which vaping is available and areas in which it isn’t, though I can’t see that ever being done. The next best thing would be to look at changes in the prevalence of smoking among teenagers before and after vaping became available. If it increased after vaping became available, that might give you some reason to think vaping is acting as a gateway to smoking. But the current study provides absolutely no information to help with this question.

I’ve filed this post under “Dodgy reporting”, and of course the journalists who wrote about the study in such uncritical terms really should have known better, but actually I think the real fault lies here with the authors of the paper. In their conclusions, they write “Findings suggest that e-cigarettes are being accessed by teenagers more for experimentation than smoking cessation.”

No, they really don’t show that at all. Of those teenagers who had tried e-cigs, only 15.8% were never-smokers. And bear in mind that most of the overall sample (61.2%) were never-smokers. That suggests that e-cigs are far more likely to be used by current or former smokers than by non-smokers. In fact, while only 4.9% of never-smokers had tried e-cigs (remember, that may mean only trying them once), 50.7% of ex-smokers had tried them. So a more reasonable conclusion might be that vaping is helping ex-smokers to quit, though in fact I don’t think it’s possible even to conclude that much from a cross-sectional study that didn’t measure whether vaping was a one-off puff or a habit.
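As a quick sanity check, the percentages quoted above do hang together (the “about 1 in 5 overall” figure is approximate, so expect only rough agreement):

```python
# Checking that the percentages quoted from the paper are consistent.
overall_tried = 0.20   # "about 1 in 5" of all teenagers had ever tried an e-cig
never_smokers = 0.612  # proportion of the whole sample who were never-smokers
never_tried = 0.049    # proportion of never-smokers who had ever tried one
ex_tried = 0.507       # proportion of ex-smokers who had ever tried one

# Ex-smokers were roughly ten times as likely to have tried an e-cig:
print(ex_tried / never_tried)  # about 10.3

# Never-smokers' share of all triers, which should be close to the 15.8%
# reported in the paper:
print(never_smokers * never_tried / overall_tried)  # about 0.15
```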

While there are some important questions to be asked about how vaping is used by teenagers, I’m afraid this new study does absolutely nothing to help answer them.

Update 1 April:

It seems I’m not the only person in the blogosphere to pick up some of the problems with the way this study has been spun. Here’s a good blogpost from Clive Bates, which as well as making several important points in its own right also contains links to some other interesting comment on the study.

Tobacco vs teddy bears

Now, before we go any further, I’d like to make one thing really clear. Smoking is bad for you. It’s really bad for you. Anything that results in fewer people smoking is likely to be a thoroughly good thing for public health.

But sadly, I have to say there are times when I think the anti-tobacco movement is losing the plot. One such time came this week when I saw the headline “Industry makes $7,000 for each tobacco death“. That has to be one of the daftest statistics I’ve seen for a long time, and I speak as someone who takes a keen interest in daft statistics.

I’m not saying the number is wrong. I haven’t checked it in detail, so it could be, but that’s not the point, and in any case, the numbers look more or less plausible.

The calculation goes like this. Total tobacco industry profits in 2013 (the most recent year for which figures are available) were $44 billion. In the same year, 6.3 million people died from smoking related diseases. Divide the first number by the second, and you end up with $7000 profit per death.
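The arithmetic, such as it is:

```python
# The "profit per death" calculation as described: 2013 industry profits
# divided by smoking-related deaths in the same year.
profit_2013 = 44e9   # total tobacco industry profits, 2013 ($44 billion)
deaths_2013 = 6.3e6  # smoking-related deaths, 2013 (6.3 million)
print(profit_2013 / deaths_2013)  # roughly 7000 dollars per death
```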

I think we’re supposed to be shocked by that. Perhaps the message is that the tobacco industry is profiting from deaths. In fact, given that we are told this figure has increased from $6000 a couple of years ago, as if that were a bad thing, I guess that is what we’re supposed to think.

If you haven’t yet figured out how absurd that is, let’s compare it with the teddy bear industry.

Now, some of the figures that follow come from sources that might not score 10/10 for reliability, and these calculations might look like they’ve been made up on the back of a fag packet.  But please bear with me, because all that we really require for today’s purposes is that these numbers be at least approximately correct to within a couple of orders of magnitude, and I think they probably are.

Let’s start with the number of teddy bear related deaths each year. I haven’t been able to find reliable global figures for that, but according to this website, there are 22 fatal incidents involving teddy bears and other toys in the US each year. Let’s assume that teddy bears account for half of those. That gives us 11 teddy bear related deaths per year in the US.

Since we’re looking at the US, how much profit does the US teddy bear industry make each year? I’ve struggled to find good figures for that, but I think we can get a rough idea by looking at the profits of the Vermont Teddy Bear Company, which is apparently one of the largest players in the US teddy bear market. I don’t know what their market share is. Let’s just take a wild guess that it’s about 1/3 of the total teddy bear market.

The company is now owned by private equity and so isn’t required to report its profits, but I found some figures from the last few years (2001 to 2005) before it was bought by private equity, and its average annual profit for that period was about $1.7 million. So if that represents 1/3 of the total teddy bear market, and if its competitors are similarly profitable (wild assumptions, I know, but we’re only going for wild approximations here), then the total annual profits of the US teddy bear market are about $5 million.

So, if we now do the same calculation as for the tobacco industry, we see that the teddy bear industry makes a profit of about $450,000 per death ($5 million divided by 11 deaths).
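And the teddy bear version of the same sum, using the wildly approximate estimates above:

```python
# The same "profit per death" sum for the US teddy bear industry.
teddy_profit = 5e6  # estimated total annual profit, ~$5 million
teddy_deaths = 11   # estimated teddy-bear-related deaths per year in the US
print(teddy_profit / teddy_deaths)  # about 450,000 dollars per death
```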

So do we conclude that the teddy bear industry is far more evil than the tobacco industry?

No. What we conclude is that using “profits per death” as a measure of the social harm of an industry is an incredibly daft use of statistics. You are dividing by the number of deaths, so the more people you kill, the smaller will be your profits per death.

There are many statistics you could choose to show the harms of the tobacco industry. That it kills about half its users is a good place to start.  That chronic obstructive pulmonary disease, a disease that is massively associated with smoking, is the world’s third leading cause of death, also makes a pretty powerful point. Or one of my personal favourite statistics about smoking, that a 35-year-old smoker is twice as likely to die before age 70 as a non-smoker of the same age.

But let’s not try to show how bad smoking is by using a measure which increases the fewer people your product kills, OK?

How to spot dishonest nutribollocks

I saw a post on Facebook earlier today from GDZ Supplements, a manufacturer of nutribollocks products aimed at gullible sports people.

The post claimed that “Scientific studies suggest that substances in milk thistle protect the liver from toxins.” This was as part of their sales spiel for their “Milk Thistle Liver Cleanse”. No doubt we are supposed to believe that taking the product makes your liver healthier.

Well, if there really are scientific studies, it should be possible to cite them. So I commented on their Facebook post to ask them. They first replied to say that they would email me information if I shared my email address with them, and then when I asked why they couldn’t simply post the links on their Facebook page, they deleted my question and blocked me from their Facebook page.

[Screenshot of the Facebook exchange, 21 February 2015]

This, folks, is not the action of someone selling things honestly. If there were really scientific studies that supported the use of their particular brand of nutribollocks, it would have been perfectly easy to simply post the citation on their Facebook page.

But as it is, GDZ Supplements clearly don’t want anyone asking about the alleged scientific studies. It is hard to think of any explanation for that other than dishonesty on GDZ Supplements’ part.

What my hip tells me about the Saatchi bill

I have a hospital appointment tomorrow, at which I shall have a non-evidence-based treatment.

This is something I find somewhat troubling. I’m a medical statistician: I should know about evidence for the efficacy of medical interventions. And yet even I find myself ignoring the lack of good evidence when it comes to my own health.

I have had pain in my hip for the last few months. It’s been diagnosed by one doctor as trochanteric bursitis and by another as gluteus medius tendinopathy. Either way, something in my hip is inflamed, and is taking longer than it should to settle down.

So tomorrow, I’m having a steroid injection. This seems to be the consensus among those treating me. My physiotherapist was very keen that I should have it. My GP thought it would be a good idea. The consultant sports physician I saw last week thought it was the obvious next step.

And yet there is no good evidence that steroid injections work. I found a couple of open label randomised trials which showed reasonably good short-term effects for steroid injections, albeit little evidence of benefit in the long term. Here’s one of them. The results look impressive on a cursory glance, but something that really sticks out at me is that the trials weren’t blinded. Pain is subjective, and I fear the results are entirely compatible with a placebo effect. Perhaps my literature searching skills are going the same way as my hip, but I really couldn’t find any double-blind trials.

So in other words, I have no confidence whatsoever that a steroid injection is effective for inflammation in the hip.

So why am I doing this? To be honest, I’m really not sure. I’m bored of the pain, and even more bored of not being able to go running, and I’m hoping something will help. I guess I like to think that the health professionals treating me know what they’re doing, though I really don’t see how they can know, given the lack of good evidence from double blind trials.

What this little episode has taught me is how powerful the desire is to have some sort of treatment when you’re ill. I have some pain in my hip, which is pretty insignificant in the grand scheme of things, and yet even I’m getting a treatment which I have no particular reason to think is effective. Just imagine how much more powerful that desire must be if you’re really ill, for example with cancer. I have no reason to doubt that the health professionals treating me are highly competent and well qualified professionals who have my best interests at heart. But it has made me think how easy it must be to follow advice from whichever doctor is treating you, even if that doctor might be less scrupulous.

This has made me even more sure than ever that the Saatchi bill is a really bad thing. If a medical statistician who thinks quite carefully about these things is prepared to undergo a non-evidence-based treatment for what is really quite a trivial condition, just think how much the average person with a serious disease is going to be at the mercy of anyone treating them. The last thing we want to do is give a free pass for quacks to push completely cranky treatments at anyone who will have them.

And that’s exactly what the Saatchi bill will facilitate.

Ovarian cancer and HRT

Yesterday’s big health story in the news was the finding that HRT ‘increases ovarian cancer risk’. The scare quotes there, of course, tell us that that’s probably not really true.

So let’s look at the study and see what it really tells us. The BBC can be awarded journalism points for linking to the actual study in the above article, so it was easy enough to find the relevant paper in the Lancet.

This was not new data: rather, it was a meta-analysis of existing studies. Quite a lot of existing studies, as it turns out. The authors found 52 epidemiological studies investigating the association between HRT use and ovarian cancer, which is quite impressive: despite ovarian cancer being a thankfully rare disease, the analysis included over 12,000 women who had developed it. So whatever other criticisms we might make of the paper, I don’t think a small sample size is going to be one of them.

But what other criticisms might we make of the paper?

Well, the first thing to note is that the data are from epidemiological studies. There is a crucial difference between epidemiological studies and randomised controlled trials (RCTs). If you want to know if an exposure (such as HRT) causes an outcome (such as ovarian cancer), then the only way to know for sure is with an RCT. In an epidemiological study, where you are not doing an experiment, but merely observing what happens in real life, it is very hard to be sure if an exposure causes an outcome.

The study showed that women who take HRT are more likely to develop ovarian cancer than women who don’t take HRT. That is not the same thing as showing that HRT caused the excess risk of ovarian cancer. It’s possible that HRT was the cause, but it’s also possible that women who suffer from unpleasant menopausal symptoms (and so are more likely to take HRT than those women who have an uneventful menopause) are more likely to develop ovarian cancer. That’s not completely implausible. Ovaries are a pretty relevant organ in the menopause, and so it’s not too hard to imagine some common factor that predisposes both to unpleasant menopausal symptoms and an increased ovarian cancer risk.

And if that were the case, then the observed association between HRT use and ovarian cancer would be completely spurious.

So what this study shows us is a correlation between HRT use and ovarian cancer, but as I’ve said many times before, correlation does not equal causation. I know I’ve been moaned at by journalists for endlessly repeating that fact, but I make no apology for it. It’s important, and I shall carry on repeating it until every story in the mainstream media about epidemiological research includes a prominent reminder of that fact.

Of course, it is certainly possible that HRT causes an increased risk of ovarian cancer. We just cannot conclude it from that study.

It would be interesting to look at how biologically plausible it is. Now, I’m no expert in endocrinology, but one little thing I’ve observed makes me doubt the plausibility. We know from a large randomised trial that HRT increases breast cancer risk (at least in the short term). There also seems to be evidence that oral contraceptives increase breast cancer risk but decrease ovarian cancer risk. With my limited knowledge of endocrinology, I would have thought the biological effects of HRT and oral contraceptives on cancer risk would be similar, so it just strikes me as odd that they would have similar effects on breast cancer risk but opposite effects on ovarian cancer risk. Anyone who knows more about this sort of thing than I do, feel free to leave a comment below.

But leaving aside the question of whether the results of the latest study imply a causal relationship (though of course we’re not really going to leave it aside, are we? It’s important!), I think there may be further problems with the study.

The paper tells us, and this was widely reported in the media, that “women who use hormone therapy for 5 years from around age 50 years have about one extra ovarian cancer per 1000 users”.

I’ve been looking at how they arrived at that figure, and it’s not totally clear to me how it was calculated. The crucial data in the paper is this table.  The table is given in a bit more detail in their appendix, and I’m reproducing the part of the table for 5 years of HRT use below.


Age group   Baseline risk (per 1000)   Relative excess risk   Absolute excess risk (per 1000)
50-54       1.2                        0.43                   0.52
55-59       1.6                        0.23                   0.37
60-64       2.1                        0.05                   0.10
Total                                                         0.99

The table is a bit complicated, so some words of explanation are probably helpful. The baseline risk is the probability (per 1000) of developing ovarian cancer over a 5-year period in the relevant age group. The relative excess risk is the proportional amount by which that risk is increased by 5 years of HRT use starting at age 50. The absolute excess risk is the baseline risk multiplied by the relative excess risk.

The excess risks in each 5-year period are then added together to give the total excess lifetime risk of ovarian cancer for a woman who takes HRT for 5 years starting at age 50. I assume excess risks at older ages are ignored because there is no evidence that HRT increases the risk after such a long delay. It’s important to note here that the figure of 1 in 1000 excess ovarian cancer cases refers to lifetime risk: not the excess in any single 5-year period.
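The arithmetic behind the headline figure can be checked directly from the table. Here is a quick sketch, using the table’s rounded inputs (so the result is only approximate):

```python
# Recompute the paper's headline "1 extra case per 1000 users" figure
# from the table above, using the table's rounded inputs.
bands = [
    # (age group, baseline 5-year risk per 1000, relative excess risk)
    ("50-54", 1.2, 0.43),
    ("55-59", 1.6, 0.23),
    ("60-64", 2.1, 0.05),
]

# absolute excess = baseline risk x relative excess risk, summed over age bands
total_excess = sum(baseline * rel for _, baseline, rel in bands)
print(round(total_excess, 2))  # 0.99 extra cases per 1000 users
```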

The figures for incidence seem plausible. The figures for absolute excess risk are correct if the relative excess risk is correct. However, it’s not completely clear where the figures for relative risk come from. We are told they come from figure 2 in the paper. Maybe I’m missing something, but I’m struggling to match the 2 sets of figures. The excess risk of 0.43 for the 50-54 year age group matches the relative risk 1.43 for current users with duration < 5 years (which will be true while the women are still in that age group), but I can’t see where the relative excess risks of 0.23 and 0.05 come from.

Maybe it doesn’t matter hugely, as the numbers in figure 2 are in the same ballpark, but it always makes me suspicious when numbers should match and don’t.

There are some further statistical problems with the paper. This is going to get a bit technical, so feel free to skip the next two paragraphs if you’re not into statistical details. To be honest, it all pales into insignificance anyway beside the more serious problem that correlation does not equal causation.

The methods section tells us that cases were matched with controls. We are not told how the matching was done, which is the sort of detail I would not expect to see left out of a paper in the Lancet. But crucially, a matched case control study is different to a non-matched case control study, and it’s important to analyse it in a way that takes account of the matching, with a technique such as conditional logistic regression. Nothing in the paper suggests that the matching was taken into account in the analysis. This may mean that the confidence intervals for the relative risks are wrong.
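To see why the matching matters, consider the simplest case: 1:1 matched pairs with a single binary exposure. There, conditional logistic regression reduces to the ratio of discordant pairs, and the concordant pairs contribute nothing. A toy sketch with invented counts:

```python
# Toy 1:1 matched case-control sketch (invented counts). With one binary
# exposure, the conditional (matched) odds ratio is the ratio of the two
# kinds of discordant pairs; concordant pairs carry no information.
b = 30   # pairs where the case was exposed but the matched control was not
c = 15   # pairs where the control was exposed but the case was not
# (pairs where both or neither member was exposed drop out of the estimate)

matched_or = b / c
print(matched_or)  # 2.0
```

An unmatched analysis of the same data would ignore the pairing entirely, which is exactly why the confidence intervals can come out wrong.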

It also seems odd that the data were analysed using Poisson regression (and no, I’m not going to say “a bit fishy”). Poisson regression makes the assumption that the baseline risk of developing ovarian cancer remains constant over time. That seems a highly questionable assumption here. It would be interesting to see if the results were similar using a method with more relaxed assumptions, such as Cox regression. It’s also a bit fishy (oh damn, I did say it after all) that the paper tells us that Poisson regression yielded odds ratios. Poisson regression doesn’t normally yield odds ratios: the default statistic is an incidence rate ratio. Granted, the interpretation is similar to an odds ratio, but they are not the same thing. Perhaps there is some cunning variation on Poisson regression in which the analysis can be coaxed into giving odds ratios, but if there is, I’m not aware of it.
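To illustrate that the two statistics genuinely differ once the outcome is not rare, here is a toy calculation with made-up counts:

```python
# Made-up counts: 1000 people per group, outcome fairly common
cases_exposed, n_exposed = 300, 1000
cases_unexposed, n_unexposed = 200, 1000

# rate ratio (treating group size as the amount of person-time at risk)
irr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# odds ratio
odds_exposed = cases_exposed / (n_exposed - cases_exposed)
odds_unexposed = cases_unexposed / (n_unexposed - cases_unexposed)
odds_ratio = odds_exposed / odds_unexposed

print(irr, odds_ratio)  # 1.5 vs about 1.71: similar interpretation, different number
```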

I’m not sure how much those statistical issues matter. I would expect that you’d get broadly similar results with different techniques. But as with the opaque way in which the lifetime excess risk was calculated, it just bothers me when statistical methods are not as they should be. It makes you wonder if anything else was wrong with the analysis.

Oh, and a further oddity is that nowhere in the paper are we told the total sample size for the analysis. We are told the number of women who developed ovarian cancer, but we are not told the number of controls that were analysed. That’s a pretty basic piece of information that I would expect to see in any journal, never mind a top-tier journal such as the Lancet.

I don’t know whether those statistical oddities have a material impact on the analysis. Perhaps they do, perhaps they don’t. But ultimately, I’m not sure it’s the most important thing. The really important thing here is that the study has not shown that HRT causes an increase in ovarian cancer risk.

Remember folks, correlation does not equal causation.

Hospital special measures and regression to the mean

Forgive me for writing 2 posts in a row about regression to the mean. But it’s an important statistical concept, which also happens to be widely misunderstood. Sometimes with important consequences.

Last week, I blogged about a claim that student tuition fees had not put off disadvantaged applicants. The research was flawed, because it defined disadvantage on the basis of postcode areas, and not on the individual characteristics of applicants. This means that an increase in university applications from disadvantaged areas could have simply been due to regression to the mean (ie the most disadvantaged areas becoming less disadvantaged) rather than more disadvantaged individual students applying to university.

Today, we have a story in the news where exactly the same statistical phenomenon is occurring. The story is that putting hospitals into “special measures” has been effective in reducing their death rates, according to new research by Dr Foster.

The research shows no such thing, of course.

The full report, “Is [sic] special measures working?” is available here. I’m afraid the authors’ statistical expertise is no better than their grammar.

The research looked at 11 hospital trusts that had been put into special measures, and found that their mortality rates fell faster than hospitals on average. They thus concluded that special measures were effective in reducing mortality.

Wrong, wrong, wrong. The 11 hospital trusts had been put into special measures not at random, but precisely because they had higher than expected mortality. If you take 11 hospital trusts on the basis of a high mortality rate and then look at them again a couple of years later, you would expect the mortality rate to have fallen more than in other hospitals simply because of regression to the mean.

Maybe those 11 hospitals were particularly bad, but maybe they were just unlucky. Perhaps it’s a combination of both. But if they were unusually unlucky one year, you wouldn’t expect them to be as unlucky the next year. If you take the hospitals with the worst mortality, or indeed the most extreme examples of anything, you would expect it to improve just by chance.
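The phenomenon is easy to reproduce by simulation. Give every hospital exactly the same true mortality rate, pick the 11 with the worst observed rates in year one, and watch their rates “improve” in year two with no intervention whatsoever. A sketch with invented numbers:

```python
import random

random.seed(42)  # reproducible invented data

N_HOSPITALS, N_PATIENTS, TRUE_RATE = 200, 1000, 0.05

def observed_rate():
    """One year's observed mortality at a hospital whose true risk is TRUE_RATE."""
    deaths = sum(random.random() < TRUE_RATE for _ in range(N_PATIENTS))
    return deaths / N_PATIENTS

year1 = [observed_rate() for _ in range(N_HOSPITALS)]
year2 = [observed_rate() for _ in range(N_HOSPITALS)]

# put the 11 hospitals with the worst year-1 rates into "special measures"
worst = sorted(range(N_HOSPITALS), key=lambda i: year1[i], reverse=True)[:11]

mean_before = sum(year1[i] for i in worst) / len(worst)
mean_after = sum(year2[i] for i in worst) / len(worst)
print(mean_before, mean_after)  # the "improvement" is pure regression to the mean
```

Every hospital here has identical true mortality, yet the selected group reliably looks better the following year, which is exactly the pattern the Dr Foster report mistook for an effect of special measures.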

This is a classic example of regression to the mean. The research provides no evidence whatsoever that special measures are doing anything. To do that, you would need to take poorly performing hospitals and allocate them at random either to have special measures or to be in a control group. Simply observing that the worst trusts got better after going into special measures tells you nothing about whether special measures were responsible for the improvement.