In retrospective observational studies on hot topics, thousands of independent analytic teams may approach a similar question, all with different plans. This field-wise multiple hypothesis testing has been shown experimentally to generate both positive and negative statistically significant associations through analytic choices alone. Even when data sets are standardized, multiple analytic approaches may yield a range of answers to a single question. Finally, randomized controlled trials, the gold standard of causal inference, have historically been considered immune to questions of multiple hypothesis testing, although this is increasingly being called into question with the emergence of redundant, duplicative, and large trial portfolios. In this commentary, we explore the role of multiplicity in biomedical research, a growing challenge to the interpretation of individual study results.

Consider the case of nutritional headlines that dominate the front pages of prominent news outlets such as The New York Times' health section. One week, researchers may report that blueberries or dark chocolate reduce your risk of cancer, but the next week, these same exposures may be found to increase your risk. What explains this phenomenon? To begin, for popular topics, it is likely that thousands of individual analyses of a data set will be performed over a relatively short period of time, each controlling for some co-variates—those that researchers believe are plausibly related to an outcome—in an effort to uncover a meaningful correlation. Each of these models estimates a different relationship between the investigated variables, as Patel et al. demonstrated by simulating the research community of nutritional epidemiology. The authors used the National Health and Nutrition Examination Survey and probed a series of nutritional exposures, asking whether each increased or decreased overall mortality.
For each exposure, the researchers used baseline variables and the 13 most common co-variates adjusted for in the sampled literature [e.g. 'hypertension, diabetes, cholesterol, alcohol consumption, education, income, family history of heart disease, heart disease, any cancer, physical activity and race/ethnicity']. Then, the entire research community was simulated: over 8000 different models were generated for each exposure–mortality association by fitting every conceivable combination of the 13 co-variates (2^13 = 8192 possible adjustment sets). They found that the majority of models showed no significant association. What was noteworthy, however, is that for 31% of the variables there were both statistically significant positive and negative outcomes for the same hypothesis, indicating that the hazard ratio could be HR > 1 or HR < 1 with a significant p-value depending on the level of co-variate adjustment. The researchers called this the vibration of effects. Schoenfeld and Ioannidis extended this result in an analysis of 50 common ingredients randomly selected from a cookbook. The researchers then conducted a literature search for articles that measured each ingredient's link to cancer. Most of the ingredients had articles measuring their relation to cancer risk. Despite many weak and non-significant relationships, most ingredients had studies with outcomes contrary to each other, showing either an increased or a decreased risk of developing cancer. Zaorsky and colleagues applied the vibration of effects approach to practical questions in cancer medicine. They found that by varying other analytic choices—left truncation adjustment, propensity score matching, landmark analysis, and different combinations of co-variates—they were able to generate any desired result. These are all instances of a common theme when dealing with multiplicity: studies measuring the same research question yield opposite findings. Work by Silberzahn et al. demonstrated a similar situation of multiplicity when they categorized the skin tone of different soccer players, included it in a data set with reports of penalties, and distributed the data to 29 research teams. The following question was posed to the teams: were soccer referees more likely to issue a dark-skinned player a red card, signalling a penalty, than a light-skinned player? Twenty of the teams reported significant evidence of bias, whereas nine teams found a non-significant relationship, with one team finding a trend in the opposite direction. These different analytical strategies provide researchers with a great deal of latitude, allowing for a myriad of distinct outcomes.
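To make the vibration-of-effects procedure concrete, the sketch below enumerates every subset of 13 candidate co-variates and refits the exposure model for each. It uses purely synthetic data, and logistic regression stands in for the Cox survival models of the original study; all variable names and effect sizes are illustrative assumptions, not a reproduction of the published analysis.

```python
# Minimal sketch of the vibration-of-effects procedure on synthetic data:
# fit one model per subset of candidate co-variates and record the estimated
# exposure effect and its p-value for the exposure term.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 13))                  # 13 hypothetical co-variates
exposure = rng.normal(size=n)                 # hypothetical nutritional exposure
# Outcome depends only weakly on the exposure, more strongly on two co-variates.
risk = 0.05 * exposure + 0.9 * X[:, 0] - 0.9 * X[:, 1] + rng.normal(size=n)
died = (risk > 0).astype(int)

results = []
for k in range(14):                           # adjustment sets of size 0..13
    for subset in itertools.combinations(range(13), k):
        cols = [exposure.reshape(-1, 1)]
        if subset:
            cols.append(X[:, list(subset)])
        design = sm.add_constant(np.hstack(cols))
        fit = sm.Logit(died, design).fit(disp=0)
        results.append((fit.params[1], fit.pvalues[1]))   # exposure coefficient

results = np.array(results)                   # 2**13 = 8192 models in total
significant = results[results[:, 1] < 0.05]
print(f"models fitted: {len(results)}")
print(f"significant harmful estimates:    {(significant[:, 0] > 0).sum()}")
print(f"significant protective estimates: {(significant[:, 0] < 0).sum()}")
```

Even though the data set never changes, the spread of exposure coefficients across the 8192 fits shows how the choice of adjustment set alone can move an association across the significance threshold in either direction.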
However, the issue intensifies when one considers that significant findings are more likely to be published, resulting in a dichotomized literature devoid of a middle ground of null results. Randomized controlled trials have historically been thought to be immune to multiplicity, as rarely are hundreds or thousands of studies run on a single clinical question, but this may be shifting. There are now four critical considerations to examine regarding the relationship between multiplicity and oncology: the United States Food and Drug Administration (FDA) will approve drugs based on a single positive trial, even if the primary outcome is a surrogate endpoint and even if other trials are negative; drug approvals often generate enormous financial windfalls; pharmaceutical companies tend to conduct large, duplicative trials with little rationale; and the probability value (p-value) threshold is arbitrary. First, consider neratinib, the only drug ever approved in the adjuvant setting prior to the metastatic setting. Approval was based on a single placebo-controlled Phase III trial measuring invasive disease-free survival (iDFS) as a primary composite endpoint. The magnitude of benefit was small, with 5.1% and 1.3% improvements in 5-year iDFS rates in patients with hormone receptor-positive breast cancer who began therapy with trastuzumab less than 1 year or more than 1 year previously, respectively. Additionally, there are occasions when a medicine, such as adjuvant sunitinib in renal cancer, is approved despite the existence of a single negative trial alongside a single positive trial, thus ignoring the study portfolio. The second and third points may be coupled: approvals of cancer drugs are anticipated to yield billion-dollar profits, which encourages the conduct of duplicative studies in many tumour types despite weak evidence. Consider the genesis of the EVOLVE-1 study, which compared everolimus with placebo in patients with advanced hepatocellular carcinoma following sorafenib failure. The maximum tolerated dose and the disease control rate were tested in early Phase I and Phase I/II studies, respectively, which laid a relatively weak foundation for expediting the EVOLVE-1 trial rather than conducting a more conservative Phase II trial. Despite the negative outcome of the trial, one reason for taking such an enormous financial risk is that, despite the high upfront costs of conducting these large trials, a far larger financial incentive remains, namely drug approval if the trial is successful. However, the case of everolimus is just one example in the broader landscape. Consider that approximately 700 clinical studies were conducted in a single year for pembrolizumab, and that as more and more tumour types are evaluated, the risk/benefit profile of the drug deteriorates, as was shown during the development of sunitinib monotherapy.
Even with negative trials and worsening aggregate risk/benefit profiles, a drug approval's billion-dollar return greatly outweighs the initial expense of conducting million-dollar studies. Fourth, consider the most widely used statistical instrument, the p-value. If researchers run 100 trials to determine the effect of an inert drug on survival and assume a one-tailed significance threshold of p < 0.05, on average five trials will yield a false-positive result. This follows from the definition of the p-value—the probability of seeing this result, or a more extreme one, if the null hypothesis is assumed to be true. This threshold is an arbitrary line in the sand, although arguably a necessary one, and it is admittedly susceptible to misinterpretation. These concepts are illustrated in Figure 1. The left side represents a single, large pan-tumour RCT for a novel cancer therapy that was negative. In an analysis of prespecified subgroups, there were some tumours with positive results. Is it likely that a positive subgroup finding would result in FDA approval for a particular indication? The answer is no. The FDA would perform adjustments for multiplicity, and in the absence of that, the findings are, at most, hypothesis generating. Now consider that, instead of conducting a single RCT, several separate RCTs were conducted in numerous indications, approximating the mentioned subgroups. Some of these studies may be positive in the same subgroups, perhaps even by chance alone, but the overall portfolio may be the same. The difference is that now these findings will result in drug approval. The reality is that, although each of these studies in orange was performed independently, together they represent a trial portfolio. Both situations are philosophically equivalent, as they test a single hypothesis; on the left, the bias is clear, but on the right, positive trials appear distinct, and the portfolio is never assessed in aggregate. Because the trial portfolio is not considered in the present oncologic regulatory environment, multiplicity must be accounted for. One example that illustrates how statisticians and cancer doctors may view a question differently is also captured in Figure 1. Some statistical experts have suggested that meta-analyses be used for the scenario on the right of Figure 1, rather than multiplicity testing. However, this approach falls short of answering the pertinent clinical question because the drug is considered in aggregate across multiple tumour types, rather than identifying whether the drug works in one tumour type or another. A meta-analysis or pooled estimate addresses whether a drug is effective across tumour types in aggregate, whereas the cancer doctor asks in which tumour type the treatment is effective, a distinct question. Because a pooled estimate does not exclude the possibility of a single positive trial leading to drug approval, multiplicity adjustment is needed to answer the doctor's and patient's question. This scenario also illustrates the importance of content-specific experts guiding the framing of the statistical question. Combining all the key points above, businesses are now incentivized to test drugs with marginal benefits in as many indications as possible.
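A small simulation, with hypothetical trial sizes and a drug assumed to be truly inert, makes the arithmetic of an unadjusted portfolio concrete: roughly 5 of 100 null trials cross p < 0.05, whereas a family-wise (Bonferroni-style) adjustment applied to the portfolio as a whole removes almost all of these chance positives. The trial count, arm sizes, and test are assumptions for illustration only.

```python
# Illustrative simulation: an inert drug tested in many independent indications.
# Counting positives at p < 0.05 shows the expected false-positive burden of an
# unadjusted portfolio; a Bonferroni correction across the family removes most.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n_per_arm = 100, 300                # hypothetical portfolio and arm sizes

p_values = []
for _ in range(n_trials):
    control = rng.normal(loc=0.0, size=n_per_arm)   # no true treatment effect
    treated = rng.normal(loc=0.0, size=n_per_arm)
    # One-sided test of whether the treated arm outperforms control.
    t, p_two_sided = stats.ttest_ind(treated, control)
    p_values.append(p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2)

p_values = np.array(p_values)
print(f"positive at p < 0.05:               {(p_values < 0.05).sum()}")  # ~5 of 100
print(f"positive after Bonferroni (0.05/{n_trials}): "
      f"{(p_values < 0.05 / n_trials).sum()}")                            # usually 0
```

Note that a pooled meta-analytic estimate of these same 100 trials would instead average them into a single overall effect, answering whether the drug works across indications in aggregate rather than identifying the specific indication in which it appears to work.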
Consider that when a pharmaceutical firm develops a drug, all translational research costs have already been expended, leaving just the expense of additional trials. When companies consider this sunk cost, which requires no further investment in research and development but only the expense of the additional trial, it incentivizes the company to test the drug in every single tumour type, as many times as possible.
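The incentive can be expressed as a back-of-the-envelope expected-value calculation. Every figure below is an assumption chosen only to illustrate the asymmetry between incremental trial costs and post-approval revenue, not an estimate for any particular drug.

```python
# Hypothetical expected-value arithmetic for one additional registration trial
# once development costs are sunk. All figures are illustrative assumptions.
trial_cost = 40e6            # assumed incremental cost of one more Phase III trial
p_positive = 0.15            # assumed probability the trial reads out positive
revenue_if_approved = 2e9    # assumed revenue following approval in that indication

expected_value = p_positive * revenue_if_approved - trial_cost
print(f"expected value of one additional trial: ${expected_value / 1e6:.0f} million")
# With these assumptions the expected value is roughly +$260 million, so the
# sponsor profits on average even though most such trials are expected to fail.
```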
When combined with the low bar for drug approval, the considerable post-approval revenue, and a generous threshold of significance, pharmaceutical companies stand to profit enormously. Because these therapies are likely more effective than an inert substance, both true and false positives are obtained, which, when averaged, results in a highly profitable approach. We see this with immune checkpoint inhibitor trials: there are now thousands of studies of largely similar molecules, with massive duplication in the same or similar cancer settings, often yielding conflicting results.