The Case of the Anatomic Injury



It is not uncommon for the standard of care to be dictated by those who speak the loudest rather than a representative sample of best medical practice. This is certainly the case with the inclusion of whole-body CT scans in the initial management of patients presenting to the Emergency Department with traumatic complaints. Known colloquially as the trauma pan-scan, this global irradiation has become a significant part of the primary survey in many trauma centers across the US. For critically ill trauma patients in whom the clinical exam is unreliable, the introduction of the CT scan has been invaluable. What benefits do total body radiological evaluations provide for the clinically stable, cognitively present patient with minor injuries?

Recently published studies have promoted the pan-scan’s utility, claiming that those patients initially exposed to a total body irradiation fared significantly better than patients who underwent selective scanning as per the judgment of the treating physician (1,2,3,4).  The most notable of these studies was published in the Lancet in 2009 by Huber-Wagner et al (1). In this retrospective analysis the authors examined 4621 trauma patients enrolled in the German Trauma Society’s trauma registry from 2002-2004. The authors compared the mortality rates in those who received early empiric whole-body CT scans to those who received selective scanning. The authors utilized both the trauma and injury severity score (TRISS) and the revised injury severity classification (RISC), in an attempt to control for the retrospective, non-randomized nature of this trial. Both these scores attempt to predict the severity of injury using patient factors as well as anatomical injuries as defined by the abbreviated injury scale (AIS).

The authors found a significant reduction in mortality when compared to the value predicted by both the TRISS and RISC scores (ARR of 5.9% and 3.1% respectively), in the patients who received whole-body CT scans. Conversely the patients who received selective scanning were found to have no difference between their actual mortality and that predicted by the TRISS and RISC assessment scores. Furthermore when the authors attempted to control for age, hospital site, and date of presentation through logistic regression, whole-body CT scans remained a statistically significant predictor of reduced mortality (OR of 0.66, 95%-CI 0.50–0.86)(1).

Since the publication of this paper, a number of additional retrospective analyses have been conducted demonstrating similar results. The largest, again penned by Huber-Wagner et al, was a re-analysis of their trauma registry published in 2013 in PLoS One (3), which confirmed their initial findings in a much larger cohort. A 2014 meta-analysis found analogous results (4).

The causal inference these trials hope to make is that the whole-body CT in the undifferentiated, well-appearing trauma patient saves lives. There are of course significant methodological problems that prevent us from drawing such a causative conclusion. Due to each trial’s retrospective, non-randomized design, we are unable to truly assess the similarity of the groups we are comparing. As such the potential for the introduction of bias into the results is incredibly high.

Huber-Wagner et al employed a technique known as the standardized mortality ratio (SMR) in an attempt to compensate for this potential bias (1). This involves dividing the actual mortality by the predicted mortality. Ratios less than 1 denote an actual mortality that was lower than what was predicted. Of note, in this form of analysis, there is no direct comparison between whole-body CT and controls. The SMR assumes the validity of whatever prognostic scoring system was used to predict the acuity of the patients examined. In this case the authors used both the TRISS and the RISC scores. Both of these systems utilize the injury severity score (ISS) as a large component of the resulting score. Many authors have questioned the validity of the ISS scoring system (5). Most recently, a January 2016 article published in Annals of Emergency Medicine by Gupta et al described the phenomenon of ISS inflation due directly to the trauma pan-scan (6).
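The arithmetic behind the SMR can be sketched in a few lines of Python. This is a minimal illustration, not the registry's actual method: the function name and all of the numbers are hypothetical, and the predicted probabilities stand in for whatever a score such as TRISS or RISC would produce.

```python
def standardized_mortality_ratio(observed_deaths, predicted_probabilities):
    """Observed deaths divided by the deaths the prognostic score expected."""
    expected_deaths = sum(predicted_probabilities)
    return observed_deaths / expected_deaths


# Hypothetical cohort: 100 patients, 8 observed deaths.
# If the score predicts a 10% risk of death for each patient, 10 deaths are expected.
smr_as_scored = standardized_mortality_ratio(8, [0.10] * 100)

# Same patients, same 8 deaths, but the score now overstates each patient's
# risk (12.5%), for example because incidental findings raised the ISS.
smr_inflated = standardized_mortality_ratio(8, [0.125] * 100)

print(round(smr_as_scored, 2), round(smr_inflated, 2))
```

In this sketch the SMR falls from roughly 0.80 to 0.64 without a single additional life saved, which is why the ratio is only as trustworthy as the score that generates the denominator.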

The ISS scoring system is based purely on the anatomic definition of injury. Using the anatomic injury scores originally proposed in JAMA in 1971 as a means of assessing automotive-related injuries (7), each injury is assigned a numeric severity score (1-6) within one of six anatomic regions. The highest scores from the three most severely injured regions are then squared and summed to give the resulting ISS. The ISS ranges from 0 to 75, and any single injury scored as a 6 automatically results in an ISS of 75. Since this is a purely anatomically defined score, patients who undergo whole-body CT are far more likely to have clinically insignificant injuries detected by these unnecessary scans. Given the quadratic nature of the ISS, these findings can artificially inflate the resulting score. This shifts a healthier cohort into a more severe ISS level, creating the illusion of improved outcomes.
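A short Python sketch shows how quickly incidental findings move the score. It follows the conventional ISS definition (the highest AIS score in each of the three most severely injured body regions, squared and summed, with a maximum of 75). The patient, the region names, and the AIS values are hypothetical and chosen only for illustration.

```python
def injury_severity_score(ais_by_region):
    """Compute the ISS from the highest AIS score (0-6) in each body region."""
    scores = list(ais_by_region.values())
    # By convention, any unsurvivable injury (AIS 6) sets the ISS to 75.
    if any(score == 6 for score in scores):
        return 75
    # Square and sum the three most severely injured regions.
    top_three = sorted(scores, reverse=True)[:3]
    return sum(score * score for score in top_three)


# Hypothetical patient after selective imaging.
selective = {"chest": 2, "extremities": 2}

# Same patient after a pan-scan: one more rib fracture upgrades the chest
# score, and an incidental minor abdominal finding adds a third region.
pan_scan = {"chest": 3, "extremities": 2, "abdomen": 2}

print(injury_severity_score(selective), injury_severity_score(pan_scan))
```

Under these assumed values the same patient's ISS rises from 8 to 17, more than doubling on the strength of findings that change nothing about the patient's actual condition.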

To demonstrate the reality of this phenomenon, Gupta et al performed a secondary analysis of a trial originally published in Annals of Emergency Medicine in 2011. In the original cohort, also by Gupta et al, the authors examined blunt trauma patients presenting to an academic Emergency Department who warranted trauma team activation (8). After examining the patients, the attending Emergency Physician and Trauma Surgeon were asked what anatomic areas they would want to image for further diagnostic evaluation. Results of the whole body CT were compared to whether the imaging was desired by the Emergency Physicians and Trauma Surgeons respectively. An undesired scan was defined as a component of the whole body CT that either the Emergency Physician, Trauma Surgeon or both stated was an unnecessary part of the radiologic evaluation. The original study enrolled 701 patients, of whom 600 (86%) received a whole body CT. Of the total 2,615 scans, 992 were undesired by at least one attending physician. Among the 992 undesired scans, 3 (0.3%) revealed abnormalities that led to a predefined important intervention, though even these were questionable (for example, 9 rib fractures were identified on CXR and 10 were observed on chest CT) (8).

Gupta et al examined the subset of patients from their original cohort who underwent undesired imaging where non-critical injuries were discovered. Using these 92 patients, the authors calculated what their ISS would have been if these undesired scans were not obtained. When compared to their actual ISS, the authors found a 50% reduction in the ISS score (6). This amplification of the ISS is an obvious and potent confounder in any non-randomized cohorts that attempt to utilize trauma injury scores to account for variations in case-mix and acuity level.

In 2006 Salim et al published the results of a prospective trial examining the utility of the pan-scan in clinically evaluable trauma patients (9). The authors enrolled 592 patients with no outward signs of trauma, who were clinically evaluable on presentation. They claimed to have found a multitude of injuries on CT that would have otherwise gone undiagnosed. In fact the authors cite that these findings changed management in 18.9% of patients. These results are disquieting to say the least. That is until one actually reads what injuries were identified. Among the injuries the authors discovered were 26 cervical spine fractures and 89 rib fractures, all of which by their report were asymptomatic. This of course flies in the face of clinical experience and multiple cohorts documenting the reliability of physical exam in the clearance of the cervical spine and costal injuries (10,11,12,13,14). In the discussion section of the article, Dr. James Tyburski questions this very implausibility:

“… Regarding the CT scan of the cervical spine, there were 30 patients, or 5.1%, with fractures and/or dislocation subluxations in the mechanism group vs 24, or 5.9%, in the group that had an unreliable physical exam. This implies that the physical exam was basically useless in the evaluation of the cervical spine, so I want to be clear here. There were no symptoms, pain, or physical signs, tenderness or anything else like that, etc, in these 30 patients that were awake and fully evaluable? Can you comment on this lack of sensitivity in the evaluation of the cervical spine, as this would be at odds with several trauma care guidelines by other organizations?…

… Regarding the CT scans of the chest, there were 89 rib fractures in the mechanism group alone. And again this implies that none of these patients had pain or tenderness over these ribs. If they did, then that would imply possibly an abnormal physical exam, if it was a physical abdominal exam for their lower rib fractures. Can you comment about that?”

Dr. Salim’s response:

“In terms of pain in the cervical spine, we didn’t really address whether they had pain. We were just looking to see if they had any outward signs of trauma. Typically we are just looking at that patient who looks like they don’t have anything wrong. It is the typical patient that the ER physician just wants to send home from the ER.”

This exchange highlights the curious nature of the clinically evaluable patient. Essentially Salim et al examined the efficacy of whole-body CT as a costly and burdensome replacement for physical exam.

The implication when reviewing this literature is that well appearing, evaluable patients presenting to the Emergency Department may be harboring clinically occult, life threatening injuries undetectable by a standard physical exam. And yet this interpretation is based off methodologically flawed retrospective analyses and prospective data sets in which the physical exam was all but neglected. More importantly this ignores the multitude of clinical decision instruments, derived and validated from high quality prospective data, demonstrating that imaging can be avoided using simple components from a history and physical exam (10,11,12,13,14,15). Randomized data is required to truly assess whether the whole-body CT is beneficial in the workup of the undiagnosed trauma patient. One such trial is underway and its results are urgently needed (16). But even if it shows a small benefit in detecting clinically meaningless injuries we must ask ourselves, what is the cost? How many patients do we have to expose to harmful radiation to find one additional rib fracture? What portion of these injuries would remain occult during a reasonable period of observation? How does this overzealous imaging strategy affect the flow of the rest of the department? What about the remainder of our patients, the ones who may truly require an emergent CT who are continually bumped in favor of the endless trauma alerts? There is a cost to all our actions and it behooves us to consider them carefully. No matter who is yelling the loudest.


Sources Cited:


  1. Huber-Wagner, S., Lefering, R., Qvick, L.M., and Working Group on Polytrauma of the German Trauma Society. Effect of whole-body CT during trauma resuscitation on survival: a retrospective, multicentre study. Lancet. 2009; 373: 1455–1461
  2. Hutter M., Woltmann A., Hierholzer C., et al: Association between a single-pass whole-body computed tomography policy and survival after blunt major trauma: a retrospective cohort study. Scand J Trauma Resusc Emerg Med 2011; 19: pp. 73
  3. Huber-Wagner S, Biberthaler P, Haberle S, Wierer M, Dobritz M, Rummeny E, van Griensven M, Kanz KG, Lefering R: Whole-body CT in haemodynamically unstable severely injured patients – a retrospective, multicentre study. PLoS One 2013, 8(7):e68880.
  4. Caputo ND, Stahmer C, Lim G, Shah K. Whole-body computed tomographic scanning leads to better survival as opposed to selective scanning in trauma patients: a systematic review and meta-analysis. J Trauma Acute Care Surg. 2014;77(4):534-9.
  5. Champion, H.R. Trauma Scoring. Scandinavian Journal of Surgery March 2002 91 no. 1 12-22
  6. Gupta M, Gertz M, Schriger DL. Injury Severity Score Inflation Resulting From Pan-Computed Tomography in Patients With Blunt Trauma. Ann Emerg Med. 2016;67(1):71-75.e3.
  7. American Medical Association Committee on the Medical Aspects of Automotive Safety: Rating the severity of tissue damage: The abbreviated scale. JAMA 1971;215:277
  8. Gupta M., Schriger D.L., Hiatt J.R., et al: Selective use of computed tomography compared with routine whole body imaging in patients with blunt trauma. Ann Emerg Med 2011; 58: pp. 407-416.e15
  9. Salim A, Sangthong B, Martin M, Brown C, Plurad D, Demetriades D. Whole Body Imaging in Blunt Multisystem Trauma Patients Without Obvious Signs of Injury: Results of a Prospective Study. Arch Surg. 2006;141(5):468-475. doi:10.1001/archsurg.141.5.468.
  10. Hoffman JR, Mower WR, Wolfson AB, Todd KH, Zucker MI. Validity of a set of clinical criteria to rule out injury to the cervical spine in patients with blunt trauma. National Emergency X-Radiography Utilization Study Group. N Engl J Med. 2000;343(2):94-9.
  11. Stiell IG, Clement CM, Mcknight RD, et al. The Canadian C-spine rule versus the NEXUS low-risk criteria in patients with trauma. N Engl J Med. 2003;349(26):2510-8.
  12. Stiell IG, Clement CM, Rowe BH, et al. Comparison of the Canadian CT Head Rule and the New Orleans Criteria in patients with minor head injury. JAMA. 2005;294(12):1511-8.
  13. Rodriguez RM, Anglin D, Langdorf MI, et al. NEXUS chest: validation of a decision instrument for selective chest imaging in blunt trauma. JAMA Surg. 2013;148(10):940-6.
  14. Rodriguez RM, Langdorf MI, Nishijima D, et al. Derivation and validation of two decision instruments for selective chest CT in blunt trauma: a multicenter prospective observational study (NEXUS Chest CT). PLoS Med. 2015;12(10):e1001883.
  15. Rostas J, Cason B, Simmons J, Frotan MA, Brevard SB, Gonzalez RP. The validity of abdominal examination in blunt trauma patients with distracting injuries. J Trauma Acute Care Surg. 2015;78(6):1095-100.
  16. Sierink JC, Saltzherr TP, Beenen LF, et al. A multicenter, randomized controlled trial of immediate total-body CT scanning in trauma patients (REACT-2). BMC Emerg Med. 2012;12:4.

The Case of the Precise Inaccuracy


In the world of medical science we are often lulled into a false sense of security by large sample sizes and their correspondingly small confidence intervals. We often forget that such methodologic strengths augment only a trial’s precision, or the likelihood a similar trial will produce similar results. Such statistical robustness speaks little towards a trial’s accurate representation of the truth. A trial’s accuracy can only be revealed from the subjective nature of its design. This type of intrinsic bias can dramatically affect a trial’s results and yet is incapable of being measured by our traditional statistical instruments. As such we often interpret data far beyond the limits of its methodological borders. An unfortunate concession made out of necessity. What follows is full of such methodological leaps and physiologic fancies. Read at your own peril.

For some time we have existed under the belief that continual chest compressions are vital during the resuscitation of patients in cardiac arrest. We dogmatically cite the importance of compression fractions and peri-shock pauses despite the evidence supporting their value being far from robust. In fact, until recently little randomized control trial level data existed examining the value of continuous chest compressions during cardiac arrest.

Published in the NEJM by Nichol et al in December 2015, the ROC investigators sought to examine the true value of continuous chest compressions (2). What the authors did was methodologically impressive. They compared a continuous chest compression strategy to the 30:2 compression to breath ratio traditionally implemented in CPR algorithms. The authors performed a cluster randomized control trial where participating EMS agencies were randomly assigned to a period of either continuous chest compression with asynchronous breaths or 30:2 ratio. Twice per year each agency was switched over to the other resuscitation strategy. Prior to enrollment each EMS agency was required to undergo a quality assessment period and was only allowed to enroll patients in the trial if their EMS services were able to adhere to treatment protocol as assigned, demonstrate proficiency in data entry, and prove themselves capable of appropriate CPR-process measures.

Patients enrolled in the continuous chest compression group received continuous compressions at a rate of 100 per minute with asynchronous breaths delivered at a rate of 10 breaths per minute. Patients treated by agencies assigned to the 30:2 ratio received 30 compressions followed by 2 ventilations delivered over a pause of no greater than 5 seconds. Both of these protocols were performed during the first 6 minutes of arrest, at which point an advanced airway was obtained and all patients received continuous compressions with asynchronous breaths at a rate of 10 breaths per minute. An important note: though the authors monitored the quality of CPR delivered by all providers to ensure protocol adherence, no means were established to ensure proper ventilatory support.

Even just a brief glance at this trial’s methodology reveals the colossal effort required to design, plan and institute a trial such as this. The authors enrolled 26,148 patients, of whom 12,613 received continuous chest compressions and 11,035 received intermittent compressions during the active study period of the trial. Despite the imperfections of a cluster randomized trial design, by all appearances the two groups were fairly well balanced.

The authors found no difference in their primary outcome, survival to hospital discharge, between the continuous chest compression and intermittent compression groups (9.0% vs 9.7%). In fact the authors failed to find a difference in 24-hour survival, the number of patients discharged home, or the number of patients who were alive and neurologically intact at hospital discharge.

Only 43% of patients were treated according to the protocol to which they were assigned. In the patients who actually received the appropriate protocolized resuscitation, those who were assigned to the intermittent group fared noticeably better. Survival to hospital discharge was 7.8% in the continuous group vs 9.8% in the intermittent group, a 2% absolute difference (95% CI −2.9 to −1.1, p < 0.001).

Does this mean that our decade-long focus on quality chest compressions and avoidance of interruptions has been futile? Has this massive endeavor by Nichol et al done nothing more than once again encourage a phlegmatic approach to intra-arrest management? Despite its methodological rigor and complete lack of treatment effect, this trial is far less damning than it initially appears. The 0.7% absolute difference was surrounded by an equally small confidence interval, −1.5% to 0.1%. And yet when interpreting these results it is important to remember the meaning of such a statistical measure. A 95% confidence interval is often misinterpreted to represent the range of values between which the truth is likely to exist. In reality a confidence interval is incapable of divining such an answer. If a trial were replicated an infinite number of times, 95% of the confidence intervals generated would contain the value that the trial, as designed, is estimating. That value corresponds to the truth only if the design is free of systematic error. As such, the confidence interval is incapable of assessing any intrinsic bias built into a trial’s design. Furthermore, such statistical manipulations cannot assess the magnitude to which these biases potentially impact a trial’s results (1).
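The gap between precision and accuracy can be illustrated with a small simulation. The Python sketch below is built entirely on hypothetical numbers: a true event rate of 10%, a trial design whose systematic error makes it measure 13% instead, and a simple normal-approximation (Wald) interval. None of these values come from the Nichol et al trial.

```python
import math
import random


def coverage(true_rate, measured_rate, n_patients, n_trials, seed=0):
    """Repeat a biased trial many times and count how often its 95% CI
    contains (a) the truth and (b) the value the design actually measures."""
    rng = random.Random(seed)
    covers_truth = 0
    covers_measured = 0
    for _ in range(n_trials):
        # Each simulated patient has an event with the biased probability.
        events = sum(1 for _ in range(n_patients) if rng.random() < measured_rate)
        p_hat = events / n_patients
        half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_patients)
        low, high = p_hat - half_width, p_hat + half_width
        covers_truth += low <= true_rate <= high
        covers_measured += low <= measured_rate <= high
    return covers_truth / n_trials, covers_measured / n_trials


truth_cov, measured_cov = coverage(0.10, 0.13, n_patients=2000, n_trials=500)
print(f"intervals containing the truth:          {truth_cov:.2f}")
print(f"intervals containing the measured value: {measured_cov:.2f}")
```

Under these assumptions roughly 95% of the intervals contain the biased value the design is measuring, while only a small fraction contain the truth. Enlarging the sample narrows every interval and makes the second number worse, not better.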

Ideally in a randomized trial design the goal is to obtain two identical cohorts with the only difference between the groups being the variable in question. In this case Nichol et al hoped to examine two different compression strategies. But in reality how different were these strategies? When you look at the compression fraction in the two groups they differ very little (0.83 and 0.77 respectively). Furthermore the peri-shock pause was identical, as were compression depth, use of ACLS medications, and utilization of post-arrest treatments such as targeted temperature management and coronary revascularization. One could argue that this trial was essentially an examination of two very similar compression strategies that diverged mostly in name alone. In truth, the most prominent difference between these two strategies may have been how the patients received positive pressure ventilation. The Nichol et al trial should not be viewed as an absence of evidence suggesting the importance of continuous chest compressions but rather the importance of the absence…

… of positive pressure ventilation.

In 2008 Bobrow et al published their experience using minimally interrupted compressions in the pre-hospital environment in Arizona (3). Their findings, published in JAMA, described a protocol, entitled minimally interrupted cardiac resuscitation (MICR), in which the participating EMS agencies were trained to perform 200 compressions over two minutes, followed by a rhythm analysis and defibrillation when appropriate, followed by another 200 compressions before checking for a pulse. This protocol was repeated for the first 6 minutes of CPR. In contrast to the Nichol et al trial, the importance of airway management was significantly devalued. Although BVM ventilation at a rate of 8 breaths per minute was permitted, providers were encouraged to provide only passive oxygenation through a nasal cannula.

Bobrow et al found that using a MICR strategy increased overall survival from 1.8% to 5.4%. In patients with witnessed ventricular fibrillation arrests, survival increased from 4.7% to 17.6%. Studies examining compression-only CPR compared to standard AHA guideline CPR performed by bystanders in OHCA demonstrated similar improvements (4,5). The before-and-after study design in the Bobrow et al trial limits the causative conclusions that can be drawn, especially given that, although protocol compliance remained consistently high over time, survival dropped close to baseline by the end of the 18-month observation period. Nonetheless this evidence lends support to the hypothesis that positive pressure ventilation can be detrimental in the early stages of cardiac arrest.

Although Nichol et al were fastidious in their effort to monitor compression quality during their study period, they provided no means to account for the rate at which patients were ventilated. We know from previous work by Aufderheide et al that respiratory rates during cardiac arrest frequently far exceed the recommended 10 breaths per minute (6). In this prospective study where rescuers were observed during OHCA, the authors found that the average ventilation rate was 30 breaths per minute and that positive pressure was recorded in the thoracic cavity 47.3% of the time. Milander et al reported almost identical findings. In this cohort of in-hospital cardiac arrests, ventilations were given at a mean rate of 37 breaths per minute (ranging 24-60 breaths per minute) (7). This tendency toward hyperventilation may in fact be the reason that continuous chest compressions failed to demonstrate superiority when compared to an intermittent strategy and why, in the per protocol analysis, the intermittent protocol demonstrated a statistically significant survival benefit compared to the continuous approach. In patients who were treated using the intermittent protocol, the 30:2 ratio clearly limits the number of positive pressure breaths a patient can receive per minute. If rescuers are compressing at a rate of 100 compressions per minute, each set of 30 compressions takes 18 seconds, so even with no time allotted for the ventilation pause the maximal number of breaths that can be delivered is 6-7 per minute. Conversely, since in the continuous compression group positive pressure ventilations were delivered asynchronously to the circulatory effort, with no feedback system limiting overzealous bagging, it is likely their delivery far outpaced the recommended 10 breaths per minute.
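The ceiling that a 30:2 ratio places on ventilation is simple arithmetic, sketched below in Python. The function and its parameter names are hypothetical. The compression rate and the 5-second maximum pause are taken from the protocol described earlier.

```python
def max_breaths_per_minute(compression_rate=100, compressions_per_cycle=30,
                           breaths_per_cycle=2, pause_seconds=0.0):
    """Upper bound on breaths per minute under a compression:ventilation ratio."""
    # Time to deliver one set of compressions, plus the pause for ventilation.
    cycle_seconds = compressions_per_cycle / compression_rate * 60 + pause_seconds
    cycles_per_minute = 60 / cycle_seconds
    return breaths_per_cycle * cycles_per_minute


# Best case: the two breaths take no time at all.
print(round(max_breaths_per_minute(), 1))

# With the full 5-second pause allowed for ventilation in each cycle.
print(round(max_breaths_per_minute(pause_seconds=5), 1))
```

With no pause the ceiling is about 6.7 breaths per minute, and with the full 5-second pause it falls to about 5.2. Either way the ratio itself caps ventilation well below the 30 or more breaths per minute observed when bagging is unconstrained.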

The Nichol et al trial was a mammoth effort, which found no difference in efficacy between two resuscitative strategies. Despite its statistical confidence it would be unwise to make the evidentiary leap to state that these results devalue the importance of continuous chest compressions. They may suggest that the manner in which you deliver chest compressions does not matter as long as the compression fraction is high. They also suggest that the manner in which positive pressure ventilation is applied during these resuscitations is likely of vital importance. Unfortunately because of the ventilatory strategy used by Nichol et al, the negative results cannot be applied to the use of a MICR strategy as proposed by Bobrow et al. To answer that question, another trial of immense proportions specifically examining a MICR protocol, which de-emphasizes the importance of early airway management, is required. A daunting and exhaustive task.

Sources Cited:

  1. Altman DG, Bland JM. Uncertainty beyond sampling error. BMJ. 2014;349:g7065.
  2. Nichol G, Leroux B, Wang H, et al. Trial of continuous or interrupted chest compressions during CPR. N Engl J Med 2015;373:2203-14.
  3. Bobrow BJ, Clark LL, Ewy GA, et al. Minimally interrupted cardiac resuscitation by emergency medical services for out-of-hospital cardiac arrest. JAMA. 2008;299(10):1158-65.
  4. Hallstrom A, Cobb L, Johnson E, Copass M. Cardiopulmonary resuscitation by chest compression alone or with mouth-to-mouth ventilation. N Engl J Med. 2000;342:1546–1553.
  5. Bobrow BJ, Spaite DW, Berg RA, et al. Chest compression-only CPR by lay rescuers and survival from out-of-hospital cardiac arrest. JAMA. 2010;304(13):1447-54.
  6. Aufderheide TP, Sigurdsson G, Pirrallo RG, Yannopoulos D, McKnite S, von Briesen C, Sparks CW, Conrad CJ, Provo TA, Lurie KG. Hyperventilation-induced hypotension during cardiopulmonary resuscitation. Circulation. 2004;109:1960–1965.
  7. Milander MM, Hiscok PS, Sanders AB, Kern KB, Berg RA, Ewy GA. Chest compression and ventilation rates during cardiopulmonary resuscitation: the effects of audible tone guidance. Acad Emerg Med. 1995;2:708–713.




The Adventure of the Cardboard Box Revisited

Meta-analyses function under the assumption that the summation of data from multiple sources is a more accurate estimate of the true effect size than any one individual trial. And yet sometimes such statistical endeavors serve only to add dirt to the already muddy water. Such is the case with the recent trials examining endovascular therapy for acute ischemic stroke. Prior to the publication of MR CLEAN and its band of statistically significant misfits, the data regarding endovascular therapy had been consistently negative. Over the past year five RCTs examining endovascular therapy for acute ischemic stroke have been published. In direct contrast to the three trials published in 2013, all of the recent trials were impressively positive (1,2,3,4,5). So much so that the AHA, in their recently updated guidelines, recommended the use of endovascular therapy in a population far broader than any of the studies examined. Even more concerning was their whole-hearted support of the development of a regionalized system capable of instituting the use of endovascular therapy at a national level (6), stating:

Regional systems of stroke care should be developed. These should consist of:

 (a) Healthcare facilities that provide initial emergency care including administration of intravenous r-tPA, including primary stroke centers, comprehensive stroke centers, and other facilities.

(b) Centers capable of performing endovascular stroke treatment with comprehensive periprocedural care, including comprehensive stroke centers and other healthcare facilities, to which rapid transport can be arranged when appropriate (Class I; Level of Evidence A). (Revised from the 2013 guideline)

They go on to say:

It may be useful for primary stroke centers and other healthcare facilities that provide initial emergency care including administration of intravenous r-tPA to develop the capability of performing emergency noninvasive intracranial vascular imaging to most appropriately select patients for transfer for endovascular intervention and reduce time to endovascular treatment (Class IIb; Level of Evidence C). (Revised from the 2013 guideline)

Even the biggest cynics must concede there is a signal of benefit demonstrated throughout the recent trials examining endovascular therapy for acute ischemic stroke. How much of this is due to the true effect of the treatment in question, and how much is in fact due to statistical noise is far more difficult to discern. Due to a lack of both blinding and equipoise of the trialists and the premature stoppage of all four trials following MR CLEAN, the data is likely to be a distortion of reality (7). And yet without a clear understanding of this effect size, it is difficult to assess whether this benefit justifies the resources required to support its implementation on a national level.

In an attempt to clarify this benefit, Badhiwala et al conducted a systematic review and meta-analysis of RCTs examining the efficacy of endovascular treatment for acute ischemic stroke (8). Published in JAMA, the analysis included the five most recent positive trials, as well as the three negative trials published in the NEJM in February of 2013. The primary outcome was the frequently used and widely misunderstood ordinal analysis of functional status (mRS) at 90 days. The authors also examined a dichotomous outcome of the percent of patients alive and independent (mRS of 0-2) at 90 days, rate of revascularization at 24 hours, symptomatic intracranial hemorrhage, and all-cause 90-day mortality (8).

In their pooled analysis, the authors found a shift towards improved functional outcomes at 90 days in the patients randomized to receive endovascular therapy when compared to standard care. This difference demonstrated an odds ratio of 1.56 (95% CI 1.14–2.13; p = 0.005), which translates into a 12% absolute increase in the number of patients alive and independent at 90 days (3.8%-20.3%; p = 0.005). Not surprisingly, patients randomized to the endovascular arm had a significantly higher rate of revascularization at 24 hours when compared to the patients in the standard care group (75.8% vs. 34.1%). No difference was observed in the 90-day mortality (15.8% vs. 17.8%) or the rate of symptomatic intracranial hemorrhage (5.7% vs. 5.1%) (8).

As an interesting side note, one of the subgroups the authors examined in their secondary analysis was defined by time to randomization. Specifically they looked at time from symptom onset to randomization, examining the effect size of endovascular therapy as compared to standard care depending on whether patients were randomized before or after three hours from symptom onset. Temporality did not seem to affect outcomes (8), once again calling into question the “time is brain” mantra so frequently proclaimed.

Although these results are positive, they are a far cry from the astonishing outcomes reported in the four trials released in the wake of the publication of MR CLEAN. Each of these trials was stopped prematurely because of an unplanned interim analysis provoked by the positive results found in MR CLEAN. And though it is unlikely that four trials would all demonstrate positive results due to random chance, it is well known that trials stopped early for benefit will demonstrate significantly more optimistic results than the intervention’s true effect size (6). In fact, the Badhiwala meta-analysis demonstrates a clear inverse association between the magnitude of benefit and the size of the sample at termination (8). And so the question becomes, is it appropriate to pool this data at all?

To answer that question one has to ask, what is the goal of a meta-analysis such as this in the first place? The assumed benefit to performing a meta-analysis is that the summation of these data sets provides a more accurate description of the true effect size than each individual data set can provide. This supposition rests on the notion that all the studies included in the meta-analysis are examining the same study population, and that the variance of results is due to random errors in sampling. This is what is known as a “fixed effect” model. Unfortunately most data is not so homogeneous, and it is common for the variation observed between trials to be due to more than just random error, but to considerable differences in the populations being compared. In such cases, the results of a direct-pooled analysis will likely deviate from reality. Statistical models that attempt to account for these random deviations should be utilized. These are known as “random-effect” models (9).
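To make the distinction concrete, here is a minimal Python sketch of the two pooling approaches. It assumes the common inverse-variance method for the fixed-effect estimate and the DerSimonian-Laird method for the random-effects estimate; nothing in this post confirms that these are the exact methods Badhiwala et al used. The log odds ratios and variances are hypothetical numbers chosen for illustration, not data from any of the trials discussed.

```python
import math


def pool_fixed_and_random(effects, variances):
    """Pool per-trial effects (e.g. log odds ratios) two ways.

    Returns (fixed_estimate, random_estimate, tau_squared), where
    tau_squared is the DerSimonian-Laird between-trial variance.
    """
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add the between-trial variance to each trial
    random_weights = [1.0 / (v + tau_squared) for v in variances]
    random_est = sum(w * y for w, y in zip(random_weights, effects)) / sum(random_weights)
    return fixed, random_est, tau_squared


# Hypothetical trials: small, imprecise trials report the largest effects
log_odds_ratios = [0.10, 0.45, 0.90, 1.20]
variances = [0.02, 0.04, 0.06, 0.10]

fixed, random_est, tau_squared = pool_fixed_and_random(log_odds_ratios, variances)
print("fixed-effect OR: ", round(math.exp(fixed), 2))
print("random-effects OR:", round(math.exp(random_est), 2))
print("tau squared:      ", round(tau_squared, 3))
```

When the trials disagree, the random-effects model spreads the weight more evenly, so the smaller trials pull the pooled estimate further than they would under a fixed-effect model.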

In the Badhiwala et al meta-analysis the heterogeneity of the results between the trials examined was high and the authors correctly utilized a random-effect model. The authors used the I2 index to assess the degree of variation between studies. I2 describes the extent of variation across trials that cannot be explained by random chance. An I2 score of 0.0 implies all of the variation observed between trials can be accounted for by random errors in sampling. Conversely if the I2 is 75, only 25% of the variation can be accounted for by sampling error with the remaining variation (75%) due to heterogeneity between trials (10). In the Badhiwala et al meta-analysis the I2 = 75.4, confirming that the authors are likely attempting to pool trials with very different populations. Whether this variation is due to methodological differences or the bias that accompanies premature stoppage is unclear. Applying a statistical value to this uncertainty does not legitimize it.
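For readers who want to see where such a number comes from, the following is a short Python sketch of the I2 calculation as described by Higgins et al (10): Cochran's Q is compared with its degrees of freedom, and the excess is expressed as a percentage of Q. The effect estimates and variances are hypothetical and are not taken from the Badhiwala data.

```python
def i_squared(effects, variances):
    """Return I2 (as a percentage) for a set of trial effect estimates."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: total weighted variation around the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1  # variation expected from sampling error alone

    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios from four trials and their variances
heterogeneous = i_squared([0.10, 0.45, 0.90, 1.20], [0.02, 0.04, 0.06, 0.10])
homogeneous = i_squared([0.50, 0.52, 0.48, 0.51], [0.02, 0.04, 0.06, 0.10])

print("I2, divergent trials: ", round(heterogeneous, 1))
print("I2, consistent trials:", round(homogeneous, 1))
```

The statistic quantifies how much the trials disagree. It says nothing about why they disagree.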

Despite its methodologic rigor, Badhiwala et al’s meta-analysis brings us no closer to certitude. It serves to place an objective number on the current ambiguous state of the data concerning endovascular therapy for acute ischemic stroke. But the inherent value of its statistical manipulations in a pooled data set is unclear. This analysis provides little utility over our unstructured judgment of each respective trial’s importance, while validating our suspicion that these trials are examining very different populations. By combining these trials, Badhiwala et al have attempted to augment statistical power in a dataset that already boasts effect sizes well below statistical significance, when what is truly required is a clinical homogeneity that no amount of statistical manipulation can supplant. I am sure in the coming months and years this study will be cited in vain, used to champion the fortification of endovascular therapy’s place in the ivory tower of medicine. What will be remembered are odds ratios and p-values, while the true meaning will be forgotten: that what we have is a heterogeneous data set, ultimately providing more questions than answers.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372(1):11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Saver JL, Goyal M, Bonafe A, et al. Stent-Retriever Thrombectomy after Intravenous t-PA vs. t-PA Alone in Stroke. N Engl J Med. 2015.
  5. Jovin TG, Chamorro A, Cobo E, et al. Thrombectomy within 8 Hours after Symptom Onset in Ischemic Stroke. N Engl J Med. 2015.
  6. Powers WJ, Derdeyn CP, Biller J, et al. 2015 American Heart Association/American Stroke Association Focused Update of the 2013 Guidelines for the Early Management of Patients With Acute Ischemic Stroke Regarding Endovascular Treatment: A Guideline for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke. 2015;46(10):3020-35.
  7. Bassler D, Briel M, Montori VM, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010;303(12):1180-7.
  8. Badhiwala JH, et al. Endovascular Thrombectomy for Acute Ischemic Stroke: A Meta-analysis. JAMA. 2015;314(17):1832-1843.
  9. Cornell JE, Mulrow CD, Localio R, et al. Random-effects meta-analysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160(4):267-70.
  10. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557-60.


The Case of the Blind Allocator


In the modern world of evidence based medicine we exist in a perpetual state of doubt, continually attempting to perceive truths through the veil of science. Far too often our sample cohort deviates from the population it intends to represent. Hypothesis testing and frequentist statistics are tools intended to quantify the extent to which the observed results are due to random errors in sampling. And yet, there is an entirely different type of error that our statistical instruments are far less adept at appraising. This non-random form of error comes in the form of bias. This post will explore a number of common forms of bias and their extensive effects on data.

Despite the long standing belief that central venous catheters (CVC) placed in the femoral vein are at increased risk for catheter-related blood stream infections (CRBI), recent evidence has suggested that in the modern era of sterile insertion practices, the rate of line infections due to femoral catheter placement is no greater than cannulation of either the internal jugular (IJ) or subclavian (SC) veins.

In 2012, Marik et al published a paper in Critical Care Medicine with the intention of demonstrating this very assertion (1), conducting both a systematic review and meta-analysis of the existing data comparing the rates of CRBIs associated with each respective insertion site. The authors examined data from 17,376 central catheter insertions in 10 publications and concluded there was no difference in the rate of CRBI between the femoral inserted lines and their cephalad comparators. The relative risk cited was 1.02 (95% CI 0.64–1.65, p = .92) for femoral compared to SC, and 1.35 (95% CI 0.84–2.19, p = .2) when compared to IJ. Over 17,000 observed catheter insertions demonstrated no statistically significant difference in the rate of CRBIs between the various catheter insertion sites. And yet, despite the robust nature of its sample size, the validity of this meta-analysis has been questioned, mostly due to the quality of the underlying data. Only 1,006 catheters, a small fraction of the total in this analysis, came from RCT data. The majority of these originated from a single trial examining emergent dialysis catheters placed in either the femoral or internal jugular vein, which found no significant difference in the rate of central line infection between these two sites (10). But it is unclear if dialysis catheters, which are kept impeccably clean and accessed only for dialysis, translate to the heavily exploited standard CVCs used in the critically ill.

Observational cohorts, totaling 16,370 catheters, accounted for the remainder of the data in the Marik et al meta-analysis. When comparing outcomes between groups, observational data presents a number of methodological problems. In this instance, the location of catheter placement was not randomly assigned, leading to an immense potential for selection bias, as the factors that determined site of cannulation may directly influence the likelihood that the catheter becomes infected. For example, in patients with severe respiratory distress, the cannulation of the SC vein may be avoided due to a fear of causing a pneumothorax. This leads to the placement of IJ and femoral catheters in a sicker subset of patients who are, in turn, at a greater risk of infection. Additionally, due to the pre-existing bias of many clinicians, femoral lines may have been removed earlier than either IJ or SC lines. The risk of central line infection is directly related to its time in situ, and thus their abbreviated use may underestimate the true risk of infection associated with femoral venous cannulation.

To further complicate matters, Marik et al eliminated two large trials from their analysis claiming they were statistical outliers (1). Although such a deletion may be statistically appropriate, the redacted trials demonstrated a far higher rate of line infections when the femoral site was utilized (2,3). When these trials are included in the analysis, the difference in the rate of CRBIs between the femoral and IJ insertion sites becomes statistically significant.

Essentially, there are too many confounding variables to be able to clearly interpret the data utilized in the Marik et al meta-analysis. Mathematical manipulations of this data, in the form of regression analyses, do not clarify the matter. This type of error is difficult to correct through statistical modeling and can only truly be controlled using randomization. Randomization accounts for confounding variables by randomly distributing them amongst the study arms. When implemented correctly, one may assume the observed differences are caused by the treatment effect in question.

A recent trial published in the NEJM sought to do just that. Parienti et al examined 3,471 catheter insertions in 3,027 patients in ten ICUs throughout France (4). Lines were inserted by “experienced” house staff, each required to have at least 50 previous line insertions. All lines were inserted using strict sterile precautions and Seldinger technique, though the use of ultrasound guidance was left to the inclination of the clinician performing the procedure. Patients were enrolled if the treating physician determined that at least two of the three sites (IJ, SC, or femoral) were appropriate for cannulation, at which point the patient was randomized to site.

The authors found a significant difference in their primary outcome, the rate of catheter-related infections and symptomatic deep-vein thrombosis, between the patients randomized to SC line placement and those randomized to either IJ or femoral placement. Overall there were 8, 20 and 22 events in the SC, IJ and femoral sites respectively, which translates to 1.5, 3.6 and 4.6 events per 1000 catheter-days respectively. This was offset by an almost identical increase in the rate of mechanical complications (arterial injury, hematoma, pneumothorax or other) observed in patients randomized to the SC insertion site when compared to both the IJ and femoral groups (2.1%, 1.4% and 0.7% for SC, IJ and femoral respectively) (4). This difference was made up entirely of an increase in the rate of pneumothoraxes observed in the SC group. And yet despite the randomized nature of this trial, the methodology utilized by Parienti et al makes interpretation less than straightforward.

As discussed, the major flaw in the Marik et al meta-analysis was the fact that the majority of the data was obtained from non-randomized cohort data, making it extremely difficult to account for the confounding variables that might have influenced site selection. Ideally randomization should eliminate these biases. Unfortunately, because of a number of methodological concerns, the Parienti et al trial failed to control for bias as well as we would have hoped.

For randomization to be valid, it is vital the participating clinicians are not aware of patient group assignment prior to randomization. This is what is called allocation concealment. Prior knowledge of such events will lead to a selection bias, as there is a tendency for clinicians to exclude certain patients based on their own beliefs regarding the validity of the treatments being examined (5,6). For example, a patient with severe respiratory distress may not be enrolled in the trial if the physician had prior knowledge that the patient would be randomized to SC site insertion, primarily due to potential for pneumothorax. Improper allocation concealment will exclude a certain subset of patients and produce results that systematically deviate from reality (6). Although Parienti et al did attempt to conceal allocation prior to randomization by the utilization of a permuted-block randomization with varying block sizes, they allowed the treating physicians to exclude one site prior to randomization, if it was deemed not suitable for clinical use. This allowance was probably unavoidable, as it is not uncommon for one or more vessels to be inaccessible in clinical practice, but this concession allows for the introduction of the very selection bias we were hoping to avoid through randomization (6).
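For those unfamiliar with the technique, permuted-block randomization with varying block sizes can be sketched in a few lines of Python. This is an illustration of the general idea only: the arm names, block sizes and seed below are assumptions made for the example and are not the actual scheme used by Parienti et al.

```python
import random


def permuted_block_sequence(n_allocations, arms, block_multiples=(1, 2), seed=None):
    """Build an allocation sequence from shuffled, balanced blocks.

    Each block contains every arm the same number of times; the block
    size varies at random so the next assignment is harder to guess.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_allocations:
        multiple = rng.choice(block_multiples)   # e.g. blocks of 3 or 6 for 3 arms
        block = list(arms) * multiple
        rng.shuffle(block)                       # random order within the block
        sequence.extend(block)
    return sequence[:n_allocations]


# Hypothetical three-arm example
arms = ("subclavian", "jugular", "femoral")
allocations = permuted_block_sequence(12, arms, seed=2015)
for number, site in enumerate(allocations, start=1):
    print(number, site)
```

Varying the block size protects the sequence from being deciphered. It does nothing about a clinician removing a site from consideration before the sequence is ever consulted.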

Of the 3,471 catheters placed, 2,532 (72.9%) were placed in patients in whom all three sites were deemed accessible. This leaves 939 catheters (a little more than 25%) that were placed in patients in whom the treating clinician had eliminated one site prior to randomization. The majority of these exclusions (570) were of the SC site, because the treating physician felt the risk of pneumothorax or bleeding was unacceptably high. Another 277 of the exclusions were of the femoral site, 45% because of “site contamination”. These exclusions potentially prevented the highest risk patients from being randomized into the SC and femoral insertion sites, leading to the very type of bias found in the observational data in the Marik et al meta-analysis that we hoped to eliminate.

A further source of bias in the Parienti trial can be traced to its inability to blind practitioners to the treatment group after allocation. For obvious reasons such blinding would have been unfeasible in a trial such as this, but it does allow for the introduction of yet another source of bias. When RCTs lack adequate blinding, the risk of ascertainment bias is prominent. Ascertainment bias is the systematic, non-random distortion of the measurement of the true frequency of an event because of the investigator’s knowledge and assumptions of the group allocation (7). In this case, patients randomized to the femoral site had their CVC in place for a significantly shorter time than patients randomized to either the SC or IJ sites (mean catheter days approximately 5.9 +/- 4.8 for femoral and 6.5 +/- 5.4 for IJ and SC). Since risk of infection is directly related to the duration of catheterization, this difference could potentially skew the results in favor of the femoral site.

The authors attempt to control for these confounders through the use of regression analysis and analyzing catheter events per catheter day, rather than per insertion. Just as we discussed regarding the Marik et al meta-analysis, these types of statistical compensations cannot support such methodological frailties.

Despite the trial’s flaws, Parienti et al have gathered the largest, most complete data set in existence addressing the complication rate of CVC insertion. I suspect their results are as close an approximation of the truth as we currently have. As such, if we are willing to accept the slight increase in the rate of pneumothorax, the SC vein may be the preferred initial option for central venous cannulation, with the caveat that the true pneumothorax rate might be higher than observed due to the large number of exclusions prior to randomization (4).

So often in the interpretation and translation of medical literature we find ourselves lost in the statistical minutiae, citing p-values and confidence intervals as if they hold intrinsic value. And yet these statistical manipulations are for the most part concerned with quantifying the extent the results observed are due to random chance. Their mathematical constructs cannot account for the non-random error caused by methodologic missteps. Collecting data in the face of these flaws and attaching a statistical judgment to the results does nothing to legitimize its validity.

Sources Cited:

  1. Marik PE, Flemmer M, Harrison W. The risk of catheter-related bloodstream infection with femoral venous catheters as compared to subclavian and internal jugular venous catheters: a systematic review of the literature and meta-analysis. Crit Care Med. 2012;40(8):2479-85.
  2. Lorente L, Henry C, Martín MM, et al: Central venous catheter-related infection in a prospective and observational study of 2,595 catheters. Crit Care 2005; 9:R631–R635.
  3. Nagashima G, Kikuchi T, Tsuyuzaki H, et al: To reduce catheter-related bloodstream infections: Is the subclavian route better than the jugular route for central venous catheterization? J Infect Chemother 2006; 12:363–365
  4. Parienti JJ, Mongardon N, Mégarbane B, et al. Intravascular Complications of Central Venous Catheterization by Insertion Site. N Engl J Med. 2015;373(13):1220-9.
  5. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet. 2002;359(9305):515-9.
  6. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet. 2002;359(9306):614-8.
  7. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got what. Lancet. 2002;359(9307):696-700.
  8. Altman DG, Bland JM. Uncertainty and sampling error. BMJ. 2014;349:g7064.
  9. Altman DG, Bland JM. Uncertainty beyond sampling error. BMJ. 2014;349:g7065.
  10. Parienti JJ, Thirion M, Mégarbane B, et al; Members of the Cathedia Study Group: Femoral vs jugular venous catheterization and risk of nosocomial events in adults requiring acute renal replacement therapy: A randomized controlled trial. JAMA 2008; 299:2413–2422





The Case of Dubious Squire Continues

In the era before the ubiquitous use of bedside ultrasound, BNP and its derivative natriuretic peptides were, at best, a mediocre test that added little to clinical judgment. In today’s world of sonographic abundance, they simply add noise to our already deafening workflow.

Despite a wealth of evidence demonstrating natriuretic peptides’ lack of clinical utility, their use has become an abundant and reflexive component in the workup of suspected acute decompensated heart failure. While consistently failing to adequately lend diagnostic guidance in patients where clinical uncertainty is present, in the eyes of many, natriuretic peptides have remained a viable diagnostic pathway, simply for lack of a better option.

In a recent publication by Pivetta et al in CHEST (1), the authors remind us that when presented with a diagnostic question it is important to select a test capable of providing the answer. The authors enrolled 1,005 patients presenting to the Emergency Department with acute dyspnea. Patients were excluded if they had an obvious cause of symptoms clearly unrelated to acute decompensated heart failure (e.g., trauma), or if there was no Emergency Physician present with ultrasound expertise (defined as > 40 completed scans). Patients underwent a standardized workup including history, physical exam, EKG and arterial blood gas (ABG), after which the Emergency Physician was asked to categorize the presentation as acute decompensated heart failure or non-cardiac in origin. After this, they performed a standardized point of care ultrasound (POCUS) examination that consisted of a 6-zone scanning protocol. Diffuse interstitial syndrome (DIS) was defined as the presence of two or more zones with three or more B-lines on bilateral lung fields. The final diagnosis was determined by a review of each patient’s hospital course performed by an Emergency Physician and Cardiologist, who were blinded to the POCUS findings (1).

Of the 1,005 patients enrolled, 463 patients (46%) were given the final diagnosis of acute decompensated heart failure. The agreement of the two physicians determining this gold standard was excellent, only disagreeing on 3.5% of the cases. The treating physician’s ability to clinically differentiate cardiac from non-cardiac causes of the presenting dyspnea was exceptionally good. The physicians demonstrated a sensitivity and specificity of 85.3% and 90% respectively. In fact the performance of the POCUS alone, though numerically better (sensitivity of 90.5% and a specificity of 93.5%), did not differ statistically from the physician’s intrinsic diagnostic capabilities. Although in isolation each performed well, the combination of the clinical and sonographic exams significantly augmented their mutual diagnostic capabilities. The sensitivity and specificity of the physician’s judgment in addition to lung US were 97% and 97.4% respectively. More important, for the purposes of this post, is how this performance compares to that of the natriuretic peptides. Of the 1,005 patients, 486 had a natriuretic peptide drawn. Its ability to differentiate cardiac causes of dyspnea was worse than the unassisted judgment of the treating physician. The sensitivity and specificity were 85% and 67.1% respectively (when the threshold for a positive test was prospectively set at 400 pg/mL for BNP, and at 450, 900, and 1,800 pg/mL for NT-pro-BNP in patients younger than 50 years, between 50 and 75 years, and older than 75 years, respectively) (1).

This study is far from perfect. This was a prospective observational study that did not enroll consecutive patients, required an Emergency Physician competent in the use of bedside US, and only obtained natriuretic assays in approximately 50% of the cohort (1). And yet despite these obvious flaws, this trial serves to illustrate an important point in the interpretation of diagnostic test results. In the Emergency Department we function in varying degrees of uncertainty. We are constantly being shown a single cross section of a disease process and asked to predict its subsequent velocity and acceleration. We are expected to perform the impossible task of calculating the slope of a line with only one point of data. We estimate these slopes in the form of risk. The greater the risk, the stronger the force acting to overcome our intrinsic inertia. There is a certain probability above which the risk of pathology is high enough to compel further investigation. Below this threshold the probability of disease and its accompanying burdens are not worth further diagnostic consideration. Conversely there are cases where the potential of disease is so high that the treatment threshold has already been crossed, and further diagnostic studies are incapable of lowering the risk enough to justify withholding the necessary interventions (2). As Emergency Physicians we exist in the gray zone, the area between the test and treatment thresholds. As such, it behooves us to utilize tests with the diagnostic capability necessary to shift the post-test probability into either extreme of the continuum.

Fig 1


Using the more traditional test characteristics, sensitivity and specificity, it is very difficult to intuit how a particular test result will shift an individual patient’s probability of disease. Through the use of a two-by-two table we are able to determine how often a patient with the disease in question is correctly identified by a positive test result (sensitivity) and how often a patient without the disease is likely to have a negative test result (specificity). But this retrospective evaluation defines a test’s performance from the perspective of a population in which the final diagnosis is already known (3). It does little to prospectively predict the risk of an individual patient with a specific test result. In contrast, the likelihood ratio (LR) is a prospective mathematical concept describing a diagnostic test’s ability to alter a patient’s risk. Essentially an LR calculates the percentage of patients with the disease that will have a specific test result, divided by the percentage of patients without the disease who will have the same test result (4).


Fig 2

A negative LR (-LR) measures the probability that patients with the disease will have a negative test result, divided by the probability that patients without the disease will have a negative test result. The positive LR (+LR) is the exact opposite: the probability that patients with the disease will have a positive test result divided by the probability that patients without the disease will have a positive test result. LRs greater than one will shift the probability towards the treatment threshold, and ratios less than one shift the post-test probability in the opposite direction, towards the test threshold. The marker of a useful test is one that will consistently move the post-test probability out of this zone of uncertainty. Typically positive and negative LRs of 10 and 0.1 are considered the minimal level for diagnostic utility. A positive LR less than 10 or a negative LR greater than 0.1 will not consistently shift the post-test probability above the treatment threshold or below the test threshold (4,5).
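The arithmetic behind these definitions is simple enough to sketch in Python. The example uses the sensitivity and specificity quoted earlier in this post for the natriuretic peptides (85% and 67.1%), which work out to LRs of roughly 2.6 and 0.22. The 50% pre-test probability is an assumption chosen purely for illustration, standing in for a patient squarely in the zone of uncertainty.

```python
def positive_lr(sensitivity, specificity):
    """P(positive test | disease) / P(positive test | no disease)."""
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity, specificity):
    """P(negative test | disease) / P(negative test | no disease)."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Natriuretic peptide test characteristics quoted in this post
plr = positive_lr(0.85, 0.671)
nlr = negative_lr(0.85, 0.671)

# Hypothetical patient with a 50% pre-test probability of heart failure
pre_test = 0.50
print("+LR:", round(plr, 2), "-> post-test", round(post_test_probability(pre_test, plr), 2))
print("-LR:", round(nlr, 2), "-> post-test", round(post_test_probability(pre_test, nlr), 2))
```

This is the same calculation the Fagan nomogram (5) performs graphically: a straight line from pre-test probability, through the LR, to the post-test probability.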

Fig 3

Pivetta et al illustrated that when the Emergency Physician is confident in their clinical diagnosis, they consistently identify the presence or absence of decompensated heart failure. In these cases, clinical judgment alone has correctly placed the patient either below the test threshold or above the treatment threshold, and further diagnostic studies are not required. In the remainder of patients where clinical judgment is insufficient, the positive and negative LRs possessed by the natriuretic peptides (2 and 0.2 respectively) are insufficient to reliably shift the post-test probability out of this zone of uncertainty. Conversely, in the spectrum of patients where clinical judgment was unable to correctly differentiate decompensated heart failure from other causes of dyspnea, lung ultrasound was exceptionally useful. Pivetta et al found that when POCUS was used to augment clinical judgment, the positive and negative LRs were effectively diagnostic (22.3 and 0.03 respectively) (1).

The vast majority of the time, the Emergency Physician is more than capable of clinically identifying patients presenting in acute decompensated heart failure. In the few cases that cast a diagnostic dilemma, natriuretic peptides provide no additional diagnostic guidance. Bedside ultrasound is a swift non-invasive tool in possession of likelihood ratios robust enough to shift post-test probability to a degree that is clinically relevant. Now is the time to speak frankly about natriuretic peptides. They are diagnostic clutter, another lab value flagged as abnormal that must be acknowledged before discarding as unhelpful. Natriuretic peptides add noise to an already uncertain baseline, making it only more difficult to detect the signal through the already thunderous cacophony that is diagnostic uncertainty.

Sources Cited:

  1. Pivetta E, Goffi A, Lupia E, et al. Lung ultrasound-implemented diagnosis of acute decompensated heart failure in the Emergency Department – A SIMEU multicenter study. Chest. 2015.
  2. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-17.
  3. Altman DG, Bland JM. Diagnostic tests. 1: Sensitivity and specificity. BMJ. 1994;308(6943):1552.
  4. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329(7458):168-9.
  5. Fagan TJ. Letter: Nomogram for Bayes theorem. N Engl J Med. 1975;293:257.

The Adventure of the Impassable Stone

As medical skeptics we have a tendency to revel in the negative study. We bemoan the p-value’s tendency to underestimate the risk of type I error and cite Frequentist statistics’ history of getting it wrong almost as often as it gets it right. Despite these nihilistic inclinations it is important that we are equally vigilant in identifying circumstances in which the risk of type II errors is high. A number of recent trials examining the use of medical expulsion therapy (MET) in ureteral colic illustrate the risk of such errors.

The first of these trials published by Pickard et al in The Lancet, in May 2015, examined both alpha blocker (tamsulosin 0.4 mg) and calcium channel blocker (nifedipine 30 mg) therapy in patients with CT confirmed ureterolithiasis (1). The authors randomized 1137 patients with stones 10 mm or less to receive either 0.4 mg of tamsulosin, 30 mg of nifedipine or placebo. Patients were excluded if they presented with obvious signs of sepsis, had significant renal failure (GFR<30) or required immediate invasive therapy as prescribed by the treating physician.

The authors found no significant difference in their primary outcome, the rate of spontaneous passage at 4 weeks, between those randomized to the tamsulosin, nifedipine or placebo arms. Spontaneous stone passage, defined by absence of need for intervention to assist stone passage during the 4-week follow-up, occurred in 307 (81%), 304 (80%), and 303 (80%) patients respectively. There were also no significant differences noted in the need for pain medication, the number of days pain medication was required, or the visual analog scale (VAS) of patients’ pain at 4 weeks (1). By all accounts this was an impressively negative trial.

A second study was recently published online in July 2015 in Annals of Emergency Medicine. Like the Pickard et al trial, this trial, by Furyk et al examined the effects of MET in patients with CT confirmed ureterolithiasis(2). The authors randomized patients with stones 10 mm or less located in the distal ureter to either MET with 0.4 mg of tamsulosin or placebo. Patients were excluded if they demonstrated signs of infection or presented with a compromised GFR. And like the previous study, the authors found no statistical difference in the number of patients who experienced stone passage at 28 days (87.0% and 81.9% in the tamsulosin and placebo groups respectively)(2). We now have two high quality RCTs demonstrating that the use of MET is not beneficial in the management of acute ureteral colic. This should conceivably end the debate regarding the utility of alpha blockade for ureteral colic.

And yet despite what on first glance appears to be convincing evidence, neither of these trials addresses the pressing question regarding MET. The majority of patients in both these trials had stones less than 5 mm in diameter. Most small stones will pass without difficulty (6,7). As these trials demonstrate, it is exceedingly hard to show a statistically significant difference in an undifferentiated cohort of renal colic patients. The real question is, does MET work in patients with stones greater than 5 mm in diameter? Can these trials definitively demonstrate a lack of utility of MET in these patients?

To examine this question appropriately we first must define statistical power. Power is the ability of a trial to detect a statistically significant difference between two groups when a true difference exists (3). It is the ability to separate true positives from false negatives, essentially the trial’s sensitivity. Traditionally, an acceptable statistical power has been set at 80 or 90%. The true meaning of such a statement is nebulous and it becomes far easier to understand statistical power when utilizing quantifiable measures.
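As a rough illustration of how power depends on sample size, here is a minimal Python sketch using the standard normal approximation for comparing two proportions. The passage rates and group sizes are hypothetical and are not the assumptions used in either trial's sample size calculation.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation and ignores the negligible chance of
    rejecting the null in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_bar = (p_control + p_treatment) / 2.0
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_group)
    se_alt = sqrt(
        p_control * (1.0 - p_control) / n_per_group
        + p_treatment * (1.0 - p_treatment) / n_per_group
    )
    difference = abs(p_treatment - p_control)
    return NormalDist().cdf((difference - z_alpha * se_null) / se_alt)


# Hypothetical: a true 10% absolute improvement (80% -> 90% stone passage)
for n in (50, 200, 500):
    print(n, "per group -> power", round(approximate_power(0.80, 0.90, n), 2))
```

The same true difference that a large trial would almost certainly detect will be missed most of the time by a small one.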

The Pickard et al trial based its sample size calculation on the ability to detect a 10% absolute difference between the tamsulosin group and its comparators with a power of 90% (1). What this translates to is, if the observed difference between the tamsulosin group and its comparators were zero (p=1.0), the trial would not be able to confidently rule out an absolute difference as large as 6%. Conversely if the trial did in fact find a 10% improvement in patients randomized to alpha blockade, this effect size could range as low as 4% or as high as 16%. In fact, this is exactly what they found. The 95% confidence interval surrounding the 1% absolute risk reduction (ARR) in patients randomized to receive tamsulosin was –4.4% to 6.9%. Conversely, in the subset of patients with stones greater than 5 mm in width, Pickard et al observed an absolute difference of 10% in the rate of stone passage at 4 weeks in favor of those randomized to receive tamsulosin. This difference did not reach statistical significance. It is important to note that power is a prospective concept calculated prior to knowing the results of a study. To retrospectively state a trial is underpowered once the results of the study are known is somewhat disingenuous. The claim that the observed difference is true and only failed to reach statistical significance due to an inappropriately small sample size may in fact be correct, but cannot be justified from the data alone. Any post-hoc power calculation performed on such a data set will inevitably demonstrate the limited ability to differentiate a true difference from the null hypothesis (4). Once the trial results are obtained, post-hoc calculations should be avoided, focusing instead on the confidence intervals surrounding the point estimates for a more honest interpretation of the data (3). In this case, we are unable to differentiate a 10% difference in stone passage from no effect. In fact the 95% confidence interval ranged from -2.8% to 23.6% (1). Clearly this trial was not designed to answer the question of whether MET is beneficial in patients with large diameter ureteral stones.

The results of the Furyk trial are even more compelling. Though the primary endpoint was the overall proportion of patients with stone passage at 28 days, the authors powered their study for an entirely different question. The study was powered to detect a difference in the rate of stone passage in patients with larger stone diameters (5-10 mm). The authors calculated they would require 98 patients with stones greater than 5 mm to detect a 20% difference in stone passage with 80% power (2). This means that if no difference was observed, the authors would be unable to exclude a difference as large as 14%. While their primary outcome was negative, in the subgroup of patients this study was powered to examine, the authors found a 22.4% absolute difference in the rate of stone passage at 28 days. The confidence interval surrounding this point estimate ranged from 3.1% to 41.6%. Although it is unwise to make claims of significance based on a secondary endpoint with such a wide confidence interval, it is equally unfair to use this data to disprove a hypothesis that this trial was not designed to refute.
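The 6% and 14% figures quoted for the two designs can be checked with a short calculation. What follows is a minimal sketch, not the authors' own method: it assumes a two-sided alpha of 0.05, a normal approximation, and that the trial ends up with the standard error its design anticipated. The function name is mine, chosen for illustration.

```python
from statistics import NormalDist


def predicted_ci_half_width(delta, power, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval expected from a trial
    sized to detect an absolute difference `delta` with the given power.

    Assumption: normal approximation, two-sided alpha, and an observed
    standard error equal to the one anticipated at the design stage.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 (90%) or 0.84 (80%)
    # The design sets delta = (z_alpha + z_beta) * SE, so solve for SE.
    standard_error = delta / (z_alpha + z_beta)
    return z_alpha * standard_error


# Design figures quoted in the text
pickard = predicted_ci_half_width(delta=0.10, power=0.90)
furyk = predicted_ci_half_width(delta=0.20, power=0.80)

print(f"Pickard design: +/- {pickard:.1%}")  # prints "Pickard design: +/- 6.0%"
print(f"Furyk design:   +/- {furyk:.1%}")    # prints "Furyk design:   +/- 14.0%"
```

Under these assumptions, a trial that observes no difference at all still cannot exclude an effect of roughly six-tenths of the design difference at 90% power, or seven-tenths at 80% power.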

We are all aware of the hazards of subgroup analyses, and yet it is important to be honest in our skepticism. This in no way should be viewed as an endorsement of MET or the necessity of obtaining imaging to identify a subgroup of patients who may benefit from tamsulosin. On the contrary, these trials demonstrate that for the majority of patients presenting to the Emergency Department with renal colic, MET provides little additional benefit above symptomatic treatment. But a trial can only answer the question it was designed to ask. Neither of these trials was built to confidently address whether MET is beneficial in patients presenting with larger stones. Earlier trials examining this question are either so confounded by non-blinding and selection bias as to make them uninterpretable, or suffer from the same deficiencies in statistical power and so cannot confidently address the effects of MET for patients with larger stones (5). We are left with statistical and philosophical uncertainty regarding the utility of alpha-blockers in acute ureteral colic. We will continue to exist in this state of ambiguity until we have a study sufficiently powered to ask whether MET is efficacious in patients with large ureteral stones. Many would love to discard alpha-blockers for renal colic in our ever-growing pile of medical impotencies, but given the current state of the literature, this renouncement would be premature and unjust.

Sources Cited:

  1. Pickard R, Starr K, Maclennan G, et al. Medical expulsive therapy in adults with ureteric colic: a multicentre, randomised, placebo-controlled trial. Lancet. 2015
  2. Furyk, Jeremy S. et al. Distal Ureteric Stones and Tamsulosin: A Double-Blind, Placebo-Controlled, Randomized, Multicenter Trial. Annals of Emergency Medicine. Published online: July 17 2015
  3. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121(3):200-6.
  4. Goodman SN. A comment on replication, P-values and evidence. Stat. Med. 1992;11:875-9.
  5. Campschroer, T., Zhu, Y., Duijvesz, D. et al. Alpha-blockers as medical expulsive therapy for ureteral stones. Cochrane Database Syst Rev. 2014; : CD008509
  6. Coll, D.M., Varanelli, M.J., and Smith, R.C. Relationship of spontaneous passage of ureteral calculi to stone size and location as revealed by unenhanced helical CT. AJR Am J Roentgenol. 2002; 178: 101–103
  7. Miller, O.F., Kane, C.J. Time to stone passage for observed ureteral calculi: a guide for patient education. J Urol. 1999;162:688–690 (discussion 690-691).

The Case of the Non-inferior Inferiority


The practice of Frequentist statistics is often a study in extremes. Based on an arbitrary threshold of significance, we are asked to interpret data as either positive or negative when in reality it merely shifts our probability of certainty. Even more important, because of the singular nature of Frequentist statistics, our interpretation of data is often constrained to the questions posed by those designing the trial. Although a strict deductive methodology is important to prevent mistaking random chance for scientific proof, it is equally important to understand in which instances abiding by these laws will lead to a misinterpretation and misunderstanding of the data.

Appendicitis has long been considered a surgical emergency. If it is not intervened upon surgically in a timely fashion the pathological sequelae will lead to perforation, sepsis, and death. And yet, despite this foregone conclusion, a number of trials have challenged the necessity of cold steel in the management of acute appendicitis. Most recently, in JAMA, Salminen et al published the findings from their RCT comparing the traditional surgical management of acute appendicitis to conservative treatment with antibiotic therapy alone (1). Despite the authors’ primary conclusion, this trial demonstrated that in patients with non-complicated acute appendicitis, the use of antibiotic therapy is anything but inferior.

Salminen et al randomized 530 patients with CT-confirmed non-complicated acute appendicitis to either surgical management, using primarily open laparotomy, or a short course of IV antibiotics (3 days of ertapenem) followed by a 7-day course of oral levofloxacin. Of the 273 patients randomized to the surgical group, 272 (99.6%) underwent successful appendectomy. Of the patients randomized to conservative therapy, 70 (27.3%) underwent appendectomy within one year of initial presentation. Let's pause for a moment. In a disease process that for the past century has been considered a surgical necessity, 72.7% of patients were treated successfully with antibiotics alone (1). Despite these impressive numbers the trial was deemed unsuccessful, as the rate of “treatment failure” in the conservative group crossed the predetermined non-inferiority margin of 24%. And yet these statistical inadequacies are based less on the inferiority of antibiotic therapy and more on the authors’ unfortunate choice of how exactly they defined “non-inferior”.

Non-inferiority trials are intended to ask a very specific question: whether a new treatment strategy or medical intervention is comparable to the traditional standard therapy. Rather than examine the two in the hopes of determining superiority, a non-inferiority trial merely attempts to establish that the new treatment is no worse than the current standard of care. This type of trial is undertaken when the new treatment provides certain advantages that would make it preferable to the old treatment (2,3). For example, if it is cheaper, safer, or less invasive, one might prefer to use this new treatment rather than expose the patient to the cost, risk, or intrusive nature of the prior strategy. In fact, depending on what advantages a new treatment may provide, one might accept some degradation in efficacy as long as it does not cross a predefined threshold for inferiority. This threshold is based upon a number of assumptions. First, what is the proven efficacy of the established standard? Say, for example, this standard demonstrated an absolute decrease in mortality of 5% in previous studies, and the confidence interval surrounding this point estimate ranged from 3% to 7%. You would not want your new intervention to be 3% less effective than the standard comparator, in which case it could prove to be no more beneficial than placebo. Second, what added benefits does this new therapy provide? If these advantages are impressive, then you may accept a greater degree of inferiority when compared to the standard treatment strategy (a lower non-inferiority margin). On the other hand, if this new treatment provides few novel advantages, you would likely accept far less deviation from the standard treatment’s efficacy.

Salminen et al utilized neither of these considerations when calculating their non-inferiority margin. In fairness to the authors, it would be exceedingly difficult to accurately assess the true efficacy of surgery over placebo, as this standard of care was established long before placebo-controlled trials were utilized to define treatment effect. Where the authors did falter was in the manner in which they determined their non-inferiority margin and performed their power calculation. Using data from prior studies examining the efficacy of antibiotic therapy in acute appendicitis, the authors estimated a 25% rate of treatment failure (defined as need for surgical intervention within one year of initial presentation) in the patients randomized to conservative treatment (1). Using this estimate they set their non-inferiority margin at no more than 24% treatment failure in patients randomized to antibiotic therapy, essentially dooming their trial from its earliest power calculations.

Non-inferiority trials ask a different question than the traditional superiority trials to which we are more accustomed. Rather than presenting a null hypothesis that states there is no difference between the groups, the non-inferiority trial design operates under the assumption that the novel intervention is inferior to the standard treatment. The alternative hypothesis states that the treatment options are equivalent. In order to reject the null hypothesis the novel treatment must demonstrate a near equivalent efficacy within a degree of certainty. This means that both the point estimate and the surrounding confidence interval must fall above the non-inferiority margin (2,3). In this case, despite all prior evidence demonstrating the contrary, the authors estimated that 275 patients per group would provide a 90% power to demonstrate the non-inferiority of conservative management for acute appendicitis when compared to the more traditional surgical intervention. Essentially this translates into the non-surgical group having to demonstrate a point estimate of approximately 20% treatment failure within one year for the confidence interval not to cross their predefined non-inferiority margin. Further hampering their efforts, the authors halted the trial early after enrolling only 530 patients (rather than the 610 planned in the original power calculation), further widening the already wide confidence interval surrounding their point estimate (1).

It should have come as no surprise that the authors failed to demonstrate non-inferiority by their designated definition. The authors found that 27% of patients randomized to antibiotic therapy required an appendectomy within 1 year of initial presentation. The 95%-confidence interval surrounding this point estimate was 22.0% to 33.2% (1). In the two trials they used to justify their non-inferiority margin of 24%, the 1-year failure rate in patients treated with antibiotics was cited as 24% and 23.6% respectively (4,5). Unfortunately, in the latter of these two trials, by Hansson et al, this failure rate was calculated from the per-protocol analysis rather than the intention-to-treat analysis. In reality the antibiotic group had a 47.5% crossover rate to surgery. The overall failure rate in the intention-to-treat analysis was 60% (5). In an additional trial by Vons et al, published in the Lancet in 2011, the 1-year appendectomy rate was 37%. The 95%-confidence interval around this point estimate ranged from 28.36% to 45.64% (6). The 2011 Cochrane analysis, after examining the 5 existing RCTs, found that 26.6% (95%-confidence interval 18.1% to 37.3%) of the patients randomized to antibiotic therapy went on to have an appendectomy within 1 year of initial presentation (7). Given that the previous evidence indicates that the rate of antibiotic failure has consistently been greater than 25% and has ranged as high as 60%, the expectation by Salminen et al that they would find non-inferiority of antibiotic therapy with a non-inferiority margin of 24% was optimistic to say the least.
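To make the comparison against the margin concrete, here is a rough sketch that treats the margin the way this post describes it, as a ceiling of 24% on the treatment failure rate. It uses a simple normal-approximation (Wald) interval on the 70 appendectomies among 257 antibiotic-arm patients quoted in the text. The published interval was presumably calculated by a different method, so the bounds will not match exactly.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(events, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


MARGIN = 0.24  # non-inferiority margin as described in this post

# Counts quoted in the text: 70 appendectomies among 257 patients
lower, upper = wald_ci(70, 257)

# Non-inferiority, framed this way, requires the whole interval to sit below the margin.
non_inferior = upper < MARGIN
# The margin sitting inside the interval means the data cannot settle the question.
inconclusive = lower < MARGIN < upper

print(f"failure rate CI: {lower:.1%} to {upper:.1%}")
print(f"non-inferior: {non_inferior}, margin inside interval: {inconclusive}")
```

On these numbers the margin falls inside the interval, which is a different statement from antibiotics having been shown to be inferior.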

More importantly, was the appendectomy rate at 1 year truly the most appropriate criterion with which to define inferiority? This trial was not negative because medical management proved to be inferior to surgical appendectomy; rather, it was negative because the authors asked the wrong question. As clinicians, what is our concern with the medical management of acute appendicitis? It is not whether 20% or 27% of those initially treated with antibiotics will eventually require an appendectomy, but rather whether medical therapy leads to an unacceptably high rate of serious complications. In fact, if we were to be completely equitable, while 99.6% of the patients in the surgical arm of this trial underwent appendectomies, only 27% of the patients in the medical management arm were exposed to an invasive procedure. The question the authors should have asked was, “How many patients in each arm experienced resolution of symptoms related to acute appendicitis without experiencing acute complications related to delays in treatment (perforation, abscesses, sepsis, etc)?” If the authors had asked this question their answer would have been entirely different. Of the 257 patients randomized to medical management, 15 (5.8%) required appendectomy during their initial hospital admission. Only 5 (1.9%) patients in the antibiotic group experienced perforations requiring surgical intervention, compared to 2 out of 273 (0.7%) patients randomized to an immediate surgical intervention (1). Essentially you would have to operate on roughly 100 patients with non-complicated acute appendicitis in order to prevent one perforation.
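The arithmetic behind that last figure is simply the reciprocal of the absolute risk difference. A quick sketch using the perforation counts quoted above (the function name is mine) gives a number somewhat below the round figure of 100. With only seven events in total, any such estimate carries a very wide margin of error.

```python
def number_needed_to_treat(events_a, n_a, events_b, n_b):
    """Reciprocal of the absolute risk difference between group A and group B."""
    risk_difference = events_a / n_a - events_b / n_b
    return 1 / risk_difference


# Perforations quoted in the text: 5 of 257 (antibiotics) vs 2 of 273 (surgery)
nnt = number_needed_to_treat(5, 257, 2, 273)
print(f"appendectomies per perforation prevented: about {nnt:.0f}")
```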

Certainly there is a great deal to be determined before this non-invasive strategy can be considered mainstream practice. This was a small underpowered cohort in which the participating surgeons performed primarily open laparotomies. How this strategy translates to the US, where the primary approach to appendectomy is laparoscopic, is unclear. Additionally, whether patients require 3 days of broad-spectrum IV therapy followed by a 7-day course of oral therapy is unknown. What seems obvious is that in what was once considered an exclusively surgical disease, the majority of patients can be managed effectively with conservative therapy. Despite not meeting their own high standards for non-inferiority, the authors demonstrated that for most patients with acute appendicitis treated conservatively with antibiotics, we can avoid surgical intervention without incurring the complications of a delay to definitive care. To define such a revelation as inferior is unjust indeed.

Sources Cited:

  1. Salminen P, Paajanen H, Rautio T, et al. Antibiotic Therapy vs Appendectomy for Treatment of Uncomplicated Acute Appendicitis: The APPAC Randomized Clinical Trial. JAMA. 2015;313(23):2340
  2. Kaji AH, Lewis RJ. Noninferiority Trials: Is a New Treatment Almost as Effective as Another?. JAMA. 2015;313(23):2371-2.
  3. Kaul S, Diamond GA. Good Enough: A Primer on the Analysis and Interpretation of Noninferiority Trials. Ann Intern Med. 2006;145:62-69
  4. Styrud J, Eriksson S, Nilsson I, et al. Appendectomy versus antibiotic treatment in acute appendicitis: a prospective multicenter randomized controlled trial. World J Surg. 2006;30(6):1033-1037.
  5. Hansson J, Körner U, Khorram-Manesh A, Solberg A, Lundholm K. Randomized clinical trial of antibiotic therapy versus appendicectomy as primary treatment of acute appendicitis in unselected patients. Br J Surg. 2009;96(5):473-481.
  6. Vons C, Barry C, Maitre S, et al. Amoxicillin plus clavulanic acid versus appendicectomy for treatment of acute uncomplicated appendicitis: an open-label, non-inferiority, randomised controlled trial. Lancet. 2011;377(9777):1573-1579.
  7. Wilms IM, De hoog DE, De visser DC, Janzing HM. Appendectomy versus antibiotic treatment for acute appendicitis. Cochrane Database Syst Rev. 2011;(11):CD008359.


The Case of the Irregular Irregularity


We have proven ourselves highly capable of managing atrial fibrillation in the Emergency Department. In recent years, a number of prospective cohorts have demonstrated that with the use of IV anti-arrhythmic medication and electrical cardioversion, patients presenting to the Emergency Department with new onset atrial fibrillation can be successfully discharged in sinus rhythm consistently and with minimal adverse events. In 2010, Stiell et al published a case series of 660 patients who were cardioverted in the Emergency Department (1). What they coined the “Ottawa Aggressive Protocol” consisted of chemically managed rate control followed by a trial of procainamide loaded over an hour and, if this failed to convert the patient, DC electrical cardioversion. Using this protocol, Stiell et al cite the number of patients who were discharged home in normal sinus rhythm to be 595 (90.2%). In a recent systematic review published in the European Journal of Emergency Medicine, Coll-Vinent et al found that in patients who underwent Emergency Department cardioversion, 78.2%-100% were discharged home in a normal sinus rhythm (2).

But competency is not directly translatable into efficacy. Despite this proof of concept, there is limited data examining the patient-oriented benefits these aggressive rhythm control strategies produce. In fact, the majority of such studies employ the “rhythm at Emergency Department discharge” as their measure of success. And though being discharged from the Emergency Department in a sinus rhythm seems preferable to atrial fibrillation, little is known regarding the extent of this benefit, as very few trials rigorously monitored patients following discharge from the Emergency Department. How many of these patients remained in a sinus rhythm and for how long? Stiell et al found that only 8.6% of their cohort returned to the Emergency Department within one week of cardioversion with any recurrence of atrial fibrillation. Unfortunately these numbers were calculated from a chart extraction of the Ottawa Hospital health records database and do not directly reflect the number of patients who experienced atrial fibrillation over the 7 days following Emergency Department discharge (1). Decker et al, in a small cohort of 150 patients, cite a recurrence rate of 10% at 6 months (3). What is the true recurrence rate? Even more importantly, does reestablishing sinus conduction lead to improved patient health and wellbeing?

The question at hand remains, what exactly are we achieving by performing cardioversions in the Emergency Department? We have known for some time that despite being capable of maintaining patients in a sinus rhythm with moderate success, an aggressive rhythm control strategy does not prevent the long-term sequelae associated with atrial fibrillation. The AFFIRM trial, published in the NEJM in 2002, demonstrated that in a cohort of 4060 patients with atrial fibrillation, although the use of a rhythm control strategy reduced the time patients spent in atrial fibrillation, it did not reduce the rate of death, MI or ischemic stroke (4). When the 1391 patients experiencing their first episode of atrial fibrillation or the 1252 patients presenting within 48 hours of symptom onset were examined separately, no additional benefit was discovered (4). Since the AFFIRM trial’s publication a number of studies, performed in various subsets of atrial fibrillation patients, have confirmed that rhythm control strategies do not prevent the long-term sequelae associated with this chronic disease (5,6).

Since rate control is the preferred long-term treatment strategy of atrial fibrillation, what exactly are our goals for cardioversion in the Emergency Department? Is there a long-term health benefit to aggressive rhythm control in the Emergency Department? Does this lead to noticeable improvements in patient outcomes? Unfortunately conclusive data on these questions has yet to be published. The few RCTs examining the benefits of aggressive management of atrial fibrillation in the Emergency Department are small and inconclusive. Despite this paucity of convincing evidence, I would argue that the mathematical likelihood of benefit is incredibly low. Atrial fibrillation is a chronic disease, with sequelae measured in events per patient year. The rate of short-term adverse events is exceedingly low, with some cohorts citing a 30-day event rate of less than 1% (7). To design a study powered to identify a statistically meaningful difference, the sample size required would be unrealistically high, especially given that the long-term utilization of such rhythm control strategies has not yielded clinically important improvement in patient outcomes. Furthermore the act of emergent cardioversion does not avert the need for anticoagulation, as this decision should be based on the patient’s risk of thromboembolic event independent of their rhythm at discharge (8).
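To put a rough number on "unrealistically high", consider a purely hypothetical scenario in which Emergency Department cardioversion halved a 1% 30-day event rate. The sketch below uses the standard normal-approximation sample size formula for comparing two proportions. The 0.5% treated rate, the 80% power and the two-sided alpha of 0.05 are my assumptions for illustration, not figures from any of the cited trials.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, power=0.80, alpha=0.05):
    """Approximate sample size per arm to detect p_control vs p_treated
    (two-sided test, normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical scenario: a 1% event rate halved to 0.5%
n = patients_per_arm(0.01, 0.005)
print(f"patients required per arm: {n}")
```

Even under this generous assumption the requirement runs to several thousand patients per arm, far beyond the cohorts discussed here.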

If we can agree that the clinical benefits of aggressive cardioversion in the Emergency Department are minimal, then the only remaining justification for Emergency Department cardioversion is its positive effect on patient wellbeing and comfort. The current argument in support of Emergency Department cardioversion hinges on the supposition that a state of sinus regularity is preferred when compared to the electrical chaos induced by atrial fibrillation (9). Until recently this claim has been exclusively supported by anecdotal descriptions of patient experience; its validity had never been examined in a prospective fashion.

Published online June 2015 in the Annals of Emergency Medicine, Ballard et al sought to objectively assess the effects of Emergency Department cardioversion on patients’ wellbeing and comfort (10). The authors surveyed 730 patients who were treated for new onset atrial fibrillation and discharged from one of 21 medical centers in Northern California. Of this cohort, 652 (89%) responded to a structured phone survey. Though the data was prospectively gathered, these patients were not randomized to either a rate or rhythm control strategy; rather, the manner of treatment was left entirely to the judgment of the treating physician. Of the 652 respondents the majority, 432 (67.3%), were managed with rate control therapy alone. Regardless of management strategy, 410 (62.9%) of the patients were discharged from the Emergency Department in a sinus rhythm. Among those patients who underwent electrical cardioversion, 92.2% were in sinus rhythm upon discharge. If you consider discharge rhythm as a metric of success, then electrical cardioversion was a far more accomplished strategy than either pharmacological cardioversion or rate control therapy alone, after which 81.6% and 49.7% of patients, respectively, were in a sinus rhythm at discharge (10). Despite its obvious superiority in rhythm control, what benefits does cardioversion provide for patients’ symptom burden at 30 days?

The authors measured 30-day wellbeing using the Atrial Fibrillation Effect on Quality-of-life (AFEQT) score. This 18-question tool was intended to assess the patients’ perception of the burden of disease. The surveys were administered via telephone by trained research assistants at least 28 days following the Emergency Department visit. Overall 539 patients (82.7%) reported some degree of symptom burden related to their atrial fibrillation upon discharge. The use of cardioversion did not decrease the rate or degree of symptom burden at 30 days. When the authors analyzed the AFEQT scores in quartiles of severity rather than the dichotomous symptom/no symptom outcome, they found no additional benefit to Emergency Department cardioversion. Certainly this data is far from perfect. This was a non-randomized cohort and it is unclear how well the AFEQT score captures symptom burden (10). Despite these shortcomings, the findings are consistent with the body of literature examining whether an aggressive rhythm control strategy improves patient wellbeing. A number of trials have examined the long-term benefits rhythm control has on reducing symptom burden. These trials have consistently demonstrated that when compared to rate control alone, an aggressive rhythm control strategy provided no additional perceivable benefit to patients’ wellbeing and comfort (11).

The act of electrical cardioversion within 48 hours of symptom onset is commonly perceived as a safe practice. In a recent review of the existing literature, Cohen et al found that out of 1593 patients, only one (0.06%) stroke was reported. Despite this cursory endorsement, I would caution that safety is measured in the thousands and the current data is far too limited and rife with publication bias to truly assess safety. Additionally a recent research letter published in JAMA called into question the safety of the 48-hour window we have traditionally used to determine suitability for Emergency Department cardioversion. Nuotio et al published a secondary analysis of the FinCV registry, which examined 2481 patients in atrial fibrillation who underwent electrical cardioversion within 48 hours of symptom onset. In this cohort the risk of ischemic event increased significantly (0.03% to 1.1%) when the time from symptom onset to cardioversion was greater than 12 hours. And although 1.1% is still a relatively low event rate, given the absence of any clear clinical benefit, the benefit-harm ratio does not favor an aggressive rhythm control strategy (12).
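To illustrate why one event in 1593 patients is thin evidence of safety, the sketch below computes an exact (Clopper-Pearson style) upper confidence limit for the underlying stroke risk by bisection. It takes the counts quoted above at face value and ignores publication bias entirely. The function names are mine.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of observing k or fewer events in n patients at true risk p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_limit(events, n, tail=0.025):
    """Largest risk still compatible with the data: the p at which seeing
    `events` or fewer has probability `tail` (upper end of a 95% interval)."""
    low, high = events / n, 1.0
    for _ in range(100):  # bisection; the CDF falls steadily as p rises
        mid = (low + high) / 2
        if binomial_cdf(events, n, mid) > tail:
            low = mid   # data still too plausible, so the limit lies higher
        else:
            high = mid
    return high


# Counts quoted in the text: 1 stroke among 1593 cardioverted patients
upper = exact_upper_limit(1, 1593)
print(f"observed rate: {1 / 1593:.2%}, upper 95% limit: {upper:.2%}")
```

The upper limit comes out at a little over 0.3%, several times the observed rate, so this body of data cannot exclude a stroke risk of that size.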

Modern medicine far too often values competency over efficacy. Whether it is door-to-balloon time or the 6-hour sepsis bundle, we are constantly measured in surrogates thought to be associated with improvements in patient outcomes. The quality of our care has been distilled down to what can be marked as complete on a checklist. Although the evidence clearly demonstrates Emergency Physicians are capable of effectively cardioverting new onset atrial fibrillation in the Emergency Department, one cannot help but ask, to what end?

Sources Cited:

  1. Stiell, I.G., Clement, C.M., Perry, J.J. et al. Association of the Ottawa Aggressive Protocol with rapid discharge of emergency department patients with recent-onset atrial fibrillation or flutter. CJEM. 2010; 12: 181–191
  2. Coll-Vinent, B., Fuenzalida, C., Garcia, A. et al. Management of acute atrial fibrillation in the emergency department: a systematic review of recent studies. Eur J Emerg Med. 2013; 20: 151–159
  3. Decker, et al. A Prospective, Randomized Trial of an Emergency Department Observation Unit for Acute Onset Atrial Fibrillation.  Annals of Emergency Medicine, 2007.
  4. Wyse DG, Waldo AL, Dimarco JP, et al. A comparison of rate control and rhythm control in patients with atrial fibrillation. N Engl J Med. 2002;347(23):1825-33.
  5. Van gelder IC, Hagens VE, Bosker HA, et al. A comparison of rate control and rhythm control in patients with recurrent persistent atrial fibrillation. N Engl J Med. 2002;347(23):1834-40.
  6. Roy D, Talajic M, Nattel S, et al. Rhythm control versus rate control for atrial fibrillation and heart failure. N Engl J Med. 2008;358(25):2667-77.
  7. Scheuermeyer FX, Grafstein E, Stenstrom R, et al. Thirty-day and 1-year outcomes of emergency department patients with atrial fibrillation and no acute underlying medical cause. Ann Emerg Med. 2012;60(6):755-765.e2.
  8. Wang TJ, Massaro JM, Levy D, et al. A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA. 2003;290:1049-56.
  9. Stiell IG, Birnie D. Management of recent-onset atrial fibrillation in the emergency department. Ann Emerg Med. 2011; 57:31–2.
  10. Ballard, DW. et al. Emergency Department Management of Atrial Fibrillation and Flutter and Patient Quality of Life at One Month Postvisit. Annals of Emergency Medicine
  11. Thrall G, Lane D, Carroll D, Lip GY. Quality of life in patients with atrial fibrillation: a systematic review. Am J Med. 2006;119(5):448.e1-19.
  12. Nuotio I, Hartikainen JE, Grönberg T, Biancari F, Airaksinen KE. Time to cardioversion for acute atrial fibrillation and thromboembolic complications. JAMA. 2014;312(6):647-9.

The Problem of Thor Bridge


Disclosure: This post is unusually full of hearsay and conjecture. Like a secondary endpoint that flirts with statistical significance it should be viewed purely as hypothesis generating. For a more reasoned and experienced view of the following data please read Josh Farkas’s wonderful post on

Damage control ventilation is not a novel concept. It functions under the premise that positive-pressure ventilation intrinsically possesses few curative properties and rather acts as a bridge until a more suitable state of ventilatory well-being can be achieved. As such, we should view its utilization as a necessary evil and endeavor not to correct the patient’s pathological perturbations but rather to limit its iatrogenic harms. Since the publication of the ARDSNet protocol in 2000 we have known that striving to achieve physiological normality leads to greater parenchymal injury and downstream mortality (1). Later research demonstrated that even in patients without fulminant ARDS, a protective lung strategy is beneficial (2). Understandably we are reluctant to initiate mechanical ventilation unless absolutely necessary. Because of its ability to delay and even prevent more invasive forms of ventilatory support, non-invasive ventilation (NIV) has long been the darling of the emergent management of most respiratory complaints. It is a rare respiratory ailment that cannot be remedied with a tincture of positive-pressure ventilatory support delivered via a form-fitting face mask. Its widespread implementation is primarily born of NIV’s capacity to provide a bridge to a more definitive form of therapeutic support. Due in part to NIV’s ability to decrease the rate of intubation in patients presenting with COPD and CHF exacerbations, it is more readily being utilized in a subgroup of patients where a definitive destination is far less assured, a group of patients where the cause of their current dyspnea is not so readily correctable. A bridge, if you will permit me a moment of sensationalism, to nowhere…

Although the efficacy for the use of NIV in COPD exacerbations and acute cardiogenic pulmonary edema are well documented (3,4,5,6,7), the evidence for its use in managing other forms of hypoxic failure, such as pneumonia and ARDS, is far less robust. In fact there is some less than perfect evidence demonstrating that in these populations, NIV fails to prevent intubation and in this subset of patients, who are unsuccessful in their trial of non-invasive ventilatory support, the mortality is higher than in those patients who were initially intubated (8,9). And so the authors of the “Clinical Effect of the Association of Non-invasive Ventilation and High Flow Nasal Oxygen Therapy in Resuscitation of Patients with Acute Lung Injury (FLORALI)” trial hoped to examine whether NIV was superior to standard face mask oxygenation therapy in patients with acute hypoxic respiratory failure (10). Frat et al examined two forms of non-invasive ventilatory strategies in patients admitted to the ICU with non-hypercapneic, non-cardiogenic hypoxic respiratory failure. The first was the traditional bi-level positive pressure ventilation, more commonly known as BPAP. The second was high-flow (50 L/min) humidified oxygen delivered via nasal cannula. Using a 1:1:1 ratio the author’s randomized 313 patients too either BPAP, high-flow NC or standard 2270840_origfacemask support. The authors enrolled a relatively sick spectrum of patients. In order to be enrolled patients were required to have a respiratory rate of more than 25 breaths per minute, a PaO2/FiO2 of 300 mg Hg or less while on 10 L of supplementary O2, have a PaCO2 of no higher than 45 mm Hg with no history of underlying chronic respiratory disease. 
Additionally patients were excluded if they presented with an exacerbation of asthma or COPD, cardiogenic pulmonary edema, severe neutropenia, hemodynamic instability, use of vasopressors, a GCS of 12 or less, any contraindication to non-invasive ventilation, an urgent need for intubation or DNI orders. Given these stringent inclusion and exclusion criteria it is no surprise that out of the 2506 patients who presented to one of the 23 participating ICUs, only 525 met the criteria for inclusion. Of these, 313 underwent randomization and 310 were included in the final analysis (10).

The cause of hypoxia in the vast majority (75.5%) of these patients was pneumonia. The authors’ primary endpoint was the number of patients in each group who underwent endotracheal intubation within 28 days of enrollment. Although the authors found no statistical difference in the rate of intubation between the three groups, it is difficult not to infer a clinically important difference that was statistically overlooked due to the limited power generated by an n of 310. The 28-day intubation rate in the high-flow O2 group was 37% compared to 47% and 50% in the face-mask and BPAP groups respectively (an absolute difference of 10% and 13% respectively). When the more severely hypoxic patients were examined (those with a PaO2/FiO2 < 200), this absolute difference increased to 18% and 23% respectively. Additionally patients randomized to high-flow O2 had lower mortality rates compared to either the face-mask or BPAP groups. ICU mortality was 11%, 19% and 25% and 90-day mortality was 12%, 23%, and 28% in the high-flow, face-mask and BPAP groups respectively. In the patients with more severe hypoxia these differences in mortality became even more pronounced. In patients with a PaO2/FiO2 < 200 the ICU mortality was 12%, 21.6% and 28.4%, while the 90-day mortality was 13.2%, 27.0% and 32.1%. Although the primary endpoint of this trial was negative (p = 0.18), there is a clear and consistent improvement in outcomes of patients randomized to high-flow O2 compared to the other two non-invasive strategies (10).
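To make the point about limited power concrete, here is a rough sketch of how wide the uncertainty around these absolute differences is. It is not the authors' analysis. The group size of 103 per arm is an assumption (310 patients split evenly across three arms; the true arm sizes are not given above), and the function name and the simple Wald interval are illustrative choices.

```python
import math

def risk_difference_ci(p_a, n_a, p_b, n_b, z=1.96):
    """Return (difference, lower, upper) for p_b - p_a using a Wald 95% interval."""
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# ASSUMPTION: roughly equal arms, 310 / 3 ~ 103 patients each.
ASSUMED_N_PER_ARM = 103

# 28-day intubation rates quoted in the text.
HIGH_FLOW, FACE_MASK, BPAP = 0.37, 0.47, 0.50

for label, rate in [("face mask", FACE_MASK), ("BPAP", BPAP)]:
    diff, lower, upper = risk_difference_ci(
        HIGH_FLOW, ASSUMED_N_PER_ARM, rate, ASSUMED_N_PER_ARM
    )
    print(f"{label} vs high-flow: {diff:+.1%} (95% CI {lower:+.1%} to {upper:+.1%})")
```

Under these assumptions both intervals include zero, which is how a 10% to 13% absolute difference can be clinically striking and still fail to reach statistical significance.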

This trial is nowhere near perfect. The sample size is far too small to confidently rule out statistical whimsy’s causal responsibility for these findings. Additionally it is difficult to discern whether high-flow O2 was beneficial in this subgroup of patients or rather BPAP was deleterious. Most importantly it fails to address the question of primary concern for the Emergency Physician. Is non-invasive ventilation preferable to early endotracheal intubation? Frat et al compared high-flow O2 and BPAP therapy to standard face-mask oxygenation, which does not help us determine whether NIV is superior to early invasive ventilatory support. Furthermore, because this trial examines the use of NIV in ICU patients over prolonged periods (median time to intubation was 17-27 hours), it does not tell us whether the use of BPAP is detrimental while patients are managed in the Emergency Department. Given these shortcomings how should we view these data?

Technically, from a Frequentist’s viewpoint these statistically significant secondary endpoints are merely hypothesis generating, and additional studies are required to validate these preliminary findings. But what if for a moment, we were to take a Bayesian perspective and examine this very same paper from an alternative vantage? How then would these data appear? Bayesian statistics takes an inductive perspective when examining data. Simply put, it asks how these data affect the prior scientific belief. Given the data presented in this trial, what is the most probable hypothesis that explains these results (12)? How do these results change the scientific belief that was held prior to this study being conducted? Conversely, when using Frequentist statistics we employ deductive methodology to address one question and utilize a predetermined statistical threshold to either accept or reject the null hypothesis. All other questions examined in the paper are essentially exploratory and, due to the single-minded nature of the p-value, are simply hypothesis generating (11).

Examining the data published by Frat et al, one would conclude the most probable hypothesis that would explain these events is:

In patients with non-hypercapnic, non-cardiogenic, hypoxic respiratory failure high-flow oxygen therapy decreases both mortality and the rate of intubation when compared to face-mask oxygenation. Additionally the use of BPAP does not decrease the rate of intubation and may in fact increase mortality in a subset of the sickest patients.

How does this affect the prior scientific belief in the efficacy of NIV in patients with hypoxic respiratory failure? Frat et al certainly supports the prior evidence demonstrating that BPAP therapy is detrimental in this subset of patients with hypoxic respiratory failure. In fact the rate of endotracheal intubation (50%) is essentially identical to rates cited in prior cohorts (8). It also highlights that these negative effects may in fact be due to the therapy itself rather than the delay to definitive airway management as was previously hypothesized. Though there was a non-significant increase in the median time to intubation in the BPAP group compared to patients receiving face-mask therapy alone, the time to intubation in the BPAP and high-flow O2 groups was identical. And yet despite these minimal differences in time to intubation, the patients who underwent intubation in the BPAP group had an increased mortality when compared to those randomized to either face-mask or high-flow oxygen (10). Patients in the BPAP group, with the help of positive pressure, achieved average tidal volumes of 9 cc/kg. As the ARDSNet trial group demonstrated, when administering positive pressure ventilation a lung protective strategy (tidal volumes of 6 cc/kg) led to significant improvement in outcomes in patients with ARDS (1). Determann et al demonstrated that even in patients without ARDS, lung protective strategies led to improved outcomes when compared to more traditional physiological lung volumes (2). Until now we have cognitively absolved positive pressure delivered in a non-invasive form as a causative agent of such complications. The findings of Frat et al have, for the first time, cast a shadow of doubt on the innocence of NIV.

As for the spectacular results demonstrated by the high-flow O2 group, given the size of the population studied and the paucity of previous science with which to compare, it is hard to know how much credence to place in them. What is clear is we should no longer view high-flow O2 as a substandard option, reserved only for patients who have failed to tolerate the more traditional forms of NIV. Rather high-flow O2 may provide a unique form of respiratory support that is not accounted for by our prior understanding of NIV (10).

We have known for some time that the use of positive pressure ventilation is the result of being forced to choose the lesser of two evils. Although it provides a means of ventilatory support, it possesses little inherent therapeutic benefit. In fact, positive-pressure ventilation comes at the cost of hemodynamic compromise, iatrogenic lung injury, nosocomial infections, and sedation protocols that leave the patients confused and delirious. As such, a damage control strategy is typically employed to limit these downstream harms until the patient’s own ventilatory capacity has returned. Until now these strategies have been limited to invasive forms of ventilatory support. The Frat et al data suggest that, to some degree, non-invasive ventilatory support may be associated with similar iatrogenic harms. Although the current data are incomplete, they should remind us that if we intend to construct a bridge, we should have some understanding of where this intended conduit will lead and whether this is a healthier destination than where we started.

Sources Cited:

1.         Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome: the Acute Respiratory Distress Syndrome Network. N Engl J Med 2000;342:1301‒8.

2.         Determann RM, Royakkers A, Wolthuis EK, et al. Ventilation with lower tidal volumes as compared with conventional tidal volumes for patients without acute lung injury: a preventive randomized controlled trial. Crit Care 2010;14(1):R1.

3.         Brochard L, Mancebo J, Wysocki M, et al. Noninvasive ventilation for acute exacerbations of chronic obstructive pulmonary disease. N Engl J Med 1995;333:817-22.

4.         Keenan SP, Sinuff T, Cook DJ, Hill NS. Which patients with acute exacerbation of chronic obstructive pulmonary disease benefit from noninvasive positive-pressure ventilation? A systematic review of the literature. Ann Intern Med 2003;138:861-70.

5.         Lightowler JV, Wedzicha JA, Elliott MW, Ram FS. Non-invasive positive pressure ventilation to treat respiratory failure resulting from exacerbations of chronic obstructive pulmonary disease: Cochrane systematic review and meta-analysis. BMJ 2003;326:185.

6.         Masip J, Roque M, Sánchez B, Fernández R, Subirana M, Expósito JA. Noninvasive ventilation in acute cardiogenic pulmonary edema: systematic review and meta-analysis. JAMA 2005;294:3124-30.

7.         Gray A, Goodacre S, Newby DE, et al. Noninvasive ventilation in acute cardiogenic pulmonary edema. N Engl J Med. 2008;359(2):142-51.

8.         Carrillo A, Gonzalez-Diaz G, Ferrer M, et al. Non-invasive ventilation in community-acquired pneumonia and severe acute respiratory failure. Intensive Care Med. 2012;38(3):458-66.

9.         Delclaux C, L’Her E, Alberti C, et al. Treatment of acute hypoxemic nonhypercapnic respiratory insufficiency with continuous positive airway pressure delivered by a face mask: a randomized controlled trial. JAMA 2000;284:2352-60.

10.      Frat JP, Thille AW, Mercat A, et al. High-Flow Oxygen through Nasal Cannula in Acute Hypoxemic Respiratory Failure. N Engl J Med. 2015;

11.      Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.

12.      Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130(12):1005-13.

The Third Annotation of a Case of Identity


So often in modern medicine we mistake science for truth. In doing so we have become enamored with the p-value and view it as the major determinant of relevance in scientific inquiry. An almost arbitrarily selected value of 0.05 is independently responsible for defining what is considered beneficial, and what will be discarded as medical quackery. The p-value was first proposed by Ronald Fisher as a novel method of defining the probability that the results observed had occurred by chance alone. Or stated more formally, “the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed” (1). Originally intended as a tool for clinicians to assess whether the results from a trial were due to the treatment effect in question or merely random chance, its meaning has transformed into something far more divine. Despite its overwhelming acceptance, the p-value has many flaws. It is incapable of distinguishing clinical relevance; it only denotes the probability of equivalence. In addition, its faculties are easily overwhelmed when multiple observations are performed. Finally, the mathematical assumptions it is built upon do not take into account prior evidence and provide no guidance for future endeavors (1).

Our romance with the p-value has not gone unnoticed by many pharmaceutical companies who have learned that they need not manufacture a drug that produces clinical benefits, but rather fabricate a trial that demonstrates statistical significance. Since the p-value does not take into account prior evidence, trialists are not required to justify results as they relate to the entirety of an evidentiary body but rather demonstrate singular mathematical significance in a statistical vacuum. As such we are asked to live in the evidentiary present with only selective access to past knowledge. Even when we are granted a privileged glimpse at results from prior trials, it is often comprised of incomplete and limited data intended to sway our opinion in a deliberate manner. This phenomenon, known as publication bias, allows pharmaceutical companies to preferentially publish trials with p-values that suit their interests while suppressing others that do not support their claims. By prospectively highlighting a would-be therapy’s more flattering features and bullying Frequentist statistics with sample sizes that would make even negligible differences significant, it is easy to snatch statistical victory from the grasp of clinical obscurity. This is likely what the makers of ticagrelor hoped for when they designed the PEGASUS Trial.

The PEGASUS Trial’s intention was to extend ticagrelor’s temporal indication beyond the 12-month window, testing the hypothesis that long-term therapy with ticagrelor in conjunction with low-dose aspirin reduces the risk of major adverse cardiovascular events among stable patients with a history of myocardial infarction. Bonaca et al randomized 21,162 patients who had experienced a myocardial infarction within the past 1-3 years to either 90 mg or 60 mg of ticagrelor twice daily or placebo (2). This is not the first time such a hypothesis has been investigated. Multiple trials have studied whether prolonged use of P2Y12 inhibitors possesses any value other than augmenting the pharmaceutical industry’s coffers. The largest of these investigations, the DAPT trial, was published in 2014 by Mauri et al in NEJM (3). This trial examined patients 12 months after a cardiovascular event and considered whether the continuation of either clopidogrel or prasugrel was beneficial. The authors randomized 9,961 patients to either a P2Y12 inhibitor or an appropriate placebo. The DAPT Trial demonstrated that prolonged use of dual-antiplatelet therapy decreased the rate of cardiovascular events (4.3% vs. 5.9%) and stent thrombosis (0.4% vs. 1.4%) in exchange for an increased rate of severe bleeding (2.5% vs. 1.6%). There was also a small increase in overall mortality (2% vs. 1.5%) in patients randomized to prolonged P2Y12 inhibition (3). Multiple recent meta-analyses confirm these findings (4,5). These results should come as no surprise as the bulk of the literature examining P2Y12 inhibitors has highlighted their benefit primarily as a means of reducing type 4a peri-procedural infarctions of questionable clinical relevance. And so this was the landscape AstraZeneca faced when designing the PEGASUS Trial.
Every prior trial examining the question of prolonged dual-antiplatelet therapy has demonstrated that the small reductions in ischemic endpoints are easily overshadowed by the excessive increase in the rate of severe bleeding events. Fortunately in the modern era of Frequentist statistics none of these failures matter. Because the p-value does not account for prior evidence, the authors of the PEGASUS Trial did not have to account for this less-than-stellar history. Success by modern standards is simply the ability to contrive a primary endpoint that will demonstrate an appreciably low enough p-value to be considered significant.

Bonaca et al’s primary outcome was the composite rate of cardiovascular death, MI and stroke over the follow-up period (3 years). The absolute rates of primary events were 7.85%, 7.77%, and 9.02% in the 90 mg, 60 mg and placebo groups respectively. This small difference (approximately 1.2% absolute) was found to be impressively statistically significant (p-values of 0.008 and 0.004 in the 90 mg vs placebo and 60 mg vs placebo comparisons respectively). Its clinical significance is far more questionable, and unlike its statistical counterpart cannot be bullied by the mass and size of the sample population. The effect size of this composite endpoint is diminutive. The effect sizes of each respective component of this composite outcome are even smaller. The only measure that maintained its statistical significance consistently across all treatment comparisons was the reduction in myocardial infarction, which boasts a 0.85% and 0.72% absolute reduction in the 90 mg and 60 mg groups respectively.
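The way sample size can bully a p-value is easy to illustrate with a crude sketch. The code below is not the trial's analysis (the published p-values come from the investigators' own time-to-event methods); it simply applies a textbook two-proportion z-test to the 90 mg and placebo rates quoted above. The function name and the arm sizes are assumptions, with 7,054 chosen as roughly one third of the 21,162 patients randomized.

```python
import math

def two_sided_p(p_control, p_treated, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test with equal arms."""
    pooled = (p_control + p_treated) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (p_control - p_treated) / se
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

PLACEBO_RATE, TICAGRELOR_90_RATE = 0.0902, 0.0785  # rates quoted in the text

# ASSUMPTION: hypothetical arm sizes; 7054 is roughly 21,162 / 3.
for n in (500, 2000, 7054):
    p = two_sided_p(PLACEBO_RATE, TICAGRELOR_90_RATE, n)
    print(f"n per arm = {n:>5}: p = {p:.3f}")
```

The absolute difference never changes, yet under these assumptions it is nowhere near significant with 500 patients per arm and falls below 0.05 once each arm holds roughly 7,000. The effect is no larger; only the sample is.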

Conversely the rates of bleeding in the patients randomized to receive the active agent were impressively high, especially given that previous studies examining ticagrelor demonstrated a more reasonable safety profile. The rate of TIMI major bleeding was 2.6%, 2.3% and 1.06% in the 90 mg, 60 mg and placebo groups respectively. Since the rates of both intracranial hemorrhage and fatal hemorrhage were statistically similar, most of this excess bleeding seems to be in the form of “clinically overt hemorrhage associated with a drop in hemoglobin of ≥5 g/dL or a ≥15% absolute decrease in hematocrit” (2). These results are not too dissimilar from those of the DAPT Trial (3). Patients taking P2Y12 inhibitors will benefit from a slight decrease in the risk of non-fatal myocardial infarctions and stent thrombosis while experiencing an increased risk of clinically significant bleeding.
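One way to weigh this trade is to convert the absolute differences into a number needed to treat (NNT) and a number needed to harm (NNH). The sketch below does nothing more than take reciprocals of the rates quoted above for the primary composite endpoint and TIMI major bleeding; the function name is illustrative, and the figures are back-of-the-envelope values rather than numbers reported by the investigators.

```python
def number_needed(rate_a, rate_b):
    """Reciprocal of the absolute difference between two event rates."""
    return 1 / abs(rate_a - rate_b)

# 3-year event rates quoted in the text: (primary composite, TIMI major bleeding).
PLACEBO = (0.0902, 0.0106)
TICAGRELOR = {
    "90 mg": (0.0785, 0.026),
    "60 mg": (0.0777, 0.023),
}

for dose, (composite, bleeding) in TICAGRELOR.items():
    nnt = number_needed(PLACEBO[0], composite)  # patients treated to prevent one primary event
    nnh = number_needed(PLACEBO[1], bleeding)   # patients treated to cause one major bleed
    print(f"{dose}: NNT ~ {nnt:.0f}, NNH ~ {nnh:.0f}")
```

With these inputs, at the 90 mg dose fewer patients need to be treated to cause one major bleed than to prevent one primary event, and at 60 mg the two numbers are nearly equal.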

Despite the positive spin of this trial, it is far from a success. The investigators enrolled the more infirm spectrum of patients with CAD, so as to include a cohort more likely to benefit from additional anti-platelet inhibition. They also excluded the patients most at risk for hemorrhagic complications so as to limit the appearance of adversity. Investigators excluded patients with a history of ischemic stroke or intracranial hemorrhage, a central nervous system tumor, an intracranial vascular abnormality, a history of gastrointestinal bleeding within the previous 6 months or major surgery within the previous 30 days (2). This of course would not in itself be a concern, were it not for the likely application of prolonged dual-antiplatelet therapy to a far broader patient population.

Our current version of evidence-based medicine has left us susceptible to mistaking mathematical manipulations for scientific truth. It is short-sighted and allows for the linguistic error of mistaking statistical significance for clinical relevance. The PEGASUS Trial boasts p-values far below what is traditionally considered significant, and yet p-values below 0.05 hold little intrinsic value to our patients’ well-being. Yes, from a Frequentist’s perspective we are capable of concluding with relative certainty that the use of ticagrelor decreases the composite endpoint of cardiovascular death, MI, or stroke. The clinical relevance of this finding is far from certain, as its weight is powered exclusively by a decrease in myocardial infarctions. It is unlikely this small benefit is worth the impressive increase in serious hemorrhagic events. From the very earliest trials examining P2Y12 inhibitors, their benefits have been primarily due to the manipulation of statistical constructs rather than any inherent efficacy (6,7). The PEGASUS Trial is no different. These trials are not landmark demonstrations of P2Y12 inhibitors’ benefits, but rather statistical manipulations of clinically insignificant differences stacked one on top of the other to give the appearance of height when none is present. It is the statistical equivalent of an eyespot meant to keep the scorn of the medical skeptics at bay. Know that we are not scared or confused by your statistical mimicry. We see these trials for what they are, pharmaceutical advertisements poorly hidden behind the guise of scientific inquiry.

Sources Cited:

  1. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.
  2. Bonaca MP, Bhatt DL, Cohen M, et al. Long-Term Use of Ticagrelor in Patients with Prior Myocardial Infarction. N Engl J Med. 2015
  3. Mauri L, Kereiakes DJ, Yeh RW, et al. Twelve or 30 months of dual antiplatelet therapy after drug-eluting stents. N Engl J Med. 2014;371(23):2155-66.
  4. Palmerini T, Sangiorgi D, Valgimigli M, et al. Short- versus long-term dual antiplatelet therapy after drug-eluting stent implantation: an individual patient data pairwise and network meta-analysis. J Am Coll Cardiol. 2015;65(11):1092-102.
  5. Giustino G, Baber U, Sartori S, et al. Duration of Dual Antiplatelet Therapy After Drug-Eluting Stent Implantation: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. J Am Coll Cardiol. 2015;65(13):1298-310.
  6. Yusuf S, Zhao F, Mehta SR, et al. Effects of clopidogrel in addition to aspirin in patients with acute coronary syndromes without ST-segment elevation. N Engl J Med. 2001;345(7):494-502.
  7. A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee. Lancet. 1996;348(9038):1329-39.