The Case of the Irregular Irregularity

We have proven ourselves highly capable of managing atrial fibrillation in the Emergency Department. In recent years, a number of prospective cohorts have demonstrated that with the use of IV anti-arrhythmic medication and electrical cardioversion, patients presenting to the Emergency Department with new onset atrial fibrillation can consistently be discharged in sinus rhythm with minimal adverse events. In 2010, Stiell et al published a case series of 660 patients who were cardioverted in the Emergency Department (1). What they coined the “Ottawa Aggressive Protocol” consisted of chemically managed rate control followed by a trial of procainamide loaded over an hour and, if this failed to convert the patient, DC electrical cardioversion. Using this protocol, Stiell et al report that 595 patients (90.2%) were discharged home in normal sinus rhythm. In a recent systematic review published in the European Journal of Emergency Medicine, Coll-Vinent et al found that among patients who underwent Emergency Department cardioversion, 78.2%-100% were discharged home in normal sinus rhythm (2).

But competency is not directly translatable into efficacy. Despite this proof of concept, there is limited data examining the patient-oriented benefits these aggressive rhythm control strategies produce. In fact, the majority of such studies employ the “rhythm at Emergency Department discharge” as their measure of success. And though being discharged from the Emergency Department in sinus rhythm seems preferable to being discharged in atrial fibrillation, little is known regarding the extent of this benefit, as very few trials rigorously monitored patients following discharge. How many of these patients remained in sinus rhythm, and for how long? Stiell et al found that only 8.6% of their cohort returned to the Emergency Department within one week of cardioversion with any recurrence of atrial fibrillation. Unfortunately these numbers were calculated from a chart extraction of the Ottawa Hospital health records database and do not directly reflect the number of patients who experienced atrial fibrillation over the 7 days following Emergency Department discharge (1). Decker et al, in a small cohort of 150 patients, cite a recurrence rate of 10% at 6 months (3). What is the true recurrence rate? Even more importantly, does reestablishing sinus conduction lead to improved patient health and wellbeing?

The question at hand remains: what exactly are we achieving by performing cardioversions in the Emergency Department? We have known for some time that despite being capable of maintaining patients in sinus rhythm with moderate success, an aggressive rhythm control strategy does not prevent the long-term sequelae associated with atrial fibrillation. The AFFIRM trial, published in the NEJM in 2002, demonstrated that in a cohort of 4060 patients with atrial fibrillation, although a rhythm control strategy reduced the time patients spent in atrial fibrillation, it did not reduce the rate of death, MI or ischemic stroke (4). When the 1391 patients experiencing their first episode of atrial fibrillation or the 1252 patients presenting within 48 hours of symptom onset were examined separately, no additional benefit was discovered (4). Since the AFFIRM trial’s publication, a number of studies performed in various subsets of atrial fibrillation patients have confirmed that rhythm control strategies do not prevent the long-term sequelae associated with this chronic disease (5,6).

Since rate control is the preferred long-term treatment strategy for atrial fibrillation, what exactly are our goals for cardioversion in the Emergency Department? Is there a long-term health benefit to aggressive rhythm control in the Emergency Department? Does this lead to noticeable improvements in patient outcomes? Unfortunately conclusive data on these questions has yet to be published. The few RCTs examining the benefits of aggressive management of atrial fibrillation in the Emergency Department are small and inconclusive. Despite this paucity of convincing evidence, I would argue that the mathematical likelihood of benefit is incredibly low. Atrial fibrillation is a chronic disease, with sequelae measured in events per patient-year. The rate of short-term adverse events is exceedingly low, with some cohorts citing a 30-day event rate of less than 1% (7). To design a study powered to identify a statistically meaningful difference, the sample size required would be unrealistically high, especially given that the long-term use of such rhythm control strategies has not yielded clinically important improvements in patient outcomes. Furthermore, the act of emergent cardioversion does not avert the need for anticoagulation, as this decision should be based on the patient’s risk of a thromboembolic event, independent of their rhythm at discharge (8).
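
To put a rough number on "unrealistically high", here is a minimal sketch of the standard sample-size approximation for comparing two proportions. The 1% baseline echoes the 30-day event rate cited above; the halving of that rate to 0.5% is purely a hypothetical treatment effect chosen for illustration, as are the conventional 5% alpha and 80% power. The function name is my own.

```python
from statistics import NormalDist
import math


def sample_size_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_control vs p_treated
    with a two-sided test of two independent proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical effect: cardioversion halves a 1% 30-day event rate to 0.5%.
n = sample_size_per_arm(0.01, 0.005)
print(f"{n} patients per arm, {2 * n} in total")
```

Under these assumptions the answer lands in the neighborhood of 4,700 patients per arm, several times the size of the largest Emergency Department cohorts discussed here, and a smaller assumed effect would push it far higher.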

If we can agree that the clinical benefits of aggressive cardioversion in the Emergency Department are minimal, then the only remaining justification for Emergency Department cardioversion is its positive effect on patient wellbeing and comfort. The current argument in support of Emergency Department cardioversion hinges on the supposition that a state of sinus regularity is preferred to the electrical chaos induced by atrial fibrillation (9). Until recently this claim had been supported exclusively by anecdotal descriptions of patient experience; its validity had never been examined in a prospective fashion.

Published online June 2015 in the Annals of Emergency Medicine, Ballard et al sought to objectively assess the effects of Emergency Department cardioversion on patients’ wellbeing and comfort (10). The authors surveyed 730 patients who were treated for new onset atrial fibrillation and discharged from one of 21 medical centers in Northern California. Of this cohort, 652 (89%) responded to a structured phone survey. Though the data was prospectively gathered, these patients were not randomized to either a rate or rhythm control strategy; rather, the manner of treatment was left entirely to the judgment of the treating physician. Of the 652 respondents, the majority, 432 (67.3%), were managed with rate control therapy alone. Regardless of management strategy, 410 (62.9%) of the patients were discharged from the Emergency Department in sinus rhythm. Among those patients who underwent electrical cardioversion, 92.2% were in sinus rhythm upon discharge. If you consider discharge rhythm as a metric of success, then electrical cardioversion was a far more accomplished strategy than either pharmacological cardioversion or rate control therapy alone, after which 81.6% and 49.7% of patients respectively were in sinus rhythm at discharge (10). Despite its obvious superiority in rhythm control, what benefits does cardioversion provide for patients’ symptom burden at 30 days?

The authors measured 30-day wellbeing using the Atrial Fibrillation Effect on Quality-of-life (AFEQT) score. This 18-question tool was intended to assess the patients’ perception of the burden of disease. The surveys were administered by telephone by trained research assistants at least 28 days following the Emergency Department visit. Overall 539 patients (82.7%) reported some degree of symptom burden related to their atrial fibrillation upon discharge. The use of cardioversion did not decrease the rate or degree of symptom burden at 30 days. When the authors analyzed the AFEQT scores in quartiles of severity rather than the dichotomous symptom/no symptom outcome, they found no additional benefit to Emergency Department cardioversion. Certainly this data is far from perfect. This was a non-randomized cohort and it is unclear how well the AFEQT score captures symptom burden (10). Despite these shortcomings, the findings are consistent with the body of literature examining whether an aggressive rhythm control strategy improves patient wellbeing. A number of trials have examined the long-term effect of rhythm control on symptom burden. These trials have consistently demonstrated that when compared to rate control alone, an aggressive rhythm control strategy provided no additional perceivable benefit to patients’ wellbeing and comfort (11).

The act of electrical cardioversion within 48 hours of symptom onset is commonly perceived as a safe practice. In a recent review of the existing literature, Cohen et al found that out of 1593 patients, only one stroke (0.06%) was reported. Despite this cursory endorsement, I would caution that safety is measured in the thousands and the current data is far too limited and too rife with publication bias to truly assess safety. Additionally, a recent research letter published in JAMA called into question the safety of the 48-hour window we have traditionally used to determine suitability for Emergency Department cardioversion. Nuotio et al published a secondary analysis of the FinCV registry, which examined 2481 patients in atrial fibrillation who underwent electrical cardioversion within 48 hours of symptom onset. In this cohort the risk of ischemic event increased significantly (0.03% to 1.1%) when the time from symptom onset to cardioversion was greater than 12 hours. And although 1.1% is still a relatively low event rate, given the absence of any clear clinical benefit, the benefit-harm ratio does not favor an aggressive rhythm control strategy (12).
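
To illustrate why a single event in 1593 patients is thin evidence of safety, the sketch below computes an exact (Clopper-Pearson style) upper 95% confidence limit for the true stroke rate. It takes the one-in-1593 figure quoted above at face value and treats the pooled patients as one simple binomial sample, which is my simplifying assumption and not how the review itself analyzed the data. The function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """Probability of observing k or fewer events in n patients at true rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_limit(events, n, confidence=0.95):
    """Upper limit of the two-sided exact confidence interval for a rate.

    Finds, by bisection, the rate at which seeing `events` or fewer
    becomes as unlikely as the tail probability allows.
    """
    tail = (1 - confidence) / 2
    lo, hi = events / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(events, n, mid) > tail:
            lo = mid  # this rate is still compatible with the data
        else:
            hi = mid
    return hi


observed = 1 / 1593
upper = exact_upper_limit(1, 1593)
print(f"observed {observed:.2%}, upper 95% limit {upper:.2%}")
```

On these assumptions the data remain compatible with a true stroke rate several times higher than the observed 0.06%, which is the sense in which safety has to be measured in the thousands.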

Modern medicine far too often values competency over efficacy. Whether it is door to balloon time, or the 6-hour sepsis bundle, we are constantly measured in surrogates thought to be associated with improvements in patient outcomes. The quality of our care has been distilled down to what can be marked as complete on a checklist. Although the evidence clearly demonstrates Emergency Physicians are capable of effectively cardioverting new onset atrial fibrillation in the Emergency Department, one cannot help but ask, to what end?

Sources Cited:

  1. Stiell, I.G., Clement, C.M., Perry, J.J. et al. Association of the Ottawa Aggressive Protocol with rapid discharge of emergency department patients with recent-onset atrial fibrillation or flutter. CJEM. 2010; 12: 181–191
  2. Coll-Vinent, B., Fuenzalida, C., Garcia, A. et al. Management of acute atrial fibrillation in the emergency department: a systematic review of recent studies. Eur J Emerg Med. 2013; 20: 151–159
  3. Decker, et al. A Prospective, Randomized Trial of an Emergency Department Observation Unit for Acute Onset Atrial Fibrillation.  Annals of Emergency Medicine, 2007.
  4. Wyse DG, Waldo AL, DiMarco JP, et al. A comparison of rate control and rhythm control in patients with atrial fibrillation. N Engl J Med. 2002;347(23):1825-33.
  5. Van Gelder IC, Hagens VE, Bosker HA, et al. A comparison of rate control and rhythm control in patients with recurrent persistent atrial fibrillation. N Engl J Med. 2002;347(23):1834-40.
  6. Roy D, Talajic M, Nattel S, et al. Rhythm control versus rate control for atrial fibrillation and heart failure. N Engl J Med. 2008;358(25):2667-77.
  7. Scheuermeyer FX, Grafstein E, Stenstrom R, et al. Thirty-day and 1-year outcomes of emergency department patients with atrial fibrillation and no acute underlying medical cause. Ann Emerg Med. 2012;60(6):755-765.e2.
  8. Wang TJ, Massaro JM, Levy D, et al. A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA. 2003;290:1049-56.
  9. Stiell IG, Birnie D. Management of recent-onset atrial fibrillation in the emergency department. Ann Emerg Med. 2011; 57:31–2.
  10. Ballard, DW. et al. Emergency Department Management of Atrial Fibrillation and Flutter and Patient Quality of Life at One Month Postvisit. Annals of Emergency Medicine
  11. Thrall G, Lane D, Carroll D, Lip GY. Quality of life in patients with atrial fibrillation: a systematic review. Am J Med. 2006;119(5):448.e1-19.
  12. Nuotio I, Hartikainen JE, Grönberg T, Biancari F, Airaksinen KE. Time to cardioversion for acute atrial fibrillation and thromboembolic complications. JAMA. 2014;312(6):647-9.

The Problem of Thor Bridge

Disclosure: This post is unusually full of hearsay and conjecture. Like a secondary endpoint that flirts with statistical significance it should be viewed purely as hypothesis generating. For a more reasoned and experienced view of the following data please read Josh Farkas’s wonderful post on pulmcrit.org.

Damage control ventilation is not a novel concept. It functions under the premise that positive-pressure ventilation intrinsically possesses few curative properties and rather acts as a bridge until a more suitable state of ventilatory well-being can be achieved. As such, we should view its utilization as a necessary evil and endeavor not to correct the patient’s pathological perturbations but rather to limit its iatrogenic harms. Since the publication of the ARDSNet protocol in 2000 we have known that striving to achieve physiological normality leads to greater parenchymal injury and downstream mortality (1). Later research demonstrated that even in patients without fulminant ARDS, a protective lung strategy is beneficial (2). Understandably we are reluctant to initiate mechanical ventilation unless absolutely necessary. Because of its ability to delay and even prevent more invasive forms of ventilatory support, non-invasive ventilation (NIV) has long been the darling of the emergent management of most respiratory complaints. It is a rare respiratory ailment that cannot be remedied with a tincture of positive-pressure ventilatory support delivered via a form-fitting face mask. Its widespread implementation is primarily born of NIV’s capacity to provide a bridge to a more definitive form of therapeutic support. Due in part to NIV’s ability to decrease the rate of intubation in patients presenting with COPD and CHF exacerbations, it is increasingly being utilized in a subgroup of patients where a definitive destination is far less assured, a group of patients in whom the cause of their current dyspnea is not so readily correctable. A bridge, if you permit me a moment of sensationalism, to nowhere…

Although the efficacy of NIV in COPD exacerbations and acute cardiogenic pulmonary edema is well documented (3,4,5,6,7), the evidence for its use in managing other forms of hypoxic failure, such as pneumonia and ARDS, is far less robust. In fact there is some less than perfect evidence demonstrating that in these populations NIV fails to prevent intubation, and that in the subset of patients whose trial of non-invasive ventilatory support is unsuccessful, mortality is higher than in those patients who were initially intubated (8,9). And so the authors of the “Clinical Effect of the Association of Non-invasive Ventilation and High Flow Nasal Oxygen Therapy in Resuscitation of Patients with Acute Lung Injury (FLORALI)” trial hoped to examine whether NIV was superior to standard face mask oxygenation therapy in patients with acute hypoxic respiratory failure (10). Frat et al examined two forms of non-invasive ventilatory strategies in patients admitted to the ICU with non-hypercapnic, non-cardiogenic hypoxic respiratory failure. The first was the traditional bi-level positive pressure ventilation, more commonly known as BPAP. The second was high-flow (50 L/min) humidified oxygen delivered via nasal cannula. Using a 1:1:1 ratio, the authors randomized 313 patients to either BPAP, high-flow NC or standard facemask support. The authors enrolled a relatively sick spectrum of patients. In order to be enrolled, patients were required to have a respiratory rate of more than 25 breaths per minute, a PaO2/FiO2 of 300 mm Hg or less while on 10 L of supplementary O2, and a PaCO2 of no higher than 45 mm Hg, with no history of underlying chronic respiratory disease.
Additionally, patients were excluded if they presented with an exacerbation of asthma or COPD, cardiogenic pulmonary edema, severe neutropenia, hemodynamic instability, use of vasopressors, a GCS of 12 or less, any contraindication to non-invasive ventilation, an urgent need for intubation or DNI orders. Given these stringent inclusion and exclusion criteria it is no surprise that out of the 2506 patients who presented to one of the 23 participating ICUs, only 525 met the criteria for inclusion. Of these, 313 underwent randomization and 310 were included in the final analysis (10).

The cause of hypoxia in the vast majority (75.5%) of these patients was pneumonia. The authors’ primary endpoint was the number of patients in each group who underwent endotracheal intubation within 28 days of enrollment. Although the authors found no statistical difference in the rate of intubation between the three groups, it is difficult not to infer a clinically important difference that was statistically overlooked due to the limited power generated by an n of 310. The 28-day intubation rate in the high-flow O2 group was 37% compared to 47% and 50% in the face-mask and BPAP groups respectively (an absolute difference of 10% and 13% respectively). When the more severely hypoxic patients were examined (those with a PaO2/FiO2 < 200), this absolute difference increased to 18% and 23% respectively. Additionally, patients randomized to high-flow O2 had lower mortality rates than those in either the face-mask or BPAP groups. ICU mortality was 11%, 19% and 25% respectively and 90-day mortality was 12%, 23%, and 28% respectively. In the patients with more pronounced hypoxia these differences in mortality became even larger. In patients with a PaO2/FiO2 < 200 the ICU mortality was 12%, 21.6% and 28.4%, while the 90-day mortality was 13.2%, 27.0% and 32.1%. Although the primary endpoint of this trial was negative (p = 0.18), there is a clear and consistent improvement in outcomes of patients randomized to high-flow O2 compared to the other two non-invasive strategies (10).
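
The claim about limited power can be checked on the back of an envelope. The sketch below estimates the power of a simple two-proportion comparison to detect the intubation rates quoted above. It assumes roughly 103 patients per arm (310 split evenly three ways, since the per-group sizes are not given here) and a plain normal approximation, which is not the analysis the trial itself used. The function name is my own.

```python
from statistics import NormalDist
import math


def approximate_power(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided comparison of two proportions
    with equal arms (normal approximation, unpooled variance)."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    z_effect = abs(p1 - p2) / se
    # Chance the test statistic falls beyond either critical value.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


N_PER_ARM = 103  # assumption: 310 analyzed patients split evenly three ways

for label, comparator_rate in (("face mask", 0.47), ("BPAP", 0.50)):
    power = approximate_power(0.37, comparator_rate, N_PER_ARM)
    print(f"high-flow vs {label}: power ~{power:.0%}")
```

Under these assumptions both comparisons fall well short of the conventional 80% power, so a non-significant primary endpoint is roughly what one would expect even if differences of this size were real.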

This trial is nowhere near perfect. The sample size is far too small to confidently rule out statistical whimsy’s causal responsibility for these findings. Additionally it is difficult to discern whether high-flow O2 was beneficial in this subgroup of patients or rather BPAP was deleterious. Most importantly it fails to address the question of primary concern for the Emergency Physician: is non-invasive ventilation preferable to early endotracheal intubation? Frat et al compared high-flow O2 and BPAP therapy to standard face-mask oxygenation, which does not help us determine whether NIV is superior to early invasive ventilatory support. Furthermore, this trial examines the use of NIV in ICU patients over prolonged periods (median time to intubation was 17-27 hours); it does not tell us whether the use of BPAP is detrimental while patients are managed in the Emergency Department. Given these shortcomings how should we view these data?

Technically, from a Frequentist’s viewpoint, these statistically significant secondary endpoints are just hypothesis generating, and additional studies are required to validate these preliminary findings. But what if, for a moment, we were to take a Bayesian perspective and examine this very same paper from an alternative vantage? How then would this data appear? Bayesian statistics takes an inductive perspective when examining data. Simply put, it asks: how does this data affect the prior scientific belief? Given the data presented in this trial, what is the most probable hypothesis that explains these results (12)? How do these results change the scientific belief that was held prior to this study being conducted? In contrast, when using Frequentist statistics we employ a deductive methodology to address one question and utilize a predetermined statistical threshold to either accept or reject the null hypothesis. All other questions examined in the paper are essentially exploratory and, due to the single-minded nature of the p-value, are simply hypothesis generating (11).
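
For readers who want the Bayesian mechanics spelled out, here is a minimal sketch of how a prior belief is updated by a Bayes factor. Every number in it is hypothetical: the priors stand in for a skeptic and an agnostic, and the Bayes factor of 5 is an arbitrary illustration, not a value derived from the FLORALI data. I define the Bayes factor here as the probability of the data under the hypothesis of benefit divided by its probability under the null.

```python
def posterior_probability(prior_probability, bayes_factor):
    """Update a prior probability with a Bayes factor.

    bayes_factor = P(data | hypothesis) / P(data | null), so values
    above 1 favor the hypothesis. Works on the odds scale:
    posterior odds = prior odds * Bayes factor.
    """
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


HYPOTHETICAL_BAYES_FACTOR = 5  # illustrative only, not computed from any trial

for reader, prior in (("skeptic", 0.10), ("agnostic", 0.50)):
    posterior = posterior_probability(prior, HYPOTHETICAL_BAYES_FACTOR)
    print(f"{reader}: prior {prior:.0%} -> posterior {posterior:.0%}")
```

The same data move the two readers to different places, which is the point: the Bayesian answer depends explicitly on what was believed before the trial, whereas the p-value ignores it.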

Examining the data published by Frat et al, one would conclude the most probable hypothesis that would explain these events is:

In patients with non-hypercapnic, non-cardiogenic, hypoxic respiratory failure high-flow oxygen therapy decreases both mortality and the rate of intubation when compared to face-mask oxygenation. Additionally the use of BPAP does not decrease the rate of intubation and may in fact increase mortality in a subset of the sickest patients.

How does this affect the prior scientific belief regarding the efficacy of NIV in patients with hypoxic respiratory failure? Frat et al certainly supports the prior evidence demonstrating that BPAP therapy is detrimental in this subset of patients with hypoxic respiratory failure. In fact the rate of endotracheal intubation (50%) is essentially identical to rates cited in prior cohorts (8). It also highlights that these negative effects may in fact be due to the therapy itself rather than the delay to definitive airway management, as was previously hypothesized. Though there was a non-significant increase in the median time to intubation in the BPAP group compared to patients receiving face-mask therapy alone, the time to intubation in the BPAP and high-flow O2 groups was identical. And yet despite these minimal differences in time to intubation, the patients who underwent intubation in the BPAP group had an increased mortality when compared to those randomized to either face-mask or high-flow oxygen (10). Patients in the BPAP group, with the help of positive pressure, achieved average tidal volumes of 9 cc/kg. As the ARDSNet trial group demonstrated, when administering positive pressure ventilation, a lung protective strategy with tidal volumes of 6 cc/kg led to significant improvement in outcomes in patients with ARDS (1). Determann et al demonstrated that even in patients without ARDS, lung protective strategies led to improved outcomes when compared to more traditional physiological lung volumes (2). Until now we have cognitively absolved positive pressure delivered in a non-invasive form as a causative agent of such complications. The findings of Frat et al have, for the first time, cast a shadow of doubt on the innocence of NIV.

As far as the spectacular results demonstrated by the high-flow O2 group, given the size of the population studied and a paucity of previous science with which to compare, it is hard to know how much credence to place in these results. What is clear is we should no longer view high-flow O2 as a substandard option, reserved only for patients who have failed to tolerate the more traditional forms of NIV. Rather high-flow O2 may provide a unique form of respiratory support that is not accounted for by our prior understanding of NIV (10).

We have known for some time that the use of positive pressure ventilation is the result of being forced to choose the lesser of two evils. Although it provides a means of ventilatory support, it possesses little inherent therapeutic benefit. In fact, positive-pressure ventilation comes at the cost of hemodynamic compromise, iatrogenic lung injury, nosocomial infections, and sedation protocols that leave patients confused and delirious. As such, a damage control strategy is typically employed to limit these downstream harms until the patient’s own ventilatory capacity has returned. Until now these strategies have been limited to invasive forms of ventilatory support. The Frat et al data suggest that, to some degree, non-invasive ventilatory support may be associated with similar iatrogenic harms. Although the current data is incomplete, it should remind us that if we intend to construct a bridge, we should have some understanding of where this intended conduit will lead and whether this is a healthier destination than where we started.

Sources Cited:

1.         Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome: the Acute Respiratory Distress Syndrome Network. N Engl J Med 2000;342:1301‒8.

2.         Determann RM, Royakkers A, Wolthuis EK, et al. Ventilation with lower tidal volumes as compared with conventional tidal volumes for patients without acute lung injury: a preventive randomized controlled trial. Crit Care 2010;14(1):R1.

3.         Brochard L, Mancebo J, Wysocki M, et al. Noninvasive ventilation for acute exacerbations of chronic obstructive pulmonary disease. N Engl J Med 1995;333:817-22.

4.         Keenan SP, Sinuff T, Cook DJ, Hill NS. Which patients with acute exacerbation of chronic obstructive pulmonary disease benefit from noninvasive positive-pressure ventilation? A systematic review of the literature. Ann Intern Med 2003;138:861-70.

5.         Lightowler JV, Wedzicha JA, Elliott MW, Ram FS. Non-invasive positive pressure ventilation to treat respiratory failure resulting from exacerbations of chronic obstructive pulmonary disease: Cochrane systematic review and meta-analysis. BMJ 2003;326:185.

6.         Masip J, Roque M, Sánchez B, Fernández R, Subirana M, Expósito JA. Noninvasive ventilation in acute cardiogenic pulmonary edema: systematic review and meta-analysis. JAMA 2005;294:3124-30.

7.         Gray A, Goodacre S, Newby DE, et al. Noninvasive ventilation in acute cardiogenic pulmonary edema. N Engl J Med. 2008;359(2):142-51.

8.         Carrillo A, Gonzalez-Diaz G, Ferrer M, et al. Non-invasive ventilation in community-acquired pneumonia and severe acute respiratory failure. Intensive Care Med. 2012;38(3):458-66.

9.         Delclaux C, L’Her E, Alberti C, et al. Treatment of acute hypoxemic nonhypercapnic respiratory insufficiency with continuous positive airway pressure delivered by a face mask: a randomized controlled trial. JAMA 2000;284:2352-60.

10.      Frat JP, Thille AW, Mercat A, et al. High-Flow Oxygen through Nasal Cannula in Acute Hypoxemic Respiratory Failure. N Engl J Med. 2015;

11.      Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.

12.      Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130(12):1005-13.

The Third Annotation of a Case of Identity

So often in modern medicine we mistake science for truth. In doing so we have become enamored with the p-value and view it as the major determinant of relevance in scientific inquiry. An almost arbitrarily selected value of 0.05 is independently responsible for defining what is considered beneficial and what will be discarded as medical quackery. The p-value was first proposed by Ronald Fisher as a novel method of defining the probability that the results observed had occurred by chance alone. Or stated more formally, “the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed” (1). Originally intended as a tool for clinicians to assess whether the results from a trial were due to the treatment effect in question or merely random chance, its meaning has transformed into something far more divine. Despite its overwhelming acceptance, the p-value has many flaws. It is incapable of distinguishing clinical relevance; it speaks only to how compatible the observed data are with the absence of any difference. In addition, its faculties are easily overwhelmed when multiple comparisons are performed. Finally, the mathematical assumptions it is built upon do not take into account prior evidence and provide no guidance for future endeavors (1).

Our romance with the p-value has not gone unnoticed by many pharmaceutical companies who have learned that they need not manufacture a drug that produces clinical benefits, but rather fabricate a trial that demonstrates statistical significance. Since the p-value does not take into account prior evidence, trialists are not required to justify results as they relate to the entirety of an evidentiary body but rather demonstrate singular mathematical significance in a statistical vacuum. As such we are asked to live in the evidentiary present with only selective access to past knowledge. Even when we are granted a privileged glimpse at results from prior trials, it is often comprised of incomplete and limited data intended to sway our opinion in a deliberate manner. This phenomenon, known as publication bias, allows pharmaceutical companies to preferentially publish trials with p-values that suit their interests while suppressing others that do not support their claims. By prospectively highlighting a would-be therapy’s more flattering features and bullying Frequentist statistics with sample sizes that would make even negligible differences significant, it is easy to snatch statistical victory from the grasp of clinical obscurity. This is likely what the makers of ticagrelor hoped for when they designed the PEGASUS Trial.
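
The bullying of Frequentist statistics by sample size is easy to demonstrate. The sketch below holds a one percentage point difference in event rates fixed (9% versus 8%, numbers invented for illustration and not taken from any trial) and varies only the number of patients per arm, using a simple pooled two-proportion z-test. The function name and the chosen arm sizes are hypothetical.

```python
import math


def two_proportion_p_value(p1, p2, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test with equal arms."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = abs(p1 - p2) / se
    # For a standard normal, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Same hypothetical 1-point absolute difference, increasingly large trials.
for n_per_arm in (500, 7000, 20000):
    p = two_proportion_p_value(0.09, 0.08, n_per_arm)
    print(f"{n_per_arm:>6} per arm: p = {p:.4f}")
```

The effect never changes, yet the p-value slides from unremarkable to highly significant as enrollment grows. Statistical significance here is a statement about the size of the trial as much as about the size of the benefit.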

The PEGASUS Trial’s intention was to extend ticagrelor’s temporal indication beyond the 12-month window, testing the hypothesis that long-term ticagrelor therapy in conjunction with low-dose aspirin reduces the risk of major adverse cardiovascular events among stable patients with a history of myocardial infarction. Bonaca et al randomized 21,162 patients who had experienced a myocardial infarction within the past 1-3 years to either 90 mg or 60 mg of ticagrelor twice daily or placebo. This is not the first time such a hypothesis has been investigated (2). Multiple trials have studied whether prolonged use of P2Y12 inhibitors possesses any value other than augmenting the pharmaceutical industry’s coffers. The largest of these investigations, the DAPT trial, was published in 2014 by Mauri et al in NEJM (3). This trial examined patients 12 months after a cardiovascular event and considered whether the continuation of either clopidogrel or prasugrel was beneficial. The authors randomized 9,961 patients to either a P2Y12 inhibitor or an appropriate placebo. The DAPT Trial demonstrated that prolonged use of dual-antiplatelet therapy decreased the rate of cardiovascular events (4.3% vs. 5.9%) and stent thrombosis (0.4% vs 1.4%) in exchange for an increased rate of severe bleeding (2.5% vs. 1.6%). There was also a small increase in overall mortality (2% vs 1.5%) in patients randomized to prolonged P2Y12 inhibition (3). Multiple recent meta-analyses confirm these findings (4,5). These results should come as no surprise, as the bulk of the literature examining P2Y12 inhibitors has highlighted their benefit primarily as a means of reducing type 4a peri-procedural infarctions of questionable clinical relevance. And so this was the landscape AstraZeneca faced when designing the PEGASUS Trial.
Every prior trial examining the question of prolonged dual-antiplatelet therapy has demonstrated that the small reductions in ischemic endpoints are easily overshadowed by the excessive increase in the rate of severe bleeding events. Fortunately, in the modern era of Frequentist statistics, none of these failures matter. Because the p-value does not account for prior evidence, the authors of the PEGASUS Trial did not have to account for this less-than-stellar history. Success by modern standards is simply the ability to contrive a primary endpoint that will demonstrate a p-value low enough to be considered significant.

Bonaca et al’s primary outcome was the composite rate of cardiovascular death, MI and stroke over the 3-year follow-up period. The absolute rates of primary events were 7.85%, 7.77%, and 9.02% in the 90 mg, 60 mg and placebo groups respectively. This small difference (approximately 1.2% absolute) was found to be impressively statistically significant (p-values of 0.008 and 0.004 in the 90 mg vs placebo and 60 mg vs placebo comparisons respectively). Its clinical significance is far more questionable and, unlike its statistical counterpart, cannot be bullied by the mass and size of the sample population. The effect size of this composite endpoint is diminutive. The effect sizes of each respective component of this composite outcome are even smaller. The only measure that maintained its statistical significance consistently across all treatment comparisons was the reduction in myocardial infarction, which boasts a 0.85% and 0.72% absolute reduction in the 90 mg and 60 mg groups respectively.

Conversely, the rates of bleeding in the patients randomized to receive the active agent were impressively high, especially given that previous studies examining ticagrelor demonstrated a more reasonable safety profile. The rate of TIMI major bleeding was 2.6%, 2.3% and 1.06% in the 90 mg, 60 mg and placebo groups respectively. Since the rates of both intracranial hemorrhage and fatal hemorrhage were statistically similar, most of this excess bleeding seems to be in the form of “clinically overt hemorrhage associated with a drop in hemoglobin of ≥5 g/dL or a ≥15% absolute decrease in hematocrit” (2). These results are not too dissimilar from those of the DAPT Trial (3). Patients taking P2Y12 inhibitors will benefit from a slight decrease in the risk of non-fatal myocardial infarctions and stent thrombosis while experiencing an increased risk of clinically significant bleeding.
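To put the benefit and harm figures quoted above on a common scale, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: it takes the raw percentages cited in the text at face value (the trial itself reports time-to-event estimates), and the helper name `patients_per_event` is a hypothetical label chosen for this sketch.

```python
# Rough sketch: turn the absolute differences quoted in the text into
# "patients treated per one event" figures (NNT for benefit, NNH for harm).
# All inputs are the percentages cited above; the function name is hypothetical.

def patients_per_event(rate_a_pct, rate_b_pct):
    """Reciprocal of the absolute difference between two event rates given in %."""
    return 100.0 / abs(rate_a_pct - rate_b_pct)

# Primary composite endpoint (placebo 9.02%)
nnt_90 = patients_per_event(9.02, 7.85)  # 90 mg ticagrelor
nnt_60 = patients_per_event(9.02, 7.77)  # 60 mg ticagrelor

# TIMI major bleeding (placebo 1.06%)
nnh_90 = patients_per_event(2.6, 1.06)
nnh_60 = patients_per_event(2.3, 1.06)

print(f"90 mg: NNT ~{nnt_90:.0f}, NNH ~{nnh_90:.0f}")
print(f"60 mg: NNT ~{nnt_60:.0f}, NNH ~{nnh_60:.0f}")
```

Taken at face value, these inputs suggest that the number of patients treated to prevent one composite event is of the same order as the number treated to cause one TIMI major bleed.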

Despite the positive spin of this trial, it is far from a success. The investigators enrolled the more infirm spectrum of patients with CAD, so as to include a cohort more likely to benefit from additional platelet inhibition. They also excluded the patients most at risk for hemorrhagic complications so as to limit the appearance of harm. Investigators excluded patients with a history of ischemic stroke or intracranial hemorrhage, a central nervous system tumor, an intracranial vascular abnormality, a history of gastrointestinal bleeding within the previous 6 months, or major surgery within the previous 30 days (2). This in itself would not be a concern, were it not for the likely application of prolonged dual-antiplatelet therapy to a far broader patient population.

Our current version of evidence-based medicine has left us susceptible to mistaking mathematical manipulations for scientific truth. It is short-sighted and allows for the linguistic error of mistaking statistical significance for clinical relevance. The PEGASUS Trial boasts p-values far below what is traditionally considered significant, and yet p-values below 0.05 hold little intrinsic value to our patients’ well-being. Yes, from a Frequentist’s perspective we are capable of concluding with relative certainty that the use of ticagrelor decreases the composite endpoint of cardiovascular death, MI, or stroke. The clinical relevance of this finding is far from certain, as its weight is powered exclusively by a decrease in myocardial infarctions. It is unlikely this small benefit is worth the impressive increase in serious hemorrhagic events. From the very earliest trials examining P2Y12 inhibitors, their benefits have been primarily due to the manipulation of statistical constructs rather than any inherent efficacy (6,7). The PEGASUS Trial is no different. These trials are not landmark demonstrations of P2Y12 inhibitors’ benefits, but rather statistical manipulations of clinically insignificant differences stacked one on top of the other to give the appearance of height when none is present. It is the statistical equivalent of an eyespot meant to keep the scorn of the medical skeptics at bay. Know that we are not scared or confused by your statistical mimicry. We see these trials for what they are: pharmaceutical advertisements poorly hidden behind the guise of scientific inquiry.

Sources Cited:

  1. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.
  2. Bonaca MP, Bhatt DL, Cohen M, et al. Long-Term Use of Ticagrelor in Patients with Prior Myocardial Infarction. N Engl J Med. 2015.
  3. Mauri L, Kereiakes DJ, Yeh RW, et al. Twelve or 30 months of dual antiplatelet therapy after drug-eluting stents. N Engl J Med. 2014;371(23):2155-66.
  4. Palmerini T, Sangiorgi D, Valgimigli M, et al. Short- versus long-term dual antiplatelet therapy after drug-eluting stent implantation: an individual patient data pairwise and network meta-analysis. J Am Coll Cardiol. 2015;65(11):1092-102.
  5. Giustino G, Baber U, Sartori S, et al. Duration of Dual Antiplatelet Therapy After Drug-Eluting Stent Implantation: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. J Am Coll Cardiol. 2015;65(13):1298-310.
  6. Yusuf S, Zhao F, Mehta SR, et al. Effects of clopidogrel in addition to aspirin in patients with acute coronary syndromes without ST-segment elevation. N Engl J Med. 2001;345(7):494-502.
  7. A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee. Lancet. 1996;348(9038):1329-39.

 

A Truncated Summation of the Adventure of the Cardboard Box

 

One gets the sense when reading the literature on endovascular therapy for acute ischemic stroke of being on a small seafaring vessel attempting to map the shoreline through a dense fog. There are moments when the fog lifts and you catch a glimpse of the topographic details of the shore, and then the cloud again rolls in, obscuring any further ascertainment. Similarly, the recent publications on endovascular therapy for acute ischemic stroke have demonstrated there is a definitive benefit to mechanical reperfusion therapy, and yet each publication in itself is so incomplete that it is difficult to perceive anything more than this general appearance of benefit. The finer details are obscured by the premature truncation of trials, too early to definitively characterize the benefits and risks of endovascular therapy.

MR CLEAN, published earlier this year in the NEJM, and discussed ad nauseam in previous posts, marked the first of what is now a litany of trials demonstrating benefit for endovascular therapy in acute ischemic stroke (1). Its release resulted in the subsequent premature stoppage of a number of key trials examining endovascular therapy. Although all these trials boast impressive results, each stopped their enrollment prematurely, not due to a preplanned interim analysis, but rather due to MR CLEAN’s positive results. ESCAPE and EXTEND-IA were the first to halt enrollment and hastily publish their results (2,3). More recently the NEJM has reported on the findings from the next two trials prematurely stopped due to MR CLEAN’s success.

The first of these studies is the SWIFT-PRIME trial published by Saver et al (4). This trial’s initial results were presented earlier this year alongside EXTEND-IA and ESCAPE at the 2015 International Stroke Conference. Like its counterparts, this trial examined patients presenting with large ischemic infarcts and radiographically identified occlusions in the terminal internal carotid artery (ICA) or first branch (M1) of the middle cerebral artery (MCA). Additionally, patients had to demonstrate a favorable core-to-ischemic penumbra ratio on perfusion imaging. Patients were enrolled if they were able to undergo endovascular intervention within 6 hours of symptom onset.

Like ESCAPE and EXTEND-IA, the results of SWIFT-PRIME are impressive. The authors boast a 25% absolute difference in the number of patients with an mRS of 0-2 at 90 days. Though notable, the definitive magnitude of effect is hardly concrete. The authors cite an NNT of 4 to have one more patient alive and independent at 90 days, and an NNT of 2.6 to have one patient less disabled. These calculations are made using their dichotomous and ordinal analyses respectively. Although the authors cite impressive p-values (<0.001), the confidence interval surrounding this 25% point estimate is far broader (11-38%), meaning the NNT is somewhere between 2.6 and 9 patients. EXTEND-IA and ESCAPE have similarly wide confidence intervals surrounding their point estimates (4). EXTEND-IA’s confidence interval is 8% to 50% surrounding a point estimate of 31% (2). Likewise ESCAPE has a confidence interval of 13% to 34% surrounding its 23.7% point estimate (3). All three of these trials were stopped early secondary to MR CLEAN’s results. And though both EXTEND-IA and ESCAPE came close to reaching their pre-defined sample size, SWIFT-PRIME was stopped before its first interim analysis (n<200) (4).
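The width of these intervals is easier to appreciate once each bound is converted into an NNT. The sketch below does that for the three trials using only the point estimates and confidence limits quoted above; the dictionary layout and function name are assumptions made for illustration.

```python
# Sketch: propagate each trial's point estimate and confidence limits
# (absolute difference in mRS 0-2, as quoted in the text) into an NNT range.
# The data structure and the name `nnt` are illustrative assumptions.

def nnt(absolute_difference):
    """Number needed to treat for an absolute benefit given as a proportion."""
    if absolute_difference <= 0:
        raise ValueError("NNT is only defined here for a positive benefit")
    return 1.0 / absolute_difference

# trial: (lower limit, point estimate, upper limit)
trials = {
    "SWIFT-PRIME": (0.11, 0.25, 0.38),
    "EXTEND-IA": (0.08, 0.31, 0.50),
    "ESCAPE": (0.13, 0.237, 0.34),
}

nnt_ranges = {}
for name, (lower, point, upper) in trials.items():
    # The upper confidence limit gives the most optimistic (smallest) NNT.
    nnt_ranges[name] = (nnt(upper), nnt(point), nnt(lower))
    best, mid, worst = nnt_ranges[name]
    print(f"{name}: NNT {mid:.1f} (plausible range {best:.1f} to {worst:.1f})")
```

For SWIFT-PRIME this reproduces the range given above, roughly 2.6 to 9 patients around a point estimate of 4.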

Like EXTEND-IA, ESCAPE and SWIFT-PRIME, the second trial just published in NEJM, the REVASCAT trial by Jovin et al, was stopped prematurely secondary to the publication of the MR CLEAN data. In fact, even though it failed to reach the prospectively determined efficacy threshold for stopping the trial at the first interim analysis, the data and safety board felt that, given the MR CLEAN data, there was a loss of equipoise and further randomization would be unethical (5). Despite its apparent success, the results of the REVASCAT trial are far less impressive than those of EXTEND-IA, ESCAPE or SWIFT-PRIME. The REVASCAT trial planned to enroll 690 patients presenting to the Emergency Department in 4 centers across Catalonia with symptoms consistent with a large vessel stroke that could be treated with endovascular therapy within 8 hours of symptom onset. Unlike EXTEND-IA, ESCAPE or SWIFT-PRIME, the REVASCAT Trial did not use perfusion imaging to select patients with favorable areas of salvageable tissue. Rather, it employed CTA to identify occlusion in the ICA or M1 branch of the MCA, and utilized the less accurate ASPECTS score, derived from the initial non-contrast CT, to assess the potential for viable ischemic tissue (5).

REVASCAT enrolled 206 patients before its premature termination. Like the three trials before it, it demonstrated a statistically significant improvement in mRS at 90 days in the patients who underwent endovascular therapy. The REVASCAT trial cites an absolute increase of 15.5% in the number of patients with an mRS of 0-2. This is surrounded by a confidence interval of 2.4% to 28.5%. Furthermore, unlike the previous three trials, which either boast an outright benefit in mortality or demonstrate trends in favor of endovascular therapy, REVASCAT demonstrated an impressive 4.8% absolute increase in the rate of death within the first 7 days after randomization (5).

The results of REVASCAT are far from positive. Were they not swept up in the optimistic fervor that currently surrounds endovascular therapy, it might even be considered a negative trial. Why were the results of REVASCAT far less impressive than those of EXTEND-IA, ESCAPE and SWIFT-PRIME? Was it just random chance, the true effect size of endovascular therapy falling somewhere between the two extremes of the 13.5% difference observed in MR CLEAN and the 31% seen in EXTEND-IA? Or was it rather that the patient population selected in EXTEND-IA, ESCAPE and SWIFT-PRIME led to their success? EXTEND-IA, ESCAPE and SWIFT-PRIME all utilized some form of advanced imaging to determine the size of viable ischemic tissue (2,3,4). MR CLEAN and REVASCAT used only the CTA to identify a reachable lesion and the non-contrast CT to determine tissue viability (1,5). If any one of these trials had been followed to completion, the results likely would have provided us with a better understanding of who will benefit from endovascular therapy and the exact magnitude of this benefit.

This is a problem of certainty. Our faith in endovascular interventions was so unyielding that at the first sign of success we claimed victory and discontinued any further scientific inquiries. The bloated results demonstrated in EXTEND-IA, ESCAPE, and SWIFT-PRIME are the result of this premature resolution. We know that trials stopped early for benefit are likely to over-estimate the effect size of the treatment in question. In fact, the smaller the sample size at the time of closure, the greater the amplification (6). In 1989, Pocock and Hughes demonstrated this to be a mathematical inevitability (7). This was later validated by Bassler et al in a meta-analysis examining 91 trials stopped prematurely for benefit (8). Bassler et al revealed that the degree of embellishment was directly related to the size of the sample population at cessation and independent of the quality of the trial or the presence of a predetermined methodology for early stoppage.

Although the exact patient population that stands to benefit from endovascular therapy is unclear, it is certainly a small fraction of the overall patients who present to the Emergency Department with acute ischemic stroke. All patients enrolled in the REVASCAT trial were also included in a national registry known as SONIA. SONIA catalogued 2576 patients (only 15.6% of all stroke patients seen) treated with some form of reperfusion therapy over the time period REVASCAT enrolled patients (5). The vast majority of these patients, 2036 (79%), received only tPA. 540 (21%) patients underwent endovascular therapy. Of these only 111 (24%) were eligible for enrollment into the REVASCAT trial. Only 4.3% of the patients in the SONIA registry, and only 0.3% of all stroke patients during the 2-year period, were eligible for inclusion in the REVASCAT trial (5). This accounts for a small minority of the patients presenting to the Emergency Department with symptoms consistent with acute ischemic stroke. Of note, the criteria used in the REVASCAT trial to determine eligibility are more inclusive than those used in EXTEND-IA, ESCAPE, and SWIFT-PRIME. If you believe those trials were successful because of their inclusion criteria, the eligible population would account for an even smaller portion of stroke patients presenting to the Emergency Department. In the SWIFT-PRIME trial it took 2 years and 39 centers to recruit 196 patients (4). That comes out to 0.2 patients per center per month. EXTEND-IA and ESCAPE recruited only 0.3 and 1.44 patients per center per month respectively (2,3).
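The SWIFT-PRIME recruitment figure follows directly from the numbers quoted above. A minimal sketch, assuming "2 years" means 24 months of enrollment across all 39 centers (the function name is hypothetical):

```python
# Sketch: recruitment rate per center per month from the SWIFT-PRIME figures
# quoted in the text. Assumes all 39 centers enrolled for the full 24 months.

def patients_per_center_per_month(patients, centers, months):
    return patients / (centers * months)

swift_prime_rate = patients_per_center_per_month(patients=196, centers=39, months=24)
print(f"{swift_prime_rate:.2f} patients per center per month")
```

If centers joined at different times, the true per-center rate would be somewhat higher than this simple average suggests.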

Even the most skeptical will find difficulty denying there is a definite treatment effect observed in the recent trials examining endovascular therapy in acute ischemic stroke. The magnitude of this effect has yet to be defined. Its borders are obscured by the murkiness of small sample sizes, extreme selection bias and prematurely stopped trials. There are also clear harms associated with this invasive procedure. Both the REVASCAT trial and the earlier trials examining endovascular therapy (IMS-3, SYNTHESIS and MR RESCUE) demonstrated that when performed on the wrong patient population, not only will endovascular therapy fail to provide benefit, it may in fact be harmful (5,9,10,11). This is simply not a yes or no question. The resources required to build an infrastructure capable of supporting endovascular therapy on a national level are daunting. Though we have reached a certain degree of clarity that endovascular therapy for acute ischemic stroke provides benefit, how well and in whom remains murky. The overeager truncation of important trials has left us adrift in a sea of fog. Unsure if the shoreline we paddle towards is a warm welcoming beachfront or a rocky coast prepared to demolish our vessel upon arrival.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372(1):11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Saver JL, Goyal M, Bonafe A, et al. Stent-Retriever Thrombectomy after Intravenous t-PA vs. t-PA Alone in Stroke. N Engl J Med. 2015.
  5. Jovin TG, Chamorro A, Cobo E, et al. Thrombectomy within 8 Hours after Symptom Onset in Ischemic Stroke. N Engl J Med. 2015.
  6. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012;344:e3863.
  7. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials 1989;10(suppl 4):209-21S.
  8. Bassler D, Briel M, Montori VM, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010;303(12):1180-7.
  9. Broderick JP, Palesch YY, Demchuk AM, et al. Endovascular therapy after intravenous t-PA versus t-PA alone for stroke. N Engl J Med. 2013;368(10):893-903.
  10. Ciccone A, Valvassori L, Nichelatti M, et al. Endovascular treatment for acute ischemic stroke. N Engl J Med. 2013;368(10):904-13.
  11. Kidwell CS, Jahan R, Gornbein J, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med. 2013;368(10):914-23.

The Case of the Anatomic Heart Part 2

The PROMISE Trial, like any aptly named study, chose an acronym meant to inspire: in this case, the hope for a better tomorrow. And though the authors of the Prospective Multicenter Imaging Study for Evaluation of Chest Pain trial were not clear on the specific details their promise entailed, I fear the results of this trial will leave us feeling betrayed and forsworn.

The authors of the PROMISE Trial presented the findings from their massive undertaking at the 2015 ACC scientific assembly. The results were published simultaneously in the NEJM. Douglas et al randomized 10,003 patients to either standard non-invasive functional testing, as determined by the treating physician, or CTCA. Patients were recruited from outpatient facilities across North America when presenting with new onset chest pain in which the treating physician was suspicious of cardiac origin and had already ruled out ACS. Patients were excluded if they presented with unstable vitals, EKG changes, or positive biomarkers. Given the pragmatic nature of the trial, all other treatment decisions were left to the prerogative of the treating physician (1).

The authors found no difference in their primary outcome, the composite endpoint of death, MI, hospitalization for UA, or major procedural complications over the follow-up period (at least 12 months, with an average follow-up of 24 months), between the CTCA and traditional testing groups (3.3% vs 3.0%). In fact, other than a small decrease in the number of negative invasive catheterizations seen in the CTCA arm (3.4% vs 4.3%), the authors were unable to find any statistically significant differences in the multitude of secondary endpoints measured. As for safety outcomes, the authors did cite some relevant concerns. Most notably, those randomized to receive CTCA as their screening test underwent significantly more downstream testing and interventions. 12.2% of those randomized to the CTCA arm, compared to 8.1% in the standard testing arm, underwent invasive catheterization, and 6.2% compared to 3.2% underwent subsequent revascularization, including a 1.5% vs 0.76% rate of coronary artery bypass grafting (CABG) (1).

Now some might argue that the PROMISE trial was not performed on Emergency Department patients and thus its application to our low risk chest pain population is questionable. In some senses this may be true. Patients evaluated in the Emergency Department for chest pain are inherently at higher risk than their counterparts seen in primary care offices. Conversely, the PROMISE Trial evaluated a cohort of chest pain patients in whom the treating physician suspected the symptoms were likely of cardiac origin. Before being enrolled in the trial, all of these patients were ruled out for ACS with negative EKGs and biomarkers. Additionally, the treating physician felt further provocative testing was necessary. This is not unlike the cohort of patients we include in our low-risk chest pain population in the Emergency Department. Furthermore, we have four trials with over 3,000 Emergency Department patients evaluating the efficacy of CTCA, which demonstrate almost identical results to the PROMISE Trial (2,3,4,5). Each of these studies determined that CTCA adds no additional prognostic value to our standard risk stratification strategies and likely leads to increased invasive procedures. In a meta-analysis of these four trials published in JACC in 2013, Hulten et al found a significant increase in the number of invasive angiographies, PCIs and revascularizations performed in the patients randomized to the CTCA arm (6). PROMISE demonstrated the exact same tendencies of CTCA in a much larger cohort (1).

Why did PROMISE fail to find a difference? What are we to infer about the acuity and severity of a disease state that does not benefit from a timely and accurate diagnosis? We know CTCA is far more accurate than our more traditional forms of provocative testing. And yet, why in this massive trial did it fail to find any difference in clinically relevant outcomes? Might it be that a time-sensitive anatomical definition of CAD is unnecessary?

The first reason why PROMISE failed to show a difference is that the population enrolled in the trial was at such low risk for the disease state in question, they are likely to do well whatever diagnostic testing strategy they undergo. Only 3.1% of the group had any event during the follow-up period. Only 1.5% died and only 0.7% had a MI (1). With such a low event rate, even if CTCA is an effective means of identifying and preventing MI and cardiac death, a statistically significant benefit is unlikely to be found even with a sample size as large as 10,000 patients.
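The power argument can be made concrete with a standard two-proportion sample size approximation. The sketch below is illustrative only: the 3% control event rate approximates the figure quoted above, while the 20% relative reduction, the two-sided alpha of 0.05 and the 80% power are assumptions chosen for the example, not values taken from the trial.

```python
# Sketch: approximate patients needed per arm to detect a difference between
# two event proportions (normal approximation, two-sided test).
# The effect size, alpha and power below are assumed for illustration.
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


# Roughly 3% event rate in the comparison arm, hypothetical 20% relative reduction
n = n_per_arm(p_control=0.03, p_treated=0.03 * 0.8)
print(f"about {n} patients per arm")
```

Under these assumed inputs the requirement comes out well above the roughly 5,000 patients per arm that PROMISE randomized, which is consistent with the point made above.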

The second reason the PROMISE Trial is likely to have failed is that we are functioning under the misconception that, when we diagnose these patients with obstructive CAD, an invasive strategy is superior to optimal medical management. Though we know that reperfusion therapy has objective benefits in patients actively experiencing a myocardial infarction, these same benefits have failed to translate to the more stable lesions of CAD. Multiple large RCTs have failed to find a benefit of PCI over optimal medical management in patients with stable obstructive CAD (7,8). Stergiopoulos et al have now published a number of meta-analyses examining these trials, which have also failed to uncover benefits that may have been missed in the less well powered individual trials (9,10).

The PROMISE trial was not the only trial presented at the ACC Scientific Assembly examining the pragmatic use of CTCA for the diagnostic work up of chest pain. The SCOT-HEART trial was yet another massive undertaking, its results published online in The Lancet in concert with the oral presentation. In this trial, investigators randomized 4,146 patients referred to chest pain clinics across Scotland to either a standard work up or a standard work up plus CTCA. Although by sheer quantity it does not possess the statistical power of the PROMISE trial, it does present us with some insights which the PROMISE trial proved incapable of providing (11).

The unique design of the SCOT-HEART trial ensured all patients received a full standardized evaluation, often including (85% of the time) an exercise stress test. It was only after the treating physician assessed the patient, reported his or her baseline estimate of the likelihood of CAD and determined what further testing and treatment strategies he or she would recommend, that the patients were randomized to receive either CTCA or standard care. Like PROMISE, this was a pragmatic trial design, and other than the use of CT angiography clinicians were given free rein to treat each patient as they deemed appropriate. At 6 weeks the physicians were then asked again to assess the likelihood of CAD (11).

What the authors revealed was that the use of CTCA significantly improved the clinicians’ confidence in their diagnosis of both CAD and angina of cardiac origin (the trial’s primary endpoint). They also found a statistically significant increase in the number of patients diagnosed with CAD in the group randomized to receive CTCA (23% vs 11%). Additionally, patients in the CTCA arm were more frequently shifted towards more aggressive and invasive modes of management when compared to the standard care arm. Specifically, more patients in the CTCA group saw an increase in the number of medical therapies prescribed and invasive catheterizations performed (11).

In summary, patients randomized to CTCA were more often given the diagnosis of CAD and were more likely to be treated with medical therapies and invasive procedures than the patients in the standard care group. But did all of these investigations and interventions lead to better outcomes? Simply put, no. The rate of cardiovascular death and myocardial infarction during the follow-up period (1.7 years) was 1.3% vs 2.0%, a 0.7% difference that was not statistically significant. The overall mortality was 0.8% vs 1.0%, respectively. Even the decrease in the frequency and severity of the patients’ symptoms (the reason the patients presented to the clinic in the first place) at 6 weeks was identical (11).

The PROMISE trial demonstrated that the use of CTCA promotes increased downstream testing and intervention. The SCOT-HEART trial validated these findings. The SCOT-HEART trial also demonstrated that CTCA provides a significant degree of diagnostic certainty to the treating physician, leading to more aggressive medical management. And yet knowing a lot and doing a lot failed to equate to a reduction in mortality or myocardial infarctions. These are coronary mirages, promising the weary clinicians water when in reality they are just leading them deeper into the barren desert.

Despite its size and decisively negative results, perhaps the most important study arm in the PROMISE Trial did not exist: an arm in which patients were randomized not to receive any form of provocative testing, but rather to be treated medically as per the judgment of their physician. Both the PROMISE and SCOT-HEART trials demonstrated that this cohort of outpatient chest pain patients is at such low risk for adverse events that they are likely to do equally well with whatever provocative test is used or, more importantly, without any at all. Surely it is time to examine such a hypothesis, to add a third arm to the PROMISE cohort. The ISCHEMIA Trial is currently enrolling patients to compare medical management vs invasive strategies in the setting of a positive provocative test. Unfortunately this trial’s applicability is limited by the fact that the authors insist all patients undergo a CTCA before enrollment to rule out the presence of left main arterial disease. And though this may be a step in the right direction, we still can’t escape our need for anatomical certainty in the face of diminishing clinical utility. Surely it is time we define the value of both provocative and anatomical testing in the low risk chest pain population, truly a Promise worth keeping.

Sources Cited:

  1. Douglas PS, Taylor A, Bild D, et al. Outcomes research in cardiovascular imaging: report of a workshop sponsored by the National Heart, Lung, and Blood Institute. Circ Cardiovasc Imaging 2009;2:339-348.
  2. Goldstein JA, Chinnaiyan KM, Abidov A, et al. The CT-STAT (Coronary Computed Tomographic Angiography for Systematic Triage of Acute Chest Pain Patients to Treatment) trial. J Am Coll Cardiol 2011;58:1414–22.
  3. Hoffmann U, Truong QA, Schoenfeld DA, et al. Coronary CT angiography versus standard evaluation in acute chest pain. N Engl J Med 2012;367:299–308.
  4. Litt HI, Gatsonis C, Snyder B, et al. CT Angiography for safe discharge of patients with possible acute coronary syndromes. N Engl J Med 2012;366:1393–403.
  5. Goldstein JA, Gallagher MJ, O’Neill WW, Ross MA, O’Neil BJ, Raff GL. A randomized controlled trial of multi-slice coronary computed tomography for evaluation of acute chest pain. J Am Coll Cardiol 2007;49:863–71.
  6. Hulten E, Pickett C, Bittencourt MS, et al. Outcomes after coronary computed tomography angiography in the emergency department: a systematic review and meta-analysis of randomized, controlled trials. J Am Coll Cardiol. 2013;61(8):880-92.
  7. Boden WE, O’rourke RA, Teo KK, et al. Optimal medical therapy with or without PCI for stable coronary disease. N Engl J Med. 2007;356(15):1503-16.
  8. Mehta SR, Cannon CP, Fox KA, et al. Routine vs selective invasive strategies in patients with acute coronary syndromes: a collaborative meta-analysis of randomized trials. JAMA. 2005;293(23):2908-17.
  9. Stergiopoulos K, Brown DL. Initial Coronary Stent Implantation With Medical Therapy vs Medical Therapy Alone for Stable Coronary Artery Disease: Meta-analysis of Randomized Controlled Trials. Archives of Internal Medicine 2012 Feb;172(4):312.
  10. Stergiopoulos K, Boden WE, Hartigan P, et al. Percutaneous Coronary Intervention Outcomes in Patients With Stable Obstructive Coronary Artery Disease and Myocardial Ischemia: A Collaborative Meta-analysis of Contemporary Randomized Clinical Trials. JAMA Intern Med. 2014;174(2):232-240.
  11. The SCOT-HEART investigators. CT coronary angiography in patients with suspected angina due to coronary heart disease (SCOT-HEART): an open-label, parallel group multicenter trial. Lancet. 2015; (published online March 15.)

The Case of Dubious Squire

I often get the sense that the makers of many biomarkers envision us as helpless damsels in distress drowning in an icy pond or trapped in a monumental tower with no obvious means of descent. I imagine they think in our desperate grasps for aid, we will cling to whatever assistance they may offer, independent of its buoyancy. But in these moments of fear and uncertainty we must remember for a test to be useful to a clinician not only does it have to be accurate and reliable, it must also add diagnostic value above the clinician’s own inherent aptitude. B-type natriuretic peptide (BNP) and its natriuretic derivatives are a classic example of such a test heralded for its isolated diagnostic properties without asking the simple question, how does it help the physician? Through statistical misdirection, the distributors of natriuretic peptides have published research hailing their diagnostic prowess when examined in isolation. Such publications have led to these assays becoming recommended components of the workup for any patient suspected of having acute decompensated heart failure (1,2,3). A recent meta-analysis performed by the helpful folks responsible for the NICE guidelines, sought to examine the validity of these recommendations and determine the true diagnostic accuracy of natriuretic peptides (4). And yet, I fear these authors in their effort to provide an accurate representation of the assay’s diagnostic accuracy, have forgotten to take into account the most important factor when evaluating any diagnostic test, the clinician.

In this meta-analysis, Roberts et al examined the clinical accuracy of BNP, NTproBNP, and MRproANP for the diagnosis of acute decompensated heart failure in the Emergency Department. Specifically, the goal was to evaluate the low risk criteria proposed by the 2012 European Society of Cardiology guidelines for heart failure: a BNP ≤100 ng/L, an NTproBNP ≤300 ng/L, and an MRproANP ≤120 pmol/L. They also examined the utility of these assays at intermediate and high levels (100-500 ng/L and >500 ng/L for BNP; 300-1800 ng/L and >1800 ng/L for NTproBNP; and >120 pmol/L for MRproANP) (4).

The authors identified 42 articles, examining 37 different cohorts, that met criteria for inclusion in their meta-analysis. Combining these studies, the authors calculated pooled test characteristics for each of the natriuretic assays in question. They found that at the low thresholds proposed by the European Society of Cardiology, the assays were equally mediocre. All three demonstrated high sensitivities: 95%, 99%, and 95% respectively. Of course, by selecting such a low cutoff, the authors ensured that a large proportion of the patients without acute heart failure would also test positive. The specificities of these assays were a dismal 63%, 43%, and 56% respectively. As with any diagnostic tool, by raising the threshold of what you consider positive, the authors were able to improve the assays’ specificity. When the intermediate thresholds were utilized, the specificities increased to 86% and 76% for BNP and NTproBNP respectively (the authors did not have enough data on MRproANP to adequately calculate accuracy in this intermediate range). Of course this amplified specificity came at the price of a loss of sensitivity, which fell to 85% and 90% respectively. When using the high threshold, the authors were able to augment the tests’ specificity even further, but of course at this high level a large portion of patients with acute decompensated heart failure are missed. At the high thresholds, diagnostic meta-analysis was not performed due to inadequate data. At a threshold of ≥500 ng/L, BNP demonstrated sensitivities from the individual studies ranging from 35% to 83%, with paired specificities from 78% to 100%. Likewise, at a threshold of ≥1800 ng/L, NTproBNP reported sensitivities ranging from 67% to 87% with paired specificities ranging from 72% to 95%. Finally, at the threshold of >120 pmol/L, MRproANP demonstrated sensitivities ranging from 84% to 98% and paired specificities from 40% to 84% (4).
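One way to see what these pooled figures mean at the bedside is to convert them into likelihood ratios. This conversion is not reported in the text above; the sketch below simply applies the standard definitions (LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity) to the pooled low-threshold values quoted, and the labels and function name are illustrative assumptions.

```python
# Sketch: likelihood ratios from the pooled sensitivity and specificity quoted
# above for the low (rule-out) thresholds. The likelihood ratios themselves are
# derived here for illustration and are not figures taken from the meta-analysis.

def likelihood_ratios(sensitivity, specificity):
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# assay (threshold): (pooled sensitivity, pooled specificity)
pooled = {
    "BNP (100 ng/L)": (0.95, 0.63),
    "NTproBNP (300 ng/L)": (0.99, 0.43),
    "MRproANP (120 pmol/L)": (0.95, 0.56),
}

ratios = {name: likelihood_ratios(*values) for name, values in pooled.items()}
for name, (lr_pos, lr_neg) in ratios.items():
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

On these inputs a result below the threshold lowers the odds of heart failure substantially, while a result above it barely moves them, which matches the pattern of high sensitivity and poor specificity described above.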

The authors conclude, “The use of NTproBNP and B type natriuretic peptide at the rule-out threshold recommended by the recent European Society of Cardiology guidelines on heart failure provides excellent ability to exclude acute heart failure in the acute setting with reassuringly high sensitivity. The specificity is modest at all but the highest values of natriuretic peptide, therefore confirmatory testing by cardiac imaging is required in patients with positive test results (4).”

On face value this is a fair conclusion, as all three of these assays seem to perform moderately well at either extreme of their diagnostic spectrum. At very low levels it is safe to say that the likelihood that the patient’s symptoms were caused by heart failure is fairly low. Likewise when significantly elevated, these assays boast specificities high enough for clinical use. Unfortunately these results do very little to explain the true utility of natriuretic peptides. By isolating these assays’ test characteristics outside the clinical arena, the authors have falsely inflated the utility of BNP and its natriuretic derivatives.

The first issue that is pervasive throughout the literature expounding the utility of natriuretic peptides is the gold standard used to evaluate their diagnostic capabilities. The most prevalent gold standard used is a retrospective review performed by two Cardiologists blinded to the results of the natriuretic peptide in question. 31 of the 37 cohorts in this meta-analysis used some derivative of this questionable gold standard. In one of the largest trials conducted, the Breathing Not Properly (BNP) trial by Maisel et al, authors examined 1586 patients presenting to the Emergency Department with acute dyspnea (5). They found that the two Cardiologists disagreed with the initial Emergency Physician’s diagnoses 14% of the time and disagreed with each other 10.7% of the time (6). This suggests that the cases in question were clearly not straightforward. If two Cardiologists with access to the patients’ entire hospital course disagreed with each other almost as often as they disagreed with the initial diagnosis of the Emergency Physician, then it is fair to say using this definition as the gold standard is less than ideal.

Despite this tarnished gold standard the question remains, how do natriuretic peptides perform when used in the clinical arena? More specifically, how well do natriuretic peptide assays help the Emergency Physician differentiate the causes of dyspnea in the subset of patients in whom there is considerable diagnostic uncertainty? In the BNP trial, Maisel et al examined the Emergency Physician’s ability to correctly identify acutely decompensated heart failure. They found our accuracy overall, when compared to the less than perfect gold standard of a retrospective review performed by two Cardiologists, was 86% (6). In the subset of patients in whom the Emergency Physician was certain the patient’s dyspnea was not cardiac in origin (<5% chance of CHF), their diagnostic accuracy was superb (92%). Likewise in the group of patients in whom the Emergency Physician was 95% certain the patient did in fact have CHF, they were correct 95% of the time (7). It was only in the intermediate group (between 20%-80% probability), in which the Emergency Physician was unsure of the likelihood of CHF, that their diagnostic capabilities were understandably poor. It is in this intermediate group that we would hope the natriuretic peptides could provide us with some guidance. We should not ask how accurately peptide assays predict acute decompensated heart failure, but rather how well they predict acute decompensated heart failure in the subset of patients that present a diagnostic challenge to the Emergency Physician. When charged with such a task these assays are far less impressive.

Although in their initial publication Maisel et al failed to disclose the diagnostic abilities of the Emergency Physicians, citing only BNP’s performance using the retrospective cutoff of 100 ng/L (sensitivity of 90%, specificity of 76%), the authors later published these findings in a secondary analysis. In this analysis, published by McCullough et al in Circulation, the authors revealed that when the Emergency Physician was certain that the patient’s cause of dyspnea was either definitely CHF or definitely not CHF, their unstructured judgment outperformed that of the BNP assay. For patients in whom the Emergency Physician was certain CHF was not the cause of their dyspnea, the physician’s accuracy was 92% vs only 84% for BNP. Likewise when the Emergency Physician was certain the patient did in fact have CHF, again their judgment outperformed the diagnostic abilities of the BNP assay (accuracy of 95% vs 92%) (7). In fact even in the subset of patients where the Emergency Physician was fairly certain the diagnosis was CHF (>80%), their positive likelihood ratio of 11.5 was far more impressive than that of the BNP (3.4) (8). In the 27.8% of patients in whom the Emergency Physician was unclear of the diagnosis, the very group in which we would hope the BNP could provide guidance, its diagnostic accuracy was entirely unhelpful. In this subset of patients, at a cutoff of 100 ng/L, the assay demonstrated no clinical utility with a sensitivity and specificity of 79% and 71% respectively (8).
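To make "no clinical utility" concrete, the quoted sensitivity and specificity can be pushed through Bayes' theorem. The following is a rough sketch rather than anything reported in the cited analyses: the pretest probabilities of 20%, 50%, and 80% are assumptions chosen to span the intermediate range described earlier, and the function name is illustrative.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Bayes' theorem in odds form: post-test odds = pretest odds * likelihood ratio."""
    if positive:
        lr = sensitivity / (1.0 - specificity)
    else:
        lr = (1.0 - sensitivity) / specificity
    pre_odds = pretest / (1.0 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Sensitivity and specificity quoted above for the diagnostically uncertain subgroup.
SENS, SPEC = 0.79, 0.71

# Pretest probabilities are assumed values spanning the intermediate range.
for pretest in (0.20, 0.50, 0.80):
    after_pos = post_test_probability(pretest, SENS, SPEC, positive=True)
    after_neg = post_test_probability(pretest, SENS, SPEC, positive=False)
    print(f"pretest {pretest:.0%}: positive BNP -> {after_pos:.0%}, "
          f"negative BNP -> {after_neg:.0%}")
```

Under these assumptions a patient who starts at even odds ends at roughly 73% after a positive result and roughly 23% after a negative one, so the physician remains in the same zone of uncertainty either way.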

Each of the 37 studies included in the Roberts et al meta-analysis failed to truly examine how natriuretic peptides perform clinically. As discussed, the majority of these trials employed a less than ideal gold standard comparator and were so confounded by spectrum bias, they rarely examined the subgroup of patients in which the diagnosis was unclear. Additionally most of these studies used a retrospectively derived cutoff calculated to demonstrate the assay’s optimal performance. This type of overfitting inevitably leads to decreased performance when validated in a novel cohort. Ideally a randomized trial comparing a natriuretic peptide guided management to standard practice could demonstrate what, if any, clinical utility these assays provide. A number of such trials have been conducted.

The first was published in the NEJM in 2004 by Mueller et al. In this trial the authors randomized 452 patients presenting to the emergency department with acute dyspnea to either a diagnostic strategy utilizing a BNP assay or a standard work up (9). The authors powered their study to detect a 20% reduction in time to discharge (an interesting primary endpoint to choose if one thinks BNP possesses true clinical relevance), defined as the interval from presentation at the Emergency Department to discharge. The authors found a significant difference in time to discharge (8 vs 11 days) as well as shorter times to treatment for the BNP group (63 vs 90 minutes), decreased rates of hospitalization (75% vs 85%) and decreased admission to the ICU (15% vs 24%). In fact every outcome variable trended in favor of the group randomized to receive the BNP-guided diagnostic strategy. Initially these results seem significantly in favor of using BNP in the diagnostic workup of acute dyspnea, until one examines the other RCTs evaluating this question (9).

The second RCT examining natriuretic peptides for the management of acute dyspnea was published by Moe et al in Circulation in 2007 (10). In this trial, the authors randomized 500 patients to either an NT-proBNP guided strategy or standard care. Like the previous study, the authors used the clinically dubious measure of initial ED visit duration as their primary endpoint. Though the authors found a statistically significant difference in initial ED visit time, the 0.7-hour difference (5.6 hrs vs 6.3 hrs) hardly seems clinically relevant. In fact the remainder of clinically important variables all favored the usual care group (in-hospital mortality 4.4% vs 2.4% and 60-day mortality 5.4% vs 4.4%) (10). Three other trials published subsequently found similar results. Other than clinically questionable reductions in length of stay, the use of natriuretic peptides had no meaningful effect on clinical outcomes (11,12,15). When these trials’ data were pooled in a meta-analysis published by Trinquart et al in The American Journal of Emergency Medicine in 2011, the authors found no significant difference in any of the multitude of clinically relevant variables including hospital admission rate, length of hospital stay, mortality or rates of re-hospitalization (13). Even in the long-term management of patients with known heart failure, when compared to a symptom guided approach, a BNP guided protocol led to further diagnostic testing and more aggressive medical therapy without producing a difference in clinically relevant outcomes (18-month survival free of any hospitalization was 41% vs 40%) (16).

This is not a proclamation of the infallibility of the Emergency Physician but rather the recognition of our shortcomings. There is a clear group of patients that present a diagnostic challenge, for whom further confirmatory investigations could provide guidance. Despite the industry-sponsored studies designed to propagate an overinflated self-worth, a close examination of the natriuretic peptides reveals they add little value to Physicians’ judgment. When we as Emergency Physicians are certain of the diagnosis of acute decompensated heart failure, our intrinsic diagnostic capabilities outperform those of natriuretic peptides. In the patients that present as a diagnostic challenge, these assays are far too insensitive and non-specific to add substantial diagnostic clarity. Furthermore we have other, more diagnostically robust, tools like point of care ultrasound to assist in these challenging circumstances (14). Natriuretic peptides are not the diagnostic saviors that they are commonly proclaimed to be. More importantly we are not in need of rescue as often as the makers of these assays would have us believe. On the rare occasion we do require aid, should we not demand a far more resolute champion?

Sources Cited:

  1. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE, Drazner M, et al. ACCF/AHA guideline for the management of heart failure: a report of the American College of Cardiology Foundation/American Heart Association Task Force on practice guidelines. Circulation 2013;128:e240-327.
  2. McMurray JJV, Adamopoulos S, Anker SD, Auricchio A, Bohm M, Dickstein K, et al. ESC guidelines for the diagnosis and treatment of acute and chronic heart failure 2012: The Task Force for the Diagnosis and Treatment of Acute and Chronic Heart Failure 2012 of the European Society of Cardiology. Developed in collaboration with the Heart Failure Association (HFA) of the ESC. Eur J Heart Fail 2012;14:803-69.
  3. Thygesen K, Mair J, Mueller C, Huber K, Weber M, Plebani M, et al. Recommendations for the use of natriuretic peptides in acute cardiac care: a position statement from the Study Group on Biomarkers in Cardiology of the ESC Working Group on Acute Cardiac Care. Eur Heart J 2012;33:2001-6.
  4. Roberts E, Ludman AJ, Dworzynski K, Al-Mohammad A, Cowie MR, McMurray JJV, et al. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ 2015;350:h910.
  5. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347:(3)161-7.
  6. McCullough PA, Nowak RM, McCord J, et al. B-type natriuretic peptide and clinical judgment in emergency diagnosis of heart failure: analysis from Breathing Not Properly (BNP) Multinational Study. Circulation. 2002;106:(4)416-22.
  7. Schwam E. B-type natriuretic peptide for diagnosis of heart failure in emergency department patients: a critical appraisal. Acad Emerg Med. 2004;11:(6)686-91.
  8. Hohl CM, Mitelman BY, Wyer P, Lang E. Should emergency physicians use B-type natriuretic peptide testing in patients with unexplained dyspnea? CJEM. 2003;5:(3)162-5.
  9. Mueller C, Scholer A, Laule-Kilian K, Martina B, Schindler C, Buser P, et al. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. N Engl J Med 2004;350(7):647-54.
  10. Moe GW, Howlett J, Januzzi JL, Zowall H. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation 2007;115(24):3103-10.
  11. Rutten JH, Steyerberg EW, Boomsma F, van Saase JL, Deckers JW, Hoogsteden HC, et al. N-terminal pro-brain natriuretic peptide testing in the emergency department: beneficial effects on hospitalization, costs, and outcome. Am Heart J 2008;156(1):71-7.
  12. Schneider HG, Lam L, Lokuge A, Krum H, Naughton MT, De Villiers Smit P, et al. B-type natriuretic peptide testing, clinical outcomes, and health services use in emergency department patients with dyspnea: a randomized trial. Ann Intern Med 2009;150(6):365-71.
  13. Trinquart L, Ray P, Riou B, Teixeira A. Natriuretic peptide testing in EDs for managing acute dyspnea: a meta-analysis. Am J Emerg Med. 2011;29:(7)757-67.
  14. Al Deeb M, Barbic S, Featherstone R, Dankoff J, Barbic D. Point-of-care ultrasonography for the diagnosis of acute cardiogenic pulmonary edema in patients presenting with acute dyspnea: a systematic review and meta-analysis. Acad Emerg Med. 2014;21:(8)843-52.
  15. Singer AJ, Birkhahn RH, Guss D, et al. Rapid Emergency Department Heart Failure Outpatients Trial (REDHOT II): a randomized controlled trial of the effect of serial B-type natriuretic peptide testing on patient management. Circ Heart Fail. 2009;2:(4)287-93.
  16. Pfisterer M, Buser P, Rickli H, et al. BNP-guided vs symptom-guided heart failure therapy: the Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) randomized trial. JAMA. 2009;301:(4)383-92.

The Adventure of the Second Stain Continues

The CT-LP (lumbar puncture) diagnostic pathway has been a permanent fixture in the arsenal of the Emergency Physician for what seems like an eternity. Steadfast in its dependability, for many generations the LP was a necessity for Emergency Physicians to safely exclude the diagnosis of subarachnoid hemorrhage (SAH). And yet, rarely a moment has passed over the past few years when Dr. Jeffrey Perry has not politely demonstrated how little we truly know about this disease process and the diagnostic tools associated with it. His 2011 paper questioning the necessity of an LP following a negative head CT under 6-hours from symptom onset shook the once solid ground that the LP firmly stood upon (1). As if this attack on our reliable comrade was not enough, his most recent publication examining the diagnostic capabilities of the lumbar puncture itself has left our confidence in this once dependable testing strategy in turmoil.

In this paper, published in February of 2015 in The BMJ, Perry et al utilized a subset of patients from two cohorts originally enrolled to derive and validate his Ottawa SAH rule (3,4). The authors examined 1739 of these patients who received a lumbar puncture as part of their workup for SAH (2). They then sought to assess the diagnostic accuracy of this tool. Similar to common practice, they prospectively defined a positive tap as greater than 1 RBC on fluid aspirate. When this impossibly low threshold was upheld, the LP’s performance was less than stellar. Of the 1739 patients who received an LP, 641 (36.9%) had positive findings, only 15 of which were actually from subarachnoid blood. Most of these false positive results were trivial, as 476 (74.3%) had counts of ≤100×10⁶/L and 94 (14.8%) had counts of 101-1000×10⁶/L. Only 10.4% of these patients were found to have clinically concerning levels of RBCs in their CSF (counts of >1000×10⁶/L). Despite the predominance of low RBC counts, a great majority of the patients in whom the LP was positive (419) received invasive angiographic imaging.
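The counts in this paragraph are enough to work out what a "positive" tap actually meant at the >1 RBC threshold. This is simple arithmetic on the figures quoted above, not a calculation reported by Perry et al, and the function and variable names are illustrative.

```python
def positive_predictive_value(true_positives, all_positives):
    """Share of positive tests that reflect true disease."""
    return true_positives / all_positives


# Counts quoted in the text above.
total_lps = 1739      # patients who received an LP
positive_lps = 641    # taps with more than 1 RBC
true_sah = 15         # positives that were actual subarachnoid hemorrhage
angiograms = 419      # LP-positive patients sent for angiographic imaging

false_positives = positive_lps - true_sah
ppv = positive_predictive_value(true_sah, positive_lps)

print(f"Positive taps: {positive_lps / total_lps:.1%} of all LPs")
print(f"False positives: {false_positives}")
print(f"Positive predictive value: {ppv:.1%}")
print(f"False positives per true SAH: {false_positives / true_sah:.1f}")
print(f"Angiograms per true SAH: {angiograms / true_sah:.1f}")
```

On these numbers fewer than 1 in 40 positive taps represented a true bleed, and roughly 28 angiograms were performed for every SAH identified by LP.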

When the LP was found to be negative (no RBCs in the CSF), it boasted a sensitivity of 100%. In an attempt to compensate for the unacceptably high number of false positives, the authors retrospectively determined the ideal RBC cutoff to be 2000×10⁶/L. At this threshold the LP had a sensitivity of 93.3% (95% confidence interval 66.0% to 99.7%) and specificity of 92.8% (90.5% to 94.6%) for aneurysmal subarachnoid hemorrhage. If visual xanthochromia was added to this RBC cutoff, the sensitivity for ruling out SAH became 100% (95% confidence interval 74.7% to 100.0%).

These numbers are of course fraught with methodological pitfalls. The threshold of 2000×10⁶/L was retrospectively derived to best fit this specific cohort. Only 15 of the 1739 patients examined actually had the disease in question, making these calculations incredibly unstable (the confidence interval surrounding their 100% sensitivity dropped as low as 74.7%). The threshold of 2000×10⁶/L is hardly robust enough for clinical use and will inevitably fail when applied in prospective fashion to a novel cohort.

Though this data is not definitive and further studies validating these findings are required, a number of valuable conclusions can be inferred. Surprisingly the most important of these has little to do with the diagnostic utility of the lumbar puncture.

In 2011 Perry et al published their game changing article in The BMJ examining the accuracy of a non-contrast head CT performed under 6-hours from symptom onset for the diagnosis of SAH (1). This paper was a secondary analysis of the original cohort used to derive the Ottawa SAH Rules (4). Using this preexisting cohort they assessed the accuracy of head CT for the diagnosis of SAH before and after a 6-hour threshold. The authors claim a sensitivity of 100% when CT was performed within 6-hours of symptom onset. However when the CT was performed after this 6-hour threshold, the sensitivity fell to 85.7%. This suggests that when performed within 6-hours, a non-contrast CT is sufficient to rule out SAH, allowing practitioners to forego a subsequent lumbar puncture. Though many have viewed this as practice changing, others argue a number of flaws in the study’s design prevent us from interpreting these conclusions with such conviction.

The most obvious and often discussed weakness of this study is the use of a surrogate endpoint in place of a true gold standard. Not all patients who had a negative head CT underwent a confirmatory lumbar puncture. In its place, the authors used a 6-month proxy outcome to demonstrate the safety of CT alone. Patients underwent a structured phone interview at the 6-month mark to ascertain their wellbeing. When attempts to reach patients over the phone failed, authors endeavored to determine their status by searching medical records from regional neurosurgical centers as well as coroner’s death records. Patients were considered to be free of SAH if on 6-month follow-up they were alive and well. In the case of patients who were discovered to have passed away during the follow-up period, if the cause of death was determined to be due to something other than SAH, their deaths were not counted as a missed diagnosis. Of the 1931 patients examined, 421 were lost to follow-up. Authors found 8 of these patients had passed away since their initial workup for subarachnoid hemorrhage. Although none of these patients were determined to have died because of SAH, the reliability of post mortem cause of death is questionable at best (5).

A far less discussed aspect of this study was how the authors’ definition of a positive CT influenced the validity of their results. The standard that Perry et al used to calculate the sensitivity of head CT was based upon the Neuroradiologist’s official report. In most facilities (as was the case at the centers participating in this study) what guides Emergency Physicians’ clinical decision-making is the initial wet read, usually done by Radiology house staff or even the ED physicians themselves. The sensitivity we are concerned with is that of the wet read. The Neuroradiologists in this study were not blinded to the patients’ lab findings. As such we are unable to ascertain how many CTs done within 6 hours were initially read as negative, and only later, after a positive LP was performed, was the final report recorded as positive. If this had occurred with any frequency it would obviously harm the internal validity of the results. We are able to get a sense of how frequently this occurred by examining how many of the patients who were diagnosed with SAH had both a positive CT and LP. At least in theory, if the CT was positive then there would be no reason to perform the subsequent LP.

Of the 15 patients with SAHs that were diagnosed using a positive LP, 10 underwent head CTs and LPs that were both positive. The vast majority of these subarachnoid bleeds (n=8) were found in patients who received their CTs beyond the 6-hour threshold. There were however two patients that were identified as having received their CTs within 6-hours of symptom onset. In both these patients their initial CT was read as negative and only after a positive lumbar puncture was the final report changed to positive. If these two patients are taken into account, the adjusted sensitivity of CT under 6-hours from symptom onset is only 98.3% (with the confidence interval dropping as low as 93.6%).

These findings of course do nothing except muddy the already cloudy waters. Head CT, though fairly sensitive, will on occasion miss a subarachnoid bleed. The addition of CSF aspirate will very often offer a further degree of ambiguity. Furthermore the utilization of LP, at least in its current strategy, leads to an unacceptable number of false positives, exposing a large number of patients to needless downstream testing. If a more liberal view towards RBCs in the CSF is taken, the LP’s utility may be justifiable. Even with the retrospective best fit diagnostic capabilities calculated by Perry et al, the prevalence of SAH following a negative CT in under 6-hours is so low that further testing will likely lead to identifying far more false positive results than true subarachnoid bleeds. Clearly the conviction and certainty we once held for this testing strategy has suffered. Perhaps it is time for a shared decision making model. After all it is our patients’ value systems rather than our own biases that should guide these investigative journeys. Dr. Perry has demonstrated that the CT-LP pathway is far from straightforward. Perhaps it is time we confess these imperfections to the world at large and begin a far more honest conversation.

Sources Cited:

  1. Perry JJ, Stiell IG, Sivilotti ML, et al. Sensitivity of computed tomography performed within six hours of onset of headache for diagnosis of subarachnoid haemorrhage: prospective cohort study. BMJ. 2011;343:d4277.
  2. Perry JJ, Alyahya B, Sivilotti ML, et al. Differentiation between traumatic tap and aneurysmal subarachnoid hemorrhage: prospective cohort study. BMJ. 2015;350:h568.
  3. Perry JJ, Stiell IG, Sivilotti ML, et al. Clinical decision rules to rule out subarachnoid hemorrhage for acute headache. JAMA. 2013;310:(12)1248-55.
  4. Perry JJ, Stiell IG, Sivilotti ML, et al. High-risk clinical characteristics for subarachnoid haemorrhage in patients with acute headache: prospective cohort study. BMJ. 2010;341:c5204.
  5. Wexelman BA, et al. Survey of New York City resident physicians on cause-of-death reporting, 2010. Prev Chronic Dis. 2013;10:E76.

The Adventure of the Cardboard Box Continues

For those whose beliefs are already firmly in favor of endovascular therapy for acute ischemic stroke, the publication of the MR CLEAN trial earlier this year and more recently the EXTEND-IA and ESCAPE trials only serve as a big fat, “I TOLD YOU SO!” For the perpetual disbelievers, each of these trials possesses enough flaws to discredit their findings. For the appropriately skeptical among us, though these trials initially appear to discredit our well meaning rants, on closer examination they are far more validating.

Earlier this year the publication of a large, well done, RCT examining the use of endovascular treatment for acute ischemic stroke threatened to drastically change the acute management of CVA as we know it. And though this trial was given a most unfortunate name (MR CLEAN), it marked the first time endovascular therapy has demonstrated any clinically relevant benefit (1). We have discussed this trial in depth in two previous posts. While MR CLEAN’s results were promising, there are many reasons why they should be viewed with a healthy dose of skepticism. Before we commit to a resource heavy intervention like that of endovascular therapy, more studies validating these findings are required. Since the publication of MR CLEAN, two active trials were stopped early for benefit, seeming to be the very validation for which we asked. The results of both of these studies, EXTEND-IA and ESCAPE, were recently published in the NEJM (2,3).

The first trial, Extending the Time for Thrombolysis in Emergency Neurological Deficits — Intra-Arterial (EXTEND-IA) trial, by Campbell et al, is a multi-center RCT that examined the efficacy of endovascular treatment in patients with CVA whose symptoms began within 4.5 hours of randomization. Like MR CLEAN this trial was a stunning success. In fact its results far outpaced the, by comparison, paltry benefits found in MR CLEAN. EXTEND-IA was stopped early after enrolling 70 patients for overwhelming benefit. The rate of significant improvement after 3 days (reduction in NIHSS > 8) was 80% vs 37% in the endovascular group and control group respectively. Likewise the rate of favorable outcome at 90-days (mRS of 0-2) was 71% vs 40% respectively, boasting an absolute difference of 31% (2).

The second and far more statistically robust trial is the Endovascular Treatment for Small Core and Anterior Circulation Proximal Occlusion with Emphasis on Minimizing CT to Recanalization Times (ESCAPE) trial, published by Goyal et al. In this trial, the authors examined patients up to 12-hours after symptom onset (though the large majority of the patients enrolled were evaluated within 3-hours of symptom onset). Like EXTEND-IA, the ESCAPE trial was an overwhelming success. The authors randomized 316 patients to either standard care or standard care plus endovascular therapy and, as in EXTEND-IA, found overwhelming benefits of endovascular therapy. The rate of functional independence at 90-days (mRS of 0-2) was 53.0% vs 29.3% in favor of the endovascular arm, a 23.7% absolute increase in positive outcomes in patients who received endovascular therapy. For the first time in the history of reperfusion therapies for acute ischemic stroke, a clinically significant mortality benefit was demonstrated. 90-day mortality was 10.4% in the endovascular group compared to 19.0% in the control group. The rate of intracranial hemorrhage was also surprisingly low (3.6% vs 2.7%) (3).
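The effect sizes in these two trials are easiest to compare as absolute differences. The sketch below subtracts the rates quoted above (for ESCAPE, 53.0% minus 29.3% gives 23.7 percentage points) and converts each difference into a number needed to treat. The NNT figures are not reported in this discussion; they are a derived illustration using the usual convention of rounding up, and the helper names are hypothetical.

```python
import math


def absolute_difference(rate_a, rate_b):
    """Absolute difference between two event rates expressed as fractions."""
    return abs(rate_a - rate_b)


def number_needed_to_treat(abs_diff):
    """NNT = 1 / absolute difference, rounded up by convention."""
    return math.ceil(1.0 / abs_diff)


# (endovascular rate, control rate) as quoted in the text.
OUTCOMES = {
    "EXTEND-IA, mRS 0-2 at 90 days": (0.71, 0.40),
    "ESCAPE, mRS 0-2 at 90 days": (0.530, 0.293),
    "ESCAPE, 90-day mortality": (0.104, 0.190),
}

for label, (treated, control) in OUTCOMES.items():
    diff = absolute_difference(treated, control)
    print(f"{label}: absolute difference {diff:.1%}, "
          f"NNT about {number_needed_to_treat(diff)}")
```

Because both trials were stopped early, these point estimates probably overstate the true effect, so the derived NNTs are best read as optimistic lower bounds.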

Neither trial is definitive in its own right. The EXTEND-IA cohort only examined the efficacy of endovascular therapy in 70 patients. Originally intending to enroll 100 patients, the authors stopped this trial prematurely after an interim analysis demonstrated such impressive results. This premature investigation of the sealed data was not performed because of a pre-planned interim analysis, but rather because of the publication of MR CLEAN (2). Though the remaining 30 patients would most likely not have altered the results, we cannot view this poorly powered trial as anything more than hypothesis generating. In isolation, EXTEND-IA can only offer a guideline for the future of endovascular management in acute ischemic stroke. Even the authors themselves conceded this point in the statistical analysis plan they published in January 2014, in which they clearly defined EXTEND-IA as a phase II trial (4). This definition is conveniently left out of the formal publication in the NEJM, an oversight possibly induced by the unexpected magnitude of their success causing well deserved delusions of grandeur.

ESCAPE, though far more statistically hardy than EXTEND-IA, is still a rather small cohort suffering from the same unfortunate biases. Originally intending to enroll 500 patients, the authors called for an early stoppage, prior to their planned interim analysis, again because of the results of MR CLEAN. Although the sample size of 316 patients lends a stronger validity than the 70 patients examined in EXTEND-IA, the early stoppage prevents us from confidently assessing the true effect size this treatment may provide. Interestingly, when implementing this unplanned analysis, the authors utilized a dichotomous outcome comparing the proportion of patients alive and independent (mRS of 0-2) at 90-days rather than the ordinal analysis they had originally chosen and utilized as their primary outcome when performing the power calculation. The ordinal scale has recently gained favor as an outcome measure in stroke trials because of its ability to augment the p-value and turn otherwise negative trials into statistical successes. Conversely it is almost impossible to determine the clinical relevance of the odds ratio it produces. Given the impressive benefits of both trials, the small statistical augmentations offered by ordinal analysis are irrelevant. As such the authors of both trials favored the more traditional dichotomous outcome. The 23.7% absolute difference measured by the dichotomous scale in the ESCAPE trial appears far more impressive than an odds ratio of 2.6 offered by ordinal analysis (3).

With the overwhelming success of both EXTEND-IA and ESCAPE, the MR CLEAN data appears almost lacking. In the MR CLEAN cohort, patients randomized to receive endovascular therapy had a 14% absolute benefit over those in the control group. It is safe to say neither group did all that well, with the proportion of patients alive and independent at 90-days reported as 33% and 19% respectively (1). The EXTEND-IA and ESCAPE cohorts however did far better (71% vs 40% and 53.0% vs 29.3% respectively) (2,3). Are we truly looking at the same patients as were examined in MR CLEAN, or do the EXTEND-IA and ESCAPE cohorts represent a completely different population?

It should come as no surprise that both the EXTEND-IA and ESCAPE cohorts included vastly different patients than those enrolled in MR CLEAN. In MR CLEAN, to be eligible for inclusion patients were required to have an occlusion of the distal intracranial carotid artery, middle cerebral artery (M1, M2) or anterior cerebral artery (A1) as identified by CT angiography (CTA), magnetic resonance angiography (MRA) or digital subtraction angiography (DSA) (1). Both EXTEND-IA and ESCAPE had far stricter inclusion restrictions. Patients who were enrolled in the EXTEND-IA cohort needed to demonstrate an ischemic penumbra on perfusion imaging with a small infarcted core (2). Though slightly different criteria were utilized, like EXTEND-IA, the ESCAPE cohort used CT angiographic imaging to identify patients with small infarcted cores and large areas of salvageable tissue (3). These inclusion criteria significantly narrowed the subset of stroke patients examined. These differences in patient selection are not only responsible for the almost unbelievable efficacy demonstrated in both the EXTEND-IA and ESCAPE trials, they mark the first time that imaging criteria were successfully used to identify a cohort of stroke patients who may benefit from reperfusion therapy.

There has been a long history of failure in the use of perfusion imaging for the management of acute ischemic stroke. Early studies investigating the use of diffusion weighted MRI to identify potentially salvageable ischemic brain failed to show benefit (5,6,7,8,9). These failures may be due in part to the industry bias of only enrolling patients presenting > 3 hours after onset, in the hopes of extending FDA approved treatment windows and more importantly their profit margins. Though these trials showed promising rates of reperfusion, the consistently high incidence of intracranial hemorrhage overshadowed the minimal benefits. The MR RESCUE trial, published in NEJM in February 2013, was the first to utilize this technology to identify potential candidates for endovascular therapy (10). Again this trial failed to demonstrate that patients with ischemic penumbrae benefitted from revascularization. However this may have been due more to the trial’s flawed design than the technology’s deficiencies. The authors of MR RESCUE only enrolled patients after initial IV tPA failure. In contrast to these historical failures, both the EXTEND-IA and ESCAPE cohorts, unencumbered by fears of disproving tPA’s early successes, aggressively pursued reperfusion therapy after salvageable tissue was identified on CT imaging. In doing so, these trials have, for the first time, identified the population that will most likely benefit from reperfusion therapy.

At the risk of sounding optimistic, both EXTEND-IA and ESCAPE are impressively positive trials. Although small and methodologically flawed, with likely exaggerated effect sizes, when viewed in concert with MR CLEAN, these trials present endovascular therapy in a promising light. For some time now, legitimate cries for more data regarding tPA’s safety and efficacy in acute ischemic stroke management have been disregarded and marginalized. This almost fanatical acceptance is based around the success of the NINDS trial, a single poorly powered study which treated patients with IV tPA within 3 hours of symptom onset. Despite the many methodological flaws of NINDS, its results were never duplicated because of the pharmaceutical industry’s fear of losing the tenuous ground they had gained. Although there are significant harms associated with the administration of tPA, the literature has consistently suggested that there is a subset of patients who will benefit from its administration. Rather than working to identify this narrow population, we have witnessed an industry-driven effort to expand the indications for reperfusion therapy. EXTEND-IA and ESCAPE have identified potential cohorts of patients who will likely benefit from reperfusion therapy. If these results can be confirmed, no longer will we be forced to use the blunt tool of perceived time from symptom onset to determine which patients are eligible for treatment. These trials should inspire us not only to explore the successful utilization of endovascular therapy, but also to reexamine the harmful practice of thrombolytic therapy we currently employ.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372:(1)11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Campbell BC, Mitchell PJ, Yan B, et al. A multicenter, randomized, controlled study to investigate EXtending the time for Thrombolysis in Emergency Neurological Deficits with Intra-Arterial therapy (EXTEND-IA). Int J Stroke 2014;9:126-132
  5. Davis SM, Donnan GA, Parsons MW, et al. Effects of alteplase beyond 3 h after stroke in the echoplanar imaging thrombolytic evaluation trial (EPITHET): a placebo-controlled randomised trial. Lancet Neurol. 2008;7:299–309.
  6. Albers GW, Thijs VN, Wechsler L, et al. Magnetic resonance imaging profiles predict clinical response to early reperfusion: the diffusion and perfusion imaging evaluation for understanding stroke evolution (DEFUSE) study. Ann Neurol. 2006;60:508–517
  7. Hacke W, Albers G, Al-Rawi Y, et al. The desmoteplase in acute ischemic stroke trial (DIAS): a phase II MRI-based 9-hour window acute stroke thrombolysis trial with intravenous desmoteplase. Stroke. 2005;36:66–73.
  8. Furlan AJ, Eyding D, Albers GW, et al. Dose Escalation of Desmoteplase for Acute Ischemic Stroke (DEDAS): evidence of safety and efficacy 3 to 9 hours after stroke onset. Stroke. 2006;37:1227–1231.
  9. Hacke W, Furlan AJ, Al-Rawi Y, et al. Intravenous desmoteplase in patients with acute ischaemic stroke selected by MRI perfusion-diffusion weighted imaging or perfusion CT (DIAS-2): a prospective, randomised, double-blind, placebo-controlled study. Lancet Neurol. 2009;8:(2)141-50.
  10. Kidwell CS, Jahan R, Gornbein J, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med. 2013;368:(10)914-23.

The Adventure of the Blanched Soldier

So often in the management of the critically ill we are forced to choose the lesser of two evils. The transfusion of blood products in the face of hemorrhagic shock is in some ways the best compromise among less than ideal choices. Every drop of resuscitative fluid given that does not mimic the blood a patient has recently lost further dilutes their already diminished coagulative capabilities. And yet an overly zealous administration of blood products has the potential to cause a multitude of adverse events downstream, further complicating the patient’s potentially arduous recovery. That being said, the endeavor to replenish as close a surrogate to whole blood as logistically possible is an easy concept to accept as beneficial. Yet despite this strong biological plausibility, the balanced administration of packed red blood cells (PRBCs), plasma and platelets has never been demonstrated to be efficacious beyond this physiologic reasoning. A number of retrospective studies examining this concept have claimed benefit (1,2,3,4), but their results are so confounded by survivor bias that it is difficult to interpret their true meaning (8). Even the PROMMTT trial, the largest trial to examine this question in a prospective fashion, failed to include a prospectively randomized control group and as such, its results were equally limited. With the publication of the PROPPR trial, the first large RCT to evaluate the efficacy of a balanced transfusion strategy, we finally have some strong data to guide us (6). At first glance this well-done RCT seems to have vindicated those in support of the 1:1:1 transfusion strategy, but I fear that in reality it may have left us with more questions than answers.

The Pragmatic, Randomized Optimal Platelet and Plasma Ratios (PROPPR) Trial by Holcomb et al, published in JAMA on February 3, 2015, sought to identify the preferential ratio of plasma, platelets, and blood cells when resuscitating the critically ill trauma patient. The authors randomized 680 patients to either a 1:1:1 or 1:1:2 ratio of plasma to platelets to PRBCs. The trial included patients identified as having severe bleeding or being at risk of severe bleeding (defined as having at least 1 U of any blood component transfused prior to hospital arrival or within one hour of admission and prediction by an Assessment of Blood Consumption score of 2 or greater or by physician judgment of the need for a massive transfusion). Although the authors specified the order and ratio in which blood components should be transfused, the decision to administer products was left to the discretion of the treating physician. Using this pragmatic trial design, the authors hoped to examine the effects of each transfusion strategy on the primary endpoints, 24-hour and 30-day mortality. Holcomb et al also examined a number of important secondary endpoints, including time to hemostasis and the number and type of blood products administered until hemostasis was achieved.

At first glance the difference in transfusion strategies did not seem to make a difference, as the authors failed to find statistical significance in either of their two primary endpoints. A closer look reveals that this was more likely due to the authors’ overestimation of the true effect size of the 1:1:1 ratio rather than a lack of efficacy for this balanced transfusion strategy. Specifically, the 24-hour mortality was 12.7% and 17.0% in the 1:1:1 and 1:1:2 groups respectively. Though not statistically significant, this 4.3% absolute difference in favor of the more aggressive transfusion strategy clearly trends towards clinical relevance, especially given that the rate of death due to exsanguination (9.2% vs 14.6%) and the percentage of patients who achieved hemostasis (86.1% vs 78.1%) were noticeably improved. Likewise, though the 30-day mortality failed to reach statistical significance, it did maintain a robust absolute difference of 3.7% in favor of the 1:1:1 group.
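
For readers who want to check the arithmetic, here is a minimal sketch in Python using only the percentages quoted above. The function names are hypothetical, and the number needed to treat is a derived figure that I have calculated for illustration; it is not reported in this post and only has meaning if the underlying difference is real, which the trial did not establish.

```python
def absolute_difference(rate_a_pct, rate_b_pct):
    """Absolute difference between two event rates, in percentage points."""
    return round(abs(rate_a_pct - rate_b_pct), 1)


def number_needed_to_treat(rate_a_pct, rate_b_pct):
    """Reciprocal of the absolute risk difference (rates given as percentages)."""
    return 100.0 / abs(rate_a_pct - rate_b_pct)


# Percentages as quoted in the text (1:1:1 group first, 1:1:2 group second).
outcomes = {
    "24-hour mortality": (12.7, 17.0),
    "death due to exsanguination": (9.2, 14.6),
    "hemostasis achieved": (86.1, 78.1),
}

for name, (balanced, conservative) in outcomes.items():
    diff = absolute_difference(balanced, conservative)
    print(f"{name}: {diff} percentage points")

# Illustrative only: assumes the 24-hour mortality difference is a true effect.
nnt = number_needed_to_treat(12.7, 17.0)
print(f"approximate NNT for 24-hour mortality: {nnt:.0f}")
```

A difference of roughly 4 percentage points corresponds to treating somewhere in the low twenties of patients to prevent one death at 24 hours, which is why a non-significant result of this size is still hard to dismiss.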

As far as transfusion-related adverse events are concerned, the 1:1:1 strategy appears to be safe when compared to a less aggressive protocol. None of the 23 adverse events prospectively recorded seemed to occur with greater regularity in patients randomized to the more aggressive strategy. There was a slight non-significant surge in the rate of systemic inflammatory response syndrome (SIRS) (5.2% absolute increase) in patients randomized to the 1:1:1 group, but it is hard to make much of this as the rates of both sepsis and acute respiratory distress syndrome seem equivalent.

It is important to note that, despite the authors’ best intentions, this trial did not truly compare 1:1:1 vs 1:1:2 resuscitative strategies. Rather, Holcomb et al examined a protocol intending to give 1:1:1 vs 1:1:2. In reality neither group truly reached its proportional expectations. The 1:1:1 group in actuality was given products closer to a 2:1:2 ratio, while the 1:1:2 group only received products in a 2:1:4 ratio. It is difficult to know how these shortcomings affected outcomes.

For all intents and purposes it seems the rate of adverse reactions was not significantly increased with a more aggressive use of plasma and platelets, though these results too may have been biased by the less than stringent implementation of each group’s assigned blood product ratio. Throughout the intervention period the 1:1:1 group received a significantly higher ratio of plasma to PRBCs and platelets to PRBCs than the 1:1:2 group. However, this relationship was reversed when the post-intervention period was examined. During the post-intervention period the treating physicians were able to select blood products in any ratio they deemed clinically appropriate, and as such they attempted to replenish all the plasma and platelets they were restricted from giving during the intervention period. Though the total quantity was far less than what was given in the intervention period, the ratio of plasma and platelets to PRBCs was higher in the 1:1:2 group during the post-intervention period. This in and of itself may have led to an increase in the rates of adverse events observed in the 1:1:2 group without providing the coagulative benefits the early administration of these products provided in the 1:1:1 group.

Despite some minor inconsistencies, the results appear to be a validation of the balanced transfusion strategy. And yet one has to ask, “What did these authors truly demonstrate?” Holcomb et al compared a 1:1:1 strategy to the slightly more conservative 1:1:2 strategy. Ideally the only difference between these two groups should have been that the 1:1:1 group received marginally more platelets and plasma during the initial resuscitation. Are these two transfusion strategies really dissimilar enough to demonstrate a clinically relevant difference? Should they have compared a balanced transfusion strategy to a reactive method where platelets and plasma are only administered when patients develop a coagulopathy? More importantly, is any empirically chosen ratio the ideal strategy in today’s age of point-of-care testing? In 2013 CMAJ published a trial by Nascimento et al that compared a fixed ratio similar to that used by Holcomb et al (1:1:1) to a laboratory-guided transfusion strategy (7). In this laboratory-guided strategy, blood product administration was guided by INR, PTT, Hb and platelet values. Although the trial was far too small to be definitive (n=67), the results were interesting nonetheless. The mortality in the laboratory-guided group was far lower at 14.3% when compared to the 32.5% observed in the 1:1:1 strategy. Although a lab value guided resuscitation strategy is clearly impractical in the acute resuscitation period, a point-of-care based system like TEG may provide us with the instantaneous feedback we require to tailor our resuscitation strategies to the specific needs of the patient rather than the empiric strategy currently advised.

I doubt these results will lead to a significant change in practice. It seems the 1:1:1 massive transfusion strategy has become firmly entrenched in trauma resuscitation dogma. At least the PROPPR trial offers support to the notion that if one is going to use an empirically based transfusion strategy, striving for an equal ratio of cells, plasma and platelets appears to be of some benefit.

Sources Cited:

1. Holcomb JB, Wade CE, Michalek JE, Chisholm GB, Zarzabal LA, Schreiber MA et al. Increased plasma and platelet to red blood cell ratios improves outcome in 466 massively transfused civilian trauma patients. Ann Surg 2008; 248: 447–458.

2. Borgman, M.A., Spinella, P.C., Perkins, J.G. et al. The ratio of blood products transfused affects mortality in patients receiving massive transfusions at a combat support hospital. J Trauma 2007; 63: 805–813.

3. Holcomb, J.B., Wade, C.E., Michalek, J.E. et al. Increased plasma and platelet to red blood cell ratios improves outcome in 466 massively transfused civilian trauma patients. Ann Surg 2008; 248: 447–458.

4. Maegele, M., Lefering, R., Paffrath, T. et al. Red-blood-cell to plasma ratios transfused during massive transfusion are associated with mortality in severe multiple injury: a retrospective analysis from the Trauma Registry of the Deutsche Gesellschaft für Unfallchirurgie. Vox Sang. 2008; 95: 112–119.

5. Holcomb, J.B., del Junco, D.J., Fox, E.E. et al. The Prospective, Observational, Multicenter, Major Trauma Transfusion (PROMMTT) study. JAMA Surg 2013; 148: 127–136.

6. Holcomb JB, Tilley BC, Baraniuk S, et al. Transfusion of plasma, platelets, and red blood cells in a 1:1:1 vs a 1:1:2 ratio and mortality in patients with severe trauma: the PROPPR randomized clinical trial. JAMA 2015;313:(5)471-82.

7. Nascimento B, Callum J, Tien H, et al. Effect of a fixed-ratio (1:1:1) transfusion protocol versus laboratory-results-guided transfusion in patients with severe trauma: a randomized feasibility trial. CMAJ 2013;185:(12)E583–E589.

8. Ho AM, Zamora JE, Holcomb JB, Ng CS, Karmakar MK, Dion PW. The Many Faces of Survivor Bias in Observational Studies on Trauma Resuscitation Requiring Massive Transfusion. Ann Emerg Med 2015.

The Case of the Balanced Solution

Saline-based resuscitation strategies were first proposed as far back as 1831 during the cholera epidemic. In an article published in the Lancet in 1831, Dr. O’Shaughnessy suggested injecting salts into the venous system as a means of combating the dramatic dehydration seen in patients afflicted with this bacterial infection (1). Saline’s potential harms were first observed in post-surgical patients who, after receiving large volumes of saline-based resuscitation fluids during surgery, were found to have a hyperchloremic acidosis (2). Though these changes appear transient and clinically trivial, it is theorized that when applied to the critically ill, the deleterious effects on renal blood flow may increase the rate of permanent renal impairment and even death. Unfortunately, no large prospective trials have demonstrated this hypothesis to be anything more than physiological reasoning. Small prospective trials have exhibited trivial trends toward decreased renal blood flow, decreased kidney function, and increased acidosis, though these perturbations were fleeting and of questionable clinical relevance (3, 4, 5, 6, 7). A larger retrospective study, bringing all the biases such studies are known to carry, demonstrated small improvements in mortality of ICU patients treated with a balanced fluid strategy, though it failed to demonstrate improvements in renal function (the theoretical model used to support balanced fluid administration) (8). In 2012 Yunos et al were the first to attempt to prospectively answer this question in an ICU population. Published in JAMA, the results at first glance seemed to vindicate those in support of the use of balanced fluids (9). Yet despite its superficial success, a closer look reveals this trial does little to demonstrate the deleterious effects of chloride-rich resuscitative strategies.
In a recent publication in Intensive Care Medicine, Yunos et al re-examine this question in the hopes of once again demonstrating the benefits of balanced fluid strategies for the resuscitation of the critically ill (10).

In the original publication Yunos et al, using a prospective open-label before-and-after cohort design, hoped to demonstrate that the use of balanced fluids in ICU patients would lead to improved renal function and decreased administration of renal replacement therapy (RRT). For the initial 6-month period fluid administration was left entirely to the whims of the treating intensivist. This was followed by a 6-month span during which ICU staff were trained and educated on the evils of chloride-rich solutions and the benefits of a more balanced approach to fluid selection. Following this smear campaign on normal saline and its high-chloride co-conspirators, the authors spent the next 6 months recording fluid administration and subsequent patient outcomes. The authors’ co-primary outcomes were the increase in creatinine levels above baseline during ICU stay and the incidence of acute kidney injury (AKI) as defined by the RIFLE (Risk, Injury, Failure, Loss, End-stage) criteria. Secondary outcomes listed by the authors included the need for RRT, ICU length of stay, and mortality (9).

As far as convincing ICU staff that balanced solutions were beneficial, the authors’ experiment was an overwhelming success. A total of 1,533 patients were examined: 760 patients during the 6-month control period and 773 patients during the subsequent 6-month intervention period. The total amount of normal saline used over the two periods was 2,411 L and 52 L respectively. Likewise the total chloride administration decreased by a total of 144,504 mmol, or by 198 mmol/patient (9).

At face value the study appears to have been a success, demonstrating statistically significant benefits for both primary outcomes. During the intervention period patients experienced a significantly lower rise in creatinine levels, 14.8 μmol/L (95% CI, 9.8-19.9 μmol/L), than during the control period, 22.6 μmol/L (95% CI, 17.5-27.7 μmol/L). The authors also found a 5.6% absolute decrease in the rate of RIFLE-defined kidney injury and kidney failure in patients during the intervention period when compared to those in the control period (9).

These seemingly positive results should be tempered by the fact that while statistically significant, the differences are, for the most part, clinically irrelevant. A 7.8 μmol/L difference in creatinine translates to an approximately 0.09 mg/dL difference between the control and intervention periods, which is hardly clinically pertinent. The 5.6% difference in the rate of AKI was primarily powered by the 3.3% difference in the rate of the less severe RIFLE class, kidney injury. When kidney failure was examined alone, unaccompanied by this statistical augmentation, the difference was found to be statistically insignificant (9).
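
The unit conversion behind that figure is easy to reproduce. The sketch below is in Python and assumes the standard creatinine conversion factor of roughly 88.4 μmol/L per 1 mg/dL, which is not stated in the trial summary above; the function and variable names are my own.

```python
# Standard creatinine conversion factor: 1 mg/dL is about 88.4 umol/L.
# (Assumed here; the factor is not quoted in the post itself.)
UMOL_PER_L_PER_MG_PER_DL = 88.4


def umol_per_l_to_mg_per_dl(value_umol_per_l):
    """Convert a creatinine value from umol/L to mg/dL."""
    return value_umol_per_l / UMOL_PER_L_PER_MG_PER_DL


control_rise = 22.6       # umol/L rise above baseline, control period
intervention_rise = 14.8  # umol/L rise above baseline, intervention period

difference_umol = control_rise - intervention_rise
difference_mg_dl = umol_per_l_to_mg_per_dl(difference_umol)

print(f"difference: {difference_umol:.1f} umol/L = {difference_mg_dl:.2f} mg/dL")
```

Put in the units most of us use at the bedside, the between-period difference is less than a tenth of a mg/dL.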

Even the 3.7% absolute decrease in RRT in the intervention period (10.0% vs 6.3%) is hard to conclusively attribute to the balanced fluid strategy, given the open nature of the trial design and the fact that these benefits did not translate into a decrease in either the rate of long-term dialysis requirements or mortality. Furthermore, the annual rates of RRT during the control and intervention periods are almost identical (7.4% vs 7.9%). In fact, the rates of RRT in the years bookending this study are highly variable, which speaks to the potential for unmeasured bias and the cyclic nature of random chance causing the observed differences in these groups, rather than the intervention in question. It is important to remember that though RRT appears to be a finite objective endpoint, it is largely dependent on the treating physician’s subjective judgement. In an open-label design such as this, in which the authors are clearly in favor of one intervention over another, the potential for bias affecting this outcome is evident (9).

In a secondary analysis of their data set, Yunos et al hoped to address some of these uncertainties. In this manuscript, published in Intensive Care Medicine in 2014, the authors added an additional 6 months of patient data to both the control and intervention periods, with the intention to demonstrate that the positive findings of their initial publication were due to the favorable influences of balanced fluids. The control period was expanded to include patient data (n=716) from the 6-month period prior to the study’s original start date. The authors then incorporated an additional 6 months of data to the intervention group (n=745) after its original stop date. Overall the two augmented periods ran from February 2007 to February 2008 and August 2008 to August 2009. The authors again found success. And though their primary endpoints remained of questionable clinical significance, the magnitude of their triumph was certainly more impressive (10).

With the addition of this 12-month period of data, the authors boast a 4.8% absolute decrease in the rate of moderate or severe kidney injury as compared to the control. Though the absolute difference in the rate of RRT decreased from 3.6% to 3.0% when the additional patients were added to the analysis, the difference still remained statistically significant (10). Interestingly, despite both the added control and intervention groups regressing to the mean, the overall magnitude of benefit reported by the authors seemed to increase. This sleight of hand was achieved not by some complex form of statistical wizardry, but rather by simply lowering the bar for what the authors defined as success.

In their original manuscript, Yunos et al used the RIFLE criteria to define the varying degrees of AKI. Conversely in the more recent publication, AKI was evaluated using the Kidney Disease: Improving Global Outcomes (KDIGO) scale. Despite its grandiose title, in reality this scale is essentially the amalgamation of the previous two scales traditionally used to define AKI (the RIFLE and AKIN criteria). Creators of the KDIGO criteria hoped to identify a greater proportion of patients who would benefit from RRT, and thus created a novel tool by incorporating both definitions of AKI (11). Of course, as is typical with any diagnostic tool, augmenting its sensitivity is achieved by sacrificing its specificity.

Such is the case for the KDIGO score. Not surprisingly, when examined, the KDIGO score identified significantly more patients in renal failure than either the RIFLE or AKIN criteria. In a study published in Critical Care in 2014, Luo et al compared the abilities of RIFLE, AKIN and KDIGO to identify clinically important AKI (12). They found that the use of the KDIGO criteria identified more overall patients as having AKI (51% compared to 46.9% and 38.4% respectively) as well as classified a larger subset of patients as being in failure (16% compared to 13.8% and 12.8% respectively). Despite the increased yield, no difference was seen in each respective criterion’s ability to predict death (AUC were 0.738, 0.746, 0.757 respectively). It is still unclear whether the additional patients identified using the KDIGO criteria benefit from early aggressive management of their subtle renal impairment or are harmed by the invasive interventions performed in hopes of treating pathology that would likely resolve without interference. What is clear is that changing from the more conservative RIFLE criteria to the more liberal KDIGO makes interpreting the clinical relevance of Yunos et al’s results difficult.

In the 2014 publication by Yunos et al, the absolute difference in AKI is similar to that described in the 2012 publication (4.8% vs 5.6%), but unlike their original population there is a shift to a more severe spectrum of renal impairment. Using the KDIGO criteria, the authors found significantly more stage 3 AKI than in their original publication. In the original manuscript the difference in RIFLE failure (class 3) AKI failed to reach statistical significance. In their updated cohort the authors now cite a statistically significant decrease in the rate of KDIGO stage 3 AKI (the equivalent of RIFLE failure). The original trial states an absolute difference in the rate of RIFLE class 3 AKI of 2.1%. In their more recent document Yunos et al now cite a 4% (14% vs 10%) absolute decrease in KDIGO stage 3 AKI. Likewise the original manuscript states an absolute difference of 3.3% in the rate of RIFLE class 2 AKI. In the more recent document this same difference is now stated to be only 2%. Clearly the use of the KDIGO criteria has shifted the severity of the cohort in an alarming fashion. This increase in stage 3 AKI may be a more accurate interpretation of reality, but given that these differences did not translate into a decrease in either long-term dialysis or mortality, its clinical relevance is unlikely.

Even these clinically questionable differences cannot be directly attributed to the more balanced fluid strategy utilized during the intervention period. It is equally likely the multiple biases introduced by a before-and-after study design were responsible. Using a multivariable regression model, Yunos et al hoped to account for many of these biases. On initial presentation the authors seem to be vindicated in their assertion that these differences in renal function were due to the change in fluid administration. When the extended control and intervention periods were included in the multivariable analysis, the differences in the rates of KDIGO stage 2 and 3 AKI and RRT remained statistically significant. This benefit was powered completely by the initial cohort; the addition of the extended cohorts served only to regress these benefits towards the mean. The odds ratio in the original cohort for preventing AKI was 1.68 (1.28-2.21). When the extended groups were incorporated the odds ratio falls to 1.32 (1.11-1.58). In fact a thorough examination comparing the four time periods uncovers that the initial results are hardly as robust as they originally appear. When the extended time period is examined alone (control vs intervention), there was no difference in the incidence of AKI or RRT. Additionally, when the extended control is compared to the original intervention period, the decrease in AKI remains significant but the difference in the rate of RRT is no longer statistically significant. There is even a statistically significant increase in the rate of AKI when the original intervention period is compared to the extended intervention period. In fact this is the very same difference in both AKI and RRT that is observed when comparing the original control group to the extended intervention group (10).
Essentially, though it was the authors’ intent to validate the findings of their initial study, the inconsistent benefits demonstrated in the extended cohort do just the opposite. These differences seem to be due more to random chance than any beneficial effects of a balanced fluid strategy.
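
To put the two odds ratios quoted above on a common footing, here is a rough sketch in Python. It assumes the reported intervals are 95% Wald-type intervals that are symmetric on the log scale, and it back-calculates an approximate standard error and z statistic from them. This is my own illustration and not the authors' regression model; the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_or_standard_error(lower, upper):
    """Approximate SE of log(OR), assuming a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def z_statistic(odds_ratio, lower, upper):
    """Approximate z statistic for the null hypothesis OR = 1."""
    return math.log(odds_ratio) / log_or_standard_error(lower, upper)


# (odds ratio, lower bound, upper bound) as quoted in the text.
estimates = {
    "original cohort": (1.68, 1.28, 2.21),
    "with extended periods": (1.32, 1.11, 1.58),
}

for label, (odds_ratio, lower, upper) in estimates.items():
    excludes_one = lower > 1.0 or upper < 1.0
    z = z_statistic(odds_ratio, lower, upper)
    print(f"{label}: OR {odds_ratio}, z about {z:.2f}, CI excludes 1: {excludes_one}")
```

Both intervals exclude 1, but the point estimate drifts toward the null and the approximate z statistic shrinks even though the sample roughly doubled, which is the pattern one would expect if the extended periods contributed little or no effect of their own.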

The interpretation of medical literature is very rarely as straightforward as we would like to imagine. Much like searching for truth in a magic mirror, so often it serves only to confirm our own beliefs and support our incredulities. And yet if we are to claim to be authentic curators of truth in medicine, it is important we apply just as much academic rigor when examining topics which we support as we do with those we distrust. A balanced approach to fluid administration has a strong physiological base to support its use. But physiologic reasoning has led us down many blind paths and dark alleys. It is only when we shine the light of critical research that we reveal which are dead ends and which lead us and our patients to a better place. Currently we are uncertain as to whether the success of a balanced fluid strategy is due to its chloride-sparing effects or due to the uncontrollable bias introduced by a non-randomized, unblinded trial design, with serious potential for the Hawthorne effect. It may very well be that any fluid in excess is harmful and that “balanced” fluids high in acetate and lactate have their very own unintended consequences when administered in high volumes. The SPLIT trial (scheduled to be published in 2015) may validate our beliefs in the superiority of a balanced fluid strategy, but until then it is important we resist the urge to become quite so dogmatic with our cries of indignation towards chloride-rich solutions.

A brief disclosure: I am, in fact, overwhelmingly and irredeemably in favor of the Stewart approach to acid-base disorders. Although there is no convincing evidence directly demonstrating its superiority over the more traditional Henderson-Hasselbalch model, its elegance and intuitive nature make it perfect for the swirling chaos and uncertainty of the Emergency Department. As such it is not hard to imagine that the more judicious administration of fluids, specifically those high in chloride content, would benefit our patients by reducing hyperchloremic acidosis and the concomitant renal failure. I am, however, less enthused by the evidence supporting this premise.

-A special thanks to Anand Swaminathan (@EMSwami) for his thoughts and guidance during the writing of this post.

-As always a special thanks to my ever patient wife, Rebecca Talmud(@DinosaurPT), for her editorial wizardry without which this blog would be the unstructured ramblings of a madman.

Sources Cited:

  1. O’Shaughnessy, WB (1831). “Proposal for a new method of treating the blue epidemic cholera by the injection of highly-oxygenated salts into the venous system”. Lancet 17 (432): 366–71
  2. Scheingraber S, Rehm M, Sehmisch C, Finsterer U. Rapid saline infusion produces hyperchloremic acidosis in patients undergoing gynaecologic surgery. Anesthesiology. 1999;90:1265–1270
  3. Quilley CP, Lin Y-S, McGiff JC. Chloride anion concentration as a determinant of renal vascular responsiveness to vasoconstrictor agents. Br J Pharmacol. 1993;108:106–110
  4. Hansen PB, Jensen BL, Skott O. Chloride regulates afferent arteriolar contraction in response to depolarization. Hypertension. 1998;32:1066–1070.
  5. O’Malley CM, Frumento RJ, Hardy MA, Benvenisty AI, Brentjens TE, Mercer JS, Bennett-Guerrero E. A randomized, double-blind comparison of lactated Ringer’s solution and 0.9% NaCl during renal transplantation. Anesth Analg. 2005;100:1518–1524
  6. Waters JH, Gottlieb A, Schoenwald P, Popovich MJ, Sprung J, Nelson DR. Normal saline versus lactated Ringer’s solution for intraoperative fluid management in patients undergoing abdominal aortic aneurysm repair: an outcome study. Anesth Analg. 2001;93:817–822.
  7. Hatherill M, Salie S, Waggie Z, Lawrenson J, Hewitson J, Reynolds L, Argent A. Hyperchloraemic metabolic acidosis following open cardiac surgery. Arch Dis Child. 2005;90:1288–1292
  8. Raghunathan K, Shaw A, Nathanson B, Stürmer T, Brookhart A, Stefan MS, Setoguchi S, Beadles C, Lindenauer PK (2014) Association between the choice of IV crystalloid and in-hospital mortality among critically ill adults with sepsis. Crit Care Med 42:1585–1591
  9. Yunos NM, Bellomo R, Hegarty C, Story D, Ho L, Bailey M (2012) Association between a chloride-liberal vs chloride-restrictive intravenous fluid administration strategy and kidney injury in critically ill adults. JAMA 308:1566–1572
  10. Yunos NM, Bellomo R, Glassford N, Sutcliffe H, Lam Q, Bailey M. Chloride-liberal vs. chloride-restrictive intravenous fluid administration and acute kidney injury: an extended analysis. Intensive Care Med. 2014.
  11. http://www.kdigo.org/clinical_practice_guidelines/pdf/CKD/KDIGO_2012_CKD_GL.pdf
  12. Luo X, Jiang L, Du B, et al. A comparison of different diagnostic criteria of acute kidney injury in critically ill patients. Crit Care. 2014;18:(4)R144.