The Adventure of the Impassable Stone

As medical skeptics we have a tendency to revel in the negative study. We bemoan the p-value’s tendency to underestimate the risk of type I error and cite Frequentist statistics’ history of getting it wrong almost as often as it gets it right. Despite these nihilistic inclinations, it is important that we are equally vigilant in identifying circumstances in which the risk of type II errors is high. A number of recent trials examining the use of medical expulsive therapy (MET) in ureteral colic illustrate the risk of such errors.

The first of these trials, published by Pickard et al in The Lancet in May 2015, examined both alpha blocker (tamsulosin 0.4 mg) and calcium channel blocker (nifedipine 30 mg) therapy in patients with CT-confirmed ureterolithiasis (1). The authors randomized 1137 patients with stones 10 mm or less to receive either 0.4 mg of tamsulosin, 30 mg of nifedipine, or placebo. Patients were excluded if they presented with obvious signs of sepsis, had significant renal failure (GFR < 30), or required immediate invasive therapy as determined by the treating physician.

The authors found no significant difference in their primary outcome, the rate of spontaneous passage at 4 weeks, between those randomized to the tamsulosin, nifedipine, or placebo arms. Spontaneous stone passage, defined by absence of need for intervention to assist stone passage during the 4-week follow-up, occurred in 307 (81%), 304 (80%), and 303 (80%) patients respectively. There were also no significant differences noted in the need for pain medication, the number of days pain medication was required, or the visual analog scale (VAS) score of patients’ pain at 4 weeks (1). By all accounts this was an impressively negative trial.

A second study was published online in July 2015 in Annals of Emergency Medicine. Like the Pickard et al trial, this trial by Furyk et al examined the effects of MET in patients with CT-confirmed ureterolithiasis (2). The authors randomized patients with stones 10 mm or less located in the distal ureter to either MET with 0.4 mg of tamsulosin or placebo. Patients were excluded if they demonstrated signs of infection or presented with a compromised GFR. And like the previous study, the authors found no statistical difference in the number of patients who experienced stone passage at 28 days (87.0% and 81.9% in the tamsulosin and placebo groups respectively) (2). We now have two high quality RCTs demonstrating that the use of MET is not beneficial in the management of acute ureteral colic. This should conceivably end the debate regarding the utility of alpha blockade for ureteral colic.

And yet despite what at first glance appears to be convincing evidence, neither of these trials addresses the pressing question regarding MET. The majority of patients in both trials had stones less than 5 mm in diameter. Most small stones will pass without difficulty (6,7). As these trials demonstrate, it is exceedingly hard to show a statistically significant difference in an undifferentiated cohort of renal colic patients. The real question is, does MET work in patients with stones greater than 5 mm in diameter? Can these trials definitively demonstrate a lack of utility of MET in these patients?

To examine this question appropriately we first must define statistical power. Power is the ability of a trial to detect a statistically significant difference between two groups when a true difference exists (3). It is the ability to separate true positives from false negatives, essentially the trial’s sensitivity. Traditionally, an acceptable statistical power has been set at 80 or 90%. The true meaning of such a statement is nebulous and it becomes far easier to understand statistical power when utilizing quantifiable measures.

The Pickard et al trial based its sample size calculation on the ability to detect a 10% absolute difference between the tamsulosin group and its comparators with a power of 90% (1). What this translates to is that if the observed difference between the tamsulosin group and its comparators were zero (p=1.0), the trial would not be able to confidently rule out an absolute difference as large as 6%. Conversely, if the trial did in fact find a 10% improvement in patients randomized to alpha blockade, this effect size could range as low as 4% or as high as 16%. In fact, this is exactly what they found. The 95% confidence interval surrounding the 1% absolute risk reduction (ARR) in patients randomized to receive tamsulosin was -4.4% to 6.9%.

In the subset of patients with stones greater than 5 mm in width, however, Pickard et al observed an absolute difference of 10% in the rate of stone passage at 4 weeks in favor of those randomized to receive tamsulosin. This difference did not reach statistical significance.

It is important to note that power is a prospective concept, calculated prior to knowing the results of a study. To retrospectively state a trial is underpowered once the results are known is somewhat disingenuous. The claim that the observed difference is true and only failed to reach statistical significance due to an inappropriately small sample size may in fact be correct, but it cannot be justified from the data alone. Any post-hoc power calculation performed on such a data set will inevitably demonstrate a limited ability to differentiate a true difference from the null hypothesis (4). Once the trial results are obtained, post-hoc calculations should be avoided, focusing instead on the confidence intervals surrounding the point estimates for a more honest interpretation of the data (3). In this case, we are unable to differentiate a 10% difference in stone passage from no effect: the 95% confidence interval ranged from -2.8% to 23.6% (1). Clearly this trial was not designed to answer the question of whether MET is beneficial in patients with large diameter ureteral stones.
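The "as large as 6%" figure can be roughly checked with a predicted confidence interval, in the spirit of Goodman and Berlin (3). The sketch below is only an approximation under stated assumptions: it uses the normal (Wald) approximation, assumes the 1137 patients were split evenly across three arms (about 379 each), and assumes a passage rate near the 80% seen in every arm. The function name `predicted_half_width` is illustrative, not something taken from the trial.

```python
from math import sqrt
from statistics import NormalDist


def predicted_half_width(p, n_per_arm, confidence=0.95):
    """Approximate half-width of the CI around a difference in two proportions.

    Normal (Wald) approximation; assumes both arms share the same
    underlying event rate p and the same size n_per_arm.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(2 * p * (1 - p) / n_per_arm)
    return z * standard_error


# Assumptions for illustration only: ~379 patients per arm (1137 / 3)
# and a stone passage rate of about 80% in each arm.
half_width = predicted_half_width(p=0.80, n_per_arm=379)
print(f"predicted 95% CI half-width: +/- {half_width:.1%}")
```

Under these assumptions the half-width comes out a little under 6 percentage points, which is consistent with the statement that a null result could not exclude a difference of about 6%, and that an observed 10% difference would carry an interval of roughly 4% to 16%.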

The results of the Furyk trial are even more compelling. Though the primary endpoint was the overall proportion of patients with stone passage at 28 days, the authors powered their study for an entirely different question. The study was powered to detect a difference in the rate of stone passage in patients with larger stone diameters (5-10 mm). The authors calculated they would require 98 patients with stones greater than 5 mm to detect a 20% difference in stone passage with 80% power (2). This means that if no difference was observed, the authors would be unable to exclude a difference as large as 14%. While their primary outcome was negative, in the subgroup of patients this study was powered to examine, the authors found a 22.4% absolute difference in the rate of stone passage at 28 days. The confidence interval surrounding this point estimate ranged from 3.1% to 41.6%. Although it is unwise to make claims of significance based on a secondary endpoint with such a wide confidence interval, it is equally unfair to use this data to disprove a hypothesis that this trial was not designed to refute.
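To make the idea of powering a study for a 20% difference concrete, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions. The baseline passage rates (70% versus 90%) are hypothetical values chosen only to illustrate a 20-point difference; the rates the Furyk authors actually assumed are not given here, so this sketch is not expected to reproduce their figure of 98 patients.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to compare two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical passage rates, used only to illustrate a 20-point difference.
n_80 = patients_per_arm(0.70, 0.90, power=0.80)
n_90 = patients_per_arm(0.70, 0.90, power=0.90)
print(f"per arm at 80% power: {n_80}, at 90% power: {n_90}")
```

The point of the sketch is the shape of the relationship rather than the exact numbers: the required sample grows with the desired power and shrinks rapidly as the difference to be detected gets larger.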

We are all aware of the hazards of subgroup analyses, and yet it is important to be honest in our skepticism. This in no way should be viewed as an endorsement of MET or the necessity of obtaining imaging to identify a subgroup of patients who may benefit from tamsulosin. On the contrary, these trials demonstrate that for the majority of patients presenting to the Emergency Department with renal colic, MET provides little additional benefit above symptomatic treatment. But a trial can only answer the question it was designed to ask. Neither of these trials was built to confidently address whether MET is beneficial in patients presenting with larger stones. Earlier trials examining this question are either so confounded by non-blinding and selection bias as to make them uninterpretable, or suffer from the same deficiencies in statistical power and cannot confidently address the effects of MET for patients with larger stones (5). We are left with statistical and philosophical uncertainty regarding the utility of alpha-blockers in acute ureteral colic. We will continue to exist in this state of ambiguity until we have a study sufficiently powered to ask whether MET is efficacious in patients with large ureteral stones. Many would love to discard alpha-blockers for renal colic in our ever-growing pile of medical impotencies, but given the current state of the literature, this renouncement would be premature and unjust.

Sources Cited:

  1. Pickard R, Starr K, Maclennan G, et al. Medical expulsive therapy in adults with ureteric colic: a multicentre, randomised, placebo-controlled trial. Lancet. 2015
  2. Furyk, Jeremy S. et al. Distal Ureteric Stones and Tamsulosin: A Double-Blind, Placebo-Controlled, Randomized, Multicenter Trial. Annals of Emergency Medicine. Published online: July 17 2015
  3. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121(3):200-6.
  4. Goodman SN. A comment on replication, P-values and evidence. Stat. Med. 1992;11:875-9.
  5. Campschroer, T., Zhu, Y., Duijvesz, D. et al. Alpha-blockers as medical expulsive therapy for ureteral stones. Cochrane Database Syst Rev. 2014:CD008509.
  6. Coll, D.M., Varanelli, M.J., and Smith, R.C. Relationship of spontaneous passage of ureteral calculi to stone size and location as revealed by unenhanced helical CT. AJR Am J Roentgenol. 2002; 178: 101–103
  7. Miller, O.F., Kane, C.J. Time to stone passage for observed ureteral calculi: a guide for patient education. J Urol. 1999;162:688–690 (discussion 690-691).

The Case of the Non-inferior Inferiority

The practice of Frequentist statistics is often a study in extremes. Based on an arbitrary threshold of significance, we are asked to interpret data as either positive or negative when in reality it merely shifts our probability of certainty. Even more important, because of the singular nature of Frequentist statistics, our interpretation of data is often constrained to the questions posed by those designing the trial. Although a strict deductive methodology is important to prevent mistaking random chance for scientific proof, it is equally important to understand in which instances abiding by these laws will lead to a misinterpretation and misunderstanding of the data.

Appendicitis has long been considered a surgical emergency. If it is not intervened upon surgically in a timely fashion the pathological sequelae will lead to perforation, sepsis, and death. And yet, despite this foregone conclusion, a number of trials have challenged the necessity of cold steel in the management of acute appendicitis. Most recently, in JAMA, Salminen et al published the findings from their RCT comparing the traditional surgical management of acute appendicitis to conservative treatment with antibiotic therapy alone (1). Despite the authors’ primary conclusion, this trial demonstrated that in patients with non-complicated acute appendicitis, the use of antibiotic therapy is anything but inferior.

Salminen et al randomized 530 patients with CT-confirmed non-complicated acute appendicitis to either surgical management, primarily using open laparotomy, or a short course of IV antibiotics (3 days of ertapenem) followed by a 7-day course of oral levofloxacin. Of the 273 patients randomized to the surgical group, 272 (99.6%) underwent successful appendectomy. Of the patients randomized to conservative therapy, 70 (27.3%) underwent appendectomy within one year of initial presentation. Let’s pause for a moment. A disease process which for the past century has been considered a surgical necessity was treated successfully with antibiotics alone in 72.7% of patients (1). Despite these impressive numbers the trial was deemed unsuccessful, as the rate of “treatment failure” in the conservative group crossed the predetermined non-inferiority margin of 24%. And yet these statistical inadequacies are based less on the inferiority of antibiotic therapy and more on the authors’ unfortunate choice of how exactly they defined “non-inferior”.

Non-inferiority trials are intended to ask a very specific question: whether a new treatment strategy or medical intervention is comparable to the traditional standard therapy. Rather than examine the two in the hopes of determining superiority, a non-inferiority trial merely attempts to establish that the new treatment is no worse than the current standard of care. This type of trial is undertaken when the new treatment provides certain advantages that would make it preferable over the old treatment (2,3). For example, if it is cheaper, safer, or less invasive, one might prefer to use this new treatment rather than expose the patient to the cost, risk, or intrusive nature of the prior strategy. In fact, depending on what advantages a new treatment may provide, one might accept some degradation in efficacy as long as it does not cross a predefined threshold for inferiority. This threshold is based upon a number of assumptions. First, what is the proven efficacy of the established standard? Say, for example, this standard demonstrated in previous studies an absolute decrease in mortality of 5%, with a confidence interval surrounding this point estimate of 3% to 7%. You would not want your new intervention to be 3% less effective than the standard comparator, in which case it could prove to be no more beneficial than placebo. Second, what added benefits does this new therapy provide? If these advantages are impressive, then you may accept a greater degree of inferiority when compared to the standard treatment strategy (a wider non-inferiority margin). On the other hand, if this new treatment provided few novel advantages, you would likely accept far less deviation from the standard treatment’s efficacy.

Salminen et al utilized neither of these considerations when calculating their non-inferiority margin. In fairness to the authors, it would be exceedingly difficult to accurately assess the true efficacy of surgery over placebo, as this standard of care was established long before placebo-controlled trials were utilized to define treatment effect. Where the authors did falter was the manner in which they determined their non-inferiority margin and performed their power calculation. Using data from prior studies examining the efficacy of antibiotic therapy in acute appendicitis, the authors estimated a 25% rate of treatment failure (defined as need for surgical intervention within one year of initial presentation) in the patients randomized to conservative treatment (1). Using this estimate they set their non-inferiority margin at no more than 24% treatment failure in patients randomized to antibiotic therapy, essentially dooming their trial from its earliest power calculations.

Non-inferiority trials ask a different question than the traditional superiority trials to which we are more accustomed. Rather than presenting a null hypothesis that states there is no difference between the groups, the non-inferiority trial design operates under the assumption that the novel intervention is inferior to the standard treatment. The alternative hypothesis states that the novel treatment is no worse than the standard by more than the predefined margin. In order to reject the null hypothesis the novel treatment must demonstrate a near equivalent efficacy within a degree of certainty. This means that both the point estimate and the surrounding confidence interval must fall on the favorable side of the non-inferiority margin (2,3). In this case, despite all prior evidence demonstrating the contrary, the authors estimated that 275 patients per group would provide a 90% power to demonstrate the non-inferiority of conservative management for acute appendicitis when compared to the more traditional surgical intervention. Essentially this translates into the non-surgical group having to demonstrate a point estimate of approximately 20% treatment failure within one year for the upper end of the confidence interval not to cross their predefined non-inferiority margin. Further hampering their efforts, the authors halted the trial early after enrolling only 530 patients (rather than the 610 planned in the original power calculation), widening the already wide confidence interval surrounding their point estimate (1).

It should have come as no surprise that the authors failed to demonstrate non-inferiority by their designated definition. The authors found that 27% of patients randomized to antibiotic therapy required an appendectomy within 1 year of initial presentation. The 95% confidence interval surrounding this point estimate was 22.0% to 33.2% (1). In the two trials they used to justify their non-inferiority margin of 24%, the 1-year failure rate in patients treated with antibiotics was cited as 24% and 23.6% respectively (4,5). Unfortunately, in the latter of these two trials, by Hansson et al, this failure rate was calculated from the per-protocol analysis rather than the intention-to-treat analysis. In reality the antibiotic group had a 47.5% crossover rate to surgery. The overall failure rate in the intention-to-treat analysis was 60% (5). In an additional trial by Vons et al, published in the Lancet in 2011, the 1-year appendectomy rate was 37%. The 95% confidence interval around this point estimate ranged from 28.36% to 45.64% (6). The 2011 Cochrane analysis, after examining the 5 existing RCTs, found that 26.6% (95% confidence interval 18.1% to 37.3%) of the patients randomized to antibiotic therapy went on to have an appendectomy within 1 year of initial presentation (7). Given that the previous evidence indicates that the rate of antibiotic failure has consistently been close to or greater than 25% and has ranged as high as 60%, the expectation by Salminen et al that they would find non-inferiority of antibiotic therapy with a non-inferiority margin of 24% was optimistic to say the least.
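The comparison between the observed failure rate and the margin, as framed in this article, can be sketched in a few lines. The example below uses the 70 appendectomies among 257 antibiotic-arm patients reported above and a Wilson score interval. The choice of the Wilson method is an assumption made for illustration; the trial may have used a different interval method, so the bounds need not match the published 22.0% to 33.2% exactly.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score confidence interval for a single proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = events / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half, centre + half


# Margin as described in this article: no more than 24% treatment failure.
MARGIN = 0.24

# 70 of 257 antibiotic-arm patients required appendectomy within one year.
lower, upper = wilson_interval(events=70, n=257)

# Non-inferiority would require the whole interval to sit below the margin.
non_inferior = upper < MARGIN
print(f"95% CI: {lower:.1%} to {upper:.1%}; non-inferior: {non_inferior}")
```

Because the upper bound sits well above 24%, the check fails even though the point estimate is only a few points above the margin, which is the situation the paragraph above describes.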

More importantly, was the appendectomy rate at 1 year truly the most appropriate criterion with which to define inferiority? This trial was not negative because medical management proved to be inferior to surgical appendectomy; rather it was negative because the authors asked the wrong question. As clinicians, what is our concern with the medical management of acute appendicitis? It is not whether 20% or 27% of those initially treated with antibiotics will eventually require an appendectomy, but rather whether medical therapy leads to an unacceptably high rate of serious complications. In fact, if we were to be completely equitable, while 99.6% of the patients in the surgical arm of this trial underwent appendectomies, only 27% of the patients in the medical management arm were exposed to an invasive procedure. The question the authors should have asked was, “How many patients in each arm experienced resolution of symptoms related to acute appendicitis without experiencing acute complications related to delays in treatment (perforation, abscesses, sepsis, etc)?” If the authors had asked this question their answer would have been entirely different. Of the 257 patients randomized to medical management, 15 (5.8%) required appendectomy during their initial hospital admission. Only 5 (1.9%) patients in the antibiotic group experienced perforations requiring surgical intervention, compared to 2 out of 273 (0.7%) patients randomized to immediate surgical intervention (1). Essentially you would have to operate on roughly 80 to 100 patients with non-complicated acute appendicitis in order to prevent one perforation.
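The arithmetic behind that last estimate is simple enough to spell out. This sketch uses only the perforation counts quoted above (5 of 257 with antibiotics, 2 of 273 with surgery); the function name `risk_difference` is illustrative. With so few events the result is a rough figure, not a precise estimate.

```python
def risk_difference(events_a, n_a, events_b, n_b):
    """Absolute difference in event rates between group A and group B."""
    return events_a / n_a - events_b / n_b


# Perforations: 5 of 257 in the antibiotic arm, 2 of 273 in the surgical arm.
diff = risk_difference(5, 257, 2, 273)

# Number of patients who would need immediate surgery to prevent one perforation.
number_needed_to_treat = 1 / diff
print(f"absolute difference: {diff:.2%}")
print(f"number needed to treat: {number_needed_to_treat:.1f}")
```

The raw counts give an absolute difference of a little over 1 percentage point, so the number needed to treat lands in the low 80s, the same order of magnitude as a round figure of 100.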

Certainly there is a great deal to be determined before this non-invasive strategy can be considered mainstream practice. This was a small, underpowered cohort in which the participating surgeons performed primarily open laparotomies. How this strategy translates to the US, where the primary approach to appendectomy is laparoscopic, is unclear. Additionally, whether patients require 3 days of broad-spectrum IV therapy followed by a 7-day course of oral therapy is unknown. What seems obvious is that in what was once considered an exclusively surgical disease, the majority of patients can effectively be managed conservatively. Despite not meeting their own high standards for non-inferiority, the authors demonstrated that most patients with acute appendicitis, when treated conservatively with antibiotics, can avoid surgical intervention without suffering the complications of delays to definitive care. To define such a revelation as inferior is unjust indeed.

Sources Cited:

  1. Salminen P, Paajanen H, Rautio T, et al. Antibiotic Therapy vs Appendectomy for Treatment of Uncomplicated Acute Appendicitis: The APPAC Randomized Clinical Trial. JAMA. 2015;313(23):2340
  2. Kaji AH, Lewis RJ. Noninferiority Trials: Is a New Treatment Almost as Effective as Another?. JAMA. 2015;313(23):2371-2.
  3. Kaul S, Diamond GA. Good Enough: A Primer on the Analysis and Interpretation of Noninferiority Trials. Ann Intern Med. 2006;145:62-69
  4. Styrud J, Eriksson S, Nilsson I, et al. Appendectomy versus antibiotic treatment in acute appendicitis: a prospective multicenter randomized controlled trial. World J Surg. 2006;30(6):1033-1037.
  5. Hansson J, Körner U, Khorram-Manesh A, Solberg A, Lundholm K. Randomized clinical trial of antibiotic therapy versus appendicectomy as primary treatment of acute appendicitis in unselected patients. Br J Surg. 2009;96(5):473-481.
  6. Vons C, Barry C, Maitre S, et al. Amoxicillin plus clavulanic acid versus appendicectomy for treatment of acute uncomplicated appendicitis: an open-label, non-inferiority, randomised controlled trial. Lancet. 2011;377(9777):1573-1579.
  7. Wilms IM, De hoog DE, De visser DC, Janzing HM. Appendectomy versus antibiotic treatment for acute appendicitis. Cochrane Database Syst Rev. 2011;(11):CD008359.

 

The Case of the Irregular Irregularity

We have proven ourselves highly capable of managing atrial fibrillation in the Emergency Department. In recent years, a number of prospective cohorts have demonstrated that with the use of IV anti-arrhythmic medication and electrical cardioversion, patients presenting to the Emergency Department with new onset atrial fibrillation can be successfully discharged in sinus rhythm consistently and with minimal adverse events. In 2010, Stiell et al published a case series of 660 patients who were cardioverted in the Emergency Department (1). What they coined the “Ottawa Aggressive Protocol” consisted of chemically managed rate control followed by a trial of procainamide loaded over an hour and, if this failed to convert the patient, DC electrical cardioversion. Using this protocol, Stiell et al cite the number of patients who were discharged home in normal sinus rhythm to be 595 (90.2%). In a recent systematic review published in the European Journal of Emergency Medicine, Coll-Vinent et al found that among patients who underwent Emergency Department cardioversion, 78.2% to 100% were discharged home in normal sinus rhythm (2).

But competency is not directly translatable into efficacy. Despite this proof of concept, there is limited data examining the patient-oriented benefits these aggressive rhythm control strategies produce. In fact, the majority of such studies employ the “rhythm at Emergency Department discharge” as their measure of success. And though being discharged from the Emergency Department in a sinus rhythm seems preferable to atrial fibrillation, little is known regarding the extent of this benefit, as very few trials rigorously monitored patients following discharge from the Emergency Department. How many of these patients remained in a sinus rhythm and for how long? Stiell et al found that only 8.6% of their cohort returned to the Emergency Department within one week of cardioversion with any recurrence of atrial fibrillation. Unfortunately these numbers were calculated from a chart extraction of the Ottawa Hospital health records database and do not directly reflect the number of patients who experienced atrial fibrillation over the 7 days following Emergency Department discharge (1). Decker et al, in a small cohort of 150 patients, cite a recurrence rate of 10% at 6 months (3). What is the true recurrence rate? Even more importantly, does reestablishing sinus conduction lead to improved patient health and wellbeing?

The question at hand remains: what exactly are we achieving by performing cardioversions in the Emergency Department? We have known for some time that despite being capable of maintaining patients in a sinus rhythm with moderate success, an aggressive rhythm control strategy does not prevent the long-term sequelae associated with atrial fibrillation. The AFFIRM trial, published in the NEJM in 2002, demonstrated that in a cohort of 4060 patients with atrial fibrillation, although the use of a rhythm control strategy reduced the time patients spent in atrial fibrillation, it did not reduce the rate of death, MI, or ischemic stroke (4). When the 1391 patients experiencing their first episode of atrial fibrillation or the 1252 patients presenting within 48 hours of symptom onset were examined separately, no additional benefit was discovered (4). Since the AFFIRM trial’s publication a number of studies, performed in various subsets of atrial fibrillation patients, have confirmed that rhythm control strategies do not prevent the long-term sequelae associated with this chronic disease (5,6).

Since rate control is the preferred long-term treatment strategy for atrial fibrillation, what exactly are our goals for cardioversion in the Emergency Department? Is there a long-term health benefit to aggressive rhythm control in the Emergency Department? Does this lead to noticeable improvements in patient outcomes? Unfortunately conclusive data on these questions has yet to be published. The few RCTs examining the benefits of aggressive management of atrial fibrillation in the Emergency Department are small and inconclusive. Despite this paucity of convincing evidence, I would argue that the mathematical likelihood of benefit is incredibly low. Atrial fibrillation is a chronic disease, with sequelae measured in events per patient year. The rate of short-term adverse events is exceedingly low, with some cohorts citing a 30-day event rate of less than 1% (7). To design a study powered to identify a statistically meaningful difference, the sample size required would be unrealistically high, especially given that the long-term use of such rhythm control strategies has not yielded clinically important improvement in patient outcomes. Furthermore, the act of emergent cardioversion does not avert the need for anticoagulation, as this decision should be based on the patient’s risk of thromboembolic event independent of their rhythm at discharge (8).

If we can agree that the clinical benefits of aggressive cardioversion in the Emergency Department are minimal, then the only remaining justification for Emergency Department cardioversion is its positive effect on patient wellbeing and comfort. The current argument in support of Emergency Department cardioversion hinges on the supposition that a state of sinus regularity is preferred when compared to the electrical chaos induced by atrial fibrillation (9). Until recently this claim had been supported exclusively by anecdotal descriptions of patient experience; its validity had never been examined in a prospective fashion.

Published online June 2015 in the Annals of Emergency Medicine, Ballard et al sought to objectively assess the effects of Emergency Department cardioversion on patients’ wellbeing and comfort (10). The authors surveyed 730 patients who were treated for new onset atrial fibrillation and discharged from one of 21 medical centers in Northern California. Of this cohort, 652 (89%) responded to a structured phone survey. Though the data was prospectively gathered, these patients were not randomized to either a rate or rhythm control strategy; rather, the manner of treatment was left entirely to the judgment of the treating physician. Of the 652 respondents the majority, 432 (67.3%), were managed with rate control therapy alone. Regardless of management strategy, 410 (62.9%) of the patients were discharged from the Emergency Department in a sinus rhythm. Among those patients who underwent electrical cardioversion, 92.2% were in sinus rhythm upon discharge. If you consider discharge rhythm as a metric of success, then electrical cardioversion was a far more accomplished strategy than either pharmacological cardioversion or rate control therapy alone, which left 81.6% and 49.7% of patients in a sinus rhythm at discharge, respectively (10). Despite its obvious superiority in rhythm control, what benefits does cardioversion provide for patients’ symptom burden at 30 days?

The authors measured 30-day wellbeing using the Atrial Fibrillation Effect on Quality-of-life (AFEQT) score. This 18-question tool was intended to assess the patients’ perception of the burden of disease. The surveys were administered via telephone by trained research assistants at least 28 days following the Emergency Department visit. Overall 539 patients (82.7%) reported some degree of symptom burden related to their atrial fibrillation upon discharge. The use of cardioversion did not decrease the rate or degree of symptom burden at 30 days. When the authors analyzed the AFEQT scores in quartiles of severity rather than the dichotomous symptom/no symptom outcome, they found no additional benefit to Emergency Department cardioversion. Certainly this data is far from perfect. This was a non-randomized cohort and it is unclear how well the AFEQT score captures symptom burden (10). Despite these shortcomings, the findings are consistent with the body of literature examining whether an aggressive rhythm control strategy improves patient wellbeing. A number of trials have examined the long-term benefits rhythm control has on reducing symptom burden. These trials have consistently demonstrated that when compared to rate control alone, an aggressive rhythm control strategy provided no additional perceivable benefit to patients’ wellbeing and comfort (11).

The act of electrical cardioversion within 48 hours of symptom onset is commonly perceived as a safe practice. In a recent review of the existing literature, Cohen et al found that out of 1593 patients, only one (0.06%) stroke was reported. Despite this cursory endorsement, I would caution that safety is measured in the thousands, and the current data is far too limited and rife with publication bias to truly assess safety. Additionally, a recent research letter published in JAMA called into question the safety of the 48-hour window we have traditionally used to determine suitability for Emergency Department cardioversion. Nuotio et al published a secondary analysis of the FinCV trial registry, which examined 2481 patients in atrial fibrillation who underwent electrical cardioversion within 48 hours of symptom onset. In this cohort the risk of ischemic event increased significantly (0.03% to 1.1%) when the time from symptom onset to cardioversion was greater than 12 hours. And although 1.1% is still a relatively low event rate, given the absence of any clear clinical benefit, the benefit-harm ratio does not favor an aggressive rhythm control strategy (12).

Modern medicine far too often values competency over efficacy. Whether it is door-to-balloon time or the 6-hour sepsis bundle, we are constantly measured in surrogates thought to be associated with improvements in patient outcomes. The quality of our care has been distilled down to what can be marked as complete on a checklist. Although the evidence clearly demonstrates Emergency Physicians are capable of effectively cardioverting new onset atrial fibrillation in the Emergency Department, one cannot help but ask, to what end?

Sources Cited:

  1. Stiell, I.G., Clement, C.M., Perry, J.J. et al. Association of the Ottawa Aggressive Protocol with rapid discharge of emergency department patients with recent-onset atrial fibrillation or flutter. CJEM. 2010; 12: 181–191
  2. Coll-Vinent, B., Fuenzalida, C., Garcia, A. et al. Management of acute atrial fibrillation in the emergency department: a systematic review of recent studies. Eur J Emerg Med. 2013; 20: 151–159
  3. Decker, et al. A Prospective, Randomized Trial of an Emergency Department Observation Unit for Acute Onset Atrial Fibrillation.  Annals of Emergency Medicine, 2007.
  4. Wyse DG, Waldo AL, Dimarco JP, et al. A comparison of rate control and rhythm control in patients with atrial fibrillation. N Engl J Med. 2002;347(23):1825-33.
  5. Van gelder IC, Hagens VE, Bosker HA, et al. A comparison of rate control and rhythm control in patients with recurrent persistent atrial fibrillation. N Engl J Med. 2002;347(23):1834-40.
  6. Roy D, Talajic M, Nattel S, et al. Rhythm control versus rate control for atrial fibrillation and heart failure. N Engl J Med. 2008;358(25):2667-77.
  7. Scheuermeyer FX, Grafstein E, Stenstrom R, et al. Thirty-day and 1-year outcomes of emergency department patients with atrial fibrillation and no acute underlying medical cause. Ann Emerg Med. 2012;60(6):755-765.e2.
  8. Wang TJ, Massaro JM, Levy D, et al. A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA. 2003;290:1049-56.
  9. Stiell IG, Birnie D. Management of recent-onset atrial fibrillation in the emergency department. Ann Emerg Med. 2011; 57:31–2.
  10. Ballard, DW. et al. Emergency Department Management of Atrial Fibrillation and Flutter and Patient Quality of Life at One Month Postvisit. Annals of Emergency Medicine
  11. Thrall G, Lane D, Carroll D, Lip GY. Quality of life in patients with atrial fibrillation: a systematic review. Am J Med. 2006;119(5):448.e1-19.
  12. Nuotio I, Hartikainen JE, Grönberg T, Biancari F, Airaksinen KE. Time to cardioversion for acute atrial fibrillation and thromboembolic complications. JAMA. 2014;312(6):647-9.

The Problem of Thor Bridge


Disclosure: This post is unusually full of hearsay and conjecture. Like a secondary endpoint that flirts with statistical significance it should be viewed purely as hypothesis generating. For a more reasoned and experienced view of the following data please read Josh Farkas’s wonderful post on pulmcrit.org.

Damage control ventilation is not a novel concept. It functions under the premise that positive-pressure ventilation intrinsically possesses few curative properties and rather acts as a bridge until a more suitable state of ventilatory well-being can be achieved. As such, we should view its utilization as a necessary evil and endeavor not to correct the patient’s pathological perturbations but rather limit its iatrogenic harms. Since the publication of the ARDSNet protocol in 2000 we have known that striving to achieve physiological normality leads to greater parenchymal injury and downstream mortality (1). Later research demonstrated that even in patients without fulminant ARDS, a protective lung strategy is beneficial (2). Understandably we are reluctant to initiate mechanical ventilation unless absolutely necessary. Because of its ability to delay and even prevent more invasive forms of ventilatory support, non-invasive ventilation (NIV) has long been the darling of the emergent management of most respiratory complaints. It is a rare respiratory ailment that cannot be remedied with a tincture of positive-pressure ventilatory support delivered via a form-fitting face mask. Its widespread implementation is primarily born of NIV’s capacity to provide a bridge to a more definitive form of therapeutic support. Due in part to NIV’s ability to decrease the rate of intubation in patients presenting with COPD and CHF exacerbations, it is more readily being utilized in a subgroup of patients where a definitive destination is far less assured, a group of patients where the cause of their current dyspnea is not so readily correctable. A bridge, if you permit me a moment of sensationalism, to nowhere…

Although the efficacy of NIV in COPD exacerbations and acute cardiogenic pulmonary edema is well documented (3,4,5,6,7), the evidence for its use in managing other forms of hypoxic failure, such as pneumonia and ARDS, is far less robust. In fact there is some less than perfect evidence demonstrating that in these populations NIV fails to prevent intubation, and that in the subset of patients whose trial of non-invasive ventilatory support is unsuccessful, mortality is higher than in those patients who were initially intubated (8,9). And so the authors of the “Clinical Effect of the Association of Non-invasive Ventilation and High Flow Nasal Oxygen Therapy in Resuscitation of Patients with Acute Lung Injury (FLORALI)” trial hoped to examine whether NIV was superior to standard face-mask oxygenation therapy in patients with acute hypoxic respiratory failure (10). Frat et al examined two forms of non-invasive ventilatory strategies in patients admitted to the ICU with non-hypercapnic, non-cardiogenic hypoxic respiratory failure. The first was the traditional bi-level positive pressure ventilation, more commonly known as BPAP. The second was high-flow (50 L/min) humidified oxygen delivered via nasal cannula. Using a 1:1:1 ratio the authors randomized 313 patients to either BPAP, high-flow NC or standard face-mask support. The authors enrolled a relatively sick spectrum of patients. In order to be enrolled patients were required to have a respiratory rate of more than 25 breaths per minute, a PaO2/FiO2 of 300 mm Hg or less while on 10 L of supplementary O2, and a PaCO2 of no higher than 45 mm Hg, with no history of underlying chronic respiratory disease.
Additionally patients were excluded if they presented with an exacerbation of asthma or COPD, cardiogenic pulmonary edema, severe neutropenia, hemodynamic instability, use of vasopressors, a GCS of 12 or less, any contraindication to non-invasive ventilation, an urgent need for intubation or DNI orders. Given these stringent inclusion and exclusion criteria it is no surprise that out of the 2506 patients to present to one of the 23 participating ICUs, only 525 met the criteria for inclusion. Of these 313 underwent randomization and 310 were included in the final analysis (10).

The cause of hypoxia in the vast majority (75.5%) of these patients was pneumonia. The authors’ primary endpoint was the number of patients in each group who underwent endotracheal intubation within 28 days of enrollment. Although the authors found no statistical difference in the rate of intubation between the three groups, it is difficult not to infer a clinically important difference that was statistically overlooked due to the limited power generated by an n of 310. The 28-day intubation rate in the high-flow O2 group was 37% compared to 47% and 50% in the face-mask and BPAP groups respectively (an absolute difference of 10% and 13% respectively). When the more severely hypoxic patients were examined (those with a PaO2/FiO2 < 200), this absolute difference increased to 18% and 23% respectively. Additionally patients randomized to high-flow O2 had lower mortality rates compared to either the face-mask or BPAP groups. ICU mortality was 11%, 19% and 25% respectively and 90-day mortality was 12%, 23%, and 28% respectively. In the patients with more pronounced hypoxia these differences in mortality became even more pronounced. In patients with a PaO2/FiO2 < 200 the ICU mortality was 12%, 21.6% and 28.4%, while the 90-day mortality was 13.2%, 27.0% and 32.1%. Although the primary endpoint of this trial was negative (p = 0.18), there is a clear and consistent improvement in outcomes of patients randomized to high-flow O2 compared to the other two non-invasive strategies (10).
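The claim that an n of 310 leaves the trial underpowered can be checked with back-of-the-envelope arithmetic. The sketch below uses the textbook normal-approximation sample size formula for comparing two proportions. The two-sided alpha of 0.05 and the 80% power are conventional assumptions of mine, not figures taken from the trial’s own power calculation, and the function name is purely illustrative.

```python
from math import sqrt, ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p1 vs p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Intubation rates reported above: high-flow 37%, face mask 47%, BPAP 50%.
print("high-flow vs BPAP:     ", n_per_group(0.37, 0.50), "per arm")
print("high-flow vs face mask:", n_per_group(0.37, 0.47), "per arm")
print("available per arm (310 split three ways):", 310 // 3)
```

Under these assumptions, even the larger of the two observed differences would call for roughly twice the hundred or so patients each arm actually contained, which is consistent with the suspicion that the non-significant primary endpoint says more about the sample size than about the therapy.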

This trial is nowhere near perfect. The sample size is far too small to confidently rule out statistical whimsy’s causal responsibility for these findings. Additionally it is difficult to discern whether high-flow O2 was beneficial in this subgroup of patients or rather BPAP was deleterious. Most importantly it fails to address the question of primary concern for the Emergency Physician. Is non-invasive ventilation preferable to early endotracheal intubation? Frat et al compared high-flow O2 and BPAP therapy to standard face-mask oxygenation, which does not help us determine whether NIV is superior to early invasive ventilatory support. Furthermore, this trial examined the use of NIV in ICU patients over prolonged periods (median time to intubation was 17-27 hours). It does not tell us whether the use of BPAP is detrimental while patients are managed in the Emergency Department. Given these shortcomings how should we view these data?

Technically from a Frequentist’s viewpoint these statistically significant secondary endpoints are just hypothesis building and additional studies are required to validate these preliminary findings. But what if for a moment, we were to take a Bayesian perspective and examine this very same paper from an alternative vantage? How then would this data appear? Bayesian statistics takes an inductive perspective when examining data. Simply put it asks how does this data affect the prior scientific belief? Given the data presented in this trial, what is the most probable hypothesis that explains these results (12)? How do these results change the current scientific belief that was held prior to this study being conducted? Alternatively, when using Frequentist statistics we employ deductive methodology to address one question and utilize a predetermined statistical threshold to either accept or reject the null-hypothesis. All other questions examined in the paper are essentially exploratory and, due to the single minded nature of the p-value, are simply hypothesis generating (11).
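One concrete way to see the difference in perspective is the minimum Bayes factor described by Goodman (12), which converts a p-value into the strongest evidence against the null hypothesis that the data could possibly supply. The sketch below applies it to the trial’s primary endpoint (p = 0.18). The 50% prior probability for the null is an arbitrary assumption chosen only for illustration, and the function names are my own.

```python
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_two_sided):
    """Minimum Bayes factor exp(-z^2 / 2) for a two-sided p-value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return exp(-z * z / 2)

def posterior_prob_null(prior_prob_null, bayes_factor):
    """Update the probability of the null: posterior odds = prior odds x BF."""
    prior_odds = prior_prob_null / (1 - prior_prob_null)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

bf = min_bayes_factor(0.18)          # primary endpoint of the trial
post = posterior_prob_null(0.50, bf)  # assumed 50:50 prior, illustration only
print(f"minimum Bayes factor: {bf:.2f}")
print(f"lowest possible posterior probability of no effect: {post:.0%}")
```

Read this way, a "negative" p-value of 0.18 does not leave our belief where it started. Under the assumed prior it can shift the odds noticeably away from the null, which is the kind of graded update a fixed 0.05 threshold cannot express.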

Examining the data published by Frat et al, one would conclude the most probable hypothesis that would explain these events is:

In patients with non-hypercapnic, non-cardiogenic, hypoxic respiratory failure high-flow oxygen therapy decreases both mortality and the rate of intubation when compared to face-mask oxygenation. Additionally the use of BPAP does not decrease the rate of intubation and may in fact increase mortality in a subset of the sickest patients.

How does this affect the prior scientific belief in the efficacy of NIV in patients with hypoxic respiratory failure? Frat et al certainly support the prior evidence demonstrating that BPAP therapy is detrimental in this subset of patients with hypoxic respiratory failure. In fact the rate of endotracheal intubation (50%) is essentially identical to rates cited in prior cohorts (8). It also highlights that these negative effects may in fact be due to the therapy itself rather than the delay to definitive airway management as was previously hypothesized. Though there was a non-significant increase in the median time to intubation in the BPAP group compared to patients receiving face-mask therapy alone, the time to intubation in the BPAP and high-flow O2 groups was identical. And yet despite these minimal differences in time to intubation, the patients who underwent intubation in the BPAP group had an increased mortality when compared to those randomized to either face-mask or high-flow oxygen (10). Patients in the BPAP group, with the help of positive pressure, achieved average tidal volumes of 9 cc/kg. As the ARDSNet trial group demonstrated, when administering positive pressure ventilation, a lung protective strategy with tidal volumes of 6 cc/kg led to significant improvement in outcomes in patients with ARDS (1). Determann et al demonstrated that even in patients without ARDS, lung protective strategies led to improved outcomes when compared to more traditional physiological lung volumes (2). Until now we have cognitively absolved positive pressure delivered in a non-invasive form as a causative agent of such complications. The findings of Frat et al have, for the first time, cast a shadow of doubt on the innocence of NIV.

As for the spectacular results demonstrated by the high-flow O2 group, given the size of the population studied and a paucity of previous science with which to compare, it is hard to know how much credence to place in these results. What is clear is we should no longer view high-flow O2 as a substandard option, reserved only for patients who have failed to tolerate the more traditional forms of NIV. Rather high-flow O2 may provide a unique form of respiratory support that is not accounted for by our prior understanding of NIV (10).

We have known for some time that the use of positive pressure ventilation is the result of being forced to choose between the lesser of two evils. Although it provides a means of ventilatory support, it itself possesses little inherent therapeutic benefit. In fact, positive-pressure ventilation comes at the cost of hemodynamic compromise, iatrogenic lung injury, nosocomial infections, and sedation protocols that leave the patients confused and delirious. As such, a damage control strategy is typically employed to limit these downstream harms until the patient’s own ventilatory capacity has returned. Until now these strategies have been limited to invasive forms of ventilatory support. The Frat et al data suggest that, to some degree, non-invasive ventilatory support may be associated with similar iatrogenic harms. Although the current data are incomplete, they should remind us that if we intend to construct a bridge, we should have some understanding of where this intended conduit will lead and if this is a healthier destination than where we started.

Sources Cited:

1.         Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome: the Acute Respiratory Distress Syndrome Network. N Engl J Med 2000;342:1301‒8.

2.         Determann RM, Royakkers A, Wolthuis EK, et al. Ventilation with lower tidal volumes as compared with conventional tidal volumes for patients without acute lung injury: a preventive randomized controlled trial. Crit Care 2010;14(1):R1.

3.         Brochard L, Mancebo J, Wysocki M, et al. Noninvasive ventilation for acute exacerbations of chronic obstructive pulmonary disease. N Engl J Med 1995;333:817-22.

4.         Keenan SP, Sinuff T, Cook DJ, Hill NS. Which patients with acute exacerbation of chronic obstructive pulmonary disease benefit from noninvasive positive-pressure ventilation? A systematic review of the literature. Ann Intern Med 2003;138:861-70.

5.         Lightowler JV, Wedzicha JA, Elliott MW, Ram FS. Non-invasive positive pressure ventilation to treat respiratory failure resulting from exacerbations of chronic obstructive pulmonary disease: Cochrane systematic review and meta-analysis. BMJ 2003;326:185.

6.         Masip J, Roque M, Sánchez B, Fernández R, Subirana M, Expósito JA. Noninvasive ventilation in acute cardiogenic pulmonary edema: systematic review and meta-analysis. JAMA 2005;294:3124-30.

7.         Gray A, Goodacre S, Newby DE, et al. Noninvasive ventilation in acute cardiogenic pulmonary edema. N Engl J Med. 2008;359(2):142-51.

8.         Carrillo A, Gonzalez-diaz G, Ferrer M, et al. Non-invasive ventilation in community-acquired pneumonia and severe acute respiratory failure. Intensive Care Med. 2012;38(3):458-66.

9.         Delclaux C, L’Her E, Alberti C, et al. Treatment of acute hypoxemic nonhypercapnic respiratory insufficiency with continuous positive airway pressure delivered by a face mask: a randomized controlled trial. JAMA 2000;284:2352-60.

10.      Frat JP, Thille AW, Mercat A, et al. High-Flow Oxygen through Nasal Cannula in Acute Hypoxemic Respiratory Failure. N Engl J Med. 2015;

11.      Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.

12.      Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130(12):1005-13.

The Third Annotation of a Case of Identity


So often in modern medicine we mistake science for truth. In doing so we have become enamored with the p-value and view it as the major determinant of relevance in scientific inquiry. An almost arbitrarily selected value of 0.05 is independently responsible for defining what is considered beneficial, and what will be discarded as medical quackery. The p-value was first proposed by Ronald Fisher as a novel method of defining the probability that the results observed had occurred by chance alone. Or stated more formally, “the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed” (1). Originally intended as a tool for clinicians to assess whether the results from a trial were due to the treatment effect in question or merely random chance, its meaning has transformed into something far more divine. Despite its overwhelming acceptance, the p-value has many flaws. It is incapable of distinguishing clinical relevance; rather, it only denotes the probability of equivalence. In addition, its faculties are easily overwhelmed when multiple observations are performed. Finally, the mathematical assumptions it is built upon do not take into account prior evidence and provide no guidance for future endeavors (1).

Our romance with the p-value has not gone unnoticed by many pharmaceutical companies who have learned that they need not manufacture a drug that produces clinical benefits, but rather fabricate a trial that demonstrates statistical significance. Since the p-value does not take into account prior evidence, trialists are not required to justify results as they relate to the entirety of an evidentiary body but rather demonstrate singular mathematical significance in a statistical vacuum. As such we are asked to live in the evidentiary present with only selective access to past knowledge. Even when we are granted a privileged glimpse at results from prior trials, it is often comprised of incomplete and limited data intended to sway our opinion in a deliberate manner. This phenomenon, known as publication bias, allows pharmaceutical companies to preferentially publish trials with p-values that suit their interests while suppressing others that do not support their claims. By prospectively highlighting a would-be therapy’s more flattering features and bullying Frequentist statistics with sample sizes that would make even negligible differences significant, it is easy to snatch statistical victory from the grasp of clinical obscurity. This is likely what the makers of ticagrelor hoped for when they designed the PEGASUS Trial.

The PEGASUS Trial’s intention was to extend ticagrelor’s temporal indication beyond the 12-month window, testing the hypothesis that long-term therapy with ticagrelor in conjunction with low-dose aspirin reduces the risk of major adverse cardiovascular events among stable patients with a history of myocardial infarction. Bonaca et al randomized 21,162 patients who had experienced a myocardial infarction within the past 1-3 years to either 90 mg or 60 mg of ticagrelor twice daily or placebo (2). This is not the first time such a hypothesis has been investigated. Multiple trials have studied whether prolonged use of P2Y12 inhibitors possesses any value other than augmenting the pharmaceutical industry’s coffers. The largest of these investigations, the DAPT trial, was published in 2014 by Mauri et al in NEJM (3). This trial examined patients 12 months after a cardiovascular event and considered whether the continuation of either clopidogrel or prasugrel was beneficial. The authors randomized 9,961 patients to either a P2Y12 inhibitor or an appropriate placebo. The DAPT Trial demonstrated that prolonged use of dual-antiplatelet therapy decreased the rate of cardiovascular events (4.3% vs. 5.9%) and stent restenosis (0.4% vs 1.4%) in exchange for an increased rate of severe bleeding (2.5% vs. 1.6%). There was also a small increase in overall mortality (2% vs 1.5%) in patients randomized to prolonged P2Y12 inhibition (3). Multiple recent meta-analyses confirm these findings (4,5). These results should come as no surprise as the bulk of the literature examining P2Y12 inhibitors has highlighted their benefit primarily as a means of reducing type 4a peri-procedural infarctions of questionable clinical relevance. And so this was the landscape AstraZeneca faced when designing the PEGASUS Trial.
Every prior trial examining the question of prolonged dual-antiplatelet therapy has demonstrated that the small reductions in ischemic endpoints are easily overshadowed by the excessive increase in the rate of severe bleeding events. Fortunately in the modern era of Frequentist statistics none of these failures matter. Because the p-value does not account for prior evidence, the authors of the PEGASUS Trial did not have to account for this less-than-stellar history. Success by modern standards is simply the ability to contrive a primary endpoint that will demonstrate an appreciably low enough p-value to be considered significant.

Bonaca et al’s primary outcome was the composite rate of cardiovascular death, MI and stroke over the follow-up period (3 years). The absolute rates of primary events were 7.85%, 7.77%, and 9.02% in the 90 mg, 60 mg and placebo groups respectively. This small difference (approximately 1.2% in absolute terms) was found to be impressively statistically significant (p-values of 0.008 and 0.004 in the 90 mg vs placebo and 60 mg vs placebo comparisons respectively). Its clinical significance is far more questionable, and unlike its statistical counterpart cannot be bullied by the mass and size of the sample population. The effect size of this composite endpoint is diminutively small. The effect sizes of each respective component of this composite outcome are even smaller. The only measure that maintained its statistical significance consistently across all treatment comparisons was the reduction in myocardial infarction, which boasts a 0.85% and 0.72% absolute reduction in the 90 mg and 60 mg groups respectively.
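To illustrate how sample size alone can turn the same absolute difference from noise into significance, here is a minimal sketch using a simple two-proportion z-test. This is not the analysis Bonaca et al performed (they report time-to-event estimates), so the p-values will not match the published ones. The arm size of 7,054 assumes the 21,162 patients were split evenly across three arms, and the smaller arm of 700 is entirely hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p(p1, p2, n_per_arm):
    """Two-sided p-value for a difference in proportions, equal arms."""
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

ticagrelor_90, placebo = 0.0785, 0.0902   # event rates quoted above

for n in (700, 7054):   # hypothetical small trial vs assumed PEGASUS arm size
    p = two_proportion_p(ticagrelor_90, placebo, n)
    print(f"n per arm = {n:>5}: p = {p:.3f}")
```

The effect size is identical in both rows. Only the number of patients changes, and with it the verdict of the 0.05 threshold.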

Conversely the rates of bleeding in the patients randomized to receive the active agent were impressively high, especially given the previous studies examining ticagrelor demonstrated a more reasonable safety profile. The rate of TIMI major bleeding was 2.6%, 2.3% and 1.06% in the 90 mg, 60 mg and placebo groups respectively. Since both the rate of intracranial hemorrhage and fatal hemorrhage were statistically similar, most of this excess bleeding seems to be in the form of “clinically overt hemorrhage associated with a drop in hemoglobin of ≥5 g/dL or a ≥15% absolute decrease in hematocrit.” (2) These results are not too dissimilar from those of the DAPT Trial(3). Patients taking P2Y12 inhibitors will benefit from a slight decrease in the risk of non-fatal myocardial infarctions and stent restenosis while experiencing an increased risk of clinically significant bleeding.
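Setting the benefit and the harm side by side makes the trade-off easier to judge. The sketch below simply inverts the absolute differences quoted in this post to give a number needed to treat (primary composite endpoint) and a number needed to harm (TIMI major bleeding). The variable and function names are my own, and the calculation treats the reported rates as simple proportions over the follow-up period, which is an approximation.

```python
def number_needed(rate_treated, rate_control):
    """Reciprocal of the absolute risk difference."""
    return 1 / abs(rate_treated - rate_control)

# Rates as quoted in the text (composite endpoint, TIMI major bleeding).
placebo = {"composite": 0.0902, "timi_major": 0.0106}
doses = {
    "90 mg": {"composite": 0.0785, "timi_major": 0.026},
    "60 mg": {"composite": 0.0777, "timi_major": 0.023},
}

for dose, rates in doses.items():
    nnt = number_needed(rates["composite"], placebo["composite"])
    nnh = number_needed(rates["timi_major"], placebo["timi_major"])
    print(f"{dose}: NNT ~ {nnt:.0f}, NNH (TIMI major bleed) ~ {nnh:.0f}")
```

On these figures the number of patients treated to prevent one composite event is of the same order as the number treated to cause one major bleed, and at the higher dose the bleed arrives first.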

Despite the positive spin of this trial, it is far from a success. The investigators enrolled the more infirm spectrum of patients with CAD, so as to include a cohort more likely to benefit from additional platelet inhibition. They also excluded the patients most at risk for hemorrhagic complications so as to limit the appearance of adversity. Investigators excluded patients with a history of ischemic stroke or intracranial hemorrhage, a central nervous system tumor, an intracranial vascular abnormality, a history of gastrointestinal bleeding within the previous 6 months or major surgery within the previous 30 days (2). This of course in itself would not be a concern, were it not for the likely application of prolonged dual-antiplatelet therapy to a far broader patient population.

Our current version of evidence-based medicine has left us susceptible to mistaking mathematical manipulations for scientific truth. It is short sighted and allows for the linguistic error of misinterpreting statistical significance as clinical relevance. The PEGASUS Trial boasts p-values far below what is traditionally considered significant, and yet p-values below 0.05 hold little intrinsic value to our patients’ well being. Yes, from a Frequentist’s perspective we are capable of concluding with relative certainty that the use of ticagrelor decreases the composite endpoint of cardiovascular death, MI, or stroke. The clinical relevance of this finding is far from certain, as its weight is powered exclusively by a decrease in myocardial infarctions. It is unlikely this small benefit is worth the impressive increase in serious hemorrhagic events. From the very earliest trials examining P2Y12 inhibitors, their benefits have been primarily due to the manipulation of statistical constructs rather than any inherent efficacy (6,7). The PEGASUS Trial is no different. These trials are not landmark demonstrations of P2Y12 inhibitors’ benefits, but rather statistical manipulations of clinically insignificant differences stacked one on top of the other to give the appearance of height when none is present. It is the statistical equivalent of an eyespot meant to keep the scorn of the medical skeptics at bay. Know that we are not scared or confused by your statistical mimicry. We see these trials for what they are, pharmaceutical advertisements poorly hidden behind the guise of scientific inquiry.

Sources Cited:

  1. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.
  2. Bonaca MP, Bhatt DL, Cohen M, et al. Long-Term Use of Ticagrelor in Patients with Prior Myocardial Infarction. N Engl J Med. 2015
  3. Mauri L, Kereiakes DJ, Yeh RW, et al. Twelve or 30 months of dual antiplatelet therapy after drug-eluting stents. N Engl J Med. 2014;371(23):2155-66.
  4. Palmerini T, Sangiorgi D, Valgimigli M, et al. Short- versus long-term dual antiplatelet therapy after drug-eluting stent implantation: an individual patient data pairwise and network meta-analysis. J Am Coll Cardiol. 2015;65(11):1092-102.
  5. Giustino G, Baber U, Sartori S, et al. Duration of Dual Antiplatelet Therapy After Drug-Eluting Stent Implantation: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. J Am Coll Cardiol. 2015;65(13):1298-310.
  6. Yusuf S, Zhao F, Mehta SR, et al. Effects of clopidogrel in addition to aspirin in patients with acute coronary syndromes without ST-segment elevation. N Engl J Med. 2001;345(7):494-502.
  7. A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee. Lancet. 1996;348(9038):1329-39.

 

A Truncated Summation of the Adventure of the Cardboard Box

 


One gets the sense when reading the literature on endovascular therapy for acute ischemic stroke that they are on a small seafaring vessel attempting to map the shoreline through a dense fog. There are moments when the fog lifts and you catch a glimpse of the topographic details of the shore, and then the cloud again rolls in obscuring any further ascertainment. Similarly the recent publications of endovascular therapy for acute ischemic stroke have demonstrated there is a definitive benefit to mechanical reperfusion therapy, and yet each publication in itself is so incomplete, it is difficult to perceive anything more than this general appearance of benefit. The finer details are obscured by the premature truncation of trials, too early to definitively characterize the benefits and risks of endovascular therapy.

MR CLEAN, published earlier this year in the NEJM, and discussed ad nauseam in previous posts, marked the first of what is now a litany of trials demonstrating benefit for endovascular therapy in acute ischemic stroke (1). Its release resulted in the subsequent premature stoppage of a number of key trials examining endovascular therapy. Although all these trials boast impressive results, each stopped their enrollment prematurely, not due to a preplanned interim analysis, but rather due to MR CLEAN’s positive results. ESCAPE and EXTEND-IA were the first to halt enrollment and hastily publish their results (2,3). More recently the NEJM has reported on the findings from the next two trials prematurely stopped due to MR CLEAN’s success.

The first of these studies is the SWIFT-PRIME trial published by Saver et al (4). This trial’s initial results were presented earlier this year alongside EXTEND-IA and ESCAPE at the 2015 International Stroke Conference. Like its counterparts, this trial examined patients presenting with large ischemic infarcts and radiographically identified occlusions in the terminal internal carotid (ICA) or first branch (M1) of the middle cerebral artery (MCA). Additionally patients had to demonstrate a favorable core-to-ischemic penumbra ratio on perfusion imaging. Patients were enrolled if they were able to undergo endovascular interventions within 6-hours of symptom onset.

Like ESCAPE and EXTEND-IA, the results of SWIFT-PRIME are impressive. The authors boast a 25% absolute difference in the number of patients with a mRS of 0-2 at 90 days. Though notable, the definitive magnitude of effect is hardly concrete. The authors cite an NNT of 4 to have one more patient alive and independent at 90 days, and an NNT of 2.6 to have one patient less disabled. These calculations are derived from their dichotomous and ordinal analyses respectively. Although the authors cite impressive p-values (<0.001), the confidence interval surrounding this 25% point estimate is far broader (11-38%). This means the NNT is somewhere between 2.6 and 9 patients. EXTEND-IA and ESCAPE have similarly wide confidence intervals surrounding their point estimates (4). EXTEND-IA’s confidence interval is 8% to 50% surrounding a point estimate of 31% (2). Likewise ESCAPE has a confidence interval of 13% to 34% surrounding its 23.7% point estimate (3). All three of these trials were stopped early secondary to MR CLEAN’s results. And though both EXTEND-IA and ESCAPE came close to reaching their pre-defined sample size, SWIFT-PRIME was stopped before its first interim analysis (n<200) (4).
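The conversion from a confidence interval on the absolute difference to a range of NNTs is just a matter of taking reciprocals, with the upper bound of the difference giving the most optimistic NNT and the lower bound the most pessimistic. A minimal sketch using only the point estimates and intervals quoted above (the dictionary layout and function name are my own):

```python
# (point estimate, lower 95% bound, upper 95% bound) of the absolute
# difference in patients with mRS 0-2 at 90 days, as quoted in the text.
trials = {
    "SWIFT-PRIME": (0.25, 0.11, 0.38),
    "EXTEND-IA":   (0.31, 0.08, 0.50),
    "ESCAPE":      (0.237, 0.13, 0.34),
}

def nnt_range(point, lower, upper):
    """Return (NNT at point estimate, best-case NNT, worst-case NNT)."""
    return 1 / point, 1 / upper, 1 / lower

for name, (point, lower, upper) in trials.items():
    nnt, best, worst = nnt_range(point, lower, upper)
    print(f"{name:12s} NNT {nnt:4.1f}  (plausible range {best:.1f} to {worst:.1f})")
```

Seen as NNTs, the spread is easier to appreciate: each trial is compatible with a treatment that helps one patient in every two or three, and also with one that helps fewer than one in seven.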

Like EXTEND-IA, ESCAPE and SWIFT-PRIME, the second trial just published in NEJM, the REVASCAT trial by Jovin et al, was stopped prematurely secondary to the publication of the MR CLEAN data. In fact, even though it failed to reach the prospectively determined efficacy threshold for stopping the trial at the first interim analysis, the data and safety board felt that given the MR CLEAN data, there was a loss of equipoise and further randomization would be unethical (5). Despite its apparent success the results of the REVASCAT trial are far less impressive than those of EXTEND-IA, ESCAPE or SWIFT-PRIME. The REVASCAT trial planned to enroll 690 patients presenting to the Emergency Department in 4 centers across Catalonia with symptoms consistent with a large vessel stroke that could be treated with endovascular therapy within 8 hours of symptom onset. Unlike EXTEND-IA, ESCAPE or SWIFT-PRIME, the REVASCAT Trial did not use perfusion imaging to select patients with favorable areas of salvageable tissue. Rather, it employed CTA to identify occlusion in the ICA or M1 branch of the MCA, and utilized the less accurate ASPECT score, derived from the initial non-contrast CT, to assess potential for viable ischemic tissue (5).

REVASCAT enrolled 206 patients before its premature termination. And like the three trials before it, it demonstrated a statistically significant improvement in mRS at 90 days in the patients who underwent endovascular therapy. The REVASCAT trial cites an absolute increase of 15.5% in the number of patients with a mRS of 0-2. This is surrounded by a confidence interval of 2.4% to 28.5%. Furthermore, unlike the previous three trials that either boast an outright benefit in mortality or demonstrate trends in favor of endovascular therapy, REVASCAT demonstrated an impressive 4.8% absolute increase in the rate of death within the first 7 days after randomization (5).

The results of REVASCAT are far from positive. If they were not included in the optimistic fervor that currently surrounds endovascular therapy, it might even be considered a negative trial. Why were the results of REVASCAT far less impressive than those of EXTEND-IA, ESCAPE and SWIFT-PRIME? Was it just random chance, the true effect size of endovascular therapy falling somewhere between the two extremes of the 13.5% difference observed in MR CLEAN and the 31% seen in EXTEND-IA? Or rather was it that the patient population selected in EXTEND-IA, ESCAPE and SWIFT-PRIME led to their success? EXTEND-IA, ESCAPE and SWIFT-PRIME all utilized some form of advanced imaging to determine the size of viable ischemic tissue (2,3,4). MR CLEAN and REVASCAT used only the CTA to identify a reachable lesion and the non-contrast CT to determine tissue viability (1,5). If any one of these trials were followed to completion the results likely would provide us with a better understanding of who will benefit from endovascular therapy and the exact magnitude of this benefit.

This is a problem of certainty. Our faith in endovascular interventions was so unyielding that at the first sign of success we claimed victory and discontinued any further scientific inquiries. The bloated results demonstrated in EXTEND-IA, ESCAPE, and SWIFT-PRIME are the result of this premature resolution. We know that trials stopped early for benefit are likely to over-estimate the effect size of the treatment in question. In fact the smaller the sample size at the time of closure, the greater the amplification (6). In 1989, Pocock and Hughes demonstrated this to be a mathematical inevitability (7), a finding later validated by Bassler et al in a meta-analysis examining 91 trials stopped prematurely for benefit (8). Bassler et al revealed that the degree of embellishment was directly related to the size of the sample population at cessation and independent of the quality of the trial or the presence of a predetermined methodology for early stoppage.
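The inflation described above can be seen in a toy simulation. The sketch below is not a model of any of the stroke trials. Every number in it (a 30% control response, a 45% treatment response, interim looks every 50 patients per arm, and a stopping rule of z ≥ 2.5) is a hypothetical value chosen purely for illustration. It runs many imaginary trials of a treatment with a known true benefit and records what each trial reports at the moment it stops for benefit.

```python
import random
from math import sqrt

# Hypothetical design parameters, chosen only for illustration.
P_CONTROL = 0.30                        # good-outcome rate with standard care
P_TREAT = 0.45                          # good-outcome rate with the intervention
TRUE_DIFF = P_TREAT - P_CONTROL         # the benefit every simulated trial is estimating
LOOKS = [50, 100, 150, 200, 250, 300]   # patients per arm at each analysis
Z_STOP = 2.5                            # stop for benefit once z reaches this value


def simulate_trial(rng):
    """Run one trial; return (patients per arm at stop, observed difference, stopped early)."""
    good_c = good_t = n = 0
    diff = 0.0
    for look in LOOKS:
        while n < look:  # enroll one patient per arm until the next analysis
            good_c += rng.random() < P_CONTROL
            good_t += rng.random() < P_TREAT
            n += 1
        diff = (good_t - good_c) / n
        pooled = (good_t + good_c) / (2 * n)
        se = sqrt(2 * pooled * (1 - pooled) / n)
        # Only interim looks can trigger early stopping; the last look is the planned end.
        if look != LOOKS[-1] and se > 0 and diff / se >= Z_STOP:
            return n, diff, True
    return n, diff, False


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2015)  # fixed seed so the run is repeatable
trials = [simulate_trial(rng) for _ in range(2000)]

early = [t for t in trials if t[2]]
mean_by_size = {
    look: mean([d for n, d, _ in early if n == look])
    for look in LOOKS[:-1]
    if any(n == look for n, _, _ in early)
}

print(f"true difference: {TRUE_DIFF:.3f}")
print(f"mean estimate in trials stopped early: {mean([d for _, d, _ in early]):.3f}")
for look, m in mean_by_size.items():
    print(f"  stopped at {look} per arm: mean estimate {m:.3f}")
```

Under these assumptions the trials that stop at the earliest look report the largest average benefit, well above the true value built into the simulation, which is the pattern referenced in sources 6 through 8.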

Although the exact patient population that stands to benefit from endovascular therapy is unclear, it is certainly a small fraction of the overall patients who present to the Emergency Department with acute ischemic stroke. All patients enrolled in the REVASCAT trial were also included in a national registry known as SONIA. SONIA catalogued 2576 patients (only 15.6% of all stroke patients seen) treated with some form of reperfusion therapy over the time period REVASCAT enrolled patients (5). The vast majority of these patients, 2036 (79%), received only tPA. 540 (21%) patients underwent endovascular therapy. Of these only 111 (24%) were eligible for enrollment into the REVASCAT trial. Only 4.3% of the patients in the SONIA registry, and only 0.3% of all stroke patients during the 2-year period were eligible for inclusion in the REVASCAT trial (5). This accounts for a small minority of the stroke patients presenting to the Emergency Department with symptoms consistent with acute ischemic stroke. Of note the criteria used in the REVASCAT trial to determine eligibility are more inclusive than those used in EXTEND-IA, ESCAPE, and SWIFT-PRIME. If you believe those trials were successful because of their inclusion criteria, then their eligible population would account for an even smaller portion of stroke patients presenting to the Emergency Department. In the SWIFT-PRIME trial it took 2 years and 39 centers to recruit 196 patients (4). That comes out to 0.2 patients per center per month. EXTEND-IA and ESCAPE recruited only 0.3 and 1.44 patients per center per month respectively (2,3).
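For readers who want to check the arithmetic, the recruitment rate and the registry proportions can be recomputed from the counts given above. This is a minimal Python sketch with function names of my own choosing. It only recomputes figures for which both the numerator and the denominator are stated in the text, and it assumes the 2-year SWIFT-PRIME enrollment period corresponds to 24 months at all 39 centers.

```python
def recruitment_rate(patients, centers, months):
    """Average patients enrolled per center per month."""
    return patients / (centers * months)


def share(part, whole):
    """Proportion expressed as a percentage."""
    return 100.0 * part / whole


# SWIFT-PRIME: 196 patients, 39 centers, roughly 24 months (assumed from "2 years").
swift_prime_rate = recruitment_rate(196, 39, 24)
print(f"SWIFT-PRIME: {swift_prime_rate:.2f} patients per center per month")

# SONIA registry counts quoted in the text.
registry_total = 2576
print(f"tPA only: {share(2036, registry_total):.0f}% of the registry")
print(f"endovascular therapy: {share(540, registry_total):.0f}% of the registry")
print(f"eligible for REVASCAT: {share(111, registry_total):.1f}% of the registry")
```

At this pace a single participating center would enroll a SWIFT-PRIME patient about once every five months.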

Even the most skeptical will find difficulty denying there is a definite treatment effect observed in the recent trials examining endovascular therapy in acute ischemic stroke. The magnitude of this effect has yet to be defined. Its borders are obscured by the murkiness of small sample sizes, extreme selection bias and prematurely stopped trials. There are also clear harms associated with this invasive procedure. Both the REVASCAT trial and the earlier trials examining endovascular therapy (IMS-3, SYNTHESIS and MR RESCUE) demonstrated that when performed on the wrong patient population, not only will endovascular therapy fail to provide benefit, it may in fact be harmful (5,9,10,11). This is simply not a yes or no question. The resources required to build an infrastructure capable of supporting endovascular therapy on a national level are daunting. Though we have reached a certain degree of clarity that endovascular therapy for acute ischemic stroke provides benefit, how well and in whom remains murky. The overeager truncation of important trials has left us adrift in a sea of fog. Unsure if the shoreline we paddle towards is a warm welcoming beachfront or a rocky coast prepared to demolish our vessel upon arrival.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372(1):11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Saver JL, Goyal M, Bonafe A, et al. Stent-Retriever Thrombectomy after Intravenous t-PA vs. t-PA Alone in Stroke. N Engl J Med. 2015.
  5. Jovin TG, Chamorro A, Cobo E, et al. Thrombectomy within 8 Hours after Symptom Onset in Ischemic Stroke. N Engl J Med. 2015.
  6. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012;344:e3863.
  7. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials 1989;10(suppl 4):209-21S.
  8. Bassler D, Briel M, Montori VM, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010;303(12):1180-7.
  9. Broderick JP, Palesch YY, Demchuk AM, et al. Endovascular therapy after intravenous t-PA versus t-PA alone for stroke. N Engl J Med. 2013;368(10):893-903.
  10. Ciccone A, Valvassori L, Nichelatti M, et al. Endovascular treatment for acute ischemic stroke. N Engl J Med. 2013;368(10):904-13.
  11. Kidwell CS, Jahan R, Gornbein J, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med. 2013;368(10):914-23.

The Case of the Anatomic Heart Part 2

The PROMISE Trial, like any aptly named study, chose an acronym meant to inspire. In this case, the hope for a better tomorrow. And though the authors of the Prospective Multicenter Imaging Study for Evaluation of Chest Pain trial were not clear on the specific details their promise entailed, I fear the results of this trial will leave us feeling betrayed and forsworn.

The authors of the PROMISE Trial presented the findings from their massive undertaking at the 2015 ACC scientific assembly. The results were published simultaneously in the NEJM. Douglas et al randomized 10,003 patients to either standard non-invasive functional testing, as determined by the treating physician, or CTCA. Patients were recruited from outpatient facilities across North America when presenting with new onset chest pain in which the treating physician was suspicious of cardiac origin and had already ruled out ACS. Patients were excluded if they presented with unstable vitals, EKG changes, or positive biomarkers. Given the pragmatic nature of the trial, all other treatment decisions were left to the prerogative of the treating physician (1).

The authors found no difference in their primary outcome, the composite endpoint of death, MI, hospitalization for UA, or major procedural complications over the follow-up period (at least 12 months with an average follow-up of 24 months), between the CTCA and traditional testing groups (3.3% vs 3.0%). In fact other than a small decrease in the number of negative invasive catheterizations seen in the CTCA arm (3.4% vs 4.3%), the authors were unable to find any statistically significant differences in the multitude of secondary endpoints measured. As far as safety outcomes, the authors did cite some relevant concerns. Most notably those randomized to receive CTCA as their screening test underwent significantly more downstream testing and interventions. 12.2% of those randomized to the CTCA arm compared to 8.1% in the standard testing arm underwent invasive catheterization, and 6.2% compared to 3.2% underwent subsequent revascularization, including a 1.5% vs 0.76% rate of coronary artery bypass grafting (CABG) (1).

Now some might argue that the PROMISE trial was not performed on Emergency Department patients and thus its application to our low risk chest pain population is questionable. In some senses this may be true. Patients evaluated in the Emergency Department for chest pain are inherently at higher risk than their counterparts seen in primary care offices. Conversely the PROMISE Trial evaluated a cohort of chest pain in whom the treating physician suspected the symptoms were likely of cardiac origin. Before being enrolled in the trial all of these patients were ruled out for ACS with negative EKGs and biomarkers. Additionally the treating physician felt further provocative testing was necessary. This is not unlike the cohort of patients we include in our low-risk chest pain population in the Emergency Department. Furthermore we have four trials with over 3,000 Emergency Department patients evaluating the efficacy of CTCA, which demonstrate almost identical results to the PROMISE Trial (2,3,4,5). Each of these studies determined that CTCA adds no additional prognostic value to our standard risk stratification strategies and likely leads to increased invasive procedures. In a meta-analysis of these four trials published in JACC in 2013, Hulten et al found a significant increase in the number of invasive angiographies, PCIs and revascularizations performed in the patients randomized to the CTCA arm (6). PROMISE demonstrated the exact same tendencies of CTCA in a much larger cohort (1).

Why did PROMISE fail to find a difference? What are we to infer about the acuity and severity of a disease state that does not benefit from a timely and accurate diagnosis? We know CTCA is far more accurate than our more traditional forms of provocative testing. And yet, why in this massive trial did it fail to find any difference in clinically relevant outcomes? Might it be that a time-sensitive anatomical definition of CAD is unnecessary?

The first reason why PROMISE failed to show a difference is that the population enrolled in the trial was at such low risk for the disease state in question that they were likely to do well whatever diagnostic testing strategy they underwent. Only 3.1% of the group had any event during the follow-up period. Only 1.5% died and only 0.7% had a MI (1). With such a low event rate, even if CTCA is an effective means of identifying and preventing MI and cardiac death, a statistically significant benefit is unlikely to be found even with a sample size as large as 10,000 patients.
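A rough power calculation makes the point concrete. The Python sketch below uses the standard normal approximation for comparing two proportions, with roughly 5,000 patients per arm and the 3.1% event rate quoted above as the comparator. The 10% and 20% relative reductions are hypothetical effect sizes of my own choosing, not figures from the trial, and the trial's own power calculation may well have used different assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation and ignores the negligible chance of
    reaching significance in the wrong direction.
    """
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treat * (1 - p_treat) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_control - p_treat) / se
    return NormalDist().cdf(z_effect - z_alpha)


EVENT_RATE = 0.031   # overall event rate quoted in the text
N_PER_ARM = 5000     # roughly half of the 10,003 randomized patients

for relative_reduction in (0.10, 0.20):  # hypothetical effect sizes
    p_treat = EVENT_RATE * (1 - relative_reduction)
    power = power_two_proportions(EVENT_RATE, p_treat, N_PER_ARM)
    print(f"{relative_reduction:.0%} relative reduction: power about {power:.0%}")
```

Under these assumptions even a 20% relative reduction in events would be detected less than half the time, and a 10% reduction would usually be missed.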

The second reason why the PROMISE Trial is likely to have failed, is simply because we are functioning under the misconception that when we diagnose these patients with obstructive CAD, an invasive strategy is superior to optimal medical management. Though we know that reperfusion therapy has objective benefits in patients actively experiencing a myocardial infarction, these same benefits have failed to translate to the more stable lesions of CAD. Multiple large RCTs have failed to find a benefit of PCI over optimal medical management in patients with stable obstructive CAD (7,8). Stergiopoulos et al have now published a number of meta-analyses examining these trials, which have also failed to uncover benefits that may have been missed in the weaker powered individual trials (9,10).

The PROMISE trial was not the only trial presented at the ACC Scientific Assembly examining the pragmatic use of CTCA for the diagnostic work up of chest pain. The SCOT-HEART trial was yet another massive undertaking, the results published online in The Lancet in concert with the oral presentation. In this trial, investigators randomized 4,146 patients referred to chest pain clinics across Scotland to either a standard work up or a standard work up plus the addition of CTCA. Although by sheer quantity it does not possess the statistical power of the PROMISE trial, it does present us with some insights which the PROMISE trial proved incapable of providing (11).

The unique design of the SCOT-HEART trial ensured all patients received a full standardized evaluation, often including (85% of the time) an exercise stress test. It was only after the treating physician assessed the patient, reported his or her baseline estimate of the likelihood of CAD and determined what further testing and treatment strategies he or she would recommend, that the patients were randomized to either receive CTCA or standard care. Like PROMISE, this was a pragmatic trial design and other than the use of CT angiography clinicians were given free rein to treat each patient as they deemed appropriate. At 6 weeks the physicians were then asked again to assess the likelihood of CAD (11).

What the authors revealed was that the use of CTCA significantly improved the clinicians’ confidence in their diagnosis of both CAD and angina of cardiac origin (the trial’s primary endpoint). They also found a statistically significant increase in the number of patients diagnosed with CAD in the group randomized to receive CTCA (23% vs 11%). Additionally patients in the CTCA arm were more frequently shifted towards more aggressive and invasive modes of management when compared to the standard care arm. Specifically more patients in the CTCA group saw an increase in the number of medical therapies prescribed and invasive catheterizations performed (11).

In summary, patients randomized to CTCA were more often given the diagnosis of CAD and were more likely to be treated with medical therapies and invasive procedures than the patients in the standard care group. But did all of these investigations and interventions lead to better outcomes? Simply put, no. The rate of cardiovascular death and myocardial infarction during the follow up period (1.7 years) was 1.3% vs 2.0%, a 0.7% difference that was not statistically significant. The overall mortality was 0.8% vs 1.0%, respectively. Even the decrease in the quality and severity of the patients’ symptoms (the reason the patients presented to the clinic in the first place) at 6 weeks was identical (11).

The PROMISE trial demonstrated the use of CTCA promotes increased downstream testing and intervention. The SCOT-HEART trial validated these findings. The SCOT-HEART trial also demonstrated CTCA provides a significant degree of diagnostic certainty to the treating physician, leading to more aggressive medical management. And yet knowing a lot and doing a lot failed to equate to a reduction in mortality or myocardial infarctions. These are coronary mirages, promising the weary clinicians water when in reality they are just leading them deeper into the barren desert.

Despite its size and decisively negative results, perhaps the most important study arm in the PROMISE Trial did not exist, an arm in which patients were randomized to not receive any form of provocative testing, but rather treated medically as per the judgment of their physician. Both the PROMISE and SCOT-HEART trials demonstrated that a cohort of outpatient chest pain patients are at such low risk for adverse events, they are likely to do equally as well with whatever provocative test is used, or more importantly without any at all. Surely it is time to examine such a hypothesis, to add a third arm to the PROMISE cohort. The ISCHEMIA Trial is currently enrolling patients to compare medical management vs invasive strategies in the setting of a positive provocative test. Unfortunately this trial’s applicability is limited by the fact that authors insist all patients undergo a CTCA before enrollment to rule out the presence of left main arterial disease. And though this may be a step in the right direction, we still can’t escape our need for anatomical certainty in the face of diminishing clinical utility. Surely it is time we define the value of both provocative and anatomical testing in the low risk chest pain population, truly a Promise worth keeping.

Sources Cited:

  1. Douglas PS, Hoffmann U, Patel MR, et al. Outcomes of anatomical versus functional testing for coronary artery disease. N Engl J Med. 2015;372(14):1291-300.
  2. Goldstein JA, Chinnaiyan KM, Abidov A, et al. The CT-STAT (Coronary Computed Tomographic Angiography for Systematic Triage of Acute Chest Pain Patients to Treatment) trial. J Am Coll Cardiol 2011;58:1414–22.
  3. Hoffmann U, Truong QA, Schoenfeld DA, et al. Coronary CT angiography versus standard evaluation in acute chest pain. N Engl J Med 2012;367:299–308.
  4. Litt HI, Gatsonis C, Snyder B, et al. CT Angiography for safe discharge of patients with possible acute coronary syndromes. N Engl J Med 2012;366:1393–403.
  5. Goldstein JA, Gallagher MJ, O’Neill WW, Ross MA, O’Neil BJ, Raff GL. A randomized controlled trial of multi-slice coronary computed tomography for evaluation of acute chest pain. J Am Coll Cardiol 2007;49:863–71.
  6. Hulten E, Pickett C, Bittencourt MS, et al. Outcomes after coronary computed tomography angiography in the emergency department: a systematic review and meta-analysis of randomized, controlled trials. J Am Coll Cardiol. 2013;61(8):880-92.
  7. Boden WE, O’rourke RA, Teo KK, et al. Optimal medical therapy with or without PCI for stable coronary disease. N Engl J Med. 2007;356(15):1503-16.
  8. Mehta SR, Cannon CP, Fox KA, et al. Routine vs selective invasive strategies in patients with acute coronary syndromes: a collaborative meta-analysis of randomized trials. JAMA. 2005;293(23):2908-17.
  9. Stergiopoulos K, Brown DL. Initial Coronary Stent Implantation With Medical Therapy vs Medical Therapy Alone for Stable Coronary Artery Disease: Meta-analysis of Randomized Controlled Trials. Archives of Internal Medicine 2012 Feb;172(4):312
  10. Stergiopoulos K, Boden WE, Hartigan P, et al. Percutaneous Coronary Intervention Outcomes in Patients With Stable Obstructive Coronary Artery Disease and Myocardial Ischemia: A Collaborative Meta-analysis of Contemporary Randomized Clinical Trials. JAMA Intern Med. 2014;174(2):232-240.
  11. The SCOT-HEART investigators. CT coronary angiography in patients with suspected angina due to coronary heart disease (SCOT-HEART): an open-label, parallel group multicenter trial. Lancet. 2015; (published online March 15.)

The Case of Dubious Squire

I often get the sense that the makers of many biomarkers envision us as helpless damsels in distress drowning in an icy pond or trapped in a monumental tower with no obvious means of descent. I imagine they think in our desperate grasps for aid, we will cling to whatever assistance they may offer, independent of its buoyancy. But in these moments of fear and uncertainty we must remember for a test to be useful to a clinician not only does it have to be accurate and reliable, it must also add diagnostic value above the clinician’s own inherent aptitude. B-type natriuretic peptide (BNP) and its natriuretic derivatives are a classic example of such a test heralded for its isolated diagnostic properties without asking the simple question, how does it help the physician? Through statistical misdirection, the distributors of natriuretic peptides have published research hailing their diagnostic prowess when examined in isolation. Such publications have led to these assays becoming recommended components of the workup for any patient suspected of having acute decompensated heart failure (1,2,3). A recent meta-analysis performed by the helpful folks responsible for the NICE guidelines, sought to examine the validity of these recommendations and determine the true diagnostic accuracy of natriuretic peptides (4). And yet, I fear these authors in their effort to provide an accurate representation of the assay’s diagnostic accuracy, have forgotten to take into account the most important factor when evaluating any diagnostic test, the clinician.

In this meta-analysis, Roberts et al examined the clinical accuracy of BNP, NTproBNP, and MRproANP for the diagnosis of acute decompensated heart failure in the Emergency Department. Specifically, the goal was to evaluate the low risk criteria proposed by the 2012 European Society of Cardiology guidelines for heart failure: a BNP ≤100 ng/L, a NTproBNP ≤300 ng/L, and a MRproANP ≤120 pmol/L. They also examined the utility of these assays at intermediate and high levels (100-500 ng/L, and >500 ng/L for BNP; 300-1800 ng/L, and >1800 ng/L for NTproBNP; and >120 pmol/L for MRproANP) (4).

The authors identified 42 articles, examining 37 different cohorts that met criteria for inclusion into their meta-analysis. Combining these studies, the authors calculated pooled test characteristics for each of the natriuretic assays in question. They found that at the low thresholds proposed by the European Society of Cardiology, the assays were equally mediocre. All three demonstrated high sensitivities, 95%, 99%, and 95% respectively. Of course by selecting such a low cutoff, authors ensured that a large proportion of the patients without acute heart failure would also test positive. The specificities of each of these assays were a dismal 63%, 43%, and 56% respectively. As with any diagnostic tool, by raising the threshold of what you consider positive, the authors were able to improve the assay’s specificity. When the intermediate thresholds were utilized, the specificities increased to 86% and 76% for BNP and NTproBNP respectively (authors did not have enough data on MRproANP to adequately calculate accuracy in this intermediate range). Of course this amplified specificity came at the price of a loss of sensitivity, 85% and 90% respectively. When using the high threshold, authors were able to augment the tests’ specificity even further, but of course at this high level a large portion of patients with acute decompensated heart failure are missed. At a threshold of ≥500 ng/L, diagnostic meta-analysis was not performed due to inadequate data. BNP demonstrated sensitivities from the individual studies ranging from 35% to 83%, with a paired specificity from 78% to 100%. Likewise at a threshold of ≥1800 ng/L, NTproBNP reported sensitivities ranging from 67% to 87% with paired specificities ranging from 72% to 95%. Finally at the threshold of >120 pmol/L, MRproANP demonstrated sensitivities ranging from 84% to 98% and the paired specificities from 40% to 84% (4).
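One way to read these paired sensitivities and specificities is to convert them into likelihood ratios. The Python sketch below does so for the pooled values at the low thresholds quoted above. The function name and dictionary labels are my own, and the likelihood ratios are my arithmetic on the quoted percentages rather than figures reported by Roberts et al.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive likelihood ratio, negative likelihood ratio)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled (sensitivity, specificity) at the low thresholds, as quoted in the text.
pooled_low_threshold = {
    "BNP at 100 ng/L": (0.95, 0.63),
    "NTproBNP at 300 ng/L": (0.99, 0.43),
    "MRproANP at 120 pmol/L": (0.95, 0.56),
}

for assay, (sens, spec) in pooled_low_threshold.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{assay}: LR+ {lr_pos:.2f}, LR- {lr_neg:.3f}")
```

Seen this way, a value below the threshold argues strongly against heart failure, while a value above it, with a positive likelihood ratio in the neighborhood of 2, moves the probability very little.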

The authors conclude, “The use of NTproBNP and B type natriuretic peptide at the rule-out threshold recommended by the recent European Society of Cardiology guidelines on heart failure provides excellent ability to exclude acute heart failure in the acute setting with reassuringly high sensitivity. The specificity is modest at all but the highest values of natriuretic peptide, therefore confirmatory testing by cardiac imaging is required in patients with positive test results (4).”

On face value this is a fair conclusion, as all three of these assays seem to perform moderately well at either extreme of their diagnostic spectrum. At very low levels it is safe to say that the likelihood that the patient’s symptoms were caused by heart failure was fairly low. Likewise when significantly elevated, these assays boast specificities high enough for clinical use. Unfortunately these results do very little to explain the true utility of natriuretic peptides. By isolating these assays’ test characteristics outside the clinical arena, the authors have falsely inflated the utility of BNP and its natriuretic derivatives.

The first issue that is pervasive throughout the literature expounding the utility of natriuretic peptides is the gold standard used to evaluate their diagnostic capabilities. The most prevalent gold standard used is a retrospective review performed by two Cardiologists blinded to the results of the natriuretic peptide in question. 31 of the 37 cohorts in this meta-analysis used some derivative of this questionable gold standard. In one of the largest trials conducted, the Breathing Not Properly (BNP) trial by Maisel et al, authors examined 1586 patients presenting to the Emergency Department with acute dyspnea (5). They found that the two Cardiologists disagreed with the initial Emergency Physician’s diagnoses 14% of the time and disagreed with each other 10.7% of the time (6). This suggests that the cases in question were clearly not straightforward. If two Cardiologists with access to the patients’ entire hospital course disagreed with each other almost as often as they disagreed with the initial diagnosis of the Emergency Physician, then it is fair to say using this definition as the gold standard is less than ideal.

Despite this tarnished gold standard the question remains, how do natriuretic peptides perform when used in the clinical arena? More specifically how well do natriuretic peptide assays help the Emergency Physician differentiate the causes of dyspnea in the subset of patients in which there is considerable diagnostic uncertainty? In the BNP trial Maisel et al examined the Emergency Physician’s ability to correctly identify acutely decompensated heart failure. They found our accuracy overall, when compared to the less than perfect gold standard of a retrospective review performed by two Cardiologists was 86% (6). In the subset of patients in which the Emergency Physician was certain the patients’ dyspnea was not cardiac in origin (<5% chance of CHF), their diagnostic accuracy was superb (92%). Likewise in the group of patients in which the Emergency Physician was 95% certain the patient did in fact have CHF, they were correct 95% of the time (7). It was only in the intermediate group (between 20%-80% probability) in which the Emergency Physician was unsure of the likelihood of CHF, that their diagnostic capabilities were understandably poor. It is in this intermediate group that we would hope the natriuretic peptides could provide us with some guidance. We should not ask how accurately do peptide assays predict acute decompensated heart failure, but rather how well do peptide assays predict acute decompensated heart failure in the subset of patients that present a diagnostic challenge to the Emergency Physician? When charged with such a task these assays are far less impressive.

Although in their initial publication Maisel et al failed to disclose the diagnostic abilities of the Emergency Physicians, citing only BNP’s performance using the retrospective cutoff of 100 ng/L (sensitivity of 90%, specificity of 76%), the authors later published these findings in a secondary analysis. Published by McCullough et al in Circulation, this analysis revealed that when the Emergency Physician was certain that the patient’s cause of dyspnea was either definitely CHF or definitely not CHF, their unstructured judgment outperformed that of the BNP assay. For patients in whom the Emergency Physician was certain CHF was not the cause of their dyspnea, their accuracy was 92% vs only 84% for the BNP. Likewise when the Emergency Physician was certain the patient did in fact have CHF, again their judgment outperformed the diagnostic abilities of the BNP assay (accuracy of 95% vs 92%) (7). In fact even in the subset of patients where the Emergency Physician was fairly certain the diagnosis was CHF (>80%), their positive likelihood ratio of 11.5 was far more impressive than that of the BNP (3.4) (8). In the 27.8% of patients in whom the Emergency Physician was unclear of the diagnosis, the very group in which we would hope the BNP could provide guidance, its diagnostic accuracy was entirely unhelpful. In this subset of patients, at a cutoff of 100 ng/L, the assay demonstrated no clinical utility with a sensitivity and specificity of 79% and 71% respectively (8).
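To see why a sensitivity of 79% and a specificity of 71% offer so little in the uncertain patient, it helps to run the numbers through Bayes' theorem. The Python sketch below does this for a few pretest probabilities inside the 20% to 80% intermediate range. The specific pretest values are hypothetical examples of my own choosing, and the function name is an assumption for illustration.

```python
def post_test_probability(pretest, sensitivity, specificity, positive):
    """Probability of disease after a positive or negative test (Bayes' theorem)."""
    if positive:
        true_pos = pretest * sensitivity
        false_pos = (1 - pretest) * (1 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = pretest * (1 - sensitivity)
    true_neg = (1 - pretest) * specificity
    return false_neg / (false_neg + true_neg)


SENSITIVITY = 0.79  # BNP at 100 ng/L in the diagnostically uncertain subset
SPECIFICITY = 0.71

for pretest in (0.20, 0.50, 0.80):  # hypothetical pretest probabilities
    after_pos = post_test_probability(pretest, SENSITIVITY, SPECIFICITY, True)
    after_neg = post_test_probability(pretest, SENSITIVITY, SPECIFICITY, False)
    print(f"pretest {pretest:.0%}: positive BNP -> {after_pos:.0%}, "
          f"negative BNP -> {after_neg:.0%}")
```

A physician who starts at even odds ends up at roughly 73% after a positive result and 23% after a negative one, which is still inside the same zone of uncertainty in which they began.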

Each of the 37 studies included in the Roberts et al meta-analysis failed to truly examine how natriuretic peptides perform clinically. As discussed, the majority of these trials employed a less than ideal gold standard comparator and were so confounded by spectrum bias, they rarely examined the subgroup of patients in which the diagnosis was unclear. Additionally most of these studies used a retrospectively derived cutoff calculated to demonstrate the assay’s optimal performance. This type of overfitting inevitably leads to decreased performance when validated in a novel cohort. Ideally a randomized trial comparing a natriuretic peptide guided management to standard practice could demonstrate what, if any, clinical utility these assays provide. A number of such trials have been conducted.

The first was published in the NEJM in 2004 by Mueller et al. In this trial the authors randomized 452 patients presenting to the emergency department with acute dyspnea to either a diagnostic strategy utilizing a BNP assay or a standard work up (9). Authors powered their study to detect a 20% reduction in time to discharge (an interesting primary endpoint to choose if one thinks BNP possesses true clinical relevance), defined as the interval from presentation at the Emergency Department to discharge. The authors found a significant difference in time to discharge (8 vs 11 days) as well as shorter times to treatment for the BNP group (63 vs 90 minutes), decreased rates of hospitalization (75% vs 85%) and decreased admission to the ICU (15% vs 24%). In fact every outcome variable trended towards better in the group randomized to receive the BNP-guided diagnostic strategy. Initially these results seem significantly in favor of using BNP in the diagnostic workup of acute dyspnea, until one examines the other RCTs evaluating this question (9).

The second RCT examining natriuretic peptides for the management of acute dyspnea was published by Moe et al in Circulation in 2007 (10). In this trial, the authors randomized 500 patients to either a NT-proBNP guided strategy or standard care. Like the previous study the authors used the clinically dubious endpoint of initial ED visit duration as their primary endpoint. Though the authors found a statistically significant difference in initial ED visit time, the 0.7-hour difference (5.6 hrs vs 6.3 hrs) hardly seems clinically relevant. In fact the remainder of clinically important variables all favored the usual care group (in-hospital mortality 4.4% vs 2.4% and 60-day mortality 5.4% vs 4.4%) (10). Three other trials published subsequently found similar results. Other than clinically questionable reductions in length of stay, the use of natriuretic peptides had no meaningful effect on clinical outcomes (11,12,15). When these trials’ data were pooled in a meta-analysis published by Trinquart et al in The American Journal of Emergency Medicine in 2011, the authors found no significant difference in any of the multitude of clinically relevant variables including hospital admission rate, length of hospital stay, mortality or rates of re-hospitalization (13). Even in the long-term management of patients with known heart failure, when compared to a symptom guided approach, a BNP guided protocol led to further diagnostic testing and more aggressive medical therapy without producing a difference in clinically relevant outcomes (18-month survival free of any hospitalization was 41% vs 40%) (16).

This is not a proclamation of the infallibility of the Emergency Physician but rather the recognition of our shortcomings. There is a clear group of patients that present a diagnostic challenge, for whom further confirmatory investigations could provide guidance. Despite the industry-sponsored studies designed to propagate an overinflated self-worth, a close examination of the natriuretic peptides reveals they add little value to Physicians’ judgment. When we as Emergency Physicians are certain of the diagnosis of acute decompensated heart failure, our intrinsic diagnostic capabilities outperform those of natriuretic peptides. In the patients that present as a diagnostic challenge, these assays are far too insensitive and non-specific to add substantial diagnostic clarity. Furthermore we have other, more diagnostically robust, tools like point of care ultrasound to assist in these challenging circumstances (14). Natriuretic peptides are not the diagnostic saviors that they are commonly proclaimed to be. More importantly we are not in need of rescue as often as the makers of these peptides would have us believe. On the rare occasion we do require aid, should we not demand a far more resolute champion?

Sources Cited:

  1. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE, Drazner M, et al. ACCF/AHA guideline for the management of heart failure: a report of the American College of Cardiology Foundation/American Heart Association Task Force on practice guidelines. Circulation 2013;128:e240-327.
  2. McMurray JJV, Adamopoulos S, Anker SD, Auricchio A, Bohm M, Dickstein K, et al. ESC guidelines for the diagnosis and treatment of acute and chronic heart failure 2012: The Task Force for the Diagnosis and Treatment of Acute and Chronic Heart Failure 2012 of the European Society of Cardiology. Developed in collaboration with the Heart Failure Association (HFA) of the ESC. Eur J Heart Fail 2012;14:803-69.
  3. Thygesen K, Mair J, Mueller C, Huber K, Weber M, Plebani M, et al. Recommendations for the use of natriuretic peptides in acute cardiac care: a position statement from the Study Group on Biomarkers in Cardiology of the ESC Working Group on Acute Cardiac Care. Eur Heart J 2012;33:2001-6.
  4. Roberts E, Ludman AJ, Dworzynski K, Al-Mohammad A, Cowie MR, McMurray JJV, et al. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ 2015;350:h910.
  5. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347:(3)161-7.
  6. McCullough PA, Nowak RM, McCord J, et al. B-type natriuretic peptide and clinical judgment in emergency diagnosis of heart failure: analysis from Breathing Not Properly (BNP) Multinational Study. Circulation. 2002;106:(4)416-22.
  7. Schwam E. B-type natriuretic peptide for diagnosis of heart failure in emergency department patients: a critical appraisal. Acad Emerg Med. 2004;11:(6)686-91.
  8. Hohl CM, Mitelman BY, Wyer P, Lang E. Should emergency physicians use B-type natriuretic peptide testing in patients with unexplained dyspnea? CJEM. 2003;5:(3)162-5.
  9. Mueller C, Scholer A, Laule-Kilian K, Martina B, Schindler C, Buser P, et al. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. N Engl J Med 2004;350(7):647-54.
  10. Moe GW, Howlett J, Januzzi JL, Zowall H. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation 2007;115(24):3103-10.
  11. Rutten JH, Steyerberg EW, Boomsma F, van Saase JL, Deckers JW, Hoogsteden HC, et al. N-terminal pro-brain natriuretic peptide testing in the emergency department: beneficial effects on hospitalization, costs, and outcome. Am Heart J 2008;156(1):71-7.
  12. Schneider HG, Lam L, Lokuge A, Krum H, Naughton MT, De Villiers Smit P, et al. B-type natriuretic peptide testing, clinical outcomes, and health services use in emergency department patients with dyspnea: a randomized trial. Ann Intern Med 2009;150(6):365-71.
  13. Trinquart L, Ray P, Riou B, Teixeira A. Natriuretic peptide testing in EDs for managing acute dyspnea: a meta-analysis. Am J Emerg Med. 2011;29:(7)757-67.
  14. Al Deeb M, Barbic S, Featherstone R, Dankoff J, Barbic D. Point-of-care ultrasonography for the diagnosis of acute cardiogenic pulmonary edema in patients presenting with acute dyspnea: a systematic review and meta-analysis. Acad Emerg Med. 2014;21:(8)843-52.
  15. Singer AJ, Birkhahn RH, Guss D, et al. Rapid Emergency Department Heart Failure Outpatients Trial (REDHOT II): a randomized controlled trial of the effect of serial B-type natriuretic peptide testing on patient management. Circ Heart Fail. 2009;2:(4)287-93.
  16. Pfisterer M, Buser P, Rickli H, et al. BNP-guided vs symptom-guided heart failure therapy: the Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) randomized trial. JAMA. 2009;301:(4)383-92.

The Adventure of the Second Stain Continues

The CT-LP (lumbar puncture) diagnostic pathway has been a permanent fixture in the arsenal of the Emergency Physician for what seems like an eternity. Steadfast in its dependability, for many generations the LP was a necessity for Emergency Physicians to safely exclude the diagnosis of subarachnoid hemorrhage (SAH). And yet, rarely a moment has passed over the past few years when Dr. Jeffrey Perry has not politely demonstrated how little we truly know about this disease process and the diagnostic tools associated with it. His 2011 paper, questioning the necessity of an LP following a negative head CT performed within 6 hours of symptom onset, shook the once solid ground that the LP firmly stood upon (1). As if this attack on our reliable comrade was not enough, his most recent publication examining the diagnostic capabilities of the lumbar puncture itself has left our confidence in this once dependable testing strategy in turmoil.

In this paper, published in February of 2015 in The BMJ, Perry et al utilized a subset of patients from two cohorts originally enrolled to derive and validate the Ottawa SAH rule (3,4). The authors examined 1739 of these patients who received a lumbar puncture as part of their workup for SAH (2). They then sought to assess the diagnostic accuracy of this tool. Similar to common practice, they prospectively defined a positive tap as greater than 1 RBC on fluid aspirate. When this impossibly low threshold was upheld, the LP’s performance was less than stellar. Of the 1739 patients who received an LP, 641 (36.9%) had positive findings, only 15 of which were actually from subarachnoid blood. Most of these false positive results were trivial, as 476 (74.3%) had counts of ≤100×10⁶/L and 94 (14.8%) had counts of 101-1000×10⁶/L. Only 10.4% of these patients were found to have clinically concerning levels of RBCs in their CSF (counts of >1000×10⁶/L). Despite the predominance of low RBC counts, a great majority of the patients in whom the LP was positive (419) received invasive angiographic imaging.

When the LP was found to be negative (no RBCs in the CSF), it boasted a sensitivity of 100%. In an attempt to compensate for the unacceptably high number of false positives, the authors retrospectively determined the ideal RBC cutoff to be 2000×10⁶/L. At this threshold the LP had a sensitivity of 93.3% (95% confidence interval 66.0% to 99.7%) and specificity of 92.8% (90.5% to 94.6%) for aneurysmal subarachnoid hemorrhage. If visual xanthochromia was added to this RBC cutoff, the sensitivity for ruling out SAH became 100% (95% confidence interval 74.7% to 100.0%).
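The counts quoted above are enough to sketch the implied two-by-two table at the original threshold of more than 1 RBC. The short Python sketch below assumes that all 15 subarachnoid bleeds fell among the 641 positive taps, which is what a sensitivity of 100% implies. The function name and layout are illustrative only and are not taken from the paper.

```python
def two_by_two(total, test_positive, diseased, true_positive):
    """Derive sensitivity, specificity and PPV from summary counts."""
    false_positive = test_positive - true_positive
    false_negative = diseased - true_positive
    true_negative = total - test_positive - false_negative
    return {
        "sensitivity": true_positive / diseased,
        "specificity": true_negative / (true_negative + false_positive),
        "ppv": true_positive / test_positive,
        "false_positives": false_positive,
    }

# Counts as quoted in the text for the >1 RBC threshold.
# Assumption: every one of the 15 SAH patients had a positive tap.
lp = two_by_two(total=1739, test_positive=641, diseased=15, true_positive=15)
for name, value in lp.items():
    print(name, round(value, 3))
```

On these figures only about one positive tap in 43 reflected a true bleed, which is the arithmetic behind the complaint about downstream angiography.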

These numbers are of course fraught with methodological pitfalls. The threshold of 2000×10⁶/L was retrospectively derived to best fit this specific cohort. Only 15 of the 1739 patients examined actually had the disease in question, making these calculations incredibly unstable (the lower bound of the confidence interval surrounding their 100% sensitivity dropped as low as 74.7%). The threshold of 2000×10⁶/L is hardly robust enough for clinical use and will inevitably perform worse when applied in prospective fashion to a novel cohort.

Though this data is not definitive and further studies validating these findings are required, a number of valuable conclusions can be inferred. Surprisingly the most important of these has little to do with the diagnostic utility of the lumbar puncture.

In 2011 Perry et al published their game-changing article in The BMJ examining the accuracy of a non-contrast head CT performed within 6 hours of symptom onset for the diagnosis of SAH (1). This paper was a secondary analysis of the original cohort used to derive the Ottawa SAH Rule (4). Using this preexisting cohort they assessed the accuracy of head CT for the diagnosis of SAH before and after a 6-hour threshold. The authors claim a sensitivity of 100% when CT was performed within 6 hours of symptom onset. However, when the CT was performed after this 6-hour threshold, the sensitivity fell to 85.7%, suggesting that when performed within 6 hours, a non-contrast CT is sufficient to rule out SAH, allowing practitioners to forego a subsequent lumbar puncture. Though many have viewed this as practice changing, others argue that a number of flaws in the study’s design prevent us from interpreting these conclusions with such conviction.

The most obvious and often discussed weakness of this study is the use of a surrogate endpoint in place of a true gold standard. Not all patients who had a negative head CT underwent a confirmatory lumbar puncture. In its place, the authors used a 6-month proxy outcome to demonstrate the safety of CT alone. Patients underwent a structured phone interview at the 6-month mark to ascertain their wellbeing. When attempts to reach patients over the phone failed, authors endeavored to determine their status by searching medical records from regional neurosurgical centers as well as coroner’s death records. Patients were considered to be free of SAH if on 6-month follow-up they were alive and well. In the case of patients who were discovered to have passed away during the follow-up period, if the cause of death was determined to be due to something other than SAH, their deaths were not counted as a missed diagnosis. Of the 1931 patients examined, 421 were lost to follow-up. Authors found 8 of these patients had passed away since their initial workup for subarachnoid hemorrhage. Although none of these patients were determined to have died because of SAH, the reliability of post mortem cause of death is questionable at best (5).

A far less discussed aspect of this study was how the authors’ definition of a positive CT influenced the validity of their results. The standard that Perry et al used to calculate the sensitivity of head CT was based upon the Neuroradiologist’s official report. In most facilities (as was the case at the centers participating in this study) what guides Emergency Physicians’ clinical decision-making is the initial wet read, usually done by Radiology house staff or even the ED physicians themselves. The sensitivity we are concerned with is that of the wet read. The Neuroradiologists in this study were not blinded to the patients’ lab findings. As such we are unable to ascertain how many CTs done within 6 hours were initially read as negative, with the final report recorded as positive only later, after a positive LP was performed. If this had occurred with any frequency it would obviously harm the internal validity of the results. We are able to get a sense of how frequently this occurred by examining how many of the patients who were diagnosed with SAH had both a positive CT and a positive LP. At least in theory, if the CT was positive then there would be no reason to perform the subsequent LP.

Of the 15 patients with SAHs that were diagnosed using a positive LP, 10 underwent head CTs and LPs that were both positive. The vast majority of these subarachnoid bleeds (n=8) were found in patients who received their CTs beyond the 6-hour threshold. There were however two patients that were identified as having received their CTs within 6-hours of symptom onset. In both these patients their initial CT was read as negative and only after a positive lumbar puncture was the final report changed to positive. If these two patients are taken into account, the adjusted sensitivity of CT under 6-hours from symptom onset is only 98.3% (with the confidence interval dropping as low as 93.6%).

These findings of course do nothing except muddy the already cloudy waters. Head CT, though fairly sensitive, will on occasion miss a subarachnoid bleed. The addition of CSF aspirate will very often introduce a further degree of ambiguity. Furthermore the utilization of LP, at least in its current strategy, leads to an unacceptable number of false positives, exposing a large number of patients to needless downstream testing. If a more liberal view towards RBCs in the CSF is taken, the LP’s utility may be justifiable. Even with the retrospective best-fit diagnostic capabilities calculated by Perry et al, the prevalence of SAH following a negative CT performed within 6 hours is so low that further testing will likely lead to identifying far more false positive results than true subarachnoid bleeds. Clearly the conviction and certainty we once held for this testing strategy has suffered. Perhaps it is time for a shared decision-making model. After all, it is our patients’ value systems rather than our own biases that should guide these investigative journeys. Dr. Perry has demonstrated that the CT-LP pathway is far from straightforward. Perhaps it is time we confess these imperfections to the world at large and begin a far more honest conversation.
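The claim that false positives will outnumber true bleeds follows from Bayes’ theorem and can be sketched in a few lines of Python. The sensitivity (93.3%) and specificity (92.8%) are the retrospective best-fit figures quoted above. The prevalence of 15 in 1739 is simply the rate in the whole LP cohort and is used here as an assumption; the true prevalence after a negative CT within 6 hours is presumably lower still, which would make the picture worse. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumption: prevalence taken from the full LP cohort (15 of 1739 patients).
prevalence = 15 / 1739
ppv = positive_predictive_value(0.933, 0.928, prevalence)

# Expected number of false positives for every true positive.
false_per_true = (1 - ppv) / ppv

print(f"PPV: {ppv:.1%}")
print(f"False positives per true bleed: {false_per_true:.1f}")
```

Under these assumptions roughly nine in ten positive LPs would still be false alarms, even at the more forgiving cutoff.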

Sources Cited:

  1. Perry JJ, Stiell IG, Sivilotti ML, et al. Sensitivity of computed tomography performed within six hours of onset of headache for diagnosis of subarachnoid haemorrhage: prospective cohort study. BMJ. 2011;343:d4277.
  2. Perry JJ, Alyahya B, Sivilotti ML, et al. Differentiation between traumatic tap and aneurysmal subarachnoid hemorrhage: prospective cohort study. BMJ. 2015;350:h568.
  3. Perry JJ, Stiell IG, Sivilotti ML, et al. Clinical decision rules to rule out subarachnoid hemorrhage for acute headache. JAMA. 2013;310:(12)1248-55.
  4. Perry JJ, Stiell IG, Sivilotti ML, et al. High-risk clinical characteristics for subarachnoid haemorrhage in patients with acute headache: prospective cohort study. BMJ. 2010;341:c5204.
  5. Wexelman BA, et al. Survey of New York City resident physicians on cause-of-death reporting, 2010. Prev Chronic Dis. 2013;10:E76.

The Adventure of the Cardboard Box Continues

For those whose beliefs are already firmly in favor of endovascular therapy for acute ischemic stroke, the publication of the MR CLEAN trial earlier this year and more recently the EXTEND-IA and ESCAPE trials only serve as a big fat, “I TOLD YOU SO!” For the perpetual disbelievers, each of these trials possesses enough flaws to discredit their findings. For the appropriately skeptical among us, though these trials initially appear to discredit our well meaning rants, on closer examination they are far more validating.

Earlier this year the publication of a large, well done, RCT examining the use of endovascular treatment for acute ischemic stroke threatened to drastically change the acute management of CVA as we know it. And though this trial was given a most unfortunate name (MR CLEAN), it marked the first time endovascular therapy has demonstrated any clinically relevant benefit (1). We have discussed this trial in depth in two previous posts. While MR CLEAN’s results were promising, there are many reasons why they should be viewed with a healthy dose of skepticism. Before we commit to a resource heavy intervention like that of endovascular therapy, more studies validating these findings are required. Since the publication of MR CLEAN, two active trials were stopped early for benefit, seeming to be the very validation for which we asked. The results of both of these studies, EXTEND-IA and ESCAPE, were recently published in the NEJM (2,3).

The first trial, the Extending the Time for Thrombolysis in Emergency Neurological Deficits - Intra-Arterial (EXTEND-IA) trial, by Campbell et al, is a multi-center RCT that examined the efficacy of endovascular treatment in patients with CVA whose symptoms began within 4.5 hours of randomization. Like MR CLEAN this trial was a stunning success. In fact its results far outpaced the, by comparison, paltry benefits found in MR CLEAN. EXTEND-IA was stopped early for overwhelming benefit after enrolling 70 patients. The rate of significant improvement after 3 days (reduction in NIHSS > 8) was 80% vs 37% in the endovascular group and control group respectively. Likewise the rate of favorable outcome at 90 days (mRS of 0-2) was 71% vs 40% respectively, boasting an absolute difference of 31% (2).

The second and far more statistically robust trial is the Endovascular Treatment for Small Core and Anterior Circulation Proximal Occlusion with Emphasis on Minimizing CT to Recanalization Times (ESCAPE) trial, published by Goyal et al. In this trial, the authors examined patients up to 12 hours after symptom onset (though the large majority of the patients enrolled were evaluated within 3 hours of symptom onset). The authors randomized 316 patients to either standard care or standard care plus endovascular therapy. Like EXTEND-IA, the ESCAPE trial was an overwhelming success. The rate of functional independence at 90 days (mRS of 0-2) was 53.0% vs 29.3% in favor of the endovascular arm, a 23.7% absolute increase in positive outcomes in patients who received endovascular therapy. For the first time in the history of reperfusion therapies for acute ischemic stroke, a clinically significant mortality benefit was demonstrated: 90-day mortality was 10.4% in the endovascular group compared to 19.0% in the control group. The rate of intracranial hemorrhage was also surprisingly low (3.6% vs 2.7%) (3).

Neither trial is definitive in its own right. The EXTEND-IA cohort only examined the efficacy of endovascular therapy in 70 patients. Originally intending to enroll 100 patients, this trial was stopped prematurely after an interim analysis demonstrated such impressive results. This early look at the sealed data was not a pre-planned interim analysis, but was instead prompted by the publication of MR CLEAN (2). Though the remaining 30 patients would most likely not have altered the results, we cannot view this poorly powered trial as anything more than hypothesis generating. In isolation, EXTEND-IA can only offer a guideline for the future of endovascular management in acute ischemic stroke. Even the authors themselves conceded this point in the statistical analysis plan they published in January 2014, in which they clearly defined EXTEND-IA as a phase II trial (4). This definition is conveniently left out of the formal publication in the NEJM, an oversight possibly induced by the unexpected magnitude of their success causing well-deserved delusions of grandeur.

ESCAPE, though far more statistically hardy than EXTEND-IA, is still a rather small cohort suffering from the same unfortunate biases. Originally intending to enroll 500 patients, the authors called for an early stoppage, prior to their planned interim analysis, again because of the results of MR CLEAN. Although the sample size of 316 patients lends a stronger validity than the 70 patients examined in EXTEND-IA, the early stoppage prevents us from confidently assessing the true effect size this treatment may provide. Interestingly, when implementing this unplanned analysis, the authors utilized a dichotomous outcome comparing the proportion of patients alive and independent (mRS of 0-2) at 90 days rather than the ordinal analysis they had originally chosen as their primary outcome and used when performing the power calculation. The ordinal scale has recently gained favor as an outcome measure in stroke trials because of its ability to augment the p-value and turn otherwise negative trials into statistical successes. Conversely it is almost impossible to determine the clinical relevance of the odds ratio it produces. Given the impressive benefits of both trials, the small statistical augmentations offered by ordinal analysis are irrelevant. As such the authors of both trials favored the more traditional dichotomous outcome. The 23.7% absolute difference measured by the dichotomous scale in the ESCAPE trial appears far more impressive than the odds ratio of 2.6 offered by ordinal analysis (3).

With the overwhelming success of both EXTEND-IA and ESCAPE, the MR CLEAN data appears almost lacking. In the MR CLEAN cohort, patients randomized to receive endovascular therapy had a 14% absolute benefit over the controls. It is safe to say neither group did all that well, with the proportion of patients alive and independent at 90 days reported as 33% and 19% respectively (1). The EXTEND-IA and ESCAPE cohorts however did considerably better (71% vs 40% and 53.0% vs 29.3% respectively) (2,3). Are we truly looking at the same patients as were examined in MR CLEAN, or do the EXTEND-IA and ESCAPE cohorts represent a completely different population?
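To put the three trials side by side, the absolute differences can be recomputed from the 90-day functional independence rates reported in the trial descriptions above. The number needed to treat (NNT) shown here is not reported in the text; it is the standard reciprocal of the absolute difference, rounded up, and is added only as an illustration. The function and variable names are hypothetical.

```python
import math

def absolute_benefit(treated_pct, control_pct):
    """Return (absolute difference in percentage points, NNT rounded up)."""
    diff = round(treated_pct - control_pct, 1)
    nnt = math.ceil(100 / diff)
    return diff, nnt

# 90-day rates of mRS 0-2 (endovascular %, control %) as quoted in the text.
trials = {
    "MR CLEAN": (33.0, 19.0),
    "EXTEND-IA": (71.0, 40.0),
    "ESCAPE": (53.0, 29.3),
}

results = {name: absolute_benefit(*rates) for name, rates in trials.items()}
for name, (diff, nnt) in results.items():
    print(f"{name}: absolute difference {diff} points, NNT about {nnt}")
```

Because both EXTEND-IA and ESCAPE were stopped early, these differences are best read as upper estimates of the true effect rather than settled figures.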

It should come as no surprise that both the EXTEND-IA and ESCAPE cohorts included vastly different patients than those enrolled in MR CLEAN. In MR CLEAN, to be eligible for inclusion patients were required to have an occlusion of the distal intracranial carotid artery, middle cerebral artery (M1, M2) or anterior cerebral artery (A1) as identified by CT angiography (CTA), magnetic resonance angiography (MRA) or digital subtraction angiography (DSA) (1). Both EXTEND-IA and ESCAPE had far stricter inclusion criteria. Patients who were enrolled in the EXTEND-IA cohort needed to demonstrate an ischemic penumbra on perfusion imaging with a small infarcted core (2). Though slightly different criteria were utilized, like EXTEND-IA, the ESCAPE cohort used CT angiographic imaging to identify patients with small infarcted cores and large areas of salvageable tissue (3). These inclusion criteria significantly narrowed the subset of stroke patients examined. These differences in patient selection are not only responsible for the almost unbelievable efficacy demonstrated in both the EXTEND-IA and ESCAPE trials, they mark the first time that imaging criteria were successfully used to identify a cohort of stroke patients who may benefit from reperfusion therapy.

There has been a long history of failure in the use of perfusion imaging for the management of acute ischemic stroke. Early studies investigating the use of diffusion-weighted MRI to identify potentially salvageable ischemic brain failed to show benefit (5,6,7,8,9). These failures may be due in part to the industry bias of only enrolling patients presenting > 3 hours after onset, in the hopes of extending FDA-approved treatment windows and, more importantly, profit margins. Though these trials showed promising rates of reperfusion, the consistently high incidence of intracranial hemorrhage overshadowed the minimal benefits. The MR RESCUE trial, published in NEJM in February 2013, was the first to utilize this technology to identify potential candidates for endovascular therapy (10). Again this trial failed to demonstrate that patients with ischemic penumbrae benefitted from revascularization. However this may have been due more to the trial’s flawed design than the technology’s deficiencies. The authors of MR RESCUE only enrolled patients after initial IV tPA failure. In contrast to these historical failures, both the EXTEND-IA and ESCAPE cohorts, unencumbered by fears of disproving tPA’s early successes, aggressively pursued reperfusion therapy after salvageable tissue was identified on CT imaging. In doing so, these trials have, for the first time, identified the population that will most likely benefit from reperfusion therapy.

At the risk of sounding optimistic, both EXTEND-IA and ESCAPE are impressively positive trials. Although small and methodologically flawed, with likely exaggerated effect sizes, when viewed in concert with MR CLEAN, these trials present endovascular therapy in a promising light. For some time now legitimate cries for more data regarding tPA’s safety and efficacy in acute ischemic stroke management have been disregarded and marginalized. This almost fanatical acceptance is based around the success of the NINDS trial, a single poorly powered study which treated patients with IV tPA within 3 hours of symptom onset. Despite the many methodological flaws of NINDS, its results were never duplicated because of the pharmaceutical industry’s fear of losing the tenuous ground they had gained. Although there are significant harms associated with the administration of tPA, the literature has consistently suggested that there is a subset of patients who will benefit from its administration. Rather than working to identify this narrow population, we have witnessed an industry-driven effort to expand the indications for reperfusion therapy. EXTEND-IA and ESCAPE have identified potential cohorts of patients who will likely benefit from reperfusion therapy. If these results can be confirmed, no longer will we be forced to use the blunt tool of perceived time from symptom onset to determine which patients are eligible for treatment. These trials should inspire us to not only explore the successful utilization of endovascular therapy, but also reexamine the harmful practice of thrombolytic therapy we currently employ.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372:(1)11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Campbell BC, Mitchell PJ, Yan B, et al. A multicenter, randomized, controlled study to investigate EXtending the time for Thrombolysis in Emergency Neurological Deficits with Intra-Arterial therapy (EXTEND-IA). Int J Stroke 2014;9:126-132
  5. Davis SM, Donnan GA, Parsons MW, et al. Effects of alteplase beyond 3 h after stroke in the echoplanar imaging thrombolytic evaluation trial (EPITHET): a placebo-controlled randomised trial. Lancet Neurol. 2008;7:299–309.
  6. Albers GW, Thijs VN, Wechsler L, et al. Magnetic resonance imaging profiles predict clinical response to early reperfusion: the diffusion and perfusion imaging evaluation for understanding stroke evolution (DEFUSE) study. Ann Neurol. 2006;60:508–517
  7. Hacke W, Albers G, Al-Rawi Y, et al. The desmoteplase in acute ischemic stroke trial (DIAS): a phase II MRI-based 9-hour window acute stroke thrombolysis trial with intravenous desmoteplase. Stroke. 2005;36:66–73.
  8. Furlan AJ, Eyding D, Albers GW, et al. Dose Escalation of Desmoteplase for Acute Ischemic Stroke (DEDAS): evidence of safety and efficacy 3 to 9 hours after stroke onset. Stroke. 2006;37:1227–1231.
  9. Hacke W, Furlan AJ, Al-Rawi Y, et al. Intravenous desmoteplase in patients with acute ischaemic stroke selected by MRI perfusion-diffusion weighted imaging or perfusion CT (DIAS-2): a prospective, randomised, double-blind, placebo-controlled study. Lancet Neurol. 2009;8:(2)141-50.
  10. Kidwell CS, Jahan R, Gornbein J, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med. 2013;368:(10)914-23.