The Case of the Blind Allocator


In the modern world of evidence-based medicine we exist in a perpetual state of doubt, continually attempting to perceive truth through the veil of science. Far too often our sample cohort deviates from the population it is intended to represent. Hypothesis testing and frequentist statistics are tools intended to quantify the extent to which observed results are due to random errors in sampling. And yet there is an entirely different type of error that our statistical instruments are far less adept at appraising: bias, a non-random form of error. This post will explore a number of common forms of bias and their extensive effects on data.

Despite the long-standing belief that central venous catheters (CVCs) placed in the femoral vein are at increased risk for catheter-related bloodstream infection (CRBI), recent evidence has suggested that in the modern era of sterile insertion practices, the rate of line infections after femoral catheter placement is no greater than after cannulation of either the internal jugular (IJ) or subclavian (SC) vein.

In 2012, Marik et al published a paper in Critical Care Medicine intended to demonstrate this very assertion (1), conducting both a systematic review and meta-analysis of the existing data comparing the rates of CRBI associated with each respective insertion site. The authors examined data on 17,376 central catheter insertions from 10 publications and concluded there was no difference in the rate of CRBI between femorally inserted lines and their cephalad comparators. The relative risk cited was 1.02 (95% CI 0.64–1.65, p = .92) for femoral compared to SC, and 1.35 (95% CI 0.84–2.19, p = .2) when compared to IJ. Over 17,000 observed catheter insertions demonstrated no statistically significant difference in the rate of CRBI between the various insertion sites. And yet, despite the robust nature of its sample size, the validity of this meta-analysis has been questioned, mostly due to the quality of the underlying data. Only 1,006 catheters, a small fraction of the total, came from RCT data. The majority of these originated from a single trial examining emergent dialysis catheters placed in either the femoral or internal jugular vein, which found no significant difference in the rate of central line infection between these two sites (10). But it is unclear whether findings from dialysis catheters, which are kept impeccably clean and accessed only for dialysis, translate to the heavily exploited standard CVCs used in the critically ill.
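
A relative risk and its confidence interval of the kind quoted above come straight from a two-by-two table. A minimal sketch, computing the CI on the log scale; the counts below are hypothetical, chosen only to reproduce a point estimate near 1.02, and are not the trial's raw data:

```python
import math

def relative_risk_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A vs group B with a 95% CI computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial samples
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts for illustration only (not from Marik et al):
rr, lower, upper = relative_risk_ci(20, 1000, 20, 1020)
# A confidence interval spanning 1.0 means no statistically significant difference.
```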

Observational cohorts, totaling 16,370 catheters, accounted for the remainder of the data in the Marik et al meta-analysis. When comparing outcomes between groups, observational data presents a number of methodological problems. In this instance, the location of catheter placement was not randomly assigned, creating an immense potential for selection bias, as the factors that determined the site of cannulation may directly influence the likelihood that the catheter becomes infected. For example, in patients with severe respiratory distress, cannulation of the SC vein may be avoided for fear of causing a pneumothorax. This leads to the placement of IJ and femoral catheters in a sicker subset of patients who are, in turn, at greater risk of infection. Additionally, because of the pre-existing bias of many clinicians, femoral lines may have been removed earlier than either IJ or SC lines. The risk of central line infection is directly related to its time in situ, and thus this abbreviated use may underestimate the true risk of infection associated with femoral venous cannulation.

To further complicate matters, Marik et al eliminated two large trials from their analysis, claiming they were statistical outliers (1). Although such a deletion may be statistically appropriate, the excluded trials demonstrated a far higher rate of line infections when the femoral site was utilized (2,3). When these trials are included in the analysis, the difference in the rate of CRBI between the femoral and IJ insertion sites becomes statistically significant.

Essentially, there are too many confounding variables to be able to clearly interpret the data utilized in the Marik et al meta-analysis. Mathematical manipulations of this data, in the form of regression analyses, do not clarify the matter. This type of error is difficult to correct through statistical modeling and can only truly be controlled using randomization. Randomization accounts for confounding variables by randomly distributing them amongst the study arms. When implemented correctly, one may assume the observed differences are caused by the treatment effect in question.

A recent trial published in the NEJM sought to do just that. Parienti et al examined 3,471 catheter insertions in 3,027 patients in ten ICUs throughout France (4). Lines were inserted by “experienced” house staff, each required to have performed at least 50 previous line insertions. All lines were inserted using strict sterile precautions and the Seldinger technique, though the use of ultrasound guidance was left to the inclination of the clinician performing the procedure. Patients were enrolled if the treating physician determined that at least two of the three sites (IJ, SC, or femoral) were appropriate for cannulation, at which point the patient was randomized among the eligible sites.

The authors found a significant difference in their primary outcome, the rate of catheter-related infections and symptomatic deep-vein thrombosis, between patients randomized to SC line placement and those randomized to either IJ or femoral placement. Overall there were 8, 20, and 22 events in the SC, IJ, and femoral groups respectively, which translates to 1.5, 3.6, and 4.6 events per 1,000 catheter-days. This was offset by an almost identical increase in the rate of mechanical complications (arterial injury, hematoma, pneumothorax, or other) observed in patients randomized to the SC insertion site when compared to either the femoral or IJ groups (2.1%, 1.4%, and 0.7% respectively) (4). This difference was made up entirely of an increase in the rate of pneumothoraces observed in the SC group. And yet despite the randomized nature of this trial, the methodology utilized by Parienti et al makes interpretation less than straightforward.
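
The event rates quoted above are incidence densities: total events divided by total catheter-days, scaled to 1,000. A one-line sketch; the catheter-day denominator below is hypothetical (the paper reports only the resulting rates, so the denominator is back-of-envelope):

```python
def rate_per_1000_catheter_days(events, catheter_days):
    """Incidence density: events per 1,000 catheter-days."""
    return 1000 * events / catheter_days

# Hypothetical denominator for illustration: 8 events over ~5,333 catheter-days
rate_sc = rate_per_1000_catheter_days(8, 5333)  # ~1.5 events per 1,000 catheter-days
```

Expressing risk per catheter-day rather than per insertion is what lets trials compare sites despite differing dwell times.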

As discussed, the major flaw in the Marik et al meta-analysis was that the majority of the data came from non-randomized cohorts, making it extremely difficult to account for the confounding variables that might have influenced site selection. Ideally, randomization should eliminate these biases. Unfortunately, because of a number of methodological concerns, the Parienti et al trial failed to control for bias as well as we would have hoped.

For randomization to be valid, it is vital that participating clinicians not be aware of a patient's group assignment prior to randomization. This is called allocation concealment. Prior knowledge of assignment will lead to selection bias, as clinicians tend to exclude certain patients based on their own beliefs regarding the treatments being examined (5,6). For example, a patient with severe respiratory distress may not be enrolled if the physician knows in advance that the patient would be randomized to SC insertion, primarily because of the potential for pneumothorax. Improper allocation concealment will exclude a certain subset of patients and produce results that systematically deviate from reality (6). Although Parienti et al did attempt to conceal allocation by utilizing permuted-block randomization with varying block sizes, they allowed the treating physicians to exclude one site prior to randomization if it was deemed unsuitable for clinical use. This allowance was probably unavoidable, as it is not uncommon for one or more vessels to be inaccessible in clinical practice, but the concession permits the introduction of the very selection bias we hoped randomization would avoid (6).

Of the 3,471 catheters placed, 2,532 (72.9%) were placed in patients in whom all three sites were deemed accessible. This leaves 940 catheters (a little more than 25%) placed in patients for whom the treating clinician had eliminated one site prior to randomization. The majority of these exclusions (570) were of the SC site, because the treating physician felt the risk of pneumothorax or bleeding was unacceptably high. Another 277 exclusions were of the femoral site, 45% of them because of “site contamination”. These exclusions potentially prevented the highest-risk patients from being randomized to the SC and femoral insertion sites, introducing the very type of bias found in the observational data of the Marik et al meta-analysis, which randomization was meant to eliminate.

A further source of bias in the Parienti trial can be traced to its inability to blind practitioners to the treatment group after allocation. For obvious reasons such blinding would have been unfeasible in a trial like this, but its absence allows for the introduction of yet another source of bias. When RCTs lack adequate blinding, the risk of ascertainment bias is prominent. Ascertainment bias is the systematic, non-random distortion of the measurement of the true frequency of an event because of the investigators' knowledge and assumptions about group allocation (7). In this case, patients randomized to the femoral site had their CVC in place for a significantly shorter time than patients randomized to either the SC or IJ sites (mean catheter-days approximately 5.9 ± 4.8 for femoral and 6.5 ± 5.4 for IJ and SC). Since the risk of infection is directly related to the duration of catheterization, this difference could potentially skew the results in favor of the femoral site.

The authors attempt to control for these confounders through regression analysis and by analyzing catheter events per catheter-day rather than per insertion. But just as with the Marik et al meta-analysis, these statistical compensations cannot overcome such methodological frailties.

Despite its flaws, Parienti et al have gathered the largest, most complete data set in existence addressing the complication rates of CVC insertion. I suspect their results are as close an approximation of the truth as we currently have. As such, if we are willing to accept the slight increase in the rate of pneumothorax, the SC vein may be the preferred initial option for central venous cannulation, with the caveat that the true pneumothorax rate might be higher than observed because of the large number of exclusions prior to randomization (4).

So often in the interpretation and translation of medical literature we find ourselves lost in the statistical minutiae, citing p-values and confidence intervals as if they hold intrinsic value. And yet these statistical manipulations are, for the most part, concerned with quantifying the extent to which the observed results are due to random chance. Their mathematical constructs cannot account for the non-random error caused by methodological missteps. Collecting data in the face of these flaws and attaching a statistical judgment to the results does nothing to legitimize their validity.

Sources Cited:

  1. Marik PE, Flemmer M, Harrison W. The risk of catheter-related bloodstream infection with femoral venous catheters as compared to subclavian and internal jugular venous catheters: a systematic review of the literature and meta-analysis. Crit Care Med. 2012;40(8):2479-85.
  2. Lorente L, Henry C, Martín MM, et al. Central venous catheter-related infection in a prospective and observational study of 2,595 catheters. Crit Care. 2005;9:R631–R635.
  3. Nagashima G, Kikuchi T, Tsuyuzaki H, et al. To reduce catheter-related bloodstream infections: Is the subclavian route better than the jugular route for central venous catheterization? J Infect Chemother. 2006;12:363-365.
  4. Parienti JJ, Mongardon N, Mégarbane B, et al. Intravascular Complications of Central Venous Catheterization by Insertion Site. N Engl J Med. 2015;373(13):1220-9.
  5. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet. 2002;359(9305):515-9.
  6. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet. 2002;359(9306):614-8.
  7. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got what. Lancet. 2002;359(9307):696-700.
  8. Altman DG, Bland JM. Uncertainty and sampling error. BMJ. 2014;349:g7064.
  9. Altman DG, Bland JM. Uncertainty beyond sampling error. BMJ. 2014;349:g7065.
  10. Parienti JJ, Thirion M, Mégarbane B, et al; Members of the Cathedia Study Group: Femoral vs jugular venous catheterization and risk of nosocomial events in adults requiring acute renal replacement therapy: A randomized controlled trial. JAMA 2008; 299:2413–2422





The Case of Dubious Squire Continues

In the era before the ubiquitous use of bedside ultrasound, BNP and its derivative natriuretic peptides were, at best, a mediocre test that added little to clinical judgment. In today’s world of sonographic abundance, they simply add noise to our already deafening workflow.

Despite a wealth of evidence demonstrating natriuretic peptides’ lack of clinical utility, their use has become a reflexive component of the workup of suspected acute decompensated heart failure. While consistently failing to provide diagnostic guidance in patients where clinical uncertainty is present, natriuretic peptides have, in the eyes of many, remained a viable diagnostic pathway simply for lack of a better option.

In a recent publication in CHEST (1), Pivetta et al remind us that when presented with a diagnostic question it is important to select a test capable of providing the answer. The authors enrolled 1,005 patients presenting to the Emergency Department with acute dyspnea. Patients were excluded if they had an obvious cause of symptoms clearly unrelated to acute decompensated heart failure (e.g. trauma), or if no Emergency Physician with ultrasound expertise (defined as >40 completed scans) was present. Patients underwent a standardized workup including history, physical exam, EKG, and arterial blood gas (ABG), after which the Emergency Physician was asked to categorize the presentation as acute decompensated heart failure or non-cardiac in origin. The physician then performed a standardized point-of-care ultrasound (POCUS) examination consisting of a 6-zone scanning protocol. Diffuse interstitial syndrome (DIS) was defined as the presence of two or more zones with three or more B-lines in both lung fields. The final diagnosis was determined by a review of each patient’s hospital course performed by an Emergency Physician and a Cardiologist, both blinded to the POCUS findings (1).

Of the 1,005 patients enrolled, 463 (46%) were given a final diagnosis of acute decompensated heart failure. The agreement of the two physicians determining this gold standard was excellent, with disagreement in only 3.5% of cases. The treating physicians’ ability to clinically differentiate a cardiac from a non-cardiac cause of the presenting dyspnea was exceptionally good, with a sensitivity and specificity of 85.3% and 90% respectively. In fact the performance of POCUS alone, though numerically better (sensitivity 90.5%, specificity 93.5%), did not differ statistically from the physicians’ intrinsic diagnostic capabilities. Although each performed well in isolation, the combination of the clinical and sonographic exams significantly augmented their mutual diagnostic capabilities: the sensitivity and specificity of the physician’s judgment combined with lung US were 97% and 97.4% respectively. More important for the purposes of this post was its performance compared to the natriuretic peptides. Of the 1,005 patients, 486 had a natriuretic peptide drawn. Its ability to differentiate cardiac causes of dyspnea was worse than the unassisted judgment of the treating physician, with a sensitivity and specificity of 85% and 67.1% respectively (the threshold for a positive test was prospectively set at 400 pg/mL for BNP and, for NT-proBNP, at 450, 900, and 1,800 pg/mL for patients <50 years old, between 50 and 75 years old, and >75 years old, respectively) (1).

This study is far from perfect. It was a prospective observational study that did not enroll consecutive patients, required an Emergency Physician competent in the use of bedside US, and obtained natriuretic peptide assays in only approximately 50% of the cohort (1). And yet despite these obvious flaws, this trial serves to illustrate an important point in the interpretation of diagnostic test results. In the Emergency Department we function in varying degrees of uncertainty. We are constantly shown a single cross-section of a disease process and asked to predict its subsequent velocity and acceleration. We are expected to perform the impossible task of calculating the slope of a line from only one data point. We estimate these slopes in the form of risk; the greater the risk, the stronger the force acting to overcome our intrinsic inertia. There is a certain probability above which the risk of pathology is high enough to compel further investigation. Below this test threshold, the probability of disease and its accompanying burdens does not warrant further diagnostic consideration. Conversely, there are cases where the probability of disease is so high that the treatment threshold has already been crossed, and further diagnostic studies are incapable of lowering the risk enough to justify withholding the necessary interventions (2). As Emergency Physicians we exist in the gray zone, the area between the test and treatment thresholds. As such, it behooves us to utilize tests with the diagnostic capability necessary to shift the post-test probability toward either extreme of the continuum.



Using the more traditional test characteristics, sensitivity and specificity, it is very difficult to intuit how a particular test result will shift an individual patient’s probability of disease. Through the use of a two-by-two table we are able to determine how often a patient with the disease in question is correctly identified by a positive test result (sensitivity) and how often a patient without the disease is likely to have a negative test result (specificity). But this retrospective evaluation defines a test’s performance from the perspective of a population in which the final diagnosis is already known (3). It does little to prospectively predict the risk of an individual patient with a specific test result. In contrast, the likelihood ratio (LR) is a prospective mathematical concept describing a diagnostic test’s ability to alter a patient’s risk. Essentially, an LR is the percentage of patients with the disease who will have a specific test result divided by the percentage of patients without the disease who will have the same test result (4).
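
The two-by-two arithmetic described above is easy to make concrete. A minimal sketch with made-up counts, chosen only so the resulting sensitivity and specificity sit near the natriuretic peptide figures; these are not reconstructed trial data:

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Test characteristics from a two-by-two table."""
    sensitivity = tp / (tp + fn)  # positives detected among those WITH the disease
    specificity = tn / (tn + fp)  # negatives among those WITHOUT the disease
    return sensitivity, specificity

# Hypothetical counts for illustration only:
sens, spec = sensitivity_specificity(tp=85, fp=33, fn=15, tn=67)  # 0.85, 0.67
```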



A negative LR (−LR) is the probability that a patient with the disease will have a negative test result divided by the probability that a patient without the disease will have a negative test result. The positive LR (+LR) is the exact opposite: the probability that a patient with the disease will have a positive test result divided by the probability that a patient without the disease will have a positive test result. LRs greater than one shift the probability towards the treatment threshold, and ratios less than one shift the post-test probability in the opposite direction, towards the test threshold. The marker of a useful test is one that will consistently move the post-test probability out of this zone of uncertainty. Typically, positive and negative LRs of 10 and 0.1 are considered the minimal levels for diagnostic utility. A +LR less than 10 or a −LR greater than 0.1 will not consistently shift the post-test probability above the treatment threshold or below the test threshold (4,5).
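
These definitions translate directly into code. The sketch below derives LRs from sensitivity and specificity, then updates a pre-test probability through odds, the Bayesian step behind the Fagan nomogram; the 50% pre-test probability is an illustrative assumption. Fed the reported natriuretic peptide figures (85% and 67.1%), it yields approximately the rounded LRs of 2 and 0.2:

```python
def likelihood_ratios(sensitivity, specificity):
    """+LR = P(pos|disease)/P(pos|no disease); -LR = P(neg|disease)/P(neg|no disease)."""
    pos_lr = sensitivity / (1 - specificity)
    neg_lr = (1 - sensitivity) / specificity
    return pos_lr, neg_lr

def post_test_probability(pre_test_probability, lr):
    """Bayes via odds: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Natriuretic peptide performance reported by Pivetta et al:
pos_lr, neg_lr = likelihood_ratios(0.85, 0.671)  # ~2.6 and ~0.22
# A positive result moves an assumed 50% pre-test probability only to ~72%,
# leaving the patient stranded in the zone of uncertainty:
post = post_test_probability(0.50, pos_lr)
```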


Pivetta et al illustrated that when the Emergency Physician is confident in the clinical diagnosis, they consistently identify the presence or absence of decompensated heart failure. In these cases clinical judgment alone correctly places the patient below the test threshold or above the treatment threshold, and further diagnostic studies are not required. In the remainder of patients, where clinical judgment is insufficient, the LRs possessed by the natriuretic peptides (2 and 0.2 respectively) are insufficient to reliably shift the post-test probability out of this zone of uncertainty. Conversely, in the spectrum of patients where clinical judgment was unable to correctly differentiate decompensated heart failure from other causes of dyspnea, lung ultrasound was exceptionally useful. Pivetta et al found that when POCUS was used to augment clinical judgment, the positive and negative LRs were effectively diagnostic (22.3 and 0.03 respectively) (1).

The vast majority of the time, the Emergency Physician is more than capable of clinically identifying patients presenting in acute decompensated heart failure. In the few cases that pose a diagnostic dilemma, natriuretic peptides provide no additional diagnostic guidance. Bedside ultrasound is a swift, non-invasive tool with likelihood ratios robust enough to shift post-test probability to a clinically relevant degree. Now is the time to speak frankly about natriuretic peptides. They are diagnostic clutter, another lab value flagged as abnormal that must be acknowledged before being discarded as unhelpful. Natriuretic peptides add noise to an already uncertain baseline, making it only more difficult to detect the signal through the thunderous cacophony that is diagnostic uncertainty.

Sources Cited:

  1. Pivetta E, Goffi A, Lupia E, et al. Lung ultrasound-implemented diagnosis of acute decompensated heart failure in the Emergency Department – A SIMEU multicenter study. Chest. 2015
  2. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-17.
  3. Altman DG, Bland JM. Diagnostic tests. 1: Sensitivity and specificity. BMJ. 1994;308(6943):1552.
  4. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329(7458):168-9.
  5. Fagan TJ. Letter: Nomogram for Bayes theorem. N Engl J Med. 1975;293:257.

The Adventure of the Impassable Stone

As medical skeptics we have a tendency to revel in the negative study. We bemoan the p-value’s tendency to underestimate the risk of type I error and cite frequentist statistics’ history of getting it wrong almost as often as it gets it right. Despite these nihilistic inclinations, it is important that we are equally vigilant in identifying circumstances in which the risk of type II error is high. A number of recent trials examining the use of medical expulsive therapy (MET) in ureteral colic illustrate the risk of such errors.

The first of these trials, published by Pickard et al in The Lancet in May 2015, examined both alpha-blocker (tamsulosin 0.4 mg) and calcium-channel-blocker (nifedipine 30 mg) therapy in patients with CT-confirmed ureterolithiasis (1). The authors randomized 1,137 patients with stones 10 mm or smaller to receive tamsulosin 0.4 mg, nifedipine 30 mg, or placebo. Patients were excluded if they presented with obvious signs of sepsis, had significant renal failure (GFR <30), or required immediate invasive therapy as determined by the treating physician.

The authors found no significant difference in their primary outcome, the rate of spontaneous passage at 4 weeks, between those randomized to the tamsulosin, nifedipine, or placebo arms. Spontaneous stone passage, defined by the absence of any need for intervention to assist stone passage during the 4-week follow-up, occurred in 307 (81%), 304 (80%), and 303 (80%) patients respectively. There were also no significant differences in the need for pain medication, the number of days pain medication was required, or patients’ visual analog scale (VAS) pain scores at 4 weeks (1). By all accounts this was an impressively negative trial.

A second study was published online in July 2015 in Annals of Emergency Medicine. Like the Pickard et al trial, this trial by Furyk et al examined the effects of MET in patients with CT-confirmed ureterolithiasis (2). The authors randomized patients with stones 10 mm or smaller located in the distal ureter to either MET with tamsulosin 0.4 mg or placebo. Patients were excluded if they demonstrated signs of infection or presented with a compromised GFR. And like the previous study, the authors found no statistical difference in the number of patients who experienced stone passage at 28 days (87.0% and 81.9% in the tamsulosin and placebo groups respectively) (2). We now have two high-quality RCTs demonstrating that MET is not beneficial in the management of acute ureteral colic. This should conceivably end the debate regarding the utility of alpha blockade for ureteral colic.

And yet, despite what at first glance appears to be convincing evidence, neither of these trials addresses the pressing question regarding MET. The majority of patients in both trials had stones less than 5 mm in diameter, and most small stones will pass without difficulty (6,7). As these trials demonstrate, it is exceedingly hard to show a statistically significant difference in an undifferentiated cohort of renal colic patients. The real question is: does MET work in patients with stones greater than 5 mm in diameter? Can these trials definitively demonstrate a lack of utility of MET in these patients?

To examine this question appropriately we must first define statistical power. Power is the ability of a trial to detect a statistically significant difference between two groups when a true difference exists (3). It is the ability to separate true positives from false negatives, essentially the trial’s sensitivity. Traditionally, acceptable statistical power has been set at 80% or 90%. The true meaning of such a statement is nebulous, and statistical power becomes far easier to understand when expressed in quantifiable terms.
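
Power is fixed in advance through a sample size calculation. A common normal-approximation formula for comparing two proportions can be sketched as follows; the passage rates below are hypothetical and are not the assumptions used by either trial:

```python
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate patients per arm to distinguish proportions p1 vs p2
    with two-sided alpha = 0.05 (z = 1.96) and 80% power (z = 0.8416)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: 50% spontaneous passage vs 70% with treatment
n = n_per_group(0.50, 0.70)  # ~91 patients per arm
```

Note how the required sample size grows with the square of the inverse of the effect size: halve the detectable difference and you need roughly four times the patients, which is why trials powered for a 10% absolute difference cannot speak to smaller effects.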

The Pickard et al trial based its sample size calculation on the ability to detect a 10% absolute difference between the tamsulosin group and its comparators with a power of 90% (1). What this translates to is: if the observed difference between the tamsulosin group and its comparators were zero (p = 1.0), the trial would not be able to confidently rule out an absolute difference as large as 6%. Conversely, if the trial did in fact find a 10% improvement in patients randomized to alpha blockade, this effect size could range from as low as 4% to as high as 16%. In fact, this is exactly what they found. The 95% confidence interval surrounding the 1% absolute risk reduction (ARR) in patients randomized to receive tamsulosin was −4.4% to 6.9%. Conversely, in the subset of patients with stones greater than 5 mm in width, Pickard et al observed an absolute difference of 10% in the rate of stone passage at 4 weeks in favor of those randomized to receive tamsulosin. This difference did not reach statistical significance.

It is important to note that power is a prospective concept, calculated before the results of a study are known. To retrospectively state that a trial is underpowered once the results are known is somewhat disingenuous. The claim that the observed difference is true and failed to reach statistical significance only because of an inappropriately small sample size may in fact be correct, but it is not justifiable from the data alone. Any post-hoc power calculation performed on such a data set will inevitably demonstrate a limited ability to differentiate a true difference from the null hypothesis (4). Once the trial results are obtained, post-hoc power calculations should be avoided; focusing instead on the confidence intervals surrounding the point estimates allows a more honest interpretation of the data (3). In this case, we are unable to differentiate a 10% difference in stone passage from no effect. In fact, the 95% confidence interval ranged from −2.8% to 23.6% (1). Clearly this trial was not designed to answer the question of whether MET is beneficial in patients with large-diameter ureteral stones.
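
Confidence intervals of this kind come from the difference of two proportions. A sketch using a simple Wald interval; the denominators of 379 per arm are back-calculated from the reported percentages and should be treated as assumptions, and the trial's own adjusted interval differs slightly:

```python
import math

def risk_difference_ci(e1, n1, e2, n2, z=1.96):
    """Difference in proportions (group 1 minus group 2) with a Wald 95% CI."""
    p1, p2 = e1 / n1, e2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# 307/379 passed with tamsulosin vs 303/379 with placebo (denominators assumed)
diff, lower, upper = risk_difference_ci(307, 379, 303, 379)
# The interval spans zero: a ~1% ARR cannot be distinguished from no effect.
```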

The results of the Furyk trial are even more compelling. Though the primary endpoint was the overall proportion of patients with stone passage at 28 days, the authors powered their study for an entirely different question: detecting a difference in the rate of stone passage in patients with larger stone diameters (5-10 mm). The authors calculated they would require 98 patients with stones greater than 5 mm to detect a 20% difference in stone passage with 80% power (2). This means that if no difference was observed, the authors would be unable to exclude a difference as large as 14%. While their primary outcome was negative, in the subgroup of patients this study was powered to examine, the authors found a 22.4% absolute difference in the rate of stone passage at 28 days. The confidence interval surrounding this point estimate ranged from 3.1% to 41.6%. Although it is unwise to make claims of significance based on a secondary endpoint with such a wide confidence interval, it is equally unfair to use this data to disprove a hypothesis the trial was not designed to refute.

We are all aware of the hazards of subgroup analyses, and yet it is important to be honest in our skepticism. This should in no way be viewed as an endorsement of MET, or of the necessity of obtaining imaging to identify a subgroup of patients who may benefit from tamsulosin. On the contrary, these trials demonstrate that for the majority of patients presenting to the Emergency Department with renal colic, MET provides little additional benefit beyond symptomatic treatment. But a trial can only answer the question it was designed to ask, and neither of these trials was built to confidently address whether MET is beneficial in patients presenting with larger stones. Earlier trials examining this question are either so confounded by non-blinding and selection bias as to be uninterpretable, or suffer from the same deficiencies in statistical power to confidently address the effects of MET in patients with larger stones (5). We are left with statistical and philosophical uncertainty regarding the utility of alpha-blockers in acute ureteral colic, and will continue to exist in this state of ambiguity until we have a study sufficiently powered to ask whether MET is efficacious in patients with large ureteral stones. Many would love to discard alpha-blockers for renal colic onto our ever-growing pile of medical impotencies, but given the current state of the literature, this renouncement would be premature and unjust.

Sources Cited:

  1. Pickard R, Starr K, Maclennan G, et al. Medical expulsive therapy in adults with ureteric colic: a multicentre, randomised, placebo-controlled trial. Lancet. 2015
  2. Furyk JS, et al. Distal ureteric stones and tamsulosin: a double-blind, placebo-controlled, randomized, multicenter trial. Ann Emerg Med. Published online July 17, 2015.
  3. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121(3):200-6.
  4. Goodman SN. A comment on replication, P-values and evidence. Stat. Med. 1992;11:875-9.
  5. Campschroer T, Zhu Y, Duijvesz D, et al. Alpha-blockers as medical expulsive therapy for ureteral stones. Cochrane Database Syst Rev. 2014: CD008509
  6. Coll, D.M., Varanelli, M.J., and Smith, R.C. Relationship of spontaneous passage of ureteral calculi to stone size and location as revealed by unenhanced helical CT. AJR Am J Roentgenol. 2002; 178: 101–103
  7. Miller, O.F., Kane, C.J. Time to stone passage for observed ureteral calculi: a guide for patient education. J Urol. 1999;162:688–690 (discussion 690-691).

The Case of the Non-inferior Inferiority


The practice of Frequentist statistics is often a study in extremes. Based on an arbitrary threshold of significance, we are asked to interpret data as either positive or negative when in reality they merely shift our degree of certainty. Even more importantly, because of the singular nature of Frequentist statistics, our interpretation of data is often constrained to the questions posed by those designing the trial. Although a strict deductive methodology is important to prevent mistaking random chance for scientific proof, it is equally important to understand in which instances abiding by these laws will lead to a misinterpretation and misunderstanding of the data.

Appendicitis has long been considered a surgical emergency. If it is not intervened upon surgically in a timely fashion the pathological sequelae will lead to perforation, sepsis, and death. And yet, despite this foregone conclusion, a number of trials have challenged the necessity of cold steel in the management of acute appendicitis. Most recently, in JAMA, Salminen et al published the findings from their RCT comparing the traditional surgical management of acute appendicitis to conservative treatment with antibiotic therapy alone (1). Despite the authors’ primary conclusion, this trial demonstrated that in patients with non-complicated acute appendicitis, the use of antibiotic therapy is anything but inferior.

Salminen et al randomized 530 patients with CT-confirmed non-complicated acute appendicitis to either surgical management, performed primarily via open laparotomy, or a short course of IV antibiotics (3 days of ertapenem) followed by a 7-day course of oral levofloxacin. Of the 273 patients randomized to the surgical group, 272 (99.6%) underwent successful appendectomy. Of the patients randomized to conservative therapy, 70 (27.3%) underwent appendectomy within one year of initial presentation. Let's pause for a moment. A disease process which for the past century has been considered a surgical necessity was treated successfully with antibiotics alone in 72.7% of patients (1). Despite these impressive numbers the trial was deemed unsuccessful, as the rate of “treatment failure” in the conservative group crossed the predetermined non-inferiority margin of 24%. And yet these statistical inadequacies are based less on the inferiority of antibiotic therapy and more on the authors’ unfortunate choice of how exactly they defined “non-inferior”.

Non-inferiority trials are intended to ask a very specific question: whether a new treatment strategy or medical intervention is comparable to the traditional standard therapy. Rather than examining the two in the hope of determining superiority, a non-inferiority trial merely attempts to establish that the new treatment is no worse than the current standard of care. This type of trial is undertaken when the new treatment provides certain advantages that would make it preferable to the old treatment (2,3). For example, if it is cheaper, safer, or less invasive, one might prefer this new treatment rather than expose the patient to the cost, risk, or intrusive nature of the prior strategy. In fact, depending on what advantages a new treatment may provide, one might accept some degradation in efficacy as long as it does not cross a predefined threshold for inferiority. This threshold is based upon a number of assumptions. First, what is the proven efficacy of the established standard? Say, for example, this standard in previous studies demonstrated an absolute decrease in mortality of 5%, with a confidence interval surrounding this point estimate ranging from 3% to 7%. You would not want to allow your new intervention to be more than 3% less effective than the standard comparator, for at that point it could be no more beneficial than placebo. Second, what added benefits does this new therapy provide? If these advantages are impressive, then you may accept a greater degree of inferiority compared to the standard treatment strategy (a wider non-inferiority margin). On the other hand, if this new treatment provides few novel advantages, you would likely accept far less deviation from the standard treatment’s efficacy.
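The margin logic above can be sketched numerically. A minimal, hedged example (a simple Wald interval on a single arm; the function name and the APPAC-like round counts are illustrative assumptions, not the trial's actual analysis):

```python
import math

def meets_margin(successes, n, reference_rate, margin, z=1.96):
    """Single-arm non-inferiority check: the new therapy passes only if
    the LOWER bound of the Wald CI for its success rate stays above
    (reference success rate - margin). Illustrative only."""
    p = successes / n
    lower = p - z * math.sqrt(p * (1 - p) / n)
    threshold = reference_rate - margin
    return p, lower, lower > threshold

# APPAC-like numbers: 187/257 antibiotic successes vs 272/273 surgical
# successes, tested against a 24-point absolute margin.
p, lower, ok = meets_margin(187, 257, 272 / 273, 0.24)
print(f"success={p:.3f}, CI lower bound={lower:.3f}, non-inferior={ok}")
```

With these inputs the point estimate itself (72.8%) already sits below the roughly 75.6% threshold, so no confidence interval could rescue it.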

Salminen et al utilized neither of these considerations when calculating their non-inferiority margin. In fairness to the authors, it would be exceedingly difficult to accurately assess the true efficacy of surgery over placebo, as this standard of care was established long before placebo-controlled trials were utilized to define treatment effect. Where the authors did falter was the manner in which they determined their non-inferiority margin and performed their power calculation. Using data from prior studies examining the efficacy of antibiotic therapy in acute appendicitis, the authors estimated a 25% rate of treatment failure (defined as need for surgical intervention within one year of initial presentation) in the patients randomized to conservative treatment (1). Using this estimate they set their non-inferiority margin at no more than 24% treatment failure in patients randomized to antibiotic therapy, essentially dooming their trial from its earliest power calculations.

Non-inferiority trials ask a different question than the traditional superiority trials to which we are more accustomed. Rather than presenting a null hypothesis that states there is no difference between the groups, the non-inferiority trial design operates under the assumption that the novel intervention is inferior to the standard treatment; the alternative hypothesis states that the treatment options are equivalent. In order to reject the null hypothesis the novel treatment must demonstrate near-equivalent efficacy within a degree of certainty. This means that both the point estimate and its surrounding confidence interval must fall above the non-inferiority margin (2,3). In this case, despite all prior evidence demonstrating the contrary, the authors estimated that 275 patients per group would provide 90% power to demonstrate the non-inferiority of conservative management for acute appendicitis when compared to the more traditional surgical intervention. Essentially this translates into the non-surgical group having to demonstrate a point estimate of approximately 20% treatment failure within one year for the upper end of the confidence interval not to cross their predefined non-inferiority margin. Further hampering their efforts, the authors halted the trial early after enrolling only 530 patients (rather than the 610 planned in the original power calculation), widening the already wide confidence interval surrounding their point estimate (1).
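To see why the margin doomed the trial, one can scan for the largest observed failure rate whose 95% interval would still clear a 24% margin with roughly 257 patients per arm. A rough sketch (Wald interval, brute-force scan; the numbers approximate the trial's, not its actual power calculation):

```python
import math

def max_passing_rate(n, margin, z=1.96, step=0.001):
    """Largest observed failure proportion whose Wald CI upper bound
    still stays below the non-inferiority margin (brute-force scan)."""
    best, p = 0.0, 0.0
    while p <= margin:
        upper = p + z * math.sqrt(p * (1 - p) / n)
        if upper < margin:
            best = p
        p += step
    return best

# With ~257 patients per arm and a 24% absolute margin:
print(round(max_passing_rate(257, 0.24), 3))
```

The scan lands near a 19% failure rate, consistent with the estimate that roughly 20% observed failure would have been required to declare non-inferiority.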

It should have come as no surprise that the authors failed to demonstrate non-inferiority by their designated definition. The authors found that 27% of patients randomized to antibiotic therapy required an appendectomy within 1 year of initial presentation; the 95% confidence interval surrounding this point estimate was 22.0% to 33.2% (1). In the two trials used to justify their non-inferiority margin of 24%, the 1-year failure rates in patients treated with antibiotics were cited as 24% and 23.6%, respectively (4,5). Unfortunately, in the latter of these two trials, by Hansson et al, this failure rate was calculated from the per-protocol analysis rather than the intention-to-treat analysis. In reality the antibiotic group had a 47.5% crossover rate to surgery, and the overall failure rate in the intention-to-treat analysis was 60% (5). In an additional trial by Vons et al, published in the Lancet in 2011, the 1-year appendectomy rate was 37%, with a 95% confidence interval ranging from 28.36% to 45.64% (6). The 2011 Cochrane analysis, after examining the 5 existing RCTs, found that 26.6% (95% confidence interval 18.1%–37.3%) of the patients randomized to antibiotic therapy went on to have an appendectomy within 1 year of initial presentation (7). Given that the previous evidence indicates the rate of antibiotic failure has consistently been greater than 25% and has ranged as high as 60%, the expectation by Salminen et al that they would find non-inferiority of antibiotic therapy with a margin of 24% was optimistic to say the least.

More importantly, was the appendectomy rate at 1 year truly the most appropriate criterion with which to define inferiority? This trial was not negative because medical management proved inferior to surgical appendectomy; rather, it was negative because the authors asked the wrong question. As clinicians, what is our concern with the medical management of acute appendicitis? It is not whether 20% or 27% of those initially treated with antibiotics will eventually require an appendectomy, but rather whether medical therapy leads to an unacceptably high rate of serious complications. In fact, if we were to be completely equitable, while 99.6% of the patients in the surgical arm of this trial underwent appendectomies, only 27% of the patients in the medical management arm were exposed to an invasive procedure. The question the authors should have asked was, “How many patients in each arm experienced resolution of symptoms related to acute appendicitis without experiencing acute complications related to delays in treatment (perforation, abscess, sepsis, etc)?” If the authors had asked this question their answer would have been entirely different. Of the 257 patients randomized to medical management, 15 (5.8%) required appendectomy during their initial hospital admission, and only 5 (1.9%) experienced perforations requiring surgical intervention, compared to 2 of 273 (0.7%) patients randomized to immediate surgical intervention (1). Essentially you would have to operate immediately on roughly 80 to 100 patients with non-complicated acute appendicitis in order to prevent one perforation.
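The back-of-the-envelope arithmetic from the perforation counts quoted above (5/257 with antibiotics vs 2/273 with immediate surgery) can be checked directly; the exact quotient lands in the low 80s:

```python
# Absolute risk reduction in perforation attributable to immediate
# surgery, using the counts quoted from the trial.
arr = 5 / 257 - 2 / 273   # ~1.9% - ~0.7%
nnt = 1 / arr             # immediate operations per perforation prevented
print(f"ARR = {arr:.4f}, NNT = {nnt:.0f}")
```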

Certainly there is a great deal to be determined before this non-invasive strategy can be considered mainstream practice. This was a small, underpowered cohort in which the participating surgeons performed primarily open laparotomies. How this strategy translates to the US, where the primary approach to appendectomy is laparoscopic, is unclear. Additionally, whether patients require 3 days of broad-spectrum IV therapy followed by a 7-day course of oral therapy is unknown. What seems obvious is that, in what was once considered an exclusively surgical disease, the majority of patients can effectively be managed conservatively. Despite not meeting their own high standards for non-inferiority, the authors demonstrated that most patients with acute appendicitis treated conservatively with antibiotics can avoid surgical intervention without complications from delays to definitive care. To define such a revelation as inferior is unjust indeed.

Sources Cited:

  1. Salminen P, Paajanen H, Rautio T, et al. Antibiotic Therapy vs Appendectomy for Treatment of Uncomplicated Acute Appendicitis: The APPAC Randomized Clinical Trial. JAMA. 2015;313(23):2340
  2. Kaji AH, Lewis RJ. Noninferiority Trials: Is a New Treatment Almost as Effective as Another?. JAMA. 2015;313(23):2371-2.
  3. Kaul S, Diamond GA. Good Enough: A Primer on the Analysis and Interpretation of Noninferiority Trials. Ann Intern Med. 2006;145:62-69
  4. Styrud J, Eriksson S, Nilsson I, et al. Appendectomy versus antibiotic treatment in acute appendicitis: a prospective multicenter randomized controlled trial. World J Surg. 2006;30(6):1033-1037.
  5. Hansson J, Körner U, Khorram-Manesh A, Solberg A, Lundholm K. Randomized clinical trial of antibiotic therapy versus appendicectomy as primary treatment of acute appendicitis in unselected patients. Br J Surg. 2009;96(5):473-481.
  6. Vons C, Barry C, Maitre S, et al. Amoxicillin plus clavulanic acid versus appendicectomy for treatment of acute uncomplicated appendicitis: an open-label, non-inferiority, randomised controlled trial. Lancet. 2011;377(9777):1573-1579.
  7. Wilms IM, de Hoog DE, de Visser DC, Janzing HM. Appendectomy versus antibiotic treatment for acute appendicitis. Cochrane Database Syst Rev. 2011;(11):CD008359.


The Case of the Irregular Irregularity


We have proven ourselves highly capable of managing atrial fibrillation in the Emergency Department. In recent years, a number of prospective cohorts have demonstrated that with the use of IV anti-arrhythmic medication and electrical cardioversion, patients presenting to the Emergency Department with new-onset atrial fibrillation can be discharged in sinus rhythm consistently and with minimal adverse events. In 2010, Stiell et al published a case series of 660 patients who were cardioverted in the Emergency Department (1). What they coined the “Ottawa Aggressive Protocol” consisted of chemical rate control followed by a trial of procainamide loaded over an hour and, if this failed to convert the patient, DC electrical cardioversion. Using this protocol, Stiell et al cite the number of patients who were discharged home in normal sinus rhythm as 595 (90.2%). In a recent systematic review published in the European Journal of Emergency Medicine, Coll-Vinent et al found that of patients who underwent Emergency Department cardioversion, 78.2%–100% were discharged home in normal sinus rhythm (2).

But competency is not directly translatable into efficacy. Despite this proof of concept, there is limited data examining the patient-oriented benefits these aggressive rhythm control strategies produce. In fact, the majority of such studies employ “rhythm at Emergency Department discharge” as their measure of success. And though being discharged from the Emergency Department in sinus rhythm seems preferable to atrial fibrillation, little is known regarding the extent of this benefit, as very few trials rigorously monitored patients following discharge from the Emergency Department. How many of these patients remained in sinus rhythm, and for how long? Stiell et al found that only 8.6% of their cohort returned to the Emergency Department within one week of cardioversion with any recurrence of atrial fibrillation. Unfortunately these numbers were calculated from a chart extraction of the Ottawa Hospital health records database and do not directly reflect the number of patients who experienced atrial fibrillation over the 7 days following Emergency Department discharge (1). Decker et al, in a small cohort of 150 patients, cite a recurrence rate of 10% at 6 months (3). What is the true recurrence rate? Even more importantly, does reestablishing sinus conduction lead to improved patient health and wellbeing?

The question at hand remains: what exactly are we achieving by performing cardioversions in the Emergency Department? We have known for some time that despite being capable of maintaining patients in sinus rhythm with moderate success, an aggressive rhythm control strategy does not prevent the long-term sequelae associated with atrial fibrillation. The AFFIRM trial, published in the NEJM in 2002, demonstrated that in a cohort of 4060 patients with atrial fibrillation, although the use of a rhythm control strategy reduced the time patients spent in atrial fibrillation, it did not reduce the rate of death, MI, or ischemic stroke (4). When the 1391 patients experiencing their first episode of atrial fibrillation or the 1252 patients presenting within 48 hours of symptom onset were examined separately, no additional benefit was discovered (4). Since the AFFIRM trial’s publication, a number of studies performed in various subsets of atrial fibrillation patients have validated that rhythm control strategies do not prevent the long-term sequelae associated with this chronic disease (5,6).

Since rate control is the preferred long-term treatment strategy for atrial fibrillation, what exactly are our goals for cardioversion in the Emergency Department? Is there a long-term health benefit to aggressive rhythm control in the Emergency Department? Does this lead to noticeable improvements in patient outcomes? Unfortunately conclusive data on these questions have yet to be published. The few RCTs examining the benefits of aggressive management of atrial fibrillation in the Emergency Department are small and inconclusive. Despite this paucity of convincing evidence, I would argue that the mathematical likelihood of benefit is incredibly low. Atrial fibrillation is a chronic disease, with sequelae measured in events per patient-year. The rate of short-term adverse events is exceedingly low, with some cohorts citing a 30-day event rate of less than 1% (7). To design a study powered to identify a statistically meaningful difference, the sample size required would be unrealistically high, especially given that the long-term utilization of such rhythm control strategies has not yielded clinically important improvements in patient outcomes. Furthermore, the act of emergent cardioversion does not avert the need for anticoagulation, as this decision should be based on the patient’s risk of thromboembolic events, independent of their rhythm at discharge (8).
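The "unrealistically high" sample size is easy to make concrete. A rough normal-approximation calculation (a standard textbook formula; the 1%-to-0.5% scenario is a hypothetical chosen to match the less-than-1% 30-day event rate cited above):

```python
import math

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate per-arm sample size to detect a difference between
    two proportions (unpooled normal approximation; the default z
    values correspond to two-sided alpha 0.05 and 80% power)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

# Hypothetical: halving a 1% 30-day adverse event rate to 0.5%
print(n_per_arm(0.01, 0.005))
```

Even detecting a halving of a 1% event rate demands several thousand patients per arm, a trial no one is likely to run for Emergency Department cardioversion.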

If we can agree that the clinical benefits of aggressive cardioversion in the Emergency Department are minimal, then the only remaining justification for Emergency Department cardioversion is its positive effect on patient wellbeing and comfort. The current argument in support of Emergency Department cardioversion hinges on the supposition that a state of sinus regularity is preferred to the electrical chaos induced by atrial fibrillation (9). Until recently this claim was supported exclusively by anecdotal descriptions of patient experience; its validity had never been examined in a prospective fashion.

Published online in June 2015 in the Annals of Emergency Medicine, Ballard et al sought to objectively assess the effects of Emergency Department cardioversion on patients’ wellbeing and comfort (10). The authors surveyed 730 patients who were treated for new-onset atrial fibrillation and discharged from one of 21 medical centers in Northern California. Of this cohort, 652 (89%) responded to a structured phone survey. Though the data was prospectively gathered, these patients were not randomized to either a rate or rhythm control strategy; rather, the manner of treatment was left entirely to the judgment of the treating physician. Of the 652 respondents the majority, 432 (67.3%), were managed with rate control therapy alone. Regardless of management strategy, 410 (62.9%) of the patients were discharged from the Emergency Department in sinus rhythm. Among those patients who underwent electrical cardioversion, 92.2% were in sinus rhythm upon discharge. If you consider discharge rhythm a metric of success, then electrical cardioversion was a far more successful strategy than either pharmacological cardioversion or rate control therapy alone, which discharged 81.6% and 49.7% of patients in sinus rhythm, respectively (10). Despite its obvious superiority in rhythmic control, what benefit does cardioversion provide for patients’ symptom burden at 30 days?

The authors measured 30-day wellbeing using the Atrial Fibrillation Effect on Quality-of-life (AFEQT) score, an 18-question tool intended to assess the patient’s perception of the burden of disease. The surveys were administered via telephone by trained research assistants at least 28 days following the Emergency Department visit. Overall, 539 patients (82.7%) reported some degree of symptom burden related to their atrial fibrillation following discharge. The use of cardioversion did not decrease the rate or degree of symptom burden at 30 days. When the authors analyzed the AFEQT scores in quartiles of severity rather than as a dichotomous symptom/no-symptom outcome, they again found no additional benefit to Emergency Department cardioversion. Certainly this data is far from perfect. This was a non-randomized cohort and it is unclear how well the AFEQT score captures symptom burden (10). Despite these shortcomings, the findings are consistent with the body of literature examining whether an aggressive rhythm control strategy improves patient wellbeing. A number of trials have examined the long-term benefits of rhythm control in reducing symptom burden, and they have consistently demonstrated that when compared to rate control alone, an aggressive rhythm control strategy provides no additional perceivable benefit to patients’ wellbeing and comfort (11).

The act of electrical cardioversion within 48 hours of symptom onset is commonly perceived as a safe practice. In a recent review of the existing literature, Cohen et al found that out of 1593 patients, only one (0.06%) stroke was reported. Despite this cursory endorsement, I would caution that safety is measured in the thousands, and the current data is far too limited and rife with publication bias to truly assess safety. Additionally, a recent research letter published in JAMA called into question the safety of the 48-hour window we have traditionally used to determine suitability for Emergency Department cardioversion. Nuotio et al published a secondary analysis of the FinCV trial registry, which examined 2481 patients in atrial fibrillation who underwent electrical cardioversion within 48 hours of symptom onset. In this cohort the risk of ischemic events increased significantly (0.03% to 1.1%) when the time from symptom onset was greater than 12 hours. And although 1.1% is still a relatively low event rate, given the absence of any clear clinical benefit, the benefit-harm ratio does not favor an aggressive rhythm control strategy (12).

Modern medicine far too often values competency over efficacy. Whether it is door-to-balloon time or the 6-hour sepsis bundle, we are constantly measured in surrogates thought to be associated with improvements in patient outcomes. The quality of our care has been distilled down to what can be marked as complete on a checklist. Although the evidence clearly demonstrates Emergency Physicians are capable of effectively cardioverting new-onset atrial fibrillation in the Emergency Department, one cannot help but ask: to what end?

Sources Cited:

  1. Stiell, I.G., Clement, C.M., Perry, J.J. et al. Association of the Ottawa Aggressive Protocol with rapid discharge of emergency department patients with recent-onset atrial fibrillation or flutter. CJEM. 2010; 12: 181–191
  2. Coll-Vinent, B., Fuenzalida, C., Garcia, A. et al. Management of acute atrial fibrillation in the emergency department: a systematic review of recent studies. Eur J Emerg Med. 2013; 20: 151–159
  3. Decker, et al. A Prospective, Randomized Trial of an Emergency Department Observation Unit for Acute Onset Atrial Fibrillation.  Annals of Emergency Medicine, 2007.
  4. Wyse DG, Waldo AL, Dimarco JP, et al. A comparison of rate control and rhythm control in patients with atrial fibrillation. N Engl J Med. 2002;347(23):1825-33.
  5. Van gelder IC, Hagens VE, Bosker HA, et al. A comparison of rate control and rhythm control in patients with recurrent persistent atrial fibrillation. N Engl J Med. 2002;347(23):1834-40.
  6. Roy D, Talajic M, Nattel S, et al. Rhythm control versus rate control for atrial fibrillation and heart failure. N Engl J Med. 2008;358(25):2667-77.
  7. Scheuermeyer FX, Grafstein E, Stenstrom R, et al. Thirty-day and 1-year outcomes of emergency department patients with atrial fibrillation and no acute underlying medical cause. Ann Emerg Med. 2012;60(6):755-765.e2.
  8. Wang TJ, Massaro JM, Levy D, et al. A risk score for predicting stroke or death in individuals with new-onset atrial fibrillation in the community: the Framingham Heart Study. JAMA. 2003;290:1049-56
  9. Stiell IG, Birnie D. Management of recent-onset atrial fibrillation in the emergency department. Ann Emerg Med. 2011; 57:31–2.
  10. Ballard, DW. et al. Emergency Department Management of Atrial Fibrillation and Flutter and Patient Quality of Life at One Month Postvisit. Annals of Emergency Medicine
  11. Thrall G, Lane D, Carroll D, Lip GY. Quality of life in patients with atrial fibrillation: a systematic review. Am J Med. 2006;119(5):448.e1-19.
  12. Nuotio I, Hartikainen JE, Grönberg T, Biancari F, Airaksinen KE. Time to cardioversion for acute atrial fibrillation and thromboembolic complications. JAMA. 2014;312(6):647-9.

The Problem of Thor Bridge


Disclosure: This post is unusually full of hearsay and conjecture. Like a secondary endpoint that flirts with statistical significance, it should be viewed purely as hypothesis generating. For a more reasoned and experienced view of the following data, please read Josh Farkas’s wonderful post on

Damage control ventilation is not a novel concept. It functions under the premise that positive-pressure ventilation intrinsically possesses few curative properties and rather acts as a bridge until a more suitable state of ventilatory well-being can be achieved. As such, we should view its utilization as a necessary evil and endeavor not to correct the patient’s pathological perturbations but rather to limit its iatrogenic harms. Since the publication of the ARDSNet protocol in 2000 we have known that striving to achieve physiological normality leads to greater parenchymal injury and downstream mortality (1). Later research demonstrated that even in patients without fulminant ARDS, a protective lung strategy is beneficial (2). Understandably, we are reluctant to initiate mechanical ventilation unless absolutely necessary. Because of its ability to delay and even prevent more invasive forms of ventilatory support, non-invasive ventilation (NIV) has long been the darling of the emergent management of most respiratory complaints. It is a rare respiratory ailment that cannot be remedied with a tincture of positive-pressure ventilatory support delivered via a form-fitting face mask. Its widespread implementation is primarily borne of NIV’s capacity to provide a bridge to a more definitive form of therapeutic support. Due in part to NIV’s ability to decrease the rate of intubation in patients presenting with COPD and CHF exacerbations, it is increasingly being utilized in a subgroup of patients for whom a definitive destination is far less assured, a group of patients in whom the cause of the current dyspnea is not so readily correctable. A bridge, if you permit me a moment of sensationalism, to nowhere…

Although the efficacy of NIV in COPD exacerbations and acute cardiogenic pulmonary edema is well documented (3,4,5,6,7), the evidence for its use in managing other forms of hypoxic failure, such as pneumonia and ARDS, is far less robust. In fact there is some less-than-perfect evidence demonstrating that in these populations NIV fails to prevent intubation, and that in the subset of patients who fail a trial of non-invasive ventilatory support, mortality is higher than in those who were initially intubated (8,9). And so the authors of the “Clinical Effect of the Association of Non-invasive Ventilation and High Flow Nasal Oxygen Therapy in Resuscitation of Patients with Acute Lung Injury (FLORALI)” trial hoped to examine whether NIV was superior to standard face-mask oxygen therapy in patients with acute hypoxic respiratory failure (10). Frat et al examined two forms of non-invasive ventilatory strategies in patients admitted to the ICU with non-hypercapnic, non-cardiogenic hypoxic respiratory failure. The first was traditional bi-level positive pressure ventilation, more commonly known as BPAP. The second was high-flow (50 L/min) humidified oxygen delivered via nasal cannula. Using a 1:1:1 ratio the authors randomized 313 patients to either BPAP, high-flow nasal cannula, or standard face-mask support. The authors enrolled a relatively sick spectrum of patients. In order to be enrolled, patients were required to have a respiratory rate of more than 25 breaths per minute, a PaO2/FiO2 of 300 mm Hg or less while on 10 L of supplementary O2, and a PaCO2 of no higher than 45 mm Hg, with no history of underlying chronic respiratory disease.
Additionally, patients were excluded if they presented with an exacerbation of asthma or COPD, cardiogenic pulmonary edema, severe neutropenia, hemodynamic instability, use of vasopressors, a GCS of 12 or less, any contraindication to non-invasive ventilation, an urgent need for intubation, or DNI orders. Given these stringent inclusion and exclusion criteria it is no surprise that of the 2506 patients who presented to one of the 23 participating ICUs, only 525 met the criteria for inclusion. Of these, 313 underwent randomization and 310 were included in the final analysis (10).

The cause of hypoxia in the vast majority (75.5%) of these patients was pneumonia. The authors’ primary endpoint was the number of patients in each group who underwent endotracheal intubation within 28 days of enrollment. Although the authors found no statistical difference in the rate of intubation between the three groups, it is difficult not to infer a clinically important difference that was statistically overlooked due to the limited power generated by an n of 310. The 28-day intubation rate in the high-flow O2 group was 37%, compared to 47% and 50% in the face-mask and BPAP groups respectively (an absolute difference of 10% and 13% respectively). When the more severely hypoxic patients were examined (those with a PaO2/FiO2 < 200), these absolute differences increased to 18% and 23% respectively. Additionally, patients randomized to high-flow O2 had lower mortality rates compared to either the face-mask or BPAP groups: ICU mortality was 11%, 19%, and 25% respectively, and 90-day mortality was 12%, 23%, and 28% respectively. In patients with a PaO2/FiO2 < 200 these differences became even more pronounced: ICU mortality was 12%, 21.6%, and 28.4%, while 90-day mortality was 13.2%, 27.0%, and 32.1%. Although the primary endpoint of this trial was negative (p = 0.18), there is a clear and consistent improvement in the outcomes of patients randomized to high-flow O2 compared to the other two strategies (10).
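How close the headline comparison comes to conventional significance can be illustrated with a simple two-proportion z-test. The counts below are assumptions (round arms of 100 patients, approximating the reported 37% vs 50% intubation rates), not the published analysis:

```python
import math

def two_prop_z(e1, n1, e2, n2):
    """Two-sided two-proportion z-test (pooled variance, normal
    approximation)."""
    p1, p2 = e1 / n1, e2 / n2
    pool = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Assumed: 37/100 intubated on high-flow O2 vs 50/100 on BPAP
z, p_value = two_prop_z(37, 100, 50, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A 13-point absolute difference at this sample size hovers just above p = 0.05, exactly the territory where a real effect can be "statistically overlooked."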

This trial is nowhere near perfect. The sample size is far too small to confidently rule out statistical whimsy as the cause of these findings. Additionally, it is difficult to discern whether high-flow O2 was beneficial in this subgroup of patients or rather BPAP was deleterious. Most importantly, it fails to address the question of primary concern for the Emergency Physician: is non-invasive ventilation preferable to early endotracheal intubation? Frat et al compared high-flow O2 and BPAP therapy to standard face-mask oxygenation, which does not help us determine whether NIV is superior to early invasive ventilatory support. Furthermore, this trial examined the use of NIV in ICU patients over prolonged periods (median time to intubation was 17–27 hours); it does not tell us whether the use of BPAP is detrimental while patients are managed in the Emergency Department. Given these shortcomings, how should we view these data?

Technically, from a Frequentist’s viewpoint these statistically significant secondary endpoints are merely hypothesis generating, and additional studies are required to validate these preliminary findings. But what if, for a moment, we were to take a Bayesian perspective and examine this very same paper from an alternative vantage? How then would this data appear? Bayesian statistics takes an inductive perspective when examining data. Simply put, it asks how the data affect the prior scientific belief. Given the data presented in this trial, what is the most probable hypothesis that explains these results (12)? How do these results change the scientific belief that was held before this study was conducted? Alternatively, when using Frequentist statistics we employ deductive methodology to address one question and utilize a predetermined statistical threshold to either accept or reject the null hypothesis. All other questions examined in the paper are essentially exploratory and, due to the single-minded nature of the p-value, are simply hypothesis generating (11).
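As a toy illustration of the Bayesian reading, one can put flat Beta(1,1) priors on each arm's intubation probability and ask how likely it is that high-flow's true rate is lower than BPAP's. The counts are assumptions (round arms of 100 approximating the reported 37% vs 50% rates), and the uniform prior deliberately encodes no prior skepticism:

```python
import random

random.seed(0)

def prob_a_below_b(events_a, n_a, events_b, n_b, draws=100_000):
    """Monte Carlo estimate of P(rate_A < rate_B) under independent
    Beta(1,1) priors, i.e. sampling from each arm's Beta posterior."""
    hits = 0
    for _ in range(draws):
        a = random.betavariate(1 + events_a, 1 + n_a - events_a)
        b = random.betavariate(1 + events_b, 1 + n_b - events_b)
        hits += a < b
    return hits / draws

# Assumed: 37/100 intubations on high-flow O2 vs 50/100 on BPAP
print(prob_a_below_b(37, 100, 50, 100))
```

Under these assumptions the posterior probability that high-flow beats BPAP on intubation comes out well above 90%, substantial belief from data a Frequentist must file under "hypothesis generating."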

Examining the data published by Frat et al, one would conclude that the most probable hypothesis explaining these results is:

In patients with non-hypercapnic, non-cardiogenic, hypoxic respiratory failure, high-flow oxygen therapy decreases both mortality and the rate of intubation when compared to face-mask oxygenation. Additionally, the use of BPAP does not decrease the rate of intubation and may in fact increase mortality in a subset of the sickest patients.

How does this affect the prior scientific belief about the efficacy of NIV in patients with hypoxic respiratory failure? Frat et al certainly support the prior evidence demonstrating that BPAP therapy is detrimental in this subset of patients with hypoxic respiratory failure. In fact, the rate of endotracheal intubation (50%) is essentially identical to rates cited in prior cohorts (8). It also highlights that these negative effects may in fact be due to the therapy itself rather than the delay to definitive airway management, as was previously hypothesized. Though there was a non-significant increase in the median time to intubation in the BPAP group compared to patients receiving face-mask therapy alone, the times to intubation in the BPAP and high-flow O2 groups were identical. And yet, despite these minimal differences in time to intubation, the patients who underwent intubation in the BPAP group had increased mortality when compared to those randomized to either face-mask or high-flow oxygen (10). Patients in the BPAP group, with the help of positive pressure, achieved average tidal volumes of 9 cc/kg. As the ARDSNET trial group demonstrated, when administering positive pressure ventilation, a lung-protective strategy using tidal volumes of 6 cc/kg led to significant improvement in outcomes in patients with ARDS (1). Determann et al demonstrated that even in patients without ARDS, lung-protective strategies led to improved outcomes when compared to more traditional physiological lung volumes (2). Until now we have cognitively absolved positive pressure delivered in a non-invasive form as a causative agent of such complications. The findings of Frat et al have, for the first time, cast a shadow of doubt on the innocence of NIV.

As for the spectacular results demonstrated by the high-flow O2 group: given the size of the population studied and the paucity of previous science with which to compare, it is hard to know how much credence to place in these results. What is clear is that we should no longer view high-flow O2 as a substandard option, reserved only for patients who have failed to tolerate the more traditional forms of NIV. Rather, high-flow O2 may provide a unique form of respiratory support that is not accounted for by our prior understanding of NIV (10).

We have known for some time that the use of positive pressure ventilation is the result of being forced to choose between the lesser of two evils. Although it provides a means of ventilatory support, it possesses little inherent therapeutic benefit. In fact, positive-pressure ventilation comes at the cost of hemodynamic compromise, iatrogenic lung injury, nosocomial infections, and sedation protocols that leave patients confused and delirious. As such, a damage control strategy is typically employed to limit these downstream harms until the patient's own ventilatory capacity has returned. Until now these strategies have been limited to invasive forms of ventilatory support. The Frat et al data suggest that, to some degree, non-invasive ventilatory support may be associated with similar iatrogenic harms. Although the current data are incomplete, they should remind us that if we intend to construct a bridge, we should have some understanding of where this intended conduit will lead, and whether it is a healthier destination than where we started.

Sources Cited:

1.         Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome: the Acute Respiratory Distress Syndrome Network. N Engl J Med 2000;342:1301‒8.

2.         Determann RM, Royakkers A, Wolthuis EK, et al. Ventilation with lower tidal volumes as compared with conventional tidal volumes for patients without acute lung injury: a preventive randomized controlled trial. Crit Care 2010;14(1):R1.

3.         Brochard L, Mancebo J, Wysocki M, et al. Noninvasive ventilation for acute exacerbations of chronic obstructive pulmonary disease. N Engl J Med 1995;333:817-22.

4.         Keenan SP, Sinuff T, Cook DJ, Hill NS. Which patients with acute exacerbation of chronic obstructive pulmonary disease benefit from noninvasive positive-pressure ventilation? A systematic review of the literature. Ann Intern Med 2003;138:861-70.

5.         Lightowler JV, Wedzicha JA, Elliott MW, Ram FS. Non-invasive positive pressure ventilation to treat respiratory failure resulting from exacerbations of chronic obstructive pulmonary disease: Cochrane systematic review and meta-analysis. BMJ 2003;326:185.

6.         Masip J, Roque M, Sánchez B, Fernández R, Subirana M, Expósito JA. Noninvasive ventilation in acute cardiogenic pulmonary edema: systematic review and meta-analysis. JAMA 2005;294:3124-30.

7.         Gray A, Goodacre S, Newby DE, et al. Noninvasive ventilation in acute cardiogenic pulmonary edema. N Engl J Med. 2008;359(2):142-51.

8.         Carrillo A, Gonzalez-diaz G, Ferrer M, et al. Non-invasive ventilation in community-acquired pneumonia and severe acute respiratory failure. Intensive Care Med. 2012;38(3):458-66.

9.         Delclaux C, L'Her E, Alberti C, et al. Treatment of acute hypoxemic nonhypercapnic respiratory insufficiency with continuous positive airway pressure delivered by a face mask: a randomized controlled trial. JAMA 2000;284:2352-60.

10.      Frat JP, Thille AW, Mercat A, et al. High-Flow Oxygen through Nasal Cannula in Acute Hypoxemic Respiratory Failure. N Engl J Med. 2015.

11.      Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.

12.      Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130(12):1005-13.

The Third Annotation of a Case of Identity


So often in modern medicine we mistake science for truth. In doing so we have become enamored with the p-value and view it as the major determinant of relevance in scientific inquiry. An almost arbitrarily selected value of 0.05 is singlehandedly responsible for defining what is considered beneficial and what will be discarded as medical quackery. The p-value was first proposed by Ronald Fisher as a novel method of defining the probability that the results observed had occurred by chance alone. Or, stated more formally, “the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed” (1). Originally intended as a tool for clinicians to assess whether the results from a trial were due to the treatment effect in question or merely random chance, its meaning has transformed into something far more divine. Despite its overwhelming acceptance, the p-value has many flaws. It is incapable of distinguishing clinical relevance; it merely quantifies how surprising the observed data would be if no true difference existed. In addition, its faculties are easily overwhelmed when multiple observations are performed. Finally, the mathematical assumptions it is built upon do not take into account prior evidence and provide no guidance for future endeavors (1).
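The multiplicity problem is easy to quantify. As a sketch of the arithmetic: if a paper reports twenty independent comparisons and no true effect exists in any of them, the chance of at least one spuriously “significant” p-value at the 0.05 threshold is substantial.

```python
# Probability of at least one false-positive "significant" result when
# performing k independent comparisons under a true null, at alpha = 0.05.
alpha, k = 0.05, 20
familywise = 1 - (1 - alpha) ** k
print(f"{familywise:.0%}")  # roughly a 64% chance of at least one
```

This is why secondary and exploratory endpoints, taken alone, can only ever be hypothesis generating.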

Our romance with the p-value has not gone unnoticed by the many pharmaceutical companies who have learned that they need not manufacture a drug that produces clinical benefits, but rather fabricate a trial that demonstrates statistical significance. Since the p-value does not take into account prior evidence, trialists are not required to justify results as they relate to the entirety of an evidentiary body, but merely to demonstrate singular mathematical significance in a statistical vacuum. As such, we are asked to live in the evidentiary present with only selective access to past knowledge. Even when we are granted a privileged glimpse at results from prior trials, it often consists of incomplete and limited data intended to sway our opinion in a deliberate manner. This phenomenon, known as publication bias, allows pharmaceutical companies to preferentially publish trials with p-values that suit their interests while suppressing others that do not support their claims. By prospectively highlighting a would-be therapy's more flattering features and bullying Frequentist statistics with sample sizes that would make even negligible differences significant, it is easy to snatch statistical victory from the grasp of clinical obscurity. This is likely what the makers of ticagrelor hoped for when they designed the PEGASUS Trial.

The PEGASUS Trial's intention was to extend ticagrelor's temporal indication beyond the 12-month window, testing the hypothesis that long-term ticagrelor therapy in conjunction with low-dose aspirin reduces the risk of major adverse cardiovascular events among stable patients with a history of myocardial infarction. Bonaca et al randomized 21,162 patients who had experienced a myocardial infarction within the past 1-3 years to either 90 mg or 60 mg of ticagrelor twice daily or placebo. This is not the first time such a hypothesis has been investigated (2). Multiple trials have studied whether prolonged use of P2Y12 inhibitors possesses any value other than augmenting the pharmaceutical industry's coffers. The largest of these investigations, the DAPT trial, was published in 2014 by Mauri et al in NEJM (3). This trial examined patients 12 months after a cardiovascular event and considered whether the continuation of either clopidogrel or prasugrel was beneficial. The authors randomized 9,961 patients to either a P2Y12 inhibitor or an appropriate placebo. The DAPT Trial demonstrated that prolonged use of dual-antiplatelet therapy decreased the rate of cardiovascular events (4.3% vs 5.9%) and stent restenosis (0.4% vs 1.4%) in exchange for an increased rate of severe bleeding (2.5% vs 1.6%). There was also a small increase in overall mortality (2% vs 1.5%) in patients randomized to prolonged P2Y12 inhibition (3). Multiple recent meta-analyses confirm these findings (4,5). These results should come as no surprise, as the bulk of the literature examining P2Y12 inhibitors has highlighted their benefit primarily as a means of reducing type 4a peri-procedural infarctions of questionable clinical relevance. And so this was the landscape AstraZeneca faced when designing the PEGASUS Trial.
Every prior trial examining the question of prolonged dual-antiplatelet therapy has demonstrated that the small reductions in ischemic endpoints are easily overshadowed by the excessive increase in the rate of severe bleeding events. Fortunately, in the modern era of Frequentist statistics, none of these failures matter. Because the p-value does not account for prior evidence, the authors of the PEGASUS Trial did not have to account for this less-than-stellar history. Success by modern standards is simply the ability to contrive a primary endpoint that will demonstrate a p-value low enough to be considered significant.

Bonaca et al's primary outcome was the composite rate of cardiovascular death, MI, and stroke over the follow-up period (3 years). The absolute rates of primary events were 7.85%, 7.77%, and 9.02% in the 90 mg, 60 mg, and placebo groups respectively. This small difference (approximately 1.2% in absolute terms) was found to be impressively statistically significant (p-values of 0.008 and 0.004 for the 90 mg vs placebo and 60 mg vs placebo comparisons respectively). Its clinical significance is far more questionable and, unlike its statistical counterpart, cannot be bullied by the mass and size of the sample population. The effect size of this composite outcome is diminutive. The effect sizes of each respective component of this composite outcome are even smaller. The only measure that maintained its statistical significance consistently across all treatment comparisons was the reduction in myocardial infarction, which boasts a 0.85% and 0.72% absolute reduction in the 90 mg and 60 mg groups respectively.

Conversely, the rates of bleeding in the patients randomized to receive the active agent were impressively high, especially given that the previous studies examining ticagrelor demonstrated a more reasonable safety profile. The rate of TIMI major bleeding was 2.6%, 2.3%, and 1.06% in the 90 mg, 60 mg, and placebo groups respectively. Since the rates of both intracranial hemorrhage and fatal hemorrhage were statistically similar, most of this excess bleeding seems to be in the form of “clinically overt hemorrhage associated with a drop in hemoglobin of ≥5 g/dL or a ≥15% absolute decrease in hematocrit” (2). These results are not too dissimilar from those of the DAPT Trial (3). Patients taking P2Y12 inhibitors will benefit from a slight decrease in the risk of non-fatal myocardial infarctions and stent restenosis while experiencing an increased risk of clinically significant bleeding.
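This trade-off can be made explicit by converting the reported rates into numbers needed to treat and harm. A quick sketch using the 90 mg figures quoted above (simple unadjusted reciprocals of the absolute differences, for illustration only):

```python
# PEGASUS event rates over ~3 years, as reported above (90 mg vs placebo).
primary_placebo, primary_90 = 0.0902, 0.0785   # composite primary endpoint
bleed_placebo, bleed_90 = 0.0106, 0.026        # TIMI major bleeding

nnt = 1 / (primary_placebo - primary_90)   # number needed to treat
nnh = 1 / (bleed_90 - bleed_placebo)       # number needed to harm

print(f"NNT ~ {nnt:.0f} to prevent one primary composite event")
print(f"NNH ~ {nnh:.0f} to cause one TIMI major bleed")
```

On these crude numbers, a TIMI major bleed is caused slightly more often than a composite primary event is prevented.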

Despite the positive spin of this trial, it is far from a success. The investigators enrolled the more infirm end of the spectrum of patients with CAD, so as to include a cohort more likely to benefit from additional anti-platelet inhibition. They also excluded the patients most at risk for hemorrhagic complications so as to limit the appearance of adversity. Investigators excluded patients with a history of ischemic stroke or intracranial hemorrhage, a central nervous system tumor, an intracranial vascular abnormality, a history of gastrointestinal bleeding within the previous 6 months, or major surgery within the previous 30 days (2). This in itself would not be a concern, were it not for the likely application of prolonged dual-antiplatelet therapy to a far broader patient population.

Our current version of evidence-based medicine has left us susceptible to mistaking mathematical manipulations for scientific truth. It is short-sighted and allows for the linguistic error of misinterpreting statistical significance as clinical relevance. The PEGASUS Trial boasts p-values far below what is traditionally considered significant, and yet p-values below 0.05 hold little intrinsic value for our patients' well-being. Yes, from a Frequentist's perspective we are capable of concluding with relative certainty that the use of ticagrelor decreases the composite endpoint of cardiovascular death, MI, or stroke. The clinical relevance of this is far from certain, as its weight is powered exclusively by a decrease in myocardial infarctions. It is unlikely this small benefit is worth the impressive increase in serious hemorrhagic events. From the very earliest trials examining P2Y12 inhibitors, their benefits have been primarily due to the manipulation of statistical constructs rather than any inherent efficacy (6,7). The PEGASUS Trial is no different. These trials are not landmark demonstrations of P2Y12 inhibitors' benefits, but rather statistical manipulations of clinically insignificant differences stacked one on top of the other to give the appearance of height when none is present. It is the statistical equivalent of an eyespot meant to keep the scorn of the medical skeptics at bay. Know that we are not scared or confused by your statistical mimicry. We see these trials for what they are: pharmaceutical advertisements poorly hidden behind the guise of scientific inquiry.

Sources Cited:

  1. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130(12):995-1004.
  2. Bonaca MP, Bhatt DL, Cohen M, et al. Long-Term Use of Ticagrelor in Patients with Prior Myocardial Infarction. N Engl J Med. 2015.
  3. Mauri L, Kereiakes DJ, Yeh RW, et al. Twelve or 30 months of dual antiplatelet therapy after drug-eluting stents. N Engl J Med. 2014;371(23):2155-66.
  4. Palmerini T, Sangiorgi D, Valgimigli M, et al. Short- versus long-term dual antiplatelet therapy after drug-eluting stent implantation: an individual patient data pairwise and network meta-analysis. J Am Coll Cardiol. 2015;65(11):1092-102.
  5. Giustino G, Baber U, Sartori S, et al. Duration of Dual Antiplatelet Therapy After Drug-Eluting Stent Implantation: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. J Am Coll Cardiol. 2015;65(13):1298-310.
  6. Yusuf S, Zhao F, Mehta SR, et al. Effects of clopidogrel in addition to aspirin in patients with acute coronary syndromes without ST-segment elevation. N Engl J Med. 2001;345(7):494-502.
  7. A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events (CAPRIE). CAPRIE Steering Committee. Lancet. 1996;348(9038):1329-39.


A Truncated Summation of the Adventure of the Cardboard Box



One gets the sense when reading the literature on endovascular therapy for acute ischemic stroke that one is on a small seafaring vessel attempting to map the shoreline through a dense fog. There are moments when the fog lifts and you catch a glimpse of the topographic details of the shore, and then the cloud rolls in again, obscuring any further observation. Similarly, the recent publications on endovascular therapy for acute ischemic stroke have demonstrated that there is a definitive benefit to mechanical reperfusion therapy, and yet each publication is in itself so incomplete that it is difficult to perceive anything more than this general appearance of benefit. The finer details are obscured by the premature truncation of trials, stopped too early to definitively characterize the benefits and risks of endovascular therapy.

MR CLEAN, published earlier this year in the NEJM, and discussed ad nauseam in previous posts, marked the first of what is now a litany of trials demonstrating benefit for endovascular therapy in acute ischemic stroke (1). Its release resulted in the subsequent premature stoppage of a number of key trials examining endovascular therapy. Although all these trials boast impressive results, each stopped their enrollment prematurely, not due to a preplanned interim analysis, but rather due to MR CLEAN’s positive results. ESCAPE and EXTEND-IA were the first to halt enrollment and hastily publish their results (2,3). More recently the NEJM has reported on the findings from the next two trials prematurely stopped due to MR CLEAN’s success.

The first of these studies is the SWIFT-PRIME trial published by Saver et al (4). This trial’s initial results were presented earlier this year alongside EXTEND-IA and ESCAPE at the 2015 International Stroke Conference. Like its counterparts, this trial examined patients presenting with large ischemic infarcts and radiographically identified occlusions in the terminal internal carotid (ICA) or first branch (M1) of the middle cerebral artery (MCA). Additionally patients had to demonstrate a favorable core-to-ischemic penumbra ratio on perfusion imaging. Patients were enrolled if they were able to undergo endovascular interventions within 6-hours of symptom onset.

Like ESCAPE and EXTEND-IA, the results of SWIFT-PRIME are impressive. The authors boast a 25% absolute difference in the number of patients with a mRS of 0-2 at 90 days. Though notable, the definitive magnitude of effect is hardly concrete. The authors cite an NNT of 4 to have one more patient alive and independent at 90 days, and an NNT of 2.6 to have one patient less disabled, calculations derived from their dichotomous and ordinal analyses respectively. Although the authors cite impressive p-values (<0.001), the confidence interval surrounding this 25% point estimate is far broader (11-38%), meaning the NNT is somewhere between 2.6 and 9 patients. EXTEND-IA and ESCAPE have similarly wide confidence intervals surrounding their point estimates (4). EXTEND-IA's confidence interval is 8% to 50% surrounding a point estimate of 31% (2). Likewise, ESCAPE has a confidence interval of 13% to 34% surrounding its 23.7% point estimate (3). All three of these trials were stopped early secondary to MR CLEAN's results. And though both EXTEND-IA and ESCAPE came close to reaching their pre-defined sample sizes, SWIFT-PRIME was stopped before its first interim analysis (n<200) (4).
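The arithmetic behind that NNT range is worth making explicit. The NNT is simply the reciprocal of the absolute risk difference, so the reported 95% confidence interval translates directly into an interval around the NNT:

```python
# SWIFT-PRIME: 25% absolute difference in mRS 0-2, 95% CI 11% to 38%.
point, lo, hi = 0.25, 0.11, 0.38

nnt_point = 1 / point
nnt_best, nnt_worst = 1 / hi, 1 / lo   # the reciprocal flips the interval ends

print(f"NNT {nnt_point:.0f} (95% CI {nnt_best:.1f} to {nnt_worst:.1f})")
```

A treatment whose NNT is plausibly anywhere from under 3 to around 9 is clearly beneficial, but its magnitude is far from pinned down.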

Like EXTEND-IA, ESCAPE, and SWIFT-PRIME, the second trial just published in NEJM, the REVASCAT trial by Jovin et al, was stopped prematurely secondary to the publication of the MR CLEAN data. In fact, even though it failed to reach the prospectively determined efficacy threshold for stopping the trial at the first interim analysis, the data and safety board felt that, given the MR CLEAN data, there was a loss of equipoise and further randomization would be unethical (5). Despite its apparent success, the results of the REVASCAT trial are far less impressive than those of EXTEND-IA, ESCAPE, or SWIFT-PRIME. The REVASCAT trial planned to enroll 690 patients presenting to the Emergency Department in 4 centers across Catalonia with symptoms consistent with a large vessel stroke that could be treated with endovascular therapy within 8 hours of symptom onset. Unlike EXTEND-IA, ESCAPE, or SWIFT-PRIME, the REVASCAT Trial did not use perfusion imaging to select patients with favorable areas of salvageable tissue. Rather, it employed CTA to identify occlusions in the ICA or M1 branch of the MCA, and utilized the less accurate ASPECTS score, derived from the initial non-contrast CT, to assess the potential for viable ischemic tissue (5).

REVASCAT enrolled 206 patients before its premature termination. And like the three trials before it, it demonstrated a statistically significant improvement in mRS at 90 days in the patients who underwent endovascular therapy. The REVASCAT trial cites an absolute increase of 15.5% in the number of patients with a mRS of 0-2, surrounded by a confidence interval of 2.4% to 28.5%. Furthermore, unlike the previous three trials, which either boast an outright benefit in mortality or demonstrate trends in favor of endovascular therapy, REVASCAT demonstrated an impressive 4.8% absolute increase in the rate of death within the first 7 days after randomization (5).

The results of REVASCAT are far from positive. Were they not included in the optimistic fervor that currently surrounds endovascular therapy, it might even be considered a negative trial. Why were the results of REVASCAT far less impressive than those of EXTEND-IA, ESCAPE, and SWIFT-PRIME? Was it just random chance, the true effect size of endovascular therapy falling somewhere between the two extremes of the 13.5% difference observed in MR CLEAN and the 31% seen in EXTEND-IA? Or was it that the patient population selected in EXTEND-IA, ESCAPE, and SWIFT-PRIME led to their success? EXTEND-IA, ESCAPE, and SWIFT-PRIME all utilized some form of advanced imaging to determine the size of viable ischemic tissue (2,3,4). MR CLEAN and REVASCAT used only the CTA to identify a reachable lesion and the non-contrast CT to determine tissue viability (1,5). Had any one of these trials been followed to completion, the results would likely have provided us with a better understanding of who will benefit from endovascular therapy and the exact magnitude of this benefit.

This is a problem of certainty. Our faith in endovascular interventions was so unyielding that at the first sign of success we claimed victory and discontinued any further scientific inquiry. The bloated results demonstrated in EXTEND-IA, ESCAPE, and SWIFT-PRIME are the result of this premature resolution. We know that trials stopped early for benefit are likely to over-estimate the effect size of the treatment in question. In fact, the smaller the sample size at the time of closure, the greater the amplification (6). In 1989, Pocock et al demonstrated this to be a mathematical inevitability (7). This was later validated by Bassler et al in a meta-analysis examining 91 trials stopped prematurely for benefit (8). Bassler et al revealed that the degree of embellishment was directly related to the size of the sample population at cessation and independent of the quality of the trial or the presence of a predetermined methodology for early stoppage.
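This inflation can be demonstrated with a toy Monte Carlo simulation (hypothetical rates, not data from any of the trials discussed): a therapy with a true 5% absolute benefit, monitored at repeated interim looks and stopped the first time the z-statistic crosses 1.96, will on average report an exaggerated effect, and the trials that stop at the smallest sample sizes exaggerate the most.

```python
import math
import random

random.seed(1)

# Hypothetical trial: true event rates 30% (control) vs 25% (treatment),
# i.e. a true 5% absolute benefit.  Interim looks every 100 patients per arm.
p_ctrl, p_trt, max_n = 0.30, 0.25, 1000
looks = range(100, max_n + 1, 100)

stopped_effects = []
for _ in range(1000):
    ctrl = [random.random() < p_ctrl for _ in range(max_n)]
    trt = [random.random() < p_trt for _ in range(max_n)]
    for n in looks:
        r1, r2 = sum(ctrl[:n]) / n, sum(trt[:n]) / n
        pooled = (r1 + r2) / 2
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and (r1 - r2) / se > 1.96:   # "stop early for benefit"
            stopped_effects.append(r1 - r2)
            break

mean_effect = sum(stopped_effects) / len(stopped_effects)
print(f"true benefit: 5.0%; mean effect reported at stop: {mean_effect:.1%}")
```

The mechanism is pure selection: a trial can only cross the stopping boundary at a small n if its observed effect is, by chance, well above the truth, and stopping at that moment freezes the exaggeration into the published estimate.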

Although the exact patient population that stands to benefit from endovascular therapy is unclear, it is certainly a small fraction of the overall patients who present to the Emergency Department with acute ischemic stroke. All patients enrolled in the REVASCAT trial were also included in a national registry known as SONIA. SONIA catalogued 2,576 patients (only 15.6% of all stroke patients seen) who received some form of reperfusion therapy over the period during which REVASCAT enrolled patients (5). The vast majority of these patients, 2,036 (79%), received only tPA; 540 (21%) underwent endovascular therapy. Of these, only 111 (24%) were eligible for enrollment in the REVASCAT trial. Only 4.3% of the patients in the SONIA registry, and only 0.3% of all stroke patients during the 2-year period, were eligible for inclusion in the REVASCAT trial (5). This accounts for a small minority of the stroke patients presenting to the Emergency Department with symptoms consistent with acute ischemic stroke. Of note, the criteria used in the REVASCAT trial to determine eligibility were more inclusive than those used in EXTEND-IA, ESCAPE, and SWIFT-PRIME, which, if you believe those trials were successful because of their inclusion criteria, would account for an even smaller portion of stroke patients presenting to the Emergency Department. In the SWIFT-PRIME trial it took 2 years and 39 centers to recruit 196 patients (4). That comes out to 0.2 patients per center per month. EXTEND-IA and ESCAPE recruited only 0.3 and 1.44 patients per center per month respectively (2,3).

Even the most skeptical will find it difficult to deny that there is a definite treatment effect observed in the recent trials examining endovascular therapy in acute ischemic stroke. The magnitude of this effect has yet to be defined; its borders are obscured by the murkiness of small sample sizes, extreme selection bias, and prematurely stopped trials. There are also clear harms associated with this invasive procedure. Both the REVASCAT trial and the earlier trials examining endovascular therapy (IMS-3, SYNTHESIS, and MR RESCUE) demonstrated that when performed on the wrong patient population, not only will endovascular therapy fail to provide benefit, it may in fact be harmful (5,9,10,11). This is simply not a yes or no question. The resources required to build an infrastructure capable of supporting endovascular therapy on a national level are daunting. Though we have reached a certain degree of clarity that endovascular therapy for acute ischemic stroke provides benefit, how well and in whom remains murky. The overeager truncation of important trials has left us adrift in a sea of fog, unsure if the shoreline we paddle towards is a warm, welcoming beachfront or a rocky coast prepared to demolish our vessel upon arrival.

Sources Cited:

  1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372:(1)11-20.
  2. Campbell BC, Mitchell PJ, Kleinig TJ, et al. Endovascular Therapy for Ischemic Stroke with Perfusion-Imaging Selection. N Engl J Med. 2015.
  3. Goyal M, Demchuk AM, Menon BK, et al. Randomized Assessment of Rapid Endovascular Treatment of Ischemic Stroke. N Engl J Med. 2015.
  4. Saver JL, Goyal M, Bonafe A, et al. Stent-Retriever Thrombectomy after Intravenous t-PA vs. t-PA Alone in Stroke. N Engl J Med. 2015.
  5. Jovin TG, Chamorro A, Cobo E, et al. Thrombectomy within 8 Hours after Symptom Onset in Ischemic Stroke. N Engl J Med. 2015.
  6. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012;344:e3863.
  7. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials 1989;10(suppl 4):209-21S.
  8. Bassler D, Briel M, Montori VM, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010;303(12):1180-7.
  9. Broderick JP, Palesch YY, Demchuk AM, et al. Endovascular therapy after intravenous t-PA versus t-PA alone for stroke. N Engl J Med. 2013;368(10):893-903.
  10. Ciccone A, Valvassori L, Nichelatti M, et al. Endovascular treatment for acute ischemic stroke. N Engl J Med. 2013;368(10):904-13.
  11. Kidwell CS, Jahan R, Gornbein J, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med. 2013;368(10):914-23.






The Case of the Anatomic Heart Part 2


The PROMISE Trial, like any aptly named study, chose an acronym meant to inspire: in this case, the hope for a better tomorrow. And though the authors of the Prospective Multicenter Imaging Study for Evaluation of Chest Pain trial were not clear on the specific details their promise entailed, I fear the results of this trial will leave us feeling betrayed and forsworn.

The authors of the PROMISE Trial presented the findings from their massive undertaking at the 2015 ACC scientific assembly, with the results published simultaneously in the NEJM. Douglas et al randomized 10,003 patients to either standard non-invasive functional testing, as determined by the treating physician, or CTCA. Patients were recruited from outpatient facilities across North America when presenting with new-onset chest pain that the treating physician suspected to be of cardiac origin, once ACS had been ruled out. Patients were excluded if they presented with unstable vital signs, EKG changes, or positive biomarkers. Given the pragmatic nature of the trial, all other treatment decisions were left to the prerogative of the treating physician (1).

The authors found no difference in their primary outcome, the composite endpoint of death, MI, hospitalization for UA, or major procedural complications over the follow-up period (at least 12 months, with average follow-up of 24 months), between the CTCA and traditional testing groups (3.3% vs 3.0%). In fact, other than a small decrease in the number of negative invasive catheterizations seen in the CTCA arm (3.4% vs 4.3%), the authors were unable to find any statistically significant differences in the multitude of secondary endpoints measured. As far as safety outcomes, the authors did cite some relevant concerns. Most notably, those randomized to receive CTCA as their screening test underwent significantly more downstream testing and interventions: 12.2% of those randomized to the CTCA arm, compared to 8.1% in the standard testing arm, underwent invasive catheterization; 6.2% compared to 3.2% underwent subsequent revascularization, including a 1.5% vs 0.76% rate of coronary artery bypass grafting (CABG) (1).

Now some might argue that the PROMISE trial was not performed on Emergency Department patients, and thus its application to our low-risk chest pain population is questionable. In some senses this may be true. Patients evaluated in the Emergency Department for chest pain are inherently at higher risk than their counterparts seen in primary care offices. Conversely, the PROMISE Trial evaluated a cohort of chest pain patients in whom the treating physician suspected the symptoms were likely of cardiac origin. Before being enrolled in the trial, all of these patients were ruled out for ACS with negative EKGs and biomarkers. Additionally, the treating physician felt further provocative testing was necessary. This is not unlike the cohort of patients we include in our low-risk chest pain population in the Emergency Department. Furthermore, we have four trials with over 3,000 Emergency Department patients evaluating the efficacy of CTCA, which demonstrate almost identical results to the PROMISE Trial (2,3,4,5). Each of these studies determined that CTCA adds no additional prognostic value to our standard risk stratification strategies and likely leads to increased invasive procedures. In a meta-analysis of these four trials published in JACC in 2013, Hulten et al found a significant increase in the number of invasive angiographies, PCIs, and revascularizations performed in the patients randomized to the CTCA arm (6). PROMISE demonstrated the exact same tendencies of CTCA in a much larger cohort (1).

Why did PROMISE fail to find a difference? What are we to infer about the acuity and severity of a disease state that does not benefit from a timely and accurate diagnosis? We know CTCA is far more accurate than our more traditional forms of provocative testing. Why, then, in this massive trial did it fail to produce any difference in clinically relevant outcomes? Might it be that a time-sensitive anatomical definition of CAD is unnecessary?

The first reason why PROMISE failed to show a difference is that the population enrolled in the trial was at such low risk for the disease state in question that patients were likely to do well whatever diagnostic testing strategy they underwent. Only 3.1% of the cohort had any event during the follow-up period. Only 1.5% died and only 0.7% had an MI (1). With such a low event rate, even if CTCA is an effective means of identifying and preventing MI and cardiac death, a statistically significant benefit is unlikely to be found even with a sample size as large as 10,000 patients.
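To see how thoroughly a 3% event rate cripples statistical power, consider a rough sketch using the standard normal-approximation sample-size formula for comparing two proportions. The hypothesized 2.5% event rate in the CTCA arm (roughly a 17% relative risk reduction) is purely an illustrative assumption, not a figure from the trial:

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-arm sample size to detect p1 vs p2 with a
    two-sided test (standard normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
    return ceil(n)


# Hypothetical: usual care event rate ~3.0%, CTCA cuts it to 2.5%
print(two_proportion_sample_size(0.030, 0.025))
```

Even this generous assumed effect demands roughly 17,000 patients per arm, well beyond PROMISE's roughly 5,000 per arm, which is why a trial in such a low-risk cohort was almost destined to be negative.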

The second reason why the PROMISE Trial is likely to have failed is simply that we are functioning under the misconception that when we diagnose these patients with obstructive CAD, an invasive strategy is superior to optimal medical management. Though we know that reperfusion therapy has objective benefits in patients actively experiencing a myocardial infarction, these same benefits have failed to translate to the more stable lesions of CAD. Multiple large RCTs have failed to find a benefit of PCI over optimal medical management in patients with stable obstructive CAD (7,8). Stergiopoulos et al have now published a number of meta-analyses examining these trials, which have also failed to uncover benefits that may have been missed in the underpowered individual trials (9,10).

The PROMISE trial was not the only trial presented at the ACC Scientific Assembly examining the pragmatic use of CTCA for the diagnostic work up of chest pain. The SCOT-HEART trial was yet another massive undertaking, its results published online in The Lancet in concert with the oral presentation. In this trial, investigators randomized 4,146 patients referred to chest pain clinics across Scotland to either a standard work up or a standard work up plus the addition of CTCA. Although by sheer quantity it does not possess the statistical power of the PROMISE trial, it does present us with some insights that the PROMISE trial proved incapable of providing (11).

The unique design of the SCOT-HEART trial ensured all patients received a full standardized evaluation, often including (85% of the time) an exercise stress test. Only after the treating physician had assessed the patient, reported his or her baseline estimate of the likelihood of CAD, and determined what further testing and treatment strategies he or she would recommend were the patients randomized to either CTCA or standard care. Like PROMISE, this was a pragmatic trial design, and other than the use of CT angiography, clinicians were given free rein to treat each patient as they deemed appropriate. At 6 weeks the physicians were asked again to assess the likelihood of CAD (11).

What the authors revealed was that the use of CTCA significantly improved the clinicians' confidence in their diagnosis of both CAD and angina of cardiac origin (the trial’s primary endpoint). They also found a statistically significant increase in the number of patients diagnosed with CAD in the group randomized to receive CTCA (23% vs 11%). Additionally, patients in the CTCA arm were more frequently shifted towards more aggressive and invasive modes of management when compared to the standard care arm. Specifically, more patients in the CTCA group saw an increase in the number of medical therapies prescribed and invasive catheterizations performed (11).

In summary, patients randomized to CTCA were more often given the diagnosis of CAD and were more likely to be treated with medical therapies and invasive procedures than the patients in the standard care group. But did all of these investigations and interventions lead to better outcomes? Simply put, no. The rate of cardiovascular death and myocardial infarction during the follow-up period (1.7 years) was 1.3% vs 2.0%, a non-statistically-significant difference of 0.7%. The overall mortality was 0.8% vs 1.0%, respectively. Even the improvement in the quality and severity of the patients’ symptoms (the reason the patients presented to the clinic in the first place) at 6 weeks was identical (11).

The PROMISE trial demonstrated that the use of CTCA promotes increased downstream testing and intervention. The SCOT-HEART trial validated these findings. The SCOT-HEART trial also demonstrated that CTCA provides a significant degree of diagnostic certainty to the treating physician, leading to more aggressive medical management. And yet knowing a lot and doing a lot failed to equate to a reduction in mortality or myocardial infarctions. These are coronary mirages, promising the weary clinicians water when in reality they are just leading them deeper into the barren desert.

Despite its size and decisively negative results, perhaps the most important study arm in the PROMISE Trial did not exist: an arm in which patients were randomized to not receive any form of provocative testing, but rather treated medically as per the judgment of their physician. Both the PROMISE and SCOT-HEART trials demonstrated that a cohort of outpatient chest pain patients are at such low risk for adverse events that they are likely to do equally well with whatever provocative test is used, or, more importantly, without any at all. Surely it is time to examine such a hypothesis, to add a third arm to the PROMISE cohort. The ISCHEMIA Trial is currently enrolling patients to compare medical management vs invasive strategies in the setting of a positive provocative test. Unfortunately this trial’s applicability is limited by the fact that the authors insist all patients undergo a CTCA before enrollment to rule out the presence of left main arterial disease. And though this may be a step in the right direction, we still can’t escape our need for anatomical certainty in the face of diminishing clinical utility. Surely it is time we define the value of both provocative and anatomical testing in the low risk chest pain population, truly a Promise worth keeping.

Sources Cited:

  1. Douglas PS, Hoffmann U, Patel MR, et al. Outcomes of anatomical versus functional testing for coronary artery disease. N Engl J Med. 2015;372(14):1291-300.
  2. Goldstein JA, Chinnaiyan KM, Abidov A, et al. The CT-STAT (Coronary Computed Tomographic Angiography for Systematic Triage of Acute Chest Pain Patients to Treatment) trial. J Am Coll Cardiol 2011;58:1414–22.
  3. Hoffmann U, Truong QA, Schoenfeld DA, et al. Coronary CT angiography versus standard evaluation in acute chest pain. N Engl J Med 2012;367:299–308.
  4. Litt HI, Gatsonis C, Snyder B, et al. CT Angiography for safe discharge of patients with possible acute coronary syndromes. N Engl J Med 2012;366:1393–403.
  5. Goldstein JA, Gallagher MJ, O’Neill WW, Ross MA, O’Neil BJ, Raff GL. A randomized controlled trial of multi-slice coronary computed tomography for evaluation of acute chest pain. J Am Coll Cardiol 2007;49:863–71.
  6. Hulten E, Pickett C, Bittencourt MS, et al. Outcomes after coronary computed tomography angiography in the emergency department: a systematic review and meta-analysis of randomized, controlled trials. J Am Coll Cardiol. 2013;61:(8)880-92.
  7. Boden WE, O’Rourke RA, Teo KK, et al. Optimal medical therapy with or without PCI for stable coronary disease. N Engl J Med. 2007;356(15):1503-16.
  8. Mehta SR, Cannon CP, Fox KA, et al. Routine vs selective invasive strategies in patients with acute coronary syndromes: a collaborative meta-analysis of randomized trials. JAMA. 2005;293(23):2908-17.
  9. Stergiopoulos K, Brown DL. Initial Coronary Stent Implantation With Medical Therapy vs Medical Therapy Alone for Stable Coronary Artery Disease: Meta-analysis of Randomized Controlled Trials. Arch Intern Med. 2012;172(4):312.
  10. Stergiopoulos K, Boden WE, Hartigan P, et al. Percutaneous Coronary Intervention Outcomes in Patients With Stable Obstructive Coronary Artery Disease and Myocardial Ischemia: A Collaborative Meta-analysis of Contemporary Randomized Clinical Trials. JAMA Intern Med. 2014;174(2):232-240.
  11. The SCOT-HEART investigators. CT coronary angiography in patients with suspected angina due to coronary heart disease (SCOT-HEART): an open-label, parallel group multicenter trial. Lancet. 2015 (published online March 15).

The Case of the Dubious Squire


I often get the sense that the makers of many biomarkers envision us as helpless damsels in distress, drowning in an icy pond or trapped in a monumental tower with no obvious means of descent. I imagine they think that in our desperate grasps for aid, we will cling to whatever assistance they may offer, independent of its buoyancy. But in these moments of fear and uncertainty we must remember that for a test to be useful to a clinician, not only does it have to be accurate and reliable, it must also add diagnostic value above the clinician’s own inherent aptitude. B-type natriuretic peptide (BNP) and its natriuretic derivatives are a classic example of such a test, heralded for its isolated diagnostic properties without anyone asking the simple question: how does it help the physician? Through statistical misdirection, the distributors of natriuretic peptides have published research hailing their diagnostic prowess when examined in isolation. Such publications have led to these assays becoming recommended components of the workup for any patient suspected of having acute decompensated heart failure (1,2,3). A recent meta-analysis performed by the helpful folks responsible for the NICE guidelines sought to examine the validity of these recommendations and determine the true diagnostic accuracy of natriuretic peptides (4). And yet, I fear these authors, in their effort to provide an accurate representation of the assays’ diagnostic accuracy, have forgotten to take into account the most important factor when evaluating any diagnostic test: the clinician.

In this meta-analysis, Roberts et al examined the clinical accuracy of BNP, NTproBNP, and MRproANP for the diagnosis of acute decompensated heart failure in the Emergency Department. Specifically, the goal was to evaluate the low-risk criteria proposed by the 2012 European Society of Cardiology guidelines for heart failure: a BNP ≤100 ng/L, an NTproBNP ≤300 ng/L, and an MRproANP ≤120 pmol/L. They also examined the utility of these assays at intermediate and high levels (100-500 ng/L and >500 ng/L for BNP; 300-1800 ng/L and >1800 ng/L for NTproBNP; and >120 pmol/L for MRproANP) (4).

The authors identified 42 articles examining 37 different cohorts that met criteria for inclusion in their meta-analysis. Combining these studies, the authors calculated pooled test characteristics for each of the natriuretic assays in question. They found that at the low thresholds proposed by the European Society of Cardiology, the assays were equally mediocre. All three demonstrated high sensitivities: 95%, 99%, and 95% respectively. Of course, by selecting such a low cutoff, the authors ensured that a large proportion of the patients without acute heart failure would also test positive. The specificities of each of these assays were a dismal 63%, 43%, and 56% respectively. As with any diagnostic tool, raising the threshold of what you consider positive improves the assay’s specificity. When the intermediate thresholds were utilized, the specificities increased to 86% and 76% for BNP and NTproBNP respectively (the authors did not have enough data on MRproANP to adequately calculate its accuracy in this intermediate range). Of course this amplified specificity came at the price of a loss of sensitivity, 85% and 90% respectively. When using the high threshold, the authors were able to augment the tests’ specificity even further, but of course at this high level a large portion of patients with acute decompensated heart failure are missed. At a threshold of ≥500 ng/L, diagnostic meta-analysis was not performed due to inadequate data; BNP demonstrated sensitivities in the individual studies ranging from 35% to 83%, with paired specificities from 78% to 100%. Likewise, at a threshold of ≥1800 ng/L, NTproBNP reported sensitivities ranging from 67% to 87% with paired specificities ranging from 72% to 95%. Finally, at the threshold of >120 pmol/L, MRproANP demonstrated sensitivities ranging from 84% to 98% and paired specificities from 40% to 84% (4).
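The threshold tradeoff described above can be summarized with likelihood ratios, which translate a sensitivity/specificity pair into how much a result should shift diagnostic suspicion. This short sketch is a generic Bayesian-diagnostics calculation applied to the pooled BNP figures cited above, not an analysis performed in the paper:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from test characteristics."""
    lr_pos = sensitivity / (1 - specificity)       # how much a positive raises suspicion
    lr_neg = (1 - sensitivity) / specificity       # how much a negative lowers suspicion
    return lr_pos, lr_neg


# Pooled BNP characteristics cited above, at the low and intermediate thresholds
for label, sens, spec in [("BNP low threshold (sens 95%, spec 63%)", 0.95, 0.63),
                          ("BNP intermediate threshold (sens 85%, spec 86%)", 0.85, 0.86)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

At the low cutoff the positive likelihood ratio is roughly 2.6, far below the value of 10 or so conventionally considered strong enough to rule a disease in, which is precisely why the guidelines relegate a positive result to confirmatory imaging.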

The authors conclude, “The use of NTproBNP and B type natriuretic peptide at the rule-out threshold recommended by the recent European Society of Cardiology guidelines on heart failure provides excellent ability to exclude acute heart failure in the acute setting with reassuringly high sensitivity. The specificity is modest at all but the highest values of natriuretic peptide, therefore confirmatory testing by cardiac imaging is required in patients with positive test results (4).”

At face value this is a fair conclusion, as all three of these assays seem to perform moderately well at either extreme of their diagnostic spectrum. At very low levels it is safe to say that the likelihood that the patient’s symptoms were caused by heart failure is fairly low. Likewise, when significantly elevated, these assays boast specificities high enough for clinical use. Unfortunately these results do very little to explain the true utility of natriuretic peptides. By isolating these assays’ test characteristics outside the clinical arena, the authors have falsely inflated the utility of BNP and its natriuretic derivatives.

The first issue, pervasive throughout the literature expounding the utility of natriuretic peptides, is the gold standard used to evaluate their diagnostic capabilities. The most prevalent gold standard is a retrospective review performed by two Cardiologists blinded to the results of the natriuretic peptide in question. Thirty-one of the 37 cohorts in this meta-analysis used some derivative of this questionable gold standard. In one of the largest trials conducted, the Breathing Not Properly (BNP) trial, Maisel et al examined 1,586 patients presenting to the Emergency Department with acute dyspnea (5). They found that the two Cardiologists disagreed with the initial Emergency Physician’s diagnosis 14% of the time and disagreed with each other 10.7% of the time (6). This suggests that the cases in question were clearly not straightforward. If two Cardiologists with access to the patients’ entire hospital course disagreed with each other almost as often as they disagreed with the initial diagnosis of the Emergency Physician, then it is fair to say using this definition as the gold standard is less than ideal.

Despite this tarnished gold standard the question remains: how do natriuretic peptides perform when used in the clinical arena? More specifically, how well do natriuretic peptide assays help the Emergency Physician differentiate the causes of dyspnea in the subset of patients in whom there is considerable diagnostic uncertainty? In the BNP trial, Maisel et al examined the Emergency Physician’s ability to correctly identify acutely decompensated heart failure. They found our overall accuracy, when compared to the less than perfect gold standard of a retrospective review performed by two Cardiologists, was 86% (6). In the subset of patients in which the Emergency Physician was certain the patient’s dyspnea was not cardiac in origin (<5% chance of CHF), their diagnostic accuracy was superb (92%). Likewise, in the group of patients in which the Emergency Physician was 95% certain the patient did in fact have CHF, they were correct 95% of the time (7). It was only in the intermediate group (between 20%-80% probability), in which the Emergency Physician was unsure of the likelihood of CHF, that their diagnostic capabilities were understandably poor. It is in this intermediate group that we would hope the natriuretic peptides could provide us with some guidance. We should not ask how accurately peptide assays predict acute decompensated heart failure, but rather how well they predict it in the subset of patients who present a diagnostic challenge to the Emergency Physician. When charged with such a task, these assays are far less impressive.

Although in their initial publication Maisel et al failed to disclose the diagnostic abilities of the Emergency Physicians, citing only BNP’s performance at the retrospective cutoff of 100 ng/L (a sensitivity of 90% and a specificity of 76%), these findings were later published in a secondary analysis by McCullough et al in Circulation. The authors revealed that when the Emergency Physician was certain the patient’s dyspnea was either definitely CHF or definitely not CHF, their unstructured judgment outperformed the BNP assay. For patients in whom the Emergency Physician was certain CHF was not the cause of dyspnea, their accuracy was 92% vs only 84% for the BNP. Likewise, when the Emergency Physician was certain the patient did in fact have CHF, again their judgment outperformed the diagnostic abilities of the BNP assay (accuracy of 95% vs 92%) (7). In fact, even in the subset of patients where the Emergency Physician was fairly certain the diagnosis was CHF (>80%), their positive likelihood ratio of 11.5 was far more impressive than that of the BNP (3.4) (8). In the 27.8% of patients in whom the Emergency Physician was unsure of the diagnosis, the very group in which we would hope the BNP could provide guidance, its diagnostic accuracy was entirely unhelpful. In this subset of patients, at a cutoff of 100 ng/L, the assay demonstrated no clinical utility, with a sensitivity and specificity of 79% and 71% respectively (8).
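To see just how little diagnostic movement those numbers buy in the uncertain group, consider a brief sketch applying the likelihood-ratio form of Bayes’ theorem to a pretest probability of 50% (an illustrative midpoint of the 20%-80% intermediate range, not a value from the study):

```python
def post_test_probability(pretest, sensitivity, specificity, positive):
    """Update a pretest probability with a test result via likelihood ratios."""
    # Choose the appropriate likelihood ratio for the result
    lr = (sensitivity / (1 - specificity)) if positive else ((1 - sensitivity) / specificity)
    pretest_odds = pretest / (1 - pretest)   # probability -> odds
    post_odds = pretest_odds * lr            # Bayes' theorem in odds form
    return post_odds / (1 + post_odds)       # odds -> probability


# Intermediate-probability group: pretest ~50%, BNP sens 79% / spec 71% at 100 ng/L
print(post_test_probability(0.50, 0.79, 0.71, positive=True))   # ~0.73
print(post_test_probability(0.50, 0.79, 0.71, positive=False))  # ~0.23
```

A positive result moves a 50% pretest probability only to about 73%, and a negative result only to about 23% — neither close enough to a rule-in or rule-out threshold to change management, which is the quantitative core of the argument above.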

Each of the 37 studies included in the Roberts et al meta-analysis failed to truly examine how natriuretic peptides perform clinically. As discussed, the majority of these trials employed a less than ideal gold standard comparator and were so confounded by spectrum bias that they rarely examined the subgroup of patients in whom the diagnosis was unclear. Additionally, most of these studies used a retrospectively derived cutoff calculated to demonstrate the assay’s optimal performance. This type of overfitting inevitably leads to decreased performance when validated in a novel cohort. Ideally, a randomized trial comparing natriuretic peptide-guided management to standard practice could demonstrate what, if any, clinical utility these assays provide. A number of such trials have been conducted.

The first was published in the NEJM in 2004 by Mueller et al. In this trial the authors randomized 452 patients presenting to the Emergency Department with acute dyspnea to either a diagnostic strategy utilizing a BNP assay or a standard work up (9). The authors powered their study to detect a 20% reduction in time to discharge (an interesting primary endpoint to choose if one thinks BNP possesses true clinical relevance), defined as the interval from presentation at the Emergency Department to discharge. The authors found a significant difference in time to discharge (8 vs 11 days), as well as shorter times to treatment in the BNP group (63 vs 90 minutes), decreased rates of hospitalization (75% vs 85%) and decreased admission to the ICU (15% vs 24%). In fact, every outcome variable trended toward benefit in the group randomized to receive the BNP-guided diagnostic strategy. Initially these results seem significantly in favor of using BNP in the diagnostic workup of acute dyspnea, until one examines the other RCTs evaluating this question (9).

The second RCT examining natriuretic peptides for the management of acute dyspnea was published by Moe et al in Circulation in 2007 (10). In this trial, the authors randomized 500 patients to either an NT-proBNP guided strategy or standard care. Like the previous study, the authors used the clinically dubious endpoint of initial ED visit duration as their primary endpoint. Though the authors found a statistically significant difference in initial ED visit time, the 0.7-hour difference (5.6 hrs vs 6.3 hrs) hardly seems clinically relevant. In fact, the remainder of the clinically important variables all favored the usual care group (in-hospital mortality 4.4% vs 2.4% and 60-day mortality 5.4% vs 4.4%) (10). Three other trials published subsequently found similar results. Other than clinically questionable reductions in length of stay, the use of natriuretic peptides had no meaningful effect on clinical outcomes (11,12,15). When these trials’ data were pooled in a meta-analysis published by Trinquart et al in The American Journal of Emergency Medicine in 2011, the authors found no significant difference in any of the multitude of clinically relevant variables, including hospital admission rate, length of hospital stay, mortality or rates of re-hospitalization (13). Even in the long-term management of patients with known heart failure, when compared to a symptom-guided approach, a BNP-guided protocol led to further diagnostic testing and more aggressive medical therapy without producing a difference in clinically relevant outcomes (18-month survival free of any hospitalization was 41% vs 40%) (16).

This is not a proclamation of the infallibility of the Emergency Physician, but rather a recognition of our shortcomings. There is a clear group of patients who present a diagnostic challenge, for whom further confirmatory investigations could provide guidance. Despite the industry-sponsored studies designed to propagate an overinflated sense of these assays’ worth, a close examination of the natriuretic peptides reveals they add little value to the Physician’s judgment. When we as Emergency Physicians are certain of the diagnosis of acute decompensated heart failure, our intrinsic diagnostic capabilities outperform those of natriuretic peptides. In the patients who present a diagnostic challenge, these assays are far too insensitive and non-specific to add substantial diagnostic clarity. Furthermore, we have other, more diagnostically robust tools, like point of care ultrasound, to assist in these challenging circumstances (14). Natriuretic peptides are not the diagnostic saviors they are commonly proclaimed to be. More importantly, we are not in need of rescue as often as the makers of these peptides would have us believe. On the rare occasion we do require aid, should we not demand a far more resolute champion?

Sources Cited:

  1. Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE, Drazner M, et al. ACCF/AHA guideline for the management of heart failure: a report of the American College of Cardiology Foundation/American Heart Association Task Force on practice guidelines. Circulation 2013;128:e240-327.
  2. McMurray JJV, Adamopoulos S, Anker SD, Auricchio A, Bohm M, Dickstein K, et al. ESC guidelines for the diagnosis and treatment of acute and chronic heart failure 2012: The Task Force for the Diagnosis and Treatment of Acute and Chronic Heart Failure 2012 of the European Society of Cardiology. Developed in collaboration with the Heart Failure Association (HFA) of the ESC. Eur J Heart Fail 2012;14:803-69.
  3. Thygesen K, Mair J, Mueller C, Huber K, Weber M, Plebani M, et al. Recommendations for the use of natriuretic peptides in acute cardiac care: a position statement from the Study Group on Biomarkers in Cardiology of the ESC Working Group on Acute Cardiac Care. Eur Heart J 2012;33:2001-6.
  4. Roberts E, Ludman AJ, Dworzynski K, Al-Mohammad A, Cowie MR, McMurray JJV, et al. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ 2015;350:h910.
  5. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347:(3)161-7.
  6. McCullough PA, Nowak RM, McCord J, et al. B-type natriuretic peptide and clinical judgment in emergency diagnosis of heart failure: analysis from Breathing Not Properly (BNP) Multinational Study. Circulation. 2002;106:(4)416-22.
  7. Schwam E. B-type natriuretic peptide for diagnosis of heart failure in emergency department patients: a critical appraisal. Acad Emerg Med. 2004;11:(6)686-91.
  8. Hohl CM, Mitelman BY, Wyer P, Lang E. Should emergency physicians use B-type natriuretic peptide testing in patients with unexplained dyspnea? CJEM. 2003;5:(3)162-5.
  9. Mueller C, Scholer A, Laule-Kilian K, Martina B, Schindler C, Buser P, et al. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. N Engl J Med 2004;350(7):647-54.
  10. Moe GW, Howlett J, Januzzi JL, Zowall H. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation 2007;115(24):3103-10.
  11. Rutten JH, Steyerberg EW, Boomsma F, van Saase JL, Deckers JW, Hoogsteden HC, et al. N-terminal pro-brain natriuretic peptide testing in the emergency department: beneficial effects on hospitalization, costs, and outcome. Am Heart J 2008;156(1):71-7.
  12. Schneider HG, Lam L, Lokuge A, Krum H, Naughton MT, De Villiers Smit P, et al. B-type natriuretic peptide testing, clinical outcomes, and health services use in emergency department patients with dyspnea: a randomized trial. Ann Intern Med 2009;150(6):365-71.
  13. Trinquart L, Ray P, Riou B, Teixeira A. Natriuretic peptide testing in EDs for managing acute dyspnea: a meta-analysis. Am J Emerg Med. 2011;29:(7)757-67.
  14. Al Deeb M, Barbic S, Featherstone R, Dankoff J, Barbic D. Point-of-care ultrasonography for the diagnosis of acute cardiogenic pulmonary edema in patients presenting with acute dyspnea: a systematic review and meta-analysis. Acad Emerg Med. 2014;21:(8)843-52.
  15. Singer AJ, Birkhahn RH, Guss D, et al. Rapid Emergency Department Heart Failure Outpatients Trial (REDHOT II): a randomized controlled trial of the effect of serial B-type natriuretic peptide testing on patient management. Circ Heart Fail. 2009;2:(4)287-93.
  16. Pfisterer M, Buser P, Rickli H, et al. BNP-guided vs symptom-guided heart failure therapy: the Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) randomized trial. JAMA. 2009;301:(4)383-92.