Decision-Making in Clinical Medicine

Chapter 4 | Part 1: The Profession of Medicine

KEY CLINICAL POINTS

Clinical expertise involves dual-process reasoning: fast intuitive (System 1) pattern recognition and slow analytic (System 2) deliberative thinking
Diagnostic test interpretation requires understanding sensitivity, specificity, likelihood ratios, and application of Bayes' theorem to calculate posttest probability
Heuristics (representativeness, availability, anchoring, simplicity) are cognitive shortcuts that can lead to diagnostic errors when misapplied
Evidence-based medicine integrates best available research evidence with clinical judgment and patient preferences for personalized decision-making
Clinical practice guidelines synthesize evidence systematically but must be applied with consideration of individual patient characteristics and preferences

1. DEFINITION & OVERVIEW

Medical practice fundamentally requires making decisions under conditions of inherent uncertainty. Sir William Osler's observation that 'Medicine is a science of uncertainty and an art of probability' captures the complex duality of clinical practice. Although deeply rooted in science, medicine remains a craft requiring varying levels of skill, knowledge, and understanding. This chapter addresses three pillars of modern medical practice: (1) expertise in clinical reasoning, (2) rational diagnostic test use and interpretation, and (3) integration of research evidence with clinical judgment (evidence-based medicine).

1.1 The Challenge of Clinical Expertise

Defining clinical expertise remains surprisingly difficult. Unlike chess or athletics with objective ranking systems, medicine lacks benchmarks to identify physicians who have attained the highest levels of clinical performance after completing training. Elite clinicians known for their problem-solving prowess often cannot explain their exact processes and methods, limiting dissemination of expertise. Furthermore, clinical virtuosity is not generalizable—an expert in hypertrophic cardiomyopathy may be no better than a first-year resident at managing neutropenic fever with hypotension.

1.2 Components of Clinical Expertise

Clinical expertise encompasses: (1) Cognitive dimensions involving integration of disease knowledge with verbal and visual cues and test interpretation; (2) Complex fine-motor skills for invasive procedures; (3) Effective communication and care coordination with patients and medical team members. Research on medical expertise remains sparse and is mostly centered on diagnostic reasoning.

2. CLINICAL REASONING: DUAL-PROCESS THEORY

The dual-process theory distinguishes two general conceptual modes of thinking: fast (System 1) and slow (System 2) reasoning. Understanding these systems and their interactions is fundamental to developing clinical expertise and avoiding diagnostic errors.

2.1 System 1: Intuitive Reasoning

Intuition (System 1) provides rapid, effortless judgments from memorized associations using pattern recognition and heuristics (rules of thumb). For example, 'black woman plus hilar adenopathy equals sarcoid' is a simple pattern useful in certain situations. Because no effort is involved in recalling patterns, clinicians often cannot articulate how judgments were formulated. Pattern recognition is a complex cognitive process that appears largely effortless—one can recognize faces, automobile models, or music from a few notes within milliseconds.

2.2 System 2: Analytic Reasoning

Analysis (System 2) is slow, methodical, deliberative, and effortful. A student might read about causes of hilar adenopathy and from that list identify diseases more common in black women, or examine the patient for skin or eye findings associated with sarcoid. For complex or unfamiliar diagnostic problems, clinicians typically resort to System 2 and proceed methodically using the hypothetico-deductive model.

2.3 Pattern Recognition in Clinical Practice

Experienced clinicians recognize familiar diagnostic patterns very quickly. The key is a large library of stored patterns that can be rapidly accessed. Without an extensive stored repertoire, students and clinicians operating outside their area of expertise must use more laborious System 2 approaches with more intensive data collection. Consider three patterns for hemoptysis: (1) blood-streaked sputum in a healthy 46-year-old nonsmoker recovering from viral bronchitis suggests acute bronchitis, and a chest x-ray is sufficient; (2) a 46-year-old with a 100-pack-year smoking history, productive cough, and weight loss fits the pattern of lung carcinoma and requires a chest x-ray, sputum cytology, and CT scan; (3) a 46-year-old immigrant from a developing country with a diastolic rumbling murmur suggests rheumatic mitral stenosis and requires an echocardiogram.

2.4 Diagnostic Verification and Premature Closure

Pattern recognition alone is insufficient for a secure diagnosis. Without deliberate, systematic reflection, undisciplined pattern recognition can result in premature closure: mistakenly concluding that the correct diagnosis has been reached before all relevant data are obtained. A critical second step is diagnostic verification: considering whether the diagnosis adequately accounts for all presenting symptoms and signs and explains all ancillary findings.

2.5 Case Example: Premature Closure

A 45-year-old man presented with a 3-week history of a 'flulike' upper respiratory infection (URI), including dyspnea and productive cough. The ED clinician used a 'URI assessment form,' noted the absence of fever and a clear chest examination, prescribed a cough suppressant for acute bronchitis, and reassured the patient. After a sleepless night with significant dyspnea, the patient developed nausea and vomiting and collapsed; he was brought back in cardiac arrest and could not be resuscitated. Autopsy showed a posterior wall MI with a fresh thrombus in an atherosclerotic right coronary artery. The error: by concentrating on the abbreviated URI protocol, the clinician failed to elicit the full history of the dyspnea, which had been precipitated by exertion, accompanied by chest heaviness, and relieved by rest.

3. HEURISTICS AND COGNITIVE BIASES

Heuristics or rules of thumb are part of the intuitive system, providing quick and easy paths to conclusions. However, when used improperly, they can lead to errors. Two major research programs have studied heuristics with different conclusions: the 'heuristics and biases' program focuses on how mental shortcuts lead to incorrect judgments, while the 'fast and frugal heuristics' program explores when simple heuristics produce good decisions.

Major Cognitive Heuristics and Associated Errors

Heuristic	Definition	Potential Error	Clinical Example
Representativeness	Pattern matching based on similarity to mental disease models	Overestimating rare disease likelihood; underestimating atypical presentations	Assuming pheochromocytoma with classic triad despite low prevalence
Availability	Judgments based on ease of recall of similar cases	Overweighting memorable/recent/dramatic cases	Ordering unnecessary workup after malpractice case discussion
Anchoring	Insufficient adjustment of probability after new information	Ignoring test results that contradict initial impression	Proceeding to cardiac catheterization despite negative stress test
Simplicity (Occam's razor)	Seeking simplest unifying explanation	Premature closure; missing multiple diagnoses	Attributing all symptoms to one disease, missing comorbidity

3.1 Representativeness Heuristic

Clinicians develop diagnostic hypotheses based on the similarity of patient symptoms, signs, and data to their mental representations (memorized patterns) of disease possibilities. This cognitive shortcut involves pattern matching to identify diagnoses sharing the most similar findings to the patient. Example: A hypertensive patient with headache, palpitations, and diaphoresis—this classic triad suggests pheochromocytoma. However, judging pheochromocytoma as quite likely would be incorrect because other causes of hypertension are much more common and this triad can occur without pheochromocytoma. Errors arise from: (1) Overestimating likelihood based on representative symptoms while failing to account for low underlying prevalence (pretest probability); (2) Underestimating likelihood due to atypical presentations of common diseases; (3) Inexperience with disease breadth, especially multi-organ system diseases like sarcoid or tuberculosis.

3.2 Availability Heuristic

Involves judgments based on how easily prior similar cases or outcomes can be brought to mind. Example: A clinician may recall a morbidity and mortality conference case in which an elderly patient presented with painless acute dyspnea, was evaluated for pulmonary causes, but was eventually found to have an acute MI, with the diagnostic delay contributing to ischemic cardiomyopathy. If the case was associated with a malpractice claim, it becomes even more memorable. Errors arise from recall bias: rare catastrophic outcomes are remembered with a clarity and force disproportionate to their likelihood, such as a patient with a sore throat eventually found to have leukemia, or a young athlete with leg pain found to have osteosarcoma. Publicized or recently experienced cases are easier to recall and therefore more influential on clinical judgments.

3.3 Anchoring Heuristic (Conservatism/Stickiness)

Involves insufficiently adjusting the initial probability of disease upward (or downward) after a positive (or negative) test, relative to the revision that Bayes' theorem would prescribe, so that the clinician sticks to the initial diagnosis. Example: A clinician may still judge the probability of coronary artery disease to be high despite a negative exercise perfusion test and proceed to cardiac catheterization.
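To make the size of the missing adjustment concrete, the short sketch below applies Bayes' theorem to a negative result. The inputs are hypothetical: a pretest probability of 0.80 and a perfusion test with sensitivity and specificity of 0.90 each (chosen to resemble the exercise SPECT figures quoted later in this chapter). The function name `posttest_negative` is illustrative only.

```python
def posttest_negative(pretest: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a NEGATIVE test, by Bayes' theorem."""
    false_negative_rate = 1.0 - sensitivity
    diseased_and_negative = pretest * false_negative_rate   # true disease, test missed it
    healthy_and_negative = (1.0 - pretest) * specificity    # no disease, test correctly negative
    return diseased_and_negative / (diseased_and_negative + healthy_and_negative)


# Hypothetical inputs: high clinical suspicion, accurate perfusion test
p = posttest_negative(pretest=0.80, sensitivity=0.90, specificity=0.90)
print(f"posttest probability after a negative test: {p:.2f}")  # 0.31
```

Under these assumptions the probability should fall from 0.80 to roughly 0.31; a clinician who still rates it near 0.80 is anchored.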

3.4 Simplicity Heuristic (Occam's Razor)

States that clinicians should use the simplest explanation that adequately accounts for the patient's symptoms and findings. Although attractive and often used, the heuristic has no biologic basis. Errors include premature closure, leading to neglect of unexplained significant symptoms or findings.

4. HYPOTHETICO-DEDUCTIVE REASONING MODEL

For complex or unfamiliar diagnostic problems, clinicians typically resort to analytic reasoning (System 2) and proceed methodically using the hypothetico-deductive model. This structured approach involves systematic hypothesis generation, refinement, and verification.

4.1 Hypothesis Generation

Based on the patient's stated reasons for seeking medical attention, clinicians develop an initial list of diagnostic possibilities. During the history of present illness, initial hypotheses evolve as emerging information is tested against mental models of diseases being considered.

4.2 Hypothesis Refinement

Diagnoses increase and decrease in likelihood or are dropped from or added to consideration as working hypotheses. Mental models generate additional questions that distinguish diagnostic possibilities from one another. The focused physical examination further distinguishes among working hypotheses, for example: Is the spleen enlarged? How big is the liver? Is it tender? Are there palpable masses or nodules?

4.3 Diagnostic Verification

Involves testing the adequacy (whether diagnosis accounts for all symptoms and signs) and coherency (whether signs and symptoms are consistent with underlying pathophysiologic causal mechanism) of the working diagnosis. Example: If the enlarged and tender liver on examination is due to acute hepatitis (hypothesis), then certain specific liver function tests will be markedly elevated (prediction). If tests return normal, the hypothesis may need to be discarded and others reconsidered.

4.4 Importance of Negative Findings

Negative findings are as important as positive ones because they reduce likelihood of diagnostic hypotheses under consideration. Chest discomfort not provoked or worsened by exertion and not relieved by rest in an active patient lowers likelihood of chronic ischemic heart disease. Absence of resting tachycardia and thyroid gland enlargement reduces likelihood of hyperthyroidism in paroxysmal atrial fibrillation.

4.5 Diagnostic Imperatives

The acuity of illness may override considerations of prevalence. Diagnostic imperatives recognize the significance of relatively rare conditions that are potentially catastrophic if undiagnosed and untreated. Clinicians should routinely consider aortic dissection as a possible cause of acute severe chest discomfort. Although its typical presenting symptoms differ from those of MI, dissection may mimic MI; it is far less prevalent but potentially fatal if mistreated. Clinicians caring for patients with acute severe chest pain should explicitly and routinely inquire about symptoms suggesting dissection, measure blood pressure in both arms to detect discrepancies, and examine for pulse deficits. When all of these are negative, clinicians may feel reassured in discarding the dissection hypothesis. If the chest x-ray shows a possibly widened mediastinum, the hypothesis should be reinstated and appropriate imaging ordered (thoracic CT angiography or transesophageal echocardiography).

5. KNOWLEDGE ORGANIZATION IN EXPERTS

Research has shifted from examining the problem-solving process of experts to analyzing the organization of their knowledge for pattern matching. Understanding how experts structure knowledge provides insights into developing clinical expertise.

5.1 Forms of Knowledge Organization

Experts organize knowledge as: (1) Exemplars: Diagnosis based on resemblance of new case to patients seen previously; (2) Prototypes: Abstract mental models of disease incorporating likelihood of various disease features; (3) Illness scripts: Include risk factors, pathophysiology, symptoms, and signs. Experts have a much larger store of exemplar and prototype cases (e.g., the visual long-term memory of experienced radiologists). Clinicians do not simply rely on literal recall but have constructed elaborate conceptual networks of memorized information or models of disease.

5.2 Characteristics of Expertise

No single theory accounts for all key features of expertise in medical diagnosis. Experts have more knowledge about presenting symptoms of diseases and a larger repertoire of cognitive tools than nonexperts. One definition highlights the ability to make powerful distinctions—a working knowledge of diagnostic possibilities and features that distinguish one disease from another. Memorization alone is insufficient; photographic memory of a medical textbook would not make one an expert. But having access to detailed case-specific relevant information is critically important.

5.3 Developing Expertise

Whether any didactic program can accelerate progression from novice to expert remains uncertain. Strategies used outside medicine (music, athletics, chess): (1) Deliberate effortful practice over extended time (10 years or 10,000 practice hours); (2) Personal coaching. Their use in medical practice has not been adequately explored. Studies suggest the most beneficial educational approach exposes students to both: disease pattern recognition (signs and symptoms of specific diseases) and differential diagnosis (lists of diseases presenting with specific symptoms). Active learning opportunities include developing a personal learning system, systematically reflecting on diagnostic processes used (metacognition), and following up to identify diagnoses and treatments for patients in their care.

6. DIAGNOSTIC TEST PERFORMANCE

The purpose of performing a test is to reduce uncertainty about diagnosis or prognosis to facilitate appropriate management. Any information that changes a clinician's understanding of the patient's problem qualifies as a diagnostic test, including history and physical examination.

Measures of Diagnostic Test Accuracy

Measure	Formula	Interpretation
True-positive rate (Sensitivity)	TP/(TP + FN)	Proportion with disease who test positive
False-negative rate	FN/(TP + FN) = 1 – sensitivity	Proportion with disease who test negative
True-negative rate (Specificity)	TN/(TN + FP)	Proportion without disease who test negative
False-positive rate	FP/(TN + FP) = 1 – specificity	Proportion without disease who test positive

2x2 Table for Diagnostic Test Results

Test Result	Disease Present	Disease Absent
Positive	True Positives (TP)	False Positives (FP)
Negative	False Negatives (FN)	True Negatives (TN)

6.1 Test Accuracy Fundamentals

Test accuracy is best assessed relative to a 'gold standard,' where a positive gold standard test defines patients with disease and a negative test securely rules out disease. Characterizing diagnostic performance requires: identifying an appropriate population (patients representative of those in whom the test would be used) and applying both new and gold standard tests to all subjects. Biased estimates occur when diagnostic accuracy is defined using an inappropriate population or when gold standard determination is incomplete.

6.2 Sensitivity and Specificity

Sensitivity (true-positive rate): Proportion of patients with disease who have a positive test. Reflects how well the test identifies patients with disease. False-negative rate = 1 – sensitivity. Specificity (true-negative rate): Proportion of patients without disease who have a negative test. Reflects how well the test correctly identifies patients without disease. False-positive rate = 1 – specificity. A theoretically perfect test would have 100% sensitivity and 100% specificity.
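The definitions above map directly onto the four cells of the 2x2 table. The sketch below computes them from raw counts; the class name `TwoByTwo` and the counts in the example (a hypothetical validation study of 100 patients with disease and 200 without) are illustrative assumptions, not data from the text.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table of test result versus gold-standard disease status."""
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # true-positive rate

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # true-negative rate

    @property
    def false_negative_rate(self) -> float:
        return 1.0 - self.sensitivity

    @property
    def false_positive_rate(self) -> float:
        return 1.0 - self.specificity


# Hypothetical validation study: 100 patients with disease, 200 without
table = TwoByTwo(tp=90, fp=20, fn=10, tn=180)
print(table.sensitivity, table.specificity)  # 0.9 0.9
```

Note that sensitivity uses only the disease-present column and specificity only the disease-absent column, which is why neither depends arithmetically on how many patients of each kind were enrolled.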

6.3 Clinical Mnemonics for Test Interpretation

SnNout: A test with very high Sensitivity when Negative helps rule OUT disease. SpPin: A test with very high Specificity when Positive helps rule IN disease.

6.4 Trade-off Between Sensitivity and Specificity

Calculating sensitivity and specificity requires selection of a threshold (cut point) above which the test is considered positive. Making the cut point stricter (raising it): lowers sensitivity, improves specificity. Making the cut point laxer (lowering it): raises sensitivity, lowers specificity. This trade-off is displayed graphically as a receiver operating characteristic (ROC) curve, plotting sensitivity (y-axis) versus 1 – specificity (x-axis). Each point represents a potential cut point with associated sensitivity and specificity values.
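The trade-off can be seen by sweeping a cut point over a set of test values. The sketch below assumes higher values are more abnormal and that a value at or above the cut point counts as positive; both value lists are hypothetical.

```python
# Hypothetical continuous test values (higher = more abnormal)
diseased = [4.1, 5.0, 5.6, 6.2, 6.8, 7.5, 8.0, 9.1]
healthy = [2.0, 2.7, 3.1, 3.9, 4.4, 5.2, 5.9, 6.5]


def sens_spec(cut: float) -> tuple[float, float]:
    """Treat values >= cut as positive; return (sensitivity, specificity)."""
    sens = sum(x >= cut for x in diseased) / len(diseased)
    spec = sum(x < cut for x in healthy) / len(healthy)
    return sens, spec


# Each cut point yields one (1 - specificity, sensitivity) point on the ROC curve
for cut in (3.0, 5.0, 7.0):
    sens, spec = sens_spec(cut)
    print(f"cut point {cut}: sensitivity {sens:.3f}, 1 - specificity {1 - spec:.3f}")
```

With these values, raising the cut point from 3.0 to 7.0 lowers sensitivity from 1.0 to 0.375 while raising specificity from 0.25 to 1.0.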

6.5 ROC Curve Interpretation

Area under the ROC curve (AUC): a quantitative measure of the information content of a test. Values range from 0.5 (no diagnostic information; equivalent to a coin flip) to 1.0 (perfect test). The choice of cut point should reflect the relative harms and benefits of treatment. For safe treatments with substantial benefit, choose a high-sensitivity cut point (upper right of the ROC curve), e.g., screening newborns for phenylketonuria. For treatments with a substantial risk of harm, choose a high-specificity cut point (lower left of the ROC curve), e.g., chemotherapy for cancer. Prevalence and clinical context also affect the choice: in low-prevalence populations false-positive results are proportionally more common and may dominate the harms (e.g., HIV testing of marriage license applicants), whereas in other settings the harm of a false-negative result dominates (e.g., HIV testing of blood donors).
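One way to compute the AUC from raw test values, sketched here assuming higher values indicate disease, uses the fact that the area under the empirical ROC curve equals the probability that a randomly chosen patient with disease has a higher value than a randomly chosen patient without disease (ties counted as one-half). The function name and example values are hypothetical.

```python
def empirical_auc(diseased: list[float], healthy: list[float]) -> float:
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly; ties count 1/2."""
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0   # pair correctly ordered
            elif d == h:
                score += 0.5   # tie: no information either way
    return score / (len(diseased) * len(healthy))


print(empirical_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5]))  # 8 of 9 pairs correctly ordered
```

The pairwise loop is quadratic and suits only small samples; it is meant to show what the AUC measures, with 1.0 meaning every pair is ordered correctly and 0.5 meaning the test orders pairs no better than chance.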

7. BAYES' THEOREM AND LIKELIHOOD RATIOS

Bayes' rule provides a way to quantify revised uncertainty using probability mathematics, helping avoid anchoring bias. It calculates posttest probability from three parameters: pretest probability, test sensitivity, and test specificity.

Interpretation of Likelihood Ratios

Likelihood Ratio Range	Effect on Posttest Probability	Clinical Significance
>10 (positive test)	Large increase	Often conclusive for ruling in disease
5-10 (positive test)	Moderate increase	Intermediate discriminatory ability
2-5 (positive test)	Small increase	Modest discriminatory ability
1	No change	Uninformative test
0.2-0.5 (negative test)	Small decrease	Modest discriminatory ability
0.1-0.2 (negative test)	Moderate decrease	Intermediate discriminatory ability
<0.1 (negative test)	Large decrease	Often conclusive for ruling out disease

7.1 Key Definitions

Pretest probability: Quantitative estimate of likelihood of diagnosis before test is performed. Usually estimated from disease prevalence in underlying population or clinical context (age, sex, symptoms). Posttest probability (predictive value): Recalibrated probability of diagnosis accounting for both pretest probability and test results.

7.2 Bayes' Rule Formula

For probability of disease following a positive test: Posttest probability = [Pretest probability × test sensitivity] / [Pretest probability × test sensitivity + (1 – Pretest probability) × false-positive rate]. Example: 64-year-old woman with atypical chest pain, pretest probability 0.50, positive test (sensitivity 0.90, specificity 0.90): Posttest probability = (0.50)(0.90) / [(0.50)(0.90) + (0.50)(0.10)] = 0.45/0.50 = 0.90
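The formula translates almost word for word into a small function; the sketch below (function name illustrative) reproduces the worked example for the 64-year-old woman.

```python
def posttest_positive(pretest: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a POSITIVE test, following Bayes' rule as written above."""
    false_positive_rate = 1.0 - specificity
    numerator = pretest * sensitivity                       # diseased and test positive
    return numerator / (numerator + (1.0 - pretest) * false_positive_rate)


# Worked example from the text: pretest 0.50, sensitivity 0.90, specificity 0.90
print(round(posttest_positive(0.50, 0.90, 0.90), 2))  # 0.9
```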

7.3 Likelihood Ratios

Likelihood ratio summarizes the impact of diagnostic test results as the ratio of probability of a given test result in patients with disease to probability in patients without disease. Likelihood ratio for positive test (LR+) = sensitivity / (1 – specificity) = true-positive rate / false-positive rate. Example: Test with 0.90 sensitivity and 0.90 specificity has LR+ = 0.90/(1 – 0.90) = 9. A positive result is 9 times more likely in a patient with disease than without. Most medical tests have LR+ between 1.5 and 20. Higher values substantially increase posttest likelihood of disease. Very high LR+ (>10) usually implies high specificity—positive test helps 'rule in' disease (SpPin). Likelihood ratio for negative test (LR–) = (1 – sensitivity) / specificity = false-negative rate / true-negative rate. Lower LR– values substantially lower posttest likelihood of disease. Very low LR– (<0.10) usually implies high sensitivity—negative test helps 'rule out' disease (SnNout). Example: Test with 0.90 sensitivity and 0.90 specificity has LR– = (1 – 0.90)/0.90 = 0.11. A negative result is about one-tenth as likely in patients with disease as in those without.
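Both ratios follow from sensitivity and specificity alone, as in this minimal sketch (function name illustrative), which reproduces the 0.90/0.90 example above.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with a positive/negative result."""
    lr_pos = sensitivity / (1.0 - specificity)  # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity  # false-negative rate / true-negative rate
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 9.0, LR- = 0.11
```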

7.4 Interpreting Likelihood Ratios

Discriminatory ability based on LR+: LR+ 2-5: Modest discriminatory ability; LR+ 5-10: Intermediate discriminatory ability; LR+ >10: High discriminatory ability. The nomogram version of Bayes' rule visually demonstrates how pretest probability combined with likelihood ratio yields posttest probability.
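The nomogram is a graphical form of the odds version of Bayes' rule: posttest odds = pretest odds × likelihood ratio. The sketch below (function name illustrative) performs the same calculation numerically for a hypothetical pretest probability of 0.30 across the likelihood ratio bands in the table above.

```python
def posttest_from_lr(pretest: float, lr: float) -> float:
    """Odds form of Bayes' rule: posttest odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)  # convert odds back to probability


for lr in (0.1, 0.5, 2.0, 5.0, 10.0):
    print(f"pretest 0.30, LR {lr:>4}: posttest {posttest_from_lr(0.30, lr):.2f}")
```

With this hypothetical starting point, an LR of 10 raises the probability to about 0.81 and an LR of 0.1 lowers it to about 0.04, in line with the "often conclusive" rows of the table.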

8. APPLICATIONS TO DIAGNOSTIC TESTING IN CAD

Comparison of two common tests for coronary artery disease diagnosis illustrates principles of test interpretation and the importance of pretest probability.

Comparison of Exercise Tests for CAD Diagnosis

Parameter	Exercise Treadmill	Exercise SPECT
Sensitivity	~60%	~90%
Specificity	~75%	~90%
Likelihood Ratio (+)	2.4	9.0
Discriminatory Ability	Modest (2-5 range)	Intermediate (5-10 range)
10% pretest → posttest (+ test)	~21%	~50%
80% pretest → posttest (+ test)	~91%	~97%
50% pretest → posttest (+ test)	~71%	~90%

8.1 Exercise Treadmill Test

Positive treadmill ST-segment response: average sensitivity ~60%, average specificity ~75%, positive likelihood ratio = 0.60/(1 – 0.75) = 2.4 (modest discriminatory ability, between 2 and 5). Application: a 41-year-old man with atypical chest pain and no other risk factors has a pretest probability of ~10%. After a positive result, the posttest probability rises to only ~21%. Application: a 60-year-old man with typical angina and multiple risk factors has a pretest probability of ~80%. After a positive result, the posttest probability rises to ~91%.

8.2 Exercise SPECT Myocardial Perfusion Imaging

Reversible exercise-induced perfusion defect: sensitivity ~90%, specificity ~90%, positive likelihood ratio = 0.90/(1 – 0.90) = 9.0 (intermediate discriminatory ability, between 5 and 10). Application: in the same patient with a 10% pretest probability, a positive test raises the probability of CAD to 50%. Although the more accurate test yields a higher posttest probability (50% vs ~21%), the result may not change management because it still represents only a 50:50 chance of disease. Application: in the patient with an 80% pretest probability, exercise SPECT raises the posttest probability to ~97% (vs ~91% for the treadmill). The more accurate test does not provide enough improvement to alter management; neither test improves much on the clinical data alone.

8.3 Key Principles from CAD Testing Examples

(1) When pretest probability is low, a positive result, even from an accurate test, does not move the posttest probability into a range high enough to rule in disease. In screening situations, pretest probabilities are particularly low (asymptomatic patients), making specificity especially important. Example: in screening of first-time female blood donors without HIV risk factors, a positive test raised the likelihood of HIV to only 67% despite 99.995% specificity, because the prevalence was 0.01%. (2) When pretest probability is high, a negative test may not adequately rule out disease if the test is not sufficiently sensitive. (3) The largest change in diagnostic likelihood following a test result occurs when the clinician is most uncertain (pretest probability 30%-70%). Example: a 70-year-old woman with typical angina and multiple risk factors has a pretest probability of ~50%. A positive treadmill test moves the posttest probability to ~71%; a positive exercise SPECT moves it to ~90%.
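So the figures in this section can be checked, the sketch below recomputes the positive-test posttest probabilities for both exercise tests directly from the quoted average sensitivity and specificity, using Bayes' rule. The dictionary name `TESTS` is illustrative, and the inputs are the approximate averages given in the text, not patient-level data.

```python
TESTS = {
    # name: (sensitivity, specificity), approximate averages quoted in the text
    "exercise treadmill": (0.60, 0.75),
    "exercise SPECT": (0.90, 0.90),
}


def posttest_positive(pretest: float, sens: float, spec: float) -> float:
    """Bayes' rule for a positive result."""
    return pretest * sens / (pretest * sens + (1.0 - pretest) * (1.0 - spec))


for name, (sens, spec) in TESTS.items():
    row = ", ".join(
        f"{pre:.0%} -> {posttest_positive(pre, sens, spec):.0%}" for pre in (0.10, 0.50, 0.80)
    )
    print(f"{name}: {row}")
```

The same loop also illustrates principle (3): for both tests the largest absolute gain occurs at the 50% pretest probability.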

8.4 Limitations of Bayes' Rule Applications

(1) Few tests provide only positive or negative results; many have multidimensional outcomes (e.g., extent of ST-segment depression, exercise duration, symptoms). Bayes' theorem can be adapted but is computationally more complex. (2) For sequential tests, posttest probability may be used as pretest probability for second test. This assumes conditional independence (first test results don't affect second test result likelihood)—often not true. (3) Sensitivity and specificity are often claimed to be prevalence-independent but frequently are not. Example: Treadmill exercise test has ~30% sensitivity in one-vessel CAD but ~80% in severe three-vessel CAD. Hospitalized/symptomatic/referral populations have higher disease prevalence and more advanced disease than outpatients—test sensitivity will likely be higher in hospitalized patients, specificity higher in outpatients.
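Point (2) can be made explicit: chaining tests means feeding each posttest probability back in as the next pretest probability, which in odds form amounts to multiplying the likelihood ratios. The sketch below assumes conditional independence, which, as noted above, often does not hold. The pairing of a positive treadmill test followed by a positive SPECT in one patient is hypothetical, and the function names are illustrative.

```python
def update(prob: float, lr: float) -> float:
    """One Bayesian update in odds form."""
    odds = prob / (1.0 - prob) * lr
    return odds / (1.0 + odds)


def sequential_posttest(pretest: float, lrs: list[float]) -> float:
    """Chain updates, using each posttest probability as the next pretest probability.

    Valid only if the tests are conditionally independent given disease status.
    """
    prob = pretest
    for lr in lrs:
        prob = update(prob, lr)
    return prob


# Hypothetical: positive treadmill (LR+ 2.4), then positive SPECT (LR+ 9.0)
print(f"{sequential_posttest(0.10, [2.4, 9.0]):.2f}")
```

Under independence the order of the tests does not matter. If the two results are correlated, as exercise tests in the same patient plausibly are, the chained figure (about 0.71 here) would overstate the evidence.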

9. RISK PREDICTION MODELS

Multivariable statistical models more accurately address complex diagnostic problems than simple Bayes' rule by simultaneously accounting for multiple relevant patient characteristics.

Wells Clinical Prediction Rule for Pulmonary Embolism (PE)

Clinical Feature	Points
Clinical signs of deep-vein thrombosis	3
Alternative diagnosis is less likely than PE	3
Heart rate >100 beats/min	1.5
Immobilization ≥3 days or surgery in previous 4 weeks	1.5
History of deep-vein thrombosis or PE	1.5
Hemoptysis	1
Malignancy (with treatment within 6 months) or palliative	1
INTERPRETATION:
Score >6.0	High probability
Score 2.0-6.0	Intermediate probability
Score <2.0	Low probability
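The Wells rule is a simple point sum, so it is easy to sketch in code using the weights and thresholds from the table above. The finding identifiers are hypothetical labels chosen for this example; whether an alternative diagnosis is less likely than PE remains a clinical judgment supplied by the user.

```python
# Point values from the Wells table; the dictionary keys are hypothetical labels
WELLS_POINTS = {
    "clinical_signs_dvt": 3.0,
    "alternative_dx_less_likely": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "prior_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}


def wells_score(findings: set[str]) -> float:
    """Sum the points for the findings present."""
    unknown = findings - set(WELLS_POINTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    return sum(WELLS_POINTS[f] for f in findings)


def wells_category(score: float) -> str:
    """Map a score to the three-tier interpretation in the table."""
    if score > 6.0:
        return "high"
    if score >= 2.0:
        return "intermediate"
    return "low"


score = wells_score({"clinical_signs_dvt", "heart_rate_over_100"})
print(score, wells_category(score))  # 4.5 intermediate
```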

9.1 Advantages of Prediction Models

Models explicitly account for multiple, possibly overlapping, pieces of patient-specific information and assign relative weight to each based on unique independent contribution to prediction. Example: A logistic regression model to predict CAD probability considers all relevant independent factors from clinical examination and diagnostic testing and their relative importance, rather than the limited data clinicians can manage mentally or with simple Bayes' rule.
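A logistic regression model combines several predictors on the log-odds scale, each weighted by its independent contribution. The sketch below shows only the mechanics. The intercept, coefficients, and feature names are invented for illustration and do not come from any published CAD model.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not a published CAD model
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_per_decade": 0.6,
    "male_sex": 0.9,
    "typical_angina": 1.8,
    "diabetes": 0.5,
    "positive_stress_test": 1.2,
}


def predicted_probability(features: dict[str, float]) -> float:
    """Logistic model: log-odds = intercept + sum(coefficient * feature value)."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic function maps log-odds to probability


# Hypothetical 60-year-old man with typical angina and a positive stress test
patient = {"age_per_decade": 6.0, "male_sex": 1.0, "typical_angina": 1.0, "positive_stress_test": 1.0}
print(f"predicted probability of CAD: {predicted_probability(patient):.2f}")
```

The test result enters as one predictor among many rather than as a separate Bayesian step, which is how such models can account for overlap between clinical findings and test results.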

9.2 Limitations and Applications

Prediction models are usually too computationally complex to use without a calculator or computer. Several guideline-driven treatment recommendations rely on statistical prediction models available online, such as the ACC/AHA risk calculator for primary prevention with statins and the CHA2DS2-VASc calculator for anticoagulation in atrial fibrillation. Some predictive models are now embedded in electronic health record (EHR) systems, most commonly addressing thrombosis/anticoagulation and sepsis. Evidence about their impact on patient outcomes is mostly observational; more work is needed to deliver risk information to the right clinician at the right time in a way that supports clinical workflow.

9.3 Model Validation

Only a handful of prediction models have been validated sufficiently (e.g., Wells criteria for pulmonary embolism). The importance of independent validation in a population separate from the development population cannot be overstated. An unvalidated risk prediction model should be viewed with skepticism appropriate for any new drug or device without rigorous clinical trial testing. When statistical survival models in cancer and heart disease have been compared with clinicians' predictions, models have been more consistent (as expected) but not always more accurate.

10. PERSONALIZED DECISION-MAKING

The modern ideal of medical therapeutic decision-making is to 'personalize' treatment recommendations by combining best available evidence with individual patient features and preferences.

10.1 Two Levels of Personalization

(1) Precision medicine: Individualizing risk of harm and benefit for options based on specific patient characteristics (risk factors, genomics, comorbidities). (2) Shared decision-making: Personalizing the therapeutic decision process by incorporating patient preferences and values for possible health outcomes. This typically involves clinicians sharing knowledge about options and associated consequences/trade-offs, and patients sharing their health goals (e.g., avoiding short-term surgical risk to see a grandchild's wedding).

10.2 Limitations of Personal Clinical Experience

Individualizing evidence does not mean relying on physician impressions from personal experience. Because of nonrandom selection, small sample sizes, and rare events, the chance of drawing erroneous causal inferences from one's own clinical experience is very high. For most chronic diseases, treatment response is a counterfactual concept that can be demonstrated only statistically in large populations. For example, one cannot infer for an individual patient that treating hypertension with an ACE inhibitor necessarily prevented a stroke during treatment, or that an untreated patient who had a stroke would definitely have avoided it if treated. For many chronic diseases, the majority of patients remain event-free regardless of treatment choice, some will have events regardless of treatment, and those who avoided events because of treatment cannot be individually identified. Blood pressure lowering (a surrogate endpoint) is not tightly coupled to the number of strokes prevented.
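A small arithmetic sketch shows why the three groups described above cannot be told apart at the bedside. The stroke risks are hypothetical, and the split assumes that treatment can prevent events but never causes one, so that anyone who has a stroke on treatment would also have had one untreated.

```python
def outcome_groups(n: int, risk_untreated: float, risk_treated: float) -> dict[str, float]:
    """Split a cohort into the three groups described in the text.

    Simplifying assumption: treatment can prevent events but never causes one.
    """
    if not 0.0 <= risk_treated <= risk_untreated <= 1.0:
        raise ValueError("expects 0 <= risk_treated <= risk_untreated <= 1")
    return {
        "event-free regardless of treatment": n * (1.0 - risk_untreated),
        "event despite treatment": n * risk_treated,
        "event prevented by treatment": n * (risk_untreated - risk_treated),
    }


# Hypothetical 10-year stroke risks for a hypertensive cohort
for group, count in outcome_groups(1000, risk_untreated=0.08, risk_treated=0.06).items():
    print(f"{group}: {count:.0f}")
```

In this scenario only 20 of 1,000 treated patients benefit, and a treated patient who stays stroke-free is far more likely to belong to the 920 who would have done well anyway; only the population-level comparison reveals the effect.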

11. NONCLINICAL INFLUENCES ON CLINICAL DECISION-MAKING

More than three decades of research on variations in clinician practice patterns have identified important nonclinical forces that shape clinical decisions. They can be grouped into four overlapping categories.

Nonclinical Influences on Clinical Decision-Making

Category	Factors	Potential Effects
Practice Style	Knowledge, training, experience, specialty perspective	Variation in diagnostic approach, treatment thresholds
Defensive Medicine	Malpractice concerns, fear of adverse outcomes	Overuse of tests/therapies, perpetuation of practice norms
Practice Setting	Workflow, technology, organization, environment, resource availability	Physician-induced demand, care fragmentation
Payment Systems	Fee-for-service, capitation, salary, value-based	Overuse (fee-for-service) or underuse (capitation) of services

11.1 Practice Style Factors

Factors include the physician's knowledge, training, and experience. Specialists generally know the evidence in their field better than generalists. Practice style defines norms of clinical behavior based on training, personal experience, and medical evidence. Examples of practice style differences: cardiologists evaluating lower-risk chest pain often conceptualize the primary diagnostic objective as maximizing detection of ischemia, favoring stress imaging. Internists may be more comfortable with an initial exercise ECG without imaging, following guideline recommendations that indicate no outcome advantage for stress imaging in this context. Cardiologists may also favor more liberal use of coronary angiography and revascularization for stable ischemic symptoms (the 'oculostenotic reflex').

11.2 Defensive Medicine

Physician perceptions about the risk of malpractice suits may drive clinical decisions. Defensive medicine involves ordering tests and therapies with very small marginal benefits to preclude future criticism should an adverse outcome occur. Over time, such patterns may become accepted practice norms, perpetuating overuse (e.g., annual cardiac exercise testing in asymptomatic patients).

11.3 Practice Setting Factors¶

Work systems: tasks, workflow (interruptions, inefficiencies, workload). Technology: EHR design or implementation issues. Organizational characteristics: culture, leadership, staffing, scheduling. Physical environment: noise, lighting, layout. Physician-induced demand: once medical facilities and technologies become available, physicians find ways to use them. Environmental factors: local availability of specialists, high-tech imaging or procedure facilities (MRI, proton beam therapy), fragmentation of care.

11.4 Payment System Factors¶

Fee-for-service: Physicians who do more get paid more, which encourages overuse, consciously or unconsciously. When fees are reduced (discounted reimbursement), clinicians tend to increase the number of services to maintain revenue. Capitation: A fixed payment per patient per year encourages physicians to consider a global population budget, ideally reducing interventions with small marginal benefit. Salary: Fixed payment regardless of volume. Current efforts seek a transition to value-based payment systems that reduce overuse and reflect benefit. Pay-for-performance models are being studied, but high-quality clinical trial evidence of their effectiveness is still mostly lacking.

12. FORMAL DECISION SUPPORT TOOLS¶

Over the past 50 years, many attempts have been made to develop computer systems to aid clinical decision-making and patient management.

12.1 Levels of Computer Support¶

Basic level: Ready access to vast reservoirs of information, which may be difficult to sort through. Higher levels: Support for care management decisions by making accurate predictions of outcome, simulating the whole decision process, or providing algorithmic guidance. Computer-based predictions using Bayesian or statistical regression models inform clinical decisions but do not reach 'conclusions' or 'recommendations.'
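To make the distinction between a prediction and a recommendation concrete, the minimal sketch below shows what a statistical regression model of this kind returns: a probability, nothing more. It assumes a logistic regression form; the intercept, predictor names, and coefficients are hypothetical values chosen for illustration, not a published or validated model.

```python
import math

# Hypothetical logistic regression model for illustration only; the intercept,
# predictors, and coefficients are made up and do not come from any published model.
INTERCEPT = -5.0
COEFFICIENTS = {"age_years": 0.04, "diabetes": 0.7, "current_smoker": 0.6}


def predicted_risk(patient: dict[str, float]) -> float:
    """Return the model's predicted probability of the outcome.

    The output is an estimate meant to inform judgment; the model itself
    reaches no conclusion and makes no recommendation.
    """
    linear_predictor = INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))  # inverse-logit link


patient = {"age_years": 65, "diabetes": 1, "current_smoker": 0}
print(f"predicted risk: {predicted_risk(patient):.1%}")
```

For the example patient the linear predictor is -5.0 + 0.04 × 65 + 0.7 = -1.7, giving a predicted risk of about 15%. What to do with that number is left to the clinician and patient.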

12.2 Artificial Intelligence in Medicine¶

Recent advances suggest that medicine is on the threshold of developing much more powerful digital tools, but current enthusiasm exceeds demonstrated utility. AI work dates to the 1950s and includes three major subtypes: neural networks; machine learning (and its subtype, deep learning), which is being applied to pattern-recognition tasks such as examination of skin lesions and interpretation of x-rays; and generative AI (models that generate new content), which includes large language models (e.g., GPT-4). Large language models show promise in helping create clinical notes. Their use in support of clinical decision-making is at a very preliminary stage, and they need independent validation in populations separate from their development populations. Early evidence suggests that clinicians are willing to rely on AI-based tools even when the information provided is clearly inaccurate or contradictory. Concerns about model confabulation and potential patient harms mandate careful and comprehensive testing before AI tools are implemented in clinical care.
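Independent validation asks whether a model's performance holds up in patients it was not built on. One common summary of discrimination is the concordance (C) statistic; the text does not name a metric, so using it here is an assumption. The minimal sketch below computes it by brute-force pairwise comparison, using hypothetical predicted risks for a development cohort and a separate validation cohort.

```python
from itertools import product


def c_statistic(predicted: list[float], outcomes: list[int]) -> float:
    """Concordance (C) statistic: the share of (event, non-event) patient pairs
    in which the patient who had the event was assigned the higher predicted
    risk. Tied predictions count as half-concordant."""
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks from one model and observed outcomes (1 = event).
development = ([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
validation = ([0.9, 0.4, 0.7, 0.6, 0.2, 0.5], [1, 1, 0, 0, 0, 1])

print(f"development C = {c_statistic(*development):.2f}")
print(f"validation  C = {c_statistic(*validation):.2f}")
```

In this made-up example, discrimination that looks perfect in the development data drops to 5/9 (about 0.56) in the separate cohort, which is the kind of degradation independent validation is meant to expose.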

12.3 Reminder and Protocol-Directed Systems¶

These systems do not make predictions; instead, they use existing algorithms (guidelines, appropriate utilization criteria) to direct clinical practice. Decision support systems have so far had little impact on practice overall. Reminder systems built into EHRs have shown the most promise, particularly in correcting drug dosing and promoting adherence to guidelines. Checklists may also help avoid or reduce errors.
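Protocol-directed reminders are fixed if-then rules rather than predictions. The sketch below illustrates that structure for a renal dosing reminder. The drug name `drug_x`, the creatinine clearance threshold, and the dose limit are hypothetical placeholders, not dosing guidance, and a real EHR implementation would look quite different.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    creatinine_clearance_ml_min: float


@dataclass
class Order:
    drug: str
    daily_dose_mg: float


# Hypothetical rule table: drug -> (CrCl threshold in mL/min, maximum daily dose
# in mg when CrCl is below that threshold). Placeholder values, not dosing guidance.
RENAL_DOSE_RULES: dict[str, tuple[float, float]] = {"drug_x": (30.0, 500.0)}


def dosing_reminders(patient: Patient, orders: list[Order]) -> list[str]:
    """Check active orders against fixed rules and return reminder messages.

    Nothing is predicted: each rule either fires or it does not."""
    reminders = []
    for order in orders:
        rule = RENAL_DOSE_RULES.get(order.drug)
        if rule is None:
            continue  # no rule defined for this drug
        threshold, max_dose = rule
        if patient.creatinine_clearance_ml_min < threshold and order.daily_dose_mg > max_dose:
            reminders.append(
                f"{order.drug}: {order.daily_dose_mg:g} mg/day exceeds the "
                f"{max_dose:g} mg/day limit for CrCl < {threshold:g} mL/min"
            )
    return reminders


print(dosing_reminders(Patient(20.0), [Order("drug_x", 1000.0), Order("drug_y", 40.0)]))
```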

12.4 Decision Analysis¶

Decision analysis represents a normative, prescriptive approach to decision-making under uncertainty. Its principal application is in complex decisions. Public health policy decisions often involve trade-offs: length versus quality of life, benefits versus resource use, population versus individual health, uncertainty regarding efficacy, effectiveness, and adverse events, and values regarding mortality and morbidity outcomes.
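Decision analysis typically structures a choice as a decision tree and "folds it back" by computing the expected utility of each option. The sketch below shows that calculation; the tree, the probabilities, and the utilities (on a 0-1 scale) are hypothetical and chosen only to show the arithmetic.

```python
from typing import Union

# A tree is either a terminal utility (float) or a chance node: a list of
# (probability, subtree) branches whose probabilities sum to 1.
Tree = Union[float, list]


def expected_utility(tree: Tree) -> float:
    """Fold back a tree by probability-weighting utilities at each chance node."""
    if isinstance(tree, (int, float)):
        return float(tree)
    total_p = sum(p for p, _ in tree)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"branch probabilities sum to {total_p}, not 1")
    return sum(p * expected_utility(sub) for p, sub in tree)


def best_option(options: dict[str, Tree]) -> tuple[str, float]:
    """At the decision node, pick the option with the highest expected utility."""
    scores = {name: expected_utility(tree) for name, tree in options.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


P_DISEASE = 0.2  # hypothetical probability of disease
OPTIONS = {
    "treat": [
        (P_DISEASE, [(0.9, 1.0), (0.1, 0.0)]),  # disease: treatment succeeds or fails
        (1 - P_DISEASE, 0.95),                  # no disease: side effects only
    ],
    "no_treat": [
        (P_DISEASE, [(0.4, 1.0), (0.6, 0.0)]),  # disease: recovers or not
        (1 - P_DISEASE, 1.0),                   # no disease: no harm
    ],
}

name, eu = best_option(OPTIONS)
print(f"{name}: expected utility {eu:.2f}")
```

With these made-up inputs, treating yields 0.2 × 0.9 + 0.8 × 0.95 = 0.94 versus 0.2 × 0.4 + 0.8 × 1.0 = 0.88 for not treating. Rerunning the calculation across a range of values for one uncertain input, such as `P_DISEASE`, is a one-way sensitivity analysis.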

12.5 Decision Analysis Example: Breast Cancer Screening¶

A 2016 CISNET analysis examined eight strategies differing by initiation age (40, 45, or 50 years) and frequency (annual, biennial, or hybrid). Six simulation models found biennial strategies to be the most efficient for average-risk women. Results: Biennial screening of 1000 women aged 50-74 vs no screening avoided 7 breast cancer deaths. Annual screening from age 40 to 74 avoided 3 additional deaths but required 20,000 additional mammograms and yielded 1988 more false-positive results. Factors influencing results: For women with a 2- to 4-fold higher risk, annual screening from age 40 to 74 yielded benefits similar to those of biennial screening from age 50 to 74 in average-risk women. For average-risk women with moderate or severe comorbidities, screening could stop earlier, at age 66-68 years.
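The trade-off in the annual-screening result can be expressed per additional death avoided, compared with biennial screening from age 50 to 74. Using only the figures above (all computed for the same cohort of women), the arithmetic is:

```latex
\begin{aligned}
\frac{\text{additional mammograms}}{\text{additional deaths avoided}} &= \frac{20{,}000}{3} \approx 6{,}667 \\[4pt]
\frac{\text{additional false-positive results}}{\text{additional deaths avoided}} &= \frac{1988}{3} \approx 663
\end{aligned}
```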

13. DIAGNOSIS AS AN ELEMENT OF QUALITY OF CARE¶

High-quality medical care begins with accurate diagnosis. Diagnostic errors represent a significant quality of care and patient-safety problem.

13.1 Incidence of Diagnostic Errors¶

Estimated by various methods: postmortem examinations, medical record reviews, medical malpractice claims. Each method yields complementary but different estimates. Current estimates suggest nearly everyone will experience at least one diagnostic error in their lifetime.

13.2 Consequences of Diagnostic Errors¶

Mortality, morbidity, unnecessary tests and procedures, costs, and anxiety.

13.3 Modern View of Diagnostic Errors¶

Past view: Failure of individual clinicians. Modern view: Mostly deficiencies in the system of care. Solutions therefore focus on system-level approaches, such as decision support and other tools integrated into EHRs, and checklists, which have been proposed as a means of reducing cognitive errors such as premature closure. Checklists have proved useful in certain contexts (operating rooms, intensive care units), but their value in preventing diagnostic errors that lead to adverse patient events remains to be demonstrated.

14. EVIDENCE-BASED MEDICINE¶

EBM places greater emphasis on processes by which clinicians gain knowledge of the most up-to-date and relevant clinical research to determine whether interventions alter disease course and improve length or quality of life.

14.1 Four Key Steps of EBM¶

(1) Formulating the management question to be answered; (2) Searching the literature and online databases for applicable research data; (3) Appraising the evidence gathered with regard to its validity and relevance; (4) Integrating this appraisal with knowledge about the unique aspects of the patient (including patient preferences about possible outcomes).

14.2 Practical Challenges¶

Searching the world research literature and appraising study quality and relevance can be time-consuming and require skills most clinicians do not possess. In busy clinical practice, this work is logistically not feasible. This has led to a focus on finding recent systematic overviews as a useful shortcut in the EBM process.

14.3 Systematic Reviews¶

Systematic reviews are regarded by some as the highest level of evidence in the EBM hierarchy because they are intended to comprehensively summarize the available evidence on a particular topic. To avoid the biases found in narrative review articles, they use explicit, predefined, reproducible search strategies and inclusion/exclusion criteria to find all relevant scientific research and grade its quality. Prototype: Cochrane Database of Systematic Reviews. When appropriate, a meta-analysis quantitatively summarizes the findings.

14.4 Limitations of Systematic Reviews¶

Systematic reviews are not uniformly the acme of EBM they were initially envisioned to be. Their value is less clear when only a few trials are available, when trials and observational studies are mixed, or when the evidence base is only observational. They cannot compensate for deficiencies in the underlying research, and many are created without the requisite clinical insight. The medical literature is flooded with systematic reviews of varying quality and clinical utility, and peer review has not proved to be an effective arbiter of quality. Systematic reviews should therefore be used with circumspection, along with selective reading of the best empirical studies.

15. SOURCES OF EVIDENCE: CLINICAL TRIALS AND REGISTRIES¶

Over the past 50 years, understanding of how best to turn raw observation into useful evidence has evolved considerably. The COVID-19 pandemic provided a refresher lesson in this process.

Comparison of Evidence Sources¶

Source	Strengths	Limitations
Randomized Clinical Trials	Best protection against selection bias; high internal validity	May lack generalizability; restrictive eligibility; regulatory compromises
Prospective Registries	Broader population; real-world practice; feasible when trials impossible	Treatment selection bias; requires statistical adjustment
Retrospective Data	Readily available; large sample sizes possible	Limited to recorded data; high bias potential; no control over confounders
Case Reports/Series	First reports of new findings; hypothesis-generating	No causal inference; cannot establish standards of practice

15.1 COVID-19 Pandemic Lessons¶

Starting in spring 2020, case reports, anecdotal experience, and small case series appeared and quickly became a flood of confusing and often contradictory evidence. Despite more than 40,000 publications in the first 7 months, enormous uncertainty remained about prevention, diagnosis, treatment, and prognosis. Many early publications were small observational series or reviews with substantial limitations in validity and generalizability. Small observational studies may generate important hypotheses or provide the first reports of adverse events or therapeutic benefit, but they have no role in formulating modern standards of practice.

15.2 Major Tools for Developing Reliable Evidence¶

The major tools are randomized clinical trials, supplemented strategically by large, high-quality observational registries. A registry or database is typically focused on a disease or syndrome (cancer types, acute or chronic CAD, chronic heart failure), a clinical procedure (bone marrow transplantation, coronary revascularization), or an administrative process (claims data for billing and reimbursement).

15.3 Observational Data Strengths and Limitations¶

By definition, in observational research the investigator does not control patient care. Carefully collected prospective observational data can at times achieve evidence quality approaching that of major clinical trials through trial emulation using causal inference methods. Retrospective data (e.g., chart review) are limited to what previous observers recorded and may not include the specific research data sought. Advantages: Inclusion of a broader population, as encountered in practice, than typical clinical trials (which have restrictive inclusion/exclusion criteria); primary evidence when a randomized trial cannot be performed (e.g., it is not possible to randomize by sex, race/ethnicity, or socioeconomic status, or to potentially harmful exposures such as smoking). Major limitation: Lack of protection from treatment selection bias, unlike randomized trials. Statistical models attempt to adjust for important imbalances, but when management is clearly not random (e.g., all eligible patients with left main CAD are referred for surgery), the problem may be too confounded for statistical correction.
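Both the value and the limit of statistical adjustment can be illustrated with its simplest form: stratifying by a confounder and averaging the within-stratum risk differences, weighted by stratum size (direct standardization). This is a stand-in for the richer models used in practice, and the strata and counts below are hypothetical. In the first dataset, sicker patients are treated more often; in the second, every high-risk patient is treated, so that stratum has no untreated comparison group and the sketch returns `None` rather than an estimate.

```python
from collections import defaultdict
from typing import Optional

# Each record is (stratum, treated, died).


def expand(stratum: str, treated: bool, n: int, deaths: int) -> list[tuple[str, bool, bool]]:
    """Hypothetical helper: build n patient records, the first `deaths` of whom died."""
    return [(stratum, treated, i < deaths) for i in range(n)]


def risk(rows) -> float:
    """Proportion of records in which the patient died."""
    return sum(died for _, _, died in rows) / len(rows)


def crude_risk_difference(records) -> float:
    """Treated minus untreated risk, ignoring strata."""
    treated = [r for r in records if r[1]]
    untreated = [r for r in records if not r[1]]
    return risk(treated) - risk(untreated)


def standardized_risk_difference(records) -> Optional[float]:
    """Average of within-stratum risk differences (treated minus untreated),
    weighted by each stratum's share of all patients. Returns None when a
    stratum lacks treated or untreated patients (no overlap to compare)."""
    by_stratum = defaultdict(lambda: {True: [], False: []})
    for rec in records:
        by_stratum[rec[0]][rec[1]].append(rec)
    estimate = 0.0
    for arms in by_stratum.values():
        if not arms[True] or not arms[False]:
            return None  # no within-stratum comparison is possible
        weight = (len(arms[True]) + len(arms[False])) / len(records)
        estimate += weight * (risk(arms[True]) - risk(arms[False]))
    return estimate


# Sicker (high-risk) patients are more often treated: confounding by indication.
confounded = (
    expand("low_risk", True, 20, 1) + expand("low_risk", False, 80, 8)
    + expand("high_risk", True, 80, 24) + expand("high_risk", False, 20, 8)
)
# Every high-risk patient is treated, so that stratum has no untreated patients.
no_overlap = (
    expand("low_risk", True, 20, 1) + expand("low_risk", False, 80, 8)
    + expand("high_risk", True, 100, 30)
)

print(f"crude RD:        {crude_risk_difference(confounded):+.3f}")
print(f"standardized RD: {standardized_risk_difference(confounded):+.3f}")
print(f"no overlap:      {standardized_risk_difference(no_overlap)}")
```

With these made-up counts the crude risk difference is +0.090 (treatment looks harmful), while the standardized estimate is -0.075 (treatment looks beneficial within each stratum).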

15.4 Importance of Concurrent Controls¶

Use of concurrent controls is vastly preferable to historical controls. Example: Comparing current surgical management of left main CAD with medically treated patients from the 1970s would be extremely misleading because medical therapy has substantially improved.

15.5 Randomized Controlled Clinical Trials¶

Randomized trials include the careful prospective design features of the best observational studies plus random allocation of treatment. Randomization provides the best protection against measured and unmeasured confounding due to treatment selection bias (a major aspect of internal validity). Trials may not have good external validity (generalizability) if recruitment excluded many potentially eligible subjects or if the eligibility criteria describe a very heterogeneous population. Consumers of evidence need to be aware that randomized trials vary widely in quality and applicability. The design process often involves compromises: FDA approval trials must fulfill regulatory requirements (e.g., placebo control) that may result in a trial population and design differing substantially from what practicing clinicians would find most useful.

16. META-ANALYSIS¶

Meta-analysis is research that combines and summarizes available evidence quantitatively. Most useful for summarizing all available randomized trials examining a particular therapy in a specific clinical context.

NNT Calculation Example¶

Population Risk	Control Event Rate	Treatment Event Rate	ARR	NNT
High risk	12%	8%	4%	25
Lower risk	6%	4%	2%	50

16.1 Key Features of Good Meta-Analysis¶

Ideally, unpublished trials should be identified and included to avoid publication bias (missing 'negative' trials). The best meta-analyses obtain and analyze individual patient-level data from all trials rather than using only summary data from published reports. Results are most persuasive if they include at least several large-scale, properly performed randomized trials. Not all published meta-analyses yield reliable evidence; their methodology should be scrutinized to ensure proper study design and analysis.

16.2 Value of Meta-Analysis¶

Meta-analysis can help detect benefits when individual trials are inadequately powered. Example: The benefits of streptokinase thrombolytic therapy in acute MI demonstrated by ISIS-2 in 1988 were already evident by the early 1970s through meta-analysis of earlier trials. However, when the available trials are small or poorly done, meta-analysis should not be viewed as a remedy for deficiencies in primary trial data or design.
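The point about inadequately powered trials can be seen in a small calculation. The sketch below pools four hypothetical trials using a fixed-effect, inverse-variance weighted average of log risk ratios. The text does not specify a pooling method, so this choice is an assumption, and real meta-analyses also assess heterogeneity and may use random-effects models. Each trial's 95% confidence interval crosses 1, yet the pooled interval does not.

```python
import math


def log_rr_and_se(events_t: int, n_t: int, events_c: int, n_c: int) -> tuple[float, float]:
    """Log risk ratio (treatment vs control) and its large-sample standard error."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def rr_ci(log_rr: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for the risk ratio, back-transformed from the log scale."""
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def fixed_effect_pool(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooled log RR and its SE."""
    weights = [1 / se**2 for _, se in estimates]
    pooled = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical trials: (treatment events, treatment n, control events, control n)
TRIALS = [(12, 150, 18, 150), (20, 250, 30, 250), (17, 200, 25, 200), (24, 300, 33, 300)]

estimates = [log_rr_and_se(*trial) for trial in TRIALS]
for lr, se in estimates:
    lo, hi = rr_ci(lr, se)
    print(f"trial  RR {math.exp(lr):.2f} (95% CI {lo:.2f}-{hi:.2f})")

pooled, pooled_se = fixed_effect_pool(estimates)
lo, hi = rr_ci(pooled, pooled_se)
print(f"pooled RR {math.exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```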

16.3 Interpreting Meta-Analysis Results¶

Meta-analyses typically focus on summary measures of relative treatment benefit (odds ratios, relative risks). Clinicians should also examine the absolute risk reduction (ARR). Number needed to treat (NNT) = 1/ARR. The NNT should not be interpreted literally as a causal statement. Example: If a therapy reduced 5-year mortality by 33% (relative benefit) from 12% (control) to 8% (treatment), ARR = 12% - 8% = 4%, and NNT = 1/0.04 = 25. This does not mean literally that 1 patient benefits and 24 do not; it is an informal measure of treatment efficiency.
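The worked example above reduces to a few lines of arithmetic. The sketch below assumes both event rates are proportions measured over the same follow-up period (5 years in the example); the function name is hypothetical.

```python
def treatment_effect_summary(control_rate: float, treatment_rate: float) -> dict[str, float]:
    """RRR, ARR, and NNT from event proportions measured over the same follow-up."""
    if not 0 <= treatment_rate < control_rate <= 1:
        raise ValueError("expected 0 <= treatment_rate < control_rate <= 1")
    arr = control_rate - treatment_rate  # absolute risk reduction
    return {
        "RRR": arr / control_rate,  # relative risk reduction
        "ARR": arr,
        "NNT": 1 / arr,             # number needed to treat = 1/ARR
    }


summary = treatment_effect_summary(control_rate=0.12, treatment_rate=0.08)
print({name: round(value, 3) for name, value in summary.items()})
```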

16.4 NNT and Risk Stratification¶

If the same treatment is applied to a lower-risk population (6% 5-year mortality), the 33% relative benefit reduces absolute mortality by 2% (from 6% to 4%), and the NNT for the same therapy in this lower-risk group is 50. Comparisons of NNT estimates should account for the duration of follow-up used to create each estimate. The NNT concept assumes a homogeneity of response to treatment that may not be accurate. The NNT is simply another way of summarizing the absolute treatment difference and does not provide unique information.
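The sketch below applies one fixed relative risk (8%/12%, the 33% relative reduction in the text) to different baseline risks, under exactly the homogeneity assumption the text cautions about. The 12% and 6% rows reproduce the figures above; the 3% row is a hypothetical extension.

```python
def nnt_at_baseline(baseline_risk: float, relative_risk: float) -> float:
    """NNT when a fixed relative risk (treated/control) is applied to a baseline risk.

    Assumes the relative effect is identical at every baseline risk and that all
    risks refer to the same follow-up period (5 years in the text's example)."""
    arr = baseline_risk * (1 - relative_risk)
    return 1 / arr


RELATIVE_RISK = 0.08 / 0.12  # 8% vs 12%: the 33% relative reduction in the text

for baseline in (0.12, 0.06, 0.03):  # 0.03 is a hypothetical extension
    print(f"baseline risk {baseline:.0%}: NNT = {nnt_at_baseline(baseline, RELATIVE_RISK):.0f}")
```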

17. CLINICAL PRACTICE GUIDELINES¶

Per the 1990 Institute of Medicine definition: 'Systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances.'

17.1 Key Features of Modern Guidelines¶

(1) Created using the tools of EBM: the core of the development process is a systematic literature search followed by review of the relevant peer-reviewed literature; (2) Usually focused on a clinical disorder (diabetes mellitus, stable angina) or a health care intervention (cancer screening); (3) Primary objective is to improve the quality of medical care by identifying care practices that should be routinely implemented based on high-quality evidence and high benefit-to-harm ratios.

17.2 Scope and Limitations of Guidelines¶

Guidelines are intended to 'assist' decision-making, not to define explicitly what decisions should be made in particular situations. Guideline-level evidence alone is never sufficient for clinical decision-making. Example: Deciding whether to intubate and administer antibiotics for pneumonia in a terminally ill individual, in an individual with dementia, or in an otherwise healthy 30-year-old mother requires individualization beyond guideline recommendations.

17.3 Guideline Document Components¶

Narrative documents constructed by expert panels whose composition is often determined by interested professional organizations. Panels vary in expertise and degree to which they represent all relevant stakeholders. Documents consist of: series of specific management recommendations, summary indication of quantity and quality of evidence supporting each recommendation, assessment of benefit-to-harm ratio for the recommendation, and narrative discussion of underlying evidence and clinical considerations.

18. KEY POINTS & CLINICAL PEARLS¶

Essential concepts for clinical practice in medical decision-making.

Clinical Pearls in Medical Decision-Making¶

Domain	Key Principle
Clinical Reasoning	Pattern recognition must be followed by diagnostic verification to avoid premature closure
Heuristics	Representativeness, availability, anchoring, and simplicity heuristics can lead to systematic errors when misapplied
Test Interpretation	SnNout (high Sensitivity Negative rules OUT) and SpPin (high Specificity Positive rules IN)
Bayes' Theorem	The largest change in diagnostic likelihood occurs when pretest probability is 30-70%
Low Pretest Probability	Even accurate positive tests may not raise probability enough to rule in disease
High Pretest Probability	Negative tests may not adequately rule out disease if not sufficiently sensitive
Diagnostic Imperatives	Always consider rare but catastrophic conditions (e.g., aortic dissection in acute chest pain)
Evidence Quality	Unvalidated risk prediction models should be viewed with the same skepticism as untested drugs or devices
NNT	Number needed to treat varies with baseline risk—same relative benefit yields different NNTs in different populations
Guidelines	Guidelines assist but do not replace individualized clinical decision-making
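Several pearls in the table (low pretest probability, high pretest probability, and the value of an intermediate pretest probability) follow from Bayes' theorem in odds form: posttest odds = pretest odds × likelihood ratio. The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity (LR+ = 9, LR- ≈ 0.11).

```python
def posttest_probability(pretest: float, sensitivity: float, specificity: float,
                         positive: bool) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds x likelihood ratio."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if positive:
        lr = sensitivity / (1 - specificity)   # LR+
    else:
        lr = (1 - sensitivity) / specificity   # LR-
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


SENS, SPEC = 0.90, 0.90  # hypothetical test characteristics

for pretest in (0.05, 0.50, 0.95):
    pos = posttest_probability(pretest, SENS, SPEC, positive=True)
    neg = posttest_probability(pretest, SENS, SPEC, positive=False)
    print(f"pretest {pretest:.0%}: positive -> {pos:.0%}, negative -> {neg:.0%}")
```

With these assumed characteristics, a positive result at 5% pretest probability raises the probability only to about 32%, and a negative result at 95% leaves it near 68%, whereas at 50% either result moves it by 40 percentage points. The exact pretest probability at which a result shifts the estimate most depends on the likelihood ratio and on whether the result is positive or negative, so the 30-70% range is best read as a rule of thumb for "intermediate."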