A concrete rule: no execution within 48 hours of a proposal unless your checklist is complete, at least one external reviewer has signed off, and every data point is timestamped. Without that gate, your assumptions won't be challenged and small errors compound; with it, reversal rates drop and accountability is visible.
Define roles clearly: assign a skeptic who must contest the plan, a sponsor who owns outcomes, and a recorder who logs what's at stake and the evidence. In one case, Annie, a surgeon, paused for 24 hours, documented alternatives, and asked a peer to simulate worst-case scenarios; that simple change exposed a hidden contraindication.
Mitigate bias: quantify confidence thresholds (for example, >70% subjective confidence triggers a third-party audit), require a pre-mortem that names likely failure modes, and label emotion in notes so affective signals don't masquerade as facts. Overconfidence becomes visible when projections are compared to baseline forecasts; thinking should rest on calibrated probabilities rather than slogans.
Operationalize reviews: all minutes and content must be versioned and reviewed weekly for high-impact items, with any change logged and justified. For each case, record the objective criterion that would force reversal, who reviewed it, and why the choice still stands. Documented, repeatable rules reduce ambiguity and improve the quality of subsequent decision-making and follow-up.
Step 2: Decide how you’re going to decide by creating your decision criteria in advance

Set 5–7 measurable criteria and assign numeric weights that sum to 100 before evaluating alternatives. Ensure each criterion ties to a clear goal metric; you shouldn't change weights after seeing proposals, and if you do, log the reason and timestamp the change as new information.
Incorporate at least two objective measures (cost, time to market, failure rate) and one qualitative criterion reflecting values or customer impact; add any unique constraints up front. Use historical data where available; if you don't have five years of history, use the available window and apply conservative bounds drawn from published research to set minimum thresholds.
Score options 0–10 per criterion, multiply each score by its criterion's weight, and sum. Ask yourself whether a small improvement on a high-weight item is more valuable than large gains on low-weight items; focus on high-weight metrics and don't let bias pull you astray. Don't prioritize someone else's pet metric over what your model shows.
For multilingual teams, translate criteria and sample answers so scoring stays aligned; keep the scoring space small (a maximum of 7 criteria) to minimize noise. When new information arrives, tag it as such and rescore only if it changes a criterion's underlying assumptions; after rescoring, record what you found and why so reviewers stay informed.
Use a lightweight spreadsheet template (columns: criterion, weight, score, weighted score, notes) and store versions so reviewers can track how thinking evolved. Bias accumulates over time; schedule retrospectives every 6–12 months to confirm the criteria still reflect the original goal and to catch drift. This approach is useful for resolving trade-offs and shouldn't be treated as mere opinion; tie evidence to each score and document the sources you used.
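As a minimal sketch of that template in code (the criteria names, weights, and scores below are hypothetical placeholders, not recommendations):

```python
# Minimal weighted-scorecard sketch: weights sum to 100, scores run 0-10.
# Criteria names and numbers are hypothetical placeholders.
criteria = {
    "cost": {"weight": 30, "score": 7, "notes": "vendor quote, 2025-01"},
    "time_to_market": {"weight": 25, "score": 5, "notes": "eng estimate"},
    "failure_rate": {"weight": 25, "score": 8, "notes": "pilot data"},
    "customer_impact": {"weight": 20, "score": 6, "notes": "survey"},
}

assert sum(c["weight"] for c in criteria.values()) == 100, "weights must sum to 100"

# Weighted score on a 0-10 scale: sum(weight_i * score_i) / 100.
weighted_total = sum(c["weight"] * c["score"] for c in criteria.values()) / 100
print(f"Weighted score: {weighted_total:.2f} / 10")  # 6.55 for the placeholder numbers
```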
Articulate the decision goal in one clear sentence
Define the goal in a single sentence that specifies metric (with baseline), numeric target, deadline, owner by role, and a concrete rollback trigger.
Example sentence: 'Reduce monthly churn from 6.0% to 4.0% within 90 days (baseline 6.0% measured Jan 2025), owned by Head of Customer Success, budget increase capped at $200k, rollback to previous model if NPS drops >5 points or revenue impact exceeds $50k monthly.' This phrasing shows baseline, target, timeframe, owner, and stop condition; it frames the context for testing, sets a fast feedback loop, and creates a clear list of acceptance criteria. We've found that teams who break goals into these elements reach alignment faster; admit uncertainty with a confidence band (±0.5 percentage points) so overconfident forecasts aren't locked in. Also assign monitoring responsibilities and watch signals weekly; the role leading measurement must have authority to trigger the rollback if early indicators reverse progress, and to order a full course reversal if sustained harm appears. For long-term evaluation, specify whether the intervention continues after 12 months and how the company will measure sustained impact. Use this sentence as the final goal and treat it as the single source of truth when you need to decide quickly or to identify who leads execution.
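Purely as an illustration, the same goal can also be stored as a structured record so the stop condition is mechanically checkable; the field names below are assumptions, not a required schema.

```python
# Hypothetical structured form of the example goal sentence; field names are illustrative.
goal = {
    "metric": "monthly_churn_pct",
    "baseline": 6.0,          # measured Jan 2025
    "target": 4.0,
    "deadline_days": 90,
    "owner_role": "Head of Customer Success",
    "budget_cap_usd": 200_000,
    "confidence_band_pp": 0.5,
}

def should_rollback(nps_drop_points: float, revenue_impact_usd_monthly: float) -> bool:
    """Stop condition from the goal sentence: NPS drops >5 points or revenue impact exceeds $50k/month."""
    return nps_drop_points > 5 or revenue_impact_usd_monthly > 50_000
```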
List the core decision criteria that will judge each option

Score each option numerically against five fixed criteria and accept those with a weighted score ≥6.5: Impact (benefit) 40%, Total cost (TCO) 20%, Risk & ambiguity 20%, Alignment with organization goals 10%, Time-to-value 10%. Use a 0–10 rating per criterion and record sources for every rating – good baseline metrics must be captured beforehand and tied to verifiable facts.
Calculate a weighted sum on a 0–10 scale (Σ weight_i × rating_i / 100, with weights expressed in percent). Set decision bands of 6.5–8.0 for a pilot and ≥8.0 for full rollout; treat any option with a risk rating ≤2 as vetoed regardless of its total. Though numbers drive prioritization, include a binary check for regulatory or safety factors. Document the steps used to score, note whether assumptions were tested, and apply red-team thinking to challenge optimistic inputs.
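A minimal sketch of that calculation for one option, using the fixed weights above and made-up ratings:

```python
# Fixed criteria and percent weights from the text; the ratings (0-10) are hypothetical.
weights = {"impact": 40, "total_cost": 20, "risk": 20, "alignment": 10, "time_to_value": 10}
ratings = {"impact": 8, "total_cost": 6, "risk": 5, "alignment": 7, "time_to_value": 9}

weighted = sum(weights[k] * ratings[k] for k in weights) / 100  # 0-10 scale

if ratings["risk"] <= 2:
    decision = "veto"            # a risk rating <=2 overrides any total
elif weighted >= 8.0:
    decision = "full rollout"
elif weighted >= 6.5:
    decision = "pilot"
else:
    decision = "reject"

print(f"{weighted:.2f} -> {decision}")  # (40*8 + 20*6 + 20*5 + 10*7 + 10*9) / 100 = 7.00 -> pilot
```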
Measure adoption risk via observed behavior in pilot events: watch uptake rates, training completion, and how different user groups respond. Survey stakeholders and log what you heard; note whether key people won't change behavior without incentives. If stakeholders contest a point, record the contention and quantify its impact as a separate factor.
Require a trusted reviewer from a different function to validate sources and facts, attach evidence links, and sign off on mitigation plans. Raise acceptance thresholds for strategic choices. Maintain an audit log of steps, scoring, outcomes, and failed assumptions so future evaluations can compare performance against past events.
Define the data you’ll collect for each criterion
Collect three concrete data types for every criterion: a numeric metric, supporting evidence, and provenance metadata (who, when, how); record units, time window, and confidence score immediately.
- Metric name and definition – what's being measured, unit, calculation formula (e.g., “Weekly active users / MAU, percent”).
- Baseline and target – numeric baseline (date-stamped) and target value with deadline; record time window for comparisons.
- Measurement frequency and method – continuous, daily sample, survey; instrument or SQL query used and versioning for reproducibility.
- Sample size and variance – N, standard deviation or margin of error; note when sample is small so you treat the result as low-confidence.
- Confidence / probability – numeric chance (0–100%) that the metric reflects reality; document how that chance was estimated.
- Evidence links and content – links to articles, experiments, logs, transcripts; save short excerpt (one sentence) describing why the evidence matters.
- Source and collector – name of maker or team that provided the data, contact, and collection timestamp; flag external vs internal sources.
- Bias checks – list known biases (selection, survivorship, overconfident forecasts) and one mitigation step per bias.
- Weight and scoring rubric – numeric weight for the criterion (sum of weights = 100) and clear score mapping (0–10 with anchors for 0, 5, 10).
- Normalization method – how scores across heterogeneous metrics are converted to a common scale (z-score, min-max, rank), with the formula saved; see the short sketch after this list.
- Fallback handling – what to do if data is missing: provisional score, imputation method, or mark for follow-up; allow self-compassion in early rounds to avoid paralysis.
- Stakeholder input log – list of others who reviewed the data, their role, timestamp, and short rationale for any changes to score or weight.
- Decision audit trail – brief note of reasons for final assignment of weight or score and links to relevant content for future review.
- Cost and time estimates – direct cost, opportunity cost, and implementation time in person-hours; indicate uncertainty range.
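A minimal normalization sketch, assuming min-max scaling onto the 0–10 scoring scale (the anchor values are placeholders):

```python
def min_max_to_0_10(value: float, worst: float, best: float) -> float:
    """Map a raw metric onto the 0-10 scoring scale; 'worst' and 'best' are the rubric anchors."""
    if best == worst:
        return 5.0  # degenerate case: no spread, return the midpoint
    score = 10 * (value - worst) / (best - worst)
    return max(0.0, min(10.0, score))  # clamp so out-of-range values stay on the rubric

# Hypothetical retention anchors: 15% scores 0, 25% scores 10.
print(min_max_to_0_10(18.0, worst=15.0, best=25.0))  # 3.0
```

For lower-is-better metrics, pass the larger raw value as worst so the scale inverts automatically.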
Use the template below as a fillable record for each option you evaluate; keep records machine-readable (CSV/JSON) and human-readable (one-paragraph summary). A short sketch after the two examples shows one record serialized to JSON.
Example: Criterion = “Customer retention”
- Metric: 30-day retention rate (%) – formula: retained_users/active_users_30d
- Baseline/target: 18% (2025-10-01) → target 25% by 2026-04-01
- Frequency/method: daily cohort calc via analytics query v3
- Sample/variance: N=12,000; SE=0.4%
- Confidence: 85% (based on cohort stability analysis and A/B validation)
- Evidence: experiment #412, customer interviews (links to articles and transcripts)
- Source: analytics team (maker: J. Kim), collected 2025-11-02
- Bias check: retention inflated by reactivation campaigns – mitigation: isolate organic cohort
- Weight: 30 (of 100); scoring rubric: 0 for <15%, 5 for 20%, 10 for >25%
- Cost/time: $4k implementation, 120 hours
Example: Criterion = “Implementation risk”
- Metric: probability of major outage (%) – derived from past rollout failure rates
- Baseline/target: 10% → target <5%
- Evidence: postmortems, fault-injection results, ops runbooks
- Bias check: avoid overconfident judgments; require at least two independent reviewers
- Weight: 20; normalization: map probability to 0–10 inverse scale
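As referenced above, here is a sketch of one filled record serialized to JSON; the keys mirror the template fields, and the exact schema is yours to choose.

```python
import json

# One filled record for the "Customer retention" example; keys follow the template above.
record = {
    "criterion": "Customer retention",
    "metric": {"name": "30-day retention rate (%)", "formula": "retained_users/active_users_30d"},
    "baseline": {"value": 18, "date": "2025-10-01"},
    "target": {"value": 25, "deadline": "2026-04-01"},
    "frequency_method": "daily cohort calc via analytics query v3",
    "sample": {"n": 12000, "standard_error_pct": 0.4},
    "confidence_pct": 85,
    "evidence": ["experiment #412", "customer interviews"],
    "source": {"team": "analytics", "maker": "J. Kim", "collected": "2025-11-02"},
    "bias_check": "retention inflated by reactivation campaigns; mitigation: isolate organic cohort",
    "weight": 30,
    "cost_time": {"implementation_usd": 4000, "hours": 120},
}

print(json.dumps(record, indent=2))  # machine-readable; pair with a one-paragraph summary
```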
Here's a quick checklist for collectors: clearly name the metric, attach evidence links, record time and maker, state weight and confidence, note who reviewed, and save raw input for later audit. If you're looking for validation, prioritize reproducible measures and triangulate with at least some qualitative input to learn why numbers moved and what's likely to change.
When results look wrong or contradictory, slow down: log the reasons, call for targeted data collection, and use a small experiment to gain muscle in probabilistic thinking rather than trusting a single source; there are others who will challenge your view – record their objections and test them.
Set thresholds or benchmarks for acceptance
Require explicit numeric thresholds before green-lighting any initiative: probability of success ≥70%, expected ROI ≥15%, maximum downside exposure ≤10% of project budget, and reliability for software releases ≥99.0% uptime over a 30-day window.
Break acceptance into three measurable parts: quality, value, and risk. For quality use defect rate per KLOC or per 1,000 transactions; for value use NPV or margin uplift; for risk use downside as percent of budget and lives impacted. If any part falls below its threshold, reject or require mitigation. State which factor failed, who verified it, and what evidence you want before reconsideration.
| Type | Metric | Acceptance threshold | Action if below |
|---|---|---|---|
| Product launch | Projected adoption probability | >= 70% | Hold launch; require additional piloting |
| Hiring | Role-fit score (0–100) | >= 80 | Do not extend offer; collect references |
| Software release | Automated test pass rate | >= 98.5% | Block release; fix failing suites |
| Research / experiment | Statistical power & p-value | Power >= 0.8 and p <= 0.05 | Increase sample or redesign |
Remove emotion from acceptance by converting beliefs into testable hypotheses. Once you've documented assumptions, write the acceptance criteria in tickets and require that reviewers state how they tested each belief. If prior iterations didn't meet criteria, include the failure modes and the corrective rule for re-evaluation.
Weight factors explicitly and allocate them equally or by business priority; e.g., quality 40%, value 40%, risk 20%. Use a weighted scorecard: accept when the weighted score is >= 0.75 of the maximum. Run a sensitivity analysis; if small changes to inputs swing an option across the threshold, raise the level of scrutiny or require contingency funds.
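A minimal sketch of that acceptance check plus a naive one-at-a-time sensitivity pass (the scores and the ±1-point perturbation are assumptions):

```python
# Weighted acceptance (accept at >= 0.75 of the maximum) with a naive sensitivity check.
weights = {"quality": 0.40, "value": 0.40, "risk": 0.20}
scores = {"quality": 8.0, "value": 7.0, "risk": 6.0}   # hypothetical 0-10 scores
threshold = 0.75 * 10                                   # 0.75 of the maximum possible score

def weighted(s):
    return sum(weights[k] * s[k] for k in weights)

base = weighted(scores)
accepted = base >= threshold

# Perturb each score by +/-1 point and see whether the decision flips.
fragile = any(
    (weighted({**scores, k: scores[k] + d}) >= threshold) != accepted
    for k in scores for d in (-1.0, 1.0)
)
print(f"score={base:.2f}, accepted={accepted}, flips under +/-1 perturbation: {fragile}")
```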
Use reproducible checks so articles or reports that support a case can be traced to raw data; new requirements should spring from measurable impact, not anecdotes. If the impact on lives is non-zero, increase thresholds by +15 percentage points and require an independent audit. For problems you can't solve immediately, define a timeboxed remediation plan with owners and metrics so you know whether going forward still meets acceptance.
Choose a decision rule that fits your criteria (scoring, ranking, or binary pass/fail)
Use a numerical scoring system for multi-criteria comparisons: score each option 0–100 on no more than 8 criteria, assign weights that sum to 100, require at least three independent raters, calculate mean and standard deviation, and set a pass threshold (example: 70). If SD > 15 points on any option, trigger a second round of blind scoring; record every individual score and short rationale so future reviews can trace trade-offs and reduce overconfidence and emotion-driven shifts.
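A minimal sketch of that aggregation rule, assuming three raters scored one option on the 0–100 scale (the numbers are placeholders):

```python
import statistics

scores = [60, 70, 95]  # three independent raters, 0-100 scale (hypothetical)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

passes = mean >= 70                      # example pass threshold from the text
needs_blind_rescore = sd > 15            # high disagreement triggers a second blind round
print(f"mean={mean:.1f}, sd={sd:.1f}, passes={passes}, rescore={needs_blind_rescore}")
```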
Apply ranking when you need a clear top-N from many viable alternatives: have raters rank their top 10, convert ranks to Borda scores, aggregate and report the median rank plus interquartile range; if adjacent ranks are within 5% of score, treat them as a tie and run a focused comparison. Use binary pass/fail only for safety or compliance criteria–examples from psychiatry and medtech projects show binary rules cut harm: a treatment must meet minimum safety incidence <2% or have demonstrated efficacy ≥60% in known trials before moving forward.
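And a sketch of the Borda aggregation for ranking, assuming each rater ranks the same set of options best-first (option names are placeholders):

```python
from collections import defaultdict

# Each rater lists options best-first; the top rank earns the most Borda points.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]

n = len(rankings[0])
borda = defaultdict(int)
for ranking in rankings:
    for position, option in enumerate(ranking):
        borda[option] += n - 1 - position  # top rank gets n-1 points, last gets 0

# Options whose totals sit within ~5% of each other get a focused head-to-head comparison.
for option, points in sorted(borda.items(), key=lambda kv: -kv[1]):
    print(option, points)  # A=8, B=6, C=3, D=1 for these placeholder rankings
```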
Operational rules to improve outcomes: anonymize submissions so makers and evaluators judge content, not authors; require independent scoring to mitigate groupthink; run pre-mortems and insist teams acknowledge key failure modes before final selection; record the pre-mortem notes alongside scores. Consult recent articles on bias reduction to design calibration sessions for raters and to anticipate the common pitfalls raters face under time pressure or strong emotion.
Practical tolerances and contingencies: if the aggregated score lies within ±3 points of the threshold, escalate to a cross-functional panel; if raters' variance remains high after recalibration, bring in two external reviewers who score independently and whose means break ties. Adjust thresholds by project type: exploratory work can accept lower pass cutoffs (for example, 60) with mandatory monitoring, while high-risk work must use binary gates. Here's a checklist to craft your rule set: define criteria and weights, set numeric thresholds, mandate a minimum number of raters, anonymize inputs, require pre-mortems, record all inputs, and schedule an after-action review to improve calibration and institutional memory.