Trading Systems Handbook · v1.0

Design, test, operate.

Everything that turns a trading idea into a system you can trust with real capital — the anatomy of a complete strategy, the mathematics of edge and risk, honest backtesting and validation, and the operations that keep a live system alive. Every concept paired with its formula, worked numbers, and concrete rules.

System anatomy · Edge & expectancy · Position sizing · Risk of ruin · Backtesting · Walk-forward · Execution · Monitoring

01 What a Trading System Is

A system is not an indicator, a signal, or a feeling. It is a complete set of rules that answers every decision a trade demands — and answers them the same way every time. This section defines the object the rest of the handbook builds, tests, and operates.

The definition that matters

A trading system is a repeatable, fully specified procedure that converts market data into trading decisions with no decision left to in-the-moment judgement. Given the same inputs, it produces the same output — whether executed by you, a colleague, or a machine. That property is what separates a system from a "style".

The practical test: could a stranger trade your account identically from your written rules, without phoning you? If the honest answer is no, you have intuitions, not a system — and intuitions cannot be backtested, sized, audited, or improved methodically.

Core idea
A trading system is a falsifiable hypothesis about market behaviour, expressed precisely enough to test on history and execute without ambiguity. Everything downstream (backtesting, sizing, risk control) depends on this precision existing first.

Discretionary, systematic, and hybrid

These are points on a spectrum of how much judgement enters the loop, not a moral hierarchy. Each can be profitable; each fails differently.

Dimension | Discretionary | Systematic | Hybrid
Decision source | Trader judgement in the moment | Pre-defined rules, mechanically applied | Rules generate candidates; trader vetoes/confirms
Backtestable? | No: judgement isn't reproducible | Yes, fully | Partially: the rule layer only
Scales with capital/markets? | Poorly (operator is the bottleneck) | Well | Limited by the human step
Primary failure mode | Emotion, inconsistency, fatigue | Regime change the rules didn't anticipate | Selective override that quietly destroys the edge
Best for | Reading context, news, anomalies | Repeatable, measurable edges | Edges that need human context but rule discipline

Most successful discretionary traders are, in fact, undocumented systematic traders: they apply a consistent internal procedure they have never written down. The work of building a system is largely the work of extracting that procedure into explicit rules — which is precisely where contradictions and gaps surface.

Why systematise an edge you already trade

  • Measurability. You cannot improve what you cannot measure. A specified system has an expectancy, a drawdown, a sample size — numbers you can act on.
  • Falsifiability. Rules can be proven wrong on history before they cost you live money. A feeling cannot.
  • Consistency. The system trades the same on your best day and your worst. Most blow-ups are not bad rules — they are good rules abandoned under stress.
  • Compounding of knowledge. Every trade becomes a labelled data point, not a vague memory. The edge sharpens with sample size.
  • Leverage. A specified system can be automated, monitored, and scaled. An intuition lives and dies with your attention.

What a system is not

Disambiguation
  • A system is not a signal service — signals are outputs; a system is the full procedure that also defines size, risk, and exit.
  • A system is not a guarantee — a positive expectancy is a long-run statistical claim, not a promise about the next trade or the next month.
  • A system is not an edge — it is the container for one. A perfectly specified system with no edge loses money with great consistency.

The two things every viable system needs

Strip everything else away and a tradeable system rests on two independent pillars. Lose either and the account dies — just on different timelines.

1 · A real edge

Positive expectancy after costs, demonstrated over a sample large enough to distinguish skill from luck. Without it, better risk control only slows the bleed.

2 · Survival

Risk and position-sizing rules that keep you solvent through the inevitable losing streak. Without it, a real edge is wiped out before it can pay you.

The rest of this handbook is the engineering of those two pillars: Sections 02–06 build the edge and its rules; Sections 03–04 and 10–14 build and protect survival.

02 Anatomy of a Complete System

A complete system answers eight questions, in order. Skip one and you have left a decision to chance. The order is not cosmetic — each component constrains the next, and getting the sequence wrong (sizing before stops, entry before regime) is one of the most common structural errors.

The eight components

01 Universe → 02 Regime filter → 03 Setup → 04 Entry
05 Initial stop → 06 Position size → 07 Exit logic → 08 Manage
The pipeline. The first row decides whether and where to act; the second decides how much and for how long. Risk (05) is defined before size (06) because size is a function of the stop, never the reverse.
# | Component | The question it answers | Concrete forms
01 | Universe | What instruments are we even allowed to trade? | A fixed list (e.g. major FX pairs), or a screen (liquidity, spread, ATR floor)
02 | Regime filter | Is the system permitted to act right now? | Trend filter (price vs 200-EMA), volatility band, session window, news blackout
03 | Setup | What recurring condition defines an opportunity? | Pullback to a level, range break, oversold extreme, momentum cross
04 | Entry | What exact event triggers the order, and how do we get in? | Trigger candle close + market/limit/stop order at a defined price
05 | Initial stop | Where is the idea proven wrong? (This defines 1R.) | Beyond structure, k × ATR, or a fixed distance
06 | Position size | How much, given the stop and the risk budget? | Fixed-fractional %, volatility-targeted, fractional Kelly
07 | Exit logic | How and when do we take profit or cut? | Fixed R-target, trailing stop, time stop, opposite signal
08 | Manage | What happens to the trade while it is open? | Move to breakeven at +1R, scale out in tranches, pyramid
The sequencing rule
Risk before size. The single most common structural mistake is choosing a position size first ("I'll do 1 lot") and discovering your risk afterwards. Professionals invert it: define the stop, define the dollar risk budget, then solve for the size that makes those two consistent. Size is an output, never an input.

A worked specification

The same idea, written first as a discretionary "style" and then as a system, makes the difference concrete.

As a style (untradeable)

"I buy GBPUSD pullbacks in an uptrend when it looks like the dip is done, and I take profit into resistance."

As a system (tradeable)

Every word below is checkable and reproducible — and therefore backtestable.

GBPUSD trend-pullback — full specification
  • Universe: GBPUSD only, 1-hour bars.
  • Regime filter: close > 200-EMA and 50-EMA > 200-EMA (uptrend confirmed). No trades in the 30 minutes around high-impact GBP/USD news.
  • Setup: price pulls back and the low touches the 20-EMA while the regime filter holds.
  • Entry: buy-stop 2 pips above the high of the first bar that closes back above the 20-EMA.
  • Initial stop: 1.5 × ATR(14) below entry. This distance defines 1R.
  • Position size: risk 0.5% of equity; size = (equity × 0.5%) ÷ (stop distance in pips × pip value).
  • Exit: take half at +1R, trail the remainder under each new swing low; time-stop the trade if +1R is not reached within 24 bars.
  • Manage: move stop to breakeven once +1R is filled.

Notice that the system version exposes decisions the style hid: how far is a valid pullback, which EMA, what confirms "the dip is done", where exactly is the stop. Those hidden decisions are where discretionary edges silently drift — and where a system makes the drift impossible.

03 Edge & Expectancy

A system makes money for exactly one reason: positive expectancy realised over a large enough sample. Not a high win rate, not a good feeling, not a clever indicator. This section is the arithmetic of edge — and the traps that arithmetic exposes.

Expectancy: the master number

Expectancy is the average profit or loss per trade you can expect over many trades. It is the product of how often you win and how much you win versus lose.

E = (W% × avgWin) − (L% × avgLoss)
# W% = win rate, L% = loss rate = 1 − W%
# avgWin / avgLoss in currency or pips

To compare systems across instruments and account sizes, normalise everything to R — the initial risk per trade. One R is the distance from entry to your initial stop. A trade that makes twice its risk is +2R; a trade stopped out is −1R.

R-multiple = (exit price − entry price) ÷ (entry price − initial stop) # for a long
Expectancy[R] = (W% × avgWinR) − (L% × avgLossR)
# > 0 means the system is profitable per unit of risk, before frequency
Why R changes everything
Expressing results in R makes a system account- and instrument-agnostic. "I made 40R last year" is portable; "I made $4,000" is not. R is also the unit your position sizing (Section 04) is built on, so expectancy and sizing speak the same language.
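As a rough illustration of the two formulas above, the sketch below computes the R-multiple of a long trade and the expectancy of a list of R-multiples. The function names, prices, and sample trades are hypothetical, chosen only for the example.

```python
def r_multiple_long(entry: float, stop: float, exit_price: float) -> float:
    """R-multiple of a long trade: profit divided by the initial risk (1R)."""
    risk = entry - stop
    if risk <= 0:
        raise ValueError("for a long trade the initial stop must sit below the entry")
    return (exit_price - entry) / risk


def expectancy_r(r_multiples: list[float]) -> float:
    """Mean R per trade, which equals W% x avgWinR - L% x avgLossR."""
    if not r_multiples:
        raise ValueError("need at least one closed trade")
    return sum(r_multiples) / len(r_multiples)


# Hypothetical GBPUSD long: entry 1.2500, stop 1.2475 (25 pips), exit 1.2550.
trade_r = r_multiple_long(1.2500, 1.2475, 1.2550)

# Hypothetical sample: two winners (+2R, +3R) and three full stop-outs (-1R each).
sample_expectancy = expectancy_r([2.0, -1.0, -1.0, 3.0, -1.0])

print(f"trade = {trade_r:+.2f}R, expectancy = {sample_expectancy:+.2f}R per trade")
```

Taking the plain mean of R-multiples is equivalent to the win-rate form of the formula, and it avoids having to classify breakeven trades as wins or losses.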

Win rate is a vanity metric

Win rate alone tells you nothing about profitability, because it ignores the size of wins versus losses. The payoff ratio b = avgWin ÷ avgLoss couples them. The win rate you need merely to break even falls as your payoff ratio rises:

Break-even win rate = 1 ÷ (1 + b) # b = avgWin ÷ avgLoss (the reward:risk ratio)
[Figure: break-even win rate (y-axis, 0–100%) against reward:risk payoff ratio b (x-axis, 1:1 to 5:1). Marked points: 50% at 1:1, 33% at 2:1, 25% at 3:1. Above the line = profitable.]
The win rate you must clear just to break even. At 2:1 you need only 33% winners; at 3:1, 25%. This is why trend systems survive on low win rates: their wins dwarf their losses.
Payoff ratio (b) | Break-even win rate | Win rate for healthy edge | Typical archetype
0.5 : 1 | 66.7% | > 75% | Mean reversion / scalping
1 : 1 | 50.0% | > 55% | Range / oscillator systems
2 : 1 | 33.3% | > 40% | Swing / breakout
3 : 1 | 25.0% | > 33% | Trend-pullback
5 : 1 | 16.7% | > 25% | Trend-following / momentum
The win-rate trap
A 90%-win-rate system that risks 10R to make 1R has an expectancy of (0.9 × 1) − (0.1 × 10) = −0.1R per trade. It feels wonderful nine times out of ten and quietly bankrupts you on the tenth. High win rates and tight stops sell courses; expectancy pays bills.
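The break-even relationship and the win-rate trap can both be checked in a few lines. This is a minimal sketch; the function names are illustrative and not taken from any library.

```python
def break_even_win_rate(payoff_ratio: float) -> float:
    """Win rate at which expectancy is exactly zero: 1 / (1 + b)."""
    if payoff_ratio <= 0:
        raise ValueError("payoff ratio must be positive")
    return 1.0 / (1.0 + payoff_ratio)


def expectancy_from_rates(win_rate: float, avg_win_r: float, avg_loss_r: float) -> float:
    """Expectancy in R from a win rate and average win/loss sizes (both positive)."""
    return win_rate * avg_win_r - (1.0 - win_rate) * avg_loss_r


for b in (0.5, 1, 2, 3, 5):
    print(f"b = {b}:1 -> break-even win rate {break_even_win_rate(b):.1%}")

# The trap from the text: 90% winners, each win +1R, each loss -10R.
trap = expectancy_from_rates(0.90, 1.0, 10.0)
print(f"90% win rate, 1R wins, 10R losses -> {trap:+.2f}R per trade")
```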

Profit factor — a second lens

Profit factor is gross profit divided by gross loss. It is closely related to expectancy but reads more intuitively as "how many dollars I make per dollar I lose".

Profit factor = gross profit ÷ gross loss = (W% × avgWin) ÷ (L% × avgLoss)
# 1.0 = break-even · 1.3–1.6 = solid · > 2.0 = excellent (and worth double-checking for look-ahead bias)
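A short sketch of the same calculation from a list of closed-trade results. The input values are hypothetical, and returning infinity when there are no losing trades is a convention chosen here, not something the formula prescribes.

```python
def profit_factor(pnls: list[float]) -> float:
    """Gross profit divided by gross loss; inf when there are no losing trades."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf")
    return gross_profit / gross_loss


# Hypothetical results in R: two winners and three stop-outs.
pf = profit_factor([2.0, -1.0, -1.0, 3.0, -1.0])
print(f"profit factor = {pf:.2f}")
```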

Frequency: expectancy is per trade, growth is per year

Per-trade expectancy alone does not grow an account — expectancy multiplied by trade frequency does. A smaller edge taken often can dominate a larger edge taken rarely.

System A

+0.2R per trade × 200 trades/year = +40R/year. Low edge, high frequency.

System B

+0.5R per trade × 30 trades/year = +15R/year. High edge, low frequency.

System A compounds faster and reaches statistical significance sooner — but only if its costs per trade (Section 13) do not eat the thinner edge. Frequency amplifies both your edge and your friction.

Expectancy is a claim about samples, not trades

A positive expectancy is a statement about the long-run average, and every long-run average hides brutal short-run variance. A robust system will produce long losing streaks — they are a feature of randomness around a positive mean, not evidence the edge is gone.

Expected longest losing streak in N trades ≈ ln(N) ÷ ln(1 ÷ L%)
# e.g. L% = 60%, N = 500 → ln(500)/ln(1.667) ≈ 12 consecutive losses are normal
Implication for sample size
You cannot judge a system on 20 trades: the variance swamps the signal. Treat anything under ~100 closed trades as anecdote, ~100–300 as suggestive, and 300+ as the floor for taking an expectancy estimate seriously. The losing streaks that this math guarantees are exactly what your position sizing in Section 04 must survive.
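The streak approximation is easy to evaluate, and it is worth comparing against the longest streak actually observed in a trade log. The sketch below shows both; the helper names and the sample trade list are illustrative.

```python
import math


def expected_longest_losing_streak(n_trades: int, loss_rate: float) -> float:
    """Approximation from the text: ln(N) / ln(1 / L%)."""
    if n_trades < 1 or not 0.0 < loss_rate < 1.0:
        raise ValueError("need n_trades >= 1 and 0 < loss_rate < 1")
    return math.log(n_trades) / math.log(1.0 / loss_rate)


def longest_losing_streak(r_multiples: list[float]) -> int:
    """Longest run of consecutive losing trades in a realised trade list."""
    longest = current = 0
    for r in r_multiples:
        current = current + 1 if r < 0 else 0
        longest = max(longest, current)
    return longest


estimate = expected_longest_losing_streak(500, 0.60)
observed = longest_losing_streak([1.0, -1.0, -1.0, 2.0, -1.0, -1.0, -1.0, 0.5])
print(f"expected over 500 trades at 60% losers: about {estimate:.1f}")
print(f"observed in the sample list: {observed}")
```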

04 Position Sizing & Risk of Ruin

Edge tells you whether to play; sizing tells you whether you survive long enough to collect. Most accounts are not killed by bad systems — they are killed by good systems sized too aggressively to outlast a normal losing streak. This is the survival pillar, and it is pure arithmetic.

The job of position sizing

Sizing has one job: convert a risk budget and a stop distance into a quantity. Everything flows from the stop you already defined in Section 02 — which is why stops come first.

Risk per trade ($) = Equity × risk%
Position size = Risk per trade ($) ÷ (stop distance × value per unit move)
# FX: lots = (Equity × risk%) ÷ (stopPips × pipValuePerLot)
Worked example — FX

Equity $10,000 · risk 0.5% · GBPUSD stop 25 pips · pip value ≈ $10/standard lot.

  • Risk per trade = 10,000 × 0.5% = $50
  • Lots = 50 ÷ (25 × 10) = 0.20 lots (20,000 units)

Widen the stop to 50 pips and the size halves to 0.10 lots — same dollar risk. The market decides the stop; your budget decides the dollars; size is whatever reconciles them.
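The worked example translates directly into code. This is a minimal sketch with a hypothetical function name; it returns the raw lot size and leaves out rounding down to the broker's minimum lot step, which a live implementation would need.

```python
def fx_position_size_lots(
    equity: float, risk_pct: float, stop_pips: float, pip_value_per_lot: float
) -> float:
    """Lots = (equity x risk%) / (stop distance in pips x pip value per lot)."""
    if stop_pips <= 0 or pip_value_per_lot <= 0:
        raise ValueError("stop distance and pip value must be positive")
    risk_dollars = equity * risk_pct
    return risk_dollars / (stop_pips * pip_value_per_lot)


# Numbers from the worked example: $10,000 equity, 0.5% risk, $10 per pip per lot.
tight = fx_position_size_lots(10_000, 0.005, stop_pips=25, pip_value_per_lot=10.0)
wide = fx_position_size_lots(10_000, 0.005, stop_pips=50, pip_value_per_lot=10.0)
print(f"25-pip stop: {tight:.2f} lots, 50-pip stop: {wide:.2f} lots")
```

Size is the return value and the stop is an argument, which mirrors the sequencing rule: the stop is known before the size is computed.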

Sizing methods, ranked

Method | Idea | Strength | Weakness | Verdict
Fixed lot | Same size every trade | Trivial | Ignores stop distance and account size; risk varies wildly per trade | Avoid
Fixed fractional | Risk a constant % of equity per trade | Auto-scales up in wins, down in losses; bounds drawdown | Slow recovery after deep drawdown | Default. Start here.
Volatility targeting | Size so each trade contributes equal volatility (size ∝ 1/ATR) | Normalises risk across instruments and regimes | Needs reliable volatility estimate; reacts to vol spikes | Excellent for multi-instrument
Fixed ratio | Increase size after a fixed profit increment (Δ) | Aggressive growth for small accounts | Risk grows non-linearly; punishing in drawdown | Niche
Kelly / fractional Kelly | Bet the growth-optimal fraction (or a fraction of it) | Mathematically maximises long-run growth | Assumes you know edge exactly; full Kelly is brutally volatile | Use a fraction, as a ceiling; never raw

The Kelly criterion — and why nobody trades full Kelly

Kelly gives the fraction of capital that maximises long-run geometric growth.

f* = p − q ÷ b = (p(b + 1) − 1) ÷ b
# p = win prob, q = 1 − p, b = payoff ratio (avgWin ÷ avgLoss)
# example: p = 0.40, b = 3 → f* = (0.40 × 4 − 1) ÷ 3 = 0.20 → 20% of equity per trade

Twenty percent per trade is correct in theory and insane in practice. Full Kelly produces gut-wrenching drawdowns (a 50%+ drawdown is routine), and it assumes you know p and b exactly — you do not; you estimated them from a finite, noisy backtest. Over-estimate the edge and Kelly over-bets you straight into ruin.

Fractional Kelly
Serious operators trade a fraction of Kelly (typically a quarter or less) and treat the Kelly number as an upper bound they stay well under. Half-Kelly keeps ~75% of the growth at ~half the volatility. For most discretionary FX systems, a flat 0.5–1% fixed-fractional risk already sits far below Kelly, which is exactly why it survives.
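A small sketch of the Kelly formula and a fractional version of it. Flooring the result at zero for a negative edge is a choice made here (a negative Kelly number means "do not bet"), and the default quarter fraction is only an example.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly fraction f* = p - q / b, floored at zero when there is no edge."""
    if not 0.0 <= p <= 1.0 or b <= 0:
        raise ValueError("need 0 <= p <= 1 and b > 0")
    q = 1.0 - p
    return max(0.0, p - q / b)


def fractional_kelly(p: float, b: float, fraction: float = 0.25) -> float:
    """A fixed fraction of full Kelly, treated as a ceiling on risk per trade."""
    return fraction * kelly_fraction(p, b)


full = kelly_fraction(0.40, 3.0)       # the example from the text
quarter = fractional_kelly(0.40, 3.0)  # quarter-Kelly ceiling
print(f"full Kelly = {full:.1%}, quarter Kelly = {quarter:.1%}")
```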

Risk of ruin

Risk of ruin (RoR) is the probability of losing enough capital to be unable (or unwilling) to continue. For a simplified 1:1 system risking a fixed unit per trade, with capital expressed as N units (N = 1 ÷ risk%):

RoR = (q ÷ p)^N for p > q ; RoR = 1 for p ≤ q
# N = number of "units" of risk in your account = 1 ÷ risk-per-trade
Risk per trade | Units (N) | RoR at 55% win (1:1) | RoR at 50% win (no edge)
1% | 100 | ≈ 0% | 100% (eventually certain)
2% | 50 | ≈ 0.004% | 100%
5% | 20 | ≈ 1.8% | 100%
10% | 10 | ≈ 13.4% | 100%
20% | 5 | ≈ 36.7% | 100%
Two lessons in one table
First, risk-per-trade is the dominant lever on ruin: going from 1% to 10% turns a rounding error into a roughly one-in-seven chance of blowing up. Second, with no edge (p ≤ q), ruin is certain given enough trades regardless of sizing; no money management rescues a negative-expectancy system. This is the fixed-bet approximation; real fixed-fractional sizing never hits exactly zero but trades that for deep drawdowns, which is why the true tool for RoR is Monte Carlo simulation (Section 11).
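The fixed-bet formula can be evaluated directly for any win probability and unit count. The sketch below assumes the simplified 1:1 payoff model described above; the function name is illustrative.

```python
def risk_of_ruin(win_prob: float, units: int) -> float:
    """Fixed-bet, 1:1-payoff approximation: (q / p) ** N, or 1.0 when p <= q."""
    if not 0.0 < win_prob < 1.0 or units < 1:
        raise ValueError("need 0 < win_prob < 1 and units >= 1")
    q = 1.0 - win_prob
    if win_prob <= q:
        return 1.0
    return (q / win_prob) ** units


# N = 1 / risk-per-trade, so 1% risk is 100 units and 20% risk is 5 units.
for risk_pct, units in ((1, 100), (2, 50), (5, 20), (10, 10), (20, 5)):
    ror = risk_of_ruin(0.55, units)
    print(f"risk {risk_pct:>2}% (N = {units:>3}): RoR at 55% win = {ror:.4%}")
```

Passing the unit count as an integer, rather than deriving it from a fractional risk percentage inside the function, sidesteps floating-point surprises in the exponent.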

Drawdown is the binding constraint

Losses and the gains needed to undo them are not symmetric. A drawdown of depth d requires a gain of d ÷ (1 − d) just to get back to even — and that gap explodes as the hole deepens.

[Figure: gain required to recover to even (y-axis, 0–400%) against drawdown depth (x-axis, 0–80%). Marked points: +25%, +100%, +300%.]
Recover-to-even gain by drawdown. A 20% loss needs +25%; a 50% loss needs +100%; a 75% loss needs a +300% moonshot. Avoiding the deep hole is worth more than any entry signal.
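The asymmetry follows from one line of arithmetic. A minimal sketch, with an illustrative function name:

```python
def recovery_gain(drawdown: float) -> float:
    """Gain needed to return to the prior equity peak: d / (1 - d)."""
    if not 0.0 <= drawdown < 1.0:
        raise ValueError("drawdown must be in [0, 1)")
    return drawdown / (1.0 - drawdown)


for d in (0.10, 0.20, 0.50, 0.75):
    print(f"drawdown {d:.0%} -> gain needed {recovery_gain(d):+.0%}")
```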

Portfolio heat & correlation

Per-trade risk is not enough; you must cap portfolio heat — total open risk across all positions at once. Correlated positions are the trap: long EURUSD and long GBPUSD are not two 0.5% bets, they are closer to one 1% bet on a falling dollar.

  • Cap aggregate open risk (e.g. total heat ≤ 2–3% of equity), not just per-trade risk.
  • Treat correlated instruments as one position for the heat calculation; size the cluster, not each leg.
  • Cap correlated clusters so a single macro move (a dollar spike, a risk-off day) cannot hit every open trade at full size simultaneously.
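The three rules above can be sketched as a pre-trade check. Everything in this example is an assumption made for illustration: the cluster labels, the 0.75% cluster cap, and the choice to represent a position as an (instrument, cluster, open risk %) tuple.

```python
from collections import defaultdict

Position = tuple[str, str, float]  # (instrument, cluster label, open risk in % of equity)


def portfolio_heat(positions: list[Position]) -> float:
    """Total open risk across all positions, in % of equity."""
    return sum(risk for _, _, risk in positions)


def cluster_heat(positions: list[Position]) -> dict[str, float]:
    """Open risk summed per correlated cluster, in % of equity."""
    totals: dict[str, float] = defaultdict(float)
    for _, cluster, risk in positions:
        totals[cluster] += risk
    return dict(totals)


def heat_breaches(positions: list[Position], max_total: float, max_cluster: float) -> list[str]:
    """Names of any caps breached: 'total' and/or the offending cluster labels."""
    breaches = ["total"] if portfolio_heat(positions) > max_total else []
    breaches += [c for c, h in cluster_heat(positions).items() if h > max_cluster]
    return breaches


# Long EURUSD and long GBPUSD are grouped as one short-dollar cluster.
book = [
    ("EURUSD", "short-USD", 0.5),
    ("GBPUSD", "short-USD", 0.5),
    ("AUDNZD", "aud-nzd", 0.5),
]
print(portfolio_heat(book), cluster_heat(book), heat_breaches(book, 2.0, 0.75))
```

How instruments are assigned to clusters is the hard part and is left to the reader here; the check itself only adds up whatever grouping it is given.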
Survival-first principle
Sizing is engineered backwards from the drawdown you can survive, financially and psychologically, not forwards from the returns you want. Decide the worst losing streak the math says is normal (Section 03), confirm your size lets you sit through it without breaching your max-drawdown limit, and only then ask about returns. A system you abandon at the bottom of a normal drawdown has, for you, a negative expectancy.

05 Strategy Archetypes

Every edge is a bet that a specific market behaviour will repeat. Each archetype below works in one regime and bleeds in its opposite — there is no all-weather edge. The meta-skill is knowing which archetype you are running, which regime it needs, and how to tell when that regime has left.

The two parents: trend and mean reversion

Almost every system descends from one of two opposing beliefs: that moves continue (momentum/trend), or that moves revert (mean reversion). They are negatively correlated by construction, which is also why running both can smooth an equity curve.

[Figure: two schematic price paths. Trend-following: buy strength / pullbacks. Mean reversion: fade extremes back to the mean.]
Two opposite bets. Trend buys after a move proves itself and rides continuation; mean reversion fades a stretched move expecting a snap back to the average. Each is the other's worst regime.

The archetype map

Archetype | The bet | Typical win% / payoff | Needs this regime | Bleeds when
Trend-following | Strong moves persist; let winners run | 30–45% / 2–5R | Sustained directional trends | Choppy, range-bound, mean-reverting markets (death by a thousand cuts)
Mean reversion | Extremes overshoot and snap back | 60–75% / 0.5–1R | Range-bound, stationary, high-noise | A trend or regime break runs through your fade (picking up pennies in front of a steamroller)
Breakout | Range expansion births a new trend | 35–50% / 2–3R | Volatility compression resolving | False breakouts in chop; getting whipsawed at the edges
Momentum (cross-sectional) | Recent winners keep outperforming losers | ~50% / varies | Dispersion across a basket; persistent leadership | Sharp reversals / correlation spikes flip the rankings
Carry (FX) | Earn the interest-rate differential (swap) | High win% / small, steady | Calm, risk-on, low-volatility | Risk-off: "up the stairs, down the elevator" (slow gains, violent reversals)
Pairs / stat-arb | A cointegrated spread reverts to its mean | High win% / small | A stable statistical relationship | The relationship structurally breaks (the spread never returns)
Session / time-based | Intraday seasonality (e.g. London open, NY overlap) | Varies | Repeatable liquidity/volatility windows | The seasonality decays or shifts with market structure

Multi-timeframe: a structure, not an archetype

Multi-timeframe (MTF) is not a separate edge — it is a way of combining the above. The standard pattern: a higher timeframe sets the bias (which direction the regime filter permits), and a lower timeframe provides the trigger (a precise, cheaper entry). MTF tightens stops and improves reward:risk, but it cannot manufacture an edge that the underlying bet does not have.

The regime-dependence principle
There is no edge that works everywhere; there are edges and the regimes that feed them. This is why Section 02's regime filter exists (only trade when your regime is present) and why Section 14's monitoring exists (detect when the regime, and therefore your edge, has left). A trend system with a regime filter that keeps it flat during chop is not a worse trend system; it is a complete one.

Combining for smoother equity
Because trend and mean reversion profit in opposite regimes, a portfolio that holds both tends to have a smoother equity curve than either alone: one is usually working when the other is not. Diversifying across uncorrelated edges is the closest thing to a free lunch in system design, far safer than leveraging a single edge harder.

06 From Idea to Specification

Most systems do not fail in the backtest — they fail because the rules were never truly nailed down, so the trader ran a slightly different system every week and never knew it. Specification is the unglamorous discipline that turns a belief into something you can test, size, audit, and improve. It is also the single hardest step.

Hypothesis before indicators

Start from a market behaviour you can state in one sentence, not from an indicator you find interesting. The hypothesis is the edge; the indicator is merely how you measure the hypothesis.

Indicator-first (backwards)

"The RSI looks useful — let me find settings that would have worked." This is curve-fitting with extra steps.

Hypothesis-first (correct)

"Liquid FX trends resume intraday after a shallow pullback to the mean." Now pick the cheapest indicator that captures that.

A hypothesis is testable and can be wrong. "RSI is good" cannot be wrong because it says nothing. If you cannot state, in plain language, what the market is doing and why your rules profit from it, you do not yet have a system to specify.

The specification: zero ambiguity

To specify is to answer all eight components (Section 02) such that two people — or a machine — would execute identically. Re-run the stranger test against every clause: could someone else act on this word without asking you what you meant?

Vague (a style) | Specified (a system)
"Strong uptrend" | close > 200-EMA and 50-EMA slope > 0 over last 10 bars
"Near support" | within 0.25 × ATR(14) of a level touched ≥ 2× in the last 50 bars
"Wait for confirmation" | a bar that closes back above the 20-EMA
"Don't trade the news" | no entries in the 30 min before/after a high-impact GBP or USD release
"Take profit into resistance" | limit at the nearest level above; else exit at +2R

Contradictions and gaps: a spec must be total

A real specification is total — it defines an action for every state the market can present. Two failure modes hide here, and both are invisible until they cost money:

  • Contradictions — two rules that fire at once and disagree. "Buy when RSI < 30" and "never buy below the 200-EMA" both trigger when an oversold market is also below trend. Which wins? If the spec doesn't say, you'll improvise — differently each time.
  • Gaps — states the rules never anticipated. The setup forms on the exact bar the regime filter flips. A target and a stop are both hit inside one bar. Price gaps past your entry. An undefined state is a coin-flip you didn't know you were making.
Why this is the whole game
Contradictions and gaps are the difference between a backtest you can trust and one that quietly used assumptions your live self won't replicate. Resolving them before coding is cheaper than discovering them in a drawdown. Most "the backtest worked but live didn't" stories are unspecified states being resolved one way by the historical fill engine and another way by the panicking human.

Structure versus parameters

Separate the structural logic (the bet: "buy pullbacks in an uptrend") from the tunable parameters (the 20 in 20-EMA, the 1.5 in 1.5×ATR). Structure encodes your hypothesis; parameters are knobs. Every free parameter is a degree of freedom you can accidentally fit to noise (Section 09).

  • Minimise free parameters. Three robust ones beat ten finely-tuned ones. Each knob should earn its place with an economic reason, not a backtest improvement.
  • Prefer parameters that generalise. A 200-EMA trend filter is a broad, well-understood concept; a 187-period filter that tested 0.3% better is a red flag.
  • Fix what you can justify; only tune what you must. The fewer things you optimise, the less you can overfit.
Where Reign Edge fits
This translation problem (turning a trader's tacit, in-the-head rules into an explicit, contradiction-free, testable specification) is precisely the problem the Reign Edge platform is built around. The hard part was never the indicators; it is making the rules say exactly what the trader means, surfacing the contradictions and gaps they didn't know were there, and expressing the result as a small set of typed, composable, auditable conditions rather than opaque free-form code. A specification you can read, test, and reason about beats a black box you can only run.

07 Data Foundations

A backtest is only as honest as the data underneath it. Bad data does not produce obvious errors — it manufactures plausible, profitable-looking edges that evaporate the moment real money is on the line. Most "my backtest lied to me" stories are data stories.

The raw material: bars and OHLCV

A bar aggregates price action over an interval into five numbers: Open, High, Low, Close, Volume. The interval can be time-based (1H), tick-based (every 500 ticks), or volume-based. The critical limitation: a bar tells you the range but not the path within it.

The intrabar problem
If a bar's range contains both your stop and your target, the OHLC alone cannot tell you which was hit first. Assume the pessimistic order (stop first) unless you have tick data to resolve it. Backtests that assume the target filled first are a common, silent source of inflated results.
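A minimal sketch of the pessimistic rule for a long trade evaluated on OHLC bars only. The function name and the sample prices are hypothetical.

```python
from typing import Optional


def resolve_long_exit(high: float, low: float, stop: float, target: float) -> Optional[str]:
    """Which exit a long trade hit inside one OHLC bar, resolved pessimistically."""
    hit_stop = low <= stop
    hit_target = high >= target
    if hit_stop:
        # Covers the ambiguous case where both levels sit inside the bar's
        # range: without tick data, assume the stop was hit first.
        return "stop"
    if hit_target:
        return "target"
    return None  # neither level touched; the trade stays open


# Stop 1.2475, target 1.2550; the first bar's range contains both levels.
print(resolve_long_exit(high=1.2560, low=1.2470, stop=1.2475, target=1.2550))
print(resolve_long_exit(high=1.2560, low=1.2480, stop=1.2475, target=1.2550))
print(resolve_long_exit(high=1.2540, low=1.2480, stop=1.2475, target=1.2550))
```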

Tick vs bar data

Aspect | Tick data | Bar (OHLCV) data
Granularity | Every quote/trade | Summary per interval
Resolves intrabar order? | Yes | No
Models spread/slippage? | Yes (bid/ask) | Approximation only
Size & cost | Huge, expensive to store/process | Compact, cheap
Use for | Scalping, intrabar logic, realistic fills | Swing/position, higher-TF research

Data quality — clean before you trust

  • Spikes & bad ticks: erroneous prints that trigger phantom signals. Filter outliers.
  • Gaps & missing sessions: holidays, outages, weekend gaps in FX. Decide how each is handled, consistently.
  • Duplicate / misaligned bars: repeated timestamps or off-by-one alignment silently shift signals.
  • Timezone drift: the deadliest quiet bug — a "daily" bar means something different at broker-server time vs UTC vs your local time, and a session filter built on the wrong zone is wrong on every bar.

FX-specific realities

Foreign exchange has no central exchange, which changes the data picture in ways equity traders often miss:

  • No single price: each broker/liquidity provider quotes slightly differently. There is no canonical tape.
  • Variable spread: spread widens at session opens, around news, and in thin liquidity (Asian session, Friday close). A fixed-spread assumption flatters the backtest.
  • Swap / rollover: holding overnight earns or pays interest on the rate differential. For multi-day holds, unmodelled swap can flip a carry trade's sign.
  • Sessions & gaps: the market is continuous Mon–Fri but liquidity rotates across Tokyo/London/New York; the weekend gap can leap past stops.

Feed parity: one source, end to end

Single-feed principle
Use one data source for historical research, forward testing, and live execution. Mixing a cheap historical vendor with a different live broker injects divergence that has nothing to do with your strategy's quality: different timestamps, different spreads, different fills. When backtest and live disagree, you must be able to rule out the data as the cause. A single feed (ideally the same provider you will trade through) makes the backtest a fair preview of live.
Tier | Use | Trade-off
Free tick (e.g. Dukascopy) | Research, spikes, archetype validation | Great granularity, but not your execution venue; for exploration only
Broker API history + live (single provider) | Production backtest & live | Parity across the whole pipeline; the configuration you actually trade
Mixed vendors | Avoid | Divergence you cannot attribute to the strategy

08 Backtesting

A backtest is a laboratory for trying to prove your hypothesis false on history before risking capital. Its job is not a pretty equity curve — it is an honest estimate of expectancy and of the conditions under which that expectancy holds. The moment you start trying to make it look good, you have stopped being the scientist and become the mark.

Vectorised vs event-driven

Aspect | Vectorised | Event-driven
How it runs | Computes signals across the whole series at once | Steps through time bar-by-bar as if live
Speed | Very fast; ideal for scanning many ideas | Slow; one finalist at a time
Path-dependent logic | Awkward (trailing stops, partial fills, pyramiding) | Natural; mirrors how live execution works
Look-ahead risk | Easy to introduce accidentally (whole-series ops) | Structurally harder; you only see the past
Best for | Research, parameter sweeps, idea triage | Validating the finalist; matching the live engine
The two-stage approach
Explore broadly in a fast vectorised engine, then re-test the survivor in an event-driven engine that mirrors your live execution path (same fill logic, same cost model). Discrepancies between the two stages are themselves diagnostic: they usually reveal hidden look-ahead or unmodelled path dependence.

The event-driven loop (the mental model)

Everything hinges on one phrase: data available up to now. For each bar, in order:

  1. Update indicators using only bars that have already closed.
  2. Evaluate the regime filter, then the setup, then the entry trigger.
  3. Simulate the fill with costs (spread, commission, slippage).
  4. Manage open trades (stops, targets, trails) against this bar's range.
  5. Mark equity and record the trade's R-multiple.

If step 1 ever peeks at the current or a future bar's close to make a decision you'd act on now, you have introduced look-ahead bias — and your results are fiction (Section 09).
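The core of the loop can be sketched as follows. This is a deliberately reduced illustration covering steps 1 to 3 only (trade management and equity marking are omitted), and the `Bar` class, the moving-average trigger, and the `half_spread` cost are placeholders rather than a recommended strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float


def simple_moving_average(values: list[float], n: int) -> float:
    """Mean of the last n values."""
    return sum(values[-n:]) / n


def run_entries(bars: list[Bar], n: int = 3, half_spread: float = 0.0) -> list[tuple[int, float]]:
    """Walk bars in order; return (bar index, fill price) for each long entry signal."""
    entries = []
    for i in range(n, len(bars)):
        # Step 1: indicators see only bars 0..i-1, which have already closed.
        closed = [b.close for b in bars[:i]]
        # Step 2 (placeholder trigger): the last closed bar finished above its SMA.
        if closed[-1] > simple_moving_average(closed, n):
            # Step 3: fill at the open of the current bar, paying half the spread.
            entries.append((i, bars[i].open + half_spread))
    return entries


bars = [
    Bar(1.0, 1.0, 1.0, 1.0),
    Bar(1.0, 1.0, 1.0, 1.0),
    Bar(1.0, 1.0, 1.0, 1.0),
    Bar(1.0, 2.0, 1.0, 2.0),  # the breakout bar: its close is known only once it ends
    Bar(2.0, 2.0, 1.0, 1.0),
]
entries = run_entries(bars)
print(entries)
```

The signal created by the breakout bar is filled on the following bar at its open. A version that read `bars[i].close` while deciding at bar `i` would enter one bar earlier at a price that was not yet available, which is the look-ahead error described above.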

Costs: the quiet edge-killer

Costs scale with frequency, and they attack thin edges first. Model them pessimistically — optimism here is self-deception with a spreadsheet.

Net expectancy[R] = Gross expectancy[R] − costs per trade[R]
costs ≈ (spread + commission + slippage) ÷ (stop distance) # all in the same units
How frequency turns cost into ruin

Gross edge +0.20R/trade, 200 trades/year → +40R gross.

  • Costs 0.10R/trade → net +0.10R → +20R (half the edge gone to friction).
  • Costs 0.20R/trade → net 0.00R → break-even — a real edge, fully consumed by costs.

The thinner and faster the edge, the more a credible cost model decides whether it is real. Stress your costs upward and see if the edge survives.
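The cost arithmetic, including an upward stress, might look like the sketch below. The spread, commission, and slippage figures (in pips, against a 25-pip stop) are hypothetical inputs chosen so that costs come to 0.10R, matching the first scenario above.

```python
def cost_in_r(spread: float, commission: float, slippage: float, stop_distance: float) -> float:
    """Round-trip cost per trade in R; all inputs must share one unit (e.g. pips)."""
    if stop_distance <= 0:
        raise ValueError("stop distance must be positive")
    return (spread + commission + slippage) / stop_distance


def net_expectancy_r(gross_r: float, cost_r: float) -> float:
    """Net expectancy per trade in R after costs."""
    return gross_r - cost_r


gross = 0.20           # gross edge per trade in R, from the text
trades_per_year = 200  # from the text

base_cost = cost_in_r(spread=1.5, commission=0.5, slippage=0.5, stop_distance=25)
base_net = net_expectancy_r(gross, base_cost)

# Stress test: double every cost component and see whether the edge survives.
stressed_cost = cost_in_r(spread=3.0, commission=1.0, slippage=1.0, stop_distance=25)
stressed_net = net_expectancy_r(gross, stressed_cost)

print(f"base: {base_net:+.2f}R/trade, {base_net * trades_per_year:+.0f}R/year")
print(f"stressed: {stressed_net:+.2f}R/trade, {stressed_net * trades_per_year:+.0f}R/year")
```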

Pessimistic fills on stops
Model stop-loss fills as worse than the trigger price (slippage works against you precisely when you're stopped in fast markets), and limit fills as no better than the level. A backtest that assumes perfect fills both ways systematically overstates a real edge.

In-sample and out-of-sample

Split your history. Develop, optimise, and iterate freely on the in-sample (IS) period. Reserve an out-of-sample (OOS) period that you test against once.

The OOS is sacred, and consumable
Every time you look at OOS results and then go back and tweak the system, you have leaked OOS information into your design: the OOS quietly becomes in-sample, and its honesty is spent. The discipline is to finalise on IS, test on OOS exactly once, and if it fails, treat that as a finding (not an invitation to re-tune until OOS passes). For repeated, structured testing, use walk-forward (Section 11) instead.
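One simple way to enforce the split is to cut the history chronologically, once, before any development begins. In this sketch the 30% out-of-sample share is an assumed default, not a figure from the text, and the function name is illustrative.

```python
def split_is_oos(bars: list, oos_fraction: float = 0.3) -> tuple[list, list]:
    """Chronological split: earliest bars are in-sample, the most recent are out-of-sample."""
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    n_oos = round(len(bars) * oos_fraction)
    cut = len(bars) - n_oos
    # No shuffling: time order is preserved so the OOS period lies entirely
    # after everything the system was developed on.
    return bars[:cut], bars[cut:]


history = list(range(10))  # stand-in for ten bars in time order
in_sample, out_of_sample = split_is_oos(history)
print(in_sample, out_of_sample)
```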

What a credible backtest reports

  • A long period spanning multiple regimes (trending, ranging, high- and low-volatility, at least one crisis).
  • Costs included; IS and OOS shown separately; the trade count stated.
  • The distribution of outcomes, not just the total — equity curve, drawdown depth and duration, and the full metric set (Section 10).
  • Sensitivity to small parameter changes (Section 09) — robustness, not a single hero result.

A backtest is a hypothesis test, not a brochure: If your instinct on a weak result is to search for the settings that fix it, you have inverted the exercise. The honest question is "does the edge survive my attempts to break it?", not "can I make this curve go up?" A backtest engineered to impress impresses exactly one person: the one about to trade it.

09 Overfitting & the Bias Catalog

Overfitting is the reason a beautiful backtest becomes a losing live system. It is fitting the noise in your historical sample instead of the signal that will repeat — and it is seductive precisely because it always looks like progress. This is the single most dangerous failure in system development.

What overfitting actually is

A market history contains both a (possibly real) pattern and a large amount of random noise specific to that sample. Overfitting is when your rules and parameters memorise the noise — the exact wiggles that will never recur — rather than the generalisable structure. The more free parameters you have and the more variations you try, the easier it becomes to fit noise perfectly.

The cruelty is the asymmetry: overfitting is invisible in the backtest (where it looks superb) and only revealed in live trading (where it costs money). You cannot detect it by admiring results — only by methodology applied before you see the results you were hoping for.

Symptoms

[Figure: performance plotted against parameter value, contrasting a single lucky peak with a robust broad plateau.]
Plot performance against a parameter. A robust edge sits on a wide plateau — neighbours perform similarly, so small errors don't matter. An overfit edge is a lone spike: nudge the parameter and it collapses. You want plateaus, not peaks.
  • Too many parameters relative to the number of trades — degrees of freedom that let you fit anything.
  • Fragility: performance collapses when a parameter moves slightly — a peak, not a plateau.
  • Too-good metrics: Sharpe > 3, 80%+ win rate and large payoff, a near-straight equity line. Real edges are noisier than this.
  • Great in-sample, poor out-of-sample — the textbook signature.
  • Works on one instrument only and breaks on similar ones — a genuine behavioural edge usually generalises at least somewhat.
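The plateau-versus-peak symptom can be turned into a rough number. The sketch below is one possible heuristic, not a standard statistic: `plateau_score` is a hypothetical name, the neighbourhood width is arbitrary, and both performance lists are invented for illustration.

```python
from statistics import mean
from typing import Sequence


def plateau_score(perf: Sequence[float], width: int = 2) -> float:
    """Average neighbour performance divided by the best grid point.

    perf holds performance on an evenly spaced parameter grid, in grid order.
    Values near 1.0 suggest a plateau; values near 0 suggest a lone spike.
    """
    best = max(range(len(perf)), key=lambda i: perf[i])
    if perf[best] <= 0:
        return 0.0  # nothing profitable on the grid to be robust about
    lo, hi = max(0, best - width), min(len(perf), best + width + 1)
    neighbours = [perf[i] for i in range(lo, hi) if i != best]
    if not neighbours:
        raise ValueError("need at least two grid points")
    return mean(neighbours) / perf[best]


# Invented expectancy (in R) across seven neighbouring parameter values.
spike = [0.02, 0.01, 0.03, 0.40, 0.02, 0.00, 0.01]
plateau = [0.05, 0.18, 0.22, 0.24, 0.23, 0.19, 0.06]

print("spike:", round(plateau_score(spike), 2))
print("plateau:", round(plateau_score(plateau), 2))
```

The spike has the higher best value, yet its neighbours retain almost none of it, which is exactly the fragility described above.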

The bias catalog

Overfitting is the headline, but it travels with a family of biases that all inflate backtest results. Know each by name so you can hunt for it deliberately.

| Bias | What it is | Defence |
| --- | --- | --- |
| Look-ahead | Using information not available at decision time (a bar's close, a future value, a revised figure) | Event-driven loop; only closed bars; lag any revised data |
| Survivorship | Testing only instruments that still exist; the failures were deleted | Use point-in-time universes that include delisted/dead names |
| Data-snooping / multiple testing | Trying many ideas and keeping the best, which looks good by luck alone | Count your trials; raise the bar; deflate the result (below) |
| Optimisation bias | Tuning parameters to the in-sample period's specific noise | Few parameters; walk-forward; demand plateaus |
| Selection bias | Cherry-picking the test window, instrument, or start date that flatters | Fixed, pre-declared test period across regimes |
| Hindsight in rule design | Adding rules that "explain" past losses you already saw | Pre-register the hypothesis before looking; resist post-hoc patches |
| Cost omission | Ignoring spread, slippage, swap | Pessimistic cost model (Section 08) |

The multiple-comparisons problem

If you test enough strategies, the best one will look brilliant even if none has any edge — the maximum of many random results is large by chance. The more configurations you search, the higher your performance bar must rise to mean anything.

Deflated Sharpe: The honest correction is to discount your best result for the number of trials behind it (the deflated Sharpe ratio formalises this). Practically: track how many variations you tried, treat a Sharpe of 1.5 found after one honest attempt very differently from a Sharpe of 1.5 found after 500 sweeps, and be deeply suspicious of any result you only obtained by searching hard for it.
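The effect is easy to demonstrate with strategies that have no edge at all. The simulation below is an illustration of the multiple-comparisons problem, not an implementation of the deflated Sharpe ratio; the function names are hypothetical, and the zero-mean Gaussian daily returns with 1% volatility are an assumption chosen for simplicity.

```python
import random
from statistics import mean, stdev
from typing import List


def random_sharpe(rng: random.Random, n_days: int = 252) -> float:
    """Annualised Sharpe of one year of pure-noise daily returns (true edge = 0)."""
    daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    return mean(daily) / stdev(daily) * 252 ** 0.5


def best_so_far(n_trials: int, seed: int = 0) -> List[float]:
    """Best Sharpe found after 1, 2, ..., n_trials edgeless 'strategies'."""
    rng = random.Random(seed)
    best = float("-inf")
    running = []
    for _ in range(n_trials):
        best = max(best, random_sharpe(rng))
        running.append(best)
    return running


running = best_so_far(500)
for n in (1, 10, 100, 500):
    print(f"best Sharpe after {n:>3} trials: {running[n - 1]:.2f}")
```

Because every trial is noise, whatever the best-of-500 figure turns out to be is the bar a searched-for result has to clear before it means anything.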

The parsimony toolkit

  • Minimise parameters and justify each one economically, not by backtest gain.
  • Out-of-sample testing, used sparingly (Section 08).
  • Walk-forward analysis and Monte Carlo — the core robustness tools (Section 11).
  • Demand plateaus, not peaks, in parameter space.
  • Hold out a final, untouched dataset for one last sanity check before going live.
  • Pre-register the hypothesis and count trials honestly. The discipline must precede the results.

10 Performance Metrics

No single number describes a system. CAGR ignores risk; win rate ignores payoff; Sharpe hides drawdown duration. A system is a profile across four families — return, risk-adjusted return, drawdown, and trade quality — and every individual metric is gameable in isolation. Read them as a set.

Return, risk-adjusted return, and the equity picture

CAGR compounds the growth rate; risk-adjusted ratios divide return by some measure of pain.

CAGR = (Ending ÷ Beginning)^(1 ÷ years) − 1
Sharpe = (R_p − R_f) ÷ σ_p # annualised ≈ daily Sharpe × √252; penalises ALL volatility
Sortino = (R_p − R_f) ÷ σ_downside # penalises only downside; fairer to asymmetric systems
Calmar = CAGR ÷ |max drawdown| # return per unit of worst pain
[Figure: equity curve with the maximum drawdown (max DD) marked, above the matching underwater drawdown curve (0% line, trough) over time.]
The two views traders actually feel. The equity curve shows growth; the underwater curve shows how deep and how long you spent below the prior peak. Two systems with the same CAGR can have wildly different underwater curves — and you live in the underwater one.
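The four formulas above translate directly into code. This is a sketch under stated assumptions: returns are per-period simple returns, the risk-free rate is supplied per period, and downside deviation is computed as the root-mean-square of negative excess returns over all periods, which is one common convention among several. The sample returns are invented.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cagr(beginning: float, ending: float, years: float) -> float:
    """Compound annual growth rate."""
    return (ending / beginning) ** (1.0 / years) - 1.0


def sharpe(returns: Sequence[float], rf_per_period: float = 0.0,
           periods_per_year: int = 252) -> float:
    """Annualised Sharpe: mean excess return over total volatility."""
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def sortino(returns: Sequence[float], rf_per_period: float = 0.0,
            periods_per_year: int = 252) -> float:
    """Annualised Sortino: mean excess return over downside deviation."""
    excess = [r - rf_per_period for r in returns]
    downside = sqrt(mean([min(0.0, e) ** 2 for e in excess]))
    if downside == 0.0:
        raise ValueError("no downside observations; Sortino is undefined")
    return mean(excess) / downside * sqrt(periods_per_year)


def calmar(cagr_value: float, max_drawdown: float) -> float:
    """Return per unit of worst drawdown; max_drawdown may be signed."""
    return cagr_value / abs(max_drawdown)


daily = [0.010, -0.005, 0.020, 0.000, 0.015, -0.002]  # invented daily returns
print(round(cagr(100_000, 200_000, 2), 4))
print(round(sharpe(daily), 2), round(sortino(daily), 2))
print(calmar(0.20, -0.10))
```

On this upside-skewed sample the Sortino is well above the Sharpe, which is the asymmetry the formula comments describe.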

Drawdown and trade quality

Maximum drawdown is the largest peak-to-trough equity decline; its duration — how long you stay underwater — is often the more punishing number. At the trade level, MAE/MFE (Maximum Adverse / Favourable Excursion) measure how far each trade ran against and for you before closing — invaluable for calibrating stops (are you getting stopped just before reversals?) and targets (are you leaving most of the move on the table?).
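Depth and duration both fall out of a single pass over the equity curve. The function name is hypothetical and the equity series is invented; duration is counted in bars spent below the prior peak.

```python
from typing import Sequence, Tuple


def drawdown_stats(equity: Sequence[float]) -> Tuple[float, int]:
    """Return (max drawdown as a fraction of the peak, longest run of bars underwater)."""
    peak = equity[0]
    max_dd = 0.0
    longest = 0
    current = 0
    for value in equity:
        if value >= peak:
            peak = value      # new high-water mark: back above water
            current = 0
        else:
            current += 1      # one more bar below the prior peak
            longest = max(longest, current)
            max_dd = max(max_dd, (peak - value) / peak)
    return max_dd, longest


equity = [100, 110, 105, 99, 104, 112, 108, 115]  # invented equity curve
depth, duration = drawdown_stats(equity)
print(f"max drawdown {depth:.1%}, longest underwater stretch {duration} bars")
```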

| Metric | Definition | Healthy range | The gotcha |
| --- | --- | --- | --- |
| CAGR | Compound annual growth rate | Context-dependent | Says nothing about risk taken to earn it |
| Sharpe | Excess return ÷ total volatility | > 1 acceptable, > 2 strong | Penalises upside; assumes near-normal returns; smoothable |
| Sortino | Excess return ÷ downside volatility | > 2 good | Fairer to asymmetric systems, but noisier to estimate |
| Calmar / MAR | CAGR ÷ \|max drawdown\| | > 0.5 ok, > 1 strong | Hostage to the single worst DD and the length of the test |
| Max drawdown | Largest peak-to-trough decline | < 20% comfortable for most | One number hides duration and frequency |
| DD duration | Longest time underwater | Shorter is better | The metric that actually breaks discipline |
| Profit factor | Gross profit ÷ gross loss | 1.3–1.6 solid | > 2: verify it isn't look-ahead |
| Expectancy [R] | Average R per trade | > 0; > 0.1 good | Meaningless below ~100 trades |
| Win rate | Wins ÷ total | Only with payoff context | Pure vanity in isolation |
| MAE / MFE | Worst / best excursion per trade | Used to tune stops & targets | Needs trade-by-trade path data |

Read the set, not the number: Any single metric can be gamed. A high Sharpe can mask a shallow-but-endless drawdown; a high CAGR can ride a 70% max drawdown; a high win rate can hide a negative expectancy. Demand the whole profile (return, the ratio that captures your pain, the depth and duration of drawdown, and trade-level quality) before you believe a system is good.
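The trade-quality rows can be computed together from a list of R-multiples, which makes the "high win rate, negative expectancy" trap concrete. The function name is hypothetical and the ten trades are invented.

```python
from typing import Dict, Sequence


def trade_quality(r_multiples: Sequence[float]) -> Dict[str, float]:
    """Win rate, expectancy (R), payoff ratio and profit factor from R-multiples."""
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    avg_win = gross_profit / len(wins) if wins else 0.0
    avg_loss = gross_loss / len(losses) if losses else 0.0
    return {
        "win_rate": len(wins) / len(r_multiples),
        "expectancy_r": sum(r_multiples) / len(r_multiples),
        "payoff": avg_win / avg_loss if avg_loss else float("inf"),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
    }


# Invented sample: eight small winners, two large losers.
flattering = [0.3] * 8 + [-2.0] * 2
for name, value in trade_quality(flattering).items():
    print(f"{name}: {value:.2f}")
```

An 80% win rate sits next to a negative expectancy and a profit factor below 1, so the win rate alone would have been actively misleading.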

11 Robustness & Validation

A single backtest — even out-of-sample — is one path through one history with one set of parameters. Robustness testing asks the harder question: would this edge have survived different data, different parameters, and different luck? Here you stop admiring the system and start trying to break it on purpose.

Walk-forward analysis — the gold standard

Walk-forward analysis (WFA) mimics how you'd actually run a system: optimise on a window, trade the next unseen window with those settings, then roll the window forward and repeat. The concatenated out-of-sample segments form a realistic equity curve that was never optimised in hindsight.

[Figure: rolling walk-forward windows along a time axis; each in-sample (optimise) block is followed by an out-of-sample (test) block.]
Rolling walk-forward. Each pass optimises on the grey block, then tests untouched on the red block, and slides forward. Stitching the red blocks together yields an out-of-sample curve you could actually have traded. Walk-forward efficiency = OOS performance ÷ IS performance; you want it reasonably high (say > 0.5), not a cliff.

Anchored WFA

In-sample window expands from a fixed start. Uses all history; adapts slowly. Good when more data always helps.

Rolling WFA

In-sample window is a fixed length that slides. Adapts to changing regimes; discards old data. Good when markets evolve.
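Both window schemes can be generated by the same loop; only the in-sample start differs. This is a sketch with hypothetical names, using bar indices rather than dates, and the 1000/500/100 sizes are arbitrary. The efficiency example reuses the illustrative expectancies (+0.24R out-of-sample, +0.28R in-sample) from Section 16.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n_bars: int, is_len: int, oos_len: int,
                         anchored: bool = False) -> Iterator[Tuple[range, range]]:
    """Yield (in_sample, out_of_sample) index ranges, stepping by one OOS block."""
    start = 0
    while start + is_len + oos_len <= n_bars:
        is_start = 0 if anchored else start   # anchored: IS always begins at bar 0
        is_end = start + is_len
        yield range(is_start, is_end), range(is_end, is_end + oos_len)
        start += oos_len


def walk_forward_efficiency(oos_perf: float, is_perf: float) -> float:
    """OOS performance divided by IS performance."""
    return oos_perf / is_perf


for mode in (False, True):
    label = "anchored" if mode else "rolling"
    for is_rng, oos_rng in walk_forward_windows(1000, 500, 100, anchored=mode):
        print(label, f"IS {is_rng.start}-{is_rng.stop}  OOS {oos_rng.start}-{oos_rng.stop}")

print(round(walk_forward_efficiency(0.24, 0.28), 2))
```

The OOS ranges are contiguous and never overlap, so stitching them together gives one continuous curve in which every bar was traded with parameters chosen before it.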

Monte Carlo — the range of luck

Your backtest's max drawdown is a single sample of what randomness could deal you; the next one could be worse. Monte Carlo simulation reshuffles or resamples your trade results thousands of times to reveal the distribution of outcomes — especially the drawdowns you didn't happen to get but easily could.

[Figure: equity against number of trades for many reshuffled paths, with the median path highlighted.]
Each faint line is one reshuffle of the same trades; the bold line is the median. The spread of final equity and (more importantly) of worst drawdown across thousands of runs tells you the range your sizing must survive — not just the one path history happened to draw.
  • Trade-order shuffling: reorder the same results — same total return, very different drawdown paths.
  • Bootstrap resampling: draw trades with replacement to build the outcome distribution.
  • Randomised skipping: drop a fraction of trades at random — does the edge survive missing some signals?
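The first two methods above can be sketched with the standard library alone. Names are hypothetical, the trade list is invented (40 winners of +2R and 60 losers of −1R), drawdown is measured additively in R rather than as a percentage of compounding equity, and the percentile helper is a simple nearest-rank approximation.

```python
import random
from typing import List, Sequence


def max_drawdown_r(r_multiples: Sequence[float]) -> float:
    """Worst peak-to-trough decline of the cumulative R curve."""
    equity = peak = worst = 0.0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def shuffled_drawdowns(trades: Sequence[float], n_runs: int = 2000,
                       seed: int = 42) -> List[float]:
    """Trade-order shuffling: same trades, same total, different paths."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        path = list(trades)
        rng.shuffle(path)
        results.append(max_drawdown_r(path))
    return results


def bootstrap_drawdowns(trades: Sequence[float], n_runs: int = 2000,
                        seed: int = 42) -> List[float]:
    """Bootstrap: draw trades with replacement, so totals vary too."""
    rng = random.Random(seed)
    return [max_drawdown_r(rng.choices(list(trades), k=len(trades)))
            for _ in range(n_runs)]


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 <= q <= 1."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


trades = [2.0] * 40 + [-1.0] * 60   # invented results, +20R in total
dds = shuffled_drawdowns(trades)
print("historical-order drawdown:", max_drawdown_r(trades), "R")
print("median shuffled drawdown:", percentile(dds, 0.50), "R")
print("95th percentile drawdown:", percentile(dds, 0.95), "R")
```

Sizing against the 95th-percentile drawdown rather than the single historical one is the practical use of the distribution.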

Sensitivity, regime, and stress

  • Parameter sensitivity: nudge every parameter ±10–20%; performance should degrade gracefully (a plateau), not collapse (Section 09).
  • Regime slicing: break results out by trending / ranging / high- vs low-volatility periods. A robust system needn't excel everywhere, but it must not be catastrophic in its off-regime — and its regime filter should keep it largely flat there.
  • Stress testing: replay the worst historical windows, double your slippage, widen spreads, and gap price through a stop. If the system only survives benign conditions, it isn't validated.
"Would I trade this?" — pre-validation checklist
  • Positive expectancy after pessimistic costs, over 100+ trades spanning multiple regimes.
  • Out-of-sample and walk-forward results hold (WFE not a cliff).
  • Parameter plateau, not a peak; survives ±20% nudges.
  • Monte Carlo worst-case drawdown is one your sizing and psychology can survive.
  • No single regime, instrument, or year carries the entire result.

Robustness beats optimisation: A slightly sub-optimal system that is stable across data, parameters, and luck will out-earn a backtest-optimal one that is fragile, because live markets will differ from your sample. The goal is not the best historical curve; it is an edge that remains profitable while you are slightly wrong about everything.

12 From Backtest to Live

The gap between a validated backtest and a profitable live account is where most edges quietly die — not from a flawed system, but from the implementation gap. Crossing it is a deliberate, staged process, and the first thing you measure live is not profit.

Three testing modes

| Mode | What it tests | Blind spot |
| --- | --- | --- |
| Paper / simulation | Logic and operational bugs, with idealised fills | Real slippage and the operator's nerves |
| Forward test (real-time data) | The truest data preview: same feed, no peeking ahead | Fills if paper; psychology if not live |
| Live micro-size | Real fills and real psychology; the only test that includes you | Costs real money (kept tiny on purpose) |

The implementation gap

These are the backtest assumptions that break in contact with a live venue. Each one widens the gap between hypothetical and realised expectancy:

  • Slippage worse than modelled, especially on stops in fast markets.
  • Latency between signal and fill — the price you saw isn't always the price you get.
  • Partial or rejected fills, and requotes in thin liquidity.
  • Spread spikes at session opens and news that your fixed-spread backtest never charged you.
  • The operator — hesitating on a valid signal, overriding a loss, sizing up after a win.

Incubate, then scale in

Do not jump from simulation to full size. Run the system forward — paper or micro — for a window long enough to span a few dozen trades and at least one regime shift, then ramp capital in stages, each stage gated on the live edge continuing to track the backtest within tolerance.

micro size → small size → target size
# advance a stage only if: live expectancy ≈ backtest expectancy (within tolerance)
# AND drawdown is inside the limit AND execution stats match the cost model
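The gating rule above can be made explicit so that a stage change is a computed result rather than a mood. Everything specific here is an assumption for illustration: the class and function names, the 0.10R expectancy tolerance, the 0.02R cost tolerance, and the choice to treat costs as acceptable whenever they are no worse than modelled plus tolerance.

```python
from dataclasses import dataclass

STAGES = ("micro", "small", "target")


@dataclass(frozen=True)
class LiveStats:
    live_expectancy_r: float
    backtest_expectancy_r: float
    drawdown: float          # current drawdown as a fraction of peak equity
    drawdown_limit: float    # pre-committed maximum, same units
    observed_cost_r: float   # realised cost per trade in R
    modelled_cost_r: float   # cost per trade assumed in the backtest


def may_advance(s: LiveStats, expectancy_tol_r: float = 0.10,
                cost_tol_r: float = 0.02) -> bool:
    """All three conditions must hold; any single failure blocks the ramp."""
    tracks = abs(s.live_expectancy_r - s.backtest_expectancy_r) <= expectancy_tol_r
    inside_limit = s.drawdown < s.drawdown_limit
    costs_match = s.observed_cost_r <= s.modelled_cost_r + cost_tol_r
    return tracks and inside_limit and costs_match


def next_stage(current: str, s: LiveStats) -> str:
    """Move up one stage if allowed; otherwise stay where you are."""
    i = STAGES.index(current)
    if may_advance(s) and i < len(STAGES) - 1:
        return STAGES[i + 1]
    return current


tracking = LiveStats(0.22, 0.25, 0.06, 0.15, 0.11, 0.10)
diverging = LiveStats(0.02, 0.25, 0.06, 0.15, 0.11, 0.10)
print(next_stage("micro", tracking), next_stage("micro", diverging))
```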
Go-live checklist
  • Feed parity verified — same data source as backtest (Section 07).
  • Costs modelled and matching observed spread/slippage.
  • Position-size formula re-derived and unit-tested against hand calculations.
  • Max-drawdown limit and a kill switch coded, not just intended.
  • Logging of every decision and fill; alerting on anomalies; reconciliation of system state vs broker state.
  • A written, pre-committed pause rule (Section 14) — decided in calm, not in drawdown.

Measure tracking before profit: The first live question is not "am I making money?"; over a small sample, that's mostly noise. It is "does live expectancy track backtest expectancy within tolerance?" If realised results diverge materially from the validated estimate, halt and find the cause (data, costs, slippage, or your own execution) before adding a cent of size. A divergence you ignore at micro-size becomes a catastrophe at target size.

13 Execution & Operations

A validated edge can still bleed out through execution. This is the engineering layer — how orders actually reach the market, the FX-specific frictions, and the fail-safes that matter most precisely when a system is handling real money and something goes wrong.

Order types

| Order | Behaviour | Certainty | Use for |
| --- | --- | --- | --- |
| Market | Fills now at best available price | Fill certain, price uncertain | When speed > price; risky in thin liquidity |
| Limit | Fills at your price or better | Price certain, fill uncertain | Entries at a level, taking profit |
| Stop (stop-market) | Becomes a market order when the level trades | Fill certain once triggered, price uncertain (slippage) | Stop-losses, breakout entries |
| Stop-limit | Becomes a limit order when triggered | Price certain, fill uncertain | Careful entries; dangerous for stop-losses, as it can leave you unprotected in a fast move |
| Trailing stop | A stop that follows price by a set distance | Locks in gains progressively | Letting winners run |

The market-vs-limit trade-off: Every entry is a choice between fill certainty (market) and price certainty (limit). High-frequency or thin-edge systems are exquisitely sensitive to this; a few tenths of a pip of extra slippage, taken hundreds of times, is the whole edge. Match the order type to how much your strategy can pay in slippage versus how badly it needs to be in the trade.

FX execution realities

  • Spread on every round trip: you pay it entering and it's baked into your exit. It is the most certain cost you have — model it on the bid/ask you actually trade.
  • Swap / rollover applies at the daily rollover (around 17:00 New York); triple swap is typically charged once a week for the weekend. Material for any multi-day hold.
  • Liquidity windows: the London/New York overlap is deepest and cheapest; the Asian session and the Friday close are thin and wide; the Sunday open can gap.
  • The weekend gap: price can open Monday far from Friday's close, leaping over stops — size and hold with that in mind.

The decision / risk split

Core principle for automated & AI systems: Use judgement (human or model) to choose trades; use deterministic rules to enforce risk, never the reverse. A probabilistic or discretionary component, including an LLM, may decide whether to take a setup, but the limits that keep you solvent (stop placement, position size, the kill switch, the daily-loss cap) must be hard, deterministic code in the hot path. A non-deterministic component sitting inside the risk-enforcement loop is account-draining: its failures are quiet, context-dependent, and impossible to fully test. Decisions can be soft; risk limits must be hard.
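A minimal sketch of that split: the soft decision is an opaque callable, and it is only consulted after the hard checks pass. All names are hypothetical, the 0.5% risk and 2% daily cap are example values, and a real system would also round units to the venue's lot size.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RiskLimits:
    risk_per_trade: float   # fraction of equity risked per trade, e.g. 0.005 = 0.5%
    daily_loss_cap: float   # fraction of equity; no new entries once the day's loss reaches it


@dataclass(frozen=True)
class Order:
    side: str
    units: float
    stop_price: float


def gated_entry(decide: Callable[[], bool], equity: float, day_pnl: float,
                entry_price: float, stop_price: float, limits: RiskLimits,
                kill_switch_on: bool = False) -> Optional[Order]:
    """Return a sized order, or None. decide() can veto a trade but never loosen a limit."""
    # Hard, deterministic checks run first and do not depend on decide().
    if kill_switch_on:
        return None
    if day_pnl <= -limits.daily_loss_cap * equity:
        return None
    stop_distance = abs(entry_price - stop_price)
    if stop_distance == 0:
        return None   # no stop distance means undefined risk: refuse
    # Soft step: human or model judgement on whether to take the setup.
    if not decide():
        return None
    # Size comes from the rule, not from the decision-maker.
    units = equity * limits.risk_per_trade / stop_distance
    side = "buy" if entry_price > stop_price else "sell"
    return Order(side, units, stop_price)


limits = RiskLimits(risk_per_trade=0.005, daily_loss_cap=0.02)
print(gated_entry(lambda: True, 10_000.0, 0.0, 1.2500, 1.2450, limits))
print(gated_entry(lambda: True, 10_000.0, -200.0, 1.2500, 1.2450, limits))
```

The second call returns None even though the decision-maker said yes, because the daily-loss cap has already been reached.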

Fail-safes & operational hardening

In trading, bugs rarely throw a clean exception — they lose money silently. Engineer the system to fail loudly and fail flat.

  • Kill switch: a global halt triggered by a daily-loss limit, an error-rate spike, or a data/broker disconnect — flatten and stop, don't "keep trying".
  • Circuit breakers: max-daily-loss and max-drawdown limits that automatically halt new entries.
  • Idempotency: idempotent order keys so a retry after a timeout never double-submits a position.
  • State reconciliation: continuously verify the system's view of open positions against the broker's truth; alert on any mismatch.
  • Connectivity & heartbeat: detect disconnects fast and behave safely — never leave orphaned orders or unmanaged positions.
  • Deterministic runtime: freeze inputs at decision time (point-in-time data, cached values), so the same bar always produces the same action — no surprise recomputation.
  • Full audit log: every signal, order, fill, and rejection recorded — both for debugging and for honest performance attribution.

Crash safe, not crash trying: A system that halts cleanly and flat on an unexpected condition is far safer than one that gamely keeps placing orders into a state it doesn't understand. When in doubt, the correct default for a trading system is to stop and alert a human; flat is a position you can always recover from.
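The idempotency item is the one most often gotten wrong, so here is a sketch of the idea. `FakeBroker` is a hypothetical stand-in for a venue that de-duplicates on a client-supplied order id; whether and how a real broker API offers this is something to confirm in its documentation. The key format and retry count are also assumptions.

```python
import hashlib
from typing import Dict, Tuple


def order_key(strategy: str, symbol: str, bar_time: str, side: str) -> str:
    """Deterministic key: the same signal on the same bar always maps to the same id."""
    raw = "|".join((strategy, symbol, bar_time, side))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


class FakeBroker:
    """Hypothetical venue that accepts each client order id at most once."""

    def __init__(self, lose_first_ack: bool = True) -> None:
        self.orders: Dict[str, Tuple[str, str, float]] = {}
        self._lose_next_ack = lose_first_ack

    def place(self, client_order_id: str, symbol: str, side: str, units: float) -> str:
        # A repeated id is a no-op: the original order stands.
        self.orders.setdefault(client_order_id, (symbol, side, units))
        if self._lose_next_ack:
            self._lose_next_ack = False
            raise TimeoutError("order accepted but acknowledgement lost")
        return client_order_id


def submit_with_retry(broker: FakeBroker, key: str, symbol: str, side: str,
                      units: float, attempts: int = 3) -> str:
    """Retry with the SAME key so a lost acknowledgement cannot double the position."""
    for _ in range(attempts):
        try:
            return broker.place(key, symbol, side, units)
        except TimeoutError:
            continue
    raise RuntimeError("order state unknown after retries: halt and alert a human")


broker = FakeBroker()
key = order_key("trend-pullback", "GBPUSD", "2024-01-02T10:00Z", "buy")
submit_with_retry(broker, key, "GBPUSD", "buy", 10_000)
print(len(broker.orders), "order on the books after a timeout and a retry")
```

Generating a fresh random id on each retry would defeat the purpose; the key has to be derived from the decision, not from the attempt.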

14 Monitoring & Edge Decay

A live system is not "set and forget". Edges decay, regimes shift, and markets adapt to the inefficiencies you're exploiting. The job after launch is to know — quantitatively — whether the system is still the one you validated, and to have decided in advance what you'll do when it isn't.

Track live against backtest, continuously

Maintain rolling live statistics — expectancy, profit factor, win rate, average R — and compare them to the distribution your backtest and Monte Carlo produced (Section 11). Inside those bounds is normal variance; persistently outside them is a signal worth investigating.

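[Figure: control chart of rolling live expectancy against trades, with the backtest mean, upper and lower limits, and a breach of the lower limit flagged for investigation.]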
Treat the equity engine like a process under control. While rolling expectancy wanders within the limits (set from your Monte Carlo / historical drawdown distribution), it's noise. A sustained move below the lower limit is a trigger to investigate — not necessarily to stop, but to find out why.
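A control band of this kind needs two pieces: a rolling statistic and a rule for what counts as "sustained". The sketch below uses hypothetical names and invented trades; the 20-trade window, the −0.3R lower limit, and the five-window persistence rule are placeholders for values you would derive from your own Monte Carlo distribution.

```python
from statistics import mean
from typing import List, Sequence


def rolling_expectancy(r_multiples: Sequence[float], window: int) -> List[float]:
    """Mean R over each trailing window of trades."""
    if window <= 0 or window > len(r_multiples):
        raise ValueError("window must be between 1 and the number of trades")
    return [mean(r_multiples[i - window:i])
            for i in range(window, len(r_multiples) + 1)]


def sustained_breach(rolling: Sequence[float], lower_limit: float,
                     min_consecutive: int) -> bool:
    """True once the rolling value has sat below the limit for enough windows in a row."""
    run = 0
    for value in rolling:
        run = run + 1 if value < lower_limit else 0
        if run >= min_consecutive:
            return True
    return False


healthy = [2.0, -1.0, -1.0, 2.0, -1.0] * 8   # invented: steady +0.2R per trade
decayed = healthy + [-1.0] * 20              # the same system, then 20 straight losses

for label, trades in (("healthy", healthy), ("decayed", decayed)):
    rolling = rolling_expectancy(trades, window=20)
    print(label, "-> investigate:", sustained_breach(rolling, -0.3, 5))
```

Requiring several consecutive breaches is what separates a trigger to investigate from a reaction to one bad window.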

Variance or decay? — the hard distinction

The central difficulty of monitoring is telling a normal losing streak (variance around a still-positive mean, which Section 03 proved is inevitable) from genuine edge decay (the inefficiency is gone). Over-react and you abandon good systems in normal drawdowns; under-react and you feed a dead one. The only defence is pre-defined, quantitative thresholds set while calm.

| Cause of decay | What happened | Tell |
| --- | --- | --- |
| Regime change | Your archetype's regime left (trend turned to chop) | Underperformance concentrated in one regime; filter no longer firing |
| Crowding | Others found and arbitraged the same edge | Slow, persistent erosion of expectancy across regimes |
| Structural change | Market microstructure, spreads, or participants shifted | Costs/slippage drift up; fills worsen vs the model |
| Parameter drift | The world moved; your fixed parameters didn't | Walk-forward would now pick very different values |

Pre-committed pause and retire rules

Decide these in calm and write them down — because in a live drawdown your judgement is compromised by the very situation it's judging.

  • Pause when a hard limit is breached: max drawdown hit, or daily-loss cap reached. Stop new entries, reassess.
  • Review when live expectancy sits outside its control band for a pre-set number of trades — investigate before deciding.
  • Retire when the thesis is invalidated — the market behaviour the system bets on demonstrably no longer holds. A dead edge doesn't deserve loyalty.

Two symmetric failure modes: Monitoring exists to defend against both abandoning a good system inside a statistically normal drawdown and clinging to a dead system hoping it returns. In the moment you will be biased toward one or the other depending on your recent results, which is exactly why the thresholds must be set in advance, quantitatively, and obeyed.

15 Psychology & Adherence

The system is the easy part. The hard part is the human operating it. The most common reason a profitable system loses money is not a flaw in the rules — it is a failure to follow them. Discipline is not a personality trait you either have or lack; it is infrastructure you build so that in-the-moment judgement can't quietly destroy the edge.

How a profitable system becomes a losing one

Every item below converts a positive-expectancy system into a negative one — without changing a single rule:

  • Overriding a valid signal because it "feels wrong" — usually right when the next winner arrives.
  • Skipping trades after a losing streak — abandoning the system at the bottom, missing the recovery.
  • Sizing up after wins — the most dangerous one; the biggest losses tend to follow the biggest, most confident bets.
  • Revenge trading after a loss — taking setups outside the system to "make it back".
  • Moving stops to avoid being wrong — converting a defined 1R loss into an undefined disaster.

The biases doing the damage

| Bias | Mechanism | Damage to the system |
| --- | --- | --- |
| Loss aversion | Losses hurt ~2× as much as equivalent gains feel good | Cutting winners early, holding losers past the stop |
| Recency bias | Over-weighting the last few trades | Abandoning the system after a normal losing streak |
| Outcome bias | Judging a decision by its single result | Distrusting a good system after an unlucky loss |
| Gambler's fallacy | Believing you're "due" for a win | Sizing up to recover, breaking risk rules |
| Post-win overconfidence | Recent success inflates perceived skill | Sizing up into the next, larger loss |
| Confirmation bias | Seeking evidence for what you want to do | Rationalising past the regime filter's "no" |

Systematic does not mean emotionless

Even a fully automated system leaves you one decision: whether to keep it running through a drawdown. That single choice — made under maximum emotional pressure — is where most automated edges die too. The answer is not to "be more disciplined"; it is to remove the moment of weakness from the loop wherever you can.

The journal: making adherence measurable

Log every trade — entry, exit, size, R-multiple — and whether you followed the system, and if not, why. Then separate system P&L (what the rules would have made) from deviation P&L (the cost of your overrides). This turns discipline from a vague aspiration into a number you can confront.

The feedback loop: Most traders discover, on doing this honestly, that their overrides have a strongly negative expectancy; they would have made more money asleep. Quantifying the cost of discretion is the most powerful argument for sticking to the system, and it is exactly the kind of continuous, evidence-based coaching the Reign Edge journal is designed to surface.
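The system-versus-deviation split is simple bookkeeping once each journal row records both what the rules would have produced and what was actually booked. The field and function names below are hypothetical and the four entries are invented.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class JournalEntry:
    system_r: float   # what the rules alone would have produced on this signal
    actual_r: float   # what the account actually booked
    followed: bool    # True if the trade was executed exactly per the rules
    note: str = ""    # why you deviated, if you did


def adherence_report(entries: Sequence[JournalEntry]) -> Dict[str, float]:
    """Separate system P&L from the cost (or benefit) of overrides, all in R."""
    system_total = sum(e.system_r for e in entries)
    actual_total = sum(e.actual_r for e in entries)
    overrides = sum(1 for e in entries if not e.followed)
    return {
        "system_r": system_total,
        "actual_r": actual_total,
        "deviation_r": actual_total - system_total,
        "adherence_rate": 1.0 - overrides / len(entries),
    }


journal = [
    JournalEntry(+1.5, +1.5, True),
    JournalEntry(-1.0, -1.0, True),
    JournalEntry(+2.0, 0.0, False, "skipped the signal after two losses"),
    JournalEntry(-1.0, -2.5, False, "moved the stop"),
]
for name, value in adherence_report(journal).items():
    print(f"{name}: {value:+.2f}")
```

In this invented sample the rules made money and the overrides turned the account negative, which is the pattern the journal exists to expose.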

Building discipline infrastructure

  • Pre-commit the rules in calm — including pause/retire thresholds (Section 14) — when you are not in a trade.
  • Automate what you can — automation removes the moment of weakness entirely; a rule a machine executes can't be overridden in a panic.
  • Size so you can sleep (Section 04) — most overrides are driven by positions that are simply too large.
  • Use a pre-trade checklist — force every entry through the same gate, every time.

Separate the decision from the outcome: A good decision can lose; a bad decision can win. Judging your process by individual outcomes trains exactly the wrong reflexes: abandoning sound systems after unlucky losses, trusting reckless ones after lucky wins. Grade the process, not the trade. Over a large enough sample, a sound process and faithful execution are the only things you actually control.

16 Worked Example & Pre-Launch Checklist

Every preceding section, applied once, end to end, to a single concrete system. This is the full lifecycle — from a one-sentence hypothesis to a monitored live system with pre-committed exit rules — run as one continuous workflow.

The lifecycle loop

[Figure: the lifecycle loop. 01 Hypothesise → 02 Specify → 03 Backtest → 04 Validate → 05 Forward test → 06 Go live → 07 Monitor → 08 Improve / retire, then iterate. The loop never ends; only the system retires.]
The lifecycle. Sections 06→02→08→11→12→13→14 mapped onto a loop: a system is continuously re-validated and improved, and the loop only stops when the edge is retired. "Set and forget" is not on this diagram.

End to end: the GBPUSD trend-pullback

Taking the system specified in Section 02 through every stage:

  1. Hypothesise. "Liquid FX trends persist intraday; after a shallow pullback to the mean within an established uptrend, continuation is more likely than reversal." One sentence, falsifiable.
  2. Specify. The full eight-component spec from Section 02 — universe (GBPUSD 1H), regime filter (200/50-EMA + news blackout), setup (pullback to 20-EMA), entry (buy-stop above the reclaim bar), stop (1.5×ATR = 1R), size (0.5% fixed-fractional), exit (half at +1R, trail remainder, 24-bar time stop), manage (breakeven at +1R). The stranger test passes.
  3. Data. Several years of 1H bars from a single feed that will also be the live broker, spanning trending, ranging, and at least one volatile crisis period; tick data reserved to resolve intrabar stop-vs-target order (Section 07).
  4. Backtest. Fast vectorised triage to confirm the idea has a pulse, then an event-driven re-test mirroring the live engine, with pessimistic spread + slippage + swap (Section 08).
  5. Validate. In-sample / out-of-sample split; rolling walk-forward; Monte Carlo on the trade sequence; ±20% parameter sweep for a plateau; regime slicing (Section 11).
  6. Forward & micro-live. Run forward on real-time data, then micro-size live, tracking live expectancy against the backtest distribution (Section 12).
  7. Scale & monitor. Ramp size only while live tracks backtest; maintain a control band; obey the pre-committed pause/retire rules (Sections 12 & 14).

Illustrative results only, not a real backtest: The figures below are hypothetical, shown only to demonstrate how a validation report reads. They are not results from any actual system.

| Metric | In-sample | Out-of-sample | Read |
| --- | --- | --- | --- |
| Trades | 420 | 180 | Ample sample both windows |
| Expectancy | +0.28R | +0.24R | Holds OOS; encouraging |
| Win rate / payoff | 42% / 2.6 | 40% / 2.5 | Low win rate, high payoff; consistent with a trend system |
| Profit factor | 1.55 | 1.48 | Solid, not suspiciously high |
| Max drawdown | 14% | 16% | Survivable; check Monte Carlo tail |
| Walk-forward efficiency | 0.86 | | OOS keeps most of IS edge; robust, not a cliff |

The master pre-launch checklist

Before a single real dollar
  • Edge: positive expectancy after pessimistic costs, 100+ trades, multiple regimes (§03, §08).
  • Specification: total, contradiction-free, passes the stranger test; few justified parameters (§06).
  • Data: single feed, parity backtest↔live, gaps and timezones handled (§07).
  • Validation: OOS holds, walk-forward efficiency healthy, parameter plateau, Monte Carlo worst-case survivable (§11).
  • Sizing: risk-per-trade and portfolio heat set backwards from a survivable drawdown; well under Kelly (§04).
  • Operations: kill switch, daily-loss cap, idempotent orders, reconciliation, logging — coded, not intended (§13).
  • Discipline: pre-committed pause/retire thresholds; a journal that separates system P&L from deviation P&L (§14, §15).

Failure-mode map

| How systems blow up | Prevented by |
| --- | --- |
| Trading with no real edge | §03 expectancy · §08 honest backtest · §11 validation |
| Sizing too large → ruin / abandonment | §04 sizing & risk of ruin |
| Overfitting a gorgeous backtest | §09 parsimony · §11 walk-forward & Monte Carlo |
| Look-ahead / data bias inflating results | §07 data hygiene · §08 event-driven loop |
| Costs quietly eating the edge | §08 cost model · §13 execution |
| No regime awareness (right system, wrong market) | §02 filter · §05 archetypes · §14 monitoring |
| Non-deterministic logic in the risk path | §13 decision/risk split & fail-safes |
| Abandoning a good system / clinging to a dead one | §14 pre-committed rules · §15 adherence |

The whole handbook in one sentence: Find a real, specified edge; prove it honestly on history; size it so a normal losing streak can't kill you; operate it deterministically; and have decided, in advance and in writing, exactly when you'll pause, scale, or retire it, because every other decision will be made by a worse version of you, mid-drawdown.

17 Glossary

The core vocabulary of system development, in plain terms. Each definition is the working sense used throughout this handbook.

Edge

A statistical advantage that produces positive expectancy after costs over a large sample.

Expectancy

Average profit/loss per trade: (win% × avg win) − (loss% × avg loss). The master profitability number.

R-multiple

Profit/loss expressed in units of initial risk. A trade making twice its risk is +2R; a full stop-out is −1R.

Payoff ratio

Average winning trade ÷ average losing trade (reward:risk). Couples with win rate to determine the edge.

Profit factor

Gross profit ÷ gross loss. Above 1.0 is profitable; 1.3–1.6 is solid.

Win rate

Proportion of trades that profit. Meaningless without the payoff ratio.

Drawdown

A decline in equity from a prior peak, measured in percent or currency.

Max drawdown

The largest peak-to-trough equity decline over a period; its duration often matters more than its depth.

CAGR

Compound annual growth rate — the smoothed annualised return. Ignores risk in isolation.

Sharpe ratio

Excess return ÷ total volatility. Penalises all volatility and assumes near-normal returns.

Sortino ratio

Excess return ÷ downside volatility — fairer to systems with asymmetric (upside-skewed) returns.

Calmar / MAR

CAGR ÷ |max drawdown| — return per unit of worst pain.

Position sizing

Translating a risk budget and stop distance into a trade quantity. The survival lever.

Fixed fractional

Risking a constant percentage of equity per trade. The sensible default.

Kelly criterion

The growth-optimal bet fraction. Used only fractionally and as a ceiling, never raw.

Risk of ruin

The probability of losing enough capital to be unable or unwilling to continue.

Portfolio heat

Total open risk across all positions at once; correlated positions count as one.

Regime

The prevailing market behaviour (trending, ranging, high/low volatility). Every edge needs a specific one.

Backtest

Simulating a system on historical data to estimate its expectancy and conditions of success.

In-sample / out-of-sample

Data used for development (IS) versus data reserved for honest, one-shot testing (OOS).

Walk-forward analysis

Repeatedly optimising on one window and testing on the next unseen window, rolling forward.

Monte Carlo

Reshuffling/resampling trade results many times to reveal the distribution of outcomes and drawdowns.

Overfitting

Fitting the noise in a historical sample rather than the signal; looks great in backtest, fails live.

Look-ahead bias

Using information not available at decision time. The deadliest silent inflator of results.

Survivorship bias

Testing only instruments that still exist, ignoring those that failed and were removed.

Slippage

The difference between the expected fill price and the actual one; worst on stops in fast markets.

Spread

The bid/ask gap — a cost paid on entry and embedded in exit. Widens in thin liquidity and around news.

Swap / rollover

Interest earned or paid for holding an FX position overnight, based on the rate differential.

MAE / MFE

Maximum Adverse / Favourable Excursion — how far a trade ran against / for you before closing.

Kill switch

A global halt that flattens positions and stops trading on a defined dangerous condition.

Idempotency

Designing order submission so a retry never duplicates a position — critical for safe automation.

Walk-forward efficiency

Out-of-sample performance ÷ in-sample performance; a measure of how well an edge generalises.

Keep reading

The Technical Analysis Handbook covers the entry-and-structure layer this handbook assumes — candlesticks, market structure, indicators, patterns, confluence, and order flow — each with exact rules.
