Trading Systems Handbook

01 What a Trading System Is

A system is not an indicator, a signal, or a feeling. It is a complete set of rules that answers every decision a trade demands — and answers them the same way every time. This section defines the object the rest of the handbook builds, tests, and operates.

The definition that matters

A trading system is a repeatable, fully specified procedure that converts market data into trading decisions with no decision left to in-the-moment judgement. Given the same inputs, it produces the same output — whether executed by you, a colleague, or a machine. That property is what separates a system from a "style".

The practical test: could a stranger trade your account identically from your written rules, without phoning you? If the honest answer is no, you have intuitions, not a system — and intuitions cannot be backtested, sized, audited, or improved methodically.

Core idea A trading system is a falsifiable hypothesis about market behaviour, expressed precisely enough to test on history and execute without ambiguity. Everything downstream — backtesting, sizing, risk control — depends on this precision existing first.

Discretionary, systematic, and hybrid

These are points on a spectrum of how much judgement enters the loop, not a moral hierarchy. Each can be profitable; each fails differently.

Dimension	Discretionary	Systematic	Hybrid
Decision source	Trader judgement in the moment	Pre-defined rules, mechanically applied	Rules generate candidates; trader vetoes/confirms
Backtestable?	No — judgement isn't reproducible	Yes — fully	Partially — the rule layer only
Scales with capital/markets?	Poorly (operator is the bottleneck)	Well	Limited by the human step
Primary failure mode	Emotion, inconsistency, fatigue	Regime change the rules didn't anticipate	Selective override that quietly destroys the edge
Best for	Reading context, news, anomalies	Repeatable, measurable edges	Edges that need human context but rule discipline

Many successful discretionary traders are, in effect, undocumented systematic traders: they apply a consistent internal procedure they have never written down. The work of building a system is largely the work of extracting that procedure into explicit rules, which is precisely where contradictions and gaps surface.

Why systematise an edge you already trade

Measurability. You cannot improve what you cannot measure. A specified system has an expectancy, a drawdown, a sample size — numbers you can act on.
Falsifiability. Rules can be proven wrong on history before they cost you live money. A feeling cannot.
Consistency. The system trades the same on your best day and your worst. Most blow-ups are not bad rules — they are good rules abandoned under stress.
Compounding of knowledge. Every trade becomes a labelled data point, not a vague memory. The edge sharpens with sample size.
Leverage. A specified system can be automated, monitored, and scaled. An intuition lives and dies with your attention.

What a system is not

Disambiguation

A system is not a signal service — signals are outputs; a system is the full procedure that also defines size, risk, and exit.
A system is not a guarantee — a positive expectancy is a long-run statistical claim, not a promise about the next trade or the next month.
A system is not an edge — it is the container for one. A perfectly specified system with no edge loses money with great consistency.

The two things every viable system needs

Strip everything else away and a tradeable system rests on two independent pillars. Lose either and the account dies — just on different timelines.

1 · A real edge

Positive expectancy after costs, demonstrated over a sample large enough to distinguish skill from luck. Without it, better risk control only slows the bleed.

2 · Survival

Risk and position-sizing rules that guarantee you are still solvent after the inevitable losing streak. Without it, a real edge is wiped out before it can pay you.

The rest of this handbook is the engineering of those two pillars: Sections 02–06 build the edge and its rules; Sections 03–04 and 10–14 build and protect survival.

02 Anatomy of a Complete System

A complete system answers eight questions, in order. Skip one and you have left a decision to chance. The order is not cosmetic — each component constrains the next, and getting the sequence wrong (sizing before stops, entry before regime) is one of the most common structural errors.

The eight components

The pipeline. The first row decides whether and where to act; the second decides how much and for how long. Risk (05) is defined before size (06) because size is a function of the stop, never the reverse.

#	Component	The question it answers	Concrete forms
01	Universe	What instruments are we even allowed to trade?	A fixed list (e.g. major FX pairs), or a screen (liquidity, spread, ATR floor)
02	Regime filter	Is the system permitted to act right now?	Trend filter (price vs 200-EMA), volatility band, session window, news blackout
03	Setup	What recurring condition defines an opportunity?	Pullback to a level, range break, oversold extreme, momentum cross
04	Entry	What exact event triggers the order, and how do we get in?	Trigger candle close + market/limit/stop order at a defined price
05	Initial stop	Where is the idea proven wrong? (This defines 1R.)	Beyond structure, k × ATR, or a fixed distance
06	Position size	How much, given the stop and the risk budget?	Fixed-fractional %, volatility-targeted, fractional Kelly
07	Exit logic	How and when do we take profit or cut?	Fixed R-target, trailing stop, time stop, opposite signal
08	Manage	What happens to the trade while it is open?	Move to breakeven at +1R, scale out in tranches, pyramid

The sequencing rule Risk before size. The single most common structural mistake is choosing a position size first ("I'll do 1 lot") and discovering your risk afterwards. Professionals invert it: define the stop, define the dollar risk budget, then solve for the size that makes those two consistent. Size is an output, never an input.

A worked specification

The same idea, written first as a discretionary "style" and then as a system, makes the difference concrete.

As a style (untradeable)

"I buy GBPUSD pullbacks in an uptrend when it looks like the dip is done, and I take profit into resistance."

As a system (tradeable)

Every word below is checkable and reproducible — and therefore backtestable.

GBPUSD trend-pullback — full specification

Universe: GBPUSD only, 1-hour bars.
Regime filter: close > 200-EMA and 50-EMA > 200-EMA (uptrend confirmed). No entries from 30 minutes before to 30 minutes after a high-impact GBP or USD news release.
Setup: price pulls back and the low touches the 20-EMA while the regime filter holds.
Entry: buy-stop 2 pips above the high of the first bar that closes back above the 20-EMA.
Initial stop: 1.5 × ATR(14) below entry. This distance defines 1R.
Position size: risk 0.5% of equity; size = (equity × 0.5%) ÷ (stop distance in pips × pip value).
Exit: take half at +1R, trail the remainder under each new swing low; time-stop the trade if +1R is not reached within 24 bars.
Manage: move stop to breakeven once +1R is filled.

Notice that the system version exposes decisions the style hid: how far is a valid pullback, which EMA, what confirms "the dip is done", where exactly is the stop. Those hidden decisions are where discretionary edges silently drift — and where a system makes the drift impossible.
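To show how checkable the regime-filter clause of the specification is, here is a minimal Python sketch. The names `ema` and `uptrend_regime` are illustrative, and seeding the EMA with the first close is an assumption; charting platforms seed differently, so early values may not match yours.

```python
def ema(values: list[float], period: int) -> list[float]:
    """Exponential moving average with alpha = 2 / (period + 1).

    Assumption: the series is seeded with its first value.
    """
    if not values:
        return []
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for price in values[1:]:
        out.append(alpha * price + (1.0 - alpha) * out[-1])
    return out


def uptrend_regime(closes: list[float], fast: int = 50, slow: int = 200) -> bool:
    """Regime clause: last close > slow EMA and fast EMA > slow EMA."""
    if len(closes) < slow:
        return False  # not enough history: the filter stays closed
    fast_ema = ema(closes, fast)[-1]
    slow_ema = ema(closes, slow)[-1]
    return closes[-1] > slow_ema and fast_ema > slow_ema


# Hypothetical closed-bar closes: a steady rise, then the same path reversed.
rising = [1.2000 + 0.0001 * i for i in range(300)]
print(uptrend_regime(rising))        # rising series: filter open
print(uptrend_regime(rising[::-1]))  # falling series: filter closed
```

Returning False when history is too short is itself a specification decision: the sketch chooses "no trade" for a state the written rules did not cover.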

03 Edge & Expectancy

A system makes money for exactly one reason: positive expectancy realised over a large enough sample. Not a high win rate, not a good feeling, not a clever indicator. This section is the arithmetic of edge — and the traps that arithmetic exposes.

Expectancy: the master number

Expectancy is the average profit or loss per trade you can expect over many trades. It weighs how often you win against how much you win and lose: the win rate times the average win, minus the loss rate times the average loss.

E = (W% × avgWin) − (L% × avgLoss)
# W% = win rate, L% = loss rate = 1 − W%
# avgWin / avgLoss in currency or pips

To compare systems across instruments and account sizes, normalise everything to R — the initial risk per trade. One R is the distance from entry to your initial stop. A trade that makes twice its risk is +2R; a trade stopped out is −1R.

R-multiple = (exit price − entry price) ÷ (entry price − initial stop) # for a long
Expectancy[R] = (W% × avgWinR) − (L% × avgLossR)
# > 0 means the system is profitable per unit of risk, before frequency
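The two formulas above translate directly into code. This is a minimal sketch for long trades; the function names and the sample trade list are hypothetical, and a trade that exits exactly at entry is counted as a loss of size zero, which is an assumption.

```python
from statistics import mean


def r_multiple(entry: float, stop: float, exit_price: float) -> float:
    """R-multiple of a long trade: profit divided by the initial risk."""
    risk = entry - stop
    if risk <= 0:
        raise ValueError("for a long, the initial stop must be below entry")
    return (exit_price - entry) / risk


def expectancy_r(r_multiples: list[float]) -> float:
    """Expectancy in R: (W% x avgWinR) - (L% x avgLossR)."""
    if not r_multiples:
        raise ValueError("need at least one closed trade")
    wins = [r for r in r_multiples if r > 0]
    losses = [-r for r in r_multiples if r <= 0]  # stored as positive sizes
    n = len(r_multiples)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return (len(wins) / n) * avg_win - (len(losses) / n) * avg_loss


# Hypothetical results of five closed trades, already expressed in R.
trades = [2.0, -1.0, -1.0, 3.0, -1.0]
print(round(expectancy_r(trades), 4))
print(round(r_multiple(entry=1.2500, stop=1.2450, exit_price=1.2600), 4))
```

Because every trade is expressed in R, the result equals the plain average of the R-multiples; the win/loss decomposition only makes the two drivers visible.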

Why R changes everything Expressing results in R makes a system account- and instrument-agnostic. "I made 40R last year" is portable; "I made $4,000" is not. R is also the unit your position sizing (Section 04) is built on — the two systems speak the same language.

Win rate is a vanity metric

Win rate alone tells you nothing about profitability, because it ignores the size of wins versus losses. The payoff ratio b = avgWin ÷ avgLoss couples them. The win rate you need merely to break even falls as your payoff ratio rises:

Break-even win rate = 1 ÷ (1 + b) # b = avgWin ÷ avgLoss (the reward:risk ratio)

The win rate you must clear just to break even. At 2:1 you need only 33% winners; at 3:1, 25%. This is why trend systems survive on low win rates — their wins dwarf their losses.

Payoff ratio (b)	Break-even win rate	Win rate for healthy edge	Typical archetype
0.5 : 1	66.7%	> 75%	Mean reversion / scalping
1 : 1	50.0%	> 55%	Range / oscillator systems
2 : 1	33.3%	> 40%	Swing / breakout
3 : 1	25.0%	> 33%	Trend-pullback
5 : 1	16.7%	> 25%	Trend-following / momentum
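The break-even column can be reproduced from the formula with a few lines of Python. The function name is illustrative.

```python
def break_even_win_rate(payoff_ratio: float) -> float:
    """Win rate at which expectancy is exactly zero: 1 / (1 + b)."""
    if payoff_ratio <= 0:
        raise ValueError("payoff ratio must be positive")
    return 1.0 / (1.0 + payoff_ratio)


for b in (0.5, 1, 2, 3, 5):
    print(f"{b} : 1 -> break-even win rate {break_even_win_rate(b):.1%}")
```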

The win-rate trap A 90%-win-rate system that risks 10 units to make 1 has an expectancy of (0.9 × 1) − (0.1 × 10) = −0.1 units per trade, or −0.01R, since 1R here is the 10 units at risk. It feels wonderful nine times out of ten and quietly bankrupts you on the tenth. High win rates and tight stops sell courses; expectancy pays bills.

Profit factor — a second lens

Profit factor is gross profit divided by gross loss. It is closely related to expectancy but reads more intuitively as "how many dollars I make per dollar I lose".

Profit factor = gross profit ÷ gross loss = (W% × avgWin) ÷ (L% × avgLoss)
# 1.0 = break-even · 1.3–1.6 = solid · > 2.0 = excellent (and worth double-checking for look-ahead bias)
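A minimal sketch of the same calculation from a list of per-trade results. The function name and sample data are hypothetical; returning infinity when there are no losing trades is a convention chosen here, not a standard.

```python
import math


def profit_factor(pnl: list[float]) -> float:
    """Gross profit divided by gross loss, from per-trade P&L in any one unit."""
    gross_profit = sum(x for x in pnl if x > 0)
    gross_loss = -sum(x for x in pnl if x < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, so flag it rather than divide.
        return math.inf if gross_profit > 0 else math.nan
    return gross_profit / gross_loss


# Hypothetical per-trade results in R.
print(round(profit_factor([2.0, -1.0, -1.0, 3.0, -1.0]), 3))
```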

Frequency: expectancy is per trade, growth is per year

Per-trade expectancy alone does not grow an account — expectancy multiplied by trade frequency does. A smaller edge taken often can dominate a larger edge taken rarely.

System A

+0.2R per trade × 200 trades/year = +40R/year. Low edge, high frequency.

System B

+0.5R per trade × 30 trades/year = +15R/year. High edge, low frequency.

System A compounds faster and reaches statistical significance sooner — but only if its costs per trade (Section 13) do not eat the thinner edge. Frequency amplifies both your edge and your friction.

Expectancy is a claim about samples, not trades

A positive expectancy is a statement about the long-run average, and every long-run average hides brutal short-run variance. A robust system will produce long losing streaks — they are a feature of randomness around a positive mean, not evidence the edge is gone.

Expected longest losing streak in N trades ≈ ln(N) ÷ ln(1 ÷ L%)
# e.g. L% = 60%, N = 500 → ln(500)/ln(1.667) ≈ 12 consecutive losses are normal
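The approximation is easy to evaluate for your own loss rate and sample size. A minimal sketch, with an illustrative function name:

```python
import math


def expected_longest_losing_streak(n_trades: int, loss_rate: float) -> float:
    """Approximate longest run of losses in n_trades: ln(N) / ln(1 / L%)."""
    if n_trades < 1 or not 0.0 < loss_rate < 1.0:
        raise ValueError("need n_trades >= 1 and 0 < loss_rate < 1")
    return math.log(n_trades) / math.log(1.0 / loss_rate)


for loss_rate in (0.4, 0.5, 0.6, 0.7):
    streak = expected_longest_losing_streak(500, loss_rate)
    print(f"L% = {loss_rate:.0%}, N = 500 -> about {streak:.1f} losses in a row")
```

The figure is an order-of-magnitude guide, not a bound: actual streaks in a given sample can run longer.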

Implication for sample size You cannot judge a system on 20 trades — the variance swamps the signal. Treat anything under ~100 closed trades as anecdote, ~100–300 as suggestive, and 300+ as the floor for taking an expectancy estimate seriously. The losing streaks that this math guarantees are exactly what your position sizing in Section 04 must survive.

04 Position Sizing & Risk of Ruin

Edge tells you whether to play; sizing tells you whether you survive long enough to collect. Most accounts are not killed by bad systems — they are killed by good systems sized too aggressively to outlast a normal losing streak. This is the survival pillar, and it is pure arithmetic.

The job of position sizing

Sizing has one job: convert a risk budget and a stop distance into a quantity. Everything flows from the stop you already defined in Section 02 — which is why stops come first.

Risk per trade ($) = Equity × risk%
Position size = Risk per trade ($) ÷ (stop distance × value per unit move)
# FX: lots = (Equity × risk%) ÷ (stopPips × pipValuePerLot)

Worked example — FX

Equity $10,000 · risk 0.5% · GBPUSD stop 25 pips · pip value ≈ $10/standard lot.

Risk per trade = 10,000 × 0.5% = $50
Lots = 50 ÷ (25 × 10) = 0.20 lots (20,000 units)

Widen the stop to 50 pips and the size halves to 0.10 lots — same dollar risk. The market decides the stop; your budget decides the dollars; size is whatever reconciles them.
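The worked example can be sketched as a single function. The name `fx_lots` and its parameters are illustrative, and the $10 pip value is the approximation used in the example, not a constant for every pair or account currency.

```python
def fx_lots(equity: float, risk_pct: float, stop_pips: float,
            pip_value_per_lot: float) -> float:
    """Lots such that a full stop-out loses equity * risk_pct."""
    if stop_pips <= 0 or pip_value_per_lot <= 0:
        raise ValueError("stop distance and pip value must be positive")
    risk_dollars = equity * risk_pct
    return risk_dollars / (stop_pips * pip_value_per_lot)


# Same account and budget, two different stop distances.
print(round(fx_lots(10_000, 0.005, stop_pips=25, pip_value_per_lot=10), 2))
print(round(fx_lots(10_000, 0.005, stop_pips=50, pip_value_per_lot=10), 2))
```

Size appears only on the left-hand side: it is computed from the stop and the budget and never passed in.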

Sizing methods, ranked

Method	Idea	Strength	Weakness	Verdict
Fixed lot	Same size every trade	Trivial	Ignores stop distance and account size; risk varies wildly per trade	Avoid
Fixed fractional	Risk a constant % of equity per trade	Auto-scales up in wins, down in losses; bounds drawdown	Slow recovery after deep drawdown	Default. Start here.
Volatility targeting	Size so each trade contributes equal volatility (size ∝ 1/ATR)	Normalises risk across instruments and regimes	Needs reliable volatility estimate; reacts to vol spikes	Excellent for multi-instrument
Fixed ratio	Increase size after a fixed profit increment (Δ)	Aggressive growth for small accounts	Risk grows non-linearly; punishing in drawdown	Niche
Kelly / fractional Kelly	Bet the growth-optimal fraction (or a fraction of it)	Mathematically maximises long-run growth	Assumes you know edge exactly; full Kelly is brutally volatile	Use a fraction, as a ceiling — never raw

The Kelly criterion — and why nobody trades full Kelly

Kelly gives the fraction of capital that maximises long-run geometric growth.

f* = p − q ÷ b = (p(b + 1) − 1) ÷ b
# p = win prob, q = 1 − p, b = payoff ratio (avgWin ÷ avgLoss)
# example: p = 0.40, b = 3 → f* = (0.40 × 4 − 1) ÷ 3 = 0.20 → 20% of equity per trade

Twenty percent per trade is correct in theory and insane in practice. Full Kelly produces gut-wrenching drawdowns (a 50%+ drawdown is routine), and it assumes you know p and b exactly — you do not; you estimated them from a finite, noisy backtest. Over-estimate the edge and Kelly over-bets you straight into ruin.

Fractional Kelly Serious operators trade a fraction of Kelly — typically a quarter or less — and treat the Kelly number as an upper bound they stay well under. Half-Kelly keeps ~75% of the growth at ~half the volatility. For most discretionary FX systems, a flat 0.5–1% fixed-fractional risk already sits far below Kelly, which is exactly why it survives.
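A minimal sketch of the Kelly formula and a fractional version. The function names and the default quarter-Kelly multiplier are assumptions for illustration; flooring the result at zero encodes "do not bet on a negative edge".

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly fraction f* = p - q / b, with q = 1 - p."""
    if not 0.0 <= p <= 1.0 or b <= 0:
        raise ValueError("need 0 <= p <= 1 and b > 0")
    return p - (1.0 - p) / b


def fractional_kelly(p: float, b: float, fraction: float = 0.25) -> float:
    """A fraction of Kelly, floored at zero when the edge is negative."""
    return max(0.0, fraction * kelly_fraction(p, b))


print(round(kelly_fraction(0.40, 3), 4))    # the example from the text
print(round(fractional_kelly(0.40, 3), 4))  # quarter-Kelly ceiling
print(fractional_kelly(0.30, 1))            # negative edge: no position
```

Since p and b come from a finite backtest, it is worth re-running this with pessimistic estimates to see how far the ceiling moves.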

Risk of ruin

Risk of ruin (RoR) is the probability of losing enough capital to be unable (or unwilling) to continue. For a simplified 1:1 system risking a fixed unit per trade, with capital expressed as N units (N = 1 ÷ risk%):

RoR = (q ÷ p)^N for p > q ; RoR = 1 for p ≤ q
# N = number of "units" of risk in your account = 1 ÷ risk-per-trade

Risk per trade	Units (N)	RoR at 55% win (1:1)	RoR at 50% win (no edge)
1%	100	≈ 0%	100% (eventually certain)
2%	50	≈ 0.004%	100%
5%	20	≈ 1.8%	100%
10%	10	≈ 13.4%	100%
20%	5	≈ 36.7%	100%

Two lessons in one table First, risk-per-trade is the dominant lever on ruin — going from 1% to 10% turns a rounding error into a one-in-seven chance of blowing up. Second, with no edge (p ≤ q), ruin is certain given enough trades regardless of sizing — no money management rescues a negative-expectancy system. This is the fixed-bet approximation; real fixed-fractional sizing never hits exactly zero but trades that for deep drawdowns, which is why the true tool for RoR is Monte Carlo simulation (Section 11).
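The fixed-bet formula can be evaluated directly for any win rate and risk level. This is a minimal sketch of the simplified 1:1 model only; the function name is illustrative, and it is not a substitute for the Monte Carlo approach the text points to.

```python
def risk_of_ruin(p: float, risk_pct: float) -> float:
    """Fixed-bet, 1:1 payoff risk of ruin: (q / p) ** N, with N = 1 / risk_pct."""
    if not 0.0 < p < 1.0 or not 0.0 < risk_pct <= 1.0:
        raise ValueError("need 0 < p < 1 and 0 < risk_pct <= 1")
    q = 1.0 - p
    if p <= q:
        return 1.0  # no edge: ruin is certain given enough trades
    n_units = 1.0 / risk_pct
    return (q / p) ** n_units


for risk in (0.01, 0.02, 0.05, 0.10, 0.20):
    print(f"risk {risk:.0%} -> RoR at 55% win: {risk_of_ruin(0.55, risk):.4%}")
```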

Drawdown is the binding constraint

Losses and the gains needed to undo them are not symmetric. A drawdown of depth d requires a gain of d ÷ (1 − d) just to get back to even — and that gap explodes as the hole deepens.

Recover-to-even gain by drawdown. A 20% loss needs +25%; a 50% loss needs +100%; a 75% loss needs a +300% moonshot. Avoiding the deep hole is worth more than any entry signal.
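The asymmetry is a one-line calculation. A minimal sketch, with an illustrative function name:

```python
def gain_to_recover(drawdown: float) -> float:
    """Gain needed to return to the prior peak after a drawdown d: d / (1 - d)."""
    if not 0.0 <= drawdown < 1.0:
        raise ValueError("drawdown must be in [0, 1)")
    return drawdown / (1.0 - drawdown)


for d in (0.10, 0.20, 0.50, 0.75):
    print(f"drawdown {d:.0%} -> needs +{gain_to_recover(d):.1%} to recover")
```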

Portfolio heat & correlation

Per-trade risk is not enough; you must cap portfolio heat — total open risk across all positions at once. Correlated positions are the trap: long EURUSD and long GBPUSD are not two 0.5% bets, they are closer to one 1% bet on a falling dollar.

Cap aggregate open risk (e.g. total heat ≤ 2–3% of equity), not just per-trade risk.
Treat correlated instruments as one position for the heat calculation; size the cluster, not each leg.
Cap correlated clusters so a single macro move (a dollar spike, a risk-off day) cannot hit every open trade at full size simultaneously.
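The three rules above can be sketched as a pre-trade check. Everything named here is hypothetical: the `OpenRisk` record, the cluster labels, and the 1% cluster cap are assumptions for illustration; only the 3% total-heat cap echoes the range given above. Assigning instruments to clusters is left to you.

```python
from collections import defaultdict
from typing import NamedTuple


class OpenRisk(NamedTuple):
    symbol: str
    cluster: str     # hypothetical label for a group of correlated trades
    risk_pct: float  # open risk as a fraction of equity


def cluster_heat(positions: list[OpenRisk]) -> dict[str, float]:
    """Sum open risk per correlated cluster."""
    heat: dict[str, float] = defaultdict(float)
    for pos in positions:
        heat[pos.cluster] += pos.risk_pct
    return dict(heat)


def can_open(positions: list[OpenRisk], candidate: OpenRisk,
             max_total: float = 0.03, max_cluster: float = 0.01) -> bool:
    """True if adding the candidate keeps total and cluster heat within caps."""
    proposed = positions + [candidate]
    total = sum(p.risk_pct for p in proposed)
    in_cluster = cluster_heat(proposed)[candidate.cluster]
    eps = 1e-12  # tolerance for floating-point sums
    return total <= max_total + eps and in_cluster <= max_cluster + eps


book = [OpenRisk("EURUSD", "short-USD", 0.005),
        OpenRisk("GBPUSD", "short-USD", 0.005)]
print(can_open(book, OpenRisk("AUDUSD", "short-USD", 0.005)))  # cluster already full
print(can_open(book, OpenRisk("EURCHF", "CHF-cross", 0.005)))  # different cluster
```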

Survival-first principle Sizing is engineered backwards from the drawdown you can survive — financially and psychologically — not forwards from the returns you want. Decide the worst losing streak the math says is normal (Section 03), confirm your size lets you sit through it without breaching your max-drawdown limit, and only then ask about returns. A system you abandon at the bottom of a normal drawdown has, for you, a negative expectancy.

05 Strategy Archetypes

Every edge is a bet that a specific market behaviour will repeat. Each archetype below works in one regime and bleeds in its opposite — there is no all-weather edge. The meta-skill is knowing which archetype you are running, which regime it needs, and how to tell when that regime has left.

The two parents: trend and mean reversion

Almost every system descends from one of two opposing beliefs: that moves continue (momentum/trend), or that moves revert (mean reversion). Because each profits in the regime that hurts the other, their returns tend to be negatively correlated, which is also why running both can smooth an equity curve.

Two opposite bets. Trend buys after a move proves itself and rides continuation; mean reversion fades a stretched move expecting a snap back to the average. Each is the other's worst regime.

The archetype map

Archetype	The bet	Typical win% / payoff	Needs this regime	Bleeds when
Trend-following	Strong moves persist; let winners run	30–45% / 2–5R	Sustained directional trends	Choppy, range-bound, mean-reverting markets (death by a thousand cuts)
Mean reversion	Extremes overshoot and snap back	60–75% / 0.5–1R	Range-bound, stationary, high-noise	A trend or regime break runs through your fade (picking up coins in front of a roller)
Breakout	Range expansion births a new trend	35–50% / 2–3R	Volatility compression resolving	False breakouts in chop; getting whipsawed at the edges
Momentum (cross-sectional)	Recent winners keep outperforming losers	~50% / varies	Dispersion across a basket; persistent leadership	Sharp reversals / correlation spikes flip the rankings
Carry (FX)	Earn the interest-rate differential (swap)	High win% / small, steady	Calm, risk-on, low-volatility	Risk-off: "up the stairs, down the elevator" — slow gains, violent reversals
Pairs / stat-arb	A cointegrated spread reverts to its mean	High win% / small	A stable statistical relationship	The relationship structurally breaks (the spread never returns)
Session / time-based	Intraday seasonality (e.g. London open, NY overlap)	Varies	Repeatable liquidity/volatility windows	The seasonality decays or shifts with market structure

Multi-timeframe: a structure, not an archetype

Multi-timeframe (MTF) is not a separate edge — it is a way of combining the above. The standard pattern: a higher timeframe sets the bias (which direction the regime filter permits), and a lower timeframe provides the trigger (a precise, cheaper entry). MTF tightens stops and improves reward:risk, but it cannot manufacture an edge that the underlying bet does not have.

The regime-dependence principle There is no edge that works everywhere — there are edges and the regimes that feed them. This is why Section 02's regime filter exists (only trade when your regime is present) and why Section 14's monitoring exists (detect when the regime — and therefore your edge — has left). A trend system with a regime filter that keeps it flat during chop is not a worse trend system; it is a complete one.

Combining for smoother equity Because trend and mean reversion profit in opposite regimes, a portfolio that holds both tends to have a smoother equity curve than either alone — one is usually working when the other is not. Diversifying across uncorrelated edges is the closest thing to a free lunch in system design, far safer than leveraging a single edge harder.

06 From Idea to Specification

Most systems do not fail in the backtest — they fail because the rules were never truly nailed down, so the trader ran a slightly different system every week and never knew it. Specification is the unglamorous discipline that turns a belief into something you can test, size, audit, and improve. It is also the single hardest step.

Hypothesis before indicators

Start from a market behaviour you can state in one sentence, not from an indicator you find interesting. The hypothesis is the edge; the indicator is merely how you measure the hypothesis.

Indicator-first (backwards)

"The RSI looks useful — let me find settings that would have worked." This is curve-fitting with extra steps.

Hypothesis-first (correct)

"Liquid FX trends resume intraday after a shallow pullback to the mean." Now pick the cheapest indicator that captures that.

A hypothesis is testable and can be wrong. "RSI is good" cannot be wrong because it says nothing. If you cannot state, in plain language, what the market is doing and why your rules profit from it, you do not yet have a system to specify.

The specification: zero ambiguity

To specify is to answer all eight components (Section 02) such that two people — or a machine — would execute identically. Re-run the stranger test against every clause: could someone else act on this word without asking you what you meant?

Vague (a style)	Specified (a system)
"Strong uptrend"	close > 200-EMA and 50-EMA slope > 0 over last 10 bars
"Near support"	within 0.25 × ATR(14) of a level touched ≥ 2× in the last 50 bars
"Wait for confirmation"	a bar that closes back above the 20-EMA
"Don't trade the news"	no entries in the 30 min before/after a high-impact GBP or USD release
"Take profit into resistance"	limit at the nearest level above; else exit at +2R

Contradictions and gaps: a spec must be total

A real specification is total — it defines an action for every state the market can present. Two failure modes hide here, and both are invisible until they cost money:

Contradictions — two rules that fire at once and disagree. "Buy when RSI < 30" and "never buy below the 200-EMA" both trigger when an oversold market is also below trend. Which wins? If the spec doesn't say, you'll improvise — differently each time.
Gaps — states the rules never anticipated. The setup forms on the exact bar the regime filter flips. A target and a stop are both hit inside one bar. Price gaps past your entry. An undefined state is a coin-flip you didn't know you were making.
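One way to make a specification total is to write the decision as a function that returns an action for every input and states its precedence explicitly. The sketch below resolves the RSI versus 200-EMA contradiction by letting the trend filter veto the signal. That precedence is an assumption chosen for illustration; the point is that some choice is written down.

```python
from enum import Enum


class Action(Enum):
    BUY = "buy"
    NO_TRADE = "no_trade"


def decide(rsi: float, close: float, ema200: float) -> Action:
    """Return an action for every state; the filter outranks the signal."""
    # Rule 1 (veto): "never buy below the 200-EMA".
    # "Below" is read as strictly less, so close == ema200 is allowed.
    if close < ema200:
        return Action.NO_TRADE
    # Rule 2 (signal): "buy when RSI < 30".
    if rsi < 30:
        return Action.BUY
    # Every remaining state is explicitly "do nothing".
    return Action.NO_TRADE


print(decide(rsi=25, close=1.2000, ema200=1.2500))  # oversold but below trend
print(decide(rsi=25, close=1.3000, ema200=1.2500))  # oversold and above trend
```

Even the boundary case (price exactly on the EMA) had to be decided to write the function, which is the kind of gap that stays hidden in prose.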

Why this is the whole game Contradictions and gaps are the difference between a backtest you can trust and one that quietly used assumptions your live self won't replicate. Resolving them before coding is cheaper than discovering them in a drawdown. Most "the backtest worked but live didn't" stories are unspecified states being resolved one way by the historical fill engine and another way by the panicking human.

Structure versus parameters

Separate the structural logic (the bet: "buy pullbacks in an uptrend") from the tunable parameters (the 20 in 20-EMA, the 1.5 in 1.5×ATR). Structure encodes your hypothesis; parameters are knobs. Every free parameter is a degree of freedom you can accidentally fit to noise (Section 09).

Minimise free parameters. Three robust ones beat ten finely-tuned ones. Each knob should earn its place with an economic reason, not a backtest improvement.
Prefer parameters that generalise. A 200-EMA trend filter is a broad, well-understood concept; a 187-period filter that tested 0.3% better is a red flag.
Fix what you can justify; only tune what you must. The fewer things you optimise, the less you can overfit.

Where Reign Edge fits This translation problem — turning a trader's tacit, in-the-head rules into an explicit, contradiction-free, testable specification — is precisely the problem the Reign Edge platform is built around. The hard part was never the indicators; it is making the rules say exactly what the trader means, surfacing the contradictions and gaps they didn't know were there, and expressing the result as a small set of typed, composable, auditable conditions rather than opaque free-form code. A specification you can read, test, and reason about beats a black box you can only run.

07 Data Foundations

A backtest is only as honest as the data underneath it. Bad data does not produce obvious errors — it manufactures plausible, profitable-looking edges that evaporate the moment real money is on the line. Most "my backtest lied to me" stories are data stories.

The raw material: bars and OHLCV

A bar aggregates price action over an interval into five numbers: Open, High, Low, Close, Volume. The interval can be time-based (1H), tick-based (every 500 ticks), or volume-based. The critical limitation: a bar tells you the range but not the path within it.

The intrabar problem If a bar's range contains both your stop and your target, the OHLC alone cannot tell you which was hit first. Assume the pessimistic order (stop first) unless you have tick data to resolve it. Backtests that assume the target filled first are a common, silent source of inflated results.
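The pessimistic rule is simple to encode for a long position. A minimal sketch with an illustrative function name; it classifies the outcome only and leaves fill prices to the cost model.

```python
from typing import Optional


def resolve_long_exit(high: float, low: float,
                      stop: float, target: float) -> Optional[str]:
    """Classify a long trade's outcome on one OHLC bar without tick data."""
    if low <= stop:
        # Checked first, so a bar containing both levels counts as a loss.
        return "stop"
    if high >= target:
        return "target"
    return None  # neither level touched: the trade stays open


print(resolve_long_exit(high=1.2560, low=1.2440, stop=1.2450, target=1.2550))
print(resolve_long_exit(high=1.2560, low=1.2470, stop=1.2450, target=1.2550))
print(resolve_long_exit(high=1.2520, low=1.2470, stop=1.2450, target=1.2550))
```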

Tick vs bar data

Aspect	Tick data	Bar (OHLCV) data
Granularity	Every quote/trade	Summary per interval
Resolves intrabar order?	Yes	No
Models spread/slippage?	Yes (bid/ask)	Approximation only
Size & cost	Huge, expensive to store/process	Compact, cheap
Use for	Scalping, intrabar logic, realistic fills	Swing/position, higher-TF research

Data quality — clean before you trust

Spikes & bad ticks: erroneous prints that trigger phantom signals. Filter outliers.
Gaps & missing sessions: holidays, outages, weekend gaps in FX. Decide how each is handled, consistently.
Duplicate / misaligned bars: repeated timestamps or off-by-one alignment silently shift signals.
Timezone drift: the deadliest quiet bug — a "daily" bar means something different at broker-server time vs UTC vs your local time, and a session filter built on the wrong zone is wrong on every bar.
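Some of these checks can be automated before any backtest runs. The sketch below flags duplicate, out-of-order, and missing timestamps, and refuses timestamps without an explicit timezone. The function name and issue labels are hypothetical, and a flagged gap is a prompt to investigate: a weekend or holiday gap may be legitimate.

```python
from datetime import datetime, timedelta, timezone


def audit_timestamps(timestamps: list[datetime],
                     interval: timedelta) -> list[tuple[str, datetime]]:
    """Return (issue, timestamp) pairs for a series of bar open times."""
    if any(ts.tzinfo is None for ts in timestamps):
        raise ValueError("naive timestamp found: attach an explicit timezone first")
    issues: list[tuple[str, datetime]] = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        step = cur - prev
        if step == timedelta(0):
            issues.append(("duplicate", cur))
        elif step < timedelta(0):
            issues.append(("out_of_order", cur))
        elif step > interval:
            issues.append(("gap", cur))
    return issues


# Hypothetical 1-hour bars: one repeated timestamp, then two missing bars.
t0 = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)
stamps = [t0, t0 + hour, t0 + hour, t0 + 4 * hour]
for issue, ts in audit_timestamps(stamps, hour):
    print(issue, ts.isoformat())
```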

FX-specific realities

Foreign exchange has no central exchange, which changes the data picture in ways equity traders often miss:

No single price: each broker/liquidity provider quotes slightly differently. There is no canonical tape.
Variable spread: spread widens at session opens, around news, and in thin liquidity (Asian session, Friday close). A fixed-spread assumption flatters the backtest.
Swap / rollover: holding overnight earns or pays interest on the rate differential. For multi-day holds, unmodelled swap can flip a carry trade's sign.
Sessions & gaps: the market is continuous Mon–Fri but liquidity rotates across Tokyo/London/New York; the weekend gap can leap past stops.

Feed parity: one source, end to end

Single-feed principle Use one data source for historical research, forward testing, and live execution. Mixing a cheap historical vendor with a different live broker injects divergence that has nothing to do with your strategy's quality — different timestamps, different spreads, different fills. When backtest and live disagree, you must be able to rule out the data as the cause. A single feed (ideally the same provider you will trade through) makes the backtest a fair preview of live.

Tier	Use	Trade-off
Free tick (e.g. Dukascopy)	Research, spikes, archetype validation	Great granularity, but not your execution venue — for exploration only
Broker API history + live (single provider)	Production backtest & live	Parity across the whole pipeline; the configuration you actually trade
Mixed vendors	—	Avoid: divergence you cannot attribute to the strategy

08 Backtesting

A backtest is a laboratory for trying to prove your hypothesis false on history before risking capital. Its job is not a pretty equity curve — it is an honest estimate of expectancy and of the conditions under which that expectancy holds. The moment you start trying to make it look good, you have stopped being the scientist and become the mark.

Vectorised vs event-driven

Aspect	Vectorised	Event-driven
How it runs	Computes signals across the whole series at once	Steps through time bar-by-bar as if live
Speed	Very fast — ideal for scanning many ideas	Slow — one finalist at a time
Path-dependent logic	Awkward (trailing stops, partial fills, pyramiding)	Natural — mirrors how live execution works
Look-ahead risk	Easy to introduce accidentally (whole-series ops)	Structurally harder — you only see the past
Best for	Research, parameter sweeps, idea triage	Validating the finalist; matching the live engine

The two-stage approach Explore broadly in a fast vectorised engine, then re-test the survivor in an event-driven engine that mirrors your live execution path (same fill logic, same cost model). Discrepancies between the two stages are themselves diagnostic — they usually reveal hidden look-ahead or unmodelled path dependence.

The event-driven loop (the mental model)

Everything hinges on one phrase: data available up to now. For each bar, in order:

1 · Update indicators using only bars that have already closed.
2 · Evaluate the regime filter, then the setup, then the entry trigger.
3 · Simulate the fill with costs (spread, commission, slippage).
4 · Manage open trades (stops, targets, trails) against this bar's range.
5 · Mark equity and record the trade's R-multiple.

If step 1 ever uses the close of the still-forming bar, or of any future bar, to make a decision you would act on now, you have introduced look-ahead bias, and your results are fiction (Section 09).
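The loop can be sketched in a few dozen lines. This is a deliberately small, long-only illustration, not a production engine: the `Bar` type, the `signal` callback, the fixed stop distance, and the single `cost` number are all assumptions. Decisions see only closed bars, entries fill at the next open, and the stop is checked before the target.

```python
from typing import Callable, NamedTuple, Optional


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def run_backtest(bars: list[Bar], signal: Callable[[list[Bar]], bool],
                 stop_dist: float, target_r: float = 2.0,
                 cost: float = 0.0) -> list[float]:
    """Long-only, one position at a time; returns each trade's R-multiple."""
    r_multiples: list[float] = []
    position: Optional[tuple[float, float, float]] = None  # (fill, stop, target)
    for i, bar in enumerate(bars):
        closed = bars[:i]  # steps 1-2: decisions see closed bars only
        if position is None and closed and signal(closed):
            # Step 3: fill at this bar's open, worsened by the per-trade cost.
            position = (bar.open + cost, bar.open - stop_dist,
                        bar.open + target_r * stop_dist)
        if position is not None:
            fill, stop, target = position
            exit_price = None
            # Step 4: stop first, so a bar spanning both levels is a loss;
            # a gap through the stop fills at the (worse) open.
            if bar.low <= stop:
                exit_price = min(stop, bar.open)
            elif bar.high >= target:
                exit_price = target
            if exit_price is not None:
                # Step 5: record the result in R (stop_dist is 1R).
                r_multiples.append((exit_price - fill) / stop_dist)
                position = None
    return r_multiples


def last_bar_closed_up(closed: list[Bar]) -> bool:
    """Hypothetical trigger: the most recent closed bar finished above its open."""
    return closed[-1].close > closed[-1].open


bars = [Bar(100.0, 101.0, 99.5, 100.8),
        Bar(100.8, 103.0, 100.5, 102.5),
        Bar(102.5, 102.7, 101.0, 101.2)]
print([round(r, 2) for r in run_backtest(bars, last_bar_closed_up, stop_dist=1.0)])
```

Slicing `bars[:i]` on every step is slow but makes the "data available up to now" boundary impossible to cross by accident; a real engine would keep the same boundary with incremental indicator updates.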

Costs: the quiet edge-killer

Costs scale with frequency, and they attack thin edges first. Model them pessimistically — optimism here is self-deception with a spreadsheet.

Net expectancy[R] = Gross expectancy[R] − costs per trade[R]
costs ≈ (spread + commission + slippage) ÷ (stop distance) # all in the same units

How frequency turns cost into ruin

Gross edge +0.20R/trade, 200 trades/year → +40R gross.

Costs 0.10R/trade → net +0.10R → +20R (half the edge gone to friction).
Costs 0.20R/trade → net 0.00R → break-even — a real edge, fully consumed by costs.

The thinner and faster the edge, the more a credible cost model decides whether it is real. Stress your costs upward and see if the edge survives.
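A cost stress test is just the net-expectancy formula evaluated over a range of cost assumptions. A minimal sketch using the example's numbers; the function names and the cost grid are illustrative.

```python
def cost_in_r(spread: float, commission: float, slippage: float,
              stop_distance: float) -> float:
    """Per-trade cost as a fraction of 1R; all inputs in the same units."""
    if stop_distance <= 0:
        raise ValueError("stop distance must be positive")
    return (spread + commission + slippage) / stop_distance


def net_expectancy_r(gross_r: float, cost_r: float) -> float:
    """Net expectancy per trade in R."""
    return gross_r - cost_r


gross, trades_per_year = 0.20, 200
for cost in (0.00, 0.05, 0.10, 0.15, 0.20):
    net = net_expectancy_r(gross, cost)
    print(f"cost {cost:.2f}R -> net {net:+.2f}R/trade, "
          f"{net * trades_per_year:+.0f}R/year")
```

Note that the same pip cost is a larger fraction of R when the stop is tight, which is why `cost_in_r` divides by the stop distance.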

Pessimistic fills on stops Model stop-loss fills as worse than the trigger price (slippage works against you precisely when you're stopped in fast markets), and limit fills as no better than the level. A backtest that assumes perfect fills both ways systematically overstates a real edge.

In-sample and out-of-sample

Split your history. Develop, optimise, and iterate freely on the in-sample (IS) period. Reserve an out-of-sample (OOS) period that you test against once.

The OOS is sacred — and consumable Every time you look at OOS results and then go back and tweak the system, you have leaked OOS information into your design — the OOS quietly becomes in-sample, and its honesty is spent. The discipline is to finalise on IS, test on OOS exactly once, and if it fails, treat that as a finding (not an invitation to re-tune until OOS passes). For repeated, structured testing, use walk-forward (Section 11) instead.
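The split itself should be chronological, never shuffled, so that the out-of-sample period lies strictly after everything used for development. A minimal sketch; the function name and the 30% default are assumptions, not a recommendation from this handbook.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chronological_split(history: Sequence[T],
                        oos_fraction: float = 0.3) -> tuple[Sequence[T], Sequence[T]]:
    """Split time-ordered data into (in-sample, out-of-sample) without shuffling."""
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be between 0 and 1")
    n_oos = round(len(history) * oos_fraction)
    cut = len(history) - n_oos
    return history[:cut], history[cut:]


# Hypothetical: 100 time-ordered bars represented by their index.
in_sample, out_of_sample = chronological_split(list(range(100)))
print(len(in_sample), len(out_of_sample))
```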

What a credible backtest reports

A long period spanning multiple regimes (trending, ranging, high- and low-volatility, at least one crisis).
Costs included; IS and OOS shown separately; the trade count stated.
The distribution of outcomes, not just the total — equity curve, drawdown depth and duration, and the full metric set (Section 10).
Sensitivity to small parameter changes (Section 09) — robustness, not a single hero result.

A backtest is a hypothesis test, not a brochure If your instinct on a weak result is to search for the settings that fix it, you have inverted the exercise. The honest question is "does the edge survive my attempts to break it?" — not "can I make this curve go up?" A backtest engineered to impress impresses exactly one person: the one about to trade it.

09Overfitting & the Bias Catalog

Overfitting is the reason a beautiful backtest becomes a losing live system. It is fitting the noise in your historical sample instead of the signal that will repeat — and it is seductive precisely because it always looks like progress. This is the single most dangerous failure in system development.

What overfitting actually is

A market history contains both a (possibly real) pattern and a large amount of random noise specific to that sample. Overfitting is when your rules and parameters memorise the noise — the exact wiggles that will never recur — rather than the generalisable structure. The more free parameters you have and the more variations you try, the easier it becomes to fit noise perfectly.

The cruelty is the asymmetry: overfitting is invisible in the backtest (where it looks superb) and only revealed in live trading (where it costs money). You cannot detect it by admiring results — only by methodology applied before you see the results you were hoping for.

Symptoms

Plot performance against a parameter. A robust edge sits on a wide plateau — neighbours perform similarly, so small errors don't matter. An overfit edge is a lone spike: nudge the parameter and it collapses. You want plateaus, not peaks.

Too many parameters relative to the number of trades — degrees of freedom that let you fit anything.
Fragility: performance collapses when a parameter moves slightly — a peak, not a plateau.
Too-good metrics: Sharpe > 3, an 80%+ win rate paired with a large payoff, a near-straight equity line. Real edges are noisier than this.
Great in-sample, poor out-of-sample — the textbook signature.
Works on one instrument only and breaks on similar ones — a genuine behavioural edge usually generalises at least somewhat.
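The plateau-versus-peak symptom can be given a rough number. The sketch below compares the best parameter value with its immediate neighbours; the function name, the neighbourhood radius, and the sample figures are all hypothetical, and the ratio is a heuristic rather than a formal test.

```python
def plateau_ratio(results, best_key, radius=1):
    """Mean neighbour performance divided by the best performance.

    results maps an ordered parameter value to a performance metric.
    A ratio near 1.0 suggests a plateau; near zero or negative suggests a spike.
    """
    keys = sorted(results)
    i = keys.index(best_key)
    window = keys[max(0, i - radius): i + radius + 1]
    neighbours = [results[k] for k in window if k != best_key]
    if not neighbours or results[best_key] <= 0:
        raise ValueError("need neighbours and a positive best result")
    return (sum(neighbours) / len(neighbours)) / results[best_key]


# Hypothetical expectancy (in R) by lookback length.
plateau = {18: 0.21, 19: 0.23, 20: 0.24, 21: 0.23, 22: 0.20}
spike = {18: 0.02, 19: -0.01, 20: 0.40, 21: 0.00, 22: 0.03}

print(round(plateau_ratio(plateau, 20), 2))  # close to 1: neighbours hold up
print(round(plateau_ratio(spike, 20), 2))    # close to 0: a lone peak
```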

The bias catalog

Overfitting is the headline, but it travels with a family of biases that all inflate backtest results. Know each by name so you can hunt for it deliberately.

Bias	What it is	Defence
Look-ahead	Using information not available at decision time (a bar's close, a future value, a revised figure)	Event-driven loop; only closed bars; lag any revised data
Survivorship	Testing only instruments that still exist; the failures were deleted	Use point-in-time universes that include delisted/dead names
Data-snooping / multiple testing	Trying many ideas and keeping the best — which looks good by luck alone	Count your trials; raise the bar; deflate the result (below)
Optimisation bias	Tuning parameters to the in-sample period's specific noise	Few parameters; walk-forward; demand plateaus
Selection bias	Cherry-picking the test window, instrument, or start date that flatters	Fixed, pre-declared test period across regimes
Hindsight in rule design	Adding rules that "explain" past losses you already saw	Pre-register the hypothesis before looking; resist post-hoc patches
Cost omission	Ignoring spread, slippage, swap	Pessimistic cost model (Section 08)

The multiple-comparisons problem

If you test enough strategies, the best one will look brilliant even if none has any edge — the maximum of many random results is large by chance. The more configurations you search, the higher your performance bar must rise to mean anything.

Deflated Sharpe The honest correction is to discount your best result for the number of trials behind it (the deflated Sharpe ratio formalises this). Practically: track how many variations you tried, treat a Sharpe of 1.5 found after one honest attempt very differently from a Sharpe of 1.5 found after 500 sweeps, and be deeply suspicious of any result you only obtained by searching hard for it.
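A small simulation makes the point concrete. The sketch below is an illustration under stated assumptions, not an implementation of the deflated Sharpe ratio: every "strategy" is pure Gaussian noise with zero mean, one year of daily returns, and a zero risk-free rate. The function names and the seed are hypothetical.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over sample stdev), risk-free rate taken as zero."""
    return statistics.mean(returns) / statistics.stdev(returns)


def best_of_n_edgeless(n_strategies, n_periods=252, seed=0):
    """Best annualised Sharpe among n strategies whose returns are pure noise."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        best = max(best, sharpe(daily) * 252 ** 0.5)
    return best


# None of these strategies has an edge, yet the best one improves as the search grows.
for n in (1, 10, 100, 500):
    print(n, round(best_of_n_edgeless(n), 2))
```

Because one year of edgeless daily returns gives an annualised Sharpe with a standard deviation of roughly 1, the best of several hundred such trials will usually sit far above levels that would look impressive for a single honest attempt.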

The parsimony toolkit

Minimise parameters and justify each one economically, not by backtest gain.
Out-of-sample testing, used sparingly (Section 08).
Walk-forward analysis and Monte Carlo — the core robustness tools (Section 11).
Demand plateaus, not peaks, in parameter space.
Hold out a final, untouched dataset for one last sanity check before going live.
Pre-register the hypothesis and count trials honestly. The discipline must precede the results.

10Performance Metrics

No single number describes a system. CAGR ignores risk; win rate ignores payoff; Sharpe hides drawdown duration. A system is a profile across four families — return, risk-adjusted return, drawdown, and trade quality — and every individual metric is gameable in isolation. Read them as a set.

Return, risk-adjusted return, and the equity picture

CAGR expresses total growth as a single compounded annual rate; risk-adjusted ratios divide return by some measure of pain.

CAGR = (Ending ÷ Beginning)^(1 ÷ years) − 1
Sharpe = (Rp − Rf) ÷ σ_p # annualised ≈ daily Sharpe × √252; penalises ALL volatility
Sortino = (Rp − Rf) ÷ σ_downside # penalises only downside; fairer to asymmetric systems
Calmar = CAGR ÷ |max drawdown| # return per unit of worst pain
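The four formulas translate directly into Python. This is a sketch using only the standard library; the function names are hypothetical, and the downside deviation follows one common convention (target return of zero, averaged over all observations), which is an assumption since the text does not fix one.

```python
import math
import statistics


def cagr(beginning, ending, years):
    """Compound annual growth rate as a fraction."""
    return (ending / beginning) ** (1.0 / years) - 1.0


def sharpe_annualised(daily_returns, daily_rf=0.0, periods=252):
    """Daily Sharpe scaled by the square root of the periods per year."""
    excess = [r - daily_rf for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods)


def sortino_annualised(daily_returns, daily_rf=0.0, periods=252):
    """Like Sharpe, but the denominator counts only below-zero excess returns."""
    excess = [r - daily_rf for r in daily_returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return float("inf")  # no losing periods in the sample
    return statistics.mean(excess) / downside * math.sqrt(periods)


def calmar(cagr_value, max_drawdown):
    """CAGR divided by the magnitude of the maximum drawdown."""
    return cagr_value / abs(max_drawdown)


# Hypothetical upside-skewed sample: Sortino comes out well above Sharpe.
sample = [0.01, -0.005, 0.02, -0.005, 0.03]
print(round(sharpe_annualised(sample), 2), round(sortino_annualised(sample), 2))
```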

The two views traders actually feel. The equity curve shows growth; the underwater curve shows how deep and how long you spent below the prior peak. Two systems with the same CAGR can have wildly different underwater curves — and you live in the underwater one.

Drawdown and trade quality

Maximum drawdown is the largest peak-to-trough equity decline; its duration — how long you stay underwater — is often the more punishing number. At the trade level, MAE/MFE (Maximum Adverse / Favourable Excursion) measure how far each trade ran against and for you before closing — invaluable for calibrating stops (are you getting stopped just before reversals?) and targets (are you leaving most of the move on the table?).
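Depth and duration can both be read off an equity series in a single pass. The sketch below is a minimal version with a hypothetical function name; duration is counted in bars, and the equity values are made up for illustration.

```python
def drawdown_stats(equity):
    """Return (max drawdown as a negative fraction, longest underwater stretch in bars)."""
    peak = equity[0]
    max_dd = 0.0
    underwater = 0
    longest = 0
    for value in equity:
        if value >= peak:
            peak = value       # new high: the underwater clock resets
            underwater = 0
        else:
            underwater += 1
            longest = max(longest, underwater)
            max_dd = min(max_dd, value / peak - 1.0)
    return max_dd, longest


# Hypothetical equity curve: the deepest drawdown and the longest one are different episodes.
curve = [100, 110, 99, 104, 108, 112, 100]
depth, duration = drawdown_stats(curve)
print(f"max drawdown {depth:.1%}, longest underwater stretch {duration} bars")
```

In this sample the deepest decline comes from the final drop, while the longest time underwater belongs to the earlier, shallower dip, which is why the two numbers are reported separately.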

Metric	Definition	Healthy range	The gotcha
CAGR	Compound annual growth rate	Context-dependent	Says nothing about risk taken to earn it
Sharpe	Excess return ÷ total volatility	> 1 acceptable, > 2 strong	Penalises upside; assumes near-normal returns; smoothable
Sortino	Excess return ÷ downside volatility	> 2 good	Fairer to asymmetric systems, but noisier to estimate
Calmar / MAR	CAGR ÷ |max drawdown|	> 0.5 ok, > 1 strong	Hostage to the single worst DD and the length of the test
Max drawdown	Largest peak-to-trough decline	< 20% comfortable for most	One number hides duration and frequency
DD duration	Longest time underwater	Shorter is better	The metric that actually breaks discipline
Profit factor	Gross profit ÷ gross loss	1.3–1.6 solid	> 2 — verify it isn't look-ahead
Expectancy [R]	Average R per trade	> 0; > 0.1 good	Meaningless below ~100 trades
Win rate	Wins ÷ total	Only with payoff context	Pure vanity in isolation
MAE / MFE	Worst / best excursion per trade	Used to tune stops & targets	Needs trade-by-trade path data

Read the set, not the number Any single metric can be gamed: a high Sharpe can mask a shallow-but-endless drawdown; a high CAGR can ride a 70% max drawdown; a high win rate can hide a negative expectancy. Demand the whole profile — return, the ratio that captures your pain, the depth and duration of drawdown, and trade-level quality — before you believe a system is good.
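The trade-quality rows of the table can be computed together from a list of R-multiples, which makes it harder to quote one of them alone. The function name and the sample trades below are hypothetical.

```python
def trade_profile(r_multiples):
    """Win rate, payoff ratio, profit factor, and expectancy from per-trade R results."""
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    payoff = None
    if wins and losses:
        payoff = (gross_profit / len(wins)) / (gross_loss / len(losses))
    return {
        "trades": len(r_multiples),
        "win_rate": len(wins) / len(r_multiples),
        "payoff": payoff,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "expectancy_r": sum(r_multiples) / len(r_multiples),
    }


# Hypothetical system: nine small wins and one large loss.
# The win rate looks excellent while the expectancy is negative.
profile = trade_profile([0.2] * 9 + [-3.0])
print(profile)
```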

11Robustness & Validation

A single backtest — even out-of-sample — is one path through one history with one set of parameters. Robustness testing asks the harder question: would this edge have survived different data, different parameters, and different luck? Here you stop admiring the system and start trying to break it on purpose.

Walk-forward analysis — the gold standard

Walk-forward analysis (WFA) mimics how you'd actually run a system: optimise on a window, trade the next unseen window with those settings, then roll the window forward and repeat. The concatenated out-of-sample segments form a realistic equity curve that was never optimised in hindsight.

Rolling walk-forward. Each pass optimises on the grey block, then tests untouched on the red block, and slides forward. Stitching the red blocks together yields an out-of-sample curve you could actually have traded. Walk-forward efficiency = OOS performance ÷ IS performance; you want it reasonably high (say > 0.5), not a cliff.

Anchored WFA

In-sample window expands from a fixed start. Uses all history; adapts slowly. Good when more data always helps.

Rolling WFA

In-sample window is a fixed length that slides. Adapts to changing regimes; discards old data. Good when markets evolve.
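The two windowing schemes differ only in where the in-sample window starts. The generator below is a sketch with hypothetical names; it steps forward by one out-of-sample length per pass and yields end-exclusive bar indices. The efficiency helper follows the ratio given in the text.

```python
def walk_forward_windows(n_bars, is_len, oos_len, anchored=False):
    """Yield (is_start, is_end, oos_start, oos_end) index tuples, end-exclusive."""
    start = 0
    while start + is_len + oos_len <= n_bars:
        is_start = 0 if anchored else start   # anchored: the window only grows
        is_end = start + is_len
        yield is_start, is_end, is_end, is_end + oos_len
        start += oos_len                      # slide by one out-of-sample block


def walk_forward_efficiency(oos_perf, is_perf):
    """Out-of-sample performance divided by in-sample performance."""
    if is_perf <= 0:
        raise ValueError("in-sample performance must be positive")
    return oos_perf / is_perf


for window in walk_forward_windows(10, is_len=4, oos_len=2):
    print("rolling ", window)
for window in walk_forward_windows(10, is_len=4, oos_len=2, anchored=True):
    print("anchored", window)
```

Each pass would optimise on `bars[is_start:is_end]` and then trade `bars[oos_start:oos_end]` with the chosen settings; the out-of-sample blocks never overlap, so they can be stitched into one curve.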

Monte Carlo — the range of luck

Your backtest's max drawdown is a single sample of what randomness could deal you; the next one could be worse. Monte Carlo simulation reshuffles or resamples your trade results thousands of times to reveal the distribution of outcomes — especially the drawdowns you didn't happen to get but easily could.

Each faint line is one reshuffle of the same trades; the bold line is the median. The spread of final equity and (more importantly) of worst drawdown across thousands of runs tells you the range your sizing must survive — not just the one path history happened to draw.

Trade-order shuffling: reorder the same results — same total return, very different drawdown paths.
Bootstrap resampling: draw trades with replacement to build the outcome distribution.
Randomised skipping: drop a fraction of trades at random — does the edge survive missing some signals?
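The first two methods in the list can be sketched with the standard library alone. The function names, run count, and sample trades are hypothetical; drawdown is measured in cumulative R rather than percent of equity, which is a simplifying assumption.

```python
import random


def max_drawdown_r(r_multiples):
    """Deepest peak-to-trough decline of cumulative R (zero or negative)."""
    equity = peak = worst = 0.0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        worst = min(worst, equity - peak)
    return worst


def monte_carlo_drawdowns(r_multiples, runs=2000, mode="shuffle", seed=0):
    """Sorted max drawdowns (most negative first) across resampled trade paths."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        if mode == "shuffle":        # same trades, new order
            path = rng.sample(r_multiples, len(r_multiples))
        elif mode == "bootstrap":    # draw with replacement
            path = rng.choices(r_multiples, k=len(r_multiples))
        else:
            raise ValueError("mode must be 'shuffle' or 'bootstrap'")
        results.append(max_drawdown_r(path))
    return sorted(results)


# Hypothetical record: 40 wins of +2.5R and 60 losses of -1R.
trades = [2.5] * 40 + [-1.0] * 60
dds = monte_carlo_drawdowns(trades, runs=1000)
print("median max drawdown (R):", dds[len(dds) // 2])
print("5th-percentile max drawdown (R):", dds[int(0.05 * len(dds))])
```

Sizing against a tail percentile of this distribution, rather than against the single drawdown the backtest happened to produce, is the practical use of the exercise.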

Sensitivity, regime, and stress

Parameter sensitivity: nudge every parameter ±10–20%; performance should degrade gracefully (a plateau), not collapse (Section 09).
Regime slicing: break results out by trending / ranging / high- vs low-volatility periods. A robust system needn't excel everywhere, but it must not be catastrophic in its off-regime — and its regime filter should keep it largely flat there.
Stress testing: replay the worst historical windows, double your slippage, widen spreads, and gap price through a stop. If the system only survives benign conditions, it isn't validated.

"Would I trade this?" — pre-validation checklist

Positive expectancy after pessimistic costs, over 100+ trades spanning multiple regimes.
Out-of-sample and walk-forward results hold (WFE not a cliff).
Parameter plateau, not a peak; survives ±20% nudges.
Monte Carlo worst-case drawdown is one your sizing and psychology can survive.
No single regime, instrument, or year carries the entire result.

Robustness beats optimisation A slightly sub-optimal system that is stable across data, parameters, and luck will out-earn a backtest-optimal one that is fragile — because live markets will differ from your sample. The goal is not the best historical curve; it is an edge that remains profitable while you are slightly wrong about everything.

12From Backtest to Live

The gap between a validated backtest and a profitable live account is where most edges quietly die — not from a flawed system, but from the implementation gap. Crossing it is a deliberate, staged process, and the first thing you measure live is not profit.

Three testing modes

Mode	What it tests	Blind spot
Paper / simulation	Logic and operational bugs, with idealised fills	Real slippage and the operator's nerves
Forward test (real-time data)	The truest data preview — same feed, no peeking ahead	Fills if paper; psychology if not live
Live micro-size	Real fills and real psychology — the only test that includes you	Costs real money (kept tiny on purpose)

The implementation gap

These are the backtest assumptions that break in contact with a live venue. Each one widens the gap between hypothetical and realised expectancy:

Slippage worse than modelled, especially on stops in fast markets.
Latency between signal and fill — the price you saw isn't always the price you get.
Partial or rejected fills, and requotes in thin liquidity.
Spread spikes at session opens and news that your fixed-spread backtest never charged you.
The operator — hesitating on a valid signal, overriding a loss, sizing up after a win.

Incubate, then scale in

Do not jump from simulation to full size. Run the system forward — paper or micro — for a window long enough to span a few dozen trades and at least one regime shift, then ramp capital in stages, each stage gated on the live edge continuing to track the backtest within tolerance.

micro size → small size → target size
# advance a stage only if: live expectancy ≈ backtest expectancy (within tolerance)
# AND drawdown is inside the limit AND execution stats match the cost model
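The gate can be written as a deterministic check so that advancing a stage is never a judgement call. This is a sketch: the class, the stage names as strings, and the tolerance defaults are hypothetical, and the tolerances in particular should come from your own validation, not from this example.

```python
from dataclasses import dataclass

STAGES = ("micro", "small", "target")


@dataclass
class LiveStats:
    expectancy_r: float   # realised expectancy per trade, in R
    drawdown: float       # current drawdown as a positive fraction
    avg_cost_r: float     # observed cost per trade, in R


def next_stage(stage, live, backtest_expectancy_r, modelled_cost_r,
               dd_limit, exp_tol=0.10, cost_tol=0.05):
    """Advance one stage only if all three conditions hold; otherwise stay put."""
    tracking = abs(live.expectancy_r - backtest_expectancy_r) <= exp_tol
    within_dd = live.drawdown <= dd_limit
    costs_ok = live.avg_cost_r <= modelled_cost_r + cost_tol
    i = STAGES.index(stage)
    if tracking and within_dd and costs_ok and i < len(STAGES) - 1:
        return STAGES[i + 1]
    return stage


live = LiveStats(expectancy_r=0.22, drawdown=0.05, avg_cost_r=0.10)
print(next_stage("micro", live, backtest_expectancy_r=0.24,
                 modelled_cost_r=0.10, dd_limit=0.15))
```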

Go-live checklist

Feed parity verified — same data source as backtest (Section 07).
Costs modelled and matching observed spread/slippage.
Position-size formula re-derived and unit-tested against hand calculations.
Max-drawdown limit and a kill switch coded, not just intended.
Logging of every decision and fill; alerting on anomalies; reconciliation of system state vs broker state.
A written, pre-committed pause rule (Section 14) — decided in calm, not in drawdown.

Measure tracking before profit The first live question is not "am I making money?" — over a small sample, that's mostly noise. It is "does live expectancy track backtest expectancy within tolerance?" If realised results diverge materially from the validated estimate, halt and find the cause — data, costs, slippage, or your own execution — before adding a cent of size. A divergence you ignore at micro-size becomes a catastrophe at target size.

13Execution & Operations

A validated edge can still bleed out through execution. This is the engineering layer — how orders actually reach the market, the FX-specific frictions, and the fail-safes that matter most precisely when a system is handling real money and something goes wrong.

Order types

Order	Behaviour	Certainty	Use for
Market	Fills now at best available price	Fill certain, price uncertain	When speed > price; risky in thin liquidity
Limit	Fills at your price or better	Price certain, fill uncertain	Entries at a level, taking profit
Stop (stop-market)	Becomes a market order when the level trades	Fill certain once triggered, price uncertain (slippage)	Stop-losses, breakout entries
Stop-limit	Becomes a limit order when triggered	Price certain, fill uncertain	Careful entries — dangerous for stop-losses, as it can leave you unprotected in a fast move
Trailing stop	A stop that follows price by a set distance	Locks in gains progressively	Letting winners run

The market-vs-limit trade-off Every entry is a choice between fill certainty (market) and price certainty (limit). High-frequency or thin-edge systems are exquisitely sensitive to this: a few tenths of a pip of extra slippage, taken hundreds of times, is the whole edge. Match the order type to how much your strategy can pay in slippage versus how badly it needs to be in the trade.

FX execution realities

Spread on every round trip: you pay it entering and it's baked into your exit. It is the most certain cost you have — model it on the bid/ask you actually trade.
Swap / rollover applies at the daily rollover (around 17:00 New York); triple swap is typically charged once a week for the weekend. Material for any multi-day hold.
Liquidity windows: the London/New York overlap is deepest and cheapest; the Asian session and the Friday close are thin and wide; the Sunday open can gap.
The weekend gap: price can open Monday far from Friday's close, leaping over stops — size and hold with that in mind.

The decision / risk split

Core principle for automated & AI systems Use judgement (human or model) to choose trades; use deterministic rules to enforce risk — never the reverse. A probabilistic or discretionary component — including an LLM — may decide whether to take a setup, but the limits that keep you solvent (stop placement, position size, the kill switch, the daily-loss cap) must be hard, deterministic code in the hot path. A non-deterministic component sitting inside the risk-enforcement loop is account-draining: its failures are quiet, context-dependent, and impossible to fully test. Decisions can be soft; risk limits must be hard.
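A minimal sketch of the split follows. The soft component, whatever it is, contributes a single boolean; everything that protects the account is plain arithmetic and comparisons. All names here are hypothetical, and `value_per_point` stands in for however your instrument converts a price move into account currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    risk_fraction: float        # fraction of equity risked per trade, e.g. 0.005
    daily_loss_cap: float       # fraction of equity; breach blocks new entries
    kill_switch_on: bool = False


def gated_order_size(wants_trade, equity, day_pnl, stop_distance,
                     value_per_point, limits):
    """Return the order size in units, or 0.0 if any hard limit says no.

    wants_trade is the only input a discretionary or model-driven component
    may supply. It can veto a trade but cannot alter size or limits.
    """
    if limits.kill_switch_on:
        return 0.0
    if day_pnl <= -limits.daily_loss_cap * equity:
        return 0.0
    if not wants_trade or stop_distance <= 0:
        return 0.0
    risk_amount = equity * limits.risk_fraction
    return risk_amount / (stop_distance * value_per_point)


limits = RiskLimits(risk_fraction=0.005, daily_loss_cap=0.02)
print(gated_order_size(True, 10_000.0, 0.0, 0.0050, 1.0, limits))
```

The design choice is that the soft decision sits upstream of the gate and has no write access to it: a confused or compromised decision component can at worst cause missed trades.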

Fail-safes & operational hardening

In trading, bugs rarely throw a clean exception — they lose money silently. Engineer the system to fail loudly and fail flat.

Kill switch: a global halt triggered by a daily-loss limit, an error-rate spike, or a data/broker disconnect — flatten and stop, don't "keep trying".
Circuit breakers: max-daily-loss and max-drawdown limits that automatically halt new entries.
Idempotency: idempotent order keys so a retry after a timeout never double-submits a position.
State reconciliation: continuously verify the system's view of open positions against the broker's truth; alert on any mismatch.
Connectivity & heartbeat: detect disconnects fast and behave safely — never leave orphaned orders or unmanaged positions.
Deterministic runtime: freeze inputs at decision time (point-in-time data, cached values), so the same bar always produces the same action — no surprise recomputation.
Full audit log: every signal, order, fill, and rejection recorded — both for debugging and for honest performance attribution.
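Two of these fail-safes, idempotent order keys and state reconciliation, are small enough to sketch. The gateway class below is an in-memory stand-in, not any broker's API; the key format and all names are hypothetical assumptions.

```python
import hashlib


def order_key(strategy_id, symbol, bar_time, side):
    """Deterministic key: the same signal always maps to the same key."""
    raw = f"{strategy_id}|{symbol}|{bar_time}|{side}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


class OrderGateway:
    """In-memory stand-in for a broker adapter that deduplicates by key."""

    def __init__(self):
        self.accepted = {}

    def submit(self, key, order):
        if key in self.accepted:
            return self.accepted[key]   # retry: hand back the original acknowledgement
        ack = {"order": order, "status": "accepted"}
        self.accepted[key] = ack
        return ack


def reconcile(system_positions, broker_positions):
    """Return {symbol: (system_qty, broker_qty)} for every mismatch."""
    symbols = set(system_positions) | set(broker_positions)
    return {
        s: (system_positions.get(s, 0), broker_positions.get(s, 0))
        for s in symbols
        if system_positions.get(s, 0) != broker_positions.get(s, 0)
    }


gateway = OrderGateway()
key = order_key("trend-pullback", "GBPUSD", "2024-01-02T10:00", "buy")
first = gateway.submit(key, {"symbol": "GBPUSD", "qty": 1000})
retry = gateway.submit(key, {"symbol": "GBPUSD", "qty": 1000})  # after a timeout
print(len(gateway.accepted), first is retry)
print(reconcile({"GBPUSD": 1000}, {"GBPUSD": 1000, "EURUSD": 500}))
```

Any non-empty result from `reconcile` is the kind of condition that should alert a human and, depending on your rules, trip the kill switch.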

Crash safe, not crash trying A system that halts cleanly and flat on an unexpected condition is far safer than one that gamely keeps placing orders into a state it doesn't understand. When in doubt, the correct default for a trading system is to stop and alert a human — flat is a position you can always recover from.

14Monitoring & Edge Decay

A live system is not "set and forget". Edges decay, regimes shift, and markets adapt to the inefficiencies you're exploiting. The job after launch is to know — quantitatively — whether the system is still the one you validated, and to have decided in advance what you'll do when it isn't.

Track live against backtest, continuously

Maintain rolling live statistics — expectancy, profit factor, win rate, average R — and compare them to the distribution your backtest and Monte Carlo produced (Section 11). Inside those bounds is normal variance; persistently outside them is a signal worth investigating.

Treat the equity engine like a process under control. While rolling expectancy wanders within the limits (set from your Monte Carlo / historical drawdown distribution), it's noise. A sustained move below the lower limit is a trigger to investigate — not necessarily to stop, but to find out why.
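A control-band check of this kind might look like the sketch below. The window length, the lower limit, and the patience count are hypothetical placeholders; in practice the limit would come from your Monte Carlo or historical distribution, as the text describes.

```python
def rolling_expectancy(r_multiples, window):
    """Mean R over each trailing window of trades."""
    return [
        sum(r_multiples[i - window:i]) / window
        for i in range(window, len(r_multiples) + 1)
    ]


def sustained_breach(values, lower_limit, patience):
    """True once values sit below lower_limit for `patience` consecutive readings."""
    run = 0
    for v in values:
        run = run + 1 if v < lower_limit else 0
        if run >= patience:
            return True
    return False


# Hypothetical live record in R, checked with a 4-trade window.
live_r = [1.0, -1.0, 2.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
band = rolling_expectancy(live_r, window=4)
print(band)
print("investigate:", sustained_breach(band, lower_limit=-0.5, patience=3))
```

Requiring several consecutive readings below the limit is what separates a sustained move from a single bad window.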

Variance or decay? — the hard distinction

The central difficulty of monitoring is telling a normal losing streak (variance around a still-positive mean, which Section 03 proved is inevitable) from genuine edge decay (the inefficiency is gone). Over-react and you abandon good systems in normal drawdowns; under-react and you feed a dead one. The only defence is pre-defined, quantitative thresholds set while calm.

Cause of decay	What happened	Tell
Regime change	Your archetype's regime left (trend turned to chop)	Underperformance concentrated in one regime; filter no longer firing
Crowding	Others found and arbitraged the same edge	Slow, persistent erosion of expectancy across regimes
Structural change	Market microstructure, spreads, or participants shifted	Costs/slippage drift up; fills worsen vs the model
Parameter drift	The world moved; your fixed parameters didn't	Walk-forward would now pick very different values

Pre-committed pause and retire rules

Decide these in calm and write them down — because in a live drawdown your judgement is compromised by the very situation it's judging.

Pause when a hard limit is breached: max drawdown hit, or daily-loss cap reached. Stop new entries, reassess.
Review when live expectancy sits outside its control band for a pre-set number of trades — investigate before deciding.
Retire when the thesis is invalidated — the market behaviour the system bets on demonstrably no longer holds. A dead edge doesn't deserve loyalty.

Two symmetric failure modes Monitoring exists to defend against both: abandoning a good system inside a statistically normal drawdown, and clinging to a dead system hoping it returns. In the moment you will be biased toward one or the other depending on your recent results — which is exactly why the thresholds must be set in advance, quantitatively, and obeyed.

15Psychology & Adherence

The system is the easy part. The hard part is the human operating it. The most common reason a profitable system loses money is not a flaw in the rules — it is a failure to follow them. Discipline is not a personality trait you either have or lack; it is infrastructure you build so that in-the-moment judgement can't quietly destroy the edge.

How a profitable system becomes a losing one

Every item below converts a positive-expectancy system into a negative one — without changing a single rule:

Overriding a valid signal because it "feels wrong", which tends to happen just as the next winner arrives.
Skipping trades after a losing streak — abandoning the system at the bottom, missing the recovery.
Sizing up after wins — the most dangerous one; the biggest losses tend to follow the biggest, most confident bets.
Revenge trading after a loss — taking setups outside the system to "make it back".
Moving stops to avoid being wrong — converting a defined 1R loss into an undefined disaster.

The biases doing the damage

Bias	Mechanism	Damage to the system
Loss aversion	Losses hurt ~2× as much as equivalent gains feel good	Cutting winners early, holding losers past the stop
Recency bias	Over-weighting the last few trades	Abandoning the system after a normal losing streak
Outcome bias	Judging a decision by its single result	Distrusting a good system after an unlucky loss
Gambler's fallacy	Believing you're "due" for a win	Sizing up to recover, breaking risk rules
Post-win overconfidence	Recent success inflates perceived skill	Sizing up into the next, larger loss
Confirmation bias	Seeking evidence for what you want to do	Rationalising past the regime filter's "no"

Systematic does not mean emotionless

Even a fully automated system leaves you one decision: whether to keep it running through a drawdown. That single choice — made under maximum emotional pressure — is where most automated edges die too. The answer is not to "be more disciplined"; it is to remove the moment of weakness from the loop wherever you can.

The journal: making adherence measurable

Log every trade — entry, exit, size, R-multiple — and whether you followed the system, and if not, why. Then separate system P&L (what the rules would have made) from deviation P&L (the cost of your overrides). This turns discipline from a vague aspiration into a number you can confront.

The feedback loop Most traders discover, on doing this honestly, that their overrides have a strongly negative expectancy — that they would have made more money asleep. Quantifying the cost of discretion is the most powerful argument for sticking to the system, and it is exactly the kind of continuous, evidence-based coaching the Reign Edge journal is designed to surface.
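The system-versus-deviation split is simple bookkeeping once each journal entry records both what the rules would have made and what actually happened. The field names and the three sample entries below are hypothetical.

```python
def split_pnl(journal):
    """Separate rule-following P&L from the cost (or benefit) of overrides, in R."""
    system = sum(t["system_r"] for t in journal)
    actual = sum(t["actual_r"] for t in journal)
    overrides = [t for t in journal if not t["followed"]]
    return {
        "system_r": system,                # what the rules would have made
        "actual_r": actual,                # what the account actually made
        "deviation_r": actual - system,    # negative means overrides cost money
        "override_count": len(overrides),
    }


journal = [
    {"followed": True, "system_r": 2.0, "actual_r": 2.0, "note": ""},
    {"followed": False, "system_r": 1.5, "actual_r": 0.0, "note": "skipped after losses"},
    {"followed": False, "system_r": -1.0, "actual_r": -2.5, "note": "moved the stop"},
]
print(split_pnl(journal))
```

This only works if `system_r` is filled in for skipped and altered trades too, which means replaying the rules after the fact for every signal, not just the ones taken.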

Building discipline infrastructure

Pre-commit the rules in calm — including pause/retire thresholds (Section 14) — when you are not in a trade.
Automate what you can — automation removes the moment of weakness entirely; a rule a machine executes can't be overridden in a panic.
Size so you can sleep (Section 04) — most overrides are driven by positions that are simply too large.
Use a pre-trade checklist — force every entry through the same gate, every time.

Separate the decision from the outcome A good decision can lose; a bad decision can win. Judging your process by individual outcomes trains exactly the wrong reflexes — abandoning sound systems after unlucky losses, trusting reckless ones after lucky wins. Grade the process, not the trade. Over a large enough sample, a sound process and faithful execution are the only things you actually control.

16Worked Example & Pre-Launch Checklist

Every preceding section, applied once, end to end, to a single concrete system. This is the full lifecycle — from a one-sentence hypothesis to a monitored live system with pre-committed exit rules — run as one continuous workflow.

The lifecycle loop

The lifecycle. Sections 06→02→08→11→12→13→14 mapped onto a loop: a system is continuously re-validated and improved, and the loop only stops when the edge is retired. "Set and forget" is not on this diagram.

End to end: the GBPUSD trend-pullback

Taking the system specified in Section 02 through every stage:

Hypothesise. "Liquid FX trends persist intraday; after a shallow pullback to the mean within an established uptrend, continuation is more likely than reversal." One sentence, falsifiable.
Specify. The full eight-component spec from Section 02 — universe (GBPUSD 1H), regime filter (200/50-EMA + news blackout), setup (pullback to 20-EMA), entry (buy-stop above the reclaim bar), stop (1.5×ATR = 1R), size (0.5% fixed-fractional), exit (half at +1R, trail remainder, 24-bar time stop), manage (breakeven at +1R). The stranger test passes.
Data. Several years of 1H bars from a single feed that will also be the live broker, spanning trending, ranging, and at least one volatile crisis period; tick data reserved to resolve intrabar stop-vs-target order (Section 07).
Backtest. Fast vectorised triage to confirm the idea has a pulse, then an event-driven re-test mirroring the live engine, with pessimistic spread + slippage + swap (Section 08).
Validate. In-sample / out-of-sample split; rolling walk-forward; Monte Carlo on the trade sequence; ±20% parameter sweep for a plateau; regime slicing (Section 11).
Forward & micro-live. Run forward on real-time data, then micro-size live, tracking live expectancy against the backtest distribution (Section 12).
Scale & monitor. Ramp size only while live tracks backtest; maintain a control band; obey the pre-committed pause/retire rules (Sections 12 & 14).

Illustrative results only — not a real backtest The figures below are hypothetical, shown only to demonstrate how a validation report reads. They are not results from any actual system.

Metric	In-sample	Out-of-sample	Read
Trades	420	180	Ample sample both windows
Expectancy	+0.28R	+0.24R	Holds OOS — encouraging
Win rate / payoff	42% / 2.6	40% / 2.5	Low win rate, high payoff — consistent with a trend system
Profit factor	1.55	1.48	Solid, not suspiciously high
Max drawdown	14%	16%	Survivable; check Monte Carlo tail
Walk-forward efficiency	0.86		OOS keeps most of IS edge — robust, not a cliff

The master pre-launch checklist

Before a single real dollar

Edge: positive expectancy after pessimistic costs, 100+ trades, multiple regimes (§03, §08).
Specification: total, contradiction-free, passes the stranger test; few justified parameters (§06).
Data: single feed, parity backtest↔live, gaps and timezones handled (§07).
Validation: OOS holds, walk-forward efficiency healthy, parameter plateau, Monte Carlo worst-case survivable (§11).
Sizing: risk-per-trade and portfolio heat set backwards from a survivable drawdown; well under Kelly (§04).
Operations: kill switch, daily-loss cap, idempotent orders, reconciliation, logging — coded, not intended (§13).
Discipline: pre-committed pause/retire thresholds; a journal that separates system P&L from deviation P&L (§14, §15).

Failure-mode map

How systems blow up	Prevented by
Trading with no real edge	§03 expectancy · §08 honest backtest · §11 validation
Sizing too large → ruin / abandonment	§04 sizing & risk of ruin
Overfitting a gorgeous backtest	§09 parsimony · §11 walk-forward & Monte Carlo
Look-ahead / data bias inflating results	§07 data hygiene · §08 event-driven loop
Costs quietly eating the edge	§08 cost model · §13 execution
No regime awareness (right system, wrong market)	§02 filter · §05 archetypes · §14 monitoring
Non-deterministic logic in the risk path	§13 decision/risk split & fail-safes
Abandoning a good system / clinging to a dead one	§14 pre-committed rules · §15 adherence

The whole handbook in one sentence Find a real, specified edge; prove it honestly on history; size it so a normal losing streak can't kill you; operate it deterministically; and have decided, in advance and in writing, exactly when you'll pause, scale, or retire it — because every other decision will be made by a worse version of you, mid-drawdown.

17Glossary

The core vocabulary of system development, in plain terms. Each definition is the working sense used throughout this handbook.

Edge

A statistical advantage that produces positive expectancy after costs over a large sample.

Expectancy

Average profit/loss per trade: (win% × avg win) − (loss% × avg loss). The master profitability number.

R-multiple

Profit/loss expressed in units of initial risk. A trade making twice its risk is +2R; a full stop-out is −1R.

Payoff ratio

Average winning trade ÷ average losing trade (reward:risk). Couples with win rate to determine the edge.

Profit factor

Gross profit ÷ gross loss. Above 1.0 is profitable; 1.3–1.6 is solid.

Win rate

Proportion of trades that profit. Meaningless without the payoff ratio.

Drawdown

A decline in equity from a prior peak, measured in percent or currency.

Max drawdown

The largest peak-to-trough equity decline over a period; its duration often matters more than its depth.

CAGR

Compound annual growth rate — the smoothed annualised return. Ignores risk in isolation.

Sharpe ratio

Excess return ÷ total volatility. Penalises all volatility and assumes near-normal returns.

Sortino ratio

Excess return ÷ downside volatility — fairer to systems with asymmetric (upside-skewed) returns.

Calmar / MAR

CAGR ÷ |max drawdown| — return per unit of worst pain.

Position sizing

Translating a risk budget and stop distance into a trade quantity. The survival lever.

Fixed fractional

Risking a constant percentage of equity per trade. The sensible default.

Kelly criterion

The growth-optimal bet fraction. Used only fractionally and as a ceiling, never raw.

Risk of ruin

The probability of losing enough capital to be unable or unwilling to continue.

Portfolio heat

Total open risk across all positions at once; correlated positions count as one.

Regime

The prevailing market behaviour (trending, ranging, high/low volatility). Every edge needs a specific one.

Backtest

Simulating a system on historical data to estimate its expectancy and conditions of success.

In-sample / out-of-sample

Data used for development (IS) versus data reserved for honest, one-shot testing (OOS).

Walk-forward analysis

Repeatedly optimising on one window and testing on the next unseen window, rolling forward.

Monte Carlo

Reshuffling/resampling trade results many times to reveal the distribution of outcomes and drawdowns.

Overfitting

Fitting the noise in a historical sample rather than the signal; looks great in backtest, fails live.

Look-ahead bias

Using information not available at decision time. The deadliest silent inflator of results.

Survivorship bias

Testing only instruments that still exist, ignoring those that failed and were removed.

Slippage

The difference between the expected fill price and the actual one; worst on stops in fast markets.

Spread

The bid/ask gap — a cost paid on entry and embedded in exit. Widens in thin liquidity and around news.

Swap / rollover

Interest earned or paid for holding an FX position overnight, based on the rate differential.

MAE / MFE

Maximum Adverse / Favourable Excursion — how far a trade ran against / for you before closing.

Kill switch

A global halt that flattens positions and stops trading on a defined dangerous condition.

Idempotency

Designing order submission so a retry never duplicates a position — critical for safe automation.

Walk-forward efficiency

Out-of-sample performance ÷ in-sample performance; a measure of how well an edge generalises.

Keep reading

The Technical Analysis Handbook covers the entry-and-structure layer this handbook assumes — candlesticks, market structure, indicators, patterns, confluence, and order flow — each with exact rules.

