01 Concept

Scale Function

The Opening Problem

Consider a diffusion process $X(t)$ that starts at a point $x$ lying strictly between two levels $a$ and $b$, with $a \lt x \lt b$. The process drifts and fluctuates, and sooner or later it will leave the interval $(a,b)$ through one of its two ends. The question that organizes this entire page is deceptively simple to state: what is the probability that $X$ reaches the upper level $b$ before it reaches the lower level $a$? Because the diffusion is a Markov process, this probability depends only on the present position $x$ and not on the history of how the process arrived there, so it is meaningful to write it as a function of the starting point, $\mathbb{P}_x(\tau_b \lt \tau_a)$.

This two-sided exit problem is not a curiosity; it is one of the load-bearing questions of the whole theory, because an astonishing range of practical quantities are exit probabilities in disguise. In population biology, $X$ might track the frequency of an allele in a finite population, and the question becomes whether the allele fixes (reaches frequency $b=1$) or is lost (reaches $a=0$) — the basic dichotomy of genetic drift. In a sequential signal-detection or quality-control setting, $X$ might be the accumulated log-likelihood ratio between two competing hypotheses, drifting upward under one and downward under the other; the experimenter sets two decision thresholds and acts as soon as either is crossed, so the reliability of the procedure is exactly the probability of hitting the correct threshold first. The same structure appears in the ruin problem of insurance, the firing of a neuron whose membrane potential accumulates toward a spike threshold, and the triggering of a trade when a price reaches a target before a stop-loss.

What unites these examples is that we are never asking when the process exits — only where. That distinction will turn out to be the key to a clean answer, and we will hold the timing question deliberately at arm's length until much later. For now we resist the temptation to answer the exit problem directly, and instead look at the two simplest diffusions, where the answer can be obtained by hand and where the path to the general theory becomes visible.

Warm-Up: The Simple Cases

Standard Brownian Motion on $[a,b]$

Let $X(t) = W(t)$ be standard Brownian motion started at $x$, and define the two first passage times

$$\tau_a = \inf\{t \ge 0 : X(t) = a\}, \qquad \tau_b = \inf\{t \ge 0 : X(t) = b\}.$$

Both are stopping times, and we write $\tau = \tau_a \wedge \tau_b$ for the moment the process first leaves $(a,b)$. The decisive structural fact about Brownian motion is that it is a martingale: on average it neither rises nor falls. If we may evaluate that "no net movement" property at the random time $\tau$ rather than at a fixed time, then the expected position at exit must equal the starting position. This is precisely what the optional stopping theorem licenses, provided $\tau$ is almost surely finite and $X$ stays bounded up to $\tau$. Both conditions hold here: the process is confined to $[a,b]$ until it exits, and Brownian motion leaves any bounded interval in finite time almost surely (indeed $\mathbb{E}_x[\tau] = (x-a)(b-x) \lt \infty$). Optional stopping therefore gives

$$\mathbb{E}_x[X(\tau)] = x.$$

At the exit time the process sits at exactly one of the two boundaries, so $X(\tau)$ takes the value $a$ or $b$. Writing $p = \mathbb{P}_x(\tau_b \lt \tau_a)$ for the probability of exiting at the top, the expectation on the left is a weighted average of the two boundary values:

$$b\,p + a\,(1-p) = x.$$

Solving this single linear equation yields the answer to our problem in the Brownian case:

$$\mathbb{P}_x(\tau_b \lt \tau_a) = \frac{x-a}{b-a}.$$

The result is linear in the starting point, and that linearity has a vivid geometric meaning. Start exactly in the middle and you are equally likely to exit on either side; start a fraction of the way up the interval and that same fraction is your probability of reaching the top first. The probability is simply the relative position of $x$ within $[a,b]$. This is the cleanest possible answer, and the rest of the theory is, in a precise sense, an effort to recover this same cleanliness for diffusions that are not so symmetric.
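The linear formula can be sanity-checked numerically. The Python sketch below makes one assumption of its own: a simple symmetric random walk on a lattice with step $h = (b-a)/n$ stands in for Brownian motion. Like Brownian motion, the walk is a martingale, and it exits exactly at $a$ or $b$ with no overshoot, so the same optional-stopping argument gives it the same answer. The names `walk_exit_top_prob`, `trials` and `seed` are illustrative.

```python
import random


def walk_exit_top_prob(k: int, n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate P(hit n before 0) for a simple symmetric random walk started at k.

    Hypothetical stand-in for Brownian motion on [a, b]: with step h = (b - a) / n,
    lattice site k corresponds to x = a + k * h. The walk is a martingale and exits
    exactly at site 0 (x = a) or site n (x = b), just like Brownian motion.
    """
    rng = random.Random(seed)          # fixed seed: reproducible estimate
    hits_top = 0
    for _ in range(trials):
        pos = k
        while 0 < pos < n:             # run until the walk leaves (0, n)
            pos += 1 if rng.random() < 0.5 else -1
        if pos == n:
            hits_top += 1
    return hits_top / trials


a, b, n = -1.0, 3.0, 20
k = 5                                  # lattice site of the start
x = a + k * (b - a) / n                # x = 0.0
estimate = walk_exit_top_prob(k, n)
exact = (x - a) / (b - a)              # (0 - (-1)) / 4 = 0.25
print(f"simulated {estimate:.3f}  vs  (x - a)/(b - a) = {exact:.3f}")
```

With a fixed seed the estimate is reproducible. Its Monte Carlo standard error is about $\sqrt{p(1-p)/\text{trials}} \approx 0.003$ for these values.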

Brownian Motion with Constant Drift

Now tilt the process. Let $X$ satisfy $dX(t) = \mu\,dt + dW(t)$ with a constant drift $\mu \neq 0$, and pose the identical exit problem on $(a,b)$. The previous argument breaks at its very first step: $X$ is no longer a martingale, because the drift gives it a systematic tendency to move in one direction, so $\mathbb{E}_x[X(\tau)] \neq x$ in general. We cannot equate the expected exit position with the start, and the linear answer collapses.

That a positive drift should raise the probability of exiting at the top is intuitively clear, but the dependence on $x$ is now genuinely nonlinear. To see why, notice that the drift makes positions unequally "valuable": being near $b$ when the drift points upward is worth more than the same distance would be worth for driftless motion, because the current is helping you along. The probability therefore cannot grow at a constant rate as $x$ moves across the interval.
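The failure of the linear answer is easy to see by simulation. The minimal Python sketch below uses a hypothetical helper, `euler_exit_top_prob`, which discretizes $dX = \mu\,dt + dW$ with the Euler–Maruyama scheme and a step `dt` of our choosing. Exit is detected only on the time grid, so the estimate carries a small bias of order $\sqrt{dt}$. For $\mu = 1$ and a start at the midpoint of $[0,1]$, the estimate should sit well above the linear guess of $0.5$. Because $X(\tau)$ is essentially $0$ or $1$, this makes $\mathbb{E}_x[X(\tau)] \neq x$ directly visible.

```python
import math
import random


def euler_exit_top_prob(x: float, a: float, b: float, mu: float,
                        dt: float = 1e-3, trials: int = 5_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P_x(tau_b < tau_a) for dX = mu dt + dW.

    Euler-Maruyama with step dt. Exit is only checked on the time grid, so the
    estimate has a small discretization bias of order sqrt(dt).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    hits_top = 0
    for _ in range(trials):
        pos = x
        while a < pos < b:             # step until the path leaves (a, b)
            pos += mu * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        if pos >= b:
            hits_top += 1
    return hits_top / trials


x, a, b, mu = 0.5, 0.0, 1.0, 1.0
estimate = euler_exit_top_prob(x, a, b, mu)
linear_guess = (x - a) / (b - a)       # the driftless answer, 0.5
print(f"simulated {estimate:.3f}  vs  linear guess {linear_guess:.3f}")
```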

The martingale argument was so clean that we should be reluctant to abandon it. The obstacle is not the method but the coordinate: $X$ itself fails to be a martingale. So we ask whether there is some function of the process that restores the martingale property. Concretely, we look for a strictly increasing function $s$ such that the transformed process $s(X(t))$ is a martingale. If such an $s$ exists, then $s(X)$ plays exactly the role that $X$ played in the driftless case, and the entire optional-stopping computation goes through verbatim in the new coordinate. Finding that function is a calculus problem, and the tool that converts "function of a diffusion" into something whose drift we can read off is Itô's formula. This is the idea from which the scale function is born.

The Scale Function: Definition and Meaning

The Key Idea

The scale function $s(x)$ is the function, unique up to an increasing affine transformation, for which $s(X(t))$ becomes a (local) martingale. Where the raw process carries a drift, the composed process $s(X)$ carries none. In effect, $s$ does not alter the dynamics of $X$; it relabels the state space, assigning new coordinates in which the process is driftless. One can say that $s$ levels the playing field: it stretches the axis exactly enough in each region to absorb the local pull of the drift. In the new coordinate the process has no preferred direction and behaves, as far as exit probabilities are concerned, like Brownian motion. A diffusion that is already a local martingale in its own coordinate is said to be in natural scale. Once $s$ has moved the process to natural scale, the linear Brownian answer applies, and the only remaining work is to compute $s$.

Formal Definition

Take a regular one-dimensional diffusion on an interval $(l,r)$ governed by

$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW(t),$$

with $\sigma^2(x) \gt 0$ on the interior. We seek a sufficiently smooth function $s$ for which $s(X(t))$ is a local martingale. Applying Itô's formula to $s(X(t))$ and substituting the dynamics of $X$, the increment decomposes into a finite-variation (drift) part and a stochastic-integral (martingale) part:

$$ds(X(t)) = s'(X(t))\,dX(t) + \tfrac{1}{2}s''(X(t))\,d\langle X\rangle(t) = \underbrace{\Big[\mu(X)\,s'(X) + \tfrac{1}{2}\sigma^2(X)\,s''(X)\Big]dt}_{\text{drift part}} + \underbrace{s'(X)\,\sigma(X)\,dW(t)}_{\text{martingale part}}.$$

A process is a local martingale precisely when its drift part vanishes identically. Setting the bracketed coefficient to zero gives the defining equation of the scale function, a linear second-order ordinary differential equation:

$$\tfrac{1}{2}\sigma^2(x)\,s''(x) + \mu(x)\,s'(x) = 0.$$

This ODE is elementary to solve because it is first-order in the unknown $s'$. Dividing by $\tfrac{1}{2}\sigma^2 s'$ rearranges it into $\dfrac{s''(x)}{s'(x)} = -\dfrac{2\mu(x)}{\sigma^2(x)}$, whose left side is $\big(\log s'\big)'$. Integrating once and exponentiating gives the scale density

$$s'(x) = C\,\exp\!\left(-\int^{x} \frac{2\mu(y)}{\sigma^2(y)}\,dy\right),$$

and integrating a second time gives the scale function itself,

$$s(x) = \int^{x} s'(u)\,du = \int^{x}\!\exp\!\left(-\int^{u}\frac{2\mu(y)}{\sigma^2(y)}\,dy\right)du,$$

each integral determined only up to a constant of integration. The two constants are exactly the multiplicative constant $C$ and the additive constant from the outer integral, so $s$ is determined only up to an affine map $s \mapsto \alpha s + \beta$ with $\alpha \gt 0$. This nonuniqueness is harmless for our purpose. The exit probability we are about to write is a ratio of differences of $s$, and under $s \mapsto \alpha s + \beta$ every difference $s(u)-s(v)$ is multiplied by $\alpha$ while every additive $\beta$ cancels; the ratio is therefore unchanged. We are free to normalize $s$ however is convenient — for instance so that $s(a)=0$ and $s(b)=1$.
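Written out for the ratio that the exit formula uses, the cancellation takes one line. It holds for any $\alpha \neq 0$, not only for $\alpha \gt 0$:

$$\frac{\big(\alpha s(x)+\beta\big)-\big(\alpha s(a)+\beta\big)}{\big(\alpha s(b)+\beta\big)-\big(\alpha s(a)+\beta\big)} = \frac{\alpha\,\big(s(x)-s(a)\big)}{\alpha\,\big(s(b)-s(a)\big)} = \frac{s(x)-s(a)}{s(b)-s(a)}.$$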

The Hitting Probability Formula

Because $s$ was constructed to make $s(X)$ a local martingale, the optional-stopping computation from the Brownian warm-up now applies to $s(X)$ in place of $X$. Both hypotheses carry over. First, $s$ is bounded on the closed interval $[a,b]$, so the local martingale stopped at $\tau$ is a genuine bounded martingale. Second, a regular diffusion leaves any compact subinterval $[a,b] \subset (l,r)$ in finite time almost surely. Replacing $x$, $a$, $b$ by $s(x)$, $s(a)$, $s(b)$ in the linear formula gives the central result of this page:

$$\mathbb{P}_x(\tau_b \lt \tau_a) = \frac{s(x)-s(a)}{s(b)-s(a)}.$$

The formula is nothing more than the Brownian answer $\dfrac{x-a}{b-a}$ rewritten in scale coordinates. It says that the exit problem for any regular one-dimensional diffusion is linear once the state space is measured by $s$: the probability of reaching $b$ first is the relative position of the start, not on the original axis, but on the $s$-transformed axis. The two warm-ups are the two extremes of this statement. For Brownian motion the drift is zero, the ODE reduces to $s''=0$, and $s(x)=x$ up to affine maps, so scale coordinates coincide with ordinary coordinates and the formula is exactly the linear one. As soon as a drift is present, $s$ becomes nonlinear, bending the axis to encode the asymmetry that the linear formula cannot.
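When $s$ has no closed form, the formula translates directly into a numerical routine. The Python sketch below is one way to do it; the name `hitting_prob_numeric` is hypothetical. It makes the following assumptions:

- $\mu$ and $\sigma$ are continuous on $[a,b]$, with $\sigma \gt 0$ there.
- The scale density and scale function are built on a uniform grid with the cumulative trapezoid rule (a simple choice; any quadrature would do).
- The normalization is $s(a) = 0$, which the ratio does not see.

```python
import math
from typing import Callable


def hitting_prob_numeric(mu: Callable[[float], float],
                         sigma: Callable[[float], float],
                         a: float, b: float, x: float, n: int = 4000) -> float:
    """P_x(tau_b < tau_a) = (s(x) - s(a)) / (s(b) - s(a)) with s built by quadrature.

    Both integrals start at the lower barrier a, so this normalization has s(a) = 0.
    Assumes mu and sigma are continuous on [a, b] with sigma > 0 there, and a < x < b.
    """
    h = (b - a) / n
    ys = [a + i * h for i in range(n + 1)]
    g = [2.0 * mu(y) / sigma(y) ** 2 for y in ys]      # 2 mu / sigma^2 on the grid
    # Scale density s'(y) = exp(-int_a^y g); inner integral by cumulative trapezoid.
    inner, dens = 0.0, [1.0]
    for i in range(n):
        inner += 0.5 * h * (g[i] + g[i + 1])
        dens.append(math.exp(-inner))
    # Scale function s(y) = int_a^y s'(u) du, again by cumulative trapezoid.
    s = [0.0]
    for i in range(n):
        s.append(s[-1] + 0.5 * h * (dens[i] + dens[i + 1]))
    # Linear interpolation of s at the starting point x.
    i = min(int((x - a) / h), n - 1)
    t = (x - ys[i]) / h
    s_x = s[i] + t * (s[i + 1] - s[i])
    return (s_x - s[0]) / (s[-1] - s[0])


# Driftless Brownian motion: should reproduce (x - a) / (b - a) = 0.25.
print(hitting_prob_numeric(lambda y: 0.0, lambda y: 1.0, -1.0, 3.0, 0.0))
```

For very strong drifts, `math.exp(-inner)` can overflow. Subtracting a constant from the inner integral (an affine change of $s$ that the ratio ignores) would avoid this, but is omitted here for brevity.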

Worked Examples

Example 1: Brownian Motion with Constant Drift

(i) Setup. Let $dX = \mu\,dt + dW$ on the interval $[0,1]$, started at $x \in (0,1)$, with constant $\mu \neq 0$. We want $\mathbb{P}_x(\tau_1 \lt \tau_0)$, the probability of reaching $1$ before $0$.

(ii) Scale density. Here $\mu(x)=\mu$ and $\sigma(x)=1$, so $\dfrac{2\mu(x)}{\sigma^2(x)} = 2\mu$ and

$$s'(x) = \exp\!\left(-\int^{x} 2\mu\,dy\right) = e^{-2\mu x},$$

taking the multiplicative constant to be one.

(iii) Scale function. Integrating from $0$,

$$s(x) = \int_0^x e^{-2\mu y}\,dy = \frac{1 - e^{-2\mu x}}{2\mu}.$$

Normalizing affinely so that $s(0)=0$ and $s(1)=1$ removes the factor $2\mu$ and yields the clean form

$$s(x) = \frac{1 - e^{-2\mu x}}{1 - e^{-2\mu}}.$$

(iv) Hitting probability. With $a=0$ and $b=1$ the formula gives, since $s(0)=0$ and $s(1)=1$,

$$\mathbb{P}_x(\tau_1 \lt \tau_0) = \frac{s(x)-s(0)}{s(1)-s(0)} = \frac{1 - e^{-2\mu x}}{1 - e^{-2\mu}}.$$

(v) Interpretation. The exponential replaces the straight line $x$ of the driftless case with a curve. When $\mu \gt 0$ the curve bulges upward, raising the probability above $x$ for every interior point: the drift helps the process reach the top. The limiting behavior makes this concrete.

- As $\mu \to +\infty$, both $e^{-2\mu x}$ and $e^{-2\mu}$ vanish, and the probability tends to $1$ for every $x \gt 0$. An overwhelming upward current carries the process to the top with probability tending to one.
- As $\mu \to -\infty$, the exponentials dominate instead, and the ratio behaves like $e^{-2\mu x}/e^{-2\mu} = e^{2\mu(1-x)}$. This tends to $0$ for every $x \lt 1$: a strong downward current sends the process to $0$ first.
- As $\mu \to 0$, a first-order expansion of both exponentials reduces the expression to $x$, recovering the linear Brownian answer continuously.
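Here is a small Python sketch of the closed form, assuming nothing beyond the setup of step (i). The name `drift_exit_prob` is hypothetical. Two implementation choices are ours:

- `math.expm1` computes $e^{z}-1$ accurately for small $z$, so the $\mu \to 0$ regime does not suffer from cancellation.
- For $\mu \lt 0$, the function reflects to $Y = 1 - X$, which has drift $-\mu$ and exits at the top exactly when $X$ exits at the bottom. This way no exponential can overflow when $|\mu|$ is large.

```python
import math


def drift_exit_prob(x: float, mu: float) -> float:
    """P_x(tau_1 < tau_0) for dX = mu dt + dW on [0, 1].

    Closed form (1 - exp(-2 mu x)) / (1 - exp(-2 mu)) from the scale function.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if mu == 0.0:
        return x                          # driftless case: the linear answer
    if mu < 0.0:
        # Reflect Y = 1 - X (drift -mu): X exits at 1 iff Y exits at 0.
        return 1.0 - drift_exit_prob(1.0 - x, -mu)
    # mu > 0 here, so both expm1 arguments are negative and cannot overflow.
    return math.expm1(-2.0 * mu * x) / math.expm1(-2.0 * mu)


for mu in (-5.0, -1.0, 1e-9, 1.0, 5.0):
    print(f"mu = {mu:>6}: P_0.3(tau_1 < tau_0) = {drift_exit_prob(0.3, mu):.4f}")
```

At the midpoint the formula simplifies to $1/(1+e^{-\mu})$, which makes a convenient spot check.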

Example 2: Geometric Brownian Motion

(i) Setup. Let $dX = \mu X\,dt + \sigma X\,dW$ on $(0,\infty)$, with $W$ standard Brownian motion, and take $0 \lt a \lt x \lt b$. This is the standard model for a strictly positive quantity such as an asset price; we want the probability of reaching the higher level $b$ before the lower level $a$.

(ii) Scale density. Now $\mu(x)=\mu x$ and $\sigma(x)=\sigma x$, so the ratio is $\dfrac{2\mu(x)}{\sigma^2(x)} = \dfrac{2\mu x}{\sigma^2 x^2} = \dfrac{2\mu}{\sigma^2}\cdot\dfrac{1}{x}$. Writing $\nu = \dfrac{2\mu}{\sigma^2}$,

$$s'(x) = \exp\!\left(-\int^{x}\frac{\nu}{y}\,dy\right) = \exp(-\nu \log x) = x^{-\nu}.$$

(iii) Scale function. Integrating, and writing $\gamma = 1-\nu = 1 - \dfrac{2\mu}{\sigma^2}$,

$$s(x) = \int^{x} u^{-\nu}\,du = \frac{x^{\gamma}}{\gamma} \quad (\gamma \neq 0), \qquad s(x) = \log x \quad (\gamma = 0).$$

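The borderline case $\gamma = 0$ corresponds to $\mu = \tfrac{1}{2}\sigma^2$, where the power law degenerates into a logarithm. The hitting-probability ratio is unchanged when $s$ is multiplied by any nonzero constant. We may therefore drop the factor $1/\gamma$ and take $s(x)=x^{\gamma}$ in the generic case, and $s(x)=\log x$ in the borderline case. (The version $x^{\gamma}$ is decreasing when $\gamma \lt 0$, which the ratio does not notice.)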

(iv) Hitting probability. Substituting into the formula,

$$\mathbb{P}_x(\tau_b \lt \tau_a) = \frac{x^{\gamma}-a^{\gamma}}{b^{\gamma}-a^{\gamma}} \quad (\gamma \neq 0), \qquad \mathbb{P}_x(\tau_b \lt \tau_a) = \frac{\log(x/a)}{\log(b/a)} \quad (\gamma = 0).$$

(v) Interpretation. Read $x$ as a current price, $b$ as a profit target, and $a$ as a stop-loss. The probability of touching the target before the stop is governed entirely by the exponent $\gamma$, which compares the drift $\mu$ against half the variance $\tfrac12\sigma^2$.

A larger drift relative to the variance makes $\gamma$ smaller and shifts the weight of the scale density $s'(x) = x^{-\nu}$ toward the stop-loss. This pushes $a$ farther away in scale coordinates and raises the chance of reaching the target. A larger volatility drives $\gamma$ toward $1$, making the dependence on $x$ closer to linear.

The logarithmic borderline $\mu = \tfrac12\sigma^2$ is exactly the drift at which the median of geometric Brownian motion is stationary. There $\log X$ is a driftless Brownian motion scaled by $\sigma$, so the exit probability is linear in $\log x$ and depends only on the multiplicative position of $x$ between the two barriers.
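The same formula can be written as a short Python sketch; the name `gbm_exit_prob` is hypothetical. Dividing numerator and denominator by $a^{\gamma}$ rewrites the ratio as $\dfrac{e^{\gamma\log(x/a)}-1}{e^{\gamma\log(b/a)}-1}$, which `math.expm1` evaluates accurately even when $\gamma$ is tiny. This matters in practice. A parameter pair meant to sit on the borderline, such as $\mu = 0.02$ with $\sigma = 0.2$, can produce a $\gamma$ that is not exactly zero in floating point. The price, target and stop-loss values below are illustrative only.

```python
import math


def gbm_exit_prob(x: float, a: float, b: float, mu: float, sigma: float) -> float:
    """P_x(tau_b < tau_a) for dX = mu X dt + sigma X dW with 0 < a < x < b.

    Evaluates (x^g - a^g) / (b^g - a^g), g = 1 - 2 mu / sigma^2, in the equivalent
    form expm1(g log(x/a)) / expm1(g log(b/a)). This stays accurate as g -> 0 and
    equals the logarithmic formula log(x/a) / log(b/a) when g == 0.
    """
    if not 0.0 < a < x < b:
        raise ValueError("need 0 < a < x < b")
    g = 1.0 - 2.0 * mu / sigma ** 2
    lx, lb = math.log(x / a), math.log(b / a)
    if g == 0.0:
        return lx / lb                    # borderline case mu = sigma^2 / 2
    return math.expm1(g * lx) / math.expm1(g * lb)


# Price 100, profit target 120, stop-loss 80, volatility 0.2 (illustrative numbers).
for mu in (0.0, 0.02, 0.05):
    print(f"mu = {mu:.2f}: P(target before stop) = {gbm_exit_prob(100, 80, 120, mu, 0.2):.4f}")
```

With $\mu = 0$ the process is a martingale, so for these barriers the probability is the linear value $0.5$. Raising $\mu$ raises it.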

Example 3: The Ornstein–Uhlenbeck Process

(i) Setup. Let $dX = -\theta X\,dt + \sigma\,dW$ with $\theta \gt 0$, the canonical mean-reverting diffusion, and take $a \lt x \lt b$. The drift always points back toward the origin, so we expect reaching a level far from $0$ to be harder than for Brownian motion.

(ii) Scale density. Here $\mu(x) = -\theta x$ and $\sigma(x)=\sigma$, so $\dfrac{2\mu(x)}{\sigma^2(x)} = -\dfrac{2\theta x}{\sigma^2}$ and

$$s'(x) = \exp\!\left(-\int_0^{x} \left(-\frac{2\theta y}{\sigma^2}\right)dy\right) = \exp\!\left(\frac{\theta x^2}{\sigma^2}\right).$$

(iii) Scale function. Integrating, and writing $\kappa = \sqrt{\theta}/\sigma$, we need an antiderivative of $e^{+t^2}$. It is supplied by the imaginary error function $\operatorname{erfi}(z) = \tfrac{2}{\sqrt{\pi}}\int_0^z e^{t^2}\,dt$:

$$s(x) = \int_0^{x} e^{\theta y^2/\sigma^2}\,dy = \frac{\sigma\sqrt{\pi}}{2\sqrt{\theta}}\,\operatorname{erfi}(\kappa x), \qquad \text{so up to affine maps} \quad s(x) = \operatorname{erfi}\!\left(\frac{\sqrt{\theta}}{\sigma}\,x\right).$$

It is worth pausing on the sign. The mean-reverting drift produces a scale density that grows like $e^{\theta x^2/\sigma^2}$, hence the imaginary error function rather than the ordinary one. This is the mirror image of the Ornstein–Uhlenbeck stationary density, which decays like $e^{-\theta x^2/\sigma^2}$. The distribution function of that density is naturally written through the ordinary error function or the standard normal CDF. The scale function and the stationary density point in opposite directions, a contrast we return to among the pitfalls. Note also that the generator of the process, $L = \tfrac12\sigma^2\,\partial_{xx} - \theta x\,\partial_x$, annihilates $s$, i.e. $Ls = 0$. The scale function is the diffusion's analogue of a harmonic function: a function the generator sends to zero.
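The claim $Ls = 0$ can be checked directly from the scale density of step (ii). Differentiating once more gives $s''$, and the two terms of the generator cancel:

$$s'(x) = e^{\theta x^2/\sigma^2}, \qquad s''(x) = \frac{2\theta x}{\sigma^2}\,e^{\theta x^2/\sigma^2},$$

$$Ls(x) = \tfrac12\sigma^2\,s''(x) - \theta x\,s'(x) = \theta x\,e^{\theta x^2/\sigma^2} - \theta x\,e^{\theta x^2/\sigma^2} = 0.$$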

(iv) Hitting probability. With $\kappa = \sqrt{\theta}/\sigma$,

$$\mathbb{P}_x(\tau_b \lt \tau_a) = \frac{\operatorname{erfi}(\kappa x) - \operatorname{erfi}(\kappa a)}{\operatorname{erfi}(\kappa b) - \operatorname{erfi}(\kappa a)}.$$

(v) Interpretation. Because $\operatorname{erfi}$ grows faster than any power, the scale coordinate stretches the far reaches of the axis enormously. This is the precise statement that levels far from the origin are very hard to reach against the restoring drift. Compared with Brownian motion, mean reversion suppresses the probability of large excursions and lengthens the time spent loitering near $0$.

The limit $\theta \to \infty$ is striking. As the restoring force overwhelms the noise, the stretching becomes so severe that the process hits whichever boundary is closer to the origin with probability tending to one. If $|a| \lt |b|$ the probability tends to $0$ (it reaches $a$ first); if $|b| \lt |a|$ it tends to $1$ (it reaches $b$ first). Strong mean reversion makes the nearer barrier, measured from the mean, the inevitable one.
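Python's standard library has no $\operatorname{erfi}$ (SciPy provides `scipy.special.erfi`). The hitting probability only needs the ratio of two integrals of the scale density, though, so a dependency-free sketch can integrate $e^{\theta y^2/\sigma^2}$ directly with Simpson's rule. One choice of ours keeps it stable: the integrand is divided by its largest value on $[a,b]$, a constant that cancels in the ratio, so large $\theta$ cannot overflow. The name `ou_exit_prob` and the default grid size are hypothetical.

```python
import math


def ou_exit_prob(x: float, a: float, b: float, theta: float, sigma: float,
                 n: int = 4000) -> float:
    """P_x(tau_b < tau_a) for dX = -theta X dt + sigma dW, with a < x < b.

    Computes int_a^x s' / int_a^b s' with s'(y) = exp(theta y^2 / sigma^2) by
    composite Simpson's rule; this equals the erfi ratio in the text. The
    integrand is divided by its maximum on [a, b] (a constant that cancels).
    """
    if not a < x < b:
        raise ValueError("need a < x < b")
    k2 = theta / sigma ** 2
    peak = max(a * a, b * b)                     # max of y^2 on [a, b]

    def dens(y: float) -> float:
        return math.exp(k2 * (y * y - peak))     # scaled scale density, <= 1

    def simpson(lo: float, hi: float) -> float:
        m = n + (n % 2)                          # Simpson needs an even panel count
        h = (hi - lo) / m
        total = dens(lo) + dens(hi)
        for i in range(1, m):
            total += (4.0 if i % 2 else 2.0) * dens(lo + i * h)
        return total * h / 3.0

    return simpson(a, x) / simpson(a, b)


# Strong mean reversion, barriers -1 and 2: the barrier nearer the mean (a = -1) wins.
print(ou_exit_prob(0.0, -1.0, 2.0, theta=50.0, sigma=1.0))
```

With $\theta = 50$, $\sigma = 1$, $x = 0$ and barriers $a = -1$, $b = 2$, the printed probability is essentially zero. This matches the strong-reversion limit described above.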

Geometric and Probabilistic Intuition

The single picture to carry away is that the scale function is a change of ruler. The process moves through real space with a bias, favoring some direction because of its drift. The scale function builds a new ruler for that same space. This ruler assigns short lengths to stretches the drift tends to push the process toward, and long lengths to stretches the drift tends to push it away from. Measured against this new ruler, the bias disappears: equal probabilistic "steps" now correspond to equal increments of $s$. The process explores its new coordinate without preference, exactly as driftless Brownian motion explores ordinary space.

A physical analogy makes the mechanism tangible. Imagine a marble rolling on a gently tilted, uneven sheet, buffeted continually by random taps. The tilt is the drift; it biases where the marble tends to go. The scale function is the act of finding new horizontal and vertical coordinates in which the tilted sheet looks perfectly flat. On that flattened terrain the marble has no downhill to prefer, so the chance of it reaching one edge before another is simply a matter of how far each edge is in the flattened coordinates. Regions the marble was naturally rolling away from are, in the new coordinates, placed correspondingly farther off, which is why they are reached with correspondingly smaller probability. The scale function is the dictionary between the tilted world we observe and the flat world in which the answer is a straight line.

It is equally important to be clear about what this ruler does not measure. The scale function governs where the process goes — the probabilities of the two-sided exit — but it says nothing about when. Two diffusions can share the same scale function, and hence identical hitting probabilities, while one of them takes vastly longer to exit than the other. Timing is carried by a different object entirely, which weighs how much real time the process spends in each region; that is the subject of the Speed Measure, and the two together — scale for direction, speed for duration — determine the diffusion's behavior completely.

The Scale Function and Recurrence

On an unbounded interval the scale function also decides whether the process keeps returning to where it has been or eventually escapes. The criterion is the behavior of $s$ at the two ends.

- If $s(+\infty) = +\infty$ and $s(-\infty) = -\infty$, the scale axis is infinitely long in both directions. From any starting point the process is then certain to reach any level eventually, and the diffusion is recurrent.
- If instead one of these limits is finite, the scale axis has a finite end in that direction. With positive probability the process eventually leaves every bounded set for good and converges to that end, and it does so with probability one if the other limit is infinite. The diffusion is transient.

A full account of how the ends of the interval behave (accessible or not, attracting or not) is the content of natural boundaries and boundary classification.

Brownian motion with drift illustrates the dichotomy exactly. From Example 1 the scale function is proportional to $1 - e^{-2\mu x}$, equivalently $-e^{-2\mu x}$ up to affine maps.

For $\mu \gt 0$ this tends to $-\infty$ as $x \to -\infty$ but to a finite limit as $x \to +\infty$. The process is therefore transient: it drifts off to $+\infty$ almost surely. Every level above the start is reached with certainty. A level $y$ below the start $x$, however, is ever reached only with probability $e^{-2\mu(x-y)} \lt 1$, which follows by letting $b \to \infty$ in the exit formula. For $\mu \lt 0$ the picture is mirrored.

For $\mu = 0$ the scale function is $s(x)=x$, both limits are infinite, and Brownian motion is recurrent: it revisits every level infinitely often. The sign of the drift alone, read through the scale function, flips the process between these qualitatively different fates. The general theory of when and how often a diffusion returns is developed under Recurrence and Transience.
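The limit can be illustrated numerically with a short Python sketch; the function name is hypothetical. Take $\mu = 0.5$, start at $x = 0$, and compute the probability of reaching $y = -1$ before an upper barrier $b$. This uses the exit formula with $s(z) = -e^{-2\mu z}$. As $b$ grows, the probability increases toward $e^{-2\mu(x-y)} = e^{-1} \approx 0.368$ rather than toward $1$, which is transience in numbers.

```python
import math


def prob_hit_below_first(x: float, y: float, b: float, mu: float) -> float:
    """P_x(tau_y < tau_b) for dX = mu dt + dW with y < x < b and mu > 0."""
    def s(z: float) -> float:
        return -math.exp(-2.0 * mu * z)      # scale function, increasing for mu > 0
    return (s(b) - s(x)) / (s(b) - s(y))


mu, x, y = 0.5, 0.0, -1.0
for b in (1.0, 5.0, 20.0, 60.0):
    print(f"b = {b:>4}: P(reach y before b) = {prob_hit_below_first(x, y, b, mu):.6f}")
print("limit exp(-2 mu (x - y)) =", math.exp(-2.0 * mu * (x - y)))
```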

Common Misconceptions and Pitfalls

  • The scale function is not the stationary density. They are built from the same ingredients but with opposite signs in the exponent: the scale density is proportional to $\exp\!\big(-\!\int 2\mu/\sigma^2\big)$, whereas the stationary density is proportional to $\sigma^{-2}\exp\!\big(\!\int 2\mu/\sigma^2\big)$. The Ornstein–Uhlenbeck process makes the contrast unmistakable — its scale density explodes like $e^{+\theta x^2/\sigma^2}$ while its stationary density is the decaying Gaussian $e^{-\theta x^2/\sigma^2}$. One governs hitting probabilities; the other describes long-run occupation.
  • Do not forget the affine nonuniqueness. The scale function is defined only up to $s \mapsto \alpha s + \beta$ with $\alpha \gt 0$. Any normalization is legitimate, and the hitting-probability ratio is invariant under all of them. Errors creep in when one silently fixes constants in two different ways within the same calculation; choose a normalization once and keep it.
  • Check $a \lt x \lt b$ and well-definedness before applying the formula. The exit formula presupposes a regular diffusion on $(l,r)$ with the start strictly interior and both barriers reachable. The ratio can be meaningless or degenerate in two cases: a barrier coincides with a boundary point the process cannot reach (an entrance or a natural boundary), or $x$ is not strictly between $a$ and $b$. Verify the setup before substituting.
  • The scale function does not give hitting times. It answers "which boundary first," never "how long until exit." Expected exit times require the speed measure (and the Green's function built from scale and speed together). Reading a duration off $s$ alone is a category error.
  • Keep $x$-space and $s(x)$-space distinct. The probability is linear in $s(x)$, not in $x$. It is tempting to interpolate "halfway between $a$ and $b$" on the original axis and expect probability one-half; that holds only after transforming to scale coordinates. The midpoint in $x$ is generally not the midpoint in $s$.
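The midpoint pitfall is easy to quantify for Brownian motion with drift on $[0,1]$ (Example 1). Solving $s(x^\ast) = \tfrac12$ with the normalized scale function gives the point of even odds, $x^\ast = -\dfrac{1}{2\mu}\log\dfrac{1+e^{-2\mu}}{2}$. The Python sketch below compares it with the $x$-midpoint. The helper names are hypothetical, and $\mu = 1$ is an illustrative value.

```python
import math


def p_top_first(x: float, mu: float) -> float:
    """P_x(tau_1 < tau_0) for dX = mu dt + dW on [0, 1]; assumes mu > 0."""
    return math.expm1(-2.0 * mu * x) / math.expm1(-2.0 * mu)


def even_odds_point(mu: float) -> float:
    """The x in (0, 1) where P_x(tau_1 < tau_0) = 1/2, i.e. the midpoint in s-space."""
    return -math.log((1.0 + math.exp(-2.0 * mu)) / 2.0) / (2.0 * mu)


mu = 1.0
print("P at the x-midpoint 0.5:", p_top_first(0.5, mu))
x_star = even_odds_point(mu)
print("even-odds point x*:", x_star, " P there:", p_top_first(x_star, mu))
```

For $\mu = 1$ the even-odds point sits near $x \approx 0.283$, well below $0.5$. The $x$-midpoint itself gives a probability of about $0.731$.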
