Definition 11 (Risk Set) For a survival study with \(n\) subjects, let \(T_i\) denote the true event time and \(C_i\) the censoring time for subject \(i\), and let \(\tilde{T}_i = \min(T_i, C_i)\) be the observed follow-up time. The risk set at time \(t\) is
\[\mathcal{R}(t) \;\stackrel{\text{def}}{=}\; \bigl\{i \in \{1, \ldots, n\} : \tilde{T}_i \geq t\bigr\},\]
the set of subjects still under observation (neither having experienced the event nor been censored) immediately before time \(t\).
The number at risk at time \(t\) is
\[r(t) \;\stackrel{\text{def}}{=}\; \lvert \mathcal{R}(t) \rvert.\]
At the \(k\)th ordered event time \(t_k\), this count is written \(r_k \stackrel{\text{def}}{=} r(t_k)\); the index \(k\) runs over time points, not subjects.
Example 9 In Figure 10, the 8-person study has \(\tilde{T}_i \in \{2, 3, 4, 5, 6, 6, 7, 8\}\). At time \(t = 4\), subjects 3, 4, 5, 6, 7, and 8 still have \(\tilde{T}_i \geq 4\), so \(\mathcal{R}(4) = \{3, 4, 5, 6, 7, 8\}\) and \(r(4) = 6\).
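The risk-set computation in Example 9 can be checked with a few lines of Python. This is a minimal sketch: the helper name `risk_set` is hypothetical, and it assumes subjects are numbered 1 through 8 in the order their follow-up times are listed.

```python
# Observed follow-up times from Example 9.
# Assumption: subject i corresponds to the i-th entry (1-indexed).
follow_up = [2, 3, 4, 5, 6, 6, 7, 8]


def risk_set(times, t):
    """Return the 1-indexed subjects whose observed follow-up time is at least t."""
    return {i for i, y in enumerate(times, start=1) if y >= t}


R4 = risk_set(follow_up, 4)   # risk set at t = 4
r4 = len(R4)                  # number at risk at t = 4
print(sorted(R4), r4)         # prints: [3, 4, 5, 6, 7, 8] 6
```

The comparison `y >= t` mirrors the condition \(\tilde{T}_i \geq t\) in Definition 11, so a subject who exits exactly at time \(t\) is still counted as at risk at \(t\).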
To see why multiplying the conditional survival factors estimates marginal survival, write the ordered distinct exit times as \(y_1 < y_2 < \cdots < y_m\). Here \(Y\) denotes the exit time (the event time or the censoring time, whichever comes first; this is the observed follow-up time \(\tilde{T}\) of Definition 11), while \(T\) denotes the underlying event time, which is not always observed.
Because \(y_{j-1}\) and \(y_j\) are consecutive distinct exit times, there are no exits between them. Therefore the conditional probability of surviving between these two times is 1:
\[
\Pr(Y \geq y_j \mid Y > y_{j-1}) = 1.
\]
Moreover, the event \(\{Y \geq y_j\}\) is contained in the event \(\{Y > y_{j-1}\}\), because \(y_j > y_{j-1}\), so intersecting \(\{Y \geq y_j\}\) with \(\{Y > y_{j-1}\}\) does not change the event. Combining these two facts gives the following identity.
Theorem 10 (Consecutive Exit-Time Identity) If \(\Pr(Y \geq y_j \mid Y > y_{j-1}) = 1\), then:
\[\Pr(Y \geq y_j) = \Pr(Y > y_{j-1})\]
Proof. \[
\begin{aligned}
\Pr(Y \geq y_j)
&= \Pr(Y \geq y_j, Y > y_{j-1})
\\
&= \Pr(Y \geq y_j \mid Y > y_{j-1}) \Pr(Y > y_{j-1})
\\
&= 1 \cdot \Pr(Y > y_{j-1})
\\
&= \Pr(Y > y_{j-1}).
\end{aligned}
\]
The first equality uses the containment \(\{Y \geq y_j\} \subseteq \{Y > y_{j-1}\}\) (subset property). The second equality uses the multiplication rule for probabilities, and the third equality uses the between-exit-time survival assumption.
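As a numerical sanity check of the identity, the sketch below treats the empirical distribution of the eight exit times from Example 9 as the distribution of \(Y\) (an assumption made purely for illustration). The helper names `pr_at_least` and `pr_greater` are hypothetical. Exact fractions avoid floating-point comparison issues.

```python
from fractions import Fraction

exit_times = [2, 3, 4, 5, 6, 6, 7, 8]   # observed exit times from Example 9
n = len(exit_times)
distinct = sorted(set(exit_times))       # y_1 < y_2 < ... < y_m


def pr_at_least(y):
    """Empirical Pr(Y >= y)."""
    return Fraction(sum(v >= y for v in exit_times), n)


def pr_greater(y):
    """Empirical Pr(Y > y)."""
    return Fraction(sum(v > y for v in exit_times), n)


# Compare Pr(Y >= y_j) with Pr(Y > y_{j-1}) for each consecutive pair.
identity_holds = all(
    pr_at_least(y_j) == pr_greater(y_prev)
    for y_prev, y_j in zip(distinct, distinct[1:])
)
print(identity_holds)  # prints: True
```

The check succeeds because no observed exit time falls strictly between two consecutive distinct exit times, which is exactly the hypothesis of the theorem.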
Let \(\kappa(y_j) = \Pr(Y > y_j \mid Y \geq y_j)\) denote the conditional probability of surviving past the exit time \(y_j\), given survival up to \(y_j\). The marginal survival through \(y_j\) can then be written recursively:
\[
\begin{aligned}
\operatorname{S}\mathopen{}\left(y_j\right)\mathclose{}
&=\Pr(Y > y_j)
\\
&= \Pr(Y > y_j, Y \geq y_j)\\
&= \Pr(Y > y_j \mid Y \geq y_j)\Pr(Y \geq y_j)\\
&= \Pr(Y > y_j \mid Y \geq y_j)\Pr(Y > y_{j-1})\\
&= \kappa(y_j)\Pr(Y > y_{j-1})
\\
&= \kappa(y_j)\operatorname{S}\mathopen{}\left(y_{j-1}\right)\mathclose{}.
\end{aligned}
\]
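Here the first equality is the definition of the survival function. The second again uses the subset property: if \(Y > y_j\), then \(Y \geq y_j\), so \(\{Y > y_j\} \subseteq \{Y \geq y_j\}\). The third is the multiplication rule, and the fourth substitutes the Theorem 10 result, \(\Pr(Y \geq y_j) = \Pr(Y > y_{j-1})\). The last two apply the definitions of \(\kappa\) and \(\operatorname{S}\).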
The same recursion can be applied repeatedly. Write \(S_j = \Pr(Y > y_j)\), and set \(S_0 = 1\), since no subject has exited before the first exit time \(y_1\). Then:
\[
\begin{aligned}
S_1
&= \kappa(y_1)S_0\\
&= \kappa(y_1),\\
S_2
&= \kappa(y_2)S_1\\
&= \kappa(y_2)\kappa(y_1),\\
S_3
&= \kappa(y_3)S_2\\
&= \kappa(y_3)\kappa(y_2)\kappa(y_1).
\end{aligned}
\]
Continuing in this way gives \(S_j = \prod_{k=1}^j \kappa(y_k)\).
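The product formula can be verified numerically in the same spirit. The sketch below again assumes, for illustration only, that \(Y\) follows the empirical distribution of the exit times in Example 9. It builds \(S_j\) through the recursion \(S_j = \kappa(y_j) S_{j-1}\) and compares the result with \(\Pr(Y > y_j)\) computed directly. The function name `kappa` mirrors the notation above and is otherwise hypothetical.

```python
from fractions import Fraction

exit_times = [2, 3, 4, 5, 6, 6, 7, 8]   # observed exit times from Example 9
n = len(exit_times)
distinct = sorted(set(exit_times))       # y_1 < y_2 < ... < y_m


def kappa(y):
    """Empirical Pr(Y > y | Y >= y) at an observed exit time y."""
    at_or_after = sum(v >= y for v in exit_times)
    after = sum(v > y for v in exit_times)
    return Fraction(after, at_or_after)


# Recursion: S_0 = 1, then S_j = kappa(y_j) * S_{j-1}.
S = Fraction(1)
products = []
for y in distinct:
    S = kappa(y) * S
    products.append(S)

# Direct computation of Pr(Y > y_j) for comparison.
direct = [Fraction(sum(v > y for v in exit_times), n) for y in distinct]
print(products == direct)  # prints: True
```

Note that this check concerns the exit time \(Y\), for which every exit counts; no censoring adjustment is involved yet.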
To estimate the survival function for the underlying event time \(T\), we replace the conditional survival probabilities \(\Pr(T > y_k \mid T \geq y_k)\) with their empirical estimates at the observed exit times. This gives the Kaplan-Meier product-limit estimator:
\[
\begin{aligned}
\hat{\Pr}(T > t)
&= \hat{S}_{KM}(t)\\
&= \prod_{k:\, y_k \leq t} \hat{\Pr}(T > y_k \mid T \geq y_k)\\
&= \prod_{k:\, y_k \leq t} \hat{\kappa}_k\\
&= \prod_{k:\, y_k \leq t} \frac{r_k - d_k}{r_k}
\end{aligned}
\]
where \(r_k\) is the number at risk at time \(y_k\) and \(d_k\) is the number of events (not total exits) at time \(y_k\). The empirical factor \((r_k - d_k)/r_k\) estimates \(\Pr(T > y_k \mid T \geq y_k)\). Only the \(d_k\) events are subtracted in the numerator: a subject censored at \(y_k\) was not observed to experience the event at \(y_k\), so it counts as surviving \(y_k\) and simply drops out of later risk sets.
For any time \(t\) between two consecutive exit times, the estimate stays constant: no new factor enters the product, which is equivalent to saying that the conditional survival factor over a stretch with no exits is 1.
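The product-limit formula can be sketched in Python as follows. The follow-up times are those of Example 9, but the event indicators are hypothetical (the text above does not list them), and the function names `kaplan_meier` and `km_at` are illustrative, not taken from any library.

```python
from fractions import Fraction

# Follow-up times from Example 9.
times = [2, 3, 4, 5, 6, 6, 7, 8]
# Hypothetical event indicators (1 = event observed, 0 = censored);
# these are an assumption for illustration, not data from the text.
events = [1, 0, 1, 1, 1, 0, 0, 1]


def kaplan_meier(times, events):
    """Return [(y_k, S_hat(y_k))] over the ordered distinct exit times."""
    curve = []
    surv = Fraction(1)
    for y in sorted(set(times)):
        r = sum(t >= y for t in times)                        # number at risk r_k
        d = sum(e for t, e in zip(times, events) if t == y)   # events d_k at y_k
        surv *= Fraction(r - d, r)                            # factor (r_k - d_k) / r_k
        curve.append((y, surv))
    return curve


def km_at(curve, t):
    """Evaluate the step function: the product over exit times y_k <= t."""
    value = Fraction(1)
    for y, s in curve:
        if y <= t:
            value = s
    return value


curve = kaplan_meier(times, events)
print(km_at(curve, 4))    # prints: 35/48
print(km_at(curve, 4.5))  # prints: 35/48
```

With these assumed indicators, the factor at \(y = 3\) is \(7/7 = 1\) because the only exit there is a censoring, so the estimate does not drop at that time. The two printed values agree because no exit time lies in \((4, 4.5]\), which illustrates the piecewise-constant behaviour described above.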