ICH E9(R1) Section 5.7
The probability of obtaining results at least as extreme as those observed in the study, assuming that the null hypothesis of no treatment effect is true.
The p-value serves as a measure of statistical evidence against the null hypothesis, indicating how likely results at least as extreme as those observed would be if there were truly no treatment effect. A small p-value suggests that the observed data are unlikely under the null hypothesis, providing evidence in favor of a treatment effect. By convention, p-values below 0.05 are typically considered statistically significant, though this threshold is arbitrary, and results should be interpreted in context alongside effect sizes, confidence intervals, and clinical relevance.
Proper interpretation of p-values requires understanding what they do and do not represent. The p-value is not the probability that the null hypothesis is true, nor is it the probability that the observed result occurred by chance alone. Rather, it is the probability of obtaining results as extreme as or more extreme than those observed, calculated under the assumption that the null hypothesis is true. A p-value of 0.03 means there is a 3% probability of seeing results at least this extreme if no treatment effect exists, not that there is a 3% probability that the treatment does not work.
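The definition above can be made concrete with a permutation test, which estimates the p-value by simulating the null hypothesis directly. The following is a minimal sketch, not a method prescribed by the guideline: the function name `permutation_p_value`, the outcome values, and the choice of the absolute difference in means as the test statistic are all illustrative assumptions.

```python
import random


def permutation_p_value(treatment, control, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Under the null hypothesis of no treatment effect, group labels are
    exchangeable, so reshuffling them shows how often a difference at
    least as extreme as the observed one arises without any effect.
    """
    rng = random.Random(seed)
    n_treat = len(treatment)
    n_ctrl = len(control)
    observed = abs(sum(treatment) / n_treat - sum(control) / n_ctrl)

    pooled = list(treatment) + list(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(sum(pooled[:n_treat]) / n_treat
                   - sum(pooled[n_treat:]) / n_ctrl)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction: counts the observed labelling itself, so the
    # estimate can never be exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores, invented purely for illustration.
treatment_scores = [12.1, 14.3, 11.8, 15.0, 13.7, 14.9, 12.6, 13.2]
control_scores = [11.0, 12.4, 10.9, 12.8, 11.5, 13.1, 10.7, 11.9]

p = permutation_p_value(treatment_scores, control_scores)
print(f"Estimated two-sided p-value: {p:.4f}")
```

The returned value is the share of label reshufflings that produce a difference at least as large as the observed one. It describes the data under the assumption of no effect; it is not the probability that the null hypothesis is true.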
Clinical trials typically pre-specify the significance level, often 0.05, which defines the threshold for declaring statistical significance. When p-values fall below this threshold, the null hypothesis is rejected in favor of the alternative hypothesis of a treatment effect. However, statistical significance should not be conflated with clinical significance; a treatment may produce statistically significant but clinically trivial effects, particularly in large trials with high precision. Conversely, clinically meaningful effects may fail to reach statistical significance in underpowered studies.
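The gap between statistical and clinical significance can be illustrated with a simple two-sample z-test. This is a sketch under stated assumptions: equal arm sizes, a known common standard deviation, and a normally distributed test statistic. The function name `two_sided_z_p_value` and all numbers are hypothetical and chosen only to show the effect of sample size.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_p_value(mean_diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means between two equal arms.

    Assumes a known common standard deviation `sd`, so the test statistic
    is standard normal under the null hypothesis of no difference.
    """
    standard_error = sd * sqrt(2 / n_per_arm)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # pre-specified significance level

# The same small effect (0.5 units, SD 10) in a small and a very large trial.
p_small_trial = two_sided_z_p_value(mean_diff=0.5, sd=10, n_per_arm=100)
p_large_trial = two_sided_z_p_value(mean_diff=0.5, sd=10, n_per_arm=20_000)

# A much larger effect (5 units, SD 10) in an underpowered trial.
p_underpowered = two_sided_z_p_value(mean_diff=5, sd=10, n_per_arm=10)

for label, p_value in [
    ("small effect, 100 per arm", p_small_trial),
    ("small effect, 20,000 per arm", p_large_trial),
    ("large effect, 10 per arm", p_underpowered),
]:
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"{label}: p = {p_value:.4g} ({verdict} at {ALPHA})")
```

With these inputs the small effect crosses the 0.05 threshold only in the very large trial, while the larger effect does not cross it in the ten-per-arm trial. The p-value therefore reflects sample size as well as effect size, which is why it should be read alongside the estimated effect and its confidence interval.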
Significant result
"The primary analysis yielded a p-value of 0.002, indicating that results at least this extreme would occur about 0.2% of the time if there were no true treatment effect, providing strong statistical evidence against the null hypothesis."
Borderline result
"With a p-value of 0.048, the result was nominally statistically significant at the pre-specified 0.05 level, though investigators noted that this borderline finding warranted cautious interpretation and consideration of the confidence interval."
Confidence interval
A range of values calculated from study data by a procedure that, over repeated sampling, captures the true treatment effect at a specified rate (the confidence level, typically 95%), providing information about both the estimated effect size and the precision of that estimate.
Intention-to-treat analysis
A statistical analysis strategy that includes all randomized participants in the groups to which they were originally assigned, regardless of whether they completed the study treatment or adhered to the protocol.
Interim analysis
A planned statistical analysis conducted before all participants have completed the study, typically to evaluate accumulating data for evidence of efficacy, futility, or safety concerns that might warrant early termination of the trial.
Per-protocol analysis
A statistical analysis that includes only participants who completed the study according to protocol requirements, with no major protocol violations, adequate treatment exposure, and complete outcome assessments.
Primary endpoint
The primary endpoint is the main outcome measure used to evaluate whether the treatment hypothesis is supported and forms the basis for regulatory approval decisions, while secondary endpoints provide supportive evidence and characterize additional treatment effects.