ICH E6(R3) Section 5.5
The process of detecting, correcting, and resolving inaccurate, incomplete, or inconsistent data in the clinical trial database to ensure data quality and reliability for analysis.
Data cleaning encompasses all activities undertaken to ensure that clinical trial data are accurate, complete, consistent, and suitable for analysis. This process identifies potential errors through programmed validation checks and manual review, investigates discrepancies by querying site personnel, and implements corrections to resolve confirmed errors. Data cleaning occurs throughout the trial as data accumulate and intensifies as the study approaches database lock, when all outstanding issues must be resolved before the database can be finalized.
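The detect–query–resolve cycle described above can be sketched as a simple state machine. This is an illustrative sketch only; the class, field, and status names below are hypothetical and not drawn from any specific EDC system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class QueryStatus(Enum):
    OPEN = "open"          # discrepancy identified, query issued to site
    ANSWERED = "answered"  # site has responded; awaiting data manager review
    CLOSED = "closed"      # correction applied or value confirmed as-is

@dataclass
class Query:
    subject_id: str
    form: str
    field: str
    message: str
    status: QueryStatus = QueryStatus.OPEN
    site_response: Optional[str] = None

    def answer(self, response: str) -> None:
        """Record the site's response to the query."""
        self.site_response = response
        self.status = QueryStatus.ANSWERED

    def close(self) -> None:
        """Close the query once the response has been reviewed."""
        if self.status is not QueryStatus.ANSWERED:
            raise ValueError("cannot close a query that has not been answered")
        self.status = QueryStatus.CLOSED

# Example: a query raised for an implausible lab value
q = Query("SUBJ-001", "LB", "ALT", "Value exceeds expected range; please verify")
q.answer("Value confirmed against source; repeat test documented")
q.close()
```

The guard in `close()` mirrors the point in the text that corrections are implemented only for confirmed errors: a query cannot be closed until the site's response has been received and reviewed.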
Effective data cleaning relies on well-designed validation specifications that define the checks to be performed on the data. Range checks identify values that fall outside clinically plausible limits. Consistency checks verify logical relationships between data points, such as ensuring that randomization dates fall after informed consent dates. Completeness checks identify required fields that are missing data. Cross-form checks verify consistency across different parts of the database, such as ensuring that adverse events reported as ongoing at one visit are still documented at subsequent visits. These programmatic checks complement manual review by medical monitors and data managers.
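The check types above can be sketched against a single subject record as follows. The field names, plausibility limits, and required-field list are hypothetical examples, not validation specifications from any actual study.

```python
from datetime import date

# One subject's (hypothetical) record, seeded with three deliberate errors
record = {
    "consent_date": date(2024, 1, 10),
    "randomization_date": date(2024, 1, 8),  # inconsistent: precedes consent
    "heart_rate": 210,                       # outside plausible range
    "weight_kg": None,                       # required but missing
}

discrepancies = []

# Range check: flag values outside clinically plausible limits
if record["heart_rate"] is not None and not (30 <= record["heart_rate"] <= 200):
    discrepancies.append("heart_rate outside plausible range 30-200 bpm")

# Consistency check: randomization must follow informed consent
if record["randomization_date"] < record["consent_date"]:
    discrepancies.append("randomization_date precedes consent_date")

# Completeness check: flag required fields that hold no data
for field_name in ("consent_date", "randomization_date", "heart_rate", "weight_kg"):
    if record[field_name] is None:
        discrepancies.append(f"{field_name} is missing")

print(discrepancies)
```

In practice each flagged discrepancy would generate a query rather than a printed message; cross-form checks follow the same pattern but compare values held on different forms or visits.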
The data cleaning process must be balanced against practical considerations, including site burden, timeline requirements, and the relative importance of different data elements. Not all discrepancies warrant queries, and excessive querying can strain site relationships and divert resources from more important activities. Risk-based approaches focus data cleaning efforts on the data most critical to safety evaluation and primary endpoint analysis, while applying lighter-touch review to less critical elements. Documentation of the data cleaning approach, specifications, and activities performed supports regulatory compliance and enables evaluation of data quality at database lock.
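One way to express the risk-based triage described above is to route discrepancies on critical fields to individual queries while batching the rest for aggregate review. The criticality tiers and field names here are illustrative assumptions, not a prescribed classification.

```python
# Hypothetical tier of critical fields: safety data and the primary endpoint
CRITICAL_FIELDS = {"ae_term", "ae_serious", "primary_endpoint_value"}

discrepancies = [
    {"field": "ae_serious", "message": "seriousness flag missing"},
    {"field": "primary_endpoint_value", "message": "value out of expected range"},
    {"field": "occupation", "message": "free-text entry truncated"},
]

# Critical-field discrepancies each get an individual query to the site;
# the remainder are held for periodic aggregate review instead
to_query = [d for d in discrepancies if d["field"] in CRITICAL_FIELDS]
aggregate_review = [d for d in discrepancies if d["field"] not in CRITICAL_FIELDS]

print(f"{len(to_query)} queries issued, {len(aggregate_review)} held for batch review")
```

The design choice is the one the text motivates: query volume, and therefore site burden, scales with the critical tier only, while lower-importance discrepancies are still documented rather than silently dropped.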
Programmatic cleaning
"The data cleaning specifications included over 500 programmed edit checks that were run weekly during the conduct phase, automatically generating queries for values outside expected ranges, logical inconsistencies, and missing required data."
Medical review
"As part of data cleaning, the medical monitor reviewed all serious adverse events to ensure clinical consistency, verify that all required fields were complete, and confirm that the narrative descriptions aligned with the coded terms and severity assessments."
ADaM (Analysis Data Model)
A CDISC standard that defines the structure and content of analysis-ready datasets derived from SDTM data, supporting efficient generation of statistical analyses and displays for regulatory submissions.
Audit trail
A secure, computer-generated, time-stamped electronic record that automatically captures the creation, modification, or deletion of data, including the identity of the operator and the date and time of the action.
CDISC (Clinical Data Interchange Standards Consortium)
An international nonprofit organization that develops and supports global data standards for clinical research, enabling consistent and efficient exchange of clinical trial information.
Data integrity
The degree to which data are complete, consistent, accurate, trustworthy, and reliable throughout the data lifecycle.
Database lock
The formal process of making the clinical trial database unmodifiable once all data have been entered, reviewed, cleaned, and verified, marking the transition from data collection to statistical analysis.