How to Maintain Data Quality?
Data is the backbone of every system, whether retired or currently in use, and its quality needs to be maintained at all times. Even a few discrepancies in the data can affect the overall process of the system and, with it, the entire reporting system. Any analysis performed on such data will yield inaccurate results, and ultimately it will hurt the strategies built on that data.
Bad data could be any of the following:
- Duplication of data where the data should be unique
- Inaccurate data that leads to incorrect information
- Incorrect data that results from typing mistakes or wrong references
- Incomplete data that comes from missing information
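Some of these categories can be detected mechanically. The following is a minimal sketch, assuming records are held as Python dictionaries; the field names (`id`, `email`, `country`), the reference set `VALID_COUNTRIES`, and the helper functions are all hypothetical and only serve as illustration. Inaccurate values that still look plausible cannot be caught this way and need a trusted source to compare against.

```python
from collections import Counter

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"id": 1, "email": "a@example.com", "country": "IN"},
    {"id": 2, "email": "b@example.com", "country": "XX"},  # unknown country code
    {"id": 2, "email": "c@example.com", "country": "US"},  # duplicate id
    {"id": 3, "email": "", "country": "US"},               # missing email
]
VALID_COUNTRIES = {"IN", "US"}  # assumed reference list


def find_duplicates(rows, key):
    """Return the key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def find_incomplete(rows, required):
    """Return indexes of rows where a required field is missing or empty."""
    return [i for i, row in enumerate(rows)
            if any(row.get(field) in (None, "") for field in required)]


def find_bad_references(rows, field, allowed):
    """Return indexes of rows whose field is not in the allowed reference set."""
    return [i for i, row in enumerate(rows) if row.get(field) not in allowed]


duplicate_ids = find_duplicates(records, "id")
incomplete_rows = find_incomplete(records, ["id", "email", "country"])
bad_country_rows = find_bad_references(records, "country", VALID_COUNTRIES)
```

With the sample records above, the three helpers flag the duplicated id, the row with the empty email, and the row with the unknown country code respectively.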
As bad data moves from data entry toward decision making, its impact keeps growing.
Data quality should be part of the quality policy, which enables the team to develop a plan for this activity. Because data is an integral part of every business, maintaining a high level of quality requires serious effort.
Data Quality Process
However many checks and balances are established in the system, errors will still creep in, and data cleaning will need to be performed. Data quality maintenance needs a defined process, which could look something like this:
The data quality process starts with analyzing the data, which helps standardize the policies for gathering data. Data cleaning comes next, followed by establishing rules. Lastly, the process should be updated so that the same problems do not occur again. The whole process should be part of continuous monitoring to ensure that data quality is maintained.
Data Cleaning Framework
A generic framework for data cleaning could be similar to this:
- Stage 1 is to go through data samples and identify error types
- Stage 2 is to search all instances as per the error types identified
- Stage 3 is to fix all those errors
- Stage 4 is to update the standard operating procedure (SOP) for data entry and related processes so that future occurrences can be prevented
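The first three stages can be sketched in code. This is only an illustration under stated assumptions: the `ERROR_TYPES` table, the two error types in it, and the function names are invented for the example, and each error type is assumed to have a simple detector and a safe automatic fix, which real data often does not. Stage 4 is a process change rather than code, so it is not shown.

```python
# Each error type pairs a detector with a fix; both are illustrative assumptions.
ERROR_TYPES = {
    "untrimmed_name": (
        lambda row: row["name"] != row["name"].strip(),
        lambda row: {**row, "name": row["name"].strip()},
    ),
    "lowercase_country": (
        lambda row: row["country"] != row["country"].upper(),
        lambda row: {**row, "country": row["country"].upper()},
    ),
}


def identify_error_types(sample):
    """Stage 1: inspect a sample and report which error types occur in it."""
    return sorted(name for name, (detect, _) in ERROR_TYPES.items()
                  if any(detect(row) for row in sample))


def find_instances(rows, error_names):
    """Stage 2: search the full data set for every instance of those types."""
    return {name: [i for i, row in enumerate(rows) if ERROR_TYPES[name][0](row)]
            for name in error_names}


def fix_instances(rows, instances):
    """Stage 3: apply the matching fix to each affected row; returns a new list."""
    cleaned = list(rows)
    for name, indexes in instances.items():
        fix = ERROR_TYPES[name][1]
        for i in indexes:
            cleaned[i] = fix(cleaned[i])
    return cleaned


rows = [
    {"name": " Asha ", "country": "in"},
    {"name": "Ben", "country": "US"},
    {"name": "Chen ", "country": "us"},
]
found = identify_error_types(rows[:2])     # Stage 1 on a sample
instances = find_instances(rows, found)    # Stage 2 on all rows
cleaned = fix_instances(rows, instances)   # Stage 3
```

The fixes return new rows instead of changing the originals, so the raw data stays available for audit if a fix turns out to be wrong.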
Data Cleaning Activity
Some of the activities to be performed during data cleaning include, but are not limited to:
- The data should be validated against the rules and constraints established in the system. These constraints may concern data type, uniqueness, mandatory fields, or relationships.
- The data should be accurate according to the business requirements. Data coming in from other systems should also conform to the system's constraints.
- The data should be complete in all respects, with no information missing. For some missing data we can add rules in the system, but certain missing data cannot be reconstructed at all. Completeness can only be ensured while the data is being entered into the system; later on, it is hard to fill in the missing data accurately.
- Measures should be taken to keep data consistent across all channels through which data flows in. Inconsistency is the most difficult problem to rectify once it has been introduced into the system, and sometimes it is not even possible to decide which of the inconsistent values is correct. Hence it should be dealt with strictly from the beginning.
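Two of the checks listed above, data type constraints and consistency across channels, could be sketched as follows. The schema, the two channels (`web` and `store`), and the function names are assumptions made up for this example, not part of any particular system.

```python
# Assumed schema: field name -> expected Python type.
SCHEMA = {"customer_id": int, "email": str, "age": int}


def type_violations(row, schema):
    """Return the fields whose value is present but not of the expected type."""
    return sorted(field for field, expected in schema.items()
                  if field in row and not isinstance(row[field], expected))


def channel_conflicts(channel_a, channel_b, key):
    """Compare records that share a key across two channels.

    Returns {key_value: [fields whose values differ]} for conflicting records.
    """
    by_key = {row[key]: row for row in channel_b}
    conflicts = {}
    for row in channel_a:
        other = by_key.get(row[key])
        if other is None:
            continue  # record exists in only one channel; nothing to compare
        differing = sorted(f for f in row.keys() & other.keys()
                           if row[f] != other[f])
        if differing:
            conflicts[row[key]] = differing
    return conflicts


web = [
    {"customer_id": 1, "email": "a@example.com", "age": 30},
    {"customer_id": 2, "email": "b@example.com", "age": "41"},  # age stored as text
]
store = [
    {"customer_id": 1, "email": "a@example.org", "age": 30},
    {"customer_id": 3, "email": "c@example.com", "age": 25},
]

bad_types = {row["customer_id"]: type_violations(row, SCHEMA) for row in web}
conflicts = channel_conflicts(web, store, "customer_id")
```

Note that the sketch can only report that customer 1 has two different emails. It cannot tell which one is correct, which is exactly why inconsistency is better prevented at the point of entry.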
We are often part of these exercises, so making them a part of our processes and routines can make our systems more reliable.
NOTE: This article has also been published on LinkedIn.