GDC Class Notes

Class Notes of Graduate Diploma in Computing in Unitec

Data Warehouse Data Quality

Issues in source data

  • out of acceptable range
  • non-standard data/ varying formats
  • outliers, outliers is above/below 3* standard deviation
  • invalid values
  • null values
  • differing culture rules (like, nz/us time date format)
  • cosmetic
  • verification (wrong post code)

strategy

  • filter/clean/correct errors in source date before transform
  • quality assured by verifying again
  • correct and complete
  • pre-processing

Data Quality(DQ) Assurance process

  • checking
    • checking if there is an error or dirty data
    • based on pre-defined data quality rules
  • Logging
    • record the data with errors
    • notify DB users/admin to correct the data
  • Correcting
    • clean or ignore the erros in the data – data cleaning

Logging errors

DQ rule is violated, log the event into a table. (the below is the fields of the table)

  • RowID of the record that vialoates the rule a physical address the row stored on disk
  • rule no that is violated
  • table name of that row id
  • Database name of the table
  • Action -> Reject/Allow/Fix

DQ Actions

  • Reject
  • Allow (low risk error, the error is in within tolerance limits)
  • Fix (Correct the error)

Reading

  • sql server trigger