Data Warehouse Six Sigma

About Six Sigma

https://docs.oracle.com/cd/B31080_01/doc/owb.102/b28223/concept_data_quality.htm

Warehouse Builder provides Six Sigma results embedded within the other data profiling results to provide a standardized approach to data quality.

What is Six Sigma?

Six Sigma is a methodology that attempts to standardize the concept of quality in business processes. It achieves this goal by statistically analyzing the performance of business processes. The goal of Six Sigma is to improve the performance of these processes by identifying the defects, understanding them, and eliminating the variables that cause these defects.

Six Sigma metrics quantify the number of defects per 1,000,000 opportunities. In data profiling, the term "opportunities" can be interpreted as the number of records. The perfect score is 6.0, which is achieved when there are only 3.4 defects per 1,000,000 opportunities. The score is calculated using the following formulas:

  • Defects Per Million Opportunities (DPMO) = (Total Defects / Total Opportunities) * 1,000,000

  • Defects (%) = (Total Defects / Total Opportunities) * 100%

  • Yield (%) = 100% - Defects (%)

  • Process Sigma = NORMSINV(1 - (Total Defects / Total Opportunities)) + 1.5

    where NORMSINV is the inverse of the standard normal cumulative distribution.
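The formulas above can be sketched in a few lines of Python. This is only an illustration, not Warehouse Builder's own code: the function name `six_sigma_metrics` and the returned dictionary keys are made up for this example, and NORMSINV is approximated with `statistics.NormalDist().inv_cdf` from the Python standard library. The handling of the boundary cases (zero defects, or every record defective) is an assumption, because the inverse normal CDF is undefined at exactly 0 and 1.

```python
import math
from statistics import NormalDist


def six_sigma_metrics(total_defects: int, total_opportunities: int) -> dict:
    """Compute DPMO, Defects (%), Yield (%) and Process Sigma.

    Hypothetical helper for illustration; names are not from Warehouse Builder.
    """
    if total_opportunities <= 0:
        raise ValueError("total_opportunities must be positive")
    if not 0 <= total_defects <= total_opportunities:
        raise ValueError("total_defects must be between 0 and total_opportunities")

    defect_rate = total_defects / total_opportunities
    dpmo = defect_rate * 1_000_000
    defects_pct = defect_rate * 100
    yield_pct = 100 - defects_pct

    # NORMSINV is the inverse standard normal CDF; it is undefined at 0 and 1,
    # so the two boundary cases are mapped to +/- infinity (an assumption).
    if defect_rate == 0:
        sigma = math.inf
    elif defect_rate == 1:
        sigma = -math.inf
    else:
        sigma = NormalDist().inv_cdf(1 - defect_rate) + 1.5

    return {
        "dpmo": dpmo,
        "defects_pct": defects_pct,
        "yield_pct": yield_pct,
        "process_sigma": sigma,
    }


if __name__ == "__main__":
    # 34 defects in 10,000,000 records is 3.4 defects per million.
    print(six_sigma_metrics(34, 10_000_000))
```

With 34 defects in 10,000,000 records (3.4 DPMO), the computed Process Sigma comes out at roughly 6.0, which matches the perfect score described above.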

Six Sigma Metrics for Data Profiling

Six Sigma metrics are also provided for data profiling in Warehouse Builder. When you perform data profiling, the defects and anomalies discovered are reported as Six Sigma metrics. For example, if data profiling finds that a table has a row relationship with a second table, the number of records in the first table that do not adhere to this row relationship can be described using the Six Sigma metric.

Six Sigma metrics are calculated for the following measures in the Data Profile Editor:

  • Aggregation: For each column, the number of null values (defects) to the total number of rows in the table (opportunities).

  • Data Types: For each column, the number of values that do not comply with the documented data type (defects) to the total number of rows in the table (opportunities).

  • Data Types: For each column, the number of values that do not comply with the documented length (defects) to the total number of rows in the table (opportunities).

  • Data Types: For each column, the number of values that do not comply with the documented scale (defects) to the total number of rows in the table (opportunities).

  • Data Types: For each column, the number of values that do not comply with the documented precision (defects) to the total number of rows in the table (opportunities).

  • Patterns: For each column, the number of values that do not comply with the common format (defects) to the total number of rows in the table (opportunities).

  • Domains: For each column, the number of values that do not comply with the documented domain (defects) to the total number of rows in the table (opportunities).

  • Referential: For each relationship, the number of values that do not comply with the documented foreign key (defects) to the total number of rows in the table (opportunities).

  • Referential: For each column, the number of values that are redundant (defects) to the total number of rows in the table (opportunities).

  • Unique Key: For each unique key, the number of values that do not comply with the documented unique key (defects) to the total number of rows in the table (opportunities).

  • Unique Key: For each foreign key, the number of rows that are childless (defects) to the total number of rows in the table (opportunities).

  • Data Rule: For each data rule applied to the data profile, the number of rows that fail the data rule (defects) to the total number of rows in the table (opportunities).
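To make the defects/opportunities pairing concrete, here is a minimal sketch of two of the measures listed above: null values in a column (Aggregation) and values that do not comply with a foreign key (Referential). It does not reproduce how the Data Profile Editor computes its results. The tables `customers` and `orders`, their columns, and the helper names are all hypothetical, and the in-memory lists of dictionaries stand in for real warehouse tables.

```python
from statistics import NormalDist


def process_sigma(defects: int, opportunities: int) -> float:
    """Process Sigma = NORMSINV(1 - defects / opportunities) + 1.5."""
    rate = defects / opportunities
    if rate <= 0:
        return float("inf")   # assumption: no defects maps to +infinity
    if rate >= 1:
        return float("-inf")  # assumption: all rows defective maps to -infinity
    return NormalDist().inv_cdf(1 - rate) + 1.5


def null_defects(rows, column):
    """Aggregation measure: count rows whose value in `column` is null."""
    return sum(1 for row in rows if row.get(column) is None)


def foreign_key_defects(child_rows, fk_column, parent_rows, pk_column):
    """Referential measure: count child rows whose foreign key has no parent."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return sum(1 for row in child_rows if row[fk_column] not in parent_keys)


# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 9},  # no matching customer
    {"order_id": 13, "customer_id": 3},
]

email_nulls = null_defects(customers, "email")
orphans = foreign_key_defects(orders, "customer_id", customers, "id")

# Opportunities are the total number of rows in the profiled table.
email_sigma = process_sigma(email_nulls, len(customers))
orphan_sigma = process_sigma(orphans, len(orders))

print(f"email nulls: {email_nulls}/{len(customers)}, sigma={email_sigma:.2f}")
print(f"orphan orders: {orphans}/{len(orders)}, sigma={orphan_sigma:.2f}")
```

The same pattern applies to the other measures: only the defect-counting function changes, while the opportunities remain the row count of the table being profiled.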

Reposted from blog.csdn.net/Ture010Love/article/details/102792206