Statistical interpretation of data—
Detection and treatment of outliers in the normal sample
1 Scope
This standard applies to the detection and treatment of outliers in a sample drawn from a normal population.
2 Normative references
The following standards contain provisions which, through reference in this text, constitute provisions of this national standard. For dated references, subsequent amendments to (excluding corrections), or revisions of, any of these publications do not apply to this standard. However, parties to agreements based on this standard are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies.
GB/T 4882-2001 Statistical interpretation of data—Normality tests
GB/T 19000-2000 Quality management systems—Fundamentals and vocabulary
ISO 3534-1: 2006 Statistics—Vocabulary and symbols—Part 1: General statistical terms and terms used in probability
ISO 3534-2: 2006 Statistics—Vocabulary and symbols—Part 2: Applied statistics
3 Terms, definitions and symbols
For the purposes of this standard, the terms and definitions in ISO 3534-1: 2006, ISO 3534-2: 2006, GB/T 19000-2000 and the following apply. For reference, some terms are directly quoted from the above standards.
3.1 Terms and definitions
3.1.1
outlier
one or more observed values in a sample that lie far from the other observed values, suggesting that they may come from a different population
Note: Outliers are divided into stragglers and statistical outliers according to the degree of significance.
3.1.2
statistical outlier
outlier that is statistically significant at the deletion level (3.1.5)
3.1.3
straggler
outlier that is significant at the detection level (3.1.4) but not significant at the deletion level (3.1.5)
3.1.4
detection level
significance level of the statistical test specified for detecting outliers
Note: Unless otherwise agreed by the parties under this standard, the detection level α shall be 0.05.
3.1.5
deletion level
significance level of the statistical test specified for determining whether a detected outlier is highly significant, that is, whether it is a statistical outlier (3.1.2)
Note: The value of the deletion level α* shall not exceed the value of the detection level α. Unless otherwise agreed by the parties under this standard, α* shall be 0.01.
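The relationship between the two levels in 3.1.2 to 3.1.5 can be sketched as follows. This is a minimal illustration, not part of the standard: it assumes the test result is available as a p-value, whereas the tests in this standard compare a statistic with tabulated critical values. The function name `classify` and the strict inequalities are assumptions made for the example.

```python
def classify(p_value, alpha=0.05, alpha_star=0.01):
    """Classify a tested observation by the two significance levels.

    alpha is the detection level and alpha_star the deletion level;
    the defaults follow the notes to 3.1.4 and 3.1.5.
    Working from a p-value is an assumption of this sketch.
    """
    if not 0 < alpha_star <= alpha < 1:
        raise ValueError("require 0 < alpha_star <= alpha < 1")
    if p_value < alpha_star:
        return "statistical outlier"  # significant at the deletion level
    if p_value < alpha:
        return "straggler"  # significant at the detection level only
    return "not an outlier"  # not significant at the detection level


print(classify(0.003))  # statistical outlier
print(classify(0.03))   # straggler
print(classify(0.20))   # not an outlier
```

Because α* does not exceed α, every statistical outlier is also significant at the detection level, so the two checks can be made in this order without ambiguity.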
3.2 Symbols and abbreviations
n Sample size (number of observed values)
x̄ Sample mean
α Significance level used to test outliers, referred to as detection level
α* Significance level used to test statistical outliers, referred to as deletion level (α* < α)
x(i) The i-th observed value when the observed values are arranged in ascending order
σ Population standard deviation
s Sample standard deviation
Rn Upper statistic of Nair's test
R'n Lower statistic of Nair's test
Gn Upper statistic of Grubbs' test
G'n Lower statistic of Grubbs' test
Dn Upper statistic of Dixon's test
D'n Lower statistic of Dixon's test
bs Skewness statistic
bk Kurtosis statistic
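For orientation, the Nair and Grubbs statistics listed above can be computed as in the sketch below. This section gives only the symbols, so the formulas are the commonly cited forms and are stated here as assumptions: Rn = (x(n) − x̄)/σ, R'n = (x̄ − x(1))/σ, Gn = (x(n) − x̄)/s and G'n = (x̄ − x(1))/s, with s computed using the n − 1 divisor. The function names and the sample data are illustrative only, and the critical values of Annex A are not reproduced.

```python
import statistics


def nair_statistics(values, sigma):
    """Return (Rn, R'n) for a sample with known population std. dev. sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    xs = sorted(values)  # x(1) <= ... <= x(n)
    mean = statistics.fmean(xs)
    return (xs[-1] - mean) / sigma, (mean - xs[0]) / sigma


def grubbs_statistics(values):
    """Return (Gn, G'n) using the sample standard deviation s."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two observed values")
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # n - 1 divisor (assumed convention)
    if s == 0:
        raise ValueError("all observed values are identical")
    return (xs[-1] - mean) / s, (mean - xs[0]) / s


# Made-up data for illustration; sigma = 2.0 is likewise an assumed value.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
print(nair_statistics(data, sigma=2.0))  # (3.0, 1.5)
print(grubbs_statistics(data))
```

Each statistic would then be compared with the critical value for the chosen level and sample size; that comparison is left out because it depends on the tables in Annex A.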
4 Outlier detection
Contents of GB/T 4883-2008
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and symbols
3.1 Terms and definitions
3.2 Symbols and abbreviations
4 Outlier detection
4.1 Source and detection
4.2 Three cases of outliers
4.3 Upper limit on the number of detected outliers
4.4 Single outlier case
4.5 Test rules for determining multiple outliers
5 Outlier treatment
5.1 Methods
5.2 Rules
5.3 Recording
6 Rules for detecting outliers in the case of known standard deviation
6.1 General rules
6.2 Rules for detecting outliers
6.2.1 Upper case
6.2.2 Lower case
6.2.3 Two-sided case
6.3 Example of using Nair’s test
7 Detection rules for outliers in the case of unknown standard deviation (the number of detected outliers is limited to no more than 1)
7.1 General rules
7.2 Grubbs' test
7.2.1 Upper case
7.2.2 Lower case
7.2.3 Two-sided case
7.2.4 Example of using Grubbs' test
7.3 Dixon’s test
7.3.1 One-sided case
7.3.2 Two-sided case
7.3.3 Examples of using Dixon’s test
8 Detection rules for outliers in the case of unknown standard deviation (the limit on the number of detected outliers is greater than 1)
8.1 General rules
8.2 Skewness-kurtosis test
8.2.1 Conditions of use
8.2.2 One-sided case—skewness test
8.2.3 Two-sided case—kurtosis test
8.2.4 Example of repeated use of kurtosis test
8.3 Dixon’s test
8.3.1 Rules of Dixon’s test
8.3.2 Examples of repeated use of the Dixon’s test
Annex A (Normative) Table of statistical values
Annex B (Informative) Guidelines for selecting outlier detection methods and treatment rules
B.1 Purpose of determining and treating outliers
B.2 Selection of various test methods
B.3 Pay attention to the information given by the detected outliers
Annex C (Informative) Dixon's test when n > 30
Bibliography