Codeofchina.com is in charge of this English translation. In case of any doubt about the English translation, the Chinese original shall be considered authoritative.
This standard is developed in accordance with the rules given in GB/T 1.1-2009.
Attention is drawn to the possibility that some of the elements of this standard may be the subject of patent rights. The issuing body of this document shall not be held responsible for identifying any or all such patent rights.
This standard was proposed by and is under the jurisdiction of the National Technical Committee on Information Security of the Standardization Administration of China (SAC/TC 260).
Introduction
In the era of big data, cloud computing and the Internet of Everything, data-based applications are increasingly widespread, which also creates serious personal information security problems. This guide for de-identifying personal information is formulated to protect personal information security and promote the sharing of data.
The purpose of this standard is to draw on the latest research results on personal information de-identification at home and abroad, distill current industry best practices, study the objectives, principles, techniques, models, processes and organizational measures of personal information de-identification, and put forward a guide to de-identifying personal information that can scientifically and effectively resist security risks and meet the needs of information development.
The data sets to be de-identified that this standard addresses are microdata (data sets consisting of a set of records that can be represented logically in tabular form). De-identification involves not only deleting or transforming the direct identifiers and quasi-identifiers in the data set, but also considering the re-identification risk of the data set in light of its later application scenarios, so that appropriate de-identification models and technical measures can be selected and an appropriate assessment of their effect carried out.
Data sets that are not microdata may be converted into microdata for processing, or may be processed with reference to the objectives, principles and methods of this standard. For example, if tabular data contains multiple records about one person, those records may be combined into one, forming microdata in which each person has only one record.
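The conversion described above can be sketched as follows. This is a minimal illustration and not part of the standard: the field names (`person_id`, `visit_date`, `diagnosis`) and the merge rule (collecting repeated values into lists) are assumptions chosen for the example.

```python
from collections import defaultdict

def to_microdata(rows, key="person_id"):
    """Combine all rows sharing the same key into one record per person.

    The key name and the list-based merge rule are illustrative assumptions.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for field, value in row.items():
            if field != key:
                merged[row[key]][field].append(value)
    # One output record per person; repeated attributes become lists.
    return [{key: pid, **dict(fields)} for pid, fields in merged.items()]

# Hypothetical tabular data with two rows about the same person.
rows = [
    {"person_id": "p1", "visit_date": "2019-01-03", "diagnosis": "flu"},
    {"person_id": "p1", "visit_date": "2019-02-10", "diagnosis": "asthma"},
    {"person_id": "p2", "visit_date": "2019-01-20", "diagnosis": "flu"},
]
microdata = to_microdata(rows)
```

After the merge, `microdata` holds one record for each distinct `person_id`, which matches the one-record-per-subject shape that the definition of microdata requires.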
Information security technology — Guide for de-identifying personal information
1 Scope
This standard describes the objectives and principles of personal information de-identification, and puts forward the de-identification process and management measures.
This standard provides specific guidance on personal information de-identification for microdata. It applies to organizations implementing personal information de-identification, and to the supervision, management and assessment of personal information security carried out by relevant network security authorities, third-party assessment agencies and similar bodies.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
GB/T 25069-2010 Information security technology — Glossary
3 Terms and definitions
For the purposes of this document, the terms and definitions given in GB/T 25069-2010 and the following apply.
3.1
personal information
various information recorded electronically or otherwise that can, either alone or in combination with other information, identify a particular natural person or reflect the activity of such a person
[GB/T 35273-2017, 3.1]
3.2
personal information subject
the natural person identified by personal information
[GB/T 35273-2017, 3.3]
3.3
de-identification
process of treating personal information by technical means so that the personal information subject cannot be identified without additional information
[GB/T 35273-2017, 3.14]
Note: De-identification removes the correlation between identifiers and the personal information subject.
3.4
microdata
structured data set, in which each record (row) corresponds to a personal information subject, and each field (column) in the record corresponds to an attribute
3.5
aggregate data
data representing a set of personal information subjects
Note: For example, a set of various statistical values.
3.6
identifier
one or more attributes in microdata that may uniquely identify the personal information subject
Note: Identifiers are classified into direct identifiers and quasi-identifiers.
3.7
direct identifier
attribute in microdata that can identify the personal information subject independently under specific circumstances
Note 1: Specific circumstances refer to the specific scenario in which personal information is applied. For example, in a specific school, a specific student may be directly identified by his or her student number.
Note 2: Common direct identifiers include name, ID card number, passport number, driver's license number, address, email address, telephone number, fax number, bank card number, license plate number, vehicle identification number, social insurance number, health card number, medical record number, equipment identifier, biometric code, Internet Protocol (IP) address and uniform resource locator (URL).
3.8
quasi-identifier
attribute in microdata that may uniquely identify the personal information subject in combination with other attributes
Note: Common quasi-identifiers include gender, date of birth or age, date of event (e.g., admission, operation, discharge, visit), place (e.g., postal code, building name, region), ethnic origin, country of birth, language, aboriginal status, visible ethnic minority status, occupation, marital status, education level, school years, criminal history, total income and religious belief.
3.9
re-identification
process of re-correlating the de-identified data set to the original personal information subject or a set of personal information subjects
3.10
sensitive attribute
attribute in a data set that needs to be protected, whose leakage, modification, destruction or loss will cause harm to individuals
Note: During a potential re-identification attack, the value of a sensitive attribute needs to be prevented from being correlated with any personal information subject.
3.11
usefulness
characteristic of data that has concrete and useful meaning for an application
Note: De-identified data is widely used, and each application requires the de-identified data to have certain characteristics in order to achieve its purpose, so these characteristics need to be retained after de-identification.
3.12
completely public sharing
public release directly through the Internet, with data hard to recall once disseminated
Note: Equivalent to the English term “The Release and Forget Model”.
3.13
controlled public sharing
data use restricted by the data use agreement
Note 1: For example, information receivers are prohibited from launching re-identification attacks on individuals in data sets, from correlating with external data sets or information, and from sharing data sets without permission.
Note 2: Equivalent to the English term “The Data Use Agreement Model”.
3.14
enclave public sharing
data sharing in a physical or virtual enclave, where data cannot flow out of the enclave
Note: Equivalent to the English term “The Enclave Model”.
3.15
de-identification technique
technique to reduce the correlation between information in a data set and the personal information subject
Note 1: Reducing the distinguishability of information means that the information cannot be matched to a specific individual. At a still lower level of distinguishability, it is impossible to judge whether different pieces of information correspond to the same individual. In practice, it is often required that the number of people to whom a piece of information may correspond exceeds a certain threshold.
Note 2: Disconnecting from the personal information subject means separating other personal information from identification information.
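The threshold requirement in Note 1 can be checked by grouping records on their quasi-identifier values and finding the smallest group. The sketch below is illustrative only: the field names and the value of `k` are assumptions, and treating "meets the threshold" as a group size of at least `k` is one possible reading rather than a rule stated in this standard.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)

def meets_threshold(records, quasi_identifiers, k):
    # Assumption: every combination of quasi-identifier values must be
    # shared by at least k records.
    return smallest_group_size(records, quasi_identifiers) >= k

# Hypothetical de-identified records.
records = [
    {"age_range": "20-29", "postal_prefix": "1000", "diagnosis": "flu"},
    {"age_range": "20-29", "postal_prefix": "1000", "diagnosis": "asthma"},
    {"age_range": "30-39", "postal_prefix": "1000", "diagnosis": "flu"},
]
size = smallest_group_size(records, ["age_range", "postal_prefix"])
ok = meets_threshold(records, ["age_range", "postal_prefix"], k=2)
```

Here the record with `age_range` "30-39" is alone in its group, so the data set would need further generalization or suppression before it met a threshold of 2.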
3.16
de-identification model
method of applying de-identification technique and calculating re-identification risk
4 General
4.1 De-identification objectives
The de-identification objectives include:
a) Delete or transform the direct identifiers and quasi-identifiers, so as to prevent an attacker from identifying the original personal information subject either directly from these attributes or by combining them with other information;
b) Control the risk of re-identification, select appropriate models and techniques based on available data and application scenarios, and control the risk of re-identification within an acceptable range; ensure that the risk of re-identification will not increase with the dissemination of new data, and ensure that potential collusion between data recipients will not increase the risk of re-identification;
c) While keeping the re-identification risk under control, and taking business objectives and data characteristics into account, select the appropriate de-identification model and technique to ensure that the de-identified data set meets its intended purpose (i.e., remains useful) as far as possible.
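Objective a) can be illustrated with a short sketch that deletes direct identifiers and transforms quasi-identifiers in a single record. The field names, the ten-year age bands and the postal-code masking rule are assumptions made for the example. The sketch does not by itself address objectives b) and c), which depend on the model and the application scenario.

```python
def age_to_range(age, width=10):
    """Generalize an exact age into a band such as '30-39' (illustrative rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def mask_postal_code(code, keep=3):
    """Keep the leading characters of a postal code and mask the rest (illustrative rule)."""
    return code[:keep] + "*" * (len(code) - keep)

def de_identify(record, direct_identifiers, generalizers):
    """Drop direct identifiers and transform quasi-identifiers in one record."""
    out = {}
    for field, value in record.items():
        if field in direct_identifiers:
            continue  # delete the direct identifier
        transform = generalizers.get(field)
        out[field] = transform(value) if transform else value
    return out

# Hypothetical record and field classification.
record = {
    "name": "Zhang San",
    "phone": "13800000000",
    "age": 34,
    "postal_code": "100084",
    "diagnosis": "flu",
}
result = de_identify(
    record,
    direct_identifiers={"name", "phone"},
    generalizers={"age": age_to_range, "postal_code": mask_postal_code},
)
```

Which fields count as direct identifiers or quasi-identifiers depends on the scenario, so the classification is passed in as parameters instead of being fixed in the function.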
Contents of GB/T 37964-2019
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General
4.1 De-identification objectives
4.2 De-identification principles
4.3 Re-identification risks
4.4 De-identification impact
4.5 Impact of different types of public sharing on de-identification
5 De-identification process
5.1 General
5.2 Determination of objectives
5.3 Identifying the identification
5.4 Processing the identification
5.5 Verification and approval
5.6 Monitoring and reviewing
6 Role responsibilities and personnel management
6.1 Role responsibilities
6.2 Personnel management
Annex A (Informative) Common de-identification techniques
Annex B (Informative) Common de-identification models
Annex C (Informative) Selection of de-identification model and technique
Annex D (Informative) Challenges to de-identification
Bibliography