GB/T 26237.13-2023 Information technology - Biometric data interchange formats-Part 13: Voice data
1 Scope
This document specifies a data interchange format that can be used for storing, recording, and transmitting digitized acoustic human voice data (speech) assumed to be from a single speaker recorded in a single session. This format is designed specifically to support a wide variety of Speaker Identification and Verification (SIV) applications, both text-dependent and text-independent, with minimal assumptions made regarding the voice data capture conditions or the collection environment. Other uses for the data encapsulated in this format, such as automated speech recognition (ASR), may be possible, but are not addressed in this document. This document also does not address handling of data that has been processed to the feature or voice model levels. No application-specific requirements, equipment, or features are addressed in this document. This document supports the optional inclusion of non-standardized extended data. This document allows both the originally captured voice data and digitally-processed (enhanced) voice data to be exchanged. A description of any processing of the original source input is intended to be included in the metadata associated with the voice representations (VRs). This document does not address data streaming.
Provisions that stored and transmitted biometric data be time-stamped and that cryptographic techniques be used to protect their authenticity, integrity and confidentiality are out of the scope of this document.
Information formatted in accordance with this document can be recorded on machine-readable media or can be transmitted by data communication between systems.
A general content-oriented subclause describing the voice data interchange format is followed by a subclause addressing an XML schema definition.
This document includes vocabulary in common use by the speech and speaker recognition community, as well as terminology from other ISO standards.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601 Data elements and interchange formats - Information interchange - Representation of dates and times
Note: GB/T 7408-2005 Data elements and interchange formats - Information interchange - Representation of dates and times (ISO 8601:2000, IDT)
ISO/IEC 2382-37 Information technology - Vocabulary - Part 37: Biometrics
Note: GB/T 5271.37-2021 Information technology - Vocabulary - Part 37: Biometrics (ISO/IEC 2382-37:2017, MOD)
ISO/IEC 19785-1 Information technology - Common Biometric Exchange Formats Framework - Part 1: Data element specification
Note: GB/T 28826.1-2012 Information technology - Common biometric exchange formats framework - Part 1: Data element specification (ISO/IEC 19785-1:2006, MOD)
ISO/IEC 19794-1 Information technology - Biometric data interchange formats - Part 1: Framework
Note: GB/T 26237.1-2022 Information technology - Biometric data interchange formats - Part 1: Framework (ISO/IEC 19794-1:2011, MOD)
3 Terms and definitions
For the purposes of this document, the terms and definitions in ISO/IEC 19794-1 and the following apply.
3.1
analog-to-digital converter (ADC) resolution
exponent of the base 2 representation (the number of bits) of the number of discrete amplitudes that the analog-to-digital converter is capable of producing
Note: Common values for ADC resolution for sound-cards are: 8, 16, 20 and 24.
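The relationship in 3.1 between ADC resolution and the number of discrete amplitudes can be illustrated with a short Python sketch (an informal illustration only, not part of the standard; the helper names are ours):

import math

def amplitude_levels(adc_resolution_bits: int) -> int:
    # Number of discrete amplitudes an ADC with the given resolution can produce
    return 2 ** adc_resolution_bits

def adc_resolution(levels: int) -> int:
    # ADC resolution as defined in 3.1: the base-2 exponent of the number of discrete amplitudes
    return int(math.log2(levels))

# Common sound-card resolutions from the Note above
for bits in (8, 16, 20, 24):
    print(bits, "bits ->", amplitude_levels(bits), "discrete amplitudes")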
3.2
audio duration
duration of the complete audio containing all voice representation utterances, e.g. whole call recordings
3.3
audio encoding
encoding used by the data capture subsystem, e.g. a cellphone
Note 1: The voice signal is encoded before being transmitted over a channel. There are many formats in use today and the number is likely to continue to change as telephones and transmission channels evolve. Formats include PCM (ITU-T G.711) and ADPCM (ITU-T G.726) for wave encoding and ACELP (ITU-T G.723.1) and CS-ACELP (ITU-T G.729 Annex A) for AbS encoding. A-law PCM and mu-law PCM are included in ITU-T G.711.
Note 2: A comprehensive overview list is provided in 7.4.3.2.
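As an illustration of one encoding named in Note 1, the following Python sketch applies the ITU-T G.711 mu-law compression characteristic (mu = 255) to a normalized sample; it is an informal sketch only and omits the quantization to 8-bit codewords performed by an actual G.711 encoder:

import math

MU = 255  # mu value used by ITU-T G.711 mu-law PCM

def mu_law_compress(x: float) -> float:
    # Map a normalized sample x in [-1, 1] through the mu-law compression curve
    sign = 1.0 if x >= 0 else -1.0
    return sign * math.log(1 + MU * abs(x)) / math.log(1 + MU)

# Low-level samples are boosted relative to high-level ones, which is the point of companding
print(mu_law_compress(0.01), mu_law_compress(0.5), mu_law_compress(1.0))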
3.4
compression
process that reduces the size of a digital file and, accordingly, the data rate required for transmission
Note: Some audio encodings include compression and some do not. Compression is almost always “lossy” and, therefore, has an impact on the speech signal.
3.5
cut-off frequency (lower/upper)
frequency (below/above) which the acoustic energy drops 3 dB below the average energy in the pass band
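Since 10 lg(0.5) is approximately -3.01 dB, the 3 dB criterion in 3.5 corresponds roughly to the half-power point. A minimal Python sketch of this reading of the definition (the helper name and the sampled-spectrum representation are assumptions, not part of the standard):

import math

def cutoff_frequencies(freqs, power, passband):
    # Outermost frequencies at which the power remains within 3 dB of the
    # average power in the nominal pass band (f_lo, f_hi)
    in_band = [p for f, p in zip(freqs, power) if passband[0] <= f <= passband[1]]
    avg = sum(in_band) / len(in_band)
    threshold = avg * 10 ** (-3 / 10)  # 3 dB below the average, roughly half power
    above = [f for f, p in zip(freqs, power) if p >= threshold]
    return min(above), max(above)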
3.6
far-field
region far enough from the source where the angular field distribution is independent of the distance from the source
3.7
interactive voice response
function of a telephony-based computer that is used to control the flow of telephone calls and to provide voice-based self-service
Note 1: Technology that allows a computer to detect voice and keypad inputs.
Note 2: IVR systems deal with several real-world and constrained-content effects, such as emotional voices, varying environmental noises and recordings of free speech, but also hot words (e.g. yes, no, digits, keywords).
Note 3: IVRs apply ASR for user navigation; in secure applications, e.g. financial transactions via telephone, SIV becomes relevant. IVR systems may combine ASR and SIV to detect audio sample replays and to detect user liveness by presenting on-time generated knowledge that the user shall speak.
3.8
microphone
data capture subsystem that converts the acoustic pressure wave emanating from the voice into an electrical signal
3.9
mid-field
region between the near-field and the far-field which has a combination of the characteristics found in both the near-field and the far-field
3.10
near-field
region in an enclosure in which the direct energy at the microphone from the primary source is greater than the reflected energy from that source
3.11
public switched telephone network
channel-based technology used to switch analogue signals, typically telephone calls, through a network from a source such as a telephone to a destination such as another telephone
Note: Knowledge about the channel where a telephone call originates is useful because, historically, noise and other channel characteristics vary from country to country. The advent and growth of VoIP and other digital telephone networks has attenuated the impact of national telecommunications networks because they are not constrained by national boundaries. For example, a call originating in the United States might traverse Canada before arriving at its destination, which could be within the United States (see Voice over IP).
3.12
representation duration
duration of a single voice representation utterance
3.13
sampling rate
number of samples per second (or per other unit) taken from a continuous signal to make a discrete signal
Note 1: When the rate is per second, the unit is Hertz (Hz).
Note 2: Equal to the sampling frequency.
Note 3: The rate of sampling needs to satisfy the Nyquist criterion.
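The Nyquist criterion mentioned in Note 3 requires the sampling rate to exceed twice the highest frequency component of the signal; a minimal Python sketch (the helper name is ours):

def satisfies_nyquist(sampling_rate_hz: float, max_signal_freq_hz: float) -> bool:
    # True if the sampling rate is high enough to represent the signal without aliasing
    return sampling_rate_hz > 2 * max_signal_freq_hz

# Narrowband telephone speech is typically band-limited to about 3.4 kHz and sampled at 8 kHz
print(satisfies_nyquist(8000, 3400))  # True
print(satisfies_nyquist(8000, 4100))  # False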
3.14
session
single capture process that takes place over a single, continuous time period
Note: In biometric systems a session can be interpreted as the time of recording one or more samples without the subject leaving the scene of the biometric capture device, i.e. passing through a control stage/barrier marks the end of a session, while multiple rejects can occur during one session.
3.15
signal-to-encoding noise ratio
SNR
ratio of the pure signal of interest to the noise component that results from possible electronic noise sources
Note 1: SNR(dB) = 10 lg(Ps/Pn), where Ps is the average signal power and Pn is the average noise power. For digitized signals, Ps = (1/N) Σ_{n=1..N} s(n)^2 and Pn = (1/N) Σ_{n=1..N} e(n)^2, where s(n) denotes the digitized signal samples, e(n) the noise samples and N the total number of digital samples.
Note 2: SNR is usually measured in decibels (dB).
Note 3: For example, in PCM the noise is caused by quantization and is roughly calculated in Furui, Digital Speech Processing, Synthesis, and Recognition (Dekker, 1989) as SNR(dB) = 6B - 7.2, where B is the number of quantization bits.
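A minimal Python sketch of the SNR computation and the PCM quantization estimate given in the notes above (list-based and purely illustrative; the function names are ours):

import math

def snr_db(signal, noise):
    # SNR in dB from digitized signal and noise sample sequences
    ps = sum(s * s for s in signal) / len(signal)  # average signal power
    pn = sum(e * e for e in noise) / len(noise)    # average noise power
    return 10 * math.log10(ps / pn)

def pcm_quantization_snr_db(bits: int) -> float:
    # Rough quantization SNR for B-bit PCM: SNR(dB) = 6B - 7.2
    return 6 * bits - 7.2

print(round(pcm_quantization_snr_db(16), 1))  # 88.8 dB for 16-bit linear PCM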
3.16
speaker identification
form of speaker recognition which compares a voice sample with a set of voice references corresponding to different persons to determine the one who has spoken
3.17
speaker recognition
process of determining whether two speech segments were produced by the vocal mechanism of the same data subject
3.18
speaker verification
speaker authentication
form of speaker recognition for deciding whether a speech sample was spoken by the person whose identity was claimed
Note 1: Speaker verification is used mainly to restrict access to information, facilities or premises.
Note 2: Speaker verification can also be called speaker confirmation. In this document and in practical applications, the terms confirmation and verification can be used interchangeably.
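The distinction between speaker identification (3.16) and speaker verification (3.18) can be sketched as a 1:N search versus a 1:1 threshold decision. The following Python sketch is purely illustrative; the similarity function stands for a hypothetical comparison subsystem and is not defined by this document:

from typing import Callable, Dict

Similarity = Callable[[bytes, bytes], float]  # hypothetical comparison score between two voice samples

def identify(sample: bytes, references: Dict[str, bytes], similarity: Similarity) -> str:
    # Speaker identification: compare one sample against references for different
    # persons and return the identifier of the best-matching speaker
    return max(references, key=lambda person: similarity(sample, references[person]))

def verify(sample: bytes, claimed_reference: bytes, similarity: Similarity, threshold: float) -> bool:
    # Speaker verification: decide whether the sample was spoken by the person whose identity was claimed
    return similarity(sample, claimed_reference) >= threshold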
3.19
speaker identification and verification
process of automatically recognizing individuals through voice characteristics
Note: The data format itself does not depend on the application purpose (active/passive SIV).
3.20
voice
speech
sound produced by the vocal apparatus whilst speaking
Note 1: Normally defined by phoneticians as the sound that emanates from the lips and nostrils, which comprises "voiced" and "unvoiced" sound produced by the vibration of the vocal folds and from constrictions within the vocal tract, modified by the time-varying acoustic transfer characteristic of the vocal tract.
Note 2: For the purposes of this document, speech and voice are used interchangeably.