Journal Information
Vol. 65. Issue 3.
Pages 170-176 (May - June 2014)
Original article
DOI: 10.1016/j.otoeng.2014.05.007
Acoustic Voice Analysis Using the Praat programme: Comparative Study With the Dr. Speech Programme
Análisis acústico de la voz mediante el progama Praat: estudio comparativo con el programa Dr. Speech
Faustino Núñez Batalla (a, *), Rocío González Márquez (a), M. Belén Peláez González (b), Irene González Laborda (b), María Fernández Fernández (b), Marta Morato Galán (a)
* Corresponding author: fnunezb@telefonica.net
a Servicio de Otorrinolaringología, Hospital Universitario Central de Asturias, Oviedo, Spain
b Grado de Logopedia, Facultad de Psicología, Universidad de Oviedo, Oviedo, Spain
Abstract
Introduction and objectives

The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis.

The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields:

1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative).

2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, and fundamental frequency) (quantitative).

Materials and methods

We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and the second one used the Praat programme (Phonetic Sciences, University of Amsterdam).

The spectrographic analysis consisted of obtaining a narrow-band spectrogram from the previous digitalised voice samples by the 2 independent observers. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. As a final result, the acoustic parameters jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programmes.

Results

The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programmes, even though type 1, 2 and 3 voice samples were analysed.

Conclusions

The Praat and Dr. Speech programmes provide similar results in the acoustic analysis of pathological voices.

Keywords:
Sound spectrogram
Acoustic analysis
Praat
Dr. Speech
Resumen
Introducción y objetivos

El protocolo de la European Laryngological Society (ELS) para la valoración funcional de la disfonía incluye 5 dimensiones: percepción, análisis acústico, videoestroboscopia, aerodinámica y autovaloración del paciente.

El objetivo de este trabajo es correlacionar los resultados obtenidos con el programa comercial Dr. Speech con los obtenidos con el programa gratuito Praat en 2 ámbitos:

1. Espectrograma de banda estrecha (presencia de ruido según Yanagihara y presencia de subarmónicos) (semicuantitativo).

2. Parámetros acústicos de la voz (jitter, shimmer, relación armónico-ruido, frecuencia fundamental) (cuantitativo).

Material y métodos

Se estudiaron un total de 99 muestras de voz diagnosticadas mediante videoestroboscopia de edema de Reinke. En este estudio un observador independiente utilizó el Dr. Speech 3.0 y otro el Praat (Phonetic Sciences, University of Amsterdam).

El análisis espectrográfico consistió en obtener un espectrograma de banda estrecha a partir de las anteriores voces digitalizadas por parte de los 2 observadores independientes. Después determinaron la presencia de ruido en el espectrograma siguiendo los grados de Yanagihara y la presencia de subarmónicos. Por último, se obtuvieron los siguientes parámetros acústicos: jitter, shimmer, relación armónico-ruido (HNR) y el valor de la frecuencia fundamental (Fo).

Resultados

Los resultados indican que el espectrograma y el parámetro de perturbación de la frecuencia jitter son comparables en los 2 programas. También es comparable el parámetro de perturbación de la amplitud shimmer, a pesar de haber analizado tanto voces de tipo 1, como de tipo 2 y de tipo 3.

Conclusiones

Los programas Praat y Dr. Speech ofrecen similares resultados en el análisis acústico de las voces patológicas.

Palabras clave:
Espectrograma
Análisis acústico
Praat
Dr. Speech
Introduction

Acoustic voice analysis based on perturbation measures has long been the subject of debate, especially with regard to its validity and, fundamentally, its criterion validity in relation to perceptual assessment, the point of reference for evaluating vocal quality. Numerous studies have shown the relationship between perturbation parameters and the perceptual correlates of dysphonia as assessed with the grade-roughness-breathiness-asthenia-strain (GRBAS) scale.1–3 It has also been shown that these parameters make it possible to document the severity of a dysphonia, although their usefulness in the etiological diagnosis of the vocal disorder has not been demonstrated.4,5

Although this is a matter of intense research activity, the routine use of these parameters in clinical practice has not been achieved. One of the reasons is undoubtedly the cost of voice analysis systems and programmes. However, free computer applications that can be used for this purpose are now being developed. One of the most widely used is Praat, which was originally designed for instrumental phonetics but has extensive capabilities for acoustic signal analysis and spectrography. We present a comparison of the results of acoustic perturbation and spectrographic analysis between a commercial programme and Praat, using the same dysphonic voice recordings. Our objective was to ascertain whether there were any differences between these 2 programmes and to provide evidence to support the application of Praat in clinical practice, so as to extend the use of acoustic voice analysis in daily practice.

Materials and Methods

Voice Samples

A total of 99 voice samples were studied, corresponding to an equal number of patients, diagnosed with Reinke's oedema using videostroboscopy.

Recording

The acoustic signal was recorded using the Voice Assessment application of Dr. Speech 3.0 for Windows 95 on a PC-compatible Pentium-100 computer with 16MB of RAM. The vocal signal was digitised with a Windows-compatible sound card (Sound Blaster 16) at 16-bit resolution and a sampling frequency of 44,100Hz. A unidirectional dynamic microphone with high frequency resolution was placed 10cm from the patient's mouth while he or she uttered the vowel /e/ at a comfortable intensity and tone in a soundproof chamber. The computer captured 3s of the sustained vowel. The recommendations of the National Center for Voice and Speech were followed.6

Spectrographic Analysis

Spectrographic analysis consisted of making a narrow band spectrogram with both programmes, based on the digitalized voices that were handed out to each observer; these voices were classified according to the Yanagihara score7 and the presence or absence of subharmonics.8

Perturbation Analysis

For the acoustic analysis, the values for jitter, shimmer, harmonics-to-noise ratio (HNR) and fundamental frequency (F0) provided by each programme were obtained. In this study mean relative jitter and mean percent shimmer are considered.
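As a rough illustration of what an HNR value expresses, the sketch below converts the peak of a normalised autocorrelation function into decibels. This follows the definition documented for Praat; the algorithm used by Dr. Speech is not described in this article, so the sketch should not be read as that programme's method. The function name `hnr_db` and the parameter `r_max` are hypothetical.

```python
import math


def hnr_db(r_max):
    """Harmonics-to-noise ratio in dB.

    r_max is the height of the normalised autocorrelation peak at the
    fundamental period, read here as the fraction of signal energy that
    is periodic (strictly between 0 and 1).
    """
    if not 0.0 < r_max < 1.0:
        raise ValueError("r_max must lie strictly between 0 and 1")
    # Ratio of periodic energy to noise energy, expressed in decibels.
    return 10.0 * math.log10(r_max / (1.0 - r_max))


# A signal with 99% of its energy in the periodic part gives about 20 dB,
# which is close to the mean HNR reported for both programmes in Table 1.
print(round(hnr_db(0.99), 1))
```

A signal with equal periodic and noise energy (`r_max = 0.5`) gives 0 dB under this definition.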

Acoustic Analysis Programmes

Dr. Speech Sciences for Windows (Tiger Electronics Inc.) is a set of programmes designed for PC-compatible computers and created for voice analysis and rehabilitation in clinical practice. It runs in a Windows environment and can be installed on any laptop that meets its requirements. It includes a series of programmes or modules: voice assessment; speech analysis; electroglottography (EGG) assessment; clinical progress tracking; speech training; voice synthesis and therapy; wave generator; and phonetogram. In this study we used the first 2 modules. The first was the voice assessment module, which calculates 5 vocal parameters: fundamental frequency (F0), jitter, shimmer, glottal noise and standard deviation of the F0. It provides numerical values and a simple graph of the analysis results compared with a normal pattern, which makes it possible to assess the degree of dysphonia quickly in visual form. The second was the speech analysis module, which (using 2 windows) displays a phrase as a waveform or oscillogram and produces a wide- or narrow-band spectrogram of the entire phrase or of a chosen segment. The programme can also draw the formants on the spectrogram.

The Praat programme is a tool for phonetic speech analysis developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam.9 To load a recording, select Open in the upper menu of the Praat Objects window and then Read from file, and choose the previously saved voice file to be analysed; it then appears in the Objects list.

To obtain the F0 and the perturbation measures (jitter, shimmer and HNR), select View & Edit from the column on the right of the Praat Objects window. In the editor, select Pulses and then Show Pulses, which makes the glottal pulses appear on the oscillogram. Next, select part or all of the voice file and, in the upper menu, choose Pulses and then Voice Report. A new Praat Info window then appears with the values for F0, jitter, shimmer and HNR, among other parameters. The parameters chosen for comparison with those of Dr. Speech were Median Pitch (Hz), Jitter (rap, %), Shimmer (apq5, %) and Mean harmonics-to-noise ratio (dB).
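To make the two perturbation measures taken from the Praat voice report more concrete, the following minimal sketch computes a relative average perturbation (RAP, 3-cycle smoothing) for jitter and a 5-point amplitude perturbation quotient (APQ5) for shimmer. It assumes the cycle periods and peak amplitudes have already been extracted; the function names and the example values are hypothetical, and the code is an illustration of the published definitions rather than Praat's own implementation.

```python
def smoothed_perturbation(values, window):
    """Mean absolute deviation of each value from the average of the
    `window` cycles centred on it, divided by the overall mean value."""
    if window % 2 == 0 or len(values) < window:
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        local_mean = sum(values[i - half:i + half + 1]) / window
        deviations.append(abs(values[i] - local_mean))
    mean_deviation = sum(deviations) / len(deviations)
    overall_mean = sum(values) / len(values)
    return mean_deviation / overall_mean


def jitter_rap_percent(periods):
    """Jitter (rap) in percent from consecutive cycle durations."""
    return 100.0 * smoothed_perturbation(periods, 3)


def shimmer_apq5_percent(amplitudes):
    """Shimmer (apq5) in percent from consecutive cycle peak amplitudes."""
    return 100.0 * smoothed_perturbation(amplitudes, 5)


# Hypothetical cycle durations (seconds) and peak amplitudes.
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052, 0.0050]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78]
print(jitter_rap_percent(periods), shimmer_apq5_percent(amplitudes))
```

Both measures depend on where each cycle boundary is placed, which is one reason two programmes can return different values for the same recording.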

Statistical Analysis

Continuous variables (acoustic analysis parameters) are described with mean and standard deviation, and categorical variables (spectrogram parameters) are described using relative frequencies.

To study the agreement between the continuous variables obtained with the 2 programmes, the intraclass correlation index (ICI) was used; this shows the degree of concordance between the measurements. A value above 0.8 is considered to indicate “good correlation”.
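The article does not state which intraclass correlation model was selected in SPSS. As an assumption for illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form from a table with one row per voice sample and one column per programme. The function name and the example data are hypothetical.

```python
def icc_absolute_agreement(rows):
    """Two-way random-effects, absolute-agreement, single-measure
    intraclass correlation for n subjects (rows) and k raters (columns)."""
    n, k = len(rows), len(rows[0])
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a rectangular table of at least 2 x 2")
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Sums of squares for subjects, raters and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical shimmer values (%) for five samples: (Dr. Speech, Praat).
pairs = [[2.1, 2.3], [3.8, 3.6], [5.0, 5.4], [1.2, 1.1], [7.9, 7.5]]
print(icc_absolute_agreement(pairs))
```

With this form, a constant offset between the 2 programmes lowers the index, whereas a consistency-type index would ignore it; this is why the choice of model matters when values are compared across studies.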

For the categorical variables, the kappa index was used to rule out chance agreement (+1 indicates complete agreement, −1 complete disagreement, and 0 the agreement expected by chance alone).
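The kappa index described above can be sketched for 2 observers as follows. This is a minimal implementation of Cohen's kappa, assuming each observer assigns one category per voice sample; the function name and the example grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two observers rating the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both observers must rate the same non-empty set")
    n = len(labels_a)
    # Proportion of samples on which the observers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each observer's category frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        count_a[c] * count_b[c] for c in set(count_a) | set(count_b)
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical Yanagihara grades assigned by two observers to six samples.
observer_1 = ["I", "I", "II", "III", "II", "IV"]
observer_2 = ["I", "I", "II", "III", "III", "IV"]
print(cohen_kappa(observer_1, observer_2))
```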

The 2 samples whose perturbation analysis results were over 5% were eliminated from the statistical study because they were considered unreliable.10,11

The statistical analysis (after preparing a database) was handled using SPSS version 15.0 for Windows (SPSS Inc., Chicago, IL).

Results

Acoustic Analysis

The ICI was high for all of the variables studied, ranging from a minimum for F0 (0.740) to a maximum for shimmer (0.903). The differences between the results provided by the 2 programmes were not significant, except for jitter (P=.005). However, this difference could be a consequence of the high correlation between the 2 programmes, which would make even minimal differences in the results significant. It could also stem from each programme using a different algorithm to calculate this parameter.

The results obtained in the acoustic voice analysis can be seen in Tables 1 and 2, as well as in Figs. 1 and 2.

Table 1.

Descriptive Statistics for Both Programmes.

| Parameter | Mean | Standard deviation | Maximum | Minimum |
|---|---|---|---|---|
| Jitter (DS), % | 0.72 | 1.0 | 5.7 | 0.0 |
| Jitter (P), % | 0.59 | 0.7 | 17.7 | 0.1 |
| Shimmer (DS), % | 3.8 | 3.4 | 23.7 | 0.0 |
| Shimmer (P), % | 3.8 | 3.4 | 21.9 | 0.7 |
| HNR (DS) | 20.3 | 6.4 | 31.8 | 0.0 |
| HNR (P) | 20.3 | 6.1 | 30.4 | 1.8 |
| F0 (DS), Hz | 175.1 | 48.1 | 289.9 | 86.5 |
| F0 (P), Hz | 169.9 | 49.2 | 282.6 | 84.4 |

DS: Dr. Speech; F0: fundamental frequency; HNR: harmonics-to-noise ratio; Hz: hertz; P: Praat.

Table 2.

Statistical Results.

| Parameter | Dr. Speech, mean | Dr. Speech, SD | Praat, mean | Praat, SD | P value | ICI (95% CI) |
|---|---|---|---|---|---|---|
| Jitter (%) | 0.722 | 1.0 | 0.595 | 0.786 | .005 | 0.856 (0.2–0.7) |
| Shimmer (%) | 3.810 | 3.433 | 3.820 | 3.412 | .926 | 0.903 (0.7–0.9) |
| HNR | 20.35 | 6.430 | 20.39 | 6.152 | .911 | 0.784 (0.7–0.8) |
| F0 (Hz) | 175.15 | 48.10 | 169.95 | 49.26 | .092 | 0.740 (0.6–0.9) |

CI: confidence interval; F0: fundamental frequency; HNR: harmonics-to-noise ratio; ICI: intraclass correlation index; SD: standard deviation.

Figure 1.

Correlation of the several variables in the concordance analysis in the acoustic analysis.

Figure 2.

Bland–Altman graphs for the acoustic variables, showing the concordance between the data obtained using both the programmes.
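The quantities plotted in a Bland–Altman graph such as Fig. 2 can be sketched as follows: the mean difference between the 2 programmes (bias) and the limits of agreement around it. The 1.96 multiplier is the conventional choice and is an assumption here, since the article does not state how the limits were drawn; the function name and the example values are hypothetical.

```python
import statistics


def bland_altman(values_a, values_b, multiplier=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(values_a, values_b)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample standard deviation
    return bias, bias - multiplier * spread, bias + multiplier * spread


# Hypothetical F0 values (Hz) for the same samples in each programme.
dr_speech = [175.0, 120.5, 210.2, 98.4, 160.0]
praat = [172.1, 121.0, 205.9, 97.8, 158.3]
print(bland_altman(dr_speech, praat))
```

Each point in the graph is one sample plotted at the mean of its 2 measurements against their difference; points falling outside the limits flag samples on which the programmes disagree most.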

Spectrographic Analysis

Table 3 presents the descriptive statistics of the spectrographs of both the programmes.

Table 3.

Descriptive Spectrographic Statistics: Yanagihara Grades and Subharmonics.

| Programme | Grade I (%) | Grade II (%) | Grade III (%) | Grade IV (%) | Subharmonics (%) |
|---|---|---|---|---|---|
| Dr. Speech | 41.1 | 25.2 | 24.3 | 7.5 | 38.3 |
| Praat | 40.2 | 26.2 | 23.4 | 6.5 | 41.7 |

In the classification of the voices on the Yanagihara scale, the 2 observers agreed in 94 of the 99 cases (96%), with a kappa index of 0.940 (P=.03). This result indicates almost complete agreement between the 2 observations.

In addition, the analysis of subharmonics showed coincidence in 93 cases (94%), with a kappa index of 0.873 (P=.05).

Discussion

This study reveals the similarities between a commercial programme (Dr. Speech, Tiger Electronics) and a free acoustic voice analysis programme (Praat) by means of analysing 3-s vocal samples of the sustained vowel /e/ obtained from 99 patients diagnosed with Reinke's oedema.

Previous studies have analysed the differences in the results of perturbation measurements between acoustic analysis programmes.12–14 Our study also addresses spectrography, comparing the classification of noise and the detection of subharmonics between the 2 programmes.

In our work, jitter showed a lower correlation between the 2 programmes than shimmer (ICI 0.856 versus 0.903). This finding has also been reported in other studies, which observe that perturbation measurements, and especially frequency perturbation measurements, correlate less well between programmes even when the results for fundamental frequency are very similar. The better correlation of amplitude perturbation compared with frequency perturbation can be explained by jitter being much more dependent than shimmer on the exact placement of the cycle boundaries. While minimal errors in locating the cycle boundaries add considerable noise to frequency perturbation measurements, such errors are less detrimental to amplitude perturbation measurements, because they are generally not large enough to miss the cycle peak completely.12,14 This explains why studies comparing results obtained by different programmes, the present one included, find weak or moderate correlations in frequency perturbation and moderate or strong ones in amplitude perturbation. It also makes it necessary to study a series of healthy individuals to establish standard values for the Praat programme, a task that has not been performed to date.15

After studying the perturbation measurements, we investigated the similarities and differences between both the programmes in spectrography. The narrow-band spectrographic tracings were analysed in accordance with the Yanagihara classification and observing the presence of subharmonics. It should be emphasised that there was high concordance between observers in the spectrographic analysis with the 2 programmes, both in the Yanagihara scale (with almost complete correlation) and the subharmonics. The results showed almost complete coincidence. Consequently, it can be concluded that the spectrographic analysis is absolutely comparable between the different programmes. This finding was expected, because the programmes produce an image or a spectrogram that requires evaluation by an examiner to interpret it, and not a mathematical algorithm that yields a numeric result.

This advantage of spectrography makes it possible to compare the results of different programmes, as long as the same criteria are used to evaluate the images. It should be borne in mind that acoustic voice analysis must always include this technique when studying voices that exceed 5% of frequency or amplitude perturbation (voices that correspond to Titze type 2). According to Titze,6 it is useful first of all to classify voices into 3 types: type 1 voices are practically periodic; type 2 voices contain aperiodicity, subharmonics or voice breaks; and type 3 voices are chaotic. Consequently, this author recommends starting the assessment of the pathological voice with a spectrographic analysis to establish the most suitable study methods for each specific case, and the spectrograms obtained with Dr. Speech and the Praat programme serve this purpose.

Short-term perturbation measurements are not reliable if the voices contain intermittencies, strong subharmonics or modulations.16 Therefore, type 2 and type 3 voices can only be studied using a perceptual classification method (GRBAS) and by a visual method such as the spectrogram. No matter how pathological the voice is, there will always be a figure in which we can see noise, harmonics, subharmonics and vacant signal segments represented.8

In contrast, type 1 voices can indeed be analysed using the short-term perturbation parameters (jitter, shimmer and HNR) reliably.16
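The decision rule described in the preceding paragraphs can be summarised in a short sketch. It encodes only what the text states: every assessment starts with a spectrogram, type 1 voices also admit short-term perturbation measures, and type 2 and type 3 voices are limited to visual and perceptual methods. The function name and the method labels are hypothetical.

```python
def recommended_methods(voice_type):
    """Suggested analysis methods for a voice of Titze type 1, 2 or 3."""
    if voice_type == 1:
        # Practically periodic: perturbation measures are reliable.
        return ["narrow-band spectrogram",
                "perturbation measures (jitter, shimmer, HNR)"]
    if voice_type in (2, 3):
        # Aperiodic or chaotic: rely on visual and perceptual methods only.
        return ["narrow-band spectrogram", "perceptual rating (GRBAS)"]
    raise ValueError("voice_type must be 1, 2 or 3")


for voice_type in (1, 2, 3):
    print(voice_type, recommended_methods(voice_type))
```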

With this study on the correlation of the results obtained using 2 acoustic analysis programmes, 1 commercial (Dr. Speech) and the other free (Praat), we have attempted to support the use of the latter with evidence. This would mean benefiting from some of the following advantages:

1. Dr. Speech is designed for the operating systems Windows 95/NT/98/2000/XP and cannot be used with other operating systems. In contrast, the Praat programme can be used with Windows and Macintosh, with the free Linux operating system and with other systems such as FreeBSD, SGI, Solaris and HP-UX. This makes it easy to install on any equipment, without the need for a specific operating system.

2. Praat is open-source software (OSS), released under the GNU General Public License: the authors make the source code openly available, so the algorithms used in the programme for each parameter can be inspected by anyone. This allows users to use, change and improve the software. In this way, thanks to the collaboration between authors and users, this type of programme is developed more quickly than a commercial programme.

3. The Praat programme is free, so it is available to all voice professionals, in institutions or private offices. According to the study by Rodríguez-Parra et al.,17 62% of Spanish ORL services do not have a voice laboratory available, in spite of the fact that it is currently considered essential in assessing and treating patients with vocal problems and for clinical research on vocal disorders.18

The vocal function is multidimensional3 and its assessment consequently needs to be multidimensional as well. It should include perception, acoustic analysis, videostroboscopy, aerodynamics and patient self-evaluation.19 Thus, having reliable free programmes available should favour their use in both public institutions and private offices with restricted budgets.

This is important above all for self-employed professionals. The number of patients with voice disorders who attend a private speech therapy office, compared with those with other conditions, does not generally justify the economic investment in a commercial programme. However, the existence of free programmes that are valid, reliable, have minimum equipment requirements and are easy to use contributes to improving the quality of patient care.

Derived from this study, and to be able to use the Praat programme in a clinical setting, it is necessary to study series of healthy individuals as well with the objective of establishing its standard values, a task that has not yet been carried out.

Conclusions

The spectrogram obtained with the Praat programme is comparable to that obtained with the Dr. Speech programme.

There were weak or moderate correlations in frequency perturbation, and moderate or strong correlations in amplitude perturbation.

Conflict of Interest

The authors have no conflicts of interests to declare.

References
[1]
L. Eskenazi, D.G. Childers, D.M. Hicks.
Acoustic correlates of vocal quality.
J Speech Hear Res, 33 (1990), pp. 298-306
[2]
P.H. Dejonckere, M. Remacle, E. Fresnel-Elbaz, V. Woisard, L. Crevier-Buchman, B. Millet.
Differentiated perceptual evaluation of pathological voice quality: reliability and correlations with acoustic measurements.
Rev Laryngol Otol Rhinol, 117 (1996), pp. 219-224
[3]
M. Hirano.
Clinical examination of voice.
Springer, (1981)
[4]
J. Kreiman, B. Gerratt.
Measuring vocal quality.
Voice quality measurement, pp. 73-101
[5]
K. Werth, D. Voigt, M. Döllinger, U. Eysholdt, J. Lohscheller.
Clinical value of acoustic voice measures: a retrospective study.
Eur Arch Otorhinolaryngol, 267 (2010), pp. 1261-1271
[6]
I.R. Titze, National Center for Voice and Speech.
Workshop on acoustic voice analysis. Summary statement.
[7]
N. Yanagihara.
Significance of harmonic changes and noise components in hoarseness.
J Speech Hear Res, 10 (1967), pp. 531-541
[8]
F. Núñez Batalla, C. Suárez Nieto.
Espectrografía clínica de la voz.
Universidad de Oviedo. Servicio de Publicaciones, (1999)
[9]
P. Boersma, D. Weenink.
Phonetic sciences.
University of Amsterdam, (2013)
[10]
I.R. Titze, H. Liang.
Comparison of F0 extraction methods for high-precision voice perturbation measurements.
J Speech Hear Res, 36 (1993), pp. 1120-1133
[11]
S.N. Awan, S.E. Scarpino.
Measures of vocal F0 from continuous speech samples: an interprogram comparison.
J Speech Lang Pathol Audiol, 28 (2004), pp. 122-131
[12]
S. Bielamowicz, J. Kreiman, B.R. Gerratt, M.S. Dauer, G.S. Berke.
Comparison of voice analysis systems for perturbation measurement.
J Speech Hear Res, 39 (1993), pp. 126-134
[13]
M.P. Karnell, K.D. Hall, K.L. Landahl.
Comparison of fundamental frequency and perturbation measurements among three analysis systems.
J Voice, 9 (1995), pp. 383-393
[14]
I. Smits, P. Ceuppens, M.S. de Bodt.
Comparative study of acoustic voice measurements by means of Dr. Speech and computerized speech lab.
[15]
Y. Maryn, P. Corthals, M. de Bodt, P. Van Cauwenberge, D. Deliyski.
Perturbation measures of voice: a comparative study between multi-dimensional voice program and Praat.
Folia Phoniatr Logop, 61 (2009), pp. 217-226
[16]
F. Núñez Batalla, P. Santos Corte, G. Sequeiros Santiago, B. Señaris González, C. Suárez Nieto.
Evaluación perceptual de la disfonía: correlación con los parámetros acústicos y fiabilidad.
Acta Otorrinolaringol Esp, 55 (2004), pp. 282-287
[17]
M.J. Rodríguez-Parra, J.C. Casado, J.A. Adrián, J.J. Buiza.
Estado actual de los servicios ORL españoles. Heterogeneidad en el manejo de los problemas de la voz.
Acta Otorrinolaringol Esp, 57 (2006), pp. 109-114
[18]
P.H. Dejonckere.
Valoración perceptual y de laboratorio de la disfonía.
Otolaryngol Clin North Am, 33 (2000), pp. 677-694
[19]
P.H. Dejonckere, L. Crevier-Buchman, J.P. Marie, M. Moerman, M. Remacle, V. Woisard.
European Research Group on the Larynx. Implementation of the European Laryngological Society (ELS) basic protocol for assessing voice treatment effect.
Rev Laryngol Otol Rhinol (Bord), 124 (2003), pp. 279-283

Please cite this article as: Núñez Batalla F, González Márquez R, Peláez González MB, González Laborda I, Fernández Fernández M, Morato Galán M. Análisis acústico de la voz mediante el progama Praat: estudio comparativo con el programa Dr. Speech. Acta Otorrinolaringol Esp. 2014;65(3):170–176.

Copyright © 2013. Elsevier España, S.L. All rights reserved