
A DEVELOPMENTAL OVERVIEW OF VOICE AS A STEADFAST IDENTIFICATION TECHNIQUE



Abstract :

Voice authentication of forensic samples is a difficult task for automatic, semi-automatic and human-based methods. The speech samples being compared may be recorded in different situations; e.g., one sample may be shouting over the telephone, whereas the other may be a whisper in an interview room. A speaker may be disguising his or her voice, ill, or under the influence of drugs, alcohol, or stress in one or more samples. The speech samples may contain noise, may be very short, and may not contain enough relevant speech material for comparative purposes. Each of these variables, in addition to the known variability of speech in general, makes reliable discrimination of speakers a complicated and daunting task.

Key words:

Voice authentication, whisper, disguising voice, speech material, reliable.

  1. Introduction:

As early as the 1660s, in the time of the English king Charles, voice analysis was being dealt with; however, there was no specific scientific basis for proving voice authenticity.[1] Forensic researchers have developed various voice analysis techniques, including ones that examine the effect of stress and psychological conditions on the stability of the voice, such as layered voice analysis.[2] Situations often arise in courts of law, concerning authenticity and scientific proof, in which the offender is heard but not seen in the evidence: a witness claims to be able to identify the culprit's voice but is unable to guarantee or prove that the identification is correct.[3] Deception and alteration of the voice by criminals has long been a problem faced by investigating authorities.[4] Criminals now use gloves to make sure that their fingerprints are unavailable to investigators, burn equipment and traces at the crime scene to destroy DNA evidence, and disguise their voices when demanding money by telephone or when making threats using audio media such as compact discs and audiotapes. To overcome the problems of voice identification, modern automatic methods have been developed to trace phonation, frequencies and voice prints.[5] For forensic purposes, a set of different speech tests is also performed, including automatic and manual identification techniques, and the combined result is used to form an opinion about the voice in the justice system. The main principle behind forensic voice analysis is individuality, that is, the voice print is held to be unique to each person, like DNA and fingerprints.[6] Digital automated techniques use the digital waveform of the voice to carry out spectrum-based analysis.

  2. Background:

In the late 1930s, work at Bell Telephone Laboratories (BTL) resulted in an invention called the sound spectrograph, which was the outcome of research undertaken to aid speech training for the deaf and for students learning foreign languages. In 1944, during World War II, the term "voice print" was used for the very first time. In 1962, Kersta published an article in Nature entitled "Voiceprint Identification" and gave opinions in court until 1967.[7] In 1967, Young and Campbell challenged Kersta's research concerning the accuracy of its results and reported voiceprint identification results in which accuracy fell to as low as 38%.[8] Several textbooks on forensic phonetics have been published in recent decades, several of which can be found in university libraries, and they provide good and comprehensive introductory reading. They include Forensic Voice Identification by Hollien, which sketches the historical background of the field and covers topics such as automatic speech recognition, memory and voice line-up procedures in a fairly non-technical way that does not require exhaustive phonetic knowledge, and Forensic Speaker Identification, which is considerably more technical in nature and addresses automatic speaker identification, covering some of the techniques used, such as cepstrum analysis, in some depth. Statistical issues and methods involved in speaker verification, such as Bayesian statistics, and forensic voice comparison using voice source features have gained wide research interest in the case of machine-generated voice prints, owing to the increased deception and alteration of the voice by criminals who use various voice workstations to generate a voice by making minor modifications to its pitch and bass.[9] Auditory-phonetic approaches to forensic voice identification are experience-based, subjective analyses of voice quality.
"Voice quality" refers to laryngeal and vocal-tract settings, and covers settings that are physiologically conditioned as well as voluntary. Some of the reliable features used in voice identification include perturbation features of several kinds, i.e. absolute normalized jitter, normalized amplitude shimmer and normalized sharpness shimmer (the last defined as the difference in height of the approximately triangular shape formed by the negative spike of the glottal pulse), together with singularities in the power spectrum of the mucosal-wave correlate, giving fourteen features.[10]
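
Perturbation features of the kind listed above are usually computed cycle by cycle. The following is a minimal sketch, assuming one common "local" definition (the mean absolute difference between consecutive cycles, divided by the mean value). That definition, the function name `relative_perturbation` and the sample values are assumptions made for illustration and are not the feature set of the cited work. Applied to cycle durations the measure gives a period perturbation (commonly called jitter); applied to cycle peak amplitudes it gives an amplitude shimmer.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements, for illustration only.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]       # duration of each glottal cycle
amplitudes = [1.00, 0.95, 1.05, 0.98, 1.02]  # peak amplitude of each cycle

jitter = relative_perturbation(periods_ms)
shimmer = relative_perturbation(amplitudes)
print(f"jitter  = {jitter:.4f}")
print(f"shimmer = {shimmer:.4f}")
```

In practice the cycle durations and amplitudes would first have to be extracted from the recording by a pitch-marking step, which is not shown here.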

  3. Standard Comparison Protocols

The following protocols are observed to obtain sound results from the examination and comparison of voice samples.

    1. Only original recordings of voice samples are accepted for examination, unless the original recording has been erased and a high-quality copy is still available.

    2. The recordings are played back on appropriate professional tape recorders and recorded on a professional full-track tape recorder at 7 1/2 ips. When possible, playback speed is adjusted to correct for original recording speed errors by analysing the recorded telephone and AC line tones on spectrum analysis equipment. Whenever necessary, special recorders are used to allow proper playback of original recordings that have incorrect track placement or azimuth misalignment. Spectrograms for voice identification should be produced with standard settings: a linear expanded frequency range (0-4000 Hz), a wide-band filter (300 Hz) and bar display mode at a minimum; higher specifications may also be used. All spectrograms for each separate comparison should be prepared on the same spectrograph. The spectrograms should be phonetically marked below each voice sound.

    3. Enhanced tape copies should be prepared from the original recordings using equalizers, notch filters and digital adaptive predictive deconvolution programs to reduce extraneous noise and correct telephone and recording channel effects. A second set of spectrograms is prepared from the enhanced copies and used together with the unprocessed spectrograms for comparison.

    4. Similarly pronounced words are compared between the two voice samples, with most known voice samples being verbatim with the unknown voice recording. Normally, 20 or more different words are needed for a meaningful comparison. Fewer than 20 words usually results in a less conclusive opinion, such as "possibly" instead of "probably".

    5. The examiner makes a spectral pattern comparison between the two voice samples by comparing the beginning, mean and end formant frequency, formant shaping, pitch, timing, etc., of each individual word. When available, similarly pronounced words within each sample are compared to ensure the consistency of the voice sample. Words with spectral patterns that are distorted, masked by extraneous sounds, too faint, or lacking adequate identifying characteristics should be eliminated.

    6. An aural examination is made of each voice sample to determine whether the pattern similarities or dissimilarities noted are the product of pronunciation differences, voice disguise, obvious drug or alcohol use, altered psychological state, electronic manipulation, etc.

    7. An aural comparison is made by repeatedly playing the two voice samples simultaneously on separate tape recorders and electronically switching back and forth between the samples while listening on high-quality headphones. If one sample has a wider frequency response than the other, bandpass filters are recommended to compensate during at least some of the aural listening tests.

    8. The examiner should resolve any differences found between the aural and spectral results, usually by repeating all or some of the comparison steps.

    9. If the examiner finds the samples to be very similar (identification) or very dissimilar (elimination), an independent evaluation should always be conducted by at least one, but usually two, other examiners to confirm the results. If differences of opinion still exist between the examiners, additional comparisons are to be carried out to resolve them.[11]

Phonetic studies of phonation have yielded many insights into the possible states of the glottis.[12] People can control the glottis so that they produce speech sounds with not only regular voicing vibrations at a range of different pitches, but also harsh, soft, creaky, breathy and a variety of other phonation types. These are controllable variations in the actions of the glottis, not just personal idiosyncratic possibilities or involuntary pathological actions. What appears to be an uncontrollable pathological voice quality for one person might be a necessary part of the set of phonological contrasts for someone else. For example, some English speakers may have a very breathy voice that is considered to be pathological, whereas Gujarati speakers need a similar voice quality to distinguish the word /ba̤r/ meaning "outside" from the word /bar/ meaning "twelve".[13,14] Likewise, an English speaker may have a very creaky voice quality similar to the one used by speakers of Jalapa Mazatec to distinguish the word /já̰/ meaning "he wears" from the word /já/ meaning "tree".[15] As was noted some time ago, one person's voice disorder might be another person's phoneme.[16] Another point on the phonation continuum exploited by certain languages (far fewer in number than languages that have voiceless sounds) is breathy voice. Breathy phonation is associated with a decrease in overall acoustic intensity in many languages, e.g. Gujarati (Fischer-Jørgensen 1967), Kui and Chong (Thongkum 1988), Tsonga (Traill and Jackson 1988), Hupa (Gordon 1998).

  4. Practical problems with voice samples

Factors that may influence identification accuracy are primarily sample duration and acoustic quality. If we first consider the influence of sample duration, we may observe that in real-world investigations samples may be very short, often just a few words or a phrase or two, which means that sample duration is on the order of a few seconds. In an early study by Pollack et al. (1954), the authors observed that identification accuracy increased as sample size (for monosyllabic words) increased, but only up to about 1.2 seconds. For longer samples they claim that phonetic variation takes over as the most important factor. They conclude that "we believe that the duration of the speech sample per se is relatively unimportant, except in so far as it admits a larger or smaller statistical sampling of the speaker's speech repertoire". This somewhat surprising finding has, however, been confirmed in other studies. In a study by Compton (1963), fifteen recorded segments of a vowel for each of nine speakers, all familiar to the listeners, were presented.[17] The segments differed only in duration (25–2500 ms). For segments longer than about 75 ms, there was no increase in recognition rate as a function of duration. Bricker and Pruzansky (1966) presented stimuli that varied in duration as well as in phonemic variation. They found that the identification rate increased with duration only if the longer stimuli also contained more phonemic variation, and that "Identification accuracy improved directly with the number of phonemes in the sample even when duration was controlled". In a study by Orchard and Yarmey (1995), the correct identification rate was significantly higher for eight-minute stimuli than for thirty-second stimuli. No attempt was made, however, to estimate the respective contributions of duration and phonemic variation, but it is likely that phonemic variation was greater in the longer stimuli.

A large proportion of threats are made over the telephone, and criminals often use telephones when they plan or coordinate crimes. Telephone-quality speech has therefore received attention in forensic phonetics studies. Telephone lines have limited bandwidth. Most of the frequencies relevant for speech transmission are covered, but not all. Frequencies below 300 Hz are filtered out, for example. With mobile phones, problems associated with speech coding are introduced. These effects are particularly noticeable for female voices. Important questions in the forensic context are whether the poorer sound quality of recorded telephone conversations adversely affects voice identification and, if so, to what extent and how. Also, from a methodological point of view, one would like to know whether one should use only voices recorded over the telephone in line-ups where the incriminating call was recorded over the telephone.[18] There are surprisingly few studies that address this question, but there are some results which indicate that the problem may not be as serious as one might expect. For example, Rathborn, Bull and Clifford (1981, cited in Yarmey, 1991) "failed to find any significant differences in voice identification of a target voice heard originally over the telephone and tested using a taped line-up over the telephone, in contrast to voice identification heard originally over the telephone and tested directly with a taped line-up".

An issue that has received some attention lately is the influence of the band-pass filtering that occurs in telephone transmission on the acoustic analysis of voice samples. In a recent study, Künzel (2001) found that the relatively high (300 Hz) lower cut-off frequency had the effect of shifting F1 in German vowels upwards compared with the corresponding tokens in a simultaneous DAT recording. The average size of the shift was 6.6% for male and 6.1% for female speakers, and all the differences were significant at the 5% level or higher. Other, but minor, artefacts were observed as well. As a consequence, Künzel warns against using formant data for identification purposes if the recordings were made from telephones. His results have not been questioned, but his total rejection of the use of formant data in identification based on telephone recordings has been challenged by Nolan (2002).[19]
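
To make the effect of the 300 Hz lower cut-off concrete, the sketch below applies an idealised band-pass filter to a synthetic signal and compares component amplitudes before and after. It is only an illustration under stated assumptions: the 3400 Hz upper limit, the brick-wall filter shape, the function names `telephone_bandpass` and `amplitude_at`, and the test signal are all chosen here and do not come from the studies discussed. A real telephone channel has gradual filter slopes and, for mobile phones, coding effects that this sketch ignores.

```python
import numpy as np


def telephone_bandpass(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every spectral component outside [low_hz, high_hz] (idealised filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude of the spectral component closest to freq_hz."""
    magnitude = np.abs(np.fft.rfft(signal)) * 2.0 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return magnitude[np.argmin(np.abs(freqs - freq_hz))]


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples

# Synthetic stand-in for a voice: a 120 Hz fundamental plus a 600 Hz component.
voice = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)
filtered = telephone_bandpass(voice, sample_rate)

for freq in (120, 600):
    before = amplitude_at(voice, sample_rate, freq)
    after = amplitude_at(filtered, sample_rate, freq)
    print(f"{freq} Hz: before {before:.3f}, after {after:.3f}")
```

The low-frequency component is removed while the 600 Hz component passes unchanged, which suggests why measurements that depend on the lowest harmonics can differ between telephone and direct recordings.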

    4.1. Disguised Voice

    Voice disguise, depending on the extent to which it is used, can be a significant problem for identification. At the extreme end of the spectrum we find electronic manipulation or even communication via speech synthesis, which may make identification nearly impossible. In the world of actual forensic work, however, voice disguise tends to be of a rather unsophisticated nature. Künzel (2000) notes, based on experience from the BKA (the German Federal Police Office), that "falsetto, pertinent creaky voice, whispering, faking a foreign accent, and pinching one's nose" are the most common types. Essentially the same observations have been made in experimental studies. In a study by Masthoff (1996) in which undergraduate students served as subjects, 35% of the chosen disguises were phonation-level disguises (whisper, raised pitch or lowered pitch). Articulation-level disguises (dialect mimicry, foreign accent, etc.) were also used (20%). The remaining disguises were combinations of two types. Electronically manipulated messages are still rare, but Künzel notes that there has been an increase in recent years, mainly in the form of editing recorded voices. Although the types of disguise used are in most cases rather unsophisticated, disguise may nevertheless have a considerable detrimental effect on identification. In a study by Reich and Duffy (1979) in which various types of disguise were used, every type produced significantly less accurate identification. Hypernasal speech produced the greatest effect, but in most cases there were no significant differences between the different types. Whisper, one of the more common types, resulted in markedly less accurate identification in a study by Orchard and Yarmey (1995) when whispered samples were compared with voiced samples. If both the reference and the test samples were whispered, the difference was less pronounced. Voice disguise is not as common as one might think. Künzel (2000) reports that: "Over the last 20 years, between 15 and 25 per cent of the annual cases dealt with at the BKA identification section exhibited at least one kind of disguise".[20]

    Voice identification by manual methods has shown variability in accuracy depending on the examiner's experience and skills. Automatic and spectrographic identification techniques have therefore been introduced. In spectrographic identification, a sound spectrograph is used to produce a visual graph (voice spectrogram) of the speech, with time on the horizontal axis, frequency on the vertical axis, and voice energy shown as grey-scale or colour variations.[21] It is a well-accepted analysis tool in voice identification, used to study individual vowel characteristics, physiological speech anomalies, etc. Spectrographic voice identification assumes that intra-speaker variability (differences in the same utterance repeated by the same speaker) can be distinguished from inter-speaker variability (differences in the same utterance spoken by different speakers).[22]
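
A spectrogram of the kind described above can be sketched with a short-time Fourier transform. The example below is a minimal illustration, not the procedure of any particular spectrograph: the function name `spectrogram_db`, the 5 ms Hann window (an assumed stand-in for wide-band analysis), the 1 ms hop and the synthetic rising tone are all assumptions. The 0 to 4000 Hz display range follows the comparison protocol given earlier.

```python
import numpy as np


def spectrogram_db(signal, sample_rate, window_s=0.005, hop_s=0.001, max_hz=4000.0):
    """Short-time Fourier magnitude in dB; rows are frequency, columns are time."""
    win_len = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    window = np.hanning(win_len)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    # Cut the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (frequency, time)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / sample_rate)
    times = (starts + win_len / 2) / sample_rate       # frame centres in seconds
    keep = freqs <= max_hz
    level_db = 20.0 * np.log10(magnitude[keep] + 1e-12)
    return freqs[keep], times, level_db


sample_rate = 16000
t = np.arange(int(0.5 * sample_rate)) / sample_rate

# Synthetic tone rising from 500 Hz to 1500 Hz, standing in for speech.
tone = np.sin(2 * np.pi * (500 * t + 1000 * t ** 2))

freqs, times, level_db = spectrogram_db(tone, sample_rate)
print("frequency bins:", len(freqs), "time frames:", len(times))
```

The returned array can be displayed as an image with time on the horizontal axis and frequency on the vertical axis, for example with matplotlib's `imshow` and a grey colour map. A short window gives good time resolution at the cost of frequency resolution, which is the trade-off behind wide-band analysis.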

    4.2. Tilt in Voice Spectra

    One of the major acoustic parameters that reliably differentiate phonation types in many languages is spectral tilt, i.e. the degree to which intensity drops off as frequency increases. Spectral tilt can be quantified by comparing the amplitude of the fundamental with that of higher-frequency harmonics, e.g. the second harmonic, the harmonic closest to the first formant, or the harmonic closest to the second formant. Spectral tilt is characteristically most steeply positive for creaky vowels and most steeply negative for breathy vowels. In other words, the fall-off in energy at higher frequencies is least for creaky voice and greatest for breathy voice. Subtracting the amplitude of the fundamental from the amplitude of higher harmonics thus yields the greatest values for creaky vowels and the smallest values for breathy vowels, with intermediate values for modal vowels. Spectral tilt reliably differentiates phonation types in a number of languages, including Jalapa Mazatec (Kirk et al. 1993, Silverman et al. 1995), which contrasts creaky, breathy and modal vowels; !Xóõ (Bickley 1982, Ladefoged 1983, Jackson et al. 1988), which distinguishes between breathy and modal vowels (as well as a third type of phonation, strident); Gujarati (Fischer-Jørgensen 1967), which contrasts breathy and modal vowels; Kedang (Samely 1991), which contrasts modal and breathy vowels; Hmong (Huffman 1987), which distinguishes breathy and modal vowels; Tsonga (Traill and Jackson 1988), which contrasts breathy and modal nasals; some minority languages of China (Jingpho, Haoni, Wa, Yi) examined by Maddieson and Ladefoged (1985), which contrast a "tense" phonation somewhat different from creaky voice with a more modal phonation type; and, finally, Mpi, which also contrasts tense and non-tense (or "lax") phonation. Different measures of spectral tilt do not always behave uniformly in differentiating phonation types in a single language. In Mpi, which uses tone contrastively, Blankenship (1997) found interactions between tone level and measurements of spectral tilt. The amplitude difference between the fundamental and the second harmonic was a more reliable indicator of phonation type for high tone than for either mid or low tone, whereas the amplitude difference between the fundamental and the harmonic closest to the second formant was more useful for differentiating phonation contrasts in mid and low tone vowels than in high tone vowels.

    Investigation of phonation differences is an important area of research, as many languages use distinctions that rely entirely on differences in voice quality. As we have seen, these distinctions may involve two or more different phonation types and may affect consonants, vowels, or both consonants and vowels. In addition, many other languages regularly use non-modal phonation types as variants of modal voice in certain prosodic contexts. Languages also differ in interesting ways in their timing of non-modal phonation relative to other articulatory events, though there are certain recurrent timing patterns and distributional restrictions that warrant explanation. Differences in phonation type can be signalled by a large number of quantitative phonetic properties in the acoustic, aerodynamic and articulatory domains, the last of which has been relatively unstudied because of the invasive measurement techniques required. It is unlikely, however, that future research will yield many truly universal observations about the range and realization of phonation types in the languages of the world. We can never know whether some language in the past had, or in the future will have, a different way of using the vocal folds to make a linguistic distinction. The occurrence of phonetic rarities such as the strident voice quality in !Xóõ and a few neighbouring languages shows that we can use the glottis in completely unexpected ways.[23,24,25]
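
As a rough illustration of one spectral tilt measure, the sketch below subtracts the level of the fundamental from the level of the second harmonic, following the sign convention used in the text (larger values for creaky-like spectra, smaller values for breathy-like spectra). Everything in it is an assumption made for illustration: the function names, the known fundamental frequency, and the toy two-harmonic signals, which are not real vowels.

```python
import numpy as np


def harmonic_level_db(signal, sample_rate, freq_hz):
    """Level (dB, arbitrary reference) of the spectral bin nearest freq_hz."""
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return 20.0 * np.log10(magnitude[np.argmin(np.abs(freqs - freq_hz))])


def tilt_h2_minus_h1(signal, sample_rate, f0_hz):
    """Second-harmonic level minus fundamental level, in dB."""
    return (harmonic_level_db(signal, sample_rate, 2 * f0_hz)
            - harmonic_level_db(signal, sample_rate, f0_hz))


def two_harmonic_vowel(a1, a2, f0_hz, sample_rate, seconds=1.0):
    """Toy 'vowel' containing only a fundamental (a1) and a second harmonic (a2)."""
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    return (a1 * np.sin(2 * np.pi * f0_hz * t)
            + a2 * np.sin(2 * np.pi * 2 * f0_hz * t))


sample_rate, f0_hz = 8000, 100.0
breathy_like = two_harmonic_vowel(1.0, 0.2, f0_hz, sample_rate)  # strong fundamental
creaky_like = two_harmonic_vowel(0.3, 1.0, f0_hz, sample_rate)   # weak fundamental

tilt_breathy = tilt_h2_minus_h1(breathy_like, sample_rate, f0_hz)
tilt_creaky = tilt_h2_minus_h1(creaky_like, sample_rate, f0_hz)
print(f"breathy-like: {tilt_breathy:.1f} dB, creaky-like: {tilt_creaky:.1f} dB")
```

With real speech the fundamental frequency would have to be estimated first, and the harmonic peaks located by searching near the expected frequencies rather than reading a single bin.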

  5. Conclusion

Voice, being a part of behavioural biometrics, is steadily gaining forensic importance in cases of extortion, felony, etc. Voice used as corroborative evidence is a significantly trustworthy source of evidence. Forensic voice analysis today is based on an overall outcome founded on the principles and experiments of scientific method. The studies reviewed here indicate that limitations and errors are major issues for voice identification. Positive and conclusive results are obtainable if ideal voice samples with sufficient speech length and vowel counts are obtained. Demonstration of frequency vs. time spectra of words and of vowels and consonants makes it a more reliable technique. Voice identification is a most promising technique for real-time identification of a person over a mobile network and, if operated appropriately, can also deliver information about the social, educational and geographical background of a person and indicate the linguistic skills of the speakers concerned.

REFERENCES :

  1. Alexander, A., Botti, F., and Drygajlo, A. (2004). Handling Mismatch in Corpus-Based Forensic Speaker Recognition. In Odyssey 2004, The Speaker and Language Recognition Workshop, pages 69–74, Toledo, Spain.
  2. Arcienega, M. and Drygajlo, A. (2003). A Bayesian network approach for combining pitch and reliable spectral envelope features for robust speaker verification. In Kittler, J. and Nixon, M. S., editors, Proc. 4th Int. Conf. on Audio- and Video- Based Biometric Person Authentication, pages 78–85, Guildford, UK. Springer.
  3. Dunn, R. B., Quatieri, T. F., Reynolds, D. A., and Campbell, J. (2001). Speaker recognition from coded speech in matched and mismatched conditions. In 2001: A Speaker Odyssey, Crete, Greece.
  4. Künzel, H. J. (1998). Forensic speaker identification: A view from the crime lab. In Proceedings of the COST Workshop on Speaker Recognition by Man and Machine, pages 4–8, Technical University of Ankara, Ankara, Turkey.
  5. Miller, G. A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits in our Capacity for Processing Information. The Psychological Review, 63:81–97
  6. Oglesby, J. and Mason, J. (1989). Speaker recognition with a neural classifier. In Proceedings First IEE International Conference on Artificial Neural Networks, volume 313, pages 306–309.
  7. Loevinger, L. (1995). Science as evidence. Jurimetrics, 35(2):153–190.
  8. Martin, R. (1994). Spectral subtraction based on minimum statistics. In EUSIPCO- 94, pages 1182–1185.
  9. Rossy, Q. (2003). Simulation de cas reels de reconnaissance de locuteurs au moyen du logiciel ASPIC. Project report, Institut de Police Scientifique, Ecole des Sciences Criminelles, University of Lausanne, Switzerland.
  10. Zimmermann, P. (2005). Analyse de l'influence des conditions d'enregistrement dans la reconnaissance automatique de locuteurs en sciences forensiques. Project report, Institut de Police Scientifique, Ecole des Sciences Criminelles, University of Lausanne, Switzerland.
  11. Meuwly, D. (2000). Voice Analysis, in : Encyclopedia of Forensic Science, pages 1413 – 1420. London: Academic Press Ltd.
  12. Titze, I.R. (1994). Principles of Voice Production, Prentice Hall (currently published by NCVS.org), ISBN 978-0-13-717893-3.
  13. Sundberg, Johan, The Acoustics of the Singing Voice, Scientific American Mar 77, p82
  14. Greene, Margaret; Lesley Mathieson (2001). The Voice and its Disorders. John Wiley & Sons; 6th Edition. ISBN 978-1-86156-196-1.
  15. Rothenberg, M. The Breath-Stream Dynamics of Simple-Released Plosive Production, Vol. 6, Bibliotheca Phonetica, Karger, Basel, 1968.
  16. Titze, I. R. (2006). The Myoelastic Aerodynamic Theory of Phonation, Iowa City: National Center for Voice and Speech.
  17. Phil Manchester (January 2010). "An Introduction To Forensic Audio". Sound on Sound.
  18. Maher, Robert C. (March 2009). "Audio forensic examination: authenticity, enhancement, and interpretation". IEEE Signal Processing Magazine 26: 84–94.
  19. Alexander Gelfand (10 October 2007). "Audio Forensics Experts Reveal (Some) Secrets". Wired Magazine.
  20. Labov, William (1972) Sociolinguistic patterns. Philadelphia, PA: University of Pennsylvania Press, p192.
  21. Eagleson, Robert. (1994). 'Forensic analysis of personal written texts: a case study', John Gibbons (ed.), Language and the Law, London: Longman, 362–373.
  22. Gibbons, J., Prakasam, V., Tirumalesh, K. V., and Nagarajan, H. (Eds) (2004). Language in the Law. New Delhi: Orient Longman.
  23. Koenig, B.J. (1986) 'Spectrographic voice identification: a forensic survey', letter to the editor of J. Acoust. Soc. Am., 79, 6, 2088-90.
  24. Pennycook, A. (1996) 'Borrowing others words: text, ownership, memory and plagiarism', TESOL Quarterly, 30, 201-30.
  25. John Olsson (2004). An Introduction to Language Crime and the Law. London: Continuum International Publishing Group.


*************************************************** 

Ahuja Pooja
Faculty, Institute of Forensic Science,
Gujarat Forensic Sciences University


J M Vyas
Director General,
Gujarat Forensic Sciences University


Copyright © 2012 - 2024 KCG. All Rights Reserved.   |    Powered By : Knowledge Consortium of Gujarat