This book constitutes the refereed proceedings of the 11th International Conference on Text, Speech and Dialogue, TSD 2008, held in Brno, Czech Republic, September 8-12, 2008. The 79 revised full papers presented together with 4 invited papers were carefully reviewed and selected from 173 submissions. The topics of the conference include, but are not limited to, text corpora and tagging; transcription problems in spoken corpora; sense disambiguation; links between text- and speech-oriented systems; parsing issues; parsing problems in spoken texts; multi-lingual issues; multi-lingual dialogue systems; information retrieval and information extraction; text/topic summarization; machine translation; semantic networks and ontologies; semantic web; speech modeling; speech segmentation; speech recognition; search in speech for IR and IE; text-to-speech synthesis; dialogue systems; development of dialogue strategies; prosody in dialogues; emotions and personality modeling; user modeling; knowledge representation in relation to dialogue systems; assistive technologies based on speech and dialogue; applied systems and software; facial animation; and visual speech synthesis.
This book discusses the contribution of articulatory and excitation source information in discriminating sound units. The authors focus on the excitation source component of speech, and on the dynamics of various articulators during speech production, to enhance speech recognition (SR) performance. Speech recognition is analyzed for read, extempore, and conversation modes of speech. Five groups of articulatory features (AFs) are explored for speech recognition, in addition to conventional spectral features. Each chapter provides the motivation for exploring a specific feature for the SR task, discusses methods to extract that feature, and suggests appropriate models to capture sound-unit-specific knowledge from the proposed features. The authors close by discussing various combinations of spectral, articulatory, and source features, and the models best suited to enhancing the performance of SR systems.
This book constitutes the proceedings of the 24th International Conference on Speech and Computer, SPECOM 2022, held as a hybrid event in Gurugram, India, in November 2022. The 51 full and 9 short papers presented in this volume were carefully reviewed and selected from 99 submissions. The papers present current research in the area of computer speech processing including audio signal processing, automatic speech recognition, speaker recognition, computational paralinguistics, speech synthesis, sign language and multimodal processing, and speech and language resources.
The two-volume proceedings set LNAI 14338 and 14339 constitutes the refereed proceedings of the 25th International Conference on Speech and Computer, SPECOM 2023, held in Dharwad, India, during November 29–December 2, 2023. The 94 papers included in these proceedings were carefully reviewed and selected from 174 submissions. They focus on all aspects of speech science and technology: automatic speech recognition; computational paralinguistics; digital signal processing; speech prosody; natural language processing; child speech processing; speech processing for medicine; industrial speech and language technology; speech technology for under-resourced languages; speech analysis and synthesis; speaker and language identification, verification and diarization.
The book presents current research and developments in multilingual speech recognition. The author presents a Multilingual Phone Recognition System (Multi-PRS), developed using a common multilingual phone-set derived from International Phonetic Alphabet (IPA) based transcriptions of six Indian languages: Kannada, Telugu, Bengali, Odia, Urdu, and Assamese. The author shows how the performance of Multi-PRS can be improved using tandem features. The book compares Monolingual Phone Recognition Systems (Mono-PRS) with Multi-PRS, and baseline systems with tandem systems. Methods are proposed to predict Articulatory Features (AFs) from spectral features using Deep Neural Networks (DNNs). Multitask learning is explored to improve the prediction accuracy of the AFs. The AFs are then used to improve the performance of Multi-PRS through two methods of combination: lattice rescoring and tandem. The author goes on to develop and evaluate two multilingual phone recognition systems: one that performs Language Identification followed by Monolingual phone recognition (LID-Mono), and one based on the common multilingual phone-set.
This book focuses on speech processing in the presence of low-bit-rate coding and varying background environments. The methods presented in the book exploit speech events that are robust in noisy environments. Accurate estimation of these crucial events is useful for carrying out various speech tasks in mobile environments, such as speech recognition, speaker recognition, and speech rate modification. The authors provide insights into designing and developing robust methods to process speech in mobile environments. The book covers temporal and spectral enhancement methods that minimize the effect of noise, and examines methods and models for speech and speaker recognition applications in mobile environments.
A sharp increase in the computing power of modern computers has triggered the development of powerful algorithms that can analyze complex patterns in large amounts of data within a short time period. Consequently, it has become possible to apply pattern recognition techniques to new tasks. The main goal of this book is to cover some of the latest application domains of pattern recognition while presenting novel techniques that have been developed or customized in those domains.
In this brief, the authors discuss recently explored spectral (sub-segmental and pitch-synchronous) and prosodic (global and local features at word and syllable levels in different parts of the utterance) features for discerning emotions in a robust manner. The authors also examine the complementary evidence obtained from excitation source, vocal tract system, and prosodic features for the purpose of enhancing emotion recognition performance. Features based on speaking rate characteristics are explored with the help of multi-stage and hybrid models to further improve emotion recognition performance. The proposed spectral and prosodic features are evaluated on a real-life emotional speech corpus.
The book deals with several key aspects of developing technologies in information processing systems. It explains various problems related to advanced image processing systems and describes some of the latest state-of-the-art techniques for solving them. In particular, recent advances in image and video processing are covered thoroughly, with real-life applications. Recent topics such as rough-fuzzy hybridization and knowledge reuse in computational intelligence are also covered.
The two-volume set of LNCS 11941 and 11942 constitutes the refereed proceedings of the 8th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2019, held in Tezpur, India, in December 2019. The 131 revised full papers presented were carefully reviewed and selected from 341 submissions. They are organized in topical sections named: Pattern Recognition; Machine Learning; Deep Learning; Soft and Evolutionary Computing; Image Processing; Medical Image Processing; Bioinformatics and Biomedical Signal Processing; Information Retrieval; Remote Sensing; Signal and Video Processing; and Smart and Intelligent Sensors.