PERSONAL PAGE
WITH PERSONAL AND PROFESSIONAL INFORMATION


Name: TODOR DIMITROV GANCHEV
Workplace: Technical University of Varna
Position (rank): Vice-Rector, higher education institution
Academic degree: Doctor (educational and scientific degree)
Faculty: Faculty of Computing and Automation
Department: Computer Science and Technologies
Office(s): 502E
Office phone: 052 383618
E-mail1:
E-mail2:


Contents:
Publications (16)
Interests (4)


Publications
Publication: No. 1 Kolev, Y., Ts. Stoyanova, V. Penchev, I. Buliev, T. Ganchev, Laboratory Exercise Manual for Microprocessor Systems (in Bulgarian), Technical University of Varna, Varna, ISBN 954-8284-98-7, 2000.
Publisher: Technical University of Varna
Year of publication: 2000
Type: Educational literature - Study guide

Publication: No. 2 Ganchev T., Speaker Recognition, PhD dissertation, Dept. of Electrical and Computer Engineering, University of Patras, Greece, Nov. 2005.
Publisher: University of Patras, Greece
Year of publication: 2005
Type: Dissertation
Abstract: This dissertation deals with speaker recognition in real-world conditions. The main emphasis falls on: (1) evaluation of various speech feature extraction approaches, (2) reduction of the impact of environmental interference on speaker recognition performance, and (3) the study of alternatives to the present state-of-the-art classification techniques. Specifically, within (1), a novel wavelet packet-based speech feature extraction scheme, fine-tuned for speaker recognition, is proposed. It is derived in an objective manner with respect to speaker recognition performance, in contrast to the state-of-the-art MFCC scheme, which is based on an approximation of human auditory perception. Next, within (2), an advanced noise-robust feature extraction scheme based on MFCC is offered for improving speaker recognition performance in real-world environments. In brief, a model-based noise reduction technique adapted to the specifics of the speaker verification task is incorporated directly into the MFCC computation scheme. This approach demonstrated a significant advantage in real-world fast-varying environments. Finally, within (3), two novel classifiers, referred to as the Locally Recurrent Probabilistic Neural Network (LR PNN) and the Generalized Locally Recurrent Probabilistic Neural Network (GLR PNN), are introduced. They are hybrids between the Recurrent Neural Network (RNN) and the Probabilistic Neural Network (PNN) and combine the virtues of the generative and discriminative classification approaches. Moreover, these novel neural networks are sensitive to temporal and spatial correlations among consecutive inputs, and are therefore capable of exploiting the inter-frame correlations among speech features derived for successive speech frames. The experiments demonstrated that the LR PNN and GLR PNN architectures provide a performance benefit when compared to the original PNN.
Web: http://nemertes.lis.upatras.gr/dspace/handle/123456789/308

Publication: No. 3 Ganchev T., Author's summary of the dissertation "Speaker Recognition" (in Bulgarian), Dept. of Electrical and Computer Engineering, University of Patras, Patras, Greece, 2005.
Publisher: Dept. of Electrical and Computer Engineering, University of Patras, Greece
Year of publication: 2005
Type: Other
Abstract: The scientific novelty of the study lies in the following main contributions: (i) a new speech signal parameterization scheme based on wavelet packets is proposed, designed specifically for the needs of the speaker recognition task; (ii) an improved noise-robust speech signal parameterization scheme is proposed, which builds and uses models of the ambient noise in order to improve the noise robustness of the widely used MFCC features; (iii) a new hybrid RNN-PNN (Recurrent Neural Network - Probabilistic Neural Network) architecture is proposed, which combines the advantages of the generative and discriminative approaches to data classification. Two new classifier variants are considered, named the Locally Recurrent Probabilistic Neural Network (LR PNN) and the Generalized Locally Recurrent Probabilistic Neural Network (GLR PNN). A methodology for training these hybrid classifiers is proposed. The results obtained have been published and protected in accordance with copyright law through publications in periodicals abroad.

Publication: No. 4 Ganchev T., Parsopoulos K.E., Vrahatis M.N., Fakotakis N., Partially Connected Locally Recurrent Probabilistic Neural Networks, Chapter 18 in Recurrent Neural Networks, ISBN 978-3-902613-28-8, September 2008, InTech, Vienna, Austria, pp.377-400.
Publisher: InTech, Vienna, Austria
Year of publication: 2008
Type: Other
Abstract: In this chapter, we review existing locally recurrent neural networks and introduce a novel artificial neural network architecture that merges the locally recurrent probabilistic neural networks (LRPNN) with swarm intelligence algorithms and concepts. In particular, we develop an enhanced LRPNN model, referred to as Partially Connected LRPNN (PC-LRPNN). In contrast to LRPNN, where the recurrent layer consists of a set of fully connected neurons, the proposed new architecture assumes a swarm of neurons in the recurrent layer. Each neuron of the swarm has a neighbourhood of neurons with which it communicates through interconnections. The locality that determines the neighbourhoods is defined based on existing neighbourhood and communication schemes proposed in the swarm intelligence literature. The PC-LRPNN thus offers a more general scheme, of which the fully connected LRPNN is a particular case, where all links in the recurrent layer are implemented. The neighbourhood topology of the new, swarm-based recurrent layer can be either static or dynamic. Dynamic neighbourhoods have been studied extensively in the field of swarm intelligence, since swarms with dynamic communication schemes among individuals have been shown to achieve remarkably better results than swarms with static communication schemes in the field of optimization. Also, the plasticity of the neighbourhoods can be useful in cases where a better fit to unknown data is required. In the present chapter we limit our exposition to static neighbourhoods, which are defined once during training and remain unchanged during the operation of the PC-LRPNN. However, the concepts that we introduce here can be extended further to the dynamic counterparts.
The aforementioned local neighbourhoods and communication schemes facilitate the optimization of the recurrent layer linkage, which leads to much faster operation of the neural network when compared to the fully linked structure. Furthermore, this significantly reduces the computational load for the overall training of the recurrent layer, which is performed in each case using the Particle Swarm Optimization (PSO) algorithm. Equipping the PC-LRPNN with PSO results in an efficient hybrid scheme that takes advantage of the virtues of probabilistic neural networks (PNN), recurrent neural networks (RNN), and the swarm intelligence concept, and that can successfully tackle real-life classification problems that assume temporal or spatial correlations among subsequent events.
Web: http://sciyo.com/articles/show/title/partially_connected_locally_recurrent_probabilistic_neural_networks

Publication: No. 5 Potamitis, I., T. Ganchev, Generalized Recognition of Sound Events: Approaches and Applications. In: Multimedia Services in Intelligent Environments, ISBN: 978-3-540-78491-3, Springer-Verlag Berlin Heidelberg, 2008, pp.41–79.
Publisher: Springer-Verlag Berlin Heidelberg
Year of publication: 2008
Type: Other
Abstract: This chapter surveys the contemporary approaches to automatic sound recognition and discusses the benefits stemming from real-world applications of this technology. We identify the common aspects and subtle differences among these diverse application areas and review state-of-the-art systems. In this context we project that there is much room for knowledge transfer between the different subfields of sound classification, which seem to evolve independently while achieving different states of maturity. Particular emphasis is given to lessons learned from the speech recognition paradigm, which together with speaker recognition was among the first applications of sound classification to reach the stage of launching commercial products on a large scale. Special attention is paid to newly emerging applications, such as environmental monitoring and bioacoustic identification, and to applications to music, which have already started altering our everyday life as we once knew it.
Web: http://www.springerlink.com/content/874227036g536255/fulltext.pdf

Publication: No. 6 S. Jimenez-Murcia, F. Fernandez-Aranda, E. Kalapanidas, D. Konstantas, T. Ganchev, et al., Playmancer Project: A Serious videogame as an Additional Therapy Tool for Eating and Impulse Control Disorders, Annual Review of Cybertherapy and Telemedicine 2009.
Publisher: Amsterdam: IOS Press
Year of publication: 2009
Type: Other

Publication: No. 7 Ganchev T., Locally Recurrent Neural Networks and Their Applications, Chapter IX in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, ISBN 978-1-60566-766-9, IGI Global, August 2009, pp.195-222.
Publisher: IGI Global
Year of publication: 2009
Type: Other
Abstract: In this chapter we review various computational models of locally recurrent neurons and discuss the architecture of some archetypal locally recurrent neural networks (LRNNs) that are based on them. Generalizations of these structures are discussed as well. Furthermore, we point to a number of real-world applications of LRNNs that have been reported in past and recent publications. These applications involve classification or prediction of temporal sequences, discovery and modeling of spatial and temporal correlations, process identification and control, etc. Validation experiments reported in these developments provide evidence that locally recurrent architectures are capable of identifying and exploiting temporal and spatial correlations, i.e. the context in which events occur, which is the main reason for their advantageous performance when compared with that of their non-recurrent counterparts or other reasonable machine learning techniques.
Web: http://www.igi-global.com/Bookstore/TitleDetails.aspx?TitleId=486&DetailsType=EditorialAdvisoryBoard

Publication: No. 8 Mporas I., T. Ganchev, O. Kocsis, N. Fakotakis, Context-adaptive pre-processing scheme for robust speech recognition in fast-varying noise environment, Signal Processing, vol.91, no.8, 2011, pp. 2101-2111. DOI: 10.1016/j.sigpro.2011.03.020
Publisher: Elsevier
Year of publication: 2011
Type: Scientific publication - Article in a foreign journal
Abstract: Based on the observation that dissimilar speech enhancement algorithms perform differently for different types of interference and noise conditions, we propose a context-adaptive speech pre-processing scheme, which performs adaptive selection of the most advantageous speech enhancement algorithm for each condition. The selection process is based on an unsupervised clustering of the acoustic feature space and a subsequent mapping function that identifies the most appropriate speech enhancement channel for each audio input, corresponding to unknown environmental conditions. Experiments performed on the MoveOn motorcycle speech and noise database validate the practical value of the proposed scheme for speech enhancement and demonstrate a significant improvement in speech recognition accuracy when compared to that of the best-performing individual speech enhancement algorithm. This is expressed as an accuracy gain of 3.3% in terms of word recognition rate. The advance offered in the present work reaches beyond the specifics of the present application and can be beneficial to spoken interfaces operating in fast-varying noise environments.
Web: http://www.sciencedirect.com/science/article/pii/S0165168411000958

Publication: No. 9 A. Lazaridis, I. Mporas, T. Ganchev, G. Kokkinakis, N. Fakotakis: Improving Phone Duration Modelling Using Support Vector Regression Fusion, Speech Communication, ISSN 0167-6393, vol. 53, no.1, Jan. 2011, pp. 85-97.
Publisher: Elsevier
Year of publication: 2011
Type: Scientific publication - Article in a foreign journal
Abstract: In the present work, we propose a scheme for the fusion of different phone duration models operating in parallel. Specifically, the predictions from a group of dissimilar and mutually independent individual duration models are fed to a machine learning algorithm, which reconciles and fuses the outputs of the individual models, yielding more precise phone duration predictions. The performance of the individual duration models and of the proposed fusion scheme is evaluated on the American-English KED TIMIT and on the Greek WCL-1 databases. On both databases, the SVR-based individual model demonstrates the lowest error rate. When compared to the second-best individual algorithm, a relative reduction of the mean absolute error (MAE) and the root mean square error (RMSE) by 5.5% and 3.7% on KED TIMIT, and 6.8% and 3.7% on WCL-1, is achieved. At the fusion stage, we evaluate the performance of twelve fusion techniques. The proposed fusion scheme, when implemented with SVR-based fusion, improves the phone duration prediction accuracy over that of the best individual model by 1.9% and 2.0% in terms of relative reduction of the MAE and RMSE on KED TIMIT, and by 2.6% and 1.8% on the WCL-1 database.

Publication: No. 10 Ganchev T., Contemporary methods for speech parameterization. Springer New York, Aug. 2011. ISBN: 978-1-4419-8446-3, e-ISBN: 978-1-4419-8447-0. DOI: 10.1007/978-1-4419-8447-0.
Publisher: Springer New York
Year of publication: 2011
Type: Book (foreign edition)
Abstract: This brief book offers a general view of short-time cepstrum-based speech parameterization and provides a common ground for further in-depth studies on the subject. Specifically, it offers a comprehensive description, comparative analysis, and empirical performance evaluation of eleven contemporary speech parameterization methods, which compute short-time cepstrum-based speech features. Among these are five discrete wavelet packet transform (DWPT)-based and six discrete Fourier transform (DFT)-based speech features, and some of their variants, which have been used in speech recognition, speaker recognition, and other related speech processing tasks. The main similarities and differences in their computation are discussed, and empirical results from performance evaluation under common experimental conditions are presented. The recognition accuracy obtained on the monophone recognition, continuous speech recognition and speaker recognition tasks is contrasted against that obtained for the well-known and widely used Mel Frequency Cepstral Coefficients (MFCC). It is shown that many of these methods lead to speech features that do offer competitive performance on a certain speech processing setup when compared to the venerable MFCC. This comparison is not intended to promote particular speech features; instead, it aims to enhance the common understanding of the advantages and disadvantages of the various speech parameterization techniques available today and to provide a basis for selecting an appropriate speech parameterization in each particular case. In brief, this volume consists of nine sections. Section 1 summarizes the main concepts on which contemporary speech parameterization is based and offers some background information about their origins. Section 2 introduces the objectives of speech pre-processing and describes the processing steps that are commonly used in contemporary speech parameterization methods.
Sections 3 and 4 offer a comprehensive description and a comparative analysis of the DFT- and DWPT-based speech parameterization methods of interest. Sections 5, 6 and 7 present results from experimental evaluation on the monophone recognition, continuous speech recognition and speaker recognition tasks, respectively. Section 8 offers concluding remarks and an outlook on possible future targets of speech parameterization research. Finally, Section 9 provides some links to other sources of information and to publicly available software, which offer ready-to-use implementations of these speech features.
Web: http://www.springer.com/engineering/signals/book/978-1-4419-8446-3

Publication: No. 11 Kostoulas, T., T. Winkler, T. Ganchev, N. Fakotakis, J. Koehler, The MoveOn Database: Motorcycle Environment Speech and Noise Database for Command and Control Applications, Language Resources and Evaluation Journal, 2012.
Publisher: Springer (The Netherlands).
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: In the present article we offer a comprehensive description of a unique speech and noise database, referred to as the MoveOn database, which was purposely designed and implemented in support of research on spoken dialogue interaction in a motorcycle environment. The distinctiveness of the MoveOn database results from the requirements of the application domain (an information support and operational command and control system for the two-wheel police force) and also from the specifics of the adverse open-air acoustic environment. In this article, we first outline the target application that motivated the database design, and the purpose of the database, and then offer a comprehensive report on the design and implementation of the database. Furthermore, we discuss the main challenges related to the choice of equipment, the organization of recording sessions, and some difficulties that were experienced during this effort. A detailed account of the database statistics and the suggested data splits into subsets is also offered. Finally, we discuss results from automatic speech recognition experiments performed on the MoveOn database, which serve to illustrate the degree of complexity of the operational environment.
Web: http://www.elra.info/Language-Resources-and-Evaluation.html

Publication: No. 12 Ntalampiras, S., D. Arsic, M. Hofmann, M. Andersson, T. Ganchev, PROMETHEUS: heterogeneous sensor database in support of research on human behavioral patterns in unrestricted environments, Signal, Image and Video Processing, 2012.
Publisher: Springer LONDON LTD
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: The multi-modal multi-sensor PROMETHEUS database was created in support of research and development activities [PROMETHEUS (FP7-ICT-214901): http://www.prometheus-FP7.eu] aiming at the creation of a framework for monitoring and interpretation of human behaviors in unrestricted indoor and outdoor environments. The distinctiveness of the PROMETHEUS database comes from the unique sensor sets used in the various recording scenarios, but also from the database design, which covers a range of real-world applications related to smart-home automation and indoor/outdoor surveillance of public areas. Numerous single-person and multi-person scenarios, as well as scenarios with interactions between groups of people, motivated by these applications, were implemented with the help of skilled actors and supernumerary personnel. In these scenarios, the actors and personnel were instructed to perform a range of typical and atypical behaviors, and simulations of emergency and crisis situations. In summary, the database contains more than 4 h of synchronized recordings from heterogeneous sensors (an infrared motion detection sensor, thermal imaging cameras, overview/surveillance video cameras, close-view video cameras, a 3D camera, a stereoscopic camera, a general-purpose camcorder, microphone arrays, and motion capture equipment) collected in common setups simulating a smart-home environment, an airport, and an ATM security environment. Selected scenes of the database were annotated for the needs of human detection and tracking. The entire audio part of the database was annotated for the needs of sound event detection, sound source enumeration, emotion recognition, etc.
Web: http://www.springerlink.com/content/7840844077131961/fulltext.pdf

Publication: No. 13 Kostoulas, T., I. Mporas, O. Kocsis, T. Ganchev et al., Affective Speech Interface in Serious Games for Supporting Therapy of Mental Disorders, Expert Systems with Applications, Mar. 2012, vol.39, no.12, pp.11072–11079, DOI: 10.1016/j.eswa.2012.03.067.
Publisher: Elsevier
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: We describe a novel design, implementation and evaluation of a speech interface, as part of a platform for the development of serious games. The speech interface consists of a speech recognition component and a component for emotion recognition from speech. The speech interface relies on a platform designed and implemented to support the development of serious games, which supports cognitive-based treatment of patients with mental disorders. The implementation of the speech interface is based on the Olympus/RavenClaw framework. This framework has been extended for the needs of the specific serious games and the respective application domain by integrating new components, such as emotion recognition from speech. The evaluation of the speech interface utilized a purposely collected domain-specific dataset. The speech recognition experiments show that emotional speech moderately affects the performance of the speech interface. Furthermore, the emotion detectors demonstrated satisfactory performance for the emotion states of interest, Anger and Boredom, and contributed towards successful modelling of the patient’s emotional status. The performance achieved for speech recognition and for the detection of the emotional states of interest was satisfactory. Recent evaluation of the serious games showed that the patients started to show new styles of coping with negative emotions in normal stressful life situations.
Web: http://www.sciencedirect.com/science/article/pii/S0957417412005908

Publication: No. 14 Lazaridis, A., T. Ganchev, I. Mporas, E. Dermatas, N. Fakotakis, Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis, Computer Speech and Language, vol.26, no.4, 2012, pp. 274-292.
Publisher: Elsevier
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: We propose a two-stage phone duration modelling scheme, which can be applied for the improvement of prosody modelling in speech synthesis systems. This scheme builds on a number of independent feature constructors (FCs) employed in the first stage, and a phone duration model (PDM) which operates on an extended feature vector in the second stage. The feature vector, which acts as input to the first stage, consists of numerical and non-numerical linguistic features extracted from text. The extended feature vector is obtained by appending the phone duration predictions estimated by the FCs to the initial feature vector. Experiments on the American-English KED TIMIT and on the Modern Greek WCL-1 databases validated the advantage of the proposed two-stage scheme, improving prediction accuracy over the best individual predictor, and over a two-stage scheme which just fuses the first-stage outputs. Specifically, when compared to the best individual predictor, a relative reduction in the mean absolute error and the root mean square error of 3.9% and 3.9% on the KED TIMIT, and of 4.8% and 4.6% on the WCL-1 database, respectively, is observed.
Web: http://www.sciencedirect.com/science/article/pii/S0885230812000137

Publication: No. 15 Lazaridis, A., I. Mporas, T. Ganchev, Phone Duration Modeling of Affective Speech using Support Vector Regression, International Journal of Intelligent Systems and Applications (IJISA), ISSN: 2074-904X, vol.4, 2012, no.8, pp.1-9.
Publisher: MECS Publisher
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: In speech synthesis, accurate modeling of prosody is important for producing high-quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work ten phone duration models are evaluated. These models belong to well-known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) across all emotional categories.
Web: http://www.mecs-press.org/ijisa/ijisa-v4-n8/v4n8-1.html

Publication: No. 16 F Fernandez-Aranda, S Jimenez-Murcia, JJ Santamaria, K Gunnard, A Soto, E Kalapanidas, RG Bults, C Davarakis, T Ganchev et al., Video games as a complementary tool in mental disorders: Playmancer a European multicenter study, J of Mental Health, 2012, 21(4).
Publisher: 2012 Informa UK, Ltd.
Year of publication: 2012
Type: Scientific publication - Article in a foreign journal
Abstract: Background: Previous review studies have suggested that computer games can serve as an alternative or additional form of treatment in several areas (schizophrenia, asthma or motor rehabilitation). Although several naturalistic studies have been conducted showing the usefulness of serious video games in the treatment of some abnormal behaviours, there is a lack of serious games specially designed for treating mental disorders. Aim: The purpose of our project was to develop and evaluate a serious video game designed to remediate attitudinal, behavioural and emotional processes of patients with impulse-related disorders. Method and results: The video game was created and developed within the European research project PlayMancer. It aims to demonstrate its potential capacity to change the underlying attitudinal, behavioural and emotional processes of patients with impulse-related disorders. New interaction modes were provided by newly developed components, such as emotion recognition from speech, face and physiological reactions, while specific impulsive reactions were elicited. The video game uses biofeedback to help patients learn relaxation skills, acquire better self-control strategies and develop new emotional regulation strategies. In this article, we present a description of the video game used, its rationale, user requirements, usability and preliminary data in several mental disorders.
Web: http://informahealthcare.com/doi/pdf/10.3109/09638237.2012.664302


Interests
Interest: Digital signal processing

Interest: statistical methods for data classification

Interest: Speech and audio signal processing

Interest: bioacoustics