ICASSP 97, Table of Contents
1997 International Conference on Acoustics, Speech and Signal Processing

Title: Space-Time Processing for Wireless Communications
Authors: Arogyaswami Paulraj, Stanford University
Volume: 1, Page: 1
Abstract: This paper reviews space-time signal processing in mobile wireless communications. Space-time processing refers to signal processing performed in the spatial and temporal domains on signals received at, or transmitted from, an antenna array in order to improve the performance of wireless networks. We focus on antenna arrays deployed at base stations, since such applications are of current practical interest.

** Title: Variability Of Performance In Video Coding
Authors: Don Pearson, University of Essex
Volume: 1, Page: 5
Abstract: Modern video compression techniques exhibit variability of performance as a function of time. We report studies of viewers' reactions to this variability, which indicate a sensitivity to particular features. Some interesting conclusions emerge for future work in video coding.

** Title: Expert Summaries
Authors: Renato de Mori, C.E.R.I.; Hermann Ney, University of Technology (RWTH), Aachen; Hans Georg Musmann, University of Hannover; Rama Chellappa, University of Maryland; Mark J.T. Smith, Georgia Institute of Technology; John R. Treichler, Applied Signal Technology, Inc.; Georgios B. Giannakis, University of Virginia; Michael D. Zoltowski, Purdue University
Volume: 1, Page: 9
Abstract: Leading experts summarize the most relevant new ideas from the papers submitted in speech processing, digital signal processing, image and multidimensional signal processing, and statistical and array processing.

** Title: Expanding Team Experiences in DSP Education
Authors: Delores Etter, University of Colorado; Geoffrey C. Orsak, George Mason University
Volume: 1, Page: 11
Abstract: Since practicing engineers work in multidisciplinary teams, it is important that universities provide as many teaming experiences as possible. In this paper, we present some of the advantages and disadvantages of traditional teaming approaches. We then discuss issues related to virtual teaming, that is, the teaming of students at geographically distributed locations. Virtual teaming adds a new dimension to the teaming experiences that universities can offer, better preparing students for the research and industrial environments in which they will work. We present experiences from a three-year virtual-teaming program between the University of Colorado and George Mason University.

** Title: Interactive Classroom For DSP/Communication Courses
Authors: Huseyin Abut, School of Applied Science; Yusuf Ozturk, San Diego State University
Volume: 1, Page: 15
Abstract: In this study, we present a new classroom environment for teaching digital signal processing and communication systems courses. Key features of the model are a collaborating instructor who works closely with the students; a smart classroom equipped with a "Whiteboard" and advanced telecommunication networks; an electronic textbook and other resources; and the World Wide Web (WWW), MATLAB, and other on-line tools. The underlying assumptions of the educational process are team building instead of independent learning, a collaborating/supervising instructor, a lateral rather than a vertical curriculum, and an idea-to-product design concept. We present a sample lecture in the proposed interactive classroom, in which the concept of eye diagrams in regenerative repeaters is presented from the first author's text using MATLAB and the WWW.

** Title: Experiences in Teaching DSP First in the ECE Curriculum
Authors: James H. McClellan, Georgia Institute of Technology; Ronald W. Schafer, Georgia Institute of Technology; Mark A. Yoder, Rose-Hulman Institute of Technology
Volume: 1, Page: 19
Abstract: In this paper, we describe experiences gained from teaching an introductory electrical engineering course based on digital signal processing rather than the traditional first course in analog circuit theory. We discuss our motivation for teaching DSP first, before covering analog circuits and systems. We describe the style of the course and point out difficulties, as well as advantages, in this organization of basic material. Finally, we comment on extending this approach to a wider range of students from other disciplines.

** Title: Analog Signal Processing: A Replacement for the Sophomore-Level Circuit Analysis Course
Authors: David C. Munson, University of Illinois
Volume: 1, Page: 23
Abstract: A new undergraduate curriculum in electrical engineering has been adopted by the Department of Electrical and Computer Engineering at the University of Illinois. Major changes have been incorporated, including a redistribution of the circuits and signal processing topics within the curriculum. After giving an overview of the new curriculum, this paper focuses on a new, required sophomore-level course on analog signal processing. This course combines material from the traditional course on circuit analysis with material on continuous-time signals and systems. Students completing this course can study digital signal processing as first-semester juniors, which leaves ample time for more advanced signal and image processing courses in later semesters.

** Title: Re-engineering the Electrical Engineering Curriculum
Authors: Sanjit K. Mitra, University of California
Volume: 1, Page: 27
Abstract: Three specific programs are proposed for modifying the electrical engineering curriculum to keep pace with the dramatic technological developments of recent years.
The first is a five-year combined BS/MS program that allows students to specialize in more than one field. The second is to restructure the BS program into a multi-track program. The third is an internship-in-industry program that gives students meaningful real-world design experience before graduation.

** Title: Structural Subband Decomposition: A New Concept in Digital Signal Processing
Authors: Sanjit K. Mitra, University of California
Volume: 1, Page: 31
Abstract: This paper introduces the concept of structural subband decomposition of sequences, a generalization of the polyphase decomposition of sequences, and outlines a number of its applications, such as efficient FIR filter design and implementation, adaptive filtering, and fast computation of discrete transforms.

** Title: A New Algorithm for the Generalized Eigenvalue Problem
Authors: Knut Huper, University of Wurzburg; Uwe Helmke, University of Wurzburg
Volume: 1, Page: 35
Abstract: The problem of finding the generalized eigenvalues and eigenvectors of a pair of real symmetric matrices A and B, with B > 0, can be viewed as a smooth optimization problem on a smooth manifold. We present a cost-function approach to the generalized eigenvalue problem that is posed on the product of the n-sphere and Euclidean space R. The critical point set of this cost function is studied. An algorithm based on constrained optimization is presented, and a proof of local quadratic convergence is given.

** Title: A lattice structure for perfect reconstruction linear time varying filter banks with all pass analysis banks
Authors: Soura Dasgupta, University of Iowa; Chris Schwarz, University of Iowa; Minyue Fu, University of Newcastle
Volume: 1, Page: 39
Abstract: We consider a multi-input, multi-output lattice realization for linear time-varying (LTV) analysis banks that are all-pass.
Such a realization has been given for linear time-invariant (LTI) systems; we show how, under certain conditions, it generalizes to the LTV case. Moreover, our implementation is simpler than the existing LTI version. Finally, we describe the anticausal inverse of the lattice realization, which is used in the synthesis bank.

** Title: Algorithm Design for Structured Systems: Application to Pole Placement
Authors: Steffen Paul, Technical University of Munich; Josef A. Nossek, Technical University of Munich
Volume: 1, Page: 43
Abstract: Numerical algorithms for signal processing and control are quite often constructed by intuition. When the system to be designed has algebraic or other invariants, these constraints can be exploited to find appropriate transformations. The transformations in system theory are usually Lie groups, so one has to find Lie groups that are consistent with the invariants. We show how this point of view can be applied to construct pole placement algorithms for symmetric and skew-symmetric realizations. However, Lie group theory only reveals the appropriate transformations; it cannot reduce the design process to a trivial task. The problem discussed here also illustrates this limitation.

** Title: Actions of noncompact groups and algorithm design: A case study
Authors: Klaus Diepold, IDT; Rainer Pauli, Technical University of Munich
Volume: 1, Page: 47
Abstract: Numerical matrix computations involving actions of noncompact transformation groups are known to cause numerical problems, since the elements of the corresponding matrix representations are inherently unbounded. In this case study, we analyse numerical problems occurring in a class of algorithms based on actions of the pseudo-orthogonal group O(n,m), a group that is noncompact (hyperbolic geometry) and well established in signal processing (Schur methods).
As a major result, we show how to exploit the additional degrees of freedom in defining coordinate frames in a Grassmannian setting to impose an a priori bound on the norm of the transformation matrices. In this way, numerically disastrous situations can be avoided systematically. This makes it possible to develop modified algorithms with superior numerical performance for a large class of problems based on, for example, hyperbolic transformations.

** Title: Discretization Issues for the Design of Optimal Blind Algorithms
Authors: Rodney A. Kennedy, Australian National University; Deva K. Borah, Australian National University; Zhi Ding, Auburn University
Volume: 1, Page: 51
Abstract: The performance and complexity of blind algorithms in a digital receiver depend on the sampling rate and on the prefilter applied before the received continuous-time signal is discretized. This paper shows that symbol-spaced blind equalization algorithms are in general suboptimal, since a matched filter cannot be used. We show that, for fractionally spaced equalizers, the prefilter can be a general low-pass filter and need not be matched to the unknown channel. This flexibility in choosing the prefilter can result in different discrete-time models, with different complexities for the signal processing algorithms that follow. For example, it allows a simpler design of the whitening filter that several important blind equalization algorithms need in order to succeed.
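The Kennedy, Borah and Ding abstract mentions the whitening filter that several blind equalizers require. As a hedged illustration only (the paper's own whitening design is not described here), one standard way to whiten a stationary discrete-time signal is a linear prediction-error filter whose coefficients come from the Levinson-Durbin recursion on the signal's autocorrelation. The function names `levinson_durbin` and `whiten` are hypothetical.

```python
def levinson_durbin(r, order):
    """Prediction-error (whitening) filter from autocorrelation lags r[0..order].

    Returns (a, err): a = [1, a_1, ..., a_p] such that e[n] = sum_k a[k] * x[n - k]
    is the prediction error, and err is the final prediction-error power.
    Assumes r[0] > 0 and a positive-definite autocorrelation sequence.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for m in range(1, order + 1):
        # Correlation between the order-(m-1) prediction error and the lag-m sample.
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


def whiten(x, a):
    """Apply the FIR prediction-error filter a to x (zero initial conditions)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]
```

For the first-order autoregressive autocorrelation r = [1, 0.5, 0.25], the recursion returns a = [1, -0.5, 0], i.e. the filter e[n] = x[n] - 0.5 x[n-1], which turns an AR(1) process with coefficient 0.5 back into its white driving noise.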
** Title: Continuous-Time Envelope-Constrained Filter Design via Laguerre Filter and H∞ Optimization Methods
Authors: Zhuquan Zang, ATRI, Curtin University of Technology; Antonio Cantoni, ATRI, Curtin University of Technology; Koklay Teo, ATRI, Curtin University of Technology
Volume: 1, Page: 55
Abstract: Envelope-constrained filtering is concerned with designing a time-invariant filter to process a given input signal such that the noiseless output of the filter is guaranteed to lie within a prespecified output mask. In this paper, using Laguerre filters and H∞ optimization techniques, the continuous-time envelope-constrained filter design problem is reformulated and solved as a constrained H∞ model-matching problem. To illustrate the effectiveness of the design method, a numerical example is presented in which an equalization filter is designed for a digital transmission channel.

** Title: Local adaptive algorithms for information maximization in neural networks, and application to source separation
Authors: Jeroen Dehaene, K.U.Leuven; Nanayaa Twum-Danso, Harvard University
Volume: 1, Page: 59
Abstract: Information-theoretic criteria for neural network adaptation laws have recently become an important focus of attention. We consider the problem of adaptively maximizing the entropy of the outputs of a deterministic feedforward neural network with real-valued stochastic input signals, as considered by Bell and Sejnowski. We give a new explanation of why output information (entropy) maximization is relevant to source separation, and we reinterpret Bell and Sejnowski's approach in the more general context of probability density estimation. This insight is the basis for a generalization of the approach, and we consider a family of gradient-based algorithms.
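The Dehaene and Twum-Danso abstract builds on Bell and Sejnowski's entropy-maximization (InfoMax) rule. The sketch below shows that baseline rule, not the paper's generalized family, in the natural-gradient form commonly used with logistic outputs: for u = Wx and y = g(u), the update is W <- W + lr (I + (1 - 2y)u^T) W. Plain Python lists stand in for matrices, and `infomax_step` is a hypothetical name.

```python
import math


def logistic(u):
    """Logistic output nonlinearity y = g(u), as in Bell and Sejnowski."""
    return 1.0 / (1.0 + math.exp(-u))


def infomax_step(W, x, lr):
    """One natural-gradient InfoMax update for a square unmixing matrix W.

    u = W x, y = g(u), and W <- W + lr * (I + (1 - 2y) u^T) W.
    W is a list of n rows of length n; x is a list of length n.
    """
    n = len(W)
    u = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    y = [logistic(ui) for ui in u]
    # M = I + (1 - 2y) u^T
    M = [[(1.0 if i == j else 0.0) + (1.0 - 2.0 * y[i]) * u[j] for j in range(n)]
         for i in range(n)]
    # Natural-gradient direction dW = M W
    dW = [[sum(M[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[W[i][j] + lr * dW[i][j] for j in range(n)] for i in range(n)]
```

In use, the step is applied repeatedly over zero-mean mixture samples with a small learning rate; the logistic nonlinearity is suited to super-Gaussian sources such as speech.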
** Title: Quick Aggregation of Markov Chain Functionals via Stochastic Complementation
Authors: Kutluyil Dogancay, University of Melbourne; Vikram Krishnamurthy, University of Melbourne
Volume: 1, Page: 63
Abstract: The paper presents a quick and simplified aggregation method, based on the concept of stochastic complementation, for a large class of Markov chain functionals. Aggregation reduces the number of Markov states by grouping them into a smaller number of aggregated states, thereby considerably reducing the computational complexity of maximum likelihood parameter and state estimation for hidden Markov models. The importance of the proposed method stems from the ease with which Markov chains with a large number of states can be aggregated. Three widely used Markov chain functionals are considered to illustrate the application of the method.

** Title: A rank preserving flow algorithm for quadratic optimization problems subject to quadratic equality constraints
Authors: John B. Moore, Systems Engineering, ANU; Danchi Jiang, Dept. MAE, Chinese University
Volume: 1, Page: 67
Abstract: This paper concerns quadratic programming problems subject to quadratic equality constraints, such as arise in broadband antenna array signal processing and elsewhere. First, such a problem is converted into a semidefinite programming problem with a rank constraint. Then, a rank-preserving flow is used to accommodate the rank constraint. The associated gradient formulas are carefully developed, and the convergence of the resulting algorithm is guaranteed. The approach is demonstrated by a numerical experiment.
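The first step in the Moore and Jiang abstract, converting a quadratic program with quadratic equality constraints into a semidefinite program with a rank constraint, is consistent with the standard lifting X = x x^T, which we assume is the conversion meant: every quadratic form x^T A x equals the linear function trace(A X), every constraint x^T B x = c becomes trace(B X) = c, X is positive semidefinite, and rank(X) = 1 is the extra constraint. The helper names below are hypothetical, and the rank-preserving flow itself is not sketched.

```python
def outer(x):
    """Rank-one lifting X = x x^T."""
    return [[xi * xj for xj in x] for xi in x]


def quad_form(A, x):
    """x^T A x (quadratic in x)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def trace_product(A, X):
    """trace(A X) = sum_ij A[i][j] * X[j][i] (linear in X)."""
    n = len(A)
    return sum(A[i][j] * X[j][i] for i in range(n) for j in range(n))


def is_rank_one(X, tol=1e-12):
    """True if X is nonzero and every 2x2 minor vanishes (rank exactly one)."""
    n = len(X)
    nonzero = any(abs(X[i][j]) > tol for i in range(n) for j in range(n))
    minors_vanish = all(
        abs(X[r1][c1] * X[r2][c2] - X[r1][c2] * X[r2][c1]) <= tol
        for r1 in range(n) for r2 in range(n)
        for c1 in range(n) for c2 in range(n)
    )
    return nonzero and minors_vanish
```

Dropping the rank-one condition would leave a convex semidefinite relaxation; the abstract instead keeps the rank constraint and handles it with a rank-preserving flow.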
** Title: VERBMOBIL: The Combination Of Deep And Shallow Processing For Spontaneous Speech Translation
Authors: Thomas Bub, DFKI; Wolfgang Wahlster, DFKI; Alex Waibel, Carnegie Mellon University
Volume: 1, Page: 71
Abstract: Verbmobil is a speech-to-speech translation system for spontaneously spoken negotiation dialogs. The current system translates 74.2% of spontaneously spoken German input. We give an overview of the Verbmobil system. After introducing the Verbmobil scenario and the unique constraints of the project, we describe the underlying system architecture and its realization. The progress achieved in the end-to-end translation rate owes much to the increase in the word recognition rate from 45% in 1993 to 87% in 1996. However, to achieve the envisaged coverage on the uncertain output of the speech recognizer, deep and shallow approaches to the analysis and transfer problem had to be combined.

** Title: Prosodic Processing and its Use in Verbmobil
Authors: Heinrich Niemann, University of Erlangen; Elmar Noth, University of Erlangen; Andreas Kiessling, University of Erlangen; Ralf Kompe, University of Erlangen; Anton Batliner, L.M.-Univ. Munchen
Volume: 1, Page: 75
Abstract: We present the prosody module of the VERBMOBIL speech-to-speech translation system, the first complete system worldwide that successfully uses prosodic information in linguistic analysis. This is achieved by computing probabilities of clause boundaries, accentuation, and different types of sentence mood for each word hypothesis produced by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is achieved during the analysis itself rather than by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries, which are detected with a recognition rate of 94%.
For parsing word hypothesis graphs, using clause boundary probabilities yields a speed-up of 92% and a 96% reduction in alternative readings.

** Title: The Language Components in Verbmobil
Authors: Hans Ulrich Block, Siemens AG
Volume: 1, Page: 79
Abstract: This paper gives an overview of the main problems, and their solutions, in the language components of the Verbmobil speech translation system. Interpretation of spontaneously spoken language has to take into account that its syntax and semantics differ from those of written language, that punctuation is missing, that accent and intonation affect meaning and translation, that the output of the speech recognizer may be noisy, and that speakers make errors due to distraction. The Verbmobil interpretation and translation components attack these problems by means of a grammar for spoken language, heavy use of prosodic information, a syntactic search on word hypothesis graphs, and a shallow, robust fallback translation device that is used when the "deep" translation fails.

** Title: The Karlsruhe-Verbmobil Speech Recognition Engine
Authors: Michael Finke, University of Karlsruhe; Petra Geutner, University of Karlsruhe; Hermann Hild, University of Karlsruhe; Thomas Kemp, University of Karlsruhe; Klaus Ries, University of Karlsruhe; Martin Westphal, University of Karlsruhe
Volume: 1, Page: 83
Abstract: Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native languages. Within this project, the University of Karlsruhe has developed a speech recognition engine that has been evaluated yearly and has shown very promising word accuracy on large-vocabulary spontaneous speech. In this paper, we introduce the Janus Speech Recognition Toolkit underlying the speech recognizer.
The main new contributions to the acoustic modeling part of our 1996 evaluation system (speaker normalization, channel normalization, and polyphonic clustering) are discussed and evaluated. Besides the acoustic models, we describe the different language models used in our evaluation system: word trigram models interpolated with class-based models, and a separate spelling language model. As a result of using the toolkit and integrating all these parts into the recognition engine, the word error rate on the German Spontaneous Scheduling Task (GSST) decreased from 30% in 1995 to 13.8% in 1996.

** Title: An Experiment On Korean-To-English And Korean-To-Japanese Spoken Language Translation
Authors: Jae-Woo Yang, ETRI; Jun Park, ETRI
Volume: 1, Page: 87
Abstract: We have implemented a prototype Korean-to-English and Korean-to-Japanese spoken language translation system. The system can translate speech in the travel planning domain with a 5,000-word vocabulary. In our prototype, we concentrate on conveying the user's intention to the partner despite the current limitations of spoken language processing technology. We measured the end-to-end performance of the prototype with a subjective measure to test whether the system's output is understandable. We also evaluated the system with an objective measure and found that its results are consistent with those of the subjective test. The test results show that the user can understand the output even when the system cannot translate the speech correctly. It is therefore important to provide even partially correct translation output to the user, since the user may be able to infer the intended message from the context and his or her own intelligence.
** Title: Multilingual Person to Person Communication at IRST
Authors: Bianca Angelini, IRST; Mauro Cettolo, IRST; Anna Corazza, IRST; Daniele Falavigna, IRST; Gianni Lazzari, IRST
Volume: 1, Page: 91
Abstract: This paper describes a machine-mediated, person-to-person multilingual communication system. Emphasis is placed on robustness, that is, the ability of the system to preserve communication even in the presence of the variability and errors typical of spoken language systems. The statistical approach is adopted not only at the acoustic level but also for linguistic processing. Accordingly, after a brief overview of the global architecture, the paper focuses on the acoustic recognizer and the understanding module. Experimental evaluations complete the presentation.

** Title: Fast Word-Graph Generation For Spontaneous Conversational Speech Translation
Authors: Tohru Shimizu, ATR-ITL; Harald Singer, ATR-ITL; Yoshinori Sagisaka, ATR-ITL
Volume: 1, Page: 95
Abstract: This paper introduces the latest advances in research at ATR on speech translation for spontaneous conversations, focusing especially on speech recognition. For recognition, we employ a word search technique that generates moderately sized word graphs in real time. To cope with the variety of utterance lengths in spontaneous speech (e.g., words, phrases, sentence fragments, sentences, and concatenated sentences), we have adopted a two-pass search strategy that uses variable-order word n-gram statistics in the first stage and task-dependent language constraints in the second stage. This strategy is evaluated on the "ATR Travel Arrangement" corpus.
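The Shimizu, Singer and Sagisaka abstract describes generating word graphs and searching them again in a second stage. As a hedged sketch of what searching a word graph involves (not ATR's implementation), a word graph can be treated as a directed acyclic graph whose edges carry word hypotheses with log-scores; the best sentence is the maximum-score path, found by dynamic programming in topological order. The edge tuple format and the name `best_path` are assumptions.

```python
from collections import defaultdict


def best_path(edges, start, end):
    """Highest-scoring word sequence through a word graph.

    edges: (from_node, to_node, word, log_score) tuples, with integer nodes
    numbered so that from_node < to_node (e.g. frame indices), which makes
    ascending node order a topological order. Returns (score, words), or
    None if end is unreachable from start.
    """
    outgoing = defaultdict(list)
    nodes = {start, end}
    for u, v, word, score in edges:
        outgoing[u].append((v, word, score))
        nodes.update((u, v))
    best = {start: (0.0, [])}  # node -> (best score so far, word sequence)
    for u in sorted(nodes):
        if u not in best:
            continue  # not reachable from start
        score_u, words_u = best[u]
        for v, word, score in outgoing[u]:
            cand = score_u + score
            if v not in best or cand > best[v][0]:
                best[v] = (cand, words_u + [word])
    return best.get(end)
```

A second stage in the spirit of the abstract would add task-dependent language scores to each edge before running this search; that combination is not shown here.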
** Title: JANUS-III: Speech-to-Speech Translation in Multiple Languages
Authors: Alon Lavie, Carnegie Mellon University; Alex Waibel, Carnegie Mellon University; Lori Levin, Carnegie Mellon University; Michael Finke, Carnegie Mellon University; Donna Gates, Carnegie Mellon University; Marsal Gavalda, Carnegie Mellon University; Torsten Zeppenfeld, Carnegie Mellon University; Puming Zhan, Carnegie Mellon University
Volume: 1, Page: 99
Abstract: This paper describes JANUS-III, the most recent version of our JANUS speech-to-speech translation system. We present an overview of the system and focus on how its design facilitates speech translation between multiple languages and allows easy adaptation to new source and target languages. We also describe our methodology for evaluating end-to-end system performance with a variety of source and target languages. For system development and evaluation, we have experimented with both push-to-talk and cross-talk recording conditions. To date, our system has achieved over 80% acceptable translations on transcribed input, and over 75% acceptable translations on speech input recognized with 75-90% word accuracy. Our current research concentrates mainly on enhancing the system's ability to handle input in broad and general domains.

** Title: State-Transition Cost Functions and an Application to Language Translation
Authors: Hiyan Alshawi, AT&T Labs; Adam L. Buchsbaum, AT&T Labs
Volume: 1, Page: 103
Abstract: We define a general method for ranking the solutions of a search process by associating costs with equivalence classes of the process's state transitions. We show how the method accommodates models based on probabilistic, discriminative, and distance cost functions, including the assignment of costs to unseen events.
By applying the method to our machine translation prototype, we are able to experiment with different cost functions and training procedures, including an unsupervised procedure for training the numerical parameters of our English-Chinese translation model. Results from these experiments show that the choice of cost function leads to significant differences in translation quality.

** Title: Hybrid language processing in the Spoken Language Translator
Authors: Manny Rayner, SRI International; David M. Carter, SRI International
Volume: 1, Page: 107
Abstract: The paper presents an overview of the hybrid language-processing architecture of the Spoken Language Translator (SLT) system, focusing on the way in which rule-based and statistical methods are combined to achieve robust and efficient performance within a linguistically motivated framework. In general, we argue that rules are desirable for encoding domain-independent linguistic constraints and achieving high-quality grammatical output, while corpus-derived statistics are needed if systems are to be efficient and robust; further, that hybrid architectures are more portable than architectures that use only one type of information. We address "multi-engine" strategies for robust translation; robust bottom-up parsing using pruning and grammar specialization; rational development of linguistic rule sets using balanced domain corpora; and efficient supervised training by interactive disambiguation. All the work described is fully implemented in the current version of the SLT-2 system.

** Title: Finite-State Speech-to-Speech Translation
Authors: Enrique Vidal, DSIC UPV
Volume: 1, Page: 111
Abstract: A fully integrated approach to speech-input language translation in limited-domain applications is presented.
The mapping from the input to the output language is modeled by a finite-state translation model that is learned from examples of input-output sentences of the task considered. This model is tightly integrated with standard acoustic-phonetic models of the input language, and the resulting global model directly supplies, through Viterbi search, an optimal output-language sentence for each input-language utterance. Several extensions to this framework, recently developed to cope with increasingly difficult translation tasks, are reviewed. Finally, results are reported for a hotel front-desk communication task with a vocabulary of about 700 words.

** Title: An Experimental Bidirectional Japanese/English Interpreting Video Phone System Using Internet.
Authors: Shoji Hiraoka, MRIT; Masakatsu Hoshimi, MRIT; Kenji Matsui, CRL; Jean-Claude Junqua, STL
Volume: 1, Page: 115
Abstract: In this paper, we report on an experimental bidirectional Japanese/English interpreting video phone system that uses the Internet. We particularly emphasize the motivation for this work, the task, and the experiments conducted. Using in-house technology developed in both Japan and the United States, we demonstrated an Internet home-shopping application in which an American shop assistant and a Japanese customer engaged in task-directed dialogues in their native languages. The experiments showed that a natural interaction can be obtained when users are familiar with the application language.

** Title: From Neural Networks to Neural Strategies
Authors: Christian Goerick, Ruhr-Univ. Bochum; Bernhard Sendhoff, Ruhr-Univ. Bochum; Werner von Seelen, Ruhr-Univ. Bochum
Volume: 1, Page: 119
Abstract: Artificial neural networks have evolved from their biologically inspired roots into a well-established means of solving a broad spectrum of engineering problems.
The embedding of neural networks into modern statistics has provided the theoretical foundation needed for challenging engineering tasks, such as advanced real-time image and signal processing. These are exemplary demonstrations of the applicability of this approach to complex information processing. However, the large number of applications must not obscure the fact that there are some major unsolved problems concerning neural networks. There are still no satisfactory constructive methods for determining the optimal structure (elements as well as organization) or the learning and evaluation dynamics. Ongoing research addresses these problems. Besides pursuing this direction, one can ask what other lessons biology can teach us about complex information processing. Our goal in this paper is to sketch a possible path from neural networks to more comprehensive neural strategies.

** Title: Neural And Traditional Techniques In Diagnostic ECG Classification
Authors: Rosaria Silipo, DSI; Giovanni Bortolan, DSI
Volume: 1, Page: 123
Abstract: Neural and traditional techniques have been compared on the task of automatic ECG analysis, using a large validated ECG database. Statistical methods, neural architectures with supervised and unsupervised learning, and a neuro-fuzzy architecture have been considered. The results of the connectionist approach are always at least comparable to those of more traditional classification methods. However, the best performance is obtained by combining the connectionist and fuzzy approaches.

** Title: Unsupervised Learning for Blind Source Separation: an Information-Theoretic Approach
Authors: Dragan Obradovic, Siemens, Munchen; Gustavo Deco, Siemens, Munchen
Volume: 1, Page: 127
Abstract: This paper provides a detailed and rigorous analysis of two commonly used methods for redundancy reduction: linear Independent Component Analysis (ICA) and Information Maximization (InfoMax).
The paper shows analytically that ICA based on the Kullback-Leibler information as a mutual information measure and InfoMax lead to the same solution if the parameterization of the output nonlinear functions in the latter method is sufficiently rich. Furthermore, this work briefly discusses alternative redundancy measures not based on the Kullback-Leibler information distance, as well as nonlinear ICA. Practical issues in applying ICA and InfoMax are also discussed.

** Title: Applications of Neural Blind Separation to Signal and Image Processing
Authors: Juha Karhunen, Helsinki University of Technology; Aapo Hyvarinen, Helsinki University of Technology; Ricardo Vigario, Helsinki University of Technology; Jarmo Hurri, Helsinki University of Technology; Erkki Oja, Helsinki University of Technology
Volume: 1, Page: 131
Abstract: In blind source separation, one tries to separate statistically independent unknown source signals from their linear mixtures without knowing the mixing coefficients. Such techniques are currently being actively studied in both statistical signal processing and unsupervised neural learning. In this paper, we apply neural blind separation techniques developed in our laboratory to feature extraction from natural images and to the separation of medical EEG signals. The new analysis method yields features that describe the underlying data better than, for example, classical principal component analysis. We also briefly discuss difficulties related to real-world applications of blind signal processing.

** Title: Communications and Neural Networks: Theory and Practice
Authors: Mark D. Plumbley, KCL
Volume: 1, Page: 135
Abstract: In this paper, we show that neural networks and communications are interlinked in a number of ways, toward the goal of efficient communication of information. One concrete example is the use of neural networks to ensure efficient use of communication channels through connection admission control in ATM networks.
In addition, efficient communication is also important within a decision-making system such as a neural network. Finally, we examine what type of neural network solutions this approach suggests.

** Title: Robust Vector Quantization by Competitive Learning
Authors: Joachim M. Buhmann, University of Bonn; Thomas Hofmann, University of Bonn
Volume: 1, Page: 139
Abstract: Competitive neural networks can be used to quantize image and video data efficiently. We discuss a novel class of vector quantizers that perform noise-robust data compression. The vector quantizers are trained to compensate simultaneously for channel noise and code vector elimination noise. The training algorithm for estimating the code vectors is derived from the maximum entropy principle in the spirit of deterministic annealing. We demonstrate the performance of noise-robust codebooks with compression results for a teleconferencing system based on a wavelet image representation.

** Title: Recognizing faces from a new viewpoint
Authors: Thomas Vetter, Max-Planck-Institut, Tubingen
Volume: 1, Page: 143
Abstract: A new technique is described for recognizing faces from new viewpoints. From a single 2D image of a face, synthetic images from new viewpoints are generated and compared to stored views. A novel 2D image of a face can be computed without knowledge of the 3D structure of the head. The technique draws on prior knowledge of faces based on example images of other faces seen in different poses, and on a single generic 3D model of a human head. The example images are used to learn a pose-invariant shape and texture description of a new face. The 3D model is used to solve the correspondence problem between images showing faces in different poses. The performance of the technique is tested on a data set of 200 faces of known orientation for rotations of up to 90 degrees.
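The Vetter abstract describes using example faces seen in different poses to synthesize a new view of a novel face. One way this idea is often illustrated, assumed here rather than taken from the paper, is a linear object class: if a novel face in pose A is approximated by a least-squares combination of example faces in pose A, applying the same coefficients to the same examples in pose B synthesizes the novel face in pose B. The sketch assumes faces are already vectorized and in pixel-wise correspondence; `synthesize_new_view` and its helpers are hypothetical names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve the square system G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(G, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c


def synthesize_new_view(examples_a, examples_b, novel_a):
    """Fit novel_a over the pose-A examples, then reuse the coefficients in pose B."""
    G = [[dot(ei, ej) for ej in examples_a] for ei in examples_a]  # Gram matrix
    rhs = [dot(ei, novel_a) for ei in examples_a]
    coeffs = solve(G, rhs)  # least squares via the normal equations
    dim = len(examples_b[0])
    novel_b = [sum(c * e[k] for c, e in zip(coeffs, examples_b)) for k in range(dim)]
    return coeffs, novel_b
```

With real images, the example vectors would be shape and texture vectors brought into correspondence; the abstract assigns that correspondence problem to the generic 3D head model, which this sketch does not model.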
** Title: Hybrid Optimization of Feedforward Neural Networks for Handwritten Character Recognition Authors: Wolfgang Utschick, Technical University of Munich Josef A. Nossek, Technical University of Munich Volume: 1, Page: 147 Abstract: An extension of a feedforward neural network is presented. Although linear threshold functions and a Boolean function are used in the second layer, signal processing within the neural network is real-valued. After mapping input vectors onto a discretization of the input space, real-valued features of the internal representation of patterns are extracted. A vector quantizer assigns a class hypothesis to a pattern based on its extracted features and adequate reference vectors of all classes in the decision space of the output layer. Training consists of a combination of combinatorial and convex optimization. This work has been applied to a standard optical character recognition task. Results and comparisons to alternative approaches are presented. ** Title: Reading Checks with multilayer graph transformer networks Authors: Yann LeCun, AT&T Labs Leon Bottou, AT&T Labs Yoshua Bengio, AT&T Labs Volume: 1, Page: 151 Abstract: We propose a new machine learning paradigm called Multilayer Graph Transformer Network that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as input and produce graphs as output. A complete check reading system based on this concept is described. The system combines convolutional neural network character recognizers with graph-based stochastic models trained cooperatively at the document level. It is deployed commercially and reads millions of business and personal checks per month with record accuracy.
** Title: Neural Networks For Process Control In Steel Manufacturing Authors: Martin Schlang, Siemens AG, Munich Einar Broese, Siemens AG, Munich Bjoern Feldkeller, Siemens AG, Munich Otto Gramckow, Siemens AG, Munich Michael Jansen, Siemens AG, Munich Thomas Poppe, Siemens AG, Munich Clemens Schaeffner, Siemens AG, Munich Guenter Soergel, Siemens AG, Munich Volume: 1, Page: 155 Abstract: Neural Networks are particularly suitable for the approximation of non-linear time-variant functions. Due to their learning capabilities, they have proven useful in control applications for complex industrial processes. In collaboration with the Corporate Research and Development Department, the Siemens Industrial and Building Systems Group developed Neural Network applications for the steel industry, resulting in a more economical use of resources and an improvement in productivity. At this time, Siemens has installed more than 100 neural nets worldwide at different plants. ** Title: A Neuro-Dynamic Programming Approach to Admission Control in ATM Networks: The Single Link Case Authors: Peter Marbach, LIDS, MIT John N. Tsitsiklis, LIDS, MIT Volume: 1, Page: 159 Abstract: We are interested in solving large-scale Markov Decision Problems. The classical method of Dynamic Programming provides a mathematical framework for finding optimal solutions for a given Markov Decision Problem. However, Dynamic Programming algorithms become computationally infeasible when the underlying Markov Decision Problem evolves over a large state space. In recent years, a new methodology, called Neuro-Dynamic Programming, has emerged which tries to overcome this "curse of dimensionality". We show how Neuro-Dynamic Programming can be applied to the Admission Control Problem for a single link in an ATM environment. Based on results obtained through Neuro-Dynamic Programming, we derive a heuristic "Threshold" policy.
The performance of the policies obtained through Neuro-Dynamic Programming is compared with that of a policy which always accepts a customer when the required resources are available. ** Title: Issues In Measuring The Benefits Of Multimodal Interfaces Authors: James L. Flanagan, Rutgers University Ivan Marsic, Rutgers University Volume: 1, Page: 163 Abstract: Multimedia interfaces are rapidly evolving to facilitate human/machine communication. Most of the technologies on which they are based are, as yet, imperfect. But the interfaces do begin to allow information exchange in ways familiar and comfortable to the human, principally through natural actions in the sensory dimensions of sight, sound and touch. Further, as digital networking becomes ubiquitous, the opportunity grows for collaborative work through conferenced computing. In this context the machine takes on the role of mediator in human/machine/human communication, the ideal being to extend the intellectual abilities of humans through access to distributed information resources and collective decision making. The challenge is how to design machine mediation so that it extends rather than impedes human abilities. This report describes evolving work to incorporate multimodal interfaces into a networked system for collaborative distributed computing. It also addresses strategies for quantifying the synergies that may be gained. ** Title: Multimodal Interfaces for Multimedia Information Agents Authors: Alex Waibel, Carnegie Mellon University Bernhard Suhm, Carnegie Mellon University Minh Tu Vo, Carnegie Mellon University Jie Yang, Carnegie Mellon University Volume: 1, Page: 167 Abstract: When humans communicate they take advantage of a rich spectrum of cues. Some are verbal and acoustic. Some are non-verbal and non-acoustic. Signal processing technology has devoted much attention to the recognition of speech, as a single human communication signal.
Most other complementary communication cues, however, remain unexplored and unused in human-computer interaction. In this paper we show that the addition of non-acoustic or non-verbal cues can significantly enhance the robustness, flexibility, naturalness and performance of human-computer interaction. We demonstrate computer agents that use speech, gesture, handwriting, pointing, and spelling jointly for more robust, natural and flexible human-computer interaction in the various tasks of an information worker: information creation, access, manipulation or dissemination. ** Title: Smart Rooms, Desks, and Clothes Authors: Alex Pentland, MIT Media Lab Volume: 1, Page: 171 Abstract: We are working to develop smart networked environments that can help people in their homes, offices, cars, and when walking about. Our research is aimed at giving rooms, desks, and clothes the perceptual and cognitive intelligence needed to become active helpers. ** Title: Human Machine Interaction by Voice and Gesture Authors: Nikil Jayant, Bell Laboratories Volume: 1, Page: 175 Abstract: Voice and gesture represent fundamental and universal modalities in interhuman communication. With recent advances in automatic methods of speech recognition and synthesis, human-machine interaction by voice is rapidly becoming a technological and commercial reality. Although less mature and less widely deployed, gesture recognition by machine is becoming reliable enough to be considered a serious supplement to the voice interface between humans and machines. ** Title: Audio-Visual Interaction in Multimedia Communication Authors: Tsuhan Chen, AT&T Labs - Research Ram R. Rao, Georgia Institute of Technology Volume: 1, Page: 179 Abstract: To many people, the word "multimedia" simply means the combination of various forms of information: text, speech, music, images, graphics and video. What is often overlooked is the interaction among these forms.
In this paper, we present our recent results in exploiting the audio-visual interaction, which is very significant in multimedia communication. The applications include lip synchronization, joint audio-video coding, and person verification. We will present the enabling technologies, including audio-to-visual mapping and facial image analysis, for these applications. Our results show that the joint processing of audio and video provides advantages that are not available when audio and video are studied separately. ** Title: Lip Motion Modeling and Speech Driven Estimation Authors: F. Lavagetto, University of Genova S. Lepsoy, University of Genova C. Braccini, University of Genova S. Curinga, University of Genova Volume: 1, Page: 183 Abstract: Recent advances in joint acoustical/visual analysis for model-based lip motion synthesis are presented. The 2D lip motion field is modeled as a linear combination of a low-dimensional motion basis computed through Principal Component Analysis (PCA). The vector of PCA coefficients is expressed as a function of a limited set of articulatory parameters which describe the external appearance of the mouth. The acoustical processing estimates these articulatory parameters from the direct analysis of the speech waveform based on a neural processing stage, i.e. through a bank of Time Delay Neural Networks. The achieved results have been subjectively evaluated by visualizing the estimated motion on a wire-frame mouth template presented in synchronization with speech. The experiments carried out so far deal with single-speaker trained TDNNs and with single-speaker PCA, but suitable algorithms for generalizing the techniques are currently under investigation.
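The PCA motion basis described by Lavagetto et al. can be sketched as follows, assuming each 2D lip motion field is flattened into one vector of displacements. The mean field is subtracted, the leading eigenvectors of the sample covariance are found (here by power iteration with deflation, chosen only to keep the example free of external libraries), and each field is then represented by its projection coefficients. The mapping from articulatory parameters to these coefficients is not modeled, and all function names are hypothetical.

```python
import math


def mean_vector(frames):
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def covariance(frames, mean):
    """Sample covariance (divided by N) of the mean-removed frames."""
    centred = [[x - m for x, m in zip(f, mean)] for f in frames]
    d = len(mean)
    return [[sum(r[i] * r[j] for r in centred) / len(frames) for j in range(d)] for i in range(d)]


def power_iteration(mat, iters=200):
    """Leading eigenpair of a symmetric PSD matrix. Assumes the fixed start vector is not
    orthogonal to the wanted eigenvector and that the leading eigenvalue is distinct."""
    d = len(mat)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is (numerically) zero along v
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def pca_basis(frames, k):
    """Mean frame and k principal directions, found one at a time by deflation."""
    mean = mean_vector(frames)
    cov = covariance(frames, mean)
    basis = []
    for _ in range(k):
        lam, v = power_iteration(cov)
        basis.append(v)
        # Remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return mean, basis


def encode(frame, mean, basis):
    """PCA coefficients of one flattened motion field."""
    centred = [x - m for x, m in zip(frame, mean)]
    return [sum(c * b for c, b in zip(centred, vec)) for vec in basis]


def decode(coeffs, mean, basis):
    """Motion field rebuilt as the mean plus a linear combination of basis vectors."""
    out = list(mean)
    for c, vec in zip(coeffs, basis):
        out = [o + c * b for o, b in zip(out, vec)]
    return out
```

With k basis vectors, `decode(encode(f, mean, basis), mean, basis)` returns the mean plus the projection of frame `f` onto the k-dimensional PCA subspace, so frames lying in that subspace are reproduced exactly while the remaining variance is discarded.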
** Title: Voice Source Localization for Automatic Camera Pointing System in Videoconferencing Authors: Hong Wang, PictureTel Peter Chu, PictureTel Volume: 1, Page: 187 Abstract: This paper describes the voice source localization algorithm used in the PictureTel automatic camera pointing system (LimeLight-TM, Dynamic Speech Locating Technology). The system uses a 46 cm wide, 30 cm high array containing 4 microphones, mounted on top of the monitor. The three-dimensional position of a sound source is calculated from the time delays of 4 pairs of microphones. In time delay estimation, the averaging of signal onsets of each frequency band is combined with phase correlation to reduce the influence of noise and reverberation. With this approach, it is possible to provide reliable three-dimensional voice source localization with a small microphone array. Post-processing based on a priori knowledge is also introduced to eliminate the influence of reflections from furniture such as tables. Results of speech source localization under real conference room conditions will be given. Some system-related issues will also be discussed. ** Title: Video interface for spatiotemporal interactions based on multi-dimensional video computing Authors: Akihito Akutsu, NTT Human Interface Laboratories Yoshinobu Tonomura, NTT Human Interface Laboratories Hiroshi Hamada, NTT Human Interface Laboratories Volume: 1, Page: 191 Abstract: Because digital video is becoming increasingly important for the networked multimedia society, the audio-visual access environment should allow us to do more than just passively watch. We propose a new video user interface concept made possible by multi-dimensional video computing. Multi-dimensional video computing offers a framework for analyzing a video, creating new structures, and restyling and visualizing the video according to the user's demands.
The video interface visualizes video content and context structure comprehensibly to allow us to access the spatiotemporal information in videos intuitively. In this paper, we introduce our research activities toward a video interface based on the information extracted from the video. New video interfaces called VideoBrowser, PanoramaVideo, and VideoJigsaw are described. ** Title: Indexing and Search of Multimodal Information Authors: Alexander G. Hauptmann, Carnegie Mellon University Howard D. Wactlar, Carnegie Mellon University Volume: 1, Page: 195 Abstract: The Informedia Digital Library Project allows full-content indexing and retrieval of text, audio and video material. The integration of speech recognition, image processing, natural language processing and information retrieval overcomes limits in each technology to create a useful system. To answer the question of how accurate speech recognition must be for recognizer-generated transcripts to be useful for indexing and retrieval, empirical evidence is presented that illustrates the degradation of information retrieval at different levels of speech recognition accuracy. In our experiments, word error rates of up to 25% did not significantly impact information retrieval, and error rates of 50% still provided 85 to 95% of the recall and precision obtained with fully accurate transcripts in the same retrieval system. ** Title: Acoustic Indexing for Multimedia Retrieval and Browsing Authors: Steve J. Young, Cambridge University Engineering Dept Jonathan T. Foote, Cambridge University Engineering Dept Gareth J.F. Jones, Cambridge University Engineering Dept Karen Sparck Jones, Cambridge University Computer Lab Martin G. Brown, ORL Ltd Volume: 1, Page: 199 Abstract: This paper reviews the Video Mail Retrieval (VMR) project at Cambridge University and ORL.
The VMR project began in September 1993 with the aim of developing methods for retrieving video documents by scanning the audio soundtrack for keywords. The project has shown, both experimentally and through the construction of a working prototype, that speech recognition can be combined with information retrieval methods to locate multimedia documents by content. The final version of the VMR system uses pre-computed phone lattices to allow extremely rapid word spotting and audio indexing, and statistical information retrieval (IR) methods to mitigate the effects of spotting errors. The net result is a retrieval system that is open-vocabulary and speaker-independent, and which can search audio orders of magnitude faster than real time. ** Title: Broadcast News Transcription Authors: Francis Kubala, BBN Hubert Jin, BBN Long Nguyen, BBN Richard Schwartz, BBN Spyros Matsoukas, Northeastern University Volume: 1, Page: 203 Abstract: In this paper we describe our recent work on automatic transcription of radio and television news broadcasts. This problem is very challenging for large vocabulary speech recognition because of the frequent and unpredictable changes that occur in speaker, speaking style, topic, channel, and background conditions. Faced with such a problem, there is a strong tendency to try to carve the input into separable classes and deal with each one independently. In our early work on this problem, however, we are finding that the rewards for condition-specific techniques are disappointingly small. This is forcing us to look for general, robust, and adaptive algorithms for dealing with extremely variable data. Herein, we describe the BBN BYBLOS recognition system configured to handle off-line transcription and we characterize the speech contained in the 1996 DARPA Hub-4 testbed. On the partitioned development test set, we achieved a 29.4% overall word error rate. 
** Title: Image/Speech Processing that Adopts an Artistic Approach -Toward Integration of Art and Technology- Authors: Ryohei Nakatsu, ATR-MIC Volume: 1, Page: 207 Abstract: In the areas of image/speech processing, researchers have long dreamed of producing computer agents that can communicate with people in a human-like way. Although the non-verbal aspects of communication, such as emotion-based communication, play very important roles in our daily lives, most research so far has concentrated on the verbal aspects of communication and has neglected the nonverbal aspects. To achieve human-like agents we have adopted a two-way approach. 1. To provide agents with nonverbal communication capability, engineers have started research on emotion recognition and facial expression recognition. 2. Artists have begun to design and generate the reactions and behaviors of agents, to fill the gap between real human behaviors and those of computer agents. ** Title: Noise Cancelling for Microphone Arrays Authors: Jens Meyer, Darmstadt University of Technology Carsten Sydow, SIEMENS AG Volume: 1, Page: 211 Abstract: In this paper, an application of the noise cancelling method to noise suppression in a microphone array system is discussed. First, an overview of the noise cancelling approach is given. This is followed by a description of how the method is employed in a realized microphone array system. The limiting factors are described and theoretical limits of the noise suppression are derived. Experimental results obtained in a realistic environment are presented. The results show that, depending on the recording situation, the noise cancelling approach applied to a microphone array system leads to a significant enhancement of the signal-to-noise ratio of the array output signal. ** Title: A Microphone Array System for Speech Recognition Authors: Kenji Kiyohara, NTT Human Interface Labs. Yutaka Kaneda, NTT Human Interface Labs.
Satoshi Takahashi, NTT Human Interface Labs. Hiroaki Nomura, NTT Human Interface Labs. Junji Kojima, NTT Human Interface Labs. Volume: 1, Page: 215 Abstract: This paper proposes a microphone array system which realizes the following important functions for speech recognition: i) SNR improvement, ii) flat spectrum response for an arbitrary speaker position, and iii) speech period detection in noisy speech. This microphone array system features time delay estimation using pre-whitening signal processing, an optimally weighted delay-and-sum array, and speech period detection based on the level difference (called MLD) between signals before and after array processing. Word recognition experiments performed in the presence of crowd noise show that the proposed system is more robust against noise than a system using a conventional directional microphone and a conventional speech period detection method. ** Title: Strategies for combining Acoustic Echo Cancellers and Adaptive Beamforming Microphone Arrays Authors: Walter Kellermann, FH Regensburg Volume: 1, Page: 219 Abstract: New concepts for efficient combination of acoustic echo cancellation (AEC) and adaptive beamforming microphone arrays (ABMA) are presented. By decomposing common beamforming methods into a time-invariant part, which the AEC can integrate, and a separate time-variant part, the number of echo cancellers is minimized without rendering the system identification problem more difficult. Methods for controlling the interaction of ABMA and AEC are outlined and implementations for typical microphone array applications are discussed briefly. ** Title: A Steerable and Variable First-Order Differential Microphone Array Authors: Gary W. Elko, Acoustics Research Department Anh-Tho Nguyen Pong, Speech Processing Software and Technology Research Volume: 1, Page: 223 Abstract: A new first-order differential microphone array with an infinitely steerable and variable beampattern is described.
The microphone consists of 6 small pressure microphones flush-mounted on the surface of a 3/4" diameter rigid nylon sphere. The microphones are located at the points where the vertices of an inscribed octahedron contact the spherical surface. By appropriately combining the three Cartesian orthogonal pairs with simple scalar weightings, a general first-order differential microphone beam (or beams) can be realized and directed to any angle in 4π steradian space. A working real-time version has been built and measured results from this microphone are shown. This microphone should be useful for surround sound recording/playback applications and for virtual reality audio applications. ** Title: Microphone Array based Speech Recognition with Different Talker-Array Positions Authors: Maurizio Omologo, ITC-IRST Marco Matassoni, ITC-IRST Piergiorgio Svaizer, ITC-IRST Diego Giuliani, ITC-IRST Volume: 1, Page: 227 Abstract: The use of a microphone array for hands-free continuous speech recognition in noisy and reverberant environments is investigated. An array of eight omnidirectional microphones was placed at different angles and distances from the talker. A time delay compensation module was used to provide a beamformed signal as input to a Hidden Markov Model (HMM) based recognizer. A phone HMM adaptation, based on a small number of phonetically rich sentences, further improved the recognition rate obtained by applying beamforming alone. These results were confirmed both by experiments conducted in a noisy and reverberant environment and by simulations. In the latter case, different conditions were recreated by using the image method to reproduce synthetic versions of the array microphone signals.
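The time delay compensation used by Omologo et al. (and the delay-and-sum stage of Kiyohara et al.) can be sketched as a plain delay-and-sum beamformer, assuming the per-channel arrival delays are already known as non-negative integer sample counts, for example from a TDOA estimator. Each channel is advanced by its delay so that the target signal lines up across channels, and the aligned channels are averaged: the target adds coherently while uncorrelated noise is partly averaged out. Fractional delays and the optimal channel weighting of Kiyohara et al. are not modeled, and `delay_and_sum` is a hypothetical name.

```python
def delay_and_sum(channels, delays):
    """Average the channels after advancing channel k by delays[k] samples.

    Assumes channel k holds the target delayed by delays[k] >= 0 samples relative to a
    common reference, i.e. channels[k][t + delays[k]] is aligned with reference time t.
    The output is truncated to the samples that every aligned channel can supply.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    # Sum the time-aligned samples of all channels and normalize by the channel count.
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / m for t in range(max(n, 0))]
```

With M channels carrying noise of equal power that is uncorrelated across microphones, this averaging improves the SNR by up to a factor of M (10 log10 M dB); reverberation and noise that is correlated across microphones reduce this gain in practice.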
** Title: Acoustic Source Location In A Three-Dimensional Space Using Crosspower Spectrum Phase Authors: Piergiorgio Svaizer, ITC-IRST Marco Matassoni, ITC-IRST Maurizio Omologo, ITC-IRST Volume: 1, Page: 231 Abstract: A microphone array can be used to locate a dominant acoustic source in a given environment. This capability is successfully employed to locate an active talker in teleconferencing or other multi-speaker applications. In this work the source location is obtained in two steps: 1) a Time Difference Of Arrival (TDOA) computation between the signals of the array; 2) an "optimal" source location based on the interchannel delay estimates and on a geometrical description of the sensor arrangement. The Crosspower Spectrum Phase technique was used for TDOA estimation, while a Maximum Likelihood approach was followed to derive the source coordinates. Source location experiments in a three-dimensional space were performed by means of an array of 8 microphones. For this purpose both a loudspeaker and a real talker were used to collect data in a large noisy and reverberant room. ** Title: Superdirective Microphone Array for a Set-Top Videoconferencing System Authors: Peter Chu, PictureTel Volume: 1, Page: 235 Abstract: In set-top videoconferencing, the complete videoconferencing system fits unobtrusively on top of the television. The microphone sound pickup system is one of the most important functional blocks, with constraints of small size, high performance, and low cost. Persons speaking several feet away from the system must be picked up satisfactorily, while noise generated internally by the cooling fan and hard drive, and noise generated externally by air conditioning and nearby computers, must be attenuated. In this paper, a three-microphone superdirective array that meets these constraints is described.
An analog highpass filter and an analog lowpass filter are used to merge two of the microphone signals into a single channel, so that only a single stereo A/D converter is required to process the three microphone signals. The microphone signals are then linearly combined so as to maximize the signal-to-noise ratio, resulting in nulls steered toward nearby objectionable noise sources. ** Title: Simultaneous Echo Cancellation and Car Noise Suppression Employing a Microphone Array Authors: Mattias Dahl, University of Karlskrona/Ronneby Ingvar Claesson, University of Karlskrona/Ronneby Sven Nordebo, University of Karlskrona/Ronneby Volume: 1, Page: 239 Abstract: This paper presents a method to simultaneously perform 20 dB acoustic echo cancellation and 15-20 dB speech enhancement using an adaptive microphone array combined with spectral subtraction. Primarily intended for handsfree telephones in automobiles, the microphone array system simultaneously emphasizes the near-end talker and suppresses the handsfree loudspeaker and the broadband car noise. The array system is based on a fast and efficient on-site calibration and can be used in other situations, such as conventional speakerphones. ** Title: Analytical Evaluation of a Self-calibrating Microphone Array Authors: Sven Nordholm, University of Karlskrona/Ronneby Ingvar Claesson, University of Karlskrona/Ronneby Volume: 1, Page: 243 Abstract: This paper gives an analytical description of an adaptive microphone array which facilitates a simple built-in calibration to the environment and instrumentation. The scheme offers several advantages, such as a simple calibration procedure and reduced target signal distortion. The analysis employs noncausal Wiener filters, yielding compact and effective theoretical suppression limits.
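The Crosspower Spectrum Phase (CSP) TDOA step of Svaizer et al., often also called GCC-PHAT, normalizes the cross-power spectrum of two microphone signals to unit magnitude, so that only phase information remains, and then takes the inverse transform; the lag of the resulting peak estimates the inter-microphone delay. This is a minimal sketch, assuming equal-length frames, an integer lag search range `max_lag`, and a naive O(N^2) DFT used only to stay dependency-free (a practical implementation would use an FFT, windowing and zero-padding). The maximum-likelihood source location step is not reproduced, and all names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free for the sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def csp_tdoa(x1, x2, max_lag):
    """Lag (in samples) of the CSP peak; a positive value means x2 lags x1."""
    if len(x1) != len(x2):
        raise ValueError("frames must have equal length")
    X1, X2 = dft(x1), dft(x2)
    # Cross-power spectrum, then keep only its phase (unit magnitude per bin).
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    phase = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    csp = [v.real for v in idft(phase)]
    n = len(csp)
    # Negative lags sit at the end of the circular CSP sequence.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: csp[lag % n])
```

Because every frequency bin contributes with equal weight after the normalization, the correlation peak stays narrow even for strongly colored signals such as speech, which is the usual motivation for this whitening in reverberant rooms.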
** Title: Microphone Array Response to Speaker Movements Authors: Yves Grenier, ENST Sofiene Affes, INRS-Telecommunications Volume: 1, Page: 247 Abstract: Matched filtering and adaptive beamforming are both necessary for efficient speech dereverberation and noise reduction by microphone arrays. This can be achieved by the identification of impulse responses. In this contribution, we show that adaptive microphone arrays are sensitive to errors in the identification of impulse responses, particularly those caused by speaker movements. We prove that adjusted matched filtering and permanent tracking of impulse responses are also necessary. The proposed microphone array responds well to these requirements under realistic conditions. ** Title: A Digital Processing System for Source Location and Sound Capture by Large Microphone Arrays Authors: Harvey F. Silverman, Brown University William R. Patterson, Brown University James L. Flanagan, Rutgers University Daniel Rabinkin, Rutgers University Volume: 1, Page: 251 Abstract: The Huge Microphone Array (HMA) project started in February 1994 to design, construct, and test a real-time 512-microphone array system and to develop algorithms for use on it. Analysis of known algorithms showed that signal-processing performance of over 6 Gigaflops would be required; at the same time, there was a need for portability, i.e., fitting into a small van. These tradeoffs and many others have led to a unique design in both hardware and software. This paper presents the design and its justifications. Performance data for a few important algorithms, in terms of processing capability usage, response latency, and difficulty of programming, are discussed. ** Title: 3-D Unitary ESPRIT for Joint 2-D Angle and Carrier Estimation Authors: Martin Haardt, Siemens Josef A.
Nossek, Technical University of Munich Volume: 1, Page: 255 Abstract: For an efficient frequency and time slot allocation procedure in future mobile communication systems using space division multiple access (SDMA), it is essential to determine which mobiles are spatially well separated from one another. Thus, once a mobile desires to initiate a call, precise knowledge of the 2-D arrival angles of its dominant wavefronts is required. In this application, 3-D Unitary ESPRIT for joint 2-D angle and carrier estimation offers an efficient way to handle such mobile access requests, since it provides efficient high-resolution measurements of the spatial characteristics of the wireless channel, even if only a small number of antennas is available at the base station. Automatic pairing of the 3-D estimates is achieved via a new simultaneous Schur decomposition (SSD) of three real-valued, non-symmetric matrices. In general, the SSD enables an R-dimensional extension of Unitary ESPRIT (R ≥ 3) to estimate several undamped R-dimensional modes or frequencies along with their correct pairing in multidimensional harmonic retrieval problems. Here, we present a Jacobi-type method to calculate the SSD. For each of the R dimensions, the corresponding frequency estimates are obtained from the real eigenvalues of a real-valued matrix. The SSD jointly estimates the eigenvalues of all R matrices and thereby achieves automatic pairing of the estimated R-dimensional modes via a closed-form procedure that requires neither a search nor any other heuristic pairing strategy. ** Title: Quality enhancement of coded and corrupted speeches in GSM mobile systems using residual redundancy Authors: Thomas Hindelang, Technical University of Munich Wen Xu, Technical University of Munich Christian Erben, Technical University of Munich Volume: 1, Page: 259 Abstract: Coded speech data often contain residual redundancy, even if a powerful speech codec (e.g.
the full rate coder used in GSM mobile communications) is employed. By using such redundancy together with the information provided by the channel decoder, such as soft output (L-value), the number of channel bits inverted by the decoder, or a cyclic redundancy check, the bit error rate can be further reduced and a more graceful degradation of speech quality can be achieved, especially under bad channel conditions. In this paper, we report a study of this approach for GSM full rate speech transmission and error concealment. The algorithms developed can be easily implemented on a currently available DSP designed for GSM mobile phones. ** Title: Pilot Assisted Coherent DS-CDMA Reverse-Link Communications with Optimal Robust Channel Estimation Authors: Fuyun Ling, Motorola Volume: 1, Page: 263 Abstract: Optimal pilot-assisted estimation of communication channels is considered for coherent cellular and PCS CDMA reverse link communications. Both pilot-symbol and pilot-channel based schemes are described, and the optimal estimators for these two schemes are analyzed. The relative mean square estimation error (RMSEE) and the optimal power allocation between data and pilot signals are derived from the analysis. Finally, simulation results are given to show that reverse link performance can be significantly improved by using pilot-assisted coherent communication instead of non-coherent schemes for the CDMA reverse link. ** Title: A new Frequency Estimator applied to Burst Transmission Authors: Christian Bergogne, Telecom Paris, Alcatel Telspace Michel Bousquet, ENSAE Philippe Sehier, Alcatel Telspace Volume: 1, Page: 267 Abstract: In TDMA communications systems using all-feedforward synchronization techniques, the quality of data decoding strictly depends on the estimation accuracy of the synchronization parameters (timing, carrier phase/frequency and preamble detection) extracted from the received signal.
Frequency offset estimation is the most critical point. Indeed, an inaccurate frequency estimate can cause cycle slips and hence errors during decoding. In this paper, we propose a new frequency estimator, analytically derived from the Maximum Likelihood principle and optimized through variance simulations. Its performance is compared to the Cramér-Rao Bound. ** Title: Unified Specification of Control and Data Flow Authors: Thorsten Groetker, ISS, RWTH Aachen Rainer Schoenen, ISS, RWTH Aachen Heinrich Meyr, ISS, RWTH Aachen Volume: 1, Page: 271 Abstract: Many signal processing systems use event-driven mechanisms, typically based on finite state machines (FSMs), to control the operation of computationally intensive (data flow) parts. The state machines in turn are often fueled by external inputs as well as by feedback from the signal processing portions of the system. Packet-based transmission systems are a good example of such a close interaction between data and control flow. For an efficient design flow it is of crucial importance to be able to model and analyze the complete functionality of the system within one single design environment. Therefore, we developed a computational model that integrates the specification of control and data flow by combining the notion of data flow graphs with event-driven process activation. ** Title: Reconfigurable Processing: The Solution to Low-Power Programmable DSP Authors: Jan M. Rabaey, University of California at Berkeley Volume: 1, Page: 275 Abstract: One of the most compelling issues in the design of wireless communication components is to keep power dissipation within bounds. While low-power solutions are readily achieved in an application-specific approach, doing so in a programmable environment is a substantially harder problem. This paper presents an approach to low-power programmable DSP that is based on the dynamic reconfiguration of hardware modules.
This technique has been shown to yield at least an order of magnitude of power reduction compared to traditional instruction-based engines for problems in the area of wireless communication. ** Title: DSP Cores for Mobile Communications: Where are we going? Authors: Gerhard P. Fettweis, Technical University of Dresden Volume: 1, Page: 279 Abstract: Digital signal processors (DSPs) have become a key component in the design of communications ICs. Application customization leads to key market advantages but also to the enormous problem of having too many different DSPs and software development tools. First, open issues are identified through an analysis of the problem. Then, a possible solution named CATS is presented, which allows for customization without generating too much heterogeneity in hardware and tools. ** Title: DSPs in Mobile Communication in the United States Authors: Sanjay Kasturia, Bell Labs, Lucent Technologies Colin Warwick, Bell Labs, Lucent Technologies Volume: 1, Page: 283 Abstract: The mobile communication industry in the United States is undergoing major changes. Auctioning of additional spectrum will lead to more service providers and will significantly increase competition. Service providers are likely to customize the services they offer to differentiate themselves from others. We will discuss possible technologies for differentiation of services and their implications for the requirements on embedded DSPs. In the US, supporting customization in the absence of a single industry-wide standard, together with the high likelihood of at least three widely used air interfaces, will significantly challenge the ability of the industry to serve the phone needs of all service providers. The need for customization in the context of multiple standards will create strong pressure to significantly improve the code development environment for DSPs. This also implies evolution to architectures that are more friendly to developers.
** Title: FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign Authors: Markus Willems, ISS, RWTH Aachen Volker Bursgens, ISS, RWTH Aachen Thorsten Grotker, ISS, RWTH Aachen Heinrich Meyr, ISS, RWTH Aachen Volume: 1, Page: 287 Abstract: Digital mobile systems are sensitive to power consumption, chip size and costs. Therefore they are realized using fixed-point architectures, either dedicated HW or fixed-point processors. On the other hand, system design starts from a floating-point description. These requirements have been the motivation for FRIDGE, a design environment for the specification, evaluation and implementation of fixed-point systems. FRIDGE offers a seamless design flow from a floating-point description to a fixed-point implementation. In this paper we focus on the FRIDGE concept of an interactive, automated transformation of floating-point programs written in ANSI-C into fixed-point specifications, based on an interpolative approach. Since HW and SW implementations of the same functionality in general require different fixed-point specifications, the design time reductions that can be achieved by using FRIDGE make it a key component for efficient HW/SW-CoDesign. ** Title: Staying Ahead of the Game In Silicon for Digital Mobile Communications Authors: Ravi Subramanian, Synopsys Inc. Marc Barberis, Synopsys Inc. Herbert Dawid, Synopsys GmbH Volume: 1, Page: 291 Abstract: While the mobile communication electronics industry's appetite grows for ever more functions and ever higher levels of integration, the complexity of these large designs is creating a discontinuity in the method by which these systems are designed. In this paper, we will take a close look at what is causing the design discontinuity, and how new design technologies are being used to design advanced digital communications systems for portable and wireless communication applications.
We will examine how system-level design tools closely tied to silicon design implementation and verification technologies are enabling the creation of digital communications ICs in record time. We take several examples of commercially available silicon solutions designed using these methodologies: a G.721 ADPCM speech codec for cordless telephony and a complete variable-rate digital-video broadcast receiver for the DVB-S broadcast standard. ** Title: Approximation of Optimal Step Size Control for Acoustic Echo Cancellation Authors: Christiane Antweiler, RWTH Aachen Jorn Grunwald, RWTH Aachen Holger Quack, RWTH Aachen Volume: 1, Page: 295 Abstract: One of the most widely used gradient-based adaptation algorithms is the so-called normalized least mean square (NLMS) algorithm. The rate of convergence, misadjustment and noise insensitivity of the NLMS-type algorithm depend on the proper choice of the step size parameter, which controls the weighting applied to each coefficient update. Different step size methods have been proposed to improve the convergence of NLMS-type filters while preserving the steady-state performance. The step size methods considered here use either a step size parameter which varies with time or a separate, tap-individual step size for each filter tap. The derivation of the respective step size methods is based on different optimization criteria. In this paper a step size parameter is proposed that satisfies a combined optimization criterion, leading to a time-variant and tap-individual step size parameter. The realization aspects of the new concept are discussed for an acoustic echo control application as an example.
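For readers unfamiliar with the NLMS update referred to in the Antweiler, Grunwald and Quack entry above, the following is a minimal sketch of a plain NLMS echo canceller with a single fixed step size `mu`. It is not the combined time-variant, tap-individual step size control proposed in that paper; the function name, the regularization constant `eps`, and the synthetic, noiseless echo path are illustrative assumptions.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-6):
    """Hypothetical helper: identify an echo path with a fixed-step NLMS filter.

    far_end: loudspeaker (reference) signal x[n]
    mic:     microphone signal d[n] containing the echo
    mu:      normalized step size (0 < mu < 2 for stability)
    eps:     small regularization term against division by zero (assumed)
    Returns the final filter coefficients and the residual echo e[n].
    """
    w = np.zeros(num_taps)        # echo path estimate
    x_buf = np.zeros(num_taps)    # x_buf[k] holds x[n - k]
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        # Shift the delay line by one sample and insert the newest far-end sample.
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y_hat = w @ x_buf             # estimated echo
        err[n] = mic[n] - y_hat       # residual echo
        # NLMS update: the step is normalized by the input power in the buffer.
        w += mu * err[n] * x_buf / (x_buf @ x_buf + eps)
    return w, err


# Synthetic test case (assumed): a decaying 16-tap echo path and white-noise far-end speech.
rng = np.random.default_rng(1)
echo_path = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))
far_end = rng.standard_normal(5000)
mic = np.convolve(far_end, echo_path)[: len(far_end)]  # noiseless echo
w, err = nlms_echo_canceller(far_end, mic, num_taps=16)
print(np.max(np.abs(w - echo_path)))  # misalignment after adaptation
```

The single scalar `mu` is exactly the parameter whose choice trades convergence speed against steady-state misadjustment; the step size methods discussed in the entry replace it with a time-varying value or a per-tap vector.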
** Title: Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path Authors: Shoji Makino, NTT Human Interface Labs Klaus Strauss, NTT Human Interface Labs Suehiro Shimauchi, NTT Human Interface Labs Yoichi Haneda, NTT Human Interface Labs Akira Nakagawa, NTT Human Interface Labs Volume: 1, Page: 299 Abstract: This paper proposes a new subband stereo echo canceller that converges to the true echo path impulse response much faster than conventional stereo echo cancellers. Since signals are bandlimited and downsampled in the subband structure, the time interval between subband samples becomes longer, so the variation of the cross-correlation between the stereo input signals becomes large. Consequently, convergence to the true solution is improved. Furthermore, the projection algorithm, or affine projection algorithm, is applied to further speed up convergence. Computer simulations using stereo signals recorded in a conference room demonstrate that this method significantly improves convergence speed and almost solves the problem of stereo echo cancellation with low computational load. ** Title: A Better Understanding and an Improved Solution to the Problems of Stereophonic Acoustic Echo Cancellation Authors: Jacob Benesty, Bell Labs Dennis R. Morgan, Bell Labs M. Mohan Sondhi, Bell Labs Volume: 1, Page: 303 Abstract: Teleconferencing systems employ acoustic echo cancelers (AECs) to reduce echoes that result from coupling between the loudspeaker and microphone. To enhance sound realism, two-channel audio is necessary. However, in this case (stereophonic sound) the acoustic echo cancellation problem is more difficult to solve because of the necessity to uniquely identify two acoustic paths. In this paper, we explain these problems in detail and give an interesting solution which is much better than previously known solutions.
The basic idea is to introduce a small nonlinearity into each channel that has the effect of reducing the interchannel coherence while not being noticeable for speech due to self-masking. ** Title: Comparison of three post-filtering algorithms for residual acoustic echo reduction Authors: Valerie Turbin, CNET Andre Gilloire, CNET Pascal Scalart, CNET Volume: 1, Page: 307 Abstract: We consider an acoustic echo control system composed of a short conventional acoustic echo canceller combined with a post-filter in a teleconference context. The post-filter is implemented in an open-loop structure in the frequency domain, which provides good adaptive performance and flexibility in the choice of the post-filter length. Three post-filtering algorithms are compared in terms of residual echo attenuation and near-end speech distortion. The effect of the post-filter length is also examined. Our study confirms that the post-filtering approach provides high residual echo attenuation. Moreover, it appears that the distortion of the near-end speech can be controlled by appropriately choosing the post-filter length. ** Title: Audio Coding Using Sinusoidal Excitation Representation Authors: Wen-Whei Chang, National Chiao-Tung University De-Yu Wang, National Chiao-Tung University Li-Wei Wang, National Chiao-Tung University Volume: 1, Page: 311 Abstract: Most LPC-based audio coders employ simplistic noise-shaping operations to perform psychoacoustic control of quantization noise. In this paper, we report on new approaches to exploiting perceptual masking in the design of adaptive quantization of LPC excitation parameters. Due to its localized spectral sensitivity, sinusoidal excitation representation is preferred to spectrally flat signals for use in excitation modeling. Simulation results indicate that the proposed multisinusoid excited coder can deliver high quality audio reproduction at a rate of 72 kb/s.
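The Benesty, Morgan and Sondhi entry above only states that a small nonlinearity is added to each channel to reduce interchannel coherence. The sketch below assumes one possible form of such a nonlinearity: a half-wave rectified term added with positive polarity to the left channel and negative polarity to the right channel, with an illustrative strength `alpha`. It uses the correlation coefficient of two fully coherent synthetic channels as a crude stand-in for interchannel coherence; the function names and parameter values are hypothetical.

```python
import numpy as np


def add_halfwave_nonlinearity(left, right, alpha=0.5):
    """Hypothetical helper: add a half-wave rectified component to each channel.

    Left gets alpha * positive half-wave, right gets alpha * negative half-wave,
    so the two channels are no longer related by a purely linear mapping.
    """
    left_out = left + alpha * (left + np.abs(left)) / 2.0     # adds alpha * max(x, 0)
    right_out = right + alpha * (right - np.abs(right)) / 2.0  # adds alpha * min(x, 0)
    return left_out, right_out


def correlation(a, b):
    """Correlation coefficient, used here as a simple proxy for coherence."""
    return np.corrcoef(a, b)[0, 1]


# Synthetic stereo pair (assumed): the right channel is a scaled copy of the left.
rng = np.random.default_rng(0)
left = rng.standard_normal(20000)
right = 0.8 * left
proc_left, proc_right = add_halfwave_nonlinearity(left, right)
print(correlation(left, right))            # close to 1.0: fully coherent channels
print(correlation(proc_left, proc_right))  # noticeably below 1.0 after the nonlinearity
```

Using opposite polarities on the two channels is the design choice that matters here: applying the same nonlinearity to a scaled copy would keep the channels perfectly related and leave the non-uniqueness problem of stereo echo path identification in place.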
** Title: Optimum Bit Allocation and Decomposition for High Quality Audio Coding Authors: Xiang Wei, University of Central Lancashire Martyn J. Shaw, University of Central Lancashire Martin R. Varley, University of Central Lancashire Volume: 1, Page: 315 Abstract: Current audio compression schemes are capable of reducing the per-channel bit rate of high quality audio signals from 16 bits per sample to around 2-4 bits per sample. In these schemes, knowledge of psychoacoustics is utilised and a uniform or nonuniform frequency decomposition method is used. In this paper we derive the optimum bit allocation to achieve the highest perceptual quality under a fixed bit rate, for an arbitrarily decomposed, critically sampled filter bank. The resultant optimum bit allocation gives rise to a shaped reconstruction noise floor approximately parallel to the masking threshold level. Perceptual coding gain is defined; it should be maximized by an optimum decomposition performed by the filter bank. Optimum band splitting is discussed and it is pointed out that decomposition in the manner of critical band splitting does not lead to optimal performance. ** Title: The D5 Lattice Quantization For A 64 KBit/S Low-Delay Subband Audio Coder With A 15 KHz Bandwidth Authors: Karine Hay, ENST-Br, Dept. SC. S. Saoudi, ENST-Br, Dept. SC. L. Mainard, CCETT, Service RCS/SDA Volume: 1, Page: 319 Abstract: A new method for coding generic audio signals at 64 kbit/s in the 20-15000 Hz bandwidth with low delay is presented. It combines subband coding, the Low Delay CELP algorithm and cascaded filterbanks. Our earlier work showed that, when using an equal bit rate on each subband, the resulting audio quality was not appropriate. We propose here a new technique based on lattice quantization to avoid the search complexity of statistical vector quantization. It allows an adaptive bit rate allocation in each subband.
Experimental results assessing the validity of the proposed method are also presented. ** Title: An Experimental Audio Codec Based on Warped Linear Prediction of Complex Valued Signals Authors: Aki Harma, Helsinki University of Technology Unto K. Laine, Helsinki University of Technology Matti Karjalainen, Helsinki University of Technology Volume: 1, Page: 323 Abstract: Bark-scale warped linear prediction (WLP) is a very promising core for a monophonic perceptual audio codec. In the current paper the WLP scheme is extended to process complex-valued signals (CWLP). Three different methods of converting a stereo signal to one complex-valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g. perceptuality and stereo signal processing) into one computational element in order to find a more holistic and economic way of processing. ** Title: High Quality Low Complexity Scalable Wavelet Audio Coding Authors: William Kurt Dobson, U.S. Robotics Jiankan Jack Yang, U.S. Robotics Kevin J. Smart, U.S. Robotics Feng Kathy Guo, U.S. Robotics Volume: 1, Page: 327 Abstract: This paper presents an audio coder for real-time multimedia applications. To achieve high quality at low bit rate, the audio coder uses a wavelet packet decomposition to transform the audio data into the wavelet domain, and a psychoacoustic model is used to minimize quantization noise. The wavelet packet decomposition tree structures were chosen to closely mimic the critical bands of a psychoacoustic model. Instead of determining the masking thresholds in the Fourier domain, the wavelet coefficients are used to drive the psychoacoustic model directly. Most of the standard industrial sampling frequencies are supported by this coder. An efficient bit rate control scheme was designed such that the audio coder operates at virtually any desired bit rate level.
The audio coder achieves near perceptually lossless quality at or below 80 kb/s for most audio sources. Real-time encoding/decoding is possible using only a fraction of a Pentium or faster CPU. ** Title: An Efficient Tonal Component Coding Algorithm For MPEG-2 Audio NBC Authors: Yuichiro Takamizawa, NEC Corporation Masahiro Iwadare, NEC Corporation Akihiko Sugiyama, NEC Corporation Volume: 1, Page: 331 Abstract: This paper proposes a tonal component coding algorithm for a codec that employs a transform followed by Huffman coding, such as MPEG-2 Audio NBC (Non-Backward Compatible). After the input audio signal is mapped onto the frequency domain, the proposed algorithm withdraws local maximum components that degrade coding efficiency. This withdrawal increases the flatness of the spectrum and improves the efficiency of Huffman coding. The withdrawn components are encoded separately as side information. When the frequency resolution of the time/frequency mapping is high, this algorithm works more effectively since local maximum samples appear more frequently with such a mapping. Simulation results show that this algorithm achieves as much as 11% bit reduction per frame and improves the coding efficiency in 41% of all the audio frames. ** Title: Spectral Amplitude Warping (SAW) for Noise Spectrum Shaping in Audio Coding Authors: Roch Lefebvre, University of Sherbrooke Claude Laflamme, University of Sherbrooke Volume: 1, Page: 335 Abstract: In this paper, we present a new approach to shaping the coding noise in speech and audio coders. The approach, called Spectral Amplitude Warping (SAW), consists essentially of pre- and post-processing stages that apply a non-linear transformation to the signal short-term spectrum prior to, and after, encoding. Since SAW can be viewed as a separate entity from the coder, the noise-shaping capability of an existing coder can be improved without modifying the coder itself.
Using SAW as a pre- and post-process for the G.722 wideband speech coding standard, it was found in an informal listening test that the quality of the 64 kb/s operating mode can be achieved at only 48 kb/s. The price to be paid is an additional delay. ** Title: A fast noise-scaling algorithm for uniform quantization in audio coding schemes Authors: Carlos A. Serantes, Universidad de Vigo Antonio S. Pena, Universidad de Vigo Nuria Gonzalez-Prelcic, Universidad de Vigo Volume: 1, Page: 339 Abstract: A new bit assignment algorithm is presented. Its goals are the simultaneous assignment across all subbands in a few iterations, the use of memory to speed up convergence, and the consideration of a deformable error curve. The basis of the algorithm is discussed, along with other considerations that are likely to arise in practice. Finally, an example of performance is given. ** Title: Pyramid Vector Coding for high quality audio compression Authors: Daniele Cadel, Cefriel Giorgio Parladori, Alcatel Telecom Volume: 1, Page: 343 Abstract: The target of this work is high quality audio coding at low bit rates. It will be shown how Pyramid Vector Coding (PVC) can conveniently replace the classical Huffman coding technique in audio compression systems, also giving an advantage in the bit allocation procedure. The compression performance can be further improved by fixing an upper limit on the value of the vector components. ** Title: Subband Audio Coding with Synthesis Filters Minimizing a Perceptual Criterion Authors: Karine Gosse, ENST Paris Francois Moreau de Saint-Martin, CCETT Xavier Durot, CCETT Pierre Duhamel, ENST Paris Jean-Bernard Rault, CCETT Volume: 1, Page: 347 Abstract: The design of filter banks for source coding purposes classically relies on the perfect reconstruction (PR) property.
However, several recent studies have shown that taking the quantization noise into account in the design could yield a noticeable reduction of the mean square reconstruction error. The purpose of this study is to show that perceptual improvement can also be obtained in the particular audio coding context by relaxing the PR constraint. In this context, the mean square error is no longer relevant, and we define a new perceptual distortion criterion, the MPE (Mean Perceptual Error), making use of a simplified ear model. Then, synthesis filters are optimized so as to minimize this MPE. Finally, this MMPE (Minimum MPE) filter bank is included in an audio coding scheme. Compared with the corresponding PR filter bank-based scheme by means of the POM (Perceptual Objective Measure), the MMPE-based scheme shows improved audio quality. ** Title: New Results in Low Bitrate Audio Coding Using a Combined Harmonic-Wavelet Representation Authors: Simon Boland, Queensland University of Technology Mohamed Deriche, Queensland University of Technology Volume: 1, Page: 351 Abstract: In this paper, we propose a new combined harmonic-wavelet representation for audio where a harmonic analysis-synthesis scheme is first used to approximate each audio frame as a sum of several sinusoids. Then, the difference between the original signal and the reconstructed harmonic signal is analyzed using a wavelet filtering scheme. After each step (harmonic analysis & wavelet filtering), parameters are quantized and encoded. Compared to previously proposed methods, our audio coder uses different harmonic analysis-synthesis and wavelet filtering schemes. We use the Total Least Squares (TLS)-Prony algorithm for the harmonic analysis scheme, and an M-band wavelet transform for analyzing the residual. Altogether, our proposed coder is capable of delivering excellent audio signal quality at encoder bitrates of 60-70 kb/s. ** Title: Adaptive Inverse Control of Weakly Nonlinear Systems Authors: Wolfgang J.
Klippel, Dresden Volume: 1, Page: 355 Abstract: A weakly nonlinear plant can be linearized and will track an input signal if the plant is preceded by a nonlinear controller which approximates the inverse of the plant's transfer function. Present techniques for adjusting the controller adaptively to the plant require an additional nonlinear adaptive filter to perform a separate system identification. Straightforward update algorithms cannot directly update the filter parameters in the controller because the transfer function of the plant might cause instabilities in the adaptive process. This problem is overcome by performing additional linear filtering of the nonlinear state vector and/or error signal. Novel filtered-A and filtered-E modifications of the stochastic gradient based methods are presented which are capable of updating generic as well as special block-oriented nonlinear filter architectures. ** Title: Broadband Beamforming with Adaptive Postfiltering for Speech Acquisition in Noisy Environments Authors: Sven Fischer, Ericsson Eurolab Karl-Dirk Kammeyer, University of Bremen Volume: 1, Page: 359 Abstract: In this paper the implementation of a broadband beamformer, built from several harmonically nested subarrays for each octave band and combined with optimal postfiltering, is described. This method has the advantage of providing large sensor distances for the postfilter estimation while simultaneously controlling the directivity of the array. The selection of an optimal postfilter is discussed in detail and its estimation based on a Nuttall/Carter method for spectrum estimation is described. The resulting noise reduction system yields improved performance in diffuse noise fields and no distortions in the case of coherent direct path noise. Furthermore, the system is robust to steering misadjustment. ** Title: Near-field Beamforming for Microphone Arrays Authors: James G. Ryan, National Research Council Rafik A.
Goubran, Carleton University Volume: 1, Page: 363 Abstract: This paper describes the application of array optimization techniques to improving the near-field response of an arbitrary microphone array. The optimization exploits the differences in wavefront curvature between near-field and far-field sound sources and is suitable for reverberation reduction in small rooms. The optimum near-field beamformer provides increased array gain over that obtained from a uniformly weighted delay-and-sum beamformer. ** Title: A Robust Adaptive Microphone Array with Improved Spatial Selectivity and Its Evaluation in an Echoic Environment Authors: Osamu Hoshuyama, NEC Akihiko Sugiyama, NEC Akihiro Hirano, NEC Volume: 1, Page: 367 Abstract: This paper presents a new robust adaptive microphone array (AMA) and its evaluation in an echoic environment. The proposed AMA is a generalized sidelobe canceller equipped with a variable blocking matrix using coefficient-constrained adaptive filters, and a multiple-input canceller using norm-constrained adaptive filters (NCAFs). Because the NCAFs have selective nonlinearity in the relationship between coefficient norm and coefficient error, the proposed AMA has better spatial selectivity than the conventional AMA. Evaluation with real acoustic data captured in a room with a 0.3-second reverberation time shows that the noise is suppressed by 19 dB. In subjective evaluation, the proposed AMA obtains a score of 3.8 on a 5-point mean-opinion-score scale. ** Title: Tracking Multiple Talkers using Microphone-Array Measurements Authors: Douglas E. Sturim, Brown University Harvey F. Silverman, Brown University Michael S. Brandstein, Harvard University Volume: 1, Page: 371 Abstract: A method for tracking the positional estimates of multiple talkers in the operating region of an acoustic microphone array is presented. Initial talker location estimates are provided by a time-delay-based localization algorithm.
These raw estimates are spatially smoothed by a Kalman filter derived from a set of potential source motion models. Data association techniques based on the estimate clusterings and source trajectories are incorporated to match location observations with individual talkers. Experimental results are presented for array-recorded data using multiple talkers in a variety of scenarios. ** Title: A Robust Method for Speech Signal Time-Delay Estimation in Reverberant Rooms Authors: Michael S. Brandstein, Harvard University Harvey F. Silverman, Brown University Volume: 1, Page: 375 Abstract: Conventional time-delay estimators exhibit dramatic performance degradations in the presence of multipath signals. This limits their application in reverberant enclosures, particularly when the signal of interest is speech and it may not be possible to estimate and compensate for channel effects prior to time-delay estimation. This paper details an alternative approach which reformulates the problem as a linear regression of phase data and then estimates the time-delay through minimization of a robust statistical error measure. The technique is shown to be less susceptible to room reverberation effects. Simulations are performed across a range of source placements and room conditions to illustrate the utility of the proposed time-delay estimation method relative to conventional methods. ** Title: A Model-Based Approach to Active Noise Cancellation Using Loudspeaker Array Authors: Jie Gu, HK University of Science & Tech. Sze Fong Yau, HK University of Science & Tech. Volume: 1, Page: 379 Abstract: This paper presents a new model-based adaptive noise cancellation system using a loudspeaker array and an error sensor array which can be used to reduce the noise in a specific three-dimensional region. First, open loop system transfer functions are designed using a theoretical propagation model. The transfer functions thus found are regarded as the nominal values for the complete system.
Second, to compensate for deviations from the theoretical model, the transfer functions are adapted by the LMS algorithm using error measures from the error sensor array. Computer simulation results show that our approach is effective for noise reduction in 3-D space. Experiments using real-time active noise control hardware also confirm the performance of the system. ** Title: Reverberant Sound Field Analysis using a Microphone Array Authors: Wolfgang Tager, CNET Yannick Mahieux, CNET Volume: 1, Page: 383 Abstract: The use of microphone arrays for sound pickup in reverberant environments has been proposed by many authors. The observations at the M microphones can be decomposed into a spatially coherent and an incoherent part. The former is due to perfect (plane or spherical) sound waves caused by the direct path and specular reflections, whereas the latter is caused by diffusion, diffraction, non-perfect reflections, electrical and quantization noise. In this paper we first present a deflation method to detect and localize spatially coherent waves from the measured impulse responses. In a second step the filters which model the source directivity and the reflecting materials are estimated. The model takes into account near-field delay, range attenuation, microphone and source directivity as well as non-trivial reflections. ** Title: Minimisation of the Maximum Error Signal in Active Control Authors: Alberto Gonzalez, UPV, Valencia Antonio Albiol, UPV, Valencia Stephen J. Elliott, ISVR, Southampton Volume: 1, Page: 387 Abstract: This paper deals with Multiple Input Multiple Output systems for active control of acoustic signals. These systems are used when the acoustic field is complex and therefore a number of sensors are necessary to estimate the sound field and a number of sources to create the cancelling field. A steepest descent iterative algorithm is applied to minimise the p-norm of a vector composed of the output signals of a microphone array.
The existing algorithms deal with the 2-norm of this vector. This paper describes a general framework that covers the existing systems and then focuses on the ∞-norm minimisation algorithm. The minimax algorithm based on the ∞-norm minimises the output signal which has the greatest power. It is shown by means of simulations using measured data from a real room that the minimax algorithm leads to a more uniform final noise field than the existing algorithms. ** Title: Subband Active Noise Control Algorithm Based on a Delayless Subband Adaptive Filter Architecture Authors: Jeong-Hyeon Yun, Yonsei University Dae-Hee Youn, Yonsei University Young-Cheol Park, Samsung Electronic Volume: 1, Page: 391 Abstract: In this paper, a new active noise control algorithm based on a delayless subband adaptive filter architecture is presented. Also, an on-line system identification method implemented in the subband structure is suggested. To implement the filtered-x LMS algorithm in the subband structure, the secondary path transfer function is decomposed into sets of subband functions. The two-filter on-line modeling algorithm is then applied to each subband to estimate the secondary-path transfer function in a decomposed form. In this manner, the computational load for the on-line system identification is reduced by a factor of 3 compared with the wideband approach. Simulation results are presented to show the efficiency of the new ANC algorithm and the performance of the on-line system identification scheme. ** Title: Nonlinear Active Noise Control in a Linear Duct Authors: Paul Strauch, University of Edinburgh Bernard Mulgrew, University of Edinburgh Volume: 1, Page: 395 Abstract: The problem of active noise control in a linear duct is examined. Essentially, a nonlinear inverse to a nonminimum phase actuator is proposed. The nonlinear inverse exploits the non-Gaussian nature of some chaotic and stochastic noise sources.
The architecture of the controller is derived using Bayesian estimation theory and is shown to be a combination of a linear adaptive network and a radial basis function (RBF) or Volterra series (VS) network. Because of the nonlinear nature of the controller, the filtered-x least mean square (LMS) architecture cannot be used. Hence a modified active noise controller is proposed. Simulation results demonstrate the improvements in performance achievable with the combined linear and nonlinear controller. ** Title: Fast Exact Filtered-X LMS and LMS Algorithms for Multichannel Active Noise Control Authors: Scott C. Douglas, University of Utah Volume: 1, Page: 399 Abstract: In some situations where active noise control could be used, the well-known multichannel version of the filtered-X LMS adaptive filter is too computationally complex to implement. In this paper, we develop a fast, exact implementation of this multichannel system whose complexity is approximately O(2L) per filter channel, where L is the FIR filter length. In addition, we provide a computationally efficient method for effectively removing the delays of the secondary paths within the coefficient updates, thus yielding a fast implementation of the LMS adaptive algorithm for multichannel active noise control. Examples illustrate both the equivalence of the algorithms to their original counterparts and the computational gains provided by the new algorithms. ** Title: A Novel Frequency Domain Filtered-X LMS Algorithm for Active Noise Reduction Authors: Toshifumi Kosakat, Tokyo National College of Technology Stephen J. Elliott, University of Southampton Christopher C. Boucher, University of Southampton Volume: 1, Page: 403 Abstract: A Frequency Domain implementation of the LMS Algorithm has significant advantages.
In broadband applications it is important to use the correct window function before Fourier transformation to obtain an unbiased estimate of the required cross-correlation function and to eliminate wrap-around effects. In the Frequency Domain Filtered-X LMS Algorithm described in this paper, the control filter is updated in the frequency domain as a background task, while control filtering is performed in the time domain to minimize processing delays. The frequency domain algorithm showed better performance than the conventional time domain algorithm in simulations of single channel active control systems. The algorithm is also able to improve the convergence of multiple channel systems by compensating for the coupling between the control channels. ** Title: Practical Supergain Head Sized Arrays Authors: Dorra Masmoudi, University of Bordeaux Dominique Dallet, University of Bordeaux Jean Paul Dom, University of Bordeaux Volume: 1, Page: 407 Abstract: This paper presents a new design of head-sized sensor arrays using simple delay-and-sum beamforming, which provides a useful directivity index with sufficient robustness to errors. A frequency-independent sidelobe reduction is proposed to achieve optimal frequency characteristics. In order to obtain this control, a principle of combining multiple levels of array structures is established. Results are presented for spherically isotropic noise. It is found that good performance can be obtained for a head-sized array by combining multiple level structures with a simple delay-and-sum beamformer. ** Title: A Multichannel Compression Strategy for a Digital Hearing Aid Authors: Todd Schneider, Unitron Robert Brennan, Unitron Volume: 1, Page: 411 Abstract: Multi-channel compression schemes are a practical method of mapping the wide dynamic range of speech signals into the reduced dynamic range of hearing impaired listeners.
These systems address two of the shortcomings of single-channel compression systems: (1) the reduction of gain as a result of narrow-band non-speech stimuli and (2) the reduction of gain that often occurs when high-frequency speech components are followed by intense low-frequency speech components. They also provide frequency-dependent compression ratios that are needed by many newer supra-threshold fitting strategies (e.g., DSL I/O). This paper presents a multichannel compression scheme that employs an oversampled, polyphase DFT filterbank. In each compressor channel, the gain is controlled by an adjustable combination of an overall, dual time-constant input signal level and the individual channel signal level, which is measured with a short time-constant RMS detector. Informal listening tests have demonstrated that the design has very good audio quality and performs well in real-world listening situations. The design is suited for low-power, real-time operation. ** Title: Multi-Microphone Sub-band Adaptive Signal Processing for Improvement of Hearing Aid Performance: Preliminary Test Results using Normal Hearing Volunteers Authors: Paul Shields, University of Paisley Douglas R. Campbell, University of Paisley Volume: 1, Page: 415 Abstract: A system for the binaural pre-processing of speech signals for input to a standard linear hearing aid has been proposed. The work is based on that of Toner & Campbell, which applied the Least Mean Squares (LMS) algorithm in sub-bands to speech signals from various acoustic environments and signal to noise ratios (SNR). The method attempts to take advantage of the multiple inputs to perform noise cancellation. The use of sub-bands enables a diverse processing mechanism to be employed, where the wide-band signal is split into smaller sub-bands, which can subsequently be processed according to their signal characteristics.
The results of a series of intelligibility tests are presented from experiments in which acoustic speech and noise data generated in a simulated room were tested on normal-hearing volunteers.
** Title: Environmental noise reduction based on speech/non-speech identification for hearing aids Authors: Kenzo Itoh, NTT HI Labs.; Masahide Mizushima, NTT HI Labs. Volume: 1, Page: 419 Abstract: We propose a very practical and useful noise reduction system with wide application for hearing-impaired persons, such as a sound-gathering system in a lecture hall or conference room. The system uses two basic technologies: a speech/non-speech identification process and a new noise reduction process. The speech/non-speech identification process uses four characteristics of the input signal in the time and frequency domains. In the noise reduction process, a frequency weighting function is used for basic spectral subtraction and a loss control algorithm. Various kinds of environmental noise were reduced by this system, which showed excellent performance. Noise is further reduced by using a multi-microphone system as an acoustic noise suppressor. Intelligibility tests with persons with hearing loss show excellent noise reduction.
** Title: Blind Separation of Multiple Speakers in a Multipath Environment Authors: Russell Lambert, TRW; Anthony Bell, Salk Institute Volume: 1, Page: 423 Abstract: We relate information-theoretic blind learning methods (infomax) and Bussgang blind equalization methods. The multipath extension of blind source separation methods can be viewed in the frequency domain using FIR matrix algebra (matrices of finite impulse response filters). Three forms of Bussgang algorithms are given. The blind serial update method of Cardoso and Laheld is related to the infomax objective of Bell and Sejnowski. The application emphasis is on speech separation.
We demonstrate the robustness and power of the new techniques by blindly separating speech signals recorded in a multipath environment.
** Title: A Single Chip 1,200 Sinusoid Real-Time Generator for Additive Synthesis of Musical Signals Authors: Fernando De Bernardinis, Dip. Ing. Informazione, Univ. Pisa; Roberto Roncella, Dip. Ing. Informazione, Univ. Pisa; Roberto Saletti, Dip. Ing. Informazione, Univ. Pisa; Pierangelo Terreni, Dip. Ing. Informazione, Univ. Pisa; Graziano Bertini, IEI-CNR, Pisa Volume: 1, Page: 427 Abstract: This paper presents a new hardware implementation of additive synthesis for high-quality musical sound generation. The single-chip configuration performs real-time synthesis of 1,200 sinusoids; the system is expandable to 13,200 partials by connecting 11 chips in series. Each sinusoid is generated by a marginally stable second-order IIR filter, and its frequency, amplitude and phase can be specified independently. The system is clocked at 60 MHz when working at a 44.1 kHz sampling rate. Two completely independent output channels are available, and each sample uses a 20-bit representation to achieve an SNR of at least 110 dB, thanks to the internal 24-bit word length. The IC is designed in a 0.5 µm CMOS technology and has a core area of approximately 19 mm².
** Title: A Generalized Musical-Tone Generator with Applications to Sound Compression and Synthesis Authors: Carlo Drioli, University of Padova; Davide Rocchesso, University of Padova Volume: 1, Page: 431 Abstract: A musical-tone generator based on physical modeling of the sound production mechanisms is presented. To make this scheme general for a wide class of musical instruments, the nonlinear part of the tone generator is modeled by a neural network. The system learns its parameters and the shape of the nonlinearity by means of nonlinear identification procedures based on waveform or spectral matching.
Two possible applications of this model are discussed: sound compression, obtained by treating the system as a nonlinear predictor, and sound synthesis, obtained by adding control inputs to the network and training the system to respond as desired.
** Title: A Singing Voice Synthesis System Based on Sinusoidal Modeling Authors: Michael Macon, Oregon Graduate Institute; Leslie Jensen-Link, Momentum Data Systems; James Oliverio, Georgia Institute of Technology; Mark A. Clements, Georgia Institute of Technology; E. Bryan George, Texas Instruments, Dallas Volume: 1, Page: 435 Abstract: Although sinusoidal models have been shown to be capable of high-quality musical instrument synthesis, speech modification, and speech synthesis, little work has explored their application to the synthesis of singing voice. In this paper, we propose a system framework similar to that employed in concatenation-based text-to-speech synthesizers, and describe its extension to the synthesis of singing voice. The power and flexibility of the sinusoidal model used in the waveform synthesis portion of the system enable high-quality, computationally efficient synthesis and the incorporation of musical qualities such as vibrato and spectral tilt variation. Segmental phonetic characteristics are modeled with a "unit selection" procedure that selects sinusoidally modeled segments from an inventory of singing voice data collected from a human vocalist. The system, called LYRICOS, is capable of synthesizing very natural-sounding singing that maintains the characteristics and perceived identity of the analyzed vocalist.
** Title: Time-Scale Modification of Audio Signals with Combined Harmonic and Wavelet Representations Authors: Khaled N. Hamdy, University of Minnesota; Ahmed H. Tewfik, University of Minnesota; Satoshi Takagi, Sony Corporation; Ting Chen, Stanford University Volume: 1, Page: 439 Abstract: We propose a new time-scale modification method for high-quality audio signals that strives to preserve pitch and timbre. The signal is represented as the sum of sinusoidal components and a residual (edges and noise), with the decomposition computed via a combined harmonic and wavelet representation. The harmonic and residual components are time-scaled separately. The harmonic portion is time-scaled by demodulating each harmonic component to DC, interpolating and decimating the DC signal, and remodulating each component back to its original frequency. The residual portion is time-scaled by preserving edges and the relative distances between them while time-scaling the stationary (noise) components between the edges.
** Title: A Waveguide Model for Slapbass Synthesis Authors: Erhard Rank, Vienna University of Technology; Gernot Kubin, Vienna University of Technology Volume: 1, Page: 443 Abstract: Starting from the waveguide model for plucked strings, a new digital signal processing model for the slapping technique on electric bass guitars is derived. The model includes amplitude limitations for the string at the frets and/or the fingerboard. These highly nonlinear elements are realized by conditional reflections that depend on the local string displacement. A model of the string dynamics for the two slapbass techniques (knocking the string with the thumb knuckle, and plucking very strongly with the index or middle finger) has been implemented as both MATLAB and C simulations, and it synthesizes sounds close to the natural instrument.
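The harmonic step in the Hamdy, Tewfik, Takagi and Chen entry above (demodulate to DC, resample the DC signal, remodulate) can be illustrated for a single harmonic of known frequency. This is a minimal sketch, not the authors' implementation: the one-period moving-average lowpass, the linear interpolation, and the helper names `demodulate`, `interpolate` and `time_scale_harmonic` are our own assumptions, and the harmonic frequency is assumed to be 2πk/period for an integer k with 0 < 2k < period.

```python
import cmath
import math


def demodulate(x, omega, period):
    """Shift the harmonic at `omega` to DC and lowpass it (complex envelope).

    Assumes omega = 2*pi*k/period with 0 < 2k < period, so averaging over
    exactly `period` samples cancels the image left at -2*omega.
    """
    n_total = len(x)
    shifted = [2.0 * x[n] * cmath.exp(-1j * omega * n) for n in range(n_total)]
    envelope = []
    for n in range(n_total):
        # Keep the averaging window fully inside the signal near the edges.
        start = min(max(n - period // 2, 0), n_total - period)
        envelope.append(sum(shifted[start:start + period]) / period)
    return envelope


def interpolate(seq, t):
    """Linear interpolation of a (complex) sequence at fractional index t."""
    i = min(int(math.floor(t)), len(seq) - 2)
    frac = t - i
    return seq[i] * (1.0 - frac) + seq[i + 1] * frac


def time_scale_harmonic(x, omega, period, alpha):
    """Stretch (alpha > 1) or compress (alpha < 1) one harmonic, keeping its pitch."""
    envelope = demodulate(x, omega, period)
    n_out = int(round(alpha * len(x)))
    out = []
    for m in range(n_out):
        t = min(m / alpha, len(x) - 1)    # output time mapped to input time
        env = interpolate(envelope, t)    # resample the DC (baseband) signal
        out.append((env * cmath.exp(1j * omega * m)).real)  # remodulate at omega
    return out
```

Because only the slowly varying envelope is resampled, the carrier frequency `omega` is untouched, which is why pitch is preserved while duration changes by `alpha`.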
** Title: Minimum Perceptual Spectral Distance FIR Filter Design Authors: Shao-Po Wu, Stanford University; William Putnam, Stanford University Volume: 1, Page: 447 Abstract: This paper addresses the problem of designing finite impulse response filters that optimally approximate desired frequency responses in the sense that they minimize a perceptual audio spectral measure. This measure is based on a simplified auditory model similar to those used in perceptual audio quality measurement. It is shown that this problem can be cast as a logarithmic Chebyshev approximation problem, which can be solved efficiently using recent interior-point methods.
** Title: A Phase Interpolation Algorithm for Sinusoidal Model Based Music Synthesis Authors: Xiaoshu Qian, URI; Yinong Ding, TI Volume: 1, Page: 451 Abstract: This paper presents a least-squares quadratic phase interpolation algorithm for sinusoidal-model-based music synthesis. The algorithm uses two additions, with one parameter per data frame, to generate the phase samples of a component sine wave. Compared with the cubic phase interpolation algorithm proposed by McAulay and Quatieri, the proposed algorithm is more efficient in terms of computational complexity and parameter storage. It also produces smoother frequency tracks. Unlike the existing quadratic phase interpolation algorithm, in which the phase measurements are ignored entirely ("magnitude-only"), the proposed algorithm interpolates phase in a least-squares sense from both the phase and the frequency measurements at data frame boundaries. The resulting phase samples are therefore approximately "locked" to the measured ones. Informal listening tests on various musical instrument tones indicate that the proposed algorithm clearly outperforms the magnitude-only synthesis approach and is qualitatively comparable to the cubic one.
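To make the "two additions, one parameter per data frame" idea concrete, here is a hypothetical least-squares formulation in the spirit of the Qian and Ding entry above; the paper's actual cost function may differ. The per-frame parameter is a chirp rate `c`. The measured end-of-frame phase is unwrapped with an average-frequency prediction (an assumption loosely following McAulay and Quatieri), and `c` is then fitted in a weighted least-squares sense to both the measured end frequency and the unwrapped end phase. The names `fit_chirp_rate`, `synthesize_phase` and `weight` are illustrative only.

```python
import math


def fit_chirp_rate(phi0, w0, phi1, w1, n, weight=1.0):
    """Fit the per-frame parameter c (frequency increment per sample).

    Hypothetical model: w[k+1] = w[k] + c and theta[k+1] = theta[k] + w[k],
    so theta[n] = phi0 + n*w0 + c*n*(n-1)/2 and w[n] = w0 + c*n.
    Phases are in radians, frequencies in radians/sample.
    """
    # Unwrap the measured end phase using the average-frequency prediction.
    predicted = phi0 + n * (w0 + w1) / 2.0
    cycles = round((predicted - phi1) / (2.0 * math.pi))
    target = phi1 + 2.0 * math.pi * cycles
    # Two linear equations in c: end-frequency match and end-phase match.
    a_freq, b_freq = float(n), w1 - w0
    a_phase, b_phase = n * (n - 1) / 2.0, target - phi0 - n * w0
    num = a_freq * b_freq + weight * a_phase * b_phase
    den = a_freq * a_freq + weight * a_phase * a_phase
    return num / den  # weighted least-squares solution for c


def synthesize_phase(phi0, w0, c, n):
    """Generate n phase samples using two additions per sample."""
    theta, w, out = phi0, w0, []
    for _ in range(n):
        out.append(theta)
        theta += w  # addition 1: phase accumulator
        w += c      # addition 2: frequency ramp
    return out
```

In this sketch, a large `weight` pulls the synthesized phase toward the measured end phase (the "locking" behavior), while `weight` near zero reduces the fit to frequency matching alone, which is roughly what a magnitude-only scheme does.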
** Title: Analytical Approximations of Fractional Delays: Lagrange Interpolators and Allpass Filters Authors: Stephan Tassart, IRCAM; Philippe Depalle, IRCAM Volume: 1, Page: 455 Abstract: In this paper we propose a new point of view that unifies two well-known filter families for approximating ideal fractional delay filters: Lagrange Interpolator Filters (LIF) and Thiran Allpass Filters. We achieve this unification by approximating the ideal Fourier transform of the fractional delay with two different Padé approximations, namely series expansions and continued fraction expansions, and by proving that each approximation corresponds exactly to either the LIF family or the allpass delay filter family. This leads to an efficient modular implementation of LIFs.
** Title: Improved discrete-time modeling of multi-dimensional wave propagation using the interpolated digital waveguide mesh Authors: Lauri Savioja, Helsinki University of Technology; Vesa Valimaki, Helsinki University of Technology Volume: 1, Page: 459 Abstract: The digital waveguide mesh is an extension of the one-dimensional digital waveguide technique. Waveguide meshes are used to simulate two- and three-dimensional wave propagation in musical instruments and acoustic spaces. The original waveguide mesh algorithm suffers from direction-dependent dispersion. In this paper we show that this problem can be reduced by using an interpolated rectilinear mesh. In the analysis, we give the analytical solution for the wave propagation speed, together with numerical simulations of the magnitude response and phase speed in both the original and the interpolated two-dimensional waveguide mesh algorithms. We demonstrate by simulation that the wave propagation characteristics of the proposed interpolated waveguide mesh are independent of direction, so the remaining errors caused by dispersion can be corrected with a postprocessor.
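The two filter families unified in the Tassart and Depalle entry above have well-known closed-form coefficients. The sketch below simply evaluates those textbook formulas; it does not reproduce the paper's Padé-based derivation or its modular implementation, and the function names `lagrange_fd` and `thiran_allpass` are our own.

```python
import math


def lagrange_fd(delay, order):
    """FIR taps h[0..order] of a Lagrange fractional-delay interpolator.

    h[k] = prod_{i != k} (delay - i) / (k - i)
    """
    taps = []
    for k in range(order + 1):
        h = 1.0
        for i in range(order + 1):
            if i != k:
                h *= (delay - i) / (k - i)
        taps.append(h)
    return taps


def thiran_allpass(delay, order):
    """Denominator a[0..order] (a[0] = 1) of a Thiran allpass fractional delay.

    a[k] = (-1)^k C(order, k) prod_{n=0}^{order} (delay-order+n) / (delay-order+k+n)
    The numerator is the reversed denominator, which makes the filter allpass.
    The delay is assumed to lie near `order` so no factor in the product is zero.
    """
    coeffs = []
    for k in range(order + 1):
        a = (-1) ** k * math.comb(order, k)
        for n in range(order + 1):
            a *= (delay - order + n) / (delay - order + k + n)
        coeffs.append(a)
    return coeffs
```

As a quick check on both formulas: a Lagrange filter of order N reproduces polynomials of degree at most N exactly, and for order 1 the Thiran coefficient reduces to the familiar first-order allpass value (1 - D)/(1 + D).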
** Title: Generalized Likelihood Ratio Test for Selecting a Geo-acoustic Environmental Model Authors: Christoph F. Mecklenbrauker, RUB; Peter Gerstoft, SACLANTCEN; Pei-Jung Chung, COMNETS; Johann F. Bohme, RUB Volume: 1, Page: 463 Abstract: A generalized likelihood ratio test is considered for testing acoustic environmental models, with application to parameter inversion using an acoustic propagation code. We use the term "hierarchy of models" to denote a sequence of model structures $M_1, M_2, \ldots$ in which each model structure $M_n$ contains all previous ones as special cases. We propose a combined parameter estimation and multiple sequential test for simultaneously determining the model order and its parameters: given the observed data, how many parameters should be included in the model? This question is central to the order selection problem in hierarchies of models with an increasing number of parameters, where the observations are corrupted by additive noise. Monte Carlo simulations show the behaviour of the sequential test for selecting a model order as a function of the SNR. Finally, the test is applied to broadband data measured using a vertical array near the island of Elba in the Mediterranean Sea.
** Title: Tuning Genetic Algorithms for Underwater Acoustics Using a priori Statistical Information Authors: Maria Joao Rendas, I3S/CNRS; Georges Bienvenu, Thomson Marconi Sonar Volume: 1, Page: 467 Abstract: In this paper we present a new technique for the evaluation/selection procedures of genetic algorithms, to be used in parameter estimation problems. The proposed algorithm uses a priori information about the structure of the surface whose extremum is being sought.
For parameter estimation problems, the availability at each iteration of a genetic algorithm of a collection of samples of the problem's ambiguity surface makes it possible to compute the correlation between the observed ambiguity surface (at the sampled points) and the predicted one. Using this information allows early detection of secondary extrema (which yield an ambiguity surface that does not correlate well with the observed one) and thus helps speed the convergence of the algorithm to the global optimum. The paper applies the proposed technique to a source localization problem.
** Title: Robust Beamformer Design for Broadband Matched-Field Processing Authors: Kerem Harmanci, Duke University; Jeffrey L. Krolik, Duke University Volume: 1, Page: 471 Abstract: Matched-field beamforming has been proposed for localizing wideband acoustic sources in uncertain underwater channels. While adaptive matched-field beamforming provides adequate sidelobe suppression for stronger sources, at low signal-to-noise ratios it converges to its quiescent response, in this case the Bartlett beamformer, which has unacceptably high sidelobe levels. In this paper, a design method is presented for reducing matched-field non-adaptive beamformer sidelobe levels given a sufficiently large observation time-bandwidth product. The proposed alpha-beamformer incoherently averages narrowband matched-field beamformer output power over the signal band, after a trade-off has been made at each frequency to achieve better sidelobe suppression at the expense of some reduction in gain against diffuse noise. Simulations and results with Mediterranean vertical array data indicate that the wideband alpha-beamformer can provide improved sidelobe suppression compared with conventional techniques.
** Title: FASTMAP: A Fast, Approximate Maximum A Posteriori Probability Parameter Estimator with Application to Robust Matched-Field Processing Authors: Brian F. Harrison, NUWCDIVNPT; Richard J. Vaccaro, University of Rhode Island; Donald W. Tufts, University of Rhode Island Volume: 1, Page: 475 Abstract: In many estimation problems, the set of unknown parameters can be divided into a subset of desired parameters and a subset of nuisance parameters. In a maximum a posteriori (MAP) approach to parameter estimation, these nuisance parameters are integrated out, which can result in an extremely computationally intensive estimator. This paper proposes a method by which the computationally intensive integrations over the nuisance parameters required in Bayesian estimation may be avoided under certain conditions. The proposed method is an approximate MAP estimator that is much more computationally efficient than direct, or even Monte Carlo, integration of the joint posterior distribution of the desired and nuisance parameters. As an example of its efficiency, we apply the fast algorithm to matched-field source localization in an uncertain environment.
** Title: Electromagnetic Matched Field Processing for Source Localization Authors: Donald F. Gingras, Naval Command, Control and Ocean Surveillance Center; Peter Gerstoft, SACLANT; Neil L. Gerr, Office of Naval Research; Christoph F. Mecklenbrauker, Vienna University of Technology Volume: 1, Page: 479 Abstract: Matched field processing (MFP) refers to signal and array processing techniques that use complex-valued (amplitude and phase) field predictions for propagating signals rather than a plane-wave arrival model. MFP has been successfully applied in ocean acoustics. This paper describes the extension of MFP to the electromagnetic domain, i.e., electromagnetic MFP (EM-MFP).
Simulations of EM-MFP in the tropospheric setting suggest that, under suitable conditions, EM-MFP methods can enable EM sources both to be detected/localized and to be used as sources of opportunity for estimating the environmental parameters that determine EM propagation.
** Title: Power-Law Processors for Detecting Unknown Signals in Colored Noise Authors: Ivars P. Kirsteins, NUWCDIVNPT; Sanjay K. Mehta, NUWCDIVNPT; John Fay, NUWCDIVNPT Volume: 1, Page: 483 Abstract: We propose a new non-parametric adaptive detector for detecting an unknown broadband signal in interference consisting of non-stationary narrowband components and a locally stationary broadband component. An important feature of this detector is that it needs no prior information about the signal or interference. The proposed detector integrates Nuttall's non-parametric power-law detector with robust narrowband interference removal and whitening based on multiple-taper spectral estimation. Experimental results indicate that the proposed detector outperforms conventional detectors.
** Title: Multitarget detection/tracking of echoes with known waveform: algorithm and applications Authors: Vittorio Rampa, C.S.T.S. - C.N.R.; Umberto Spagnolini, Politecnico di Milano Volume: 1, Page: 487 Abstract: The Time of Delay (TOD) estimation of multiple echoes is solved here with an iterative multitarget detection/tracking algorithm. The TODs are evaluated from their a posteriori probability, while a first-order Markov model is used to estimate the a priori probability. The effectiveness of the algorithm (low false-alarm rate and robustness) is also demonstrated experimentally. Moreover, the algorithm exhibits better noise rejection and improved target resolution compared with algorithms that perform detection and tracking separately.
** Title: Detection of Gaussian Bandpass Transients Under Impulsive Noise: A Wavelet Transform Approach Authors: Francisco M. Garcia, ISR - IST; Isabel M.G. Lourtie, ISR - IST Volume: 1, Page: 491 Abstract: In underwater acoustics, modeling impulsive ambient noise with symmetric alpha-stable laws is motivated by the generalized central limit theorem. However, detecting stochastic signals in such additive noise is difficult to implement, owing to the lack of a closed-form expression for the a posteriori probability density function. In this paper, we present a suboptimal detector for Gaussian bandpass transients in impulsive noise that uses a nonlinear, memoryless prefilter followed by a discrete wavelet transform. The resulting signals exhibit Gaussian-like behavior, and the decision is made by comparing a quadratic likelihood ratio with a threshold. The nonlinearity parameter is tuned either by examining the receiver operating characteristic or by using the Chernoff distance, which yields only an approximate solution but is easier to compute. Results are presented from Monte Carlo simulations.
** Title: Maximum Likelihood Estimator for Magneto-Acoustic Localisation Authors: Gilles Dassot, LETI CEA/Grenoble; Roland Blanpain, LETI CEA/Grenoble; Claude Jauffret, GESSY, University Toulon et Var Volume: 1, Page: 495 Abstract: This paper is devoted to the localization of magneto-acoustic sources moving in a straight line at constant speed. Our technique is based on associating narrowband acoustic signals with magnetostatic measurements. First, we describe the features that make it possible to associate magnetic and acoustic data; second, we show that positioning accuracy is much improved by this association. We focus on solving the problem with as few sensors as possible. A geometric discussion of identifiability is given, together with a batch maximum likelihood estimator whose covariance matrix asymptotically achieves the Cramer-Rao lower bound (CRLB).
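The detector structure in the Garcia and Lourtie entry above (memoryless nonlinearity, then a discrete wavelet transform, then a quadratic statistic compared with a threshold) can be outlined as follows. This is only a structural sketch under stated assumptions: a hard clipper with level `clip_level` stands in for the paper's nonlinearity, a Haar wavelet is used for the DWT, and the quadratic likelihood ratio is replaced by a plain energy statistic on one chosen detail band. The `threshold` would have to be set from the receiver operating characteristic or a similar criterion; all function names are hypothetical.

```python
import math


def clip(x, clip_level):
    """Memoryless prefilter: limit each sample to [-clip_level, clip_level] (assumed)."""
    return [max(-clip_level, min(clip_level, v)) for v in x]


def haar_dwt(x, n_levels):
    """Orthonormal Haar DWT; returns (approximation, [detail_1, ..., detail_n])."""
    approx, details = list(x), []
    for _ in range(n_levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop an odd trailing sample for simplicity
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
    return approx, details


def detect(x, clip_level, band, n_levels, threshold):
    """Energy in detail band `band` (1-based) of the clipped signal vs. a threshold."""
    _, details = haar_dwt(clip(x, clip_level), n_levels)
    statistic = sum(d * d for d in details[band - 1])
    return statistic > threshold, statistic
```

The clipper bounds the contribution of any single impulsive sample to the statistic, which is the motivation for placing a memoryless nonlinearity ahead of the transform.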
** Title: Barankin Bound for Source Localization in Shallow Water Authors: Joseph Tabrikian, Duke University; Jeffrey L. Krolik, Duke University Volume: 1, Page: 499 Abstract: Matched-field methods are known to have a severe ambiguity problem. At low signal-to-noise ratios (SNRs), where the estimator cannot distinguish the ambiguity-function peak near the true source location from ambiguous ones, its mean square error deviates radically from the Cramer-Rao lower bound (CRLB). In this paper, the Barankin bound is derived for the source localization problem in an uncertain shallow-water environment. In particular, a method for selecting the test points used to evaluate the bound is presented. The bound is evaluated using a "general mismatch" benchmark scenario. The results predict the threshold SNR below which performance degrades dramatically. Channel uncertainties in the benchmark scenario are shown to increase this threshold SNR by as much as 3 dB.
** Title: Underwater transient signal processing: marine mammal identification, localization, and source signal deconvolution Authors: Zoi-Heleni Michalopoulou, CAMS, NJIT Volume: 1, Page: 503 Abstract: Processing marine-mammal signals for species classification and for monitoring endangered marine mammals are problems that have recently attracted attention in the scientific literature. For classification, methods appropriate for non-stationary signals, such as time-frequency and time-scale analysis, have been proposed. This paper shows that a factor that can significantly affect the results of marine-mammal signal processing is the impulse response of the ocean in which the signals propagate. The ocean is a dispersive propagation medium and therefore alters the time-frequency characteristics of a propagating acoustic signal. Because of this distortion, feature selection should be performed after the oceanic impulse response has been deconvolved from the recorded signals.
The paper also discusses localization of vocalizing marine mammals using matched-field processing and shows how this becomes part of the deconvolution process.
** Title: Numerical Optimization of Non-adaptive Microphone Arrays Authors: Alexander Goldin, IBM Israel S&T Volume: 1, Page: 507 Abstract: The paper describes an application of numerical optimization methods to the design of non-adaptive multi-sensor arrays. The parameters and geometry of such arrays do not change with the input signals and must be chosen in advance. In general, the goal of a non-adaptive multi-sensor array can be expressed numerically through its pattern function, which gives the gain for a signal arriving from a particular direction in space. The actual pattern function depends on the geometry of the array and on the processing applied to the signal from each sensor. The array pattern function is non-linear and frequency dependent. The geometry and processing parameters of the multi-sensor array are optimized to minimize the difference between the goal and actual pattern functions over a specified frequency range. Optimization results for several goal functions for multi-microphone arrays are provided and discussed.
** Title: Joint Direction-of-Arrival and Array Shape Tracking for Multiple Moving Targets Authors: Jason Goldberg, Tel Aviv University; Ana Perez-Neira, UPC; Miguel Lagunas, UPC Volume: 1, Page: 511 Abstract: An algorithm for the joint tracking of source DOAs and sensor positions is presented to address the problem of DOA tracking in the presence of sensor motion. Initial maximum likelihood estimates of source DOAs and sensor positions are refined by Kalman filtering. Spatio-temporally correlated array movement is considered. Source angle dynamics are used to achieve correct data association.
The new technique performs well in the difficult cases of sources that cross in angle, fully coherent sources, and sources of identical or vastly different (possibly time-varying) power. Computer simulations show that the approach is robust to array motion modeling uncertainty and effectively reduces dependence on expensive and possibly unreliable hardware.
** Title: Comparison of Probabilistic Least Squares and Probabilistic Multi-Hypothesis Tracking Algorithms for Multi-Sensor Tracking Authors: Mark L. Krieg, DSTO, CSSIP, University of Adelaide; Douglas A. Gray, University of Adelaide, CSSIP Volume: 1, Page: 515 Abstract: A key element of successful tracking is knowing which target each measurement originates from. These measurement-to-target associations are generally unavailable, so the tracking problem becomes one of estimating both the assignments and the target states. We present the Probabilistic Least Squares Tracking (msPLST) algorithm for estimating the measurement-to-target assignments and the track trajectories of multiple targets using measurements from multiple sensors. This approach differs from that used in Probabilistic Multi-Hypothesis Tracking (PMHT), although both algorithms employ the concept of an extended observer containing both the target states and the measurement-to-target assignments. The two algorithms are compared, and their performance is evaluated using simulated data.
** Title: Direction Finding with Imperfect Wavefront Coherence: A Matrix Fitting Approach Using Genetic Algorithm Authors: Alex B. Gershman, Ruhr University Bochum; Christoph F. Mecklenbrauker, Ruhr University Bochum; Johann F. Bohme, Ruhr University Bochum Volume: 1, Page: 519 Abstract: The performance of high-resolution direction finding methods degrades in several practical situations where the wavefronts have imperfect spatial coherence.
The original solution to this problem was proposed by Paulraj and Kailath, but their technique requires a priori knowledge of the matrix characterizing the loss of wavefront coherence along the array aperture. Here, a novel solution is proposed that does not require a priori knowledge of the spatial coherence matrix. Our technique is based on multidimensional minimization of an appropriate concentrated cost function using a Genetic Algorithm (GA).
** Title: Design Of An Optimum Wideband Active Sonar Array With Robustness Authors: Saman S. Abeysekera, Curtin University of Technology; Y.H. Leung, Curtin University of Technology Volume: 1, Page: 523 Abstract: The use of wideband active sonar array processing to estimate the range, velocity and bearing of a target has recently received much interest in the literature. Although increased attention has been focused on wideband correlation processing for estimating range and velocity, array directivity patterns are almost always computed and interpreted under the narrowband signal assumption. This paper considers the target bearing estimation problem using the wideband correlation approach. It is shown how, via this approach, an optimum set of array weights can be selected for a known transmitted signal. The optimization procedure also provides robustness against errors in the array structure.
** Title: Multipath time-delay estimation Authors: Jean Jacques Fuchs, IRISA Volume: 1, Page: 527 Abstract: A known transmitted signal is observed at the receiver through more than one path in additive noise. The problem is to estimate the number of paths and, for each path, the associated attenuation and delay. This problem arises frequently in sonar, radar and geophysics. We propose an algorithm that is easy to implement, has a reasonable computational load, and seems able to solve the problem under more severe conditions (lower SNR) than previous methods.
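The multipath problem stated in the Fuchs entry above (unknown number of paths, each with its own attenuation and delay) can be illustrated with a simple greedy baseline: correlate the received signal with the known transmitted signal, take the strongest peak as a path, subtract its scaled and delayed replica, and repeat until the remaining peak is too weak. This matching-pursuit style procedure is swapped in for illustration only; it is not the algorithm proposed in the paper, it handles integer-sample delays only, and `estimate_paths`, `min_gain` and `max_paths` are hypothetical names.

```python
def estimate_paths(received, signal, min_gain=0.1, max_paths=10):
    """Greedy multipath estimate: returns a list of (delay, gain) pairs.

    Integer delays only; assumes len(received) >= len(signal).
    """
    residual = list(received)
    energy = sum(s * s for s in signal)
    lags = range(len(received) - len(signal) + 1)
    paths = []
    for _ in range(max_paths):
        # Matched filter: correlate the residual with the known signal at each lag.
        corr = [sum(residual[lag + k] * s for k, s in enumerate(signal)) for lag in lags]
        best = max(lags, key=lambda lag: abs(corr[lag]))
        gain = corr[best] / energy
        if abs(gain) < min_gain:
            break  # remaining peak too weak: stop adding paths
        paths.append((best, gain))
        # Cancel the detected path before searching for the next one.
        for k, s in enumerate(signal):
            residual[best + k] -= gain * s
    return paths
```

The number of paths is decided implicitly by `min_gain`; with closely spaced paths or low SNR this greedy scheme degrades quickly, which is the regime the paper's method targets.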
** Title: Fast Maximum Likelihood Estimation With Multiple Signal Initialization Authors: Robert B. MacLeod, NUWC; Richard J. Vaccaro, University of Rhode Island Volume: 1, Page: 531 Abstract: In this paper we are concerned with processing acoustic signals resulting from active transmissions by high-frequency sonar systems. These signals consist of structured interference related to propagation effects in the medium, reflections from targets, and measurement noise. The methods herein model these signals as replicas of the transmitted signal that are scaled in amplitude and time, and delayed. Furthermore, we are interested in signals with "simple" time-frequency profiles, such as linear frequency modulated (LFM) or hyperbolic frequency modulated (HFM) signals. These signals have the underlying property that the principal ridge of the auto-ambiguity function crosses the midpoint of the time-frequency plane smoothly, with a simple relationship between time delay and time scaling (frequency shifting). This paper describes a method for estimating the delay and time scale of signal components using fast maximum likelihood, while preserving the high-resolution property of related time delay estimation techniques.
** Title: An Algorithm for Detecting Closely Spaced Delay/Doppler Components Authors: Amir W. Habboosh, NUWCDIVNPT; Richard J. Vaccaro, University of Rhode Island; Steven M. Kay, University of Rhode Island Volume: 1, Page: 535 Abstract: This paper considers a method for estimating the time delays, amplitudes, and Doppler scales of a multipath signal. The method extends work previously reported by Manickam and Vaccaro, which dealt solely with time delays and amplitudes and was extended by Habboosh and Vaccaro to include Doppler scale. In this paper, an algorithm is presented for determining the size of the indicator set so as to reduce ill-conditioning of the signal subspace matrix.
Simulation results are shown and compared with the Cramer-Rao lower bound; they show that significant reductions in estimate variance can be achieved using the deconvolution approach with a properly selected indicator set.
** Title: Improvement of TDOA Measurement using Wavelet Denoising with a Novel Thresholding Technique Authors: Shi Quan Wu, The Chinese University of Hong Kong; Hing Cheung So, City University of Hong Kong; Pak Chung Ching, The Chinese University of Hong Kong Volume: 1, Page: 539 Abstract: In this paper, wavelet denoising is applied to time delay estimation between signals received at two spatially separated sensors in the presence of noise. Prior to cross correlation, each of the sensor outputs is denoised according to a n