Introduction
Here I will add some software produced during my research. I program mainly in two languages: I use Matlab for the initial development and research stages, and then port my results to C++ with the help of an auxiliary library I created for that purpose. For this reason, some of the software on this page is released in Matlab and the rest in C++.
License
All the software I release is provided as open-source code that is free to use. I believe in the principle of reproducible research, which requires that the results reported in a paper can be reproduced by any other researcher. This is why I produce open-source releases of my methods for other researchers to use. Note that the main purpose of releasing this software is to provide the research community with standard implementations of the methods, allowing a fair and direct comparison.
I also want to emphasize that all the software available on this page is provided as is, without ANY kind of warranty, and the author does not take ANY responsibility for the results of its usage.
If you like this software, I will consider it payment enough if you let me know your opinion and tell me in which application it was used successfully.
I will also appreciate bug reports. I am fairly sure the software contains many bugs, because up to now I have been its only user and have only used it in very restricted applications and examples, so please use it with care.
Downloads
SPeech Processing library (SPP)
This library is written in C++, although it is basically C with some flavour of C++ (for example, operator and function overloading). I made almost no use of the classes and templates provided by C++, for speed, compatibility and portability reasons. The main purpose of this library is to provide a generic set of functions to manipulate speech signals. As signals are usually represented as vectors, and the processing applied to them can be expressed in terms of matrix-vector operations, the library is built around a set of matrix and vector types, with both real and complex values, plus basic operations and functions to modify and operate on them.
On top of these basic building blocks, many high-level processing functions, such as filtering, transforms and analysis, are built. The library is fully documented using Doxygen and provides more than 700 functions.
I have compiled it successfully on several computers under Linux (Fedora 6, 7, 8, 9; Ubuntu 6, 7, 8), on both 32-bit and 64-bit machines, and also under Windows XP using MinGW and MSYS, MS Visual C++ 5, MS Visual C++ 2005 Express Edition and Borland C++ Compiler 5.5 (for the last three, some minor changes and option configurations were needed).
Most of the library was written by me, Leandro Di Persia. In some specific functions I have used code from other people, usually adapting it to the data types of this library and correcting minor bugs, and sometimes porting it from its original language. In those cases I have left an acknowledgement to the authors in the specific files. The file AUTHORS also describes the authorship of all the code.
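The "C with some flavour of C++" style described above can be pictured with a minimal sketch. The type `RealVec` and the functions `vec_create` and `vec_free` are hypothetical names chosen for illustration; they are not taken from SPP, whose actual types and signatures are described in its Doxygen documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical plain-struct vector type (an assumption, NOT the SPP type).
struct RealVec {
    std::size_t n;  // number of samples
    double *data;   // heap-allocated sample buffer
};

// Allocate a zero-initialised vector of n samples.
RealVec vec_create(std::size_t n) {
    RealVec v;
    v.n = n;
    v.data = static_cast<double *>(std::calloc(n, sizeof(double)));
    return v;
}

// Release the buffer; the struct is left empty.
void vec_free(RealVec &v) {
    std::free(v.data);
    v.data = nullptr;
    v.n = 0;
}

// Operator overloading: element-wise sum. The caller owns the result.
RealVec operator+(const RealVec &a, const RealVec &b) {
    assert(a.n == b.n);
    RealVec r = vec_create(a.n);
    for (std::size_t i = 0; i < a.n; ++i) r.data[i] = a.data[i] + b.data[i];
    return r;
}

// Operator overloading: scaling by a real scalar. The caller owns the result.
RealVec operator*(double k, const RealVec &a) {
    RealVec r = vec_create(a.n);
    for (std::size_t i = 0; i < a.n; ++i) r.data[i] = k * a.data[i];
    return r;
}
```

With such a type, an expression like `a + b` reads like the mathematics, while the data layout stays as simple as in C. The price is manual memory management: every result has to be released with `vec_free`.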
Requirements
There are no specific requirements. The library does not use any particularly complicated language features, nor any external library. It is completely self-contained.
Installation
The usual Linux build structure using a Makefile is provided. For other environments (MS Visual C++ under Windows, for example), a project has to be created.
The file INSTALL has instructions for building and installing the library. Note that only a static version of the library is provided.
Download
Source: tar.gz
Trim
This is a small program that I sometimes use to trim the ends of a speech signal. When a recording is made, a rather long silence sometimes appears at both ends of the sentence. This can be a drawback in some research areas. For example, if one wants to artificially contaminate the signal with noise at a certain SNR, the noise will be active the whole time, but the speech will be active only during a shorter period. This yields an SNR during the speech part that is different from the desired one (higher, because the speech power is concentrated in that part).
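The size of this effect can be checked with a small sketch. It assumes, for illustration only, that the silent parts are exactly zero and that the noise has constant power; the function names are hypothetical and are not part of Trim or SPP.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean power of x over the sample range [first, last).
double mean_power(const std::vector<double> &x, std::size_t first, std::size_t last) {
    assert(first < last && last <= x.size());
    double acc = 0.0;
    for (std::size_t i = first; i < last; ++i) acc += x[i] * x[i];
    return acc / static_cast<double>(last - first);
}

// SNR in dB of speech against noise, both measured over [first, last).
double snr_db(const std::vector<double> &speech, const std::vector<double> &noise,
              std::size_t first, std::size_t last) {
    return 10.0 * std::log10(mean_power(speech, first, last) /
                             mean_power(noise, first, last));
}
```

Under these assumptions, if the speech is active during a fraction p of the file, the SNR measured over the active part exceeds the SNR measured over the whole file by -10 log10(p) dB. For p = 0.5 that is about 3 dB.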
Note that this is not a Voice Activity Detector (VAD), and it cannot detect pauses inside the sentence: it scans the signal from the beginning and from the end, searching for enough energy, so it can only set two cut points.
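The scan from both ends can be sketched as follows. This is only an illustration of the idea, not the code of Trim: the frame-based power measure, the fixed threshold and the name `find_cut_points` are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Mean power of the frame x[start, start + len).
static double frame_power(const std::vector<double> &x, std::size_t start, std::size_t len) {
    double acc = 0.0;
    for (std::size_t i = start; i < start + len; ++i) acc += x[i] * x[i];
    return acc / static_cast<double>(len);
}

// Returns the cut points [first, last) in samples. A trailing partial frame
// is ignored. If no frame reaches the threshold, {0, 0} is returned.
std::pair<std::size_t, std::size_t> find_cut_points(const std::vector<double> &x,
                                                    std::size_t frame_len,
                                                    double threshold) {
    assert(frame_len > 0);
    const std::size_t n_frames = x.size() / frame_len;

    // Scan forward from the beginning until a frame has enough energy.
    std::size_t first = 0;
    while (first < n_frames && frame_power(x, first * frame_len, frame_len) < threshold)
        ++first;
    if (first == n_frames) return {0, 0};

    // Scan backward from the end; 'last' is one past the last active frame.
    std::size_t last = n_frames;
    while (last > first + 1 &&
           frame_power(x, (last - 1) * frame_len, frame_len) < threshold)
        --last;

    return {first * frame_len, last * frame_len};
}
```

Because the two scans stop at the first frame with enough energy, any pause between the cut points is kept untouched, which is exactly why this approach is not a VAD.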
Requirements
This program uses functions from the SPP library (see above). You need to have SPP installed on your system for this program to compile.
Installation
The file INSTALL contains detailed installation instructions.
Download
Source: tar.gz
Mixer
This program produces synthetic convolutive mixtures of 2 sources, as measured by 2 microphones. For this, it needs the Impulse Responses (IR) measured from each source to each microphone. The program lets you specify a desired power ratio. This power ratio is adjusted at the sources (that is, it simulates the output power relation of the sources). The final SNR of the mixtures will depend on the characteristics of the IRs.
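As a rough sketch of this kind of mixing, each microphone signal is the sum of the two sources, each convolved with the IR from that source to that microphone. The code below is not the Mixer source: the direct-form convolution, the convention of scaling the second source, and all names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Signal;

// Mean power of a signal.
double mean_power(const Signal &x) {
    assert(!x.empty());
    double acc = 0.0;
    for (double v : x) acc += v * v;
    return acc / static_cast<double>(x.size());
}

// Full linear convolution; the output has x.size() + h.size() - 1 samples.
Signal convolve(const Signal &x, const Signal &h) {
    assert(!x.empty() && !h.empty());
    Signal y(x.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t k = 0; k < h.size(); ++k)
            y[i + k] += x[i] * h[k];
    return y;
}

// Scale s2 in place so that 10*log10(P(s1) / P(s2)) equals ratio_db.
// The ratio is set at the sources, before any room effect is applied.
void set_power_ratio(const Signal &s1, Signal &s2, double ratio_db) {
    const double p1 = mean_power(s1);
    const double p2 = mean_power(s2);
    assert(p1 > 0.0 && p2 > 0.0);
    const double target_p2 = p1 / std::pow(10.0, ratio_db / 10.0);
    const double gain = std::sqrt(target_p2 / p2);
    for (double &v : s2) v *= gain;
}

// Signal at one microphone: h1 * s1 + h2 * s2 (here * denotes convolution),
// where h1 and h2 are the IRs from source 1 and source 2 to that microphone.
Signal mix_at_mic(const Signal &s1, const Signal &h1,
                  const Signal &s2, const Signal &h2) {
    const Signal a = convolve(s1, h1);
    const Signal b = convolve(s2, h2);
    Signal x(std::max(a.size(), b.size()), 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) x[i] += a[i];
    for (std::size_t i = 0; i < b.size(); ++i) x[i] += b[i];
    return x;
}
```

Calling `mix_at_mic` once per microphone, with the corresponding pair of IRs, gives the 2 mixtures. Since each IR applies its own gain and reverberation, the power ratio observed at a microphone generally differs from the one set at the sources.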
Requirements
This program uses functions from the SPP library (see above). You need to have SPP installed on your system for this program to compile.
Installation
The file INSTALL contains detailed installation instructions.
Download
Source: tar.gz
RevTime
This program measures the acoustical properties of a room from an impulse response (IR) measured in it. The measures are two reverberation times, the Early Decay Time (EDT), two clarity indexes and the Definition index. All these measures are calculated using the global IR, without filtering. In addition, the two reverberation times and the EDT are measured in octave bands according to the ISO standard. The reverberation time measurements use Schroeder reverse integration to smooth the energy decay curve, and a noise threshold is estimated from the last 20% of the samples of the IR.
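The Schroeder reverse integration mentioned above can be sketched as follows: the energy decay curve at sample n is the energy of the IR from n to its end, normalised to the total energy and expressed in dB. This is a minimal illustration, not the RevTime source. In particular, how RevTime turns the last 20% of the samples into a noise threshold is not stated here, so the mean tail power below is only an assumption, as are the function names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Schroeder backward integration: edc[n] = sum over k >= n of h[k]^2,
// normalised so that edc[0] is 0 dB. Trailing all-zero samples give -infinity.
std::vector<double> schroeder_edc_db(const std::vector<double> &h) {
    assert(!h.empty());
    std::vector<double> edc(h.size());
    double acc = 0.0;
    for (std::size_t n = h.size(); n-- > 0;) {  // accumulate from the end backwards
        acc += h[n] * h[n];
        edc[n] = acc;
    }
    const double total = edc[0];
    assert(total > 0.0);
    for (double &v : edc) v = 10.0 * std::log10(v / total);
    return edc;
}

// Mean power of the last 20% of the samples, used here as a simple
// noise-floor estimate (an assumption about how the tail is used).
double tail_noise_power(const std::vector<double> &h) {
    assert(!h.empty());
    std::size_t tail = h.size() / 5;
    if (tail == 0) tail = 1;
    double acc = 0.0;
    for (std::size_t i = h.size() - tail; i < h.size(); ++i) acc += h[i] * h[i];
    return acc / static_cast<double>(tail);
}
```

Because the integration runs backwards, the resulting curve never increases, which removes the fluctuations of the squared IR and makes it suitable for fitting a decay slope.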
Requirements
This program uses functions from the SPP library (see above). You need to have SPP installed on your system for this program to compile.
The program can also generate figures of the energy decay curves in .eps and .pdf formats. For this part, the Dislin library is used (see http://www.mps.mpg.de/dislin/). If you do not have this library installed, or you do not need the graphics, you need to edit the Makefile accordingly.
Installation
The file INSTALL contains detailed installation instructions.
Download
Source: tar.gz
FDBSS
This program is a demo for our frequency-domain Blind Source Separation method using the pseudoanechoic model, as presented in:
- L. Di Persia, D. Milone and M. Yanagida, "Indeterminacy free frequency-domain blind separation of reverberant audio sources", IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 2, pp. 299-311, 2009.
It is prepared to work on a 2-by-2 convolutive mixture. It uses a robust frequency-domain ICA approach to estimate the mixture parameters and then synthesizes the separation matrices for each frequency bin using this information. It also applies a Wiener-like postfilter to improve the separation by reducing the effect of echoes.
To test the method, some examples of real mixtures are given, with a small script to run the program on them.
Requirements
This program uses functions from the SPP library (see above). You need to have SPP installed on your system for this program to compile.
Installation
The file INSTALL contains detailed installation instructions.
Download
Source: tar.gz





