MUSIC RETRIEVAL SYSTEMS: ROBUST PERFORMANCE UNDER THE
EFFECT OF UNCERTAINTY
by
Erdem Unal
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Erdem Unal
Object Description
| Title | Music retrieval systems: robust performance under the effect of uncertainty |
| Author | Unal, Erdem |
| Author email | unal@usc.edu |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Electrical Engineering |
| School | Viterbi School of Engineering |
| Date defended/completed | 2008-05-13 |
| Date submitted | 2008 |
| Restricted until | Unrestricted |
| Date published | 2008-07-25 |
| Advisor (committee chair) | Narayanan, Shrikanth |
| Advisor (committee member) | Chew, Elaine; Georgiou, Panayiotis; Kuo, C.-C. Jay |
| Abstract | Music Information Retrieval (MIR) is gaining widespread attention and becoming increasingly important. The growing capacity of web servers parallels the explosion of information generated worldwide, and the need for efficient and natural access to these databases cannot be overstated. Digital music and its associated information are prime examples of such complex information: they can be stored in a variety of formats, such as MP3, MIDI, WAV, and scores, and they can be accessed in multiple ways. If the user knows the name of the song or the band, and the source material is annotated with metadata, retrieval can be straightforward. However, if the user does not know the lyrics, title, or performer, alternative retrieval methods are necessary, such as singing, humming, or playing a sample of the piece as the query to the database. Enabling such natural human interaction with large databases has thus become an essential component of effective and flexible MIR systems.; This thesis discusses two general domains for MIR systems: a) retrieval in monophonic music and b) retrieval in polyphonic music. For both domains, it investigates the sources and effects of the uncertainty present at the input level and at the system level, and it presents algorithms that solve the robust retrieval problem by combining music knowledge, signal processing techniques, and statistical analysis.; First, we discuss Query by Humming (QBH), a specific instance of music retrieval in the monophonic domain, where only one sound source is present at a time. Here, straightforward signal analysis of prosodic features such as pitch and energy can achieve accurate transcription from the audio domain to the symbol domain; however, the variability in the way people hum is not easy to handle with such straightforward algorithms. 
Since the transcription produced by the system front end is used by the query engine, robustness against user-dependent variability is important: the accuracy of the transcription directly affects the performance of the retrieval engine.; Our approach to achieving robust performance under uncertainty is statistical. We first discuss our experiments on collecting real-world humming data. Such data is essential for designing statistical systems; the goal is a collection that represents the general variability expected at the input of QBH systems. The data is also used to estimate important parameters of the segmentation front end and to test the retrieval performance of our QBH system. We analyzed the humming performance of different users against criteria such as musical background, the musical structure of the target melody, and familiarity with the melody. We also examined performance differences across the intervals being hummed, such as high versus low intervals and perfect versus augmented intervals. The final goal is to use the acquired statistical information as guidance in our retrieval calculations.; A Hidden Markov Model (HMM) based speech recognition system is used in the front end of our QBH system to segment the humming syllables that represent musical notes in the input audio. Accurate segmentation leads to an accurate representation of the audio in the symbol domain. We use the relative change in pitch and duration between consecutive notes to obtain a key- and rhythm-independent representation. From this two-dimensional transcription of pitch contour and duration ratios, we extract fixed-length characteristic fingerprints (FPs) at the rare pitch and duration movements, where the highest and lowest changes in the input occur. 
Our main assumption is that a subsequence of pitch contour and duration ratios is sufficient to represent a melody. These subsequences, the fingerprints, are mapped onto the database entries and compared to find similarities. Statistical measures are used to define and calculate the similarity distance between the extracted fingerprints and the database entries in order to achieve robust performance.; We also extended the MIR problem to the next level: retrieval in the polyphonic domain. Here, the retrieval task is matching between audio files that may contain an unlimited number of sound sources and may differ in expressive parameters and orchestration. In polyphonic music, since the number and identity of the instruments playing at a time are unknown, mapping the audio signal into an accurate note transcription is difficult; researchers applying various machine learning techniques have reported only around 55% note detection accuracy in the monotimbral domain. A mid-level representation whose performance is not affected by the differing spectral characteristics of different instruments must therefore be defined. We used a representation technique that maps short audio frames into a symbolic representation that tracks the general tonal movement, behaviour, and characteristics of the polyphonic audio. The selected representation is a string sequence of lexical chord labels, one label for each major and minor chord built on the twelve distinct pitches of the octave. The representation is obtained by mapping the audio spectrum of each frame onto the Spiral Array, a 3-D model of tonality that places specific tonal entities at specific coordinates. The Spiral Array is updated for the tonal labeling task to allow faster transcription. From this mapping, a decision is made as to which tonal cluster the audio frame belongs to, and the appropriate label is assigned. 
The transcription process continuously labels consecutive audio frames with the most appropriate label (the one closest to the tonal center).; For modeling, we used sequential statistical models, namely n-grams. An n-gram is a subsequence of n items from a given sequence. N-gram modeling can be applied to the tonal sequences transcribed from polyphonic audio to represent tonal movement statistically. This sequential representation is similar to those used in genetic analysis; instead of protein names, our sequential code contains chord names. N-grams are extracted from the tonal string sequences to create a statistical model for each polyphonic melody in our melody database. After appropriate smoothing, which is needed to compensate for different audio lengths, the smoothed n-gram models are stored in the melody database. For retrieval, a query symbol sequence is compared to each smoothed n-gram model in the database using perplexity-based scoring. Perplexity is derived from the cross-entropy between the query sequence and a smoothed n-gram model. Models that are close to the query are less surprised by the subsequences generated by the query, so their perplexity scores are lower; this score is used as the similarity metric in our retrieval calculations. |
| Keyword | music information retrieval; query by humming; query by example |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Type | texts |
| Legacy record ID | usctheses-m1415 |
| Rights | Unal, Erdem |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Unal-20080725 |
| Archival file | uscthesesreloadpub_Volume32/etd-Unal-20080725.pdf |
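The abstract notes that the monophonic front end can rely on straightforward prosodic features such as pitch and energy. A minimal sketch of per-frame feature extraction follows, assuming a plain autocorrelation pitch estimator and RMS energy; neither is stated to be the thesis's method, and the function names and the `fmin`/`fmax` search range are illustrative assumptions.

```python
import math
from typing import List, Optional


def frame_energy(frame: List[float]) -> float:
    """Root-mean-square energy of one analysis frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame: List[float], sample_rate: int,
                   fmin: float = 80.0, fmax: float = 500.0) -> Optional[float]:
    """Autocorrelation pitch estimate in Hz, or None for silent/unsearchable frames.

    Only lags that correspond to frequencies between fmin and fmax are searched
    (assumed range for hummed input); the best lag L gives f0 = sample_rate / L.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    if min_lag < 1 or min_lag > max_lag:
        return None

    def autocorr(lag: int) -> float:
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    scores = {lag: autocorr(lag) for lag in range(min_lag, max_lag + 1)}
    best_lag = max(scores, key=scores.get)
    if scores[best_lag] <= 0.0:  # no periodic energy at all (e.g. digital silence)
        return None
    return sample_rate / best_lag


def hz_to_midi(freq: float) -> float:
    """Fractional MIDI note number, with A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(freq / 440.0)
```

In practice a voicing decision, such as an energy threshold on `frame_energy`, would gate which frames contribute a pitch value to the transcription.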
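The abstract segments humming syllables with an HMM-based speech recognizer. As a much smaller stand-in, the sketch below runs Viterbi decoding on a two-state HMM (silence vs. hum) with Gaussian emissions over per-frame log-energy, then reads off note segments. The two-state topology, the dB means and deviations, and the transition probabilities are all hypothetical placeholders, not the thesis's models or parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def viterbi(obs: Sequence[float], means: Sequence[float], stds: Sequence[float],
            log_trans: Sequence[Sequence[float]], log_init: Sequence[float]) -> List[int]:
    """Most likely state sequence of a Gaussian-emission HMM, computed in the log domain."""
    n_states = len(means)
    delta = [log_init[s] + gaussian_logpdf(obs[0], means[s], stds[s]) for s in range(n_states)]
    backptr: List[List[int]] = []
    for x in obs[1:]:
        prev = delta
        step_ptr, delta = [], []
        for s in range(n_states):
            # Best predecessor state r for landing in state s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            step_ptr.append(best)
            delta.append(prev[best] + log_trans[best][s] + gaussian_logpdf(x, means[s], stds[s]))
        backptr.append(step_ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    return path[::-1]


def segments(path: List[int], target: int = 1) -> List[Tuple[int, int]]:
    """(start, end) frame ranges, end exclusive, during which the path stays in `target`."""
    out: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(path + [None]):
        if s == target and start is None:
            start = i
        elif s != target and start is not None:
            out.append((start, i))
            start = None
    return out
```

The transition penalty is what distinguishes this from simple thresholding: with a realistic recognizer and noisy energy tracks, it suppresses spurious one-frame flips between states.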
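The key- and rhythm-independent representation described in the abstract can be illustrated by converting absolute notes into the pitch interval and duration ratio between each pair of consecutive notes. The `(pitch, duration)` input format below is a hypothetical choice made for illustration.

```python
from typing import List, Tuple

# A segmented note as (pitch in MIDI semitones, duration in seconds); assumed format.
Note = Tuple[float, float]


def relative_transcription(notes: List[Note]) -> List[Tuple[float, float]]:
    """Map notes to (pitch interval, duration ratio) pairs between consecutive notes.

    Transposing the melody (adding a constant to every pitch) or changing the
    tempo (scaling every duration) leaves the output unchanged.
    """
    return [(p1 - p0, d1 / d0) for (p0, d0), (p1, d1) in zip(notes, notes[1:])]


melody = [(60, 0.5), (62, 0.5), (64, 1.0)]
print(relative_transcription(melody))  # [(2, 1.0), (2, 2.0)]
```

A melody hummed a fourth higher and twice as slowly yields the same pairs, which is exactly the invariance the query engine needs.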
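The abstract extracts fixed-length fingerprints around rare pitch and duration movements. The sketch below reads "highest and lowest change" as the largest upward and largest downward pitch interval, and the largest and smallest log2 duration ratio. It then cuts a window of hypothetical length `length` around each such anchor. The selection rule and window size are assumptions, not the thesis's parameters.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (pitch interval in semitones, duration ratio)


def _window(n: int, center: int, length: int) -> Tuple[int, int]:
    """[start, end) of a window of `length` items around `center`, kept inside [0, n)."""
    start = max(0, min(center - length // 2, n - length))
    return start, start + length


def extract_fingerprints(pairs: List[Pair], length: int = 4) -> List[Tuple[str, int, List[Pair]]]:
    """Return (anchor kind, window start, window) for each extreme-change anchor."""
    n = len(pairs)
    if n < length:
        return []
    intervals = [p for p, _ in pairs]
    log_ratios = [math.log2(r) for _, r in pairs]
    anchors = {
        "pitch_max": max(range(n), key=lambda i: intervals[i]),   # largest upward leap
        "pitch_min": min(range(n), key=lambda i: intervals[i]),   # largest downward leap
        "dur_max": max(range(n), key=lambda i: log_ratios[i]),    # strongest lengthening
        "dur_min": min(range(n), key=lambda i: log_ratios[i]),    # strongest shortening
    }
    fingerprints = []
    for kind, idx in anchors.items():
        start, end = _window(n, idx, length)
        fingerprints.append((kind, start, pairs[start:end]))
    return fingerprints
```

Anchoring on extremes is attractive because large leaps and sharp rhythmic changes tend to be both distinctive and comparatively easy for the segmenter to detect.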
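The abstract says that statistical measures define the similarity distance between fingerprints and database entries, but it does not give the measure. One plausible choice is sketched below. Pitch-interval errors and log2 duration-ratio errors are modeled as independent zero-mean Gaussians, and each fingerprint is slid over every entry. The placeholder deviations `sigma_pitch` and `sigma_dur` stand in for values that would be estimated from collected humming data. Summing the best per-fingerprint distances is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (pitch interval in semitones, duration ratio)


def fp_distance(fp: Sequence[Pair], ref: Sequence[Pair],
                sigma_pitch: float = 1.0, sigma_dur: float = 0.5) -> float:
    """Gaussian negative log-likelihood of fp given ref, up to an additive constant."""
    total = 0.0
    for (p, r), (q, s) in zip(fp, ref):
        total += 0.5 * ((p - q) / sigma_pitch) ** 2
        total += 0.5 * ((math.log2(r) - math.log2(s)) / sigma_dur) ** 2
    return total


def best_match(fp: Sequence[Pair], entry: Sequence[Pair], **kw) -> float:
    """Lowest distance of fp against every equal-length window of a database entry."""
    k = len(fp)
    if len(entry) < k:
        return math.inf
    return min(fp_distance(fp, entry[i:i + k], **kw) for i in range(len(entry) - k + 1))


def rank(fps: List[Sequence[Pair]], database: Dict[str, Sequence[Pair]]) -> List[str]:
    """Entry names sorted by the summed best-match distance of all fingerprints."""
    return sorted(database, key=lambda name: sum(best_match(fp, database[name]) for fp in fps))
```

Because the deviations scale each error term, a one-semitone slip costs far less than a wrong contour direction, which is the kind of user-dependent variability a statistical distance is meant to absorb.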
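The abstract maps each frame's spectrum onto the Spiral Array and labels the frame with the nearest of the 24 major and minor chords. A minimal sketch of that nearest-center idea follows. It assumes the frame has already been reduced to a 12-bin pitch-class energy (chroma) vector, and it uses the standard Spiral Array layout: pitch positions on a helix indexed by the line of fifths, and chord centers as weighted sums of root, fifth, and third. It simplifies by giving each pitch class one fixed enharmonic spelling. The radius `R`, rise `H`, and weights `W` are placeholders, and this is not the thesis's updated array.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
# Line-of-fifths index per pitch class (C=0, G=1, F=-1, ...); one fixed spelling each.
FIFTHS_INDEX = {0: 0, 7: 1, 2: 2, 9: 3, 4: 4, 11: 5, 6: 6,
                5: -1, 10: -2, 3: -3, 8: -4, 1: -5}
R, H = 1.0, 0.4            # placeholder helix radius and rise per fifth
W = (0.6, 0.25, 0.15)      # placeholder chord weights: root >= fifth >= third


def pitch_position(pc: int) -> Vec:
    k = FIFTHS_INDEX[pc]
    return (R * math.sin(k * math.pi / 2), R * math.cos(k * math.pi / 2), k * H)


def weighted_sum(points: Sequence[Vec], weights: Sequence[float]) -> Vec:
    return tuple(sum(w * p[i] for p, w in zip(points, weights)) for i in range(3))


def chord_centers() -> Dict[str, Vec]:
    centers = {}
    for root in range(12):
        fifth = (root + 7) % 12
        for suffix, third in (("", (root + 4) % 12), ("m", (root + 3) % 12)):
            pts = [pitch_position(root), pitch_position(fifth), pitch_position(third)]
            centers[NAMES[root] + suffix] = weighted_sum(pts, W)
    return centers


CENTERS = chord_centers()


def center_of_effect(chroma: Sequence[float]) -> Vec:
    """Energy-weighted average position of the frame's pitch classes."""
    total = sum(chroma)
    if total <= 0:
        raise ValueError("frame has no pitch-class energy")
    return weighted_sum([pitch_position(pc) for pc in range(12)], [c / total for c in chroma])


def label_frame(chroma: Sequence[float]) -> str:
    ce = center_of_effect(chroma)
    return min(CENTERS, key=lambda name: math.dist(ce, CENTERS[name]))


def transcribe(frames: Sequence[Sequence[float]]) -> List[str]:
    return [label_frame(f) for f in frames]
```

Because every label is one of 24 fixed strings, the output is directly usable as the chord-symbol sequence that the n-gram models consume.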
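The abstract builds a smoothed n-gram model for each melody's chord sequence and ranks database entries by the perplexity they assign to the query. In the sketch below, add-k smoothing stands in for the thesis's unspecified smoothing method. Perplexity is computed as 2 raised to the per-n-gram cross-entropy in bits. The class and function names, `k=0.1`, and the 24-chord vocabulary spelling are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

VOCAB = [r + q for q in ("", "m")
         for r in ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]]


class NGramModel:
    """Add-k smoothed n-gram model over chord labels (stand-in smoothing scheme)."""

    def __init__(self, seq: Sequence[str], n: int, vocab: Sequence[str], k: float = 0.1):
        self.n, self.k, self.vocab_size = n, k, len(vocab)
        starts = range(len(seq) - n + 1)
        self.ngrams = Counter(tuple(seq[i:i + n]) for i in starts)
        self.contexts = Counter(tuple(seq[i:i + n - 1]) for i in starts)

    def prob(self, context: Tuple[str, ...], token: str) -> float:
        # Unseen contexts fall back to the uniform 1 / |vocab|.
        num = self.ngrams[context + (token,)] + self.k
        den = self.contexts[context] + self.k * self.vocab_size
        return num / den


def perplexity(model: NGramModel, query: Sequence[str]) -> float:
    """2 ** (cross-entropy in bits per n-gram) of the query under the model."""
    n = model.n
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    if not grams:
        raise ValueError("query is shorter than the model order n")
    log_sum = sum(math.log2(model.prob(g[:-1], g[-1])) for g in grams)
    return 2 ** (-log_sum / len(grams))


def rank_by_perplexity(query: Sequence[str], models: Dict[str, NGramModel]) -> List[str]:
    """Database names from least to most surprised by the query."""
    return sorted(models, key=lambda name: perplexity(models[name], query))
```

A model trained on the same progression as the query assigns high probability to each of its transitions and so scores a perplexity near 1. A model that has never seen the query's contexts degrades to the uniform 1/24 and scores 24.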