DSpace Repository

Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets

Show simple item record

dc.contributor.author Stojmirović, Aleksandar
dc.date.accessioned 2008-08-11T05:19:59Z
dc.date.accessioned 2022-10-31T01:50:42Z
dc.date.available 2008-08-11T05:19:59Z
dc.date.available 2022-10-31T01:50:42Z
dc.date.copyright 2005
dc.date.issued 2005
dc.identifier.uri https://ir.wgtn.ac.nz/handle/123456789/26758
dc.description.abstract A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. Quasi-metrics were first introduced in 1930s and are a subject of intensive research in the context of topology and theoretical computer science. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures between biological (nucleotide or protein) sequences and quasi-metrics. As sequence similarity search is one of the most important techniques of modern bioinformatics, this motivates a new direction of research: development of geometric aspects of the theory of quasi-metric spaces and its applications to similarity search in general and large protein datasets in particular. The thesis starts by presenting basic concepts of the theory of quasi-metric spaces illustrated by numerous examples, some previously known, some novel. In particular, the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space are constructed. Sets of biological sequences with some commonly used similarity measures provide a further and the most important example. Chapter 4 is dedicated to development of a notion of the quasi-metric space with Borel probability measure, or pq-space. The concept of a pq-space is a generalization of a notion of an mm-space from the asymptotic geometric analysis: an mm-space is a metric space with Borel measure that provides the framework for study of the phenomenon of concentration of measure on high dimensional structures. While some concepts and results are direct extensions of results about mm-spaces, some are intrinsic to the quasi-metric case. One of the main results of this chapter indicates that 'a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search. It extends the existing concepts of a workload and an indexing scheme in order to cover more general cases and introduces the concept of a quasi-metric tree as an analogue to a metric tree, a popular class of access methods for metric datasets. The results about pq-spaces are used to produce some new theoretical bounds on performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. The performance of FSIndex turns out to be very good in comparison with existing access methods. Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches. en_NZ
dc.format pdf en_NZ
dc.language en_NZ
dc.language.iso en_NZ
dc.publisher Te Herenga Waka—Victoria University of Wellington en_NZ
dc.title Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets en_NZ
dc.type Text en_NZ
vuwschema.type.vuw Awarded Doctoral Thesis en_NZ
thesis.degree.grantor Te Herenga Waka—Victoria University of Wellington en_NZ
thesis.degree.level Doctoral en_NZ
thesis.degree.name Doctor of Philosophy en_NZ


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Browse

My Account