Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets

Stojmirović, Aleksandar

dc.contributor.author	Stojmirović, Aleksandar
dc.date.accessioned	2008-08-11T05:19:59Z
dc.date.accessioned	2022-10-31T01:50:42Z
dc.date.available	2008-08-11T05:19:59Z
dc.date.available	2022-10-31T01:50:42Z
dc.date.copyright	2005
dc.date.issued	2005
dc.identifier.uri	https://ir.wgtn.ac.nz/handle/123456789/26758
dc.description.abstract	A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. Quasi-metrics were first introduced in 1930s and are a subject of intensive research in the context of topology and theoretical computer science. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures between biological (nucleotide or protein) sequences and quasi-metrics. As sequence similarity search is one of the most important techniques of modern bioinformatics, this motivates a new direction of research: development of geometric aspects of the theory of quasi-metric spaces and its applications to similarity search in general and large protein datasets in particular. The thesis starts by presenting basic concepts of the theory of quasi-metric spaces illustrated by numerous examples, some previously known, some novel. In particular, the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space are constructed. Sets of biological sequences with some commonly used similarity measures provide a further and the most important example. Chapter 4 is dedicated to development of a notion of the quasi-metric space with Borel probability measure, or pq-space. The concept of a pq-space is a generalization of a notion of an mm-space from the asymptotic geometric analysis: an mm-space is a metric space with Borel measure that provides the framework for study of the phenomenon of concentration of measure on high dimensional structures. While some concepts and results are direct extensions of results about mm-spaces, some are intrinsic to the quasi-metric case. One of the main results of this chapter indicates that 'a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search. It extends the existing concepts of a workload and an indexing scheme in order to cover more general cases and introduces the concept of a quasi-metric tree as an analogue to a metric tree, a popular class of access methods for metric datasets. The results about pq-spaces are used to produce some new theoretical bounds on performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. The performance of FSIndex turns out to be very good in comparison with existing access methods. Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.	en_NZ
dc.format	pdf	en_NZ
dc.language	en_NZ
dc.language.iso	en_NZ
dc.publisher	Te Herenga Waka—Victoria University of Wellington	en_NZ
dc.title	Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets	en_NZ
dc.type	Text	en_NZ
vuwschema.type.vuw	Awarded Doctoral Thesis	en_NZ
thesis.degree.grantor	Te Herenga Waka—Victoria University of Wellington	en_NZ
thesis.degree.level	Doctoral	en_NZ
thesis.degree.name	Doctor of Philosophy	en_NZ

Files in this item

Name: thesis.pdf

Size: 29.58Mb

Format: PDF

View/Open

This item appears in the following Collection(s)

Restricted Theses and Research Papers
A collection of theses and papers only available to University staff and students.

Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets

Files in this item

This item appears in the following Collection(s)

Search DSpace

Browse

All of DSpace

This Collection

My Account