Repository logo
 

Quasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets

dc.contributor.authorStojmirović, Aleksandar
dc.date.accessioned2008-08-11T05:19:59Z
dc.date.accessioned2022-10-31T01:50:42Z
dc.date.available2008-08-11T05:19:59Z
dc.date.available2022-10-31T01:50:42Z
dc.date.copyright2005
dc.date.issued2005
dc.description.abstractA quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. Quasi-metrics were first introduced in 1930s and are a subject of intensive research in the context of topology and theoretical computer science. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures between biological (nucleotide or protein) sequences and quasi-metrics. As sequence similarity search is one of the most important techniques of modern bioinformatics, this motivates a new direction of research: development of geometric aspects of the theory of quasi-metric spaces and its applications to similarity search in general and large protein datasets in particular. The thesis starts by presenting basic concepts of the theory of quasi-metric spaces illustrated by numerous examples, some previously known, some novel. In particular, the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space are constructed. Sets of biological sequences with some commonly used similarity measures provide a further and the most important example. Chapter 4 is dedicated to development of a notion of the quasi-metric space with Borel probability measure, or pq-space. The concept of a pq-space is a generalization of a notion of an mm-space from the asymptotic geometric analysis: an mm-space is a metric space with Borel measure that provides the framework for study of the phenomenon of concentration of measure on high dimensional structures. While some concepts and results are direct extensions of results about mm-spaces, some are intrinsic to the quasi-metric case. One of the main results of this chapter indicates that 'a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search. It extends the existing concepts of a workload and an indexing scheme in order to cover more general cases and introduces the concept of a quasi-metric tree as an analogue to a metric tree, a popular class of access methods for metric datasets. The results about pq-spaces are used to produce some new theoretical bounds on performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. The performance of FSIndex turns out to be very good in comparison with existing access methods. Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.en_NZ
dc.formatpdfen_NZ
dc.identifier.urihttps://ir.wgtn.ac.nz/handle/123456789/26758
dc.languageen_NZ
dc.language.isoen_NZ
dc.publisherTe Herenga Waka—Victoria University of Wellingtonen_NZ
dc.rights.holderAll rights, except those explicitly waived, are held by the Authoren_NZ
dc.rights.licenseAuthor Retains Copyrighten_NZ
dc.rights.urihttps://www.wgtn.ac.nz/library/about-us/policies-and-strategies/copyright-for-the-researcharchive
dc.subjectNucleotide sequenceen_NZ
dc.subjectMolecular structureen_NZ
dc.subjectMetric spacesen_NZ
dc.subjectBioinformaticsen_NZ
dc.subjectTopologyen_NZ
dc.titleQuasi-Metrics, Similarities and Searches: Aspects of Geometry of Protein Datasetsen_NZ
dc.typeTexten_NZ
thesis.degree.disciplineMathematicsen_NZ
thesis.degree.grantorTe Herenga Waka—Victoria University of Wellingtonen_NZ
thesis.degree.levelDoctoralen_NZ
thesis.degree.nameDoctor of Philosophyen_NZ
vuwschema.contributor.unitSchool of Mathematical and Computing Sciencesen_NZ
vuwschema.contributor.unitSchool of Biological Sciencesen_NZ
vuwschema.type.vuwAwarded Doctoral Thesisen_NZ

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
thesis.pdf
Size:
29.59 MB
Format:
Adobe Portable Document Format

Collections