Isotropic Dynamic Hierarchical Clustering

Article ID

CSTSDET7A58

Isotropic Dynamic Hierarchical Clustering

Victor Sadikov
Victor Sadikov
Oliver Rutishauser
Oliver Rutishauser
DOI

Abstract

We face a business need of discovering a pattern in locations of a great number of points in a high-dimensional space. We assume that there should be a certain structure, so that in some locations the points are close while in other locations the points are more dispersed. Our goal is to group the close points together. The process of grouping close objects is known under the name of clustering. 1. We are particularly interested in a hierarchical structure. A plain structure may reduce the number of objects, but the data are still difficult to manage or present. 2. The classical technique suited for the task at hand is a B-Tree. The key properties of the B-Tree are that it is hierarchical and balanced, and it can be dynamically constructed from the input data. In these terms, B-Tree has certain advantages over other clustering algorithms, where the number of clusters needs to be defined a priori. The BTree approach allows to hope that the structure of input data will be well determine without any supervised learning. 3. The space is Euclidean and isotropic. This is the most challenging part of the project, because currently there are no B-Tree implementations processing indices in a symmetrical and isotropical way. Some known implementations are based on constructing compound asymmetrical indices from point coordinates, where the main index works as a key, while the function of other (999!) indices is lost; and the other known implementations split the nodes along the coordinate hyper-planes, sacrificing the isotropy of the original space. In the latter case the clusters become coordinate parallelepipeds, which is a rather artificial and unnecessary assumption. Our implementation of a B Tree for a high-dimensional space is based directly on concepts of factor analysis. 4. We need to process a great deal of data, something like tens of millions of points in a thousand-dimensional space. The application has to be scalable, even though, technically, out task is not con

Isotropic Dynamic Hierarchical Clustering

We face a business need of discovering a pattern in locations of a great number of points in a high-dimensional space. We assume that there should be a certain structure, so that in some locations the points are close while in other locations the points are more dispersed. Our goal is to group the close points together. The process of grouping close objects is known under the name of clustering. 1. We are particularly interested in a hierarchical structure. A plain structure may reduce the number of objects, but the data are still difficult to manage or present. 2. The classical technique suited for the task at hand is a B-Tree. The key properties of the B-Tree are that it is hierarchical and balanced, and it can be dynamically constructed from the input data. In these terms, B-Tree has certain advantages over other clustering algorithms, where the number of clusters needs to be defined a priori. The BTree approach allows to hope that the structure of input data will be well determine without any supervised learning. 3. The space is Euclidean and isotropic. This is the most challenging part of the project, because currently there are no B-Tree implementations processing indices in a symmetrical and isotropical way. Some known implementations are based on constructing compound asymmetrical indices from point coordinates, where the main index works as a key, while the function of other (999!) indices is lost; and the other known implementations split the nodes along the coordinate hyper-planes, sacrificing the isotropy of the original space. In the latter case the clusters become coordinate parallelepipeds, which is a rather artificial and unnecessary assumption. Our implementation of a B Tree for a high-dimensional space is based directly on concepts of factor analysis. 4. We need to process a great deal of data, something like tens of millions of points in a thousand-dimensional space. The application has to be scalable, even though, technically, out task is not con

Victor Sadikov
Victor Sadikov
Oliver Rutishauser
Oliver Rutishauser

No Figures found in article.

Victor Sadikov. 2016. “. Global Journal of Computer Science and Technology – C: Software & Data Engineering GJCST-C Volume 16 (GJCST Volume 16 Issue C3): .

Download Citation

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Issue Cover
GJCST Volume 16 Issue C3
Pg. 29- 36
Classification
GJCST-C Classification: H.3.3, I.5.3
Keywords
Article Matrices
Total Views: 7335
Total Downloads: 1867
2026 Trends
Research Identity (RIN)
Related Research
Our website is actively being updated, and changes may occur frequently. Please clear your browser cache if needed. For feedback or error reporting, please email [email protected]

Request Access

Please fill out the form below to request access to this research paper. Your request will be reviewed by the editorial or author team.
X

Quote and Order Details

Contact Person

Invoice Address

Notes or Comments

This is the heading

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

High-quality academic research articles on global topics and journals.

Isotropic Dynamic Hierarchical Clustering

Victor Sadikov
Victor Sadikov
Oliver Rutishauser
Oliver Rutishauser

Research Journals