Agglomerative Hierarchical Clustering: An Introduction to Essentials. (3) Standardization, Normalization and Dimensionality Reduction of a Data Matrix

Article ID

6300W

Refat Aljumily
Abstract

In a previous tutorial article I looked at a proximity coefficient and, in the light of that proximity, created a vector distance matrix and used it to construct hierarchical trees with different hierarchical clustering methods, as the basis for exploratory multivariate analysis. The present article deals with three topics: (i) standardization for variation in variable scales, (ii) normalization for variation in sample length, and (iii) dimensionality reduction, or minimization of the data space. The choice of these techniques reflects the author's academic background and particular area of interest, but the techniques are not tied to any particular purpose: they are straightforwardly applicable to other kinds of data, and thus to a wide range of analyses in Linguistics. My treatment of these techniques is, necessarily, introductory and brief. I hope that this article will provide practitioners with an introductory overview of these techniques as used for cluster analysis of electronic corpora of linguistic data. The assumption is that the data is in the form of an m x n matrix D, which may need to be transformed in various ways prior to cluster analysis. A standardized data matrix enables practitioners to measure the variation between the variables and to cluster the cases they describe on common scales and values, regardless of their original scales and values. A normalized data matrix enables practitioners to eliminate the effect of variation in length among the samples and to cluster them as if they were all (about) the same length, regardless of their original length. A dimensionality-reduced data matrix enables practitioners to select and/or extract the most interesting variables relevant to the research question and to visualize an existing pattern, regardless of the original space. A worked example is given to illustrate the effect each transformation technique has on a given data matrix. These transformation techniques have their own strengths and weaknesses but are beyond the scope of
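The three transformations named in the abstract can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions, not the article's worked example: the function names are hypothetical, standardization is taken to mean column-wise z-scores, sample length is assumed to be the row sum of the matrix, and dimensionality reduction is represented by simple selection of the highest-variance columns. The article itself may use different formulas.

```python
import numpy as np

def standardize(D):
    """Z-score each column (variable): zero mean, unit standard deviation."""
    D = np.asarray(D, dtype=float)
    mean = D.mean(axis=0)
    std = D.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of NaN
    return (D - mean) / std

def normalize_length(D):
    """Rescale each row (sample) so every row sum equals the mean row sum.

    Assumption: a sample's length is the sum of the values in its row.
    """
    D = np.asarray(D, dtype=float)
    lengths = D.sum(axis=1, keepdims=True)
    safe = np.where(lengths == 0, 1.0, lengths)  # avoid division by zero
    return D * (lengths.mean() / safe)

def select_top_variance(D, k):
    """Keep the k highest-variance columns, preserving original column order.

    Assumption: variance stands in for how 'interesting' a variable is.
    """
    D = np.asarray(D, dtype=float)
    top = np.sort(np.argsort(D.var(axis=0))[::-1][:k])
    return D[:, top], top

# Toy 3 x 3 matrix: three samples (rows), three variables (columns).
D = np.array([[10, 200, 3],
              [20, 400, 3],
              [30, 600, 3]])

Z = standardize(D)                  # variables on a common scale
N = normalize_length(D)             # samples rescaled to a common length
R, kept = select_top_variance(D, 2) # constant third column is dropped
```

In this toy matrix the first two columns differ in scale by a factor of twenty but become identical after standardization, which is the effect the abstract describes for variables measured on different scales.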


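Refat Aljumily. 2016. “Agglomerative Hierarchical Clustering: An Introduction to Essentials. (3) Standardization, Normalization and Dimensionality Reduction of a Data Matrix”. Global Journal of Human-Social Science – G: Linguistics & Education GJHSS-G Volume 16 (GJHSS Volume 16 Issue G3): 55-63.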

Journal Specifications

Crossref Journal DOI 10.17406/GJHSS

Print ISSN 0975-587X

e-ISSN 2249-460X

GJHSS Volume 16 Issue G3
Pg. 55-63
Classification
GJHSS-G Classification: FOR Code: 139999