Maximising the Value of Missing Data

Article ID

CSTSDE82YZR

Atai Winkler

Abstract

The subject of missing values in databases and how to handle them has received very little attention in the statistics and data mining literature [1, 2, 3] and even less, if any at all, in the marketing literature. The usual attitude of practitioners is ‘we’ll just have to ignore records with missing values’. On the other hand, a few very advanced theoretical solutions have been developed, some of which have been applied, particularly to clinical trials data. These solutions can only be applied to small databases, not to the very large databases that many companies hold on their customers. This paper describes a new method for imputing missing values in such very large databases. Two particular features of the method are that it can handle all combinations of variable type (continuous, ordinal and categorical) and that all the missing values in the database are imputed in one run of the software. It is based on the k-nearest neighbours method, a well-known method in data mining. The paper concludes by presenting the results of a study in which the method was used to impute the missing values in a real set of data. This paper is concerned only with ‘missing’ data, i.e. data that are not known but which have real values. It does not address the problem of ‘empty’ data, i.e. data that are not known and which cannot have real values.
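The abstract does not give the details of the method, so the following is only a minimal sketch of what k-nearest neighbours imputation over mixed variable types might look like, not the author's implementation. It assumes a Gower-style distance (range-scaled absolute difference for continuous and ordinal variables, simple mismatch for categorical ones), and it fills continuous values with the neighbours' mean, ordinal values with their median and categorical values with their mode. The column names, the `TYPES` schema and the functions `column_ranges`, `distance` and `impute` are all hypothetical.

```python
from collections import Counter

# Hypothetical schema: column name -> variable type.
TYPES = {"age": "continuous", "grade": "ordinal", "region": "categorical"}


def column_ranges(rows):
    """Observed range of each numeric (continuous or ordinal) column."""
    ranges = {}
    for col, kind in TYPES.items():
        if kind != "categorical":
            seen = [r[col] for r in rows if r[col] is not None]
            width = (max(seen) - min(seen)) if seen else 0
            ranges[col] = width or 1.0  # avoid division by zero
    return ranges


def distance(a, b, ranges):
    """Gower-style distance over the columns observed in both records."""
    total, used = 0.0, 0
    for col, kind in TYPES.items():
        if a[col] is None or b[col] is None:
            continue  # skip columns missing in either record
        if kind == "categorical":
            total += 0.0 if a[col] == b[col] else 1.0
        else:
            total += abs(a[col] - b[col]) / ranges[col]
        used += 1
    return total / used if used else float("inf")


def impute(rows, k=3):
    """Return a copy of rows with every None filled from the k nearest donors.

    Distances always use the original rows, so all missing values are
    filled in a single pass and earlier fills do not affect later ones.
    """
    ranges = column_ranges(rows)
    filled = [dict(r) for r in rows]
    for i, row in enumerate(rows):
        for col, kind in TYPES.items():
            if row[col] is not None:
                continue
            donors = [r for j, r in enumerate(rows)
                      if j != i and r[col] is not None]
            donors.sort(key=lambda r: distance(row, r, ranges))
            values = [r[col] for r in donors[:k]]
            if not values:
                continue  # no donor has this column; leave it missing
            if kind == "continuous":
                filled[i][col] = sum(values) / len(values)      # mean
            elif kind == "ordinal":
                filled[i][col] = sorted(values)[len(values) // 2]  # median
            else:
                filled[i][col] = Counter(values).most_common(1)[0][0]  # mode
    return filled


# Illustrative data only; None marks a missing (not an empty) value.
rows = [
    {"age": 30, "grade": 2, "region": "north"},
    {"age": 32, "grade": 2, "region": "north"},
    {"age": 31, "grade": None, "region": "north"},
    {"age": 60, "grade": 5, "region": "south"},
    {"age": None, "grade": 5, "region": "south"},
    {"age": 62, "grade": 4, "region": None},
]
filled = impute(rows, k=2)
```

This brute-force search compares every incomplete record with every possible donor, so it would need an indexing or sampling strategy before it could be applied to the very large databases the paper targets.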


Atai Winkler. 2014. “Maximising the Value of Missing Data”. Global Journal of Computer Science and Technology – C: Software & Data Engineering, GJCST-C Volume 14 (GJCST Volume 14 Issue C3): 41-48.


Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

GJCST Volume 14 Issue C3
Pg. 41-48