Null Distribution of the Test Statistic for Model Selection via Marginal Screening: Implications for Multivariate Regression Analysis

Article ID

1YN32

A.V. Rubanovich
V.A. Saenko

Abstract

Marginal screening (MS) is a computationally simple and commonly used dimension-reduction procedure. In MS, a linear model is built from the few top predictors, chosen by the absolute value of their marginal correlations with the dependent variable. Importantly, when k predictors are selected out of m primary covariates, standard regression analysis may yield false-positive results if m >> k (Freedman’s paradox). In this work, we provide analytical expressions describing the null distribution of the test statistic for model selection via MS. Using the theory of order statistics, we show that under MS the common F-statistic is distributed as the mean of the k top variables out of m independent random variables, each having a χ² distribution with 1 degree of freedom (χ²₁). Based on this finding, we estimated critical p-values for multiple regression models after MS; comparing the p-values obtained in real studies against these critical values will help researchers avoid false-positive results. The analytical solutions obtained in this work are implemented in a free Excel spreadsheet program.
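The distributional claim in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical solution or their spreadsheet. It only simulates the stated null distribution (the mean of the k largest of m independent χ²₁ draws) and compares its upper quantile with the case of no selection (m = k). The function names, the number of simulations, the seed, and the example values m = 100, k = 5 are assumptions made for illustration.

```python
import numpy as np

def top_k_mean_chi2(m, k, n_sim=20000, seed=0):
    """Simulate the mean of the k largest of m independent chi-square(1) draws.

    Illustrative helper (hypothetical name); approximates the null
    distribution described in the abstract by brute-force simulation.
    """
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(df=1, size=(n_sim, m))
    # After partitioning at index m - k, the last k columns of each row
    # hold that row's k largest values (in no particular order).
    top_k = np.partition(draws, m - k, axis=1)[:, m - k:]
    return top_k.mean(axis=1)

def critical_value(m, k, alpha=0.05, **kwargs):
    """Upper (1 - alpha) quantile of the simulated statistic."""
    return float(np.quantile(top_k_mean_chi2(m, k, **kwargs), 1 - alpha))

# No selection: all 5 of 5 covariates enter the model, so the statistic
# is the mean of 5 chi-square(1) variables, i.e. chi-square(5) / 5.
naive = critical_value(m=5, k=5)

# Marginal screening: the 5 top covariates are picked out of 100.
screened = critical_value(m=100, k=5)

print(f"critical value without selection: {naive:.2f}")
print(f"critical value after screening:   {screened:.2f}")
```

Under these assumptions the critical value after screening is well above the one without selection, which is the mechanism behind the false positives the abstract attributes to Freedman’s paradox: a statistic judged against the usual reference distribution looks significant far too often when m >> k.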

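A.V. Rubanovich, V.A. Saenko. 2021. “Null Distribution of the Test Statistic for Model Selection via Marginal Screening: Implications for Multivariate Regression Analysis”. Global Journal of Science Frontier Research – G: Bio-Tech & Genetics (GJSFR-G), Volume 21, Issue G1: 23-31.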

Journal Specifications

Crossref Journal DOI 10.17406/GJSFR

Print ISSN 0975-5896

e-ISSN 2249-4626

GJSFR Volume 21 Issue G1
Pg. 23-31
Classification
GJSFR-G Classification: FOR Code: 060499