A Study of Spam E-mail classification using Feature Selection package

Dr. R. Parimala
Dr. R. Nallaswamy
National Institute of Technology, Tiruchirappalli


Abstract

Feature selection (FS) is the technique of selecting a subset of relevant features for building learning models. FS algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking scores the features by a metric and eliminates all features that do not achieve an adequate score, while subset selection searches the space of possible feature subsets for an optimal subset. Many FS algorithms have been proposed. This paper presents an FS study guided by the FSelector package, an R package that implements FS algorithms for feature ranking and feature subset selection on high-dimensional data and provides functions for selecting attributes from a given dataset. Attribute subset selection is the process of identifying and removing as much of the irrelevant and redundant information as possible, and the R package provides a convenient interface to these algorithms. This paper investigates the effectiveness of twelve commonly used FS methods on a spam dataset. One popular approach uses filters, which select the feature subset as a preprocessing step independent of the chosen classifier, here a support vector machine. Wrapper-based selection, designed as a wrapper around five classification algorithms, is also evaluated. A short description of each algorithm and the performance measures of its classification on the spam dataset are presented.
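The filter-then-classify workflow described in the abstract (rank features by a metric, keep a subset, then train an SVM) can be sketched as follows. The paper itself works in R with the FSelector package; this Python version uses scikit-learn and a synthetic stand-in for the spambase data, so the dataset, the 57-feature count, and the k=10 cutoff are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of filter-based feature selection followed by an SVM, mirroring the
# workflow the abstract describes. The paper uses the R FSelector package;
# scikit-learn is used here only for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the spam data set: 57 features, as in UCI spambase.
X, y = make_classification(n_samples=500, n_features=57, n_informative=10,
                           random_state=0)
X = np.abs(X)  # the chi-squared filter requires non-negative feature values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Filter step: rank features by the chi-squared statistic and keep the top 10.
# This happens before, and independently of, the classifier.
selector = SelectKBest(chi2, k=10).fit(X_tr, y_tr)
X_tr_sel = selector.transform(X_tr)
X_te_sel = selector.transform(X_te)

# Classification step: train an SVM on the reduced feature set.
clf = SVC(kernel="rbf").fit(X_tr_sel, y_tr)
accuracy = clf.score(X_te_sel, y_te)
print(X_tr_sel.shape[1], accuracy)
```

Because the filter ranks features with no reference to the SVM, it is cheap and classifier-agnostic; a wrapper method would instead re-train the classifier on candidate subsets, which is costlier but tailored to the learner.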


Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Dr. R. Parimala and Dr. R. Nallaswamy. 2011. "A Study of Spam E-mail classification using Feature Selection package". Global Journal of Computer Science and Technology 11 (7).

Journal Specifications

Version of record: v1.2
Issue date: May 6, 2011
Language: English