Text Categorization and Machine Learning Methods: Current State Of The Art

1
Durga Bhavani Dasari
2
Dr. Venu Gopala Rao. K
1 Jawaharlal Nehru University - Hyderabad

GJCST Volume 12 Issue C11

Article Fingerprint

ResearchID

CSTSDE05N02

In this information age, many documents are available in digital form, and they need to be classified by their text. To address this problem, researchers have focused on machine learning techniques: a general inductive process automatically builds a classifier by learning the characteristics of the categories from a set of pre-classified documents. Compared with the manual definition of a classifier by domain experts, the main benefits of this approach are good effectiveness, reduced expert effort, and straightforward portability to different domains. This paper examines the main approaches to text categorization within the machine learning paradigm and surveys the present state of the art. It also discusses issues pertaining to three text similarity problems: semantic, conceptual, and contextual.
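
The inductive process described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier with add-one smoothing, which is only one common choice among the learning methods surveyed in this area and is not presented here as the authors' own method. The function names `train` and `classify` and the toy training documents are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Learn per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocabulary = set()
    for text, category in labeled_docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training set, invented for illustration only.
training_docs = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores winning goal", "sports"),
]
model = train(training_docs)
predicted = classify("football player scores", model)
```

The classifier is built entirely from the pre-classified examples, with no hand-written rules, which is the property the abstract contrasts with manual definition by domain experts. Moving to a different domain would, under these assumptions, only require a different set of labeled documents.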

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

Durga Bhavani Dasari. 2012. “Text Categorization and Machine Learning Methods: Current State Of The Art”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 12 (GJCST Volume 12 Issue C11): 37-46.

Pg. 37-46
Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Version of record

v1.2

Issue date

July 17, 2012

Language

English
