Nomenclature and Contemporary Affirmation of the Unsupervised Learning in Text and Document Mining

Annaluri Sreenivasa Rao α, Prof. S. Ramakrishna σ
α Jawaharlal Nehru Technological University, Hyderabad

ResearchID: CSTSDEDM3UD

Abstract

Document clustering is primarily a method for simplifying document search, analysis, and content review. Equivalently, it is the automatic classification of similar documents into relevant clusters within a clustering hierarchy. This paper reviews related work in document clustering, from simple word- and phrase-based techniques to present-day techniques based on statistical analysis and machine learning, and discusses their implications for future research.
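
As a rough illustration of the simplest word-based end of this spectrum, the sketch below groups documents by the cosine similarity of their term-frequency vectors. It is not a method taken from the paper: the function names (`term_vector`, `cosine`, `cluster`), the single-pass strategy, the threshold of 0.3, and the sample documents are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a lowercase bag-of-words term-frequency vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Single-pass clustering (an assumed, deliberately simple scheme).

    Each document joins the first cluster whose seed document (the first
    member) is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up sample documents: two about clustering, two about football.
docs = [
    "clustering groups similar documents",
    "documents are grouped by clustering methods",
    "the football match ended in a draw",
    "a late goal decided the football match",
]
groups = cluster(docs)
print(groups)
```

A single-pass scheme like this depends on document order and on the chosen threshold. More elaborate approaches replace the raw term counts with weighted or semantic features and the greedy assignment with an explicit clustering objective.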

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Annaluri Sreenivasa Rao. 2015. “Nomenclature and Contemporary Affirmation of the Unsupervised Learning in Text and Document Mining”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 15 (GJCST Volume 15 Issue C2): 15-21.

GJCST Volume 15 Issue C2
Pg. 15-21
Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Classification
GJCST-C Classification: H.2.8

Version of record: v1.2

Issue date: March 30, 2015

Language: en