Nomenclature and Contemporary Affirmation of the Unsupervised Learning in Text and Document Mining

Annaluri Sreenivasa Rao

Prof. S. Ramakrishna

Jawaharlal Nehru Technological University, Hyderabad

Article Fingerprint

ResearchID

CSTSDEDM3UD

Abstract

Document clustering automatically groups documents of similar type into relevant clusters, typically arranged in a clustering hierarchy, and is applied chiefly to simplify document search, analysis, and content review. This paper reviews related work in document clustering, from simple word- and phrase-based techniques to current, more complex techniques based on statistical analysis and machine learning, and discusses their implications for future research.

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Annaluri Sreenivasa Rao. 2015. “Nomenclature and Contemporary Affirmation of the Unsupervised Learning in Text and Document Mining”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 15 (GJCST Volume 15 Issue C2): 15-21.

GJCST Volume 15 Issue C2
Pg. 15-21

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Classification

GJCST-C Classification: H.2.8

Submission Received: December 15, 2014
Peer Review: Double Blind
Accepted: December 31, 2014
Published: January 15, 2015

Version of record

v1.2

Issue date

March 30, 2015

Country: India
