Feature Extraction and Duplicate Detection for Text Mining: A Survey

Article ID

CSTSDEZ7U17

Ramya R S (University Visvesvaraya College of Engineering, UVCE)
Venugopal K R
Iyengar S S
Patnaik L M
Abstract

Text mining, also known as Intelligent Text Analysis, is an important research area. The high dimensionality of text data makes it difficult to focus on the most relevant information. Feature extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from such data. This survey covers text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets: multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques for extracting relevant features, detecting duplicates, and replacing duplicate data so as to deliver fine-grained knowledge to the user.
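To make the two tasks named in the abstract concrete, the following is a minimal sketch and is not taken from the survey itself. It assumes TF-IDF weighting as a stand-in for feature extraction and Jaccard similarity over word shingles as a stand-in for duplicate detection. Both are common baselines chosen here purely for illustration. All function names (`tokenize`, `tfidf`, `shingles`, `jaccard`, `near_duplicates`) and the default threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Assumption: tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word tuples) of a text."""
    toks = tokenize(text)
    if len(toks) < k:
        return {tuple(toks)} if toks else set()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return index pairs (i, j), i < j, whose shingle sets meet the threshold."""
    sets = [shingles(d, k) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if jaccard(sets[i], sets[j]) >= threshold
    ]
```

In this sketch, terms that occur in every document receive a weight of zero, so the highest-weighted terms in each dictionary act as a reduced feature set. The pairwise comparison in `near_duplicates` is quadratic in the number of documents, which is acceptable only for small collections.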

Ramya R S. 2017. "Feature Extraction and Duplicate Detection for Text Mining: A Survey." Global Journal of Computer Science and Technology – C: Software & Data Engineering (GJCST-C), Volume 16, Issue C5.

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Classification
C.2.1, C.2.4, H.2.8

