Probability of Semantic Similarity and N-grams Pattern Learning for Data Classification

V Vineeth Kumar [1]
Dr. N Satyanarayana [2]
[1] JNTU Hyderabad


GJCST Volume 17 Issue H2

Article Fingerprint

ResearchID

CSTITO5986


Semantic learning is an important mechanism for document classification, but most classification approaches consider only the content and the word distribution. Traditional classification algorithms cannot accurately represent the meaning of a document because they do not take into account the semantic relations between words. In this paper, we present an approach to document classification that incorporates two similarity scoring methods: first, a semantic similarity method that computes the probable similarity based on Bayes' method, and second, an n-gram pair method based on the probability similarity score of frequent terms. Since semantic similarity and n-gram pairs can each play an important role, from separate views, in the classification of a document, we design a semantic similarity learning (SSL) algorithm to improve the performance of document classification for a large quantity of unclassified documents.
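The abstract names the two scores but does not give their formulas, so the following is only a minimal sketch of how a Bayes-based class probability and an n-gram pair overlap score might be combined. It is not the authors' SSL algorithm. The add-one smoothing, the use of bigrams, the mixing weight alpha, the toy training data, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the word n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(labeled_docs, n=2):
    """Collect per-class word counts, n-gram counts and document counts."""
    model = {}
    for text, label in labeled_docs:
        tokens = text.lower().split()
        entry = model.setdefault(
            label, {"words": Counter(), "ngrams": Counter(), "docs": 0}
        )
        entry["words"].update(tokens)
        entry["ngrams"].update(ngrams(tokens, n))
        entry["docs"] += 1
    return model


def bayes_log_score(tokens, entry, vocab_size, total_docs):
    """log P(class) + sum of log P(word | class), with add-one smoothing (assumed)."""
    total_words = sum(entry["words"].values())
    score = math.log(entry["docs"] / total_docs)
    for token in tokens:
        score += math.log((entry["words"][token] + 1) / (total_words + vocab_size))
    return score


def ngram_score(tokens, entry, n=2):
    """Fraction of the document's n-grams that were seen in the class."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in entry["ngrams"]) / len(grams)


def classify(text, model, alpha=0.5, n=2):
    """Pick the class with the highest weighted mix of the two scores.

    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    tokens = text.lower().split()
    vocab = set()
    for entry in model.values():
        vocab.update(entry["words"])
    total_docs = sum(entry["docs"] for entry in model.values())

    log_scores = {
        label: bayes_log_score(tokens, entry, len(vocab), total_docs)
        for label, entry in model.items()
    }
    # Normalise the log scores into posterior probabilities.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())

    combined = {
        label: alpha * exp_scores[label] / norm
        + (1 - alpha) * ngram_score(tokens, model[label], n)
        for label in model
    }
    return max(combined, key=combined.get)


# Toy training data, invented for this example only.
TRAINING_DOCS = [
    ("the stock market fell sharply today", "finance"),
    ("investors sold shares as the market dropped", "finance"),
    ("the team won the football match", "sports"),
    ("the striker scored two goals in the match", "sports"),
]

model = train(TRAINING_DOCS)
print(classify("the market fell as investors sold shares", model))
print(classify("the striker won the match", model))
```

Keeping the two scores separate until the final weighted sum mirrors the abstract's point that semantic similarity and n-gram pairs offer separate views of a document. A real system would need a proper tokenizer and a tuned weight.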


Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

V Vineeth Kumar. 2017. "Probability of Semantic Similarity and N-grams Pattern Learning for Data Classification". Global Journal of Computer Science and Technology - H: Information & Technology GJCST-H Volume 17 (GJCST Volume 17 Issue H2).


Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Keywords
Classification
GJCST-H Classification: G.3, I.5, I.5.2
Version of record

v1.2

Issue date

May 24, 2017

Language

English


Article Metrics
Total Views: 6502
Total Downloads: 1851


