Semantic Clustering of Genomic Documents using GO Terms as Feature Set

V. Bhuvaneswari α, Dr. B. L. Shivakumar σ
α Bharathiar University

Article Fingerprint

ResearchID

CSTSDE6IBVU


Abstract

Biological databases generate huge volumes of genomics and proteomics data. Researchers use the sequence information to find similarities between genes and proteins and to retrieve other related information. Genomic sequence databases carry a large number of attributes as annotations that describe the sequences, represented in XML format. A proper mechanism is needed to group these documents for information retrieval. Data mining techniques such as clustering and classification can be used to group the documents. The objective of this paper is to analyze the set of keywords that can serve as features for grouping the documents semantically. The paper focuses on clustering genomic documents based on both structural and content similarity. Structural similarity is computed from the structural paths of the documents, and semantic similarity is then computed for the structurally similar documents. We propose a methodology to cluster genomic documents using sequence attributes, without using the sequence data itself. The sequence attributes are analyzed using filter-based feature selection methods to find the relevant feature set for grouping similar documents. Based on the attribute ranking, we cluster similar documents using the All Keyword approach (KBA) and the GO Terms based approach (GOTA). The clusters produced by the two approaches are validated by inferring their biological meaning using the Gene Ontology. The results indicate that the all-keywords approach grouped documents according to the semantic meaning of Gene Ontology terms. The GO terms based approach grouped a larger number of semantically relevant documents without considering any other keywords, which reduces the complexity of the attributes considered. We claim that GO terms alone can be used as the feature set to group genomic documents with high similarity.
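The two-stage idea in the abstract (structural similarity first, then GO term similarity among structurally similar documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a similarity measure or a clustering algorithm, so Jaccard overlap, the greedy grouping rule, the thresholds, the `go_term` tag name, and the toy documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def go_terms(xml_text, tag="go_term"):
    """Collect the text of every <go_term> element (hypothetical tag name)."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(tag) if el.text}


def jaccard(a, b):
    """Jaccard similarity of two sets; taken as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, struct_min=0.8, go_min=0.5):
    """Greedy grouping (an assumed rule): a document joins the first cluster
    whose first member is similar in both structure and GO terms."""
    feats = {name: (tag_paths(x), go_terms(x)) for name, x in docs.items()}
    clusters = []  # each cluster is a list of document names
    for name in docs:
        paths, terms = feats[name]
        for members in clusters:
            rep_paths, rep_terms = feats[members[0]]
            if (jaccard(paths, rep_paths) >= struct_min
                    and jaccard(terms, rep_terms) >= go_min):
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def make_doc(terms):
    """Build a toy annotation document; the layout is invented for the demo."""
    items = "".join(f"<go_term>{t}</go_term>" for t in terms)
    return f"<entry><name>x</name><annotations>{items}</annotations></entry>"


docs = {
    "d1": make_doc(["GO:0005524", "GO:0004672"]),
    "d2": make_doc(["GO:0005524", "GO:0004672", "GO:0006468"]),
    "d3": make_doc(["GO:0003677"]),
}
print(cluster(docs))
```

In this toy input all three documents share the same tag paths, so the grouping is decided by GO term overlap alone: d1 and d2 share two of three terms and fall into one cluster, while d3 shares none and forms its own.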


Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

V. Bhuvaneswari. 2012. “Semantic Clustering of Genomic Documents using GO Terms as Feature Set”. Global Journal of Computer Science and Technology - C: Software & Data Engineering, GJCST-C Volume 12 (GJCST Volume 12 Issue C10): 13-19.

GJCST Volume 12 Issue C10
Pg. 13-19

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Version of record

v1.2

Issue date

June 7, 2012

Language
en

Article Metrics
Total Views: 10363
Total Downloads: 2752
