Telugu Text Categorization using Language Models

1
Swapna Narala
Swapna Narala
2
B. Padmaja Rani
B. Padmaja Rani
3
K. Ramakrishna
K. Ramakrishna

Send Message

To: Author

GJCST Volume 16 Issue H4

Article Fingerprint

ReserarchID

CSTIT41JBO

Telugu Text Categorization using Language Models Banner
  • English
  • Afrikaans
  • Albanian
  • Amharic
  • Arabic
  • Armenian
  • Azerbaijani
  • Basque
  • Belarusian
  • Bengali
  • Bosnian
  • Bulgarian
  • Catalan
  • Cebuano
  • Chichewa
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Corsican
  • Croatian
  • Czech
  • Danish
  • Dutch
  • Esperanto
  • Estonian
  • Filipino
  • Finnish
  • French
  • Frisian
  • Galician
  • Georgian
  • German
  • Greek
  • Gujarati
  • Haitian Creole
  • Hausa
  • Hawaiian
  • Hebrew
  • Hindi
  • Hmong
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kannada
  • Kazakh
  • Khmer
  • Korean
  • Kurdish (Kurmanji)
  • Kyrgyz
  • Lao
  • Latin
  • Latvian
  • Lithuanian
  • Luxembourgish
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Maltese
  • Maori
  • Marathi
  • Mongolian
  • Myanmar (Burmese)
  • Nepali
  • Norwegian
  • Pashto
  • Persian
  • Polish
  • Portuguese
  • Punjabi
  • Romanian
  • Russian
  • Samoan
  • Scots Gaelic
  • Serbian
  • Sesotho
  • Shona
  • Sindhi
  • Sinhala
  • Slovak
  • Slovenian
  • Somali
  • Spanish
  • Sundanese
  • Swahili
  • Swedish
  • Tajik
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Welsh
  • Xhosa
  • Yiddish
  • Yoruba
  • Zulu

Document categorization has become an emerging technique in the field of research due to the abundance of documents available in digital form. In this paper we propose language dependent and independent models applicable to categorization of Telugu documents. India is a multilingual country; a provision is made for each of the Indian states to choose their own authorized language for communicating at the state level for legitimate purpose. The availability of constantly increasing amount of textual data of various Indian regional languages in electronic form has accelerated. Hence, the Classification of text documents based on languages is crucial. Telugu is the third most spoken language in India and one of the fifteen most spoken language n the world. It is the official language of the states of Telangana and Andhra Pradesh. A variant of k-nearest neighbors algorithm used for categorization process. The results obtained by the Comparisons of language dependent and independent models.

4 Cites in Articles

References

  1. T Landauer,P Foltz,D Laham (1998). An Introduction to Latent Semantic Analysis.
  2. Lulin Shi,Shi Li,Ping Jiang,Hongsen Liu (2003). Improving Comparative Sentence Extraction of Chinese Product Reviews by Sentiment Analysis.
  3. A Mrs,Dr Kanaka Durga,Govardhan (2011). Ontology Based Text Categorization Telugu Documents.
  4. M Hanumanthappa 5,B Vishnu Vardhan (2008). Indian Language Text Representation and Categorization Using Supervised Learning Algorithm M Narayana Swamy1.

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

Swapna Narala. 2017. \u201cTelugu Text Categorization using Language Models\u201d. Global Journal of Computer Science and Technology - H: Information & Technology GJCST-H Volume 16 (GJCST Volume 16 Issue H4): .

Download Citation

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Keywords
Classification
D.2.11,D.2.12
Version of record

v1.2

Issue date

January 25, 2017

Language

English

Experiance in AR

The methods for personal identification and authentication are no exception.

Read in 3D

The methods for personal identification and authentication are no exception.

Article Matrices
Total Views: 6866
Total Downloads: 1726
2026 Trends
Research Identity (RIN)
Related Research

Published Article

Document categorization has become an emerging technique in the field of research due to the abundance of documents available in digital form. In this paper we propose language dependent and independent models applicable to categorization of Telugu documents. India is a multilingual country; a provision is made for each of the Indian states to choose their own authorized language for communicating at the state level for legitimate purpose. The availability of constantly increasing amount of textual data of various Indian regional languages in electronic form has accelerated. Hence, the Classification of text documents based on languages is crucial. Telugu is the third most spoken language in India and one of the fifteen most spoken language n the world. It is the official language of the states of Telangana and Andhra Pradesh. A variant of k-nearest neighbors algorithm used for categorization process. The results obtained by the Comparisons of language dependent and independent models.

Our website is actively being updated, and changes may occur frequently. Please clear your browser cache if needed. For feedback or error reporting, please email [email protected]
×

This Page is Under Development

We are currently updating this article page for a better experience.

Request Access

Please fill out the form below to request access to this research paper. Your request will be reviewed by the editorial or author team.
X

Quote and Order Details

Contact Person

Invoice Address

Notes or Comments

This is the heading

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

High-quality academic research articles on global topics and journals.

Telugu Text Categorization using Language Models

Swapna Narala
Swapna Narala
B. Padmaja Rani
B. Padmaja Rani
K. Ramakrishna
K. Ramakrishna

Research Journals