Hybrid Technique for Arabic Text Compression

Arafat Awajan α
Enas Abu Jrai σ
α Princess Sumaya University for Technology

ResearchID: CSTSDE2XAL2

Abstract

Arabic content on the Internet and other digital media is increasing exponentially, and the number of Arab users of these media has multiplied by more than 20 over the past five years. There is a real need to reduce the storage space allocated to this content and to make using, searching, and retrieving information from it more efficient. Techniques borrowed from other languages, and general-purpose data compression techniques that ignore the specific features of Arabic, achieve only limited success in terms of compression ratio. In this paper, we present a hybrid technique that uses the linguistic features of the Arabic language to improve the compression ratio of Arabic texts. The technique works in two phases. In the first phase, the text file is split into four different files using a multilayer model-based approach. In the second phase, each of these four files is compressed using the Burrows-Wheeler compression algorithm.
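The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say what the four files contain, so the split used here (a category mask plus Arabic-script tokens, digit runs, and everything else) is purely a hypothetical stand-in for the multilayer model. Python's standard `bz2` module, which is built on the Burrows-Wheeler transform, stands in for the second phase. The names `split_streams`, `merge_streams`, `hybrid_compress`, and `hybrid_decompress` are illustrative.

```python
import bz2
import re

# ASSUMPTION: these three token categories plus a mask are placeholders.
# The paper's actual four layers are not described in the abstract.
TOKEN_RE = re.compile(
    r"(?P<A>[\u0600-\u06FF]+)"        # runs of Arabic-block characters
    r"|(?P<N>\d+)"                    # runs of other digits
    r"|(?P<O>[^\u0600-\u06FF\d]+)"    # everything else (spaces, punctuation, Latin)
)
SEP = "\x00"  # token separator inside a stream; assumed absent from the input


def split_streams(text):
    """Phase 1 (illustrative): split the text into four streams."""
    if SEP in text:
        raise ValueError("input must not contain the separator character")
    tokens = {"A": [], "N": [], "O": []}
    mask = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        mask.append(kind)            # remember the order of categories
        tokens[kind].append(match.group())
    streams = {kind: SEP.join(items) for kind, items in tokens.items()}
    streams["mask"] = "".join(mask)
    return streams


def merge_streams(streams):
    """Inverse of split_streams: interleave tokens following the mask."""
    iters = {kind: iter(streams[kind].split(SEP)) for kind in "ANO"}
    return "".join(next(iters[kind]) for kind in streams["mask"])


def hybrid_compress(text):
    """Phase 2: compress each stream separately with a BWT-based coder."""
    return {name: bz2.compress(data.encode("utf-8"))
            for name, data in split_streams(text).items()}


def hybrid_decompress(packed):
    streams = {name: bz2.decompress(blob).decode("utf-8")
               for name, blob in packed.items()}
    return merge_streams(streams)


if __name__ == "__main__":
    sample = "سعر الكتاب 25 دينارا."
    packed = hybrid_compress(sample)
    assert hybrid_decompress(packed) == sample
    for name, blob in packed.items():
        print(name, len(blob), "bytes")
```

On a sample this short, the fixed overhead of each `bz2` stream outweighs any saving, so the sketch only demonstrates that the split is lossless. Whether separating streams improves the compression ratio would have to be measured on a realistic corpus.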

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Arafat Awajan. 2015. “Hybrid Technique for Arabic Text Compression”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 15 (GJCST Volume 15 Issue C1).

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Classification C.1.3

Version of record v1.2

Issue date February 21, 2015

Language en

