Critical Analysis of Solutions to Hadoop Small File Problem

Prof. Shwetha K S
Dr. Chandramouli H

ResearchID

CSTSDE4L4OD

Abstract

The Hadoop big data platform is designed to process large volumes of data. The small file problem is a performance bottleneck in Hadoop processing. Files smaller than the Hadoop block size create a large storage overhead at the NameNode and also waste computational resources, because many map tasks are spawned to process them. Various solutions, such as merging small files and mapping multiple map threads to the same Java virtual machine instance, have been proposed to solve the small file problem in Hadoop. This survey critically analyzes existing works that address the small file problem in Hadoop and in variant platforms such as Spark. The aim is to understand their effectiveness in reducing storage and computational overhead and to identify open issues for further research.
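To make the two overheads named in the abstract concrete, the following Python sketch estimates NameNode metadata and map-task counts for a set of files before and after merging. It is a back-of-the-envelope model, not taken from the article: the 128 MiB block size, the figure of roughly 150 bytes of NameNode heap per file or block object, the one-map-task-per-block assumption, and the function names are all illustrative assumptions.

```python
import math

MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB    # assumption: HDFS block size of 128 MiB
BYTES_PER_OBJECT = 150    # assumption: rule-of-thumb NameNode heap per file/block object


def block_count(file_sizes, block_size=BLOCK_SIZE):
    """Count HDFS blocks: each non-empty file occupies at least one block."""
    return sum(math.ceil(size / block_size) for size in file_sizes if size > 0)


def estimate(file_sizes, block_size=BLOCK_SIZE):
    """Rough NameNode and map-task cost for a list of file sizes in bytes."""
    blocks = block_count(file_sizes, block_size)
    objects = len(file_sizes) + blocks  # one object per file plus one per block
    return {
        "namenode_objects": objects,
        "namenode_bytes": objects * BYTES_PER_OBJECT,
        "map_tasks": blocks,  # assumption: one map task per block (input split)
    }


# Hypothetical workload: 10,000 files of 1 MiB each.
small_files = [1 * MIB] * 10_000
# The same data merged into a single large file.
merged = [sum(small_files)]

print("small files:", estimate(small_files))
print("merged file:", estimate(merged))
```

Under these assumptions the data volume is identical in both cases, yet the unmerged layout needs one block and one map task per file, while the merged file needs only as many as its size divided by the block size. Real clusters differ in the details (replication, split settings, per-object heap cost), so the numbers only indicate orders of magnitude.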

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Prof. Shwetha K S. 2026. "Critical Analysis of Solutions to Hadoop Small File Problem". Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 23 (GJCST Volume 23 Issue C2).

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Classification
GJCST-C Classification (LCC): QA76.585
Version of record

v1.2

Issue date
October 28, 2023

Language
en