This research delves into the multifaceted implications of customer feedback within the e-commerce landscape, focusing on product reviews on Amazon. The study meticulously examines over 1,400 unique product reviews to decipher patterns, extrapolate trends, and offer actionable recommendations for the evolving e-commerce paradigm. The dataset comprises 16 distinct features, including product ratings, textual reviews, prices, and discounts. Preliminary data exploration reveals a prevalence of high ratings, indicative of an overarching positive sentiment among Amazon’s clientele. Furthermore, features related to pricing and discounts hint at the intricate interplay between economic factors and customer feedback. Through data preparation techniques, including numeric extraction and missing data handling, the research ensures the dataset’s readiness for advanced statistical and machine learning analyses. Leveraging the CRISP-DM methodology, the study uncovers insights into customer satisfaction, the impact of pricing strategies, and the significance of in-depth reviews.
## I. INTRODUCTION
The digital age has ushered in an era where vast amounts of data are at the fingertips of organizations [1]. For businesses [2], especially in the financial sector, this data is a goldmine, offering insights into customer behaviors, preferences, and patterns. Among the various data types, credit card transaction data is particularly intriguing, encapsulating a user's spending habits, preferences, and financial behaviors.
Segmenting users based on this data can reveal distinct groups with unique characteristics, enabling businesses to tailor their services, offers, and marketing strategies accordingly [3], [4]. This research carries out such a segmentation using the KDD process and clustering techniques.
## II. DATA UNDERSTANDING
The dataset under scrutiny encapsulates the behaviors of 8950 active credit card users, spanning 18 behavioral variables. These variables range from basic metrics like balance and credit limit to more intricate ones like purchase frequency and cash advance transactions.
A cursory exploration of the dataset reveals:
- A diverse range of balances, with some users maintaining high balances and others minimal amounts.
- Various purchasing behaviors, with certain users inclined towards one-off purchases and others towards installment-based ones.
- Discrepancies in credit limits, indicating differing creditworthiness among users.
- Missing data points in certain columns necessitating preprocessing steps before deeper analysis.
## III. DATA PREPARATION
A rigorous data preparation phase was undertaken to ensure the dataset's suitability for clustering. This phase is pivotal in the KDD process [5], [6] as it sets the stage for effective data mining. Key steps included:
- Missing Value Imputation: Credit card data often contains missing values, either due to errors or omissions. The CREDIT_LIMIT and MINIMUM_PAYMENTS columns, which had missing values, were addressed by imputing the mean of the respective columns. This strategy leaves each column's mean unchanged (although it slightly reduces the column's variance) while providing a reasonable estimate for the missing values.
- Standardization: Given the varying scales and units of the dataset's features, a standardization step was crucial. This ensured that each feature contributed equally to the clustering process, preventing any single feature from dominating the clustering due to its scale.
- Dimensionality Reduction: With a multitude of features, reducing dimensionality can simplify the clustering process and make visualizations feasible. Employing Principal Component Analysis (PCA), the dataset's dimensionality was reduced, making it more manageable and visualization-friendly. The first two principal components, which captured a significant portion of the dataset's variance, were retained for subsequent steps.
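The three preparation steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the study's actual code: the function names (`impute_mean`, `standardize`, `pca_project`) are hypothetical, and the small array `X_raw` is made-up toy data standing in for the real feature matrix.

```python
import numpy as np

def impute_mean(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)            # work on a copy
    col_means = np.nanmean(X, axis=0)       # column means, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def standardize(X):
    """Rescale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                     # leave constant columns unscaled
    return (X - mean) / std

def pca_project(X, n_components=2):
    """Project X onto its first principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained_ratio[:n_components]

# Toy data (hypothetical values, three features, two missing entries).
X_raw = np.array([
    [1000.0, 200.0, np.nan],
    [1500.0, np.nan, 50.0],
    [3000.0, 800.0, 120.0],
    [500.0, 100.0, 30.0],
])

X_filled = impute_mean(X_raw)
X_scaled = standardize(X_filled)
X_2d, explained_ratio = pca_project(X_scaled, n_components=2)
```

Inspecting `explained_ratio` shows how much of the variance the retained components capture, which is the quantity to check before relying on a two-dimensional view of the data.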
## IV. MODELING
The modeling phase is a pivotal step within the Knowledge Discovery in Databases (KDD) process. It entails applying data mining algorithms to extract patterns or knowledge from the prepared dataset [7]. The overarching goal of this phase was to segment credit card users based on their behavioral patterns.
### a) Optimal Cluster Determination
Before proceeding with clustering, it's imperative to determine the optimal number of clusters. This ensures that the granularity of segmentation is neither too broad nor too specific. For this purpose, the elbow method was employed, a technique that identifies the number of clusters at which the reduction in within-cluster variance starts to show diminishing returns.
 Fig. 1: Elbow Plot for Determining Optimal Number of Clusters
From the elbow plot, the bend at around four clusters suggested that segmenting users into four distinct groups would be the most effective.
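The elbow is usually read off the plot by eye, as described here. One possible way to automate that reading, shown purely as a sketch, is to pick the number of clusters where the within-cluster variance curve bends most sharply, measured by its largest second difference. The function name `elbow_k` is hypothetical, the heuristic assumes evenly spaced values of $k$, and the variance values below are made-up numbers chosen only to show the calculation; they are not results from this study.

```python
def elbow_k(ks, inertias):
    """Return the k where the inertia curve bends most sharply.

    The bend at position i is the second difference
    inertias[i-1] - 2*inertias[i] + inertias[i+1]; a large positive value
    means a steep drop before k and a flat curve after it.
    """
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

# Illustrative, made-up within-cluster variance values for k = 1..7.
ks = [1, 2, 3, 4, 5, 6, 7]
inertias = [1000.0, 700.0, 450.0, 250.0, 220.0, 200.0, 185.0]
suggested_k = elbow_k(ks, inertias)
```

A heuristic like this is best treated as a cross-check on the plot rather than a replacement for it, since real curves often have no single sharp bend.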
### b) Clustering with $K$-Means
K-means was the algorithm of choice for clustering due to its effectiveness in partitioning datasets into non-overlapping subgroups. The algorithm works by:
1. Initializing $k$ centroids randomly.
2. Assigning each data point to the nearest centroid.
3. Recomputing the centroid of each cluster based on its constituent data points.
4. Repeating the assignment and recomputation steps until cluster assignments no longer change or a set number of iterations is reached.
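The four steps above map directly onto a short NumPy routine. The following is a minimal sketch rather than the implementation used in the study: the function name `kmeans`, the `seed` parameter, and the choice to initialize centroids by sampling data points are assumptions made for illustration, and the six toy points are made up.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a float array X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 4: stop once assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:            # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy data (hypothetical): two well-separated groups of three points each.
X_toy = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
labels, centroids = kmeans(X_toy, k=2)
```

Because the result depends on the random initialization, it is common practice to run the algorithm several times with different seeds and keep the run with the lowest within-cluster variance.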
Upon application to the standardized dataset, users were segmented into the previously identified four clusters. Each of these clusters represents a distinct group of users, differentiated by their credit card usage behaviors.
## V. EVALUATION
The evaluation phase provides a critical assessment of the clusters formed, ensuring they are interpretable, actionable, and aligned with the initial objectives of the research.
- Cluster Characteristics: Each cluster was examined to understand the dominant behaviors of its members. For instance, one cluster might consist predominantly of users with high one-off purchase values, suggesting a pattern of infrequent, high-value shopping behaviors. Another cluster might be characterized by frequent cash advances, indicating users who might be facing financial challenges or have immediate liquidity needs.
- Cluster Visualization: Visualizing the clusters, especially in reduced-dimensional space (e.g., using the first two principal components), provides an intuitive grasp of how distinct the clusters are and how they relate to each other in the feature space.
- Inter-cluster Analysis: By examining the centroids of the clusters, one can derive insights into the "average" behavior of each segment. This is crucial for businesses aiming to tailor marketing strategies for each segment.
- Intra-cluster Analysis: Within each cluster, the variance or spread of data points was analyzed. A tighter cluster indicates members with very similar behaviors, while a more dispersed cluster suggests a broader range of behaviors within that segment.
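The inter-cluster and intra-cluster checks described above can be computed from the feature matrix and the cluster labels alone. The sketch below is illustrative and assumes a hypothetical helper `cluster_profiles`; it reports each cluster's centroid (its "average" member) and its spread, taken here as the mean squared distance of members to the centroid. The five points and their labels are made-up toy values.

```python
import numpy as np

def cluster_profiles(X, labels):
    """Summarize each cluster by size, centroid, and mean squared distance to centroid."""
    profiles = {}
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        spread = np.mean(np.sum((members - centroid) ** 2, axis=1))
        profiles[int(c)] = {
            "size": len(members),
            "centroid": centroid,
            "spread": float(spread),
        }
    return profiles

# Toy data (hypothetical): cluster 0 is loose, cluster 1 is tight.
X_eval = np.array([
    [0.0, 0.0], [0.0, 2.0],
    [10.0, 10.0], [10.0, 10.5], [10.5, 10.0],
])
labels_eval = np.array([0, 0, 1, 1, 1])
profiles = cluster_profiles(X_eval, labels_eval)
```

A smaller `spread` corresponds to the "tighter" clusters mentioned above, while comparing the `centroid` entries across clusters shows which features separate the segments.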
Through this comprehensive evaluation, it was evident that the clusters formed are distinct and insightful. Each cluster offers a unique lens into a segment of credit card users, providing businesses with granular insights into user behaviors and preferences.
## VI. CONCLUSION
The application of the KDD process in segmenting credit card users has demonstrated the profound potential of data mining in unveiling hidden patterns within datasets. By methodically progressing through data understanding, preparation, modeling, and evaluation, this research has segmented credit card users into discernible clusters, each echoing unique behavioral patterns.
Such segmentation provides businesses with a deeper understanding of their customer base, enabling them to devise personalized marketing strategies, offers, and services. In an era where enterprises strive for personalization and meaningful customer engagement, the insights derived from this study are of paramount importance.
Furthermore, this research underscores the significance of a structured approach to data analysis, emphasizing the importance of each phase in the KDD process. As the field of data mining continues to evolve, methodologies like KDD will remain instrumental in transforming raw data into actionable knowledge.
Funding
No external funding was declared for this work.
Conflict of Interest
The authors declare no conflict of interest.
Ethical Approval
No ethics committee approval was required for this article type.
Data Availability
Not applicable for this article.
Azizul Hakim Rafi. 2026. “Unveiling Customer Sentiments: A Comprehensive Analysis of Product Reviews on Amazon”. Global Journal of Computer Science and Technology - D: Neural & AI GJCST-D Volume 24 (GJCST Volume 24 Issue D2): .