Customer Segmentation Analysis for Arnova Store

All the code used in this project is available here:

https://github.com/manishpaneru/Customer_segmentation

As a data analyst at Arnova Store, I have been entrusted by the CEO with the task of uncovering actionable insights from our customer data. The CEO's key question is: What are the key segments in our customer base, and how do they differ in purchasing behavior?

To answer this, I will analyze the dataset using a structured approach:

- Data Storage: Customer data is maintained in Excel for easy accessibility.
- Analysis Tools: Python will be used for data cleaning, exploratory data analysis (EDA), and visualizations.
- Outcome: A detailed report summarizing customer segments, their characteristics, and tailored recommendations to optimize marketing strategies.

Methodology

To address the CEO's question and effectively segment Arnova Store's customer base, I followed a systematic, hands-on approach using Python for all analysis tasks. The dataset, which contained customer demographics, spending habits, and other behavioral attributes, was provided to me by the CEO in Excel format. Here’s how I tackled the project step by step:

Data Acquisition
- I received the dataset directly from the CEO and ensured it was properly stored in Excel for accessibility.
- Using Python’s pandas library, I imported the dataset into my workspace for analysis and verified its structure to confirm that all key attributes were present.
Data Cleaning
- I began by checking for missing values, duplicates, and inconsistent entries in the dataset. For missing values in categorical variables like the "Profession" column, I replaced them with the mode to maintain consistency.
- I removed duplicate entries to avoid skewing the analysis and standardized all column names to a consistent format for easier referencing during the analysis.
- I also identified and handled outliers in critical numerical columns, such as "Annual Income ($)" and "Spending Score (1-100)," by examining box plots and using appropriate capping techniques.
Exploratory Data Analysis (EDA)
- To better understand the customer base, I explored key demographic attributes such as age, gender, profession, and family size.
- I analyzed spending patterns by looking at distributions of annual income and spending scores, and used scatter plots to identify trends and relationships between these variables.
- Using visualizations like histograms, bar charts, and scatter plots, I uncovered patterns that provided a deeper understanding of our customers' behaviors and preferences.
Customer Clustering (Segmentation)
- To identify distinct customer groups, I applied k-means clustering. I first normalized numerical data such as "Annual Income ($)" and "Spending Score (1-100)" to ensure equal contribution from each variable.
- To determine the optimal number of clusters, I used the elbow method, plotting the sum of squared errors (SSE) for different cluster numbers and selecting the point where the reduction in SSE began to plateau.
- After finalizing the number of clusters, I assigned cluster labels to each customer in the dataset for further analysis.
Cluster Analysis
- With clusters assigned, I examined the characteristics of each group, comparing attributes like average income, spending score, and demographics (e.g., gender, age, and profession).
- I visualized the clusters to highlight differences between groups, using scatter plots and bar charts to make the distinctions clear.
Insights and Reporting
- Finally, I summarized my findings and crafted actionable insights based on the analysis. For each customer segment, I outlined their key characteristics and proposed tailored strategies to better engage them.
- I compiled all results into a comprehensive report to present to the CEO, ensuring the insights were actionable and aligned with Arnova Store’s business goals.
Data Overview

The dataset provided by the CEO contained 2,000 records and eight key attributes, including customer demographics (e.g., Gender, Age, Profession), spending habits (e.g., Annual Income and Spending Score), and additional details like Work Experience and Family Size. Upon importing the data, I noticed that some values in the "Profession" column were missing, which I addressed by imputing the most frequent value (mode).
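The structure check and mode imputation described here can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the small DataFrame is made-up sample data standing in for the Excel file, the file name in the comment is hypothetical, and only the seven attributes named in this write-up are checked.

```python
import pandas as pd

# Attributes named in this write-up; the real file has eight, so this list is partial.
EXPECTED = [
    "Gender", "Age", "Profession", "Annual Income ($)",
    "Spending Score (1-100)", "Work Experience", "Family Size",
]

# Made-up sample rows. With the real data this would be something like
# raw = pd.read_excel("customers.xlsx")  # hypothetical file name
raw = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Age": [34, 28, 51, 45, 39],
    "Profession": ["Engineer", "Healthcare", "Healthcare", None, "Healthcare"],
    "Annual Income ($)": [52000, 61000, 83000, 48000, 75000],
    "Spending Score (1-100)": [40, 72, 15, 66, 90],
    "Work Experience": [5, 2, 20, 12, 8],
    "Family Size": [3, 1, 4, 2, 5],
})

# Verify the structure: which expected columns are absent, and where values are missing.
missing_columns = [col for col in EXPECTED if col not in raw.columns]
null_counts = raw.isna().sum()

# Replace missing professions with the most frequent value (the mode).
filled = raw.copy()
most_common = filled["Profession"].mode().iloc[0]
filled["Profession"] = filled["Profession"].fillna(most_common)

print("Missing columns:", missing_columns)
print("Nulls in Profession before imputation:", null_counts["Profession"])
print(filled["Profession"].tolist())
```

Mode imputation keeps every row, at the cost of slightly inflating the most common profession, which is worth keeping in mind when profession shares are compared later.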

To ensure data integrity, I also removed duplicate entries and standardized column names to make them easier to reference during analysis. Outliers in the numerical attributes were identified through box plots, particularly in "Annual Income" and "Spending Score," and were capped appropriately to prevent them from skewing the clustering results. This preprocessing ensured that the dataset was clean, consistent, and ready for analysis.
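A rough sketch of these three preprocessing steps is shown below. Two details are assumptions rather than facts from the project: the snake_case naming convention, and the capping rule, which here uses the 1.5 x IQR fences that a box plot draws as whiskers. The helper names and the sample values are illustrative only.

```python
import pandas as pd

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case (an assumed convention)."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    return out

def cap_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = values.astype(float)
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Made-up sample with one duplicate row and one extreme income.
raw = pd.DataFrame({
    "Annual Income ($)": [30000, 45000, 45000, 50000, 55000, 60000, 70000, 900000],
    "Spending Score (1-100)": [40, 55, 55, 60, 35, 70, 50, 65],
})

# Remove exact duplicates, then standardize the column names.
clean = standardize_columns(raw.drop_duplicates().reset_index(drop=True))

# Cap the income column; the extreme value is pulled down to the upper fence.
clean["annual_income"] = cap_outliers_iqr(clean["annual_income"])

print(clean)
```

Capping keeps the affected customers in the dataset rather than dropping them, so cluster sizes still reflect the full customer base.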

Exploratory Data Analysis (EDA)

To gain an initial understanding of the data, I analyzed demographic attributes such as Age, Gender, and Profession.

A histogram of Age revealed that the majority of customers were between 25 and 45 years old, indicating a younger customer base. The Gender distribution was fairly balanced, while the Profession data showed a concentration in fields like Engineering, Healthcare, and Entertainment.

Next, I analyzed spending habits. A histogram of Annual Income showed that most customers earned between $30,000 and $90,000 annually. The Spending Score histogram revealed diverse purchasing behaviors, with some customers scoring very low and others extremely high. A scatter plot of Annual Income vs. Spending Score highlighted potential patterns, with high earners clustering into distinct spending categories.
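The numbers behind these charts can be reproduced with a few pandas summaries. The sketch below uses randomly generated data, so its output says nothing about Arnova Store's customers. The age bands are an illustrative choice, not the bins used in the project.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (not the real customer file).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Annual Income ($)": rng.integers(20_000, 120_000, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Histogram-style view of Age: share of customers per (illustrative) age band.
age_bands = pd.cut(df["Age"], bins=[17, 24, 45, 70], labels=["18-24", "25-45", "46-70"])
age_share = age_bands.value_counts(normalize=True).sort_index()

# Gender balance as proportions.
gender_share = df["Gender"].value_counts(normalize=True)

# Spread of income, and its linear relationship with spending score.
income_summary = df["Annual Income ($)"].describe()
income_spend_corr = df["Annual Income ($)"].corr(df["Spending Score (1-100)"])

print(age_share)
print(gender_share)
print(income_summary[["min", "50%", "max"]])
print(f"Income vs. spending correlation: {income_spend_corr:.2f}")
```

A low correlation does not rule out structure: distinct groups can exist in the scatter plot even when income and spending score have no overall linear relationship, and that is the structure clustering is meant to pick up.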

These insights provided a solid foundation for clustering and revealed initial trends in customer demographics and spending behaviors.

Customer Clustering (Segmentation)

To segment the customer base, I applied k-means clustering. Before clustering, I normalized numerical attributes such as "Annual Income" and "Spending Score" to ensure that each feature contributed equally to the clustering process.
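To illustrate why normalization matters: income is measured in tens of thousands while spending score runs from 1 to 100, so unscaled distances would be dominated by income. The sketch below assumes z-score standardization with scikit-learn's StandardScaler. The write-up does not say which normalization was used, and min-max scaling would be an equally plausible reading. The sample values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

features = ["Annual Income ($)", "Spending Score (1-100)"]

# Made-up sample values on very different scales.
df = pd.DataFrame({
    "Annual Income ($)": [30000, 45000, 60000, 90000, 120000],
    "Spending Score (1-100)": [80, 20, 55, 95, 10],
})

# Rescale each feature to mean 0 and standard deviation 1.
scaler = StandardScaler()
X = scaler.fit_transform(df[features])

print("Raw standard deviations:   ", df[features].std(ddof=0).round(1).tolist())
print("Scaled standard deviations:", X.std(axis=0).round(3).tolist())
```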

The elbow method was used to determine the optimal number of clusters, revealing that four distinct groups would provide the most meaningful segmentation.
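The elbow method can be sketched as below. The data is synthetic and built with four well-separated groups, so the elbow at k = 4 is guaranteed by construction; on real customer data the bend is usually less sharp and choosing it involves judgment. The range of k values tried is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data with four well-separated groups (not real customers).
centers = np.array([[-5, -5], [-5, 5], [5, -5], [5, 5]])
X_raw, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# Fit k-means for each candidate k and record the SSE (scikit-learn calls it inertia_).
sse = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sse[k] = model.inertia_

# The "elbow" is the k after which additional clusters reduce SSE only slightly.
for k, value in sse.items():
    print(f"k={k}: SSE={value:.1f}")
```

Plotting these values against k gives the elbow curve. The point to look for is where the curve flattens, not where SSE is lowest, since SSE keeps shrinking as k grows.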

After performing k-means clustering, I assigned cluster labels to each customer and visualized the results using a scatter plot. This plot displayed clear segmentation, with each cluster representing a unique group based on income and spending behavior.
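Fitting the final model and attaching labels to each customer might look like the following sketch. The data is randomly generated around four made-up income and spending profiles, and the "Cluster" column name is a hypothetical choice, not necessarily what the project used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ["Annual Income ($)", "Spending Score (1-100)"]

# Generate 50 made-up customers around each of four (income, score) profiles.
rng = np.random.default_rng(1)
profiles = [(30000, 20), (30000, 80), (100000, 20), (100000, 80)]
rows = [
    (rng.normal(income, 5000), rng.normal(score, 5))
    for income, score in profiles
    for _ in range(50)
]
df = pd.DataFrame(rows, columns=features)

# Scale the features, fit k-means with four clusters, and store one label per customer.
X = StandardScaler().fit_transform(df[features])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X)  # "Cluster" is a hypothetical column name

print(df["Cluster"].value_counts().sort_index())
```

The cluster numbers themselves are arbitrary identifiers; they only become meaningful once each cluster's income and spending profile has been examined.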

Cluster Analysis

With clusters assigned, I examined the characteristics of each group, comparing attributes like average income, spending score, and demographics (e.g., gender, age, and profession).
I visualized the clusters to highlight differences between groups, using scatter plots and bar charts to make the distinctions clear.
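A per-cluster profile of this kind can be built with a grouped aggregation, as in the sketch below. The rows and the "Cluster" labels are made up for illustration, and the summary column names are hypothetical.

```python
import pandas as pd

# Made-up customers with cluster labels already assigned.
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2],
    "Age": [25, 31, 44, 52, 29, 35],
    "Gender": ["Female", "Male", "Male", "Male", "Female", "Female"],
    "Annual Income ($)": [30000, 34000, 95000, 105000, 60000, 64000],
    "Spending Score (1-100)": [80, 76, 20, 30, 55, 45],
})

# One row per cluster: size plus the average age, income, and spending score.
profile = df.groupby("Cluster").agg(
    customers=("Age", "size"),
    avg_age=("Age", "mean"),
    avg_income=("Annual Income ($)", "mean"),
    avg_spending_score=("Spending Score (1-100)", "mean"),
)

# Gender mix within each cluster, as row-wise proportions.
gender_mix = pd.crosstab(df["Cluster"], df["Gender"], normalize="index")

print(profile)
print(gender_mix)
```

The same crosstab pattern applies to Profession or any other categorical attribute, and the resulting tables are a natural input for the bar charts mentioned above.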

How My Approach Ties to the Project Goals

My methodology was focused on addressing the CEO’s question with a logical and thorough approach. By cleaning the data, exploring it to understand underlying trends, and segmenting the customer base into clear clusters, I provided a structured way to answer the key business question: "What are the key segments in our customer base, and how do they differ in purchasing behavior?"
Every step, from cleaning to cluster analysis, was aimed at ensuring that the insights were actionable and relevant to the business needs. This process helped me not only answer the question but also provide practical recommendations to enhance Arnova Store’s marketing strategies.

Insights and Reporting

Finally, I summarized my findings and crafted actionable insights based on the analysis. For each customer segment, I outlined their key characteristics and proposed tailored strategies to better engage them.
I compiled all results into a comprehensive report to present to the CEO, ensuring the insights were actionable and aligned with Arnova Store’s business goals.