Airline Customer Value Analysis
Airline customer value analysis using k-means clustering in Python.
Background
An airline company based in China wants to maximize its profit by building an accurate marketing strategy for its customers, especially those enrolled as members. The business team asked the data team to provide insights into how the company's members can be segmented.
Goals
Classify the airline's members into categories, compare customer value across those categories, and provide personalized services based on them.
Analytic Approach
Clustering analysis: build a model that can divide customers into categories or segments.
Data Requirements
We will use data on airline customers who are already members and assess their value with LRFMC analysis. LRFMC analysis is an extension of RFM analysis, which has been used in industry for years to divide customers into segments. LRFMC analysis requires 5 variables:
- L : Number of months between the member's joining date and the end of the observation period. => LOAD_TIME - FFP_DATE
- R : Number of months between the member's last flight and the end of the observation period. => LAST_TO_END
- F : Total number of times the member has flown during the observation period. => FLIGHT_COUNT
- M : Miles accumulated by the member during the observation period. => SEG_KM_SUM
- C : The average value of the discount factor used by the member during the observation period. => avg_discount
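The mapping above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the sample rows are made up, the helper name `build_lrfmc` is hypothetical, and one month is approximated as 30 days (an assumption). `LAST_TO_END` is copied as-is, so any unit conversion it may need is not shown.

```python
import pandas as pd

# Tiny made-up sample; only the column names come from the variable list above.
raw = pd.DataFrame({
    "FFP_DATE": ["2011-03-31", "2013-03-31", "2009-03-31"],
    "LOAD_TIME": ["2014-03-31", "2014-03-31", "2014-03-31"],
    "LAST_TO_END": [10, 120, 3],
    "FLIGHT_COUNT": [25, 2, 60],
    "SEG_KM_SUM": [42000, 3100, 150000],
    "avg_discount": [0.85, 0.60, 1.10],
})


def build_lrfmc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a table with one column per LRFMC variable (hypothetical helper)."""
    load_time = pd.to_datetime(df["LOAD_TIME"])
    ffp_date = pd.to_datetime(df["FFP_DATE"])
    # Assumption: a month is approximated as 30 days.
    membership_months = (load_time - ffp_date).dt.days / 30
    return pd.DataFrame({
        "L": membership_months,
        "R": df["LAST_TO_END"],
        "F": df["FLIGHT_COUNT"],
        "M": df["SEG_KM_SUM"],
        "C": df["avg_discount"],
    })


lrfmc = build_lrfmc(raw)
print(lrfmc.round(2))
```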
Data Preparation
- Feature extraction: create a new feature for the 'L' variable using LOAD_TIME and FFP_DATE.
- Outlier treatment: log transformation, IQR method.
- Scaling: standardization.
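The outlier treatment and scaling steps could look roughly like the sketch below. Several details are assumptions, since the list above does not specify them: which columns are log-transformed (here R, F and M), the use of `log(1 + x)`, the conventional 1.5 × IQR fences, and dropping (rather than capping) rows outside the fences. The data is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in for the LRFMC table (all values are made up).
lrfmc = pd.DataFrame({
    "L": rng.uniform(1, 100, n),
    "R": rng.exponential(60, n),
    "F": rng.lognormal(2, 1, n),
    "M": rng.lognormal(9, 1, n),
    "C": rng.normal(0.7, 0.15, n),
})


def log_transform(df, cols):
    """Compress right-skewed columns with log(1 + x)."""
    out = df.copy()
    out[cols] = np.log1p(out[cols])
    return out


def drop_iqr_outliers(df, k=1.5):
    """Keep rows where every column lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    inside = ((df >= q1 - k * iqr) & (df <= q3 + k * iqr)).all(axis=1)
    return df[inside]


def standardize(df):
    """Rescale each column to zero mean and unit (population) variance."""
    return (df - df.mean()) / df.std(ddof=0)


# Assumption: only R, F and M are log-transformed.
cleaned = drop_iqr_outliers(log_transform(lrfmc, ["R", "F", "M"]))
scaled = standardize(cleaned)
print(len(lrfmc), "->", len(cleaned), "rows after outlier removal")
```

Standardizing after outlier removal keeps extreme values from inflating the column standard deviations, which matters because k-means relies on Euclidean distances.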
Modeling
We will use k-means clustering to create customer categories.
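A minimal sketch of this step, assuming scikit-learn's `KMeans` is the implementation used (the text does not name a library). The input matrix is synthetic, `n_clusters=5` simply mirrors the five clusters discussed in the analysis, and `n_init` and `random_state` are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic stand-in for the standardized LRFMC matrix (one column per variable).
X = rng.normal(size=(300, 5))

# n_clusters=5 mirrors the five clusters in the analysis; other settings are illustrative.
model = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = model.fit_predict(X)

print(np.bincount(labels))           # number of customers in each cluster
print(model.cluster_centers_.shape)  # (5, 5): one centroid per cluster
```

scikit-learn labels clusters from 0, so `labels + 1` would give numbering in the 1 to 5 style used in the analysis.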
Evaluation
Since this is an unsupervised method, we have no ground truth for the data. Hence, we can only evaluate the choice of the number of clusters, which we do with the elbow method. Alternatively, the number of segments is sometimes set by an industry standard.
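The elbow method can be sketched as follows, again assuming scikit-learn. The data here is synthetic (three well-separated blobs), and the range of k values is an arbitrary choice. The idea is to fit k-means for several values of k, record the within-cluster sum of squared distances (`inertia_`), and look for the k after which the decrease flattens out.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: three well-separated blobs in five dimensions.
centers = np.array([[0] * 5, [8] * 5, [-8] * 5], dtype=float)
X = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 1))
```

On this toy data the inertia drops sharply up to k = 3 and only slowly afterwards, which is the "elbow". On real data the bend is often less clear-cut, which is one reason an industry-standard number of segments is sometimes used instead.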
Result
Analysis
Cluster 1
- (L) Length of membership: Medium; shorter than cluster 3 but longer than cluster 4 (~36 months/~3 years).
- (R) Recent Flight: Have not flown for the longest time.
- (F) Flight Count: Lowest flight count; these customers fly rarely (similar to cluster 4).
- (M) Miles Accumulated: Has the lowest sum of flight distance (similar to cluster 4).
- (C) Discount used: Has the greatest amount of discount used.
Cluster 2
- (L) Length of membership: They've stayed with the company for the shortest time (~24 months/~2 years).
- (R) Recent Flight: Medium; a longer time than cluster 3 but a shorter time than cluster 5.
- (F) Flight Count: 2nd highest flight count; these customers fly frequently.
- (M) Miles Accumulated: Has the 2nd greatest sum of flight distance.
- (C) Discount used: Has the 3rd greatest amount of discount used.
Cluster 3
- (L) Length of membership: They have been airline members for a long time, though not as long as customers in cluster 5 (~65 months/~5 years).
- (R) Recent Flight: Have the most recent flights.
- (F) Flight Count: Highest flight count; these customers fly very frequently.
- (M) Miles Accumulated: Has the greatest sum of flight distance.
- (C) Discount used: Has the 2nd greatest amount of discount used.
Cluster 4
- (L) Length of membership: Medium; shorter than cluster 1 but longer than cluster 2 (~30 months/~2.5 years).
- (R) Recent Flight: Have not flown for a longer time than cluster 5, but not as long as cluster 1.
- (F) Flight Count: Lowest flight count; these customers fly rarely (similar to cluster 1).
- (M) Miles Accumulated: Has the lowest sum of flight distance (similar to cluster 1).
- (C) Discount used: Has the lowest amount of discount used.
Cluster 5
- (L) Length of membership: They've stayed with the company for the longest time (~75 months/~6 years).
- (R) Recent Flight: Medium; a longer time than cluster 2 but a shorter time than cluster 4.
- (F) Flight Count: 3rd highest flight count; these customers fly frequently, but not as frequently as cluster 2.
- (M) Miles Accumulated: Has the 3rd greatest sum of flight distance.
- (C) Discount used: Has the 4th greatest amount of discount used.
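Rank-style statements like the ones above ("2nd highest", "lowest") can be derived by averaging each LRFMC variable per cluster and ranking the averages. The sketch below is only an illustration of that idea: the values are made up, it uses three clusters for brevity, and the `cluster` column name is hypothetical.

```python
import pandas as pd

# Made-up example: unscaled LRFMC values plus a cluster label per customer.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 3, 3],
    "L": [10, 14, 40, 44, 70, 74],
    "R": [9, 11, 5, 7, 1, 3],
    "F": [2, 4, 10, 12, 30, 34],
    "M": [1000, 3000, 9000, 11000, 40000, 44000],
    "C": [0.9, 1.1, 0.5, 0.7, 0.7, 0.9],
})

# Average of each variable within each cluster.
profile = df.groupby("cluster").mean()

# Rank clusters per variable: 1 = highest average, larger = lower average.
ranks = profile.rank(ascending=False, method="min").astype(int)

print(profile)
print(ranks)
```

Profiling on the unscaled values keeps the averages interpretable (months, flights, miles), while the ranks make the clusters easy to compare against each other.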
Conclusion
Customers in cluster 3 are the most loyal: they have been members for quite a long time, fly frequently, have the greatest total flight distance, and most of them have used the airline's service recently. Most of them also used a large amount of discount, but this may have been granted by the airline as a benefit for its loyal members.