Airline Customer Value Analysis

Airline customer value analysis using k-means clustering in Python.

Background

An airline company based in China wants to maximize its profit by building an accurate marketing strategy for its customers, especially those enrolled as members. The business team asked the data team to provide insights into how these members can be segmented.

Goals

Classify the airline's members into distinct categories, compare customer value across those categories, and provide personalized services for each category.

Analytic Approach

Clustering analysis: build a model that can divide customers into categories or segments.

Data Requirements

We will use data on airline customers who are already members and assess their value with LRFMC analysis. LRFMC extends RFM analysis, which has been used in industry for years to divide customers into segments. LRFMC analysis requires five variables:

- L : Number of months between the member’s joining date and the end of the observation period. => LOAD_TIME - FFP_DATE

- R : Number of months between the member’s last flight and the end of the observation period. => LAST_TO_END

- F : Total number of flights the member has taken during the observation period. => FLIGHT_COUNT

- M : Miles accumulated by the member during the observation period. => SEG_KM_SUM

- C : The average value of the discount factor used by the member during the observation period. => avg_discount
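The variable list above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the `build_lrfmc` helper is hypothetical, and a month is approximated as 30 days, which is an assumption.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the variable list.
raw = pd.DataFrame({
    "FFP_DATE": ["2010-01-01", "2012-07-01"],
    "LOAD_TIME": ["2014-03-31", "2014-03-31"],
    "LAST_TO_END": [1, 10],
    "FLIGHT_COUNT": [40, 3],
    "SEG_KM_SUM": [60000, 4000],
    "avg_discount": [0.9, 0.6],
})


def build_lrfmc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a table with the five LRFMC columns (hypothetical helper)."""
    joined = pd.to_datetime(df["FFP_DATE"])
    observed = pd.to_datetime(df["LOAD_TIME"])
    # Assumption: one month is approximated as 30 days.
    membership_months = (observed - joined).dt.days / 30
    return pd.DataFrame({
        "L": membership_months,
        "R": df["LAST_TO_END"],
        "F": df["FLIGHT_COUNT"],
        "M": df["SEG_KM_SUM"],
        "C": df["avg_discount"],
    })


lrfmc = build_lrfmc(raw)
print(lrfmc.round(1))
```

Only L needs to be derived; the other four variables are taken directly from existing columns.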

Data Preparation

- Feature extraction: create a new feature for the 'L' variable from LOAD_TIME and FFP_DATE.

- Outlier treatment: log transformation and the IQR method.

- Scaling: standardization.
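The outlier treatment and scaling steps might look roughly like the sketch below. The source does not say which columns were log-transformed or whether IQR outliers were dropped or capped, so the choices here (log-transform F and M, drop rows outside 1.5 × IQR) are assumptions, and the `prepare` helper and its data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical LRFMC table; the last row is a deliberate outlier in F and M.
lrfmc = pd.DataFrame({
    "L": [12.0, 30.0, 45.0, 60.0, 75.0, 40.0],
    "R": [1.0, 3.0, 2.0, 5.0, 4.0, 3.0],
    "F": [5.0, 8.0, 12.0, 10.0, 7.0, 400.0],
    "M": [8e3, 1.2e4, 2.0e4, 1.5e4, 1.0e4, 9e5],
    "C": [0.6, 0.7, 0.8, 0.75, 0.65, 0.7],
})


def prepare(df, log_cols=("F", "M"), iqr_factor=1.5):
    """Log-transform, drop IQR outliers, then standardize (assumed order)."""
    out = df.copy()
    cols = list(log_cols)
    # Log transformation compresses the long right tail of skewed columns.
    out[cols] = np.log1p(out[cols])
    # IQR method: keep rows inside [Q1 - k*IQR, Q3 + k*IQR] on every column.
    q1, q3 = out.quantile(0.25), out.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    inside = ((out >= lower) & (out <= upper)).all(axis=1)
    out = out[inside]
    # Standardization: zero mean and unit variance per column.
    return (out - out.mean()) / out.std(ddof=0)


scaled = prepare(lrfmc)
print(scaled.shape)
```

Standardizing matters for k-means because the algorithm relies on Euclidean distance, so an unscaled variable such as M would otherwise dominate the others.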

Modeling

We will use k-means clustering to group the members into customer categories.
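A minimal sketch of this step with scikit-learn's `KMeans` is shown below. The input matrix is synthetic stand-in data, not the airline dataset, and the `n_init` and `random_state` values are illustrative assumptions; five clusters matches the number of clusters analyzed later in this article.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical standardized LRFMC matrix: 5 synthetic groups of 40 members.
centers = rng.normal(scale=4.0, size=(5, 5))
X = np.vstack([c + rng.normal(scale=0.3, size=(40, 5)) for c in centers])

# Fit k-means with 5 clusters and assign each member to a cluster.
model = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = model.fit_predict(X)

print(np.bincount(labels))          # number of members per cluster
print(model.cluster_centers_.shape)  # one 5-dimensional centroid per cluster
```

The rows of `cluster_centers_` are the LRFMC centroids, which is what a per-cluster comparison would be read from.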

Evaluation

Since this is an unsupervised method, there is no ground truth to evaluate against. We can therefore only assess the number of clusters, here with the elbow method, which looks for the point where adding more clusters stops reducing the within-cluster variance noticeably. In some cases the number of segments is instead set by industry convention.
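One way to apply the elbow method is to fit k-means for a range of k values and compare the inertia (within-cluster sum of squares). The sketch below uses synthetic data, and the `inertia_by_k` helper is hypothetical rather than taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical standardized data with 5 synthetic groups of 40 members.
centers = rng.normal(scale=4.0, size=(5, 5))
X = np.vstack([c + rng.normal(scale=0.3, size=(40, 5)) for c in centers])


def inertia_by_k(data, k_values, seed=42):
    """Fit k-means for each k and return {k: inertia} (hypothetical helper)."""
    result = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
        result[k] = model.inertia_
    return result


inertias = inertia_by_k(X, range(1, 9))
for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against k and picking the point where the curve flattens gives the candidate number of clusters; the choice is still a judgment call.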

Result

Customer Member Cluster Visualization

Analysis

Cluster 1

- (L) Length of membership: Medium; shorter than cluster 3 but longer than cluster 4 (~36 months/~3 years).

- (R) Recent Flight: Have not flown for the longest time.

- (F) Flight Count: Lowest flight count; they fly rarely (similar to cluster 4).

- (M) Miles Accumulated: Lowest total flight distance (similar to cluster 4).

- (C) Discount used: Greatest amount of discount used.

Cluster 2

- (L) Length of membership: They have stayed with the company for the shortest time (~24 months/~2 years).

- (R) Recent Flight: Medium; longer ago than cluster 3 but more recent than cluster 5.

- (F) Flight Count: 2nd highest flight count; they fly frequently.

- (M) Miles Accumulated: 2nd greatest total flight distance.

- (C) Discount used: 3rd greatest amount of discount used.

Cluster 3

- (L) Length of membership: Long-term members of the airline, though not as long as cluster 5 (~65 months/~5 years).

- (R) Recent Flight: Have flown most recently.

- (F) Flight Count: Highest flight count; they fly very frequently.

- (M) Miles Accumulated: Greatest total flight distance.

- (C) Discount used: 2nd greatest amount of discount used.

Cluster 4

- (L) Length of membership: Medium; shorter than cluster 1 but longer than cluster 2 (~30 months/~2.5 years).

- (R) Recent Flight: Have not flown for longer than cluster 5, but not as long as cluster 1.

- (F) Flight Count: Lowest flight count; they fly rarely (similar to cluster 1).

- (M) Miles Accumulated: Lowest total flight distance (similar to cluster 1).

- (C) Discount used: Lowest amount of discount used.

Cluster 5

- (L) Length of membership: They have stayed with the company for the longest time (~75 months/~6 years).

- (R) Recent Flight: Medium; longer ago than cluster 2 but more recent than cluster 4.

- (F) Flight Count: 3rd highest flight count; they fly frequently, but not as frequently as cluster 2.

- (M) Miles Accumulated: 3rd greatest total flight distance.

- (C) Discount used: 4th greatest amount of discount used.
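Per-cluster comparisons like the ones above can be derived from a table of cluster means and their ranks. The sketch below shows one way to build such a table; the member rows are invented for illustration and are not the article's results.

```python
import pandas as pd

# Hypothetical members with LRFMC values and an assigned cluster label.
members = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 3, 3],
    "L": [10, 14, 50, 54, 30, 34],
    "R": [9, 11, 2, 4, 5, 7],
    "F": [2, 4, 30, 34, 12, 16],
    "M": [3000, 5000, 48000, 52000, 19000, 21000],
    "C": [0.55, 0.65, 0.85, 0.95, 0.7, 0.8],
})

# Mean of each LRFMC variable per cluster.
profile = members.groupby("cluster")[["L", "R", "F", "M", "C"]].mean()
# Rank 1 marks the cluster with the largest mean for that variable.
ranks = profile.rank(ascending=False).astype(int)

print(profile)
print(ranks)
```

Statements such as "2nd highest flight count" correspond to a rank of 2 in the F column of `ranks`.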

Conclusion

Customers in cluster 3 are the most loyal: they have been members for quite a long period, fly frequently, have the greatest total flight distance, and most of them have flown recently. Most of them also used a large amount of discount, which may have been granted as part of the airline's treatment of its loyal members.
