Then, last year, Varun and I decided to formalize our findings into an approach. Essentially, segmentation is desirable if any of the following holds:
(a) When there are clear business or external-knowledge reasons for segmentation
(b) When data availability/coverage varies across sub-segments
(c) When the model is over-dependent on certain predictors
(d) When the relationships of certain key predictors with the target variable are not stable across sub-pockets of the population
(e) When it is possible to identify patterns in the error terms of the base model
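For criterion (d), one simple way to check stability is to fit the same one-predictor least-squares line separately within each sub-pocket and see whether the slope changes sign. This is a hypothetical sketch, not the test we prescribe: the function names (`ols_slope`, `slopes_by_segment`, `direction_flips`) and the toy data are illustrative assumptions, and the sketch only detects the sign-flip case. Deciding that a relationship "changes significantly" would also need a magnitude or significance threshold chosen by the modeler.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a one-predictor least-squares fit of ys on xs.

    Assumes the predictor varies within the segment (otherwise sxx == 0).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_segment(segments, xs, ys):
    """Fit the same one-predictor model within each segment; return {segment: slope}."""
    groups = defaultdict(lambda: ([], []))
    for s, x, y in zip(segments, xs, ys):
        groups[s][0].append(x)
        groups[s][1].append(y)
    return {s: ols_slope(gx, gy) for s, (gx, gy) in groups.items()}


def direction_flips(slopes):
    """True if the estimated effect has opposite signs in different segments."""
    signs = {slope > 0 for slope in slopes.values() if slope != 0}
    return len(signs) > 1


# Toy data: the same predictor has the opposite relationship in pockets A and B.
segments = ["A"] * 4 + ["B"] * 4
xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 8, 6, 4, 2]
slopes = slopes_by_segment(segments, xs, ys)
print(slopes)                   # {'A': 2.0, 'B': -2.0}
print(direction_flips(slopes))  # True
```

A pooled fit on this toy data would give a slope of zero, hiding two strong but opposite relationships, which is exactly the situation where criterion (d) argues for separate segment-level models.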
Based on this, we recommend the following approach:
Step 1: Is there any business-knowledge or data-availability reason to work on pre-defined segments?
− If YES, go to Step 2
− If NO, go to Step 3
Step 2: Develop segment-level models [refer scenario (b) in Sec. 2] and go to Step 1
Step 3: Build an aggregate model and go to Step 4
Step 4: Is there any binary predictor whose contribution is very high?
− If YES, go to Step 5
− If NO, go to Step 6
Step 5: Develop segment-level models [refer scenario (a) in Sec. 2] and go to Step 4
Step 6: Is there any predictor across whose classes/cut-off values the direction of impact of the remaining predictors on the target variable flips or changes significantly?
− If YES, go to Step 7
− If NO, go to Step 8
Step 7: Develop segment-level models [refer scenario (c) in Sec. 2] and go to Step 4
Step 8: Are there any patterns (based on a classification tree) in the residuals of the aggregate model?
− If YES, go to Step 9
− If NO, go to STOP
Step 9: Develop segment-level models [refer scenario (d) in Sec. 2] and go to Step 4
STOP: No segmentation needed
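The flowchart above can be sketched as a small recursive procedure. This is a hypothetical illustration under stated assumptions, not a reference implementation. The tests of Steps 1, 4, 6 and 8 and the model-fitting routine are passed in as functions (`business_split`, `model_checks`, `fit`) that you would write for your own data. "Go to Step 1" and "go to Step 4" are read as repeating that step inside each newly created segment. A `max_depth` cap (our addition, not part of the approach) guarantees termination. The toy splitter `split_on` and the mean-of-target "model" exist only to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Data = List[Dict[str, Any]]
Parts = Optional[Dict[str, Data]]  # None means the step's test answered NO


@dataclass
class Node:
    """One node of the resulting segmentation tree."""
    name: str
    reason: Optional[str] = None  # step that produced the children, if any
    model: Any = None             # fitted aggregate model, stored on leaves only
    children: List["Node"] = field(default_factory=list)


def run_steps_1_to_3(name: str, data: Data, business_split: Callable[[Data], Parts],
                     model_checks: List[Tuple[str, Callable[[Data, Any], Parts]]],
                     fit: Callable[[Data], Any], depth: int = 0, max_depth: int = 3) -> Node:
    # Step 1: any business-knowledge or data-availability reason to split?
    parts = business_split(data) if depth < max_depth else None
    if parts:
        # Step 2: one branch per pre-defined segment, then back to Step 1 in each.
        node = Node(name, reason="Step 2: pre-defined segments")
        node.children = [run_steps_1_to_3(f"{name}/{key}", sub, business_split,
                                          model_checks, fit, depth + 1, max_depth)
                         for key, sub in parts.items()]
        return node
    # Step 3: build an aggregate model, then continue at Step 4.
    return run_steps_4_to_9(name, data, model_checks, fit, depth, max_depth)


def run_steps_4_to_9(name: str, data: Data, model_checks, fit, depth: int,
                     max_depth: int) -> Node:
    model = fit(data)  # aggregate model for this (sub)population
    if depth < max_depth:
        # model_checks holds the tests of Steps 4, 6 and 8, in that order;
        # the first one answering YES triggers the matching Step 5, 7 or 9.
        for reason, check in model_checks:
            parts = check(data, model)
            if parts:
                node = Node(name, reason=reason)
                node.children = [run_steps_4_to_9(f"{name}/{key}", sub, model_checks,
                                                  fit, depth + 1, max_depth)
                                 for key, sub in parts.items()]
                return node
    # STOP: no (further) segmentation needed; keep the aggregate model.
    return Node(name, model=model)


def leaves(node: Node) -> List[Node]:
    """Final segments, left to right."""
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]


# Toy usage. split_on is a hypothetical stand-in: it splits whenever `key`
# takes more than one value, ignoring the model entirely.
def split_on(key):
    def splitter(data, model=None):
        groups: Dict[str, Data] = {}
        for row in data:
            groups.setdefault(str(row[key]), []).append(row)
        return groups if len(groups) > 1 else None
    return splitter


def mean_model(data):
    return sum(r["y"] for r in data) / len(data)


rows = [{"region": "N", "flag": 0, "y": 1.0}, {"region": "N", "flag": 1, "y": 3.0},
        {"region": "S", "flag": 0, "y": 2.0}, {"region": "S", "flag": 0, "y": 4.0}]
tree = run_steps_1_to_3("all", rows, split_on("region"),
                        [("Step 5: dominant binary predictor", split_on("flag"))],
                        mean_model)
print([(n.name, n.model) for n in leaves(tree)])
# [('all/N/0', 1.0), ('all/N/1', 3.0), ('all/S', 3.0)]
```

Because the first check that answers YES wins, the order of `model_checks` encodes the Step 4 → 6 → 8 sequence; reordering the list would change the approach.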
I look forward to your comments.
--datamining_guy