Monday, October 4, 2010

Developing a Segmented Model

As a modeler, I have always had to deal with the need to develop and justify a segmented model. There is always a tension between developing a single, richer model that leverages all the data and a suite of models that each explain their niche well but use less data and so raise stability concerns. Over the years I have developed a number of best practices around it.

Then last year Varun and I decided to formalize our findings in the form of an approach. Essentially, segmentation is desirable if any one of the following holds:

(a)   When there is a clear business or external-knowledge reason for segmentation
(b)   When data availability/coverage varies across sub-segments
(c)   When the model is overly dependent on certain predictors
(d)   When the relationship of certain key predictors with the target variable is not stable across sub-pockets of the population
(e)   When it is possible to identify patterns in the error terms of the base model
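
To make criterion (d) a little more concrete, here is a minimal sketch of one way to spot an unstable relationship: fit a one-predictor least-squares slope within each sub-pocket and see whether the sign changes. This is only an illustration, not the exact check we use. The function names, the column names and the toy rows are all made up for the example, and a real check would also look at the size of the change and its statistical significance, not just the sign.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant within this segment")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_segment(rows, segment_key, x_key, y_key):
    """Group rows by segment and fit one slope per segment."""
    groups = {}
    for row in rows:
        groups.setdefault(row[segment_key], []).append(row)
    return {
        seg: slope([r[x_key] for r in grp], [r[y_key] for r in grp])
        for seg, grp in groups.items()
    }


def direction_flips(slopes):
    """True if the non-zero slopes do not all share the same sign."""
    signs = {s > 0 for s in slopes.values() if s != 0}
    return len(signs) > 1


# Toy data (invented for illustration): the predictor helps in one
# segment and hurts in the other.
toy_rows = [
    {"segment": "new", "x": 1, "y": 1},
    {"segment": "new", "x": 2, "y": 2},
    {"segment": "new", "x": 3, "y": 3},
    {"segment": "old", "x": 1, "y": 3},
    {"segment": "old", "x": 2, "y": 2},
    {"segment": "old", "x": 3, "y": 1},
]

toy_slopes = slopes_by_segment(toy_rows, "segment", "x", "y")
flip = direction_flips(toy_slopes)  # True for this toy data
```

A single pooled fit on these six rows would show a slope of zero, which is exactly the situation where one aggregate model hides two opposite relationships.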

Based on this, we recommend the following approach:

Step 1:   Are there any business knowledge or data availability reasons to work on pre-defined segments?
    If YES, go to Step 2
    If NO, go to Step 3
Step 2:   Develop segment-level models [refer to scenario (b) in Sec. 2] and go to Step 1
Step 3:   Build an aggregate model and go to Step 4
Step 4:   Is there any binary predictor whose contribution is very high?
    If YES, go to Step 5
    If NO, go to Step 6
Step 5:   Develop segment-level models [refer to scenario (a) in Sec. 2] and go to Step 4
Step 6:   Is there any predictor across whose classes/cut-off values the direction of impact of the remaining predictors on the target variable flips or changes significantly?
    If YES, go to Step 7
    If NO, go to Step 8
Step 7:   Develop segment-level models [refer to scenario (c) in Sec. 2] and go to Step 4

Step 8:   Are there any patterns (based on a classification tree) in the residuals of the aggregate model?
    If YES, go to Step 9
    If NO, go to STOP
Step 9:   Develop segment-level models [refer to scenario (d) in Sec. 2] and go to Step 4

STOP:    No segmentation needed
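
For readers who prefer code to flowcharts, the steps above can be sketched as a small function. This is a rough sketch under a few assumptions: the four yes/no answers are supplied by the modeler (the code does not compute them), the `Diagnostics` class and `next_step` function are hypothetical names introduced only for this example, and each call covers a single pass through the flow. After building segment-level models, you would re-answer the questions for each segment and call it again.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    """Yes/no answers to the four questions in the approach (illustrative names)."""
    predefined_segments: bool        # Step 1
    dominant_binary_predictor: bool  # Step 4
    direction_flip: bool             # Step 6
    residual_pattern: bool           # Step 8


def next_step(d: Diagnostics) -> str:
    """Return the next action for one pass through the flow."""
    if d.predefined_segments:
        return "Step 2: develop segment-level models, then go to Step 1"
    # Step 3: the remaining questions assume an aggregate model has been built.
    if d.dominant_binary_predictor:
        return "Step 5: develop segment-level models, then go to Step 4"
    if d.direction_flip:
        return "Step 7: develop segment-level models, then go to Step 4"
    if d.residual_pattern:
        return "Step 9: develop segment-level models, then go to Step 4"
    return "STOP: no segmentation needed"


# Example: no pre-defined segments and no dominant binary predictor,
# but the impact of other predictors flips across some predictor's classes.
action = next_step(Diagnostics(False, False, True, False))
```

The order of the `if` checks mirrors the order of the steps, so an earlier question always takes priority over a later one.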


I look forward to your comments.

--datamining_guy
