Hundreds of thousands of policies per year, each with scores of attributes, make manual examination very hard. Further, renewal behavior varies substantially with vehicle type and customer location, so an ensemble of prediction models is required.
Data was extracted from the internal CRM and other systems and blended into a single dataset. Substantial cleanup was required, mainly filling in or removing missing values. New attributes were generated using domain knowledge. In total, there were hundreds of thousands of rows and hundreds of columns. Attributes were ranked by their relative contribution to renewal, the most promising ones were shortlisted, and multiple model-building methods were run on the shortlisted data. Models were also built at the individual product level and the individual geography level. Explainability of patterns was a primary consideration for the customer, so rule-based and tree-based classifiers were preferred. A side benefit was that the models gave a structured basis for adjusting the data capture strategy.
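The shortlisting step can be illustrated with a minimal sketch. It assumes that an attribute's "relative contribution" is measured by information gain against the renewal outcome, which is the criterion many tree-based classifiers use; the write-up does not say which measure was actually applied. The function names, attribute names, and the four toy records are hypothetical and are not taken from the project data.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Weighted average entropy of the target within each attribute value.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def shortlist(rows, attributes, target, top_k):
    """Return the top_k attributes ranked by information gain, highest first."""
    ranked = sorted(
        attributes,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )
    return ranked[:top_k]


# Toy, made-up records (illustrative only).
policies = [
    {"vehicle_type": "two_wheeler", "region": "north", "prior_claim": "yes", "renewed": "no"},
    {"vehicle_type": "two_wheeler", "region": "south", "prior_claim": "no", "renewed": "no"},
    {"vehicle_type": "car", "region": "north", "prior_claim": "no", "renewed": "yes"},
    {"vehicle_type": "car", "region": "south", "prior_claim": "no", "renewed": "yes"},
]

candidates = ["vehicle_type", "region", "prior_claim"]
top_attributes = shortlist(policies, candidates, "renewed", top_k=2)
print(top_attributes)
```

In this toy data, vehicle_type separates renewals perfectly and region carries no information, so region is the attribute that drops out of the shortlist. Running the same ranking on the subset of rows for a single product or geography would be one way to approach the per-segment models described above, although the actual procedure used is not stated.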