Twice a year, I attend meetings of the Casualty Actuarial Society, where actuaries gather to discuss a wide variety of topics. Some are business-related, but others delve into predictive modeling. Under the latter category, one frequently discussed question is: should actuaries build separate models for frequency and severity, or should they build a single model for loss costs?

Most programming languages and commonly available software packages can handle either approach. Generally, a logistic model or a Poisson generalized linear model (GLM) is appropriate for frequency, and a gamma GLM works for severity. Recently, actuaries have modeled loss costs directly using the Tweedie distribution. What is a Tweedie distribution? Here’s a common description: If claims occur through a Poisson process and each loss is gamma-distributed, then the total dollars of loss are Tweedie-distributed. That description makes it sound as if frequency and severity are independent. But if you dig into the equations, you’ll find that’s not the case; in fact, a Tweedie GLM implicitly assumes that predictors of loss simultaneously increase or decrease both claim frequency and claim size. (This follows from the assumption of a dispersion, or scale, parameter that is constant across the entire data set.)
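The compound Poisson-gamma construction described above is easy to see in a simulation. The sketch below (illustrative only, with made-up parameter values) draws a Poisson claim count for each policy and then sums gamma-distributed claim sizes, producing aggregate losses with the characteristic Tweedie shape: a point mass at zero plus a continuous right tail.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_tweedie_losses(n_policies, freq_mean, sev_shape, sev_scale, rng):
    """Simulate per-policy aggregate losses as a compound Poisson-gamma
    (i.e., Tweedie) draw: a Poisson claim count, with each claim's size
    drawn independently from a gamma distribution."""
    counts = rng.poisson(freq_mean, size=n_policies)
    losses = np.array([rng.gamma(sev_shape, sev_scale, size=k).sum()
                       for k in counts])
    return counts, losses

# Hypothetical book of business: 10,000 policies, 10% claim frequency,
# gamma severity with mean 2.0 * 500 = 1,000.
counts, losses = simulate_tweedie_losses(
    n_policies=10_000, freq_mean=0.1, sev_shape=2.0, sev_scale=500.0, rng=rng)

print(losses.mean())         # close to 0.1 * 1,000 = 100 (the pure premium)
print((counts == 0).mean())  # point mass at zero, close to exp(-0.1) ~ 0.905
```

Note that the simulated mean loss cost is simply frequency times severity; it is the variance structure, not the mean, where the Tweedie assumptions bite.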

In reality, frequency and severity are probably correlated. We often think of particular loss types as either high-frequency/low-severity (think auto glass breakage) or low-frequency/high-severity (think hurricanes). So we often have situations in which a particular variable affects frequency and severity in opposite directions.

Actuaries who use a frequency/severity approach typically multiply the results of the two models, thus assuming no correlation between frequency and severity, while actuaries who use a Tweedie GLM must assume some positive correlation. There are more sophisticated approaches for the intrepid actuary. A “double GLM” estimates expected loss costs with one model and variance of expected loss costs with another model. Therefore, a double GLM doesn’t make any assumption about the correlation of frequency and severity; it lets the data do the talking.
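The "multiply the two models" step is simple arithmetic on fitted relativities. A minimal sketch, with hypothetical numbers for a single rating variable that pushes frequency and severity in opposite directions:

```python
# Hypothetical fitted relativities for one rating variable (illustrative only):
# the factor raises claim frequency but lowers average claim size.
freq_relativity = 1.20   # 20% more claims than the base class
sev_relativity = 0.90    # 10% smaller claims than the base class

# The frequency/severity approach multiplies the two fitted effects,
# implicitly treating them as independent.
loss_cost_relativity = freq_relativity * sev_relativity
print(round(loss_cost_relativity, 4))  # 1.08: opposite effects partially offset
```

A Tweedie GLM, by contrast, would fit a single loss cost relativity directly and could not decompose it into these two offsetting pieces.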

But how much does any of this matter in reality? One could compare the results obtained from each of these modeling approaches. If they differ meaningfully, then validate each approach’s results on holdout data and choose the one that fits best. But if they don’t differ meaningfully, then just use the simplest modeling approach. For that I would nominate the Tweedie GLM. It requires building only one model instead of two, saving some time. A Tweedie GLM also makes it easier to control for factors that are not modeled, since we generally express such factors in terms of loss cost relativities, not frequency and severity relativities.

Of course, there are always two sides in any debate. The best argument for frequency/severity modeling is that having separate models allows for increased insight into the reason certain factors are driving losses. There’s also the matter of estimating the Tweedie “p” parameter, which relates to the coefficient of variation of the losses. There are many ways to estimate it, but our investigations show that model results don’t differ much over a wide range of feasible values for “p,” so it’s not much of an issue.
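The connection between "p" and the coefficient of variation can be made concrete. Under the standard compound Poisson-gamma parameterization, the Tweedie variance power satisfies p = (α + 2) / (α + 1), where α is the gamma severity shape parameter, so p always falls strictly between 1 and 2; and since a gamma distribution's coefficient of variation is 1/√α, more volatile severities push p toward 2. A short sketch of that mapping:

```python
import math

def tweedie_power_from_gamma_shape(alpha):
    """For a compound Poisson-gamma (Tweedie) distribution, the variance
    power p is tied to the gamma severity shape alpha by
    p = (alpha + 2) / (alpha + 1), which keeps p strictly between 1 and 2."""
    return (alpha + 2.0) / (alpha + 1.0)

for alpha in (0.5, 1.0, 2.0, 10.0):
    cv = 1.0 / math.sqrt(alpha)  # gamma severity coefficient of variation
    p = tweedie_power_from_gamma_shape(alpha)
    print(f"shape={alpha:5.1f}  severity CV={cv:.2f}  p={p:.3f}")
```

The table this prints shows why a wide range of "p" values can be feasible: very different severity shapes map into a fairly narrow band of powers, consistent with the observation that model results are not very sensitive to the choice.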

Do you have differing opinions or insights? If so, please post your comments here, or send me an e-mail at [email protected].