Credibility (20-25%)
1. Apply limited fluctuation (classical) credibility including criteria for both full and partial credibility.
Setup: several kinds of problems:
I.a. Past periods are labeled 1,2,\ldots, n; there are X_j claims (or losses) in each period j.
I.b. Different policies are labeled 1,2,\ldots, n and each X_j is the claim or loss for each policy j.
Assume that
E(X_j) = \xi for every j. (Stable mean)
Var(X_j) = \sigma^2 for every j. (Stable variance)
Past experience is \overline{X} = (X_1+ \cdots + X_n)/n.
Then E(\overline{X})= \xi, Var(\overline{X}) = \sigma^2/n.
Define manual premium to be M, some value for the mean that's determined from other past experiences.
Goal: Determine what the new premium should be (somewhere between M and \overline{X}, the observed estimate of \xi).
Full Credibility: Choose r close to 0, p close to 1. We would like the RV \overline{X} to be "stable", i.e., that the difference between \overline{X} and \xi is small, most of the time. More precisely,
Prob( -r\xi \leq \overline{X}-\xi \leq r\xi) \geq p.
Rewrite as
Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}} \right| \leq \frac{r\xi\sqrt{n}}{\sigma}) \geq p.
This holds whenever \frac{r\xi\sqrt{n}}{\sigma} \geq y_p, where
Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}} \right| \leq y_p) = p;
we call this the full credibility standard for the given r and p values:
n \geq \lambda_0 \left(\frac{\sigma}{\xi}\right)^2,\qquad \text{in terms of # of exposures/policies (I.a), or in terms of # of claims (I.b),}
where \lambda_0 = (y_p/r)^2.
We actually find y_p by assuming \overline{X} is approximately normal with mean \xi and variance \sigma^2/n; then
p = Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}} \right| \leq y_p) \approx Prob( \left| Z \right| \leq y_p) = 2\Phi(y_p)-1,
where \Phi is the cdf for the standard normal distribution. Then y_p is such that
\Phi(y_p) = (p+1)/2.
The quantity \lambda_0 \left(\frac{\sigma}{\xi}\right)^2 is what the notes call the Standard for Full Credibility for Severity (in situation I.b).
In practice, we have to estimate \xi and \sigma from the data we're given. (\sigma^2 is estimated with the unbiased "n-1" sample variance.)
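As a rough numerical check of the full credibility standard, here is a minimal Python sketch that finds y_p from \Phi(y_p) = (p+1)/2, then \lambda_0 = (y_p/r)^2, then the minimum n. The function names, the sample data, and the values r = 0.05, p = 0.90 are illustrative assumptions, not something taken from the notes.

```python
from statistics import NormalDist, mean, stdev

def lambda_0(r, p):
    """lambda_0 = (y_p / r)^2, where Phi(y_p) = (1 + p) / 2."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    return (y_p / r) ** 2

def full_credibility_n(r, p, xi, sigma):
    """Right-hand side of n >= lambda_0 * (sigma / xi)^2 (not rounded up)."""
    return lambda_0(r, p) * (sigma / xi) ** 2

# Illustrative values only: r = 0.05, p = 0.90 gives lambda_0 of about 1082.
lam0 = lambda_0(0.05, 0.90)

# Hypothetical data; xi and sigma are estimated from it (stdev uses the n-1 divisor).
data = [120.0, 80.0, 100.0, 95.0, 105.0]
n_needed = full_credibility_n(0.05, 0.90, mean(data), stdev(data))
```

With these made-up numbers the sample mean is 100 and the sample variance is 212.5, so the standard comes out to roughly 23 observations.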
II. Suppose we have more information in the case of I.a.: each X_j is compound Poisson distributed, i.e., X_j = Y_{j,1}+\cdots+Y_{j,N_j}, where each N_j is Poisson with parameter \lambda and the Y's are the individual claim amounts (iid and independent of N_j), with mean \theta_Y and variance \sigma_Y^2.
Full Credibility on the average number of claims:
The RV to consider here is N_j:
\xi = E(N_j) = \lambda, \qquad \sigma^2 = Var(N_j) = \lambda,
so full credibility for the average number of claims is
n \geq \lambda_0 \left(\frac{\sqrt{\lambda}}{\lambda}\right)^2= \lambda_0/\lambda, \qquad \text{in terms of # of policies,}
n\lambda \geq \lambda_0, \qquad \text{in terms of # of expected claims.}
The quantity \lambda_0 is what the notes call the Standard for Full Credibility for Frequency.
In practice, we estimate \lambda by the number of claims in the data divided by the number of policies, and \theta_Y and \sigma_Y from the observed claim amounts. (\sigma_Y^2 is estimated with the unbiased "n-1" sample variance.)
Full Credibility on the average total payment:
The RV to consider here is X_j,
\xi = E(X_j) = \lambda\theta_Y, \qquad \sigma^2 = Var(X_j) = \lambda(\theta_Y^2+\sigma_Y^2),
so full credibility for the average total payment is
n \geq \lambda_0 \frac{\lambda(\theta_Y^2+\sigma_Y^2)}{ \lambda^2\theta_Y^2}= \frac{\lambda_0}{\lambda} \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right), \qquad \text{in terms of # of policies,}
n\lambda \geq \lambda_0 \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right), \qquad \text{in terms of # of expected claims,}
n\lambda\theta_Y \geq \lambda_0 \left(\theta_Y+\frac{\sigma_Y^2}{\theta_Y}\right), \qquad \text{in terms of # of expected total dollars of claims.}
The middle quantity \lambda_0 \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right) is what the notes call the Standard for Full Credibility for Pure Premium.
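The three versions of the pure premium standard (policies, expected claims, expected dollars) differ only by factors of \lambda and \theta_Y. A small Python sketch makes that explicit. The function name and all input values are hypothetical; \lambda_0 = 1082.41 is simply (1.645/0.05)^2.

```python
def standards_compound_poisson(lam0, lam, theta_y, sigma_y):
    """Full credibility standards for pure premium under compound Poisson claims."""
    cv2 = (sigma_y / theta_y) ** 2                 # squared coefficient of variation of severity
    expected_claims = lam0 * (1 + cv2)             # n * lambda >= this
    policies = expected_claims / lam               # n >= this
    expected_dollars = expected_claims * theta_y   # n * lambda * theta_Y >= this
    return policies, expected_claims, expected_dollars

# Hypothetical inputs: lambda_0 = 1082.41, lambda = 0.5, theta_Y = 1000, sigma_Y = 2000.
policies, claims, dollars = standards_compound_poisson(1082.41, 0.5, 1000.0, 2000.0)
```

Here the severity coefficient of variation is 2, so the standard in expected claims is 5 times \lambda_0.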
Partial Credibility: Let M be the manual premium, and let P_c = Z\overline{X} + (1-Z) M be the credibility premium. Note that in the standard for full credibility,
n \geq \lambda_0 \left(\frac{\sigma}{\xi}\right)^2 \Leftrightarrow Var(\overline{X}) = \frac{\sigma^2}{n} \leq \frac{\xi^2}{\lambda_0},
so we can set the variance of P_c to be the same as \xi^2/\lambda_0:
\frac{\xi^2}{\lambda_0} = Var(P_c) = Z^2 Var(\overline{X}) = Z^2 \frac{\sigma^2}{n}.
So
Z = \frac{\xi\sqrt{n}}{\sigma\sqrt{\lambda_0}}=\sqrt{\frac{n}{\lambda_0\frac{\sigma^2}{\xi^2}}} = \text{Sq root of the ratio of the actual count to the count required for full credibility}.
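The square-root rule is easy to sketch in Python. The function names and numbers below are made up for illustration, and capping Z at 1 once the full credibility standard is reached is the usual convention, stated here as an assumption.

```python
from math import sqrt

def partial_credibility_z(n, n_full):
    """Square-root rule, capped at 1 once the full credibility standard is met."""
    return min(1.0, sqrt(n / n_full))

def credibility_premium(z, x_bar, manual):
    """P_c = Z * x_bar + (1 - Z) * M."""
    return z * x_bar + (1 - z) * manual

# Hypothetical values: 400 observed against a standard of 1600.
z = partial_credibility_z(400, 1600)          # sqrt(1/4) = 0.5
p_c = credibility_premium(z, 300.0, 250.0)    # 0.5 * 300 + 0.5 * 250 = 275.0
```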
2. Perform Bayesian analysis using both discrete and continuous models.
Know the following basic formulas about conditional expectation: Let f_{X,Y}(x,y) be the joint pdf for two RVs X and Y, with marginal pdfs f_X(x) and f_Y(y). [The whole discussion applies to discrete RVs as well.] Then
\text{conditional pdf of }X \text{ given }Y = f_{X|Y}(x|y) =\frac{f_{X,Y}(x,y)}{f_Y(y)}.
\text{If }X \text{ and }Y \text{ are independent, then } f_{X,Y}(x,y) = f_X(x)f_Y(y);
so in the case of independence, the conditional and marginal distributions are the same.
Switching the definitions and formulas around a bit, we have
f_X(x) = \int f_{X|Y}(x|y)f_Y(y)\,dy.
As the result of this formula we can deduce that
If X|Y \sim Poisson(Y) and Y\sim \Gamma(\alpha, \beta), then X\sim NegBinomial(\alpha,\beta).
If X|Y \sim Normal(Y,\sigma_1^2) and Y\sim Normal(\mu,\sigma_2^2 ), then X\sim Normal(\mu,\sigma_1^2+\sigma_2^2).
From the formula above we can exchange the roles of X and Y to get
f_{X,Y}(x,y) = f_{X|Y}(x|y)f_Y(y) = f_{Y|X}(y|x)f_X(x),
yielding Bayes's Theorem:
f_{X|Y}(x|y) = \frac{ f_{Y|X}(y|x)f_X(x)}{f_Y(y)}.
We can also define the conditional expectation (which is a random variable that is a function of Y):
E(X|Y) = \int x f_{X|Y}(x|y) \, dx.
It can be shown that
E(E(X|Y)) = \int E(X|Y=y) f_Y(y) \, dy = E(X).
This can be generalized to any functions of X, Y:
E(E(h(X,Y)|Y)) = \int \int h(x,y) f_{X|Y}(x|y)\,dx f_Y(y) \, dy = \int\int h(x,y) f_{X,Y}(x,y)\,dx\,dy=E(h(X,Y)).
Getting into variances,
Var(X|Y) = E( (X-E(X|Y))^2 | Y) = E(X^2|Y) - E(X|Y)^2.
Quite a bit of derivation later we can show that
Var(X) = E(Var(X|Y)) + Var(E(X|Y)).
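One way to convince yourself of the double expectation and variance decomposition formulas is to check them on a small discrete example with exact fractions. The joint pmf below is invented purely for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): keys are (x, y), values are probabilities.
joint = {(0, 1): F(3, 10), (1, 1): F(1, 5), (0, 2): F(1, 10), (2, 2): F(2, 5)}

def mean_var(dist):
    """Mean and variance of a pmf given as {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return m, sum((x - m) ** 2 * p for x, p in dist.items())

# Marginal pmfs of Y and X.
f_y, f_x = {}, {}
for (x, y), p in joint.items():
    f_y[y] = f_y.get(y, 0) + p
    f_x[x] = f_x.get(x, 0) + p

# Conditional mean and variance of X for each value y of Y.
cond = {}
for y0, py in f_y.items():
    cond_pmf = {x: p / py for (x, y), p in joint.items() if y == y0}
    cond[y0] = mean_var(cond_pmf)

e_of_var = sum(cond[y][1] * f_y[y] for y in f_y)                 # E(Var(X|Y))
e_x = sum(cond[y][0] * f_y[y] for y in f_y)                      # E(E(X|Y))
var_of_e = sum((cond[y][0] - e_x) ** 2 * f_y[y] for y in f_y)    # Var(E(X|Y))
mean_x, var_x = mean_var(f_x)                                    # computed directly
```

For this pmf, E(Var(X|Y)) = 11/25 and Var(E(X|Y)) = 9/25, which add up to Var(X) = 4/5.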
Now the Bayesian analysis. Setup:
\theta represents a risk parameter with a pdf \pi(\theta) that describes the risk characteristics within a population. Call \pi(\theta) the prior distribution.
\vec{X}=(X_1, \ldots, X_n) represent the claim amount that already occurred in the past; X_{n+1} is the next one we want to predict. Each has a conditional pdf f_{X_j|\theta}(x_j|\theta) (they may be all the same if X_j's are identically distributed). Assume the X_js are independent conditional on \theta. Then
a) Find the joint density function, by plugging in the observed data x_j:
f_{\vec{X},\theta}(\vec{x},\theta)= \prod_{j=1}^n f_{X_j|\theta}(x_j|\theta) \pi(\theta).
b) Find the marginal distribution by integrating out \theta:
f_{\vec{X}}(\vec{x})= \int_\theta f_{\vec{X},\theta}(\vec{x},\theta)\,d\theta.
c) Find the posterior density of \Theta given \vec{X}:
\pi_{\Theta|\vec{X}}(\theta|\vec{x}) = \frac{f_{\vec{X},\theta}(\vec{x},\theta)}{f_{\vec{X}}(\vec{x})}.
We can sometimes avoid evaluating the integral in b) by looking at the numerator in c), and see if it's a known distribution; then b) must be the appropriate constant that makes c) a density function.
d) Then the conditional density of X_{n+1}, given \vec{X} (the predictive distribution), is
f_{X_{n+1}|\vec{X}}(x_{n+1}|\vec{x}) = \frac{f_{X_{n+1},\vec{X}}(x_{n+1},\vec{x})}{f_{\vec{X}}(\vec{x})} = \frac{\int \prod_{j=1}^{n+1} f_{X_j|\theta}(x_j|\theta) \pi(\theta)\,d\theta }{f_{\vec{X}}(\vec{x})} = \int f_{X_{n+1}|\Theta}(x_{n+1}|\theta) \pi_{\Theta|\vec{X}}(\theta|\vec{x}) \,d\theta.
The next step is to find the expected values of X_{n+1}. We can do this two ways:
a) Find the hypothetical mean \mu(\theta), i.e.,
\mu(\theta)=E(X_{n+1}|\Theta=\theta) = \int x_{n+1} f_{X_{n+1}|\theta}(x_{n+1}|\theta) \,dx_{n+1}.
The pure premium \mu (which does not depend on prior observations \vec{x}, or the risk parameter \theta) is the mean of the hypothetical means:
\mu = E(E(X_{n+1}|\Theta)) = E(\mu(\Theta)) = \int \mu(\theta) \pi(\theta)\,d\theta.
b) Find the Bayesian premium, which is the mean of the predictive distribution:
E(X_{n+1}|\vec{X}=\vec{x}) = \int x_{n+1} f_{X_{n+1}|\vec{X}}(x_{n+1}|\vec{x}) \,dx_{n+1} =\ldots = \int \mu(\theta) \pi_{\Theta|\vec{X}}(\theta|\vec{x})\,d\theta.
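The steps above can be sketched for a discrete prior, where the integrals become sums. Everything in this Python example is a made-up illustration: two risk classes with per-period claim probabilities 1/10 and 1/2, prior weights 3/4 and 1/4, and two observed periods.

```python
from fractions import Fraction as F

# Hypothetical two-class population: theta is the per-period claim probability.
prior = {F(1, 10): F(3, 4), F(1, 2): F(1, 4)}   # pi(theta)
observed = [1, 0]                                # x_1, ..., x_n, each 0 or 1

def likelihood(x, theta):
    """f(x | theta) for a Bernoulli claim indicator."""
    return theta if x == 1 else 1 - theta

# a) joint density at the observed data, for each theta
joint = {}
for theta, pi in prior.items():
    prob = pi
    for x in observed:
        prob *= likelihood(x, theta)
    joint[theta] = prob

# b) marginal density of the data (a sum replaces the integral)
marginal = sum(joint.values())

# c) posterior distribution of theta
posterior = {theta: j / marginal for theta, j in joint.items()}

# The hypothetical mean is mu(theta) = theta for a Bernoulli variable.
pure_premium = sum(theta * pi for theta, pi in prior.items())
bayes_premium = sum(theta * post for theta, post in posterior.items())
```

In this example the pure premium is 1/5, while the Bayesian premium after observing one claim in two periods rises to 19/65.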
3. Apply Bühlmann and Bühlmann-Straub models and understand the relationship of these to the Bayesian model.
If the Bayesian premium is difficult to evaluate, an alternative is to calculate the credibility premium instead, that is, find \alpha_0, \ldots, \alpha_n such that the sum Y=\alpha_0+\sum_{j=1}^n \alpha_j X_j minimizes the squared error loss:
Q = E\left[ (\mu_{n+1}(\Theta) - Y )^2 \right],
where the expectation is over all possible joint values of \theta and X_j.
[Side note: the same solution of \alpha_j's also minimizes E\left[ (E(X_{n+1}|\vec{X}) - Y )^2 \right] and E\left[ (X_{n+1} - Y )^2 \right].]
It's messy if the means and variances of X_j are all different, but here are two simplified situations:
a) The Bühlmann model: All the X_js have the same mean and variance and are iid conditional on \Theta. Define the hypothetical mean and process variance as follows, respectively:
\mu(\theta) = E( X_j |\theta), \qquad v(\theta) = Var(X_j | \theta).
Let \mu, v, a be the expected value of the hypothetical means (EHM), expected value of the process variance (EPV), and variance of the hypothetical means (VHM):
\mu = E(\mu(\theta)), \qquad v= E(v(\theta)), \qquad a = Var(\mu( \theta)).
Then we can show that the sum Y (the credibility premium) above can be written as Z\overline{X} + (1-Z)\mu, where
Z=\frac{n}{n+k}, \qquad k = \frac{v}{a} = \frac{\text{EPV}}{\text{VHM}}.
Z is called the Bühlmann credibility factor.
b) The Bühlmann-Straub model: Same except now the variance looks like
Var(X_j|\theta) = \frac{v(\theta)}{m_j};
the factors m_j take into account things that change every year, like the # of individuals in the group in year j, or the premium income for the policy in year j. Let m=\sum_j m_j.
Define \mu, v, a exactly as above (not involving the m_j's). Then the credibility premium becomes Z\overline{X} + (1-Z)\mu where
Z=\frac{m}{m+k}, \qquad k = \frac{v}{a} = \frac{\text{EPV}}{\text{VHM}}, \qquad \overline{X} = \sum_{j=1}^n \frac{m_j}{m} X_j.
The questions seem to come in this flavor: Suppose in year j there are N_j claims from m_j policies, for j = 1, \ldots, n. The claim count of each individual policy has the Poisson distribution with parameter \Theta, and the parameter itself has the gamma distribution with parameters \alpha and \beta. Estimate the number of claims in year n+1 if there will be m_{n+1} policies.
Here we let X_j=N_j/m_j. The N_j claims from m_j policies are given in the data of the problem, so we can compute X_j=N_j/m_j and the weighted average \overline{X} = \sum_j (m_j/m) X_j = (\sum_j N_j)/m.
Each individual policy has a Poisson claim count, so E(X_j | \Theta) = \Theta and Var(X_j |\Theta) = \Theta/m_j, i.e., \mu(\Theta) = v(\Theta) = \Theta, and
\mu = E(\Theta) = \alpha\beta, \qquad v=E(\Theta) =\alpha\beta, \qquad a = Var(\Theta) = \alpha\beta^2.
[Expectation and variance are taken with respect to the Gamma distribution.]
Then k=v/a, Z=m/(m+k), and the credibility estimate for X_{n+1} (# of claims in year n+1 by one insured) is Z\overline{X} + (1-Z)\mu, so the number of claims in year n+1 is that multiplied by m_{n+1}.
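A minimal Python sketch of this type of question follows. The function name and the claim and policy counts are hypothetical, and \beta is treated as the gamma scale parameter so that E(\Theta) = \alpha\beta, as in the formulas above. Note that \overline{X} is the exposure-weighted average, i.e. total claims divided by total policies.

```python
def buhlmann_straub_poisson_gamma(claims, policies, alpha, beta, next_policies):
    """Credibility estimate of next year's claim count (gamma prior with mean alpha*beta)."""
    m = sum(policies)                 # total exposure m = sum of m_j
    x_bar = sum(claims) / m           # exposure-weighted average of X_j = N_j / m_j
    mu = alpha * beta                 # EHM
    v = alpha * beta                  # EPV
    a = alpha * beta ** 2             # VHM
    k = v / a                         # equals 1 / beta in this model
    z = m / (m + k)
    per_policy = z * x_bar + (1 - z) * mu
    return z, per_policy, per_policy * next_policies

# Hypothetical data: three years of claim counts N_j and policy counts m_j.
z, per_policy, expected = buhlmann_straub_poisson_gamma(
    claims=[10, 14, 12], policies=[100, 120, 80], alpha=2.0, beta=0.05,
    next_policies=110,
)
```

With these numbers m = 300, \overline{X} = 0.12, k = 20, Z = 0.9375, and the estimate is about 0.11875 claims per policy, or roughly 13.06 claims for 110 policies.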
4. Apply conjugate priors in Bayesian analysis and in particular the Poisson-gamma model.
If a prior distribution gives rise to a posterior distribution that's in the same family (maybe a different parameter), it's called a conjugate prior distribution. An example is the Poisson-Gamma model.
Setup:
Prior distribution \Theta \sim Gamma(\alpha,\beta'), where \beta' = 1/\theta is the rate parameter and \theta is the scale parameter used in the appendix.
The observations X_j \sim Poisson(\Theta).
Then we can show that the posterior distribution is Gamma(\alpha+ \sum_j x_j, \beta' + n), so the expected value of the posterior distribution is
\frac{\alpha+\sum_j x_j}{\beta'+n}.
In situation I.a., this is (\alpha + the total # of claims over the n years) divided by (\beta' + the number n of years observed).
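The conjugate update is a one-liner, sketched below in Python with a hypothetical prior and hypothetical claim counts. The last few lines also check that the posterior mean can be written as a credibility-weighted average with Z = n/(n+\beta'), which is what the Bühlmann formula gives for this model since k = v/a = \beta'.

```python
def poisson_gamma_posterior(alpha, beta_rate, observations):
    """Posterior Gamma(alpha + sum x_j, beta' + n) and its mean; beta' is a rate."""
    alpha_post = alpha + sum(observations)
    beta_post = beta_rate + len(observations)
    return alpha_post, beta_post, alpha_post / beta_post

# Hypothetical prior Gamma(alpha = 2, beta' = 4) and three years of claim counts.
alpha_post, beta_post, post_mean = poisson_gamma_posterior(2, 4, [1, 0, 3])

# Same number as a credibility-weighted average of the sample mean and prior mean.
n, x_bar, prior_mean = 3, 4 / 3, 2 / 4
z = n / (n + 4)
weighted = z * x_bar + (1 - z) * prior_mean
```

Here the posterior is Gamma(6, 7) with mean 6/7, and the weighted average gives the same value.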
5. Apply empirical Bayesian methods in the nonparametric and semiparametric cases.
In practice, we might not be given the pdfs \pi and f_{X_j}, so we have to estimate the quantities we need. Two kinds of situations:
a) Nonparametric: where \pi and f_{X_j}'s are unspecified, as in the Bühlmann models, where only the first two moments are needed.
b) Semiparametric: where f_{X_j}'s are in parametric form, but not \pi.
a) Nonparametric estimation: Goal is to estimate \mu, v, a based on observations.
Steps:
1) Observations are given for r policyholders. The losses of each policyholder are described by \vec{X}_{i}=(X_{i,1}, \ldots, X_{i,n_i}), for i = 1,\ldots, r. Suppose for simplicity that all the n_i are the same and equal to n.
2) Then the hypothetical mean \mu(\theta) for policyholder i can be estimated by \overline{X}_i = \frac{1}{n} \sum_{j=1}^n X_{i,j}.
3) Then \mu can be estimated by \mu = \frac{1}{r} \sum_{i=1}^r \overline{X}_{i}.
4) Also v(\theta) for policyholder i can be estimated by v_i = \frac{1}{n-1} \sum_{j=1}^n \left(X_{i,j}-\overline{X}_i \right)^2.
5) So v is estimated by v = \frac{1}{r} \sum_{i=1}^r v_{i}.
6) For a, we first estimate Var(\overline{X}_i) by \frac{1}{r-1} \sum_{i=1}^r \left(\overline{X}_{i}-\mu \right)^2.
7) Then note that
Var(\overline{X}_i) = Var(E(\overline{X}_i|\Theta)) + E(Var(\overline{X}_i|\Theta)) = \cdots = a + \frac{v}{n},
so that a=Var(\overline{X}_i)-v/n.
8) Then k=v/a, Z = n/(n+k), and the credibility premium is Z\overline{X}_i + (1-Z)\mu, for each policy holder i = 1, \ldots, r. All these quantities are estimates.
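The eight steps translate almost line by line into Python. This is only a sketch: the function name and loss data are invented, and setting Z = 0 when the estimate of a is not positive is a common convention assumed here rather than something stated in these notes.

```python
from statistics import mean, variance

def empirical_bayes_premiums(data):
    """Nonparametric estimates for r policyholders observed over n periods each."""
    n = len(data[0])
    x_bars = [mean(row) for row in data]           # step 2: estimates of mu(theta_i)
    mu = mean(x_bars)                              # step 3
    v = mean([variance(row) for row in data])      # steps 4-5 (n-1 divisor)
    a = variance(x_bars) - v / n                   # steps 6-7 (r-1 divisor)
    # Assumed convention: if the estimate of a is not positive, use Z = 0.
    z = n / (n + v / a) if a > 0 else 0.0
    premiums = [z * xb + (1 - z) * mu for xb in x_bars]   # step 8
    return mu, v, a, z, premiums

# Hypothetical losses: r = 2 policyholders, n = 3 periods each.
mu, v, a, z, premiums = empirical_bayes_premiums([[3.0, 5.0, 7.0], [6.0, 12.0, 9.0]])
```

For this data the estimates are \mu = 7, v = 6.5, a = 35/6, so k = 39/35 and Z = 35/48, giving credibility premiums of about 5.54 and 8.46.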