Credibility (20-25%)
1. Apply limited fluctuation (classical) credibility including criteria for both full and partial credibility.
Setup:  several kinds of problems:
I.a.  Past periods are labeled {$1,2,\ldots, n$}; there are {$X_j$} claims (or losses) in each period {$j$}.
I.b.  Different policies are labeled {$1,2,\ldots, n$} and each {$X_j$} is the claim or loss for each policy {$j$}.
Assume that 
{$E(X_j) = \xi $} for every {$j$}.  (Stable mean)
{$Var(X_j) = \sigma^2$} for every {$j$}. (Stable variance)
Past experience is  {$\overline{X} = (X_1+ \cdots + X_n)/n$}.
Then {$E(\overline{X})= \xi, Var(\overline{X}) = \sigma^2/n$}.  
Define manual premium to be {$M$}, some value for the mean that's determined from other past experiences.
Goal:  Determine what the new premium should be (between {$M$} and {$\xi$}).
Full Credibility:  Choose {$r$} close to 0, {$p$} close to 1.  We would like the RV {$\overline{X}$} to be "stable", i.e., that the difference between {$\overline{X}$} and {$\xi$} is small, most of the time.  More precisely, 
{$$Prob( -r\xi \leq \overline{X}-\xi \leq r\xi) \geq p.$$}
Rewrite as
{$$Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}}  \right| \leq \frac{r\xi\sqrt{n}}{\sigma}) \geq p.$$}
This holds whenever {$ \frac{r\xi\sqrt{n}}{\sigma} \geq y_p$}, where
{$$Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}}  \right| \leq y_p) = p;$$}
we call this the full credibility standard for the given {$r$} and {$p$} values:
{$$n \geq \lambda_0 \left(\frac{\sigma}{\xi}\right)^2,\qquad \text{in terms of # of exposures/policies (I.a), or in terms of # of claims (I.b),}$$}
where {$\lambda_0 = (y_p/r)^2$}.
We actually find {$y_p$} by assuming {$\overline{X}$} is approximately normal with mean {$\xi$} and variance {$\sigma^2/n$}; then
{$$p = Prob( \left| \frac{\overline{X}-\xi}{\sigma/\sqrt{n}}  \right| \leq y_p) \approx  Prob( \left| Z \right| \leq y_p) = 2\Phi(y_p)-1, $$}
where {$\Phi$} is the cdf for the standard normal distribution.  Then {$y_p$} is such that {$\Phi(y_p) = (p+1)/2 $}.
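As a minimal sketch of this computation, the following uses Python's standard-library normal distribution to get {$y_p$} and {$\lambda_0$}. The function name `lambda_0` and the sample values of {$r$} and {$p$} are illustrative choices, not taken from the notes.

```python
from statistics import NormalDist

def lambda_0(r: float, p: float) -> float:
    """Return lambda_0 = (y_p / r)^2, where Phi(y_p) = (1 + p) / 2."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)  # two-sided standard normal quantile
    return (y_p / r) ** 2

# Illustrative choice: within 5% of the mean, 90% of the time.
print(round(lambda_0(r=0.05, p=0.90), 1))  # about 1082.2
```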
The quantity {$\lambda_0 \left(\frac{\sigma}{\xi}\right)^2$} is what the notes call 
Standard for Full Credibility for Severity. (In the situation I.b.)
In practice, we have to estimate {$\xi$} and {$\sigma$} with the data we're presented.   ({$\sigma$} is approximated with the "n-1" unbiased estimator.)
II.  Suppose we have more info in the case of I.a.:  each {$X_j$} is compound Poisson distributed, i.e., {$X_j = Y_{j,1}+\cdots+Y_{j,N_j}$}, where each {$N_j$} is Poisson with parameter {$\lambda$} and the {$Y$}'s are the individual claim amounts, each with mean {$\theta_Y$} and variance {$\sigma_Y^2$}.
Full Credibility on the average number of claims:
The RV to consider here is {$N_j$}:
{$$ \xi = E(N_j) = \lambda, \qquad \sigma^2 = Var(N_j) = \lambda,$$}
- so full credibility for the average number of claims is
 
{$$n \geq \lambda_0 \left(\frac{\sqrt{\lambda}}{\lambda}\right)^2= \lambda_0/\lambda, \qquad \text{in terms of # of policies,}$$}
{$$n\lambda \geq \lambda_0, \qquad \text{in terms of # of expected claims.}$$}
The quantity {$\lambda_0$} is what the notes call 
Standard for Full Credibility for Frequency.
In practice, we estimate {$\lambda$} by taking # of claims in the data divided by # of policies, and {$\theta_Y$} and {$\sigma_Y$} by the given data.  ({$\sigma_Y$} is approximated with the "n-1" unbiased estimator.)
Full Credibility on the average total payment:
The RV to consider here is {$X_j$},
{$$ \xi = E(X_j) = \lambda\theta_Y, \qquad \sigma^2 = Var(X_j) = \lambda(\theta_Y^2+\sigma_Y^2),$$}
- so full credibility for the average total payment is
 
{$$n \geq \lambda_0 \frac{\lambda(\theta_Y^2+\sigma_Y^2)}{ \lambda^2\theta_Y^2}= \frac{\lambda_0}{\lambda} \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right), \qquad \text{in terms of # of policies,}$$}
{$$n\lambda \geq \lambda_0 \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right), \qquad \text{in terms of # of expected claims,}$$}
{$$n\lambda\theta_Y \geq \lambda_0 \left(\theta_Y+\frac{\sigma_Y^2}{\theta_Y}\right), \qquad \text{in terms of # of expected total dollars of claims.}$$}
The middle quantity {$\lambda_0 \left(1+\left(\frac{\sigma_Y}{\theta_Y}\right)^2\right)$} is what the notes call 
Standard for Full Credibility for Pure Premium.
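The three standards can be computed side by side. This is a sketch under the compound Poisson assumption of situation II, with all standards expressed in expected number of claims; the function name, dictionary keys, and sample severity moments are hypothetical.

```python
from statistics import NormalDist

def full_credibility_standards(r, p, theta_y, sigma_y):
    """Expected-claim counts needed for full credibility (compound Poisson)."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    lam0 = (y_p / r) ** 2
    cv2 = (sigma_y / theta_y) ** 2  # squared coefficient of variation of claim size
    return {
        "frequency": lam0,                 # n*lambda >= lambda_0
        "severity": lam0 * cv2,            # lambda_0 * (sigma/xi)^2 applied to claim sizes
        "pure_premium": lam0 * (1 + cv2),  # n*lambda >= lambda_0 * (1 + CV^2)
    }

# Illustrative severity moments: mean 100, standard deviation 200.
standards = full_credibility_standards(r=0.05, p=0.90, theta_y=100.0, sigma_y=200.0)
print(standards)
```

Note that the pure premium standard is the sum of the frequency and severity standards.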
Partial Credibility:  Let {$M$} be the manual premium, and let {$P_c = Z\overline{X} + (1-Z) M$} be the credibility premium.  Note that in the standard for full credibility,
{$$n \geq \lambda_0 \left(\frac{\sigma}{\xi}\right)^2 \Leftrightarrow  Var(\overline{X}) = \frac{\sigma^2}{n} \leq \frac{\xi^2}{\lambda_0},$$}
so we can set the variance of {$P_c$} to be the same as {$\xi^2/\lambda_0$}:
{$$\frac{\xi^2}{\lambda_0} = Var(P_c) = Z^2 Var(\overline{X}) = Z^2 \frac{\sigma^2}{n}.$$}
So 
{$$Z = \frac{\xi\sqrt{n}}{\sigma\sqrt{\lambda_0}}=\sqrt{\frac{n}{\lambda_0\frac{\sigma^2}{\xi^2}}} = \text{Sq root of the ratio of the actual count to the count required for full credibility}.$$}
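A small sketch of the square-root rule follows. The cap of {$Z$} at 1 (full credibility once the standard is met) is the usual convention and is stated here as an assumption; the function name and the numbers are illustrative.

```python
from math import sqrt

def partial_credibility_premium(n, n_full, x_bar, manual):
    """Return (Z, premium) with Z = sqrt(n / n_full), capped at 1."""
    z = min(1.0, sqrt(n / n_full))
    return z, z * x_bar + (1 - z) * manual

# Illustrative numbers: 400 observed claims against a standard of 1600.
z, premium = partial_credibility_premium(n=400, n_full=1600, x_bar=120.0, manual=100.0)
print(z, premium)  # 0.5 110.0
```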
2. Perform Bayesian analysis using both discrete and continuous models.
Know the following basic formulas about conditional expectation:  Let {$f_{X,Y}(x,y)$} be the joint pdf for two RVs {$X$} and {$Y$}, with marginal pdfs {$f_X(x)$} and {$f_Y(y)$}.  [The whole discussion applies to discrete RVs as well.]  Then
{$$\text{conditional pdf of }X \text{ given }Y = f_{X|Y}(x|y) =\frac{f_{X,Y}(x,y)}{f_Y(y)}.$$}
{$$\text{If }X \text{ and }Y \text{ are independent, then } f_{X,Y}(x,y) = f_X(x)f_Y(y);$$}
so in the case of independence, the conditional and marginal distributions are the same.
Switching the definitions and formulas around a bit, we have
{$$f_X(x) = \int f_{X|Y}(x|y)f_Y(y)\,dy.$$}
As the result of this formula we can deduce that
If {$X|Y \sim Poisson(Y)$} and {$Y\sim \Gamma(\alpha, \beta)$}, then {$X\sim NegBinomial(\alpha,\beta)$}.
If {$X|Y \sim Normal(Y,\sigma_1^2)$} and {$Y\sim Normal(\mu,\sigma_2^2 )$}, then {$X\sim Normal(\mu,\sigma_1^2+\sigma_2^2)$}.
From the formula above we can exchange the roles of {$X$} and {$Y$} to get
{$$f_{X,Y}(x,y) = f_{X|Y}(x|y)f_Y(y) = f_{Y|X}(y|x)f_X(x),$$}
yielding Bayes's Theorem:
{$$f_{X|Y}(x|y) = \frac{ f_{Y|X}(y|x)f_X(x)}{f_Y(y)}.$$}
We can also define the conditional expectation (which is a random variable that is a function of {$Y$}):
{$$E(X|Y) = \int x f_{X|Y}(x|y) \, dx.$$}
It can be shown that
{$$E(E(X|Y)) = \int E(X|Y=y) f_Y(y) \, dy = E(X).$$}
This can be generalized to any functions of {$X, Y$}:
{$$E(E(h(X,Y)|Y)) = \int \int h(x,y) f_{X|Y}(x|y)\,dx f_Y(y) \, dy = \int\int h(x,y) f_{X,Y}(x,y)\,dx\,dy=E(h(X,Y)).$$}
Getting into variances,
{$$Var(X|Y) = E( (X-E(X|Y))^2 | Y) = E(X^2|Y) - E(X|Y)^2.$$}
Quite a bit of derivation later we can show that
{$$Var(X) = E(Var(X|Y)) + Var(E(X|Y)).$$}
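As a quick check of this decomposition, take the Poisson-gamma mixture mentioned earlier, assuming the gamma parameterization with {$E(Y)=\alpha\beta$} and {$Var(Y)=\alpha\beta^2$}.  Since {$E(X|Y) = Var(X|Y) = Y$} for a Poisson,
{$$Var(X) = E(Y) + Var(Y) = \alpha\beta + \alpha\beta^2 = \alpha\beta(1+\beta),$$}
which agrees with the variance of a negative binomial with parameters {$\alpha$} and {$\beta$}.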
Now the Bayesian analysis.  Setup:
{$\theta$} represents a risk parameter with a pdf {$\pi(\theta)$} that describes the risk characteristics within a population.  Call {$\pi(\theta)$} the prior distribution.
{$\vec{X}=(X_1, \ldots, X_n)$} represents the claim amounts already observed in the past; {$X_{n+1}$} is the next one we want to predict.  Each has a conditional pdf {$f_{X_j|\theta}(x_j|\theta)$} (they may be all the same if the {$X_j$}'s are identically distributed).  Assume the {$X_j$}'s are independent conditional on {$\theta$}.  Then
a)  Find the joint density function, by plugging in the observed data {$x_j$}:
{$$f_{\vec{X},\theta}(\vec{x},\theta)= \prod_{j=1}^n f_{X_j|\theta}(x_j|\theta) \pi(\theta).$$}
b)  Find the marginal distribution by integrating out {$\theta$}:
{$$f_{\vec{X}}(\vec{x})= \int_\theta f_{\vec{X},\theta}(\vec{x},\theta)\,d\theta.$$}
c)  Find the posterior density of {$\Theta$} given {$\vec{X}$}:
{$$\pi_{\Theta|\vec{X}}(\theta|\vec{x}) = \frac{f_{\vec{X},\theta}(\vec{x},\theta)}{f_{\vec{X}}(\vec{x})}.$$} 
We can sometimes avoid evaluating the integral in b) by looking at the numerator in c), and see if it's a known distribution; then b) must be the appropriate constant that makes c) a density function.
d)  Then the conditional density of {$X_{n+1}$}, given {$\vec{X}$} (the predictive distribution), is
{$$f_{X_{n+1}|\vec{X}}(x_{n+1}|\vec{x}) = \frac{f_{X_{n+1},\vec{X}}(x_{n+1},\vec{x})}{f_{\vec{X}}(\vec{x})} = \frac{\int \prod_{j=1}^{n+1} f_{X_j|\theta}(x_j|\theta) \pi(\theta) \,d\theta  }{f_{\vec{X}}(\vec{x})} = \int f_{X_{n+1}|\Theta}(x_{n+1}|\theta) \pi_{\Theta|\vec{X}}(\theta|\vec{x}) \,d\theta. $$}
The next step is to find the expected values of {$X_{n+1}$}.  We can do this two ways:
a)  Find the hypothetical mean {$\mu(\theta)$}, i.e.,
{$$\mu(\theta)=E(X_{n+1}|\Theta=\theta) = \int x_{n+1} f_{X_{n+1}|\theta}(x_{n+1}|\theta) \,dx_{n+1}.$$}
The pure premium {$\mu$} (which does not depend on prior observations {$\vec{x}$}, or the risk parameter {$\theta$}) is the mean of the hypothetical means:
{$$\mu = E(E(X_{n+1}|\Theta)) = E(\mu(\Theta)) = \int \mu(\theta) \pi(\theta)\,d\theta.$$}
b) Find the Bayesian premium, which is the mean of the predictive distribution:
{$$E(X_{n+1}|\vec{X}=\vec{x}) =  \int x_{n+1} f_{X_{n+1}|\vec{X}}(x_{n+1}|\vec{x}) \,dx_{n+1} =\ldots = \int \mu(\theta) \pi_{\Theta|\vec{X}}(\theta|\vec{x})\,d\theta.$$}
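For a discrete prior the integrals become sums, and the steps above can be sketched in a few lines. The two risk classes, their prior weights, and the claim probabilities below are a made-up example, and each {$X_j$} is assumed to be a 0/1 claim indicator.

```python
from math import prod

# Hypothetical two-class example (values are illustrative, not from the notes).
prior = {"A": 0.75, "B": 0.25}        # pi(theta)
claim_prob = {"A": 0.10, "B": 0.40}   # P(X_j = 1 | theta); X_j is 0 or 1

def likelihood(x, q):
    """f(x | theta) for one 0/1 observation with claim probability q."""
    return q if x == 1 else 1 - q

def posterior(xs):
    """Steps a)-c): joint density, marginal, then posterior over theta."""
    joint = {t: prior[t] * prod(likelihood(x, claim_prob[t]) for x in xs) for t in prior}
    marginal = sum(joint.values())
    return {t: joint[t] / marginal for t in joint}

def bayesian_premium(xs):
    """Posterior-weighted average of the hypothetical means mu(theta)."""
    post = posterior(xs)
    return sum(post[t] * claim_prob[t] for t in post)

pure_premium = sum(prior[t] * claim_prob[t] for t in prior)  # mean of hypothetical means
print(pure_premium, bayesian_premium([1, 0]))
```

With no observations the Bayesian premium reduces to the pure premium; one observed claim in two periods shifts weight toward the riskier class.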
3. Apply Bühlmann and Bühlmann-Straub models and understand the relationship of these to the Bayesian model.
If the Bayesian premium is difficult to evaluate, an alternative is to calculate the credibility premium instead, that is, find {$\alpha_0, \ldots, \alpha_n$} such that the sum {$Y=\alpha_0+\sum_{j=1}^n \alpha_j X_j$} minimizes the squared error loss:
{$$Q = E\left[ (\mu_{n+1}(\Theta) - Y )^2   \right],$$}
where the expectation is over all possible joint values of {$\theta$} and {$X_j$}.  
[Side note:  the same solution of {$\alpha_j$}'s also minimizes {$E\left[ (E(X_{n+1}|\vec{X}) - Y )^2   \right]$} and  {$E\left[ (X_{n+1} - Y )^2   \right]$}.]
It's messy if the means and variances of {$X_j$} are all different, but here are two simplified situations:
a) The Buhlmann model:  All the {$X_j$}'s have the same mean and variance and are iid conditional on {$\Theta$}.  Define the hypothetical mean and process variance as follows, respectively:
{$$\mu(\theta) = E( X_j |\theta), \qquad v(\theta) = VAR (X_j | \theta).$$}
Let {$\mu, v, a$} be the expected value of the hypothetical means (EHM), expected value of the process variance (EPV), and variance of the hypothetical means (VHM):
{$$\mu = E(\mu(\theta)), \qquad v= E(v(\theta)), \qquad a = VAR (\mu( \theta)).$$}
Then we can show that the sum {$Y$} (the credibility premium) above can be written as {$Z\overline{X} + (1-Z)\mu$}, where 
{$$Z=\frac{n}{n+k}, \qquad k = \frac{v}{a} = \frac{\text{EPV}}{\text{VHM}}.$$}
{$Z$} is called the Buhlmann credibility factor.
b) The Buhlmann-Straub model:  Same except now the variance looks like
{$$Var(X_j|\theta) = \frac{v(\theta)}{m_j};$$}  
the factors {$m_j$} take into account things that change every year, like the # of individuals in the group in year {$j$}, or the premium income for the policy in year {$j$}.  Let {$m=\sum_j m_j$}.
Define {$\mu, v, a$} exactly as above (not involving the {$m_j$}'s).  Then the credibility premium becomes  {$Z\overline{X} + (1-Z)\mu$} where
{$$Z=\frac{m}{m+k}, \qquad k = \frac{v}{a} = \frac{\text{EPV}}{\text{VHM}}, \qquad \overline{X} = \sum_{j=1}^n \frac{m_j}{m} X_j.$$}
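When the prior is discrete, {$\mu$}, {$v$}, and {$a$} are finite sums, so the credibility premium is easy to compute directly. The sketch below assumes a hypothetical two-class prior with 0/1 claim indicators; the function name and inputs are illustrative.

```python
def buhlmann_premium(prior, hyp_mean, proc_var, x_bar, m):
    """Credibility premium Z*x_bar + (1-Z)*mu with Z = m/(m+k), k = EPV/VHM.

    prior, hyp_mean, proc_var are parallel lists over the risk classes.
    m is the number of observations n (Buhlmann) or the total exposure
    sum of m_j (Buhlmann-Straub, with x_bar the exposure-weighted average).
    """
    mu = sum(p * h for p, h in zip(prior, hyp_mean))               # EHM
    v = sum(p * s for p, s in zip(prior, proc_var))                # EPV
    a = sum(p * h * h for p, h in zip(prior, hyp_mean)) - mu ** 2  # VHM
    k = v / a
    z = m / (m + k)
    return z, z * x_bar + (1 - z) * mu

# Hypothetical classes: claim probability 0.1 or 0.4, prior weights 0.75 / 0.25.
q = [0.10, 0.40]
z, premium = buhlmann_premium([0.75, 0.25], q, [p * (1 - p) for p in q], x_bar=0.5, m=2)
print(z, premium)
```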
The questions seem to come in this flavor:  Suppose in year {$j$} there are {$N_j$} claims from {$m_j$} policies, for {$j = 1, \ldots, n$}.  The individual policy has the Poisson distribution with parameter {$\Theta$}, and the parameter itself has the gamma distribution with parameters {$\alpha$} and {$\beta$}.  Estimate the number of claims in year {$n+1$} if there will be {$m_{n+1}$} policies.
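Here we let {$X_j=N_j/m_j$}.  The {$N_j$} claims from {$m_j$} policies are given in the data of the problem, so we can compute {$X_j=N_j/m_j$}, and {$\overline{X}$} as the {$m_j$}-weighted average of the {$X_j$}'s, which is the total number of claims divided by the total number of policies {$m$}.
Each individual policy's claim count is Poisson with parameter {$\Theta$}, so the hypothetical mean is {$\mu(\Theta) = \Theta$} and the per-policy process variance is {$v(\Theta) = \Theta$} (that is, {$E(X_j | \Theta) = \Theta$} and {$Var(X_j |\Theta) = \Theta/m_j$}), and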
{$$\mu = E(\Theta) = \alpha\beta, \qquad v=E(\Theta) =\alpha\beta, \qquad a = Var(\Theta) = \alpha\beta^2.$$}
[Expectation and variance are taken with respect to the Gamma distribution.]
Then {$k=v/a$}, {$Z=m/(m+k)$}, and the credibility estimate for {$X_{n+1}$} (# of claims in year {$n+1$} by one insured) is {$Z\overline{X} + (1-Z)\mu$}, so the number of claims in year {$n+1$} is that multiplied by {$m_{n+1}$}.
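A sketch of this type of problem is given below, assuming the gamma prior is parameterized so that {$E(\Theta)=\alpha\beta$}. The function name and the claim and policy counts are made up for illustration.

```python
def expected_claims_next_year(claims, policies, alpha, beta, m_next):
    """Buhlmann-Straub estimate of next year's claim count under Poisson-gamma.

    claims[j] = N_j and policies[j] = m_j for the observed years;
    Theta ~ Gamma(alpha, beta) with mean alpha * beta.
    """
    m = sum(policies)
    x_bar = sum(claims) / m   # exposure-weighted average of X_j = N_j / m_j
    mu = alpha * beta         # E(Theta)
    v = alpha * beta          # EPV = E(Theta)
    a = alpha * beta ** 2     # VHM = Var(Theta)
    k = v / a                 # simplifies to 1 / beta
    z = m / (m + k)
    per_policy = z * x_bar + (1 - z) * mu
    return m_next * per_policy

# Illustrative data: two years of experience, 200 policies expected next year.
estimate = expected_claims_next_year([10, 20], [100, 150], alpha=2, beta=0.05, m_next=200)
print(estimate)
```

In this example {$k = 1/\beta = 20$}, so the per-policy estimate works out to {$(2 + 30)/(20 + 250)$}.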
4. Apply conjugate priors in Bayesian analysis and in particular the Poisson-gamma model.
If a prior distribution gives rise to a posterior distribution that's in the same family (maybe a different parameter), it's called a conjugate prior distribution.  An example is the Poisson-Gamma model.
Setup:
Prior distribution {$\Theta \sim Gamma(\alpha,\beta')$}, with {$\beta' = 1/\theta$} where the {$\theta$} is the one in the appendix.
The observations {$X_j \sim Poisson(\Theta).$}
Then we can show that the posterior distribution is {$\sim Gamma(\alpha+ \sum_j x_j, \beta' + n)$}, so the expected value of the posterior distribution is 
{$$\frac{\alpha+\sum_j x_j}{\beta'+n}.$$} 
In the situation I.a., this is ({$\alpha$} + total # of claims over the {$n$} years) divided by ({$\beta'$} + the # of years {$n$} that we've been observing).
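The update is simple enough to sketch directly. Here `rate` stands for {$\beta'$}, and the prior parameters and observed counts are illustrative. The last lines rewrite the posterior mean as a credibility-weighted average with {$Z = n/(n+\beta')$}, which is the Buhlmann form with {$k = v/a = \beta'$} in this model.

```python
def poisson_gamma_posterior(alpha, rate, xs):
    """Posterior Gamma (shape, rate) after observing Poisson counts xs.

    `rate` plays the role of beta' = 1/theta in the notes.
    """
    return alpha + sum(xs), rate + len(xs)

def posterior_mean(alpha, rate, xs):
    shape_post, rate_post = poisson_gamma_posterior(alpha, rate, xs)
    return shape_post / rate_post

# Illustrative numbers: prior Gamma(3, rate 2), claim counts over 4 years.
xs = [0, 2, 1, 1]
bayes = posterior_mean(3, 2, xs)  # (3 + 4) / (2 + 4)

# Same value as a weighted average of the sample mean and the prior mean.
n = len(xs)
z = n / (n + 2)
credibility_form = z * (sum(xs) / n) + (1 - z) * (3 / 2)
print(bayes, credibility_form)
```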
5. Apply empirical Bayesian methods in the nonparametric and semiparametric cases.
In practice, we might not be given the pdfs {$\pi$} and {$f_{X_j}$}, so we'd have to estimate those.  Two kinds of situations:
a) Nonparametric:  where {$\pi$} and {$f_{X_j}$}'s are unspecified, as in Buhlmann models where only the first two moments are needed.
b) Semiparametric:  where {$f_{X_j}$}'s  are in parametric form, but not  {$\pi$}.
a) Nonparametric estimation:  Goal is to estimate  {$\mu, v, a$} based on observations.
steps:
1) Observations are given for {$r$} policy holders.  The losses of each policy holder are described by {$\vec{X}_{i}=(X_{i,1}, \ldots, X_{i,n_i})$}, for {$i = 1,\ldots, r$}.  Suppose for simplicity that all {$n_i$}'s are the same and equal {$n$}.
2) Then the old quantity {$\mu(\theta)$} can be estimated by {$\overline{X}_i = \frac{1}{n} \sum_{j=1}^n X_{i,j}.$}
3) Then {$\mu$} can be estimated by {$\mu = 1/r \sum_{i=1}^r \overline{X}_{i}.$} 
4) Also {$v(\theta)$} can be estimated by {$v_i = \frac{1}{n-1} \sum_{j=1}^n \left(X_{i,j}-\overline{X}_i \right)^2$}.
5) So {$v$} is estimated by  {$v = 1/r \sum_{i=1}^r v_{i}.$} 
6) For {$a$}, we first estimate {$Var(\overline{X}_i)$} by {$ \frac{1}{r-1} \sum_{i=1}^r \left(\overline{X}_{i}-\mu \right)^2$}.
7) Then note that 
{$$Var(\overline{X}_i) = Var(E(\overline{X}_i|\Theta)) + E(Var(\overline{X}_i|\Theta)) = \cdots = a + \frac{v}{n}, $$}
- so that {$a=Var(\overline{X}_i)-v/n$}.
8) Then {$k=v/a$}, {$Z = n/(n+k)$}, and the credibility premium is {$Z\overline{X}_i + (1-Z)\mu$}, for each policy holder {$i = 1, \ldots, r$}.  All these quantities are estimates.
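The eight steps can be sketched as follows, assuming equal {$n$} for every policy holder. Setting {$Z = 0$} when the estimate of {$a$} is not positive is the usual convention and is an assumption here; the function name and sample data are hypothetical.

```python
from statistics import mean, variance

def empirical_bayes_premiums(data):
    """Nonparametric Buhlmann estimates for r policy holders, n losses each.

    data[i] is the list of n losses for policy holder i (n >= 2, r >= 2).
    Returns (mu_hat, v_hat, a_hat, z, premiums).
    """
    n = len(data[0])
    x_bars = [mean(row) for row in data]           # step 2
    mu_hat = mean(x_bars)                          # step 3
    v_hat = mean([variance(row) for row in data])  # steps 4-5, "n-1" sample variance
    a_hat = variance(x_bars) - v_hat / n           # steps 6-7
    if a_hat <= 0:
        z = 0.0  # assumed convention: no credibility if the VHM estimate is not positive
    else:
        z = n / (n + v_hat / a_hat)                # step 8
    premiums = [z * xb + (1 - z) * mu_hat for xb in x_bars]
    return mu_hat, v_hat, a_hat, z, premiums

# Illustrative data: two policy holders, three years each.
mu_hat, v_hat, a_hat, z, premiums = empirical_bayes_premiums(
    [[3.0, 5.0, 4.0], [8.0, 10.0, 9.0]]
)
print(mu_hat, v_hat, a_hat, z, premiums)
```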