Score Function and Fisher Information
Score Function
The score function is the first derivative (gradient) of the log-likelihood function with respect to the parameter $\theta$.

The way we find the MLE is by taking the derivative of the log-likelihood, setting it to zero, and solving for $\theta$.

The log-likelihood function is:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$$

The score function is:

$$s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(x_i; \theta)$$
Other notation
Some people like to define the score function for a single observation:

$$s_i(\theta) = \frac{\partial}{\partial \theta} \log f(x_i; \theta)$$

And say that the score function for the entire sample is the sum:

$$s(\theta) = \sum_{i=1}^{n} s_i(\theta)$$
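As a concrete worked example (an illustrative assumption, not from the original notes): for an IID Bernoulli($p$) sample, the log-likelihood and score are

$$\ell(p) = \sum_{i=1}^{n} \big[ x_i \log p + (1 - x_i) \log(1 - p) \big], \qquad s(p) = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1 - p}$$

Setting $s(p) = 0$ and solving gives the familiar MLE $\hat{p} = \bar{x}$.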
Expected Value of the Score Function
When $\theta$ equals the true parameter $\theta_0$, the expected value of the score function is zero:

$$\mathbb{E}\big[s(\theta_0)\big] = 0$$

This makes sense, because we want all the partial derivatives to be zero (at a maximum) when we have the true parameter.
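A quick way to see this numerically (an illustrative sketch, not from the original notes; it assumes a Normal($\mu_0$, 1) model, for which the per-observation score with respect to $\mu$ is simply $x - \mu$):

```python
import numpy as np

# Sketch: for X ~ Normal(mu0, 1), the score of one observation w.r.t. mu
# is s(mu) = x - mu. Averaged over many draws at the true mu0, it is ~0.
rng = np.random.default_rng(0)
mu0 = 2.0
x = rng.normal(loc=mu0, scale=1.0, size=1_000_000)

score_at_true = x - mu0
print(score_at_true.mean())  # close to 0, as expected
```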
Fisher Information
The Fisher information is the variance of the score function:

$$I_n(\theta) = \operatorname{Var}\big(s(\theta)\big)$$

For IID samples, it suffices to calculate the variance of the score function for a single observation:

$$I_1(\theta) = \operatorname{Var}\big(s_i(\theta)\big)$$

And then the information in the whole sample is:

$$I_n(\theta) = n \, I_1(\theta)$$
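Continuing the illustrative Bernoulli($p$) example from above (an assumption, not part of the original notes), the per-observation score can be rewritten as $s_i(p) = (x_i - p) / \big(p(1-p)\big)$, so

$$I_1(p) = \operatorname{Var}\big(s_i(p)\big) = \frac{\operatorname{Var}(x_i)}{p^2 (1-p)^2} = \frac{1}{p(1-p)}, \qquad I_n(p) = \frac{n}{p(1-p)}$$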
There are several other ways to define the Fisher information.
The first is to use the fact that the score function has expected value 0 at the true parameter, so its variance is just the expected square:

$$I_n(\theta) = \mathbb{E}\big[s(\theta)^2\big]$$

Furthermore, you could also derive the fact that:

$$I_n(\theta) = -\mathbb{E}\left[\frac{\partial^2}{\partial \theta^2} \ell(\theta)\right]$$

And then you could define the Fisher information for a single observation as:

$$I_1(\theta) = -\mathbb{E}\left[\frac{\partial^2}{\partial \theta^2} \log f(X; \theta)\right]$$
Don’t forget the minus sign in these second-derivative forms.
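The equivalence of these definitions can be checked by simulation (an illustrative sketch using the assumed Bernoulli($p$) model; all three quantities should be close to $1 / \big(p(1-p)\big)$):

```python
import numpy as np

# Sketch: for X ~ Bernoulli(p), compare Var(score), E[score^2], and
# -E[d^2/dp^2 log f(X; p)]; all should approximate 1 / (p * (1 - p)).
rng = np.random.default_rng(1)
p = 0.3
x = rng.binomial(1, p, size=1_000_000).astype(float)

score = x / p - (1 - x) / (1 - p)                  # d/dp log f(x; p)
second_deriv = -x / p**2 - (1 - x) / (1 - p)**2    # d^2/dp^2 log f(x; p)

print(score.var())            # variance of the score
print((score ** 2).mean())    # E[score^2], same since E[score] = 0
print(-second_deriv.mean())   # -E[second derivative], same again
```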
Interpretation of Fisher Information
We know that the second derivative of a function is a measure of curvature.
So the Fisher information is a measure of the curvature of the log-likelihood function around its maximum, i.e. around the MLE $\hat{\theta}$.

If the curve is shallow, then the Fisher information is small, meaning the maximum is poorly localized (it may just be a weak local maximum) and we cannot be very confident in our MLE.
It turns out that this curvature is directly tied to how precisely we can estimate the parameter.
Inverse of Fisher Information
The inverse of the Fisher information is (asymptotically) the variance of the MLE:

$$\operatorname{Var}\big(\hat{\theta}\big) \approx I_n(\theta)^{-1}$$
This matches up with the interpretation above: larger Fisher information means smaller variance of the MLE, meaning our estimate is more precise.
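As a final illustrative check (again with the assumed Bernoulli($p$) model, not from the original notes), the empirical variance of the MLE across repeated samples is close to $1 / I_n(p) = p(1-p)/n$:

```python
import numpy as np

# Sketch: simulate many Bernoulli(p) samples of size n, compute the MLE
# (the sample mean) for each, and compare its variance to 1 / I_n(p).
rng = np.random.default_rng(2)
p, n, reps = 0.3, 200, 50_000

samples = rng.binomial(1, p, size=(reps, n))
mle = samples.mean(axis=1)       # MLE of p for each simulated sample

print(mle.var())                 # empirical variance of the MLE
print(p * (1 - p) / n)           # 1 / I_n(p), the theoretical value
```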