Maximum Likelihood Estimation for Machine Learning

Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data under the chosen probability distribution model (for continuous data, the joint probability density). This expression contains the unknown model parameters. The parameter values that maximize the sample likelihood are known as the Maximum Likelihood Estimates, or MLEs.
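As a minimal sketch of this idea, assume an exponential life distribution with density f(t) = λe^(-λt) and a small set of hypothetical, uncensored failure times. The code writes out ln L as the sum of ln f(t_i) and maximizes it numerically with a golden-section search. The logarithm is used because it has the same maximizer as L and avoids multiplying many small numbers together. The function names and data are illustrative and do not come from any particular package.

```python
import math


def exp_log_likelihood(rate, times):
    """ln L for i.i.d. exponential failure times: sum of ln f(t_i), f(t) = rate * exp(-rate * t)."""
    return sum(math.log(rate) - rate * t for t in times)


def golden_section_max(func, lo, hi, tol=1e-10):
    """Return the maximizer of a unimodal function on [lo, hi] via golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) > func(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


# Hypothetical failure times in hours (illustrative data, not from the text).
times = [120.0, 340.0, 85.0, 410.0, 230.0]

rate_numeric = golden_section_max(lambda r: exp_log_likelihood(r, times), 1e-6, 1.0)
rate_closed_form = len(times) / sum(times)  # exponential MLE for complete data
print(f"numeric: {rate_numeric:.6f}  closed form: {rate_closed_form:.6f}")
```

For this model the numeric maximizer agrees with the closed-form value n/Σt_i. For models without a closed form, the same pattern (write ln L, then hand it to an optimizer) still applies.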

Maximum likelihood estimation is an analytic maximization procedure. It applies to every form of censored or multicensored data, and the technique can even be used across several stress cells to estimate acceleration model parameters at the same time as life distribution parameters. Moreover, MLEs and Likelihood Functions generally have very desirable large-sample properties:

  • they become unbiased minimum variance estimators as the sample size increases
  • they have approximate normal distributions and approximate sample variances that can be calculated and used to generate confidence bounds
  • likelihood functions can be used to test hypotheses about models and parameters
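The sketch below illustrates the last two properties for the same hypothetical exponential setup: approximate 95% bounds from the normal approximation, using the observed information n/λ̂² to estimate the variance of λ̂, and a likelihood ratio test of a hypothesized rate. The five hypothetical failures fall in the small-sample range discussed next, so the numbers only illustrate the mechanics; the function names and the hypothesized rate 0.01 are illustrative assumptions.

```python
import math


def exp_log_likelihood(rate, times):
    """ln L for i.i.d. exponential failure times."""
    return sum(math.log(rate) - rate * t for t in times)


def exp_mle_wald_bounds(times, z=1.959964):
    """MLE of the exponential rate with normal-approximation (Wald) bounds.

    For complete exponential data the observed information is
    -d^2 ln L / d rate^2 = n / rate^2, so Var(rate_hat) is approximately rate_hat^2 / n.
    """
    n = len(times)
    rate_hat = n / sum(times)
    se = rate_hat / math.sqrt(n)
    return rate_hat, rate_hat - z * se, rate_hat + z * se


def likelihood_ratio_stat(times, rate0):
    """2 * [ln L(rate_hat) - ln L(rate0)], approximately chi-square(1) under H0: rate = rate0."""
    rate_hat = len(times) / sum(times)
    return 2.0 * (exp_log_likelihood(rate_hat, times) - exp_log_likelihood(rate0, times))


times = [120.0, 340.0, 85.0, 410.0, 230.0]  # hypothetical failure times (hours)
rate_hat, lower, upper = exp_mle_wald_bounds(times)
stat = likelihood_ratio_stat(times, rate0=0.01)
CHI2_1_95 = 3.841  # 95th percentile of chi-square with 1 degree of freedom
print(f"rate_hat={rate_hat:.5f}, approx 95% bounds=({lower:.5f}, {upper:.5f})")
print(f"LR statistic={stat:.3f}, reject H0 at 5%: {stat > CHI2_1_95}")
```

Because normal-approximation bounds are symmetric about λ̂, the lower bound can fall below zero for very small samples; bounds based on the likelihood ratio are a common alternative in that case.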

There are only two drawbacks to MLEs, but they are important ones:

  • With small numbers of failures (fewer than 5, and sometimes fewer than 10, counts as small), MLEs can be heavily biased and the large-sample optimality properties do not apply.
  • Calculating MLEs often requires specialized software for solving complex non-linear equations. This is becoming less of a problem over time, as more statistical packages add MLE analysis capability every year.
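To make the second drawback concrete, the sketch below solves the likelihood equations for a Weibull model, assuming complete (uncensored) data and the parameterization F(t) = 1 - exp[-(t/η)^β]. Setting the partial derivatives of ln L to zero leaves one non-linear equation g(β) = 0 in the shape parameter, which has no closed-form solution; here it is solved with a hand-written Newton-Raphson iteration, after which the scale η follows in closed form. The data and function names are hypothetical, and this is an illustration, not a replacement for a tested statistical package.

```python
import math


def weibull_shape_equation(beta, times):
    """Profile score equation g(beta) for the Weibull shape (complete data); the MLE solves g = 0."""
    logs = [math.log(t) for t in times]
    powers = [t ** beta for t in times]
    s0 = sum(powers)
    s1 = sum(p * lg for p, lg in zip(powers, logs))
    return s1 / s0 - 1.0 / beta - sum(logs) / len(times)


def weibull_shape_derivative(beta, times):
    """dg/dbeta: a weighted variance of ln t plus 1 / beta^2, so g is strictly increasing."""
    logs = [math.log(t) for t in times]
    powers = [t ** beta for t in times]
    s0 = sum(powers)
    s1 = sum(p * lg for p, lg in zip(powers, logs))
    s2 = sum(p * lg * lg for p, lg in zip(powers, logs))
    return (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (beta * beta)


def weibull_mle(times, beta0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the Weibull shape, then the closed-form scale given that shape."""
    beta = beta0
    for _ in range(max_iter):
        step = weibull_shape_equation(beta, times) / weibull_shape_derivative(beta, times)
        new_beta = beta - step
        if new_beta <= 0:  # safeguard: keep the shape parameter positive
            new_beta = beta / 2
        if abs(new_beta - beta) < tol:
            beta = new_beta
            break
        beta = new_beta
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    eta = (sum(t ** beta for t in times) / len(times)) ** (1.0 / beta)
    return beta, eta


times = [120.0, 340.0, 85.0, 410.0, 230.0]  # hypothetical failure times (hours)
beta_hat, eta_hat = weibull_mle(times)
print(f"shape beta = {beta_hat:.4f}, scale eta = {eta_hat:.2f}")
```

Since g(β) is strictly increasing, the equation has a single root; the positivity safeguard only matters when a Newton step from a poor starting value overshoots.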

Likelihood Function Examples for Reliability Data:

Let f(t) be the PDF and F(t) the CDF for the chosen life distribution model. Note that these are functions of t and the unknown parameters of the model.
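As a sketch of the standard forms, assuming independent units and, for the censored case, Type I censoring (n units on test until a fixed time T, with r failures observed at times t_1, ..., t_r), each failure contributes its density f(t_i) and each surviving unit contributes its reliability 1 - F(T). The constant C does not depend on the model parameters, so it does not affect the MLEs.

```latex
% Complete sample: n units, all failure times t_1, ..., t_n observed
L = \prod_{i=1}^{n} f(t_i)

% Type I censored sample: n units on test until time T, r failures at t_1, ..., t_r
L = C \prod_{i=1}^{r} f(t_i) \, \bigl[1 - F(T)\bigr]^{\,n-r}

\ln L = \ln C + \sum_{i=1}^{r} \ln f(t_i) + (n - r) \ln\bigl[1 - F(T)\bigr]
```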
The general mathematical technique for solving for MLEs involves setting the partial derivatives of ln L (taken with respect to each unknown parameter) equal to zero and solving the resulting equations, which are usually non-linear. For the exponential model, however, the equation can easily be solved in closed form.
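Assuming a test of n units stopped at a fixed time T with r failures at times t_1, ..., t_r (Type I censoring), the exponential model has f(t) = λe^(-λt) and 1 - F(T) = e^(-λT), so up to a constant ln L = r ln λ - λ[Σt_i + (n - r)T]. Setting d ln L/dλ = r/λ - [Σt_i + (n - r)T] to zero gives λ̂ = r / [Σt_i + (n - r)T]: the number of failures divided by the total time on test. The sketch below codes this result; the function names and data are hypothetical.

```python
import math


def exponential_mle_type1(failure_times, n_units, test_end):
    """Closed-form exponential MLE for a Type I (time) censored test.

    failure_times: observed failure times (all <= test_end)
    n_units: number of units placed on test
    test_end: time T at which the test stopped; n_units - r units survived
    """
    r = len(failure_times)
    if r == 0:
        # ln L = -rate * n_units * T increases as rate -> 0, so no positive MLE exists.
        raise ValueError("no failures observed; the rate MLE is not a positive number")
    total_unit_hours = sum(failure_times) + (n_units - r) * test_end
    return r / total_unit_hours


def censored_log_likelihood(rate, failure_times, n_units, test_end):
    """ln L up to a constant: sum of ln f(t_i) plus (n - r) * ln[1 - F(T)] for the exponential model."""
    r = len(failure_times)
    return sum(math.log(rate) - rate * t for t in failure_times) - (n_units - r) * rate * test_end


failures = [120.0, 340.0, 85.0, 410.0, 230.0]  # hypothetical failure times (hours)
rate_hat = exponential_mle_type1(failures, n_units=10, test_end=500.0)
print(f"rate_hat = {rate_hat:.6f} per hour, MTBF estimate = {1 / rate_hat:.1f} hours")
```

With these hypothetical numbers, r = 5 and the total time on test is 1185 + 5 × 500 = 3685 unit-hours, so the estimated rate is 5/3685 per hour (an MTBF estimate of 737 hours).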