Performing Fits and Analyzing Outputs

As shown in the previous chapter, a simple fit can be performed with the minimize() function. For more sophisticated modeling, the Minimizer class can be used to gain a bit more control, especially when using complicated constraints or comparing results from related fits.

The minimize() function

The minimize() function is a wrapper around Minimizer for running an optimization problem. It takes an objective function (the function that calculates the array to be minimized), a Parameters object, and several optional arguments. See Writing a Fitting Function for details on writing the objective function.

minimize(fcn, params, method='leastsq', args=None, kws=None, iter_cb=None, scale_covar=True, nan_policy='raise', reduce_fcn=None, calc_covar=True, max_nfev=None, **fit_kws)

Perform the minimization of the objective function.

The minimize function takes an objective function to be minimized, a dictionary (Parameters ; Parameters) containing the model parameters, and several optional arguments including the fitting method.

Parameters:
  • fcn (callable) –

    Objective function to be minimized. When method is ‘leastsq’ or ‘least_squares’, the objective function should return an array of residuals (difference between model and data) to be minimized in a least-squares sense. With the scalar methods the objective function can either return the residuals array or a single scalar value. The function must have the signature:

    fcn(params, *args, **kws)
    

  • params (Parameters) – Contains the Parameters for the model.

  • method (str, optional) –

    Name of the fitting method to use. Valid values are:

    • ’leastsq’: Levenberg-Marquardt (default)

    • ’least_squares’: Least-Squares minimization, using Trust Region Reflective method

    • ’differential_evolution’: differential evolution

    • ’brute’: brute force method

    • ’basinhopping’: basinhopping

    • ’ampgo’: Adaptive Memory Programming for Global Optimization

    • nelder’: Nelder-Mead

    • ’lbfgsb’: L-BFGS-B

    • ’powell’: Powell

    • ’cg’: Conjugate-Gradient

    • ’newton’: Newton-CG

    • ’cobyla’: Cobyla

    • ’bfgs’: BFGS

    • ’tnc’: Truncated Newton

    • ’trust-ncg’: Newton-CG trust-region

    • ’trust-exact’: nearly exact trust-region

    • ’trust-krylov’: Newton GLTR trust-region

    • ’trust-constr’: trust-region for constrained optimization

    • ’dogleg’: Dog-leg trust-region

    • ’slsqp’: Sequential Linear Squares Programming

    • ’emcee’: Maximum likelihood via Monte-Carlo Markov Chain

    • ’shgo’: Simplicial Homology Global Optimization

    • ’dual_annealing’: Dual Annealing optimization

    In most cases, these methods wrap and use the method of the same name from scipy.optimize, or use scipy.optimize.minimize with the same method argument. Thus ‘leastsq’ will use scipy.optimize.leastsq, while ‘powell’ will use scipy.optimize.minimizer(…, method=’powell’)

    For more details on the fitting methods please refer to the SciPy docs.

  • args (tuple, optional) – Positional arguments to pass to fcn.

  • kws (dict, optional) – Keyword arguments to pass to fcn.

  • iter_cb (callable, optional) –

    Function to be called at each fit iteration. This function should have the signature:

    iter_cb(params, iter, resid, *args, **kws),
    

    where params will have the current parameter values, iter the iteration number, resid the current residual array, and *args and **kws as passed to the objective function.

  • scale_covar (bool, optional) – Whether to automatically scale the covariance matrix (default is True).

  • nan_policy ({'raise', 'propagate', 'omit'}, optional) –

    Specifies action if fcn (or a Jacobian) returns NaN values. One of:

    • ’raise’ : a ValueError is raised

    • ’propagate’ : the values returned from userfcn are un-altered

    • ’omit’ : non-finite values are filtered

  • reduce_fcn (str or callable, optional) – Function to convert a residual array to a scalar value for the scalar minimizers. See Notes in Minimizer.

  • calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True) for solvers other than ‘leastsq’ and ‘least_squares’. Requires the numdifftools package to be installed.

  • max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The default value depends on the fitting method.

  • **fit_kws (dict, optional) – Options to pass to the minimizer being used.

Returns:

Object containing the optimized parameters and several goodness-of-fit statistics.

Return type:

MinimizerResult

Changed in version 0.9.0: Return value changed to MinimizerResult.

Notes

The objective function should return the value to be minimized. For the Levenberg-Marquardt algorithm from leastsq(), this returned value must be an array, with a length greater than or equal to the number of fitting variables in the model. For the other methods, the return value can either be a scalar or an array. If an array is returned, the sum-of- squares of the array will be sent to the underlying fitting method, effectively doing a least-squares optimization of the return values.

A common use for args and kws would be to pass in other data needed to calculate the residual, including such things as the data array, dependent variable, uncertainties in the data, and other data structures for the model calculation.

On output, params will be unchanged. The best-fit values and, where appropriate, estimated uncertainties and correlations, will all be contained in the returned MinimizerResult. See MinimizerResult – the optimization result for further details.

This function is simply a wrapper around Minimizer and is equivalent to:

fitter = Minimizer(fcn, params, fcn_args=args, fcn_kws=kws,
                   iter_cb=iter_cb, scale_covar=scale_covar,
                   nan_policy=nan_policy, reduce_fcn=reduce_fcn,
                   calc_covar=calc_covar, **fit_kws)
fitter.minimize(method=method)

Writing a Fitting Function

An important component of a fit is writing a function to be minimized – the objective function. Since this function will be called by other routines, there are fairly stringent requirements for its call signature and return value. In principle, your function can be any Python callable, but it must look like this:

func(params, *args, **kws):

Calculate objective residual to be minimized from parameters.

Parameters:
  • params (Parameters) – Parameters.

  • args – Positional arguments. Must match args argument to minimize().

  • kws – Keyword arguments. Must match kws argument to minimize().

Returns:

Residual array (generally data-model) to be minimized in the least-squares sense.

Return type:

numpy.ndarray. The length of this array cannot change between calls.

A common use for the positional and keyword arguments would be to pass in other data needed to calculate the residual, including things as the data array, dependent variable, uncertainties in the data, and other data structures for the model calculation.

The objective function should return the value to be minimized. For the Levenberg-Marquardt algorithm from leastsq(), this returned value must be an array, with a length greater than or equal to the number of fitting variables in the model. For the other methods, the return value can either be a scalar or an array. If an array is returned, the sum of squares of the array will be sent to the underlying fitting method, effectively doing a least-squares optimization of the return values.

Since the function will be passed in a dictionary of Parameters, it is advisable to unpack these to get numerical values at the top of the function. A simple way to do this is with Parameters.valuesdict(), as shown below:

from numpy import exp, sign, sin, pi


def residual(pars, x, data=None, eps=None):
    # unpack parameters: extract .value attribute for each parameter
    parvals = pars.valuesdict()
    period = parvals['period']
    shift = parvals['shift']
    decay = parvals['decay']

    if abs(shift) > pi/2:
        shift = shift - sign(shift)*pi

    if abs(period) < 1.e-10:
        period = sign(period)*1.e-10

    model = parvals['amp'] * sin(shift + x/period) * exp(-x*x*decay*decay)

    if data is None:
        return model
    if eps is None:
        return model - data
    return (model-data) / eps

In this example, x is a positional (required) argument, while the data array is actually optional (so that the function returns the model calculation if the data is neglected). Also note that the model calculation will divide x by the value of the period Parameter. It might be wise to ensure this parameter cannot be 0. It would be possible to use bounds on the Parameter to do this:

params['period'] = Parameter(name='period', value=2, min=1.e-10)

but putting this directly in the function with:

if abs(period) < 1.e-10:
    period = sign(period)*1.e-10

is also a reasonable approach. Similarly, one could place bounds on the decay parameter to take values only between -pi/2 and pi/2.

Types of Data to Use for Fitting

Minimization methods assume that data is numerical. For all the fitting methods supported by lmfit, data and fitting parameters are also assumed to be continuous variables. As the routines make heavy use of numpy and scipy, the most natural data to use in fitting is then numpy nd-arrays. In fact, many of the underlying fitting algorithms - including the default leastsq() method - require the values in the residual array used for the minimization to be a 1-dimensional numpy array with data type (dtype) of “float64”: a 64-bit representation of a floating point number (sometimes called a “double precision float”).

Python is generally forgiving about data types, and in the scientific Python community there is a concept of an object being “array like” which essentially means that the can usually be coerced or interpreted as a numpy array, often with that object having an __array__() method specially designed for that conversion. Important examples of objects that can be considered “array like” include Lists and Tuples that contain only numbers, pandas Series, and HDF5 Datasets. Many objects from data-processing libraries like dask, xarray, zarr, and more are also “array like”.

Lmfit tries to be accommodating in the data that can be used in the fitting process. When using Minimizer, the data you pass in as extra arrays for the calculation of the residual array will not be altered, and can be used in your objective function in whatever form you send. Usually, “array like” data will work, but some care may be needed. In the example above, if x was not a numpy array but a list of numbers, this would give an error message like:

TypeError: unsupported operand type(s) for /: 'list' and 'float'

or:

TypeError: can't multiply sequence by non-int of type 'float'

because a list of numbers is only sometimes “array like”.

Sending in a “more array-like” object like a pandas Series will avoid many (though maybe not all!) such exceptions, but the resulting calculation returned from the function would then also be a pandas Series. Lmfit minimize() will always coerce the return value from the objective function into a 1-D numpy array with dtype of “float64”. This will usually “just work”, but there may be exceptions.

When in doubt, or if running it trouble, converting data to float64 numpy arrays before being used in a fit is recommended. If using complex data or functions, a dtype of “complex128” will also always work, and will be converted to “float64” with ndaarray.view("float64"). Numpy arrays of other dtype (say, “int16” or “float32”) should be used with caution. In particular, “float32” data should be avoided: Multiplying a “float32” array and a Python float will result in a “float32” array for example. As fitting variables may have small changes made to them, the results may be at or below “float32” precision, which will cause the fit to give up. For integer data, results are more sometimes promoted to “float64”, but many numpy ufuncs (say, numpy.exp()) will promote only to “float32”, so care is still needed.

See also Data Types for data and independent data with Model for discussion of data passed in for curve-fitting.

Choosing Different Fitting Methods

By default, the Levenberg-Marquardt algorithm is used for fitting. While often criticized, including the fact it finds a local minimum, this approach has some distinct advantages. These include being fast, and well-behaved for most curve-fitting needs, and making it easy to estimate uncertainties for and correlations between pairs of fit variables, as discussed in MinimizerResult – the optimization result.

Alternative algorithms can also be used by providing the method keyword to the minimize() function or Minimizer.minimize() class as listed in the Table of Supported Fitting Methods. If you have the numdifftools package installed, lmfit will try to estimate the covariance matrix and determine parameter uncertainties and correlations if calc_covar is True (default).

Table of Supported Fitting Methods:

Fitting Method

method arg to minimize() or Minimizer.minimize()

Levenberg-Marquardt

leastsq or least_squares

Nelder-Mead

nelder

L-BFGS-B

lbfgsb

Powell

powell

Conjugate Gradient

cg

Newton-CG

newton

COBYLA

cobyla

BFGS

bfgsb

Truncated Newton

tnc

Newton CG trust-region

trust-ncg

Exact trust-region

trust-exact

Newton GLTR trust-region

trust-krylov

Constrained trust-region

trust-constr

Dogleg

dogleg

Sequential Linear Squares Programming

slsqp

Differential Evolution

differential_evolution

Brute force method

brute

Basinhopping

basinhopping

Adaptive Memory Programming for Global Optimization

ampgo

Simplicial Homology Global Optimization

shgo

Dual Annealing

dual_annealing

Maximum likelihood via Monte-Carlo Markov Chain

emcee

Note

The objective function for the Levenberg-Marquardt method must return an array, with more elements than variables. All other methods can return either a scalar value or an array. The Monte-Carlo Markov Chain or emcee method has two different operating methods when the objective function returns a scalar value. See the documentation for emcee.

Warning

Much of this documentation assumes that the Levenberg-Marquardt (leastsq) method is used. Many of the fit statistics and estimates for uncertainties in parameters discussed in MinimizerResult – the optimization result are done only unconditionally for this (and the least_squares) method. Lmfit versions newer than 0.9.11 provide the capability to use numdifftools to estimate the covariance matrix and calculate parameter uncertainties and correlations for other methods as well.

MinimizerResult – the optimization result

An optimization with minimize() or Minimizer.minimize() will return a MinimizerResult object. This is an otherwise plain container object (that is, with no methods of its own) that simply holds the results of the minimization. These results will include several pieces of informational data such as status and error messages, fit statistics, and the updated parameters themselves.

Importantly, the parameters passed in to Minimizer.minimize() will be not be changed. To find the best-fit values, uncertainties and so on for each parameter, one must use the MinimizerResult.params attribute. For example, to print the fitted values, bounds and other parameter attributes in a well-formatted text tables you can execute:

result.params.pretty_print()

with results being a MinimizerResult object. Note that the method pretty_print() accepts several arguments for customizing the output (e.g., column width, numeric format, etcetera).

class MinimizerResult(**kws)

The results of a minimization.

Minimization results include data such as status and error messages, fit statistics, and the updated (i.e., best-fit) parameters themselves in the params attribute.

The list of (possible) MinimizerResult attributes is given below:

residual

Residual array \({\rm Resid_i}\). Return value of the objective function when using the best-fit values of the parameters.

Type:

numpy.ndarray

params

The best-fit Parameters resulting from the fit.

Type:

Parameters

uvars

Dictionary of uncertainties ufloats from Parameters

Type:

dict

var_names

list of variable Parameter names used in optimization in the same order as the values in init_vals and covar.

Type:

list

covar

Covariance matrix from minimization, with rows and columns corresponding to var_names. If uncertainties cannot be determined, this value will be None.

Type:

numpy.ndarray or None

init_vals

List of initial values for variable parameters using var_names.

Type:

list

init_values

Dictionary of initial values for variable parameters.

Type:

dict

aborted

Whether the fit was aborted.

Type:

bool

status

Termination status of the optimizer. Its value depends on the underlying solver. Refer to message for details.

Type:

int

success

True if the fit succeeded, otherwise False. This is an optimistic view of success, meaning that the method finished without error.

Type:

bool

errorbars

whether uncertainties were estimated for variable Parameters.

Type:

bool

message

Message about fit success.

Type:

str

ier

Integer error value from scipy.optimize.leastsq (‘leastsq’ method only).

Type:

int

lmdif_message

Message from scipy.optimize.leastsq (‘leastsq’ method only).

Type:

str

call_kws

Keyword arguments sent to underlying solver.

Type:

dict

flatchain

A flatchain view of the sampling chain from the emcee method.

Type:

pandas.DataFrame

nfev

Number of function evaluations.

Type:

int

nvarys

Number of variables in fit: \(N_{\rm varys}\).

Type:

int

ndata

Number of data points: \(N\).

Type:

int

nfree

Degrees of freedom in fit: \(N - N_{\rm varys}\).

Type:

int

chisqr

Chi-square: \(\chi^2 = \sum_i^N [{\rm Resid}_i]^2\).

Type:

float

redchi

Reduced chi-square: \(\chi^2_{\nu}= {\chi^2} / {(N - N_{\rm varys})}\).

Type:

float

aic

Akaike Information Criterion statistic: \(N \ln(\chi^2/N) + 2 N_{\rm varys}\).

Type:

float

bic

Bayesian Information Criterion statistic: \(N \ln(\chi^2/N) + \ln(N) N_{\rm varys}\).

Type:

float

show_candidates()

pretty_print() representation of candidates from the brute fitting method.

Goodness-of-Fit Statistics

Table of Fit Results: These values, including the standard Goodness-of-Fit statistics, are all attributes of the MinimizerResult object returned by minimize() or Minimizer.minimize().

Attribute Name

Description / Formula

nfev

number of function evaluations

nvarys

number of variables in fit \(N_{\rm varys}\)

ndata

number of data points: \(N\)

nfree

degrees of freedom in fit: \(N - N_{\rm varys}\)

aborted

boolean of whether the fit has been aborted.

success

boolean for a minimal test of whether the fit finished successfully

errorbars

boolean of whether error bars and unccertainty were estimated

ier

integer flag describing message from leastsq.

message

simple message from leastsq

method

name of fitting methods

residual

residual array, returned by the objective function: \(\{\rm Resid_i\}\)

chisqr

chi-square: \(\chi^2 = \sum_i^N [{\rm Resid}_i]^2\)

redchi

reduced chi-square: \(\chi^2_{\nu}= {\chi^2} / {(N - N_{\rm varys})}\)

aic

Akaike Information Criterion statistic (see below)

bic

Bayesian Information Criterion statistic (see below)

params

best-fit parameters after fit, with uncertainties is available

var_names

ordered list of variable parameter names used for init_vals and covar

covar

covariance matrix (with rows/columns using var_names)

init_vals

list of initial values for variable parameters

init_values

dictionary of initial values for variable Parameters.

uvars

dictionary of uncertainties uvalues for all Parameters.

call_kws

dict of keyword arguments sent to underlying solver

Note that the calculation of chi-square and reduced chi-square assume that the returned residual function is scaled properly to the uncertainties in the data. For these statistics to be meaningful, the person writing the function to be minimized must scale them properly.

After a fit using the leastsq() or least_squares() method has completed successfully, standard errors for the fitted variables and correlations between pairs of fitted variables are automatically calculated from the covariance matrix. For other methods, the calc_covar parameter (default is True) in the Minimizer class determines whether or not to use the numdifftools package to estimate the covariance matrix. The standard error (estimated \(1\sigma\) error-bar) goes into the stderr attribute of the Parameter. The correlations with all other variables will be put into the correl attribute of the Parameter – a dictionary with keys for all other Parameters and values of the corresponding correlation.

In some cases, it may not be possible to estimate the errors and correlations. For example, if a variable actually has no practical effect on the fit, it will likely cause the covariance matrix to be singular, making standard errors impossible to estimate. Placing bounds on varied Parameters makes it more likely that errors cannot be estimated, as being near the maximum or minimum value makes the covariance matrix singular. In these cases, the errorbars attribute of the fit result (Minimizer object) will be False.

Akaike and Bayesian Information Criteria

The MinimizerResult includes the traditional chi-square and reduced chi-square statistics:

\begin{eqnarray*} \chi^2 &=& \sum_i^N r_i^2 \\ \chi^2_\nu &=& \chi^2 / (N-N_{\rm varys}) \end{eqnarray*}

where \(r\) is the residual array returned by the objective function (likely to be (data-model)/uncertainty for data modeling usages), \(N\) is the number of data points (ndata), and \(N_{\rm varys}\) is number of variable parameters.

Also included are the Akaike Information Criterion, and Bayesian Information Criterion statistics, held in the aic and bic attributes, respectively. These give slightly different measures of the relative quality for a fit, trying to balance quality of fit with the number of variable parameters used in the fit. These are calculated as:

\begin{eqnarray*} {\rm aic} &=& N \ln(\chi^2/N) + 2 N_{\rm varys} \\ {\rm bic} &=& N \ln(\chi^2/N) + \ln(N) N_{\rm varys} \\ \end{eqnarray*}

When comparing fits with different numbers of varying parameters, one typically selects the model with lowest reduced chi-square, Akaike information criterion, and/or Bayesian information criterion. Generally, the Bayesian information criterion is considered the most conservative of these statistics.

Uncertainties in Variable Parameters, and their Correlations

As mentioned above, when a fit is complete the uncertainties for fitted Parameters as well as the correlations between pairs of Parameters are usually calculated. This happens automatically either when using the default leastsq() method, the least_squares() method, or for most other fitting methods if the highly-recommended numdifftools package is available. The estimated standard error (the \(1\sigma\) uncertainty) for each variable Parameter will be contained in the stderr, while the correl attribute for each Parameter will contain a dictionary of the correlation with each other variable Parameter. These updated parameters with uncertainty and correlation information will be placed in MinimizerResult.params, so that you may access the best fit value, standard error and correlation. For a successful fit for which uncertainties and correlations can be calculated, the MinimizerResult will also have a uvars attribute that is a dictionary with keynames for each Parameter (includnig constraints) and values of Ufloats from the uncertainties package using the best fit values, the standard error and the correlation between Parameters.

These estimates of the uncertainties are done by inverting the Hessian matrix which represents the second derivative of fit quality for each variable parameter. There are situations for which the uncertainties cannot be estimated, which generally indicates that this matrix cannot be inverted because one of the fit is not actually sensitive to one of the variables. This can happen if a Parameter is stuck at an upper or lower bound, if the variable is simply not used by the fit, or if the value for the variable is such that it has no real influence on the fit.

In principle, the scale of the uncertainties in the Parameters is closely tied to the goodness-of-fit statistics chi-square and reduced chi-square (chisqr and redchi). The standard errors or \(1 \sigma\) uncertainties are those that increase chi-square by 1. Since a “good fit” should have redchi of around 1, this requires that the data uncertainties (and to some extent the sampling of the N data points) is correct. Unfortunately, it is often not the case that one has high-quality estimates of the data uncertainties (getting the data is hard enough!). Because of this common situation, the uncertainties reported and held in stderr are not those that increase chi-square by 1, but those that increase chi-square by reduced chi-square. This is equivalent to rescaling the uncertainty in the data such that reduced chi-square would be 1. To be clear, this rescaling is done by default because if reduced chi-square is far from 1, this rescaling often makes the reported uncertainties sensible, and if reduced chi-square is near 1 it does little harm. If you have good scaling of the data uncertainty and believe the scale of the residual array is correct, this automatic rescaling can be turned off using scale_covar=False.

Note that the simple (and fast!) approach to estimating uncertainties and correlations by inverting the second derivative matrix assumes that the components of the residual array (if, indeed, an array is used) are distributed around 0 with a normal (Gaussian distribution), and that a map of probability distributions for pairs would be elliptical – the size of the of ellipse gives the uncertainty itself and the eccentricity of the ellipse gives the correlation. This simple approach to assessing uncertainties ignores outliers, highly asymmetric uncertainties, or complex correlations between Parameters. In fact, it is not too hard to come up with problems where such effects are important. Our experience is that the automated results are usually the right scale and quite reasonable as initial estimates, but a more thorough exploration of the Parameter space using the tools described in Minimizer.emcee() - calculating the posterior probability distribution of parameters and An advanced example for evaluating confidence intervals can give a more complete understanding of the distributions and relations between Parameters.

Getting and Printing Fit Reports

fit_report(inpars, modelpars=None, show_correl=True, min_correl=0.1, sort_pars=False, correl_mode='list')

Generate a report of the fitting results.

The report contains the best-fit values for the parameters and their uncertainties and correlations.

Parameters:
  • inpars (Parameters) – Input Parameters from fit or MinimizerResult returned from a fit.

  • modelpars (Parameters, optional) – Known Model Parameters.

  • show_correl (bool, optional) – Whether to show list of sorted correlations (default is True).

  • min_correl (float, optional) – Smallest correlation in absolute value to show (default is 0.1).

  • sort_pars (bool or callable, optional) – Whether to show parameter names sorted in alphanumerical order. If False (default), then the parameters will be listed in the order they were added to the Parameters dictionary. If callable, then this (one argument) function is used to extract a comparison key from each list element.

  • correl_mode ({'list', table'} str, optional) – Mode for how to show correlations. Can be either ‘list’ (default) to show a sorted (if sort_pars is True) list of correlation values, or ‘table’ to show a complete, formatted table of correlations.

Returns:

Multi-line text of fit report.

Return type:

str

An example using this to write out a fit report would be:

# <examples/doc_fitting_withreport.py>
from numpy import exp, linspace, pi, random, sign, sin

from lmfit import create_params, fit_report, minimize

p_true = create_params(amp=14.0, period=5.46, shift=0.123, decay=0.032)


def residual(pars, x, data=None):
    """Model a decaying sine wave and subtract data."""
    vals = pars.valuesdict()
    amp = vals['amp']
    per = vals['period']
    shift = vals['shift']
    decay = vals['decay']

    if abs(shift) > pi/2:
        shift = shift - sign(shift)*pi
    model = amp * sin(shift + x/per) * exp(-x*x*decay*decay)
    if data is None:
        return model
    return model - data


random.seed(0)
x = linspace(0.0, 250., 1001)
noise = random.normal(scale=0.7215, size=x.size)
data = residual(p_true, x) + noise

fit_params = create_params(amp=13, period=2, shift=0, decay=0.02)

out = minimize(residual, fit_params, args=(x,), kws={'data': data})

print(fit_report(out))
# <end examples/doc_fitting_withreport.py>

which would give as output:

[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 83
    # data points      = 1001
    # variables        = 4
    chi-square         = 498.811759
    reduced chi-square = 0.50031270
    Akaike info crit   = -689.222517
    Bayesian info crit = -669.587497
[[Variables]]
    amp:     13.9121959 +/- 0.14120321 (1.01%) (init = 13)
    period:  5.48507038 +/- 0.02666520 (0.49%) (init = 2)
    shift:   0.16203673 +/- 0.01405662 (8.67%) (init = 0)
    decay:   0.03264539 +/- 3.8015e-04 (1.16%) (init = 0.02)
[[Correlations]] (unreported correlations are < 0.100)
    C(period, shift) = +0.7974
    C(amp, decay)    = +0.5816
    C(amp, shift)    = -0.2966
    C(amp, period)   = -0.2432
    C(shift, decay)  = -0.1819
    C(period, decay) = -0.1496

To be clear, you can get at all of these values from the fit result out and out.params. For example, a crude printout of the best fit variables and standard errors could be done as

print('-------------------------------')
print('Parameter    Value       Stderr')
for name, param in out.params.items():
    print(f'{name:7s} {param.value:11.5f} {param.stderr:11.5f}')
-------------------------------
Parameter    Value       Stderr
amp        13.91220     0.14120
period      5.48507     0.02667
shift       0.16204     0.01406
decay       0.03265     0.00038

Using a Iteration Callback Function

An iteration callback function is a function to be called at each iteration, just after the objective function is called. The iteration callback allows user-supplied code to be run at each iteration, and can be used to abort a fit.

iter_cb(params, iter, resid, *args, **kws):

User-supplied function to be run at each iteration.

Parameters:
  • params (Parameters) – Parameters.

  • iter (int) – Iteration number.

  • resid (numpy.ndarray) – Residual array.

  • args – Positional arguments. Must match args argument to minimize()

  • kws – Keyword arguments. Must match kws argument to minimize()

Returns:

Iteration abort flag.

Return type:

None for normal behavior, any value like True to abort the fit.

Normally, the iteration callback would have no return value or return None. To abort a fit, have this function return a value that is True (including any non-zero integer). The fit will also abort if any exception is raised in the iteration callback. When a fit is aborted this way, the parameters will have the values from the last iteration. The fit statistics are not likely to be meaningful, and uncertainties will not be computed.

Using the Minimizer class

For full control of the fitting process, you will want to create a Minimizer object.

class Minimizer(userfcn, params, fcn_args=None, fcn_kws=None, iter_cb=None, scale_covar=True, nan_policy='raise', reduce_fcn=None, calc_covar=True, max_nfev=None, **kws)

A general minimizer for curve fitting and optimization.

Parameters:
  • userfcn (callable) –

    Objective function that returns the residual (difference between model and data) to be minimized in a least-squares sense. This function must have the signature:

    userfcn(params, *fcn_args, **fcn_kws)
    

  • params (Parameters) – Contains the Parameters for the model.

  • fcn_args (tuple, optional) – Positional arguments to pass to userfcn.

  • fcn_kws (dict, optional) – Keyword arguments to pass to userfcn.

  • iter_cb (callable, optional) –

    Function to be called at each fit iteration. This function should have the signature:

    iter_cb(params, iter, resid, *fcn_args, **fcn_kws)
    

    where params will have the current parameter values, iter the iteration number, resid the current residual array, and *fcn_args and **fcn_kws are passed to the objective function.

  • scale_covar (bool, optional) – Whether to automatically scale the covariance matrix (default is True).

  • nan_policy ({'raise', 'propagate', 'omit}, optional) –

    Specifies action if userfcn (or a Jacobian) returns NaN values. One of:

    • ’raise’ : a ValueError is raised (default)

    • ’propagate’ : the values returned from userfcn are un-altered

    • ’omit’ : non-finite values are filtered

  • reduce_fcn (str or callable, optional) –

    Function to convert a residual array to a scalar value for the scalar minimizers. Optional values are (where r is the residual array):

    • None : sum-of-squares of residual (default)

      = (r*r).sum()

    • ’negentropy’ : neg entropy, using normal distribution

      = rho*log(rho).sum()`, where rho = exp(-r*r/2)/(sqrt(2*pi))

    • ’neglogcauchy’ : neg log likelihood, using Cauchy distribution

      = -log(1/(pi*(1+r*r))).sum()

    • callable : must take one argument (r) and return a float.

  • calc_covar (bool, optional) – Whether to calculate the covariance matrix (default is True) for solvers other than 'leastsq' and 'least_squares'. Requires the numdifftools package to be installed.

  • max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). The default value depends on the fitting method.

  • **kws (dict, optional) – Options to pass to the minimizer being used.

Notes

The objective function should return the value to be minimized. For the Levenberg-Marquardt algorithm from leastsq() or least_squares(), this returned value must be an array, with a length greater than or equal to the number of fitting variables in the model. For the other methods, the return value can either be a scalar or an array. If an array is returned, the sum-of-squares of the array will be sent to the underlying fitting method, effectively doing a least-squares optimization of the return values. If the objective function returns non-finite values then a ValueError will be raised because the underlying solvers cannot deal with them.

A common use for the fcn_args and fcn_kws would be to pass in other data needed to calculate the residual, including such things as the data array, dependent variable, uncertainties in the data, and other data structures for the model calculation.

The Minimizer object has a few public methods:

Minimizer.minimize(method='leastsq', params=None, **kws)

Perform the minimization.

Parameters:
  • method (str, optional) –

    Name of the fitting method to use. Valid values are:

    • ’leastsq’: Levenberg-Marquardt (default)

    • ’least_squares’: Least-Squares minimization, using Trust Region Reflective method

    • ’differential_evolution’: differential evolution

    • ’brute’: brute force method

    • ’basinhopping’: basinhopping

    • ’ampgo’: Adaptive Memory Programming for Global Optimization

    • ’nelder’: Nelder-Mead

    • ’lbfgsb’: L-BFGS-B

    • ’powell’: Powell

    • ’cg’: Conjugate-Gradient

    • ’newton’: Newton-CG

    • ’cobyla’: Cobyla

    • ’bfgs’: BFGS

    • ’tnc’: Truncated Newton

    • ’trust-ncg’: Newton-CG trust-region

    • ’trust-exact’: nearly exact trust-region

    • ’trust-krylov’: Newton GLTR trust-region

    • ’trust-constr’: trust-region for constrained optimization

    • ’dogleg’: Dog-leg trust-region

    • ’slsqp’: Sequential Linear Squares Programming

    • ’emcee’: Maximum likelihood via Monte-Carlo Markov Chain

    • ’shgo’: Simplicial Homology Global Optimization

    • ’dual_annealing’: Dual Annealing optimization

    In most cases, these methods wrap and use the method with the same name from scipy.optimize, or use scipy.optimize.minimize with the same method argument. Thus ‘leastsq’ will use scipy.optimize.leastsq, while ‘powell’ will use scipy.optimize.minimizer(…, method=’powell’).

    For more details on the fitting methods please refer to the SciPy documentation.

  • params (Parameters, optional) – Parameters of the model to use as starting values.

  • **kws (optional) – Additional arguments are passed to the underlying minimization method.

Returns:

Object containing the optimized parameters and several goodness-of-fit statistics.

Return type:

MinimizerResult

Changed in version 0.9.0: Return value changed to MinimizerResult.

Minimizer.leastsq(params=None, max_nfev=None, **kws)

Use Levenberg-Marquardt minimization to perform a fit.

It assumes that the input Parameters have been initialized, and a function to minimize has been properly set up. When possible, this calculates the estimated uncertainties and variable correlations from the covariance matrix.

This method calls scipy.optimize.leastsq and, by default, numerical derivatives are used.

Parameters:
  • params (Parameters, optional) – Parameters to use as starting point.

  • max_nfev (int or None, optional) – Maximum number of function evaluations. Defaults to 2000*(nvars+1), where nvars is the number of variable parameters.

  • **kws (dict, optional) – Minimizer options to pass to scipy.optimize.leastsq.

Returns:

Object containing the optimized parameters and several goodness-of-fit statistics.

Return type:

MinimizerResult

Changed in version 0.9.0: Return value changed to MinimizerResult.

Minimizer.least_squares(params=None, max_nfev=None, **kws)

Least-squares minimization using scipy.optimize.least_squares.

This method wraps scipy.optimize.least_squares, which has built-in support for bounds and robust loss functions. By default it uses the Trust Region Reflective algorithm with a linear loss function (i.e., the standard least-squares problem).

Parameters:
  • params (Parameters, optional) – Parameters to use as starting point.

  • max_nfev (int or None, optional) – Maximum number of function evaluations. Defaults to 2000*(nvars+1), where nvars is the number of variable parameters.

  • **kws (dict, optional) – Minimizer options to pass to scipy.optimize.least_squares.

Returns:

Object containing the optimized parameters and several goodness-of-fit statistics.

Return type:

MinimizerResult

Changed in version 0.9.0: Return value changed to MinimizerResult.

Minimizer.scalar_minimize(method='Nelder-Mead', params=None, max_nfev=None, **kws)

Scalar minimization using scipy.optimize.minimize.

Perform fit with any of the scalar minimization algorithms supported by scipy.optimize.minimize. Default argument values are:

scalar_minimize() arg

Default Value

Description

method

‘Nelder-Mead’

fitting method

tol

1.e-7

fitting and parameter tolerance

hess

None

Hessian of objective function

Parameters:
  • method (str, optional) –

    Name of the fitting method to use. One of:

    • ’Nelder-Mead’ (default)

    • ’L-BFGS-B’

    • ’Powell’

    • ’CG’

    • ’Newton-CG’

    • ’COBYLA’

    • ’BFGS’

    • ’TNC’

    • ’trust-ncg’

    • ’trust-exact’

    • ’trust-krylov’

    • ’trust-constr’

    • ’dogleg’

    • ’SLSQP’

    • ’differential_evolution’

  • params (Parameters, optional) – Parameters to use as starting point.

  • max_nfev (int or None, optional) – Maximum number of function evaluations. Defaults to 2000*(nvars+1), where nvars is the number of variable parameters.

  • **kws (dict, optional) – Minimizer options pass to scipy.optimize.minimize.

Returns:

Object containing the optimized parameters and several goodness-of-fit statistics.

Return type:

MinimizerResult

Changed in version 0.9.0: Return value changed to MinimizerResult.

Notes

If the objective function returns a NumPy array instead of the expected scalar, the sum-of-squares of the array will be used.

Note that bounds and constraints can be set on Parameters for any of these methods, so are not supported separately for those designed to use bounds. However, if you use the differential_evolution method you must specify finite (min, max) for each varying Parameter.

Minimizer.prepare_fit(params=None)

Prepare parameters for fitting.

Prepares and initializes model and Parameters for subsequent fitting. This routine prepares the conversion of Parameters into fit variables, organizes parameter bounds, and parses, “compiles” and checks constrain expressions. The method also creates and returns a new instance of a MinimizerResult object that contains the copy of the Parameters that will actually be varied in the fit.

Parameters:

params (Parameters, optional) – Contains the Parameters for the model; if None, then the Parameters used to initialize the Minimizer object are used.

Return type:

MinimizerResult

Notes

This method is called directly by the fitting methods, and it is generally not necessary to call this function explicitly.

Changed in version 0.9.0: Return value changed to MinimizerResult.

Minimizer.brute(params=None, Ns=20, keep=50, workers=1, max_nfev=None)

Use the brute method to find the global minimum of a function.

The following parameters are passed to scipy.optimize.brute and cannot be changed:

brute() arg

Value

Description

full_output

1

Return the evaluation grid and the objective function’s values on it.

finish

None

No “polishing” function is to be used after the grid search.

disp

False

Do not print convergence messages (when finish is not None).

It assumes that the input Parameters have been initialized, and a function to minimize has been properly set up.

Parameters:
  • params (Parameters, optional) – Contains the Parameters for the model. If None, then the Parameters used to initialize the Minimizer object are used.

  • Ns (int, optional) – Number of grid points along the axes, if not otherwise specified (see Notes).

  • keep (int, optional) – Number of best candidates from the brute force method that are stored in the candidates attribute. If ‘all’, then all grid points from scipy.optimize.brute are stored as candidates.

  • workers (int or map-like callable, optional) – For parallel evaluation of the grid (see scipy.optimize.brute for more details).

  • max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). Defaults to 200000*(nvarys+1).

Returns:

Object containing the parameters from the brute force method. The return values (x0, fval, grid, Jout) from scipy.optimize.brute are stored as brute_<parname> attributes. The MinimizerResult also contains the :attr:candidates attribute and show_candidates() method. The candidates attribute contains the parameters and chisqr from the brute force method as a namedtuple, ('Candidate', ['params', 'score']) sorted on the (lowest) chisqr value. To access the values for a particular candidate one can use result.candidate[#].params or result.candidate[#].score, where a lower # represents a better candidate. The show_candidates() method uses the pretty_print() method to show a specific candidate-# or all candidates when no number is specified.

Return type:

MinimizerResult

Added in version 0.9.6.

Notes

The brute() method evaluates the function at each point of a multidimensional grid of points. The grid points are generated from the parameter ranges using Ns and (optional) brute_step. The implementation in scipy.optimize.brute requires finite bounds and the range is specified as a two-tuple (min, max) or slice-object (min, max, brute_step). A slice-object is used directly, whereas a two-tuple is converted to a slice object that interpolates Ns points from min to max, inclusive.

In addition, the brute() method in lmfit, handles three other scenarios given below with their respective slice-object:

  • lower bound (min) and brute_step are specified:

    range = (min, min + Ns * brute_step, brute_step).

  • upper bound (max) and brute_step are specified:

    range = (max - Ns * brute_step, max, brute_step).

  • numerical value (value) and brute_step are specified:

    range = (value - (Ns//2) * brute_step`, value + (Ns//2) * brute_step, brute_step).

For more information, check the examples in examples/lmfit_brute_example.ipynb.

Minimizer.basinhopping(params=None, max_nfev=None, **kws)

Use the basinhopping algorithm to find the global minimum.

This method calls scipy.optimize.basinhopping using the default arguments. The default minimizer is BFGS, but since lmfit supports parameter bounds for all minimizers, the user can choose any of the solvers present in scipy.optimize.minimize.

Parameters:
  • params (Parameters, optional) – Contains the Parameters for the model. If None, then the Parameters used to initialize the Minimizer object are used.

  • max_nfev (int or None, optional) – Maximum number of function evaluations (default is None). Defaults to 200000*(nvarys+1).

  • **kws (dict, optional) – Minimizer options to pass to scipy.optimize.basinhopping.

Returns:

Object containing the optimization results from the basinhopping algorithm.

Return type:

MinimizerResult

Added in version 0.9.10.

Minimizer.ampgo(params=None, max_nfev=None, **kws)

Find the global minimum of a multivariate function using AMPGO.

AMPGO stands for ‘Adaptive Memory Programming for Global Optimization’ and is an efficient algorithm to find the global minimum.

Parameters:
  • params (Parameters, optional) – Contains the Parameters for the model. If None, then the Parameters used to initialize the Minimizer object are used.

  • max_nfev (int, optional) – Maximum number of total function evaluations. If None (default), the optimization will stop after totaliter number of iterations (see below)..

  • **kws (dict, optional) –

    Minimizer options to pass to the ampgo algorithm, the options are listed below:

    local: str, optional
        Name of the local minimization method. Valid options
        are:
        - `'L-BFGS-B'` (default)
        - `'Nelder-Mead'`
        - `'Powell'`
        - `'TNC'`
        - `'SLSQP'`
    local_opts: dict, optional
        Options to pass to the local minimizer (default is
        None).
    maxfunevals: int, optional
        Maximum number of function evaluations. If None
        (default), the optimization will stop after
        `totaliter` number of iterations (deprecated: use
        `max_nfev` instead).
    totaliter: int, optional
        Maximum number of global iterations (default is 20).
    maxiter: int, optional
        Maximum number of `Tabu Tunneling` iterations during
        each global iteration (default is 5).
    glbtol: float, optional
        Tolerance whether or not to accept a solution after a
        tunneling phase (default is 1e-5).
    eps1: float, optional
        Constant used to define an aspiration value for the
        objective function during the Tunneling phase (default
        is 0.02).
    eps2: float, optional
        Perturbation factor used to move away from the latest
        local minimum at the start of a Tunneling phase
        (default is 0.1).
    tabulistsize: int, optional
        Size of the (circular) tabu search list (default is 5).
    tabustrategy: {'farthest', 'oldest'}, optional
        Strategy to use when the size of the tabu list exceeds
        `tabulistsize`. It can be `'oldest'` to drop the oldest
        point from the tabu list or `'farthest'` (defauilt) to
        drop the element farthest from the last local minimum
        found.
    disp: bool, optional
        Set to True to print convergence messages (default is
        False).
    

Returns:

Object containing the parameters from the ampgo method, with fit parameters, statistics and such. The return values (x0, fval, eval, msg, tunnel) are stored as ampgo_<parname> attributes.

Return type:

MinimizerResult

Added in version 0.9.10.

Notes

The Python implementation was written by Andrea Gavana in 2014 (http://infinity77.net/global_optimization/index.html).

The details of the AMPGO algorithm are described in the paper “Adaptive Memory Programming for Constrained Global Optimization” located here:

http://leeds-faculty.colorado.edu/glover/fred%20pubs/416%20-%20AMP%20(TS)%20for%20Constrained%20Global%20Opt%20w%20Lasdon%20et%20al%20.pdf

Minimizer.shgo(params=None, max_nfev=None, **kws)

Use the SHGO algorithm to find the global minimum.

SHGO stands for “simplicial homology global optimization” and calls scipy.optimize.shgo using its default arguments.

Parameters:
  • params (Parameters, optional) – Contains the Parameters for the model. If None, then the Parameters used to initialize the Minimizer object are used.

  • max_nfev (int or None, optional) – Maximum number of function evaluations. Defaults to 200000*(nvars+1), where nvars is the number of variable parameters.

  • **kws (dict, optional) – Minimizer options to pass to the SHGO algorithm.

Returns:

Object containing the parameters from the SHGO method. The return values specific to scipy.optimize.shgo (x, xl, fun, funl, nfev, nit, nlfev, nlhev, and nljev) are stored as shgo_<parname> attributes.

Return type:

MinimizerResult

Added in version 0.9.14.

Minimizer.dual_annealing(params=None, max_nfev=None, **kws)

Use the dual_annealing algorithm to find the global minimum.

This method calls scipy.optimize.dual_annealing using its default arguments.

Parameters:
  • params (Parameters, optional) – Contains the Parameters for the model. If None, then the Parameters used to initialize the Minimizer object are used.

  • max_nfev (int or None, optional) – Maximum number of function evaluations. Defaults to 200000*(nvars+1), where nvars is the number of variables.

  • **kws (dict, optional) – Minimizer options to pass to the dual_annealing algorithm.

Returns:

Object containing the parameters from the dual_annealing method. The return values specific to scipy.optimize.dual_annealing (x, fun, nfev, nhev, njev, and nit) are stored as da_<parname> attributes.

Return type:

MinimizerResult

Added in version 0.9.14.

Minimizer.emcee(params=None, steps=1000, nwalkers=100, burn=0, thin=1, ntemps=1, pos=None, reuse_sampler=False, workers=1, float_behavior='posterior', is_weighted=True, seed=None, progress=True, run_mcmc_kwargs={})

Bayesian sampling of the posterior distribution.

The method uses the emcee Markov Chain Monte Carlo package and assumes that the prior is Uniform. You need to have emcee version 3 or newer installed to use this method.

Parameters:
  • params (Parameters, optional) – Parameters to use as starting point. If this is not specified then the Parameters used to initialize the Minimizer object are used.

  • steps (int, optional) – How many samples you would like to draw from the posterior distribution for each of the walkers?

  • nwalkers (int, optional) – Should be set so \(nwalkers >> nvarys\), where nvarys are the number of parameters being varied during the fit. ‘Walkers are the members of the ensemble. They are almost like separate Metropolis-Hastings chains but, of course, the proposal distribution for a given walker depends on the positions of all the other walkers in the ensemble.’ - from the emcee webpage.

  • burn (int, optional) – Discard this many samples from the start of the sampling regime.

  • thin (int, optional) – Only accept 1 in every thin samples.

  • ntemps (int, deprecated) – ntemps has no effect.

  • pos (numpy.ndarray, optional) – Specify the initial positions for the sampler, an ndarray of shape (nwalkers, nvarys). You can also initialise using a previous chain of the same nwalkers and nvarys. Note that nvarys may be one larger than you expect it to be if your userfcn returns an array and is_weighted=False.

  • reuse_sampler (bool, optional) – Set to True if you have already run emcee with the Minimizer instance and want to continue to draw from its sampler (and so retain the chain history). If False, a new sampler is created. The keywords nwalkers, pos, and params will be ignored when this is set, as they will be set by the existing sampler. Important: the Parameters used to create the sampler must not change in-between calls to emcee. Alteration of Parameters would include changed min, max, vary and expr attributes. This may happen, for example, if you use an altered Parameters object and call the minimize method in-between calls to emcee.

  • workers (Pool-like or int, optional) – For parallelization of sampling. It can be any Pool-like object with a map method that follows the same calling sequence as the built-in map function. If int is given as the argument, then a multiprocessing-based pool is spawned internally with the corresponding number of parallel processes. ‘mpi4py’-based parallelization and ‘joblib’-based parallelization pools can also be used here. Note: because of multiprocessing overhead it may only be worth parallelising if the objective function is expensive to calculate, or if there are a large number of objective evaluations per step (nwalkers * nvarys).

  • float_behavior (str, optional) – Meaning of float (scalar) output of objective function. Use ‘posterior’ if it returns a log-posterior probability or ‘chi2’ if it returns \(\chi^2\). See Notes for further details.

  • is_weighted (bool, optional) – Has your objective function been weighted by measurement uncertainties? If is_weighted=True then your objective function is assumed to return residuals that have been divided by the true measurement uncertainty (data - model) / sigma. If is_weighted=False then the objective function is assumed to return unweighted residuals, data - model. In this case emcee will employ a positive measurement uncertainty during the sampling. This measurement uncertainty will be present in the output params and output chain with the name __lnsigma. A side effect of this is that you cannot use this parameter name yourself. Important: this parameter only has any effect if your objective function returns an array. If your objective function returns a float, then this parameter is ignored. See Notes for more details.

  • seed (int or numpy.random.RandomState, optional) – If seed is an int, a new numpy.random.RandomState instance is used, seeded with seed. If seed is already a numpy.random.RandomState instance, then that numpy.random.RandomState instance is used. Specify seed for repeatable minimizations.

  • progress (bool, optional) – Print a progress bar to the console while running.

  • run_mcmc_kwargs (dict, optional) – Additional (optional) keyword arguments that are passed to emcee.EnsembleSampler.run_mcmc.

Returns:

MinimizerResult object containing updated params, statistics, etc. The updated params represent the median of the samples, while the uncertainties are half the difference of the 15.87 and 84.13 percentiles. The MinimizerResult contains a few additional attributes: chain contain the samples and has shape ((steps - burn) // thin, nwalkers, nvarys). flatchain is a pandas.DataFrame of the flattened chain, that can be accessed with result.flatchain[parname]. lnprob contains the log probability for each sample in chain. The sample with the highest probability corresponds to the maximum likelihood estimate. acor is an array containing the auto-correlation time for each parameter if the auto-correlation time can be computed from the chain. Finally, acceptance_fraction (an array of the fraction of steps accepted for each walker).

Return type:

MinimizerResult

Notes

This method samples the posterior distribution of the parameters using Markov Chain Monte Carlo. It calculates the log-posterior probability of the model parameters, F, given the data, D, \(\ln p(F_{true} | D)\). This ‘posterior probability’ is given by:

\[\ln p(F_{true} | D) \propto \ln p(D | F_{true}) + \ln p(F_{true})\]

where \(\ln p(D | F_{true})\) is the ‘log-likelihood’ and \(\ln p(F_{true})\) is the ‘log-prior’. The default log-prior encodes prior information known about the model that the log-prior probability is -numpy.inf (impossible) if any of the parameters is outside its limits, and is zero if all the parameters are inside their bounds (uniform prior). The log-likelihood function is [1]:

\[\ln p(D|F_{true}) = -\frac{1}{2}\sum_n \left[\frac{(g_n(F_{true}) - D_n)^2}{s_n^2}+\ln (2\pi s_n^2)\right]\]

The first term represents the residual (\(g\) being the generative model, \(D_n\) the data and \(s_n\) the measurement uncertainty). This gives \(\chi^2\) when summed over all data points. The objective function may also return the log-posterior probability, \(\ln p(F_{true} | D)\). Since the default log-prior term is zero, the objective function can also just return the log-likelihood, unless you wish to create a non-uniform prior.

If the objective function returns a float value, this is assumed by default to be the log-posterior probability, (float_behavior default is ‘posterior’). If your objective function returns \(\chi^2\), then you should use float_behavior='chi2' instead.

By default objective functions may return an ndarray of (possibly weighted) residuals. In this case, use is_weighted to select whether these are correctly weighted by measurement uncertainty. Note that this ignores the second term above, so that to calculate a correct log-posterior probability value your objective function should return a float value. With is_weighted=False the data uncertainty, s_n, will be treated as a nuisance parameter to be marginalized out. This uses strictly positive uncertainty (homoscedasticity) for each data point, \(s_n = \exp(\rm{\_\_lnsigma})\). __lnsigma will be present in MinimizerResult.params, as well as Minimizer.chain and nvarys will be increased by one.

References

Minimizer.emcee() - calculating the posterior probability distribution of parameters

Minimizer.emcee() can be used to obtain the posterior probability distribution of parameters, given a set of experimental data. Note that this method does not actually perform a fit at all. Instead, it explores parameter space to determine the probability distributions for the parameters, but without an explicit goal of attempting to refine the solution. It should not be used for fitting, but it is a useful method to to more thoroughly explore the parameter space around the solution after a fit has been done and thereby get an improved understanding of the probability distribution for the parameters. It may be able to refine your estimate of the most likely values for a set of parameters, but it will not iteratively find a good solution to the minimization problem. To use this method effectively, you should first use another minimization method and then use this method to explore the parameter space around those best-fit values.

To illustrate this, we’ll use an example problem of fitting data to function of a double exponential decay, including a modest amount of Gaussian noise to the data. Note that this example is the same problem used in An advanced example for evaluating confidence intervals for evaluating confidence intervals in the parameters, which is a similar goal to the one here.

import matplotlib.pyplot as plt
import numpy as np

import lmfit

x = np.linspace(1, 10, 250)
np.random.seed(0)
y = 3.0 * np.exp(-x / 2) - 5.0 * np.exp(-(x - 0.1) / 10.) + 0.1 * np.random.randn(x.size)

Create a Parameter set for the initial guesses:

p = lmfit.Parameters()
p.add_many(('a1', 4.), ('a2', 4.), ('t1', 3.), ('t2', 3., True))

def residual(p):
    v = p.valuesdict()
    return v['a1'] * np.exp(-x / v['t1']) + v['a2'] * np.exp(-(x - 0.1) / v['t2']) - y

Solving with minimize() gives the Maximum Likelihood solution. Note that we use the robust Nelder-Mead method here. The default Levenberg-Marquardt method seems to have difficulty with exponential decays, though it can refine the solution if starting near the solution:

mi = lmfit.minimize(residual, p, method='nelder', nan_policy='omit')
lmfit.printfuncs.report_fit(mi.params, min_correl=0.5)
[[Variables]]
    a1:  2.98623689 +/- 0.15010519 (5.03%) (init = 4)
    a2: -4.33525597 +/- 0.11765824 (2.71%) (init = 4)
    t1:  1.30993186 +/- 0.13449656 (10.27%) (init = 3)
    t2:  11.8240752 +/- 0.47172610 (3.99%) (init = 3)
[[Correlations]] (unreported correlations are < 0.500)
    C(a2, t2) = +0.9876
    C(a2, t1) = -0.9278
    C(t1, t2) = -0.8852
    C(a1, t1) = -0.6093

and plotting the fit using the Maximum Likelihood solution gives the graph below:

plt.plot(x, y, 'o')
plt.plot(x, residual(mi.params) + y, label='best fit')
plt.legend()
plt.show()
_images/fitting_12_0.svg

Note that the fit here (for which the numdifftools package is installed) does estimate and report uncertainties in the parameters and correlations for the parameters, and reports the correlation of parameters a2 and t2 to be very high. As we’ll see, these estimates are pretty good, but when faced with such high correlation, it can be helpful to get the full probability distribution for the parameters. MCMC methods are very good for this.

Furthermore, we wish to deal with the data uncertainty. This is called marginalisation of a nuisance parameter. emcee requires a function that returns the log-posterior probability. The log-posterior probability is a sum of the log-prior probability and log-likelihood functions. The log-prior probability is assumed to be zero if all the parameters are within their bounds and -np.inf if any of the parameters are outside their bounds.

If the objective function returns an array of unweighted residuals (i.e., data-model) as is the case here, you can use is_weighted=False as an argument for emcee. In that case, emcee will automatically add/use the __lnsigma parameter to estimate the true uncertainty in the data. To place boundaries on this parameter one can do:

mi.params.add('__lnsigma', value=np.log(0.1), min=np.log(0.001), max=np.log(2))

Now we have to set up the minimizer and do the sampling (again, just to be clear, this is not doing a fit):

res = lmfit.minimize(residual, method='emcee', nan_policy='omit', burn=300, steps=1000, thin=20,
                     params=mi.params, is_weighted=False, progress=False)

As mentioned in the Notes for Minimizer.emcee(), the is_weighted argument will be ignored if your objective function returns a float instead of an array. For the documentation we set progress=False; the default is to print a progress bar to the Terminal if the tqdm package is installed.

The success of the method (i.e., whether or not the sampling went well) can be assessed by checking the integrated autocorrelation time and/or the acceptance fraction of the walkers. For this specific example the autocorrelation time could not be estimated because the “chain is too short”. Instead, we plot the acceptance fraction per walker and its mean value suggests that the sampling worked as intended (as a rule of thumb the value should be between 0.2 and 0.5).

plt.plot(res.acceptance_fraction, 'o')
plt.xlabel('walker')
plt.ylabel('acceptance fraction')
plt.show()
_images/fitting_15_0.svg

With the results from emcee, we can visualize the posterior distributions for the parameters using the corner package:

import corner

emcee_plot = corner.corner(res.flatchain, labels=res.var_names,
                           truths=list(res.params.valuesdict().values()))
_images/fitting_16_0.svg

The values reported in the MinimizerResult are the medians of the probability distributions and a 1 \(\sigma\) quantile, estimated as half the difference between the 15.8 and 84.2 percentiles. Printing these values:

print('median of posterior probability distribution')
print('--------------------------------------------')
lmfit.report_fit(res.params)
median of posterior probability distribution
--------------------------------------------
[[Variables]]
    a1:         2.98945718 +/- 0.14033921 (4.69%) (init = 2.986237)
    a2:        -4.34687243 +/- 0.12131092 (2.79%) (init = -4.335256)
    t1:         1.32883916 +/- 0.13766047 (10.36%) (init = 1.309932)
    t2:         11.7836194 +/- 0.47719763 (4.05%) (init = 11.82408)
    __lnsigma: -2.32559226 +/- 0.04542650 (1.95%) (init = -2.302585)
[[Correlations]] (unreported correlations are < 0.100)
    C(a2, t2) = +0.9811
    C(a2, t1) = -0.9377
    C(t1, t2) = -0.8943
    C(a1, t1) = -0.5076
    C(a1, a2) = +0.2140
    C(a1, t2) = +0.1777

You can see that this recovered the right uncertainty level on the data. Note that these values agree pretty well with the results, uncertainties and correlations found by the fit and using numdifftools to estimate the covariance matrix. That is, even though the parameters a2, t1, and t2 are all highly correlated and do not display perfectly Gaussian probability distributions, the probability distributions found by explicitly sampling the parameter space are not so far from elliptical as to make the simple (and much faster) estimates from inverting the covariance matrix completely invalid.

As mentioned above, the result from emcee reports the median values, which are not necessarily the same as the Maximum Likelihood Estimate. To obtain the values for the Maximum Likelihood Estimation (MLE) we find the location in the chain with the highest probability:

highest_prob = np.argmax(res.lnprob)
hp_loc = np.unravel_index(highest_prob, res.lnprob.shape)
mle_soln = res.chain[hp_loc]
for i, par in enumerate(p):
    p[par].value = mle_soln[i]


print('\nMaximum Likelihood Estimation from emcee       ')
print('-------------------------------------------------')
print('Parameter  MLE Value   Median Value   Uncertainty')
fmt = '  {:5s}  {:11.5f} {:11.5f}   {:11.5f}'.format
for name, param in p.items():
    print(fmt(name, param.value, res.params[name].value,
              res.params[name].stderr))

Maximum Likelihood Estimation from emcee       
-------------------------------------------------
Parameter  MLE Value   Median Value   Uncertainty
  a1         2.93839     2.98946       0.14034
  a2        -4.35274    -4.34687       0.12131
  t1         1.34310     1.32884       0.13766
  t2        11.78782    11.78362       0.47720

Here the difference between MLE and median value are seen to be below 0.5%, and well within the estimated 1-\(\sigma\) uncertainty.

Finally, we can use the samples from emcee to work out the 1- and 2-\(\sigma\) error estimates.

print('\nError estimates from emcee:')
print('------------------------------------------------------')
print('Parameter  -2sigma  -1sigma   median  +1sigma  +2sigma')

for name in p.keys():
    quantiles = np.percentile(res.flatchain[name],
                              [2.275, 15.865, 50, 84.135, 97.275])
    median = quantiles[2]
    err_m2 = quantiles[0] - median
    err_m1 = quantiles[1] - median
    err_p1 = quantiles[3] - median
    err_p2 = quantiles[4] - median
    fmt = '  {:5s}   {:8.4f} {:8.4f} {:8.4f} {:8.4f} {:8.4f}'.format
    print(fmt(name, err_m2, err_m1, median, err_p1, err_p2))

Error estimates from emcee:
------------------------------------------------------
Parameter  -2sigma  -1sigma   median  +1sigma  +2sigma
  a1       -0.2656  -0.1362   2.9895   0.1445   0.3141
  a2       -0.3209  -0.1309  -4.3469   0.1118   0.1985
  t1       -0.2377  -0.1305   1.3288   0.1448   0.3278
  t2       -1.0677  -0.4807  11.7836   0.4739   0.8990

And we see that the initial estimates for the 1-\(\sigma\) standard error using numdifftools was not too bad. We’ll return to this example problem in An advanced example for evaluating confidence intervals and use a different method to calculate the 1- and 2-\(\sigma\) error bars.