In-depth Model Fitting

This notebook demonstrates how calibration models can be fitted with the helper function calibr8.fit_scipy.

[29]:

from matplotlib import pyplot
import numpy
import pandas
import scipy.stats

import calibr8

Generating fake data

For the purpose of this notebook, a synthetic dataset is sufficient. The parameter vector of the ground truth model consists of two parts: \(\theta_µ\) and \(\theta_\sigma\).

[30]:

θ_mu = (0.1, 4, 15, 0.3, 1) # 5 parameters of the asymmetric logistic
θ_scale = (0.05,)           # 1 parameter for constant scale
# concatenate all the parameter into one vector:
θ_true = θ_mu + θ_scale

# generate 24 calibration points with normal distributed noise
X = numpy.linspace(0.1, 30, 24)
Y = scipy.stats.norm.rvs(
    loc=calibr8.asymmetric_logistic(X, θ_mu),
    scale=calibr8.polynomial(X, θ_scale),
)

fig, ax = pyplot.subplots(dpi=120)

# plot the noise band of the ground truth
X_dense = numpy.linspace(0, 30, 100)
calibr8.plot_norm_band(
    ax,
    X_dense,
    mu=calibr8.asymmetric_logistic(X_dense, θ_mu),
    scale=calibr8.polynomial(X_dense, θ_scale),
)

ax.scatter(X, Y, label='$Y_{obs}$')

ax.set_xlabel('independent value')
ax.set_ylabel('dependent value')
ax.legend()
pyplot.show()

_images/Advanced_In-depth_Model_Fitting_3_0.png

Setting up the model

Our data follows an asymmetric logistic ground truth with normal-distributed observation noise. The noise sigma is constant, so it can be modeled with a 0th-degree polynomial.

calibr8 already implements three commonly used model types that can be inherited and customized: + calibr8.BaseModelN implements likelihood and inference methods for normal-distributed noise models + calibr8.BasePolynomialModelN models mu and sigma as polynomials + calibr8.BaseAsymmetricLogisticModelN models mu as asymmetric logistic and sigma as polynomial The degress of the polynomials in BasePolynomialModelN and BaseAsymmetricLogisticModelN are user-defined.

[31]:

class ToyAsymmetricModelV1(calibr8.BaseAsymmetricLogisticN):
    def __init__(self):
        super().__init__(
            independent_key='concentration_mM',
            dependent_key='absorbance_au',
            sigma_degree=0
        )

Fitting

Here we will not only fit the model, but also compare the two available fitting methods.

We begin by making initial guesses about \(\theta\) and the bounds of the parameter space.

The best way to make initial guesses is to scatter-plot the raw data and visually guess the limits, inflection point and slope. We already have a scatter plot above.

For more intuition about the parameters of the asymmetric logistic function, see the “Basic_GeneralizedLogistic” notebook that has an interactive plot where you can manipulate the parameters.

[32]:

guess = [
    0,   # lower limit approximately at 0 ?
    4.5, # upper limit maybe around 4.5 ?
    12,  # inflection point maybe at arond 12 ?
    1/5, # approximate slope at inflection point
    2,   # asymmetry paramter: the inflection point is near the upper limit -> positive value
] + [
    numpy.ptp(Y)/100,  # constant noise at around 1 % of the amplitude
]

# bounds can be half-open, or closed:
# `fit_scipy` works best with half-open bounds for L_L and L_U
bounds_halfopen = [
    (-numpy.inf, 1),   # L_L
    (3.5, numpy.inf),  # L_U
    (10, 20),          # I_x
    (1/20, 1),         # S
    (-3, 3),           # c
] + [
    (0.01, 0.5), # min/max for scale
]

We’ll collect not only the fit results, but also the guess and ground truth in a DataFrame:

[33]:

model = ToyAsymmetricModelV1()

df_results = pandas.DataFrame(columns=['method', 'θ']).set_index('method')
df_results.loc['truth', 'θ'] = θ_true
df_results.loc['guess', 'θ'] = guess
df_results

[33]:

	θ
method
truth	(0.1, 4, 15, 0.3, 1, 0.05)
guess	[0, 4.5, 12, 0.2, 2, 0.037627665408536896]

The fitting helper methods return a tuple of the best found θ and a list of historic parameter sets during the optimization.

The history will later be used to visualize the optimization over time.

[34]:

df_results.loc['scipy', 'θ'], history_scipy = calibr8.fit_scipy(
    model,
    independent=X, dependent=Y,
    theta_guess=guess,
    theta_bounds=bounds_halfopen
)

Now that parameter vectors were found, we can add a “loglikelihood” column to compare the results:

[35]:

df_results['loglikelihood'] = [
    model.loglikelihood(x=X, y=Y, theta=θ)
    for θ in df_results['θ']
]
df_results

[35]:

	θ	loglikelihood
method
truth	(0.1, 4, 15, 0.3, 1, 0.05)	39.048817
guess	[0, 4.5, 12, 0.2, 2, 0.037627665408536896]	-12718.564797
scipy	[0.03816243158544993, 4.036333518903234, 14.84...	42.226240

[36]:

def plot_fit_history(axs, model, history):
    X = model.cal_independent
    for i, x in enumerate(history):
        x_dense = numpy.exp(numpy.linspace(numpy.log(min(X)), numpy.log(max(X)), 1000))
        mu, _ = model.predict_dependent(x_dense, theta=x)
        axs[0].plot(x_dense, mu, color=pyplot.cm.autumn(i / len(history)), linewidth=0.2)
        axs[1].plot(x_dense, mu, color=pyplot.cm.autumn(i / len(history)), linewidth=0.2)
    return


for method, history in zip(df_results.index, [None, None, history_scipy]):
    # the plot_model method uses model.theta_fitted
    # here we override it with the parameter set found by a certain method
    model.theta_fitted = df_results.loc[method, 'θ']

    fig, axs = calibr8.plot_model(model)
    if history is not None:
        plot_fit_history(axs, model, history)
    axs[1].set_title(method)
    fig.tight_layout()
    pyplot.show()

C:\Users\osthege\AppData\Local\Temp\ipykernel_24424\1783795081.py:20: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.
  fig.tight_layout()

_images/Advanced_In-depth_Model_Fitting_14_1.png

_images/Advanced_In-depth_Model_Fitting_14_2.png

_images/Advanced_In-depth_Model_Fitting_14_3.png

[37]:

%load_ext watermark
%watermark -n -u -v -iv -w

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Last updated: Tue Dec 13 2022

Python implementation: CPython
Python version       : 3.9.15
IPython version      : 8.7.0

sys       : 3.9.15 (main, Nov 24 2022, 14:39:17) [MSC v.1916 64 bit (AMD64)]
pandas    : 1.4.3
scipy     : 1.9.0
numpy     : 1.23.1
matplotlib: 3.5.3
calibr8   : 6.6.1

Watermark: 2.3.1

[ ]: