General interface

Overview and concepts

The general interface supports the following setup for likelihood-based indirect inference, using a structural model $S$ and an auxiliary model $A$, with given data $y$.

For each set of parameters $θ$, a structural model $S$ is used to generate simulated data $x$, ie

\[x = x_S(θ, ϵ)\]

where $ϵ$ is a set of common random numbers that can be kept constant across different values of $θ$. Nevertheless, the above is not necessarily a deterministic relationship, as additional randomness can be used.

An auxiliary model $A$ with parameters $ϕ$ can be estimated using generated data $x$ with maximum likelihood, ie

\[ϕ_A(x) = \arg\max_ϕ p_A(x ∣ ϕ)\]

is the maximum likelihood estimate.

The likelihood of $y$ at parameters $θ$ is obtained as

\[p_A(y ∣ ϕ_A(x_S(θ, ϵ)))\]

This is multiplied by a prior on $θ$, which is specified in logs (ie the log prior is added to the log likelihood).

The user should define

  1. a model type, which represents both the structural and the auxiliary model,

  2. methods for the functions below to dispatch on this type.

component | Julia method
:-------- | :------------
$x_S$ | simulate_data
$p_A$ | loglikelihood
$ϕ_A$ | MLE
draw $ϵ$ | common_random

The framework is explained in detail below.
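
Schematically, one evaluation of the indirect log posterior composes these functions as in the following sketch; this is only an illustration of the flow, not the actual package internals, and the helper name is made up.

```julia
# Schematic sketch of one log posterior evaluation; `model`, `logprior`, `y`,
# `θ` and `ϵ` are supplied by the user, the called functions are documented below.
function indirect_logposterior_sketch(rng, model, logprior, y, θ, ϵ)
    x = simulate_data(rng, model, θ, ϵ)        # simulate from the structural model
    x ≡ nothing && return -Inf                 # short-circuit infeasible parameters
    ϕ = MLE(model, x)                          # estimate the auxiliary model
    loglikelihood(model, y, ϕ) + logprior(θ)   # indirect log likelihood + log prior
end
```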

Models

The structural and the auxiliary model should be represented by a type (ie struct) defined for each problem. A single type is used for both the structural and the auxiliary model; since the two aspects are implemented by different functions, this should not cause confusion.
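
For concreteness, the sketches in the rest of this page use a hypothetical toy model (all names below are made up for illustration): structurally, $x_i = θ_1 + θ_2 ϵ_i^2$ with standard normal shocks $ϵ_i$, while the auxiliary model is a normal distribution with parameters $ϕ = (μ, σ^2)$.

```julia
# Hypothetical toy model used in the examples below. A single type stands for
# both the structural model (xᵢ = θ₁ + θ₂ ϵᵢ²) and the auxiliary model
# (normal with parameters ϕ = (μ, σ²)).
struct ToyQuadraticModel
    N::Int   # number of observations to simulate
end
```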

Simulating data

The (structural) model is used to generate data from parameters $θ$ and common random numbers $ϵ$ using simulate_data. The mapping is not necessarily deterministic even given $ϵ$, as additional randomness is allowed, but common random numbers are advantageous since they make the mapping from structural to auxiliary parameters continuous. Reusing them across simulations also reduces variance.

When applicable, independent variables (eg covariates) necessary for simulation should be included in the structural model object. It is recommended that a type is defined for each problem, along with methods for simulate_data.

simulate_data(rng, model, θ, ϵ)

Simulate data from the model

  1. using random number generator rng,

  2. with parameters θ,

  3. common random numbers ϵ.

This method should

  1. accept ϵ generated by common_random,

  2. return simulated data in the format that can be used by MLE and loglikelihood.

Usage

The user should define a method for this function for each model type with the signature

simulate_data(rng::AbstractRNG, model, θ, ϵ)

For infeasible/meaningless parameters, return nothing.

source
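
For the toy model above, a simulate_data method could look like the following sketch (the import assumes the package module is named IndirectLikelihood):

```julia
using Random
import IndirectLikelihood: simulate_data   # module name assumed

function simulate_data(rng::AbstractRNG, model::ToyQuadraticModel, θ, ϵ)
    θ₁, θ₂ = θ
    θ₂ ≥ 0 || return nothing    # infeasible parameters ⇒ return nothing
    # rng is unused: given ϵ, this simulation is deterministic, which is allowed
    θ₁ .+ θ₂ .* abs2.(ϵ)        # a plain Vector, consumed by MLE and loglikelihood
end
```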
simulate_data(rng, model, θ)

Simulate data, generating ϵ using rng.

See common_random.

Usage

For interactive/exploratory use. Models should define methods for simulate_data(rng::AbstractRNG, model, θ, ϵ).

source
simulate_data(model, θ)

Simulate data, generating ϵ with the default random number generator.

See common_random.

Usage

For interactive/exploratory use. Models should define methods for simulate_data(rng::AbstractRNG, model, θ, ϵ).

source
simulate_data(problem, θ)

Simulate data with the given parameters θ.

source

Common random numbers

Common random numbers are a set of random numbers that can be re-used for simulation with different parameter values.

common_random should return a draw of these random variables (usually an Array or a collection of similar structures).

common_random(rng, model)

Return common random numbers that can be reused by simulate_data with different parameters.

When the model structure does not allow common random numbers, the convention is to return nothing.

The first argument is the random number generator.

Usage

The user should define a method for this function for each model type.

See also common_random! for further optimizations.

source
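
For the toy model, the common random numbers are simply a vector of standard normal shocks:

```julia
using Random
import IndirectLikelihood: common_random   # module name assumed

common_random(rng::AbstractRNG, model::ToyQuadraticModel) = randn(rng, model.N)
```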

For error structures which can be overwritten in place, the user can define common_random! as an optimization.

common_random!(rng, model, ϵ)

Update the common random numbers for the model. The semantics are as follows:

  1. it can, but does not need to, change the contents of its argument ϵ,

  2. the “new” common random numbers should be returned regardless.

Two common usage patterns are

  1. having a mutable ϵ, updating that in place, returning ϵ,

  2. generating new ϵ, returning that.

Note

The default method falls back to common_random, reallocating with each call. A method for this function should be defined only when allocations can be optimized.

source
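
Since the toy model's shocks live in a Vector, they can be overwritten in place, which avoids reallocating on every update:

```julia
using Random
import IndirectLikelihood: common_random!   # module name assumed

# randn! overwrites ϵ in place and returns it, satisfying both requirements above
common_random!(rng::AbstractRNG, model::ToyQuadraticModel, ϵ) = randn!(rng, ϵ)
```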
common_random!(problem)

Return a new problem with updated common random numbers.

source

Data

Data can be of any type, since it is generated and used by user-defined functions. Arrays and (optionally named) tuples are recommended for simple models, as there is no need to wrap them in a type: the model type is used for dispatch.

More complex data structures may benefit from being wrapped in a struct.

Auxiliary model estimation and likelihood

These methods should be defined for model types, and accept data from simulate_data.

MLE(model, data)

Maximum likelihood estimate of the parameters for data in model.

When ϕ == MLE(model, data), ϕ should maximize

ϕ -> loglikelihood(model, data, ϕ)

See loglikelihood.

Methods should be defined by the user for each model type.

source
loglikelihood(model, data, ϕ)

Log likelihood of data under model with parameters $ϕ$. See MLE.

source
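
For the toy model's normal auxiliary model, both functions have closed forms; ϕ is the tuple (μ, σ²), and the maximum likelihood variance uses the 1/n normalization so that MLE indeed maximizes loglikelihood (module name again assumed):

```julia
using Statistics
import IndirectLikelihood: MLE, loglikelihood   # module name assumed

# closed-form MLE for the normal auxiliary model: sample mean and 1/n variance
MLE(model::ToyQuadraticModel, data) = (mean(data), var(data; corrected = false))

function loglikelihood(model::ToyQuadraticModel, data, ϕ)
    μ, σ² = ϕ
    n = length(data)
    -(n * log(2π * σ²) + sum(abs2, data .- μ) / σ²) / 2
end
```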

Problem framework

Estimation problems can be organized into a single structure, which contains the model and data objects, and also a (log) prior on the parameters. This simplifies the evaluation of likelihoods.

The common random numbers are saved in the object. Use common_random! to update them.

IndirectLikelihoodProblem(model, logprior, data; rng, ϵ)

A simple wrapper for an indirect likelihood problem, with the given model object, log prior, and data.

The random number generator rng is saved, and used to initialize the common random numbers ϵ by default.

The user should implement simulate_data, MLE, loglikelihood, and common_random.

source
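
Putting the toy model together, with placeholder “observed” data and a flat prior (the log prior is assumed to be a callable of θ, and the package is assumed to be loaded as IndirectLikelihood):

```julia
y = 1 .+ 2 .* abs2.(randn(500))     # placeholder “observed” data for the toy model
logprior(θ) = 0.0                   # improper flat prior, in logs
problem = IndirectLikelihoodProblem(ToyQuadraticModel(length(y)), logprior, y)
```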

You can then obtain the (log) posterior at the parameters. Note the single-argument version which returns a callable.

indirect_logposterior(problem, θ)

Evaluate the indirect log posterior of problem at parameters θ.

Short-circuits for infeasible parameters.

source
indirect_logposterior(problem)

Return a callable that evaluates indirect_logposterior(problem, θ) at the given θ.

source
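
Continuing the toy example, the log posterior can be evaluated directly, or wrapped as a callable (eg for passing to an MCMC sampler):

```julia
indirect_logposterior(problem, (1.0, 2.0))   # log posterior at θ = (1.0, 2.0)

ℓ = indirect_logposterior(problem)           # callable version
ℓ((1.0, 2.0))                                # same evaluation as above
```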

For testing inference with simulated data, the following function can be used to create a problem object.

simulate_problem(model, logprior, θ; rng, ϵ)

Initialize an IndirectLikelihoodProblem with simulated data, using parameters θ.

Useful for debugging and exploration of identification with simulated data.

source
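
For example, continuing the toy model, a test problem can be generated at known parameters and the log posterior evaluated there:

```julia
θ₀ = (1.0, 2.0)
test_problem = simulate_problem(ToyQuadraticModel(500), logprior, θ₀)
indirect_logposterior(test_problem, θ₀)      # evaluate at the generating parameters
```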

Utilities

local_jacobian(problem, θ₀, ω_to_θ; vecϕ)

Calculate the local Jacobian of the estimated auxiliary parameters $ϕ$ at the structural parameters $θ=θ₀$.

ω_to_θ is a transformation that maps a vector of reals $ω ∈ ℝⁿ$ to the parameters θ in the format acceptable to simulate_data. It should support ContinuousTransformations.transform and ContinuousTransformations.inverse. See, for example, ContinuousTransformations.TransformationTuple.

vecϕ is a function that is used to flatten the auxiliary parameters to a vector. Defaults to vec_parameters.

source

vec_parameters(x)

Return the values of the argument as a vector, potentially (but not necessarily) restricting to those elements that uniquely determine the argument.

For example, a symmetric matrix would be determined by the diagonal and either half.

source