High-Level Methods for Unconstrained Fitting
Maximum Likelihood Optimization
Template construction is by far the most complicated step in the fitting procedure. Once your templates have been constructed, fitting them to an observed Hess diagram amounts to maximization of the Poisson likelihood ratio (Dolphin 2002). It is possible to construct more complicated hierarchical models including things like metallicity distribution functions; we discuss these in the next section. In this section we discuss methods for fitting where the only constraint is that star formation rates cannot be negative. We provide the StarFormationHistories.construct_x0 method to help with setting the initial guess for this optimization.
StarFormationHistories.construct_x0 — Function

x0::typeof(logAge) = construct_x0(logAge::AbstractVector{T},
                                  T_max::Number;
                                  normalize_value::Number=one(T)) where T <: Number
Generates a vector of initial stellar mass normalizations for input to `fit_templates` or `hmc_sample` with a total stellar mass of `normalize_value`, such that the implied star formation rate is constant across the provided `logAge` vector that contains the `log10(Age [yr])` of each isochrone that you are going to input as models. For the purposes of computing the constant star formation rate, the provided `logAge` are treated as left-bin edges, with the final right-bin edge being `T_max`, which has units of Gyr. For example, you might have `logAge=[6.6, 6.7, 6.8]`, in which case a final logAge of 6.9 would give equal bin widths. In this case you would set `T_max = exp10(6.9) / 1e9 ≈ 0.0079` so that the width of the final bin for the star formation rate calculation has the same `log10(Age [yr])` step as the other bins.
Examples
julia> x0 = construct_x0(repeat([7.0,8.0,9.0],3), 10.0; normalize_value=5.0)
9-element Vector{Float64}: ...
julia> sum(x0)
4.99... # Close to `normalize_value`.
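To make the `T_max` convention concrete, here is a minimal sketch following the equal-bin-width example from the description above (values are illustrative):

logAge = [6.6, 6.7, 6.8]  # left bin edges in log10(Age [yr])
T_max = exp10(6.9) / 1e9  # final right-bin edge in Gyr, ≈ 0.0079
x0 = construct_x0(logAge, T_max; normalize_value=1.0)
sum(x0)                   # close to `normalize_value` = 1.0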
When it comes to performing the optimization, the simplest method we offer is StarFormationHistories.fit_templates_lbfgsb. This will optimize one coefficient per template; there is no overarching metallicity evolution or other constraint, besides that the stellar masses of the populations cannot be negative. This performs a maximum likelihood optimization with the bounded quasi-Newton LBFGS method as implemented in L-BFGS-B and wrapped in LBFGSB.jl, using analytic gradients. It is fast and converges fairly reliably, even when the initial guess is not particularly close to the maximum likelihood estimate. It provides no uncertainty estimation. It is normal for some of the coefficients to converge to zero.
StarFormationHistories.fit_templates_lbfgsb — Function

(-logL, coeffs) = fit_templates_lbfgsb(models::AbstractVector{T},
                                       data::AbstractMatrix{<:Number};
                                       x0::AbstractVector{<:Number} = ones(S,length(models)),
                                       factr::Number=1e-12,
                                       pgtol::Number=1e-5,
                                       iprint::Integer=0,
                                       kws...) where {S <: Number, T <: AbstractMatrix{S}}
Finds the coefficients `coeffs` that maximize the Poisson likelihood ratio (equations 7–10 in Dolphin 2002) for the composite Hess diagram model `sum(models .* coeffs)` given the provided templates `models` and the observed Hess diagram `data`, using the box-constrained LBFGS method provided by LBFGSB.jl.
Arguments
- `models::AbstractVector{AbstractMatrix{<:Number}}`: the list of template Hess diagrams for the simple stellar populations (SSPs) being considered; all must have the same size.
- `data::AbstractMatrix{<:Number}`: the observed Hess diagram; must match the size of the templates contained in `models`.
Keyword Arguments
- `x0`: The vector of initial guesses for the stellar mass coefficients. You should basically always be calculating and passing this keyword argument; we provide StarFormationHistories.construct_x0 to prepare `x0` assuming constant star formation rate, which is typically a good initial guess.
- `factr::Number`: Keyword argument passed to `LBFGSB.lbfgsb`; essentially a relative tolerance for convergence based on the inter-iteration change in the objective function.
- `pgtol::Number`: Keyword argument passed to `LBFGSB.lbfgsb`; essentially a relative tolerance for convergence based on the inter-iteration change in the projected gradient of the objective.
- `iprint::Integer`: Keyword argument passed to `LBFGSB.lbfgsb` controlling how much information is printed to the terminal. Setting to `1` can sometimes be helpful to diagnose convergence issues. Setting to `-1` will disable printing.
Other `kws...` are passed to `LBFGSB.lbfgsb`.
Returns
- `-logL::Number`: the minimum negative log-likelihood found by the optimizer.
- `coeffs::Vector{<:Number}`: the maximum likelihood estimate for the coefficient vector.
Notes
- It can be helpful to normalize your `models` to contain realistic total stellar masses to aid convergence stability; for example, if the total stellar mass of your population is 10^7 solar masses, then you might normalize your templates to contain 10^3 solar masses. If you are using `partial_cmd_smooth` to construct the templates, you can specify this normalization via the `normalize_value` keyword.
fit_templates_lbfgsb(models::AbstractMatrix{S},
                     data::AbstractVector{<:Number};
                     x0::AbstractVector{<:Number} = ones(S,size(models,2)),
                     factr::Number=1e-12,
                     pgtol::Number=1e-5,
                     iprint::Integer=0,
                     kws...) where S <: Number
This call signature supports the flattened formats for `models` and `data`. See the notes for the flattened call signature of `StarFormationHistories.composite!` for more details.
This method simply minimizes the negative logarithm of the Poisson likelihood ratio (Equation 10 in Dolphin 2002),
\[- \text{ln} \, \mathscr{L} = \sum_i m_i - n_i \times \left( 1 - \text{ln} \, \left( \frac{n_i}{m_i} \right) \right)\]
where $m_i$ is bin $i$ of the composite model and $n_i$ is bin $i$ of the observed Hess diagram; this can therefore be thought of as computing the maximum likelihood estimate.
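As a hedged end-to-end sketch (assuming `models`, `data`, `logAge`, and `T_max` have already been constructed as described above; the direct evaluation below treats the $n_i \, \text{ln} \left( n_i / m_i \right)$ term as zero where $n_i = 0$):

import StarFormationHistories: construct_x0, fit_templates_lbfgsb
x0 = construct_x0(logAge, T_max)                  # constant-SFR initial guess
negLogL, coeffs = fit_templates_lbfgsb(models, data; x0=x0)
# Direct evaluation of Equation 10 at the MLE, for comparison with `negLogL`
m = sum(models .* coeffs)                         # composite model
direct = sum(mi - ni + (ni > 0 ? ni * log(ni / mi) : zero(mi)) for (mi, ni) in zip(m, data))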
We also provide StarFormationHistories.fit_templates_fast, which is the fastest method we offer for deriving a maximum likelihood estimate for the type of model described above.
StarFormationHistories.fit_templates_fast — Function

(coeffs::Vector{eltype(x0)}, result::Optim.MultivariateOptimizationResults) =
    fit_templates_fast(models::AbstractVector{T},
                       data::AbstractMatrix{<:Number};
                       x0::AbstractVector{<:Number} = ones(S,length(models)),
                       kws...) where {S <: Number, T <: AbstractMatrix{S}}
Finds only the maximum likelihood estimate (MLE) for the coefficients `coeffs` given the provided `data` such that the best-fit composite Hess diagram model is `sum(models .* coeffs)`. This is a simplification of the main `fit_templates` function, which will return the MLE as well as the maximum a posteriori estimate, and further supports uncertainty quantification. For additional details on arguments to this method, see the documentation for `fit_templates`.
This method optimizes parameters `θ` such that `coeffs = θ.^2` – this allows for faster convergence than both the `fit_templates_lbfgsb` method, which does not use a variable transformation, and the logarithmic transformation used in `fit_templates`. However, the inverse Hessian is not useful for uncertainty estimation under this transformation. As such this method only returns the MLE for `coeffs` as a vector and the result object returned by `Optim.optimize`. While this method offers fewer features than `fit_templates`, this method's runtime is typically half as long (or less). As such, this method is recommended for use in performance-sensitive applications like hierarchical models or hyperparameter estimation where the additional features of `fit_templates` are unnecessary. In these applications, the value of the objective function at the derived MLE is typically desired as well; this can be obtained from the `result::Optim.MultivariateOptimizationResults` object as `Optim.minimum(result)`. Note that this will return the negative loglikelihood, which is what is minimized in this application.
Notes
- By passing additional convergence keyword arguments supported by `Optim.Options` (see this guide), it is possible to converge to the MLE in fewer than 30 iterations with fewer than 100 calls to the likelihood and gradient methods. For example, the main convergence criterion is typically the magnitude of the gradient vector, which by default is `g_abstol=1e-8`, terminating the optimization when the magnitude of the gradient is less than 1e-8. We find results are typically sufficiently accurate with `g_abstol=1e-3`, which often uses half as many objective evaluations as the default value.
fit_templates_fast(models::AbstractMatrix{S},
                   data::AbstractVector{<:Number};
                   x0::AbstractVector{<:Number} = ones(S,size(models,2)),
                   kws...) where S <: Number
This call signature supports the flattened formats for `models` and `data`. See the notes for the flattened call signature of `StarFormationHistories.composite!` for more details.
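A minimal usage sketch, assuming `models`, `data`, and an initial guess `x0` have been constructed as above, and using the looser `g_abstol` suggested in the notes:

import Optim
import StarFormationHistories: fit_templates_fast
# Loosen the gradient tolerance for faster convergence (see Notes above)
coeffs, result = fit_templates_fast(models, data; x0=x0, g_abstol=1e-3)
negLogL = Optim.minimum(result)  # objective value at the MLE (the negative loglikelihood)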
Posterior Sampling: MCMC
For low-dimensional problems, Markov Chain Monte Carlo (MCMC) methods can be an efficient way to sample the posterior and obtain uncertainty estimates on the fitting coefficients $r_j$. We provide StarFormationHistories.mcmc_sample for this purpose. Internally this uses the multi-threaded affine-invariant MCMC sampler from KissMCMC.jl to perform the sampling, which is based on the same algorithm as Python's emcee (specifically, their `emcee.moves.StretchMove`). There are other MCMC packages like AdvancedMH.jl that offer additional features like distributed execution.
StarFormationHistories.mcmc_sample — Function

result::MCMCChains.Chains =
    mcmc_sample(models::AbstractVector{<:AbstractMatrix{T}},
                data::AbstractMatrix{S},
                x0::Union{AbstractVector{<:AbstractVector{<:Number}}, AbstractMatrix{<:Number}},
                nsteps::Integer;
                nburnin::Integer=0,
                nthin::Integer=1,
                a_scale::Number=2.0,
                use_progress_meter::Bool=true)
mcmc_sample(models::AbstractMatrix{<:Number},
            data::AbstractVector{<:Number},
            args...; kws...)
Samples the posterior of the coefficients `coeffs` such that the full model of the observational `data` is `sum(models .* coeffs)`. Uses the Poisson likelihood ratio as defined by equations 7–10 of Dolphin 2002. Sampling is done using the affine-invariant MCMC sampler implemented in KissMCMC.jl, which is analogous to Python's `emcee.moves.StretchMove`. This method will automatically parallelize over threads. If you need distributed execution, you may want to look into AdvancedMH.jl.

The second call signature supports the flattened formats for `models` and `data`. See the notes for the flattened call signature of `StarFormationHistories.composite!` for more details.
Arguments
- `models::AbstractVector{<:AbstractMatrix{<:Number}}` is a vector of equal-sized matrices that represent the template Hess diagrams for the simple stellar populations that compose the observed Hess diagram.
- `data::AbstractMatrix{<:Number}` is the Hess diagram for the observed data.
- `x0::Union{AbstractVector{<:AbstractVector{<:Number}}, AbstractMatrix{<:Number}}` are the initial positions for the MCMC walkers. If providing a vector of vectors, each element (i.e., `x0[1]`) is taken to be the initial position for one walker and each element must have length equal to `length(models)`. The length of `x0` is the number of MCMC walkers (`nwalkers`). You can alternatively provide a matrix of size `(nwalkers, length(models))` or `(length(models), nwalkers)`.
- `nsteps::Integer` is the number of steps to take with each walker.
Keyword Arguments
- `nburnin::Integer=0` is the number of steps to discard from the start of each chain.
- `nthin::Integer=1` is the factor by which to thin the chain; walker positions will only be saved every `nthin` steps.
- `a_scale::Number=2.0` is the scale parameter for the stretch move; probably shouldn't need to be changed.
- `use_progress_meter::Bool=true` indicates whether or not to show a progress bar during the MCMC procedure.
Returns
- `result` is a `MCMCChains.Chains` instance which enables easy calculation of diagnostic and summary statistics. This type can be indexed and used like a 3-D array of samples with shape `(nsteps, length(models), nwalkers)`.
Notes
- When displaying `result` to the terminal it will display summary statistics (`MCMCChains.summarystats`) and quantiles (`MCMCChains.quantile`) by calling the `MCMCChains.describe` method. This can take a second but is nice to have as an option.
- The highest posterior density interval, which is the narrowest credible interval that includes the posterior mode, can be calculated with the `MCMCChains.hpd` method.
- If you want to extract the array of samples from the `MCMCChains.Chains` object, you can index `result.value` – this will return an `AxisArray` but can be converted to a normal array with `Array(result.value)`.
Examples
using Distributions: Poisson
coeffs = rand(10) # SFH coefficients we want to sample
models = [rand(100,100) .* 100 for i in 1:length(coeffs)] # Vector of model Hess diagrams
data = rand.(Poisson.( sum(models .* coeffs) ) ) # Poisson-sample the model `sum(models .* coeffs)`
nwalkers = 1000
nsteps = 400
x0 = rand(nwalkers, length(coeffs)) # Initial walker positions
result = mcmc_sample(models, data, x0, nsteps) # Sample
Chains MCMC chain (400×10×1000 Array{Float64, 3}) ...
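Continuing from this example, the diagnostics mentioned in the notes above can be applied to `result` (a sketch; `MCMCChains` must be loaded to access these methods):

import MCMCChains
MCMCChains.describe(result)       # summary statistics and quantiles
MCMCChains.hpd(result)            # highest posterior density intervals
samples = Array(result.value)     # convert the samples to a plain 3-D array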
Posterior Sampling: Change of Variables and HMC
Dolphin 2013 examined methods for obtaining uncertainties on the fitted coefficients (the $r_j$ in Equation 1 of Dolphin 2002) and found that the Hamiltonian Monte Carlo (HMC) approach allowed for relatively efficient sampling of the posterior distribution when considering many isochrones in the modelling process. HMC requires that the variables to be fit are continuous over the real numbers and so requires a change of variables. Rather than sampling the variables $r_j$ directly, we can sample $\theta_j = \text{ln} \left( r_j \right)$ such that the sampled variables are continuous over the real numbers $-\infty < \theta_j < \infty$ while the $r_j=\text{exp} \left( \theta_j \right)$ coefficients are restricted to $0 < r_j < \infty$. Using a logarithmic transformation has the additional benefit that the gradient of the Poisson likelihood ratio is still continuous and easy to compute analytically.
While maximum likelihood estimates are invariant under variable transformations, sampling methods like HMC are not, as formally the posterior being sampled from is a distribution and therefore must be integrable over the sampling coefficients. We can write the posterior from which we wish to sample as
\[\begin{aligned} p(r_j | D) &= \frac{p(D | r_j) \; p(r_j)}{Z} \\ p(\boldsymbol{r} | D) &= \frac{1}{Z} \; \prod_j p(D | r_j) \; p(r_j) \\ -\text{ln} \; p(\boldsymbol{r} | D) &= \text{ln} \, Z - \sum_j \left( \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right) \\ &= \text{ln} \, Z - \text{ln} \, \mathscr{L} - \sum_j \text{ln} \, p(r_j) \end{aligned}\]
where $Z$ is the Bayesian evidence (a constant that can be neglected for sampling methods), $p \left( r_j \right)$ is the prior on the star formation history, and $\mathscr{L}$ is the Poisson likelihood ratio discussed above. An uninformative (and unnormalized) prior on the coefficients $r_j$ could take the form of
\[p(r_j) = \begin{cases} 1; & r_j \geq 0\\ 0; & r_j < 0 \end{cases}\]
such that, if the coefficients $r_j$ are guaranteed to be positive, the final term becomes zero (since $\text{ln}(1)=0$) and
\[-\text{ln} \; p(\boldsymbol{r} | D) = \text{ln} \, Z - \text{ln} \, \mathscr{L}\]
When sampling with methods like HMC, constants like $\text{ln} \, Z$ can be neglected and $-\text{ln} \; p(\boldsymbol{r} | D) \propto - \text{ln} \, \mathscr{L}$ such that the posterior is approximated by the likelihood surface.
Let us consider now what happens when we wish to do a variable transformation from $r_j$ to $\theta_j = \text{ln} (r_j)$. From above we can write the posterior as
\[p(r_j | D) = \frac{p(D | r_j) \; p(r_j)}{Z} \\\]
Under the change of variables formula we can write
\[\begin{aligned} p(\theta_j | D) &= p(r_j | D) \left| \frac{d r_j}{d \theta_j} \right| \\ &= p(r_j | D) \left| \frac{d \theta_j}{d r_j} \right|^{-1} \end{aligned}\]
where $\left| \frac{d \theta_j}{d r_j} \right|^{-1}$ is often called the Jacobian correction. We choose $\theta_j$ such that
\[\begin{aligned} \theta_j &= \text{ln} ( r_j ) \\ \left| \frac{d \theta_j}{d r_j} \right| &= \frac{1}{r_j} \\ r_j &= \text{exp} (\theta_j) \\ \end{aligned}\]
which leads to a posterior of
\[p(\theta_j | D) = \text{exp} (\theta_j) \times p(\text{exp} (\theta_j) | D) = r_j \times p(r_j | D) \\\]
We can then write the product over the $\theta_j$ as
\[\begin{aligned} p(\boldsymbol{\theta} | D) &= \frac{1}{Z} \; \prod_j r_j \; p(D | r_j) \; p(r_j) \\ -\text{ln} \, p(\boldsymbol{\theta} | D) &= \text{ln} \, Z - \sum_j \left( \text{ln} \, (r_j) + \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right) \\ &= \text{ln} \, Z - \sum_j \left( \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right) - \sum_j \theta_j \\ &= -\text{ln} \, p(\boldsymbol{r} | D) - \sum_j \theta_j \\ &= -\text{ln} \, p(\boldsymbol{r} | D) - \sum_j \text{ln} \, (r_j) \end{aligned}\]
The choice of a logarithmic transformation means that the negative logarithm of the posterior (which is what HMC uses for its objective function) has this very simple form which allows for simple analytic gradients as well. Once samples of $\theta$ have been obtained from this distribution via HMC or any other sampling method, they can be directly transformed back to the standard coefficients $r_j = \text{exp}(\theta_j)$.
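To make the analytic gradient concrete, here is a minimal sketch (not the package's internal implementation) of the transformed objective $-\text{ln} \, p(\boldsymbol{\theta} | D)$ and its gradient, combining Equation 10 above with the chain rule $\partial / \partial \theta_j = r_j \, \partial / \partial r_j$:

# Sketch only: `models` is a vector of template matrices f_j, `data` is the observed
# Hess diagram n; assumes m_i > 0 wherever n_i > 0.
function neglogpost_and_grad(θ, models, data)
    r = exp.(θ)                    # change of variables r_j = exp(θ_j) ≥ 0
    m = sum(r .* models)           # composite model m_i = Σ_j r_j f_{j,i}
    # -ln L = Σ_i m_i - n_i + n_i ln(n_i / m_i); the log term vanishes where n_i = 0
    nlL = sum(mi - ni + (ni > 0 ? ni * log(ni / mi) : zero(mi)) for (mi, ni) in zip(m, data))
    # ∂(-ln p)/∂θ_j = r_j Σ_i f_{j,i} (1 - n_i / m_i) - 1
    grad = [r[j] * sum(f * (1 - ni / mi) for (f, mi, ni) in zip(models[j], m, data)) - 1 for j in eachindex(models)]
    return nlL - sum(θ), grad      # -ln p(θ|D), dropping the constant ln Z
end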
The method `hmc_sample` implements this approach for sampling the $\theta_j$ coefficients; these samples can then be used to estimate random uncertainties on the derived star formation history.
StarFormationHistories.hmc_sample — Function

hmc_sample(models::AbstractVector{T},
           data::AbstractMatrix{<:Number},
           nsteps::Integer [, nchains::Integer];
           rng::Random.AbstractRNG=Random.default_rng(),
           kws...) where {S <: Number, T <: AbstractMatrix{S}}
hmc_sample(models::AbstractMatrix{S},
           data::AbstractVector{<:Number},
           nsteps::Integer;
           rng::AbstractRNG=default_rng(),
           kws...) where S <: Number
Function to sample the posterior of the coefficients `coeffs` such that the full model of the observational `data` is `sum(models .* coeffs)`. Uses the Poisson likelihood ratio as defined by equations 7–10 of Dolphin 2002 along with a logarithmic transformation of the `coeffs` so that the fitting variables are continuous and differentiable over all reals. Sampling is done using the No-U-Turn sampler as implemented in DynamicHMC.jl, which is a form of dynamic Hamiltonian Monte Carlo.

The second call signature supports the flattened formats for `models` and `data`. See the notes for the flattened call signature of `StarFormationHistories.composite!` for more details.
Arguments
- `models::AbstractVector{<:AbstractMatrix{<:Number}}` is a vector of equal-sized matrices that represent the template Hess diagrams for the simple stellar populations that compose the observed Hess diagram.
- `data::AbstractMatrix{<:Number}` is the Hess diagram for the observed data.
- `nsteps::Integer` is the number of samples to draw per chain.
Optional Arguments
- `nchains::Integer`: If this argument is not provided, this method will return a single chain. If this argument is provided, it will sample `nchains` chains using all available Julia threads and will return a vector of the individual chains.
Keyword Arguments
- `rng::Random.AbstractRNG` is the random number generator that will be passed to DynamicHMC.jl. If `nchains` is provided this method will attempt to sample in parallel, requiring a thread-safe `rng` such as that provided by `Random.default_rng()`.
All other keyword arguments `kws...` will be passed to `DynamicHMC.mcmc_with_warmup` or `DynamicHMC.mcmc_keep_warmup` depending on whether `nchains` is provided.
Returns
- If `nchains` is not provided, returns a `NamedTuple` as summarized in DynamicHMC.jl's documentation. In short, the matrix of samples can be extracted and transformed as `exp.( result.posterior_matrix )`. Statistics about the chain can be obtained with `DynamicHMC.Diagnostics.summarize_tree_statistics(result.tree_statistics)`; you want to see a fairly high acceptance rate (>0.5) and the majority of samples having termination criteria being "turning." See DynamicHMC.jl's documentation for more information.
- If `nchains` is provided, returns a vector of length `nchains` of the same `NamedTuple`s described above. The samples from each chain in the returned vector can be stacked to a single `(nsamples, nchains, length(models))` matrix with `DynamicHMC.stack_posterior_matrices(result)`.
Examples
import DynamicHMC
import StarFormationHistories: hmc_sample
import Statistics: mean
# Run sampler using progress meter to monitor progress
# assuming you have constructed some templates `models` and your observational Hess diagram `data`
result = hmc_sample( models, data, 1000; reporter=DynamicHMC.ProgressMeterReport())
# The chain values are stored in `result.posterior_matrix`; extract them as below.
# An exponential transformation is needed since the sampler works with the
# logarithmic transformation θ = log(coeffs) rather than the coefficients directly.
mc_matrix = exp.( result.posterior_matrix )
# We can look at some statistics from the chain; want to see high acceptance rate (>0.5) and large % of
# "turning" for termination criteria.
DynamicHMC.Diagnostics.summarize_tree_statistics(result.tree_statistics)
Hamiltonian Monte Carlo sample of length 1000
acceptance rate mean: 0.92, 5/25/50/75/95%: 0.65 0.88 0.97 1.0 1.0
termination: divergence => 0%, max_depth => 0%, turning => 100%
depth: 0 => 0%, 1 => 64%, 2 => 36%
# mc_matrix has size `(length(models), nsteps)` so each column is an independent
# sample of the SFH as defined by the coefficients and the rows contain the samples
# for each parameter.
mstar_tot = sum.(eachcol(mc_matrix)) # Total stellar mass of the modelled system per sample
mc_means = mean.(eachrow(mc_matrix)) # Mean of each coefficient evaluated across all samples
# Example with multiple chains sampled in parallel via multi-threading
import Base.Threads # `Threads` is a standard library submodule of Base
t_result = hmc_sample( models, data, 1000, Threads.nthreads(); reporter=DynamicHMC.ProgressMeterReport())
# Combine the multiple chains into a single matrix and transform
# Can then use the same way as `mc_matrix` above
mc_matrix = exp.( DynamicHMC.pool_posterior_matrices(t_result) )
See the DynamicHMC.jl documentation for more information on how to use the chains that are output by this method.
Inspection of the samples generated by `hmc_sample` shows that the posterior defined by the above model is typically smooth, well-behaved, and unimodal. In particular, we find that the sampled $r_j$ for coefficients that are non-zero in the MLE are approximately Gaussian distributed, while the logarithms of the sampled $r_j$ are roughly Gaussian distributed for coefficients that are zero in the MLE; i.e.,
\[\begin{cases} X_j \sim \mathcal{N}; & \hat r_j > 0 \\ \text{ln} \left( X_j \right) \sim \mathcal{N}; & \hat r_j = 0 \\ \end{cases}\]
where $X_j$ are the samples of $r_j$ obtained from the posterior and $\hat r_j$ is the maximum likelihood estimate of $r_j$.
This indicates we may be able to approximate the posterior in the region surrounding the maximum a posteriori (MAP) value as a multivariate Gaussian with covariance given by the inverse of the Hessian matrix (see, e.g., Dovi et al. 1991), allowing us to estimate parameter uncertainties very cheaply. The inverse of the Hessian matrix is exactly equal to the variance-covariance matrix of the parameters for a Gaussian probability distribution; for other probability distributions, the inverse of the Hessian approximates the variance-covariance matrix of the parameters when the second-order expansion defined by the Hessian at the maximum is a reasonable approximation to the real objective function being optimized. A particularly simple form arises when the logarithm of the objective is quadratic in the fitting parameters, as in the Gaussian case, because the second derivatives of the objective are constant and do not depend on the fitting parameters or the MAP estimate.
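As a one-dimensional illustration of this fact (not specific to this package): for a Gaussian negative log-likelihood in the location parameter $\mu$, the Hessian recovers the variance exactly,

\[-\text{ln} \, \mathscr{L}(\mu) = \frac{(x - \mu)^2}{2 \sigma^2} + \text{const} \implies H = \frac{\partial^2 \left( -\text{ln} \, \mathscr{L} \right)}{\partial \mu^2} = \frac{1}{\sigma^2} \implies H^{-1} = \sigma^2\]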
Maximum a Posteriori Optimization
Direct computation of the Hessian and its inverse is expensive, so we'd like another way to obtain it. The first-order, quasi-Newton BFGS optimization algorithm provides such a method as it iteratively builds a dense approximation to the inverse Hessian using the change in the gradient of the objective, which we can compute analytically. It is, however, much less memory efficient than the LBFGS algorithm we use in StarFormationHistories.fit_templates_lbfgsb. For moderate isochrone grids up to a few hundred model templates, this is not a problem. Beyond this it may be better to use StarFormationHistories.fit_templates_lbfgsb to obtain the MLE and hmc_sample to obtain posterior samples.
We implement this optimization scheme in fit_templates, which is our recommended method for unconstrained SFH fitting (i.e., direct fitting of the $r_j$ coefficients). See the next section for notes on more complicated, hierarchical models that can incorporate features like metallicity distribution functions.
StarFormationHistories.fit_templates — Function

result = fit_templates(models::AbstractVector{T},
                       data::AbstractMatrix{<:Number};
                       x0::AbstractVector{<:Number} = ones(S,length(models)),
                       kws...) where {S <: Number, T <: AbstractMatrix{S}}
Finds both maximum likelihood estimate (MLE) and maximum a posteriori estimate (MAP) for the coefficients `coeffs` such that the composite Hess diagram model is `sum(models .* coeffs)` using the provided templates `models` and the observed Hess diagram `data`. Utilizes the Poisson likelihood ratio (equations 7–10 in Dolphin 2002) for the likelihood of the data given the model. See the examples in the documentation for comparisons of the results of this method and `hmc_sample`, which samples the posterior via Hamiltonian Monte Carlo.
Arguments
- `models::AbstractVector{AbstractMatrix{<:Number}}`: the list of template Hess diagrams for the simple stellar populations (SSPs) being considered; all must have the same size.
- `data::AbstractMatrix{<:Number}`: the observed Hess diagram; must match the size of the templates contained in `models`.
Keyword Arguments
- `x0`: The vector of initial guesses for the stellar mass coefficients. You should basically always be calculating and passing this keyword argument; we provide StarFormationHistories.construct_x0 to prepare `x0` assuming constant star formation rate, which is typically a good initial guess.
Other `kws...` are passed to `Optim.Options` to set things like convergence criteria for the optimization.
Returns
`result` is a `NamedTuple` containing two `StarFormationHistories.LogTransformFTResult`s. The two components of `result` are `result.map` or `result[1]`, which contains the results of the MAP optimization, and `result.mle` or `result[2]`, which contains the results of the MLE optimization. The documentation for `StarFormationHistories.LogTransformFTResult` contains more information about these types, but briefly they contain the following fields, accessible as, e.g., `result.map.μ`, `result.map.σ`, etc.:
- `μ::Vector{<:Number}` are the optimized `coeffs` at the maximum.
- `σ::Vector{<:Number}` are the standard errors on the coeffs `μ` calculated from an estimate of the inverse Hessian matrix evaluated at `μ`. The inverse of the Hessian matrix at the maximum of the likelihood (or posterior) is an estimator for the variance-covariance matrix of the parameters, but is only accurate when the second-order expansion given by the Hessian at the maximum is a good approximation to the function being optimized (i.e., when the optimized function is approximately quadratic around the maximum; see Dovi et al. 1991 for more information). We find this is often the case for the MAP estimate, but the errors found for coefficients that are ≈0 in the MLE are typically unrealistically small. For coefficients significantly greater than 0, the `σ` values from the MAP and MLE are typically consistent to 5–10%.
- `invH::Matrix{<:Number}` is the estimate of the inverse Hessian matrix at `μ` that was used to derive `σ`. The optimization is done under a logarithmic transform, such that `θ[j] = log(coeffs[j])` are the actual parameters optimized, so the entries in the Hessian are actually
\[H^{(j,k)} ( \boldsymbol{\hat \theta} ) = \left. \frac{\partial^2 \, J(\boldsymbol{\theta})}{\partial \theta_j \, \partial \theta_k} \right\vert_{\boldsymbol{\theta}=\boldsymbol{\hat \theta}}\]
- `result` is the full object returned by the optimization from `Optim.jl`; this is of type `Optim.MultivariateOptimizationResults`. Remember that the optimization is done with parameters `θ[j] = log(coeffs[j])` when dealing with this raw output. This means that, for example, we calculate `result.map.μ` as `exp.(Optim.minimizer(result.map.result))`.
The special property of the `StarFormationHistories.LogTransformFTResult` type is that you can draw a set of `N::Integer` random parameter samples from the result using the inverse Hessian approximation discussed above by doing `rand(result.map, N)`. This type implements the random sampling API of `Distributions.jl` so the other standard sampling methods should work as well. In our tests these samples compare very favorably against those from `hmc_sample`, which samples the posterior via Hamiltonian Monte Carlo and is therefore more robust but much more expensive. We compare these methods in the examples.
Notes
- This method uses the `BFGS` method from `Optim.jl` internally because it builds and tracks the inverse Hessian matrix approximation which can be used to estimate parameter uncertainties. BFGS is much more memory-intensive than LBFGS (as used for `StarFormationHistories.fit_templates_lbfgsb`) for large numbers of parameters (equivalently, many `models`), so you should consider LBFGS to solve for the MLE along with `hmc_sample` to sample the posterior if you are using a large grid of models (greater than a few hundred).
- The BFGS implementation we use from Optim.jl uses BLAS operations during its iteration. The OpenBLAS that Julia ships with will often default to running on multiple threads even if Julia itself is started with only a single thread. You can check the current number of BLAS threads with `import LinearAlgebra: BLAS; BLAS.get_num_threads()`. For the problem sizes typical of this function we actually see performance regression with larger numbers of BLAS threads. For this reason you may wish to use BLAS in single-threaded mode; you can set this as `import LinearAlgebra: BLAS; BLAS.set_num_threads(1)`.
fit_templates(models::AbstractMatrix{S},
              data::AbstractVector{<:Number};
              x0::AbstractVector{<:Number} = ones(S,size(models,2)),
              kws...) where S <: Number
This call signature supports the flattened formats for `models` and `data`. See the notes for the flattened call signature of `StarFormationHistories.composite!` for more details.
StarFormationHistories.LogTransformFTResult — Type

LogTransformFTResult(μ::AbstractVector{<:Number},
                     σ::AbstractVector{<:Number},
                     invH::AbstractMatrix{<:Number},
                     result)
Type for containing the maximum likelihood estimate (MLE) and maximum a posteriori (MAP) results from `fit_templates`. The fitted coefficients are available in the `μ` field. Estimates of the standard errors are available in the `σ` field. These have both been transformed from the native logarithmic fitting space into natural units (i.e., stellar mass or star formation rate).

`invH` contains the estimated inverse Hessian of the likelihood / posterior at the maximum point in the logarithmic fitting units. `result` is the full result object returned by the optimization routine.

This type is implemented as a subtype of `Distributions.Sampleable{Multivariate, Continuous}` to enable sampling from an estimate of the likelihood / posterior distribution. We approximate the distribution as a multivariate Gaussian in the native (logarithmically transformed) fitting variables with covariance matrix `invH` and means `log.(μ)`. We find this approximation is good for the MAP result but less robust for the MLE. You can obtain `N::Integer` samples from the distribution by `rand(R, N)` where `R` is an instance of this type; this will return a size `length(μ) x N` matrix, or fail if `invH` is not positive definite.
Examples
julia> result = fit_templates(models, data);
julia> typeof(result.map)
StarFormationHistories.LogTransformFTResult{...}
julia> size(rand(result.map, 3)) == (length(models),3)
true
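Building on this example, the fields and sampling API can be used to summarize uncertainties (a sketch; the quantile calculation assumes the `Statistics` standard library):

import Statistics: quantile
best = result.map.μ                  # MAP coefficients in natural units
err = result.map.σ                   # standard errors from the inverse Hessian
samples = rand(result.map, 10000)    # `(length(models), 10000)` matrix of samples
lo, hi = quantile(samples[1, :], [0.16, 0.84]) # 68% interval for the first coefficient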
Once you have obtained stellar mass coefficients from the above methods, you can convert them into star formation rates and compute per-age mean metallicities with StarFormationHistories.calculate_cum_sfr.
StarFormationHistories.calculate_cum_sfr — Function

(unique_logAge, cum_sfh, sfr, mean_MH) =
    calculate_cum_sfr(coeffs::AbstractVector,
                      logAge::AbstractVector,
                      MH::AbstractVector,
                      T_max::Number;
                      normalize_value=1,
                      sorted::Bool=false)
Calculates cumulative star formation history, star formation rates, and mean metallicity evolution as functions of `logAge = log10(age [yr])`.
Arguments
- `coeffs::AbstractVector` is a vector of stellar mass coefficients such as those returned by `fit_templates`, for example. Actual stellar mass in stellar population `j` is `coeffs[j] * normalize_value`.
- `logAge::AbstractVector` is a vector giving the `log10(age [yr])` of the stellar populations corresponding to the provided `coeffs`. For the purposes of calculating star formation rates, these are assumed to be left-bin edges.
- `MH::AbstractVector` is a vector giving the metallicities of the stellar populations corresponding to the provided `coeffs`.
- `T_max::Number` is the rightmost final bin edge for calculating star formation rates in units of Gyr. For example, you might have `logAge=[6.6, 6.7, 6.8]`, in which case a final logAge of 6.9 would give equal bin widths. In this case you would set `T_max = exp10(6.9) / 1e9 ≈ 0.0079` so that the width of the final bin for the star formation rate calculation has the same `log10(Age [yr])` step as the other bins.
Keyword Arguments
- `normalize_value` is a multiplicative prefactor to apply to all the `coeffs`; same as the keyword in `partial_cmd_smooth`.
- `sorted::Bool` is either `true` or `false` and signifies whether to assume `logAge` is sorted.
Returns
- `unique_logAge::Vector` is essentially `unique(sort(logAge))` and provides the x-values you would plot the other returned vectors against.
- `cum_sfh::Vector` is the normalized cumulative SFH implied by the provided `coeffs`. This is ~1 at the most recent time in `logAge` and decreases as `logAge` increases.
- `sfr::Vector` gives the star formation rate across each bin in `unique_logAge`. If `coeffs .* normalize_value` are in units of solar masses, then `sfr` is in units of solar masses per year.
- `mean_MH::Vector` gives the stellar-mass-weighted mean metallicity of the stellar population as a function of `unique_logAge`.
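A minimal end-to-end sketch tying this to the fitting methods above (assuming `models`, `data`, and per-template `logAge` and `MH` vectors have been constructed; the `T_max` value here is purely illustrative):

import StarFormationHistories: construct_x0, fit_templates, calculate_cum_sfr
T_max = 13.7                                   # rightmost bin edge in Gyr (illustrative)
x0 = construct_x0(logAge, T_max)
result = fit_templates(models, data; x0=x0)
unique_logAge, cum_sfh, sfr, mean_MH = calculate_cum_sfr(result.map.μ, logAge, MH, T_max)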