High-Level Methods for Unconstrained Fitting

Maximum Likelihood Optimization

Template construction is by far the most complicated step in the fitting procedure. Once your templates have been constructed, fitting them to an observed Hess diagram amounts to maximization of the Poisson likelihood ratio (Dolphin 2002). It is possible to construct more complicated hierarchical models including things like metallicity distribution functions; we discuss these in the next section. In this section we discuss methods for fitting where the only constraint is that star formation rates cannot be negative. We provide the StarFormationHistories.construct_x0 method to help with setting the initial guess for this optimization.

StarFormationHistories.construct_x0 - Function
x0::typeof(logAge) = construct_x0(logAge::AbstractVector{T},
                                  T_max::Number;
                                  normalize_value::Number=one(T)) where T <: Number

Generates a vector of initial stellar mass normalizations, suitable as input to fit_templates or hmc_sample, with a total stellar mass of normalize_value, such that the implied star formation rate is constant across the provided logAge vector, which contains the log10(Age [yr]) of each isochrone you will input as models. For the purposes of computing the constant star formation rate, the provided logAge are treated as left-bin edges, with the final right-bin edge being T_max, which has units of Gyr. For example, you might have logAge=[6.6, 6.7, 6.8], in which case a final logAge of 6.9 would give equal bin widths; you would then set T_max = exp10(6.9) / 1e9 ≈ 0.0079 so that the width of the final bin for the star formation rate calculation has the same log10(Age [yr]) step as the other bins.

Examples

julia> x0 = construct_x0(repeat([7.0,8.0,9.0],3), 10.0; normalize_value=5.0)
9-element Vector{Float64}: ...

julia> sum(x0)
4.99... # Close to `normalize_value`.
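Relatedly, a short sketch of setting T_max and building an initial guess for the narrow example bins described above (the SFH import alias is our own convention, not the package's):

import StarFormationHistories as SFH
logAge = [6.6, 6.7, 6.8]              # log10(Age [yr]) left-bin edges
T_max = exp10(6.9) / 1e9              # ≈ 0.0079 Gyr; final right-bin edge with equal log10(Age) width
x0 = SFH.construct_x0(logAge, T_max)  # Initial guess implying constant star formation rate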

When it comes to performing the optimization, the simplest method we offer is StarFormationHistories.fit_templates_lbfgsb. This will optimize one coefficient per template; there is no overarching metallicity evolution or other constraint, besides that the stellar masses of the populations cannot be negative. It performs a maximum likelihood optimization with the bounded quasi-Newton LBFGS method as implemented in L-BFGS-B and wrapped in LBFGSB.jl, using analytic gradients. It is fast and converges fairly reliably, even when the initial guess is not particularly close to the maximum likelihood estimate, but it provides no uncertainty estimation. It is normal for some of the coefficients to converge to zero.

StarFormationHistories.fit_templates_lbfgsb - Function
(-logL, coeffs) = 
fit_templates_lbfgsb(models::AbstractVector{T},
                     data::AbstractMatrix{<:Number};
                     x0::AbstractVector{<:Number} = ones(S,length(models)),
                     factr::Number=1e-12,
                     pgtol::Number=1e-5,
                     iprint::Integer=0,
                     kws...) where {S <: Number, T <: AbstractMatrix{S}}

Finds the coefficients coeffs that maximize the Poisson likelihood ratio (equations 7–10 in Dolphin 2002) for the composite Hess diagram model sum(models .* coeffs) given the provided templates models and the observed Hess diagram data using the box-constrained LBFGS method provided by LBFGSB.jl.

Arguments

  • models::AbstractVector{AbstractMatrix{<:Number}}: the list of template Hess diagrams for the simple stellar populations (SSPs) being considered; all must have the same size.
  • data::AbstractMatrix{<:Number}: the observed Hess diagram; must match the size of the templates contained in models.

Keyword Arguments

  • x0: The vector of initial guesses for the stellar mass coefficients. You should basically always be calculating and passing this keyword argument; we provide StarFormationHistories.construct_x0 to prepare x0 assuming constant star formation rate, which is typically a good initial guess.
  • factr::Number: Keyword argument passed to LBFGSB.lbfgsb; essentially a relative tolerance for convergence based on the inter-iteration change in the objective function.
  • pgtol::Number: Keyword argument passed to LBFGSB.lbfgsb; essentially an absolute tolerance for convergence based on the magnitude of the projected gradient of the objective.
  • iprint::Integer: Keyword argument passed to LBFGSB.lbfgsb controlling how much information is printed to the terminal. Setting to 1 can sometimes be helpful to diagnose convergence issues. Setting to -1 will disable printing.

Other kws... are passed to LBFGSB.lbfgsb.

Returns

  • -logL::Number: the minimum negative log-likelihood found by the optimizer.
  • coeffs::Vector{<:Number}: the maximum likelihood estimate for the coefficient vector.

Notes

  • It can be helpful to normalize your models to contain realistic total stellar masses to aid convergence stability; for example, if the total stellar mass of your population is 10^7 solar masses, then you might normalize your templates to contain 10^3 solar masses. If you are using partial_cmd_smooth to construct the templates, you can specify this normalization via the normalize_value keyword.
fit_templates_lbfgsb(models::AbstractMatrix{S},
                     data::AbstractVector{<:Number};
                     x0::AbstractVector{<:Number} = ones(S,size(models,2)),
                     factr::Number=1e-12,
                     pgtol::Number=1e-5,
                     iprint::Integer=0,
                     kws...) where S <: Number

This call signature supports the flattened formats for models and data. See the notes for the flattened call signature of StarFormationHistories.composite! for more details.

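A hedged usage sketch: assuming models (a vector of equal-size template matrices) and data (the observed Hess diagram) have already been constructed, with x0 computed as in the construct_x0 sketch above:

import StarFormationHistories as SFH
# Per the Notes, templates normalized to realistic stellar masses aid convergence stability.
negLogL, coeffs = SFH.fit_templates_lbfgsb(models, data; x0=x0)
# Some entries of `coeffs` converging to zero is expected and not an error.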

This method simply minimizes the negative logarithm of the Poisson likelihood ratio (Equation 10 in Dolphin 2002),

\[- \text{ln} \, \mathscr{L} = \sum_i \left[ m_i - n_i \times \left( 1 - \text{ln} \, \left( \frac{n_i}{m_i} \right) \right) \right]\]

where $m_i$ is bin $i$ of the composite model and $n_i$ is bin $i$ of the observed Hess diagram; minimizing this quantity is therefore equivalent to computing the maximum likelihood estimate.
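For illustration, a minimal sketch of this objective in Julia; the package uses its own optimized internal implementation, so this is not the library's actual code:

# Negative log Poisson likelihood ratio (Equation 10 in Dolphin 2002) for an
# equal-size composite model `model` and observed Hess diagram `data`.
function neglogL(model::AbstractArray{<:Number}, data::AbstractArray{<:Number})
    total = zero(promote_type(eltype(model), eltype(data)))
    for i in eachindex(model, data)
        m, n = model[i], data[i]
        # n * (1 - log(n / m)) -> 0 as n -> 0, so empty data bins contribute just m
        total += n > 0 ? m - n * (1 - log(n / m)) : m
    end
    return total
end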

We also provide StarFormationHistories.fit_templates_fast, which is the fastest method we offer for deriving a maximum likelihood estimate for the type of model described above.

StarFormationHistories.fit_templates_fast - Function
(coeffs::Vector{eltype(x0)}, result::Optim.MultivariateOptimizationResults) =
fit_templates_fast(models::AbstractVector{T},
                   data::AbstractMatrix{<:Number};
                   x0::AbstractVector{<:Number} = ones(S,length(models)),
                   kws...)
                   where {S <: Number, T <: AbstractMatrix{S}}

Finds only the maximum likelihood estimate (MLE) for the coefficients coeffs given the provided data such that the best-fit composite Hess diagram model is sum(models .* coeffs). This is a simplification of the main fit_templates function, which will return the MLE as well as the maximum a posteriori estimate, and further supports uncertainty quantification. For additional details on arguments to this method, see the documentation for fit_templates.

This method optimizes parameters θ such that coeffs = θ.^2 – this allows for faster convergence than both the fit_templates_lbfgsb method, which does not use a variable transformation, and the logarithmic transformation used in fit_templates. However, the inverse Hessian is not useful for uncertainty estimation under this transformation. As such, this method only returns the MLE for coeffs as a vector and the result object returned by Optim.optimize. While it offers fewer features than fit_templates, its runtime is typically half as long (or less), so it is recommended for performance-sensitive applications like hierarchical models or hyperparameter estimation where the additional features of fit_templates are unnecessary. In these applications, the value of the objective function at the derived MLE is typically desired as well; this can be obtained from the result::Optim.MultivariateOptimizationResults object as Optim.minimum(result). Note that this will return the negative log-likelihood, which is what is minimized in this application.

Notes

  1. By passing additional convergence keyword arguments supported by Optim.Options (see this guide), it is possible to converge to the MLE in fewer than 30 iterations with fewer than 100 calls to the likelihood and gradient methods. For example, the main convergence criterion is typically the magnitude of the gradient vector, which by default is g_abstol=1e-8, terminating the optimization when the magnitude of the gradient is less than 1e-8. We find results are typically sufficiently accurate with g_abstol=1e-3, which often uses half as many objective evaluations as the default value.
fit_templates_fast(models::AbstractMatrix{S},
                   data::AbstractVector{<:Number};
                   x0::AbstractVector{<:Number} = ones(S,size(models,2)),
                   kws...)
                   where S <: Number

This call signature supports the flattened formats for models and data. See the notes for the flattened call signature of StarFormationHistories.composite! for more details.

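A hedged usage sketch, reusing models, data, and x0 from above; we assume convergence keywords like g_abstol are forwarded to Optim.Options as described in the note:

import StarFormationHistories as SFH
import Optim
# Looser gradient tolerance per the note above; often sufficiently accurate
# while roughly halving the number of objective evaluations.
coeffs, result = SFH.fit_templates_fast(models, data; x0=x0, g_abstol=1e-3)
negLogL = Optim.minimum(result) # Objective (negative log-likelihood) at the MLE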

Posterior Sampling: MCMC

For low-dimensional problems, Markov Chain Monte Carlo (MCMC) methods can be an efficient way to sample the posterior and obtain uncertainty estimates on the fitting coefficients $r_j$. We provide StarFormationHistories.mcmc_sample for this purpose. Internally this uses the multi-threaded affine-invariant MCMC sampler from KissMCMC.jl to perform the sampling, which is based on the same algorithm as Python's emcee (specifically, their emcee.moves.StretchMove). There are other MCMC packages like AdvancedMH.jl that offer additional features like distributed execution.

StarFormationHistories.mcmc_sample - Function
result::MCMCChains.Chains =
mcmc_sample(models::AbstractVector{<:AbstractMatrix{T}},
            data::AbstractMatrix{S},
            x0::Union{AbstractVector{<:AbstractVector{<:Number}}, AbstractMatrix{<:Number}},
            nsteps::Integer;
            nburnin::Integer=0,
            nthin::Integer=1,
            a_scale::Number=2.0,
            use_progress_meter::Bool=true)
mcmc_sample(models::AbstractMatrix{<:Number},
            data::AbstractVector{<:Number},
            args...; kws...)

Samples the posterior of the coefficients coeffs such that the full model of the observational data is sum(models .* coeffs). Uses the Poisson likelihood ratio as defined by equations 7–10 of Dolphin 2002. Sampling is done using the affine-invariant MCMC sampler implemented in KissMCMC.jl, which is analogous to Python's emcee.moves.StretchMove. This method will automatically parallelize over threads. If you need distributed execution, you may want to look into AdvancedMH.jl.

The second call signature supports the flattened formats for models and data. See the notes for the flattened call signature of StarFormationHistories.composite! for more details.

Arguments

  • models::AbstractVector{<:AbstractMatrix{<:Number}} is a vector of equal-sized matrices that represent the template Hess diagrams for the simple stellar populations that compose the observed Hess diagram.
  • data::AbstractMatrix{<:Number} is the Hess diagram for the observed data.
  • x0::Union{AbstractVector{<:AbstractVector{<:Number}}, AbstractMatrix{<:Number}} are the initial positions for the MCMC walkers. If providing a vector of vectors, each element (i.e., x0[1]) is taken to be the initial position for one walker and each element must have length equal to length(models). The length of x0 is the number of MCMC walkers (nwalkers). You can alternatively provide a matrix of size (nwalkers, length(models)) or (length(models), nwalkers).
  • nsteps::Integer is the number of steps to take with each walker.

Keyword Arguments

  • nburnin::Integer=0 is the number of steps to discard from the start of each chain.
  • nthin::Integer=1 is the factor by which to thin the chain; walker positions will only be saved every nthin steps.
  • a_scale::Number=2.0 is the scale parameter for the stretch move; probably shouldn't need to be changed.
  • use_progress_meter::Bool=true indicates whether or not to show a progress bar during the MCMC procedure.

Returns

  • result is an MCMCChains.Chains instance which enables easy calculation of diagnostic and summary statistics. This type can be indexed and used like a 3-D array of samples with shape (nsteps, length(models), nwalkers).

Notes

  • When displaying result to the terminal it will display summary statistics (MCMCChains.summarystats) and quantiles (MCMCChains.quantile) by calling the MCMCChains.describe method. This can take a second but is nice to have as an option.
  • The highest posterior density interval, which is the narrowest credible interval that includes the posterior mode, can be calculated with the MCMCChains.hpd method.
  • If you want to extract the array of samples from the MCMCChains.Chains object, you can index result.value – this will return an AxisArray but can be converted to a normal array with Array(result.value).

Examples

using Distributions: Poisson
coeffs = rand(10) # SFH coefficients we want to sample
models = [rand(100,100) .* 100 for i in 1:length(coeffs)] # Vector of model Hess diagrams
data = rand.(Poisson.( sum(models .* coeffs) ) ) # Poisson-sample the model `sum(models .* coeffs)`
nwalkers = 1000
nsteps = 400
x0 = rand(nwalkers, length(coeffs)) # Initial walker positions
result = mcmc_sample(models, data, x0, nsteps) # Sample
# Output: Chains MCMC chain (400×10×1000 Array{Float64, 3}) ...
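Building on the notes above, a brief post-processing sketch for the returned chain, using only the MCMCChains methods named there:

import MCMCChains
samples = Array(result.value)  # Plain (nsteps, length(models), nwalkers) array of samples
MCMCChains.describe(result)    # Summary statistics and quantiles
MCMCChains.hpd(result)         # Highest posterior density intervals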

Posterior Sampling: Change of Variables and HMC

Dolphin 2013 examined methods for obtaining uncertainties on the fitted coefficients (the $r_j$ in Equation 1 of Dolphin 2002) and found that the Hamiltonian Monte Carlo (HMC) approach allowed for relatively efficient sampling of the posterior distribution when considering many isochrones in the modelling process. HMC requires that the variables to be fit are continuous over the real numbers and so requires a change of variables. Rather than sampling the variables $r_j$ directly, we can sample $\theta_j = \text{ln} \left( r_j \right)$ such that the sampled variables are continuous over the real numbers $-\infty < \theta_j < \infty$ while the $r_j=\text{exp} \left( \theta_j \right)$ coefficients are bounded from $0 < r_j < \infty$. Using a logarithmic transformation has the additional benefit that the gradient of the Poisson likelihood ratio is still continuous and easy to compute analytically.

While maximum likelihood estimates are invariant under variable transformations, sampling methods like HMC are not, as formally the posterior being sampled from is a distribution and therefore must be integrable over the sampling coefficients. We can write the posterior from which we wish to sample as

\[\begin{aligned} p(r_j | D) &= \frac{p(D | r_j) \; p(r_j)}{Z} \\ p(\boldsymbol{r} | D) &= \frac{1}{Z} \; \prod_j p(D | r_j) \; p(r_j) \\ -\text{ln} \; p(\boldsymbol{r} | D) &= \text{ln} \, Z - \sum_j \left[ \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right] \\ &= \text{ln} \, Z - \text{ln} \, \mathscr{L} - \sum_j \text{ln} \, p(r_j) \end{aligned}\]

where $Z$ is the Bayesian evidence (a constant that can be neglected for sampling methods), $p \left( r_j \right)$ is the prior on the star formation history, and $\mathscr{L}$ is the Poisson likelihood ratio discussed above. An uninformative (and unnormalized) prior on the coefficients $r_j$ could take the form of

\[p(r_j) = \begin{cases} 1; & r_j \geq 0\\ 0; & r_j < 0 \end{cases}\]

such that, if the coefficients $r_j$ are guaranteed to be positive, the final term becomes zero (since $\text{ln}(1)=0$) and

\[-\text{ln} \; p(\boldsymbol{r} | D) = \text{ln} \, Z - \text{ln} \, \mathscr{L}\]

When sampling with methods like HMC, constants like $\text{ln} \, Z$ can be neglected and $-\text{ln} \; p(\boldsymbol{r} | D) \propto - \text{ln} \, \mathscr{L}$ such that the posterior is approximated by the likelihood surface.

Let us consider now what happens when we wish to do a variable transformation from $r_j$ to $\theta_j = \text{ln} (r_j)$. From above we can write the posterior as

\[p(r_j | D) = \frac{p(D | r_j) \; p(r_j)}{Z}\]

Under the change of variables formula we can write

\[\begin{aligned} p(\theta_j | D) &= p(r_j | D) \left| \frac{d r_j}{d \theta_j} \right| \\ &= p(r_j | D) \left| \frac{d \theta_j}{d r_j} \right|^{-1} \end{aligned}\]

where $\left| \frac{d \theta_j}{d r_j} \right|^{-1}$ is often called the Jacobian correction. We choose $\theta_j$ such that

\[\begin{aligned} \theta_j &= \text{ln} ( r_j ) \\ \left| \frac{d \theta_j}{d r_j} \right| &= \frac{1}{r_j} \\ r_j &= \text{exp} (\theta_j) \\ \end{aligned}\]

which leads to a posterior of

\[p(\theta_j | D) = \text{exp} (\theta_j) \times p(\text{exp} (\theta_j) | D) = r_j \times p(r_j | D)\]

We can then write the product over the $\theta_j$ as

\[\begin{aligned} p(\boldsymbol{\theta} | D) &= \frac{1}{Z} \; \prod_j r_j \; p(D | r_j) \; p(r_j) \\ -\text{ln} \, p(\boldsymbol{\theta} | D) &= \text{ln} \, Z - \sum_j \left[ \text{ln} \, (r_j) + \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right] \\ &= \text{ln} \, Z - \sum_j \left[ \text{ln} \, p(D | r_j) + \text{ln} \, p(r_j) \right] - \sum_j \theta_j \\ &= -\text{ln} \, p(\boldsymbol{r} | D) - \sum_j \theta_j \\ &= -\text{ln} \, p(\boldsymbol{r} | D) - \sum_j \text{ln} \, (r_j) \end{aligned}\]

The choice of a logarithmic transformation means that the negative logarithm of the posterior (which is what HMC uses for its objective function) has this very simple form which allows for simple analytic gradients as well. Once samples of $\theta$ have been obtained from this distribution via HMC or any other sampling method, they can be directly transformed back to the standard coefficients $r_j = \text{exp}(\theta_j)$.
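This form translates directly into code. A minimal sketch, reusing the illustrative neglogL function from earlier and assuming the flat prior on the $r_j$ discussed above:

# Negative log posterior in θ = log.(r), up to the additive constant ln(Z);
# this is the objective a Hamiltonian Monte Carlo sampler would use.
function neglogpost(θ::AbstractVector{<:Number}, models, data)
    r = exp.(θ)                  # Back-transform: r_j = exp(θ_j)
    composite = sum(models .* r) # Composite Hess diagram model
    return neglogL(composite, data) - sum(θ) # -ln ℒ minus the Jacobian term Σⱼ θⱼ
end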

The method hmc_sample implements this approach for sampling the $\theta_j$ coefficients; these samples can then be used to estimate random uncertainties on the derived star formation history.

StarFormationHistories.hmc_sample - Function
hmc_sample(models::AbstractVector{T},
           data::AbstractMatrix{<:Number},
           nsteps::Integer [, nchains::Integer];
           rng::Random.AbstractRNG=Random.default_rng(),
           kws...)
           where {S <: Number, T <: AbstractMatrix{S}}
hmc_sample(models::AbstractMatrix{S},
           data::AbstractVector{<:Number},
           nsteps::Integer;
           rng::AbstractRNG=default_rng(),
           kws...)
           where S <: Number

Function to sample the posterior of the coefficients coeffs such that the full model of the observational data is sum(models .* coeffs). Uses the Poisson likelihood ratio as defined by equations 7–10 of Dolphin 2002 along with a logarithmic transformation of the coeffs so that the fitting variables are continuous and differentiable over all reals. Sampling is done using the No-U-Turn sampler as implemented in DynamicHMC.jl, which is a form of dynamic Hamiltonian Monte Carlo.

The second call signature supports the flattened formats for models and data. See the notes for the flattened call signature of StarFormationHistories.composite! for more details.

Arguments

  • models::AbstractVector{<:AbstractMatrix{<:Number}} is a vector of equal-sized matrices that represent the template Hess diagrams for the simple stellar populations that compose the observed Hess diagram.
  • data::AbstractMatrix{<:Number} is the Hess diagram for the observed data.
  • nsteps::Integer is the number of samples to draw per chain.

Optional Arguments

  • nchains::Integer: If this argument is not provided, this method will return a single chain. If this argument is provided, it will sample nchains chains using all available Julia threads and will return a vector of the individual chains.

Keyword Arguments

  • rng::Random.AbstractRNG is the random number generator that will be passed to DynamicHMC.jl. If nchains is provided this method will attempt to sample in parallel, requiring a thread-safe rng such as that provided by Random.default_rng().

All other keyword arguments kws... will be passed to DynamicHMC.mcmc_with_warmup or DynamicHMC.mcmc_keep_warmup depending on whether nchains is provided.

Returns

  • If nchains is not provided, returns a NamedTuple as summarized in DynamicHMC.jl's documentation. In short, the matrix of samples can be extracted and transformed as exp.( result.posterior_matrix ). Statistics about the chain can be obtained with DynamicHMC.Diagnostics.summarize_tree_statistics(result.tree_statistics); you want to see a fairly high acceptance rate (>0.5) and the majority of samples terminating via the "turning" criterion. See DynamicHMC.jl's documentation for more information.
  • If nchains is provided, returns a vector of length nchains of the same NamedTuples described above. The samples from each chain in the returned vector can be stacked into a single (nsamples, nchains, length(models)) array with DynamicHMC.stack_posterior_matrices(result).

Examples

import DynamicHMC
import StarFormationHistories: hmc_sample
import Statistics: mean
# Run sampler using progress meter to monitor progress
# assuming you have constructed some templates `models` and your observational Hess diagram `data`
result = hmc_sample( models, data, 1000; reporter=DynamicHMC.ProgressMeterReport())
# The chain samples are stored in the `result.posterior_matrix` field; extract them as below.
# An exponential transformation is needed since the sampler works internally with
# θ = log(coeffs) rather than the coefficients directly.
mc_matrix = exp.( result.posterior_matrix )
# We can look at some statistics from the chain; want to see high acceptance rate (>0.5) and large % of
# "turning" for termination criteria. 
DynamicHMC.Diagnostics.summarize_tree_statistics(result.tree_statistics)
    Hamiltonian Monte Carlo sample of length 1000
      acceptance rate mean: 0.92, 5/25/50/75/95%: 0.65 0.88 0.97 1.0 1.0
      termination: divergence => 0%, max_depth => 0%, turning => 100%
      depth: 0 => 0%, 1 => 64%, 2 => 36%
# mc_matrix has size `(length(models), nsteps)` so each column is an independent
# sample of the SFH as defined by the coefficients and the rows contain the samples
# for each parameter. 
mstar_tot = sum.(eachcol(mc_matrix)) # Total stellar mass of the modelled system per sample
mc_means = mean.(eachrow(mc_matrix)) # Mean of each coefficient evaluated across all samples
# Example with multiple chains sampled in parallel via multi-threading
# `Threads` is available from `Base`, so no explicit import is required
t_result = hmc_sample( models, data, 1000, Threads.nthreads(); reporter=DynamicHMC.ProgressMeterReport())
# Combine the multiple chains into a single matrix and transform
# Can then use the same way as `mc_matrix` above
mc_matrix = exp.( DynamicHMC.pool_posterior_matrices(t_result) )

See the DynamicHMC.jl documentation for more information on how to use the chains that are output by this method.

Inspection of the samples generated by hmc_sample shows that the posterior defined by the above model is typically smooth, well-behaved, and unimodal. In particular, we find that the sampled $r_j$ for coefficients that are non-zero in the MLE are approximately Gaussian distributed while the logarithms of the sampled $r_j$ are roughly Gaussian distributed for coefficients that are zero in the MLE; i.e.

\[\begin{cases} X_j \sim \mathcal{N}; & \hat r_j > 0 \\ \text{ln} \left( X_j \right) \sim \mathcal{N}; & \hat r_j = 0 \\ \end{cases}\]

where $X_j$ are the samples of $r_j$ obtained from the posterior and $\hat r_j$ is the maximum likelihood estimate of $r_j$.

This indicates we may be able to approximate the posterior in the region surrounding the maximum a posteriori (MAP) value by the inverse of the Hessian matrix (see, e.g., Dovi et al. 1991), allowing us to estimate parameter uncertainties very cheaply. The inverse of the Hessian matrix is exactly equal to the variance-covariance matrix of the parameters for a Gaussian probability distribution; for other probability distributions, the inverse of the Hessian approximates the variance-covariance matrix of the parameters when the second-order expansion defined by the Hessian at the maximum is a reasonable approximation to the real objective function being optimized. A particularly simple form arises when the logarithm of the objective is quadratic in the fitting parameters, as in the Gaussian case, because the second derivatives of the objective are constant and do not depend on the fitting parameters or the MAP estimate.
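Concretely, under this Gaussian approximation the marginal standard errors follow from the diagonal of the inverse Hessian. A sketch, where invH stands for any inverse-Hessian estimate at the maximum (such as the one returned by fit_templates below):

import LinearAlgebra: diag
# invH approximates the variance-covariance matrix, so the marginal standard
# errors are the square roots of its diagonal entries.
σ = sqrt.(diag(invH))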

Maximum a Posteriori Optimization

Direct computation of the Hessian and its inverse is expensive, so we'd like another way to obtain it. The first-order, quasi-Newton BFGS optimization algorithm provides such a method as it iteratively builds a dense approximation to the inverse Hessian using the change in the gradient of the objective, which we can compute analytically. It is, however, much less memory efficient than the LBFGS algorithm we use in StarFormationHistories.fit_templates_lbfgsb. For moderate isochrone grids up to a few hundred model templates, this is not a problem. Beyond this it may be better to use StarFormationHistories.fit_templates_lbfgsb to obtain the MLE and hmc_sample to obtain posterior samples.

We implement this optimization scheme in fit_templates, which is our recommended method for unconstrained SFH fitting (i.e., direct fitting of the $r_j$ coefficients). See the next section for notes on more complicated, hierarchical models that can incorporate features like metallicity distribution functions.

StarFormationHistories.fit_templates - Function
result = fit_templates(models::AbstractVector{T},
                       data::AbstractMatrix{<:Number};
                       x0::AbstractVector{<:Number} = ones(S,length(models)),
                       kws...) where {S <: Number, T <: AbstractMatrix{S}}

Finds both maximum likelihood estimate (MLE) and maximum a posteriori estimate (MAP) for the coefficients coeffs such that the composite Hess diagram model is sum(models .* coeffs) using the provided templates models and the observed Hess diagram data. Utilizes the Poisson likelihood ratio (equations 7–10 in Dolphin 2002) for the likelihood of the data given the model. See the examples in the documentation for comparisons of the results of this method and hmc_sample which samples the posterior via Hamiltonian Monte Carlo.

Arguments

  • models::AbstractVector{AbstractMatrix{<:Number}}: the list of template Hess diagrams for the simple stellar populations (SSPs) being considered; all must have the same size.
  • data::AbstractMatrix{<:Number}: the observed Hess diagram; must match the size of the templates contained in models.

Keyword Arguments

  • x0: The vector of initial guesses for the stellar mass coefficients. You should basically always be calculating and passing this keyword argument; we provide StarFormationHistories.construct_x0 to prepare x0 assuming constant star formation rate, which is typically a good initial guess.

Other kws... are passed to Optim.Options to set things like convergence criteria for the optimization.

Returns

result is a NamedTuple containing two StarFormationHistories.LogTransformFTResult. The two components of result are result.map or result[1], which contains the results of the MAP optimization, and result.mle or result[2], which contains the results of the MLE optimization. The documentation for StarFormationHistories.LogTransformFTResult contains more information about these types, but briefly they contain the following fields, accessible as, e.g., result.map.μ, result.map.σ, etc.:

  • μ::Vector{<:Number} are the optimized coeffs at the maximum.
  • σ::Vector{<:Number} are the standard errors on the coeffs μ calculated from an estimate of the inverse Hessian matrix evaluated at μ. The inverse of the Hessian matrix at the maximum of the likelihood (or posterior) is an estimator for the variance-covariance matrix of the parameters, but is only accurate when the second-order expansion given by the Hessian at the maximum is a good approximation to the function being optimized (i.e., when the optimized function is approximately quadratic around the maximum; see Dovi et al. 1991 for more information). We find this is often the case for the MAP estimate, but the errors found for coefficients that are ≈0 in the MLE are typically unrealistically small. For coefficients significantly greater than 0, the σ values from the MAP and MLE are typically consistent to 5–10%.
  • invH::Matrix{<:Number} is the estimate of the inverse Hessian matrix at μ that was used to derive σ. The optimization is done under a logarithmic transform, such that θ[j] = log(coeffs[j]) are the actual parameters optimized, so the entries in the Hessian are actually

\[H^{(j,k)} ( \boldsymbol{\hat \theta} ) = \left. \frac{\partial^2 \, J(\boldsymbol{\theta})}{\partial \theta_j \, \partial \theta_k} \right\vert_{\boldsymbol{\theta}=\boldsymbol{\hat \theta}}\]

  • result is the full object returned by the optimization from Optim.jl; this is of type Optim.MultivariateOptimizationResults. Remember that the optimization is done with parameters θ[j] = log(coeffs[j]) when dealing with this raw output. This means that, for example, we calculate result.map.μ as exp.(Optim.minimizer(result.map.result)).

The special property of the StarFormationHistories.LogTransformFTResult type is that you can draw a set of N::Integer random parameter samples from the result using the inverse Hessian approximation discussed above by doing rand(result.map, N). This type implements the random sampling API of Distributions.jl so the other standard sampling methods should work as well. In our tests these samples compare very favorably against those from hmc_sample, which samples the posterior via Hamiltonian Monte Carlo and is therefore more robust but much more expensive. We compare these methods in the examples.
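For instance, a hedged sketch of using these samples to form credible intervals, assuming result was returned by fit_templates:

import Statistics
samples = rand(result.map, 10_000) # size (length(models), 10_000) matrix of draws
# Per-coefficient 16th/84th percentile credible intervals:
intervals = [Statistics.quantile(row, [0.16, 0.84]) for row in eachrow(samples)]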

Notes

  • This method uses the BFGS method from Optim.jl internally because it builds and tracks the inverse Hessian matrix approximation which can be used to estimate parameter uncertainties. BFGS is much more memory-intensive than LBFGS (as used for StarFormationHistories.fit_templates_lbfgsb) for large numbers of parameters (equivalently, many models), so you should consider LBFGS to solve for the MLE along with hmc_sample to sample the posterior if you are using a large grid of models (greater than a few hundred).
  • The BFGS implementation we use from Optim.jl uses BLAS operations during its iteration. The OpenBLAS that Julia ships with will often default to running on multiple threads even if Julia itself is started with only a single thread. You can check the current number of BLAS threads with import LinearAlgebra: BLAS; BLAS.get_num_threads(). For the problem sizes typical of this function we actually see performance regression with larger numbers of BLAS threads. For this reason you may wish to use BLAS in single-threaded mode; you can set this as import LinearAlgebra: BLAS; BLAS.set_num_threads(1).
fit_templates(models::AbstractMatrix{S},
              data::AbstractVector{<:Number};
              x0::AbstractVector{<:Number} = ones(S,size(models,2)),
              kws...) where S <: Number

This call signature supports the flattened formats for models and data. See the notes for the flattened call signature of StarFormationHistories.composite! for more details.

StarFormationHistories.LogTransformFTResult - Type
LogTransformFTResult(μ::AbstractVector{<:Number},
                     σ::AbstractVector{<:Number},
                     invH::AbstractMatrix{<:Number},
                     result)

Type for containing the maximum likelihood estimate (MLE) and maximum a posteriori (MAP) results from fit_templates. The fitted coefficients are available in the μ field. Estimates of the standard errors are available in the σ field. These have both been transformed from the native logarithmic fitting space into natural units (i.e., stellar mass or star formation rate).

invH contains the estimated inverse Hessian of the likelihood / posterior at the maximum point in the logarithmic fitting units. result is the full result object returned by the optimization routine.

This type is implemented as a subtype of Distributions.Sampleable{Multivariate, Continuous} to enable sampling from an estimate of the likelihood / posterior distribution. We approximate the distribution as a multivariate Gaussian in the native (logarithmically transformed) fitting variables with covariance matrix invH and means log.(μ). We find this approximation is good for the MAP result but less robust for the MLE. You can obtain N::Integer samples from the distribution by rand(R, N) where R is an instance of this type; this will return a size length(μ) x N matrix, or fail if invH is not positive definite.

Examples

julia> result = fit_templates(models, data);

julia> typeof(result.map)
StarFormationHistories.LogTransformFTResult{...}

julia> size(rand(result.map, 3)) == (length(models),3)
true

Once you have obtained stellar mass coefficients from the above methods, you can convert them into star formation rates and compute per-age mean metallicities with StarFormationHistories.calculate_cum_sfr.

StarFormationHistories.calculate_cum_sfr - Function
(unique_logAge, cum_sfh, sfr, mean_MH) =
    calculate_cum_sfr(coeffs::AbstractVector,
                      logAge::AbstractVector,
                      MH::AbstractVector,
                      T_max::Number;
                      normalize_value=1,
                      sorted::Bool=false)

Calculates cumulative star formation history, star formation rates, and mean metallicity evolution as functions of logAge = log10(age [yr]).

Arguments

  • coeffs::AbstractVector is a vector of stellar mass coefficients such as those returned by fit_templates, for example. Actual stellar mass in stellar population j is coeffs[j] * normalize_value.
  • logAge::AbstractVector is a vector giving the log10(age [yr]) of the stellar populations corresponding to the provided coeffs. For the purposes of calculating star formation rates, these are assumed to be left-bin edges.
  • MH::AbstractVector is a vector giving the metallicities of the stellar populations corresponding to the provided coeffs.
  • T_max::Number is the rightmost final bin edge for calculating star formation rates in units of Gyr. For example, you might have logAge=[6.6, 6.7, 6.8] in which case a final logAge of 6.9 would give equal bin widths. In this case you would set T_max = exp10(6.9) / 1e9 ≈ 0.0079 so that the width of the final bin for the star formation rate calculation has the same log10(Age [yr]) step as the other bins.

Keyword Arguments

  • normalize_value is a multiplicative prefactor to apply to all the coeffs; same as the keyword in partial_cmd_smooth.
  • sorted::Bool signifies whether to assume the provided logAge vector is sorted.

Returns

  • unique_logAge::Vector is essentially unique(sort(logAge)) and provides the x-values you would plot the other returned vectors against.
  • cum_sfh::Vector is the normalized cumulative SFH implied by the provided coeffs. This is ~1 at the most recent time in logAge and decreases as logAge increases.
  • sfr::Vector gives the star formation rate across each bin in unique_logAge. If coeffs .* normalize_value are in units of solar masses, then sfr is in units of solar masses per year.
  • mean_MH::Vector gives the stellar-mass-weighted mean metallicity of the stellar population as a function of unique_logAge.
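To close the loop, a hedged end-to-end sketch: coeffs is assumed to come from one of the fitting methods above, with matching logAge and MH vectors and T_max as described in the Arguments:

import StarFormationHistories as SFH
unique_logAge, cum_sfh, sfr, mean_MH =
    SFH.calculate_cum_sfr(coeffs, logAge, MH, T_max; normalize_value=1e3)
# Plot `sfr` and `mean_MH` against `unique_logAge` to visualize the recovered
# star formation history and mean metallicity evolution.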