Linear Age-Metallicity Relation

Here we describe the linear age-metallicity relation $\langle [\text{M}/\text{H}] \rangle (t) = \alpha \, \left( T_\text{max} - t \right) + \beta$ with a Gaussian distribution in metallicity at fixed age as described by the GaussianDispersion dispersion model. $T_\text{max}$ here is the earliest lookback time under consideration such that $\langle [\text{M}/\text{H}] \rangle (T_\text{max}) = \beta$. If the per-age-bin stellar mass coefficients are $R_j$, the age of the stellar population $j$ is $t_j$, and the metallicity of population $k$ is $[\text{M}/\text{H}]_k$, then we can write the per-model $r_{j,k}$ (where we are now using separate indices for age and metallicity) as

\[\begin{aligned} \mu_j &= \alpha \, \left( T_\text{max} - t_j \right) + \beta \\ r_{j,k} &= R_j \, \frac{ \text{exp} \left( - \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} \end{aligned}\]

where the numerator is the MDF at fixed age evaluated at metallicity $[\text{M}/\text{H}]_k$ and the denominator is a normalizing coefficient that ensures $\sum_k r_{j,k} = R_j$. In this notation, bin $i$ of the complex model Hess diagram (equation 1 of Dolphin 2002) is

\[m_i = \sum_{j,k} \, r_{j,k} \; c_{i,j,k}\]

Below we show a fit using this hierarchical model to the same data as was used to derive the unconstrained fit in the introduction.

Example of a SFH fit with a linear metallicity evolution.

This model is represented by the LinearAMR type, which is a subtype of AbstractAMR.

StarFormationHistories.AbstractAMRType

AbstractAMR{T <: Real} <: AbstractMetallicityModel{T}: abstract supertype for all metallicity models that are age-metallicity relations. Concrete subtypes T <: AbstractAMR should implement the following API:

  • (model::T)(logAge::Real) should be defined so that the struct is callable with a logarithmic age (log10(age [yr])), returning the mean metallicity given the AMR model. This is $\mu_j \left( t_j \right)$ in the derivations presented in the documentation.
  • nparams(model::T) should return the number of fittable parameters in the model.
  • fittable_params(model::T) should return the values of the fittable parameters in the model.
  • gradient(model::T, logAge::Real) should return a tuple that contains the partial derivative of the mean metallicity $\mu_j$ with respect to each fittable model parameter evaluated at logarithmic age logAge.
  • update_params(model::T, newparams) should return a new instance of T with the fittable parameters contained in newparams (which is typically a vector or tuple) and non-fittable parameters inherited from the provided model.
  • transforms(model::T) should return a tuple of length nparams(model) which indicates how the fittable variables should be transformed for optimization, if at all. Elements should be 1 for parameters that are constrained to always be positive, 0 for parameters that can be positive or negative, and -1 for parameters that are constrained to always be negative.
  • free_params(model::T) should return an NTuple{nparams(model), Bool} that is true for fittable parameters that you want to optimize and false for fittable parameters that you want to stay fixed during optimization.
source
StarFormationHistories.LinearAMRType
LinearAMR(α::Real,
          β::Real,
          T_max::Real = 137//10,
          free::NTuple{2, Bool} = (true, true))

Subtype of AbstractAMR implementing the linear age-metallicity relation where the mean metallicity at a lookback time $t_j$ (in Gyr) is μ_j = α * (T_max - t_j) + β. α is therefore a slope describing the rate of change in the metallicity per Gyr, and β is the mean metallicity value of stars being born at a lookback time of T_max, which has units of Gyr. free controls whether α and β should be freely fit or fixed when passed into fit_sfh; if free[1] == true then α will be freely fit, whereas it will fixed if free[1] == false. free[2] has the same effect but for β.

source
LinearAMR(constraint1, constraint2, T_max::Real=137//10,
          free::NTuple{2, Bool}=(true, true))

Construct an instance of LinearAMR from MH constraints at two different lookback times. Each of constraint1 and constraint2 should be length-2 indexables (e.g., tuples) whose first element is a metallicity [M/H] and second element is a lookback time in Gyr. The order of the constraints does not matter.

Examples

julia> LinearAMR((-2.5, 13.7), (-1.0, 0.0), 13.7) isa LinearAMR{Float64}
true

julia> LinearAMR((-2.5, 13.7), (-1.0, 0.0), 13.7) == LinearAMR((-1.0, 0.0), (-2.5, 13.7), 13.7)
true
source

Fitting Functions

fit_sfh and sample_sfh both work with this AMR model.

The method StarFormationHistories.construct_x0_mdf can be used to construct the stellar mass components $R_j$ of the initial guess vector x0.

StarFormationHistories.construct_x0_mdfFunction
x0::Vector = construct_x0_mdf(logAge::AbstractVector{T},
                              [ cum_sfh, ]
                              T_max::Number;
                              normalize_value::Number = one(T)) where T <: Number

Generates a vector of initial stellar mass normalizations for input to fit_sfh and similar methods with a total stellar mass of normalize_value. The logAge vector must contain the log10(Age [yr]) of each isochrone that you are going to input as models. If cum_sfh is not provided, a constant star formation rate is assumed. For the purposes of computing the constant star formation rate, the provided logAge are treated as left-bin edges, with the final right-bin edge being T_max, which has units of Gyr. For example, you might have logAge=[6.6, 6.7, 6.8] in which case a final logAge of 6.9 would give equal bin widths (in log-space). In this case you would set T_max = exp10(6.9) / 1e9 ≈ 0.0079 so that the width of the final bin for the star formation rate calculation has the same log10(Age [yr]) step as the other bins.

A desired cumulative SFH vector cum_sfh::AbstractVector{<:Number} can be provided as the second argument, which should correspond to a lookback time vector unique(logAge). You can also provide cum_sfh as a length-2 indexable (e.g., a length-2 Vector{Vector{<:Number}}) with the first element containing a list of log10(Age [yr]) values and the second element containing the cumulative SFH values at those values. This cumulative SFH is then interpolated onto the logAge provided in the first argument. This method should be used when you want to define the cumulative SFH on a different age grid from the logAge you provide in the first argument. The examples below demonstrate these use cases.

The difference between this function and StarFormationHistories.construct_x0 is that this function generates an x0 vector that is of length length(unique(logage)) (that is, a single normalization factor for each unique entry in logAge) while StarFormationHistories.construct_x0 returns an x0 vector that is of length length(logAge); that is, a normalization factor for every entry in logAge. The order of the coefficients is such that the coefficient x[i] corresponds to the entry unique(logAge)[i].

Notes

Examples – Constant SFR

julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], 10.0; normalize_value=5.0),
                 [4.504504504504504, 0.4504504504504504, 0.04504504504504504] )
true

julia> isapprox( construct_x0_mdf(repeat([9.0, 8.0, 7.0, 8.0]; inner=3), 10.0; normalize_value=5.0),
                 [4.504504504504504, 0.4504504504504504, 0.04504504504504504] )
true

julia> isapprox( construct_x0_mdf(repeat([9.0, 8.0, 7.0, 8.0]; outer=3), 10.0; normalize_value=5.0),
                 construct_x0([9.0, 8.0, 7.0], 10.0; normalize_value=5.0) )
true

Examples – Input Cumulative SFH defined on same logAge grid

julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0),
                 [4.5045, 0.4504, 0.0450]; atol=1e-3 )
true

julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], [0.1, 0.5, 1.0], 10.0; normalize_value=5.0),
                 [0.5, 2.0, 2.5] )
true

julia> isapprox( construct_x0_mdf([7.0, 8.0, 9.0], [1.0, 0.5, 0.1], 10.0; normalize_value=5.0),
                 [2.5, 2.0, 0.5] )
true

Examples – Input Cumulative SFH with separate logAge grid

julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0],
                                  [[9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0]], 10.0; normalize_value=5.0),
                 construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0) )
true

julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0],
                                  [[9.0, 8.5, 8.25, 7.0], [0.9009, 0.945945, 0.9887375, 1.0]], 10.0; normalize_value=5.0),
                 construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0) )
true
source

and calculate_coeffs can be used to calculate per-template stellar mass coefficients (the $r_{j,k}$ above) given the results of a fit (which will be the $R_j$ in the equations above).

Implementation

While one could optimize the above model without an analytic gradient, such gradient-free methods are typically slower and less robust. One could also calculate the gradient numerically using finite differences or auto-differentiation, but these are still slower than analytic calculations. We will show that the gradient of this hierarchical model is analytic, allowing us to design an efficient optimization scheme.

Equation 21 in Dolphin 2001 gives the gradient of our objective function with respect to the underlying coefficients

\[\begin{aligned} F \equiv - \text{ln} \, \mathscr{L} &= \sum_i m_i - n_i \times \left( 1 - \text{ln} \, \left( \frac{n_i}{m_i} \right) \right) \\ \frac{\partial \, F}{\partial \, r_{j,k}} &= \sum_i c_{i,j,k} \left( 1 - \frac{n_i}{m_i} \right) \end{aligned}\]

where $c_{i,j,k}$ is the value of template $j,k$ in bin $i$ and $n_i$ is bin $i$ of the observed Hess diagram. These partial derivatives are easy to obtain, but we need partials with respect to the per-age-bin fitting parameters $R_j$. Given the above relation between $r_{j,k}$ and $R_j$, we can calculate these derivatives as

\[\begin{aligned} \frac{\partial \, F}{\partial \, R_j} &= \sum_k \, \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, R_j} \\ \frac{\partial \, r_{j,k}}{\partial \, R_j} &= \frac{ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} = \frac{r_{j,k}}{R_j} \end{aligned}\]

Then we need only the partial derivatives of the objective function $F$ with respect to the MDF parameters, which in this case are $\alpha, \beta, \sigma$. For convenience we will rewrite

\[r_{j,k} = R_j \, \frac{ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} = R_j \, \frac{A_{j,k}}{\sum_k A_{j,k}}\]

as many different types of models can be expressed via this simplified notation by substituting the $A_{j,k}$ with different distributions. This allows us to write

\[\begin{aligned} \frac{\partial \, F}{\partial \, \beta} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \\ \frac{\partial \, r_{j,k}}{\partial \, \beta} &= R_j \left( \frac{1}{\sum_k \, A_{j,k}} \, \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\left( \sum_k \, A_{j,k} \right)^2} \, \frac{\partial \, \sum_k \, A_{j,k}}{\partial \, \beta} \right) \\ &= \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \beta} \right) \\ \end{aligned}\]

Given our specific definition of $A_{j,k}$ being a Gaussian distribution, we have

\[\begin{aligned} \mu_j &= \alpha \, \left( T_\text{max} - t_j \right) + \beta \\ \frac{\partial \, A_{j,k}}{\partial \, \beta} &= \frac{\partial}{\partial \, \beta} \, \left[ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right) \right] \\ &= \frac{A_{j,k}}{\sigma^2} \left( [\text{M}/\text{H}]_k - \mu_j \right) \end{aligned}\]

We can now substitute this result into the above expressions to write

\[\begin{aligned} \frac{\partial \, F}{\partial \, \beta} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \\ &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \beta} \right) \\ &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{R_j}{\sigma^2 \, \sum_k \, A_{j,k}} \left( A_{j,k} \left( [\text{M}/\text{H}]_k - \mu_j \right) - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k A_{j,k} \left( [\text{M}/\text{H}]_k - \mu_j \right) \right) \end{aligned}\]

It can be shown that the partial derivative of $F$ with respect to $\alpha$ is simply

\[\frac{\partial \, F}{\partial \, \alpha} = \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \alpha} = \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \times \left( T_\text{max} - t_j \right) \\\]

The partial derivative with respect to $\sigma$ is slightly more complicated, but we can start identically to how we started above when deriving $\frac{\partial \, F}{\partial \, \beta}$ with

\[\begin{aligned} \frac{\partial \, F}{\partial \, \sigma} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \sigma} \\ \frac{\partial \, r_{j,k}}{\partial \, \sigma} &= R_j \left( \frac{1}{\sum_k \, A_{j,k}} \, \frac{\partial \, A_{j,k}}{\partial \, \sigma} - \frac{A_{j,k}}{\left( \sum_k \, A_{j,k} \right)^2} \, \frac{\partial \, \sum_k \, A_{j,k}}{\partial \, \sigma} \right) \\ &= \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \sigma} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \sigma} \right) \\ \end{aligned}\]

Then all we need is

\[\frac{\partial \, A_{j,k}}{\partial \, \sigma} = \frac{A_{j,k} \, \left( [\text{M}/\text{H}]_k - \mu_j \right)^2}{\sigma^3}\]

which we can substitute into the above expressions to find $\frac{\partial \, F}{\partial \, \sigma}$.