Linear Age-Metallicity Relation
Here we describe the linear age-metallicity relation $\langle [\text{M}/\text{H}] \rangle (t) = \alpha \, \left( T_\text{max} - t \right) + \beta$ with a Gaussian distribution in metallicity at fixed age as described by the GaussianDispersion
dispersion model. $T_\text{max}$ here is the earliest lookback time under consideration such that $\langle [\text{M}/\text{H}] \rangle (T_\text{max}) = \beta$. If the per-age-bin stellar mass coefficients are $R_j$, the age of the stellar population $j$ is $t_j$, and the metallicity of population $k$ is $[\text{M}/\text{H}]_k$, then we can write the per-model $r_{j,k}$ (where we are now using separate indices for age and metallicity) as
\[\begin{aligned} \mu_j &= \alpha \, \left( T_\text{max} - t_j \right) + \beta \\ r_{j,k} &= R_j \, \frac{ \text{exp} \left( - \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} \end{aligned}\]
where the numerator is the MDF at fixed age evaluated at metallicity $[\text{M}/\text{H}]_k$ and the denominator is a normalizing coefficient that ensures $\sum_k r_{j,k} = R_j$. In this notation, bin $i$ of the complex model Hess diagram (equation 1 of Dolphin 2002) is
\[m_i = \sum_{j,k} \, r_{j,k} \; c_{i,j,k}\]
Below we show a fit using this hierarchical model to the same data as was used to derive the unconstrained fit in the introduction.
This model is represented by the LinearAMR
type, which is a subtype of AbstractAMR
.
StarFormationHistories.AbstractAMR
— TypeAbstractAMR{T <: Real} <: AbstractMetallicityModel{T}
: abstract supertype for all metallicity models that are age-metallicity relations. Concrete subtypes T <: AbstractAMR
should implement the following API:
(model::T)(logAge::Real)
should be defined so that the struct is callable with a logarithmic age (log10(age [yr])
), returning the mean metallicity given the AMR model. This is $\mu_j \left( t_j \right)$ in the derivations presented in the documentation.nparams(model::T)
should return the number of fittable parameters in the model.fittable_params(model::T)
should return the values of the fittable parameters in the model.gradient(model::T, logAge::Real)
should return a tuple that contains the partial derivative of the mean metallicity $\mu_j$ with respect to each fittable model parameter evaluated at logarithmic agelogAge
.update_params(model::T, newparams)
should return a new instance ofT
with the fittable parameters contained innewparams
(which is typically a vector or tuple) and non-fittable parameters inherited from the providedmodel
.transforms(model::T)
should return a tuple of lengthnparams(model)
which indicates how the fittable variables should be transformed for optimization, if at all. Elements should be1
for parameters that are constrained to always be positive,0
for parameters that can be positive or negative, and-1
for parameters that are constrained to always be negative.free_params(model::T)
should return anNTuple{nparams(model), Bool}
that istrue
for fittable parameters that you want to optimize andfalse
for fittable parameters that you want to stay fixed during optimization.
StarFormationHistories.LinearAMR
— TypeLinearAMR(α::Real,
β::Real,
T_max::Real = 137//10,
free::NTuple{2, Bool} = (true, true))
Subtype of AbstractAMR
implementing the linear age-metallicity relation where the mean metallicity at a lookback time $t_j$ (in Gyr) is μ_j = α * (T_max - t_j) + β
. α
is therefore a slope describing the rate of change in the metallicity per Gyr, and β
is the mean metallicity value of stars being born at a lookback time of T_max
, which has units of Gyr. free
controls whether α
and β
should be freely fit or fixed when passed into fit_sfh
; if free[1] == true
then α
will be freely fit, whereas it will fixed if free[1] == false
. free[2]
has the same effect but for β
.
LinearAMR(constraint1, constraint2, T_max::Real=137//10,
free::NTuple{2, Bool}=(true, true))
Construct an instance of LinearAMR
from MH constraints at two different lookback times. Each of constraint1
and constraint2
should be length-2 indexables (e.g., tuples) whose first element is a metallicity [M/H] and second element is a lookback time in Gyr. The order of the constraints does not matter.
Examples
julia> LinearAMR((-2.5, 13.7), (-1.0, 0.0), 13.7) isa LinearAMR{Float64}
true
julia> LinearAMR((-2.5, 13.7), (-1.0, 0.0), 13.7) == LinearAMR((-1.0, 0.0), (-2.5, 13.7), 13.7)
true
Fitting Functions
fit_sfh
and sample_sfh
both work with this AMR model.
The method StarFormationHistories.construct_x0_mdf
can be used to construct the stellar mass components $R_j$ of the initial guess vector x0
.
StarFormationHistories.construct_x0_mdf
— Functionx0::Vector = construct_x0_mdf(logAge::AbstractVector{T},
[ cum_sfh, ]
T_max::Number;
normalize_value::Number = one(T)) where T <: Number
Generates a vector of initial stellar mass normalizations for input to fit_sfh
and similar methods with a total stellar mass of normalize_value
. The logAge
vector must contain the log10(Age [yr])
of each isochrone that you are going to input as models. If cum_sfh
is not provided, a constant star formation rate is assumed. For the purposes of computing the constant star formation rate, the provided logAge
are treated as left-bin edges, with the final right-bin edge being T_max
, which has units of Gyr. For example, you might have logAge=[6.6, 6.7, 6.8]
in which case a final logAge of 6.9 would give equal bin widths (in log-space). In this case you would set T_max = exp10(6.9) / 1e9 ≈ 0.0079
so that the width of the final bin for the star formation rate calculation has the same log10(Age [yr])
step as the other bins.
A desired cumulative SFH vector cum_sfh::AbstractVector{<:Number}
can be provided as the second argument, which should correspond to a lookback time vector unique(logAge)
. You can also provide cum_sfh
as a length-2 indexable (e.g., a length-2 Vector{Vector{<:Number}}
) with the first element containing a list of log10(Age [yr])
values and the second element containing the cumulative SFH values at those values. This cumulative SFH is then interpolated onto the logAge
provided in the first argument. This method should be used when you want to define the cumulative SFH on a different age grid from the logAge
you provide in the first argument. The examples below demonstrate these use cases.
The difference between this function and StarFormationHistories.construct_x0
is that this function generates an x0
vector that is of length length(unique(logage))
(that is, a single normalization factor for each unique entry in logAge
) while StarFormationHistories.construct_x0
returns an x0
vector that is of length length(logAge)
; that is, a normalization factor for every entry in logAge
. The order of the coefficients is such that the coefficient x[i]
corresponds to the entry unique(logAge)[i]
.
Notes
Examples – Constant SFR
julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], 10.0; normalize_value=5.0),
[4.504504504504504, 0.4504504504504504, 0.04504504504504504] )
true
julia> isapprox( construct_x0_mdf(repeat([9.0, 8.0, 7.0, 8.0]; inner=3), 10.0; normalize_value=5.0),
[4.504504504504504, 0.4504504504504504, 0.04504504504504504] )
true
julia> isapprox( construct_x0_mdf(repeat([9.0, 8.0, 7.0, 8.0]; outer=3), 10.0; normalize_value=5.0),
construct_x0([9.0, 8.0, 7.0], 10.0; normalize_value=5.0) )
true
Examples – Input Cumulative SFH defined on same logAge
grid
julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0),
[4.5045, 0.4504, 0.0450]; atol=1e-3 )
true
julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0], [0.1, 0.5, 1.0], 10.0; normalize_value=5.0),
[0.5, 2.0, 2.5] )
true
julia> isapprox( construct_x0_mdf([7.0, 8.0, 9.0], [1.0, 0.5, 0.1], 10.0; normalize_value=5.0),
[2.5, 2.0, 0.5] )
true
Examples – Input Cumulative SFH with separate logAge
grid
julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0],
[[9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0]], 10.0; normalize_value=5.0),
construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0) )
true
julia> isapprox( construct_x0_mdf([9.0, 8.0, 7.0],
[[9.0, 8.5, 8.25, 7.0], [0.9009, 0.945945, 0.9887375, 1.0]], 10.0; normalize_value=5.0),
construct_x0_mdf([9.0, 8.0, 7.0], [0.9009, 0.99099, 1.0], 10.0; normalize_value=5.0) )
true
and calculate_coeffs
can be used to calculate per-template stellar mass coefficients (the $r_{j,k}$ above) given the results of a fit (which will be the $R_j$ in the equations above).
Implementation
While one could optimize the above model without an analytic gradient, such gradient-free methods are typically slower and less robust. One could also calculate the gradient numerically using finite differences or auto-differentiation, but these are still slower than analytic calculations. We will show that the gradient of this hierarchical model is analytic, allowing us to design an efficient optimization scheme.
Equation 21 in Dolphin 2001 gives the gradient of our objective function with respect to the underlying coefficients
\[\begin{aligned} F \equiv - \text{ln} \, \mathscr{L} &= \sum_i m_i - n_i \times \left( 1 - \text{ln} \, \left( \frac{n_i}{m_i} \right) \right) \\ \frac{\partial \, F}{\partial \, r_{j,k}} &= \sum_i c_{i,j,k} \left( 1 - \frac{n_i}{m_i} \right) \end{aligned}\]
where $c_{i,j,k}$ is the value of template $j,k$ in bin $i$ and $n_i$ is bin $i$ of the observed Hess diagram. These partial derivatives are easy to obtain, but we need partials with respect to the per-age-bin fitting parameters $R_j$. Given the above relation between $r_{j,k}$ and $R_j$, we can calculate these derivatives as
\[\begin{aligned} \frac{\partial \, F}{\partial \, R_j} &= \sum_k \, \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, R_j} \\ \frac{\partial \, r_{j,k}}{\partial \, R_j} &= \frac{ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} = \frac{r_{j,k}}{R_j} \end{aligned}\]
Then we need only the partial derivatives of the objective function $F$ with respect to the MDF parameters, which in this case are $\alpha, \beta, \sigma$. For convenience we will rewrite
\[r_{j,k} = R_j \, \frac{ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)}{\sum_k \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right)} = R_j \, \frac{A_{j,k}}{\sum_k A_{j,k}}\]
as many different types of models can be expressed via this simplified notation by substituting the $A_{j,k}$ with different distributions. This allows us to write
\[\begin{aligned} \frac{\partial \, F}{\partial \, \beta} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \\ \frac{\partial \, r_{j,k}}{\partial \, \beta} &= R_j \left( \frac{1}{\sum_k \, A_{j,k}} \, \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\left( \sum_k \, A_{j,k} \right)^2} \, \frac{\partial \, \sum_k \, A_{j,k}}{\partial \, \beta} \right) \\ &= \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \beta} \right) \\ \end{aligned}\]
Given our specific definition of $A_{j,k}$ being a Gaussian distribution, we have
\[\begin{aligned} \mu_j &= \alpha \, \left( T_\text{max} - t_j \right) + \beta \\ \frac{\partial \, A_{j,k}}{\partial \, \beta} &= \frac{\partial}{\partial \, \beta} \, \left[ \text{exp} \left( - \frac{1}{2} \left( \frac{ [\text{M}/\text{H}]_k - \mu_j}{\sigma} \right)^2 \right) \right] \\ &= \frac{A_{j,k}}{\sigma^2} \left( [\text{M}/\text{H}]_k - \mu_j \right) \end{aligned}\]
We can now substitute this result into the above expressions to write
\[\begin{aligned} \frac{\partial \, F}{\partial \, \beta} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \\ &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \beta} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \beta} \right) \\ &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{R_j}{\sigma^2 \, \sum_k \, A_{j,k}} \left( A_{j,k} \left( [\text{M}/\text{H}]_k - \mu_j \right) - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k A_{j,k} \left( [\text{M}/\text{H}]_k - \mu_j \right) \right) \end{aligned}\]
It can be shown that the partial derivative of $F$ with respect to $\alpha$ is simply
\[\frac{\partial \, F}{\partial \, \alpha} = \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \alpha} = \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \beta} \times \left( T_\text{max} - t_j \right) \\\]
The partial derivative with respect to $\sigma$ is slightly more complicated, but we can start identically to how we started above when deriving $\frac{\partial \, F}{\partial \, \beta}$ with
\[\begin{aligned} \frac{\partial \, F}{\partial \, \sigma} &= \sum_{j,k} \frac{\partial \, F}{\partial \, r_{j,k}} \, \frac{\partial \, r_{j,k}}{\partial \, \sigma} \\ \frac{\partial \, r_{j,k}}{\partial \, \sigma} &= R_j \left( \frac{1}{\sum_k \, A_{j,k}} \, \frac{\partial \, A_{j,k}}{\partial \, \sigma} - \frac{A_{j,k}}{\left( \sum_k \, A_{j,k} \right)^2} \, \frac{\partial \, \sum_k \, A_{j,k}}{\partial \, \sigma} \right) \\ &= \frac{R_j}{\sum_k \, A_{j,k}} \left( \frac{\partial \, A_{j,k}}{\partial \, \sigma} - \frac{A_{j,k}}{\sum_k \, A_{j,k}} \sum_k \frac{\partial \, A_{j,k}}{\partial \, \sigma} \right) \\ \end{aligned}\]
Then all we need is
\[\frac{\partial \, A_{j,k}}{\partial \, \sigma} = \frac{A_{j,k} \, \left( [\text{M}/\text{H}]_k - \mu_j \right)^2}{\sigma^3}\]
which we can substitute into the above expressions to find $\frac{\partial \, F}{\partial \, \sigma}$.