| Title: | Peak Detection and Fire History from Sediment-Charcoal Records |
|---|---|
| Description: | A program for reconstructing local fire histories from high-resolution, continuously sampled lake-sediment charcoal records. 'CharAnalysis' decomposes a charcoal record into low- and high-frequency components and uses locally defined thresholds to separate fire signal from noise, following the approach of Higuera et al. (2009) <doi:10.1890/07-2019.1>, with underlying assumptions and rationale described in Higuera et al. (2010) <doi:10.1071/WF09134>. The package is designed for macroscopic charcoal records with contiguous sampling fine enough to resolve individual fire events, and is not appropriate for low-resolution or discontinuously sampled records. See the package URL for the User's Guide and application examples. |
| Authors: | Philip Higuera [aut, cre] (ORCID: <https://orcid.org/0000-0003-1278-4095>) |
| Maintainer: | Philip Higuera <[email protected]> |
| License: | GPL-3 |
| Version: | 2.0.3 |
| Built: | 2026-06-02 08:09:09 UTC |
| Source: | https://github.com/cran/CharAnalysis |
Pure-R implementation of the shifted-window lowess algorithm from MATLAB's
charLowess.m v2.0, which itself replicates MATLAB's
smooth(y, k, 'lowess') from the Curve Fitting Toolbox. Replaces
the previous wrapper around lowess, which used a
shrinking window at record boundaries and different tricubic weight
normalisation, causing systematic differences of ~1e-3 in charBkg.
char_lowess(y, x = NULL, span = 0.1, iter = 0L)char_lowess(y, x = NULL, span = 0.1, iter = 0L)
y |
Numeric vector of values to smooth (NaN-free; bridge NaNs before calling and restore afterward, matching the MATLAB workflow). |
x |
Numeric vector of x-positions (same length as |
span |
Smoothing span.
Converted to integer |
iter |
Number of bisquare robustness passes after the initial fit.
|
## Algorithm
For each point , a window of exactly points is selected:
the window is centred on at interior points and SHIFTED (not
shrunk) at record boundaries, maintaining neighbours throughout.
Each neighbour receives a tricubic weight:
where is the distance to the furthest point in
the window (not the half-window radius). A weighted least-squares line
is fitted and evaluated at . This matches
charLowess.m lines 109-147 exactly.
For iter > 0 ('rlowess'), bisquare robustness weights are
computed from the residuals after each WLS pass and multiplied into the
tricubic weights for the next pass (matching charLowess.m lines
151-157).
## Why not stats::lowess()?
stats::lowess() trims (shrinks) the window at boundaries and
normalises tricubic weights by the half-window radius . The two
differences compound to ~1e-3 absolute error in charBkg for the CO
dataset (500 yr / 15 yr = 33-point window, 17-point boundary zone).
Numeric vector of smoothed values, same length as y.
[char_smooth()], [char_thresh_local()]
Reads the *_charParams.csv (or .xlsx) parameter file and the
companion charcoal data file, then unpacks all analysis parameters into
named lists that mirror the MATLAB struct layout.
char_parameters(file_name)char_parameters(file_name)
file_name |
Path to the |
**CSV convention** (unchanged from MATLAB v1.1): the parameter file is
named <site>_charParams.csv and the companion data file is
<site>_charData.csv in the same directory. The site name is the
basename with the trailing _charParams.csv (15 characters) removed,
mirroring MATLAB's fileName(1:end-15) idiom.
**Parameter vector layout**: column 3 ("Parameters") of the CSV, rows
2-26 (25 data rows after the header), maps to positions 1-25 of the
internal charParams vector exactly as in the MATLAB codebase.
Unused zoneDiv slots are filled with -9999 in the CSV
(sentinel) or NaN in Excel; both are stripped before the list is
returned.
A named list with elements:
Numeric matrix (n_samples x 6+): cmTop, cmBot, ageTop, ageBot, charVol, charCount.
List: zoneDiv, yrInterp,
transform.
List: method, yr.
List: cPeak, threshType,
threshMethod, threshValues, minCountP,
peakFrequ, bkgSens.
List: saveFigures, save,
allFigures.
Character string: site name derived from the filename stem.
[char_validate_params()], [char_pretreatment()], [CharAnalysis()]
# Read a bundled example parameter file: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") p <- char_parameters(params_file) p$pretreatment$yrInterp # interpolation interval (yr) p$smoothing$method # smoothing method index (1-5)# Read a bundled example parameter file: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") p <- char_parameters(params_file) p$pretreatment$yrInterp # interpolation interval (yr) p$smoothing$method # smoothing method index (1-5)
Mirrors CharPeakID.m from the MATLAB v2.0 codebase. For each
threshold column, samples where C_peak exceeds the threshold are flagged,
consecutive exceedances are collapsed to a single event (keeping the
last sample of each run, i.e. the oldest, matching the MATLAB
v1.1 / v2.0 convention), and each event is screened against a
minimum-count criterion using the Shuie-Bain (1982) extension of the
Detre-White (1970) test for unequal sediment volumes.
char_peak_id(charcoal, pretreatment, peak_analysis, char_thresh)char_peak_id(charcoal, pretreatment, peak_analysis, char_thresh)
charcoal |
Named list containing:
|
pretreatment |
Named list with element |
peak_analysis |
Named list with elements |
char_thresh |
Named list returned by [char_thresh_global()] or
[char_thresh_local()], containing |
## Threshold value matrix
For **global** thresholds (threshType == 1), char_thresh$pos
is a constant-row matrix reused directly. For
**local** thresholds (threshType == 2), char_thresh$pos is
already (per-sample values).
## Consecutive-peak removal After flagging all exceedances, a diff-based pass retains only the last sample of each consecutive run – the oldest sample within a group of contiguous above-threshold values. This matches the MATLAB v1.1 algorithm (which the v2.0 comment documents correctly despite the v1.1 comment being misleading).
## Minimum-count test
For each identified peak in column , a time window of
yr is constructed around the peak, then narrowed to the
adjacent peaks when they fall within the window. The test statistic is
and the p-value is (standard normal CDF; equivalent to
MATLAB's 1 - tcdf(d, 1e10) because ).
Peaks with are removed.
A named list with two components:
Input charcoal list augmented with:
charPeaks – numeric: 1 at peak
samples, 0 elsewhere.
charPeaksThresh – numeric: threshold
value at each identified peak, 0 elsewhere.
peaksTotal – numeric vector length : total
peaks per threshold column.
threshFRI – numeric matrix ():
fire-return intervals derived from peak ages per threshold column.
Input char_thresh list augmented with
minCountP – matrix of Shuie-Bain p-values
(NaN where not computed).
[char_thresh_local()], [char_thresh_global()], [CharAnalysis()]
Nine output figures mirroring the MATLAB CharAnalysis v2.0 plots. Function names follow R snake_case conventions.
char_plot_peaks(out) char_plot_fire_history(out) char_plot_cumulative(out) char_plot_fri(out) char_plot_zones(out) char_plot_raw(out) char_plot_thresh_diag(out) char_plot_sni(out) char_plot_all(out, save = FALSE, out_dir = NULL, width = 11, height = 8.5)char_plot_peaks(out) char_plot_fire_history(out) char_plot_cumulative(out) char_plot_fri(out) char_plot_zones(out) char_plot_raw(out) char_plot_thresh_diag(out) char_plot_sni(out) char_plot_all(out, save = FALSE, out_dir = NULL, width = 11, height = 8.5)
out |
Named list returned by [CharAnalysis()]. Must contain
|
save |
Logical. If |
out_dir |
Directory for saved PDFs. Required when |
width, height
|
PDF dimensions in inches. Defaults: 11 x 8.5. |
Figure 1 (allFigures only): C_raw, C_interpolated, and C_background smoothing options.
Figure 2 (allFigures only): Local threshold determination diagnostics (5x5 window grid).
Figure 3: Resampled CHAR with background trend (top) and C_peak with thresholds and peak markers (bottom).
Figure 4: Sensitivity to alternative thresholds and signal-to-noise index.
Figure 5: Cumulative peaks through time.
Figure 6: FRI distributions by zone with Weibull model fits.
Figure 7: Peak magnitude (top), FRIs through time with smoothed FRI and CI ribbon (middle), and smoothed fire frequency (bottom).
Figure 8: Between-zone comparisons of raw CHAR distributions (CDF and box plots).
Convenience wrapper: produces all figures and optionally saves them as PDF files.
Requires the ggplot2 package. Multi-panel layout uses patchwork if available; otherwise panels are printed separately with a message.
Individual figure functions each return a patchwork / ggplot2
object. char_plot_all() returns a named list of all figure objects.
[CharAnalysis()], [char_post_process()]
# Run pipeline on the bundled example dataset, then plot: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) char_plot_peaks(out) char_plot_fire_history(out) # Individual figures can also be called directly: char_plot_sni(out) char_plot_fri(out) # Save all figures to PDF in a temporary directory: char_plot_all(out, save = TRUE, out_dir = tempdir())# Run pipeline on the bundled example dataset, then plot: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) char_plot_peaks(out) char_plot_fire_history(out) # Individual figures can also be called directly: char_plot_sni(out) char_plot_fri(out) # Save all figures to PDF in a temporary directory: char_plot_all(out, save = TRUE, out_dir = tempdir())
Resamples the raw charcoal data to equal time steps using a proportion-weighted scheme, fills missing-value gaps by linear interpolation, computes charcoal accumulation rates (CHAR), and optionally applies a log transformation.
char_pretreatment( char_data, site, pretreatment, results = NULL, plot_data = 1L )char_pretreatment( char_data, site, pretreatment, results = NULL, plot_data = 1L )
char_data |
Numeric matrix (n x 6+): cmTop, cmBot, ageTop, ageBot, charVol, charCount. Rows sorted youngest-first (ascending ageTop). |
site |
Character string; site name used in diagnostic messages. |
pretreatment |
Named list with elements:
|
results |
Named list; only |
plot_data |
0/1 integer flag. Ignored in R (no diagnostic plots); included for API symmetry with the MATLAB function signature. |
Mirrors CharPretreatment.m from the MATLAB v2.0 codebase. The
vectorised proportion matrix (four broadcast cases) produces results
numerically identical to the MATLAB implementation within floating-point
tolerance (~1e-14).
## Proportion matrix
For each resampled interval and each raw sample ,
prop_matrix[i,j] is the fraction of the raw sample's age span
that falls within the resampled interval. Four mutually exclusive
overlap geometries (Cases A-D) are evaluated via matrix broadcasting
across the full grid – no loops required.
| Case | Geometry | Overlap | |——|———————————————–|——————–| | A | Raw straddles the **bottom** edge | rsAgeBot - ageTop | | B | Raw straddles the **top** edge | ageBot - rsAgeTop | | C | Raw lies **entirely within** resampled interval | ageBot - ageTop | | D | Resampled lies **entirely within** raw sample | yrInterp |
## zoneDiv auto-correction
If zoneDiv[end] exceeds the bottom age of the last raw sample,
the terminal resampled intervals would have no overlapping raw data and
accI would be NA. These NAs propagate into
charBkg and can hang the GMM in Phase 2. The value is silently
corrected to lastAgeBotInData and the user is notified
(v2.0 behaviour, preserved here).
Named list with three elements:
List of raw and resampled series:
cm, count, vol, con, ybp, acc (raw);
cmI, countI, volI, conI, accI, ybpI (resampled).
Input list returned with yrInterp updated
when it was 0 (auto), and zoneDiv[end] corrected if it
exceeded the bottom age of the last raw sample.
Integer matrix (nGaps x 2) of gap row-index pairs, or
matrix(NA_integer_, 0, 2) when no gaps exist.
[char_parameters()], [char_validate_params()], [CharAnalysis()]
Applies one of five smoothing methods to the interpolated charcoal
accumulation rate series (charcoal$accI) and stores the result in
charcoal$accIS. Mirrors CharSmooth.m from the MATLAB v2.0
codebase.
char_smooth(charcoal, pretreatment, smoothing, results = NULL, plot_data = 0L)char_smooth(charcoal, pretreatment, smoothing, results = NULL, plot_data = 0L)
charcoal |
Named list with elements |
pretreatment |
Named list with element |
smoothing |
Named list with elements:
|
results |
Named list (not used in R; kept for API symmetry). |
plot_data |
0/1 flag; ignored in R (no diagnostic plots). |
## Smoothing methods
| Index | Name | Implementation |
|——-|——|—————-|
| 1 | Lowess | [char_lowess()] with iter = 0 |
| 2 | Robust Lowess | [char_lowess()] with iter = 4 |
| 3 | Moving average | zoo::rollapply(..., mean, partial=TRUE) |
| 4 | Running median + Lowess | Shifted-window median loop, then [char_lowess()] |
| 5 | Running mode + Lowess | Shifted-window 100-bin mode loop, then [char_lowess()] |
## Span convention
The smoothing window width in data-point units is
s = smoothing$yr / pretreatment$yrInterp. This is passed to
[char_lowess()] as span = s (number of points), which
converts it to the fraction required by stats::lowess() via
f = round(s) / N.
## NaN bridging
NaN values in accI (from record gaps) cannot be passed to
[char_lowess()] directly. They are bridged by linear interpolation
before smoothing and restored afterward, matching the MATLAB fallback
path in CharSmooth.m (used when the Curve Fitting Toolbox is
absent). Methods 4 and 5 always use the bridged series.
## Window selection for methods 4 and 5
The boundary window is SHIFTED (not shrunk) to maintain exactly
round(s) samples, matching MATLAB's loop logic. Note that
MATLAB's round() rounds 0.5 away from zero while R uses banker's
rounding (round half to even); this can cause single-sample differences
in window boundaries for odd half-integers.
The input charcoal list with an additional element
accIS: the smoothed C_background series (length N).
[char_lowess()], [CharAnalysis()]
Determines a single threshold applied uniformly across the entire record.
The noise component of C_peak is modelled as a zero-mean Gaussian
(residuals), a one-mean Gaussian (ratios), or via a Gaussian mixture
model (GMM). Mirrors CharThreshGlobal.m from the MATLAB v2.0
codebase.
char_thresh_global( charcoal, pretreatment, peak_analysis, site = NULL, results = NULL, plot_data = 0L, bkg_sens_in = 0L )char_thresh_global( charcoal, pretreatment, peak_analysis, site = NULL, results = NULL, plot_data = 0L, bkg_sens_in = 0L )
charcoal |
Named list containing |
pretreatment |
Named list with |
peak_analysis |
Named list with |
site |
Character string (site name; unused in R, kept for API symmetry). |
results |
Named list (unused in R, kept for API symmetry). |
plot_data |
0/1 flag; ignored in R. |
bkg_sens_in |
0/1 flag; ignored in R (no sensitivity loop). |
## Gaussian assumption (threshMethod = 2)
For residuals (cPeak = 1): noise is zero-mean; sigma is estimated
from the negative half of C_peak, mirrored and pooled.
For ratios (cPeak = 2): noise is one-mean; values are shifted to
zero, mirrored, shifted back, and sigma estimated from the pooled set.
## GMM (threshMethod = 3)
MATLAB's GaussianMixture(X, 2, 2, false) is replicated by
gaussian_mixture_em(X) from utils_gaussian_mixture.R, using
the same first/last-point initialisation and loose convergence criterion as
the original Bowman CLUSTER EM. The noise component is identified as the
Gaussian with the smaller mean (matching MATLAB's
noiseIdx = find(mu == min(mu), 1)).
## Bin-lookup for threshold values
Percentile thresholds are mapped to the nearest bin in
possible (251 equally-spaced values spanning C_peak range).
The v2.0 bug fix is preserved: both sides of the abs() comparison
use the CHAR-unit threshold value thresh[i], not the raw percentile.
Named list char_thresh with elements:
Numeric vector of 251 candidate threshold bins.
Numeric matrix [N x 4]: positive threshold for each of the
four threshValues.
Numeric matrix [N x 4] (method 1) or [N x 1] (methods 2-3): negative threshold.
Estimated noise PDF evaluated at possible
(methods 2-3), or scalar -99 (method 1).
Fitted noise-component mean.
Fitted noise-component standard deviation.
Signal-to-noise index (scalar).
Sentinel vector (-999, length N).
[char_thresh_local()], [char_smooth()], [CharAnalysis()]
Determines a sliding-window threshold value for each sample, based on the
distribution of C_peak within a window centred on that sample. The noise
component within each window is modelled as a zero-mean Gaussian (residuals),
a one-mean Gaussian (ratios), or via a Gaussian mixture model (GMM).
Mirrors CharThreshLocal.m from the MATLAB v2.0 codebase.
char_thresh_local( charcoal, smoothing, peak_analysis, site = NULL, results = NULL, plot_data = 0L )char_thresh_local( charcoal, smoothing, peak_analysis, site = NULL, results = NULL, plot_data = 0L )
charcoal |
Named list with |
smoothing |
Named list with |
peak_analysis |
Named list with |
site |
Character string (site name; unused in R, kept for API symmetry). |
results |
Named list (unused in R, kept for API symmetry). |
plot_data |
0/1 flag; ignored in R (no diagnostic plots). |
## Window selection
The half-window is hw = round(0.5 * smoothing$yr / r) samples, where
r = mean(diff(charcoal$ybpI)) is the record resolution. The window
is shifted (not shrunk) at record boundaries, matching MATLAB's loop logic.
## NaN bridging within windows
NaN entries in charcoal$peak (from record gaps) are replaced with the
neutral value (0 for residuals, 1 for ratios) before distribution fitting.
This preserves window size while preventing gap samples from biasing the fit,
matching the v2.0 bug fix documented in CharThreshLocal.m.
## Small-sample fallback If fewer than 4 non-neutral samples exist in the window, a simple Gaussian is fitted (same as method 2) and the KS test is skipped.
## GMM (threshMethod = 3)
MATLAB's GaussianMixture(X, 2, 2, false) is replicated by
gaussian_mixture_em(X) from utils_gaussian_mixture.R. This
uses the same first/last-point initialisation and loose convergence
criterion () as the original Bowman CLUSTER
EM, substantially improving agreement with MATLAB threshold values compared
to mclust. The noise component is identified as the Gaussian with
the smaller mean. Poor-fit fallback (mu1 == mu2) re-fits with K = 3 and
takes the two components with the smallest means, mirroring the MATLAB v2.0
bug fix (the fallback passes the local window X, not the full record).
## KS goodness-of-fit
MATLAB evaluates the fitted normal CDF at 101 equally-spaced bin centres
and calls kstest() with a custom CDF table. R uses
stats::ks.test(noise, "pnorm", mu, sigma), which evaluates the
continuous CDF at each observation. P-values may differ by a small amount
for small samples; the statistic converges as sample size grows.
## Post-loop smoothing
After the per-sample loop, pos, neg, and SNI are
smoothed with char_lowess(span = smoothing$yr / r, iter = 0).
SNI is then clamped to .
Named list char_thresh with elements:
Numeric matrix [N x 4]: per-sample positive threshold for each
of the four threshValues, after Lowess smoothing.
Numeric matrix [N x 4]: per-sample negative threshold, after Lowess smoothing.
Numeric vector [N]: signal-to-noise index time series, Lowess-
smoothed and clamped to .
Numeric vector [N]: KS goodness-of-fit p-values per sample (NA where fewer than 4 noise samples exist).
[char_thresh_global()], [char_smooth()], [CharAnalysis()]
Checks all user-supplied parameters for internal consistency and plausible ranges. Throws a descriptive error and halts if any check fails. Emits warnings for non-fatal but unusual settings.
char_validate_params( char_data, pretreatment, smoothing, peak_analysis, results )char_validate_params( char_data, pretreatment, smoothing, peak_analysis, results )
char_data |
Numeric matrix (n x 6+). |
pretreatment |
List with |
smoothing |
List with |
peak_analysis |
List with |
results |
List (not checked; included for API symmetry). |
Mirrors CharValidateParams.m from the MATLAB v2.0 codebase. All 15
checks and 2 non-fatal warnings are reproduced in the same order.
NULL invisibly on success. Stops with an error on failure.
[char_parameters()], [CharAnalysis()]
Writes the char_results matrix (assembled by
[char_post_process()]) to a CSV file whose column headers and numeric
format match the MATLAB CharAnalysis v2.0 *_charResults.csv output
exactly.
char_write_results(char_results, site, out_dir, digits = 7L)char_write_results(char_results, site, out_dir, digits = 7L)
char_results |
Numeric matrix ( |
site |
Character string: site name, used to build the
output filename ( |
out_dir |
Directory in which to create the file. Required;
no default. Created if it does not exist. Use |
digits |
Number of significant digits for numeric output.
Default 7, matching MATLAB's |
## Column order and headers
Columns follow the MATLAB charResults layout:
cm Top_i (cm)
age Top_i (yr BP)
char Count_i (#)
char Vol_i (cm3)
char Con_i (# cm-3)
char Acc_i (# cm-2 yr-1)
charBkg (# cm-2 yr-1)
char Peak (# cm-2 yr-1)
thresh 1 (# cm-2 yr-1)
thresh 2 (# cm-2 yr-1)
thresh 3 (# cm-2 yr-1)
thresh FinalPos (# cm-2 yr-1)
thresh FinalNeg (# cm-2 yr-1)
SNI (index)
thresh GOF (p-val)
peaks 1
peaks 2
peaks 3
peaks Final
peaks Insig.
peak Mag (# cm-2 peak-1)
smPeak Frequ (peaks 1ka-1)
smFRIs (yr fire-1)
nFRIs (#)
mFRI (yr fire-1)
mFRI_uCI (yr fire-1)
mFRI_lCI (yr fire-1)
WBLb (yr)
WBLb_uCI (yr)
WBLb_lCI (yr)
WBLc (unitless)
WBLc_uCI (unitless)
WBLc_lCI (unitless)
## NA / empty handling
NA values are written as empty fields (no quotes), matching
MATLAB's blank-cell convention. This applies to:
smFRIs rows beyond the smoothed-FRI coverage window;
zone-statistics columns (24-33) for rows beyond the last zone;
any column not computed for a given run configuration.
## CI column convention
MATLAB stores bootstrap CIs as [quantile(2.5%), quantile(97.5%)]
in the columns labelled uCI / lCI respectively (i.e.
uCI = lower bound, lCI = upper bound – MATLAB's own
labelling is inverted). The R output follows the same convention so that
column indices are identical to the MATLAB reference file.
Invisibly returns the full path to the file written.
[char_post_process()], [CharAnalysis()]
# Run the pipeline on the bundled example and write results to tempdir: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) char_write_results(out$char_results, out$site, out_dir = tempdir())# Run the pipeline on the bundled example and write results to tempdir: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) char_write_results(out$char_results, out$site, out_dir = tempdir())
Top-level wrapper that calls each analytical stage in sequence and returns all intermediate and final results as a single named list.
CharAnalysis(file_name = NULL)CharAnalysis(file_name = NULL)
file_name |
Path to the |
Named list with the following elements:
List of raw and resampled series. After Phase 2:
includes accIS (smoothed background) and peak
(C_peak, either residuals or ratios). After Phase 3: also includes
charPeaks ( binary peak matrix),
charPeaksThresh, peaksTotal, and threshFRI.
After Phase 4: also includes peakInsig,
peakMagnitude, smoothedFireFrequ, peaksFrequ.
Pretreatment parameter list (possibly updated by
[char_pretreatment()] – e.g. yrInterp auto-set,
zoneDiv end-value corrected).
Smoothing parameter list.
Peak-analysis parameter list.
Results / output parameter list.
Character string: site name.
Integer matrix (nGaps x 2) of missing-value gap indices.
Threshold list returned by [char_thresh_global()] or
[char_thresh_local()]. Contains pos, neg, SNI,
GOF, and (after Phase 3) minCountP – the
matrix of Shuie-Bain p-values.
Post-processing list from [char_post_process()]: FRI
series, smoothed FRI curve, per-zone Weibull statistics, and the
assembled char_results output matrix ().
Numeric matrix () matching the
MATLAB charResults output exactly (alias of
post$char_results).
[char_parameters()], [char_validate_params()], [char_pretreatment()], [char_smooth()], [char_thresh_global()], [char_thresh_local()], [char_peak_id()], [char_post_process()]
# Run the full pipeline on the bundled example dataset: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) # Phase 2 outputs head(data.frame(ageTop_i = out$charcoal$ybpI, charAcc_i = out$charcoal$accI, charBkg_i = out$charcoal$accIS, charPeak_i = out$charcoal$peak)) # Phase 3 outputs sum(out$charcoal$charPeaks[, ncol(out$charcoal$charPeaks)])# Run the full pipeline on the bundled example dataset: params_file <- system.file("validation", "CO_charParams.csv", package = "CharAnalysis") out <- CharAnalysis(params_file) # Phase 2 outputs head(data.frame(ageTop_i = out$charcoal$ybpI, charAcc_i = out$charcoal$accI, charBkg_i = out$charcoal$accIS, charPeak_i = out$charcoal$peak)) # Phase 3 outputs sum(out$charcoal$charPeaks[, ncol(out$charcoal$charPeaks)])
Fits a K-component univariate Gaussian mixture using the same EM algorithm
bundled with CharAnalysis v1.1 and v2.0 (Bowman CLUSTER implementation).
Replicates three key behaviours that distinguish it from mclust:
gaussian_mixture_em(x, k = 2L)gaussian_mixture_em(x, k = 2L)
x |
Numeric vector of observations (the C_peak values in the window). |
k |
Integer number of components (default 2). |
First/last-point initialisation: for K = 2, component
means are seeded at the first and last elements of the input data
vector (not at sorted min/max; the data are in time order). This is
the initMixture() behaviour from MATLAB.
Loose convergence criterion: EM stops when the per-step
log-likelihood gain falls to or below
where
for univariate data. This is much looser than the mclust
default () and causes MATLAB to freeze closer to the
initial configuration.
Variance regularisation: a small floor
is added to each component
variance after every M-step, preventing degenerate (zero-variance)
solutions.
Because CharAnalysis always calls GaussianMixture(X, 2, 2, false)
(i.e.\ finalK = initK = 2), the MDLReduceOrder() path is
never exercised and is therefore not implemented here.
Named list:
Numeric vector length k: component means sorted
ascending.
Numeric vector length k: component standard
deviations (= sqrt of fitted variance), sorted to match mu.
Numeric vector length k: mixing proportions, sorted
to match mu.
Scalar: log-likelihood at convergence.
[char_thresh_local()], [char_thresh_global()]