PCR
This class represents an estimator which performs Principal Components Regression using xcast.linear_regression and xcast.EOF in concert.
The principal component time series and loadings of X are calculated using singular value decomposition and saved on the fitted PCR object as xarray.DataArrays for easy visualization and post-processing. A linear_regression is then fit between the PC time series and each individual gridpoint of Y.
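The two steps described above can be sketched in plain NumPy. This is only an illustration of the general technique under simplifying assumptions (synthetic 2-D arrays, no latitude weighting, no cross validation, ordinary least squares with an intercept); it is not xcast's implementation, and every name in it (`scores`, `loadings`, `predict`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_gridpoints, n_modes = 40, 12, 6, 3

# Synthetic predictors X (samples x features) and predictands Y (samples x gridpoints).
X = rng.normal(size=(n_samples, n_features))
true_coefs = rng.normal(size=(n_features, n_gridpoints))
Y = X @ true_coefs + 0.1 * rng.normal(size=(n_samples, n_gridpoints))

# Step 1: principal components of X via SVD of the centered data.
x_mean = X.mean(axis=0)
Xc = X - x_mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :n_modes] * s[:n_modes]            # PC time series, (n_samples, n_modes)
loadings = Vt[:n_modes]                          # EOFs / loadings, (n_modes, n_features)
variance_explained = 100 * s**2 / np.sum(s**2)   # percent of variance per mode

# Step 2: least-squares regression of every Y gridpoint on the PC scores.
design = np.column_stack([np.ones(n_samples), scores])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)  # (n_modes + 1, n_gridpoints)

def predict(X_new):
    """Project new data onto the fitted loadings, then apply the regression."""
    new_scores = (X_new - x_mean) @ loadings.T
    return np.column_stack([np.ones(len(X_new)), new_scores]) @ coefs

preds = predict(X)
```

Solving all gridpoints in one `lstsq` call is equivalent to fitting an independent regression per gridpoint, because each column of Y is solved against the same design matrix.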
import xcast as xc

pcr = xc.PCR(
eof_modes=None, # integer number of EOF modes to use; if None, use min(n_samples, m_features)
latitude_weighting=False, # whether to apply latitude weighting
separate_members=True, # if True, calculate principal components of the X features separately; if False, jointly (stacking them all as equally weighted features)
crossvalidation_splits=5, # number of splits to use for K-Fold cross validation
)
Once instantiated, pcr must be fit on two numpy arrays, x and y:
pcr.fit(x, y)
After fitting, the principal component scores, loadings, and explained variance are available as xarray.DataArray attributes on the pcr object, named as follows:
x_eof_scores = pcr.eof_scores # PC time series for x
x_eof_loadings = pcr.eof_loadings # EOFs/PC loadings for x
x_eof_variance_explained = pcr.eof_variance_explained # percent of variance explained by each X EOF mode
You can then make deterministic and probabilistic predictions for new data shaped like X (X1, say) as follows:
deterministic_preds = pcr.predict(X1)
tercile_probabilities = pcr.predict_proba(X1)
nonexceedance_30thquantile = pcr.predict_proba(X1, quantile=0.3)
Pro Tip: if the standard deviation of any gridpoint in any cross-validation window in X is too close to zero, an AssertionError will be raised. To prevent this, first apply a drymask to your data.
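To illustrate the idea behind such a mask, here is a minimal NumPy sketch that flags gridpoints whose standard deviation across samples is near zero and replaces them with NaN. It is not xcast's drymask function; the helper `low_variance_mask`, its `threshold` value, and the (sample, lat, lon) array layout are all assumptions made for the example.

```python
import numpy as np

def low_variance_mask(data, threshold=1e-6):
    """Hypothetical helper: True where a gridpoint's std over axis 0 exceeds threshold."""
    return np.nanstd(data, axis=0) > threshold

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=1.0, size=(30, 4, 5))  # (sample, lat, lon)
data[:, 0, 0] = 0.0                                      # a permanently dry gridpoint

keep = low_variance_mask(data)          # boolean (lat, lon) mask
masked = np.where(keep, data, np.nan)   # constant gridpoints become NaN
```

Note that this check uses the full sample. A gridpoint can pass it and still be constant within a single cross-validation window, so a stricter threshold or a per-window check may be needed.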