CCA

This class represents an estimator which performs canonical correlation analysis using xcast.canonical_correlation_analysis.

X and Y principal component time series and loadings, as well as CCA time series and loadings (for both X and Y) are calculated using singular value decomposition and saved on the fitted CCA class as xarray.DataArrays for easy visualization / post processing.

cca = xc.CCA(
  xmodes=(1, 5), 
  ymodes=(1, 5), 
  ccamodes=(1,5), 
  crossvalidation_splits='auto',
  probability_method='error_variance', 
  latitude_weighting=None,
  search_override=(None, None, None)
):

like canonical_correlation_analysis, CCA by default performs a comprehensive search across the three-dimensional space defined by the xmodes, ymodes, and ccamodes arguments, where each represents the minimum and maximum number of X-PCA/Y-PCA/CCA modes to retain, respectively.

If you want to explicitly pass the number of X-EOFs (PCs), Y-EOFS (PCs) and CC-Modes to retain, use the search_override argument, representing (X-Modes, Y-Modes, CCA-Modes) as a tuple of integers. Note that this must satisfy the condition CCA-Modes <= min(X-Modes, Y-Modes) otherwise you’ll get an assertion error. Read up on CCA for the mathematical justification of this condition.

Once instantiated, you need to fit cca on two numpy-arrays, x, and y:

cca.fit(x, y) 

After fitting, the principal component and CCA scores, loadings, and singular values will be available as NumPy arrays as attributes on the cca object, named as follows:

y_eof_scores           = cca.y_eof_scores             # PC time series for y
y_eof_loadings         = cca.y_eof_loadings           # EOFs/PC loadings for y
y_pct_variances        = cca.y_variance_explained     # Percent Variance explained by each CCA mode for y
y_pct_variances        = cca.y_variance_explained     # Percent Variance explained by each CCA mode for X
x_eof_scores           = cca.x_eof_scores             # PC time series for x
x_eof_loadings         = cca.x_eof_loadings           # EOFs/PC loadings for x
x_cca_loadings         = cca.x_cca_loadings           # CCA coefficients for the X PC time-series projected back onto the x-eof loadings 
x_cca_scores           = cca.x_cca_scores             # Time-series of CCA scores associated with X 
y_cca_loadings         = cca.y_cca_loadings           # CCA coefficients for the Y PC time-series projected back onto the y-eof loadings
y_cca_scores           = cca.y_cca_scores             # Time-series of CCA scores associated with Y
canonical_correlations = cca.canonical_correlations   # CCA values - canonical correlation between time series for each mode

you can then also make deterministic and probabilistic predictions for new data like X (X1, maybe) as follows:

deterministic_preds = cca.predict(X1)
tercile_probabilities = cca.predict_proba(X1) 
nonexceedance_30thquantile = cca.predict_proba(X1, quantile=0.3) 

Pro Tip: if the standard deviation of any gridpoint in any cross validation window in Y is too close to zero, an assertionerror will be thrown. to prevent this, first apply a drymask to your data.