align_chunks

align_chunks takes two compatible four-dimensional xarray.DataArray objects and rechunks each of them along its latitude and longitude dimensions so that the chunks of the two arrays match and computations over them can be parallelized.

X, Y = xc.align_chunks(X, Y, lat_chunks=5, lon_chunks=5, x_lat_dim=None, x_lon_dim=None, y_lat_dim=None, y_lon_dim=None, x_feature_dim=None, y_feature_dim=None, x_sample_dim=None, y_sample_dim=None)

where X and Y are the data arrays of interest, and lat_chunks and lon_chunks specify the number of chunks along the latitude and longitude dimensions, respectively. The remaining keyword arguments identify the latitude, longitude, sample, and feature dimensions of each array by name.
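For example, passing the dimension names explicitly might look like the sketch below; the names 'latitude', 'longitude', 'time', and 'member' are placeholders, so substitute whatever your own DataArrays actually use.

import xcast as xc

# Dimension names below are assumptions for illustration only.
X, Y = xc.align_chunks(
    X, Y, lat_chunks=5, lon_chunks=5,
    x_lat_dim='latitude', x_lon_dim='longitude', x_sample_dim='time', x_feature_dim='member',
    y_lat_dim='latitude', y_lon_dim='longitude', y_sample_dim='time', y_feature_dim='member')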

This function will convert in-memory data arrays to Dask arrays. This is not required, but it allows you to use dask.distributed.Client to distribute XCast computations across multiple subprocesses or a task scheduler like SLURM. If the data arrays you're working with are in-memory (i.e., da.data returns a NumPy array rather than a Dask array), then using a Dask client will not yield any parallelism or other benefit.
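A quick way to check whether a DataArray is Dask-backed is to inspect its .data or .chunks attributes; this is ordinary xarray/Dask usage, not part of the XCast API.

import dask.array as da
import xarray as xr

def is_lazy(arr: xr.DataArray) -> bool:
    """Return True if the DataArray is backed by a Dask array (out-of-memory)."""
    return isinstance(arr.data, da.Array)

# Equivalent quick check: arr.chunks is None for an in-memory (NumPy-backed) DataArray.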

Instead of using align_chunks, you could re-chunk manually with xarray.DataArray.chunk, or open your NetCDF files as out-of-memory Dask arrays by passing the chunks argument to xarray.open_dataset. The size and shape of the chunks must match exactly between X and Y for them to be compatible; you'll get a Dask error message if they do not.
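A sketch of the manual approach is shown below; the file name, variable name, and dimension names are placeholders. Note that xarray's chunks argument specifies the size of each chunk (number of elements per chunk), whereas align_chunks takes the number of chunks.

import xarray as xr

# Open lazily, with chunk sizes specified up front (placeholder names).
X = xr.open_dataset('predictors.nc', chunks={'latitude': 20, 'longitude': 20})['precip']

# ... or re-chunk an already-open DataArray so its chunk sizes match X's exactly.
Y = Y.chunk({'latitude': 20, 'longitude': 20})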

If you are using a Dask client to parallelize your workflow as below, remember to pass the rechunk=False keyword argument to your XCast functions and class methods; otherwise XCast will, by default, convert whatever you pass into a single-chunk Dask array and you'll get slower performance.

import xcast as xc
from dask.distributed import Client

X, Y = xc.align_chunks(X, Y, lat_chunks=5, lon_chunks=5, x_lat_dim=None, x_lon_dim=None, y_lat_dim=None, y_lon_dim=None, x_feature_dim=None, y_feature_dim=None, x_sample_dim=None, y_sample_dim=None)

with Client():
    ...  # some XCast functions here, called with rechunk=False
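To make the placeholder concrete, here is a minimal sketch of the pattern. The estimator name and its fit/predict signatures are illustrative assumptions about a typical XCast workflow rather than a prescription; the important detail is passing rechunk=False so the chunking produced by align_chunks is preserved.

import xcast as xc
from dask.distributed import Client

X, Y = xc.align_chunks(X, Y, lat_chunks=5, lon_chunks=5)

with Client():
    # 'ELM' is used here only as an example estimator name; any XCast
    # estimator or function that accepts rechunk=False follows the same pattern.
    model = xc.ELM()
    model.fit(X, Y, rechunk=False)        # keep the chunking from align_chunks
    preds = model.predict(X, rechunk=False)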