Samples

Module contents

class Samples(input_array: Float[ndarray, 'n_samples batch y_dim'])

Bases: object

A wrapper class for the output of Treeffuser, the samples from the conditional distribution p(y|x). It provides convenient methods to compute various statistics from the samples.

Parameters:

input_array (np.ndarray) – An array containing samples with shape (n_samples, batch, y_dim).

n_samples

Number of samples.

Type:

int

batch

Batch size.

Type:

int

y_dim

Dimension of the response variable.

Type:

int

shape

Shape of the samples array.

Type:

tuple

ndim

Number of dimensions of the samples array.

Type:

int
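
As a rough illustration of the (n_samples, batch, y_dim) layout described above, the sketch below builds a plain NumPy array with that shape. The random data and the variable names are placeholders, not Treeffuser output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch, y_dim = 50, 4, 2

# Stand-in for model output: n_samples draws of a y_dim-dimensional
# response for each of the `batch` inputs x.
arr = rng.normal(size=(n_samples, batch, y_dim))

print(arr.shape)  # (50, 4, 2)
print(arr.ndim)   # 3

# All draws for the first input x: one row per sample.
draws_for_first_x = arr[:, 0, :]
print(draws_for_first_x.shape)  # (50, 2)
```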

sample_apply(fun: Callable[[ndarray], ndarray]) → Float[ndarray, 'batch y_dim']

Apply a function to the samples for each x.

Parameters:

fun (callable) – A function to apply to each sample. The function should take a numpy array of shape (n_samples,) and return a numpy array of the same shape.

Returns:

result – The result of applying the function to each row of the samples.

Return type:

np.ndarray
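
The sketch below shows one way a per-x function application could look on a plain NumPy array. It assumes the function reduces each length-n_samples vector to a scalar, which is what the annotated return shape (batch, y_dim) suggests, and it uses np.apply_along_axis as a stand-in rather than the library's own implementation. The helper name iqr is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(100, 4, 2))  # (n_samples, batch, y_dim)

def iqr(v):
    """Interquartile range of a 1-D vector of samples (hypothetical helper)."""
    return np.quantile(v, 0.75) - np.quantile(v, 0.25)

# Apply the function along the sample axis, once per (batch, y_dim) entry.
result = np.apply_along_axis(iqr, 0, arr)
print(result.shape)  # (4, 2)
```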

sample_confidence_interval(confidence: float = 0.95) → Float[ndarray, '2 batch y_dim']

Estimate the confidence interval of the samples for each x using the empirical quantiles of the samples.

Parameters:

confidence (float) – The confidence level for the interval.

Returns:

ci – The confidence interval of the samples for each x.

Return type:

np.ndarray
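
A minimal sketch of an interval built from empirical quantiles, on a plain NumPy array. It assumes a central interval that cuts (1 - confidence) / 2 from each tail; whether the method splits the tails exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(1000, 3, 2))  # (n_samples, batch, y_dim)

confidence = 0.95
alpha = 1.0 - confidence

# Assumption: central interval, alpha / 2 removed from each tail.
ci = np.quantile(arr, [alpha / 2, 1 - alpha / 2], axis=0)
lower, upper = ci

print(ci.shape)  # (2, 3, 2)
```

The leading axis of length 2 holds the lower and upper bounds, matching the annotated return shape.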

sample_correlation() → Float[ndarray, 'batch y_dim y_dim']

Compute the correlation matrix of the samples for each x. Estimate: corr[Y | X = x] for each x.

Returns:

correlation – The correlation matrix of the samples for each x.

Return type:

np.ndarray
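
To make the (batch, y_dim, y_dim) output shape concrete, the sketch below computes one correlation matrix per x with np.corrcoef on a plain NumPy array. This is an illustration under the stated shape convention, not necessarily how the method is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(500, 3, 2))  # (n_samples, batch, y_dim)
# Make the second response depend on the first so the
# off-diagonal entries are clearly non-zero.
arr[..., 1] += arr[..., 0]

n_samples, batch, y_dim = arr.shape

# One (y_dim, y_dim) matrix per x; rowvar=False treats columns as variables.
corr = np.stack(
    [np.corrcoef(arr[:, b, :], rowvar=False) for b in range(batch)]
)
print(corr.shape)  # (3, 2, 2)
```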

sample_kde(bandwidth: float | Literal['scott', 'silverman'] = 1.0, verbose: bool = False) → List[KernelDensity]

Compute the Kernel Density Estimate (KDE) for each x. Estimate: KDE[Y | X = x] for each x using Gaussian kernels from sklearn.neighbors.

Parameters:
  • bandwidth (float or {'scott', 'silverman'}, default=1.0) – The bandwidth of the kernel. Bandwidth can be specified as a scalar value or as a string:
      - ‘scott’: Scott’s rule of thumb.
      - ‘silverman’: Silverman’s rule of thumb.

  • verbose (bool, default=False) – Whether to display progress bars.

Returns:

kdes – A list of KernelDensity objects, one for each x.

Return type:

list of KernelDensity
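
The following sketch fits one Gaussian KernelDensity per x directly with scikit-learn, to show what a list of fitted estimators looks like and how to query one. The random data and the evaluation grid are placeholders, and the loop is an assumption about the general approach, not the method's actual code.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
arr = rng.normal(size=(200, 3, 1))  # (n_samples, batch, y_dim)
n_samples, batch, y_dim = arr.shape

# One estimator per x, each fitted on an (n_samples, y_dim) slice.
kdes = [
    KernelDensity(kernel="gaussian", bandwidth=1.0).fit(arr[:, b, :])
    for b in range(batch)
]

# score_samples returns the log-density at each query point.
grid = np.linspace(-3, 3, 7).reshape(-1, 1)
log_density = kdes[0].score_samples(grid)
print(len(kdes), log_density.shape)
```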

sample_max() → Float[ndarray, 'batch y_dim']

Compute the maximum of the samples for each x. Estimate: max[Y | X = x] for each x. Equivalent to np.max(samples.to_numpy(), axis=0).

Returns:

max – The maximum of the samples for each x.

Return type:

np.ndarray

sample_mean() → Float[ndarray, 'batch y_dim']

Compute the mean of the samples for each x. Estimate: E[Y | X = x] for each x. Equivalent to np.mean(samples.to_numpy(), axis=0).

Returns:

mean – The mean of the samples for each x.

Return type:

np.ndarray

sample_median() → Float[ndarray, 'batch y_dim']

Compute the median of the samples for each x. Estimate: median[Y | X = x] for each x. Equivalent to np.median(samples.to_numpy(), axis=0).

Returns:

median – The median of the samples for each x.

Return type:

np.ndarray

sample_min() → Float[ndarray, 'batch y_dim']

Compute the minimum of the samples for each x. Estimate: min[Y | X = x] for each x. Equivalent to np.min(samples.to_numpy(), axis=0).

Returns:

min – The minimum of the samples for each x.

Return type:

np.ndarray
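
The max, mean, median and min reductions are each documented as equivalent to a NumPy call over axis 0. The sketch below runs those four calls on a plain array standing in for samples.to_numpy(), so the resulting (batch, y_dim) shapes are visible.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(100, 4, 2))  # stand-in for samples.to_numpy()

# Each reduction collapses the sample axis (axis 0).
stats = {
    "max": np.max(arr, axis=0),
    "mean": np.mean(arr, axis=0),
    "median": np.median(arr, axis=0),
    "min": np.min(arr, axis=0),
}

for name, value in stats.items():
    print(name, value.shape)  # every shape is (4, 2)
```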

sample_mode(bandwidth: float | Literal['scott', 'silverman'] = 1.0, verbose: bool = False) → Float[ndarray, 'batch']

Compute the mode of the samples for each x. Estimate: mode[Y | X = x] for each x using Kernel Density Estimation.

Parameters:
  • bandwidth (float or {'scott', 'silverman'}, default=1.0) – The bandwidth of the kernel. Bandwidth can be specified as a scalar value or as a string:
      - ‘scott’: Scott’s rule of thumb.
      - ‘silverman’: Silverman’s rule of thumb.

  • verbose (bool, default=False) – Whether to display progress bars.

Notes

The mode is computed via grid search on the Kernel Density Estimate (KDE). The step size of the grid is set to the maximum of twice the number of batches and 1,000.

Returns:

mode – The mode of the samples for each x.

Return type:

np.ndarray
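
The grid-search idea from the notes can be sketched as follows for a one-dimensional response. This uses a hand-written Gaussian kernel in NumPy as a stand-in for the sklearn estimator, and the helper name gaussian_kde_on_grid, the grid bounds and the grid size of 1,000 points are illustrative assumptions.

```python
import numpy as np

def gaussian_kde_on_grid(draws, grid, bandwidth):
    """Evaluate a 1-D Gaussian KDE of `draws` at each point of `grid`."""
    z = (grid[:, None] - draws[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernel.mean(axis=1)

rng = np.random.default_rng(0)
true_centers = np.array([-2.0, 0.0, 3.0])
# (n_samples, batch, y_dim) with y_dim = 1; each x has its own center.
arr = rng.normal(size=(500, 3, 1)) + true_centers[None, :, None]

n_samples, batch, _ = arr.shape
modes = np.empty(batch)
for b in range(batch):
    draws = arr[:, b, 0]
    # Assumed grid: 1,000 evenly spaced points spanning the observed draws.
    grid = np.linspace(draws.min(), draws.max(), 1000)
    density = gaussian_kde_on_grid(draws, grid, bandwidth=1.0)
    # The mode estimate is the grid point with the highest density.
    modes[b] = grid[np.argmax(density)]

print(modes.shape)  # (3,)
```

A finer grid gives a more precise argmax at the cost of more density evaluations per x.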

sample_quantile(q: float | List[float]) → Float[ndarray, 'q_dim batch y_dim']

Compute the quantiles of the samples for each x. Estimate: q-th quantile[Y | X = x] for each x. Equivalent to np.quantile(samples.to_numpy(), q, axis=0).

Parameters:

q (float or list[float]) – Quantile or sequence of quantiles to compute.
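
The stated NumPy equivalent can be tried directly on a plain array standing in for samples.to_numpy(). The comments about output shapes describe np.quantile itself; how the method shapes its result for a scalar q is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(100, 4, 2))  # stand-in for samples.to_numpy()

# A list of quantiles adds a leading axis of length len(q).
q = [0.1, 0.5, 0.9]
quantiles = np.quantile(arr, q, axis=0)
print(quantiles.shape)  # (3, 4, 2)

# With a scalar q, plain np.quantile returns no leading axis.
single = np.quantile(arr, 0.5, axis=0)
print(single.shape)  # (4, 2)
```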

sample_range() → Float[ndarray, 'batch 2']

Compute the range of the samples for each x using the empirical minimum and maximum of the samples, np.min(samples.to_numpy(), axis=0) and np.max(samples.to_numpy(), axis=0).

Returns:

range – The range of the samples for each x.

Return type:

np.ndarray

sample_std(ddof: int = 0) → Float[ndarray, 'batch y_dim']

Compute the standard deviation of the samples for each x. Estimate: std[Y | X = x] for each x. Equivalent to np.std(samples.to_numpy(), axis=0, ddof=ddof).

Parameters:

ddof (int) – Delta Degrees of Freedom. The divisor used in the calculation is N - ddof, where N represents the number of elements.

Returns:

std – The standard deviation of the samples for each x.

Return type:

np.ndarray
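
To illustrate the effect of ddof, the sketch below compares the two common choices using the stated NumPy equivalent on a plain array standing in for samples.to_numpy(). With N samples, the ddof=1 estimate is the ddof=0 estimate scaled by sqrt(N / (N - 1)).

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(10, 4, 2))  # stand-in for samples.to_numpy()
n = arr.shape[0]

std_population = np.std(arr, axis=0, ddof=0)  # divisor N
std_sample = np.std(arr, axis=0, ddof=1)      # divisor N - 1

# The two estimates differ by a constant factor that shrinks as N grows.
ratio = std_sample / std_population
print(np.allclose(ratio, np.sqrt(n / (n - 1))))  # True
```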

to_numpy() → Float[ndarray, 'n_samples batch y_dim']

Return the samples as a numpy array.

Returns:

samples – The numpy array of the samples.

Return type:

np.ndarray