Samples
Module contents
- class Samples(input_array: Float[ndarray, 'n_samples batch y_dim'])
Bases:
object
A wrapper class for the output of Treeffuser: samples drawn from the conditional distribution p(y|x). It provides convenience methods for computing summary statistics from those samples.
- Parameters:
input_array (np.ndarray) – An array containing samples with shape (n_samples, batch, y_dim).
- n_samples
Number of samples.
- Type:
int
- batch
Batch size.
- Type:
int
- y_dim
Dimension of the response variable.
- Type:
int
- shape
Shape of the samples array.
- Type:
tuple
- ndim
Number of dimensions of the samples array.
- Type:
int
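The attributes above follow directly from the layout of the input array. The sketch below uses plain NumPy only; the random array is a hypothetical stand-in for Treeffuser output, and nothing here calls the `Samples` class itself.

```python
import numpy as np

# Hypothetical stand-in for Treeffuser output: 100 samples for each of
# 4 inputs x, with a 2-dimensional response.
rng = np.random.default_rng(0)
input_array = rng.normal(size=(100, 4, 2))

# The three axes are (n_samples, batch, y_dim), in that order.
n_samples, batch, y_dim = input_array.shape
shape = input_array.shape
ndim = input_array.ndim

print(n_samples, batch, y_dim, shape, ndim)
```

Every statistic documented below reduces over the first axis (the samples), leaving one value per input x and response dimension.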
- sample_apply(fun: Callable[[ndarray], ndarray]) → Float[ndarray, 'batch y_dim']
Apply a function to the samples for each x.
- Parameters:
fun (callable) – A function to apply to each sample. The function should take a numpy array of shape (n_samples,) and return a numpy array of the same shape.
- Returns:
result – The result of applying the function to each row of the samples.
- Return type:
np.ndarray
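As a rough illustration of applying a function over the samples for each x, the sketch below uses `np.apply_along_axis` on a raw array. It assumes the function reduces each `(n_samples,)` vector to a scalar, which is what the `(batch, y_dim)` return annotation suggests; `interquartile_range` is a hypothetical helper, and this is not necessarily how `sample_apply` is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
samples_array = rng.normal(size=(200, 3, 2))  # (n_samples, batch, y_dim)


def interquartile_range(v: np.ndarray) -> float:
    """Hypothetical reduction: takes a 1-D array of shape (n_samples,)."""
    q75, q25 = np.quantile(v, [0.75, 0.25])
    return q75 - q25


# Apply the function along the sample axis, once per (x, response dimension) pair.
result = np.apply_along_axis(interquartile_range, 0, samples_array)
print(result.shape)
```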
- sample_confidence_interval(confidence: float = 0.95) → Float[ndarray, '2 batch y_dim']
Estimate the confidence interval of the samples for each x using the empirical quantiles of the samples.
- Parameters:
confidence (float) – The confidence level for the interval.
- Returns:
ci – The confidence interval of the samples for each x.
- Return type:
np.ndarray
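An interval built from empirical quantiles can be sketched in NumPy as follows. The equal-tailed choice (cutting `(1 - confidence) / 2` from each side) and the lower-then-upper ordering along the first axis are assumptions; the documentation above states only that empirical quantiles are used.

```python
import numpy as np

rng = np.random.default_rng(0)
samples_array = rng.normal(size=(1000, 3, 1))  # (n_samples, batch, y_dim)

confidence = 0.95
alpha = 1.0 - confidence

# Assumption: equal-tailed interval, alpha / 2 of the mass cut from each tail.
ci = np.quantile(samples_array, [alpha / 2, 1.0 - alpha / 2], axis=0)
lower, upper = ci  # each has shape (batch, y_dim)

# Fraction of samples falling inside the interval, per x.
coverage = ((samples_array >= lower) & (samples_array <= upper)).mean(axis=0)
print(ci.shape, coverage.ravel())
```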
- sample_correlation() → Float[ndarray, 'batch y_dim y_dim']
Compute the correlation matrix of the samples for each x. Estimate: corr[Y | X = x] for each x.
- Returns:
correlation – The correlation matrix of the samples for each x.
- Return type:
np.ndarray
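A per-x correlation matrix can be illustrated with `np.corrcoef`, computing one `(y_dim, y_dim)` matrix for each input. This is a minimal sketch on a raw array, not a claim about the method's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
samples_array = rng.normal(size=(500, 3, 2))  # (n_samples, batch, y_dim)

# For each x, rows are samples and columns are response dimensions,
# so rowvar=False treats each column as a variable.
correlation = np.stack(
    [
        np.corrcoef(samples_array[:, b, :], rowvar=False)
        for b in range(samples_array.shape[1])
    ]
)
print(correlation.shape)
```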
- sample_kde(bandwidth: float | Literal['scott', 'silverman'] = 1.0, verbose: bool = False) → List[KernelDensity]
Compute the Kernel Density Estimate (KDE) for each x. Estimate: KDE[Y | X = x] for each x using Gaussian kernels from sklearn.neighbors.
- Parameters:
bandwidth (float or {'scott', 'silverman'}, default=1.0) – The bandwidth of the kernel, given either as a scalar value or as one of the following strings:
- 'scott': Scott's rule of thumb.
- 'silverman': Silverman's rule of thumb.
verbose (bool, default=False) – Whether to display progress bars.
- Returns:
kdes – A list of KernelDensity objects, one for each x.
- Return type:
list of KernelDensity
- sample_max() → Float[ndarray, 'batch y_dim']
Compute the maximum of the samples for each x. Estimate: max[Y | X = x] for each x. Equivalent to np.max(samples.to_numpy(), axis=0).
- Returns:
max – The maximum of the samples for each x.
- Return type:
np.ndarray
- sample_mean() → Float[ndarray, 'batch y_dim']
Compute the mean of the samples for each x. Estimate: E[Y | X = x] for each x. Equivalent to np.mean(samples.to_numpy(), axis=0).
- Returns:
mean – The mean of the samples for each x.
- Return type:
np.ndarray
- sample_median() → Float[ndarray, 'batch y_dim']
Compute the median of the samples for each x. Estimate: median[Y | X = x] for each x. Equivalent to np.median(samples.to_numpy(), axis=0).
- Returns:
median – The median of the samples for each x.
- Return type:
np.ndarray
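The NumPy equivalences stated for `sample_mean` and `sample_median` can be checked on a small hand-made array. The values below are hypothetical; the first input includes an outlier to show how the two estimates differ.

```python
import numpy as np

# Hypothetical samples: 5 draws for each of 2 inputs x, scalar response.
# Column 0 holds the draws for the first x, column 1 for the second.
samples_array = np.array(
    [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0], [100.0, 1.0]]
)[:, :, None]  # shape (n_samples=5, batch=2, y_dim=1)

mean = np.mean(samples_array, axis=0)      # shape (batch, y_dim)
median = np.median(samples_array, axis=0)  # shape (batch, y_dim)

print(mean.ravel())    # first x: 22.0, second x: 0.6
print(median.ravel())  # first x: 3.0, second x: 1.0
```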
- sample_min() → Float[ndarray, 'batch y_dim']
Compute the minimum of the samples for each x. Estimate: min[Y | X = x] for each x. Equivalent to np.min(samples.to_numpy(), axis=0).
- Returns:
min – The minimum of the samples for each x.
- Return type:
np.ndarray
- sample_mode(bandwidth: float | Literal['scott', 'silverman'] = 1.0, verbose: bool = False) → Float[ndarray, 'batch']
Compute the mode of the samples for each x. Estimate: mode[Y | X = x] for each x using Kernel Density Estimation.
- Parameters:
bandwidth (float or {'scott', 'silverman'}, default=1.0) – The bandwidth of the kernel, given either as a scalar value or as one of the following strings:
- 'scott': Scott's rule of thumb.
- 'silverman': Silverman's rule of thumb.
verbose (bool, default=False) – Whether to display progress bars.
Notes
The mode is computed via grid search on the Kernel Density Estimate (KDE). The step size of the grid is set to the larger of twice the number of batches and 1,000.
- Returns:
mode – The mode of the samples for each x.
- Return type:
np.ndarray
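The idea of locating a mode by grid search over a KDE can be sketched without any KDE library. The example below hand-rolls a Gaussian kernel in NumPy in place of `sklearn.neighbors.KernelDensity`; the helper `kde_mode`, its fixed grid of 1,000 points spanning the sample range, and the univariate data are all assumptions made for illustration, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical univariate samples for 2 inputs x: (n_samples, batch, y_dim=1).
samples_array = np.stack(
    [rng.normal(-2.0, 0.5, size=2000), rng.normal(3.0, 0.5, size=2000)], axis=1
)[:, :, None]


def kde_mode(v: np.ndarray, bandwidth: float = 1.0, n_grid: int = 1000) -> float:
    """Grid-search the maximizer of a Gaussian KDE built on the 1-D samples v."""
    grid = np.linspace(v.min(), v.max(), n_grid)
    # Unnormalized Gaussian kernel density at each grid point; the constant
    # normalizing factor does not change the location of the maximum.
    z = (grid[:, None] - v[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)
    return float(grid[np.argmax(density)])


mode = np.array(
    [kde_mode(samples_array[:, b, 0]) for b in range(samples_array.shape[1])]
)
print(mode.shape, mode)
```

A finer grid gives a more precise mode at higher cost, and the bandwidth controls how much the density is smoothed before the maximum is taken.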
- sample_quantile(q: float | List[float]) → Float[ndarray, 'q_dim batch y_dim']
Compute the quantiles of the samples for each x. Estimate: q-th quantile[Y | X = x] for each x. Equivalent to np.quantile(samples.to_numpy(), q, axis=0).
- Parameters:
q (float or list[float]) – Quantile or sequence of quantiles to compute.
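The stated equivalence with `np.quantile` can be illustrated on a raw array. With a list of quantiles, NumPy places the quantile axis first, matching the `(q_dim, batch, y_dim)` annotation. The sketch shows plain NumPy behavior only and makes no claim about how the method shapes its output for a scalar `q`.

```python
import numpy as np

# Hypothetical samples: the values 0, 1, ..., 100 for a single x, scalar response.
samples_array = np.arange(101, dtype=float).reshape(101, 1, 1)

q = [0.1, 0.5, 0.9]
quantiles = np.quantile(samples_array, q, axis=0)  # shape (q_dim, batch, y_dim)

print(quantiles.shape)
print(quantiles.ravel())  # 10.0, 50.0, 90.0
```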
- sample_range() → Float[ndarray, 'batch 2']
Compute the range of the samples for each x using the empirical minimum and maximum of the samples, np.min(samples.to_numpy(), axis=0) and np.max(samples.to_numpy(), axis=0).
- Returns:
range – The range of the samples for each x.
- Return type:
np.ndarray
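The range can be sketched by pairing the empirical minimum and maximum. The example assumes a scalar response (`y_dim == 1`) so that the result has shape `(batch, 2)`, and the min-then-max column order is likewise an assumption.

```python
import numpy as np

# Hypothetical samples: 3 draws for each of 2 inputs x, scalar response.
samples_array = np.array([[1.0, 10.0], [5.0, 12.0], [3.0, 8.0]])[:, :, None]

minimum = np.min(samples_array, axis=0)  # shape (batch, 1)
maximum = np.max(samples_array, axis=0)  # shape (batch, 1)

# Assumption: y_dim == 1, so joining on the last axis yields (batch, 2).
sample_range = np.concatenate([minimum, maximum], axis=-1)
print(sample_range)  # rows: [1, 5] and [8, 12]
```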
- sample_std(ddof: int = 0) → Float[ndarray, 'batch y_dim']
Compute the standard deviation of the samples for each x. Estimate: std[Y | X = x] for each x. Equivalent to np.std(samples.to_numpy(), axis=0, ddof=ddof).
- Parameters:
ddof (int) – Delta Degrees of Freedom. The divisor used in the calculation is N - ddof, where N represents the number of elements.
- Returns:
std – The standard deviation of the samples for each x.
- Return type:
np.ndarray
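The effect of `ddof` is easiest to see on a small array, using the `np.std` equivalence stated above. The values are hypothetical and chosen so that the population standard deviation is exactly 2.

```python
import numpy as np

# Hypothetical samples: 8 draws for a single x, scalar response.
samples_array = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]).reshape(8, 1, 1)

std_population = np.std(samples_array, axis=0, ddof=0)  # divisor N = 8
std_sample = np.std(samples_array, axis=0, ddof=1)      # divisor N - 1 = 7

print(std_population.ravel(), std_sample.ravel())
```

With `ddof=0` the divisor is N; `ddof=1` divides by N - 1 and so gives a slightly larger estimate.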