Documentation

bootplot.base.bootplot(f: callable, data: ndarray | DataFrame, m: int = 100, k: float = 2.5, threshold: float = 0.3, output_size_px: Tuple[int, int] = (512, 512), output_image_path: str | Path | None = None, transformation: bool = True, output_animation_path: str | Path | None = None, sort_type: str = 'tsp', sort_kwargs: dict | None = None, decay: int = 0, animation_duration: float = 5.0, backend: Backend | str = 'matplotlib', verbose: bool = False) → ndarray

Create a bootstrapped plot or animation.

This function internally draws m bootstrap samples (with replacement) from the provided data; each sample has the same number of rows as the input. Each sample is plotted with the function handle f, and the resulting images are stored as numpy.ndarray objects. The output is a weighted sum of these images. If requested, this function can also create an animation in which the images are ordered according to sort_type; the animation is written to disk.
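The resampling-and-averaging idea can be illustrated with a minimal NumPy sketch. It assumes equal weights (the plain average used when transformation=False) and replaces plotting through f and matplotlib with a hypothetical toy_render function that marks points on a small grid. bootstrap_average and toy_render are illustrative names, not part of bootplot.

```python
import numpy as np
from typing import Callable


def bootstrap_average(data: np.ndarray,
                      render: Callable[[np.ndarray], np.ndarray],
                      m: int = 100, seed: int = 0) -> np.ndarray:
    """Render m bootstrap resamples of `data` and return their mean image."""
    rng = np.random.default_rng(seed)
    n = len(data)
    images = []
    for _ in range(m):
        # Draw n row indices with replacement: each resample has as many rows as the input.
        idx = rng.integers(0, n, size=n)
        images.append(render(data[idx]).astype(float))
    # Equal weights 1/m (assumption); bootplot may weight or transform differently.
    return np.mean(images, axis=0)


def toy_render(sample: np.ndarray, size: int = 8) -> np.ndarray:
    """Hypothetical renderer: mark the grid cell of each point in [0, 1)^2."""
    img = np.zeros((size, size))
    cols = np.clip((sample[:, 0] * size).astype(int), 0, size - 1)
    rows = np.clip((sample[:, 1] * size).astype(int), 0, size - 1)
    img[rows, cols] = 1.0
    return img


points = np.random.default_rng(1).random((20, 2))
avg = bootstrap_average(points, toy_render, m=50)
print(avg.shape, avg.min(), avg.max())
```

Cells that appear in many resamples end up close to 1, while cells covered only occasionally fade toward 0, which is how variability across resamples becomes visible in the combined image.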

Parameters:
  • f (callable) – function handle to perform the plotting. The handle should have the form f(data_subset, data_full, ax) where data_subset, data_full are numpy.ndarray or pandas.DataFrame objects and ax is a matplotlib.axes.Axes object.

  • data (numpy.ndarray or pandas.DataFrame) – data to be used in plotting.

  • m (int) – number of bootstrap resamples. Default: 100.

  • k (float) – parameter of the beta CDF transformation; controls the shape of the transformation. Default: 2.5.

  • threshold (float) – transformation parameter that controls the codomain of the transformation. Must lie between 0 and 0.5. Default: 0.3.

  • output_size_px (tuple[int, int]) – output size (height, width) in pixels. Default: (512, 512).

  • output_image_path (str or pathlib.Path) – path where the image should be stored. The image format is inferred from the filename extension. If None, the image is not stored. Default: None.

  • transformation (bool) – if True, the transformation is applied; otherwise, the images are simply averaged. Default: True.

  • output_animation_path (str or pathlib.Path) – path where the animation should be stored. The animation format is inferred from the filename extension. If None, the animation is not created. Default: None.

  • sort_type (str) – method to sort images when constructing the animation. Should be one of the following: “tsp” (traveling salesman method on the image similarity graph), “pca” (image projection onto the real line using PCA), “hm” (order by the horizontal component of the center of mass), “none” (no sorting; random order). Default: "tsp".

  • sort_kwargs (dict) – keyword arguments for the sorting method. If None, no keyword arguments are passed to the sorting method. See bootplot.sorting.sort_images for details. Default: None.

  • decay (int) – decay length when creating the animation. If 0, no decay is applied. Default: 0.

  • animation_duration (float) – desired output animation duration in seconds. Default: 5.0.

  • xlim (tuple[float, float]) – x axis limits representing the minimum and maximum. If a limit is None, the plot is unbounded horizontally and the user is warned. Default: (None, None).

  • ylim (tuple[float, float]) – y axis limits representing the minimum and maximum. If a limit is None, the plot is unbounded vertically and the user is warned. Default: (None, None).

  • verbose (bool) – if True, print progress messages. Default: False.

  • warn_limits (bool) – if True, warns the user when a limit is not specified. Default: True.

Returns:

bootstrapped plot.

Return type:

numpy.ndarray
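To build intuition for the k parameter, the sketch below maps intensities in [0, 1] through the CDF of a Beta distribution using scipy.stats.beta. It assumes a symmetric Beta(k, k) distribution, which is our assumption rather than something stated in this documentation, and it does not model how threshold restricts the output range. With k = 1 the curve is the identity; larger k gives a steeper S-shaped curve around 0.5.

```python
import numpy as np
from scipy.stats import beta


def beta_cdf_curve(x: np.ndarray, k: float) -> np.ndarray:
    """Map values in [0, 1] through the CDF of a symmetric Beta(k, k) distribution.

    Illustration only (assumption): bootplot's exact transformation, including
    the role of `threshold`, is not reproduced here.
    """
    return beta.cdf(np.clip(x, 0.0, 1.0), k, k)


xs = np.linspace(0.0, 1.0, 5)
for k in (1.0, 2.5, 5.0):
    # Larger k pushes values below 0.5 further down and values above 0.5 further up.
    print(k, np.round(beta_cdf_curve(xs, k), 3))
```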

Examples:

Consider the task of estimating the uncertainty of a regression model. In this example, we use a linear regression model to fit data drawn from a bivariate normal distribution. Instead of manually deriving and writing uncertainty estimation code, we only need to know how to plot our data.

We define a function that plots our data of interest and pass it to bootplot. In this case, we show a scatterplot of the entire dataset and a regression line based on the bootstrapped sample. We also provide axis limits to constrain our region of interest. bootplot generates the static image and saves it to disk. We can also continue to work with the returned image as a numpy.ndarray.

>>> import numpy as np
>>> from bootplot import bootplot
>>> from sklearn.linear_model import LinearRegression
>>> np.random.seed(0)
>>>
>>> def make_plot(data_subset, data_full, ax):
...     ax.scatter(data_full[:, 0], data_full[:, 1])
...     lr = LinearRegression()
...     lr.fit(data_subset[:, 0].reshape(-1, 1), data_subset[:, 1])
...     xs = np.linspace(-10, 10, 1000)
...     ax.plot(xs, lr.predict(xs.reshape(-1, 1)), c='r')
...     # Fix the axis limits so every bootstrap image covers the same region of interest.
...     ax.set_xlim(-10, 10)
...     ax.set_ylim(-5, 5)
>>>
>>> dataset = np.random.multivariate_normal(mean=[0, 0], cov=[[5, 1.5], [1.5, 1]], size=(25, ))
>>> dataset.shape
(25, 2)
>>> image = bootplot(make_plot, dataset, output_image_path='bootstrapped_linear_regression.png')
>>> image.shape
(512, 512, 3)

bootplot.sorting.sort_images(images: ndarray, sort_type: str = 'tsp', verbose: bool = False, working_size: Tuple[int, int] | None = None, **kwargs) → List[int]

Sort images with a specified sorting method.

Parameters:
  • images (numpy.ndarray with shape (n_images, n_rows, n_columns, n_channels)) – images to be sorted.

  • sort_type (str) – method to sort images when constructing the animation. Should be one of the following: “tsp” (traveling salesman method on the image similarity graph), “pca” (image projection onto the real line using PCA), “hm” (order by the horizontal component of the center of mass), “none” (no sorting; random order). Default: "tsp".

  • verbose (bool) – if True, print additional information during sorting. Default: False.

  • working_size (tuple[int, int]) – optional (height, width) tuple that determines working image size. Images are resized to this size before being sorted. If None, images are not resized. Default: None.

  • kwargs – keyword arguments for the sorting method.

Returns:

image indices in the final ordering.

Return type:

list[int]
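sort_images accepts multichannel images, while the sorter classes below operate on grayscale arrays. A hypothetical preprocessing step could look like the sketch below. The channel-average grayscale conversion and the nearest-neighbour resize used for working_size are assumptions, not necessarily what bootplot does internally; to_gray is an illustrative name.

```python
import numpy as np
from typing import Optional, Tuple


def to_gray(images: np.ndarray,
            working_size: Optional[Tuple[int, int]] = None) -> np.ndarray:
    """Convert (n, rows, cols, channels) images to (n, rows', cols') grayscale arrays."""
    # Assumed conversion: average over the channel axis.
    gray = images.astype(float).mean(axis=-1)
    if working_size is not None:
        h, w = working_size
        # Nearest-neighbour resize: pick evenly spaced source rows and columns.
        r = np.linspace(0, gray.shape[1] - 1, h).round().astype(int)
        c = np.linspace(0, gray.shape[2] - 1, w).round().astype(int)
        gray = gray[:, r][:, :, c]
    return gray


rng = np.random.default_rng(0)
images = rng.random((4, 32, 48, 3))
gray = to_gray(images, working_size=(16, 24))
print(gray.shape)
```

Working on a smaller grayscale copy keeps feature and distance computations cheap, while the returned indices still refer to the original full-size images.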

class bootplot.sorting.TravelingSalesmanSorter(verbose: bool = False)

Class that sorts images according to a Hamiltonian (traveling salesman) path in their similarity graph.

distance_matrix(gray_images: ndarray, features: str = 'full') → ndarray

Compute distances between all pairs of input images. The distances are computed on image representations chosen by the features parameter.

Parameters:
  • gray_images (numpy.ndarray with shape (n_images, n_rows, n_columns)) – input images.

  • features (str) – type of features to be used for distance computation. Must be one of “selected” (HOG, center of mass, per-component center of mass), “center_mass” (center of mass), “full” (flattened images). Default: "full".

Returns:

square pairwise distance matrix with shape (n_images, n_images).

Return type:

numpy.ndarray
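For the "full" option, a natural choice is the Euclidean distance between flattened images. The sketch below computes it in vectorized form with NumPy; the Euclidean metric and the function name full_feature_distances are assumptions for illustration.

```python
import numpy as np


def full_feature_distances(gray_images: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between flattened images, shape (n, n)."""
    x = gray_images.reshape(len(gray_images), -1).astype(float)
    sq = np.sum(x * x, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip tiny negatives caused by rounding.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    d = np.sqrt(d2)
    np.fill_diagonal(d, 0.0)  # an image is at distance 0 from itself
    return d


imgs = np.random.default_rng(0).random((5, 4, 3))
d = full_feature_distances(imgs)
print(np.round(d, 3))
```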

sort(gray_images: ndarray, features: str = 'full', **kwargs) → List[int]

Sort images by posing the task as an instance of the Traveling Salesman Problem (TSP). We first compute pairwise distances between images and treat the resulting matrix as the weight matrix of an undirected weighted graph. Solving TSP on this graph yields a Hamiltonian path (ideally, each vertex is visited exactly once), and the order in which the path visits the vertices is the order of the sorted images. Note that TSP is solved approximately, so it is possible, and in practice likely, that some images appear more than once in the ordering.

Parameters:
  • gray_images (numpy.ndarray with shape (n_images, n_rows, n_columns)) – input images.

  • features (str) – type of features to use when computing pairwise distances; see distance_matrix for the available options. Default: "full".

  • kwargs – unused.

Returns:

image indices in the final ordering.

Return type:

list[int]
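Exact TSP is expensive, so some approximation is needed. As an illustrative stand-in (not necessarily the approximation bootplot uses), the greedy nearest-neighbour heuristic below builds a path from a distance matrix by always moving to the closest unvisited image. Unlike the approximation described above, it never revisits an image, so it always returns a permutation of the indices; nearest_neighbour_path is a hypothetical name.

```python
import numpy as np
from typing import List


def nearest_neighbour_path(dist: np.ndarray, start: int = 0) -> List[int]:
    """Greedy Hamiltonian path: repeatedly step to the closest unvisited vertex."""
    n = len(dist)
    visited = np.zeros(n, dtype=bool)
    path = [start]
    visited[start] = True
    for _ in range(n - 1):
        # Mask visited vertices with +inf so argmin only picks unvisited ones.
        d = np.where(visited, np.inf, dist[path[-1]])
        nxt = int(np.argmin(d))
        path.append(nxt)
        visited[nxt] = True
    return path


# Toy example: "images" summarized by a 1D position, distance = absolute difference.
positions = np.array([0.0, 5.0, 1.0, 4.0, 2.0])
dist = np.abs(positions[:, None] - positions[None, :])
path = nearest_neighbour_path(dist)
print(path)
```

The greedy path is cheap to compute but can be noticeably longer than an optimal tour; it is shown only to make the "distance matrix to ordering" step concrete.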

class bootplot.sorting.PCASorter(verbose: bool = False)

Class that sorts images according to their PCA projections.

sort(gray_images: ndarray, features: str = 'center_mass', **kwargs) → List[int]

Sort images with a PCA-based method. We first compute image features, then project them onto the first principal component, which gives one value per image. The images are ordered by these projected values.

Parameters:
  • gray_images (numpy.ndarray with shape (n_images, n_rows, n_columns)) – input images.

  • features (str) – type of features to be used for PCA. Must be one of “selected” (HOG, center of mass, per-component center of mass), “center_mass” (center of mass), “full” (flattened images). Default: "center_mass".

  • kwargs – unused.

Returns:

image indices in the final ordering.

Return type:

list[int]
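Center-of-mass features followed by a 1D PCA projection can be sketched with NumPy alone, using an SVD of the centered feature matrix. center_of_mass and pca_order are hypothetical helpers; treating pixel intensities as mass is an assumption. Because the sign of a principal component is arbitrary, the resulting order may come out reversed.

```python
import numpy as np
from typing import List


def center_of_mass(gray_images: np.ndarray) -> np.ndarray:
    """Intensity-weighted (row, col) centroid of each image, shape (n, 2)."""
    n, h, w = gray_images.shape
    rows, cols = np.mgrid[0:h, 0:w]
    total = gray_images.sum(axis=(1, 2))
    total = np.where(total == 0, 1.0, total)  # avoid division by zero for blank images
    r = (gray_images * rows).sum(axis=(1, 2)) / total
    c = (gray_images * cols).sum(axis=(1, 2)) / total
    return np.stack([r, c], axis=1)


def pca_order(features: np.ndarray) -> List[int]:
    """Order samples by their projection onto the first principal component."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]  # 1D coordinate of each sample
    return np.argsort(scores).tolist()


# Toy images: a single bright pixel placed at different points on the diagonal.
imgs = np.zeros((4, 10, 10))
for i, p in enumerate([6, 1, 8, 3]):
    imgs[i, p, p] = 1.0
order = pca_order(center_of_mass(imgs))
print(order)
```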

class bootplot.sorting.HorizontalMassSorter(verbose: bool = False)

Class that sorts images according to the horizontal component of their center of mass.

sort(gray_images: ndarray, **kwargs) → List[int]

Sort images by the horizontal position of their center of mass. For each image, we compute its center of mass; images are then sorted by the horizontal (x) component of that center of mass.

Parameters:
  • gray_images (numpy.ndarray with shape (n_images, n_rows, n_columns)) – input images.

  • kwargs – unused.

Returns:

image indices in the final ordering.

Return type:

list[int]
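A minimal NumPy sketch of this ordering, treating pixel intensities as mass (an assumption); horizontal_mass_order is an illustrative name, not part of bootplot.

```python
import numpy as np
from typing import List


def horizontal_mass_order(gray_images: np.ndarray) -> List[int]:
    """Sort images by the x (column) coordinate of their intensity centroid."""
    n, h, w = gray_images.shape
    cols = np.arange(w)
    col_mass = gray_images.sum(axis=1)        # (n, w): total intensity per column
    total = col_mass.sum(axis=1)
    total = np.where(total == 0, 1.0, total)  # blank images get x = 0
    x = (col_mass * cols).sum(axis=1) / total
    # Stable sort keeps ties in their original order.
    return np.argsort(x, kind="stable").tolist()


# Toy images: one bright vertical line each, at columns 7, 2 and 5.
imgs = np.zeros((3, 6, 10))
imgs[0, :, 7] = 1.0
imgs[1, :, 2] = 1.0
imgs[2, :, 5] = 1.0
order = horizontal_mass_order(imgs)
print(order)
```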

class bootplot.sorting.DefaultSorter(verbose: bool = False)

Class that does not perform any sorting.

sort(images: ndarray, **kwargs) → List[int]

Does not sort images.

Parameters:
  • images (numpy.ndarray with shape (n_images, n_rows, n_columns)) – input images.

  • kwargs – unused.

Returns:

indices of images in the original order.

Return type:

list[int]