pyppin.math.histogram


A flexible class for managing, computing with, and plotting histograms.

Here is an example ASCII plot of the histogram used in the pyppin.iterators.sample unit test:

    |
    |
    |
    |
0.0 +
    |                                                         #
    |                                                         ##
    |                                                         ##
    |                                                         ##
0.0 +                                                         ###
    |                                                       # ###
    |                                                       # ###
    |                                                       # ### #
    |                                                       # #####
0.0 +                                                      ## ######
    |                                                      ## ######
    |                                                      ## ######
    |                                                      ## ######
    |                                                      ## ######
0.0 +                                                    ###########
    |                                                    ############
    |                                                    #############
    |                                                   ###############
    |                                                   ###############
0.0 +                                                   ###############
    |                                                  ################
    |                                                  ################
    |                                                  ################
    |                                                  ################
0.0 +                                                 #################
    |                                                 ###################
    |                                                 ####################
    |                                               # ####################
    |                                               # ####################
0.0 +                                             # # ##################### #
    |                                             # ####################### #
    |                                           # # #########################
    ++------+------+------+------+------+------+------+------+------+------+----
     0.0    12.1   24.2   36.3   48.4   60.5   72.6   84.8   96.9   109.0  121.1

Classes

Bucketing([max_linear_value, linear_steps, ...])

Define the shapes of the buckets that we will use for the histogram.

Histogram([bucketing])

class pyppin.math.histogram.Histogram(bucketing: Optional[Bucketing] = None)

Bases: object

add(value: float, count: int = 1) → None

Add a value to the histogram.

Parameters
  • value – The value to add.

  • count – The number of times to add this value.

combine(other: Histogram) → None

Add another histogram to this histogram.

property min: float

The minimum value found in this histogram.

property max: float

The maximum value found in this histogram.

property count: int

The total number of values in this histogram.

property total: float

The sum of values in this histogram.

property mean: float

The mean value of the data.
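The bookkeeping behind add(), combine() and the summary properties can be illustrated with a deliberately simplified, hypothetical ToyHistogram that uses only fixed-width linear buckets. It is a sketch of the idea, not pyppin's implementation; in particular, it assumes that combining two histograms only makes sense when both use the same bucketing.

```python
from collections import Counter
import math


class ToyHistogram:
    """Hypothetical stand-in for Histogram that uses fixed-width linear buckets only."""

    def __init__(self, width: float = 1.0) -> None:
        self.width = width
        self.buckets: Counter = Counter()  # bucket index -> number of values
        self.count = 0
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, value: float, count: int = 1) -> None:
        # Record `count` copies of `value` in its bucket and in the running statistics.
        self.buckets[math.floor(value / self.width)] += count
        self.count += count
        self.total += value * count
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    def combine(self, other: "ToyHistogram") -> None:
        # Bucket counts can only be summed if both histograms bucket identically.
        if self.width != other.width:
            raise ValueError("cannot combine histograms with different bucketing")
        self.buckets.update(other.buckets)  # Counter.update adds counts
        self.count += other.count
        self.total += other.total
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)

    @property
    def mean(self) -> float:
        # Exact, because the sum of values is tracked separately from the buckets.
        return self.total / self.count
```

Because count, total, min and max are kept alongside the bucket counts, the mean (total / count) is exact even though individual values are not stored.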

percentile(n: float) → float

Return the value at the nth percentile of this histogram, where n is in [0, 100].
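With only bucket counts available, a percentile has to be estimated. One common approach, sketched below, finds the bucket that contains the target rank and interpolates linearly inside it, assuming values are spread uniformly within each bucket. The function name and the edges/counts representation are hypothetical, and the library's percentile() may use a different estimate.

```python
from bisect import bisect_left
from itertools import accumulate


def percentile_from_buckets(edges: list[float], counts: list[int], n: float) -> float:
    """Estimate the nth percentile (n in [0, 100]) of bucketed data.

    Bucket i covers [edges[i], edges[i + 1]), so len(edges) == len(counts) + 1.
    Values are assumed to be spread uniformly within each bucket.
    """
    if not 0 <= n <= 100:
        raise ValueError("n must be in [0, 100]")
    cumulative = list(accumulate(counts))  # running totals of the counts
    target = cumulative[-1] * n / 100      # the rank we are looking for
    i = bisect_left(cumulative, target)    # first bucket whose running total reaches it
    before = cumulative[i - 1] if i > 0 else 0
    fraction = (target - before) / counts[i] if counts[i] else 0.0
    return edges[i] + fraction * (edges[i + 1] - edges[i])
```

For example, with edges [0, 10, 20, 30] and counts [2, 6, 2], the median falls halfway through the middle bucket, giving 15.0.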

property median: float

The median (50th percentile) value of the data.

property variance: float

The variance of the data's distribution.

property standard_deviation: float

The standard deviation of the data.

Reminder: if your data doesn't follow a Gaussian distribution, the standard deviation is not going to be a very meaningful number.
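Because only bucket counts are kept, the variance has to be estimated from them. A common grouped-data estimate, sketched below, treats every value as sitting at its bucket's midpoint. This is an assumption for illustration: the library may compute it differently (for instance, with a sample rather than a population denominator). The mean can be passed in exactly, since the histogram tracks both total and count.

```python
import math


def grouped_variance(midpoints: list[float], counts: list[int], mean: float) -> float:
    """Population variance of bucketed data, treating every value as its bucket's midpoint."""
    n = sum(counts)
    return sum(c * (x - mean) ** 2 for x, c in zip(midpoints, counts)) / n


def grouped_standard_deviation(midpoints: list[float], counts: list[int], mean: float) -> float:
    """Square root of the grouped variance above."""
    return math.sqrt(grouped_variance(midpoints, counts, mean))
```

With midpoints [5, 15, 25], counts [2, 6, 2] and mean 15, the estimate is (2·100 + 6·0 + 2·100) / 10 = 40.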

plot_ascii(width: int = 100, height: int = 0, min_percentile: float = 0, max_percentile: float = 100, raw_counts: bool = False) → str

Generate an ASCII plot of the histogram.

Parameters
  • width – The width of the plot, in characters.

  • height – The height of the plot, in characters. If 0 (the default), the height is half the width.

  • min_percentile – The lower end of the subrange of the histogram to include, as a percentile in [0, 100].

  • max_percentile – The upper end of the subrange of the histogram to include, as a percentile in [0, 100].

  • raw_counts – If True, plot the raw bucket counts. If False (the default), plot the probability density function.
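At its core, an ASCII histogram plot turns each bucket's height into a column of characters. The stripped-down sketch below uses a hypothetical helper, ascii_columns, and shows only that step: unlike the plot above, it draws no tick labels, does not resample buckets to a target width, and does not clip to a percentile range.

```python
def ascii_columns(values: list[float], height: int) -> str:
    """Render each value as a column of '#', scaled so the largest value fills `height` rows."""
    peak = max(values)
    # Number of filled rows per column, rounded to the nearest whole row.
    levels = [round(v * height / peak) if peak > 0 else 0 for v in values]
    rows = []
    for row in range(height, 0, -1):  # draw the top row first
        rows.append("|" + "".join("#" if level >= row else " " for level in levels))
    rows.append("+" + "-" * len(values))  # horizontal axis
    return "\n".join(rows)
```

For example, ascii_columns([1, 3, 2, 0], 3) produces three rows of columns over a four-character axis, with the second column reaching the top.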

histogram_values() → Callable[[float], float]

Return a function that mirrors the histogram itself: for each value x, it returns the count in the bucket containing x.

Note that this is not the same as the PDF, because buckets do not all have the same width!
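To see why the two differ, consider two buckets that hold the same count but have widths 1 and 4. A density estimate divides each count by the bucket's width (and by the total count, so that the density integrates to 1). The raw counts returned by histogram_values() are not divided this way. The sketch below assumes this standard normalization; the helper name is hypothetical, and the library's pdf() may differ in detail.

```python
def bucket_density(counts: list[int], widths: list[float]) -> list[float]:
    """Convert per-bucket counts into density estimates: count / (total * width)."""
    total = sum(counts)
    return [c / (total * w) for c, w in zip(counts, widths)]
```

Both buckets below hold 4 values, but the narrow one gets four times the density: bucket_density([4, 4], [1.0, 4.0]) gives [0.5, 0.125], and 0.5 · 1 + 0.125 · 4 = 1.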

pdf() → Callable[[float], float]

Return the probability density function (PDF) inferred from this histogram.

cdf() → Callable[[float], float]

Return the cumulative distribution function inferred from this histogram.
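A CDF inferred from bucket counts is typically piecewise linear. It rises by each bucket's share of the total across that bucket, assuming values are spread uniformly inside it. The hypothetical make_cdf below sketches this and returns a Callable[[float], float], matching the documented return type. It illustrates the technique, not necessarily pyppin's exact interpolation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable


def make_cdf(edges: list[float], counts: list[int]) -> Callable[[float], float]:
    """Build a piecewise-linear CDF from bucket edges and counts (len(edges) == len(counts) + 1)."""
    total = sum(counts)
    cumulative = [0, *accumulate(counts)]  # cumulative[i] = number of values below edges[i]

    def cdf(x: float) -> float:
        if x <= edges[0]:
            return 0.0
        if x >= edges[-1]:
            return 1.0
        i = bisect_right(edges, x) - 1  # bucket with edges[i] <= x < edges[i + 1]
        fraction = (x - edges[i]) / (edges[i + 1] - edges[i])
        return (cumulative[i] + fraction * counts[i]) / total

    return cdf
```

With edges [0, 10, 20, 30] and counts [2, 6, 2], the CDF is 0.2 at 10 and 0.5 at 15.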

class pyppin.math.histogram.Bucketing(max_linear_value: Optional[float] = None, linear_steps: float = 1, exponential_multiplier: float = 2)

Bases: object

Define the shapes of the buckets that we will use for the histogram.

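We use linear/exponential bucketing: linear buckets of width n ([0, n), [n, 2n), [2n, 3n), ...) up to some limit m, and beyond that exponential buckets with multiplier k ([m, km), [km, k²m), [k²m, k³m), ...). Here n is linear_steps, m is max_linear_value, and k is exponential_multiplier.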

Parameters
  • max_linear_value – The value at which we switch from linear to exponential buckets, or None to use linear buckets for everything.

  • linear_steps – The width of each linear bucket.

  • exponential_multiplier – The ratio between the lower bounds of successive exponential buckets.
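Under one reading of the description above, the bucket lower bounds are 0, n, 2n, and so on up to m, followed by m, km, k²m, and so on. The hypothetical generator bucket_edges below sketches that layout. It illustrates the scheme rather than reproducing the library's code, and it assumes max_linear_value is positive (ideally a multiple of linear_steps).

```python
from itertools import islice
from typing import Iterator, Optional


def bucket_edges(
    max_linear_value: Optional[float] = None,
    linear_steps: float = 1,
    exponential_multiplier: float = 2,
) -> Iterator[float]:
    """Yield successive bucket lower bounds: linear below max_linear_value, geometric from it on."""
    edge = 0.0
    # Linear region: 0, n, 2n, ... (forever if max_linear_value is None).
    while max_linear_value is None or edge < max_linear_value:
        yield edge
        edge += linear_steps
    # Exponential region: m, k*m, k^2*m, ...
    edge = float(max_linear_value)
    while True:
        yield edge
        edge *= exponential_multiplier


print(list(islice(bucket_edges(4, 1, 2), 8)))  # [0.0, 1.0, 2.0, 3.0, 4.0, 8.0, 16.0, 32.0]
```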

bucket(value: float) → int

Given a value to be added to the histogram, return the index of the bucket it goes in.

bucket_width(bucket: int) → float

Find the width (in the histogram’s natural units) of a given bucket.

value_for_bucket(bucket: int) → float

Return the lower bound (minimum value) of the indicated bucket.
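The three methods above are mutually consistent: bucket(v) = i should satisfy value_for_bucket(i) ≤ v < value_for_bucket(i + 1), and bucket_width(i) is the difference of those two bounds. The hypothetical SketchBucketing class below implements that index arithmetic for the linear-then-exponential layout, assuming a positive max_linear_value. It walks the exponential boundaries by repeated multiplication instead of taking logarithms, which avoids floating-point rounding at exact boundaries. It is an illustration, not pyppin's code.

```python
import math
from typing import Optional


class SketchBucketing:
    """Hypothetical linear-then-exponential bucketing with index arithmetic."""

    def __init__(self, max_linear_value: Optional[float] = None,
                 linear_steps: float = 1, exponential_multiplier: float = 2) -> None:
        self.m = max_linear_value
        self.n = linear_steps
        self.k = exponential_multiplier
        # Number of linear buckets before the exponential ones begin (None: all linear).
        self.num_linear = None if self.m is None else math.ceil(self.m / self.n)

    def bucket(self, value: float) -> int:
        if self.m is None or value < self.m:
            return math.floor(value / self.n)
        # Exponential bucket j covers [m * k**j, m * k**(j + 1)).
        j, upper = 0, self.m * self.k
        while value >= upper:
            j += 1
            upper *= self.k
        return self.num_linear + j

    def value_for_bucket(self, bucket: int) -> float:
        if self.m is None or bucket < self.num_linear:
            return bucket * self.n
        return self.m * self.k ** (bucket - self.num_linear)

    def bucket_width(self, bucket: int) -> float:
        return self.value_for_bucket(bucket + 1) - self.value_for_bucket(bucket)
```

For example, with max_linear_value=4 the value 10 lands in the bucket [8, 16), whose width is 8.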