pyppin.iterators

Useful tools for working with iterators and iterables.

Modules

pyppin.iterators.zip_by_key

Merge multiple sorted iterators into a single iterator over sorted tuples.

Functions

sample(source, count)

Select <count> randomly sampled items from the stream <source>.

split(source, key)

Split a source according to keys, like a SQL GROUP BY statement.

pyppin.iterators.split(source: Iterable[DataType], key: Callable[[DataType], KeyType]) → Dict[KeyType, List[DataType]]

Split a source according to keys, like a SQL GROUP BY statement.

For each value in source, call key(value). Return a dict whose keys are the distinct results of those calls, and whose values are lists of the values from the original source that produced each key.
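The grouping described above can be illustrated with a minimal standalone sketch. It does not reproduce pyppin's own source: split_sketch is a hypothetical name, and the fact that each list keeps the values in source order is a property of this sketch, not a documented guarantee of pyppin.iterators.split.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

DataType = TypeVar("DataType")
KeyType = TypeVar("KeyType", bound=Hashable)


def split_sketch(
    source: Iterable[DataType], key: Callable[[DataType], KeyType]
) -> Dict[KeyType, List[DataType]]:
    """Hypothetical stand-in that groups values by key(value)."""
    groups: Dict[KeyType, List[DataType]] = defaultdict(list)
    for value in source:
        # Each value is appended to the list for its computed key.
        groups[key(value)].append(value)
    # Return a plain dict so that missing keys raise KeyError as usual.
    return dict(groups)


words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = split_sketch(words, key=lambda word: word[0])
# by_initial == {"a": ["apple", "avocado"],
#                "b": ["banana", "blueberry"],
#                "c": ["cherry"]}
```

Unlike itertools.groupby, this kind of grouping does not need the input to be sorted by key first, at the cost of holding every value in memory.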

pyppin.iterators.sample(source: Iterable[DataType], count: int) → List[DataType]

Select <count> randomly sampled items from the stream <source>.

This function implements reservoir sampling using Li’s “Algorithm L.”

Parameters
  • source – The data from which to sample.

  • count – The number of items to select from the source.

Returns

A list of up to <count> items from the source, randomly selected with equal weights. The list is shorter than <count> only when the source yields fewer than <count> items.
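For readers unfamiliar with Algorithm L, the following is a rough standalone sketch of the technique, not pyppin's actual implementation. The name sample_sketch, the optional rng parameter, and the _END sentinel are all assumptions made for illustration. The idea is to fill a reservoir with the first <count> items, then repeatedly jump ahead by a geometrically distributed number of items instead of drawing a random number for every item.

```python
import math
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

DataType = TypeVar("DataType")
_END = object()  # Sentinel marking an exhausted source (hypothetical helper).


def sample_sketch(
    source: Iterable[DataType], count: int, rng: Optional[random.Random] = None
) -> List[DataType]:
    """Hypothetical reservoir sampler following Li's Algorithm L."""
    rng = rng or random.Random()
    if count <= 0:
        return []
    it = iter(source)
    reservoir = list(islice(it, count))
    if len(reservoir) < count:
        # The source was shorter than count, so return everything.
        return reservoir

    def log_uniform() -> float:
        # Log of a uniform variate in (0, 1]; 1 - random() avoids log(0).
        return math.log(1.0 - rng.random())

    w = math.exp(log_uniform() / count)
    while True:
        # Number of items to pass over before the next replacement.
        skip = math.floor(log_uniform() / math.log1p(-w)) if w < 1.0 else 0
        item = next(islice(it, skip, None), _END)
        if item is _END:
            return reservoir
        # Replace a uniformly chosen slot, then shrink the threshold w.
        reservoir[rng.randrange(count)] = item
        w *= math.exp(log_uniform() / count)


picked = sample_sketch(range(1_000_000), 5)   # five values from a long stream
short = sample_sketch(["a", "b", "c"], 10)    # source shorter than count
```

Because the source is consumed exactly once and only <count> items are held at a time, this approach suits streams whose length is unknown in advance.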