pyppin.iterators
Useful tools for working with iterators and iterables.
Modules
Merge multiple sorted iterators into a single iterator over sorted tuples.
Functions
sample(source, count) – Select <count> randomly sampled items from the stream <source>.
split(source, key) – Split a source according to keys, like a SQL GROUP BY statement.
- pyppin.iterators.split(source: Iterable[DataType], key: Callable[[DataType], KeyType]) -> Dict[KeyType, List[DataType]]
Split a source according to keys, like a SQL GROUP BY statement.
For each value in source, call key(value), and return a dict whose keys are every key that occurs and whose values are lists of all the values in the original source that produced that key.
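The grouping behaviour described above can be illustrated with a small stand-in. This is a minimal sketch, not pyppin's own implementation: the name split_sketch and the sample data are hypothetical, and the sketch happens to keep values in source order within each list, which the description above does not promise.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, TypeVar

DataType = TypeVar("DataType")
KeyType = TypeVar("KeyType")


def split_sketch(
    source: Iterable[DataType], key: Callable[[DataType], KeyType]
) -> Dict[KeyType, List[DataType]]:
    """Hypothetical stand-in that mirrors the documented signature of split."""
    groups: Dict[KeyType, List[DataType]] = defaultdict(list)
    for value in source:
        # Each value lands in the list belonging to key(value).
        groups[key(value)].append(value)
    # Return a plain dict so that looking up a missing key raises KeyError.
    return dict(groups)


# Group words by their first letter (illustrative data only).
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = split_sketch(words, key=lambda word: word[0])
```

Only keys that key() actually returns for some value appear in the result, so an empty source yields an empty dict.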
- pyppin.iterators.sample(source: Iterable[DataType], count: int) -> List[DataType]
Select <count> randomly sampled items from the stream <source>.
This function implements reservoir sampling using Li’s “Algorithm L.”
- Parameters
source – The data from which to sample.
count – The number of items to select from the source.
- Returns
A list of up to <count> items from the source, randomly selected with equal weights.
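For readers unfamiliar with reservoir sampling, the following is a minimal sketch of Li's Algorithm L as it is commonly described. It is not pyppin's source code: the names reservoir_sample_sketch, _open_unit and _MISSING are hypothetical, and the guards around the skip computation are an assumption about how to handle floating-point edge cases.

```python
import math
import random
from itertools import islice
from typing import Iterable, List, TypeVar

DataType = TypeVar("DataType")

_MISSING = object()  # Sentinel, so that None can be a legitimate item.


def _open_unit() -> float:
    """Return a uniform random float strictly between 0 and 1."""
    u = random.random()
    while u == 0.0:  # random.random() can return 0.0; log(0) is undefined.
        u = random.random()
    return u


def reservoir_sample_sketch(source: Iterable[DataType], count: int) -> List[DataType]:
    """Hypothetical stand-in that mirrors the documented signature of sample."""
    if count <= 0:
        return []
    iterator = iter(source)
    # Fill the reservoir with the first `count` items.
    reservoir = list(islice(iterator, count))
    if len(reservoir) < count:
        return reservoir  # The source was shorter than `count`.

    w = math.exp(math.log(_open_unit()) / count)
    while 0.0 < w:
        # Number of items to skip before the next replacement.
        if w >= 1.0:
            skip = 0  # Rounding guard: log1p(-1.0) would raise ValueError.
        else:
            skip = math.floor(math.log(_open_unit()) / math.log1p(-w))
        # Discard `skip` items, then take the one after them.
        item = next(islice(iterator, skip, None), _MISSING)
        if item is _MISSING:
            break  # The source is exhausted.
        reservoir[random.randrange(count)] = item
        w *= math.exp(math.log(_open_unit()) / count)
    return reservoir


# Draw 5 items from a stream of 1000 without materialising the stream.
picked = reservoir_sample_sketch(range(1000), 5)
```

The appeal of this approach is that it makes a single pass over the source, holds only count items in memory, and draws random numbers only when an item is about to be replaced rather than once per item.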