pyppin.iterators.zip_by_key

Merge multiple sorted iterators into a single iterator over sorted tuples.

Functions

zip_by_key(*sources[, yield_keys])

Combine N sorted iterators into a single iterator that yields merged tuples.

Classes

ZipSource(source[, key, value, name, ...])

class pyppin.iterators.zip_by_key.ZipSource(source: Iterable[ValueType], key: Optional[Callable[[ValueType], KeyType]] = None, value: Optional[Callable[[ValueType], YieldedType]] = None, name: Optional[str] = None, required: bool = False, missing: Optional[Callable[[KeyType], YieldedType]] = None, missing_value: Optional[YieldedType] = None)

Bases: Generic[KeyType, ValueType, YieldedType]

classmethod aux(value: Callable[[KeyType], YieldedType], name: Optional[str] = None) → ZipSource

Return a ZipSource that contains just default values; you can use this to add per-key annotations to your yielded output easily.

For example, zip_by_key([1, 2, 3, 4, 5], ZipSource.aux(lambda x: x*x), yield_keys=False) will yield (1, 1), (2, 4), (3, 9), (4, 16), and (5, 25); the second value in each pair comes from the “auxiliary source.”

Note that auxiliary sources will only yield when other iterators are yielding; if you pass only auxiliary sources to zip_by_key(), you’ll get back the empty sequence.
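The behavior of an auxiliary source can be pictured with a few lines of plain Python. This is only an illustrative sketch, not pyppin's implementation: the helper name annotate_keys is hypothetical, and it models a single real source whose items serve as their own keys.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def annotate_keys(
    keys: Iterable[Any], aux: Callable[[Any], Any]
) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical helper: pair each key from a real source with aux(key)."""
    for key in keys:
        # The auxiliary value is computed on demand, only for keys that
        # the real source actually produced.
        yield (key, aux(key))


pairs = list(annotate_keys([1, 2, 3, 4, 5], lambda x: x * x))
print(pairs)  # [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

# With no real source driving the iteration, nothing is produced.
print(list(annotate_keys([], lambda x: x * x)))  # []
```

The second call mirrors the note above: an auxiliary column has no keys of its own, so it cannot produce rows by itself.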

pyppin.iterators.zip_by_key.zip_by_key(*sources: Union[ZipSource, Iterable], yield_keys: bool = True) → Iterator[Union[YieldedType, Tuple[YieldedType, ...]]]

Combine N sorted iterators into a single iterator that yields merged tuples.

In the simplest use, if the sources are N iterators that yield values that are already strictly sorted (i.e., if one yields a then b, then a < b), then this function will yield tuples of N+1 items, with each entry being (key, val1, val2…), in sorted order. The key is the common key for all of the items in the row, and the individual values are the corresponding values from each source (if present) or None (if that iterator didn’t have a value for this key).

The fun comes from the extra options you can provide per-source or overall. You can provide options for each source by wrapping it in a ZipSource.
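The simplest case described above (plain sorted sources, no per-source options, keys included) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not pyppin's actual implementation: merge_sorted_by_key and _tag are hypothetical names, every item is treated as its own key and value, and the sketch does not verify that its inputs are sorted.

```python
import heapq
from itertools import groupby
from typing import Any, Iterable, Iterator, Tuple


def _tag(source: Iterable[Any], index: int) -> Iterator[Tuple[Any, int]]:
    """Label every item with the position of the source it came from."""
    for item in source:
        yield item, index


def merge_sorted_by_key(*sources: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical stand-in for the simplest zip_by_key() case."""
    streams = [_tag(source, index) for index, source in enumerate(sources)]
    # heapq.merge lazily interleaves the already-sorted streams.
    merged = heapq.merge(*streams)
    # Consecutive entries with the same item share a key and form one row.
    for key, group in groupby(merged, key=lambda pair: pair[0]):
        row = [None] * len(sources)  # None marks "no value for this key"
        for item, index in group:
            row[index] = item
        yield (key, *row)


for row in merge_sorted_by_key([1, 2, 3], [2, 3, 5]):
    print(row)
# (1, 1, None)
# (2, 2, 2)
# (3, 3, 3)
# (5, None, 5)
```

Each row has N+1 entries: the shared key followed by one slot per source, which is the shape the description above gives for the default yield_keys=True.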

Parameters
  • sources – The set of N source iterators to scan over. These can either be simple iterators, or be wrapped in a ZipSource to provide per-iterator options.

  • yield_keys – If True, the tuples will have N+1 elements, and the first is the key. If False, the tuples will have N elements, and the key will not be separately yielded.

Raises
  • IndexError if any of the sources is not actually strictly sorted by key.

  • AssertionError if invalid options were passed for any source.
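To make the sortedness requirement concrete, here is a small sketch of a wrapper that enforces strict ordering while passing items through. It is an assumption about how such a check could look, not a description of how pyppin detects the problem: the name strictly_sorted is hypothetical, and it compares the items themselves rather than keys derived from them.

```python
from typing import Any, Iterable, Iterator

_UNSET = object()  # marker for "no previous item seen yet"


def strictly_sorted(source: Iterable[Any]) -> Iterator[Any]:
    """Hypothetical check: pass items through, failing on any a >= b step."""
    previous = _UNSET
    for item in source:
        # Strict ordering means each item must be greater than the last one,
        # so repeated items are rejected as well as out-of-order ones.
        if previous is not _UNSET and not previous < item:
            raise IndexError(f"source is not strictly sorted: {previous!r} then {item!r}")
        previous = item
        yield item


print(list(strictly_sorted([1, 2, 3])))  # [1, 2, 3]

try:
    list(strictly_sorted([1, 3, 2]))
except IndexError as error:
    print("rejected:", error)
```

Because the check runs lazily, the error surfaces only when iteration reaches the offending item, not when the wrapper is created.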

Example

Say you have two sorted lists that you want to merge:

l1 = [1, 2, 3, 4, 5]
l2 = [(2, "two"), (5, "five"), (7, "seven")]

Then zip_by_key(l1, ZipSource(l2, key=lambda x: x[0])) will yield:

(1, 1, None)
(2, 2, (2, "two"))
(3, 3, None)
(4, 4, None)
(5, 5, (5, "five"))
(7, None, (7, "seven"))

What happened here? The first item in each tuple is the key, which is the same for everything in that tuple. The remaining items in the tuple are the values from l1 and l2, respectively, where None appears whenever an item is missing (e.g., l2 has no value for the key 3).

Fancier Example

Let’s say instead you have:

squares = [1, 4, 9, 16, 25]

You want to get, for each number in l2, its printed name and its square:

from math import sqrt

zip_by_key(
    ZipSource(squares, key=lambda x: int(sqrt(x))),
    ZipSource(l2, key=lambda x: x[0], value=lambda x: x[1], required=True),
)

This will yield:

(2, 4, "two")
(5, 25, "five")
(7, None, "seven")

What happened here?

  • For squares, the key is the integer square root of the value, and the (default) yielded value is just the element of squares.

  • For l2, the key is the first element of the tuple, the yielded value is the second element, and because required=True, every key that doesn’t show up in l2 is dropped outright.

  • Because squares isn’t required, we get one yielded item (key 7) that has no value from squares!
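The interaction of key, value, and required in the squares example above can be reproduced with a small dictionary-based model. This is a sketch for intuition only: Source and zip_by_key_sketch are hypothetical names rather than pyppin's classes, the model loads every source into memory instead of streaming, and it covers only the three options used in the example.

```python
from math import sqrt
from typing import Any, Callable, Iterable, Iterator, NamedTuple, Tuple


class Source(NamedTuple):
    """Hypothetical, simplified stand-in for ZipSource."""

    items: Iterable[Any]
    key: Callable[[Any], Any] = lambda x: x    # default: the item is its own key
    value: Callable[[Any], Any] = lambda x: x  # default: yield the item itself
    required: bool = False


def zip_by_key_sketch(*sources: Source) -> Iterator[Tuple[Any, ...]]:
    # Index each source by key (in memory, unlike a streaming merge).
    tables = [{s.key(item): s.value(item) for item in s.items} for s in sources]
    for key in sorted(set().union(*tables)):
        # Drop the whole row if any required source lacks this key.
        if any(s.required and key not in t for s, t in zip(sources, tables)):
            continue
        # Sources that are not required contribute None when the key is absent.
        yield (key, *(t.get(key) for t in tables))


squares = [1, 4, 9, 16, 25]
l2 = [(2, "two"), (5, "five"), (7, "seven")]

rows = zip_by_key_sketch(
    Source(squares, key=lambda x: int(sqrt(x))),
    Source(l2, key=lambda x: x[0], value=lambda x: x[1], required=True),
)
for row in rows:
    print(row)
# (2, 4, 'two')
# (5, 25, 'five')
# (7, None, 'seven')
```

Keys 1, 3, and 4 exist only in squares, so the required second source filters them out, while key 7 survives with None in the squares column.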