This part of the series will introduce the concept of “pipes” using coroutines. We’ll write a naive implementation of tail -f, enhance it with a filter, and then broadcast the results.
Prerequisites
- You’re using Python 2.7.x
- You’re familiar with coroutines. Otherwise, read coroutines: introduction.
Pipelines
Coroutines can be used to set up pipes by chaining multiple coroutines together and pushing data from a producer through to its consumers.
[producer] --> [consumer coroutine] --> [consumer coroutine] --> ...
The producer has an end-point consumer (sink), which consumes the produced data:
def producer(target):
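To make the idea concrete, here is a minimal sketch of a producer feeding a sink. The extra items parameter and the sink coroutine are assumptions added for illustration; they are not part of the original listing.

```python
def producer(target, items):
    """Push each item into the `target` coroutine, then close it.

    `items` is a hypothetical parameter used here so the example
    has something to produce.
    """
    for item in items:
        target.send(item)
    target.close()


def sink(received):
    """End-point consumer: store everything it is sent in `received`."""
    while True:
        item = (yield)
        received.append(item)


received = []
consumer = sink(received)
next(consumer)  # advance to the first `yield` so send() is accepted
producer(consumer, ["a", "b", "c"])
print(received)
```

Note that the sink has to be advanced to its first yield before the producer can send anything to it.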
tail -f
tail is a program on Unix and Unix-like systems used to display the tail end of a text file or piped data.
The --follow flag (-f for short) causes tail to output appended data as the file grows.
We will emulate the same behavior with a line producer. The producer will read lines from a given file, and send each line to a consumer that prints them.
from io import SEEK_END
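A line producer along these lines might look like the sketch below. It assumes the producer is called follow (the name used in the pipeline diagrams later on) and polls the file for new lines. The poll_interval and max_idle_polls parameters are hypothetical; the latter exists only so the loop can stop in a test, whereas a real tail -f would run until interrupted.

```python
from io import SEEK_END
from time import sleep


def follow(f, target, poll_interval=0.1, max_idle_polls=None):
    """Send every line appended to the open file `f` to `target`.

    `poll_interval` and `max_idle_polls` are illustrative assumptions:
    with `max_idle_polls=None` the loop never ends, like `tail -f`.
    """
    f.seek(0, SEEK_END)  # skip existing content, start at the end
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        line = f.readline()
        if line:
            idle_polls = 0
            target.send(line)
        else:
            # Nothing new yet: wait a little before polling again.
            idle_polls += 1
            sleep(poll_interval)
```

This is deliberately naive: it polls instead of waiting on file-system events, and it does not handle the file being truncated or rotated.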
Now, we’ll define a coroutine decorator that bootstraps coroutines:
from functools import wraps
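One common way to write such a decorator is sketched below. It creates the generator and advances it to its first yield, so callers can use send() right away. The echo coroutine is a hypothetical example used only to show the effect.

```python
from functools import wraps


def coroutine(func):
    """Decorator that primes a generator-based coroutine."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)  # run up to the first `yield`
        return routine
    return start


@coroutine
def echo(seen):
    """Hypothetical coroutine: record every value it receives."""
    while True:
        seen.append((yield))


seen = []
routine = echo(seen)
routine.send("ready")  # no manual next() call needed
```

Without the decorator, the first send() of a non-None value to a freshly created generator raises a TypeError.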
Then create a print coroutine that consumes lines and prints them to stdout:
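A printer coroutine could be sketched like this. The optional out parameter is an assumption added so the output stream can be swapped; by default the coroutine writes to stdout. The coroutine decorator is repeated so the snippet runs on its own.

```python
import sys
from functools import wraps


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)
        return routine
    return start


@coroutine
def printer(out=None):
    """Write every received line to `out` (stdout when not given)."""
    out = sys.stdout if out is None else out
    while True:
        line = (yield)
        # Lines read from a file already end with a newline,
        # so write them as they are.
        out.write(line)


printer().send("hello from the printer\n")
```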
And now let’s hook everything together:
from sys import argv
Here we go. We’ve got a nice implementation of tail -f.
Adding a filter
Let’s say we want to add a filter to our solution that behaves like grep.
from re import compile
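A grep-like filter can be sketched as a coroutine that forwards only matching lines to the next stage. It is named line_filter here, an assumed name chosen to avoid shadowing the built-in filter, and the collector coroutine is a hypothetical stand-in for the printer.

```python
from functools import wraps
from re import compile


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)
        return routine
    return start


@coroutine
def line_filter(pattern, target):
    """Forward only the lines whose beginning matches `pattern`."""
    regex = compile(pattern)
    while True:
        line = (yield)
        if regex.match(line):  # match() anchors at the start of the line
            target.send(line)


@coroutine
def collector(lines):
    """Hypothetical sink that stores what it receives."""
    while True:
        lines.append((yield))


matched = []
hello_only = line_filter(r"hello\b", collector(matched))
for text in ["hello world\n", "goodbye\n", "helloworld\n"]:
    hello_only.send(text)
print(matched)
```

Because match() only looks at the beginning of the string, the pattern hello\b keeps lines that start with the word hello and drops everything else.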
We’ll leverage the current design and only print lines that start with the word hello.
from sys import argv
The code above works like so: [follow] --> [filter] --> [printer]
Broadcasting
Let’s say we want to broadcast the produced results to several consumers. We can create a broadcast coroutine:
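A broadcast coroutine can be as small as the sketch below: it receives an item and sends it on to every target it was given. The collector coroutine is a hypothetical consumer used to show the fan-out.

```python
from functools import wraps


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)
        return routine
    return start


@coroutine
def broadcast(targets):
    """Send every received item to each coroutine in `targets`."""
    while True:
        item = (yield)
        for target in targets:
            target.send(item)


@coroutine
def collector(items):
    """Hypothetical sink that stores what it receives."""
    while True:
        items.append((yield))


first, second = [], []
fan_out = broadcast([collector(first), collector(second)])
fan_out.send("line 1\n")
fan_out.send("line 2\n")
print(first, second)
```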
Now, we’ll create another coroutine that cleans nasty words:
from re import compile
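One way to sketch such a cleaner is a coroutine that substitutes every match of a pattern before passing the line on. The replacement parameter, the example word darn, and the collector coroutine are all assumptions made for illustration.

```python
from functools import wraps
from re import compile


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)
        return routine
    return start


@coroutine
def cleaner(pattern, target, replacement="***"):
    """Replace every match of `pattern` and forward the cleaned line."""
    regex = compile(pattern)
    while True:
        line = (yield)
        target.send(regex.sub(replacement, line))


@coroutine
def collector(lines):
    """Hypothetical sink that stores what it receives."""
    while True:
        lines.append((yield))


cleaned = []
clean = cleaner(r"\bdarn\b", collector(cleaned))
clean.send("well darn it\n")
clean.send("all good\n")
print(cleaned)
```

Unlike the filter, the cleaner never drops a line; it always forwards something to its target.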
And hook everything together:
with open(argv[1], 'r') as f:
You’ve probably noticed we’ve created our filter multiple times. Maybe a better approach would be to send a compiled expression to both the cleaner and filter coroutines.
Moreover, we’re creating multiple printer coroutines. Instead of creating separate pipes for each filter, we can create one instance of printer and branch back to it when we need to print.
Let’s fix these issues:
printer_routine = printer()
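The sketch below shows one way the reworked wiring could look: both branches receive an already compiled expression and share a single printer. The build_pipeline helper, the line_filter name, the out parameter, and the sample lines are assumptions. The lines are pushed from a list so the example terminates; in the real script they would come from the follow producer.

```python
import sys
from functools import wraps
from re import compile


def coroutine(func):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        routine = func(*args, **kwargs)
        next(routine)
        return routine
    return start


@coroutine
def printer(out=None):
    """Write every received line to `out` (stdout when not given)."""
    out = sys.stdout if out is None else out
    while True:
        out.write((yield))


@coroutine
def line_filter(regex, target):
    """Forward lines that match the already compiled `regex`."""
    while True:
        line = (yield)
        if regex.match(line):
            target.send(line)


@coroutine
def cleaner(regex, target, replacement="***"):
    """Mask every match of the already compiled `regex`."""
    while True:
        line = (yield)
        target.send(regex.sub(replacement, line))


@coroutine
def broadcast(targets):
    """Send every received line to each target."""
    while True:
        line = (yield)
        for target in targets:
            target.send(line)


def build_pipeline(out=None):
    """Hypothetical helper: two branches that share one printer."""
    printer_routine = printer(out)
    return broadcast([
        line_filter(compile(r"hello\b"), printer_routine),
        cleaner(compile(r"\bdarn\b"), printer_routine),
    ])


pipeline = build_pipeline()
for line in ["hello world\n", "goodbye\n", "hello darn world\n"]:
    pipeline.send(line)
```

Both branches end in the same printer_routine, so a line that passes the filter is printed twice: once as is and once after cleaning.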
All the code you’ve seen in this blog is available as a GitHub Gist as well.
Final thoughts
Coroutines provide more powerful data routing possibilities than simple iterators.
If you’ve built a collection of simple data processing components, you can glue them together into complex arrangements of pipes, branches, merging, etc.
What’s next?
Go ahead and read the next part of the series, coroutines: basic building blocks for concurrency.