Understanding Python Generators
Category: programming
Recently I was working with generators and it took me some time to really understand them, so I decided to write this post to share what I learned. Before I dive into the explanation and examples, let me first point out some great posts about generators in Python:
- Improve Your Python: ‘yield’ and Generators Explained (The author is very good at Python, and his short book, Writing Idiomatic Python, is also worth reading if you want to write more readable Python code. The only problem with this post is that it puts most of its emphasis on generators as iterators and little on the multitasking part. However, most people use generators as iterators anyway.)
- Introduction to Generators (Python Wiki)
- If you are feeling brave, see PEP 342. Honestly, I am still hazy on some parts of this PEP, but I would certainly recommend it, as it contains a lot of useful and intriguing information about why and how generators came to be what they are in Python today.
So first of all, we need to understand subroutines and coroutines:
In computer programming, a subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, exceptions, event loop, iterators, infinite lists and pipes.
So, according to the Wikipedia definitions above, you can already see the most useful applications of generators in Python: iterators and infinite lists. By executing yield, the function voluntarily gives up control while its context (local variables and current position) is saved. After that, you can either start a fresh run of the generator function from its beginning, or resume the paused one from the point where control was handed over (where the yield is). Sounds confusing? Let us see some examples.
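As a minimal sketch of such a pair, assume the task is finding primes below some limit. The names is_prime, primes_list and primes_gen are hypothetical and chosen only for illustration; the first function is an ordinary subroutine and the second is a generator function.

```python
def is_prime(n):
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_list(limit):
    """Subroutine: compute every prime below limit, then return them all."""
    result = []
    for n in range(2, limit):
        if is_prime(n):
            result.append(n)
    return result


def primes_gen(limit):
    """Generator function: hand back each prime below limit as it is found."""
    for n in range(2, limit):
        if is_prime(n):
            yield n


# Consumed by a loop or comprehension, the two look interchangeable.
from_list = [p for p in primes_list(20)]
from_gen = [p for p in primes_gen(20)]
print(from_list)
print(from_gen)
```

Both print statements show the same list of primes, which is why the two versions look equivalent at first glance.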
If you look at the code above, there seems to be no difference between the subroutine and the coroutine. But wait a moment, let’s append this block of code:
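One way to make the difference visible is to record when each prime is found and when the caller receives it. The sketch below repeats compact, hypothetical versions of the two functions so that it runs on its own; the function names and the log list are assumptions made for illustration.

```python
log = []  # records the order in which things happen


def is_prime(n):
    """Trial division up to the square root of n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def primes_list(limit):
    """Subroutine: finds all primes before returning anything."""
    result = []
    for n in range(2, limit):
        if is_prime(n):
            log.append(f"list found {n}")
            result.append(n)
    return result


def primes_gen(limit):
    """Generator function: yields each prime as soon as it is found."""
    for n in range(2, limit):
        if is_prime(n):
            log.append(f"gen found {n}")
            yield n


for p in primes_list(8):
    log.append(f"caller got {p}")

for p in primes_gen(8):
    log.append(f"caller got {p}")

print("\n".join(log))
```

In the first half of the log all the "list found" lines come before any "caller got" line. In the second half the "gen found" and "caller got" lines alternate, because the generator pauses after every prime.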
So the subroutine does not finish and return until the whole list has been computed. The coroutine, on the other hand, gives back control as soon as one prime is found, and the next invocation continues from the previously saved context. In effect, the presence of a yield expression turns a function into a generator function. When you call a generator function, you get back a generator object, and none of the body has run yet. Only when you call next on the generator object does execution start, and it pauses at the yield expression (we say “pause” here because it can still continue later).
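The lazy start can be checked with a small sketch. The names countdown and events are hypothetical; the list only records what happens in which order.

```python
events = []


def countdown(n):
    events.append("body started")
    while n > 0:
        yield n
        n -= 1
    events.append("body finished")


gen = countdown(2)      # creates the generator object; the body has not run
events.append("created")

first = next(gen)       # runs the body up to the first yield
second = next(gen)      # resumes after the yield, runs to the next one

try:
    next(gen)           # the body runs to its end, so there is no value
except StopIteration:
    events.append("exhausted")

print(events)
print(first, second)
```

Note that "created" is recorded before "body started": calling countdown(2) only builds the generator object, and the first next call is what starts the body.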
Well, that is the main use of generators in Python, and it was the only use before PEP 342. But now we have another use case for yield in Python. See this example:
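A minimal sketch of such a producer and consumer pair follows. The names consume, produce, consumer and producer match the discussion below, but the function bodies and the received list are assumptions made for illustration.

```python
received = []


def consume():
    """Consumer: wait for data to be sent in, then process it."""
    while True:
        data = yield              # pause here until someone calls send()
        received.append(data)


def produce(consumer):
    """Producer: create data, hand it to the consumer, then pause."""
    n = 0
    while True:
        n += 1
        consumer.send(f"item {n}")  # control passes to the consumer
        yield                       # give control back to the main code


consumer = consume()
consumer.send(None)        # prime the consumer: run it up to data = yield
producer = produce(consumer)

for _ in range(3):
    next(producer)         # run the producer until its next yield

print(received)
```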
The example here demonstrates another use case of generators in Python: cooperative tasks. As explained before, when we say consumer = consume() we get back a generator object. Sending None to it means “execute the generator until it yields control”. So consumer is paused at data = yield and control is back in the main part. Calling next on producer means “execute producer until the next yield is encountered”. But inside produce, some data is sent to consumer, which means that consumer now takes over control and consumes the data. Control passes back from consumer to producer when the next yield in consumer is reached. If you run the following program, you can trace it more easily:
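A traced version might look like the sketch below. It is self-contained and uses a hypothetical trace list to record each hand-over of control; the bodies of consume and produce are assumptions made for illustration.

```python
trace = []


def consume():
    trace.append("consumer: started")
    while True:
        data = yield
        trace.append(f"consumer: got {data}")


def produce(consumer):
    trace.append("producer: started")
    for n in range(1, 3):
        trace.append(f"producer: sending {n}")
        consumer.send(n)           # consumer runs until its next yield
        trace.append("producer: yielding")
        yield                      # hand control back to the main code


consumer = consume()
consumer.send(None)                # run the consumer up to data = yield
trace.append("main: consumer primed")

producer = produce(consumer)
for _ in producer:                 # each iteration calls next(producer)
    trace.append("main: back in control")

print("\n".join(trace))
```

Reading the printed trace from top to bottom shows control moving from main to producer, from producer to consumer on each send, back to producer when the consumer reaches its yield, and back to main when the producer yields.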
If you understand this example, let’s go back to the first use of generators: acting as an iterator. You will see that yield is still doing the same thing, only this time next is the only thing called on it and nobody sends anything to do cooperative multitasking with it, so control is always handed back to the caller.
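To see that the mechanism really is the same, note that calling next on a generator behaves like sending it None: in both cases the paused yield expression evaluates to None. A small sketch, using the hypothetical names listener and seen:

```python
seen = []


def listener():
    while True:
        value = yield      # evaluates to whatever the generator is resumed with
        seen.append(value)


gen = listener()
next(gen)                  # start: run up to the first yield
next(gen)                  # resume with nothing, so value is None
gen.send("hello")          # resume with a value
gen.send(None)             # equivalent to next(gen)

print(seen)
```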
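Some more tips about generators:
- In Python 2, a generator function cannot return a value; only a bare return is allowed, and it ends the iteration. Since Python 3.3, return with a value is also allowed, and the value is attached to the StopIteration exception that ends the iteration.
- A generator (or iterator) is generally more memory friendly than building a full list, because values are produced one at a time. It can also be faster when the consumer stops early and the rest of the sequence is never needed.
- Using a generator as an iterator demonstrates the power of laziness in programming. Haskell, a purely functional language that is lazy by default, takes this idea much further.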
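The first two points can be checked with a short sketch, assuming Python 3.3 or later. The name first_n_squares is hypothetical, and sys.getsizeof reports only the size of the container object itself, so treat the comparison as a rough indication rather than a full memory measurement.

```python
import sys


def first_n_squares(n):
    """Yield the squares 0, 1, 4, ... and finish with a return value."""
    for i in range(n):
        yield i * i
    return "done"            # allowed in Python 3.3+; ends the iteration


gen = first_n_squares(3)
values = []
result = None
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        result = stop.value  # the value passed to return
        break

print(values, result)

# A generator stores only its paused state, not the values it will produce.
as_list = [i * i for i in range(100_000)]
as_gen = (i * i for i in range(100_000))
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))
```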
To sum up:
- Generators in Python are now coroutines, but their most frequent use is still to produce a sequence of values.
- yield is like return, but it saves the current context. So if next is called on a generator, execution resumes right after the yield. A return in a generator function does something different: it ends the generator instead of producing a value.
- You can use next or send on a generator, but they serve different purposes: next for using the generator as an iterator, and send for cooperative multitasking.