Python's GIL is great, except when writing CPU-intensive network services
The GIL is great. People who say otherwise are wrong 99% of the time. This post is about the other 1% of cases.
Python’s Global Interpreter Lock (GIL) is wonderful. The performance cost of the GIL is negligible and its presence makes programming in Python predictable and enjoyable. 99% of the time when the GIL seems like a problem there exists a simple and well-known workaround.
Sometimes the GIL is a genuine inconvenience. The GIL is a problem when writing one particular kind of software: CPU-intensive network services. If your network service needs plenty of CPU time to respond to a request, the GIL will get in your way. This post introduces an example of such a service written in Python. We will then look at an equivalent version written in Go.
Let’s begin by writing the service using Python. In response to each client request, our network service should return a random prime number. This service is CPU-intensive because, as you’ll see, it uses a slow method to check if a randomly generated number is prime. As this is a pedagogical example, our implementation of the service will be simple and schematic. (A more realistic example would involve, say, image or audio processing.) The service makes available a function, rand_prime, which returns a random prime less than 50,000,000.
Since this is a network service we need to respond to requests as they arrive. A new request may arrive while we are processing another request. So we care how fast the service can respond to several requests received at the same time. Here’s a clean implementation, making use of Python’s asyncio library for concurrency:
import asyncio
import random
import time


def is_prime(x):
    """A very slow method. Assumes `x` is non-negative."""
    for n in range(2, x):
        if x % n == 0:
            return False
    if x in {0, 1}:
        return False
    return True


async def rand_prime():
    while True:
        x = random.randrange(50_000_000)
        if is_prime(x):
            return x


async def main():
    num_requests = 8
    start = time.time()
    tasks = [asyncio.create_task(rand_prime()) for _ in range(num_requests)]
    for task in asyncio.as_completed(tasks):
        print(await task)
    print("Time elapsed:", time.time() - start)


if __name__ == "__main__":
    asyncio.run(main())
This version is no faster than a serial version, a program which processes each request sequentially (see the Appendix for code). Because of the GIL, rand_prime cannot use available CPU cores, even though it could do so safely. This is frustrating because the same program, written in Go, runs 3.4 times faster than a serial version (written in Go).
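One way to see why the asyncio version gains nothing is to log when each task starts and finishes. The sketch below is an illustration only and is not part of the original program: traced_prime and run_traced are hypothetical helpers, and the candidate numbers are arbitrary. Because is_prime never awaits, each task should run to completion before the event loop gets a chance to start the next one.

```python
import asyncio


def is_prime(x):
    """Same slow trial division as in the service. Assumes `x` is non-negative."""
    if x < 2:
        return False
    for n in range(2, x):
        if x % n == 0:
            return False
    return True


async def traced_prime(task_id, candidate, log):
    # Hypothetical helper: record when this task starts and finishes.
    log.append(("start", task_id))
    result = is_prime(candidate)  # pure CPU work; there is no `await` here
    log.append(("end", task_id))
    return result


async def run_traced(candidates):
    # Schedule one task per candidate, as the service does per request.
    log = []
    tasks = [
        asyncio.create_task(traced_prime(i, c, log))
        for i, c in enumerate(candidates)
    ]
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for event in asyncio.run(run_traced([104_729, 1_299_709])):
        print(event)
```

In the log, the start and end events of one task always appear together, with no other task in between: the tasks run back to back rather than side by side. On a standard CPython build, moving the same work to threads would not be expected to change the picture much for this workload, since the GIL lets only one thread execute Python bytecode at a time.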
At pixel time, there is no simple way to speed up the Python code. You can use a ProcessPoolExecutor, but doing so comes with large costs, including added complexity and decreased readability.[1] Looking at the equivalent code in Go should stir feelings of envy in the heart of every Python programmer. The code is just as clean but it uses all available CPU cores. Here it is:
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func isPrime(x int) bool {
	for n := 2; n < x; n++ {
		if x%n == 0 {
			return false
		}
	}
	if x == 0 || x == 1 {
		return false
	}
	return true
}

func randPrime() int {
	for {
		x := rand.Intn(50_000_000)
		if isPrime(x) {
			return x
		}
	}
}

func main() {
	numRequests := 8
	start := time.Now()
	primeStream := make(chan int)
	for i := 0; i < numRequests; i++ {
		go func() {
			primeStream <- randPrime()
		}()
	}
	for i := 0; i < numRequests; i++ {
		fmt.Printf("%v\n", <-primeStream)
	}
	close(primeStream)
	fmt.Printf("Time elapsed: %v\n", time.Since(start))
}
The GIL is a serious inconvenience when writing CPU-intensive network services in Python. The standard solution—to write the CPU-intensive bits using GIL-releasing C or C++—is inconvenient. Python is a great language and it would be wonderful if it could handle this case as well. Right now, however, there’s no easy fix. But we live in hope for an alternative Python implementation which does solve the problem. For a brief period of time, it appeared as if PyPy might deliver on the promise of GIL-less Python. But time has shown that getting rid of the GIL is a very hard task. For now, those writing CPU-intensive network services will need to consider writing (some) code in a different language.
This is episode 2 of season 1 of Polyglot Python.
This is version 1, published on 2021-06-21. The first version of this post appeared on 2021-06-21.
Appendix
Version of the Python program which uses ProcessPoolExecutor
import asyncio
import concurrent.futures
import random
import time


def is_prime(x):
    """A very slow method. Assumes `x` is non-negative."""
    for n in range(2, x):
        if x % n == 0:
            return False
    if x in {0, 1}:
        return False
    return True


def rand_prime():
    while True:
        x = random.randrange(50_000_000)
        if is_prime(x):
            return x


async def main():
    num_requests = 8
    start = time.time()
    loop = asyncio.get_running_loop()
    # Each call to rand_prime runs in a separate worker process.
    with concurrent.futures.ProcessPoolExecutor() as pool:
        tasks = [loop.run_in_executor(pool, rand_prime) for _ in range(num_requests)]
        for task in asyncio.as_completed(tasks):
            print(await task)
    print("Time elapsed:", time.time() - start)


if __name__ == "__main__":
    asyncio.run(main())
Version of the Python program which processes requests serially
import random
import time


def is_prime(x):
    """A very slow method. Assumes `x` is non-negative."""
    for n in range(2, x):
        if x % n == 0:
            return False
    if x in {0, 1}:
        return False
    return True


def rand_prime():
    while True:
        x = random.randrange(50_000_000)
        if is_prime(x):
            return x


def main():
    num_requests = 8
    start = time.time()
    # Handle the requests one after another, with no concurrency.
    for _ in range(num_requests):
        print(rand_prime())
    print("Time elapsed:", time.time() - start)


if __name__ == "__main__":
    main()
[1] Here are four costs associated with using a ProcessPoolExecutor, ordered in terms of declining importance. First, your code is no longer cross-platform because process creation varies dramatically across operating systems. Second, using an executor adds complexity to your code as you now need to keep track of the ProcessPoolExecutor instance, which is not a trivial task in a network service. Third, you must make sure that your function arguments and return values are serializable using the pickle format, as this is a requirement. Fourth, your code is harder to read.
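The third cost is easy to run into by accident. As a rough illustration (the helper name is_picklable is made up for this sketch and is not part of any library), you can check ahead of time whether a value would survive the trip to a worker process:

```python
import math
import pickle


def is_picklable(obj):
    """Hypothetical helper: True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(50_000_000))       # plain integers pickle fine
    print(is_picklable(math.isqrt))       # importable functions pickle by name
    print(is_picklable(lambda x: x + 1))  # a lambda has no importable name
```

Functions are pickled by reference to their importable name, which is why a lambda cannot be sent to a process pool while a module-level function such as rand_prime can.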