horizontal and vertical parallelism

Asynchronous: Parallel but Vertical

In asynchronous, or cooperative, multitasking the programmer decides at which well-defined instructions other code may run, which gives virtual parallelism.
This makes it possible to run otherwise synchronous code in a vertically parallel way. The code itself is still executed in the order of the written lines, from top to bottom. But whenever one of these well-defined transitions is reached, other code can run, again synchronously, at a different place. This is why I call it vertical parallelism.
To read such code, you can simply ignore the await keyword and follow the execution direction as in synchronous code, from top to bottom.

import asyncio


# A simple asynchronous function runs vertically, line by line.
# At defined places it can pause (wait) and resume later,
# so that other code can be executed in the meantime.
async def foo():
    print("starting here")
    a = 1
    await asyncio.sleep(1)  # at this place other code can be executed
    print("slept for 1 second")
    a += 2
    await asyncio.sleep(0.5)  # at this place other code can be executed
    print("leaving")
    return a
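To make the hand-over at the await points visible, here is a minimal sketch. The names worker and run_demo are made up for this illustration, and asyncio.sleep(0) is used only as the shortest possible "let other code run" point. Each worker still runs its own lines strictly from top to bottom.

```python
import asyncio


async def worker(name, log):
    # Inside one coroutine the order is strictly top to bottom.
    log.append(f"{name}: step 1")
    await asyncio.sleep(0)  # hand control back to the event loop
    log.append(f"{name}: step 2")
    await asyncio.sleep(0)  # hand control back once more
    log.append(f"{name}: step 3")


async def run_demo():
    log = []
    # Run two workers on the same event loop and wait for both.
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


log = asyncio.run(run_demo())
for entry in log:
    print(entry)
```

On CPython's default event loop the entries of A and B should alternate, while the three steps of each worker always stay in their written order.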

The coroutine foo above runs its statements in the same order as the synchronous version below.

import time


def foo():
    print("starting here")
    a = 1
    time.sleep(1)  # blocks this thread; only other threads can run meanwhile
    print("slept for 1 second")
    a += 2
    time.sleep(0.5)  # blocks this thread; only other threads can run meanwhile
    print("leaving")
    return a

In the synchronous case the interpreter is forced to wait for the whole sleep time. In the background the so-called global interpreter lock (GIL) is released, so the OS can switch to another thread.
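The effect of the released GIL can be checked with a small sketch. The helper blocking_sleep and the thread names are assumptions made for this example only. Three threads each sleep for 0.2 seconds, and because the sleeps overlap, the whole run should take far less than the 0.6 seconds a purely sequential run would need.

```python
import threading
import time


def blocking_sleep(duration, finished, name):
    # time.sleep blocks only the calling thread and releases the GIL.
    time.sleep(duration)
    finished.append(name)


finished = []
start = time.perf_counter()
threads = [
    threading.Thread(target=blocking_sleep, args=(0.2, finished, f"t{i}"))
    for i in range(3)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start
print(f"3 threads finished in {elapsed:.2f}s")
```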

Parallel but Horizontal and Vertical

Combining both directions can give us a very performant pattern. For this purpose asyncio, Python's basic asynchronous framework, has a simple wrapper for coroutines. Its event loop comes with a method called create_task. This method wraps the passed coroutine in a Task object (a subclass of asyncio's Future). The event loop then schedules the execution of the task. The programmer should hold a reference to the task, because the event loop itself only keeps a weak reference, so an unreferenced task may be garbage-collected, even before it has finished.
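One common way to hold such references for fire-and-forget tasks is a module-level set, similar to the pattern shown in the asyncio documentation. The sketch below uses made-up names (background_tasks, job, spawn_jobs) and a short sleep as a stand-in for real work.

```python
import asyncio

background_tasks = set()  # strong references keep the tasks alive
results = []


async def job(n):
    await asyncio.sleep(0.01)  # stand-in for real work
    results.append(n * 2)


async def spawn_jobs():
    for n in range(3):
        task = asyncio.create_task(job(n))
        background_tasks.add(task)
        # Drop the reference again once the task has finished.
        task.add_done_callback(background_tasks.discard)
    # Simple polling until every task has removed itself from the set.
    while background_tasks:
        await asyncio.sleep(0.01)


asyncio.run(spawn_jobs())
print(sorted(results))
```

The done callback keeps the set from growing forever, while the set itself guarantees that a task is not collected in the middle of its execution.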

import asyncio
import time

import aiohttp


async def request(page: int, client: aiohttp.ClientSession):
    url = f"https://books.toscrape.com/catalogue/page-{page}.html"
    # the context manager releases the connection when we are done
    async with client.get(url) as response:
        if response.status < 300:
            body = await response.text()
            print(body[:18].strip())


async def main():
    client = aiohttp.ClientSession()
    loop = asyncio.get_running_loop()
    time_taken = time.time()
    async with client:
        # hold references here
        tasks = [loop.create_task(request(page, client)) for page in range(10)]
        # wait for them and return when all are complete (default of wait())
        done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    print(f"done in {time.time() - time_taken:.2f}")
    # the session is already closed by "async with";
    # give the underlying connections a moment to shut down
    await asyncio.sleep(0.5)

if __name__ == '__main__':
    asyncio.run(main(), debug=True)
    # prints 
    # <!DOCTYPE html>
    # ...
    # done in 1.02

As you can see, the request coroutine first issues an HTTP request and waits for it.
When we receive the response, we check its status and then wait again to read the body from the network. So far this is vertical and typical of the asynchronous style.
But this is not enough, so we start 10 request coroutines at nearly the same time. Each of them does all of the above, but now 10 times in horizontal parallel.

To control how asyncio waits for the tasks, a fine-grained condition can be passed via return_when. I recommend reading the asyncio docs.
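As a small illustration of such a condition, the sketch below returns as soon as the first task is done by passing asyncio.FIRST_COMPLETED. The coroutine delayed and its delays are invented for this example; it is not part of the scraper.

```python
import asyncio


async def delayed(value, delay):
    await asyncio.sleep(delay)
    return value


async def first_result():
    tasks = [
        asyncio.create_task(delayed("fast", 0.01)),
        asyncio.create_task(delayed("slow", 5.0)),
    ]
    # Return as soon as any task has finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # we no longer need the slower task
    # Let the cancelled tasks finish their cancellation cleanly.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result(), len(pending)


winner, cancelled_count = asyncio.run(first_result())
print(winner, cancelled_count)
```

The other documented values are asyncio.FIRST_EXCEPTION and asyncio.ALL_COMPLETED, the latter being the default.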
Below you can find a low-performance version. At least it is non-blocking, but it is still as slow as a synchronous version would be, because the list comprehension waits for each request to finish and only then issues the next one. In conclusion, even though the underlying request coroutine is asynchronous, you can easily do things the wrong way by carelessly synchronizing them.

import asyncio
import time

import aiohttp


async def request(page: int, client: aiohttp.ClientSession):
    url = f"https://books.toscrape.com/catalogue/page-{page}.html"
    async with client.get(url) as response:
        if response.status < 300:
            body = await response.text()
            print(body[:18].strip())


async def main():
    client = aiohttp.ClientSession()
    time_taken = time.time()
    async with client:
        # each await finishes before the next request is even created
        results = [await request(page, client) for page in range(10)]
    print(f"done in {time.time() - time_taken:.2f}")
    # the session is already closed by "async with"
    await asyncio.sleep(0.5)


if __name__ == '__main__':
    asyncio.run(main(), debug=True)
    # prints
    # <!DOCTYPE html>
    # ...
    # done in 2.50
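The same difference can be reproduced without any network access. In the following sketch fake_request is a hypothetical stand-in that only sleeps for 0.05 seconds, and asyncio.gather is used as one possible way to start all coroutines at once; the timings will vary from machine to machine.

```python
import asyncio
import time


async def fake_request(page):
    await asyncio.sleep(0.05)  # stand-in for network latency
    return page


async def sequential(n):
    # Each await completes before the next coroutine is created.
    return [await fake_request(page) for page in range(n)]


async def concurrent(n):
    # All coroutines are scheduled first, then awaited together.
    return await asyncio.gather(*(fake_request(page) for page in range(n)))


async def compare(n=5):
    start = time.perf_counter()
    seq_result = await sequential(n)
    seq_time = time.perf_counter() - start
    start = time.perf_counter()
    con_result = await concurrent(n)
    con_time = time.perf_counter() - start
    return seq_result, seq_time, con_result, con_time


seq_result, seq_time, con_result, con_time = asyncio.run(compare())
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both variants return the same results in the same order. Only the waiting time differs, because the sequential version adds up all the delays.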

To close this post, I really encourage you to read the asyncio docs, read the source code, or buy a book.
