Deployment To Kubernetes Connection Pools Postgres

Hello,

I want to deploy a small sanic app to kubernetes.
It just has to return a database entry.
However it needs to handle a lot of requests concentrated on a short period of time.

I thought of running one worker for each pod.

app.run(host=“0.0.0.0”, port=8000, workers=1, access_log=False)

I use asyncpg connection pools to connect to the postgres database

class pg:

def __init__(self, pg_pool):

    self.pg_pool = pg_pool

async def fetch(self, sql, *args, **kwargs):

    async with self.pg_pool.acquire() as connection:

        return await connection.fetch(sql, *args, **kwargs)

async def execute(self, sql, *args, **kwargs):

    async with self.pg_pool.acquire() as connection:

        return await connection.execute(sql, *args, **kwargs)

@app.listener(‘before_server_start’)
async def register_db(app, loop):

pool = await asyncpg.create_pool(

    user=app.config.get("DB_USER"), password=app.config.get("DB_PASSWORD"), host=app.config.get("DB_HOST"),

    port=app.config.get("DB_PORT"), database=app.config.get("DB_NAME")

)

app.pool = pg(pool)

I believe every pod will create an own connection pool. So if I have 5 pods on one node there will be 5 connection pools. One for each pod. Isn’t there are better way to do this?

Thank you very much for your help.

This sounds like a good way to do this. What do you mean better way? What are you trying to optimize from where you are?

The default values when creating a connection pool with asyncpg are following:

create_pool ( dsn=None , *** , min_size=10 , max_size=10 , max_queries=50000 , max_inactive_connection_lifetime=300.0 , setup=None , init=None , loop=None , connection_class=<class ‘asyncpg.connection.Connection’> , **connect_kwargs )

I was worried that database can not handle 100 connections if I spawn 5 pods on 2 nodes each.
If I set a static value for the max_size and I scale the pods up and down according to the load on the webserver, I will never achieve the optimal amount of connections.

Those numbers do not sound unreasonable to me. Workers in Sanic are separate processes. Pods (and the containers inside them) create an even further division that means you cannot pool across instances. Keep an eye on it and stress test it, but that sounds within capacity to me.

I just encountered this error when running 10 pods with 1 worker each.

Traceback (most recent call last):
File “/usr/local/lib/python3.7/site-packages/sanic/app.py”, line 937, in handle_request
response = await response
File “app.py”, line 39, in test
result = await app.pool.fetch(query)
File “app.py”, line 19, in fetch
async with self.pg_pool.acquire() as connection:
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 773, in aenter
self.connection = await self.pool._acquire(self.timeout)
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 613, in _acquire
return await _acquire_impl()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 598, in _acquire_impl
proxy = await ch.acquire() # type: PoolConnectionProxy
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 133, in acquire
await self.connect()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 125, in connect
self._con = await self._pool._get_new_connection()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 487, in _get_new_connection
connection_class=self._connection_class)
File “/usr/local/lib/python3.7/site-packages/asyncpg/connect_utils.py”, line 631, in _connect_addr
await asyncio.wait_for(connected, timeout=timeout)
File “/usr/local/lib/python3.7/asyncio/tasks.py”, line 416, in wait_for
return fut.result()
asyncpg.exceptions.TooManyConnectionsError: sorry, too many clients already

Maybe look into using https://www.pgbouncer.org/

Thank you very much. I believe thats the solution for my problem.

1 Like

It seems like without pgbouncer the responses are 55% faster.
I testet with apache benchmark: ab -k -c 1000 -n 2000 “URL”

pgbouncer.ini:

[databases]

  • = host=xxx port=5432 user=xxx

[pgbouncer]

listen_addr = 0.0.0.0

listen_port = 5432

unix_socket_dir =/tmp

user = pgbouncer

auth_file = /etc/pgbouncer/userlist.txt

auth_type = md5

ignore_startup_parameters = extra_float_digits

pool_mode = transaction

default_pool_size = 90

max_client_conn = 10000

Log settings

admin_users = postgres

My query code in app.py

async def fetch(query):

connection = await asyncpg.connect(

    user=app.config.get("DB_USER"), password=app.config.get("DB_PASSWORD"), host=app.config.get("DB_HOST"),

    port=app.config.get("DB_PORT"), database=app.config.get("DB_NAME"), statement_cache_size=0

)

result = await connection.fetch(query)

await connection.close()

return result

My view function

@app.route("/perimeterx", version=1)

async def test(request):

result = await fetch(query)

return json({ "result": jsonify(result)})

Well, that does make sense that it would be slower because you are routing through another service. However, and this is inherently a flaw in most benchmarks, it does not necessarily reflect the truth of deployment. Where will pgbouncer sit? App server? DB server? What queries will it be hitting? Etc, etc…

Nevertheless, I think for your use case, it probably adds a lot of benefit. I do not proclaim to be a Postgres expert, but there likely are further optimizations that could be made.