Deployment To Kubernetes Connection Pools Postgres

halteproblem · July 12, 2020, 11:26am

Hello,

I want to deploy a small sanic app to kubernetes.
It just has to return a database entry.
However it needs to handle a lot of requests concentrated on a short period of time.

I thought of running one worker for each pod.

app.run(host=“0.0.0.0”, port=8000, workers=1, access_log=False)

I use asyncpg connection pools to connect to the postgres database

class pg:

def __init__(self, pg_pool):

    self.pg_pool = pg_pool

async def fetch(self, sql, *args, **kwargs):

    async with self.pg_pool.acquire() as connection:

        return await connection.fetch(sql, *args, **kwargs)

async def execute(self, sql, *args, **kwargs):

    async with self.pg_pool.acquire() as connection:

        return await connection.execute(sql, *args, **kwargs)

@app.listener(‘before_server_start’)
async def register_db(app, loop):

pool = await asyncpg.create_pool(

    user=app.config.get("DB_USER"), password=app.config.get("DB_PASSWORD"), host=app.config.get("DB_HOST"),

    port=app.config.get("DB_PORT"), database=app.config.get("DB_NAME")

)

app.pool = pg(pool)

I believe every pod will create an own connection pool. So if I have 5 pods on one node there will be 5 connection pools. One for each pod. Isn’t there are better way to do this?

Thank you very much for your help.

ahopkins · July 12, 2020, 2:40pm

This sounds like a good way to do this. What do you mean better way? What are you trying to optimize from where you are?

halteproblem · July 12, 2020, 6:46pm

The default values when creating a connection pool with asyncpg are following:

create_pool ( dsn=None , *** , min_size=10 , max_size=10 , max_queries=50000 , max_inactive_connection_lifetime=300.0 , setup=None , init=None , loop=None , connection_class=<class ‘asyncpg.connection.Connection’> , **connect_kwargs )

I was worried that database can not handle 100 connections if I spawn 5 pods on 2 nodes each.
If I set a static value for the max_size and I scale the pods up and down according to the load on the webserver, I will never achieve the optimal amount of connections.

ahopkins · July 13, 2020, 8:11am

Those numbers do not sound unreasonable to me. Workers in Sanic are separate processes. Pods (and the containers inside them) create an even further division that means you cannot pool across instances. Keep an eye on it and stress test it, but that sounds within capacity to me.

halteproblem · July 13, 2020, 4:09pm

I just encountered this error when running 10 pods with 1 worker each.

Traceback (most recent call last):
File “/usr/local/lib/python3.7/site-packages/sanic/app.py”, line 937, in handle_request
response = await response
File “app.py”, line 39, in test
result = await app.pool.fetch(query)
File “app.py”, line 19, in fetch
async with self.pg_pool.acquire() as connection:
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 773, in aenter
self.connection = await self.pool._acquire(self.timeout)
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 613, in _acquire
return await _acquire_impl()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 598, in _acquire_impl
proxy = await ch.acquire() # type: PoolConnectionProxy
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 133, in acquire
await self.connect()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 125, in connect
self._con = await self._pool._get_new_connection()
File “/usr/local/lib/python3.7/site-packages/asyncpg/pool.py”, line 487, in _get_new_connection
connection_class=self._connection_class)
File “/usr/local/lib/python3.7/site-packages/asyncpg/connect_utils.py”, line 631, in _connect_addr
await asyncio.wait_for(connected, timeout=timeout)
File “/usr/local/lib/python3.7/asyncio/tasks.py”, line 416, in wait_for
return fut.result()
asyncpg.exceptions.TooManyConnectionsError: sorry, too many clients already

ahopkins · July 13, 2020, 5:23pm

Maybe look into using https://www.pgbouncer.org/

halteproblem · July 13, 2020, 8:40pm

Thank you very much. I believe thats the solution for my problem.

halteproblem · July 14, 2020, 7:11pm

It seems like without pgbouncer the responses are 55% faster.
I testet with apache benchmark: ab -k -c 1000 -n 2000 “URL”

pgbouncer.ini:

[databases]

= host=xxx port=5432 user=xxx

[pgbouncer]

listen_addr = 0.0.0.0

listen_port = 5432

unix_socket_dir =/tmp

user = pgbouncer

auth_file = /etc/pgbouncer/userlist.txt

auth_type = md5

ignore_startup_parameters = extra_float_digits

pool_mode = transaction

default_pool_size = 90

max_client_conn = 10000

Log settings

admin_users = postgres

My query code in app.py

async def fetch(query):

connection = await asyncpg.connect(

    user=app.config.get("DB_USER"), password=app.config.get("DB_PASSWORD"), host=app.config.get("DB_HOST"),

    port=app.config.get("DB_PORT"), database=app.config.get("DB_NAME"), statement_cache_size=0

)

result = await connection.fetch(query)

await connection.close()

return result

My view function

@app.route("/perimeterx", version=1)

async def test(request):
result = await fetch(query)

return json({ "result": jsonify(result)})

ahopkins · July 18, 2020, 8:25pm

Well, that does make sense that it would be slower because you are routing through another service. However, and this is inherently a flaw in most benchmarks, it does not necessarily reflect the truth of deployment. Where will pgbouncer sit? App server? DB server? What queries will it be hitting? Etc, etc…

Nevertheless, I think for your use case, it probably adds a lot of benefit. I do not proclaim to be a Postgres expert, but there likely are further optimizations that could be made.