Why am I getting an error when making an HTTP request to one Sanic application running in a Docker container, but not the other?

Okay, I have been following tutorials online and reading the documentation on Sanic, and I have tried transferring my application from a loosely coded one (functions not contained in a class) to a more strictly coded one (functions contained within a class). I have tried Googling my errors and my problem specifically, in many different forms, to no avail. I’m somewhat desperate at this point. Please keep in mind that I’m completely new to Docker as well as to Sanic.

So, what is happening is the following:

Toward the goal of creating this distributed data storage system, I am first attempting to code a single “view,” or cluster of redundant servers. They should all be available to handle HTTP requests from the client, and should one of them fail (go down) for any reason, the other servers should be capable of recognizing this. If a server goes down and then comes back up, the returning server is responsible for contacting the other servers in its view and notifying them of its return to the fold.

So, I am creating a network with Docker thusly:

sudo docker network create --subnet=10.10.0.0/16 mynet

Then, I am creating my Docker container image thusly:

sudo docker build -t server .

Then, I am running my containers thusly:

$ sudo docker run -p 8082:8085 --net=mynet --ip=10.10.0.2 --name="replica1" -e SOCKET_ADDRESS="10.10.0.2:8085" -e VIEW="10.10.0.2:8085,10.10.0.3:8085,10.10.0.4:8085" server python3 <name_of_application.py>
$ sudo docker run -p 8083:8085 --net=mynet --ip=10.10.0.3 --name="replica2" -e SOCKET_ADDRESS="10.10.0.3:8085" -e VIEW="10.10.0.2:8085,10.10.0.3:8085,10.10.0.4:8085" server <name_of_application.py>
$ sudo docker run -p 8084:8085 --net=mynet --ip=10.10.0.4 --name="replica3" -e SOCKET_ADDRESS="10.10.0.4:8085" -e VIEW="10.10.0.2:8085,10.10.0.3:8085,10.10.0.4:8085" server python3 <name_of_application.py>

Now, when I run a curl request against the first server that gets spun up, everything works fine. It returns the correct response, i.e., a JSON-formatted body with an HTTP status code.

However, when I contact the SECOND server that gets spun up (i.e., send a curl request to it), that server will NOT return the correct response. Instead, it throws the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sanic/app.py", line 937, in handle_request
    response = await response
  File "asgn3_2.py", line 75, in get
    return response.json({"message":"View retrieved successfully","view":returnString})
TypeError: json() takes 1 positional argument but 2 were given

For the love of all that is holy, does anyone have any idea why this is happening? Is it because I’m not running my Docker containers within a virtual environment? Is it because God hates me? If anyone has any help or ideas they can offer, I would be eternally grateful. Thank you in advance.

EDIT: These are the contents of my Dockerfile:

FROM python

COPY /dep.sh /asgn3_2.py /

RUN ./dep.sh

As you can see, the container image will be created with the dependencies that ‘dep.sh’ installs, so here are the contents of the ‘dep.sh’ file:

#!/bin/bash
pip3 install requests
pip3 install simplejson
pip3 install sanic
pip3 install sanic_scheduler
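
(Aside, for anyone reading later: I’ve seen it suggested that a more conventional Dockerfile would pin the base image, install the dependencies directly, and set a default command. Something like the sketch below, which I haven’t verified; the file name is just mine from above.)

FROM python:3.8

COPY asgn3_2.py /

RUN pip3 install requests simplejson sanic sanic_scheduler

CMD ["python3", "asgn3_2.py"]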

What you need is container orchestration. I highly suggest that you not attempt to build this yourself, and instead use one of the many tools out there for this, the most prominent being Kubernetes (aka k8s). It will handle routing requests through one or more instances of your application, and can be configured to restart an instance that has failed. Furthermore, it gives you the ability to do zero-downtime upgrades, since it can do a rolling release.

I am AFK, and will try to respond more fully to the rest of the message later today.

Again, thank you so much for any responses you provide. Unfortunately, I am actually trying to get a head start on a school project (a very, very difficult one, from my perspective), and the spec for this project prohibits the use of any outside cloud services or software. I need to write a causally consistent, fault-tolerant, distributed database on my own.

Thank you so much again for any help you can provide. I can provide any code snippets and info you request…

Ahh, got it…

Well, then I would say you need to set up your services so that they are “aware” of each other.

It should be possible for you to have them ping each other on an event. The standard problem with this is that you need to know their addresses; tools like k8s obfuscate that for you.

In general, you should do something akin to the following:

  • On startup, send a message to the other services
    • make sure to handle the case where the other service is not yet running (race condition)
  • On shutdown, you could also send a “graceful” message to the other services; just do not rely on it always happening
  • It would also be nice to run a periodic “health check” (see the sketch after the snippet below)
from sanic import Sanic
from sanic.response import json

app = Sanic("example")

@app.listener("before_server_start")
async def do_startup(app, loop):
    try:
        await notify_other_instances()  # sketched below
    except Exception:
        ...  # handle the case where the other service is not running yet

@app.route("/health")
async def health(request):
    return json({"ok": True})
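
And the periodic health check can be a simple background task. A rough, untested sketch of the shape (it assumes your replica addresses are stored in app.config.VIEW and that each replica exposes the /health route above; it uses httpx, linked below):

import asyncio
import httpx

async def periodic_health_check(app):
    # Naive polling loop: every 10 seconds, see which peers answer /health.
    while True:
        await asyncio.sleep(10)
        async with httpx.AsyncClient(timeout=2.0) as client:
            for peer in app.config.VIEW:
                try:
                    r = await client.get(f"http://{peer}/health")
                    print(peer, "up" if r.status_code == 200 else "unhealthy")
                except httpx.HTTPError:
                    print(peer, "down")

app.add_task(periodic_health_check(app))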

Sanic comes bundled with httpx, so I would suggest you look into how to make requests using it: https://www.python-httpx.org/
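
For example, notify_other_instances from the snippet above might look roughly like this (a sketch only; the /notify route and the peers/my_address parameters are placeholders for whatever your replicas actually use, so adapt the call in the listener accordingly):

import httpx

async def notify_other_instances(peers, my_address):
    # peers: "host:port" strings for the other replicas in the view.
    async with httpx.AsyncClient(timeout=2.0) as client:
        for peer in peers:
            try:
                # Hypothetical endpoint; use whichever route your replicas expose.
                await client.put(
                    f"http://{peer}/notify",
                    json={"socket-address": my_address},
                )
            except httpx.ConnectError:
                # Peer not up yet (the race condition mentioned above);
                # it will announce itself when it starts.
                pass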

Ah, thank you for the advice, especially on the ‘app.listener’ stuff… I will try to employ it…

But… I’m just concerned that none of this will help my core issue, which is that I keep getting the error I listed above when I try to send an HTTP request to the second server that gets spun up.

Yeah, I know that the servers need to know each other’s addresses; I’m accomplishing that with the docker run command, which lets me insert the “view” list into the environment variables of the Docker container.

Here is the code I have so far… I was hoping you could possibly take a look and maybe tell me what could possibly be causing the issue I’m experiencing? Thank you again so much…

from datetime import datetime, time, timedelta
from sanic_scheduler import SanicScheduler, task
from sanic import Sanic, response
from sanic.response import json
from sanic.handlers import ErrorHandler
from sanic.exceptions import NotFound
from requests.exceptions import Timeout
#from vectorclock import * 
from random import * 
import os
import socket
import copy
import requests
import asyncio
import re
import time
import json
from sanic.views import HTTPMethodView

app = Sanic("asgn3_2.py")

env_variables = dict(os.environ)
viewString    = env_variables['VIEW']
viewList      = viewString.split(",")
viewListCopy  = viewList

data = {'dingus':'bungus'}
retrievedDataStore = {}

for x in range(len(viewList)):

    headers = {'content-type':'application/json'}

    url  = "http://" + str(viewList[x]) + "/key-value-store-view/"
    url2 = "http://" + str(viewList[x]) + "/dataStoreDispersal"

    print("This is viewList[x]: " + viewList[x] + "\n")
    print("This is os.environ['SOCKET_ADDRESS']: " + os.environ['SOCKET_ADDRESS'] + "\n")

    if (str(viewList[x]) != str(os.environ['SOCKET_ADDRESS'])):
        try:
            payload  = json.dumps({'socket-address':str(os.environ['SOCKET_ADDRESS'])})
            response = requests.put(url, headers = headers, data=payload)
            #print("THIS IS RESPONSE: " + str(response))
        except requests.exceptions.ConnectionError:
            print("server " + str(viewList[x]) + " is down.")
        
        #Get dat data and store dat data.

        try:
            response           = requests.get(url2, headers = headers)
            #print("This is the json() from the response: " + response.json())
            retrievedDataStore = response.json()
            data               = retrievedDataStore
        except requests.exceptions.ConnectionError:
            print("server " + str(viewList[x]) + " is down.")
        #print(retrievedDataStore)

class dataDisperse(HTTPMethodView):

    async def get(self, request):
        return response.json(json.dumps(data))

class viewOps(HTTPMethodView):

    async def get(self, request):

        initString = ", "

        returnString = initString.join(viewList)

        return response.json({"message":"View retrieved successfully","view":returnString})


    async def put(self, request):

        addrToBeAppendedToView = request.json['socket-address']

        if (addrToBeAppendedToView in viewList):
            return response.json({"error":"Socket address already exists in the view","message":"Error in PUT"}, status=404)

        elif (addrToBeAppendedToView not in viewList):
            viewList.append(addrToBeAppendedToView)
            return response.json({"message":"Replica added successfully to the view"}, status=201)


    async def delete(self, request):

        addrToBeDeletedFromView = request.json['socket-address']
        print("THis is from DELETE: " + str(addrToBeDeletedFromView))
        try:
            viewList.remove(addrToBeDeletedFromView)
        except:
            return response.json({"error":"Socket address does not exist in the view","message":"Error in DELETE"}, status=404)
            
        return response.json({"message":"Replica deleted successfully from the view"})


app.add_route(dataDisperse.as_view(), '/dataStoreDispersal')
app.add_route(viewOps.as_view(), '/key-value-store-view')

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=8085)

Your second container execution line differs from the other two. Is that a typo or intentional?

Circling back after reading through the rest of the information here:

Simple (but robust) orchestration can be done with Docker in swarm mode (note: this is different from Docker Swarm, which was an enterprise product and is now deprecated). This would effectively allow you to do something like the following:

docker service create --replicas=3 [other options] <image> <command>

Docker then handles the distribution of incoming requests to the replicas in the service. It is much easier to get started with than managing k8s, and it can scale with and across additional Docker nodes that are running in swarm mode.
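
If you want to try it, the flow is roughly the following (from memory, so double-check the docs; the service name and ports here are just placeholders):

docker swarm init
docker service create --name replica --replicas=3 -p 8085:8085 <image> <command>
docker service scale replica=5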

Hey sjsadowski and aHopkins…thank you guys so much for the help…

So essentially, I am trying to build a single “view,” or cluster of redundant servers. Just as aHopkins suggested, the first thing I needed to accomplish (per the advice of a friend who has already taken the class) was to get a replica (a member of the server cluster, or “view”) that has gone down and come back up to notify the other replicas in the cluster that it is back online, as well as to retrieve the data store from one of the servers still functioning in the cluster and assign its own data store to the retrieved data. I was having trouble with these initial tasks because I was not taking advantage of (or was misusing) Sanic’s async capabilities: I simply wrote a loop at the top of the application that sent an HTTP request to each server in the cluster, and then sent a second request to each server asking for the data store.

As you can imagine, this was slow as hell; but aside from that, it was causing an issue in which the third server in the cluster would receive a strange error from the second server (the one it had contacted most recently), which would cause it to crash.

I followed aHopkins’ advice and switched to using app.listener(), and everything appears to be working now, although I’m not entirely sure why. I’m fairly sure it had something to do with the fact that I was totally misunderstanding how to use the async function declaration. For some reason, in my stupidity, I thought the async keyword was only for use with Sanic, or was only supported by Sanic, but then I realized: no, this is Python’s version of concurrent programming.
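
Actually, staring at the traceback again, I think I can guess the specific cause (take this with a grain of salt, as I haven’t fully verified it): my old module-level loop rebound the global name response. Here is a minimal reproduction of what I believe was happening:

import requests
from sanic import response  # 'response' names the sanic.response module here

# My old module-level loop rebound this global on the first successful request:
response = requests.Response()  # stand-in for the return value of requests.put(...)

# So in a handler, 'response.json(...)' resolved to requests.Response.json
# (an instance method that takes no positional arguments), not sanic's
# response.json():
response.json({"message": "View retrieved successfully"})
# TypeError: json() takes 1 positional argument but 2 were given

That would also explain why only the second server crashed: the first server’s startup requests all failed with ConnectionError (nobody else was up yet), so its response name was never rebound. Moving the loop into a listener function made response a local variable there, leaving the module name alone.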

As you all know, Python’s global interpreter lock (the GIL) prevents more than one thread from executing Python bytecode at a time. For this reason, concurrent programming in Python is typically done on a single thread with asynchronous function calls, using async function definitions and the “await” keyword.
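
For example (a toy sketch, not my actual code), the kind of thing I should have been doing is letting the requests to all the peers run concurrently instead of blocking on each one in turn:

import asyncio
import httpx

async def fetch_from_peers(peers):
    # Fire off all the requests at once and wait for them together.
    async with httpx.AsyncClient(timeout=2.0) as client:
        tasks = [client.get(f"http://{peer}/health") for peer in peers]
        return await asyncio.gather(*tasks, return_exceptions=True)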

Sorry, I’m not stating all this like you don’t know, it’s just that I didn’t know, and I’m just listing all the stuff I’ve been learning.

My next big step is enforcing causal consistency in my cluster when requests are made to the cluster. I’m gonna continue to learn more about general async programming in Python before I do that though.

I’m not sure if I can use swarm mode, btw, because I think it might violate the requirements of the project.

Thank you both so much for your help, again, btw.

Good luck. Let us know if you have questions.

Hey, aHopkins.

Okay, I have a question. This isn’t really specific to Sanic; it’s more a conceptual question about vector clocks in general, for the key-value storage system I’m trying to build. As I stated above, I am trying to “enforce causal consistency,” and I’m trying to accomplish this using vector clocks. I’m a little confused about how vector clocks work in this context. I think my main goal is to enforce client-centric consistency, so I was just wondering…

Should my vector clocks be specific to each client? After reading some directions and watching some tutorials, it seems like I should be keeping a running record of the version of each key that a client has written to the data store. Does that sound correct, or am I way off? I know it also sounds very infeasible and memory-intensive to keep a running record of every client’s write operations to the store…

Is there any chance you could shed some light on how to do this or give me a kick in the right direction?

Because everything I’ve read states that all you need is a general version number… wait… when vector clocks use the notation “V1, V2, etc.”, are these version numbers specific to keys in the store, or to the store itself? Is it “version 1 of the store, version 2 of the store,” etc., and then do we just have to craft some ancillary method of keeping track of what caused the version number to increase, such as a write to key “x” or a write to key “y”?
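
To make my question concrete, here is roughly how I currently picture a vector clock, with one slot per replica rather than per client or per key (just my sketch, and quite possibly the wrong model):

def new_clock(view):
    # One counter per replica in the view.
    return {replica: 0 for replica in view}

def increment(clock, me):
    # Bump our own slot on every local write.
    clock[me] += 1

def merge(local, received):
    # Pointwise max when we receive another replica's clock.
    for replica, count in received.items():
        local[replica] = max(local.get(replica, 0), count)

def happened_before(a, b):
    # a causally precedes b if a <= b pointwise and a != b.
    keys = set(a) | set(b)
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and a != b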

Any answers or responses you can provide would be greatly appreciated. Thank you so much again for the help you’ve already given!