Running over localhost in a Kubernetes Job Pod — can be adapted for other environments
I was faced with an interesting scenario recently: I had a task-handling system written in Python 3.11 and had to integrate a process for inference against a Deep Learning model. The inference model was served via FastAPI on a Uvicorn server, as was the accompanying Task Handler.
The reality is that time and budget shape the way pieces become coupled, and things may start out inefficiently: instead of using two processes to handle this task, implementing the Task Handler and the model Inference in the same entity would be more efficient. However, I digress.
It was simple to implement the communication over the Kubernetes Pod’s localhost — or over the localhost of any OS where the two processes are running. I just had to point a couple of request calls from my Task Handler Client at localhost instead of an external service.
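Because the two Containers share the Pod’s network namespace, such a call can be sketched with nothing but the standard library. The port and the `/predict` path below are assumptions for illustration, not the original service’s values:

```python
import http.client
import json

def run_inference(payload: dict, port: int = 8000, path: str = "/predict") -> dict:
    """POST one task's payload to the co-located Inference process.

    "localhost" resolves inside the Pod's shared network namespace,
    so no Service or external DNS name is needed.
    """
    conn = http.client.HTTPConnection("localhost", port, timeout=30)
    try:
        conn.request(
            "POST",
            path,
            body=json.dumps(payload),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A dedicated HTTP client library such as `requests` would work just as well; the point is only that the hostname is `localhost`.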
Once all the tasks had been consumed from a queue — perhaps a topic for another article — the Task Handler was able to exit gracefully. The model Inference process, however, was not, and was not receiving a SIGTERM signal either. To force its abstracted hand, the Task Handler directed its Client to send a shutdown request to the running Inference process.
My first attempt used sys.exit(0) to end the process (an exit code of zero is regarded as “successful” completion, and any non-zero value as an error). The Uvicorn server would recognise the request and execute the call, but then fail to terminate: sys.exit() only raises a SystemExit exception, and the server’s machinery absorbs it rather than letting it take the process down.
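The failure mode can be reproduced in miniature, entirely outside Uvicorn. sys.exit() does not terminate the process outright; it raises SystemExit, and anything between the call site and the top of the stack that intercepts it (as a server’s request-handling machinery can) absorbs the “exit” while the process lives on:

```python
import sys

def shutdown_handler():
    # Intended to stop the server...
    sys.exit(0)

caught_code = None
try:
    shutdown_handler()
except SystemExit as exc:
    # ...but the exception is interceptable, like any other.
    caught_code = exc.code  # the intended exit status, never applied

assert caught_code == 0  # and the process is still running
```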
Therefore, to work around the abstracted Inference process, when the Task Handler requested a shutdown, the Inference process issued a SIGTERM signal to its own process ID, os.getpid(), on the (Container) OS — starting termination of self.
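The self-signalling step can be shown in isolation. Here a stand-in handler plays the role of Uvicorn’s graceful-shutdown handler, recording the signal rather than exiting so the delivery is observable (POSIX only):

```python
import os
import signal

received = []

# Stand-in for Uvicorn's graceful-shutdown handler.
def handle_sigterm(signum, frame):
    received.append(signum)

signal.signal(signal.SIGTERM, handle_sigterm)

# The process delivers SIGTERM to its own PID, just as the Inference
# process does when asked to shut down by the Task Handler.
os.kill(os.getpid(), signal.SIGTERM)

assert received == [signal.SIGTERM]  # the handler has run
```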
This puts the Inference process into a termination state, triggering the shutdown event (referring to the Gist below) before it terminates successfully. Both Containers in the Pod can now terminate, preventing hanging Pods from holding on to resources that should be scaled down on completion.
My specific scenario was a Kubernetes Job with two Containers in a single Pod — a Task Handler and a model Inference process. Ideally, as mentioned earlier, these two processes would be either more tightly coupled over sockets or integrated into a single process, and they will be. Breaking complex environments down into separable pieces can aid in understanding data flow; however, the goal should be to move toward a more effective and efficient organisation. Separated pieces, once identified as tightly coupled, can be assessed and redesigned. This is the Agile way.