Getting rid of 'Zombie' processes.

I have read quite a few posts while looking for a way to kill processes marked as 'Zombie'...

In theory (and I believe this is true, since it makes sense to me) zombie processes are harmless: the process no longer really exists, because it died or was killed at some point in the past, so it uses no CPU and practically no memory. It only keeps its entry in the process table, which is why it still appears in the process list...
The situation is not unexpected either: Linux anticipates and handles it, as shown by the fact that the ps command detects such processes and marks them as zombies (state Z, tagged <defunct>).
Besides, zombie processes should be rare, very rare, so why bother about them?... let them rot!

...Well, but what if your production server has far more zombie processes than living ones? What if you have several thousand zombies crawling around? What if a ps run yields so many results that it becomes unreadable?
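
Before reaching for a cure, it helps to measure the infection. The following is a minimal sketch, assuming bash and the procps ps shipped with Debian: it counts processes whose state starts with Z, groups them by parent PID (which usually points straight at the guilty service) and prints the kernel's PID limit, since every zombie keeps its PID reserved.

```shell
#!/bin/bash
# Count zombies: processes whose STAT column starts with "Z".
# grep -c prints 0 (and exits 1) when nothing matches, which is fine here.
count=$(ps -e -o stat= | grep -c '^Z')
echo "zombies: $count"

# Group zombies by parent PID to spot the parent that never reaps its children.
ps -e -o ppid=,stat= | awk '$2 ~ /^Z/ { n[$1]++ }
  END { for (p in n) print "parent " p ": " n[p] " zombie(s)" }'

# PIDs wrap around when they reach this value, so it bounds how many
# processes, zombies included, can exist at the same time.
pid_max=$(cat /proc/sys/kernel/pid_max)
echo "pid_max: $pid_max"
```

If a single parent accounts for most of the zombies, that is the process to talk to.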

There are lots of places where you'll read that this is impossible to solve without killing the 'parent process' or, even worse, rebooting the system: 'zombies are already dead, you cannot kill them'... and things like that... well, the truth is different: of course you can get rid of them!!!!

'Zombies' turned out to be 'Specters'!

Working with streaming servers, especially Icecast2, has made me face huge zombie infections... In particular, stream-hosting servers that run dozens of Icecast2 instances for lots of small customers are the ideal breeding ground for a zombie nightmare.

Understanding what zombies really are led me away from periodic server reboots: it let me run a precise Google search with the proper words and find a script that cleanly gets rid of them all at once!

So, to understand what a 'Zombie' is, we have to know how they end up moaning in your garden... And that is where I realized that those zombies were not really zombies, but mere visible specters: lost souls trapped in the world of the living, waiting to be freed somehow from some past evil so they can vanish.
Zombies use no CPU time and practically no RAM (no rotting flesh), just a 'ghostly appearance' in process lists and one PID each... but again, how did they get here?

To explain this, we have to consider that a process, in a Linux environment, can spawn 'child' processes.
So, we have the 'parent process' and the 'child processes'.
If a parent process terminates, its children are not terminated along with it: they are re-parented to init (PID 1), or to a designated 'subreaper' process, which takes over the parent's duties.
Here is the key! The parent process is responsible for its children's life cycle: the parent spawns them and, when a child exits, he, and only he, has to collect its exit status through the wait() or waitpid() system calls (this is known as 'reaping' the child).
What happens if, somehow, a child process dies or gets killed and the parent never calls wait()? The kernel has already freed the child's memory, but it must keep the child's process ID and exit status until the parent asks for them, so the soul of the dead process stays trapped in your ps lists for as long as the parent lives... Interestingly, the parent may happily spawn new children to get the work done! So everything looks normal!
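
This mechanism is easy to reproduce. Below is a minimal sketch, assuming bash and the procps ps shipped with Debian, that manufactures one harmless zombie on purpose: its parent is a sleep process, and sleep never calls wait() for anything.

```shell
#!/bin/bash
# The subshell starts "sleep 1" in the background, then replaces itself (exec)
# with "sleep 60". sleep never calls wait(), so once "sleep 1" exits it stays
# in the process table as a zombie child of the "sleep 60" process.
( sleep 1 & exec sleep 60 ) &
parent=$!
sleep 2

# The child's STAT column starts with "Z"; the args column shows <defunct>.
state=$(ps -o stat= --ppid "$parent" | tr -d ' ')
ps -o ppid,pid,stat,args --ppid "$parent"

# Killing the parent re-parents the zombie to init (or a subreaper),
# which normally reaps it right away.
kill "$parent"
```

This is also why killing the parent, or rebooting, clears zombies: it works, just far less gently.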

Convince the parent: 'let him go in peace...'

So, instead of gory zombie killing, what we should do is have a conversation with the parent process and ask it to collect the exit status of its already dead child (that is, to call waitpid() on it). That immediately frees the ghost from its spell: it vanishes from process listings and its PID is released... sweet!!!

But how do we do this?
This is where, at least on Debian systems, we use the gdb command.
To make things even easier, I will introduce Mitch Milner's amazing script here, as is, with no modification, since all the credit should go to him!

Debugger tools: gdb command

The debugger lets us bind to a running process with its attach command. Once gdb is attached to the parent, the call command makes the parent itself execute a function, in this case waitpid(), passing it the PID of the target 'Zombie'.

So, if you do not have gdb installed on your system yet, install it as usual (as root):

apt-get install gdb
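
To try the attach-and-call idea on a single process before unleashing a whole script, here is a minimal sketch. It assumes gdb is installed and that you may ptrace the parent (typically root; a strict kernel.yama.ptrace_scope setting can also block it), and it creates its own disposable zombie so no real service is touched. The (int) cast is there because gdb 8.1 and later refuse to call a function without debug information unless told its return type; older gdb versions accepted the call without it.

```shell
#!/bin/bash
# Disposable zombie: "sleep 60" never calls wait() for its "sleep 1" child.
( sleep 1 & exec sleep 60 ) &
parent=$!
sleep 2
zombie=$(ps -o pid= --ppid "$parent" | tr -d ' ')
before=$(ps -o stat= -p "$zombie" | tr -d ' ')
echo "parent=$parent zombie=$zombie state=$before"

# Attach to the parent and make it reap the dead child itself.
# gdb detaches from the parent automatically when the batch run ends.
gdb -batch -p "$parent" -ex "call (int)waitpid($zombie, 0, 0)"
status=$?

# On success the zombie's PID no longer appears in the process list.
remaining=$(ps -o pid= -p "$zombie" | tr -d ' ')
echo "gdb exit status: $status; zombie still listed: ${remaining:-no}"

kill "$parent"
```

The parent survives the operation untouched; only its dead child disappears.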

gdb can also run in 'batch' mode: we pass it a text file (with the -x option) containing the commands it should perform, and it exits when they are done.

As you'll see (Mitch Milner's script is nicely commented!), it first scans the process list searching for zombies, identifies their parents and, at the same time, generates a ready-to-use temporary gdb batch file, /tmp/zombie_slayer.txt.
This file, which is deleted and recreated on every run, contains the commands to attach, one by one, to every process that has some 'zombified' child process, and ask it to free its dead children from the hall of the lost souls...
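
The structure of that file is simple: for each parent, one attach, one call per zombie child, then one detach. Below is a minimal sketch of that generation step as a single awk pipeline, assuming input in the column order of ps -e -o ppid=,pid=,stat= (make_gdb_commands is a hypothetical helper name, not part of Mitch Milner's script). It differs from the script in three small ways: it accepts any state starting with Z, since ps can report a zombie as e.g. Z+; it casts the waitpid() call to int, as gdb 8.1 and later require; and it keeps all the bookkeeping inside awk, because in a bash pipeline a while loop runs in a subshell, so variables it sets are lost once the loop ends.

```shell
#!/bin/bash
# Read "ppid pid stat" lines on stdin and print gdb batch commands,
# with one attach/detach pair per parent.
make_gdb_commands() {
  # Keep only zombies, then sort by parent PID so each parent appears once.
  awk '$3 ~ /^Z/ { print $1, $2 }' | sort -n -k1,1 -k2,2 | awk '
    $1 != last { if (last != "") print "detach"; print "attach " $1; last = $1 }
    { print "call (int)waitpid(" $2 ", 0, 0)" }
    END { if (last != "") print "detach" }'
}

# Sample input: two zombies of parent 100, one sleeping process, one zombie of 150.
printf '%s\n' "100 201 Z" "100 202 Z+" "300 301 S" "150 151 Z" | make_gdb_commands

# On a live system (as root):
#   ps -e -o ppid=,pid=,stat= | make_gdb_commands > /tmp/zombie_slayer.txt
#   gdb -batch -x /tmp/zombie_slayer.txt
```

The sample input yields attach 100, two waitpid calls, detach, then attach 150, one waitpid call and a final detach.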

Mitch Milner's 'Zombie Slayer' shell script:

#!/bin/bash
##################################################################
# Script: Zombie Slayer
# Author: Mitch Milner
# Date:   03/13/2013 ---> A good day to slay zombies
#
# Requirements: yum install gdb / apt-get install gdb
#               permissions to attach to the parent process
#
# This script works by using a debugger to
# attach to the parent process and then issuing
# a waitpid to the dead zombie. This will not kill
# the living parent process.
##################################################################

clear
# Wait for user input to proceed, give user a chance to cancel script
echo "***********************************************************"
echo -e "This script will terminate all zombie process."
echo -e "Press [ENTER] to continue or [CTRL] + C to cancel:"
echo "***********************************************************"
read cmd_string
echo -e "\n"

# initialize variables
intcount=0
lastparentid=0

# remove old gdb command file
rm -f /tmp/zombie_slayer.txt

# create the gdb command file
echo "***********************************************************"
echo "Creating command file..."
echo "***********************************************************"
ps -e -o ppid,pid,stat,command | grep Z | sort | while read LINE; do
  intcount=$((intcount+1))
  parentid=`echo $LINE | awk '{print $1}'`
  zombieid=`echo $LINE | awk '{print $2}'`
  verifyzombie=`echo $LINE | awk '{print $3}'`

  # make sure this is a zombie file and we are not getting a Z from
  # the command field of the ps -e -o ppid,pid,stat,command
  if [ "$verifyzombie" == "Z" ]
  then
    if [ "$parentid" != "$lastparentid" ]
    then
      if [ "$lastparentid" != "0" ]
      then
        echo "detach" >> /tmp/zombie_slayer.txt
      fi
    echo "attach $parentid" >> /tmp/zombie_slayer.txt
    fi
    echo "call waitpid ($zombieid,0,0)" >> /tmp/zombie_slayer.txt
    echo "Logging: Parent: $parentid  Zombie: $zombieid"
    lastparentid=$parentid
  fi
done
if [ "$lastparentid" != "0" ]
then
  echo "detach" >> /tmp/zombie_slayer.txt
fi

# Slay the zombies with gdb and the created command file
echo -e "\n\n"
echo "***********************************************************"
echo "Slaying zombie processes..."
echo "***********************************************************"
gdb -batch -x /tmp/zombie_slayer.txt
echo -e "\n\n"
echo "***********************************************************"
echo "Script complete."
echo "***********************************************************"

Simply superb...
I'm safely using a modified version of this script (I simply commented out its interactive part) on streaming servers in production environments (I'm so confident in it that I added it to a cron job), cleaning up hordes of thousands of zombie processes without any issue... Thank you, Mitch!!!