Monitoring memory usage and run time of a subprocess with Python on Unix

I want to track the elapsed time and memory usage of a process (a bioinformatics tool) that I launch from a Python script. I run the process on a Unix cluster and save the monitoring parameters in report_file.txt. To measure the elapsed time I use the resource module, and to monitor memory usage I use the psutil library.

My main objective is to compare the performance of different tools, so I don't want to restrict memory or time in any way.

import sys
import os
import subprocess, resource
import psutil
import time

def get_memory_info():
    return {
        "total_memory": psutil.virtual_memory().total / (1024.0 ** 3),
        "available_memory": psutil.virtual_memory().available / (1024.0 ** 3),
        "used_memory": psutil.virtual_memory().used / (1024.0 ** 3),
        "memory_percentage": psutil.virtual_memory().percent
    }


# Open file to capture process parameters
outrepfp = open(tbl_rep_file, "w")


### Start measuring the process parameters
SLICE_IN_SECONDS = 1

# Start measuring time
usage_start = resource.getrusage(resource.RUSAGE_CHILDREN)

# Create the line for process execution
cmd = '{0} {1} --tblout {2} {3}'.format(bioinformatics_tool, setups, resultdir, inputs)

# Execute the process
r = subprocess.Popen(cmd.split(), stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, encoding='utf-8')

# End measuring time
usage_end = resource.getrusage(resource.RUSAGE_CHILDREN) # end measuring resources

# Save memory measures
resultTable = []
while r.poll() is None:
    resultTable.append(get_memory_info())
    time.sleep(SLICE_IN_SECONDS)

# In case the process fails
if r.returncode: sys.exit('FAILED: {}\n{}'.format(cmd, r.stderr.read()))


# Extract used memory
memory = [m['used_memory'] for m in resultTable]

# Count the elapsed time
cpu_time_user = usage_end.ru_utime - usage_start.ru_utime
cpu_time_system = usage_end.ru_stime - usage_start.ru_stime

# Write measurement to report_file.txt
outrepfp.write('{0} {1} {2} {3}\n'.format(bioinformatics_tool, cpu_time_user, cpu_time_system, memory))

For a given process, I received my report_file.txt:

bioinformatics_tool 0.0 0.0 [48.16242980957031, 47.76295852661133]

Could you please help me understand why the elapsed time is showing as 0, even though memory usage was monitored for 2 seconds and two values were captured?

Previously, I had implemented a time-capturing mechanism that reported around 4 seconds of elapsed time for the same process, which seems inconsistent with my current memory usage measurement.

***** EDIT *****

When I moved usage_end after the r.poll() loop, I got a non-zero time measurement, but also more memory samples than the measured time would suggest:

bioinformatics_tool 1.699341 0.063338 [18.01854705810547, 18.022377014160156, 17.966495513916016, 18.160659790039062, 18.281261444091797, 18.449081420898438, 18.343822479248047]
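
A note on these numbers: ru_utime and ru_stime report CPU time consumed by the child, not wall-clock time, so they can be well below the number of one-second samples whenever the tool spends time waiting (for example on disk I/O; whether that applies to this tool is an assumption). The sketch below illustrates the difference with a stand-in child that only sleeps, which is not the actual tool, and the variable names are illustrative.

```python
import resource
import subprocess
import sys
import time

# Stand-in child (an assumption, not the real tool): it sleeps and uses
# almost no CPU while doing so.
cmd = [sys.executable, "-c", "import time; time.sleep(0.5)"]

usage_start = resource.getrusage(resource.RUSAGE_CHILDREN)
wall_start = time.perf_counter()

# subprocess.run() blocks until the child has finished and been waited for.
subprocess.run(cmd, check=True)

wall_elapsed = time.perf_counter() - wall_start
usage_end = resource.getrusage(resource.RUSAGE_CHILDREN)

# CPU time = user time + system time accumulated by waited-for children.
cpu_time = ((usage_end.ru_utime - usage_start.ru_utime)
            + (usage_end.ru_stime - usage_start.ru_stime))

print("wall-clock: {:.3f} s, CPU: {:.3f} s".format(wall_elapsed, cpu_time))
```

The wall-clock figure should come out at roughly half a second or more, while the CPU figure stays close to the interpreter's start-up cost. Recording time.perf_counter() next to getrusage gives both values for each tool being compared.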

There is 1 answer

Answer by matias:

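If the objective is to measure the running time of the process launched by subprocess.Popen, then usage_end = resource.getrusage(resource.RUSAGE_CHILDREN) should probably come after the loop that polls for process termination. Popen returns as soon as the child has been started, and RUSAGE_CHILDREN only covers children that have terminated and been waited for, so a call made right after Popen sees no usage from that child yet.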
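
A minimal sketch of that reordering, assuming a stand-in child command (a Python one-liner, not the actual tool) and a hypothetical helper named run_and_measure. It reads RUSAGE_CHILDREN only after poll() reports that the child has exited. It also returns ru_maxrss, the peak resident set size of the largest waited-for child, which Linux reports in kilobytes (macOS reports bytes); unlike psutil.virtual_memory(), which describes the whole machine, this figure is specific to child processes.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd, poll_interval=0.1):
    """Run cmd; return (user CPU s, system CPU s, peak RSS of largest child)."""
    usage_start = resource.getrusage(resource.RUSAGE_CHILDREN)

    # Popen returns as soon as the child is started, not when it finishes.
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)

    # Wait for termination; periodic memory sampling could go in this loop.
    while proc.poll() is None:
        time.sleep(poll_interval)

    # poll() has reaped the child, so its usage is now included.
    usage_end = resource.getrusage(resource.RUSAGE_CHILDREN)

    if proc.returncode != 0:
        raise RuntimeError("FAILED: {}".format(" ".join(cmd)))

    cpu_user = usage_end.ru_utime - usage_start.ru_utime
    cpu_system = usage_end.ru_stime - usage_start.ru_stime
    # ru_maxrss is a running maximum over all waited-for children, so it is
    # reported as-is rather than differenced against usage_start.
    return cpu_user, cpu_system, usage_end.ru_maxrss


if __name__ == "__main__":
    # Stand-in workload (an assumption): burn some CPU in a child interpreter.
    demo_cmd = [sys.executable, "-c", "sum(range(10**7))"]
    user, system, peak_rss = run_and_measure(demo_cmd)
    print("user={:.3f}s system={:.3f}s peak_rss={}".format(user, system, peak_rss))
```

stderr is sent to DEVNULL here because leaving it as a PIPE that nobody reads while polling can block a child that writes a lot of output; if the error text is needed, redirecting stderr to a file is one option.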