python - comparing a newly written file with filecmp.cmp() always returns False?

user797963

I must be making a stupid mistake here, because this should be working. I'm thinking the file is staying open or something, and it's driving me nuts.

This is for some regression test cases where I compare the output a script generates when run against mock files with known-good output files (key files).

Here is a simple example:

def run_and_compare(self, key_file, out_file, option):
    print filecmp.cmp(out_file, key_file) # always True (as long as I've run this before, so the out_file exists already)
    cmd = './analyze_files.py -f option'
    with open(out_file, 'wb') as out:
        subprocess.Popen(cmd.split(), stdout=out, stderr=subprocess.PIPE)
    print filecmp.cmp(out_file, key_file) # always False 
    time.sleep(5)
    print filecmp.cmp(out_file, key_file) # always True 

I really don't want to keep that sleep in the test! How can I be sure the out file is OK to compare without using the sleep? I've tried calling out.close(), but it makes no difference, and it shouldn't be needed anyway as long as I'm using 'with'. I'm using Python 2.6.4, if that matters here.

Martijn Pieters

It doesn't matter that you opened the output file object as a context manager. It wouldn't even matter if you explicitly, manually closed the file object.

That's because when you hand a Python file object to subprocess.Popen(), all it takes from that file object is the file handle (the file descriptor returned by fileno()), an integer that your OS uses to identify an open file. The subprocess module then uses os.dup2() to clone that file handle onto the STDOUT file handle of the child process; this is what causes the output of that child process to go to your designated file on disk.

Because the file handle is duped, closing the original Python file object (and indirectly, the original OS file handle) won't actually close the file, because that second file handle still keeps it open.

The file data appears after a few seconds because subprocess.Popen() returns as soon as the child process has been started, without waiting for it to finish. Only once the subprocess completes has all of its output been written, and only then is that other, duped file handle closed.
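
The effect is easy to reproduce in isolation. The following is a minimal sketch, not the asker's script: the helper name sizes_before_and_after and the inline child program (which sleeps briefly and then writes one line) are made up for illustration. It measures the output file right after the 'with' block ends and again once the child has exited.

```python
import os
import subprocess
import sys
import tempfile


def sizes_before_and_after(delay=0.5):
    """Return the output file's size right after Popen() and after the child exits."""
    # Hypothetical stand-in for the real script: sleep, then write one line.
    child_code = "import sys, time; time.sleep(%r); sys.stdout.write('done\\n')" % delay
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as out:
            p = subprocess.Popen([sys.executable, '-c', child_code], stdout=out)
        # The Python file object is closed now, but the child still holds
        # its own copy of the descriptor and has not written anything yet.
        size_before = os.path.getsize(path)
        p.wait()  # fine here: no PIPE is involved, so no pipe buffer can fill up
        size_after = os.path.getsize(path)
    finally:
        os.remove(path)
    return size_before, size_after
```

With a child that sleeps before writing, the first size should be 0 and the second greater than 0, which mirrors the False-then-True pattern in the question.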

Instead of waiting for a few seconds, wait for the subprocess to complete using the Popen.communicate() method:

p = subprocess.Popen(cmd.split(), stdout=open(out_file, 'wb'),
                     stderr=subprocess.PIPE)
stdout, stderr = p.communicate()  # stdout will always be None

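I inlined the open() call, because there is no other use for that file object once subprocess.Popen() has retrieved the file handle from it. You could also use os.open() instead of open() and save yourself creating a Python file object where only a file handle is enough. Note that os.open() takes flag constants such as os.O_WRONLY | os.O_CREAT | os.O_TRUNC rather than a mode string like 'wb', and that you then have to close the returned handle yourself with os.close().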
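
A minimal sketch of the descriptor-only variant follows. The helper name run_to_fd is hypothetical, and the flags are an assumption chosen to approximate what open(path, 'wb') does, since os.open() works with flag constants and not with mode strings.

```python
import os
import subprocess


def run_to_fd(cmd, out_path):
    """Run cmd with stdout sent to out_path, using a bare OS-level descriptor."""
    # Roughly what mode 'wb' does: write-only, create if missing, truncate.
    # os.O_BINARY only exists on Windows, hence the getattr() fallback.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, 'O_BINARY', 0)
    fd = os.open(out_path, flags, 0o644)
    try:
        p = subprocess.Popen(cmd, stdout=fd, stderr=subprocess.PIPE)
    finally:
        # The child has its own copy; the parent's descriptor is no longer needed.
        os.close(fd)
    _, stderr = p.communicate()  # waits for exit and drains the stderr pipe
    return p.returncode, stderr
```

The parent closes its descriptor right after Popen() returns; the child keeps writing through its own duplicate until it exits.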

Don't use p.wait() here. Because you are using a pipe for the STDERR stream of the child process, you can deadlock if the child writes a lot of data to STDERR and you never read it: the pipe buffer fills up, the child blocks on its next write, and p.wait() ends up waiting forever.
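
Putting it together, the test helper from the question could look roughly like the sketch below. It rests on a few assumptions that are not in the original: the command is passed in as an argument list instead of being built from a string, stderr is returned alongside the result to help with debugging, and shallow=False is passed to filecmp.cmp() so the file contents are always compared.

```python
import filecmp
import subprocess


def run_and_compare(cmd, out_file, key_file):
    """Run cmd with stdout redirected to out_file, then compare with key_file.

    Returns (files_match, stderr_bytes).
    """
    with open(out_file, 'wb') as out:
        p = subprocess.Popen(cmd, stdout=out, stderr=subprocess.PIPE)
        # communicate() keeps reading stderr until the child exits, so a
        # chatty child cannot block on a full pipe the way it could with wait().
        _, stderr = p.communicate()
    # The child has exited, so out_file is complete at this point.
    # shallow=False compares contents instead of trusting os.stat() signatures.
    return filecmp.cmp(out_file, key_file, shallow=False), stderr
```

No sleep is needed: by the time communicate() returns, the child has finished and everything it wrote is in the file.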
