Unbounded memory usage using pyo3::types::PyIterator

I have a simple #[pymethods] impl that uses a &pyo3::types::PyIterator in the most naive way:

/// Construct a `Foo` from an iterator
#[staticmethod]
fn from_iter(iter: &pyo3::types::PyIterator) -> PyResult<Self> {
    let mut foo = Self::new()?;
    for obj in iter {
        foo.bar(obj?)?;
    }
    Ok(foo)
}

I've noticed that memory usage grows unbounded while the iterator is executing. The pyo3 documentation on memory management seems to mention this exact situation, though I am not sure I understand the problem correctly:

As far as I can see, since we enter the function with a &'a PyIterator, we already hold the GIL, with 'a being bound to the GIL. As the PyIterator returns &'a PyAny during iteration, and because those objects must be valid for at least 'a, the iterated-over objects do not get destroyed after each iteration of the loop; therefore memory usage grows until the function returns and everything is collected in one fell swoop.

What's the correct strategy here to have each obj destroyed while looping? The documentation points to using unsafe, which I am unsure if the simple code above actually needs.
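For anyone wanting to reproduce this, the effect is easy to observe from the Python side with a class that counts its own destructions (Tracked and tracked_objects are illustrative names, not part of pyo3). In a plain CPython loop each object dies as soon as the loop variable is rebound; the pyo3 loop above keeps all of them alive instead:

```python
class Tracked:
    """Counts how many instances have been destroyed so far."""
    destroyed = 0

    def __del__(self):
        Tracked.destroyed += 1

def tracked_objects(n):
    """Yield n freshly created Tracked instances."""
    for _ in range(n):
        yield Tracked()

# Baseline: in pure CPython, rebinding `obj` drops the previous object,
# so destruction happens *during* the loop, not after it.
for obj in tracked_objects(3):
    pass

# Two of the three objects are already gone; only the last one is
# still bound to `obj`.
assert Tracked.destroyed == 2
```

Feeding the same generator to the Rust from_iter instead shows destroyed staying at 0 until the call returns, which is the unbounded growth described above.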


There are 2 answers

user2722968 (accepted answer)

Answering my own question:

The solution is, indeed, to use unsafe as described in PyO3's documentation on memory management. The unsafety is brought in because we need to destroy the objects that are being iterated over, while the interpreter has no way to determine if the Rust-part is secretly holding on to them.

fn from_iter(py: Python, mut iter: &pyo3::types::PyIterator) -> PyResult<Self> {
    let mut foo = Self::new()?;
    // Explicit `loop` instead of `for`-loop, so that the `pool`
    // is active *before* `obj` is returned from the `PyIterator`
    loop {
        // SAFETY: We only derive, and then never observe
        // `obj` outside/after each iteration
        let pool = unsafe { py.new_pool() };
        match iter.next() {
            Some(obj) => {
                foo.bar(obj?)?;
            }
            None => break,
        }
        drop(pool); // Explicit for clarity
    }
    Ok(foo)
}
VonC

When you iterate over a PyIterator, each object returned is a reference (&'a PyAny) that lives as long as the GIL (Python Global Interpreter Lock) is held. That means none of the Python objects you are iterating over will be released until the GIL scope ends, leading to the unbounded memory usage you observed.

That behavior is a result of how PyO3 bridges Rust's safety guarantees with Python's garbage collection, not an issue with Rust's lifetime annotations themselves.
As noted by Jmb in the comments, Rust's lifetimes are indeed descriptive rather than prescriptive. They are used by the Rust compiler to make sure references do not outlive the data they point to, preventing dangling references and ensuring memory safety.
However, lifetimes themselves do not alter how the code behaves at runtime; they are a compile-time check. If your code's behavior does not match the specified lifetimes, you will get a compile-time error rather than a runtime adjustment to make the behavior match the lifetimes.

So it is worth focusing on how PyO3 handles Python objects, rather than the Rust lifetime annotations. One approach to manage memory more effectively is to convert the &PyAny references to owned PyObject instances, which can be dropped explicitly, allowing Python's garbage collector more opportunities to reclaim memory. However, this does not change the fundamental way that Rust's lifetimes work or how PyO3 interacts with Python's garbage collector.

use pyo3::prelude::*;
use pyo3::types::PyIterator;

/// Construct a `Foo` from an iterator
#[staticmethod]
fn from_iter(py: Python, iter: &PyIterator) -> PyResult<Self> {
    let mut foo = Self::new()?;
    for obj in iter {
        let obj = obj?.to_object(py); // Convert to an owned `PyObject`
        foo.bar(obj.as_ref(py))?;
        // `obj` is dropped here, decrementing its reference count
    }
    Ok(foo)
}

That would convert each iterated object into an owned PyObject, which is immediately dropped at the end of the loop iteration. That way, you are not relying on Rust's lifetime annotations to manage memory, but instead using PyO3's API to control the lifetime of Python objects more explicitly.


Running your example in the most simplistic way (for obj in iter { drop(obj?.to_object(py)) }) shows the same behavior, though; memory usage balloons during iteration, until all objects are collected after returning to the interpreter.

It means that converting PyO3 references to owned PyObject instances and dropping them does not necessarily free the underlying memory immediately. Dropping the owned PyObject removes one reference, but it is not necessarily the last one: the &PyAny handed out by the iterator remains registered with the GIL-bound pool for the duration of the call, and CPython's cyclic collector only runs periodically.
That is why you see the memory usage balloon: the objects become eligible for collection, but are not actually reclaimed until the remaining references go away.
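One caveat on that last point (a pure-Python sketch, independent of pyo3): CPython frees most objects through reference counting the instant their count reaches zero, and the collector driven by gc.collect() only matters for reference cycles. So triggering collection manually helps only if the iterated objects participate in cycles:

```python
import gc

class Tracked:
    """Counts how many instances have been destroyed so far."""
    destroyed = 0

    def __del__(self):
        Tracked.destroyed += 1

# Acyclic objects are freed by reference counting alone:
obj = Tracked()
del obj
assert Tracked.destroyed == 1  # gone immediately, no gc.collect() needed

# Objects in a reference cycle survive until the cyclic collector runs:
a, b = Tracked(), Tracked()
a.partner, b.partner = b, a
del a, b
gc.collect()
assert Tracked.destroyed == 3  # the cycle's two objects are freed by the collector
```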

You could try to periodically trigger Python's garbage collection manually from your Rust code. That is more of a workaround and should be used judiciously: frequent calls to the garbage collector can negatively impact performance.

use pyo3::prelude::*;
use pyo3::types::PyIterator;

#[pyfunction]
fn iterate_with_gc(py: Python, iter: &PyIterator) -> PyResult<()> {
    // The GIL is already held inside a `#[pyfunction]`; take it as a parameter
    let gc_module = PyModule::import(py, "gc")?;

    let mut count = 0u64;
    for obj in iter {
        drop(obj?.to_object(py));
        count += 1;

        // Trigger Python's garbage collection every 100 iterations
        if count % 100 == 0 {
            gc_module.call_method0("collect")?;
        }
    }
    Ok(())
}

#[pymodule]
fn my_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(iterate_with_gc, m)?)?;
    Ok(())
}