How do I use objects to represent data from my relational database without making an excessive number of database queries?

103 views Asked by At

I'm trying to design my first complex object-oriented PHP application and I'm confused about how to use objects to represent data from my database. More specifically I'm confused about how to combine multiple objects without making a huge number of database queries.

A simple example: let's say I have two database tables:

products: - product_id
          - manufacturer_id
          - name
          - price

manufacturers: - manufacturer_id
               - name
               - factory_address
               - factory_zipcode
               - factory_country
               - // [50 additional columns]

I represent these using two classes:

class Product {}
class Manufacturer {}

Now let's say I want to create a page which lists 1000 products.

I create an array of 1000 Product objects using a ProductFactory class or similar, something like this:

class ProductFactory {
    function allProducts() {
        $results = db_query("SELECT * FROM products");
        foreach ($results as $result) {
            $product_objects[] = new Product($result);
        }
        return $product_objects;
    }
}

That's fine. That's only a single database query and now I have all the product data - great!

But what if I want to create a slightly different page which also displays 1000 products, but also includes data from the manufacturers table?

Sure, I could change my query to use a JOIN to return columns from the manufacturer table as part of my product query, but that doesn't seem very flexible to me, because:

  1. I might not always need manufacturer data, so loading it in the ProductFactory seems wasteful if I only use it on a single page and nowhere else.
  2. The manufacturer data might be very extensive, and I don't want 50 extra columns bloating my query result when all I need is the manufacturer's name.
  3. If I fetched product and manufacturer data in a single query I don't see how I could elegantly create two separate objects (Product and Manufacturer) from the query results without it being super messy.

I guess I could add a second method to my ProductFactory class, something like:

function allProductsWithManufacturerData() {
    // SELECT * FROM products JOIN manufacturers...
}

But that seems hacky and inflexible. People using my code would not know that they need to specifically call this method if they want to include manufacturer data in the Product objects.

Alternatively I could load data only when it's explicity requested:

echo $productObject->getManufacturer()->name;

That seems like a fairly flexible approach to me. But in my example above, that would result in 1000 extra queries (one for each product on the page) which would be horrible for performance.

Also... what if the Manufacturer object itself also loads a child object of its own? For example a Manufacturer might be associated with an Address object ($Manufacturer->getAddress()->zipcode). Fetching the Address object will require its own query too. So now our query chain is 3 queries deep:

  • A query data for the Product object
    • For every Product... a query for the Product->Manufacturer object
      • For every Manufacturer... a query for the Manufacturer->Address object

This could quickly snowball into an exponentially increasing number of nested queries!

So it seems like I'm being forced into a choice:

  1. Choose to fetch all the data you could ever possibly need at the "top level" (using a single query with JOIN statements)
  2. Or... choose to fetch the bare minimum data and then use extra queries to fetch data as required.

Are these my only choices or am I missing something? Is there a design pattern I don't know about that I can use to leverage more flexibility here?

2

There are 2 answers

2
tereško On

Here is how I solve this issue :)

Let's assume you already got your 1000 products, with manufacturers that only have ID (which you got by selecting from only products table).

$manufacturers = []; 
// I would actually use a Collection class, but that's not important here

foreach ($products as $product) {
    $manufacturers[] = $product->getManufacturer();
}

$manufacturerMapper->populate($manufacturers);

Then the data mapper for manufacturers makes a single select with WHERE manufacturer_id IN (....) condition and "fills in" all of the objects using the setters.

What this relies on is the following behavior:

$foo = new Thing;
$bar = $foo;
$bar->setStuff('blah');
var_dump($foo->getStuff() === 'blah'); /// TRUE

The list manufacturers that you passes to the mapper are the same manufacturers that already exist inside the product entities.

The flaw

This is not a perfect solution, because you have to handle "the manufacturers repeat" situation. While you are dealing with 1:1 case, it's not all that tricky (you just need to make an "index" inside the mapper to know all entries with matching PK).

But this does not actually scale all that well.

The "more correct" way is to use something called Identity map that acts as something like a runtime-cache either inside your mappers or your repositories. The first time I encountered the concept was when I as reading about how to make "proper repository".

Core of the solution is creation of this behavior:

$first = $repo->get(45);
$second = $repo->get(45);

$first->setTitle('Lorem Ipsum');
var_dump($second->getTitle() === 'Lorem Ipsum'); // TRUE

This looks a lot like the previous bit, but the important past is that the repository is NOT returning an entity with same data, but instead is returning the same entity.

I personally implemented this behavior at the mapper-level and not in the repositories themselves, but YMMV.

If you are just starting with OOP, ignore this "overkill" solution and stick with the first part :) This extended version requires some boilerplate code.

0
Your Common Sense On

You have two questions here. One is practical and easily answerable. And another one simply has no answer.

The first one is "How do I use objects to represent data from my relational database without making an excessive number of database queries?" and JOIN is the most natural answer to your predicament, and I don't see much point in the objections you made.

#1 and #2 and are actually the same, and honestly, I don't see how it makes any problem. If you don't need the manufacturer data, then don't use a JOIN query. As simple as that.

#3 can be easily resolved with PDO::ATTR_FETCH_TABLE_NAMES

$pdo->setAttribute(PDO::ATTR_FETCH_TABLE_NAMES, true);
$stmt = $pdo->query("SELECT * FROM product p JOIN manufacturer m ON m.id=p.id");
$row = pdo_table_separator($stmt->fetch(PDO::FETCH_ASSOC));
$product =  new Product($row['product']);
$product->manufacturer = new Manufacturer($row['manufacturer']);

function pdo_table_separator($row) {
    foreach ($row as $col => $value) {
        [$table, $column] = explode(".", $col);
        $return[$table][$column] = $value;
    }
    return $return;
}

Your second question is "how to make my objects to represent my relational data" and it has no (satisfying) answer. Some even compare this problem with Vietnam War that gets harder with each foothold you gain and that you inevitably lose anyway.

So you can just comfort yourself that it's not a problem with your experience or a certain pattern that you just don't know yet.

You can add some strategies, such as "lazy load" (your fetch when needed) or "eager load" (tereško's get manufacturers' ids, fetch them in one go and graft the results into products). You can add some switch that tells your code which strategy to use. But you see, there is essentially no solution that doesn't seem hacky or inflexible.

Personally, I would let my PHP code to write SQL for me only for the most basic queries. While every other query, including one requesting products with manufacturers, I'd write explicitly, accompanied with a mapping code, altogether put in a method of a repository class.