GROUP BY and custom order

253 views Asked by At

I've read through the answers on MySQL order by before group by but applying it to my query ends up with a subquery in a subquery for a rather simple case so I'm wondering if this can be simplified:

Schema with sample data

For brevity I've omitted the other fields on the members table. Also, there's many more tables joined in the actual application but those are straightforward to join. It's the membership_stack table that's giving me issues.

CREATE TABLE members (
  id int unsigned auto_increment,
  first_name varchar(255) not null,
  PRIMARY KEY(id)
);

INSERT INTO members (id, first_name)
     VALUES (1, 'Tyler'),
            (2, 'Marissa'),
            (3, 'Alex'),
            (4, 'Parker');

CREATE TABLE membership_stack (
  id int unsigned auto_increment,
  member_id int unsigned not null,
  sequence int unsigned not null,
  team varchar(255) not null,
  `status` varchar(255) not null,
  PRIMARY KEY(id),
  FOREIGN KEY(member_id) REFERENCES members(id)
);

-- Algorithm to determine correct team:
-- 1. Only consider rows with the highest sequence number
-- 2. Order statuses and pick the first one found:
--    (active, completed, cancelled, abandoned)

INSERT INTO membership_stack (member_id, sequence, team, status)
     VALUES (1, 1, 'instinct', 'active'),
            (1, 1, 'valor', 'abandoned'),
            (2, 1, 'valor', 'active'),
            (2, 2, 'mystic', 'abandoned'),
            (2, 2, 'valor', 'completed'),
            (3, 1, 'instinct', 'completed'),
            (3, 2, 'valor', 'active');

I can't change the database schema because the data is synchronized with an external data source.

Query

This is what I have so far:

    SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
      FROM membership_stack AS ms
      JOIN (
    SELECT member_id, MAX(sequence) AS sequence
      FROM membership_stack
  GROUP BY member_id
           ) AS t1
        ON ms.member_id = t1.member_id
       AND ms.sequence = t1.sequence
RIGHT JOIN members AS m
        ON ms.member_id = m.id
  ORDER BY m.id, FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned');

This works as expected but members may appear multiple times if their "most recent sequence" involves more than one team. What I need to do is aggregate again on id and select the FIRST row in each group.

However that poses some issues:

  1. There is no FIRST() function in MySQL
  2. This entire resultset would become a subtable (subquery), which isn't a big deal here but the queries are quite big on the application.
  3. It needs to be compatible with ONLY_FULL_GROUP_BY mode as it is enabled on MySQL 5.7 by default. I haven't checked but I doubt that FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned') is considered a functionally dependent field on this resultset. The query also needs to be compatible with MySQL 5.1 as that is what we are running at the moment.

Goal

| id | first_name | sequence |     team |    status |
|----|------------|----------|----------|-----------|
|  1 |      Tyler |        1 | instinct |    active |
|  2 |    Marissa |        2 |    valor | completed |
|  3 |       Alex |        2 |    valor |    active |
|  4 |     Parker |     NULL |     NULL |      NULL |

What can I do about this?

Edit: It has come to my attention that some members don't belong to any team. These members should be included in the resultset with null values for those fields. Question updated to reflect new information.

2

There are 2 answers

2
Paul Spiegel On BEST ANSWER

You can use a correlated subquery in the WHERE clause with LIMIT 1:

SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
FROM members AS m
JOIN membership_stack AS ms ON ms.member_id = m.id
WHERE ms.id = (
    SELECT ms1.id
    FROM membership_stack AS ms1
    WHERE ms1.member_id = ms.member_id
    ORDER BY ms1.sequence desc, 
             FIELD(ms1.status, 'active', 'completed', 'cancelled', 'abandoned'),
             ms1.id asc
    LIMIT 1
)
ORDER BY m.id;

Demo: http://rextester.com/HGU18448

Update

To include members who have no entries in the membership_stack table you should use a LEFT JOIN, and move the subquery condition from the WHERE clause to the ON clause:

SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
FROM members AS m
LEFT JOIN membership_stack AS ms 
    ON  ms.member_id = m.id
    AND ms.id = (
        SELECT ms1.id
        FROM membership_stack AS ms1
        WHERE ms1.member_id = ms.member_id
        ORDER BY ms1.sequence desc, 
                 FIELD(ms1.status, 'active', 'completed', 'cancelled', 'abandoned'),
                 ms1.id asc
        LIMIT 1
    )
ORDER BY m.id;

Demo: http://rextester.com/NPI79503

3
Gordon Linoff On

I would do this using variables.

You are looking for the one membership_stack row that is maximal for your special ordering. I'm focusing just on that. The join back to members is trivial.

select ms.*
from (select ms.*,
             (@rn := if(@m = member_id, @rn + 1,
                        if(@m := member_id, 1, 1)
                       )
             ) as rn
      from membership_stack ms cross join
           (select @m := -1, @rn := 0) params
      order by member_id, sequence desc,
               field(ms.status, 'active', 'completed', 'cancelled', 'abandoned')
     ) ms
where rn = 1;

The variables is how the logic is implemented. The ordering is key to getting the right result.

EDIT:

MySQL is quite finicky about LIMIT in subqueries. It is possible that this will work:

select ms.*
from membership_stack ms
where (sequence, status) = (select ms2.sequence, ms2.status
                            from membership_stack ms2
                            where ms2.member_id = ms.member_id
                            order by ms2.member_id, ms2.sequence desc,
                                     field(ms2.status, 'active', 'completed', 'cancelled', 'abandoned')
                            limit 1
                           );