MySQL Max of a Date not returning the correct tuple

267 views Asked by At

I have a table "messages", that stores messages sent to people over time, regarding some items.

The structure of the messages table is:

message_id user_id date_sent created_at

For each user, I can have multiple tuples in the table. Some of these messages are already sent, and some are not sent yet.

I'm trying to get the last created message for each user. I'm using max(created_at) and a group_by(user_id), but the associated message_id is not the one associated with the max(created_id) tuple.

Table data:

   message_id | user_id | date_sent | created_at
----------------------------------------------
    1           1       2021-07-01  2021-07-01        
    2           1       2021-07-02  2021-07-02
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04
    5           1       2021-07-22  2021-07-22
    6           1       NULL        2021-07-23
    7           2       NULL        2021-07-29
    8           1       NULL        2021-07-29
    9           3       2021-07-29  2021-07-29

My Select:

select * from messages ma right join 
( SELECT max(mb.created_at), message_id
FROM `messages` mb WHERE mb.created_at <= '2021-07-24' 
group by user_id)
mc on ma.message_id=mc.message_id

the result is

   message_id | user_id | date_sent | created_at
----------------------------------------------      
    5           1       2021-07-22  2021-07-23
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04

I don't know why but for user 1, the message_id returned is not the one associated with the tuple that has the max(created_at).

I was expecting to be: (get the tuple with the max(date_sent) of the select grouped by user_id)

   message_id | user_id | date_sent | created_at
----------------------------------------------      
    6           1       NULL        2021-07-23
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04

Any idea? Any help? thank you.

1

There are 1 answers

0
O. Jones On BEST ANSWER

You're stumbling over MySQL's notorious nonstandard extension to GROUP BY. It gives you the illusion you can do things you can't. Example

 SELECT max(created_at), message_id
   FROM messages 
  GROUP BY user_id

actually means

 SELECT max(created_at), ANY_VALUE(message_id)
   FROM messages 
  GROUP BY user_id

where ANY_VALUE() means MySQL can choose any message_id it finds most convenient from among that user's messages. That's not what you want.

To solve your problem, you need first to use a subquery to find the latest created_at date for each user_id. Fiddle.

              SELECT user_id, MAX(created_at) created_at
                FROM messages
               WHERE created_at <= '2021-07-24'
               GROUP BY user_id

Then, you need to find the message for the particular user_id created on that date. Use the subquery for that. Fiddle

  SELECT a.*
    FROM messages a
    JOIN (
              SELECT user_id, MAX(created_at) created_at
                FROM messages
               WHERE created_at <= '2021-07-24'
               GROUP BY user_id
         ) b ON a.user_id = b.user_id AND a.created_at = b.created_at

See how that JOIN works? It pulls out the rows matching the latest date for each user.

There's a possible optimization. If

  • your message_id is an autoincrementing primary key and
  • you never UPDATE your created_at columns, but only set them to the current date when you INSERT the rows

then the most recent message for each user_id is also the message with the largest message_id. In that case you can use this query instead. Fiddle

  SELECT a.*
    FROM messages a
    JOIN (
              SELECT user_id, MAX(message_id) message_id
                FROM messages
               WHERE created_at <= '2021-07-24'
               GROUP BY user_id
         ) b ON a.message_id=b.message_id

Due to the way primary key indexes work, this can be faster.

You want an ordinary JOIN rather than a RIGHT or LEFT JOIN here: the ordinary JOIN only returns rows that match the ON condition.

Pro tip almost nobody actually uses RIGHT JOIN. When you want that kind of JOIN, use LEFT JOIN. You don't want that kind of join to solve this problem.