How to associate an online order with several preceding website visits using a rank function in PostgreSQL

57 views Asked by At

I'm working to create a database view that associates an online order with a users preceding website visits. This is for an eCommerce website so one user may visit and order several times.

I've already joined the visits table and orders table on user_id and associated the nearest less-than session time with an order time. Now, I'm hoping to say every visit up until order #1 is "1", and then after visit thereafter up until order #2 is "2". Also, if there is no order_id for that specific user I'd like to return "0". See screenshots linked below for reference.

I've already tried to use dense_rank, but it is only ranking the rows in which an order_id is present. I want to carry forward these ranks.

SELECT v.id AS visit_id,
    v.user_id,
    v.started_at AS visit_date,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
    o.id AS order_id,
    o.created_at AS order_date
   FROM visits v
     FULL JOIN orders o ON v.user_id = o.user_id AND v.started_at < o.created_at AND o.created_at < (( SELECT min(visits.started_at) AS min
           FROM visits
          WHERE visits.user_id = v.user_id AND visits.started_at > v.started_at)) AND (v.started_at + '24:00:00'::interval) > o.created_at
  GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
  ORDER BY v.started_at;

Current results Expected Results

2

There are 2 answers

1
Gordon Linoff On BEST ANSWER

The GROUP BY seems unnecessary, but I'll leave it it. You basically need a cumulative sum.

I would assign all visits before a particular order the order number:

SELECT v.id AS visit_id, v.user_id,
       v.started_at AS visit_date,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
       o.id AS order_id,
       o.created_at AS order_date,
       count(o.id) over (partition by v.user_id order by v.started_at) as order_number
FROM visits v FULL JOIN
     orders o
     ON v.user_id = o.user_id AND
        v.started_at < o.created_at AND
        o.created_at < (SELECT min(visits.started_at)
                        FROM visits v2 
                        WHERE v2.user_id = v.user_id AND 
                              v2.started_at > v.started_at) AND
        (v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;

I think this is the logic that you want:

SELECT v.id AS visit_id, v.user_id,
       v.started_at AS visit_date,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
       dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
       o.id AS order_id,
       o.created_at AS order_date,
       MIN(o.order_number) OVER (PARTITION BY v.user_id ORDER BY v.started_at DESC) as order_number
FROM visits v FULL JOIN
     (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.id) as order_number
      FROM orders o
     ) o
     ON v.user_id = o.user_id AND
        v.started_at < o.created_at AND
        o.created_at < (SELECT min(visits.started_at)
                        FROM visits v2 
                        WHERE v2.user_id = v.user_id AND 
                              v2.started_at > v.started_at) AND
        (v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;

It might produces NULLs where you want 0s, however.

1
Vamsi Prabhala On

Use lag to check if the previous row is non-null so it can be flagged for a new group start. Once the flag is set, you can use a running sum to define groups.

SELECT T.*,
       1+SUM(FLAG) OVER(PARTITION BY user_id ORDER BY visit_date) AS order_number
FROM (
SELECT v.id AS visit_id,
    v.user_id,
    v.started_at AS visit_date,
    dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
    o.id AS order_id,
    o.created_at AS order_date,
    --conditioncheck with lag
    case when lag(o.id) over(partition by v.user_id order by v.started_at) is not null then 1 else 0 end as flag
   FROM visits v
     FULL JOIN orders o ON v.user_id = o.user_id AND v.started_at < o.created_at AND o.created_at < (( SELECT min(visits.started_at) AS min
           FROM visits
          WHERE visits.user_id = v.user_id AND visits.started_at > v.started_at)) AND (v.started_at + '24:00:00'::interval) > o.created_at
    ) T