I was trying to generate a report for a given date range from the following table.
table_columns => employee_id |date | status
where status 1 = not_visited, 2 = visited, 3 = canceled, 4 = pending (for approval) The report should look like the following:
+-------------+------------+-------+-------------+---------+----------+---------+
| employee_id | date | total | not_visited | visited | canceled | pending |
+-------------+------------+-------+-------------+---------+----------+---------+
| 3 | 2021-06-01 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-02 | 22 | 10 | 2 | 10 | 0 |
| 3 | 2021-06-03 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-05 | 11 | 10 | 1 | 0 | 0 |
| 4 | 2021-06-01 | 11 | 8 | 3 | 0 | 0 |
| 5 | 2021-06-01 | 10 | 1 | 9 | 0 | 0 |
+-------------+------------+-------+-------------+---------+----------+---------+
The query for this report is:
select va.employee_id, va.date,
count(*) as total,
sum(case when status = 1 then 1 else 0 end) as not_visited,
sum(case when status = 2 then 1 else 0 end) as visited,
sum(case when status = 3 then 1 else 0 end) as canceled,
sum(case when status = 4 then 1 else 0 end) as pending
from visiting_addresses va
where va.date >= '2021-06-01'
and va.date <= '2021-06-30'
group by va.employee_id, va.date;
If you look at the result, there is no entry for date 2021-06-04 for employee_id = 3. Also there is no data from 2021-06-06 to 2021-06-30 . I will have to include this dates on the result. So I tried to create another query that will generate dates between the given range. The following query will do that
SELECT gen_date
FROM
(SELECT v.gen_date
FROM
(SELECT ADDDATE('1970-01-01',t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) gen_date
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t4
) v
WHERE v.gen_date BETWEEN '2021-06-01' AND '2021-06-30'
) calendar;
This query will generate dates like the following:
+------------+
| gen_date |
+------------+
| 2021-06-01 |
| 2021-06-02 |
| 2021-06-03 |
| .......... |
| ...........|
| 2021-06-27 |
| 2021-06-28 |
| 2021-06-29 |
| 2021-06-30 |
+------------+
Now The question is, how do I join this above two queries in a way so that for each employee_id, all dates are present in the result? Or Is it even possible in this way? (The actual table contains 5 million rows. employee_id column has a cardinality of 3k++, date and employee_id columns are indexed)
You tagged both MySQL and MariaDB. These two DBMS are relatives, but they are still different DBMS. In MariaDB you can easily generate a series with the built-in
seq:In MySQL this is not avalable and you'd probably use a recursive query for this:
In a recursive query you can of course generate the dates dynamically, e.g. for the last month in your table or, say, for the current and previous month.
In any SQL dialect you can join queries. In your case you want all dates (generated as shown) combined with either all employees (by selecting from the employee table) or only with employees present in your visiting_addresses table. If you only want employees that have data in your table, use:
In order to get all combinations you'll cross join the two data sets. Then you outer join the data from your table in order to also keep employees/dates without visits.
The query format is:
(If you want this for all employees, then simply replace
( <employee table query here> ) employeeswith the mere table nameemployees.For readability you may prefer
WITHclauses:You have mentioned that your table is quite big. I'd suggest the following index for this query: