MYSQL: Still using filesort, using temporary, even though the index is properly added

69 views Asked by At

I have this query that I think I have indexed them properly. But still get filesort and temporary indexing.

The query is as follow:

SELECT * FROM
    (SELECT PIH.timestamp, PIH.practice_id, PIH.timestamp as invoice_num, PIH.custom_invnum,
CEIL(PIH.total_invoice + PIH.tax + PIH.other_bill)  as grand_total, PIH.total_invoice, PIH.extra_charge_ph as extra_charge,
PIH.tax, PIH.other_bill, PIH.changed, PIH.source,
PIH.notes, PIH.is_active, PIH.paid as pay,
PIH.covered_amount, IF(PIH.is_active = 1, IF(PIH.total_invoice = 0 OR PIH.total_invoice + PIH.tax + PIH.other_bill - PIH.covered_amount <= PIH.paid, 1, IF(PIH.paid = 0, 0, 2)), '')  as invoice_st,
RPP.patient_id, RPP.first_name as pfname, RPP.last_name as plname, RPP.dob as p_dob, RPP.gender as p_gender, RPP.reff_id as p_reff_id, RPP.mobile_number as p_mobile, IF(PIH.group_doctors IS NOT NULL, NULL, D.doc_title) as doc_title, IF(PIH.group_doctors IS NOT NULL,
PIH.group_doctors, D.first_name) as doc_fname, IF(PIH.group_doctors IS NOT NULL, PIH.group_doctors, D.last_name) as doc_lname, IF(PIH.group_doctors IS NOT NULL, NULL, D.spc_dsg) as spc_dsg, PA.username, TL.timestamp as checkout_time, IP.name as ip_name, PMM.timestamp as mcu_id
            FROM  practice_invoice_header PIH
            INNER JOIN  practice_invoice_detail PID  ON PID.timestamp = PIH.timestamp
              AND  PID.practice_id = PIH.practice_id
            INNER JOIN  practice_queue_list PQL  ON PQL.encounter_id = PID.encounter_id
              AND  PQL.practice_place_id = PIH.practice_id
            INNER JOIN  temp_search_view D  ON D.id = PQL.doctor_id
              AND  D.pp_id = PQL.practice_place_id
            INNER JOIN  practice_place PP  ON PP.id = PIH.practice_id
            INNER JOIN  ref_practice_patient RPP  ON RPP.patient_id = PIH.patient_id
              AND  RPP.practice_id = PP.parent_id
            LEFT JOIN  practice_mcu_module PMM  ON PMM.id = PID.mcu_module_id
              AND  PMM.practice_id = PID.practice_id
            LEFT JOIN  transaction_log TL  ON TL.reff_id = PIH.timestamp
              AND  TL.practice_id = PIH.practice_id
              AND  TL.activity = "CHK"
            LEFT JOIN  practice_admin PA  ON PA.id = TL.admin_id
            LEFT JOIN  insurance_plan IP  ON IP.id = PIH.insurance_plan_id
            WHERE  PIH.source <> 'P'
              AND  PIH.practice_id = 28699
              AND  PIH.is_active = 1
              AND  PQL.cal_id >= 201807010
              AND  PQL.cal_id <= 201807312
            GROUP BY  PIH.timestamp, PIH.practice_id 
    ) AS U  LIMIT 0,20

NOTE: I only show some of the main tables that are used in this query and the ones that sort using filesort/temporary, of course If I post everything it will be too many information.

The query is about list of invoices, and it has the header (practice_invoice_header) and the details (practice_invoice_detail). And this query join with the practice_place table

CREATE TABLE `practice_invoice_header` (
 `timestamp` bigint(20) NOT NULL,
 `practice_id` int(11) NOT NULL,
 `cal_id` int(11) NOT NULL,
 `patient_id` int(11) NOT NULL DEFAULT 0,
 `source` char(1) NOT NULL COMMENT 'E = ENCOUNTER; P = OTHER (PHARM / LAB)',
 `total_invoice` float(30,2) NOT NULL DEFAULT 0.00,
 `tax` float(30,2) NOT NULL DEFAULT 0.00,
 `other_bill` float(30,2) NOT NULL DEFAULT 0.00,
 `changed` float(30,2) NOT NULL DEFAULT 0.00,
 `paid` float(30,2) NOT NULL DEFAULT 0.00,
 `covered_amount` float(30,2) NOT NULL DEFAULT 0.00,
 `notes` varchar(300) DEFAULT NULL,
 `custom_invnum` varchar(30) DEFAULT NULL,
 `insurance_plan_id` varchar(20) DEFAULT NULL,
 `is_active` int(11) NOT NULL DEFAULT 1,
 `cancel_reason` varchar(200) DEFAULT NULL,
 PRIMARY KEY (`timestamp`,`practice_id`),
 KEY `custom_invnum` (`custom_invnum`),
 KEY `insurance_plan_id` (`insurance_plan_id`),
 KEY `practice_id_3` (`practice_id`,`xxx_reff_id`),
 KEY `ph_check_status` (`ph_checked_by`),
 KEY `cal_id` (`cal_id`),
 KEY `practice_id_5` (`practice_id`,`outpx_id`),
 KEY `practice_id_6` (`practice_id`,`cal_id`,`source`,`is_active`),
 KEY `total_invoice` (`total_invoice`),
 KEY `patient_id` (`patient_id`),
 CONSTRAINT `practice_invoice_header_ibfk_1` FOREIGN KEY (`practice_id`)
       REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `practice_invoice_detail` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `timestamp` bigint(20) NOT NULL,
 `practice_id` int(11) NOT NULL,
 `item_id` int(11) NOT NULL,
 `item_sub_id` int(11) DEFAULT NULL,
 `item_type` char(1) NOT NULL COMMENT 'D = DRUG; P = PROCEDURE; L = LAB',
 `item_qty` float NOT NULL,
 `item_price` float(22,2) NOT NULL,
 `discount` float NOT NULL DEFAULT 0,
 `is_active` int(11) NOT NULL DEFAULT 1,
 PRIMARY KEY (`id`),
 KEY `item_type` (`item_type`),
 KEY `timestamp` (`timestamp`,`practice_id`),
 KEY `practice_id` (`practice_id`),
 KEY `item_id_2` (`item_id`,`item_sub_id`,`item_type`),
 KEY `timestamp_2` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`),
 KEY `practice_id_3` (`practice_id`,`item_type`),
 KEY `the_id` (`id`,`practice_id`) USING BTREE,
 KEY `timestamp_3` (`timestamp`,`practice_id`,`item_type`,`item_comission`,
      `item_comission_type`, `doctor_id`,`item_id`,`item_sub_id`,`id`) USING BTREE,
 KEY `timestamp_4` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`,
      `item_comission_2`,`item_comission_2_type`,`doctor_id_2`,`id`) USING BTREE,
 KEY `request_id` (`request_id`,`request_practice`),
 KEY `timestamp_5` (`timestamp`,`practice_id`,`is_active`),
 KEY `practice_id_6` (`practice_id`,`encounter_id`,`is_active`),
 KEY `practice_id_7` (`practice_id`,`item_type`,`encounter_id`,`is_active`),
 CONSTRAINT `practice_invoice_detail_ibfk_1` FOREIGN KEY (`timestamp`)
     REFERENCES `practice_invoice_header` (`timestamp`) ON DELETE CASCADE,
 CONSTRAINT `practice_invoice_detail_ibfk_2` FOREIGN KEY (`practice_id`)
     REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1447348 DEFAULT CHARSET=latin1

CREATE TABLE `ref_practice_patient` (
 `practice_id` int(11) NOT NULL,
 `patient_id` int(11) NOT NULL,
 `reff_id` varchar(35) DEFAULT NULL,
 `is_user` int(11) NOT NULL DEFAULT 0,
 `parent_user_id` int(11) NOT NULL DEFAULT 0
 PRIMARY KEY (`practice_id`,`patient_id`),
 KEY `patient_id` (`patient_id`),
 KEY `reff_id` (`reff_id`),
 KEY `practice_id` (`practice_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `practice_place` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(75) NOT NULL,
 `statement` text DEFAULT NULL,
 `address` varchar(200) NOT NULL,
 `phone` varchar(15) NOT NULL,
 `wa_number` varchar(15) DEFAULT NULL,
 `fax` varchar(15) NOT NULL,
 `email` varchar(50) NOT NULL,
 `is_branch` int(11) NOT NULL,
 `parent_id` int(11) NOT NULL,
 `editted_by` int(11) DEFAULT NULL,
 `editted_date` bigint(20) DEFAULT NULL,
 `status` int(11) NOT NULL DEFAULT 1,
 PRIMARY KEY (`id`),
 KEY `parent_id` (`parent_id`),
 KEY `reff_id` (`reff_id`),
) ENGINE=InnoDB AUTO_INCREMENT=29058 DEFAULT CHARSET=latin1

And below is the explain produce by the query, and I highlight the one using filsort (no. 2)

1 PRIMARY ALL NULL  NULL NULL NULL  14028
2 DERIVED PP  const PRIMARY,parent_id PRIMARY 4 const 1 Using temporary; Using filesort
2 DERIVED PIH ref   PRIMARY,practice_id_3,practice_id_5,practice_id_6,practice_id_8,pharm_read,lab_read,rad_read,patient_id
                   practice_id_5 4 const  7014 Using where 
2 DERIVED RPP eq_ref PRIMARY,patient_id,practice_id,practice_id_2,practice_id_3 
                   PRIMARY 8 const,k6064619_lokadok.PIH.patient_id  1
2 DERIVED PID ref   timestamp,practice_id,timestamp_2,practice_id_2,practice_id_3,timestamp_3,timestamp_4,practice_id_4,practice_id_5,timestamp_5,practice_id_6,practice_id_7
                   timestamp 12 k6064619_lokadok.PIH.timestamp,const  1 
2 DERIVED PMM eq_ref PRIMARY,id,practice_id
                   PRIMARY 4 k6064619_lokadok.PID.mcu_module_id  1 Using where 
2 DERIVED TL  ref   reff_id reff_id 12 k6064619_lokadok.PIH.timestamp,const  1 Using where 
2 DERIVED PA  eq_ref PRIMARY PRIMARY 4 k6064619_lokadok.TL.admin_id  1 Using where 
2 DERIVED IP  ref   PRIMARY,id PRIMARY 22 k6064619_lokadok.PIH.insurance_plan_id  1 Using where 
2 DERIVED PQL ref   PRIMARY,encounter_id,cal_id_2
                   encounter_id 5 k6064619_lokadok.PID.encounter_id  2 Using where; Using index
2 DERIVED D   ref   doc_id,pp_id,id_2,pp_doc doc_id 4 k6064619_lokadok.PQL.doctor_id  1 Using where

I believe I have indexed the parent_id in practice_place table, and also in ref_practice_patient the patient_id and practice_id is PRIMARY.

1

There are 1 answers

2
Rick James On

Why have the outer query? The Optimizer is free to shuffle the result of the inner query, thereby leaving the LIMIT to pick an ordering that you not expecting. At least add ORDER BY, preferably also toss the outer select.

Main Index

Let's analyze the likely place to design an index:

        WHERE  PIH.source <> 'P'
          AND  PIH.practice_id = 28699
          AND  PIH.is_active = 1
          AND  PQL.cal_id >= 201807010
          AND  PQL.cal_id <= 201807312
        GROUP BY  PIH.timestamp, PIH.practice_id 

Since there is a mixture of tables involved, it is not possible to have an index that handles all the WHERE.

Since the tests are not all =, it is not possible to reach beyond the WHERE and include columns of the GROUP BY.

So, I see two indexes:

PIH:  INDEX(practice_id, is_active,   -- in either order
            source)
PQL:  INDEX(cal_id)

Since we can't get into the GROUP BY, the Optimizer has no choice but to gather all the rows based on WHERE, do some grouping, and do an ORDER BY (as I said, that is missing, but necessary).

Therefor, GROUP BY and the ORDER BY will require one or two temps and filesorts. No, you can't get away from it, at least not without changing the query in some way. (Please note that "filesort" might actually be done in RAM.)

Your extra SELECT layer may be adding an extra temp and filesort.

EXPLAIN fails to point out when there are two sorts. EXPLAIN FORMAT=JSON has that sort of detail.

Other issues...

Having a timestamp in a PRIMARY KEY is risky unless you are sure that two rows can occur with the same timestamp, or there is another column in the PK to assure uniqueness.

Don't use FLOAT for money. It will incur extra rounding errors, and it cannot store more than about 7 significant digits (that' less than $100K to the penny). Don't use float(30,2), it is even worse because you are forcing an extra rounding. Use DECIMAL(30,2), but pick something reasonable, not 30. It takes 14 bytes -- mostly a waste of space.

Whenever you have INDEX(a,b), you don't need INDEX(a); it is redundant and slows down (slightly) INSERTs.

LEFT JOIN  transaction_log TL
           ON  TL.reff_id = PIH.timestamp
          AND  TL.practice_id = PIH.practice_id
          AND  TL.activity = "CHK"

needs

INDEX(reff_id, practice_id, activity)  -- in any order

Also

        INNER JOIN  practice_invoice_detail PID  ON PID.timestamp = PIH.timestamp
          AND  PID.practice_id = PIH.practice_id

PIH:  INDEX(practice_id, timestamp)   -- not the opposite order
PIH:  INDEX(practice_id, is_active, timestamp)

        INNER JOIN  practice_queue_list PQL  ON PQL.encounter_id = PID.encounter_id
          AND  PQL.practice_place_id = PIH.practice_id

PQL:  INDEX(encounter_id, cal_id)
PQL:  INDEX(encounter_id, practice_place_id, cal_id)

Some discussion...

  • In a JOIN, EXPLAIN shows one order of working through the tables; it gives you no clues of how things would be if it worked through the tables some other way.
  • I have attempted to show what index might be needed if PQL were used first, or if PIH were used first -- namely use the WHERE stuff for that table, then
  • I have attempted to show the optimial index for joining to the other table.
  • Probably the Optimizer will not start with any table not mentioned in the WHERE clause, but this is not a certainty.
  • I have not listed the optimal indexes for getting to each of the other tables.
  • More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql