While doing performance tests on our application, we are running into an issue where queries executed by Hibernate are taking too long to execute. Details as below:
- Entity class
@Entity @Table(name = "merged_bill_items_data") public class MergedBillItemData{
@Id @AccessType("property") @Column(unique = true, nullable = false) private String id; @Temporal(TemporalType.DATE) @Column(name = "start_date", nullable = false) @Type(type = "com.iblogix.analytic.type.LocalDateDBType") @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd") private LocalDate startDate; @Temporal(TemporalType.DATE) @Column(name = "end_date", nullable = false) @Type(type = "com.iblogix.analytic.type.LocalDateDBType") @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd") private LocalDate endDate; @Temporal(TemporalType.DATE) @Column(name = "statement_date", nullable = false) @Type(type = "com.iblogix.analytic.type.LocalDateDBType") private LocalDate statementDate; @ManyToOne(fetch = FetchType.EAGER) @JoinColumn(name = "analysis_id", nullable = false) private Analysis analysis; @ManyToOne(fetch = FetchType.EAGER) @JoinColumn(name = "bill_item_def_id", nullable = false) private BillItemDefinition billItemDefinition; ...
Repository class
public interface MergedBillItemsDataRepository extends GenericJpaRepository<MergedBillItemData, String>, JpaSpecificationExecutor {
Corresponding named query:
@NamedQuery(name = "MergedBillItemData.findByUserAndEnergyTypeAndDisplayMonthRangeByAdjType", query = "Select mbid From BuildingUsers bu, MergedBillItemData mbid where bu.user.id=:userId and bu.building.id=mbid.analysis.building.id and mbid.energyType.id =:energyTypeId and mbid.adjustmentType =: adjustmentType and mbid.displayMonth >= :startDate and mbid.displayMonth <= :endDate order by mbid.displayMonth asc"),
Query fetched arounnd 18k records and took around 90 secs to complete. Our assumption was the two entities which are fetched eargly should be the root cause as hibernate would fetch data for each record for the eagerly fetched entities.
So as an alternative approach(Approach 1), we modified the named query and mapped the result to a custom dto, so that the calls for the eagerly fetched entities can be skipped.
@NamedQuery(name = "MergedBillItemData.getBillItemsByUserIdAndEnergyTypeAndDisplayMonth", query = "select new com.iblogix.analytic.dto.MergedBillItemDataWrapper(mbid.id, mbid.startDate,mbid.endDate, mbid.statementDate, mbid.analysis.id as analysisId,mbid.analysis.building.id as buildingId, mbid.billItemDefinition.id as billItemDefinitionId, mbid.billItemDefinition.ffBillItemName,mbid.billItemDefinition.utilityBillItemName, " + "mbid.billItemDefinition.ffBillItemCategory,mbid.energyType.id as energyTypeId, mbid.meterReadDatesAligned, mbid.cost,mbid.statementDatesAligned,mbid.numberOfStatements,mbid.thirdPartyBilled,mbid.itemUsageValue,mbid.unitId,mbid.unitPrice,mbid.readingType,mbid.displayMonth, mbid.adjustmentType) From MergedBillItemData mbid , BuildingUsers bu where bu.user.id=:userId and " + "bu.building.id=mbid.analysis.building.id and mbid.energyType.id =:energyTypeId and mbid.adjustmentType =:adjustmentType and mbid.displayMonth >= :startDate and mbid.displayMonth <= :endDate order by mbid.displayMonth asc"), })
The above change did not have any effect as well.As a last resort, we changed the query to a native query(Approach 2) as below:
> @SqlResultSetMappings({
@SqlResultSetMapping(All the mappings go here)
})
})
})
@NamedNativeQueries({
@NamedNativeQuery(name = "MergedBillItemData.getBillItemsByUserIdAndEnergyTypeAndDisplayMonthAndAdjustmentType", query =
"select mbid.id, mbid.start_date as startDate, mbid.end_date as endDate, mbid.statement_date as statementDate, mbid.analysis_id as analysisId, b.id as buildingId, mbid.bill_item_def_id as billItemDefinitionId,bd.ff_util_bill_item_name as ffBillItemName, bd.util_bill_item_name as utilityBillItemName,bd.ff_util_bill_item_category as ffBillItemCategory "
+",mbid.energy_type_id as energyTypeId, mbid.are_meter_read_dates_aligned as meterReadDatesAligned, mbid.cost as cost,mbid.are_statement_dates_aligned as statementDatesAligned, mbid.number_of_statements as numberOfStatements, mbid.third_party_billed as thirdPartyBilled, "
+"mbid.item_usage_value as itemUsageValue, mbid.unit_id as unitId, mbid.unit_price as unitPrice, mbid.reading_type as readingType, mbid.display_month as displayMonth, mbid.adjustment_type as adjustmentType "
+"from building_users bu "
"INNER JOIN user u ON bu.user_id=u.id "
"INNER JOIN building b ON bu.building_id=b.id "
"INNER JOIN analysis a ON a.building_id=b.id "
"INNER JOIN merged_bill_items_data mbid ON mbid.analysis_id=a.analysis_id "
"INNER JOIN energy_type et ON mbid.energy_type_id=et.id "
"INNER JOIN bill_item_defs bd ON mbid.bill_item_def_id= bd.id "
"where bu.user_id=:userId "
"and mbid.energy_type_id =:energyTypeId "
"and mbid.display_month >= :startDate "
"and mbid.display_month <= :endDate "
"and mbid.adjustment_type =:adjustmentType "
"order by mbid.display_month asc "
, resultSetMapping = "MBIDMapping"),
})
Even with the native query we are not seeing any performance improvement.
Surprisingly, the query generated by Hibernate, if executed from mysql workbench completes within a second for both Approach 2 and Approach 3. Can anyone help identifying the root cause here. Thanks in advance!
There can be many reasons for the behavior you are seeing. Usually, the query execution plan will tell you what the database is doing. As for the workbench completing within seconds, note that the database has various caches which allows subsequent executions of the same query to run faster. Also note that it makes a difference for row estimates if you provide query parameters or literal values. With parameters, databases usually are pessimistic and assume rows based on the average selectivity, which usually results in a higher row estimate. I guess that based on
energyTypeIdandadjustmentTypeyou might have way more or less rows, so it might be worth having separate queries that use literal values for which the database can optimize.Apart from that, also note that many query tools do not actually fetch the whole result set but only the first 100 rows. Depending on the row size and network, it makes a big difference if you fetch just 100 or 18k rows.