Performance optimization with generated column vs trigger-maintained column

339 views Asked by At

A table I'm using has a char(19) column let's call it P. Due to circumstances in some wheres I need to check if a 10 character variable is found in P (at the end of the string in P). Thus I'm doing: where P like CONCAT('%',variableName).

The performance is accordingly bad. A possible solution is now the introduction of an automatically updated/calculated column let's call it P10 with char(10). Then I can use where P10 like variableName which is way better. (especially when it comes to indexes).

Now the question is what is performancewise the better method (I'm not finding any info on that)? -Alter TABLE MyTable add column P10 char(10) GENERATED ALWAYS AS substr(P,9,10) -Or a trigger event that fills P10 with a substr from P

2

There are 2 answers

0
The Impaler On

If you are sure column P is always the full 19 chars long and the search values is always 10 chars long you can index the search expression and search using it (typing it exactly as you indexed it). For example:

create table t (p varchar(19));

insert into t (p) values ('1234567890123456789');
insert into t (p) values ('1234567890555555555');

create index ix1 on t (substr(p, 10, 10));

select * from t where substr(p, 10, 10) = '0123456789';

See running example at db<>fiddle.

Note that substr(p, 10, 10) is typed the exact same way to the letter in the SELECT.

Alternatively, if you want to search on the last chars of a string, the generic solution is to index the reversed value and then use the reverse pattern to search. This way the % will be at the end of the search pattern and the engine will use the index naturally.

0
Fred Sobotka On

A GENERATED ALWAYS column is just one way to do it

Adding and indexing a derived/generated CHAR(10) column P10 on table MyTable is not the only way to accelerate queries that search the last ten characters of CHAR(19) column P, but if you ultimately choose that approach, defining column P10 as GENERATED ALWAYS offers some advantages over populating P10 via triggers. Aside from being relatively straightforward and more rigorously maintained by the DBMS, GENERATED ALWAYS columns also provide important context that the query optimizer can exploit even when a statement doesn't reference the generated column but does reference the base column the right way.

In the scenario described in the question, adding column P10 CHAR(10) GENERATED ALWAYS AS (SUBSTR(P, 10, 10)) followed by CREATE INDEX ixmytblp10 ON MyTable (P10) should benefit at least these statements:

SELECT ... FROM MyTable ... WHERE P10 = '0516273849'; 
SELECT ... FROM MyTable ... WHERE SUBSTR(P, 10, 10) = '0246813579';

The first statement references P10 directly and will benefit from the index as expected. The second statement doesn't mention the indexed column at all, but the query optimizer should notice that the statement has an expression on P that identically matches the definition of GENERATED ALWAYS column P10 and then take advantage of the index on P10.

A query explain utility is a must-have when tuning SQL

A query explain utility such as db2expln or db2exfmt will show you the detailed access plan for a given SQL statement and reveal whether that statement will or won't use the index you've created. For example, the explain utility isn't likely to produce a desirable access plan for ...WHERE MyTable.P LIKE '%' || some10charactervalue because the % wildcard at the beginning of the LIKE pattern is effectively an ends-with search that typically scans the entire index if not the whole table.

Variable-length considerations

If there's any chance of a given value of P containing fewer than 19 characters, you might want to consider replacing SUBSTR(P, 10, 10) with a more flexible expression such as RIGHT(RTRIM(P), 10) in the GENERATED ALWAYS column definition and in your SQL queries so they can more reliably grab the last ten non-whitespace characters from column P no matter what it contains.

An expression-based index is another way to do it

Another answer to this question correctly points out that a generated column won't be needed if you instead create an expression-based index on column P and ensure your applications are using that expression in their queries. If every non-NULL value of P contains exactly 19 characters, then CREATE INDEX ixmytblp10 ON MyTable (SUBSTR(P, 10, 10)) should work, otherwise go with something more flexible like CREATE INDEX ixmytblp10 ON MyTable (RIGHT(RTRIM(P), 10))