Firebird Database: SELECT on OCTET versus CHAR

My platform:

  • Server Name: localhost/3050
  • Server Version: WI-V3.0.7.33374 Firebird 3.0
  • Server Implementation: Firebird/Windows/AMD/Intel/x64
  • Service Version: 2

I have been told that record identifiers should be of integer type, because that gives the best select and join performance. So, for database apps in the past, I used a BIGINT as the primary key. This poses problems when migrating data from a production system (let's not go into that nightmare).

I then realized I needed to record primary key identifiers issued in non-sequential order, so I used GUIDs: CHAR(36) CHARACTER SET ASCII, populated on insert with UUID_TO_CHAR(GEN_UUID()). I did speed tests before I built my app (we're talking many years ago) using GUIDs as primary keys, and the tests showed GUIDs to be as fast as integers, so I thought I'd be OK. The performance of the app was OK at first, but as the app matured, a couple of years later it got slower and slower and had to be re-written.
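
The pattern I used looked roughly like this (a sketch only; the table, column and trigger names here are made up for illustration, not the real app's schema):

/* GUID key as CHAR(36) ASCII, filled in by a BEFORE INSERT trigger */
CREATE TABLE CUSTOMER (
  ID   CHAR(36) CHARACTER SET ASCII NOT NULL PRIMARY KEY,
  NAME VARCHAR(100)
);

SET TERM ^ ;
CREATE TRIGGER CUSTOMER_BI FOR CUSTOMER
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  IF (NEW.ID IS NULL) THEN
    NEW.ID = UUID_TO_CHAR(GEN_UUID());
END^
SET TERM ; ^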

Now I'm building a different project, and I read somewhere that the solution to my problem (non-sequentially issued unique identifiers + top performance) is to use OCTETS, populated on insert with GEN_UUID(). I built a test table with 5,000,000 records and only 3 fields: MY_BIGINT BIGINT, MY_GUID CHAR(36) CHARACTER SET ASCII, and MY_UUID CHAR(16) CHARACTER SET OCTETS. I put a primary key on MY_BIGINT and a unique constraint on each of MY_GUID and MY_UUID. I took the value of each field from one record somewhere near the end of the table and issued a SELECT ... WHERE on each of the fields, looking for that one record.
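
Roughly this DDL, as a sketch (constraint placement follows my description above; the index names in the gstat output below suggest the primary key actually ended up on MY_UUID in the test database, so the exact names may differ):

CREATE TABLE TBL1 (
  MY_BIGINT BIGINT NOT NULL,
  MY_GUID   CHAR(36) CHARACTER SET ASCII,
  MY_UUID   CHAR(16) CHARACTER SET OCTETS,
  CONSTRAINT PK_TBL1 PRIMARY KEY (MY_BIGINT),
  CONSTRAINT UNQ_MY_GUID UNIQUE (MY_GUID),
  CONSTRAINT UNQ_MY_UUID UNIQUE (MY_UUID)
);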

My results show there is no difference between any of the field types when selecting. This goes against what I have read and what I have personally experienced in the past. Does anyone have experience using these different field types as primary keys, and can you tell me for certain which provides the best performance for selects and joins? Should I use CHAR(16) CHARACTER SET OCTETS or CHAR(36) CHARACTER SET ASCII? I cannot use BIGINT or INTEGER.

gstat -u sysdba -p masterkey -a -t TBL1 "D:\app_vlt\db\fdb\SPEED_TEST.FDB"

gstat results for the table:


Database "D:\APP_VLT\DB\FDB\SPEED_TEST.FDB"
Gstat execution time Sun Feb 11 07:17:39 2024

Database header page information:
    Flags           0
    Generation      821
    System Change Number    0
    Page size       16384
    ODS version     12.0
    Oldest transaction  710
    Oldest active       711
    Oldest snapshot     711
    Next transaction    711
    Sequence number     0
    Next attachment ID  75
    Implementation      HW=AMD/Intel/x64 little-endian OS=Windows CC=MSVC
    Shadow count        0
    Page buffers        256
    Next header page    0
    Database dialect    3
    Creation date       Feb 10, 2024 10:56:30
    Attributes      force write

    Variable header data:
    Sweep interval:     20000
    *END*


Database file sequence:
File D:\APP_VLT\DB\FDB\SPEED_TEST.FDB is the only file

Analyzing database pages ...
TBL1 (128)
    Primary pointer page: 167, Index root page: 168
    Pointer pages: 21, data page slots: 67208
    Data pages: 67208, average fill: 77%
    Primary pages: 67208, secondary pages: 0, swept pages: 0
    Empty pages: 6, full pages: 67201
    Fill distribution:
     0 - 19% = 6
    20 - 39% = 1
    40 - 59% = 0
    60 - 79% = 67201
    80 - 99% = 0

    Index PK_TBL1_UUID (0)
    Root page: 5241, depth: 3, leaf buckets: 17496, nodes: 10000000
    Average node length: 19.75, total dup: 0, max dup: 0
    Average key length: 16.76, compression ratio: 0.95
    Average prefix length: 2.24, average data length: 13.76
    Clustering factor: 9999855, ratio: 1.00
    Fill distribution:
         0 - 19% = 40
        20 - 39% = 0
        40 - 59% = 5246
        60 - 79% = 7738
        80 - 99% = 4472

    Index UNQ_MY_BIGINT (1)
    Root page: 19342, depth: 3, leaf buckets: 6930, nodes: 10000000
    Average node length: 11.22, total dup: 0, max dup: 0
    Average key length: 8.22, compression ratio: 1.09
    Average prefix length: 3.78, average data length: 5.22
    Clustering factor: 67202, ratio: 0.01
    Fill distribution:
         0 - 19% = 0
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 1
        80 - 99% = 6929

    Index UNQ_MY_GUID (2)
    Root page: 1614, depth: 3, leaf buckets: 32999, nodes: 10000000
    Average node length: 36.85, total dup: 0, max dup: 0
    Average key length: 33.86, compression ratio: 1.06
    Average prefix length: 5.14, average data length: 30.86
    Clustering factor: 9999857, ratio: 1.00
    Fill distribution:
         0 - 19% = 144
        20 - 39% = 1
        40 - 59% = 10726
        60 - 79% = 13874
        80 - 99% = 8254

Gstat completion time Sun Feb 11 07:17:42 2024

Here are the SQLs that I ran to get the results described. They return in exactly the same response time and produce the same kind of PLAN (the same number of steps); a sketch of how the plans can be checked in isql follows the queries.

/* Each select is run in its own connection to the database */
SELECT
  MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
  TBL1
WHERE
  MY_BIGINT = 12999998
--
SELECT
  MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
  TBL1
WHERE
  MY_GUID = '38DD0E53-A80D-4E0A-976E-7AA6C80C11A8'
--
SELECT
  MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
  TBL1
WHERE
  MY_UUID = CHAR_TO_UUID('6B81D7EC-BACE-4C3D-963F-CF251ED0202C')
---
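
For completeness, this is roughly how the plans can be checked in isql (a sketch; the commented plan line is what I expect the optimizer to choose, given the indexes shown in the gstat output):

/* In isql: show the optimizer plan and per-query statistics */
SET PLAN ON;
SET STATS ON;

SELECT
  MY_BIGINT, MY_GUID, UUID_TO_CHAR(MY_UUID) AS "MY_UUID"
FROM
  TBL1
WHERE
  MY_UUID = CHAR_TO_UUID('6B81D7EC-BACE-4C3D-963F-CF251ED0202C');
/* Expected output along the lines of: PLAN (TBL1 INDEX (PK_TBL1_UUID)) */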

The nightmare I refer to was trying to merge 3 production databases into 1. Every record in the target and feeder databases is linked by sequentially issued integer identifiers, new identifiers are being issued every second, and you can't shut any of it down, so you end up with a "moving hole" problem, which I describe in my book, available on Amazon, which I'm probably not allowed to mention here.
