My table in a PostgreSQL database stores a large number of integer arrays in the column called vector. Since all the numbers fit into the range from -128 to 127 and to save space, I use the bytea type. I need to use the dot product distance operator <#> provided by the pgvector extension, e.g.:
select id
from MyTable
order by vector<#>key
limit 10;
However, this operator does not support the bytea type. I want to convert my vectors to int[], which can then be cast to the vector type supported by pgvector.
I think the general direction for converting bytea to int[] might be along the lines of:
SELECT string_to_array(encode(vector, 'escape'), '\\')::int[]
FROM MyTable;
How do I modify this to make the conversion work?
You can join with
generate_series()and use theget_byte()function:This is going to be slow and inefficient, but that is to be expected if you try to save space at all costs.
I would recommend to use a
smallint[], which uses 2 bytes per number.If you are running PostgreSQL v15 or later, you could also use an array of the internal data type
"char", which takes only one byte per element:To get an individual element, you use
A
"char"[]uses one byte per element plus an overhead of 21 bytes per array (abyteahas an overhead of one or four bytes per value, depending on the size).