Replacing empty and null strings in Redshift with default strings when querying?

60 views Asked by At

My table has some null and empty strings, these look ugly when I query the table so I want to replace them with other values. I don't own the data so I can't modify the table itself, this needs to happen when I query.

I tried using regexp_replace to replace the empty strings with regexp_replace(column, '^$', '(no value found/some other custom message)' which didn't work

3

There are 3 answers

0
Schwern On

null and empty string have to be treated separately. In addition, there's a difference between '' (empty) and ' ' (blank).

null is "unknown". regexp_replace(null, '^$', 'default') doesn't do anything because the value it's matching is unknown. The match fails and it returns null.

Instead, use coalesce. It returns the first non-null value. coalesce(thing, 'default').

To capture empty and blank strings, search for ^\s*$. regexp_replace(thing, '^\s*$', 'default').

We put them together like so...

select
  coalesce(regexp_replace(thing, '^\s*$', 'default'), 'default')
from test;

If thing is null, regexp_replace will return null, and coalesce will return 'default'.

If thing is empty or blank, regexp_replace will return default and coalesce will return default.

If thing is none of those, regexp_replace will return thing and coalesce will return thing.

An alternative that might be easier to understand is:

select
  case 
    when trim(word) = '' then 'default'
    else coalesce(word, 'default')
  end
from test

If you like, you can put this into a function. We can declare it immutable (the same arguments always produce the same result) potentially giving it a performance boost.

create function replace_blanks(text, text) returns text
immutable
as $$
  select coalesce(regexp_replace($1, '^\s*$', $2), $2)
$$ language sql;

select replace_blanks(word, 'default') from test;

Demonstration in PostgreSQL. Redshift is derived from PostgreSQL so it should work the same.

0
Bohemian On

Convert null to blank using coalesce() before applying your regexp_replace:

regexp_replace(coalesce(thing, ''), '^\s*$', 'default')

With Redshift you can also use nvl() instead of coalesce() for a briefer solution, but coalesce() is the SQL standard and therefore arguably more readable.

1
Erwin Brandstetter On

This is the fastest way to replace null and empty strings:

CASE WHEN column <> '' THEN column ELSE 'default' END

Only applicable to string types. Most other data types have no "empty" value.

This uses only a very basic CASE expression, so it works in any RDBMS (certainly including Postgres and Redshift) that handles null values according to the SQL standard.

Another standard SQL way:

COALESCE(NULLIF(column, ''), 'default')

See:

Regular expressions are way to expensive for the simple task (IMO).

If you have to deal with non-printing characters, you need to do more. See: