(Vim) Is vimgrep capable of searching Unicode strings?


Is vimgrep capable of searching Unicode strings? For example:

a.txt contains the wide string "hello", but :vimgrep hello *.txt finds nothing, and yes, the file is in the right path.

romainl (best answer)

"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the methods provided by the Unicode standard". It's a bunch of normal characters with normal code points separated by NULL characters (code point 0000, i.e. the byte 00). Some Java programs do output that kind of garbage.

So, if your search pattern is hello, Vim and :vim are perfectly capable of searching for and finding hello (without NULLs) but they won't ever find hello (with NULLs).

Searching for h^@e^@l^@l^@o (^@ is <C-v><C-@>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
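To see why the plain pattern misses, here is a small Python sketch that does plain byte matching (not Vim's regex engine). It assumes a.txt holds "hello" encoded as UTF-16LE with no BOM, which produces exactly the NUL-separated layout described above; interleave_nuls is a hypothetical helper that builds the byte equivalent of h^@e^@l^@l^@o.

```python
def interleave_nuls(pattern: str) -> bytes:
    """Hypothetical helper: turn "hello" into the bytes h\\0e\\0l\\0l\\0o (ASCII patterns only)."""
    return "\0".join(pattern).encode("ascii")


# Assumed contents of a.txt: "hello" stored as UTF-16LE without a BOM.
wide = "hello".encode("utf-16-le")
narrow = b"hello"

print(wide)                                # b'h\x00e\x00l\x00l\x00o\x00'
print(b"hello" in wide)                    # False: plain pattern misses the NUL-separated text
print(interleave_nuls("hello") in wide)    # True: NUL-interleaved pattern matches it
print(interleave_nuls("hello") in narrow)  # False: but it no longer matches plain text
```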

Either way, converting that file/buffer, or making sure you don't end up with such garbage in the first place, is a much better long-term solution.
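As a rough illustration of the conversion route, this Python sketch re-encodes a file as UTF-8. It assumes the source really is BOM-less UTF-16LE (decoding raises UnicodeDecodeError if the bytes don't fit); convert_utf16le_to_utf8 and the file names are hypothetical, and the demo runs in a throwaway temporary directory.

```python
import tempfile
from pathlib import Path


def convert_utf16le_to_utf8(src: Path, dst: Path) -> None:
    """Hypothetical helper: re-encode a BOM-less UTF-16LE file as UTF-8."""
    text = src.read_bytes().decode("utf-16-le")  # raises UnicodeDecodeError on malformed input
    dst.write_bytes(text.encode("utf-8"))


# Demo: write a UTF-16LE file with no BOM, convert it, and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    dst = Path(tmp) / "a.utf8.txt"
    src.write_bytes("hello\n".encode("utf-16-le"))
    convert_utf16le_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'hello\n'
```

Inside Vim, reopening the file with :e ++enc=utf-16le and then running :set fileencoding=utf-8 followed by :w should achieve the same conversion.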

Ben

If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
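The decode-first, search-second order is what makes the detected encoding matter. A loose Python analogue (not Vim's implementation) is sketched below: grep_decoded is a hypothetical helper that decodes the whole file with a given encoding, then reports the first match on each line as (line, column, text), roughly like what :vimgrep collects without the g flag.

```python
import re
import tempfile
from pathlib import Path


def grep_decoded(path: Path, pattern: str, encoding: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: decode the file, then search it line by line.

    Returns (1-based line, 1-based column, line text) for the first match per line.
    """
    text = path.read_bytes().decode(encoding)
    regex = re.compile(pattern)
    matches = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = regex.search(line)
        if m:
            matches.append((lineno, m.start() + 1, line))
    return matches


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "a.txt"
    sample.write_bytes("say hello\nbye\n".encode("utf-16-le"))
    hits = grep_decoded(sample, "hello", "utf-16-le")  # right encoding: text decodes cleanly
    misses = grep_decoded(sample, "hello", "latin-1")  # wrong encoding: NULs stay in the text

print(hits)    # [(1, 5, 'say hello')]
print(misses)  # []
```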

It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.
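To check that diagnosis on a given file before touching your .vimrc, one rough option is to inspect the raw bytes. The Python sketch below is a hypothetical heuristic, not how Vim detects encodings: for mostly-ASCII text in BOM-less UTF-16LE, the high byte of each 2-byte unit (at odd offsets) is 0. The 0.9 threshold is an arbitrary assumption.

```python
import codecs


def looks_like_bomless_utf16le(data: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic for mostly-ASCII UTF-16LE text without a BOM."""
    # A BOM means Vim's default 'ucs-bom' entry would already handle the file.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    # UTF-16 code units are 2 bytes long, so the length must be even.
    if len(data) < 2 or len(data) % 2:
        return False
    high_bytes = data[1::2]  # little-endian: high byte of each unit sits at an odd offset
    return high_bytes.count(0) / len(high_bytes) >= threshold


print(looks_like_bomless_utf16le("hello".encode("utf-16-le")))  # True
print(looks_like_bomless_utf16le(b"hello!"))                     # False
print(looks_like_bomless_utf16le("hello".encode("utf-16")))      # False: Python's utf-16 codec adds a BOM
```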

First, make sure your Vim is running with internal support for Unicode. To do that, put set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding; the 'fileencodings' option controls this.

By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom", which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add the specific encoding (here utf-16le) to 'fileencodings'. It needs to come after ucs-bom but before any 8-bit encoding such as latin1: Vim uses the first entry that converts the file without errors, and an 8-bit encoding accepts any byte sequence, so nothing listed after one is ever tried. Try putting this at the top of your .vimrc and restarting Vim:

set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1

Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.
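The ordering rule can be modeled in a few lines. The Python sketch below is a simplified, hypothetical stand-in for Vim's first-match behavior: Python codec names are assumed to stand in for Vim's entries, "default" is left out because it depends on the locale, and UTF-32 BOMs are ignored.

```python
import codecs

# Assumed Python equivalents of: ucs-bom,utf-16le,utf-8,latin1
FALLBACKS = ("ucs-bom", "utf-16-le", "utf-8", "latin-1")


def pick_encoding(data: bytes, candidates: tuple[str, ...] = FALLBACKS) -> str:
    """Hypothetical helper: return the first candidate that decodes `data` without error."""
    for name in candidates:
        if name == "ucs-bom":
            # Only matches when a byte-order mark is present.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8-sig"
            if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
                return "utf-16"
            continue
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue  # this candidate fails, so try the next one
    raise ValueError("no candidate encoding fits")


print(pick_encoding("hello".encode("utf-16-le")))  # utf-16-le
print(pick_encoding(b"\xff\xfeh\x00i\x00"))       # utf-16 (BOM present)
print(pick_encoding("café".encode("utf-8")))       # utf-8 (odd length, so UTF-16LE fails)
print(pick_encoding(b"\xff"))                      # latin-1 (the catch-all)
print(pick_encoding(b"hi"))                        # utf-16-le
```

The last line shows a weakness of this simplified model: any even-length input without unpaired surrogates decodes as UTF-16LE, so plain text can be claimed by the utf-16le entry. It may be worth checking how your Vim treats ordinary UTF-8 files after changing 'fileencodings'.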