(Vim) Is vimgrep capable of searching Unicode strings?

Question

Asked by aj3423 on 16 July 2014 at 06:06

Is vimgrep capable of searching unicode strings? For example:

a.txt contains the wide string "hello", but :vimgrep hello *.txt finds nothing, even though I run it from the directory that contains the file.

There are 2 answers

Ben on 16 July 2014 at 13:18

If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
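As a rough illustration of that workflow, a search over the files in the current directory might look like the commands below. The pattern and the *.txt glob come from the question; the j flag and :copen are just one convenient way to browse the results, not something the question asked for.

```vim
" Search every .txt file in the current directory for "hello".
" The j flag fills the quickfix list without jumping to the first match.
vimgrep /hello/j *.txt
" Open the quickfix window to browse whatever was found.
copen
```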

It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.

First, make sure your Vim is running with internal support for Unicode. To do that, put set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding. The 'fileencodings' option controls this.

By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom", which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add your desired encoding to 'fileencodings'. It needs to come before any of the 8-bit encodings but after ucs-bom. Try putting this at the top of your .vimrc, then restart Vim so it takes effect:

set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1

Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.
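To check whether detection actually worked, one option is to open the file and ask Vim which encoding it picked. This is only a sketch, assuming a.txt is the file from the question:

```vim
" Open the file so Vim runs its 'fileencodings' detection on it.
edit a.txt
" Print the encoding Vim chose; utf-16le is the hoped-for value here.
set fileencoding?
```

If it reports something else (for example utf-8 or latin1), the file was probably read as 8-bit text and the NUL bytes are still sitting in the buffer.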

**romainl** · Accepted Answer · 2014-07-16T13:45:40+00:00

"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the method provided by the Unicode standard". It's a bunch of normal characters with normal code points separated with NULL characters with code point 0000 or 00. Some Java programs do output that kind of garbage.

So, if your search pattern is hello, Vim and :vim are perfectly capable of searching for and finding hello (without NULLs) but they won't ever find hello (with NULLs).

Searching for h^@e^@l^@l^@o (where each ^@ is typed as <C-v><C-@>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
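If typing literal NULs is awkward, the same search can probably be written with Vim's \%x00 atom, which, as far as I know, matches a NUL in the buffer. A sketch, assuming the file was read as 8-bit text rather than decoded as UTF-16:

```vim
" \%x00 stands for a NUL character, so this looks for h, NUL, e, NUL, l, NUL, l, NUL, o.
vimgrep /h\%x00e\%x00l\%x00l\%x00o/ *.txt
```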

Anyway, converting that file/buffer, or making sure you don't end up with such garbage in the first place, is a much better long-term solution.
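For the conversion route, one possible approach from inside Vim is to reread the buffer with an explicit encoding and write it back as UTF-8. A minimal sketch, assuming the current buffer is the UTF-16LE file and that overwriting it on disk is acceptable:

```vim
" Reread the current file, telling Vim it is UTF-16LE.
edit ++enc=utf-16le
" Change the encoding that will be used when the buffer is written.
setlocal fileencoding=utf-8
" Save; after this the file on disk should be UTF-8.
write
```

Once the file is stored as UTF-8, a plain search for hello should be able to match it without any special pattern.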
