Clearing up confusion about V8 and the JavaScript engine


I've been researching the V8 engine and JavaScript engines in general for about 2 hours, and I was wondering if I have the process correct.

First, we have JavaScript code.

With the code as input, the CPU runs the JavaScript engine, which in turn starts the process.

The code is parsed by the Parser.

The Parser outputs an Abstract Syntax Tree.

The AST is then passed through the Interpreter, where a few things happen.

  • The Interpreter evaluates the AST (code) and returns an output
  • (*) JIT compilation converts the output to byte code.

This byte code is run directly, as the interpreter doesn't output any machine code.

(*) - This is the part where I am basically guessing. It's hard to understand the difference between the interpreter and the compiler, since... isn't JavaScript an interpreted language? What is the compiler even doing? Optimizations? On what? And sometimes I see people say it's not just doing optimizations. I don't really know...

Any assistance would be greatly appreciated.

1 Answer

Answered by jmrk:

(V8 developer here.)

There are (at least) two distinct concepts here:

(1) JavaScript being an interpreted language means that JavaScript programs are delivered to those who want to run them in the form of source code. This is in contrast to e.g. C++, which is (almost always) compiled to machine code by the developer, and then distributed as machine code (a.k.a. "binaries" or "executables", meaning "executable by hardware").
For JavaScript's primary use case on websites, this is an advantage, because machine code is specific to a hardware platform, whereas websites are meant to run on any client hardware. By distributing JS as source, the browser takes care of what's needed to run it on your particular device (x86 or ARM or MIPS or ..., 32 or 64 bits, SIMD support or not, ...).
In accordance with this characterization of the JS language, sometimes the term "JS interpreter" is used interchangeably with "JS engine" and "JS virtual machine", meaning "any software that executes JavaScript" (regardless of how exactly it is implemented under the hood).

(2) An "interpreter" inside a (JavaScript or other) engine as opposed to a JIT compiler.
For programming languages that are distributed as source code, engines can still choose to generate machine code on the fly; this is called "just-in-time compilation" (JIT), as opposed to executing the program on an "interpreter" that does not rely on generating machine code. Instead, such an interpreter is a program that simulates a machine that can execute the programming language in question directly. (This can be a bit difficult to grasp at first; see e.g. this answer for an example of a very simple interpreter for a made-up language, or the sketch just below.)
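To make that concrete, here is a deliberately tiny sketch of such an interpreter for a made-up arithmetic language. It is purely illustrative (it is not V8 code, and V8's actual interpreter works on bytecode rather than walking a tree like this):

    // A toy interpreter for a made-up arithmetic language (illustrative only).
    // A "program" is a small tree of plain objects; the interpreter walks the
    // tree and computes results itself -- no machine code is ever generated.
    function interpret(node) {
      switch (node.op) {
        case "num": return node.value;
        case "add": return interpret(node.left) + interpret(node.right);
        case "mul": return interpret(node.left) * interpret(node.right);
        default: throw new Error("unknown op: " + node.op);
      }
    }

    // (1 + 2) * 4
    const program = {
      op: "mul",
      left: { op: "add", left: { op: "num", value: 1 }, right: { op: "num", value: 2 } },
      right: { op: "num", value: 4 },
    };
    console.log(interpret(program)); // 12

The point is only that execution happens via the interpreter's own logic running on the CPU; no new machine code is emitted for the program being interpreted.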

For example, @Pointy's comment that says:

JavaScript interpreters definitely create blocks of machine code when they decide it's important.

refers to concept (1), because an interpreter in the sense of (2) does not produce machine code, but an "interpreter" in the sense of "JS engine as used in today's browsers" certainly does (when a function is hot).


With that general framing out of the way, here are answers to your specific questions, as they apply to V8:

passed through the Interpreter, where a few things happen

Almost: the AST is compiled to bytecode (just once); this bytecode is then interpreted (as often as the function is called). That's it. The result of the interpretation is that the program gets executed, which could e.g. insert a DOM node or print something to the console or whatever.
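If you want to see what that bytecode looks like, V8 can print it for you. The exact flags and output format are internal details that vary between versions, so treat the following as a rough illustration rather than a stable interface:

    // square.js
    function square(x) {
      return x * x;
    }
    square(3);

    // Running e.g. `node --print-bytecode --print-bytecode-filter=square square.js`
    // (these are V8 flags passed through Node) prints the generated bytecode for
    // `square`, roughly along the lines of:
    //
    //   Ldar a0      ; load the argument into the accumulator
    //   Mul a0, [0]  ; multiply it by itself (with a feedback slot)
    //   Return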

What is the compiler even doing? Optimizations? On what?

It generates optimized machine code for a given function.
If and when the interpreter realizes that it's spending a lot of time interpreting a particular function, it instructs the optimizing compiler to compile this function to optimized machine code. The compiler runs in the background, and takes some time (usually on the order of milliseconds, but potentially hundreds of them) to:

  • create an IR (intermediate representation) graph for the function
  • apply a bunch of "standard" compiler optimizations (constant folding, dead code elimination, inlining, strength reduction, ...) to this IR, as well as a bunch of JS-specific tricks (such as replacing built-in functions like Array.push with shortcuts when circumstances allow)
  • generate machine code for the final IR.

Once that job is done and optimized code is available for a function, the next time this function is called, this optimized code is executed (instead of the interpreter).
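As a rough way to observe this (the exact flag names, thresholds, and log formats are V8 internals and change between versions, so this is a sketch rather than a guarantee):

    // hot.js
    function add(a, b) {
      return a + b;
    }

    // Call the function enough times that the engine considers it "hot".
    let sum = 0;
    for (let i = 0; i < 1e6; i++) {
      sum = add(sum, 1);
    }
    console.log(sum);

    // Running e.g. `node --trace-opt hot.js` (a V8 flag passed through Node)
    // typically logs when functions in this file get picked up by the
    // optimizing compiler; without the flag, the same tiering happens silently.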

There are several reasons for having such a multi-tier execution system; three of the most important ones are:

  1. Bytecode is a lot smaller than machine code, and memory consumption matters. Compiling functions that don't get executed much (so their performance doesn't matter much) only to bytecode saves quite a lot of memory compared to compiling them to machine code.
  2. Optimized compilation takes quite a bit of time. For functions that don't get called much, executing them without optimizing them first is overall faster than spending time on optimizing them and only then running them.
  3. For a dynamic language like JavaScript, even if you decided that you wanted to spend the time and memory on optimizing everything right away, the optimizing compiler couldn't actually do a good job at creating optimized code without collecting type feedback during unoptimized execution first.
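Point 3 is easiest to see with a small example: the same source-level operation can mean completely different things at the machine level depending on which types actually show up at runtime, and the engine only learns that by watching unoptimized executions.

    function add(a, b) {
      // At the source level this is just "+". At the machine level it could be
      // an integer addition, a floating-point addition, a string concatenation,
      // or calls into arbitrary valueOf()/toString() methods -- the source code
      // alone doesn't say which.
      return a + b;
    }

    add(1, 2);       // feedback collected so far: small integers
    add(1.5, 2.5);   // now also floating-point numbers
    add("a", "b");   // now also strings

    // Only after observing calls like these does the optimizing compiler know
    // which fast paths are worth generating -- and which unexpected types would
    // force it to "deoptimize" back to unoptimized code later.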

I was wondering if the optimized code gets passed through the interpreter again? How is it executed other than being reinterpreted into byte code?

No, machine code (optimized or not) is executed directly on the CPU, not on the interpreter.


For completeness, it gets even more complicated:

  • Switching from interpreted mode to optimized compiled mode can, strictly speaking, happen not only on the next call: there's also a mechanism called "on-stack replacement" (OSR) that can replace a function's code while it is running a loop (see the sketch after this list).
  • V8 today actually has more than just an interpreter and an optimizing compiler: it has a four-tier system consisting of an interpreter, a baseline compiler that's fast at generating code but applies no optimizations, a quick optimizing compiler that performs a couple of relatively cheap but impactful optimizations, and a "heavyweight" optimizing compiler that's slower to run but produces even better code and is used for really hot functions. These tiers take progressively more effort to reach, so functions progress through them if they keep spending enough time in their current tier.
  • As a particular implementation detail, all of V8's compilers take the function's bytecode as input, rather than an AST as most compilers do. That saves a few CPU cycles, because V8's bytecode is similar in expressiveness to an AST, and the bytecode needs to be kept around anyway, whereas the AST would have to be recreated (at the cost of CPU) or kept around (at the cost of more memory consumption). But if this confuses you, just assume that the compilers start by parsing the source code and building an AST again -- it's not important for the overall concept that the compilers read the bytecode, it's just an optimization.
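To illustrate the first point: on-stack replacement matters for code like the following, where a single call runs a loop for a long time, so waiting for "the next call" would mean never reaching optimized code.

    function sumUpTo(n) {
      let total = 0;
      // This loop runs for a long time within a single invocation. With
      // on-stack replacement, the engine can compile optimized code for the
      // loop and switch over to it while this very call is still running,
      // instead of waiting for sumUpTo to be called again.
      for (let i = 0; i < n; i++) {
        total += i;
      }
      return total;
    }

    console.log(sumUpTo(1e8));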