How to avoid slow TokuMX startup?


We run a TokuMX replica-set (2 instances + arbiter) with about 120GB of data (on disk) and lots of indices.

Since the upgrade to TokuMX 2.0 we have noticed that restarting the SECONDARY instance always takes a very long time. The database stays stuck at STARTUP2 for 1h+ before switching to normal mode. While the server is at STARTUP2, it runs at a continuous CPU load. We assume it's rebuilding its indices, even though it was shut down properly before.

While this is annoying, with the PRIMARY being available it caused no downtime. But recently during an extended maintenance we needed to restart both instances. We stopped the SECONDARY first, then the PRIMARY and started them in reverse order. But this resulted in both taking the full 1h+ startup-time and therefore the replica-set was not available for this time.
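One way to reduce the risk during a rolling restart is to confirm that no member is still starting up before stopping the next one. The following is a minimal sketch, assuming the status document comes from the standard `replSetGetStatus` command (for example via pymongo's `client.admin.command("replSetGetStatus")`). The helper names and the sample hostnames are made up for illustration.

```python
# States in which a data-bearing member is still starting up or catching up.
NOT_READY_STATES = {"STARTUP", "STARTUP2", "RECOVERING"}


def members_not_ready(status):
    """Return (name, stateStr) pairs for members that are still starting up.

    `status` is a dict shaped like the reply of the replSetGetStatus command.
    """
    return [
        (m["name"], m["stateStr"])
        for m in status.get("members", [])
        if m.get("stateStr") in NOT_READY_STATES
    ]


def safe_to_restart_next(status):
    """True when no member is in STARTUP, STARTUP2 or RECOVERING."""
    return not members_not_ready(status)


# Hand-written sample shaped like a replSetGetStatus reply (hypothetical hosts).
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY"},
        {"name": "db2:27017", "stateStr": "STARTUP2"},
        {"name": "arb:27017", "stateStr": "ARBITER"},
    ],
}

print(members_not_ready(sample_status))
print(safe_to_restart_next(sample_status))
```

Polling this check in a loop with a sleep would also give a rough measurement of how long each member spends in STARTUP2.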

Not being able to restart a downed replica-set without such a long wait is a risk we'd rather not take.

Is there a way to avoid the (possible) full index-rebuild on startup?


There are 2 answers

DaveR

@Chris - We are revisiting your ticket now. It may have been inadvertently closed prematurely.

@Benjamin: You may want to post this on https://groups.google.com/forum/#!forum/tokumx-user where many more TokuMX users, and the Tokutek support team, will see it.

Chris Heald

This is a bug in TokuMX that causes it to load and iterate over the entire oplog on startup, even if the oplog has been mostly (or entirely) replicated already. I've located and fixed this issue in my local build of TokuMX. The pull request is here: https://github.com/Tokutek/mongo/pull/1230

This has reduced my node startup times from hours to under 5 seconds.
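To illustrate why iterating the whole oplog is so much slower than starting from the last applied position, here is a toy model. It is not the code from the pull request and does not use any TokuMX API. The sorted integer ids stand in for whatever ordering key the real oplog uses, and both function names are hypothetical.

```python
import bisect


def first_unapplied_scan(oplog_ids, last_applied):
    """Walk the entries from the start until one is newer than last_applied.

    Returns (index, steps); the step count grows with the oplog size.
    """
    steps = 0
    for i, entry_id in enumerate(oplog_ids):
        steps += 1
        if entry_id > last_applied:
            return i, steps
    return len(oplog_ids), steps


def first_unapplied_seek(oplog_ids, last_applied):
    """Binary-search the sorted ids for the first entry after last_applied."""
    return bisect.bisect_right(oplog_ids, last_applied)


# Toy oplog with one million entries, almost all of them already applied.
oplog = list(range(1, 1_000_001))
last_applied = 999_998

index, steps = first_unapplied_scan(oplog, last_applied)
print("scan:", index, "after", steps, "steps")
print("seek:", first_unapplied_seek(oplog, last_applied))
```

Both functions find the same position, but the scan touches nearly every entry while the seek needs only about twenty comparisons for a million entries.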