Apify: weird costs for using key-value stores

I'm working with Node.js but I think my question is language-agnostic.

In my Apify actor, I first run a few long scrapes and store the results in some arrays.

Then I send many successive POST requests, iterating over the information collected in the step above.

Note that I'm only interested in sending the requests, so I'm not collecting any scraping results.

So far, so good. But then I felt the need to store the state of the process, so that the actor can recover when something goes wrong, saving time and money.
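The recovery idea amounts to checkpointing: save the index of the next request after each one, and on restart skip everything before it. Below is a minimal sketch of that pattern, assuming a store with async `getValue`/`setValue` methods like Apify's key-value store. `InMemoryStore`, `processWithCheckpoint`, `send`, and the `next` field are hypothetical names chosen so the sketch runs without Apify.

```javascript
// Hypothetical in-memory stand-in for a key-value store. Values are copied
// as JSON on the way in and out, so later mutations don't leak into the store.
class InMemoryStore {
    constructor() {
        this.data = new Map();
    }

    async getValue(key) {
        return this.data.has(key) ? JSON.parse(this.data.get(key)) : null;
    }

    async setValue(key, value) {
        this.data.set(key, JSON.stringify(value));
    }
}

// Sends one request per item, resuming after the last item recorded as done.
// `send` is a hypothetical async function standing in for the POST request.
async function processWithCheckpoint(store, items, send) {
    const state = (await store.getValue('ndxState')) || { next: 0 };
    for (let i = state.next; i < items.length; i++) {
        await send(items[i]);
        state.next = i + 1;                      // index of the next item to send
        await store.setValue('ndxState', state); // checkpoint after every request
    }
    return state.next;
}
```

If a run crashes after two of four items, calling `processWithCheckpoint` again with the same store continues from the third item instead of starting over.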

So my first solution was something like this:

```javascript
const apify = require('apify');

apify.Actor.main(async () => {
    const actorRunStore = await apify.Actor.openKeyValueStore('runStore');
    const actorRunState = (await actorRunStore.getValue('runState')) || {};
    // Big scrape: place everything inside actorRunState.
    await actorRunStore.setValue('runState', actorRunState);
    for (const item of actorRunState.items ?? []) { // hypothetical: the scraped data
        // Successive requests: place counters inside actorRunState.
        await actorRunStore.setValue('runState', actorRunState);
    }
    // Finally, get rid of the store.
    await actorRunStore.drop();
});
```

After running that actor, I wrote down the total cost of key-value store writes.

So I wanted to reduce the cost, and reasoned as follows: there's no need to keep writing the full runState inside the for loop, since most of its contents are fixed (the big pre-scraped arrays). Hence, I split the contents into two states: runState, containing the arrays (written only ONCE), and ndxState, containing the counters/indexes. That gave me this:

```javascript
const apify = require('apify');

apify.Actor.main(async () => {
    const actorRunStore = await apify.Actor.openKeyValueStore('runStore');
    const actorRunState = (await actorRunStore.getValue('runState')) || {};
    const actorNdxState = (await actorRunStore.getValue('ndxState')) || {};
    // Big scrape: place everything inside actorRunState.
    await actorRunStore.setValue('runState', actorRunState);
    for (const item of actorRunState.items ?? []) { // hypothetical: the scraped data
        // Successive requests: place counters inside actorNdxState.
        await actorRunStore.setValue('ndxState', actorNdxState);
    }
    // Finally, get rid of the store.
    await actorRunStore.drop();
});
```

Now actorRunState takes up tens of KBs and actorNdxState just a few bytes. This way, the amount of data written should be much smaller, right? Well, no. The total cost of key-value store writes was higher. :/

So I thought: OK, maybe they update the stored values individually, but the actual write operation involves the whole store. Let's use TWO stores, then:

```javascript
const apify = require('apify');

apify.Actor.main(async () => {
    const actorRunStore = await apify.Actor.openKeyValueStore('runStore');
    const actorNdxStore = await apify.Actor.openKeyValueStore('ndxStore');
    const actorRunState = (await actorRunStore.getValue('runState')) || {};
    const actorNdxState = (await actorNdxStore.getValue('ndxState')) || {};
    // Big scrape: place everything inside actorRunState.
    await actorRunStore.setValue('runState', actorRunState);
    for (const item of actorRunState.items ?? []) { // hypothetical: the scraped data
        // Successive requests: place counters inside actorNdxState.
        await actorNdxStore.setValue('ndxState', actorNdxState);
    }
    // Finally, get rid of both stores.
    await actorRunStore.drop();
    await actorNdxStore.drop();
});
```

And guess what: the cost is even higher now!

....

Is my logic flawed, or is there something I'm forgetting?

Thanks for your insights.

Answer by Lukáš Křivka:

Apify doesn't charge for how big the writes are. It charges for the number of writes, and then separately for storage over time, where size does matter (billed in GB-hours).
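To make the difference concrete, here is a rough model, assuming (as the answer says) that each `setValue()` call counts as one billed write regardless of its size. The 1,000 requests, the 30,000-byte and 50-byte state sizes, and the `countWrites`/`bytesWritten` helpers are hypothetical; no actual Apify prices are used.

```javascript
// setValue() calls made for `requests` POST requests when runState is saved
// once after the scrape, then the loop state is saved after every `every`-th
// request, plus a final save so the last bit of progress is never lost.
function countWrites(requests, every = 1) {
    let writes = 1; // runState, written once after the big scrape
    for (let i = 1; i <= requests; i++) {
        if (i % every === 0 || i === requests) writes++;
    }
    return writes;
}

// Bytes written, which is what the question tried to minimise.
// Under per-operation billing this number does not affect the write cost.
function bytesWritten(requests, runStateBytes, loopStateBytes) {
    return runStateBytes + requests * loopStateBytes;
}

const n = 1000;
console.log(bytesWritten(n, 30000, 30000)); // variant 1 (full state each time): 30030000
console.log(bytesWritten(n, 30000, 50));    // variants 2 and 3 (small ndxState): 80000
console.log(countWrites(n));                // 1001, the same for all three variants
console.log(countWrites(n, 50));            // 21 when saving every 50 requests
```

The byte count drops sharply between the variants, but the number of writes stays at 1 + n, so splitting the state across records or stores cannot lower the write bill. Under this model, what does reduce it is saving less often, with the trade-off that after a crash up to `every - 1` requests may be sent again.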