saveAll() is too slow. Cassandra Database with Spring boot. why?

nar-007 :

I am trying to insert in batches (Objects are stored in an arraylist and as soon as count is divisible by 10000, I insert all these objects into my table. But it takes more than 4 minutes to do so. Is there any approach which is faster?

arr.add(new Car(name, count, type));
if(count%10000==0){
repository.saveAll(arr);
arr.clear();
}
Aaron :

So here is what is happening. I am most curious to see the table definition inside Cassandra. But given your Car constructor,

new Car(name, count, type)

Given those column names, I'm guessing that name is the partition key.

The reason that is significant, is because the hash of the partition key column is what Cassandra uses to figure out which node (token range) the data should be written to.

When you saveAll on 10000 Cars at once, there is no way you can guarantee that all 10000 of those are going to the same node. To deal with this, Spring Data Cassandra must be using a BATCH (or something like it) behind the scenes. If it is a BATCH, that essentially puts one Cassandra node (designated as a "coordinator") to route writes to the required nodes. Due to Cassandra's distributed nature, that is never going to be fast.

If you really need to store 10000 of them, the best way would be send one write at a time asynchronously. Of course, you won't want 10000 threads all writing concurrently, so you'll want to throttle-down (limit) the number of active threads in your code. DataStax's Ryan Svihla has written a couple of articles detailing how to do this. I recommend this one- Cassandra: Batch Loading Without the Batch - The Nuanced Edition.

tl;dr;

Spring Data Cassandra's saveAll really shouldn't be used to persist several thousand writes. If I were using Spring Data Cassandra, I wouldn't even go beyond double-digits with saveAll, TBH.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=31667&siteId=1