Fast streaming of batch data from an MS SQL Server database

Dimitri Shkoklev :

I need to read each row returned by a complex query against a SQL Server database using Hibernate and write the results to a file. The query can return millions of records, so the following code seemed appropriate:

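import org.hibernate.FlushMode;
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.hibernate.query.NativeQuery;

Session unwrap = entityManager.unwrap(Session.class);
NativeQuery nativeQuery =
    unwrap.createNativeQuery("the sql query string read from a file");
nativeQuery.setFlushMode(FlushMode.MANUAL);
nativeQuery.addEntity("C", CustomObject.class);
nativeQuery.setFetchSize(100000);
nativeQuery.setReadOnly(true);
// Scroll through the results instead of loading them all into memory.
ScrollableResults scroll = nativeQuery.scroll(ScrollMode.FORWARD_ONLY);

while (scroll.next()) {
   CustomObject customObject = (CustomObject) scroll.get(0);
   // jsonGenerator is a Jackson JsonGenerator:
   // https://fasterxml.github.io/jackson-core/javadoc/2.6/com/fasterxml/jackson/core/JsonGenerator.html
   jsonGenerator.writeObject(customObject);
   // Evict the entity that was just written so the session cache does not grow.
   unwrap.evict(customObject);
}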

Currently, this code takes approximately 3-4 days to write around 1 million records to the file, which is too slow. I am using the mssql-jdbc driver with Hibernate, and I suspect that the driver ignores the fetch size. Changing the driver is not an option for me, because the other drivers do not support the bulk copy functionality.

The problem is probably that Hibernate makes a separate call to the database to fetch each row, resulting in expensive network round trips.

I have tried enabling adaptive buffering, enabling cursors, setting the connection's auto-commit mode to false, and other things, but nothing made this faster.
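For reference, the driver settings mentioned above might be expressed roughly as follows. This is only a sketch: the class name `StreamingUrlSketch`, the method `buildUrl`, and the host and database values are hypothetical, and it assumes the mssql-jdbc connection properties `selectMethod` and `responseBuffering` are passed on the connection URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a SQL Server JDBC URL with streaming-related properties.
class StreamingUrlSketch {

    static String buildUrl(String host, int port, String database) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", database);
        // Ask the driver to use a server-side cursor.
        props.put("selectMethod", "cursor");
        // Ask the driver to buffer the response adaptively rather than fully.
        props.put("responseBuffering", "adaptive");

        StringBuilder url = new StringBuilder("jdbc:sqlserver://" + host + ":" + port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Host, port and database name are placeholders.
        System.out.println(buildUrl("localhost", 1433, "mydb"));
        // Auto-commit is a separate JDBC setting on the connection itself:
        // connection.setAutoCommit(false);
    }
}
```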

I would like to make this faster and would appreciate any help.

Aditya Rewari :

I had a similar issue!

The data set was very large; this was on a project that involved a bank migration.

Solution adopted: we used PL/SQL instead of a Java batch job. In our experience it was consistently faster.


Another thought I would like to add, from my experience writing large data sets:

  • Instead of committing after every iteration, use bulk commits.

We committed once after every 30,000 iterations over the result set.
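The bulk-commit idea can be sketched as below. This is a minimal illustration, not the code from that project: `BatchCommitSketch`, `processInBatches`, and the row handler are hypothetical names, and the `commit` callback stands in for whatever commits the transaction (for example a wrapper around JDBC's `Connection.commit()`).

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical helper: handles rows one by one but commits only once per batch.
class BatchCommitSketch {

    /** Returns the number of commits performed. */
    static <T> int processInBatches(Iterator<T> rows, int batchSize,
                                    Consumer<T> handleRow, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;  // rows handled since the last commit
        int commits = 0;
        while (rows.hasNext()) {
            handleRow.accept(rows.next());
            pending++;
            if (pending == batchSize) {
                commit.run();
                commits++;
                pending = 0;
            }
        }
        // Commit the final, partially filled batch, if there is one.
        if (pending > 0) {
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // 70,000 dummy rows with a batch size of 30,000 give 3 commits.
        int commits = processInBatches(
                IntStream.range(0, 70_000).iterator(), 30_000,
                row -> { /* write the row somewhere */ },
                () -> { /* commit the transaction here */ });
        System.out.println(commits);
    }
}
```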
