= Spring Boot: JPA Bulk Database Insert

In this project, I reduced the time to insert 10k records from 183 seconds to just 5 seconds.

To achieve this, I made the following changes:

==== 1) Insert the records in batches.

i. Set the Hibernate batch insert size with the following property:

    spring.jpa.properties.hibernate.jdbc.batch_size=30

ii. Add the following connection string properties (these settings are shown together in the application.properties sketch at the end of this section):

    cachePrepStmts=true
    &useServerPrepStmts=true
    &rewriteBatchedStatements=true

    e.g.
    jdbc:mysql://localhost:3306/BOOKS_DB?serverTimezone=UTC&cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true

iii. Change the insert code so that `saveAll` is called with batches of 30 records, matching the batch size set in the properties file.

A very crude implementation looks something like this:

    for (int i = 0; i < totalObjects; i = i + batchSize) {
        if (i + batchSize > totalObjects) {
            // Last (partial) chunk: subList's toIndex is exclusive, so pass totalObjects,
            // not totalObjects - 1, or the final record would be skipped.
            List<Book> books1 = books.subList(i, totalObjects);
            repository.saveAll(books1);
            break;
        }
        List<Book> books1 = books.subList(i, i + batchSize);
        repository.saveAll(books1);
    }

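For a slightly more complete picture, here is a hedged sketch of how that loop could live in a Spring service, with the chunk size injected from the same property as in step i. `BookBatchService`, `BookRepository`, and the default value are illustrative names and assumptions on my part, not necessarily how this project is actually structured.

    import java.util.List;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Service;
    // Hedged sketch: BookBatchService, BookRepository and the default value are
    // illustrative, not necessarily this project's actual code.
    @Service
    public class BookBatchService {
        private final BookRepository repository;
        // Inject the same value as hibernate.jdbc.batch_size so the saveAll()
        // chunks line up with the JDBC batches (falls back to 30 if unset).
        @Value("${spring.jpa.properties.hibernate.jdbc.batch_size:30}")
        private int batchSize;
        public BookBatchService(BookRepository repository) {
            this.repository = repository;
        }
        public void saveInBatches(List<Book> books) {
            for (int i = 0; i < books.size(); i += batchSize) {
                int end = Math.min(i + batchSize, books.size()); // never run past the end of the list
                repository.saveAll(books.subList(i, end));
            }
        }
    }

Using `Math.min` for the upper bound also removes the need to special-case the last partial chunk.
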
This alone did not reduce the time by much: it dropped from 185 secs to 153 secs, an improvement of roughly 18%.

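Before moving on, here is roughly how steps i and ii end up in application.properties. The credentials are placeholders, and the commented-out ordering property is an optional extra I would consider, not something this write-up confirms was used.

    spring.datasource.url=jdbc:mysql://localhost:3306/BOOKS_DB?serverTimezone=UTC&cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true
    spring.datasource.username=books_user
    spring.datasource.password=change-me
    # Hibernate flushes inserts in batches of this size.
    spring.jpa.properties.hibernate.jdbc.batch_size=30
    # Optional extra (not confirmed for this project): keep inserts for the same
    # entity together so batches are not split up.
    #spring.jpa.properties.hibernate.order_inserts=true
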
==== 2) Change the ID generation strategy.

This made a major impact.

I stopped using the `@GeneratedValue` annotation with the `GenerationType.IDENTITY` strategy on my entity class.
Hibernate disables JDBC batching for inserts with this strategy, because each row has to be inserted immediately so that its database-generated id can be read back before the next row is processed.

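For contrast, the original mapping would have looked roughly like this. This is a reconstruction from the description above, not the project's exact code.

    public class Book {
        @Id
        // IDENTITY forces Hibernate to run each INSERT immediately so it can read
        // back the generated id, which silently disables JDBC batching.
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;
    }
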
I changed the strategy to SEQUENCE and provided a sequence generator.

    @Entity
    public class Book {
        @Id
        // SEQUENCE lets Hibernate batch the inserts instead of flushing one row at a time
        @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
        @SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1)
        private Long id;
    }

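A related knob worth knowing about (an assumption on my part, not something this write-up states it used): `@SequenceGenerator` has an `allocationSize` attribute, and letting it match or exceed the batch size means Hibernate reserves ids in blocks instead of asking the sequence for every single row.

    // Hypothetical variant: reserve ids in blocks of 50 so a 30-row batch
    // needs at most one extra round trip to the sequence.
    @SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1, allocationSize = 50)
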
This change drastically improved insert performance, since Hibernate could now batch the inserts.
From the previous 153 secs, the time to insert 10k records dropped to only 9 secs, an improvement of nearly 95%.

Next, I pushed it further with higher batch sizes and noticed that doubling the batch size does not halve the time; the insert time only decreases gradually.

|===
|Batch Size |Time to insert (secs)

|30
|9.5

|60
|6.48

|200
|5.04

|500
|4.46

|1000
|4.39

|2000
|4.5

|5000
|5.09

|===


The optimum in my case was a batch size of 1000, which took around 4.39 secs for 10k records. Beyond that, performance started to degrade, as the numbers above show.