Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Speed up your PostgreSQL bulk inserts with COPY

Posted on Oct 2 Have you ever had the task of migrating data from one database to another? Or even from one table to another? Or to make too many INSERTs all at once. If so, this article is for you: today we will discuss the Copy command. An amazing alternative to performing intensive writes to the database is very useful for data migration imports, and bulk inserts. Having to migrate/insert data in a few seconds is a very common requirement for a task. When people hear that, the first thing they will say to you is:Is this really necessary? Maybe not. For many cases, migration tools like the COPY command, are way faster than the options above, and also cheaper in terms of infrastructure. COPY is a native PostgreSQL command and its syntax is like this:ORThe COPY command in PostgreSQL is a powerful tool for performing bulk inserts and data migrations. It allows you to quickly and efficiently insert large amounts of data into a table.By using the COPY command, you can avoid the need for distributed processing tools, adding more CPU and RAM to the database, or using a NoSQL database. It provides a simpler and more cost-effective solution for bulk inserts and data migrations.So, if you have a task that requires inserting a large number of rows in a short amount of time, consider using the COPY command in PostgreSQL. It can significantly speed up your data migration and import processes.Note: PostgreSQL 16 has improved COPY’s performance by over 300%.  Internally, the COPY command in PostgreSQL works by leveraging the server's file system. When you use the COPY command to import data from a file, PostgreSQL reads the file directly from the file system and writes the data directly to the table, bypassing many of the normal SQL processing steps.This direct file-to-table transfer allows for much faster and more efficient data inserts compared to traditional INSERT statements. It also reduces the overhead on the database server, making it a preferred method for bulk inserts and data migrations.But let’s go deeper to understand all the details.Let’s see a quick comparison between COPY and INSERT commands, to understand why COPY performs better.Summing up, COPY is faster because it reduces the overhead of logging, network latency, parsing, and transaction management compared to multi-line INSERT statements. It allows for a simpler query execution plan, resulting in faster and more efficient bulk inserts and data migrations. A tradeoff is that it requires direct access to the file system, so it may not be suitable for all scenarios where you need to insert data. Another tradeoff is durability, COPY generates few logs, and executes all in a single transaction, which makes it more risky. I did some code with Bun (yes, Bun), to compare the performances of INSERT and COPY loading heavy amounts of data to Postgres. Let’s see:This code inserts 2mi rows at once into the database. Notice that, for multi-line inserts, there is a limit of 1000 lines per INSERT statement. If we run this code, we get:Multi-line INSERT took: 37.187 secondsCOPY took: 10.088 secondsChanging the code to load 5 million records, gives us the following result:Multi-line INSERT took: 78.957 secondsCOPY took: 20.534 secondsNow, with 10 mi records, we get:Multi-line INSERT took: 134.412 secondsCOPY took: 36.965 secondsWe can see how absurdly fast is the COPY command, and why we should consider its use in favor of INSERT in some cases. But we can make COPY even faster, by parallelizing it with threads! For our initial tests with 2 million records, we have:COPY took: 5.915 secondsOur parallelized COPY with 5 million records to insert, performs:COPY took: 15.41 secondsOur parallelized COPY with 10 million records to insert, performs:COPY took: 21.19 seconds I conducted a performance comparison using a simple plot generated with a notebook running Matplotlib. The results are as follows:The graph clearly illustrates the superior efficiency of the COPY command in comparison to the INSERT command for this specific task. The difference in speed is quite remarkable. Github repoPostgres 16 releaseHow does COPY work and why is it so much faster than INSERT?COPY official docsInsert 5000 rows per second using PostgreSQL CopyTemplates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well Confirm For further actions, you may consider blocking this person and/or reporting abuse Zouhair Sahtout - Sep 15 Dmitry Romanoff - Sep 11 Code_Jedi - Sep 23 Łukasz Wójcik - Sep 10 Once suspended, josethz00 will not be able to comment or publish posts until their suspension is removed. Once unsuspended, josethz00 will be able to comment and publish posts again. Once unpublished, all posts by josethz00 will become hidden and only accessible to themselves. If josethz00 is not suspended, they can still re-publish their posts from their dashboard. Note: Once unpublished, this post will become invisible to the public and only accessible to José Thomaz. They can still re-publish the post if they are not suspended. Thanks for keeping DEV Community safe. Here is what you can do to flag josethz00: josethz00 consistently posts content that violates DEV Community's code of conduct because it is harassing, offensive or spammy. Unflagging josethz00 will restore default visibility to their posts. DEV Community — A constructive and inclusive social network for software developers. With you every step of your journey. Built on Forem — the open source software that powers DEV and other inclusive communities.Made with love and Ruby on Rails. DEV Community © 2016 - 2023. We're a place where coders share, stay up-to-date and grow their careers.



This post first appeared on VedVyas Articles, please read the originial post: here

Share the post

Speed up your PostgreSQL bulk inserts with COPY

×

Subscribe to Vedvyas Articles

Get updates delivered right to your inbox!

Thank you for your subscription

×