
Data Migration Strategies in Ruby on Rails: The Right Way to Manage Missing Data

Posted on Jul 22 by Vlad Hilko

In this article, we're going to discuss the possible strategies for migrating, generating, and backfilling data in a Rails application. We'll implement them, improve them, consider their pros and cons, and discuss which ones are better to use in different scenarios. By the end of the article, we will have a full picture of the different ways to solve data migration problems.

In simple terms, data migration is the process of adding, updating, or transferring some data inside your application. A typical case, and the one we'll work through below, is backfilling a value that is missing from existing records.

We'll consider three different ways to do it: changing the data manually, writing a rake task, and using the data-migrate gem.

The first option is the simplest one: we'll just add missing data via rails c or through a direct database connection in production. This approach has both advantages and problems.

The second option is the rake task. In this chapter, we will try to understand how to properly add rake tasks, ensure that they work correctly, learn their pros and cons, and explore how they can be used for data migration. We will start by adding the simplest rake task, and then we will improve its structure, cover the logic with tests, and consider best practices for writing data migrations as rake tasks.

Let's imagine that we have an Animal model with a status field, and we need to change the status value from nil to reserved for all animals that were created before today. How can we do it? Let's start by adding a simple rake task template and checking that it works. Once the template has been executed and works as expected, we can add the actual code with the database update and run the task again. The rake task executes, and the database values are updated accordingly.

That's it. Our main scenario works as expected, but there's still some room for improvement. Let's take a look at what we can do to make our task more reliable. There are five areas that we can potentially improve: output, error handling, database efficiency, isolation, and tests.

As you may have noticed, the rake task above hasn't shown any output.
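For reference, the basic task described above (before any improvements) might look roughly like the sketch below. The original listing is not reproduced here, so the `animals` namespace, the `created_on` field, and the in-memory `Animal` struct are assumptions made so that the example runs without Rails or a database. Inside a real application the task would depend on `:environment` and operate on the ActiveRecord model instead.

```ruby
require "rake"
require "date"

# Makes `namespace`, `desc` and `task` available outside a Rakefile.
extend Rake::DSL

# Hypothetical in-memory stand-in for the Animal model (not ActiveRecord).
Animal = Struct.new(:name, :status, :created_on)

ANIMALS = [
  Animal.new("Rex",  nil,      Date.today - 3), # created earlier, no status
  Animal.new("Tom",  "active", Date.today - 2), # already has a status
  Animal.new("Milo", nil,      Date.today)      # created today
]

namespace :animals do
  desc "Set status to 'reserved' for animals created before today"
  task :backfill_statuses do
    # In a Rails app (an assumption, not the author's exact code) this could
    # be a single statement such as:
    #   Animal.where(status: nil)
    #         .where("created_at < ?", Time.current.beginning_of_day)
    #         .update_all(status: "reserved")
    ANIMALS.each do |animal|
      next unless animal.status.nil? && animal.created_on < Date.today

      animal.status = "reserved"
    end
  end
end

# Equivalent to running `rake animals:backfill_statuses` from the shell.
Rake::Task["animals:backfill_statuses"].invoke
```

Note that the task finishes silently: nothing tells the operator how many records were touched.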
The lack of output can be a real problem because you don't know whether the task ran successfully, and you will spend a lot of time checking production data by hand. Let's fix this problem by displaying the number of animals in the 'reserved' state both before and after the update. Now, when we run the rake task, the output provides better visibility and confirms that the task executed successfully.

What happens if an unexpected error appears in the middle of a data migration? Right now, we don't handle it. Even if it's not critical for the example we've provided, in general we should not forget to wrap this kind of data manipulation in a transaction to keep the data in a consistent state. By adding the ActiveRecord::Base.transaction block, we ensure that all the updates are executed as a single atomic operation. If any error occurs during the data migration, the transaction is rolled back and the data remains unchanged, maintaining data consistency and integrity.

We have already handled the next problem in our rake task, but it's important to mention that we should use the optimal database solution if possible. For example, someone could write our task so that it loads the animals and updates them one by one. Such code triggers an SQL update request for every animal in the list, which makes it non-optimal. That's why it always makes sense to look for a way to do the work in a single DB operation, as we did in our task, which triggers only one DB request. If there's no way to update something in a single DB request, you should at least consider using batches as a good practice.

P.S. To see SQL logs from the rake task, you can send the ActiveRecord logger to standard output inside the task.

One not so obvious problem that can occur when you have many rake tasks is a lack of encapsulation. Let's take two rake tasks that each define a helper method with the same name but a different return value, and try to guess what can go wrong when we run both of them. Have you noticed? That's not what we expected!
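The two task files are not reproduced here. A minimal sketch of the kind of collision being described, assuming both files define a helper called `label` (a hypothetical name) inside their namespace block, could look like this. The `RESULTS` array exists only so the outcome can be inspected afterwards.

```ruby
require "rake"

# Makes `namespace` and `task` available outside a Rakefile.
extend Rake::DSL

RESULTS = []

# Stand-in for a first task file (hypothetical names throughout).
namespace :first do
  task :run do
    RESULTS << label
    puts label
  end

  # `def` inside the block does not belong to the namespace:
  # it defines a method on Object, shared by every task file.
  def label
    "first"
  end
end

# Stand-in for a second task file, loaded after the first one.
namespace :second do
  task :run do
    RESULTS << label
    puts label
  end

  # Same name, so this silently replaces the definition above.
  def label
    "second"
  end
end

Rake::Task["first:run"].invoke
Rake::Task["second:run"].invoke
```

Both tasks print "second", because all task files are loaded before any task body runs and the later definition wins.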
The second rake task overrode the method value from the first one! That's quite dangerous and unexpected. Imagine that the overridden method decides which records a task deletes: you would remove all your records instead of the desired subset!

How can we fix it? We need to wrap each rake task in a class that includes Rake::DSL, so that its helper methods stay private to that class. After this change, both tasks work as expected when executed. Let's apply the same isolation to our backfill_statuses rake task. That's it.

The last thing that we're going to do to ensure quality is to add tests. Let's see how we can test the rake tasks. First of all, we need to define some code to load our tasks and include it in spec/rails_helper.rb. Then we can add our test. That's it.

The third option is to use the data-migrate gem. Let's add this gem to our project and install it. Now you can generate a data migration as you would generate a schema migration. Let's add some code to the generated file to check that it actually works, and run the migration with the command the gem provides. This migration can be run only once, so let's remove it, generate another one, add our real business logic inside, and run it.

Basically, a data migration works by the same logic as a schema migration, but instead of saving the last migration version in the schema_migrations table, it saves the version in another table called data_migrations.

It's important to mention that a data migration should be irreversible in most cases, but we don't want to raise an explicit error, as that would prevent rollback of the schema structure changes. Instead, we just leave the down method empty. For this reason, it is better to design the migration to be idempotent, so that it can be run several times if possible.

The data-migrate gem doesn't provide any additional advantages beyond what we discussed above.
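The idempotency advice above can be illustrated without Rails. The sketch below is not the gem's generated file: `BackfillAnimalStatuses` and the in-memory `Animal` struct are hypothetical stand-ins, shaped like a migration with `up` and an empty `down`. In a real application the class would inherit from an ActiveRecord migration class and query the database.

```ruby
# Hypothetical in-memory stand-in for the Animal model (not ActiveRecord).
Animal = Struct.new(:name, :status)

# Shaped like a migration class with `up` and `down` (an assumption made
# for illustration; it does not use the data-migrate gem).
class BackfillAnimalStatuses
  def initialize(animals)
    @animals = animals
  end

  # Touches only records that still have no status, so a second run
  # finds nothing left to change. Returns the number of updated records.
  def up
    pending = @animals.select { |animal| animal.status.nil? }
    pending.each { |animal| animal.status = "reserved" }
    puts "Backfilled #{pending.size} animal(s)"
    pending.size
  end

  # Left empty on purpose: the backfill is not reverted, and an empty
  # method does not block a rollback of schema changes.
  def down; end
end

ANIMALS = [Animal.new("Rex", nil), Animal.new("Tom", "active")]

migration  = BackfillAnimalStatuses.new(ANIMALS)
FIRST_RUN  = migration.up # prints "Backfilled 1 animal(s)"
SECOND_RUN = migration.up # prints "Backfilled 0 animal(s)"
```

Selecting only the records that still need the change is what makes the repeated run harmless. It also gives the migration visible output, one of the concerns carried over from the rake task.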
We therefore still need to think about the problems that we solved for the rake task, such as displaying output, adding transactions, optimizing DB requests, etc.

Let's compare these two solutions and decide which one we should use and under what conditions. In general, the rake task is much more flexible and testable and can cover the same tasks as the data migration, but it may require more effort. Data migrations are much stricter, but they provide some automation and a strict execution order that is tied to schema changes.

For example, the data-migrate gem is a very good fit if you need to support some database schema restructuring and you don't want to lose data during these changes. For instance, if you need to rename a column, and you want to copy the values from the old column to a new one, then add a NOT NULL constraint (null: false) to the new one, and then completely remove the old one, the data-migrate gem will make this process much easier compared to using a rake task.

On the other hand, if the task is unrelated to database schema changes, a rake task might be a more suitable choice. It offers greater flexibility and testability, making it easier to manage tasks that are not directly tied to schema alterations.

Throughout this article, we have explored various strategies for data migration, generation, and backfilling in a Rails application. We have implemented and improved these strategies, carefully considering their advantages and disadvantages. Additionally, we have compared two primary solutions, the rake task and the data-migrate gem, and examined their suitability in different scenarios.



This post first appeared on VedVyas Articles.
