Seeding data as part of Rails database migrations

So, we stumbled across a slight problem when trying to seed data as part of our database migration scripts. The de-facto standard seems to be to keep the seed data in yaml files and load them in using fixtures. This is a perfectly adequate way of doing it for seed data that will never change, but doesn’t really fit into the infrastructure of migrations. There are also many plugins out there to help with using fixtures to seed data, but none of them allow you to have different seed data for different migration versions.

After bumping our heads together, we came up with a way of managing seed data using the same philosophy as data structure migrations. The only prerequisite to your data structure is to add a column named “came_from_migration” to any table that is going to contain seeded data. This way when moving down migrations it is possible to determine what data that shouldn’t be there (and delete it).

  1. Add a “fixtures” folder to /db/migrate. Your seed data yaml files will be saved here.
  2. Create your seed data yaml file and name it in a similar fashion to your migration script using the table name as the descriptor e.g. 005_table_name_to_seed.yml. Save this in the folder you created above. This means that you can seed several tables as part of one migration version and it’s nice and easy to see what’s going on from just the file name.
  3. Now I needed to write a couple of helper methods to move to and from the seed data. I’ve placed these in a MigrationsHelper module which is included in any migration that needs to seed data. And because we are so nice, here they are:
  def seed_from_yaml(table_name)
    info = get_migration_file_info(caller[0])
    fixture_folder = RAILS_ROOT + "/db/migrate/#{info[:type]}_fixtures"
    fixture_file = "#{info[:version]}_#{table_name}"
    puts "Seeding #{table_name} from #{info[:type]} fixture version #{info[:version]}"

    require 'active_record/fixtures'    
    connection = ActiveRecord::Base.connection
    seed_data = Fixtures.new(connection, table_name.to_s, nil, File.join(fixture_folder, fixture_file))
    seed_data.insert_fixtures
  end
  
  def deseed(table_name)
    info = get_migration_file_info(caller[0])
    puts "Deseeding #{table_name} with anything newer than #{info[:version]}"
        
    execute %{DELETE FROM #{quote_table_name(table_name)} WHERE #{quote_column_name('came_from_migration')}=#{info[:version].to_i}}
  end
  
  private
    def get_migration_file_info(file)
      file.gsub!(/:.*/, '') # get rid of everything after the colon
      migration = file.split '/'
      db_type = migration.include?("global") ? "global" : "private"
      migration_file = migration[migration.length-1].split '_'
      migration_number = migration_file[0]
      { :type => db_type, :version => migration_number, :file => migration_file }
    end

Seeding data when not using migrations

What about when running tests or deploying, i.e. seeding data when not using migrations and coming from a empty database? Well, the schema should be held for you in your _schema.rb file, but that doesn’t cover any seed data. The answer is to create a *_seed.yml containing all the data in the database (in the same way the *_schema.rb file contains the complete data structure schema of the database). This way you can load in the seed data after the tables have been created. We have yet to write this method, so is left as an exercise for the reader at present. The basic premise is to go through every table in the schema, check for the existence of the “came_from_migration” column, do a “select * from table where came_from_migration is not null” on tables with that column and then concatenate the output from the select statements to a *_seed.yml and save in /db. There may also be problems with referential integrity with seed data and to what order it is inserted into the database. No doubt we will come across these difficulties soon.

Bibliography