Dumping and loading seed data for Rails database resets

As mentioned in our previous posts regarding database migrations in Rails, we have introduced a method of managing seed data in migrations. Although this methodology still has some problems, generally speaking it works pretty well.

After a database migration, a schema dump takes place. This allows fast recreation of databases without having to go through every single migration. The problem with this is that any seed data added during the migrations is not added back in when doing a db:reset. This can be solved by having a similar function to Schema Dumper.

Here are the seed data tasks (you probably want to add these into the db namespace):

namespace :seed_data do
  desc 'Load schema seed data'
  task :load => :environment do
    puts "== Loading dumped seed data"
    config = ActiveRecord::Base.configurations[RAILS_ENV || 'development']
    ActiveRecord::Base.establish_connection(config)
    require 'active_record/fixtures'
    Dir.glob(RAILS_ROOT + '/db/seed_data/global/*.yml').each do |file|
      Fixtures.create_fixtures(RAILS_ROOT + '/db/seed_data/global', File.basename(file, '.*'))
    end
  end
  
  desc 'Save schema seed datae'
  task :dump => :environment do
    config = ActiveRecord::Base.configurations[RAILS_ENV || 'development']
    ActiveRecord::Base.establish_connection(config)
    Database::SeedDataDumper.dump(ActiveRecord::Base.connection)
  end
end

You’ll need the SeedDataDumper class:

module Database
  class SeedDataDumper

    def self.dump(connection)
      new(connection).dump()
      true
    end

    def dump()
      @connection.tables.each do |table|
        if @connection.select_all("DESCRIBE #{table}").inject(false) { |b, col| b || (col["Field"]=="came_from_migration") }
          data = @connection.select_all("SELECT * FROM #{table} WHERE came_from_migration IS NOT NULL")
          if !data.empty?
            outfile = File.new("#{@folder}/#{table}.yml",  "w")
            header(outfile)
            i = "0"
            outfile.write data.inject({}) { |hash, record|
              hash["#{table}_#{i.succ!}"] = record
              hash
            }.to_yaml
            footer(outfile)
          end
        end
      end
    end

    private
      def initialize(connection)
        @connection = connection
        @folder = "#{RAILS_ROOT}/db/seed_data"
        puts "== Dumping seed data"
      end

      def header(stream)
        stream.puts <<HEADER
# This file is auto-generated from all the current "came_from_migration" rows in the selected
# database.  It generates a yaml file per table which can be loaded in again using fixtures.
HEADER
      end

      def footer(stream)
        stream
      end

  end
end
&#91;/sourcecode&#93;

Remember to put the following line at the end of your <code>db:migrate</code> task:


Rake::Task["db:seed_data:dump"].invoke if ActiveRecord::Base.schema_format == :ruby

Seeding data in Rails Migrations (Part 2)

We’ve had a significant number of hits on our first article on Seeding data as part of Rail database migrations which just goes to show that people are finding this a bit of a problem! As we have been using our technique for a few weeks now, we’d like to point out a few of the problems with it that we have found (some with solutions, and some that we just haven’t figured out yet!).

  1. Fixtures#create_fixtures deletes data – Whoops! After looking at the source code, create_fixtures effectively drops all the data in the table it is just about to populate. Obviously this isn’t very useful if you want to use a migration on any table that isn’t empty.
  2. Related data – Migrations in Rails 2 have a nifty feature allowing you to create inter-related table rows by using named data items. The methodology outlined in Part 1 doesn’t load all the yaml files into the migration and then run them, instead it runs them one at a time meaning (and with no model class) so Rails is unable to resolve the relationships between rows.
Solutions
The easiest way to fix the related data is to extend the reader for yaml files to have more than one table per file, consider a yaml seed data migration file “006_add_monkeys_and_fruit.yml”:
monkeys:
  george:
    name: George the Monkey
    pirate: reginald
    fruits: apple, orange, grape

fruits:
  apple:
    name: apple

  orange:
    name: orange

  grape:
    name: grape

This naming convention follows the ideas behind migration naming convention and doesn’t stop you having more than one seed file per migration (obviously of unrelated data) e.g. 006_add_tables_and_chairs.yml

Temporary work around

For the time being (until we extend the yaml parser) we have written a slightly different seed data function for our migration files This allows us to seed a row into the database, read its insertion ID and then reference that row object in subsequent inserts.

Conclusion

We still think that Fixtures provides a pretty good way of introducing structure data into our tables, but just needs a little bit more work before it plays nicely with migrations. We have also found the time to start writing the schema dump/load equivalents for seed data (when creating database from schemas, i.e. in production). These methods are just being finished and tested and we will post them up here in the next couple of days.

Seeding data as part of Rails database migrations

So, we stumbled across a slight problem when trying to seed data as part of our database migration scripts. The de-facto standard seems to be to keep the seed data in yaml files and load them in using fixtures. This is a perfectly adequate way of doing it for seed data that will never change, but doesn’t really fit into the infrastructure of migrations. There are also many plugins out there to help with using fixtures to seed data, but none of them allow you to have different seed data for different migration versions.

After bumping our heads together, we came up with a way of managing seed data using the same philosophy as data structure migrations. The only prerequisite to your data structure is to add a column named “came_from_migration” to any table that is going to contain seeded data. This way when moving down migrations it is possible to determine what data that shouldn’t be there (and delete it).

  1. Add a “fixtures” folder to /db/migrate. Your seed data yaml files will be saved here.
  2. Create your seed data yaml file and name it in a similar fashion to your migration script using the table name as the descriptor e.g. 005_table_name_to_seed.yml. Save this in the folder you created above. This means that you can seed several tables as part of one migration version and it’s nice and easy to see what’s going on from just the file name.
  3. Now I needed to write a couple of helper methods to move to and from the seed data. I’ve placed these in a MigrationsHelper module which is included in any migration that needs to seed data. And because we are so nice, here they are:
  def seed_from_yaml(table_name)
    info = get_migration_file_info(caller[0])
    fixture_folder = RAILS_ROOT + "/db/migrate/#{info[:type]}_fixtures"
    fixture_file = "#{info[:version]}_#{table_name}"
    puts "Seeding #{table_name} from #{info[:type]} fixture version #{info[:version]}"

    require 'active_record/fixtures'    
    connection = ActiveRecord::Base.connection
    seed_data = Fixtures.new(connection, table_name.to_s, nil, File.join(fixture_folder, fixture_file))
    seed_data.insert_fixtures
  end
  
  def deseed(table_name)
    info = get_migration_file_info(caller[0])
    puts "Deseeding #{table_name} with anything newer than #{info[:version]}"
        
    execute %{DELETE FROM #{quote_table_name(table_name)} WHERE #{quote_column_name('came_from_migration')}=#{info[:version].to_i}}
  end
  
  private
    def get_migration_file_info(file)
      file.gsub!(/:.*/, '') # get rid of everything after the colon
      migration = file.split '/'
      db_type = migration.include?("global") ? "global" : "private"
      migration_file = migration[migration.length-1].split '_'
      migration_number = migration_file[0]
      { :type => db_type, :version => migration_number, :file => migration_file }
    end

Seeding data when not using migrations

What about when running tests or deploying, i.e. seeding data when not using migrations and coming from a empty database? Well, the schema should be held for you in your _schema.rb file, but that doesn’t cover any seed data. The answer is to create a *_seed.yml containing all the data in the database (in the same way the *_schema.rb file contains the complete data structure schema of the database). This way you can load in the seed data after the tables have been created. We have yet to write this method, so is left as an exercise for the reader at present. The basic premise is to go through every table in the schema, check for the existence of the “came_from_migration” column, do a “select * from table where came_from_migration is not null” on tables with that column and then concatenate the output from the select statements to a *_seed.yml and save in /db. There may also be problems with referential integrity with seed data and to what order it is inserted into the database. No doubt we will come across these difficulties soon.

Bibliography