Rails 2.3, named_scope, destroy_all and callback confusion
Background
I was recently working on a Rails 2.3.5 application when I found myself
stumped on some perplexing behavior that arose when using a named scope,
the #destroy_all
method and an after destroy callback fired
by an ActiveRecord observer. Let’s begin with the models. There are
Asset and User models. The assets table’s columns include the integer
file_size
and timestamp delete_after
. There
is a named scope, expired
, which returns all assets for
which the value of delete_after
is in the past. Assets
belong to users.
class Asset < ActiveRecord::Base
belongs_to :user, :counter_cache => true
named_scope :expired, lambda { {:conditions => ['delete_after < ?', Time.now]} }
end
# == Schema information
# ...
# user_id :integer not null
# file_size :integer default(0), not null
# delete_after :datetime
# ...
Users have many assets. The users table has a large integer column,
current_disk_usage
, which holds the denormalized sum of
the file sizes of the assets belonging to each user. There is an
instance method, #recalculate_disk_usage!
, which
triggers the User instance to update current_disk_usage
be recalculating the sum of the user’s assets.
class User < ActiveRecord::Base
has_many :assets
def recalculate_disk_usage!
update_attribute(:current_disk_usage, assets.sum(:file_size))
end
end
# == Schema information
# ...
# current_disk_usage :integer(12) default(0), not null
# ...
There is an AssetObserver class which triggers
User#recalculate_disk_usage!
any time an asset record is
created, updated or destroyed. (The implementation show below is
inefficient, and one workaround to the problem discussed in this article
was to make the observer a bit gentler on the database. That version can
be found at the bottom of the article.)
class AssetObserver < ActiveRecord::Observer
def after_create(asset)
asset.user.recalculate_disk_usage! unless asset.file_size.zero?
end
def after_update(asset)
asset.user.recalculate_disk_usage! if asset.file_size_changed?
end
def after_destroy(asset)
asset.user.recalculate_disk_usage!
end
end
The problem
The application in question must periodically purge asset records which
have expired (where delete_after
is in the past). This is
accomplished with a Rake task which sends #destroy_all
to
the Asset.expired
named scope.
task :purge_expired => :environment do
Asset.expired.destroy_all
end
And this is where the confusion set in. Whenever I manually created,
updated or destroyed an asset the asset observer kicked in as expected
and the user’s current disk usage was updated accordingly. Whenever I
ran the purge_expired
Rake task, the correct assets were
destroyed, but the user’s current disk usage would get set to zero. If I
manually created, updated or destroyed another asset the current disk
usage would once again be set to the correct, up-to-date value. It was
time to see what SQL was actually being run, so I opened up the log
file.
DELETE FROM "assets" WHERE "id" = 2
SELECT sum("assets".file_size) AS sum_file_size FROM "assets" WHERE (("assets".delete_after < '2010-04-30 06:41:20.755388') AND ("assets".user_id = 5))
UPDATE "users" SET "updated_at" = '2010-04-30 06:41:21.189777', "current_disk_usage" = 0 WHERE "id" = 5
J’accuse! The DELETE
statement on the first line removes
the record from the assets table, and the UPDATE
on the
last line shows the user’s disk usage being set to zero. But what’s up
with the SELECT
in the middle? It’s calculating the sum of
the file sizes of the user’s assets, but it’s limited the sum to those
assets which have expired – look at the first half of the
WHERE
clause. For some reason the conditions imposed by the
Asset.expired
named scope are filtering into the call to
user.assets.sum(:file_size)
. I took a look at the
implementation of ActiveRecord::Base.destroy_all
for a
clue.
# File activerecord/lib/active_record/base.rb, line 876
def destroy_all(conditions = nil)
find(:all, :conditions => conditions).each { |object| object.destroy }
end
Not really an obvious answer, but it helped to steer my thinking.
Asset.expired
returns an instance of
ActiveRecord::NamedScope::Scope
. Sending
#destroy_all
to that scope calls
ActiveRecord::Base.find(:all)
which returns an array of
Asset
instances. We then iterate over that array, sending
#destroy
to each asset. For each asset, after
#destroy
runs,
AssetObserver#after_destroy(asset)
is called which calls
asset.user.recalculate_disk_usage!
which in turn calls
asset.user.assets.sum(:file_size)
.
A workaround
Okay, but I still didn’t know why the expired
named scope
conditions at the beginning of the process were showing up again when
calling user.assets
near the end of the process. However I
had a hunch the call to ActiveRecord::Base.find
had
something to do with it, so to test this hunch I modified the Rake task,
removing the call to destroy_all
and instead iterating
over the scope directly.
task :purge_expired => :environment do
Asset.expired.each { |asset| asset.destroy }
end
So what happens when I run the Rake task now?
DELETE FROM "assets" WHERE "id" = 12
SELECT sum("assets".file_size) AS sum_file_size FROM "assets" WHERE ("assets".user_id = 5)
UPDATE "users" SET "updated_at" = '2010-04-30 18:13:41.050527', "current_disk_usage" = 1696416 WHERE "id" = 5
Et voilà! I don’t know what destroy_all
’s use of
ActiveRecord::Base.find
does that leaks into later elements
of the destruction process. Bypassing destroy_all
by
iterating over the named scope directly seems to fix the problem, and to
be honest, while the original version with destroy_all
looked more like idiomatic Rails, the use of each
feels
more like idiomatic Ruby, i.e., purer. I think this may be a bug, but
I’d need to get deeper into the guts of Rails’ traversal across the
association to be sure. I’ve only encountered this problem on Rails 2.3.
I haven’t tested the Rails 3 release candidate yet.
A different workaround
But wait, there’s more. As I mentioned earlier, the original
implementation of AssetObserver
isn’t efficient. Here’s the
problem: each time an asset record is destroyed, the database is issued
both SELECT SUM
and UPDATE
statements. The
latter is unavoidable, but is the former strictly necessary?
ActiveRecord provides models with #increment!
and
#decrement!
instance methods. They’d normally be used to
increment or decrement a counter by one, but they both take optional
second arguments which are the amount by which the counter
(current_disk_usage
) should be changed. When the
observer’s callback is triggered, the observed object (in this case, an
asset) is passed as an argument to the callback method. Since the
callback is acting on a single object at the time, instead of telling
the user to calculate a sum of all remaining asset’s file sizes, we
could just increment or decrement the current disk usage by the file
size of the asset being changed (or in the case of updates, by the
difference between the new and old file sizes). Here’s a simple version
of how that would look:
class AssetObserver < ActiveRecord::Observer
def after_save(asset)
if asset.file_size_changed?
difference = asset.file_size_change.last - asset.file_size_change.first
asset.user.increment!(:current_disk_usage, difference)
end
end
def after_destroy(asset)
asset.user.decrement!(:current_disk_usage, asset.file_size)
end
end
By removing the SELECT SUM
call we certainly won’t hit the
database as hard for each asset being removed, so perhaps this is a more
correct way of doing things.
As it happens, the application this problem comes from has very light
database use as it is. With only a couple hundred users, each with a few
dozen assets, usually not more than a couple dozen assets total expiring
each week, and reliably long window of inactivity each night (asset
expiration happens once per day), peppering the database with a few
dozen SELECT SUM
statements isn’t a big deal. The use of
User#recalculate_disk_usage!
also has the benefit of
being self-healing: if, somehow, the current disk usage value for a user
were to become corrupted, it would be automatically fixed the next time
#recalculate_disk_usage!
was called. If instead of
simply triggering the user’s recalculating of disk usage,
AssetObserver
becomes responsible for the actual values
being added or subtracted from current_disk_usage
there
may be implications for possible data corruption (although that should
be pretty well mitigated by the asset activities being wrapped in a
single database transaction).