jwadsack - 4 years ago
Ruby Question

How can I ensure that ActiveRecord is saving changes to the database?

I have a complex system that involves many Resque workers, jobs, and a monitoring process. The jobs have parent-child dependencies and run through a series of states (using the state_machine gem), which is why the monitoring process exists. We depend on the database state to keep cross-process tracking in sync.

Here's a rough idea:

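class ParentMonitor < ActiveRecord::Base
  # Children point back here through their owner_id column (assumed from belongs_to :owner).
  has_many :children, class_name: 'ChildMonitor', foreign_key: :owner_id

  state_machine :state, initial: :work_needed do
    event :succeed do
      transition :work_needed => :work_succeeded
    end

    event :fail do
      transition :work_needed => :work_failed
    end
  end

  # Called by each child when it finishes; resolves the parent once no child is pending.
  def child_transition
    return if children.any? { |child| child.work_needed? }

    if children.any? { |child| child.work_succeeded? }
      succeed
    else
      fail
    end
  end
end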

class ChildMonitor < ActiveRecord::Base
  belongs_to :owner, class_name: 'ParentMonitor'

  state_machine :state, initial: :work_needed do
    event :succeed do
      transition :work_needed => :work_succeeded
    end
    after_transition :to => :work_succeeded, :do => :notify_owner

    event :fail do
      transition :work_needed => :work_failed
    end
    after_transition :to => :work_failed, :do => :notify_owner
  end

  # Runs as an after_transition hook whenever the child reaches a terminal state.
  def notify_owner
    owner.child_transition
  end
end


What's happening is that for the first few such jobs (say a dozen or two out of several hundred), the ParentMonitors are left in the work_needed state even though all of their children are in either work_succeeded or work_failed. Through tracing and testing I've determined that each time ParentMonitor#child_transition is called, the list of children in the work_needed state shrinks, until at some point it loads from the database and replaces all the children with records in the work_needed state, even though some had previously completed.

In addition, I don't see any UPDATE statements in the log file for these first few children until it suddenly starts logging the updates. That logging coincides with the moment it seems to reset the states of all its children.

It makes me think that the changes are all happening in memory due to some cached state, but I've added reload, save and find calls throughout and they don't seem to effect change. I've also tried wrapping these calls in uncache, but that doesn't help.

Answer

As it turned out, the writes were being held in a long-running transaction: the state_machine gem keeps a transaction open from the state change until the last after hook has finished, so nothing is committed until then. We had written hooks that ran for hours on the main monitoring loop.
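To make the effect concrete, here is a minimal toy model in plain Ruby. ToyDatabase is a hypothetical in-memory stand-in, not ActiveRecord or the state_machine gem; it only assumes that writes made inside a transaction become visible to other readers when the transaction block returns.

```ruby
# Toy model only: ToyDatabase is a hypothetical stand-in, not ActiveRecord.
class ToyDatabase
  def initialize
    @committed = {} # the rows other processes can see
  end

  # What any other connection would read right now.
  def read(key)
    @committed[key]
  end

  # Buffer the writes made in the block; publish them only when it returns.
  def transaction
    pending = {}
    yield pending
    @committed.merge!(pending)
  end
end

db = ToyDatabase.new
db.transaction { |tx| tx[:child_1] = "work_needed" }

seen_during_hook = nil
db.transaction do |tx|
  tx[:child_1] = "work_succeeded"      # the state change
  # An after hook would run here, still inside the open transaction.
  seen_during_hook = db.read(:child_1) # what another process sees meanwhile
end
seen_after_commit = db.read(:child_1)

puts "during hook:  #{seen_during_hook}"
puts "after commit: #{seen_after_commit}"
```

If the hook in the middle of the second block runs for hours, every other process keeps reading the old state for that whole time.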

We resolved this by performing the actions between state changes rather than in callbacks.
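A rough sketch of that restructuring, again in plain Ruby rather than with the real gems. ToyMonitor, ToyParent, transition_to and finish_child are all hypothetical names; the sketch assumes each transition commits as soon as the method returns, and it only shows the ordering of the calls.

```ruby
# Toy stand-ins, not the state_machine gem. transition_to models an event
# whose transaction is committed by the time the method returns.
LOG = []

class ToyMonitor
  attr_reader :name, :state

  def initialize(name)
    @name = name
    @state = :work_needed
  end

  def transition_to(new_state)
    LOG << "#{name}: begin"
    @state = new_state
    LOG << "#{name}: commit #{new_state}"
  end
end

class ToyParent < ToyMonitor
  attr_reader :children

  def initialize(name, children)
    super(name)
    @children = children
  end

  # Same decision logic as ParentMonitor#child_transition in the question.
  def child_transition
    return if children.any? { |c| c.state == :work_needed }

    if children.any? { |c| c.state == :work_succeeded }
      transition_to(:work_succeeded)
    else
      transition_to(:work_failed)
    end
  end
end

# Hypothetical driver: the follow-up runs after the child's transition has
# committed, not from inside an after_transition hook.
def finish_child(parent, child, new_state)
  child.transition_to(new_state) # short transaction
  parent.child_transition        # runs outside that transaction
end

kids = [ToyMonitor.new("child_1"), ToyMonitor.new("child_2")]
parent = ToyParent.new("parent", kids)
finish_child(parent, kids[0], :work_failed)
finish_child(parent, kids[1], :work_succeeded)

puts LOG
puts parent.state
```

The point of the ordering is that each child's commit line appears in the log before the parent is consulted, so no transaction stays open while the follow-up work runs.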

Incidentally, the erroneous behavior is exactly as described in the latest Red Book (Readings in Database Systems) as a side effect of the "weak isolation" concurrency implemented in most RDBMSs:

example anomalies include reading intermediate data that another transaction produced, reading aborted data, reading two or more different values for the same item during execution of the same transaction, and “losing” some effects of transactions due to concurrent writes to the same item
