marcamillion - 4 months ago
Ruby Question

How do I extract a specific portion of a code snippet from multiple files, when that portion may differ between files?

I am iterating over various versions of a snippet of code (e.g. Associations.rb in Rails).

What I want to do is just extract one snippet of the code, for example the has_many method:

def has_many(name, scope = nil, options = {}, &extension)
  reflection = Builder::HasMany.build(self, name, scope, options, &extension)
  Reflection.add_reflection self, name, reflection
end


At first I was thinking of just searching the entire file for the string "def has_many" and then saving everything between that string and "end". The obvious issue with this is that different versions of this file can have multiple "end" strings within the method.

For instance, whatever I come up with for the above snippet should also work for this one:

def has_many(association_id, options = {})
  validate_options([ :foreign_key, :class_name, :exclusively_dependent, :dependent, :conditions, :order, :finder_sql ], options.keys)
  association_name, association_class_name, association_class_primary_key_name =
    associate_identification(association_id, options[:class_name], options[:foreign_key])

  require_association_class(association_class_name)

  if options[:dependent] and options[:exclusively_dependent]
    raise ArgumentError, ':dependent and :exclusively_dependent are mutually exclusive options. You may specify one or the other.' # ' ruby-mode
  elsif options[:dependent]
    module_eval "before_destroy '#{association_name}.each { |o| o.destroy }'"
  elsif options[:exclusively_dependent]
    module_eval "before_destroy { |record| #{association_class_name}.delete_all(%(#{association_class_primary_key_name} = '\#{record.id}')) }"
  end

  define_method(association_name) do |*params|
    force_reload = params.first unless params.empty?
    association = instance_variable_get("@#{association_name}")
    if association.nil?
      association = HasManyAssociation.new(self,
        association_name, association_class_name,
        association_class_primary_key_name, options)
      instance_variable_set("@#{association_name}", association)
    end
    association.reload if force_reload
    association
  end

  # deprecated api
  deprecated_collection_count_method(association_name)
  deprecated_add_association_relation(association_name)
  deprecated_remove_association_relation(association_name)
  deprecated_has_collection_method(association_name)
  deprecated_find_in_collection_method(association_name)
  deprecated_find_all_in_collection_method(association_name)
  deprecated_create_method(association_name)
  deprecated_build_method(association_name)
end


Assume that each version is stored as text in some column in my db.

How do I approach this: using Ruby's string methods, or should I be approaching this another way?

Edit 1

Please note that this question relates specifically to string manipulation using a regex, without a parser.

Answer

As discussed, this should be done with a parser like Ripper.
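
For reference, here is a minimal sketch of how a parser could back up a plain string search. It is not a full Ripper-based solution. It assumes each stored version still parses under the Ruby that runs the script, and extract_method and SAMPLE are hypothetical names made up for this illustration. The idea is to start at the "def" line, extend the candidate to each later line containing "end", and accept the first candidate that Ripper.sexp reports as valid Ruby (it returns nil on a syntax error).

```ruby
require 'ripper'

# Hypothetical helper, not part of the answer: returns the source of the first
# `def <name>` found in +source+, or nil when no complete method can be found.
def extract_method(source, name)
  lines = source.lines
  first = lines.index { |line| line =~ /^\s*def\s+#{Regexp.escape(name)}\b/ }
  return nil unless first

  (first...lines.size).each do |last|
    next unless lines[last] =~ /\bend\b/ # only lines that could close the def
    candidate = lines[first..last].join
    # Ripper.sexp returns nil when the candidate is not valid Ruby, e.g. when
    # the `end` we stopped at only closes an inner `if` or `do` block.
    return candidate if Ripper.sexp(candidate)
  end
  nil
end

# Made-up sample standing in for one stored version of the file.
SAMPLE = <<~'RUBY'
  def has_many(name)
    if name.nil?
      raise ArgumentError
    end
    name
  end

  def has_one(name)
    name
  end
RUBY

puts extract_method(SAMPLE, "has_many")
```

Because the check is syntactic, this sketch does not depend on indentation, at the cost of running the parser once per candidate "end" line.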


However, to answer whether it can be done with string methods: I will match the syntax with a regex, provided that:

  • You can rely on indentation, i.e. the text has exactly the same leading characters before "def" and before its matching "end".
  • There are no multiline strings in between that could simulate an "end" with the same indentation. That includes multiline strings, heredocs, %{ }, etc.
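
To illustrate the second proviso, here is a small sketch of the failure mode. The sample method is made up for this example: it contains a heredoc whose body happens to hold an "end" at the same indentation as the "def", so the regex from this answer stops too early.

```ruby
# Same regex as in the answer, repeated here so the example is self-contained.
HAS_MANY_RE = /^
        (\s*)              # indentation of the def line
        def\ +has_many\b   # literal "def has_many"
        (?:.*+\n)*?        # whole lines, as few as possible
        \1                 # same indentation again
        end\b              # literal "end"
        /x

# Made-up input: the heredoc body contains "  end", indented like the def.
TRICKY = <<-'RUBY'
  def has_many(name)
    sql = <<-SQL
  end
    SQL
    name
  end
RUBY

TRUNCATED = TRICKY[HAS_MANY_RE]
puts TRUNCATED
puts "lines matched: #{TRUNCATED.lines.size} of #{TRICKY.lines.size}"
```

The match ends at the "end" inside the heredoc, so the last three lines of the method are lost. Inputs like this need a parser.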

Code

regex = /^
        (\s*)              # matches the indentation (we'll backreference later)
        def\ +has_many\b   # literal "def has_many" with a word boundary
        (?:.*+\n)*?        # match whole lines - as few as possible
        \1                 # matches the same indentation as the def line
        end\b              # literal "end"
        /x

subject = %q|
  def has_many(name, scope = nil, options = {}, &extension)
      if association.nil?
        instance_variable_set("@#{association_name}", association)
      end
  end|


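# Print each whole match. The regex has a capture group, so a plain scan would
# return only the captured indentation; $& holds the full text of the match.
puts subject.to_enum(:scan, regex).map { $& }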



The regex relies on:

  1. Capturing the whitespace (indentation) with the group (\s*),
  2. followed by the literal "def has_many".
  3. It then consumes as few lines as it can with (?:.*+\n)*?.
    Notice that .*+\n matches a whole line (the possessive .*+ never gives characters back),
    and (?:...)*? repeats it zero or more times. The trailing ? makes the repetition lazy (as few lines as possible).
    It keeps consuming lines until the following condition matches:
  4. \1 is a backreference to the text captured in step 1, i.e. exactly the same indentation as the "def" line,
  5. followed by the literal "end".
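
As a rough sketch of how this could be applied to the question's setup, the snippet below runs the same regex over two versions of a file. VERSION_A and VERSION_B are made-up strings standing in for values read from the text column, and they assume the indentation was stored intact. String#[] with a regex returns the whole match (or nil), which avoids the scan / $& workaround when only the first match per file is needed.

```ruby
# Same regex as in the answer, repeated here so the example is self-contained.
HAS_MANY_RE = /^
        (\s*)              # indentation of the def line
        def\ +has_many\b   # literal "def has_many"
        (?:.*+\n)*?        # whole lines, as few as possible
        \1                 # same indentation again
        end\b              # literal "end"
        /x

# Hypothetical stand-ins for two stored versions of the same file.
VERSION_A = <<-'RUBY'
    def has_one(name)
      name
    end

    def has_many(name, scope = nil)
      build(name, scope)
    end
RUBY

VERSION_B = <<-'RUBY'
  def has_many(association_id, options = {})
    if options[:dependent]
      register(association_id)
    end
    association_id
  end

  def has_one(association_id)
    association_id
  end
RUBY

# One extracted snippet per version; nil where no match was found.
EXTRACTED = [VERSION_A, VERSION_B].map { |source| source[HAS_MANY_RE] }

EXTRACTED.each_with_index do |snippet, index|
  puts "--- version #{index + 1} ---"
  puts snippet
end
```

In the second version the inner "end" is indented more deeply than the "def", so the backreference rejects it and the match continues to the method's own "end".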


