antonyCas - 26 days ago
Ruby Question

Calculating the occurrences of word lengths in a file with Ruby

I am currently trying to count the occurrences of word lengths in a file. The method looks like this:

def count_words_of_each_length_in_a_file(file_path)
  hash = {}
  File.open(file_path,"r") do |f|
    f.each_line do |line|
      line.split(" ").each do |word|
        hash.key?(word.length) ? hash[word.length] += 1 : hash[word.length] = 1
      end
    end
  end
  hash
end


It is not returning the expected values. Can anyone tell me why, or point me towards a better solution?

Answer

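split(" ") only breaks on whitespace, so punctuation stays attached to the word it follows and "four." is counted as five characters. Use String#scan instead, passing a regex that matches runs of word characters or apostrophes: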

scan(/[\w\']+/)
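
As a quick check of the difference, here is a small comparison of the two approaches on one sample line. The variable names are only for illustration:

```ruby
line = "thr thr, thr thr"

# split(" ") breaks only on whitespace, so the comma stays glued to "thr,"
split_lengths = line.split(" ").map(&:length)

# scan returns only the runs of word characters or apostrophes
scan_lengths = line.scan(/[\w']+/).map(&:length)

p split_lengths  # [3, 4, 3, 3]
p scan_lengths   # [3, 3, 3, 3]
```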

So your code looks like this:

#script.rb

def count_words_of_each_length_in_a_file(file_path)
  hash = {}
  File.open(file_path,"r") do |f|
    f.each_line do |line|
      line.scan(/[\w\']+/).each do |word|
        hash.key?(word.length) ? hash[word.length] += 1 : hash[word.length] = 1
      end
    end
  end
  hash
end
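
If you want something shorter, a possible variant (just a sketch, and the method name word_length_counts is made up here) uses Hash.new(0) so the key check goes away, and File.foreach to read the file line by line:

```ruby
# Hypothetical helper name; counts word lengths the same way as the method above.
def word_length_counts(file_path)
  counts = Hash.new(0) # missing keys start at 0, so no key? check is needed
  # File.foreach yields each line and closes the file when done
  File.foreach(file_path) do |line|
    # scan with a block yields each match in turn
    line.scan(/[\w']+/) { |word| counts[word.length] += 1 }
  end
  counts
end
```

One difference to be aware of: the returned hash gives 0 rather than nil for a length that never occurred.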

Example

#test.rb
o
tw tw
thr thr, thr thr
four four. four four
they've they've

Then run your program:

count_words_of_each_length_in_a_file('./test.rb')
#=> {1=>1, 2=>2, 3=>4, 4=>4, 7=>2}

Caveat: the above solution is a starting point but not altogether watertight. For example, consider hyphenated-words. What are your rules for dealing with these types of words?
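
As one possible answer to that question, and assuming you decide a hyphenated word should count as a single word, you could add the hyphen to the character class. This is only a sketch of that one rule, with a made-up sample line:

```ruby
line = "a well-known state-of-the-art idea"

# Original pattern: the hyphen is not matched, so each part counts separately
parts = line.scan(/[\w']+/)

# Assumed rule: a hyphen inside a word keeps it as one word
whole = line.scan(/[\w'\-]+/)

p parts.map(&:length)  # [1, 4, 5, 5, 2, 3, 3, 4]
p whole.map(&:length)  # [1, 10, 16, 4]
```

Note that this pattern also treats a free-standing dash (as in "this - that") as a one-character word, so you may need a stricter rule depending on your input.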