LEoREo_2247 - 4 months ago
Ruby Question

Replacing HTML text based on a database

I want to make an app that switches the vocabulary on a web page at a given URL from Japanese to English.
But first of all I want to start by simply displaying the web page at the given URL inline, just like Google Translate does. (See here)

I got the HTML from the given URL using the code below,
and now I want to replace text in the HTML all at once, based on data in a database.

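def submit
  require 'open-uri'

  charset = nil
  # URI.open is needed on Ruby 3.0+, where a bare open() no longer fetches URLs
  @html = URI.open(params[:url]) do |f|
    charset = f.charset
    f.read  # return the page body as a string
  end
end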

The database isn't built yet, but it will hold the Japanese vocabulary to be replaced and the English vocabulary to replace it with.

Any ideas on how to do this?
Also, I only recently started learning Ruby on Rails, so it would be nice if you could explain it with some examples or a detailed explanation :)

I just want to replace particular words in the text based on entries in the database; I don't want full multilingual support.


For example, I got the following HTML from the desired web page.

<p>I want to switch "aaa" this and "ccc"</p>

Let's say I want to replace "aaa" with "bbb" and "ccc" with "ddd".
Both the words to be replaced and their replacements are in the database. (Target: "aaa", "ccc"; Switch: "bbb", "ddd")

Since I fetched this HTML using open-uri, I can't implement code like


Working based on the code in this answer and this answer, you could do something like this:

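require 'nokogiri'

replacements = {'aaa' => 'ccc', 'bbb' => 'ddd' }
# One regex that matches any key, with special characters escaped
regex = Regexp.new(replacements.keys.map { |x| Regexp.escape(x) }.join('|'))
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.traverse do |x|
  # Only touch text nodes, so tags and attributes are left alone
  if x.text?
    x.content = x.content.gsub(regex, replacements)
  end
end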
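
The replacements hash above is hard-coded, while the question keeps the word pairs in a database. Here is a rough sketch of building the same hash from stored records. The `VocabularyEntry` struct and its `japanese` and `english` fields are hypothetical stand-ins, since the question has no schema yet; in a Rails app the rows would presumably come from an ActiveRecord model instead.

```ruby
# Hypothetical stand-in for database rows. In a Rails app these might come
# from a model, e.g. Vocabulary.pluck(:japanese, :english).to_h
# (the model and column names are assumptions, not from the question).
VocabularyEntry = Struct.new(:japanese, :english)

entries = [
  VocabularyEntry.new('こんにちは', 'Good day'),
  VocabularyEntry.new('bbb', 'ddd')
]

# Build the lookup hash: word to find => word to substitute.
replacements = entries.map { |e| [e.japanese, e.english] }.to_h

# Same regex construction as above, now driven by the stored pairs.
regex = Regexp.new(replacements.keys.map { |x| Regexp.escape(x) }.join('|'))
result = 'こんにちは Mr bbb'.gsub(regex, replacements)
puts result
```

Building the hash once per request (or caching it) keeps the traversal to a single pass over the text nodes, rather than one database lookup per word.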

I've also tested that:

replacements = {'こんにちは' => 'Good day', 'bbb' => 'ddd' }
regex = Regexp.new(replacements.keys.map { |x| Regexp.escape(x) }.join('|'))
"こんにちは Mr bbb".gsub(regex, replacements)

Gives the expected:

Good day Mr ddd
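
One thing to watch with this alternation approach: when one key is a prefix of another, the regex takes the first alternative that matches, so the shorter key can win. A minimal sketch, using made-up keys 'aa' and 'aaa' purely for illustration, that sorts the keys longest first and uses Regexp.union (which escapes plain strings itself):

```ruby
# Illustrative keys only: 'aa' is a prefix of 'aaa'.
replacements = { 'aa' => 'X', 'aaa' => 'Y' }

# Naive order: the alternation tries 'aa' before 'aaa'.
naive_regex = Regexp.union(replacements.keys)
naive = 'aaa aa'.gsub(naive_regex, replacements)

# Longest keys first, so 'aaa' is tried before 'aa'.
sorted_keys = replacements.keys.sort_by { |k| -k.length }
sorted_regex = Regexp.union(sorted_keys)
sorted = 'aaa aa'.gsub(sorted_regex, replacements)

puts naive   # shorter key wins on the first word
puts sorted  # longer key wins where it applies
```

Whether this matters depends on the vocabulary in the database; if no entry is a prefix of another, the original ordering behaves the same.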

You might also want to use:

regex = Regexp.new(replacements.keys.map { |x| '\\b'+Regexp.escape(x)+'\\b' }.join('|'))

to prevent "aaardvark" being changed into "cccrdvark".
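
As a quick check of the word-boundary version, here is a small sketch using the same sample hash on an ASCII string. Note that `\b` relies on transitions between word and non-word characters, so for Japanese text, which is usually written without spaces between words, it may not match where you expect; that is worth testing against real pages before relying on it.

```ruby
replacements = { 'aaa' => 'ccc', 'bbb' => 'ddd' }

# Wrap each escaped key in \b so it only matches as a whole word.
pattern = replacements.keys.map { |x| '\\b' + Regexp.escape(x) + '\\b' }.join('|')
regex = Regexp.new(pattern)

result = 'aaa aaardvark bbb'.gsub(regex, replacements)
puts result  # "aaardvark" is left untouched
```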