Mike R - 4 months ago
JSON Question

How to parse a text file containing multiple lines of data organized by numerical keys and convert it to JSON

I need to parse a text file with the following format and convert it to a Hash which will be converted to JSON.

The text file has this format:

HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014
EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.0614565 5th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA


Every line is a group of segments in a key-length-value format. For example, the second line starts with:


  • EM is the key

  • 04 is the length of the value, including blank spaces

  • 0008 is the value



Breaking it up, it would look like this: EM 04 0008. The keys of the following segments are numerical, starting at 00 and incrementing until the end of the line; the next line then starts all over again. I would need to iterate through every line in the text file.

I need to be able to convert this into a Ruby hash which in turn would be converted to JSON in an API response.

The current format would be:

EM0400080003001


It would need to get parsed into:

{"EM" => "0008", "00" => "001"}

Answer

This is a very common type of encoding called Type-Length-Value (or Tag-Length-Value), for reasons I suppose are obvious. As with many such tasks in Ruby, String#unpack is a good fit:

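def decode(data)
  return {} if data.empty?
  # Split off a 2-character key and a 2-character length; the remainder goes in rest.
  key, len, rest = data.unpack("a2 a2 a*")
  # Remove the first len characters from rest; they are this segment's value.
  val = rest.slice!(0, len.to_i)
  # Recurse on whatever is left and merge the results.
  { key => val }.merge(decode(rest))
end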

p decode("HD040008000415350110XXXXXXXXXX0208XXXXXXXX0302EN0403USA0502EN0604000107014")
# => {"HD"=>"0008", "00"=>"1535", "01"=>"XXXXXXXXXX", "02"=>"XXXXXXXX", "03"=>"EN", "04"=>"USA", "05"=>"EN", "06"=>"0001", "07"=>"4"}

p decode("EM04000800030010112TME001205IQ50232Blue Point Coastal Cuisine. INC.0614565 5th Avenue0805921010909SAN DIEGO1008Downtown1102CA1203USA")
# => {"EM"=>"0008", "00"=>"001", "01"=>"TME001205IQ5", "02"=>"Blue Point Coastal Cuisine. INC.", "06"=>"565 5th Avenue", "08"=>"92101", "09"=>"SAN DIEGO", "10"=>"Downtown", "11"=>"CA", "12"=>"USA"}
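The recursive version assumes every line is well formed. If that is not guaranteed, a loop-based variant that checks each length field may be easier to debug. The following is only a sketch: the name decode_iter and the choice to raise ArgumentError on a bad or truncated segment are assumptions, not part of the original answer.

```ruby
require "json"

# Iterative Tag-Length-Value decoder (decode_iter is a hypothetical name).
def decode_iter(data)
  result = {}
  pos = 0
  while pos < data.length
    # "a2 a2" reads a two-character key and a two-character length field.
    key, len = data[pos, 4].unpack("a2 a2")
    # Assumption: a missing or non-numeric length means the line is malformed.
    raise ArgumentError, "bad length field at offset #{pos}" unless len =~ /\A\d{2}\z/
    value = data[pos + 4, len.to_i]
    # Assumption: a value shorter than its declared length is an error.
    if value.length < len.to_i
      raise ArgumentError, "truncated value for key #{key.inspect}"
    end
    result[key] = value
    pos += 4 + len.to_i
  end
  result
end

puts decode_iter("EM0400080003001").to_json
# prints {"EM":"0008","00":"001"}
```

Because it does not recurse, this variant also avoids one stack frame per segment on very long lines.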

If you want to read an entire file and return a JSON array of objects, something like this would suffice:

#!/usr/bin/env ruby -n
BEGIN {
  require "json"
  def decode(data)
    # ...
  end
  arr = []
}

arr << decode($_.chomp)

END { puts arr.to_json }

Then (supposing the script is called script.rb and is executable):

$ cat data.txt | ./script.rb > out.json
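If you would rather not depend on the -n switch (some systems pass everything after the interpreter path in a shebang line as a single argument, so `env ruby -n` may not work there), the same thing can be written as an ordinary script. This is a sketch under a few assumptions: decode_file is a hypothetical helper name, the input path is taken from the first command-line argument, and blank lines are skipped so they do not show up as empty objects in the output.

```ruby
#!/usr/bin/env ruby
require "json"

# Same recursive decoder as in the answer above.
def decode(data)
  return {} if data.empty?
  key, len, rest = data.unpack("a2 a2 a*")
  val = rest.slice!(0, len.to_i)
  { key => val }.merge(decode(rest))
end

# Decode every non-blank line of the file into a hash (hypothetical helper).
def decode_file(path)
  File.foreach(path, chomp: true).reject(&:empty?).map { |line| decode(line) }
end

# Only runs when a path is given on the command line.
puts JSON.generate(decode_file(ARGV[0])) if ARGV[0]
```

Usage would then be something like:

$ ruby script.rb data.txt > out.json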