user2524994 user2524994 - 1 month ago 12
Python Question

Find ranges of Roman Numerals in SQL

I am looking for a way to find the ranges in a SQL statement.

My data looks like this

COUNTRY PROTEIN MG
1 China 42.8 II
2 China 42.3 III
3 China 41.9 III
4 United States 40.0 IV
5 China 43.2 I
6 China 42.5 IV
7 China 42.9 III
8 China 45.9 VI
9 Japan 42.3 VI
10 United States 40.9 III


I am currently running a statement like this where I get the average for the protein values but I do not know how to go about the range of the roman numerals

select COUNTRY, avg(PROTEIN), MG from data group by COUNTRY


I would like my output to look like this (NOT CORRECT NUMBERS JUST AN EXAMPLE)

COUNTRY MEAN MG_RANGE
China 42.3 I-III
United States 45.2 I-VI


I have this file as a CSV so I am open to alternatives in R and Python that would give me the same output. It could be numerical as well whatever output is easiest.

Answer

In R, the utils package has a non-exported function .roman2numeric(), for converting character Roman numerals to their respective numeric values. We can convert from Roman to numeric then aggregate to find the range within a country.

(dat$MG2 <- utils:::.roman2numeric(as.character(dat$MG))
#  2   3   3   4   1   4   3   6   6   3 
dat
#         COUNTRY PROTEIN  MG MG2
# 1         China    42.8  II   2
# 2         China    42.3 III   3
# 3         China    41.9 III   3
# 4  UnitedStates    40.0  IV   4
# 5         China    43.2   I   1
# 6         China    42.5  IV   4
# 7         China    42.9 III   3
# 8         China    45.9  VI   6
# 9         Japan    42.3  VI   6
# 10 UnitedStates    40.9 III   3

And to (arbitrarily) find the range of MG2 per country, we can do

aggregate(MG2 ~ COUNTRY, dat, range)
#        COUNTRY MG2.1 MG2.2
# 1        China     1     6
# 2        Japan     6     6
# 3 UnitedStates     3     4
Comments