KidSudi - 1 year ago
Python Question

Converting a Pandas DataFrame to R dataframe using Rpy2

I have a pandas dataframe that I convert to R dataframe using the convert_to_r_dataframe method from pandas.rpy.common. I have it set up as such:

self.event = pd.read_csv('C://' + self.event_var.get() + '.csv')
final_products = pd.DataFrame({'Product': self.event.Product, 'Size': self.event.Size, 'Order': self.event.Order})
r.assign('final_products', com.convert_to_r_dataframe(final_products))
r.assign('EventName', self.event_var.get())
r.assign('EventTime', self.eventtime_var.get())

where self.event_var.get() retrieves a user input in the GUI (I am creating an application using Tkinter). Product, Size, and Order are columns from the CSV file.

Since Rpy2 sets up the R environment within Python, I would expect the R environment to recognize the final_products R dataframe. Unfortunately, while the R script does run, it does not give the correct results: the graphs I create with the R script are empty when the program terminates. The EventName and EventTime variables, however, do work. Is there something that I am missing here? Any ideas as to why the R dataframe assigned from Python is not being interpreted correctly by the R environment?

The error obtained:

Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\", line 1470, in __call__
return self.func(*args)
File "G:\Development\workspace\GUI\", line 126, in evaluate
File "C:\Python27\lib\site-packages\rpy2\robjects\", line 86, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "C:\Python27\lib\site-packages\rpy2\robjects\", line 35, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)

Answer Source

Unfortunately, this is going to be difficult: the Python -> R transformation is better than it used to be, but it isn't perfect, and it is still hard on Windows, which it looks like you're using.

This is a bit of a hack, but as a work-around you might try setting the name and time variables while you are assigning the pd.DataFrame before you convert the DataFrame into R.
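
As a minimal sketch of that work-around, the event name and time can be stored as constant columns on the pandas side, so that only one object has to cross into R. The helper name build_final_products and the sample values are hypothetical; the column names follow the question.

```python
import pandas as pd


def build_final_products(event, event_name, event_time):
    """Copy the three product columns and attach the event metadata.

    `event` stands in for the frame loaded with pd.read_csv in the question.
    This helper is hypothetical, not part of pandas or rpy2.
    """
    final_products = pd.DataFrame({
        'Product': event['Product'],
        'Size': event['Size'],
        'Order': event['Order'],
    })
    # A scalar assigned to a column is broadcast to every row.
    final_products['EventName'] = event_name
    final_products['EventTime'] = event_time
    return final_products


# Made-up sample data in place of the CSV file and the Tkinter inputs.
event = pd.DataFrame({'Product': ['A', 'B'], 'Size': [1, 2], 'Order': [10, 20]})
final_products = build_final_products(event, 'Launch', '12:00')
print(final_products)
```

The resulting frame could then be passed to com.convert_to_r_dataframe as in the question, and the R script would read the name and time from its columns.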

Once it's in R, you'll need to use R functions to operate on the data frame, rather than your Python functions. Even your getter and setter will need to be passed into the R environment, in a way that looks more like this:

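import rpy2.robjects as robjects
# Define the R function f from Python by evaluating R source code
myfunct = robjects.r('''
        f <- function(r, verbose=FALSE) {
            if (verbose) {
                cat("I am calling f().\n")
            }
            2 * pi * r
        }
        ''')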

from here.

But just to check that your DataFrame is being converted appropriately in the first place, you might start your debugging by running this:

import pandas as pd
import numpy as np
import pandas.rpy.common as com
from datetime import datetime

n = 10
df = pd.DataFrame({
    "timestamp": [datetime.now() for t in range(n)],
    "value": np.random.uniform(-1, 1, n)
})

# Convert to an R data frame and print it the way R would
r_dataframe = com.convert_to_r_dataframe(df)
print(r_dataframe)

Does that produce something that looks like R's printed output for a data frame, like so?

>>>             timestamp        value
0 2014-06-03 15:02:20 -0.36672....
1 2014-06-03 15:02:20 -0.89136....
2 2014-06-03 15:02:20 0.509215....
3 2014-06-03 15:02:20 0.862909....
4 2014-06-03 15:02:20 0.389879....
5 2014-06-03 15:02:20 -0.80607....
6 2014-06-03 15:02:20 -0.97116....
7 2014-06-03 15:02:20 0.376419....
8 2014-06-03 15:02:20 0.848243....
9 2014-06-03 15:02:20 0.446798....

Example peeled from here and here.
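
Before looking at the R side, it may also be worth confirming that the pandas frame itself holds what you expect, since an empty frame or a column full of missing values could also lead to empty plots. The sketch below is one way to do that; summarize_frame is a hypothetical helper and the sample data is made up.

```python
import pandas as pd


def summarize_frame(df):
    """Collect basic facts about a frame before it is converted (hypothetical helper)."""
    return {
        'rows': len(df),
        'columns': list(df.columns),
        # Column dtypes as strings, e.g. to spot numbers that were read as text.
        'dtypes': {name: str(dtype) for name, dtype in df.dtypes.items()},
        # Count of missing values per column.
        'missing': {name: int(count) for name, count in df.isnull().sum().items()},
    }


# Made-up sample with one missing product name.
sample = pd.DataFrame({'Product': ['A', None], 'Size': [1, 2], 'Order': [10, 20]})
summary = summarize_frame(sample)
print(summary)
```

If the row count is zero or a column is mostly missing, the problem is likely in the CSV loading step rather than in the conversion to R.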