akavalar akavalar - 20 days ago 15
Python Question

Extracting raw data from a PowerPivot model using Python

What seemed like a trivial task turned into a real nightmare when I had to read some data from a PowerPivot model using Python. I believe I've researched this very well over the last couple of days but now I hit a brick wall and would appreciate some help from the Python/SSAS/ADO community.

Basically, all I want to do is programmatically access raw data stored in PowerPivot models - my idea was to connect to the underlying PowerPivot (i.e. MS Analysis Services) engine via one of the methods listed below, list the tables contained in the model, then extract the raw data from each table using a simple DAX query (something like "EVALUATE (table_name")). Easy peasy, right? Well, maybe not.

Some Background Information



As you can see, I've tried several different approaches. I'll try to document everything as carefully as possible so that those uninitiated in PowerPivot functionality will have a good idea of what I'd like to do.

First of all, some background on programmatic access to Analysis Services engine (it says 2005 SQL Server, but all of it ought to still be applicable): https://msdn.microsoft.com/en-us/library/ms345148.aspx and https://msdn.microsoft.com/en-us/library/dn141152.aspx

Sample Excel/PowerPivot file I'll be using in the example below can be found here: https://www.microsoft.com/en-us/download/details.aspx?id=102

Note that I'm using Excel 2010, so some of my code is version-specific. E.g.


wb.Connections["PowerPivot Data"].OLEDBConnection.ADOConnection


should be


wb.Model.DataModelConnection.ModelConnection.ADOConnection


if you're using Excel 2013.

Connection string I'll be using throughout this question is based on the information found here:
https://social.technet.microsoft.com/Forums/sqlserver/en-US/0b9499f6-f80f-4dcc-8ab9-c52574b14a70/connect-to-powerpivot-engine-with-c

Also, some of the methods apparently require some sort of initialization of the PowerPivot model prior to data retrieval. See here: https://gobansaor.wordpress.com/2011/08/31/automating-powerpivot-refresh-operation-from-vba/

Finally, here's a couple of links showing that this should be achievable (note however, that these links mainly refer to C#, not Python):



Using ADOMD



import clr
clr.AddReference("Microsoft.AnalysisServices.AdomdClient")
import Microsoft.AnalysisServices.AdomdClient as ADOMD
ConnString="Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
Connection=ADOMD.AdomdConnection(ConnString)
Connection.Open()


Here, it appears the problem is that the PowerPivot model has not been initialized:


AdomdConnectionException: A connection cannot be made. Ensure that the
server is running.


Using AMO



import clr
clr.AddReference("Microsoft.AnalysisServices")
import Microsoft.AnalysisServices as AMO
ConnString="Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
AMOServer = AMO.Server()
AMOServer.Connect(ConnString)


Same story, "the server is not running":


ConnectionException: A connection cannot be made. Ensure that the
server is running.


Note that AMO is technically not used for querying data, but I included it as one of the potential ways of connecting to the PowerPivot model.

Using ADO.NET



import clr
clr.AddReference("System.Data")
import System.Data.OleDb as ADONET
ConnString="Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
ADONETconn = ADONET.OleDbConnection()
ADONETconn.ConnectionString = ConnString
ADONETconn.Open()


This is similar to http://stackoverflow.com/a/301746. Unfortunately, this also doesn't work:


OleDbException: OLE DB error: OLE DB or ODBC error: The following
system error occurred: The requested name is valid, but no data of
the requested type was found.


Using ADO via adodbapi module



import adodbapi
ConnString="Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
conn = adodbapi.connect(ConnString)


Similar to Oppposite Workings of OLEDB/ODBC between Python and MS Access VBA. The error I get is:


OperationalError: (com_error(-2147352567, 'Exception occurred.', (0,
u'Microsoft OLE DB Provider for SQL Server 2012 Analysis Services.',
u'OLE DB error: OLE DB or ODBC error: The following system error
occurred: The requested name is valid, but no data of the requested
type was found...


This is basically the same problem as with ADO.NET above.

Using ADO via Excel/win32com module



from win32com.client import Dispatch
xlfile="H:\\PowerPivotTutorialSample.xlsx"
xlApp=Dispatch("Excel.Application")
wb=xlApp.Workbooks.Open(xlfile)
wb.Connections["PowerPivot Data"].Refresh()
connection=wb.Connections["PowerPivot Data"].OLEDBConnection.ADOConnection
recordset=Dispatch('ADODB.Recordset')
DAXquery="EVALUATE(dbo_DimDate)"
recordset.Open(DAXquery,connection)


The idea for this approach came from the following blog post that uses VBA: http://www.powerpivotblog.nl/export-a-table-or-dax-query-from-power-pivot-to-csv-using-vba/. Note that this approach uses an explicit Refresh command that initializes the model ("server"). Error:


com_error: (-2147352567, 'Exception occurred.', (0,
u'ADODB.Recordset', u'Arguments are of the wrong type, are out of
acceptable range, or are in conflict with one another.',
u'C:\Windows\HELP\ADO270.CHM', 1240641, -2146825287), None)


Here it appears the ADO connection has been established:


  • type(connection) returns




instance



  • print(connection) returns




Provider=MSOLAP.5;Persist Security Info=True;Initial
Catalog=Microsoft_SQLServer_AnalysisServices;Data
Source=$Embedded$;MDX Compatibility=1;Safety
Options=2;ConnectTo=11.0;MDX Missing Member
Mode=Error;Subqueries=2;Optimize Response=3;Cell Error Mode=TextValue


The problem, however, seems to be in the creation of the ADODB.Recordset object.

Using ADO via Excel/win32com - direct use of ADODB.Connection and ADO.Recordset



from win32com.client import Dispatch
conn = Dispatch('ADODB.Connection')
ConnString="Provider=MSOLAP;Data Source=$Embedded$;Locale Identifier=1033;Location=H:\\PowerPivotTutorialSample.xlsx;SQLQueryMode=DataKeys"
conn.Open(ConnString)


Similar to http://stackoverflow.com/a/12761442 and http://code.activestate.com/recipes/196462-query-access-using-ado-in-win32-platform/. Unfortunately, the error Python spits out is the same as in the two examples above:


com_error: (-2147352567, 'Exception occurred.', (0, u'Microsoft OLE DB
Provider for SQL Server 2012 Analysis Services.', u'OLE DB error: OLE
DB or ODBC error: The following system error occurred: The requested
name is valid, but no data of the requested type was found. ..', None,
0, -2147467259), None)


Using ADO via Excel/win32com (direct use of ADODB.Connection and ADO.Recordset together with refreshing the PowerPivot model)



from win32com.client import Dispatch
xlfile="H:\\PowerPivotTutorialSample.xlsx"
xlApp=Dispatch("Excel.Application")
wb=xlApp.Workbooks.Open(xlfile)
wb.Connections["PowerPivot Data"].Refresh()
ConnStringInternal="Provider=MSOLAP.5;Persist Security Info=True;Initial Catalog=Microsoft_SQLServer_AnalysisServices;Data Source=$Embedded$;MDX Compatibility=1;Safety Options=2;ConnectTo=11.0;MDX Missing Member Mode=Error;Optimize Response=3;Cell Error Mode=TextValue"
conn = Dispatch('ADODB.Connection')
conn.Open(ConnStringInternal)


I was hoping I can initialize an instance of Excel, then initialize the PowerPivot model, and then create a connection using the internal connection string Excel uses for embedded PowerPivot data (similar to http://stackoverflow.com/a/33580647 - note that the connection string is different from the one I've used elsewhere). Unfortunately, this doesn't work and my guess is that Python starts the ADODB.Connection process in a separate instance (I get the same error message when I execute the last three rows without initializing Excel, etc.):


com_error: (-2147352567, 'Exception occurred.', (0, u'Microsoft OLE DB
Provider for SQL Server 2012 Analysis Services.', u'Either the user,
****** (masked), does not have access to the
Microsoft_SQLServer_AnalysisServices database, or the database does
not exist.', None, 0, -2147467259), None)

Answer

I get the following error

Traceback (most recent call last):
  File "C:\Users\milang\Desktop\Power BI.py", line 174, in <module>
    session = restorePowerPivot("D:/Downloads/PowerPivotTutorialSample.xlsx", "D:/", 60000, pathPowerBI)
  File "C:\Users\milang\Desktop\Power BI.py", line 28, in restorePowerPivot
    target = file(os.path.join(folder, abfname), 'wb')
NameError: name 'file' is not defined