I have PPTX files generated by users with PowerPoint 2016. The slides contain embedded Excel worksheets that I need to access for further processing. I am using Open XML SDK v2.6.1 in my project.
Passing the embedded object stream to SpreadsheetDocument.Open, as in the following code, throws the exception shown after it:
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using (PresentationDocument pd = PresentationDocument.Open(pptxFile, true))
{
    foreach (SlidePart slide in pd.PresentationPart.GetPartsOfType<SlidePart>())
    foreach (EmbeddedObjectPart eoPart in slide.EmbeddedObjectParts)
    using (SpreadsheetDocument sd = SpreadsheetDocument.Open(eoPart.GetStream(), true))
    {
        // do some work with worksheets
        var count = sd.WorkbookPart.WorksheetParts.Count();
    }
}
System.IO.FileFormatException: File contains corrupted data.
at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode packageFileMode, FileAccess packageFileAccess)
at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess)
at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
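I finally figured out that, although a tool like WinRAR reports the embedded object as an SFX ZIP volume, it is actually an MS-CFB (Compound File Binary) file. That is why the ZIP-based package reader underneath SpreadsheetDocument.Open rejects the stream as corrupted.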
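One way to confirm this before handing a stream to the SDK is to look at its first bytes. The following is only a minimal sketch: the class and method names (EmbeddedFormatSniffer, Sniff) are made up for illustration and are not part of the Open XML SDK. It relies on two well-known signatures: MS-CFB files start with the 8 bytes D0 CF 11 E0 A1 B1 1A E1, while a ZIP package normally starts with "PK" followed by 03 04.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the Open XML SDK.
static class EmbeddedFormatSniffer
{
    // Fixed 8-byte signature at the start of every MS-CFB file.
    static readonly byte[] CfbSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature ("PK" 03 04) at the start of a typical ZIP package.
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "cfb", "zip" or "unknown" based on the leading bytes of the stream.
    public static string Sniff(Stream stream)
    {
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so keep reading
        // until the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (read >= CfbSignature.Length && header.SequenceEqual(CfbSignature))
            return "cfb";
        if (read >= ZipSignature.Length &&
            header.Take(ZipSignature.Length).SequenceEqual(ZipSignature))
            return "zip";
        return "unknown";
    }
}
```

In the loop from the question this could be called as EmbeddedFormatSniffer.Sniff(eoPart.GetStream()). The check consumes the first bytes of the stream, so request a fresh stream from the part before doing the real read.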
You can work with CFB files in the following ways:
Bottom line: Office documents embedded in other Office documents as embedded objects are saved in MS-CFB format. Reading and writing these files has to be done outside the Open XML SDK, using either the Windows API or some other alternative.