The XML Guts of an Excel Workbook File
Smorgasbord / Kirt's Cogitations™ #337

"Factoids," "Kirt's Cogitations," and "Tech Topics Smorgasbord" are all manifestations of my ranting on various subjects relevant (usually) to the overall RF Cafe theme.

 

Some years ago, while first developing my "RF Cascade Workbook" spreadsheets, I read that ever since Microsoft began using the XML file format for Excel with the 2007 version (Office 12), what appears in the File Manager as a *.xlsx or *.xlsm (*.xlsx with VBA‡ macros) is actually a compressed (ZIP) collection of individual XML files, possibly a *.bin file, and any images you might have buried within. (The older *.xls format is a binary file and cannot be opened this way.) If you want to see what actually makes up your Excel file, follow these simple instructions. A word of warning, though: as Otto von Bismarck is reported to have admonished, "Laws are like sausages. It is better not to see them being made." After seeing what goes into an Excel file, you might lose your taste for them (not really, it just seemed like an apt quotation at the moment).

There may be another way to dissect an Excel file, but probably the easiest is the following:

  • Change the file name extension from *.xlsx or *.xlsm to *.zip. You will get a warning box saying the file might become unusable, but ignore that.

  • Use your favorite unzipping application to extract the compressed files (right-click and select "Extract all..." works, too).

  • Let it save all the files to the default folder it wants to create, or give the new folder a different name.

  • Navigate to the new folder and look at what is in there.
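If you would rather not rename anything, the same peek inside can be had with a few lines of Python, since the standard zipfile module does not care what the file name extension is. This is only a minimal sketch; the function name list_workbook_parts is my own invention for illustration, not part of any Excel tooling.

```python
import zipfile


def list_workbook_parts(path):
    """Return (name, uncompressed size in bytes) for every file in the package.

    `path` may be a file name such as "RF-Cascade-Workbook-v2018p5.xlsm"
    or any file-like object; no renaming to *.zip is required.
    """
    with zipfile.ZipFile(path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


# Example use (assumes the workbook is in the current folder):
#     for name, size in list_workbook_parts("RF-Cascade-Workbook-v2018p5.xlsm"):
#         print(f"{size:>10}  {name}")
```

Calling package.extractall("some-folder") inside the same with block would unpack everything to disk, which is the equivalent of the "Extract all..." step above.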

My RF-Cascade-Workbook-v2018p5.xlsm file (with VBA macro code) contains the following:

[Screen shot: The Guts of an Excel *.xlsx or *.xlsm Workbook File - RF Cafe Smorgasbord]

One thing I noticed right off is that the worksheet files are not named according to the names assigned on the tabs which are at the bottoms of the pages. "sheet1.xml" is for the page named "Start," "sheet2.xml" is the page named "System Definition," etc., as can be seen in this screen shot.
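The link between tab names and sheet files is recorded inside the package itself: "xl/workbook.xml" lists each sheet's name along with a relationship ID, and "xl/_rels/workbook.xml.rels" maps each ID to a file. The sketch below reads both to build the name-to-file table. It assumes the usual (transitional) namespace URIs that Excel writes by default; a workbook saved in the "Strict Open XML" flavor uses different ones and would come back empty. The function name sheet_name_to_part is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs assumed here are the common "transitional" ones.
NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_name_to_part(path):
    """Map each tab name to the XML file inside the package that holds it."""
    with zipfile.ZipFile(path) as package:
        workbook = ET.fromstring(package.read("xl/workbook.xml"))
        rels = ET.fromstring(package.read("xl/_rels/workbook.xml.rels"))

    # Relationship ID (e.g. "rId1") -> target path (e.g. "worksheets/sheet1.xml")
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{NS_PKG_REL}}}Relationship")
    }

    mapping = {}
    for sheet in workbook.iter(f"{{{NS_MAIN}}}sheet"):
        target = targets[sheet.get(f"{{{NS_DOC_REL}}}id")]
        if target.startswith("/"):
            part = target.lstrip("/")  # already relative to the package root
        else:
            part = posixpath.normpath(posixpath.join("xl", target))
        mapping[sheet.get("name")] = part
    return mapping
```

For a workbook like mine, the result would be expected to pair "Start" with "xl/worksheets/sheet1.xml", "System Definition" with "xl/worksheets/sheet2.xml", and so on.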

When you view the XML files with a text editor, all the cell contents including numbers, formulas, and VBA calls are easily read, as are cell formatting parameters. It is amazing how much stuff makes up the spreadsheet. That it can update everything so quickly each time you make a change in a cell is quite an accomplishment.
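To give a feel for what is in a sheet file, here is a rough sketch that pulls every formula out of one worksheet. Each cell is a "c" element whose "r" attribute is the cell reference (A1, B7, etc.), with the formula text in an "f" child and the last computed value in a "v" child. The function name cell_formulas and the default part name are my own choices, and the namespace URI is the usual transitional one, so treat it as an assumption. Cells that merely reuse a shared formula carry an empty "f" element and are skipped here. Text strings are normally not stored in the sheet file at all but in "xl/sharedStrings.xml", with the cell holding only an index.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed namespace: the common "transitional" SpreadsheetML URI.
NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def cell_formulas(path, part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: formula text} for one worksheet in the package."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read(part))

    formulas = {}
    for cell in root.iter(f"{{{NS_MAIN}}}c"):
        formula = cell.find(f"{{{NS_MAIN}}}f")
        # Skip cells with no formula, or with an empty shared-formula stub.
        if formula is not None and formula.text:
            formulas[cell.get("r")] = formula.text
    return formulas
```

Note that the formula text is stored without the leading "=" you type into the cell.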

The "./xl" subfolder contains a file named "vbaProject.bin" which, being a binary file, is quite cryptic when viewed in a text editor. However, portions of the VBA code are discernible within it. "RF Cascade Workbook" contains a large amount of VBA code (all written by me) to make it work.
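A quick way to see those readable fragments without scrolling through gibberish is to pull out only the runs of printable ASCII characters, much like the Unix "strings" utility does. The sketch below is exactly that and nothing more; it does not decode the VBA project. The file is an OLE compound document and the module source is stored in a compressed form, which is why only bits and pieces show up as plain text. The function name printable_runs and the six-character minimum are arbitrary choices of mine.

```python
import re
import zipfile


def printable_runs(path, part="xl/vbaProject.bin", min_length=6):
    """Return the runs of printable ASCII found in a binary part of the package."""
    with zipfile.ZipFile(path) as package:
        data = package.read(part)

    # One or more bytes in the printable ASCII range (space through tilde),
    # repeated at least `min_length` times.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(data)]


# Example use (assumes a macro-enabled workbook in the current folder):
#     for text in printable_runs("RF-Cascade-Workbook-v2018p5.xlsm"):
#         print(text)
```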

Simpler spreadsheet workbooks do not contain nearly as much in the way of files. You might want to try this dissection on one of your Excel files, just because now you know you can.

 

‡ VBA = Visual Basic for Applications, Microsoft Office's built-in coding language, which has a lot in common with Visual Basic 6. It is a full-featured, object-based language (it supports classes and interfaces, though not inheritance of implementation).

 

 

Posted April 19, 2022