BlogExportFileToBook: VBA code to produce HTML blog book from Blogger blog backup/export XML file

Last updated on 9 Sep. 2023 

9 Sep. 2023 Update: I rarely use the software covered in this post now. To see the current software for creating Blogger blogbooks that I use, please visit: Short User Guide to creating Blogger Blogbooks from Backup/Export File using ExportFileFilterAndGenBook and another VBA projects' macros/code (free and open source), https://ravisiyermisc.blogspot.com/2023/09/short-user-guide-to-creating-blogger.html .

end-Update 9 Sep. 2023

I am publishing this post now, on 25 Jul 2023, while the work is still in progress, so that whatever I have done so far is available to interested readers.

This post follows up on my previous post: VBA code to read XML structure of blogger blog entries in export file - Test version, https://ravisiyermisc.blogspot.com/2023/07/vba-code-to-read-xml-structure-of.html . I was able to take it further by writing a VBA macro, BlogExportFileToBook (and related macros), which takes a Blogger "Back up content" (or Export) file in XML format as input and produces an HTML blog book file (the output file) containing all posts and pages in the Backup/Export file. It also produces a text log file with short information about each post or page written to the HTML output file, along with some information about the run (macro execution). [As of now, the macro does not read and output the blog's comments, which are also contained in the blog backup/export file. See the 26 Jul 2023 Update later in the post about the test version of a dictionary of post links and associated comments.]
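The overall flow described above can be sketched as follows. This is a minimal Python illustration, not the VBA macro's code. It assumes the export file is an Atom feed in which each entry carries a "kind" category whose term ends in #post, #page, #comment and so on; the function names export_to_book and entry_kind are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed layout of a Blogger export: each entry has a category with this
# scheme, and the term ends in "#post", "#page", "#comment", etc.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"
KIND_PREFIX = "http://schemas.google.com/blogger/2008/kind#"


def entry_kind(entry):
    """Return 'post', 'page', 'comment', ... or None if no kind is found."""
    for cat in entry.findall(ATOM + "category"):
        if cat.get("scheme") == KIND_SCHEME:
            return cat.get("term", "").replace(KIND_PREFIX, "")
    return None


def export_to_book(xml_text):
    """Return (html_book, log_lines) for all posts and pages in the export."""
    root = ET.fromstring(xml_text)
    parts = ['<html><head><meta charset="utf-8"></head><body>']
    log = []
    for entry in root.findall(ATOM + "entry"):
        kind = entry_kind(entry)
        if kind not in ("post", "page"):
            continue  # skip comments, settings and template entries
        title = entry.findtext(ATOM + "title", default="") or "(untitled)"
        # The parser unescapes the entry content, so this is raw HTML.
        content = entry.findtext(ATOM + "content", default="")
        parts.append("<h1>" + html.escape(title) + "</h1>")
        parts.append(content)
        log.append(kind + ": " + title)
    parts.append("</body></html>")
    # Joining with newlines starts every heading and body on a new line.
    return "\n".join(parts), log
```

Calling export_to_book on the text of an export file would return the HTML book as one string plus a list of short log lines, one per post or page written.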

1 Aug 2023 Update Start

Made minor changes in this version:

  • Added a newline to separate posts and pages in the output HTML file. This makes it more convenient to view the output file in Notepad++, as each post or page now always starts on a new line.
  • Added to the header comment in Module-CreateCommentsDictionary to make it the same as, or similar to, the main module's header comment.

The code files of this version are:

1 Aug 2023 Update End

30 Jul 2023 Update

For SplitBlogBook to generate and insert Contents Internal Links (like a TOC but without page numbers) to posts and pages in the split blog book, an id attribute had to be added to the post title and page title h1 tags in the blog book created by this BlogExportFileToBook program/macro. That, IIRC, is the main change to the code in this version. The code files are:

30 Jul 2023 End-Update
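
To illustrate the id attribute change described in the 30 Jul 2023 update above: an internal link can only jump to a heading if the heading has an id for the link's href to point at. The Python sketch below is an illustration only, not the VBA code. The id naming scheme ("entry-" followed by a running number) and the function names are hypothetical.

```python
import html


def heading_with_id(title, index):
    """Return (anchor, h1_html) for a post or page title.

    The "entry-N" id scheme is a made-up example; any id that is unique
    within the document would do.
    """
    anchor = "entry-%d" % index
    return anchor, '<h1 id="%s">%s</h1>' % (anchor, html.escape(title))


def contents_links(titles):
    """Return one internal link per title, each pointing at its h1 id."""
    lines = []
    for i, title in enumerate(titles, start=1):
        anchor, _ = heading_with_id(title, i)
        lines.append('<a href="#%s">%s</a><br>' % (anchor, html.escape(title)))
    return "\n".join(lines)
```

A link such as href="#entry-3" takes the browser to the h1 with id="entry-3" in the same file, which is what makes a contents list without page numbers workable.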

29 Jul 2023 Update

Latest version code and data share folder: 20230729Run,  https://drive.google.com/drive/folders/1VrXEnwRprCr5uLCh6wxvw2onAsSYM-KA?usp=drive_link . I think this version is quite stable now and so can be viewed as a stable version instead of a test version.

In this version, the CommentsExtraction part described in the 26 Jul 2023 update section below is integrated with the main BlogExportFileToBook code. This project now has two modules:
  1. Module MainBlogExportFileToBook: 20230729-Module-MainBlogExportFileToBook-Code.txt,  https://drive.google.com/file/d/17bYTvxZADu6igHnrFd6eLNO3xf_m7ZUu/view?usp=drive_link
  2. Module CreateCommentsDictionary: 20230729-Module-CreateCommentsDictionary-Code.txt,  https://drive.google.com/file/d/1fgYFwEC3S_rTf4cz98NaDIRWZgLG4mG2/view?usp=drive_link
The output data and log files for the test runs are also provided in the above folder. The two main output files are:

29 Jul 2023 End-Update

[25 Jul 2023 dated section continues below.]

Here's the folder share for this macro having all the associated files: BlogExportFileToBook,  https://drive.google.com/drive/folders/1FRRgqmjjd4FdGvM5kJeS-iRSG4C0OGhA?usp=drive_link .

The macro code is copy-pasted into 20230725v2-Macro-code.txt, https://drive.google.com/file/d/13nrp319uhFP4qzZcehweCdXGuWgJfWpx/view?usp=drive_link . Note that I ran the macro in a Microsoft Word 2007 document.

The VBA code has three key subs, listed below, along with some support subs/functions (not listed):

  • BlogExportFileToBook - the main function/sub.
  • DriverBlogExportFileToBook - invokes BlogExportFileToBook.
  • DriverPromptInputBlogExportFileToBook - invokes BlogExportFileToBook.

The macro writes its output file and log file in a Unicode encoding so that Devanagari characters render properly. However, the VBA code I used produces UTF-16 LE files, which are roughly twice the size of the equivalent UTF-8 files. UTF-8 is good enough for my needs, but I could not find a way to specify in VBA code that I wanted a UTF-8 file instead of UTF-16 LE. So I converted the UTF-16 LE output files to UTF-8 by running Windows PowerShell commands. The file "Converting UTF-16 file to UTF-8 on Windows.txt" has the details of these commands, https://drive.google.com/file/d/1Q_mra6FTCqh9WkGXpjSKy1Qv8tYYnirM/view?usp=drive_link . The log file is not very large, so I left it in UTF-16 LE encoding.
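
For readers who prefer a script over PowerShell, the same conversion can be sketched in Python. This is not the set of commands in the linked text file. It assumes the UTF-16 file starts with a byte-order mark, and the function name utf16_to_utf8 is hypothetical.

```python
def utf16_to_utf8(src_path, dst_path):
    """Copy a UTF-16 text file to a new UTF-8 file (no BOM written).

    Assumes the source starts with a byte-order mark; the "utf-16" codec
    uses it to detect the byte order and drops it from the decoded text.
    newline="" keeps the original line endings unchanged.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copy in chunks of about a million characters so that a large
        # blog book does not have to be held in memory all at once.
        for chunk in iter(lambda: src.read(1 << 20), ""):
            dst.write(chunk)
```

The size saving comes from the HTML being mostly ASCII: UTF-16 uses two bytes for each of those characters and UTF-8 uses one. Devanagari characters take three bytes each in UTF-8, so a file that was mostly Devanagari text would not shrink.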

For testing, I used the backup/export files of two of my Blogger blogs. Details are given below. [Google Drive Preview renders the output HTML file as text with the HTML tags shown, and so is not useful. Download the output HTML file to your PC using the download button at the top right of the preview page and then open it in a browser. On my 4 GB RAM Windows 10 PC, Chrome opened the 10.2 MB output file fully after a few minutes and I was able to comfortably browse through the large HTML document. The log text files are rendered properly by Google Drive Preview.]

1) Input file: 20230523-tns-blog.xml, 776 KB.
Output file: 20230523-tns-blog-BlogBook.html (in UTF-16 LE), 943 KB.
Output file converted to UTF-8: 20230523-tns-blog-BlogBook-utf8.html, 491 KB, https://drive.google.com/file/d/1q6L3SzO2nRiVwmgaZtut8CqTZYzbB-Z6/view?usp=drive_link .
Log file: 20230523-tns-blog-BlogBookLog.txt, 13 KB, https://drive.google.com/file/d/17InWB17KOOfsbBW2cvF4nmhw9D2_O5-b/view?usp=drive_link .

2) Input file: 20230623-ravisiyermisc-blog.xml, 12.5 MB.
Output file: 20230623-ravisiyermisc-blog-BlogBook.html (in UTF-16 LE), 20.3 MB.
Output file converted to UTF-8: 20230623-ravisiyermisc-blog-BlogBook-utf8.html, 10.2 MB, https://drive.google.com/file/d/1yLJsceYtPPjzjBTu9dG1cUYnA3TSiLU3/view?usp=drive_link 
Log file: 20230623-ravisiyermisc-blog-BlogBookLog.txt, 474 KB, https://drive.google.com/file/d/18Q8uKgaCl81_nEt0yajKzypJ8jXnnZdd/view?usp=drive_link .

========================================
26 Jul 2023 Update

The test version of the code that creates a dictionary of post links and comments is ready and shared in this folder: CommentsExtraction, https://drive.google.com/drive/folders/1A715O65Zbgy_3dgyZ9zDj3VnAbfgVdpB?usp=drive_link .
The main files are:
  1. The VBA macro code file: 20230726v2-Macro-TestBlogCommentsExtraction-code.txt,   https://drive.google.com/file/d/160YhiisAg2Ia7Yf_nuQQi8BsMy2c8Cm5/view?usp=drive_link
  2. Input Backup/Export XML file: 20230623-ravisiyermisc-blog.xml (12.5 MB),   https://drive.google.com/file/d/1JK5p7K_tV81bsJr47krN4NZ9qveUproA/view?usp=drive_link
  3. Write-out of created comments dictionary (HTML file): 20230623-ravisiyermisc-blog-commnts-dict.html (127 KB),   https://drive.google.com/file/d/1vedwRAPm0pQUC_YB0OvvqieMdWQMDXF8/view?usp=drive_link 
  4. Write-out of comments in input file in sequential order (HTML file): 20230623-ravisiyermisc-blog-commnts-seqlist.html (118 KB),  https://drive.google.com/file/d/1B4NWsJsm4rOlwfLbdQ3eOyxhyDmzcKDC/view?usp=drive_link
  5. Log file: 20230623-ravisiyermisc-blog-commnts-log.txt (36 KB),  https://drive.google.com/file/d/1HA_DCcOoIAKTglpW_88TbjSBZ8arRewj/view?usp=drive_link
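
The idea of a dictionary of post links and comments can be sketched in Python as below. This is an illustration, not the VBA macro's code. It assumes each comment entry in the export file has a thr:in-reply-to element (Atom threading extension) whose href attribute is the link of the post being commented on; the function name comments_by_post is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: comment entries carry <thr:in-reply-to href="post link" .../>
THR = "{http://purl.org/syndication/thread/1.0}"


def comments_by_post(xml_text):
    """Map each post link to a list of (author, comment_html) tuples."""
    root = ET.fromstring(xml_text)
    comments = defaultdict(list)
    for entry in root.findall(ATOM + "entry"):
        reply = entry.find(THR + "in-reply-to")
        if reply is None:
            continue  # not a comment (post, page, settings, ...)
        post_link = reply.get("href")
        author = entry.findtext(ATOM + "author/" + ATOM + "name", default="")
        body = entry.findtext(ATOM + "content", default="")
        # Comments are appended in the order they appear in the file.
        comments[post_link].append((author, body))
    return dict(comments)
```

With such a dictionary, the comments for a post can be looked up by the post's link at the point where that post is written to the blog book, instead of scanning the export file again for each post.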
