BlogExportFileToBook: VBA code to produce HTML blog book from Blogger blog backup/export XML file
Last updated on 9 Sep. 2023
9 Sep. 2023 Update: I rarely use the software covered in this post now. For the software I currently use to create Blogger blogbooks, please visit: Short User Guide to creating Blogger Blogbooks from Backup/Export File using ExportFileFilterAndGenBook and another VBA projects' macros/code (free and open source), https://ravisiyermisc.blogspot.com/2023/09/short-user-guide-to-creating-blogger.html .
end-Update 9 Sep. 2023
I decided to publish this post now, on 25 Jul 2023, while this work is still in progress, as I wanted to make whatever work I have done so far available to interested readers.
This post follows up on my previous post: VBA code to read XML structure of blogger blog entries in export file - Test version, https://ravisiyermisc.blogspot.com/2023/07/vba-code-to-read-xml-structure-of.html . I was able to take it further by writing a VBA macro, BlogExportFileToBook (and related macros), which takes a Blogger "Back up content" (or Export) file in XML format as input and produces an HTML blog book file (the output file) containing all the posts and pages in the Backup/Export file. It also produces a text log file with brief information about each post or page written to the HTML output file, along with some information about the run (macro execution). [As of now, the macro does not read and output the blog's comments, which are also contained in the blog backup/export file. See the 26 Jul 2023 Update later in the post about a test version of a dictionary of post links and associated comments.]
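To make the flow described above concrete, here is a minimal sketch of the same idea. It is written in Python rather than VBA and is not the macro's code. It assumes the export file is an Atom feed in which each entry carries a "kind" category whose term ends in #post, #page or #comment; the function names and the sample feed are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: Blogger marks the type of each entry with a category in this scheme.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"

def entry_kind(entry):
    """Return 'post', 'page', 'comment', ... or None, based on the kind category."""
    for cat in entry.findall(ATOM + "category"):
        if cat.get("scheme") == KIND_SCHEME:
            return cat.get("term", "").rsplit("#", 1)[-1]
    return None

def export_to_book(xml_text):
    """Build (html_book, log_lines) from the text of an export file."""
    root = ET.fromstring(xml_text)
    parts, log = ["<html><body>"], []
    for entry in root.iter(ATOM + "entry"):
        kind = entry_kind(entry)
        if kind not in ("post", "page"):
            continue  # skip comments, settings and template entries
        title = entry.findtext(ATOM + "title", default="")
        published = entry.findtext(ATOM + "published", default="")
        # The content element holds escaped HTML; the parser hands it back unescaped.
        content = entry.findtext(ATOM + "content", default="")
        parts.append(
            f"<h1>{html.escape(title)}</h1>\n<p>{html.escape(published)}</p>\n{content}"
        )
        log.append(f"{kind}: {title} ({published})")
    parts.append("</body></html>")
    # Joining with newlines starts every post or page on a new line.
    return "\n".join(parts), log

# Made-up two-entry feed: one post and one comment.
SAMPLE_EXPORT = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <published>2023-07-25T10:00:00.000+05:30</published>
    <title type="text">First post</title>
    <content type="html">&lt;p&gt;Hello&lt;/p&gt;</content>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#comment"/>
    <title type="text">A comment</title>
    <content type="html">Nice post</content>
  </entry>
</feed>"""

book, log = export_to_book(SAMPLE_EXPORT)
print("\n".join(log))
```

The sketch keeps the book and the log as two separate outputs, mirroring the output file and log file that the macro writes; writing them to disk is left out to keep the example short.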
1 Aug 2023 Update Start
Made minor changes in this version:
- Added a newline to separate posts and pages in the output HTML file. This makes it more convenient to view the output file in Notepad++, as each new post or page always starts on a new line.
- Expanded the header comment in Module-CreateCommentsDictionary to make it the same as or similar to the main module's header comment.
- 20230731-Module-MainBlogExportFileToBook-Code.txt, https://drive.google.com/file/d/1dHrsnmpkvDRKfu99jd04BODsHWn5NHFw/view?usp=drive_link
- 20230731-Module-CreateCommentsDictionary-Code.txt, https://drive.google.com/file/d/1uQboqEhGguzycLZfidg6rA0eJVst_WVp/view?usp=drive_link
1 Aug 2023 Update End
30 Jul 2023 Update
- 20230730-Module-MainBlogExportFileToBook-Code.txt, https://drive.google.com/file/d/1NweprmfTG7UdUwojnqB4dJsf-zh8gdzE/view?usp=drive_link
- 20230730-Module-CreateCommentsDictionary-Code.txt, https://drive.google.com/file/d/1APOS7HCnO8FNASrkn8_aH3HV2-Cfdz3G/view?usp=drive_link
29 Jul 2023 Update
- Module MainBlogExportFileToBook: 20230729-Module-MainBlogExportFileToBook-Code.txt, https://drive.google.com/file/d/17bYTvxZADu6igHnrFd6eLNO3xf_m7ZUu/view?usp=drive_link
- Module CreateCommentsDictionary: 20230729-Module-CreateCommentsDictionary-Code.txt, https://drive.google.com/file/d/1fgYFwEC3S_rTf4cz98NaDIRWZgLG4mG2/view?usp=drive_link
- TNS blog book - full, having posts and pages (blog has no comments): 20230523-tns-blog-BlogBook.html (942 KB, UTF-16 LE encoding), (Download file to PC and open that in browser like Chrome), https://drive.google.com/file/d/1UmC8ZXElWuQzjcSASbPDoMafZl74x56s/view?usp=drive_link
- Spiritual blog book - full, having posts, pages and comments: 20230623-ravisiyer-blog-BlogBook.html (32 MB, UTF-16 LE encoding), (Download file to PC and open that in Notepad++ instead of a browser as the file is too big), https://drive.google.com/file/d/1NAswLfHOUjVX3CQxafpo9k_x0s-CGoNU/view?usp=drive_link . Using another VBA program, about which I plan to put up a post shortly, I have split this long file into files of more manageable size that Chrome can open.
29 Jul 2023 End-Update
[25 Jul 2023 dated section continues below.]
Here's the folder share for this macro having all the associated files: BlogExportFileToBook, https://drive.google.com/drive/folders/1FRRgqmjjd4FdGvM5kJeS-iRSG4C0OGhA?usp=drive_link .
The macro code is copy-pasted into 20230725v2-Macro-code.txt, https://drive.google.com/file/d/13nrp319uhFP4qzZcehweCdXGuWgJfWpx/view?usp=drive_link . Note that I ran the macro from a Microsoft Word 2007 document.
The VBA code has 3 key subs listed below, along with some support subs/functions (not listed below):
- BlogExportFileToBook - the main function/sub.
- DriverBlogExportFileToBook - invokes BlogExportFileToBook
- DriverPromptInputBlogExportFileToBook - invokes BlogExportFileToBook
The macro creates the output file and the log file in Unicode encoding so that Devanagari characters render properly. However, the VBA code I used produces UTF-16 LE encoded files, which are roughly twice the size of the equivalent UTF-8 files. UTF-8 is good enough for my needs, but I could not find a way to specify in VBA code that I wanted a UTF-8 file instead of UTF-16 LE. So I converted the UTF-16 LE output files to UTF-8 by running Windows PowerShell commands. The file "Converting UTF-16 file to UTF-8 on Windows.txt" has the details of these commands, https://drive.google.com/file/d/1Q_mra6FTCqh9WkGXpjSKy1Qv8tYYnirM/view?usp=drive_link . The log file is not very large, so I left it in UTF-16 LE encoding.
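For readers who would rather not use PowerShell, the same re-encoding step can be sketched in a few lines of Python. This is only an illustration and not the set of commands in the text file linked above. It assumes the UTF-16 file starts with a byte-order mark (BOM); the function name and the sample content are made up.

```python
import os
import tempfile

def utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16 text file as UTF-8; the characters themselves are unchanged."""
    # The "utf-16" codec reads the byte-order mark (BOM) to detect little-endian
    # data and drops it while decoding. newline="" keeps line endings untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)  # read in pieces so a large book is not held in memory at once
            if not chunk:
                break
            fout.write(chunk)

# Small demonstration with made-up content (not one of the blog book files).
sample = "<p>नमस्ते Hello</p>\r\n" * 1000
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "book-utf16.html")
    dst = os.path.join(folder, "book-utf8.html")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write(sample)
    utf16_to_utf8(src, dst)
    print(os.path.getsize(src), "bytes as UTF-16 ->", os.path.getsize(dst), "bytes as UTF-8")
```

The saving depends on the content. HTML tags and English text take one byte per character in UTF-8 against two in UTF-16, while Devanagari characters take three bytes in UTF-8 against two in UTF-16, so a mostly English blog book shrinks to about half its size.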
For testing, I used the backup/export files of two of my Blogger blogs. Details are given below. [Google Drive Preview renders the output HTML file as text with the HTML tags shown, so it is not useful. Download the output HTML file to your PC using the download button at the top right of the preview page and then open it in a browser. On my 4 GB RAM Windows 10 PC, Chrome took a few minutes to open the 10.2 MB output file fully, after which I could comfortably browse through the large HTML document. The log text files are rendered properly by Google Drive Preview.]
Output file: 20230523-tns-blog-BlogBook.html (in UTF-16 LE), 943 KB.
Output file converted to UTF-8: 20230523-tns-blog-BlogBook-utf8.html, 491 KB, https://drive.google.com/file/d/1q6L3SzO2nRiVwmgaZtut8CqTZYzbB-Z6/view?usp=drive_link .
Log file: 20230523-tns-blog-BlogBookLog.txt, 13 KB, https://drive.google.com/file/d/17InWB17KOOfsbBW2cvF4nmhw9D2_O5-b/view?usp=drive_link .
Output file: 20230623-ravisiyermisc-blog-BlogBook.html (in UTF-16 LE), 20.3 MB.
Output file converted to UTF-8: 20230623-ravisiyermisc-blog-BlogBook-utf8.html, 10.2 MB, https://drive.google.com/file/d/1yLJsceYtPPjzjBTu9dG1cUYnA3TSiLU3/view?usp=drive_link
Log file: 20230623-ravisiyermisc-blog-BlogBookLog.txt, 474 KB, https://drive.google.com/file/d/18Q8uKgaCl81_nEt0yajKzypJ8jXnnZdd/view?usp=drive_link .
26 Jul 2023 Update: Test version of dictionary of post links and associated comments. Associated files:
- The VBA macro code file: 20230726v2-Macro-TestBlogCommentsExtraction-code.txt, https://drive.google.com/file/d/160YhiisAg2Ia7Yf_nuQQi8BsMy2c8Cm5/view?usp=drive_link
- Input Backup/Export XML file: 20230623-ravisiyermisc-blog.xml (12.5 MB), https://drive.google.com/file/d/1JK5p7K_tV81bsJr47krN4NZ9qveUproA/view?usp=drive_link
- Write-out of created comments dictionary (HTML file): 20230623-ravisiyermisc-blog-commnts-dict.html (127 KB), https://drive.google.com/file/d/1vedwRAPm0pQUC_YB0OvvqieMdWQMDXF8/view?usp=drive_link
- Write-out of comments in input file in sequential order (HTML file): 20230623-ravisiyermisc-blog-commnts-seqlist.html (118 KB), https://drive.google.com/file/d/1B4NWsJsm4rOlwfLbdQ3eOyxhyDmzcKDC/view?usp=drive_link
- Log file: 20230623-ravisiyermisc-blog-commnts-log.txt (36 KB), https://drive.google.com/file/d/1HA_DCcOoIAKTglpW_88TbjSBZ8arRewj/view?usp=drive_link
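The dictionary of post links and associated comments mentioned earlier in this post can be pictured with the short sketch below. It is in Python rather than VBA and is not the macro's code. It assumes that each comment entry in the export file has a "kind" category ending in #comment and a thr:in-reply-to element whose href attribute holds the link of the post it belongs to; the function name, the example.blogspot.com links and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"
THR = "{http://purl.org/syndication/thread/1.0}"  # Atom threading extension namespace
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"  # assumption about Blogger exports

def build_comments_dict(xml_text):
    """Map each post link to the list of its comments, in the order they appear in the file."""
    root = ET.fromstring(xml_text)
    comments = defaultdict(list)
    for entry in root.iter(ATOM + "entry"):
        is_comment = any(
            cat.get("scheme") == KIND_SCHEME and cat.get("term", "").endswith("#comment")
            for cat in entry.findall(ATOM + "category")
        )
        reply_to = entry.find(THR + "in-reply-to")
        if not is_comment or reply_to is None:
            continue  # posts, pages, settings, or a comment without a parent link
        comments[reply_to.get("href")].append({
            "author": entry.findtext(ATOM + "author/" + ATOM + "name", default=""),
            "published": entry.findtext(ATOM + "published", default=""),
            "content": entry.findtext(ATOM + "content", default=""),
        })
    return dict(comments)

KIND = 'scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/blogger/2008/kind#comment"'
# Made-up feed: two comments on one post and one comment on another.
SAMPLE_EXPORT = f"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <entry><category {KIND}/>
    <published>2023-07-01T09:00:00.000+05:30</published>
    <content type="html">First!</content>
    <author><name>Reader A</name></author>
    <thr:in-reply-to href="https://example.blogspot.com/2023/06/post-one.html"/>
  </entry>
  <entry><category {KIND}/>
    <published>2023-07-02T09:00:00.000+05:30</published>
    <content type="html">Thanks for this.</content>
    <author><name>Reader B</name></author>
    <thr:in-reply-to href="https://example.blogspot.com/2023/06/post-two.html"/>
  </entry>
  <entry><category {KIND}/>
    <published>2023-07-03T09:00:00.000+05:30</published>
    <content type="html">Replying late.</content>
    <author><name>Reader C</name></author>
    <thr:in-reply-to href="https://example.blogspot.com/2023/06/post-one.html"/>
  </entry>
</feed>"""

for link, items in build_comments_dict(SAMPLE_EXPORT).items():
    print(link, "->", [c["author"] for c in items])
```

Keying the dictionary by post link means that, when a post is written to the blog book, its comments can be looked up directly instead of scanning the whole export file again.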