Last updated on 9 Sep. 2023
9 Sep. 2023 Update: I rarely use the software covered in this post now. To see the current software for creating Blogger blogbooks that I use, please visit: Short User Guide to creating Blogger Blogbooks from Backup/Export File using ExportFileFilterAndGenBook and another VBA projects' macros/code (free and open source), https://ravisiyermisc.blogspot.com/2023/09/short-user-guide-to-creating-blogger.html .
end-Update 9 Sep. 2023
I decided to put up and publish this post now on 14 Jul 2023 while this (Microsoft) Visual Basic for Applications (VBA) work is in progress, as I wanted to ensure that whatever work I have done till now is available for interested readers.
The following VBA code takes in a blog feed request (hardcoded as of now) and generates an output html file which can be inserted into Word as Web Page text thereby having a blog feed book as a Word document. To this Word document you can add page numbers, TOC, Title etc.
Note that I was earlier exploring using a Google Apps Script (GAS) solution for this task. That solution is described here: BlogBooksMaker Google Apps Script to Create Blogger Blog Book (or Book parts): Description and Stable Version Info, https://ravisiyermisc.blogspot.com/2023/07/blogbooksmaker-google-apps-script-to.html . The reason I started looking at VBA option for this task is that Google Apps Script (GAS) execution time limits became the major stumbling block for me when I was trying to use it to get my final blog books. Further details are provided later on in this post.
--- Start BlogFeedToBook VBA code ---
Option Explicit
Function GetHTTPResponse(url As String) As String
Dim msXML As Object
Set msXML = CreateObject("Microsoft.XMLHTTP")
With msXML
.Open "Get", url, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
.send
GetHTTPResponse = .responseText
End With
Set msXML = Nothing
End Function
Sub BlogFeedToBook()
Dim XDoc As Object
Dim StartTime, EndTime As String
StartTime = Now
Debug.Print "Macro Sub. execution started. Date & Time is: " & StartTime
Set XDoc = CreateObject("MSXML2.DOMDocument")
XDoc.async = False: XDoc.validateOnParse = False
Dim BlogFeed As String
'BlogFeed = GetHTTPResponse("https://ravisiyermisc.blogspot.com/feeds/posts/default?max-results=150")
BlogFeed = GetHTTPResponse("https://www.blogger.com/feeds/8377239296413777485/posts/default?max-results=500")
' Above request is for whole of eklavyasai.blogspot.com (has 216 published posts as of 14 Jul 2023)
' using blog ID in request.
' Gets done (from start till MsgBox saying output file is written) in below 3 seconds on my desktop PC
' Output file for run was: testout3.html. Its stats: 3,786 KB size,
XDoc.LoadXML (BlogFeed)
Dim Entries As IXMLDOMNodeList
Dim Entry As IXMLDOMNode
Set Entries = XDoc.SelectNodes("/feed/entry")
Dim DebugMessage As String
DebugMessage = "Post Entries(0) -> baseName " & Entries(0).BaseName & "Entries.length = " & Entries.Length
' MsgBox DebugMessage
Debug.Print DebugMessage
Dim FSO As New FileSystemObject
Set FSO = CreateObject("Scripting.FileSystemObject")
Dim OutputFileName As String
OutputFileName = "C:\Users\Ravi-user\Desktop\Temp\BlogBookGen\VBA\RSS\testout3.html"
Dim OutputFileStream As TextStream
Set OutputFileStream = FSO.CreateTextFile(OutputFileName, True, True) ' Overwrite = True, Unicode = True
' Unicode True is needed to support Devanagari characters
' Default is ASCII file for which TextStream.Write()
' fails with error code 5 Invalid procedure or argument
Dim ContentHTML As String
Dim PostTitle As String
ContentHTML = "<html><body>"
Dim i As Integer
For i = 0 To Entries.Length - 1
PostTitle = Entries(i).ChildNodes(3).Text
ContentHTML = ContentHTML & "<h1>" & PostTitle & "</h1><br/>"
ContentHTML = ContentHTML & Entries(i).ChildNodes(4).Text & "<br/><br/>"
ContentHTML = ContentHTML & _
"============================End of Post==========================<br/><br/>"
'Debug.Print "ContentHTML length = " & Len(ContentHTML) & vbNewLine
'MsgBox "ContentHTML length = " & Len(ContentHTML) & vbNewLine
OutputFileStream.Write ContentHTML
'Debug.Print "Wrote to output file, Post Title: " & PostTitle & vbNewLine
'Above line floods Immediate Window resulting in truncation of earlier Debug.Print entries for
' the same run.
' If I really need the above statement data I should write it to a separate log file
ContentHTML = ""
Next i
ContentHTML = _
"<br/>============================<b>End of Book</b>========================="
ContentHTML = ContentHTML & "</body></html>"
OutputFileStream.Write ContentHTML
OutputFileStream.Close
EndTime = Now
DebugMessage = "Output file: '" & OutputFileName & "' written." & vbNewLine _
& "Time taken for this run of BlogFeedToBook() macro (in seconds): " _
& DateDiff("s", StartTime, EndTime) & vbNewLine _
& "Current date & time is: " & EndTime
Debug.Print DebugMessage
MsgBox DebugMessage
Set XDoc = Nothing
'Debug.Print "Macro Sub. execution finishing. Date & Time is: " & Now
End Sub
--- End BlogFeedToBook VBA code ---
Log for BlogFeedToBook macro run on 15 July 2023 using above code
BlogFeedToBook macro run log
For blog feed:
BlogFeed = GetHTTPResponse("https://www.blogger.com/feeds/8377239296413777485/posts/default?max-results=500")
' Above request is for whole of eklavyasai.blogspot.com (has 216 published posts as of 14 Jul 2023)
' using blog ID in request.
----
Output file testout3.html not present/not existing at start of run
==== Execution Log (Immediate Window) Start ========
Macro Sub. execution started. Date & Time is: 15-Jul-23 6:10:54 PM
Post Entries(0) -> baseName entryEntries.length = 216
Output file: 'C:\Users\Ravi-user\Desktop\Temp\BlogBookGen\VBA\RSS\testout3.html' written.
Time taken for this run of BlogFeedToBook() macro (in seconds): 2
Current date & time is: 15-Jul-23 6:10:56 PM
==== Execution Log (Immediate Window) End ==========
Output file testout3.html size: 3,786 KB, seems OK at quick look
Note that this output file has linked images/pictures and not embedded images/pictures. That's why its size is not so big and it gets created very quickly.
Opened output file in Word 2007, switched to Print Layout from Web Layout in which it was first shown, and then saved as Word document (.docx): testout3.docx, file size: 916 KB, pages 779
I think the run time above varies a little with Internet speed. Some time back I got a 4 second run for the same blog feed. Note that I use Airtel mobile Internet in Puttaparthi and the Internet Download speed I get during daytime is usually between 2 Mbps to 8 Mbps which can go down lower at peak times and go up to even 25 Mbps in daytime (presumably when the load is low).
Whether it is 2 seconds or 3 seconds or 4 seconds, getting linked pictures blog book of 216 posts of a blog so fast and that too using VBA in Word document, is awesome! I am super-impressed and very happy with this.
--- end BlogFeedBook macro run log ---
--- Start ChangeLinkedImageToEmbedded VBA code ---
Option Explicit
Sub ChangeLinkedImageToEmbedded()
Dim i, j As Integer
Dim StartStepTime, EndStepTime As Date
Dim ExecutionStepTime, ExecutionTimeTotal As Integer
Dim Msg, Style, Response
Dim NumPicsChangedInStep As Integer
Debug.Print "Started executing macro Sub. Date & Time is: " & Now
NumPicsChangedInStep = 50 ' Pause processing after these many number of inline pics and ask user whether
' to continue (or not)
Dim StepNum As Integer
' Below code based on https://www.extendoffice.com/documents/word/5423-word-convert-linked-image-to-embedded.html
' Many thanks to author and publisher of above page
i = 0
StartStepTime = Now
StepNum = 1
ExecutionTimeTotal = 0
Dim xIShape As InlineShape
For Each xIShape In ActiveDocument.InlineShapes
With xIShape
If .Type = wdInlineShapeLinkedPicture Then
.LinkFormat.SavePictureWithDocument = True
.LinkFormat.BreakLink
i = i + 1
j = i Mod NumPicsChangedInStep
If j = 0 Then
EndStepTime = Now
ExecutionStepTime = DateDiff("s", StartStepTime, EndStepTime) ' Checked with some test code about
' seconds being rounded in this case
' and confirmed that it is so.
ExecutionTimeTotal = ExecutionTimeTotal + ExecutionStepTime
'Msg = "i Mod " & NumPicsChangedInStep & " is 0 and i = " & i & ". i is Linked pictures number." _
' & vbNewLine &
Msg = "Step no. " & StepNum & " completed. " & _
"Number of Inline pictures changed so far from Linked to Embedded is: " & i _
& vbNewLine & "Time taken for last step of " & NumPicsChangedInStep & _
" Inline pictures being changed from Linked to Embedded (in seconds): " & _
ExecutionStepTime
Debug.Print Msg
Msg = Msg & vbNewLine & "Do you want to continue ?"
Style = vbYesNo Or vbDefaultButton1 ' Define buttons.
Response = MsgBox(Msg, Style)
If Response = vbYes Then ' User chose Yes.
' Debug.Print "User chose Yes (continue) when i (Inline linked pictures number) was: " & i
Debug.Print "User chose Yes (continue) at end of Step " & StepNum
StartStepTime = Now
StepNum = StepNum + 1
Else ' User chose No.
Debug.Print "User chose No (don't continue) at end of Step " & StepNum
Debug.Print "Exiting For"
Exit For
End If
End If
End If
End With
Next
If Response = vbYes Then
' Last step typically would have been partial or For loop may have exited as 0 images were left to be
' changed
j = i Mod NumPicsChangedInStep
If j = 0 Then
'For loop exited as 0 images were left to be changed
Else
' Partial last step
EndStepTime = Now
ExecutionStepTime = DateDiff("s", StartStepTime, EndStepTime)
ExecutionTimeTotal = ExecutionTimeTotal + ExecutionStepTime
Msg = "Last step of run, step no. " & StepNum & ", was partial." & _
"Number of Inline pictures changed from Linked to Embedded in this last step " _
& "= " & j & vbNewLine _
& "Time taken for the this last step (in seconds): " & _
ExecutionStepTime
MsgBox (Msg)
End If
End If
Msg = "Just before exit from macro Sub. Total number of Inline pictures changed from Linked to Embedded " _
& "in this run is: " & i & vbNewLine _
& "Total execution time for all steps of changing Inline pictures from Linked to Embedded" & vbNewLine _
& " excluding MsgBox interaction time with user (in seconds): " & ExecutionTimeTotal
MsgBox Msg
Debug.Print Msg
Debug.Print "Just before exit from macro Sub. Date & Time is: " & Now
End Sub
Sub CountInlineLinkedImages()
Dim i, j As Integer
i = 0
Dim Msg, Style, Response
Dim xIShape As InlineShape
For Each xIShape In ActiveDocument.InlineShapes
With xIShape
If .Type = wdInlineShapeLinkedPicture Then
i = i + 1
End If
End With
Next
Msg = "Number of Inline linked images/pictures = " & i
MsgBox Msg
Debug.Print Msg
End Sub
--- End ChangeLinkedImageToEmbedded VBA code ---
Log for ChangeLinkedImageToEmbedded macro run on 15 July 2023 using above code
In log below, please read 'Time taken for last step of 50 Inline pictures'... as 'Time taken for previous step of 50 Inline pictures'... Later I will be changing the code too.
Also above log omits the last step of 40 images being changed from linked to embedded. I recall the MsgBox being shown but the log (Debug.Print) does not have it. I have spotted the bug and will fix it in later versions.
--- Start ChangeLinkedImageToEmbedded macro run log ---
ChangeLinkedImageToEmbedded() macro run log
testout3.docx size before macro run: 916 KB, pages: 779
==== Execution Log (Immediate Window) Start ========
Started executing macro Sub. Date & Time is: 15-Jul-23 7:28:46 PM
Step no. 1 completed. Number of Inline pictures changed so far from Linked to Embedded is: 50
Time taken for last step of 50 Inline pictures being changed from Linked to Embedded (in seconds): 13
User chose Yes (continue) at end of Step 1
Step no. 2 completed. Number of Inline pictures changed so far from Linked to Embedded is: 100
Time taken for last step of 50 Inline pictures being changed from Linked to Embedded (in seconds): 10
User chose Yes (continue) at end of Step 2
Step no. 3 completed. Number of Inline pictures changed so far from Linked to Embedded is: 150
Time taken for last step of 50 Inline pictures being changed from Linked to Embedded (in seconds): 10
User chose Yes (continue) at end of Step 3
Step no. 4 completed. Number of Inline pictures changed so far from Linked to Embedded is: 200
Time taken for last step of 50 Inline pictures being changed from Linked to Embedded (in seconds): 12
User chose Yes (continue) at end of Step 4
Step no. 5 completed. Number of Inline pictures changed so far from Linked to Embedded is: 250
Time taken for last step of 50 Inline pictures being changed from Linked to Embedded (in seconds): 9
User chose Yes (continue) at end of Step 5
Just before exit from macro Sub. Total number of Inline pictures changed from Linked to Embedded in this run is: 290
Total execution time for all steps of changing Inline pictures from Linked to Embedded
excluding MsgBox interaction time with user (in seconds): 70
Just before exit from macro Sub. Date & Time is: 15-Jul-23 7:30:05 PM
==== Execution Log (Immediate Window) End ==========
On saving testout3.docx, its size increased to 18,641 KB. The document seemed OK when I took a quick look.
Saving As PDF from Word took around 20 seconds (measured manually by using digital clock). The PDF file - testout3.pdf - has size of 15,809 KB, and has 779 pages. I opened the PDF document in Chrome and had a quick look at it. It seemed OK.
--- end ChangeLinkedImageToEmbedded macro run log ---
Steps to use VBA macros BlogFeedToBook() and ChangeLinkedImageToEmbedded()
I use Microsoft Word 2007 on my PC desktop and so the steps below are for that. It may be slightly different for other versions of Microsoft Word.
1) Open an empty document in Word
2) If Developer Menu is not shown in Word, you have enable it. To enable Developer tab in Word 2007: Office button -> Word Options -> Popular -> "Show Developer tab in the ribbon" has to be checked and then OK has to be pressed in the dialog. More details: https://support.microsoft.com/en-gb/office/show-the-developer-tab-in-word-e356706f-1891-4bb8-8d72-f57a51146792 .
3) To add BlogFeedToBook() VBA macro to Word document that has no VBA macros: Click Developer tab -> Macros. Add any name like Test in Macro name box. Click on Create button. This will open Visual Basic Editor window with few lines of code for the Macro. Replace those lines of code by the BlogFeedToBook() VBA macro and its helper function GetHTTPResponse related code given above. Now the BlogFeedToBook() will be listed as a macro in Word under View->Macros. More details: https://support.microsoft.com/en-us/office/create-or-run-a-macro-c6b99036-905c-49a6-818a-dfb98b7c3c9c#ID0EBBD=Office_2007 .
4) Add references of a) Microsoft Scripting Runtime and b) Microsoft XML, v3.0 [Perhaps it will work with other versions of Microsoft XML like v5.0 and v6.0 but I used v3.0.] To add these references, in Microsoft Visual Basic window, click Tools -> References which shows up a dialog where these references can be looked up and added. Note that if a macro is running, the References menu is greyed out. Use Reset button on toolbar to end Macro run after which References menu is enabled.
5) Modify blog feed url in call to function GetHTTPResponse in VBA macro BlogFeedToBook(), to use the blog feed you want to make into a book. The relevant line of (uncommented) code from above code is:
BlogFeed = GetHTTPResponse("https://www.blogger.com/feeds/8377239296413777485/posts/default?max-results=500")
Here the blog feed url is "https://www.blogger.com/feeds/8377239296413777485/posts/default?max-results=500" which gets max of 500 posts for blog specified through Blog Id. The blog is eklavyasai.blogspot.com which has 216 published posts as of 14 Jul 2023. So the blog feed url or request retrieves all posts of the blog.
There is also a commented out code line as follows:
'BlogFeed = GetHTTPResponse("https://ravisiyermisc.blogspot.com/feeds/posts/default?max-results=150")
I have tested the above line in earlier version and so it should work now too, IMHO. Here the blog feed url is "https://ravisiyermisc.blogspot.com/feeds/posts/default?max-results=150" which gets max of 150 posts of blog ravisiyermisc.blogspot.com which has over 1000 published posts as of 14 Jul 2023.
6) Modify output file folder and filename. The relevant line of code from above code is:
OutputFileName = "C:\Users\Ravi-user\Desktop\Temp\BlogBookGen\VBA\RSS\testout3.html"
You can modify it to something like:
OutputFileName = "C:\Users\username\Desktop\Temp\blogfeedbook.html" ' username is the Windows user name of user
7) To run VBA macro BlogFeedToBook()
In the empty Word document, click on View -> Macros. In the dialog that comes up, choose BlogFeedToBook() macro and then click on Run button.
At the end of the run a message box will be shown stating that the output file has been written. On acknowledging the message box, the macro will end execution.
8) To insert the output html file (output3.html) in the empty Word document as web document:
Click on Insert tab. in Toolbar. Then click the drop-down next to Object toolbar icon and choose "Text from File ..." command. In the dialog that appears, set File type to "All Web Pages (*.htm, *.html, *.mht, *.mhtml)".
Navigate to folder with output file and choose the output file (output3.html in orig. code). Click on Insert button in dialog. The contents of .html file should now be rendered as processed HTML (and not raw HTML) in the empty Word document. It takes a little time to do so and when viewing the images in the file, it takes some time to show the image (The images are not stored in the Word document and may be accessed from blogger website).
The content may show initially in "Web Layout" without any margins. You can change it to "Print Layout" by choosing the associated icon in right bottom icon-strip after which typical document margins will be shown and like in usual Word documents, one can change the page size, margins, orientation, add Table of Contents, add a Title page to the document etc. For the full eklavyasai.blogspot.com blog feed used in above code example, the Word document shows 779 pages (Letter size, 1 inch margins and Portrait mode).
Click on Save button to save the Word document.
Alternative way: Open .html file in Word (using Open With command which appears on mouse right-click on .html file (on my PC, at least))
If Word shows document in "Web Layout" (without margins and pages), switch to "Print Layout" (in Word 2007, by clicking on "Print Layout" button in right bottom of window). Now margins will appear and content will be divided into pages (normal view for a Word document).
Save as Word document (.docx) file
Close document.
Now the HTML file is in .docx format. Open the .docx file to confirm that it shows the content properly.
9) Pictures of the blog feed book are saved as links (linked pictures) and NOT as embedded pictures in the Word document. This results in two issues:
a) When scrolling through the Word document, at times the picture is blank initially. In such cases, usually after some seconds, the picture is shown (presumably after it loads from the Internet).
b) Saving As PDF saves some (or perhaps many) pictures of the document as almost blank placeholders without the corresponding picture. I think this is happening due to Word itself not having the pictures readily available and so not providing it to the PDF making function/code. This is a major issue for me as I want my blog books to have the images embedded in them, so that they are independent of the blog. Even if the blog disappears, the blog book will show the pictures.
I tried the procedure described here: Transform linked images to embedded images, https://superuser.com/questions/294978/transform-linked-images-to-embedded-images . The procedure is: Select entire document (Ctrl + A), Office button -> Prepare -> Edit Links to Files, select all files in the listed links and then select the "Save picture in Document' checkbox. Finally click OK.
This procedure worked for a small test document with 5 images. But when I used it with a larger document with many images, I could not select all the listed links as the list would get populated slowly when I would scroll, and scroll to end did not work. Then Word crashed (exited suddenly)! After that, even after reboot, choosing Office button -> Prepare -> Edit Links to Files resulted in Word exiting! So this option did not work for me.
VBA approaches:
https://software-solutions-online.com/word-vba-loop-through-images/ helped me start off on code for iterating through images . I had to modify the code slightly (add ActiveDocument.) to try it out. I got some idea of how to go through InlineShapes in Word document. I tested a variation of it on the blog books created by BlogFeedToBook() macro and later step of inserting the output file HTML into Word document. The variation provided some count type info. about the images as it looped through them. If I recall correctly, I saw that all the images were "InlineShapes"
Ref. pages:
[Methods and Properties of the objects are listed in a pane on the left of the page.]
InlineShape object (Word), https://learn.microsoft.com/en-us/office/vba/api/word.inlineshape
InlineShapes object (Word), https://learn.microsoft.com/en-us/office/vba/api/word.inlineshapes
WdInlineShapeType enumeration (Word), https://learn.microsoft.com/en-us/office/vba/api/word.wdinlineshapetype
How to convert all linked images to embedded in Word document?, https://www.extendoffice.com/documents/word/5423-word-convert-linked-image-to-embedded.html gave me the solution. Many thanks to the author and publisher of the page.
The ChangeLinkedImageToEmbedded() macro uses code based on the above and builds up on it.
I first tried used the above mentioned extendoffice.com code (perhaps in slightly different form) in an earlier version of ChangeLinkedImageToEmbedded() function/macro. I tried this earlier version of ChangeLinkedImageToEmbedded() on a small document created from a larger blog book document by trimming it. This small document had only 5 images. The code worked well though it took some time (some seconds). It was clear that each image being changed from linked picture to embedded picture was taking time. But it did not take too long. I had added a Debug.Print line within the loop printing numeric position of the inline image being processed (1, 2, 3 etc.). 5 messages appeared for 5 pictures. On saving the file, its size increased from earlier indicating images stored in the document now. Then I checked out saving this file as PDF and saw that the PDF pictures loaded fast and the PDF was of larger size than earlier (it seems it also had the embedded images).
Next I tried out the same procedure (using an earlier version of ChangeLinkedImageToEmbedded() function/macro) on "testout2-Worldly-LastTenPosts.docx" which had last ten posts of my Worldly blog with linked pictures, file size: 118 KB and 79 pages. On running the macro code, Word and associated Visual Basic windows seemed to freeze but after some time (few minutes or maybe it was less than a minute), the program had finished the macro and was no longer frozen. Saving the document resulted in size increasing to 440 KB with pages being same as 79. Opening the document and saving it as PDF resulted in PDF file of size 843 KB and 79 pages which showed the pictures correctly.
Next I tried out the procedure (using an earlier version of ChangeLinkedImageToEmbedded() function/macro) on "testout2-Worldly-Last150Posts-EmbTrial.docx" which had last 150 posts of my Worldly blog with linked pictures, file size: 1447 KB and 1258 pages. On running the macro code, like in earlier case above, Word and associated Visual Basic windows seemed to freeze. But this time the freeze was for much longer - perhaps 5 to 10 minutes but as I did not check the time, I am not sure. I was wondering whether the program had got stuck but Task Manager showed that the Word process was making network requests, disk requests and using CPU. So I waited and waited. Eventually it got over. On saving the Word document, its size ballooned from 1447 KB to 22,619 KB with pages being same at 1258! Opening the Word document and saving it as PDF resulted in PDF file of size 18,237 KB with same number of pages as Word document - 1258. I scrolled through the whole PDF file. It seems to be OK and all the pictures I could spot in my scroll were being shown (no case of placeholder instead of picture).
In the current version of ChangeLinkedImageToEmbedded() function/macro, I have introduced step-wise processing of the pictures, with the user being prompted for continue (Yes or No) after a specific number of images are done. This gives good control over the Word program execution. Users can even exit after a few steps, close Word etc. Later the users can reopen the file and continue the process from where they left off. The ChangeLinkedImageToEmbedded macro run log given earlier on in this post is for the current version and the log entries show the step wise process.
Notes
Migrating Code to other platforms
I think the code should be easy to migrate to VB.Net or C#.Net. The main code relies on the following classes:
1) Microsoft.XMLHTTP
2) MSXML2.DOMDocument
3) TextStream
These classes should be available on regular Visual Studio development environments. So I think any person having such an environment on his/her computer should be able to use this code with small changes, and create an executable program file which can then be used by others on Windows computers who do not have Visual Studio development environment.
I removed the old Visual Studio development environment (and other development environments like Java) from my PC many years ago, as I wanted to avoid doing such code. Now I don't want to go through installing it and then creating the executable. I leave that work for others if they are interested.
Further, I think that this code could also be migrated with some more changes to non Microsoft development environments like Java or perhaps Python as they would be having equivalent classes to above mentioned Microsoft classes.
Office365? Why did I not use my Google Apps Script BlogBooksMaker code?
Can it also work on the free online version of Office365? I don't know. I do use Office365 at times for Word's Dictate (Voice to Text) feature (on PC and on mobile). But I don't know about its VBA support.
But even if it has VBA support, will it have the same issue of execution time limits that became the major stumbling block for me with Google Apps Script (GAS) when I was trying to use it to get my final blog books? Note that with 50 posts per book part, GAS works for all my blogs without any crash/errors. But even with posts per book part being 100, it works for some cases and it crashes for some cases with typical error being Drive Insert error. 50 posts per book part results in 35 book parts for my Spiritual blog! That's too much! I would have to merge these documents at least into one book part per year thereby reducing the book parts to around 10 or 11 (and could reduce it further by combining small book parts for some years with other book parts). I found that frustrating.
Then I came to know that Blogger blog feed is max. 150 posts when using blog address (like ravisiyer.blogspot.com) but is max. 500 posts when using blog Id! I saw that the GAS code was able to retrieve all 216 published posts of one of my blogs when I used blog Id in the feed request and did the processing work for building the content HTML fast. The point where it slowed down in a big way was the writing to Google Drive document, and even there I was willing to wait for long (10 minutes, half an hour, no problem), but after some minutes (maybe 4 to 5 minutes), it aborted the program with the typical Drive Insert error!!! Here's the error I got yesterday (13 Jul. 2023) when I tried to use BlogBooksMaker GAS project to make my whole ravisiyermisc.blogspot.com blog with 150 posts per part:
----- start error message and nearby messages ----
Jul 13, 2023, 6:59:31 PM Info Post Title (80 chars), Pub. date: 'Once Upon a Time in the West - Famous Western genre movie made in 1968', 2020-02-08T17:08:00.001+05:30
Jul 13, 2023, 6:59:31 PM Info 150.0 in this fetch, 150.0 overall
Jul 13, 2023, 6:59:32 PM Info contenthtml.length = 2998541
Jul 13, 2023, 7:00:28 PM Info Error is GoogleJsonResponseException: API call to drive.files.insert failed with error: Bad Request
Jul 13, 2023, 7:00:28 PM Info Failure to write output Google Docs file. Error Message: GoogleJsonResponseException: API call to drive.files.insert failed with error: Bad Request
----- end error message and nearby messages ----
This error happened with the first blog book part itself. So the run created no blog book parts! Yes, the HTML data to be written was long: 2998541 bytes which is 2928 KB. And I think Google Docs pulls in the referenced images in the HTML and so its size bloats (perhaps to around 23 to 25 MB based on per year blog book parts created in earlier runs for the same blog).
I felt rather cornered and got really bugged.
That's when I started thinking about exploring VBA option as it would be running on my Desktop and so not have any significant execution time limits or significant file writing time limits, and came up with this solution. Note that as I have Office 2007 on my PC, VBA is available through it (I mean, I did not/could not uninstall VBA like I uninstalled Visual Studio some years ago).
Useful Utilities
https://countwordsfree.com/xmlviewer was quite helpful for me to figure out the XML structure of the blog feed. I requested 2 latest posts of this blog in a browser and copy-pasted the returned XML to a file. This data I could copy-paste to above utility window and see the tree structure well. I could correlate it to the data shown in VBA debugger in the MSXML2.DOMDocument.DocumentElement variable having the same data. This enabled me to figure out the code to write to access title and content of posts in the DocumentElement variable.
I also first read the XML feed from the file I had created above instead of requesting it over HTTP/HTTPS. Once I had got that code working, I moved to getting to the blog feed using HTTP/HTTPS GET request.
Comments
Post a Comment