Google Apps Script to Create WordPress Blog Book (or Book parts): Test Versions and Stable Version

Last updated on 10 Sep. 2023

10 Sep. 2023 Update: I rarely use the software covered in this post now. To see the current software that I use for creating WordPress blogbooks, please visit: Generated blogbook of my Misc. Tech. WordPress blog using my VBA program WPBlogExportFileToBook,  https://ravisiyermisc.blogspot.com/2023/08/generated-blogbook-of-my-misc-tech.html .

end-Update 10 Sep. 2023

I decided to put up and publish this post now on 9 Jul 2023 while this Google Apps Script work is in progress, as I wanted to ensure that whatever work I have done till now is available for interested readers. 

Top-Level Google Drive publicly shared directory for code and data for this work: BBMakerForWordPress, https://drive.google.com/drive/folders/1g3UPj7rhmjDNMlQoH1dfg8qOgj52V58b?usp=drive_link .

This work builds upon my previous project(s) covered in the following posts:

The above project(s) were for Blogger blogs, which provide the blog feed as JSON if the appropriate parameter is passed in the request. The WordPress blog feed is in XML by default. To get it in JSON one needs to use a plugin, but plugins need a WordPress business plan. That was the stumbling block for me in using the above-mentioned project(s) code to make a blog book or blog book parts from one of my WordPress blogs. (I do not update my other WordPress blogs, except for very minor stuff, and so the old blog books for those blogs are enough for my current needs).

So I explored ways to convert the WordPress blog XML feed to JSON and then use code similar to BlogBooksMaker to make the Blog Books from the JSON feed. The sections below cover those efforts in reverse chronological order.

First some general info. that is common to the sections below.

How to get feed in WordPress?

The following URL provides the default blog feed for my WordPress blog: https://ravisiyer.wordpress.com/feed/ .

It returns the latest x number of posts in XML format. Note that it does not return blog pages. x is controlled by WordPress Admin->Settings->Reading->(RSS feed settings) Syndication feeds. It is 10 by default but I could increase it to 150 in which case it returned all 113 published posts of my above blog.

I could not get proper info. on whether parameters are supported for above URL.

But posts for a particular year or month can be obtained by URLs like the following:

How to convert the XML feed to JSON and then write the converted JSON feed to a text file?

I used Chrome browser to get the XML feed and then copy-pasted its contents to the appropriate input textbox in this online xml to json converter: https://codebeautify.org/xmltojson . Then I copy-pasted the converted JSON from the output textbox into a text file. [To have the XML feed as a text file, I used Chrome browser to get the XML feed and then copy-pasted its contents to a text file.]

Online viewing of JSON structure

To understand the JSON structure of the converted JSON feed, I used https://jsonviewer.stack.hu/ . I had to copy-paste the JSON feed into the appropriate input tab, and then see its structure through the viewer tab. This helped me in modifying the code of the Blogger blog maker project(s) (like BlogBooksMaker) to work properly with the WordPress blog XML feed converted to JSON feed.

Converting XML to JSON through code

On searching for JavaScript code for the above, I got the following two links from:

https://stackoverflow.com/questions/1773550/convert-xml-to-json-and-back-using-javascript

I tried out the first one but ran into some issues which will be covered in the appropriate section below.

Then I came across a Google Apps Script solution here: Convert XML to JSON with Apps Script, https://www.labnol.org/code/19952-convert-xml-to-json which I tried out and which seems to work! Those details will be covered in the appropriate section below.

Test Versions and Stable Version in Reverse Chronological Order

Note that all my code in these projects is free for others to use and modify.

20230710-WordPressFeedToBook version

I improved upon the Code.gs file (https://drive.google.com/file/d/1OCszQYZKPdDL4s-ldH_bewn-hFjCqAu0/view?usp=drive_link ) of WPBlogBooksMaker of 20230709-XMLFeed-BlogBooks version considering that I was close to a stable version. The new Code.gs is shared here: https://drive.google.com/file/d/1kpz1LHZ4DEQxLEvzITWVtYNo5yXnAZz4/view?usp=drive_link . The main changes are as follows:
  • Added licensing info. and background info. as comments in top of file.
  • function makeWPBlogFeedBook(blogFeedURL, bookTitle) became the main function doing the blog feed book work. Providing two arguments made the function more general purpose and it was expected to be invoked by other 'run-driver' functions from Run-Driver.gs file of the project.
  • Added blog feed title and creation process start date at beginning of output file.
  • Added proper code to get published date of post and add that to post title in output book/file.
  • Added runType enum and check for normal or test run. Test run would break after some iterations.
  • Added end of book line in book.
  • Added more logging information, as is used in the Blogger BlogBooksMaker project.
  • Cleaned up the code and tried to beautify the code a little.
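
As an illustration of the run-type check mentioned in the list above, here is a minimal sketch. It is not the code in the shared Code.gs file: the names RunType, TEST_RUN_MAX_POSTS and selectPostsForRun are hypothetical, and the limit of 5 posts for a test run is an arbitrary assumption.

```javascript
// Hypothetical enum-like object; JavaScript (and so Apps Script) has no built-in enum type.
const RunType = Object.freeze({ NORMAL: 'NORMAL', TEST: 'TEST' });

// ASSUMPTION: cut-off for a test run; the actual value in Code.gs may differ.
const TEST_RUN_MAX_POSTS = 5;

// Returns the posts that a run of the given type would process.
function selectPostsForRun(posts, runType) {
  const selected = [];
  for (let i = 0; i < posts.length; i++) {
    if (runType === RunType.TEST && i >= TEST_RUN_MAX_POSTS) {
      break; // test run: stop after a few iterations
    }
    selected.push(posts[i]);
  }
  return selected;
}
```

A test run then touches only the first few posts, which keeps the trial runs short while the code is still changing.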
Then I added a Run-Driver.gs file, https://drive.google.com/file/d/1MedXqF-qZ66GEP0YQY476AD1rqWgmLC9/view?usp=drive_link , which has the following functions invoking the main function makeWPBlogFeedBook(blogFeedURL, bookTitle) of Code.gs file. In function names below, BFB is the acronym for Blog Feed Book:
  • function makeBFBWithDefaultValues(): Invokes makeWPBlogFeedBook with default values which will make a book of the default blog feed. This function can be run from Script Editor to test makeWPBlogFeedBook function.
  • function makeRavisiyerWPBlogBook(): Invokes makeWPBlogFeedBook for ravisiyer.wordpress.com full blog. Requires WordPress Admin->Settings->Reading->Syndication feeds to be set to higher than number of published posts in ravisiyer.wordpress.com.
  • function makeBFBForOneYear(): Invokes makeWPBlogFeedBook for a particular blog feed (for a year).
  • function makeBFBYearWise(): Invokes makeWPBlogFeedBook for a particular blog feed with year parameter in a loop.
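
The year-wise driver idea in the list above can be sketched as follows. This is only an illustration and not the code in Run-Driver.gs: buildYearWiseRuns and makeBooksYearWise are hypothetical names, and the year feed URL pattern https://<host>/<year>/feed/ is an assumption that should be checked against the blog before use.

```javascript
// Builds one {feedUrl, bookTitle} pair per year, newest year first.
// ASSUMPTION: year feeds follow the pattern https://<host>/<year>/feed/ .
function buildYearWiseRuns(blogHost, newestYear, oldestYear) {
  const runs = [];
  for (let year = newestYear; year >= oldestYear; year--) {
    runs.push({
      feedUrl: 'https://' + blogHost + '/' + year + '/feed/',
      bookTitle: blogHost + ' - Year ' + year + ' Blog Feed Book'
    });
  }
  return runs;
}

// Calls the supplied book-making function once per year.
// makeBook is expected to take (blogFeedURL, bookTitle), like makeWPBlogFeedBook.
function makeBooksYearWise(blogHost, newestYear, oldestYear, makeBook) {
  buildYearWiseRuns(blogHost, newestYear, oldestYear).forEach(function (run) {
    makeBook(run.feedUrl, run.bookTitle);
  });
}
```

In the Script Editor, a call such as makeBooksYearWise('ravisiyer.wordpress.com', 2023, 2011, makeWPBlogFeedBook) would then invoke the main function once for each year.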
The XMLToJSON.gs file remained the same as in WPBlogBooksMaker of 20230709-XMLFeed-BlogBooks version. This 20230710-WordPressFeedToBook project/folder's XMLToJSON.gs is shared here:  https://drive.google.com/file/d/1xiUEPZOtAOFWpm2M5vbQSF_OYU18bXRI/view?usp=drive_link .

After some testing which involved many runs, the code stabilized (stable version code file links shared above). Then I did a final set of runs all of which seem to have produced expected output with just one issue of some images being bigger than Google Docs page which was faced in earlier versions as well. The runs info. is provided below.

Run1
Run-Driver.gs function makeBFBWithDefaultValues() was run from Script Editor.
Extract from Run1's RunInfo-ExecLog.txt file,  https://drive.google.com/file/d/1szNJfIlf3NnGJhqZ5yEdZ1A90r6QP2Rk/view?usp=drive_link , is given below (slightly modified): 

Normal run, WP Admin -> Syndication feeds = 150
So makeBFBWithDefaultValues() invocation resulted in full ravisiyer.wordpress.com blog being made into a book.
Output file "Blog Feed Book",  https://docs.google.com/document/d/1y6e1seiwK6CmFUGHS8ZFJ0L2OS5HJYfpODGTNNUGbVQ/edit?usp=drive_link , seems to be as expected in quick look but for image issue. In particular, it seems that all published posts were included in it. The file is 10.1 MB in size and has 413 pages in default page settings of Portrait, Letter, margins of 1 inch, of Google Docs. The image issue is that some images are bigger than page size. The images are NOT cropped as their size can be reduced manually to show the whole image on the page. The formatting issue that I had faced in previous runs for some posts, seems to have reduced significantly as I could not spot those issues in my quick look.

Downloaded "Blog Feed Book" from Google Docs as a .docx (Microsoft Word) document (shared here as a zip link which has download button to avoid Google Docs opening it automatically:  https://drive.google.com/file/d/1rTBwoJ30AkotgnU4nP8fTUkZbOS6UaVM/view?usp=drive_link ). Word (2007) also has the image being too big for page issue but like in Google Docs the full image is present in the document and can be reduced in size manually to get it shown fully on the page. In Word, I was able to add page numbers and generate a Table of Contents for the various posts (file shared here as a zip link which has download button to avoid Google Docs opening it automatically: https://drive.google.com/file/d/14Wo15uxoIGG90O0u1HAt3Y4Z8JJ0_ys-/view?usp=drive_link ). On first look, the TOC seems to be OK but I will need to do a detailed check. The Titles of some posts towards the end of the book, are in a different font from the earlier ones in the TOC, but that itself is not a major issue for me.

Overall, I think this output is good enough for me, though I need to see if I can automatically reduce image sizes to page size (less margin size) in Word. That would fix the image size issue for me.

--- end extract from RunInfo-ExecLog.txt ---

About the sentence in above log, "The formatting issue that I had faced in previous runs for some posts, seems to have reduced significantly as I could not spot those issues in my quick look.": I looked up some previous runs output files I have and confirmed that in those runs, the title of some posts towards the end of the book (e.g. 2011 posts) were in normal font size and not in header font size. That has been fixed in later runs including this project run. I don't know how exactly it got fixed but I noted that later runs have an
"===========================End of Post============================"
line after each post. Did that fix the issue? I don't know.

Many posts of 2011 (towards end of book) are in a different font (both title and contents) from most of the other posts in the book. I don't know whether that is due to different fonts being used in the WordPress blog posts themselves. I don't view it as an issue as of now and so am not investigating it.

Some more info. about workaround for some images not fitting in Google Docs default page settings of Portrait, Letter and 1 inch margins (left, right, top and bottom): Changing the page settings of Google Docs (File->Page Setup) to pageless (for above "Blog Feed Book" it takes some time and so one has to wait after clicking OK in the dialog windows), seems to reduce sizes of all images in document to fit the view size! So pageless is a solution for viewing all images in the Google Docs document on computer! I have shared a pageless version of above "Blog Feed Book" here: https://docs.google.com/document/d/11clW4sR7i7vZQ6tg_WBZpsFn4D24nqvB6qfeh_-VRBo/edit?usp=drive_link .

Another point is that in my case, as these docs, as of now, mainly serve as a human-readable backup of the blog (as against XML export), the image issue is manageable. I can leave the images as is, and when I need to retrieve the data for a post, I can manually resize only images of that post that do not fit the page. If somebody does use these docs as a book to read then that person will have to put in the effort to resize those images which do not fit the page and which he/she wants to see in full (or he/she can simply visit the blog post link provided along with the content for that post in the book).
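
For readers who want to shrink oversized images by script rather than manually, the following is a minimal sketch for a Google Docs document. It is not part of the shared project code: fitWithin and shrinkImagesToPage are hypothetical names, only inline images in the document body are handled, and the sketch assumes that the page, margin and image sizes reported by the Document service are in comparable units. I have not tried it on the blog books above.

```javascript
// Scales (width, height) down to fit within (maxWidth, maxHeight), keeping the
// aspect ratio. Sizes that already fit are returned unchanged.
function fitWithin(width, height, maxWidth, maxHeight) {
  const scale = Math.min(1, maxWidth / width, maxHeight / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Apps Script part: shrinks every inline image in the body of a Google Docs
// document so that it fits inside the page area left after the margins.
function shrinkImagesToPage(docId) {
  const body = DocumentApp.openById(docId).getBody();
  const maxWidth = body.getPageWidth() - body.getMarginLeft() - body.getMarginRight();
  const maxHeight = body.getPageHeight() - body.getMarginTop() - body.getMarginBottom();
  body.getImages().forEach(function (image) {
    const size = fitWithin(image.getWidth(), image.getHeight(), maxWidth, maxHeight);
    image.setWidth(size.width);
    image.setHeight(size.height);
  });
}
```

The size calculation is kept in a separate pure function so that it can be checked without opening a document.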

Run2
Run-Driver.gs function makeRavisiyerWPBlogBook() was run from Script Editor.
Extract from Run2's RunInfo-ExecLog.txt file,   https://drive.google.com/file/d/1M0BrLuxx4f67LxxA2IPBWM0-XOgrkMUM/view?usp=drive_link , is given below (slightly modified): 

Run after Run1 with same WP settings (max 150 syndication posts in feed).
Some small changes made in Run-Driver.gs after Run1.
The function invoked - makeRavisiyerWPBlogBook() - provides blog url and book title. So the only key difference in the output seems to be the title of the book which is "ravisiyer.wordpress.com Blog Feed Book", https://docs.google.com/document/d/1w_yMoqmk8hLdupAPlu4s5u2OOU_FZITf82BHHMIwi_M/edit?usp=drive_link . Quick look indicated that output book is as expected, 10 MB, 413 pages in Google Docs.
--- end extract from RunInfo-ExecLog.txt ---

Run3
Run-Driver.gs function makeBFBForOneYear() was run from Script Editor.
Extract from Run3's RunInfo-ExecLog.txt file,  https://drive.google.com/file/d/1yRcN9UlU6NVDMDeAmkWT1-uMEZmbq9LC/view?usp=drive_link , is given below (slightly modified): 

Run after Run2 with same WP settings (max 150 syndication posts in feed).
Quick look indicated that output book - "ravisiyer.wordpress.com - Year 2022 Blog Feed Book",  https://docs.google.com/document/d/1nmS3eCyfWYB9iR5jto4b1cKfxoCxhoJq84SGL5JPGnk/edit?usp=drive_link - is as expected, 3.5 MB, 37 pages in Google Docs.

--- end extract from RunInfo-ExecLog.txt ---

Run4
Run-Driver.gs function makeBFBYearWise() was run from Script Editor.
Extract from Run4's RunInfo-ExecLog.txt file,   https://drive.google.com/file/d/1zSrCG9jIo4iLIg1dr8q3eRlFdKcyCwL-/view?usp=drive_link , is given below (slightly modified): 

Run after Run3 with same WP settings (max 150 syndication posts in feed).
Quick look at few of the books, indicates that output books from "ravisiyer.wordpress.com - Year 2023 Blog Feed Book" to "ravisiyer.wordpress.com - Year 2011 Blog Feed Book" - are as expected [Folder share link: https://drive.google.com/drive/folders/18IoviS0UrrTwNKA9-8Njc3_1NV8Se9Af?usp=drive_link ]. Some of the books do not have posts and so are of only one page - e.g. "ravisiyer.wordpress.com - Year 2020 Blog Feed Book" as the corresponding year did not have any published posts. I think that's fine - a missing 2020 book may raise some doubts. A 1 page book showing no posts for the year may assure user that there are no posts for that year (which he/she, of course, can check in the blog itself).

--- end extract from RunInfo-ExecLog.txt ---

The above runs have been successful, with some issues such as images being too big for the page in some cases. I am willing to live with those issues as manual resizing is possible. So I think this 20230710-WordPressFeedToBook version can be viewed as a stable version.

20230709-XMLFeed-BlogBooks version


With 20230709-FullBlogFeed version, it seems that I have a stable version to convert to a release version but it has the manual steps of capturing XML feed and then converting it to JSON using an online converter. I felt it appropriate to invest some time to explore if I could programmatically do that work. Capturing the XML feed was a simple HTTP GET operation to be done with the appropriate URL, and so was no problem. The challenge was in finding code that would then convert the XML feed into JSON feed. Of course, there was no question of me attempting to write that sort of code at all.


I first tried out https://goessner.net/download/prj/jsonxml/ code. I copy-pasted the appropriate code (what I thought was the appropriate code, rather) into Code.gs of a new project (and folder): XMLToJSON, https://drive.google.com/drive/folders/1Fc6eTQUceczpKc6n6s56y2t4UCBz3ATW?usp=drive_link with the Code.gs link being: https://drive.google.com/file/d/1Bcx49DyVy6zavGWBqodPPDesXgTFrP0D/view?usp=drive_link .

I wrote simple run driver function code which is now in file: DriverOldv1.gs , https://drive.google.com/file/d/1bmmav7ofc8YJGdWkwyjmjvz1Ubcxi8dt/view?usp=drive_link . testWPFeedOld() in the file was the run driver function which got the WordPress blog XML feed (10 posts default) and then called xml2json() function of Code.gs (which was copy-pasted from https://goessner.net/download/prj/jsonxml/ ). The invocation of testWPFeedOld() from Script Editor failed with the following messages in the execution log:
--- start error message in Execution Log ---
Error
TypeError: e.normalize is not a function
removeWhite @ Code.gs:137
xml2json @ Code.gs:160
testWPFeedOld @ DriverOldv1.gs:27
--- end error message in Execution Log ---

I could not figure out the error. I tried a variation which also gave the same error. Then I gave up on this approach, at least for the time being.

Next I looked at https://github.com/abdolence/x2js . But I felt it was quite complex to read and understand, and then try out. So I did not try that out.

Meanwhile I came across another solution using Google Apps Script: Convert XML to JSON with Apps Script, https://www.labnol.org/code/19952-convert-xml-to-json . That was a fascinating possibility!

I copy-pasted the code into a new file called XMLtoJSON.gs, https://drive.google.com/file/d/1l86MyScsWHgy1bfmFNtBn93cknDVJOYF/view?usp=drive_link .

I copied out the old Driver.gs as DriverOldv1.gs (and renamed testWPFeed() in it to testWPFeedOld()). In Driver.gs, testWPFeed() called XML_to_JSON() of XMLtoJSON.gs, passing on the XML feed from the WordPress blog and getting back the jsonFeed (as JSON object but I had not noted that clearly at first).

Executing testWPFeed() of Driver.gs from Script Editor did not encounter any error! The execution log, Run-Info-Exec-Log.txt, https://drive.google.com/file/d/1T9FfD2opdJjhdiC1rjCain7Btk6O4UJy/view?usp=drive_link seemed to indicate a successful conversion of XML to JSON!
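
To give a feel for what such a conversion involves, here is a minimal sketch of fetching a feed and turning its XML into a plain JavaScript object using the Apps Script UrlFetchApp and XmlService services. It is not the code from the labnol.org page and not the code in XMLtoJSON.gs: elementToObject and fetchFeedAsObject are hypothetical names, and the '@' prefix for attributes and the '#text' key are conventions chosen only for this sketch.

```javascript
// Converts an XmlService Element (or any object exposing the same
// getName/getText/getAttributes/getChildren methods) into a plain value.
function elementToObject(element) {
  const result = {};
  element.getAttributes().forEach(function (attribute) {
    result['@' + attribute.getName()] = attribute.getValue();
  });
  const children = element.getChildren();
  if (children.length === 0 && Object.keys(result).length === 0) {
    return element.getText(); // leaf element without attributes: just its text
  }
  children.forEach(function (child) {
    const key = child.getName(); // local name, without the namespace prefix
    const value = elementToObject(child);
    if (Object.prototype.hasOwnProperty.call(result, key)) {
      // Repeated element (e.g. several item entries): collect into an array.
      if (!Array.isArray(result[key])) result[key] = [result[key]];
      result[key].push(value);
    } else {
      result[key] = value;
    }
  });
  if (children.length === 0) result['#text'] = element.getText();
  return result;
}

// Apps Script part: fetch the feed with an HTTP GET and convert it.
function fetchFeedAsObject(feedUrl) {
  const xml = UrlFetchApp.fetch(feedUrl).getContentText();
  const root = XmlService.parse(xml).getRootElement();
  const feed = {};
  feed[root.getName()] = elementToObject(root);
  return feed;
}
```

Note that with this approach a feed having only one post gives a single object rather than an array for that post, so code consuming the result has to handle both cases.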

Now it was time to check out whether the returned JSON object worked with the Blogger blog book maker code with any required small modifications. For that I created a new project (and folder) - WPBlogBooksMaker, https://drive.google.com/drive/folders/1a8KLQnPbObYG3pLEAxaGeMDtlZHvkBwO?usp=drive_link . It had the same code in XMLtoJSON.gs, https://drive.google.com/file/d/1cw8-Tv9IUGhDRXbmmIiwp0iCb4PCrHCy/view?usp=drive_link , as in above project. I moved the testWPFeed() function to a new file: TestXMLToJSON.gs, https://drive.google.com/file/d/1KIlCBm5Nd0BJUSLUDj773L6qy0QqTLkn/view?usp=drive_link .

Its Code.gs file, https://drive.google.com/file/d/1OCszQYZKPdDL4s-ldH_bewn-hFjCqAu0/view?usp=drive_link , had the new function: makeWordPressBlogBooks() which read the XML feed of a WordPress blog (latest 10 posts), converted it to JSON using XML_to_JSON() of XMLtoJSON.gs, processed the JSON object to extract post information into an HTML string, and finally wrote out the HTML string to a Google Docs document (JSON object processing and later part is similar to BlogBooksMaker project for Blogger blogs).
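
The step of turning post information into an HTML string can be sketched as below. This is an illustration only and not the code in Code.gs: buildBookHtml and escapeHtml are hypothetical names, and the input is assumed to be an already simplified list of posts with title, link, pubDate and contentHtml fields (the real JSON object has a different, more nested structure). Writing the string out as a Google Docs document is left out here.

```javascript
// Escapes text for safe inclusion in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const END_OF_POST_LINE =
  '===========================End of Post============================';

// Builds one HTML string for the whole book from a list of posts.
// ASSUMPTION: each post is {title, link, pubDate, contentHtml}.
function buildBookHtml(bookTitle, posts) {
  const parts = ['<h1>' + escapeHtml(bookTitle) + '</h1>'];
  posts.forEach(function (post) {
    parts.push('<h2>' + escapeHtml(post.title) + ' (' + escapeHtml(post.pubDate) + ')</h2>');
    parts.push('<p><a href="' + escapeHtml(post.link) + '">' + escapeHtml(post.link) + '</a></p>');
    parts.push(post.contentHtml); // the post body is already HTML
    parts.push('<p>' + END_OF_POST_LINE + '</p>');
  });
  return '<html><body>' + parts.join('\n') + '</body></html>';
}
```

Putting each post title in a heading element is what later allows a Table of Contents to be generated from the headings.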

The execution/run of makeWordPressBlogBooks() seemed to be largely successful with the run info. and execution log, Run-Info-Exec-Log.txt shared here: https://drive.google.com/file/d/1mwh5jgIk5TCRWcwFipoYbs25-5THqyMD/view?usp=drive_link . Small issues like large image size resulting in cropped image sometimes, and some text formatting issues need to be checked out.

The output file: Test WP Blog Book, https://docs.google.com/document/d/1vKsl54hCk1EFszc1jCKZUag6B2tsqDiFBqtBlvRrwMI/edit?usp=drive_link , is 4 MB in size and has 117 pages. Its contents seem to be as expected.

20230709-FullBlogFeed version

Google Drive share folder: https://drive.google.com/drive/folders/1QUGtQqDyIvy5QkDPwvu3bQTDNI0ma1l8?usp=drive_link

In this version and run, I decided to use the main function JSONTextFiletoBlogBook() used in the 20230708-Succ-R1 version and make the necessary modifications to it.

I temporarily set WordPress Admin->Settings->syndication feeds to 150. Note total published posts in the ravisiyer.wordpress.com blog are 113. Using https://ravisiyer.wordpress.com/feed/ in browser I seem to have got the full blog posts feed (not pages). Copy-pasted this content into file: Allpostsfeed.xml of size 1,161 KB, https://drive.google.com/file/d/1ns40_elYaPXtZH9Z_kWLoniWG4hLa-qi/view?usp=drive_link .

Code.gs of this version: https://drive.google.com/file/d/1WMXMAW6WCqqQ_XJvjBVuukhO-WN1uMJ_/view?usp=drive_link

Modified code of function JSONTextFiletoBlogBook() in Code.gs to use Allpostsfeed.json.txt as input file. Also added publication date to title of post in output file. I don't think I made any further significant changes to the JSONTextFiletoBlogBook() function, which was the main function invoked in this run.

The program completed normally. Its execution log, Exec-Log.txt is shared here: https://drive.google.com/file/d/1ZiTjCawaPLaMrUmS7WENouCwPm89yWea/view?usp=drive_link .

The run output file is: Test WP Blog Book, 10.1 MB, 399 pages, https://docs.google.com/document/d/1qGoszpuYBj8TLy_OsGMzP1-nQ7k_dowTdGINB0uLGjA/edit?usp=drive_link .

The issues in this output file are:
a) Images seem to be printed in large size and so some images get cropped.
b) Formatting is much better than in ExportData test case but still some post titles are not printed in large and bold font.
Otherwise it seems OK. I think it has got all the posts of the blog (but not the pages).

20230708-09-ExportData-Test version

Google Drive share folder: https://drive.google.com/drive/folders/1Sst4vbhuxO44arT4lxkad4fGSGs3UEin?usp=drive_link

I wondered whether the XML export of files of my WordPress blog could be used instead of just the blog feed, in the procedure used in 20230708-Succ-R1 version described below, and so tried out the same. I followed the same procedure except that I used my March 2023 full blog export file: raviiyerorg.wordpress.2023-03-27.000.xml, https://drive.google.com/file/d/1qKtn3_3lC732KLiW18Qm2w-Pv34ZZ4in/view?usp=drive_link for conversion to JSON: raviiyerorg.wordpress.2023-03-27.000.json.txt, https://drive.google.com/file/d/1IQNh8DIJBEaJ_wVVIl1SFOxE7AlXOVlg/view?usp=drive_link , in step 2 of the procedure.

For the test run, in Code.gs for this version: https://drive.google.com/file/d/1PZzGsJB3NSDP-H6VWAuld0jfRO8DY9p7/view?usp=drive_link , I stopped the writing of posts after around 200 posts and changed the name of the input file to raviiyerorg.wordpress.2023-03-27.000.json.txt. I added a new top-level function ExportJSONTextFiletoBlogBook() which was copy-pasted from the earlier main function JSONTextFiletoBlogBook() and then modified for this purpose. The JSON structure was slightly different from the earlier case and so I added/modified code to handle that. I also added code to check whether the entry is a post or a page, in which case I processed it, and so skipped attachment entries (the initial part of the export file had many attachment entries). I don't think I made any other significant changes to the code from the earlier 20230708-Succ-R1 version.
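
The post/page check described above can be sketched as follows. This is an illustration and not the code in Code.gs: selectPostsAndPages is a hypothetical name, and it assumes that each converted export entry holds its WordPress post type ('post', 'page', 'attachment' and so on) as a plain string under a key such as 'wp:post_type'. The key name and value shape depend on the XML to JSON converter used.

```javascript
// Returns up to maxEntries entries that are posts or pages, skipping
// attachments and any other entry types.
// ASSUMPTION: the post type is a plain string stored under typeKey.
function selectPostsAndPages(entries, maxEntries, typeKey) {
  const key = typeKey || 'wp:post_type';
  const selected = [];
  for (let i = 0; i < entries.length && selected.length < maxEntries; i++) {
    const type = entries[i][key];
    if (type === 'post' || type === 'page') {
      selected.push(entries[i]);
    }
  }
  return selected;
}
```

For example, selectPostsAndPages(entries, 200) would correspond to a test run that stops after around 200 posts and pages.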

The run seemed to get the around 200 posts and pages text and images but messed up the formatting in a big way! I dropped further work into this as fixing the format messing up part may have sucked too much time. Here are the share links for the output document: Test WP Blog Book, https://docs.google.com/document/d/1BDjkfUw1cnl72KkN_vd6s1lVu_dvTZqrwi0aryTw60c/edit?usp=drive_link and the execution log: Exec-Log.txt, https://drive.google.com/file/d/1kRiT_6wjGrXsyfWID2New2r-jQUHzWfm/view?usp=drive_link .

20230708-Succ-R1 version

Google Drive share folder:  https://drive.google.com/drive/folders/1SzMbncxDXGXz6GCKFUxedpBtX808RbQ6?usp=drive_link

This is the first version and used this procedure, if I recall correctly:

1) Used URL: https://ravisiyer.wordpress.com/feed/ in Chrome browser to get the default blog feed of last 10 posts in XML.

2) Copy-pasted this XML from Chrome browser (or from a text file into which I had saved it), into https://codebeautify.org/xmltojson and copy-pasted the converted JSON text/content from the output textbox into a text file (testxmltojson.txt, https://drive.google.com/file/d/111wdgo7zJVAaPrAYz8iurGEi9eWYhBLX/view?usp=drive_link ). I uploaded this file into my Google Drive (root folder, if I recall correctly).

3) Explored using Google Apps Script code (Code.gs) to read the file and then provide this JSON feed to code of previous Blogger Blog Books maker project(s) to create a Blog Book from this JSON feed. As the JSON structure was slightly different from that of the Blogger Blog Books maker project(s), the code had to be modified suitably. This code work involved multiple steps as I had to do a fair amount of learning, and that is described later on in this section. But eventually the final version of this version code (Code.gs, https://drive.google.com/file/d/1G_MFVGtir5N8_51V48iS_ntgPuICVGx1/view?usp=drive_link ) worked and produced a blog book having all the latest 10 posts of the specified WordPress blog - "Test WP Blog Book", https://docs.google.com/document/d/1Af5j2MFWSHzlxjxGs0JwNij3sermF1KS32AxsBLAgMo/edit?usp=drive_link . The Execution log for the run, ExecLog.txt, is shared here: https://drive.google.com/file/d/1V8HidHQNVnyPvSNjAa2JTMa7kACGiBP6/view?usp=drive_link .

More info. about functions in Code.gs 

  • JSONTextFiletoBlogBook(): The main function of this file. Read the JSON feed in text file in Google Drive and pass it on to code that processes it and writes out the posts in the feed into a Google Docs document. It seemed to work and wrote out both text and images.
  • readTextFile(): Read a text file specified by its filename in Google Drive and write out its contents to log file. This worked and so now I could read the JSON feed in a text file in code and pass it on to other code to process and write out the posts in a document. I later commented out this code.
  • test() - 2 versions: First version of test had only the related code of LoadData() used in a way to list filenames of all files in my Google Drive. I tried it out and it listed the files in my Drive but it was not what I wanted. However, it ran successfully and that was good to see. I later commented out the first version of test() function code. I modified it to print only files of a particular name in my Google Drive. That modification worked but I later commented it out.
  • LoadData(): As I was browsing for Google Apps Script code to read a text file from Google Drive, I came across this function. There are two copies of the same function code in the file. I think I had presumed that I may make some changes to the code and so made an extra copy of the original code for easy reference. But then I decided to copy-paste relevant code from this into test function and try it out there.
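
Reading a JSON text file from Google Drive by its filename, as described in the list above, can be sketched with the Apps Script DriveApp service as below. This is a minimal illustration and not the readTextFile() or JSONTextFiletoBlogBook() code in Code.gs: readJsonFileFromDrive is a hypothetical name, and the sketch simply takes the first file found with the given name.

```javascript
// Reads the first Google Drive file with the given name and parses its
// contents as JSON. Throws an error if no file with that name is found.
// Note: Google Drive allows several files with the same name; this sketch
// takes whichever one the iterator returns first.
function readJsonFileFromDrive(fileName) {
  const files = DriveApp.getFilesByName(fileName);
  if (!files.hasNext()) {
    throw new Error('File not found in Google Drive: ' + fileName);
  }
  const text = files.next().getBlob().getDataAsString();
  return JSON.parse(text);
}
```

A call such as readJsonFileFromDrive('testxmltojson.txt') would then return the JSON feed as an object that can be passed on to the code that writes out the posts.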
