Google Apps Script to Create WordPress Blog Book (or Book parts): Test Versions and Stable Version
Last updated on 10 Sep. 2023
10 Sep. 2023 Update: I rarely use the software covered in this post now. To see the current software that I use for creating WordPress blogbooks, please visit: Generated blogbook of my Misc. Tech. WordPress blog using my VBA program WPBlogExportFileToBook, https://ravisiyermisc.blogspot.com/2023/08/generated-blogbook-of-my-misc-tech.html .
end-Update 10 Sep. 2023
I decided to put up and publish this post now on 9 Jul 2023 while this Google Apps Script work is in progress, as I wanted to ensure that whatever work I have done till now is available for interested readers.
Top-Level Google Drive publicly shared directory for code and data for this work: BBMakerForWordPress, https://drive.google.com/drive/folders/1g3UPj7rhmjDNMlQoH1dfg8qOgj52V58b?usp=drive_link .
This work builds on my previous project(s) covered in the following posts:
- BlogBooksMaker Google Apps Script to Create Blogger Blog Book (or Book parts): Description and Stable Version Info, https://ravisiyermisc.blogspot.com/2023/07/blogbooksmaker-google-apps-script-to.html
- Google Apps Script to Create Blogger Blog Book (or Book parts): Test Versions and Stable Version, https://ravisiyermisc.blogspot.com/2023/06/google-apps-script-to-create-blogger.html .
The above project(s) were for Blogger blogs, which provide the blog feed as JSON if the appropriate parameter is passed in the request. The WordPress blog feed is in XML by default. To get it in JSON one needs to use a plugin, but plugins need a WordPress business plan. That was the stumbling block for me in using the above mentioned project(s) code to make a blog book or blog book parts from one of my WordPress blogs. (I do not update my other WordPress blogs, except for very minor stuff, and so the old blog books for those blogs are enough for my current needs).
So I explored ways to convert the WordPress blog XML feed to JSON and then use code similar to BlogBooksMaker to make the Blog Books from the JSON feed. The sections below cover those efforts in reverse chronological order.
First some general info. that is common to the sections below.
How to get feed in WordPress?
The following URL provides the default blog feed for my WordPress blog: https://ravisiyer.wordpress.com/feed/ .
It returns the latest x number of posts in XML format. Note that it does not return blog pages. x is controlled by WordPress Admin->Settings->Reading->(RSS feed settings) Syndication feeds. It is 10 by default but I could increase it to 150 in which case it returned all 113 published posts of my above blog.
I could not find reliable info. on whether query parameters are supported for the above URL.
But posts for a particular year or month can be obtained by URLs like the following:
- https://ravisiyer.wordpress.com/2023/feed/
- https://ravisiyer.wordpress.com/2022/feed/
- https://ravisiyer.wordpress.com/2019/11/feed/
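The URL patterns above can be generated in code. The following is a minimal sketch, not taken from the project's Code.gs; the function name buildFeedUrl and its parameters are hypothetical, and it assumes the blog uses the default date-based permalink layout shown in the example URLs.

```javascript
// Hypothetical helper (not from Code.gs): builds a WordPress feed URL for
// the whole blog, for one year, or for one month of a year.
// Assumes the /YYYY/feed/ and /YYYY/MM/feed/ layout shown in the URLs above.
function buildFeedUrl(blogBaseUrl, year, month) {
  // Drop any trailing slashes so that parts can be joined uniformly.
  var base = String(blogBaseUrl).replace(/\/+$/, "");
  var parts = [base];
  if (year !== undefined && year !== null) {
    parts.push(String(year));
    if (month !== undefined && month !== null) {
      // Months are written with two digits, e.g. 3 becomes "03".
      parts.push(String(month).padStart(2, "0"));
    }
  }
  parts.push("feed");
  return parts.join("/") + "/";
}
```

For example, buildFeedUrl("https://ravisiyer.wordpress.com", 2019, 11) gives the third URL in the list above, and omitting the year and month gives the default feed URL.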
How to convert the XML feed to JSON and then write the converted JSON feed to a text file?
I used Chrome browser to get the XML feed and then copy-pasted its contents to the appropriate input textbox in this online xml to json converter: https://codebeautify.org/xmltojson . Then I copy-pasted the converted JSON from the output textbox into a text file. [To have the XML feed as a text file, I used Chrome browser to get the XML feed and then copy-pasted its contents to a text file.]
Online viewing of JSON structure
To understand the JSON structure of the converted JSON feed, I used https://jsonviewer.stack.hu/ . I had to copy-paste the JSON feed into the appropriate input tab, and then see its structure through the viewer tab. This helped me in modifying the code of the Blogger blog maker project(s) (like BlogBooksMaker) to work properly with the WordPress blog XML feed converted to JSON feed.
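Once the structure is known, the posts can be pulled out of the parsed JSON. The sketch below is only an illustration: it assumes the converted JSON mirrors the RSS layout (rss, then channel, then item), which should be checked in the JSON viewer since key names depend on the converter used. The function name getFeedItems is hypothetical.

```javascript
// Hypothetical helper: returns the list of post entries from a converted feed.
// ASSUMPTION: the converter keeps the RSS nesting rss -> channel -> item.
// Verify the actual key names in the JSON viewer before relying on this.
function getFeedItems(feedJson) {
  var channel = feedJson && feedJson.rss && feedJson.rss.channel;
  if (!channel || channel.item === undefined) {
    return [];
  }
  // XML to JSON converters commonly emit a single object (not an array)
  // when there is only one <item>, so normalise to an array.
  return Array.isArray(channel.item) ? channel.item : [channel.item];
}
```

Normalising to an array means the rest of the code can loop over posts in the same way whether the feed has one post or many.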
Converting XML to JSON through code
On searching for JavaScript code for the above, I got two links from this Stack Overflow question:
https://stackoverflow.com/questions/1773550/convert-xml-to-json-and-back-using-javascript
I tried out the first one but ran into some issues which will be covered in the appropriate section below.
Then I came across a Google Apps Script solution here: Convert XML to JSON with Apps Script, https://www.labnol.org/code/19952-convert-xml-to-json which I tried out and which seems to work! Those details will be covered in the appropriate section below.
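To give a rough idea of what such a conversion involves, here is a simplified sketch. It is not the code from the labnol.org article or from the Stack Overflow answers. The function name elementToJson is hypothetical, and it assumes element objects that offer getName(), getText() and getChildren(), as the elements returned by Apps Script's XmlService do. Attributes are ignored in this sketch.

```javascript
// Simplified sketch of an XML-to-JSON walk (hypothetical; attributes ignored).
// ASSUMPTION: each element offers getName(), getText() and getChildren(),
// as XmlService elements do. In Apps Script it might be called like this:
//   var root = XmlService.parse(xmlText).getRootElement();
//   var json = elementToJson(root);
function elementToJson(element) {
  var children = element.getChildren();
  if (children.length === 0) {
    // Leaf element: its JSON value is simply its text.
    return element.getText();
  }
  var result = {};
  children.forEach(function (child) {
    var name = child.getName();
    var value = elementToJson(child);
    if (Object.prototype.hasOwnProperty.call(result, name)) {
      // Repeated child names (like several <item> elements) become an array.
      if (!Array.isArray(result[name])) {
        result[name] = [result[name]];
      }
      result[name].push(value);
    } else {
      result[name] = value;
    }
  });
  return result;
}
```

Note that getName() in XmlService gives the local name without the namespace prefix, so child elements that differ only in prefix would be merged under one key in this sketch.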
Test Versions and Stable Version in Reverse Chronological Order
20230710-WordPressFeedToBook version
- Added licensing info. and background info. as comments at the top of the file.
- function makeWPBlogFeedBook(blogFeedURL, bookTitle) became the main function doing the blog feed book work. Providing two arguments made the function more general purpose and it was expected to be invoked by other 'run-driver' functions from Run-Driver.gs file of the project.
- Added blog feed title and creation process start date at beginning of output file.
- Added proper code to get published date of post and add that to post title in output book/file.
- Added runType enum and check for normal or test run. Test run would break after some iterations.
- Added end of book line in book.
- Added more logging information, like that used in the Blogger BlogBooksMaker project.
- Cleaned up the code and tried to beautify the code a little.
- function makeBFBWithDefaultValues(): Invokes makeWPBlogFeedBook with default values which will make a book of the default blog feed. This function can be run from Script Editor to test makeWPBlogFeedBook function.
- function makeRavisiyerWPBlogBook(): Invokes makeWPBlogFeedBook for ravisiyer.wordpress.com full blog. Requires WordPress Admin->Settings->Reading->Syndication feeds to be set to higher than number of published posts in ravisiyer.wordpress.com.
- function makeBFBForOneYear(): Invokes makeWPBlogFeedBook for a particular blog feed (for a year).
- function makeBFBYearWise(): Invokes makeWPBlogFeedBook for a particular blog feed with year parameter in a loop.
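The run type check and the publication date in the post title, mentioned in the list above, could look roughly like the sketch below. This is not the actual code of makeWPBlogFeedBook; the names RunType, TEST_RUN_MAX_POSTS, formatPostHeading and collectPostHeadings are hypothetical, and the title and pubDate keys are assumed to match the converted feed.

```javascript
// Hypothetical sketch of a run type enum with an early break for test runs.
var RunType = Object.freeze({ NORMAL: "normal", TEST: "test" });
var TEST_RUN_MAX_POSTS = 3; // assumed limit; any small number would do

// Builds a heading like "My post title (2023-07-09)".
// ASSUMPTION: post.pubDate holds an RSS style date string.
function formatPostHeading(post) {
  var published = new Date(post.pubDate);
  if (isNaN(published.getTime())) {
    // Fall back to the raw string if the date cannot be parsed.
    return post.title + " (" + post.pubDate + ")";
  }
  // toISOString() is in UTC; the first 10 characters are YYYY-MM-DD.
  return post.title + " (" + published.toISOString().slice(0, 10) + ")";
}

// Returns the headings that would be written to the book.
function collectPostHeadings(posts, runType) {
  var headings = [];
  for (var i = 0; i < posts.length; i++) {
    if (runType === RunType.TEST && i >= TEST_RUN_MAX_POSTS) {
      break; // a test run stops after a few iterations
    }
    headings.push(formatPostHeading(posts[i]));
  }
  return headings;
}
```

Keeping the limit in one constant makes it easy to switch between a quick test run and a full run without touching the loop.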
20230709-XMLFeed-BlogBooks version
20230709-FullBlogFeed version
Google Drive share folder: https://drive.google.com/drive/folders/1QUGtQqDyIvy5QkDPwvu3bQTDNI0ma1l8?usp=drive_link
In this version and run, I decided to use the main function JSONTextFiletoBlogBook() used in the 20230708-Succ-R1 version and make the necessary modifications to it.
I temporarily set WordPress Admin->Settings->syndication feeds to 150. Note total published posts in the ravisiyer.wordpress.com blog are 113. Using https://ravisiyer.wordpress.com/feed/ in browser I seem to have got the full blog posts feed (not pages). Copy-pasted this content into file: Allpostsfeed.xml of size 1,161 KB, https://drive.google.com/file/d/1ns40_elYaPXtZH9Z_kWLoniWG4hLa-qi/view?usp=drive_link .
Allpostsfeed.json.txt, size 1,146 KB, https://drive.google.com/file/d/1MudzPWoTEs81fbo1Y2hDYBCpDSXv7Tqs/view?usp=drive_link .
Code.gs of this version: https://drive.google.com/file/d/1WMXMAW6WCqqQ_XJvjBVuukhO-WN1uMJ_/view?usp=drive_link
Modified code of function JSONTextFiletoBlogBook() in Code.gs to use Allpostsfeed.json.txt as input file. Also added publication date to title of post in output file. I don't think I made any further significant changes to the JSONTextFiletoBlogBook() function, which was the main function invoked in this run.
The program completed normally. Its execution log, Exec-Log.txt is shared here: https://drive.google.com/file/d/1ZiTjCawaPLaMrUmS7WENouCwPm89yWea/view?usp=drive_link .
The run output file is: Test WP Blog Book, 10.1 MB, 399 pages, https://docs.google.com/document/d/1qGoszpuYBj8TLy_OsGMzP1-nQ7k_dowTdGINB0uLGjA/edit?usp=drive_link .
a) Images seem to be printed in large size and so some images get cropped.
b) Formatting is much better than in ExportData test case but still some post titles are not printed in large and bold font.
Otherwise it seems OK. I think it has got all the posts of the blog (but not the pages).
20230708-09-ExportData-Test version
Google Drive share folder: https://drive.google.com/drive/folders/1Sst4vbhuxO44arT4lxkad4fGSGs3UEin?usp=drive_link
I wondered whether the XML export of files of my WordPress blog could be used instead of just the blog feed, in the procedure used in 20230708-Succ-R1 version described below, and so tried out the same. I followed the same procedure except that I used my March 2023 full blog export file: raviiyerorg.wordpress.2023-03-27.000.xml, https://drive.google.com/file/d/1qKtn3_3lC732KLiW18Qm2w-Pv34ZZ4in/view?usp=drive_link for conversion to JSON: raviiyerorg.wordpress.2023-03-27.000.json.txt, https://drive.google.com/file/d/1IQNh8DIJBEaJ_wVVIl1SFOxE7AlXOVlg/view?usp=drive_link , in step 2 of the procedure.
For the test run, in Code.gs for this version: https://drive.google.com/file/d/1PZzGsJB3NSDP-H6VWAuld0jfRO8DY9p7/view?usp=drive_link , I stopped the writing of posts after around 200 posts and changed the name of the input file to raviiyerorg.wordpress.2023-03-27.000.json.txt. I added a new top-level function ExportJSONTextFiletoBlogBook() which was copy-pasted from the earlier main function JSONTextFiletoBlogBook() and then modified for this purpose. The JSON structure was slightly different from the earlier case and so I added/modified code to handle that. I also added code to check whether the entry is a post or page, in which case I processed it, and so skipped attachment entries (the initial part of the export file had many attachment entries). I don't think I made any other significant changes to the code from the earlier 20230708-Succ-R1 version.
The run seemed to get the text and images of around 200 posts and pages but messed up the formatting in a big way! I dropped further work on this as fixing the messed up formatting may have taken too much time. Here are the share links for the output document: Test WP Blog Book, https://docs.google.com/document/d/1BDjkfUw1cnl72KkN_vd6s1lVu_dvTZqrwi0aryTw60c/edit?usp=drive_link and the execution log: Exec-Log.txt, https://drive.google.com/file/d/1kRiT_6wjGrXsyfWID2New2r-jQUHzWfm/view?usp=drive_link .
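The post or page check could be sketched as below. This is an illustration and not the code in ExportJSONTextFiletoBlogBook(). The function names and the default key name "post_type" are assumptions; the actual key in the converted JSON depends on how the converter handles the wp: namespace prefix of the export file, and some converters may wrap the value in an object rather than give a plain string.

```javascript
// Hypothetical filter: keeps post and page entries, skips attachments etc.
// ASSUMPTION: each entry has its type as a plain string under typeKey.
// The default key name "post_type" is a guess; check the converted JSON.
function isPostOrPage(entry, typeKey) {
  var key = typeKey || "post_type";
  var type = entry[key];
  return type === "post" || type === "page";
}

// Returns only the entries that should go into the book.
function selectBookEntries(entries, typeKey) {
  return entries.filter(function (entry) {
    return isPostOrPage(entry, typeKey);
  });
}
```

Passing the key name as a parameter keeps the check usable if the converter names the field differently, for example "wp:post_type".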
20230708-Succ-R1 version
Google Drive share folder: https://drive.google.com/drive/folders/1SzMbncxDXGXz6GCKFUxedpBtX808RbQ6?usp=drive_link
This is the first version and used this procedure, if I recall correctly:
1) Used URL: https://ravisiyer.wordpress.com/feed/ in Chrome browser to get the default blog feed of last 10 posts in XML.
2) Copy-pasted this XML from Chrome browser (or from a text file into which I had saved it), into https://codebeautify.org/xmltojson and copy-pasted the converted JSON text/content from the output textbox into a text file (testxmltojson.txt, https://drive.google.com/file/d/111wdgo7zJVAaPrAYz8iurGEi9eWYhBLX/view?usp=drive_link ). I uploaded this file into my Google Drive (root folder, if I recall correctly).
3) Explored using Google Apps Script code (Code.gs) to read the file and then provide this JSON feed to code of previous Blogger Blog Books maker project(s) to create a Blog Book from this JSON feed. As the JSON structure was slightly different from that of the Blogger Blog Books maker project(s), the code had to be modified suitably. This code work involved multiple steps as I had to do a fair amount of learning, and that is described later on in this section. But eventually the final version of this version code (Code.gs, https://drive.google.com/file/d/1G_MFVGtir5N8_51V48iS_ntgPuICVGx1/view?usp=drive_link ) worked and produced a blog book having all the latest 10 posts of the specified WordPress blog - "Test WP Blog Book", https://docs.google.com/document/d/1Af5j2MFWSHzlxjxGs0JwNij3sermF1KS32AxsBLAgMo/edit?usp=drive_link . The Execution log for the run, ExecLog.txt, is shared here: https://drive.google.com/file/d/1V8HidHQNVnyPvSNjAa2JTMa7kACGiBP6/view?usp=drive_link .
More info. about functions in Code.gs
- JSONTextFiletoBlogBook(): The main function of this file. Reads the JSON feed from a text file in Google Drive and passes it on to code that processes it and writes out the posts in the feed into a Google Docs document. It seemed to work and wrote out both text and images.
- readTextFile(): Reads a text file specified by its filename in Google Drive and writes out its contents to the log. This worked and so now I could read the JSON feed in a text file in code and pass it on to other code to process and write out the posts in a document. I later commented out this code.
- test() - 2 versions: First version of test had only the related code of LoadData() used in a way to list filenames of all files in my Google Drive. I tried it out and it listed the files in my Drive but it was not what I wanted. However, it ran successfully and that was good to see. I later commented out the first version of test() function code. I modified it to print only files of a particular name in my Google Drive. That modification worked but I later commented it out.
- LoadData(): As I was browsing for Google Apps Script code to read a text file from Google Drive, I came across this function. There are two copies of the same function code in the file. I think I had presumed that I may make some changes to the code and so made an extra copy of the original code for easy reference. But then I decided to copy-paste relevant code from this into test function and try it out there.
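Reading a text file from Google Drive by name, as the functions above do, can be sketched as follows. This is not the code from Code.gs; the function names readTextFileByName and readJsonFeedFile are hypothetical. The optional drive parameter is there only so that the sketch can be tried outside Apps Script with a stand-in object; in Apps Script it falls back to DriveApp.

```javascript
// Hypothetical sketch: reads the first Drive file with the given name as text.
// In Apps Script, omit the second argument so that DriveApp is used.
function readTextFileByName(fileName, drive) {
  var driveService = drive || DriveApp;
  var files = driveService.getFilesByName(fileName);
  if (!files.hasNext()) {
    return null; // no file with that name
  }
  // If several files share the name, only the first one found is read.
  return files.next().getBlob().getDataAsString();
}

// Reads the file and parses its contents as JSON (null if file not found).
function readJsonFeedFile(fileName, drive) {
  var text = readTextFileByName(fileName, drive);
  return text === null ? null : JSON.parse(text);
}
```

In the Script Editor, something like readJsonFeedFile("testxmltojson.txt") would then give the feed as an object, provided a file of that name exists in the Drive of the account running the script.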