Python and Google Docs to Build Books
Mon May 15 2017
Last updated
Mon May 15 2017
Using Python to combine multiple Google docs into one cohesive whole that can be published as a book.
When I started my latest fiction book, The Darkest Autumn, I wrote out the chapters as individual files. I did it in a text editor (Sublime) and saved the files to a git repo. The names of the files determined their order, chapters being named in this pattern:
As the book developed I thought about moving it to Scrivener. If you don't know, Scrivener is an excellent tool for writing. It encourages you to break up your work into chapters and scenes. The downside is that Scrivener is complex (I want to write, not figure out software) and Scrivener isn't designed for simultaneous collaboration. The latter issue is a very serious problem, as I like to have others review and comment on my writing as I go.
What I really wanted was to combine the chapter breaking of Scrivener with the simplicity and collaboration of Google Docs. Preferably, I would put the book chapters into Google Docs as individual files and then send invites to my editor, my wife, and my beta readers. By using Google Docs I could ensure anyone could access the work without having to create a new account and learn an unfamiliar system.
Unfortunately, at this time Google Docs has no way to combine multiple documents contained in one directory into one large document for publication. Using Google Docs the way I want means manually copying and pasting content from dozens of files into one master document any time I want to update a work. With even 5 to 10 documents this is time-consuming and error-prone (for me) to the point of being unusable. This is a problem because my fiction books have anywhere from 30 to 50 chapters.
Fortunately for me, I know how to code. By using the Python programming language, I can automate the process of combining the Google Docs into one master file which can be converted to epub, mobi (Kindle), or PDF.
First, I download all the files in the book's Google Docs directory.
This generates and downloads a zip file called something like drive-download-20170505T230011Z-001.zip. I use unzip to open it:
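For anyone who would rather stay in Python for this step, the extraction can also be done with the standard-library zipfile module. This is only a sketch of an alternative to the unzip command, not the command used above; the function name extract_download and its arguments are made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_download(zip_path, target_dir):
    """Extract a Google Drive download and return the .docx files it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Sorting by path keeps the chapters in file-name order.
    return sorted(target.rglob("*.docx"))
```

Calling it with the downloaded zip file and a folder name such as the-darkest-autumn would leave the Word files in that folder, ready for the combining step.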
Inside the new the-darkest-autumn folder are a bunch of MS Word-formatted files named identically to what's stored on Google Docs:
Now it's time to bring in the code. By leveraging the python-docx library, I combine all the Word files into one large Word file using this Python (3.6 or higher) script:
This is what it looks like when I run the code:
And now I've got a Word document in the same directory called the-darkest-autumn.docx.
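The combining step described above can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes python-docx is installed, that the chapter files sort correctly by name, and that copying plain paragraph text is good enough (inline formatting such as italics is dropped). The function names ordered_chapters and combine_chapters are hypothetical.

```python
from pathlib import Path


def ordered_chapters(folder):
    """Return the .docx files in folder, sorted by file name."""
    return sorted(
        path
        for path in Path(folder).glob("*.docx")
        if not path.name.startswith("~$")  # skip Word's temporary lock files
    )


def combine_chapters(folder, output_path):
    """Copy every chapter's paragraphs, in file-name order, into one document."""
    # Imported here so ordered_chapters() can be used without python-docx.
    from docx import Document

    book = Document()
    chapters = ordered_chapters(folder)
    for number, chapter_path in enumerate(chapters, start=1):
        chapter = Document(str(chapter_path))
        for paragraph in chapter.paragraphs:
            # Assumption: only the text is carried over, not the formatting.
            book.add_paragraph(paragraph.text)
        if number < len(chapters):
            book.add_page_break()  # start the next chapter on a new page
    book.save(str(output_path))
    return len(chapters)
```

With this sketch, combine_chapters("the-darkest-autumn", "the-darkest-autumn.docx") would write the combined file next to the chapter folder and return the number of chapters it found.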
While Kindle Direct Publishing (KDP) will accept .docx files, I like to convert it to .epub using Calibre:
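Calibre ships a command-line tool, ebook-convert, that takes an input file and an output file, so the conversion can also be scripted. The sketch below assumes ebook-convert is on the PATH; the helper names build_convert_command and convert_to_epub are illustrative and not part of Calibre.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, epub_path=None):
    """Build the ebook-convert command line for a .docx to .epub conversion."""
    docx_path = Path(docx_path)
    # Default to the same file name with an .epub extension.
    epub_path = Path(epub_path) if epub_path else docx_path.with_suffix(".epub")
    return ["ebook-convert", str(docx_path), str(epub_path)]


def convert_to_epub(docx_path, epub_path=None):
    """Run Calibre's ebook-convert and return the path of the generated epub."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    command = build_convert_command(docx_path, epub_path)
    subprocess.run(command, check=True)  # raises if the conversion fails
    return Path(command[-1])
```

Keeping the command construction separate from the call makes it easy to print the command and check it before running a conversion.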
And now I can check out my results by using Calibre's book viewer:
As python-docx doesn't handle HTTP links at this time, I manually add them to the book using Calibre's epub editor. I add links to:
My personal author site at danielroygreenfeld.com
The book's review page on Amazon
The book's upcoming sequel, The River Runs Uphill.
For me it works wonders for my productivity. By following a "chapters as files" pattern within Google Docs I get solid collaboration power plus some (but not all) of the features of Scrivener. I can quickly regenerate the book at any time without having to struggle with Scrivener or add tools like Vellum to the process.
I have a secondary script that fixes quoting and tab issues, written before I realized Calibre could have done that for me.
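The secondary script isn't shown, so the following is only a guess at what that kind of cleanup might look like. It assumes "quoting issues" means straight quotes that should be curly and "tab issues" means stray tab characters inside paragraphs. The function name tidy_paragraph is hypothetical.

```python
import re


def tidy_paragraph(text):
    """Strip stray tabs and turn straight quotes into curly quotes."""
    text = text.lstrip("\t")        # drop tabs used as paragraph indents
    text = text.replace("\t", " ")  # any remaining tab becomes a space
    # A double quote at the start or after whitespace/brackets opens a quotation.
    text = re.sub(r'(^|[\s(\[{])"', r"\1“", text)
    text = text.replace('"', "”")   # every other double quote closes one
    # Same rule for single quotes; the rest are apostrophes or closing quotes.
    text = re.sub(r"(^|[\s(\[{“])'", r"\1‘", text)
    text = text.replace("'", "’")
    return text
```

A rule this simple will get some edge cases wrong, such as a leading apostrophe in 'twas, which is one reason to let Calibre handle it instead.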
The book I started this project for, The Darkest Autumn, is available now on Amazon. Check it out and let me know what you think of what the script generates. Or if you want to support my writing (both fiction and non-fiction), buy The Darkest Autumn on Amazon and leave an honest review.
Right now this snippet of code generates something that looks okay, but could be improved. I plan to enhance it with better fonts and chapter headers, with the goal of generating something as nice as what Draft2Digital generates.
I've considered adding the OAuth components necessary to allow for automated downloading. The problem is I loathe working with OAuth. Therefore I'm sticking with the manual download process.
For about a week I thought about using this script and my Django skills to build a paid subscription service and rake in the passive income. Basically, turn it into a startup. After some reflection I backed off because if Google added file combination as a feature, it would destroy the business overnight.
I've also decided not to package this up on GitHub/PyPI. While Cookiecutter makes it trivial for me to do this kind of thing, I'm not interested in maintaining yet another open source project. However, if someone does package it up and credits me for my work, I'm happy to link to it from this blog post.
Reference: https://www.pydanny.com/using-python-and-google-docs-to-build-books.html