Python and Google Docs to Build Books
Mon May 15 2017
Using Python to combine multiple Google docs into one cohesive whole that can be published as a book.
What I really wanted to do is combine the chapter breaking of Scrivener with the simplicity and collaboration of Google Docs. Preferably, I would put the book chapters into Google Docs as individual files and then send invites to my editor, wife, and my beta readers. By using Google Docs I could ensure anyone could access the work without having to create a new account and learn an unfamiliar system.
Unfortunately, at this time Google Docs has no way to combine multiple Google Docs contained in one directory into one large document for publication. To use Google Docs the way I want involves manually copying and pasting content from dozens of files into one master document any time I want to update a work. With even 5 to 10 documents this is time consuming and error prone (for me) to the point of being unusable. This is a problem because my fiction books have anywhere from 30 to 50 chapters.
Fortunately for me, I know how to code. By using the Python programming language, I can automate the process of combining the Google Docs into one master file which can be converted to epub, mobi (Kindle), or PDF.
First, I download all the files in the book's Google Docs directory.
This generates and downloads a zip file called something like drive-download-20170505T230011Z-001.zip. I use unzip to open it:
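The same extraction step can also be scripted. This is a minimal sketch using Python's standard `zipfile` module; the function name `extract_download` and the destination folder are illustrative assumptions, not part of my original workflow:

```python
import zipfile
from pathlib import Path


def extract_download(zip_path, dest_dir):
    """Extract a Google Drive download archive into dest_dir.

    Returns the sorted list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

Calling `extract_download("drive-download-20170505T230011Z-001.zip", ".")` would unpack the archive into the current directory, much like running unzip by hand.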
Inside the new the-darkest-autumn folder are a bunch of MS Word-formatted files named identically to what's stored on Google Docs:
This is what it looks like when I run the code:
And now I've got a Word document in the same directory called the-darkest-autumn.docx.
And now I can check out my results by using Calibre's book viewer:
As python-docx doesn't handle HTTP links at this time, I manually add them to the book using Calibre's epub editor. I add links to:
For me it works wonders for my productivity. By following a "chapters as files" pattern within Google Docs I get solid collaboration power plus some (but not all) of the features of Scrivener. I can quickly regenerate the book at any time without having to struggle with Scrivener or add tools like Vellum to the process.
I have a secondary script that fixes quoting and tab issues, written before I realized Calibre could have done that for me.
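That secondary script isn't reproduced here. As a rough guess at what such cleanup might involve, this sketch strips tab characters and converts straight quotes to curly ones in a single paragraph of text. The function name `fix_text` and the exact rules are assumptions for illustration, not a record of what the script actually did:

```python
import re


def fix_text(text):
    """Remove tabs and replace straight quotes with curly quotes.

    Assumption: a quote at the start of the text, or right after
    whitespace or an opening bracket, is an opening quote; every
    other quote is treated as a closing quote or an apostrophe.
    """
    text = text.replace("\t", "")
    # Double quotes: opening first, then whatever remains is closing.
    text = re.sub(r'(^|[\s(\[{])"', r"\1“", text)
    text = text.replace('"', "”")
    # Single quotes: an opening quote may also follow an opening double quote.
    text = re.sub(r"(^|[\s(\[{“])'", r"\1‘", text)
    text = text.replace("'", "’")
    return text
```

These heuristics get edge cases wrong (a leading apostrophe as in 'tis becomes an opening quote), which is one more reason to let Calibre handle it.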
I've considered adding the OAuth components necessary to allow for automated downloading. The problem is I loathe working with OAuth. Therefore I'm sticking with the manual download process.
When I started my latest fiction book, The Darkest Autumn, I wrote out the chapters as individual files. I did it in a text editor (Sublime) and saved the files to a git repo. The names of the files determined their order, chapters being named in this pattern:
As the book developed I thought about moving it to Scrivener. If you don't know, Scrivener is an excellent tool for writing. It encourages you to break up your work into chapters and scenes. The downside is that Scrivener is complex (I want to write, not figure out software) and makes collaboration difficult. The latter issue is a very serious problem, as I like to have others review and comment on my writing as I go.
Now it's time to bring in the code. By leveraging the python-docx library, I combine all the Word files into one large Word file using this Python (3.6 or higher) script:
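The original script isn't reproduced here. What follows is a minimal sketch of the approach using python-docx, under stated assumptions: chapters are ordered by file name, each chapter's file name becomes its heading, and only bold and italic formatting is carried over. The function names `chapter_files` and `combine` are illustrative.

```python
from pathlib import Path


def chapter_files(directory, exclude=None):
    """Return the .docx files in directory sorted by name, skipping `exclude`."""
    skip = Path(exclude).resolve() if exclude else None
    files = [p for p in Path(directory).glob("*.docx") if p.resolve() != skip]
    return sorted(files, key=lambda p: p.name)


def combine(directory, output_path):
    """Append every chapter's paragraphs to one new document and save it."""
    # Imported here so the file-ordering helper works without python-docx.
    from docx import Document  # third-party: pip install python-docx

    book = Document()
    # Skip the output file so reruns don't fold the old book into the new one.
    for path in chapter_files(directory, exclude=output_path):
        chapter = Document(str(path))
        # Assumption: the file name (minus extension) is the chapter title.
        book.add_heading(path.stem, level=1)
        for para in chapter.paragraphs:
            new_para = book.add_paragraph()
            for run in para.runs:
                # Copy text run by run so bold and italic survive.
                new_run = new_para.add_run(run.text)
                new_run.bold = run.bold
                new_run.italic = run.italic
        book.add_page_break()
    book.save(str(output_path))
```

Something like `combine("the-darkest-autumn", "the-darkest-autumn/the-darkest-autumn.docx")` would write the combined file. Because the sort is plain string ordering, chapter numbers need zero padding (`02` before `10`) to come out in the right order.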
While Kindle Direct Publishing (KDP) will accept .docx files, I like to convert the file to .epub using Calibre:
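The exact command isn't reproduced here. One way to run the conversion from Python is to call Calibre's `ebook-convert` command-line tool, which takes an input path and an output path. This is a sketch that assumes Calibre is installed and `ebook-convert` is on the PATH; the helper names are illustrative:

```python
import shutil
import subprocess


def build_command(docx_path, epub_path):
    """Build the ebook-convert argument list: input file, then output file."""
    return ["ebook-convert", str(docx_path), str(epub_path)]


def convert_to_epub(docx_path, epub_path):
    """Convert a .docx file to .epub with Calibre's ebook-convert."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_command(docx_path, epub_path), check=True)
```

`convert_to_epub("the-darkest-autumn.docx", "the-darkest-autumn.epub")` would produce the epub next to the Word file.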
My personal author site at
The book's
The book's upcoming sequel, .
The book I started this project for, The Darkest Autumn, is available now. Check it out and let me know what you think of what the script generates. Or, if you want to support my writing (both fiction and non-fiction), leave an honest review.
Right now this snippet of code generates something that looks okay, but could be improved. I plan to enhance it with better fonts and chapter headers, the goal to generate something as nice as what generates.
For about a week I thought about leveraging it and my skills to build it as a paid subscription service and rake in the passive income. Basically turn it into a startup. After some reflection I backed off because if Google added file combination as a feature, it would destroy the business overnight.
I've also decided not to package this up on Github/PyPI. While makes it trivial for me to do this kind of thing, I'm not interested in maintaining yet another open source project. However, if someone does package it up and credits me for my work, I'm happy to link to it from this blog post.