How to Automate Text Document Processing with the Microsoft Word API

Most text documents are created in Microsoft Word. Word becomes even more useful when document editing, printing, and exporting to PDF are automated through its API.

In this article I will describe how to automate two of those tasks: printing text documents and converting text documents to PDF.

Here is a basic example, written in Microsoft Visual Basic .NET, that prints a text document:

Dim app = CreateObject("Word.Application")
Dim doc = app.Documents.Open("D:\in\my resume.doc")

app.PrintOut(False) ' False = do not print in the background; wait until Word has finished printing

doc.Close()
app.Quit()

This code looks simple, but printing a whole list of documents raises several additional requirements:

  1. Users need to be able to specify the name of the printer they are going to use.
  2. Word should run invisibly in the background instead of appearing on the screen every time a document is processed.
  3. All informational messages and requests to confirm operations performed on documents need to be switched off.
  4. Processed files should not be added automatically to the “Recent files” list.

An improved version of the code that takes care of these issues looks like this:

Dim app = CreateObject("Word.Application")

app.Visible = False

app.DisplayAlerts = 0 ' wdAlertsNone
app.FeatureInstall = 0 ' msoFeatureInstallNone
app.DisplayRecentFiles = False
app.DisplayDocumentInformationPanel = False
app.AutomationSecurity = 3 ' msoAutomationSecurityForceDisable: macros are disabled

Dim wdOptions = app.Options
wdOptions.WarnBeforeSavingPrintingSendingMarkup = False
wdOptions.SavePropertiesPrompt = False
wdOptions.DoNotPromptForConvert = True
wdOptions.PromptUpdateStyle = False
wdOptions.ConfirmConversions = False

Dim doc = app.Documents.Open("D:\in\my resume.doc")

doc.Application.ActivePrinter = "Xerox Global Print Driver PS"

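app.PrintOut(False) ' False = do not print in the background; wait until Word has finished printing

doc.Saved = True ' mark the document as unchanged so Word does not ask to save it
doc.Close(0) ' 0 = wdDoNotSaveChanges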

app.Quit()

I personally think this code is good enough to pass a university programming test, but commercial use requires dealing with some further issues:

  1. This code won’t work if an incoming file is marked as “Read only”.
  2. Performance can be improved significantly if the program only opens and closes the documents and leaves the Microsoft Word application running, instead of restarting Word for every file.
  3. Some users need an option to print a selected range of pages instead of the whole document, or to print several copies of the same document.
  4. This code won’t work if the program is started from Windows Task Scheduler or from a Windows service.

I won’t describe how to solve the above-mentioned issues in full. It gets very technical very quickly, and some readers may find it a little boring. But let me just mention that I have successfully solved these issues in the following software applications: Print Conductor, FolderMill, and 2Printer.
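
Without going into the full solutions, the first three points can at least be approximated with documented parameters of the Word object model. The following is only a minimal sketch, not the code used in the products above. It assumes that passing True for the ReadOnly argument of Documents.Open is enough to cope with read-only input files, it keeps a single Word instance running for the whole batch, and it uses the Range, Pages and Copies arguments of Document.PrintOut to print pages 1-2 twice. The file list, the page range and the module name are made up for illustration.

```vbnet
Option Strict Off ' late binding on the Word COM object, as in the examples above

Module BatchPrintSketch
    Sub Main()
        ' Hypothetical input list; replace with real paths.
        Dim files As String() = {"D:\in\my resume.doc", "D:\in\cover letter.doc"}

        ' Start Word once and reuse it for every document in the batch.
        Dim app = CreateObject("Word.Application")
        app.Visible = False
        app.DisplayAlerts = 0 ' wdAlertsNone

        Try
            For Each filePath In files
                ' Positional arguments: FileName, ConfirmConversions, ReadOnly, AddToRecentFiles.
                ' ReadOnly = True opens the file without requesting write access.
                Dim doc = app.Documents.Open(filePath, False, True, False)
                Try
                    ' Range:=4 is wdPrintRangeOfPages; Pages selects the pages to print.
                    doc.PrintOut(Background:=False, Range:=4, Pages:="1-2", Copies:=2)
                Finally
                    doc.Close(0) ' 0 = wdDoNotSaveChanges
                End Try
            Next
        Finally
            ' Quit Word only after the whole list has been processed.
            app.Quit()
        End Try
    End Sub
End Module
```

The Try/Finally blocks are there so that a failure on one document does not leave a hidden Word process behind.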

Another useful task is converting a document to PDF, which requires replacing the PrintOut call with ExportAsFixedFormat. After this change the code looks like this:

Dim app = CreateObject("Word.Application")

app.Visible = False

app.DisplayAlerts = 0 ' wdAlertsNone
app.FeatureInstall = 0 ' msoFeatureInstallNone
app.DisplayRecentFiles = False
app.DisplayDocumentInformationPanel = False
app.AutomationSecurity = 3 ' msoAutomationSecurityForceDisable: macros are disabled

Dim wdOptions = app.Options
wdOptions.WarnBeforeSavingPrintingSendingMarkup = False
wdOptions.SavePropertiesPrompt = False
wdOptions.DoNotPromptForConvert = True
wdOptions.PromptUpdateStyle = False
wdOptions.ConfirmConversions = False

Dim doc = app.Documents.Open("D:\in\my resume.doc")

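doc.ExportAsFixedFormat("D:\out\my resume.pdf", 17) ' 17 = wdExportFormatPDF

doc.Saved = True ' mark the document as unchanged so Word does not ask to save it
doc.Close(0) ' 0 = wdDoNotSaveChanges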

app.Quit()

This example requires Microsoft Word 2007 SP2 or a later version.

It’s important to note that the ExportAsFixedFormat function won’t work if no printers are installed on the computer or if none of them has been set as the system default printer.
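
A program can check this precondition before starting Word. The sketch below is one possible way to do so, using the .NET System.Drawing.Printing namespace rather than anything in the Word API. It assumes the project references System.Drawing (or the System.Drawing.Common package on newer .NET versions), and the function name HasUsableDefaultPrinter is made up for illustration.

```vbnet
Imports System
Imports System.Drawing.Printing

Module PrinterCheckSketch
    ' Returns True when at least one printer is installed and the
    ' system default printer resolves to a valid printer.
    Function HasUsableDefaultPrinter() As Boolean
        If PrinterSettings.InstalledPrinters.Count = 0 Then
            Return False
        End If

        ' A new PrinterSettings instance refers to the default printer.
        Dim defaultSettings As New PrinterSettings()
        Return defaultSettings.IsValid
    End Function

    Sub Main()
        If HasUsableDefaultPrinter() Then
            Console.WriteLine("Default printer found.")
        Else
            Console.WriteLine("No default printer found; ExportAsFixedFormat is likely to fail.")
        End If
    End Sub
End Module
```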

The code for exporting documents to PDF can be improved by adding an option to export only part of the document, or to export to the PDF/A variant of the format. Both options are supported by Microsoft’s ExportAsFixedFormat function.
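
Both options map to documented arguments of ExportAsFixedFormat: Range, From and To select the pages, and UseISO19005_1 requests PDF/A output. The following is a minimal sketch under the same assumptions as the examples above; the page range 1 to 2 and the module name are illustrative only, and the prompt-suppressing settings are shortened to keep it brief.

```vbnet
Option Strict Off ' late binding on the Word COM object, as in the examples above

Module ExportPartSketch
    ' Numeric values of the Word enumeration members used below.
    Const wdExportFormatPDF As Integer = 17
    Const wdExportOptimizeForPrint As Integer = 0
    Const wdExportFromTo As Integer = 3
    Const wdDoNotSaveChanges As Integer = 0

    Sub Main()
        Dim app = CreateObject("Word.Application")
        app.Visible = False
        app.DisplayAlerts = 0 ' wdAlertsNone

        Try
            Dim doc = app.Documents.Open("D:\in\my resume.doc")
            Try
                ' Positional arguments: OutputFileName, ExportFormat, OpenAfterExport,
                ' OptimizeFor, Range, From, To. Only pages 1 to 2 are exported.
                ' UseISO19005_1:=True requests PDF/A (ISO 19005-1) output.
                doc.ExportAsFixedFormat("D:\out\my resume.pdf", wdExportFormatPDF, False,
                                        wdExportOptimizeForPrint, wdExportFromTo, 1, 2,
                                        UseISO19005_1:=True)
            Finally
                doc.Close(wdDoNotSaveChanges)
            End Try
        Finally
            app.Quit()
        End Try
    End Sub
End Module
```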

Two examples of commercial applications that utilize this code for converting documents to PDF are DocuFreezer and FolderMill.