I finally have the opportunity to give something back to the wonderful community of writers I have met and interacted with on the internet. Since starting this blog several months ago I have had the opportunity to engage with a wide range of extremely talented and helpful people who have given freely of their time to assist me in achieving my dream of becoming a writer. And in case I’m not being clear about who those people are, if you’re reading this post then chances are you are one of those people. 🙂
Well, now I finally have the chance to give something back.
My day job is in IT, specifically software development, and for a while now I have been tinkering with the idea of a tool that can be used to quickly see how frequently words appear in my writing. I did a post in November last year called Revision – A Primer where I described how this information was used to tighten up a short flash fiction piece. I was keen to use a similar technique on a longer work but the thought of doing it all manually was daunting.
As I’m sure you know, any task that is tedious to do manually is an ideal candidate for some automation, and what with me being a programmer, surely there was something I could do to make my life easier. Well, I found a site that had a basic macro that did this at Allen Wyatt’s Word Tips and used that as the basis for my own version of the tool. There have been quite a few modifications to the original but I have no desire to take credit for someone else’s work.
From this I added the screen to control the calculation, sent the results to an HTML file and added the ability to expand contractions (can’t to can not, I’ve to I have, etc.) and also to treat possessives as additional instances of the same word (fred and fred’s both count as fred).
What I have now is a quick and easy way to see what words I am abusing without the tedious task of counting them by hand, or the slightly less tedious task of using search and replace. It is this little utility that I now share with you.
I have tested it with MS Word 2010 and 2007 but it should also work in older versions (though I haven’t tried).
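For the programmers among you, here is a rough idea of what those two tweaks do. This is only a Python sketch of the technique, not the macro's actual VBA code. The function name `count_words`, the option names and the tiny contraction table are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny contraction table (an assumption for this
# sketch); the list the real macro uses is not reproduced in this post.
CONTRACTIONS = {
    "can't": "can not",
    "i've": "i have",
    "didn't": "did not",
    "you're": "you are",
}

# A word is a run of letters, optionally followed by an apostrophe and more letters.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(text, expand_contractions=True, strip_possessives=True):
    """Count words, optionally expanding contractions and folding possessives."""
    # Lower-case and normalise curly apostrophes so "Fred’s" matches "fred's".
    text = text.lower().replace("\u2019", "'")
    counts = Counter()
    for token in WORD_RE.findall(text):
        if expand_contractions and token in CONTRACTIONS:
            # "can't" is counted as the two words "can" and "not".
            counts.update(CONTRACTIONS[token].split())
        elif strip_possessives and token.endswith("'s"):
            # "fred's" is counted as another instance of "fred".
            counts[token[:-2]] += 1
        else:
            counts[token] += 1
    return counts
```

Note that a simple rule like this cannot tell a possessive from a contraction ending in ’s, so "it's" would be counted as "it" unless it is added to the contraction table.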
Making it available is quite simple and I’ll walk you through it here. This guide assumes you are using Windows 7 and working with Word 2007 but the process should be the same for Word 2010:
- You will need to download the Word Frequency User Interface import file here
- Once that is downloaded you should extract the contents of the zip file somewhere on your computer. Remember that location because you will need to find it again later on.
- Start MS Word and choose the View tab on the ribbon and then click the Macro button on the far right
- On the screen that pops up (see below) enter the name WordFrequency as the Macro Name and then press Create
- On the screen that comes up you need to go to the File menu and then choose Import File…
- On the screen that pops up you need to find the files that you extracted in step 2. Select them and press the Import button
- Now it will be time for you to do a little coding. Relax, it is very, very basic, no more than one line. What you need to do is make sure that you type the code exactly as it appears here. Hopefully the bits in blue and green have already been done automatically; you just need to worry about the line in black
- Once that’s done you should save the macro by pressing CTRL+S or selecting File then Save Normal from the menu. Then choose Close and Return to Microsoft Word
That’s it. The macro has been installed and you are now ready to see what words you are using far too often. To do that just open a Word document, select the View tab and then press the Macro button like you did in step 3.
You should now see the Run Macro screen again but this time there should be the option to select WordFrequency from the list. Highlight it and press the Run button
That screen should go away and you will be shown the Word Frequency Analysis screen where you control the macro you just installed.
I’ll give you a brief rundown on what each option means but hopefully they are reasonably self-explanatory (if they aren’t, let me know so I can change them to something more meaningful).
Strip Possessives: This means to treat proper nouns ending in ’s as simply additional instances of the proper noun. Thus Fred and Fred’s would count as 2 instances of Fred rather than 1 of Fred and 1 of Fred’s.
Expand Contractions: This means that words such as didn’t, couldn’t, I’d, I’ve and you’re will be counted as two words: did not, could not, I had, I have, you are. If you leave it unticked then did not and didn’t would count separately.
Sort on Frequency: Means that the most commonly used word will appear at the top.
Sort Alphabetically: Means that the results will be listed from A to Z.
Maximum Frequency to Display: Only words that appear in the source document more than this number of times will appear in the list. Particularly in longer documents you don’t want to have to wade through pages of words that only appear a handful of times. But you can if you want to. Simply set this to zero to show every word in all its glory.
Results File: The path and filename where the frequency results will be saved. This is an HTML file which you can view in your browser. You can also press the […] button to bring up a file selection dialog so you can specify the results file name.
Run Count: This is the Go button. When you are happy with your selections press this button and let the macro do its thing.
View Results: This will open the file you have named as your save file and display the results in your browser.
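If you are curious how the sorting, the frequency cut-off and the HTML output fit together, the Python sketch below shows one way it could be done. It is my illustration of the idea rather than the macro's real code. The names `select_rows` and `render_html`, the `min_count` parameter and the table layout are all assumptions.

```python
from collections import Counter
from html import escape


def select_rows(counts, min_count=0, by_frequency=True):
    """Keep words that appear more than min_count times, then sort them."""
    rows = [(word, n) for word, n in counts.items() if n > min_count]
    if by_frequency:
        # Most common first; ties are broken alphabetically.
        rows.sort(key=lambda row: (-row[1], row[0]))
    else:
        # Plain A to Z.
        rows.sort(key=lambda row: row[0])
    return rows


def render_html(rows):
    """Build a very plain HTML table (hypothetical layout) from the rows."""
    body = "\n".join(
        f"<tr><td>{escape(word)}</td><td>{n}</td></tr>" for word, n in rows
    )
    return (
        "<html><body><table>\n"
        "<tr><th>Word</th><th>Count</th></tr>\n"
        f"{body}\n"
        "</table></body></html>"
    )


# Example: show only words used more than once, most frequent first.
counts = Counter({"the": 5, "cat": 2, "a": 2, "sat": 1})
page = render_html(select_rows(counts, min_count=1))
```

Setting `min_count` to zero keeps every word, which matches what the cut-off option does when you set it to zero.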
So, if you’re wondering what the results look like here is one I prepared earlier:
Since using this I have found that “the” is by far the most common word I use. I wonder if that will also hold true for any of you?
Well, if you decide to use this tool I hope you find it useful. I have some ideas for things to add to it so I do plan to update it from time to time. If anyone has any suggestions or finds anything wrong with it please do not hesitate to let me know.
This is just my way of saying thank you to the fabulous community of authors out there.