Mistborn By The Numbers

In my opinion, Brandon Sanderson’s writing is excellent. Others have described it as being technically perfect. His novels are epics in both concept and size, but they are very accessible. He structures his sentences in ways that make them easy to read; I don’t need a dictionary beside me, and he doesn’t give the impression that he swallowed a thesaurus. Put simply, if my writing in any way approached his I would be proud of myself.

My opening paragraph contains numerous subjective statements. I firmly believe that they are accurate, but how am I going to prove them? The answer is, sadly, that I don’t know that I can. All I can do is put forward my opinion and – provided it is shared by enough people – I might be considered correct. The purpose of this post isn’t to prove my belief that Brandon Sanderson writes well. That is something that you will either agree with or disagree with, but there is something that none of us can deny: he absolutely writes well enough to have gotten himself published and to have been personally selected to complete another author’s magnum opus.

So, questions of quality aside, I thought it would be interesting to do a little digging, to pull apart his writing and see what it’s made of. In this post I am going to take all three of his hugely popular Mistborn books and break them down into some key metrics that will hopefully be an interesting baseline by which I might measure my own writing.

Before I start, though, I am assuming you are aware of the books I am talking about (and if you aren’t, I will assume you either live under a rock or have just awoken from a coma). To refresh your memory here are the cover images. Click on them to go to the Amazon page for each book, where you can read reviews and whatnot to assure yourself that his work is well liked.

Let me start with some high level numbers for the three books.

Measure                            Book 1    Book 2    Book 3
Word Count                        213,348   245,172   234,908
Sentence Count                     22,368    25,951    24,108
Paragraph Count                     7,785     9,091     7,397
Unique Words                       10,043    10,199    10,110
Average letters per word              4.5       4.4       4.4
Average words per sentence            9.5       9.4       9.7
Average sentences per paragraph       2.9       2.9       3.3
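
For anyone who wants to produce numbers like these for their own manuscript, here is a minimal Python sketch. It is not the program used for this post. The function name `text_stats` and the rules for what counts as a word, a sentence, or a paragraph are all assumptions, so the results will not match the table exactly.

```python
import re

# Assumed word rule: runs of letters, optionally joined by apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")

def text_stats(text: str) -> dict:
    """Return rough size and structure metrics for a plain-text manuscript."""
    # Assumed paragraph rule: blocks of text separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumed sentence rule: text ending in ., ! or ? (crude: abbreviations
    # and ellipses will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(r"[A-Za-z]", s)]
    words = WORD_RE.findall(text)
    if not words or not sentences or not paragraphs:
        raise ValueError("text contains no words to analyse")
    letters = sum(ch.isalpha() for w in words for ch in w)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "unique_words": len({w.lower() for w in words}),
        "letters_per_word": round(letters / len(words), 1),
        "words_per_sentence": round(len(words) / len(sentences), 1),
        "sentences_per_paragraph": round(len(sentences) / len(paragraphs), 1),
    }

if __name__ == "__main__":
    sample = "The mist came. It was cold!\n\nShe ran."
    for name, value in text_stats(sample).items():
        print(f"{name}: {value}")
```

Whatever rules you choose, apply the same ones to every text you compare, because the averages shift with the definition of a sentence.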

What does this tell me? Well, firstly, that all three books are very similar in terms of size and structure. The words, sentences, and paragraphs are all about the same size. Some of you might recognise these measures as being inputs into certain readability models. Regardless of which of the many such models you subscribe to, the one thing they have in common is that words per sentence and letters per word are relevant to how readable a particular bit of text is. So I would argue that the fact that all three of these books are very similar in these measures is no mere coincidence.
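
As one concrete example of such a model, the Automated Readability Index combines exactly these two averages. The sketch below plugs in the figures from the table above. Choosing this particular index is an assumption made for illustration, and the index counts characters (letters and digits) where the table reports letters, so treat the scores as approximate.

```python
def automated_readability_index(chars_per_word: float, words_per_sentence: float) -> float:
    """Automated Readability Index: higher scores mean harder text."""
    return 4.71 * chars_per_word + 0.5 * words_per_sentence - 21.43

# (average letters per word, average words per sentence) from the table above.
books = {
    "Book 1": (4.5, 9.5),
    "Book 2": (4.4, 9.4),
    "Book 3": (4.4, 9.7),
}

if __name__ == "__main__":
    for name, (letters_per_word, words_per_sentence) in books.items():
        score = automated_readability_index(letters_per_word, words_per_sentence)
        print(f"{name}: {score:.1f}")
```

All three books score between about 4 and 4.5, and the index is designed to map roughly onto US school grade levels, which fits the claim that the prose is easy to read.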

Next I want to examine the words themselves. To do this I have a table showing the ten most frequently used words in each of the three books. I will show the word, the average gap between subsequent occurrences of the word, how often it appears in the text, and the percentage of the text it accounts for.
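
A table like this can be built with a short script. The sketch below is an assumption about how one might do it, not the program used for this post: `top_words` is a hypothetical name, and the gap is taken as the mean distance, in words, between successive occurrences. For a very common word that works out close to the total word count divided by the word's count, for example 213,348 / 11,607 ≈ 18.4 for "the" in Book 1.

```python
import re
from collections import defaultdict

# Assumed word rule: runs of letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")

def top_words(text: str, n: int = 10):
    """Return (word, avg_gap, count, percentage) for the n most frequent words."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    rows = []
    for word, pos in positions.items():
        count = len(pos)
        # Mean distance between successive occurrences; undefined for a
        # word that appears only once.
        avg_gap = (pos[-1] - pos[0]) / (count - 1) if count > 1 else None
        rows.append((word, avg_gap, count, 100 * count / len(words)))

    # Most frequent first; ties broken alphabetically for a stable order.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows[:n]

if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for word, gap, count, pct in top_words(sample, 3):
        print(word, gap, count, f"{pct:.4f}%")
```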

For Book 1 the ten most frequently used words were:

Word    Avg Gap   Count   Percentage
the        18.4   11607      5.4404%
to         35.9    5934      2.7814%
a          42.9    4974      2.3314%
of         51.3    4159      1.9494%
and        61.3    3479      1.6307%
he         79.2    2678      1.2552%
she        82.2    2536      1.1887%
you        87.8    2418      1.1334%
her        88      2393      1.1216%
that       89.9    2372      1.1118%

For Book 2 the ten most frequently used words were:

Word    Avg Gap   Count   Percentage
the        19.3   12709      5.1837%
to         35.7    6873      2.8033%
a          46.5    5274      2.1511%
of         52.5    4666      1.9032%
he         59      4153      1.6939%
and        61.6    3977      1.6221%
i          78.9    3108      1.2677%
she        78.7    3104      1.266%
that       79      3102      1.2652%
you        86.9    2818      1.1494%

For Book 3 the ten most frequently used words were:

Word    Avg Gap   Count   Percentage
the        17.5   13444      5.7231%
to         34.8    6749      2.873%
of         43.3    5427      2.3103%
a          49.3    4756      2.0246%
he         55.6    4227      1.7994%
and        59.9    3922      1.6696%
that       71.4    3291      1.401%
it         79.5    2953      1.2571%
in         82.9    2835      1.2069%
was        83.8    2802      1.1928%

What strikes me is how similar these lists are. Sure, there are words in some that don’t appear in others, and some are in a different place, but overall the three lists overlap heavily. And not one of those words is a proper noun. In case you are interested, there is a site that lists the most commonly used English words, and its top ten is very similar to what we have here. I guess the message to draw from that is: if your own writing has a wildly different top ten, then maybe you have some revision to do.

So far there hasn’t been anything particularly mind-blowing in any of these statistics. So I am now going to turn to those trouble words that editors warn you to watch out for. How often do they show up in the Mistborn trilogy?

Word or words              Book 1   Book 2   Book 3
Turn / Turned / Turning       386      530      389
These / This                1,007    1,154    1,196
Realized                       46       48       71
Seem / Seemed                 278      331      327
Smile / Smiled                249      220      161
had                         1,331    2,096    2,577
“ly” words                  1,333    1,179    1,258

As you can see, with the exception of the use of “had” in book 3, these are very low counts for a 250,000 word piece. Even “had” only rates about a one in 100 mention in that book (2,577 of 234,908 words, or roughly 1.1%). I selected these words (or types of words) because they are words that I typically overuse. Your habits may be slightly different, and if you are curious how your pet word fares in the Mistborn series then ask me in the comments and I’ll reply with the results.
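
To check your own pet words, a sketch along the following lines may help. The names `TROUBLE_GROUPS` and `trouble_counts` are hypothetical, and two rules here are guesses rather than the method behind the table: an “ly” word is taken to be any word ending in "ly" (so "only" and "family" count), and the optional contraction handling adds every word ending in 'd to "had", even though "he'd" can also mean "he would".

```python
import re
from collections import Counter

# Assumed word rule: runs of letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")

# Word forms grouped under the labels used in the table above.
TROUBLE_GROUPS = {
    "Turn / Turned / Turning": {"turn", "turned", "turning"},
    "These / This": {"these", "this"},
    "Realized": {"realized"},
    "Seem / Seemed": {"seem", "seemed"},
    "Smile / Smiled": {"smile", "smiled"},
    "had": {"had"},
}

def trouble_counts(text: str, include_contractions: bool = False) -> dict:
    """Count each group of trouble words, plus words ending in 'ly'."""
    # Lower-case and normalise curly apostrophes so "He’d" matches "he'd".
    words = [w.lower().replace("’", "'") for w in WORD_RE.findall(text)]
    counts = Counter(words)
    result = {
        label: sum(counts[form] for form in forms)
        for label, forms in TROUBLE_GROUPS.items()
    }
    if include_contractions:
        # Assumption: treat every "'d" contraction as a use of "had".
        result["had"] += sum(c for w, c in counts.items() if w.endswith("'d"))
    result["ly words"] = sum(c for w, c in counts.items() if w.endswith("ly"))
    return result

if __name__ == "__main__":
    sample = "She turned and smiled. He'd realized this slowly; he had only turned."
    print(trouble_counts(sample, include_contractions=True))
```

Dividing any count by the total word count gives its share of the text, which is a fairer comparison than raw counts when your manuscript is shorter than these books.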

My goal in putting this up is to show you a statistical breakdown of a well-respected author’s work for a highly acclaimed series of books. This alone is not the mark of good writing, but it is a piece of the puzzle. So, if your writing results in statistics like these, then you are at least some way toward having good sentence structure and selecting your words carefully. Of course, you’ll still need to have a well-considered plot and engaging characters, and no amount of statistical analysis can show you how to do that.

I hope you have found the above data interesting.


9 thoughts on “Mistborn By The Numbers”

  1. What is your source? I assume you put the eBooks into a word processor, but it’d be nice to include that information.

    1. Hi Bob. Thanks for taking the time to leave a comment. Yes, you are quite right: I used the ebooks as my source, then used a program I wrote to extract the information.

      1. One question. Under your list of words to avoid you include had and then a count of how many times he used it. Does that include conjugations of had (he’d, she’d, etc.) or just the entire word?

      2. It has been a while since I posted that, but I think it includes the conjugations. In the program I used to do the analysis I have an option to ignore or consider those, and I remember adding that feature when I wrote that post.

