Better algo for word count to avoid the bloat

I just published a 7679-word story. I checked the word count in several apps and they all come out close to that. But somehow here it turned into 8096 words. That’s about a 5% error. I had one with 7700 that turned into 7971 words, so the size of the error seems to depend on the story itself. I thought it was the carriage returns, but the numbers don’t work out. Maybe it’s the quotation marks that get counted as words or something like that.
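For reference, the two overcounts quoted above can be worked out exactly. This is just the arithmetic in Python; `bloat_percent` is a made-up helper name, not anything from the site.

```python
def bloat_percent(tool_count: int, site_count: int) -> float:
    """Overcount of the site relative to the writing tool, in percent."""
    return (site_count - tool_count) / tool_count * 100

# Figures taken from the post above.
print(round(bloat_percent(7679, 8096), 1))  # 5.4
print(round(bloat_percent(7700, 7971), 1))  # 3.5
```

So the overcount is not a fixed percentage; it moves with the text.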

The main reason is probably how contractions (like “it’s”) are counted. I think most tools count this as one word, but the site counts it as two.

Currently I’m using split("\W+").length. Changing this to split("\s+").length or even split("[\W\s]+").length would probably be closer to what you expect.
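To see what each of those patterns does, here is a minimal sketch in Python. It uses Python's `re` module rather than the site's actual code, and `count_tokens` and the sample sentence are invented for illustration. One thing it shows: every whitespace character is already a non-word character, so `[\W\s]+` matches exactly what `\W+` matches and would not change the count.

```python
import re

def count_tokens(text: str, pattern: str) -> int:
    """Split text on a regex and count the non-empty pieces."""
    return len([piece for piece in re.split(pattern, text) if piece])

# One typographic apostrophe (’) and one straight apostrophe (').
sample = "It’s fine, isn't it?"

non_word = count_tokens(sample, r"\W+")      # splits at apostrophes and punctuation
whitespace = count_tokens(sample, r"\s+")    # splits only at whitespace
combined = count_tokens(sample, r"[\W\s]+")  # \s adds nothing that \W lacks

print(non_word, whitespace, combined)  # 6 4 6
```

Splitting on whitespace gives the 4 that most writing tools would report, while the non-word split counts "s" and "t" as words of their own.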

The problem, though, is that a change would cause an inconsistency, as only edited or new stories would use the altered way of counting words.

And honestly, I still think “it is” should always count as two words, regardless of how it’s written :slight_smile:

Are you concerned about the colour of your story’s word count? We’ve been thinking about changing those colours.

This probably doesn’t affect many people much, but since it would only take a small change in the code and one pass over the database to adjust past stories (I doubt any author would object to seeing their stories’ word counts shrink to what most tools indicate), here are the benefits and arguments.

First, how it affected me, which is not a typical case. I wanted to create a story under 8000 words. I already knew about the artificial bloat, so I checked one of my stories here that was just on the threshold, and saw that a story with 7700 words appeared as 7995 words here, so I aimed for 7700 words instead. I spent about 4 hours carefully going through my text and cutting words here and there to reach 7700 words, only to discover when I published that this specific story was bloated to 8096 words. By that time I was too tired and published anyway.

Now, the actual word count of my story was lowered by my 4-hour process, and that has more important benefits than looking like a 7700-word orange story instead of an 8096-word red story that some people might avoid. The word cutting makes the story flow better, so it was not wasted time.

But it remains that my story appears to be as long as those on GSS that avoided contractions and truly “feel” like 8096 words.

I understand that word count methods and color thresholds are arbitrary, and that all stories are counted the same way. But we use word count on GSS as an approximation of how long a story takes to read.

On GSS:

He’d suck Mark’s cock
He would suck his long cock

are both counted as 6 words. But the first sentence is easier to read because it only has 4 syllables instead of 6. So I would suggest that instead of arguing about what is considered a word, we argue more about which method achieves the goal of representing ease of reading better. Contractions usually do not add to the number of syllables most readers hear in their heads as they read, so not counting them as separate words gives a better representation of the readability of the story.

The whole reason for the existence of contractions in a language is to make your words flow better, to make the 2-word “do not” flow as if it was a 1-word “don’t”.

The reason my current story bloated more is probably because it has 4 characters and I had to use a lot of Ambrose’s, Rafael’s, Toby’s and Julian’s everywhere to make everything clear. And “Ambrose’s Rafael’s Toby’s Julian’s” on GSS is counted as 8 words instead of 4. Which is not intuitive.
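A counting rule that keeps possessives and contractions whole could look something like the sketch below. It is only an illustration in Python under my own assumptions, not the site's code: the pattern and the `count_words` name are hypothetical, and it treats both the straight (') and typographic (’) apostrophe as part of a word when it sits between word characters.

```python
import re

# Hypothetical rule: a run of word characters, optionally followed by
# further runs joined by a straight or typographic apostrophe.
WORD = re.compile(r"\w+(?:['’]\w+)*")

def count_words(text: str) -> int:
    """Count words, keeping contractions and possessives in one piece."""
    return len(WORD.findall(text))

print(count_words("Ambrose’s Rafael’s Toby’s Julian’s"))  # 4
print(count_words("don’t"), count_words("do not"))        # 1 2
print(count_words("wait - what"))                         # 2
```

Unlike a plain whitespace split, this does not count a free-standing dash or other stray punctuation as a word.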

Another argument for a word count method more in line with most existing writing tools is that those tools are currently useless for estimating word count on GSS. Again, not many authors keep an eye on the word count as they write, or care about how GSS evaluates it, but for those who do, it’s a pain.

I don’t mind changing this. It’s really minor. But I won’t run a db update; that’s too much for this minor issue. Each edit automatically updates the count, so if you really need an older story to reflect the new word count, you’d have to make a small edit.

Still, I’m currently not working on the code (I’m just starting to install everything on our new server, which is a lot of work).

So how urgent is this?

It’s not urgent at all. I don’t think it affects a lot of people.