Spelling

Google Search is a writer’s friend: Primo spell checker


For years now I’ve used Google Search as my go-to spell checker on the internet for words that stump Microsoft Word’s spell checker (which, unfortunately, is a pretty low bar … “no spelling suggestions” and red-underlined words are a common occurrence. I may yet get one writing this sentence).

A spell checker is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone programs capable of operating on a block of text, or they may be part of a larger application, such as a word processor, e-mail client, electronic dictionary, or search engine.

“The spell checker scans the text and extracts the words contained in it, comparing each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes,” Wikipedia tells me.
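The dictionary-lookup step Wikipedia describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Word or any particular checker works: the tiny `DICTIONARY` set and the `flag_misspellings` helper are hypothetical, and the tokenizer simply treats runs of letters and apostrophes as words.

```python
import re

# Hypothetical toy dictionary; a real checker loads tens of thousands of entries.
DICTIONARY = {"the", "spell", "checker", "scans", "text", "and", "extracts", "words"}


def flag_misspellings(text: str, dictionary: set[str]) -> list[str]:
    """Return the words in text that are not in the dictionary (case-insensitive)."""
    # Treat any run of letters or apostrophes as a word (a simplifying assumption).
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower() not in dictionary]


print(flag_misspellings("The spell checkr scans the txt", DICTIONARY))
# ['checkr', 'txt']
```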

“An additional step is a language-dependent algorithm for handling morphology. Even for a lightly inflected language like English, the spell-checker will need to consider different forms of the same word, such as plurals, verbal forms, contractions, and possessives. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated.”
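One crude way to handle English morphology is to strip common suffixes and check whether the remaining stem is in the dictionary. The sketch below assumes a hypothetical `SUFFIXES` list and `is_known` helper; real checkers (Hunspell’s affix files, for instance) use much richer rule sets. It also shows why this step is tricky: naive stripping accepts a non-word such as “wordes,” because “word” plus “es” looks like a plural.

```python
# Hypothetical toy dictionary of stems.
DICTIONARY = {"word", "spell", "checker", "verb"}

# Hypothetical, deliberately tiny suffix rules: possessives, plurals, verb forms.
SUFFIXES = ["'s", "s'", "es", "s", "ing", "ed"]


def is_known(word: str, dictionary: set[str]) -> bool:
    """Accept a word if it, or its stem after removing one suffix, is in the dictionary."""
    w = word.lower()
    if w in dictionary:
        return True
    for suffix in SUFFIXES:
        # Strip the suffix and look the remaining stem up.
        if w.endswith(suffix) and w[: -len(suffix)] in dictionary:
            return True
    return False


print(is_known("checker's", DICTIONARY))  # True  (possessive)
print(is_known("spelling", DICTIONARY))   # True  (verbal form)
print(is_known("wordes", DICTIONARY))     # True  (a false acceptance)
```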

Most of the time I do know how to spell the word triggering the red alert, but my largely two-index-finger typing has a tendency to trip over itself when I am composing something quickly in my head as I write (err … type). When more than one word is flagged, it is sometimes just as fast to copy and paste the sentence into Google Search as it is to correct several suspect words individually. Sometimes, of course, I correct the word in Word just to make sure I really do remember how to spell it. It’s sort of like doing math in your head, or at least on paper with a pen or pencil, rather than using a calculator. We pretty much all figure we should be able to do those things manually; we just don’t want to overdo it.

This got me wondering the other day why Google Search is so much better at correcting the spelling in my sentences, almost as an afterthought, while it completes a search that may or may not be helpful in and of itself. Google Search will often finish a sentence correctly for me, even if I only paste or type part of it into the search box or bar.

My first hunch was that it had something to do with the vast amount of data Google Search processes, more than three billion searches a day, and the algorithms and other proprietary tools developed from that data.

My second hunch was that if I was pondering this, other people had already thought about it, researched it and likely written about it before me.

My intuition for both hunches turned out to be correct.

Intuition, in fact, is what Google Search is all about. What makes it intuitive? Context. Context rules.

John Breeden II, the Washington, D.C.-based chief executive officer of Tech Writers Bureau, wrote about the topic in a Nov. 18, 2011 article for Government Computer News (GCN). Breeden was formerly GCN’s laboratory director and senior technology analyst, a job in which he reviewed thousands of products aimed at the U.S. federal government, everything from notebooks to high-end servers, while decoding highly technical topics for broad audiences.

“My biggest problem with Word is that there are some words that simply trip it up,” Breeden wrote. “When writing about temperature for our many rugged reviews, I always put ‘Farenheight,’ which Word thinks should be changed to ‘Fare height.’ That doesn’t help at all.

“However, when the same misspelled word is pasted into Google, it says, ‘showing results for Fahrenheit instead.’ There are quite a few other words that confuse Word but not Google. They are not difficult to find.

“I have to wonder why Google is so smart when it comes to figuring out what word a user wants to use. My guess is that the database Google is pulling from is so massive that it’s probably seen a lot of the same basic spelling mistakes. There are probably a lot of people who have wanted to search for Fahrenheit but typed in ‘Farenheight’ instead. Nice to know that I’ve got company.

“You would think it would be simple for word processors to use the same type of technology to improve their accuracy, but I suppose that would involve capturing data from their users and then making the connections between common mistakes and the accurate spelling.

“I thought that is what spell check was supposed to do, but instead I think it just matches the misspelling with words that are somewhat close to what you’ve typed. And Google obviously goes beyond that to associate common mistakes with actual words.”
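Breeden’s guess, that spell check “just matches the misspelling with words that are somewhat close,” can be illustrated with Levenshtein edit distance: the number of single-character insertions, deletions and substitutions needed to turn one string into another. This is an assumption made for illustration, not a description of Word’s actual algorithm, and the candidate list is hypothetical. Under a pure closeness measure, “fare height” is one edit (a space for the n) from “farenheight,” while “fahrenheit” is three edits away, so the nearest-neighbour rule picks the unhelpful suggestion.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


def closest(word: str, candidates: list[str]) -> str:
    """Pick the candidate with the smallest edit distance to word."""
    return min(candidates, key=lambda c: levenshtein(word.lower(), c))


print(levenshtein("farenheight", "fare height"))  # 1
print(levenshtein("farenheight", "fahrenheit"))   # 3
print(closest("Farenheight", ["fahrenheit", "fare height", "farewell"]))  # fare height
```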

An anonymous poster at Quora, a question-and-answer website where questions are asked, answered, edited and organized by its community of users, and which also aggregates questions and answers by topic, wrote on Sept. 1, 2012 in response to the question, “How is google so good at correcting spelling mistakes in searches?”:

“Google (search engines in general) has clusters processing tons (TB’s) query logs, which try to learn the transformation from original misspelled sentence to the corrected one. These transformation schemes are fed into the front end servers which serve the auto completion (and/or corrections to queries).

“Also these servers have lot more processing power and memory and disk space of course will not be an issue at all (for the learned transformations).

“Also since Google crawls the entire web regularly it will learn new words and suggest corrections Word can’t do till next release.”

“Desktop software usually have tight constraints on processing power, memory or disk space they could use to run compared to that of server based applications and usually are expected to keep the internet usage to a minimum (at least for MS Word.)

“They use static resources (dictionary that might only be current at the time of launch) and can’t employ complex algorithms due to the above said restrictions and hence employ heuristic algorithms which may not [be] very predictive of the correct word.”
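The Quora answer’s idea of learning “the transformation from original misspelled sentence to the corrected one” from query logs can be sketched as counting how often users retype one query as another. The log below and the `learn_corrections` helper are hypothetical; a search engine mines vastly more data with far more sophisticated models, but the principle of associating common mistakes with the spellings people actually settle on is the one Breeden guessed at.

```python
from collections import Counter, defaultdict

# Hypothetical log of (what a user typed, what they searched for next after retyping).
QUERY_LOG = [
    ("farenheight", "fahrenheit"),
    ("farenheight", "fahrenheit"),
    ("farenheight", "fare height"),
    ("fahrenhiet", "fahrenheit"),
]


def learn_corrections(log):
    """Map each misspelling to the reformulation users chose most often."""
    counts = defaultdict(Counter)
    for typed, corrected in log:
        if typed != corrected:
            counts[typed][corrected] += 1
    # most_common(1) returns [(reformulation, count)]; keep just the reformulation.
    return {typed: c.most_common(1)[0][0] for typed, c in counts.items()}


corrections = learn_corrections(QUERY_LOG)
print(corrections["farenheight"])  # fahrenheit
```

Because the mapping is learned from what people actually do, “farenheight” maps to “fahrenheit,” even though Word, per Breeden, suggests “Fare height.”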

Cosmin Negruseri, vice-president of engineering at Addepar, an investment management technology company, formerly worked at Google as an engineer on ads, search and Google Code Jam, an international programming competition hosted and administered by Google. (Both companies are based in Mountain View, in Santa Clara County, California.) He replied the same day, writing: “The main insight in modern spell correctors is using context. For example New Yorp is a misspelling of New York with a high probability.”
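Negruseri’s point about context can be illustrated with word-pair (bigram) counts: among known words one edit away from the typo, pick the one most often seen after the preceding word. This is a simplified sketch with a hypothetical miniature corpus and hypothetical helper names; it ignores the error-model probabilities a full noisy-channel corrector would also weigh. With “new” before it, “yorp” becomes “york”; after “of,” the same typo becomes “yore.”

```python
from collections import Counter

# Hypothetical miniature corpus; a search engine would draw on web-scale text.
CORPUS = "new york is big . i love new york . in days of yore .".split()
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))  # counts of adjacent word pairs
VOCAB = set(CORPUS)
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_in_context(prev_word, word):
    """Among known words within one edit, pick the one seen most often after prev_word."""
    candidates = (edits1(word) | {word}) & VOCAB
    if not candidates:
        return word  # nothing known nearby; leave the word alone
    return max(candidates, key=lambda c: BIGRAMS[(prev_word, c)])


print(correct_in_context("new", "yorp"))  # york
print(correct_in_context("of", "yorp"))   # yore
```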

 You can also follow me on Twitter at: https://twitter.com/jwbarker22

Blogosphere, Popular Culture and Ideas

Tipping points and blogging by the numbers


I have been blogging at soundingsjohnbarker (https://soundingsjohnbarker.wordpress.com/) for about a year and a half now. Recently I reached some kind of magical tipping point where I no longer have to write anything, at least for the foreseeable future, to garner more than 100 readers a day on average. It got so easy that I missed marking the few days it took to go from 49,000 readers to 50,000. Because I hadn’t written anything new since Feb. 16, I hadn’t been looking at my stats much (the bane of every self-respecting blogger, or so it seems anyway).

Not that it really matters much. I can’t explain why this is so: even if Google and Facebook were inclined to disclose such proprietary information, which for the most part they’re not, I’m not mathematically gifted enough to really understand how their various algorithms work.

I do know this: about 75 per cent of the daily views right now on the almost 200 posts I’ve written since September 2014 come from my home or landing page on the blog, with a handful of stories, or blog posts, if you will, drawing at least one or two readers somewhere in the world every day.

I’m delighted to say “Red Barn, Big Barney and the Barnbuster” (https://soundingsjohnbarker.wordpress.com/2014/09/13/red-barn-big-barney-and-the-barnbuster/), one of my early posts from Sept. 13, 2014, recently joined that rarefied company. On an ordinary day, readers in a dozen or more countries around the globe read what I have written here. The mix of countries changes somewhat, but the overall number, 12 or slightly more on a daily basis, has been the same almost from the beginning. It doesn’t go up or down much.

It seems that the majority of the stories being read right now, where a reader goes to a specific story rather than my homepage, come disproportionately from my earlier work, say between September 2014 and last May. Is that because I wrote better stuff back then? Possibly. But I think it more likely has a lot to do with the mysteries of Google search and how things cycle around the World Wide Web (WWW) on the Internet. I expect that perhaps six months from now, some of the stories I’ve penned more recently will find their stride.

Along the way, I’ve learned a few tricks, of course. Write local if you want some big numbers on a given day. While I do from time to time, if some local issue or story interests me in an unusual way, I stay away from that kind of writing for the most part. For one thing, those kinds of stories, I find, have little staying power, with three or four rare local exceptions (an unsolved murder story; a story about Dr. Alan Rich’s retirement last year and local lawyer Alain Huberdeau’s appointment to the provincial court bench; and several Vale stories come to mind). But most of them are one- or two-day wonders. It’s the more eccentric pieces on other places and even other times that have a deeper and wider audience in the long run. Fortunately, I prefer to write on more eclectic things these days, without any particular regard for geography or subject matter, if the topic strikes my interest. Thompson city council may well make decisions that affect me in myriad ways, not least in the pocketbook as a local taxpayer, but even that can’t remove the glaze from my eyes long enough for me to write much about local municipal politics, although our water bills are tempting me to make an exception. Reading newspaper accounts of such goings-on is usually painful enough. Mind you, I realize that what strikes my fancy when I don’t write local is not for everyone, and I have no doubt that I’ve caused some eye glazing of my own, especially when I write on eschatology or some other religious topic that some of my local readers find arcane.

The other thing I’ve learned is a bit about the value of tags and search engine optimization. And what I’ve learned, I must confess, is not exactly high culture or high-minded for that matter. Sex sells. Sizzle sells. Self-referential sells. Surprise!

My leading search engine terms today are: “hot tub high school; Lauren German hot tub school; MKO audit; Red Barn restaurant; LBJ sworn in” and “hot tub high school movie.” If you detect a theme, it actually comes from a more recent Jan. 29 story headlined “Fox TV’s Lucifer Morningstar and normalizing evil: Does the devil get any cuddlier?” (https://soundingsjohnbarker.wordpress.com/2016/01/29/fox-tvs-lucifer-morningstar-and-normalizing-evil-does-the-devil-get-any-cuddlier/), where I wrote, “Multiple references by Morningstar to Dancer about her briefly being a B-list actress, best known for her topless scenes in a movie called Hot Tub High School, before she became a cop, like her dad, are not accompanied by flashbacks, although Neil Genzlinger, in his New York Times review, described the devil in Lucifer as having the ‘sexist, salacious mind-set of a 14-year-old boy’ when it comes to Chloe.”

Perhaps destined to join the ranks of stories read daily in a few months?
