Showing posts with label Search Engine Optimization. Show all posts

Tuesday, August 17, 2010

The Power of Keywords

Keywords form the gist of Chapter 5 of Michael Miller's The Complete Idiot's Guide to SEO, immediately following Content.

If Content is viewed as a wheel in motion, ferrying you to places, then keywords would be the spokes that radiate out from the center hub. They are essential for motion, but one can readily observe that the number of spokes is just adequate for the purpose at hand: too few and the wheel will become rickety; too many and it becomes a solid wheel, lacking the spatial mix that renders it attractive.

Of course, coming up with a list of relevant, and hopefully highly searchable, words is just the beginning. But it's a huge beginning. You would think that coming up with the list is intuitive. After all, we do this every day, looking up stuff online.

Long before these web search keywords came into vogue, the academic community was already using the system in printed journals for ease of indexing and sourcing relevant journal articles. Drive along an urban road and you can see giant advertising billboards, feasting your eyes on countless keywords that could practically last a lifetime. However, these mental imprints do not stay long in the mind, or they are quickly shoved off to the deep recesses of the memory bank, pushed into oblivion by new arrivals.

Hence the keyword trackers or research tools that will do the memorizing, and also the prompting, for you. They are supposedly based on what users actually type in the query box. They can also do the reverse, tracing back to the original search terms if you want to know how successful articles were found.

Once the proper list of keywords is up, the next step, which requires more planning, is their insertion into the webpage at just the right dosage: having too many leads to keyword stuffing, a definite No-No that can even disbar you from the search engine fraternity; having too few diminishes the impact and severely limits the prospect of being taken seriously.

Hence keyword density, a concept borrowed from the measure of the amount of material within a specified volume, in this case applied in the two-dimensional sense represented by a surface area. Recommended densities in this respect can vary from 5% to 20%, depending on the length of the page. Obviously a longer page with a given percentage of keywords sprinkled throughout may still be readable where a shorter page strewn with the same percentage is not.
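As a rough illustration of the density idea, here is a minimal Python sketch. It assumes that density means the number of times a single keyword appears divided by the total word count, expressed as a percentage. That definition, the function name keyword_density, and the sample sentence are my own assumptions for illustration, not something taken from Miller's book.

```python
import re


def keyword_density(text, keyword):
    """Return how often `keyword` occurs, as a percentage of all words in `text`."""
    # Lower-case everything and split into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# Hypothetical sample text: 11 words, 3 of which are the keyword.
sample = "Keywords matter. Good keywords help readers, but too many keywords hurt."
print(round(keyword_density(sample, "keywords"), 1))
```

A sentence this short lands well above the recommended range, which is exactly how keyword stuffing reads on a brief page.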

Then there are placement locations to consider. Michael Miller recommends at least once in the preamble/introduction and another time in the concluding paragraph. Another way is to partition the page into sections with headings, which are then legitimately colonized by the keywords.

Whatever the techniques, ultimately, it's still the human reader who will be the arbiter of whether the page is a forced concoction arranged to suit the keywords or an enjoyable read, regardless of whether it is ranked high or not.

That said, one can also argue that if the page is not ranked high in the first place, chances are it would not be read. Therefore, in addition to appealing to the human eye, the page also needs to be searchbot-friendly, in a way pandering to their set ways of sniffing. And this is most efficiently, and effectively as well, done through optimizing the HTML tags, the subject of Chapter 6 of Michael Miller's book.

That prospect led me down memory lane, going back to the mid-1990s when I took some introductory courses in HTML coding, and even experimented with my own off-line personal journal, complete with photos interspersed between the HTML tags. It will be a long overdue refresher course of sorts.

Sunday, August 15, 2010

My Attempts at Demystifying The PageRank Algorithm

One habit of mine is reading multiple books at any one time. Not holding the books at exactly the same time, but rotating among them after finishing a chapter or a section of each. The habit is born of another habit, cross-referencing: comparing the different perspectives offered by different authors, which oftentimes supplement each other. The two books I am reading at the same time as Michael Miller's The Complete Idiot's Guide to SEO are Donna L. Baker's How to Do Everything with Google Tools (McGraw Hill 2008) and Google Hacks by Tara Calishain and Rael Dornfest (O'Reilly, 2nd Ed., 2005).

In Chapter 8 (Webmastering) of the latter book, I bumped into the PageRank algorithm of Google reproduced below in its equation form (Hack #87 by Mark Horrell):

PR(A) = (1 - d) + d[PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)]

where PR(A) is the PageRank (hereinafter shortened to PR) of a page A, PR(T1) is the PR of a page T1 (that links to page A and is considered an incoming link of page A), n is the total number of incoming links, C(T1) is the number of outgoing (outbound) links from the page T1 and so on for C(Tn), and d is a damping factor in the range 0 < d < 1.

Wikipedia has given the mathematical background to this now famous equation, purportedly named after Larry Page, one of the two co-founders of Google together with Sergey Brin. It includes a variant of the above equation that brings in the total number of pages, N (the (1 - d) term is divided by N), and which is touted to be the one actually meant by Page and Brin, based on the statement in their paper that “the sum of all PageRanks is one”.

Wikipedia explains further:

“PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. … The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.”

It is easy to infer from the structure of the equation that the higher the number of incoming links (n), the higher will be the value of PR(A), since each incoming PR, PR(T1) to PR(Tn), always has a positive value. This is the quantity aspect, the more the merrier, which perhaps helped spawn the proliferation of link farms.

Similarly, the higher the PR of the individual incoming links, PR(T1) to PR(Tn), the higher will also be the numeric value of PR(A). This then constitutes the quality aspect. There is another subtle twist to it, based on the fact that new webpages invariably start with a low PR simply because it takes time for them to be crawled, indexed, and linked. Thus, there is a preference for webpage A to be linked from older webpages rather than new ones.

However, do not forget about the denominator term, C(T), the presence of which alters the fractional value in the opposite sense to that of PR(T), i.e., the larger the number of outgoing links on each page T that links to A, the smaller is its contribution to the overall value of PR(A). In other words, you would like each of your incoming links to have fewer outgoing links of its own.
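To see the quantity, quality, and denominator effects at work, here is a small Python sketch that simply repeats the equation above over a toy link graph until the values settle. The function name pagerank, the page names, the default d of 0.85, and the fixed count of 50 passes are all assumptions of mine for illustration. This is the textbook iteration applied to the formula as quoted, not Google's actual implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over a tiny link graph.

    `links` maps each page name to the pages it links to (hypothetical data).
    """
    # Treat each page's outgoing links as a set so duplicates count once.
    links = {src: set(targets) for src, targets in links.items()}
    pages = set(links)
    for targets in links.values():
        pages |= targets

    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page T contributes PR(T) divided by its C(T).
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Page T links only to A, versus T spreading its vote over four pages.
focused = pagerank({"T": ["A"]})
diluted = pagerank({"T": ["A", "X", "Y", "Z"]})
print(round(focused["A"], 4), round(diluted["A"], 4))
```

In this toy case, page A ends up with a higher value when T links to it alone than when T shares its vote among four pages, which is the C(T) effect in miniature.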

Now, is that not a paradox of sorts that tends to foster selfish thinking? Certainly it would not appear to be a win-win situation for all. But, wait. There is another facet to it, as amply expounded by Wikipedia:

“Google assigns a numeric weighting from 0-10 (but 0 is used just for penalized or non analyzed-pages) for each webpage on the Internet; this PageRank denotes a site’s importance in the eyes of Google. The PageRank is derived from a theoretical probability value on a logarithmic scale like the Richter Scale.”

The practical implication of using a log (short for logarithmic) scale is that the difference between successive ranking numerals is not in arithmetic, but geometric progression, or more specifically in this case, it is an order-of-magnitude difference, i.e., 10 times. Imagine thus:

A PR of 1 may have the actual values from 1 to 9.9;
A PR of 2 may have the actual values from 10 to 99.9;
A PR of 3 may have the actual values from 100 to 999.9; and so on.
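The imagined bands can be written down as a tiny Python sketch. It assumes each displayed rank spans one factor of ten, starting from an underlying value of 1, and caps the result at 10. The function name toolbar_rank and the base of 10 are my own assumptions; the real mapping used by Google is not public.

```python
import math


def toolbar_rank(raw_value):
    """Map a hypothetical underlying value to a 0-10 rank on a base-10 log scale."""
    if raw_value <= 0:
        raise ValueError("raw_value must be positive")
    # 1 to 9.9 -> 1, 10 to 99.9 -> 2, 100 to 999.9 -> 3, and so on.
    band = math.floor(math.log10(raw_value)) + 1
    # Values below 1 fall to 0 in this sketch; very large values cap at 10.
    return max(0, min(10, band))


for raw in (5, 50, 500, 5000):
    print(raw, "->", toolbar_rank(raw))
```

Each step up the displayed scale needs roughly ten times the underlying value, which is why the climb gets steeper with every rank.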

Thus an increase of a PR by 1 can only come from tremendous effort, certainly not in the linear manner that climbing from 1 to 2 suggests: the higher the PR, the much greater the effort needed to rise to the next ranking. Thus, in the grand scheme of things, whether the pages linking to you have a higher or lower number of outgoing links would matter less than having a larger number of pages link to you, plus it is out of one's control anyway. Now I can really appreciate the chasm that separates this blog site of mine and Wikipedia's, which I may have seemed to remark on rather casually when pitting a PageRank of 1-2 against a 9.

If PR were the only criterion used by Google, then getting a high rank would be no different from winning a popularity contest. In the words of Wikipedia, “a PageRank results from a "ballot" among all the other pages on the World Wide Web about how important a page is. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it ("incoming links"). A page that is linked to by many pages with high PageRank receives a high rank itself. If there are no links to a web page there is no support for that page.”

Reality is of course a different story. There is another part to the rank determination conducted by Google, a much greater part I hope, as reflected in the following:

“Hypertext-Matching Analysis: Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.”

Now I do not have to retract my previous blog post on content being king, and all the content writers can sleep better.

Upon further Google (what else?) search, I came upon an excellent explanation of the PageRank algorithm by Phil Craven that alerted me to two more features:

- That a website's internal links are included. Therefore, the more pages a website holds, the higher the PR as they contribute the same as would an incoming link. The only caveat is that the pages have to be indexed by Google, that means on their own merit, and no cookie cutter stuff.

- “That when a page votes its PageRank value to other pages, its own PageRank is not reduced by the value that it is voting. The page doing the voting doesn't give away its PageRank and end up with nothing. It isn't a transfer of PageRank.” That is, do not be afraid to link and plunge into the world of inter-connectedness, even though these links may be virtual, and the spirit of reciprocity (an outbound link begets an inbound one) lives on.

The last piece of the PageRank puzzle is the little d. What role does it play? Wikipedia puts it thus:

“The PageRank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d.”

And in Page and Brin's own words:

“The d damping factor is the probability at each page the "random surfer" will get bored and request another random page.”

This fits in with our perception of human nature, for we are not machines and are not capable of executing an infinite number of clicks. Other than physical exhaustion, boredom will set in much quicker, and the rate of quickening seems to be a function of age, as we can all attest. Mathematically, we know that its inclusion depreciates the overall value of PR. If it were zero, all webpages would have the same PR of 1, a non-starter in the first place. Another thing to note is that the PR can never be zero. If a complete introvert writes a single webpage with no links whatsoever, it would still garner a PR of (1 – d), or 0.15, practically.
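Both observations about d can be checked with a one-line version of the equation in Python. The helper name page_rank_of and the sample incoming values are hypothetical, and the default d of 0.85 is an assumption, chosen because it makes (1 – d) come out to the 0.15 mentioned above.

```python
def page_rank_of(incoming, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for one page.

    `incoming` is a list of (PR(T), C(T)) pairs, one per page linking to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in incoming)


# A page with no incoming links at all still gets the floor value of 1 - d.
print(round(page_rank_of([]), 2))

# With d = 0 the incoming links stop mattering and every page gets 1.
print(page_rank_of([(3.0, 2), (1.0, 4)], d=0.0))
```

The first call shows the floor under every page, and the second shows why a zero damping factor would make ranking pointless.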

Is all the above, which I spent the better half of the morning and on into the afternoon understanding and collating, worth the trouble? Intellectually, a resounding YES. It has been a long while since I killed so many of my brain cells within such a short span of time. It's invigorating to say the least. My analytical mind still works.

Practically though, I am sobered by the recent clarification from Google, which appears almost as an epilogue in Wikipedia:

“On October 15, 2009, Google employee Susan Moskwa confirmed that the company had removed PageRank from its Webmaster Tools section. Her post said in part, "We’ve been telling people for a long time that they shouldn’t focus on PageRank so much; many site owners seem to think it's the most important metric for them to track, which is simply not true."”

The actual post by Susan Moskwa was more explicit, with further elaboration punctuated by a smiley at the end (tongue in cheek?):

"We removed it because we felt it was silly to tell people not to think about it, but then to show them the data, implying that they should look at it. :-)"

Then why retain it in the Google ToolBar? Perhaps the whole situation is better explained in a separate post that Susan Moskwa pointed to, which I will get to later.

Ah well, life moves on. And so will we.

Saturday, August 14, 2010

Write content well first, then go for SEO worthiness

Optimizing Your Website for Search Engines. That is the title of Part 2 of Michael Miller's The Complete Idiot's Guide to Search Engine Optimization, and it is also the phrasing I have been arguing is in line with normal English usage for describing the SEO business. It is another variation of Website Optimization for Search Engines (WOSE).

Anyway, Miller has described Part 2 as the meat of the SEO business. It is action-oriented, applying SEO principles to a live website, in a step-wise fashion. The first rung, the most important foundation of all, is Content. Aptly entitled Optimizing Your Site's Content, it is a drill section on CONTENT, CONTENT, and nothing but CONTENT.

Engaging, flowing, flawless, and pertinent. The content must be able to captivate and sustain the attention of the users/visitors, the style must be flowing like water with continuity in a natural progression, the grammar must be the envy of the language enthusiasts (meeting the expectations of the purists will be too tall an order) with judiciously placed punctuation marks to indicate change in thought, in emphasis, and in re-direction, and the coverage must be clearly delineated, shorn of extraneous materials and excessive self-peddling. So much for content quality, which is the organic part.

One can be trained to write well, but the skills need to be acquired prior to launching the website. A website that is in the public gallery is hardly a place to learn the ropes of writing engagingly, as parading language weakness in a published website degrades the perception of content worthiness. Fortunately, for those who are predisposed to be better doers than writers, help is at hand: copywriters. They are a niche unto themselves for a reason.

The rest of the optimization would appear to be more mechanistic: prudent sprinkling of keywords, sprucing up the HTML tags, and creating links that serve to elevate (inspire trust) rather than downgrade (engender distrust). Remember not all links are created equal, and they do vary widely in link reputation or worthiness. Official domains such as the edu's and the gov's command much higher respect than the lowly com's, some of them anyway.

As for the requisite length, most agree that longer is better than shorter, premised on the well-regarded observation that amplification trumps precis. And a thousand words or thereabouts seems to be the consensus. For comparison, the length of this blog post up to this point is about 380 words. Thus, I still have some ground to cover until that magical threshold is reached.

Whether it is a webpage, or an article in hardcopy form like in a printed magazine or newspaper, the techniques and rules for good writing differ little. There are essentially two parts: the what (the content) and the how (the writing style).

As the progenitor of the webpage prompted by an idea, an urge to fill a need, or an opportunity to start, run and own an e-business by providing services, you are the best judge of the what part, and hence, the best person to articulate these core propositions. To this end:

1) Focus on the core theme, be it to verbalize a message, to convey a piece of information, to deliver a sales pitch for a specific product or to solicit feedback. Say it up front, say it in the middle, and say it again at the end.

2) Focus on the needs of the readers: speak to them with respect and humility, and be truthful about the benefits that will accrue so that they can walk away with their needs met, or a way to meet their needs identified. Always remember to cultivate a lasting impression to encourage repeat business. If you're not in for the long haul, you have no business being in it in the first place. In and out is certain to spell doom from day one. As in an oral presentation, audience analysis is vital so as to be able to write in a tone that the targeted audience is most comfortable with. Not condescending, nor overly didactic.

3) Then support the core theme with well-thought-out procedures/applications/examples/cases in a clear sequence that culminates in the realization of the core theme through a series of success stories. After all, it is no different from telling a story, at the end of which the audience, or a portion thereof, must be sold on the story. In that regard, nothing sells better than one that is rooted in authenticity.

Once we know what to write and for whom, the success of the how part hinges on our ability to achieve readability and to exude elegance of the written word. While a written work may lack the reinforcement via body language, one can still aim to blend in the non-verbal cues through the use of evocative prose and sentiment-laden words to evince passion, sincerity, eagerness, and empathy. Here's where creativity can know no bounds, transcending platitudes and rising above sloganeering.

Regardless, good writing traits revolve around simplicity, economy, writing in the active voice/first person, coherence, and avoidance of slang/jargon and repetitions of the same words. Reading widely, having a wide command of vocabulary, and knowing the nuances will go a long way in presenting the what in a highly readable and elegant manner. If this seems daunting or the learning process is too time consuming, engage professional help.

There is one more step to ensure SEO worthiness though. That's where SEO skills are called for, and that is also where Michael Miller lays bare the meat of his book for everyone's picking, from here on right up to the last chapter of his book, which is Chapter 24.

One chapter at a time now. And the threshold met.

Thursday, August 12, 2010

Content rules, but only the textual kind

The second chapter (How SEO works) of Michael Miller's Complete Idiot's Guide to SEO is now history. That history was made while we were waiting for WT to sit for his driving test at a local DMV office this morning. Before that, we made him drive us round the carpark where the office is located and the adjacent road several times just to get him acclimatized to the route and traffic setting. And it paid off. A huge thumbs-up sign from him at the end of his driving test announced another new legal driver on the road.

OK, back to the second chapter. In two words, Content Rules. Not just any content, but the textual kind, relegating the non-textual genre to irrelevance, at least for now until such time as and when some kind of image recognition capability is achieved.

The chapter is about what search engines look for and armed with that knowledge, how one can optimize the website to provide strategically, repeatedly, and refreshingly what these search engines look for, which are tuned to users' needs. That means also understanding what people in general look for.

In this respect, search engines can be viewed as matchmakers, trying to consummate a marriage of sorts that is only made in heaven, with both parties' wishes fulfilled: the user's query is answered, and the website gets its top ranking.

Crawlers and searchbots, the unseen sniffers that prowl cyberspace, dispatched by the Search Engine Enterprise, are busy and impatient beings and do not linger long in any abode of the internet denizens (think home page). They have got a zillion places to cover and therefore only look for what they are trained to find at selected places, sending the content back to the Mothership. And there are three staples in this mix: keywords, HTML tags, and links.

Keywords are descriptors of items that are of interest to the users. HTML tags are codes that structure the website both for viewing and, underlying it all, for providing a detailed schematic of where things are kept in a neat hierarchical arrangement. Not all HTML tags are created the same, and the trick is to know which are the favorite hangouts of these crawlers or spiders during their brief sojourn. Fortunately for people who do not bother with HTML coding, like yours truly, structure means there is a well-defined path to follow and even the uninitiated are unlikely to go wrong in identifying these alcoves.

Links are connections or conduits that point to another webpage. Apparently, the more the merrier is the motto here, suggestive of a popularity contest. To a point, since quality and relevance matter as well. Links have become a commodity that one can actually buy, abiding by the economic model that where there is a demand, there will be a supply.

It would appear that SEO is nothing more than manipulating the keywords, the HTML tags, and the links to work in concert to improve a website's search ranking. And to that end, Michael Miller offers ten key factors for doing just that. The associated optimization techniques are further amplified in other chapters of the book, plus a whole slew of other things that one can try.

And in the final analysis, it's all about trying. No venture no gain. Let the adventure begin. Mine started with tinkering with this blog's visual look by experimenting with several HTML codes in the blog template, albeit at a rudimentary level. Yes, taking baby steps is good.

Wednesday, August 11, 2010

Whether SEO or WOSE, Text is KING

My take-away message from the introductory chapter of Michael Miller's book is that Text is King as far as searches go. It's a very text-centric world out there and the crawlers and searchbots are trained to sniff out web pages based on text only. Hence, text analysis is featured as one of the primary considerations that determine page rank. At least this is as things stand now. It does not mean that images become immaterial, but that one has to anchor them with some kind of text in order to score any points.

While quantity, or length in this case, might not be all that important, a shorter text can be deemed as less relevant, and is often accorded a lower page rank, all things being equal. Then again the page ranking algorithm reputed to be the top secret of the highest degree can sense any word padding, no matter how subtle it is, from a mile away, easily. Thus pruning a page for readability and more important, substance, is much more rewarding than playing the word game.

Another thing that struck me is that pages are stored verbatim in the so-called document servers operated by these search engines for lightning-fast retrieval. Once stored, a page only gets updated, but is never totally removed from cyber storage. That would mean that a simple click of the delete key, even one that empties the trash folder or Recycle Bin, only puts a file out of sight: it is still floating in limbo somewhere in cyberspace, ready to be resurrected to inflict nightmares via another click by those who have the means and the incentive to do so. I wonder whether there is any kind of virtual shredder that makes stitching back so difficult that it becomes a futile venture for those who are so inclined.

Then it struck me, again, that SEO as Search Engine Optimization is actually a misnomer. What we try to achieve in raising the profile of our web pages as denoted by page rank through SEO is more like Website Optimization for Search Engines, rather than Optimization of the Search Engines themselves as the common usage of English would dictate. So, perhaps they should have been WOSE consultants.

I learned today too that the Google ToolBar actually includes a PageRank icon that displays the page rank of a particular webpage/site, but visually in the form of a filling horizontal bar. Readers can judge the page rank by mentally dividing the bar into ten slots (corresponding to the range of 0 to 10) and see how many slots are filled. This website of mine was adjudged by yours truly to have a page rank of 1 to 2. On the other hand, the Wikipedia website is like a 9.

This would appear to be a quick way to find out whether my SEO (or WOSE) efforts actually make any headway, or not.

Sunday, August 08, 2010

My Initiation Rites to SEO

I recently came across a similarly crowded term, Search Engine Optimization (SEO). I think for the sake of brevity, this three-noun version appears crisper than Optimization of Search Engines, or, God forbid, Optimization of Engines that Search. So SEO it is.

Thanks to Wei Joo, my elder son, as the intermediary, and to Daniel Tan, my nephew, who is an SEO guy (in his own words), I was initiated into the amazing realm of e-marketing, and in particular, the rat race for website ranking. Rat race is definitely apt here, not unlike the Wild Wild West known for the mad scramble for the pot of gold, as attested to by the proliferation of SEO websites.

From my limited online browsing thus far, covering a span of a few days outside my working hours and thus meager by any measure, Search Engine Optimization (SEO) websites operate under the notion that nobody is going to be bothered with what is at the bottom of the virtual pile, no matter how useful the information can be, as time is of the essence, especially in cyberspace. And a search engine is the virtual equivalent of the Yellow Pages of the brick-and-mortar world, except that it's more nimble, expansive, and best of all, prioritized based on some objective criteria that presumably hold the interests of the consumers supreme. The latter aspect is where optimization matters, exclusively for those offering their services in the e-marketplace.

I think a disclaimer is apt here. There is no bashing of any kind whatsoever intended; it just so happened that I was recently awakened to what SEO can do, though what I'm about to blog may not be exclusive to SEO websites only.

My cursory search in cyberspace churned up so many SEO websites that it would take many lifetimes of reading. Many of them will guarantee top billings for one's website, some even going as far as assuring the Uno. However, logic dictates that some in the latter category may be an exercise in futility based on simple arithmetic that goes something like this:

Number of SEO websites: countless (in relative terms as it most likely ranges in the thousands at most);

Number of niches: finite since regardless of the possible permutations of qualifying words and variants of the terms (here rarely used terms of practically no utility value as judged by the failure to elicit any recognizable response from an average Joe are excluded), the lexicons grow only at an excruciatingly slow pace and one can only drill down that far;

Number of top ranks: countable, with both hands at most for those interested in the Top 10 list.

Simple conclusion: there are just too many monks to share the few bowls of congee (to borrow from an oft-used Chinese idiom, loosely translated) given the fact that the pinnacle is always pointy at the top by definition. In other words, some unwary customers will not get their wishes granted. However, the ability to level the playing field such that big players don't simply win by sheer might in the e-business milieu is not in doubt. And therein perhaps lies its appeal, where both the Davids and the Goliaths can match their creativity for supremacy.

Thus, an SEO consultant worth his/her salt will settle at “top billings”, implying that perhaps the pedestal is not as narrow a ledge as some would prefer it to be, leaving the numerals out to make room for creative maneuvers. So, caveat emptor, which, come to think of it, would apply to the procurement of any service, online or otherwise.

And a good way not to fall for any sales pitch is to learn some basics of what SEO can and cannot do. In this regard, The Complete Idiot's Guide to Search Engine Optimization by Michael Miller (Alpha Books, 2009), courtesy of the local public library from which I collected the book just yesterday, will show me the way, hopefully. From its first few pages, I think I already like the even-keeled approach of the author: not promising the sky, but "improving your search engine rankings" and "making your site rank higher". To me, these are realistic and achievable goals if one knows what to do, as the author assures in the introductory chapter of his book, though with a reality check: it's not a quick and easy job. And my journey toward an improved website ranking has just started!