Semantic Search: Keyword Choices and Relevancy

Have you read the recent SearchMetrics ranking factors survey?

SEO experts Marcus Tober, Dr. Leonhard Hennig, and Daniel Furch are back with another look at ranking correlations and ranking factors in the 2014 Ranking Factors Study. The report highlights four content factors that the authors believe correlate with higher rankings:

  • Semantically relevant and semantically comprehensive wording
  • Long form/higher word-count content
  • Enriched content with diverse media
  • Easy to read content

While all these areas are important, the first bullet, “semantically relevant…”, stands out to me in my role as an SEO professional. We’ve talked on the Act-On blog about keywords and how important it is to choose the right ones. Let’s dig further into how this new emphasis on semantics and “semantic search” affects the way we evaluate keyword choices and relevancy.

So grab a coffee and get ready to dive a bit deeper into the technical side of how a computer program (a bot, spider, or search engine like Google) determines what a page means (in other words, which question the page best answers) and, consequently, what it should rank for.

Keep in mind that, as a marketer, you have control over a page’s quality, as well as over the keywords the page could rank for. But the line between influence and manipulation is blurry. A proper understanding of the science behind search can help shed light on the best way to develop content in today’s Google world.

Caveat: With few exceptions (e.g., the discontinuance of Google Authorship), Google does not directly address how its algorithms operate. Much of what we consider “emerging common wisdom” about how search changes is the result of research and testing by experts, who independently come to a rough consensus.

That said, here’s what the experts think is happening now:


Co-citation

Co-citation is a clustering method that identifies related documents through shared citations: two documents are co-cited when a third document cites both of them. For search engines, the frequency with which two pages are cited together by other pages may help establish the relatedness and context of each page.

Here’s an example to break it down:


Let’s say there are three independent, highly ranked web pages focused on Siamese cats:

  1. Page A
  2. Page B
  3. Page C

And let’s further say that all three of the above web pages contain linked citations to two other pages:

  • Dr. Kat Scratch, cat behavior specialist, and
  • The Siamese Cat Bible

So we have three highly ranked pages that each mention (cite) two other pages.
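To make the counting concrete, here’s a minimal Python sketch of co-citation on the toy link graph above. It assumes we already know which pages each citing page links to; the data and the `co_citation_counts` helper are hypothetical, and nothing here describes how Google actually stores or weighs links.

```python
from collections import Counter
from itertools import combinations

# Hypothetical link graph from the example: citing page -> pages it links to.
citations = {
    "Page A": {"Dr. Kat Scratch", "The Siamese Cat Bible"},
    "Page B": {"Dr. Kat Scratch", "The Siamese Cat Bible"},
    "Page C": {"Dr. Kat Scratch", "The Siamese Cat Bible"},
}

def co_citation_counts(citations):
    """Count how many citing pages link to each unordered pair of cited pages."""
    counts = Counter()
    for cited in citations.values():
        # Sorting gives each pair one canonical order, so (x, y) and (y, x) merge.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

counts = co_citation_counts(citations)
print(counts[("Dr. Kat Scratch", "The Siamese Cat Bible")])  # 3
```

A count of 3 says the two pages were cited together by all three Siamese cat pages. A real system would presumably also weight each citing page by its own authority, which this sketch ignores.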

One possible result of this co-citation scenario is this: Both Dr. Scratch and The Siamese Cat Bible websites can begin ranking (or ranking higher) for the keyword “Siamese cat”. In fact, even if Dr. Scratch’s page never mentions the term “Siamese cat”, her site can still rank for it.

Slightly gratuitous cat picture


Experts think this happens because search engines come to view the two pages as thematically related: they are cited together, repeatedly, by several high-ranking pages about Siamese cats (Pages A, B, and C in our example).

Search engines want to provide the best response (content of some kind) to a searcher’s keyword query. To do that, experts suggest, search engines apply co-citation to help discover the terms a page should rank for, whether or not the page is optimized for that exact term.
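The idea can be sketched as simple topic propagation: each cited page is credited with the topic of every page that cites it, so the cited page’s own wording never enters the calculation. This is a simplified, hypothetical Python illustration (the `page_topics` labels and the `inferred_topics` helper are assumptions), not a description of Google’s algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical data: the topic of each citing page, and the pages it links to.
page_topics = {"Page A": "siamese cat", "Page B": "siamese cat", "Page C": "siamese cat"}
links = {
    "Page A": ["Dr. Kat Scratch", "The Siamese Cat Bible"],
    "Page B": ["Dr. Kat Scratch", "The Siamese Cat Bible"],
    "Page C": ["Dr. Kat Scratch", "The Siamese Cat Bible"],
}

def inferred_topics(page_topics, links):
    """Credit each cited page with the topics of the pages that cite it."""
    topics = defaultdict(Counter)
    for source, targets in links.items():
        for target in targets:
            topics[target][page_topics[source]] += 1
    return topics

topics = inferred_topics(page_topics, links)
print(topics["Dr. Kat Scratch"].most_common(1))  # [('siamese cat', 3)]
```

Notice that Dr. Scratch’s page text appears nowhere in the input, yet the page still ends up associated with “siamese cat.”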

Co-citation is a phenomenon that marketers have little (if any) control over. But they do have some control, or at least some influence, over this:


Co-occurrence

The concept of “co-occurrence” assumes that search engines look at the other terms (words or phrases) used on the same page in order to understand the page’s meaning and contextual relevancy.

To return to our Siamese cat example, words like “feline,” “kitten,” “purr,” “seal point,” and “applehead” can reasonably be expected to co-occur on a page about Siamese cats. This co-occurrence probably influences how the search engine ranks the page. In fact, such a page may outrank one that is merely optimized for the phrase “Siamese cat.”
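Here’s a minimal Python sketch of document-level co-occurrence: for each term, count how many pages contain both that term and a target word. The three-page corpus and the crude regex tokenizer are assumptions for illustration; a real system would at least drop stop words and handle plurals.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: two pages about Siamese cats, one about pallets.
pages = [
    "Siamese kittens purr loudly; a seal point Siamese is a classic feline.",
    "The applehead Siamese has a rounder face than the modern Siamese feline.",
    "Pallet furniture is cheap to build from wooden shipping pallets.",
]

def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))

def co_occurrence(pages, target):
    """Count, per term, how many pages contain both that term and the target."""
    counts = Counter()
    for page in pages:
        words = tokens(page)
        if target in words:
            counts.update(words - {target})
    return counts

counts = co_occurrence(pages, "siamese")
print(counts["feline"], counts["pallet"])  # 2 0
```

“Feline” shows up on both Siamese pages, while “pallet” never shares a page with “siamese,” which is exactly the kind of signal the co-occurrence idea relies on.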

For another example, a search engine could reasonably expect certain words to appear in a blog post you write about the iWatch:


Never mind the search engine. Wouldn’t you, the reader, consider a story about the iWatch that contained the words “Apple,” “watch,” “iPhone,” “video,” “time,” and “apps” more authoritative than one that used none of those terms but sprinkled “iWatch” liberally throughout? You would, and Google likely does as well.
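One crude way to express that intuition in code is to measure what fraction of the expected co-occurring terms a story actually uses. This Python sketch assumes the six terms from the paragraph above are the “expected” set; both sample stories and the `coverage` helper are hypothetical, and no search engine is known to score pages this simply.

```python
import re

# Assumed set of terms a reader might expect alongside "iWatch" (from the text above).
EXPECTED = {"apple", "watch", "iphone", "video", "time", "apps"}

def coverage(text, expected=EXPECTED):
    """Fraction of the expected co-occurring terms that appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(expected & words) / len(expected)

rich = ("Apple's iWatch pairs with your iPhone to show the time, "
        "play video and run apps right on your watch.")
stuffed = "iWatch iWatch! The iWatch is the best iWatch for iWatch fans."

print(coverage(rich), coverage(stuffed))  # 1.0 0.0
```

The keyword-stuffed story scores zero despite repeating “iWatch” over and over, because the word “iWatch” itself is not one of the supporting terms.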

As people increasingly use conversational search, longer-tail terms, and voice recognition software, it’s more important than ever to understand how search engines are responding to these changes.

I first learned of co-citation and co-occurrence from a Whiteboard Friday with Rand Fishkin of Moz in October 2012. In the video, Rand explains his prediction of the “death” of anchor text, or rather, the more realistic idea that its signal will diminish as a ranking factor. He went on to give a few examples of websites that ranked for terms they weren’t actually optimized for on the page. Instead, the page was referenced (co-cited) alongside other related sites, and/or related terms co-occurred with other words on the page, providing relevancy and quality signals. That gave the pages contextual relevance for terms they weren’t optimized for, and the search engines reflected this relevance by ranking the pages accordingly.

It’s supposed to happen that way.

Matchmaker, Matchmaker, make me a match

In his lengthy but brilliant article about semantic SEO, Aaron Bradley’s ultimate advice to marketers is to let search engines be their matchmakers, where connecting you with your targets is less about “the words that are used to describe things” and more about “the thing being described.”

“Unique identifiers allow computers to talk about things: a unique identifier represents the actual thing that a word is talking about. Not a keyword, but the meaning underlying a keyword. This is a critical distinction…”.

Proof terms and relevant terms

The aforementioned SearchMetrics survey analyzes “proof terms” and “relevant terms.” Here’s the difference:

What are proof terms?

The assumption: proof terms are words that search engines like Google expect to find on a page about a given topic. In the case of our Siamese Cat Bible page, “kitten” or “seal point” could be proof terms.

Some experts suggest thinking of proof terms as the keywords your SEO team considered but that didn’t make the final list. At some point you simply can’t optimize a page for every possible combination of high-volume/low-competition phrases, especially without cannibalizing your other pages. Proof terms aren’t necessarily the most competitive or highest-volume terms; rather, they are natural terms you’d expect to co-occur.

How to implement proof terms

When writing about a topic, consider whether you are overusing a particular term. Why not include a few variations instead? Here’s an example:

Say you’re creating a page to answer the question “How do you make furniture out of pallets?” (a recent project of mine). You’ll surely include “pallets” on the page and will likely use that word many times. But consider also using words like “skid,” “structural foundation,” “containerization,” and “shipping containers.” Those terms naturally co-occur, especially within quality content, so it makes sense that search engines would expect them.

What are relevant terms?

“Relevant terms” are a little different. These are the terms you’d expect your SEO team to give you for the pallet page: “pallet furniture,” “patio furniture,” “outdoor pallets,” “wooden pallets,” and combinations of these terms.

Many copywriters are trained to integrate relevant terms, which is great, but don’t overdo it. The best intentions often result in over-optimized pages, and over-optimization can hurt your rankings.
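A quick way to catch over-optimization in a draft is to check how much of the text a single phrase takes up. This Python sketch computes a simple keyword density; the sample copy, the `keyword_density` helper, and the 5% warning threshold are arbitrary assumptions for illustration, not figures Google has published.

```python
import re

def keyword_density(text, phrase):
    """Share of the text's words taken up by exact occurrences of `phrase`."""
    words = re.findall(r"[a-z]+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of len(target) words across the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

draft = "Pallet furniture is easy. Build pallet furniture from wooden pallets."
density = keyword_density(draft, "pallet furniture")
THRESHOLD = 0.05  # hypothetical cut-off, chosen only for this example
print(density, density > THRESHOLD)  # 0.4 True
```

A 40% density on a tiny sample is obviously extreme; on a real page you’d compare the number against your own baseline rather than treat any single value as a rule.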

Holistic linguistics

“You shall know a word by the company it keeps” – John Rupert Firth, linguistics professor.

The study of linguistics dates back to about 600 BCE and was long (mostly) the province of academics. Some 2,500 years later, modern marketers are starting to master linguistic analysis, understanding language patterns and using that information to develop quality websites and content holistically: that is, naturally rather than forced.
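Firth’s idea, often called the distributional hypothesis, is easy to demonstrate: describe each word by the words that appear near it, then compare those descriptions. This Python sketch builds window-based context counts from a three-sentence toy corpus and compares them with cosine similarity; the corpus, the window size of 2, and the helper names are assumptions, and real systems typically work with far larger corpora.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus.
corpus = [
    "the siamese cat sat and purred",
    "the siamese kitten sat and purred",
    "the wooden pallet sat in the yard",
]

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["kitten"]), cosine(vecs["cat"], vecs["pallet"]))  # 1.0 0.5
```

“Cat” and “kitten” keep exactly the same company in this corpus, so they score as identical, while “pallet” shares only “the” and “sat” with “cat.”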

Whether you use co-citation, bibliographic coupling, lexical co-occurrence, or concepts others have developed over the years, study up, and let linguistic understanding guide your holistic campaigns.
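Bibliographic coupling is the mirror image of co-citation: two pages are coupled when they cite the same third page, and the more references they share, the stronger the coupling. Here’s a minimal Python sketch; the outbound-link data and the `coupling_strength` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical outbound links: page -> set of pages it cites.
references = {
    "Page A": {"Dr. Kat Scratch", "The Siamese Cat Bible", "Siamese Breed Standard"},
    "Page B": {"Dr. Kat Scratch", "The Siamese Cat Bible"},
    "Page C": {"Pallet Projects Weekly"},
}

def coupling_strength(references):
    """Number of shared references for every pair of citing pages."""
    return {
        (a, b): len(references[a] & references[b])
        for a, b in combinations(sorted(references), 2)
    }

print(coupling_strength(references))
# {('Page A', 'Page B'): 2, ('Page A', 'Page C'): 0, ('Page B', 'Page C'): 0}
```

Where co-citation asks who cites a pair of pages, coupling asks what a pair of pages cites.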

Places to learn more

Co-citation and co-occurrence are just two facets of a broader concept often called the “Semantic Web.” The goal is to provide not just information about things, but also the connections between things. A fully realized Semantic Web would be a better place for searchers, offering more relevance in less time and a richer knowledge experience. (Pretty exciting, eh?)

This article on the Vertical Measures site offers a healthy list of resources I recommend. Simple tools like the Google AdWords Keyword Tool and Google Suggest, among others, can help you identify relevant and proof terms, depending on how deep your research goes.

There is no one right way to integrate keywords, proof terms and relevant terms into the content you develop, but now is the right time to begin discovering what works for your pages. These sites can help you get started on the right path.

What’s your experience? Are you consciously working with proof terms and relevant terms yet? Have you developed a strategy?

Photo of “Coco” the Siamese cat by S zillayali, from Wikimedia Commons, used under the GNU Free Documentation License.