It’s called Knowledge-Based Trust (KBT), and it’s the subject of a research paper Google has outlined here, titled Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. The intention of the new metric is to estimate the trustworthiness of a web source by determining how accurate the information on its pages is. Current methods, such as links and browser history, help, but according to the research team, “…such signals mostly capture how popular a webpage is.”
The example showing the inability of current methods to accurately measure trust looks at top gossip websites. According to the research team, “the gossip websites listed here mostly have high PageRank scores, but would not generally be considered reliable”. This is where the research team expects the new metric to be most valuable. In other cases, the consensus seems to be that this new trust calculation would supplement the other factors Google currently uses to determine a web page’s or website’s overall quality.
So how exactly is Knowledge-Based Trust calculated?
The research first shows that testing on synthetic data confirms KBT can reliably measure the trustworthiness of a web source. This is important, because the next step involved applying the same model to 119M live web pages, using 2.8B facts extracted from the web. The extraction process is based on Google’s Knowledge Vault (KV) project, which most people know from the Knowledge Graph results Google includes at the top right of relevant search result pages.
According to Hal Hodson of New Scientist, “Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.”
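To make that intuition concrete, here is a deliberately simplified Python sketch of the idea Hodson describes: treat the majority value for each fact as the truth proxy, then score each source by how often its extracted facts agree with that consensus. This is a toy illustration only; the paper’s actual model is a probabilistic one that jointly estimates extraction errors and source accuracy, and all names and sample facts below are invented for the example.

```python
from collections import Counter, defaultdict

def consensus_facts(extractions):
    """Given (source, fact_key, value) triples, return the majority
    value for each fact_key as a crude proxy for truth."""
    votes = defaultdict(Counter)
    for source, key, value in extractions:
        votes[key][value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def trust_scores(extractions):
    """Score each source by the fraction of its extracted facts
    that agree with the consensus value."""
    truth = consensus_facts(extractions)
    agree, total = Counter(), Counter()
    for source, key, value in extractions:
        total[source] += 1
        if truth[key] == value:
            agree[source] += 1
    return {source: agree[source] / total[source] for source in total}

# Hypothetical extractions from three hypothetical sites.
extractions = [
    ("site_a", ("obama", "born_in"), "honolulu"),
    ("site_b", ("obama", "born_in"), "honolulu"),
    ("site_c", ("obama", "born_in"), "kenya"),       # contradicts consensus
    ("site_a", ("paris", "capital_of"), "france"),
    ("site_c", ("paris", "capital_of"), "france"),
]

scores = trust_scores(extractions)
# site_a and site_b agree with the consensus on everything they state;
# site_c gets one fact right and one wrong, so its score is lower.
```

In this toy version, a page contradicting the consensus (site_c above) ends up with a lower score and would, in Hodson’s phrasing, be “bumped down the rankings”.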
Concerns & Questions
- Non-trivialness: The example given was a website about Hindi movies. Should the fact that it lists each movie’s language as Hindi earn it a high Knowledge-Based Trust score, simply because that information is accurate? It seems Google needs a way to recognize such facts as trivial and discount their value.
- Relevance of site vs. KV extractions: Would a site whose main topic is business directories in South America, but whose KV extractions consist only of lists of South American cities and countries, benefit from a high Knowledge-Based Trust score? It seems there needs to be a way to determine that this information, although accurate, isn’t relevant to the main topic of the site, and should therefore be discounted.
There is also an article by Matt Southern in Search Engine Journal in which he expresses concern over topics where the Knowledge Vault might lack valid sources, particularly new technologies and discoveries. “If Google started to rely on Knowledge-Based trust to rank web pages, would it then focus additional effort on revising and updating the Knowledge Graph?” Southern asks in the article. A very good question, and one that will presumably be answered if this metric is ever implemented at any scale.