Google recently released a research paper describing a method for scoring web documents according to the accuracy of the facts they contain. The method is called “Knowledge-Based Trust” (KBT), and many online experts refer to it as the “Truth Algorithm.” It is said to be a new method by which Google assigns a Trust Score to help reduce the number of websites that contain wrong information.
If you own a website and disseminating information is part of your online marketing strategy, Google’s Truth Algorithm is something you should learn more about. According to an article in New Scientist, the tech giant is moving to “rank websites based on facts not links.” Google’s algorithm researchers are keen on identifying the key facts in a web page and scoring them for accuracy by assigning a trust score.
But before the Truth Algorithm can be applied to billions of web pages, Google’s research paper identifies certain issues that the tech giant must overcome. Discussed below are some of them.
No 1: Irrelevant noise. According to the paper, the Truth Algorithm relies on “Knowledge Triples,” facts broken into three parts (subject, predicate, and object) so that they can be checked for correctness. For example, the sentence “Barack Obama was born in Honolulu” yields the subject “Barack Obama,” the predicate “was born in,” and the object “Honolulu.” However, the paper notes that extracting Knowledge Triples from websites also produces irrelevant triples. To make KBT work, Google said it first needs to identify the main topic of a website and weed out information that diverges from that topic.
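To make the idea of a triple and of topic filtering concrete, here is a minimal Python sketch. It is not Google's method: the `Triple` type, the `main_topic` and `on_topic` helpers, and the sample data are all hypothetical, and the sketch assumes, very crudely, that a page's main topic is simply its most frequent subject.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """A fact split into subject, predicate, and object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def main_topic(triples):
    """Guess the main topic as the most frequent subject (a crude assumption)."""
    return Counter(t.subject for t in triples).most_common(1)[0][0]


def on_topic(triples):
    """Keep only the triples whose subject matches the guessed main topic."""
    topic = main_topic(triples)
    return [t for t in triples if t.subject == topic]


# Made-up triples, as if extracted from a single page about Barack Obama.
page_triples = [
    Triple("Barack Obama", "born in", "Honolulu"),
    Triple("Barack Obama", "profession", "politician"),
    Triple("Honolulu", "located in", "Hawaii"),
]

relevant = on_topic(page_triples)
for t in relevant:
    print(t)
```

In this toy example the triple about Honolulu is dropped as off-topic noise. A real system would need a far more robust notion of "topic" than counting subjects.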
No 2: Extraction technology needs improvement. KBT currently relies on an extractor, a system that identifies triples within a web page and assigns confidence scores to them, and the paper describes this extractor as having “limited extraction capabilities.” Google says it needs to improve its extractors so that they can identify triples with a high degree of certainty.
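As a rough illustration of what a confidence score on an extracted triple might look like, the Python sketch below keeps only extractions above a cutoff. The `Extraction` type, the `keep_confident` helper, the 0.7 threshold, and the sample scores are all assumptions made for this example and do not come from the paper.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    """An extracted triple plus the extractor's confidence (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed to range from 0.0 to 1.0


def keep_confident(extractions, threshold=0.7):
    """Discard extractions below the threshold (0.7 is an arbitrary choice)."""
    return [e for e in extractions if e.confidence >= threshold]


# Made-up output: one clean extraction and one garbled one in which the
# subject and object were swapped, which the extractor itself scores low.
extractions = [
    Extraction("Barack Obama", "born in", "Honolulu", 0.95),
    Extraction("Honolulu", "born in", "Barack Obama", 0.20),
]

kept = keep_confident(extractions)
print(kept)
```

The better the extractor, the more its confidence scores can be relied on to separate real facts from parsing mistakes, which is why the paper treats extraction quality as a prerequisite.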
No 3: Duplicate content. Another weakness of the current KBT algorithm is that it cannot reliably identify websites that contain facts copied from other sites. If Google’s Truth Algorithm cannot sort out duplicate content, the researchers said, KBT could be spammed by copying facts from “trusted” sources such as Wikipedia and Freebase.
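One simple way to picture the copying problem is to measure how much two sites' sets of triples overlap. The Python sketch below uses Jaccard similarity for this. That is a swapped-in, generic technique chosen for illustration, not something the paper is known to use, and the `jaccard` helper and sample triples are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up triples from a trusted source and from a site under suspicion.
trusted = {
    ("Barack Obama", "born in", "Honolulu"),
    ("Honolulu", "located in", "Hawaii"),
    ("Hawaii", "capital", "Honolulu"),
}
suspect = {
    ("Barack Obama", "born in", "Honolulu"),
    ("Honolulu", "located in", "Hawaii"),
    ("Some Site", "sells", "widgets"),
}

overlap = jaccard(trusted, suspect)
print(overlap)  # 2 shared triples out of 4 distinct ones -> 0.5
```

A high overlap does not prove copying on its own, since two honest sites can state the same true facts. That ambiguity is part of what makes the problem hard.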
No 4: Accuracy. As part of the study, Google’s researchers examined 100 random sites with low PageRank but high Knowledge-Based Trust scores to determine how well KBT identified high-quality sites compared with PageRank. Unfortunately, the results were not fully accurate. Of the 100 random high-trust sites they tested, 15 were errors: two sites were topically irrelevant, 12 scored high because of trivial triples, and one had both kinds of errors. The researchers concluded that these problems should also be addressed before the Truth Algorithm can be deployed across the World Wide Web.
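The figures above can be checked with a few lines of Python. The variable names are just for illustration, and the numbers are the ones quoted in this article.

```python
sampled = 100  # high-KBT, low-PageRank sites examined

# Error counts as reported above.
errors = {
    "topically irrelevant": 2,
    "trivial triples": 12,
    "both kinds of errors": 1,
}

total_errors = sum(errors.values())
correct_share = (sampled - total_errors) / sampled

print(total_errors)   # 15
print(correct_share)  # 0.85
```

In other words, 85 of the 100 sampled sites held up, which is encouraging but still leaves roughly one error in every seven sites.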