In this article we begin a series on the most accurate and most effective search methods for the world’s leading search engine, Google (and for others too). The methods we will discuss are a little more advanced, and since we should be improving our information literacy every day, the best approach is to try out some good search practices straight away.
Google is a typical (and also excellent) representative of the surface web (more on this here). Apart from actual searching it also makes available many additional web tools that use a practically endless quantity of data. You can get an idea of just how much data here.
Eliminating information waste
As we are going to examine Google’s main product — searching — let’s begin with a few words about the relationship between accuracy and completeness.
Accuracy = The number of relevant documents found relative to all documents found.
Completeness = The number of relevant documents found relative to all relevant documents (e.g. those that exist in the database).
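The two ratios can be tried out on a small, made-up example. The sketch below is only an illustration: the document identifiers and the function names `accuracy` and `completeness` are invented for this purpose, and completeness is measured against the relevant documents in the database, which is the usual reading in information retrieval (where the two measures are also known as precision and recall).

```python
def accuracy(found: set[str], relevant: set[str]) -> float:
    """Share of the found documents that are relevant."""
    if not found:
        return 0.0
    return len(found & relevant) / len(found)


def completeness(found: set[str], relevant: set[str]) -> float:
    """Share of all relevant documents in the database that were found."""
    if not relevant:
        return 0.0
    return len(found & relevant) / len(relevant)


# Hypothetical database in which four documents are relevant to our question.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A narrow question: few results, all of them relevant.
narrow_result = {"d1", "d2"}
# A broad question: every relevant document, plus some noise.
broad_result = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

print(accuracy(narrow_result, relevant_docs), completeness(narrow_result, relevant_docs))
print(accuracy(broad_result, relevant_docs), completeness(broad_result, relevant_docs))
```

In this toy case the narrow question is fully accurate but finds only half of the relevant documents, while the broad question finds all of them at the price of half the results being noise.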
This means that if we want to restrict our set of results exclusively to relevant documents (accuracy), we must ask our question in the most specific way possible. If we are aiming for completeness instead, we want to find the maximum possible number of documents within the given search parameters, albeit at the risk that some of the documents found will be less relevant. The following techniques also help to reduce information waste, which is especially worthwhile on the surface web. Here are two examples [in Czech with translations (for now), but obviously the same techniques apply to English searches]:
Question for accuracy:
"časopis čtenář" [“magazine reader”]
Question for completeness:
"oborové časopisy" (knihovnictví OR "informační věda") [“professional magazines” (librarianship or “information science”)]
We’ll come back to the detail of why the question is phrased in this way shortly. Also, don’t be surprised that časopis Čtenář [The Reader magazine] is written in lower case in the question. This is because most search engines are case-insensitive, i.e. they ignore whether a letter is upper or lower case. Now, we’ll look at both these questions in Google and you’ll also see one of the basic ways of making your searches significantly better. Here we go…
The magic of the phrase in Google (and elsewhere)
For our basic Google search, we’ll put in the question
časopis čtenář [magazine reader] without quotation marks.
As you can see, apart from the first, relevant document, you’ve also discovered hundreds of thousands of other documents. This is because Google has found all documents that contain the words
časopis AND(!!!!) čtenář [magazine AND reader]. In other words, along with časopis Čtenář [The Reader magazine] Google could also have found a web page about the fact that the reader Mr Novák enjoyed the magazine Pevnost (a Czech Sci-fi and Fantasy mag). Clearly, this has little in common with the well-known periodical on librarianship we’re searching for, but Google is not doing your thinking for you here. You’re the one who is determining what results you’ll receive. In any case, putting the question this way means that both words must be part of the document.
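The "both words must be part of the document" behaviour can be imitated in a few lines of Python. This is only a rough sketch under stated assumptions: the three documents are invented, the helper names `words` and `matches_all_words` are hypothetical, and a real search engine does far more (ranking, handling of inflected forms and so on).

```python
import re
import unicodedata


def words(text: str) -> list[str]:
    """Split text into words, ignoring upper and lower case."""
    normalized = unicodedata.normalize("NFC", text).casefold()
    return re.findall(r"\w+", normalized)


def matches_all_words(query: str, document: str) -> bool:
    """Return True when every query word occurs somewhere in the document."""
    document_words = set(words(document))
    return all(word in document_words for word in words(query))


# Invented toy documents, not real search results.
documents = {
    "magazine page": "Časopis Čtenář: nové číslo",
    "reader's blog": "Čtenář pan Novák si oblíbil časopis Pevnost",
    "unrelated page": "Knihovna má nového čtenáře",
}

hits = [name for name, text in documents.items()
        if matches_all_words("časopis čtenář", text)]
print(hits)
```

Both the magazine page and the blog about Mr Novák count as hits here, because the check looks only at whether each word is present, not at where it stands or what it stands next to.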
So, let’s take a look at another example that will move us up a level in our results. Now we’ll write
"časopis čtenář" [“magazine reader”] in our search field, and notice the quotation marks here. This time, the number of results is dramatically smaller. Why is this?
Here, we have used the
"phrase" search technique for our Google search. The principle is very simple: whatever you put inside the quotation marks will be searched for exactly as written. In our example of časopis Čtenář / The Reader magazine, the system returns results for documents that must contain the whole phrase
"časopis čtenář" somewhere. Searching with a phrase significantly increases the accuracy of our search.
Now we will look at an example related to completeness. Let’s put the following question in our search field:
"oborové časopisy" (knihovnictví OR "informační věda") [“professional magazines” (librarianship or “information science”)]
You can probably guess how the system (Google) will approach the first part,
"oborové časopisy" [“professional magazines”]. That’s right, it’s going to search for exactly this phrase. But what about the second part:
(knihovnictví OR "informační věda") [(librarianship OR “information science”)]? The parentheses make up a logical whole consisting of two parts:
knihovnictví OR "informační věda" [librarianship OR “information science”]. This means that Google will search in the document either for the word
knihovnictví OR the phrase "informační věda".
Translated into natural language, the whole question reads as follows: search for the phrase
"oborové časopisy" together with either the keyword
knihovnictví or with the phrase
"informační věda" or with both together.
Among the almost 160 000 results we can now see some highly relevant ones, i.e. an overview of professional periodicals focused on librarianship and information science. In addition, we have struck a good balance between completeness and accuracy in our search. We have found information not just about the existence of časopis Čtenář (The Reader magazine), but also about dozens of other sources in related areas.
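The logic of the two questions from this article can be replayed on a handful of toy documents. The sketch below is a simplified illustration, not a description of how Google works internally: the documents are invented, the helper names `words`, `has_phrase`, `accuracy_question` and `completeness_question` are hypothetical, and matching is exact, with no handling of inflected word forms.

```python
import re
import unicodedata


def words(text: str) -> list[str]:
    """Split text into words, ignoring upper and lower case."""
    normalized = unicodedata.normalize("NFC", text).casefold()
    return re.findall(r"\w+", normalized)


def has_phrase(phrase: str, document: str) -> bool:
    """Return True when the phrase words occur side by side, in the same order."""
    p, d = words(phrase), words(document)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def accuracy_question(document: str) -> bool:
    """Hard-coded form of: "časopis čtenář"."""
    return has_phrase("časopis čtenář", document)


def completeness_question(document: str) -> bool:
    """Hard-coded form of: "oborové časopisy" (knihovnictví OR "informační věda")."""
    return has_phrase("oborové časopisy", document) and (
        has_phrase("knihovnictví", document)
        or has_phrase("informační věda", document)
    )


# Invented toy documents, not real search results.
documents = {
    "overview": "Oborové časopisy: knihovnictví a informační věda",
    "library list": "Oborové časopisy pro knihovnictví",
    "medical list": "Oborové časopisy pro lékaře",
    "magazine page": "Časopis Čtenář: nové číslo",
    "reader's blog": "Čtenář pan Novák si oblíbil časopis Pevnost",
}

phrase_hits = [name for name, text in documents.items() if accuracy_question(text)]
boolean_hits = [name for name, text in documents.items() if completeness_question(text)]
print(phrase_hits)
print(boolean_hits)
```

The phrase question keeps only the magazine page and drops the blog about Mr Novák, while the second question keeps every list of professional magazines that mentions librarianship, information science or both, and leaves out the medical list.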
- Accuracy is the number of relevant documents found relative to all documents found.
- Completeness is the number of relevant documents found relative to all relevant documents (e.g. those that exist in the database).
- Searching with a “phrase” in quotation marks is a basic tool for a more accurate and efficient search.