Google’s crawlers index many file types, not just web pages. In today’s episode of our series on advanced Google search techniques, we’ll look at how to find interesting information hidden in Excel (XLS) files.
We can tell Google to search for specific files with the special operator filetype:, followed by the required file extension. (If you would like an overview of the searchable file types, you can find it here.) To begin with, let’s try to mine Google for a contact database containing personal information.
Thousands of contacts within reach
For this exercise we’ll use the XLS extension, i.e. files produced by MS Excel. Let’s search:
"contact list" filetype:xls -template
This tells Google to search its index for XLS files that are named “contact list” or contain this phrase, while leaving out results that mention the word “template”.
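The query above can also be assembled in code, which is handy when you want to try several phrases or extensions. The following is a minimal Python sketch, not an official Google API: the helper name build_query is hypothetical, and the https://www.google.com/search?q=... URL form is an assumption based on how the search page is commonly addressed in a browser.

```python
from urllib.parse import urlencode


def build_query(phrase, filetype, exclude=()):
    """Assemble a query: quoted phrase, filetype: operator, excluded words.

    build_query is a hypothetical helper used only for illustration.
    """
    parts = [f'"{phrase}"', f"filetype:{filetype}"]
    # Each excluded word is prefixed with a minus sign.
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


query = build_query("contact list", "xls", exclude=["template"])

# Assumption: the search page accepts the query in the "q" parameter.
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
```

The printed URL can be pasted into a browser. The sketch only builds the string and does not send any request.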
On the second page of results we can see the following record:
If we look through this file we will find hundreds of names of health care professionals working for leading companies. E-mail addresses, telephone numbers… they’re all there. But that’s not all. Let’s make our task a little more difficult and find a list of American journalists with contact information. Here’s the strategy we’ll take:
"media * *" intext:"new york times" filetype:xls -template (site:com OR site:us OR site:gov)
Syntax in detail
Here’s an analysis of the complete syntax in the individual blocks.
"media * *" (This part addresses the potential name of the file, for example “media contact list”, “media phone lists”, “media contact database”, and so on. The asterisk * substitutes for one word in Google searches.)
intext:"new york times" (We’ve chosen the name of a media source that MUST be part of this database, which in turn assumes the occurrence of OTHER similar media sources.)
filetype:xls (We want XLS files.)
-template (We don’t want the word “template” to appear in the file. This should eliminate empty files.)
(site:com OR site:us OR site:gov) (We’re searching only in the COM, US and GOV domains.)
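The individual blocks described above can be combined programmatically. The sketch below is one possible way to do it in Python. The function names site_group and compose are hypothetical, chosen only for this illustration, and the code does nothing more than join the operators from the article into a single string.

```python
def site_group(domains):
    """Join several site: restrictions with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f"site:{d}" for d in domains) + ")"


def compose(title_pattern, must_contain, filetype, exclude, domains):
    """Build the compound query block by block (hypothetical helper)."""
    blocks = [
        f'"{title_pattern}"',          # likely file name, with * wildcards
        f'intext:"{must_contain}"',    # phrase that must occur in the file
        f"filetype:{filetype}",        # required file extension
        *[f"-{w}" for w in exclude],   # words that must not occur
        site_group(domains),           # allowed top-level domains
    ]
    return " ".join(blocks)


query = compose(
    title_pattern="media * *",
    must_contain="new york times",
    filetype="xls",
    exclude=["template"],
    domains=["com", "us", "gov"],
)
print(query)
```

Keeping each block as a separate argument makes it easy to swap one part, for example a different media source or a different set of domains, without retyping the whole query.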
We can see two interesting results immediately:
This file isn’t all that relevant for our needs. The second file is much more interesting:
- The Google operator filetype: enables us to search for many file types indexed by this search engine.
- Combining this operator with a specific phrase lets us search various databases full of names and contact information.