TORONTO — Search engines have become indispensable tools for hundreds of millions of people, including hackers, who use them to find vulnerable networks, machines and services with relative ease.
In a packed seminar at the Infosecurity Canada conference here, Nish Bhalla, founder of security firm Security Compass and a man who has literally helped write the book on hacking applications, detailed how valuable a tool search engines are to both black hats and white hats.
He should know. He is a contributing author to Windows XP Professional Security, HackNotes: Network Security, Writing Security Tools and Exploits, and Hacking Exposed: Web Applications, 2nd Edition.
Bhalla argued that security professionals and network administrators need to pay attention to search engine hacking approaches. More importantly, he said, gathering information through search engine hacking has to be performed as part of a proper Web application vulnerability testing methodology.
“Threat analysis is the act of identifying threats against anything in your
environment,” Bhalla said. “That includes Google hacking.”
Bhalla wasn’t suggesting attendees use the information to attack third-party networks, of course, but rather to help security professionals understand what’s already out there.
But for those who would use the information to perform
some kind of penetration testing, the security guru gave a stern warning.
“Don’t attack unless you have permission,” Bhalla said. “Or you could have
problems.”
Over the course of the hour-long session, Bhalla illustrated how search engine queries could yield results that hackers could use to devastate networks.
Passwords, misconfigurations, buffer overflows, Web application vulnerabilities: search engines help find all of them, he said.
“Search engines crawl sites, and some information that you don’t want exposed
is also exposed because of lack of knowledge of what is on the systems,”
Bhalla explained.
Though Bhalla repeatedly used the term “Google hacking” for search-engine-based hacking approaches, he was quick to note that it’s not just Google that
hackers can use.
“It’s not Google that is at fault; you can use any search engine,” Bhalla
said. “Though the syntax is different on different engines, you just
have to go to the advanced options and it will tell you.”
The first thing that search engine hackers are likely to do is use the search
engine to look for potential points of entry.
Bhalla noted hackers are
searching for ports, server identification profiles, vulnerability scan
reports and devices.
By using a search engine to discover those points of
entry, as opposed to doing a traditional port scan using an application, the
hacker will not alert the company they are targeting and they won’t trip any
kind of intrusion-detection system, either.
Bhalla’s excursion into the murky depths of search-engine hacking drew gasps from the crowd and more than a few concerned looks.
By using advanced query operators such as “intitle” (which matches terms in the page title), “inurl” (terms in the URL), “ext” (file extensions) and “filetype”, Bhalla swiftly demonstrated how entry points, confidential files, passwords, misconfigurations and network vulnerabilities could be discovered with a search engine.
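The operators are easier to appreciate with a concrete query or two. As a purely illustrative sketch (the queries and the example.com domain below are hypothetical, not drawn from Bhalla’s demonstration), a defender auditing a domain they own might combine those operators with Google’s “site:” restriction:

```python
# Illustrative "dork"-style queries a defender might run against a domain they
# own -- note the site: scoping. The queries and example.com are hypothetical.
from urllib.parse import quote_plus

EXAMPLE_QUERIES = [
    'site:example.com intitle:"index of"',       # exposed directory listings, by page title
    'site:example.com inurl:admin',              # admin or portal login pages, by URL
    'site:example.com ext:log',                  # stray log files, by extension
    'site:example.com filetype:xls "password"',  # spreadsheets that mention passwords
]

for query in EXAMPLE_QUERIES:
    # Print a ready-to-paste search URL for each audit query
    print(f"https://www.google.com/search?q={quote_plus(query)}")
```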
The actual search engine ‘hack’ is relatively simple in concept. By combining the advanced query operators with the names of ports, common portal login pages and even Outlook .PST files, the (not-so-)confidential information of networks and individuals is returned in the engines’ public search results.
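For testers who do have that permission, such checks can also be automated. Below is a minimal sketch, assuming a Google Programmable Search Engine ID and API key (placeholders here) and the Custom Search JSON API; it is not the tooling Bhalla showed:

```python
# Minimal sketch: run a few audit queries through Google's Custom Search JSON
# API. API_KEY and ENGINE_ID are placeholders; only query domains you have
# written permission to test.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"      # placeholder
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder

def search(query):
    # Return the list of indexed results for one query
    params = urllib.parse.urlencode({"key": API_KEY, "cx": ENGINE_ID, "q": query})
    url = f"https://www.googleapis.com/customsearch/v1?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("items", [])

for query in ('site:example.com inurl:login', 'site:example.com ext:pst'):
    for item in search(query):
        # Anything printed here is already sitting in a public index
        print(query, "->", item["link"])
```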
So what’s a stressed-out, already overburdened IT person to do to protect themselves and their networks?
Bhalla noted that the most obvious measure is a robots.txt file, a file on the Web server that tells search engine crawlers what they are and are not allowed to crawl.
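For readers unfamiliar with the file, here is a rough sketch of what it looks like (the paths are hypothetical), along with a check, using Python’s standard urllib.robotparser, of what a well-behaved crawler would skip:

```python
# Hypothetical robots.txt and a check, via the standard library's
# urllib.robotparser, of which URLs a compliant crawler would skip.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/index.html", "http://example.com/backups/db.sql"):
    print(url, "crawlable?", rp.can_fetch("*", url))
# The index page comes back True, the backups path False -- but only for
# crawlers that choose to honour the file.
```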
Then again, since the robots.txt file itself is publicly readable, Bhalla noted, a hacker can simply open it up to see exactly what you don’t want others to see. To add insult to injury, not all search engines obey robots.txt.
According to Bhalla, archive.org, which attempts to maintain a complete archive of everything on the Internet, does not follow robots.txt.
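One way to gauge that exposure is to ask the archive what it has already captured for a domain you own. The sketch below assumes the Internet Archive’s public CDX search endpoint and is not something demonstrated in the session:

```python
# Rough sketch: list a few captures the Internet Archive holds for a domain.
# The endpoint and parameters follow the archive's public CDX API; treat the
# details as assumptions and example.com as a placeholder.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "url": "example.com/*",  # everything under the domain
    "output": "json",
    "limit": "20",
})
with urllib.request.urlopen(f"http://web.archive.org/cdx/search/cdx?{params}") as resp:
    rows = json.load(resp)

# The first row is a header; later rows are individual captures
for row in rows[1:]:
    print(row[1], row[2])  # capture timestamp, original URL
```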
The real keys are vigilance and awareness, according to Bhalla. That, and an onion approach to security rather than an egg: instead of a hard outer shell around a soft interior, have layers of security.