There are several common misconceptions about how to solve the Google
Hacking problem:
- Block search engines from crawling the site, or portions of it.
- Run an automated scan of the website to find the vulnerabilities.
- Manually test the site and try every Google hack to see if the site is
vulnerable.

The main reason not to block search engines is, obviously, that you need
your site indexed by these free advertisers. Millions of users query these
search engines every day, and most likely your company cannot afford not
having its data in there. Another reason is simple.
Hackers can run their own crawls whenever they want. Nothing prevents them
from downloading their own crawler and pointing it at your site to find what
they're looking for. Search engines only speed up the process. If the
information is on the Internet, hackers will eventually find it.
Lastly, the only [easy] way to block search engines from part of your site
is by using a robots.txt file, which is a double-edged sword. Most hackers
actually look for this file because it tells them which areas of the site
are of particular interest.
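
To see why robots.txt cuts both ways, consider what the file exposes to anyone who requests it. The sketch below is a minimal illustration, not a complete parser: it pulls the Disallow paths out of a robots.txt body. The function name `disallowed_paths` and the sample file contents are hypothetical, invented for this example.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow rules, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # An empty Disallow value blocks nothing, so it is skipped.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # nightly dumps
Disallow:
Allow: /public/
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE_ROBOTS_TXT):
        print(path)
```

Run against the sample, this prints /admin/ and /backup/, which is precisely the list of directories the site owner hoped nobody would look at.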

Automated scans are great for detecting known vulnerabilities and
easy-to-notice information disclosures, but there are several
vulnerabilities that they simply cannot detect. Failing to find these
vulnerabilities automatically is not a bug in the scanner; it is a
limitation of computers, which cannot judge a site the way a human can. When a
crawler moves along a website it has no idea if it found a part of the website
it was not supposed to access.
It has no idea if it found an admin section
where the user can control the entire application. It has no idea if it found a
list of all email addresses in your company. If a scanner tried to make
these judgments automatically, it would produce a myriad of false positives
[false alarms] and, worse yet, an enormous number of false negatives [missed
vulnerabilities]. False negatives are much worse because they can lead the
tester into falsely believing that the site is secure. Instead, what the
scanner can and must do is present all of the information it finds as
clearly as possible so the user can easily sift through the data.
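
One simple way a tool can present its findings for human review, rather than guessing at what is sensitive, is to group every crawled URL by its top-level directory. The following is a sketch under stated assumptions: the function name `group_by_top_level_path` and the example.com URLs are hypothetical, and a real scanner would report far more than URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_top_level_path(urls):
    """Group URLs by the first segment of their path, sorted by key."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # URLs at the site root are grouped under "/".
        key = "/" + segments[0] if segments else "/"
        groups[key].append(url)
    return dict(sorted(groups.items()))


# Hypothetical crawl results, invented for illustration only.
CRAWLED_URLS = [
    "https://example.com/",
    "https://example.com/admin/login.php",
    "https://example.com/admin/users.php",
    "https://example.com/staff/emails.txt",
]

if __name__ == "__main__":
    for section, members in group_by_top_level_path(CRAWLED_URLS).items():
        print(f"{section} ({len(members)} pages)")
        for url in members:
            print(f"    {url}")
```

The code makes no attempt to decide whether /admin or /staff is a problem. It only arranges the data so that a person who knows the site can spot what should not be public.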

Some might think that the best thing to do is to try every Google hack in the
book and see if any results turn up. While this might make sense, it leads
to a variety of problems. You never know what queries people will come up
with next, and many queries are never published. Your site has to be on the
Internet for search engines to see it. Once it is indexed, you're racing
every hacker online at the same time - good luck. In addition, Google is not
the only search engine. Each one has a different database and different
query options. You would have to try every single one to be sure you were
safe.

The only way to reliably prevent Google Hacking is to beat the hackers to the
data.