Only a human looking at the data can determine whether it's of any
significance or whether it should be accessible. This
gets easier over time, but here are some pointers to
get you started.
WebInspect has such a view, and although most users
barely look at this section, it is very important
when trying to prevent Google Hacking. The site tree is where WebInspect
shows you every piece of data it found: every directory, every file, and
every attack it made against the website. What you're seeing is the
crawler's view of the website, shown in a file-browser fashion similar to
Windows Explorer.
This is where most users glance
over the data, verify that the crawler found the pages it needed to, and
then look at nothing else that was found. Yet it is here that you can
find parts of the application that a user was never intended to reach.
First, look through the folders. Most websites are arranged so that major
sections live in their own folders.
Look for folders like "/admin/" which a crawler should
not have accessed. If one shows up, the crawler must have found a link to it
somewhere.
The folders will tell you which areas of the application are of interest. I
usually view all of the responses on the main folders to see how they look.
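If the crawled URLs can be exported as a plain list, the folder check can be
roughed out in a few lines of Python. This is only a sketch: the export step,
the function name and the watch list of folder names are all assumptions made
for illustration, not part of WebInspect.

```python
from urllib.parse import urlparse

# Hypothetical watch list; adjust it to the application being reviewed.
SENSITIVE_FOLDERS = {"admin", "backup", "test", "private", "config"}


def flag_sensitive_folders(urls, watch=SENSITIVE_FOLDERS):
    """Return the sorted folder paths whose last segment is on the watch list."""
    hits = set()
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        # Treat the last segment as a file unless the path ends with "/".
        # A folder requested without a trailing slash is therefore missed.
        if segments and not path.endswith("/"):
            segments = segments[:-1]
        for depth, segment in enumerate(segments, start=1):
            if segment.lower() in watch:
                hits.add("/" + "/".join(segments[:depth]) + "/")
    return sorted(hits)


crawled = [
    "http://example.com/index.aspx",
    "http://example.com/admin/users.aspx",
    "http://example.com/shop/Backup/old.zip",
    "http://example.com/admin/",
]
for folder in flag_sensitive_folders(crawled):
    print(folder)
```

A hit only tells you where to look first. You still have to open the
responses and decide whether the folder should have been reachable.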
Examine the filenames next. Which pages was the crawler able to find? What
are the names and locations? Should the bot have been able to access those
files?
Look for key filenames such as "login.aspx" as they might lead you to
login forms that you did not know existed on the site. Also look for filetypes
that differ from the main files, e.g. if you have an ".html" file in a directory
with a bunch of ".aspx" files this could be a clear indicator of a file that
does not belong. When I scan these files I always look for exactly that: what
does not belong.
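The "what does not belong" check can be sketched in Python as well, again
assuming you have the crawled URLs as a plain list. The function names, the
list of key names and the "more than half the files" rule for picking a
directory's main filetype are my own assumptions for illustration.

```python
import posixpath
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical key names worth a second look; extend as needed.
KEY_NAMES = ("login", "admin", "password")


def find_key_files(urls, key_names=KEY_NAMES):
    """Return the sorted URLs whose filename contains one of the key names."""
    return sorted(
        url for url in urls
        if any(key in posixpath.basename(urlparse(url).path).lower()
               for key in key_names)
    )


def find_extension_outliers(urls):
    """Return the sorted paths whose extension differs from the directory's main one."""
    by_dir = defaultdict(list)
    for url in urls:
        directory, filename = posixpath.split(urlparse(url).path)
        if filename:
            by_dir[directory or "/"].append(filename)

    outliers = []
    for directory, names in by_dir.items():
        extensions = Counter(posixpath.splitext(n)[1].lower() for n in names)
        dominant, count = extensions.most_common(1)[0]
        # Only trust the main filetype when it covers more than half the files.
        if count * 2 <= len(names):
            continue
        for name in names:
            if posixpath.splitext(name)[1].lower() != dominant:
                outliers.append(posixpath.join(directory, name))
    return sorted(outliers)


crawled = [
    "http://example.com/app/default.aspx",
    "http://example.com/app/cart.aspx",
    "http://example.com/app/login.aspx",
    "http://example.com/app/notes.html",
]
print(find_key_files(crawled))
print(find_extension_outliers(crawled))
```

Anything this turns up is a candidate for manual review, not a finding on its
own. Plenty of sites legitimately mix filetypes in one directory.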
Next, check out the content.
If you can, view the response rendered, the way a web browser
would show it. See if you can find areas of the website that don't
look normal. Frequently, vulnerable pages don't have the same style sheets
applied so they will look much different than the normal pages of the
application. These are often pages that were left over, were never meant to
be deployed with the application to begin with, or which the bot simply
should not have been able to retrieve.
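The style sheet clue can be checked mechanically too, as a first pass before
looking at the rendered pages. The sketch below assumes you have saved the
responses as a mapping of URL to HTML and that the site's normal pages all
link one known style sheet. The names and the "/css/site.css" path are
hypothetical.

```python
from html.parser import HTMLParser


class StylesheetCollector(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> in a page."""

    def __init__(self):
        super().__init__()
        self.stylesheets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "stylesheet" in rel and attributes.get("href"):
            self.stylesheets.add(attributes["href"])


def stylesheets_in(html):
    """Return the set of style sheet hrefs linked from the given HTML."""
    collector = StylesheetCollector()
    collector.feed(html)
    collector.close()
    return collector.stylesheets


def pages_missing_site_css(pages, expected_css):
    """Return the sorted URLs whose HTML does not link the expected style sheet."""
    return sorted(
        url for url, html in pages.items()
        if expected_css not in stylesheets_in(html)
    )


pages = {
    "/index.aspx": '<html><head><link rel="stylesheet" href="/css/site.css">'
                   "</head><body>Welcome</body></html>",
    "/old/report.html": "<html><body><h1>Report</h1></body></html>",
}
print(pages_missing_site_css(pages, "/css/site.css"))
```

This compares href values literally, so a page that links the same style
sheet through a relative path would be flagged as well. Treat the output as a
list of pages to render and look at, nothing more.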