How Google Indexer Works

Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. The index is sorted alphabetically by search term, with each index entry storing a list of the documents in which the term appears and the locations within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.
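
The structure described above can be sketched in a few lines. This is only an illustration, assuming whitespace tokenization and an in-memory dictionary; the names build_index and docs are hypothetical and are not taken from Google's implementation.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [word positions]} (illustrative only)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    # Order entries by term, mirroring the alphabetical ordering in the text.
    return dict(sorted(index.items()))

# Hypothetical two-document collection.
docs = {1: "google indexes the web", 2: "the web is large"}
index = build_index(docs)
print(index["web"])  # {1: [3], 2: [1]}
```

Looking up a query term is then a single dictionary access that returns both the matching documents and the positions of the term inside each one.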

To improve search performance, Google ignores (does not index) common words called stop words (such as the, is, on, or, of, how, and why, as well as certain single digits and single letters). Stop words are so common that they do little to narrow a search, so they can safely be discarded. The indexer also ignores some punctuation and multiple spaces and converts all letters to lowercase, again to improve performance.
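
A rough sketch of this normalization step follows. The stop list holds only the examples named in the text, and dropping every single-character token is a simplifying assumption (the text says only "certain" single digits and letters are ignored); normalize and STOP_WORDS are hypothetical names.

```python
import re

# Assumed stop list: only the examples given in the text above.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def normalize(text):
    """Lowercase, strip punctuation and extra spaces, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Simplification: discard all single-character tokens, not just some.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]

print(normalize("How  is the Web indexed, and WHY?"))
# ['web', 'indexed', 'and']
```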

Indexing the Web
Google Parsing
Parsing - Any parser designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors that challenge anyone's imagination to come up with equally creative ones. For maximum speed, instead of using YACC to generate a context-free grammar (CFG) parser, we use flex to generate a lexical analyzer, which we outfit with its own stack. Developing a parser that runs at a reasonable speed and is very robust involved a fair amount of work.
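
To give a feel for the "lexical analyzer with its own stack" idea, here is a small sketch. It uses Python's re module rather than flex, and the error-recovery rules (ignore unmatched closing tags, skip a stray "<") are assumptions chosen for illustration, not the behavior of the parser described above. It also does not special-case void elements such as br.

```python
import re

# One pattern per token kind: a tag, a run of text, or a stray "<".
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)|(<)")

def lex(html):
    """Yield (open_tags, text) for each text run, tolerating bad markup."""
    stack = []  # explicit stack of currently open tag names
    for match in TOKEN.finditer(html):
        closing, tag, text, _stray = match.groups()
        if tag:
            tag = tag.lower()
            if not closing:
                stack.append(tag)
            elif tag in stack:
                # Pop back to the matching open tag, closing unclosed children.
                while stack.pop() != tag:
                    pass
            # A closing tag with no matching open tag is ignored.
        elif text and text.strip():
            yield (tuple(stack), text.strip())
        # A stray "<" produces no token.

tokens = list(lex("<p>Hello <b>world</i></p> tail"))
print(tokens)
# [(('p',), 'Hello'), (('p', 'b'), 'world'), ((), 'tail')]
```

Because the stack is managed by hand, malformed nesting never raises an error; the lexer simply recovers and keeps scanning.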
Google Indexing Documents
Indexing Documents into Barrels - After each document is parsed, it is encoded into a number of barrels. Every word is converted into a wordID by using an in-memory hash table, the lexicon. New additions to the lexicon hash table are logged to a file. Once the words are converted into wordIDs, their occurrences in the current document are translated into hit lists and are written into the forward barrels. The main difficulty with parallelization of the indexing phase is that the lexicon needs to be shared. Instead of sharing the lexicon, we took the approach of writing a log of all the extra words that were not in a base lexicon, which we fixed at 14 million words. That way multiple indexers can run in parallel, and the small log file of extra words can then be processed by one final indexer.
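
The base-lexicon-plus-log approach might look roughly like the sketch below. It assumes hits are simple (docID, wordID, position) tuples, that base wordIDs are dense integers starting at 0, and that barrels and logs are plain lists; index_document and final_pass are hypothetical names.

```python
def index_document(doc_id, words, base_lexicon, forward_barrel, extra_log):
    """Emit hits for known words; log unknown words instead of sharing state."""
    for position, word in enumerate(words):
        word_id = base_lexicon.get(word)
        if word_id is None:
            # Not in the fixed base lexicon: defer to the final indexer.
            extra_log.append((doc_id, word, position))
        else:
            forward_barrel.append((doc_id, word_id, position))

def final_pass(base_lexicon, forward_barrel, extra_log):
    """Single final indexer: assign wordIDs to logged words, emit their hits."""
    lexicon = dict(base_lexicon)
    for doc_id, word, position in extra_log:
        # Assumes base wordIDs are 0..n-1, so len(lexicon) is the next free ID.
        word_id = lexicon.setdefault(word, len(lexicon))
        forward_barrel.append((doc_id, word_id, position))
    return lexicon

base = {"google": 0, "web": 1}
barrel, log = [], []
index_document(7, ["google", "crawls", "web", "crawls"], base, barrel, log)
lexicon = final_pass(base, barrel, log)
print(lexicon)  # {'google': 0, 'web': 1, 'crawls': 2}
print(barrel)   # [(7, 0, 0), (7, 1, 2), (7, 2, 1), (7, 2, 3)]
```

Since index_document only reads the base lexicon, many copies of it can run at once; only final_pass writes new wordIDs.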
Google Sorting
Sorting - In order to generate the inverted index, the sorter takes each of the forward barrels and sorts it by wordID to produce an inverted barrel for title and anchor hits and a full-text inverted barrel. This process happens one barrel at a time, thus requiring little temporary storage. Also, we parallelize the sorting phase to use as many machines as we have simply by running multiple sorters, which can process different buckets at the same time. Since the barrels don't fit into main memory, the sorter further subdivides them into baskets, based on wordID and docID, which do fit into memory. Then the sorter loads each basket into memory, sorts it, and writes its contents into the short inverted barrel and the full inverted barrel.
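
The basket idea can be illustrated as follows. This sketch partitions by wordID range only (the text partitions on wordID and docID), keeps everything in memory, and produces a single inverted barrel; sort_barrel and basket_size are hypothetical names.

```python
from collections import defaultdict

def sort_barrel(forward_barrel, basket_size):
    """Turn (docID, wordID, position) hits into wordID -> [(docID, position)]."""
    baskets = defaultdict(list)
    for doc_id, word_id, position in forward_barrel:
        # Each basket covers a contiguous range of wordIDs.
        baskets[word_id // basket_size].append((word_id, doc_id, position))
    inverted = {}
    for key in sorted(baskets):
        # Only one basket needs to be sorted (held in memory) at a time.
        for word_id, doc_id, position in sorted(baskets[key]):
            inverted.setdefault(word_id, []).append((doc_id, position))
    return inverted

forward = [(1, 5, 0), (1, 2, 1), (2, 5, 0), (2, 0, 3)]
inverted = sort_barrel(forward, basket_size=4)
print(inverted)  # {0: [(2, 3)], 2: [(1, 1)], 5: [(1, 0), (2, 0)]}
```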
Google URL Resolver
URL Resolver - The URL resolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links, which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.
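
A minimal sketch of this step, using urljoin from Python's standard library to resolve relative URLs. The anchors tuple layout, the sequential docID assignment, and the name resolve_anchors are assumptions made for the example.

```python
from urllib.parse import urljoin

def resolve_anchors(anchors, doc_ids):
    """anchors: iterable of (source_url, href, anchor_text) tuples."""
    links = []        # (source docID, target docID) pairs
    anchor_hits = {}  # target docID -> anchor texts pointing at it
    for source_url, href, text in anchors:
        target_url = urljoin(source_url, href)  # relative -> absolute
        src = doc_ids.setdefault(source_url, len(doc_ids))
        dst = doc_ids.setdefault(target_url, len(doc_ids))
        links.append((src, dst))
        # Anchor text is credited to the page the link points to.
        anchor_hits.setdefault(dst, []).append(text)
    return links, anchor_hits

anchors = [
    ("http://example.com/a/index.html", "../b.html", "page b"),
    ("http://example.com/b.html", "/a/index.html", "home"),
]
doc_ids = {}
links, anchor_hits = resolve_anchors(anchors, doc_ids)
print(links)        # [(0, 1), (1, 0)]
print(anchor_hits)  # {1: ['page b'], 0: ['home']}
```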
Google Sorter
Sorter - The sorter takes the forward index, which is sorted by docID, and re-sorts it by wordID to generate the inverted index. This is done in place so that little temporary space is needed for this operation. The sorter also produces a list of wordIDs and offsets into the inverted index. A program called dumplexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher.
Searcher
Searcher - The searcher is run by a web server and uses the lexicon built by dumplexicon together with the inverted index and the PageRanks to answer queries.
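
How these three inputs fit together can be sketched as below. Requiring every query word to match and ordering results by PageRank alone are simplifications assumed for this example; the text does not describe the actual ranking function, and the name search and the sample data are hypothetical.

```python
def search(query, lexicon, inverted, pageranks):
    """Return docIDs containing every query word, highest PageRank first."""
    doc_sets = []
    for word in query.lower().split():
        word_id = lexicon.get(word)
        if word_id is None:
            return []  # a word unknown to the lexicon matches nothing
        doc_sets.append({doc_id for doc_id, _ in inverted.get(word_id, [])})
    if not doc_sets:
        return []
    matches = set.intersection(*doc_sets)
    # Simplified ranking: PageRank only.
    return sorted(matches, key=lambda d: pageranks.get(d, 0.0), reverse=True)

lexicon = {"google": 0, "web": 1}
inverted = {0: [(1, 0), (2, 4)], 1: [(2, 1), (3, 0)]}
pageranks = {1: 0.2, 2: 0.5, 3: 0.3}
print(search("google web", lexicon, inverted, pageranks))  # [2]
print(search("web", lexicon, inverted, pageranks))         # [2, 3]
```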