I wrote this essay about Google in 2013, when I was an undergraduate student. Rereading it, I still find it refreshing and easy for general readers to digest, which is why I dare to share it even though seven years have passed. I believe the knowledge is still useful, with the caveat that the technology has advanced since then.
Google's history began when two Ph.D. students at Stanford University, Larry Page and Sergey Brin, set out to build an efficient search engine that gave users relevant links in response to search requests. Search is still Google's core purpose, but Google also provides a variety of applications, ranging from email to document storage (Jonathan Strickland, 2013).
Because of Google and other search engines, the Internet is a place to search for information from anywhere, instantly, at any time, and at low cost. Without these search engines it would be practically impossible to find the information you need when you browse the Web. Like other search engines, Google uses a special algorithm to produce its results. Although the specific algorithm is a company secret, Google has shared some general facts about how it works. Like other search engines, Google uses automated programs called spiders or crawlers, and it keeps a large index of keywords. What distinguishes Google from other search engines is how well it ranks its search results. Google uses a trademarked algorithm called PageRank, which assigns each Web page a relevancy score.
A Web page's PageRank depends on a few factors (Jonathan Strickland, 2013):
- The frequency and location of keywords within the Web page: If the keyword only appears once within the body of a page, it will receive a low score for that keyword.
- How long the Web page has existed: People create new Web pages every day, and not all of them stick around for long. Google places more value on pages with an established history.
- The number of other Web pages that link to the page in question: Google looks at how many Web pages link to a particular site to determine its relevance.
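To make the three factors above concrete, here is a minimal Python sketch of a toy relevancy score. It is not Google's formula, which is secret. The `Page` class, the `toy_score` function, the weights, and the example pages are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    age_days: int        # how long the page has existed
    inbound_links: int   # number of other pages linking to it


def toy_score(page: Page, keyword: str) -> float:
    """Combine the three factors into one number (illustrative weights only)."""
    words = page.text.lower().split()
    frequency = words.count(keyword.lower())
    # Cap the age bonus at five years so very old pages do not dominate.
    age_bonus = min(page.age_days / 365, 5)
    return 1.0 * frequency + 0.5 * age_bonus + 2.0 * page.inbound_links


# Two hypothetical pages competing for the keyword "python".
pages = [
    Page("a.example", "python tutorial for python beginners", 900, 12),
    Page("b.example", "a short note that mentions python once", 30, 1),
]
ranked = sorted(pages, key=lambda p: toy_score(p, "python"), reverse=True)
print([p.url for p in ranked])
```

The older page, which repeats the keyword and has more inbound links, ends up first. A real search engine weighs many more signals than these three.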
Spider bots are in charge of gathering the information listed above. A spider travels to many sites and continues on to other sites through the links it finds on each one. When you enter keywords in Google, the search engine refers to its database and displays the results, with the most relevant results at the top of the list.
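The way a spider follows links and fills a keyword index can be sketched in a few lines of Python. This is only a simplified model under stated assumptions: `FAKE_WEB` is a made-up, in-memory stand-in for the Web, `crawl` is a hypothetical helper, and no real network requests are made.

```python
from collections import deque

# A tiny made-up "Web": URL -> (page text, links found on that page).
FAKE_WEB = {
    "home.example": ("welcome to the search engine essay",
                     ["about.example", "news.example"]),
    "about.example": ("this essay explains how a search engine works",
                      ["home.example"]),
    "news.example": ("google news and updates",
                     ["about.example", "missing.example"]),
}


def crawl(start_url):
    """Visit pages breadth-first by following links.

    Returns an index mapping each keyword to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in FAKE_WEB:      # dead link: nothing to index
            continue
        text, links = FAKE_WEB[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:     # avoid visiting the same page twice
                seen.add(link)
                queue.append(link)
    return index


index = crawl("home.example")
print(sorted(index["essay"]))
```

Looking up a keyword in `index` corresponds to the step where the search engine refers to its database. Ranking the matching pages is a separate step.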
Back in 1998, Google's equipment was relatively modest. Co-founders Larry Page and Sergey Brin used Stanford equipment and donated machines to run Google's search engine duties. The equipment at that time included (Jonathan Strickland, 2013):
- Two 300-megahertz (MHz) Dual Pentium II servers with 512 megabytes (MB) of memory
- A four-processor F50 IBM RS6000 computer with 512 MB of memory
- A dual-processor Sun Ultra II computer with 256 MB of memory
- Several hard drives (some of which were housed in a box covered in LEGO bricks) ranging from 4 to 9 gigabytes (GB) for a total of more than 350 GB of storage space
Today, Google uses hundreds of thousands of servers to provide services to its users. Google's strategy is to use relatively inexpensive machines running on a customized operating system based on Linux. A program called Google File System manages the data on Google's servers (Jonathan Strickland, 2013).
We do not know exactly what setup Google currently uses in its datacenters. However, a thread at WebmasterWorld asks the question, "How are Google's servers connected?" Lammert provides an extremely helpful and well-written response to the question. I cannot write it better myself, so I will quote it below (Barry Schwartz, 2005).
Google operates a number of datacenters around the world. I am not sure about the exact number, but at the moment there are about 15. Each datacenter has one or more clusters, and each cluster consists of thousands of computers calculating the SERPs for your search query. When you do a query, you are connected with one of these data centers. Which one is determined by the DNS settings of the nameservers of Google called ns1.google.com ... ns4.google.com. Throughout the day, you are not connected to the same data center or cluster. This is because Google has decided to set an extremely short TTL (time to live) for the canonical name and IP address. They have a good reason for it. If a cluster is overloaded or breaks down, they can route requests to another cluster or datacenter. Within 5 minutes (the TTL of the IP addresses) all clients will request a new IP address for www.google.com and all traffic is rerouted (Barry Schwartz, 2005).
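The effect of a short TTL described in the quote can be illustrated with a small Python simulation of a client-side DNS cache. Everything here is an assumption for illustration: the cluster addresses come from the documentation-only range 192.0.2.0/24, `fake_nameserver_lookup` stands in for a real DNS query, and time is passed in by hand instead of read from a clock.

```python
import itertools

TTL_SECONDS = 300  # the 5-minute TTL mentioned in the quote

# Hypothetical cluster addresses; these are not real Google IPs.
_clusters = itertools.cycle(["192.0.2.10", "192.0.2.20", "192.0.2.30"])


def fake_nameserver_lookup(hostname):
    """Stand-in for a real DNS query; hands out the next cluster in turn."""
    return next(_clusters)


class CachingResolver:
    """Remembers an answer until its TTL expires, then asks again."""

    def __init__(self):
        self._cache = {}  # hostname -> (ip, expiry time in seconds)

    def resolve(self, hostname, now):
        entry = self._cache.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]                      # still fresh: reuse cached IP
        ip = fake_nameserver_lookup(hostname)    # expired or unseen: ask again
        self._cache[hostname] = (ip, now + TTL_SECONDS)
        return ip


resolver = CachingResolver()
first = resolver.resolve("www.google.com", now=0)
cached = resolver.resolve("www.google.com", now=299)
after_expiry = resolver.resolve("www.google.com", now=300)
print(first, cached, after_expiry)
```

While the cached entry is fresh, the client keeps talking to the same cluster. Once the TTL runs out, the next lookup may return a different address, which is how traffic could be moved away from an overloaded cluster within a few minutes.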
- Strickland, J. 2013. http://computer.howstuffworks.com/internet/basics/google.htm. Viewed 3/3/2013
- Schwartz, B. 2005. http://www.seroundtable.com/archives/002268.html. Viewed 3/3/2013