What are the essential tools that every IT person should have in the modern IT world? That is the idea behind a new Kingston University module called IT Toolbox. Over a 12-week semester, first-year students will be guided through a series of activities such as blogging, running a server, client- and server-side scripting, search, social networking and problem solving. Each of these activities will be published here, and anyone is welcome to join in.


Exploring Google (Toolbox Activity 1)

Written by: Jonathan Briggs

September 29, 2009

Deadline: Midnight, Monday 5th October 2009

In this first Toolbox activity you need to spend at least a couple of hours really trying to put Google through its paces.

Questions to think about
  1. What makes Google the most popular search engine in the world?
  2. How can you improve the results you get for a search?
  3. How can you use Google to get help when things go wrong?
  4. What else is Google doing besides search?
  5. How can you keep in touch with developments at Google?
  6. How does Google work and what is going on behind the scenes? (Part 2)

Read this first

You should always print out these instructions.

Every activity is broken up into at least two parts. You must always complete Part 1, and if you are aiming for a good result in the module you should also complete Part 2.

Every activity comes with things to do and things to think about. At the end of each part of the activity there is a feedback form that allows you to prove that you have completed the work. The feedback form does not test every part of the activity; it also allows you to ask questions and report problems.

Group feedback will be provided if you complete the activity before the stated deadline.

You should always make notes (on paper or in a word processor) during the activity so that it is easy to fill in the feedback form. This also means that if the feedback form does not work for some reason, you can go back and fill it in again.

The software used to power the form is called SurveyGizmo and is highly reliable.

Remember that if you do not take part in the activities, you will have to take an examination (based on the same material).

Every activity will come with a list of useful links that should help you explore the questions asked. You may not need to use all of them but you may find them helpful for these activities and other parts of your course.

Do remember that these activities must be completed individually. This does not prevent group discussion or collaborative planning of the activities but each student must complete the feedback surveys on their own.

We reserve the right to interview students whose answers are identical to those of other students, and if copying or plagiarism is suspected, both parties may be required to take the examination. A quality control check will be applied to all submissions, and only students who meet that threshold will pass.

Before you start

  1. Download a copy of the Firefox browser from http://en.www.mozilla.com/en/firefox/
    We will use Firefox for many of the activities in this module because it works the same way across
    different platforms and provides many useful extensions and tools.
  2. Download and install the Google Toolbar http://www.google.com/tools/firefox/toolbar/FT3/intl/en/.
    This is the first additional tool that you must install. Follow the instructions and restart Firefox to activate the toolbar.

Part 1: Going beyond “I’m Feeling Lucky”

You are already familiar with Google, but are you using all of its tools to get the best out of the results? Here are four things that you need to do:

1. How many results?
  1. Set your Google preferences so that you are getting 100 results per page
  2. Choose a popular pair of keywords such as “kanye west” and look at the number of results on www.google.com and www.google.co.uk. Make a note of the numbers. Why do you think they are different?
  3. Look at the first 100 results. Are these the best results possible?
  4. What about the order of these results? Make a note of the URL of the top natural result.
  5. Notice the advertising. Make a note of the company that is advertising alongside these natural results.
  6. Page forward until you reach the 1000th result. What do you notice? Are you surprised? Check whether you get similar answers using Yahoo! and Bing.
2. Advanced searching

Google lets you refine your search in several ways: using the advanced search page, adding keywords to the query, or using the tools on the results page.

  • Work out how to find pages about search engine spiders that have been published in the UK in the last six months.
3. Different versions of Google

Take a look at scholar.google.com. This is an academic version of Google that prioritises academic papers. Repeat some of the searches you have done so far.

4. Google Labs

Take a look at Google Labs (http://www.googlelabs.com/) and the range of new services that Google is developing. Also take a look at Google Blog Search (http://blogsearch.google.com/).

Here are the specific questions you will have to answer in the feedback questionnaire. Each answer should be a couple of sentences at most.

Please note: answering "don't know" or "N/A", filling the form with rubbish, or anything similar will prevent you from passing the quality control check.

Answers required

  1. How many search results did you find for “Kanye West” on the UK and US versions of Google?
  2. Why do you think these might be different? (Have a guess if you don’t know)
  3. Who is advertising alongside results for “Kanye West”?
  4. What was special about the results beyond the first 1000?
  5. Why do you think that this is the case? Think carefully about technical, business and other issues that may be involved.
  6. How did you use Google to find articles about search engine spiders that had been published in the UK in the last 6 months? Describe what you did in full.

You will pass this activity if you give sensible (not necessarily correct) answers to the above questions.

Feedback your results to PART 1 using this survey

http://www.surveygizmo.com/s/176056/toolbox-1-1

Part 2: How does Google work?

Part 2 activities will generally be more open ended and require you to do some research on your own. In this activity we want you to find out as much as you can about how Google works.

Use Google to research and then write notes for each of the following:

  1. What does a search engine spider do?
  2. How can Google return its answers so quickly?
  3. What sorts of computers and software does Google use to provide its service?
  4. How does Google decide which pages to show at the top of its results?
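Questions 2 and 4 can be explored with a toy model before you start researching. The sketch below is a minimal illustration, not Google's actual system: it assumes a hypothetical four-page "web" (the page names, text and links are invented), builds an inverted index that maps each word to the pages containing it, and orders the matching pages with a simplified PageRank computed by power iteration. The damping factor of 0.85 is the value used in "The Anatomy of a Large-Scale Hypertextual Web Search Engine"; everything else is simplified for readability.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> (page text, pages it links to).
# Every page has at least one outgoing link, so no dangling-page handling is needed.
PAGES = {
    "a.html": ("search engine spiders crawl the web", ["b.html", "c.html"]),
    "b.html": ("spiders follow links between pages", ["c.html"]),
    "c.html": ("an index maps words to pages", ["a.html"]),
    "d.html": ("pages about spiders and webs", ["c.html"]),
}


def build_index(pages):
    """Inverted index: each word maps to the set of pages that contain it."""
    index = defaultdict(set)
    for url, (text, _links) in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def pagerank(pages, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration; the ranks always sum to 1."""
    n = len(pages)
    rank = {url: 1.0 / n for url in pages}
    for _ in range(iterations):
        # Every page gets a small base share, then receives rank from pages linking to it.
        new_rank = {url: (1.0 - damping) / n for url in pages}
        for url, (_text, links) in pages.items():
            for target in links:
                new_rank[target] += damping * rank[url] / len(links)
        rank = new_rank
    return rank


def search(query, index, rank):
    """Return pages containing every query word, highest PageRank first."""
    words = query.lower().split()
    if not words:
        return []
    # Only the index entries for the query words are consulted, not the pages themselves.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: rank[url], reverse=True)


if __name__ == "__main__":
    index = build_index(PAGES)
    rank = pagerank(PAGES)
    print(search("spiders", index, rank))  # ['a.html', 'b.html', 'd.html']
```

Note that d.html matches the query but comes last because no other page links to it. Google's real system is vastly larger and combines many more signals than this, but the split between building an index in advance and answering queries from it is the idea to look for in your research.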

Take this short quiz, selecting the best answer for each question:

Answers required

Google

a. searches the web whenever a user presses the search button
b. searches an internal version of the web
c. searches an internal index of the web

Spiders

a. visit every page on the web
b. collect data from every page on the web
c. collect data from some pages on the web

PageRank

a. is a measure of how relevant the contents of a page are to a search query
b. is a measure of how popular a page is with other web sites
c. is a measure of how much a web site has paid Google to be listed

Results

a. are shown according to relevance
b. are shown according to a measure of reputation
c. are shown according to relevance and reputation

Number of servers for Google

a. 1
b. 10 - 100
c. 100 - 1000
d. 1000+

Finally, try to answer the following questions:

  1. How would you explain to a non-technical person how Google works?
  2. What would be required to build a search engine that is better than Google?
  3. What questions do you have about Google? (list at least 3)

You will pass this activity if you give sensible (not necessarily correct) answers to the above questions.

Feedback your results to PART 2 using this survey

http://www.surveygizmo.com/s/176070/toolbox-1-2

Useful links

Google Guide
Recommended Search Engines (University of Berkeley)
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Google in Wikipedia
Google PageRank Basics (YouTube)
Sergey Brin and Larry Page: Inside the Google machine (Video)

Recent comments:

On September 29, 2009 at 7:43 PM, Rahim wrote:

How do we submit the Activities when we complete them

Jonathan replies: If you read the text above you will find two links to forms to submit your answers. They are marked FEEDBACK YOUR ANSWERS

On September 30, 2009 at 9:46 AM, Marky Warburton wrote:

I note that you like & use Google a lot. What do you think of "knowledge engines" such as Wolfram Alpha?

Jonathan replies: I think they are very interesting and perhaps the source from which many of the next big advances will come. What I am sure of is that we have not finished and that Google is only the current best search engine rather than the final one.

On September 30, 2009 at 7:59 PM, PJ wrote:

I agree. However, I find it hard to see other search engines accomplishing what Google has accomplished, though other search engines such as Yahoo and MSN do have email for users. Google has exceeded all expectations of a search engine and has products ranging from media to tools for desktops. Google is more than just an engine; it's a necessity to almost all IT users. Google has the upper hand, covering many categories that enable it to spread and increase its company contacts.

Also, Bing is overrated as a search engine, in my opinion.

Jonathan replies: I'm glad that Bing is a more serious challenger than MS's last version.

If you are interested in where things may go then take a look at some of the developments in the "semantic web".

Also remember that America gets very jumpy when IT companies get too powerful and tends to break them up or control them: AT&T, IBM, MS, etc.

On October 2, 2009 at 11:52 AM, K Taylor wrote:

Hi Jonathan,
for the first question, when you say UK & US, do you mean the entire web on .co.uk and .com, or for the UK do you mean searching UK sites only? Both give different results.

Thanks

Jonathan replies: I am really just asking you to explain why different versions of Google give different results because surely they are looking at the same web - so you can take this either way.

On October 2, 2009 at 1:22 PM, Emad wrote:

Hi there jonathan..

I have tried so many times to get to www.google.com and it always changes to www.google.co.uk.
Is there anything I can do with the settings to change it?

Thank you

Jonathan replies: If you go to the front page of google.co.uk and look under the search button you will find "go to google.com"

On October 2, 2009 at 9:34 PM, Waqas wrote:

hi jonathan,
Can you tell me what the difference is between google.com and google.co.uk? I mean, do we get different results by doing that, and do we have to do that?

thanks

Jonathan replies: That is part of the activity! It is up to you to investigate the difference and try to find out why they might be different.

On October 3, 2009 at 12:24 PM, Nayaab wrote:

"Make a note of the URL for the top natural result?"
what did you mean by the top natural result?

Jonathan replies: The results on the left-hand side of the page are known as natural or organic results, while the ones on the right (and sometimes at the top) are known as sponsored or paid links.

Natural links have not been paid for!

On October 3, 2009 at 4:14 PM, sabena wrote:

I want to talk about question number 4 in the first part of the activity, as I was completely surprised when I wanted to explore 1000 search results on Google. I mean, there wasn't anything there: at the top of the first web page about 39,600,000 search results were found, but I couldn't find anything after search result number 667. I really never noticed this before. STRANGE! By the way, why is it like this?

Jonathan replies: I knew you would be surprised and so was I! I want you to think about why! We will discuss the explanations on Tuesday.

On October 5, 2009 at 2:03 PM, James wrote:

Hi Jonathan,
I am just wondering: do we have to do Activity 2?
Thanks

Jonathan replies: You need to do it for NEXT Monday but as I had finished preparing it I thought I would publish it for you early.

On October 5, 2009 at 4:50 PM, Youssef Ibrahimi wrote:

dear jonathan

One of the questions said to go past the 1000th result, but my search only limited itself to 600 results and would not let me go any further. Do you have any advice for me as to what I can do?

thanks

Jonathan replies: Perhaps that is what I wanted you to see :-p Why do you think that is? Do the other search engines do a better job?

On October 6, 2009 at 9:50 PM, Americo Do Rosario wrote:

Just like to thank Jonathan for this activity. My research during this activity helped me find some of the email addresses and passwords posted on the pastebin.com/m3888bb7a page.
Even though this page has been removed from pastebin.com, I managed to retrieve some of the email addresses and passwords by using the keyword: lafaroleratropezoooooooooooooo which I found on acunetix.com site below.

http://www.acunetix.com/blog/websecuritynews/statistics-from-10000-leaked-hotmail-passwords/

I’m not going to explain how I managed to retrieve them but this article will give you a clue - just play with intext: query string; http://www.hungry-hackers.com/2008/09/20-great-google-secrets.html

Below are a few emails and passwords indexed on Google databases. Each set represents a specific query using a trick learned from hungry-hackers.com. I believe that I could go all the way if I had the time and the willingness to do so.

pastebin - collaborative debugging [email addresses removed for security reasons]

Shouldn’t Google delete those indexes by now? I’m surprised that they are there.
What do you think about it?

Jonathan replies: Interesting that Google has recorded these. As you can see I have removed some of the detail because I don't want to perpetuate the problem. We all need to be careful about the data we publish on the web - especially email addresses.

On October 7, 2009 at 6:31 PM, Mona wrote:

Hi Jonathan,
I did not know about this activity. I had only done the enrolment activity, which I thought was 'Activity 1', and have only seen this one now. The deadline has passed, but could I still do this activity?

Thanks

Jonathan replies: Yes - I will be quite lenient for the first few activities - complete as soon as you can.

On October 29, 2009 at 2:06 PM, New Student wrote:

Hi Jonathan,
I'm a new student and I just completed my enrolment on the 26th of October. I want to know if I can still do this activity, as it seems the deadline for it might have passed.

Thanks.

Jonathan replies: Yes - I will accept work late from newly enrolled students.

On November 3, 2009 at 8:44 PM, Cadhene Lubin-Hewitt wrote:

Hi Jonathan, I am a new student and I gotta say, I love doing your work... but one problem I have is, how do I submit it?

Jonathan replies: There is a link in every activity (each part) to an online feedback form - just look a little harder :-)
