B.Sc (Computer Science)
Web Technologies
Unit-III

1. Discuss Directories, Search Engines and Metasearch Engines.

Information about web pages is contained in a database that has already been
created, either manually or by special programs that search the web for pages.
A request for information is answered by the search tool, which retrieves the
information from its already constructed database of indexed web pages.
The following approaches have been developed in response to the need to organize
and locate information on the World Wide Web:
· Directories
· Search Engines
· Meta Search Engines

Directories: The first method of finding and organizing web information is the
directory approach. A directory offers a hierarchical representation of hyperlinks to
web pages and presentations, broken down into topics and subtopics.

Directories can be classified as either general or specialized. A general
directory is also called a web directory, a subject directory or sometimes a web
guide. The top level of a general directory provides a wide range of very broad topics
such as arts, automobiles, education, news, science, sports and so on.

Popular General Directories: LookSmart, Lycos, Yahoo

Search Engines: The second approach to organizing and locating information
on the web is a search engine, which is a computer program that does
the following:

1. Allows the user to submit a form containing a query that consists of a word or
phrase describing the specific information the user is trying to locate on the web.
2. Searches its database to try to match the query.
3. Collates and returns a list of clickable URLs containing presentations that
match the query; the list is usually ordered, with better matches appearing at
the top.
4. Permits the user to revise and resubmit a query.
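
The four steps above can be sketched in a few lines of Python. This is only a
minimal illustration, not how any real search engine is implemented: the PAGES
database, its URLs and the search function are all made-up names, and "better
match" is assumed to mean "contains more of the query words".

```python
# Hypothetical mini-database: URL -> page text (made-up data for illustration).
PAGES = {
    "http://example.com/a": "web search engines index web pages",
    "http://example.com/b": "cooking recipes for pasta",
    "http://example.com/c": "how a search engine ranks pages",
}

def search(query, pages=PAGES):
    """Return URLs ordered by how many query words each page contains."""
    words = query.lower().split()
    hits = []
    for url, text in pages.items():
        page_words = set(text.lower().split())
        matched = sum(1 for w in words if w in page_words)
        if matched:
            hits.append((matched, url))
    # Better matches first; ties are broken alphabetically by URL.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [url for _, url in hits]
```

Calling search again with a revised query corresponds to step 4.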

A number of search engines also provide URLs for related or suggested topics. Like
directories, search engines can be classified as either general or specialty search
engines. A specialty search engine is also called a vertical search engine or a topic
search engine. Many people find that search engines are not as easy to use as
directories. To use a search engine, the user supplies a query by entering
information into a field on the screen. To be effective, the search engine should
return a small list of URLs on the user’s topic. To pose effective queries, the user
must learn the query syntax of the search engine being used.

Popular search engines are: AltaVista, AskJeeves, Google and Excite.

Metasearch Engines: A metasearch engine, or all-in-one search engine, performs
a search by calling on more than one search engine to do the actual work. A
metasearch engine does not maintain its own database of information; instead, it
submits searches to other search engines and so queries their databases. The
particular set of search engines to which a query is sent varies from one
metasearch engine to another.

IIMC Prasanth Kumar K

Many metasearch engines will collate the search results into one list, remove
duplicates and then rank the pages according to how well they match the query. The
advantage of a metasearch engine is that the user can access a number of different
search engines with a single query. The disadvantage is that the user will face a high
noise-to-signal ratio; that is, many of the matches will not be of interest to the user.
This means the user will need to spend more time evaluating the results and deciding
which hyperlinks to follow.

Popular MetaSearch Engines are: DogPile, InFind, Mamma, MetaCrawler and
MetaSearch.
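
The collate, de-duplicate and rank behaviour described above can be sketched as
follows. The two engine functions and their scores are hypothetical stand-ins
for real search engines, and keeping the highest score per URL is just one
possible way to merge duplicates.

```python
def metasearch(query, engines):
    """Send one query to several engines and merge the results.

    `engines` maps an engine name to a function that takes a query and
    returns a list of (url, score) pairs. Both are hypothetical stand-ins.
    """
    best = {}
    for name, engine in engines.items():
        for url, score in engine(query):
            # Remove duplicates: keep the highest score seen for each URL.
            if url not in best or score > best[url]:
                best[url] = score
    # Rank the merged list, best score first.
    return sorted(best, key=lambda url: (-best[url], url))

def engine_one(query):
    return [("http://example.com/a", 90), ("http://example.com/b", 40)]

def engine_two(query):
    return [("http://example.com/b", 70), ("http://example.com/c", 55)]
```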

2. Explain Search Fundamentals and Search Strategies.

Search Terminology:
1. Search Tool: Any mechanism for locating information on the Web; usually
refers to a search or metasearch engine or a directory.
2. Query: Information entered into a form on a search engine’s web page that
describes the information being sought.
3. Query Syntax: A set of rules describing what constitutes a legal query. On
some search engines, special symbols may be used in a query.
4. Query Semantics: A set of rules that defines the meaning of a query.
5. Hit: A URL that a search engine returns in response to a query.
6. Match: A synonym for hit.
7. Relevancy Score: A value that indicates how closely a URL matches a
query; usually expressed as a value from 1 to 100, with a higher score
meaning more relevant.

Pattern Matching Query: The most basic type of query is a pattern matching
query. We formulate a pattern matching query using a keyword or a group of
keywords. The search engine returns the URL of any page that contains these
keywords.

Boolean Queries: Boolean queries involve the Boolean operations AND, OR and
NOT. Most search engines allow the user to enter Boolean queries.
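
As a rough illustration, the three Boolean operations map directly onto set
operations. The INDEX below (keyword to set of URLs) is made-up data, and the
short names u1 to u5 stand in for real URLs.

```python
# Hypothetical index: keyword -> set of URLs whose pages contain it.
INDEX = {
    "jaguar": {"u1", "u2", "u3"},
    "car": {"u2", "u4"},
    "cat": {"u3", "u5"},
}

def pages_with(word):
    """Return the set of URLs for one keyword (empty if unknown)."""
    return INDEX.get(word, set())

# jaguar AND car: pages containing both keywords.
both = pages_with("jaguar") & pages_with("car")
# jaguar OR cat: pages containing at least one of the keywords.
either = pages_with("jaguar") | pages_with("cat")
# jaguar AND NOT car: pages about jaguar that do not mention car.
without = pages_with("jaguar") - pages_with("car")
```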

Search Strategies: The user can begin by testing a number of different search engines,
trying to find one that meets the following conditions:
· Possesses a user-friendly interface.
· Has easy-to-understand, comprehensive documentation.
· Is convenient to access; that is, the user does not need to wait several minutes
before being able to submit a query.
· Contains a large database so that it knows a lot about the information for
which the user is searching.
· Does a good job of assigning relevancy scores.

Once the user finds a search engine that meets most of the above criteria, the user
should concentrate on learning it well rather than learning a little bit about several
different search engines.

Search Generalization (Few Hits): If the query returns no hits or too few
hits, we need to generalize the search:

· If the user used a pattern matching query, eliminate one of the more specific
keywords from the query.
· If the user used a Boolean query, remove one of the keywords or phrases
joined with AND, or delete a NOT item.
· If the user restricted the search domain, widen or remove the restriction.
· If the user is still unlucky, try keywords that are more general or exchange a
couple of the keywords for synonyms.
· If this fails, the user can decide to use a directory. Another alternative is
to use a metasearch engine.

Search Specialization (Too Many Hits): If the query returns too many URLs,
the user needs to specialize the search:

· If the user started with a pattern matching query, the user may want to add
more keywords.
· If the user began with a Boolean query, the user needs to AND another keyword or
use the NOT operator to exclude some pages.
· If the user is still retrieving too many hits, try capitalizing proper nouns
and names.
· If nothing seems to work, try reviewing the first 20 URLs, since search engines list
the best matches near the top. If they do not contain the information sought,
the user can refine the search.
· If this fails, the user could resort to a directory and work down to the topic of
interest.
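
The effect of generalizing and specializing an AND query can be seen in a small
sketch. The INDEX_BY_WORD data and the and_query helper are hypothetical;
removing a keyword can only keep or enlarge the result set, and adding one can
only keep or shrink it.

```python
# Hypothetical index: keyword -> set of URLs (made-up data).
INDEX_BY_WORD = {
    "python": {"u1", "u2", "u3", "u4"},
    "tutorial": {"u2", "u3", "u4"},
    "beginner": {"u3"},
}

def and_query(keywords):
    """Return the URLs that contain every keyword (an AND query)."""
    sets = [INDEX_BY_WORD.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

narrow = and_query(["python", "tutorial", "beginner"])  # specialized query
broad = and_query(["python", "tutorial"])               # one keyword removed
```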

3. Explain the working of a Search Engine.

Search Engine Components: Based on functionality, a search engine is split
into the following components.

1. User Interface: The screen in which the user types a query and which displays
the search results.
2. Searcher: The part that searches a database for information to match the
query.
3. Evaluator: The function that assigns relevancy scores to the information
retrieved.
4. Gatherer: The component that traverses the web collecting information
about the web pages.
5. Indexer: The function that categorizes the data obtained by the gatherer and
creates the index.

User Interface: The user interface must provide a mechanism by which a user
can submit queries to the search engine. This is universally done using forms. In
addition, the user interface should be friendly and visually appealing. Hyperlinks
to help files should be displayed prominently and advertisements should not
hinder a reader’s use of the search engine. Finally, the user interface needs to
display the results of the search in a convenient way. The user should be
presented with a list of hits from the search, a relevancy score for each hit and a
summary of each page that was matched. This way, the user can make an
informed choice as to which hyperlinks to follow.

Searcher: The searcher is a program that uses the search engine’s index and
database to see if any matches can be found for the query. The query must first
be transformed into a syntax that the searcher can process. Since the database
associated with the search engine is extremely large, a highly efficient search
strategy must be applied. Computer scientists have spent years developing
efficient search and sorting strategies.

Evaluator: The searcher locates any URLs that match the query. The hits
retrieved by the query are called the result set of the search. Not all of the hits
will match the query equally well. The relevancy score is an indication of how well
a given page matches the query. The way the relevancy score is computed varies
from search engine to search engine. A number of different factors are involved and
each one contributes a different percentage of the score. Some of the factors are:
· How many times the words in the query appear in the page.
· Whether or not the query words appear in the title.
· The proximity of the query words to the beginning of the page.
· Whether the query words appear in the CONTENT attribute of the META
tag.
· How many of the query words appear in the document.
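
The factors listed above could be combined along the following lines. This is a
sketch only: the relevancy_score function and the 40/20/20/10/10 weights are
arbitrary assumptions chosen for illustration, since each search engine uses
its own formula.

```python
def relevancy_score(query, title, body, meta_content=""):
    """Score a page from 0 to 100 using the factors listed above.

    The weights (40/20/20/10/10) are arbitrary assumptions for illustration.
    """
    words = query.lower().split()
    body_words = body.lower().split()
    if not words or not body_words:
        return 0
    present = [w for w in words if w in body_words]
    # Factor: how many of the query words appear in the document.
    coverage = len(present) / len(words)
    # Factor: how many times the query words appear (capped at 1.0).
    occurrences = sum(body_words.count(w) for w in words)
    frequency = min(occurrences / len(body_words) * 5, 1.0)
    # Factor: whether the query words appear in the title.
    in_title = sum(w in title.lower().split() for w in words) / len(words)
    # Factor: proximity of the query words to the beginning of the page.
    first = min((body_words.index(w) for w in present), default=len(body_words))
    proximity = 1.0 - first / len(body_words)
    # Factor: query words in the CONTENT attribute of the META tag.
    in_meta = sum(w in meta_content.lower().split() for w in words) / len(words)
    score = (40 * coverage + 20 * frequency + 20 * in_title
             + 10 * proximity + 10 * in_meta)
    return round(score)
```

A page that mentions every query word early, in its title and in its META
keywords scores near 100; a page that mentions none of them scores 0.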

Gatherer: A search engine obtains its information by using the gatherer, a
program that traverses the Web and collects information about Web
documents. The gatherer does not collect the information every time a query is
made. Rather, the gatherer is run at regular intervals and returns information
that is incorporated into the search engine’s database and indexed. A gatherer
may employ essentially two different methods to search the web for new pages.
Both techniques are well-known search strategies in computer science; they are:
1. Breadth-First Search
2. Depth-First Search

Breadth-First Search: A breadth-first search proceeds in levels "across" the pages.
The gatherer begins at a particular Web page and then explores all pages that it can
reach by using only one hyperlink from the original page. Once it has exhausted all
Web pages at that one level, it explores all of the Web pages that can be reached by
following only one hyperlink from any page that was discovered at the first level. In
this way, a second level, which usually contains many more web pages than the first
level, is explored. This process is repeated level by level until no new Web pages are
found. When no more pages can be located, the search may need to jump to a new
starting point.
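
A minimal sketch of the breadth-first strategy is shown below. The LINKS
dictionary is a made-up link graph (page name to the pages it links to); a real
gatherer would fetch pages and extract their hyperlinks instead.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (made-up data).
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}

def breadth_first(start, links=LINKS):
    """Visit pages level by level and return them in the order discovered."""
    visited = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in visited:
                visited.append(target)
                queue.append(target)
    return visited
```

Starting from A, the first level (B and C) is discovered before the second
level (D and E).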

Depth-First Search: A depth-first search proceeds by following a chain of
hyperlinks "down" as far as possible. The gatherer begins at a particular Web page
and explores one of its hyperlinks. At the new page, the gatherer follows another
hyperlink. At the next page, one of its hyperlinks is followed, and so on. In contrast
to the breadth-first search, hyperlinks on a given page are not fully exhausted before
the gatherer goes to the next-level page. When the gatherer reaches a page from
which no new pages can be discovered, the search backtracks until it can go forward
again and discover new pages.
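
The depth-first strategy can be sketched in the same style. LINK_GRAPH is again
a made-up link graph, and recursion is used here so that returning from a call
plays the role of backtracking.

```python
# Hypothetical link graph: page -> pages it links to (made-up data).
LINK_GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}

def depth_first(start, links=LINK_GRAPH):
    """Follow each chain of links as far as possible before backtracking."""
    visited = []

    def visit(page):
        visited.append(page)
        for target in links.get(page, []):
            if target not in visited:
                visit(target)  # go deeper before trying the next link
        # Returning from visit() is the backtracking step.

    visit(start)
    return visited
```

Starting from A, the gatherer follows A, B, D to the end of the chain, then
backtracks to A and continues with C and E.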

Indexer: Once the gatherer retrieves information about Web pages, the information
is placed into a database and indexed. The indexer creates a set of keys (an
index) that organizes the data, so that high-speed searches can be conducted and
the desired information can be located and retrieved quickly. The elements
that should go into a Web page record include the URL, document title, and
descriptive keywords.
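
As a small sketch of what the indexer produces, the code below turns page
records (URL, title, keywords) into an index keyed by keyword. The RECORDS data
and the build_index function are hypothetical examples, not a real engine's
record format.

```python
from collections import defaultdict

# Hypothetical page records as a gatherer might return them.
RECORDS = [
    {"url": "http://example.com/ftp", "title": "FTP Basics",
     "keywords": ["ftp", "file", "transfer"]},
    {"url": "http://example.com/telnet", "title": "Using Telnet",
     "keywords": ["telnet", "remote", "login"]},
    {"url": "http://example.com/rlogin", "title": "Remote Login",
     "keywords": ["rlogin", "remote", "login"]},
]

def build_index(records):
    """Map each keyword (the key) to the URLs of the pages that carry it."""
    index = defaultdict(list)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword].append(record["url"])
    return dict(index)

index = build_index(RECORDS)
```

Looking up a keyword in the index is then a single dictionary access rather
than a scan of every record.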

4. Explain Telnet and Remote Login.

Telnet and Remote Login are two programs that allow the user to log into another
computer from an account into which the user is already logged. To do this, the user
needs access to a second computer. The second computer is usually at
a different physical location.

Telnet: The telnet command uses the Telnet protocol to log into a remote computer
on the Internet. The command is often called telnet, but there are also different
programs with names like tn3270 (for IBM 3270 terminals), WinQVT and QWS3270.
There is a wide range of Telnet clients and many of them have a user-friendly
interface. On a desktop system, a Telnet client can usually be launched from
one of the system’s menus simply by selecting the Telnet option.
If telnet is not located on the desktop and the system runs a Windows operating
system, there is still a good chance that there is a Telnet client on the system. To
determine whether the system has Telnet or not, go to the Start menu and select
Find. Under Find, select the Files or Folders option. Now simply enter the word
“telnet” in the search area. The telnet.exe file is an executable telnet program.
In a Windows environment, selecting the Remote System option from the Connect
pull-down menu of the Telnet interface causes the Connect window to display within
the Telnet window. The form in the Connect window specifies the hostname, port and
terminal type of the computer to which the user is connecting. Generally, all we need to
do is type in a hostname and push the Connect button. For example, to connect to
abc.com, just type the machine name or its IP address in the Host field.

On a UNIX system, we can type the command telnet at the operating system
prompt. We receive the following prompt:

telnet>

We can type the open command followed by the hostname of the computer to
which we want to connect, as follows:

telnet> open hostname

The hostname is the machine domain name or the IP address of the machine. In
some cases, we need to type the port number as well.

Typing help or ? at the Telnet prompt will usually result in the Telnet documentation
being displayed. When we want to end the Telnet session, we can type close or quit.

We can use Telnet in the browser’s address bar, by typing

telnet://hostname
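
A telnet:// address carries the same hostname and optional port number that the
open command takes. The sketch below only splits such an address into its
parts; parse_telnet_url is a hypothetical helper name, and falling back to port
23 (the standard Telnet port) is an assumption made when the address gives no
port.

```python
from urllib.parse import urlsplit

def parse_telnet_url(url):
    """Split a telnet:// URL into (hostname, port)."""
    parts = urlsplit(url)
    if parts.scheme != "telnet":
        raise ValueError("not a telnet URL: " + url)
    # 23 is the standard Telnet port; it is used when the URL gives none.
    return parts.hostname, parts.port or 23
```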

One of the most common uses of Telnet is to log into a personal machine to retrieve
email while traveling. Be warned that reading email in this fashion can
be very tedious from many countries. The connections are often so slow that it is
sometimes impossible to retrieve the email.

Remote Login: The rlogin command is similar to the telnet command, except that
it provides the remote computer with information about where we are logging in
from. If the machine from which we are performing the remote login is listed in the
remote machine’s file of hostnames, we need not enter a password.
On UNIX systems, the list of hostnames is given in a hidden file called
.rhosts. From the UNIX prompt, the syntax for the rlogin command is

%rlogin hostname

where hostname is the name of the machine to which we want to establish a
remote login connection. All the commands entered will run on the remote machine
until the remote session is terminated by using the exit command.

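Telnet is a more secure remote login mechanism than rlogin, because rlogin’s
trusted-host list can admit a user without a password. Both, however, send data,
including passwords, over the network unencrypted.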

5. Explain File Transfer.

File Transfer is an application that allows the user to transfer files between two
computers on the Internet or on the same network. The two most important file transfer
functions are:
· Copying a file from another computer to the user’s computer.
· Sending a file from the user’s computer to another computer.

The process of transferring a file from the user’s computer to another computer is
called uploading. The process of getting a file from another computer to the user’s
computer is called downloading. After copying files, the user should run virus
detection software on them before using them. This helps safeguard against the
computer getting infected, but it is not a guarantee.

Graphical File Transfer Client: Graphical file transfer clients are the easiest to use.
These applications display the sending computer’s file system in one window and the
receiving computer’s file system in a second window.
In order to connect to a remote site using a graphical FTP client, the user should
first click on the Connect button. In the first line, we simply type in the hostname or the
IP address of the remote system to which we are connecting. In the third line, we enter
the user account name and in the fourth line, the password. Once we have typed all the
information, we can press the OK button. This will connect the user to the remote
system.
Many features of a graphical FTP client are self-explanatory. For example, to
transfer a file from one system to another, we can drag and drop it to the other
system. Files can thus be exchanged in either direction.
One important point is the transfer mode setting. This can usually be specified
by clicking on a button. Most clients have a text transfer mode (ASCII), a binary
transfer mode and an Auto mode. All file types can be transferred using binary mode,
but not all files can be transferred using text mode.
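
Why text mode is unsafe for some files can be shown with a small simulation. The
two functions below are hypothetical stand-ins for the transfer modes, assuming
a text transfer that converts every CR LF line ending to a single LF (as when a
file moves from a Windows system to a UNIX system); the byte values are made-up
sample data.

```python
def text_mode_transfer(data):
    """Simulate a text (ASCII) transfer that rewrites line endings.

    Here every CR LF pair becomes a single LF.
    """
    return data.replace(b"\r\n", b"\n")

def binary_mode_transfer(data):
    """Simulate a binary transfer: the bytes are copied unchanged."""
    return bytes(data)

text_file = b"line one\r\nline two\r\n"
# Made-up binary data that happens to contain the bytes 0x0D 0x0A.
image_file = bytes([0x89, 0x50, 0x0D, 0x0A, 0x1A, 0x0A])
```

The text file survives either mode in usable form, but the binary data loses a
byte in text mode and is no longer the same file.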
After completing an FTP session, it is a good practice to close the session by
clicking on the Close button and then exit the FTP client by clicking on the Exit
button.
The following steps are followed while transferring the file:
1. Locate the file to transfer.
2. Launch the FTP client on the PC.
3. Connect and login to the remote UNIX system.
4. Change to the appropriate directories on both the local and remote systems.
5. Select the appropriate transfer mode.
6. Select the file to transfer.
7. Transfer the file.
8. Close and exit FTP.

Text-Based File Transfer Client: We can launch the UNIX file transfer client, called
ftp (for File Transfer Protocol), by entering the command

%ftp hostname

Here hostname is the name of the computer with which we want to exchange files.
Once we have successfully initiated an FTP session by supplying a userid and
password, we get the following prompt

ftp>

The following are some of the FTP commands:

· bye - Terminate the session and exit the file transfer program.
· cd - Change directory.
· get - Copy a file from the remote computer.
· help - View the list of commands.
· ls - List the files in the current working directory.
· put - Send a copy of a file to the remote computer.
· pwd - Print the name of the current directory.
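
The same commands have counterparts in Python's standard ftplib module. The
sketch below is one possible way to script a download, not part of the ftp
client itself: fetch_file is a hypothetical helper, and the hostname, userid,
password, directory and file name in the usage comment are placeholders.

```python
from ftplib import FTP

def fetch_file(ftp, remote_dir, filename, local_path):
    """Download one file in binary mode using an open FTP connection."""
    ftp.cwd(remote_dir)                       # like: cd remote_dir
    names = ftp.nlst()                        # like: ls
    if filename not in names:
        raise FileNotFoundError(filename)
    with open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + filename, out.write)   # like: get filename
    return ftp.pwd()                          # like: pwd

# Usage sketch (all values are placeholders):
# ftp = FTP("hostname")
# ftp.login("userid", "password")
# fetch_file(ftp, "/pub", "notes.txt", "notes.txt")
# ftp.quit()                                  # like: bye
```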

File Compression: It is common to compress files that are to be transferred
between two computers. Compressing a file makes it smaller, and the compressed file
can be transferred more quickly over a network. A wide variety of compression tools
is available, such as WinZip, PKZIP and gzip.
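
As a quick illustration, Python's standard gzip module (which uses the same
format as the gzip tool) shows the size reduction. The sample data is made up
and deliberately repetitive; how much a real file shrinks depends on its
contents.

```python
import gzip

original = b"web technologies " * 200     # repetitive sample data
compressed = gzip.compress(original)      # smaller copy to send over the network
restored = gzip.decompress(compressed)    # what the receiver gets back
```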

Anonymous File Transfer: On some systems, files are made available to anyone
who wants to retrieve them. If a file needs to be widely distributed, it may not be
feasible to assign accounts and passwords to everyone interested in receiving a copy
of the file. Anonymous file transfer was established to solve this problem.

6. Explain Virus avoidance and precautions.

Computer Viruses: Some of the programs that are downloaded from the Internet or
obtained as email attachments may threaten the security of the computer if they
contain a Virus, a Trojan Horse or a Worm.

Virus: A virus can be thought of as a program that, when run, can replicate and then
embed itself within another program. Although there are harmless viruses, most are
intended to damage the host system. The damage can occur immediately, for example
by filling all the available space on the hard disk, or it may occur at some later time.
The effect on the computer might also be something as innocent as a message being
displayed on the desktop. Before doing damage, the virus could infect the other
programs on the computer, as well as other computers if we send program files to
others. A specific event, such as the arrival of a particular date, that causes the virus
to become active is called a trigger.

Trojan Horse: The name comes from Greek mythology. A Trojan Horse appears to be a
legitimate program for carrying out some useful function, but hidden within it is code
that is activated by some trigger. When the hidden code is executed, it might release a
virus, permit unauthorized access to the computer or destroy files and data.

Worm: A worm is a stand-alone program that tries to gain access to computer
systems via networks. For example, a worm might try various password
combinations until it is successful. The 1988 Internet Worm created by Robert T. Morris
is a highly successful example. Although not designed to be destructive (the worm
was intended to be an experiment), the worm caused major problems when it
inadvertently consumed the available memory in the systems it invaded.

Virus Avoidance and Precautions: UNIX viruses are rare, because of the strict
security measures on UNIX systems. Most viruses are designed to infect PCs or
Macs. A virus is usually targeted at one type of system or the other, since nearly
all viruses are operating-system specific. To protect against viruses, Trojan Horses and
Worms, we need to take the following precautions:

· Run antivirus software on any new programs. This software looks for viruses and
Trojan Horse programs by comparing data patterns found in the user’s programs
to characteristic data patterns found in programs infected by known viruses.
· Do not download files from unknown sources. This includes mail attachments
from individuals and organizations unknown to the user. If the user downloads
something, run antivirus software on it before opening the file or running the
program.
· Do not use pirated copies of software.
· Keep the antivirus software up to date, since new releases will contain the
information necessary to identify the latest viruses and Trojan Horse
programs.
· Back up the files regularly after ensuring they are virus-free. If we lose data
or files because of a virus, we will be able to recover if we have current
backup files.
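
The pattern comparison mentioned in the first precaution can be sketched very
simply. The SIGNATURES table and the scan function are hypothetical, and the
byte patterns are made up; real antivirus software uses far larger signature
databases and more elaborate matching.

```python
# Made-up byte patterns standing in for signatures of known viruses.
SIGNATURES = {
    "DemoVirus.A": b"\xde\xad\xbe\xef",
    "DemoTrojan.B": b"HIDDEN-PAYLOAD",
}

def scan(data, signatures=SIGNATURES):
    """Return the names of all known signatures found in the data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)
```

This also shows why the signature list must be kept up to date: a pattern that
is not in the table cannot be detected.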

7. Differentiate between Semantic-based and Syntactic-based style types.

Semantic-Based Style Types: These are also called Content-Based Style Types.
These tags are used to indicate the content of the text. The following is the list of
Semantic Style types:

1. Emphasis Tag: The emphasis tag <EM>, with its corresponding </EM> ending
tag, is used for highlighting text.

<EM> This is Web Technologies</EM>

2. Strong Tag: The strong tag <STRONG> is used to indicate an even higher level
of emphasis.

<STRONG> This is Web Technologies</STRONG>

3. Citation Tag: The citation tag <CITE> is used to specify a reference. A collection
of citations creates a bibliography. Using the citation tag facilitates that collection
since every reference is bracketed between <CITE> and </CITE>

<CITE> Raymond Greenlaw <BR>
Fundamentals of Internet and WWW<BR>
Tata McGraw Hill
</CITE>

4. Address Tag: The address tag <ADDRESS> is used to indicate an address. If it is
used throughout a series of Web pages, it is easy to automate the process of
developing an address book for the pages.

Please send mail at
<ADDRESS> Street No.1 <BR>
Road No.1 <BR>
Hyderabad<BR>
Pin-000000
</ADDRESS>

5. Keyboard Tag: The keyboard tag <KBD> is used to delineate keyboard input.

Enter text <KBD> dir</KBD> to display the directory.

6. Variable Tag: Computer scientists developed HTML and many of them like to talk
about programming. To do so on Web pages, they introduced the variable tag,
<VAR>. The variable tag is used to indicate an expression, usually just a sequence of
letters, that has a number of different possible values.

For example, the variable name file1 can represent any file name. In on-line
documentation that we are developing about file manipulation, we can specify how to
delete a file using the following code:

Confirm to delete <VAR>file1</VAR>?

7. Code Tag: The code tag <CODE> is used for specifying program code.

Start the program
<CODE> Let x=100<BR>
Let y=200<BR>
Add x and y <BR>
</CODE>

8. Small Tag: The small tag <SMALL> is used to reduce the relative font size.

This is a <SMALL>Web Technology </SMALL>.

9. Big Tag: The big tag <BIG> increases the relative font size.

This is a <BIG>Web Technology </BIG>.

Syntactic-Based Style Tags: These are also called Physical-Based Style Tags.
These tags allow the programmer to tell the browser specifically how to display the
text on a web page. The following is the list of Syntactic Style types:

1. Bold Tag: The bold tag <B> is used to set text in boldface. Most browsers
darken the text and widen the letters.

This is <B> Web </B> Technologies

2. Italics Tag: To place a portion of text in italics, use the italics tag <I>.

This is <I> Web </I> Technologies

3. Monospaced Typewriter Text: The typewriter text tag <TT> is used for placing text
in a monospaced typewriter font. This can be used to indicate that a certain phrase
needs to be typed in.

This text is <TT> Typed in a Typewriter</TT>

4. Strike Tag: The strike tag <STRIKE> may be used for crossing out a word or a
phrase by having a line drawn through it.

This is <STRIKE> 50% offer</STRIKE> on purchasing.

5. Subscript Tag: The subscript tag <SUB> is used to generate subscripts.
To get x₁ + x₂ = 0

We use x<SUB>1</SUB> + x<SUB>2</SUB> = 0

6. Superscript Tag: The superscript tag <SUP> is used to generate superscripts.
To get x² + y² = 0

We use x<SUP>2</SUP> + y<SUP>2</SUP> = 0

7. Underline Tag: The underline tag <U> is used to underline text. Since
hyperlinks are depicted by underlining, the underline tag should be used sparingly
and only in situations where no confusion can result as to whether or not the
underlined item is a hyperlink.

This is <U>Web</U> Technologies

8. Blink Tag: Flashing text is created using the blink tag <BLINK>.

This is <BLINK> Web Technologies</BLINK>
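
To check which kind of style tags a fragment of HTML uses, the two lists above
can be turned into a small classifier. This is a sketch using Python's standard
html.parser module; the StyleTagClassifier class and classify function are
hypothetical names, and the grouping of tags (including SMALL and BIG under the
semantic list) simply follows these notes.

```python
from html.parser import HTMLParser

# Tag groups as listed in the notes above (names in lowercase).
SEMANTIC = {"em", "strong", "cite", "address", "kbd", "var", "code", "small", "big"}
SYNTACTIC = {"b", "i", "tt", "strike", "sub", "sup", "u", "blink"}

class StyleTagClassifier(HTMLParser):
    """Collect the semantic and syntactic style tags found in a document."""

    def __init__(self):
        super().__init__()
        self.semantic = []
        self.syntactic = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in SEMANTIC:
            self.semantic.append(tag)
        elif tag in SYNTACTIC:
            self.syntactic.append(tag)

def classify(html):
    """Return (semantic_tags, syntactic_tags) in order of appearance."""
    parser = StyleTagClassifier()
    parser.feed(html)
    parser.close()
    return parser.semantic, parser.syntactic
```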

8. Explain Headers and Footers in HTML.

Headers: The beginning part of a rendered Webpage is called the header. The
header is the information contained at the top of a rendered web page, not at the
top of an HTML source file. The header is not an HTML tag. The header is not
formatted within the head tag, but in the body of a document. Most headers contain
a subset of the following information:
· The title of the page
· Last-updated information
· Signature of the page developer
· An icon or logo associated with the page.
· A counter of the number of visitors.
· An advertisement

The purpose of the header is to convey the most important information about the
page, introduce the page and set the tone for the page. In any collection of web
pages, it is a good idea to use consistent headers. This helps the reader to determine
the boundaries of the presentation. If a hyperlink leads to a different looking header,
readers realize they may have left the original presentation. Consistent headers help
tie the presentation together.

Footers: The bottom of many web pages contains similar information. The ending
part of a web page is called the footer. The footer is not an HTML element but rather
web page content appearing at the bottom of a page. Most footers contain a subset of
the following information.

· Navigational aids
· Last-update information
· The webmaster’s name
· A mailto hyperlink to the webmaster.
· A hyperlink leading to FAQ page.
· A copyright notice.
· A disclaimer.
· A README file that usually contains acknowledgement.
· A publication date.
· Advertisements.

The purpose of the footer is to convey additional important information about a
page.
