
B.Sc (Computer Science) Web Technologies Unit-III

1. Discuss about Directories, Search Engines and Metasearch Engines.

Information about web pages is contained in a database that has already been created, either manually or by special programs that search the web for pages. A request for information is answered by the search tool retrieving the information from its already constructed database of indexed web pages. The following are the different approaches developed in response to the need to organize and locate information on the World Wide Web:
· Directories
· Search Engines
· Metasearch Engines

Directories: The first method of finding and organizing web information is the directory approach. A directory offers a hierarchical representation of hyperlinks to web pages and presentations, broken down into topics and subtopics. Directories can be classified as either general or specialized. A general directory is also called a web directory, a subject directory or sometimes a web guide. The top level of a general directory provides a wide range of very broad topics such as arts, automobiles, education, news, science, sports and so on.
Popular general directories: LookSmart, Lycos, Yahoo.

Search Engines: The second approach to organizing and locating information on the web is a search engine, which is a computer program that does the following:
1. Allows the user to submit a form containing a query that consists of a word or phrase describing the specific information the user is trying to locate on the web.
2. Searches its database to try to match the query.
3. Collates and returns a list of clickable URLs containing presentations that match the query; the list is usually ordered, with better matches appearing at the top.
4. Permits the user to revise and resubmit a query.
A number of search engines also provide URLs for related or suggested topics. Like directories, search engines can be classified as either general or specialty search engines.
A specialty search engine is also called a vertical search engine or a topic search engine. Many people find that search engines are not as easy to use as directories. To use a search engine, the user supplies a query by entering information into a field on the screen. To be effective, a query should cause the search engine to return a small list of URLs on the user’s topic. To pose such queries, the user must learn the query syntax of the search engine with which the user is working.
Popular search engines are: AltaVista, AskJeeves, Google and Excite.

Metasearch Engines: A metasearch engine, or all-in-one search engine, performs a search by calling on more than one search engine to do the actual work. A metasearch engine does not maintain its own database of information; instead, it submits searches to other search engines and so queries their databases. The particular set of search engines to which each metasearch engine sends a query varies. Many metasearch engines collate the search results into one list, remove duplicates and then rank the pages according to how well they match the query. The advantage of a metasearch engine is that the user can access a number of different search engines with a single query. The disadvantage is that the user will have a high noise-to-signal ratio; that is, a lot of the matches will not be of interest to the user. This means the user will need to spend more time evaluating the results and deciding which hyperlinks to follow.
Popular metasearch engines are: DogPile, InFind, Mamma, MetaCrawler and MetaSearch.

IIMC

Prasanth Kumar K

2. Explain about Search Fundamentals and Search Strategies.

Search Terminology:
1. Search Tool: Any mechanism for locating information on the Web; usually refers to a search engine, a metasearch engine or a directory.
2. Query: Information entered into a form on a search engine’s web page that describes the information being sought.
3. Query Syntax: A set of rules describing what constitutes a legal query. On some search engines, special symbols may be used in a query.
4. Query Semantics: A set of rules that defines the meaning of a query.
5. Hit: A URL that a search engine returns in response to a query.
6. Match: A synonym for hit.
7. Relevancy Score: A value that indicates how closely a URL matched a query; usually expressed as a value from 1 to 100, with a higher score meaning more relevant.

Pattern Matching Query: The most basic type of query is a pattern matching query. We formulate a pattern matching query using a keyword or a group of keywords. The search engine returns the URL of any page that contains these keywords.

Boolean Queries: Boolean queries involve the Boolean operations AND, OR and NOT. Most search engines allow the user to enter Boolean queries.

Search Strategies: The user can begin by testing a number of different search engines, trying to find one that meets the following conditions:
· Possesses a user-friendly interface.
· Has easy-to-understand, comprehensive documentation.
· Is convenient to access; that is, the user need not wait several minutes before being able to submit a query.
· Contains a large database, so that it knows a lot about the information for which the user is searching.
· Does a good job of assigning relevancy scores.
Once the user finds a search engine that meets most of the above criteria, the user should concentrate on learning it well rather than learning a little bit about several different search engines.

Search Generalization (Few Hits): If the query returns no hits or too few hits, the search needs to be generalized as follows:

· If the user used a pattern matching query, eliminate one of the more specific keywords from the query.
· If the user used a Boolean query, remove one of the keywords or phrases that was combined with AND, or delete a NOT item.
· If the user restricted the search domain, remove the restriction.
· If the user is still unlucky, try keywords that are more general or exchange a couple of the keywords for synonyms.
· If this fails, the user can decide to use a directory.
Another alternative is to use a metasearch engine.

Search Specialization (Too Many Hits): If the query returns too many URLs, the user needs to specialize the search:
· If the user started with a pattern matching query, add more keywords.
· If the user began with a Boolean query, AND another keyword or use the NOT operator to exclude some pages.
· If the user is still retrieving too many hits, try capitalizing proper nouns or names.
· If nothing seems to work, try reviewing the first 20 URLs, since search engines list the best matches near the top. If they do not contain the information being sought, the user can refine the search.
· If this fails, the user could resort to a directory and work down to the topic of interest.
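As a rough illustration of Boolean queries and of the generalization and specialization steps above, the sketch below matches AND/NOT keyword queries against a tiny in-memory collection. The page data, the example.com URLs and the function name `search` are all hypothetical; real search engines use far more elaborate matching.

```python
# Hypothetical miniature "database": URL -> words that appear on the page.
PAGES = {
    "http://example.com/a": {"jaguar", "car", "speed"},
    "http://example.com/b": {"jaguar", "cat", "habitat"},
    "http://example.com/c": {"jaguar", "cat", "speed"},
}


def search(all_of, none_of=()):
    """Return URLs whose pages contain every word in all_of (AND)
    and no word in none_of (NOT)."""
    hits = []
    for url, words in PAGES.items():
        if set(all_of) <= words and not (set(none_of) & words):
            hits.append(url)
    return sorted(hits)


# Too many hits: specialize by ANDing another keyword or adding a NOT item.
broad = search(["jaguar"])
narrower = search(["jaguar", "cat"])
narrowest = search(["jaguar", "cat"], none_of=["speed"])

# Too few hits: generalize by removing one of the ANDed keywords.
no_hits = search(["jaguar", "cat", "car"])
generalized = search(["jaguar", "cat"])
```

Each added AND keyword or NOT item can only shrink the hit list, and each removed one can only grow it, which is why the two strategies mirror each other.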

3. Explain the working of a Search Engine.

Search Engine Components: Based upon functionality, a search engine is split into the following components:
1. User Interface: The screen in which the user types a query and which displays the search results.
2. Searcher: The part that searches a database for information to match the query.
3. Evaluator: The function that assigns relevancy scores to the information retrieved.
4. Gatherer: The component that traverses the web collecting information about web pages.
5. Indexer: The function that categorizes the data obtained by the gatherer and creates the index.

User Interface: The user interface must provide a mechanism by which a user can submit queries to the search engine. This is almost universally done using forms. In addition, the user interface should be friendly and visually appealing. Hyperlinks to help files should be displayed prominently, and advertisements should not hinder a reader’s use of the search engine. Finally, the user interface needs to display the results of the search in a convenient way. The user should be presented with a list of hits from the search, a relevancy score for each hit and a summary of each page that was matched. This way, the user can make an informed choice as to which hyperlinks to follow.

Searcher: The searcher is a program that uses the search engine’s index and database to see if any matches can be found for the query. The query must first be transformed into a syntax that the searcher can process. Since the database associated with the search engine is extremely large, a highly efficient search strategy must be applied. Computer scientists have spent years developing efficient search and sorting strategies.

Evaluator: The searcher locates any URLs that match the query. The hits retrieved by the query are called the result set of the search. Not all of the hits will match the query equally well. The relevancy score is an indication of how well a given page matched the query. The way the relevancy score is calculated varies from search engine to search engine. A number of different factors are involved and each one contributes a different percentage. Some of the factors are:
· How many times the words in the query appear in the page.
· Whether or not the query words appear in the title.
· The proximity of the query words to the beginning of the page.
· Whether the query words appear in the CONTENT attribute of the META tag.
· How many of the query words appear in the document.

Gatherer: A search engine obtains its information by using the gatherer, a program that traverses the Web and collects information about Web documents. The gatherer does not collect the information every time a query is made. Rather, the gatherer is run at regular intervals, and the information it returns is incorporated into the search engine’s database and indexed. A gatherer may employ essentially two different methods to search the web for new pages. Both techniques are well-known search strategies in computer science:
1. Breadth-First Search
2. Depth-First Search

Breadth-First Search: A breadth-first search proceeds in levels "across" the pages. The gatherer begins at a particular Web page and then explores all pages that it can reach by using only one hyperlink from the original page. Once it has exhausted all Web pages at that level, it explores all of the Web pages that can be reached by following only one hyperlink from any page that was discovered at level one.
In this way, a second level, which usually contains many more web pages than the first level, is explored. This process is repeated level by level until no new Web pages are found. When no more pages can be located, the search may need to jump to a new starting point.
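A minimal sketch of the breadth-first strategy described above, assuming the Web is modelled as a small dictionary of links. The page names and the `LINKS` graph are hypothetical, and a real gatherer would fetch pages over the network rather than read them from a dictionary.

```python
from collections import deque

# Hypothetical link graph: page -> pages reachable by one hyperlink.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}


def breadth_first(start, links):
    """Visit pages level by level, returning them in discovery order."""
    visited = [start]
    seen = {start}
    queue = deque([start])  # pages whose hyperlinks are still to be explored
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # skip pages that were already discovered
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


bfs_order = breadth_first("A", LINKS)
```

The queue is what enforces the level-by-level order: every page one hyperlink away from the start is handled before any page two hyperlinks away.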

Depth-First Search: A depth-first search proceeds by following a chain of hyperlinks "down" as far as possible. The gatherer begins at a particular Web page and explores one of its hyperlinks. At the new page, the gatherer follows another hyperlink. At the next page, one of its hyperlinks is followed, and so on. In contrast to the breadth-first search, hyperlinks on a given page are not fully exhausted before the gatherer goes to the next-level page. When the gatherer reaches a page from which no new pages can be discovered, the search backtracks until it can go forward again and discover new pages.
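The depth-first strategy can be sketched in the same way. As before, the `LINKS` graph and page names are hypothetical stand-ins for real Web pages; the explicit stack makes the backtracking step visible.

```python
# Hypothetical link graph: page -> pages reachable by one hyperlink.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}


def depth_first(start, links):
    """Follow one chain of hyperlinks as far as possible, then backtrack."""
    visited = [start]
    seen = {start}
    # Each stack entry is an iterator over the unexplored links of one page.
    stack = [iter(links.get(start, []))]
    while stack:
        target = next(stack[-1], None)
        if target is None:
            stack.pop()  # no links left here: backtrack to the previous page
        elif target not in seen:
            seen.add(target)
            visited.append(target)
            stack.append(iter(links.get(target, [])))  # go one level deeper
    return visited


dfs_order = depth_first("A", LINKS)
```

On this graph the gatherer reaches D through B before it ever looks at C, whereas a breadth-first search would visit C before D.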

Indexer: Once the gatherer retrieves information about Web pages, the information is placed into a database and indexed. The indexer creates a set of keys (an index) that organizes the data, so that high-speed searches can be conducted and the desired information can be located and retrieved quickly. The elements that should go into a Web page record include the URL, the document title and descriptive keywords.

4. Explain about Telnet and Remote Login.

Telnet and Remote Login are two programs that allow the user to log into another computer from an account into which the user is already logged. To do this, the user needs access to a second computer, which is usually at a different physical location.

Telnet: The telnet command uses the Telnet protocol to log into a remote computer on the Internet. The command is often called telnet, but there are also programs with names like tn3270 (for IBM 3270 terminals), WinQVT and QWS3270. There is a wide range of Telnet clients and many of them have a user-friendly interface. On a desktop system, a Telnet client can usually be launched from one of the system’s menus simply by selecting the Telnet option. If no Telnet option is visible on the desktop of a Windows system, there is still a good chance that a Telnet client is installed. To determine whether the system has Telnet, go to the Start menu and select Find. Under Find, select the Files or Folders option. Now simply enter the word “telnet” in the search area. The telnet.exe file is the executable Telnet program.

In a Windows environment, selecting the Remote System option from the Connect pull-down menu of the Telnet interface causes the Connect window to be displayed within the Telnet window. The form in the Connect window specifies the hostname, port and terminal type of the computer to which the user is connecting. Generally, all we need to do is type in a hostname and push the Connect button. For example, to connect to abc.com, just type the machine name or its IP address in the Host field.

On a UNIX system, we can type the command telnet at the operating system prompt. We then receive the following prompt:
telnet>
We can type the open command followed by the hostname of the computer to connect to, as follows:
telnet> open hostname
The hostname is the machine’s domain name or its IP address. In some cases, we need to type a port number as well. Typing help or ? at the Telnet prompt will usually display the Telnet documentation. To end the Telnet session, we can type close or quit. We can also use Telnet from the browser’s address bar, by typing
telnet://hostname
One of the most common uses of Telnet is to log into a personal machine to retrieve email while traveling. Be warned that reading email in this fashion can be very tedious from many countries. The connections are often so slow that it is sometimes impossible to retrieve the mail.

Remote Login: The rlogin command is similar to the telnet command, except that it provides the remote computer with information about where we are logging in from. If the machine from which we are performing the remote login is listed in the remote machine’s file of hostnames, we need not enter a password. On UNIX systems, the list of hostnames is kept in a hidden file called .rhosts. From the UNIX prompt, the syntax for the rlogin command is
% rlogin hostname
where hostname is the name of the machine to which we want to establish a remote login connection. All the commands entered will run on the remote machine until the remote session is terminated with an exit command. Because rlogin can skip the password check for trusted hosts, Telnet is generally regarded as the more secure of the two, although both send data, including passwords, unencrypted.
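Whether it is typed into the Connect form, given to the open command or written as a telnet:// address, a Telnet client needs a hostname and, optionally, a port. The sketch below shows one way a program might split a telnet:// address into those two parts using Python's standard `urllib.parse`. The function name `telnet_target` and the port 2323 are illustrative assumptions; 23 is the standard Telnet port.

```python
from urllib.parse import urlsplit

TELNET_DEFAULT_PORT = 23  # the standard Telnet port


def telnet_target(url):
    """Split a telnet:// address into (hostname, port)."""
    parts = urlsplit(url)
    if parts.scheme != "telnet":
        raise ValueError("not a telnet URL: " + url)
    # Fall back to the standard port when the address does not name one.
    return parts.hostname, parts.port or TELNET_DEFAULT_PORT


target_default = telnet_target("telnet://abc.com")
target_custom = telnet_target("telnet://abc.com:2323")  # hypothetical port
```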

5. Explain about File Transfer.

File Transfer is an application that allows the user to transfer files between two computers on the Internet or on the same network. The two most important file transfer functions are:
· Copying a file from another computer to the user’s computer.
· Sending a file from the user’s computer to another computer.
The process of transferring a file from the user’s computer to another computer is called uploading. The process of getting a file from another computer onto the user’s computer is called downloading. After copying files, the user should run virus detection software on them before using them. This helps safeguard against the computer getting infected, but it is not a guarantee.

Graphical File Transfer Client: Graphical file transfer clients are the easiest to use. These applications display the sending computer’s file system in one window and the receiving computer’s file system in a second window. In order to connect to a remote site using a graphical FTP client, the user should first click on the Connect button. In the first line, we simply type in the hostname or the IP address of the remote system to which we are connecting. In the third line, we enter the user account name and, in the fourth line, the password. Once we have typed all the information, we can press the OK button. This connects the user to the remote system.

Many features of a graphical FTP client are self-explanatory. For example, to transfer a file from one system to another, we can drag and drop it onto the other system. Files can thus be exchanged in either direction. One important point is the transfer mode setting. This can usually be specified by clicking on a button. Most clients have a text (ASCII) transfer mode, a binary transfer mode and an Auto mode. All file types can be transferred using binary mode, but not all files can be transferred using text mode.
After completing an FTP session, it is good practice to close the session by clicking on the Close button and then exit the FTP client by clicking on the Exit button. The following steps are followed while transferring a file:
1. Locate the file to transfer.
2. Launch the FTP client on the PC.
3. Connect and log in to the remote UNIX system.
4. Change to the appropriate directories on both the local and remote systems.
5. Select the appropriate transfer mode.
6. Select the file to transfer.
7. Transfer the file.
8. Close and exit FTP.

Text-Based File Transfer Client: On UNIX, we can launch the text-based file transfer client, named ftp after the File Transfer Protocol, by entering the command
% ftp hostname
Here hostname is the name of the computer with which we want to exchange files. Once we have successfully initiated an FTP session by supplying a userid and password, we get the following prompt:
ftp>

The following are some of the FTP commands:
· bye - Terminate the session and exit the file transfer program.
· cd - Change directory on the remote computer.
· get - Copy a file from the remote computer to the local computer.
· help - View the list of commands.
· ls - List the files in the current working directory.
· put - Send a copy of a file to the remote computer.
· pwd - Print the name of the current directory.
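For comparison, the commands listed above map fairly directly onto Python's standard `ftplib` module. The sketch below is only an illustration: the helper names `download`, `upload` and `example_session`, the directory "pub" and the file name "notes.txt" are hypothetical, and `example_session` is defined but not called because it needs a reachable FTP server and a valid account.

```python
from ftplib import FTP  # standard-library FTP client


def download(ftp, remote_name, local_path):
    """Rough equivalent of 'get': copy a remote file to the local machine."""
    with open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + remote_name, out.write)  # binary mode


def upload(ftp, local_path, remote_name):
    """Rough equivalent of 'put': send a copy of a local file."""
    with open(local_path, "rb") as src:
        ftp.storbinary("STOR " + remote_name, src)  # binary mode


def example_session(hostname, userid, password):
    """Not called here: requires a real FTP server and account."""
    ftp = FTP(hostname)          # like typing: ftp hostname
    ftp.login(userid, password)  # supply userid and password
    print(ftp.pwd())             # pwd
    ftp.cwd("pub")               # cd pub (hypothetical directory)
    print(ftp.nlst())            # ls
    download(ftp, "notes.txt", "notes.txt")  # get (hypothetical file)
    ftp.quit()                   # bye
```

Both helpers use binary mode, in line with the earlier observation that every file type can be transferred in binary mode.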

File Compression: It is common to compress files that are to be transferred between two computers. Compressing a file makes it smaller, so the compressed file can be transferred more quickly over a network. A wide variety of compression tools are available, such as WinZip, PKZIP and gzip.

Anonymous File Transfer: On some systems, files are made available to anyone who wants to retrieve them. If a file needs to be widely distributed, it may not be feasible to assign accounts and passwords to everyone interested in receiving a copy of the file. Anonymous file transfer was established to solve this problem.

6. Explain about Virus avoidance and precautions.

Computer Viruses: Some programs that are downloaded from the Internet or obtained as email attachments may threaten the security of a computer if they contain a virus, a Trojan horse or a worm.

Virus: A virus can be thought of as a program that, when run, can replicate and then embed itself within another program. Although there are harmless viruses, most are intended to damage the host system. The damage can occur immediately, for example by filling all the available space on the hard disk, or it may occur at some later time. The damage might also involve something as innocent as a message being displayed on the desktop. Before doing damage, the virus could infect the other programs on the computer, as well as other computers if we send program files to others. A specific event that causes the virus to become active, such as the arrival of a particular date, is called a trigger.

Trojan Horse: The name comes from Greek mythology. A Trojan horse is an apparently legitimate program that carries out some useful function, but hidden within it is code that is activated by some trigger. When the hidden code is executed, it might release a virus, permit unauthorized access to the computer or destroy files and data.

Worm: A worm is a stand-alone program that tries to gain access to computer systems via networks.
For example, a worm might try various password combinations until it is successful. The 1988 Internet Worm created by Robert T. Morris is a highly successful example. Although not designed to be destructive (the worm was intended to be an experiment), it caused major problems when it inadvertently consumed the available memory in the systems it invaded.

Virus Avoidance and Precautions: UNIX viruses are rare, largely because of the strict security measures on UNIX systems. Most viruses are designed to infect PCs or Macs. A virus is usually targeted at one type of system or the other, since nearly all viruses are operating-system specific. To protect against viruses, Trojan horses and worms, we need to take the following precautions:

· Run antivirus software on any new programs. This software looks for viruses and Trojan horse programs by comparing data patterns found in the user’s programs with the characteristic data patterns found in programs infected by known viruses.
· Do not download files from unknown sources. This includes mail attachments from individuals and organizations unknown to the user. If the user does download something, run antivirus software on it before opening the file or running the program.
· Do not use pirated copies of software.
· Keep the antivirus software up to date, since new releases contain the information necessary to identify the latest viruses and Trojan horse programs.
· Back up files regularly after ensuring they are virus-free. If we lose data or files because of a virus, we will be able to recover them if we have current backup files.
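The pattern comparison mentioned in the first precaution can be sketched as a simple byte search. The signature names and byte patterns below are invented for illustration and do not correspond to any real virus; actual antivirus products use much larger signature databases and additional detection techniques.

```python
# Hypothetical signatures: name -> byte pattern characteristic of an infection.
SIGNATURES = {
    "demo-virus": b"\xde\xad\xbe\xef",
    "demo-trojan": b"OPEN-BACKDOOR",
}


def scan(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures found in the given bytes."""
    return sorted(
        name for name, pattern in signatures.items() if pattern in data
    )


clean = scan(b"print('hello world')")
infected = scan(b"header" + b"\xde\xad\xbe\xef" + b"payload OPEN-BACKDOOR")
```

A scanner of this kind can only recognize patterns it already knows, which is why keeping the antivirus software up to date matters.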

7. Differentiate between Semantic and Syntactic based style types.

Semantic-Based Style Types: These are also called Content-Based Style Types. These tags are used to indicate the content of the text. The following is the list of semantic style types:
1. Emphasis Tag: The emphasis tag <EM>, with its corresponding </EM> ending tag, is used for highlighting text.
<EM>This is Web Technologies</EM>
2. Strong Tag: The strong tag <STRONG> is used to indicate an even higher level of emphasis.
<STRONG>This is Web Technologies</STRONG>
3. Citation Tag: The citation tag <CITE> is used to specify a reference. A collection of citations creates a bibliography. Using the citation tag makes that collection easier to build, since every reference is bracketed between <CITE> and </CITE>.
<CITE> Raymond Greenlaw <BR> Fundamentals of Internet and WWW <BR> Tata McGraw Hill </CITE>
4. Address Tag: The address tag <ADDRESS> is used to indicate an address. If it is used throughout a series of Web pages, it is easy to automate the process of developing an address book for the pages.
Please send mail at <ADDRESS> Street No.1 <BR> Road No.1 <BR> Hyderabad <BR> Pin-000000 </ADDRESS>
5. Keyboard Tag: The keyboard tag <KBD> is used to delineate keyboard input.
Enter text <KBD>dir</KBD> to display the directory.

6. Variable Tag: Computer scientists developed HTML and many of them like to talk about programming. To do so on Web pages, they introduced the variable tag, <VAR>. The variable tag is used to indicate an expression, usually just a sequence of letters, that has a number of different possible values. For example, the variable name file1 can represent any file name. In on-line documentation about file manipulation, we can specify how to delete a file using the following code:
Confirm to delete <VAR>file1</VAR>?
7. Code Tag: The code tag <CODE> is used for specifying program code.
Start the program <CODE> Let x=100 <BR> Let y=200 <BR> Add x and y <BR> </CODE>
8. Small Tag: The small tag <SMALL> is used to reduce the relative font size.
This is a <SMALL>Web Technology</SMALL>.
9. Big Tag: The big tag <BIG> increases the relative font size.
This is a <BIG>Web Technology</BIG>.

Syntactic-Based Style Tags: These are also called Physical-Based Style Tags. These tags allow the programmer to tell the browser specifically how to display the text on a web page. The following is the list of syntactic style types:
1. Bold Tag: The bold tag <B> is used to display text in boldface. Most browsers darken the text and widen the letters.
This is <B>Web</B> Technologies
2. Italics Tag: To place a portion of text in italics, use the italics tag <I>.
This is <I>Web</I> Technologies
3. Monospaced Typewriter Text: The typewriter text tag <TT> is used for placing text in a monospaced typewriter font. This can be used to indicate that a certain phrase needs to be typed in.
This text is <TT>Typed in a Typewriter</TT>
4. Strike Tag: The strike tag <STRIKE> may be used for crossing out a word or a phrase by having a line drawn through it.
This is <STRIKE>50% offer</STRIKE> on purchasing.
5. Subscript Tag: The subscript tag <SUB> is used to generate a subscript. To get x₁ + x₂ = 0, we use
x<SUB>1</SUB> + x<SUB>2</SUB> = 0

6. Superscript Tag: The superscript tag <SUP> is used to generate a superscript. To get x² + y² = 0, we use
x<SUP>2</SUP> + y<SUP>2</SUP> = 0
7. Underline Tag: The underline tag <U> is used to underline text. Since hyperlinks are depicted by underlining, the underline tag should be used sparingly and only in situations where no confusion can result as to whether or not the underlined item is a hyperlink.
This is <U>Web</U> Technologies
8. Blink Tag: Flashing text is created using the blink tag <BLINK>.
This is <BLINK>Web Technologies</BLINK>

8. Explain Headers and Footers in HTML.

Headers: The beginning part of a rendered web page is called the header. The header is the information contained at the top of a rendered web page, not at the top of an HTML source file. The header is not an HTML tag, and it is not written within the head tag but in the body of the document. Most headers contain a subset of the following information:
· The title of the page.
· Last-updated information.
· The signature of the page developer.
· An icon or logo associated with the page.
· A counter of the number of visitors.
· An advertisement.
The purpose of the header is to convey the most important information about the page, introduce the page and set the tone for the page. In any collection of web pages, it is a good idea to use consistent headers. This helps the reader to determine the boundaries of the presentation. If a hyperlink leads to a different-looking header, readers realize they may have left the original presentation. Consistent headers help tie the presentation together.

Footers: The bottom of many web pages contains similar information. The ending part of a web page is called the footer. A footer is not an HTML element but rather web page content appearing at the bottom of a page. Most footers contain a subset of the following information:
· Navigational aids.
· Last-updated information.
· The webmaster’s name.
· A mailto hyperlink to the webmaster.
· A hyperlink leading to a FAQ page.
· A copyright notice.
· A disclaimer.
· A README file that usually contains acknowledgements.
· A publication date.
· Advertisements.

The purpose of the footer is to convey additional important information about a page.
