SP10Q3Response

Information professionals believe that well-designed information retrieval systems (IRs), such as online databases that contain structured access points and controlled vocabulary, will facilitate user information access and retrieval. Many libraries have made online databases available to their users free of charge. However, many users prefer to use web search engines such as Google, Google Scholar, Bing, and Yahoo to find information of interest.

Answer the following: a) Select a library or agency of interest to you. Compare the strengths and weaknesses of a search conducted in a structured IR system that you feel most comfortable with (e.g., ERIC) with a Google, Google Scholar, Bing, or Yahoo search. b) Discuss how the evaluation of retrieval performance of a structured IR system may vary from that of a web search engine. c) Discuss issues in using precision and recall ratios as measures of retrieval performance in IRs and web search engines.

Information Environment: School Library

A. Relevance

//Google//

__Strengths__
 * quick
 * no password required
 * no knowledge of how to search required
 * readily available

__Weaknesses__
 * poor academic focus
 * too many results; not selective
 * challenge of evaluating the sites returned in search results

//Database (example used: EBSCO)//

__Strengths__
 * reading level identified
 * eliminates commercial sites
 * only newspapers & magazines
 * selectivity; balanced approach
 * includes peer-reviewed articles

__Weaknesses__
 * password needed
 * subscription needed
 * odd mix of newspapers and magazines (a mixture of scholarly and popular)

B. Discuss these terms

 * __resource availability__ - a resource found through Google or another internet search engine is readily available; resources offered only through subscription databases are not available to just anyone - there has to be payment (for academic library students, tuition covers it)
 * __quality__ - the credibility, validity, and reliability of the information and of its source [According to Dr. Bilal, criteria for evaluating the quality of information on the internet include: source domain (.com, .gov, .edu), authority of the source, purpose or motivation, quality of writing, balanced views, currency of the information, sources cited, and accuracy]
 * __search vocabulary options__ - some databases use a controlled vocabulary such as LCSH (United States--History--Civil War, 1861-1865); an internet search engine allows "natural language" searching (US civil war)
 * __search logic__ - natural language, Boolean operators, truncation, nesting; both web search engines and subscription databases offer Advanced Search options that enable well-defined, specific, and filtered searching - the problem is that a search may be so specific that it decreases relevant results
 * __ranking__ - the value given to search results. From ODLIS: "In information retrieval, the presentation of search results in a sequence based on one or more criteria that, in some systems, the user may specify in advance. The most common are currency (publication date) and relevance, usually determined by the number of occurrences of the search terms typed as input and their location in the record (in title, descriptors, abstract, or text)." A toy sketch of this occurrence-and-location ranking appears after this list.
 * __time__
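To make the ODLIS ranking definition concrete, here is a minimal sketch, assuming invented field weights and invented sample records (this is not any real engine's algorithm): it counts occurrences of each search term in each field and weights a match in the title more heavily than one in the descriptors or abstract.

```python
# Toy ranking by term occurrences and their location in the record,
# per the ODLIS definition above. Field weights and sample records
# are invented for illustration; no real engine works this simply.

FIELD_WEIGHTS = {"title": 3.0, "descriptors": 2.0, "abstract": 1.0}

records = [
    {"title": "The Civil War in the United States",
     "descriptors": "united states history civil war 1861-1865",
     "abstract": "A survey of the war between the states."},
    {"title": "Gardening Through the Seasons",
     "descriptors": "horticulture hobbies",
     "abstract": "Waging war on weeds, the civil way."},
]

def score(record, terms):
    """Sum weighted occurrences of each term in each field."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record[field].lower().split()
        total += weight * sum(words.count(t.lower()) for t in terms)
    return total

def rank(records, query):
    """Order records by descending score for the query terms."""
    terms = query.split()
    return sorted(records, key=lambda r: score(r, terms), reverse=True)

for r in rank(records, "civil war"):
    print(round(score(r, ["civil", "war"]), 1), r["title"])
```

Running the sketch puts the Civil War record first (matches in all three fields) and the gardening record last (matches only in the abstract), which is the behavior the ODLIS definition describes.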

C. Issues in using precision and recall ratios as measures of retrieval performance in IRs and web search engines. //Recall// measures how well a search system finds what you want, and //precision// measures how well it weeds out what you don't want. A "good" IR system retrieves relevant information while weeding out non-relevant information. Issues in using these ratios as measures of performance include:
 * recall requires knowing the total number of relevant records in the collection, which cannot be counted for very large or dynamic collections such as the web (see Lester and Koehler below)
 * the idea of "relevance" is personal (what may be relevant to one searcher may not be to another)
 * they leave out issues of time, error rate, and user satisfaction
 * recall and precision are interrelated: broadening a search usually raises recall but lowers precision, and vice versa
 * recall and precision are set measures (things are either relevant or not), whereas some web search engines rank by degree of relevance; the sketch after this list shows how the two ratios trade off as more of a ranked list is examined
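As a rough illustration of the interrelation, the sketch below computes precision and recall at successive cutoffs of a ranked result list. The ranking and the relevance judgments are invented assumptions; the pattern it shows, recall can only rise as the cutoff deepens while precision tends to fall, is the general trade-off.

```python
# Precision and recall at successive cutoffs of a ranked result list.
# The ranking and the relevance judgments are invented; real
# judgments would come from human assessors.

ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d3", "d4", "d8"}  # assumed ground truth

for k in range(1, len(ranked) + 1):
    hits = sum(1 for d in ranked[:k] if d in relevant)
    precision = hits / k               # relevant share of what was seen
    recall = hits / len(relevant)      # relevant share that was found
    print(f"top {k}: precision={precision:.2f} recall={recall:.2f}")
```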

According to Lester and Koehler (pp. 52-3), "Recall and precision are relevance measures and offer a means by which the quality of information retrieved may be measured. Recall measures the ratio of relevant records returned in a search of a database to the total number of relevant records in that database. Precision is a measure of the number of relevant records retrieved to relevant and nonrelevant records returned in any given database." Lester and Koehler then discuss two issues with trying to use these measures. First, there is an assumption "that in any given database the number of relevant and nonrelevant records or documents is known. For very large databases, it may not be possible to count or identify the number of relevant or nonrelevant records for any given search." For the WWW, it is impossible because of its size and dynamic nature. Second, defining relevance is subjective - results can be "highly to marginally relevant for any given query," and there could be disagreement about the relevancy of a result (probably based on the context of the query and the experience/knowledge of the searcher).
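Restating the quoted definitions as set ratios gives a minimal sketch like the following, with invented record sets. The key practical catch is visible in the code: recall's denominator is the set of //all// relevant records in the database, which is exactly the quantity Lester and Koehler point out cannot be enumerated for a very large database or for the web.

```python
# Recall and precision as set ratios, following the quoted
# definitions. The record sets are invented; note that for the web,
# all_relevant could never actually be enumerated, which is
# Lester and Koehler's first objection.

retrieved = {"r1", "r2", "r3", "r4", "r5"}   # what the search returned
all_relevant = {"r1", "r3", "r6", "r7"}      # every relevant record in the database

relevant_retrieved = retrieved & all_relevant  # {"r1", "r3"}

recall = len(relevant_retrieved) / len(all_relevant)   # 2/4 = 0.50
precision = len(relevant_retrieved) / len(retrieved)   # 2/5 = 0.40

print(f"recall={recall:.2f} precision={precision:.2f}")
```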

(Information for the following section found here.) In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved), whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved); a small worked example of these two perfect scores follows the list below. Sometimes people are not clear about what they are searching for, or maybe they are just browsing... Browsing //is a comfortable and powerful paradigm (known as the Serendipity Effect)//:
 * //Search results don't have to be very good.//
 * //Recall? Not important (as long as you get at least some good hits).//
 * //Precision? Not important (as long as at least some of the hits on the first page returned are good).//
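A small worked example, with invented document sets, makes the asymmetry of the two perfect scores concrete: a cautious one-result search can reach precision 1.0 while missing most relevant documents, and a dragnet search can reach recall 1.0 while returning junk alongside the good hits.

```python
# The asymmetry of the two "perfect" scores, with invented sets.
# A cautious search can hit precision 1.0 yet miss most relevant
# documents; an exhaustive search can hit recall 1.0 yet return junk.

relevant = {"a", "b", "c"}

searches = {
    "cautious":   {"a"},                      # precision 1.00, recall 0.33
    "exhaustive": {"a", "b", "c", "x", "y"},  # precision 0.60, recall 1.00
}

for name, retrieved in searches.items():
    hits = retrieved & relevant
    print(f"{name}: precision={len(hits)/len(retrieved):.2f} "
          f"recall={len(hits)/len(relevant):.2f}")
```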