RelationalDBDesign


Internet Features

Visual Information Retrieval

This interesting addition to the Oracle tool suite is very similar to the interMedia tool. Both have a Java client that lets you retrieve an image file, modify it on your desktop, and then return it to the database. Both Visual Information Retrieval and interMedia use object types with methods as the primary definitions for tables storing the images. Both allow you to update the image record with attributes for size, format, compression, comments, and the like.
The difference between the tools lies in the added query capability of Visual Information Retrieval. This tool can compare images to one another using a scoring method that scores the correlation between two images: 0 (zero) means the images are a perfect match and 100 means the images share no common traits. Visual Information Retrieval also has another method called "similar," which compares two images and rates how similar they are to each other, according to specific criteria. Technology similar to these Visual Information Retrieval methods is used in face-recognition software.
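The scoring convention described above (0 for a perfect match, 100 for no common traits) can be illustrated with a minimal sketch. This is not Oracle's implementation or API: the function names `match_score` and `is_similar`, the feature names, the weights, and the idea of deciding similarity with a threshold are all assumptions made for illustration, and each feature value is assumed to be already normalized to the range 0 to 1.

```python
def match_score(features_a, features_b, weights):
    """Return a score from 0 (identical features) to 100 (maximally different).

    Hypothetical sketch: each feature is assumed to be normalized to [0, 1],
    and the score is the weighted average of per-feature differences.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    weighted_diff = sum(
        weight * abs(features_a[name] - features_b[name])
        for name, weight in weights.items()
    )
    return 100 * weighted_diff / total_weight


def is_similar(features_a, features_b, weights, threshold):
    """Treat two images as similar when their score is at or below the threshold."""
    return match_score(features_a, features_b, weights) <= threshold


# Illustrative feature vectors (made-up values, not real image signatures).
photo = {"color": 0.25, "texture": 0.5}
copy_of_photo = {"color": 0.25, "texture": 0.5}
weights = {"color": 1.0, "texture": 1.0}

print(match_score(photo, copy_of_photo, weights))
print(is_similar(photo, copy_of_photo, weights, threshold=10))
```

Changing the weights changes which criteria dominate the comparison, which is one way to read "according to specific criteria" in the description above.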

Oracle interMedia Text (IMT)

After using the database for a while, I notice a trend that is common to many database applications: the amount of text stored in the database increases. New prospects send in resumes and their resumes are added to the PROSPECT table. Employees are evaluated, and their evaluations are added to the database. As the amount of text increases in the database, so does the complexity of the text queries performed against the database. Instead of just performing string matches, I need new text-search features, such as weighting the terms in a search of multiple terms or ranking the results of a text search. You can use Oracle's interMedia Text (IMT) option to perform text-based searches. In prior versions, this feature was known as the ConText Cartridge. If you have previously used ConText, you may find the configuration for IMT to be simpler and better integrated with the Oracle kernel. You can use IMT to perform wildcard searching, fuzzy matches, relevance ranking [1], proximity searching, term weighting, and word expansions.
Search engines almost always provide some form of relevance ranking and present their findings to searchers in decreasing order of assumed relevance. The ranking methods are normally based on statistical relations between the words in a query and the words in a text, the anchor words of in-links to the text, the location of the words in the text, and the number of in-links to the page from other sites.
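To make term weighting and relevance ranking concrete, here is a minimal Python sketch. It does not show how IMT or any search engine computes relevance: the functions `relevance` and `rank`, the sample resumes, and the weights are hypothetical, and the score is simply the weighted count of query terms found in each text.

```python
from collections import Counter


def relevance(text, weighted_terms):
    """Score a text as the sum of (term weight x term frequency)."""
    counts = Counter(text.lower().split())
    return sum(weight * counts[term] for term, weight in weighted_terms.items())


def rank(documents, weighted_terms):
    """Return (score, doc_id) pairs with a positive score, best match first."""
    scored = [(relevance(text, weighted_terms), doc_id)
              for doc_id, text in documents.items()]
    hits = [pair for pair in scored if pair[0] > 0]
    # Sort by descending score; break ties by document id for a stable order.
    return sorted(hits, key=lambda pair: (-pair[0], pair[1]))


# Made-up resume snippets standing in for rows of a text column.
resumes = {
    "r1": "oracle dba with oracle tuning experience",
    "r2": "java developer",
    "r3": "dba and java",
}
# "oracle" counts twice as much as "dba" in this query.
query = {"oracle": 2.0, "dba": 1.0}

for score, doc_id in rank(resumes, query):
    print(doc_id, score)
```

A plain string match would treat r1 and r3 alike as soon as both contain "dba"; the weighted score separates them and gives the result list an order.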

System-Computed Relevance and Ranking

An information retrieval system that ranks and orders the records (or their surrogates) in a retrieved set needs a mechanism for calculating the closeness of the match between a user query and a document. The result of this calculation determines the order in which members of the set are presented to the searcher. In other words, the calculation provides the system’s estimate of the relevance of the document, and the goal is for this estimate to correlate strongly with the user’s own judgment of relevance. The value given to the closeness of the match between the query and the document is called the retrieval status value, or rsv.
A strict Boolean query system specifies attribute values that must be present for a record to be selected. In such a system, each term in the query or document can only have a weight of 0 or 1, and the resulting rsv[2] of a document can only be 1 (accept) or 0 (reject). The result is the traditional unranked, but assumed relevant, subset of the database. If weighted terms are used, a document’s rsv, computed from the term weights, can range anywhere from 0 to 1 and is therefore potentially much more useful.
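The contrast between a binary rsv and a weighted rsv can be sketched as follows. The formulas here are assumptions chosen for illustration, not a standard definition: the Boolean version treats the query as an AND of its terms, and the weighted version divides the weight of the matched query terms by the total query weight so that the result stays between 0 and 1. The names `boolean_rsv` and `weighted_rsv` are hypothetical.

```python
def boolean_rsv(doc_terms, query_terms):
    """Return 1 if every query term occurs in the document, otherwise 0."""
    return 1 if set(query_terms) <= set(doc_terms) else 0


def weighted_rsv(doc_terms, query_weights):
    """Return the share of total query weight matched by the document (0 to 1)."""
    total = sum(query_weights.values())
    if total == 0:
        return 0.0
    matched = sum(weight for term, weight in query_weights.items()
                  if term in doc_terms)
    return matched / total


# A document that contains only one of the two query terms.
document = {"oracle", "tuning"}
query = {"oracle": 3, "dba": 1}

print(boolean_rsv(document, query))   # rejected outright by the Boolean test
print(weighted_rsv(document, query))  # partial credit under term weighting
```

The Boolean test discards the document entirely, while the weighted version keeps it with a score that reflects how much of the query it satisfies.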

Ranking

Because the purpose of the retrieval status value (rsv) is to provide a mechanism for evaluating the match between a document and a query, it allows the system to rank documents in descending order of their rsv. The system can then go down the ranked list and present the user with either a complete, ordered list of all documents that have a positive rsv or the top-ranking n documents, where n can be set by the user. These are the documents the system judges most likely to be deemed relevant by the user. Mathematically, this is called a weak ordering[3], meaning that ties are allowed. If the rsv is binary, there is no choice but to present all documents that meet the formal requirements of the query, an option that users often find frustrating. Increasingly, IR systems provide relevance ranking options, and on the Web, where precise queries may not be possible and document attributes are not explicit, all search engines use such rankings.
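The ranking step described above can be sketched in a few lines of Python. The function name `rank_by_rsv` and the sample values are hypothetical; the sketch keeps documents with a positive rsv, orders them by descending rsv, gives tied documents the same rank (a weak ordering), and optionally truncates the list to the top n entries.

```python
def rank_by_rsv(rsv_by_doc, n=None):
    """Return (rank, doc_id, rsv) tuples in descending rsv order.

    Documents with equal rsv share a rank, so the result is a weak ordering.
    If n is given, only the first n entries of the ordered list are returned.
    """
    positive = [(doc, rsv) for doc, rsv in rsv_by_doc.items() if rsv > 0]
    # Highest rsv first; document id breaks ties only for display order.
    positive.sort(key=lambda pair: (-pair[1], pair[0]))

    ranked = []
    rank = 0
    previous = None
    for doc, rsv in positive:
        if rsv != previous:
            rank += 1          # a new, lower rsv starts the next rank
            previous = rsv
        ranked.append((rank, doc, rsv))
    return ranked if n is None else ranked[:n]


# Made-up rsv values; "a" and "c" tie, and "d" has no positive match.
scores = {"a": 0.9, "b": 0.5, "c": 0.9, "d": 0.0}

print(rank_by_rsv(scores))
print(rank_by_rsv(scores, n=2))
```

With a binary rsv, every accepted document would land in rank 1, which is why a Boolean system cannot offer a meaningful top-n list.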

Challenge with Ranking

A difficulty with ranking is that users are not usually told the basis of the system's calculation. Where users have been polled for their reactions to ranking, they seem to like it. Would it make any significant difference if they were told the basis, or given an opportunity to contribute to the method, perhaps by emphasizing words occurring in the text, the name of the author, or the source? There is no research on this question to date, although systems exist that let the user supply ranking terms separate from those used in the search, e.g., the AltaVista "Sort By" box. Asking users to make such choices calls for more involvement on their part and more knowledge of the system, something not all users want to invest in. However, it could lead to better retrieval outcomes.

[1] Relevance ranking: Relevance ranking is the method used to order the result set so that the records most likely to be of interest to a user appear at the top. This makes searching easier for users, who do not have to spend as much time looking through records for the information that interests them.
[2] Retrieval status value (rsv): The retrieved documents are ranked according to their retrieval status values if these are monotonically increasing with the probability of relevance of the documents.
[3] Weak ordering: A weak ordering is a mathematical formalization of the intuitive notion of a ranking of a set, some of whose members may be tied with each other. Weak orders are a generalization of totally ordered sets (rankings without ties) and are in turn generalized by partially ordered sets (posets) and preorders.