DITA: Dita Coursework part one: Information retrieval in a public Library

Laura Doggett (110060156)

Information retrieval in a public Library

This blog post consolidates my understanding of information retrieval (IR): the processes involved in managing information and the evaluation of those processes. I will use the context of a public Library to give specific examples and technical details from my own experience to augment the theoretical understanding.

‘The essential problem in information representation and retrieval remains how to obtain the right information for the right user at the right time’ (Chu, 2003)

Representation aside, Chu has highlighted the rationale of IR: on the one hand we are faced with a flood of knowledge and on the other with potential users of that knowledge. The Library professional has two tools at their disposal for bridging the gap between these separate entities: the user’s query and an IR system. The IR system should effectively match the user’s query to the right information and, if each component works correctly, a chain of information is completed[1]. This chain reflects the user view of IR, whereas a system view deals more with the storage of information and a sources view deals more with the presentation of information.

The public Library we are considering uses an IR system called SPYDUS, ‘a web-centric, automated information management solution’ (Civica, 2011). SPYDUS networks collections in Libraries across a Borough, records the membership database and circulation transactions, and enables enquiries. Unlike a database, SPYDUS must contend with the problem of relevance. There is often no binary, yes-or-no answer to a user’s query, and only the user can define the relevance of a results set. This is a key distinction between an information retrieval system and a database, which is a system that organises data, where by data we mean ‘a set of given facts’ (Chowdhury, 2004). When a database is queried there can be no doubt as to whether the right information has been retrieved: there is simply a correct answer and an incorrect one.

An IR system has numerous functions it can use in searching information; these include Boolean searching, case-sensitive searching, truncation, proximity searching, field searching, fuzzy searching, natural language searching and weighted searching.
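Two of the functions listed above, field searching and truncation, can be illustrated with a minimal Python sketch. The catalogue records and the function name are hypothetical and chosen only for illustration; this is not how SPYDUS or any particular system implements them.

```python
# Hypothetical catalogue records; titles and authors are illustrative only.
CATALOGUE = [
    {"title": "Library Management", "author": "Smith"},
    {"title": "Librarians at Work", "author": "Jones"},
    {"title": "Garden Design", "author": "Library"},
]


def truncation_search(records, field, stem):
    """Field search with right-hand truncation.

    Looks only in the chosen field and matches any word in it that
    starts with the stem, ignoring case.
    """
    stem = stem.lower()
    return [
        record for record in records
        if any(word.startswith(stem) for word in record[field].lower().split())
    ]


# The stem "librar" stands for library, libraries, librarian and so on.
title_hits = truncation_search(CATALOGUE, "title", "librar")
author_hits = truncation_search(CATALOGUE, "author", "librar")
```

Searching the title field returns the first two records, while the same stem in the author field returns only the third, which shows how restricting the field changes the results set.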

SPYDUS, however, has three kinds of search available: natural language, field and Boolean searching. Boolean searching is named after George Boole, on whose logic it is built: ‘Boole used three operators, namely AND, OR, NOT, to summarize the logical operations of the human mind’ (Chu, 2003). When responding to a query, the operators can be used alongside keywords to expand or narrow a results set. The AND operator narrows your search by returning only information that contains all of the keywords. The OR operator expands your search by returning information that contains either or both of the keywords. The NOT operator narrows your search by limiting the results to information that contains the first keyword but excludes the second.
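The effect of the three operators can be sketched in Python using sets. The index below is a hypothetical mapping from keywords to catalogue record numbers, made up for illustration; it is an assumption and not a description of how SPYDUS stores its data.

```python
# Hypothetical inverted index: keyword -> set of catalogue record numbers.
INDEX = {
    "gardening": {1, 2, 3},
    "roses": {2, 3, 4},
    "pests": {3, 5},
}


def boolean_and(index, first, second):
    """Records containing both keywords (narrows the results set)."""
    return index.get(first, set()) & index.get(second, set())


def boolean_or(index, first, second):
    """Records containing either or both keywords (expands the results set)."""
    return index.get(first, set()) | index.get(second, set())


def boolean_not(index, first, second):
    """Records containing the first keyword but not the second."""
    return index.get(first, set()) - index.get(second, set())


and_hits = boolean_and(INDEX, "gardening", "roses")  # {2, 3}
or_hits = boolean_or(INDEX, "gardening", "roses")    # {1, 2, 3, 4}
not_hits = boolean_not(INDEX, "gardening", "roses")  # {1}
```

Set intersection, union and difference correspond directly to AND, OR and NOT, which is one reason the Boolean model is simple to implement.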

Chowdhury says ‘Because of its simplicity, the Boolean retrieval model has formed the basis of most database management and Information retrieval systems’ (Chowdhury, 2004). Although simple, the Boolean model still requires the end user to know the operators, and this could be considered a limitation when Library users search the catalogues independently.

Natural language searching, however, requires no previous skill set, as it is based on the syntax a human would use to question another human. The query can be posed in natural language and the IR system will find relevant documents via keyword ranking, presenting first the documents believed most likely to answer the query. Although user friendly, natural language searching has its own limitations: synonyms, homographs and syntax. Novice users of the Library IR system may also have difficulty finding a balance between being too precise and too vague in their query structure.
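A very simple form of keyword ranking can be sketched as follows. The stop word list, the documents and the scoring rule (counting how often the query keywords occur) are all assumptions made for illustration; real systems use more sophisticated weighting.

```python
# Hypothetical stop words that are dropped from the query before matching.
STOP_WORDS = {"how", "do", "i", "a", "the", "in", "my", "to", "of"}

# Hypothetical document descriptions keyed by title.
DOCUMENTS = {
    "Growing Roses": "a practical guide to growing roses in the garden",
    "Garden Pests": "how to control pests in the garden",
    "Family History": "tracing the history of your family",
}


def rank(query, documents):
    """Return the titles that match the query, best match first.

    The score is the number of times the query keywords occur in the
    document text. Ties are broken alphabetically by title.
    """
    terms = [word for word in query.lower().split() if word not in STOP_WORDS]
    scores = {}
    for title, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scores[title] = score
    return sorted(scores, key=lambda title: (-scores[title], title))


ranked = rank("how do I grow roses in my garden", DOCUMENTS)
```

Here "Growing Roses" ranks above "Garden Pests" because it matches two keywords rather than one. The keyword "grow" matches nothing, even though "growing" appears in the first document, which is a small instance of the syntax limitation described above.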

Having considered functions in SPYDUS that are limiting to the end user as a Library visitor, it is prudent to consider limitations that also extend to the Library professional. One drawback of the system is that it requires the user to know the correct spelling of any search terms. When faced with an unknown spelling, a foreign author for example, the user is forced to refer to another source (most commonly the World Wide Web) to verify it first. This difficulty could be avoided by the addition of fuzzy searching to SPYDUS, which is ‘designed to find terms that are spelled incorrectly’ (Chu, 2003).
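As a rough illustration of what fuzzy searching could offer, the sketch below uses the `difflib` module from the Python standard library to suggest author names close to a misspelled query. The author list, the function name and the similarity cutoff are assumptions for illustration; this is not a feature of SPYDUS.

```python
import difflib

# Hypothetical list of author surnames held in the catalogue.
AUTHORS = ["Dostoevsky", "Tolstoy", "Turgenev", "Chekhov"]


def fuzzy_author_search(query, authors, cutoff=0.7):
    """Suggest catalogue authors whose names are similar to the query.

    Comparison ignores case. The cutoff (between 0 and 1) is an assumed
    threshold: higher values demand a closer match.
    """
    by_lowercase = {author.lower(): author for author in authors}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=3, cutoff=cutoff
    )
    return [by_lowercase[match] for match in matches]


# A plausible misspelling of a foreign author's name.
suggestions = fuzzy_author_search("Dostoyevski", AUTHORS)
```

With a function like this, the best suggestion for the misspelled "Dostoyevski" is "Dostoevsky", so the user would not need to verify the spelling in another source first.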

Considering the limitations of Boolean and natural language searching, and of SPYDUS itself, it is clear why it is often necessary to modify a query to improve the results set returned. Once a results set is obtained it may be unsatisfactory in two distinct ways. We may have too many results that are not relevant enough, in which case we need to improve the precision of the results, which in turn tends to lower the recall. Alternatively we may have relevant results but too few of them; in this case we must improve the recall, which in turn tends to lower the precision. We can adjust the inversely related values of precision and recall by being more or less specific with our query until we are satisfied with the relevance of our results set.

This relevance, as mentioned earlier, must be decided by the judgement of the end user and is therefore a slippery quality to measure. Nonetheless evaluation measures, shown below, have been developed in order to evaluate retrieval in a mathematical, objective manner.

Precision = Relevant documents retrieved / Total documents retrieved

Recall = Relevant documents retrieved / Total number of relevant documents in database
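The two measures can be worked through in a short Python sketch. The record numbers below are invented for illustration, and the set of relevant records is assumed to be known in advance, which in practice depends on the user's judgement.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents in the database."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical record numbers the user would judge relevant.
RELEVANT = {1, 2, 3, 4}

# A broad query finds every relevant record but also four irrelevant ones.
broad_results = {1, 2, 3, 4, 5, 6, 7, 8}
# A narrow query returns only relevant records but misses two of them.
narrow_results = {1, 2}

broad_precision = precision(broad_results, RELEVANT)    # 4 / 8 = 0.5
broad_recall = recall(broad_results, RELEVANT)          # 4 / 4 = 1.0
narrow_precision = precision(narrow_results, RELEVANT)  # 2 / 2 = 1.0
narrow_recall = recall(narrow_results, RELEVANT)        # 2 / 4 = 0.5
```

The broad query has perfect recall but only half its results are relevant, while the narrow query has perfect precision but finds only half of the relevant records, which is the trade-off that query modification tries to balance.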

However, in my experience of IR in a public Library, the Library professional is given direct feedback from the user (whose query they are resolving) as to whether the search results are relevant. I believe this makes query modification the key function for successful information retrieval in this context. When a Library user is present at the time of the query they can give instant feedback on a results set, giving the Library professional the guidance to balance precision and recall until the best possible information chain is formed. This enables them to answer the aforementioned essential problem to the best abilities of the Library collection.

[1] The concept of the ‘right’ information requires further contemplation, as it is subjective to the user, and will be returned to later in this blog.

Beaulieu, A. (2009), Learning SQL, Sebastopol: O’Reilly Media.

Chowdhury, G.G. (2004), Introduction to modern information retrieval, London: Facet Publishing.

Chu, H. (2003), Information representation and retrieval in the digital age, New Jersey: Information Today.

Civica (2011), http://www.civica.co.uk/library-and-learning/spydus_library_automation_solution, retrieved 26th October 2011.

MacFarlane, A. (2011) Lecture 04: Information Retrieval, London: City University.

MacFarlane, A., Butterworth, R. and Krause, A. (2011) Lecture 03: Structuring and querying information stored in databases, London: City University.

Morville, P. and Rosenfeld, L. (2007), Information architecture for the world wide web, Sebastopol: O’Reilly Media.

Blog address: http://laura-frances-dita.blogspot.com

Friday, 28 October 2011
