DITA: Dita Coursework part two: Web 3.0 technologies and their potential to develop Museum websites

Web 3.0 technologies and their potential to develop Museum websites

At a time when the heritage arena is saturated, museums continue to compete for visitors (Moore, 1994), and those visitors ‘are spending 65% more time online than three years ago’ (BBC news, 2010), it becomes important to assess the role of the Web within these organisations.

This blog post analyses the potential use of Web 3.0 technologies in the advancement of museum websites; part one establishes what these technologies are and part two assesses how they can be utilised. The specific example of Reading Museum and its involvement in the Your Paintings project is examined to augment the theoretical understanding.

Part one: Web 3.0

The Semantic Web

In the context of this post, Web 3.0 technologies refer to a cluster of related tools, largely developed by the W3C in the hope of encouraging the next generation of the Web.

The Semantic Web is the brainchild of computer scientist Tim Berners-Lee, who created the Web, as we know it, in the 1990s (Berners-Lee, 1999). In his 2009 TED talk (TED, 2009) Berners-Lee illustrates the difference between the two Webs, explaining the current Web as the Web of linked documents and the Semantic Web as the Web of linked data. He describes the Web of linked data as ‘a web of data that can be processed directly or indirectly by machines’ (Berners-Lee, 1999), meaning that machines can understand the information they are processing.

The current Web has become a deluge of Web pages that must be sifted through in order to find required information. Because the information on this Web is written in a language to which machines cannot attach meaning, they are severely limited in their ability not only to deliver the documents we require, but also to perform more complex tasks with the data embedded within them.

Current search engines work by locating patterns of words within documents. The user forms a query and the search engine returns documents in which the same words appear; however, the search engine has no understanding of the question asked, the words it is looking for, or the documents returned. This style of information retrieval explains why an image search for ‘paintings by Francis Bacon’ will return works by Bacon, but also images of Francis Bacon painted by Lucian Freud.

The machines’ lack of semantic knowledge of the queries and documents also means that they are unable to make links between words; there is no scope for synonyms or related terms. For example, a search for ‘Victorian furniture’ will not return a document about a ‘Victorian chair’. Equally, a search for ‘19th century chair’ would not return a document relating to an ‘1852 chair’.

A further limitation is that the search engine will not pluck the required information from documents into one place; it is still necessary to sift through Web pages, just of a narrower range.
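The word-matching behaviour described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is implemented: the function name keyword_search and the three sample documents are invented for the example.

```python
def keyword_search(query, documents):
    """Return the ids of documents that contain every word of the query.

    The match is purely on the words themselves; nothing here knows that a
    chair is furniture or that 1852 falls in the 19th century.
    """
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    ]


# Invented sample documents, for illustration only.
documents = {
    "doc1": "A Victorian chair made of walnut",
    "doc2": "Victorian furniture in English country houses",
    "doc3": "An 1852 chair by an unknown maker",
}

print(keyword_search("Victorian furniture", documents))  # ['doc2']
print(keyword_search("19th century chair", documents))   # []
```

The Victorian chair and the 1852 chair are both missed, because the program compares strings of characters rather than meanings.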

These limitations would be overcome by the implementation of the Semantic Web, as Diane M. Zorich describes in her paper ‘Beyond Bitslag: Integrating museum resources on the Internet’ (Zorich, 1995).

She proposes that this form of the Web ‘would allow users to search across sites without navigating through each individual resource. Instead of a web, a fabric would be a more appropriate analogy. Each warp and woof of fabric is interwoven, and the entire piece is examined to identify relevant segments. Because users wouldn’t search ‘link by link’ across sites, the process would be less time consuming and less disorientating. Users would not move from site to site to get information; instead, information would come to them from a universe of sites. They wouldn’t need to visit or even know where the information physically resides.’ (Zorich, 1995)

HTML, Web Services and XML

Berners-Lee believes that the way for his original Web to evolve into Zorich’s fabric, the Semantic Web, is by ‘putting data on the web in a form that machines can naturally understand, or converting it to that form’ (Berners-Lee, 1999). To do this, a new universal language is needed.

In the current Web, when viewing a Web page we are given information, but this information is arranged aesthetically: it is surrounded by formats, fonts and colours, all set by the page’s designer. The reason for this design is that a Web page is intended to be read by humans and as such is written in a language called HTML. This language labels the information with formatting instructions to tell a Web browser how to present it to the user. These instructions are the only part of the language machines can understand; they have no concept of the information embedded within.

A Web service is a way of communicating the information in a Web page stripped of its formatting instructions; it is written in a different language, one that is understood by machines. XML is a metalanguage, a series of grammatical rules that describe the making of a language that can be used to create ‘self-describing information’ (Bosak and Bray, 1999). It was developed by a W3C working group as a simplified subset of SGML (Standard Generalized Markup Language). XML allows different languages, known as dialects, to be created for different applications, as long as the rules are adhered to. The result is data that also contains information describing what it is, which can then be read by machines.

This is the great potential of XML languages: if a machine ‘identifies data, that data becomes available for other tasks. A software program can be designed to extract just the information that it needs, perhaps join it with data from another source, and finally output the resulting combination in another form for another purpose’ (Castro, 2001).
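As a rough sketch of what Castro describes, the Python below reads a small XML record, extracts just the fields it needs and joins them with data from a second source. The XML dialect (the collection, painting, title, artist and date elements) and the function name extract_and_join are assumptions made up for this example; they are not taken from any real museum schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dialect for museum records; the element names are invented.
collection_xml = """
<collection>
  <painting>
    <title>Painting 1946</title>
    <artist>Francis Bacon</artist>
    <date>1946</date>
  </painting>
</collection>
"""

# A second, separate source of data, keyed by artist name (life dates).
artist_dates = {"Francis Bacon": "1909-1992"}


def extract_and_join(xml_text, dates):
    """Pull title and artist out of each record and add the artist's dates."""
    root = ET.fromstring(xml_text.strip())
    combined = []
    for painting in root.findall("painting"):
        artist = painting.findtext("artist")
        combined.append({
            "title": painting.findtext("title"),
            "artist": artist,
            "artist_dates": dates.get(artist, "unknown"),
        })
    return combined


records = extract_and_join(collection_xml, artist_dates)
print(records)
```

Because each value is wrapped in a tag that says what it is, the program can ask for the artist directly instead of guessing which words on a page might be a name.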

RDF – The Resource Description Framework

Although information written in XML will be in a form machines can process and extract data from, in order for this information to be organised, and links between it to be made, another Web 3.0 technology must be used in conjunction with it.

RDF, the Resource Description Framework, is a language developed by the W3C. The hope is that RDF ‘should do for Web data what catalogue cards do for Library books’ (Bosak and Bray, 1999). It is a framework used to identify Web pages and the data within those pages, as well as the relationships between them, in an unambiguous manner.

The W3C say that ‘RDF is intended for situations in which this information needs to be processed by applications, rather than only being displayed to people. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning’ (W3C, 2004).

RDF can be used to turn a natural language statement about a resource, whether a whole website or information within it, into one that can be understood by machines.

For example:

http://www.artquotes.net/masters/bacon/paint_painting.htm has a creator whose value is Francis Bacon.

We are saying that the resource, a digital copy of Francis Bacon’s Painting 1946, has a property, creator, whose value is Francis Bacon; in other words, he painted it. RDF expresses this statement as a triple whose parts can each be identified by a Uniform Resource Identifier, or URI. The three parts are:

· The Subject (which is the painting)

· The Predicate (which is creator)

· The Object (which is Francis Bacon).

Information labelled in this way can be written as XML. If this particular resource were put onto the Web in this format, RDF/XML, then there could be no confusion as to whether it was a painting of, or by, Francis Bacon. A machine would be able to understand the words the image was labelled with, what they mean and how they relate to each other. It is a Web written in this format that will be the Semantic Web, the Web of meaning.
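The statement about the Bacon painting can be sketched as a triple and written out as RDF/XML using only Python's standard library. Two assumptions are made here that are not in the text above: the Dublin Core creator property is used as the predicate, and the object is written as a plain text value rather than a URI. The helper to_rdf_xml is a made-up name for this example, not part of any RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core (assumed vocabulary)

# Subject, predicate and object of the statement in the text.
subject = "http://www.artquotes.net/masters/bacon/paint_painting.htm"
predicate = DC_NS + "creator"
obj = "Francis Bacon"  # written as a literal value in this sketch
triple = (subject, predicate, obj)


def to_rdf_xml(triple):
    """Serialise one (subject, predicate, object) triple as RDF/XML."""
    subject, predicate, obj = triple
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
    )
    # Split the predicate URI into namespace and local name. This simple
    # split only works for vocabularies whose namespace ends in a slash.
    namespace, local_name = predicate.rsplit("/", 1)
    prop = ET.SubElement(description, f"{{{namespace}/}}{local_name}")
    prop.text = obj
    return ET.tostring(root, encoding="unicode")


rdf_xml = to_rdf_xml(triple)
print(rdf_xml)
```

The output is a single rdf:Description of the painting's URI containing a dc:creator element, so a program reading it is told explicitly that Francis Bacon is the creator of the work rather than its sitter.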

Part Two: Web 3.0 Technologies in the context of a museum

With the technologies available to make the Semantic Web a reality, what is the impetus for heritage institutions to adopt them, and is it a viable option for them to do so?

In his book Weaving the Web, Berners-Lee takes the existence of online screen-scraping products, such as comparison Websites, as a general rationale for the development of the Semantic Web: ‘I take as evidence of the desperate need for the Semantic Web the many recent screen-scraping products, such as those used by the brokers, to retrieve the normal Web pages and extract the original data’ (Berners-Lee, 1999).

The current Your Paintings project, run by the BBC and the Public Catalogue Foundation (PCF), is not technically a screen-scraping product, but it could be taken as similar evidence within the Museum sector.

Your Paintings

‘Your Paintings is a joint initiative between the BBC, the Public Catalogue Foundation (a registered charity) and participating collections and museums from across the UK.’ It ‘is a Website which aims to show the entire UK national collection of oil paintings’ (Your Paintings, 2011).

This impressive Website is not a comparison site; information from over a thousand UK galleries (Your Paintings, 2011) has been compiled into one online database, which allows users to search images of the collections of all the participating institutions simultaneously. However, it does respond to the same need as the screen-scraping products: the need for an alternative to trawling through countless Websites for disparate information on the same topic.

If we take Berners-Lee’s logic as a premise, then this project demonstrates a need within the Museum sector for collections to be available in a format enabled for the Semantic Web.

Reading Museum

Reading Museum is one of the institutions participating in the Your Paintings project, and discussion with Art Curator Elaine Blake suggests that a vast amount of Museum time and resources was invested in doing so.

She says that there was a ‘great deal of work involved for the Museum. The Curator of Art spent a long time checking information, organising work programmes and liaising with the PCF, particularly over their standardisation of museum data. The Art documentation officer sorted out images to send and accompanied the photographer, making sure that all works that needed photographing were done. Most importantly a knowledgeable volunteer who was already working on checking attributions and information relating to artists represented in the collection was diverted to make the oil paintings his priority’ (Doggett and Blake, 2011).

It also appears that the museum undertook the task of creating standardised data, well suited to a transition into RDF. ‘All the information about the paintings came from the museum (pcf supplied Excel spreadsheet requiring standardised information for each record – title, date, artist, artists dates, medium, dimensions, IP, museum object number, previous attribution and a few others)’ (Doggett and Blake, 2011).

Through its participation in the project, Reading Museum has already taken steps towards making its collection available on its own website in a format enabled for the Semantic Web; it just remains to convert the data into Web 3.0 formats.
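To give a sense of how small that remaining step could be, here is a minimal sketch that turns one row of a standardised spreadsheet into subject, predicate and object triples. Everything specific in it is an assumption for illustration: the column names only loosely follow the fields Elaine Blake lists, the example.org addresses are placeholders, and the sample row is invented rather than taken from Reading Museum's records.

```python
import csv
import io

# Placeholder addresses, not real museum or PCF URIs.
BASE = "http://example.org/collection/"
VOCAB = "http://example.org/vocab/"

# Stand-in for a spreadsheet exported as CSV; the row is invented sample data.
sheet = io.StringIO(
    "object_number,title,artist,date,medium\n"
    "EX.0001,Example Landscape,A. N. Artist,1900,oil on canvas\n"
)


def rows_to_triples(csv_file):
    """Turn each spreadsheet row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(csv_file):
        # The museum object number identifies the painting being described.
        subject = BASE + row["object_number"]
        for column, value in row.items():
            if column != "object_number" and value:
                triples.append((subject, VOCAB + column, value))
    return triples


triples = rows_to_triples(sheet)
for triple in triples:
    print(triple)
```

Each standardised column becomes a predicate and each cell becomes an object, which is why data already cleaned for the PCF spreadsheet is a good starting point for RDF.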

Zorich analyses the potential for this step quite exhaustively in her 1995 paper, and concludes that the benefits of Museums sharing their resources in this format are extensive, far beyond the realm of the image searches Your Paintings provides. Briefly, the advantages she describes are better exhibition planning, improved cataloguing and research, and conservators and curators learning from each other’s experiences (Zorich, 1995).

Museums publishing their own data, rather than relying on an intermediary, would also remove the problem of keeping pace with changing data and new acquisitions. Elaine Blake explains that, under Your Paintings, the museum will be able to make such updates only if the project continues to secure funding, and only when the PCF requests them (Doggett and Blake, 2011).

Conclusion

The development of the Your Paintings project shows the need within the heritage sector to create amalgamated access to resources. Individual institutions making their collections available online in Web 3.0 format will enable Museums to ‘create a shared resource that offers unprecedented access to a large portion of the world’s cultural heritage’ (Zorich, 1995), leading to many advantages not only to users of Museum Websites but also to their staff.

Appendix

Email correspondence between Laura Doggett and Elaine Blake (Curator of Art at Reading Museum). 5/12/2011.

How did the museum get involved in the project? We were approached by PCF when they considered a Berkshire catalogue. We had already worked with the National Inventory of Oil Paintings project (pre.1900 European paintings) which was subsumed into the PCF project.

What made your decision to participate? We always look to opportunities to increase access to collections and this is an ambitious project to open the collection to the whole web-audience. It was always clear that if it succeeded it would make a unique and much needed research tool. It also built on our own pilot web-based image database (Reading foundation for Art collection)

How much work was involved in getting the collection online? How much time did the pcf spend at the museum? How many people were involved? Great deal of work involved for museum. Don’t know how much pcf spent but their involvement at the museum was 2-3 days of photography. They provided a framework for data for individual paintings and, of course, all the subsequent collation, editing and importantly where we did not know intellectual property right info. they tried to discover that. The Curator of Art spent a long time checking information, organising work programmes and liaising with the pcf, particularly over their standardisation of museum data. The Art documentation officer sorted out images to send and accompanied the photographer making sure that all works that needed photographing were done. Most importantly a knowledgeable volunteer who was already working on checking attributions and information relating to artists represented in the collection was diverted to make the oil paintings his priority. The information could not have been provided to a level of accuracy that the museum would be satisfied with in the time frame required by the pcf without his input.

Did the museum supply information to go with the paintings, if so, in what format? All the information about the paintings came from the museum (pcf supplied Excel spreadsheet requiring standardised information for each record – title, date, artist, artists dates, medium, dimensions, IP, museum object number, previous attribution and a few others). Museum also supplied images of works where we had a good image – we almost certainly provided more images than the pcf took – I think that this was probably unusual as adding images to our records of the art collection (7,500 works) had been a priority for the last few years.

Will the project be ongoing? Will the online collection be edited if the museum gains or loses pieces? And what would this involve? We expect the project to continue (hopefully pcf will continue to attract funding and BBC will have funds to adjust the related Your Paintings web-site) with further levels of data being requested next year along with the opportunity to correct any mistakes to current data, add new acquisitions (by the way, museum does not lose works), and add new data to existing entries. Further levels of data are likely to involve interpretation which is extremely time-consuming and therefore unlikely to happen (depends precisely on what pcf want). Art curator would expect to supply changes to existing data and new records including photography which will be quite onerous.

Do you feel the project has improved the museum’s website? No but it has thrown the store doors open to the world (literally considering the enquiries that I have been receiving)

Do you think that this project and virtual museum projects such as Google Art could have an effect on the amount of visitors museums receive, either positive or negative? I think that whilst many people will use the image database without visiting, it is unlikely that anyone wanting to experience the original will be put off visiting because of seeing the database. Furthermore it will encourage many people to enquire further or visit the original because for the first time they will know where works are and who to contact in order to visit them.

Bibliography

Blog address: http://laura-frances-dita.blogspot.com/

BBC news. (2010), www.bbc.co.uk/news/10122834, accessed 12th December 2011

Bearman, D. and Trant, J. (1997), Museums and the Web, Pennsylvania: Archives and Museums Informatics

Berners-Lee, T. (1999), Weaving the Web, London: The Orion Publishing group

Bosak, J. and Bray, T. (1999), XML and the Second-Generation Web, Scientific American: Feature article May 1999

Carbonell, B. (2004), Museum studies, Oxford: Blackwell Publishing

Castro, E. (2001), XML for the world wide web: visual quick start guide, Berkeley: Peachpit press

Doggett and Blake (2011), Email correspondence between Laura Doggett and Elaine Blake (Curator of Art at Reading Museum), Personal correspondence

Moore, K. (1994), Museum management, London: Routledge

Parry, R. (2007), Recoding the Museum, London: Routledge

TED, (2009), www.ted.com/talks/tim-berners-lee-on-the-next-web-html, accessed 12th December 2011

W3C, (2004), www.w3.org/TR/rdf-primer, accessed 12th December 2011

Your Paintings (2011), www.bbc.co.uk/arts/yourpaintings/about, accessed 12th December 2011

Zorich, D. (1995), Beyond Bitslag: Integrating Museum resources on the Internet, Wired, December 1995, p.60

DITA

Saturday, 7 January 2012
