Blogs as Tools of Preservation
Defining credibility on the Web
For the past few years archivists and preservationists have been struggling with how to preserve digital records. Electronic media have become a part of almost all aspects of business and culture as more and more information is produced digitally. The digital environment poses a new set of preservation problems, but it also offers a great set of possibilities. The principles that archivists have been developing for digital preservation come from the traditional world of archives and are achievable when applied to discrete digital objects. The principles of authenticity, provenance, and fixity can translate to a collection of related digital objects; they are not as easily applied to the Web as a whole. Tim Berners-Lee envisioned the Web as a community of shared information. “Its universality is essential: the fact that a hypertext link can point to anything, be it personal, local or global, be it draft or highly polished” is the essence of the Web [10]. In such an environment a single document may belong to many collections through many different relationships.
How, then, does one preserve the electronic documents found on the World Wide Web while maintaining their provenance? One solution is the Internet Archive. In 1996 Brewster Kahle conceived of a method of capturing everything available on the World Wide Web for the sake of preservation. “The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month” [18]. Despite the effort to capture everything, accessing information from the Archive can be clunky and the results incomplete. Should we take the attitude of “saving everything” even if doing so becomes technically and economically feasible? In the end it may be impossible to capture everything because “the Web is both an information archive and a social network” [8]. How, then, is the decision made on what to preserve? Weblogs could provide an understanding of how to accomplish this task.
“11.30.03
Adventures in Vermont living, chapter thirty-four:
CHIMNEY FIRE!
It is bitter out this morning, a raw-wind day, little pellets of wet ice spitting from the grey sky. I came home from Meeting laden with groceries (I stopped for groceries after Meeting; the Quakers do not distribute groceries to Meeting-goers or anything) and bundled them all in the house and set them on the floor of the front room and thought, boy howdy, a fire would be just the ticket to-day” [2]
This is what most people think of when discussing a weblog. Yet blogs, as they are commonly called, have matured into more than diaries of personal musings. Since the advent of free blogging applications, the ease of publication has transformed the world of blogs from the preserve of a group of HTML-savvy techies into a reflection of social culture as a whole [6]. Members of a variety of communities have adopted blogging as a method of disseminating and filtering information. In a timely response to the proliferation of electronic information, individuals have begun providing the service of selection through the platform of blogs [11]. Yet blogs are still governed by their authors. They provide a wonderful mix of personality, culture, and current events. “Blogs succeed largely because they are extremely native to the Web as Tim Berners-Lee conceived it in the first place” [35]. The blogosphere has grown exponentially in the past few years, making the content being created on blogs more and more difficult to ignore.
Blogs should be thought of as micro-content management systems instead of online diaries. As the volume of content created on blog platforms grows and becomes part of the wider culture, decisions about its preservation will need to be made. Paul Conway offers “five basic values [which] inform the preservation of traditional materials and digital products: Longevity, Choice, Quality, Integrity, and Accessibility” [13]. Blogs offer the service of choice, or selection, because of their hypertext nature and because of the networked community they form. The question that then arises is what determines the integrity, or credibility, of this selection process. This article focuses on the principle of integrity as seen in the credibility of authorship and content selection within blogs, both individually and as a community.
Rebecca Blood, author of We’ve Got Blog, traces the word weblog to Jorn Barger, who first used it in December 1997. “The original weblogs were link-driven sites. Each was a mixture in unique proportions of links, commentary, and personal thoughts and essays. . . These weblogs provide a valuable filtering function for their readers. The Web has been, in effect, pre-surfed for them” [6]. A list of these early weblogs was compiled by Jesse James Garrett, editor of Infosift, and posted on Cameron Barrett’s site CamWorld. “CamWorld began on June 11, 1997 - a few days before [he] was to begin teaching a college class on HTML and new media design. [he] initially used the site to post links to web design articles [he] wanted [his] students to read for class” [3]. The site grew and was frequently referenced by other members of the community. Originally the community’s members were individuals who had been using the Web and authoring in HTML. When blogging applications began to be released in the spring of 1999, the community grew from technically oriented individuals to anyone with the desire to create their own content. Blogger, one of the popular applications that survived the first wave, is freely available and “gives you a way to automate (and greatly accelerate) the blog publishing process without writing any code or worrying about installing any sort of server software or scripts” [1]. With this easy method of publication, “blogs [began to] vary greatly: personal (diaries, photos, poetry, post news about a hobby, keeping in touch with family members, gossip, celebrity fan mail), technical (project updates, develop ideas collaboratively, platforms for uncensored ideas) or news (breaking news stories, rumors)” [39].
Like Blogger, other blogging applications have “come out of the traditional web site content management space” [35]. “The software automatically formats and posts the entry. It also automatically archives older ones on separate pages. If categories are used in the creation of entries, the software can also create subject-specific archives based on keywords used” [31]. The blog’s design can also be customized to the author’s needs. These features place blogging applications in the ranks of content management systems. Blogging applications may not be as sophisticated as the systems Bob Boiko describes in his book Content Management Bible [7]. Boiko envisions a CMS that supports collecting, managing, and publishing content in any format. Through various layers of metadata, content components are managed by tracking authorship, version control, and the appearance of the output [7]. Increasingly, only a thin layer of functionality, such as basic workflow, permissions, and update histories, separates blogware from low-end content management solutions. Yet blogs already have the ability to operate on multiple databases with a variety of users who manage posts through draft status and scheduled future postings [19]. Blogs can no longer be viewed as simple pages of links or ramblings from personal diaries. Blogging platforms now appeal to a wide audience and add multiple layers of functionality to the content being provided.
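The automatic archiving quoted above can be illustrated with a small sketch: entries are filed both into date-based (monthly) archive pages and into subject-specific archives built from their category keywords. This is a minimal illustration under stated assumptions, not the behaviour of Blogger or any particular tool; the `Entry` type and `build_archives` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """A hypothetical blog entry: posting date, title, and category keywords."""
    posted: date
    title: str
    categories: list[str] = field(default_factory=list)


def build_archives(entries):
    """Group entries into monthly archives and subject-specific archives.

    Returns two dicts mapping an archive key ("YYYY-MM" or a lowercased
    category) to the titles filed under it, newest entry first.
    """
    by_month = defaultdict(list)
    by_category = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.posted, reverse=True):
        by_month[entry.posted.strftime("%Y-%m")].append(entry.title)
        for category in entry.categories:
            # Lowercase so "Vermont" and "vermont" share one archive page.
            by_category[category.lower()].append(entry.title)
    return dict(by_month), dict(by_category)
```

Keeping both indexes from a single pass mirrors the idea that one post can belong to several collections at once, by date and by subject.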
One application of the blog as a content management system is within the corporate environment. Corporations are using blogs not only to share news throughout the business but also to track business processes on projects. “Many companies are creating team and project blogs for internal use. These blogs serve as centralized locations for knowledge management and project coordination” [37]. For example, Community Connect, a New York company that operates a variety of online communities, posted comments about each candidate it interviewed for a position on a password-protected weblog to help track the candidates’ status. “Verizon Communications uses a weblog to collect news and intelligence about the industry and competitors” [32]. As these examples show, corporations are using blogs to maintain company information electronically in a variety of ways. Blogs have been adopted by other communities as well.
Information specialists are another community that has adopted the blog for communicating information on the Web. “Librarians are great filters of information,” which makes the blog a remarkable platform for relaying that information [34]. One of the best-known librarian blogs is Gary Price’s site Resource Shelf. Price is seen as the “indefatigable finder of reference sources on the invisible web” [5]. Because he posts his finds on a weblog, his readers have instant access to them without a clogged email inbox. “Jenny Levine was one of the first to do this in 1995,” as a way to highlight valuable resources and to give librarians a reason to go on the Web [5]. Her weblog is The Shifted Librarian. Other notable library weblogs include Library Stuff and Librarian.net.
Blake Carver has taken his blog to the next level of community building by creating a collaborative blog called LISNews.com. By signing in with a user name and password, a variety of people can post information to a single site. LISNews is divided into sections, making it easy for readers to find the type of information they seek. Carver has over twenty librarians from all over the United States contributing to his site.
Blogs are not limited to individuals and their informal collaborations; non-corporate institutions have begun to adopt the medium as well. Several public and academic libraries use blogs to broadcast recent information. “The UK’s Gateshead Library has a blog with topics ranging from the top albums of the 1980’s to Spider Man and the value of XreferPlus” and “the Waterboro Public Library, ME, has run a prolific blog with content added daily” [11]. Other examples from the academic world include blogs maintained by the Rowland Institute at Harvard and by the R. B. House Undergraduate Library at the University of North Carolina at Chapel Hill.
In academia, blogs have been picked up not only by the library community but also by scholars. Ray Schroeder uses a blog to continue communication with his students. He began by emailing students in the early 1990s and soon moved to listservs. He then realized that students were not removing their names from the listservs after graduation and continued to receive the updates and links he was providing. He also began to receive inquiries from people on campus who were not in his classes and wanted to be added to the listservs. “The listservs, at their high point, directly reached a few hundred students, former students, and colleagues. The Online Learning Update blog, on the other hand, collects thousands of visits each month” [33]. Schroeder’s experience shows how blogs can change and influence the flow of information.
Professors are using blogs not only in the classroom but also as a way to publish ideas and research. “In their skeptical moments, academic bloggers worry that the medium smells faddish, ephemeral. But they also make a strong case for blogging’s virtues” [22]. The freedom and informality of the Web allow scholars to explore topics and expound at any length rather than limiting their content to sound bites. Henry Farrell, an assistant professor of political science at the University of Toronto at Scarborough, maintains a list of over ninety-three scholarly blogs, most of which are less than a year old [22]. Disciplines listed on Farrell’s blog include political science, economics, sociology, law, and history, to name a few. The interesting aspect of these blogs is that their readership is not confined to one discipline: scholars interact across the community and even post information outside their areas of expertise. How, then, can one judge the credibility of information posted when the author is not viewed as an authority on the subject?
Defining Credibility on the Web
Nicholas Burbules describes a variety of ways in which credibility is judged. Judgments of credibility are based on what is useful, relevant, or interesting. Further assessment occurs when looking at the timeliness and comprehensiveness of information. “The standard criteria for judging credibility online are frustrated by the characteristic conditions of the World Wide Web and of the larger Internet” [8]. Assessing credibility on the Web is therefore a difficult task. “The markers of institutional credibility and authority, the lines of tradition that allow viewers to judge media sources or publishers, for example, have not been settled yet” [8]. Other, more perfunctory methods are used to determine the credibility of a Web site, such as visual quality and design, URL domain name, date of material, and a personal judgment of whether the source appears to be authoritative [8]. Unfortunately these methods are not foolproof and can easily be exploited by someone intentionally producing false information.
A recent study by the Stanford University Persuasive Technology Lab (SUPTL) “found that when people assessed a real Web site’s credibility they did not use rigorous criteria, nearly half of all consumers (or 46.1%) in the study assessed the credibility of sites based in part on the appeal of the overall visual design of a site, including layout, typography, font size and color schemes” [17]. Consumer WebWatch, an affiliate of SUPTL, suggests five general guidelines to follow when determining credibility on the Web: Identity, Advertising and Sponsorships, Customer Service, Corrections, and Privacy. Unfortunately, each of these factors captured less than 10% of the participants’ attention in the Stanford study; participants overwhelmingly evaluated credibility based on visual design. “This result indicates that Consumer WebWatch, along with librarians and information professionals, must increase efforts to educate online consumers so they evaluate the Web sites they visit more carefully and make better educated decisions” [17].
This research should be compared with other work showing that identity plays a strong role in determining credibility within online communities. David Millen and John Patterson examined the effects of an identity policy in online environments. They concluded that the “identity policy: bridged and enriched online and face-to-face interactions, promoted accountability in support of local commerce, and fostered a social norm of polite conversation” [30]. It has also been shown that trust in online environments decreases when it is difficult to assess the motives of a user [29]. The protocol of identifying oneself is reflected in the blogging community as well, because blogs are not only Web sites of content but also members of a community.
Unlike other online communities that developed around anonymity, such as MOOs and MUDs, bloggers more often than not want their identities known. Most of the blogs that act as content filters clearly identify who is authoring and selecting the information on the site. If there is no link to a description of the author and no name visible on the site, an email address is often provided. A number of individuals who are well known and well regarded within their professional communities have also gained fame within blogging communities and the blogosphere. In these instances the individual’s identity lends credibility to the blog. Of course, the identity of a blog’s author does not have to be apparent on the blog itself; it can be established outside the blogosphere, either through other media sources or by virtue of the social network.
The social network is another source of a blog’s credibility. Because blogs characteristically link to one another, the more people link to a blog, the more social capital its author gains. This capital has value both in and out of the blogosphere. Wil Wheaton, a former child actor on the television series Star Trek: The Next Generation, has made a name for himself inside and outside the blogosphere. “The blog has become Wheaton’s portal into a new career as writer” [20]. Links to Wheaton’s blog can be found on a number of respected blogs within the blogging community, such as that of Doc Searls, editor of Linux Journal. Also, since this network exists both inside and outside the blogosphere, an anonymous blog may be added to a network because its author is known to the network’s members in the real world. The social network of a blog is made visible through blogrolling, a term which, though taken from the name of the software that creates the roll, aptly describes its function. The blogroll becomes a salient marker of which other blogs are commonly read and linked to by the blog being viewed. Most often the list appears in a sidebar next to the content of the blog.
From this list a pattern emerges: the most popular and authoritative blogs list each other. “These new nodes on the net are perhaps more analogous to year-round conferences” [5]. Leaders in the field quickly rise to the top of the blogroll and are linked to most frequently. The network built through professional organizations and conferences is maintained through the blog community and reflects the credibility of that community. The community is also sustained through commenting on individual blogs. Authors can add a feature to their blog that allows readers to comment on recent posts. “Credibility, authority, and accountability take the form of feedback and linking” [11]. Heated debates and lengthy discussions have taken place in the comments sections of blogs. If someone posts erroneous information, the mistake is likely to be brought quickly to the author’s attention, helping to keep the information on the blog accurate and credible.
Commenting also establishes credibility throughout the community through a form of informal peer review. “In an essay in the May issue of Reason magazine, Mr. Sanchez [a staff writer at the Cato Institute] noted that blogging permitted the investigation of Mr. Lott [a gun researcher who has been accused of inventing the results of a telephone survey] to proceed much more quickly than a controversy two years earlier about the former Emory University historian Michael A. Bellesiles, whose book Arming America: The Origins of a National Gun Culture (Alfred A. Knopf) was ultimately exposed as error-ridden” [22]. In this case the network of links and feedback worked to expose academic misconduct quickly.
The network of bloggers can also capture and correct information that is not covered by mainstream news. One of the most noted instances of bloggers capturing egregious behavior concerns the comments Trent Lott made at Strom Thurmond’s 100th birthday celebration. “Most major media outlets ignored the remark but online journalists, especially Webloggers such as Josh Marshall, Andrew Sullivan, and David Frum posted scathing attacks on Lott (with the latter two being conservatives)” [21]. Blogs kept Lott’s story fresh through linking and posting, and even researched the subject further. “Atrios [an anonymous blogger] (www.atrios.blogspot.com), found a 1948 Thurmond campaign document telling voters that electing his rival, Harry Truman, would mean ‘anti-lynching and anti-segregation proposals will become the law of the land and our way of life in the South will be gone forever’” [9]. The information gathered in the blog community led the mainstream media finally to pick up the story, and ultimately to Lott’s resignation as Senate Majority Leader. In this way the network of blogs was able to enforce the ethic of credibility outside the blogosphere.
Burbules explores this idea of assessing the credibility of a Web site in terms of links: “By linking Web sites together and collectively screening the addition of new material, [online communities] pool their intelligence and expertise to make credibility judgments and to cross-check one another. . . One might term this an instance of ‘distributed credibility’ in that it displaces an individual judgment with a collective intelligence” [8]. Through this process of networking, no single blog needs to stand as the definitive authority; instead, the collective intelligence creates authority. It is from this distributed credibility that the blogging community provides a means of selection.
The blog medium is essentially designed to integrate links within content. References to online resources or recently published news are common. Links also develop between blogs, which reinforces the credibility of the information because the sites most linked to at any given moment become apparent. Daypop, a search engine that indexes weblogs, captures this process. Its Top 40 feature ranks the sites being linked to most often on a given day. By reviewing the citations given for the links listed at Daypop, one can see how far a story or digital object has spread beyond a single blogging community into the blogosphere as a whole.
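A Top 40-style ranking of the kind described above can be sketched by counting how many distinct blogs link to each URL in a day’s posts. This is a hypothetical illustration, assuming the posts have already been reduced to (blog, URL) pairs; Daypop’s actual ranking method is not described here and may differ, and `top_links` is an illustrative name.

```python
from collections import Counter


def top_links(citations, n=40):
    """Rank URLs by how many distinct blogs linked to them.

    citations: iterable of (blog, url) pairs harvested from one day's posts.
    Returns up to n (url, blog_count) pairs, most-linked first.
    """
    # Count each blog at most once per URL, so one blog repeating
    # a link cannot inflate that link's rank on its own.
    distinct = set(citations)
    counts = Counter(url for _blog, url in distinct)
    # Sort by descending count, then by URL for a stable tie-break.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Counting distinct citing blogs, rather than raw link occurrences, is what makes the count a rough measure of how far a story has spread across the blogosphere.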
Another way of capturing the network of links is the application researchers developed in “On the Bursty Evolution of Blogspace.” The blogosphere is described as a “network of small but active micro-communities” which “exhibit[s] striking temporal characteristics” [28]. “Within a community of interacting bloggers, a given topic may become the subject of intense debate for a period of time, then fade away. These bursts of activity are typified by heightened hyperlinking amongst the blogs involved” [28]. Kumar and his colleagues’ research is based on the algorithm developed by Jon Kleinberg in 1999. Kleinberg’s research “focus[ed] on the use of links for analyzing the collection of pages relevant to a broad search topic, and for discovering the most ‘authoritative’ pages on such topics” [27]. Kleinberg developed an algorithm designed to identify hub and authority pages for a search. He based his research on hyperlinks because “hyperlinks encode a considerable amount of latent human judgment, and we claim that this type of judgment is precisely what is needed to formulate a notion of authority” [27]. His tests produced plausible, expected hubs for the searches performed, and even clearly separated hubs that fell on opposite sides of an argument, as with the search performed on “abortion.” Kleinberg admits that “authority,” a human-determined quality, is difficult to generate by machine.
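Kleinberg’s hub-and-authority idea can be illustrated with a minimal version of the iterative computation he describes: a page’s authority score is the sum of the hub scores of the pages linking to it, a page’s hub score is the sum of the authority scores of the pages it links to, and both score vectors are normalized after each round. This is a simplified sketch, not Kleinberg’s full method or Kumar et al.’s application: it omits the construction of a query-specific subgraph described in [27], uses a fixed number of iterations instead of a convergence test, and the `hits` function and sample graph are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Compute (hub, authority) scores for a small directed link graph.

    links: dict mapping each page to the list of pages it links to.
    """
    # Every page that appears as a source or a target gets a score.
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of the pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub update: sum of the authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, ())) for page in pages}
        # Normalize each vector so its squared entries sum to 1.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: blogs a, b, and e all link to c; a and b also link to d.
    graph = {"a": ["c", "d"], "b": ["c", "d"], "e": ["c"]}
    hub, auth = hits(graph)
    print(max(auth, key=auth.get))  # prints c
```

In this toy graph c emerges as the strongest authority because every hub points to it, while a and b score as better hubs than e because they point to more authorities.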
Yet when the algorithm is applied to the blogosphere, as Kumar and his colleagues have done, a combination of human and machine-generated authority emerges. The content that appears on a blog has already been through a process of selection by human judgement. Once the content is posted, the distributed network of blogs selects the most relevant and credible information. If archivists and preservationists captured these bursts of activity, analysis of them could then help determine which objects and content should be preserved.
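To flag the “bursts of activity” described above, an archive could, in the simplest case, compare each day’s count of blog links to a topic against that topic’s recent average. The sketch below is a hypothetical threshold heuristic, not the burst model used by Kumar et al. [28]; the function name `find_bursts` and the values chosen for `window`, `ratio`, and `min_count` are arbitrary assumptions.

```python
from statistics import mean


def find_bursts(daily_counts, window=7, ratio=3.0, min_count=5):
    """Return indices of days whose link count spikes above the recent baseline.

    daily_counts: list of how many blog links referenced a topic on each day.
    A day is flagged when its count is at least `min_count` and at least
    `ratio` times the mean of the preceding `window` days. The baseline is
    floored at 1.0 so a quiet history does not make every small count a burst.
    Days before the first full window are not evaluated.
    """
    bursts = []
    for day in range(window, len(daily_counts)):
        baseline = mean(daily_counts[day - window:day])
        count = daily_counts[day]
        if count >= min_count and count >= ratio * max(baseline, 1.0):
            bursts.append(day)
    return bursts
```

The flagged days would mark candidate moments whose linked objects an archivist might review for preservation; a real system would need a more principled model of burstiness than this fixed threshold.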
Currently there is no strong system of selection in place for the Web as a whole. The proposed solution of using the blogging community is a radical departure from the centralized selection that normally takes place within the archival community. Yet it may be the best-suited solution for a medium like the Web and its digital objects. Blogs are native to the Web, and the community of blogs is growing to represent society as it exists within the digital realm. A decentralized and democratic method of selection may be the only way to manage the glut of information being produced digitally. With the Internet Archive growing at a pace of 12 terabytes a month, will it remain economically feasible to continue saving everything? Archives have had a strong tradition of selection throughout the ages. Simply because digital records occupy less physical space does not mean they should not go through the same process. The value of a preserved record lies in its usefulness and accessibility, not in sheer volume.
Even if blogs offer a possible solution to the problem of selection, the problem of longevity has not yet been addressed. There is an opportunity for blogging applications to address this problem by adding more features. A link-checking feature, for example, could help ensure the technical integrity of a blog’s content. Metadata capturing users’ click paths would allow analysis of usage and of the development of the social network within the community, and metadata could be added to track versions as well. The number of blogging applications is currently relatively small and the blogosphere relatively new, making this an opportune time to add functionality concerned with the preservation of content.
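A link-checking feature of the kind proposed above might work roughly as follows: extract every hyperlink from a post’s HTML, request each one, and record those that no longer resolve. This is a hypothetical sketch using only Python’s standard `html.parser` and `urllib.request`; the function names are illustrative, and it assumes servers answer HEAD requests (some do not). A production checker would also need politeness delays, retries, and handling of relative URLs.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags in a post."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(post_html):
    parser = LinkExtractor()
    parser.feed(post_html)
    parser.close()
    return parser.links


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it could not be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (URLError, ValueError, OSError):
        # Unreachable host, timeout, or a malformed/relative URL.
        return None


def broken_links(post_html, check=http_status):
    """List (url, status) for links that are unreachable or return 400 or above."""
    broken = []
    for url in extract_links(post_html):
        status = check(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Passing the status function as a parameter lets the checker be run against recorded results (or tested offline) instead of the live Web.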
It is said that there is strength in numbers, and so it is with blogs. Individually, blogs may not hold much value; it is when the community is observed as a whole that the medium presents value. Blogs are native to the Web. They perform the task of filtering and selecting content found on the Web, and because of their community-building nature this selection becomes representative of the social culture on the Web. Both the ephemeral and the significant are captured through this democratic process. A blog archive would therefore serve society as a whole and would represent the institutional knowledge of the Web.
1. Pyra Labs. (2002). About. Retrieved Dec. 10, 2003, from http://new.blogger.com/about.pyra
2. Anonymous. (2003). Adventures in Vermont living, chapter thirty-four. Retrieved Dec. 06, 2003, from http://www.nobodysdoll.com
3. Barrett, C. (2003). About Cameron Barrett. Retrieved Dec. 10, 2003, from http://www.camworld.com/about/
4. BBC. (2003, October 31). World drowning in oceans of data. BBC News. Retrieved November 28, 2003, from http://www.citris.berkeley.edu/news/Archives/Articles_03/10_31_data.htm
5. Block, M. (2001). Communicating off the page. Library Journal, 126, 50-53.
6. Blood, R. (2002). We've got blog. Cambridge, MA: Perseus Publishing.
7. Boiko, B. (2002). Content management bible. New York: Hungry Minds Inc.
8. Burbules, N. (2001). Paradoxes of the web: the ethical dimensions of credibility. Library Trends, 49, 441-453.
9. Burkeman, O. (2002, December 21). Bloggers catch what Washington Post missed. The Guardian. Retrieved November 28, 2003, from http://www.guardian.co.uk/usa/story/0,12271,864036,00.html
10. Berners-Lee, T. (1998). A one-page personal history of the web. Retrieved Dec. 07, 2003, from W3Consortium: http://www.w3.org/People/Berners-Lee/
11. Carver, B. (2003). Is it time to get blogging? Library Journal, 128, 30-33.
12. Conhaim, W. (2003). Personal journals: new uses for an age-old practice. Information Today, 20, 27-30.
13. Conway, P. (2003). Definitions and overviews. Presented on September 8, 2003.
14. Erard, M. (2003, November 27). Decoding the new cues in online society. The New York Times. Retrieved November 28, 2003, from http://www.nytimes.com/2003/11/27/technology/circuits/27frie.html
15. Farkas, C., et al. (2002). Anonymity and accountability in self-organizing electronic communities. Proceedings of the ACM workshop on privacy in the electronic society [20-21 November 2002]. Washington, DC: Workshop On Privacy In The Electronic Society.
16. Fichter, D. (2001). Blogging your life away. Online, 25, 68.
17. Fogg, B. (2002). How do people evaluate a Web site’s credibility? Retrieved Nov. 07, 2003, from Persuasive Technology Lab Stanford University: http://www.webcredibility.org/
18. Internet Archive. Frequently asked questions. Retrieved Dec. 07, 2003, from http://www.archive.org/about/faqs.php
19. Hiler, J. (2002). Blogs as disruptive tech. Retrieved Dec. 06, 2003, from http://www.webcrimson.com/ourstories/blogsdisruptivetech.htm
20. Gilmore, D. (2003, October 8). Blog has become former actor's portal into new career. Knight Ridder/Tribune news service. Retrieved November 9, 2003, from Infotrac Database
21. Glaser, M. (2002, December 17). Weblogs credited for Lott brouhaha. Online Journalism Review. Retrieved November 28, 2003, from http://www.ojr.org/ojr/glaser/1040145065.php
22. Glenn, D. (2003). Scholars who blog. Chronicle of Higher Education, 49, 14-17.
23. Greenspan, R. (2003). Blogging by the numbers. CyberAtlas. Retrieved November 28, 2003, from http://cyberatlas.internet.com/big_picture/applications/print/0,,1301_2238831,00.html
24. Harder, G. (2003). Throw another blog on the "wire": libraries and the weblogging phenomena. Feliciter, 49, 85-88.
25. Howard, J. (2003, November 16). It's a little too cozy in the blogosphere. The Washington Post. Retrieved November 21, 2003, from Proquest Database
26. Jones, M. (2003, October 14). Interview: Six Apart's degree of weblog integration; blog tools vendor positions itself for enterprise growth. InfoWorld.com. Retrieved November 9, 2003, from Infotrac Database
27. Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 604-632.
28. Kumar, R., et al. (2003). On the bursty evolution of blogspace. Proceedings of the twelfth international conference on World Wide Web [20-24 May 2003]. Budapest, Hungary: International World Wide Web Conference.
29. Marx, G. (1999). What's in a name? Information Society, 15, 99-113.
30. Millen, D. & Patterson, J. F. (2003). Identity disclosure and the creation of social capital. CHI '03 extended abstracts on Human factors in computer systems [5-10 April 2003]. Ft. Lauderdale, FL: Conference on Human Factors and Computing Systems.
31. Notess, G. (2002). The blog realm: new sources, searching with daypop, and content management. Online, September/October, 70-72.
32. O'Shea, W. (2003, July 7). The online journals known as web logs are finding favor as an efficient way to communicate within the workplace. New York Times. Retrieved November 28, 2003, from http://query.nytimes.com/gst/abstract.html?res=F70813FF3C590C748CDDAE0894DB404482
33. Schroeder, R. (2003). One path to the blog. eLearn Magazine. Retrieved November 25, 2003, from http://www.elearningmag.org/sub_page.cfm?section=3&list_item=14&page=1
34. Schwartz, G. (2003). Blogs for libraries. WebJunction. Retrieved November 9, 2003, from http://www.webjunction.org/do/DisplayContent?id=1432
35. Searls, D., & Sifry, D. (2003). Building with blogs. Linux Journal, 2003, 4.
36. Selgino, J. (1998). A new archive and internet search engine may change the nature of on-line research. Chronicle of Higher Education, 44, .
37. Tepper, M. (2003). The rise of social software. netWorker, 7, 19-23.
38. Weidlich, T. (2003, June 22). The corporate blog is catching on. New York Times. Retrieved November 28, 2003, from http://nytimes.com/2003/06/22/business/yourmoney/22EXLI.html?ex=1371614400
39. Young Jr., T. (2003). Blogs: is the new online culture a fad or the future? Knowledge Quest, 31, 50-51.