Indiana University Web Sites on Archive-It.org

 
Collection Overview
Indiana University Web Sites seeks to preserve and facilitate access to web sites produced by administrative offices, schools, departments, service units, institutes, centers, programs, and faculty, student, and alumni organizations on the Indiana University Bloomington campus. In addition, web sites for a few Indiana University offices responsible for system-wide operations have also been collected.
 
Citing Web Sites in the Archive
Please cite the collection as follows: Indiana University Web Sites.  Archived by the Indiana University Libraries Web Archive at http://www.archive-it.org/collections/219  <accessed [date]>
Please cite individual seeds or web pages as follows: “School of Education.” Indiana University Web Sites.  Archived by the Indiana University Libraries Web Archive at  http://www.archive-it.org/collections/219  <accessed [date]>
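
For readers who build citations programmatically, the two citation patterns above can be sketched as a small Python helper. This is only an illustration: the function name format_citation and the written-out date style are assumptions, since the guidance above specifies only "[date]".

```python
from datetime import date
from typing import Optional

COLLECTION_TITLE = "Indiana University Web Sites"
COLLECTION_URL = "http://www.archive-it.org/collections/219"


def format_citation(accessed: date, seed_title: Optional[str] = None) -> str:
    """Return a citation for the whole collection, or for one seed or page
    when seed_title is given (hypothetical helper, not an Archive-It tool)."""
    # Assumed date style, e.g. "March 5, 2007"; the page only says [date].
    accessed_text = f"{accessed.strftime('%B')} {accessed.day}, {accessed.year}"
    # Individual seeds are cited with their title in quotation marks first.
    prefix = f"“{seed_title}.” " if seed_title else ""
    return (
        f"{prefix}{COLLECTION_TITLE}. Archived by the Indiana University "
        f"Libraries Web Archive at {COLLECTION_URL} <accessed {accessed_text}>"
    )
```

Calling format_citation(date(2007, 3, 5), "School of Education") yields the seed-level form; omitting the title yields the collection-level form.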
 
Selection Criteria
Scope: The goal is to preserve and make accessible every web site created by a unit on the IU Bloomington campus, along with the web sites of a few important system-wide offices. An IU Bloomington web site is excluded only if it is password protected, blocked by robots.txt, or otherwise inaccessible to the Internet Archive’s automated systems.
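
As an illustration of the robots.txt exclusion mentioned above, the following Python sketch tests whether a robots.txt file would block a crawler from a given URL, using the standard library's urllib.robotparser. The user-agent string and the example.org URLs are assumptions made for the example; this is not a description of how the Internet Archive's crawler is actually configured.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler name, for illustration only; consult the crawler's own
# documentation for the user agent it really announces.
CRAWLER_USER_AGENT = "archive.org_bot"


def is_blocked_by_robots(robots_txt: str, url: str,
                         user_agent: str = CRAWLER_USER_AGENT) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A hypothetical robots.txt that closes one directory to all crawlers.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

With this example file, a page under /private/ is reported as blocked, while pages elsewhere on the site are not.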
Volume: Currently, there are 182 unique domains or seeds being captured in this collection.
 
Crawl Parameters
Collection Dates:  Start Date:  July 1, 2006
How often captured:  The frequency of capture is determined by analyzing how often each site changes over time.  Most sites are expected to be crawled quarterly; a few active sites will be crawled monthly, and some less active sites will be crawled annually.
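
The idea of matching crawl frequency to how often a site changes can be sketched as follows. The function name and both numeric thresholds are hypothetical, chosen only to make the example concrete; the actual analysis used for this collection is not described here.

```python
def choose_crawl_frequency(changes_per_year: float) -> str:
    """Map an observed rate of change to one of the three crawl schedules.

    The thresholds below are illustrative assumptions, not the collection's
    real decision rule.
    """
    if changes_per_year >= 12:   # changes about monthly or more often
        return "monthly"
    if changes_per_year >= 2:    # changes a few times a year
        return "quarterly"
    return "annual"              # rarely changes
```

Under these assumed thresholds, a site that changes about four times a year would fall into the quarterly schedule, which matches the expectation that most sites are crawled quarterly.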
 
Acquisition Parameters
Depth:  The complete web site, if possible. 
Breadth:  Links are followed out to one external level.
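
One possible reading of these depth and breadth settings is sketched below in Python: every page on the seed's own host is captured, pages on other hosts are captured when a seed-site page links to them directly, and links found on those external pages are not followed. This interpretation, the function name, and the in-memory link table are all assumptions for illustration; the crawler's real scoping rules may differ.

```python
from collections import deque
from urllib.parse import urlsplit


def pages_in_scope(seed_url: str, links: dict) -> set:
    """Return the URLs captured under a 'whole site plus one external level' rule.

    links is a toy stand-in for the web: it maps a page URL to the list of
    URLs that page links to.
    """
    seed_host = urlsplit(seed_url).hostname
    in_scope = {seed_url}
    queue = deque([seed_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in in_scope:
                continue
            in_scope.add(target)
            # Only pages on the seed's host have their links followed, so
            # external pages are captured one level out and no deeper.
            if urlsplit(target).hostname == seed_host:
                queue.append(target)
    return in_scope
```

For example, if a seed page links to an external page that in turn links to a second external page, only the first external page falls in scope.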
 
Searching
Archive-It provides full-text search for all public collections. Alternatively, if you know the site you are looking for, enter its URL into the search box, and Archive-It will search for archived instances of that URL.
Archive-It release 2.0 (July 24, 2006) enables searching of both the full text of web sites and the metadata that has been assigned to the seeds, or individual URLs.  However, the ability to search on metadata elements is not yet available to the public.
The search tool that provides full-text access to the Library's web archive collections is powered by the open-source search engine Nutch.
 
Some hints on searching:
  • Generally, search results are ranked by relevance according to several factors:
    • how often the query terms appear in the page relative to how often they appear throughout the collection
    • how often the query terms appear in the page compared to the length of the page
    • whether the query terms appear in the URL
    • whether the query terms appear in the hostname
  • The default Boolean operator is AND, so results must contain every search term.
  • If you know that what you're looking for is in a specific type of file, you can limit your search to that format by adding type:[file type] to your search terms. For example, a PDF document about Herman Wells might be found with the search string: Herman Wells type:pdf
  • If you want to find a topic discussed on a specific archived web site, you can limit your search by adding site:[URL of archived site] to your search terms. For example, David Baker site:www.music.indiana.edu/ will find mentions of David Baker on the School of Music web site.
  • You can refine search results in the following ways:
    • The link to other versions will take you to a list of archived versions that were captured on different dates.
    • The more from... link will take you to other hits from that host.
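
The type: and site: limits described in the hints above can be combined in a single search string. The following Python sketch assembles such a string; build_query is a hypothetical helper written for this illustration and is not part of Archive-It.

```python
from typing import Optional


def build_query(terms: str, file_type: Optional[str] = None,
                site: Optional[str] = None) -> str:
    """Assemble a search string from terms plus optional type: and site: limits."""
    parts = [terms.strip()]
    if file_type:
        parts.append(f"type:{file_type}")   # e.g. type:pdf
    if site:
        parts.append(f"site:{site}")        # e.g. site:www.music.indiana.edu/
    # Terms are separated by spaces; the default operator between them is AND.
    return " ".join(parts)
```

For instance, build_query("Herman Wells", file_type="pdf") produces the string Herman Wells type:pdf, which can be pasted into the search box.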
Because the Indiana University Libraries have been archiving web sites only since spring 2006, you may wish to look for earlier versions of many of the sites in the Library's collections through the Internet Archive's general Wayback Machine.  The Wayback Machine, however, is not full-text searchable; you must know the URL of the site that you would like to view.
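
Because a Wayback Machine lookup starts from a known URL, the address of a lookup page can be built directly. The Python sketch below relies on the Wayback Machine's public address pattern, web.archive.org/web/[timestamp]/[URL], where an asterisk in place of the timestamp lists all captures; the helper name wayback_url and the example site are assumptions for illustration.

```python
def wayback_url(site_url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine address for site_url.

    timestamp is a date string such as "20050101" (YYYYMMDD, optionally
    followed by hhmmss); the default "*" requests the list of all captures.
    """
    return f"https://web.archive.org/web/{timestamp}/{site_url}"
```

For example, wayback_url("http://www.indiana.edu/") gives the address of the capture listing for that site, and passing a timestamp requests a capture near that date.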
 
Contact Information:
Philip Bantin
Director, Office of University Archives and Records Management