Web Server Setup

Web-Based Publishing (INFO 290-19)

Erik Wilde, UC Berkeley School of Information
2007-04-19
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Abstract

For Web-based publishing, a core question is how to make published information available, which is the task of a Web server (technically speaking, an HTTP server). While there are many different Web server implementations, this lecture uses the most popular one, the Apache HTTP Server, as an example to illustrate the important aspects of Web server configuration.

Web Basics

Server Statistics

Outline (Basic Operation)

  1. Basic Operation
  2. Configuration Control
  3. Access Control
  4. Server-Side Includes (SSI)
  5. Content Negotiation
  6. Crawlers
  7. Conclusions

HTTP Service

Apache Request Processing

Request Processing in Apache

Filters

Web Pages

Outline (Configuration Control)

Where to Look

Outline (Access Control)

Limiting Clients

Managing Authentication

Requesting Authorization

Outline (Server-Side Includes (SSI))

Server-Side Processing

SSI Configuration

Basic SSI

Processing Problems

Advanced SSI

Outline (Content Negotiation)

URI vs. MIME

Type Maps

URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
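
To make the structure of the type map above concrete, the following Python sketch parses it into records and picks a variant by language. It is only an illustration: the names `parse_type_map` and `pick_variant` are hypothetical, and the selection rule (try the client's languages in order, ignore quality values) is a deliberate simplification, not Apache's actual negotiation algorithm.

```python
TYPE_MAP = """\
URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
"""


def parse_type_map(text):
    """Split a type map into records, one dict per blank-line-separated block."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            name, _, value = line.partition(":")
            record[name.strip().lower()] = value.strip()
        records.append(record)
    return records


def pick_variant(records, preferred_languages):
    """Return the URI of the first variant matching the preferred languages.

    Assumption: languages are tried in the given order and quality values
    are ignored. Apache's real algorithm weighs several dimensions.
    """
    variants = [r for r in records if "content-language" in r]
    for wanted in preferred_languages:
        for variant in variants:
            languages = [l.strip() for l in variant["content-language"].split(",")]
            if wanted in languages:
                return variant["uri"]
    return None


if __name__ == "__main__":
    records = parse_type_map(TYPE_MAP)
    print(pick_variant(records, ["de", "en"]))
    print(pick_variant(records, ["en"]))
```

Under these assumptions, a client that prefers German is handed `foo.fr.de.html`, because that variant lists both `fr` and `de` as its content languages, while a client that only accepts English gets `foo.en.html`.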

Emulating Type Maps

Negotiation Algorithm

Outline (Crawlers)

Controlling Crawlers

# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism

User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /tmp
Disallow: /logs
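
As a rough illustration of how a well-behaved crawler might interpret the file above, the sketch below feeds it to Python's standard `urllib.robotparser` module. The helper `allowed` and the crawler name `somebot` are assumptions made for this example; the paths are invented test inputs.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism

User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /tmp
Disallow: /logs
"""

# Parse the rules from memory instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Hypothetical helper: may `agent` fetch `path` on this site?"""
    return parser.can_fetch(agent, "http://webcrawler.com" + path)


if __name__ == "__main__":
    for agent in ("webcrawler", "lycra", "somebot"):
        for path in ("/index.html", "/tmp/cache.html", "/logs/access.log"):
            print(agent, path, allowed(agent, path))
```

An empty `Disallow:` line excludes nothing, so `webcrawler` may fetch everything, `lycra` is shut out completely, and every other crawler is only kept away from paths beginning with `/tmp` or `/logs`. Note that robots.txt is purely advisory: it steers cooperative crawlers but is not an access control mechanism.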

Steering Crawlers

Outline (Conclusions)

Your Customer Touchpoint