1) Introduction
- 1) Purpose
- 2) Requirements
- 3) Terminology
- 4) Overall Operation
2) Notational Conventions and Generic Grammar
- 1) Augmented BNF
- 2) Basic Rules
3) Protocol Parameters
- 1) HTTP Version
- 2) Uniform Resource Identifiers
- 3) Date/Time Formats
  - 1) Full Date
  - 2) Delta Seconds
- 4) Character Sets
  - 1) Missing Charset
- 5) Content Codings
- 6) Transfer Codings
  - 1) Chunked Transfer Coding
- 7) Media Types
  - 1) Canonicalization and Text Defaults
  - 2) Multipart Types
- 8) Product Tokens
- 9) Quality Values
- 10) Language Tags
- 11) Entity Tags
- 12) Range Units
4) HTTP Message
- 1) Message Types
- 2) Message Headers
- 3) Message Body
- 4) Message Length
- 5) General Header Fields
5) Request
- 1) Request-Line
  - 1) Method
  - 2) Request-URI
- 2) The Resource Identified by a Request
- 3) Request Header Fields
6) Response
- 1) Status-Line
  - 1) Status Code and Reason Phrase
- 2) Response Header Fields
7) Entity
- 1) Entity Header Fields
- 2) Entity Body
  - 1) Type
  - 2) Entity Length
8) Connections
- 1) Persistent Connections
- 2) Message Transmission Requirements
9) Method Definitions
- 1) Safe and Idempotent Methods
  - 1) Safe Methods
  - 2) Idempotent Methods
- 2) OPTIONS
- 3) GET
- 4) HEAD
- 5) POST
- 6) PUT
- 7) DELETE
- 8) TRACE
- 9) CONNECT
10) Status Code Definitions
- 1) Informational 1xx
  - 1) Continue
  - 2) Switching Protocols
- 2) Successful 2xx
- 3) Redirection 3xx
- 4) Client Error 4xx
- 5) Server Error 5xx
11) Access Authentication
12) Content Negotiation
- 1) Server-driven Negotiation
- 2) Agent-driven Negotiation
- 3) Transparent Negotiation
13) Caching in HTTP
- 1) ..
- 2) Expiration Model
- 3) Validation Model
- 4) Response Cacheability
- 5) Constructing Responses From Caches
- 6) Caching Negotiated Responses
- 7) Shared and Non-Shared Caches
- 8) Errors or Incomplete Response Cache Behavior
- 9) Side Effects of GET and HEAD
- 10) Invalidation After Updates or Deletions
- 11) Write-Through Mandatory
- 12) Cache Replacement
- 13) History Lists
14) Header Field Definitions
- 1) Accept
- 2) Accept-Charset
- 3) Accept-Encoding
- 4) Accept-Language
- 5) Accept-Ranges
- 6) Age
- 7) Allow
- 8) Authorization
- 9) Cache-Control
  - 1) What is Cacheable
  - 2) What May be Stored by Caches
  - 3) Modifications of the Basic Expiration Mechanism
  - 4) Cache Revalidation and Reload Controls
  - 5) No-Transform Directive
  - 6) Cache Control Extensions
- 10) Connection
- 11) Content-Encoding
- 12) Content-Language
- 13) Content-Length
- 14) Content-Location
- 15) Content-MD5
- 16) Content-Range
- 17) Content-Type
- 18) Date
  - 1) Clockless Origin Server Operation
- 19) ETag
- 20) Expect
- 21) Expires
- 22) From
- 23) Host
- 24) If-Match
- 25) If-Modified-Since
- 26) If-None-Match
- 27) If-Range
- 28) If-Unmodified-Since
- 29) Last-Modified
- 30) Location
- 31) Max-Forwards
- 32) Pragma
- 33) Proxy-Authenticate
- 34) Proxy-Authorization
- 35) Range
  - 1) Byte Ranges
  - 2) Range Retrieval Requests
- 36) Referer
- 37) Retry-After
- 38) Server
- 39) TE
- 40) Trailer
- 41) Transfer-Encoding
- 42) Upgrade
- 43) User-Agent
- 44) Vary
- 45) Via
- 46) Warning
- 47) WWW-Authenticate
15) Security Considerations
- 1) Personal Information
- 2) Attacks Based On File and Path Names
- 3) DNS Spoofing
- 4) Location Headers and Spoofing
- 5) Content-Disposition Issues
- 6) Authentication Credentials and Idle Clients
- 7) Proxies and Caching
  - 1) Denial of Service Attacks on Proxies
16) Acknowledgments
17) References
18) Authors' Addresses
19) Appendices
- 1) Internet Media Type message/http and application/http
- 2) Internet Media Type multipart/byteranges
- 3) Tolerant Applications
- 4) Differences Between HTTP Entities and RFC 2045 Entities
- 5) Additional Features
  - 1) Content-Disposition
- 6) Compatibility with Previous Versions
20) Index
21) Full Copyright Statement
22) Acknowledgement

2 Notational Conventions and Generic Grammar

2.1 Augmented BNF

All of the mechanisms specified in this document are described in both prose and an augmented Backus-Naur Form (BNF) similar to that used by RFC 822 [9]. Implementors will need to be familiar with the notation in order to understand this specification. The augmented BNF includes the following constructs:

name = definition

The name of a rule is simply the name itself (without any enclosing "<" and ">") and is separated from its definition by the equal "=" character. White space is only significant in that indentation of continuation lines is used to indicate a rule definition that spans more than one line. Certain basic rules are in uppercase, such as SP, LWS, HT, CRLF, DIGIT, ALPHA, etc. Angle brackets are used within definitions whenever their presence will facilitate discerning the use of rule names.

"literal"

Quotation marks surround literal text. Unless stated otherwise, the text is case-insensitive.

rule1 | rule2

Elements separated by a bar ("|") are alternatives, e.g., "yes | no" will accept yes or no.

(rule1 rule2)

Elements enclosed in parentheses are treated as a single element. Thus, "(elem (foo | bar) elem)" allows the token sequences "elem foo elem" and "elem bar elem".

*rule

The character "*" preceding an element indicates repetition. The full form is "<n>*<m>element" indicating at least <n> and at most <m> occurrences of element. Default values are 0 and infinity so that "*(element)" allows any number, including zero; "1*element" requires at least one; and "1*2element" allows one or two.

[rule]

Square brackets enclose optional elements; "[foo bar]" is equivalent to "*1(foo bar)".

N rule

Specific repetition: "<n>(element)" is equivalent to "<n>*<n>(element)"; that is, exactly <n> occurrences of (element). Thus 2DIGIT is a 2-digit number, and 3ALPHA is a string of three alphabetic characters.
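As a rough illustration of the repetition constructs above, the sketch below maps "<n>*<m>element" onto a regular-expression quantifier. The helper name `repeat` and the use of Python's `re` module are assumptions made for this example; the specification itself defines no such function.

```python
import re
from typing import Optional


def repeat(element: str, n: int = 0, m: Optional[int] = None) -> "re.Pattern[str]":
    """Compile "<n>*<m>element" into a regular expression.

    `element` is a regex fragment standing in for a grammar rule
    (hypothetical helper). The defaults mirror the BNF defaults:
    n = 0 and m = infinity.
    """
    upper = "" if m is None else str(m)
    return re.compile(rf"(?:{element}){{{n},{upper}}}")


DIGIT = r"[0-9]"

any_digits = repeat(DIGIT)          # *DIGIT
one_or_more = repeat(DIGIT, 1)      # 1*DIGIT
one_or_two = repeat(DIGIT, 1, 2)    # 1*2DIGIT
optional = repeat(DIGIT, 0, 1)      # [DIGIT], i.e. *1DIGIT
exactly_two = repeat(DIGIT, 2, 2)   # 2DIGIT, i.e. 2*2DIGIT
```

Each pattern is meant to be used with `fullmatch`, so that `exactly_two.fullmatch("42")` succeeds while `exactly_two.fullmatch("4")` does not.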

#rule

A construct "#" is defined, similar to "*", for defining lists of elements. The full form is "<n>#<m>element" indicating at least <n> and at most <m> elements, each separated by one or more commas (",") and OPTIONAL linear white space (LWS). This makes the usual form of lists very easy; a rule such as

( *LWS element *( *LWS "," *LWS element ))

can be shown as

1#element

Wherever this construct is used, null elements are allowed, but do not contribute to the count of elements present. That is, "(element), , (element) " is permitted, but counts as only two elements. Therefore, where at least one element is required, at least one non-null element MUST be present. Default values are 0 and infinity so that "#element" allows any number, including zero; "1#element" requires at least one; and "1#2element" allows one or two.
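The counting rule for null elements can be sketched as follows. This is a simplified illustration, not a complete parser: the function name `parse_list` is hypothetical, and the naive split on "," assumes that no element is a quoted-string containing a comma.

```python
from typing import List, Optional


def parse_list(value: str, minimum: int = 0,
               maximum: Optional[int] = None) -> List[str]:
    """Parse "<minimum>#<maximum>element" from a field value.

    Null elements (empty after trimming white space) are dropped
    and do not count toward the number of elements present.
    Assumption: elements contain no quoted commas.
    """
    parts = (part.strip(" \t\r\n") for part in value.split(","))
    elements = [part for part in parts if part]
    if len(elements) < minimum:
        raise ValueError("too few non-null elements")
    if maximum is not None and len(elements) > maximum:
        raise ValueError("too many elements")
    return elements


print(parse_list("gzip, , deflate ", minimum=1))  # ['gzip', 'deflate']
```

A value such as ", ," contains only null elements, so `parse_list(", ,", minimum=1)` raises `ValueError`, matching the requirement that at least one non-null element be present.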

; comment

A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

implied *LWS

The grammar described by this specification is word-based. Except where noted otherwise, linear white space (LWS) can be included between any two adjacent words (token or quoted-string), and between adjacent words and separators, without changing the interpretation of a field. At least one delimiter (LWS and/or separators) MUST exist between any two tokens (see the definition of "token" below), since they would otherwise be interpreted as a single token.

2.2 Basic Rules

The following rules are used throughout this specification to describe basic parsing constructs. The US-ASCII coded character set is defined by ANSI X3.4-1986 [21].

OCTET	= <any 8-bit sequence of data>
CHAR	= <any US-ASCII character (octets 0 - 127)>
UPALPHA	= <any US-ASCII uppercase letter "A".."Z">
LOALPHA	= <any US-ASCII lowercase letter "a".."z">
ALPHA	= UPALPHA | LOALPHA
DIGIT	= <any US-ASCII digit "0".."9">
CTL	= <any US-ASCII control character (octets 0 - 31) and DEL (127)>
CR	= <US-ASCII CR, carriage return (13)>
LF	= <US-ASCII LF, linefeed (10)>
SP	= <US-ASCII SP, space (32)>
HT	= <US-ASCII HT, horizontal-tab (9)>
<">	= <US-ASCII double-quote mark (34)>

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all protocol elements except the entity-body (see Appendix 19.3 for tolerant applications). The end-of-line marker within an entity-body is defined by its associated media type, as described in Section 3.7.

CRLF	= CR LF

HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

LWS	= [CRLF] 1*( SP | HT )
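The optional replacement of linear white space by a single SP might look like the sketch below. The name `collapse_lws` is hypothetical, and the regular expression is a direct transcription of the LWS rule above rather than code from any particular HTTP library.

```python
import re

# LWS = [CRLF] 1*( SP | HT )
LWS = re.compile(r"(?:\r\n)?[ \t]+")


def collapse_lws(field_value: str) -> str:
    """Replace every run of linear white space, folding included, with one SP."""
    return LWS.sub(" ", field_value)


folded = "text/html,\r\n\t application/xml"
print(collapse_lws(folded))  # text/html, application/xml
```

A CRLF that is not followed by SP or HT is left untouched, since it terminates the header line rather than continuing it.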

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

TEXT	= <any OCTET except CTLs, but including LWS>

A CRLF is allowed in the definition of TEXT only as part of a header field continuation. It is expected that the folding LWS will be replaced with a single SP before interpretation of the TEXT value.

Hexadecimal numeric characters are used in several protocol elements.

HEX	= "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT

Many HTTP/1.1 header field values consist of words separated by LWS or special characters. These special characters MUST be in a quoted string to be used within a parameter value (as defined in Section 3.6).

token	= 1*<any CHAR except CTLs or separators>
separators	= "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
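A membership test for the token rule can be written directly from the two definitions above. The names `SEPARATORS` and `is_token` are illustrative assumptions; the character ranges follow the CHAR and CTL rules in this section.

```python
# The separators rule: 17 printable characters plus SP and HT.
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_token(text: str) -> bool:
    """Return True if `text` matches 1*<any CHAR except CTLs or separators>."""
    if not text:
        return False  # 1* requires at least one character
    for ch in text:
        code = ord(ch)
        if code > 127:                 # not a CHAR (octets 0 - 127)
            return False
        if code < 32 or code == 127:   # CTL
            return False
        if ch in SEPARATORS:
            return False
    return True


print(is_token("gzip"))       # True
print(is_token("text/html"))  # False: "/" is a separator
```

This is why a value such as `text/html` cannot be used as a bare parameter value and would need to be written as a quoted string there.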

Comments can be included in some HTTP header fields by surrounding the comment text with parentheses. Comments are only allowed in fields containing "comment" as part of their field value definition. In all other fields, parentheses are considered part of the field value.

comment	= "(" *( ctext | quoted-pair | comment ) ")"
ctext	= <any TEXT excluding "(" and ")">
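Because the comment rule is recursive, a scanner has to track nesting depth and skip quoted-pairs. The following is a minimal sketch under those assumptions; `read_comment` is a hypothetical helper, and it does not validate that the enclosed characters are legal TEXT.

```python
def read_comment(text: str, start: int = 0) -> int:
    """Return the index just past the comment that begins at text[start].

    Nested comments increase the depth; a backslash (quoted-pair)
    causes the following character to be skipped, so an escaped
    parenthesis does not affect nesting.
    """
    if text[start:start + 1] != "(":
        raise ValueError("comment must start with '('")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the quoted-pair as a unit
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")


value = "(X11; Linux (x86_64)) rest"
end = read_comment(value)
print(value[:end])  # (X11; Linux (x86_64))
```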

A string of text is parsed as a single word if it is quoted using double-quote marks.

quoted-string	= ( <"> *(qdtext | quoted-pair ) <"> )
qdtext	= <any TEXT except <">>

The backslash character ("\") MAY be used as a single-character quoting mechanism only within quoted-string and comment constructs.

quoted-pair	= "\" CHAR
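To show how quoted-string and quoted-pair interact, the sketch below strips the surrounding double-quote marks and resolves each backslash escape to the character that follows it. The function name `unquote` is an assumption for this example, and the code does not check that the content is valid TEXT.

```python
def unquote(value: str) -> str:
    """Decode a quoted-string into the word it represents."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("not a quoted-string")
    out = []
    i = 1
    end = len(value) - 1  # index of the closing double-quote mark
    while i < end:
        ch = value[i]
        if ch == "\\":
            # quoted-pair: take the next character literally
            i += 1
            if i >= end:
                raise ValueError("backslash escapes the closing quote")
            ch = value[i]
        elif ch == '"':
            raise ValueError("unescaped double-quote inside quoted-string")
        out.append(ch)
        i += 1
    return "".join(out)


print(unquote('"a \\"quoted\\" word"'))  # a "quoted" word
```

Rejecting a trailing backslash matters: in `"abc\"` the final double-quote mark is part of a quoted-pair, so the string is never actually closed.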