Extended Backus-Naur Form
In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language. The EBNF notations are extensions of the basic Backus–Naur form (BNF) metasyntax notation. The earliest EBNF was developed by Niklaus Wirth, incorporating some of the concepts (with a different syntax and notation) from Wirth syntax notation. Today, many variants of EBNF are in use. The International Organization for Standardization adopted an EBNF standard, ISO/IEC 14977, in 1996.[1][2] According to Zaytsev, however, this standard "only ended up adding yet another three dialects to the chaos"; after noting its lack of success, he also observes that the ISO EBNF is not even used in all ISO standards.[3]
This article uses EBNF as specified by the ISO for examples applying to all EBNFs. Other EBNF variants use somewhat different syntactic conventions.
Basics
EBNF is a code that expresses the syntax of a formal language.[4] An EBNF grammar consists of terminal symbols and nonterminal production rules, which are the restrictions governing how terminal symbols can be combined into a valid sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and whitespace characters.
An EBNF grammar defines production rules, each of which assigns a sequence of symbols to a nonterminal:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = "0" | digit excluding zero ;
These production rules define the nonterminals digit excluding zero and digit, each of which appears on the left side of its assignment. The vertical bar represents an alternative, terminal symbols are enclosed in quotation marks, and each rule ends with a semicolon as its terminating character. Hence a digit is a 0 or a digit excluding zero, which can be 1 or 2 or 3 and so forth until 9.
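As a rough illustration, the two rules above can be mirrored by hand in Python, with each alternative becoming one branch of a membership test. The function names are hypothetical, chosen here to match the nonterminals; this is a sketch of how alternation maps to code, not something defined by any EBNF standard.

```python
# Hypothetical helpers mirroring the two production rules above.

def is_digit_excluding_zero(s: str) -> bool:
    # digit excluding zero = "1" | "2" | ... | "9" ;
    return len(s) == 1 and s in "123456789"


def is_digit(s: str) -> bool:
    # digit = "0" | digit excluding zero ;
    return s == "0" or is_digit_excluding_zero(s)
```

Because digit refers to digit excluding zero, the second function simply calls the first, just as the second rule reuses the first nonterminal.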
A production rule can also include a sequence of terminals or nonterminals, each separated by a comma:
twelve = "1", "2" ;
two hundred one = "2", "0", "1" ;
three hundred twelve = "3", twelve ;
twelve thousand two hundred one = twelve, two hundred one ;
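Since each of these rules is a fixed sequence with no alternatives, each one derives exactly one string. A minimal Python sketch, assuming plain string concatenation stands in for the comma and using constant names invented here for the nonterminals:

```python
# Each rule is a fixed sequence, so it derives exactly one string.
# "+" plays the role of the EBNF concatenation comma.
TWELVE = "1" + "2"                    # twelve = "1", "2" ;
TWO_HUNDRED_ONE = "2" + "0" + "1"     # two hundred one = "2", "0", "1" ;
THREE_HUNDRED_TWELVE = "3" + TWELVE   # three hundred twelve = "3", twelve ;
TWELVE_THOUSAND_TWO_HUNDRED_ONE = TWELVE + TWO_HUNDRED_ONE

print(THREE_HUNDRED_TWELVE)             # prints 312
print(TWELVE_THOUSAND_TWO_HUNDRED_ONE)  # prints 12201
```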
Expressions that may be omitted or repeated can be represented through curly braces { ... }:
positive integer = digit excluding zero, { digit } ;
In this case, the strings 1, 2, ..., 10, ..., 10000, ... are correct expressions. They are all valid because everything that is set within the curly braces may be repeated arbitrarily often, including not at all.
An option can be represented through square brackets [ ... ]. That is, everything that is set within the square brackets may be present just once, or not at all:
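One way to read the curly braces is as a loop that may run zero times. The following hand-written recognizer is a sketch under that reading; the function name is hypothetical, and the digit sets are spelled out so the block stands alone.

```python
def is_positive_integer(s: str) -> bool:
    # positive integer = digit excluding zero, { digit } ;
    # The first symbol must be a digit excluding zero.
    if not s or s[0] not in "123456789":
        return False
    # { digit }: consume zero or more digits after the first symbol.
    i = 1
    while i < len(s) and s[i] in "0123456789":
        i += 1
    # The string is valid only if the loop consumed everything.
    return i == len(s)
```

With this sketch, "1" and "10000" are accepted, while "0", "012", and the empty string are rejected because they do not start with a digit excluding zero.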
integer = "0" | [ "-" ], positive integer ;
Therefore, an integer is a zero (0) or a positive integer that may be preceded by an optional minus sign.
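The integer rule can also be approximated with a regular expression, since this particular grammar involves no recursion. The sketch below assumes the usual correspondence in which an EBNF option becomes `?` and a repetition becomes `*`; the names `INTEGER` and `is_integer` are illustrative only.

```python
import re

# integer = "0" | [ "-" ], positive integer ;
# Assumed mapping: [ "-" ] becomes "-?", { digit } becomes "[0-9]*".
INTEGER = re.compile(r"0|-?[1-9][0-9]*")


def is_integer(s: str) -> bool:
    # fullmatch requires the whole string to match one alternative.
    return INTEGER.fullmatch(s) is not None
```

Note that concatenation binds more tightly than alternation, so the optional minus sign belongs only to the second alternative: "-12" is accepted, but "-0" is not.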
EBNF also provides, among other things, the syntax to describe repetitions (of a specified number of times), to exclude some part of a production, and to insert comments in an EBNF grammar.
Table of symbols
The following table shows the notation of a proposed ISO/IEC 14977 standard, by R. S. Scowen, page 7, tables 1 and 2.
| Usage | Notation | Alternative | Meaning |
|---|---|---|---|
| definition | = | | |
| concatenation | , | | |
| termination | ; | . | |
| alternation | \| | / or ! | |
| optional | [ ... ] | (/ ... /) | none or once |
| repetition | { ... } | (: ... :) | none or more |
| grouping | ( ... ) | | |
| terminal string | " ... " | ' ... ' | |
| comment | (* ... *) | | |
| special sequence | ? ... ? | | |
| exception | - | | |