CS273: Context-Free Languages Review
This document describes expectations of what knowledge and skills you
should have about context-free languages - the middle third of the
course. To prepare for the exam, you should make sure that you have
done all relevant reading assigned, gone through all homework problems
and solutions, all HBS problems, and viewed all lecture notes. We have
attempted here to give a reasonable high-level summary of concepts and
skills, but the "official" representation of what has been covered is
the union of of all material mentioned in the above sources, together
with this document and any additional material presented in lectures.
- Context-Free Grammars (CFGs)
- CFGs accept Context-Free Languages (CFLs), which properly contain
the regular languages.
- Be able to use and manipulate CFGs expressed as tuples:
G = (V, T, P, S)
- Use either standard notations/conventions for denoting
terminals, non-terminals, sentential forms etc. OR clearly
indicate your notation.
- The symbol =n=> : alpha1
=n=> alpha2 means the
sentential form alpha2 is derived from the
sentential form alpha1 after exactly n
applications of some production(s).
- Be able to prove that a grammar G generates exactly the language L.
This requires showing
- L(G) is a subset of L, i.e. all strings
derivable from G are in L, and
- L is a subset of L(G),
i.e. every string in L has some derivation in G.
Proof technique: Induction on the number of productions used
in deriving a string w in L. In the induction step, it is often
useful to consider the first production followed by applying
the inductive hypothesis.
- Be able to construct a grammar for a given language.
Some useful techniques:
- Zero or more repeated characters: A -> aA | a
- anbn : X -> aXb | e
- anb3n : X -> aXbbb | e
- anbm where m > n : equivalent to anbnbk where k > 0. Therefore, S -> XY, X -> aXb | e, Y -> bY | b
- anbm where m != n : equivalent to the union of the two languages anbm where m < n and anbm where m > n
- Union of L1 and L2: S -> S1 | S2
- Concatenation of L1 and L2: S -> S1S2
- anbmcmdn, no relation between m and n: S -> aSd | X, X -> bXc | e
- A word w is in L(G) iff w is the yield of a derivation tree
of G with S at the root, i.e. the labels on the leaves of the
tree read left to right spell out w.
- Understand definitions related to derivations:
- A leftmost derivation always replaces the leftmost non-terminal.
- A rightmost derivation always replaces the rightmost non-terminal.
- A grammar is ambiguous if there is some word that has more than
one leftmost (or rightmost) derivation.
A CFL is inherently ambiguous if all grammars for that
language are ambiguous.
- Push-Down Automatas (PDAs)
- A PDA is an NFA with one infinite Last-In-First-Out stack.
- PDAs are non-deterministic by default.
- A PDA can use only ONE stack. (A PDA with two stacks is
equivalent to a TM (which we'll talk about soon).)
- PDAs can accept a string either by entering a final state or
emptying its stack, after reading the entire string.
- Acceptance by final state and by empty stack are
equivalent.
- IN THIS CLASS, ALL PDAs ARE TO ACCEPT BY BOTH
FINAL STATE AND EMPTY STACK
- No transitions are allowed from a state when the stack
is empty. (PDAs start off with some initial symbol on the
stack.)
- PDAs recognize precisely context-free languages -- for
every CFG G, there exists a PDA (with one state) that can simulate
left-most derivations of strings obtained from G; for
every PDA M, there exists a grammar that derives exactly
those strings accepted by M.
- Be able to use and manipulate PDAs expressed as a tuple:
(Q,Sigma,Gamma,delta, q0,Z,F)
- The instantaneous description (ID) of a computation on a PDA is
its current state, the remainder of the string to be read, and the
current stack contents.
- Be able to construct a PDA for a given CFL.
Some useful techniques:
- Design a grammar and then construct a PDA that simulates
left-most derivations. (Generally, not the best way.)
- Remember that you can still store a FINITE amount of
information in the state of a PDA.
- Potentially unbounded amount of information should go on the
stack.
- A single stack symbol can be a k-tuple, i.e. can
consist of k different components.
- Deterministic PDAs (DPDAs)
- DPDAs have only one possible transition for every combination of
current state, input symbol and top-of-stack symbol.
- epsilon-transitions are allowed, but only when no
non-epsilon-transition is possible.
- DPDAs are less powerful than PDAs. DPDAs accept the
deterministic context-free languages (DCFLs). The class of
DCFLs is a proper subset of the class of CFLs.
- Chomsky Normal Form (CNF)
- In CNF, each right-hand side is either a single terminal or two
nonterminals.
- Know how to convert a grammar G into CNF:
(assuming L(G) does not contain epsilon)
- Eliminate epsilon-productions by identifying and
eliminating "nullable" non-terminals. The
resulting grammar can never generate the empty
string.
- Eliminate unit productions (A -> B).
- Eliminate useless symbols.
- Force all productions to be either of the form A ->
X1X2...Xn where all Xi are non-terminals, or A -> a.
- Obtain productions of the form A -> BC.
- Pumping lemma for CFLs
- Be able to state the pumping lemma for CFLs
- Be able to use the pumping lemma to prove that a particular
language is not context-free.
Your proof should follow the template of examples we showed and must
include all the text explaining the argument.
(Note the difference from the pumping lemma for regular
languages: the "pumpable" parts of the string need not
be in the initial portion (prefix) of the string; however
they cannot be separated by more than n characters.)
- PLEASE do not choose u, v, w, x, y.
Instead, you need argue that they must have a certain form.
- Know some particular non-context-free languages:
- Closure properties
- Know CFLs are closed under:
- Union, concatenation, *
- Substitution (with CFLs), homomorphism
- Intersection with regular languages
- Be able to use these closure properties to prove that a language
is not context-free.
- Know CFLs are NOT closed under:
- Complement
- Intersection with CFLs that are not regular
- Decision problems for CFLs
- DECIDABLE properties (i.e. some decision algorithm exists)
- Given a CFG G, is L(G) empty?
- Is L(G) finite or infinite?
- UNDECIDABLE properties (i.e. no decision algorithm can
exist)
- Is the complement of L(G) empty? (i.e. is L(G) =
Sigma*?)
- Is the complement of L(G) finite?
- Is L(G1) intersect L(G2) empty?
- Is L(G1) = L(G2)?
- Is G ambiguous?
- Is L(G) inherently ambiguous?
- CYK parsing algorithm to decide given w and G whether w is in
L(G). (not on test)