\documentclass[a4paper,12pt]{esslli}

%\usepackage{a4wide} %this is to have narrow margins (more space for text)
\usepackage{chicago} %style for bibliography 
\usepackage[latin1]{inputenc}
\usepackage{graphicx}
\usepackage{color}
\usepackage{url}
\usepackage{lingmacros}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{amssymb}

\newcommand{\erickowjargon}{\textbf}
\newcommand{\erickowtuple}[1]{\langle #1 \rangle}
\newcommand{\erickowset}[1]{\{#1\}}
\newcommand{\erickowsemexpr}[1]{\texttt{\footnotesize #1}}
\newcommand{\erickownatlang}[1]{\textit{\footnotesize #1}}
\newcommand{\erickowtautree}[1] {$\tau_{#1}$}
\newcommand{\erickowtree}[1]{\texttt{\footnotesize #1}}
\newcommand{\erickowpol}[1]{\texttt{\footnotesize #1}}

\offprhead{Proceedings of the Seventh ESSLLI Student Session}
\offprocc{\textnormal{Paul Egr\'e and Laura Alonso i Alemany (editors)}}
\offprauthors{Eric Kow}

\begin{document}
\chapter{Adapting polarised disambiguation to surface realisation}
\thispagestyle{firstpage}
\chapterauthor{Eric Kow}
\chapteraffiliation{Langue et Dialogue - LORIA\\
  615 rue du jardin botanique\\
  54600 Villers-Lès-Nancy, France}
\chapteremail{Eric.Kow@loria.fr}

\begin{chapterabstract}
The aim of surface realisation is to produce a string from an input
semantics. The task may be viewed as the inverse of parsing, and like
parsing, it is made more difficult by the problem of lexical ambiguity
in natural language. \cite{bonfante04} propose a polarisation technique 
for parsing that greatly reduces the effects of ambiguity.  We show how
polarisation can be adapted for surface realisation and how to address
two complications which arise: multi and zero-literal semantic input.
\end{chapterabstract}

\section{Introduction}

Surface realisation can be seen as parsing in reverse: given a grammar
and some semantic input, the task is to construct the parse trees
associated by the grammar with that semantics, and to output the
resulting strings.  Like parsing, surface realisation is severely
affected by the possibility for each part of the semantics to be
realised by several lexical entries.  But unlike parsing, surface
realisation is NP-complete \cite{kollerStriegnitz02}, so that a solution
to the lexical ambiguity problem becomes particularly desirable.  A
common response is to disambiguate the input with a separate
preprocessing filter before parsing or realisation proper.  The filter
can be made to be very fast by using probabilistic methods
\cite{joshi94}; however, such methods carry the risk of discarding valid
disambiguations along with the incorrect ones.  We aim for a surface
realiser that can produce all possible paraphrases, that is explore the
entire search space, and so we eschew the use of probabilities in favour
of an ``exact method'' \cite{bonfante04}.  The basic idea is to make the
resource sensitivity of grammars explicit by annotating lexical items
with polarities and using a filtering step to neutralise them.
This has been successfully applied to parsing \cite{bonfante03leopar}, 
but adapting it to generation reveals some hidden complications which
stem from lexical items spanning either multiple pieces of input, or
none at all.

In this paper, we propose some modifications to the filtering algorithm
that overcome these problems.  The result is that the optimised surface
realiser retains its ability to produce output with such items as
pronouns and control verbs.  We begin in sections
\ref{erickowsec:the_surface_realiser} and \ref{erickowsec:polarity_filtering} by
briefly presenting the surface realiser and a basic version of its
polarity filter.  Then in sections \ref{erickowsec:multi-literal} and
\ref{erickowsec:nullsem}, we discuss the modifications necessary to make the
filter work for multi-literal and zero-literal semantic input. 

\section{The surface realiser}
\label{erickowsec:the_surface_realiser}

The surface realiser uses a Feature Based Lexicalised Tree Adjoining
Grammar (FLTAG) \cite{VijJos88} and a flat semantic representation
\cite{CopLasFli01}.  The properties of FLTAG are largely irrelevant to
this paper, except that a grammar is a set of lexical items where
each item is a tree with one or more anchors.  A flat semantics consists
of a set of propositions called \erickowjargon{literals}.  A literal 
consists of a predicate followed by a list of indices, its arguments.
If two literals have arguments with the same index, that index refers to
the same entity.  Below, for example, the index \erickowsemexpr{p} in
\erickowsemexpr{picture(p)} and \erickowsemexpr{cost(c,p,h)} refers to
the same object.

\enumsentence{\label{erickowex:flat_semantics}
\erickowsemexpr{picture(p), cost(c,p,h), high(h)} 
(\erickownatlang{The picture is expensive})}

Given a semantics (the \erickowjargon{input semantics}) and a set of
lexical items retrieved from the FLTAG (the \erickowjargon{lexical
input}), the surface realiser produces sentences whose semantics is
equal to the input semantics.  It uses a tabular algorithm with
bottom-up processing.  The lexical input consists of the set of lexical
items whose semantics subsumes the input semantics.  Several lexical
items may cover the same literal, raising ambiguities to be solved.
For example, the lexical input associated with (\ref{erickowex:flat_semantics})
could be \erickowtautree{painting} or \erickowtautree{picture} for
\erickowsemexpr{picture(p)}; \erickowtautree{cost} (the noun) or
\erickowtautree{costs} for \erickowsemexpr{cost(c,p,h)}; and
\erickowtautree{high} or \erickowtautree{a~lot} for
\erickowsemexpr{high(h)}.

\section{Polarity filtering}
\label{erickowsec:polarity_filtering}

We attempt to resolve all input lexical ambiguities into a set of
disambiguations where each disambiguation is a combination of selected
lexical items.  Given a lexical input, the number of lexical
combinations is \textit{a priori} exponential: $\prod_{1 \leq i \leq
n}{a_i}$ with $a_i$ the degree of lexical ambiguity of the $i$-th
literal and $n$, the number of literals in the input semantics. In
practice, however, many of these combinations are syntactically invalid:
either some items have syntactic requirements which cannot be met, or
some of the lexical items cannot be combined with others and stitched
into a successful realisation \cite{perrier03}. To reduce the
combinations resulting from lexical ambiguity, we introduce a single
filtering step whose role is to efficiently detect these invalid
combinations so that they are not explored during realisation proper.  

We identify the syntactic requirements of each lexical item and only
retain the lexical combinations in which the requirements of every item
are met.  More precisely, each lexical item is assigned a bag of labels
that have either a polarity $+$ or $-$.  These labels provide hints
about which trees may combine with each other.  For instance, an
intransitive verb will have the label $\erickowpol{-np}$, meaning it
``requires'' a noun phrase (the subject), whereas a transitive verb
will have the labels $\erickowpol{-np -np}$ (the subject and object),
requiring two noun phrases.  In the case of TAG grammars, these
polarities can be automatically extracted \cite{bonfante04} by awarding
each tree with a $-f$ polarity for every foot or substitution node with
category $f$ and a $+r$ polarity, where $r$ is the category of its root
node.  For instance, a tree like
\erickowtree{s(np$\downarrow$,vp(v(hates),np$\downarrow$))} would have
the polarities \erickowpol{-np -np +s}.

%In the example
%above, one such combination could be \erickowtautree{the~picture},
%\erickowtautree{costs}, \erickowtautree{a~lot}, and another \erickowtautree{the~painting},
%\erickowtautree{the~cost~of}, \erickowtautree{is~high}.   

In the table below, we show how these polarities can be put to use for
generation.  Each column contains a single literal from the input
semantics \erickowsemexpr{picture(p), cost(c,p,h), high(h)}, along with the
lexical items that realise that literal as well as their associated
polarities. To simplify this example, we only consider a single label,
\erickowpol{np}.

\begin{center}
\begin{tabular}{|c|c|c|}
\hline
\erickowsemexpr{picture(p)}  & \erickowsemexpr{cost(c,p,h)}    & \erickowsemexpr{high(h)} \\
\hline
\erickowtautree{picture} \erickowpol{+np} & 
\erickowtautree{cost} \erickowpol{+np -np} &
\erickowtautree{high} \erickowpol{-np}\\
%
\erickowtautree{painting} \erickowpol{+np} & 
\erickowtautree{costs} \erickowpol{-np}        & 
\erickowtautree{a~lot}   \\
\hline
\end{tabular}\\
\end{center}

The problem at hand is to choose a combination that covers the entire
semantics, that is, one lexical item per column.  We can avoid the
syntactically invalid combinations by counting polarities.  If the 
sum of polarities (or \erickowjargon{charge}) is greater than zero, then
the lexical combination overall has more trees than it can use.
Similarly, if the charge is less than zero, then the combination requires
more trees than it can provide.  The charge must be equal to zero or
else the combination is syntactically invalid.  For instance, the
combination \erickowtautree{painting} \erickowtautree{cost} \erickowtautree{a~lot}
has a charge of $+1np$, so it is clearly not a solution. In contrast,
\erickowtautree{painting} \erickowtautree{costs} \erickowtautree{a~lot} has a charge of
$0np$, which does not \emph{guarantee} syntactic compatibility but
suggests that the combination is worth exploring.

\begin{figure}[htpb]
\begin{center}
\includegraphics[scale=0.33]{images/basicaut-gpruned.pdf}
\end{center}
\vspace{-0.4cm}
\caption{A minimised polarity automaton}
\label{erickowfig:polarity_automaton_pruned}
\end{figure}

Rather than performing a separate polarity count for every lexical
combination, we factorise the work into an automaton which calculates
the lexical combinations and their net charge.  We arbitrarily impose an
order on the input semantics into a list we call $InputSemantics$.  Each
automaton state has the form $\erickowtuple{L,C}$ and represents the
set\footnote{We will use the following notation for lists: $\emptyset$
indicates the empty list and $L+i$ indicates appending item $i$ to $L$.
We also take the liberty of comparing sets and lists, e.g, if we say
that a set $S$ is equal to a list $L$, we merely treat $L$ as a set.} of
lexical combinations whose semantics is $L$ and whose polarity charge is
$C$.  The approach is to construct the automaton in steps, one literal
at a time, starting with the initial step $\erickowtuple{\emptyset,0}$.  At
each step, we build upon the states $\erickowtuple{L,C}$ created in the
previous step. We select the next literal $l$ from $InputSemantics$; for
each lexical item $lex_l$ which realises literal $l$, we add a
transition to the state $\erickowtuple{L+l,C+pol}$ where $pol$ is the polarity
of $lex_l$.  When there are no more literals left to select, we complete
the process by declaring the final state to be
$\erickowtuple{InputSemantics,0}$ and performing automaton minimisation
\cite{hopcroft79intro}.  What remains is a representation of all the
lexical combinations that cover the input semantics with zero net charge
(figure \ref{erickowfig:polarity_automaton_pruned}).  \footnote{See
\cite{kow04} for a description of how this technique can be generalised
for multiple labels.} As shown in \cite{kow04}, polarity filtering is an
efficient method for dealing with lexical ambiguity. Given an input with
438 272 possible combinations, the polarity filter only passes 232 of
these combinations on.  As a result, the surface realisation time drops
from 93.8 to 14.7 seconds \cite{gardentKow05}. 

\section{Multi-literal semantic lexical items}
\label{erickowsec:multi-literal}

Certain lexical items could have a semantics that spans multiple
literals, for example \erickowsemexpr{cost(c,p,h), high(h)} for the item
\erickowtautree{expensive}. These items are not correctly handled because
the automaton construction algorithm relies on the assumption that at 
any given state $\erickowtuple{L,C}$, the lexical combinations represented by
that state all have a semantics equal to $L$.  The assumption breaks
down when one of the combinations has an item $lex_m$ whose semantics
includes some literal $x$ which is not in $L$.  If the algorithm later
visits literal $x$, whatever representative it selects for the literal
will cause a polarity miscount: either $lex_m$ is reselected and its
polarities are double-counted, or some other item with a redundant
semantics is selected and its polarities are included in the count.  
The correct behaviour in this case is neither to double-count polarities
nor build lexical combinations with redundant lexical items, but not to
select any item at all.  

We achieve this by augmenting the automaton states with a third element
$E$, used to keep track of the lexical combinations with extra
semantic literals.  Given a state $\erickowtuple{L_1,C_1,E_1}$, the literal
being visited $l_1$ and the selected lexical item $lex_1$ with polarity
$pol$ and semantics $\erickowset{l_1} \cup extra$; we build a transition via
$lex_1$ to the state $\erickowtuple{L_1+l_1,C_1+pol,E_1\cup extra}$.  Now,
given a step with state $\erickowtuple{L_2,C_2,E_2}$; if the literal being
visited $l_2 \in E_2$, we build a null transition to the state
$\erickowtuple{L_2+l,C_2,E_2 \setminus \erickowset{l_2}}$.  

\begin{figure}[htpb]
\begin{center}
\includegraphics[scale=0.33]{images/multiaut-gpruned.pdf}
\end{center}
\vspace{-0.4cm}
\caption{A minimised polarity automaton with multi-literal semantics}
\label{erickowfig:multilit_automaton_pruned}
\end{figure}

It may seem desirable to simplify this mechanism by replacing the
tracking of semantic literals and use of null transitions with 
a single transition that spans all the semantic literals covered
by a lexical item. In figure \ref{erickowfig:multilit_automaton_pruned}
above, this would mean a direct transition \erickownatlang{expensive}
between the first $+1$ and final $0$ states.  But this simplification
is not viable, because lexical items may span overlapping sets of
semantics literals.  In \cite{shemtov96} for example,
\erickownatlang{quickly moved into} can also be expressed as
\erickownatlang{rushed into} or \erickownatlang{quickly entered}.  
Given the ordering below, it would not be possible to build a direct
transition for \erickowtautree{entered} across
\erickowsemexpr{move(m,x), into(x,y)} without erroneously skipping 
over the literal  \erickowsemexpr{quick(m)}.

\begin{center}
\begin{tabular}{|l|lll|}
\hline
\textbf{lexical item} &
\multicolumn{3}{l|}{\textbf{semantics}} 
\\
\hline
\erickownatlang{moved} &
\erickowsemexpr{move(m,x)} &
&
\\
\erickownatlang{rushed} &
\erickowsemexpr{move(m,x)} &
\erickowsemexpr{quick(m)}  &
\\ 
\erickownatlang{entered} &
\erickowsemexpr{move(m,x)} &
&
\erickowsemexpr{into(x,y)}  
\\ 
\hline
\end{tabular}
\end{center}

\section{Null and zero-literal semantic lexical items}
\label{erickowsec:nullsem}
\label{erickowsec:co-anchors}

Lexical items with a \erickowjargon{null semantics} typically correspond to
function words: complementisers \erickownatlang{(John likes \textbf{to}
read.)}, subcategorised prepositions \erickownatlang{(Mary accuses John
\textbf{of} cheating.)}.  Such items need not be lexical items at all.
We can exploit TAG's support for trees with multiple anchors, by
treating them as co-anchors to some primary lexical item. The English
infinitival \erickownatlang{to}, for example, can appear in the tree
\erickowtautree{to~take} as \erickowtree{s(comp(to),v(take),np$\downarrow$)}. 

On the other hand, pronouns have a \erickowjargon{zero-literal} semantics, one
which is not null, but which consists only of a variable index.  For
example, the pronoun \erickownatlang{her} in (\ref{erickowex:pronoun_pol_she}) has
semantics \erickowsemexpr{s} and in (\ref{erickowex:pronoun_pol_control}),
\erickownatlang{he} has the semantics \erickowsemexpr{j}.  

{\footnotesize
\eenumsentence{\label{erickowex:pronoun_pol} 
\item \label{erickowex:pronoun_pol_sue} 
\erickowsemexpr{joe(j), sue(s), book(b), lend(l,j,b,s), boring(b) }  
\\ \erickownatlang{Joe lends Sue a boring book.}

% Note for visually impaired readers: ignore anything with \color{white}. 
% It is used as a form of indentation to help sighted users. 
\item \label{erickowex:pronoun_pol_she} 
\erickowsemexpr{joe(j), {\color{white}sue(s),} book(b), lend(l,j,b,s), boring(b) }  
\\ \erickownatlang{Joe lends her a boring book.}
}
\eenumsentence{\label{erickowex:pronoun_pol_control} 
\item[{\color{white}a.}] \label{erickowex:pronoun_pol_control_inf}
\erickowsemexpr{joe(j), sue(s), leave(l,j), promise(p,j,s,l)}
\\ \erickownatlang{Joe promises Sue to leave.}
\\ or \erickownatlang{Joe promises Sue that he would leave.} 
}}

In figure \ref{erickowfig:polarity_automaton_zerolit_bad}, we compare the
construction of polarity automata for (\ref{erickowex:pronoun_pol_sue}, left)
and (\ref{erickowex:pronoun_pol_she}, right).  Building an automaton for
(\ref{erickowex:pronoun_pol_she}) fails because \erickowtautree{sue} is not available
to cancel the negative polarities for \erickowtautree{lends}; instead, a
pronoun must be used to take its place.  The problem is that the
selection of a lexical item is only triggered when the construction
algorithm visits one of its semantic literals.  Since pronoun semantics
have zero literals, they are \emph{never} selected.  Making pronouns
visible to the construction algorithm would require us to count the
indices from the input semantics.  Each index refers to an entity.  This
entity must be ``consumed'' by a syntactic functor (e.g. a verb) and
``provided'' by a syntactic argument (e.g. a noun).

%\footnote{This also holds true for sentences like \erickownatlang{Joe sings
%badly and Sue sings well}, \erickowsemexpr{sing(s1,j) good(s1), sing(s2,m),
%bad(s2)} because each usage of \erickownatlang{sings} actually corresponds to a
%different lexical item, \erickowtautree{sings1} with the semantics
%\erickowsemexpr{sings(s1,j)} and \erickowtautree{sings2} with \erickowsemexpr{sings(s2,m)}.}.  

\begin{figure}[htpb]
\begin{center}
\includegraphics[scale=0.33]{images/zeroaut-noun.pdf}
\includegraphics[scale=0.33]{images/zeroaut-sans.pdf}
\end{center}
\vspace{-0.4cm}
\caption{Difficulty with zero-literal semantics.}
\label{erickowfig:polarity_automaton_zerolit_bad}
\end{figure}

We make this explicit by annotating the semantics of the lexical input
(that is, the set of lexical items selected on the basis of the input
semantics) with a form of polarities.  Roughly, nouns provide
indices\footnote{except for predicative nouns, which like verbs, are
semantic functors} ($+$), modifiers leave them unaffected, and verbs
consume them ($-$).  Predicting pronouns is then a matter of counting
the indices.  If the positive and negative indices cancel each other
out, no pronouns are required.  If there are more negative indices than
positive ones, then as many pronouns are required as there are negative
excess indices.  In the table below, we show how the example semantics
above may be annotated and how many negative excess indices result:

%\begin{center}
%{\footnotesize
%\begin{tabular}{|cl|cl|cl|}
%\hline
%\multicolumn{2}{|c|}{\textbf{nouns}} &
%\multicolumn{2}{|c|}{\textbf{verbs}} &
%\multicolumn{2}{|c|}{\textbf{modifiers}} \\ 
%\hline
%%
%\erickowtautree{joe}      & \erickowsemexpr{joe(+j)}             &     
%\erickowtautree{lend}     & \erickowsemexpr{lend(l,-j,-b,-s)}    &
%\erickowtautree{boring}   & \erickowsemexpr{boring(b)}             \\ 
%%
%\erickowtautree{sue}       & \erickowsemexpr{sue(+s)}            &     
%\erickowtautree{promise~a} & \erickowsemexpr{promise(p,j,-s,l)}  &     
%& \\
%%
%\erickowtautree{a~book}    & \erickowsemexpr{book(+b)}            &
%\erickowtautree{promise~b} & \erickowsemexpr{promise(p,-j,-s,l)}  &     
%& \\
%\hline
%\end{tabular}\\
%}
%\end{center}

\begin{center}
{\footnotesize
\begin{tabular}{|l|r|r|r|}
\hline
\multicolumn{1}{|c|}{\bf semantics} & 
{\bf \tt b} &
{\bf \tt j} & 
{\bf \tt s} \\ 
\hline
\erickowsemexpr{joe(+j)  sue(+s)  book(+b)  lend(l,-j,-b,-s)  boring(b)} & 
\erickowsemexpr{0} &
\erickowsemexpr{0} &
\erickowsemexpr{0} \\
\hline
\erickowsemexpr{joe(+j)  {\color{white}sue(+s)} book(+b)  lend(l,-j,-b,-s)  boring(b)} & 
\erickowsemexpr{0} &
\erickowsemexpr{0} &
\erickowsemexpr{1} \\
\hline
\erickowsemexpr{joe(+j) sue(+s) leave(l,-j,-s) promise(p,{\color{white}-}j,-s,l)} &
\erickowsemexpr{0} &
\erickowsemexpr{0} & 
\erickowsemexpr{0} \\ 
\hline
\erickowsemexpr{joe(+j) sue(+s) leave(l,-j,-s) promise(p,-j,-s,l)} &
\erickowsemexpr{0} &
\erickowsemexpr{1} & 
\erickowsemexpr{0} \\ 
\hline
\end{tabular}
}
\end{center}

Counting surplus indices allows us to establish the number of pronouns
used and thus gives us the information needed to build polarity
automata.  We implement this by introducing a virtual literal for
negative excess index, and having that literal be realised by pronouns.
Building the polarity automaton as normal yields lexical combinations
with the required number of pronouns, as in figure
\ref{erickowfig:polarity_automaton_zerolit}.   

\begin{figure}[htpb]
\begin{center}
\includegraphics[scale=0.33]{images/zeroaut-pron.pdf}
\end{center}
\vspace{-0.4cm}
\caption{Constructing a polarity automaton with zero-literal semantics.}
\label{erickowfig:polarity_automaton_zerolit}
\end{figure}

This becomes more complicated when the lexical input contains lexical
items with different annotations for the same semantics.  For instance,
the control verb \erickownatlang{promise} has two forms: one which
solicits an infinitive as in \erickownatlang{promise to leave}, and one
which solicits a declarative clause as in \erickownatlang{promise that
he would leave}.  This means two different counts of subject index
\erickowsemexpr{j} in (\ref{erickowex:pronoun_pol_control}) : zero for the form
that subcategorises for the infinitive, or one for the declarative. But
to build a single automaton, these counts must be reconciled, i.e., how
many virtual literals do we introduce for \erickowsemexpr{j}, zero or
one?  

The answer is to introduce enough virtual literals to suppport the
largest count (in this case one), and to balance them by adding the
virtual literals to the lexical semantics of the smaller counts.  To
handle example (\ref{erickowex:pronoun_pol_control}), we introduce one virtual
literal for \erickowsemexpr{j} in order to select the pronoun
\erickownatlang{he} in \erickownatlang{promise that he would leave}.
This extra pronoun is not selected for the infinitive form
\erickownatlang{promises to leave}, because it is accounted for
in the semantics of lexical item \erickowtautree{promise\_to}, which now
consists of \erickowsemexpr{promise(p,j,s,l)} as well as the virtual 
literal \erickowsemexpr{j}.

\begin{figure}[htpb]
\begin{center}
\includegraphics[scale=0.33]{images/zeroaut-promise.pdf}
\end{center}
\vspace{-0.4cm}
\caption{Constructing a polarity automaton with zero-literal semantics.}
\label{erickowfig:polarity_automaton_zerolit_promise}
\end{figure}

\section{Related work}
\label{erickowsec:related_work}

Polarity filtering is not the only mechanism for dealing with ambiguity.
\cite{Kay96} proposes a chart generation algorithm which groups
intermediate results into equivalence classes according to their
semantic coverage, syntactic category and a distinguished semantic
index. \cite{shemtov96} improves on this approach by using a coarser
representation of semantic coverage to produce larger classes.
Equivalence classes allow for the variants of an intermediate solution,
for example those which arise from lexical ambiguity, to be packed
together and processed as a single group.  The more structures are
recognised as equivalent, the more redundant computation is avoided.  
Our surface realisation algorithm does not keep track of equivalence
classes, but we plan to investigate how they can be incorporated, 
how usefully they would combine with polarity filtering in practice.  

A more similar approach is that of \cite{kollerStriegnitz02}.  The TAG
lexical input is converted into a set of lexical entries of a dependency
grammar.  These entries are then parsed by an efficient constraint-based
dependency parser and the output is converted back into a set of TAG
derived trees representing the surface realisation output.  The main
similarity between this and our approach is that they both use a global
mechanism for filtering out combinations of lexical entries that cannot
possibly lead to a syntactically valid sequence.  Both approaches
involve the cancelling out of syntactic resources and requirements.
Interestingly, Koller et al. take adjunction into account, whereas our
approach largely ignores auxiliary trees.  Their treatment of auxiliary 
trees could be a useful addition to the polarity filter.  A key
difference is that though Koller et al. handle null semantics, and offer
some thoughts on multiple literal semantics, they do not account for
zero-literal semantics.  It would be worthwhile to see if semantic
index counting can also be used within a constraint propagation
framework.

\section{Conclusion}

Polarity filtering was introduced for parsing in \cite{bonfante04}.  In 
\cite{kow04}, we transpose this approach to surface realisation and
show that it greatly enhances efficiency.  However, several linguistic 
issues arise which stem from lexical items displaying multi-literal
semantics, null semantics or zero-literal semantics.  In this paper, we
show how the polarity algorithm can be extended to deal with such items.
Briefly, multi-literal semantic items are catered for by introducing a
sort of literal counting in the automaton; null semantic lexical items
by using TAG support for multiple anchors and zero-literal lexical items
by introducing index counting and modifying automaton construction
accordingly.

One point is worth noting.  Whereas null semantic and multiple literal
items have been discussed before
\shortcite{shieber88,calder89,carrollEtAl99}, zero-literal items have
been paid scant attention in the surface realisation literature. Their
handling, however, is important for both linguistic and computational
reasons.  Linguistically, it is required to support the generation of
sentences containing pronouns and control verbs as well as full noun
phrases and non-control verbs. Computationally, it is necessary to
restrain the semantic wildcard behaviour of pronouns. In a naive 
algorithm, a pronoun is selected for every index in the input semantics,
which means any intermediary structure created during surface
realisation can be combined with a pronoun, correctly or not.  This is
further aggravated because every lexical combination is explored and the
number of intermediary structures is itself very large.  The approach
presented here addresses these two problems as follows.  On the one
hand, the wildcard effect is eliminated because index counting only
allows pronouns to be selected if they can be associated with a
\emph{specific} ``excess'' index in the semantics.  On the other hand,
polarity filtering is used to drastically reduce the number of lexical
combinations to be explored.

\bibliographystyle{chicago}
\bibliography{kow-esslli05}

\end{document}

