
My thesis, titled "Learning Semantics from Meaning Representations: From Distributional and Graph-Grammatical Perspectives", is now available in the CUHK Electronic Theses & Dissertations Collection.

As a teaser, here are the first three paragraphs of the Introduction:

Computational semantics examines questions of semantics through computational perspectives and methods. Most work in traditional formal semantics adopts a theory-centric approach in which computation contributes only to practical systems (e.g., parsers developed from formal language theory) or to the verification of claims (e.g., logic programming and Bayesian inference used in automated reasoning). In the past few decades, advances in computational technology have opened the door to a data-driven paradigm. This enables us to automate the analysis of large amounts of data, which would otherwise require time and effort beyond what humans can reasonably invest, and to discover complex correlations and patterns in data using advanced statistical tools, which would otherwise be impossible to work out manually.

One of the most successful linguistic adaptations to have ridden the wave of the data-centric shift is distributional semantics, where the meanings of expressions are derived from where they occur in a corpus. Formal semantics and distributional semantics have complementary strengths and weaknesses: the former is logically rigorous but not easy to actualize in extralinguistic environments, whereas the latter is logically loose but learnable from data. Data is also helpful for the development of formal grammars. While grammars provide precise formalizations of the syntax and semantics of natural languages, they are often expensive to engineer. From suitably annotated data, grammars can be induced automatically to approximate language production, possibly improving interpretability and computational efficiency.

In this thesis, I connect the data-driven approach to semantics with meaning representations. Meaning representations provide linguistically informative annotations of texts, including predicate–argument structure, compositional derivations of syntactic and semantic structures, morphosyntactic lexical disambiguation, and scope constraints of quantifiers and logical operators. In particular, I use them to learn truth-conditional representations of words with Functional Distributional Semantics (FDS), and a graph-based formalization of semantic composition with probabilistic synchronous hyperedge replacement grammar (PSHRG).
