ICLR submission: https://openreview.net/forum?id=SygcCnNKwr
GitHub repo with dataset and tools: https://github.com/google-research/google-research/tree/master/cfq
The authors present a metric and a new dataset for quantitatively measuring compositional generalization. The dataset consists of natural language questions and answers, and can be used for semantic parsing or QA. Their experiments indicate that various standard seq2seq architectures don't generalize compositionally.