GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. ICLR’19. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy & Samuel R. Bowman.
My summary of the paper's main takeaways:
However, this model [the best-performing baseline] still achieves a fairly low absolute score. Analysis with our diagnostic dataset reveals that our baseline models deal well with strong lexical signals but struggle with deeper logical structure.