Announcement_8
New preprint alert! We introduce LOGICAL-COMMONSENSEQA, a benchmark designed to isolate multi-fact and compositional inference in LLMs using the logical operators AND, OR, and NEITHER/NOR. Our evaluation reveals that fluent reasoning often masks underlying logical failures, especially on negation-based tasks. Check out the paper on arXiv!