PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation • Paper • 2409.06820 • Published 9 days ago • 55
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature • Paper • 2408.15836 • Published 23 days ago • 11
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery • Paper • 2408.06292 • Published Aug 12 • 114
Writing in the Margins: Better Inference Pattern for Long Context Retrieval • Paper • 2408.14906 • Published 24 days ago • 137
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time • Paper • 2408.13233 • Published 27 days ago • 20
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models • Paper • 2408.03837 • Published Aug 7 • 17