Best AI papers explained

A podcast by Enoch H. Kang

550 Episodes

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Published: 9/5/2025
Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data
Published: 9/5/2025
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Published: 9/5/2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Published: 9/5/2025
Prediction-Powered Statistical Inference Framework
Published: 9/5/2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Published: 9/5/2025
RM-R1: Reward Modeling as Reasoning
Published: 9/5/2025
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Published: 8/5/2025
Decoding Claude Code: Terminal Agent for Developers
Published: 7/5/2025
Emergent Strategic AI Equilibrium from Pre-trained Reasoning
Published: 7/5/2025
Benefiting from Proprietary Data with Siloed Training
Published: 6/5/2025
Advantage Alignment Algorithms
Published: 6/5/2025
Asymptotic Safety Guarantees Based On Scalable Oversight
Published: 6/5/2025
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Published: 6/5/2025
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Published: 6/5/2025
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Published: 6/5/2025
You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Published: 6/5/2025
Interplay of LLMs in Information Retrieval Evaluation
Published: 3/5/2025
Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence
Published: 3/5/2025
Toward Efficient Exploration by Large Language Model Agents
Published: 3/5/2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
