Human Judgment as a Specification

June 17, 2026 at 12:45

Quality: 8/10 Relevance: 9/10

Summary

The Brown PLT blog post argues that as GenAI enters programming, formal methods are needed to ensure that AI-generated solutions match user intent. It discusses the difficulty of turning informal prose into formal specifications and argues for keeping humans in the loop to guard against misinterpretation and automation bias. It introduces PICK, a tool that, given a prompt (e.g., a regex for dates), returns several plausible candidates, shows concrete strings that distinguish them, and asks users to upvote or downvote each string. PICK is demonstrated in three domains (regular expressions, linear temporal logic, and attribute-based access control) using the same algorithm: generate candidates, sample differences, present scenarios, update scores, and converge or admit defeat. The workflow relies on the candidate language being closed under negation and intersection (so that the difference between two candidates is itself expressible) and on the ability to sample from those differences, enabling a spec-elucidation process in which human judgments reveal implicit intent. The authors argue that this approach provides a meaningful and moderate human-in-the-loop workflow that serves as an independent witness to user intent, helps catch mismatches between prompts and outcomes, and remains valuable even as models improve. They point readers to an ECOOP 2026 paper and a Pick-regex VS Code extension.
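The loop described in the summary (generate candidates, sample differences, present scenarios, update scores, converge or admit defeat) can be sketched roughly as follows. This is an illustration under stated assumptions, not PICK's actual implementation: the names `accepts`, `distinguishing_strings`, `pick`, and `judge` are hypothetical, the candidate regexes are hard-coded rather than produced by a model, the +1/-1 scoring rule is assumed, and a brute-force scan over a small fixed pool of strings stands in for sampling the difference of two languages via negation and intersection.

```python
import re

def accepts(pattern, s):
    """True if the whole string s matches the candidate regex."""
    return re.fullmatch(pattern, s) is not None

def distinguishing_strings(candidates, pool):
    """Strings from the pool on which at least two candidates disagree.

    Assumption: a finite pool replaces real difference sampling
    (e.g. sampling from A intersected with not-B).
    """
    return [s for s in pool
            if len({accepts(c, s) for c in candidates}) > 1]

def pick(candidates, pool, judge):
    """Score candidates against human votes; return the winner or None.

    judge(s) is a hypothetical stand-in for the user's vote:
    True for an upvote (s should match), False for a downvote.
    """
    scores = {c: 0 for c in candidates}
    for s in distinguishing_strings(candidates, pool):
        wanted = judge(s)
        for c in candidates:
            # Assumed rule: reward agreement with the vote, penalize disagreement.
            scores[c] += 1 if accepts(c, s) == wanted else -1
    best = max(scores.values())
    winners = [c for c, v in scores.items() if v == best]
    # A unique top scorer counts as convergence; a tie means admitting defeat.
    return winners[0] if len(winners) == 1 else None

# Hypothetical candidates for the prompt "a regex for dates".
candidates = [
    r"\d{4}-\d{2}-\d{2}",      # strict ISO-style
    r"\d{2}/\d{2}/\d{4}",      # slash-separated
    r"\d{4}-\d{1,2}-\d{1,2}",  # ISO-style, optional zero padding
]
pool = ["2026-06-17", "06/17/2026", "2026-6-7", "hello"]

if __name__ == "__main__":
    # Simulated user who only wants zero-padded ISO-style dates.
    print(pick(candidates, pool, lambda s: s == "2026-06-17"))
```

With the simulated votes above, the strict ISO-style candidate ends up with the highest score. If no string in the pool separates the candidates, every score stays at zero and `pick` returns `None`, which corresponds to the "admit defeat" outcome.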

LLM & Prompting AI Tools
