The central tension in modern AI research isn’t just about making models more capable; it is about how we verify that the humans using them are still capable, too. In learning-intensive fields—academia, high-stakes research, professional certification—we have long relied on the "artifact" (the essay, the code, the research paper) as a proxy for the creator's understanding.
A new paper by Seine A. Shintani, "AI to Learn 2.0," addresses the moment this proxy fails. When an opaque model can produce a polished result, the artifact no longer serves as credible evidence that the person who submitted it actually understands what they’ve produced. This is "proxy failure," and Shintani’s work provides a rigorous framework to fix it.
Methodology Shift
The methodology here is a shift from governing the process to governing the deliverable. Instead of trying to detect AI usage—a game of cat-and-mouse that the "cats" are currently losing—the framework establishes a maturity rubric for the final output. It distinguishes between the "artifact residual" (the physical or digital result) and the "capability residual" (what the human takes away from the task).
The framework requires a five-part deliverable package that must be usable, auditable, and, most importantly, justifiable without the original large language model. The test has real teeth: if you cannot explain the "why" behind a specific line of code or a specific argumentative turn without asking the AI to explain it for you, the deliverable fails the audit.
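To make the audit concrete, here is a minimal sketch in Python. It is a sketch under stated assumptions: the five component names and the pass/fail check are placeholders of my own, since the review does not enumerate the paper's actual package contents.

```python
from dataclasses import dataclass

# Hypothetical deliverable package. The five field names are
# illustrative placeholders, not the components Shintani specifies.
@dataclass
class DeliverablePackage:
    artifact: str          # the essay, code, or paper itself
    rationale: str         # human-authored "why" behind key decisions
    provenance_log: str    # record of which steps were AI-assisted
    self_explanation: str  # the submitter's unassisted walkthrough
    transfer_sample: str   # related work completed without the model

def passes_audit(pkg: DeliverablePackage) -> bool:
    """Fail the audit if any component is missing, i.e. if the 'why'
    exists only inside the model rather than in the human record."""
    parts = (pkg.artifact, pkg.rationale, pkg.provenance_log,
             pkg.self_explanation, pkg.transfer_sample)
    return all(p.strip() for p in parts)
```

Even in this toy form the design choice is visible: the artifact is only one component among five, and the other four can be supplied only by the human.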
Capability-Evidence Ladder
What makes this particularly elegant is the "capability-evidence ladder." In Shintani’s contrastive cases, such as teacher-audited exam practice, the framework doesn't ban the AI during the exploration or drafting phases. It allows the model to act as a partner in hypothesis generation or workflow design. However, it sets "gate thresholds" on critical dimensions. If the human cannot demonstrate transfer (applying the logic of the AI-assisted work to a new, unassisted problem), the work is marked as a "substitution" rather than a "handoff."
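The gating logic lends itself to an equally small sketch. The dimension names and numeric cutoffs below are invented for illustration; the paper's actual gated dimensions are not detailed in this review.

```python
# Illustrative gate thresholds; the dimensions and cutoffs are
# assumptions for the sketch, not values from the paper.
GATE_THRESHOLDS = {
    "transfer": 0.7,     # applying the logic to a new, unassisted problem
    "explanation": 0.6,  # justifying specific choices without the model
}

def classify(scores: dict[str, float]) -> str:
    """Label the work a 'handoff' only if every gated dimension clears
    its threshold; any miss marks it a 'substitution'."""
    cleared = all(scores.get(dim, 0.0) >= cut
                  for dim, cut in GATE_THRESHOLDS.items())
    return "handoff" if cleared else "substitution"

# A polished artifact with weak transfer still fails the gate:
print(classify({"transfer": 0.3, "explanation": 0.9}))  # -> substitution
```

Note that the gate is conjunctive: excellence on one dimension cannot buy back a failure on another, which is exactly what separates a handoff from a substitution.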
Handoff-Ready Workflows
The detail that changes the interpretation of this work is its focus on "handoff-ready" workflows. Shintani isn't arguing for a return to the pre-AI era. Instead, the paper describes a world where AI is a high-speed engine, but the human is the permanent record. For a deliverable to be mature under this rubric, it must include human-attributable evidence of explanation. We are moving from an era of "Look what I made" to "Look what I can prove I understand about what I made."
Worth the attention of anyone currently struggling with "AI policy" in a classroom or a lab. Most policies are built on prohibitions; this one is built on validity. It acknowledges that the AI is opaque, but insists that the human output cannot be.
It is a quiet, methodological response to a loud problem. As we build systems that reason better, we must refine the frameworks that ensure we haven't stopped reasoning ourselves. The record should include this as a foundational step toward a more honest relationship with our tools.