A four-dimension framework for evaluating the quality of a person's AI engagement practice: how they plan, work, reflect, and think critically.
Most evaluation of AI use focuses on the output: whether what came back was accurate, useful, or well-structured. That matters. But it only tells part of the story.
The quality of an AI output is largely determined before the output exists. It is a direct result of how clearly the person defined what they needed, how deliberately they chose their tool, how effectively they directed the exchange, and how honestly they reflected on what they got back.
The Personal Engagement Meta framework evaluates that practice: how the person engaged with the AI to reach the output. Four dimensions. Four questions that most people using AI have never asked themselves.
Used alongside the Output Evaluator Rubric, the two frameworks provide both the diagnosis and the explanation. The rubric identifies what fell short. This framework identifies why.
Every dimension is scored out of ten. Scores map to five bands. The descriptions are intentionally honest. Adequate is not a compliment, and Exemplary is earned precisely because the bands beneath it are not inflated.
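As a concrete illustration, the score-to-band mapping can be expressed in a few lines. This is a sketch, not part of the framework: the function name is hypothetical, but the band names and ranges are taken directly from the tables that follow.

```python
def band_for(score: int) -> str:
    """Map a 0-10 dimension score to its band.

    Hypothetical helper; the ranges are the framework's own:
    0-2, 3-4, 5-6, 7-8, 9-10.
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score must be 0-10, got {score}")
    if score <= 2:
        return "Insufficient"
    if score <= 4:
        return "Partial"
    if score <= 6:
        return "Adequate"
    if score <= 8:
        return "Capable"
    return "Exemplary"
```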
Each dimension assesses a different stage of AI engagement practice: from how a person prepares before a conversation starts, through how they work during it, to how honestly they reflect on what they produced and how critically they engaged throughout.
The first dimension covers planning. It assesses what happens before an AI conversation starts: whether the purpose is defined, whether the right tool has been selected for the task, and whether the engagement has been structured to produce what is actually needed rather than what is easiest to ask for.
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | No evidence of planning; engagements appear reactive and undefined; tool selection appears arbitrary; the prompt reflects no prior thought about what is actually needed. |
| Partial | 3–4 | Some planning evident but incomplete; purpose broadly stated rather than precisely defined; tool selection may be habitual rather than considered. |
| Adequate | 5–6 | Clear purpose stated before or at the start of engagements; tool selection is appropriate; the engagement is structured well enough to produce usable output. |
| Capable | 7–8 | Engagements are well-designed before they begin; purpose, audience, constraints, and desired output are clearly established; tool is selected deliberately against task requirements. |
| Exemplary | 9–10 | Planning is architectural; the engagement structure itself reflects a clear mental model of what the AI can and cannot contribute; nothing is left to chance that could have been specified; the quality of the output is largely determined before the conversation begins. |
The second dimension covers the work itself. It assesses what happens during an AI conversation: whether the person iterates purposefully, redirects where necessary, builds depth through sustained engagement, and treats the first response as a starting point rather than a conclusion.
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | No iteration evident; first response accepted regardless of quality; the engagement is transactional: one prompt, one output, done. |
| Partial | 3–4 | Some iteration present but unfocused; follow-up prompts are reactive rather than purposeful; the exchange develops but without clear direction. |
| Adequate | 5–6 | Purposeful iteration present; the person redirects where the output drifts; the engagement develops the output beyond the first response in a meaningful way. |
| Capable | 7–8 | Iteration is targeted and efficient; the person identifies precisely where the output needs development and applies focused follow-up; the final output is materially better than the first response. |
| Exemplary | 9–10 | The engagement is a genuine collaboration; the person uses the AI's capabilities fully and knows when to push further and when to stop; iteration decisions are evidence of discriminating judgement, not habit. |
The third dimension covers reflection. It assesses whether the person looks back honestly at their AI practice, not just at the outputs it produced. It asks whether reflection is present, whether it is honest, and whether it is translated into changed behaviour rather than remaining at the level of abstract awareness.
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | No reflection evident; outputs are accepted or rejected without examining the practice that produced them; there is no feedback loop from output quality to engagement quality. |
| Partial | 3–4 | Some reflection present but shallow or intermittent; awareness that practice could improve without specific diagnosis of how. |
| Adequate | 5–6 | Honest reflection on AI use is present; the person can identify what worked and what did not at the level of specific practice decisions. |
| Capable | 7–8 | Reflection is structured and developmental; findings from evaluation are translated into adjusted practice; the person applies learning across engagements, not just within them. |
| Exemplary | 9–10 | Self-evaluation is a deliberate, recurring practice; the person maintains an honest account of where their AI use stands, applies that account to improve future engagements, and is as rigorous about their own practice as they are about the outputs it produces. |
The fourth dimension covers critical thinking. It assesses whether the person thinks with the AI rather than deferring to it. It asks whether the person identifies weaknesses in AI reasoning, resists plausible but unsupported conclusions, brings their own knowledge and judgement to bear throughout, and knows when to accept a strong output and when to push back on a weak one. Critical thinking here is not scepticism for its own sake: it is purposeful, calibrated engagement that produces better outcomes than uncritical acceptance would.
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | AI outputs accepted uncritically; no evidence of independent judgement applied; the person's role is passive; the AI sets the terms throughout. |
| Partial | 3–4 | Some independent judgement present but applied inconsistently; challenge occurs but is not well-calibrated; significant outputs accepted without interrogation where interrogation was warranted. |
| Adequate | 5–6 | Critical engagement present and broadly sound; the person identifies where the AI's response requires further examination; challenge is purposeful rather than reflexive. |
| Capable | 7–8 | Strong discrimination between what warrants challenge and what does not; the person's own knowledge and judgement are visibly active throughout; acceptance of strong outputs reflects discrimination, not passivity. |
| Exemplary | 9–10 | The person thinks with the AI rather than through it; identifies assumptions, gaps, and overconfident conclusions with precision; challenge and acceptance are both expressions of independent, well-calibrated judgement; the engagement produces something neither party would have reached alone. |
Practice Coherence is not simply the average of the four dimension scores. A person can plan well but not iterate, reflect honestly but not apply it, or think critically but lack a structured evaluation habit to anchor it.
Coherence asks whether the four dimensions reinforce each other: whether strength in one dimension is supported by and translates into strength in the others. It is the difference between four isolated competencies and an integrated, intentional practice.
A high coherence score alongside lower dimension scores is rare but possible. It signals that a developing practice is at least consistent and self-aware. A low coherence score alongside higher dimension scores signals that strengths are not yet connected into something that compounds. A minimal sketch of how these scores might be recorded together follows the table below.
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | No coherent practice evident; dimensions appear disconnected or contradictory; strengths in one area do not translate to others. |
| Partial | 3–4 | Some dimensions are stronger than others in ways that suggest the practice is developing unevenly; coherence is partial rather than consistent. |
| Adequate | 5–6 | The four dimensions operate together in a broadly consistent way; practice is coherent enough to produce reliable results. |
| Capable | 7–8 | The dimensions reinforce each other; strength in planning is visible in iteration; self-evaluation shapes future planning; critical thinking is active at every stage. |
| Exemplary | 9–10 | The four dimensions are inseparable in practice; each one draws on and strengthens the others; the practice as a whole is greater than the sum of its parts. |
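The sketch promised above: one minimal way to record an evaluation, assuming nothing about how the framework is implemented. The record type and the `flag()` heuristic are hypothetical illustrations; in particular, coherence is stored as its own score, never derived from the other four, and the divergence threshold is arbitrary.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PracticeEvaluation:
    """Four dimension scores plus a separately judged coherence score.

    All fields are 0-10. Coherence is scored on its own evidence,
    not computed from the other four.
    """
    planning: int
    working: int
    reflection: int
    critical_thinking: int
    coherence: int

    def dimension_scores(self) -> list[int]:
        return [self.planning, self.working,
                self.reflection, self.critical_thinking]

    def flag(self) -> str | None:
        """Hypothetical heuristic surfacing the two patterns named above.

        The threshold of 3 points is illustrative only.
        """
        gap = self.coherence - mean(self.dimension_scores())
        if gap <= -3:
            return "strengths not yet connected into a compounding practice"
        if gap >= 3:
            return "developing practice, but consistent and self-aware"
        return None
```

For example, dimension scores of 8, 7, 7, and 8 alongside a coherence score of 4 would surface the first pattern: individually strong dimensions that do not yet reinforce each other.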
The framework is honest by design. These principles govern how evaluations are conducted, whether by an AI or by a human evaluator working through the dimensions independently.
Developmental feedback serves the person better than comfortable feedback. The delivery can be softened where appropriate. The content cannot.
Weaknesses are located precisely in observable behaviour. "Could be stronger" is not useful. Where it fell short and what it cost the practice: that is useful.
Every dimension carries an honest note about what is and is not observable. Limited evidence is not a reason to score generously. It is a reason to note the limit and score conservatively.
If a dimension genuinely scores Exemplary, it is recorded as Exemplary. Inventing critique to appear rigorous is its own failure of the framework's principles.
A strong overall impression does not inflate weak dimensions. A weak overall impression does not deflate strong ones. Each dimension is evaluated on its own evidence.
If a dimension scores Partial, that is named clearly before noting what worked. Concerns are not buried at the end of paragraphs that lead with strengths.
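One illustrative way to honour that ordering in generated feedback: name the band and the concern first, then the strengths, then any evidence limit. The function and its fields are hypothetical, sketched only to show the shape of a compliant note.

```python
def dimension_note(band: str, concern: str, strengths: str,
                   evidence_limit: str | None = None) -> str:
    """Assemble feedback for one dimension, concern first (illustrative).

    The band and the specific weakness lead; strengths follow;
    any limit on the observable evidence is noted, not hidden.
    """
    parts = [f"{band}: {concern}", f"What worked: {strengths}"]
    if evidence_limit is not None:
        parts.append(f"Evidence limit: {evidence_limit}")
    return " ".join(parts)
```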