Rules reference

Understand what the engine checks

The deterministic rules score the structure and clarity of the prompt itself. The LLM rule is opt-in and looks for issues deterministic checks miss.

Rules in the current public release

Rule ID               Category        What it checks
min-length            specificity     Prompt is not too short.
max-length            structure       Prompt is not excessively long.
no-output-format      specificity     The expected answer format is specified.
no-examples           best-practice   Few-shot examples are present.
no-role               best-practice   A role or persona is assigned.
no-context            specificity     Background context is provided.
ambiguous-negation    clarity         Negative instructions are not overly vague or stacked.
no-constraints        specificity     Explicit constraints are defined.
all-caps-abuse        clarity         ALL CAPS is not overused for emphasis.
vague-instruction     clarity         Qualifiers like "good" or "appropriate" are not left undefined.
missing-task          clarity         An explicit task or request is detectable.
no-structured-format  structure       Long prompts use visible structure such as sections or tags.
llm-prompt-review     model-specific  Opt-in LLM review for issues deterministic rules miss.

Rewrite suggestions

When a deterministic rule fails, PromptScore can attach a concrete rewrite snippet that you (or your tooling) can paste into the prompt. The snippet ships on the same RuleResult as the fix message, exposed as a structured field:

interface PromptRewrite {
  title: string;
  snippet: string;
  placement: 'prepend' | 'append';
}

interface RuleResult {
  // ...message, suggestion, reference, etc.
  rewrite?: PromptRewrite;
}

Seven of the deterministic rules emit a rewrite today: missing-task, no-role, no-context, min-length, no-output-format, no-examples, and no-constraints. The remaining deterministic rules either need user-supplied content (e.g. measurable acceptance criteria for vague-instruction) or a transformation that can’t be automated deterministically (e.g. max-length would require summarization). For those rules, the fix message is still authoritative.

The opt-in llm-prompt-review rule also emits a rewrite — one per non-generic issue type (ambiguity, conflict, grounding, success criteria, task framing). When the LLM passes the prompt, or flags it under the catch-all general category, no rewrite is attached and the existing message and suggestion remain authoritative. The five per-issue snippets appear alongside each issue label below.

The CLI text and markdown reporters render the snippet alongside the suggestion and reference. The browser analyzer surfaces the same snippet in each finding card. The JSON output contains the field verbatim, so editor integrations and CI tooling can apply rewrites programmatically.
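Because the rewrite is a plain structured field, applying it programmatically takes only a few lines. The sketch below is illustrative and not part of PromptScore: applyRewrite is a hypothetical helper, and joining the snippet and the prompt with a blank line is an assumption rather than documented behavior.

```typescript
// Mirrors the PromptRewrite interface shown earlier in this section.
interface PromptRewrite {
  title: string;
  snippet: string;
  placement: 'prepend' | 'append';
}

// Hypothetical helper (not a PromptScore API): puts the snippet before or
// after the prompt, separated by a blank line (an assumed convention).
function applyRewrite(prompt: string, rewrite: PromptRewrite): string {
  const body = prompt.trim();
  const snippet = rewrite.snippet.trim();
  if (body.length === 0) return snippet;
  return rewrite.placement === 'prepend'
    ? `${snippet}\n\n${body}`
    : `${body}\n\n${snippet}`;
}

const patched = applyRewrite('Summarize the attached report.', {
  title: 'Assign a role',
  snippet: 'You are a senior <role> who specializes in <domain>.',
  placement: 'prepend',
});
console.log(patched);
```

The placeholders inside the snippet (such as <role>) still need to be filled in by a person or by the calling tool.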

Deterministic rules

min-length (specificity)

Prompts shorter than 20 words rarely give a model enough to work with. The score scales linearly with the word count below the threshold.

Fix: Add detail about what you want, why, and how the output should look.

Rewrite (append): Flesh out the prompt

<context>
  <who is asking and why>
</context>

<instructions>
  <what you want, step by step>
</instructions>

<output_format>
  <exact shape of the answer>
</output_format>
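The linear scaling described for min-length can be pictured with a small sketch. Only the 20-word threshold comes from the rule description; the exact formula, the whitespace-based word count, and the name minLengthScore are assumptions for illustration.

```typescript
const MIN_WORDS = 20; // threshold stated in the rule description

// Assumed formula: the score grows linearly from 0 to 1 as the word count
// approaches the threshold, and stays at 1 once the threshold is reached.
function minLengthScore(prompt: string): number {
  const words = prompt.trim().split(/\s+/).filter(Boolean).length;
  return Math.min(1, words / MIN_WORDS);
}

console.log(minLengthScore('Summarize this article for me')); // 5 of 20 words -> 0.25
```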

max-length (structure)

Very long prompts (>1500 words) tend to contain redundancy and dilute the model’s focus. The score decreases gradually past the soft limit.

Fix: Look for repeated instructions, bundled unrelated tasks, or sections that can be summarized.

no-output-format (specificity)

The prompt should tell the model exactly how to format its answer. Otherwise the model will guess, and downstream consumers cannot rely on the shape of the output.

Fix: State the exact format: JSON schema, bullet list, markdown table, single sentence, etc.

Rewrite (append): Specify the output format

<output_format>
  Return a <JSON object | bullet list | markdown table | single paragraph>.
  Required fields / sections: <list>.
  Do not include: <what to omit>.
</output_format>

no-examples (best-practice)

Examples dramatically improve consistency on classification, extraction, and formatting tasks. Profiles like claude weight this rule higher.

Fix: Add 1–3 concrete examples showing the input and the expected output.

Rewrite (append): Add few-shot examples

<examples>
  <example>
    <input><sample input></input>
    <output><expected output></output>
  </example>
  <example>
    <input><sample input></input>
    <output><expected output></output>
  </example>
</examples>

no-role (best-practice)

Assigning a role focuses the model and sets expectations for expertise and tone. Useful for both system and user messages.

Fix: Start with something like "You are a senior <X> who specializes in <Y>."

Rewrite (prepend): Assign a role

You are a senior <role> who specializes in <domain>. You write for <audience> and prioritize <quality bar>.

no-context (specificity)

Context helps the model understand the situation, audience, and constraints. Long prompts (≥ 80 words) are assumed to provide implicit context.

Fix: Explain the situation: who the user is, why they’re asking, and what the stakes are.

Rewrite (prepend): Add a context block

<context>
  Who is asking: <user>
  Why they are asking: <motivation>
  Constraints or stakes: <what cannot break>
</context>

ambiguous-negation (clarity)

Models follow positive instructions ("do Y") more reliably than negations ("don't do X"). Heavy stacking of negations correlates with regressions.

Fix: Rewrite "don't do X" as "do Y instead". Tell the model what the desired behavior is.
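To make "stacked" negations concrete, here is a rough sketch that flags sentences containing more than one negation. The word list, the one-per-sentence limit, and the name findStackedNegations are all assumptions; PromptScore's actual detection may work differently.

```typescript
// Hypothetical negation vocabulary; longer alternatives come first so that
// "do not" is counted once rather than as "not".
const NEGATION = /\b(?:don't|do not|never|avoid|cannot|can't|won't|shouldn't|not|no)\b/gi;

// Returns the sentences that contain more than `maxPerSentence` negations.
function findStackedNegations(prompt: string, maxPerSentence = 1): string[] {
  const normalized = prompt.replace(/’/g, "'"); // treat curly apostrophes like straight ones
  const sentences = normalized.match(/[^.!?]+[.!?]*/g) ?? [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => (s.match(NEGATION) ?? []).length > maxPerSentence);
}

const flagged = findStackedNegations(
  "Don't use jargon. Never mention pricing and do not avoid citing sources. Write for beginners."
);
console.log(flagged); // only the middle sentence is flagged
```

A flagged sentence is a candidate for the positive rewrite described in the fix, for example "Cite a source for every claim and leave pricing out."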

no-constraints (specificity)

Constraints keep the model on track and prevent scope drift. They are also the easiest hook for downstream evaluation.

Fix: Add constraints like length limits, scope boundaries, or things the answer must include.

Rewrite (append): Add explicit constraints

<constraints>
  - Length: <e.g. ≤ 200 words>
  - Scope: <what is in scope and what is not>
  - Must include: <required elements>
  - Must avoid: <forbidden elements>
</constraints>

all-caps-abuse (clarity)

Excessive ALL CAPS is noisy and rarely the clearest way to emphasize something. Bold, quotes, and XML tags work better.

Fix: Use bold (**word**), quotes, or XML tags for emphasis instead of ALL CAPS.
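One way to think about "excessive" capitals is as a share of all words. The sketch below is an assumption about how such a check could look, not PromptScore's implementation; allCapsRatio is a hypothetical name, and any cutoff applied to its result would also be a guess.

```typescript
// Share of alphabetic words (two or more letters) written entirely in capitals.
// Limitation of this sketch: acronyms such as JSON count as capitalized words.
function allCapsRatio(prompt: string): number {
  const words: string[] = prompt.match(/[A-Za-z]{2,}/g) ?? [];
  if (words.length === 0) return 0;
  const shouting = words.filter((w) => w === w.toUpperCase());
  return shouting.length / words.length;
}

// 4 of the 10 words are capitalized: MUST, ALWAYS, JSON, NEVER.
console.log(allCapsRatio('You MUST ALWAYS reply in JSON and NEVER add commentary')); // 0.4
```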

vague-instruction (clarity)

Vague qualifiers ("good", "proper", "appropriate") don't give the model a measurable target. Replace them with concrete acceptance criteria.

Fix: Replace vague words with measurable criteria. "Good" → "concise (≤ 3 sentences) and citing sources".
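A simple word-list scan illustrates the idea. The three qualifiers are the ones named in the rule description; the real rule may use a longer list, and findVagueQualifiers is a hypothetical helper.

```typescript
// Qualifiers quoted in the rule description; the engine's full list is not shown here.
const VAGUE_QUALIFIERS = ['good', 'proper', 'appropriate'];

// Returns each vague qualifier found in the prompt, lowercased and de-duplicated.
function findVagueQualifiers(prompt: string): string[] {
  const pattern = new RegExp(`\\b(?:${VAGUE_QUALIFIERS.join('|')})\\b`, 'gi');
  const hits: string[] = prompt.match(pattern) ?? [];
  return hits
    .map((h) => h.toLowerCase())
    .filter((h, i, all) => all.indexOf(h) === i);
}

console.log(findVagueQualifiers('Write a good summary in an appropriate tone. Good luck.'));
// finds "good" and "appropriate"
```

Each hit marks a place where a measurable criterion (a length limit, a required citation, a named audience) could replace the qualifier.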

missing-task (clarity)

This is the highest-weight rule and the only one that fires as an error by default. If the model can't identify a task, the rest of the prompt is wasted.

Fix: State the task explicitly: "Your task is to..." or "Please <verb> <object>".

Rewrite (prepend): Add an explicit task

Your task is to <verb> <object>. Specifically: <what success looks like>.

no-structured-format (structure)

Long prompts (>100 words) are easier for a model to follow when broken into sections. XML tags work especially well for Claude; markdown headers work well for GPT.

Fix: Split the prompt into labeled sections: <instructions>, <context>, <examples>, <output_format>.
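The check can be sketched as a word count followed by a search for visible structure. The 100-word threshold comes from the description above; the two patterns (a paired XML tag, a markdown header) and the name needsStructure are assumptions about what "visible structure" might mean in practice.

```typescript
const LONG_PROMPT_WORDS = 100; // threshold stated in the rule description

// True when a long prompt shows neither a paired XML tag nor a markdown header.
function needsStructure(prompt: string): boolean {
  const words = prompt.trim().split(/\s+/).filter(Boolean).length;
  if (words <= LONG_PROMPT_WORDS) return false; // short prompts are not checked
  const hasXmlSection = /<([a-z_][\w-]*)>[\s\S]*?<\/\1>/i.test(prompt);
  const hasMarkdownHeader = /^#{1,6}\s+\S/m.test(prompt);
  return !(hasXmlSection || hasMarkdownHeader);
}

const longFlat = new Array(121).join('word '); // 120 words, no structure
console.log(needsStructure(longFlat)); // true
console.log(needsStructure(`<instructions>${longFlat}</instructions>`)); // false
```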

LLM-backed rule (experimental, opt-in)

The llm-prompt-review rule is skipped unless you enable --llm in the CLI or include_llm: true in the project config and provide a configured LLM client. It calls the configured provider to catch hidden ambiguity, missing grounding, conflicting instructions, unrealistic task framing, and unclear success criteria.

When the model reports a failure, PromptScore normalizes the review into one of the issue types below. The reported reference on the rule result links to the matching anchor here.

Ambiguity

The prompt has multiple plausible readings, missing scope, or under-specified inputs.

Fix: Specify the task, input, audience, constraints, and expected output.

Rewrite (append): Make the ambiguous parts explicit

<inputs>
  <input variable> = <where the data comes from>
</inputs>

<scope>
  In scope: <what to consider>
  Out of scope: <what to ignore>
</scope>

<expected_output>
  Shape: <exact format>
  Audience: <who is reading>
</expected_output>

Conflicting instructions

The prompt asks for incompatible behaviors (e.g. JSON and plain text) or contradicts itself across sections.

Fix: Choose one instruction path and remove the incompatible wording.

Rewrite (append): Pick one consistent set of instructions

<resolved_instructions>
  Output format: <ONE choice — JSON | markdown | plain text>
  Length: <ONE target — exact words / characters / sentences>
  Tone: <ONE register — formal | casual | technical>
</resolved_instructions>

Remove any earlier wording that contradicts these resolved instructions.

Grounding

The prompt expects facts, citations, or jurisdiction-specific knowledge without supplying source material or scope.

Fix: Provide source material, scope, assumptions, and how the model should handle uncertainty.

Rewrite (append): Add the missing grounding

<grounding>
  Sources: <attached docs, URLs, or {{variable}} placeholders>
  Scope: <jurisdiction, time period, domain>
  Assumptions: <what the model can take as given>
  Uncertainty: If the source material is silent on a point, say "not stated in the provided material" rather than guessing.
</grounding>

Success criteria

The prompt does not state what a good answer must include, avoid, or optimize for.

Fix: State what a good answer must include, avoid, and optimize for.

Rewrite (append): Define measurable success criteria

<success_criteria>
  A good answer must include: <required elements>
  A good answer must avoid: <forbidden elements or patterns>
  Optimize for: <one explicit metric — accuracy | brevity | citation density | etc.>
</success_criteria>

Task framing

The task is unrealistic, overbroad, or asks for guarantees the model cannot reasonably provide.

Fix: Narrow the task to something the model can reasonably complete and verify.

Rewrite (prepend): Reframe the task into something verifiable

Narrow the task to a single deliverable: <one concrete output>.
Out of scope: <items the model should not attempt>.
A reviewer can verify success by checking: <one observable check>.
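The five issue types and their rewrites can be summarized as a lookup table. The titles and placements below are taken from the sections above; the identifier strings (for example 'success-criteria') and the names IssueType and rewriteFor are assumptions, since this page does not show the values PromptScore uses internally.

```typescript
// Hypothetical identifiers for the normalized issue types.
type IssueType =
  | 'ambiguity'
  | 'conflict'
  | 'grounding'
  | 'success-criteria'
  | 'task-framing'
  | 'general';

interface RewriteInfo {
  title: string;
  placement: 'prepend' | 'append';
}

// The catch-all 'general' category carries no rewrite.
const REWRITE_BY_ISSUE: Record<IssueType, RewriteInfo | undefined> = {
  ambiguity: { title: 'Make the ambiguous parts explicit', placement: 'append' },
  conflict: { title: 'Pick one consistent set of instructions', placement: 'append' },
  grounding: { title: 'Add the missing grounding', placement: 'append' },
  'success-criteria': { title: 'Define measurable success criteria', placement: 'append' },
  'task-framing': { title: 'Reframe the task into something verifiable', placement: 'prepend' },
  general: undefined,
};

function rewriteFor(issue: IssueType): RewriteInfo | undefined {
  return REWRITE_BY_ISSUE[issue];
}

console.log(rewriteFor('task-framing')); // the only rewrite that is prepended
console.log(rewriteFor('general')); // undefined: message and suggestion stay authoritative
```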

How scoring should be interpreted

  • The score is a structural signal, not a guarantee of output quality.
  • Rule weight and severity come from the active profile.
  • missing-task is intentionally the most important rule in the default experience.
  • Suggestions are sorted by likely impact, not just in file order.

What rules are not doing yet

PromptScore does not currently validate runtime grounding, output correctness, safety outcomes, or tool behavior. Those are different problems and should remain clearly separated from prompt linting.