A hands-on intro to prompt engineering. Learn the four levers (role, format, examples, constraints) and watch a vague prompt turn into a reliable one.
By the end of this post you'll know the four levers that turn vague LLM prompts into reliable ones, and you'll have run a side-by-side comparison showing how each lever changes output quality. The whole thing takes about twenty minutes and any LLM API will do — examples here use OpenAI, but the patterns transfer.
Less mysterious than it sounds. A "prompt" is the text you send to an LLM. "Engineering" the prompt means structuring that text so the model produces the answer you want, consistently, even on inputs you haven't tested.
The bad version of prompt engineering is rearranging adjectives and adding superlatives ("respond like an EXPERT", "you MUST be detailed"). Modern models ignore most of that.
The good version uses four levers, in order of impact:

1. Role: tell the model what job it's doing.
2. Format: show the exact shape you want back.
3. Examples: demonstrate the judgment calls that instructions can't capture.
4. Constraints: pin down behavior on inputs you haven't seen.
We'll walk through each by transforming a deliberately bad prompt into a good one.
pip install openai
export OPENAI_API_KEY="sk-..."
Save this as prompt_demo.py:
import openai

client = openai.OpenAI()

def ask(prompt: str, system: str | None = None) -> str:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return resp.choices[0].message.content
We use temperature=0 so output is as close to deterministic as the API gets: the same prompt almost always returns the same answer. That makes side-by-side comparisons meaningful.
Our running example: classify customer support tickets into one of four categories. Bad version first:
ticket = "My credit card was charged twice yesterday for the same order. Please refund the duplicate charge."
print(ask(f"What category is this ticket? {ticket}"))
You'll get something like:
This ticket is related to billing or payment issues. Specifically, it concerns
a duplicate charge on a credit card and a request for a refund...
That's an essay, not a category. Useless if you're routing tickets to teams.
SYSTEM = "You categorize customer support tickets to route them to the correct team."
print(ask(ticket, system=SYSTEM))
Better — but still too verbose. The model now knows what it's doing but doesn't know what shape you want back.
SYSTEM = """You categorize customer support tickets to route them to the correct team.
Respond with JSON like: {"category": "BILLING"}
Categories: BILLING, TECHNICAL, ACCOUNT, OTHER"""
print(ask(ticket, system=SYSTEM))
You should get:
{"category": "BILLING"}
That's parseable. The literal example in the system prompt anchors the format better than describing it ("respond with JSON containing a category field").
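Because the response is a bare JSON literal, you can feed it straight into routing code. A minimal sketch, assuming the ask() helper from above; the ALLOWED_CATEGORIES set and the fallback policy are our additions, not something the model enforces:

import json

ALLOWED_CATEGORIES = {"BILLING", "TECHNICAL", "ACCOUNT", "OTHER"}

raw = ask(ticket, system=SYSTEM)
category = json.loads(raw).get("category", "OTHER")
if category not in ALLOWED_CATEGORIES:
    category = "OTHER"  # never trust an invented label; fall back instead
print(category)  # BILLING

In production you would also catch json.JSONDecodeError, since a model can occasionally wrap its answer in Markdown fences.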
For straightforward classification, the system prompt above is enough. For tasks with subtle judgment calls — multi-label classification, structured extraction, tone matching — examples matter more than instructions.
Try this richer task: extract the customer's intent and the urgency level.
SYSTEM = """You extract intent and urgency from customer support tickets.
Examples:
Input: "My credit card was charged twice. Please refund."
Output: {"intent": "refund_request", "urgency": "high"}
Input: "How do I change my profile picture?"
Output: {"intent": "how_to_question", "urgency": "low"}
Input: "Production is down. Customers can't log in. URGENT."
Output: {"intent": "outage_report", "urgency": "critical"}
Now respond with the same JSON format for the new ticket."""
print(ask("My subscription auto-renewed but I cancelled last week.", system=SYSTEM))
You should get something like:
{"intent": "billing_dispute", "urgency": "high"}
Three examples is the sweet spot for most tasks — one easy, one ambiguous, one edge case. More examples eat tokens (cost + latency); fewer leave the model guessing about format.
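One pattern that makes iterating on examples painless: keep them as data and assemble the system prompt from a list. A minimal sketch; build_system and the EXAMPLES structure are our invention, not a library API:

import json

EXAMPLES = [
    ("My credit card was charged twice. Please refund.",
     {"intent": "refund_request", "urgency": "high"}),
    ("How do I change my profile picture?",
     {"intent": "how_to_question", "urgency": "low"}),
    ("Production is down. Customers can't log in. URGENT.",
     {"intent": "outage_report", "urgency": "critical"}),
]

def build_system(task: str, examples: list) -> str:
    lines = [task, "", "Examples:"]
    for text, label in examples:
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {json.dumps(label)}")
    lines.append("")
    lines.append("Now respond with the same JSON format for the new ticket.")
    return "\n".join(lines)

SYSTEM = build_system("You extract intent and urgency from customer support tickets.", EXAMPLES)

Swapping an example in or out becomes a one-line change, which matters once you start measuring which examples actually help.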
Constraints lock down behavior the examples don't cover. Three patterns that earn their place in real prompts:
Refusal phrasing. Force the model to say "I don't know" in a specific way you can detect:
SYSTEM += '\n\nIf the ticket is unclear or out of scope, respond exactly: {"intent": "unknown", "urgency": "unknown"}'
Now you can branch on the output: if intent == "unknown": route_to_human(). A full sketch follows this list.
Allowed value lists. For classification, list the only categories you accept. The model will pick from those instead of inventing new ones.
Length caps. "Respond in 50 words or fewer" in the prompt, or max_tokens=100 in the API call. Either one caps cost and forces concision.
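Here is the full version of the branch promised above, combining the refusal constraint with a token cap. A minimal sketch, assuming the client and the extended SYSTEM from earlier; route_to_human is a hypothetical stand-in for your escalation path:

import json

def route_to_human(ticket: str) -> None:
    # Placeholder: a real system would enqueue the ticket for an agent.
    print(f"Escalating to a human: {ticket!r}")

ticket = "asdf qwerty ????"  # deliberately unintelligible
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": ticket},
    ],
    temperature=0,
    max_tokens=100,  # hard cap: bounds cost and latency
)
result = json.loads(resp.choices[0].message.content)
if result["intent"] == "unknown":
    route_to_human(ticket)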
Stuffing the prompt with rules. "MUST NOT include greeting. MUST cite sources. MUST NOT use jargon. MUST..." — long rule lists conflict with each other and confuse the model. If you have more than ~5 rules, you probably need few-shot examples instead.
Capitalizing for emphasis. MUST is no more effective than must. Modern models don't weight capitalization. Save the keystrokes.
Putting the most important instruction in the middle. LLMs attend to the start and end of the prompt more than the middle. Put critical constraints (output format, refusal phrasing) at both ends.
Testing only on easy cases. A prompt that handles the obvious queries can fall apart on adversarial input or rare formats. Keep a small eval set of 20–50 tricky inputs and re-run it on every prompt change.
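A minimal eval harness is just a loop. A sketch, assuming the ask() helper and the intent-extraction SYSTEM; the cases and expected labels here are illustrative, and yours should come from real tickets that broke earlier prompt versions:

import json

EVAL_SET = [
    # (ticket, expected intent)
    ("Charged twice, refund please!!!", "refund_request"),
    ("how 2 reset my pw???", "how_to_question"),
    ("Nothing works. Fix everything.", "unknown"),
]

passed = 0
for text, expected in EVAL_SET:
    got = json.loads(ask(text, system=SYSTEM)).get("intent")
    ok = got == expected
    passed += ok
    print(f"{'ok  ' if ok else 'FAIL'} {text!r}: expected {expected}, got {got}")

print(f"{passed}/{len(EVAL_SET)} passed")

Run it after every prompt change; a regression on a case you already fixed is exactly the failure mode this catches.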
You now have the moves; everything beyond this is refinement on your own tasks.
Prompt engineering isn't magic. It's clear specification: tell the model what task it's doing, what shape you want back, and what to do when it isn't sure. The four levers above cover ~90% of real prompts. The rest is iteration.