Agent Skills

How Domain Experts Can Build AI Agent Skills

Using a medical monitor review workflow as an example

AI tools are already very good at language, summarization, reasoning, and pattern recognition. But many experts still use them like generic chatbots: copy data into an AI tool, ask a question, adjust the prompt, and repeat.

That works for experimentation, but it does not scale well for recurring professional work. The real opportunity is not just asking AI questions. The real opportunity is turning expert workflows into reusable AI agent skills.

Instead of repeatedly telling the AI how to think, define the workflow once as a reusable skill.

What Is an AI Agent Skill?

An AI agent skill is a reusable instruction framework that tells an AI agent how to perform a specific type of work.

A skill is not just a prompt. It can include review logic, evaluation criteria, expected inputs, output structure, escalation rules, domain-specific reasoning patterns, validation steps, and retrieval of supporting knowledge.

In simple terms, a skill captures how an expert approaches recurring work.

Simple Formula

Stored Skill + User Data + Optional Retrieved Knowledge = Structured AI-Assisted Output

Why Generic Prompting Breaks Down

Many professionals start with a simple prompt such as:

Review this report and identify risks.

Sometimes the result is useful. Sometimes it is vague, inconsistent, or misses important findings. The user then adds more instructions: focus on operational risks, check compliance issues, summarize key findings, provide recommendations, use bullet points, and so on.

Over time, prompts become longer, harder to manage, inconsistent, difficult to reuse, and difficult to share across teams.

The problem is not the AI model itself. The problem is that the workflow exists only inside scattered prompts and human memory.

The Shift: From Prompting to Skills

Instead of writing the same instructions repeatedly, experts can define reusable skills.

The workflow changes from:

Human repeatedly explains the task
to
Reusable Skill + User Data

For example, instead of explaining the entire medical review process each time, a user can simply say:

Apply the Medical Monitor Review skill to this AE listing.

The user provides the data. The skill provides the expertise structure. The AI combines both.

Medical Monitor Review as an Example

A medical monitor often reviews adverse event listings, laboratory listings, vital signs, coding consistency, eligibility concerns, and protocol deviations.

The review process usually follows recurring patterns:

  • Identify serious adverse events
  • Identify Grade 3 or Grade 4 events
  • Review liver toxicity
  • Assess dose-response relationships
  • Identify stopping-criteria events
  • Evaluate clinically significant trends
  • Determine whether escalation is needed

This is an ideal candidate for an AI agent skill because the workflow is structured, the review criteria are repeatable, the outputs are relatively standardized, and human oversight remains essential.

Step 1 — Identify the Repeatable Workflow

The first step is not AI. The first step is understanding your own workflow.

Ask yourself:

  • What tasks do I perform repeatedly?
  • What patterns do I consistently look for?
  • What outputs do I repeatedly generate?
  • What judgment criteria do I apply?
  • What findings usually require escalation?

For medical monitor review, the workflow may be:

Input

AE listing, lab listing, vital signs listing, study context.

Review

Serious events, Grade 3/4 findings, ALT increases, safety signals, eligibility issues, coding consistency.

Output

Executive summary, key findings, trends, follow-up actions, human review recommendation.

Step 2 — Define the Required Input Data

Next, define what data the skill expects. A skill cannot reliably review data that is inconsistent, incomplete, or poorly structured.

AE Listing

  • Subject ID
  • Treatment arm or dose
  • AE term
  • CTCAE grade
  • Seriousness flag
  • Relationship to study drug
  • Action taken
  • Outcome

Lab Listing

  • Subject ID
  • Visit
  • Analyte
  • Result
  • Units
  • Reference range
  • CTCAE grade

Vital Signs Listing

  • Subject ID
  • Visit
  • Blood pressure
  • Heart rate
  • Temperature
  • Oxygen saturation
  • Weight
  • Baseline values
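Once the expected fields are written down, they can also be checked programmatically before the skill runs. The sketch below validates an AE listing export against the column list above; the snake_case column names are illustrative assumptions, not a standard export format:

```python
import csv
import io

# Required columns for the AE listing, mirroring the list above.
# Actual exports will use their own naming conventions.
REQUIRED_AE_COLUMNS = {
    "subject_id", "treatment_arm", "ae_term", "ctcae_grade",
    "serious", "relationship", "action_taken", "outcome",
}

def check_ae_listing(csv_text: str) -> set:
    """Return the set of required columns missing from an AE listing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return REQUIRED_AE_COLUMNS - present

sample = (
    "subject_id,treatment_arm,ae_term,ctcae_grade,"
    "serious,relationship,action_taken,outcome\n"
    "001,10mg,Nausea,2,N,Related,None,Recovered\n"
)
missing = check_ae_listing(sample)  # empty set when all columns are present
```

A check like this is worth running before every invocation: a skill applied to a listing with missing columns will silently produce weaker reviews.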

Step 3 — Define the Review Logic

This is where expertise becomes reusable. The goal is to teach the AI what matters, what patterns to look for, and how to prioritize findings.

For medical monitor review, focus areas may include:

  • Serious adverse events
  • Grade 3 or Grade 4 abnormalities
  • Liver toxicity
  • Dose-response trends
  • Stopping criteria
  • Skin reactions
  • Protocol deviations
  • Coding inconsistencies
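Some of this logic is judgment, but part of it can be expressed as explicit rules the AI (or a pre-processing step) applies to every record. A minimal sketch covering three of the focus areas above, with field names and string-matching heuristics as assumptions:

```python
def flag_ae(record: dict) -> list:
    """Return review flags for a single adverse-event record.

    Rules mirror three focus areas: seriousness, Grade 3/4 events,
    and liver toxicity. The term-matching heuristic is illustrative only.
    """
    flags = []
    if record.get("serious") == "Y":
        flags.append("serious adverse event")
    if int(record.get("ctcae_grade", 0)) >= 3:
        flags.append("Grade 3+ event")
    term = record.get("ae_term", "").upper()
    if "ALT" in term or "HEPAT" in term:
        flags.append("possible liver toxicity")
    return flags

flags = flag_ae({"serious": "N", "ctcae_grade": "3", "ae_term": "ALT increased"})
```

Writing rules this explicitly also forces the expert to articulate thresholds that usually live only in their head, which is the point of the exercise.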

Step 4 — Define the Output Structure

Experts often underestimate how important output structure is. A reusable skill should produce consistent outputs.

Recommended Output

  1. Executive Summary
  2. Key Safety Findings
  3. Trends and Patterns
  4. Potential Safety Signals
  5. Missing Information
  6. Recommended Follow-up Actions
  7. Human Review Recommendation

This makes reviews easier to read, compare, validate, and eventually automate.
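Because the structure is fixed, each AI output can be checked mechanically, which is part of what makes reviews easier to validate. A minimal sketch, assuming the seven section names above appear as headings in the generated report:

```python
# Required report sections, in order, from the list above.
REQUIRED_SECTIONS = [
    "Executive Summary", "Key Safety Findings", "Trends and Patterns",
    "Potential Safety Signals", "Missing Information",
    "Recommended Follow-up Actions", "Human Review Recommendation",
]

def missing_sections(report: str) -> list:
    """List required sections absent from an AI-generated review."""
    return [name for name in REQUIRED_SECTIONS if name not in report]

# A report containing only the first two sections fails the check.
sample_report = "\n".join("## " + name for name in REQUIRED_SECTIONS[:2])
gaps = missing_sections(sample_report)
```

A failed check can simply trigger a re-run or a human look, rather than letting an incomplete review slip through.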

Step 5 — Write the Skill File

The skill can be written as a simple Markdown file. It does not need to be complicated at first.

Purpose → Input Data → Focus Areas → Processing Instructions → Output Requirements → Escalation Rules

This file becomes reusable. You can improve it over time instead of rewriting prompts repeatedly.
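A minimal version of such a file might look like this. The contents are abbreviated and illustrative, not a complete clinical review specification:

```markdown
# Skill: Medical Monitor Review

## Purpose
Review safety listings and flag findings requiring medical monitor attention.

## Input Data
AE listing, lab listing, vital signs listing, study context.

## Focus Areas
Serious AEs, Grade 3/4 events, liver toxicity, dose-response trends,
stopping-criteria events, coding inconsistencies.

## Processing Instructions
Review every subject. Cross-check AE terms against lab values.
Do not speculate beyond the provided data.

## Output Requirements
Executive summary, key safety findings, trends and patterns,
potential safety signals, missing information, recommended
follow-up actions, human review recommendation.

## Escalation Rules
Flag any potential stopping-criteria event for immediate human review.
```

Even a skeleton like this is enough to start testing; the sections fill out naturally as real reviews expose gaps.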

Step 6 — Test with Sample Data

Next, test the skill using synthetic data, sample listings, or de-identified datasets.

The process is simple:

Skill File + Data Listing + Simple Invocation

Example:

Apply the Medical Monitor Review skill to this dataset.

Then review what the AI missed, what it overstated, whether the output was clinically useful, and whether the structure helped the review.

Where Can You Test and Run a Skill?

A skill does not need a complex platform on day one. You can start testing it anywhere you can provide three things: the skill instructions, the input data, and a clear request to apply the skill.

1. Directly in an LLM Chat

The simplest test is to paste the skill file into ChatGPT, Claude, or another LLM, then provide a sample dataset and ask the model to apply the skill. This is useful for early design and quick iteration.

2. In Claude Cowork

Claude Cowork is Claude's agentic workspace inside Claude Desktop. It brings Claude Code-style agent capabilities to knowledge work beyond coding, so Claude can work across files, instructions, and multi-step tasks without requiring a terminal.

This makes Cowork a natural place to create and test skills. A domain expert can define a skill, attach sample files or data, run the workflow, inspect the output, and refine the instructions until the skill behaves more consistently.

3. Inside an Agent Platform

Eventually, the skill can run inside an agent system that manages users, files, retrieval, permissions, routing, history, and repeated execution. This is where a reusable skill becomes part of a real workflow rather than a one-off prompt.

This is also where i80agent is heading: a place where domain knowledge, retrieval, and reusable expert skills can be tested together and eventually used in real workflows.

Step 7 — Refine Through Real Usage

The first version will not be perfect. That is expected.

Over time, new rules get added, edge cases get captured, escalation logic improves, outputs become more useful, and the workflow becomes more standardized.

This is how expertise gradually becomes operationalized.

Skills Do Not Replace Experts

This point is important: the goal is not to replace experts.

The goal is to reduce repetitive work, improve consistency, accelerate review, identify patterns earlier, structure workflows, and support decision-making.

The expert still reviews findings, validates conclusions, handles ambiguity, and makes final decisions.

This Applies Far Beyond Medical Review

The same process works in many domains.

Legal Review Skill

Identify contract risks, review clauses, flag missing protections, and summarize negotiation concerns.

HR Candidate Evaluation Skill

Summarize interview notes, compare competencies, identify gaps, and generate structured recommendations.

Financial Review Skill

Identify unusual variances, summarize trends, assess operational impact, and generate follow-up questions.

Operations Incident Review Skill

Summarize events, identify root causes, assess severity, and recommend mitigation steps.

Retrieval + Skills

One important lesson from building domain-specific AI systems is that skills alone are not enough. A skill often still depends on trusted knowledge.

For example, a medical monitor review skill may need protocol rules, study context, stopping criteria, coding standards, eligibility criteria, and historical decisions.

This is where retrieval becomes important.

Retrieve the right knowledge. Apply the right skill. Generate the right output.
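That sequence can be sketched end to end. The toy example below uses simple keyword lookup in place of a real retrieval system, and every name in it is illustrative:

```python
# Stand-in knowledge base; a real system would use document retrieval.
KNOWLEDGE = {
    "stopping criteria": "Pause dosing after any Grade 4 related AE.",
    "eligibility": "Subjects must meet baseline laboratory criteria.",
}

def retrieve(query: str) -> list:
    """Return knowledge snippets whose topic appears in the query."""
    q = query.lower()
    return [text for topic, text in KNOWLEDGE.items() if topic in q]

def build_prompt(skill: str, data: str, query: str) -> str:
    """Retrieve the right knowledge, apply the right skill, frame the output."""
    sections = [skill, "Input data:\n" + data]
    snippets = retrieve(query)
    if snippets:
        sections.append("Relevant knowledge:\n" + "\n".join(snippets))
    return "\n\n".join(sections)

prompt = build_prompt(
    "Apply the Medical Monitor Review skill.",
    "Subject 001: Grade 4 ALT increase",
    "Check stopping criteria for this event",
)
```

The shape matters more than the mechanics: the skill stays stable while the retrieved knowledge changes per study, per protocol, and per question.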

Final Thoughts

Many professionals already have highly structured expertise. They just do not always think of it as something that can be formalized.

But once recurring workflows become explicit, structured, testable, and reusable, they can evolve into AI agent skills.

This is one of the most practical paths toward domain-specific AI: not replacing experts, but helping experts scale their expertise more consistently and efficiently.

I believe people in many different domains can do this. Medical reviewers, legal teams, HR leaders, finance teams, operations managers, educators, consultants, and creative professionals all have recurring workflows that depend on judgment and experience. If those workflows can be described clearly, they can probably become skills that improve consistency, reduce repetitive work, and make expert work more efficient.

The future of professional AI may not be one giant chatbot. It may be many reusable expert skills, each designed around a real workflow.