Show Notes
Microsoft is integrating AI model experimentation directly into GitHub, a big lever for enterprise AI. Here's what you need to know and how to use it to move from endless prompt tinkering to real, testable results.
What GitHub Models is
- A workspace inside a private GitHub repo that lets you find and experiment with AI models for free.
- You get a versioned, playground-like environment (similar to Vercel's playground) embedded into your GitHub workflows.
- Aims to lower barriers to enterprise-grade AI adoption by tying AI development to familiar GitHub processes.
How it works (quick setup outline)
- In your repo, you'll use a prompt.yaml file to define prompts, models, and parameters (sketched at the end of this section).
- When the repo is private, you get a secure, sandboxed environment for model experimentation.
- A new Models tab in the GitHub UI exposes:
  - Prompt name, description, and the chosen model
  - Parameters and the message stack (system/user messages)
  - A place to store and iterate on prompts directly alongside your code
  - The ability to provide test messages or test inputs
Sample workflow:
- Create and edit prompts with system instructions, user prompts, and test inputs.
- Save prompts, model choices, and parameter settings to prompt.yaml.
- Run side-by-side evaluations of multiple models using identical prompts.
The code blocks and UI support back-and-forth prompt iteration, including a message stack for realistic multi-turn dialog.
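To make that concrete, here is a minimal sketch of a prompt.yaml. The field names follow GitHub's published prompt-file format, but the model identifier and prompt content are placeholders; verify against the current schema before relying on it:

```yaml
# Minimal illustrative prompt.yaml; field names follow GitHub's
# documented prompt-file format, but verify the current schema.
name: Ticket Summarizer
description: Summarizes a support ticket into one sentence
model: openai/gpt-4o-mini        # placeholder model identifier
modelParameters:
  temperature: 0.3               # lower temperature for consistent summaries
messages:
  - role: system
    content: You are a concise assistant. Summarize the user's text in one sentence.
  - role: user
    content: "{{input}}"         # template variable filled from test inputs
```

Because this file lives in the repo, every change to the prompt, model choice, or parameters is an ordinary commit.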
Core features you’ll actually use
- Prompt storage under version control
- Model selection and parameter tuning in one place
- Structured prompts with system instructions, test inputs, and variables
- Model comparison: run multiple models side by side with the same prompts and inputs
- Evaluators: scoring metrics like similarity, relevance, and groundedness to analyze outputs
- Prompt, model, and parameter settings saved together in a single file (prompt.yaml)
- Private repo safety: work remains in your org, with governance around model access
Evaluators and metrics
- Built-in evaluators let you score outputs to guide cost and quality:
  - Similarity: how closely output matches expected ideas
  - Relevance: alignment with the task
  - Groundedness: factual alignment with inputs
- Use these scores to:
  - Decide which models and prompts to adopt
  - Optimize prompts for the best return
  - Control costs by preferring cheaper models when quality is acceptable
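As a sketch, evaluators can be declared in the same prompt.yaml. The github/similarity identifier below follows GitHub's documented built-in evaluators, but treat the exact names as assumptions to verify:

```yaml
# Illustrative evaluators block; confirm built-in evaluator names
# against the current GitHub Models documentation before use.
evaluators:
  - name: similarity             # LLM-scored closeness to the expected answer
    uses: github/similarity
  - name: mentions-refund        # cheap deterministic check on the raw output
    string:
      contains: refund
```

Mixing a cheap string check with an LLM-scored metric like this lets you catch obvious failures before spending on model-graded evaluation.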
Collaboration, governance, and admin controls
- Organization admins can allow all models or create an allow/deny list
- Makes cross-team prompt sharing practical while keeping governance tight
- Great for coordinating prompts across large teams and ensuring consistency
Tooling and the bigger picture
- GitHub Models is positioned to work with tool calling in Copilot:
  - API/tool calls in the generative stack (coming via Microsoft's tooling)
  - The possibility of defining personal tools or company-specific toolsets within Copilot
  - Plans around open-source Copilot concepts that could enable custom tool integration
- Practical implication: you could ship your own tools that Copilot can call, all managed inside your private repo
Note: While some of these tooling capabilities are still evolving, the direction is clearly toward embedded tool calls, extensible prompts, and self-contained tool ecosystems inside GitHub.
UI/Workflow highlights
- The models workspace shows a side-by-side comparison UI for models and prompts
- You can commit prompt changes like you would code changes, keeping prompts traceable
- The models page exposes code snippets you can drop into projects, enabling quick adoption
- Test data can be embedded in prompt.yaml so samples exist alongside your prompts
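For example, test rows can live in the same prompt.yaml next to the prompt itself (illustrative; each input feeds the {{input}} template variable shown earlier, and expected is what evaluators compare against):

```yaml
# Illustrative testData block embedded in prompt.yaml; each row is
# one sample run, with "expected" used by the evaluators.
testData:
  - input: "Order #123 arrived damaged and the customer wants a refund."
    expected: "Customer reports a damaged order and requests a refund."
  - input: "User cannot log in after the latest password reset."
    expected: "User is locked out following a password reset."
```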
Real-world takeaways
- If you’re building prompts for products, GitHub Models lets you iterate faster with real evaluation metrics (no more guesswork)
- You can compare multiple models with identical prompts to see which one meets your needs at a given cost
- The ability to save prompts, models, and settings in a single YAML file makes it easier to version and share within teams
- Admin controls help you manage risk and keep teams aligned on which models are allowed
Practical tips and cautions
- Start simple (KISS): use a minimal prompt.yaml to get a feel for iteration before scaling
- Leverage evaluators early to quantify gains and justify costs
- Weigh privacy carefully: even in private repos, consider what data is used for training or feedback
- Expect the tooling to evolve quickly — stay flexible and keep prompts modular
Getting started (actionable steps)
- Create a private repo or use an existing one and enable the GitHub Models workspace
- Add a prompt.yaml with a basic prompt, a chosen model, and simple test inputs
- Use the Models tab to create side-by-side experiments (e.g., GPT-4 vs. another model)
- Enable evaluators (similarity, relevance, groundedness) and compare results
- Save promising combinations and incrementally add more prompts, tests, and parameters
- Explore tool-call concepts and watch for upcoming Copilot capabilities that let you add personal tools
Takeaways
- GitHub Models turns prompt experimentation into a first-class, version-controlled workflow
- You can compare models, use evaluators, and iterate quickly within GitHub
- Governance and private repos help teams adopt AI safely at scale
- This approach aligns with the broader trend toward embedded tool calls and customizable copilots