Parker Rex · June 3, 2025

Microsoft KILLED Every Prompt Tool Company With This ONE GitHub Feature

GitHub Models unlocks enterprise AI: store .prompt.yaml files in private repos, use a Vercel-like playground, and build GitHub-integrated AI workflows.

Show Notes

Microsoft is integrating AI model experimentation directly into GitHub, pulling a big lever for enterprise AI. Here’s what you need to know and how to use it to move from endless prompt-tweaking to real, testable results.

What GitHub Models is

  • A workspace inside a private GitHub repo that lets you find and experiment with AI models for free.
  • You get a versioned, playground-like environment (similar to Vercel's playground) embedded into your GitHub workflows.
  • Aims to lower barriers to enterprise-grade AI adoption by tying AI development to familiar GitHub processes.

How it works (quick setup outline)

  • In your repo, you’ll use a .prompt.yaml file to define prompts, models, and parameters.
  • When the repo is private, you get a secure, sandboxed environment for model experimentation.
  • A new Models tab in the GitHub UI exposes:
    • Prompt name, description, and the chosen model
    • Parameters and message stacks (system/user messages)
    • A place to store and iterate on prompts directly alongside your code
    • The ability to provide test messages or test inputs

Sample workflow:

  • Create and edit prompts with system instructions, user prompts, and test inputs.
  • Save prompts, model choices, and parameter settings to .prompt.yaml.
  • Run side-by-side evaluations of multiple models using identical prompts.
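That workflow boils down to a small YAML file. Here is a minimal sketch of what one might look like — the field names follow GitHub’s published prompt-file examples, but verify them against the current docs, and the model id and {{issue_body}} variable are purely illustrative:

```yaml
# prompts/summarize.prompt.yaml — minimal example (field names may drift; check current docs)
name: Summarize issue
description: Turn a GitHub issue into a one-line summary
model: openai/gpt-4o-mini        # illustrative model id
modelParameters:
  temperature: 0.3
messages:
  - role: system
    content: You are a terse release-notes assistant.
  - role: user
    content: "Summarize this issue: {{issue_body}}"
testData:
  - issue_body: "Login button is unresponsive on mobile Safari"
```

Because this lives in the repo, every tweak to the prompt, model, or parameters is an ordinary commit you can diff and review.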

Code blocks and UI are designed to support back-and-forth prompt iteration, including a message stack for realistic dialog.

Core features you’ll actually use

  • Prompt storage under version control
  • Model selection and parameter tuning in one place
  • Structured prompts with system instructions, test inputs, and variables
  • Model comparison: run multiple models side by side with the same prompts and inputs
  • Evaluators: scoring metrics like similarity, relevance, and groundedness to analyze outputs
  • Prompts, models, and parameters saved together in a single file (.prompt.yaml)
  • Private repo safety: work remains in your org, with governance around model access
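The “variables” feature above amounts to template substitution over the message stack. As a rough illustration of the idea (this is a hand-rolled sketch, not GitHub’s implementation — the {{name}} placeholder syntax is borrowed from its prompt-file examples):

```python
import re

def render(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders with values from `variables`."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), template)

# A message stack like the one stored in a prompt file
messages = [
    {"role": "system", "content": "You are a concise release-notes writer."},
    {"role": "user", "content": "Summarize this diff:\n{{diff}}"},
]

# Fill the variables before sending the stack to a model
rendered = [
    {**m, "content": render(m["content"], {"diff": "fix: typo in README"})}
    for m in messages
]
print(rendered[1]["content"])  # Summarize this diff:\nfix: typo in README
```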

Evaluators and metrics

  • Built-in evaluators let you score outputs to guide cost and quality:
    • Similarity: how closely output matches expected ideas
    • Relevance: alignment with the task
    • Groundedness: factual alignment with inputs
  • Use these scores to:
    • Filter which model/prompts to adopt
    • Optimize prompts for the best return
    • Control costs by preferring cheaper models when quality is acceptable
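To make those metrics concrete, here is a toy sketch of the kind of thing a similarity or groundedness evaluator computes. These are crude word-overlap proxies for illustration only — GitHub’s built-in evaluators are more sophisticated:

```python
def similarity(expected: str, actual: str) -> float:
    """Jaccard word overlap — a toy stand-in for a similarity evaluator."""
    a, b = set(expected.lower().split()), set(actual.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def groundedness(output: str, source: str) -> float:
    """Fraction of output words that appear in the source — a crude groundedness proxy."""
    out, src = set(output.lower().split()), set(source.lower().split())
    return len(out & src) / len(out) if out else 1.0

print(similarity("the cat sat", "the cat ran"))  # 0.5
```

Even toy scores like these make the cost trade-off tangible: if a cheaper model scores nearly as high on your test inputs, that’s a quantified reason to switch.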

Collaboration, governance, and admin controls

  • Organization admins can allow all models or create an allow/deny list
  • Makes cross-team prompt sharing practical while keeping governance tight
  • Great for coordinating prompts across large teams and ensuring consistency

Tooling and the bigger picture

  • GitHub Models is positioned to work with tool calling in Copilot:
    • API/tool calls in the generative stack (coming via Microsoft’s tooling)
    • The possibility to define personal tools or company-specific toolsets within Copilot
    • Plans around open-sourcing Copilot that could enable custom tool integration
  • Practical implication: you could ship your own tools that Copilot can call, all managed inside your private repo

Note: While some of these tooling capabilities are still evolving, the direction is clearly toward embedded tool calls, extensible prompts, and self-contained tool ecosystems inside GitHub.
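To ground the idea of “shipping your own tools,” here is a sketch in the OpenAI-style function-calling shape, which GitHub’s inference snippets generally follow. The get_deploy_status tool and the dispatch logic are entirely hypothetical — they illustrate the pattern, not an actual GitHub Models API:

```python
import json

# Hypothetical tool a team might expose to a model (illustrative names only)
get_deploy_status_tool = {
    "type": "function",
    "function": {
        "name": "get_deploy_status",
        "description": "Look up the latest deployment status for a service.",
        "parameters": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to local code (stubbed here)."""
    if tool_call["name"] == "get_deploy_status":
        args = json.loads(tool_call["arguments"])
        return f"{args['service']}: deployed"  # stand-in for a real lookup
    raise ValueError(f"unknown tool {tool_call['name']}")

print(dispatch({"name": "get_deploy_status", "arguments": '{"service": "api"}'}))
```

The appeal of keeping this in a private repo is that the tool schema, the dispatch code, and the prompts that reference them all version together.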

UI/Workflow highlights

  • The Models workspace shows a side-by-side comparison UI for models and prompts
  • You can commit prompt changes like you would code changes, keeping prompts traceable
  • The Models page exposes code snippets you can drop into projects, enabling quick adoption
  • Test data can be embedded in .prompt.yaml so samples live alongside your prompts
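The snippets the Models page hands you are OpenAI-compatible chat-completions calls. A minimal sketch of that shape is below — the endpoint URL and model id are assumptions, so copy the exact snippet from the Models page rather than this one:

```python
import json
import os
import urllib.request  # os/urllib only needed for the real call, commented out below

# Assumed endpoint — the Models page shows the current one
ENDPOINT = "https://models.github.ai/inference/chat/completions"

def build_request(model: str, messages: list, **params) -> dict:
    """OpenAI-style chat-completions payload."""
    return {"model": model, "messages": messages, **params}

payload = build_request(
    "openai/gpt-4o-mini",  # illustrative model id
    [{"role": "user", "content": "Say hello."}],
    temperature=0.2,
)

# Uncomment to make the real call (requires a GITHUB_TOKEN with models access):
# req = urllib.request.Request(
#     ENDPOINT,
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(payload["model"])
```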

Real-world takeaways

  • If you’re building prompts for products, GitHub Models lets you iterate faster with real evaluation metrics (no more guesswork)
  • You can compare multiple models with identical prompts to see which one meets your needs at a given cost
  • The ability to save prompts, models, and settings in a single YAML file makes it easier to version and share within teams
  • Admin controls help you manage risk and keep teams aligned on which models are allowed

Practical tips and cautions

  • Start simple (KISS): use a minimal .prompt.yaml to get a feel for iteration before scaling
  • Leverage evaluators early to quantify gains and justify costs
  • Weigh privacy carefully: even in private repos, consider what data is used for training or feedback
  • Expect the tooling to evolve quickly — stay flexible and keep prompts modular

Getting started (actionable steps)

  1. Create a private repo or use an existing one and enable the GitHub Models workspace
  2. Add a .prompt.yaml with a basic prompt, a chosen model, and simple test inputs
  3. Use the Models tab to create side-by-side experiments (e.g., GPT-4 vs. another model)
  4. Enable evaluators (similarity, relevance, groundedness) and compare results
  5. Save promising combinations and incrementally add more prompts, tests, and parameters
  6. Explore tool-calling concepts and watch for upcoming Copilot capabilities that let you add personal tools

Takeaways

  • GitHub Models turns prompt experimentation into a first-class, version-controlled workflow
  • You can compare models, use evaluators, and iterate quickly within GitHub
  • Governance and private repos help teams adopt AI safely at scale
  • This approach aligns with the broader trend toward embedded tool calls and customizable copilots