Parker Rex · November 5, 2024

Claude Releases New Model (Haiku 3.5) | Code Testing & Impressions

Claude Haiku 3.5: code testing, first impressions, pricing, and benchmarks vs. Gemini and GPT, plus what it means for coders and agents.

Show Notes

Claude Haiku 3.5 is put to the test in Parker's hands-on look at coding and UI prompts, with a practical lens on cost, performance, and real-world workflows. Here are the key takeaways and findings.

Benchmark landscape at a glance

  • Claude Haiku 3.5 performance
    • On the Aider code editing leaderboard: around 4th place at roughly 75% on coding tasks, trailing the top models.
    • Pricing vs. capability: positioned as the cheaper option, but user feedback flags the price as high for a model that isn't the outright leader.
  • Reasoning benchmarks
    • Demonstrates solid chain-of-thought reasoning, but still not on par with the flagship, top-tier models.
    • Compared to Gemini/other premium models, it’s competitive in some aspects but notably costlier.
  • Practical takeaway
    • If you’re optimizing for price-to-performance in coding tasks, Haiku 3.5 is a mixed bag: better than many but expensive for what you get relative to the best-in-class.

Hands-on testing: prompts and results

  • Test setup (summary)
    • Two primary prompts: a pixel-perfect clone prompt (copying an existing component) and a small one-shot web app UI task, pitting Claude 3.5 Sonnet against Claude 3.5 Haiku.
    • The same prompts were run against both models to compare output quality and speed.
  • Test 1: Pixel-perfect clone (component copy)
    • Sonnet: produced more usable, structured results, with clearer code blocks and narrative guidance.
    • Haiku: struggled to get started or render clean output; results were less consistent and harder to translate into workable code.
    • Takeaway: Sonnet tends to be stronger for direct coding tasks that require clean structure and stepwise delivery.
  • Test 2: One-shot web app with an ingredient history UI
    • The prompt asked for a UI that shows how an ingredient's details change over time (a rough sketch of that kind of component follows this list).
    • Sonnet: produced a more complete artifact (code plus architecture notes), while acknowledging gaps that still needed refinement.
    • Haiku: often failed to render a coherent TSX flow or skipped key pieces, making it less reliable for this kind of task.
    • Takeaway: for frontend app generation, Sonnet more consistently delivers usable scaffolds; Haiku lags behind in this test.
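
To make the target of Test 2 concrete, here is a minimal TSX sketch of the kind of ingredient-history component the one-shot prompt asks for. The types, prop names, and Tailwind classes are illustrative assumptions, not output produced by either model in the video.

```tsx
// Hypothetical sketch of the component Test 2 asks for: an ingredient whose
// change history (price, quantity, supplier) is rendered as a simple timeline.
// All names and styles here are illustrative, not the video's actual output.
import React from "react";

type IngredientChange = {
  date: string; // ISO date of the change, e.g. "2024-10-01"
  field: "price" | "quantity" | "supplier";
  previous: string;
  current: string;
};

type IngredientHistoryProps = {
  name: string;
  changes: IngredientChange[];
};

export function IngredientHistory({ name, changes }: IngredientHistoryProps) {
  // Show the newest change first so the latest edit is visible at the top.
  const sorted = [...changes].sort((a, b) => b.date.localeCompare(a.date));

  return (
    <section className="rounded-lg border p-4">
      <h2 className="text-lg font-semibold">{name}: change history</h2>
      <ul className="mt-2 space-y-2">
        {sorted.map((change, i) => (
          <li key={i} className="text-sm">
            <span className="font-mono">{change.date}</span>{" "}
            {change.field}: {change.previous} → {change.current}
          </li>
        ))}
      </ul>
    </section>
  );
}

// Usage example:
// <IngredientHistory
//   name="Olive oil"
//   changes={[{ date: "2024-10-01", field: "price", previous: "$8.99", current: "$9.49" }]}
// />
```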

Prompt engineering and architect-style workflow

  • Architect-focused workflow
    • The idea, borrowed from Aider-style architect practice: start with planning, then define data models, APIs, and the TypeScript/React implementation steps.
    • Example high-level approach (condensed prompt pattern):
      • Act as the software architect
      • Define data models and API endpoints
      • Output a structured plan in Markdown with code blocks and explanations
  • Model behavior differences
    • Sonnet tends to produce clean Markdown with distinct sections, code blocks, and step-by-step guidance.
    • Haiku tends to mix formats and can require more manual post-processing to extract usable code.
  • Practical note
    • If the goal is documentation-to-code (or the reverse), Sonnet's output tends to align better with developer workflows and tooling; a minimal API sketch of this pattern follows this list.
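
To make the pattern concrete, here is a minimal sketch of sending an architect-style system prompt through the official @anthropic-ai/sdk package. The prompt wording is condensed from the bullets above, and the model ID and output handling are assumptions, not the exact setup used in the video.

```ts
// Minimal sketch: architect-first prompt via the Anthropic SDK.
// Assumptions: @anthropic-ai/sdk is installed, ANTHROPIC_API_KEY is set,
// and the model ID below is still current; the prompt text is illustrative.
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const ARCHITECT_PROMPT = [
  "Act as the software architect.",
  "Define data models and API endpoints for the feature described by the user.",
  "Output a structured plan in Markdown with code blocks and short explanations.",
  "Do not write the full implementation yet; plan first.",
].join("\n");

async function planFeature(featureDescription: string): Promise<string> {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022", // swap in the Haiku ID to compare behavior
    max_tokens: 2000,
    system: ARCHITECT_PROMPT,
    messages: [{ role: "user", content: featureDescription }],
  });

  // Join the text blocks of the response into a single Markdown plan.
  return response.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");
}

// Example: ask for a plan for the ingredient-history UI from the tests above.
planFeature("A TypeScript/React app that tracks ingredient history changes over time.")
  .then((plan) => console.log(plan))
  .catch((err) => console.error(err));
```

Swapping the model ID to the Haiku variant and diffing the two Markdown plans is a quick way to see the formatting differences described above.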

Practical workflows and takeaways

  • Low-code evaluation workflow
    • Make.com (formerly Integromat) is a practical way to automate model testing without writing code.
    • Build a workflow that sends repeated prompts, collects the outputs, and stores the results in a spreadsheet for quick comparison.
  • Documentation-ready AI
    • For turning a website or doc into well-formatted Markdown with snippets, you can feed a URL and have the AI generate structured docs, checklists, and examples.
  • Quick-start prompts for coding
    • Use an architect-style prompt to define scope, then drill down into TypeScript/React/Tailwind specifics.
    • Example (minimal prompt snippet):
      • Act as the software architect. Define data models, API endpoints, and a TypeScript/React plan. Output in Markdown with code blocks for key parts.
  • What to test with your team
    • Run side-by-side comparisons on your own coding tasks (component scaffolds, API schemas, UI flows) to see which model handles your typical patterns best; a scripted version of this harness is sketched after this list.
    • Track both quality and speed to judge whether the cost lines up with your productivity gains.
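
For teams comfortable writing a little code instead of (or alongside) a Make.com scenario, here is a minimal sketch of such a side-by-side harness. The model IDs, prompts, and the results.csv path are assumptions for illustration; it is not the exact workflow from the video.

```ts
// Minimal sketch: run the same prompts against Sonnet and Haiku,
// time each call, and append the results to a CSV for side-by-side review.
// Assumptions: @anthropic-ai/sdk installed, ANTHROPIC_API_KEY set,
// model IDs current, and "results.csv" as a hypothetical output file.
import Anthropic from "@anthropic-ai/sdk";
import { appendFileSync } from "node:fs";

const anthropic = new Anthropic();

const MODELS = ["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"];
const PROMPTS = [
  "Scaffold a TypeScript/React component that lists ingredient history changes.",
  "Define a REST API schema for tracking ingredient price changes over time.",
];

async function runOnce(model: string, prompt: string) {
  const start = Date.now();
  const response = await anthropic.messages.create({
    model,
    max_tokens: 1500,
    messages: [{ role: "user", content: prompt }],
  });
  const seconds = (Date.now() - start) / 1000;
  const text = response.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");

  // One CSV row per run: model, latency, output length, and the quoted prompt.
  appendFileSync(
    "results.csv",
    `${model},${seconds.toFixed(1)},${text.length},"${prompt}"\n`
  );
}

async function main() {
  appendFileSync("results.csv", "model,seconds,output_chars,prompt\n");
  for (const prompt of PROMPTS) {
    for (const model of MODELS) {
      await runOnce(model, prompt); // sequential to keep timings comparable
    }
  }
}

main().catch((err) => console.error(err));
```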

Final takeaways

  • Haiku 3.5 is not the top dog on coding benchmarks, but it’s a credible option with strong consistency in some tasks.
  • Sonnet generally outperforms Haiku on coding-focused prompts and architecture-style workflows, often at a better price-to-performance point.
  • If you’re building workflows that rely on repeated AI tasks, consider low-code automation (Make.com) to test and compare models efficiently.
  • Use architecture-first prompts to drive clearer outputs (data models, APIs, and structured steps) before diving into code generation.

Actionable next steps

  • Do a two-model test for your key coding tasks and measure time-to-useful-output versus cost.
  • Set up a Make.com workflow to automate iterative prompts and export results for quick side-by-side analysis.
  • Experiment with an architecture prompt to see how each model formats its plan; use Sonnet's Markdown/code structure to scaffold your project.