Show Notes
Intro
In this deep dive, Parker walks through building a free, open-source system that downloads knowledge from top minds on X and lets you query it with AI. He uses Augment remote agents and a TypeScript CLI to create a living, searchable knowledge base in under two hours.
What I built and why
- A personal knowledge base from smart X authors you care about (tweets, replies, etc.), turned into searchable embeddings.
- Open source, free to use, with an emphasis on safe scraping and practical tooling (SQLite + Drizzle, CLI, etc.).
- Key idea: X is a learning gold mine, but APIs are restrictive. This approach lets you selectively learn from specific people without wading through blogs or endless scrolling.
Actionable takeaway:
- If you want a targeted knowledge base, start with a spike to validate data sources, then automate the extract-to-embeddings pipeline.
How it works at a glance
- Data source: Public tweets and replies from chosen X users.
- Ingest pipeline:
- Interactive CLI to configure what to scrape (user, content type, scope, time range, count).
- Scrape with a rate-limiting profile to balance speed and account safety (see the rate-limiter sketch below).
- Generate embeddings for semantic search (see the embedding sketch after this list).
- Store in SQLite with Drizzle ORM for easy querying and optional dashboards.
- Capabilities:
- Smart tweet scraping with advanced filtering.
- Semantic Q&A over the collected content.
- Optional: keep the knowledge base current with daily cron-like jobs.
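The embed step is where the pipeline calls out to a model. The video doesn't name the embedding provider, so this minimal sketch assumes OpenAI's embeddings endpoint and the text-embedding-3-small model; the helper name and Tweet shape are hypothetical.

```ts
// Minimal embed step. Provider, model, and the Tweet shape are assumptions;
// swap in whatever the actual pipeline uses.
type Tweet = { id: string; text: string };

async function embedTweets(tweets: Tweet[]): Promise<Map<string, number[]>> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "text-embedding-3-small",
      input: tweets.map((t) => t.text),
    }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const { data } = (await res.json()) as { data: { embedding: number[] }[] };
  // The API returns embeddings in input order, so zip them back to tweet IDs.
  return new Map(tweets.map((t, i) => [t.id, data[i].embedding]));
}
```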
Actionable takeaway:
- Use a rate-limited scraping profile first; you can always increase tempo later, but safety comes first to protect accounts.
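A minimal sketch of what a rate-limited profile can look like: a fixed delay plus random jitter between requests, using the session cookies described in the stack section below. The profile names and delay values here are illustrative assumptions, not the repo's actual numbers.

```ts
// Illustrative rate-limiting profiles; the delays are assumptions, not the
// values the real tool ships with.
const PROFILES = {
  conservative: { minDelayMs: 5_000, jitterMs: 3_000 },
  moderate: { minDelayMs: 2_000, jitterMs: 1_500 },
  aggressive: { minDelayMs: 500, jitterMs: 500 },
} as const;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function rateLimitedFetch(
  url: string,
  profile: keyof typeof PROFILES = "moderate",
): Promise<Response> {
  const { minDelayMs, jitterMs } = PROFILES[profile];
  // Wait a base delay plus jitter so requests don't land on a fixed cadence.
  await sleep(minDelayMs + Math.random() * jitterMs);
  return fetch(url, {
    headers: {
      // X session cookies supplied by the user (see the stack section below).
      Cookie: `auth_token=${process.env.AUTH_TOKEN}; ct0=${process.env.CT0}`,
    },
  });
}
```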
The technical stack and workflow
- Language and runtime: TypeScript (no Python) using Bun.
- Orchestration: Augment remote agents (auto mode) for task execution and research spikes.
- Storage and querying: SQLite + Drizzle ORM for fast, queryable access (see the schema sketch after this list).
- Authentication: works by passing in X session cookies (auth_token and ct0) to access the data sources.
- Documentation hosting plan: move docs to a centralized Mintlify site for consistency.
Actionable takeaway:
- Design the workflow around small, testable tasks (spikes) and keep the data model simple (embeddings + lightweight DB) so you can iterate quickly.
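To make the data model concrete, here is a hypothetical Drizzle schema on top of Bun's built-in SQLite driver. Table and column names are assumptions for illustration; the real XGPT schema may differ.

```ts
import { Database } from "bun:sqlite";
import { drizzle } from "drizzle-orm/bun-sqlite";
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

// Hypothetical table; the actual XGPT schema isn't shown in the video.
export const tweets = sqliteTable("tweets", {
  id: text("id").primaryKey(),
  author: text("author").notNull(),
  content: text("content").notNull(),
  // Unix timestamp; this is what lets cron runs fetch only newer content.
  createdAt: integer("created_at").notNull(),
  // Embedding vector serialized as a JSON string for simplicity.
  embedding: text("embedding"),
});

const sqlite = new Database("xgpt.db");
export const db = drizzle(sqlite);
```

Keeping the vector as a serialized column is the simplest query-friendly option; a dedicated vector extension can come later if search volume demands it.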
Demo walkthrough: CLI-driven scraping in action
- Run the CLI in interactive mode by launching the entry point with `bun run` (a minimal sketch of this flow follows the list below)
- Steps you’ll configure:
- Enter the X handle (username)
- Content type: tweets, replies, or both
- Scope: all posts vs. keyword-filtered posts
- Time range: e.g., last month
- Number of tweets to scrape (e.g., 100)
- Rate limiting profile: moderate (to avoid account issues)
- Generate embeddings after scraping
- Output and storage:
- Embeddings stored in SQLite via Drizzle
- Timestamps enable future cron-based updates (see the incremental-update sketch at the end of this section)
- Prompt tuning and usage:
- Use compact prompts (as few sentences as possible) and rely on just-in-time context
- Start with prompts from Anthropic and similar prompt resources to guide improvements
- Optional: switch between interactive mode and scripted runs to fit your workflow
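A minimal sketch of the interactive flow above, using the @inquirer/prompts package. The prompt library, wording, and option values are assumptions; the real CLI may be structured differently.

```ts
import { input, select } from "@inquirer/prompts";

// Walk through the same questions as the video's interactive mode.
const handle = await input({ message: "X handle to scrape:" });
const contentType = await select({
  message: "Content type:",
  choices: [
    { name: "Tweets", value: "tweets" },
    { name: "Replies", value: "replies" },
    { name: "Both", value: "both" },
  ],
});
const scope = await select({
  message: "Scope:",
  choices: [
    { name: "All posts", value: "all" },
    { name: "Keyword-filtered posts", value: "keywords" },
  ],
});
const count = Number(await input({ message: "Number of tweets (e.g., 100):" }));
const profile = await select({
  message: "Rate-limiting profile:",
  choices: [
    { name: "Conservative", value: "conservative" },
    { name: "Moderate (recommended)", value: "moderate" },
  ],
});

console.log({ handle, contentType, scope, count, profile });
// Next: scrape -> embed -> store in SQLite via Drizzle.
```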
Actionable takeaway:
- Start with a minimal, repeatable CLI flow and add features (timestamps, incremental updates) as you validate the base pipeline.
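Because each stored row carries a timestamp, a cron-triggered run only needs to fetch content newer than the latest tweet already in the database. A sketch, reusing the hypothetical schema from the stack section; `scrapeSince` is a made-up entry point:

```ts
import { desc } from "drizzle-orm";
import { db, tweets } from "./schema"; // the hypothetical schema sketched earlier

// Find the newest stored tweet and scrape only content created after it.
const [latest] = await db
  .select({ createdAt: tweets.createdAt })
  .from(tweets)
  .orderBy(desc(tweets.createdAt))
  .limit(1);

const since = latest?.createdAt ?? 0;
// scrapeSince(handle, since); // hypothetical incremental entry point
```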
Augment remote agents vs. Claude Code
- Augment strengths:
- Better handling of context and codebase understanding
- Thoughtful, deliberate steps before acting
- More affordable (e.g., $50 with a generous thread allotment)
- Claude Code trade-offs:
- Faster, more brute-force execution
- Feels closer to a Vim-like IDE experience; might be less opinionated about context
- Practical takeaway:
- For code-focused automation and knowledge-base tasks, Augment often offers a more principled, cost-efficient approach. Claude Code can be useful for rapid, large-scale experiments, but you’ll pay more and may trade some context sensitivity.
Actionable takeaway:
- If you’re price-conscious and want better codebase awareness, start with Augment and reserve Claude Code for specific, high-speed explorations.
Prompt engineering and context strategy
- Prompt sources:
- Leverage compact prompts; in practice, smaller prompts tend to yield better model performance.
- Use context-engineering techniques (just-in-time context) to keep the model focused on the task.
- Practical prompts:
- Use code-analysis prompts to understand and suggest improvements to a given codebase.
- Examples: analyze the TypeScript + Bun CLI, suggest performance improvements, and preserve functionality (see the sample prompt after this list).
- Workflow tip:
- Run a prompt as an Augment task and then review the results; refine the prompt or add a small, targeted context for subsequent runs.
Actionable takeaway:
- Favor short, precise prompts plus just-in-time context. It dramatically improves relevance and reduces hallucinations.
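As a concrete illustration, a compact code-analysis prompt in the spirit of the example above might read as follows (hypothetical wording, not a prompt from the video):

```
Analyze the TypeScript + Bun CLI in this repo. Suggest performance
improvements to the scraping and embedding pipeline, and preserve all
existing functionality. Keep changes minimal and explain each one.
```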
Documentation strategy: centralizing with Mintlify
- Problem: Documentation scattered across multiple places.
- Plan: Create a centralized docs site and use it as the single source of truth.
- Multi-step approach:
- Analyze the current docs and identify gaps
- Study the reference architecture
- Design the documentation architecture
- Create the plan
- Set it up and migrate content
- Tools considered:
- Mintlify (documentation hosting and structure)
- Outcome:
- A clean, navigable docs site that mirrors the project’s architecture and usage
Actionable takeaway:
- Normalize the docs early. A centralized docs site speeds onboarding and reduces maintenance friction.
Performance, optimization, and code quality notes
- Bottlenecks observed:
- Inefficient queries and multiple count queries
- Memory management issues
- Unnecessary re-initialization on every command
- Suggested optimizations:
- Replace multiple queries with common table expressions (CTEs) to reduce round trips
- Introduce caching for cosine similarity calculations
- Use typed arrays for high-throughput vector math (see the sketch at the end of this section)
- How this was approached:
- Use Augment’s sequential thinking to outline improvements, then run targeted tasks
- Keep the session clean between runs to avoid stale references
- Documentation-driven improvements:
- Generate improved docs as part of the iteration to keep the codebase and explanations aligned
Actionable takeaway:
- Prioritize query optimization, memory efficiency, and targeted caching; keep documentation in sync with code changes.
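To ground the typed-array and caching suggestions, here is a minimal sketch: cosine similarity over Float32Array vectors plus a simple cache keyed by ID pairs. Names are illustrative; the repo's actual implementation isn't shown in the video.

```ts
// Cache cosine similarities by tweet-ID pair so repeated comparisons are free.
const simCache = new Map<string, number>();

// Plain typed-array loop: fast, allocation-free vector math.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function cachedSimilarity(
  idA: string, vecA: Float32Array,
  idB: string, vecB: Float32Array,
): number {
  // Order the key so (A, B) and (B, A) hit the same cache entry.
  const key = idA < idB ? `${idA}:${idB}` : `${idB}:${idA}`;
  let sim = simCache.get(key);
  if (sim === undefined) {
    sim = cosineSimilarity(vecA, vecB);
    simCache.set(key, sim);
  }
  return sim;
}
```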
Project, community, and what’s next
- Project and repo:
- XGPT and related tools are hosted under the VI organization on GitHub (example path referenced: github.com/joinvai)
- Community and platform:
- VI is a community-driven platform for builders; ongoing updates and show-and-tell sessions
- Pricing and momentum:
- Augment and related tooling offer cost-effective options; Claude Code and similar products tend to be pricier
- He hints at price increases and encourages joining early to lock in access
- Next steps:
- Check out the XGPT repo for hands-on exploration
- Consider joining VI to participate in future updates and discussions
Actionable takeaway:
- If you’re serious about this workflow, explore the XGPT repo and consider joining the VI community to stay ahead and influence upcoming features.
Takeaways you can act on today
- Start with a spike to validate data sources and the core pipeline (scrape -> embed -> query).
- Use a rate-limited scraping profile to protect accounts and stay compliant.
- Build around a simple, query-friendly store (SQLite + Drizzle) before expanding storage complexity.
- Leverage short, context-aware prompts and just-in-time context to keep AI responses relevant.
- Consolidate documentation early with a centralized site to reduce maintenance overhead.
- Compare Augment vs Claude Code for your needs; choose Augment for codebase awareness and lower cost, Claude Code for speed and IDE-fit.
Links
Note: some tooling names and product references in the video reflect the creator’s live environment. For exact setup steps or to clone the repo, see the XGPT repository referenced above (github.com/joinvai).