Showing 1–20 of 1502 insights
| Title | Episode | Published | Category | Domain | Tool Type | Preview |
|---|---|---|---|---|---|---|
| Continuous Prompt Evaluation | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Monitoring | - | Use Azure's evaluator library to continuously monitor and evaluate prompts, similar to LangSmith's approach. |
| End-to-end Experiment Pipeline | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Devops | - | Cameron built a basic pipeline using an evaluator that overrides rows in an existing dataset for multiple experiments, tracking correctness rates, run... |
| Comparative Experimentation | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Use an LLM as a judge to compare outputs from multiple experiments or models, facilitating side-by-side evaluation and selection of the best performin... |
| LLM Feedback Loop | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Integrate Claude Code with LangSmith via MCP to run experiments, inspect traces, modify code, and rerun in a tight feedback loop for rapid AI developme... |
| Building Evaluator Hooks | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Use evaluator hooks on GitHub to automatically run AI evaluations on each commit, enabling continuous feedback on prompt performance. |
| Context Window Data Formatting | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Implement precise data passing into the model’s context window, including formatting, conversions, and calculations, to improve prompt reliability. |
| Curated Prompt Test Suite | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Maintain a curated dataset of prompts, expected outputs, and evaluation criteria to continuously test model upgrades and prompt engineering workflows. |
| Pairwise Output Comparison | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Set up pairwise experiments where an LLM compares two nondeterministic outputs and declares which is better to guide prompt improvements. |
| LLM-Based Output Evaluation | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Use OpenAI’s evaluation framework to have an LLM judge outputs and assign a numeric score (e.g., 1–100) as an automated quality metric. |
| Prompt Dataset Evaluation | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Leverage Hugging Face’s ChatGPT prompt dataset to run experiments on diverse prompt examples and build automated evaluators. |
| Multi-Metric Optimization | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Optimize AI outputs not only for accuracy but also for cost or other business metrics by iterating on prompts and workflows against reference datasets... |
| Iterative Benchmarking Workflow | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Iteratively test different app versions, prompts, data formats, or AI models against a gold standard dataset to benchmark accuracy and optimize outcom... |
| Unit and Regression Testing | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Leverage traditional software testing frameworks like unit and regression tests to systematically evaluate AI agent performance using reference inputs... |
| Catchable-Predictable-Random Taxonomy | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Define AI output evaluation categories as catchable, predictable, or random to tailor your testing and feedback processes in domain-specific data pipe... |
| LangChain Academy Workflow | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Follow the LangChain Academy curriculum—starting with tracing, then testing and evaluation in LangSmith, and finally advanced prompt engineering—to sy... |
| Iterative AI Engineering | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Adopt an AI engineering or agent engineering practice with offline, online, and real-time test optimizations to continuously gather feedback and refin... |
| Backtesting AI Features | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Ai-development | - | Build a backtesting framework using LangSmith to evaluate existing AI features, run model upgrade tests, and track performance changes over time. |
| Event-Driven Agent Triggers | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Backend | - | Feed real-time market data from the API into LangGraph and configure event alerts to trigger agent workflows. |
| Deep Agents for Trading | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Architecture | - | Use Deep Agents in LangGraph to encapsulate and optimize high-frequency trading strategies. |
| Agentic Cognitive Architecture | EP 21 Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments | 11/22/2025 | Frameworks | Architecture | - | Convert traditional if-else and for-loop trading logic into a LangGraph agent-based cognitive architecture embedding decision trees and sub-agents to impr... |
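
Several rows above (Curated Prompt Test Suite, Iterative Benchmarking Workflow, Unit and Regression Testing) describe the same basic loop: run each app or prompt variant against a gold standard dataset and compare correctness rates. A minimal sketch of that loop in plain Python follows. It does not use LangSmith or any other evaluation library; `Example`, `correctness_rate`, `benchmark`, the two stand-in variants, and the tiny dataset are all hypothetical names and data invented for illustration, and the exact-match scoring is an assumed stand-in for whatever evaluator a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Example:
    """One row of a hypothetical gold standard dataset."""
    prompt: str
    expected: str


# A "variant" is anything that maps a prompt to an output: a prompt template
# plus model call, a new model version, a different data format, and so on.
Variant = Callable[[str], str]


def correctness_rate(variant: Variant, dataset: List[Example]) -> float:
    """Fraction of examples whose output matches the expected answer.

    Exact match (case- and whitespace-insensitive) is an assumption made
    for this sketch; an LLM judge could replace the comparison.
    """
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(
        1
        for ex in dataset
        if variant(ex.prompt).strip().lower() == ex.expected.strip().lower()
    )
    return hits / len(dataset)


def benchmark(variants: Dict[str, Variant], dataset: List[Example]) -> Dict[str, float]:
    """Score every named variant against the same reference dataset."""
    return {name: correctness_rate(fn, dataset) for name, fn in variants.items()}


# Illustrative data only: a three-row reference set and two canned "models".
GOLD = [
    Example("2+2", "4"),
    Example("capital of France", "Paris"),
    Example("3*3", "9"),
]


def variant_a(prompt: str) -> str:
    # Stand-in for an older prompt/model that misses one case.
    return {"2+2": "4", "capital of France": "paris"}.get(prompt, "unknown")


def variant_b(prompt: str) -> str:
    # Stand-in for an upgraded prompt/model.
    return {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(prompt, "unknown")


if __name__ == "__main__":
    scores = benchmark({"variant_a": variant_a, "variant_b": variant_b}, GOLD)
    for name, rate in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {rate:.0%}")
```

Re-running `benchmark` after each prompt or model change, and keeping the scores per run, would give the kind of regression signal the rows above describe; cost or latency could be tracked alongside correctness in the same dictionary.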
© 2025 The Build. All rights reserved.