Showing 1–20 of 1502 insights
All insights below are from EP 21: Kimi k2 Thinking, The AI Bubble, Nvidia’s Future, and LangChain Experiments (published 11/22/2025, category: Frameworks).
Continuous Prompt Evaluation (Monitoring)
Use Azure's evaluator library for continuous monitoring and evaluation of prompts, similar to LangSmith's approach.
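A minimal sketch of this pattern, assuming the azure-ai-evaluation Python package; the evaluator choice, model_config fields, and evaluate() parameters shown here are assumptions and may differ across SDK versions.

```python
# Hedged sketch: continuous prompt evaluation with the Azure AI Evaluation SDK.
# Package and parameter names are assumptions; check your installed version.
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Model used by the LLM-based evaluator (assumed configuration fields).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<api-key>",
    "azure_deployment": "gpt-4o",
}

relevance = RelevanceEvaluator(model_config)

# prompts.jsonl holds one record per prompt run, e.g. {"query": ..., "response": ...}
result = evaluate(
    data="prompts.jsonl",
    evaluators={"relevance": relevance},
)
print(result["metrics"])  # aggregate scores to monitor over time
```

Run this on a schedule or on each deploy, and the metrics it emits become the time series you monitor, much like a LangSmith experiment view.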
End-to-end Experiment Pipeline (DevOps)
Cameron built a basic pipeline using an evaluator that overrides rows in an existing dataset for multiple experiments, tracking correctness rates, run...
Comparative Experimentation (AI Development)
Use an LLM as a judge to compare outputs from multiple experiments or models, facilitating side-by-side evaluation and selection of the best performin...
LLM Feedback Loop (AI Development)
Integrate Claude Code with LangSmith via MCP to run experiments, inspect traces, modify code, and rerun in a tight feedback loop for rapid AI developme...
Building Evaluator Hooks (AI Development)
Use evaluator hooks on GitHub to automatically run AI evaluations on each commit, enabling continuous feedback on prompt performance.
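A minimal sketch of the commit-time gate such a hook would run; run_prompt() and the inline cases are hypothetical stand-ins for your own prompt call and reference data.

```python
# eval_gate.py - hedged sketch of a commit-time evaluation hook.
# run_prompt() is a stand-in for your real prompt/agent call, and the cases
# are inlined for illustration; in practice load them from a versioned file.
import sys

THRESHOLD = 0.85  # minimum acceptable correctness rate

CASES = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is 12 * 12?", "expected": "144"},
]

def run_prompt(text: str) -> str:
    """Hypothetical stand-in: call your model or prompt chain here."""
    return "Paris" if "France" in text else "144"

def main() -> int:
    correct = sum(
        1 for case in CASES
        if case["expected"].lower() in run_prompt(case["input"]).lower()
    )
    accuracy = correct / len(CASES)
    print(f"accuracy={accuracy:.2%} over {len(CASES)} cases")
    return 0 if accuracy >= THRESHOLD else 1  # nonzero exit fails the CI job

if __name__ == "__main__":
    sys.exit(main())
```

A GitHub Actions step that runs `python eval_gate.py` on each push would then surface a failing check whenever the correctness rate drops below the threshold.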
Context Window Data Formatting (AI Development)
Implement precise data passing into the model’s context window, including formatting, conversions, and calculations, to improve prompt reliability.
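A small illustration of the idea with made-up position data: do the unit conversions and arithmetic in code, then render a fixed-width table into the prompt rather than dumping raw objects.

```python
# Hedged sketch: pre-compute and format data before it enters the context window.
# The positions list and field names are made-up examples.
positions = [
    {"ticker": "NVDA", "qty": 120, "price_cents": 48505},
    {"ticker": "MSFT", "qty": 40, "price_cents": 41010},
]

rows = []
for p in positions:
    price = p["price_cents"] / 100   # convert cents -> dollars in code
    value = price * p["qty"]         # do the arithmetic before prompting
    rows.append(f"{p['ticker']:<6}{p['qty']:>6}{price:>10.2f}{value:>12.2f}")

table = "TICKER   QTY     PRICE       VALUE\n" + "\n".join(rows)

prompt = (
    "You are a portfolio analyst. Using only the table below, "
    "summarize concentration risk.\n\n" + table
)
print(prompt)
```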
Curated Prompt Test Suite (AI Development)
Maintain a curated dataset of prompts, expected outputs, and evaluation criteria to continuously test model upgrades and prompt engineering workflows.
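One way to keep such a suite in version control, sketched with made-up cases; the record layout and grading rules are assumptions, not a prescribed schema.

```python
# Hedged sketch: a curated prompt test suite kept alongside the code.
# Each case pairs a prompt with an expected output and a grading criterion.
TEST_SUITE = [
    {
        "id": "summarize-earnings",
        "prompt": "Summarize the attached earnings call in three bullet points.",
        "expected": "three bullets covering revenue, margin, and guidance",
        "criterion": "contains_bullets",
    },
    {
        "id": "extract-ticker",
        "prompt": "Which ticker is discussed in this paragraph? Answer with the symbol only.",
        "expected": "NVDA",
        "criterion": "exact_match",
    },
]

def grade(case: dict, output: str) -> bool:
    """Apply the case's evaluation criterion to a model output."""
    if case["criterion"] == "exact_match":
        return output.strip() == case["expected"]
    if case["criterion"] == "contains_bullets":
        return output.count("-") >= 3 or output.count("•") >= 3
    raise ValueError(f"unknown criterion {case['criterion']!r}")
```

Rerunning grade() over every case after a model upgrade or prompt edit gives a quick before/after comparison.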
Pairwise Output Comparison (AI Development)
Set up pairwise experiments where an LLM compares two nondeterministic outputs and declares which is better to guide prompt improvements.
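A minimal pairwise-judge sketch using the OpenAI chat completions API; the model name, rubric wording, and A/B parsing are assumptions you would tune for your own outputs.

```python
# Hedged sketch: LLM-as-judge pairwise comparison of two candidate outputs.
# Model name and rubric are assumptions; any chat-completions model works.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_pair(task: str, output_a: str, output_b: str) -> str:
    """Return 'A' or 'B' depending on which output the judge prefers."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You compare two answers to the same task and reply "
                        "with exactly one character: A or B."},
            {"role": "user",
             "content": f"Task:\n{task}\n\nAnswer A:\n{output_a}\n\nAnswer B:\n{output_b}"},
        ],
        temperature=0,
    )
    verdict = response.choices[0].message.content.strip().upper()
    return "A" if verdict.startswith("A") else "B"
```

Tallying the A/B verdicts across a dataset shows which prompt or model variant wins overall.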
LLM-Based Output Evaluation (AI Development)
Use OpenAI’s evaluation framework to have an LLM judge outputs and assign a numeric score (e.g., 1–100) as an automated quality metric.
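The underlying scoring pattern, sketched with a plain chat-completions call rather than the Evals framework itself; the prompt wording and the 1–100 parsing are assumptions.

```python
# Hedged sketch: LLM judge that returns a 1-100 quality score for an output.
# This uses a plain chat-completions call, not the OpenAI Evals framework itself.
from openai import OpenAI

client = OpenAI()

def score_output(task: str, output: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rate how well the answer completes the task on a "
                        "1-100 scale. Reply with the integer only."},
            {"role": "user", "content": f"Task:\n{task}\n\nAnswer:\n{output}"},
        ],
        temperature=0,
    )
    text = response.choices[0].message.content.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    return max(1, min(100, int(digits or "1")))  # clamp to the 1-100 range
```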
Prompt Dataset Evaluation (AI Development)
Leverage Hugging Face’s ChatGPT prompt dataset to run experiments on diverse prompt examples and build automated evaluators.
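A short sketch using the Hugging Face datasets library; "fka/awesome-chatgpt-prompts" is an assumption about which prompt collection is meant, so substitute the dataset you actually use.

```python
# Hedged sketch: pull a public ChatGPT prompt dataset to seed experiment inputs.
# The dataset id is an assumption; swap in your own prompt collection.
from datasets import load_dataset

dataset = load_dataset("fka/awesome-chatgpt-prompts", split="train")

# Sample a handful of diverse prompts for an experiment run.
for row in dataset.select(range(5)):
    print(row["act"], "->", row["prompt"][:80])
```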
Multi-Metric Optimization (AI Development)
Optimize AI outputs not only for accuracy but also for cost or other business metrics by iterating on prompts and workflows against reference datasets...
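A small sketch of scoring an experiment on more than one axis; the per-token price, the run records, and the weighting are made-up illustration values.

```python
# Hedged sketch: score an experiment on accuracy *and* cost, not accuracy alone.
# Prices, run records, and the weighting are made-up illustration values.
PRICE_PER_1K_TOKENS = 0.002  # assumed blended price, USD

runs = [
    {"correct": True,  "tokens": 1450},
    {"correct": True,  "tokens": 980},
    {"correct": False, "tokens": 2210},
]

accuracy = sum(r["correct"] for r in runs) / len(runs)
cost = sum(r["tokens"] for r in runs) / 1000 * PRICE_PER_1K_TOKENS

# A single objective that trades accuracy against spend; the weight is a
# business decision, not a fixed constant.
COST_WEIGHT = 10.0
objective = accuracy - COST_WEIGHT * cost
print(f"accuracy={accuracy:.2f} cost=${cost:.4f} objective={objective:.3f}")
```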
Iterative Benchmarking Workflow (AI Development)
Iteratively test different app versions, prompts, data formats, or AI models against a gold standard dataset to benchmark accuracy and optimize outcom...
Unit and Regression Testing (AI Development)
Leverage traditional software testing frameworks like unit and regression tests to systematically evaluate AI agent performance using reference inputs...
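A minimal pytest sketch of the regression idea; run_agent() and the reference cases are hypothetical stand-ins for your real agent call and dataset.

```python
# test_agent.py - hedged sketch of regression tests for an AI agent with pytest.
# run_agent() is a hypothetical stand-in; point it at your real agent call.
import pytest

def run_agent(question: str) -> str:
    """Hypothetical stand-in for the agent under test."""
    return {"What is 2 + 2?": "4", "Spell 'NVDA' backwards.": "ADVN"}.get(question, "")

REFERENCE_CASES = [
    ("What is 2 + 2?", "4"),
    ("Spell 'NVDA' backwards.", "ADVN"),
]

@pytest.mark.parametrize("question,expected", REFERENCE_CASES)
def test_agent_matches_reference(question, expected):
    # Regression guard: known inputs must keep producing the known answers.
    assert expected in run_agent(question)
```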
Catchable-Predictable-Random Taxonomy (AI Development)
Define AI output evaluation categories as catchable, predictable, or random to tailor your testing and feedback processes in domain-specific data pipe...
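One possible reading of the taxonomy expressed in code; the routing of each category to a check is an assumption for illustration, not a fixed rule from the episode.

```python
# Hedged sketch: tag each failure mode with the taxonomy and route it to a
# matching check. The routing choices are one possible reading, not canon.
from enum import Enum, auto

class FailureKind(Enum):
    CATCHABLE = auto()    # violates a hard rule -> assert in code
    PREDICTABLE = auto()  # known drift pattern -> scheduled eval on a dataset
    RANDOM = auto()       # nondeterministic wobble -> sampled LLM-judge review

def check_for(kind: FailureKind) -> str:
    if kind is FailureKind.CATCHABLE:
        return "schema/format assertion in the pipeline"
    if kind is FailureKind.PREDICTABLE:
        return "nightly benchmark against the gold dataset"
    return "spot-check a random sample with an LLM judge"
```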
LangChain Academy Workflow (AI Development)
Follow the LangChain Academy curriculum, starting with tracing, then testing and evaluation in LangSmith, and finally advanced prompt engineering, to sy...
Iterative AI Engineering (AI Development)
Adopt an AI engineering or agent engineering practice with offline, online, and real-time test optimizations to continuously gather feedback and refin...
Backtesting AI Features (AI Development)
Build a backtesting framework using LangSmith to evaluate existing AI features, run model upgrade tests, and track performance changes over time.
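A hedged sketch assuming the langsmith SDK's evaluate() helper; the parameter names, evaluator signature, and dataset name may differ across SDK versions, and my_feature() is a hypothetical stand-in for the feature under test.

```python
# Hedged sketch: backtest an existing AI feature against a LangSmith dataset.
# Assumes the langsmith SDK's evaluate() helper; names may vary by version.
from langsmith import evaluate

def my_feature(question: str) -> str:
    """Hypothetical stand-in for the AI feature being backtested."""
    return "42"

def target(inputs: dict) -> dict:
    # Call the feature under test; replace with your real entry point.
    return {"answer": my_feature(inputs["question"])}

def correctness(run, example) -> dict:
    # Compare the run's output against the stored reference output.
    match = example.outputs["answer"].lower() in run.outputs["answer"].lower()
    return {"key": "correctness", "score": int(match)}

evaluate(
    target,
    data="feature-backtest-dataset",        # name of an existing LangSmith dataset
    evaluators=[correctness],
    experiment_prefix="gpt-upgrade-check",  # label runs so changes are comparable
)
```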
Event-Driven Agent Triggers (Backend)
Feed real-time market data from the API into LangGraph and configure event alerts to trigger agent workflows.
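A minimal sketch of the trigger pattern: a hypothetical fetch_price() poller fires a one-node LangGraph workflow when an alert condition trips. The threshold, state fields, and data source are made up.

```python
# Hedged sketch: poll a market-data source and trigger a LangGraph workflow
# when an alert condition fires. fetch_price() is a hypothetical stand-in
# for your real-time market data API.
import random
import time
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class AlertState(TypedDict):
    ticker: str
    price: float
    note: str

def analyze(state: AlertState) -> dict:
    return {"note": f"{state['ticker']} crossed the alert level at {state['price']:.2f}"}

builder = StateGraph(AlertState)
builder.add_node("analyze", analyze)
builder.add_edge(START, "analyze")
builder.add_edge("analyze", END)
graph = builder.compile()

def fetch_price(ticker: str) -> float:
    """Hypothetical market-data call; replace with your API client."""
    return 490 + random.random() * 30

ALERT_LEVEL = 500.0
while True:
    price = fetch_price("NVDA")
    if price > ALERT_LEVEL:   # event alert condition
        result = graph.invoke({"ticker": "NVDA", "price": price, "note": ""})
        print(result["note"])
        break
    time.sleep(5)             # poll interval
```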
Deep Agents for Trading (Architecture)
Use Deep Agents in LangGraph to encapsulate and optimize high-frequency trading strategies.
Agentic Cognitive Architecture (Architecture)
Convert traditional if-else/for-loop trading logic into a LangGraph agent-based cognitive architecture embedding decision trees and sub-agents to impr...
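A hedged sketch of what that conversion can look like: an if-else branch re-expressed as a LangGraph routing function feeding two sub-agent nodes. The thresholds and node logic are made-up illustrations, not a trading strategy.

```python
# Hedged sketch: an if-else trading rule re-expressed as a LangGraph graph
# with a routing function and two sub-agent nodes.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class TradeState(TypedDict):
    ticker: str
    momentum: float
    action: str

def route(state: TradeState) -> str:
    # Replaces the old `if momentum > 0.7: ... else: ...` branch.
    return "aggressive" if state["momentum"] > 0.7 else "conservative"

def aggressive_agent(state: TradeState) -> dict:
    return {"action": f"buy {state['ticker']}"}

def conservative_agent(state: TradeState) -> dict:
    return {"action": f"hold {state['ticker']}"}

builder = StateGraph(TradeState)
builder.add_node("aggressive", aggressive_agent)
builder.add_node("conservative", conservative_agent)
builder.add_conditional_edges(
    START, route, {"aggressive": "aggressive", "conservative": "conservative"}
)
builder.add_edge("aggressive", END)
builder.add_edge("conservative", END)
graph = builder.compile()

print(graph.invoke({"ticker": "NVDA", "momentum": 0.82, "action": ""}))
```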
Page 1 of 76