Discord Dev Stream 11-6-24
Part 1
Watch: https://www.youtube.com/watch?v=oqq5H0HRF_A
00:00:00 - Overview
- OKai is moving to a plugin architecture to enable developers to easily add integrations (e.g. Ethereum wallets, NFTs, Obsidian, etc.) without modifying core code
- Plugins allow devs to focus on specific areas of interest
- Core changes will focus on enabling more flexibility and features to support plugins
00:01:27 - Core abstractions
- Characters: Way to input information to enable multi-agent systems
- Actions, evaluators, providers
- Existing capabilities: Document reading, audio transcription, video summarization, long-form context, timed message summarization
00:02:50 - OKai as an agent, not just a chatbot
- Designed to act human-like and interact with the world using human tools
- Aim is to enable natural interactions without reliance on slash commands
00:04:44 - Advanced usage and services
- Memory and vector search db (SQLite, Postgres with pgVector)
- Browser service to summarize website content, get through CAPTCHAs
- Services are tools leveraged by actions, attached to runtime
00:06:06 - Character-centric configuration
- Moving secrets, API keys, model provider to character config
- Clients will become plugins, selectable per character
- Allows closed-source custom plugins while still contributing to open-source
00:10:13 - Providers
- Inject dynamic, real-time context into the agent
- Examples: Time, wallet, marketplace trust score, token balances, boredom/cringe detection
- Easy to add and register with the agent
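As a rough illustration of the provider idea, here is a minimal sketch. It assumes a hypothetical Provider interface with a single get method that returns text for the agent's context; the names Provider, timeProvider, providers, and gatherContext are made up for this example and are not taken from the OKai codebase.

```typescript
// Hypothetical shape: a provider returns a string that is injected into the prompt.
interface Provider {
  name: string;
  get(now: Date): string;
}

// Example provider: current time, one of the examples mentioned in the stream.
const timeProvider: Provider = {
  name: "time",
  get: (now) => `The current UTC time is ${now.toISOString()}.`,
};

// In this sketch, registering a provider just means adding it to a list.
const providers: Provider[] = [timeProvider];

// Collect the output of every registered provider into one context string.
function gatherContext(list: Provider[], now: Date): string {
  return list.map((p) => p.get(now)).join("\n");
}
```

A wallet or token-balance provider would presumably follow the same pattern, returning a short text summary instead of the time.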
00:15:12 - Setting up providers and default actions
- Default providers imported in runtime.ts
- CLI loads characters and default actions (to be made more flexible)
- Character config will define custom action names to load
00:18:13 - Actions
Q: How does each client decide which action to call?
A: The agent's response can include text, an action, or both. The "process actions" step checks the action name and its similes, then executes the corresponding handler. Each action's description is injected into the agent context to guide usage.
00:22:27 - Action execution flow
- Check if action should be taken (validation)
- Determine action outcome
- Compose context and send follow-up if continuing
- Execute desired functionality (mint token, generate image, etc.)
- Use callback to send messages back to the connector (Discord, Twitter, etc.)
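The flow above (match by name or simile, validate, run the handler, reply through a callback) could be sketched as follows. The Action and Callback shapes and the processAction helper are assumptions made for illustration, not the actual OKai types.

```typescript
// Hypothetical shapes, assumed for illustration only.
type Callback = (text: string) => void;

interface Action {
  name: string;
  similes: string[]; // alternative names the model might emit
  description: string; // text injected into the agent context to guide usage
  validate(message: string): boolean;
  handler(message: string, callback: Callback): void;
}

const generateImage: Action = {
  name: "GENERATE_IMAGE",
  similes: ["MAKE_IMAGE", "DRAW"],
  description: "Generate an image from the user's request.",
  validate: (message) => message.trim().length > 0,
  handler: (message, callback) => {
    // Real work (an image API call, a token mint, etc.) would go here.
    callback(`Generated an image for: ${message}`);
  },
};

// Find the action whose name or simile matches, validate it, then run it.
// Returns true only if a handler was actually executed.
function processAction(
  actions: Action[],
  actionName: string,
  message: string,
  callback: Callback,
): boolean {
  const wanted = actionName.toUpperCase();
  const action = actions.find(
    (a) => a.name === wanted || a.similes.includes(wanted),
  );
  if (!action || !action.validate(message)) return false;
  action.handler(message, callback);
  return true;
}
```

In a real client the callback would post to Discord, Twitter, or whichever connector received the message; here it is just a function that receives text.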
00:24:47 - Choosing actions
Q: How does it choose which action to run?
A: The "generate method response" includes the action to run. Message handler template includes action examples, facts, generated dialogue actions, and more to guide the agent.
00:28:22 - Custom actions
Q: How to create a custom action (e.g. send USDC to a wallet)?
A: Use existing actions (like token swap) as a template. Actions don't have input fields; instead, they use secondary prompts to gather parameters. The "generate object" step converts language to API calls.
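To make the "secondary prompt" idea concrete, here is a small sketch for a hypothetical send-USDC action. It only shows building the extraction prompt and validating the model's JSON reply; the TransferParams shape and both function names are assumptions, and the model call itself is left out.

```typescript
// Hypothetical parameter shape for a "send USDC" action.
interface TransferParams {
  recipient: string;
  amount: number;
}

// Secondary prompt asking the model to extract parameters as JSON.
function buildParamPrompt(userMessage: string): string {
  return [
    "Extract the transfer details from the message below.",
    'Respond with JSON only: {"recipient": string, "amount": number}',
    `Message: ${userMessage}`,
  ].join("\n");
}

// Parse and validate the model's reply; return null if it is unusable.
function parseTransferParams(modelReply: string): TransferParams | null {
  try {
    const raw: unknown = JSON.parse(modelReply);
    if (typeof raw !== "object" || raw === null) return null;
    const { recipient, amount } = raw as Record<string, unknown>;
    if (typeof recipient !== "string" || recipient.length === 0) return null;
    if (typeof amount !== "number" || !Number.isFinite(amount) || amount <= 0) {
      return null;
    }
    return { recipient, amount };
  } catch {
    // JSON.parse throws on malformed input; treat that as "no parameters".
    return null;
  }
}
```

Validating the reply before calling any wallet API matters here, since the parameters come from free-form language rather than typed input fields.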
00:32:21 - Limitations of action-only approaches
- Shaw believes half of the PhD papers on action-only models are not reproducible
- Many public claims of superior models are exaggerated; use OKai if it's better
00:36:40 - Next steps
- Shaw to make a tutorial to better communicate key concepts
- Debugging and improvements based on the discussion
- Attendee to document their experience and suggest doc enhancements
Part 2
Watch: https://www.youtube.com/watch?v=yE8Mzq3BnUc
00:00:00 - Dealing with OpenAI rate limits for new accounts
- New accounts have very low rate limits
- Options to increase limits:
  - Have a friend at OpenAI age your account
  - Use an older account
  - Consistently use the API and limits will increase quickly
  - Can also email OpenAI to request limit increases
00:00:43 - Alternatives to OpenAI to avoid rate limits
- Amazon Bedrock or Google Vertex likely have same models without strict rate limits
- Switching to these is probably a one-line change
- Project 89 got unlimited free access to Vertex
00:01:25 - Memory management best practices
Q: Suggestions for memory management best practices across users/rooms?
A: Most memory systems are user-agent based, with no room concept. OKai uses a room abstraction (like a Discord channel/server or Twitter thread) to enable multi-agent simulation. Memories are stored per-agent to avoid collisions.
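One way to picture the room abstraction is an in-memory store keyed by both agent and room. This is only a sketch under assumed names (Memory, MemoryStore, recent); the real storage is a database (SQLite or Postgres, per Part 1) and its schema is not shown in the stream.

```typescript
// Hypothetical memory record: every memory belongs to one agent and one room.
interface Memory {
  agentId: string;
  roomId: string;
  userId: string;
  text: string;
}

class MemoryStore {
  private memories: Memory[] = [];

  add(memory: Memory): void {
    this.memories.push(memory);
  }

  // Return the last `count` memories that this agent stored for this room.
  // Filtering on agentId keeps two agents in the same room from colliding.
  recent(agentId: string, roomId: string, count: number): Memory[] {
    if (count <= 0) return [];
    return this.memories
      .filter((m) => m.agentId === agentId && m.roomId === roomId)
      .slice(-count);
  }
}
```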
00:02:57 - Using memories in OKai
- Memories are used in the composeState function
- Pulls memories from various sources (recent messages, facts, goals, etc.) into a large state object
- State object is used to hydrate templates
- Custom memory providers can be added to pull from other sources (Obsidian, databases)
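The compose-then-hydrate pattern described above might look roughly like this. The State type, the source loaders, and the {{name}} placeholder syntax are assumptions for the sketch; the real composeState lives on the runtime and gathers far more than two fields.

```typescript
// Hypothetical state: a flat map from field name to text.
type State = Record<string, string>;

// Build the state object by calling one loader per source.
function composeState(sources: Record<string, () => string>): State {
  const state: State = {};
  for (const [key, load] of Object.entries(sources)) {
    state[key] = load();
  }
  return state;
}

// Replace {{name}} placeholders with values from the state.
// Unknown placeholders become empty strings.
function hydrate(template: string, state: State): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (_match, key: string) => state[key] ?? "",
  );
}
```

A custom memory provider (for example one reading from Obsidian) would then be just another loader added to the sources map.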
00:05:11 - Evaluators vs. Action validation
- Actions have a validate function to check if the action is valid to run (e.g., check if agent has a wallet before a swap)
- Evaluators are a separate abstraction that run a "reflection" step
- Example: Fact extraction evaluator runs every N messages to store facts about the user as memories
- Allows agent to "get to know" the user without needing full conversation history
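A fact-extraction evaluator that fires every N messages could be sketched like this. The Evaluator interface, the interval of 5, and the regex stand-in for extraction are all assumptions; a real evaluator would presumably ask a model to pull out the facts.

```typescript
// Hypothetical evaluator shape.
interface Evaluator {
  name: string;
  shouldRun(messageCount: number): boolean;
  run(recentMessages: string[]): string[];
}

const FACT_INTERVAL = 5; // assumed value of N

const factEvaluator: Evaluator = {
  name: "FACT_EXTRACTION",
  // Run on every Nth message rather than on every message.
  shouldRun: (count) => count > 0 && count % FACT_INTERVAL === 0,
  // Placeholder extraction: keep simple first-person statements.
  run: (recentMessages) =>
    recentMessages.filter((m) => /^I (am|like|work) /.test(m)),
};

// Reflection step: if the evaluator is due, store extracted facts as memories.
function reflect(
  evaluator: Evaluator,
  messages: string[],
  facts: string[],
): void {
  if (!evaluator.shouldRun(messages.length)) return;
  facts.push(...evaluator.run(messages.slice(-FACT_INTERVAL)));
}
```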
00:07:58 - Example use case: Order book evaluator
- Evaluator looks at chats sent to an agent and extracts information about "shields" (tokens?)
- Uses this to build an order book and "marketplace of trust"
00:09:15 - Mapping OKai abstractions to OODA loop
- Providers: Observe/Orient stages (merged since agent is a data machine)
- Actions & response handling: Decide stage
- Action execution: Act stage
- Evaluators: Update state, then loop back to Decide
00:10:03 - Wrap up
- Shaw is considering making a video to explain these concepts in depth
Part 3
Watch: https://www.youtube.com/watch?v=7FiKJPyaMJI
00:00:00 - Managing large context sizes
- State object can get very large, especially with long user posts
- OKai uses "trim tokens" and a maximum content length (120k tokens) to cap context size
- New models have 128k-200k context, which is a lot (equivalent to 10 YouTube videos + full conversation)
- Conversation length is typically capped at 32 messages
- Fact extraction allows recalling information beyond this window
- Per-channel conversation access
- Increasing conversation length risks more aggressive token trimming from the top of the prompt
- Keep instructions at the bottom to avoid trimming them
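The reason for keeping instructions at the bottom can be seen in a small sketch of top-down trimming. This is not OKai's "trim tokens" implementation; the 4-characters-per-token estimate and the function names are assumptions chosen to keep the example self-contained.

```typescript
// Rough token estimate: about 4 characters per token (an assumed heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop whole sections from the top of the prompt until it fits the budget.
// The last section is always kept, which is why instructions belong there.
function trimFromTop(sections: string[], maxTokens: number): string[] {
  const kept = [...sections];
  let total = kept.reduce((sum, s) => sum + estimateTokens(s), 0);
  while (kept.length > 1 && total > maxTokens) {
    const removed = kept.shift();
    if (removed === undefined) break;
    total -= estimateTokens(removed);
  }
  return kept;
}
```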
00:01:53 - Billing costs for cloud/GPT models
Q: What billing costs have you experienced with cloud/GPT model integration?
A:
- Open Router has a few always-free models limited to 8k context and rate-limited
- Plan to re-implement and use these for the tiny/check model with fallback for rate limiting
- 8k context unlikely to make a good agent; preference for smaller model over largest 8k one
- Locally-run models are free for MacBooks with 16GB RAM, but not feasible for Linux/AMD users
00:03:35 - Cost management strategies
- Very cost-scalable depending on model size
- Use very cheap model (1000x cheaper than GPT-4) for should_respond handler
- Runs AI on every message, so cost is a consideration
- Consider running a local Llama 3B model for should_respond to minimize costs
- Only pay for valid generations
00:04:32 - Model provider and class configuration
- ModelProvider class with ModelClass (small, medium, large, embedding)
- Configured in models.ts
- Example: OpenAI small = GPT-4o mini, medium = GPT-4
- Approach: Check if model class can handle everything in less than 8k context
- If yes (should_respond), default to free tier
- Else, use big models
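The selection approach above can be sketched as a small lookup plus a routing function. The model names below are placeholders, not the contents of models.ts, and pickModelClass is a hypothetical helper written for this example.

```typescript
type ModelClass = "small" | "medium" | "large" | "embedding";

// Placeholder model names per class; real values would live in models.ts.
const exampleProvider: Record<ModelClass, string> = {
  small: "example-small-model",
  medium: "example-medium-model",
  large: "example-large-model",
  embedding: "example-embedding-model",
};

const SMALL_CONTEXT_LIMIT = 8000; // tokens, matching the 8k figure above

// Route cheap yes/no checks that fit in 8k context to the small class,
// and everything else to a large model.
function pickModelClass(
  task: "should_respond" | "response",
  promptTokens: number,
): ModelClass {
  if (task === "should_respond" && promptTokens < SMALL_CONTEXT_LIMIT) {
    return "small";
  }
  return "large";
}
```

Because should_respond runs on every incoming message, sending it to the small (or free-tier) class is where most of the cost saving would come from.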
00:06:23 - Fine-tuned model support
- Extend ModelProvider to support fine-tuned instances of small Llama models for specific tasks
- In progress, to be added soon
- Model endpoint override exists; will add per-model provider override
- Allows pointing small model to fine-tuned Llama 3.1B for should_respond
00:07:10 - Avoiding cringey model loops
- Fine-tuning is a form of anti-slop (avoiding low-quality responses)
- For detecting cringey model responses, use the "boredom provider"
- Has a list of cringe words; if detected, agent disengages
- JSON file exists with words disproportionately high in the dataset
- To be shared for a more comprehensive solution
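A word-list check of the kind described for the boredom provider might look like the sketch below. The three words and the threshold are placeholders; the actual JSON list mentioned in the stream is not reproduced here.

```typescript
// Placeholder list; the real list is a JSON file that was not shown.
const CRINGE_WORDS = new Set(["delve", "tapestry", "testament"]);

// Count how many words in the text appear in the cringe list.
function cringeScore(text: string): number {
  const words: string[] = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return words.filter((w) => CRINGE_WORDS.has(w)).length;
}

// Disengage once the score reaches the (assumed) threshold.
function shouldDisengage(text: string, threshold = 2): boolean {
  return cringeScore(text) >= threshold;
}
```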
Part 4
Watch: https://www.youtube.com/watch?v=ZlzZzDU1drM
00:00:00 - Setting up an autonomous agent loop
Q: How to set up an agent to constantly loop and explore based on objectives/goals?
A: Create a new "autonomous" client:
- Initialize with just the runtime (no Express app needed)
- Set a timer to call a step function every 10 seconds
- In the step function:
  - Compose state
  - Decide on action
  - Execute action
  - Update state
  - Run evaluators
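The loop above could be sketched as follows. AutoRuntime is a hypothetical stand-in for the real runtime, with one method per step; the stream's version is async and calls a model, while this sketch is synchronous to keep it short.

```typescript
// Hypothetical runtime surface, assumed for this sketch.
interface AutoRuntime {
  composeState(): Record<string, string>;
  decideAction(state: Record<string, string>): string;
  executeAction(name: string): void;
  updateState(actionName: string): void;
  runEvaluators(): void;
}

// One iteration of the loop, in the order listed above.
function step(runtime: AutoRuntime): string {
  const state = runtime.composeState();
  const actionName = runtime.decideAction(state);
  runtime.executeAction(actionName);
  runtime.updateState(actionName);
  runtime.runEvaluators();
  return actionName;
}

// Start the timer (every 10 seconds by default); returns a stop function.
function startAutonomousClient(
  runtime: AutoRuntime,
  intervalMs = 10000,
): () => void {
  const timer = setInterval(() => step(runtime), intervalMs);
  return () => clearInterval(timer);
}
```

Returning a stop function makes it easy to shut the client down cleanly, which a bare setInterval call would not allow.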
00:01:56 - Creating an auto template
- Create an autoTemplate with agent info (bio, lore, goals, actions)
- Prompt: "What does the agent want to do? Your response should only be the name of the action to call."
- Compose state using runtime.composeState
00:03:38 - Passing a message object
- Need to pass a message object with userId, agentId, content, and roomId
- Create a unique roomId for the autonomous agent using crypto.randomUUID()
- Set userId and agentId using the runtime
- Set content to a default message
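A message object along these lines might be built as shown below. The AutoMessage shape, the content.text field, the default text, and the choice to use the agent's own ID for userId are all assumptions; the stream only names the four fields. The sketch imports randomUUID from node:crypto, which is the same function exposed as crypto.randomUUID().

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical message shape with the four fields named above.
interface AutoMessage {
  userId: string;
  agentId: string;
  roomId: string;
  content: { text: string };
}

// One room per autonomous agent, created once at startup.
const autonomousRoomId: string = randomUUID();

// Build the placeholder message passed into state composition.
// Assumption: the agent acts as its own "user" in this room.
function createAutoMessage(agentId: string, roomId: string): AutoMessage {
  return {
    userId: agentId,
    agentId,
    roomId,
    content: { text: "Autonomous step" }, // assumed default message
  };
}
```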
00:04:33 - Composing context
- Compose context using the runtime, state, and auto template
00:05:02 - Type error
- Getting a type error: "is missing the following from type state"
- (Transcript ends before resolution)
The key steps are:
- Create a dedicated autonomous client
- Set up a loop to continuously step through the runtime
- In each step, compose state, decide & execute actions, update state, and run evaluators
- Create a custom auto template to guide the agent's decisions
- Pass a properly formatted message object
- Compose context using the runtime, state, and auto template