Discord Dev Stream 11-6-24

Part 1

Watch: https://www.youtube.com/watch?v=oqq5H0HRF_A

00:00:00 - Overview

  • OKai is moving to a plugin architecture to enable developers to easily add integrations (e.g. Ethereum wallets, NFTs, Obsidian, etc.) without modifying core code
  • Plugins allow devs to focus on specific areas of interest
  • Core changes will focus on enabling more flexibility and features to support plugins

00:01:27 - Core abstractions

  • Characters: Way to input information to enable multi-agent systems
  • Actions, evaluators, providers
  • Existing capabilities: Document reading, audio transcription, video summarization, long-form context, timed message summarization

00:02:50 - OKai as an agent, not just a chatbot

  • Designed to act human-like and interact with the world using human tools
  • Aim is to enable natural interactions without reliance on slash commands

00:04:44 - Advanced usage and services

  • Memory and vector search db (SQLite, Postgres with pgVector)
  • Browser service to summarize website content, get through CAPTCHAs
  • Services are tools leveraged by actions, attached to runtime
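As a rough sketch, an action might lean on the browser service like this (the accessor and method names, `getService` and `getPageContent`, are illustrative assumptions, not confirmed API):

```typescript
interface BrowserServiceLike {
  getPageContent(url: string): Promise<string>;
}

interface RuntimeLike {
  getService<T>(name: string): T | null;
}

// An action handler borrowing the browser service attached to the runtime.
async function fetchPageForSummary(
  runtime: RuntimeLike,
  messageText: string
): Promise<string | null> {
  const url = messageText.match(/https?:\/\/\S+/)?.[0];
  if (!url) return null;
  const browser = runtime.getService<BrowserServiceLike>("browser");
  return browser ? await browser.getPageContent(url) : null;
}
```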

00:06:06 - Character-centric configuration

  • Moving secrets, API keys, model provider to character config
  • Clients will become plugins, selectable per character
  • Allows closed-source custom plugins while still contributing to open-source
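As a hedged sketch, a per-character file might end up looking something like this (field names are assumptions based on the discussion, not a confirmed schema):

```typescript
const character = {
  name: "trader",
  modelProvider: "openai",            // model provider chosen per character
  clients: ["discord", "twitter"],    // clients as plugins, selected per character
  plugins: ["my-private-plugin"],     // closed-source plugins can slot in here
  settings: {
    secrets: {
      OPENAI_API_KEY: "sk-...",       // secrets move from the global env into the character
      DISCORD_API_TOKEN: "...",
    },
  },
};
```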

00:10:13 - Providers

  • Inject dynamic, real-time context into the agent
  • Examples: Time, wallet, marketplace trust score, token balances, boredom/cringe detection
  • Easy to add and register with the agent
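A minimal provider sketch matching this description (the interface below is a simplified stand-in for the real core types):

```typescript
// A provider injects dynamic, real-time context into the agent.
interface Provider {
  get(runtime: unknown, message: unknown, state?: unknown): Promise<string>;
}

// Example: a time provider. Whatever `get` resolves to is folded into
// the agent's context each time state is composed.
const timeProvider: Provider = {
  async get() {
    return `The current time is ${new Date().toUTCString()}.`;
  },
};
```

Registering it then amounts to including it in the runtime's provider list.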

00:15:12 - Setting up providers and default actions

  • Default providers imported in runtime.ts
  • CLI loads characters and default actions (to be made more flexible)
  • Character config will define custom action names to load

00:18:13 - Actions

Q: How does each client decide which action to call?
A: An agent response can include text, an action, or both. The process-actions step checks the action name/similes and executes the corresponding handler. Each action's description is injected into the agent's context to guide usage.

00:22:27 - Action execution flow

  • Check if action should be taken (validation)
  • Determine action outcome
  • Compose context and send follow-up if continuing
  • Execute desired functionality (mint token, generate image, etc.)
  • Use callback to send messages back to the connector (Discord, Twitter, etc.)
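A condensed sketch of that flow as a single action (types are simplified stand-ins; `mintToken` is a hypothetical helper):

```typescript
interface Message { content: { text: string } }
type Callback = (response: { text: string }) => Promise<void>;
declare function mintToken(request: string): Promise<string>; // hypothetical helper

const mintTokenAction = {
  name: "MINT_TOKEN",
  similes: ["CREATE_TOKEN", "DEPLOY_TOKEN"],
  description: "Mints a new token when the user asks for one.",

  // 1. Validation: should this action be taken at all?
  async validate(_runtime: unknown, _message: Message): Promise<boolean> {
    return true; // e.g., check that the agent has a funded wallet
  },

  // 2-4. Determine the outcome and execute the desired functionality.
  async handler(
    _runtime: unknown,
    message: Message,
    _state: unknown,
    _options: unknown,
    callback: Callback
  ): Promise<boolean> {
    const txHash = await mintToken(message.content.text);
    // 5. The callback routes the reply back to the connector (Discord, Twitter, ...).
    await callback({ text: `Minted your token: ${txHash}` });
    return true;
  },
};
```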

00:24:47 - Choosing actions

Q: How does it choose which action to run?
A: The "generate message response" step includes the action to run in its output. The message handler template includes action examples, facts, generated dialogue actions, and more to guide the agent.

00:28:22 - Custom actions

Q: How do I create a custom action (e.g., send USDC to a wallet)?
A: Use an existing action (like the token swap) as a template. Actions don't have input fields; instead, a secondary prompt gathers the parameters, and "generate object" converts the natural language into an API call.
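A hedged sketch of that secondary-prompt pattern; `composeContext` and `generateObject` follow the names used in the stream, but the import path and exact call shapes are assumptions:

```typescript
import { composeContext, generateObject, ModelClass } from "@okai/core"; // path is an assumption

// Secondary prompt that gathers the parameters the action needs.
const transferTemplate = `
{{recentMessages}}

Extract the USDC transfer details from the conversation above.
Respond only with JSON: { "recipient": string, "amountUsd": number }
`;

async function extractTransferParams(runtime: any, state: any) {
  const context = composeContext({ state, template: transferTemplate });
  // "Generate object" turns the natural language into an API-ready payload.
  return generateObject({ runtime, context, modelClass: ModelClass.SMALL });
}
```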

00:32:21 - Limitations of action-only approaches

  • Shaw believes half of the PhD papers on action-only models are not reproducible
  • Many public claims of superior models are exaggerated; use OKai if it's better

00:36:40 - Next steps

  • Shaw to make a tutorial to better communicate key concepts
  • Debugging and improvements based on the discussion
  • Attendee to document their experience and suggest doc enhancements

Part 2

Watch: https://www.youtube.com/watch?v=yE8Mzq3BnUc

00:00:00 - Dealing with OpenAI rate limits for new accounts

  • New accounts have very low rate limits
  • Options to increase limits:
    1. Have a friend at OpenAI age your account
    2. Use an older account
    3. Consistently use the API and limits will increase quickly
  • Can also email OpenAI to request limit increases

00:00:43 - Alternatives to OpenAI to avoid rate limits

  • Amazon Bedrock or Google Vertex likely have the same models without strict rate limits
  • Switching to these is probably a one-line change
  • Project 89 got unlimited free access to Vertex

00:01:25 - Memory management best practices

Q: Any suggestions for memory management best practices across users/rooms?
A: Most memory systems are user-agent based, with no concept of rooms. OKai uses a room abstraction (like a Discord channel/server or Twitter thread) to enable multi-agent simulation. Memories are stored per-agent to avoid collisions.

00:02:57 - Using memories in OKai

  • Memories are used in the composeState function
  • Pulls memories from various sources (recent messages, facts, goals, etc.) into a large state object
  • State object is used to hydrate templates
  • Custom memory providers can be added to pull from other sources (Obsidian, databases)
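Roughly, per the description above (the import path and exact signatures are assumptions):

```typescript
import { composeContext } from "@okai/core"; // adjust to your package/path

async function buildContext(runtime: any, message: any, template: string) {
  // Pulls recent messages, facts, goals, provider output, etc. into one
  // large state object...
  const state = await runtime.composeState(message);
  // ...which then fills the {{placeholders}} in the template.
  return composeContext({ state, template });
}
```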

00:05:11 - Evaluators vs. Action validation

  • Actions have a validate function to check if the action is valid to run (e.g., check if agent has a wallet before a swap)
  • Evaluators are a separate abstraction that run a "reflection" step
  • Example: Fact extraction evaluator runs every N messages to store facts about the user as memories
  • Allows agent to "get to know" the user without needing full conversation history
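A minimal evaluator in the same spirit (manager names are simplified stand-ins; `extractFacts` is a hypothetical helper):

```typescript
declare function extractFacts(state: unknown): Promise<string[]>; // hypothetical helper

const factEvaluator = {
  name: "EXTRACT_FACTS",
  description: "Distills durable facts about the user from recent chat.",

  // Gate: only run every N messages (N = 10 here, purely illustrative).
  async validate(runtime: any, message: any): Promise<boolean> {
    const count = await runtime.messageManager.countMemories(message.roomId);
    return count % 10 === 0;
  },

  // Reflection step: extract facts and persist them as memories, so the
  // agent "knows" the user without replaying the full conversation history.
  async handler(runtime: any, message: any): Promise<void> {
    const state = await runtime.composeState(message);
    for (const fact of await extractFacts(state)) {
      await runtime.factManager.createMemory({
        content: { text: fact },
        roomId: message.roomId,
      });
    }
  },
};
```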

00:07:58 - Example use case: Order book evaluator

  • Evaluator looks at chats sent to an agent and extracts information about "shills" (token recommendations)
  • Uses this to build an order book and "marketplace of trust"

00:09:15 - Mapping OKai abstractions to OODA loop

  • Providers: Observe/Orient stages (merged, since the agent is a data machine)
  • Actions & response handling: Decide stage
  • Action execution: Act stage
  • Evaluators: Update state, then loop back to Decide

00:10:03 - Wrap up

  • Shaw considers making a video to explain these concepts in depth

Part 3

Watch: https://www.youtube.com/watch?v=7FiKJPyaMJI

00:00:00 - Managing large context sizes

  • State object can get very large, especially with long user posts
  • OKai uses "trim tokens" and a maximum context length (120k tokens) to cap context size
    • New models have 128k-200k context, which is a lot (equivalent to 10 YouTube videos + full conversation)
  • Conversation length is typically capped at 32 messages
    • Fact extraction allows recalling information beyond this window
    • Per-channel conversation access
  • Increasing conversation length risks more aggressive token trimming from the top of the prompt
    • Keep instructions at the bottom to avoid trimming them
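A hedged sketch of the top-trimming idea, using js-tiktoken for counting (the real trim-tokens helper may differ):

```typescript
import { encodingForModel } from "js-tiktoken";

function trimFromTop(prompt: string, maxTokens: number): string {
  const enc = encodingForModel("gpt-4");
  const tokens = enc.encode(prompt);
  if (tokens.length <= maxTokens) return prompt;
  // Keep the LAST maxTokens tokens so the bottom of the prompt
  // (where the instructions live) survives the cut.
  return enc.decode(tokens.slice(tokens.length - maxTokens));
}
```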

00:01:53 - Billing costs for cloud/GPT models

Q: What billing costs have you experienced with cloud/GPT model integration?
A:

  • OpenRouter has a few always-free models, limited to 8k context and rate-limited
    • Plan to re-implement and use these for the tiny "check" model, with fallback handling for rate limits
  • An 8k context is unlikely to make a good agent; a smaller model is preferable to the largest 8k-context one
  • Locally-run models are free on MacBooks with 16GB RAM, but not feasible for Linux/AMD users

00:03:35 - Cost management strategies

  • Very cost-scalable depending on model size
  • Use very cheap model (1000x cheaper than GPT-4) for should_respond handler
    • Runs AI on every message, so cost is a consideration
  • Consider running a local Llama 3B model for should_respond to minimize costs
    • Only pay for valid generations
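A sketch of that routing; `generateShouldRespond` and `ModelClass` follow the naming discussed in these streams, but the import path and signature are assumptions:

```typescript
import { generateShouldRespond, ModelClass } from "@okai/core"; // path is an assumption

async function shouldReply(runtime: any, context: string) {
  // This check runs on every message, so it gets the cheapest model class.
  return generateShouldRespond({ runtime, context, modelClass: ModelClass.SMALL });
}
```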

00:04:32 - Model provider and class configuration

  • ModelProvider class with ModelClass (small, medium, large, embedding)
  • Configured in models.ts
  • Example: OpenAI small = GPT-4o-mini, medium = GPT-4o
  • Approach: Check whether a model class can handle everything in under 8k context
    • If yes (e.g., should_respond), default to the free tier
    • Else, use the big models
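The shape being described is roughly this (illustrative; the repo's actual keys and model names may differ):

```typescript
enum ModelClass {
  SMALL = "small",
  MEDIUM = "medium",
  LARGE = "large",
  EMBEDDING = "embedding",
}

// Per-provider mapping from model class to a concrete model, as in models.ts.
const openAIModels: Record<ModelClass, string> = {
  [ModelClass.SMALL]: "gpt-4o-mini",        // cheap default when <8k context suffices
  [ModelClass.MEDIUM]: "gpt-4o",
  [ModelClass.LARGE]: "gpt-4o",
  [ModelClass.EMBEDDING]: "text-embedding-3-small",
};
```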

00:06:23 - Fine-tuned model support

  • Extend ModelProvider to support fine-tuned instances of small Llama models for specific tasks
  • In progress, to be added soon
  • Model endpoint override exists; will add per-model provider override
    • Allows pointing small model to fine-tuned Llama 3.1B for should_respond

00:07:10 - Avoiding cringey model loops

  • Fine-tuning is a form of anti-slop (avoiding low-quality responses)
  • For detecting cringey model responses, use the "boredom provider"
    • Has a list of cringe words; if detected, agent disengages
  • A JSON file exists listing words that are disproportionately frequent in the dataset
    • To be shared for a more comprehensive solution
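As a toy sketch of the check (the real list lives in the JSON file mentioned above; these entries are made up):

```typescript
const CRINGE_WORDS = ["synergy", "game-changer", "paradigm"]; // illustrative entries

// If recent messages trip the list, the provider nudges the agent to disengage.
function boredomHint(recentText: string): string {
  const hits = CRINGE_WORDS.filter((w) => recentText.toLowerCase().includes(w));
  return hits.length > 0
    ? "The conversation is getting cringey; consider disengaging."
    : "The conversation seems fine; stay engaged.";
}
```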

Part 4

Watch: https://www.youtube.com/watch?v=ZlzZzDU1drM

00:00:00 - Setting up an autonomous agent loop

Q: How do I set up an agent to constantly loop and explore based on objectives/goals?
A: Create a new "autonomous" client (sketched after this list):

  1. Initialize with just the runtime (no Express app needed)
  2. Set a timer to call a step function every 10 seconds
  3. In the step function:
    • Compose state
    • Decide on action
    • Execute action
    • Update state
    • Run evaluators
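A minimal sketch of such a client, assuming runtime methods named along the lines discussed in these streams (`composeState`, `processActions`, `evaluate`); `decideAction` stands in for the generation step, and `makeAutoMessage` is defined in the message-object sketch further below:

```typescript
declare function makeAutoMessage(runtime: any): any;                    // defined below
declare function decideAction(runtime: any, state: any): Promise<any>;  // hypothetical generation step

class AutoClient {
  private timer: ReturnType<typeof setInterval>;

  // 1. Initialize with just the runtime; no Express app needed.
  constructor(private runtime: any) {
    // 2. Step every 10 seconds.
    this.timer = setInterval(() => void this.step(), 10_000);
  }

  // 3. One pass through the loop.
  private async step(): Promise<void> {
    const message = makeAutoMessage(this.runtime);
    const state = await this.runtime.composeState(message);    // compose state
    const response = await decideAction(this.runtime, state);  // decide on action
    await this.runtime.processActions(message, [response]);    // execute action
    await this.runtime.evaluate(message, state);                // update state, run evaluators
  }

  stop(): void {
    clearInterval(this.timer);
  }
}
```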

00:01:56 - Creating an auto template

  • Create an autoTemplate with agent info (bio, lore, goals, actions)
  • Prompt: "What does the agent want to do? Your response should only be the name of the action to call."
  • Compose state using runtime.composeState
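Illustratively (the placeholder names are assumptions):

```typescript
const autoTemplate = `
About {{agentName}}:
{{bio}}
{{lore}}

Current goals:
{{goals}}

Available actions:
{{actions}}

What does {{agentName}} want to do? Your response should only be the
name of the action to call.
`;
```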

00:03:38 - Passing a message object

  • Need to pass a message object with userId, agentId, content, and roomId
  • Create a unique roomId for the autonomous agent using crypto.randomUUID()
  • Set userId and agentId using the runtime
  • Set content to a default message

00:04:33 - Composing context

  • Compose context using the runtime, state, and auto template
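Roughly, covering the last two steps together (`crypto.randomUUID()` is standard Node; the `composeContext` import path is an assumption, and `autoTemplate` comes from the template sketch above):

```typescript
import crypto from "node:crypto";
import { composeContext } from "@okai/core"; // path is an assumption

declare const autoTemplate: string; // from the auto-template sketch above

// One unique room for the autonomous agent's self-talk.
const autoRoomId = crypto.randomUUID();

function makeAutoMessage(runtime: any) {
  return {
    userId: runtime.agentId,   // both sides of the loop are the agent itself
    agentId: runtime.agentId,
    roomId: autoRoomId,
    content: { text: "What do you want to do next?" }, // default message
  };
}

// Compose context from the runtime's state and the auto template.
async function buildAutoContext(runtime: any, message: any) {
  const state = await runtime.composeState(message);
  return composeContext({ state, template: autoTemplate });
}
```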

00:05:02 - Type error

  • Getting a type error: "is missing the following from type state"
  • (Transcript ends before resolution)

The key steps are:

  1. Create a dedicated autonomous client
  2. Set up a loop to continuously step through the runtime
  3. In each step, compose state, decide & execute actions, update state, and run evaluators
  4. Create a custom auto template to guide the agent's decisions
  5. Pass a properly formatted message object
  6. Compose context using the runtime, state, and auto template