discussion 2024-06-25
Summary​
In the discussion, Shaw shared a JSON dataset for training a transformer model, which included input-output pairs and digits to consider during evaluation. Yikesawjeez suggested adding a sink token to explore outcomes and recommended trying Jamba models as an educational resource, sharing a YouTube tutorial on setting up CUDA in the background. HaileyStorm noted that her AutoEncoders' performance drops significantly beyond 24x24 pixel resolutions. Metapontum discussed the potential of browser automation for AI development and its synergy with hierarchical DSL construction, while LemonLyman joined the server later in the conversation. Shaw concluded by expressing expectations about transformer performance based on their discussions.
FAQ​
-
What is the issue Shaw encountered with their transformer model?
-
Shaw: They have found a bug in their positional encoding where they are surprised that any size of transformer cannot beat the evaluation metric they're using, even though the input-output pairs seem straightforward and correct according to the provided JSON data.
-
What suggestion did yikesawjeez make for potentially resolving Shaw's issue?
- yikesawjeez: They suggested adding a sink token to see what happens as it might help in identifying or fixing the bug that Shaw is encountering with their transformer model.
-
How does HaileyStorm relate their experience with AutoEncoders to the discussion on transformers?
- HaileyStorm: They realized that their AutoEncoders perform well up until a certain size (20x20), after which performance drops significantly, indicating potential issues or limitations in handling larger input sizes. This could be relevant for understanding how transformer models might behave with different input dimensions and the importance of positional encoding.
-
What is metapontum's perspective on browser automation?
- metapontum: They believe there is significant immediate potential in browser automation, especially when it comes to constructing a hierarchical Domain Specific Language (DSL) that can iteratively build complex sequences without manual intervention. This approach could synergize with other areas and potentially improve the efficiency of tasks like identifying specific buttons on websites.