discussion 2024-06-20
Summary​
In the discussion, Shaw highlighted challenges in generating sequences that fit given examples using a program to find functions for future inputs, likening it to finding a polynomial that fits multiple points. The team considered augmenting training sets with varying difficulty levels of sequence generation, including noisy sinusoidal patterns. Sidbin and Yikesawjeez joined the server, welcomed by Shaw. A significant technical decision was made when Shaw merged start and end tokens for better model convergence, which led to a loss below 1 after removing label smoothing from the CrossEntropyLoss function. The team also recognized an oversight in treating input pairs as separate entities rather than grouped based on relationships, with AutoMeta suggesting a perception-based approach using one token at a time and Arch2230 proposing to reformulate their problem-solving strategy by learning mathematical relationships between candidate solutions. Shaw proposed the idea of generating convolutional kernels dynamically within a CNN framework but acknowledged the gap between feasibility and completion, mentioning an update with position encoding in the dataset. Loveofdoing suggested using an Autoencoder for prediction as another potential direction.
FAQ​
-
What is the main issue with treating every input outpair pair as separate in Shaw's work?
-
[loveofdoing]: The problem lies in not learning the relationship between input-output pairs, which could lead to a lack of understanding of how different inputs relate to each other and affect outputs. This approach might miss patterns or relationships that are crucial for accurate predictions or classifications.
-
How can we improve the training set augmentation process?
- [shaw]: By generating sequences with varying levels of difficulty, including both obvious examples like sinusoidal functions with noise and more complex ones, we could greatly enhance the diversity and richness of our training data. This would help in creating a model that can generalize better to unseen data by exposing it to a wide range of scenarios during training.
-
What is AutoMeta's approach for handling input-output pairs?
- [AutoMeta]: The method involves treating each token as an individual unit, similar to FastText, and essentially trains a classifier that assigns labels (words) based on the relationships between arrays of tokens. This allows the model to learn from sequences by understanding how different combinations of tokens relate to specific outcomes or categories.
-
How can we create a network of candidate solutions for learning mathematical relationships?
- [Arch2230]: By developing a system that generates and evaluates various potential solutions, it could identify why certain relationships are invalid based on the underlying mathematics. This approach would involve creating an adaptive framework capable of refining its understanding over time as more data is processed, ultimately leading to improved problem-solving capabilities.
-
What challenges exist in generating a program that creates other programs?
- [shaw]: The main challenge lies in the complexity and variability of programming tasks, which require an extensive range of knowledge and skills. While it may be theoretically possible to learn such a "meta" program, practical implementation would demand significant advancements in AI research and development. Additionally, ensuring that generated programs are both efficient and reliable remains a considerable hurdle.
Who Helped Who​
- shaw helped loveofdoing with understanding sequence generation by explaining how a polynomial could fit multiple examples, suggesting augmentation to improve training sets.
- AutoMeta helped Arch2230 with conceptualizing an approach for creating candidate solutions and learning mathematical relationships behind invalid relations in sequences.
- shaw helped himself (and indirectly others) with debugging his code by removing label smoothing from the loss function, which led to convergence of the model's training process.
Action Items​
- Technical Tasks
- Generate a program that fits given examples and predicts future inputs (shaw)
- Merge start and end tokens in the dataset (shaw)
- Remove label_smoothing from criterion to achieve convergence (shaw)
- Address fundamental issue of treating every input pair as separate instead of grouped (shaw, loveofdoing)
- Explore creating a network of candidate solutions and learn the mathematical relationship behind invalid relationships (Arch2230)
- Documentation Needs
- Update dataset to pack in position encoding (shaw)
- Feature Requests
- Generate kernel on the fly for CNN approach (shaw)
- Consider using an Autoencoder for prediction (loveofdoing)