Skip to main content

discussion 2024-07-02

Summary​

In the Discord chat, Shaw shared their experience with training a sequence model for solving arc challenges using GPT-generated code but reverted to transformer training due to difficulties in handling class imbalance. They suggested shuffling tokens and creating color maps that can be applied back and forth during inference as potential solutions. SSamuel discussed his simpler version of the arc challenge, which includes sparse binary matrices and transformations like translation, rotations, and mirrors; however, he noted his model struggled with predicting rotate_90 transformations despite high accuracy in other areas. Shaw advised that more training and data might be needed to improve performance. The chat also included discussions on the potential of ViT's image patch embedding being analogous to a wavelet transform and references to research papers related to Singular Learning Theory (SLT) and AGI progress.

FAQ​

  • How do you deal with class imbalance in your model?
  • SSamuel: He suggested using a shuffle technique on the tokens and creating color maps that can be applied back and forth to address the issue of class imbalance, specifically for his case where classes are colors.
  • Are you training or using GPT to generate code?
    • Shaw: Initially mentioned having made a gpt coder but later clarified they were back to training the transformer model.
  • How many epochs do you use in your experiments?
    • SSamuel: He asked this question, and it was not directly answered within the chat transcript provided.
  • Do you mask the input from the loss or train with all tokens?
    • SSamuel: This technical question about training methodology was posed by SSamuel but did not receive a direct answer in the given conversation.
  • Is your dataset solvable, and how do you ensure it is?
    • SSamuel: He mentioned running a DFS (Depth First Search) to confirm that his data should be solvable, indicating he ensures solvability through algorithmic checks.

Who Helped Who​

  • SSamuel helped Shaw with addressing class imbalance in a sequence model by suggesting techniques like shuffling tokens, creating color maps for classes, and applying transformations to data. This advice aimed at improving the model's ability to learn from diverse examples.
  • Shaw helped SSamuel understand his model's difficulty with solving simpler versions of arc challenges by sharing insights on training strategies and confirming that all data is solvable. They discussed potential issues like sparse binary matrices, transformations, and sequence models' capabilities in handling such tasks.

Action Items​

  • Technical Tasks

  • Train a sequence model to solve arc challenges, specifically addressing class imbalance issues (mentioned by SSamuel)

  • Shuffle tokens and apply color maps back and forth during training to handle class imbalances in the data (suggested by Shaw)

  • Pre-compute a thousand maps for swapping classes without losing original information, ensuring reversibility post-inference (suggested by Shaw)

  • Investigate if all input data is solvable and ensure that the model can solve simpler binary matrix transformations like translation, rotation, and mirroring (mentioned by SSamuel)

  • Improve prediction accuracy for rotate_90 transformation in Jamba model with 6 layers (identified issue by SSamuel)

  • Documentation Needs

    • No specific documentation needs were explicitly requested.
  • Feature Requests

    • Develop a GPT coder to assist in training the transformer, as an alternative or supplementary tool for generating code (mentioned by Shaw)
  • Community Tasks

    • Share and discuss results of experiments with simpler versions of arc challenges within the community, including performance metrics like accuracy for different transformations (committed by SSamuel)