• Benchmarked various learning-from-human-feedback methods and studied their reward overoptimization problem
  • Introduced an efficient weighted decoding method that aligns generated text to a given attribute with a uni-directional reward model (see the first sketch below)
  • Explored how language models acquire knowledge during pre-training and how their QA performance relates to the pre-training data
  • Analyzed language model hallucination and traced wrong answers back to the training corpus (see the second sketch below)
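
A minimal sketch of the weighted-decoding idea referenced in the second bullet, assuming a left-to-right (uni-directional) reward model that can score token prefixes; `reward_fn`, `beta`, and the top-k pruning are illustrative assumptions, not the exact method:

```python
import torch

def weighted_decode_step(lm_logits, reward_fn, prefix_ids, k=20, beta=1.0):
    """One decoding step: rescore the LM's top-k candidate tokens
    with an attribute reward computed on each extended prefix."""
    logprobs = torch.log_softmax(lm_logits, dim=-1)   # LM distribution
    top_lp, top_ids = logprobs.topk(k)                # prune to top-k candidates
    # A uni-directional reward model scores prefixes left-to-right, so the
    # shared prefix computation can be cached and reused across candidates.
    rewards = torch.stack([
        reward_fn(torch.cat([prefix_ids, tok.view(1)]))
        for tok in top_ids
    ])
    scores = top_lp + beta * rewards                  # weighted combination
    return top_ids[scores.argmax()]                   # pick the best token
```

Because the reward model is uni-directional, scoring each candidate reduces to one extra forward step over a cached prefix, which is what makes this kind of rescoring cheap relative to re-encoding every candidate continuation from scratch.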
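
And a minimal sketch of tracing a hallucinated answer back to the pre-training data, as mentioned in the last bullet: count simple string co-occurrences between a question's subject entity and each candidate answer. `corpus_docs`, `count_cooccurrences`, and the example strings are hypothetical.

```python
from collections import Counter

def count_cooccurrences(corpus_docs, subject, answers):
    """Count documents in which the question's subject entity
    co-occurs with each candidate answer string."""
    hits = Counter({ans: 0 for ans in answers})
    for doc in corpus_docs:
        text = doc.lower()
        if subject.lower() not in text:
            continue                      # subject absent, skip document
        for ans in answers:
            if ans.lower() in text:
                hits[ans] += 1            # co-occurrence found
    return hits

# Hypothetical usage: does the wrong answer "Sydney" co-occur with
# "Australia" more often than the gold answer "Canberra" in the corpus?
# hits = count_cooccurrences(docs, "Australia", ["Canberra", "Sydney"])
```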