- Benchmarked various Learning from Human Feedback methods and studied their reward overoptimization problem
- Introduced an efficient weighted decoding method that aligns generated text with a given attribute using a uni-directional reward model (a minimal sketch follows this list)
- Explored how language models acquire knowledge and how their QA performance relates to their pre-training data
- Analyzed language model hallucination and traced incorrect answers back to the training corpus
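
Below is a minimal sketch of attribute-aligned weighted decoding with a uni-directional reward model, illustrating the general idea behind the second bullet rather than the exact method introduced there. At each step the language model's top-k next-token candidates are re-weighted by the candidate reward, i.e. p(x) ∝ p_LM(x) · exp(β · r(x)). The `reward` function here is a zero-valued stub (so the code runs standalone and reduces to plain top-k sampling); the choice of `gpt2`, `top_k`, and `beta` are illustrative assumptions, not values from the original work.

```python
# Sketch of attribute-aligned weighted decoding. Assumptions: the reward
# model is causal (uni-directional) and returns a scalar attribute score
# per token prefix; here it is a stub so the file runs on its own.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

def reward(prefix_ids: torch.Tensor) -> torch.Tensor:
    """Placeholder attribute scorer: one scalar per candidate sequence.
    A real uni-directional reward model would go here; with all-zero
    rewards the procedure degenerates to ordinary top-k sampling."""
    return torch.zeros(prefix_ids.size(0), device=prefix_ids.device)

@torch.no_grad()
def weighted_decode(prompt: str, max_new_tokens: int = 40,
                    top_k: int = 20, beta: float = 1.0) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    for _ in range(max_new_tokens):
        logits = lm(ids).logits[:, -1, :]          # LM next-token logits
        topk = torch.topk(logits, top_k, dim=-1)   # restrict to top-k
        # Build each candidate continuation and score it with the reward model.
        cand = torch.cat([ids.repeat(top_k, 1),
                          topk.indices.view(-1, 1)], dim=1)
        r = reward(cand)                           # shape: (top_k,)
        # Re-weight in log space: p(x) ∝ p_LM(x) * exp(beta * r(x)).
        probs = F.softmax(topk.values.squeeze(0) + beta * r, dim=-1)
        next_id = topk.indices[0, torch.multinomial(probs, 1)]
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(weighted_decode("The movie was"))
```

Because the reward model is uni-directional, candidate prefixes can in practice be scored incrementally with a reused KV cache, so each decoding step needs only one incremental reward-model pass over the candidate batch instead of rescoring full sequences; that caching is what makes this style of weighted decoding efficient, and it is omitted here for brevity.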