• Benchmarked various learning-from-human-feedback methods and studied their reward overoptimization problem
  • Introduced an efficient weighted decoding method that aligns generated text to a given attribute with a uni-directional reward model (see the first sketch below)
  • Explored how language models acquire knowledge during pre-training and how their QA performance relates to the pre-training data
  • Analyzed language model hallucination and traced wrong answers back to the training corpus (see the second sketch below)
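
A minimal sketch of the weighted-decoding idea referenced in the second bullet, assuming a left-to-right (uni-directional) reward model that can score token prefixes; `reward_fn`, `beta`, and the top-k pruning are illustrative assumptions, not the exact method:

```python
import torch

def weighted_decode_step(lm_logits, reward_fn, prefix_ids, k=20, beta=1.0):
    """One decoding step: rescore the LM's top-k candidate tokens
    with an attribute reward computed on each extended prefix."""
    logprobs = torch.log_softmax(lm_logits, dim=-1)   # LM distribution
    top_lp, top_ids = logprobs.topk(k)                # prune to top-k candidates
    # A uni-directional reward model scores prefixes left-to-right, so the
    # shared prefix computation can be cached and reused across candidates.
    rewards = torch.stack([
        reward_fn(torch.cat([prefix_ids, tok.view(1)]))
        for tok in top_ids
    ])
    scores = top_lp + beta * rewards                  # weighted combination
    return top_ids[scores.argmax()]                   # pick the best token
```

Because the reward model is uni-directional, scoring each candidate reduces to one extra forward step over a cached prefix, which is what makes this kind of rescoring cheap relative to re-encoding every candidate continuation from scratch.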
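
And a minimal sketch of tracing a hallucinated answer back to the pre-training data, as mentioned in the last bullet: count simple string co-occurrences between a question's subject entity and each candidate answer. `corpus_docs`, `count_cooccurrences`, and the example strings are hypothetical.

```python
from collections import Counter

def count_cooccurrences(corpus_docs, subject, answers):
    """Count documents in which the question's subject entity
    co-occurs with each candidate answer string."""
    hits = Counter({ans: 0 for ans in answers})
    for doc in corpus_docs:
        text = doc.lower()
        if subject.lower() not in text:
            continue                      # subject absent, skip document
        for ans in answers:
            if ans.lower() in text:
                hits[ans] += 1            # co-occurrence found
    return hits

# Hypothetical usage: does the wrong answer "Sydney" co-occur with
# "Australia" more often than the gold answer "Canberra" in the corpus?
# hits = count_cooccurrences(docs, "Australia", ["Canberra", "Sydney"])
```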