🎯 Situation
AppliedXL builds tools for accessing and interpreting real-time clinical trial data. Their mission is to help global biotech leaders make data-driven decisions, using machine learning and Generative AI for quicker, more nuanced clinical intelligence. AppliedXL partnered with Loka to kickstart their GenAI work on Amazon SageMaker as part of Loka’s GenAI Workshop, a six-week program that streamlines the design and deployment of LLM systems on AWS. The collaboration aimed to accelerate AppliedXL’s work within an AWS environment by tapping into Loka’s expertise in fine-tuning LLMs with GPU-optimized training on Amazon SageMaker.
🗒️ Task
AppliedXL’s GenAI use case was one of the most intricate Loka had encountered. The key challenges were:
- Platform Design: AppliedXL needed a platform that enabled domain experts to contribute their journalistic preferences. Loka determined that this could be achieved with a holistic labeling platform that let domain experts manually write ideal answers, rank multiple LLM generations, and iterate on the evaluation rubric.
- Framework Development: AppliedXL needed a framework that would allow the LLM to incorporate human preferences when producing outputs. Loka selected Reinforcement Learning from Human Feedback (RLHF) because the priority was to align the LLM with subjective criteria that are best captured through human preferences, which is exactly where RLHF excels.
🧩 Action
Diverging from more common Retrieval-Augmented Generation (RAG) use cases, this project demanded precise technical execution and deep knowledge of Amazon SageMaker.
During the discovery portion of the GenAI Workshop, Loka determined that the primary goal was framework development. To meet this objective, Loka established an end-to-end RLHF workflow tailored to AppliedXL’s unique data. The strategy was driven by the following actions:
- Preference Dataset: Creation of a dataset composed of multiple LLM generations for the same prompts, ranked by domain experts according to their journalistic preferences (an illustrative record is sketched after this list).
- Custom Reward Model: Using the curated preference dataset, Loka trained a reward model from the supervised fine-tuned (SFT) Llama2-7B that learned to distinguish better from worse generations according to the experts’ preferences. Training was optimized on Amazon SageMaker with multi-GPU training, state-of-the-art quantization techniques, and PEFT methods such as LoRA (see the training sketch below).
- Reinforcement Learning with PPO: Loka further fine-tuned the SFT Llama2-7B model with Proximal Policy Optimization (PPO) so that it generates responses that maximize the reward model’s scores. This is a compute-intensive process, so Loka again leveraged multi-GPU training, quantization, and PEFT to make the most of the available resources on Amazon SageMaker (see the PPO sketch below).
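For illustration, a single record in such a preference dataset pairs a prompt with a higher-ranked ("chosen") and a lower-ranked ("rejected") generation. The field names, trial details, and wording below are hypothetical, not AppliedXL’s actual schema or data:

```python
# Hypothetical preference record: two generations for the same prompt, ranked by
# a domain expert. Field names, trial details, and wording are illustrative only.
preference_record = {
    "prompt": "Summarize the latest protocol amendment for this Phase 3 oncology trial.",
    # Generation the expert ranked higher: specific and journalistically useful.
    "chosen": (
        "The sponsor narrowed the primary endpoint to overall survival and cut "
        "target enrollment from 1,200 to 900 patients, citing slow accrual."
    ),
    # Generation the expert ranked lower: vague and uninformative.
    "rejected": "The trial was changed. Enrollment and endpoints were updated.",
}
```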
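A reward model of this kind can be trained with the open-source Hugging Face TRL library. The sketch below shows the general shape of such a run, assuming an SFT Llama2-7B checkpoint, 4-bit quantization, and a LoRA adapter; the checkpoint path, data file, and hyperparameters are placeholders, argument names vary across TRL versions, and this is an illustration of the technique rather than the actual project code.

```python
# Minimal sketch of reward-model training with Hugging Face TRL + PEFT.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          BitsAndBytesConfig)
from trl import RewardConfig, RewardTrainer

SFT_CHECKPOINT = "s3://my-bucket/llama2-7b-sft"  # hypothetical SFT Llama2-7B location

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT)
tokenizer.pad_token = tokenizer.eos_token

# Load the 7B base in 4-bit so it fits comfortably on the training GPUs.
model = AutoModelForSequenceClassification.from_pretrained(
    SFT_CHECKPOINT,
    num_labels=1,  # the reward model emits a single scalar score per sequence
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA: train small low-rank adapters instead of all 7B parameters.
peft_config = LoraConfig(task_type="SEQ_CLS", r=16, lora_alpha=32, lora_dropout=0.05)

# Expects "chosen"/"rejected" columns like the record sketched above; recent TRL
# versions tokenize these internally, older ones expect pre-tokenized pairs.
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(
        output_dir="reward-model",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        max_length=1024,
    ),
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```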
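The PPO stage can likewise be sketched with TRL’s PPOTrainer (pre-0.12 API): the SFT model, wrapped with a value head, generates a response to each prompt, the reward model scores it, and a PPO step nudges the policy toward higher-scoring generations while a KL penalty keeps it close to the SFT reference. Again, the paths, the sample prompt, and hyperparameters are placeholders, not project specifics.

```python
# Minimal sketch of the PPO stage with TRL's PPOTrainer (pre-0.12 API).
import torch
from peft import LoraConfig
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

SFT_CHECKPOINT = "s3://my-bucket/llama2-7b-sft"  # hypothetical SFT Llama2-7B location
REWARD_CHECKPOINT = "reward-model"               # output of the reward-model step

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT)
tokenizer.pad_token = tokenizer.eos_token

# Policy: the SFT model with a scalar value head, quantized and wrapped with LoRA.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(
    SFT_CHECKPOINT,
    load_in_4bit=True,
    device_map={"": 0},  # single-GPU sketch; multi-GPU is handled by Accelerate/SageMaker
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
)

# Reward model served as a text-classification pipeline returning a raw scalar score.
reward_pipe = pipeline("text-classification", model=REWARD_CHECKPOINT, tokenizer=tokenizer)

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=1, mini_batch_size=1, learning_rate=1.4e-5),
    model=policy,
    ref_model=None,  # TRL keeps a frozen reference copy for the KL penalty
    tokenizer=tokenizer,
)

prompts = ["Summarize the enrollment changes announced for trial NCT00000000."]  # placeholder
for prompt in prompts:
    query = tokenizer(prompt, return_tensors="pt").input_ids[0].to(ppo_trainer.accelerator.device)
    # Generate a response with the current policy.
    response = ppo_trainer.generate([query], return_prompt=False, max_new_tokens=128)[0]
    text = prompt + tokenizer.decode(response, skip_special_tokens=True)
    # Score the full prompt + response with the reward model.
    reward = torch.tensor(reward_pipe(text, function_to_apply="none")[0]["score"])
    # One PPO step: push the policy toward higher-reward generations while the
    # KL penalty keeps it close to the SFT reference model.
    ppo_trainer.step([query], [response], [reward])
```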
🚀 Results
By the end of the project, Loka and AppliedXL achieved the following outcomes:
- Custom Tool Development: Loka evaluated several open-source labeling and annotation tools and provided recommendations for AppliedXL’s use case. This fueled AppliedXL’s work on an in-house labeling platform designed for their unique requirements.
- RLHF Workflow: Loka created an end-to-end RLHF workflow, optimized for Amazon SageMaker, that aligns LLMs to preference datasets. This workflow gives AppliedXL a solid foundation for their ultimate goal of developing LLMs that follow their core journalistic principles.
- Avoided Project Delays and GenAI Abandonment: Reflecting on the collaboration, AppliedXL shared that they would likely have faced lengthy delays and considered shelving the RLHF project due to its extreme challenges. Indeed, RLHF is still an under-explored domain with limited documentation and potential library incompatibilities that pose significant challenges. Having a dedicated team helped carry the project to completion.