Many of Loka's clients come from the biotech industry, particularly in drug discovery, where the goal is to find and develop new therapeutic drugs. Drug development is a costly and time-consuming process, often taking up to 15 years to bring a new drug to market. At Loka, we help clients speed up this process by optimizing their AI pipelines and applying state-of-the-art models, such as large language models. This project specifically focuses on using DNA-encoded libraries, a technology we have explored in collaboration with a biotech client.
The KinDEL dataset contains DNA-encoded library (DEL) data for two kinase targets: DDR1 and MAPK14. The data is available on GitHub, along with a set of benchmark models and evaluation scripts. The test set features data from real binding experiments, where binding affinities (Kd values) were measured. The dataset also includes predefined splits using two methods: a random split and a disynthon (building block) split that groups compounds based on shared building blocks. For a more detailed description of the dataset, please refer to the accompanying paper.
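To make the idea behind a building-block split concrete, here is a minimal sketch of a group-wise split in plain Python. It is not the KinDEL implementation: the field names `bb_a`, `bb_b`, and `bb_c`, the function `disynthon_split`, and the toy compounds are all hypothetical. It also assumes a disynthon is identified by the pair of building blocks at the first two positions. The predefined splits shipped with the dataset should be preferred for any benchmark comparison.

```python
import random
from collections import defaultdict


def disynthon_split(compounds, test_fraction=0.2, seed=0):
    """Split compounds so that no (bb_a, bb_b) pair occurs in both sets.

    `compounds` is a list of dicts with hypothetical keys "bb_a" and "bb_b"
    naming the building blocks that define the disynthon (an assumption).
    `test_fraction` applies to the number of groups, not of compounds.
    """
    # Collect compounds that share the same building-block pair.
    groups = defaultdict(list)
    for compound in compounds:
        groups[(compound["bb_a"], compound["bb_b"])].append(compound)

    # Shuffle group keys reproducibly, then hold out whole groups.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys, train_keys = keys[:n_test], keys[n_test:]

    train = [c for key in train_keys for c in groups[key]]
    test = [c for key in test_keys for c in groups[key]]
    return train, test


# Toy library with invented building-block labels, for illustration only.
compounds = [
    {"id": i, "bb_a": f"A{i % 3}", "bb_b": f"B{i % 4}", "bb_c": f"C{i % 5}"}
    for i in range(60)
]
train, test = disynthon_split(compounds)
```

Because whole groups are held out, every test compound carries a building-block pair the model never saw during training, which is generally a harder test of generalization than a random split.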
The objective of this challenge is to build a machine learning model that predicts enrichment scores of DEL compounds and to evaluate how well it generalizes to real binding affinities (Kd). Some benchmark models are already provided in the GitHub repository, so we encourage candidates to explore novel architectures and methods for representing the molecules. Feel free to follow the evaluation strategy outlined in the paper, or come up with a new one.
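One way to check how predicted enrichment relates to measured affinity is a rank correlation, since enrichment scores and Kd values live on different scales. The sketch below computes Spearman's rho with the standard library only; in practice `scipy.stats.spearmanr` does the same job. The function names and the numbers are invented for illustration, and whether this matches the evaluation strategy in the paper should be checked against the paper itself. Kd is converted to pKd = -log10(Kd) so that larger values mean stronger binding, which is an assumption about how the affinities are reported (here, in molar units).

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values, not taken from the dataset.
predicted_enrichment = [5.1, 3.2, 0.4, 2.8, 0.1]
kd_molar = [2e-9, 5e-8, 1e-5, 8e-8, 3e-5]

# Larger pKd means tighter binding, so a useful model should give rho > 0.
pkd = [-math.log10(kd) for kd in kd_molar]
rho = spearman(predicted_enrichment, pkd)
```

If the correlation is computed against raw Kd instead of pKd, the sign flips: a good model then shows a negative rho, because stronger binders have smaller Kd.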
GitHub repo: Please prepare a GitHub repository that showcases your approach to developing a machine learning model for DEL-ligand data. The repository should include code, documentation, and any instructions needed to understand and reproduce your work.
Discussion and Future Work: Provide a critical analysis of your results, interpreting the significance of your findings in relation to your objectives. Suggest potential improvements, alternative approaches, or future directions that could enhance the model or address any unresolved issues.