Review| Volume 26, ISSUE 1, P3-15, February 2021

Artificial Intelligence Effecting a Paradigm Shift in Drug Development

  • Masturah Bte Mohd Abdul Rashid
    Correspondence
    Corresponding Author: Masturah Bte Mohd Abdul Rashid, KYAN Therapeutics, NUS Enterprise@Singapore Science Park, The Curie, #02-03/04, 83 Science Park Drive, Singapore, 118258, Singapore. Email: mrashid@kyantherapeutics.com
    Affiliations
    KYAN Therapeutics, Singapore, Singapore

      Abstract

      The inverse relationship between the cost of drug development and the successful integration of drugs into the market has resulted in the need for innovative solutions to overcome this burgeoning problem. This problem could be attributed to several factors, including the premature termination of clinical trials, regulatory factors, or decisions made in the earlier drug development processes. The introduction of artificial intelligence (AI) to accelerate and assist drug development has resulted in cheaper and more efficient processes, ultimately improving the success rates of clinical trials. This review aims to showcase and compare the different applications of AI technology that aid automation and improve success in drug development, particularly in novel drug target identification and design, drug repositioning, biomarker identification, and effective patient stratification, through exploration of different disease landscapes. In addition, it will also highlight how these technologies are translated into the clinic. This paradigm shift will lead to even greater advancements in the integration of AI in automating processes within drug development and discovery, enabling the probability and reality of attaining future precision and personalized medicine.



      Introduction

      Drug development is a time-consuming and expensive process, with a duration ranging from 12 to 15 years
      • DiMasi J.A.
      • Feldman L.
      • Seckler A.
      • et al.
      Trends in Risks Associated with New Drug Development: Success Rates for Investigational Drugs.
      and an average cost of about US$2.6 billion.
      • DiMasi J.A.
      • Feldman L.
      • Seckler A.
      • et al.
      Trends in Risks Associated with New Drug Development: Success Rates for Investigational Drugs.
      At the same time, failure rates of clinical trials exceed 90% after preclinical studies, and only 14% of compounds clear clinical trials. Because of the high attrition rate across the entire drug development pipeline, there is a very low probability that a newly identified molecule will result in an approved medicine: only 10% of small molecules within the industry transition to candidate status, with the rest failing at multiple stages.
      • Kola I.
      • Landis J.
      Can the Pharmaceutical Industry Reduce Attrition Rates?.
      This could be due to several factors, including, but not limited to, a lack of developable hits from high-throughput screening (HTS), the inability to configure a reliable assay, toxicity in preclinical studies, off-target effects, and the inability to obtain a good pharmacodynamic or pharmacokinetic profile.
      • DiMasi J.A.
      • Feldman L.
      • Seckler A.
      • et al.
      Trends in Risks Associated with New Drug Development: Success Rates for Investigational Drugs.
      ,
      • Kola I.
      • Landis J.
      Can the Pharmaceutical Industry Reduce Attrition Rates?.
      Coupled with these factors, the high cost of failure has prompted alternative approaches that make the processes at each step of drug development more efficient and accelerate the entire pipeline. This has paved the way toward the integration of artificial intelligence (AI) within these processes. AI in drug development aims to effectively analyze huge amounts of data and project better solutions based on these learned data. In this review, we look into the different permutations of AI used specifically in the drug discovery stage (from target to hit and lead optimization) and preclinical research, compare and contrast the AI used in the different stages, highlight platforms that are at the forefront of these AI applications, and present current hurdles of AI implementation in drug discovery and development.

      Types of AI in Drug Development

      Historically, computational approaches have been used in drug development, particularly for rational drug discovery, where virtual screening (VS) has been widely used as a guide.
      • Scior T.
      • Bender A.
      • Tresadern G.
      • et al.
      Recognizing Pitfalls in Virtual Screening: A Critical Review.
      The quantitative structure-activity relationship (QSAR) was developed for VS to generate models based on the hypothesis that similar compounds have similar activities. Hence, the models generated were based on molecular structures and target activities (e.g., physicochemical properties, therapeutic activities, and pharmacokinetic properties).
      • Lavecchia A.
      Machine-Learning Approaches in Drug Discovery: Methods and Applications.
      However, with the expansion of the search space and more HTS assays, the traditional computational approaches for QSAR are not suitable for big data.
      Generally, AI is the ability of a machine to perform tasks in response to a learned range of environments, aiming to replicate human intelligence. The term was first coined in 1956 by John McCarthy to describe the integration between engineering and science to generate intelligent machines.
      • McCarthy J.
      • Hayes P.J.
      Some Philosophical Problems from the Standpoint of Artificial Intelligence.
      Even though the concept dates back many decades, there have been booms and busts in the development of AI since then. At its peak in the 1980s, advancements in AI led to mathematical models such as the multilayer feed-forward neural network and the back-propagation algorithm. This led to the victory of IBM’s Deep Blue against chess legend Garry Kasparov, which paved the way toward other advances in AI.
      • Korf R.E.
      Does Deep Blue Use AI?.
      ,
      • DeCoste D.
      The Future of Chess-Playing Technologies and the Significance of Kasparov versus Deep Blue..
      The most common permutations within AI are machine learning (ML) and deep learning (DL), with the latter built on the foundations of the former. Briefly defined, ML involves algorithms that learn from past input data to improve future decision making, without reprogramming or human intervention. The algorithms become more robust as the quantity and quality of the data increase. There are two main types of ML: supervised and unsupervised. Supervised learning involves developing predictive models from labeled data, in which both the input and output variables are known. Unsupervised learning involves identifying intrinsic patterns within data and clustering them in a meaningful manner. Examples of ML include support vector machines,
      • Schölkopf B.
      • Tsuda K.
      • Vert J.-P.
      Support Vector Machine Applications in Computational Biology.
      decision trees, k-nearest neighbors, naïve Bayesian methods, and artificial neural networks (NN), summarized in Table 1. DL is the next generation of artificial NN. In an artificial NN, input data are fed into an input layer and transformed within the hidden layers, and the predictions are generated in the output layer. DL is more complex than its predecessor, integrating multiple layers of learning from massive data sets, with each layer providing a more refined classification to improve the accuracy of the model. Building on the foundation of artificial NN, DL carries out large-scale data extraction by processing data through multilayered deep NN. Some examples include convolutional NN, stacked autoencoders, deep belief networks, recurrent NN, and Boltzmann machines, summarized in Table 2. DL has been largely applied in image recognition and video and sound analyses, but there has been growing interest in the application of DL in drug development.
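The flow from input layer through hidden layers to output layer can be sketched in a few lines. This is only an illustrative sketch: the layer sizes, the weights, and the descriptor values are hypothetical and chosen by hand, not trained and not taken from any study cited here.

```python
import math

def relu(z):
    """Rectified linear activation for hidden units."""
    return max(0.0, z)

def sigmoid(z):
    """Squash the output into (0, 1), e.g. a probability of activity."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Pass x through each (weights, biases, activation) layer in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical, hand-picked parameters (assumption: 3 inputs, 2 hidden units, 1 output).
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1], relu),  # hidden layer
    ([[1.0, -1.0]], [0.0], sigmoid),                           # output layer
]
descriptors = [1.0, 2.0, 3.0]  # made-up molecular descriptor values
prediction = forward(descriptors, layers)
```

A deep network in this picture is simply a longer `layers` list; training (for example by back-propagation) is what would set the weights, and it is deliberately left out of the sketch.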
      Table 1. Permutations of Machine Learning Used.
      Type of Machine Learning | Function | References
      Support vector machines | Facilitate compound classification by constructing separating lines to distinguish objects | Vapnik V., The Nature of Statistical Learning Theory; Vapnik V., Statistical Learning Theory
      Decision trees | Associating specific molecular features and/or descriptor values (branching) with property of interest (leaf node). Ensemble methods are employed for pruning algorithms | Quinlan J.R., Induction of Decision Trees
      k-nearest neighbors | Predictions made based on nearest training examples in the feature space (local approximation) for classification | Altman N.S., An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression
      Naïve Bayesian methods | Describing the probability of an event that might have been the result of any two or more causes, by providing a mathematical rule explaining how a hypothesis changes with new evidence | Nielsen T.D., Jensen F.V., Bayesian Networks and Decision Graphs; Dempster A.P., A Generalization of Bayesian Inference
      Artificial neural networks (forward, backward, random, and self-organized maps) | Flexible, nonlinear regression models that attempt to model brain structure by incorporating multilayered neurons | Haykin S., Neural Networks: A Comprehensive Foundation
      Table 2. Permutations of Deep Learning Used.
      Type of Deep Learning | Function | References
      Convolutional NN | Contains many convolutional layers and subsampling layers embedded within these, where local conjunctions of features are detected from these layers | LeCun Y., Boser B.E., Denker J.S., et al., Handwritten Digit Recognition with a Back-Propagation Network; LeCun Y., Bottou L., Bengio Y., et al., Gradient-Based Learning Applied to Document Recognition
      Recurrent NN | Reconciling sequential data where parameters across the different steps of the model are shared | Schmidhuber J., Hochreiter S., Long Short-Term Memory
      Stacked autoencoders | Each layer is trained to minimize discrepancy between original and reconstructed data | Hinton G.E., Salakhutdinov R.R., Reducing the Dimensionality of Data with Neural Networks; Hinton G.E., Osindero S., Teh Y.-W., A Fast Learning Algorithm for Deep Belief Nets
      Boltzmann machines | Network of symmetrically connected, neuronlike units that make stochastic decisions in a binary fashion, with each layer connected to subsequent or previous nodes | Salakhutdinov R., Hinton G., Deep Boltzmann Machines; Hinton G.E., Learning Multiple Layers of Representation
      Deep belief networks | Stacking of restricted Boltzmann machines | Hinton G.E., Learning Multiple Layers of Representation; Hinton G.E., Training Products of Experts by Minimizing Contrastive Divergence
      Generative adversarial network | Model trained based on competing components of the generator and discriminator, where the latter distinguishes real from artificial data | Goodfellow I., Pouget-Abadie J., Mirza M., et al., Generative Adversarial Nets; Karras T., Aila T., Laine S., et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, arXiv preprint arXiv:1710.10196, 2017; Goodfellow I., NIPS 2016 Tutorial: Generative Adversarial Networks
      NN, neural network.
      The drug development pipeline encompasses drug discovery (where target identification and drug lead discovery occur), preclinical development (where the efficacy of the drug is interrogated in vitro and in vivo and its toxicity properties are assessed), and clinical phases (in which the safety of drugs in humans is investigated; Figure 1). Using AI at the different stages allows massive amounts of data to be processed by identifying patterns of functional properties and projecting a response to new situations. The following sections explore the different AI approaches that have been used to aid some drug development processes.
      Figure 1. Integration of artificial intelligence within the drug development pipeline, as exemplified by companies within the industry.

      Application of AI in Drug Discovery

      VS has been an integral component of chemoinformatics, in which computational tools are used to plough through huge databases for new leads with a higher probability of strong binding affinity to the target protein. VS can be classified as structure based or ligand based: the former is used when there is enough information on the three-dimensional (3D) structure, whereas the latter is used when there is little 3D information.
      • Lavecchia A.
      Machine-Learning Approaches in Drug Discovery: Methods and Applications.
      Machine learning approaches (Table 1) have been integrated into VS, particularly for ligand-based VS. The objective of applying machine learning is to generate models, based on training sets, which are then used to predict compound class labels and rank these compounds according to their probable activities.
      • Bajorath J.
      Integration of Virtual and High-Throughput Screening.
      ,
      • Bajorath J.
      Selected Concepts and Investigations in Compound Classification, Molecular Descriptor Analysis, and Virtual Screening.
      The first application of machine learning was substructural analysis (SSA), a tool for the automated analysis of biological screening data, as described by Cramer et al.
      • Cramer III, R.D.
      • Redl G.
      • Berkoff C.E.
      Substructural Analysis. Novel Approach to the Problem of Drug Design.
      This involves deriving a weight for each substructural fragment, independent of the other fragments within a molecule, and using these weights to estimate the activity of fragment-containing molecules. This lifted prior restrictions that only allowed optimization of a previously recognized lead structure, allowing prediction of active compounds beyond the structural classes of established biological interest. SSA was used in the US government’s anticancer program,
      • Hodes L.
      • Hazard G.F.
      • Geran R.I.
      • et al.
      A Statistical-Heuristic Method for Automated Selection of Drugs for Screening.
      and it was later “reidentified” as a naïve Bayesian classifier.
      • Hert J.
      • Willett P.
      • Wilton D.J.
      • et al.
      New Methods for Ligand-Based Virtual Screening: Use of Data Fusion and Machine Learning to Enhance the Effectiveness of Similarity Searching.
      Hence, AI has been historically integrated into drug discovery to identify potential new therapeutic targets or to generate new lead molecules. But this is only a small fraction of the capabilities of AI in effecting changes in drug discovery.
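The fragment-weighting idea behind SSA can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the published procedure: it assumes each fragment's weight is the fraction of training molecules containing that fragment that were active, and that a molecule is scored by summing the weights of its fragments. The fragment names and the training set are invented for illustration.

```python
from collections import Counter

def fragment_weights(training_set):
    """training_set: list of (set_of_fragments, is_active) pairs.
    Returns {fragment: fraction of fragment-containing molecules that are active}."""
    seen, active = Counter(), Counter()
    for fragments, is_active in training_set:
        for frag in fragments:
            seen[frag] += 1
            if is_active:
                active[frag] += 1
    return {frag: active[frag] / seen[frag] for frag in seen}

def score(fragments, weights):
    """Sum the weights of known fragments; unseen fragments contribute nothing."""
    return sum(weights.get(frag, 0.0) for frag in fragments)

# Hypothetical screening results: (fragments present, active in the assay?)
training = [
    ({"phenyl", "amide"}, True),
    ({"phenyl", "nitro"}, False),
    ({"amide", "hydroxyl"}, True),
    ({"nitro", "hydroxyl"}, False),
]
weights = fragment_weights(training)

# Rank untested candidates by their summed fragment weights.
candidates = {"A": {"amide", "phenyl"}, "B": {"nitro", "hydroxyl"}}
ranking = sorted(candidates, key=lambda name: score(candidates[name], weights), reverse=True)
```

Because every fragment is weighted independently of the others, a candidate can score well even if it belongs to a structural class absent from the training set, which is the property the text attributes to SSA.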
      With the increasing amount of available data, the hurdle for drug design is ploughing through the vast space of medicinal chemistry data related to multiple targets and finding an optimal solution. The following examples illustrate the application of machine learning in drug discovery. The method to be employed depends on the prediction problem, data source, and prediction performance. For instance, Berg Health harnesses a Bayesian method in its AI platform, bAIcis, for target identification (Fig. 1). The Bayesian method involves assessing the probability of an event resulting from any two or more causes, providing a mathematical rule for how a hypothesis changes with new evidence.
      • Nielsen T.D.
      • Jensen F.V.
      Bayesian Networks and Decision Graphs.
      ,
      • Dempster A.P.
      A Generalization of Bayesian Inference.
      As exemplified in fat tissue inflammation, Berg Health has identified a novel target, collagen type VI alpha 3 (COL6A3), whose expression is linked to adipose tissue inflammation. The study showed that engineering low levels of COL6A3 in immortalized human preadipocytes resulted in markedly lower expression and secretion of an inflammatory molecule, monocyte chemoattractant protein 1. These engineered fat cells were shown to be less sensitive to inflammatory signals and to have enhanced insulin sensitivity. These results highlight the potential of targeting COL6A3 for the treatment of obesity-induced inflammation.
      • Gesta S.
      • Guntur K.
      • Majumdar I.D.
      • et al.
      Reduced Expression of Collagen VI Alpha 3 (COL6A3) Confers Resistance to Inflammation-Induced MCP1 Expression in Adipocytes.
      By mining omics data from both normal and diseased states, Berg Health is able to generate potential therapeutic targets and novel biomarkers in areas such as oncology, neurology, and rare diseases.
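The rule for how a hypothesis changes with new evidence, mentioned above for the Bayesian method, is Bayes' theorem. The sketch below applies it twice to a single hypothesis. It is a generic illustration and makes no claim about how bAIcis works internally; the prior and the likelihood values are invented.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H), P(E | H) and P(E | not H)."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Hypothetical numbers: start with a 10% belief that a gene is a valid target,
# then observe two independent pieces of supporting evidence in turn.
belief = 0.10
for p_if_true, p_if_false in [(0.8, 0.3), (0.7, 0.2)]:
    belief = bayes_update(belief, p_if_true, p_if_false)
```

Each observation that is more likely under the hypothesis than under its negation raises the belief, and the posterior from one step becomes the prior for the next.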
      In the drug design space, Recursion Pharmaceuticals uses a combination of experimental data, automation, and machine learning to help facilitate drug compound design (Fig. 1). Thousands of chemical compounds are screened across cellular disease models, and Phenoprints is used to automate HTS of microscopy images and quantify phenotypic outputs from the cellular images. Coupled with this automation, a suite of machine learning tools is iteratively harnessed to identify the most promising drug compounds.

      Victors M. L., Borgeson B. C., St-Jean-Leblanc C. Systems and Methods for Evaluating Whether Perturbations Discriminate an on Target Effect. US Patent 10146914B1, December 4, 2018.

      In the rare disease field, Recursion Pharmaceuticals has established a pipeline to target these diseases. Mutations of the target genes are induced to generate human rare disease cell models. More than 200 rare disease cell models have been generated, and these are used to test the efficacy of the compounds. The changes reflected in the morphology and phenotype of the cells are quantified via Phenoprints and an open-source analytical software, CellProfiler. Bypassing mechanism elucidation, AI is instead used to plough through the 200,000 images obtained from screening more than 2000 chemical compounds for each rare disease model.
      • Gibson C.C.
      • Zhu W.
      • Davis C.T.
      • et al.
      Strategy for Identifying Repurposed Drugs for the Treatment of Cerebral Cavernous Malformation.
      Recursion has entered clinical trials for Rec-994, a potential therapeutic molecule targeting cerebral cavernous malformation (CCM). CCM, clinically manifested by weakened vessels, results in blood leakage in the brain, leading to stroke. With 20% of the familial form of CCM associated with heterozygous loss of function of the gene CCM2,
      • Riant F.
      • Bergametti F.
      • Ayrignac X.
      • et al.
      Recent Insights into Cerebral Cavernous Malformations: The Molecular Genetics of CCM.
      primary human endothelial cells were subjected to small interfering RNA targeting CCM2. These cells were used for subsequent downstream screening experiments, and the resulting compounds were validated in endothelium-specific Ccm2 knockout mice, culminating in Rec-994 targeting CCM with defective CCM2. Rec-994 has completed phase 1 and is now preparing for phase 2 clinical trials.

      Chon, J. Automation and Machine Learning: A Look into Recursion Pharmaceuticals. September 2017. https://www.rarediseasereview.org/publications/2017/9/17/automation-and-machine-learning-a-look-into-recursion-pharmaceuticals

      This advancement through clinical trials is a testament to the robustness of the integrated platform in identifying promising therapeutics.
      Harnessing DL in target identification and drug design has also been extensively explored. Moreover, DL has been demonstrated to have greater predictive capabilities than other machine learning approaches, as evident from the Merck Kaggle

      Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231 2014.

      and the NIH Tox21
      • Mayr A.
      • Klambauer G.
      • Unterthiner T.
      • et al.
      DeepTox: Toxicity Prediction Using Deep Learning.
      challenges, in which such approaches have triumphed for compound activity and toxicity prediction, respectively. In the QSAR competition sponsored by Merck, teams were given the same descriptors and training data sets and were challenged to predict the biological activities of different molecules from the numerical descriptors generated from the chemical structures, so that different machine learning approaches could be compared. The challenge was composed of 15 targets, 164,024 compounds, and 11,081 features, with both descriptors and activities provided for the training set but only the former for the test set. The winning entry (submitted by George Dahl, one of the authors of the cited study), which used multitask deep NN, achieved an accuracy approximately 15% higher than that of Merck’s internal baseline. Across the 15 data sets, the mean R² improved from 0.42 with random forests to 0.49. Although the small number of targets interrogated is not reflective of the volume processed in pharmaceutical companies, the result attracted attention to future advancements in the QSAR field,

      Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231 2014.

      especially given the ability of deep NN to process thousands of predictors. In addition, another study by Ma et al.
      • Ma J.
      • Sheridan R.P.
      • Liaw A.
      • et al.
      Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships.
      also demonstrated the superiority of deep NN over random forests for QSAR. Using the same data sets as in the Merck Kaggle challenge and adapting the deep NN algorithms derived from George Dahl’s team, they showed that deep NN outperform random forests based on the mean R² values. This study highlighted that a single set of values for all the deep NN algorithmic parameters is applicable to most large QSAR data sets in industrial drug discovery
      • Ma J.
      • Sheridan R.P.
      • Liaw A.
      • et al.
      Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships.
      and that it is not necessary to optimize for the individual data sets.
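The mean R² used to compare methods across data sets can be computed as sketched below. One assumption is made explicit here: R² is taken as the squared Pearson correlation between observed and predicted activities. The coefficient of determination is another common definition and can give different values, so the choice should be checked against the original challenge description. The activity values are invented.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def mean_r_squared(datasets):
    """Average R^2 over several (observed, predicted) data sets."""
    return sum(r_squared(obs, pred) for obs, pred in datasets) / len(datasets)

# Two made-up data sets standing in for two targets.
datasets = [
    ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),  # perfectly correlated
    ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]),  # two predictions swapped
]
summary = mean_r_squared(datasets)
```

Averaging per-target values in this way gives each data set equal weight regardless of how many compounds it contains.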
      In the Tox21 challenge, DeepTox was demonstrated to be superior in toxicity prediction (which included 12 stress response and nuclear receptor effects) when juxtaposed against other computational approaches, such as support vector machines and random forests. The DeepTox pipeline consists of deep NN representing the different toxicophores. With a training set of 11,764 compounds, a leaderboard set of 296 compounds, and a test set of 647 compounds, the computational models had to predict the outcome of the HTS assays. All information regarding the compound structures and assay measurements for the 12 different toxic effects was made available for the training set, whereas the assay results for the leaderboard set were initially withheld to evaluate the performance of the models and thereafter released so that participants could further refine their models. Using the area under the ROC curve (AUC) as the performance criterion, DeepTox demonstrated high performance, winning 9 of 15 challenges and achieving the best AUC for both toxicity panels.
      • Mayr A.
      • Klambauer G.
      • Unterthiner T.
      • et al.
      DeepTox: Toxicity Prediction Using Deep Learning.
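The AUC criterion mentioned above has a simple interpretation: it is the probability that a randomly chosen toxic compound receives a higher predicted score than a randomly chosen nontoxic one. The sketch below computes it directly from that definition by comparing every positive with every negative. The labels and scores are invented, and for large data sets a rank-based implementation from an established library would be preferable.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties between a positive and a negative count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical assay outcomes (1 = toxic) and model scores for five compounds.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of toxic from nontoxic compounds, which makes the metric insensitive to the choice of a classification threshold.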
      These examples illustrate the advantages of DL over other machine learning approaches, including the following: the ability to process and analyze large-scale data, the mining of relationships between input and output features, the flexibility of the NN architectures, and the automated extraction of features from raw data representations without any predefined structure descriptor. In practice, however, the analysis underlying DL still involves predefined input predictors, which limits the full potential of direct raw data extraction. These hand-engineered features are limited in predictive power to some extent, as they cannot fully encode the structure information and are not data driven. In addition, the currently available architectures are not suited for irregularly structured data, such as molecules.
      Alternative methods such as graph convolutional networks (GCN) aim to address these limitations, minimizing bias and improving on the robustness of existing approaches. GCN, an extension of convolutional NN, is driven by the aggregation of information from neighboring nodes to represent a particular node. This neighborhood information is represented as graph substructures, and these are projected onto similar or different spaces.
      • Xu K.
      • Hu W.
      • Leskovec J.
      • et al.
      How Powerful Are Graph Neural Networks?.
      GCN is thus able to encode the structural information of a molecular graph. A pioneering example was carried out by Duvenaud et al.,
      • Duvenaud D.K.
      • Maclaurin D.
      • Iparraguirre J.
      • et al.
      Advances in Neural Information Processing Systems.
      in which GCN was leveraged to generate data-driven fingerprints, representing the substructures within molecules. This was carried out by graphically representing the atoms as nodes and the interatom chemical bonds as the edges. These graphs are then used as input to learn molecule representations. The fingerprints were then used to evaluate drug properties, including solubility and drug efficacy, and were demonstrated to be superior when juxtaposed against circular fingerprints.
      • Glen R.C.
      • Bender A.
      • Arnby C.H.
      • et al.
      Circular Fingerprints: Flexible Molecular Descriptors with Applications from Physical Chemistry to ADME.
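The neighborhood aggregation described above, with atoms as nodes and bonds as edges, can be sketched for a three-atom molecule. This is a deliberately stripped-down illustration: real GCN layers apply learned weight matrices and nonlinearities after aggregation, which are omitted here, and the one-hot atom features are an assumption made for the example.

```python
def graph_conv_step(features, bonds):
    """One aggregation step: each atom's new vector is the sum of its own
    vector and the vectors of its bonded neighbours (no learned weights)."""
    neighbours = {atom: [] for atom in features}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    updated = {}
    for atom, vec in features.items():
        total = list(vec)
        for nb in neighbours[atom]:
            total = [a + b for a, b in zip(total, features[nb])]
        updated[atom] = total
    return updated

def readout(features):
    """Sum atom vectors into one molecule-level vector, independent of atom order."""
    dims = len(next(iter(features.values())))
    return [sum(vec[d] for vec in features.values()) for d in range(dims)]

# Heavy atoms of ethanol, C0-C1-O2, with one-hot features [is_carbon, is_oxygen].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]
step1 = graph_conv_step(features, bonds)
fingerprint = readout(step1)
```

After one step the middle carbon's vector already records that it sits next to both a carbon and an oxygen, which is the kind of substructure information a data-driven fingerprint is meant to capture.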
      Another important aspect of drug discovery is the evaluation of molecular dynamics simulation. Gilmer et al. reformulated available models
      • Duvenaud D.K.
      • Maclaurin D.
      • Iparraguirre J.
      • et al.
      Advances in Neural Information Processing Systems.
      ,
      • Kearnes S.
      • McCloskey K.
      • Berndl M.
      • et al.
      Molecular Graph Convolutions: Moving beyond Fingerprints.
      ,
      • Schütt K.T.
      • Arbabzadah F.
      • Chmiela S.
      • et al.
      Quantum-Chemical Insights from Deep Tensor Neural Networks.
      and proposed a common network framework termed message-passing NN within the GCN.

      Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; et al. Neural Message Passing for Quantum Chemistry. arXiv preprint arXiv:1704.01212 2017.

      This NN extracts features from molecular graphs. Bond types and interatomic distances are translated into neighborhood messages, which are passed to the center atom to update its state; the resulting atom states are then combined into a molecule-level output via a set2set model. This approach was evaluated on the QM9 data set, comprising approximately 130,000 molecules with 13 properties and various types of molecular energies. This study demonstrated the high predictive capability of the platform, which predicted properties computed with density functional theory (DFT, a quantum mechanical simulation method) to within chemical accuracy on 11 of 13 targets.

      Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; et al. Neural Message Passing for Quantum Chemistry. arXiv preprint arXiv:1704.01212 2017.

      It has also highlighted the importance of stretching and effectively generalizing the model toward larger molecular sizes.
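The message-passing idea can be sketched with bond-dependent messages and several rounds of updates. This is a toy version under loud assumptions: each atom carries a single number as its state, a message is the neighbor's state scaled by the bond order, and the update is a plain sum. The learned message and update functions and the set2set readout of the published framework are not reproduced.

```python
def message_passing(states, bonds, rounds):
    """states: {atom: float}; bonds: {(i, j): bond_order}.
    In each round, every atom adds the bond-weighted states of its neighbours."""
    for _ in range(rounds):
        messages = {atom: 0.0 for atom in states}
        for (i, j), order in bonds.items():
            messages[i] += order * states[j]
            messages[j] += order * states[i]
        states = {atom: states[atom] + messages[atom] for atom in states}
    return states

# Heavy atoms of acetaldehyde, C0-C1=O2, with atomic numbers as initial states.
initial = {0: 6.0, 1: 6.0, 2: 8.0}
bonds = {(0, 1): 1.0, (1, 2): 2.0}

after_one = message_passing(initial, bonds, rounds=1)
after_two = message_passing(initial, bonds, rounds=2)
molecule_value = sum(after_two.values())  # simple sum readout
```

After one round the methyl carbon knows only about its direct neighbor; after two rounds its state also depends on the oxygen two bonds away, which shows how the number of rounds sets the size of the neighborhood each atom can see.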
      Based on these studies, pharmaceutical companies have also leveraged GCN. As exemplified in Chemi-Net, this approach has been used for absorption, distribution, metabolism, and excretion (ADME) prediction endpoints, which encompass human microsomal clearance, human CYP450 inhibition, and aqueous equilibrium solubility.
      • Liu K.
      • Sun X.
      • Jia L.
      • et al.
      Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction.
      This analysis was carried out by first evaluating and reducing the neighboring information around each atom. The center atom is then defined by aggregating this information, and the condensed information is combined with an atom input feature. In this study, which involved 250,000 data points, Chemi-Net was demonstrated to perform better than Cubist, a machine learning approach adopted by Amgen.
      • Liu K.
      • Sun X.
      • Jia L.
      • et al.
      Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction.
      Other applications of DL within drug discovery include the work of the Google-owned AI company DeepMind, which has leveraged deep NN (namely, convolutional NN and autoencoders) to predict the properties of a protein from its primary sequence. Its system, AlphaFold, predicts the distances between pairs of amino acids and the ϕ-ψ angles between neighboring peptide bonds, and it successfully predicted 25 of 43 structures.
      • Chan H.S.
      • Shan H.
      • Dahoun T.
      • et al.
      Advancing Drug Discovery via Artificial Intelligence.
      ,
      • AlQuraishi M.
      AlphaFold at CASP13.
      This advancement is useful in predicting the 3D structure of the target, so as to allow the de novo design of inhibitors against these targets (Fig. 1). DeepMind has demonstrated the robustness of AlphaFold across many indications including macular degeneration, neuroscience, reinforcement learning, and, most recently, coronavirus disease 2019 (COVID-19). COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which replicates in the upper respiratory tract. At the time of writing, no vaccines or medicines were available for COVID-19, and those who were infected were treated according to their symptoms. Leveraging AlphaFold, DeepMind has predicted the structures of understudied proteins associated with SARS-CoV-2.
      • Team A.
      Computational Predictions of Protein Structures Associated with COVID-19.
      Although they have not been experimentally verified, these structure predictions provide the backbone for the identification of potential therapeutics, accelerating the drug discovery process.
      De novo molecule design is challenging because of the vast search space, especially for novel targets without extensive prior data and knowledge. Predictive models were initially generated using the Simplified Molecular-Input Line-Entry System (SMILES), a linear string notation used in chemistry to describe molecular structures. This approach, however, does not capture molecular similarities succinctly, and certain chemical properties are better projected on a graph. Hence, deep generative models were introduced for de novo molecule design, in which graph representations serve as subunits for molecular structures. Building and improving upon these advances, Jin et al.
      • Jin W.
      • Barzilay R.
      • Jaakkola T.
      Junction Tree Variational Autoencoder for Molecular Graph Generation.
      have demonstrated a new generative model of molecular graphs, the junction tree variational autoencoder. This approach places importance on evaluating structures as a whole, instead of using the node-by-node approach, where the likelihood of generating chemically invalid intermediates is high. To circumvent this, the junction tree variational autoencoder breaks down molecules into subgraphs, which serve as units both when encoding a molecule into a vector representation and when decoding vectors back into molecular structures. However, this involves an indirect method of optimizing the molecular properties. You et al.
      • You J.
      • Liu B.
      • Ying Z.
      • et al.
      Advances in Neural Information Processing Systems.
      demonstrated an approach that is capable of directly optimizing the molecular characteristics of the molecular graphs through a process termed graph convolutional policy network.
      • You J.
      • Liu B.
      • Ying Z.
      • et al.
      Advances in Neural Information Processing Systems.
      Another approach, GraphAF, combines autoregressive and flow-based principles for molecular graph generation.
      • Shi C.
      • Xu M.
      • Zhu Z.
      • et al.
      GraphAF: A Flow-Based Autoregressive Model for Molecular Graph Generation.
      GraphAF is reported to be highly efficient, with 68% of its generated molecules being chemically valid. These examples illustrate the role of deep generative models, such as generative adversarial networks (GAN),
      • Goodfellow I.
      • Pouget-Abadie J.
      • Mirza M.
      • et al.
      Generative Adversarial Nets.
      and variational autoencoders,

      Kingma, D. P.; Welling, M. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114 2013.

      in aiding de novo molecule design. These models are trained on real examples and learn to generate similar, novel synthetic counterparts beyond the boundaries of the defined data set.
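The limitation of SMILES strings noted earlier in this discussion can be made concrete with a small sketch. It is not taken from any of the cited works: the example molecule (toluene) and the use of Python's standard `difflib` as a generic string-similarity measure are choices made here purely for illustration. All three strings describe the same molecule, yet a character-level comparison does not treat them as identical, which is one reason graph representations are attractive.

```python
from difflib import SequenceMatcher

# Three valid SMILES strings that all describe the same molecule (toluene).
# The choice of molecule is an illustrative assumption, not from the review.
toluene_smiles = ["Cc1ccccc1", "c1ccccc1C", "c1ccc(C)cc1"]


def string_similarity(a: str, b: str) -> float:
    """Generic character-level similarity in [0, 1]; not a chemical metric."""
    return SequenceMatcher(None, a, b).ratio()


reference = toluene_smiles[0]
similarities = {s: string_similarity(reference, s) for s in toluene_smiles[1:]}

for smiles, score in similarities.items():
    # Same molecule, different strings: the score falls short of 1.0.
    print(f"{reference} vs {smiles}: string similarity = {score:.2f}")
```

A graph-based representation, in which atoms are nodes and bonds are edges, does not depend on the order in which the atoms happen to be written down.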
      Insilico Medicine has leveraged the power of GAN, coupled with reinforcement learning, giving rise to generative tensorial reinforcement learning (GENTRL) technology.
      • Zhavoronkov A.
      • Ivanenkov Y.A.
      • Aliper A.
      • et al.
      Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors.
      GAN trains models using a generator and discriminator, where these components compete. The generator produces artificial data, whereas the discriminator distinguishes it from real data. This process is repeated until the discriminator is unable to discern artificial from real data.

      Karras T., Aila T., Laine S., et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv preprint arXiv:1710.10196 2017.

      ,
      • Goodfellow I.
      NIPS 2016 Tutorial: Generative Adversarial Networks.
      The integration of reinforcement learning allows active exploration and optimization of the space beyond samples defined within the data set. With this approach, the company designed, generated, and validated a novel small molecule targeting discoidin domain receptor 1 (DDR1) within 2 mo, a considerable saving of time. DDR1 is a proinflammatory receptor tyrosine kinase involved in fibrosis. Six data sets were used to generate a robust model: molecules obtained from the ZINC data set, known DDR1 inhibitors, common kinase inhibitors, nonkinase inhibitors, patent data for biologically active compounds, and 3D structures of DDR1 inhibitors. The six identified lead candidates were validated both in vitro and in vivo. This approach demonstrated the capabilities of GAN in generating molecular designs in a rapid and efficient fashion by optimizing biological activity and synthetic feasibility.
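For reference, the competition between generator and discriminator described above is commonly written as the minimax objective from the cited Goodfellow et al. paper on generative adversarial nets. The symbols are not used elsewhere in this review and are introduced here only for illustration: G is the generator, D the discriminator, p_data the distribution of real data, and p_z the noise distribution from which the generator draws its input. This is the generic GAN objective; the specific objective used in GENTRL is not described in this review.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones, whereas the generator minimizes it. Training ideally ends when the discriminator can no longer tell the two apart, which corresponds to D(x) = 1/2 everywhere.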
      Méndez-Lucio et al. of Bayer combined GAN with transcriptomic data and demonstrated its capabilities in proposing hit molecules based on a gene expression signature of the target knockout.
      • Méndez-Lucio O.
      • Baillif B.
      • Clevert D.-A.
      • et al.
      De Novo Generation of Hit-Like Molecules from Gene Expression Signatures Using Artificial Intelligence.
      This approach was based on the concept that a knocked-out protein would generate a gene expression signature that is analogous to the pharmacological inhibition of the same target. This platform could be applied to any target, as no prior background information on it or its active molecules is required. It comprises the training phase, in which a GAN is trained on approximately 20,000 compounds from the L1000 data set; the prediction phase, in which new molecules are generated based on the desired gene expression signature; and finally the validation phase, in which molecules are checked against inhibitors beyond the training set. Each signature produced approximately 10% valid molecules, and the generated molecules share functional groups with the active molecule. This platform paves the way toward reconciling chemistry and biology in drug molecular design, without requiring any prior information on the intended target.
      For drug design, both Atomwise and Benevolent.AI have used convolutional NN for their big data analyses (Fig. 1; Table 3). Atomwise’s platform, AtomNet, is able to predict the binding affinity of molecules by integrating the structure and ligand information of the target.

      Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-Based Drug Discovery. arXiv preprint arXiv:1510.02855 2015

      This is advantageous, especially for novel targets, compared with other existing computational approaches, which require substantial amounts of prior data on the target. Atomwise has recently demonstrated this in Parkinson’s disease by targeting Miro1, a mitochondrial membrane enzyme that mediates motility. Upon mitochondrial depolarization, Miro1 is removed to facilitate the removal of mitochondria via mitophagy. It was found that more than 94% of fibroblasts from patients have defective Miro1 removal upon depolarization. After AtomNet was used to screen 6.8 million molecules, the most promising small molecule promoted the separation of Miro1 from mitochondria and stalled neurodegeneration without causing toxicities.
      • Hsieh C.-H.
      • Li L.
      • Vanhauwaert R.
      • et al.
      Miro1 Marks Parkinson’s Disease Subset and Miro1 Reducer Rescues Neuron Loss in Parkinson’s Models.
      This study highlights the power of AI in advancing personalized medicine by identifying an optimal molecule against targets that could potentially stratify patients as well.
      Table 3. List of Pharmaceutical Companies That Leverage AI to Enhance Drug Development Processes.
      Drug development pipeline stages covered: Target Identification; Drug Design; Drug Repositioning/Drug Combination; Mechanism Elucidation/Biomarker Discovery.
      Company | Data Required | AI Approach/Platforms
      Atomwise | Big | AtomNet; deep learning (convolutional neural network)
      Benevolent.AI | Big | GuacaMol; deep learning (convolutional neural network)
      Berg Health | Big | bAIcis; machine learning (Bayesian)
      Biovista | Big | Clinical Outcomes Search Space; machine learning (natural language processing)
      BioXcel | Big | Machine learning
      BlackThorn Therapeutics | Big | Machine learning
      CellWorks | Big | Machine learning
      Cyclica | Big | Ligand Design and Ligand Express; deep learning
      DeepMind | Small | AlphaFold; deep learning (convolutional neural network and autoencoder)
      Exscientia | Small and big | Centaur Chemist, Centaur Biologist; deep learning (Bayesian)
      GNS Healthcare | Big | Reverse engineering, forward simulation (REFS); machine learning (Bayesian)
      Healx | Big | HealNet; machine learning (natural language processing)
      Insilico Medicine | Big | Deep learning (generative adversarial networks); transcriptional response scores; pathway activation scores; chemical structural scores
      Insitro | Big | Machine learning
      Invivo.AI | Big | Deep learning (few-shot, reinforcement, active and representation learning)
      KYAN | Small | Optim.AI; machine learning (regression analysis)
      Lantern Pharma | Big | RADR; machine learning (inclusive of artificial neural network)
      Notable Labs | Small | Undisclosed
      Novadiscovery | Big | Jinkō
      Numerate | Big | Undisclosed
      Pharnext | Big | PLEOTHERAPY
      Precisionlife | Big | Machine learning
      Recursion Pharmaceuticals | Big | Machine learning
      ReviveMed | Big | Network-based machine learning
      Standigm | Big | Deep learning (autoencoders)
      Turbine.AI | Big | Machine learning (learning classifier systems)
      twoXAR | Big | Undisclosed
      Another company that harnesses DL, Exscientia, has recently developed a drug for obsessive-compulsive disorder (OCD), DSP-1181, within 12 mo, compared with the conventional norm of 4 y to reach phase 1 of clinical trials.
      • Burki T.
      A New Paradigm for Drug Development.
      ,
      • Mullard A.
      The Drug-Maker’s Guide to the Galaxy.
      DSP-1181 is a long-acting, potent serotonin 5-HT1A receptor agonist. This molecule was developed in partnership with Japan’s Sumitomo Dainippon Pharma, which has expertise in G-protein–coupled receptor drug discovery. With its platforms, Centaur Chemist and Centaur Biologist, Exscientia is able to optimize multiple SARs in a systematic manner. This entails mining data from ChEMBL, a public database that contains compound and activity data from decades of published medicinal chemistry literature. For the aforementioned OCD drug, data from ChEMBL were extracted to construct Bayesian models of ligand activity across 784 human protein targets, including G-protein–coupled receptors.
      • Kotz J.
      In silico drug design.
      ,
      • Besnard J.
      • Ruda G.F.
      • Setola V.
      • et al.
      Automated Design of Ligands to Polypharmacological Profiles.
      These models aim to identify compounds that exert multitarget effects while minimizing off-target effects. This approach also provides an edge over other approaches that focus on only a single molecular target.
      • Roche O.
      • Sarmiento R.M.R.
      A New Class of Histamine H3 Receptor Antagonists Derived from Ligand Based Design.
      ,
      • Schneider G.
      • Geppert T.
      • Hartenfeller M.
      • et al.
      Reaction-Driven De Novo Design, Synthesis and Testing of Potential Type II Kinase Inhibitors.
      Exscientia differentiates itself from the others by being able to drive drug discovery using very limited data points, which is especially useful for novel targets. Using donepezil, an acetylcholinesterase inhibitor used for cognitive enhancement, as a starting point, this approach evolved its structure with additional improvements in D2 dopamine receptor activity and penetration of the blood-brain barrier (BBB).
      • Besnard J.
      • Ruda G.F.
      • Setola V.
      • et al.
      Automated Design of Ligands to Polypharmacological Profiles.
      Based on these defined dimensions with their corresponding Bayesian scores, including an additional combined score representing the ADME properties suitable for BBB penetration, alternative chemical structures were generated. In addition to evolving donepezil’s negligible D2 activity, inhibitors in the benzolactam series were also generated. This class was not present in the initial database, providing further confidence in the capabilities of the algorithm in generating novel structures.
      • Besnard J.
      • Ruda G.F.
      • Setola V.
      • et al.
      Automated Design of Ligands to Polypharmacological Profiles.
      As illustrated, with limited literature information and starting from a single point, Exscientia is able to expand alternative options within neuropsychiatric conditions.
      The other companies within the same DL space include Insilico Medicine, Cyclica, Standigm, and DeepMind (Fig. 1; Table 3).

      Application of AI in Drug Repositioning

      There has been interest in drug repositioning, or drug repurposing, due to the challenges in designing new molecules. This is especially useful when these drugs have known mechanisms of action and ADME and toxicity (ADMET) data, thus saving resources, as repurposed drugs can be approved sooner (3–12 y)
      • Pantziarka P.
      • Bouche G.
      • Meheus L.
      • et al.
      The Repurposing Drugs in Oncology (ReDO) Project.
      and at reduced cost (50–60%).
      • Ashburn T.T.
      • Thor K.B.
      Drug Repositioning: Identifying and Developing New Uses for Existing Drugs.
      ,
      • Chong C.R.
      • Sullivan D.J.
      New Uses for Old Drugs.
      As mentioned in the previous section, Cyclica uses DL for target identification and drug molecule design. In addition, Cyclica is able to execute drug repositioning through its Ligand Express platform. This platform provides insights into the polypharmacology of small-molecule ligands by identifying on- and off-target interactions. Leveraging DL, the platform screens ligands against the structurally characterized proteome (Fig. 1). Coupled with machine learning, the platform is also able to predict ligand-protein interactions. Cyclica has demonstrated this by repurposing an antiretroviral agent, nelfinavir mesylate, to treat pulmonary fibrosis in patients with systemic sclerosis, by inhibiting transforming growth factor β1 (TGFβ1).
      • Sanchez C.G.
      • Molinski S.V.
      • Gongora R.
      • et al.
      The Antiretroviral Agent Nelfinavir Mesylate: A Potential Therapy for Systemic Sclerosis.
      TGFβ1 is a key factor in fibrogenesis, promoting the differentiation of fibroblasts into myofibroblasts as well as collagen deposition. Nelfinavir, a protease inhibitor, was approved for combinatorial therapy for HIV type 1 infection. Six targets of nelfinavir were identified, but the other targets had only a weak association with systemic sclerosis. In this instance, nelfinavir was shown to decrease collagen, fibronectin, and α–smooth muscle actin by inhibiting TGFβ1-mediated phosphorylation of Smad2/3 and Akt while activating autophagy.
      • Sanchez C.G.
      • Molinski S.V.
      • Gongora R.
      • et al.
      The Antiretroviral Agent Nelfinavir Mesylate: A Potential Therapy for Systemic Sclerosis.
      Cyclica has exemplified succinctly how an antiretroviral drug could be repurposed for an entirely different indication. Many companies that leverage AI to intervene in the drug discovery process, such as Exscientia, Recursion Pharmaceuticals, and Berg Health, are also able to repurpose existing drugs (Table 3).
      Centered on rare diseases, Healx aggregates its proprietary data, comprising both high-quality structured data and public unstructured data processed using natural language processing (NLP), and analyzes them to repurpose drugs, identifying potential biomarkers and combinatorial treatments in tandem. NLP, which relies on machine learning techniques, is the ability of software to decipher and make sense of human languages. Leveraging NLP, Healx has demonstrated the capability of its platform to predict synergistic antimalarial combinations, reporting 22.2% synergistic combinations among 1,540 combinations.
      • Mason D.J.
      • Eastman R.T.
      • Lewis R.P.
      • et al.
      Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations with Novel Structures.
      The novel combinatorial therapy includes efflux or transporter inhibitors with compounds possessing antimalarial activity. Healx has also identified repurposed therapeutics for Fragile X syndrome, a genetic condition that results in learning disabilities.
      • Tranfaglia M.R.
      • Thibodeaux C.
      • Mason D.J.
      • et al.
      Repurposing Available Drugs for Neurodevelopmental Disorders: The Fragile X Experience.
      Over a period of more than 15 mo from inception to readiness for clinical trial, Healx used its AI analytics as the basis of its in silico Disease-Gene Expression Matching (DGEM) pipeline. This project identified eight potential candidates, which were also validated in mice. Sulindac, a nonsteroidal anti-inflammatory drug, and metformin, a hepatic glucose production inhibitor, have been identified as promising repurposing candidates for Fragile X.
      • Tranfaglia M.R.
      • Thibodeaux C.
      • Mason D.J.
      • et al.
      Repurposing Available Drugs for Neurodevelopmental Disorders: The Fragile X Experience.
      Following this, the most promising candidates will be chosen to progress through phase IIa trials. This efficiency in repurposing therapeutics positions AI technologies such as Healx's as viable options for indications that lack readily available therapeutics.
      Even though AI has been synonymous with big data analysis, generating predictive models from small data is particularly useful, as demonstrated earlier by Exscientia for novel targets. Another approach uses small, experimentally derived data sets to highlight optimal combinations of existing drugs via regression analysis. These data can be derived experimentally at the in vitro, in vivo, and ex vivo stages to rationally identify the optimal drug combination. This platform has been applied to many indications including multiple myeloma
      • Rashid M.B.M.A.
      • Toh T.B.
      • Hooi L.
      • et al.
      Optimizing Drug Combinations against Multiple Myeloma Using a Quadratic Phenotypic Optimization Platform (QPOP).
      and T-cell lymphoma,
      • de Mel S.
      • Rashid M.B.
      • Zhang X.Y.
      • et al.
      Application of an Ex-Vivo Drug Sensitivity Platform towards Achieving Complete Remission in a Refractory T-Cell Lymphoma.
      for which epigenetic drugs have been repurposed. Recently, the platform has also repositioned antiviral drugs to inhibit SARS-CoV-2. With a 12-drug search space spanning more than 530,000 possible drug combinations, the platform identified remdesivir, ritonavir, and lopinavir as the optimal regimen to inhibit live SARS-CoV-2 virus. This regimen demonstrated a 6.5-fold improvement in efficacy compared with remdesivir alone. On the other hand, the combination of hydroxychloroquine and azithromycin was shown to be relatively ineffective.

      Blasiak, A.; Lim, J. J.; Seah, S. G. K.; et al. IDentif.AI: Artificial Intelligence Pinpoints Remdesivir in Combination with Ritonavir and Lopinavir as an Optimal Regimen against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). medRxiv 2020.

      Completed within 2 wk, these studies also demonstrate the application and translational capabilities of AI to identify optimal therapeutics from the existing pool of drugs to address future outbreaks.
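The size of the search space quoted above, and the reason a regression model can cope with it using a small data set, can be checked with simple arithmetic. The sketch below rests on two assumptions that are not stated in this review: that each of the 12 drugs is tested at three dose levels, and that the regression is a second-order (quadratic) model, as the platform name in the cited multiple myeloma study suggests.

```python
from math import comb

n_drugs = 12
dose_levels = 3  # assumption: e.g., absent, low, high; not stated in this review

# Every drug independently takes one of the dose levels.
total_combinations = dose_levels ** n_drugs

# Coefficients in a second-order regression model over the 12 drugs:
# intercept + linear terms + squared terms + pairwise interaction terms.
quadratic_terms = 1 + n_drugs + n_drugs + comb(n_drugs, 2)

print(f"Possible combinations: {total_combinations:,}")
print(f"Quadratic model coefficients: {quadratic_terms}")
```

Under these assumptions the search space contains 3^12 = 531,441 combinations, consistent with the figure of more than 530,000, whereas the model has only 91 coefficients to estimate. This gap is what makes it plausible to rank all combinations from a comparatively small number of experiments.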

      Application of AI in Biomarker Discovery

      Identifying a robust biomarker is powerful in identifying potential responders, thus enabling precision medicine and improving clinical trial success rates.
      • van Gool A.J.
      • Bietrix F.
      • Caldenhoven E.
      • et al.
      Bridging the Translational Innovation Gap through Good Biomarker Practice.
      ,
      • Kraus V.B.
      Biomarkers as Drug Development Tools: Discovery, Validation, Qualification and Use.
      Under the master protocol, both basket and umbrella trials aim to extend the reach of targeted therapies to potential patients. Basket trials translate this by enrolling patients harboring the same mutation or biomarker, regardless of indication, to the same treatment.
      • West H.J.
      Novel Precision Medicine Trial Designs: Umbrellas and Baskets.
      ,
      • Renfro L.A.
      • Mandrekar S.J.
      Definitions and Statistical Properties of Master Protocols for Personalized Medicine in Oncology.
      Umbrella trials, on the other hand, focus on a particular indication but enroll patients to the respective treatments based on the different molecular alterations or biomarker.
      • West H.J.
      Novel Precision Medicine Trial Designs: Umbrellas and Baskets.
      The search space for identifying a suitable biomarker is huge, requiring extensive amounts of data to be ploughed through.
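The difference between the two trial designs described above can be sketched as two enrollment rules. This is a simplified illustration rather than a description of any actual protocol; the class, function names, indications, biomarkers, and treatment labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Patient:
    indication: str  # disease the patient presents with
    biomarker: str   # molecular alteration detected in the patient


def basket_assignment(patient: Patient, target_biomarker: str) -> Optional[str]:
    """Basket trial: one treatment, enrollment by biomarker regardless of indication."""
    if patient.biomarker == target_biomarker:
        return "targeted_treatment"
    return None  # not eligible


def umbrella_assignment(patient: Patient, indication: str,
                        arms: Dict[str, str]) -> Optional[str]:
    """Umbrella trial: one indication, treatment arm chosen by biomarker."""
    if patient.indication != indication:
        return None  # not eligible
    return arms.get(patient.biomarker)


# Hypothetical patients and arms, for illustration only.
patients = [
    Patient("lung", "marker_A"),
    Patient("colorectal", "marker_A"),
    Patient("lung", "marker_B"),
]
umbrella_arms = {"marker_A": "drug_1", "marker_B": "drug_2"}

basket = [basket_assignment(p, "marker_A") for p in patients]
umbrella = [umbrella_assignment(p, "lung", umbrella_arms) for p in patients]
print(basket)
print(umbrella)
```

In the basket rule the indication is ignored entirely, whereas in the umbrella rule the indication acts as the entry criterion and the biomarker only selects the arm. Both depend on a reliable biomarker, which is why its identification matters.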
      By leveraging the Bayesian approach to sieve through mechanisms driving the disease landscape, a by-product of the GNS platform is the identification of potential biomarkers (Fig. 1). This has been demonstrated recently on multiple myeloma (MM), for which a histone methyltransferase, PHF19, has been identified as the gene with the strongest association with myeloma progression.
      • Mason M.J.
      • Schinke C.
      • Eng C.L.
      • et al.
      Multiple Myeloma DREAM Challenge Reveals Epigenetic Regulator PHF19 as Marker of Aggressive Disease.
      This potential biomarker was also shown to exhibit greater predictive power than the well-known high-risk gene in MM, MMSET. The platform was also applied to other indications such as metastatic colorectal cancer
      • Das R.
      • Ou F.
      • Washburn C.
      • et al.
      PD-020 Bayesian Machine Learning on CALGB/SWOG 80405 (Alliance) and PEAK Data Identify a Heterogeneous Landscape of Clinical Predictors of Overall Survival (OS) in Different Populations of Metastatic Colorectal Cancer (mCRC).
      and Huntington’s disease, for which predictive models were generated based on data from clinical trials.
      To increase the sensitivity of this process, a combination of features was also used to facilitate patient stratification. This involves the agglomeration and analysis of genomic, epigenetic, epidemiological, and other patient data to match the patients to the right treatments. Precisionlife uses this machine learning clustering approach to aid the process of therapy selection. Mining through genotype data of 547,197 single-nucleotide polymorphisms (SNPs) from 11,088 breast cancer cases and 22,176 controls, this platform was able to highlight high-order combinatorial genomic signatures within 1 h. Clustering of the SNPs provided insights into the complex disease population, where 175 risk-associated genes were found to be relevant to different patient subpopulations. From these, P4HA2 and TGM2 were demonstrated to have high repurposing potential in breast cancer, with accompanying validation.
      • Taylor K.
      • Das S.
      • Pearson M.
      • et al.
      Systematic Drug Repurposing to Enable Precision Medicine: A Case Study in Breast Cancer.
      P4HA2 is involved in collagen synthesis, whereas TGM2 encodes an enzyme (transglutaminase 2) involved in posttranslational modification of proteins to facilitate crosslinking. Collagen deposition has been shown to hasten cell growth and development,
      • Wang T.
      • Fu X.
      • Jin T.
      • et al.
      Aspirin Targets P4HA2 through Inhibiting NF-κB and LMCD1-AS1/let-7g to Inhibit Tumour Growth and Collagen Deposition in Hepatocellular Carcinoma.
      thus making it a good biomarker to recruit potential patients. Transglutaminase 2 has been shown to be upregulated in breast cancer, and its interaction with interleukin-6 mediates metastasis.
      • Oh K.
      • Ko E.
      • Kim H.S.
      • et al.
      Transglutaminase 2 Facilitates the Distant Hematogenous Metastasis of Breast Cancer by Modulating Interleukin-6 in Cancer Cells.
      Even without taking these mechanisms into account prior to the analysis, the platform is robust enough to highlight these genes, which have high patient stratification potential. This analysis provides multiple insights into the disease, identifying both signature biomarkers and their corresponding viable drug treatments. Precisionlife has also applied its analytics platform to non-oncology indications such as amyotrophic lateral sclerosis, Alzheimer’s disease, and asthma.
      Besides analyzing extensive data sets, an ex vivo analysis of patient cells is able to highlight the optimal, actionable treatments for the patient within a short turnaround time of 5 to 7 days. As demonstrated on a rare subtype of peripheral T-cell lymphoma (PTCL), hepatosplenic T-cell lymphoma, this machine learning–driven platform by KYAN has a relatively high concordance with previous clinical trials.
      • de Mel S.
      • Rashid M.B.
      • Zhang X.Y.
      • et al.
      Application of an Ex-Vivo Drug Sensitivity Platform towards Achieving Complete Remission in a Refractory T-Cell Lymphoma.
      Before ex vivo analysis, the patient was administered the standard-of-care combinations (HyperCVAD, pembrolizumab, GVD [gemcitabine, vinorelbine, and liposomal doxorubicin], and pralatrexate), but the disease was refractory to them. Ex vivo analysis using KYAN’s platform identified panobinostat and bortezomib as the optimal combination for the same patient. After eight cycles on the regimen, the patient experienced complete metabolic remission. Subsequently, the patient underwent an autologous stem cell transplant and reported wellness 1 y later. Following this patient, 20% of subsequent patients with similar clinical manifestations responded to the same regimen, reflective of a previous phase II clinical trial on relapsed or refractory PTCL.
      • Tan D.
      • Phipps C.
      • Hwang W.Y.
      • et al.
      Panobinostat in Combination with Bortezomib in Patients with Relapsed or Refractory Peripheral T-Cell Lymphoma: An Open-Label, Multicentre Phase 2 Trial.
      As a next step, this ex vivo analytical platform could be used to recruit patients who are sensitive to the treatment of interest, whereas those who are not would be administered the standard of care, akin to basket trials. Paving the way toward personalized medicine, this platform could potentially be a viable clinical decision support system in recruiting and matching the right patients to the best therapeutic options.
      The companies exemplified above highlight the importance of selecting the right biomarkers to facilitate patient recruitment, hence increasing the success of clinical trials.
      We have demonstrated the role of AI in expediting and increasing efficiency across various stages along the drug development pipeline. Even though companies are more densely concentrated and further developed in the drug discovery space, we have also showcased the application of AI for drug repositioning, biomarker discovery, and patient recruitment for clinical trials (Fig. 1). In addition, we have pointed out the multiple roles these companies offer, intervening at multiple stages across drug development. With these roles, it is imperative to choose the most suitable AI technology to solve the intended problems. Big data analysis is the core structure of AI, but this review has also highlighted companies such as Exscientia, which is able to work with a limited number of data points. This is essential when working with novel targets and in advancing personalized medicine.
      With the pervasive use of AI, especially in drug development, some factors should be taken into consideration. Algorithm transparency still remains an issue, in which platforms are akin to black boxes, making it difficult to sufficiently explain how the results are derived. Before implementation, complementary and follow-up experiments are important for validating the results that have been obtained via AI. This gap in interpretation would also result in troubleshooting difficulty when failure occurs.
      Because AI is synonymous with big data aggregation and analysis, another issue that comes with big data is the availability of quality and accurate data. AI is based on the premise of predictive model generation, so any noisy data may skew the results and future predictions using this training set. One solution is for pharmaceutical companies to enforce high standards for data quality and management. Another is collaborative effort, in which high-quality shared data resources can be made more publicly available. These should encompass both successful and failed drug development efforts to allow more accurate prediction.
      In addition, these large data sets can be integrated for training, validation, and feature data for the related algorithms. Integrating multifaceted data improves the accuracy and robustness of these algorithms. There are many ongoing efforts including ChEMBL, a large-scale chemical bioactivity database,
      • Gaulton A.
      • Hersey A.
      • Nowotka M.
      • et al.
      The ChEMBL Database in 2017.
      which contains, but is not limited to, compounds in clinical development, phenotypic data associated with similar compounds, and data from patents. Probe Miner, a web resource that enables the objective, large-scale analysis of chemical probes, provides valuable information that assists in the identification of potential chemical probes. Approximately 300,000 (out of 1.8 million) published bioactive compounds were objectively explored and quantified against at least 2,000 human targets.
      • Antolin A.A.
      • Tym J.E.
      • Komianou A.
      • et al.
      Objective, Quantitative, Data-Driven Assessment of Chemical Probes.
      This database also highlights the gaps in knowledge, including human targets that lack chemical tools to probe their function. canSAR is another platform that brings together multidisciplinary data across biology, chemistry, pharmacology, structural biology, cellular networks, and clinical annotations.
      • Coker E.A.
      • Mitsopoulos C.
      • Tym J.E.
      • et al.
      canSAR: Update to the Cancer Translational Research and Drug Discovery Knowledge Base.
      Machine learning can also be applied for predictions in drug discovery. In addition, these open-source data sets aid the accuracy of AI prediction in drug discovery.
      With the burgeoning number of AI approaches and companies sprouting within the industry, publicly available data also serve as standardized data sets that allow the evaluation of these newly developed algorithms. For instance, MoleculeNet (built upon the open-source package DeepChem) provides a suite of software encompassing many known molecule representations and DL algorithms.
      • Wu Z.
      • Ramsundar B.
      • Feinberg E.N.
      • et al.
      MoleculeNet: A Benchmark for Molecular Machine Learning.
      ,

      Li J., Cai D., He X. Learning Graph-Level Representation for Drug Discovery. arXiv preprint arXiv:1709.03741 2017.

      Built on multiple public databases, it covers approximately 700,000 compounds that have been tested across different properties. For drug property prediction, the integration of attention and gate mechanisms improved the performance of graph convolutional networks.
      • Ryu S.
      • Lim J.
      • Hong S.H.
      • et al.
      Deeply Learning Molecular Structure-Property Relationships Using Attention- and Gate-Augmented Graph Convolutional Network.
      The value of open access code is evident in the approach of Ryu et al., which can distinguish two separated molecular regions related to charge-transfer excitations in highly efficient photovoltaic molecules without any electronic structure information. Such methods were developed with the help of publicly available code. These efforts are particularly pertinent because the eventual goal for AI is successful implementation and integration into patient care.
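To make the idea of attention and gate mechanisms in a graph convolution more concrete, the following is a minimal sketch of a single layer over a molecular graph. It is not a reproduction of the architecture of Ryu et al.: the function name `gated_attention_layer`, the dot-product attention scores, the scalar gate, and the toy three-atom graph are all assumptions chosen for brevity, and a trained model would learn its weights rather than take them as fixed inputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_layer(features: Dict[int, Vector],
                          neighbors: Dict[int, List[int]],
                          gate_weights: Vector,
                          gate_bias: float = 0.0) -> Dict[int, Vector]:
    """One graph convolution step with attention and a gated skip connection.

    gate_weights must have twice the feature dimension (assumed layout:
    weights for the old state followed by weights for the new state).
    """
    updated: Dict[int, Vector] = {}
    for atom, h_old in features.items():
        # Attend over the atom itself and its bonded neighbors.
        context = [atom] + neighbors[atom]
        weights = softmax([dot(h_old, features[j]) for j in context])
        # Attention-weighted sum of the context features.
        h_new = [sum(w * features[j][d] for w, j in zip(weights, context))
                 for d in range(len(h_old))]
        # Gate in (0, 1) decides how much of the new state replaces the old one.
        z = sigmoid(dot(gate_weights, h_old + h_new) + gate_bias)
        updated[atom] = [z * n + (1.0 - z) * o for n, o in zip(h_new, h_old)]
    return updated


if __name__ == "__main__":
    # Hypothetical three-atom chain 0-1-2 with two-dimensional atom features.
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
    bonds = {0: [1], 1: [0, 2], 2: [1]}
    print(gated_attention_layer(feats, bonds, gate_weights=[0.0] * 4))
```

In this sketch the attention weights control which neighboring atoms contribute most to an atom's updated representation, while the gate controls how much of the previous representation is retained.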
      With the growing number of companies occupying the AI space, there is a need to use AI intelligently to make drug development processes more efficient. Many companies now offer AI services at each stage of the drug development pipeline. Every problem is unique; hence, a different approach is needed to answer each question elegantly. This highlights the importance of selecting the most appropriate company, and its accompanying AI platform, to solve the issue at hand. In addition, with the proliferation of AI companies, there is a worry that the drug development space will become saturated. To prevent this saturation, it is essential to keep these companies in check and to encourage potential startups to employ approaches that have not been explored before, providing alternative options and promoting diversity.
      The examples illustrated in this review show that we are just scratching the surface of AI's capabilities in transforming drug development processes. Once the issues raised earlier are addressed, models of high predictive validity hold the key to increased efficiency in drug development. With the availability of more effective therapeutics coupled with accurate biomarker identification and patient stratification, AI has great potential to advance precision and personalized medicine at unprecedented rates.
      Declaration of Conflicting Interests
      The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

      Funding

      The authors received no financial support for the research, authorship, and/or publication of this article.

      References

        • DiMasi J.A.
        • Feldman L.
        • Seckler A.
        • et al.
        Trends in Risks Associated with New Drug Development: Success Rates for Investigational Drugs.
        Clin. Pharmacol. Ther. 2010; 87: 272-277
        • Kola I.
        • Landis J.
        Can the Pharmaceutical Industry Reduce Attrition Rates?
        Nat. Rev. Drug Discov. 2004; 3: 711-716
        • Scior T.
        • Bender A.
        • Tresadern G.
        • et al.
        Recognizing Pitfalls in Virtual Screening: A Critical Review.
        J. Chem. Inform. Model. 2012; 52: 867-881
        • Lavecchia A.
        Machine-Learning Approaches in Drug Discovery: Methods and Applications.
        Drug Discov. Today. 2015; 20: 318-331
        • McCarthy J.
        • Hayes P.J.
        Some Philosophical Problems from the Standpoint of Artificial Intelligence.
        in: Webber B.L. Nilsson N.J. Readings in Artificial Intelligence. Elsevier, New York 1981: 431-450
        • Korf R.E.
        Does Deep Blue Use AI?
        in: Proceedings of the 4th AAAI Conference on Deep Blue versus Kasparov: The Significance for Artificial Intelligence. AAAI Press, Palo Alto, CA 1997: 1-2
        • DeCoste D.
        The Future of Chess-Playing Technologies and the Significance of Kasparov versus Deep Blue.
        in: Proceedings of the 4th AAAI Conference on Deep Blue versus Kasparov: The Significance for Artificial Intelligence. AAAI Press, Palo Alto, CA 1997: 9-13
        • Giaccone G.
        • Bazhenova L.
        • Nemunaitis J.
        • et al.
        A Phase III Study of Belagenpumatucel-L, an Allogeneic Tumour Cell Vaccine, as Maintenance Therapy for Non-Small Cell Lung Cancer.
        Eur. J. Cancer. 2015; 51: 2321-2329
        • Schölkopf B.
        • Tsuda K.
        • Vert J.-P.
        Support Vector Machine Applications in Computational Biology.
        MIT Press, Cambridge, MA 2004
        • Vapnik V.
        The Nature of Statistical Learning Theory.
        Springer Science & Business Media, New York 2013
        • Vapnik V.
        Statistical Learning Theory.
        Wiley, New York 1998
        • Quinlan J.R.
        Induction of Decision Trees.
        Machine Learn. 1986; 1: 81-106
        • Altman N.S.
        An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression.
        Am. Stat. 1992; 46: 175-185
        • Nielsen T.D.
        • Jensen F.V.
        Bayesian Networks and Decision Graphs.
        Springer Science & Business Media, New York 2009
        • Dempster A.P.
        A Generalization of Bayesian Inference.
        J. R. Stat. Soc. Ser. B (Methodol.). 1968; 30: 205-232
        • Haykin S.
        Neural Networks: A Comprehensive Foundation.
        Prentice Hall, Upper Saddle River, NJ 1994
        • LeCun Y.
        • Boser B.E.
        • Denker J.S.
        • et al.
        Handwritten Digit Recognition with a Back-Propagation Network.
        in: Touretzky D. Advances in Neural Information Processing Systems. Morgan Kaufmann, Denver, CO 1990: 396-404
        • LeCun Y.
        • Bottou L.
        • Bengio Y.
        • et al.
        Gradient-Based Learning Applied to Document Recognition.
        Proc. IEEE. 1998; 86: 2278-2324
        • Hochreiter S.
        • Schmidhuber J.
        Long Short-Term Memory.
        Neural Comput. 1997; 9: 1735-1780
        • Hinton G.E.
        • Salakhutdinov R.R.
        Reducing the Dimensionality of Data with Neural Networks.
        Science. 2006; 313: 504-507
        • Hinton G.E.
        • Osindero S.
        • Teh Y.-W.
        A Fast Learning Algorithm for Deep Belief Nets.
        Neural Comput. 2006; 18: 1527-1554
        • Salakhutdinov R.
        • Hinton G.
        Deep Boltzmann Machines.
        Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research. 2009; 5: 448-455
        • Hinton G.E.
        Learning Multiple Layers of Representation.
        Trends Cogn. Sci. 2007; 11: 428-434
        • Hinton G.E.
        Training Products of Experts by Minimizing Contrastive Divergence.
        Neural Comput. 2002; 14: 1771-1800
        • Goodfellow I.
        • Pouget-Abadie J.
        • Mirza M.
        • et al.
        Generative Adversarial Nets.
        in: Ghahramani Z. Welling M. Cortes C. Lawrence N.D. Weinberger K.Q. Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge, MA 2014: 1-9
        • Karras T.
        • Aila T.
        • Laine S.
        • et al.
        Progressive Growing of GANs for Improved Quality, Stability, and Variation.
        arXiv preprint arXiv:1710.10196. 2017;

        • Goodfellow I.
        NIPS 2016 Tutorial: Generative Adversarial Networks.
        arXiv preprint arXiv:1701.00160. 2016;
        • Bajorath J.
        Integration of Virtual and High-Throughput Screening.
        Nat. Rev. Drug Discov. 2002; 1: 882-894
        • Bajorath J.
        Selected Concepts and Investigations in Compound Classification, Molecular Descriptor Analysis, and Virtual Screening.
        J. Chem. Inform. Comput. Sci. 2001; 41: 233-245
        • Cramer III, R.D.
        • Redl G.
        • Berkoff C.E.
        Substructural Analysis. Novel Approach to the Problem of Drug Design.
        J. Med. Chem. 1974; 17: 533-535
        • Hodes L.
        • Hazard G.F.
        • Geran R.I.
        • et al.
        A Statistical-Heuristic Method for Automated Selection of Drugs for Screening.
        J. Med. Chem. 1977; 20: 469-475
        • Hert J.
        • Willett P.
        • Wilton D.J.
        • et al.
        New Methods for Ligand-Based Virtual Screening: Use of Data Fusion and Machine Learning to Enhance the Effectiveness of Similarity Searching.
        J. Chem. Inform. Model. 2006; 46: 462-470
        • Gesta S.
        • Guntur K.
        • Majumdar I.D.
        • et al.
        Reduced Expression of Collagen VI Alpha 3 (COL6A3) Confers Resistance to Inflammation-Induced MCP1 Expression in Adipocytes.
        Obesity. 2016; 24: 1695-1703
        • Victors M.L.
        • Borgeson B.C.
        • St-Jean-Leblanc C.
        Systems and Methods for Evaluating Whether Perturbations Discriminate an on Target Effect.
        US Patent 10146914B1, December 4, 2018

        • Gibson C.C.
        • Zhu W.
        • Davis C.T.
        • et al.
        Strategy for Identifying Repurposed Drugs for the Treatment of Cerebral Cavernous Malformation.
        Circulation. 2015; 131: 289-299
        • Riant F.
        • Bergametti F.
        • Ayrignac X.
        • et al.
        Recent Insights into Cerebral Cavernous Malformations: The Molecular Genetics of CCM.
        FEBS J. 2010; 277: 1070-1075
        • Chon J.
        Automation and Machine Learning: A Look into Recursion Pharmaceuticals.
        September 2017. https://www.rarediseasereview.org/publications/2017/9/17/automation-and-machine-learning-a-look-into-recursion-pharmaceuticals

        • Dahl G.E.
        • Jaitly N.
        • Salakhutdinov R.
        Multi-Task Neural Networks for QSAR Predictions.
        arXiv preprint arXiv:1406.1231. 2014;

        • Mayr A.
        • Klambauer G.
        • Unterthiner T.
        • et al.
        DeepTox: Toxicity Prediction Using Deep Learning.
        Front. Environ. Sci. 2016; 3: 80
        • Ma J.
        • Sheridan R.P.
        • Liaw A.
        • et al.
        Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships.
        J. Chem. Inform. Model. 2015; 55: 263-274
        • Xu K.
        • Hu W.
        • Leskovec J.
        • et al.
        How Powerful Are Graph Neural Networks?
        arXiv preprint arXiv:1810.00826. 2018;
        • Duvenaud D.K.
        • Maclaurin D.
        • Iparraguirre J.
        • et al.
        Convolutional Networks on Graphs for Learning Molecular Fingerprints.
        in: Cortes C. Lawrence N.D. Lee D.D. Sugiyama M. Garnett R. Advances in Neural Information Processing Systems. Curran Associates, Inc., Red Hook, NY 2015: 2224-2232
        • Glen R.C.
        • Bender A.
        • Arnby C.H.
        • et al.
        Circular Fingerprints: Flexible Molecular Descriptors with Applications from Physical Chemistry to ADME.
        IDrugs. 2006; 9: 199
        • Kearnes S.
        • McCloskey K.
        • Berndl M.
        • et al.
        Molecular Graph Convolutions: Moving beyond Fingerprints.
        J. Comput. Aid. Mol. Design. 2016; 30: 595-608
        • Schütt K.T.
        • Arbabzadah F.
        • Chmiela S.
        • et al.
        Quantum-Chemical Insights from Deep Tensor Neural Networks.
        Nat. Commun. 2017; 8: 1-8
        • Gilmer J.
        • Schoenholz S.S.
        • Riley P.F.
        • et al.
        Neural Message Passing for Quantum Chemistry.
        arXiv preprint arXiv:1704.01212. 2017;

        • Liu K.
        • Sun X.
        • Jia L.
        • et al.
        Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction.
        Int. J. Mol. Sci. 2019; 20: 3389
        • Chan H.S.
        • Shan H.
        • Dahoun T.
        • et al.
        Advancing Drug Discovery via Artificial Intelligence.
        Trends Pharmacol. Sci. 2019; 40: 592-604
        • AlQuraishi M.
        AlphaFold at CASP13.
        Bioinformatics. 2019; 35: 4862-4865
        • AlphaFold Team
        Computational Predictions of Protein Structures Associated with COVID-19.
        DeepMind website. 2020
        • Jin W.
        • Barzilay R.
        • Jaakkola T.
        Junction Tree Variational Autoencoder for Molecular Graph Generation.
        arXiv preprint arXiv:1802.04364. 2018;
        • You J.
        • Liu B.
        • Ying Z.
        • et al.
        Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation.
        in: Bengio S. Wallach H.M. Larochelle H. Grauman K. Cesa-Bianchi N. Advances in Neural Information Processing Systems. Curran Associates, Inc., Red Hook, NY 2018: 6410-6421
        • Shi C.
        • Xu M.
        • Zhu Z.
        • et al.
        GraphAF: A Flow-Based Autoregressive Model for Molecular Graph Generation.
        arXiv preprint arXiv:2001.09382. 2020;
        • Kingma D.P.
        • Welling M.
        Auto-Encoding Variational Bayes.
        arXiv preprint arXiv:1312.6114. 2013;

        • Zhavoronkov A.
        • Ivanenkov Y.A.
        • Aliper A.
        • et al.
        Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors.
        Nat. Biotechnol. 2019; 37: 1038-1040
        • Méndez-Lucio O.
        • Baillif B.
        • Clevert D.-A.
        • et al.
        De Novo Generation of Hit-Like Molecules from Gene Expression Signatures Using Artificial Intelligence.
        Nat. Commun. 2020; 11: 1-10
        • Wallach I.
        • Dzamba M.
        • Heifets A.
        AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-Based Drug Discovery.
        arXiv preprint arXiv:1510.02855. 2015;

        • Hsieh C.-H.
        • Li L.
        • Vanhauwaert R.
        • et al.
        Miro1 Marks Parkinson’s Disease Subset and Miro1 Reducer Rescues Neuron Loss in Parkinson’s Models.
        Cell Metab. 2019; 30: 1131-1140.e7
        • Burki T.
        A New Paradigm for Drug Development.
        Lancet Digital Health. 2020; 2: e226-e227
        • Mullard A.
        The Drug-Maker’s Guide to the Galaxy.
        Nat. News. 2017; 549: 445
        • Kotz J.
        In Silico Drug Design.
        Science Business eXchange. 2013; 6: 50
        • Besnard J.
        • Ruda G.F.
        • Setola V.
        • et al.
        Automated Design of Ligands to Polypharmacological Profiles.
        Nature. 2012; 492: 215-220
        • Roche O.
        • Sarmiento R.M.R.
        A New Class of Histamine H3 Receptor Antagonists Derived from Ligand Based Design.
        Bioorg. Med. Chem. Lett. 2007; 17: 3670-3675
        • Schneider G.
        • Geppert T.
        • Hartenfeller M.
        • et al.
        Reaction-Driven De Novo Design, Synthesis and Testing of Potential Type II Kinase Inhibitors.
        Future Med. Chem. 2011; 3: 415-424
        • Pantziarka P.
        • Bouche G.
        • Meheus L.
        • et al.
        The Repurposing Drugs in Oncology (ReDO) Project.
        ecancermedicalscience. 2014; 8: 442
        • Ashburn T.T.
        • Thor K.B.
        Drug Repositioning: Identifying and Developing New Uses for Existing Drugs.
        Nat. Rev. Drug Discov. 2004; 3: 673-683
        • Chong C.R.
        • Sullivan D.J.
        New Uses for Old Drugs.
        Nature. 2007; 448: 645-646
        • Sanchez C.G.
        • Molinski S.V.
        • Gongora R.
        • et al.
        The Antiretroviral Agent Nelfinavir Mesylate: A Potential Therapy for Systemic Sclerosis.
        Arthritis Rheumatol. 2018; 70: 115-126
        • Mason D.J.
        • Eastman R.T.
        • Lewis R.P.
        • et al.
        Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations with Novel Structures.
        Front. Pharmacol. 2018; 9: 1096
        • Tranfaglia M.R.
        • Thibodeaux C.
        • Mason D.J.
        • et al.
        Repurposing Available Drugs for Neurodevelopmental Disorders: The Fragile X Experience.
        Neuropharmacology. 2019; 147: 74-86
        • Rashid M.B.M.A.
        • Toh T.B.
        • Hooi L.
        • et al.
        Optimizing Drug Combinations against Multiple Myeloma Using a Quadratic Phenotypic Optimization Platform (QPOP).
        Sci. Translat. Med. 2018; 10: eaan0941
        • de Mel S.
        • Rashid M.B.
        • Zhang X.Y.
        • et al.
        Application of an Ex-Vivo Drug Sensitivity Platform towards Achieving Complete Remission in a Refractory T-Cell Lymphoma.
        Blood Cancer J. 2020; 10: 1-5
        • Blasiak A.
        • Lim J.J.
        • Seah S.G.K.
        • et al.
        IDentif.AI: Artificial Intelligence Pinpoints Remdesivir in Combination with Ritonavir and Lopinavir as an Optimal Regimen against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2).
        medRxiv. 2020;

        • van Gool A.J.
        • Bietrix F.
        • Caldenhoven E.
        • et al.
        Bridging the Translational Innovation Gap through Good Biomarker Practice.
        Nat. Rev. Drug Discov. 2017; 16: 587
        • Kraus V.B.
        Biomarkers as Drug Development Tools: Discovery, Validation, Qualification and Use.
        Nat. Rev. Rheumatol. 2018; 14: 354-362
        • West H.J.
        Novel Precision Medicine Trial Designs: Umbrellas and Baskets.
        JAMA Oncol. 2017; 3: 423
        • Renfro L.A.
        • Mandrekar S.J.
        Definitions and Statistical Properties of Master Protocols for Personalized Medicine in Oncology.
        J. Biopharm. Stat. 2018; 28: 217-228
        • Mason M.J.
        • Schinke C.
        • Eng C.L.
        • et al.
        Multiple Myeloma DREAM Challenge Reveals Epigenetic Regulator PHF19 as Marker of Aggressive Disease.
        Leukemia. 2020; 34: 1866-1874
        • Das R.
        • Ou F.
        • Washburn C.
        • et al.
        PD-020 Bayesian Machine Learning on CALGB/SWOG 80405 (Alliance) and PEAK Data Identify a Heterogeneous Landscape of Clinical Predictors of Overall Survival (OS) in Different Populations of Metastatic Colorectal Cancer (mCRC).
        Ann. Oncol. 2019; 30: mdz156.019
        • Taylor K.
        • Das S.
        • Pearson M.
        • et al.
        Systematic Drug Repurposing to Enable Precision Medicine: A Case Study in Breast Cancer.
        Digital Med. 2019; 5: 180
        • Wang T.
        • Fu X.
        • Jin T.
        • et al.
        Aspirin Targets P4HA2 through Inhibiting NF-κB and LMCD1-AS1/let-7g to Inhibit Tumour Growth and Collagen Deposition in Hepatocellular Carcinoma.
        EBioMedicine. 2019; 45: 168-180
        • Oh K.
        • Ko E.
        • Kim H.S.
        • et al.
        Transglutaminase 2 Facilitates the Distant Hematogenous Metastasis of Breast Cancer by Modulating Interleukin-6 in Cancer Cells.
        Breast Cancer Res. 2011; 13: R96
        • Tan D.
        • Phipps C.
        • Hwang W.Y.
        • et al.
        Panobinostat in Combination with Bortezomib in Patients with Relapsed or Refractory Peripheral T-Cell Lymphoma: An Open-Label, Multicentre Phase 2 Trial.
        Lancet Haematol. 2015; 2: e326-e333
        • Gaulton A.
        • Hersey A.
        • Nowotka M.
        • et al.
        The ChEMBL Database in 2017.
        Nucleic Acids Res. 2017; 45: D945-D954
        • Antolin A.A.
        • Tym J.E.
        • Komianou A.
        • et al.
        Objective, Quantitative, Data-Driven Assessment of Chemical Probes.
        Cell Chem. Biol. 2018; 25: 194-205.e5
        • Coker E.A.
        • Mitsopoulos C.
        • Tym J.E.
        • et al.
        canSAR: Update to the Cancer Translational Research and Drug Discovery Knowledge Base.
        Nucleic Acids Res. 2019; 47: D917-D922
        • Wu Z.
        • Ramsundar B.
        • Feinberg E.N.
        • et al.
        MoleculeNet: A Benchmark for Molecular Machine Learning.
        Chem. Sci. 2018; 9: 513-530
        • Li J.
        • Cai D.
        • He X.
        Learning Graph-Level Representation for Drug Discovery.
        arXiv preprint arXiv:1709.03741. 2017;

        • Ryu S.
        • Lim J.
        • Hong S.H.
        • et al.
        Deeply Learning Molecular Structure-Property Relationships Using Attention- and Gate-Augmented Graph Convolutional Network.
        arXiv preprint arXiv:1805.10988. 2018;