Virtual screening is a computational technique for rapidly identifying potentially active molecules in large compound libraries. It can greatly reduce the number of compounds that enter biochemical testing, which lowers screening costs and improves screening hit rates. As an important complement to experimental screening for the discovery of new hit compounds, it has received increasing attention from research institutions and pharmaceutical companies.
The goal of ligand-based virtual screening (LBVS) is to build a suitable model or query from the known biologically active ligands of a specific target, and to use it to identify structurally diverse molecules with similar biological activity. Ligand-based approaches rely on the chemical structures of ligands and the measured biological activities of similar ligands. LBVS methods are usually used to predict the biological activity of new compounds against a specific target (or a series of specific targets), and the compounds are then prioritized according to the ranking of their predicted activities. Databases such as ChEMBL, PubChem, and ExCAPE-DB store experimentally validated biological activity data for many compounds against a variety of proteins, and serve as sources of chemical structure and activity information for modeling. Representative LBVS methods include pharmacophore modeling, 3D shape screening, substructure searching, QSAR modeling, and machine learning.
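As an illustration of ranking a library against a known active ligand, the sketch below uses fingerprint similarity, a simple ligand-based technique that is not one of the methods listed above but follows the same idea. It assumes each molecule has already been encoded as a fingerprint, represented here as a set of "on" bit indices. The compound names and fingerprints are hypothetical, and a real workflow would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of the bits that are "on".
known_active = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 13}),
    "cmpd_B": frozenset({2, 3, 5}),
    "cmpd_C": frozenset({1, 4, 8, 12}),
}

ranking = rank_by_similarity(known_active, library)
for name, similarity in ranking:
    print(f"{name}: {similarity:.2f}")
```

Compounds at the top of the ranking share the most fingerprint bits with the known active and would be the first candidates passed on to experimental testing.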
Starting from molecules reported to act on the target, software such as Schrödinger and LigandScout is used to derive a pharmacophore model, which defines the key pharmacophore features a molecule needs in order to bind to the target. This pharmacophore model can then be used to screen molecular libraries. There are two main ways to identify and define a pharmacophore. If the target structure is available, the likely pharmacophore can be inferred by analyzing the mode of interaction between the ligand and the target protein. If the target structure or the binding mode is unknown, a series of compounds reported to act on the target is studied, and the groups that play a key role in activity are identified by methods such as conformational analysis and ligand superposition. In recent years, with the development of compound databases and computing technology, virtual screening of databases with pharmacophore models has become widely used and is now one of the important means of discovering lead compounds.
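To make the screening step concrete, the following is a minimal sketch of how a single conformer might be checked against a pharmacophore model. It assumes the model and the molecule are both reduced to lists of typed feature points with 3D coordinates, and that a match means the pairwise distances between model features are reproduced within a tolerance. The feature names, coordinates, and the `matches_pharmacophore` helper are hypothetical and do not reflect how Schrödinger or LigandScout implement matching.

```python
from itertools import permutations
from math import dist


def matches_pharmacophore(model, features, tolerance=1.0):
    """Return True if some assignment of molecule features to model features
    has identical feature types and reproduces every pairwise model distance
    within `tolerance` (same length unit as the coordinates)."""
    n = len(model)
    for candidate in permutations(features, n):
        # Feature types must agree position by position.
        if any(c[0] != m[0] for c, m in zip(candidate, model)):
            continue
        if all(
            abs(dist(candidate[i][1], candidate[j][1]) - dist(model[i][1], model[j][1])) <= tolerance
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False


# Hypothetical three-point model: (feature type, (x, y, z)).
model = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]

# Same geometry shifted in space, plus one extra feature: should match.
hit = [("aromatic", (1, 5, 1)), ("donor", (1, 1, 1)),
       ("acceptor", (4, 1, 1)), ("hydrophobic", (9, 9, 9))]

# Donor-acceptor distance is far too long: should not match.
miss = [("donor", (0, 0, 0)), ("acceptor", (8, 0, 0)), ("aromatic", (0, 4, 0))]

print(matches_pharmacophore(model, hit), matches_pharmacophore(model, miss))
```

Because only distances are compared, this sketch cannot distinguish a feature arrangement from its mirror image, and it tests one conformer at a time. A practical screen would enumerate conformers and typically use a more efficient matching procedure.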
Structure-based virtual screening (SBVS) requires a structural model of the target protein, which can be obtained experimentally by methods such as nuclear magnetic resonance (NMR) or X-ray diffraction, or computationally by molecular modeling (homology modeling). SBVS attempts to predict the binding mode between the ligand and the target protein, and uses a scoring function to estimate the binding free energy of each binding mode and rank the compounds, producing a list of candidates. Representative methods include molecular docking, molecular dynamics simulation, structure-based pharmacophore modeling, and machine learning. Among them, molecular docking is one of the most commonly used classical SBVS methods, and it plays a significant role in studying the mechanism of action between drugs and their targets and in developing new drugs. Molecular docking software is used to study the strength of the interaction between small-molecule ligands and receptor biomacromolecules, predict the binding mode and affinity of ligand-receptor complexes, and thereby find the best-fitting ligand structures.
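The ranking step can be sketched as follows. The example assumes a docking run has already produced several scored poses per ligand and that a lower (more negative) score means stronger predicted binding, which is a common convention but not a universal one. The ligand names and score values are hypothetical.

```python
def rank_ligands(pose_scores):
    """Keep the best (lowest) pose score of each ligand and sort ligands by it.

    `pose_scores` maps a ligand name to the scores of its docked poses.
    Ligands with no scored pose are skipped.
    """
    best = {ligand: min(scores) for ligand, scores in pose_scores.items() if scores}
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical docking output: estimated binding energies, lower is better.
pose_scores = {
    "lig_1": [-7.2, -6.8, -5.1],
    "lig_2": [-9.4, -8.0],
    "lig_3": [-6.0],
    "lig_4": [],  # docking produced no pose
}

candidates = rank_ligands(pose_scores)
for ligand, score in candidates:
    print(ligand, score)
```

The resulting list is the candidate list described above. In practice a cutoff or a fixed number of top-ranked compounds is usually carried forward.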
The theoretical basis of molecular docking is that the recognition between ligand and receptor relies on spatial shape matching and energy matching. This idea goes back to the lock-and-key model and is refined by "induced fit", in which both partners adjust their conformations on binding. Molecular docking can be roughly divided into rigid docking, semi-flexible docking, and flexible docking. In rigid docking, the molecular structures remain unchanged, and the method mainly evaluates how well the conformations match. The calculation is relatively simple, so rigid docking is better suited to macromolecular systems such as protein-protein and protein-nucleic acid complexes. In semi-flexible docking, the molecular conformation can change within a certain range, so it is better suited to interactions between proteins and small molecules. In general, the small molecule is allowed to change conformation freely, while the macromolecule remains rigid or retains only a few rotatable amino acid residues to keep the computation efficient. In flexible docking, the conformation of the protein is also free to change. This consumes more computing resources but can improve docking accuracy.
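One way to see why the three docking modes differ in cost is to count the variables the search has to explore. The sketch below is a simplification under stated assumptions: the ligand always has six rigid-body degrees of freedom (three translations and three rotations), flexibility is modeled only as rotatable torsions, and semi-flexible docking keeps the receptor fully rigid. The torsion counts are hypothetical.

```python
RIGID_BODY_DOF = 6  # 3 translations + 3 rotations of the ligand


def search_dimensions(mode, ligand_torsions=0, receptor_torsions=0):
    """Number of variables searched in each docking mode (simplified model)."""
    if mode == "rigid":
        return RIGID_BODY_DOF
    if mode == "semi-flexible":
        # Assumption: only the ligand torsions are free, the receptor is rigid.
        return RIGID_BODY_DOF + ligand_torsions
    if mode == "flexible":
        return RIGID_BODY_DOF + ligand_torsions + receptor_torsions
    raise ValueError(f"unknown docking mode: {mode!r}")


# Hypothetical system: 8 rotatable ligand bonds, 12 flexible side-chain torsions.
for mode in ("rigid", "semi-flexible", "flexible"):
    print(mode, search_dimensions(mode, ligand_torsions=8, receptor_torsions=12))
```

Each added dimension enlarges the space the search algorithm must sample, which is why flexible docking is the most expensive of the three.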
A molecular docking program has two important components: the search algorithm and the scoring function. Search algorithms, such as genetic algorithms and Monte Carlo methods, explore the conformational space of the ligand and receptor. Scoring functions, which include force-field-based, empirical, and knowledge-based functions, are used to score binding poses. A commonly used molecular docking program is AutoDock, which is mainly used for ligand-protein docking. Another is DOCK, which is suitable for docking flexible ligands to flexible proteins.
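The interplay between the two components can be illustrated with a toy example. The sketch below pairs a plain Metropolis Monte Carlo search (not a tree search, and not the algorithm of any particular docking program) with a made-up scoring function. It assumes a pose is just three coordinates and that lower scores are better. Real programs also search rotations and torsions and use physically motivated scoring terms.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function: lower is better, minimum 0 at (1.0, -2.0, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def monte_carlo_search(score, start, steps=5000, step_size=0.5, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search; returns the best pose seen and its score."""
    rng = random.Random(seed)
    current, current_score = tuple(start), score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        # Propose a random perturbation of the current pose.
        trial = tuple(x + rng.uniform(-step_size, step_size) for x in current)
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best_pose, best_score = monte_carlo_search(toy_score, start=(5.0, 5.0, 5.0))
print(best_pose, best_score)
```

The search algorithm only proposes and accepts poses, while the scoring function alone decides which poses count as good, so either component can be swapped without changing the other.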