Dr. Thomas Engel - Faculty for Chemistry and Pharmacy

Textbook - Learning Objectives

  1. Introduction
  2. Representation of Chemical Compounds
  3. Representation of Chemical Reactions
  4. The Data
  5. Databases/Datasources
  6. Structure Search Methods
  7. Calculation of Physical and Chemical Data
  8. Calculation of Structure Descriptors
  9. Methods for Data Analysis
  10. Applications

1. Introduction

2. Representation of Chemical Compounds

  • To understand different kinds of conventional nomenclature of chemical compounds
  • To know how to transform a chemical structure into a language for computer representation and manipulation
  • To be able to represent the constitution in an unambiguous and unique manner
  • To learn more about connection tables and additional special notations of chemical structures
  • To become familiar with structure exchange formats such as Molfile and SDfile
  • To find out how stereochemistry can be represented in a 2D structure
  • To know how to generate 3D structures and how to represent and handle them with the computer
  • To be introduced to molecular surfaces and to different models for visualization
  • To recognize which programs can be used for the generation and visualization of molecular structures

3. Representation of Chemical Reactions

  • To understand how to extract knowledge from reaction information
  • To recognize reaction classification as an important step in learning from reaction instances
  • To appreciate the reaction center and its importance in reaction searching
  • To become familiar with basic models of chemical reactivity
  • To know simple approaches to quantify chemical reactivity
  • To be able to follow some algorithmic approaches to reaction classification
  • To understand the formal treatment of the stereochemistry of reactions

4. The Data

  • To gain a general overview of data and its pre-processing for learning
  • To know, in outline, the pathways for data acquisition
  • To understand what datasets are and how to estimate their quality
  • To be able to deal with outliers and redundancy
  • To know how to carry out scaling, mean-centering, and auto-scaling
  • To understand data transformations and their applicability
  • To know how to select an optimal subset of descriptors
  • To become familiar with dataset optimization techniques
  • To know how to validate results
  • To understand what training and test sets are, and how to make use of them
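The scaling objectives above can be illustrated with a minimal sketch. It assumes a single descriptor column given as a list of floats; the function names are hypothetical and chosen only for this illustration. Autoscaling is taken here to mean mean-centering followed by division by the sample standard deviation.

```python
from statistics import mean, stdev


def mean_center(values):
    """Subtract the column mean so the values are centered on zero."""
    m = mean(values)
    return [v - m for v in values]


def range_scale(values):
    """Map the values linearly onto [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def autoscale(values):
    """Mean-center, then divide by the sample standard deviation (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical descriptor column, used only for illustration.
column = [2.0, 4.0, 6.0]
centered = mean_center(column)
scaled = range_scale(column)
auto = autoscale(column)
```

After autoscaling, every descriptor has zero mean and unit variance, so descriptors measured in different units contribute comparably to a model.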

5. Databases/Datasources

  • To understand basic database theory
  • To become familiar with the classification of chemical databases according to their data stock
  • To get to know various databases covering the topics of bibliographic data, physicochemical properties, and spectroscopic, crystallographic, biological, structural, reaction, and patent data
  • To be able to access chemical information available on the Internet

6. Structure Search Methods

  • To become familiar with various methods and tools for full structure recognition and for searching structural datasets
  • To learn a more thorough approach to the solution of the substructure search problem
  • To become familiar with the basics of chemical structure similarity, similarity measures, and different approaches exploited within the similarity search process
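As one example of a similarity measure, the Tanimoto coefficient compares two structures through their fingerprints: the number of shared features divided by the number of features present in either. The sketch below assumes each fingerprint is given as a set of "on" bit positions; the function name and the example bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not a fixed rule.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints, used only for illustration.
query = {1, 2, 3}
candidate = {2, 3, 4}
similarity = tanimoto(query, candidate)
```

In a similarity search, the query fingerprint is compared against every database entry and the hits are ranked by this value.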

7. Calculation of Physical and Chemical Data

  • To be able to calculate molecular properties by additivity schemes based on contributions by structural subunits
  • To become familiar with the estimation of thermochemical data
  • To understand the estimation of average drug-receptor binding energies
  • To become familiar with the algorithm for charge calculation by partial equalization of orbital electronegativity (PEOE) and by a modified Hückel Molecular Orbital method
  • To appreciate residual electronegativity as a measure of the inductive effect
  • To follow a simple scheme for calculating the polarizability effect
  • To know how linear equations can be used for calculation of enthalpies of gas-phase reactions
  • To understand the basic concepts of force-field calculations
  • To see the contributions to the molecular mechanics potential energy function and their mathematical representation
  • To get an overview of the currently available software and implementations with their strengths, weaknesses, and application areas
  • To understand the importance of investigating the dynamical behavior of molecules
  • To have an overview of the algorithms and basic concepts used to perform molecular dynamics simulations
  • To consider exemplary state-of-the-art applications of MD simulations
  • To become familiar with the different quantum mechanical methods
  • To know which properties can be derived from quantum mechanical methods
  • To ponder the future of quantum mechanics in chemoinformatics

8. Calculation of Structure Descriptors

  • To understand what structure descriptors are
  • To know what QSAR and QSPR are, and the steps in QSAR/QSPR
  • To find out how to distinguish between the different kinds of molecular descriptors
  • To understand the recommendations for structure descriptors in order to be able to apply them in QSAR or drug design in conjunction with statistical or machine learning techniques
  • To become familiar with the properties of these descriptors
  • To know which descriptors are frequently used

9. Methods for Data Analysis

  • To understand the machine learning process and learning concepts
  • To become familiar with the structure and task of decision trees
  • To gain insight into chemometric methods such as correlation analysis, Multiple Linear Regression Analysis, Principal Component Analysis, Principal Component Regression, and Partial Least Squares regression/Projection to Latent Variables
  • To understand neural networks, especially Kohonen, counterpropagation and backpropagation networks, and their applications
  • To know about fuzzy sets and fuzzy logic
  • To become familiar with genetic algorithms and their application for descriptor selection
  • To understand data mining and data mining tasks
  • To understand visual data mining and information visualization techniques
  • To appreciate the architecture and tasks of expert systems and examples of expert systems in chemistry

10. Applications

  • To understand how to derive a quantitative relationship between property and structure
  • To become familiar with the application of the basic principles of the model building process by means of calculating log P and log S values
  • To acquire an overview of methods and examples of some pitfalls in modeling log P, log S, and the toxic effects of compounds
  • To identify the main methods and tools available for the computer prediction of spectra from the molecular structure, and for automatic structure elucidation from spectral data
  • To realize that a proper representation of the molecular structure is crucial for the prediction of spectra
  • To recognize the main approaches for structure representation in the context of structure-spectra correlations
  • To be able to define reaction planning, reaction prediction, and synthesis design
  • To know how to acquire knowledge from reaction databases
  • To understand reaction simulation systems
  • To become familiar with a knowledge-based reaction prediction system
  • To appreciate the different levels of the evaluation of chemical reactions
  • To know how reaction sequences are modeled
  • To understand kinetic modeling of chemical reactions
  • To become familiar with biochemical pathways
  • To recognize the different levels of representation of biochemical reactions
  • To understand metabolic reaction networks
  • To know the principles of retrosynthetic analysis
  • To understand the disconnection approach
  • To become familiar with synthesis design systems
  • To develop a suitable synthesis strategy for a target compound by searching for synthesis precursors, starting materials, and synthesis reactions
  • To become familiar with the drug discovery process
  • To find out what a lead structure is
  • To appreciate the impact of chemoinformatics on the drug discovery process
  • To understand the "similar property" principle
  • To know what virtual screening is
  • To become familiar with Lipinski's "Rule of Five"
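Lipinski's "Rule of Five" lends itself to a short sketch. It uses the commonly stated thresholds (molecular weight at most 500, log P at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and assumes the four property values have already been calculated elsewhere. The function name and the example values are hypothetical.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return the list of Rule of Five criteria that a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if log_p > 5:
        violations.append("log P > 5")
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical property values, used only for illustration.
small_compound = rule_of_five_violations(350.0, 2.5, 2, 5)
large_compound = rule_of_five_violations(650.0, 6.1, 2, 5)
```

Returning the violated criteria rather than a single yes/no answer lets a virtual screening workflow choose how many violations it tolerates.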