
Bibliographic information of the Deutsche Nationalbibliothek:

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://www.dnb.de.

Production and publisher:

BoD – Books on Demand GmbH, Norderstedt

ISBN: 978-3-7412-5505-2

INTRODUCTION
AIM OF THE BOOK
THE AUTHOR
PART I CHEMISTRY BASICS
- 1 Atoms
  - 1.1 Isotope
  - 1.2 Electronegativity
  - 1.3 Electromagnetic Radiation
  - 1.4 Bohr's theory
  - 1.5 Heisenberg uncertainty principle
  - 1.6 Wave mechanics
  - 1.7 Quantum mechanical orbital model
- 2 Molecules
  - 2.1 Covalent Bonds
  - 2.2 Ionic Bonds
  - 2.3 Intermolecular interactions
    - 2.3.1 Electric Dipoles
    - 2.3.2 Hydrogen Bonds
    - 2.3.3 London dispersion force
- 3 Acids and Bases
  - 3.1 pH-Value
  - 3.2 pK_S-Value and pK_B-Value
- 4 Chemical reactions
- 5 Further Reading
  - 5.1 Textbooks
PART II BIOLOGY BASICS
- 6 Prokaryotes
- 7 Eukaryotes
- 8 Further Reading
  - 8.1 Textbooks
PART III BIOCHEMISTRY
- 9 Proteins
  - 9.1 Amino Acids
  - 9.2 Peptides
  - 9.3 Protein Structure
  - 9.4 Super secondary Structures
- 10 Carbohydrates
  - 10.1 Monosaccharides
  - 10.2 Disaccharides
  - 10.3 Polysaccharides
  - 10.4 Glycolization
- 11 Fatty Acids
  - 11.1 Steroids
- 12 Nucleic Acids
  - 12.1 DNA
  - 12.2 RNA
- 13 Further Reading
  - 13.1 Textbooks
  - 13.2 References
PART IV MOLECULAR GENETICS
- 14 Gene
  - 14.1 Characteristic features of eukaryotic Genes
- 15 Replication
- 16 Protein Biosynthesis
  - 16.1 Transcription
  - 16.2 Translation
  - 16.3 The Protein biosynthesis in detail
- 17 Further Reading
  - 17.1 Textbooks
  - 17.2 References
PART V METABOLISM
- 18 Enzyme
  - 18.1 Enzyme kinetics
  - 18.2 Enzyme Inhibitions
  - 18.3 Cofactors
  - 18.4 Enzyme Specificity
    - 18.4.1 Substrate Specificity
    - 18.4.2 Specificity of Action
    - 18.4.3 Stereo specificity
  - 18.5 Enzyme classification
    - 18.5.1 Classification on the basis of sequence similarities
    - 18.5.2 Classification based on structures
    - 18.5.3 Classification based on Function
  - 18.6 Important Enzymes
    - 18.6.1 Trypsin
    - 18.6.2 Isoenzyme
- 19 Primary metabolism
  - 19.1 Citric acid cycle
  - 19.2 Carbohydrate metabolism
    - 19.2.1 Glycolysis (Embden-Meyerhof Pathway)
    - 19.2.2 Entner-Doudoroff Pathway
    - 19.2.3 Pentose Phosphate Pathway
    - 19.2.4 Gluconeogenesis
  - 19.3 Fatty acid metabolism
    - 19.3.1 ß-Oxidation
    - 19.3.2 Fatty acid Synthesis
  - 19.4 Amino acid Metabolism
    - 19.4.1 Amino acid Synthesis
    - 19.4.2 Amino acid Degradation
  - 19.5 Secondary Metabolism
    - 19.5.1 Alkaloids
    - 19.5.2 Flavonoids
    - 19.5.3 Flavons
- 20 Further Reading
  - 20.1 Textbooks
  - 20.2 References
PART VI WORKING TECHNIQUES
- 21 Cell Disruption
  - 21.1 Extraction
- 22 Chromatography
  - 22.1 Ion Exchange Chromatography
  - 22.2 Affinity Chromatography
  - 22.3 Gel permeation chromatography (Gel filtration)
- 23 Electrophoresis
  - 23.1 Gel Electrophoresis
    - 23.1.1 Separation of DNA
    - 23.1.2 Separation of Proteins
    - 23.1.3 Two-dimensional gel electrophoresis (2GE)
- 24 NMR Spectroscopy
  - 24.1 Principle of the Nuclear Spin
- 25 X-ray Analysis Of Crystals
  - 25.1 Crystal Growth
  - 25.2 X-ray Diffraction
  - 25.3 Phase problem
  - 25.4 Structure Refinement
- 26 Microarrays
  - 26.1 DNA-Chips
  - 26.2 Protein Chips
- 27 Mass Spectrometry
  - 27.1 Ion Sources
    - 27.1.1 Electrospray Ionization
    - 27.1.2 Matrix-Assisted Laser Desorption/Ionization
  - 27.2 Mass Analyzer
    - 27.2.1 Quadrupoles
    - 27.2.2 Time of Flight (TOF)
    - 27.2.3 Ion trap
  - 27.3 Detectors
  - 27.4 Liquid Chromatography Mass Spectrometry (LC-MS)
  - 27.5 Gas Chromatography Mass Spectrometry (GC-MS)
- 28 Molecular Biological Methods
  - 28.1 PCR: Polymerase Chain Reaction
  - 28.2 Molecular Cloning
  - 28.3 Complementary DNA
- 29 Further Reading
  - 29.1 Textbooks
  - 29.2 References
PART VII COMPUTER SCIENCE BASICS
- 30 Operating System (OS)
  - 30.1 The Linux operating system
    - 30.1.1 Shell
    - 30.1.2 Directories and Files
    - 30.1.3 Rights Management
    - 30.1.4 LINUX Commands
- 31 Programming Languages
  - 31.1 Classifications
    - 31.2.1 Interpreter
    - 31.2.2 Compiler
    - 31.2.3 Data Types
- 32 Algorithms
  - 32.1 Deterministic Algorithms
  - 32.2 Graph Theory
  - 32.3 Computational complexity theory
- 33 Further Reading
  - 33.1 Textbooks
PART VIII STATISTICS AND STOCHASTICS
- 34 Statistical Parameters
  - 34.1 Regression analysis
    - 34.1.1 Linear Regression
  - 34.2 Maximum-Likelihood-Method
  - 34.3 Covariance
  - 34.4 Correlation
    - 34.4.1 Correlation and Causality
- 35 Probability
  - 35.1 Events
  - 35.2 Probability Axioms
  - 35.3 Addition of probabilities
  - 35.4 Conditional probability
  - 35.5 Multiplication theorem
  - 35.6 Total probability
  - 35.7 Bayes' theorem
  - 35.8 Permutation
- 36 Statistical Distributions
  - 36.1 Binomial Distribution
  - 36.2 Hypergeometric Distribution
  - 36.3 Poisson Distribution
  - 36.4 Normal Distribution
    - 36.4.1 Standard Normal Distribution
- 37 Statistical Hypotheses
  - 37.1 False-positive Error
  - 37.2 False-negative Error
  - 37.3 Significance level
  - 37.4 Test Statistic
  - 37.5 P-Value
  - 37.6 Confidence Interval
  - 37.7 t-Test
  - 37.8 F-Test
- 38 Cluster analysis
  - 38.1 Similarity measures
    - 38.1.1 Tanimoto Coefficient
    - 38.1.2 Euclidean Distance
  - 38.2 Cluster methods
    - 38.2.1 Nearest Neighbor Clustering
    - 38.2.2 Furthest Neighbor Clustering
    - 38.2.3 Centroid-Approach
    - 38.2.4 Group Averaging
    - 38.2.4 Minimum-Variance-Methods
    - 38.2.4 k-Means method
- 39 Principal component analysis
- 40 Further Reading
  - 40.1 Textbooks
  - 40.2 References
PART IX GENERAL BIOINFORMATICS
- 41 Basics
  - 41.1 Annotation
  - 41.2 Protein Domains
  - 41.3 Homologies
  - 41.4 Consensus sequences
  - 41.5 Sequence • Folding • Function
- 42 Databases
  - 42.1 Relational Databases
    - 42.1.1 Structured Query Language (SQL)
  - 42.2 Biological Databases
    - 42.2.1 GenBank
    - 42.2.2 ENA
    - 42.2.3 SwissProt
    - 42.2.4 TrEMBL
    - 42.2.5 NCBI Protein Database
    - 42.2.6 PDB
    - 42.2.7 CATH
    - 42.2.8 Pfam
    - 42.2.9 PROSITE
    - 42.2.10 KEGG
    - 42.2.11 FASTA file format
- 43 Sequence Analyses
  - 43.1 Sequence Alignments
  - 43.2 Identity and similarity of sequences
    - 43.2.1 PAM-Matrix
    - 43.2.2 BLOSUM-Matrix
    - 43.2.3 Application of Substitution matrices
  - 43.3 Algorithms for Sequence analysis
    - 43.3.1 Needleman-Wunsch-Algorithm
    - 43.3.2 Smith-Waterman-Algorithm
    - 43.3.3 BLAST (Basic Local Alignment Search Tool)
    - 43.3.4 Multiple Sequence alignment (MSA)
- 44 Molecular Modeling
  - 44.1 De novo- or ab initio Methods
  - 44.2 Threading
  - 44.3 Homology modeling
    - 44.3.1 BLAST-Search for suitable sequences
    - 44.3.2 Determining the template sequence
    - 44.3.3 Manual improvement of the alignments
    - 44.3.4 Generation of the protein backbone
    - 44.3.5 Insertion of missing Loops
    - 44.3.6 Modeling of the amino acid side chains
    - 44.3.7 Energy minimization
    - 44.3.8 Model Validation
- 45 Genomics
  - 45.1 DNA Sequencing
    - 45.1.1 Chain termination method of Sanger
    - 45.1.2 Next Generation Sequencing (NGS)
  - 45.2 Genome Sequencing
    - 45.2.1 Clone-By-Clone Sequencing
    - 45.2.2 Whole-Shotgun Sequencing
    - 45.2.3 Sequence Assembly
  - 45.3 Genome Annotation
    - 45.3.1 Ab initio Gene Identification
    - 45.3.2 Hidden-Markov-Model (HMM)
    - 45.2.3 Gene identification based on database searching
- 46 Transcriptomics
  - 46.1 Non-Coding RNA (ncRNA)
    - 46.1.1 microRNA
    - 46.1.2 Small interfering RNA (siRNA)
    - 46.1.3 Long non-coding RNAs (lncRNA)
    - 46.1.4 CRISPR RNAs (crRNAs)
  - 46.2 Gene expression profiling
    - 46.2.1 Spot identification
    - 46.2.2 Normalization
    - 46.2.3 Expression ratio
  - 46.3 RNA-Seq
- 47 Proteomics
  - 47.1 Protein identification
    - 47.1.1 Peptide Mass Fingerprinting (PMF)
    - 47.1.2 MS/MS Peptide Fragment Fingerprinting (PFF)
    - 47.1.3 Molecular Weight Search (MOWSE)-Algorithm
    - 47.1.4 MSA-Algorithm
    - 47.1.5 Sequest-Algorithm
    - 47.1.6 Significance of database search results
  - 47.2 Protein Quantification
    - 47.2.1 Isotopically labeled quantification
    - 47.2.2 Label-free quantification
    - 47.2.3 Quantification algorithms
- 48 Metabolomics
- 49 Further Reading
  - 49.1 Textbooks
  - 49.2 References
PART X SPECIAL BIOINFORMATICS
- 50 Structure and Function
  - 50.1 New classification models for proteins
  - 50.2. Advanced classification model for enzymes
  - 50.3 Enzyme reaction mapping algorithm
    - 50.3.1 Coding of atoms
    - 50.3.2 Coding of atomic bonds
    - 50.3.3. Coding of functional groups
    - 50.3.4 Assignment of known reactants (reaction pairs)
    - 50.3.5 Classification of unknown enzymes
- 51 Metabolic Network Analysis
  - 51.1 Network reconstruction
  - 51.2 Modeling, leading question and objective function
  - 51.3 Flux Balance Analysis (FBA)
  - 51.4 Elementary modes and extreme pathways
- 52 Further Reading
  - 52.1 Textbooks
  - 52.2 References

INTRODUCTION

Bioinformatics links content from the life sciences with mathematical concepts from computer science and contributes significantly to current advances in molecular biology and medicine. Bioinformatics is not, as some publications misleadingly claim, a subfield of systems biology; rather, systems biology is an important field of application of bioinformatics. Further areas of application are structural biology, pharmaceutical and biotechnological research, and genome, proteome, transcriptome and metabolome analysis. A core field of bioinformatics is the development of software for the storage, evaluation and analysis of scientific data. Its main task, however, is the development of reliable algorithms for predicting biological functions from large biological datasets, as well as algorithms for simulating biological processes based on these data. Bioinformatics is an independent science, because only with the help of sophisticated bioinformatics algorithms will it be possible to generate usable knowledge for the prediction of biological functions from such large amounts of data. The long-term goal of the discipline is the computer-aided simulation of all known life processes of a human, animal or plant cell. The medium-term objective for bioinformaticians is therefore the gathering, storage and analysis of information about crucial biochemical processes in living cells. In this context, the development of biological databases plays an important role. Another medium-term goal is the analysis of the interactions of individual cells with each other and with their environment. Of particular interest are molecules and biochemical reactions whose function is altered by gene or protein modifications.
In this context, bioinformatics algorithms are used to determine the function of individual genes and proteins and to identify the regulatory components of cells. Understanding the content of this book requires basic knowledge of biology, chemistry and, in particular, biochemistry. For this reason, the book begins by introducing the required basics of these disciplines, important methods in molecular biology, essential working techniques and the basics of probability and statistics.

AIM OF THE BOOK

The book is addressed to graduate and undergraduate students in the life sciences and information technology, as well as to advanced researchers in these fields who want to acquire basic knowledge of bioinformatics. It serves both as an entry point to the discipline and as a means of deepening existing knowledge.

THE AUTHOR

Volker Egelhofer studied biotechnology, sinology, computer science and biochemistry. He obtained the degree of Master of Engineering in biotechnology at the University of Applied Science Berlin and the degree of Master of Science at the Free University of Berlin. He wrote his doctoral thesis in the field of theoretical biochemistry and bioinformatics at the Max Planck Institute for Molecular Genetics and received his doctorate in science from the Free University of Berlin in 2002. He has held lectures and seminars in the field of bioinformatics at the University of Cologne, the Technical University of Brunswick and the University of Vienna. He has more than 15 years' experience in the development of bioinformatics algorithms and is the author of a number of scientific publications.

PART I CHEMISTRY BASICS

Chemistry is a branch of science. The following chapters teach the basics of general and physical chemistry, an understanding of which is a prerequisite for a successful entry into bioinformatics.

1 ATOMS

Atoms are the basic building blocks of matter. Every atom is composed of a nucleus and an electron shell. The nucleus consists of protons and neutrons. The electrons move at high speed at a certain distance from the nucleus. Protons have a positive charge and electrons a negative charge. Neutrons, on the other hand, are not charged. The nucleus is very small compared with the volume of the atom, but it carries the entire positive charge and, because the mass of the electrons is negligible, almost the entire mass of the atom. The electrostatic repulsion of the protons due to their positive charge is overcome by the strong nuclear force.

1.1 Isotope

An uncharged atom contains the same number of protons and electrons. The atomic number is the number of protons in the nucleus. The mass number is the sum of the number of protons and the number of neutrons. Only the atomic number is decisive for the chemical properties of an atom. Isotopes are atoms with the same atomic number but different mass numbers. The difference in mass results from the different number of neutrons in the nuclei of the isotopes. Most natural elements consist of mixtures of isotopes. Carbon, for example, is a mixture of the isotopes ¹²C (> 98.9%), ¹³C (~ 1%) and ¹⁴C (traces). Exceptions are, for example, sodium and fluorine, each of which has only one stable isotope. The distribution of the isotopes and the resulting isotope patterns play an important role especially in mass spectrometry (see Chapter 27).
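The effect of an isotope mixture on the mass of an element can be illustrated with a short calculation. The following Python sketch computes the abundance-weighted average atomic mass of carbon. The isotope masses and abundances are commonly tabulated values and are assumptions that are not taken from this book; the function name `average_atomic_mass` is chosen for illustration only, and the trace isotope ¹⁴C is neglected.

```python
# Isotope data for carbon (assumed, commonly tabulated values):
# mass in unified atomic mass units (u) and natural abundance as a fraction.
CARBON_ISOTOPES = {
    "12C": (12.000000, 0.9893),
    "13C": (13.003355, 0.0107),
}


def average_atomic_mass(isotopes):
    """Return the abundance-weighted mean mass of an element in u."""
    total_abundance = sum(abundance for _, abundance in isotopes.values())
    weighted_sum = sum(mass * abundance for mass, abundance in isotopes.values())
    return weighted_sum / total_abundance


mass = average_atomic_mass(CARBON_ISOTOPES)
print(f"Average atomic mass of carbon: {mass:.3f} u")
```

Dividing by the total abundance keeps the result meaningful even if the abundances entered do not add up to exactly 1.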

1.2 Electronegativity

The term electronegativity describes the ability of an atom to attract the electrons of a chemical bond. The absolute values are not important; only the relative values matter when comparing the ability of atoms to bind electrons to their nucleus. The element with the highest electronegativity (4.0) is fluorine, followed by oxygen (3.5) and nitrogen (3.0). Hydrogen has a value of only 2.1.

1.3 Electromagnetic Radiation

The spectrum of electromagnetic radiation ranges from low-energy, long-wave radio waves to high-energy, short-wave gamma rays (see table 1). Electromagnetic radiation can be described both as a wave and as a stream of particles. Electromagnetic waves travel through a vacuum at the speed of light c. Two waves can superpose to form a wave of greater or lower amplitude. Electromagnetic radiation consists of an oscillating electric field that is perpendicular to an oscillating magnetic field. The relation between the wavelength λ and the frequency ν is described by the following formula:

$$c = \lambda \cdot \nu \qquad (1.1)$$

In quantum theory, electromagnetic radiation is described as a stream of photons. The energy E of a photon is given by the following formula:

$$E = h \cdot \nu \qquad (1.2)$$

where h = 6.6261 × 10^-34 J·s is the Planck constant.

Thus, the higher the frequency of the radiation, the higher the energy of its photons (for example, the energy of UV light is higher than that of visible or infrared light).
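The relation between wavelength, frequency and photon energy can be checked numerically. The following Python sketch assumes the usual forms of the two relations, c = λ·ν and E = h·ν; the value of the speed of light, the example wavelengths and the function names are assumptions chosen for illustration.

```python
H = 6.6261e-34  # Planck constant in J*s (value given in the text)
C = 2.9979e8    # speed of light in a vacuum in m/s (assumed value)


def frequency(wavelength_m):
    """Frequency in Hz of radiation with the given wavelength in metres."""
    return C / wavelength_m


def photon_energy(wavelength_m):
    """Energy in joules of one photon with the given wavelength in metres."""
    return H * frequency(wavelength_m)


# Example wavelengths (assumed): infrared, visible and ultraviolet light.
for name, wavelength in [("infrared", 1000e-9), ("visible", 500e-9), ("ultraviolet", 250e-9)]:
    print(f"{name:12s} {wavelength * 1e9:6.0f} nm  "
          f"{frequency(wavelength):.3e} Hz  {photon_energy(wavelength):.3e} J")
```

Halving the wavelength doubles both the frequency and the photon energy, which is why ultraviolet photons carry more energy than visible or infrared ones.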

Table 1: Ranges of electromagnetic radiation (frequencies rounded, calculated from the wavelengths with eq. 1.1)

Radiation	Wavelength	Frequency
Radio waves	> 30 cm	3 kHz - 1 GHz
Microwaves	1 mm - 30 cm	1 GHz - 300 GHz
Infrared	800 nm - 1 mm	~300 GHz - 375 THz
Visible light	400 - 800 nm	~375 - 750 THz
Ultraviolet	3 - 380 nm	~790 THz - 100 PHz
X-rays	~50 pm - ~10 nm	~30 PHz - 6 EHz
Gamma rays	~1 - 50 pm	~6 - 300 EHz

1.4 Bohr's theory

The Bohr model of the atom from 1913 paved the way for understanding the structure of the electron shell. The model is based on the assumption that the electrons of an atom orbit its nucleus only on certain tracks, similar to the orbits of the planets around the Sun (see fig. 1.1A). The energy of an electron increases with its distance from the nucleus. The circular paths on which the electrons move around the nucleus are called electron shells. These shells are named with letters (K, L, M, ...) or with numbers (n = 1, 2, 3, ...). Electrons that move on the K shell (n = 1) have the lowest energy level; they are in the so-called ground state. Electrons located on outer paths have higher energy levels; they are in a so-called excited state. Within the Bohr model, the energy of an electron as a function of its orbit (n) is calculated with the following equation:

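$$E_n = -\frac{m_e \cdot e^4}{8 \cdot \varepsilon_0^2 \cdot h^2} \cdot \frac{1}{n^2} \qquad (1.3)$$

for the hydrogen atom, with the electron mass m_e, the elementary charge e, the vacuum permittivity ε₀ and the Planck constant h.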

The amount of energy that must be supplied to transfer an electron from an inner path to an outer one is equal to the energy required to move the electron against the attraction of the positively charged protons in the nucleus. Conversely, a certain amount of energy is released when an electron falls back from an outer path to an inner one. In this case, the energy is emitted in the form of a light quantum (photon), whose energy and frequency are related by equation 1.2. The frequency of the emitted photon can therefore be calculated with equation 1.4:

$$\nu = \frac{E_2 - E_1}{h} \qquad (1.4)$$

where E_2 is the energy of the electron on the outer path and E_1 its energy on the inner path.

However, Bohr's model of the atom contradicts real measurements in some respects. For example, it cannot represent the electron as a wave. Its crucial shortcoming, however, is that it violates the Heisenberg uncertainty principle.
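The energy levels and the emitted frequency can be illustrated for the hydrogen atom. The following Python sketch assumes the common form of the Bohr energies for hydrogen, E_n = -13.6057 eV / n², and the frequency condition ν = (E_2 - E_1) / h; the numerical constants other than h and the function names are assumptions chosen for illustration.

```python
H = 6.6261e-34              # Planck constant in J*s (value given in the text)
C = 2.9979e8                # speed of light in m/s (assumed value)
EV = 1.6022e-19             # one electron volt in joules (assumed value)
GROUND_STATE_EV = -13.6057  # energy of the hydrogen ground state in eV (assumed value)


def level_energy(n):
    """Energy in joules of the n-th Bohr orbit of the hydrogen atom."""
    return GROUND_STATE_EV * EV / n**2


def emission_frequency(n_outer, n_inner):
    """Frequency in Hz of the photon emitted when the electron falls from n_outer to n_inner."""
    if n_outer <= n_inner:
        raise ValueError("n_outer must be larger than n_inner")
    return (level_energy(n_outer) - level_energy(n_inner)) / H


nu = emission_frequency(3, 2)
print(f"Transition n = 3 -> 2: {nu:.3e} Hz, wavelength {C / nu * 1e9:.1f} nm")
```

The transition from n = 3 to n = 2 falls into the red part of the visible spectrum, which matches the visible-light range listed in table 1.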

FIGURE 1.1

Atomic models

A: Niels Bohr's model of the atom requires that the electrons of an atom revolve around its nucleus only on certain circular paths, similar to the planets, which move on elliptical orbits around the Sun.

B: In the quantum mechanical model, the electrons are not located on static orbits, rather their whereabouts are calculated based on probability densities.

1.5 Heisenberg uncertainty principle

The Heisenberg uncertainty principle states that the position of a particle and its momentum (and thus its velocity) cannot both be determined simultaneously with arbitrary accuracy.

The relation between the uncertainty of the position Δx and the uncertainty of the momentum Δp is described by equation 1.5:

$$\Delta x \cdot \Delta p \geq \frac{h}{4 \pi} \qquad (1.5)$$

The idea of defined, discrete orbits on which the electrons move around the nucleus contradicts the Heisenberg uncertainty principle, because an exact calculation of the electron orbits requires knowing the exact position and the exact velocity of the electrons at a given time.
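A numerical example shows why this matters for electrons in atoms. The following Python sketch assumes the uncertainty relation in the form Δx·Δp ≥ h / (4π) and computes the smallest possible velocity uncertainty of an electron whose position is known to within 10^-10 m, roughly the size of an atom. The electron mass, the chosen Δx and the function name are assumptions made for illustration.

```python
import math

H = 6.6261e-34              # Planck constant in J*s (value given in the text)
ELECTRON_MASS = 9.1094e-31  # electron rest mass in kg (assumed value)


def min_velocity_uncertainty(delta_x, mass):
    """Smallest velocity uncertainty in m/s allowed by delta_x * delta_p >= h / (4 * pi)."""
    delta_p = H / (4 * math.pi * delta_x)  # smallest momentum uncertainty in kg*m/s
    return delta_p / mass


delta_v = min_velocity_uncertainty(1e-10, ELECTRON_MASS)
print(f"Minimum velocity uncertainty of the electron: {delta_v:.3e} m/s")
```

An uncertainty of several hundred kilometres per second makes it impossible to assign a sharply defined orbit to the electron.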

1.6 Wave mechanics

Treating light as a beam of particles (photons) is fundamental to quantum mechanics. This approach was first introduced by Max Planck and Albert Einstein at the beginning of the 20th century. According to Louis de Broglie, a photon can at the same time be regarded as a wave. A mass equivalent can be assigned to a photon by means of Einstein's equation 1.6:

$$E = m \cdot c^2 \qquad (1.6)$$

The energy of a photon follows from equation 1.2 in combination with equation 1.1, which gives equation 1.7:

$$E = \frac{h \cdot c}{\lambda} \qquad (1.7)$$

Inserting eq. 1.7 into eq. 1.6 finally yields eq. 1.8:

$$\lambda = \frac{h}{m \cdot c} \qquad (1.8)$$

However, not only the wavelength of a photon but also that of other moving particles, such as electrons, can be calculated. For that purpose, the speed of light c is replaced by the speed of the particle v (see eq. 1.9):

$$\lambda = \frac{h}{m \cdot v} \qquad (1.9)$$
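The wavelength of a moving particle can be estimated with a few lines of code. The following Python sketch assumes the de Broglie relation in the form λ = h / (m·v); the electron mass, the example velocity and the function name are assumptions chosen for illustration, and relativistic effects are ignored.

```python
H = 6.6261e-34              # Planck constant in J*s (value given in the text)
ELECTRON_MASS = 9.1094e-31  # electron rest mass in kg (assumed value)


def de_broglie_wavelength(mass, velocity):
    """Wavelength in metres of a particle with the given mass (kg) and velocity (m/s)."""
    return H / (mass * velocity)


# Example (assumed): an electron moving at 1e6 m/s, far below the speed of light.
wavelength = de_broglie_wavelength(ELECTRON_MASS, 1.0e6)
print(f"de Broglie wavelength of the electron: {wavelength * 1e9:.3f} nm")
```

The result lies in the range of atomic dimensions, which is why the wave character of electrons cannot be neglected when describing atoms.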

1.7 Quantum mechanical orbital model

In the quantum mechanical model, the electrons are not arranged on defined orbits; rather, the regions in which they are likely to be found, called orbitals, are calculated from probability densities. Orbitals are described mathematically as wave functions. The wave functions of an electron in an atom can be calculated using the Schrödinger equation (eq. 1.10).

$$\hat{H} \psi = E \psi \qquad (1.10)$$

where Ĥ is the energy (Hamilton) operator, ψ the wave function and E the energy.

The Schrödinger equation is a differential equation and therefore has infinitely many solutions, but only those solutions that satisfy certain physical conditions are meaningful. Each such solution corresponds to a specific value of the energy E. The square of the absolute value of the wave function, |ψ|², gives the probability density of the particle (after Max Born).

A schematic representation of the electron orbitals is shown in Figure 1.1B. Three so-called quantum numbers are used to describe the orbitals. The principal quantum number n = 1, 2, 3, ... is roughly equivalent to the electron shell in Bohr's model of the atom. Each of these main shells can be divided into subshells. The number of subshells equals the principal quantum number; thus, if n = 1, there is exactly one subshell, and so on. Each subshell is identified by a secondary quantum number l = 0, 1, 2, ..., n - 1 or by the corresponding letter s, p, d, f. The number of orbitals in a subshell is calculated as 2l + 1 (see table 2). Orbitals can also be associated with two nuclei and thereby form chemical bonds.

Table 2: The number of orbitals in different electron shells

n	l	Subshell	Number of orbitals (2l + 1)
1	0	1s	1
2	0	2s	1
2	1	2p	3
3	0	3s	1
3	1	3p	3
3	2	3d	5
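The entries of table 2 follow directly from the rules for the quantum numbers and can be generated programmatically. The following Python sketch assumes that the secondary quantum number l runs from 0 to n - 1 and that each subshell contains 2l + 1 orbitals; the function name `subshells` is chosen for illustration, and the letter codes cover only the shells n = 1 to 4.

```python
SUBSHELL_LETTERS = "spdf"  # letters for l = 0, 1, 2, 3


def subshells(n):
    """Return (label, number of orbitals) for every subshell of shell n (1 <= n <= 4)."""
    if not 1 <= n <= len(SUBSHELL_LETTERS):
        raise ValueError("this sketch only covers n = 1 to 4")
    return [(f"{n}{SUBSHELL_LETTERS[l]}", 2 * l + 1) for l in range(n)]


for n in range(1, 4):
    for label, orbitals in subshells(n):
        print(f"n = {n}  subshell {label}  orbitals: {orbitals}")
    total = sum(count for _, count in subshells(n))
    print(f"n = {n}  total number of orbitals: {total}")
```

The total number of orbitals in a shell is n², because the odd numbers 1, 3, 5, ... add up to a square.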

2 MOLECULES

Molecules consist of two or more atoms of the same or of different types, held together by chemical bonds. These bonds can be of different kinds: there are covalent bonds, ionic bonds, hydrogen bonds and bonds due to London forces (see fig. 2.1).

2.1 Covalent Bonds

A covalent bond is formed between two atoms when a pair of electrons, one from each atom, shares a common orbital. Together they form a so-called electron pair (see fig. 2.1A). This electron-pair (covalent) bond is a strong chemical connection; a relatively large amount of energy is therefore required to break it. Covalent bonds are formed predominantly between nonmetal atoms. There are single, double and triple covalent bonds.

The energy of a covalent bond increases with the number of participating electron pairs. A bond between two atoms within a molecule is symbolized by a line drawn between them, representing one saturated valence of each atom. The electrons of some molecules (e.g. benzene) form a kind of resonance structure, which is stabilized by the delocalization of the participating electrons. This effect is called mesomerism or resonance. In such molecules, the exact positions of the double and single bonds between the participating atoms cannot be predicted. For this reason, the bonds between the atoms of such molecules are drawn with a dotted line instead of the usual continuous line. In ring structures, resonance is indicated by a circle. The total number of bonds that a single atom can form at once depends on its number of singly occupied orbitals. For example, hydrogen has only one singly occupied s orbital and can therefore form only one bond. Nitrogen, in contrast, has three singly occupied p orbitals and can form up to three bonds. According to Lewis [1], atoms attempt to achieve the so-called noble gas configuration by bonding in such a way that each atom has eight electrons in its valence shell (octet rule). Hydrogen is the exception: its goal is the helium configuration with two electrons. Atoms can also form multiple bonds (see table 3).

Table 3: Examples of single and multiple bonds between atoms

Bond	Example	Description
Single	C-H	Between a carbon atom C and a hydrogen atom H
Double	C=C	Between two carbon atoms C
Triple	N≡N	Between two nitrogen atoms N

2.2 Ionic Bonds

The ionic bond is formed between metals and nonmetals, and its bond strength is approximately in the range of that of covalent bonds. In contrast to the covalent bond, this type of bond is undirected and has a longer range. It is typically found in salts. Figure 2.1B shows schematically the bond between a sodium ion (Na⁺) and a chloride ion (Cl⁻). The force behind ionic bonds is an intramolecular electrostatic interaction. The strength of electrostatic interactions, and thus the strength of an ionic bond, can be described by Coulomb's law (see eq. 2.1):

$$F = \frac{1}{4 \pi \varepsilon} \cdot \frac{q_1 \cdot q_2}{r^2} \qquad (2.1)$$

F: the electrostatic force between the ions.

ε: the permittivity (dielectric constant) of the medium.

q_1, q_2: the charges of the two ions.

r: the distance between the ions.

The strength of the electrostatic attraction between two ions and therefore their bond strength is directly proportional to the product of their charges and inversely proportional to the square of their distance.
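The two proportionalities can be verified with a short calculation. The following Python sketch assumes Coulomb's law in the form F = q_1·q_2 / (4π·ε·r²) with ε = ε₀·ε_r; the numerical constants, the ion distance of 0.28 nm, the relative permittivity of water of about 80 and the function name are assumptions chosen for illustration.

```python
import math

EPSILON_0 = 8.8542e-12          # vacuum permittivity in F/m (assumed value)
ELEMENTARY_CHARGE = 1.6022e-19  # elementary charge in coulombs (assumed value)


def coulomb_force(q1, q2, r, relative_permittivity=1.0):
    """Electrostatic force in newtons between two point charges; negative means attraction."""
    epsilon = EPSILON_0 * relative_permittivity
    return q1 * q2 / (4 * math.pi * epsilon * r**2)


q_na = +ELEMENTARY_CHARGE  # charge of a Na+ ion
q_cl = -ELEMENTARY_CHARGE  # charge of a Cl- ion
distance = 0.28e-9         # assumed distance between the ions in metres

print(f"Force in a vacuum:        {coulomb_force(q_na, q_cl, distance):.3e} N")
print(f"Force at double distance: {coulomb_force(q_na, q_cl, 2 * distance):.3e} N")
print(f"Force in water (eps_r 80): {coulomb_force(q_na, q_cl, distance, 80.0):.3e} N")
```

Doubling the distance reduces the force to a quarter, and a medium with a high permittivity such as water weakens the attraction considerably, which is consistent with the fact that many salts dissolve in water.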

2.3 Intermolecular interactions

Besides intramolecular interactions, such as the ionic bonds already mentioned, there are also intermolecular interactions, i.e. electrostatic interactions between different molecules.

2.3.1 Electric Dipoles

An electric dipole can form within a molecule that consists of atoms of different electronegativity. The dipole arises when the atom with the higher electronegativity pulls the shared bonding electrons closer to its nucleus. As a result of this shift, a partial negative charge builds up on the side of the atom with the higher electronegativity, while a partial positive charge remains on the other side. A distinction is made between temporary and permanent dipoles. Temporary dipoles can arise from short-term charge shifts induced by other molecules (see Section 2.3.3, London dispersion force). In addition to the requirements for a temporary dipole, a permanent dipole requires a specific geometrical arrangement of the atoms within the molecule. This is why the carbon dioxide molecule (CO₂) is not a permanent dipole: its atoms are arranged in a straight line, so that the partial charges cancel each other out (see fig. 2.1C, right). A typical example of a permanent dipole, in contrast, is H₂O. The oxygen atom in the water molecule pulls the shared electron pairs closer to its nucleus, which induces a partial negative charge on the oxygen atom and partial positive charges on the hydrogen atoms (see fig. 2.1C, left). Because of the angled, asymmetric arrangement of the atoms, this charge shift within the molecule is permanent.

2.3.2 Hydrogen Bonds

Hydrogen bonds are the strongest of the electrostatic interactions described in this section. As the name implies, the hydrogen atom forms a bridge between two strongly electronegative atoms. Roughly speaking, both electronegative atoms involved in the bond try to pull the hydrogen atom over to their side. In general form: ^δ-X-H^δ+ ··· ^δ-Y, where X and Y each stand for one of the following elements: fluorine (F), oxygen (O) or nitrogen (N) (see fig. 2.1D). The hydrogen bonds in the helix structure of proteins are largely responsible for the formation of the three-dimensional structure of proteins (see Chapter 9.3).

2.3.3 London dispersion force

London dispersion forces are very weak compared with the bonds described above. They originate from a molecule that forms a temporary dipole (see fig. 2.1E, step 1). This temporary dipole is able to polarize nearby nonpolar molecules, which induces the formation of a dipole in these molecules too (see fig. 2.1E, step 2). As a result, both molecules attract each other (see fig. 2.1E, step 3). If the direction of the charge shift within the initial dipole changes, the induced dipole follows with a corresponding reversal. The two molecules thus remain coupled in such a way that their interaction never averages out to zero; this residual attraction is the London dispersion force. The strength of these forces depends on the polarizability of both molecules.

FIGURE 2.1

A: Schematic representation of the four covalent bonds in a methane molecule. The bonds between the carbon atom C and its four hydrogen atoms H exist because four electron pairs are formed, each consisting of one electron from each atom sharing a common orbital. The green electrons come from the H atoms and the red ones from the C atom.

B: Schematic diagram of the ionic bond in NaCl.

C: Schematic representation of the bond geometry in H₂O and CO₂. Due to the bond angle (~ 104.5°) between the two hydrogen atoms, the H₂O molecule forms a dipole (left), whereas carbon dioxide is not able to form a dipole (right). The partial negative charges of the O atoms and the partial positive charge of the C atom cancel each other out, because the angle between the two O atoms is approximately 180°.

D: Schematic representation of a hydrogen bond. The H-bridge in the example is formed between the N atom of ammonia and the O atom of the water molecule. The bond holds because an H atom of ammonia (donor) is attracted by the electronegative O atom of the water molecule (acceptor) as well as by the N atom of ammonia.

E: Schematic representation of the London dispersion forces between two molecules. At first, the molecule m1 forms a temporary dipole (step 1); then m1 induces a dipole in a nonpolar molecule m2 that is located close to m1 (step 2). As a result, both molecules attract each other (step 3).

3 ACIDS AND BASES

Acids, also called proton donors, are substances that can donate protons (H⁺) to their environment. Bases take up protons and are therefore called proton acceptors. In more general terms, an acid acts as an electron-pair acceptor and a base as an electron-pair donor. A base has a lone electron pair, which an acid particle can share in order to establish a covalent bond.

3.1 pH-Value

H₂O dissociates into H⁺ and OH⁻ ions. The equilibrium constant K_W is the ion product of water; at 25 °C it is 10^-14 mol²/l² (eq. 3.1).

$$K_W = [H^+] \cdot [OH^-] = 10^{-14}\ \mathrm{mol^2/l^2} \qquad (3.1)$$

The pH-value is calculated using eq. 3.2:

$$pH = -\log_{10} [H^+] \qquad (3.2)$$

An acidic solution has a pH-value of at least 0 and below 7, a basic solution a pH-value above 7 and up to 14. At pH = 7 the solution is neutral.
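The calculation of the pH-value can be sketched in a few lines of code. The following Python example assumes the usual definition pH = -log10 [H⁺] and the ion product of water given above; the function names and the example concentrations are assumptions chosen for illustration.

```python
import math

K_W = 1e-14  # ion product of water at 25 °C in mol²/l² (value given in the text)


def ph_from_h(h_concentration):
    """pH-value for a given H+ concentration in mol/l."""
    return -math.log10(h_concentration)


def ph_from_oh(oh_concentration):
    """pH-value for a given OH- concentration in mol/l, using [H+] = K_W / [OH-]."""
    return ph_from_h(K_W / oh_concentration)


def classify(ph, tolerance=1e-9):
    """Classify a solution as acidic, neutral or basic by its pH-value."""
    if abs(ph - 7.0) <= tolerance:
        return "neutral"
    return "acidic" if ph < 7.0 else "basic"


# Example concentrations (assumed) in mol/l.
for h in (1e-3, 1e-7, 1e-11):
    ph = ph_from_h(h)
    print(f"[H+] = {h:.0e} mol/l  ->  pH = {ph:.2f} ({classify(ph)})")
```

Because the scale is logarithmic, a change of one pH unit corresponds to a tenfold change in the H⁺ concentration.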

3.2 pK_S-Value and pK_B-Value

In contrast to strong acids such as hydrochloric acid (HCl) and sulfuric acid (H₂SO₄), weak acids such as acetic acid (CH₃COOH) or amino acids do not dissociate completely in aqueous solution. When a weak acid dissociates in dilute aqueous solution, the concentration of water can be regarded as constant. The dissociation constant K_S is therefore calculated by means of the law of mass action. Equation 3.3 shows the calculation of K_S for acetic acid as an example.

$$K_S = \frac{[H^+] \cdot [CH_3COO^-]}{[CH_3COOH]} \qquad (3.3)$$

The pK_S-value can be calculated with the following formula:

$$pK_S = -\log_{10} K_S \qquad (3.4)$$

In the case of weak acids, the dissociation constant K_S is small; the pH-value can therefore be calculated with equation 3.5, where c_0 is the initial concentration of the acid:

$$pH = \frac{1}{2} \left( pK_S - \log_{10} c_0 \right) \qquad (3.5)$$

Thus, the pK_S-value is a measure of the strength of an acid: the smaller this value, the stronger the corresponding acid. Analogously, a K_B-value and a pK_B-value can be calculated for bases.
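The quality of the approximation for weak acids can be checked numerically. The following Python sketch assumes the common approximation pH ≈ ½ (pK_S - log10 c_0), where c_0 is the initial concentration of the acid, and compares it with the solution of the quadratic equation that follows from the law of mass action. The pK_S-value of 4.76 for acetic acid is a commonly tabulated value and, like the function names, an assumption made for illustration; the autoprotolysis of water is neglected.

```python
import math


def ph_weak_acid_approx(pk_s, c0):
    """Approximate pH-value of a weak acid: pH = 0.5 * (pK_S - log10(c0))."""
    return 0.5 * (pk_s - math.log10(c0))


def ph_weak_acid_quadratic(pk_s, c0):
    """pH-value from K_S = x² / (c0 - x), solved for x = [H+] without further approximation."""
    k_s = 10 ** (-pk_s)
    h = (-k_s + math.sqrt(k_s**2 + 4 * k_s * c0)) / 2
    return -math.log10(h)


PK_S_ACETIC_ACID = 4.76  # assumed, commonly tabulated value

for c0 in (1.0, 0.1, 0.001):
    print(f"c0 = {c0:5.3f} mol/l  approx pH = {ph_weak_acid_approx(PK_S_ACETIC_ACID, c0):.2f}  "
          f"quadratic pH = {ph_weak_acid_quadratic(PK_S_ACETIC_ACID, c0):.2f}")
```

The two results agree closely at higher concentrations and drift apart in very dilute solutions, where a larger fraction of the acid is dissociated and the approximation becomes less reliable.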

4 CHEMICAL REACTIONS

During a chemical reaction, the reactants (educts, chemical precursor) are converted into the products. A chemical reaction can be written as a reaction equation. The reactants will be written on the left side and the products on the right side, between them is an arrow, indicating the direction of the chemical reaction (eq. 4.1).

(4.1)

A double arrow (↔) indicates, that the chemical reaction is at equilibrium. A reaction equation must be stoichiometrically correct, i.e. an equal number of atoms must be located on both sides of the reaction arrow. Chemical reactions follow the laws of thermodynamics. Every chemical reaction can be in principle proceed in both directions, but mostly the balance is on the side of the reactants, therefore no product formation takes place. By calculating the change in the Gibbs Free Energy (Gibbs enthalpy) ΔG (eq. 4.2) of a chemical reaction, it can be determined, whether the reaction will proceed under the given physical conditions (pressure, temperature) and at given concentrations of the reactants and the products, without the presence of a catalyst (see table 4).

$$\Delta G = \Delta H - T\,\Delta S \tag{4.2}$$

ΔH stands for the change in enthalpy, i.e. the heat absorbed or released by the system at constant pressure, T for the absolute temperature, and ΔS for the change in the entropy of the system. The enthalpy H is defined by the following equation 4.3:

$$H = U + pV \tag{4.3}$$

pV is the volume work, i.e. the energy needed to change the volume of the system. U is the total internal energy of the thermodynamic system. The change in the internal energy ΔU is the sum of the heat and the work exchanged with the surroundings; in an isolated system, the internal energy is constant (first law of thermodynamics). The entropy S describes the state of organization of all components in the system and can be calculated with the Boltzmann equation (eq. 4.4).

$$S = k_B \ln \Omega \tag{4.4}$$

k_B (1.381·10⁻²³ J/K) is the Boltzmann constant;

Ω is the number of different ways in which the atoms of a closed system can arrange themselves in space. The entropy increases with the number of possible arrangements of the atoms in the available space; in other words, the higher the disorder in the system, the higher the entropy.
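The dependence of the entropy on the number of arrangements can be illustrated with a few lines of code. This is a minimal sketch that assumes the Boltzmann equation in the form S = k_B · ln(Ω); the function name and the example values for Ω are illustrative assumptions.

```python
import math

K_B = 1.381e-23  # Boltzmann constant in J/K, value as given in the text


def boltzmann_entropy(omega: float) -> float:
    """Return S = k_B * ln(omega) in J/K for omega possible arrangements."""
    if omega < 1:
        raise ValueError("omega must be at least 1")
    return K_B * math.log(omega)


# Assumed example: doubling the number of arrangements from 1e6 to 2e6
# raises the entropy by k_B * ln(2), independent of the starting value.
delta_s = boltzmann_entropy(2e6) - boltzmann_entropy(1e6)
print(f"S(1e6) = {boltzmann_entropy(1e6):.3e} J/K")
print(f"Entropy gain on doubling = {delta_s:.3e} J/K")
```

A system with only one possible arrangement (Ω = 1) has an entropy of zero in this model, and every additional possibility increases S logarithmically.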

Table 4: Gibbs Free Energy (Gibbs enthalpy) ΔG

ΔG	Reaction	Description
<0	exergonic	Proceeds spontaneously, without an additional input of energy.
=0	equilibrium	The rates of the forward and reverse reactions are equal.
>0	endergonic	Requires an input of energy from outside (e.g. heat) in order to proceed.

A chemical reaction is exergonic if the Gibbs free energy decreases during the reaction (ΔG < 0). A decrease in ΔG is driven by an increase in entropy (ΔS > 0), by a decrease in enthalpy (ΔH < 0) and/or, if ΔS is positive, by an increase in temperature.
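The classification in table 4 can be sketched in code. The example assumes the Gibbs relation ΔG = ΔH - T·ΔS and uses the terms exergonic (ΔG < 0) and endergonic (ΔG > 0). The function names and the values ΔH = +40 kJ/mol and ΔS = +120 J/(mol·K) describe a hypothetical reaction chosen only for illustration.

```python
def gibbs_free_energy(delta_h: float, temperature: float, delta_s: float) -> float:
    """Return dG = dH - T * dS.

    delta_h in J/mol, temperature in K, delta_s in J/(mol*K).
    """
    return delta_h - temperature * delta_s


def classify(delta_g: float, tol: float = 1e-9) -> str:
    """Classify a reaction by the sign of dG (see table 4)."""
    if abs(delta_g) <= tol:
        return "equilibrium"
    return "exergonic" if delta_g < 0 else "endergonic"


# Hypothetical reaction (assumed values, not from the text):
DELTA_H = 40_000.0  # J/mol, heat is taken up
DELTA_S = 120.0     # J/(mol*K), entropy increases

for temperature in (298.15, 400.0):
    dg = gibbs_free_energy(DELTA_H, temperature, DELTA_S)
    print(f"T = {temperature:6.2f} K: dG = {dg:9.1f} J/mol -> {classify(dg)}")
```

For this assumed reaction, the sign of ΔG changes at T = ΔH/ΔS (about 333 K): below that temperature the reaction is endergonic, above it exergonic, which illustrates how an increase in temperature can lower ΔG when ΔS is positive.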

5 FURTHER READING

5.1 Textbooks

Atkins, P.W., "Physical Chemistry", Oxford University Press; 5th revised edition, 1994.
Mortimer, C.E., "Chemistry", Wadsworth Pub Co; 6th edition, 1986.

PART II BIOLOGY BASICS

Besides biochemistry, biology is one of the classic disciplines of the life sciences. For computational biologists, the subareas of cell biology, microbiology and molecular biology are especially relevant, although the latter is more a branch of biochemistry. Cell biology focuses on the cell as the smallest living unit of organisms. There are prokaryotic cells (protocytes) and eukaryotic cells (eucytes). Living organisms that consist of prokaryotic cells are called prokaryotes, and those that are built up from eukaryotic cells are called eukaryotes. Prokaryotes as well as eukaryotes may occur as single cells (bacterial cells and yeast cells, respectively) or as multicellular organisms. The human body consists of tens of trillions of cells.

6 PROKARYOTES

The prokaryotes are divided into Bacteria and Archaea. Their cells consist of cytoplasm, which is surrounded by a single cell membrane. All metabolic reactions of the cell occur in the cytoplasm. The DNA of a prokaryotic cell is not surrounded by a membrane; rather, it is concentrated in a central area of the cell called the nucleoid.

7 EUKARYOTES

Examples of eukaryotes are animals (including humans), fungi and plants. Like prokaryotic cells, eukaryotic cells consist of cytoplasm surrounded by a cell membrane. However, their DNA is not free in the cytoplasm. Eukaryotic DNA is wound around proteins called histones and, together with them, forms the chromosomes in the cell nucleus. In addition to the nucleus, the eukaryotic cell contains several characteristic cell compartments (see table 5). Compartmentalization is advantageous because closely related metabolic steps can be carried out efficiently within one compartment, without long diffusion paths through the cytosol. Plant cells have some additional peculiarities, such as a cell wall and specific compartments like chloroplasts, vacuoles and glyoxysomes.

Table 5: Compartments of eukaryotic cells

Compartment	Membrane	Important functions and features
Nucleus	double	DNA-Replication, Transcription
Mitochondrion	double	Synthesis of ATP; contains its own DNA
Peroxisomes	single	β-Oxidation
Endoplasmic reticulum (ER)	single	Smooth ER: lipid synthesis, intracellular Ca²⁺ storage; rough ER: studded with ribosomes (protein biosynthesis)
Golgi apparatus	single	Glycosylation (sugar modification) of proteins; cellular transport
Lysosomes	single	Cellular digestion
Vacuoles¹	single	Storage of nutrients, depot for waste
Glyoxysomes¹	single	Glyoxylate cycle
Chloroplasts¹	double	Photosynthesis

¹ Special compartments only occurring in plants.

The main differences between Prokaryotes and Eukaryotes are summarized in table 6.

Table 6: Differences between prokaryotic and eukaryotic cells

	Prokaryotes	Eukaryotes
DNA	In the cytosol, without a surrounding membrane	Within the nucleus
Ribosomes	70S¹	80S¹
Size	1-10 μm	10-100 μm

¹S stands for Svedberg, the unit for the sedimentation coefficient. This coefficient depends on the mass and shape of the sedimenting particles.

8 FURTHER READING

8.1 Textbooks

Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D., "Molecular Biology of the Cell", Garland Science; 3rd edition, 1994.

PART III BIOCHEMISTRY

Biochemistry is the study of the molecular relationships of the most important biological and chemical processes in organisms. Therefore, a basic understanding of biochemistry is a prerequisite for a successful entry into bioinformatics. The first part of this chapter introduces the four main substance classes of biochemistry: proteins, carbohydrates, lipids, and nucleic acids. The second part gives a brief overview of molecular genetics. In the final part, the basic metabolic processes are discussed and selected metabolic pathways are presented.

9 PROTEINS

Proteins are built from amino acids. They fulfil a variety of biological functions. As structural proteins, they provide both stability and elasticity to the organism, and in their functional role as enzymes, they speed up most of the essential chemical reactions of the metabolism.

9.1 Amino Acids

Amino acids are organic compounds containing at least one carboxyl group (-COOH) and at least one amino group (-NH₂). In the protein-building α-amino acids, the so-called α-C-atom is covalently bound both to a carboxyl group and to an amino group. The carbon atoms of the backbone (main chain) of an amino acid molecule are named with lowercase Greek letters, starting with α for the first C-atom after the carboxyl group. The 2D structures of the 20 most important proteinogenic α-amino acids are shown in Figure 9.1. These amino acids can be divided into different groups according to their side chains.

FIGURE 9.1

A: α-Amino acids with non-polar side chains.

B: α-Amino acids with polar side chains.

C: α-Amino acids with basic side chains.

D: α-Amino acids with acidic side chains.

In amino acid sequences, an individual amino acid is usually represented by its single-letter code or, less commonly, by its three-letter code (see fig. 9.1). The side chain determines the chemical function of an amino acid. The side chains are grouped according to their functional groups. A distinction is made between polar groups such as sulfhydryl groups (-SH) or hydroxyl groups (-OH), which have hydrophilic properties, and longer non-polar carbon chains, which have hydrophobic properties. The hydrophobicity index [1] specifies the strength of the hydrophobicity (water repellence) of the side chain of an amino acid.

The more negative this value, the more hydrophilic (water-loving) the side chain. Furthermore, a distinction can be made between basic side chains (-NH, -NH₂) and acidic side chains (-COOH). The pH-Value of the surrounding solvent determines the chemical state of the functional groups. For each functional group in an amino acid, a specific dissociation constant can be determined: K_S for acidic groups (e.g. -COOH) or K_B for basic groups (e.g. -NH₂). Depending on the pH of the solvent, a functional group is therefore neutral, negatively charged or positively charged. In acidic solutions most of the functional groups are protonated, whereas in basic solutions they are deprotonated. At a specific pH-Value, the total number of negative charges equals the total number of positive charges, so that the net charge of the entire amino acid molecule is zero. This pH-Value is called the isoelectric point (pI) of an amino acid. In organisms, most amino acids can be synthesized from keto acids. Amino acids that cannot be synthesized by the organism's own enzymes and must be supplied with food are called essential amino acids. For adult humans, the following amino acids are essential: leucine, isoleucine, methionine, lysine, phenylalanine, valine, tryptophan and threonine.
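The pH dependence of the net charge and the isoelectric point can be illustrated numerically. The following sketch assumes that each group follows the Henderson-Hasselbalch relation and describes the basic amino group by the pK_S of its protonated form. The pK_S values (2.34 for the carboxyl group and 9.60 for the protonated amino group of glycine) are approximate literature values, and the function names are illustrative; none of them are taken from this book.

```python
def net_charge(ph: float, pk_acidic=(2.34,), pk_basic=(9.60,)) -> float:
    """Average net charge of a molecule at a given pH.

    pk_acidic: pK_S values of groups that are neutral when protonated
               and negative when deprotonated (e.g. -COOH).
    pk_basic:  pK_S values of groups that are positive when protonated
               and neutral when deprotonated (e.g. -NH3+).
    Defaults are assumed approximate values for glycine.
    """
    negative = sum(-1.0 / (1.0 + 10.0 ** (pk - ph)) for pk in pk_acidic)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pk)) for pk in pk_basic)
    return negative + positive


def isoelectric_point(pk_acidic=(2.34,), pk_basic=(9.60,), tol: float = 1e-6) -> float:
    """Find the pH with zero net charge by bisection between pH 0 and 14.

    Works because the net charge decreases monotonically with rising pH.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(mid, pk_acidic, pk_basic) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged or neutral: pI lies at lower pH
    return (low + high) / 2.0


print(f"Net charge at pH 1:  {net_charge(1.0):+.2f}")
print(f"Net charge at pH 12: {net_charge(12.0):+.2f}")
print(f"pI of glycine (assumed pK_S values): {isoelectric_point():.2f}")
```

For an amino acid with only these two ionizable groups, the result agrees with the mean of the two pK_S values. Amino acids with acidic or basic side chains can be modelled by adding their side-chain pK_S values to the respective tuple.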

9.2 Peptides

Chains of at least 2 and not more than 100 amino acids are called peptides. A molecule composed of two connected amino acids is called a dipeptide, one of three connected amino acids a tripeptide, and so on. Peptides composed of several amino acids are also known as oligopeptides. Longer chains of amino acids (> 100 amino acid molecules) are called proteins. The bond between two amino acids is called a peptide bond. It is formed by a chemical condensation reaction, i.e. with the release of a water molecule. The cleavage of a peptide bond in turn occurs through hydrolysis, i.e. with the incorporation of a water molecule (see fig. 9.2A).

FIGURE 9.2
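The naming rules and the loss of water during peptide bond formation can be sketched as follows. The length limits (2 to 100 amino acids for peptides, more than 100 for proteins) follow the text. The function names and the molar masses (glycine 75.07 g/mol, alanine 89.09 g/mol, water 18.015 g/mol) are assumptions added for illustration.

```python
WATER_MASS = 18.015  # g/mol, assumed molar mass of water


def classify_chain(n_residues: int) -> str:
    """Name an amino acid chain according to its length."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")
    if n_residues == 1:
        return "amino acid"
    if n_residues == 2:
        return "dipeptide"
    if n_residues == 3:
        return "tripeptide"
    if n_residues <= 100:
        return "peptide"
    return "protein"


def peptide_mass(free_amino_acid_masses: list[float]) -> float:
    """Molar mass of a linear peptide built from free amino acids.

    Each of the n - 1 peptide bonds is formed by condensation,
    so one water molecule is released per bond.
    """
    n = len(free_amino_acid_masses)
    if n == 0:
        raise ValueError("at least one amino acid is required")
    return sum(free_amino_acid_masses) - (n - 1) * WATER_MASS


# Assumed example: the dipeptide glycyl-alanine
GLYCINE, ALANINE = 75.07, 89.09  # g/mol, approximate values
print(classify_chain(2), f"{peptide_mass([GLYCINE, ALANINE]):.2f} g/mol")
```

Hydrolysis is the reverse bookkeeping: cleaving one peptide bond consumes one water molecule, so the masses of the two fragments add up to the mass of the peptide plus 18.015 g/mol.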
