Laboratory of Biometry and Bioinformatics,
University of Tokyo, JAPAN.
Plenary Talk Title: Statistical modeling of molecular evolution and its potential
Abstract: Statistical modeling of molecular evolution provides clues for identifying adaptive evolution and biodiversity. At the micro scale, proteins adapt to novel environments, and the strength of selection pressure varies among the regions of a protein. By introducing a Potts model as a prior on the spatial distribution of diversifying selection, a model of protein sequence evolution identified a diversifying region of the influenza HA1 protein that overlapped the antigenic sites. At the global scale, species comprise a biological community. The distribution of divergence times between member species of a community reflects the pattern of species composition. A newly defined effective species sampling proportion explains the size of the difference between the divergence time distribution of the community and that of the meta-community, assuming random species sampling. The ratio of its maximum-likelihood estimate to the observed sampling proportion serves as an index of phylogenetic skew (PS), which can be used to detect candidate communities with unique species compositions among a large number of communities. Finally, we note that the rate of molecular evolution is the product of the mutation rate and the proportion of neutral mutations. The former factor depends on generation length and exposure to mutagens, while the latter depends on the strength of functional constraints and selection. A multiplicative gene-by-branch ANOVA model provides reliable estimates of divergence times and mutation rates. Regression on the gene-specific rates of molecular evolution allowed us to predict ancestral states of traits related to life history, social/reproductive behavior and food preference.
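The multiplicative gene-by-branch structure mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the rate for gene g on branch b factors as gene_effect[g] * branch_effect[b] and that rates are available for every gene-branch pair; taking logarithms then gives an additive two-way layout fitted from row and column means. The function name, the input table and the numbers are hypothetical, and the sketch does not estimate divergence times as the speaker's model does.

```python
import math

def fit_multiplicative(rates):
    """Fit rates[g][b] ~ gene_effect[g] * branch_effect[b] on a complete table.

    Taking logs turns the multiplicative model into an additive two-way
    layout, whose least-squares fit uses row and column means.
    """
    n_genes, n_branches = len(rates), len(rates[0])
    logs = [[math.log(r) for r in row] for row in rates]
    grand = sum(map(sum, logs)) / (n_genes * n_branches)
    gene_dev = [sum(row) / n_branches - grand for row in logs]
    branch_dev = [
        sum(logs[g][b] for g in range(n_genes)) / n_genes - grand
        for b in range(n_branches)
    ]
    # Fold the grand mean into the gene effects, so that the fitted rate is
    # gene_effect[g] * branch_effect[b] and branch effects have geometric mean 1.
    gene_effect = [math.exp(grand + d) for d in gene_dev]
    branch_effect = [math.exp(d) for d in branch_dev]
    return gene_effect, branch_effect

# Hypothetical, noise-free rates (not real data): 3 genes on 4 branches.
true_gene = [1.0, 2.0, 0.5]
true_branch = [1.0, 1.5, 0.8, 1.2]
rates = [[g * b for b in true_branch] for g in true_gene]

gene_effect, branch_effect = fit_multiplicative(rates)
fitted = [[g * b for b in branch_effect] for g in gene_effect]
```

Only the products gene_effect[g] * branch_effect[b] are identifiable; the split between the two factors depends on the normalisation chosen here (branch effects with geometric mean 1).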
Dept. of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), JAPAN http://www.bio.kyutech.ac.jp/~kurata/index-e.html
Plenary Talk: Virtual cell metabolism for systems and synthetic biology
Abstract: In systems and synthetic biology, computer simulation of the metabolic networks of a cell is a powerful method to predict their function and phenotype under different culture conditions and genetic modifications. To analyze or design metabolic networks, it is critical to understand the relationship between network structure and function, that is, the mechanism through which biological parts or biomolecules are assembled into building blocks or functional networks. Understanding the mechanism of assembling functional networks would help us develop a methodology for analyzing large-scale networks and designing robust biochemical networks. Based on such a bottom-up approach, we developed a detailed kinetic model for the central carbon metabolism of E. coli in a batch culture, which includes the glycolytic pathway, tricarboxylic acid cycle, pentose phosphate pathway, Entner-Doudoroff pathway, anaplerotic pathway, glyoxylate shunt, oxidative phosphorylation, phosphotransferase system (Pts), non-Pts, and metabolic gene regulation by four protein transcription factors: cAMP receptor, catabolite repressor/activator, pyruvate dehydrogenase complex repressor, and acetate operon repressor. The model enables the rational design of a metabolic network, which contributes to enhanced production of useful metabolites and proteins.
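As a toy illustration of what a kinetic model computes, the sketch below integrates a single irreversible reaction S -> P. It assumes a Michaelis-Menten rate law and forward-Euler time stepping, neither of which is stated in the abstract, and all names and parameter values are hypothetical. The model described in the talk covers many coupled reactions and gene regulation.

```python
def simulate_michaelis_menten(s0, vmax, km, dt, steps):
    """Forward-Euler integration of dS/dt = -v and dP/dt = +v,
    with v = vmax * S / (km + S)."""
    s, p = s0, 0.0
    trajectory = [(0.0, s, p)]
    for i in range(1, steps + 1):
        rate = vmax * s / (km + s)  # reaction flux at the current substrate level
        s -= rate * dt
        p += rate * dt
        trajectory.append((i * dt, s, p))
    return trajectory

# Hypothetical parameters in arbitrary units.
traj = simulate_michaelis_menten(s0=10.0, vmax=1.0, km=0.5, dt=0.01, steps=2000)
final_time, final_s, final_p = traj[-1]
```

Because the same flux is subtracted from S and added to P at each step, the total S + P is conserved up to rounding error, which is a quick sanity check for this kind of simulation.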
National Institute of Biomedical Innovation, Osaka University, JAPAN.
Plenary Talk Title: Computational systems approaches to drug discovery and development
Abstract: Costly late-phase attrition in drug development can be attributed to several causes, but the “wrong target” and the “wrong compound” are the main culprits. Computational methods can improve target identification and validation at an early stage of drug development by providing a better understanding of the biology of the target network. Similarly, predicting the toxicity (as well as the pharmacokinetic profile) of a new compound should help avoid unexpected adverse events. To establish systems approaches to drug discovery, we have developed an integrated database for genes, proteins, diseases and chemical compounds. Using our TargetMine system (http://targetmine.mizuguchilab.org), an integrated data warehouse for target prioritization, we have identified genes/proteins that are likely to play key roles in infectious and pulmonary diseases and subsequently verified these hypotheses by direct experimentation. We have also developed a toxicogenomics analysis platform (Toxygates and the Adjuvant database; http://toxygates.nibiohn.go.jp). Based on these and other databases, machine-learning models can be built for predicting protein structure, function and interaction. In my talk, I will describe some of these resources, with specific applications in target discovery, safety prediction and pharmacokinetic modelling.
Director, Institute of Bioinformatics, Zhejiang University, China
Title of Talk: Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants
Abstract: Complex traits are controlled by four types of omics variants: SNPs, transcripts, proteins, and metabolites. Most association studies for complex traits have ignored dominance, epistasis and environment interactions. We proposed mixed linear model approaches for association mapping of SNPs (QTSs), transcripts (QTTs), proteins (QTPs), and metabolites (QTMs) to complex traits. Precise prediction of the genetic architecture of complex traits has been impeded by the limited understanding of their genetic effects, especially locus-by-locus interaction (GxG) and locus-by-environment interaction (GxE). The analysis of large omics datasets, especially two-locus interaction analysis, involves intensive computation. A GPU-based mapping software package (QTXNetwork) has been developed for detecting multiple loci in large-scale omics data and for estimating variance components of genetic effects. By analyzing SNP and transcript datasets for mouse and Drosophila, we demonstrated that unbiased estimates could be obtained for the genetic effects of causal loci. Transcript association can efficiently detect causal transcript loci on complex traits (QTTs) and on other transcripts (tQTTs). Complicated genetic networks of transcripts controlled by other omics variables can also be revealed for SNPs (tQTSs), proteins (tQTPs), and metabolites (tQTMs).
Association mapping for startle in Drosophila revealed high heritability for 85 QTTs (0.996) and 48 QTSs (0.935). The QTTs were also controlled by 86 other tQTTs (0.804 ~ 0.998) and 25 tQTSs (0.115 ~ 0.423). Both real data analyses and Monte Carlo simulations demonstrated that genetic effects and environment interaction effects could be estimated with no bias and high statistical power by using the proposed approaches. We conducted comparative GWASs for total cholesterol using the full model and additive models. QTS analysis identified 13 individual loci and 3 pairs of epistasis loci using the full model, and detected 14 loci using the additive model. PLINK analysis identified two loci and GCTA analysis detected only one locus with genome-wide significance. The full model identified three previously reported genes as well as several new genes. Analyses of the cholesterol data and simulation studies revealed that the full model performed better than the additive model in terms of detection power and unbiased estimation of the genetic variants of complex traits. Using the full genetic model, alcohol dependence symptom count (ADSC) was analyzed and 20 highly significant QTSs were detected, including 4 in previously reported genes (ADH1B, PKNOX2, CPE, and KCNB2), 4 in novel genes (RGS6, FMN1, NRM, and BPTF), 2 noncoding RNAs, and 2 epistasis loci. The detected QTSs contributed about 20% of total heritability, of which dominance and epistasis effects accounted for over 50%.
WGAS was conducted for yield traits of cotton cultivars. Seventy-five SNP loci were detected with high heritability (61.73% ~ 98.71%), largely due to environmental interaction for lint yield (19.22%) and boll number (24.66%), and to epistasis for boll weight (31.70%) and lint percentage (88.63%). Cotton is an often cross-pollinated crop, and the mapping cultivars contain only a small proportion of heterozygous genotypes (~7.0%), yet the dominance-related heritabilities were the major components (0.54 ~ 0.95) of total heritability for four yield traits. This revealed, at the molecular level, the importance of heterozygote advantage for yield traits of cotton cultivars and the contribution of dominance effects of heterozygotes to genotypic variation in yield traits. These results could be useful for expediting cotton yield improvement by developing breeding strategies appropriate to the specific genetic basis. Leaf traits (leaf length, leaf width and upper leaf angle) of maize were analyzed for 5000 lines of the NAM population derived by USDA. Analyses with the full model identified 38 ~ 47 loci, and the multi-loci additive model identified 39 ~ 50 loci. Estimated total heritability ranged from 64.32% to 79.06% for the full model, but from 19.36% to 48.86% for the multi-loci additive model. Estimated heritability due to dominance and dominance-related epistasis interaction effects was 16.00% ~ 56.91% for the full model. Phenotypic variation of upper leaf angle was mostly controlled by additive (a) and additive × additive (aa) epistasis effects, but phenotypic variation in leaf length and leaf width was mostly controlled by dominance-related epistasis interactions. The optimal genotype combinations were predicted for each trait under 4 different environments, based on the estimated genotypic effects, to facilitate marker-assisted selection for leaf traits.
The dominance and epistasis effects made large contributions to heritability, whereas environmental interactions were relatively unimportant for the leaf traits.
Department of Bioinformatics, Zhejiang University
Plenary Talk: Integrative bioinformatics approaches towards whole plant cell modeling
Abstract: Multi-omics data challenge us to develop appropriate integrative bioinformatics approaches for modeling complex biological systems at spatial and temporal scales. In this talk, we will describe the multi-omics data available for rice cellular interactome modeling. Biological networks at multiple levels, such as gene regulation, protein interactions, noncoding RNA regulation and metabolic reactions, are reconstructed. A systematic identification and quantification of rice proteins in various tissues and organs is introduced. To better understand the interactions of proteins in rice, we developed PRIN, a predicted rice interactome network. We present a novel integrative approach (PSI) that draws on the wisdom of multiple specialized predictors, combining a group decision-making strategy with machine learning methods to achieve better predictions of protein subcellular localization. A genome-wide, multi-level interactome model of rice is built integratively. Furthermore, a database, RiceNetDB, has been developed for systematically storing and retrieving the genome-scale multi-level network of rice to facilitate biomolecular regulatory analysis and gene-metabolite mapping. A virtual rice cell model in three dimensions will be developed via international collaborations. Our ultimate goal is to build a whole plant cell model that can describe how phenotype arises from genotype.
Indian Statistical Institute (ISI), India
Plenary Talk Title: Improving the Efficiency of Experiments: Role of Covariates in Design Selection relating to Health Issues
Abstract: In this presentation, I will focus on (i) ‘exercise’ data for healthy active males (44) and females (43), and also on (ii) ‘health’ data involving three drugs (two antibiotics and one control). In both illustrative examples, my purpose is to focus on the efficient use of covariates. In (i), there are four covariates, viz., Heart Rate, Age, Height and Weight. The response variable is the VO2 max, measured in a suitable unit. The presentation will reveal the possibility of increasing efficiency by a suitable choice of the underlying covariate design. In (ii), there are 30 patients, each with an original pre-treatment score [count of bacilli] in a study of leprosy. The drugs are to be applied to ten patients each, and the problem relates to the efficient classification of the patients into three treatment groups. We believe these exercises will be instructive to experimenters in exploring the possible existence of improved experimental designs in situations involving the use of covariates.
Director, National Institute of Biomedical Genomics (NIBG), Netaji Subhas Sanatorium, Kalyani & Professor, ISI, Kolkata, India
Talk Title: Identifying Drivers in Cancer Genomes: Computational and Statistical Challenges
Abstract: The search for genes that drive cancer is important and is now a global endeavor. The search involves massively parallel DNA sequencing, during which very large data sets are generated. The statistical challenge is to manage these data and find patterns in them that are of statistical and biological significance. Several approaches are being proposed for the analysis of these large data sets. We have devised a statistical methodology for the detection of somatic variants. In this talk, I shall describe the expanse of the sequence data, methodologies for managing these data and the statistical methodology for variant calling developed by us. Finally, I shall provide some relevant results of our analysis of oral cancer genomes and the implications of these findings for precision medicine.
Department of Applied Statistics
East West University
Plenary Talk Title: Generalized Linear Models for Bivariate Count Data
Abstract: Dependence in outcome variables poses a formidable difficulty in various fields, including health sciences, traffic accidents, economics, actuarial science, social sciences, environmental studies and genetic studies. A widely studied topic in bivariate modeling of count data is traffic accidents and the number of fatalities. The bivariate Poisson distribution is commonly employed for modelling bivariate count data. Some attempts have been made in the past to develop a bivariate Poisson model using a trivariate reduction method. Bivariate data provide repeated measures for two time points or two events on the same experimental units, and the modeling of repeated measures data is a formidable challenge to researchers and users because we need to take into account both the relationships between outcome variables and those between outcome variables and covariates. In other words, dependence in outcome variables needs to be considered. Another important aspect of Poisson models is under- or over-dispersion due to violation of the equality of mean and variance. A third problem associated with the analysis of count data is truncation in the data for univariate or bivariate cases. In this paper, several models are developed using extended generalized linear models for bivariate count data to address the problems of bivariate count modelling of correlated outcomes, truncation and under- or over-dispersion. In addition, test procedures are shown for both goodness of fit and under- or over-dispersion. The proposed models and test procedures are illustrated with examples.
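The trivariate reduction construction mentioned above can be sketched as follows. This is the standard textbook form, given here as an assumption about what the abstract refers to, not the speaker's extended generalized linear models: Y1 = X1 + X0 and Y2 = X2 + X0, with independent X1 ~ Poisson(lam1), X2 ~ Poisson(lam2) and X0 ~ Poisson(lam0). The shared term X0 induces Cov(Y1, Y2) = lam0. The function name and the rate values are hypothetical.

```python
import math

def bivariate_poisson_pmf(y1, y2, lam1, lam2, lam0):
    """P(Y1 = y1, Y2 = y2) under trivariate reduction:
    Y1 = X1 + X0, Y2 = X2 + X0 with independent Poisson X1, X2, X0."""
    total = 0.0
    for k in range(min(y1, y2) + 1):  # k is the value taken by the shared term X0
        total += (
            lam1 ** (y1 - k) / math.factorial(y1 - k)
            * lam2 ** (y2 - k) / math.factorial(y2 - k)
            * lam0 ** k / math.factorial(k)
        )
    return math.exp(-(lam1 + lam2 + lam0)) * total

# Hypothetical rates, chosen only for illustration.
lam1, lam2, lam0 = 1.0, 2.0, 0.5

# Evaluate the pmf on a grid wide enough that the truncated tail is negligible.
grid = range(40)
pmf = {(a, b): bivariate_poisson_pmf(a, b, lam1, lam2, lam0) for a in grid for b in grid}

total_mass = sum(pmf.values())
mean1 = sum(a * p for (a, b), p in pmf.items())
mean2 = sum(b * p for (a, b), p in pmf.items())
cov = sum(a * b * p for (a, b), p in pmf.items()) - mean1 * mean2
```

In this construction the marginal means are lam1 + lam0 and lam2 + lam0, and the covariance cannot be negative, which is one reason more flexible bivariate count models are of interest.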
Infectious Disease Division,
International Center for Diarrhoeal Diseases Research, Bangladesh (ICDDR,B) www.icddrb.org
Plenary Talk: Microbial Forensics: Genomics to Metagenomics
Abstract: The term Forensic refers to using science and technology to investigate and establish true facts in criminal or civil courts of law. Microbial Forensics applies principles of public health epidemiology to identify patterns in a disease outbreak, determine which pathogen may be involved, and trace the organism to its source and route of spread. Historically, Microbial Forensics as a scientific discipline has evolved to attribute the source by combining circumstantial evidence with physical, chemical, and related functional biological attributes, including the DNA fingerprints of an infectious agent. Just as criminal forensics identifies a criminal, microbial forensics helps to identify the source of an infectious agent, including the perpetrator(s) in the case of an act of bioterrorism. Deliberate or inadvertent use of biological agents poses substantial dangers to individuals, the environment, the economy, and public health. Forensics is not only about identifying perpetrator(s), but also about ensuring that no innocent person is entangled, convicted and sentenced for an act with which he or she has no connection. While Microbial Forensics as a discipline is still in the developmental stage, with many scientific challenges to overcome, whole-genome sequencing together with bioinformatics and computational biology-based analysis of single nucleotide polymorphisms, for example, has revolutionized this science with respect to robustness and precision. This paper will focus on Microbial Forensics with emphasis on the Public Health aspects of Genomics and Metagenomics in Bangladesh.