X-Meeting 2015 - 11th International Conference of the AB3C + Brazilian Symposium of Bioinformatics

John Mattick

RNA is the computational engine of human development and cognition

High throughput sequencing and advanced imaging techniques have revealed that the vast majority of the genomes of mammals and other complex organisms is dynamically transcribed during development to produce tens if not hundreds of thousands of short and long non-protein-coding RNAs that show highly specific expression patterns and subcellular locations. Increasing numbers of these RNAs are being shown to have functions at many different levels of gene expression, including translational control and the guidance of epigenetic processes that underpin development, physiological adaptation, cognition and transgenerational communication, augmented by the superimposition of plasticity by RNA editing, RNA modification and retrotransposon mobilization. This in turn requires reassessment of the nature, scaling and hierarchies of the regulatory systems and processes that control the 4-dimensional assembly and cognitive capacities of complex organisms.


Executive Director of the Garvan Institute of Medical Research

Opening Lecture: 
“The extraordinary complexity of the human coding and noncoding transcriptome” 

Dr. Guilherme Oliveira

Biomining in the Amazon

Bacteria are extremely versatile, being able to survive in extreme environments and displaying a broad range of phenotypes and enzymatic activities. Mining environments are extreme environments where bacterial activities evolved to enable their survival using various metabolic approaches to achieve, for example, energy production. Bacterial activity in metal-rich fields can be observed naturally in, for example, the formation of acid drainage. Despite the isolation of some bioleaching bacteria, current knowledge of the microbial diversity and the metabolic pathways involved in this process is still limited, mostly due to the difficulty of cultivating the complex communities that inhabit these environments. In this work we will describe microorganisms that have been identified in active mining sites located in the Carajás region, Pará State, Brazil. We have used 16S and metagenomics approaches to describe populations. The most common phyla observed were Proteobacteria, Actinobacteria and Bacteroidetes, although with noticeable seasonal and site differences. Shotgun sequencing pointed to metabolic pathways relevant to survival in the environment, such as stress response, sulfur metabolism and iron acquisition. Shotgun data also demonstrated the presence of known bioleaching bacteria such as Acidimicrobium ferrooxidans, Acidiphilium cryptum, Acidithiobacillus caldus and Acidithiobacillus ferrooxidans. We have also characterized individual bacteria using two approaches: we isolated microorganisms in selective media and also conducted single cell genomics. Several genomes were sequenced and new members of the Chitinophagaceae were identified, as well as a species that is part of a new genus of iron-oxidizing Firmicutes: Acidibacillus. We describe new bacterial genomes and functions. Natural bacterial activities can be explored in mining operations in several ways. We describe two approaches established at the laboratory scale.
First, we have developed a method for bioremediation of drainage sites at copper mines using a sulfidogenic bioreactor. This approach uses acid produced by sulfate-reducing bacteria to differentially precipitate metals. Additionally, we have been isolating endemic microalgae to be used as a carbon source to promote bacterial growth. The second approach is bioleaching, deploying a bacterium that oxidizes sulfur in a manner coupled to the reduction of ferric iron present in the mineral phase. This approach facilitates the recovery of associated metals such as nickel, cobalt and manganese. We have also investigated the use of bacteria in low pH conditions to conduct bioleaching on rare earth materials composed of goethite and monazitic rocks. We have successfully made rare earth metals available under both aerobic and anaerobic conditions. This work was funded by Vale.


Instituto Tecnológico Vale
Belém - BR

Patrícia Palagi

Training strategies in the old continent: Swiss and European perspectives

The computational biology skills gap is still an issue worldwide, and filling it is one of the main concerns of the SIB Swiss Institute of Bioinformatics and its ELIXIR partners. The SIB training programme is designed on the one hand to ensure that life scientists benefit fully from bioinformatics and effectively apply it to their research projects, and on the other hand to train the next generation of competent bioinformaticians. SIB is the Swiss node of ELIXIR, the European life-sciences Infrastructure for biological Information. Together, we are defining a strategic training programme to upskill European researchers to enable effective exploitation of the data, tools, standards and compute infrastructure provided by ELIXIR partners. In this talk, I will give an overview of our common training strategies and the partnerships with the international community through GOBLET.


Swiss Institute of Bioinformatics
Geneva - CH

Robert Kuhn (course)

Genome Browser Workshop

This workshop is aimed at the biologist who is interested in exploring genomes using the University of California Santa Cruz (UCSC) Genome Browser. It is geared towards those who have little or no experience using the UCSC Genome Browser and for more advanced users who are not familiar with many of the gene-oriented browser features. Using real examples the user is guided through a step-by-step process for analyzing genes in the context of the human genome and a wide variety of genomic data. The user is shown how to use the UCSC Genome Browser for simple and more complex tasks. Tutorials, exercises and/or other informational material on using the UCSC Genome Browser will be provided.


Associate Director of the UCSC Genome Browser
Santa Cruz - USA

Jan Baumbach

Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

Volatile organic compounds are emitted by all living cells and tissues. We seek to non-invasively 'sniff' biomarker molecules that are predictive for the biomedical fate of individual patients or cell cultures. This promises great hope to move the therapeutic windows to earlier stages of disease progression. While portable devices for exhaled volatile metabolite measurement exist, we face the traditional biomarker research barrier: A lack of robustness hinders translation to the world outside laboratories. To move from biomarker discovery to validation, from separability to predictability, we have developed several bioinformatics methods for computational breath analysis, which have the potential to redefine non-invasive biomedical decision making by rapid and cheap matching of decisive medical patterns in exhaled air. We aim to provide a supplementary diagnostic tool complementing classic urine, blood and tissue samples. In the presentation, we will review the state of the art, study some clinical application examples, highlight existing challenges, and introduce new data mining methods for identifying exhaled biomarkers.


University of Southern Denmark
Odense M - DK

Régis Pomes

The Liquid State of Proteins

Although it was long thought that proteins must adopt a well-defined three-dimensional structure to perform their biological function, it has recently emerged that many proteins are at least partly disordered in their functional state. Even more remarkably, certain disordered proteins such as elastin have the capacity to self-assemble and separate into a liquid phase. In the assembled state, elastin fulfills a vital role by imparting extensibility, elastic recoil, and resilience to diverse tissues including arterial walls, skin, lung alveoli, and the uterus. Despite the biological importance of elastin and over eighty years of study, there is still no consensus model for its structure. We used high-performance computing to elucidate the microscopic structure of elastin. Molecular dynamics simulations exceeding 0.2 ms characterize the structural ensemble of elastin-like peptides. Results demonstrate that the hydrophobic domains of elastin are structurally disordered even when assembled together, like a bag of snakes or a plate of spaghetti. Consistent with the entropic nature of elastic recoil, the aggregated state is stabilized both by the hydrophobic effect and by an increase in conformational entropy upon self-assembly. These findings defy conventional wisdom regarding protein folding and disorder: (i) although the peptide side-chains are hydrophobic, they do not form a hydrophobic core; (ii) although the structure of elastin aggregates is maximally disordered, it is not random; and (iii) although the polypeptide backbone forms hydrogen-bonded turns, it remains significantly hydrated. This highly-disordered state underlies the two remarkable properties of elastin, its capacity to separate into a liquid phase and to undergo elastic recoil. As such, the unified picture obtained from this work resolves a long-standing controversy regarding the structure of elastin. 
The fact that polypeptide chains can aggregate yet retain functionally-essential conformational entropy is of broad relevance to the study of both protein disorder and protein phase separation. The structural ensemble of the elastin-like aggregate obtained here provides the first atomistic view into what may be called the liquid state of proteins.


Hospital for Sick Children
Toronto - CA

Boris Guennewig (course)

Current methodologies in transcriptome analysis

Next generation RNA-Sequencing (RNA-Seq) is nowadays applied ubiquitously in biological and medical research. The most common application is the detection of differentially expressed genes (DEGs), facilitated through abundance estimation of the complete transcriptome at a given time-point in a sample. These abundance estimates are generated by various tools that count or estimate the number of reads associated with an interval of the genome or transcriptome described through an annotation. The resulting abundance matrix barely represents the underlying complexity of the process, which runs from i) experimental design (library depth & replication), through ii) adapter and quality trimming, iii) alignment of reads and iv) abundance estimation, to v) differential expression analysis. Each of these steps contains a multitude of variables affecting the outcome of the final abundance matrix and subsequently the results of the DEG analysis, with its downstream ontology, co-expression and pathway analysis. In this workshop I provide an overview of the current methodologies in transcriptome analysis. This workshop is aimed at the biologist or computational biologist who is interested in exploring high throughput transcriptome data. Topics covered will be experimental design, quality control, alignments and quantification (day 1); current annotations, differential expression analysis and batch effect control (day 2); de novo assembly, alternative splicing, editing and circRNAs (day 3).


Garvan Institute of Medical Research
Sydney - AU

Martin Smith

The modular transcriptome: unraveling a network of functional, structured non-coding RNA domains

The majority (>80%) of our genome is dynamically transcribed into RNA in a developmentally coordinated and tissue-specific manner, producing an astounding diversity of processed non-coding transcripts. Identifying the precise molecular mechanisms implicating lncRNAs is crucial to the advancement of genomics and personalised medicine, as exemplified by the fact that most reported genetic variants associated with complex diseases occur in non-coding regions of the genome with no evidence of evolutionary sequence conservation. However, genome-wide functional annotation of lncRNAs has been limited by insufficient measures of purifying selection as well as unreliable structural predictions. We have recently exposed how over 20% of mammalian genomes present the hallmarks of purifying natural selection at the level of RNA secondary structure via comparative genomics [1]. Here, we expand these findings by revealing how a majority of these conserved RNA structure motifs present structural homologs throughout the human genome, and do so with limited sequence similarity. Under the hypothesis that the structural diversity of lncRNAs serves as a modular scaffold for the recruitment and targeting of epigenetic effector complexes, amongst others, we propose that these RNA structure motifs form a network of functional domains for the recruitment of specific RNA-binding proteins. We are assigning specific functions to these motifs in two ways: (i) through the association of RNA structure motif-harbouring transcripts enriched in RNA immuno-precipitation data (RIPseq) targeting epigenetic regulatory proteins; and (ii) through a novel program for the identification of common RNA structures within a subset of sequences. The latter identifies novel and statistically significant clusters of common RNA structure motifs in RIPseq data, despite the lack of substantial sequence conservation.
As more targeted sequencing data become available, these techniques will provide a tangible means of assigning biological function to the complex and pervasive noncoding transcriptome.


Garvan Institute of Medical Research
Sydney - AU

Dr. Augusto Schrank

Efforts to develop Computational Biology in Brazil: a perspective from undergraduate training programs.

Back in 2001, a report from the Biological Sciences areas at CAPES pointed out the importance of PhD training programs to build capacity in the Biology / Informatics sciences. The astonishing growth of available biological data, and the difficulty of properly analyzing its significance, would require a number of trained scientists that did not yet exist. Also, the improvement of computational capacity and the generation of more precise data on molecular structures allowed the expansion of modeling and structural studies, with the same expanding need for trained scientists. Therefore, CAPES began to foster PhD training programs with a focus on Bioinformatics / Computational Biology and also opened some grant opportunities. Results and evaluation of these efforts will be presented and discussed.


Universidade Federal do Rio Grande do Sul
Porto Alegre - BR

Dr. Helder Nakaya

Systems Biology: A Holistic Approach to Understanding Biological Complexity

Biological processes operate through an intricate and elaborate network of molecules. Systems biology approaches provide a comprehensive way of dissecting the complex interactions within these processes, and can lead to a better understanding of biological systems. In recent years, systems biology has been successfully applied in analyzing the immune response to a wide range of vaccines and infectious agents. However, dealing with the large amount of data generated from high-throughput techniques and the inherent complexity of the immune system represent major computational and biological challenges. This seminar highlights the recent technological and methodological advances in the field and shows how systems biology can be applied to unraveling novel insights into the molecular mechanisms of immunity.


Universidade de São Paulo
São Paulo - BR

Michelle Brazas

Supporting Trainers to Improve Bioinformatics Education Globally

A needs assessment isn’t necessary to realize that across the globe, there is a high demand for quality bioinformatics training in all domains of life science. Delivering on this demand however is not trivial. In addition to computational infrastructure and software tools, quality bioinformatics training depends upon excellent trainers and training resources. With a focus on the trainer in the learning equation, the Global Organization for Bioinformatics Learning, Education and Training (GOBLET) aims to facilitate the advancement of bioinformatics education globally by training and supporting a network of bioinformatics trainers. Activities include coordinating training efforts, sharing data sets and teaching materials, discussing best practices and building up teaching standards and teaching recognition. Examples to improve your bioinformatics training programs will be provided. Through support and development of trainer excellence, GOBLET is working to improve the global landscape in bioinformatics education.


Ontario Institute for Cancer Research
Toronto - CA

Pedro Galante

Retrocopies in Primate Genomes

Gene duplication is a key factor contributing to phenotype diversity across and within species. Nowadays, the availability of complete genomes has led to the extensive study of genomic duplications. In this work, we performed a systematic analysis of mRNA retrocopies in seven fully sequenced primates, including human. Specifically, we catalogued their entire retrocopy repertoires and explored the origin, orthology, expression and polymorphism of these retrocopies.


Hospital Sírio-Libanês
São Paulo - BR

Stephen Turner

The most comprehensive view of genomes, epigenomes and transcriptomes

Pacific Biosciences’ Single Molecule Real-Time (SMRT™) sequencing is nearing its 5th year of commercial deployment; since its introduction, the readlength and throughput of the technology have approximately doubled every year. Today, with an average readlength over 10,000 bases and significant numbers of reads over 30,000 bases in length, the technology has changed the face of genomics. Because of its unrivaled consensus accuracy and ability to provide microbial epigenomic information, SMRT sequencing is now accepted as the gold standard in microbial genomics. The genomics of more complex organisms has been similarly transformed, with SMRT sequencing providing contig N50 values up to 500 times better than have been achievable with short read technologies. Long read sequencing data also plays an important role in elucidating the full scope of cancer genome complexity as we move into the personalized medicine era. In a targeted mode, extraction of the salient features of key regions such as MHC, KIR and microsatellite regions has been enabled by its long reads and invulnerability to sequencing content bias. In a similar manner, the long reads have precipitated a revolution in understanding of RNA transcript structure through the IsoSEQ™ system. It has shown us that half or more of proteins lacked amino acid sequences in any database prior to the infusion of data from IsoSEQ™. These and other applications of SMRT sequencing have made it an indispensable tool in genomics and will be highlighted during my talk.


Pacific Biosciences
Menlo Park - USA

Martin Kuiper

Networks for Knowledge

Biological networks are exploited in many ways for gaining new knowledge about biological systems. Graph analysis of networks may provide useful characteristics about the design principles and mechanisms of pathways and regulation processes. Building networks as an object of scientific study, however, may prove to be a painstaking task, calling for elaborate database and literature surveying in order to get a comprehensive network representation in a topologically correct format. We have used such elaborate approaches, for instance, for building logical models with predictive power for anti-cancer drug efficacy. Alternatively, the Semantic Web brings promises of enhanced sharing and use of biological knowledge.
Semantic Systems Biology (SSB) aims to utilise semantic web resources as an additional toolkit for integrative and modeling approaches aiming to analyse and understand biological systems. The SSB group at the Norwegian University of Science and Technology works towards ways to reach out to end-users/biologists in order to create some user-pull to direct further implementations of semantic web resources. One of our efforts resulted in the construction of a resource for gene expression regulation analysis: the Gene eXpression Knowledge Base GeXKB. GeXKB provides a resource for finding novel network candidates potentially involved in gene expression regulation. The construction of GeXKB prompted us to start efforts in the direction of ‘semantifying’ data from the source: the curation of Transcription Factor information from scientific literature. This resulted in the TFcheckpoint database (www.tfcheckpoint.org), and the publication of a set of curation guidelines for other volunteer curators to join in this effort. This work inspired us to see if we could bring together the global community interested in the domain of transcription regulation research, and we are in the process of initiating GRECO: the Gene Regulation Consortium. GRECO aims to facilitate communication between resource and technology providers, paving the way to develop one virtual integrated high quality knowledge resource that could be used for instance in the field of regulatory network building and analysis.


Norwegian University of Science and Technology
Trondheim - NO

Dr. Leonardo Varuzza
Thermo Scientific

Igor Freitas

Accelerating advances in Healthcare & Life Sciences through Intel® Big Data Solutions

Intel is helping drive the life sciences and healthcare evolution through a comprehensive approach that includes working with key players in the industry. Working with commercial and open-source authors to optimize top industry codes in a variety of fields, such as bioinformatics, computational chemistry, molecular dynamics and genomics, helps to maximize industry impact and ensure everyone benefits from these efforts. In this session we will provide an overview of how Intel solutions have been accelerating advances in healthcare and life sciences through data management and analytics, delivering optimized diagnostics, treatment and care delivery.



Dr. Thiago Venâncio
Universidade Estadual do Norte Fluminense Darcy Ribeiro















For general questions: presidente@ab3c.org.br

For questions about your registration: eventos.marianaoliveira@x-meeting.com