Claudio Reggiani

Brussels, Brussels Region, Belgium
2K followers 500+ connections

Experience

  • Aspire Graphic

    Aspire

    Singapore, Singapore

  • -

    Brussels, Brussels Region, Belgium

  • -

  • -

    Italy

  • -

    Singapore

  • -

    Singapore

  • -

    Brussels Area, Belgium

Education

  • thePower Graphic
  • -

    PhD student at Interuniversity Institute of Bioinformatics in Brussels, (IB)2.
    Analysis of omics data with machine learning techniques.
    Research on scalable machine learning methods in data intensive frameworks.
    Experience working in an HPC environment.
    Experience with PBS batch systems.

  • -

    Visiting student at Machine Learning Group (MLG) for master thesis.
    Master thesis title: "Scaling feature selection algorithms using MapReduce on Apache Hadoop"

  • -

    Artificial Intelligence
    Model Identification and data analysis 1
    Software Engineering 2
    Data Bases 2
    Statistics
    Formal Languages and Compilers
    Computer Security
    Logic and Algebra
    Theoretical Computer Science
    High Performance Processors and Systems
    Game Theory
    Enterprise Digital Infrastructures
    Internet of Things
    Data Mining and Text Mining
    Business Information Systems
    Advanced Algorithms
    Search Computing
    Service Technologies 1
    Advanced Topics in Computer Security

  • -

    Introduction to Probability
    Electrical Engineering
    Electronics I
    Electronics II
    Database
    Operational Research
    Operating Systems
    Lab Operating Systems
    Control Theory I
    Control Theory II
    Logistic engineering
    Computer Network
    Lab Computer Network
    Electrical Communication
    Economics
    Software Engineering
    English
    Thermodynamics
    Industrial Computer Science

Publications

  • Feature selection in high-dimensional dataset using MapReduce

    Springer CCIS series, volume 823

    This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.

  • Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability

    Genome Medicine

    Background - Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders.

    Methods - Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play.

    Results - Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients’ clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories.

    Conclusions - While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.

  • Analysis of structured RDF/XML data using Spark and GraphX

    Proceedings of BENELEARN 2016

    Bioinformatics is an evolving field combining statistics, mathematics, and engineering in order to interpret biological data. This vast domain deals with large datasets that need to be processed efficiently. The Big Data paradigm is analysed in this context, presenting its advantages and shortcomings over traditional computational methods. This paper describes the implementation of a tool capable of standardizing the wide variety of formats in which biological data is represented, leveraging distributed systems such as Apache Spark.

  • Minimum Redundancy Maximum Relevance: MapReduce implementation using Apache Hadoop

    Proceedings of BENELEARN 2014

    High-dimensional datasets include useful information for prediction purposes, but feature redundancy and noise negatively affect classifier performance. Feature selection algorithms are employed to tackle the curse of dimensionality and improve performance by selecting a subset of features. In this paper we address the design and implementation of the minimum Redundancy Maximum Relevance feature selection algorithm using the MapReduce paradigm, through the Apache Hadoop framework. We report preliminary results on the scalability of our algorithm.


Honors & Awards

  • IBM EMEA Best Student Recognition 2013

    IBM Montpellier

    The 2013 EMEA Best Student Recognition Event brought together the top students from across Europe, the Middle East, and Africa at IBM Montpellier, France, for a three-day event on July 1-3, 2013.

  • Spinner Research Scholarship for Research and Development

    Consorzio Spinner

    Research grant for an international collaboration project to develop new technologies in the industrial region of Emilia-Romagna. The grant was approved by the Spinner consortium and targeted research and products in the field of computer vision.

  • Il pinguino nel computer

    Comune di Modena

    Winner of the “Il pinguino nel computer” competition (2007 edition), organized by Fondazione
    Cassa di Risparmio di Modena and Università degli Studi di Modena e Reggio Emilia
    (Computer Engineering Department). The aim was to encourage young people to develop
    an open source application for the Linux environment.

Test Scores

  • TOEIC

    Score: 920/990

Languages

  • Italian

    Native or bilingual proficiency

  • English

    Full professional proficiency

  • French

    Professional working proficiency
