This page presents some of my academic research publications. If you use any of this work in your own academic work, please remember to cite the corresponding documents. If you need help citing them, please contact me.

For citation purposes, please follow these links:


List of publications/abstracts:


Comparative Approaches to Using R and Python for Statistical Data Analysis

This book explains how to carry out data analysis procedures with the Python and R languages. It includes several practical reference exercises with sample data, covering a range of statistical research topics from basic to advanced. The procedures are explained throughout and are accessible enough for non-statisticians and data analysts. Because the solved tests are provided in both R and Python, the book also addresses programmers and advanced users. The audience is therefore broad, and the book should serve both the curious analyst and the expert.

Read and Download



A System for Efficient Communication between Patients and Pharmacies

When studying human-technology interaction systems, researchers strive to achieve intuitiveness and to make people's lives easier through a thoughtful, in-depth study of the components of the application system that supports a particular business's communication with its customers. In the healthcare field in particular, requirements such as clarity, transparency, efficiency, and speed in transmitting information to patients and/or healthcare professionals can substantially improve patient well-being and the productivity of healthcare professionals. In this work, the authors study the difficulties patients frequently face when communicating with pharmacists. In addition to a statistical study of a survey of more than two hundred frequent pharmacy customers, we propose an IT solution for better communication between patients and pharmacists.

Read and Download



Incremental TextRank – Automatic Keyword Extraction for Text Streams

Text mining and NLP techniques are a hot topic nowadays. Researchers strive to develop new and faster algorithms to cope with ever larger amounts of data. Text data analysis in particular has attracted increasing interest with the growth of social media. Developing new algorithms and/or upgrading existing ones is therefore a crucial task for text mining problems in this new scenario. In this paper, we present an update to TextRank, a well-known method for automatic keyword extraction from text, adapted to handle streams of text. We also present results for this implementation and compare them with the batch version. The main improvement is lower computation time when processing the same text data in a streaming environment, in both sliding-window and incremental setups. The speedups obtained in the experiments are significant, so we consider the approach valid and useful to the research community.

Read and Download
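To illustrate the general idea of keeping TextRank scores up to date over a stream, here is a minimal Python sketch. It is not the implementation from the paper: the class name `IncrementalTextRank`, the naive whitespace tokenizer (no stop-word or part-of-speech filtering), the co-occurrence window of 2, and the warm-started power iteration are all assumptions made for this example. Each new chunk only adds edges to the existing word graph, and the previous scores are reused as the starting point of the next iteration instead of restarting from scratch.

```python
from collections import defaultdict


class IncrementalTextRank:
    """Hypothetical sketch of TextRank-style keyword ranking over a text stream."""

    def __init__(self, window=2, damping=0.85):
        self.window = window    # words within `window` consecutive tokens co-occur
        self.damping = damping  # usual PageRank/TextRank damping factor
        self.adj = defaultdict(lambda: defaultdict(float))  # word -> neighbour -> weight
        self.scores = {}

    def add_text(self, text, iterations=20):
        # Naive tokenizer: lowercase, strip punctuation; no stop-word/POS filtering.
        tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
        tokens = [t for t in tokens if t]
        # Incremental graph update: only the new chunk's co-occurrences are added.
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + self.window, len(tokens))):
                v = tokens[j]
                if v != w:
                    self.adj[w][v] += 1.0
                    self.adj[v][w] += 1.0
        # Warm start: known words keep their previous score, new words start at 1.0.
        for w in self.adj:
            self.scores.setdefault(w, 1.0)
        self._iterate(iterations)

    def _iterate(self, iterations):
        d = self.damping
        strength = {w: sum(nbrs.values()) for w, nbrs in self.adj.items()}
        for _ in range(iterations):
            self.scores = {
                w: (1 - d) + d * sum(self.scores[u] * wt / strength[u]
                                     for u, wt in self.adj[w].items())
                for w in self.adj
            }

    def top(self, k=5):
        return sorted(self.scores, key=self.scores.get, reverse=True)[:k]


tr = IncrementalTextRank()
tr.add_text("stream mining of text streams")
tr.add_text("keyword extraction from text streams")
print(tr.top(3))
```

After the second chunk, `text` is the most connected word in the co-occurrence graph and is ranked first. A sliding-window variant would additionally subtract the edge weights of chunks that leave the window.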



Efficient Incremental Laplace Centrality Algorithm for Dynamic Networks

Social Network Analysis (SNA) is an important research area. It originated in sociology but has spread to other areas of research, including anthropology, biology, information science, organizational studies, political science, and computer science. This has stimulated research on how to support SNA with new algorithms. One critical area is the calculation of different centrality measures. The challenge is to compute them quickly, as increasingly large datasets become available. Our contribution is an incremental version of the Laplacian Centrality measure that can be applied not only to large graphs but also to dynamically changing networks. We conducted several tests with different types of evolving networks and show that our incremental version processes a given large network faster than the corresponding batch version, in both incremental and fully dynamic network setups.

Read and Download
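Laplacian centrality scores a vertex v by the drop in Laplacian energy E_L(G) = Σ d_i² + 2 Σ w_ij² when v is removed. For an unweighted graph this drop equals d_v² + d_v + 2 Σ_{u∈N(v)} d_u, which depends only on v's degree and its neighbours' degrees, so inserting an edge (a, b) can only change the scores of a, b and their neighbours. The Python sketch below exploits that locality for unweighted graphs with edge insertions only; the class name is hypothetical, and it is a simplified illustration rather than the algorithm from the paper (which also covers fully dynamic setups).

```python
from collections import defaultdict


class IncrementalLaplacianCentrality:
    """Hypothetical sketch: Laplacian centrality for an unweighted, undirected
    graph that only receives edge insertions."""

    def __init__(self):
        self.nbrs = defaultdict(set)
        self.energy = 0  # E_L(G) = sum(d_i^2) + 2*|E| in the unweighted case
        self.drop = {}   # drop[v] = E_L(G) - E_L(G with v removed)

    def _local_drop(self, v):
        # Unweighted closed form: d_v^2 + d_v + 2 * (sum of neighbour degrees).
        d = len(self.nbrs[v])
        return d * d + d + 2 * sum(len(self.nbrs[u]) for u in self.nbrs[v])

    def add_edge(self, a, b):
        if a == b or b in self.nbrs[a]:
            return  # self-loops and repeated edges are ignored in this sketch
        da, db = len(self.nbrs[a]), len(self.nbrs[b])
        # Energy gain: (da+1)^2 - da^2 + (db+1)^2 - db^2 + 2 for the new edge.
        self.energy += 2 * da + 2 * db + 4
        self.nbrs[a].add(b)
        self.nbrs[b].add(a)
        # drop[v] depends only on d_v and its neighbours' degrees, so only
        # a, b and their neighbours need recomputing.
        for v in {a, b} | self.nbrs[a] | self.nbrs[b]:
            self.drop[v] = self._local_drop(v)

    def centrality(self, v):
        """Normalized Laplacian centrality of v (0.0 for unknown vertices)."""
        return self.drop.get(v, 0) / self.energy if self.energy else 0.0


lc = IncrementalLaplacianCentrality()
for u, v in [("a", "b"), ("b", "c")]:
    lc.add_edge(u, v)
print(lc.energy, lc.centrality("b"), lc.centrality("a"))  # 10 1.0 0.6
```

For the path a-b-c, E_L = 10; removing b leaves two isolated vertices, so its drop is the whole energy, while removing a leaves the single edge b-c with energy 4, a drop of 6.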



Social Network Analysis in Streaming Call Graphs

Mobile telecom operators collect and store Call Detail Records (CDRs), which detail the communication among subscribers, in real time. Call graphs can be induced from these CDRs, with nodes representing subscribers and edges representing the phone calls made. These graphs can easily reach millions of nodes and billions of edges. Besides being large-scale and generated in real time, the underlying social networks are inherently complex and therefore difficult to analyze. Conventional data analysis performed by telecom operators is slow, done on request, and incurs heavy data warehousing costs. In the face of these challenges, real-time streaming analysis has become an ever-increasing need for mobile operators, since it enables them to quickly detect important network events and improve their marketing strategies. Sampling, together with visualization techniques, is required for online exploratory data analysis and event detection in such networks. In this chapter, we review the burgeoning body of research on network sampling, streaming analysis, and streaming visualization of social networks, and the solutions proposed so far.

Read and Download
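Inducing a call graph from CDRs while sampling the stream on the fly can be illustrated with classic reservoir sampling (Algorithm R), one of the simplest uniform stream-sampling baselines. It is used here only as an illustration and is not necessarily one of the methods discussed in the chapter. The sketch assumes each CDR is reduced to a hypothetical `(caller, callee)` pair; real CDRs carry more fields (timestamps, durations, and so on) that are ignored here.

```python
import random


def sample_call_graph(cdr_stream, k, seed=None):
    """Keep a uniform random sample of k calls from a CDR stream (reservoir
    sampling, Algorithm R) and return it as a weighted, directed call graph.

    Each CDR is assumed to be a hypothetical (caller, callee) pair.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, (caller, callee) in enumerate(cdr_stream):
        if n < k:
            reservoir.append((caller, callee))
        else:
            # After n+1 calls, each one is in the reservoir with probability k/(n+1).
            j = rng.randint(0, n)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = (caller, callee)
    # Induce the call graph: edge weight = number of sampled calls caller -> callee.
    graph = {}
    for caller, callee in reservoir:
        out = graph.setdefault(caller, {})
        out[callee] = out.get(callee, 0) + 1
    return graph


calls = [("alice", "bob"), ("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
print(sample_call_graph(calls, k=10))  # k >= len(calls), so every call is kept
```

The reservoir uses O(k) memory regardless of stream length, which is the property that matters for call graphs with billions of edges.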



Streaming Networks Sampling using top-K Networks

The combination of a top-K network representation of the data stream with community detection is a novel approach to streaming network sampling. The method keeps an always up-to-date sample of the full network, and its advantage over previous methods is that it preserves larger communities and the original network distribution. We also show empirically that these techniques, in conjunction with community detection, provide effective ways to sample and analyze large-scale streaming networks with power law distributions.

Read and Download
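The abstract does not spell out how the top-K representation is maintained. One standard way to keep the K most frequent items of a stream in bounded memory is the Space-Saving algorithm; the Python sketch below applies it to edges and returns the K retained edges as a small weighted network. The function names are hypothetical, the community detection step is omitted, and this should be read as an illustration of a top-K edge summary, not as the method of the paper.

```python
def space_saving_edges(edge_stream, k):
    """Approximate the k most frequent undirected edges with Space-Saving."""
    counts = {}  # monitored edge -> estimated frequency (never underestimates)
    errors = {}  # monitored edge -> maximum overestimation
    for u, v in edge_stream:
        e = (u, v) if u <= v else (v, u)  # canonical key for an undirected edge
        if e in counts:
            counts[e] += 1
        elif len(counts) < k:
            counts[e], errors[e] = 1, 0
        else:
            # Replace the least frequent monitored edge; the newcomer inherits its count.
            victim = min(counts, key=counts.get)
            m = counts.pop(victim)
            del errors[victim]
            counts[e], errors[e] = m + 1, m
    return counts, errors


def top_k_network(edge_stream, k):
    """Build the sampled network from the k edges retained by Space-Saving."""
    counts, _ = space_saving_edges(edge_stream, k)
    adj = {}
    for (u, v), c in counts.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    return adj


stream = [("e", "f")] + [("a", "b")] * 5 + [("d", "c")] * 3
counts, errors = space_saving_edges(stream, k=2)
print(counts, errors)
```

Space-Saving may overestimate counts: for each retained edge, the true frequency lies between `counts[e] - errors[e]` and `counts[e]`. Here the edge c-d appears 3 times but is reported as 4, with an error bound of 1.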



Metrics of Evolving Ego-Networks with Forgetting Factor

Treating data as a continuous real-time flow is now a necessity, driven by the need for immediate responses to events in daily life. We treat the data as an ongoing data stream and represent it with streaming egocentric networks (Ego-Networks) of the particular nodes under study. We use a non-standard node forgetting factor in the representation of the network data stream, as previously introduced in the related literature. This makes the representation sensitive to recent events in users' networks and less sensitive to past node events. We study this method with large-scale Ego-Networks taken from telecommunications social networks with power law distributions. We aim to compare and analyze several reference Ego-Network metrics and their variation with and without the forgetting factor.

Read and Download
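The exact forgetting factor used in the paper is not given in this abstract. As a hedged illustration, the sketch below assumes a simple exponential decay per alter node: at every time step each alter's weight is multiplied by a factor lambda in (0, 1], each new call adds 1, and alters whose weight falls below a threshold are forgotten. Only the ego's own ties are tracked (alter-alter edges are omitted for brevity), and the class and parameter names are hypothetical. Setting `forgetting=1.0` disables forgetting, which makes it easy to compare metrics such as ego-network size and strength with and without the factor.

```python
class DecayingEgoNetwork:
    """Hypothetical sketch: the ego's ties in a call stream, with exponential
    node forgetting. Alter-alter edges are omitted for brevity."""

    def __init__(self, ego, forgetting=0.9, threshold=0.05):
        self.ego = ego
        self.forgetting = forgetting  # lambda in (0, 1]; 1.0 disables forgetting
        self.threshold = threshold    # alters whose weight falls below this are dropped
        self.weights = {}             # alter -> decayed interaction weight

    def step(self, calls):
        """Advance one time step and absorb this step's (caller, callee) records."""
        # Age existing ties first, so older interactions weigh less.
        self.weights = {a: w * self.forgetting for a, w in self.weights.items()}
        for caller, callee in calls:
            if caller == callee:
                continue
            if caller == self.ego:
                alter = callee
            elif callee == self.ego:
                alter = caller
            else:
                continue  # the call does not involve the ego
            self.weights[alter] = self.weights.get(alter, 0.0) + 1.0
        # Forget alters whose ties have faded below the threshold.
        self.weights = {a: w for a, w in self.weights.items() if w >= self.threshold}

    def size(self):
        """Number of alters currently in the ego-network."""
        return len(self.weights)

    def strength(self):
        """Sum of the ego's decayed tie weights."""
        return sum(self.weights.values())


steps = [[("ego", "a"), ("b", "ego")], [("ego", "a")], [], []]
with_ff = DecayingEgoNetwork("ego", forgetting=0.5, threshold=0.2)
without_ff = DecayingEgoNetwork("ego", forgetting=1.0, threshold=0.2)
for calls in steps:
    with_ff.step(calls)
    without_ff.step(calls)
print(with_ff.size(), with_ff.strength(), without_ff.size(), without_ff.strength())  # 1 0.375 2 3.0
```

With forgetting, alter `b`, contacted only once at the start, drops out after three quiet steps, while without forgetting both alters remain and the strength keeps growing.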



Visualization of Evolving Large Scale Ego-Networks

Streaming and visualization of large-scale social networks have been hot topics in recent research. Researchers strive to develop efficient streaming methods and to extract knowledge from the results. Moreover, treating data as a continuous real-time flow is driven by the need for immediate responses to events in daily life. Our contribution is to treat the data as a continuous stream and represent it by streaming the egocentric networks (Ego-Networks) of particular nodes. We propose a non-standard node forgetting factor in the representation of the network data stream, so that the representation is sensitive to recent events in users' networks and less sensitive to past node events. The aim of these techniques is to visualize large-scale Ego-Networks from telecommunications social networks with power law distributions.

Read and Download



Visualization for Streaming Telecommunications Networks

Regular telecommunications services produce massive volumes of relational data. In this work, the data produced in telecommunications is treated as a streaming network, where clients are the nodes and phone calls are the edges. Visualization techniques are required for exploratory data analysis and event detection. In social network visualization and analysis, the goal is to extract more information from the data while taking individual actors into account. Previous methods relied on aggregating communities, k-Core decompositions, and matrix feature representations to visualize and analyze massive network data. Our contribution is a technique for the group visualization and analysis of influential actors in the network, which samples the full network with a top-k representation of the network data stream.

Read and Download



A Comprehensive Workflow for Enhancing Business Bankruptcy Prediction

Enterprise bankruptcy is an evolving problem with several causes, and it can cause severe damage to financial institutions. If predicted accurately and ahead of time, it may enable the company to react and change the course of events. It can be framed as a typical classification problem, with several financial ratios used as attributes. Widespread software tools offer a broad spectrum of Artificial Intelligence algorithms, and the most challenging task may be selecting which algorithm to use. The relatively large body of literature testing solutions in this area, with its many options, advantages, and pitfalls, may be as distracting as it is informative when making this decision. In this work, we present an empirical study with a comprehensive Knowledge Discovery and Data Mining (KDD) workflow. By automating classifier selection, we select an algorithm with better prediction performance than those most widely documented in the literature.

Read and Download
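The abstract does not describe the workflow's internals, so the sketch below shows only the general idea of automated classifier selection: every candidate is scored by k-fold cross-validated accuracy and the best one is kept. It uses only the Python standard library, and the two candidates (a majority-class baseline and 1-nearest-neighbour) and all function names are hypothetical stand-ins for the algorithms a real workflow would compare. Folds are contiguous, so data should be shuffled first; and because bankruptcy data are typically imbalanced, a real study would likely prefer a metric such as AUC or balanced accuracy over plain accuracy.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (assumes n >= k; shuffle data first)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def majority_class(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label


def nearest_neighbour(train_X, train_y):
    """1-NN with squared Euclidean distance over the financial-ratio features."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def select_classifier(X, y, candidates, k=5):
    """Return (best_name, scores), scoring each candidate by mean k-fold accuracy."""
    scores = {}
    for name, fit in candidates.items():
        fold_accs = []
        for test_idx in k_fold_indices(len(X), k):
            held_out = set(test_idx)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            hits = sum(1 for i in test_idx if model(X[i]) == y[i])
            fold_accs.append(hits / len(test_idx))
        scores[name] = sum(fold_accs) / len(fold_accs)
    return max(scores, key=scores.get), scores


# Synthetic toy data (not from the study): one ratio-like feature plus a constant column.
X = [[i, 1] for i in range(20)]
y = [0] * 10 + [1] * 10
best, scores = select_classifier(
    X, y, {"majority_class": majority_class, "nearest_neighbour": nearest_neighbour})
print(best, scores)
```

Adding a candidate only requires another entry in the `candidates` dictionary mapping a name to a function that takes training data and returns a predictor, which is what makes the selection step easy to automate.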



All available PDFs: