This page presents some of my publications in the context of academic research. If you use any of this work in your own academic work, please remember to cite these documents. If you need help citing them, please contact me.
For citation purposes please follow these useful links:
List of publications/abstracts:
Comparative Approaches to Using R and Python for Statistical Data Analysis
This book explains the procedures for carrying out data analysis with the Python and R languages. It includes several practical reference exercises with sample data. These examples cover several statistical research topics, ranging from easy to advanced. The procedures are thoroughly explained and are comprehensible enough to be used by non-statisticians or data analysts. By providing the solved tests in both R and Python, the procedures are also directed at programmers and advanced users. Thus, the audience is quite broad, and the book should satisfy both the curious analyst and the expert.
A System for Efficient Communication between Patients and Pharmacies
When studying human-technology interaction systems, researchers strive to achieve intuitiveness and make people's lives easier through a thoughtful and in-depth study of the components of the application system that supports a particular business's communication with customers. Particularly in the healthcare field, requirements such as clarity, transparency, efficiency, and speed in transmitting information to patients and/or healthcare professionals can mean an important increase in the well-being of the patient and the productivity of the healthcare professional. In this work, the authors study the difficulties patients frequently have when communicating with pharmacists. In addition to a statistical study of a survey conducted with more than two hundred frequent pharmacy customers, we propose an IT solution for better communication between patients and pharmacists.
Incremental TextRank – Automatic Keyword Extraction for Text Streams
Text Mining and NLP techniques are a hot topic nowadays. Researchers strive to develop new and faster algorithms to cope with larger amounts of data. In particular, interest in text data analysis has been increasing due to the growth of social network media. Given this, the development of new algorithms and/or the upgrade of existing ones is now a crucial task for dealing with text mining problems under this new scenario. In this paper, we present an update to TextRank, a well-known approach to automatic keyword extraction from text, adapted to deal with streams of text. In addition, we present results for this implementation and compare them with the batch version. The major improvement is lower computation times for processing the same text data in a streaming environment, in both sliding window and incremental setups. The speedups obtained in the experimental results are significant. Therefore, the approach was considered valid and useful to the research community.
Efficient Incremental Laplace Centrality Algorithm for Dynamic Networks
Social Network Analysis (SNA) is an important research area. It originated in sociology but has spread to other areas of research, including anthropology, biology, information science, organizational studies, political science, and computer science. This has stimulated research on how to support SNA with the development of new algorithms. One of the critical areas involves the calculation of different centrality measures. The challenge is how to do this quickly, as increasingly large datasets become available. Our contribution is an incremental version of the Laplacian Centrality measure that can be applied not only to large graphs but also to dynamically changing networks. We have conducted several tests with different types of evolving networks. We show that our incremental version can process a given large network faster than the corresponding batch version in both incremental and fully dynamic network setups.
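For readers unfamiliar with the measure, the following is a minimal sketch of the standard batch definition of Laplacian centrality for an unweighted, undirected graph: the drop in Laplacian energy when a node is removed. It is not the incremental algorithm from the paper. The function names and the dictionary-of-sets graph representation are assumptions chosen for illustration.

```python
def laplacian_energy(adj):
    """Laplacian energy of an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    Energy = sum of squared degrees + 2 * number of edges.
    """
    squared_degrees = sum(len(nbrs) ** 2 for nbrs in adj.values())
    edge_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    return squared_degrees + 2 * edge_count


def remove_node(adj, v):
    """Return a copy of the graph with node v and its edges removed."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}


def laplacian_centrality_batch(adj, v):
    """Definition: energy of the graph minus energy without node v."""
    return laplacian_energy(adj) - laplacian_energy(remove_node(adj, v))


def laplacian_centrality_local(adj, v):
    """Equivalent closed form using only v's degree and its neighbours' degrees."""
    d_v = len(adj[v])
    return d_v ** 2 + d_v + 2 * sum(len(adj[u]) for u in adj[v])


# Hypothetical toy graph: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
for node in path:
    print(node, laplacian_centrality_batch(path, node),
          laplacian_centrality_local(path, node))
```

The closed form depends only on a node and its immediate neighbourhood. That locality is the kind of property an incremental variant could plausibly exploit when edges are added or removed, although the paper's actual update rules are not reproduced here.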
Social Network Analysis in Streaming Call Graphs
Mobile telecom operators collect and store Call Detail Records (CDRs) in real time, which detail the communication among subscribers.
Call graphs can be induced from these CDRs, where nodes represent subscribers and edges represent the phone calls made. These graphs may easily reach millions of nodes and billions of edges. Besides being large-scale and generated in real time, the underlying social networks are inherently complex and, thus, difficult to analyze. Conventional data analysis performed by telecom operators is slow, done on request, and implies heavy costs in data warehouses. In the face of these challenges, real-time streaming analysis is becoming an ever-increasing need for mobile operators, since it enables them to quickly detect important network events and improve their marketing strategies. Sampling, together with visualization techniques, is required for online exploratory data analysis and event detection in such networks. In this chapter, we report on the burgeoning body of research in network sampling, streaming analysis, and streaming visualization of social networks, and on the solutions proposed so far.
Streaming Networks Sampling using top-K Networks
The combination of a top-K network representation of the data stream with community detection is a novel approach to sampling streaming networks. While keeping an always up-to-date sample of the full network, this method has the advantage over previous ones that it preserves larger communities and the original network distribution. Empirically, it will also be shown that these techniques, in conjunction with community detection, provide effective ways to perform sampling and analysis of large scale streaming networks with power law
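As an illustration of what a top-K representation of an edge stream can look like, here is a minimal sketch based on the Space-Saving algorithm, a common technique for tracking the most frequent items in a stream with bounded memory. Whether the paper uses exactly this scheme is an assumption; the function names and the (caller, callee) edge format are hypothetical.

```python
def space_saving_update(counters, edge, k):
    """Update a bounded top-K summary with one observed edge.

    counters maps edges to approximate counts and never holds more than k entries.
    """
    if edge in counters:
        counters[edge] += 1
    elif len(counters) < k:
        counters[edge] = 1
    else:
        # Evict the edge with the smallest count; the newcomer inherits
        # that count plus one, as in the Space-Saving algorithm.
        weakest = min(counters, key=counters.get)
        counters[edge] = counters.pop(weakest) + 1


def top_k_sample(stream, k):
    """Return the top-K edge summary after consuming the whole stream."""
    counters = {}
    for caller, callee in stream:
        space_saving_update(counters, (caller, callee), k)
    return counters


# Hypothetical stream of calls between subscribers.
calls = [("a", "b"), ("a", "b"), ("c", "d"), ("e", "f")]
sample = top_k_sample(calls, k=2)
print(sample)
```

The retained edges form a small network that is always up to date and on which community detection could then be run. Counts for edges that entered by eviction are upper bounds rather than exact values.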
Metrics of Evolving Ego-Networks with Forgetting Factor
Nowadays, treating data as a continuous real-time flow is a requirement driven by the need for immediate response to events in daily life. We study the data as an ongoing data stream and represent it by streaming egocentric networks (Ego-Networks) of the particular nodes under study. We use a non-standard node forgetting factor in the representation of the network data stream, as previously introduced in the related literature. This way, the representation is sensitive to recent events in users' networks and less sensitive to past node events. We study this method with large scale Ego-Networks taken from telecommunications social networks with power law distributions. We aim to compare and analyze some reference Ego-Network metrics and their variation with and without the forgetting factor.
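To make the idea of a forgetting factor concrete, the sketch below applies a generic exponential decay to the contacts of a single ego node. It is an assumption-laden illustration rather than the specific non-standard forgetting factor used in the paper: the class name, the parameters `alpha` and `threshold`, and the per-step update rule are all hypothetical.

```python
class ForgettingEgoNetwork:
    """Ego-network whose contacts fade unless they keep interacting with the ego."""

    def __init__(self, ego, alpha=0.5, threshold=0.2):
        self.ego = ego
        self.alpha = alpha          # decay applied to every contact at each step
        self.threshold = threshold  # contacts below this weight are forgotten
        self.weights = {}           # alter -> current weight

    def step(self, contacts):
        """Advance one time step, given the alters contacted during that step."""
        # Decay all existing contacts first.
        for alter in self.weights:
            self.weights[alter] *= self.alpha
        # Reinforce the alters seen in this step.
        for alter in contacts:
            self.weights[alter] = self.weights.get(alter, 0.0) + 1.0
        # Forget contacts whose weight has faded below the threshold.
        self.weights = {a: w for a, w in self.weights.items()
                        if w >= self.threshold}

    def degree(self):
        """Number of alters currently remembered."""
        return len(self.weights)


net = ForgettingEgoNetwork("ego", alpha=0.5, threshold=0.2)
for contacts in (["b", "c"], ["b"], [], []):
    net.step(contacts)
print(net.weights, net.degree())
```

With these settings, the contact that was reinforced in the second step survives two silent steps, while the one seen only once is forgotten. Metrics such as degree can then be compared with and without decay by setting `alpha` to 1.0 and `threshold` to 0.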
Visualization of Evolving Large Scale Ego-Networks
Streaming and visualization of large scale social networks has been a hot topic in recent research. Researchers strive to achieve efficient streaming methods and to be able to gather knowledge from the results. Moreover, treating the data as a continuous real-time flow is required for immediate response to events in daily life. Our contribution is to treat the data as a continuous stream and represent it by streaming the egocentric networks (Ego-Networks) for particular nodes. We propose a non-standard node forgetting factor in the representation of the network data stream. Thus, this representation is sensitive to recent events in users' networks and less sensitive to past node events. The aim of these techniques is the visualization of large scale Ego-Networks from telecommunications social networks with power law distributions.
Visualization for Streaming Telecommunications Networks
Regular services in telecommunications produce massive volumes of relational data. In this work, the data produced in telecommunications is seen as a streaming network, where clients are the nodes and phone calls are the edges. Visualization techniques are required for exploratory data analysis and event detection. In social network visualization and analysis, the goal is to get more information from the data, taking into account actors at the individual level. Previous methods relied on aggregating communities, k-Core decompositions, and matrix feature representations to visualize and analyze the massive network data. Our contribution is a group visualization and analysis technique for influential actors in the network, based on sampling the full network with a top-k representation of the network data stream.
A Comprehensive Workflow for Enhancing Business Bankruptcy Prediction
Enterprise bankruptcy is an evolving problem with several causes. It can cause severe damage to financial institutions. If predicted accurately and ahead of time, it may enable the company to react and change the course of history. This problem may be framed as a typical classification problem, with several financial ratios considered as attributes. Widespread software tools offer a broad spectrum of Artificial Intelligence algorithms, and the most challenging task may be deciding which algorithm to select. Looking for solutions to support this decision in the relatively large body of literature in this area, with so many options, advantages, and pitfalls, may be as distracting as it is informative. In this work, we present an empirical study with a comprehensive Knowledge Discovery and Data Mining (KDD) workflow. With automated classifier selection, we select an algorithm with better prediction performance than those most widely documented in the literature.
All available PDFs: