Applying an Artificial Neural Network (ANN) in Power BI with the R programming language
Many of us who use powerful data science tools such as R or Python are constantly looking for ways to bring these analytics environments into other platforms.
With that in mind, the CepoBIA team decided to write an article explaining, step by step, how to bring the outputs of an Artificial Neural Network (ANN) into Microsoft Power BI through its R script integration.
But what is an ANN anyway? An artificial neural network models the relationship between a set of input signals and an output signal using a model derived from our understanding of how the brain responds to sensory stimuli. Just as a human brain uses a network of interconnected cells called neurons to process information in a massively parallel way, an ANN uses a network of artificial neurons, or nodes, to solve machine learning problems.
Nowadays, the widespread availability of increasingly powerful computers has encouraged the application of ANNs to demanding problems such as:
- Image and voice recognition programs used in technology services.
- Automation of intelligent devices such as driverless cars and autopilot drones.
- Sophisticated models of weather patterns, disease classifications and many other scientific, social or economic phenomena.
Now that we have briefly defined what an ANN is and where it can be used, let’s go straight to the example.
Estimating the strength of concrete is a challenge of particular interest. Although concrete is used in almost every construction project, its performance varies greatly because many characteristics interact in complex ways, so it is difficult to predict its strength accurately. To take on this challenge, we implement an ANN that predicts the strength of concrete from a list of its input components, a result that could help guide construction practices.
We will use the concrete compressive strength dataset shared on the machine learning platform Kaggle.
According to the website, the dataset contains 1,030 concrete strength measurements with eight inputs describing the components used in the material. The inputs are the amounts, in kilograms per cubic meter, of cement, slag, ash, water, superplasticizer, coarse aggregate and fine aggregate used in the mix, as well as the aging time measured in days. These ingredients are believed to be related to strength.
After loading the dataset into Microsoft Power BI, we have the following structure:
Step 1- Create the R visualization
After clicking the R script visual, we select all the variables under study, as shown below:
Step 2- Formation of a dataframe
The Power BI R script editor automatically builds a data frame (named dataset) from the selected variables, on which the analysis code is then run.
Step 3- Standardization and split into training and test datasets
Before running any machine learning algorithm, it is advisable to rescale the variables so that they are all measured on the same range. In this exercise we rescaled each variable using its minimum and maximum values (min-max normalization, which maps every variable to the range 0 to 1), since a previous descriptive analysis showed that the distribution of almost all the variables was skewed. Subsequently, 75% of the data were assigned to the training dataset and the remaining 25% to the test dataset. The data were already arranged in random order, so there was no need to draw a random sample with the sample function in R.
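The rescaling and the split described above can be sketched in base R as follows. This is a minimal illustration, not the code used for the article: the small data frame stands in for the dataset object that Power BI passes to the script, and its column names and values, as well as the helper name normalize, are assumptions made for the example.

```r
# Stand-in for the data frame that Power BI passes to the R script as `dataset`.
# Column names and values are invented for illustration only.
set.seed(42)
dataset <- data.frame(
  cement   = runif(100, 100, 540),
  water    = runif(100, 120, 250),
  age      = sample(1:365, 100, replace = TRUE),
  strength = runif(100, 2, 82)
)

# Min-max normalization: rescales a numeric vector to the [0, 1] range.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply the rescaling to every column.
concrete_norm <- as.data.frame(lapply(dataset, normalize))

# 75% / 25% split by row position; this relies on the rows
# already being in random order, as stated in the text.
n_train <- floor(0.75 * nrow(concrete_norm))
concrete_train <- concrete_norm[1:n_train, ]
concrete_test  <- concrete_norm[(n_train + 1):nrow(concrete_norm), ]
```

If the rows were not already shuffled, the row positions could first be permuted with sample(nrow(dataset)) before splitting.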
Step 4- Train the ANN
With the neuralnet package already installed in R, we load it with the library command and train the ANN using the following syntax:
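A minimal sketch of such a training call is given here, assuming the neuralnet package is installed. The synthetic training data, the column names and the object name concrete_model are assumptions for illustration; only hidden = 5 is taken from the article. The stepmax value is simply raised to give the optimizer more room in this sketch.

```r
library(neuralnet)

# Synthetic, already-rescaled training data (illustrative only).
set.seed(123)
n <- 200
train <- data.frame(cement = runif(n), water = runif(n), age = runif(n))
train$strength <- with(train,
  0.2 + 0.5 * cement - 0.2 * water + 0.3 * age + rnorm(n, sd = 0.02))

# One hidden layer with 5 nodes. The default activation is the logistic
# (sigmoid) function, and the default algorithm is resilient
# backpropagation ("rprop+").
concrete_model <- neuralnet(
  strength ~ cement + water + age,
  data    = train,
  hidden  = 5,
  stepmax = 1e6
)

# rep = "best" draws the network on the current graphics device.
plot(concrete_model, rep = "best")

# Training error and number of steps, the same figures shown under the plot.
concrete_model$result.matrix[c("error", "steps"), 1]
```

In the real report, the data argument would be the training portion of the rescaled dataset rather than synthetic data.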
After plotting the model, Power BI displays the trained neural network. The training algorithm used is backpropagation, the activation function applied at each neuron is the sigmoid, and the number of hidden nodes used in the model is 5.
At the bottom of the visualization, R reports the number of training steps and the training error, which fortunately turned out to be low. In other words, after a large number of steps the algorithm found the weights that minimize the cost function of the training model under the given analysis conditions.
ANN Evaluation
Now let’s evaluate the performance of the ANN on the test dataset. For this we use the Power BI Query Editor.
Step 1- Obtain the predicted strength
In the Transform tab of the Query Editor, we click the Run R script button and paste all the code shown above, leaving out only the plot command that visualizes the artificial neural network. We then add a call to compute, which stores the predictions for the test data.
Next, we collect the outputs of interest in a data frame called output, which holds the test dataset together with the de-standardized predictions, that is, the predictions converted back to the original units of strength.
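The prediction and de-standardization steps might look like the sketch below, assuming the neuralnet package is installed. The synthetic dataset, the column names, the helper denormalize and the column predicted_strength are all assumptions for illustration; only the use of compute and the data frame name output come from the article.

```r
library(neuralnet)

# Stand-in for the `dataset` data frame supplied by the Query Editor.
set.seed(2024)
n <- 200
dataset <- data.frame(
  cement = runif(n, 100, 540),
  water  = runif(n, 120, 250),
  age    = runif(n, 1, 365)
)
dataset$strength <- with(dataset,
  5 + 0.1 * cement - 0.05 * water + 0.05 * age + rnorm(n, sd = 1))

normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Inverse of normalize(): maps [0, 1] values back to the original units.
denormalize <- function(x_norm, original) {
  x_norm * (max(original) - min(original)) + min(original)
}

concrete_norm <- as.data.frame(lapply(dataset, normalize))
n_train   <- floor(0.75 * n)
test_rows <- (n_train + 1):n

model <- neuralnet(strength ~ cement + water + age,
                   data = concrete_norm[1:n_train, ],
                   hidden = 5, stepmax = 1e6)

# compute() expects only the input columns of the test data.
results <- compute(model, concrete_norm[test_rows, c("cement", "water", "age")])

# Test rows in original units plus the de-standardized predictions.
output <- dataset[test_rows, ]
output$predicted_strength <- denormalize(as.numeric(results$net.result),
                                         dataset$strength)
```

Recent versions of neuralnet also provide a predict method for fitted networks, which could be used in place of compute.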
After clicking OK and then clicking on the output table created in the Query Editor, we get the concrete strength predictions in tabular form, ready for whatever visualizations you want to build to evaluate the performance of the machine learning model.
Step 2- Correlation between predicted and actual strength
Because this is a numeric prediction problem rather than a classification problem, a confusion matrix cannot be used to examine the accuracy of the model. Instead, we measure the degree of linear association through the correlation between the predicted concrete strength and the actual value.
After closing and applying the Query Editor transformations, in Power BI Desktop we create a new R script visual and select the variables of interest, as shown below:
With the help of the “GGally” library, previously installed in R, we compute the correlation between the two variables:
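As a plain illustration of the calculation itself, the sketch below uses base R's cor function rather than GGally, so it runs without extra packages. The data frame and its column names are invented stand-ins for the output table produced in the previous step.

```r
# Stand-in for the `output` table: actual and predicted strength
# (values are synthetic, for illustration only).
set.seed(7)
output <- data.frame(strength = runif(50, 10, 80))
output$predicted_strength <- output$strength + rnorm(50, sd = 5)

# Pearson correlation between predicted and actual strength.
r <- cor(output$predicted_strength, output$strength)
print(round(r, 3))

# With GGally installed, GGally::ggpairs(output) would draw a pairs plot
# that reports this same correlation in its upper panel.
```

Values of r close to 1 indicate that the predictions rise and fall together with the actual measurements.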
The resulting correlation indicates a strong linear relationship between the two variables. This implies that our ANN is doing a fairly good job of predicting concrete strength.
If you like, you can accompany the correlation calculation with a scatter plot of predicted against actual values, as shown below.
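One possible way to draw such a scatter plot in an R script visual is sketched here with base graphics. The data frame is again a synthetic stand-in for the output table, and the column names are assumptions.

```r
# Stand-in for the `output` table (synthetic values, illustration only).
set.seed(7)
output <- data.frame(strength = runif(50, 10, 80))
output$predicted_strength <- output$strength + rnorm(50, sd = 5)

# Actual strength on the x axis, predicted strength on the y axis.
plot(output$strength, output$predicted_strength,
     xlab = "Actual strength",
     ylab = "Predicted strength",
     main = "Predicted vs. actual concrete strength")

# Dashed reference line y = x: points on it are perfect predictions.
abline(a = 0, b = 1, lty = 2)
```

The closer the points fall to the dashed line, the better the predictions match the measured values.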
We have now seen, step by step, how to show the results of a machine learning model in Microsoft Power BI. So cheer up and start building your dashboard, not only with an ANN: try it with a support vector machine, a principal component analysis (PCA), decision trees and other ML models from the world of artificial intelligence.
At CepoBIA we will continue to apply our knowledge of analytics to the new technologies offered by Microsoft.