Automated Machine Learning (AutoML) Applications in Power BI
In recent years, artificial intelligence and machine learning have seen an unprecedented rise in popularity across industries and areas of scientific research. Companies are looking for ways to integrate these new technologies into their operations.
Power BI is one of Microsoft’s most popular tools, and Data Scientists, Analysts and Business Intelligence developers have adopted it for ML-based work, as shown in the chart below:
Most of these features are available in Power BI Desktop, and the rest require a Premium license with the Power BI Service. In this article, we showcase a novel AI feature in Power BI Desktop called AutoML. With a few clicks, AutoML lets users create visualizations based on Machine Learning models for ad-hoc report generation or root cause analysis, supported by the built-in AI integration functionality.
Decomposition Tree
One of the novel ML visualizations that Power BI has introduced is the decomposition tree, a highly interactive visualization that lets you decompose (break down) a metric or response variable by several attributes across different dimensions.
To demonstrate how the decomposition tree works, we use a life insurance dataset that holds the dollar value of insurance for each person along with their gender, age, whether or not they smoke tobacco, and the number of children they have. The analysis problem is to decompose the insurance dollar value metric by the explanatory variables gender, age, smoking and number of children.
Once the dataset is connected to Power BI, a new “Decomposition Tree” visualization is added, which requires two types of input:
– Analyze: the metric you would like to analyze
– Explain by: one or more dimensions you would like to go deeper into
The following result is obtained:
The decomposition tree lets you break down the response variable in whatever order of explanatory variables you choose, simply by clicking on any of the nodes. In addition, a (+) sign shows the highest level of each variable and a (-) sign shows the lowest level. In the example, a person who is male, smokes tobacco, has no children and is 19 years old has the highest insurance value among all the possible results of the analysis, estimated at $245,500. The $17,755,825 figure is the total insurance value for all individuals in the dataset. In this way, the decompositions of interest can be analyzed according to the business idea or the analysis problem at hand.
If the study includes another variable of interest, a second visualization can be created to interact with the decomposition tree. Continuing the example, we add the region of origin of each person, as shown in the following results:
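The drill-down described above amounts to repeated filter-and-sum steps. The following is a minimal Python sketch of that idea, not Power BI's implementation: the records, field names and helper functions (decompose, drill) are hypothetical and are not taken from the insurance dataset.

```python
from collections import defaultdict

# Hypothetical records; the values are illustrative only.
records = [
    {"gender": "male", "smoker": "yes", "children": 0, "value": 300},
    {"gender": "male", "smoker": "no", "children": 2, "value": 120},
    {"gender": "female", "smoker": "yes", "children": 1, "value": 250},
    {"gender": "female", "smoker": "no", "children": 0, "value": 100},
    {"gender": "male", "smoker": "yes", "children": 1, "value": 280},
]


def decompose(rows, dimension, metric="value"):
    """Sum the metric for each distinct value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


def drill(rows, dimension, chosen):
    """Keep only the rows that sit under the chosen node."""
    return [row for row in rows if row[dimension] == chosen]


# Root node: the overall total of the metric.
total = sum(row["value"] for row in records)

# First level: split the total by gender.
by_gender = decompose(records, "gender")

# Second level: click the "male" node, then split it by smoking status.
male_rows = drill(records, "gender", "male")
male_by_smoker = decompose(male_rows, "smoker")

print(total, by_gender, male_by_smoker)
```

At every level the child nodes add up to their parent, which is why the order of the dimensions can be changed freely without affecting the root total.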
Depending on the source region, the different decompositions of the tree change, which makes it an interesting AutoML tool to apply within Power BI.
Key Influencers
Another visualization that stands out is Key influencers. Power BI uses AutoML to run a logistic regression as the base ML model within this visualization, with the main objective of ranking the factors most strongly associated with the metric in terms of probability. The example uses a customer feedback data file based on [Moro et al., 2014]: S. Moro, P. Cortez and P. Rita, “A data-driven approach to predict banking telemarketing success,” Decision Support Systems, Elsevier, 62:22-31, June 2014. As a business problem, the product manager wants to discover which factors lead customers to give a negative rating of their cloud service. The factors that may influence customer ratings are:
– Country-Region
– Customer’s role in the organization
– Type of subscription
– Company size
– Topic
When it is not clear which explanatory variables most influence a metric, it is recommended to run a correlation pre-analysis on your dataset to identify them. As in the decomposition tree, the two types of inputs are as follows:
It generates the following outputs:
First, note that the visualization is set to analyze what influences the rating to be low. The most influential factors are then listed. The top factor is the customer’s role in the organization: a customer in the consumer role is 2.57 times more likely to give a low score than customers in other roles, such as administrator and publisher. The second most influential factor is the topic of the customer review. Customers who commented on product usability were 2.55 times more likely to give a low rating than customers who commented on other topics such as reliability, design, or speed. The other results are interpreted in the same way, according to the context of each variable.
As a supporting result, the Key influencers visualization includes a column chart showing the distribution of the selected factor. Selecting the check box at the bottom limits the chart to the most influential values.
For the customer’s role in the organization, 14.93% of customers in the consumer category give a low score, whereas on average 5.78% of customers in all the other roles do.
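As a rough check on how these two percentages relate to the "times more likely" figure, one can divide the low-rating rate of the selected group by the rate of everyone else. The Python sketch below does this with the percentages quoted above. The function name is hypothetical, and treating the multiplier as a plain ratio of rates is an assumption made for illustration, not a statement of how Power BI computes it.

```python
def likelihood_ratio(rate_in_group, rate_in_rest):
    """Return how many times more likely the outcome is in the group
    than in the rest of the population (a simple ratio of rates)."""
    if rate_in_rest <= 0:
        raise ValueError("rate_in_rest must be positive")
    return rate_in_group / rate_in_rest


# Percentages quoted in the text: consumers vs. all other roles.
ratio = likelihood_ratio(14.93, 5.78)
print(round(ratio, 2))
```

The result is about 2.58, close to the multiplier reported by the visualization; the small gap may come from rounding in the displayed percentages.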
The top segments tab shows the clusters that Power BI identifies within the population for the selected metric value, which in this case is low customer ratings. Segment 1, for example, has 30.8% low customer ratings. The size of each bubble represents how many customers are in that cluster.
Selecting a bubble drills down into the details of that segment. Bubble number one, for example, contains customers who are not publishers (that is, they are consumers or administrators), who have a premier subscription, and whose review topic was security. In this cluster, 30.8% of customers gave a low rating, compared with 11.7% of customers on average, and the segment contains approximately 3.5% of the dataset.
This article set out to introduce some of the AI techniques that Power BI is incorporating. There is much more to explore in order to get the most out of this Microsoft tool. We are on the verge of the fourth industrial revolution, in which information technology companies are called to lead because of their fundamental role in turning data into value for decision making. As this revolution advances, data scientists and AI professionals will keep having new topics to learn about.
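The three figures reported for a segment (its low-rating rate, the overall rate, and its share of the dataset) can be reproduced by filtering on the segment's conditions. The Python sketch below illustrates this with made-up records; the field names, values and the in_segment helper are hypothetical, and it only mirrors how the figures are read, not how Power BI discovers the segments.

```python
# Hypothetical customer records; the values are illustrative only.
customers = [
    {"role": "consumer", "subscription": "premier", "topic": "security", "rating": "low"},
    {"role": "administrator", "subscription": "premier", "topic": "security", "rating": "high"},
    {"role": "publisher", "subscription": "premier", "topic": "security", "rating": "high"},
    {"role": "consumer", "subscription": "basic", "topic": "security", "rating": "high"},
    {"role": "consumer", "subscription": "premier", "topic": "usability", "rating": "low"},
    {"role": "administrator", "subscription": "basic", "topic": "design", "rating": "high"},
    {"role": "publisher", "subscription": "basic", "topic": "speed", "rating": "high"},
    {"role": "consumer", "subscription": "basic", "topic": "reliability", "rating": "high"},
]


def in_segment(customer):
    """Conditions of the example segment: not a publisher,
    premier subscription, review topic is security."""
    return (
        customer["role"] != "publisher"
        and customer["subscription"] == "premier"
        and customer["topic"] == "security"
    )


def low_rate(rows):
    """Fraction of rows with a low rating."""
    return sum(1 for row in rows if row["rating"] == "low") / len(rows)


segment = [c for c in customers if in_segment(c)]

segment_low_rate = low_rate(segment)            # low ratings inside the segment
overall_low_rate = low_rate(customers)          # low ratings across all customers
segment_share = len(segment) / len(customers)   # size of the segment (bubble)

print(segment_low_rate, overall_low_rate, segment_share)
```

A segment is interesting when its low-rating rate is well above the overall rate, while its share of the dataset indicates how many customers the finding applies to.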