KNIME, a window of opportunity in Data Science
Perhaps one of the most common frustrations for people who want to do data science but do not come from a computer science background is the programming barrier. The long learning curve required to build programming skills in languages such as Python or R discourages many from doing data analysis based on machine learning, artificial intelligence, data visualization and even big data processing.
This article aims to brighten that dark picture. The good news is that there are tools that let us do data mining and data science without programming, and today I want to talk about one of them: KNIME. KNIME, short for Konstanz Information Miner, is open source software developed in Germany and written in Java, with a large community that contributes new solutions. Its interface represents processes as flows of connected nodes, like this:
[Image: KNIME nodes]
To start using it, just drag the nodes with the mouse and connect them. Each node has its own configuration, mostly based on drop-down lists. The node repository offers functionality ranging from reading Excel, TXT, JSON, CSV and other files, through data processing, to predictive and classification models. As you can see, we do not need to program to perform complex data transformation and modeling operations. The KNIME interface is visual: at each node in the workflow we can inspect the data table produced at that point, and from any node we can start new branches of the flow to run model validations, sensitivity analyses, visualizations and statistical analyses.
What if I know how to program?
If you know how to program, I have even better news for you: KNIME integrates with tools such as R, Python, PostgreSQL and Java, among others. In other words, we can use them without leaving KNIME and take advantage of all the libraries and functionality they offer. For example, the SQL integration lets us connect to databases and query them, and the R integration lets us make predictions with linear models and build ggplot2 visualizations, among other things. These integrations are part of the extensions that can be downloaded for free from within the software. The extensions include integration nodes for other software, API queries, data processing, spatial analysis, specialized nodes for chemistry and biology, and many more.
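For readers who do program, here is a rough idea of the kind of logic a scripting step could carry, written as a minimal sketch in plain Python. It does not use KNIME's own scripting API. The sample data and the function names (`read_table`, `drop_missing`, `to_numbers`, `fit_line`, `predict`) are hypothetical and chosen only for illustration. Each function plays the role of one node, and chaining them mirrors connecting nodes in a flow: read, clean, convert, then fit a simple linear model.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a file loaded by a reader node.
RAW_CSV = """hours,score
1,52
2,55
3,
4,61
5,64
"""


def read_table(text):
    """Reader step: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows, columns):
    """Cleaning step: keep only rows that have a value in every listed column."""
    return [r for r in rows if all(r[c] not in ("", None) for c in columns)]


def to_numbers(rows, columns):
    """Conversion step: turn the listed columns into floats."""
    return [{c: float(r[c]) for c in columns} for r in rows]


def fit_line(rows, x, y):
    """Model step: ordinary least squares for y = slope * x + intercept."""
    xs = [r[x] for r in rows]
    ys = [r[y] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((v - x_bar) ** 2 for v in xs)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, value):
    """Prediction step: apply the fitted line to a new x value."""
    slope, intercept = model
    return slope * value + intercept


# Chain the steps in the same order the nodes would be connected.
columns = ["hours", "score"]
table = to_numbers(drop_missing(read_table(RAW_CSV), columns), columns)
model = fit_line(table, "hours", "score")
print(len(table), "rows kept; prediction for 3 hours:", predict(model, 3))
```

In KNIME each of these functions would be a configured node rather than code, and the intermediate `table` is what you would inspect by opening a node's output.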
So is KNIME recommended?
In my opinion, it is highly recommended for doing data science analysis quickly and easily. The learning curve is relatively short: once we get used to working with node flows, combining nodes is very easy, and it gives us a schematic view of how to approach a data science problem. We start by defining a data repository, then read the databases, build queries, clean the data, run a descriptive analysis and, finally, apply predictive models to generate solutions in our organizations. And if you still have any doubts, look at the 2018 Gartner diagram comparing the main data mining tools on the market, where KNIME appears among the leaders, ahead of many open source and commercially licensed programs.
Say no more, let's get to work installing KNIME: https://www.knime.com/downloads
Some learning resources, including tutorials and YouTube videos: https://www.knime.com/resources