What is Data Mining? How Can You Data Mine Yourself?

Author: Hanzade Durmusoglu

From: Istanbul, Turkey

Introduction


Recently, I’ve been really into Animal Crossing: New Horizons, as one does when one is quarantined and has no intention whatsoever of doing anything productive. After playing for a concerning amount of time, I discovered various servers to keep myself busy and, of course, data miners to follow so I could keep up with the latest discoveries. From their posts alone, I had a rough idea of what data mining was: there were sets of data, and people read through them to get an idea of upcoming updates for the game. But what exactly was data mining? Where was it used? Was it useful? Most importantly, how do people data mine?


Data Mining for Dummies


“Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use,” says Wikipedia. In simpler words, data mining is used to extract hidden, valid and potentially useful patterns from huge sets of raw data. It is often referred to as Knowledge Discovery in Databases (KDD), although strictly speaking data mining is the analysis step of the KDD process.

The Wikipedia article also notes that the term “data mining” is a misnomer: what is extracted is not the data itself but the patterns and knowledge hidden within it.
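To make the "patterns, not data" point concrete, here is a minimal sketch in Python. The shopping baskets, the function name frequent_pairs and the 50% threshold are all made up for illustration; real data miners use far larger data sets and dedicated tools. The idea is only that the output is a pattern (which items tend to appear together), not a copy of the raw records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: each set is one shopping basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least min_support
    (a fraction between 0 and 1) of the transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

# The result maps each frequent pair to the share of baskets containing it.
patterns = frequent_pairs(baskets, min_support=0.5)
print(patterns)
```

With this made-up data, the only pair that clears the threshold is bread and butter, which shows up in three of the five baskets.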


Where is it Used?


Data mining is used in many areas, not just gaming. In healthcare, data and analytics can be used to identify the best medical practices that improve care and reduce cost. Educational Data Mining is used for predicting students’ studying patterns, developing personalized techniques to boost their learning and improving the overall quality of teaching. In manufacturing engineering, data mining acts as a bridge between customers’ interests and products. Fraud detection relies heavily on data mining to turn data sets into trusted information, and criminal investigations become much easier when data mining is used to process information quickly. These are just some examples, in case you are looking for an excuse to start learning.


Data Mining Techniques


Two main processes are commonly used in data mining:

1. The Knowledge Discovery in Databases (KDD):

  • Selection

  • Pre-processing

  • Transformation

  • Data mining

  • Interpretation/Evaluation

2. Cross-Industry Standard Process for Data Mining (CRISP-DM):

  • Business Understanding

  • Data Understanding

  • Data Preparation

  • Modelling

  • Evaluation

  • Deployment

It is worth noting that CRISP-DM is the more frequently used of the two.


Source: Ramesh Dontha

Data Mining Steps explained
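The five KDD steps listed above can be sketched as a chain of small Python functions. Everything here is an assumption made for illustration: the player records, the field names, the 0.5 threshold and the "pattern" being looked for are invented, and a real project would use proper tools at each step. The sketch only shows how each step hands its result to the next.

```python
from statistics import mean

# Hypothetical raw records: (player, hours_played, items_bought).
# None marks a missing value.
raw = [
    ("ada", 2.0, 1),
    ("ben", 40.0, 12),
    ("cem", None, 3),
    ("dee", 35.0, 9),
    ("eli", 3.0, 0),
]

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [(hours, items) for _, hours, items in records]

def preprocess(rows):
    """Pre-processing: drop rows with missing values."""
    return [row for row in rows if None not in row]

def transform(rows):
    """Transformation: rescale hours so the largest value becomes 1."""
    top = max(hours for hours, _ in rows)
    return [(hours / top, items) for hours, items in rows]

def mine(rows, threshold=0.5):
    """Data mining: compare average purchases of heavy and light players."""
    heavy = [items for hours, items in rows if hours >= threshold]
    light = [items for hours, items in rows if hours < threshold]
    return {"heavy": mean(heavy), "light": mean(light)}

def interpret(pattern):
    """Interpretation/Evaluation: turn the pattern into a readable statement."""
    if pattern["heavy"] > pattern["light"]:
        return "Heavy players buy more items on average."
    return "No clear link between play time and purchases."

pattern = mine(transform(preprocess(select(raw))))
print(pattern, interpret(pattern))
```

Keeping each step as its own function mirrors the list: if the evaluation at the end looks suspicious, you can go back and change a single earlier step without touching the rest.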


Data Mining vs. Data Analysis: What is the Difference?


When analyzing data, the amount of data does not matter: any amount can be analyzed to test models and hypotheses. In contrast, successful data mining usually needs a large database in order to discover hidden or previously unknown patterns. Data mining can be thought of as a data analysis technique that focuses on knowledge discovery for predictive purposes, whereas data analysis in general leans more toward the descriptive side. Although the two concepts sound incredibly similar and are deeply connected, their slightly different dynamics mean that any data-related business needs both.
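The descriptive versus predictive distinction can be illustrated with a toy Python example. The study hours and exam scores are hypothetical, and a straight-line fit on five points is far too small to count as real data mining; it is used here only because it is the simplest way to show a prediction. The first part describes the data that exists, the second part uses a pattern in it to estimate a value that was never observed.

```python
from statistics import mean

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Descriptive: summarize what the data already says.
average_score = mean(scores)

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

# Predictive: use the discovered trend to estimate an unseen case,
# here a student who studies for 6 hours.
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept

print(f"average score: {average_score}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

The average only tells you about these five students, while the fitted line makes a guess about a sixth one, which is also why predictions need to be checked before anyone relies on them.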


Pros and Cons of Data Mining


If you are wondering about the benefits of learning how to data mine and what you can do with the skill, I have gathered its positive and negative sides and leave the judgement to you. The first four points are pros; the last three are cons:

  • Data mining is cost-effective and efficient compared to other data applications.

  • It facilitates the automated prediction of trends (one of the major reasons we use data mining; it is all about predictions) as well as the automated discovery of hidden patterns (another crucial reason data mining is needed).

  • Since it is based solely on data, of any kind, it can easily be implemented in both newer and older systems, which makes it highly flexible.

  • It is a speedy process that lets users work through huge amounts of data quickly.

  • Like any service connected to data sharing and the internet, it may pose security and privacy problems, such as companies selling data-mined information to others for profit. (Looking at you here, Facebook.)

  • Even though it is flexible as a concept, different algorithms almost always call for different data mining tools, so it takes effort and experience to benefit from the process.

  • Not all data-mined information is accurate, and acting on inaccurate information can lead to serious consequences. Since data mining is based on predictions and newly found patterns, the information may sometimes be unclear or wrong. That may not cause much harm in industries like gaming, but when data mining is used as a strategic tool by businesses it is important not to rely completely on the information collected.


Source: Wisdomplexus

Data Mining Pros and Cons


Where to Learn Data Mining, How to Start


Worry not, Coursera has come to our rescue. The free Data Mining Specialization on Coursera, offered by the University of Illinois, may be a great start for you. Besides, there are tons of data miners and data analysts on the Internet whom you can always contact if you have questions or need help. The websites listed in the references below not only helped me understand the concept of data mining better but probably explain the topic much better than I do.


For those of you who were interested enough to read until the end, I hope your questions were answered and you are looking forward to learning how to data mine as much as I am. It is truly an interesting and vital field that has piqued the interest of many. Happy data mining!



 

Author: Hanzade Durmusoglu


Hanzade is a rising sophomore from Turkey who is mainly interested in computer science, politics and history. She has some knowledge of JavaScript, Java, HTML, CSS and Swift and is looking forward to learning more.

 

References:


1. (2020, April 9). Gaming: What is data mining, and is it reliable for updates? BBC.

https://www.bbc.co.uk/newsround/52456575


2. (last edited in 2020, August 25). Data mining.

https://en.wikipedia.org/wiki/Data_mining#Process


3. (last edited in 2020, June 14). Examples of data mining.

https://en.wikipedia.org/wiki/Examples_of_data_mining


4. (last edited in 2020, August 14). Data analysis.

https://en.wikipedia.org/wiki/Data_analysis


5. Definition of ‘Data Mining’.

https://economictimes.indiatimes.com/definition/data-mining


6. Data Mining Tutorial: Process, Techniques, Tools, Examples.

https://www.guru99.com/data-mining-tutorial.html


7. Rajkumar, P. (2014, August 14). Top 14 useful applications for data mining.

https://bigdata-madesimple.com/14-useful-applications-of-data-mining/


