Data mining is a term most of us have heard and tried to understand without ever getting a clear explanation, right? Read along to get to grips with data mining; you may be a little scared once you know what it involves. By the end of this blog you will know:

· Definition

· How Google and Facebook fetch your data

· What are Cookies

· What are Tracking Cookies and DeepFace

· 4 steps of Data mining

· Data Mining Techniques

· Software and Tools

· Pros and Cons

Data is “raw facts and figures”, while mining is “extraction”, as in gold mining. So we can say that:

Data mining is the process of finding anomalies, patterns, and correlations within large data sets to predict outcomes, using a broad range of techniques.

Let's take a look at an example to understand it better (note: you can try it as well). First, open Facebook and scroll through it without opening anything. Now open something: say you open an e-store and browse through its mobile phones. Don't tap on any phone; just close Facebook and type the brand name of the phone you just saw into Google. You may be surprised to see the very phone you were scrolling past appear.

How does that happen?

Google uses IP addresses and cookies to collect your data. Wondering what an IP address is and what cookies are?

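IP address: The Internet Protocol (IP) is the protocol your device uses to communicate over the internet, and an IP address is the numeric label that identifies your device or network. It usually looks something like 192.168.10.0 (addresses in this particular range are private ones, used only inside a home or office network; websites see the public address assigned by your ISP). The numbers themselves do not store anything about you, but a public IP address can be looked up in geolocation databases to estimate your: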

· City

· Zipcode/area code

· And your ISP (Internet Service Provider) name

It does not reveal your exact location, but it does narrow it down to an approximate area.
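The example address above can be inspected with Python's standard `ipaddress` module. This is only a small sketch: it shows that 192.168.10.0 falls in a private range, and it uses 8.8.8.8 (Google's public DNS resolver, chosen here purely as a well-known public address) for comparison. Looking up a city or ISP needs an external geolocation database, which is not shown.

```python
import ipaddress

# The example address from the text above.
addr = ipaddress.ip_address("192.168.10.0")

# Private addresses only mean something inside a home or office network,
# so on their own they say nothing about a city, ZIP code, or ISP.
print(addr.version)     # 4
print(addr.is_private)  # True

# A public address is what websites actually see.
public = ipaddress.ip_address("8.8.8.8")
print(public.is_private)  # False
```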

Cookies: Cookies are what you need to be most aware of. A cookie is a small file that a website stores in your browser to remember things about you, such as the fact that you are logged in or which pages you visited. You have probably also noticed that right after you log in to a site, a panel pops up saying “Save Password”, somewhat like this:

Strictly speaking, that prompt comes from your browser's built-in password manager, which keeps your username and password on your own device. The cookie itself normally holds only a session identifier, so the site can recognize you on later visits without asking you to log in again. The site still keeps your account details in its own database; it does not hand them over to the cookie to save storage space.
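To see what a cookie actually contains, here is a minimal sketch using Python's standard `http.cookies` module. The cookie name `session_id` and its value are made up for illustration; real sites choose their own names and attributes.

```python
from http.cookies import SimpleCookie

# A hypothetical cookie string a site might send after you log in.
raw = "session_id=abc123; Path=/; HttpOnly; Max-Age=3600"

cookie = SimpleCookie()
cookie.load(raw)

morsel = cookie["session_id"]
print(morsel.value)       # abc123
print(morsel["path"])     # /
print(morsel["max-age"])  # 3600

# HttpOnly is a flag: page scripts are not allowed to read this cookie.
print(bool(morsel["httponly"]))  # True
```

Notice that the cookie carries an identifier and some housekeeping attributes, not your password.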

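Now we come to Facebook's tracking cookies. You may be wondering what the difference between normal cookies and tracking cookies is. A normal (first-party) cookie is read only by the website you are visiting. A tracking cookie is set by a third party, such as an ad network or a social media widget, that is embedded in many different websites, so it can recognize your browser on each of them and build up a history of the pages you visit. Following where your mouse hovers is a separate matter: that is done by scripts running on the page, not by the cookie itself.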
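A common form of tracking cookie is a third-party cookie: the same identifier is sent to the tracker from every site that embeds it. The toy simulation below illustrates the idea only. The function `page_view`, the IDs, and the site names are all invented for this sketch and do not describe how any particular company implements tracking.

```python
from collections import defaultdict

# What a hypothetical tracker records: one browsing history per cookie ID.
tracker_log = defaultdict(list)

def page_view(tracker_cookie_id, site, page):
    """Simulate loading a page that embeds the tracker's script or pixel.

    The browser sends the tracker its cookie with the request, so the
    tracker can link visits that happen on unrelated sites.
    """
    tracker_log[tracker_cookie_id].append((site, page))

# The same browser (same tracker cookie) visits two unrelated sites.
page_view("uid-42", "shop.example", "/mobiles/phone-x")
page_view("uid-42", "news.example", "/sports")
# A different browser carries a different cookie ID.
page_view("uid-77", "shop.example", "/laptops")

print(tracker_log["uid-42"])
# [('shop.example', '/mobiles/phone-x'), ('news.example', '/sports')]
```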

We have always had notifications on Facebook, but recently we have started receiving ones somewhat like this:

One of your friends has uploaded a photo of you. How does Facebook know?

For this purpose, Facebook uses DeepFace (reported accuracy: 97.35% on the Labeled Faces in the Wild benchmark), which can recognize you even if your face is turned slightly to one side or rotated in the picture, like this:

Picture (a) is the original, with the face turned slightly to the right; after it passes through DeepFace, picture (g) shows the same face from the front.

Now we come to the 4 steps of data mining:

1. Data Gathering

2. Data Preparation

3. Mining the data

4. Data Analysis and interpretation

To understand it better, let's look at the picture below:

The Data Source in the image is the first stage, where the data is gathered. The data is then sent through ETL (Extract, Transform, Load); this phase also includes error removal and a lot more besides. Next, the data goes to the data warehouse. Inside the warehouse there are smaller data marts: subject-specific slices that hold the data a particular team uses most, a bit like the cache memory in a computer that keeps recently used data close at hand. Finally, the data is fetched through an OLAP (Online Analytical Processing) server and used for data mining, reporting tools, and analysis tools.
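The four steps listed above can be sketched end to end in a few lines of Python. This is only a toy under stated assumptions: the records are invented, and the "mining" step is a simple count. Real pipelines use databases, ETL tools, and far richer algorithms.

```python
from collections import Counter

# Step 1: data gathering. These raw records are invented for illustration.
raw_records = [
    {"user": "a", "category": "Mobiles "},
    {"user": "b", "category": "mobiles"},
    {"user": "c", "category": None},       # incomplete record
    {"user": "d", "category": "Laptops"},
    {"user": "e", "category": "MOBILES"},
]

def prepare(records):
    """Step 2: data preparation. Drop incomplete rows and normalize text."""
    return [
        {"user": r["user"], "category": r["category"].strip().lower()}
        for r in records
        if r["category"]
    ]

def mine(records):
    """Step 3: mining. Here, just count how often each category occurs."""
    return Counter(r["category"] for r in records)

def interpret(counts):
    """Step 4: analysis and interpretation. Report the top category."""
    category, views = counts.most_common(1)[0]
    total = sum(counts.values())
    return f"Most viewed category: {category} ({views} of {total} views)"

clean = prepare(raw_records)
counts = mine(clean)
print(interpret(counts))
# Most viewed category: mobiles (3 of 4 views)
```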

After the 4 steps of data mining come the different data mining techniques. Don't worry! We won't go into detail here (just the names; a separate blog on them will follow). These are:

· Classification

· Clustering

· Regression

· Neural networks

· Association

· Sequence
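To make one of these techniques a little more concrete, here is a minimal sketch of association analysis, which looks for items that tend to appear together. The shopping baskets are invented for illustration, and the helper names `count`, `support`, and `confidence` are just for this example; dedicated libraries do this at scale.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case"},
]

def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset):
    """Fraction of all baskets that contain `itemset`."""
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets with `antecedent`, the fraction that also have `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"phone", "case"}))       # 0.6
print(confidence({"phone"}, {"case"}))  # 0.75
```

Read the result as "75% of the baskets that contain a phone also contain a case", which is the kind of pattern a store could use for recommendations.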

Now, there is some software through which we do data mining (of course, we need something to mine data with; we can't do it out of thin air). These tools are:

1. Alteryx

2. Amazon Web Services

3. Databricks

4. DataRobot

5. And many others

Last but not least are its pros and cons:

Pros:

1. Business Purpose

2. Better Customer Service

Cons:

1. Security

2. Information Misuse

Business Purpose: Companies use data mining to learn their customers' preferences so that they can act on them and show users relevant products or ads, as Google AdSense does. Open any website in your browser and it will show you ads that are relevant to you and appropriate for your country, like here:

Google is showing an ad for the PSL (Pakistan Super League) because of my location.

Better Customer Service: When companies see a problem their customers are facing, such as difficulty finding a product on a website, they look into it and do what they can to solve it. For example, they added a “Recommended for you” button:

Now for the cons.

Security:

This is a YouTube channel's Analytics page. You can see that it shows when your viewers are on YouTube, other channels your audience watches, and other videos your audience watched. So how does it know? Sometimes we don't want others to know what we watched or which YouTube channels we follow. So the first con is a security risk.

Information Misuse: 2 years ago, the data of 115 million people in Pakistan was reportedly put up for sale on the dark web.

Now that we have gone through the essentials of data mining, let's look at its history.

History: Data mining emerged in the late 1980s and early 1990s as a way to analyze the vast amounts of data that companies all over the world were gathering and producing. The term “data mining” was in use by 1995, when the first international conference on the subject was held in Montreal. The event was sponsored by the AAAI (Association for the Advancement of Artificial Intelligence). A journal called Data Mining and Knowledge Discovery published its first issue in 1997, and the field has kept advancing ever since.

Athar Naveed

Carving a path for junior devs so that they can easily navigate through the hurdles that made my learning experience challenging. Also loves Gardening!