What is the difference between Data Mining and Data Analysis?
Data Mining
Used to recognize patterns in stored data.
Mining is performed on clean and well-documented data.
Results extracted from data mining are not easy to interpret.
Data Analysis
Used to order and organize raw data in a meaningful manner.
The analysis of data involves Data Cleaning, because the data is usually not available in a well-documented format.
Results extracted from data analysis are easy to interpret.
To summarize, Data Mining is used to identify patterns in stored data. It is
mostly used for Machine Learning, where analysts recognize the patterns with
the help of algorithms. Data Analysis, by contrast, is used to gather insights
from raw data, which has to be cleaned and organized before the analysis is
performed.
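
As a rough illustration of the pattern-finding side described above, the sketch below counts which items appear together in a handful of shopping baskets. The baskets, the item names and the threshold of 3 are all made up for this example, and real data mining would normally rely on a dedicated algorithm rather than a plain count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already clean transaction data: one set of items per basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of items occurs in the same basket.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# Keep the pairs that occur in at least 3 baskets (threshold is an assumption).
frequent_pairs = [pair for pair, count in pair_counts.items() if count >= 3]
print(frequent_pairs)
```

The pair that survives the threshold is the "pattern"; deciding what it means for the business is left to the analyst.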
What is the process of Data Analysis?
Data analysis is the process of collecting, cleansing, interpreting, transforming and
modeling data to gather insights and generate reports that help the business profit.
Collect Data: The data is collected from various sources and stored so
that it can be cleaned and prepared. In this step, missing values and
outliers are removed.
Analyse Data: Once the data is ready, the next step is to analyze it. A
model is run repeatedly for improvements. Then, the model is validated to
check whether it meets the business requirements.
Create Reports: Finally, the model is implemented and the reports it
generates are passed on to the stakeholders.
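
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sales figures are invented, outliers are detected with the common 1.5 * IQR rule (the text does not say which rule to use), and a simple summary stands in for the model.

```python
from statistics import mean, quantiles

# Collect: hypothetical raw figures; None marks a missing value.
raw_sales = [120, 135, None, 128, 131, 900, 125, None, 138, 122]

def clean(values):
    """Drop missing values, then drop outliers outside the 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if low <= v <= high]

def analyse(values):
    """Stand-in for the modeling step: a plain descriptive summary."""
    return {"count": len(values), "mean": round(mean(values), 2),
            "min": min(values), "max": max(values)}

def report(summary):
    """Create the report as a single line of text for stakeholders."""
    return ", ".join(f"{key}={value}" for key, value in summary.items())

cleaned = clean(raw_sales)
summary = analyse(cleaned)
print(report(summary))
```

Keeping each step in its own function makes it easy to rerun the analysis after the cleaning rules change, which mirrors the repeated runs described above.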
What is the difference between Data Mining and Data Profiling?
Data Mining: Data Mining refers to the analysis of data with the aim of finding
relations that have not been discovered earlier. It mainly focuses on the detection
of unusual records, the discovery of dependencies, and cluster analysis.
Data Profiling: Data Profiling refers to the process of analyzing individual
attributes of data. It mainly focuses on providing valuable information about data
attributes, such as data type, frequency, etc.
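
To make the profiling side concrete, here is a small sketch that summarizes a single attribute by its data type and value frequencies. The records, the column names and the `profile` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"city": "Pune", "age": 31},
    {"city": "Delhi", "age": 27},
    {"city": "Pune", "age": None},
    {"city": "Mumbai", "age": 31},
]

def profile(rows, column):
    """Summarize one attribute: observed types, missing count, frequencies."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "types": sorted({type(v).__name__ for v in present}),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }

city_profile = profile(rows, "city")
age_profile = profile(rows, "age")
print(city_profile)
print(age_profile)
```

Note that the profile describes each attribute on its own; it does not look for relations between attributes, which is what data mining is concerned with.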
What is data cleansing and what are the best ways to practice data cleansing?
Data Cleansing, Data Wrangling and Data Cleaning are often used to mean the same
thing: the process of identifying and removing errors to enhance the quality of data.
Refer to the image below for the various ways to deal with missing data.
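
Two widely used ways of dealing with missing data are dropping the incomplete entries and filling them with a statistic of the observed values. The sketch below shows both. The sample column is invented, and these two strategies are an assumption on our part: they are not necessarily the ones shown in the image.

```python
from statistics import mean, median

# Hypothetical column with gaps; None marks a missing value.
ages = [25, None, 31, 40, None, 28]

def drop_missing(values):
    """Strategy 1: discard the missing entries."""
    return [v for v in values if v is not None]

def impute(values, strategy=mean):
    """Strategy 2: replace missing entries with a statistic of the rest."""
    fill = strategy(drop_missing(values))
    return [fill if v is None else v for v in values]

dropped = drop_missing(ages)
mean_filled = impute(ages)
median_filled = impute(ages, strategy=median)
print(dropped, mean_filled, median_filled)
```

Dropping keeps only observed values but shrinks the data set, while imputing keeps every row at the cost of introducing estimated values.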
What are the important steps in the data validation process?
Data Validation is the process of checking data for inaccurate values before it
is used. It mainly involves two steps: Data Screening and Data
Verification.
Data Screening: Different kinds of algorithms are used in this step to screen
the entire data to find out any inaccurate values.
Data Verification: Each suspect value is evaluated against various
use cases, and then a final decision is made on whether the value should be
included in the data or not.
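
The two steps can be sketched as follows. Everything here is hypothetical: the order records, the accepted quantity range of 1 to 100 used for screening, and the single use case checked during verification (large orders are legitimate for wholesale customers).

```python
# Hypothetical order records.
orders = [
    {"id": 1, "customer": "retail", "quantity": 3},
    {"id": 2, "customer": "wholesale", "quantity": 500},
    {"id": 3, "customer": "retail", "quantity": -2},
    {"id": 4, "customer": "retail", "quantity": 700},
]

def screen(rows, low=1, high=100):
    """Screening: flag every row whose quantity is outside the expected range."""
    return [row for row in rows if not low <= row["quantity"] <= high]

def verify(row):
    """Verification: return True if a suspect row should stay in the data.

    Assumed use case: a large, positive quantity is plausible for wholesale.
    """
    return row["customer"] == "wholesale" and row["quantity"] > 0

suspects = screen(orders)
rejected_ids = {row["id"] for row in suspects if not verify(row)}
validated = [row for row in orders if row["id"] not in rejected_ids]
print([row["id"] for row in validated])
```

Screening is deliberately broad and only marks candidates; the include-or-exclude decision is made per value in the verification step.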