Understanding the Basics of Machine Learning: Essential Concepts Explained
This post collects the key concepts of machine learning; you could call it a keyword dictionary for the field. We list all the keywords first, then break down each one. Here are the key concepts.
Artificial Intelligence (AI)
Machine Learning
Algorithm
Data
Model
Model Fitting
Training Data
Test Data
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Feature (Input, Independent Variable, Predictor)
Feature Engineering
Feature Scaling (Normalization, Standardization)
Dimensionality
Target (Output, Label, Dependent Variable)
Instance (Example, Observation, Sample)
Label (Class, Target Value)
Model Complexity
Bias & Variance
Noise
Overfitting & Underfitting
Validation & Cross-Validation
Regularization
Batch, Epoch, Iteration
Parameter
Hyperparameter
Cost Function (Loss Function, Objective Function)
Gradient Descent
Learning Rate
Evaluation
Artificial Intelligence (AI)
AI is the field of computer science focused on creating systems or machines capable of performing tasks that typically require human intelligence. These tasks include problem-solving, learning, decision-making, natural language understanding, and pattern recognition.
AI is commonly divided into three types:
Narrow AI (Weak AI)
General AI (Strong AI)
Super AI
Machine Learning
Machine Learning is a subset of Artificial Intelligence that focuses on algorithms and models that learn patterns from data and improve their performance without being explicitly programmed. It’s broadly categorized into:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Algorithms
A machine learning (ML) algorithm is a procedure, expressed as a set of instructions or mathematical rules, that lets a computer learn patterns from data, make predictions, and improve as it sees more data.
Examples of Supervised Learning Algorithms:
Linear Regression
Logistic Regression
Ridge, Lasso, and Elastic Net Algorithms
Support Vector Machines
Naïve Bayes
K-Nearest Neighbors (KNN) Algorithm
Decision Tree
Random Forest
AdaBoost
Gradient Boosting
XGBoost
Examples of Unsupervised Learning Algorithms:
PCA (Principal Component Analysis)
K-Means Clustering
Hierarchical Clustering
DBSCAN Algorithm
Data
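Data is the raw material a model learns from: a collection of recorded observations. ML works with both structured data (e.g., tables with rows and columns) and unstructured data (e.g., text, images, audio).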
Model
An ML model is the output of a training process, representing learned patterns in the data.
Model Fitting
Model fitting in machine learning refers to the process of training a machine learning algorithm on a dataset to learn the underlying patterns or relationships between the input features (independent variables) and the target variable (dependent variable).
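As a minimal sketch of what fitting means in practice, the snippet below fits a straight line to a handful of points with ordinary least squares. The data (hours studied vs. exam score) and the function name fit_line are made up for illustration; real projects would normally use a library instead.

```python
# Hypothetical toy data: hours studied (feature) vs. exam score (target).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 54.0, 56.0, 58.0, 60.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The learned parameters are the "model" produced by fitting.
slope, intercept = fit_line(xs, ys)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

Here the slope and intercept are the patterns the algorithm extracts from the data; predicting a new score is then just evaluating the line.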
Training Data
The subset of data used to train the ML model. It includes inputs (features) and their corresponding outputs (labels).
Test Data
A separate dataset used to evaluate the model's performance on unseen data.
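One common way to obtain training and test data is to shuffle a dataset and hold part of it out. The sketch below assumes a hypothetical helper named train_test_split and an arbitrary 80/20 split; both the name and the fraction are illustrative choices, not fixed rules.

```python
import random

def train_test_split(instances, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = list(instances)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of (feature, label) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = train_test_split(data)
print(len(train), "training instances,", len(test), "test instances")
```

The important property is that no instance appears in both sets, so the test score reflects data the model has never seen.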
Supervised Learning
A type of ML where models are trained on labeled data, i.e., data with known input-output pairs.
Unsupervised Learning
A type of ML where models learn patterns or structures in data without labeled outputs.
Reinforcement Learning
A learning method where an agent learns to make decisions by interacting with an environment to maximize cumulative rewards.
Feature (Input, Independent Variable, Predictor)
A measurable property or characteristic of the data used as input for a model.
Feature Engineering
The process of transforming raw data into meaningful features to improve model performance.
Feature Scaling (Normalization, Standardization)
Adjusting the range of feature values to bring them to a similar scale.
Normalization (min-max scaling): Rescales each feature to the range [0, 1].
Standardization: Rescales each feature to have zero mean and unit variance.
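Both scaling methods are short formulas, so a small sketch may help. The function names and the ages list are illustrative, and the code assumes the feature is not constant (otherwise the divisions would fail).

```python
import math

def min_max_normalize(values):
    """Rescale values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit variance: (v - mean) / std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Hypothetical feature column.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
print(normalized)
print([round(z, 3) for z in standardized])
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data only and then reused to scale the test data.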
Dimensionality
The number of features (variables) in a dataset.
Target (Output, Label, Dependent Variable)
The variable a model is trained to predict in supervised learning.
Instance (Example, Observation, Sample)
A single data point in a dataset.
Label (Class, Target Value)
The ground-truth (actual) output value associated with an instance in supervised learning.
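To see how features, dimensionality, instances, and labels fit together, here is a tiny made-up housing dataset laid out as plain Python lists. All names and numbers are hypothetical and serve only to show the structure.

```python
# Each feature is one measurable property of a house.
feature_names = ["area_sqm", "bedrooms", "age_years"]

# Each row is one instance (example, observation, sample).
instances = [
    [70.0, 2, 15],
    [120.0, 4, 3],
    [45.0, 1, 30],
]

# One label (target value) per instance: here, the sale price.
labels = [210_000, 450_000, 130_000]

dimensionality = len(feature_names)  # number of features
n_instances = len(instances)         # number of data points
print(f"{n_instances} instances, {dimensionality} features each")
```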
Model Complexity
The capacity of a model to capture patterns. High complexity can lead to overfitting, while low complexity may lead to underfitting.
Bias & Variance
Bias: Error due to overly simplistic models (underfitting).
Variance: Error due to model sensitivity to small fluctuations in training data (overfitting).
Noise
Irrelevant or random variations in data that don’t represent true patterns.
Overfitting & Underfitting
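Overfitting: When a model learns patterns specific to the training data, including its noise, and therefore performs poorly on new data.
Underfitting: When a model is too simple to learn the underlying patterns, so it performs poorly on both the training data and new data.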
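The difference shows up when training error and test error are compared. The sketch below uses made-up data whose true pattern is y = 2x, with noise added to the training labels, and three deliberately simple stand-in models: a constant predictor (underfits), a memorizing nearest-neighbor lookup (overfits), and a least-squares line. These models are illustrative choices, not the only way to demonstrate the effect.

```python
# Hypothetical toy data: the true pattern is y = 2x, and the training
# labels carry alternating noise of +1 / -1.
train = [(0.0, 1.0), (1.0, 1.0), (2.0, 5.0), (3.0, 5.0), (4.0, 9.0), (5.0, 9.0)]
# Held-out points that follow the true pattern (noise-free for simplicity).
test = [(0.4, 0.8), (1.6, 3.2), (2.4, 4.8), (3.6, 7.2), (4.4, 8.8)]

def mse(model, data):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: ignore x and always predict the mean training label.
mean_y = sum(y for _, y in train) / len(train)
def constant_model(x):
    return mean_y

# Overfitting: memorize the training set (1-nearest-neighbor lookup).
def memorizer(x):
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y

# Reasonable fit: a least-squares straight line.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x
def line_model(x):
    return slope * x + intercept

for name, model in [("constant", constant_model),
                    ("memorizer", memorizer),
                    ("line", line_model)]:
    print(f"{name:10s} train MSE={mse(model, train):.3f}  "
          f"test MSE={mse(model, test):.3f}")
```

On this data the memorizer reaches zero training error yet does worse than the line on the test points, while the constant model has a high error on both sets.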
Validation & Cross-Validation
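Validation: Process of assessing model performance on a held-out validation set, typically to tune hyperparameters or compare models.
Cross-Validation: Divides the dataset into folds, then trains and validates the model multiple times, holding out a different fold each time, for a more robust evaluation.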
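The fold mechanics of k-fold cross-validation can be sketched with plain index lists. The helper name k_fold_indices is hypothetical, and the code assumes the data has already been shuffled.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_instances))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_instances=6, k=3))
for train_idx, val_idx in folds:
    print("train on", train_idx, "validate on", val_idx)
```

Each instance lands in the validation fold exactly once; the final score is usually the average of the per-fold scores.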
Regularization
Techniques (e.g., L1 and L2 penalties) that constrain model complexity, typically by penalizing large parameter values, to reduce overfitting.
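As a rough sketch, L1 and L2 regularization add a penalty term to the loss the model is trained on. The function names, the strength lam, and the example weights below are illustrative assumptions.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total loss = loss on the data + penalty on the weights."""
    if kind == "l2":
        return data_loss + l2_penalty(weights, lam)
    return data_loss + l1_penalty(weights, lam)

# Hypothetical weights and data loss.
weights = [3.0, -4.0]
l2_loss = regularized_loss(1.0, weights, lam=0.1, kind="l2")  # 1.0 + 0.1 * 25
l1_loss = regularized_loss(1.0, weights, lam=0.1, kind="l1")  # 1.0 + 0.1 * 7
print(l2_loss, l1_loss)
```

A larger lam pushes the optimizer toward smaller weights, trading some fit on the training data for a simpler model.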
Batch, Epoch, Iteration
Batch: Subset of the training data used to compute one parameter update.
Epoch: One complete cycle through the entire training dataset.
Iteration: A single update of model parameters, computed from one batch.
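The relationship between the three terms is easiest to see by counting. In this sketch the dataset size, batch size, and number of epochs are arbitrary example values, and the actual parameter update is left as a comment.

```python
import math

def iterate_minibatches(data, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

data = list(range(10))  # 10 training instances (placeholder values)
batch_size = 4
n_epochs = 3

iterations = 0
for epoch in range(n_epochs):
    for batch in iterate_minibatches(data, batch_size):
        # One parameter update per batch would happen here.
        iterations += 1

iterations_per_epoch = math.ceil(len(data) / batch_size)
print(iterations_per_epoch, "iterations per epoch,", iterations, "in total")
```

With 10 instances and a batch size of 4, each epoch takes 3 iterations (batches of 4, 4, and 2), so 3 epochs take 9 iterations.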
Parameter
Model-specific values learned during training (e.g., weights in a neural network).
Hyperparameter
Values set before training that control the learning process (e.g., learning rate, number of layers).
Cost Function (Loss Function, Objective Function)
A function that measures how far a model's predictions are from the actual outputs; training aims to minimize it. Examples:
- Mean Squared Error (MSE) for regression, Cross-Entropy Loss for classification.
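Both example losses can be written in a few lines. The sketch below shows MSE and the binary form of cross-entropy; the function names, the clipping constant eps, and the sample values are illustrative assumptions.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])      # (1 + 4) / 2 = 2.5
bce_good = binary_cross_entropy([1, 0], [0.9, 0.1])   # confident and right
bce_bad = binary_cross_entropy([1, 0], [0.1, 0.9])    # confident and wrong
print(mse, round(bce_good, 4), round(bce_bad, 4))
```

Cross-entropy punishes confident wrong predictions much more heavily than confident correct ones are rewarded, which is why bce_bad is far larger than bce_good.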
Gradient Descent
An optimization algorithm that minimizes the cost function by repeatedly updating model parameters in the direction of steepest descent (the negative gradient).
Learning Rate
Controls the step size in updating model parameters during gradient descent.
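Gradient descent and the effect of the learning rate can be sketched on a one-parameter toy cost, J(w) = (w - 3)^2, whose minimum is at w = 3. The cost, the starting point, and the three learning rates are assumptions picked to make the behavior visible.

```python
def gradient_descent(grad, w0, learning_rate, n_steps):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w

def grad(w):
    """Gradient of the toy cost J(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_good = gradient_descent(grad, w0=0.0, learning_rate=0.1, n_steps=100)
w_slow = gradient_descent(grad, w0=0.0, learning_rate=0.001, n_steps=100)
w_diverged = gradient_descent(grad, w0=0.0, learning_rate=1.1, n_steps=100)
print(w_good, w_slow, w_diverged)
```

With a moderate learning rate the parameter converges to 3; with a very small one it is still far away after 100 steps; with one that is too large each step overshoots and the value diverges.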
Evaluation
The process of assessing a trained model’s performance using metrics like:
Accuracy, Precision, Recall, F1-Score (for classification).
RMSE, MAE (for regression).
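The metrics above can be computed by hand for small examples. The sketch below assumes a binary classification problem with 1 as the positive class; the function names and sample predictions are made up for illustration.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

# Hypothetical predictions: 2 true positives, 1 false positive, 1 false negative.
clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                             [1, 1, 0, 1, 0, 0, 0, 0])
reg = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(clf)
print(reg)
```

Accuracy alone can look good on imbalanced data, which is why precision, recall, and F1 are usually reported alongside it.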