ML Interview Questions – Introduction

1. [Asked in Google] What is Machine Learning? How is it different from traditional programming?

Answer:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that allows computers to learn from data and make decisions without being explicitly programmed.

| Traditional Programming | Machine Learning |
|---|---|
| Developer writes explicit rules | Model learns patterns from data |
| Used for well-defined tasks | Used for tasks with uncertain patterns |
| Example: If-Else conditions in a fraud detection system | Example: A model learning fraud patterns from past transactions |
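
The contrast can be made concrete with a small sketch. The transaction amounts, labels, and the hand-picked cut-off of 1000 below are invented for illustration, and the "learning" step is a deliberately simple threshold search rather than a realistic fraud model.

```python
# Hypothetical toy data: (transaction amount, is_fraud) pairs.
transactions = [(20, 0), (45, 0), (80, 0), (150, 0), (700, 1), (900, 1), (1200, 1)]

# Traditional programming: a developer hard-codes the rule.
def rule_based_is_fraud(amount):
    return amount > 1000  # threshold chosen by hand

# Machine learning (minimal form): pick the threshold that best fits the labeled data.
def learn_threshold(data):
    candidates = sorted(amount for amount, _ in data)

    def accuracy(threshold):
        return sum((amount > threshold) == bool(label) for amount, label in data) / len(data)

    return max(candidates, key=accuracy)

learned_threshold = learn_threshold(transactions)

def learned_is_fraud(amount):
    return amount > learned_threshold

rule_accuracy = sum(rule_based_is_fraud(a) == bool(y) for a, y in transactions) / len(transactions)
learned_accuracy = sum(learned_is_fraud(a) == bool(y) for a, y in transactions) / len(transactions)
print("Hand-written rule accuracy:", rule_accuracy)
print("Learned rule accuracy:", learned_accuracy)
```

The hand-written rule misses the two smaller fraud cases, while the threshold derived from the data separates all seven examples. A real system would evaluate on held-out data instead of the training examples.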

2. [Asked in Amazon] What are the different types of Machine Learning?

Answer:
Machine Learning is classified into three types:

  1. Supervised Learning
    • Uses labeled data to train models.
    • Examples: Classification (Spam Detection), Regression (Stock Price Prediction).
  2. Unsupervised Learning
    • Uses unlabeled data to find hidden patterns.
    • Examples: Clustering (Customer Segmentation), Dimensionality Reduction (PCA).
  3. Reinforcement Learning
    • Model learns through rewards and penalties by interacting with an environment.
    • Examples: Self-driving cars, Game-playing AI (AlphaGo).
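
The difference between the first two types can be sketched with scikit-learn. The data is synthetic (two well-separated blobs), and the variable names are illustrative. Reinforcement learning is left out because it needs an environment to interact with.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 2-D data with two well-separated groups (illustrative only).
X, y = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0)

# Supervised: the labels y are given to the model during training.
clf = LogisticRegression()
clf.fit(X, y)
supervised_accuracy = clf.score(X, y)

# Unsupervised: only X is given; the algorithm must discover the groups itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
cluster_sizes = sorted(int((cluster_ids == k).sum()) for k in range(2))

print("Supervised training accuracy:", supervised_accuracy)
print("Cluster sizes found without labels:", cluster_sizes)
```

Note that K-Means returns arbitrary cluster IDs, so cluster 0 need not correspond to label 0.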

3. [Asked in Microsoft] What is the difference between Classification and Regression?

Answer:

| Feature | Classification | Regression |
|---|---|---|
| Output Type | Discrete (Categories) | Continuous (Numerical values) |
| Example | Spam Detection (Spam/Not Spam) | Predicting House Prices ($200K, $250K) |
| Algorithms | Decision Trees, SVM, Random Forest | Linear Regression, Ridge Regression |

Example in Python:

from sklearn.linear_model import LinearRegression, LogisticRegression

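# Regression model: y_train must hold continuous targets (e.g., prices)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Classification model: y_train must hold discrete class labels (e.g., 0/1).
# Despite its name, LogisticRegression is a classifier.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)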

4. [Asked in Netflix] What are the key steps in a Machine Learning pipeline?

Answer:
A standard ML pipeline consists of:

  1. Data Collection – Gather relevant data.
  2. Data Preprocessing – Handle missing values, remove outliers, normalize data.
  3. Feature Engineering – Create, transform, and select informative features.
  4. Model Selection – Choose an appropriate algorithm (Decision Tree, SVM, etc.).
  5. Training the Model – Train on a dataset.
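  6. Evaluation – Measure performance using metrics such as RMSE (regression) or Precision and Recall (classification).
  7. Hyperparameter Tuning – Optimize settings that are fixed before training (e.g., tree depth, learning rate).
  8. Deployment & Monitoring – Deploy the model, monitor it for performance drift, and retrain as needed.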
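
A minimal sketch of steps 1 to 6 with scikit-learn might look like the following. The dataset is synthetic, and the choice of `StandardScaler` plus `LogisticRegression` is an assumption made for brevity. Tuning and deployment are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: data collection (synthetic stand-in for a real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a test set before any preprocessing to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 2-5: preprocessing and model bundled so they are fitted together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

# Step 6: evaluation on unseen data.
y_pred = pipeline.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", test_accuracy)
```

Wrapping the scaler and the model in one `Pipeline` means the scaler's statistics are computed from the training split only.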

5. [Asked in Tesla] What are Model Overfitting and Underfitting? How do you prevent them?

Answer:

  • Overfitting: The model memorizes training data but fails on new data.
  • Underfitting: The model is too simple and fails to capture patterns.

Prevention Techniques:
For Overfitting:

  • Use Cross-Validation (K-Fold CV) to detect overfitting and guide model selection.
  • Apply Regularization (L1, L2).
  • Use Dropout (in Deep Learning).

For Underfitting:

  • Use a more complex model.
  • Add more informative features or train for longer.
  • Reduce regularization.

Example using Regularization:

from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)  # L2 Regularization
model.fit(X_train, y_train)
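
To see what overfitting looks like in numbers, the sketch below compares an unconstrained decision tree with a depth-limited one. The data is synthetic, and `flip_y=0.1` randomly reassigns the label of about 10% of samples, so memorizing the training set cannot generalize perfectly. The depth of 3 is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (illustrative only).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (None, 3):  # None lets the tree grow until it fits the training set
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}, "
          f"gap={train_acc - test_acc:.2f}")
```

A large gap between training and test accuracy is the usual symptom of overfitting. Low accuracy on both is the usual symptom of underfitting.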

6. [Asked in Facebook] What is Feature Engineering? Why is it important?

Answer:
Feature Engineering involves transforming raw data into a better format for a Machine Learning model.

Importance:

  • Improves model performance.
  • Reduces training time.
  • Helps simplify complex relationships in data.

Techniques:

  1. Handling Missing Data – Replace missing values using mean/median/mode.
  2. Feature Scaling – Put features on a comparable scale using Standardization (StandardScaler) or Min-Max normalization.
  3. Encoding Categorical Data – Use One-Hot Encoding or Label Encoding.

Example of Feature Scaling:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# In practice, fit the scaler on the training split only and reuse it on the test split.
X_scaled = scaler.fit_transform(X)
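
The other two techniques, imputing missing values and encoding categories, can be sketched with pandas. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Tokyo", "Paris", "Lima"],
})

# Handling missing data: fill the gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Encoding categorical data: one binary column per city.
features = pd.get_dummies(df, columns=["city"])

print(features)
```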

7. [Asked in LinkedIn] What are the commonly used Machine Learning algorithms?

Answer:

Supervised Learning:

  • Linear Regression – Predicts continuous values.
  • Logistic Regression – Binary classification.
  • Decision Tree – Tree-based classification and regression.
  • Random Forest – Ensemble of Decision Trees.
  • Support Vector Machine (SVM) – Works well with high-dimensional data.

Unsupervised Learning:

  • K-Means Clustering – Groups similar data points.
  • Principal Component Analysis (PCA) – Reduces dimensions of data.

Reinforcement Learning:

  • Q-Learning – Used in robotics and game AI.


8. [Asked in IBM] What is Cross-Validation, and why is it important?

Answer:
Cross-Validation assesses model performance by splitting the dataset into multiple parts for training and validation.

Types of Cross-Validation:

  • K-Fold Cross-Validation: Splits data into K subsets (folds); trains on K-1 folds and validates on the remaining one, rotating until every fold has served as the validation set once.
  • Leave-One-Out Cross-Validation (LOOCV): Uses one sample for validation and the rest for training.

Example using K-Fold CV:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
scores = cross_val_score(model, X, y, cv=5)

print("Mean Accuracy:", scores.mean())
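
To make the fold rotation visible, the sketch below runs `KFold` on ten toy samples and counts how often each one lands in the validation set. The data and the choice of five shuffled folds are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples, indexed 0..9

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
validation_counts = np.zeros(len(X), dtype=int)

for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    validation_counts[val_idx] += 1
    print(f"Fold {fold}: train on {len(train_idx)} samples, validate on {val_idx.tolist()}")

# Every sample is held out exactly once across the five folds.
print("Times each sample was used for validation:", validation_counts.tolist())
```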

9. [Asked in Adobe] What are Precision, Recall, and F1-score? When should you use them?

Answer:
These metrics evaluate classification models when accuracy is not enough.

| Metric | Formula | When to Use |
|---|---|---|
| Precision | TP / (TP + FP) | Important for reducing false positives (e.g., Spam Detection) |
| Recall | TP / (TP + FN) | Important for reducing false negatives (e.g., Disease Detection) |
| F1-Score | 2 * (Precision * Recall) / (Precision + Recall) | When both precision & recall matter |

Example in Python:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
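
The formulas can also be worked through by hand. The counts below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Hypothetical counts for a spam classifier evaluated on 100 emails.
tp, fp, fn, tn = 30, 10, 20, 40

precision = tp / (tp + fp)  # 30 / 40
recall = tp / (tp + fn)     # 30 / 50
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.3f}, Accuracy={accuracy:.2f}")
```

Here F1 (about 0.667) sits between precision (0.75) and recall (0.60) but closer to the lower value, which is the characteristic behavior of a harmonic mean.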

10. [Asked in Apple] What is the Bias-Variance Tradeoff in Machine Learning?

Answer:
The Bias-Variance Tradeoff describes the balance between underfitting and overfitting.

| Bias | Variance |
|---|---|
| Error due to overly simple model | Error due to overly complex model |
| High bias → Underfitting | High variance → Overfitting |
| Example: Linear Regression on a non-linear dataset | Example: Deep Neural Network on small dataset |

Solution:

  • Use Ensemble Learning (Random Forest reduces variance).
  • Apply Regularization (L1/L2).
  • Increase training data.

Example using Ridge Regression to reduce variance:

from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=0.5)
ridge_model.fit(X_train, y_train)
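
Why a larger `alpha` reduces variance can be seen from the closed-form ridge solution: the penalty shrinks the weight vector toward zero, which makes the fit less sensitive to noise in any one sample. The NumPy sketch below is a simplified version without an intercept, on synthetic data. It is not how scikit-learn implements `Ridge` internally.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))   # few samples, many features: a high-variance setting
true_w = np.zeros(10)
true_w[:2] = [2.0, -1.0]        # only two features actually matter
y = X @ true_w + rng.normal(scale=0.5, size=30)

def ridge_weights(X, y, alpha):
    # Closed-form ridge solution without intercept: (X^T X + alpha * I)^-1 X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = {}
for alpha in (0.0, 1.0, 100.0):
    norms[alpha] = float(np.linalg.norm(ridge_weights(X, y, alpha)))
    print(f"alpha={alpha}: ||w|| = {norms[alpha]:.3f}")
```

With `alpha=0.0` this reduces to ordinary least squares. Very large values push the model toward underfitting, which is the other side of the tradeoff.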

11. [Asked in Google] What is the difference between Parametric and Non-Parametric Machine Learning models?

Answer:

  • Parametric Models:
    • Have a fixed number of parameters, regardless of how much training data there is.
    • Make strong assumptions about the form of the underlying function or data distribution.
    • Examples: Linear Regression, Logistic Regression, Naïve Bayes.
  • Non-Parametric Models:
    • Do not make strong assumptions about the data distribution.
    • The effective number of parameters can grow with the dataset.
    • Examples: Decision Trees, k-Nearest Neighbors (k-NN), Support Vector Machines (SVM) with non-linear kernels.
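
One way to make the distinction tangible is to compare what each kind of model has to store after training. The sketch below uses a straight-line fit as the parametric model and a hand-rolled 1-nearest-neighbor predictor as the non-parametric one. Both helper functions and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(x, y):
    # Parametric: always two numbers (slope, intercept), however much data there is.
    slope, intercept = np.polyfit(x, y, deg=1)
    return {"slope": slope, "intercept": intercept}

def fit_1nn(x, y):
    # Non-parametric: the "model" is the stored training set itself.
    return {"x": x.copy(), "y": y.copy()}

def predict_1nn(model, query):
    nearest = np.argmin(np.abs(model["x"] - query))
    return model["y"][nearest]

for n in (10, 1000):
    x = rng.uniform(0, 10, size=n)
    y = 2 * x + 1 + rng.normal(scale=0.1, size=n)
    line = fit_line(x, y)
    knn = fit_1nn(x, y)
    print(f"n={n}: line stores {len(line)} numbers, "
          f"1-NN stores {knn['x'].size + knn['y'].size}")
```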

12. [Asked in Amazon] What is Dimensionality Reduction in Machine Learning? Why is it needed?

Answer:
Dimensionality reduction is the process of reducing the number of features in a dataset while preserving its key information.

Why is it needed?

  • Reduces overfitting by removing redundant or irrelevant features.
  • Improves model performance and training speed.
  • Helps visualize high-dimensional data.

Techniques:

  1. Principal Component Analysis (PCA) – Projects data into lower dimensions while preserving variance.
  2. t-SNE (t-Distributed Stochastic Neighbor Embedding) – Used for visualizing high-dimensional data.
  3. Autoencoders – Neural networks for feature extraction.

Example of PCA in Python:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
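
A self-contained variant with synthetic data shows how to check how much information the reduction keeps. The data is constructed so that five observed features are noisy mixtures of only two underlying signals. That construction is an assumption made so the result is easy to interpret.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 5 features that are noisy mixtures of only 2 underlying signals.
signals = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = signals @ mixing + rng.normal(scale=0.05, size=(300, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
retained = float(pca.explained_variance_ratio_.sum())

print("Reduced shape:", X_reduced.shape)
print("Variance explained per component:", pca.explained_variance_ratio_.round(3))
print("Total variance retained:", round(retained, 3))
```

On real data the retained variance is usually lower, and `explained_variance_ratio_` is a common guide for choosing `n_components`.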

13. [Asked in Microsoft] What is One-Hot Encoding and when should it be used?

Answer:
One-Hot Encoding is a technique to convert categorical variables into a numerical format.

When to use:

  • When categorical data is nominal (no inherent order).
  • Example: Colors (Red, Blue, Green) → [1, 0, 0], [0, 1, 0], [0, 0, 1].

Example in Python:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# sparse_output replaces the older `sparse` argument (scikit-learn 1.2 and later)
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(pd.DataFrame(["Red", "Blue", "Green"], columns=["Color"]))

14. [Asked in Netflix] What is the Curse of Dimensionality? How does it affect Machine Learning?

Answer:
The Curse of Dimensionality refers to performance issues that arise when the number of features increases.

Effects:

  • Increased sparsity: Data points become more distant in high-dimensional space.
  • Computational complexity: More features lead to slower training.
  • Overfitting risk: More dimensions require more training data to generalize well.

Solution: Use Dimensionality Reduction techniques like PCA or Feature Selection.
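
One facet of the sparsity effect is that distances lose contrast: in high dimensions the nearest and farthest neighbors of a point end up almost equally far away. The sketch below measures this on uniformly random points. The sample size and the list of dimensions are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
contrast = {}

for dims in (2, 10, 100, 1000):
    points = rng.uniform(size=(500, dims))
    # Distances from the first point to all the others.
    distances = np.linalg.norm(points[1:] - points[0], axis=1)
    # Relative contrast: how much farther the farthest neighbor is than the nearest.
    contrast[dims] = float((distances.max() - distances.min()) / distances.min())
    print(f"dims={dims:>4}: relative contrast = {contrast[dims]:.2f}")
```

As the contrast shrinks, distance-based methods such as k-NN and K-Means have less signal to work with.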


15. [Asked in Tesla] What is an Imbalanced Dataset? How do you handle it in Machine Learning?

Answer:
An imbalanced dataset occurs when one class has significantly more samples than another (e.g., Fraud Detection, where fraudulent cases are rare).

Techniques to Handle Imbalance:

  1. Resampling Methods:
    • Oversampling (e.g., SMOTE) – Increases minority class samples.
    • Undersampling – Reduces majority class samples.
  2. Class Weights Adjustment:

     from sklearn.ensemble import RandomForestClassifier

     model = RandomForestClassifier(class_weight="balanced")

  3. Use Performance Metrics like Precision, Recall, and F1-Score instead of accuracy.
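
Why accuracy is misleading here can be shown with a few lines of plain Python. The 95/5 class split and the always-negative "model" are hypothetical.

```python
# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5

# A useless "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
false_negatives = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = true_positives / (true_positives + false_negatives)

print("Accuracy:", accuracy)       # looks good
print("Recall on fraud:", recall)  # reveals that no fraud was caught
```

The model scores 95% accuracy while catching none of the fraud cases, which is why recall and related metrics are preferred on imbalanced data.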

16. [Asked in Facebook] What is Transfer Learning? How is it useful?

Answer:
Transfer Learning is a technique where a model pre-trained on a large dataset is reused for a different but related task.

Advantages:

  • Requires less training data than training from scratch.
  • Reduces computational cost by leveraging existing knowledge.
  • Speeds up training and improves accuracy for complex problems.

Example using a pre-trained model in TensorFlow:

from tensorflow.keras.applications import VGG16

model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

17. [Asked in LinkedIn] What is a Confusion Matrix? Explain its components.

Answer:
A Confusion Matrix is used to evaluate a classification model’s performance by showing actual vs. predicted classifications.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |

Key Metrics Derived from the Confusion Matrix:

  • Accuracy = (TP + TN) / (Total Samples)
  • Precision = TP / (TP + FP)
  • Recall (Sensitivity) = TP / (TP + FN)
  • F1-Score = Harmonic mean of Precision and Recall

Example in Python:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
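
A self-contained sketch with hypothetical labels shows how to pull the four components out of the matrix. Note that for 0/1 labels scikit-learn orders the classes ascending, so the returned matrix is laid out as [[TN, FP], [FN, TP]], with the negative class first.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# For 0/1 labels, scikit-learn returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}")
```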

18. [Asked in IBM] What is Hyperparameter Tuning in Machine Learning?

Answer:
Hyperparameter tuning involves optimizing hyperparameters: settings that are not learned from the data but are chosen before training (e.g., tree depth or learning rate).

Techniques:

  1. Grid Search – Tries every combination of hyperparameters in a predefined grid.
  2. Random Search – Randomly selects hyperparameters from a given range.
  3. Bayesian Optimization – Finds the best hyperparameters using probabilistic models.

Example using Grid Search:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {"n_estimators": [10, 50, 100], "max_depth": [None, 10, 20]}
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
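
For comparison, a Random Search sketch is shown below. The dataset is synthetic, and the candidate values in `param_distributions` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic dataset so the search finishes quickly.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

param_distributions = {
    "n_estimators": [10, 25, 50],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
}

# Try only 5 of the 36 possible combinations, chosen at random.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print("Best Parameters:", random_search.best_params_)
print("Best CV accuracy:", round(random_search.best_score_, 3))
```

Random Search trades exhaustiveness for speed, which tends to pay off when the grid is large and only a few hyperparameters matter.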

19. [Asked in Adobe] What are Bagging and Boosting in Ensemble Learning?

Answer:
Bagging and Boosting are ensemble learning techniques that improve model performance.

| Feature | Bagging | Boosting |
|---|---|---|
| Strategy | Trains multiple models independently and averages results | Trains models sequentially, correcting errors of previous models |
| Reduces | Variance | Bias |
| Example | Random Forest | XGBoost, AdaBoost |

Example using Bagging (Random Forest):

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
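
A matching Boosting sketch, using scikit-learn's `AdaBoostClassifier`, is shown below. The data is synthetic, and the single decision stump is included only as a weak baseline for comparison. Exact scores depend on the data and library version.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A single decision stump (depth-1 tree): a weak, high-bias learner.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

# AdaBoost fits stumps one after another, reweighting the samples earlier stumps got wrong.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X_train, y_train)

stump_acc = stump.score(X_test, y_test)
boosted_acc = boosted.score(X_test, y_test)
print("Single stump test accuracy:", round(stump_acc, 3))
print("AdaBoost test accuracy:", round(boosted_acc, 3))
```

Combining many weak learners sequentially typically lowers bias compared with any single one of them.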

20. [Asked in Apple] What is Stochastic Gradient Descent (SGD), and how does it work?

Answer:
Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters using the gradient computed on a single randomly chosen training sample (or, in the mini-batch variant, a small random subset) instead of the entire dataset.

Advantages:

  • Each update is much cheaper than in full-batch Gradient Descent.
  • Works well with large datasets.
  • Noise in the updates can help escape shallow local minima.

Example using SGD in Python:

from sklearn.linear_model import SGDClassifier

# "log_loss" replaces the older "log" name (scikit-learn 1.1 and later)
sgd = SGDClassifier(loss="log_loss")
sgd.fit(X_train, y_train)
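
To show how the update rule works, here is a from-scratch sketch of SGD for one-dimensional linear regression in NumPy. The data-generating values (w=3, b=2), the learning rate, and the number of epochs are illustrative assumptions. This is not how `SGDClassifier` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus a little noise (values chosen for illustration).
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0
learning_rate = 0.05

for epoch in range(50):
    for i in rng.permutation(len(x)):      # visit samples in a fresh random order
        error = (w * x[i] + b) - y[i]      # prediction error on ONE sample
        w -= learning_rate * error * x[i]  # gradient of 0.5 * error**2 w.r.t. w
        b -= learning_rate * error         # gradient of 0.5 * error**2 w.r.t. b

print(f"Learned w={w:.2f}, b={b:.2f} (data generated with w=3, b=2)")
```

Each parameter update touches a single sample, so its cost does not depend on the dataset size.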