ML Interview Questions – Introduction
1. [Asked in Google] What is Machine Learning? How is it different from traditional programming?
Answer:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that allows computers to learn from data and make decisions without being explicitly programmed.
| Traditional Programming | Machine Learning |
|---|---|
| Developer writes explicit rules | Model learns patterns from data |
| Used for well-defined tasks with known rules | Used for tasks whose patterns are hard to specify by hand |
| Example: If-Else conditions in a fraud detection system | Example: A model learning fraud patterns from past transactions |
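
To make the contrast concrete, here is a minimal sketch on toy data. The transaction amounts, the labels, and the function names `rule_based_flag` and `learn_threshold` are hypothetical, chosen only for illustration. A real fraud model would use many features, not a single threshold.

```python
# Toy transactions: (amount, is_fraud). The values are made up for illustration.
transactions = [(20, 0), (35, 0), (50, 0), (80, 0), (400, 1), (650, 1), (900, 1)]

def rule_based_flag(amount):
    """Traditional programming: a developer hard-codes the rule."""
    return 1 if amount > 1000 else 0

def learn_threshold(data):
    """Machine learning in miniature: pick the threshold that makes
    the fewest mistakes on the labeled examples."""
    best_threshold, best_errors = None, len(data) + 1
    for t in sorted(amount for amount, _ in data):
        errors = sum((1 if amount >= t else 0) != label for amount, label in data)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

threshold = learn_threshold(transactions)

def learned_flag(amount):
    return 1 if amount >= threshold else 0

rule_errors = sum(rule_based_flag(a) != y for a, y in transactions)
learned_errors = sum(learned_flag(a) != y for a, y in transactions)
print("Learned threshold:", threshold)
print("Rule errors:", rule_errors, "| Learned errors:", learned_errors)
```

The hand-written rule stays fixed unless someone edits it. The learned threshold changes whenever the training data changes.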
2. [Asked in Amazon] What are the different types of Machine Learning?
Answer:
Machine Learning is commonly classified into three main types:
- Supervised Learning
- Uses labeled data to train models.
- Examples: Classification (Spam Detection), Regression (Stock Price Prediction).
- Unsupervised Learning
- Uses unlabeled data to find hidden patterns.
- Examples: Clustering (Customer Segmentation), Dimensionality Reduction (PCA).
- Reinforcement Learning
- Model learns through rewards and penalties by interacting with an environment.
- Examples: Self-driving cars, Game-playing AI (AlphaGo).
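The reward-driven loop of Reinforcement Learning can be sketched with tabular Q-learning on a tiny, made-up environment: five states on a line, with a reward of 1 for reaching the rightmost state. The environment, the constants, and the uniformly random exploration policy are all assumptions made for illustration, not a production setup.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = [-1, +1]    # index 0 = move left, index 1 = move right
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    """Environment: reward 1 for reaching the goal, 0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]

for _ in range(300):                        # episodes
    state = 0
    for _ in range(100):                    # cap on steps per episode
        a = rng.randrange(2)                # explore with a uniformly random policy
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning target: reward plus discounted best value of the next state.
        target = reward + (0.0 if done else GAMMA * max(q[next_state]))
        q[state][a] += ALPHA * (target - q[state][a])
        state = next_state
        if done:
            break

greedy_policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because Q-learning is off-policy, the agent can act randomly while still learning the values of the greedy policy.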
3. [Asked in Microsoft] What is the difference between Classification and Regression?
Answer:
| Feature | Classification | Regression |
|---|---|---|
| Output Type | Discrete (Categories) | Continuous (Numerical values) |
| Example | Spam Detection (Spam/Not Spam) | Predicting House Prices ($200K, $250K) |
| Algorithms | Decision Trees, SVM, Random Forest | Linear Regression, Ridge Regression |
Example in Python:
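from sklearn.linear_model import LinearRegression, LogisticRegression
# X_train and y_train are assumed to be defined elsewhere.
# Regression model: y_train holds continuous values (e.g., prices)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Classification model: y_train holds class labels (e.g., 0/1).
# Despite its name, LogisticRegression is a classifier.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)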
4. [Asked in Netflix] What are the key steps in a Machine Learning pipeline?
Answer:
A standard ML pipeline consists of:
- Data Collection – Gather relevant data.
- Data Preprocessing – Handle missing values, remove outliers, normalize data.
- Feature Engineering – Create, transform, and select informative features.
- Model Selection – Choose an appropriate algorithm (Decision Tree, SVM, etc.).
- Training the Model – Train on a dataset.
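- Evaluation – Measure performance using metrics such as RMSE (regression) or Precision and Recall (classification).
- Hyperparameter Tuning – Optimize settings such as tree depth or learning rate that are not learned from the data.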
- Deployment & Monitoring – Deploy and continuously improve the model.
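Several of these steps can be chained in code. The sketch below is one possible arrangement using scikit-learn's `Pipeline`. The synthetic dataset from `make_classification` stands in for real collected data, and the choice of scaler and model is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a synthetic dataset stands in for real data here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set before any preprocessing to avoid data leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing and the chosen model, chained into one estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X_train, y_train)                  # training
y_pred = pipeline.predict(X_test)               # predictions on unseen data
test_accuracy = accuracy_score(y_test, y_pred)  # evaluation
print("Test accuracy:", test_accuracy)
```

Wrapping the scaler in the pipeline means it is fit only on training data, which keeps test-set statistics out of preprocessing.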
5. [Asked in Tesla] What are Model Overfitting and Underfitting? How do you prevent them?
Answer:
- Overfitting: The model memorizes training data but fails on new data.
- Underfitting: The model is too simple and fails to capture patterns.
Prevention Techniques:
For Overfitting:
- Use Cross-Validation (K-Fold CV) to detect overfitting and guide model selection.
- Apply Regularization (L1, L2).
- Use Dropout (in Deep Learning).
For Underfitting:
- Use a more complex model.
- Add more informative features or train for longer.
- Reduce regularization.
Example using Regularization:
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0) # L2 Regularization
model.fit(X_train, y_train)
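Both failure modes can be seen by fitting polynomials of different degrees to the same noisy data. This is a minimal sketch on synthetic sine-curve data. The sample sizes, noise level, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from a sine curve (synthetic data for illustration)."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={degree}: train MSE={results[degree][0]:.3f}, "
          f"test MSE={results[degree][1]:.3f}")
```

A straight line (degree 1) cannot follow the curve, so its training error stays high, which is the signature of underfitting. Raising the degree always lowers training error. Overfitting shows up when test error stops improving or gets worse while training error keeps falling.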
6. [Asked in Facebook] What is Feature Engineering? Why is it important?
Answer:
Feature Engineering involves transforming raw data into features that a Machine Learning model can learn from more effectively.
Importance:
- Improves model performance.
- Reduces training time.
- Helps simplify complex relationships in data.
Techniques:
- Handling Missing Data – Replace missing values using mean/median/mode.
- Feature Scaling – Normalize data using Standardization (StandardScaler).
- Encoding Categorical Data – Use One-Hot Encoding or Label Encoding.
Example of Feature Scaling:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
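For intuition, median imputation and standardization can also be written out by hand. This sketch uses a made-up `ages` column. It mirrors what `StandardScaler` computes per feature, which is (x - mean) / std with the population standard deviation.

```python
import numpy as np

# Toy feature column with a missing value (NaN); the numbers are made up.
ages = np.array([25.0, 32.0, np.nan, 41.0, 38.0])

# Handling missing data: replace NaN with the median of the observed values.
median_age = np.nanmedian(ages)
ages_imputed = np.where(np.isnan(ages), median_age, ages)

# Feature scaling: subtract the mean, divide by the standard deviation (ddof=0).
ages_scaled = (ages_imputed - ages_imputed.mean()) / ages_imputed.std()

print("Imputed:", ages_imputed)
print("Scaled mean:", round(float(ages_scaled.mean()), 6))
print("Scaled std:", round(float(ages_scaled.std()), 6))
```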
7. [Asked in LinkedIn] What are the commonly used Machine Learning algorithms?
Answer:
Supervised Learning:
- Linear Regression – Predicts continuous values.
- Logistic Regression – Classification (binary by default, extendable to multiclass).
- Decision Tree – Tree-based classification and regression.
- Random Forest – Ensemble of Decision Trees.
- Support Vector Machine (SVM) – Margin-based classifier that works well with high-dimensional data.
Unsupervised Learning:
- K-Means Clustering – Groups similar data points.
- Principal Component Analysis (PCA) – Reduces dimensions of data.
Reinforcement Learning:
- Q-Learning – Used in robotics and game AI.
8. [Asked in IBM] What is Cross-Validation, and why is it important?
Answer:
Cross-Validation assesses model performance by splitting the dataset into multiple parts for training and validation.
Types of Cross-Validation:
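- K-Fold Cross-Validation: Splits data into K subsets, training on K-1 and testing on the remaining one, repeated K times so that each subset serves as the test set once.
- Leave-One-Out Cross-Validation (LOOCV): The special case where K equals the number of samples, so each sample is used once for validation while the rest are used for training.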
Example using K-Fold CV:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
scores = cross_val_score(model, X, y, cv=5)
print("Mean Accuracy:", scores.mean())
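The splitting logic behind K-Fold can be written in a few lines. This is a minimal sketch with a hypothetical helper `k_fold_indices`. Unlike scikit-learn's splitters it does no shuffling or stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so the averaged score uses all of the data for validation without ever testing on a sample the model was trained on.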
9. [Asked in Adobe] What are Precision, Recall, and F1-score? When should you use them?
Answer:
These metrics evaluate classification models when accuracy alone is misleading, for example on imbalanced classes.
| Metric | Formula | When to Use |
|---|---|---|
| Precision | TP / (TP + FP) | Important for reducing false positives (e.g., Spam Detection) |
| Recall | TP / (TP + FN) | Important for reducing false negatives (e.g., Disease Detection) |
| F1-Score | 2 * (Precision * Recall) / (Precision + Recall) | When both precision & recall matter |
Example in Python:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
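The formulas in the table can be checked by hand. This sketch uses hypothetical counts (80 true positives, 20 false positives, 40 false negatives). The helper name `precision_recall_f1` is an assumption, not a library function.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"Precision={precision:.3f} Recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values than a simple average would.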
10. [Asked in Apple] What is the Bias-Variance Tradeoff in Machine Learning?
Answer:
The Bias-Variance Tradeoff describes the balance between underfitting and overfitting.
| Bias | Variance |
|---|---|
| Error due to overly simple model | Error due to overly complex model |
| High bias → Underfitting | High variance → Overfitting |
| Example: Linear Regression on a non-linear dataset | Example: Deep Neural Network on small dataset |
Solution:
- Use Ensemble Learning (Random Forest reduces variance).
- Apply Regularization (L1/L2).
- Increase training data.
Example using Ridge Regression to reduce variance:
from sklearn.linear_model import Ridge
ridge_model = Ridge(alpha=0.5)
ridge_model.fit(X_train, y_train)
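To see why a larger `alpha` lowers variance, it helps to look at the closed-form ridge solution. This sketch runs on synthetic data and omits the intercept for brevity (scikit-learn's `Ridge` fits one by default). The helper `ridge_fit` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.0, 0.5])     # made-up ground truth
y = X @ true_w + rng.normal(scale=0.5, size=30)

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights without intercept: (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = {}
for alpha in (0.0, 1.0, 10.0, 100.0):
    w = ridge_fit(X, y, alpha)
    norms[alpha] = float(np.linalg.norm(w))
    print(f"alpha={alpha:>5}: ||w|| = {norms[alpha]:.3f}")
```

The weight norm shrinks as `alpha` grows. Smaller weights make the model less sensitive to noise in the training set (lower variance) at the price of some bias.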
11. [Asked in Google] What is the difference between Parametric and Non-Parametric Machine Learning models?
Answer:
- Parametric Models:
- Have a fixed number of parameters, independent of the training set size.
- Assume a specific functional form or data distribution.
- Examples: Linear Regression, Logistic Regression, Naïve Bayes.
- Non-Parametric Models:
- Do not make strong assumptions about the functional form or data distribution.
- Model complexity (the effective number of parameters) can grow with the dataset.
- Examples: Decision Trees, k-Nearest Neighbors (k-NN), kernel Support Vector Machines (SVM).
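One rough way to see the difference is to compare what each kind of model has to store after fitting. In this sketch the helpers `fit_linear`, `fit_1nn`, and `predict_1nn` are hypothetical and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(x, y):
    """Parametric: the fitted model is just (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def fit_1nn(x, y):
    """Non-parametric: 1-nearest-neighbor keeps the training set itself."""
    return x.copy(), y.copy()

def predict_1nn(model, x_new):
    """Return the target of the closest stored training point."""
    x_stored, y_stored = model
    return float(y_stored[np.argmin(np.abs(x_stored - x_new))])

for n in (10, 1000):
    x = rng.uniform(0, 10, n)
    y = 2 * x + 1 + rng.normal(0, 0.1, n)
    linear_model = fit_linear(x, y)
    knn_model = fit_1nn(x, y)
    print(f"n={n}: linear stores {len(linear_model)} numbers, "
          f"1-NN stores {knn_model[0].size} points")
```

The linear model stays at two numbers however much data arrives, while the nearest-neighbor model grows with every new sample.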
12. [Asked in Amazon] What is Dimensionality Reduction in Machine Learning? Why is it needed?
Answer:
Dimensionality reduction is the process of reducing the number of features in a dataset while preserving its key information.
Why is it needed?
- Reduces overfitting by removing redundant or irrelevant features.
- Improves model performance and training speed.
- Helps visualize high-dimensional data.
Techniques:
- Principal Component Analysis (PCA) – Projects data into lower dimensions while preserving variance.
- t-SNE (t-Distributed Stochastic Neighbor Embedding) – Used for visualizing high-dimensional data.
- Autoencoders – Neural networks for feature extraction.
Example of PCA in Python:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
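Under the hood, PCA amounts to centering the data and taking a singular value decomposition. This is a minimal NumPy sketch on synthetic 3-D data. It shows the idea and is not a substitute for scikit-learn's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-D data that varies mostly along one direction.
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

X_centered = X - X.mean(axis=0)            # PCA works on centered data
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values are proportional to the variance along each component.
explained_variance_ratio = S**2 / np.sum(S**2)
X_reduced = X_centered @ Vt[:2].T          # project onto the top 2 components

print("Explained variance ratio:", explained_variance_ratio.round(4))
print("Reduced shape:", X_reduced.shape)
```

The explained variance ratio indicates how much information each component keeps, which helps when choosing `n_components`.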
13. [Asked in Microsoft] What is One-Hot Encoding and when should it be used?
Answer:
One-Hot Encoding is a technique to convert categorical variables into a numerical format.
When to use:
- When categorical data is nominal (no inherent order).
- Example: Colors (Red, Blue, Green) → [1, 0, 0], [0, 1, 0], [0, 0, 1].
Example in Python:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
encoded = encoder.fit_transform(pd.DataFrame(["Red", "Blue", "Green"], columns=["Color"]))
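The encoding itself is simple enough to write by hand, which can help when explaining it in an interview. The helper `one_hot_encode` below is hypothetical. It sorts categories alphabetically, so the column order is Blue, Green, Red.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a single 1."""
    categories = sorted(set(values))            # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

categories, rows = one_hot_encode(["Red", "Blue", "Green", "Blue"])
print(categories)
for row in rows:
    print(row)
```

Each category gets its own column, so the model cannot infer a false ordering such as Red < Blue < Green.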
14. [Asked in Netflix] What is the Curse of Dimensionality? How does it affect Machine Learning?
Answer:
The Curse of Dimensionality refers to performance issues that arise when the number of features increases.
Effects:
- Increased sparsity: Data points become more distant in high-dimensional space.
- Computational complexity: More features lead to slower training.
- Overfitting risk: More dimensions require more training data to generalize well.
Solution: Use Dimensionality Reduction techniques like PCA or Feature Selection.
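One symptom of the Curse of Dimensionality is that distances lose contrast: the nearest and farthest points end up almost equally far away. This sketch measures that on uniformly random points. The helper `relative_contrast` and the chosen dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_dims, n_points=500):
    """(max - min) / min distance from one query point to random points."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return float((distances.max() - distances.min()) / distances.min())

contrast = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, value in contrast.items():
    print(f"dims={d:>4}: relative contrast = {value:.3f}")
```

As the contrast shrinks, distance-based methods such as k-NN and K-Means have less signal to work with.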
15. [Asked in Tesla] What is an Imbalanced Dataset? How do you handle it in Machine Learning?
Answer:
An imbalanced dataset occurs when one class has significantly more samples than another (e.g., Fraud Detection, where fraudulent cases are rare).
Techniques to Handle Imbalance:
- Resampling Methods:
- Oversampling (e.g., SMOTE) – Increases minority class samples.
- Undersampling – Reduces majority class samples.
- Class Weights Adjustment:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(class_weight="balanced")
- Use Performance Metrics like Precision, Recall, and F1-Score instead of accuracy.
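The `class_weight="balanced"` setting uses the heuristic n_samples / (n_classes * class_count), as described in the scikit-learn documentation. The sketch below reproduces that formula by hand on a hypothetical 95/5 label split. The helper name `balanced_class_weights` is an assumption.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight = n_samples / (n_classes * class_count) for each class."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = [0] * 95 + [1] * 5        # hypothetical data: 5% fraud
weights = balanced_class_weights(labels)
print(weights)
```

Here each rare-class sample counts about 19 times as much as a majority-class sample, so misclassifying fraud costs the model far more.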
16. [Asked in Facebook] What is Transfer Learning? How is it useful?
Answer:
Transfer Learning is a technique where a model pre-trained on a large dataset is reused for a different but related task.
Advantages:
- Requires less training data than training from scratch.
- Reduces computational cost by leveraging existing knowledge.
- Speeds up training and improves accuracy for complex problems.
Example using a pre-trained model in TensorFlow:
from tensorflow.keras.applications import VGG16
model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
17. [Asked in LinkedIn] What is a Confusion Matrix? Explain its components.
Answer:
A Confusion Matrix is used to evaluate a classification model’s performance by showing actual vs. predicted classifications.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Key Metrics Derived from the Confusion Matrix:
- Accuracy = (TP + TN) / (Total Samples)
- Precision = TP / (TP + FP)
- Recall (Sensitivity) = TP / (TP + FN)
- F1-Score = Harmonic mean of Precision and Recall
Example in Python:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
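The four cells can also be counted by hand. This is a minimal sketch with made-up labels and a hypothetical helper `confusion_counts`. Note that for 0/1 labels, scikit-learn's `confusion_matrix` orders rows and columns by sorted label, so its output reads [[TN, FP], [FN, TP]] and not the positive-first layout of the table above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # made-up ground truth
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # made-up predictions

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} FN={fn} FP={fp} TN={tn} accuracy={accuracy}")
```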
18. [Asked in IBM] What is Hyperparameter Tuning in Machine Learning?
Answer:
Hyperparameter tuning is the search for the best values of hyperparameters, the settings that are fixed before training and are not learned from the data.
Techniques:
- Grid Search – Tries every combination in a predefined grid of hyperparameter values.
- Random Search – Samples combinations at random from given ranges or distributions, within a fixed budget.
- Bayesian Optimization – Finds the best hyperparameters using probabilistic models.
Example using Grid Search:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [None, 10, 20]}
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
grid_search.fit(X_train, y_train)
print("Best Parameters:", grid_search.best_params_)
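The difference between Grid Search and Random Search comes down to which candidates get evaluated. This sketch only enumerates candidates for the same `param_grid` and trains no model. The budget of 4 random draws is an arbitrary assumption.

```python
import itertools
import random

param_grid = {"n_estimators": [10, 50, 100], "max_depth": [None, 10, 20]}

# Grid search: every combination in the grid (3 x 3 = 9 candidates).
names = list(param_grid)
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]

# Random search: a fixed budget of randomly drawn candidates.
rng = random.Random(0)
random_candidates = [
    {name: rng.choice(options) for name, options in param_grid.items()}
    for _ in range(4)
]

print("Grid candidates:", len(grid_candidates))
print("Random candidates:", len(random_candidates))
```

The grid grows multiplicatively with every added hyperparameter, while the random budget stays whatever is chosen. That is why Random Search is often preferred for large search spaces.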
19. [Asked in Adobe] What are Bagging and Boosting in Ensemble Learning?
Answer:
Bagging and Boosting are ensemble learning techniques that improve model performance.
| Feature | Bagging | Boosting |
|---|---|---|
| Strategy | Trains multiple models independently on bootstrap samples and averages (or votes on) their results | Trains models sequentially, each correcting errors of previous models |
| Mainly reduces | Variance | Bias |
| Example | Random Forest | XGBoost, AdaBoost |
Example using Bagging (Random Forest):
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
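For a side-by-side view, the sketch below trains one bagging ensemble and one boosting ensemble on the same synthetic dataset. The dataset, the ensemble sizes, and the use of `GradientBoostingClassifier` as the boosting example are assumptions for illustration. The scores depend on the data and should not be read as a general ranking.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Bagging: 50 full-depth trees, each trained independently on a bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: 50 shallow trees added one at a time, each fit to the current errors.
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

scores = {}
for name, ensemble in (("bagging", bagging), ("boosting", boosting)):
    ensemble.fit(X_train, y_train)
    scores[name] = ensemble.score(X_test, y_test)   # accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Bagging starts from strong, high-variance learners and averages them. Boosting starts from weak, high-bias learners and combines them step by step.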
20. [Asked in Apple] What is Stochastic Gradient Descent (SGD), and how does it work?
Answer:
Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters using the gradient computed on a single randomly chosen sample (or a small mini-batch) instead of the entire dataset.
Advantages:
- Each update is much cheaper than a full-batch Gradient Descent step.
- Works well with large datasets.
- The noise in its updates can help it escape shallow local minima.
Example using SGD in Python:
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier(loss="log_loss")  # logistic regression; this loss was named "log" before scikit-learn 1.1
sgd.fit(X_train, y_train)
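The update rule itself fits in a few lines. This is a minimal sketch of per-sample SGD for a one-variable linear model on synthetic data drawn from y = 3x + 2. The learning rate and epoch count are arbitrary choices, not tuned values.

```python
import random

rng = random.Random(0)
# Synthetic data from y = 3x + 2 plus a little Gaussian noise.
xs = [rng.uniform(-1, 1) for _ in range(200)]
data = [(x, 3 * x + 2 + rng.gauss(0, 0.1)) for x in xs]

w, b = 0.0, 0.0
learning_rate = 0.05

for epoch in range(50):
    rng.shuffle(data)                      # visit samples in a new random order
    for x, y in data:                      # one parameter update per sample
        error = (w * x + b) - y
        w -= learning_rate * error * x     # gradient of 0.5 * error**2 w.r.t. w
        b -= learning_rate * error         # gradient of 0.5 * error**2 w.r.t. b

print(f"w = {w:.3f}, b = {b:.3f}")
```

With a constant learning rate, the parameters keep fluctuating slightly around the optimum. Decaying the learning rate over time is a common way to reduce that fluctuation.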
