Update interview_prep.md

Amogh Singhal · web-flow · commit bafb8c8ac2b6 · 2021-01-03T01:15:02.000+05:30
diff --git a/interview_prep.md b/interview_prep.md
@@ -2,3 +2,36 @@
 
 When two or more predictors are so highly correlated with each other that one predictor
 can be derived as a linear combination of the other predictors, the predictors are said to be collinear.
+
+### 2. What is the difference between standardization and normalization? Why are they useful?
+### 3. What is the central limit theorem? Why is it useful?
+### 4. What is the interquartile range? Why is it useful?
+### 5. What is the difference between a t-test and a z-test? Why are they useful?
+### 6. Why do we divide by n-1 when calculating the sample variance? Why is it useful?
+Read about Bessel's correction
+### 7. What are the assumptions of the normal distribution? Why is it useful?
+### 8. What are the different approaches to outlier detection? How will you handle the outliers? Why is it useful?
+### 9. When is RMSE a poor choice of metric? How do we address this?
+### 10. What are the loss functions used in logistic regression?
+Log loss (cross-entropy) function
+### 11. Explain random forest in layman's terms.
+### 12. How does logistic regression work, in layman's terms?
+### 13. Why is logistic regression a bad idea for multiclass classification?
+### 14. How do you perform the train-test split in time series modelling?
+### 15. What is the impact on a time series model when the data has large variation?
+### 16. How do you decide the value of K (the number of clusters) in K-means clustering?
+### 17. What are the advantages and disadvantages of undersampling and oversampling?
+### 18. Which supervised algorithms are not impacted by imbalanced data?
+### 19. You are a placement coordinator. How would you design a system that recommends resumes matching a company's requirements?
+a. K-means clustering to group the resumes into clusters
+b. Ranking algorithm to sort by relevance
+
+_Second Strategy_
+
+a. Perform document similarity using Hamming distance (distance-based approach)
+b. Compute the distance between the job description (JD) and each resume
+c. Shortlist the top K resumes
+
+### 20. How will you encode a feature like PinCode, which has a very high number of discrete values?
+Target mean encoding
+### 21. How do you design the architecture of a neural network?