When two or more predictors are so highly correlated with each other that one predictor
can be derived as a linear combination of the others, the predictors are said to be collinear.

### 2. What is the difference between standardization and normalization? Why is each useful?
### 3. What is the central limit theorem? Why is it useful?
### 4. What is the interquartile range? Why is it useful?
### 5. What is the difference between a t-test and a z-test? Why is each useful?
### 6. Why do we divide by n-1 when calculating the sample variance? Why is it useful?
Read about Bessel's correction.
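
As a rough illustration of Bessel's correction, the sketch below draws many small samples from a simulated population and compares the average of the variance computed with a divisor of n against the average computed with n-1. The population, the sample size of 5, and the helper name `variance` are assumptions made up for this example.

```python
import random
import statistics


def variance(values, ddof):
    """Variance with divisor (n - ddof): ddof=0 is biased, ddof=1 applies Bessel's correction."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


random.seed(0)
# Simulated population (hypothetical data): normal with standard deviation 2.
population = [random.gauss(0.0, 2.0) for _ in range(10_000)]
true_var = statistics.pvariance(population)

biased, unbiased = [], []
for _ in range(5_000):
    sample = random.sample(population, 5)
    biased.append(variance(sample, ddof=0))    # divide by n
    unbiased.append(variance(sample, ddof=1))  # divide by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(f"population variance:   {true_var:.3f}")
print(f"average with n:        {mean_biased:.3f}")
print(f"average with n - 1:    {mean_unbiased:.3f}")
```

On average, dividing by n underestimates the population variance because the deviations are measured from the sample mean rather than the true mean; dividing by n-1 compensates for that.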
### 7. What are the assumptions of the normal distribution? Why is it useful?
### 8. What are the different approaches to outlier detection? How will you handle the outliers? Why is it useful?
### 9. In which cases is RMSE a bad metric? How do we solve this?
### 10. What are the loss functions used in logistic regression?
Log loss (binary cross-entropy).
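
A minimal sketch of log loss for binary labels, written in plain Python. The function name `log_loss` and the clipping constant `eps` are choices made for this example (clipping keeps `log(0)` from occurring); libraries such as scikit-learn provide their own implementations.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.95]  # mostly correct and confident
hesitant = [0.6, 0.4, 0.55, 0.6]   # correct side of 0.5, but unsure

print(log_loss(labels, confident))
print(log_loss(labels, hesitant))
```

The loss is lower for the confident, correct predictions, and it grows sharply when a confident prediction is wrong, which is the behaviour logistic regression is trained to avoid.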
### 11. Explain random forest in layman's terms.
### 12. How does logistic regression work in layman's terms?
### 13. Why is logistic regression a bad idea for multiclass classification?
### 14. How do you perform the train-test split in time series modelling?
### 15. What is the impact on a time series model when the data has large variation?
### 16. How do you decide the value of K (the number of clusters) in K-means clustering?
### 17. What are the advantages and disadvantages of undersampling and oversampling?
### 18. Which are some supervised algorithms that are not impacted by imbalanced data?
### 19. You are a placement coordinator. How would you design a system that recommends resumes matching a company's requirements?

_First Strategy_

a. Use K-means clustering to group the resumes into clusters
b. Use a ranking algorithm to sort them by relevance

_Second Strategy_

a. Perform document similarity using Hamming distance (a distance-based approach)
b. Compute the distance between the JD (job description) document and each resume
c. Shortlist the top K resumes
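
The second strategy could be sketched roughly as follows. The assumption here is that each document is turned into a binary bag-of-words vector over a shared vocabulary, so that Hamming distance counts the words present in one document but not the other. The helper names (`tokenize`, `hamming_distance`, `shortlist`) and the sample data are hypothetical.

```python
def tokenize(text):
    """Lower-case the text and return its set of whitespace-separated words."""
    return set(text.lower().split())


def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def shortlist(jd, resumes, k):
    """Return the names of the k resumes closest to the job description."""
    jd_tokens = tokenize(jd)
    resume_tokens = {name: tokenize(text) for name, text in resumes.items()}
    # Shared vocabulary across the JD and all resumes.
    vocab = sorted(jd_tokens.union(*resume_tokens.values()))

    def vectorize(tokens):
        return [1 if word in tokens else 0 for word in vocab]

    jd_vec = vectorize(jd_tokens)
    scored = [
        (hamming_distance(jd_vec, vectorize(tokens)), name)
        for name, tokens in resume_tokens.items()
    ]
    # Smallest distance first; ties are broken alphabetically by name.
    return [name for _, name in sorted(scored)[:k]]


jd = "python sql machine learning"
resumes = {
    "asha": "python sql machine learning statistics",
    "ben": "java spring sql",
    "chen": "python machine learning",
}
print(shortlist(jd, resumes, k=2))
```

Raw word overlap is a crude signal; in practice the same pipeline is often run with TF-IDF vectors and cosine similarity instead, but that would be a different distance from the one named above.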

### 20. How will you encode a feature like PinCode, which has a very high number of discrete values?
Target mean encoding
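
A minimal sketch of target mean encoding in plain Python. Each category is replaced by the mean of the target for that category; the `smoothing` parameter, which pulls rare categories towards the global mean, is an addition assumed for this example, as are the function names and the sample pin codes.

```python
from collections import defaultdict


def fit_target_mean_encoding(categories, targets, smoothing=0.0):
    """Learn a category -> (smoothed) mean-target mapping from training data."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Encode categories; unseen values fall back to the global mean."""
    return [encoding.get(category, global_mean) for category in categories]


# Hypothetical training data: pin codes and a binary target.
pincodes = ["560001", "560001", "110001", "110001"]
targets = [1, 0, 1, 1]

encoding, global_mean = fit_target_mean_encoding(pincodes, targets)
print(transform(["560001", "110001", "400001"], encoding, global_mean))
```

The mapping should be learned on the training split only (or out-of-fold), since encoding a row with a mean that includes its own target leaks the label into the feature.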
### 21. How do you design the architecture of a neural network?