Data Science Interview Questions and Answers

By Umesh Singh
Last update: 06 Jul 2020, 7 Questions

Our list of Data Science Interview Questions will help you understand the core basics of the subject and what data science really means. If you are looking for a job in this field and are just starting out, these questions will help you cover many concepts while strengthening your knowledge of the subject.

Data Science is a growing field that unifies concepts from statistics, data analysis, machine learning, and domain knowledge. It uses these concepts to understand and analyze real-world phenomena with the help of data.

Most Frequently Asked Data Science Interview Questions

1. What do you essentially mean by data science?

Data Science is a blend of various fields that uses scientific processes, algorithms, and machine learning principles to extract information and insights from structured and unstructured data.

It focuses on finding hidden patterns in raw data and turning them into a valuable resource for developing business and IT strategies.

2. What are the differences between supervised and unsupervised learning?

Supervised Learning | Unsupervised Learning
The input data is labeled. | The input data is not labeled.
It learns from a training data set of inputs paired with known outputs. | It works directly on the input data set, with no known outputs.
It is primarily used for prediction. | It is primarily used for exploratory data analysis.
It enables regression and classification. | It enables density estimation, dimensionality reduction, and clustering.

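The contrast can be illustrated with a minimal sketch using only the Python standard library. The data, the function names, and the choice of a nearest-centroid classifier and a two-cluster k-means are all illustrative assumptions, not methods named in the original answer. The supervised function needs the labels; the unsupervised one never sees them.

```python
from statistics import mean

# Hypothetical 1-D measurements that form two well-separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
# Labels are available only in the supervised setting.
labels = ["low", "low", "low", "high", "high", "high"]


def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two clusters without using any labels."""
    lo, hi = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return left, right


centroids = fit_centroids(points, labels)
print(predict(centroids, 1.1))  # classification of a new point
print(two_means(points))        # groups discovered without labels
```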
3. What do you understand by selection bias? Mention its types.

Selection bias is a type of error that arises when the researcher decides who or what is going to be studied. It is usually associated with research in which participants are not selected at random.

It is sometimes also called the selection effect. It is the distortion of a statistical analysis that results from the method of collecting samples. If selection bias is not taken into account, the conclusions of the study may not be accurate.

Here are the types of selection bias:
  • Sampling bias
  • Time interval bias
  • Data bias
  • Attrition bias
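A small simulation can make sampling bias concrete. This is only a sketch under assumed numbers: the population is synthetic (normal with mean 50 and standard deviation 10), and the cutoff of 55 is a hypothetical selection rule chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population (assumed values, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Random selection: every unit has the same chance of being chosen.
random_sample = rng.sample(population, 500)

# Non-random selection: only units above a cutoff can enter the sample.
biased_sample = [x for x in population if x > 55][:500]

print("population mean:   ", round(mean(population), 2))
print("random sample mean:", round(mean(random_sample), 2))
print("biased sample mean:", round(mean(biased_sample), 2))
```

The random sample's mean stays close to the population mean, while the biased sample overestimates it because low values could never be selected.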
4. What is the primary goal of A/B Testing in Data Science?

A/B testing is a form of statistical hypothesis testing for a randomized experiment with two variants, A and B.

The primary goal of A/B testing is to identify which changes to a web page maximize or increase an outcome of interest. It is an excellent method for finding the best online promotions and other marketing strategies for a business. It can be applied to website copy, digital ads, or even sales emails.
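One common way to decide whether variant B really outperforms variant A is a two-proportion z-test on conversion rates. The sketch below assumes that approach; the original answer does not name a specific test, and the function name and the visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 2000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; the significance threshold (for example 0.05) should be fixed before the experiment is run.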

5. What is the need for resampling?
Resampling is used for:
  • Estimating the accuracy of sample statistics by using subsets of the available data or by drawing randomly, with replacement, from a set of data points.
  • Exchanging the labels on data points when performing significance tests (permutation tests).
  • Validating models on random subsets of the data, as in bootstrapping or cross-validation.
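The first use above can be sketched as a percentile bootstrap for the mean. This is a minimal illustration, not a production implementation: the function name, the number of resamples, and the data values are assumptions.

```python
import random
from statistics import mean


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    n = len(data)
    # Draw n points with replacement, many times, and record each mean.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements.
sample = [12, 15, 11, 18, 14, 13, 16, 17, 10, 14]
lower, upper = bootstrap_mean_ci(sample)
print(f"sample mean = {mean(sample)}, 95% CI = ({lower}, {upper})")
```

The width of the interval gives a rough sense of how much the sample mean would vary if the data were collected again.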
6. What do you understand by the Law of Large Numbers?

The law of large numbers, a result from probability and statistics, states that as the sample size increases, the sample mean gets closer to the mean of the whole population.
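The effect is easy to observe by simulation. The sketch below assumes a fair six-sided die, whose expected value is 3.5; the function name and the sample sizes are illustrative choices.

```python
import random


def sample_mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)
    return total / n_rolls


# The sample mean should tend toward 3.5 as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```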

7. In a 15-minute interval, there is a 20% probability of seeing at least one shooting star. What is the probability that you will see at least one shooting star in one hour?

The probability of not seeing any shooting star in 15 minutes is
= 1 – P( At least one shooting star )
= 1 – 0.2 = 0.8 (20% probability, hence, 0.2)

One hour consists of four 15-minute intervals. Assuming the intervals are independent, the probability of not seeing any shooting star in an hour is
(0.8) ^ 4 = 0.4096

The probability of seeing at least one shooting star in an hour is
= 1 – P( Not seeing any shooting star )
= 1 – 0.4096 = 0.5904

Ans: 0.5904
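The same calculation can be written as a short function, which also makes it easy to try other interval probabilities. It assumes, as the solution above does, that the intervals are independent; the function name is illustrative.

```python
def prob_at_least_one(p_interval, n_intervals):
    """P(at least one event in n independent intervals), each with probability p_interval."""
    p_none = (1 - p_interval) ** n_intervals  # no event in any interval
    return 1 - p_none


# Four 15-minute intervals in one hour, 20% chance in each.
print(round(prob_at_least_one(0.2, 4), 4))
```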