What is Machine Learning?
Machine learning is a subset of Artificial Intelligence (AI). It is the area of computer science that specializes in analyzing and interpreting patterns and structures in data to enable learning, reasoning, and decision making without human intervention.
What is Deep Learning?
Deep Learning (DL) is a subset of Machine Learning (ML). Usually, when people use the term deep learning, they are referring to deep artificial neural networks, and somewhat less frequently to deep reinforcement learning.
Why machine learning?
Machine learning has several very practical applications that drive the kind of real business results, such as time and money savings, that have the potential to dramatically impact the future of your organization. Machine learning has made dramatic improvements in the past few years, but we are still very far from reaching human-level performance. Many times, the machine still needs human assistance to complete its task.
At Interactions, we have deployed Virtual Assistant solutions that seamlessly blend artificial with true human intelligence to deliver the highest level of accuracy and understanding. However, machine learning is best leveraged for specific types of applications that will benefit the most from this technology, such as fraud detection, predictive marketing, machine monitoring (for the Internet of Things), and inventory management. The most intuitive and prominent example is self-driving cars.
What is the difference between supervised and unsupervised learning algorithms?
Supervised learning algorithms work on labelled data, i.e. data that includes the desired solution, called a label. Examples of supervised algorithms:
- Linear Regression
- Neural Networks/Deep Learning
- Decision Trees
- Support Vector Machine (SVM)
- K-Nearest neighbours
Unsupervised learning algorithms work on unlabeled data, meaning that the data does not contain the desired solution for the algorithm to learn from. Examples of unsupervised algorithms:
- Clustering Algorithms: K-means, Hierarchical Clustering Analysis (HCA)
- Visualization and dimensionality reduction: Principal Component Analysis (PCA)
- Association rule learning
What is reinforcement learning?
In reinforcement learning, the model receives some input data and a reward that depends on the output it produces. The model learns a policy that maximizes the cumulative reward. Reinforcement learning has been applied successfully to strategic games such as Go and even classic Atari video games.
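To make this concrete, here is a minimal tabular Q-learning sketch. The one-dimensional "corridor" environment, its size, and the hyperparameter values are all illustrative assumptions rather than a standard benchmark.

```python
# Minimal tabular Q-learning on a hypothetical 1-D corridor environment
# (5 states, reward only for reaching the last state).
import numpy as np

n_states, n_actions = 5, 2   # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    state = 0
    while state != n_states - 1:          # episode ends at the goal state
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)  # the learned policy favours action 1 (move right) in every state
```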
What is Data Pre-processing in Machine Learning?
Data pre-processing is a technique used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources it is collected in a raw format that is not feasible for analysis, so it must first be cleaned and transformed.
What are the important Data pre-processing techniques in Python Machine Learning?
Some techniques are:
- Rescaling data
- Standardizing data
- Normalizing data
- Binarizing data (making values binary)
- Mean removal
- One-hot encoding
- Label encoding
Can you explain the Rescaling Data technique in data pre-processing?
Your pre-processed data may contain attributes with mixtures of scales for various quantities, such as dollars, kilograms, and sales volume. For data with attributes of varying scales, we can rescale the attributes so that they all share the same scale. Rescaling attributes into the range 0 to 1 is called normalization, and the MinMaxScaler class from scikit-learn does exactly this.
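A brief sketch of this in code; the small two-column array of prices and weights is a made-up example:

```python
# Rescaling mixed-scale attributes into [0, 1] with scikit-learn's MinMaxScaler.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1200.0, 70.0],
              [1500.0, 85.0],
              [900.0,  60.0]])   # e.g. price in dollars, weight in kilograms

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)   # each column now lies in [0, 1]
print(X_scaled)
```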
What is Data Standardization in ML?
Standardization refers to shifting the distribution of each attribute so that it has a mean of zero and a standard deviation of one (unit variance). It is useful to standardize attributes for a model that relies on the distribution of attributes, such as Gaussian processes.
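A minimal sketch with scikit-learn's StandardScaler; the toy matrix is an assumption for illustration:

```python
# Standardizing attributes to zero mean and unit variance.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print(X_std.mean(axis=0))  # ~0 for each attribute
print(X_std.std(axis=0))   # 1 for each attribute
```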
What is Data Normalization in ML?
Normalization refers to rescaling real-valued numeric attributes into the range 0 to 1. It is useful to scale the input attributes for a model that relies on the magnitude of values, such as the distance measures used in k-nearest neighbours and in the preparation of coefficients in regression.
What is Data Augmentation in ML?
Data augmentation is a technique for synthesizing new data by modifying existing data in such a way that the target is not changed, or is changed in a known way. Computer vision is one of the fields where data augmentation is very useful.
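Here is an illustrative NumPy-only sketch of label-preserving augmentation for images; the tiny 4x4 "image" and the chosen transformations (mirror flip, additive noise) are assumptions for demonstration:

```python
# Label-preserving image augmentation: each modified copy keeps the original
# class label, so the dataset grows without new labelling work.
import numpy as np

rng = np.random.default_rng(42)
image = rng.random((4, 4))           # stand-in for a grayscale image

flipped = np.fliplr(image)           # mirror left-to-right
noisy = np.clip(image + rng.normal(0, 0.05, image.shape), 0.0, 1.0)

augmented_batch = np.stack([image, flipped, noisy])
print(augmented_batch.shape)         # (3, 4, 4)
```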
How is Machine Learning (ML) different from Artificial Intelligence (AI)?
Artificial Intelligence (AI) involves machines that execute tasks in ways modelled on human intelligence, whereas ML is a subset of AI in which machines are made to learn from data. They gradually improve at tasks and can automatically build models from what they learn.
How is the k-nearest neighbours algorithm different from k-means clustering?
K-nearest neighbours is a supervised learning algorithm, while k-means falls under unsupervised learning. While the two techniques look similar at first glance, there is a lot of difference between them; most notably, supervised learning needs data in labelled form, and unsupervised learning does not.
Can you explain how you would handle missing or corrupted data in a dataset?
Find the missing or corrupted data in the dataset and either drop those rows or columns, or decide to replace them with another value. In Pandas, there are two very useful methods, isnull() and dropna(), that will help you find columns of data with missing or corrupted values and drop them. If you want to fill the invalid values with a placeholder value (for example, 0), you can use the fillna() method.
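A short Pandas sketch of exactly those methods; the small DataFrame with deliberate gaps is made up:

```python
# Locating, dropping, and filling missing values in Pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31], "salary": [50000, 60000, np.nan]})

print(df.isnull())        # boolean mask showing where values are missing
print(df.dropna())        # drop every row that contains a missing value
print(df.fillna(0))       # or replace missing values with a placeholder (0)
```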
What is dimensionality reduction?
The process of reducing the number of variables under consideration in an ML scenario is called dimensionality reduction. The process is segregated into the sub-processes of feature extraction and feature selection. Dimensionality reduction is done to enhance the visualisation of training data and to find the appropriate, smaller set of variables known as principal variables.
What is PCA in ML?
PCA stands for Principal Component Analysis. It is a dimensionality-reduction technique that mathematically transforms a set of correlated variables into a smaller set of uncorrelated variables called principal components. Applications of PCA include noise reduction, preprocessing, and compression.
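A minimal scikit-learn sketch, using the bundled Iris data to reduce four correlated measurements to two principal components:

```python
# PCA: project correlated features onto a smaller set of uncorrelated axes.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                 # shape (150, 4)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # shape (150, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_) # share of variance kept by each component
```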
What are the best python libraries used in Machine Learning?
Python is one of the most used programming languages for solving problems associated with ML (Machine Learning). Some popular Python libraries are:
- TensorFlow
- Theano
- NLTK
- Scikit-learn
- Keras
- NumPy
What is Feature Engineering?
The transformation stage in the data preparation process includes an important step known as Feature Engineering. Feature Engineering refers to selecting and extracting the right features from the data that are relevant to the task and model in consideration.
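As an illustrative sketch (the column names are hypothetical), deriving model-friendly features from a raw timestamp column is a typical piece of feature engineering:

```python
# Feature engineering: extract usable features from a raw timestamp column.
import pandas as pd

df = pd.DataFrame({"order_time": pd.to_datetime(
    ["2023-01-02 09:15", "2023-01-07 18:40", "2023-01-08 23:05"])})

df["hour"] = df["order_time"].dt.hour
df["day_of_week"] = df["order_time"].dt.dayofweek   # Monday = 0
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
print(df)
```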
What is Feature Scaling?
Feature scaling is a method used in machine learning to standardize the range of the independent variables (features) of the data.
What is Batch Normalization?
Batch Normalization is a technique that provides any layer in a neural network with inputs that have zero mean and unit variance, which is essentially what layers train best on. BatchNorm then adds one more step that makes the algorithm really powerful: a learnable scale and shift (the parameters gamma and beta) applied to the normalized values.
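A minimal NumPy sketch of the forward pass described above; the batch values and the identity choices for gamma and beta are illustrative:

```python
# Batch-norm forward pass: normalize per feature, then scale and shift.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # the extra learnable step

x = np.random.default_rng(0).normal(5.0, 3.0, size=(8, 4))  # a mini-batch
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))     # ~0 and ~1 per feature
```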
What is the F1 score?
The F1 score is used to check the performance of a model. It is the harmonic mean of the precision and recall of the model, where 1 means the best and 0 means the worst.
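A short check of this definition in code, with made-up labels:

```python
# F1 as the harmonic mean of precision and recall, verified against sklearn.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(2 * p * r / (p + r))        # harmonic mean, computed by hand
print(f1_score(y_true, y_pred))   # same value from scikit-learn
```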
Why is Naive Bayes so Naive?
Naive Bayes is so 'naive' because it assumes that all of the features in a data set are equally important and mutually independent. As we know, these assumptions are rarely true in real-world scenarios.
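Despite the naive assumptions, the method often works well in practice. A minimal Gaussian Naive Bayes sketch on the bundled Iris data:

```python
# Gaussian Naive Bayes: simple, fast, and surprisingly strong as a baseline.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB().fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```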
What is the difference between a parameter and a hyperparameter?
Parameters are values the model estimates from the training data during learning, such as the weights of a neural network. Hyperparameters are values that cannot be estimated from the training data and must be set beforehand. Example: the learning rate in neural networks.
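A quick illustration of the distinction in scikit-learn: the regularization strength C is a hyperparameter we pick, while coef_ holds parameters estimated from the data:

```python
# Hyperparameter (C, chosen before training) vs parameters (coef_, learned).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

model = LogisticRegression(C=1.0, max_iter=1000)  # C: set by us beforehand
model.fit(X, y)

print(model.coef_)       # parameters: estimated from the training data
print(model.intercept_)  # also learned parameters
```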
What techniques help with smaller datasets?
Below are some techniques:
- Data augmentation
- Pretrained models
- A better-suited algorithm
- Generating synthetic data
- Downloading additional data from the internet
What is Pruning in Decision trees?
Pruning removes branches that have weak predictive power in order to reduce the complexity of the model and, in addition, increase the predictive accuracy of a decision tree model. There are several flavours, including bottom-up and top-down pruning, with approaches such as reduced error pruning and cost complexity pruning.
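A brief cost-complexity pruning sketch with scikit-learn; the ccp_alpha value here is an arbitrary illustration, and in practice you would tune it (e.g. via cross-validation):

```python
# Cost-complexity pruning: larger ccp_alpha prunes more, giving a simpler tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

print(unpruned.tree_.node_count, pruned.tree_.node_count)  # pruned tree is smaller
```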
What is ROC?
ROC stands for Receiver Operating Characteristic. It is the pictorial representation of the contrast between true positive rates and false positive rates calculated at multiple thresholds. It is used as a proxy for measuring the trade-offs and sensitivity of the model; the false positive rate measures how often the model triggers false alarms.
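A minimal sketch computing the rates behind an ROC curve; the scores are made up:

```python
# True/false positive rates at several thresholds, plus the AUC summary.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # model's predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr, thresholds)
print(roc_auc_score(y_true, y_score))      # single-number summary (AUC)
```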
Can you explain bias-variance trade-off?
Bias is error from overly simplistic assumptions (underfitting), while variance is error from excessive sensitivity to noise in the underlying data (overfitting). The bias-variance trade-off is about choosing a model complexity that balances the two so that total error is reduced optimally: making the model more complex lowers bias but raises variance, and vice versa.
How will you handle missing data?
One can find missing data in a dataset and either drop those rows or columns, or decide to replace them with another value. In the Python library Pandas there are two useful methods that help with this: isnull() and dropna() (see the example given earlier for handling missing or corrupted data).
What is a data structure? And what are the different types of data structures supported in R programming?
A data structure is a specific form of organizing and storing data. R programming supports five basic data structures, namely:
- Vector
- Matrix
- List
- Data frame
- Factor
What is Type I vs Type II error?
Type I error is committed when the null hypothesis is true and we reject it; it is also known as a 'False Positive'. Type II error is committed when the null hypothesis is false and we fail to reject it; it is also known as a 'False Negative'.
In the context of a confusion matrix, a Type I error occurs when we classify a value as positive (1) when it is actually negative (0), and a Type II error occurs when we classify a value as negative (0) when it is actually positive (1).
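A short sketch tying both error types to a confusion matrix in code, with made-up labels:

```python
# Reading Type I/II errors directly off a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Type I errors (false positives):", fp)
print("Type II errors (false negatives):", fn)
```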
How would you handle an imbalanced data-set?
Imbalanced data means, for example, that 90% of the data belongs to one class and 10% to the other. This leads to problems such as no predictive power on the minority class. Here are a few techniques to get over it:
- Collect more data to balance the classes
- Try a different algorithm (on its own this rarely works effectively)
- Correct the imbalance in the dataset, e.g. by over-sampling the minority class or under-sampling the majority class (see the sketch below)
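As an illustrative sketch of the third option (the 90/10 split is made up), the minority class can be over-sampled with scikit-learn's resample utility:

```python
# Over-sampling the minority class (with replacement) up to the majority size.
import numpy as np
from sklearn.utils import resample

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)           # 90% class 0, 10% class 1

X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))                   # now 90 / 90
```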
What is the difference between Probability and Likelihood?
Without going too deep into the technicalities: probability quantifies the prediction of an outcome, while likelihood quantifies trust in a model. For instance, suppose someone challenges us to a 'profitable gambling game'. Probabilities will serve us to compute things like the expected profile of our gains and losses. In contrast, likelihood will serve us to judge whether the game is fair, that is, whether the outcomes we observe support our model of the game.
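A small worked example of the distinction using coin flips (the counts are made up): probability fixes the model and asks about the data, while likelihood fixes the data and compares models:

```python
# Probability vs likelihood with the same binomial formula.
from scipy.stats import binom

# Probability: chance of seeing 8 heads in 10 flips IF the coin is fair.
print(binom.pmf(8, n=10, p=0.5))   # ~0.044

# Likelihood: same formula, but now p varies while the data stays fixed.
for p in (0.5, 0.7, 0.8):
    print(p, binom.pmf(8, n=10, p=p))  # p = 0.8 maximizes the likelihood
```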
What is a dataset in ML?
A dataset is a large repository of structured data.
What is sentiment analysis?
Sentiment Analysis in Machine Learning applications is used to train machines to analyze and predict the emotion or sentiment associated with a sentence, word, or a piece of text.
What is Natural Language Processing?
Natural language processing deals with training machines to process and analyze large amounts of natural language data.
What are the best public data sets for Machine learning?
There are a number of public datasets used for machine learning. Some popular datasets and sources are:
- Google
- Amazon
- Data.World
- Enigma public
- Microsoft data science for research
- Archive
- Archive.org datasets
- CMU Stat Lab collections
- CMU JASA data archive
- UCI Machine Learning Repository
- KDNuggets Data Collections
- Data.Gov
- Data Hub
- GitHub
- Kaggle data set
- KDnuggets
- Eurostat
- Re3data
- FAIRsharing
- Research pipeline
- Reddit
- Buzzfeed
- FiveThirtyEight
- Federal Highway Administration
- National Travel and Tourism Office
- Medicare
- The Centers for Disease Control and Prevention (CDC)
- WHO
- Quandl
What are the differences between Machine learning and Artificial Intelligence?
Machine Learning (ML):
- Learning is defined as the acquisition of knowledge or a skill.
- The aim is to increase accuracy; it does not care about the chance of success.
- The concept is simple: the machine takes data and learns from that data.
- The goal is to learn from data on a certain task in order to maximize the machine's performance on that task.
- ML allows a system to learn new things from data.
- It involves creating self-learning algorithms.
- Machine learning will settle on a single solution for a problem, whether or not it is optimal.
- Machine learning leads to knowledge.
Artificial Intelligence (AI):
- Intelligence is defined as the ability to acquire and apply knowledge.
- The aim is to increase the chance of success, not accuracy.
- It works as a computer program that does smart work.
- The goal is to simulate natural intelligence to solve complex problems.
- Artificial Intelligence is about decision making.
- It leads to developing systems that mimic how humans respond and behave in given circumstances.
- Artificial Intelligence will search for the optimal solution.
- Artificial Intelligence leads to intelligence or wisdom.