Machine learning day - lecture notes
Notes from Praseed Pai’s lecture, Machine learning day - KMUG (Praseed Pai) - 9-JUL-2016
- Analytical thinking vs system thinking
- analytical
- break problem down and solve
- system thinking
- holistic approach, nonlinear
- assume dependent vars
- analytical
-
Algorithmic techniques
- Hilbert space methods
- proximity queries between datasets
- Hilbert’s 23 problems (esp. pbm #2 and #10)
- statistical learning
- types of statistics
- non parametric statistics
- categorical/nominal data
- ordinal data (signifies order)
- parametric
- ratio
- interval
- non parametric statistics
- descriptive
- central tendency
- dispersion
- association
- types of statistics
- deep learning
- neural networks
- Hilbert space methods
-
Algorithmic classification
- supervised learning
- classification
- regression/prediction
- classification based on numerical data
- unsupervised
- clustering
- dimensionality reduction
- association analysis
- apriori
-
eg. Given historical retail data, decide whether customers who purchase bread and sugar should be offered a coupon for another product, say beer. We can solve this by finding the % of baskets that have beer in addn. to bread and sugar.
-
P(Y X) - prob of Y given X. TxnID | items ---------+-------------------- 1 | shoes, shirt, jacket 2 | shoes, jacket 3 | shoes, jeans 4 | shirt, sweatshirt items | Frequency ------------------+---------- shoes | 75% shirt | 50% {shoes, jacket} | 50%
-
- apriori
- supervised learning
- Decision tree classifier
- generate decision tree based on inputs
- Naive Bayes
- initial condition - priori probability
- adjust probability based on new data
- assumes independent variables
- posterior probability - calculate based on priori data
- eg. given learning data height, weight, foot size, predict gender.
- read up:
- false positive % - sensitivity
- false negative % - specificity
- base rate
- monty hall problem
- Weka - ML tool
MOOCs
- Caltech ML course by yasser abu mustafa
- Weka MOOC
Recommended books
- Machine Learning (Tom Mitchell)
- Statistics hacks (Bruce Frey)
- Financial Numerical Recipes in C++ (available online)
- Data Science for Dummies Using Python
- Machine Learning with Scikit Learn - Packt