Department: Industrial Engineering
Credits: Bilkent 3, ECTS 5
Course Coordinator: Savaş Dayanık
Semester: 2019-2020 Fall
Contact Hours:
3 hours of lecture per week,
1 hour of Lab/Studio/Others per week

Textbook and Other Required Material:

Required Textbook: An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013, Springer

Catalog Description:
Introduction to exploratory data analysis, multivariate regression, semiparametric regression, scatterplot smoothing, linear mixed models, generalized linear models, recursive partitioning, and hidden Markov models through applications to real data sets using the statistical software R. Applications include consumer choice models, modeling the number of emergency room visits, building email spam filters, detecting fraudulent transactions, and other examples from manufacturing and service systems that illustrate big data analytics.

Prerequisite(s):
MATH 260

Assessment Methods:

No.  Type                          Label  Count  Total Contribution (%)
1    In-class participation               1      5
2    Homework                             4      10
3    Quiz                                 3      25
4    Midterm: Practical (skills)          1      25
5    Final: Practical (skills)            1      35

Minimum Requirements to Qualify for the Final Exam:
The weighted average of homework and quizzes should be at least 40%.
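The syllabus does not spell out how this weighted average is computed. A minimal sketch in Python, assuming the weights are the total contributions listed under Assessment Methods (homework 10, quizzes 25) and that scores are expressed as percentages; the function name and its parameters are hypothetical and only illustrate the arithmetic:

```python
def qualifies_for_final(homework_pct, quiz_pct,
                        hw_weight=10, quiz_weight=25, threshold=40.0):
    """Return (weighted average, eligibility flag).

    Assumption: the weights are the syllabus contributions of homework (10)
    and quizzes (25); the actual computation may differ.
    """
    weighted = (hw_weight * homework_pct + quiz_weight * quiz_pct) / (hw_weight + quiz_weight)
    return weighted, weighted >= threshold


# Example: 70% homework average and 30% quiz average
average, eligible = qualifies_for_final(70, 30)
print(round(average, 2), eligible)
```

Under these assumed weights, quizzes count 2.5 times as much as homework, so a strong homework average only partly offsets weak quiz results.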

Course Learning Outcomes:
Course Learning Outcome                                                  Assessment
Recognize the properties of fundamental statistical models               Quiz; Midterm: Practical (skills); Final: Practical (skills)
Explore features of a data set through graphics and summary statistics   Quiz; Midterm: Practical (skills); Final: Practical (skills)
Fit a sensible statistical model to a given data set                     Quiz; Midterm: Practical (skills); Final: Practical (skills)
Take advantage of the bias-variance tradeoff and avoid overfitting       Quiz; Midterm: Practical (skills); Final: Practical (skills)
Select the best model for a given data set                               Quiz; Midterm: Practical (skills); Final: Practical (skills)

Weekly Syllabus:
 Introduction to statistical learning and R, overview of regression and classification problems
 Linear regression (Illustrations: effects of budgets allocated for TV, newspaper, radio advertisement on annual sales, prediction of credit card balance from income, limit, rating, age, number of cards, and education level)
 Linear regression continued and k-nearest neighbour regression (Illustration: conjoint analysis from marketing science; how can you design a new product with a higher market penetration?)
 Logistic regression (Illustration: loan default probability estimation from credit card balance, income, occupation)
 Multinomial and Poisson regressions (Illustration: would it have been possible to predict the Challenger disaster? https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster)
 Linear discriminant analysis (Illustrations: revisit credit card default probability estimation and Challenger disaster)
 Cross-validation, linear model selection, subset selection (Illustrations: which of the variables income, limit, rating, age, number of cards, and education level best explain the credit card balance or default probability? Is logistic regression or linear discriminant analysis better for predicting the loan default probability?)
 Shrinkage methods, ridge regression and lasso (What if the number of predictors is large, comparable to the number of examples? Illustration: prediction of salaries of baseball players from various measures of their performance in past games)
 Polynomial regression, regression splines, smoothing splines (Illustrations: modeling the wage as a function of age, and the amount of pollutants in a residential area as a function of its distance from employment centers)
 Local regression, generalized additive models for quantitative and categorical variables (Illustrations: revisit wage and pollutant examples)
 Regression trees (Illustrations: predict the baseball player salaries, car seat sales)
 Classification trees (Illustrations: email spam filtering: when is an email message spam? Predict crime rate in a residential area)
 Bagging, random forests, boosting (Illustrations: revisit baseball player salary, email spam, and crime-rate examples)
 Principal component analysis, k-means and hierarchical clustering (Illustrations: handwritten digit recognition, clustering cancer cells according to microarray data, market-basket data)

Type of Course:
Lecture

Teaching Methods:
Lecture, Exercises, Assignment
