Preliminary Course Information for STOR 565 (Spring 2014)

Introduction to Machine Learning

**Note:** Information subject
to change. Please check back for changes.

Overview: Machine learning refers to a broad family of methods for gaining
information about a phenomenon or object of interest from
empirical observations. Machine learning draws on ideas from
statistics, optimization, computer science, and probability,
and has been used to address a number of substantive applied
problems. STOR 565 is an advanced-undergraduate /
masters level course intended to provide a broad-based
introduction to machine learning that is suitable for students
with appropriate mathematical and statistical background.

Prerequisites: MATH 233, STOR 215 or MATH 381, STOR 435, and MATH 547 (linear algebra). Recommended: STOR 555 (statistical inference). Some prior experience with basic programming is desirable, but is not required.

**Class Meetings:** Class will be held Tuesday and Thursday
2:00 - 3:15pm in Hanes 130. In addition, there will be a **weekly
review session** on Wednesday from 6-7:30pm in Hanes 130,
run by the TA.

Registration: Enrollment and registration for the course are handled by Christopher Schrieber in the Department of Statistics and Operations Research. Mr. Schrieber can be reached by email at cschrieb@email.unc.edu. **Please note that the instructor does not have control over the class rolls.**

Instructor: Andrew B. Nobel, Department of Statistics and Operations Research

Office: Hanes 308 Phone: 962-1352.

Office Hours: Monday 2:30-3:30pm, Wednesday 4-5pm.

Teaching Assistant: James Wilson

Office: Hanes B40 Email: jameswd@email.unc.edu

Office Hours: TBA

Textbook: At present, there is no required textbook for the course. Lecture material will come from different sources; we will provide links to relevant online material and appropriate lecture notes.

Other Texts: Much of the material in the course is covered, at a slightly more advanced level, in the text "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by T. Hastie, R. Tibshirani, and J. Friedman, which is publicly available online. Other books that may be useful are "Topics in Finite and Discrete Mathematics" by Sheldon Ross, which contains the probability background for the course, and "Discrete Mathematics" by Kenneth Rosen, which contains background material on logic and basic mathematical reasoning.

**Grading:** There will be a midterm
examination and a comprehensive final examination. In
addition, there will be regular homework assignments and
several computer assignments using the R programming
language. The final course grade is based on a weighted
sum of Assignment, Midterm, and Final scores (numerical scores,
not letter grades) using the following weights.

| Component | Weight |
| --- | --- |
| Homework and Computer Assignments | 20% |
| Midterm | 30% |
| Final and Final Project | 50% |

Policy for Homework and Computer Assignments: Homework and computer assignments will be posted on the course web page, and in most cases will be due once a week. Assignments will be collected at the beginning of class on the day that they are due, so please be prepared to turn in your homework at that time.

Each assignment will be graded; late or missed assignments will receive a grade of zero. In computing a student's overall score for the course, their two lowest assignment scores will be dropped. This latter provision is meant to cover exceptional situations in which a student is unable to turn in an assignment due to circumstances beyond their control. Under ordinary circumstances, students are expected to turn in every homework and computer assignment.

To receive full credit on the homework assignments, you must clearly label each problem, neatly show all your work (including your mathematical arguments), and staple together the pages of each assignment in the correct order. Please write your name or initials on each page. You should give a clear account of your reasoning in English, and use full sentences where appropriate.

You are allowed to discuss the homework and computer assignments with other students, but must prepare each assignment by yourself. Copying another person's answers or code is not allowed. Any questions regarding the grading of homework or computer assignments should first be addressed to the TA. If you are absent from class when an assignment is returned, you can get your paper from the TA during their office hours.

Exam Policy and Scheduling: There will be one in-class midterm exam and a comprehensive final exam, also in-class. All exams will be closed book and closed notes, and without calculators. The midterm will be given around the middle of the semester. The final exam will be given at the date and time specified in the official University Final Exam Schedule. Tentative exam dates are as follows.

| Exam | Date |
| --- | --- |
| Midterm | TBA |
| Final | See University Timetable |

When the Midterm is returned, a rough correspondence between numerical scores and letter grades for that individual exam will be provided. These letter grades are not a prediction of your final grade. Any student whose score is in the D or F range should come to the instructor's office hours.

Classroom Protocol: Please show up on time, as late arrivals tend to disturb those already present. Reading newspapers and using laptops, tablets, or phones are not permitted during class. Attendance will not be taken in class. If you are unable to make a lecture, see to it that you obtain the notes from someone else in the class.

**Specific Prerequisites**: A list of
prerequisite courses is given above. A more detailed
list, by area, is given below. Note that this list is
not exhaustive, but is meant to help students brush up on
basic material that will be used in class.

Statistics: Sample vs. population. One- and two-sample z- and t-statistics. Hypothesis testing and p-values.

Probability: Joint and conditional densities and probability mass functions. Definition and basic properties of mean, variance, covariance, correlation. Distributions and their basic properties: Bernoulli, binomial, Poisson, geometric (discrete); uniform, normal, exponential, gamma (continuous).

Linear algebra: Vector spaces, dimension, subspaces, matrix addition and multiplication, determinants, inverses, eigenvectors, eigenvalues, symmetric matrices, non-negative and positive definite matrices.

Syllabus (tentative): The
course has four parts. The first is devoted to some basic
results from probability that are needed for the material that
follows, including Hoeffding's inequality and Jensen's
inequality. The second part of the course is devoted to the
problem of classification, which is the simplest example of
supervised learning and prediction. The third part of
the course is devoted to the problem of clustering, which is the
simplest example of unsupervised learning. The final part of
the course will consist of selected topics.

1. Exploratory Data Analysis

- Data sets and examples
- Summary statistics: sample mean, variance, correlation, etc.
- Q-Q plots
- Histograms and scatter plots
- Comparing two populations: z- and t-statistics
- High-dimensional data
- The Spectral Theorem and Principal Component Analysis

2. Probability

- Basic properties of probabilities and expectations
- Minima, maxima, and the triangle inequality
- Markov and Chebyshev inequalities
- Jensen’s inequality
- Cauchy-Schwarz inequality
- Exponential Inequalities of Hoeffding and Bernstein

- Unsupervised Learning
- K-means
- Hierarchical clustering, agglomerative and divisive
- Examples and applications

- Supervised learning, prediction. Classification problem.
- Class priors, posteriors, and class densities
- Bayes risk, Bayes classifier
- Formulas for Bayes risk

- Overfitting
- Linear discriminant analysis
- Quadratic discriminant analysis and extensions
- Logistic regression
- Nearest neighbor
- Training and test error
- Cross-validation

- Analysis of networks, community detection
- Biclustering
- Multiple Testing and the False Discovery Rate

- Support vector machines
- Regression
- Penalization and the LASSO
- The Vapnik-Chervonenkis (VC) dimension and the VC inequality

Honor Code: Students are expected to adhere to the UNC honor code at all times. Violations of the honor code will be prosecuted.

Study tips:

1. Keep up with the reading and homework assignments. If the reading assignment is long, break it up into smaller pieces (perhaps one section or subsection at a time). Keep a pencil and scratch paper on hand, and use these to work out the details of any argument that is not clear to you.

2. Look over the notes from lecture k before attending lecture k+1. This will help keep you on top of the course material. Ideas from one lecture often carry over to the next: you will get much more out of the material if you can maintain a sense of continuity and keep the "big picture" in mind.

3. Complete the reading *before* doing the homework. Trying to find the right formula or paragraph for a particular problem often takes as much time, and it tends to create more confusion than it resolves.

4. It is important to know what you know, but it's especially important to know what you don't know. As you look over new reading material or your notes, ask yourself if you (really) understand it. Keep careful track of any concepts and ideas that are not clear to you, and make efforts to master these in a timely fashion, using the class notes, outside reading, office hours, and study groups if necessary.

5. One good way of seeing if you understand an idea or concept is to write down the associated definitions and basic facts, without the aid of your notes, in full, grammatical sentences. Take special note of how you employ prepositions. It's also helpful to state the definitions and basic facts out loud -- the same grammatical criteria apply here. Translating ideas from mathematics to complete English sentences, and back again, is important in mathematical research.

6. The homework and computer assignments play two important roles in the course. First, they provide an opportunity to actively think about, engage with, and learn the course material. In addition, they provide feedback on your understanding of the material. Carefully look over your corrected assignments. Most students do well on the assignments: even if you received a good score, make sure to note, understand, and correct any mistakes you have made.

7. Begin studying for exams at least one week before they are given. Look over your notes, homework, and the text. Write up a study guide containing the main concepts and definitions being covered, and use this to get a clear picture of the overall landscape of the material. A study guide for each midterm will be posted online. For every topic on the study guide, you should know the relevant definitions, motivating ideas, and at least one or two examples.