Information for STOR 565 (Spring 2014)
Note: Information is subject
to change. Please check back for updates.
Machine learning refers to a broad family of methods for gaining
information about a phenomenon or object of interest from
empirical observations. Machine learning draws on ideas from
statistics, optimization, computer science, and probability,
and has been used to address a number of substantive applied
problems. STOR 565 is an advanced undergraduate/master's-level
course intended to provide a broad-based
introduction to machine learning that is suitable for students
with an appropriate mathematical and statistical background.
Prerequisites: MATH 233,
STOR 215 or MATH 381, STOR 435, and MATH 547 (linear
algebra). Recommended: STOR 555 (statistical
inference). Some prior experience with basic programming
is desirable but not required.
Class Meetings: Class will be held Tuesday and Thursday
2:00-3:15pm in Hanes 130. In addition, there will be a weekly
review session on Wednesday from 6:00-7:30pm in Hanes 130,
run by the TA.
Registration: Enrollment and registration for
the course are handled by Christopher Schrieber in the
Department of Statistics and Operations Research. Mr.
Schrieber can be reached by email at
email@example.com. Please note that the
instructor does not have control over the class rolls.
Instructor: Andrew B. Nobel, Department
of Statistics and Operations Research
Office: 308. Phone: 962-1352.
Office Hours: Monday 2:30-3:30pm, Wednesday
Teaching Assistant: James Wilson
Hanes B40 Email:
Office Hours: TBA
Textbook: At present, there is no
required textbook for the course. Lecture material will come
from different sources; we will provide links to relevant
online material and appropriate lecture notes.
Other Texts: Much of the material in the course is covered, at
a slightly more advanced level, in the text ``The Elements of
Statistical Learning: Data Mining, Inference, and Prediction''
by T. Hastie, R. Tibshirani, and J. Friedman, which is
publicly available online. Other books that may be useful are
``Topics in Finite and Discrete Mathematics'' by Sheldon Ross,
which contains the probability background for the course, and
``Discrete Mathematics'' by Kenneth Rosen, which contains
background material on logic and basic mathematical reasoning.
Grading: There will be a midterm
examination and a comprehensive final examination. In
addition, there will be regular homework assignments and
several computer assignments using the R programming
language. The final course grade is based on a weighted
sum of the assignment, midterm, and final scores (not letter
grades) using the following weights.
|Homework and Computer Assignments|
|Midterm|
|Final|
Policy for Homework and
Computer Assignments: Homework and computer
assignments will be posted on the course web page and, in
most cases, will be due once a week. Assignments will be
collected at the beginning
of class on the day that they are due, so please be
prepared to turn in your homework at that time.
Each assignment will be graded; late or missed assignments
will receive a grade of zero. In computing a
student's overall score for the course, their two lowest
assignment scores will be dropped. This provision
is meant to cover exceptional situations in which a
student is unable to turn in an assignment due to
circumstances beyond their control. Under ordinary
circumstances, students are expected to turn in every
homework and computer assignment.
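The score calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the official grading procedure: the weights are hypothetical parameters (the actual weights are set by the instructor), all scores are assumed to be on a common 0-100 scale, and the assignment component is assumed to be the average of the assignment scores that remain after the two lowest are dropped. The function name `course_score` and its parameters are illustrative only.

```python
def course_score(assignment_scores, midterm, final, weights, n_drop=2):
    """Illustrative weighted course score (hypothetical, not the official formula).

    Assumptions: every score is on a 0-100 scale, a missed assignment is
    recorded as 0, and `weights` is a hypothetical (assignments, midterm,
    final) triple that sums to 1.
    """
    w_assign, w_mid, w_final = weights
    if abs(w_assign + w_mid + w_final - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len(assignment_scores) <= n_drop:
        raise ValueError("need more assignments than the number dropped")

    # Drop the n_drop lowest assignment scores (zeros for missed work go first).
    kept = sorted(assignment_scores)[n_drop:]
    # Assumed: the assignment component is the mean of the remaining scores.
    assignment_avg = sum(kept) / len(kept)

    # Weighted sum of numerical scores, not letter grades.
    return w_assign * assignment_avg + w_mid * midterm + w_final * final
```

For example, with hypothetical weights of 0.3, 0.3, and 0.4, a student with assignment scores 90, 0, 85, 70, and 100 keeps 85, 90, and 100 (average about 91.67); with a midterm score of 80 and a final score of 88, `course_score([90, 0, 85, 70, 100], 80, 88, (0.3, 0.3, 0.4))` gives approximately 86.7.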
To receive full credit
on the homework assignments, you must clearly label each
problem, neatly show all your work (including your
mathematical arguments), and staple the pages of
each assignment together in the correct order. Please write
your name or initials on each page. You should give a
clear account of your reasoning in English and use full
sentences where appropriate.
You are allowed to
discuss the homework and computer assignments with other
students, but must prepare each assignment by
yourself. Copying another person's answers or
code is not allowed. Any questions regarding the grading
of homework or computer assignments should first be
addressed to the TA. If you are absent from class
when an assignment is returned, you can get your paper
from the TA during their office hours.
Exam Policy and Scheduling:
There will be one in-class midterm exam and a
comprehensive final exam, also in class. All exams
will be closed-book and closed-notes, and calculators will
not be permitted. Tentative exam dates are as follows.
The midterm will be given around the middle of the
semester. The final exam will be given at the date
and time specified in the official University Final Exam
Schedule. When the midterm is returned, a rough correspondence between
numerical scores and letter grades for that individual exam
will be provided. These letter grades are not a
prediction of your final grade. Any student whose
score is in the D or F range should come to the
instructor's office hours.
Classroom Protocol: Please show up on time, as
late arrivals tend to disturb those already present.
Reading newspapers and using laptops, tablets, and
phones are not permitted during class. Attendance
will not be taken in class. If you are unable to attend a
lecture, make sure that you obtain the notes from someone else
in the class.
Specific Prerequisites: A list of
prerequisite courses is given above. A more detailed
list, by area, is given below. Note that this list is
not exhaustive, but is meant to help students brush up on
basic material that will be used in class.
Statistics: Sample vs. population. One- and two-sample z- and
t-statistics. Hypothesis testing and p-values.
Probability: Joint and
conditional densities and probability mass functions.
Definition and basic properties of mean, variance,
covariance, correlation. Distributions and their
basic properties: Bernoulli, binomial, Poisson, geometric
(discrete); uniform, normal, exponential, gamma
(continuous).
Linear algebra: Vector
spaces, dimension, subspaces, matrix addition and
multiplication, determinants, inverses, eigenvectors,
eigenvalues, symmetric matrices, non-negative definite and positive
definite matrices.
Syllabus (tentative): The
course has four parts. The first is devoted to some basic
results from probability that are needed for the material that
follows, including Hoeffding's inequality and Jensen's
inequality. The second part of the course is devoted to the
problem of classification, which is the simplest example of
supervised learning and prediction. The third part of
the course is devoted to the problem of clustering, which is the
simplest example of unsupervised learning. The final part of
the course will consist of selected topics.
1. Exploratory Data Analysis
- Data sets and
statistics: sample mean, variance, correlation, etc.
- Q-Q plots
- Histograms and
- Comparing two
populations: z- and t-statistics
- High dimensional
- The Spectral
Theorem and Principal Component Analysis
- Basic properties
of probabilities and expectations
- Minima, maxima,
and the triangle inequality
- Markov and
- Cauchy-Schwarz inequality
- Inequalities of Hoeffding and Bernstein
4. Introduction to
clustering, agglomerative and divisive
- Examples and
learning, prediction. Classification problem.
- Class priors,
posteriors, and class densities
- Bayes risk, Bayes
- Formulas for
6. Selected Topics
discriminant analysis and extensions
- Nearest neighbor
- Training and test
- Analysis of
networks, community detection
- Multiple Testing
and the False Discovery Rate
- Support vector machines
- Penalization and
- The Vapnik-Chervonenkis (VC) dimension and the VC inequality
Honor Code: Students are expected to adhere to the
UNC honor code at all times. Violations of the honor code will
1. Keep up with the
reading and homework assignments. If the reading assignment is
long, break it up into smaller pieces (perhaps one section or
subsection at a time). Keep a pencil and scratch paper
on hand, and use these to work out the details of any argument
that is not clear to you.
2. Look over the notes
from lecture k before attending lecture k+1.
This will help keep you on top of the course material.
Ideas from one lecture often carry over to the next: you will
get much more out of the material if you can maintain a sense
of continuity and keep the ``big picture'' in mind.
3. Complete the reading
*before* doing the homework. Trying to find the right
formula or paragraph for a particular problem often takes as
much time, and it tends to create more confusion than it
4. It is important to know
what you know, but it's especially important to know what you
don't know. As
you look over new reading material or your notes, ask yourself
if you (really) understand it. Keep careful track of any
concepts and ideas that are not clear to you, and make efforts
to master these in a timely fashion, using the class notes,
outside reading, office hours, and study groups if necessary.
5. One good way of seeing
if you understand an idea or concept is to write down the
associated definitions and basic facts, without the aid of
your notes, in full, grammatical sentences. Take special
note of how you employ prepositions. It's also helpful
to state the definitions and basic facts out loud; the same
grammatical criteria apply here. Translating
ideas from mathematics to complete English sentences, and back
again, is important in mathematical research.
6. The homework and
computer assignments play two important roles in the
course. First, they provide an opportunity to actively
think about, engage with, and learn the course material.
In addition, they provide feedback on your
understanding of the material. Carefully look over your
corrected assignments. Most students do well on the
assignments, but even if you receive a good score, make sure to
note, understand, and correct any mistakes you have made.
7. Begin studying for
exams at least one week before they are given. Look over
your notes, homework, and the text. Write up a study
guide containing the main concepts and definitions being
covered, and use this to get a clear picture of the overall
landscape of the material. A study guide for the midterm will
be posted online. For every topic on the study guide,
you should know the relevant definitions, motivating ideas,
and at least one or two examples.