Preliminary Course Information for STOR 565 (Spring 2014)
Introduction to Machine Learning


Note: Information subject to change.  Please check back for changes.

 


Overview: Machine learning refers to a broad family of methods for gaining information about a phenomenon or object of interest from empirical observations. Machine learning draws on ideas from statistics, optimization, computer science, and probability, and has been used to address a number of substantive applied problems.  STOR 565 is an advanced-undergraduate / master's-level course intended to provide a broad-based introduction to machine learning that is suitable for students with appropriate mathematical and statistical background.

Prerequisites: MATH 233, STOR 215 or MATH 381, STOR 435, and MATH 547 (linear algebra).  Recommended: STOR 555 (statistical inference).  Some prior experience with basic programming is desirable, but is not required. 

Class Meetings: Class will be held Tuesday and Thursday 2:00 - 3:15pm in Hanes 130.  In addition, there will be a weekly review session on Wednesday from 6-7:30pm in Hanes 130, run by the TA.

Registration: Enrollment and registration for the course are handled by Christopher Schrieber in the Department of Statistics and Operations Research.  Mr. Schrieber can be reached by email at cschrieb@email.unc.edu.   Please note that the instructor does not have control over the class rolls. 
 

Instructor:  Andrew B. Nobel, Department of Statistics and Operations Research

Office: Hanes 308       Phone: 962-1352.

Office Hours: Monday 2:30-3:30pm, Wednesday 4-5pm.
 

Teaching Assistant: James Wilson

Office: Hanes B40     Email: jameswd@email.unc.edu

Office Hours: TBA

 

Textbook:  At present, there is no required textbook for the course. Lecture material will come from different sources; we will provide links to relevant online material and appropriate lecture notes.

Other Texts: Much of the material in the course is covered, at a slightly more advanced level, in the text ``The Elements of Statistical Learning: Data Mining, Inference, and Prediction'' by T. Hastie, R. Tibshirani, and J. Friedman, which is publicly available online. Other books that may be useful are ``Topics in Finite and Discrete Mathematics'' by Sheldon Ross, which contains the probability background for the course, and ``Discrete Mathematics'' by Kenneth Rosen, which contains background material on logic and basic mathematical reasoning.


Grading: There will be a midterm examination and a comprehensive final examination.  In addition, there will be regular homework assignments and several computer assignments using the R programming language.  The final course grade is based on a weighted sum of Assignment, Midterm and Final scores (numerical scores, not letter grades) using the following weights.

Homework and Computer Assignments  20%
Midterm  30%
Final and Final Project  50%
 

Policy for Homework and Computer Assignments: Homework and computer assignments will be posted on the course web page, and in most cases will be due once a week. Assignments will be collected at the beginning of class on the day that they are due, so please be prepared to turn in your homework at that time. 

Each assignment will be graded: late/missed assignments will receive a grade of zero.  In computing a student's overall score for the course, their two lowest assignment scores will be dropped. This latter provision is meant to cover exceptional situations in which a student is unable to turn in an assignment due to circumstances beyond his/her control.  Under ordinary circumstances, students are expected to turn in every homework and computer assignment.
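
As an illustration of how the grading weights and the drop-two-lowest rule might combine, here is a minimal Python sketch. It is not the official grading procedure: the function name `course_score`, the 0-100 score scale, the equal weighting of individual assignments, and the treatment of the final exam and final project as a single combined score are all assumptions made for this example.

```python
def course_score(assignments, midterm, final, n_dropped=2):
    """Weighted course score on a 0-100 scale (illustrative only).

    Assumptions not stated in the syllabus: every score is a percentage,
    all assignments count equally, and `final` already combines the
    final exam and the final project into one number.
    """
    # Late or missed assignments should be entered as 0 by the caller.
    kept = sorted(assignments)[n_dropped:]  # discard the lowest scores
    if not kept:
        raise ValueError("need more assignments than dropped scores")
    assignment_avg = sum(kept) / len(kept)
    # Weights from the syllabus: 20% assignments, 30% midterm, 50% final.
    return 0.20 * assignment_avg + 0.30 * midterm + 0.50 * final


# Hypothetical student: six assignments, two of them missed (scored 0).
scores = [90, 0, 85, 100, 0, 95]
print(course_score(scores, midterm=80, final=90))
```

In this hypothetical case the two zeros are the scores that get dropped, so the assignment average is taken over the remaining four scores.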


To receive full credit on the homework assignments, you must clearly label each problem, neatly show all your work (including your mathematical arguments), and staple together the pages of each assignment in the correct order.  Please write your name or initials on each page. You should give a clear account of your reasoning in English, and use full sentences where appropriate.

You are allowed to discuss the homework and computer assignments with other students, but must prepare each assignment by yourself.  Copying of another person's answers or code is not allowed. Any questions regarding the grading of homework or computer assignments should first be addressed to the TA.  If you are absent from class when an assignment is returned, you can get your paper from the TA during their office hours.
 

Exam Policy and Scheduling:   There will be one in-class midterm exam, and a comprehensive final exam, also in-class.  All exams will be closed book and closed notes, and without calculators. Tentative exam dates are as follows.  The midterm will be given around the middle of the semester.  The final exam will be given at the date and time specified in the official University Final Exam Schedule.

Midterm  TBA
Final See University Timetable

When the Midterm is returned, a rough correspondence between numerical scores and letter grades for that individual exam will be provided.  These letter grades are not a prediction of your final grade.  Any student whose score is in the D or F range should come to the instructor's office hours. 
 


Classroom Protocol: Please show up on time, as late arrivals tend to disturb those already present.  Reading newspapers and using laptops, tablets, or phones is not permitted during class.   Attendance will not be taken in class.  If you are unable to make a lecture, see to it that you obtain the notes from someone else in the class.


Specific Prerequisites: A list of prerequisite courses is given above.   A more detailed list, by area, is given below.  Note that this list is not exhaustive, but is meant to help students brush up on basic material that will be used in class.

Statistics: Sample vs. population.  One- and two-sample z- and t-statistics.  Hypothesis testing and p-values.

Probability: Joint and conditional densities and probability mass functions.   Definition and basic properties of mean, variance, covariance, correlation.  Distributions and their basic properties: Bernoulli, binomial, Poisson, geometric (discrete); uniform, normal, exponential, gamma (continuous).

Linear algebra: Vector spaces, dimension, subspaces, matrix addition and multiplication, determinants, inverses, eigenvectors, eigenvalues, symmetric matrices, non-negative and positive definite matrices.


Syllabus (tentative): The course has four parts. The first is devoted to some basic results from probability that are needed for the material that follows, including Hoeffding's inequality and Jensen's inequality. The second part of the course is devoted to the problem of classification, which is the simplest example of supervised learning and prediction.  The third part of the course is devoted to the problem of clustering, which is the simplest example of unsupervised learning. The final part of the course will consist of selected topics.
 

1. Exploratory Data Analysis
2. Probability
3. Clustering
4. Introduction to Classification
5. Classification Methods
6. Selected Topics
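
For students who want a preview of the probability results named in the syllabus description, the standard textbook statements of the two inequalities are sketched below. The notation (X_i, a_i, b_i, t, \varphi) is chosen for this illustration and may differ from the notation used in lecture.

```latex
% Hoeffding's inequality: X_1, ..., X_n independent, with a_i <= X_i <= b_i.
\[
  \mathbb{P}\Bigl( \sum_{i=1}^{n} \bigl( X_i - \mathbb{E} X_i \bigr) \ge t \Bigr)
  \;\le\; \exp\Bigl( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \Bigr),
  \qquad t > 0 .
\]

% Jensen's inequality: \varphi convex, X a random variable with finite mean.
\[
  \varphi\bigl( \mathbb{E} X \bigr) \;\le\; \mathbb{E}\, \varphi(X) .
\]
```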

Honor Code:
Students are expected to adhere to the UNC honor code at all times. Violations of the honor code will be prosecuted.



Study tips:     

1. Keep up with the reading and homework assignments.  If the reading assignment is long, break it up into smaller pieces (perhaps one section or subsection at a time).  Keep a pencil and scratch paper on hand, and use these to work out the details of any argument that is not clear to you.

2. Look over the notes from lecture k before attending lecture k+1.   This will help keep you on top of the course material.  Ideas from one lecture often carry over to the next: you will get much more out of the material if you can maintain a sense of continuity and keep the ``big picture'' in mind.

3. Complete the reading *before* doing the homework.  Hunting for the right formula or paragraph for a particular problem, without having done the reading, often takes just as much time and tends to create more confusion than it resolves. 

4. It is important to know what you know, but it's especially important to know what you don't know.  As you look over new reading material or your notes, ask yourself if you (really) understand it.  Keep careful track of any concepts and ideas that are not clear to you, and make efforts to master these in a timely fashion, using the class notes, outside reading, office hours, and study groups if necessary.

5. One good way of seeing if you understand an idea or concept is to write down the associated definitions and basic facts, without the aid of your notes, in full, grammatical sentences.  Take special note of how you employ prepositions.  It's also helpful to state the definitions and basic facts out loud -- the same grammatical criteria apply here.    Translating ideas from mathematics to complete English sentences, and back again, is important in mathematical research.

6. The homework and computer assignments play two important roles in the course.  First, they provide an opportunity to actively think about, engage with, and learn the course material.   In addition, they provide feedback on your understanding of the material.  Carefully look over your corrected assignments.  Most students do well on the assignments: even if you received a good score, make sure to note, understand, and correct any mistakes you have made.

7. Begin studying for exams at least one week before they are given.  Look over your notes, homework, and the text.  Write up a study guide containing the main concepts and definitions being covered, and use this to get a clear picture of the overall landscape of the material. A study guide for each midterm will be posted online.  For every topic on the study guide, you should know the relevant definitions, motivating ideas, and at least one or two examples.