Mini-Project Guidelines:

Everyone will present their mini-project results during our extended final-exam period on 12/12 from noon-5pm. The order of presentations will be random. You will be allotted 15 minutes to show your results and explain novel aspects of your approach. You may use either a PowerPoint presentation (no more than 10 slides) or a web page to demonstrate your results. Your presentation will account for 50% of your mini-project grade.

In addition, you are required to turn in a short write-up of your project that concentrates on describing novel aspects of your approach and your results. The write-up should be approximately one page in length and should not exceed two pages (PDF format is preferred). You should turn in this write-up, all of your mini-project code, and a separate README file explaining how to run your code, as a zip archive, which must be emailed to mcmillan@cs.unc.edu before midnight on 12/12.

All projects must be written in Python, and should be individual efforts. The last three projects are associated with potential RA opportunities, if that is of interest.


Project #1: Eigenfaces

This project involves performing PCA on images of your classmates. PCA is generally applied to a set of points with common features. In order to align this data set's features you should first rectify each image. This is best accomplished using an affine transform to align the two eyes and mouth of each image, followed by cropping them to a common image size. A good reconstruction filter will minimize the quality loss at this stage.
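One way to set up the rectification step is to solve for the affine transform that carries the three landmarks of an image (two eyes and mouth) onto fixed target positions. The sketch below assumes the landmark coordinates have already been located, for instance by hand; the function names and the idea of passing landmarks as (x, y) pairs are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def affine_from_landmarks(src, dst):
    """Return the 2x3 matrix A such that A @ [x, y, 1] maps each src point to its dst point."""
    src = np.asarray(src, dtype=float)          # shape (3, 2): left eye, right eye, mouth
    dst = np.asarray(dst, dtype=float)          # shape (3, 2): common target positions
    src_h = np.hstack([src, np.ones((3, 1))])   # homogeneous coordinates, shape (3, 3)
    # Three non-collinear correspondences determine the six affine parameters exactly.
    return np.linalg.solve(src_h, dst).T

def apply_affine(A, pts):
    """Apply the 2x3 affine matrix A to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]
```

To warp an image rather than points, the inverse of this transform would typically be evaluated at each output pixel, with the source image sampled there through your chosen reconstruction filter.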

Once rectified, perform PCA on the resulting set of images, treating each image as a single point. Find the number of factors necessary to capture 95% of the data set's total variance, and compute the scaling weights for each input onto this set of factors (i.e. the projection of each point onto the PCA factors). Compute the Mean Absolute Error (MAE) of each point's reconstruction, and keep track of which one (who) is the largest outlier. Also, generate images of the mean point and the first two PCA factors.
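The PCA steps described above can be sketched with a singular value decomposition. This is a minimal sketch, assuming each rectified image has been flattened into one row of a matrix; the function name `pca_faces` and its return layout are hypothetical choices for illustration.

```python
import numpy as np

def pca_faces(images, variance_target=0.95):
    """PCA of flattened images; returns mean, factors, weights, per-image MAE, and factor count."""
    X = np.asarray(images, dtype=float)            # shape (n_images, n_pixels)
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the data on the mean image
    # Rows of Vt are the principal directions; s**2 is proportional to the variance of each.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of factors whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(explained, variance_target)) + 1, len(s))
    factors = Vt[:k]
    weights = Xc @ factors.T                       # projection of each image onto each factor
    recon = mean + weights @ factors               # reconstruction from the kept factors
    mae = np.abs(X - recon).mean(axis=1)           # mean absolute error per image
    return mean, factors, weights, mae, k
```

With this layout, `int(np.argmax(mae))` gives the index of the largest outlier, and `mean`, `factors[0]`, and `factors[1]` can be reshaped back to the image dimensions for display.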



Hints and Options:
  • Find headshot images of two celebrities of your own choosing. Rectify them, and compute their projections onto the eigenvectors of the class factors you computed previously, as well as their reconstruction in 665 space (you need not include them in the PCA analysis). Compute their MAE as well.
  • Construct a plot of all samples projected onto the first two eigenvectors. Label each point, and identify the closest pair of points with a line segment.
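For the plotting hint, the closest pair can be found directly from the 2D projections. The following is a brute-force sketch, assuming the projections are available as an (n, 2) array; `closest_pair` is a hypothetical helper name.

```python
import numpy as np

def closest_pair(points):
    """Return (i, j, distance) for the two closest rows of an (n, 2) array."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))      # full pairwise distance matrix
    np.fill_diagonal(dist, np.inf)              # ignore each point's distance to itself
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])
```

The returned indices identify the endpoints of the line segment to draw between the two labeled points.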

Project #2: Tubule Filter

The following images are of testicular histology studies. The nearly round substructures packed within the larger oval are cross-sections of seminiferous tubules. The goal of this project is to design a filter, or combination of filters, whose response is maximal when centered over a tubule. The tricky part of this problem is that your filter should work for any image scale/resolution.

We are only interested in detecting tubules that are nearly circular in their presentation (the oblong and twisted compartments can be safely ignored). Missing a few tubules (false negatives) is preferred over incorrect detections (false positives). In particular, you should strive not to mislabel any of the gaps (white regions) between tubules.

Feel free to use any of the tools discussed in class; in particular, spatial convolution, Fourier analysis, and wavelet analysis may prove useful.
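As one possible starting point (not the expected solution), a center-surround filter can be evaluated at several radii and the strongest response kept per pixel. The sketch below assumes the image has been preprocessed so that tubules are brighter than their surroundings (for example by inverting intensities, since the gaps are white), and it uses circular FFT convolution, so responses near the border wrap around. The function names and the 1.5x ring width are arbitrary illustrative choices.

```python
import numpy as np

def center_surround_response(image, radius):
    """Mean intensity inside a disk minus mean intensity in the ring just outside it."""
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    # Wrap-around distance from pixel (0, 0), so the kernel is centered at the origin.
    r = np.hypot(np.minimum(yy, h - yy), np.minimum(xx, w - xx))
    disk = r <= radius
    ring = (r > radius) & (r <= 1.5 * radius)
    # Normalizing each part by its area keeps responses comparable across radii.
    kernel = disk / disk.sum() - ring / ring.sum()
    # The kernel is symmetric, so this circular convolution is also a correlation.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

def multiscale_response(image, radii):
    """Per-pixel maximum response over all radii, plus the radius that produced it."""
    stack = np.stack([center_surround_response(image, r) for r in radii])
    return stack.max(axis=0), np.asarray(radii)[stack.argmax(axis=0)]
```

Radii should be at least a few pixels so that the ring is non-empty. Choosing the radii as fractions of the image size, rather than fixed pixel counts, is one way to approach the scale requirement.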

Use the color channels to visualize your filter's response. For example, you might display all regions exceeding a given response threshold as an intensity in the red channel. In any case, you must provide some visualization of your results overlaid on the given images.
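A red-channel overlay of this kind might look like the following sketch. It assumes a grayscale image with values in [0, 1] and a response map of the same shape; `overlay_response` and the 0.5 dimming factor are hypothetical choices.

```python
import numpy as np

def overlay_response(gray, response, threshold):
    """Return an RGB image where above-threshold responses are tinted red."""
    gray = np.asarray(gray, dtype=float)
    response = np.asarray(response, dtype=float)
    rgb = np.stack([gray, gray, gray], axis=-1)
    mask = response > threshold
    strength = np.zeros_like(gray)
    if mask.any():
        peak = response[mask].max()
        # Rescale above-threshold responses to (0, 1], with the strongest at 1.
        strength[mask] = (response[mask] - threshold) / max(peak - threshold, 1e-12)
    rgb[..., 0] = np.maximum(rgb[..., 0], strength)   # raise red where the response is strong
    rgb[..., 1] *= 1.0 - 0.5 * strength               # dim green and blue so the tint shows
    rgb[..., 2] *= 1.0 - 0.5 * strength
    return rgb
```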



Hints and Options:
  • You can preprocess the original image in various ways prior to applying your filter. For example, you might want to adjust or normalize the intensity levels.
  • Use different colors to indicate the confidence of your detection.
  • Automatically fill in each tubule, indicating it with a color tint.
  • Count the number of tubules detected, and the ratio of their area to the area of the entire oval.
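For the counting hint, one simple approach is to label the connected regions of a binary detection mask. The sketch below uses a plain breadth-first flood fill with 4-connectivity and assumes that you already have boolean masks for the detected tubules and for the enclosing oval; both function names are hypothetical.

```python
from collections import deque

import numpy as np

def label_regions(mask):
    """Label 4-connected regions of a boolean mask; returns (labels, region_count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                        # already reached from an earlier seed
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count

def tubule_stats(tubule_mask, oval_mask):
    """Number of detected regions and their total area relative to the oval's area."""
    _, count = label_regions(tubule_mask)
    ratio = np.count_nonzero(tubule_mask) / np.count_nonzero(oval_mask)
    return count, ratio
```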

Project #3: Motility Movie Analysis

This project analyzes the motility of spermatozoa using 90 movie frames captured from a light microscope. The first objective is to separate the moving objects from the fixed background clutter. This requires extracting a background matte image, and then comparing each movie frame against it. Either a temporal average or a temporal median image can be used.
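A minimal sketch of the background step, assuming the frames have been loaded as grayscale arrays of identical shape; the function names and the threshold parameter are illustrative assumptions.

```python
import numpy as np

def background_matte(frames, use_median=True):
    """Per-pixel temporal median (or mean) over a stack of frames."""
    stack = np.asarray(frames, dtype=float)       # shape (n_frames, height, width)
    return np.median(stack, axis=0) if use_median else stack.mean(axis=0)

def moving_mask(frame, background, threshold):
    """Boolean mask of pixels that differ from the background by more than threshold."""
    return np.abs(np.asarray(frame, dtype=float) - background) > threshold
```

The median ignores an object that covers a pixel in fewer than half of the frames, whereas the average retains a faint trace of it.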

The next requirement is to construct a summary image, which integrates the moving elements along their motion paths, with the previously computed background removed. The motion path should be pseudo-colored by tinting the elements of the first frame blue with a smooth transition to shades of red in the last frame.
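The summary image can be sketched as a weighted sum over frames, where each frame's foreground is tinted by a color that moves from blue to red over time. The linear blend, the thresholding, and the final normalization below are assumptions made for illustration; `motion_summary` is a hypothetical name.

```python
import numpy as np

def motion_summary(frames, background, threshold):
    """Accumulate background-subtracted frames, tinted blue (first) through red (last)."""
    stack = np.asarray(frames, dtype=float)       # shape (n_frames, height, width)
    n, h, w = stack.shape
    summary = np.zeros((h, w, 3))
    for i, frame in enumerate(stack):
        t = i / max(n - 1, 1)                     # 0 at the first frame, 1 at the last
        color = np.array([t, 0.0, 1.0 - t])       # RGB: blue fades out as red fades in
        diff = np.abs(frame - background)
        strength = np.where(diff > threshold, diff, 0.0)
        summary += strength[..., None] * color
    peak = summary.max()
    return summary / peak if peak > 0 else summary
```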


Three example frames. Archive with all frames

Hints and Options:
  • Extend your code to individually track each moving item separately. You should report the trajectory as a 4-tuple of (item_number, frame, x, y), and you can visualize your result by assigning a different color to each item.
  • Reintroduce the background into your animation by tinting it green and estimating a per-pixel alpha/opacity value for the animation (note that a median background matte might provide better results for this option). Then overlay the animation over the background. Experiment with other backgrounds.
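For the compositing hint, a standard "over" blend is one option. In the sketch below, alpha is estimated from the absolute difference between a frame and the background, divided by an assumed `scale` parameter; this estimate and both function names are illustrative assumptions rather than a prescribed method.

```python
import numpy as np

def estimate_alpha(frame, background, scale):
    """Opacity in [0, 1] that grows with the difference from the background."""
    return np.clip(np.abs(frame - background) / scale, 0.0, 1.0)

def composite_over_background(foreground_rgb, alpha, background_gray):
    """Blend an RGB foreground over a green-tinted grayscale background."""
    bg = np.zeros(background_gray.shape + (3,))
    bg[..., 1] = background_gray                  # keep the background in the green channel only
    a = np.clip(alpha, 0.0, 1.0)[..., None]
    return a * foreground_rgb + (1.0 - a) * bg
```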

Project #4: Visualizing Sequence Similarity

The genome (DNA) sequences of individuals from the same species are largely identical, with the exception of a few polymorphic positions called SNPs (for Single Nucleotide Polymorphisms). The subset of variable positions in a genomic sequence is called the organism's haplotype. A useful pairwise measure of local sequence dissimilarity is the number of differing positions within a specified genomic window. Haplotype sequences are ideal for making such sequence comparisons.
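The windowed mismatch count can be sketched as follows. This assumes each strain's haplotype is available as a string with one character per SNP, aligned with a list of SNP positions in base pairs; `encode` and `window_dissimilarity` are hypothetical helpers, and missing or ambiguous calls are not handled.

```python
import numpy as np

def encode(seqs):
    """Turn equal-length allele strings into an (n_strains, n_snps) array of byte codes."""
    return np.array([list(s.encode("ascii")) for s in seqs], dtype=np.uint8)

def window_dissimilarity(positions, alleles, start, width):
    """Pairwise mismatch counts over the SNPs with start <= position < start + width."""
    positions = np.asarray(positions)
    in_window = (positions >= start) & (positions < start + width)
    sub = alleles[:, in_window]
    # Compare every strain with every other strain at each SNP in the window.
    return (sub[:, None, :] != sub[None, :, :]).sum(axis=-1)
```

For example, over the first four positions of the sample table, 129S1/SvImJ (T C A G) and CALB/RK (A T G A) differ at all four SNPs, while 129S1/SvImJ and C57BL/10J differ at none.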

The object of this project is to create an animation that visualizes the sequence similarities of several common lab-mouse strains. Each frame of the visualization should use multidimensional scaling (MDS) to find 2D coordinates for each strain such that the Euclidean distance between each pair best represents their pairwise sequence difference. The animation consists of the frames produced as a 10-megabase window slides along the sequence, translated 1 megabase per frame. For each frame, you will first need to find the set of SNPs that fall within the current interval. For this set of SNPs, compare all pairs of strains and count the number of sequence mismatches between them. These differences will be the entries of the dissimilarity matrix to which MDS will be applied. Once applied, the initial 2D coordinates of the points will be the corresponding values from the first two eigenvectors.
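The MDS step can be sketched with the classical (Torgerson) formulation, which double-centers the squared dissimilarities and takes the leading eigenvectors. Whether this exact variant is the one covered in class is an assumption; here the eigenvectors are also scaled by the square roots of their eigenvalues, which is the usual convention for recovering distances.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Embed n items in `dims` dimensions from an (n, n) dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D**2) @ J                     # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]         # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale
```

The sign and orientation of the resulting coordinates are arbitrary from frame to frame, which is why a consistent alignment of each frame is needed.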

Position (one 7-digit coordinate per SNP column):
Strain 7826250 7826345 7826455 7826615 7827097 7827157 7827294 7827565 7829663 7829789 7829803 7830169 7830174 7830255 7830319 7830335 7830447 7831894 7831910 7831973 7832179 7832210 7832267 7832297 7832310 7832341 7832387 7832409 7832456 7832483 7832504 7832524 7832591 7832654 7832682 7832717 7832733 7832757 7832882 7832942 7833033 7833093 7833130 7833179 7839803 7839834 7839891 7839937 7839943 7839971
129S1/SvImJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G A G G G G A T T T A G A G T G T
129S4/SvJae T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
129X1/SvJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G A G G G G A T T T A G A G T G T
A/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A T T G T
AKR/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G A G G G G A T T T A G A G T G T
BALB/cByJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A T T G T
BALB/cJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
BPH/2J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
BPL/1J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
BPN/3J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
BTBR T+ tf/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G A G G G G A T T T A G A G T G T
BUB/BnJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
C3H/HeJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
C57BL/10J T C A G G T C A T A A A A G C T C A T G G A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
C57BL/6J T C A G G T C A T A A A G G C T C A T G G G A G T G T C A T T C C A G G G G G A T A T A A A G T G G
C57BLKS/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
C57BR/cdJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
C57L/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
C58/J T C A G G T C A T A A A A G C T C A T G G A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
CALB/RK A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
CAST/EiJ A T G A T C A G C T G T G A T T T G A A T A A G G T A T A C C T C A G A T A A A T T G G A A G C G G
CBA/J T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
CE/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
CZECHII/EiJ T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
DBA/1J T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
DBA/2J T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
DDK/Pas T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
DDY/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
EL/SUZ_2 T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
FVB/NJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A T T G T
HTG/GOSFSN T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
I/LnJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
IL/ILS T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
IS/CAMRK A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
IS/ISS T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
JF1/Ms T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
KK/HLJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A T T G T
LEWES/EI A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
LG/J T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
LP/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
MA/MyJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
MAI/Pas T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
MOLF/EiJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A A A T T G T
MOLG/DN T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
MRL/MpJ T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
MSM/Ms T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
NOD/LtJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
NON/LtJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
NOR/LTJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
NZB/BlNJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
NZO/HlLtJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
NZW/LacJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
O2/O20 T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
P/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
PERA/EiJ A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
PERC/EI A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
PL/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
PWD/Ph A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
PWK/PhJ T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
Qsi/Qsi5 T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
RF/J T C A G G C C A C A A A G G C T C A T G G G G A T G T C A T T C C G G G G G G A C T T A A C G T A T
RIIIS/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
SEA/GnJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
SEG/Pas T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C A A A T A A C T T G G A A G C G G
SJL/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
SKIVE/EI T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C A A A T A A C T T G G A A G C G G
SM/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
SOD1/EI T C A G G T C A T A A A A G C T C A T G G A A G G T A T G C C T T A G A T A A A T T G G A A G C G G
SPRET/EiJ T C A G G C C G C A A A A G C T C A T G G G A G G T A T A C C T C G A A T A A C T T G G A A G C G G
ST/bJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
SWR/J T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
TALLYHO/JNGJ T C A G G C C A C A A A G G C T C A T G G G A A T G T C A T T C C G G G G G G A T T T A G A G T G T
WSB/EiJ T C A G G T C A T A A A A G C T C A T G G G A G T G T C G T T C C A G G G G G C T A T A A A G T G G
ZALENDE/EiJ A T G A T C A G T T G T G A T G T G A A T A A G G T A T G C C T T A G A T A A A T T G G A A G C G G

Shown above is a sample of a haplotype sequence. You can download a full sequence here.
It is a zipped text file (.csv format, which can be loaded into Excel). It is also transposed relative to the table shown above.

In order to maintain the coherence of your animation, it will be necessary to align each frame of the animation consistently. Use the following alignment approach: translate the centroid of the strains named "CAST/EiJ", "WSB/EiJ" and "PWD/Ph" to the center of your image. Rotate the result so that CAST/EiJ is offset entirely in the positive x-direction. You may then need to reflect all of the coordinates so that the y-coordinate of PWD/Ph is positive (this involves only multiplying the y-values of all strains by -1). Finally, you should scale the entire image so that all strains, and their labels, fit within the image.
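The translate, rotate, and reflect steps might be sketched as below. The code works around the origin; shifting to the image center and scaling to fit would follow as a separate step. The function name and the assumption that strain names are held in a Python list parallel to the coordinate rows are illustrative.

```python
import numpy as np

def align_frame(coords, names, anchors=("CAST/EiJ", "WSB/EiJ", "PWD/Ph")):
    """Center on the anchor centroid, put the first anchor on +x, keep the third at y >= 0."""
    coords = np.asarray(coords, dtype=float)
    idx = [names.index(a) for a in anchors]
    centered = coords - coords[idx].mean(axis=0)  # anchor centroid moves to the origin
    cx, cy = centered[idx[0]]
    angle = np.arctan2(cy, cx)                    # current direction of the first anchor
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])               # rotation by -angle
    aligned = centered @ R.T
    if aligned[idx[2], 1] < 0:                    # reflect so the third anchor has y >= 0
        aligned[:, 1] *= -1.0
    return aligned
```

The reflection leaves the first anchor on the x-axis because its y-coordinate is already zero after the rotation.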


Hints and Options:
  • You can connect your strain points into a tree that is constructed as follows. For each dissimilarity matrix, find the smallest off-diagonal value, which will correspond to the closest pair of strains. Connect these two strains with a line segment in your visualization. Next, remove these two strains from your table, and replace them with a new "virtual" strain, whose distance to each other strain is the average of the two replaced distances. The coordinate of this virtual strain should be the midpoint of the line segment (do not run MDS again). Repeat this procedure until only two strains are left, and then connect them with the final segment.
  • Use the LBG algorithm to identify 3-4 clusters (codebook entries), and assign a distinct color to the points in each cluster. In all likelihood you will not end up with consistent colors if you apply this to every frame; thus, you should experiment with changing it over 10 frames or so. You should try as much as possible to keep your colors consistent between frames; seeding the LBG centroid points with the means of the clusters from the previous LBG run will aid in this.
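The tree construction in the first hint can be sketched as a loop that repeatedly joins the closest remaining pair. This assumes a symmetric dissimilarity matrix and the 2D coordinates already produced by MDS; `build_tree` is a hypothetical name, and ties are broken by whichever pair appears first.

```python
import numpy as np

def build_tree(D, coords):
    """Return the list of line segments (pairs of 2D points) joining strains into a tree."""
    D = np.asarray(D, dtype=float).copy()
    pts = [np.asarray(p, dtype=float) for p in coords]
    active = list(range(len(pts)))                # strains (real or virtual) still in the table
    segments = []
    while len(active) > 2:
        sub = D[np.ix_(active, active)]           # copy restricted to the active strains
        np.fill_diagonal(sub, np.inf)
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[a], active[b]
        segments.append((pts[i], pts[j]))
        # Virtual strain: distances are the average of the two replaced rows.
        new_row = 0.5 * (D[i] + D[j])
        D = np.pad(D, ((0, 1), (0, 1)))
        D[-1, :-1] = new_row
        D[:-1, -1] = new_row
        pts.append(0.5 * (pts[i] + pts[j]))       # midpoint of the segment, no new MDS run
        active = [k for k in active if k not in (i, j)] + [len(pts) - 1]
    segments.append((pts[active[0]], pts[active[1]]))
    return segments
```

For n strains this yields n - 1 segments, each of which can be drawn directly onto the frame.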