Abstract:
Unsupervised classification methods are still being discovered and refined, so it is necessary to
look at the methods available and discern which may be most useful in different situations. Using data
collected by the Washington Post on fatal police shootings, various statistical methods of unsupervised
classification were explored to determine whether any meaningful groups were present within the data.
The variables explored were “race” (the race of the individual), “threat level” (the perceived threat
level of the situation), “signs of mental illness” (whether the individual displayed signs of
mental illness at the time of the shooting), and “body camera” (whether a body camera was
present for at least part of the incident). The variables were examined for latent groups using a variety of
methods: k-means clustering, k-medoids clustering, hierarchical clustering, fuzzy clustering,
density-based spatial clustering of applications with noise (DBSCAN), Gaussian mixture model (GMM), and
principal components analysis (PCA). The number of clusters to retain was explored for k-means,
k-medoids, hierarchical clustering, fuzzy clustering, DBSCAN, and GMM. The means were then examined
to understand how the clusters were structured. PCA required determining how many
principal components to retain; the rotation matrix was then examined to determine which variables were
important for each component. Though no consensus was reached across all the models, there was
some overlap in the structure of some of the latent groups that emerged. Many of the final solutions found
differences by race, age, and mental illness that may call for further analyses to understand the
groups of people who are more likely to be involved in a fatal police shooting.