Danielle Durán, M.S.
1 min read · Jan 8, 2021


Solid start on a very timely topic. I'd like to see you dig deeper, location-wise. City and state are both available fields in the police shooting dataset, so it'd be interesting to grab demographics at the city level rather than relying only on the higher-level state data. Your predictive models will likely improve. I do see that you accidentally included the same plot twice (Figures 3 and 4). You may want to fix that for readers curious about the distribution of incident count, since it's the main variable of interest. As it stands, this paper reads more like a data exploration / model comparison exercise on an interesting topic. For next steps, there is a wide body of criminology literature to draw on when considering which variables influence police shootings, so there may be more predictors worth including. Additionally, try a stepwise variable selection process to improve model performance and efficiency. I think you'll see improved results with the above ideas. Lastly, I love that you added Random Forest (one of my fave ML techniques). I usually do parameter optimization when I use RF: set the seed for replicability, then play with varying the number of trees, the number of variables tried at each node, keeping predictions in bag vs. out, etc. All in all, solid work & thanks for sharing!
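To make the Random Forest suggestion concrete, here is a minimal sketch of that kind of tuning loop. It assumes Python with scikit-learn, which is my choice for illustration only; the comment above doesn't name a tool. The synthetic data, the grid values, and the helper name `tune_forest` are all hypothetical stand-ins, and using the out-of-bag R² as the comparison metric is one reasonable reading of "in bag vs. out", not the only one.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

SEED = 42  # fixed seed so repeated runs build the same forests

# Synthetic stand-in data (assumption): swap in the real predictors
# and the incident-count target.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=4, noise=10.0, random_state=SEED
)


def tune_forest(X, y, tree_counts=(100, 300), feature_counts=(2, 4, 8), seed=SEED):
    """Fit one forest per (trees, features-per-split) pair and score it on
    out-of-bag samples. Returns (all results, best result)."""
    results = []
    for n_trees, n_feats in product(tree_counts, feature_counts):
        model = RandomForestRegressor(
            n_estimators=n_trees,   # number of trees
            max_features=n_feats,   # variables tried at each split
            oob_score=True,         # score on samples left out of each bootstrap
            random_state=seed,
        )
        model.fit(X, y)
        results.append(
            {
                "n_estimators": n_trees,
                "max_features": n_feats,
                "oob_r2": model.oob_score_,
            }
        )
    best = max(results, key=lambda r: r["oob_r2"])
    return results, best


results, best = tune_forest(X, y)
for row in results:
    print(row)
print("best:", best)
```

Because the seed is fixed, rerunning the loop should pick the same setting each time, which makes it easier to tell a real improvement from run-to-run noise.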
