Posts

Showing posts from July, 2018

Three Horror Writers Walk into a Predictive Model

How I learned to stop worrying and love the vectorizer. In the world of horror writers, there are few who can top the likes of H.P. Lovecraft, Mary Shelley, and Edgar Allan Poe (maybe Stephen King, but that's mostly due to volume. Not to say he's bad. I've actually never read any of his stuff. This aside has gone on too long.) An avid fan could probably tell the difference between their writings, but how well could we train a computer to do the same? Computers are notorious for their lack of interest in literature, so getting them to absorb the text in a language they understand is tricky. Furthermore, getting that converted text to play nicely with the other features is a bit of a mess, so here is vectorizing and feature union, spooky style. But first, word clouds! Inside of our wordy Cthulhu, we see a number of words that don't seem to have too much significance, and the same with Poe's Raven. The vocabulary seems pretty similar with the exception being t...
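The vectorizing-plus-feature-union idea the excerpt teases can be sketched roughly like this, assuming scikit-learn (which the post doesn't name outright). The `text` key, the selector classes, and the passage-length feature are hypothetical stand-ins for whatever features the post actually combines:

```python
# A minimal sketch, assuming scikit-learn: TF-IDF text vectors
# stacked side-by-side with a hand-made numeric feature.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

class TextSelector(BaseEstimator, TransformerMixin):
    """Pull the raw text field out of each dict-like record."""
    def __init__(self, key):
        self.key = key
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [row[self.key] for row in X]

class LengthExtractor(BaseEstimator, TransformerMixin):
    """A toy extra feature: passage length in characters."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([[len(row["text"])] for row in X])

# FeatureUnion concatenates the outputs of both branches column-wise,
# so the vectorized text and the numeric feature "play nicely" together.
features = FeatureUnion([
    ("tfidf", Pipeline([
        ("select", TextSelector("text")),
        ("vec", TfidfVectorizer()),
    ])),
    ("length", LengthExtractor()),
])

docs = [
    {"text": "The nameless thing lurked beyond the stars"},
    {"text": "Once upon a midnight dreary, while I pondered"},
]
X = features.fit_transform(docs)
print(X.shape)  # (2, vocabulary size + 1 length column)
```

The resulting matrix can feed any downstream classifier, which is the usual reason for bolting the vectorizer into a `FeatureUnion` rather than running it standalone.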

Bring Balance to the Data

When the Yays far outweigh the Nays, and vice versa. One would probably assume that 95% accuracy is a damn fine model, but in many cases that is about as bad as it gets. For a classification problem, the more the predicted value counts vary from each other, the more you have issues with imbalanced classes. To demonstrate this concept, let's look at the issue of West Nile Virus in Chicago. Due to the heavy population density and the species of mosquito that live in the area, Chicago summers are a hotbed of activity. Using weather data and mosquito trap data, it is possible to build a predictive model to see when/where the virus would be most prevalent. While West Nile is very prevalent in Chicago, when looking at the traps, we see that the number of positive observations is, in fact, quite low (about 5%). With that information, spraying can be done to prevent the spread of the virus. However, if you asked a computer to make the most accurate model it can to deal with the issue, i...
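One common fix for the roughly-5%-positive situation the excerpt describes is class weighting. Here is a sketch on synthetic stand-in data (not the actual Chicago trap data), assuming scikit-learn; the feature columns and threshold are invented for illustration:

```python
# A sketch of the imbalanced-class problem: synthetic data where
# positives are rare, compared with and without class_weight="balanced".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
# Rare positives (~5%), loosely mimicking sparse virus-positive traps.
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1.65).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

naive = LogisticRegression().fit(X_tr, y_tr)
weighted = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

# Raw accuracy hides the imbalance; recall on the rare class is
# what tells you whether positives are actually being caught.
naive_recall = recall_score(y_te, naive.predict(X_te))
weighted_recall = recall_score(y_te, weighted.predict(X_te))
print("naive recall:   ", naive_recall)
print("weighted recall:", weighted_recall)
```

`class_weight="balanced"` penalizes mistakes on the rare class more heavily, trading a little overall accuracy for much better recall on the positives, which is usually the trade you want when deciding where to spray.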

You Know You Can Be Too Fit

A review of the concept of overfitting. No, this isn't an indictment of the culture of CrossFit or other fitness-related clubs. I've never had much of an issue; as long as people are getting in shape and enjoying themselves, that's fine. I know people find they talk about their newfound hobby too much, but so does anyone with a newfound hobby. Have you ever talked to someone who just started improv classes? Anyway, the overfitting issue that I'm referring to has more to do with data and the idea that less is more. We all live with the idea that the more information we have, the better equipped we are to deal with an issue, but that simply is not the case. Overfitting is when a model performs extremely well on the data it has been trained on, but poorly on data it has never seen. Imagine you are cooking at a friend's house. You have trained in your kitchen and have become so used to the setup there that when you move to a kitchen you aren't used to, you b...
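The train-well-but-generalize-poorly pattern the excerpt defines is easy to demonstrate. This sketch assumes scikit-learn and uses a decision tree (the post's actual model is unknown): an unconstrained tree memorizes the noisy training set, while a depth-limited one gives up some training accuracy to generalize:

```python
# A minimal overfitting demo: a deep decision tree memorizes noisy
# training data; its score drops on data it has never seen.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise, so "perfect" training fit means
# the model has memorized noise rather than learned the signal.
X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

The deep tree hits 100% on its own kitchen, so to speak, then stumbles on the held-out set; the gap between train and test score is the telltale sign of overfitting.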