Sorting through 900,000 tweets probably isn’t a task most people would relish, but two Montana Tech Data Science students jumped in headfirst for a creative senior project that used fan tweets to successfully predict the outcomes of NFL games.
“The overall point of the project was to predict whether any given team would win over another team against the spread in any given week,” said Sarah Wiseman, a double major in Statistics.
Wiseman and Jace Rhodes based their project on a similar study from 2013. Their results indicated that including the Twitter data in the model added an extra 4% of predictive power, compared with an accuracy of 54.1% without it.
“We also found that advanced stats and more data could possibly improve these results,” Rhodes noted.
After several rejected applications, the pair received an educational license from Twitter. They were then able to download hundreds of thousands of tweets, which they analyzed with open-source Python packages. They used fan tweets posted 2 hours after a game to predict the winner of the following week’s match.
To assign the tweets a sentiment value using Python, Wiseman and Rhodes had to classify about 1,600 tweets by hand. Machines can shorten processing time considerably, but they are not sentient and are unable to pick up the subtext of certain Tweets.
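The article does not say which Python packages or model the students used. As a rough illustration of how a small hand-labeled set can teach a program to score new tweets, here is a minimal sketch of a naive Bayes classifier in plain Python. The sample tweets, labels, and function names are invented for illustration and are not from the project.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled examples; the real project labeled about 1,600 tweets.
HAND_LABELED = [
    ("great win today", "positive"),
    ("love this team", "positive"),
    ("terrible defense again", "negative"),
    ("worst game ever", "negative"),
]


def train(labeled_tweets):
    """Count how often each word appears under each sentiment label."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of tweets
    vocab = set()
    for text, label in labeled_tweets:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_tweets = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Start from the prior: how common this label is overall.
        score = math.log(count / total_tweets)
        # Add-one (Laplace) smoothing so unseen word/label pairs are not zero.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(HAND_LABELED)
print(classify(model, "great team"))
print(classify(model, "worst defense"))
```

A word-counting model like this has no notion of sarcasm, which is consistent with the difficulty the students describe: a sarcastic "great game" looks positive to it.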
“Analyzing social media was difficult,” Rhodes said. “Humans behave unpredictably. There were sarcastic tweets and others were very short. Looking at the tweets was somewhat hilarious, just to see how mad some of the fans were.”
The programs also could not distinguish tweets about football teams from tweets about baseball teams of the same name. That work had to be done by hand. Wiseman enjoyed curating the data.
“I liked pre-processing the tweets, taking out the unnecessary words, and making sure they had just enough information so that the machine could understand the information and not get confused,” Wiseman said. “We only want the words that have meaning associated with them. We didn’t foresee the different sarcastic stuff that would make sentiment analysis difficult.”
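The kind of pre-processing Wiseman describes might look something like the sketch below, which lowercases a tweet, strips links and mentions, and drops common filler words. This is only an assumption about the general approach; the stop-word list, the regular expressions, and the `preprocess` function name are hypothetical and not taken from the students' code.

```python
import re

# Hypothetical, deliberately short stop-word list; a real project would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "to", "of", "and",
    "in", "on", "for", "that", "this", "it", "we", "i",
}

URL_RE = re.compile(r"https?://\S+")            # web links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")    # words, allowing one apostrophe


def preprocess(tweet):
    """Return the meaningful lowercase words of a tweet as a list."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The Broncos are AMAZING today! https://t.co/xyz @NFL #GoBroncos"))
```

Cleaning of this sort reduces noise, but it cannot tell a football team from a baseball team with the same name, which is why that step was done by hand.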
“While data from Twitter and other social networking sites can be valuable in predicting events, it is challenging to design models that accurately interpret that data,” said Dr. Hilary Risser, Associate Professor and Mathematics Department Head. “This project demonstrates the challenges that Data Scientists face when using social network data.”
Wiseman and Rhodes look forward to putting their education to use when they launch their careers after graduation on May 5. Both students came to Montana Tech from Billings, but neither will be heading back soon. Wiseman is interviewing for a position at an Air Force base in Utah. Rhodes plans to stay in Butte, where he will work for Water & Environmental Technologies.
The pair are excited to work in a rapidly expanding field. The Bureau of Labor Statistics expects demand for data scientists to increase by 36% over the next decade. The bureau reports that the median salary of a data scientist in the U.S. was more than $100,000 in 2021.
“It’s an industry that’s growing quickly,” Rhodes said.