As part of my PhD dissertation at Walden University, I developed an application that analyzes the sentiment of tweets mentioning the stock symbols of publicly held firms in the United States and correlates the results with those firms' financial data over the research period. When I wrote the code, between the fourth quarter of 2014 and the first quarter of 2015, I could not find ready-made tools for this kind of data analysis. Some companies offered solutions at very high cost, while other tools had limited capabilities. So I stitched together various tools and coding techniques for my own research. The key steps and scripting tools were as follows:

  • Used the Twitter APIs and the Tweepy library in my Python code to extract the relevant tweets from Twitter on a streaming basis.
  • Used the Yahoo Developer Network from my Python code to extract the financial data of each publicly held firm in the United States.
  • Collected the stock symbols of all publicly held companies in the United States from nasdaq.com using the pandas library in IPython.
  • Developed a portal in Python with Django that helped me train the machine learning system to recognize negative and positive sentiments.
  • Used the Stanford CoreNLP Java modules, together with the trained data, to analyze the sentiment of all the tweets.
  • Used pandas in an IPython Notebook to conduct the data analysis.
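
To give a feel for how the last steps fit together, here is a minimal sketch. It is not the code from the repository: the function names, the cashtag pattern (a `$` followed by one to five uppercase letters), the tuple layout of the inputs, and the sample values in the demo are all assumptions made for illustration. It also uses only the Python standard library instead of pandas so that it runs without extra installs. The idea is to find which known stock symbols a tweet mentions, average the sentiment scores per day and symbol, and then compute a Pearson correlation between daily sentiment and daily returns.

```python
import re
from collections import defaultdict
from math import sqrt

# Assumed cashtag format: "$" followed by 1-5 uppercase letters (e.g. "$AAPL").
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")


def extract_symbols(text, known_symbols):
    """Return the known stock symbols that appear as cashtags in a tweet."""
    return sorted({s for s in CASHTAG.findall(text) if s in known_symbols})


def daily_sentiment(scored_tweets, known_symbols):
    """Average sentiment per (date, symbol).

    scored_tweets: iterable of (date, text, score) tuples, where score is
    whatever numeric value the sentiment classifier produced.
    """
    buckets = defaultdict(list)
    for date, text, score in scored_tweets:
        for symbol in extract_symbols(text, known_symbols):
            buckets[(date, symbol)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation; None if undefined (too few points, zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def sentiment_return_correlation(sentiment, returns, symbol):
    """Correlate daily sentiment with daily returns for one symbol.

    sentiment, returns: dicts keyed by (date, symbol). Only dates present
    in both dicts are used.
    """
    dates = sorted(d for (d, s) in sentiment if s == symbol and (d, s) in returns)
    xs = [sentiment[(d, symbol)] for d in dates]
    ys = [returns[(d, symbol)] for d in dates]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    symbols = {"AAPL", "MSFT"}
    tweets = [
        ("2015-01-05", "Feeling good about $AAPL today", 0.8),
        ("2015-01-05", "$AAPL and $MSFT both look weak", -0.4),
        ("2015-01-06", "Selling my $AAPL shares", -0.6),
        ("2015-01-07", "$AAPL earnings look solid", 0.5),
    ]
    returns = {
        ("2015-01-05", "AAPL"): 0.004,
        ("2015-01-06", "AAPL"): -0.011,
        ("2015-01-07", "AAPL"): 0.007,
    }
    scores = daily_sentiment(tweets, symbols)
    print(sentiment_return_correlation(scores, returns, "AAPL"))
```

In the actual project the sentiment scores came from Stanford CoreNLP and the analysis was done in pandas, so treat this only as an outline of the data flow.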

The source code is available in the finSentiment repository on GitHub. In subsequent posts I will explain what the scripts do, which ones are relevant and which are not, and how they can be used in other projects.