There has been an influx of new traders in the US markets due to a specific online group by the name of WallStreetBets. The group caused a pump in the market with stocks such as GME (GameStop) and AMC, causing wealthy hedge fund investors to lose millions of dollars. Manipulation in the stock market is nothing new, but for the first time in history we are seeing a trend of independent investors wielding the power to shift the market in their favor with organized plays. Despite criticism in the media and action from the SEC, this is a good sign. Why? It shows that the “commoner” in society still has a voice despite the increased disparity between the rich and the poor.
Since then, the Reddit group has more than quadrupled in size. This has been a double-edged sword for the voice of the common person. More average Joes are gaining exposure to the market, but there are also bots, fake accounts, and internet trolls whose intent is to discredit and mislead other followers of the group.
How can we assess who is in the group to help others and who is present for ulterior reasons? In my project, I will be analyzing data from the group’s posts to better differentiate the two.
I started with a CSV file of WallStreetBets posts from Reddit. After importing libraries such as numpy, pandas, and nltk and loading the file into a DataFrame, I inspected the data and found a major issue that would limit its usefulness: the column contents and data types were inconsistent. I converted the data types to be more uniform and dropped columns containing data that would skew my results. I also searched for correlations between columns but found nothing substantial enough to report.
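To give a rough idea of what this kind of cleanup can look like, here is a minimal sketch. It is not my exact code: the column names (title, score, comms_num, created, url), the choice of which column to drop, and the small in-memory sample standing in for the real CSV are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
# The column names and values are assumptions, not the actual dataset.
SAMPLE_CSV = """title,score,comms_num,created,url
GME to the moon,120,35,1611878400,https://example.com/a
Hold the line,not_a_number,12,1611882000,https://example.com/b
AMC squeeze?,87,20,1611885600,https://example.com/c
Diamond hands,300,95,1611889200,https://example.com/d
Why I am selling,15,,1611892800,https://example.com/e
"""


def clean_posts(df: pd.DataFrame) -> pd.DataFrame:
    """Make column types uniform and drop columns not needed here."""
    out = df.copy()
    # Force numeric columns to numbers; unparseable entries become NaN.
    for col in ("score", "comms_num"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Interpret the Unix timestamp (in seconds) as a datetime.
    out["created"] = pd.to_datetime(out["created"], unit="s")
    # Treat titles as text throughout.
    out["title"] = out["title"].astype(str)
    # Drop a column assumed to be irrelevant to the analysis.
    return out.drop(columns=["url"], errors="ignore")


posts = clean_posts(pd.read_csv(io.StringIO(SAMPLE_CSV)))

# Pairwise correlation between the numeric columns (NaN rows are skipped).
correlations = posts[["score", "comms_num"]].corr()

print(posts.dtypes)
print(correlations)
```

With a real file, the in-memory sample would be replaced by a call to pd.read_csv with the path to the CSV. Coercing bad values to NaN rather than raising an error is one design choice among several; it keeps the rows available for the text analysis even when a numeric field is unusable.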
With the aid of the NLP library nltk, I made a visualization displaying the most used words in WallStreetBets post titles.
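The counting step behind a visualization like this can be sketched in a few lines. To keep the example self-contained, it uses only the Python standard library in place of nltk, and the sample titles, the tiny stopword set, and the function name top_words are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical titles standing in for the real post titles.
titles = [
    "GME to the moon",
    "Hold the line on GME",
    "Is AMC the next GME?",
]

# A tiny illustrative stopword set; a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "on", "to", "of", "in", "and"}


def top_words(texts, n=10):
    """Return the n most common non-stopword tokens as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters, digits, and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


print(top_words(titles, n=3))
```

In an nltk-based version, the regular expression would give way to an nltk tokenizer, the stopword set to nltk's English stopword corpus, and the Counter to nltk.FreqDist. The resulting (word, count) pairs can then be passed to any plotting library to draw a bar chart or word cloud.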
In my next post, I will show the most used positive and negative words from the Reddit group.