Twitter Data Sentiment Analysis Using Hadoop
Objective
To classify people's sentiments about demonetization as positive, negative, or neutral using Hadoop, and to extract interesting patterns from the results.
Project Overview
The Twitter Data Sentiment Analysis Hadoop project gathers tweets from many different people and analyzes their sentiment to check whether people are happy with the government's demonetization scheme. Twitter sentiment analysis, also known as opinion mining, is the process of determining whether a tweet is positive, negative, or neutral.
The data set is collected from citizens' tweets on Twitter. This data is unstructured, and tweets are generated in huge volumes, which is where big data comes into play. Big data technologies such as Hadoop, MapReduce, and the Hadoop Distributed File System (HDFS) are widely used for this type of application.
Proposed System
The proposed system focuses on sentiment analysis of note ban (demonetization) data using Hadoop. The sentiments collected from Twitter are classified as positive, negative, or neutral. Positive opinion words express desired states toward the government scheme, while negative opinion words express undesired states toward it. The proposed system architecture is shown in the figure.
Step 1: Twitter API
The Twitter API, accessed with an authenticated Twitter API account, is used to extract tweets related to the note ban.
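The collection code itself is not given here. As a minimal sketch, assuming the tweets have already been downloaded through the API as one JSON object per line with a `text` field, the note ban tweets could be filtered by keyword as shown below. The keyword list and the `filter_noteban_tweets` name are hypothetical.

```python
import json

# Hypothetical keyword list for the note ban topic; adjust to the real query.
KEYWORDS = ("demonetization", "demonetisation", "noteban", "note ban")

def filter_noteban_tweets(json_lines):
    """Yield the text of each tweet that mentions a keyword (case-insensitive).

    Assumes one JSON tweet object per line with a "text" field; blank,
    malformed, or non-object lines are skipped.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(tweet, dict):
            continue
        text = tweet.get("text")
        if not isinstance(text, str):
            continue
        if any(keyword in text.lower() for keyword in KEYWORDS):
            yield text

sample = [
    '{"text": "Noteban is a bold move"}',
    '{"text": "Cricket tonight"}',
    "not json",
]
print(list(filter_noteban_tweets(sample)))
# ['Noteban is a bold move']
```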
Step 2: Data Preparation
Tweets about the Indian government's note ban announcement are collected into Hadoop from Twitter through the Twitter API. Punctuation, stop words, and special characters are then removed using data preprocessing techniques.
- Tokenization
- Lexical Dictionary
- Acronym Dictionary
- Emoticon Dictionary
- Stop Words Dictionary
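The punctuation and special-character removal mentioned above can be sketched with a single regular expression. This is an assumption about how the cleaning might be done, not the project's code, and `strip_punctuation` is a hypothetical helper. Because it also erases emoticons such as ":(", the emoticon dictionary would have to be applied before this step.

```python
import re

def strip_punctuation(text):
    """Replace every character that is not a word character (letter, digit,
    underscore) or whitespace with a space, then collapse repeated spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return " ".join(cleaned.split())

print(strip_punctuation("New india is born!!! #noteban"))
# New india is born noteban
```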
Tokenization
Tweets extracted from Twitter are divided into tokens; this is known as tokenization. For example, ‘In the short run it took many life & shattered many household’ is split into ‘In’, ‘the’, ‘short’, ‘run’, ‘it’, ‘took’, ‘many’, ‘life’, ‘&’, ‘shattered’, ‘many’, ‘household’.
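Plain whitespace splitting is enough to reproduce the tokenization example above exactly, since every token there, including ‘&’, is separated by spaces. Real tweets (hashtags, mentions, URLs) usually need a smarter tokenizer, so treat this as a minimal sketch; `tokenize` is a hypothetical name.

```python
def tokenize(tweet):
    """Split a tweet into tokens on runs of whitespace."""
    return tweet.split()

print(tokenize("In the short run it took many life & shattered many household"))
# ['In', 'the', 'short', 'run', 'it', 'took', 'many', 'life', '&', 'shattered', 'many', 'household']
```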
Lexical Dictionary: It is used to match the words in a tweet against known positive and negative opinion words.
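A lexical dictionary can be modeled as a mapping from opinion words to a polarity. The sketch below assumes a tiny, hypothetical lexicon with +1 for positive and -1 for negative words; a real lexicon would be far larger, and `match_lexicon` is a hypothetical helper.

```python
# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"new": 1, "born": 1, "good": 1, "shattered": -1, "bad": -1}

def match_lexicon(tokens, lexicon=LEXICON):
    """Return (token, polarity) pairs for the tokens found in the lexicon.
    Lookup is case-insensitive; unmatched tokens are dropped."""
    return [(tok, lexicon[tok.lower()]) for tok in tokens if tok.lower() in lexicon]

print(match_lexicon(["New", "india", "is", "born"]))
# [('New', 1), ('born', 1)]
```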
Acronym Dictionary: It is used to expand abbreviations and acronyms into full words, which are then used for further analysis.
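Acronym expansion can be sketched as a dictionary lookup in which one token may expand into several words. The entries in `ACRONYMS` and the `expand_acronyms` helper are hypothetical examples, not the project's actual dictionary.

```python
# Hypothetical acronym dictionary; a real one would be much larger.
ACRONYMS = {
    "govt": "government",
    "rbi": "reserve bank of india",
    "pm": "prime minister",
}

def expand_acronyms(tokens, acronyms=ACRONYMS):
    """Replace each token that is a known acronym (case-insensitive) with the
    words of its expansion; other tokens are kept unchanged."""
    expanded = []
    for token in tokens:
        expansion = acronyms.get(token.lower())
        if expansion is None:
            expanded.append(token)
        else:
            expanded.extend(expansion.split())
    return expanded

print(expand_acronyms(["Govt", "should", "ask", "RBI"]))
# ['government', 'should', 'ask', 'reserve', 'bank', 'of', 'india']
```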
Emoticon Dictionary: It is used to map each emoticon to the meaning it conveys.
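An emoticon dictionary can be sketched as a lookup table that replaces each emoticon token with a word describing it, so that later steps treat it like any other word. The emoticons, their labels, and the `replace_emoticons` name are hypothetical.

```python
# Hypothetical emoticon dictionary: emoticon -> descriptive word.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad", ":D": "laugh"}

def replace_emoticons(tokens, emoticons=EMOTICONS):
    """Replace each emoticon token with its meaning; keep other tokens as-is.
    Matching is exact, so ':D' and ':d' are different keys."""
    return [emoticons.get(tok, tok) for tok in tokens]

print(replace_emoticons(["noteban", ":(", "queues"]))
# ['noteban', 'sad', 'queues']
```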
Stop Words Dictionary: It lists words that carry no importance for sentiment analysis, so these words are identified and removed. Examples: a, an, the, as, etc.
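Stop-word removal is a membership test against a fixed word set. The set below starts with the examples given above (a, an, the, as) plus a few hypothetical additions; the `remove_stop_words` name is also hypothetical. Applied to the tokens of the earlier example tweet, it drops ‘In’, ‘the’, and ‘it’.

```python
# Stop words: the document's examples plus a few hypothetical additions.
STOP_WORDS = frozenset({"a", "an", "the", "as", "in", "it", "is", "of", "to"})

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop-word set."""
    return [tok for tok in tokens if tok.lower() not in stop_words]

tokens = "In the short run it took many life & shattered many household".split()
print(remove_stop_words(tokens))
# ['short', 'run', 'took', 'many', 'life', '&', 'shattered', 'many', 'household']
```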
Step 3: Sentiment Analysis
The sentiments collected from Twitter are classified as positive, negative, or neutral. The analysis is performed state-wise.
Example for positive tweet:
New india is born.
Example for negative tweet:
In the short run it took many life & shattered many household.
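The classification step is not spelled out. One simple possibility, sketched here under stated assumptions, is to score each tweet by summing the polarities of its lexicon words, then count the labels per state in MapReduce style. The toy lexicon, the `(state, tweet)` input records, and all function names are hypothetical, and how a tweet is assigned to a state is not specified in the original. On a cluster, the map and reduce functions could run as separate scripts through Hadoop Streaming; here they are chained in-process, with a dictionary standing in for Hadoop's shuffle and sort.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> polarity.
LEXICON = {"new": 1, "born": 1, "shattered": -1}

def classify(text, lexicon=LEXICON):
    """Label a tweet by the sign of its summed lexicon polarity."""
    score = sum(lexicon.get(tok.strip(".,!?").lower(), 0) for tok in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def map_phase(records):
    """Map step: emit ((state, sentiment), 1) for each (state, tweet) record."""
    for state, text in records:
        yield (state, classify(text)), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts for each (state, sentiment) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

records = [
    ("StateA", "New india is born."),
    ("StateA", "In the short run it took many life & shattered many household."),
    ("StateB", "New india is born."),
]
print(reduce_phase(map_phase(records)))
# {('StateA', 'positive'): 1, ('StateA', 'negative'): 1, ('StateB', 'positive'): 1}
```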
Step 4: Data Visualization
After the sentiment analysis, the analyzed sentiments are visualized using bar charts.
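The charting tool is not named here. As a dependency-free stand-in for a plotting library, the hypothetical `text_bar_chart` below renders sentiment counts as horizontal text bars scaled to a fixed width; an actual chart would more likely be drawn with a library such as matplotlib.

```python
def text_bar_chart(counts, width=40):
    """Render a {label: count} mapping as horizontal text bars.

    The largest count gets `width` '#' characters; others are scaled
    proportionally and rounded to the nearest whole character.
    """
    if not counts:
        return ""
    peak = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label, count in counts.items():
        bar_len = round(count / peak * width) if peak else 0
        lines.append(f"{str(label).ljust(label_width)} | {'#' * bar_len} {count}")
    return "\n".join(lines)

print(text_bar_chart({"positive": 2, "negative": 1, "neutral": 0}, width=4))
# positive | #### 2
# negative | ## 1
# neutral  |  0
```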
Software Requirements
- Linux OS
- MySQL
- Hadoop & MapReduce
- Twitter API Account
Hardware Requirements
- Hard Disk – 1 TB or Above
- RAM required – 4 GB or Above
- Processor – Core i3 or Above
Technology Used
- Big Data – Hadoop