Hadoop Projects
What is Hadoop?
- It was developed mainly for storing and processing huge amounts of data.
- Open source.
- Java-based programming framework.
Major Components of Hadoop
- HDFS (Hadoop Distributed File System): stores vast amounts of data in a distributed manner.
- MapReduce: processes massive amounts of data in parallel and produces the result.
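As a rough illustration of the MapReduce idea, the sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine. It does not use Hadoop itself, and the function names (map_words, reduce_counts, run_mapreduce) are made up for this example.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce phase: add up all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    """Simulate map -> shuffle -> reduce in memory (no Hadoop involved)."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(run_mapreduce(sample, map_words, reduce_counts))
```

On a real cluster the map and reduce steps would run on many machines and the framework would handle the grouping; here a dictionary stands in for that step.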
What are Hadoop Projects?
- A Hadoop project is a solution for situations where we have big data in hand but have not yet extracted enough knowledge from it.
- It can handle all three types of data:
- Structured data
- Semi-structured data
- Unstructured data
The Lifecycle in Hadoop Projects
- Data collection
- Data storage
- Data analysis and processing
- Knowledge extraction
- Knowledge presentation
Data collection
- Collecting data and loading it into Hadoop.
- Examples:
- Flume
- Sqoop
Data storage
- The collected data is stored using various storage methods.
- Examples:
- HDFS
- HIVE
Data analysis and processing
- Analyse and process the given data.
- Examples:
- MapReduce
- HIVE
- Pig
Knowledge extraction
- Extract useful information using machine learning or other techniques.
- Examples:
- Patterns
- Models
- Rules
- Trees
Knowledge presentation
- Present and visualize the results to the end user.
- Examples:
- Reporting
- Dashboards
- BI Applications
HADOOP PLATFORMS/TOOLS
- Hadoop 1.x or 2.x
- Linux
- Java JDK
- Cassandra
- MongoDB
- R
- HBase
- Pig
- Spark
- Mahout
- Pentaho
LIST OF HADOOP PROJECTS
- Sentiment Analysis On Product Reviews Using Hadoop
- YouTube Data Analysis Using Hadoop
- Recommendation System Using Hadoop
- Climatic Data Analysis Using Hadoop
- Aadhar Based Analysis Using Hadoop
- Twitter Data Analysis Using Hadoop
- Facebook Data Analysis Using Hadoop
- Airline Sentiment Analysis Using Hadoop
- Agricultural Data Analysis Using Hadoop
- Cricket Match Analysis Using Hadoop
Sentiment Analysis On Product Reviews Using Hadoop
Sentiment analysis identifies the opinions or sentiments expressed in text from social media sources. Social media has gained a lot of attention in recent years, and people's opinions on different subjects are expressed and spread continually through it, with Twitter and Facebook among the most prominent platforms. This project deals with sentiment analysis of tweets to spot people's interest in and opinions of products such as mobile phones, books, and laptops.
Requirements
- Twitter API
- Hadoop and MapReduce
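One simple way to approach the classification step is a word-list (lexicon) scorer. The sketch below is only an assumption about how such a step could look: the tiny POSITIVE and NEGATIVE word sets, the (product, text) input layout, and all function names are hypothetical, and the Twitter API collection step is left out.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon; a real project would use a much larger word list.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def score_text(text):
    """Return +1 per positive word and -1 per negative word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(score):
    """Turn a numeric score into a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_by_product(reviews):
    """reviews: iterable of (product, text). Returns {product: {label: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for product, text in reviews:
        counts[product][label(score_text(text))] += 1
    return {product: dict(labels) for product, labels in counts.items()}


if __name__ == "__main__":
    sample = [("phone", "Great camera, love it"), ("phone", "Terrible battery")]
    print(sentiment_by_product(sample))
```

In a MapReduce setting, scoring each tweet would be the map step and counting labels per product would be the reduce step.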
YouTube Data Analysis Using Hadoop
This project deals with performing YouTube data analysis using Hadoop and MapReduce. YouTube data is freely accessible, so we can run many analyses and draw out many insights, such as the top 10 rated videos and the most viewed videos. Analyzing structured data is relatively easy nowadays; however, analysis of unstructured data remains a challenging area. The objective is to analyze the data set using Hadoop concepts and to show how data generated from YouTube can be mined and used to make targeted, real-time, and informed decisions.
Requirements
- YouTube API
- Hadoop and MapReduce
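The "top 10 rated" and "most viewed" insights mentioned above boil down to a top-N selection. A minimal sketch, assuming each record is a (video_id, rating, views) tuple; that layout and the function names are assumptions for illustration, not the format returned by the YouTube API.

```python
import heapq


def top_rated(videos, n=10):
    """Return the n videos with the highest rating, best first.

    videos: iterable of (video_id, rating, views) tuples (assumed layout).
    """
    return heapq.nlargest(n, videos, key=lambda video: video[1])


def most_viewed(videos, n=10):
    """Return the n videos with the highest view count, best first."""
    return heapq.nlargest(n, videos, key=lambda video: video[2])


if __name__ == "__main__":
    sample = [("a", 4.5, 100), ("b", 4.9, 50), ("c", 3.0, 900)]
    print(top_rated(sample, n=2))
    print(most_viewed(sample, n=1))
```

heapq.nlargest keeps only n candidates in memory at a time, which is the same idea a reducer would use to avoid sorting the full data set.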
Recommendation System Using Hadoop
Nowadays, recommendation systems play a major role in e-commerce applications. They typically give users recommendations based on their previous searches and their profiles. The two commonly used approaches for recommendation systems are collaborative filtering and content-based filtering. The collected data set is huge and also contains unstructured data, so we need Hadoop to store and process it. The new system will find relationships between items and user interests, which will help customers select items that match their preferences.
Requirements
- YouTube API
- Hadoop and MapReduce
- HIVE
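To make the collaborative filtering approach more concrete, here is a small item co-occurrence sketch: items that often appear together in users' histories are recommended to users who have only some of them. This is one simple variant chosen for illustration, not necessarily the method the project uses; the data layout and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(user_items):
    """Count how often each ordered pair of items appears for the same user."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(set(items), 2):
            cooc[a][b] += 1
    return cooc


def recommend(user, user_items, cooc, n=3):
    """Score unseen items by their co-occurrence with the user's items."""
    seen = set(user_items[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in cooc[item].items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]


if __name__ == "__main__":
    history = {"u1": ["phone", "case"], "u2": ["phone", "case", "charger"]}
    print(recommend("u1", history, build_cooccurrence(history)))
```

Counting pairs is an aggregation by key, so it maps naturally onto a MapReduce job: the mapper emits item pairs per user and the reducer sums them.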
Climatic Data Analysis Using Hadoop
This project deals with analyzing climatic data using Hadoop and MapReduce to extract patterns for better decision-making. Climate is one of the key factors in businesses such as tourism, and climatic data can influence the success or failure of a business. Temperature and rainfall vary from one area to another, so it is important to analyze climatic data for different areas/regions. This project analyzes the climatic and other related data of different areas. The collected data is huge and includes unstructured data, which makes this a big data problem. Hadoop and MapReduce are used here to analyze the climatic data and predict future trends.
Requirements
- HBase
- Hadoop and MapReduce
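A typical first analysis on such data is an average per region. The sketch below assumes each input line has the hypothetical form "region,date,temperature"; the real data format is not specified here, so treat the parsing as a placeholder.

```python
from collections import defaultdict


def parse_record(line):
    """Map step: turn 'region,date,temperature' into (region, temperature).

    Returns None for malformed rows so they can be skipped.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        return parts[0], float(parts[2])
    except ValueError:
        return None


def mean_temperature(lines):
    """Reduce step: keep a running (sum, count) per region, then divide."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        region, temperature = record
        totals[region][0] += temperature
        totals[region][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


if __name__ == "__main__":
    sample = ["north,2020-01-01,10.0", "north,2020-01-02,20.0"]
    print(mean_temperature(sample))
```

Carrying a (sum, count) pair instead of a finished average is what allows partial results from different machines to be combined correctly.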
Aadhar Based Analysis Using Hadoop
This project deals with analyzing Aadhar data using Hadoop to extract useful models for better decision-making by the central and state governments. India is the second most populous nation, with a population of 1.3 billion. Surveys indicate that more than 99% of Indian people have enrolled for Aadhar. Data analysts have access to some of the public data, which they can analyze to extract useful information and generate reports. The data collected for this unique identity is not all structured; it includes unstructured and semi-structured data. So here we need a big data technology such as Hadoop, whose purpose is to store and process large amounts of data. This project therefore uses Hadoop and MapReduce for processing Aadhar data.
Requirements
- Hadoop and MapReduce
- HIVE
Twitter Data Analysis Using Hadoop
The Twitter Data Analysis Using Hadoop project classifies people's sentiments about recent issues in the country as positive, negative, or neutral using Hadoop. The project extracts useful and interesting patterns by collecting tweets and understanding people's opinions about an issue, a task also known as opinion mining. The data set is collected from citizens' tweets on Twitter. Because a huge volume of tweets is generated in unstructured text format, Hadoop and MapReduce are used for this type of application.
Requirements
- Twitter API
- Hadoop and MapReduce
- HIVE
Facebook Data Analysis Using Hadoop
This project analyzes Facebook data using Hadoop to understand user behavior for business purposes. Statistics say that there are nearly 1.37 billion daily active users on Facebook. Every user generates data on Facebook through their activity in the application, and most of this data is unstructured. Businesses collect and analyze this data to support their growth, and data analysis plays a major role in their profitability. A Facebook data analysis process consists of collecting data, analyzing data, and visualizing outcomes. The attributes might include user behavior, number of likes, number of posts, type of posts, comments, etc.
Requirements
- Facebook API
- Hadoop and MapReduce
- HIVE
Airline Sentiment Analysis Using Hadoop
This project analyzes sentiments by gathering data about different airlines from Twitter and predicts which airlines best suit ordinary travelers. Flight delays are an important issue in the airline industry because they lead to economic losses, and they are costly for both travelers and airlines. This project identifies the core factors that influence airline delays. The analysis lets airlines test their strategy, get feedback from customers, and hit the right market, at the right time, with the right product. It is difficult to collect customer feedback through questionnaires, but Twitter provides a sound data source for customer sentiment analysis. The sentiments collected from Twitter are classified as positive, negative, or neutral. Retweets, stop words, links, URLs, mentions, punctuation, and accentuation are removed so that the data set can be standardized.
Requirements
- Twitter API
- Hadoop and MapReduce
- HIVE
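The cleaning step described above (dropping retweets and stripping URLs, mentions, punctuation, accents, and stop words) could be sketched as follows. The stop-word list is a short hypothetical sample, the "RT " prefix check is an assumed convention for spotting retweets in raw text, and all function names are made up for this example.

```python
import re
import string
import unicodedata

# Hypothetical short stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "on", "for", "my", "was"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def is_retweet(tweet):
    """Assume retweets are marked by a leading 'RT ' in the raw text."""
    return tweet.lstrip().startswith("RT ")


def strip_accents(text):
    """Remove accent marks, e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_tweet(tweet):
    """Return the tweet as a list of lowercase tokens without noise."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = strip_accents(text).translate(PUNCT_TABLE).lower()
    return [word for word in text.split() if word not in STOP_WORDS]


def clean_corpus(tweets):
    """Drop retweets, then clean every remaining tweet."""
    return [clean_tweet(tweet) for tweet in tweets if not is_retweet(tweet)]


if __name__ == "__main__":
    print(clean_tweet("@AirlineX my flight was delayed! http://t.co/abc #late"))
```

URLs and mentions are removed before punctuation so that the "@" and "://" markers are still present when the patterns run.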
Agricultural Data Analysis Using Hadoop
The project focuses on analyzing agricultural system data. The data set consists of crop yield and crop details on a monthly as well as a yearly basis. The project analyzes productivity parameters to address the main problems faced by farmers, then identifies the bottlenecks and proposes the best possible solutions to them. This objective can be accomplished using Hadoop with MapReduce. The output is generated in the form of models of good farming solutions. The outcome includes information such as climate and which crops to grow based on factors like demand, change in production rate, and future trends.
Requirements
- Hadoop and MapReduce
- HIVE
Cricket Match Analysis Using Hadoop
This project concentrates on predicting the outcome of a cricket match using Hadoop. The prediction is made by analyzing historical and current data. In recent times, predicting sports results in advance using predictive analytics modeling has come to play a vital role in sports and business. Cricket is one of the trickiest and most unpredictable sports: on a given day, any team can win the match with its performance. This makes it challenging to predict the outcome of a cricket match accurately. To predict the result of a T20 game, we analyze which stadiums are most suitable for batting first and which for bowling first, the type of ground, the teams' past performance, the batting and bowling potential of the 11 players of both teams based on their past performance, and the toss factor.
Requirements
- Hadoop and MapReduce
- HIVE
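One of the factors listed above, whether a stadium favors batting first or bowling first, can be estimated from past results. The sketch below computes the share of matches won by the side batting first at each stadium. The record keys ('stadium', 'batted_first', 'winner') and the 0.5 threshold are assumptions for illustration, and this is a descriptive statistic rather than a full prediction model.

```python
from collections import defaultdict


def batting_first_win_rate(matches):
    """Return {stadium: fraction of matches won by the team batting first}.

    matches: iterable of dicts with the hypothetical keys
    'stadium', 'batted_first', and 'winner'.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for match in matches:
        played[match["stadium"]] += 1
        if match["winner"] == match["batted_first"]:
            won[match["stadium"]] += 1
    return {stadium: won[stadium] / played[stadium] for stadium in played}


def preferred_choice(rate, threshold=0.5):
    """Suggest a toss decision from a batting-first win rate."""
    return "bat first" if rate > threshold else "bowl first"


if __name__ == "__main__":
    sample = [
        {"stadium": "A", "batted_first": "X", "winner": "X"},
        {"stadium": "A", "batted_first": "Y", "winner": "X"},
    ]
    for stadium, rate in batting_first_win_rate(sample).items():
        print(stadium, rate, preferred_choice(rate))
```

With only a handful of matches per stadium these rates are noisy, so a real project would combine them with the other factors, such as team and player performance.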