Climatic Data Analysis Using Hadoop
Objective
To analyze the climatic data using Hadoop to extract significant knowledge for the purpose of better decision-making.
Project Overview
Climate is an important factor in many areas, such as business and tourism, and climatic data influences the success of many operations. For example, temperature and rainfall in one area are not the same as in another. It is therefore important to analyze climatic data across different regions to support better decision-making and precautionary measures.
This project attempts to analyze the climatic data of various regions. Climate data is rapidly increasing in volume and complexity, which makes its analysis a big data problem. Hadoop and MapReduce are used here to analyze the climatic data for better understanding and decision-making.
Proposed System
The proposed system focuses on analyzing climate related data using Hadoop. The proposed system architecture is shown in the figure.
Figure: Proposed System Architecture
Step 1: Data Preparation
Data Selection: The required data set is collected from the website https://knoema.com/atlas/India/topics/Climate-Change/datasets.
Data Loading: The collected data set is loaded into the Hadoop Distributed File System (HDFS). Hadoop is well suited to processing large and dynamic climate data, which makes it a useful platform for predicting climatic conditions.
Data Preprocessing: The collected data set might contain inconsistent data. If climatic analysis is performed on such data, it will produce wrong outcomes. Therefore, the necessary preprocessing techniques are applied before analyzing the data.
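As an illustration of the preprocessing step, the following is a minimal sketch in Python. The project does not specify its cleaning rules, so the column names (city, date, temp_c), the plausibility range, and the sample rows are all assumptions made for this example. It drops rows with missing fields, non-numeric temperatures, out-of-range values, and duplicate city/date pairs.

```python
import csv
import io


def clean_records(rows, min_temp=-90.0, max_temp=60.0):
    """Yield (city, date, temp) tuples, skipping inconsistent rows.

    The field names and the temperature range are assumptions for this
    sketch; they are not taken from the project's data set.
    """
    seen = set()
    for row in rows:
        city = (row.get("city") or "").strip()
        date = (row.get("date") or "").strip()
        raw = (row.get("temp_c") or "").strip()
        if not city or not date or not raw:
            continue  # a required field is missing
        try:
            temp = float(raw)
        except ValueError:
            continue  # temperature is not a number
        if not (min_temp <= temp <= max_temp):
            continue  # physically implausible reading (also drops NaN)
        key = (city, date)
        if key in seen:
            continue  # duplicate record for the same city and date
        seen.add(key)
        yield city, date, temp


# Hypothetical sample data, for illustration only.
sample = io.StringIO(
    "city,date,temp_c\n"
    "Chennai,2020-01-01,29.5\n"
    "Chennai,2020-01-01,29.5\n"
    "Delhi,2020-01-01,\n"
    "Mumbai,2020-01-01,n/a\n"
    "Pune,2020-01-01,999\n"
    "Delhi,2020-01-02,14.0\n"
)
cleaned = list(clean_records(csv.DictReader(sample)))
print(cleaned)
```

Only the first Chennai row and the second Delhi row survive the filters. In a Hadoop job, the same checks could run inside the map step so that bad rows never reach the analysis.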
Step 2: Climatic Data Analysis
Climatic Data Analysis: The preprocessed data set is now analyzed using machine learning and statistics. Prediction techniques such as regression are applied to the climate data to predict the future climate in a city.
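The project names regression as a prediction technique without giving details. One simple possibility is ordinary least-squares regression of temperature against the day index, sketched below in Python. The function name fit_line and the temperature values are hypothetical, and a real model would likely need more features than time alone.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily average temperatures (degrees Celsius) for days 0..4.
days = [0, 1, 2, 3, 4]
temps = [30.0, 30.5, 31.0, 31.5, 32.0]

slope, intercept = fit_line(days, temps)
next_day_temp = slope * 5 + intercept  # extrapolate to day 5
print(slope, intercept, next_day_temp)
```

The slope is the estimated temperature change per day, so it also serves as a rough rate-of-change figure for the fitted period.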
Step 3: Reports
Report Generation: After the climate data analysis, the necessary reports are generated and visualized. Bar charts and line charts are used along with the table format. Inferences and conclusions are derived from the analyzed data.
Statistics Questions
- Find the mean and median of the maximum temperature on an hourly basis
- Find the mean and median of the minimum temperature on an hourly basis
- Identify the rate of change of daily average temperature
- Identify the rate of change of weekly average temperature
- Identify the rate of change of monthly average temperature
- Count the data points available for the various cities
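The statistics questions above fit the MapReduce pattern: a map step emits a key (such as the hour) with a temperature, and a reduce step aggregates all values that share a key. The following Python sketch imitates that flow in memory. The record layout (a "YYYY-MM-DD HH:MM" timestamp paired with a maximum temperature), the function names, and the sample readings are assumptions for illustration, not the project's actual job.

```python
from collections import defaultdict
from statistics import mean, median


def map_reading(reading):
    """Map step: emit (hour, temperature) for one reading.

    Assumes timestamps look like "YYYY-MM-DD HH:MM".
    """
    timestamp, temp = reading
    hour = int(timestamp[11:13])
    return hour, temp


def reduce_by_key(pairs):
    """Reduce step: group values by key, then return (mean, median) per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: (mean(vals), median(vals)) for key, vals in sorted(groups.items())}


def rate_of_change(averages):
    """Difference between each consecutive pair of period averages."""
    return [later - earlier for earlier, later in zip(averages, averages[1:])]


# Hypothetical maximum-temperature readings, for illustration only.
readings = [
    ("2020-01-01 09:00", 24.0),
    ("2020-01-02 09:00", 26.0),
    ("2020-01-03 09:00", 31.0),
    ("2020-01-01 10:00", 27.0),
    ("2020-01-02 10:00", 29.0),
]

hourly_stats = reduce_by_key(map(map_reading, readings))
daily_change = rate_of_change([25.0, 26.5, 26.0])
print(hourly_stats)
print(daily_change)
```

The same rate_of_change helper applies to daily, weekly, or monthly averages; only the key used in the map step changes. Using the city as the key and counting the grouped values would answer the per-city data point question.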
Machine Learning
- Clustering: Identify the regions that are close to each other based on the climatic conditions.
- Classification: Classify each day in the climate data set as a sunny day or a rainy day, depending on the temperature.
- Prediction: Predict the next day's climate in the city, along with the possible rate of change compared to the existing climate.
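The clustering task above does not name an algorithm. One common choice is k-means, sketched here in plain Python under several assumptions: each region is described by two features (average temperature and rainfall), the region names and values are invented, and the initial centroids are simply the first k points so that the result is reproducible.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters with a basic k-means loop.

    Initial centroids are the first k points (an assumption made for
    reproducibility; real implementations usually pick them randomly).
    """
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical regions: (average temperature in Celsius, rainfall in mm).
regions = {
    "Region A": (30.0, 100.0),
    "Region B": (12.0, 20.0),
    "Region C": (31.0, 110.0),
    "Region D": (11.0, 25.0),
}

labels, centroids = kmeans(list(regions.values()), k=2)
clusters = dict(zip(regions, labels))
print(clusters)
```

In this sample, Regions A and C end up in one cluster and Regions B and D in the other. Because temperature and rainfall are on different scales, normalizing the features before clustering would likely be needed on real data.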
Software Requirements
- Linux OS
- MySQL
- Hadoop & MapReduce
Hardware Requirements
- Hard Disk – 1 TB or Above
- RAM required – 4 GB or Above
- Processor – Core i3 or Above
Technology Used
- Big Data – Hadoop
- Machine Learning
- Statistics