• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
projectsgeek

ProjectsGeek

Download Mini projects with Source Code, Java projects with Source Codes

  • Home
  • Java Projects
  • C++ Projects
  • VB Projects
  • PHP projects
  • .Net Projects
  • NodeJs Projects
  • Android Projects
    • Project Ideas
      • Final Year Project Ideas
      • JSP Projects
  • Assignment Codes
    • Fundamentals of Programming Language
    • Software Design Laboratory
    • Data Structure and Files Lab
    • Computer Graphics Lab
    • Object Oriented Programming Lab
    • Assembly Codes
  • School Projects
  • Forum

Wiki Page Ranking With Hadoop Project

February 15, 2018 by ProjectsGeek Leave a Comment

Wiki Page Ranking With Hadoop

 

Objective

To ranking the given wiki pages using Hadoop.

Project Overview

One of the biggest changes in this decade is the availability of efficient and accurate information on the web. Google is the largest search engine in this world owned by Google, Inc. The heart of the search engine is PageRank, the algorithm for ranking the web pages developed by Larry Page and Sergey Brin at Stanford University. Page Rank algorithm does not rank the whole website, but it’s determined for each page individually.

The page ranking is not a completely new concept in the internet search. But it’s becoming more important nowadays, to improve the web page ranking position in the search engine. Wikipedia has more than 3 million articles as of now and still its increasing everyday. Every article has links to many other articles. This is the significant factor in page ranking. Each article has incoming and outgoing links. So analyzing which page is the most important page than the other pages is the key. Page ranking does this and rank the pages based on its importance.

Proposed System

The proposed Wiki Page Ranking With Hadoop project system focuses on creating best page ranking algorithm for Wikipedia articles using Hadoop. The proposed system architecture is shown in the figure.

Wiki Page Ranking With Hadoop

All the document data in the database are indexed. The web search engine searches the keyword in the indexing of the database.Then to search the information in the database, the web crawler is used. After finding the pages, the search engine shows the top web pages that are related to the query.

MapReduce consists of several iterative set of functions to perform the set of search results. Map() function gathers the query results from the search and reduce() function performs the add operation. The wiki page ranking project using Hadoop involves 3 important Hadoop steps.

  • Parsing
  • Calculating
  • Ordering

Parsing

In the parsing step, wiki xml is parsed into articles in Hadoop Job. During the mapping phase, get the name of the article and outgoing link connections are mapped. In the reduce phase, get the links to other pages.Then store the page, early rank and outgoing links.

Map(): Article name, outgoing link

Reduce(): Link to other pages

Store: Page, Rank, Outgoing link

Calculating

In the calculating step, the Hadoop job will calculate the new rank for the web pages.In the mapping phase, each outgoing link to the page with its rank and total outgoing links are mapped using map function.In the reduce phase, calculate the new page rank for the pages.Then store the page, new rank and outgoing links.Repeat these processes iterative to get the best results.

Map(): Rank, outgoing link

Reduce(): Page rank

Store: Page, Rank, Outgoing link

Ordering

Here the job function will map the identified page. Then store the rank and page. Now we can see top n pages in the web search.

Wiki Page Ranking With Hadoop Benefits

  • Fast and accurate web page results
  • Less time consuming

Software Requirements

  • Linux OS
  • MySQL
  • Hadoop & MapReduce

Hardware Requirements

  • Hard Disk – 1 TB or Above
  • RAM required – 8 GB or Above
  • Processor – Core i3 or Above

Technology Used

  • Big Data – Hadoop

Other Projects to Try:

  1. Web Page Ranking With Hadoop Project
  2. Search Engine using Python Project
  3. To Implement Web Crawler in Java BE(IT) CLP-II Pratical
  4. Search Engine project in Java
  5. Big Data Hadoop Projects Ideas

Filed Under: Hadoop Projects Tagged With: Hadoop Projects

Reader Interactions

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Primary Sidebar

Tags

.Net Projects Download Android Project Ideas Android Projects Angular 2 Assembly Codes C # Projects C & C++ Projects C++ Projects Class Diagrams Computer Graphics Database Project Data Mining Projects DataScience Projects Datastructure Assignments Download Visual Basic Projects Electronics project Hadoop Projects Installation Guides Internet of Things Project IOS Projects Java Java Interview Questions Java Projects JavaScript JavaScript Projects java tutorial JSON JSP Projects Mechanical Projects Mongodb Networking Projects Node JS Projects OS Problems php Projects Placement Papers Project Ideas Python Projects seminar and presentation Struts

Search this Website


Footer

Download Java Project
Download Visual Basic Projects
Download .Net Projects
Download VB Projects
Download C++ Projects
Download NodeJs Projects
Download School Projects
Download School Projects
Ask Questions - Forum
Latest Projects Ideas
Assembly Codes
Datastructure Assignments
Computer Graphics Lab
Operating system Lab
australia-and-India-flag
  • Home
  • About me
  • Contact Form
  • Submit Your Work
  • Site Map
  • Privacy Policy