Context Based Search Engine
Problem Definition for Context Based Search Engine
Context Based Search Engine is a research-oriented project. Its goals are to understand how search engines such as Google and MSN work, to study the concepts of data mining and data warehousing, and to survey the available search algorithms. We will also carry out a comparative analysis of the Google Search API and the Lucene framework.
Technology Used
Java
Platform
Windows Machine
Software and Hardware Requirements
- JDK 1.5
- Microsoft Windows XP Professional SP2
- 512 MB RAM
- 80 GB HDD
- Pentium 4 processor
Context Based Search Engine Project Description
Although several new operating systems attempt to provide users with content-based search capabilities, these capabilities are limited to text documents. A key challenge in implementing a content-based similarity search system for feature-rich data is that such data is noisy and complex. For example, consider two different photographs of an identical scene, or two separate recordings of a person speaking the same sentence. Despite the high degree of similarity between the two images or between the two audio files, their digital representations differ at a very low level. Because of this noise, such data must be matched by similarity of pattern rather than by an exact match of the digital representation. Similarity search over high-dimensional data, however, is notoriously difficult. As a result, today's search tools, such as database tools and search engines, are largely limited to searching for exact matches, and they work only on textual data and text annotations. To date, there is no practical content-based search engine for massive amounts of inherently noisy, feature-rich data.
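The difference between exact matching and similarity matching can be illustrated with a small Java sketch. It is not part of the project; the class name SimilarityDemo, the choice of cosine similarity as the measure, and the idea that each photograph or recording has already been reduced to a short feature vector are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; the project text does not prescribe any
// particular similarity measure. Cosine similarity is assumed here.
class SimilarityDemo {

    // Exact match: true only if every value is identical.
    static boolean exactMatch(double[] a, double[] b) {
        return Arrays.equals(a, b);
    }

    // Cosine similarity of two equal-length feature vectors.
    // Returns 0.0 when either vector has zero length (no direction to compare).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Similarity match: true if the vectors point in nearly the same direction.
    // The threshold is an assumed tuning parameter.
    static boolean similar(double[] a, double[] b, double threshold) {
        return cosine(a, b) >= threshold;
    }
}
```

With two slightly different vectors standing in for two photographs of the same scene, exactMatch reports no match, while similar with a threshold such as 0.99 still pairs them. This sketch only covers short vectors; it does not address the difficulty of searching high-dimensional data at scale.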
Our application will be a code indexing and search application built on the Search API and the Lucene framework. It is a specialized domain that branches out from context based searching.
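The core idea behind a code indexing and search application can be sketched with a tiny inverted index in plain Java. This is only an illustration of the concept and deliberately does not use the Lucene API; the class name CodeIndex, its methods, and the tokenization rule (split on anything that is not a letter, digit or underscore) are assumptions, not the project's actual design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class: a minimal in-memory inverted index
// mapping each identifier or keyword to the files that contain it.
class CodeIndex {

    // token (lower-cased) -> sorted set of file names containing it
    private final Map<String, Set<String>> postings =
            new HashMap<String, Set<String>>();

    // Index one source file: split it into tokens and record the file
    // name under every token found.
    void add(String fileName, String source) {
        String[] tokens = source.split("[^A-Za-z0-9_]+");
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.length() == 0) {
                continue; // split can yield an empty leading token
            }
            String key = token.toLowerCase();
            Set<String> files = postings.get(key);
            if (files == null) {
                files = new TreeSet<String>();
                postings.put(key, files);
            }
            files.add(fileName);
        }
    }

    // Look up a single term (case-insensitive). Returns the matching
    // file names in alphabetical order, or an empty list if none match.
    List<String> search(String term) {
        Set<String> files = postings.get(term.toLowerCase());
        if (files == null) {
            return Collections.emptyList();
        }
        return new ArrayList<String>(files);
    }
}
```

A caller would add each source file once and then query by identifier, for example searching for a method name to list the files that mention it. A framework such as Lucene adds what this sketch leaves out, including persistent storage, ranking, and richer query syntax.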