To Implement Web Crawler in Java BE(IT) CLP-II Practical
Aim: To implement a web crawler in Java.
A web crawler is a program or piece of code that a search engine uses to index web pages across the web. It crawls an HTML page to find the keywords on that page so that the search engine can index it.
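As a rough illustration of what "finding the keywords" on a page could look like, here is a minimal sketch that strips the tags from an HTML string and counts how often each word occurs. The class name KeywordSketch, the method countKeywords, and the rule of skipping words shorter than three characters are assumptions made for this example only; real search engines use far more elaborate processing.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeywordSketch {
    // Hypothetical helper: removes tags with a simple regex, then counts words.
    static Map<String, Integer> countKeywords(String html) {
        // Replace every tag with a space so neighbouring words do not merge.
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("[^a-z0-9]+")) {
            if (word.length() < 3) {
                continue; // skip empty strings and very short words (assumed rule)
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Java crawler</h1>"
                + "<p>A crawler reads pages.</p></body></html>";
        // Prints each keyword with its count, sorted alphabetically.
        System.out.println(countKeywords(html));
    }
}
```

A regex is enough for a small demonstration like this one, but it does not handle scripts, comments, or character entities; an HTML parser would be a safer choice for real pages.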
The Java web crawler below fetches “google.com” and counts the links to other pages that it finds.
import java.net.*;
import java.io.*;
import java.util.regex.*;

public class crawler {
    public static void main(String[] args) {
        String source_url = "https://google.com";
        try {
            URL url = new URL(source_url);
            URLConnection yc = url.openConnection();
            // Start from an empty buffer; starting from null would prepend the text "null".
            StringBuilder data = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
            String inputLine;
            // Read the whole page into memory, line by line.
            while ((inputLine = in.readLine()) != null)
                data.append(inputLine);
            in.close();
            int i = 0;
            // Match each anchor element: <a ... href="..." ...>link text</a>
            Pattern pattern = Pattern.compile("<a[^>]*href=\"[^>]*>(.*?)</a>");
            Matcher matcher = pattern.matcher(data);
            while (matcher.find()) {
                System.out.println((i + 1) + ". " + matcher.group());
                i = i + 1;
            }
            System.out.println("TOTAL LINKS: " + i);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
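The program above only counts pattern matches. A crawler usually also needs the link targets themselves so that it can visit them next. The following is a minimal sketch of that step, not part of the original practical: it captures the value of each double-quoted href attribute, resolves relative links against a base URL, and drops duplicates. The names LinkExtractorSketch and extractLinks and the example.com addresses are assumptions chosen for illustration, and the sketch works on an in-memory string so it runs without network access.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {
    // Captures the value of a double-quoted href attribute inside an <a> tag.
    private static final Pattern HREF =
            Pattern.compile("<a\\b[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the distinct absolute link targets found in html.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Relative links such as "/about" become absolute URLs.
                seen.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> "
                + "<a class=\"x\" href=\"https://example.org/\">External</a> "
                + "<a href=\"/about\">About again</a>";
        for (String link : extractLinks(html, "https://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```

The sketch ignores single-quoted and unquoted href values, so a production crawler would more likely use an HTML parser than a regular expression.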