Gooly

A Google-based search tool with automated data crawling from dynamic websites
Position
Creator
Type
Software
Date
2022
Source

Description


Gooly

A Google-based search engine. You can try it here: http://gooly.phamvietduc.com

Overview
Gooly is a Google-clone search bar backed by a crawling tool on the back end. Here is a first look:

Behind the project is a tool that collects data from public websites and inserts it into a MySQL database. Here is how it looks:

Database


The idea behind this project

During my exposure to web programming, I noticed that most websites, especially electronic information sites, adhere to a common standard in order to increase the number of visitors. When inspecting these pages, we see the presence of HTML "meta" tags. These tags describe the general content of the page, such as its title, main image, and topic. For example, when I inspected the homepages of Cnet and The Verge, I could see similarities in the naming of their meta tags, such as "og:site_name" and "og:description".



Inspecting data from Cnet HTML meta tag


Inspecting data from The Verge HTML meta tag
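
The "og:" names seen on both sites are Open Graph properties, which is why they look alike across publishers. As a rough illustration of how such tags can be read, here is a minimal sketch using only Python's standard library. The class name `OpenGraphParser`, the helper `extract_open_graph`, and the sample markup are assumptions made for this example; they are not taken from Gooly's code or from either site.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta> values whose key starts with "og:" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph tags normally use "property"; some sites use "name" instead.
        key = attr.get("property") or attr.get("name") or ""
        content = attr.get("content")
        if key.startswith("og:") and content is not None:
            self.og[key] = content


def extract_open_graph(html):
    """Return a dict of og:* keys to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og


# Hypothetical sample markup, not copied from any real site.
SAMPLE_HTML = """
<html><head>
<meta property="og:site_name" content="Example News">
<meta property="og:description" content="Technology news and reviews">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

print(extract_open_graph(SAMPLE_HTML))
```

Tags without an "og:" prefix, such as the viewport tag in the sample, are ignored, so the result contains only the fields a search index would likely care about.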

With that idea in mind, I started developing a tool that recursively crawls public websites, reading their meta tags and following their anchor tags, and stores the collected data in the database. The collected data is then served to the search engine.
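
The crawl-and-store loop described above could look roughly like the sketch below. It is an illustration under stated assumptions, not Gooly's actual implementation: the `pages` table and its columns are invented for the example, `sqlite3` stands in for MySQL so the snippet runs without a database server, and pages come from a hypothetical in-memory dictionary (`FAKE_WEB`) instead of real HTTP requests.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects Open Graph meta values and anchor hrefs from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("property") or "").startswith("og:"):
            self.meta[attr["property"]] = attr.get("content") or ""
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])


def crawl(url, fetch, db, seen, depth=2):
    """Store the metadata of url, then recurse into the pages it links to."""
    if depth < 0 or url in seen:
        return
    seen.add(url)  # avoid revisiting pages that link back to each other
    html = fetch(url)
    if html is None:
        return
    parser = PageParser()
    parser.feed(html)
    db.execute(
        "INSERT INTO pages (url, title, description) VALUES (?, ?, ?)",
        (url, parser.meta.get("og:title", ""), parser.meta.get("og:description", "")),
    )
    for href in parser.links:
        # Resolve relative links against the current page before following them.
        crawl(urljoin(url, href), fetch, db, seen, depth - 1)


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<meta property="og:title" content="Home">'
                            '<a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<meta property="og:title" content="Page A">'
                             '<meta property="og:description" content="All about crawling">'
                             '<a href="/">home</a>',
    "http://example.test/b": '<meta property="og:title" content="Page B">',
}

db = sqlite3.connect(":memory:")  # assumption: stand-in for the MySQL database
db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, description TEXT)")
crawl("http://example.test/", FAKE_WEB.get, db, set())
db.commit()

# A naive keyword search over the collected rows.
rows = db.execute(
    "SELECT url FROM pages WHERE title LIKE ? OR description LIKE ?",
    ("%crawling%", "%crawling%"),
).fetchall()
print(rows)
```

The `seen` set and the `depth` limit keep the recursion from looping forever on sites whose pages link back to each other. Passing `fetch` as a parameter means a real HTTP client can be swapped in for `FAKE_WEB.get` without changing the crawl logic.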

Tech Stack

Python

React

HTML

CSS

JavaScript

MySQL

Git

Django

Docker
