
Devanshv17/Smart-Link


Automated-Content-Analysis-Website

Team Whackos

Abhijeet Kumar, Aryan Mittal, Devansh Verma, Riya Sanket Kashive

Overview

This solution automates the extraction and content analysis of all embedded links from a website, regardless of their location. It generates concise and relevant questions and identifies the most relevant links and topics for those questions. An automated verification and metric system then assesses these outputs for conciseness and relevance. Detailed documentation of the repository is laid out in this document.

Key Features:

  1. Data Scraping: Utilizes Selenium to extract all embedded links from the target website.
  2. Data Storage: JSON files are used to store and organize the extracted data.
  3. Question Generation: The duckduckgo_search library is used in conjunction with the Gemini API to generate precise and pertinent questions.
  4. Link-Question Mapping and Relevance Metric: TF-IDF vectorization is used to map the generated questions to the most relevant links, and also serves as the relevance metric for evaluating the quality of those mappings.
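
The first two features (scraping with Selenium and storing results as JSON) can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual code: the function names `normalize_links`, `scrape_links`, `save_links`, and `load_links`, the choice of Chrome as the browser, and the JSON layout with `source` and `links` keys are all assumptions made for the example.

```python
import json
from urllib.parse import urldefrag, urljoin


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url, drop fragments, non-HTTP links, and duplicates."""
    seen = set()
    links = []
    for href in hrefs:
        if not href:
            continue  # anchors without an href attribute yield None
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def scrape_links(url):
    """Collect every <a href> on the rendered page (assumes Selenium 4 and a local Chrome)."""
    # Imported lazily so the pure helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        hrefs = [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()
    return normalize_links(url, hrefs)


def save_links(source_url, links, path):
    """Write the extracted links to a JSON file (hypothetical schema)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"source": source_url, "links": links}, f, indent=2)


def load_links(path):
    """Read back the list of links written by save_links."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["links"]
```

A typical call would be `save_links(url, scrape_links(url), "links.json")`. Keeping the normalization step separate from the browser code makes it possible to check the de-duplication logic without launching a browser.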

Problem Statement

For detailed information on the problem statement, please refer to this document.

Achievements

Our solution achieved an accuracy of 83%.


Milestones

  1. Extraction of embedded links from a website.
  2. Content analysis and question generation.
  3. Implementation of an automated system for verification and relevance assessment.
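
To make the relevance assessment in the last milestone more concrete, here is a small from-scratch sketch of TF-IDF based question-to-link mapping using cosine similarity. It is an illustration under stated assumptions, not the project's implementation: the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_links`), the smoothed IDF formula, the use of cosine similarity as the score, and the example URLs and texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_links(question, link_texts, top_k=3):
    """Rank links by similarity to the question; link_texts maps URL -> page text."""
    urls = list(link_texts)
    # The question is vectorized together with the pages so all share one vocabulary.
    vectors = tfidf_vectors([link_texts[u] for u in urls] + [question])
    q_vec = vectors[-1]
    scored = [(u, cosine(q_vec, v)) for u, v in zip(urls, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical example data, not taken from the project.
pages = {
    "https://example.com/admissions": "admission requirements and application deadlines for students",
    "https://example.com/sports": "football and cricket teams training schedule",
}
ranked = rank_links("What are the application deadlines?", pages)
```

In this sketch the same score does double duty: the highest-scoring URL is the mapped link, and the score itself (between 0 and 1) can be read as a relevance measure for that mapping.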
