Commit 6b72f56

Added script and pipeline for scheduled link rot checker
Signed-off-by: krishnaduttPanchagnula <[email protected]>
1 parent dbf3cd9 commit 6b72f56

2 files changed: +47 -0 lines changed

.github/workflows/lin-rot-checker.yml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+name: Link rot Checker Scheduled Job
+
+on:
+  schedule:
+    - cron: '0 8 * * *'
+
+jobs:
+  run-python:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Execute Python script
+        run: pip install requests && python hack/lin-rot-checker.py
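
The `cron: '0 8 * * *'` entry fires once a day at 08:00, and GitHub Actions evaluates schedule expressions in UTC. As a rough illustration of what that means for a given moment, the sketch below computes the next firing time. The function name `next_daily_run` is hypothetical and not part of this commit, and it only models this one daily pattern, not general cron syntax. Actual scheduled runs may also start later than the nominal time when the service is under load.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 8) -> datetime:
    """Return the next time a '0 <hour> * * *' schedule fires, in UTC.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    # Today's firing time at <hour>:00:00 UTC.
    candidate = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
    # If that moment has already passed (or is right now), use tomorrow's.
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate
```

Calling `next_daily_run(datetime.now(timezone.utc))` gives the next nominal run of this workflow.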

hack/lin-rot-checker.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+import re
+import glob
+import requests
+
+# Regex pattern to extract http(s) URLs from markdown links [text](url)
+pattern = re.compile(r'\[.*?\]\((https?://[^\s)]+)')
+
+links = []
+
+# Iterate over all .md files in all directories, including children
+for filename in glob.glob("**/*.md", recursive=True):
+    with open(filename, 'r', encoding='utf-8') as f:
+        print(f"Processing file: {filename}")
+        content = f.read()
+        found_links = pattern.findall(content)
+        links.extend(found_links)
+
+print(f"Extracted links: {links}")
+
+for link in links:
+    try:
+        if requests.head(link, allow_redirects=True, timeout=10).status_code < 400:
+            print(f'{link} is valid')
+        else:
+            print(f'{link} is not valid')
+    except requests.exceptions.RequestException as e:
+        print(f'{link} could not be checked: {e}')
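
The script in this commit only prints results, so the scheduled job would succeed even when links are broken. One possible way to make the check testable and let the workflow fail on broken links is sketched below. The names `extract_links`, `find_broken`, `exit_code`, and the injected `fetch_status` callable are all hypothetical and not part of this commit. Restricting the pattern to `http(s)` targets and treating any status of 400 or above as broken are assumptions of the sketch.

```python
import re
from typing import Callable, Iterable, List

# Markdown links [text](url), restricted to http(s) targets (an assumption).
LINK_PATTERN = re.compile(r'\[.*?\]\((https?://[^\s)]+)')


def extract_links(markdown: str) -> List[str]:
    """Return http(s) URLs from markdown links, in order, without duplicates."""
    seen = set()
    unique = []
    for url in LINK_PATTERN.findall(markdown):
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def find_broken(links: Iterable[str], fetch_status: Callable[[str], int]) -> List[str]:
    """Return links whose status is 400 or above, or whose fetch raised."""
    broken = []
    for link in links:
        try:
            status = fetch_status(link)
        except Exception:
            # Any failure to fetch (DNS, timeout, bad URL) counts as broken.
            broken.append(link)
            continue
        if status >= 400:
            broken.append(link)
    return broken


def exit_code(broken: List[str]) -> int:
    """Non-zero when at least one link is broken, so a CI job can fail."""
    return 1 if broken else 0
```

In the script, `fetch_status` could be `lambda url: requests.head(url, allow_redirects=True, timeout=10).status_code`, with `sys.exit(exit_code(broken))` as the last statement. Passing the fetcher in as a parameter keeps the logic testable without network access.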
