In this tutorial, we'll build a desktop app that:
✅ Extracts links from files (.txt, .pdf, .html)
✅ Filters links (include/exclude keywords)
✅ Checks if links are broken
✅ Displays results with colors (🟢 working / 🔴 broken)
✅ Uses a modern GUI with PySide6
📦 Step 1: Install Dependencies
First, install the required packages:
pip install PySide6 requests PyPDF2
🔧 Step 2: Import Required Libraries
We start by importing everything we need:
import os
import sys
import re
import requests
import time
import platform
import subprocess
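from PySide6.QtWidgets import (
    QApplication, QWidget, QVBoxLayout, QHBoxLayout, QLineEdit, QPushButton,
    QCheckBox, QFileDialog, QListWidget, QListWidgetItem, QProgressBar,
)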
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QColor, QIcon, QGuiApplication
import PyPDF2
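💡 Explanation:
os, re → walking folders and matching URLs with regular expressions
requests → checking whether links respond
PySide6 → GUI framework
PyPDF2 → extracting text from PDFs
sys, platform, subprocess → starting the Qt app and opening links in the system browser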
🧵 Step 3: Create a Background Worker (QThread)
We use a thread so the UI doesn't freeze while scanning.
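class LinkWorker(QThread):
    found = Signal(str, bool)  # (url, is_broken) for each link found
    progress = Signal(int)     # percentage of files processed, 0-100
    # No custom `finished` signal: QThread already emits a built-in
    # `finished` signal when run() returns, and redeclaring it would shadow it.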
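💡 Why?
Qt draws the window and handles clicks on the main thread. If that thread is busy reading files or waiting on HTTP requests, the window freezes, so the heavy work runs in a QThread and reports back through signals.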
📝 Step 3.1: Initialize Worker
    def __init__(self, folder, file_types, check_broken, include_words=None, exclude_words=None):
        super().__init__()
        self.folder = folder
        self.file_types = file_types
        self.check_broken = check_broken
        self.include_words = include_words or []
        self.exclude_words = exclude_words or []
        self.seen_links = set()
        self._running = True
💡 Features:
seen_links avoids reporting the same link twice
include_words / exclude_words hold the keyword filters (empty lists mean no filtering)
_running lets the scan be stopped early
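The three features above boil down to a set and a flag. The sketch below shows that bookkeeping in plain Python so it can be tried without Qt. `ScanState`, `accept()`, `scan()` and `stop()` are hypothetical names: the tutorial never shows how `_running` gets cleared, so a `stop()` method called from a Stop button is an assumption.

```python
class ScanState:
    """Hypothetical helper mirroring LinkWorker's seen_links / _running fields."""

    def __init__(self):
        self.seen_links = set()
        self._running = True

    def stop(self):
        # Called from the GUI side; the scan loop checks this flag between items
        self._running = False

    def accept(self, url):
        """Return True the first time a URL is seen, False for repeats."""
        if url in self.seen_links:
            return False
        self.seen_links.add(url)
        return True


def scan(urls, state):
    """Report each URL once, bailing out as soon as stop() has been called."""
    reported = []
    for url in urls:
        if not state._running:
            break  # stop() was called: leave the loop early
        if state.accept(url):
            reported.append(url)
    return reported


state = ScanState()
print(scan(["https://a.example", "https://b.example", "https://a.example"], state))
# prints ['https://a.example', 'https://b.example']
```

In `LinkWorker.run()` the same `_running` check would sit at the top of the per-file and per-URL loops, so a stop request takes effect after the current item.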
📂 Step 3.2: Scan Files
    def run(self):
        all_files = []
        for root, _, files in os.walk(self.folder):
            for f in files:
                ext = os.path.splitext(f)[1].lower()
                if (ext == '.txt' and self.file_types['txt']) or \
                   (ext == '.pdf' and self.file_types['pdf']) or \
                   (ext in ['.html', '.htm'] and self.file_types['html']):
                    all_files.append(os.path.join(root, f))
💡 What happens:
os.walk visits the chosen folder and every subfolder
Only files whose extension is enabled are kept (.htm counts as HTML)
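The same selection logic can be written as a standalone function that maps each extension to its checkbox key, which makes it easy to test against a throwaway folder. `EXTENSIONS` and `find_files` are illustrative names, not part of the original code. The result is sorted because `os.walk` returns entries in whatever order the filesystem gives it.

```python
import os
import tempfile

# Map each extension to its key in the file_types dict ({'txt': ..., 'pdf': ..., 'html': ...})
EXTENSIONS = {".txt": "txt", ".pdf": "pdf", ".html": "html", ".htm": "html"}


def find_files(folder, file_types):
    """Recursively collect paths whose extension is enabled in file_types."""
    matches = []
    for root, _, files in os.walk(folder):
        for name in files:
            key = EXTENSIONS.get(os.path.splitext(name)[1].lower())
            if key and file_types.get(key):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Demo on a throwaway folder: notes.md is ignored, .HTM is matched case-insensitively
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.HTM"), "notes.md"):
        open(os.path.join(tmp, rel), "w").close()
    found = find_files(tmp, {"txt": True, "pdf": True, "html": True})
    print([os.path.relpath(p, tmp) for p in found])
```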
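🔗 Step 3.3: Extract Links
urls = re.findall(r'https?://[^\s"\'<>]+', text)
💡 Regex explained:
Matches http:// or https://
Then takes every character up to whitespace, a quote, or an angle bracket (< or >), so URLs inside HTML tags and attributes end cleanly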
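A greedy pattern like this also swallows punctuation that only ends a sentence: in `see https://example.com.` the final period becomes part of the match. One possible refinement, sketched as a hypothetical `extract_urls` helper, strips common trailing punctuation after matching. It also treats `<` as a terminator so `<p>http://a.example</p>` does not pick up `</p`. Which characters to trim is a judgment call.

```python
import re

# Stop at whitespace, quotes and angle brackets
URL_RE = re.compile(r'https?://[^\s"\'<>]+')
# Characters that usually end a sentence rather than belong to the URL
TRAILING = '.,;:!?)]}'


def extract_urls(text):
    """Return URLs in order of appearance with trailing punctuation removed."""
    return [m.rstrip(TRAILING) for m in URL_RE.findall(text)]


sample = ('Docs: https://doc.qt.io/qtforpython-6/. See '
          '<a href="https://pypi.org/project/requests/">PyPI</a> (or https://example.com).')
print(extract_urls(sample))
# ['https://doc.qt.io/qtforpython-6/', 'https://pypi.org/project/requests/', 'https://example.com']
```

The trade-off: URLs that legitimately end in `)` (some Wikipedia article links, for example) lose that character.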
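📄 Handle PDF Files
reader = PyPDF2.PdfReader(f)  # f: full path of the current .pdf file
text = ""
for page in reader.pages:
    # `or ""` covers pages with no text layer (e.g. scanned images)
    text += (page.extract_text() or "") + "\n"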
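Each file type needs its own reader before the regex runs. Below is a minimal sketch of a dispatcher, assuming a hypothetical `read_file_text(path)` helper. Plain text and HTML are read as text, with undecodable bytes replaced rather than raising. PyPDF2 is imported only when a PDF actually shows up. Passing raw HTML to the regex is deliberate: URLs inside `href` attributes are plain text in the markup.

```python
import os
import tempfile


def read_file_text(path):
    """Return the searchable text of a .txt, .html/.htm or .pdf file."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        import PyPDF2  # imported lazily so text-only scans work without it
        reader = PyPDF2.PdfReader(path)
        # Pages without a text layer (e.g. scans) contribute nothing
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    # .txt and .html: read as UTF-8, replacing bytes that don't decode
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")
    with open(page, "w", encoding="utf-8") as fh:
        fh.write('<a href="https://example.com/a">A</a>')
    print(read_file_text(page))
```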
🎯 Step 3.4: Apply Filters
for url in urls:
    if self.include_words and not any(w in url for w in self.include_words):
        continue  # none of the required words appear
    if self.exclude_words and any(w in url for w in self.exclude_words):
        continue  # a blocked word appears
💡 Example:
Include: google → keep only URLs that contain "google"
Exclude: facebook → drop any URL that contains "facebook"
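Here are the two filter rules pulled into a standalone function and run on the example above. `passes_filters` is a hypothetical name; like the tutorial's code, the match is a case-sensitive substring test.

```python
def passes_filters(url, include_words, exclude_words):
    """Keep a URL only if it has at least one include word (when any are given)
    and none of the exclude words."""
    if include_words and not any(w in url for w in include_words):
        return False
    if exclude_words and any(w in url for w in exclude_words):
        return False
    return True


urls = [
    "https://www.google.com/search?q=pyside6",
    "https://www.facebook.com/google",
    "https://github.com/",
]
kept = [u for u in urls if passes_filters(u, ["google"], ["facebook"])]
print(kept)  # ['https://www.google.com/search?q=pyside6']
```

The exclude rule wins: the second URL mentions "google" but is still dropped because it also contains "facebook".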
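🔍 Step 3.5: Check Broken Links
    def check_link(self, url):
        """Return True if the URL looks broken (error status or no response)."""
        try:
            # stream=True defers the body download; the with block closes the response unread
            with requests.get(url, timeout=10, stream=True) as res:
                return not (200 <= res.status_code < 400)
        except requests.RequestException:
            # Timeouts, DNS and SSL failures, invalid URLs, redirect loops
            return True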
💡 Logic:
200–399 → OK
400+ → broken
No response (timeout, DNS error, invalid URL) → broken
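Here is the same rule as a worked example. `is_broken_status` is a hypothetical helper that mirrors the expression in `check_link`, with `None` standing for "no response at all". Because `requests.get` follows redirects by default, a 3xx rarely reaches this check, but it would count as working if it did.

```python
def is_broken_status(status_code):
    """Mirror of check_link's rule: anything outside 200-399 counts as broken.
    None stands for 'no response' (timeout, DNS failure, refused connection)."""
    if status_code is None:
        return True
    return not (200 <= status_code < 400)


for code in (200, 204, 301, 399, 400, 404, 500, None):
    print(code, "broken" if is_broken_status(code) else "OK")
```

Some sites can answer 403 or 429 to scripted requests even when the page works in a browser, so a "broken" result is worth a manual check.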
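🖥️ Step 4: Build the GUI
Create the main window. Every widget in the following steps is created inside __init__ and must also be added to a layout (for example a QVBoxLayout) to appear on screen:
class LinkApp(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("LinkGuardian")
        self.setMinimumSize(1000, 600)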
📁 Step 4.1: Folder Selection
        # in __init__: a read-only path field plus a Browse button
        self.path_input = QLineEdit()
        self.path_input.setReadOnly(True)
        browse_btn = QPushButton("Browse")
        browse_btn.clicked.connect(self.browse_folder)

    def browse_folder(self):
        folder = QFileDialog.getExistingDirectory(self)
        if folder:  # empty string if the dialog was cancelled
            self.path_input.setText(folder)
            self.folder = folder
⚙️ Step 4.2: Options (Checkboxes)
        # in __init__: which file types to scan, and whether to test each link
        self.txt_checkbox = QCheckBox(".txt")
        self.pdf_checkbox = QCheckBox(".pdf")
        self.html_checkbox = QCheckBox(".html")
        self.check_broken_checkbox = QCheckBox("Check Broken Links")
🔎 Step 4.3: Filters
        # in __init__: comma-separated keyword lists
        self.include_input = QLineEdit()
        self.include_input.setPlaceholderText("Include words (comma-separated)")
        self.exclude_input = QLineEdit()
        self.exclude_input.setPlaceholderText("Exclude words (comma-separated)")
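▶️ Step 4.4: Start Scan
    def start_scan(self):
        folder = self.path_input.text()
        if not folder:
            return  # no folder chosen yet
        # "google, docs" -> ["google", "docs"]; an empty box -> [] (no filter).
        # A plain .split(",") would give [""], and "" matches every URL.
        include = [w.strip() for w in self.include_input.text().split(",") if w.strip()]
        exclude = [w.strip() for w in self.exclude_input.text().split(",") if w.strip()]
        self.worker = LinkWorker(
            folder,
            {
                'txt': self.txt_checkbox.isChecked(),
                'pdf': self.pdf_checkbox.isChecked(),
                'html': self.html_checkbox.isChecked()
            },
            self.check_broken_checkbox.isChecked(),
            include,
            exclude
        )
        self.worker.found.connect(self.add_link)
        self.worker.start()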
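Splitting the text boxes needs care: `"".split(",")` returns `['']`, not `[]`. Because the empty string is a substring of every URL, an untouched Exclude box passed through a plain split would make `any(w in url ...)` true for every link and filter everything out. A small parser, hypothetically named `parse_keywords`, trims spaces and drops empty entries:

```python
def parse_keywords(text):
    """Turn 'google, docs,,' into ['google', 'docs']; blank input gives []."""
    return [word.strip() for word in text.split(",") if word.strip()]


print("".split(","))                          # ['']
print(parse_keywords(""))                     # []
print(parse_keywords(" google, docs ,, "))    # ['google', 'docs']
# Why [''] is dangerous as an exclude list: '' is in every string
print(any(w in "https://example.com" for w in [""]))  # True
```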
🎨 Step 5: Display Results
    def add_link(self, link, is_broken):
        # self.results_list is a QListWidget created in __init__
        item = QListWidgetItem(link)
        color = QColor("red") if is_broken else QColor("green")
        item.setForeground(color)
        self.results_list.addItem(item)
💡 Result:
🟢 Green → Working link
🔴 Red → Broken link
📊 Step 6: Progress Bar
        self.progress_bar = QProgressBar()  # in __init__
        self.progress_bar.setMaximum(100)
Connect it to the worker in start_scan(), before self.worker.start():
        self.worker.progress.connect(self.progress_bar.setValue)
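The worker snippets never emit `progress`, so here is one way to compute the value. This is a minimal sketch, assuming the worker emits once per processed file out of the `all_files` list built in `run()`. `percent_done` is a hypothetical helper, and its guard covers a folder with no matching files.

```python
def percent_done(done, total):
    """Integer percentage for QProgressBar (range 0-100); an empty scan counts as complete."""
    if total == 0:
        return 100
    return done * 100 // total


# Inside LinkWorker.run() this could look like:
#     for i, path in enumerate(all_files, start=1):
#         ...extract and check links...
#         self.progress.emit(percent_done(i, len(all_files)))
total = 7
print([percent_done(i, total) for i in range(1, total + 1)])  # [14, 28, 42, 57, 71, 85, 100]
```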
📋 Step 7: Copy All Links
    def copy_all_links(self):
        links = "\n".join(
            self.results_list.item(i).text()
            for i in range(self.results_list.count())
        )
        QGuiApplication.clipboard().setText(links)
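🌐 Step 8: Open Links on Double Click
    def open_item(self, item):
        # Hooked up in __init__ with:
        # self.results_list.itemDoubleClicked.connect(self.open_item)
        url = item.text()
        if platform.system() == "Windows":
            os.startfile(url)
        elif platform.system() == "Darwin":  # macOS
            subprocess.Popen(["open", url])
        else:
            subprocess.Popen(["xdg-open", url])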
🚀 Step 9: Run the App
if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkApp()
    window.show()
    sys.exit(app.exec())
🎉 Final Result
You now have a desktop tool that:
✅ Extracts links from .txt, .pdf and .html files
✅ Filters them by include/exclude keywords
✅ Detects broken links
✅ Color-codes working and broken links
✅ Keeps the window responsive by scanning in a background thread
💡 Bonus Ideas
Want to upgrade it further?
Export results to CSV
Add domain grouping
Add link preview
Add multi-threaded link checking (faster 🚀)
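For the last idea, one common approach is `concurrent.futures.ThreadPoolExecutor` from the standard library: link checks spend most of their time waiting on the network, so several can overlap. Here is a sketch, assuming a `check` function with the same contract as `check_link` (takes a URL, returns True if broken). `check_many`, `fake_check` and `max_workers=8` are illustrative choices, and the demo uses a fake checker so it runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_many(urls, check, max_workers=8):
    """Run check(url) concurrently; return {url: is_broken} once all finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = True  # a crashing check is reported as broken
    return results


def fake_check(url):
    time.sleep(0.2)  # stand-in for network latency
    return "broken" in url


urls = [f"https://example.com/{name}" for name in ("a", "b", "broken", "c")]
start = time.perf_counter()
print(check_many(urls, fake_check))
# The four 0.2 s sleeps overlap, so this takes far less than a sequential loop would
print(f"{time.perf_counter() - start:.2f}s")
```

Inside `LinkWorker.run()` you would iterate `as_completed` directly and emit `found` for each result as it arrives, rather than waiting for the whole dict.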