Add web viewer feature for browsing downloaded messages #23
- Add Flask-based web viewer with Telegram-style UI
- Implement infinite scroll and search functionality
- Add menu option 8 to TeleGatherer.py to launch viewer
- Fix UTF-8 encoding issues for Windows compatibility
- Update documentation and requirements
Removed duplicate downloads directory exclusion
Pull Request Overview
This PR adds a web-based viewer for browsing downloaded Telegram messages with a Telegram-style dark UI. The feature includes a Flask backend API, responsive frontend with search and infinite scroll, and integration into the main TeleGatherer menu.
Key Changes
- New web viewer with Flask REST API for serving messages with pagination and caching
- Telegram-style responsive UI with search functionality and infinite scroll
- UTF-8 encoding fixes for Windows compatibility with emojis and special characters
Reviewed Changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| web_viewer_static/styles.css | Telegram-themed CSS with dark color scheme and responsive layout |
| web_viewer_static/index.html | Single-page HTML structure for the viewer interface |
| web_viewer_static/app.js | Frontend JavaScript handling chat loading, search, and infinite scroll |
| web_viewer.py | Flask backend with API endpoints for chats and messages, includes caching |
| TeleGatherer.py | Added menu option 8 to launch web viewer via subprocess |
| helpers/TeleViewer.py | Fixed UTF-8 encoding for file operations |
| requirements.txt | Added Flask 3.0.0 and flask-cors 4.0.0 dependencies |
| .gitignore | Added log file exclusion for web viewer static directory |
| README.md | Added web viewer documentation and usage instructions |
chatItem.innerHTML = `
    <div class="chat-item-name">${escapeHtml(chat.name)}</div>
    <div class="chat-item-meta">${chat.message_count || 0} messages</div>
`;
Copilot
AI
Nov 19, 2025
The HTML is built by interpolating user-supplied data (chat.name) into a template literal. Although escapeHtml() is applied here, this pattern is error-prone. Consider setting the chat name through the textContent property or other DOM manipulation methods instead of innerHTML.
-chatItem.innerHTML = `
-    <div class="chat-item-name">${escapeHtml(chat.name)}</div>
-    <div class="chat-item-meta">${chat.message_count || 0} messages</div>
-`;
+const chatNameDiv = document.createElement('div');
+chatNameDiv.className = 'chat-item-name';
+chatNameDiv.textContent = chat.name;
+const chatMetaDiv = document.createElement('div');
+chatMetaDiv.className = 'chat-item-meta';
+chatMetaDiv.textContent = `${chat.message_count || 0} messages`;
+chatItem.appendChild(chatNameDiv);
+chatItem.appendChild(chatMetaDiv);
if (!searchQuery) return html;

const query = escapeHtml(searchQuery).toLowerCase();
const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');
Copilot
AI
Nov 19, 2025
[nitpick] The regex pattern replace(/[.*+?^${}()|[\]\\]/g, '\\$&') is used to escape regex special characters, but the character class is missing escaping for the closing bracket ]. While this works in practice due to how JavaScript handles it, it's clearer to escape it as \\] for consistency and clarity.
-const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');
+const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\\]]/g, '\\$&')})`, 'gi');
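For comparison, here is a minimal Python sketch of the same escape-then-highlight idea. It is not code from this PR: the `highlight` function name and the `<mark>` wrapper are assumptions made for illustration. The standard library's `re.escape` handles the regex metacharacters, so no hand-written character class is needed.

```python
import html
import re


def highlight(text: str, query: str) -> str:
    """Escape text for HTML, then wrap case-insensitive matches of query in <mark>.

    Hypothetical helper for illustration; the PR's frontend does this in JavaScript.
    """
    safe_text = html.escape(text)
    if not query:
        return safe_text
    # Escape the query the same way as the text so that characters such as "<"
    # are compared in their entity form, then neutralise regex metacharacters.
    pattern = re.compile(re.escape(html.escape(query)), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", safe_text)
```

One caveat of this approach: because matching runs on the already-escaped text, a query such as `amp` can also match inside an entity like `&amp;`.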
app.run(debug=True, port=5000, host='0.0.0.0')
Copilot
AI
Nov 19, 2025
The server runs with debug=True and host='0.0.0.0' in production. This is a significant security risk as it:
- Exposes the Werkzeug debugger which can execute arbitrary code
- Binds to all network interfaces, making it accessible from outside the local machine
- Exposes sensitive file paths and code
For production use, set debug=False and host='127.0.0.1' (localhost only), or make these configurable via environment variables.
-app.run(debug=True, port=5000, host='0.0.0.0')
+debug = os.environ.get('FLASK_DEBUG', 'False').lower() in ('1', 'true', 'yes')
+host = os.environ.get('FLASK_HOST', '127.0.0.1')
+port = int(os.environ.get('FLASK_PORT', '5000'))
+app.run(debug=debug, port=port, host=host)
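The suggestion above can be isolated into a small, testable helper. The following is only a sketch: `ServerConfig` and `load_server_config` are hypothetical names, and the environment variable names are taken from the suggestion rather than from any Flask convention the PR is known to rely on.

```python
import os
from typing import Mapping, NamedTuple


class ServerConfig(NamedTuple):
    debug: bool
    host: str
    port: int


def load_server_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read server settings from the environment, defaulting to safe values.

    Debug mode is off and the server binds to localhost unless overridden.
    """
    debug = env.get("FLASK_DEBUG", "False").lower() in ("1", "true", "yes")
    host = env.get("FLASK_HOST", "127.0.0.1")
    port = int(env.get("FLASK_PORT", "5000"))
    return ServerConfig(debug, host, port)


# Assumed usage inside web_viewer.py:
#     cfg = load_server_config()
#     app.run(debug=cfg.debug, host=cfg.host, port=cfg.port)
```

Passing the environment as a parameter keeps the defaults easy to verify without starting a server.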
content = f.read()

remaining = content
while remaining.strip():
    message, remaining = parse_message_json(remaining)
    if message:
        messages.append(message)
    else:
        break
Copilot
AI
Nov 19, 2025
The entire file content is read into memory with f.read() before parsing. For very large message files (potentially gigabytes), this could cause memory issues. Consider using streaming/chunked reading to parse JSON objects incrementally.
-content = f.read()
-remaining = content
-while remaining.strip():
-    message, remaining = parse_message_json(remaining)
-    if message:
-        messages.append(message)
-    else:
-        break
+buffer = ""
+while True:
+    chunk = f.read(8192)
+    if not chunk:
+        break
+    buffer += chunk
+    # Try to parse as many messages as possible from the buffer
+    while buffer.strip():
+        message, remaining = parse_message_json(buffer)
+        if message:
+            messages.append(message)
+            buffer = remaining
+        else:
+            # Not enough data for a complete message, read more
+            break
+# After file is read, try to parse any remaining message in buffer
+while buffer.strip():
+    message, remaining = parse_message_json(buffer)
+    if message:
+        messages.append(message)
+        buffer = remaining
+    else:
+        break
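As an alternative sketch of chunked parsing, the standard library's `json.JSONDecoder.raw_decode` can pull complete objects off the front of a buffer. This assumes the message file is a sequence of concatenated JSON objects separated by optional whitespace; the PR's own `parse_message_json` helper is not shown here, so `iter_json_objects` is a hypothetical stand-in rather than a drop-in replacement.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_objects(stream: TextIO, chunk_size: int = 8192) -> Iterator[dict]:
    """Yield JSON objects from a stream of concatenated objects, one chunk at a time."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # The buffer ends mid-object: wait for more data.
                break
            yield obj
            buffer = buffer[end:].lstrip()
        if not chunk:
            # End of file; any trailing incomplete data is silently dropped.
            break
```

Because this is a generator, a caller can process messages one at a time instead of collecting them all in a list. Note that a single very large object is re-parsed each time a new chunk arrives, so a larger `chunk_size` may be preferable for such files.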
# Sample messages to discover chats (check first 100 and last 100)
sample_size = min(200, len(messages))
if sample_size > 0:
    sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
    for idx in sample_indices:
        if idx < len(messages):
            msg = messages[idx]
            if 'chat' in msg and msg['chat']:
                chat_data = msg['chat']
                chat_key = get_chat_key(chat_data)
                if chat_key and chat_key not in seen_chats:
                    seen_chats.add(chat_key)
                    # Create chat info
                    chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
                    chats_dict[chat_key] = {
                        'id': chat_key,
                        'chat_id': chat_data.get('id'),
                        'username': chat_data.get('username'),
                        'name': chat_name,
                        'type': chat_data.get('type', ''),
                        'message_count': 0,  # Will be calculated on demand
                        'source_folder': chat_dir.name
                    }

# Count messages per chat (do this efficiently)
chat_counts = {}
for msg in messages:
    if 'chat' in msg and msg['chat']:
        chat_data = msg['chat']
        chat_key = get_chat_key(chat_data)
        if chat_key:
            chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1

# Update message counts
for chat_key, count in chat_counts.items():
    if chat_key in chats_dict:
        chats_dict[chat_key]['message_count'] = count
    elif chat_key not in seen_chats:
        # Chat discovered during counting
Copilot
AI
Nov 19, 2025
The function iterates through ALL messages twice - once to discover chats (lines 249-254) and then again to count messages per chat (lines 247-259). This is inefficient for large message files. Consider combining these operations into a single pass through the messages.
-# Sample messages to discover chats (check first 100 and last 100)
-sample_size = min(200, len(messages))
-if sample_size > 0:
-    sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
-    for idx in sample_indices:
-        if idx < len(messages):
-            msg = messages[idx]
-            if 'chat' in msg and msg['chat']:
-                chat_data = msg['chat']
-                chat_key = get_chat_key(chat_data)
-                if chat_key and chat_key not in seen_chats:
-                    seen_chats.add(chat_key)
-                    # Create chat info
-                    chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
-                    chats_dict[chat_key] = {
-                        'id': chat_key,
-                        'chat_id': chat_data.get('id'),
-                        'username': chat_data.get('username'),
-                        'name': chat_name,
-                        'type': chat_data.get('type', ''),
-                        'message_count': 0,  # Will be calculated on demand
-                        'source_folder': chat_dir.name
-                    }
-# Count messages per chat (do this efficiently)
-chat_counts = {}
-for msg in messages:
-    if 'chat' in msg and msg['chat']:
-        chat_data = msg['chat']
-        chat_key = get_chat_key(chat_data)
-        if chat_key:
-            chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
-# Update message counts
-for chat_key, count in chat_counts.items():
-    if chat_key in chats_dict:
-        chats_dict[chat_key]['message_count'] = count
-    elif chat_key not in seen_chats:
-        # Chat discovered during counting
+# Discover chats and count messages in a single pass
+chat_counts = {}
+seen_chats = set()
+for msg in messages:
+    if 'chat' in msg and msg['chat']:
+        chat_data = msg['chat']
+        chat_key = get_chat_key(chat_data)
+        if chat_key:
+            # Count messages per chat
+            chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
+            # Discover chat if not already seen
+            if chat_key not in seen_chats:
+                seen_chats.add(chat_key)
+                chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
+                chats_dict[chat_key] = {
+                    'id': chat_key,
+                    'chat_id': chat_data.get('id'),
+                    'username': chat_data.get('username'),
+                    'name': chat_name,
+                    'type': chat_data.get('type', ''),
+                    'message_count': 0,  # Will be updated below
+                    'source_folder': chat_dir.name
+                }
+# Update message counts
+for chat_key, count in chat_counts.items():
+    if chat_key in chats_dict:
+        chats_dict[chat_key]['message_count'] = count
+    else:
+        # Chat discovered during counting (should not happen, but for safety)
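The single-pass idea can be reduced further by incrementing the count directly on each chat entry, which removes the separate `chat_counts` dictionary and the final update loop. The sketch below is self-contained and makes two assumptions: `chat_key_of` is a hypothetical stand-in for the PR's `get_chat_key` (whose real logic is not shown), and `source_folder` is passed in as a plain string instead of `chat_dir.name`.

```python
from typing import Iterable, Optional


def chat_key_of(chat: dict) -> Optional[str]:
    """Hypothetical stand-in for get_chat_key: use the chat id as the key."""
    chat_id = chat.get("id")
    return str(chat_id) if chat_id is not None else None


def summarize_chats(messages: Iterable[dict], source_folder: str) -> dict:
    """Discover chats and count their messages in one pass over messages."""
    chats: dict = {}
    for msg in messages:
        chat = msg.get("chat")
        if not chat:
            continue
        key = chat_key_of(chat)
        if key is None:
            continue
        if key not in chats:
            # First time this chat is seen: record its metadata.
            chats[key] = {
                "id": key,
                "chat_id": chat.get("id"),
                "username": chat.get("username"),
                "name": (chat.get("username") or chat.get("first_name")
                         or chat.get("title") or str(chat.get("id", "Unknown"))),
                "type": chat.get("type", ""),
                "message_count": 0,
                "source_folder": source_folder,
            }
        chats[key]["message_count"] += 1
    return chats
```

Since every chat is created the first time it is counted, the "discovered during counting" fallback branch cannot occur in this form.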
</div>
</div>
<div class="search-bar" id="searchBar" style="display: none;">
    <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">
Copilot
AI
Nov 19, 2025
The search input field lacks an associated <label> element. While it has a placeholder, screen readers cannot properly announce the purpose of the input field. Add a <label> element (can be visually hidden with CSS if needed) or use aria-label attribute for better accessibility.
-<input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">
+<input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off" aria-label="Search messages">
function setupSearch() {
    const searchInput = document.getElementById('searchInput');
    const searchClear = document.getElementById('searchClear');
    const searchResultsInfo = document.getElementById('searchResultsInfo');
Copilot
AI
Nov 19, 2025
Unused variable searchResultsInfo.
-    const searchResultsInfo = document.getElementById('searchResultsInfo');
    viewer_process = subprocess.Popen(
        [sys.executable, 'web_viewer.py'],
        creationflags=subprocess.CREATE_NEW_CONSOLE
    )
else:
    # On Unix-like systems, run in background
    viewer_process = subprocess.Popen(
Copilot
AI
Nov 19, 2025
This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.
-    viewer_process = subprocess.Popen(
-        [sys.executable, 'web_viewer.py'],
-        creationflags=subprocess.CREATE_NEW_CONSOLE
-    )
-else:
-    # On Unix-like systems, run in background
-    viewer_process = subprocess.Popen(
+    subprocess.Popen(
+        [sys.executable, 'web_viewer.py'],
+        creationflags=subprocess.CREATE_NEW_CONSOLE
+    )
+else:
+    # On Unix-like systems, run in background
+    subprocess.Popen(
    )
else:
    # On Unix-like systems, run in background
    viewer_process = subprocess.Popen(
Copilot
AI
Nov 19, 2025
This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.
@@ -0,0 +1,418 @@
import os
import json
import re
Copilot
AI
Nov 19, 2025
Import of 're' is not used.
-import re
Overview
This PR adds a modern web-based viewer that provides a Telegram-style interface for browsing and searching downloaded Telegram messages. The viewer makes it easier to analyze collected intelligence from Telegram channels.
Features Added
Web Viewer Interface
Integration
- Menu option 8 in TeleGatherer.py to launch the web viewer directly
- python web_viewer.py
- http://localhost:5000
Technical Details
New Files
- web_viewer.py - Flask backend server with REST API endpoints
- web_viewer_static/index.html - Main HTML page
- web_viewer_static/styles.css - Telegram-style CSS styling
- web_viewer_static/app.js - Frontend JavaScript with infinite scroll and search
Modified Files
- TeleGatherer.py - Added menu option 8 to launch web viewer
- helpers/TeleViewer.py - Fixed UTF-8 encoding issues for Windows compatibility (handles emojis and special characters)
- requirements.txt - Added Flask==3.0.0 and flask-cors==4.0.0
- README.md - Added comprehensive web viewer documentation
- .gitignore - Added Downloads folder exclusion
API Endpoints
- GET /api/chats - List all available chats
- GET /api/chats/<chat_id>/messages - Get messages with pagination
- GET /api/chats/<chat_id>/info - Get chat information
Key Improvements
Smart Chat Grouping - Messages are grouped by actual chat ID (from message data), not folder names, so messages from different chats stored in the same folder are displayed separately
Search Capabilities - Search across:
Performance -
Windows Compatibility - Fixed encoding issues to properly handle emojis and non-ASCII characters on Windows
Usage
- TeleGatherer.py (option 6)
- TeleGatherer.py and select option 8
- python web_viewer.py
- http://localhost:5000
Dependencies
Notes