@jakemuk commented Nov 11, 2025

Overview

This PR adds a modern web-based viewer that provides a Telegram-style interface for browsing and searching downloaded Telegram messages. The viewer makes it easier to analyze collected intelligence from Telegram channels.

Features Added

Web Viewer Interface

  • 📱 Telegram-style UI - Beautiful dark theme matching Telegram's native design
  • 🔄 Dynamic Chat Discovery - Automatically detects and displays all chats from the Downloads folder
  • 📝 Message Display - Shows message text, metadata, entities (URLs, emails, mentions), and forwarded messages
  • 🔍 Search Functionality - Real-time search through messages by text, sender name, or forwarded content with visual highlighting
  • Infinite Scroll - Automatically loads more messages as you scroll (100 messages per page)
  • 🎨 Responsive Design - Works seamlessly on desktop and mobile devices
  • 💾 Performance Optimized - Caching and pagination for fast loading of large message files

Integration

  • Added menu option 8 to TeleGatherer.py to launch the web viewer directly
  • Viewer can also be launched manually: python web_viewer.py
  • Automatically opens browser to http://localhost:5000

Technical Details

New Files

  • web_viewer.py - Flask backend server with REST API endpoints
  • web_viewer_static/index.html - Main HTML page
  • web_viewer_static/styles.css - Telegram-style CSS styling
  • web_viewer_static/app.js - Frontend JavaScript with infinite scroll and search

Modified Files

  • TeleGatherer.py - Added menu option 8 to launch web viewer
  • helpers/TeleViewer.py - Fixed UTF-8 encoding issues for Windows compatibility (handles emojis and special characters)
  • requirements.txt - Added Flask==3.0.0 and flask-cors==4.0.0
  • README.md - Added comprehensive web viewer documentation
  • .gitignore - Added Downloads folder exclusion
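
The encoding fix in helpers/TeleViewer.py is not quoted in this conversation. As a minimal sketch of the usual approach, assuming the problem was Python falling back to the Windows locale code page when open() is called without an encoding argument, file access can pass encoding="utf-8" explicitly. The helper names append_message and read_text are hypothetical and do not come from the PR.

```python
import json
import os
import tempfile


def append_message(path, message):
    """Append one message as JSON, always encoding the file as UTF-8."""
    # ensure_ascii=False keeps emojis readable in the file instead of \uXXXX escapes.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message, ensure_ascii=False))


def read_text(path):
    """Read the file back as UTF-8 regardless of the platform default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages.json")
        append_message(path, {"text": "héllo 👋"})
        # Compare instead of printing the emoji, so the demo does not depend
        # on the console encoding.
        print(json.loads(read_text(path)) == {"text": "héllo 👋"})
```

Passing the encoding at every open() call keeps the behavior identical across platforms without relying on environment settings.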

API Endpoints

  • GET /api/chats - List all available chats
  • GET /api/chats/<chat_id>/messages - Get messages with pagination
  • GET /api/chats/<chat_id>/info - Get chat information
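
The request and response shapes of the messages endpoint are not spelled out above. As a rough sketch of how page-based slicing could work behind GET /api/chats/<chat_id>/messages, assuming hypothetical page and per_page parameters and a hypothetical has_more flag that the infinite scroll could use to decide whether to request another page:

```python
def paginate(messages, page=1, per_page=100):
    """Return one page of messages plus the metadata a client needs for the next request."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    end = start + per_page
    return {
        "messages": messages[start:end],
        "page": page,
        "per_page": per_page,
        "total": len(messages),
        # True while there are messages beyond the end of this page.
        "has_more": end < len(messages),
    }


if __name__ == "__main__":
    demo = [{"id": i} for i in range(250)]
    last = paginate(demo, page=3)
    print(len(last["messages"]), last["has_more"])
```

A page past the end returns an empty list with has_more set to False, so a client can stop scrolling without special-casing errors.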

Key Improvements

  1. Smart Chat Grouping - Messages are grouped by actual chat ID (from message data), not folder names, so messages from different chats stored in the same folder are displayed separately

  2. Search Capabilities - Search across:

    • Message text content
    • Sender usernames and first names
    • Forwarded message sources

  3. Performance:

    • Message caching based on file modification time
    • Efficient JSON parsing for concatenated message objects
    • Pagination to load 100 messages at a time
  4. Windows Compatibility - Fixed encoding issues to properly handle emojis and non-ASCII characters on Windows
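
The caching code itself is not quoted in this description. One common way to cache by file modification time, sketched here with a hypothetical load_cached helper and a module-level _cache dict (both names are assumptions), is to store the parsed result together with the mtime it was read at and to reparse only when the mtime changes:

```python
import os

# path -> (mtime_ns at load time, parsed value); hypothetical cache layout
_cache = {}


def load_cached(path, parse):
    """Return parse(text) for path, reusing the cached value while the file is unmodified."""
    mtime = os.stat(path).st_mtime_ns
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, "r", encoding="utf-8") as f:
        value = parse(f.read())
    _cache[path] = (mtime, value)
    return value
```

A call such as load_cached("Downloads/some_chat/messages.json", parse_fn) would parse the file once and then serve later requests from memory until the file is written again; the path and parse_fn here are placeholders. The check depends on the filesystem's timestamp resolution, so two writes within the same tick could go unnoticed.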

Usage

  1. Download messages using TeleGatherer.py (option 6)
  2. Launch web viewer:
    • From menu: Run TeleGatherer.py and select option 8
    • Manually: Run python web_viewer.py
  3. Open browser to http://localhost:5000
  4. Select a chat from the sidebar to view messages
  5. Use the search bar to find specific messages

Dependencies

  • Flask==3.0.0
  • flask-cors==4.0.0

Notes

  • No breaking changes to existing functionality
  • Web viewer works with existing Downloads folder structure
  • All existing features remain unchanged

- Add Flask-based web viewer with Telegram-style UI
- Implement infinite scroll and search functionality
- Add menu option 8 to TeleGatherer.py to launch viewer
- Fix UTF-8 encoding issues for Windows compatibility
- Update documentation and requirements
Removed duplicate downloads directory exclusion

Copilot AI left a comment

Pull Request Overview

This PR adds a web-based viewer for browsing downloaded Telegram messages with a Telegram-style dark UI. The feature includes a Flask backend API, responsive frontend with search and infinite scroll, and integration into the main TeleGatherer menu.

Key Changes

  • New web viewer with Flask REST API for serving messages with pagination and caching
  • Telegram-style responsive UI with search functionality and infinite scroll
  • UTF-8 encoding fixes for Windows compatibility with emojis and special characters

Reviewed Changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 26 comments.

Summary per file:

  • web_viewer_static/styles.css - Telegram-themed CSS with dark color scheme and responsive layout
  • web_viewer_static/index.html - Single-page HTML structure for the viewer interface
  • web_viewer_static/app.js - Frontend JavaScript handling chat loading, search, and infinite scroll
  • web_viewer.py - Flask backend with API endpoints for chats and messages, includes caching
  • TeleGatherer.py - Added menu option 8 to launch web viewer via subprocess
  • helpers/TeleViewer.py - Fixed UTF-8 encoding for file operations
  • requirements.txt - Added Flask 3.0.0 and flask-cors 4.0.0 dependencies
  • .gitignore - Added log file exclusion for web viewer static directory
  • README.md - Added web viewer documentation and usage instructions

Comment on lines +36 to +39
chatItem.innerHTML = `
<div class="chat-item-name">${escapeHtml(chat.name)}</div>
<div class="chat-item-meta">${chat.message_count || 0} messages</div>
`;

Copilot AI Nov 19, 2025

The HTML is built with a template literal that interpolates user-supplied data (chat.name). escapeHtml() is applied here, but the pattern is error-prone because a single missed call reintroduces an injection risk. Consider setting the chat name with the textContent property or DOM manipulation methods instead of innerHTML.

Suggested change
- chatItem.innerHTML = `
-     <div class="chat-item-name">${escapeHtml(chat.name)}</div>
-     <div class="chat-item-meta">${chat.message_count || 0} messages</div>
- `;
+ const chatNameDiv = document.createElement('div');
+ chatNameDiv.className = 'chat-item-name';
+ chatNameDiv.textContent = chat.name;
+ const chatMetaDiv = document.createElement('div');
+ chatMetaDiv.className = 'chat-item-meta';
+ chatMetaDiv.textContent = `${chat.message_count || 0} messages`;
+ chatItem.appendChild(chatNameDiv);
+ chatItem.appendChild(chatMetaDiv);

if (!searchQuery) return html;

const query = escapeHtml(searchQuery).toLowerCase();
const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');

Copilot AI Nov 19, 2025

[nitpick] The pattern replace(/[.*+?^${}()|[\]\\]/g, '\\$&') escapes regex special characters in the search query. The closing bracket is already covered by the \] inside the character class, so the existing pattern behaves correctly; the suggested change only repeats that escape and is a matter of style, not a fix.

Suggested change
- const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');
+ const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\\]]/g, '\\$&')})`, 'gi');
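
For comparison, here is the same escape-then-highlight idea in Python, where the standard library provides re.escape, so no hand-written character class is needed. This is only an illustrative sketch: highlight is a hypothetical name, and the <mark> wrapper is an assumption, not necessarily the markup app.js emits.

```python
import html
import re


def highlight(text, query):
    """HTML-escape text and wrap case-insensitive matches of query in <mark> tags."""
    escaped_text = html.escape(text)
    if not query:
        return escaped_text
    # Escape the query the same way as the text so both sides compare equal,
    # then neutralise regex metacharacters so the query is matched literally.
    pattern = re.compile(re.escape(html.escape(query)), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", escaped_text)


if __name__ == "__main__":
    print(highlight("Price is $5 (approx.)", "$5 ("))
```

One limitation of matching against already-escaped text is that a query such as "amp" can match inside an entity like &amp;. Highlighting on the raw text and escaping each piece afterwards would avoid that.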

Comment on lines +417 to +418
app.run(debug=True, port=5000, host='0.0.0.0')

Copilot AI Nov 19, 2025

The server runs with debug=True and host='0.0.0.0' in production. This is a significant security risk as it:

  1. Exposes the Werkzeug debugger which can execute arbitrary code
  2. Binds to all network interfaces, making it accessible from outside the local machine
  3. Exposes sensitive file paths and code

For production use, set debug=False and host='127.0.0.1' (localhost only), or make these configurable via environment variables.

Suggested change
- app.run(debug=True, port=5000, host='0.0.0.0')
+ debug = os.environ.get('FLASK_DEBUG', 'False').lower() in ('1', 'true', 'yes')
+ host = os.environ.get('FLASK_HOST', '127.0.0.1')
+ port = int(os.environ.get('FLASK_PORT', '5000'))
+ app.run(debug=debug, port=port, host=host)

Comment on lines +91 to +99
content = f.read()

remaining = content
while remaining.strip():
    message, remaining = parse_message_json(remaining)
    if message:
        messages.append(message)
    else:
        break

Copilot AI Nov 19, 2025

The entire file content is read into memory with f.read() before parsing. For very large message files (potentially gigabytes), this could cause memory issues. Consider using streaming/chunked reading to parse JSON objects incrementally.

Suggested change
- content = f.read()
- remaining = content
- while remaining.strip():
-     message, remaining = parse_message_json(remaining)
-     if message:
-         messages.append(message)
-     else:
-         break
+ buffer = ""
+ while True:
+     chunk = f.read(8192)
+     if not chunk:
+         break
+     buffer += chunk
+     # Try to parse as many messages as possible from the buffer
+     while buffer.strip():
+         message, remaining = parse_message_json(buffer)
+         if message:
+             messages.append(message)
+             buffer = remaining
+         else:
+             # Not enough data for a complete message, read more
+             break
+ # After file is read, try to parse any remaining message in buffer
+ while buffer.strip():
+     message, remaining = parse_message_json(buffer)
+     if message:
+         messages.append(message)
+         buffer = remaining
+     else:
+         break
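
Another way to approach incremental parsing, sketched here as an alternative and not taken from the PR, is json.JSONDecoder.raw_decode, which parses one value from the front of a string and reports the index where it stopped. The generator name iter_messages and the chunk size are assumptions, and the sketch assumes the file holds only JSON objects separated by optional whitespace.

```python
import io
import json


def iter_messages(fileobj, chunk_size=8192):
    """Yield JSON values from a stream of concatenated objects, reading chunk by chunk."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Incomplete object at the end of the buffer: wait for more data.
                break
            yield value
            buffer = buffer[end:]
        if not chunk:
            # End of file; anything still buffered is truncated or invalid.
            if buffer.strip():
                raise ValueError("trailing data is not a complete JSON value")
            return


if __name__ == "__main__":
    stream = io.StringIO('{"id": 1}\n{"id": 2}{"id": 3}')
    for message in iter_messages(stream, chunk_size=4):
        print(message["id"])
```

Because it is a generator, the caller can process or filter messages one at a time instead of building the full list. A malformed object in the middle of the file would cause the rest of the file to accumulate in the buffer before the error is raised, so this sketch trades error locality for simplicity.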

Comment on lines +223 to +261
# Sample messages to discover chats (check first 100 and last 100)
sample_size = min(200, len(messages))
if sample_size > 0:
    sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
    for idx in sample_indices:
        if idx < len(messages):
            msg = messages[idx]
            if 'chat' in msg and msg['chat']:
                chat_data = msg['chat']
                chat_key = get_chat_key(chat_data)
                if chat_key and chat_key not in seen_chats:
                    seen_chats.add(chat_key)
                    # Create chat info
                    chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
                    chats_dict[chat_key] = {
                        'id': chat_key,
                        'chat_id': chat_data.get('id'),
                        'username': chat_data.get('username'),
                        'name': chat_name,
                        'type': chat_data.get('type', ''),
                        'message_count': 0,  # Will be calculated on demand
                        'source_folder': chat_dir.name
                    }

# Count messages per chat (do this efficiently)
chat_counts = {}
for msg in messages:
    if 'chat' in msg and msg['chat']:
        chat_data = msg['chat']
        chat_key = get_chat_key(chat_data)
        if chat_key:
            chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1

# Update message counts
for chat_key, count in chat_counts.items():
    if chat_key in chats_dict:
        chats_dict[chat_key]['message_count'] = count
    elif chat_key not in seen_chats:
        # Chat discovered during counting

Copilot AI Nov 19, 2025

The function walks the messages twice: first a sampling pass over the first and last 100 messages to discover chats (lines 249-254), then a full pass over all messages to count messages per chat (lines 247-259). This is inefficient for large message files. Consider combining discovery and counting into a single pass through the messages.

Suggested change
- # Sample messages to discover chats (check first 100 and last 100)
- sample_size = min(200, len(messages))
- if sample_size > 0:
-     sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
-     for idx in sample_indices:
-         if idx < len(messages):
-             msg = messages[idx]
-             if 'chat' in msg and msg['chat']:
-                 chat_data = msg['chat']
-                 chat_key = get_chat_key(chat_data)
-                 if chat_key and chat_key not in seen_chats:
-                     seen_chats.add(chat_key)
-                     # Create chat info
-                     chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
-                     chats_dict[chat_key] = {
-                         'id': chat_key,
-                         'chat_id': chat_data.get('id'),
-                         'username': chat_data.get('username'),
-                         'name': chat_name,
-                         'type': chat_data.get('type', ''),
-                         'message_count': 0,  # Will be calculated on demand
-                         'source_folder': chat_dir.name
-                     }
- # Count messages per chat (do this efficiently)
- chat_counts = {}
- for msg in messages:
-     if 'chat' in msg and msg['chat']:
-         chat_data = msg['chat']
-         chat_key = get_chat_key(chat_data)
-         if chat_key:
-             chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
- # Update message counts
- for chat_key, count in chat_counts.items():
-     if chat_key in chats_dict:
-         chats_dict[chat_key]['message_count'] = count
-     elif chat_key not in seen_chats:
-         # Chat discovered during counting
+ # Discover chats and count messages in a single pass
+ chat_counts = {}
+ seen_chats = set()
+ for msg in messages:
+     if 'chat' in msg and msg['chat']:
+         chat_data = msg['chat']
+         chat_key = get_chat_key(chat_data)
+         if chat_key:
+             # Count messages per chat
+             chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
+             # Discover chat if not already seen
+             if chat_key not in seen_chats:
+                 seen_chats.add(chat_key)
+                 chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
+                 chats_dict[chat_key] = {
+                     'id': chat_key,
+                     'chat_id': chat_data.get('id'),
+                     'username': chat_data.get('username'),
+                     'name': chat_name,
+                     'type': chat_data.get('type', ''),
+                     'message_count': 0,  # Will be updated below
+                     'source_folder': chat_dir.name
+                 }
+ # Update message counts
+ for chat_key, count in chat_counts.items():
+     if chat_key in chats_dict:
+         chats_dict[chat_key]['message_count'] = count
+     else:
+         # Chat discovered during counting (should not happen, but for safety)

</div>
</div>
<div class="search-bar" id="searchBar" style="display: none;">
<input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">

Copilot AI Nov 19, 2025

The search input field lacks an associated <label> element. While it has a placeholder, screen readers cannot properly announce the purpose of the input field. Add a <label> element (can be visually hidden with CSS if needed) or use aria-label attribute for better accessibility.

Suggested change
- <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">
+ <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off" aria-label="Search messages">

function setupSearch() {
    const searchInput = document.getElementById('searchInput');
    const searchClear = document.getElementById('searchClear');
    const searchResultsInfo = document.getElementById('searchResultsInfo');

Copilot AI Nov 19, 2025

Unused variable searchResultsInfo.

Suggested change
- const searchResultsInfo = document.getElementById('searchResultsInfo');

Comment on lines +288 to +294
    viewer_process = subprocess.Popen(
        [sys.executable, 'web_viewer.py'],
        creationflags=subprocess.CREATE_NEW_CONSOLE
    )
else:
    # On Unix-like systems, run in background
    viewer_process = subprocess.Popen(

Copilot AI Nov 19, 2025

This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.

Suggested change
-     viewer_process = subprocess.Popen(
-         [sys.executable, 'web_viewer.py'],
-         creationflags=subprocess.CREATE_NEW_CONSOLE
-     )
- else:
-     # On Unix-like systems, run in background
-     viewer_process = subprocess.Popen(
+     subprocess.Popen(
+         [sys.executable, 'web_viewer.py'],
+         creationflags=subprocess.CREATE_NEW_CONSOLE
+     )
+ else:
+     # On Unix-like systems, run in background
+     subprocess.Popen(

    )
else:
    # On Unix-like systems, run in background
    viewer_process = subprocess.Popen(

Copilot AI Nov 19, 2025

This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.

@@ -0,0 +1,418 @@
import os
import json
import re

Copilot AI Nov 19, 2025

Import of 're' is not used.

Suggested change
- import re
