Add web viewer feature for browsing downloaded messages #23

jakemuk · 2025-11-11T06:47:40Z

Overview

This PR adds a modern web-based viewer that provides a Telegram-style interface for browsing and searching downloaded Telegram messages. The viewer makes it easier to analyze collected intelligence from Telegram channels.

Features Added

Web Viewer Interface

📱 Telegram-style UI - Beautiful dark theme matching Telegram's native design
🔄 Dynamic Chat Discovery - Automatically detects and displays all chats from the Downloads folder
📝 Message Display - Shows message text, metadata, entities (URLs, emails, mentions), and forwarded messages
🔍 Search Functionality - Real-time search through messages by text, sender name, or forwarded content with visual highlighting
⚡ Infinite Scroll - Automatically loads more messages as you scroll (100 messages per page)
🎨 Responsive Design - Works seamlessly on desktop and mobile devices
💾 Performance Optimized - Caching and pagination for fast loading of large message files

Integration

Added menu option 8 to TeleGatherer.py to launch the web viewer directly
Viewer can also be launched manually: python web_viewer.py
Automatically opens browser to http://localhost:5000

Technical Details

New Files

web_viewer.py - Flask backend server with REST API endpoints
web_viewer_static/index.html - Main HTML page
web_viewer_static/styles.css - Telegram-style CSS styling
web_viewer_static/app.js - Frontend JavaScript with infinite scroll and search

Modified Files

TeleGatherer.py - Added menu option 8 to launch web viewer
helpers/TeleViewer.py - Fixed UTF-8 encoding issues for Windows compatibility (handles emojis and special characters)
requirements.txt - Added Flask==3.0.0 and flask-cors==4.0.0
README.md - Added comprehensive web viewer documentation
.gitignore - Added Downloads folder exclusion
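
The encoding fix in helpers/TeleViewer.py is not quoted in this description. A minimal sketch of the usual pattern, assuming the fix amounts to passing encoding='utf-8' explicitly on every file operation (save_message and load_text are hypothetical names, not the PR's):

```python
import json
import os
import tempfile


def save_message(path, message):
    """Append one message as JSON, always writing the file as UTF-8."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which on Windows is often a legacy code page that cannot
    # represent emojis.
    with open(path, "a", encoding="utf-8") as f:
        json.dump(message, f, ensure_ascii=False)
        f.write("\n")


def load_text(path):
    """Read the file back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "messages.json")
    save_message(demo_path, {"text": "héllo 👋"})
    print(load_text(demo_path))
```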

API Endpoints

GET /api/chats - List all available chats
GET /api/chats/<chat_id>/messages - Get messages with pagination
GET /api/chats/<chat_id>/info - Get chat information
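
The query parameters of the messages endpoint are not listed here. As a rough sketch, assuming hypothetical page and per_page parameters and the page size of 100 mentioned above, the slicing behind a paginated response might look like this:

```python
def paginate(messages, page=1, per_page=100):
    """Return one page of messages plus the metadata a client needs to continue."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    end = start + per_page
    return {
        "messages": messages[start:end],
        "page": page,
        "per_page": per_page,
        "total": len(messages),
        # An infinite-scroll client keeps requesting pages until this is False.
        "has_more": end < len(messages),
    }


if __name__ == "__main__":
    fake_messages = [{"id": i} for i in range(250)]
    last_page = paginate(fake_messages, page=3)
    print(len(last_page["messages"]), last_page["has_more"])
```

Returning has_more alongside the slice lets the frontend stop issuing requests without a separate count call.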

Key Improvements

Smart Chat Grouping - Messages are grouped by actual chat ID (from message data), not folder names, so messages from different chats stored in the same folder are displayed separately
Search Capabilities - Search across:
- Message text content
- Sender usernames and first names
- Forwarded message sources
Performance - Optimized for large message files through:
- Message caching based on file modification time
- Efficient JSON parsing for concatenated message objects
- Pagination to load 100 messages at a time
Windows Compatibility - Fixed encoding issues to properly handle emojis and non-ASCII characters on Windows
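
The caching and parsing code is not quoted in this description. The following is a minimal sketch of both ideas under stated assumptions: message files hold JSON objects written back to back, and a cache entry stays valid while the file's modification time is unchanged. parse_concatenated_json, load_messages_cached and _cache are hypothetical names, not the PR's.

```python
import json
import os

_decoder = json.JSONDecoder()
_cache = {}  # path -> (mtime, parsed messages); hypothetical in-memory cache


def parse_concatenated_json(text):
    """Parse JSON objects written back to back, e.g. '{"id": 1}{"id": 2}'."""
    messages = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        try:
            obj, pos = _decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # stop at the first malformed or truncated object
        messages.append(obj)
    return messages


def load_messages_cached(path):
    """Re-parse the file only when its modification time has changed."""
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(path, "r", encoding="utf-8") as f:
        messages = parse_concatenated_json(f.read())
    _cache[path] = (mtime, messages)
    return messages
```

Walking the text with an index avoids re-slicing the remaining string after every object, which matters once files hold many thousands of messages.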

Usage

1. Download messages using TeleGatherer.py (option 6)
2. Launch web viewer:
   - From menu: Run TeleGatherer.py and select option 8
   - Manually: Run python web_viewer.py
3. Open browser to http://localhost:5000
4. Select a chat from the sidebar to view messages
5. Use the search bar to find specific messages
Dependencies

Flask==3.0.0
flask-cors==4.0.0

Notes

No breaking changes to existing functionality
Web viewer works with existing Downloads folder structure
All existing features remain unchanged

- Add Flask-based web viewer with Telegram-style UI
- Implement infinite scroll and search functionality
- Add menu option 8 to TeleGatherer.py to launch viewer
- Fix UTF-8 encoding issues for Windows compatibility
- Update documentation and requirements

Removed duplicate downloads directory exclusion

Copilot

Pull Request Overview

This PR adds a web-based viewer for browsing downloaded Telegram messages with a Telegram-style dark UI. The feature includes a Flask backend API, responsive frontend with search and infinite scroll, and integration into the main TeleGatherer menu.

Key Changes

New web viewer with Flask REST API for serving messages with pagination and caching
Telegram-style responsive UI with search functionality and infinite scroll
UTF-8 encoding fixes for Windows compatibility with emojis and special characters

Reviewed Changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 26 comments.


File	Description
`web_viewer_static/styles.css`	Telegram-themed CSS with dark color scheme and responsive layout
`web_viewer_static/index.html`	Single-page HTML structure for the viewer interface
`web_viewer_static/app.js`	Frontend JavaScript handling chat loading, search, and infinite scroll
`web_viewer.py`	Flask backend with API endpoints for chats and messages, includes caching
`TeleGatherer.py`	Added menu option 8 to launch web viewer via subprocess
`helpers/TeleViewer.py`	Fixed UTF-8 encoding for file operations
`requirements.txt`	Added Flask 3.0.0 and flask-cors 4.0.0 dependencies
`.gitignore`	Added log file exclusion for web viewer static directory
`README.md`	Added web viewer documentation and usage instructions


Copilot · 2025-11-19T04:43:18Z

web_viewer_static/app.js

+            chatItem.innerHTML = `
+                <div class="chat-item-name">${escapeHtml(chat.name)}</div>
+                <div class="chat-item-meta">${chat.message_count || 0} messages</div>
+            `;


The HTML is constructed using string concatenation with user-supplied data (chat.name). While escapeHtml() is used here, this pattern is error-prone. Consider using textContent property or DOM manipulation methods to safely set the chat name instead of innerHTML.

Suggested change

-            chatItem.innerHTML = `
-                <div class="chat-item-name">${escapeHtml(chat.name)}</div>
-                <div class="chat-item-meta">${chat.message_count || 0} messages</div>
-            `;
+            const chatNameDiv = document.createElement('div');
+            chatNameDiv.className = 'chat-item-name';
+            chatNameDiv.textContent = chat.name;
+            const chatMetaDiv = document.createElement('div');
+            chatMetaDiv.className = 'chat-item-meta';
+            chatMetaDiv.textContent = `${chat.message_count || 0} messages`;
+            chatItem.appendChild(chatNameDiv);
+            chatItem.appendChild(chatMetaDiv);

Copilot · 2025-11-19T04:43:19Z

web_viewer_static/app.js

+    if (!searchQuery) return html;
+
+    const query = escapeHtml(searchQuery).toLowerCase();
+    const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');


[nitpick] The regex pattern replace(/[.*+?^${}()|[\]\\]/g, '\\$&') is used to escape regex special characters, but the character class is missing escaping for the closing bracket ]. While this works in practice due to how JavaScript handles it, it's clearer to escape it as \\] for consistency and clarity.

Suggested change

-    const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')})`, 'gi');
+    const regex = new RegExp(`(${query.replace(/[.*+?^${}()|[\]\\\]]/g, '\\$&')})`, 'gi');

Copilot · 2025-11-19T04:43:19Z

web_viewer.py

+    app.run(debug=True, port=5000, host='0.0.0.0')
+


The server runs with debug=True and host='0.0.0.0' in production. This is a significant security risk as it:

Exposes the Werkzeug debugger which can execute arbitrary code

Binds to all network interfaces, making it accessible from outside the local machine

Exposes sensitive file paths and code

For production use, set debug=False and host='127.0.0.1' (localhost only), or make these configurable via environment variables.

Suggested change

-    app.run(debug=True, port=5000, host='0.0.0.0')
+    debug = os.environ.get('FLASK_DEBUG', 'False').lower() in ('1', 'true', 'yes')
+    host = os.environ.get('FLASK_HOST', '127.0.0.1')
+    port = int(os.environ.get('FLASK_PORT', '5000'))
+    app.run(debug=debug, port=port, host=host)
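
The combination this comment warns about can also be rejected outright. A small sketch, assuming the same FLASK_DEBUG, FLASK_HOST and FLASK_PORT variables as the suggestion; the load_server_config helper and its refusal rule are illustrative additions, not part of the PR:

```python
import os

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def load_server_config(env=None):
    """Read host, port and debug flag from the environment with safe defaults."""
    env = os.environ if env is None else env
    debug = env.get("FLASK_DEBUG", "False").lower() in ("1", "true", "yes")
    host = env.get("FLASK_HOST", "127.0.0.1")
    port = int(env.get("FLASK_PORT", "5000"))
    # The Werkzeug debugger allows code execution, so never pair it with a
    # host that other machines can reach.
    if debug and host not in LOOPBACK_HOSTS:
        raise ValueError("refusing to enable debug mode on a non-loopback host")
    return {"debug": debug, "host": host, "port": port}
```

With the Flask app object from web_viewer.py, the result could be passed straight through as app.run(**load_server_config()).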

Copilot · 2025-11-19T04:43:19Z

web_viewer.py

+            content = f.read()
+
+        remaining = content
+        while remaining.strip():
+            message, remaining = parse_message_json(remaining)
+            if message:
+                messages.append(message)
+            else:
+                break


The entire file content is read into memory with f.read() before parsing. For very large message files (potentially gigabytes), this could cause memory issues. Consider using streaming/chunked reading to parse JSON objects incrementally.

Suggested change

-            content = f.read()
-
-        remaining = content
-        while remaining.strip():
-            message, remaining = parse_message_json(remaining)
-            if message:
-                messages.append(message)
-            else:
-                break
+            buffer = ""
+            while True:
+                chunk = f.read(8192)
+                if not chunk:
+                    break
+                buffer += chunk
+                # Try to parse as many messages as possible from the buffer
+                while buffer.strip():
+                    message, remaining = parse_message_json(buffer)
+                    if message:
+                        messages.append(message)
+                        buffer = remaining
+                    else:
+                        # Not enough data for a complete message, read more
+                        break
+            # After file is read, try to parse any remaining message in buffer
+            while buffer.strip():
+                message, remaining = parse_message_json(buffer)
+                if message:
+                    messages.append(message)
+                    buffer = remaining
+                else:
+                    break
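
Note that the suggestion above still collects every message in a list, so peak memory only drops if messages are consumed as they are parsed. A generator-based sketch of that idea, using the standard library's json.JSONDecoder.raw_decode in place of the PR's parse_message_json (whose behaviour on incomplete input is not shown here) and assuming every top-level value is a JSON object; iter_messages is a hypothetical name:

```python
import io
import json


def iter_messages(fileobj, chunk_size=8192):
    """Yield JSON objects one at a time from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode rejects leading whitespace
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Either the object is incomplete (read more) or the data is
                # malformed; at end of file any leftover text is dropped.
                break
            yield obj
            buffer = buffer[end:]
        if not chunk:
            break


if __name__ == "__main__":
    stream = io.StringIO('{"id": 1}\n{"id": 2}{"id": 3}')
    for message in iter_messages(stream, chunk_size=4):
        print(message["id"])
```

One trade-off of this sketch: a malformed object in the middle of a file makes the buffer grow until end of file, so a production version would want an upper bound on the buffer size.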

Copilot · 2025-11-19T04:43:20Z

web_viewer.py

+            # Sample messages to discover chats (check first 100 and last 100)
+            sample_size = min(200, len(messages))
+            if sample_size > 0:
+                sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
+                for idx in sample_indices:
+                    if idx < len(messages):
+                        msg = messages[idx]
+                        if 'chat' in msg and msg['chat']:
+                            chat_data = msg['chat']
+                            chat_key = get_chat_key(chat_data)
+                            if chat_key and chat_key not in seen_chats:
+                                seen_chats.add(chat_key)
+                                # Create chat info
+                                chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
+                                chats_dict[chat_key] = {
+                                    'id': chat_key,
+                                    'chat_id': chat_data.get('id'),
+                                    'username': chat_data.get('username'),
+                                    'name': chat_name,
+                                    'type': chat_data.get('type', ''),
+                                    'message_count': 0,  # Will be calculated on demand
+                                    'source_folder': chat_dir.name
+                                }
+
+            # Count messages per chat (do this efficiently)
+            chat_counts = {}
+            for msg in messages:
+                if 'chat' in msg and msg['chat']:
+                    chat_data = msg['chat']
+                    chat_key = get_chat_key(chat_data)
+                    if chat_key:
+                        chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
+
+            # Update message counts
+            for chat_key, count in chat_counts.items():
+                if chat_key in chats_dict:
+                    chats_dict[chat_key]['message_count'] = count
+                elif chat_key not in seen_chats:
+                    # Chat discovered during counting


The function iterates through ALL messages twice - once to discover chats (lines 249-254) and then again to count messages per chat (lines 247-259). This is inefficient for large message files. Consider combining these operations into a single pass through the messages.

Suggested change

-            # Sample messages to discover chats (check first 100 and last 100)
-            sample_size = min(200, len(messages))
-            if sample_size > 0:
-                sample_indices = list(range(min(100, len(messages)))) + list(range(max(0, len(messages) - 100), len(messages)))
-                for idx in sample_indices:
-                    if idx < len(messages):
-                        msg = messages[idx]
-                        if 'chat' in msg and msg['chat']:
-                            chat_data = msg['chat']
-                            chat_key = get_chat_key(chat_data)
-                            if chat_key and chat_key not in seen_chats:
-                                seen_chats.add(chat_key)
-                                # Create chat info
-                                chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
-                                chats_dict[chat_key] = {
-                                    'id': chat_key,
-                                    'chat_id': chat_data.get('id'),
-                                    'username': chat_data.get('username'),
-                                    'name': chat_name,
-                                    'type': chat_data.get('type', ''),
-                                    'message_count': 0,  # Will be calculated on demand
-                                    'source_folder': chat_dir.name
-                                }
-
-            # Count messages per chat (do this efficiently)
-            chat_counts = {}
-            for msg in messages:
-                if 'chat' in msg and msg['chat']:
-                    chat_data = msg['chat']
-                    chat_key = get_chat_key(chat_data)
-                    if chat_key:
-                        chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
-
-            # Update message counts
-            for chat_key, count in chat_counts.items():
-                if chat_key in chats_dict:
-                    chats_dict[chat_key]['message_count'] = count
-                elif chat_key not in seen_chats:
-                    # Chat discovered during counting
+            # Discover chats and count messages in a single pass
+            chat_counts = {}
+            seen_chats = set()
+            for msg in messages:
+                if 'chat' in msg and msg['chat']:
+                    chat_data = msg['chat']
+                    chat_key = get_chat_key(chat_data)
+                    if chat_key:
+                        # Count messages per chat
+                        chat_counts[chat_key] = chat_counts.get(chat_key, 0) + 1
+                        # Discover chat if not already seen
+                        if chat_key not in seen_chats:
+                            seen_chats.add(chat_key)
+                            chat_name = chat_data.get('username') or chat_data.get('first_name') or chat_data.get('title') or str(chat_data.get('id', 'Unknown'))
+                            chats_dict[chat_key] = {
+                                'id': chat_key,
+                                'chat_id': chat_data.get('id'),
+                                'username': chat_data.get('username'),
+                                'name': chat_name,
+                                'type': chat_data.get('type', ''),
+                                'message_count': 0,  # Will be updated below
+                                'source_folder': chat_dir.name
+                            }
+            # Update message counts
+            for chat_key, count in chat_counts.items():
+                if chat_key in chats_dict:
+                    chats_dict[chat_key]['message_count'] = count
+                else:
+                    # Chat discovered during counting (should not happen, but for safety)
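
For reference, the single-pass idea in isolation. This is a self-contained sketch in which get_chat_key is a simplified stand-in for the PR's helper (assumed here to return the chat ID as a string) and discover_chats is a hypothetical name:

```python
def get_chat_key(chat_data):
    """Stand-in for the PR's helper: use the chat ID as the grouping key."""
    chat_id = chat_data.get("id")
    return str(chat_id) if chat_id is not None else None


def discover_chats(messages, source_folder):
    """Build chat info and per-chat message counts in one pass over messages."""
    chats = {}
    for msg in messages:
        chat_data = msg.get("chat")
        if not chat_data:
            continue
        chat_key = get_chat_key(chat_data)
        if not chat_key:
            continue
        info = chats.get(chat_key)
        if info is None:
            # First message seen for this chat: record its metadata.
            info = chats[chat_key] = {
                "id": chat_key,
                "chat_id": chat_data.get("id"),
                "username": chat_data.get("username"),
                "name": (chat_data.get("username") or chat_data.get("first_name")
                         or chat_data.get("title") or str(chat_data.get("id", "Unknown"))),
                "type": chat_data.get("type", ""),
                "message_count": 0,
                "source_folder": source_folder,
            }
        info["message_count"] += 1
    return chats
```

Incrementing the count directly on the chat entry removes the separate update loop, at the cost of scanning every message rather than a sample.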

Copilot · 2025-11-19T04:43:25Z

web_viewer_static/index.html

+                </div>
+            </div>
+            <div class="search-bar" id="searchBar" style="display: none;">
+                <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">


The search input field lacks an associated <label> element. While it has a placeholder, screen readers cannot properly announce the purpose of the input field. Add a <label> element (can be visually hidden with CSS if needed) or use aria-label attribute for better accessibility.

Suggested change

-                <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off">
+                <input type="text" id="searchInput" placeholder="Search messages..." autocomplete="off" aria-label="Search messages">

Copilot · 2025-11-19T04:43:25Z

web_viewer_static/app.js

+function setupSearch() {
+    const searchInput = document.getElementById('searchInput');
+    const searchClear = document.getElementById('searchClear');
+    const searchResultsInfo = document.getElementById('searchResultsInfo');


Unused variable searchResultsInfo.

Suggested change

-    const searchResultsInfo = document.getElementById('searchResultsInfo');

Copilot · 2025-11-19T04:43:25Z

TeleGatherer.py

+          viewer_process = subprocess.Popen(
+              [sys.executable, 'web_viewer.py'],
+              creationflags=subprocess.CREATE_NEW_CONSOLE
+          )
+        else:
+          # On Unix-like systems, run in background
+          viewer_process = subprocess.Popen(


This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.

Suggested change

-          viewer_process = subprocess.Popen(
-              [sys.executable, 'web_viewer.py'],
-              creationflags=subprocess.CREATE_NEW_CONSOLE
-          )
-        else:
-          # On Unix-like systems, run in background
-          viewer_process = subprocess.Popen(
+          subprocess.Popen(
+              [sys.executable, 'web_viewer.py'],
+              creationflags=subprocess.CREATE_NEW_CONSOLE
+          )
+        else:
+          # On Unix-like systems, run in background
+          subprocess.Popen(
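
A sketch of launching the viewer without keeping the handle, assuming nothing later needs to poll or stop the process. subprocess.CREATE_NEW_CONSOLE exists only on Windows; the start_new_session=True choice for Unix-like systems and the viewer_popen_kwargs and launch_viewer names are assumptions, since the rest of the original call is not shown here.

```python
import os
import subprocess
import sys


def viewer_popen_kwargs():
    """Platform-specific keyword arguments for starting the viewer process."""
    if os.name == "nt":
        # Give the viewer its own console window on Windows.
        return {"creationflags": subprocess.CREATE_NEW_CONSOLE}
    # On Unix-like systems, detach from the menu's session so the viewer
    # keeps running in the background.
    return {"start_new_session": True}


def launch_viewer(script="web_viewer.py"):
    """Start the viewer with the current interpreter; the handle is not kept."""
    subprocess.Popen([sys.executable, script], **viewer_popen_kwargs())
```

Dropping the handle means CPython may emit a ResourceWarning when the Popen object is collected while the child is still running; keeping the handle is the better choice if the menu should later terminate the viewer.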

Copilot · 2025-11-19T04:43:26Z

TeleGatherer.py

+          )
+        else:
+          # On Unix-like systems, run in background
+          viewer_process = subprocess.Popen(


This assignment to 'viewer_process' is unnecessary as it is redefined before this value is used.

Copilot · 2025-11-19T04:43:26Z

web_viewer.py

@@ -0,0 +1,418 @@
+import os
+import json
+import re


Import of 're' is not used.

Suggested change

-import re

jakemuk added 2 commits November 11, 2025 10:43

Update .gitignore

5cd27d6

Removed duplicate downloads directory exclusion

tsale requested a review from Copilot November 19, 2025 04:36

tsale self-assigned this Nov 19, 2025

tsale added the in-progress label Nov 19, 2025

Copilot started reviewing on behalf of tsale November 19, 2025 04:36

Copilot finished reviewing on behalf of tsale November 19, 2025 04:39

Copilot AI reviewed Nov 19, 2025
