2 changes: 2 additions & 0 deletions docs/api.rst
@@ -226,6 +226,8 @@ API Reference
>>> LogfmtRenderer(key_order=["b", "a"], bool_as_flag=False)(None, "", event_dict)
'b="[1, 2, 3]" a=42 flag=true'

.. autoclass:: SensitiveDataRedactor

.. autoclass:: EventRenamer

.. autofunction:: add_log_level
217 changes: 217 additions & 0 deletions docs/processors.md
@@ -156,6 +156,223 @@ Advanced log aggregation and analysis tools like [*Logstash*](https://www.elasti
For a list of shipped processors, check out the {ref}`API documentation <procs>`.


## Redacting Sensitive Data

When logging in production environments, it's critical to ensure sensitive information like passwords, API keys, personal data, and financial information doesn't end up in your logs.
*structlog* provides the {class}`~structlog.processors.SensitiveDataRedactor` processor to automatically identify and redact sensitive fields from log events.

### Basic Usage

```python
import structlog
from structlog.processors import SensitiveDataRedactor

# Create a redactor for common sensitive fields
redactor = SensitiveDataRedactor(
    sensitive_fields=["password", "api_key", "secret", "token"]
)

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        redactor,  # Place before renderers!
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("user_login", user="alice", password="secret123")
# Output: {"event": "user_login", "user": "alice", "password": "[REDACTED]", "level": "info"}
```

### Pattern Matching

Instead of listing every possible field name, use glob-style patterns: `*` matches any sequence of characters (including none) and `?` matches any single character:

```python
redactor = SensitiveDataRedactor(
    sensitive_fields=[
        "*password*",  # Matches: password, user_password, password_hash
        "api_*",       # Matches: api_key, api_secret, api_token
        "*_token",     # Matches: auth_token, refresh_token, access_token
        "*secret*",    # Matches: secret, client_secret, secret_key
    ]
)
```
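
To get a feel for which keys a pattern list catches, the patterns can be tried against sample field names with the standard library's `fnmatch` module. This is only an approximation: it assumes the processor's `*` and `?` semantics agree with `fnmatch.fnmatchcase`, and `matches_any` is a hypothetical helper, not part of *structlog*.

```python
from fnmatch import fnmatchcase

PATTERNS = ["*password*", "api_*", "*_token", "*secret*"]


def matches_any(field_name, patterns=PATTERNS):
    """Return True if field_name matches at least one glob pattern."""
    return any(fnmatchcase(field_name, p) for p in patterns)


for name in ["user_password", "api_key", "refresh_token", "username", "token"]:
    print(f"{name}: {matches_any(name)}")
# user_password: True
# api_key: True
# refresh_token: True
# username: False
# token: False
```

Note that a bare `token` key is not caught by `"*_token"`, because the pattern requires the underscore. Checking patterns against real field names like this is a cheap way to spot such gaps.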

### Case-Insensitive Matching

Enable case-insensitive matching when field names may have inconsistent casing:

```python
redactor = SensitiveDataRedactor(
    sensitive_fields=["password", "apikey"],
    case_insensitive=True,
)
# Now matches: password, PASSWORD, Password, ApiKey, APIKEY, etc.
```
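
One way to picture case-insensitive matching is to lowercase both the pattern and the field name before comparing. The sketch below assumes that approach; the processor may implement it differently, and `matches_ci` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_ci(field_name, patterns):
    """Case-insensitive glob match: compare lowercased name and patterns."""
    lowered = field_name.lower()
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)


print(matches_ci("APIKEY", ["password", "apikey"]))   # True
print(matches_ci("api_key", ["password", "apikey"]))  # False
```

Ignoring case does not ignore punctuation: `api_key` still needs its own entry or a pattern such as `"api*key"`.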

### Nested Structures

The redactor automatically traverses nested dictionaries and lists:

```python
log.info(
    "config_loaded",
    config={
        "database": {
            "host": "localhost",
            "password": "db_secret",  # Will be redacted
        },
        "api_keys": [
            {"service": "stripe", "api_key": "sk_live_xxx"},  # Will be redacted
            {"service": "twilio", "api_key": "AC_xxx"},  # Will be redacted
        ],
    },
)
```
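
The traversal can be pictured as a small recursive function. This is a minimal sketch under stated assumptions, not the processor's implementation: `redact_nested` and `SENSITIVE` are hypothetical names, only exact key names are matched, and a new structure is returned rather than mutating the input.

```python
SENSITIVE = frozenset({"password", "api_key"})


def redact_nested(value):
    """Return a copy of value with sensitive dict keys replaced at any depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE else redact_nested(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Lists and tuples are walked item by item (tuples come back as lists).
        return [redact_nested(item) for item in value]
    return value  # Scalars are returned unchanged.


config = {
    "database": {"host": "localhost", "password": "db_secret"},
    "api_keys": [
        {"service": "stripe", "api_key": "sk_live_xxx"},
        {"service": "twilio", "api_key": "AC_xxx"},
    ],
}
print(redact_nested(config))
```

Only dictionary keys are inspected here, so a secret embedded in a plain string value (for example inside a connection URL) would pass through untouched.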

### Custom Redaction Logic

For more control over how values are redacted, provide a custom callback:

```python
def partial_mask(field_name, value, path):
    """Show first/last 2 characters for debugging."""
    if isinstance(value, str) and len(value) > 4:
        return f"{value[:2]}{'*' * (len(value) - 4)}{value[-2:]}"
    return "[REDACTED]"

redactor = SensitiveDataRedactor(
    sensitive_fields=["*password*", "*token*"],
    redaction_callback=partial_mask,
)

log.info("auth", password="mysecretpassword")
# Output: {"event": "auth", "password": "my************rd"}
```

The callback receives:
- `field_name`: The name of the field being redacted
- `value`: The original value
- `path`: The full path to the field (e.g., `"config.database.password"`)
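
Because the callback is an ordinary function, it can be exercised on its own before it is wired into the processor. The block repeats `partial_mask` so that it runs standalone; the paths passed in are made-up examples.

```python
def partial_mask(field_name, value, path):
    """Keep the first and last 2 characters; mask everything in between."""
    if isinstance(value, str) and len(value) > 4:
        return f"{value[:2]}{'*' * (len(value) - 4)}{value[-2:]}"
    return "[REDACTED]"


print(partial_mask("password", "mysecretpassword", "password"))  # my************rd
print(partial_mask("token", "abcd", "auth.token"))                # [REDACTED]
print(partial_mask("token", 12345, "auth.token"))                 # [REDACTED]
```

Short strings and non-string values fall through to full redaction. Partial masking still reveals the value's length and four of its characters, so it is probably better suited to debugging than to secrets that must never leak.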

### Compliance Use Cases

#### GDPR Compliance

Protect personally identifiable information (PII) in logs:

```python
import logging

# Separate audit logger for compliance records
audit_logger = logging.getLogger("gdpr.audit")

def gdpr_audit(field_name, value, path):
    """Log redaction events for GDPR compliance auditing."""
    audit_logger.info(
        "PII field redacted",
        extra={
            "field_name": field_name,
            "field_path": path,
            "value_type": type(value).__name__,
        },
    )

gdpr_redactor = SensitiveDataRedactor(
    sensitive_fields=[
        # Personal identifiers
        "*email*", "*phone*", "*mobile*",
        # "*name*" is broad: it also matches keys such as hostname or filename
        "*name*", "*first_name*", "*last_name*",
        # Government IDs
        "*ssn*", "*social_security*", "*passport*",
        "*national_id*", "*tax_id*",
        # Location data
        "*address*", "*zip*", "*postal*",
        # Dates that could identify
        "*birth*", "*dob*",
    ],
    case_insensitive=True,
    audit_callback=gdpr_audit,
)
```
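
The audit callback relies on the standard library's `extra` mechanism, which attaches each key as an attribute on the resulting `LogRecord`. The sketch below calls the callback directly and inspects the record; `ListHandler` is a hypothetical in-memory handler used only for the demonstration.

```python
import logging

audit_logger = logging.getLogger("gdpr.audit")
audit_logger.setLevel(logging.INFO)
audit_logger.propagate = False  # Keep the demo records away from the root logger.


class ListHandler(logging.Handler):
    """Collect log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
audit_logger.addHandler(handler)


def gdpr_audit(field_name, value, path):
    """Record which field was redacted, but never the value itself."""
    audit_logger.info(
        "PII field redacted",
        extra={
            "field_name": field_name,
            "field_path": path,
            "value_type": type(value).__name__,
        },
    )


gdpr_audit("email", "alice@example.com", "user.email")
record = handler.records[0]
print(record.field_name, record.field_path, record.value_type)  # email user.email str
```

The record carries the field name, path, and value type, while the raw value never reaches the audit log.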

#### PCI-DSS Compliance

Protect payment card data:

```python
def mask_card_number(field_name, value, path):
    """PCI-DSS compliant card masking - show only last 4 digits."""
    if "card" in field_name.lower() and isinstance(value, str):
        # Remove any spaces/dashes and show last 4
        digits = "".join(c for c in value if c.isdigit())
        if len(digits) >= 4:
            return f"****-****-****-{digits[-4:]}"
    return "[REDACTED]"

pci_redactor = SensitiveDataRedactor(
    sensitive_fields=[
        # "*pan*" is broad: it also matches keys such as company or expand
        "*card*", "*pan*",          # Card numbers
        "*cvv*", "*cvc*", "*cvn*",  # Security codes
        "*expir*",                  # Expiration dates
        "*account_number*",         # Bank accounts
        "*routing*",                # Routing numbers
    ],
    case_insensitive=True,
    redaction_callback=mask_card_number,
)
```
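
The masking callback can be checked on its own with a few representative inputs. The block repeats `mask_card_number` so that it runs standalone; the field names and paths are made-up examples.

```python
def mask_card_number(field_name, value, path):
    """Show only the last 4 digits of string values in card-related fields."""
    if "card" in field_name.lower() and isinstance(value, str):
        digits = "".join(c for c in value if c.isdigit())
        if len(digits) >= 4:
            return f"****-****-****-{digits[-4:]}"
    return "[REDACTED]"


print(mask_card_number("card_number", "4111 1111 1111 1111", "payment.card_number"))
# ****-****-****-1111
print(mask_card_number("cvv", "123", "payment.cvv"))
# [REDACTED]
print(mask_card_number("card_expiry", "12/2030", "payment.card_expiry"))
# ****-****-****-2030
```

The last call shows a side effect of keying on the substring `card`: any such field holding four or more digits gets the card-number mask, including an expiry date. If that matters, the check could be narrowed to the exact field names that hold card numbers.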

#### HIPAA Compliance

Protect health information:

```python
hipaa_redactor = SensitiveDataRedactor(
    sensitive_fields=[
        # Patient identifiers
        "*patient_id*", "*medical_record*", "*mrn*",
        # Health information
        "*diagnosis*", "*prescription*", "*medication*",
        "*treatment*", "*procedure*",
        # Insurance
        "*insurance_id*", "*policy_number*",
        # Also include general PII patterns
        "*ssn*", "*dob*", "*birth*",
    ],
    case_insensitive=True,
)
```

### Combining Multiple Redactors

For applications with different compliance requirements, you can chain multiple redactors:

```python
structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        gdpr_redactor,   # GDPR PII protection
        pci_redactor,    # PCI-DSS payment data
        hipaa_redactor,  # HIPAA health data
        structlog.processors.JSONRenderer(),
    ]
)
```
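
Processors run in order, and each one receives the event dictionary returned by the previous one, so a later redactor sees values that an earlier one has already replaced. The sketch below illustrates this with `make_redactor`, a hypothetical stand-in that only handles exact top-level keys; it is not the `SensitiveDataRedactor` implementation.

```python
def make_redactor(names, replacement):
    """Build a minimal processor that replaces the given top-level keys."""
    names = frozenset(names)

    def processor(logger, method_name, event_dict):
        for key in event_dict:
            if key in names:
                event_dict[key] = replacement
        return event_dict

    return processor


chain = [
    make_redactor({"email", "ssn"}, "[PII]"),
    make_redactor({"ssn", "diagnosis"}, "[PHI]"),
]

event_dict = {
    "event": "visit",
    "email": "a@example.com",
    "ssn": "123-45-6789",
    "diagnosis": "flu",
}
for processor in chain:
    event_dict = processor(None, "info", event_dict)

print(event_dict)
# {'event': 'visit', 'email': '[PII]', 'ssn': '[PHI]', 'diagnosis': '[PHI]'}
```

In this sketch the last matching redactor wins for `ssn`. When pattern lists overlap and custom callbacks are involved, a later callback may therefore receive an already-redacted placeholder instead of the original value.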

### Performance Considerations

- **Prefer exact matches over patterns** when possible for better performance
- **Exact field names are stored in a `frozenset` internally**, giving O(1) lookups
- **Patterns are compiled once** at initialization, not on every log call
- **Place the redactor before expensive operations** like JSON serialization
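
The first three points can be sketched as a matcher that splits its input once, at construction time. This is an illustration under stated assumptions rather than the processor's code: `FieldMatcher` is a hypothetical class, and the globs are translated with the standard library's `fnmatch.translate`.

```python
import re
from fnmatch import translate


class FieldMatcher:
    """Set lookup for exact names, precompiled regexes for glob patterns."""

    def __init__(self, fields):
        exact = [f for f in fields if "*" not in f and "?" not in f]
        globs = [f for f in fields if "*" in f or "?" in f]
        self._exact = frozenset(exact)
        # translate() turns a glob into a regex string; compile each one once.
        self._patterns = tuple(re.compile(translate(g)) for g in globs)

    def __call__(self, name):
        if name in self._exact:  # Average-case O(1) set lookup.
            return True
        # Fall back to a linear scan over the precompiled patterns.
        return any(p.match(name) for p in self._patterns)


is_sensitive = FieldMatcher(["password", "*_token"])
print(is_sensitive("password"))    # True
print(is_sensitive("auth_token"))  # True
print(is_sensitive("passwords"))   # False
```

Exact names cost one hash lookup per key, while every glob adds a regex match for each key that is not an exact hit, which is why exact names are the cheaper choice when the field names are known.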

:::{versionadded} 25.1.0
:::


## Third-Party packages

*structlog* was specifically designed to be as composable and reusable as possible, so whatever you're missing: