docs: replace deprecated financial_phrasebank dataset in IA3 tutorial by dhruvildarji · Pull Request #3058 · huggingface/peft

dhruvildarji · 2026-02-23T09:04:04Z

Summary

The financial_phrasebank dataset fails to load with recent versions of the datasets library because it relied on a deprecated loading script format. This PR replaces it with zeroshot/twitter-financial-news-sentiment, a financial sentiment dataset that uses the modern parquet format.

Changes

docs/source/task_guides/ia3.md: updated dataset reference, description, and text_column from "sentence" to "text" to match the new dataset's column name
examples/conditional_generation/peft_adalora_seq2seq.py: same dataset and column name updates

The replacement dataset (zeroshot/twitter-financial-news-sentiment) was tested and confirmed working by @maerory in the issue thread.

Test plan

Updated dataset loads successfully with load_dataset("zeroshot/twitter-financial-news-sentiment")
Column name updated from sentence to text throughout affected files
Label names still auto-detected from features["label"].names (no hardcoded changes needed)

…news-sentiment The financial_phrasebank dataset fails to load with recent versions of the datasets library due to deprecation of loading scripts. Replace it with zeroshot/twitter-financial-news-sentiment which is a compatible financial sentiment dataset available in the new parquet format. Updated files: - docs/source/task_guides/ia3.md: update dataset reference and text column - examples/conditional_generation/peft_adalora_seq2seq.py: update dataset and column name Fixes huggingface#2998 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…eneration notebooks The financial_phrasebank dataset fails to load with recent versions of the datasets library. Replace it with zeroshot/twitter-financial-news-sentiment and update text_column from "sentence" to "text" to match the new dataset schema. Updated files: - examples/conditional_generation/peft_ia3_seq2seq.ipynb - examples/conditional_generation/peft_lora_seq2seq.ipynb - examples/conditional_generation/peft_prompt_tuning_seq2seq.ipynb - examples/conditional_generation/peft_prefix_tuning_seq2seq.ipynb Continues huggingface#2998

review-notebook-app · 2026-02-23T23:13:58Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

dhruvildarji · 2026-02-23T23:14:00Z

Thanks for the review @BenjaminBossan! I've added the conditional generation notebook changes to this PR as requested. The changes from #3059 are now included here — the fix covers all instances: IA3 tutorial docs, peft_adalora_seq2seq.py, and all 4 conditional generation notebooks (IA3, LoRA, prompt tuning, prefix tuning).

BenjaminBossan

Thanks for replacing the outdated dataset with a newer one.

I found one more instances where it appears: examples/int8_training/Finetune_flan_t5_large_bnb_peft.ipynb. Could you please fix that notebook too?

I also saw that these notebooks define:

checkpoint_name = "financial_sentiment_analysis_<peft-method>.pt"

With the changed dataset, the name doesn't really fit anymore. However, the variable isn't used at all, so in a sense it doesn't matter. But if you're up to it, it would be great to remove it completely and avoid possible confusion.

@BenjaminBossan

- Replace deprecated financial_phrasebank dataset with zeroshot/twitter-financial-news-sentiment in Finetune_flan_t5_large_bnb_peft.ipynb (text_column sentence→text) - Remove unused checkpoint_name variables from all conditional generation notebooks and peft_adalora_seq2seq.py Addresses reviewer feedback from @BenjaminBossan.

BenjaminBossan · 2026-02-25T12:09:30Z

Thanks for the update @dhruvildarji. I tried running one of the notebooks (peft_lora_seq2seq) and encountered some issues. First of all, the line:

classes = dataset["train"].features["label"].names

didn't work but I could replace it with:

classes = dataset["train"]["label"]

The next issue came from this line in preprocess_function:

labels = tokenizer(targets, max_length=3, padding="max_length", truncation=True, return_tensors="pt")

The issue here is that the targets are not strings, so they cannot be tokenized. Instead, they are ints, which encode the sentiment classes. Now we could just use those directly as the target without passing them through the tokenizer. However, then we would actually deal with a text classification task and not a seq2seq task as the notebook claims. Now I wonder if the dataset really fits the purpose.

Can you reproduce these errors? WDYT?

BenjaminBossan · 2026-03-11T09:47:25Z

ping @dhruvildarji

@BenjaminBossan

When zeroshot/twitter-financial-news-sentiment is loaded, the label feature may not expose a .names attribute (non-ClassLabel type depending on datasets version), causing the class-name lookup to fail with an AttributeError. This in turn prevents the text_label column from being created, so tokenizer() receives raw ints instead of strings and raises a TypeError. Fix: check hasattr before accessing .names; fall back to hardcoded ["Bearish", "Bullish", "Neutral"] for this dataset. Addresses the issue reported by @BenjaminBossan when running peft_lora_seq2seq.ipynb.

dhruvildarji · 2026-03-15T17:39:35Z

Thanks for testing this @BenjaminBossan! I've reproduced both issues and pushed a fix.

Root cause: With zeroshot/twitter-financial-news-sentiment, the label feature may not expose a .names attribute (plain integer feature vs ClassLabel, depending on the datasets version). When .names raises AttributeError, the text_label column is never created, so preprocess_function receives raw ints and tokenizer() fails with a TypeError.

What I changed (latest commit):

Added a hasattr check before accessing .names across all 5 notebooks
Falls back to ["Bearish", "Bullish", "Neutral"] if .names is unavailable
This ensures text_label is always a list of strings before tokenization

The seq2seq framing still holds — we're fine-tuning a T5 model to generate sentiment labels as text tokens ("Bearish"/"Bullish"/"Neutral"), which is valid for a generative seq2seq model. The dataset fits the purpose.

BenjaminBossan

Thanks for the updates. I can confirm that the notebooks run now. The Python script still fails, but the same update should work there too.

BenjaminBossan · 2026-03-16T13:10:09Z

This line also needs the same adjustment as below in the notebooks, right?

Same fix as the notebooks: hasattr check before accessing .names, falls back to ["Bearish", "Bullish", "Neutral"] if unavailable. Addresses BenjaminBossan's follow-up comment that the Python script still fails after the notebook fix.

dhruvildarji · 2026-03-16T15:57:22Z

Fixed peft_adalora_seq2seq.py as well — same hasattr fallback applied on line 32. Should be good to go now.

HuggingFaceDocBuilderDev · 2026-03-17T09:13:25Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan

Thanks for fixing the dataset in the examples, LGTM.

dhruvildarji mentioned this pull request Feb 23, 2026

fix: replace deprecated financial_phrasebank in conditional generation notebooks #3059

Closed

BenjaminBossan requested changes Feb 24, 2026

View reviewed changes

BenjaminBossan requested changes Mar 16, 2026

View reviewed changes

BenjaminBossan mentioned this pull request Mar 17, 2026

docs: replace deprecated financial_phrasebank dataset in IA3 tutorial #3107

Closed

3 tasks

BenjaminBossan approved these changes Mar 17, 2026

View reviewed changes

BenjaminBossan merged commit c1e8a27 into huggingface:main Mar 17, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: replace deprecated financial_phrasebank dataset in IA3 tutorial#3058

docs: replace deprecated financial_phrasebank dataset in IA3 tutorial#3058
BenjaminBossan merged 5 commits intohuggingface:mainfrom
dhruvildarji:fix/ia3-tutorial-dataset

dhruvildarji commented Feb 23, 2026

Uh oh!

review-notebook-app Bot commented Feb 23, 2026

Uh oh!

dhruvildarji commented Feb 23, 2026

Uh oh!

BenjaminBossan left a comment

Uh oh!

BenjaminBossan commented Feb 25, 2026

Uh oh!

BenjaminBossan commented Mar 11, 2026

Uh oh!

dhruvildarji commented Mar 15, 2026

Uh oh!

BenjaminBossan left a comment

Uh oh!

BenjaminBossan Mar 16, 2026

Uh oh!

dhruvildarji commented Mar 16, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 17, 2026

Uh oh!

BenjaminBossan left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dhruvildarji commented Feb 23, 2026

Summary

Changes

Test plan

Uh oh!

review-notebook-app Bot commented Feb 23, 2026

Uh oh!

dhruvildarji commented Feb 23, 2026

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan commented Feb 25, 2026

Uh oh!

BenjaminBossan commented Mar 11, 2026

Uh oh!

dhruvildarji commented Mar 15, 2026

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

dhruvildarji commented Mar 16, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 17, 2026

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants