Large language models like GPT-2 are powerful for summarization but can be expensive to use, especially when deployed via APIs where cost is based on token count. This project focused on reducing input token usage while maintaining—or even improving—summarization quality.

Approach

I experimented with several input reduction techniques, each aimed at cutting token count without sacrificing summary quality:

  • Null Prompting: Instead of explicit instructions, I tested leaving out the prompt entirely, letting GPT-2 infer the task from context.
  • TF-IDF Sentence Selection: Used TF-IDF scoring to extract the most relevant sentences from articles before passing them into GPT-2.
  • Dynamic Token Reduction: Adjusted the number of selected sentences based on article length to balance brevity and completeness.
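The second and third points above can be sketched together: score each sentence by TF-IDF, then keep a number of top sentences that scales with article length. This is a minimal, self-contained illustration, not the project's exact code; the function name `tfidf_select`, the sentence splitter, and the `ratio`/`min_keep` parameters are all illustrative choices.

```python
import math
import re
from collections import Counter


def tfidf_select(article: str, ratio: float = 0.3, min_keep: int = 3) -> str:
    """Keep the highest-scoring sentences by mean TF-IDF weight.

    The number kept scales with article length (dynamic token
    reduction): roughly `ratio` of the sentences, but never fewer
    than `min_keep`. Original sentence order is preserved so the
    reduced input stays readable for the model.
    """
    # Naive sentence split; a real pipeline might use nltk or spaCy.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    if len(sentences) <= min_keep:
        return " ".join(sentences)

    # Treat each sentence as a "document" for TF-IDF purposes.
    docs = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n_docs = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(t for doc in docs for t in set(doc))

    def score(doc: list[str]) -> float:
        if not doc:
            return 0.0
        tf = Counter(doc)
        # Mean TF-IDF weight over the sentence's unique terms.
        return sum(
            (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()
        ) / len(tf)

    scores = [score(d) for d in docs]
    n_keep = max(min_keep, math.ceil(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return " ".join(sentences[i] for i in sorted(top))
```

Because the selected sentences stay in their original order, the trimmed article reads as coherent prose rather than a bag of disconnected highlights.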

Implementation

  • Fine-tuned GPT-2 on the CNN/DailyMail dataset, which contains news articles and their summaries.
  • Implemented custom preprocessing to trim unnecessary input before passing text to the model.
  • Trained and tested models on an NVIDIA RTX 3060 Ti, adjusting hyperparameters to fit within memory constraints.
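The custom preprocessing step in the second bullet can be sketched as a token-budget truncation that cuts at sentence boundaries. This is a simplified stand-in: it approximates GPT-2's BPE token count with a whitespace split, whereas the real pipeline would use the model tokenizer (e.g. `GPT2Tokenizer`) for exact counts; the function name and the 512-token default are illustrative.

```python
import re


def trim_to_budget(text: str, max_tokens: int = 512) -> str:
    """Truncate input to a rough token budget before passing it to GPT-2.

    Whitespace tokens approximate the BPE count. Truncation happens at
    sentence boundaries so no sentence is cut mid-way; at least one
    sentence is always kept, even if it alone exceeds the budget.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept, used = [], 0
    for s in sentences:
        n = len(s.split())
        if used + n > max_tokens and kept:
            break
        kept.append(s)
        used += n
    return " ".join(kept)
```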

Key Takeaways

  • Reducing token count actually improved performance in some cases, as it forced the model to focus on the most relevant content.
  • Null prompting worked just as well as (or better than) explicit instructions, showing that GPT-2 can infer summarization tasks without additional guidance.
  • TF-IDF-based summarization significantly lowered token usage while retaining important details, making it a practical approach for API-based summarization tasks.
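To make the null-prompting savings concrete, here is a back-of-the-envelope helper that estimates the per-request token overhead of an explicit instruction. The instruction string and function name are hypothetical, and the whitespace split is only a rough proxy for GPT-2's BPE tokenization.

```python
def prompt_overhead(article: str,
                    instruction: str = "Summarize the following article:") -> int:
    """Approximate tokens saved per request by dropping the explicit
    instruction (null prompting). Whitespace split stands in for the
    real BPE count."""
    explicit = f"{instruction}\n{article}"
    return len(explicit.split()) - len(article.split())
```

Per request the saving is small, but under token-based API pricing it compounds across every article summarized.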

This project showed that carefully optimizing input tokens can lead to better cost efficiency and improved model outputs, making large-scale summarization more practical and affordable. Read more in my paper.