How to define the ratio of the summary with the Hugging Face Transformers pipeline?

Shawn

I am using the following code to summarize an article with Hugging Face Transformers' pipeline:

from transformers import pipeline
summarizer = pipeline(task="summarization")
summary = summarizer(text)
print(summary[0]['summary_text'])

How can I define a ratio between the summary and the original article? For example, 20% of the original article?

EDIT 1: I implemented the solution you suggested, but got the following error. This is the code I used:

summary = summarizer(text, min_length=int(0.1 * len(text)), max_length=int(0.2 * len(text)))
print(summary[0]['summary_text'])

The error I got:

RuntimeError                              Traceback (most recent call last)
<ipython-input-9-bc11c5d8eb66> in <module>()
----> 1 summarizer(text, min_length = int(0.1 * len(text)), max_length = int(0.2 * len(text)))
      2 print(summary[0]['summary_text'])

13 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482         # remove once script supports set_grad_enabled
   1483         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485 
   1486 

RuntimeError: index out of range: Tried to access index 1026 out of table with 1025 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418
dennlinger

(Note that this answer is based on the documentation for version 2.6 of transformers)

It seems that, as of yet, the documentation on the pipeline feature is still rather shallow, which is why we have to dig a bit deeper. When calling a Python object, it internally invokes its own __call__ method, which we can find here for the summarization pipeline.

Note that it allows us (similar to the underlying BartForConditionalGeneration model) to specify min_length and max_length, which is why we can simply call with something like

summarizer(text, min_length=int(0.1 * len(text)), max_length=int(0.2 * len(text)))

This would give you a summary roughly 10-20% of the length of the original text, but of course you can change that to your liking. Note that the default value of max_length for BartForConditionalGeneration is 20 (as of now, min_length is undocumented, but defaults to 0), whereas the summarization pipeline uses min_length=21 and max_length=142.
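Regarding the RuntimeError in the edit: min_length and max_length are measured in tokens, not characters, and BART's position embeddings only cover about 1024 tokens, so a character-based len(text) quickly overshoots the table (hence "index 1026 out of table with 1025 rows"). A minimal sketch of computing ratio-based lengths clamped to the model limit (ratio_lengths is a hypothetical helper name, and 1024 is assumed to be the model's positional limit):

```python
def ratio_lengths(num_tokens, ratio_min=0.1, ratio_max=0.2, model_max=1024):
    """Compute (min_length, max_length) in tokens for a summarization call.

    num_tokens should be the token count of the input (e.g. from the
    pipeline's tokenizer), not the character count of the raw string.
    Both bounds are capped at the model's positional limit.
    """
    max_len = min(int(ratio_max * num_tokens), model_max)
    min_len = min(int(ratio_min * num_tokens), max_len)
    return min_len, max_len


# Example: a 10,000-token input would otherwise request max_length=2000,
# which exceeds the 1024-token limit, so it gets clamped.
print(ratio_lengths(10000))  # (1000, 1024)
print(ratio_lengths(500))    # (50, 100)
```

You could then pass the result on, e.g. `min_len, max_len = ratio_lengths(n); summarizer(text, min_length=min_len, max_length=max_len)`, after counting tokens with the pipeline's own tokenizer rather than `len(text)`.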

