INTUIA/Programa final/spacy/pipeline/__pycache__/spancat.cpython-312.pyc

333 lines
37 KiB

2026-03-15 13:27:50 +00:00

# Compiled from: C:\Users\garci\AppData\Roaming\Python\Python312\site-packages\spacy/pipeline/spancat.py
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union, cast

import numpy
from thinc.api import Config, Model, Ops, Optimizer, get_current_ops, set_dropout_rate
from thinc.types import Floats2d, Ints1d, Ints2d, Ragged

from ..compat import Protocol, runtime_checkable
from ..errors import Errors
from ..language import Language
from ..scorer import Scorer
from ..tokens import Doc, Span, SpanGroup
from ..training import Example, validate_examples
from ..util import registry
from ..vocab import Vocab
from .trainable_pipe import TrainablePipe

spancat_default_config = """
[model]
@architectures = "spacy.SpanCategorizer.v1"
scorer = {"@layers": "spacy.LinearLogistic.v1"}
[model.reducer]
@layers = spacy.mean_max_reducer.v1
hidden_size = 128
[model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"
[model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 96
rows = [5000, 1000, 2500, 1000]
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
include_static_vectors = false
[model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = ${model.tok2vec.embed.width}
window_size = 1
maxout_pieces = 3
depth = 4
"""

spancat_singlelabel_default_config = """
[model]
@architectures = "spacy.SpanCategorizer.v1"
scorer = {"@layers": "Softmax.v2"}
[model.reducer]
@layers = spacy.mean_max_reducer.v1
hidden_size = 128
[model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"
[model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = 96
rows = [5000, 1000, 2500, 1000]
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
include_static_vectors = false
[model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = ${model.tok2vec.embed.width}
window_size = 1
maxout_pieces = 3
depth = 4
"""

DEFAULT_SPANS_KEY = "sc"
DEFAULT_SPANCAT_MODEL = Config().from_str(spancat_default_config)["model"]
DEFAULT_SPANCAT_SINGLELABEL_MODEL = Config().from_str(
    spancat_singlelabel_default_config
)["model"]


@runtime_checkable
class Suggester(Protocol):
    def __call__(self, docs: Iterable[Doc], *, ops: Optional[Ops] = None) -> Ragged:
        ...


def ngram_suggester(
    docs: Iterable[Doc], sizes: List[int], *, ops: Optional[Ops] = None
) -> Ragged:
    if ops is None:
        ops = get_current_ops()
    spans = []
    lengths = []
    for doc in docs:
        # Column vector holding every token index that could start a span.
        starts = ops.xp.arange(len(doc), dtype="i")
        starts = starts.reshape((-1, 1))
        length = 0
        for size in sizes:
            if size <= len(doc):
                # Keep only the starts that leave room for `size` tokens.
                starts_size = starts[: len(doc) - (size - 1)]
                spans.append(ops.xp.hstack((starts_size, starts_size + size)))
                length += spans[-1].shape[0]
            if spans:
                assert spans[-1].ndim == 2, spans[-1].shape
        lengths.append(length)
    lengths_array = ops.asarray1i(lengths)
    if len(spans) > 0:
        output = Ragged(ops.xp.vstack(spans), lengths_array)
    else:
        output = Ragged(ops.xp.zeros((0, 0), dtype="i"), lengths_array)

    assert output.dataXd.ndim == 2
    return output


def preset_spans_suggester(
    docs: Iterable[Doc], spans_key: str, *, ops: Optional[Ops] = None
) -> Ragged:
    if ops is None:
        ops = get_current_ops()
    spans = []
    lengths = []
    for doc in docs:
        length = 0
        if doc.spans[spans_key]:
            # One (start, end) row per span already stored under spans_key.
            for span in doc.spans[spans_key]:
                spans.append([span.start, span.end])
                length += 1

        lengths.append(length)
    lengths_array = cast(Ints1d, ops.asarray(lengths, dtype="i"))
    if len(spans) > 0:
        output = Ragged(ops.asarray(spans, dtype="i"), lengths_array)
    else:
        output = Ragged(ops.xp.zeros((0, 0), dtype="i"), lengths_array)
    return output


@registry.misc("spacy.ngram_suggester.v1")
def build_ngram_suggester(sizes: List[int]) -> Suggester:
    """Suggest all spans of the given lengths. Spans are returned as a ragged
    array of integers. The array has two columns, indicating the start and end
    position."""

    return partial(ngram_suggester, sizes=sizes)
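The enumeration that `ngram_suggester` performs with `ops.xp` arrays can be sketched in plain Python. This is an illustration under stated assumptions, not spaCy's implementation: it takes token counts instead of `Doc` objects, and the helper name `suggest_ngrams` is hypothetical. It returns the same two pieces of information a `Ragged` carries: a flat list of `(start, end)` rows (end exclusive) and a per-document row count.

```python
from typing import List, Tuple


def suggest_ngrams(
    doc_lengths: List[int], sizes: List[int]
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Hypothetical stand-in for ngram_suggester that works on token counts.

    Returns (rows, lengths): rows holds (start, end) pairs with an exclusive
    end, and lengths[i] is the number of rows that belong to document i.
    """
    rows: List[Tuple[int, int]] = []
    lengths: List[int] = []
    for n_tokens in doc_lengths:
        count = 0
        for size in sizes:
            # Sizes longer than the document yield no candidates.
            if size <= n_tokens:
                # Valid starts are 0 .. n_tokens - size, i.e. n_tokens - (size - 1) of them.
                for start in range(n_tokens - size + 1):
                    rows.append((start, start + size))
                    count += 1
        lengths.append(count)
    return rows, lengths


rows, lengths = suggest_ngrams([3, 1], sizes=[1, 2])
print(rows)     # [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 1)]
print(lengths)  # [5, 1]
```

Rows are grouped by size within each document, in the same order in which `ngram_suggester` stacks its per-size blocks. The number of candidates grows roughly as `len(doc) * len(sizes)`, which is why the factories default to short sizes such as `[1, 2, 3]`.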
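

@registry.misc("spacy.ngram_range_suggester.v1")
def build_ngram_range_suggester(min_size: int, max_size: int) -> Suggester:
    """Suggest all spans whose lengths lie between a given min and max value,
    both inclusive. Spans are returned as a ragged array of integers. The array
    has two columns, indicating the start and end position."""

    sizes = list(range(min_size, max_size + 1))
    return build_ngram_suggester(sizes)


@registry.misc("spacy.preset_spans_suggester.v1")
def build_preset_spans_suggester(spans_key: str) -> Suggester:
    """Suggest all spans that are already stored in doc.spans[spans_key].
    This is useful when an upstream component, such as a SpanRuler or
    SpanFinder, is used to set the spans on the Doc."""
    return partial(preset_spans_suggester, spans_key=spans_key)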
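Every suggester above returns the same shape: one flat two-column array of `(start, end)` rows plus a per-document count. The sketch below models a Thinc `Ragged` as a plain `(rows, lengths)` pair and shows the flatten/split round trip that `preset_spans_suggester` relies on; `flatten_spans` and `split_rows` are hypothetical helpers, not part of spaCy or Thinc. It also checks that the range suggester's `min_size..max_size` expansion is inclusive at both ends.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Span = Tuple[int, int]


def flatten_spans(per_doc: Sequence[Sequence[Span]]) -> Tuple[List[Span], List[int]]:
    """Mimic preset_spans_suggester: concatenate spans and record per-doc counts."""
    rows: List[Span] = []
    lengths: List[int] = []
    for spans in per_doc:
        rows.extend(spans)
        lengths.append(len(spans))  # a doc with no preset spans contributes 0
    return rows, lengths


def split_rows(rows: List[Span], lengths: List[int]) -> List[List[Span]]:
    """Recover the per-document groups from the flat rows and the lengths."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [rows[s:e] for s, e in zip(starts, ends)]


# build_ngram_range_suggester expands min_size..max_size inclusively.
min_size, max_size = 2, 4
assert list(range(min_size, max_size + 1)) == [2, 3, 4]

rows, lengths = flatten_spans([[(0, 2), (3, 5)], [], [(1, 4)]])
print(rows)                       # [(0, 2), (3, 5), (1, 4)]
print(lengths)                    # [2, 0, 1]
print(split_rows(rows, lengths))  # [[(0, 2), (3, 5)], [], [(1, 4)]]
```

Keeping a zero in `lengths` for documents without spans is what lets later code (for example the `offset` bookkeeping in `set_annotations`) line rows up with documents.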


@Language.factory(
    "spancat",
    assigns=["doc.spans"],
    default_config={
        "threshold": 0.5,
        "spans_key": DEFAULT_SPANS_KEY,
        "max_positive": None,
        "model": DEFAULT_SPANCAT_MODEL,
        "suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]},
        "scorer": {"@scorers": "spacy.spancat_scorer.v1"},
    },
    default_score_weights={"spans_sc_f": 1.0, "spans_sc_p": 0.0, "spans_sc_r": 0.0},
)
def make_spancat(
    nlp: Language,
    name: str,
    suggester: Suggester,
    model: Model[Tuple[List[Doc], Ragged], Floats2d],
    spans_key: str,
    scorer: Optional[Callable],
    threshold: float,
    max_positive: Optional[int],
) -> "SpanCategorizer":
    """Create a SpanCategorizer component and configure it for multi-label
    classification, so that each span can be assigned multiple labels.
    The span categorizer consists of two parts: a suggester function that
    proposes candidate spans, and a labeller model that predicts one or more
    labels for each span.

    name (str): The component instance name, used to add entries to the
        losses during training.
    suggester (Callable[[Iterable[Doc], Optional[Ops]], Ragged]): A function that suggests spans.
        Spans are returned as a ragged array with two integer columns, for the
        start and end positions.
    model (Model[Tuple[List[Doc], Ragged], Floats2d]): A model instance that
        is given a list of documents and (start, end) indices representing
        candidate span offsets. The model predicts a probability for each category
        for each span.
    spans_key (str): Key of the doc.spans dict to save the spans under. During
        initialization and training, the component will look for spans on the
        reference document under the same key.
    scorer (Optional[Callable]): The scoring method. Defaults to
        Scorer.score_spans for the Doc.spans[spans_key] with overlapping
        spans allowed.
    threshold (float): Minimum probability to consider a prediction positive.
        Spans with a positive prediction will be saved on the Doc. Defaults to
        0.5.
    max_positive (Optional[int]): Maximum number of labels to consider positive
        per span. Defaults to None, indicating no limit.
    """
    return SpanCategorizer(
        nlp.vocab,
        model=model,
        suggester=suggester,
        name=name,
        spans_key=spans_key,
        negative_weight=None,
        allow_overlap=True,
        max_positive=max_positive,
        threshold=threshold,
        scorer=scorer,
        add_negative_label=False,
    )
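The `make_spancat` docstring describes two decision knobs: `threshold` (minimum probability for a positive label) and `max_positive` (cap on labels per span). The sketch below shows one way they could combine for a single span's probability row. It is an illustration, not the `_make_span_group_multilabel` code (which lies outside this excerpt); the helper `select_labels`, the `>=` comparison, and applying the cap before the threshold are assumptions.

```python
from typing import List, Optional, Sequence


def select_labels(
    probs: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,
    max_positive: Optional[int] = None,
) -> List[str]:
    """Hypothetical multi-label decision for one candidate span."""
    # Rank label indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if max_positive is not None:
        ranked = ranked[:max_positive]  # cap the number of labels per span
    # Keep only labels whose probability reaches the threshold.
    return [labels[i] for i in ranked if probs[i] >= threshold]


labels = ["PER", "ORG", "LOC"]
print(select_labels([0.9, 0.7, 0.2], labels))                  # ['PER', 'ORG']
print(select_labels([0.9, 0.7, 0.2], labels, max_positive=1))  # ['PER']
print(select_labels([0.4, 0.3, 0.2], labels))                  # []
```

With `max_positive=None` (the factory default) every label above the threshold is kept, which is what makes this configuration multi-label.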


@Language.factory(
    "spancat_singlelabel",
    assigns=["doc.spans"],
    default_config={
        "spans_key": DEFAULT_SPANS_KEY,
        "model": DEFAULT_SPANCAT_SINGLELABEL_MODEL,
        "negative_weight": 1.0,
        "suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]},
        "scorer": {"@scorers": "spacy.spancat_scorer.v1"},
        "allow_overlap": True,
    },
    default_score_weights={"spans_sc_f": 1.0, "spans_sc_p": 0.0, "spans_sc_r": 0.0},
)
def make_spancat_singlelabel(
    nlp: Language,
    name: str,
    suggester: Suggester,
    model: Model[Tuple[List[Doc], Ragged], Floats2d],
    spans_key: str,
    negative_weight: float,
    allow_overlap: bool,
    scorer: Optional[Callable],
) -> "SpanCategorizer":
    """Create a SpanCategorizer component and configure it for multi-class
    classification. With this configuration each span can get at most one
    label. The span categorizer consists of two parts: a suggester function
    that proposes candidate spans, and a labeller model that predicts at most
    one label for each span.

    name (str): The component instance name, used to add entries to the
        losses during training.
    suggester (Callable[[Iterable[Doc], Optional[Ops]], Ragged]): A function that suggests spans.
        Spans are returned as a ragged array with two integer columns, for the
        start and end positions.
    model (Model[Tuple[List[Doc], Ragged], Floats2d]): A model instance that
        is given a list of documents and (start, end) indices representing
        candidate span offsets. The model predicts a probability for each category
        for each span.
    spans_key (str): Key of the doc.spans dict to save the spans under. During
        initialization and training, the component will look for spans on the
        reference document under the same key.
    scorer (Optional[Callable]): The scoring method. Defaults to
        Scorer.score_spans for the Doc.spans[spans_key] with overlapping
        spans allowed.
    negative_weight (float): Multiplier for the loss terms of the negative
        samples (candidate spans with no gold label). Can be used to
        downweight the negative samples if there are too many.
    allow_overlap (bool): If True, the data is assumed to contain overlapping spans.
        Otherwise it produces non-overlapping spans greedily prioritizing
        higher assigned label scores.
    """
    return SpanCategorizer(
        nlp.vocab,
        model=model,
        suggester=suggester,
        name=name,
        spans_key=spans_key,
        negative_weight=negative_weight,
        allow_overlap=allow_overlap,
        max_positive=1,
        add_negative_label=True,
        threshold=None,
        scorer=scorer,
    )


def spancat_score(examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
    kwargs = dict(kwargs)
    attr_prefix = "spans_"
    key = kwargs["spans_key"]
    kwargs.setdefault("attr", f"{attr_prefix}{key}")
    kwargs.setdefault("allow_overlap", True)
    kwargs.setdefault(
        "getter", lambda doc, key: doc.spans.get(key[len(attr_prefix) :], [])
    )
    kwargs.setdefault("has_annotation", lambda doc: key in doc.spans)
    return Scorer.score_spans(examples, **kwargs)


@registry.scorers("spacy.spancat_scorer.v1")
def make_spancat_scorer():
    return spancat_score
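`spancat_score` builds the scorer's attribute name by prefixing the spans key, and its `getter` strips that prefix again to read `doc.spans`. With the default key `"sc"` this yields `spans_sc`, which matches the `spans_sc_f`/`spans_sc_p`/`spans_sc_r` names in `default_score_weights`. The sketch replays those two lambdas on a hypothetical `FakeDoc` that only models the `doc.spans` mapping; it does not call spaCy's `Scorer`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDoc:
    """Hypothetical stand-in: only models the doc.spans mapping."""

    spans: Dict[str, List[str]] = field(default_factory=dict)


attr_prefix = "spans_"
spans_key = "sc"
attr = f"{attr_prefix}{spans_key}"  # attribute name handed to the span scorer


def getter(doc: FakeDoc, key: str) -> List[str]:
    # Strip the prefix to find the real doc.spans key; a missing key gives [].
    return doc.spans.get(key[len(attr_prefix):], [])


def has_annotation(doc: FakeDoc) -> bool:
    return spans_key in doc.spans


doc = FakeDoc(spans={"sc": ["Berlin", "Ada Lovelace"]})
print(attr)                            # spans_sc
print(getter(doc, attr))               # ['Berlin', 'Ada Lovelace']
print(has_annotation(FakeDoc()))       # False
print([f"{attr}_{m}" for m in "prf"])  # ['spans_sc_p', 'spans_sc_r', 'spans_sc_f']
```

`has_annotation` lets the scorer skip documents whose reference has no entry under the key at all, while an empty span group still counts as annotated.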


@dataclass
class _Intervals:
    """
    Helper class to avoid storing overlapping spans.
    """

    def __init__(self):
        self.ranges = set()

    def add(self, i, j):
        for e in range(i, j):
            self.ranges.add(e)

    def __contains__(self, rang):
        i, j = rang
        for e in range(i, j):
            if e in self.ranges:
                return True
        return False
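`_Intervals` remembers every token index already covered, so `(i, j) in intervals` is true as soon as any token of the candidate overlaps an accepted span. The single-label docstrings say that, when overlap is not allowed, spans are chosen greedily by descending label score. The sketch pairs a copy of the `_Intervals` logic with that greedy loop; `Intervals` and `greedy_non_overlapping` are hypothetical names, and the real `_make_span_group_singlelabel` lies outside this excerpt.

```python
from typing import List, Set, Tuple


class Intervals:
    """Copy of the _Intervals logic: remembers covered token indices."""

    def __init__(self) -> None:
        self.ranges: Set[int] = set()

    def add(self, i: int, j: int) -> None:
        for e in range(i, j):
            self.ranges.add(e)

    def __contains__(self, rang: Tuple[int, int]) -> bool:
        i, j = rang
        return any(e in self.ranges for e in range(i, j))


def greedy_non_overlapping(
    candidates: List[Tuple[int, int, float]]
) -> List[Tuple[int, int]]:
    """Hypothetical selection: highest score first, skip anything overlapping."""
    taken = Intervals()
    kept: List[Tuple[int, int]] = []
    for start, end, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if (start, end) not in taken:
            taken.add(start, end)
            kept.append((start, end))
    return sorted(kept)


cands = [(0, 2, 0.6), (1, 3, 0.9), (3, 4, 0.7), (0, 1, 0.5)]
print(greedy_non_overlapping(cands))  # [(0, 1), (1, 3), (3, 4)]
```

`(0, 2)` is dropped because token 1 already belongs to the higher-scoring `(1, 3)`, while the lower-scoring `(0, 1)` survives since it touches no covered token.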


class SpanCategorizer(TrainablePipe):
    """Pipeline component to label spans of text.

    DOCS: https://spacy.io/api/spancategorizer
    """

    def __init__(
        self,
        vocab: Vocab,
        model: Model[Tuple[List[Doc], Ragged], Floats2d],
        suggester: Suggester,
        name: str = "spancat",
        *,
        add_negative_label: bool = False,
        spans_key: str = "spans",
        negative_weight: Optional[float] = 1.0,
        allow_overlap: Optional[bool] = True,
        max_positive: Optional[int] = None,
        threshold: Optional[float] = 0.5,
        scorer: Optional[Callable] = spancat_score,
    ) -> None:
        """Initialize the multi-label or multi-class span categorizer.

        vocab (Vocab): The shared vocabulary.
        model (thinc.api.Model): The Thinc Model powering the pipeline component.
            For multi-class classification (single label per span) we recommend
            using a Softmax classifier as the final layer, while for multi-label
            classification (multiple possible labels per span) we recommend Logistic.
        suggester (Callable[[Iterable[Doc], Optional[Ops]], Ragged]): A function that suggests spans.
            Spans are returned as a ragged array with two integer columns, for the
            start and end positions.
        name (str): The component instance name, used to add entries to the
            losses during training.
        spans_key (str): Key of the Doc.spans dict to save the spans under.
            During initialization and training, the component will look for
            spans on the reference document under the same key. Defaults to
            `"spans"`.
        add_negative_label (bool): Learn to predict a special 'negative_label'
            when a Span is not annotated.
        threshold (Optional[float]): Minimum probability to consider a prediction
            positive. Defaults to 0.5. Spans with a positive prediction will be saved
            on the Doc.
        max_positive (Optional[int]): Maximum number of labels to consider
            positive per span. Defaults to None, indicating no limit.
        negative_weight (float): Multiplier for the loss terms.
            Can be used to downweight the negative samples if there are too many
            when add_negative_label is True. Otherwise it is unused.
        allow_overlap (bool): If True, the data is assumed to contain overlapping spans.
            Otherwise it produces non-overlapping spans greedily prioritizing
            higher assigned label scores. Only used when max_positive is 1.
        scorer (Optional[Callable]): The scoring method. Defaults to
            Scorer.score_spans for the Doc.spans[spans_key] with overlapping
            spans allowed.

        DOCS: https://spacy.io/api/spancategorizer#init
        """
        self.cfg = {
            "labels": [],
            "spans_key": spans_key,
            "threshold": threshold,
            "max_positive": max_positive,
            "negative_weight": negative_weight,
            "allow_overlap": allow_overlap,
        }
        self.vocab = vocab
        self.suggester = suggester
        self.model = model
        self.name = name
        self.scorer = scorer
        self.add_negative_label = add_negative_label
        # Non-overlapping output is only supported with at most one label per span.
        if not allow_overlap and max_positive is not None and max_positive > 1:
            raise ValueError(Errors.E1051.format(max_positive=max_positive))

    @property
    def key(self) -> str:
        """Key of the doc.spans dict to save the spans under. During
        initialization and training, the component will look for spans on the
        reference document under the same key.
        """
        return str(self.cfg["spans_key"])

    def _allow_extra_label(self) -> None:
        """Raise an error if the component cannot add any more labels."""
        nO = None
        if self.model.has_dim("nO"):
            nO = self.model.get_dim("nO")
        elif self.model.has_ref("output_layer") and self.model.get_ref(
            "output_layer"
        ).has_dim("nO"):
            nO = self.model.get_ref("output_layer").get_dim("nO")
        if nO is not None and nO == self._n_labels:
            if not self.is_resizable:
                raise ValueError(
                    Errors.E922.format(name=self.name, nO=self.model.get_dim("nO"))
                )

    def add_label(self, label: str) -> int:
        """Add a new label to the pipe.

        label (str): The label to add.
        RETURNS (int): 0 if label is already present, otherwise 1.

        DOCS: https://spacy.io/api/spancategorizer#add_label
        """
        if not isinstance(label, str):
            raise ValueError(Errors.E187)
        if label in self.labels:
            return 0
        self._allow_extra_label()
        self.cfg["labels"].append(label)
        self.vocab.strings.add(label)
        return 1

    @property
    def labels(self) -> Tuple[str]:
        """RETURNS (Tuple[str]): The labels currently added to the component.

        DOCS: https://spacy.io/api/spancategorizer#labels
        """
        return tuple(self.cfg["labels"])

    @property
    def label_data(self) -> List[str]:
        """RETURNS (List[str]): Information about the component's labels.

        DOCS: https://spacy.io/api/spancategorizer#label_data
        """
        return list(self.labels)

    @property
    def _label_map(self) -> Dict[str, int]:
        """RETURNS (Dict[str, int]): The label map."""
        return {label: i for i, label in enumerate(self.labels)}

    @property
    def _n_labels(self) -> int:
        """RETURNS (int): Number of labels."""
        if self.add_negative_label:
            return len(self.labels) + 1
        else:
            return len(self.labels)

    @property
    def _negative_label_i(self) -> Union[int, None]:
        """RETURNS (Union[int, None]): Index of the negative label."""
        if self.add_negative_label:
            return len(self.label_data)
        else:
            return None

    def predict(self, docs: Iterable[Doc]):
        """Apply the pipeline's model to a batch of docs, without modifying them.

        docs (Iterable[Doc]): The documents to predict.
        RETURNS: The model's prediction for each document.

        DOCS: https://spacy.io/api/spancategorizer#predict
        """
        indices = self.suggester(docs, ops=self.model.ops)
        if indices.lengths.sum() == 0:
            # No candidate spans in the batch: skip the model, return an empty matrix.
            scores = self.model.ops.alloc2f(0, 0)
        else:
            scores = self.model.predict((docs, indices))
        return indices, scores
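The label properties above fix the column layout of the score matrix: one column per label in insertion order, plus, when `add_negative_label` is True, one extra column whose index is `len(labels)`. This is also why `_allow_extra_label` compares the model's output width `nO` with `_n_labels` rather than with `len(labels)`. The sketch mirrors those properties in plain Python; `LabelLayout` is a hypothetical class, not part of spaCy.

```python
from typing import Dict, Optional, Tuple


class LabelLayout:
    """Hypothetical mirror of _label_map, _n_labels and _negative_label_i."""

    def __init__(self, labels: Tuple[str, ...], add_negative_label: bool) -> None:
        self.labels = labels
        self.add_negative_label = add_negative_label

    @property
    def label_map(self) -> Dict[str, int]:
        return {label: i for i, label in enumerate(self.labels)}

    @property
    def n_columns(self) -> int:
        # The negative label takes one extra output column.
        return len(self.labels) + (1 if self.add_negative_label else 0)

    @property
    def negative_label_i(self) -> Optional[int]:
        # The extra column sits right after the real labels.
        return len(self.labels) if self.add_negative_label else None


multi = LabelLayout(("PER", "ORG"), add_negative_label=False)
single = LabelLayout(("PER", "ORG"), add_negative_label=True)
print(multi.label_map, multi.n_columns, multi.negative_label_i)     # {'PER': 0, 'ORG': 1} 2 None
print(single.label_map, single.n_columns, single.negative_label_i)  # {'PER': 0, 'ORG': 1} 3 2
```

`make_spancat` passes `add_negative_label=False` and `make_spancat_singlelabel` passes `True`, so the two factories differ by exactly this extra column.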

    def set_candidates(
        self, docs: Iterable[Doc], *, candidates_key: str = "candidates"
    ) -> None:
        """Use the spancat suggester to add a list of span candidates to a list of docs.
        This method is intended to be used for debugging purposes.

        docs (Iterable[Doc]): The documents to modify.
        candidates_key (str): Key of the Doc.spans dict to save the candidate spans under.

        DOCS: https://spacy.io/api/spancategorizer#set_candidates
        """
        suggester_output = self.suggester(docs, ops=self.model.ops)

        for candidates, doc in zip(suggester_output, docs):
            doc.spans[candidates_key] = []
            for index in candidates.dataXd:
                doc.spans[candidates_key].append(doc[index[0] : index[1]])

    def set_annotations(self, docs: Iterable[Doc], indices_scores) -> None:
        """Modify a batch of Doc objects, using pre-computed scores.

        docs (Iterable[Doc]): The documents to modify.
        indices_scores: The (indices, scores) pair produced by SpanCategorizer.predict.

        DOCS: https://spacy.io/api/spancategorizer#set_annotations
        """
        indices, scores = indices_scores
        offset = 0
        for i, doc in enumerate(docs):
            indices_i = indices[i].dataXd
            allow_overlap = cast(bool, self.cfg["allow_overlap"])
            # The scores form one flat matrix; slice out this doc's rows.
            if self.cfg["max_positive"] == 1:
                doc.spans[self.key] = self._make_span_group_singlelabel(
                    doc,
                    indices_i,
                    scores[offset : offset + indices.lengths[i]],
                    allow_overlap,
                )
            else:
                doc.spans[self.key] = self._make_span_group_multilabel(
                    doc,
                    indices_i,
                    scores[offset : offset + indices.lengths[i]],
                )
            offset += indices.lengths[i]

    def update(
        self,
        examples: Iterable[Example],
        *,
        drop: float = 0.0,
        sgd: Optional[Optimizer] = None,
        losses: Optional[Dict[str, float]] = None,
    ) -> Dict[str, float]:
        """Learn from a batch of documents and gold-standard information,
        updating the pipe's model. Delegates to the suggester and get_loss.

        examples (Iterable[Example]): A batch of Example objects.
        drop (float): The dropout rate.
        sgd (thinc.api.Optimizer): The optimizer.
        losses (Dict[str, float]): Optional record of the loss during training.
            Updated using the component name as the key.
        RETURNS (Dict[str, float]): The updated losses dictionary.

        DOCS: https://spacy.io/api/spancategorizer#update
        """
        if losses is None:
            losses = {}
        losses.setdefault(self.name, 0.0)
        validate_examples(examples, "SpanCategorizer.update")
        self._validate_categories(examples)
        if not any(len(eg.predicted) if eg.predicted else 0 for eg in examples):
            # No tokens in any doc: nothing to learn from.
            return losses
        docs = [eg.predicted for eg in examples]
        spans = self.suggester(docs, ops=self.model.ops)
        if spans.lengths.sum() == 0:
            # The suggester proposed no candidate spans.
            return losses
        set_dropout_rate(self.model, drop)
        scores, backprop_scores = self.model.begin_update((docs, spans))
        loss, d_scores = self.get_loss(examples, (spans, scores))
        backprop_scores(d_scores)
        if sgd is not None:
            self.finish_update(sgd)
        losses[self.name] += loss
        return losses
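
    def get_loss(
        self, examples: Iterable[Example], spans_scores: Tuple[Ragged, Floats2d]
    ) -> Tuple[float, float]:
        """Find the loss and gradient of loss for the batch of documents and
        their predicted scores.

        examples (Iterable[Example]): The batch of examples.
        spans_scores: Scores representing the model's predictions.
        RETURNS (Tuple[float, float]): The loss and the gradient.

        DOCS: https://spacy.io/api/spancategorizer#get_loss
        """
        spans, scores = spans_scores
        spans = Ragged(
            self.model.ops.to_numpy(spans.data), self.model.ops.to_numpy(spans.lengths)
        )
        target = numpy.zeros(scores.shape, dtype=scores.dtype)
        if self.add_negative_label:
            negative_spans = numpy.ones((scores.shape[0]))
        offset = 0
        label_map = self._label_map
        for i, eg in enumerate(examples):
            # Map each candidate (start, end) to its row in the flat score matrix.
            spans_index = {}
            spans_i = spans[i].dataXd
            for j in range(spans.lengths[i]):
                start = int(spans_i[j, 0])
                end = int(spans_i[j, 1])
                spans_index[(start, end)] = offset + j
            for gold_span in self._get_aligned_spans(eg):
                key = (gold_span.start, gold_span.end)
                if key in spans_index:
                    row = spans_index[key]
                    k = label_map[gold_span.label_]
                    target[row, k] = 1.0
                    if self.add_negative_label:
                        # A gold span matched this row, so it is not a negative sample.
                        negative_spans[row] = 0.0
            offset += spans.lengths[i]
        target = self.model.ops.asarray(target, dtype="f")
        if self.add_negative_label:
            negative_samples = numpy.nonzero(negative_spans)[0]
            target[negative_samples, self._negative_label_i] = 1.0
        # Gradient: how far each score is from its 0/1 target.
        d_scores = scores - target
        if self.add_negative_label:
            neg_weight = cast(float, self.cfg["negative_weight"])
            if neg_weight != 1.0:
                d_scores[negative_samples] *= neg_weight
        loss = float((d_scores**2).sum())
        return loss, d_scores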
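`get_loss` builds a 0/1 target with one row per candidate span and one column per label, uses `scores - target` as the gradient, optionally rescales the rows of negative samples by `negative_weight`, and reports the sum of squared gradient entries as the loss. The plain-Python sketch below reproduces that arithmetic for a single document. `squared_error_loss` is a hypothetical helper, and it assumes gold annotations arrive as a dict from `(start, end)` to a list of labels, in place of `_get_aligned_spans`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int]


def squared_error_loss(
    candidates: Sequence[Span],
    scores: List[List[float]],
    gold: Dict[Span, List[str]],
    label_map: Dict[str, int],
    negative_label_i: Optional[int] = None,
    negative_weight: float = 1.0,
) -> Tuple[float, List[List[float]]]:
    """Hypothetical single-document version of the target/gradient logic.

    candidates[row] is the (start, end) pair scored by scores[row].
    """
    n_cols = len(scores[0]) if scores else 0
    target = [[0.0] * n_cols for _ in candidates]
    negative_rows = set()
    for row, span in enumerate(candidates):
        if span in gold:
            for label in gold[span]:
                target[row][label_map[label]] = 1.0
        elif negative_label_i is not None:
            # No gold span here: the row becomes a negative sample.
            target[row][negative_label_i] = 1.0
            negative_rows.add(row)
    # Gradient: how far each score is from its 0/1 target.
    d_scores = [
        [s - t for s, t in zip(score_row, target_row)]
        for score_row, target_row in zip(scores, target)
    ]
    # Rescale the negative rows before the loss is summed.
    for row in negative_rows:
        d_scores[row] = [d * negative_weight for d in d_scores[row]]
    loss = sum(d * d for grad_row in d_scores for d in grad_row)
    return loss, d_scores


loss, d_scores = squared_error_loss(
    candidates=[(0, 1), (1, 2)],
    scores=[[0.9, 0.2, 0.1], [0.3, 0.1, 0.6]],
    gold={(0, 1): ["PER"]},
    label_map={"PER": 0, "ORG": 1},
    negative_label_i=2,  # column 2 plays the role of the negative label
    negative_weight=0.5,
)
print(round(loss, 4))  # 0.125
```

The first row contributes 0.01 + 0.04 + 0.01 = 0.06; the unmatched second row targets the negative column, and halving its gradient to (0.15, 0.05, -0.2) contributes 0.065.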