INTUIA/Programa final/spacy/pipeline/__pycache__/textcat.cpython-312.pyc
2026-03-15 13:27:50 +00:00

Compiled module: spacy/pipeline/textcat.py (Python 3.12 bytecode; only the readable strings are kept below).

Imported names: islice; Any, Callable, Dict, Iterable, List, Optional, Tuple; numpy; Config, Model, Optimizer, get_array_module, set_dropout_rate; Floats2d; Errors; Language; Scorer; Doc; Example, validate_examples, validate_get_examples; registry; Vocab; TrainablePipe.

Default single-label model config:
[model]
@architectures = "spacy.TextCatEnsemble.v2"
[model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"
[model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = 64
rows = [2000, 2000, 500, 1000, 500]
attrs = ["NORM", "LOWER", "PREFIX", "SUFFIX", "SHAPE"]
include_static_vectors = false
[model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = ${model.tok2vec.embed.width}
window_size = 1
maxout_pieces = 3
depth = 2
[model.linear_model]
@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = true
length = 262144
ngram_size = 1
no_output_layer = false

Single-label bag-of-words model config:
[model]
@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = true
length = 262144
ngram_size = 1
no_output_layer = false

Single-label config using TextCatReduce with a HashEmbedCNN tok2vec:
[model]
@architectures = "spacy.TextCatReduce.v1"
exclusive_classes = true
use_reduce_first = false
use_reduce_last = false
use_reduce_max = false
use_reduce_mean = true
[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true

Factory registration: "textcat"
assigns: ["doc.cats"]
default_config: threshold = 0.0, model = the "model" section of the default single-label config, scorer = {"@scorers": "spacy.textcat_scorer.v2"}
default_score_weights: cats_score = 1.0; cats_score_desc, cats_micro_p, cats_micro_r, cats_micro_f, cats_macro_p, cats_macro_r, cats_macro_f, cats_macro_auc and cats_f_per_type = None

make_textcat(nlp, name, model, threshold, scorer) -> TextCategorizer
Create a TextCategorizer component. The text categorizer predicts categories
over a whole document. It can learn one or more labels, and the labels are considered
to be mutually exclusive (i.e. one true label per doc).
model (Model[List[Doc], List[Floats2d]]): A model instance that predicts
scores for each category.
threshold (float): Cutoff to consider a prediction "positive".
scorer (Optional[Callable]): The scoring method.

textcat_scorer(examples: Iterable[Example], **kwargs) -> Dict[str, Any]
Calls Scorer.score_cats(examples, "cats", multi_label=False, **kwargs).

make_textcat_scorer()
Registered scorer factory; returns textcat_scorer.

class TextCategorizer(TrainablePipe)
Pipeline component for single-label text classification.
DOCS: https://spacy.io/api/textcategorizer

TextCategorizer.__init__(self, vocab: Vocab, model: Model, name: str = "textcat", *, threshold: float, scorer: Optional[Callable] = textcat_scorer) -> None
Initialize a text categorizer for single-label classification.
vocab (Vocab): The shared vocabulary.
model (thinc.api.Model): The Thinc Model powering the pipeline component.
name (str): The component instance name, used to add entries to the
losses during training.
threshold (float): Unused, not needed for single-label (exclusive
classes) classification.
scorer (Optional[Callable]): The scoring method. Defaults to
Scorer.score_cats for the attribute "cats".
DOCS: https://spacy.io/api/textcategorizer#init
Stores vocab, model, name and scorer on the instance, sets _rehearsal_model to None, and keeps labels, threshold and positive_label in cfg.

TextCategorizer.support_missing_values (property)
Always returns False.

TextCategorizer.labels (property)
RETURNS (Tuple[str]): The labels currently added to the component.
DOCS: https://spacy.io/api/textcategorizer#labels

TextCategorizer.label_data (property)
RETURNS (List[str]): Information about the component's labels.
DOCS: https://spacy.io/api/textcategorizer#label_data

TextCategorizer.predict(self, docs: Iterable[Doc])
Apply the pipeline's model to a batch of docs, without modifying them.
docs (Iterable[Doc]): The documents to predict.
RETURNS: The model's prediction for each document.
DOCS: https://spacy.io/api/textcategorizer#predict
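
The readable names in the compiled body (any, len, zeros, labels, predict) suggest that an all-empty batch skips the model entirely. The following is a minimal pure-Python sketch of that behaviour, not spaCy code: documents are stand-in token lists, and `predict_scores`, `model_predict` and `uniform_model` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

Scores = List[List[float]]


def predict_scores(
    docs: Sequence[Sequence[str]],
    labels: Sequence[str],
    model_predict: Callable[[Sequence[Sequence[str]]], Scores],
) -> Scores:
    """Return one row of label scores per document."""
    # If no document contains any tokens, skip the model and
    # return a zero matrix of shape (n_docs, n_labels).
    if not any(len(doc) for doc in docs):
        return [[0.0] * len(labels) for _ in docs]
    return model_predict(docs)


def uniform_model(docs: Sequence[Sequence[str]]) -> Scores:
    # Hypothetical stand-in model: the same score for both labels.
    return [[0.5, 0.5] for _ in docs]


empty_scores = predict_scores([[], []], ["POS", "NEG"], uniform_model)
full_scores = predict_scores([["good"], []], ["POS", "NEG"], uniform_model)
```

In the sketch, a batch with at least one non-empty document is passed to the model as a whole, including its empty documents.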

TextCategorizer.set_annotations(self, docs: Iterable[Doc], scores) -> None
Modify a batch of Doc objects, using pre-computed scores.
docs (Iterable[Doc]): The documents to modify.
scores: The scores to set, produced by TextCategorizer.predict.
DOCS: https://spacy.io/api/textcategorizer#set_annotations
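
As a rough illustration of how scores map onto documents, the sketch below copies row i, column j of the score matrix into the category dictionary of document i under label j. It assumes plain dictionaries in place of `Doc.cats`; `set_cats` is a hypothetical helper, not part of spaCy.

```python
from typing import Dict, List, Sequence


def set_cats(
    docs_cats: List[Dict[str, float]],
    labels: Sequence[str],
    scores: Sequence[Sequence[float]],
) -> None:
    """Write scores[i][j] into the i-th category dict under labels[j]."""
    for i, cats in enumerate(docs_cats):
        for j, label in enumerate(labels):
            # Convert to a plain float so the stored value does not
            # depend on the array type that produced the scores.
            cats[label] = float(scores[i][j])


docs_cats: List[Dict[str, float]] = [{}, {}]
set_cats(docs_cats, ["POS", "NEG"], [[0.9, 0.1], [0.2, 0.8]])
```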

TextCategorizer.update(self, examples: Iterable[Example], *, drop: float = 0.0, sgd: Optional[Optimizer] = None, losses: Optional[Dict[str, float]] = None) -> Dict[str, float]
Learn from a batch of documents and gold-standard information,
updating the pipe's model. Delegates to predict and get_loss.
examples (Iterable[Example]): A batch of Example objects.
drop (float): The dropout rate.
sgd (thinc.api.Optimizer): The optimizer.
losses (Dict[str, float]): Optional record of the loss during training.
Updated using the component name as the key.
RETURNS (Dict[str, float]): The updated losses dictionary.
DOCS: https://spacy.io/api/textcategorizer#update

TextCategorizer.rehearse(self, examples: Iterable[Example], *, drop: float = 0.0, sgd: Optional[Optimizer] = None, losses: Optional[Dict[str, float]] = None) -> Dict[str, float]
Perform a "rehearsal" update from a batch of data. Rehearsal updates
teach the current model to make predictions similar to an initial model,
to try to address the "catastrophic forgetting" problem. This feature is
experimental.
examples (Iterable[Example]): A batch of Example objects.
drop (float): The dropout rate.
sgd (thinc.api.Optimizer): The optimizer.
losses (Dict[str, float]): Optional record of the loss during training.
Updated using the component name as the key.
RETURNS (Dict[str, float]): The updated losses dictionary.
DOCS: https://spacy.io/api/textcategorizer#rehearse
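
The names left in the compiled body (target, gradient, sum) suggest that the rehearsal signal is the difference between the current model's scores and the initial model's scores, with the summed squared difference recorded as the loss. A minimal pure-Python sketch under that assumption follows; `rehearsal_gradient` is a hypothetical helper and nested lists stand in for arrays.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rehearsal_gradient(scores: Matrix, target: Matrix) -> Tuple[float, Matrix]:
    """Return (summed squared difference, gradient = scores - target)."""
    gradient = [
        [s - t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(scores, target)
    ]
    # The loss is the sum, not the mean, of the squared differences.
    loss = sum(g * g for row in gradient for g in row)
    return loss, gradient


rehearse_loss, rehearse_grad = rehearsal_gradient(
    scores=[[0.5, 0.5]],   # current model
    target=[[1.0, 0.0]],   # initial ("rehearsal") model
)
```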

TextCategorizer._examples_to_truth(self, examples: Iterable[Example]) -> Tuple[numpy.ndarray, numpy.ndarray]
Builds two float arrays of shape (number of examples, number of labels): truths, holding the reference category values, and not_missing, a mask of ones that is set to 0 only where a label is absent from the reference and the component supports missing values.

TextCategorizer.get_loss(self, examples: Iterable[Example], scores) -> Tuple[float, float]
Find the loss and gradient of loss for the batch of documents and
their predicted scores.
examples (Iterable[Example]): The batch of examples.
scores: Scores representing the model's predictions.
RETURNS (Tuple[float, float]): The loss and the gradient.
DOCS: https://spacy.io/api/textcategorizer#get_loss
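
The names visible in the compiled body (not_missing, d_scores, mean_square_error) point to a masked mean squared error. The sketch below shows that computation on nested lists instead of arrays; `masked_mse` is a hypothetical helper written for illustration, not the library's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def masked_mse(scores: Matrix, truths: Matrix, not_missing: Matrix) -> Tuple[float, Matrix]:
    """Return (mean squared error, gradient) with masked entries zeroed."""
    # Gradient: prediction error, multiplied by the mask so that
    # entries flagged as missing (mask value 0) contribute nothing.
    d_scores = [
        [(s - t) * m for s, t, m in zip(s_row, t_row, m_row)]
        for s_row, t_row, m_row in zip(scores, truths, not_missing)
    ]
    squared = [d * d for row in d_scores for d in row]
    # The mean is taken over all entries, masked or not.
    return sum(squared) / len(squared), d_scores


loss, d_scores = masked_mse(
    scores=[[1.0, 0.0], [0.5, 0.5]],
    truths=[[1.0, 0.0], [0.0, 1.0]],
    not_missing=[[1.0, 1.0], [1.0, 1.0]],
)
```

Note that the second element of the returned pair is a matrix with one gradient value per document and label, even though the docstring annotates it as a float.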

TextCategorizer.add_label(self, label: str) -> int
Add a new label to the pipe.
label (str): The label to add.
RETURNS (int): 0 if label is already present, otherwise 1.
DOCS: https://spacy.io/api/textcategorizer#add_label
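
The return-value contract can be illustrated with a small stand-in class. This is a sketch under stated assumptions: `LabelStore` is hypothetical, and it leaves out the parts of the real method that resize the model output and register the label in the vocabulary.

```python
from typing import List


class LabelStore:
    """Hypothetical stand-in for the component's label list."""

    def __init__(self) -> None:
        self.labels: List[str] = []

    def add_label(self, label: str) -> int:
        # Non-string labels are rejected.
        if not isinstance(label, str):
            raise ValueError("label must be a string")
        # 0 signals "already present"; nothing changes.
        if label in self.labels:
            return 0
        self.labels.append(label)
        return 1


store = LabelStore()
first = store.add_label("POS")
second = store.add_label("POS")
```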

TextCategorizer.initialize(self, get_examples: Callable[[], Iterable[Example]], *, nlp: Optional[Language] = None, labels: Optional[Iterable[str]] = None, positive_label: Optional[str] = None) -> None
Initialize the pipe for training, using a representative set
of data examples.
get_examples (Callable[[], Iterable[Example]]): Function that
returns a representative sample of gold-standard Example objects.
nlp (Language): The current nlp object the component is part of.
labels (Optional[Iterable[str]]): The labels to add to the component, typically generated by the
`init labels` command. If no labels are provided, the get_examples
callback is used to extract the labels from the data.
positive_label (Optional[str]): The positive label for a binary task with exclusive classes,
`None` otherwise and by default.
DOCS: https://spacy.io/api/textcategorizer#initialize