INTUIA/Programa final/spacy/pipeline/__pycache__/entity_linker.cpython-312.pyc
2026-03-15 13:27:50 +00:00
Ë
>û gqã#ó´ddlZddlmZddlmZddlmZmZmZm Z m
Z
m Z m Z ddl
Z
ddlmZmZmZmZmZddlmZddlmZdd lmZdd
lmZmZdd lmZdd lm Z dd
l!m"Z"m#Z#ddl$m%Z%m&Z&m'Z'ddlm(Z(m)Z)ddl*m+Z+ddl,m-Z-ddl.m/Z/ddl0m1Z1dZ2dZ3e«jie3«dZ5ejldgd¢dge5gddddddiddiddidd d!idddd"œd#ddd$œ¬%«dd&œd'ed(e7ded)e e7d*e8d+e9d,e9d-e8d.eee#ge efd/eee e#ge e efd0ee+e8gefd1e9d2e ed3e9d4e8d5e e:f d6„«Z;d7„Z<e)jzd!«d8„«Z>Gd9„d:e1«Z?y);éN)Úislice)ÚPath)ÚAnyÚCallableÚDictÚIterableÚListÚOptionalÚUnion)ÚConfigÚCosineDistanceÚModelÚ OptimizerÚset_dropout_rate)ÚFloats2dé)Úutil)ÚErrors)Ú CandidateÚ
KnowledgeBase)ÚLanguage)ÚScorer)ÚDocÚSpan)ÚExampleÚvalidate_examplesÚvalidate_get_examples)ÚSimpleFrozenListÚregistry)ÚVocabé)ÚEntityLinker_v1)Údeserialize_config)Ú
TrainablePipeTzç
[model]
@architectures = "spacy.EntityLinker.v2"
[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
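
The block above is the default model config, written in an INI-like format where each value is JSON-encoded. As a rough illustration only, the sketch below reads the same text with Python's standard `configparser` and `json` modules. It is a stand-in for the real config loader, and the names `DEFAULT_NEL_MODEL` and `parse_config` are hypothetical, introduced just for this example.

```python
import configparser
import json

# Copy of the config text shown above (name is hypothetical).
DEFAULT_NEL_MODEL = """
[model]
@architectures = "spacy.EntityLinker.v2"

[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
"""


def parse_config(text):
    """Parse INI-style sections and decode each value as JSON."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    return {
        section: {key: json.loads(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


config = parse_config(DEFAULT_NEL_MODEL)
print(config["model.tok2vec"]["width"])
```

Sections stay flat here (`"model.tok2vec"` is a plain key); a real loader would nest them and resolve the `@architectures` references against a registry.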
ÚmodelÚ
entity_linker)zdoc.entsz doc.sentsz
token.ent_iobztoken.ent_typeztoken.ent_kb_idé@z@misczspacy.CandidateGenerator.v1z spacy.CandidateBatchGenerator.v1zspacy.EmptyKB.v2z@scorerszspacy.entity_linker_scorer.v1)r%Úlabels_discardÚn_sentsÚ
incl_priorÚ incl_contextÚentity_vector_lengthÚget_candidatesÚget_candidates_batchÚgenerate_empty_kbÚ overwriteÚscorerÚ
use_gold_entsÚcandidates_batch_sizeÚ thresholdgð?)Ú nel_micro_fÚ nel_micro_rÚ nel_micro_p)ÚrequiresÚassignsÚdefault_configÚdefault_score_weights)r4ÚnlpÚnamer(r)r*r+r,r-r.r/r0r1r2r3r4c
óÄ|jjdd«s t|j||||||||| | ¬« St |j||||||||| |
| | |
||¬«S)Construct an EntityLinker component.
model (Model[List[Doc], Floats2d]): A model that learns document vector
representations. Given a batch of Doc objects, it should return a single
array, with one row per item in the batch.
labels_discard (Iterable[str]): NER labels that will automatically get a "NIL" prediction.
n_sents (int): The number of neighbouring sentences to take into account.
incl_prior (bool): Whether or not to include prior probabilities from the KB in the model.
incl_context (bool): Whether or not to include the local context in the model.
entity_vector_length (int): Size of encoding vectors in the KB.
get_candidates (Callable[[KnowledgeBase, Span], Iterable[Candidate]]): Function that
produces a list of candidates, given a certain knowledge base and a textual mention.
get_candidates_batch (
Callable[[KnowledgeBase, Iterable[Span]], Iterable[Iterable[Candidate]]]
): Function that produces one list of candidates per mention, given a certain knowledge base and several textual mentions.
generate_empty_kb (Callable[[Vocab, int], KnowledgeBase]): Callable returning empty KnowledgeBase.
scorer (Optional[Callable]): The scoring method.
use_gold_ents (bool): Whether to copy entities from gold docs during training or not. If false, another
component must provide entity annotations.
candidates_batch_size (int): Size of batches for entity candidate generation.
threshold (Optional[float]): Confidence threshold for entity predictions. If confidence is below the threshold,
prediction is discarded. If None, predictions are not filtered by any threshold.
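
The `threshold` behaviour described above can be sketched in plain Python. This is a minimal illustration, not the component's code: `pick_kb_id` and its arguments are hypothetical names, candidates are reduced to ID strings, and scores are assumed to be plain floats.

```python
NIL = "NIL"  # label used when no entity is predicted


def pick_kb_id(candidate_ids, scores, threshold=None):
    """Return the best-scoring candidate ID, or NIL if it is filtered out."""
    if not candidate_ids:
        return NIL  # nothing to link to
    best = max(range(len(scores)), key=scores.__getitem__)
    # With threshold=None, no filtering is applied.
    if threshold is not None and scores[best] < threshold:
        return NIL
    return candidate_ids[best]


print(pick_kb_id(["Q1", "Q2"], [0.2, 0.7], threshold=0.9))
```

A score exactly equal to the threshold is kept in this sketch, so only predictions strictly below it are discarded.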
Úinclude_span_makerF)r(r)r*r+r,r-r0r1)
r(r)r*r+r,r-r.r/r0r1r2r3r4)ÚattrsÚgetr"ÚvocabÚ EntityLinker)r<r=r%r(r)r*r+r,r-r.r/r0r1r2r3r4s ú]C:\Users\garci\AppData\Roaming\Python\Python312\site-packages\spacy/pipeline/entity_linker.pyÚmake_entity_linkerrE+sðN ;‰;?‰?ÐÔ Ø I‰IØ Ø ØØ%Ø!5ØØô 
ð
ô Ø ‰ Ø
Ø ØØØØØô! ðóc óPtj|fdtjgi|¤ŽS)negative_labels)rÚ score_linksrCÚNIL)ÚexamplesÚkwargss rDÚentity_linker_scorerMs&Ü × Ñ ˜ ×9IÑ9IÐ8JÐ UÈfÑ UrFcótS©N)rM©rFrDÚmake_entity_linker_scorerrQsä ÐrFc$óŽeZdZdZdZ d7eeddœdedede de
e d e d
e d e d e d
e
eege
efde
ee
ege
e
efde
ee gefde dee
de de deeddf"dZde
ede
efdZde
egeffdZd8dZdddœde
ge
efdeedee
egeffdZd „Zd!ddd"œde
ed#ed$eed%eee efdee eff
d&„Zde
ed'efd(„Zd)e
e de!e fd*„Z"d)e
e d+e!e ddfd,„Z#e$«d-œd.„Z%e$«d-œd/„Z&e'«d-œd0e(e e)fd1e
e ddfd2„Z*e'«d-œd0e(e e)fd1e
e ddfd3„Z+ddd4œd5„Z,d6„Z-y)9rCz^Pipeline component for named entity linking.
DOCS: https://spacy.io/api/entitylinker
rJN)r0r1r4rBr%r=r(r)r*r+r,r-r.r/r0r1r2r3r4Úreturnc
ó 
|8d|cxkrdks-nttjjdd|¬««|_|_|_t|«_|_ |_
|_ | ‰_ |
_
d| i_td¬«_| j|«_|_|_|_|dkrttj*«dt,t.fˆ
ˆfd „ }|_y)
aUInitialize an entity linker.
vocab (Vocab): The shared vocabulary.
model (thinc.api.Model): The Thinc Model powering the pipeline component.
name (str): The component instance name, used to add entries to the
losses during training.
labels_discard (Iterable[str]): NER labels that will automatically get a "NIL" prediction.
n_sents (int): The number of neighbouring sentences to take into account.
incl_prior (bool): Whether or not to include prior probabilities from the KB in the model.
incl_context (bool): Whether or not to include the local context in the model.
entity_vector_length (int): Size of encoding vectors in the KB.
get_candidates (Callable[[KnowledgeBase, Span], Iterable[Candidate]]): Function that
produces a list of candidates, given a certain knowledge base and a textual mention.
get_candidates_batch (
Callable[[KnowledgeBase, Iterable[Span]], Iterable[Iterable[Candidate]]]
): Function that produces one list of candidates per mention, given a certain knowledge base and several textual mentions.
generate_empty_kb (Callable[[Vocab, int], KnowledgeBase]): Callable returning empty KnowledgeBase.
scorer (Optional[Callable]): The scoring method. Defaults to Scorer.score_links.
use_gold_ents (bool): Whether to copy entities from gold docs or not. If false, another
component must provide entity annotations.
candidates_batch_size (int): Size of batches for entity candidate generation.
threshold (Optional[float]): Confidence threshold for entity predictions. If confidence is below the
threshold, prediction is discarded. If None, predictions are not filtered by any threshold.
DOCS: https://spacy.io/api/entitylinker#init
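
The bytes that follow the docstring reference a range check on `threshold` (with `range_start`, `range_end` and `value`) and a lower bound on `candidates_batch_size`. A hedged sketch of equivalent argument validation is below; the function name `validate_linker_args` and the error messages are assumptions, since the real messages live in the library's error catalogue and are not visible here.

```python
def validate_linker_args(threshold, candidates_batch_size):
    """Reject out-of-range settings before the component is built."""
    # threshold may be None (no filtering) or a value in [0, 1].
    if threshold is not None and not (0 <= threshold <= 1):
        raise ValueError(f"threshold must be None or within [0, 1], got {threshold}")
    # At least one mention must fit in each candidate-generation batch.
    if candidates_batch_size < 1:
        raise ValueError("candidates_batch_size must be at least 1")


validate_linker_args(threshold=0.5, candidates_batch_size=4)  # passes silently
```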
Nrr!)Ú range_startÚ range_endÚvaluer0F)Ú normalizerKcóÐsSjs |fi|¤ŽSj|«}jd|D««}t||«D] \}}||_Œ|fi|¤ŽS)Nc3ó4K|]}|jŒy­wrO)Ú predicted)Ú.0Úegs rDú <genexpr>zFEntityLinker.__init__.<locals>._score_with_ents_set.<locals>.<genexpr>þsèø€Ð5©H bR—\•\©Hùs)r2Ú _ensure_entsÚpipeÚzipr[)rKrLÚdocsr]Údocr1Úselfs €€rDÚ_score_with_ents_setz3EntityLinker.__init__.<locals>._score_with_ents_setôszø€ñØ
Ø×˜1¨&Ñ×,¨XÓ6Ø—y‘yÙ5©HÓô # 8¨TÖ2GB˜Ø#&B•Lð˜1¨&Ñ1rF)Ú
ValueErrorrÚE1043ÚformatrBr%r=Úlistr(r)r*r+r-r.Úcfgr
ÚdistanceÚkbr2r3r4ÚE1044rrr1)rdrBr%r=r(r)r*r+r,r-r.r/r0r1r2r3r4res` ` rDÚ__init__zEntityLinker.__init__¦sù€ðb Ð ¨!¨yÔ*=¸AÔ*=ÜÜ ×#Ø !ØØóð
ðˆŒ
؈Œ
؈Œ Ü" ÔàˆŒ ØŒØÔØÔØ$8ˆÔ!Ø$/°Ð#;ˆŒÜÔ7ˆŒ
Ù# D§J¡JÐ0DÓŒØ*ˆÔØ%:ˆÔ"ˆŒà   œVŸ\™\Ó  2¬8´GÑ+<ö+ˆ rFrKcó¾|js|Sg}|D]G}|j«\}}|j«}||j_|j |«ŒI|S)zLIf use_gold_ents is true, set the gold entities to (a copy of) eg.predicted.)r2Úget_aligned_ents_and_nerÚcopyr[ÚentsÚappend)rdrKÚ new_examplesr]rrÚnew_egs rDr_zEntityLinker._ensure_entss`à׈Oàˆ ÛˆBØ×3‰GˆD—W‘W“YˆFØ$(ˆF× Ñ Ô × Ñ  Õ ð
ÐrFÚ kb_loadercó¬t|«s2ttjj t |«¬««||j «|_y)ziDefine the KB of this pipe by providing a function that will
create it using this object's vocab.)Úarg_typeN)ÚcallablerfrÚE885rhÚtyperBrl)rdrws rDÚset_kbzEntityLinker.set_kbs=ô˜ ÔœVŸ[™[׸i»Ð ˜DŸJ™JÓ'ˆrFcóJ|j€3ttjj |j
¬««t
|jd«rN|jj«r3ttjj |j
¬««yy)r=Úis_empty) rlrfrÚE1018rhr=Úhasattrr€ÚE139©rds rDÚ validate_kbzEntityLinker.validate_kbsrà 7‰7ˆœVŸ\™\×0°d·i±iÐ 4—77˜ '¨D¯G©G×,<Ñ,<Ô,>ÜœVŸ[™[×/°T·Y±YÐ -?Ð 'rF)r<rwÚ get_examplesr<có˜t|d«||j|«|j«|jj}g}g}|j t
|«d««}|D]S}|j} |j| «|j|jjj|««ŒUt|«dkDs/Jtjj|j ¬««t|«dkDs/Jtjj|j ¬««t#|D cgc]} | j$Œc} «}
|
s|d} | dd} d| _| f| _|jj)||jjj+|d¬ «¬
«|
sg _yycc} w) aInitialize the pipe for training, using a representative set
of data examples.
get_examples (Callable[[], Iterable[Example]]): Function that
returns a representative sample of gold-standard Example objects.
nlp (Language): The current nlp object the component is part of.
kb_loader (Callable[[Vocab], KnowledgeBase]): A function that creates a KnowledgeBase from a Vocab
instance. Note that providing this argument will overwrite all data accumulated in the current KB.
Use this only when loading a KB as-such from file.
DOCS: https://spacy.io/api/entitylinker#initialize
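
To make the `kb_loader` contract concrete, here is a small sketch using stand-in classes. `ToyKB`, `ToyLinker` and `make_loader` are hypothetical and exist only for this example; the one behaviour taken from the docstrings is that the loader is a callable that receives the component's vocab and returns the knowledge base, replacing whatever was there before.

```python
class ToyKB:
    """Stand-in for a knowledge base (hypothetical)."""

    def __init__(self, vocab, entity_vector_length):
        self.vocab = vocab
        self.entity_vector_length = entity_vector_length


class ToyLinker:
    """Stand-in for the component, showing only the KB hand-off."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.kb = None

    def set_kb(self, kb_loader):
        if not callable(kb_loader):
            raise ValueError(f"kb_loader must be callable, got {type(kb_loader)}")
        # The loader builds the KB from this component's own vocab.
        self.kb = kb_loader(self.vocab)


def make_loader(entity_vector_length):
    """Return a loader closure, the shape expected for kb_loader."""
    def load(vocab):
        return ToyKB(vocab, entity_vector_length)
    return load


linker = ToyLinker(vocab={"shared": "vocab"})
linker.set_kb(make_loader(entity_vector_length=64))
```

Passing a closure rather than a ready-made KB keeps the KB tied to the same vocab object the pipeline uses.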
zEntityLinker.initializeNé
rrr!ÚXXXÚfloat32©Údtype)ÚY)rr}r…rlr,r_rÚxrsr%ÚopsÚalloc1fÚlenrÚE923rhr=ÚanyrrÚlabel_Ú
initializeÚasarray) rdr†r<rwÚnOÚ
doc_sampleÚ
vector_samplerKr]rcÚhas_annotationsÚents rDrzEntityLinker.initialize"ô& ˜lÐ,EÔ Ð Ø K‰K˜ Ô  ×ÑÔØ
W‰W×
)ˆØˆ
؈
Ø×$¤V©L«N¸BÓ%?ÓÛˆBØ—$‘$ˆCØ × Ñ ˜cÔ × Ñ  §¡§¡×!7Ñ!7¸Ó!;Õ ô: ÒF¤F§K¡K×$6Ñ$6¸D¿I¹IÐ$6Ó$FÓ! I¤v§{¡{×'9Ñ'9¸t¿y¹yÐ'9Ó'IÓ±:Ó>±:¨C˜sŸxx°:ÑÙØ˜Q‘-ˆCØ(ˆC؈CŒJØvˆCŒHà
×ÑØ˜DŸJ™JŸN™N×2°=È Ð ô
ñàˆCùò?sÅGcóœ|D]G}|jjD],}t|j|j|««}|sŒ+yŒIy)z€Check if a batch contains a learnable example.
If one isn't present, then the update step needs to be skipped.
TF)r[rrrir-rl)rdrKr]Ú
candidatess rDÚbatch_has_learnable_examplez(EntityLinker.batch_has_learnable_exampleXsHó ˆ—||×(Ü! $×"5Ñ"5°d·g±g¸sÓ"CÓD
ÚÚñð rFç)ÚdropÚsgdÚlossesr¡có|j«|i}|j|jd«|s|S|j|«}t |d«|j |«s|St
|j|«|Dcgc]}|jŒ}}|jj|«\}}|j||¬«\} }
||
«||j|«||jxx| z
cc<|Scc}w)a.Learn from a batch of documents and gold-standard information,
updating the pipe's model. Delegates to predict and get_loss.
examples (Iterable[Example]): A batch of Example objects.
drop (float): The dropout rate.
sgd (thinc.api.Optimizer): The optimizer.
losses (Dict[str, float]): Optional record of the loss during training.
Updated using the component name as the key.
RETURNS (Dict[str, float]): The updated losses dictionary.
DOCS: https://spacy.io/api/entitylinker#update
r zEntityLinker.update)Úsentence_encodingsrK) r…Ú
setdefaultr=r_rrr%r[Ú begin_updateÚget_lossÚ
finish_update) rdrKr]rbÚ
bp_contextÚlossÚd_scoress rDÚupdatezEntityLinker.updatefð(
×ÑÔØ ˆ>؈FØ×ј$Ÿ)™) SÔØˆMØ×$ XÓܘ(Ð$9Ô×ÔˆM䘟 *Ù'/Ó0¡x   xˆÐ0Ø)-¯©×)@Ñ)@ÀÓ)FјŸØ1¸
ˆˆhñ Ø ˆ?Ø × Ñ ˜sÔ ˆty‰yÓ˜TÑàˆ
ùò1sÁ?C<có|t|d«g}d}g}|D]}}|jdd¬«}|j«D]U}||j} | r=|jj | «}
|j
|
«|j
|«|dz
}ŒWŒ|jjj|d¬«}||} |s1|jjj|jŽ} d| fS| j|jk7r,tjjd d
¬ «}
t|
«|j j#| |«}|jjj|jŽ} || |<|j j%| |«}|t'|«z }t)|«| fS) NzEntityLinker.get_lossrÚ ENT_KB_IDT)Ú as_stringr!rzgold entities do not match up©ÚmethodÚmsg)rÚ get_alignedÚget_matching_entsÚstartrlÚ
get_vectorrsr%rÚ asarray2fÚalloc2fÚshaperÚE147rhÚ RuntimeErrorrkÚget_gradr¨rÚfloat)rdrKÚentity_encodingsÚeidxÚ keep_entsr]Úkb_idsrœÚkb_idÚentity_encodingÚselected_encodingsÚoutÚerrÚ gradientsr«s rDzEntityLinker.get_loss•ܘ(Ð$;ÔÐàˆØˆ ãˆBØ—^‘^ K¸4@ˆFà×-ؘsŸy™yÑ)ÙØ&*§g¡g×&8Ñ&8¸Ó&?$×+¨OÔ×$ TÔ˜‘ ‘ñð Ÿ:™:Ÿ>™>×3Ð4DÈIÐØ/° ÑñØ($—**—..×(Ð*<×*BÑ*BÐCˆc6ˆ × #Ð'7×'=Ñ'=Ò —++×!Ð'Fðˆ˜sÓ —M‘M×*Ð+=Ð?OÓPˆ à$ˆdj‰jn‰n×$Ð&8×&>Ñ&>Ð?ˆØ"ˆˆI‰à}‰}×%Ð&8Ð:JÓKˆØ”cÐÜT{˜ÐrFrbc ó |j«d}g}|jjj}|s|St |t
«r|g}t
|«D\}}t|«dk(rŒ|jDcgc]}|Œ}}tdt|j«|j«D]q} |j| | |jz}
tt|
««D cgc]} |
| j|jvr| Œ!} } t|jdkDr-|j|j | D cgc]} |
| Œ c} «n,| D cgc]!} |j#|j |
| «Œ#c} «}
t
|
«D]•\}}t%|d«sJt|j«}|j'|d«|j'|d«f}|d|dcxk\rdk\sJJ|j(r»t+d|d|j,z
«}t/t|«dz
|d|j,z«}||j0}||j2}|||j5«}|jj7|g«d}|j8}|j:j=|«}|dz
}|j|jvr|j?|j@«Œit|
|«}|s|j?|j@«Œ–t|«dk(r,|jB€ |j?|djD«ŒÐtGjH|«|jK|Dcgc]}|jLŒc}«}|jNs|jK|Dcgc]}dŒc}«}|}|j(rÜ|jK|Dcgc]}|jPŒc}«}|j:j=|d¬«} t|«t|«k7r*tStTjVjYdd¬ ««|j[|«| zz }!|!j\|j\k7rt_tTj`«||!z||!zz
}|j?|jB|j+«|jBk\r+||jc«je«jDntfj@«Œ˜ŒtŒÐt|«|k(s,tTjVjYdd
¬ «}"tS|"«|Scc}wcc} wcc} wcc} wcc}wcc}wcc}w) apApply the pipeline's model to a batch of docs, without modifying them.
Returns the KB IDs for each entity in each doc, including NIL if there is
no prediction.
docs (Iterable[Doc]): The documents to predict.
RETURNS (List[str]): The model's predicted KB ID for each entity in each document.
DOCS: https://spacy.io/api/entitylinker#predict
rr!Úsentséÿÿÿÿr )ÚaxisÚpredictzvectors not of equal lengthr±z$result variables not of equal length)4r…r%rÚxpÚ
isinstancerÚ enumeraterÚrangerrr3r•r(rir.rlr-rÚindexr+Úmaxr)Úminr¶ÚendÚas_docrÍÚlinalgÚnormrsrJr4Úentity_ÚrandomÚshuffler—Ú
prior_probr*Ú
entity_vectorr¼rrhÚdotrºrfÚE161ÚargmaxÚitemrC)#rdrbÚ entity_countÚ final_kb_idsrÎÚircÚsÚ sentencesÚent_idxÚ ent_batchÚidxÚ
valid_ent_idxÚbatch_candidatesÚjrœÚ sent_indicesÚstart_sentenceÚ end_sentenceÚ start_tokenÚ end_tokenÚsent_docÚsentence_encodingÚsentence_encoding_tÚ
sentence_normržÚ prior_probsruÚscoresr¿Ú entity_normÚsimsrÇs# rDzEntityLinker.predict½s#ð
×ÑÔØˆ Ø"$ˆ Ø
Z‰Z^‰^×
Ñ
ˆÙØÐ Ü dœCÔ Ø6ˆDÜ —o‰FˆAˆsÜ3x˜1Š}ØØ$'§I¢IÓ.¡I˜ Iˆ¤C¨¯©£M°4×3MÑ3M×NØŸH™H W¨w¸×9SÑ9SÑ/SÐT ô
%¤S¨£^Ô!áØ  ‘~×,°D×4GÑ4GÑØð!ô $(ð×1°AÒן¹MÓ!J¹M°S )¨C£.¸MÑ!Jôñ $1óá#0˜Cð×+¨D¯G©G°Y¸s±^ÕDØ#0ñó $Ð ô×2FA" Ô  §¡›O¨¨a©Ó¨¨b© Ó$™?¨l¸1©oÔÒ×(ä),¨Q° ¸Q±À$Ç,Á,Ñ0NÓ)O˜Ü'*Ü  N¨QÑ ¸À$Ç,Á,Ñ0Nó(˜ ð'0°Ñ&?×&EÑ&E˜ Ø$-¨lÑ$;×$?Ñ$?˜ Ø#& {°9Ð#=×#DÑ#DÓ#F˜ð-1¯J©J×,>Ñ,>À¸zÓ,JÈ1Ñ,MÐ)Ø.?×.AÑ.AÐ+Ø(*¯ © ¯©Ð7JÓ(K˜
Ø  %—zz T×%8Ñ%8Ñ$×+¨D¯H©HÖ5ä%)Ð*:¸1Ñ*=Ó%>˜
Ù·±Ö  ›_°Ò1°d·n±nÐ6Là
¸
×0EÑ0EÖ"ŸN™N¨:Ô6à*,¯*©*ÉJÓ5WÉJÀq°a·l³lÈJÑ5WÓ*X˜KØ#'§?¢?Ø.0¯j©jÁzÓ9RÁzÀ!º#ÀzÑ9RÓ.S  Ø%0˜#×0Ø35·:±:Ù>HÓ$I¹j¸ Q§_£_¸jÑ$Ió4"Ð 0ð/1¯i©i¯n©nÐ=MÐTU¨nÓ.V  Ü#&Ð'7Ó#8¼CÀ Ó<LÒ#LÜ*6Ü(.¯ © ×(:Ñ(:Ø3<Ø0Mð);ó)*ó+&ð%&ð(*§v¡vÐ.>Ð@SÓ'TØ$1°KÑ$?ñ(" ð$(§:¡:°×1BÑ1BÒ#BÜ*4´V·[±[Ó*AÐ$AØ)4°tÑ);¸{ÈTÑ?QÑ)R Ø(×/à#'§>¡>Ð#9Ø#)§:¡:£<°4·>±>Ò#Að!+¨6¯=©=«?×+?Ñ+?Ó+AÑ B× JÒ Jô&2×%5Ñ%5ö ò}3ò-Oð
&ôB! —++× Ð&LðˆCô˜sÓ ÐùòG/ùò!ùò"KùòùòZ6Xùâ9Rùò
%Js*Á; U$Ã&$U)Ä: U.Å&U3 Í=U8Î2 U=ÏVcót|Dcgc]}|jD]}|ŒŒc}}«}|t|«k7r3ttjj |t|«¬««d}|j d}|D]=}|jD],}||}|dz
}|D]} | jdk(s|sŒ|| _ŒŒ.Œ?ycc}}w)aModify a batch of documents, using pre-computed scores.
docs (Iterable[Doc]): The documents to modify.
kb_ids (List[str]): The IDs to set, produced by EntityLinker.predict.
DOCS: https://spacy.io/api/entitylinker#set_annotations
)rrÚidsrr0r!N) rrrrfrÚE148rhrjÚ ent_kb_idÚ
ent_kb_id_)
rdrbrcÚ
count_entsrår0Útokens
rDÚset_annotationszEntityLinker.set_annotations7ô©Ô #¸¿¼°#š#¸˜#¨Ò