INTUIA/Programa final/spacy/training/__pycache__/iob_utils.cpython-312.pyc
100 lines
11 KiB
Plaintext
2026-03-15 13:27:50 +00:00
Ë
?û gc$ã óÚddlZddlmZmZmZmZmZmZmZddl m
Z
m Z ddl m
Z
mZdeedeefdZdeedeefd „Zdeedeefd
Zdeedeefd Zdd e
d
efdZd e
deefdZ dd e
deeeeeeeffd
edeefdZd e
deedeefdZd e
deedeeeeeeefffdZdeedeeeeeffdZdedeeeffdZdedefdZeZeZeZy)éN)ÚDictÚIterableÚIteratorÚListÚTupleÚUnionÚcasté)ÚErrorsÚWarnings)ÚDocÚSpanÚtagsÚreturncó’g}t|«}|r7|jt|««|jt|««|rŒ7|S)N)ÚlistÚextendÚ _consume_osÚ _consume_ent)rÚouts úYC:\Users\garci\AppData\Roaming\Python\Python312\site-packages\spacy/training/iob_utils.pyÚ iob_to_biluorsAØ€CÜ ‹:€DÙ
Ø
”;˜
”< Ó ð €Jócóªg}|D]K}||j|«Œ|jddd«jddd«}|j|«ŒM|S)U-úB-éúL-úI-)ÚappendÚreplace)rrÚtags rÚ biluo_to_iobr#sTØ
€CÛˆØ ˆ;Ø J‰Js—++˜d D¨!Ó4°T¸ÓCˆCØ J‰Js ð €Jrc#óbK|r)|ddk(r |jd«|r
|ddk(rŒyyyy­w)NrÚO)Úpop)rs rrrs6èø€Ù
4˜7˜ch‰hqñ 4˜7˜c”>ˆ$>ˆ$ùs(/«/có¨|sgS|jd«}d|ddz}d|ddz}d}|r+|d||hvr"|dz
}|jd«|r
|d||hvrŒ"|dd}|dk(r=t|«dk(r)ttjj |¬««d|zgSd|z}d |z}t
d|dz
«Dcgc]}d
|Œ } }|g| z|gzScc}w) NrÚIrÚLr
©r"rrrr)r&ÚlenÚ
ValueErrorr ÚE177ÚformatÚrange)
rr"Ú target_inÚ target_lastÚlengthÚlabelÚstartÚendÚmiddles
rrr!s٠؈ Ø
(‰(1+€CØc˜!˜"g
€Iؘ˜A˜B˜-€KØ
€FÙ
‘7˜y¨+Ð!‰ ˆØ Œ ñ ‘7˜y¨+Ð
ˆG€EØ
‚{Ü ˆu‹:˜Š?ÜœVŸ[™[×/°CÐ u‘ ˆ~Ðàu‘ ˆØU‰lˆÜ(-¨a°¸Ô(<Ó=Ñ(< 1Bug,Ð(<ˆÐˆw˜Ñ 3 'ùò>sÂ7 CÚdocÚmissingc óšt||jDcgc]%}|j|j|jfŒ'c}|¬«Scc}w)r9)Úoffsets_to_biluo_tagsÚentsÚ
start_charÚend_charÚlabel_)r8r9Úents rÚdoc_to_biluo_tagsrB7sAÜ Ø Ø?B¿xºxÓH¹x¸ˆ#.‰.˜#Ÿ,™,¨¯
©
Ò 3¸xÑô ðùâHs•*A
cópt|d¬«}t|«D]\}}|jdk(sŒd||<Œ|S)-r;r
r%)rBÚ enumerateÚent_iob)r8r=Útokens rÚ_doc_to_biluo_tags_with_partialrI?s<Ü ˜S¨#Ô .€Dܘc–N‰ˆˆ5Ø =‰=˜AÓ ØˆDŠGð €KrÚentitiesc
ó
i}|Dcic]}|j|jŒ}}|Dcic]%}|jt|«z|jŒ'}}|Dcgc]}dŒ}}|D\} }
} | s|D]} | | k\sŒ | |
ksŒd||| <ŒŒ%t| |
«D]^}
|
|j «vrBt t jj||
d||
d||
df| |
| f¬««| |
| f||
<Œ`|j| «}|j|
«}|Œ·|Œº||k(r d| ||<ŒÈd | ||<t|dz|«D]
}d
| ||<Œ d | ||<Œõt«}|D](\} }
} t| |
«D]}|j|«ŒŒ*|D]H}t|j|jt|«z«D]}||vsŒŒ9|||j<ŒJd|vrŽ|dk7r‰t|«}tjtj jt|j"«d kDr|j"dd d
zn |j"t|«d kDr|dd d
zn|¬««|Scc}wcc}wcc}w)u¸Encode labelled spans into per-token tags, using the
Begin/In/Last/Unit/Out scheme (BILUO).
doc (Doc): The document that the entity offsets refer to. The output tags
will refer to the token boundaries within the document.
entities (iterable): A sequence of `(start, end, label)` triples. `start`
and `end` should be character-offset integers denoting the slice into
the original string.
missing (str): The label used for missing values, e.g. if tokenization
doesn't align with the entity offsets. Defaults to "O".
RETURNS (list): A list of unicode strings, describing the tags. Each tag
string will be of the form either "", "O" or "{action}-{label}", where
action is one of "B", "I", "L", "U". The missing label is used where the
entity offsets don't align with the tokenization in the `Doc` object.
The training algorithm will view these as missing values. "O" denotes a
non-entity token. "B" denotes the beginning of a multi-token entity,
"I" the inside of an entity of three or more tokens, and "L" the end
of an entity of two or more tokens. "U" denotes a single-token entity.
EXAMPLE:
>>> text = 'I like London.'
>>> entities = [(len('I like '), len('I like London'), 'LOC')]
>>> doc = nlp.tokenizer(text)
>>> tags = offsets_to_biluo_tags(doc, entities)
>>> assert tags == ["O", "O", 'U-LOC', "O"]
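The tagging scheme described in the docstring above can be illustrated without spaCy. The following is a minimal sketch, not the library's implementation: it assumes the entity spans are already given as token indices (with an exclusive end) rather than character offsets, and the helper names `token_spans_to_biluo` and `biluo_to_iob_sketch` are hypothetical names chosen for this example.

```python
from typing import List, Tuple


def token_spans_to_biluo(
    n_tokens: int,
    spans: List[Tuple[int, int, str]],
    missing: str = "O",
) -> List[str]:
    """Hypothetical helper: build BILUO tags from token-index spans.

    Each span is (start_token, end_token, label), with end_token exclusive.
    This is an assumption for illustration; the real function works from
    character offsets and a Doc object.
    """
    tags = [missing] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            # Single-token entity: Unit tag.
            tags[start] = f"U-{label}"
        else:
            # Multi-token entity: Begin, zero or more In, then Last.
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


def biluo_to_iob_sketch(tags: List[str]) -> List[str]:
    """Hypothetical helper: collapse BILUO to IOB (U- becomes B-, L- becomes I-)."""
    out = []
    for tag in tags:
        if tag.startswith("U-"):
            out.append("B-" + tag[2:])
        elif tag.startswith("L-"):
            out.append("I-" + tag[2:])
        else:
            out.append(tag)
    return out


# "I like London ." has 4 tokens; "London" is token 2.
single = token_spans_to_biluo(4, [(2, 3, "LOC")])

# "I flew to New York City ." has 7 tokens; "New York City" is tokens 3..5.
multi = token_spans_to_biluo(7, [(3, 6, "GPE")])
multi_iob = biluo_to_iob_sketch(multi)
```

Under these assumptions, `single` matches the tags in the docstring example, and `multi` shows the "B", "I" and "L" actions on a three-token entity.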
rDr%rrr
)Úspan1Úspan2Nrrrré2z...)ÚtextrJ)ÚidxrGr+r/Úkeysr,r ÚE103r.ÚgetÚsetÚaddÚstrÚwarningsÚwarnr ÚW030rO)r8rJr9Útokens_in_entsrHÚstartsÚendsr6Úbiluor>r?r3Ú token_indexÚ start_tokenÚ end_tokenrGÚ entity_charsÚent_strs rr<r<Gð<CE€NÙ.1Ó
2©c Uˆei‰i˜ŸÑ ¨c€FÐ
2Ù9<Ó °ˆEI‰Iœ˜E
Ñ " E§G¡GÑ +¸€DÐ Ó ™#QŠS˜#€EÐ ã'/Ñ#ˆ
H˜Ûؘ
“? q¨8£|Ø'*E˜& ™)Òô % Ö: Ø .×"5Ñ"5Ó"7ÑŸ ×*à .¨{Ñ ;¸AÑ >Ø .¨{Ñ ;¸AÑ >Ø .¨{Ñ ;¸AÑ >ð#ð
$.¨x¸Ð"?ð
ó ð ð0:¸8ÀUÐ.K˜!Ÿ*™* ZÓ0ˆKØŸ Ó*ˆÑ&¨9Ñ+@Ø +Ø+-¨e¨W¨E˜&à+-¨e¨W¨E˜" ¡?°IÖ>˜Ø%'¨ w <˜˜ð?à)+¨E¨7 |E˜$ð;(0ô>“5€LÛ'/Ñ#ˆ
H˜eÜz 8Ö,ˆAØ × Ñ ˜QÕ ñ(0óˆÜu—yy %§)¡)¬c°%«jÑ"8Ö9ˆ Ùð%ˆE%—''ŠNð ð  ˆe|˜ 3šÜh“-ˆÜ
Ü M‰M× Ñ Ü.1°#·(±(«m¸bÒ.@S—XX˜c˜r] *ÀcÇhÁhÜ14°W³ÀÒ1B˜  "˜¨Òð

ô