Files
INTUIA/Programa final/pt_core_news_sm/README.md
T

53 lines
33 KiB
Markdown
Raw Normal View History

2026-03-15 13:27:50 +00:00
### Details: https://spacy.io/models/pt#pt_core_news_sm
Portuguese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner, attribute_ruler.
| Feature | Description |
| --- | --- |
| **Name** | `pt_core_news_sm` |
| **Version** | `3.8.0` |
| **spaCy** | `>=3.8.0,<3.9.0` |
| **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
| **Components** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `senter`, `attribute_ruler`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | [UD Portuguese Bosque v2.8](https://github.com/UniversalDependencies/UD_Portuguese-Bosque) (Rademaker, Alexandre; Freitas, Cláudia; de Souza, Elvis; Silveira, Aline; Cavalcanti, Tatiana; Evelyn, Wograine; Rocha, Luisa; Soares-Bastos, Isabela; Bick, Eckhard; Chalub, Fabricio; Paulino-Passos, Guilherme; Real, Livy; de Paiva, Valeria; Zeman, Daniel; Popel, Martin; Mareček, David; Silveira, Natalia; Martins, André)<br />[WikiNER](https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500) (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran) |
| **License** | `CC BY-SA 4.0` |
| **Author** | [Explosion](https://explosion.ai) |
### Label Scheme
<details>
<summary>View label scheme (590 labels for 3 components)</summary>
| Component | Labels |
| --- | --- |
| **`morphologizer`** | `Definite=Ind\|Gender=Masc\|Number=Sing\|POS=DET\|PronType=Art`, `Gender=Masc\|Number=Sing\|POS=NOUN`, `Gender=Masc\|Number=Sing\|POS=ADJ`, `Definite=Def\|Gender=Masc\|Number=Sing\|POS=DET\|PronType=Art`, `Gender=Masc\|Number=Sing\|POS=PROPN`, `Number=Sing\|POS=PROPN`, `Mood=Ind\|Number=Sing\|POS=AUX\|Person=3\|Tense=Pres\|VerbForm=Fin`, `Gender=Masc\|Number=Plur\|POS=NOUN`, `Definite=Def\|POS=ADP\|PronType=Art`, `Gender=Fem\|Number=Sing\|POS=NOUN`, `Gender=Fem\|Number=Sing\|POS=ADJ`, `POS=PUNCT`, `NumType=Card\|POS=NUM`, `POS=ADV`, `Gender=Fem\|Number=Plur\|POS=ADJ`, `Gender=Fem\|Number=Plur\|POS=NOUN`, `Definite=Def\|Gender=Masc\|Number=Sing\|POS=ADP\|PronType=Art`, `Gender=Fem\|Number=Sing\|POS=PROPN`, `Gender=Fem\|Number=Sing\|POS=VERB\|VerbForm=Part`, `POS=ADP`, `POS=PRON\|PronType=Rel`, `Mood=Ind\|Number=Sing\|POS=VERB\|Person=3\|Tense=Pres\|VerbForm=Fin`, `POS=SCONJ`, `POS=VERB\|VerbForm=Inf`, `Definite=Def\|Gender=Masc\|Number=Plur\|POS=DET\|PronType=Art`, `Gender=Masc\|Number=Plur\|POS=ADJ`, `POS=CCONJ`, `Definite=Def\|Gender=Fem\|Number=Plur\|POS=DET\|PronType=Art`, `Definite=Def\|Gender=Fem\|Number=Sing\|POS=DET\|PronType=Art`, `Definite=Ind\|Gender=Fem\|Number=Sing\|POS=DET\|PronType=Art`, `Gender=Masc\|Number=Sing\|POS=DET\|PronType=Ind`, `Mood=Sub\|Number=Sing\|POS=AUX\|Person=3\|Tense=Pres\|VerbForm=Fin`, `Definite=Def\|Gender=Masc\|Number=Plur\|POS=ADP\|PronType=Art`, `Gender=Masc\|Number=Plur\|POS=PRON\|PronType=Rel`, `Gender=Fem\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|Number=Plur\|POS=VERB\|Person=3\|Tense=Pres\|VerbForm=Fin`, `POS=ADV\|Polarity=Neg`, `Gender=Masc\|Number=Sing\|POS=DET\|PronType=Art`, `POS=X`, `Gender=Masc\|Number=Plur\|POS=PRON\|PronType=Dem`, `Gender=Fem\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|Number=Plur\|POS=VERB\|Person=1\|Tense=Pres\|VerbForm=Fin`, `Gender=Masc\|Number=Plur\|POS=PRON\|PronType=Tot`, `Case=Acc\|Gender=Masc\|Mood=Ind\|Number=Plur\|POS=VERB\|Person=1\|PronType=Prs\|Tense=Pres\|VerbForm=Fin`, `Number=Sing\|POS=CCONJ`, `Gender=Masc\|Number=Sing\|POS=VERB\|VerbForm=Part`, `Gender=Masc\|Number=Plur\|POS=DET\|PronType=Dem`, `Case=Acc\|Gender=Masc\|Number=Plur\|POS=VERB\|Person=3\|PronType=Prs\|VerbForm=Inf`, `Gender=Masc\|Number=Sing\|POS=DET\|PronType=Dem`, `Gender=Masc\|Number=Sing\|POS=PRON\|PronType=Rel`, `Case=Acc\|Gender=Fem\|Number=Plur\|POS=VERB\|Person=3\|PronType=Prs\|VerbForm=Inf`, `Gender=Fem\|Number=Plur\|POS=PRON\|PronType=Ind`, `Gender=Masc\|Number=Plur\|POS=DET\|PronType=Prs`, `Case=Acc\|Gender=Masc\|Mood=Sub\|Number=Plur\|POS=VERB\|Person=3\|PronType=Prs\|Tense=Pres\|VerbForm=Fin`, `Number=Plur\|POS=NOUN`, `Mood=Sub\|Number=Plur\|POS=VERB\|Person=3\|Tense=Fut\|VerbForm=Fin`, `POS=AUX\|VerbForm=Inf`, `Gender=Fem\|Number=Plur\|POS=VERB\|VerbForm=Part\|Voice=Pass`, `Case=Nom\|Gender=Masc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Gender=Masc\|Number=Sing\|POS=ADP\|PronType=Dem`, `Gender=Masc\|Number=Sing\|POS=PRON\|PronType=Dem`, `POS=VERB\|VerbForm=Ger`, `Mood=Ind\|Number=Plur\|POS=AUX\|Person=3\|Tense=Pres\|VerbForm=Fin`, `Gender=Masc\|Number=Plur\|POS=VERB\|VerbForm=Part\|Voice=Pass`, `Gender=Masc\|Number=Plur\|POS=PROPN`, `Number=Plur\|POS=AUX\|Person=3\|VerbForm=Inf`, `Gender=Fem\|Number=Sing\|POS=PRON\|PronType=Dem`, `Mood=Ind\|Number=Sing\|POS=VERB\|Person=3\|Tense=Fut\|VerbForm=Fin`, `Gender=Masc\|Number=Plur\|POS=PRON\|PronType=Ind`, `Mood=Ind\|Number=Plur\|POS=VERB\|Person=3\|Tense=Past\|VerbForm=Fin`, `Definite=Def\|Gender=Masc\|Number=Sing\|POS=PRON\|PronType=Art`, `POS=VERB\|VerbForm=Part`, `Gender=Masc\|NumType=Ord\|Number=Sing\|POS=ADJ`, `Mood=Ind\|Number=Sing\|POS=VERB\|Person=3\|Tense=Past\|VerbForm=Fin`, `Gender=Fem\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Ind\|Gender=Fem\|Number=Sing\|POS=ADP\|PronType=Art`, `Gender=Fem\|Number=Sing\|POS=PRON\|PronType=Rel`, `Mood=Sub\|Number=Sing\|POS=VERB\|Person=3\|Tense=Pres\|VerbForm=Fin`, `Definite=Def\|Gender=Fem\|Number=Sing\|POS=ADP\|PronType=Art`, `Mood=Ind\|Number=Sing\|POS=AUX\|Person=3\|Tense=Past\|VerbForm=Fin`, `Case=Acc\|Ge
| **`parser`** | `ROOT`, `acl`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `aux`, `aux:pass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `cop`, `csubj`, `dep`, `det`, `discourse`, `expl`, `fixed`, `flat`, `flat:foreign`, `flat:name`, `iobj`, `mark`, `nmod`, `nsubj`, `nsubj:pass`, `nummod`, `obj`, `obl`, `obl:agent`, `parataxis`, `punct`, `xcomp` |
| **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
</details>
### Accuracy
| Type | Score |
| --- | --- |
| `SENTS_P` | 92.61 |
| `SENTS_R` | 94.58 |
| `SENTS_F` | 92.21 |
| `TAG_ACC` | 88.94 |
| `ENTS_P` | 88.13 |
| `ENTS_R` | 88.16 |
| `ENTS_F` | 88.14 |
| `TOKEN_ACC` | 100.00 |
| `TOKEN_P` | 99.88 |
| `TOKEN_R` | 99.95 |
| `TOKEN_F` | 99.92 |
| `POS_ACC` | 96.38 |
| `MORPH_ACC` | 94.96 |
| `MORPH_MICRO_P` | 97.63 |
| `MORPH_MICRO_R` | 97.08 |
| `MORPH_MICRO_F` | 97.35 |
| `DEP_UAS` | 89.04 |
| `DEP_LAS` | 84.52 |
| `LEMMA_ACC` | 96.91 |