W.P. McNeill:
I want to train a TextCategorizer model with the following (text, label) pairs:
Label COLOR:
- The door is brown.
- The barn is red.
- The flower is yellow.
Label ANIMAL:
- The horse is running.
- The fish is jumping.
- The chicken is asleep.
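One way to hold these examples in code is to give every text a dict that maps each label to a score. The sketch below assumes that one-hot per-document `cats` dict format (1.0 for the text's own label, 0.0 for the others). `TRAIN_BY_LABEL` and `to_cats_pairs` are hypothetical names used only for illustration.

```python
# Hypothetical container: label -> example texts, copied from the list above.
TRAIN_BY_LABEL = {
    "COLOR": ["The door is brown.", "The barn is red.", "The flower is yellow."],
    "ANIMAL": ["The horse is running.", "The fish is jumping.", "The chicken is asleep."],
}


def to_cats_pairs(by_label):
    """Hypothetical helper: turn {label: [texts]} into (text, cats) pairs."""
    labels = list(by_label)
    pairs = []
    for label, texts in by_label.items():
        for text in texts:
            # One-hot dict: 1.0 for this text's label, 0.0 for every other label.
            cats = {other: 1.0 if other == label else 0.0 for other in labels}
            pairs.append((text, cats))
    return pairs


TRAIN_DATA = to_cats_pairs(TRAIN_BY_LABEL)
for text, cats in TRAIN_DATA:
    print(text, cats)
```

In spaCy v2, a dict of this shape is what the `cats` argument of `GoldParse(doc, cats=...)` expects. The training API changed in later versions, so check the documentation for the version you have installed.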
I am copying the example code from the TextCategorizer documentation:
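import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.blank("en")  # assumption: any loaded pipeline works here
textcat = TextCategorizer(nlp.vocab)
losses = {}
optimizer = nlp.begin_training()
# doc1, doc2, gold1, gold2 are placeholders from the docs example
textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)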
The doc variables will presumably just be nlp("The door is brown.") and so on. What should gold1 and gold2 be? I'm guessing they should be GoldParse objects, but I don't see how to represent text categorization information in them.