
SOLVED: Bigrams not generated while using the vocabulary parameter in CountVectorizer

ashok eapen:

I am trying to generate bigrams using CountVectorizer and attach them back to the DataFrame. However, it gives me only unigrams as output. I want to create bigrams only if specific keywords are present, and I am passing those keywords through the vocabulary parameter.

Input data


Id Name
1 Industrial Floor chenidsd 34
2 Industrial Floor room 345
3 Central District 46
4 Central Industrial District Bay
5 Chinese District Bay
6 Bay Chinese xrty
7 Industrial Floor chenidsd 34
8 Industrial Floor room 345
9 Central District 46
10 Central Industrial District Bay
11 Chinese District Bay
12 Bay Chinese dffefef
13 Industrial Floor chenidsd 34
14 Industrial Floor room 345
15 Central District 46
16 Central Industrial District Bay
17 Chinese District Bay
18 Bay Chinese grty

Text cleaning (NLTK)


import string
import nltk
# Stop-word list; run nltk.download('stopwords') once beforehand
new_stop_words = set(nltk.corpus.stopwords.words('english'))
# Nata is the DataFrame above: lowercase, then drop digits, punctuation and stop words
Nata['Clean_Name'] = Nata['Name'].apply(lambda x: ' '.join(item.lower() for item in x.split()))
Nata['Clean_Name'] = Nata['Clean_Name'].apply(lambda x: ''.join(ch for ch in x if not ch.isdigit()))
Nata['Clean_Name'] = Nata['Clean_Name'].apply(lambda x: ''.join(ch for ch in x if ch not in string.punctuation))
Nata['Clean_Name'] = Nata['Clean_Name'].apply(lambda x: ' '.join(w for w in x.split() if w not in new_stop_words))
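The four apply passes can be folded into one function. This is a sketch on a few of the sample rows, assuming a small hypothetical STOP_WORDS set in place of NLTK's list so it runs without downloading the corpus:

```python
import string

import pandas as pd

# A few rows from the sample input above
Nata = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["Industrial Floor chenidsd 34", "Industrial Floor room 345",
             "Central District 46", "Central Industrial District Bay",
             "Chinese District Bay", "Bay Chinese xrty"],
})

# Hypothetical stand-in for nltk.corpus.stopwords.words('english')
STOP_WORDS = {"the", "of", "and", "in"}


def clean(text: str) -> str:
    """Lowercase, drop digit and punctuation characters, then drop stop words."""
    text = text.lower()
    text = "".join(ch for ch in text if not ch.isdigit() and ch not in string.punctuation)
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


Nata["Clean_Name"] = Nata["Name"].map(clean)
print(Nata["Clean_Name"].tolist())
```

Like the original passes, punctuation is deleted without inserting a space, so "o'brien" would become "obrien".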

Vocabulary definition


english_corpus=['bay','central','chinese','district', 'floor','industrial','room']
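When a vocabulary is supplied, CountVectorizer only counts terms that appear as entries in it, and those entries are compared against the terms its analyzer emits. A quick check with build_analyzer() on one of the cleaned sample names shows what ngram_range=(2, 2) actually produces:

```python
from sklearn.feature_extraction.text import CountVectorizer

english_corpus = ['bay', 'central', 'chinese', 'district', 'floor', 'industrial', 'room']

# build_analyzer() returns the callable CountVectorizer uses to turn a document
# into terms; with ngram_range=(2, 2) it emits only bigrams.
analyzer = CountVectorizer(analyzer='word', ngram_range=(2, 2)).build_analyzer()
bigrams = analyzer("central industrial district bay")
print(bigrams)  # ['central industrial', 'industrial district', 'district bay']

# None of these bigrams is an entry of the unigram-only vocabulary
print(set(bigrams) & set(english_corpus))  # set()
```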

Bigram Generator


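from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
cv = CountVectorizer(max_features=200, analyzer='word', vocabulary=english_corpus, ngram_range=(2, 2))
cv_addr = cv.fit_transform(Nata.pop('Clean_Name'))
# get_feature_names_out() replaces get_feature_names() (removed in scikit-learn 1.2);
# pd.arrays.SparseArray replaces pd.SparseSeries (removed in pandas 1.0)
for i, col in enumerate(cv.get_feature_names_out()):
    Nata[col] = pd.arrays.SparseArray(cv_addr[:, i].toarray().ravel(), fill_value=0)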

However, it gives me only unigrams as output. How can I fix this?

Output


In[26]:Nata.columns.tolist()
Out[26]:

['Id',
'Name',
'bay',
'central',
'chinese',
'district',
'floor',
'industrial',
'room']



via StackOverflow & StackExchange Atomic Web Robots
This question has been answered here.


This post first appeared on Stack Solved; please read the original post: here
