4 ways to encode categorical features with high cardinality

By Aicha Bokbot, in Towards Data Science
Data scientist, building machine learning models at scale. www.linkedin.com/in/aichabokbot/

In this article, we will go through 4 popular methods to encode categorical variables with high cardinality: (1) Target encoding, (2) Count encoding, (3) Feature hashing and (4) Embedding.

We will explain how each method works, discuss its pros and cons, and observe its impact on the performance of a classification task.

- Introducing categorical features
  (1) Why do we need to encode categorical features?
  (2) Why is one-hot encoding not suited to high cardinality?
- Application on an AdTech dataset
- Overview of each encoding method
  (1) Target encoding
  (2) Count encoding
  (3) Feature hashing
  (4) Embedding
- Benchmarking the performance to predict CTR
- Conclusion
- To go further

Categorical features are variables that describe categories or groups (e.g. gender, color, country), as opposed to numerical features, which measure a quantity (e.g. age, height, temperature).

There are two types of categorical data: ordinal features, whose categories can be ranked and sorted (e.g. T-shirt sizes or restaurant ratings from 1 to 5 stars), and nominal features, whose categories don't imply any meaningful order (e.g. the name of a person or of a city).

Encoding a categorical variable means finding a mapping that converts a category to a numerical value.

While some algorithms can work with categorical data directly (like decision trees), most machine learning models cannot handle categorical features and were designed to operate with…
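
To make the idea of "a mapping that converts a category to a numerical value" concrete, here is a minimal sketch of three of the methods named above (count encoding, target encoding and feature hashing), written with the Python standard library only. This excerpt does not include the article's own implementation, so everything here is an assumption for illustration: the toy `sites` and `clicks` data, the function names and the `n_buckets` parameter are all hypothetical. The target encoder is the naive per-category mean; real pipelines usually add smoothing or compute it out-of-fold to limit target leakage. Embeddings are left out because they require a trained model.

```python
from collections import Counter, defaultdict
import hashlib

# Hypothetical toy data: a categorical "site" column and a binary click label.
sites = ["a.com", "b.com", "a.com", "c.com", "a.com", "b.com"]
clicks = [1, 0, 0, 1, 1, 0]


def count_encode(values):
    """Replace each category by the number of times it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


def target_encode(values, targets):
    """Replace each category by the mean target observed for it (naive, no smoothing)."""
    counts = Counter(values)
    sums = defaultdict(float)
    for v, t in zip(values, targets):
        sums[v] += t
    return [sums[v] / counts[v] for v in values]


def hash_encode(values, n_buckets=8):
    """Map each category to one of n_buckets indices with a deterministic hash."""
    return [
        int(hashlib.md5(v.encode("utf-8")).hexdigest(), 16) % n_buckets
        for v in values
    ]


print(count_encode(sites))           # [3, 2, 3, 1, 3, 2]
print(target_encode(sites, clicks))  # a.com -> 2/3, b.com -> 0.0, c.com -> 1.0
print(hash_encode(sites))            # bucket indices in [0, 8); equal categories share a bucket
```

In each case the output has one number (or bucket index) per row regardless of how many distinct categories exist, which is what makes these mappings attractive when cardinality is high.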



This post first appeared on VedVyas Articles, please read the original post: here
