June 15th 2023

Posted on Jun 14

This blog was originally posted on https://ziro2mach.com, my Learning "machine learning" blog.

Perhaps the only thing about machine learning that's more important than machine learning itself is data pre-processing 🙃

That's cuz, as defined before, machine learning is: the ~science~ math of taking in real-world info, converting it into numbers and then ~finding~ learning a pattern out of it. And info out in the real world brings along with it a ton of noise.

As an advocate of learning by getting your hands dirty, here's an example.

There's something called Russell's circumplex, something that helps quantify emotions. That matters cuz ML algorithms learn best when the data they work with is continuous numbers instead of traditionally encoded classification data. While class-ified data is represented by numbers, the numerical value of a class doesn't always represent the intensity of an emotion, whereas Russell's model gives you an activation and a pleasantness value that are already intensities of an emotion.

Let's say we find a dataset with the parameters we are looking for. Here the column pic represents a 3D array of red, green and blue pixel values of an image containing an emotion, and the rest are pretty straightforward.

The whole goal of training an ML model is so that we can use it to actively predict output on unseen data/situations.
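To make the class-label versus circumplex point concrete, here's a minimal sketch. The class IDs and the (pleasantness, activation) pairs below are made-up illustrative values, not taken from any real dataset, and the helper names are hypothetical.

```python
import math

# Hypothetical integer class labels: the numbers are just IDs.
CLASS_LABELS = {"happy": 0, "sad": 1, "angry": 2}

# Hypothetical (pleasantness, activation) pairs, each in [-1, 1].
# Illustrative values only, not taken from any dataset.
CIRCUMPLEX = {
    "happy": (0.8, 0.5),
    "sad": (-0.7, -0.4),
    "angry": (-0.6, 0.8),
}


def label_gap(a, b):
    """Difference between two class IDs (says nothing about intensity)."""
    return abs(CLASS_LABELS[a] - CLASS_LABELS[b])


def circumplex_gap(a, b):
    """Euclidean distance between two emotions in circumplex space."""
    return math.dist(CIRCUMPLEX[a], CIRCUMPLEX[b])


# With class IDs, "angry" looks twice as far from "happy" as "sad" does,
# purely because of the order in which the IDs were handed out.
print(label_gap("happy", "sad"), label_gap("happy", "angry"))

# With circumplex values, the distances come from the intensities themselves.
print(circumplex_gap("happy", "sad"), circumplex_gap("sad", "angry"))
```

Reordering the class IDs would change the first pair of numbers but not the second, which is the whole appeal of working with intensities.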
A simple way of simulating unseen data is to hold part of the dataset back: train the model on 80% of it, and the remaining 20% can be used to evaluate the performance of the model developed.

Notice that there's some missing data in the age column, so there are 2 common ways of dealing with that missing data: dropping the rows that contain it, or filling the gaps in with a substitute value.

Note: dropping rows works great for super ultra large datasets, but since more data = better...

Many a time, the data in datasets is class data, and while encoded class data might not always accurately represent the intensity of a parameter, something is better than nothing. There are 2 common ways of dealing with class data; let's take the gender column.

One-hot encoding: one column is split into as many columns as it has classes. Gender has 2 classes, male and female, so the gender column gets split into 2 columns: a male column and a female column.

Label encoding: for columns with binary classes, like true or false, male or female, yes or no, etc., one of the class labels is replaced with 0 and the other with 1.

Different columns usually represent different parameters, and not all parameters have the same range. Assuming a dataset of age and height, the age column has a range of 1 to 100, while the height column perhaps has a range of 100cm to 200cm.

Why is this important? If we plot these values without scaling em to the same range and try to find the line that best fits through the points, the fit comes out worse than if we first scale the inputs to the same range. With scaled inputs we can tell, even from a glance, that the line fits the data better, i.e. there is less error when predicting for unseen data.

Now, feature scaling is commonly done using 2 methods:

normalization: x' = (x - min) / (max - min)
standardization: x' = (x - mean) / standard deviation

where x is the current input we want to scale. Applying normalization to the dataset we were working on leaves us with a ready-for-training dataset.

We've done a lot of pre-processing on the training dataset, and the testing data is going to look like the unclean training data, so we have to remember to repeat every one of those steps on it. You get the point: we have to use the exact same operation tools used on the training dataset for the operation we will be doing on the testing dataset.

Yuppp, data people shouldn't become doctors 😝

And with that we have testing data that is ready to be taken for a ride in our ML model.
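The steps above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not the pipeline from the original post: the toy rows, the column names, the choice of filling missing ages with the training mean, and the helpers `split`, `fit`, `normalize` and `transform` are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical toy rows; None marks a missing age.
rows = [
    {"age": 23, "gender": "male", "height_cm": 180},
    {"age": None, "gender": "female", "height_cm": 165},
    {"age": 41, "gender": "female", "height_cm": 158},
    {"age": 35, "gender": "male", "height_cm": 172},
    {"age": 52, "gender": "female", "height_cm": 160},
    {"age": 19, "gender": "male", "height_cm": 190},
    {"age": None, "gender": "male", "height_cm": 175},
    {"age": 64, "gender": "female", "height_cm": 150},
    {"age": 30, "gender": "male", "height_cm": 168},
    {"age": 47, "gender": "female", "height_cm": 171},
]


def split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit(train):
    """Learn every pre-processing parameter from the training rows only."""
    ages = [r["age"] for r in train if r["age"] is not None]
    heights = [r["height_cm"] for r in train]
    return {
        "age_fill": mean(ages),  # assumed substitute for missing ages
        "age_range": (min(ages), max(ages)),
        "height_range": (min(heights), max(heights)),
    }


def normalize(x, lo, hi):
    """Min-max normalization: (x - min) / (max - min)."""
    return (x - lo) / (hi - lo)


def transform(data, params):
    """Apply the already-fitted parameters to any set of rows."""
    out = []
    for r in data:
        age = params["age_fill"] if r["age"] is None else r["age"]
        out.append({
            "age": normalize(age, *params["age_range"]),
            "gender": 0 if r["gender"] == "male" else 1,  # binary label encoding
            "height": normalize(r["height_cm"], *params["height_range"]),
        })
    return out


train_rows, test_rows = split(rows)
params = fit(train_rows)                     # fitted on training data only
train_ready = transform(train_rows, params)
test_ready = transform(test_rows, params)    # same tools, reused as-is
```

Because `params` is fitted on the training rows alone, a test value can land slightly outside the 0 to 1 range; that is expected, and it is the price of never peeking at the testing data.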


This post first appeared on VedVyas Articles, please read the original post: here
