
Time Series for Climate Change: Reducing Food Waste with Clustering | by Vitor Cerqueira | Jun, 2023

In the rest of this article, we’ll do a clustering analysis of food demand time series. You’ll learn how to summarise a set of time series using feature extraction, and how to cluster them with K-means and hierarchical methods.

The full code is available on GitHub.

We’ll use a weekly food sales time series collected by the US Department of Agriculture. This data set contains information about food sales by product category and subcategory. The time series is split by state, but we’ll use the national total sales in each period.

Below is a sample of the data set:

[Table: sample of the data set]

Here’s what the whole data looks like:

[Figure: the complete data set]

We’ll use a feature-based approach to time series clustering. This process involves two main steps: first, summarising each time series into a set of features; second, applying a clustering algorithm to those features. Let’s do each step in turn.

We start by extracting a set of statistics to summarise each time series. The goal is to convert each series into a small set of features.

There are several tools for time series feature extraction. We’ll use tsfel, which provides competitive performance relative to other approaches [3].

Here’s how you can use tsfel:

    import pandas as pd
    import tsfel

    # get the default feature configuration
    cfg = tsfel.get_features_by_domain()

    # extract features for each food subcategory
    features = {
        col: tsfel.time_series_features_extractor(cfg, data[col])
        for col in data
    }

    features_df = pd.concat(features, axis=0)

This process results in a large number of features. Some of these may be redundant, so we carry out a feature selection process.

Below, we apply three operations to the feature set: normalizing the features, removing features with zero variance, and removing correlated features (using a threshold of 0.9).

    from sklearn.feature_selection import VarianceThreshold
    from sklearn.preprocessing import MinMaxScaler

    # normalizing the features
    features_norm_df = pd.DataFrame(
        MinMaxScaler().fit_transform(features_df),
        columns=features_df.columns,
    )

    # removing features with 0 variance
    min_var = VarianceThreshold(threshold=0)
    min_var.fit(features_norm_df)
    features_norm_df = pd.DataFrame(
        min_var.transform(features_norm_df),
        columns=min_var.get_feature_names_out(),
    )

    # removing correlated features
    # (correlation_filter is a helper that is not defined in this article)
    features_norm_df = correlation_filter(features_norm_df, 0.9)
    features_norm_df.index = data.columns

After preprocessing the data set, we’re ready to cluster the time series. Each series is now summarised by a small set of unordered features.
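The preprocessing code calls `correlation_filter`, which the article does not define. Here is a minimal sketch of what such a helper might look like, assuming it drops one feature from each pair whose absolute Pearson correlation exceeds the threshold. The function body and the toy data are illustrative assumptions, not the author's implementation.

```python
import numpy as np
import pandas as pd


def correlation_filter(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson correlation| exceeds threshold.

    Hypothetical stand-in for the helper used in the article, whose actual
    implementation is not shown there.
    """
    corr = features.corr().abs()

    # keep only the strict upper triangle so each pair is inspected once
    upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(upper_mask)

    # a column is dropped if it is highly correlated with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

    return features.drop(columns=to_drop)


# toy example (assumed data): "b" is an exact multiple of "a",
# while "c" is only weakly correlated with either of them
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
})
filtered = correlation_filter(toy, 0.9)
print(list(filtered.columns))
```

In the toy example only `b` is removed, because it duplicates the information in `a`. This greedy rule keeps the first feature of each correlated pair; other tie-breaking choices are equally plausible.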
Since the features are unordered, we can use any conventional clustering algorithm. A popular choice is K-means.

With K-means, we need to pick the number of clusters we want. Unless we have some domain knowledge, there’s no obvious a priori value for this parameter. But we can take a data-driven approach to select the number of clusters: we test different values and pick the best one.

Below, we test K-means with 2 to 24 clusters. Then, we pick the number of clusters that maximizes the silhouette score. This metric quantifies how cohesive and well separated the resulting clusters are.

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    kmeans_parameters = {
        'init': 'k-means++',
        'n_init': 100,
        'max_iter': 50,
    }

    # candidate numbers of clusters: 2, 3, ..., 24
    n_clusters = range(2, 25)

    silhouette_coef = []
    for k in n_clusters:
        kmeans = KMeans(n_clusters=k, **kmeans_parameters)
        kmeans.fit(features_norm_df)

        # average silhouette score for this value of k
        score = silhouette_score(features_norm_df, kmeans.labels_)
        silhouette_coef.append(score)

The silhouette score is maximized for 5 clusters, as shown in the figure below.

[Figure: silhouette score for each number of clusters]

We can draw a parallel coordinates plot to understand the profile of each cluster. Here’s an example with a sample of three features:

[Figure: parallel coordinates plot of the clusters for three features]

We can also use the information about clusters to improve demand forecasting models, for example by building a model for each cluster. The paper in reference [5] is a good example of this approach.

Hierarchical clustering is an alternative to K-means. It combines pairs of clusters iteratively, leading to a tree-like structure. The scipy library provides an implementation of this method.

    import scipy.cluster.hierarchy as shc

    # hierarchical clustering using the ward method
    clustering = shc.linkage(features_norm_df, method='ward')

    # plotting the dendrogram
    # (categories is not defined in this article; it holds the labels of the series)
    dend = shc.dendrogram(
        clustering,
        labels=categories.values,
        orientation='right',
        leaf_font_size=7,
    )

The results of a hierarchical clustering model are best visualized with a dendrogram plot:

[Figure: dendrogram of the food subcategories]

We can use the dendrogram to understand the clusters’ profiles. For example, we can see that most canned items are grouped together (orange color). Oranges also cluster with pancake/cake mixes.
These two often go together in people’s breakfast.
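A dendrogram shows the full merge tree but does not by itself assign each series to a cluster. The sketch below shows one way flat cluster labels could be read off a Ward linkage matrix with scipy's `fcluster`. The synthetic `toy_features` array and the choice of three clusters are assumptions for illustration, not values from the article.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# synthetic stand-in for the feature matrix: three well-separated groups of 5 points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
toy_features = np.vstack([c + 0.1 * rng.standard_normal((5, 2)) for c in centers])

# Ward linkage, as in the hierarchical clustering step of the article
clustering = shc.linkage(toy_features, method="ward")

# cut the tree so that at most 3 flat clusters remain
labels = shc.fcluster(clustering, t=3, criterion="maxclust")

print(labels)
```

With the article's data, `toy_features` would be replaced by the normalized feature table, and the resulting labels could be paired with the subcategory names to list the members of each cluster.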



This post first appeared on VedVyas Articles.
