
Beyond Numpy and Pandas: Unlocking the Potential of Lesser-Known Python Libraries

Xarray is a Python library that extends the features and functionalities of NumPy, giving us the possibility to work with labeled arrays and datasets.

As they say on their website, in fact: "Xarray makes working with labeled multi-dimensional arrays in Python simple, efficient, and fun!"

And also: "Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience."

In other words, it extends the functionality of NumPy arrays by adding labels or coordinates to the array dimensions. These labels provide metadata and enable more advanced analysis and manipulation of multi-dimensional data.

For example, in NumPy, arrays are accessed using integer-based indexing. In Xarray, instead, each dimension can have a name associated with it, making it easier to understand and manipulate the data based on meaningful names.

For example, instead of accessing data with arr[0, 1, 2], we can use arr.isel(x=0, y=1, z=2) to select by position along the named dimensions, or arr.sel(x=0, y=1, z=2) to select by coordinate value, where x, y, and z are dimension names. This makes the code much more readable!

So, let's see some features of Xarray. As usual, to install it:

    pip install xarray

FEATURE ONE: WORKING WITH LABELED COORDINATES

Suppose we want to create some temperature data and label it with coordinates like latitude and longitude.
We can do it like so:

    import numpy as np
    import xarray as xr

    # Create temperature data (random values between 10 and 30)
    temperature = np.random.rand(100, 100) * 20 + 10

    # Create coordinate arrays for latitude and longitude
    latitudes = np.linspace(-90, 90, 100)
    longitudes = np.linspace(-180, 180, 100)

    # Create an Xarray data array with labeled coordinates
    da = xr.DataArray(
        temperature,
        dims=['latitude', 'longitude'],
        coords={'latitude': latitudes, 'longitude': longitudes}
    )

    # Access data using labeled coordinates
    subset = da.sel(latitude=slice(-45, 45), longitude=slice(-90, 0))

    # Print the selected subset
    print(subset)

And if we print the subset we get:

    >>>
    array([[13.45064786, 29.15218061, 14.77363206, ..., 12.00262833,
            16.42712411, 15.61353963],
           [23.47498117, 20.25554247, 14.44056286, ..., 19.04096482,
            15.60398491, 24.69535367],
           [25.48971105, 20.64944534, 21.2263141 , ..., 25.80933737,
            16.72629302, 29.48307134],
           ...,
           [10.19615833, 17.106716  , 10.79594252, ..., 29.6897709 ,
            20.68549602, 29.4015482 ],
           [26.54253304, 14.21939699, 11.085207  , ..., 15.56702191,
            19.64285595, 18.03809074],
           [26.50676351, 15.21217526, 23.63645069, ..., 17.22512125,
            13.96942377, 13.93766583]])
    Coordinates:
      * latitude   (latitude) float64 -44.55 -42.73 -40.91 ... 40.91 42.73 44.55
      * longitude  (longitude) float64 -89.09 -85.45 -81.82 ... -9.091 -5.455 -1.818

So, let's see the process step-by-step: first, we created the temperature values as a NumPy array. Then, we defined the latitude and longitude values as NumPy arrays. Next, we stored all of this in an Xarray data array with xr.DataArray(), naming the dimensions and attaching the coordinates. Finally, we selected a subset of latitudes and longitudes with sel(), using slices of coordinate values rather than integer positions.

The result is also easily readable, so labeling is really helpful in a lot of cases.

FEATURE TWO: HANDLING MISSING DATA

Suppose we're collecting data related to temperatures during the year. We want to know if we have some null values in our array.
Here's how we can do so:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Create temperature data with missing values
    temperature = np.random.rand(365, 50, 50) * 20 + 10
    temperature[0:10, :, :] = np.nan  # Set the first 10 days as missing values

    # Create time, latitude, and longitude coordinate arrays
    times = pd.date_range('2023-01-01', periods=365, freq='D')
    latitudes = np.linspace(-90, 90, 50)
    longitudes = np.linspace(-180, 180, 50)

    # Create an Xarray data array with missing values
    da = xr.DataArray(
        temperature,
        dims=['time', 'latitude', 'longitude'],
        coords={'time': times, 'latitude': latitudes, 'longitude': longitudes}
    )

    # Count the number of missing values along the time dimension
    missing_count = da.isnull().sum(dim='time')

    # Print missing values
    print(missing_count)

    >>>
    array([[10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           ...,
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

And so we find 10 null values at every latitude/longitude point: the first 10 days we set to NaN.

Also, if we look closely at the code, we can see that Xarray offers Pandas-style methods such as isnull() and sum(). Chained as isnull().sum(dim='time'), as in this case, they count the missing values along the time dimension for each grid point, rather than one grand total.

FEATURE THREE: HANDLING AND ANALYZING MULTI-DIMENSIONAL DATA

The temptation to handle and analyze multi-dimensional data is high when we have the possibility to label our arrays. So, why not try it?

For example, suppose we're still collecting data related to temperatures at certain latitudes and longitudes.

We may want to calculate the mean, the max, and the min temperatures.
We can do it like so:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Create synthetic temperature data
    temperature = np.random.rand(365, 50, 50) * 20 + 10

    # Create time, latitude, and longitude coordinate arrays
    times = pd.date_range('2023-01-01', periods=365, freq='D')
    latitudes = np.linspace(-90, 90, 50)
    longitudes = np.linspace(-180, 180, 50)

    # Create an Xarray dataset
    ds = xr.Dataset(
        {
            'temperature': (['time', 'latitude', 'longitude'], temperature),
        },
        coords={
            'time': times,
            'latitude': latitudes,
            'longitude': longitudes,
        }
    )

    # Perform statistical analysis on the temperature data
    mean_temperature = ds['temperature'].mean(dim='time')
    max_temperature = ds['temperature'].max(dim='time')
    min_temperature = ds['temperature'].min(dim='time')

    # Print values
    print(f"mean temperature:\n {mean_temperature}\n")
    print(f"max temperature:\n {max_temperature}\n")
    print(f"min temperature:\n {min_temperature}\n")

    >>>
    mean temperature:
    array([[19.99931701, 20.36395016, 20.04110699, ..., 19.98811842,
            20.08895803, 19.86064693],
           [19.84016491, 19.87077812, 20.27445405, ..., 19.8071972 ,
            19.62665953, 19.58231185],
           [19.63911165, 19.62051976, 19.61247548, ..., 19.85043831,
            20.13086891, 19.80267099],
           ...,
           [20.18590514, 20.05931149, 20.17133483, ..., 20.52858247,
            19.83882433, 20.66808513],
           [19.56455575, 19.90091128, 20.32566232, ..., 19.88689221,
            19.78811145, 19.91205212],
           [19.82268297, 20.14242279, 19.60842148, ..., 19.68290006,
            20.00327294, 19.68955107]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

    max temperature:
    array([[29.98465531, 29.97609171, 29.96821276, ..., 29.86639343,
            29.95069558, 29.98807808],
           [29.91802049, 29.92870312, 29.87625447, ..., 29.92519055,
            29.9964299 , 29.99792388],
           [29.96647016, 29.7934891 , 29.89731136, ..., 29.99174546,
            29.97267052, 29.96058079],
           ...,
           [29.91699117, 29.98920555, 29.83798369, ..., 29.90271746,
            29.93747041, 29.97244906],
           [29.99171911, 29.99051943, 29.92706773, ..., 29.90578739,
            29.99433847, 29.94506567],
           [29.99438621, 29.98798699, 29.97664488, ..., 29.98669576,
            29.91296382, 29.93100249]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

    min temperature:
    array([[10.0326431 , 10.07666029, 10.02795524, ..., 10.17215336,
            10.00264909, 10.05387097],
           [10.00355858, 10.00610942, 10.02567816, ..., 10.29100316,
            10.00861792, 10.16955806],
           [10.01636216, 10.02856619, 10.00389027, ..., 10.0929342 ,
            10.01504103, 10.06219179],
           ...,
           [10.00477003, 10.0303088 , 10.04494723, ..., 10.05720692,
            10.122994  , 10.04947012],
           [10.00422182, 10.0211205 , 10.00183528, ..., 10.03818058,
            10.02632697, 10.06722953],
           [10.10994581, 10.12445222, 10.03002468, ..., 10.06937041,
            10.04924046, 10.00645499]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

And we obtained what we wanted, also in a clearly readable way.

And again, as before, to calculate the max, min, and mean temperatures we've used Xarray methods that mirror the familiar NumPy and Pandas ones, applied along a named dimension (here, time) instead of a numbered axis.
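To make the difference between position-based and label-based access concrete, here is a minimal sketch on a tiny deterministic array. The values and coordinates are made up for illustration and are not taken from the examples above. It compares isel() (integer positions) with sel() (coordinate values), then repeats the missing-value count and the mean along the time dimension on data small enough to check by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny illustrative array: 4 days x 3 latitudes x 2 longitudes (hypothetical values)
values = np.arange(24, dtype=float).reshape(4, 3, 2)
values[0, :, :] = np.nan  # mark the first day as missing

da = xr.DataArray(
    values,
    dims=["time", "latitude", "longitude"],
    coords={
        "time": pd.date_range("2023-01-01", periods=4, freq="D"),
        "latitude": [-45.0, 0.0, 45.0],
        "longitude": [-90.0, 90.0],
    },
)

# isel() selects by integer position, sel() by coordinate value;
# here both pick the same time series (latitude 45, longitude -90)
by_position = da.isel(latitude=2, longitude=0)
by_label = da.sel(latitude=45.0, longitude=-90.0)

# Number of missing values per grid point, counted along the time dimension
missing_per_cell = da.isnull().sum(dim="time")

# Mean over time; for float data, NaN values are skipped by default
mean_over_time = da.mean(dim="time")

print(by_position.equals(by_label))
print(missing_per_cell.values)
print(mean_over_time.values)
```

Because the coordinates are attached to the array, the label-based call keeps working even if the data is later reordered or subset, whereas the positional call depends on where each value happens to sit.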



This post first appeared on VedVyas Articles; please read the original post: here
