A Python dictionary stores key-value pairs and is Python's built-in implementation of a hash map.
Each key in a Python dictionary is unique and must be hashable. In practice, this means an immutable type such as str, int, or tuple (as long as the tuple contains only hashable elements).
There is no restriction on the values; they can be of any data type.
If you try to access a key that does not exist in a Python dictionary, you will get a KeyError.
```python
d1 = {"Ashley": 42, "Jacob": 24, "Katherine": 31}

print(d1["Ashley"])     # key exists, OK
print(d1["Katherine"])  # key exists, OK
print(d1["Melanie"])    # key absent, raises KeyError
```
To overcome this problem and handle missing keys more gracefully, Python provides an alternative called defaultdict, which is part of its built-in collections module.
What is defaultdict?
defaultdict is a subclass of Python's standard dict class and works almost exactly like the standard dictionary, with the additional ability to specify a default value for missing keys.
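Because defaultdict subclasses dict, a defaultdict object passes isinstance checks for dict and supports all the usual dictionary operations. A minimal sketch (the name scores is just for illustration):

```python
from collections import defaultdict

scores = defaultdict(int)  # missing keys default to int(), i.e. 0

# defaultdict is a subclass of dict
print(issubclass(defaultdict, dict))  # True
print(isinstance(scores, dict))       # True

# the only difference: a missing key gets a default value instead of raising
print(scores["unknown"])              # 0
```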
Let's reimplement the d1 dictionary from the first example, this time using defaultdict with a default value of 0.
```python
from collections import defaultdict

d2 = defaultdict(int)  # default_factory is int, so missing keys get int(), i.e. 0
print("Defaultdict d2 initialized:", d2)

# assigning key-value pairs
d2["Ashley"] = 42
d2["Jacob"] = 24
d2["Katherine"] = 31
print("d2 after setting some keys:", d2)

# accessing existent and non-existent keys
print(d2["Ashley"])     # key exists, returns corresponding value
print(d2["Katherine"])  # key exists, returns corresponding value
print(d2["Melanie"])    # key absent, returns default value using int()
```
The defaultdict constructor takes as its first parameter a default_factory, a callable that is invoked whenever a missing key is accessed on the dictionary.
In the above example, we pass int as the default_factory. Calling int() with no arguments returns 0, so when we access the key 'Melanie', we get the value 0.
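The same reasoning applies to other built-in types: called with no arguments, each returns an "empty" value, which is why int, list, set, and dict are common choices for default_factory. A small sketch (the key "apples" is just sample data):

```python
from collections import defaultdict

# built-in types called with no arguments return an "empty" value
defaults = [int(), float(), str(), list(), set(), dict()]
print(defaults)  # [0, 0.0, '', [], set(), {}]

# with int as default_factory, += works on a key that does not exist yet
counts = defaultdict(int)
counts["apples"] += 1    # missing key -> int() -> 0, then 0 + 1
print(counts["apples"])  # 1
```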
Note that if we don't pass a default_factory, it is set to None, in which case our defaultdict behaves like a standard dict and raises a KeyError when a missing key is accessed.
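A short sketch confirming this behaviour: a defaultdict created without arguments has its default_factory set to None and raises KeyError just like a plain dict (the key "missing" is arbitrary).

```python
from collections import defaultdict

dd = defaultdict()         # no default_factory passed
print(dd.default_factory)  # None

raised = False
try:
    dd["missing"]
except KeyError as err:
    raised = True
    print("KeyError:", err)  # KeyError: 'missing'

print("missing" in dd)     # False, nothing was inserted
```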
We could also define our own custom function, or pass a lambda function, that returns any other desired value to be used as the default for our dictionary.
Let’s take the same example and set the default value to 99, this time using our custom callable.
```python
from collections import defaultdict

# our default function, called whenever a missing key is accessed
def get_default_value():
    return 99

d3 = defaultdict(get_default_value, {"Ashley": 42, "Jacob": 24, "Katherine": 31})
print("Dictionary d3:", d3)

# accessing existent and non-existent keys
print(d3["Ashley"])     # key exists, returns corresponding value
print(d3["Katherine"])  # key exists, returns corresponding value
print(d3["Melanie"])    # key absent, returns default value using get_default_value()
```
This time, when we accessed the key 'Melanie', our user-defined function get_default_value was called to return the default value.
Note that the callable passed as default_factory is called with no arguments, so make sure you define your function with a matching signature.
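If the default value depends on some parameter, one option is to wrap the call in a zero-argument lambda or use functools.partial. A minimal sketch; make_default is a hypothetical function used only for illustration:

```python
from collections import defaultdict
from functools import partial

def make_default(value):
    """Hypothetical factory that needs an argument."""
    return value

# Passing make_default directly fails: defaultdict calls it as make_default()
bad = defaultdict(make_default)
failed = False
try:
    bad["x"]
except TypeError as err:
    failed = True
    print("TypeError:", err)

# Both wrappers below can be called with no arguments
with_lambda = defaultdict(lambda: make_default(99))
with_partial = defaultdict(partial(make_default, 99))
print(with_lambda["x"], with_partial["x"])  # 99 99
```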
How does defaultdict work?
Whenever we access a value of a dictionary using the subscript operator [ ], both Python's standard dict and defaultdict objects internally call the __getitem__ method.
If the dictionary contains the specified key, __getitem__ returns the value of that key.
If the key does not exist, __getitem__ checks whether the dictionary's class defines a __missing__ method. The standard dict class does not define one, so a plain dictionary simply raises a KeyError.
defaultdict does define __missing__. If its default_factory is None, __missing__ raises a KeyError; otherwise, it calls default_factory with no arguments, stores the result under the missing key, and returns it.
You can test this by calling these methods directly on a defaultdict object.
```python
from collections import defaultdict

# specifying a lambda function as the default callable
d4 = defaultdict(lambda: 99, {"Ashley": 42, "Jacob": 24, "Katherine": 31})
print("Dictionary d4:", d4)

print(d4.__getitem__("Ashley"))  # key exists, returns 42
print(d4.__getitem__("Jacob"))   # key exists, returns 24
print(d4.__getitem__("Ashton"))  # key does not exist, calls __missing__, which in turn calls our lambda

# directly calling the __missing__ method
print("d4.__missing__('Ashton') =", d4.__missing__("Ashton"))
```
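To make the mechanism concrete, here is a simplified sketch of a dict subclass that imitates defaultdict's lookup behaviour. It is not CPython's actual implementation, and the class name DefaultDictSketch is hypothetical. It also makes visible a detail that is easy to miss: the generated default is stored in the dictionary.

```python
from collections import defaultdict


class DefaultDictSketch(dict):
    """Hypothetical, simplified imitation of defaultdict's lookup logic."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is not found
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()  # called with no arguments
        self[key] = value               # the default is stored under `key`
        return value


print(hasattr(dict, "__missing__"))         # False: a plain dict just raises KeyError
print(hasattr(defaultdict, "__missing__"))  # True

d5 = DefaultDictSketch(lambda: 99, {"Ashley": 42})
print(d5["Ashley"])  # 42, found directly by __getitem__
print(d5["Ashton"])  # 99, produced by __missing__
print(d5)            # {'Ashley': 42, 'Ashton': 99}
```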
Appending to list values in defaultdict
With a standard Python dict, if you use lists as values and want to update them dynamically, say in a loop, you always have to check whether the key exists before appending a value to the corresponding list.
If the key doesn't exist, you create a new list; otherwise, you append to the existing list.
Let's make a dictionary that separates the even and odd numbers from 0 up to (but excluding) 20. The even numbers are stored under the key 0, and the odd numbers under the key 1.
```python
d_even_odd = dict()  # empty dictionary

for i in range(20):
    key = i % 2
    if key in d_even_odd:
        # key exists, list has already been created
        d_even_odd[key].append(i)
    else:
        # key doesn't exist, create it and assign a list with 1 element
        d_even_odd[key] = [i]

for k in d_even_odd:
    print(f"{k}: {d_even_odd[k]}")
```
Avoiding this hassle of always checking whether a key exists before performing an operation is exactly where defaultdict becomes the most useful alternative.
We can simply define a defaultdict with the callable list.
This way, whenever we access a key that doesn't exist, an empty list is created and stored under that key, and we can append the desired value to it directly.
```python
from collections import defaultdict

dd_even_odd = defaultdict(list)  # empty defaultdict with list() as default callable

for i in range(20):
    key = i % 2
    # no if condition, missing keys handled implicitly
    dd_even_odd[key].append(i)

for k in dd_even_odd:
    print(f"{k}: {dd_even_odd[k]}")
```
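The same pattern works with other factories. As a minimal sketch (the word list is made-up sample data), set can be used as the factory to group values while discarding duplicates:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "apple", "blueberry", "cherry"]

by_letter = defaultdict(set)  # missing keys start as an empty set()
for w in words:
    by_letter[w[0]].add(w)    # no "if key in ..." check; the set ignores duplicates

for letter in sorted(by_letter):
    print(letter, sorted(by_letter[letter]))
# a ['apple', 'avocado']
# b ['banana', 'blueberry']
# c ['cherry']
```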
Length of defaultdict
The length of a defaultdict, i.e., the number of key-value pairs in the dictionary, can be computed by passing the defaultdict object to the built-in len() function.
This is the same as we would do for a standard dict.
```python
from collections import defaultdict

dd_powers = defaultdict(list)

for i in range(8):
    # extending each list with the square, square root and cube of i
    dd_powers[i].extend([i**2, i**0.5, i**3])

for k in dd_powers:
    print(f"{k}: {dd_powers[k]}")

print("\nlength of the defaultdict:", len(dd_powers))
```
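Keep in mind that merely reading a missing key with [ ] inserts it with the default value, so it is counted by len(). A minimal sketch:

```python
from collections import defaultdict

dd = defaultdict(list)
dd["a"].append(1)
print(len(dd))   # 1

value = dd["b"]  # only reading, but the key "b" is now created with []
print(len(dd))   # 2
print(dict(dd))  # {'a': [1], 'b': []}
```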
Removing an item from defaultdict
We can remove elements from a defaultdict the same way we do with standard Python dictionaries, i.e., using the del statement or the pop method.
```python
from collections import defaultdict

name_lengths = defaultdict(int)
names = ["Aman", "Shanaya", "Harris", "Alwyn"]

for n in names:
    name_lengths[n] = len(n)

print("Current dictionary:")
print(name_lengths)

del name_lengths["Shanaya"]               # removing "Shanaya"
deleted_val = name_lengths.pop("Harris")  # removing "Harris", returns deleted value
print("\nDeleted value:", deleted_val)

print("\nAfter deleting two keys:")
print(name_lengths)
```
If the requested key doesn't exist, the del statement raises a KeyError.
The pop method returns the deleted value.
If the key does not exist, pop raises a KeyError, unless a default value is passed as its optional second argument, in which case that default is returned instead.
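Note that pop() and del do not fall back on default_factory; only [ ] lookups do. A short sketch reusing two names from the example above:

```python
from collections import defaultdict

name_lengths = defaultdict(int, {"Aman": 4, "Alwyn": 5})

# pop() with a default returns that default instead of raising
print(name_lengths.pop("Shanaya", None))  # None

# pop() and del do not consult default_factory for missing keys
pop_raised = del_raised = False
try:
    name_lengths.pop("Shanaya")
except KeyError:
    pop_raised = True

try:
    del name_lengths["Shanaya"]
except KeyError:
    del_raised = True

print(pop_raised, del_raised)  # True True
print(dict(name_lengths))      # {'Aman': 4, 'Alwyn': 5}
```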
Get a list of keys in defaultdict
To get the keys of a defaultdict, we can call the keys() method on the defaultdict object.
The method returns a dict_keys view object containing all the keys of the dictionary.
The dict_keys object is iterable, so we can iterate over it to get the individual keys, or we can convert it to a Python list using the list() constructor.
The keys method is defined in Python's dict class, which is the parent class of defaultdict.
```python
from collections import defaultdict

name_lengths = defaultdict(int)
names = ["Aman", "Shanaya", "Harris", "Alwyn"]

for n in names:
    name_lengths[n] = len(n)

print("Current dictionary:")
print(name_lengths)

print(name_lengths.keys())
keys_list = list(name_lengths.keys())
print("\nKeys:", keys_list)
```
Checking the existence of keys in defaultdict
Although we don't need to check whether a key exists before accessing it in a defaultdict, we might still want to find out if a certain key is present in the dictionary.
To do this, we use Python's in operator, which works with almost all kinds of containers in Python to check whether a certain element is present.
```python
from collections import defaultdict

divisibility_by_4 = defaultdict(list)

for i in range(21):
    divisibility_by_4[i % 4].append(i)

print("Current dictionary:", divisibility_by_4)

print("3 exists?")
print(3 in divisibility_by_4)  # True, dividing by 4 can leave remainder 3
print("6 exists?")
print(6 in divisibility_by_4)  # False, dividing by 4 can never leave remainder 6
```
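Unlike subscript access, the in operator and the get() method never call default_factory, so they are safe ways to probe a defaultdict without adding keys. A minimal sketch rebuilding the dictionary from the example above:

```python
from collections import defaultdict

divisibility_by_4 = defaultdict(list)
for i in range(21):
    divisibility_by_4[i % 4].append(i)

print(6 in divisibility_by_4)    # False: the in operator does not insert keys
print(divisibility_by_4.get(6))  # None: get() does not call default_factory
print(len(divisibility_by_4))    # 4: still only the remainders 0 to 3

print(divisibility_by_4[6])      # []: subscript access creates the key
print(len(divisibility_by_4))    # 5
```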
Sort a Python defaultdict
Python dictionaries are not sorted containers. Since Python 3.7 they preserve insertion order, but you cannot index a dictionary by position, and neither the standard dict nor defaultdict has a sort() method, so we cannot sort either of them in place.
However, we can obtain the key-value pairs as an iterable dict_items object using the items() method, which we can then sort by calling Python's built-in sorted() function.
```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]

for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name

print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
print("\ndefaultdict items:\n", items)
print("type:", type(items))

items_sorted = sorted(items)  # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)
```
If we now create a new defaultdict from these sorted items, it keeps them in sorted order, since dictionaries (including defaultdict) preserve insertion order in Python 3.7 and later.

```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]

for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name

print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
items_sorted = sorted(items)    # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)

# creating a new defaultdict from the sorted items (insertion order = sorted order)
vowels_counter_1 = defaultdict(int, items_sorted)
print("\ndefaultdict from sorted items:\n", vowels_counter_1)
```
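Since Python 3.7, a defaultdict built from sorted items iterates in sorted order, but the dictionary does not stay sorted: keys added later are simply appended at the end. A minimal sketch using a small hand-written subset of the vowel counts (assumes Python 3.7 or later):

```python
from collections import defaultdict

vowels_counter = defaultdict(int, {"Wright": 1, "Aaliya": 4, "Pamella": 3})

items_sorted = sorted(vowels_counter.items())   # sort (key, value) pairs by key
vowels_sorted = defaultdict(int, items_sorted)  # insertion order = sorted order
print(list(vowels_sorted))  # ['Aaliya', 'Pamella', 'Wright']

# a new key is appended at the end, so the sorted order is not maintained
vowels_sorted["Alwyn"] += 1
print(list(vowels_sorted))  # ['Aaliya', 'Pamella', 'Wright', 'Alwyn']
```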
In these examples, we used the default sorting, which compares the (key, value) tuples in the list element by element, starting with the key.
Since keys are unique, the result is sorted by keys.
If we want to sort the items by values instead, we can pass a lambda function that specifies the sorting criterion to the key parameter of the sorted() function.
```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]

for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name

print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
items_sorted = sorted(items)    # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)

items_sorted_by_value = sorted(items, key=lambda x: x[1])  # value is at pos. 1 of key-value pair
print("\ndefaultdict items sorted by value:\n", items_sorted_by_value)
```
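To sort in descending order of value while breaking ties alphabetically, the key function can return a tuple. A sketch using a few hand-computed vowel counts as sample data:

```python
from collections import defaultdict

vowels_counter = defaultdict(int, {"Wright": 1, "Aaliya": 4, "Pamella": 3,
                                   "Harris": 2, "Alwyn": 1})

# negate the value for descending order; the name breaks ties alphabetically
ranked = sorted(vowels_counter.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
# [('Aaliya', 4), ('Pamella', 3), ('Harris', 2), ('Alwyn', 1), ('Wright', 1)]
```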
defaultdict to JSON
JSON, or JavaScript Object Notation, is a popular format for exchanging data over the internet.
It can contain structures similar to both Python lists and dictionaries.
Web APIs often accept requests and send responses in the JSON format.
A file containing JSON data usually has the extension .json.
Python provides the json module to parse JSON data from files and to write data to JSON files easily.
A defaultdict object (as well as a standard dict object) can be serialized as JSON using the dump or dumps function of Python's json module.
The json.dumps function converts the defaultdict object into a JSON-formatted string. We can write this string to a file using the write method of a Python file object.
We can also write the defaultdict data directly to a file as JSON using the json.dump function, which accepts the dictionary and a file object opened in write mode.
For both functions, we can optionally set the indent parameter to an integer to pretty-print the output JSON, indenting each nesting level by that many spaces.
We can also make these functions sort the output JSON by keys using the optional boolean parameter sort_keys. Let's use all these options in an example.
```python
import json
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i]  # first returns an empty dict, to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# converting to JSON string
students_json = json.dumps(students, indent=3)  # add indent of 3
print("\nStudents data as JSON string:")
print(students_json)
print("type:", type(students_json))

# dumping the string
with open("students.json", "w") as f1:
    f1.write(students_json)
print("JSON string dumped in students.json")

# dumping JSON without string conversion
with open("students_1.json", "w") as f2:
    json.dump(students, f2, indent=3, sort_keys=True)  # sort the defaultdict keys in output JSON
print("defaultdict directly dumped as JSON in students_1.json")
```
Our student data stored as a defaultdict is dumped as JSON into the files students.json and students_1.json.
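JSON object keys are always strings, so the integer student IDs are written as "100", "101", and so on, and loading the data back gives a plain dict with string keys. A minimal sketch of the round trip with a one-student sample; converting the keys back to int is an extra step you may or may not need:

```python
import json
from collections import defaultdict

students = defaultdict(dict)
students[100]["name"] = "Ashneer"
students[100]["age"] = 21

text = json.dumps(students)
print(text)  # {"100": {"name": "Ashneer", "age": 21}}

# loading gives back a plain dict whose keys are now strings
loaded = json.loads(text)
print(type(loaded).__name__, list(loaded))  # dict ['100']

# optional step: rebuild a defaultdict with integer keys again
restored = defaultdict(dict, {int(k): v for k, v in loaded.items()})
print(restored[100]["name"])  # Ashneer
```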
Defaultdict to Pandas DataFrame
pandas is one of the most popular libraries for storing and manipulating 2D tabular data, and its DataFrame structure allows each column to have a different data type.
pandas provides ways to convert a dictionary into a DataFrame.
We can pass our defaultdict object directly to the pandas.DataFrame constructor as its first argument, data, in which case the row and column labels are inferred from the given data.
A better way is to use the pd.DataFrame.from_dict method, which offers more control over the orientation of the table through its orient parameter.
Let us convert our student data from the previous example into a Pandas DataFrame.
```python
import pandas as pd
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i]  # first returns an empty dict, to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# creating a DataFrame from the defaultdict object
df_students = pd.DataFrame.from_dict(students, orient='index')  # using defaultdict keys as row indices
print("\nStudents data as DataFrame:")
print(df_students)
```
We can also save the data to a CSV file by first converting the defaultdict to a DataFrame, as above, and then calling the DataFrame's to_csv method.
```python
import pandas as pd
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i]  # first returns an empty dict, to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# creating a DataFrame from the defaultdict object
df_students = pd.DataFrame.from_dict(students, orient='index')  # using defaultdict keys as row indices
df_students.to_csv("students.csv", index_label="id")
print("\nStudent data dumped to students.csv")
```
to_csv writes the row indices to the CSV file by default; with the parameter value index_label="id", we give that index column the header "id" in the output file.
Defaultdict to normal dict
Finally, let's look at how to convert a defaultdict into a standard dict.
This is straightforward: we simply pass the defaultdict object to the dict constructor.
```python
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i]  # first returns an empty dict, to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)
print("type:", type(students))

students_d = dict(students)
print("\nAfter converting to dict:")
print(students_d)
print("type:", type(students_d))
```
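dict() performs a shallow conversion: the outer object loses its default_factory (so missing keys raise KeyError again), but values that are themselves defaultdict objects are kept as they are. For nested data, a small recursive helper can be used; to_plain_dict below is a hypothetical sketch, not a library function:

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Hypothetical helper: recursively convert (default)dicts to plain dicts."""
    if isinstance(obj, dict):  # also true for defaultdict, a dict subclass
        return {k: to_plain_dict(v) for k, v in obj.items()}
    return obj

nested = defaultdict(lambda: defaultdict(int))
nested["CS"]["Ashneer"] += 1
nested["Law"]["Pamella"] += 1

shallow = dict(nested)
print(type(shallow["CS"]).__name__)  # defaultdict: inner values are not converted

plain = to_plain_dict(nested)
print(plain)  # {'CS': {'Ashneer': 1}, 'Law': {'Pamella': 1}}

missing_raised = False
try:
    plain["Music"]
except KeyError:
    missing_raised = True
print(missing_raised)  # True: the default_factory is gone
```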