Dataclass For Django Custom Command Arguments
Dataclasses
I was cleaning up a custom Django command yesterday and ended using a dataclass as a way to keep the code DRY.
If dataclasses are new to you, don’t worry. Dataclasses are similar to classes, but they have boilerplates that handle many common method and attribute definitions for you. For example, instead of you having to explicitly define the constructor and assign attributes such as below,
class MyClass:
def __init__(self, param1: str, param2: int, ...):
self.param1 = param1
self.param2 = param2
...
this code is already defined and hidden away. You just need to define the fields.
@dataclass
class MyClass:
param1: str
param2: int
...
There are many useful features of dataclasses. In my specific use case, I leveraged a dataclass for two purposes:
- To group of a set of variables and logic that are commonly used together.
- To centralize default value definitions.
The Original Custom Command
My custom command allows me to scrape a website. It takes in arguments from the command line and builds the url query string for the website. I can run a command like this:
python manage.py scrape --start-date=2025-05-01 --end-date=2025-05-14 --max-items-per-page=20
None of the arguments are required, but they have default values whose constants are defined near the top of the file.
from django.core.management.base import BaseCommand
BASE_URL = "https://code.djangoproject.com/query"
DATE_STR = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
MAX_ITEMS_PER_PAGE = 100
START_PAGE = 1
...
The arguments are registered with their default values by invoking parser.add_argument(..., default=<CONSTANT>, ...)
.
...
class Command(BaseCommand):
def add_arguments(self, parser):
parser.add_argument("--start-date", default=DATE_STR, help="Start date in format YYYY-mm-dd")
parser.add_argument("--end-date", default="", help="End date in format YYYY-mm-dd")
parser.add_argument("--start-page", default=START_PAGE, type=int)
parser.add_argument("--max-items-per-page", default=MAX_ITEMS_PER_PAGE, type=int)
...
When the command is run, it instantiates an object for the scraper class using the values from the command line arguments.
...
class Command(BaseCommand):
...
def handle(self, *args, **options):
scraper = TracScraper(
options["start_date"],
options["end_date"],
options["start_page"],
options["max_items_per_page"],
)
scraper.scrape()
...
The constructor for the scraper class consists of keyword arguments which are used to initialize the object attributes. The default values for the keyword arguments are coming from the constants defined near the top of the file. The constructor also includes logic to build the url query string from the attributes.
...
class TracScraper:
def __init__(self, start_date=DATE_STR, end_date=DATE_STR, start_page=START_PAGE, max_items_per_page=MAX_ITEMS_PER_PAGE):
self.start_date = start_date
self.end_date = end_date
self.start_page = start_page
self.max_items_per_page = max_items_per_page
query = f"changetime={self.start_date}..{self.end_date}&start_page={self.start_page}&max_items_per_page={self.max_items_per_page}"
self.url = f"{BASE_URL}?{query}"
def scrape(self):
...
This seems typical for a custom command. However, there are a few places of redundancy.
- While global constants are used to define the default values, referencing them in the function signature of
TracScraper.__init__()
can feel unwieldy and repetitive when there is a much larger number of command line arguments. - When another scraper class is introduced for the same website, not only will it likely have a similar function signature for its constructor class, but it will also need the logic for building the url.
Using Dataclass
I now use a dataclass that includes a field for each command line argument. It also includes the base_url
. The global constants near the top of the file have been removed and their values are set directly on the fields.
from dataclasses import dataclass
from django.core.management.base import BaseCommand
@dataclass
class Params:
start_date: str = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
end_date: str = ""
max_items_per_page: int = 100
start_page: int = 1
base_url: str = "https://code.djangoproject.com/query"
...
I add a field for start_url
that defaults to an empty string. Then in the __post_init__()
method, I included the logic that builds and assigns the query string to start_url
.
...
@dataclass
class Params:
...
start_url: str = ""
def __post_init__(self):
query_params = f"changetime={self.start_date}..{self.end_date}&max={self.max_items_per_page}&page={self.start_page}"
self.start_url = f"{self.base_url}&{query_params}"
...
When registering the command line arguments, an instance of the Params
dataclass is created with all the fields assuming their default values.
class Command(BaseCommand):
def add_arguments(self, parser):
# Instantiate a dataclass where all fields assume their default values.
p = Params()
parser.add_argument("--start-date", default=p.start_date, help="Start date in format YYYY-mm-dd")
parser.add_argument("--end-date", default=p.end_date help="End date in format YYYY-mm-dd")
parser.add_argument("--start-page", default=p.start_page, type=int)
parser.add_argument("--max-items-per-page", default=p.max_items_per_page, type=int)
When the command is run, it instantiates an object for the dataclass using the values from the command line arguments, then passes this Params
object to instantiate the scraper class.
class Command(BaseCommand):
...
def handle(self, *args, **options):
p = Params(
options["start_date"],
options["end_date"],
options["start_page"],
options["max_items_per_page"]
)
scraper = TracScraper(p)
scraper.scrape()
The constructor for the scraper class now consists of just the Params
dataclass. It no longer needs to be concerned about defining the function signature with keyword arguments and their corresponding default values. It also doesn’t need to be concerned about the logic for building the url query string. All the variables are grouped together, including the start_url
variable derived from the command line arguments.
class TracScraper:
def __init__(self, params: Param):
self.params = params
def scrape(self):
...
Full Code
Here is the original custom command file:
from django.core.management.base import BaseCommand
BASE_URL = "https://code.djangoproject.com/query"
DATE_STR = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
MAX_ITEMS_PER_PAGE = 100
START_PAGE = 1
class Command(BaseCommand):
def add_arguments(self, parser):
parser.add_argument("--start-date", default=DATE_STR, help="Start date in format YYYY-mm-dd")
parser.add_argument("--end-date", default="", help="End date in format YYYY-mm-dd")
parser.add_argument("--start-page", default=START_PAGE, type=int)
parser.add_argument("--max-items-per-page", default=MAX_ITEMS_PER_PAGE, type=int)
def handle(self, *args, **options):
scraper = TracScraper(
options["start_date"],
options["end_date"],
options["start_page"],
options["max_items_per_page"],
)
scraper.scrape()
class TracScraper:
def __init__(self, start_date=DATE_STR, end_date=DATE_STR, start_page=START_PAGE, max_items_per_page=MAX_ITEMS_PER_PAGE):
self.start_date = start_date
self.end_date = end_date
self.start_page = start_page
self.max_items_per_page = max_items_per_page
query = f"changetime={start_date}..{end_date}&start_page={start_page}&max_items_per_page={max_items_per_page}"
self.url = f"{BASE_URL}?{query}"
def scrape(self):
...
Here is the updated file in one piece after using the dataclass:
from dataclasses import dataclass
from django.core.management.base import BaseCommand
@dataclass
class Params:
start_date: str = (datetime.now() - timedelta(days=3)).strftime("%Y-%m-%d")
end_date: str = ""
start_page: int = 1
max_items_per_page: int = 1000
base_url: str = "https://code.djangoproject.com/query"
start_url: str = ""
def __post_init__(self):
query_params = f"changetime={self.start_date}..{self.end_date}&max={self.max_items_per_page}&page={self.start_page}"
self.start_url = f"{self.base_url}&{query_params}"
class Command(BaseCommand):
def add_arguments(self, parser):
p = Params()
parser.add_argument("--start-date", default=p.start_date, help="Start date in format YYYY-mm-dd")
parser.add_argument("--end-date", default=p.end_date, help="End date in format YYYY-mm-dd")
parser.add_argument("--start-page", default=p.start_page, type=int)
parser.add_argument("--max-items-per-page", default=p.max_items_per_page, type=int)
def handle(self, *args, **options):
p = Params(
options["start_date"],
options["end_date"],
options["start_page"],
options["max_items_per_page"]
)
scraper = TracScraper(p)
scraper.scrape()
class TracScraper:
def __init__(self, params: Param):
self.params = params
def scrape(self):
...
Final Thoughts
What are your thoughts on this particular use case for grouping command line arguments into a dataclass?
I think it is interesting and it also seems like a natural use case. I like how much easier it is to create the scraper class without worrying about redefining the default values for the keyword arguments. It makes the function signature much shorter. It also reduces the likelihood of assigning the wrong default values and having them be out of sync between the scraper class and the command line argument registry.
How have you used dataclasses? What do you like about them?