Lesson 32 • Advanced
Module & Package Architecture for Large Codebases
Learn how to structure and scale large Python applications. Master the architectural patterns used by Django, FastAPI, Airflow, and enterprise teams to build maintainable codebases with thousands of files.
When your project grows beyond a few files, the MOST important factor for long-term success is structure. A great architecture makes your code:
- ✔ easier to understand
- ✔ easier to test
- ✔ easier to extend
- ✔ easier to debug
- ✔ easier to onboard new developers
- ✔ scale to thousands of files
This lesson teaches you how professional Python teams structure and scale large applications.
🔥 1. Why Python Projects Need Good Architecture
Small scripts can look like this:
app.py
utils.py
database.py
But large systems suffer without structure:
- ❌ circular imports
- ❌ duplicated logic
- ❌ unclear module responsibilities
- ❌ "god files" with 5,000–20,000 lines
- ❌ impossible navigation
- ❌ hard-to-test logic
- ❌ breaking one feature breaks everything
Large codebases need:
- modular isolation
- clear domain boundaries
- strong naming conventions
- proper folder hierarchy
- dependency direction
- package-level APIs
⚙️ 2. How Python Imports Actually Work
Understanding the import system is key.
When Python imports a module:
- It checks sys.modules and reuses the module if it is already cached
- Otherwise it searches the directories in sys.path
- It looks for a package folder OR a .py file
- It executes the module once and caches it in sys.modules
Meaning:
- ✔ imports are cached
- ✔ circular imports can fail at runtime (one module sees the other half-initialized)
- ✔ module execution on import can be expensive
Rule:
👉 Keep import side-effects to zero. (No DB connections, no heavy computation.)
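The caching behaviour described above can be observed directly. The sketch below writes a throwaway module to a temporary directory (the name `demo_cached_mod` is made up for this example), imports it twice, and checks that its top-level code ran only once until an explicit reload.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_cached_mod"  # hypothetical name, used only for this demo

# Top-level code that counts how often the module body has executed.
# importlib.reload() reuses the module's namespace, so the count survives a reload.
SOURCE = 'RUNS = globals().get("RUNS", 0) + 1\n'

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, f"{MODULE_NAME}.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        first = importlib.import_module(MODULE_NAME)   # found via sys.path, then executed
        second = importlib.import_module(MODULE_NAME)  # served from the sys.modules cache
        same_object = first is second
        in_cache = MODULE_NAME in sys.modules
        runs_after_two_imports = first.RUNS

        importlib.reload(first)  # explicitly re-executes the module body
        runs_after_reload = first.RUNS
    finally:
        # Leave the interpreter as we found it.
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(same_object, in_cache)    # True True
print(runs_after_two_imports)   # 1
print(runs_after_reload)        # 2
```

Because the body runs on the first import, anything slow or stateful placed at module level is paid for by whichever code happens to import the module first.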
📦 3. What Is a Package? (And Why It Matters)
A package is a folder containing an __init__.py.
myapp/
    __init__.py
    users/
        __init__.py
        models.py
        routes.py
Without __init__.py, Python treats the folder as a namespace package.
With it, the folder is a regular package, and __init__.py runs the first time the package is imported.
Use regular packages unless you need distributed namespace packages.
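To see the difference between the two kinds of package, the sketch below builds one folder with an `__init__.py` and one without, then imports both. The folder names `demo_regular_pkg` and `demo_namespace_pkg` are invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Regular package: the folder contains __init__.py.
    (root / "demo_regular_pkg").mkdir()
    (root / "demo_regular_pkg" / "__init__.py").write_text("LOADED = True\n")

    # Namespace package: a folder with modules but no __init__.py.
    (root / "demo_namespace_pkg").mkdir()
    (root / "demo_namespace_pkg" / "models.py").write_text("X = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        regular = importlib.import_module("demo_regular_pkg")
        namespace = importlib.import_module("demo_namespace_pkg")

        regular_file = Path(regular.__file__).name          # backed by __init__.py
        namespace_file = getattr(namespace, "__file__", None)  # no backing file
        regular_loaded = regular.LOADED                      # __init__.py was executed
    finally:
        sys.path.remove(tmp)
        for name in ("demo_regular_pkg", "demo_namespace_pkg"):
            sys.modules.pop(name, None)

print(regular_file)    # __init__.py
print(namespace_file)  # None
print(regular_loaded)  # True
```

A regular package gives you one obvious place (`__init__.py`) to define the package's public API, which is why it is the safer default.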
🧱 4. Standard Large-Scale Project Structure
Many large Python projects use a layered layout along these lines (the exact folder names vary by framework and team):
project/
    app/
        __init__.py
        config/
            __init__.py
            settings.py
        core/
            __init__.py
            exceptions.py
            interfaces.py
        domain/
            __init__.py
            users/
                __init__.py
                models.py
                services.py
            payments/
                __init__.py
                models.py
                services.py
        infrastructure/
            __init__.py
            db/
                __init__.py
                repository.py
                connection.py
            cache/
                redis_client.py
        api/
            __init__.py
            routes/
                users.py
                payments.py
    tests/
        ...
    scripts/
    docs/
This is clean because:
- domain layer contains business logic
- infrastructure contains external systems
- api layer exposes HTTP / CLI interface
- core holds shared primitives
- config holds settings
This scales to 100K+ lines.
🧩 5. The Layered Architecture (Most Common Design)
Layers:
1. API Layer
FastAPI routers, Flask routes, CLI commands.
2. Service Layer
Business logic, domain rules, coordination.
3. Data/Infrastructure Layer
Database, cache, filesystem, external APIs.
4. Core / Shared Components
Cross-cutting concerns:
- exceptions
- interfaces
- abstractions
- helpers
Benefits:
- ✔ isolates business logic
- ✔ minimizes circular imports
- ✔ pluggable infrastructure (swap DB easily)
- ✔ easier unit testing
🔌 6. Dependency Direction (The #1 Rule)
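High-level modules (business logic) must NOT depend on low-level modules (database, cache, HTTP clients).
Instead → the high-level side defines an interface, and the low-level code depends on it by implementing it.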
Example:
domain/service.py → depends on → interface
infrastructure/db_repository.py → implements interface
This enables:
- ✔ dependency injection
- ✔ testing with mock DB
- ✔ clean separation
- ✔ avoiding circular imports
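A minimal sketch of this dependency direction, assuming a payments example: the names `PaymentGateway`, `CheckoutService`, and `FakeGateway` are hypothetical. The service knows only the interface, so a test can hand it an in-memory fake instead of a real payment provider.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PaymentGateway(Protocol):
    """Interface owned by the domain layer (hypothetical name)."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


class CheckoutService:
    """High-level business logic: it knows only the interface."""

    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway

    def checkout(self, customer_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            return "rejected"  # business rule, decided without touching the gateway
        charged = self._gateway.charge(customer_id, amount_cents)
        return "paid" if charged else "declined"


@dataclass
class FakeGateway:
    """Test double standing in for a real infrastructure adapter."""

    succeed: bool = True
    calls: list = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        self.calls.append((customer_id, amount_cents))
        return self.succeed


gateway = FakeGateway()
service = CheckoutService(gateway)  # dependency injected through the constructor

print(service.checkout("cust-1", 1500))  # paid
print(service.checkout("cust-1", 0))     # rejected
print(gateway.calls)                     # [('cust-1', 1500)]
```

Nothing in `CheckoutService` imports from the infrastructure side, so the import arrows only ever point toward the domain.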
🔍 7. Avoiding Circular Imports
Circular imports happen when two modules import each other:
a.py → imports b.py
b.py → imports a.py
Fix by:
- ✔ moving shared logic into core
- ✔ using local imports inside functions
- ✔ introducing interfaces
- ✔ separating pure logic from IO
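The second fix in the list, a local import inside a function, can be sketched as follows. The script writes two pairs of throwaway modules (all `cyc_*` names are invented for this demo): one pair imports each other at the top level and fails, the other defers one import until call time and works.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# All module names below are hypothetical and exist only for this demo.
FILES = {
    # Broken pair: each module needs a name from the other at import time.
    "cyc_a_broken.py": (
        "from cyc_b_broken import helper_b\n"
        "def helper_a():\n"
        "    return 'a'\n"
    ),
    "cyc_b_broken.py": (
        "from cyc_a_broken import helper_a\n"
        "def helper_b():\n"
        "    return 'b'\n"
    ),
    # Fixed pair: b imports a only when describe() is called.
    "cyc_a_fixed.py": (
        "from cyc_b_fixed import describe\n"
        "def name():\n"
        "    return 'a'\n"
        "def run():\n"
        "    return describe()\n"
    ),
    "cyc_b_fixed.py": (
        "def describe():\n"
        "    from cyc_a_fixed import name  # local import, resolved at call time\n"
        "    return 'b uses ' + name()\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for filename, source in FILES.items():
        Path(tmp, filename).write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        try:
            importlib.import_module("cyc_a_broken")
            broken_error = None
        except ImportError as exc:
            # cyc_b_broken asked for helper_a before cyc_a_broken had defined it.
            broken_error = type(exc).__name__

        fixed_result = importlib.import_module("cyc_a_fixed").run()
    finally:
        sys.path.remove(tmp)
        for filename in FILES:
            sys.modules.pop(filename[:-3], None)

print(broken_error)  # ImportError
print(fixed_result)  # b uses a
```

A local import is a quick patch rather than a design fix: the two modules still depend on each other, so moving the shared piece into core or behind an interface is usually the better long-term answer.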
Example Fix:
Use interfaces to break circular dependencies:
# Instead of importing across layers:
# from domain.user_service import get_user
# restructure using an interface.
from typing import Protocol


class UserGetter(Protocol):
    def get_user(self, user_id: str): ...


# Now the domain depends on the interface, not on a concrete implementation.
class UserService:
    def __init__(self, user_getter: UserGetter):
        self.user_getter = user_getter

    def get_user_profile(self, user_id: str):
        return self.user_getter.get_user(user_id)

# ...
🧠 8. Internal Package APIs (__all__ and public API)
Every package should define what it exposes:
Define what your package exposes with __all__:
# __init__.py example:
# Simulating what would be in separate files
class UserService:
    def get_user(self, user_id):
        return f"User {user_id}"


class User:
    def __init__(self, name):
        self.name = name


# Export only the public API
__all__ = ["UserService", "User"]

# Usage demonstration
service = UserService()
user = User("Alice")
print(service.get_user("123"))
print(f"Created user: {user.name}")
Benefits:
- ✔ clean external imports
- ✔ marks internal details as internal (__all__ only limits "from package import *"; it does not block direct imports)
- ✔ stable API for other modules
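It helps to know exactly what `__all__` does and does not do. The sketch below builds a stand-in package in memory (the name `demo_users_pkg` and the class `InternalCache` are hypothetical) and compares a star import with an explicit import.

```python
import sys
import types

# Build a stand-in for a package's __init__.py in memory.
pkg = types.ModuleType("demo_users_pkg")
exec(
    """
__all__ = ["UserService"]

class UserService:
    pass

class InternalCache:  # deliberately not listed in __all__
    pass
""",
    pkg.__dict__,
)
sys.modules["demo_users_pkg"] = pkg

try:
    # A star import copies only the names listed in __all__.
    star_ns = {}
    exec("from demo_users_pkg import *", star_ns)
    star_names = sorted(n for n in star_ns if not n.startswith("__"))

    # An explicit import still reaches names that are not in __all__.
    explicit_ns = {}
    exec("from demo_users_pkg import InternalCache", explicit_ns)
    explicit_ok = "InternalCache" in explicit_ns
finally:
    del sys.modules["demo_users_pkg"]

print(star_names)   # ['UserService']
print(explicit_ok)  # True
```

So `__all__` documents the public surface and controls star imports, while keeping other modules away from internals remains a matter of convention and code review.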
🚀 9. Configuration Architecture
NEVER scatter config constants across files.
Use:
app/config/settings.py
Support overrides:
- settings_local.py
- .env files
- environment variables
Avoid hard-coding secrets, URLs, DB credentials.
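One way to keep configuration in a single place is a small typed settings object built from the environment. This is a sketch under assumptions, not a standard API: `Settings`, `load_settings`, and `env_bool` are hypothetical names, and the variable names are only illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean flag, accepting common spellings such as 'true' or '1'."""
    raw = env.get(key)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    """Immutable settings object; the rest of the app reads from this only."""

    debug: bool
    database_url: str
    enable_caching: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Passing the mapping in makes the loader easy to test without
    # touching the real process environment.
    return Settings(
        debug=env_bool(env, "DEBUG", False),
        database_url=env.get("DATABASE_URL", "sqlite:///db.sqlite3"),
        enable_caching=env_bool(env, "ENABLE_CACHING", False),
    )


settings = load_settings({"DEBUG": "true", "DATABASE_URL": "postgresql://localhost/app"})
print(settings.debug)           # True
print(settings.database_url)    # postgresql://localhost/app
print(settings.enable_caching)  # False
```

Parsing booleans through one helper avoids the trap where `DEBUG=true` or `DEBUG=1` silently counts as false because the code compared against the exact string "True".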
Centralized config with environment variables:
# config/settings.py
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent
DEBUG = os.getenv("DEBUG", "False") == "True"
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///db.sqlite3")
SECRET_KEY = os.getenv("SECRET_KEY", "change-me-in-production")

# Feature flags
ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "True") == "True"
ENABLE_CACHING = os.getenv("ENABLE_CACHING", "False") == "True"

print(f"DEBUG mode: {DEBUG}")
print(f"Database: {DATABASE_URL}")
# ...
📚 10. Module Naming Conventions (Professional Standard)
Use:
- models.py: classes representing data
- services.py: business logic
- repository.py: DB access
- routes.py: API routes
- tasks.py: background jobs
- exceptions.py: error definitions
- utils.py: ONLY for generic helpers
Avoid:
- ❌ utils2.py
- ❌ helpers_mixed.py
- ❌ random_functions.py
Consistency wins.
🧪 11. Testing Architecture
Tests mirror the app structure:
tests/
    domain/
        test_users.py
    api/
        test_routes.py
    infrastructure/
        test_repository.py
Use factories for test data.
Separate unit and integration tests.
Principles:
- domain tested heavily with unit tests
- infrastructure tested with mocks
- API tested with integration tests
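As a small illustration of "factories for test data" combined with a pure domain unit test, here is a sketch using only the standard library. The `User` model, the `can_log_in` rule, and the `make_user` factory are hypothetical examples.

```python
import io
import unittest
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    active: bool = True


def can_log_in(user: User) -> bool:
    """Domain rule under test: only active users with an email may log in."""
    return user.active and "@" in user.email


def make_user(**overrides) -> User:
    """Factory: sensible defaults, overridden only where a test cares."""
    defaults = {"name": "Alice", "email": "alice@example.com", "active": True}
    defaults.update(overrides)
    return User(**defaults)


class CanLogInTests(unittest.TestCase):
    def test_active_user_can_log_in(self):
        self.assertTrue(can_log_in(make_user()))

    def test_inactive_user_cannot_log_in(self):
        self.assertFalse(can_log_in(make_user(active=False)))


# Run the tests programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CanLogInTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # 2 True
```

Each test states only the attribute it cares about (`active=False`), which keeps tests short and stops them from breaking when the model gains a new field.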
Part 2: Domain-Driven Design & Advanced Patterns
🔥 12. Domain-Driven Design (DDD) in Python
DDD is a system-design method focused on modelling business rules, not framework limitations.
Its main idea:
- ✔ Code structure mirrors business structure
- ✔ Each domain is isolated
- ✔ Logic belongs to the domain, not to API or DB layers
- ✔ Domain objects represent real-world concepts
Example domain structure:
domain/
    users/
        models.py
        services.py
        validators.py
    payments/
        models.py
        services.py
        policies.py
    inventory/
        models.py
        rules.py
Why DDD fits Python:
- Python's dynamic nature simplifies domain modelling
- Dataclasses + type hints make models clean
- Package isolation prevents circular imports
- Encourages clean business rules without tech dependencies
Goal: Domains should NOT depend on frameworks or external APIs. Dependencies point inward: the API and infrastructure layers depend on the domain, never the reverse.
⚙️ 13. Domain Layer Responsibilities
The domain layer should contain:
✔ Entities
Objects that have identity across time (e.g., User, Order).
✔ Value Objects
Objects identified by value, not identity (e.g., Price, Email, Coordinates).
✔ Domain Services
Logic that doesn't naturally belong to any one entity.
✔ Policies & Rules
Business validation, decision logic.
✔ Domain Exceptions
Errors specific to the domain.
✔ Events
e.g., UserRegistered, PaymentCompleted
What domain must NOT contain:
- ❌ database code
- ❌ API framework code
- ❌ external library calls
- ❌ logging / caching
- ❌ infrastructure details
This keeps the code clean, portable, and testable.
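The difference between an entity and a value object is easiest to see in code. The sketch below uses dataclasses; `Email`, `User`, and `InvalidEmail` are illustrative names, and the validation rule is deliberately simplistic.

```python
from dataclasses import dataclass


class InvalidEmail(ValueError):
    """Domain exception: raised when an email address breaks a domain rule."""


@dataclass(frozen=True)
class Email:
    """Value object: immutable and compared by value."""

    address: str

    def __post_init__(self) -> None:
        if "@" not in self.address:
            raise InvalidEmail(self.address)


@dataclass(eq=False)
class User:
    """Entity: compared by identity (user_id), not by its current attributes."""

    user_id: int
    email: Email

    def __eq__(self, other: object) -> bool:
        return isinstance(other, User) and other.user_id == self.user_id

    def __hash__(self) -> int:
        return hash(self.user_id)


# Two value objects with the same content are interchangeable.
print(Email("a@example.com") == Email("a@example.com"))  # True

# The same entity stays "the same user" even after its email changes.
before = User(1, Email("old@example.com"))
after = User(1, Email("new@example.com"))
print(before == after)  # True

try:
    Email("not-an-email")
except InvalidEmail:
    print("rejected")  # rejected
```

None of this code touches a database or a web framework, which is what lets the domain layer be tested with plain unit tests.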
🧱 14. Infrastructure Layer (DB, Cache, External Systems)
Infrastructure is where all "real-world" systems live:
infrastructure/
    db/
        repository.py
        connection.py
    cache/
        redis_client.py
    external/
        payment_gateway.py
        sms_service.py
Responsibilities:
- actual SQL queries
- ORM models
- Redis connections
- HTTP clients
- integrations with Stripe, AWS, etc.
Must NOT:
- ❌ contain business rules
- ❌ call domain services
Instead → infrastructure implements interfaces defined in the domain.
Example interface:
Repository pattern: infrastructure implements domain interfaces.
# domain/users/interfaces.py
from typing import Protocol


class UserRepository(Protocol):
    def save(self, user): ...
    def find_by_email(self, email: str): ...


# Infrastructure implementation:
# infrastructure/db/user_repository.py
class SQLUserRepository:
    """Implements UserRepository interface"""

    def save(self, user):
        # actual SQL logic
        print(f"Saving user to database: {user}")

    def find_by_email(self, email: str):
        # actual SQL query
        ...
🧩 15. The API Layer
This is the "edge" of the app. Usually contains:
api/
    routes/
        users.py
        payments.py
    schemas.py
    dependencies.py
Framework examples:
- Flask blueprints
- FastAPI routers
- Django views
- CLI commands (Click, Typer)
- Websocket handlers
Responsibilities:
- ✔ parsing requests
- ✔ converting domain errors → HTTP codes
- ✔ authentication
- ✔ response formatting
Must NOT:
- ❌ contain business decisions
- ❌ contain SQL queries
- ❌ talk directly to infrastructure
- ❌ hold domain logic
API should talk ONLY to domain services.
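To make "converting domain errors → HTTP codes" concrete, here is a framework-free sketch. The exception classes, the `ERROR_STATUS` table, and the `handle` helper are all hypothetical; a real project would plug the same idea into its framework's own error-handling hooks.

```python
from typing import Callable, Dict, Tuple


class UserNotFound(Exception):
    """Domain error: no user with the requested id."""


class EmailAlreadyUsed(Exception):
    """Domain error: registration with a duplicate email."""


# The API layer owns the mapping from domain errors to HTTP status codes.
ERROR_STATUS = {UserNotFound: 404, EmailAlreadyUsed: 409}


def get_user_profile(user_id: str) -> Dict[str, str]:
    """Stand-in for a domain service; it knows nothing about HTTP."""
    if user_id != "42":
        raise UserNotFound(user_id)
    return {"id": user_id, "name": "Alice"}


def handle(service_call: Callable[[], dict]) -> Tuple[int, dict]:
    """Edge of the app: call the domain, translate the outcome into a response."""
    try:
        return 200, service_call()
    except tuple(ERROR_STATUS) as exc:
        status = ERROR_STATUS.get(type(exc), 500)
        return status, {"error": type(exc).__name__}


print(handle(lambda: get_user_profile("42")))  # (200, {'id': '42', 'name': 'Alice'})
print(handle(lambda: get_user_profile("7")))   # (404, {'error': 'UserNotFound'})
```

The domain service raises a meaningful exception and stays unaware of status codes, so the same service can sit behind a CLI command or a background job without changes.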
Part 3: Real-World Architectures
🔥 16. The Three Master Architectures for Large Python Projects
There are 3 real-world architectures used by engineering teams once codebases reach 50K+ lines:
✔ 1. Layered Architecture (most common)
api/
domain/
infrastructure/
core/
Clear vertical layers with strict dependency rules.
✔ 2. Clean/Hexagonal Architecture (enterprise-grade)
domain/
    models/
    services/
    events/
adapters/
    db/
    cache/
    external/
application/
api/
Domain is isolated and stable. Adapters wrap external systems. Application orchestrates flows.
✔ 3. Plugin / Modular Monolith (like Django)
users/
payments/
inventory/
notifications/
analytics/
Each feature is an "app" with its own mini-architecture inside.
This is the architecture used by:
- Django
- Airflow
- Open edX
- Odoo
- Many enterprise monoliths
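A plugin-style monolith usually needs some way to load features from a list of names. The sketch below shows one possible shape using `importlib.import_module`. The feature modules are created in memory so the example runs on its own, and the `register` hook and `INSTALLED_FEATURES` list are assumptions for illustration, not any framework's actual API.

```python
import importlib
import sys
import types


def _make_feature(name: str, routes: list) -> None:
    """Create an in-memory module standing in for a feature package on disk."""
    module = types.ModuleType(name)

    def register(registry: dict) -> None:
        # Each feature decides for itself what it contributes to the app.
        registry[name] = routes

    module.register = register
    sys.modules[name] = module


_make_feature("demo_users", ["/users", "/users/{id}"])
_make_feature("demo_payments", ["/payments"])

# The host application only knows feature names, not their internals.
INSTALLED_FEATURES = ["demo_users", "demo_payments"]


def load_features(names: list) -> dict:
    registry: dict = {}
    for name in names:
        module = importlib.import_module(name)  # import by name at runtime
        module.register(registry)               # the feature wires itself in
    return registry


registry = load_features(INSTALLED_FEATURES)
print(registry)
# {'demo_users': ['/users', '/users/{id}'], 'demo_payments': ['/payments']}
```

Removing a feature then means deleting one name from the list, which is the property that lets separate teams own separate apps.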
⚙️ 17. How Django Organises Massive Codebases
Django uses a modular app structure:
project/
    settings/
    core/
    users/
    payments/
    api/
    dashboard/
Each "app" has:
- models.py
- views.py
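- services.py (a common team convention, not generated by Django)
- signals.py (optional, added by hand when the app uses signals)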
- admin.py
Benefits:
- ✓ isolation
- ✓ easy testing
- ✓ independent teams
- ✓ plugin marketplace (reusable apps)
Why this matters: If you organise your project like this, it can grow to 200+ pages and thousands of functions while each app stays small enough to understand.
🔥 18. How FastAPI Organises Modern Backend Projects
FastAPI does not prescribe a project layout, but larger FastAPI projects commonly use a clean, layered one like this:
app/
    main.py
    api/
        v1/
            routes/
            schemas/
    services/
    repositories/
    models/
    core/
        config.py
        security.py
        events.py
    db/
        session.py
        migrations/
Strengths:
- Fast startup
- Async-first
- Domain + repository pattern
- Event-driven hooks
Perfect for scalable SaaS backends.
🎉 Final Conclusion
Across the three parts, you've now learned:
- ✔ Clean architecture
- ✔ Domain-driven design
- ✔ Layered module organization
- ✔ Plugin-based modular monolith
- ✔ Event-driven communication
- ✔ Dependency inversion
- ✔ Container-based dependency wiring
- ✔ API versioning for long-term stability
- ✔ Scaling to hundreds of modules
- ✔ How real companies structure Python systems
You now have the knowledge to architect and scale Python applications from startup MVPs to enterprise systems handling millions of users.
📋 Quick Reference — Module Architecture
| Concept | What it means |
|---|---|
| __init__.py | Makes a directory a Python package |
| __all__ = [...] | Control what's exported from a module |
| from . import module | Relative import within a package |
| importlib.import_module() | Dynamic import at runtime |
| src/ layout | Best-practice project structure |
🎉 Great work! You've completed this lesson.
You can now architect large Python codebases with proper packages, relative imports, and clean module boundaries.