Lesson 37 • Advanced
Automation & Scripting for DevOps and System Tasks
Master Python automation for infrastructure management, deployment pipelines, monitoring, backups, and production system orchestration
What You'll Learn
- File and directory automation for logs, backups, and cleanup
- System command execution with subprocess
- Task scheduling and cron alternatives
- Server health monitoring and metrics collection
- API and webhook automation for CI/CD
- Docker container automation and management
- Kubernetes deployment automation
- Automated backups and data rotation
- Log processing and real-time monitoring
- Zero-downtime deployment scripts
- Infrastructure-as-Code patterns
- Building custom orchestration tools
Why Python for DevOps Automation?
Python is one of the most widely used languages for DevOps automation, and teams often adopt it in place of long shell scripts because it is easier to test, maintain, and handle errors in.
| Feature | Bash Scripts | Python Automation |
|---|---|---|
| Error handling | Cryptic exit codes | try/except with clear messages |
| Cross-platform | Mostly Unix-like systems (Linux, macOS) | Runs on Linux, macOS, and Windows |
| API integration | Relies on curl and text parsing | Libraries such as requests and boto3 |
| Maintainability | Hard to read at scale | Clean, testable code |
Common Use Cases
- Deployment automation and orchestration
- Server provisioning and configuration
- Backup and disaster recovery
- Log aggregation and analysis
- Monitoring and alerting
- Infrastructure health checks
- Secret rotation and security hardening
File & Directory Automation
Every DevOps workflow involves managing files: rotating logs, cleaning temporary data, synchronizing directories, and organizing backups.
Common Tasks
- Cleanup - Remove old temporary files and logs
- Log rotation - Compress and archive logs when they exceed size limits
- Directory sync - Keep backup directories in sync
- Backup management - Create and rotate backups automatically
- File monitoring - Watch for changes and trigger actions
Real-World Example
A production CI server runs a cleanup script every hour to remove build artifacts older than 7 days, preventing disk space exhaustion. This same pattern applies to log management, cache cleanup, and temporary file handling.
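The cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not a production script: the `dry_run` flag and the choice to skip symlinks are assumptions added here so the function can be tried safely before it deletes anything.

```python
import time
from pathlib import Path


def cleanup_old_files(directory: str, days: int = 7, dry_run: bool = True) -> list[Path]:
    """Return files not modified in the last `days` days, deleting them unless dry_run."""
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    stale = []
    for path in Path(directory).rglob("*"):
        # Only regular files are candidates; directories and symlinks are left alone.
        if path.is_file() and not path.is_symlink() and path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale
```

Running it once with `dry_run=True` and logging the returned paths is a cheap way to check the age threshold before switching to real deletion.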
System Command Execution
The subprocess module provides safe, controlled execution of system commands with proper error handling and timeout management.
Best Practices
- Always use lists - ["ls", "-la"] not "ls -la"
- Set timeouts - Prevent hanging on unresponsive commands
- Capture output - Capture stdout/stderr for logging and debugging
- Check return codes - Non-zero means failure
- Avoid shell=True - Without a shell, untrusted input cannot be interpreted as shell syntax
Common Operations
- Restarting systemd services
- Checking service status
- Running Docker and Kubernetes commands
- Executing build and deployment scripts
- Managing SSH connections
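A small wrapper can combine the subprocess best practices listed above. The sketch below is one possible shape; the name `run_command` and the decision to convert failures into `RuntimeError` are illustrative choices, not something the lesson prescribes.

```python
import subprocess
import sys


def run_command(cmd: list[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `cmd` (a list, never a shell string) and raise RuntimeError on failure."""
    try:
        return subprocess.run(
            cmd,
            capture_output=True,  # keep stdout/stderr for logging and debugging
            text=True,            # decode output to str
            timeout=timeout,      # do not hang on unresponsive commands
            check=True,           # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"{cmd[0]} timed out after {timeout}s") from None
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"{cmd[0]} exited with {exc.returncode}: {exc.stderr.strip()}"
        ) from exc


if __name__ == "__main__":
    # Uses the current interpreter so the demo works on any platform.
    print(run_command([sys.executable, "--version"]).stdout.strip())
```

On a systemd host, a call such as `run_command(["systemctl", "restart", "nginx"])` would follow the same pattern; the service name there is only an example.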
Task Scheduling
Cron works well for simple recurring jobs, but it has no built-in retries, dependency handling, or result tracking. Python offers more flexible alternatives.
| Tool | Best For | Complexity |
|---|---|---|
| Cron | Simple, standalone recurring scripts | Low |
| APScheduler | In-process scheduling | Medium |
| Celery Beat | Distributed, high-volume | High |
Scheduling Options
Traditional Cron
0 2 * * * /usr/bin/python3 /scripts/backup.py
Simple but limited
APScheduler (Python)
More flexible: interval, date, and cron-style triggers, thread or process pool executors, event listeners, and persistent job stores
Celery Beat
The periodic-task scheduler for Celery, a distributed task queue; suited to jobs that must run across multiple workers
Typical Scheduled Tasks
- Daily database backups at 2 AM
- Log rotation every 6 hours
- Health checks every 5 minutes
- Cleanup scripts at midnight
- Certificate renewal checks weekly
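To make the "daily at 2 AM" case concrete without depending on any scheduler library, the sketch below computes the next run time using only the standard library. It assumes naive local datetimes and ignores daylight-saving transitions; `next_daily_run` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def seconds_until(now: datetime, target: datetime) -> float:
    """How long a simple runner loop would sleep before the next run."""
    return (target - now).total_seconds()


if __name__ == "__main__":
    now = datetime.now()
    run_at = next_daily_run(now, hour=2)
    print(f"Next backup at {run_at}, in {seconds_until(now, run_at):.0f}s")
```

A library such as APScheduler or Celery Beat handles this bookkeeping, plus persistence and concurrency, so a hand-rolled loop like this is mainly useful for understanding what those tools do.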
Server Health Monitoring
Proactive monitoring prevents outages. Python can track system resources and alert teams before problems escalate.
Metrics to Monitor
- CPU usage - Alert on sustained high usage
- Memory consumption - Prevent OOM kills
- Disk space - Alert before running out
- Network I/O - Detect unusual traffic patterns
- Process health - Ensure critical services are running
- System uptime - Track stability
The psutil Library
psutil is the standard for cross-platform system monitoring in Python:
pip install psutil
Provides CPU, memory, disk, network, and process information on Linux, macOS, and Windows.
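A basic health check might sample a few psutil metrics and compare them to thresholds. In the sketch below the threshold values, the function names, and the choice of "/" as the disk path are all assumptions for illustration; psutil is imported inside `collect_metrics` so the threshold logic can be exercised without it installed.

```python
# Hypothetical alert thresholds, in percent; tune these per host.
DEFAULT_THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}


def collect_metrics() -> dict[str, float]:
    """Sample CPU, memory, and disk usage as percentages via psutil."""
    import psutil  # third-party: pip install psutil

    return {
        "cpu": psutil.cpu_percent(interval=1),      # averaged over one second
        "memory": psutil.virtual_memory().percent,
        "disk": psutil.disk_usage("/").percent,     # "/" is an assumed mount point
    }


def find_alerts(metrics: dict[str, float],
                thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> list[str]:
    """Return one message per metric that is at or above its threshold."""
    return [
        f"{name} at {value:.1f}% (threshold {thresholds[name]:.1f}%)"
        for name, value in metrics.items()
        if name in thresholds and value >= thresholds[name]
    ]


if __name__ == "__main__":
    for message in find_alerts(collect_metrics()):
        print("ALERT:", message)
```

A single sample can be noisy; to alert only on sustained high usage, a real check would typically require several consecutive samples above the threshold.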
API & Webhook Automation
Modern infrastructure is API-driven. Python integrates seamlessly with CI/CD systems, monitoring tools, and cloud platforms.
Common Integrations
- CI/CD triggers - GitHub Actions, GitLab CI, Jenkins
- Alerting - Slack, PagerDuty, Discord webhooks
- Monitoring - Datadog, Prometheus, Grafana APIs
- Cloud providers - AWS, GCP, Azure management APIs
- Container registries - Docker Hub, ECR, GCR
Automation Patterns
Event-driven deployment
Git push → trigger pipeline → deploy
Automated alerting
High CPU → send Slack alert → scale infrastructure
Self-healing systems
Service down → restart automatically → notify team
Docker Automation
The Docker Python SDK enables comprehensive container lifecycle management from within Python scripts.
Installation
pip install docker
Automation Tasks
- Cleanup - Remove stopped containers and dangling images
- Health checks - Monitor container health status
- Auto-restart - Restart unhealthy containers
- Log collection - Aggregate logs from all containers
- Image management - Build, tag, and push images
- Resource limits - Monitor and enforce CPU/memory limits
Production Use Case
A maintenance script runs nightly to clean up stopped containers and dangling images, preventing disk space issues. It also restarts any containers marked as unhealthy by Docker's health checks.
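The nightly maintenance job described above could look something like the following sketch. It assumes `client` is a `docker.DockerClient` (for example from `docker.from_env()`) and that the Docker daemon honours the `health` filter, as `docker ps` does; the function name and the shape of the returned report are illustrative.

```python
def nightly_maintenance(client) -> dict:
    """Prune stopped containers and dangling images, then restart unhealthy containers.

    `client` is expected to behave like docker.DockerClient.
    """
    pruned_containers = client.containers.prune()
    pruned_images = client.images.prune(filters={"dangling": True})

    restarted = []
    for container in client.containers.list(filters={"health": "unhealthy"}):
        container.restart()
        restarted.append(container.name)

    return {
        # The API may report None instead of an empty list when nothing was removed.
        "containers_removed": len(pruned_containers.get("ContainersDeleted") or []),
        "images_removed": len(pruned_images.get("ImagesDeleted") or []),
        "restarted": restarted,
    }


if __name__ == "__main__":
    import docker  # third-party: pip install docker

    print(nightly_maintenance(docker.from_env()))
```

Passing the client in as an argument keeps the function testable with a stand-in object and avoids connecting to the daemon at import time.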
Kubernetes Automation
The Kubernetes Python client allows programmatic cluster management, enabling GitOps-style automation.
Installation
pip install kubernetes
Automation Capabilities
- Deployment management - Scale, update, rollback deployments
- Pod operations - List, inspect, delete pods
- ConfigMap/Secret updates - Rotate configurations safely
- Health monitoring - Check pod and node health
- Auto-scaling - Adjust replicas based on metrics
- Resource cleanup - Remove completed jobs and old pods
Advanced Patterns
- Blue/green deployments - maintain two production environments
- Canary releases - gradually roll out changes to a subset of users
- Automatic rollback - revert on health check failure
- Multi-cluster management - orchestrate across regions
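Metric-based scaling from the capabilities list can be sketched in two parts: deciding a replica count, then applying it. The calculation below follows the proportional rule used by the Horizontal Pod Autoscaler (current replicas times the ratio of observed to target metric, rounded up); the clamping bounds and function names are assumptions. `apps_api` is expected to be a `kubernetes.client.AppsV1Api` instance.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale proportionally to metric/target, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


def scale_deployment(apps_api, name: str, namespace: str, replicas: int) -> None:
    """Patch the deployment's scale subresource to the requested replica count."""
    apps_api.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )
```

With the official client, a caller would typically run `kubernetes.config.load_kube_config()` first and pass `kubernetes.client.AppsV1Api()` as `apps_api`; the deployment name and namespace are whatever your cluster uses.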
Backup Automation & Data Rotation
Regular, automated backups are essential for disaster recovery. Python orchestrates the entire backup lifecycle.
Backup Strategy
- Database dumps - MySQL, PostgreSQL, MongoDB
- File system backups - Compress and archive directories
- Cloud sync - Upload to S3, Google Cloud Storage, Azure Blob
- Rotation policy - Keep daily (7 days), weekly (4 weeks), monthly (12 months)
- Verification - Test restore capability periodically
- Encryption - Encrypt backups before storage
3-2-1 Backup Rule
3 copies of data • 2 different media types • 1 offsite copy
Python scripts can implement this automatically: local disk, network storage, cloud backup.
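The daily/weekly/monthly rotation policy from the strategy list can be expressed as a function that decides which backups to keep. This sketch makes several simplifying assumptions: each backup is identified by its date, "4 weeks" means 28 days, "12 months" means 365 days, and the newest backup in each ISO week or calendar month is the one retained.

```python
from datetime import date


def backups_to_keep(backup_dates: list[date], today: date) -> set[date]:
    """Keep every backup from the last 7 days, the newest per ISO week for
    28 days, and the newest per calendar month for 365 days."""
    keep: set[date] = set()
    weekly: dict[tuple[int, int], date] = {}   # (ISO year, ISO week) -> newest backup
    monthly: dict[tuple[int, int], date] = {}  # (year, month) -> newest backup

    for day in sorted(backup_dates):  # ascending, so later entries overwrite older ones
        age = (today - day).days
        if age < 0:
            keep.add(day)  # never delete backups dated in the future
            continue
        if age < 7:
            keep.add(day)
        if age < 28:
            iso = day.isocalendar()
            weekly[(iso[0], iso[1])] = day
        if age < 365:
            monthly[(day.year, day.month)] = day

    keep.update(weekly.values())
    keep.update(monthly.values())
    return keep
```

Anything in `backup_dates` that is not in the returned set is a candidate for deletion; computing the keep set first, rather than deleting as you go, makes the policy easy to review and test.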
Log Processing & Automated Alerts
Logs contain critical information about system health, security events, and errors. Automated analysis prevents issues from going unnoticed.
Log Analysis Tasks
- Error detection - Count and categorize errors
- Pattern matching - Find security threats or anomalies
- Performance analysis - Identify slow queries and requests
- Real-time monitoring - Tail logs and alert immediately
- Aggregation - Combine logs from multiple services
- Visualization - Generate reports and dashboards
Alert Triggers
- Error threshold - Alert when error rate exceeds 1%
- Security events - Failed login attempts, suspicious patterns
- Performance degradation - Response time above threshold
- Service crashes - Application or container restarts
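The error-threshold trigger can be illustrated with a small log summarizer. The sketch assumes each log line contains a standard level name (DEBUG, INFO, WARNING, ERROR, CRITICAL) as a separate word; real log formats vary, so the pattern is a placeholder to adapt.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log format: the level name appears as a standalone word in each line.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def summarize_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per level; lines without a recognizable level are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def error_rate(counts: Counter) -> float:
    """Fraction of recognized lines that are ERROR or CRITICAL."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["ERROR"] + counts["CRITICAL"]) / total


def should_alert(counts: Counter, threshold: float = 0.01) -> bool:
    """True when the error rate exceeds the threshold (1% by default)."""
    return error_rate(counts) > threshold
```

Because `summarize_levels` accepts any iterable of strings, it can be fed an open file object directly, which avoids loading large logs into memory.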
Zero-Downtime Deployment
Production deployments must minimize or eliminate downtime. Python orchestrates sophisticated deployment strategies.
Deployment Pipeline
1. Pull latest code from Git
2. Run test suite - abort on failure
3. Build Docker image
4. Push to container registry
5. Update Kubernetes deployment
6. Wait for health checks to pass
7. Rollback automatically if unhealthy
8. Send deployment notification
Safety Mechanisms
- Rolling updates - replace pods gradually
- Health checks - verify each new pod before proceeding
- Automatic rollback on failure
- Smoke tests after deployment
- Traffic shifting strategies
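The pipeline steps above can be modelled as an orchestration function that takes each stage as a callable. This is a structural sketch only: the parameter names are hypothetical, and the actual work (git, tests, image build, cluster update, health checks) is supplied by the caller. The key design point is that failures before the release simply abort, while failures after it trigger a rollback.

```python
from typing import Callable, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_pipeline(pre_deploy: Sequence[Step],
                 deploy: Callable[[], None],
                 verify: Callable[[], None],
                 rollback: Callable[[], None],
                 notify: Callable[[str], None]) -> bool:
    """Run pre-deploy steps, deploy, verify; roll back if deploy or verify fails."""
    # Pull, test, build, push: nothing is live yet, so a failure just aborts.
    for name, action in pre_deploy:
        try:
            action()
        except Exception as exc:
            notify(f"Aborted at '{name}': {exc}")
            return False

    # Update the deployment and wait for health checks; undo on any failure.
    try:
        deploy()
        verify()
    except Exception as exc:
        rollback()
        notify(f"Deployment rolled back: {exc}")
        return False

    notify("Deployment succeeded")
    return True
```

Each callable is expected to raise an exception on failure, which fits naturally with `subprocess.run(..., check=True)` for the command-line stages.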
Security Considerations
Automation scripts often run with elevated privileges. Security must be a top priority.
Best Practices
- Store secrets in environment variables or secret managers
- Never hardcode passwords, API keys, or tokens
- Use least-privilege principles for automation accounts
- Validate all inputs before executing system commands
- Avoid shell=True in subprocess calls
- Keep logs free of sensitive information
- Implement audit trails for all automation actions
- Use encrypted connections for remote operations
Security Anti-Patterns
- Hardcoding credentials in scripts
- Running automation as root unnecessarily
- Accepting user input without validation
- Logging sensitive data
- Storing backups without encryption
- Ignoring certificate validation
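Two of the practices above, reading secrets from the environment and validating input before it reaches a system command, might be sketched like this. The allowed-character pattern for service names and the helper names are assumptions chosen for illustration; adjust the pattern to whatever your inputs legitimately contain.

```python
import os
import re

# Assumed allowlist: letters, digits, and a few punctuation characters,
# and never a leading "-" that could be mistaken for a command-line option.
_SERVICE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.@-]*")


def get_secret(name: str) -> str:
    """Read a secret from the environment instead of hardcoding it."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name, never a value.
        raise RuntimeError(f"Required secret {name} is not set")
    return value


def restart_command(service: str) -> list[str]:
    """Build an argument list for restarting a service, rejecting unsafe names."""
    if not _SERVICE_NAME.fullmatch(service):
        raise ValueError(f"Invalid service name: {service!r}")
    return ["systemctl", "restart", service]
```

Returning an argument list rather than a string means the result can be passed straight to `subprocess.run` without `shell=True`.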
Building Production-Ready Automation
Professional automation systems require more than working code. They need reliability, observability, and maintainability.
Essential Components
- Logging - Comprehensive, structured logs with context
- Error handling - Graceful failure and recovery
- Monitoring - Track automation success/failure rates
- Documentation - Clear runbooks and troubleshooting guides
- Testing - Unit and integration tests for automation logic
- Version control - Git for all automation scripts
- Idempotency - Scripts can run multiple times safely
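Idempotency is easiest to see in a small example. The hypothetical helper below adds a line to a configuration file only if it is missing, so running it once or ten times leaves the file in the same state, and the return value tells the caller whether anything changed.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to the file unless it is already present.

    Returns True if the file was modified, False if no change was needed.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    return True
```

The same check-then-act shape applies to creating directories, users, or cloud resources: describe the desired end state, and do nothing when it already holds.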
The DevOps Loop
Write automation → Test thoroughly → Deploy → Monitor → Learn from failures → Improve → Repeat
Every automation failure is an opportunity to make the system more resilient.
Complete DevOps Automation Examples
Explore comprehensive automation scripts for file management, system monitoring, Docker, Kubernetes, and more
# DevOps Automation & Scripting Examples
# ============================================
# 1. FILE & DIRECTORY AUTOMATION
# ============================================
import os
import shutil
import time
from pathlib import Path
from typing import List
def cleanup_old_files(directory: str, days: int = 7):
    """Remove files older than specified days"""
    cutoff_time = time.time() - (days * 86400)
    for file_path in Path(directory).rglob('*'):
        if file_path.is_file() and file_
...
Key Takeaways
- Python is widely used for DevOps automation thanks to cross-platform support and rich libraries
- Automate repetitive tasks: file cleanup, backups, deployments, monitoring
- Use subprocess safely with timeouts and proper error handling
- Modern scheduling goes beyond cron - build intelligent task runners
- Monitor system health proactively with psutil and automated alerts
- Docker and Kubernetes Python SDKs enable comprehensive container orchestration
- Implement zero-downtime deployments with health checks and automatic rollback
- Security is critical - never hardcode secrets, validate inputs, use least privilege
- Production automation requires logging, monitoring, testing, and documentation
- Build self-healing systems that detect and correct issues automatically
Quick Reference - DevOps Automation
| Tool / Module | What it does |
|---|---|
| pathlib.Path | Modern file and directory manipulation |
| subprocess.run(cmd, check=True) | Run external commands; raises CalledProcessError on a non-zero exit |
| shutil.copy2 / shutil.rmtree | High-level file operations |
| docker SDK | Manage Docker containers from Python |
| psutil | Monitor CPU, memory, and processes |
Great work! You've completed this lesson.
You can now automate deployments, manage infrastructure, and build self-healing systems using Python's DevOps toolkit.
Up next: Language Integration - call C and Rust code from Python for maximum performance.