10 tickets to migrate from CSV-based processing to a fully automated SAP-integrated AWS pipeline.
Tech Lead: Architecture decisions, core business logic (processor, ETL, comparison engine), security, SAP team coordination, code review, cutover calls.
Full-Stack: Lambda functions (ingestion & ETL), backend API endpoints, React frontend (dashboards, auth UI), validation scripts, integration tests.
DevOps: CDK infrastructure (VPC, S3, DynamoDB, ECS, CloudFront), GitHub Actions CI/CD, Docker, CloudWatch monitoring, IAM, production runbook.

T1 Infra + CI/CD

| Task | Owner | Rationale |
|---|---|---|
| Design VPC (public/private subnets, NAT, security groups) | Tech Lead | Architecture decision |
| CDK project scaffolding (infra/, app.py, base stacks) | DevOps | Sets IaC patterns for the project |
| Secrets Manager entries (SAP API creds, JWT secret) | Tech Lead | Security-sensitive |
| GitHub Actions pipeline: Infrastructure (cdk diff → deploy) | DevOps | CI/CD ownership |
| GitHub Actions pipeline: Backend (Docker → ECR → ECS) | DevOps | CI/CD ownership |
| GitHub Actions pipeline: Frontend (build → S3 → CloudFront) | DevOps | CI/CD ownership |
| GitHub Actions pipeline: Lambda/ETL (package → deploy) | DevOps | CI/CD ownership |
| ECR repository CDK construct | DevOps | Container registry infra |

`cdk deploy` creates VPC + networking. 4 CI/CD pipelines run on push. Secrets stored.

T2 Storage Layer

| Task | Owner | Rationale |
|---|---|---|
| S3 buckets CDK (6 buckets, lifecycle policies, SSE-KMS encryption) | DevOps | Infrastructure |
| DynamoDB tables CDK (4 tables, on-demand billing, schemas) | DevOps | Infrastructure |
| S3 Object Lock for audit bucket (COMPLIANCE mode, 7-year) | Tech Lead | Compliance-critical config |
| Cross-account S3 bucket policy (SD-WAN account → IOR account read) | DevOps | Per meeting decision (Mar 26) |
| IAM roles: Lambda execution, ECS task role, S3 access policies | Tech Lead | Security-sensitive |

T3 Ingestion Lambdas

| Task | Owner | Rationale |
|---|---|---|
| Write ingest-product-weights Lambda template (ZFMIOR002 → S3) | Tech Lead | First Lambda = the pattern |
| SQS Dead Letter Queue + 3x retry → SNS alert | DevOps | Reliability infrastructure |
| EventBridge schedules (daily for ZEMM07/ZSDR002, weekly for rest) | DevOps | Scheduling infra |
| Replicate pattern for remaining 6 ingestion Lambdas | Full-Stack | Follows established template |
| Unit tests for all 7 Lambdas (mock SAP responses) | Full-Stack | Quality & regression coverage |
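
The "first Lambda = the pattern" idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual template: `fetch_report` and `put_object` are hypothetical injected callables standing in for the SAP API call and the S3 write, and the `raw/<report>/<date>.json` key layout is invented for the example. Injecting both dependencies is what would let the remaining six Lambdas reuse the handler and let unit tests mock SAP responses.

```python
import json
from datetime import date
from typing import Callable, Dict, List

Row = Dict[str, object]


def make_ingest_handler(
    report: str,
    fetch_report: Callable[[str], List[Row]],   # hypothetical SAP client call
    put_object: Callable[[str, bytes], None],   # hypothetical S3 writer
) -> Callable[[dict, object], dict]:
    """Build a Lambda-style handler that copies one SAP report to storage."""

    def handler(event: dict, context: object) -> dict:
        rows = fetch_report(report)
        if not rows:
            # Raising hands the failure to the retry / dead-letter path.
            raise RuntimeError(f"{report}: SAP returned no rows")
        run_date = event.get("run_date", date.today().isoformat())
        key = f"raw/{report}/{run_date}.json"  # assumed key layout
        put_object(key, json.dumps(rows).encode("utf-8"))
        return {"report": report, "key": key, "rows": len(rows)}

    return handler


# Usage with in-memory fakes, the way a unit test could mock SAP and S3.
stored: Dict[str, bytes] = {}
ingest_product_weights = make_ingest_handler(
    "ZFMIOR002",
    fetch_report=lambda name: [{"material": "M-001", "net_weight": 1.5}],
    put_object=stored.__setitem__,
)
result = ingest_product_weights({"run_date": "2025-01-01"}, None)
```
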

T4 ETL Processing

| Task | Owner | Rationale |
|---|---|---|
| etl-product-master (merge ZFMIOR002+ZEMM005, derived weight fields) | Tech Lead | Core business formulas |
| etl-hts-reference (cross-reference, rate range validation) | Tech Lead | Domain knowledge required |
| etl-transactions (parse ZSDR002/ZEMM004 invoice data) | Tech Lead | Complex field mapping |
| Schema validation framework (shared across all ETL) | Tech Lead | Quality standard |
| etl-packaging (simple normalize: material + isPackaging) | Full-Stack | Simplest ETL, follows pattern |
| SNS quality alert integration (wire to ETL errors) | DevOps | Alerting infrastructure |
| Tests: validate ETL output matches current CSV data | Full-Stack | Regression testing |
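
A shared schema validation framework could be as small as the sketch below. It is a minimal illustration, not the agreed design: the schema format is an assumption, `material` and `isPackaging` come from the etl-packaging row above, and `net_weight_kg` is a hypothetical optional field added to show numeric checks. Returning a list of problems per row, rather than raising on the first one, would make it straightforward to forward a batch summary to the SNS quality alert.

```python
from typing import Dict, List, Tuple

# A schema maps field name -> (accepted types, required?).
Schema = Dict[str, Tuple[tuple, bool]]

PACKAGING_SCHEMA: Schema = {
    "material": ((str,), True),
    "isPackaging": ((bool,), True),
    "net_weight_kg": ((int, float), False),  # hypothetical field
}


def validate_record(record: dict, schema: Schema) -> List[str]:
    """Return human-readable problems; an empty list means the record is valid."""
    errors = []
    for field, (types, required) in schema.items():
        if field not in record or record[field] is None:
            if required:
                errors.append(f"missing required field '{field}'")
            continue
        value = record[field]
        # bool is a subclass of int, so reject it where a number is expected.
        if isinstance(value, bool) and bool not in types:
            errors.append(f"field '{field}' has type bool")
        elif not isinstance(value, types):
            errors.append(f"field '{field}' has type {type(value).__name__}")
    return errors


def validate_batch(records: List[dict], schema: Schema) -> Dict[int, List[str]]:
    """Validate every record; return {row index: errors} for failing rows only."""
    failures = {}
    for index, record in enumerate(records):
        errors = validate_record(record, schema)
        if errors:
            failures[index] = errors
    return failures


rows = [
    {"material": "PKG-01", "isPackaging": True, "net_weight_kg": 0.2},
    {"material": "PKG-02", "isPackaging": "yes"},
    {"isPackaging": False},
]
failures = validate_batch(rows, PACKAGING_SCHEMA)
```
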

T5 Shadow Validation

| Task | Owner | Rationale |
|---|---|---|
| Shadow mode flag (pipeline writes DynamoDB, app still reads CSVs) | Tech Lead | Architecture toggle |
| Validation script: DynamoDB ProductMaster vs Product_List.csv | Full-Stack | Pandas comparison |
| Validation script: HTSReference vs HTS_Code.csv | Full-Stack | Same pattern |
| Validation script: TariffSequencing vs HTS Tariff.csv | Full-Stack | Same pattern |
| Validation script: PackagingMaterials vs Packaging_Material.csv | Full-Stack | Same pattern |
| Fix data mapping mismatches found during validation | Tech Lead | Domain knowledge |
| Document all field mapping adjustments | Full-Stack | Documentation |
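
The four validation scripts share one shape: index both sides by a key, then report missing keys and field-level mismatches. The plan calls for pandas; the sketch below deliberately swaps in the standard-library `csv` module so it runs without dependencies. The column names (`material`, `hts_code`) and sample values are hypothetical, and comparing everything as stripped strings is an assumption that would need revisiting for numeric fields (for example `1.50` vs `1.5`).

```python
import csv
import io
from typing import Dict, Iterable, List


def index_by_key(rows: Iterable[dict], key: str) -> Dict[str, dict]:
    """Index rows by their key column, normalising values to stripped strings."""
    return {
        str(row[key]).strip(): {k: str(v).strip() for k, v in row.items()}
        for row in rows
    }


def compare_tables(csv_rows: Iterable[dict], table_items: Iterable[dict], key: str) -> dict:
    """Report keys missing on either side and field-level mismatches."""
    legacy = index_by_key(csv_rows, key)
    new = index_by_key(table_items, key)
    mismatches: List[dict] = []
    for k in sorted(legacy.keys() & new.keys()):
        for field, expected in legacy[k].items():
            actual = new[k].get(field)
            if actual != expected:
                mismatches.append(
                    {"key": k, "field": field, "csv": expected, "dynamodb": actual}
                )
    return {
        "missing_in_dynamodb": sorted(legacy.keys() - new.keys()),
        "missing_in_csv": sorted(new.keys() - legacy.keys()),
        "mismatches": mismatches,
    }


# Inline sample data; a real run would read Product_List.csv and scan the table.
csv_text = "material,hts_code\nM-001,8471.30\nM-002,8517.62\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
table_items = [
    {"material": "M-001", "hts_code": "8471.30"},
    {"material": "M-002", "hts_code": "8517.13"},
    {"material": "M-003", "hts_code": "8504.40"},
]
report = compare_tables(csv_rows, table_items, key="material")
```
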

T6 Backend → ECS

| Task | Owner | Rationale |
|---|---|---|
| Refactor data_loader.py: DynamoDB reads + TTL caching + fallback | Tech Lead | Core application change |
| Add GET /api/customs-lines endpoint | Full-Stack | New endpoint, spec defined |
| Add POST /api/compare endpoint | Full-Stack | New endpoint, spec defined |
| Dockerfile for FastAPI app | DevOps | Containerization |
| CDK: ECS Fargate (1 vCPU/2GB), ALB, auto-scaling (1→2) | DevOps | Infrastructure |
| CDK: WAF rules on ALB | DevOps | Security infra |
| Test: processor.py identical output from DynamoDB vs CSV | Tech Lead | Critical regression |
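
The "DynamoDB reads + TTL caching + fallback" refactor of data_loader.py could follow the shape sketched below. The class name `CachedLoader`, the 300-second default TTL, and both loader callables are assumptions for illustration; the real primary source would presumably be a DynamoDB read and the fallback the existing CSV reader. Fallback results are intentionally not cached, so the next call retries the primary source.

```python
import time
from typing import Callable, Dict, List, Tuple

Rows = List[dict]


class CachedLoader:
    """Serve table data from a primary source, cached per table, with a fallback."""

    def __init__(
        self,
        primary: Callable[[str], Rows],    # e.g. a DynamoDB read (assumed)
        fallback: Callable[[str], Rows],   # e.g. the existing CSV reader (assumed)
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._primary = primary
        self._fallback = fallback
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Rows]] = {}

    def load(self, table: str) -> Rows:
        now = self._clock()
        cached = self._cache.get(table)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            rows = self._primary(table)
        except Exception:
            # Primary unavailable: serve fallback data without caching it.
            return self._fallback(table)
        self._cache[table] = (now, rows)
        return rows


# Usage with fakes and a controllable clock.
calls = {"primary": 0}
fake_time = [0.0]


def fake_dynamo(table: str) -> Rows:
    calls["primary"] += 1
    if table == "Broken":
        raise ConnectionError("simulated outage")
    return [{"table": table, "source": "dynamodb"}]


def fake_csv(table: str) -> Rows:
    return [{"table": table, "source": "csv"}]


loader = CachedLoader(fake_dynamo, fake_csv, ttl_seconds=60, clock=lambda: fake_time[0])
first = loader.load("ProductMaster")
second = loader.load("ProductMaster")   # within TTL: served from cache
fake_time[0] = 61.0
third = loader.load("ProductMaster")    # TTL expired: primary called again
degraded = loader.load("Broken")        # primary fails: CSV fallback used
```
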

T7 Frontend → CF

| Task | Owner | Rationale |
|---|---|---|
| CDK: S3 + CloudFront distribution + Origin Access Control | DevOps | Hosting infrastructure |
| Build Customs Lines Dashboard page (view results by date) | Full-Stack | New React component |
| Build Comparison Report page (discrepancies, pass/fail) | Full-Stack | New React component |
| Wire dashboards to API endpoints | Full-Stack | API integration |
| UX review and design direction | Tech Lead | Design decisions |
| Code review all new components | Tech Lead | Quality gate |

T8 Auth Fix

| Task | Owner | Rationale |
|---|---|---|
| Fix main.py: remove TESTING_MODE, fix DI chain | Tech Lead | Security-critical |
| Migrate users.json → DynamoDB Users table | Tech Lead | Auth storage |
| Update user_storage.py to DynamoDB | Tech Lead | Auth code path |
| Frontend: remove mock admin, uncomment auth check | Full-Stack | Bounded frontend change |
| Test login/logout for admin & operator roles | Full-Stack | QA testing |
| Add `credentials: 'include'` to audit HTML fetches | Full-Stack | Small targeted fix |

T9 Auto-Gen + Compare

| Task | Owner | Rationale |
|---|---|---|
| generate-customs-lines Lambda (reuses processor.py) | Tech Lead | Core business logic |
| EventBridge daily trigger (after ETL completion) | DevOps | Scheduling infra |
| comparison.py engine (field-by-field, tolerances) | Tech Lead | New business logic |
| Comparison Lambda (chained after generation) | Tech Lead | Orchestration logic |
| SNS discrepancy alerts | DevOps | Alerting infra |
| Integration tests: generate → compare → report | Full-Stack | End-to-end test scripts |
| Comparison report UI component | Full-Stack | Frontend display |
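
A field-by-field comparison with tolerances, as planned for comparison.py, might look like the sketch below. The field names and tolerance values are hypothetical; the assumption here is that numeric fields with a configured absolute tolerance are compared with `math.isclose`, while every other field must match exactly.

```python
import math
from typing import Dict, List

# Hypothetical per-field absolute tolerances; unlisted fields must match exactly.
TOLERANCES: Dict[str, float] = {"net_weight_kg": 0.01, "duty_amount": 0.005}


def _is_number(value: object) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_line(expected: dict, actual: dict, tolerances: Dict[str, float]) -> List[dict]:
    """Compare two customs lines field by field and return the discrepancies."""
    discrepancies = []
    for field in sorted(expected.keys() | actual.keys()):
        exp, act = expected.get(field), actual.get(field)
        tol = tolerances.get(field)
        if tol is not None and _is_number(exp) and _is_number(act):
            matches = math.isclose(exp, act, rel_tol=0.0, abs_tol=tol)
        else:
            matches = exp == act
        if not matches:
            discrepancies.append({"field": field, "expected": exp, "actual": act})
    return discrepancies


manual = {"hts_code": "8471.30", "net_weight_kg": 12.50, "duty_amount": 101.25}
automated = {"hts_code": "8471.30", "net_weight_kg": 12.504, "duty_amount": 101.40}

diffs = compare_line(manual, automated, TOLERANCES)
status = "pass" if not diffs else "fail"
```

In this example the weight difference falls inside its tolerance and the duty difference does not, so the line is reported as a failure with a single discrepancy on `duty_amount`.
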

T10 Monitor + Cutover

| Task | Owner | Rationale |
|---|---|---|
| CloudWatch dashboards CDK (pipeline health, data freshness) | DevOps | Monitoring infrastructure |
| CloudWatch alarms (ETL fail, SAP down, comparison fail rate) | DevOps | Alerting infrastructure |
| SNS → email/Slack alert routing | DevOps | Notification wiring |
| Parallel run: old system alongside new (1-2 weeks) | Tech Lead + Full-Stack | Both monitor |
| Validate output parity: manual path = automated path | Tech Lead | Business sign-off |
| Production runbook (restart, rollback, health checks) | DevOps | Ops documentation |
| Cutover decision and execution | Tech Lead | Tech lead call |
| Verify manual upload fallback post-cutover | Full-Stack | Regression test |

| Ticket | Tech Lead | Full-Stack | DevOps |
|---|---|---|---|
| T1 Infra + CI/CD | 25% | — | 75% |
| T2 Storage Layer | 20% | — | 80% |
| T3 Ingestion Lambdas | 30% | 50% | 20% |
| T4 ETL Processing | 65% | 25% | 10% |
| T5 Shadow Validation | 20% | 80% | — |
| T6 Backend → ECS | 45% | 20% | 35% |
| T7 Frontend → CF | 15% | 70% | 15% |
| T8 Auth Fix | 60% | 40% | — |
| T9 Auto-Gen + Compare | 65% | 25% | 10% |
| T10 Monitor + Cutover | 30% | 15% | 55% |

Data volume is tiny (~73 products, ~67 packaging, ~21 HTS rates). Lambda + pandas is 35x cheaper than Glue and the team already knows Python.

Lowest risk. Move users.json to DynamoDB now. Defer Cognito/Entra ID until M365 integration decision is final.

| Risk | Mitigation |
|---|---|
| SAP team delays on 6 APIs | Build against mocks (T3-T4). Swap endpoint URLs when APIs ship. |
| Cross-account S3 access complexity | Agreed in Mar 26 meeting: bucket policy grants read to IOR account. Test in T2. |
| Auth migration breaks things | T8 is isolated. Test in staging before merging. |
| Processor output differs between CSV and DynamoDB | Shadow mode (T5) catches this before any cutover. |
| GitHub Actions → GitLab migration mid-project | Per meeting: start with GH Actions now. GitLab migration is optional in ~1 month. |
| Scope creep from Document Distribution Pipeline | Separate project. Do not bundle into these 10 tickets. |