Architecture Overview
This page gives a high-level overview of the DeepFix architecture: the main components, how they interact, and the guiding design principles.
System Overview
DeepFix is a distributed system for AI-powered ML artifact analysis. It follows a client–server architecture that separates artifact computation from intelligent analysis.

Core Components
DeepFix SDK (Client)
The SDK is responsible for:
- Artifact computation (datasets, checks, metrics)
- Artifact recording in MLflow
- Workflow integration (PyTorch Lightning, ML pipelines)
- Client communication with the DeepFix server
Location: deepfix-sdk/
See also: SDK API Reference.
DeepFix Server
The server is responsible for:
- Running specialized analysis agents
- Querying the knowledge base
- Synthesizing and returning results
Location: deepfix-server/
See also: Server Architecture and Server API Reference.
DeepFix Core
Shared models and types:
- Data models: APIRequest, APIResponse, artifact models
- Type definitions: data types, artifact paths, enums
Location: deepfix-core/
See also: Core API Reference.
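As an illustration, the shared request/response models could be sketched with pydantic using the field names from the JSON payloads shown under Communication Protocol below; the actual definitions in deepfix-core may differ.

```python
from typing import Any

from pydantic import BaseModel

# Hypothetical sketch of the shared models; field names mirror the
# documented JSON payloads, but the real deepfix-core classes may differ.

class APIRequest(BaseModel):
    dataset_name: str
    dataset_artifacts: dict[str, Any] = {}
    deepchecks_artifacts: dict[str, Any] = {}
    model_checkpoint_artifacts: dict[str, Any] = {}
    training_artifacts: dict[str, Any] = {}
    language: str = "english"

class APIResponse(BaseModel):
    agent_results: dict[str, Any] = {}
    summary: str = ""
    additional_outputs: dict[str, Any] = {}
    error_messages: dict[str, Any] = {}

# Pydantic validates on construction, so malformed requests fail early.
req = APIRequest(dataset_name="my-dataset")
```

Sharing these models between SDK and server keeps the wire format in one place, which is the main reason deepfix-core exists as a separate package.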
Knowledge Base
Stores best practices and domain knowledge:
- Architecture best practices
- Data quality best practices
- Training best practices
Location: deepfix-kb/, documents/
Architecture Principles
Separation of Concerns
- Client handles computation and workflow integration.
- Server focuses on AI-powered analysis and reasoning.
- Clear boundaries between SDK, Server, and Core.
Stateless Server
- No session state between requests.
- Enables horizontal scaling and simpler deployment.
Artifact Storage
- MLflow is the source of truth for artifacts.
- Client generates artifacts and logs them to MLflow.
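The client-side half of this principle can be sketched as follows: compute an artifact locally, write it to a file, then hand it to MLflow as the canonical copy. The mlflow calls are commented out so the sketch runs without a tracking server; the file name and metric values are illustrative only.

```python
import json
import os
import tempfile

# Example computed artifact (illustrative values).
metrics = {"n_rows": 10_000, "n_missing": 12}

# Write the artifact to a local file first...
tmpdir = tempfile.mkdtemp()
artifact_path = os.path.join(tmpdir, "dataset_metrics.json")
with open(artifact_path, "w") as f:
    json.dump(metrics, f)

# ...then log it to MLflow, which becomes the source of truth.
# Requires `pip install mlflow` and a tracking server (or local ./mlruns):
#
# import mlflow
# with mlflow.start_run():
#     mlflow.log_artifact(artifact_path)  # MLflow keeps the canonical copy
```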
Agentic Analysis
- Specialized agents for different artifact types (datasets, deepchecks, checkpoints, training).
- Parallel agent execution where possible.
- Cross-artifact reasoning for holistic insights.
Local-First Design
- Designed for local deployment on a single machine.
- Can scale out to cloud and container deployments.
- Minimal external dependencies.
Data Flow
Analysis Request Flow
Client SDK (compute artifacts)
↓
MLflow (log artifacts)
↓
POST /v1/analyse (artifacts in JSON request)
↓
DeepFix Server (agent analysis)
↓
APIResponse (returned to client)
Agent Execution Flow
AnalyseArtifactsAPI
↓
AgentContext (decode request)
↓
ArtifactAnalysisCoordinator
↓
┌─────────────────────────────────────┐
│ Parallel Agent Execution │
│ - DatasetArtifactsAnalyzer │
│ - DeepchecksArtifactsAnalyzer │
│ - ModelCheckpointArtifactsAnalyzer │
│ - TrainingArtifactsAnalyzer │
└─────────────────────────────────────┘
↓
CrossArtifactReasoningAgent (sequential)
↓
Synthesize results
↓
APIResponse
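The two-stage flow above can be sketched with asyncio: the four specialized agents run concurrently, then cross-artifact reasoning runs once over their combined output. Agent names mirror the diagram, but the bodies are placeholders, not the server's real implementations.

```python
import asyncio

async def run_agent(name: str, artifacts: dict) -> str:
    """Stand-in for a specialized analysis agent (real agents call an LLM)."""
    await asyncio.sleep(0)
    return f"{name}: analyzed {len(artifacts)} artifact(s)"

async def coordinate(request: dict) -> dict:
    agents = {
        "DatasetArtifactsAnalyzer": request.get("dataset_artifacts", {}),
        "DeepchecksArtifactsAnalyzer": request.get("deepchecks_artifacts", {}),
        "ModelCheckpointArtifactsAnalyzer": request.get("model_checkpoint_artifacts", {}),
        "TrainingArtifactsAnalyzer": request.get("training_artifacts", {}),
    }
    # Stage 1: run the specialized agents in parallel.
    results = await asyncio.gather(
        *(run_agent(name, arts) for name, arts in agents.items())
    )
    agent_results = dict(zip(agents, results))
    # Stage 2: cross-artifact reasoning runs sequentially over all outputs.
    summary = " | ".join(agent_results.values())
    return {"agent_results": agent_results, "summary": summary}

response = asyncio.run(coordinate({"dataset_artifacts": {"stats": "..."}}))
```

Because the server is stateless, each request can run this coordination independently, which is what makes horizontal scaling straightforward.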
Technology Stack
Client (SDK)
- Language: Python 3.11+
- Key libraries:
  - requests for HTTP communication
  - mlflow for artifact tracking
  - pydantic for data validation
Server
- Language: Python 3.11+
- Framework: LitServe (built on FastAPI)
- Key libraries:
  - dspy for LLM orchestration
  - litserve for serving
  - pydantic v2 for validation
  - llama-index-retrievers-bm25 for retrieval
Core
- Language: Python 3.11+
- Key libraries:
  - pydantic for data models
Communication Protocol
REST API
- Protocol: HTTP/HTTPS
- Format: JSON
- Main endpoint:
POST /v1/analyse
Example Request:
{
  "dataset_name": "my-dataset",
  "dataset_artifacts": {},
  "deepchecks_artifacts": {},
  "model_checkpoint_artifacts": {},
  "training_artifacts": {},
  "language": "english"
}
Example Response:
{
  "agent_results": {},
  "summary": "Cross-artifact summary",
  "additional_outputs": {},
  "error_messages": {}
}
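Putting the two payloads together, a minimal client call might look like the following. The base URL is an assumption, and the requests.post call is commented out so the sketch runs without a live server; the real SDK likely wraps this in a higher-level client.

```python
import json

# Hypothetical server address; adjust to your deployment.
BASE_URL = "http://localhost:8000"

payload = {
    "dataset_name": "my-dataset",
    "dataset_artifacts": {},
    "deepchecks_artifacts": {},
    "model_checkpoint_artifacts": {},
    "training_artifacts": {},
    "language": "english",
}

# With a running server (and `pip install requests`):
#
#   import requests
#   resp = requests.post(f"{BASE_URL}/v1/analyse", json=payload, timeout=300)
#   result = resp.json()  # keys: agent_results, summary,
#                         #       additional_outputs, error_messages
print(json.dumps(payload, indent=2))
```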
Deployment Architecture
Local Deployment
┌─────────────────────────────────────┐
│ Local Machine │
│ │
│ ┌──────────┐ ┌──────────────┐ │
│ │ Client │───▶│ Server │ │
│ └──────────┘ └──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ │
│  │ MLflow   │    │Knowledge Base│     │
│ └──────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────┘
Docker / Compose Deployment
See Docker Deployment for details on running DeepFix in containers alongside MLflow.
Design Decisions
Why Client–Server?
- Scalability: independently scale analysis.
- Separation: clear boundary between computation and analysis.
- Flexibility: SDK can work in offline or degraded mode.
Why MLflow for Artifacts?
- Standardized artifact storage and tracking.
- Integration with existing ML workflows.
- Versioning and reproducibility.
Why Agentic Architecture?
- Specialization per artifact type.
- Easy to add new agents.
- Parallelizable execution.