Research

Towards Foundation Models for Network Traffic: A Graph-Based Approach

Research · Foundation Models · Few-shot Learning · GNN

Foundation models have revolutionized AI. GPT showed that a single pre-trained model could excel at countless NLP tasks. Vision transformers did the same for images. Now we're asking: can we build foundation models for network security? Our research proposes a graph-based framework that represents network traffic as dynamic spatio-temporal graphs, enabling few-shot learning across diverse security tasks with an average 6.87% improvement over training from scratch.

The Foundation Model Paradigm

The AI revolution of the 2020s was powered by a simple but profound insight: pre-train once on massive data, then fine-tune for specific tasks.

GPT wasn't trained to answer questions—it was trained to predict the next word in text. But in doing so, it learned language deeply enough to excel at translation, summarization, coding, and countless other tasks with minimal additional training.

This paradigm shift has transformed every domain it touches. But network security has largely missed this revolution.

Why hasn't network security had its "GPT moment"?

The standard approach has been to tokenize network packets—converting bytes to tokens and applying transformer architectures borrowed from NLP. But network traffic isn't like text:

1. Relational structure matters: Network traffic is fundamentally about relationships between entities, not sequences of tokens.

2. Temporal dynamics are critical: Attacks unfold over time in complex patterns that linear sequence models struggle to capture.

3. Scale is different: Modern networks generate billions of flows daily. Processing at the packet level is computationally prohibitive.

Our research proposes a fundamentally different approach: graphs, not tokens.

Network Traffic as Dynamic Graphs

Our framework represents network traffic as a dynamic spatio-temporal graph:

Nodes: Network entities—IP addresses, devices, services. Each node has features describing its characteristics and behavior.

Edges: Communications between entities. Edge features capture communication properties: bytes, duration, protocols, timing.

Temporal dynamics: The graph evolves over time as new communications occur and old ones fade. Our representation captures both the current state and how it evolved.
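To make this concrete, here is a minimal sketch of what one time-windowed snapshot of such a graph might look like. The class name, fields, and feature layout are illustrative assumptions for this post, not the exact schema used in the paper.

```python
from dataclasses import dataclass, field

# Illustrative sketch (not the paper's schema): one time-windowed snapshot of the
# flow graph, with per-entity and per-communication feature vectors.
@dataclass
class GraphSnapshot:
    window_start: float                    # start of this time window (epoch seconds)
    node_features: dict[str, list[float]]  # entity id (e.g. an IP) -> behavioral features
    edges: list[tuple[str, str]] = field(default_factory=list)      # (src, dst) pairs
    edge_features: list[list[float]] = field(default_factory=list)  # bytes, duration, protocol, ...

snapshot = GraphSnapshot(
    window_start=1_700_000_000.0,
    node_features={"10.0.0.5": [0.2, 1.0], "10.0.0.9": [0.7, 0.0]},
)
snapshot.edges.append(("10.0.0.5", "10.0.0.9"))
snapshot.edge_features.append([4096.0, 0.8, 6.0])  # bytes, duration (s), protocol id
```

A sequence of such snapshots, one per window, is what captures the temporal dynamics.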

Why operate at the flow level?

Previous graph approaches often worked at the packet level, with every packet becoming a node or an edge. This creates enormous graphs that are expensive to process.

We operate at the flow level: aggregated summaries of communications between entity pairs. This reduces graph size by orders of magnitude while preserving the information needed for security analysis.

A typical network might generate millions of packets per minute but only thousands of flows. Flow-level representation makes foundation model scale feasible.
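A rough sketch of the idea, assuming simple per-packet records with src, dst, bytes, and ts fields (illustrative names, not a real capture format): group packets by entity pair so the edge count tracks who talks to whom, not how many packets they exchanged.

```python
from collections import defaultdict

# Illustrative sketch: collapse per-packet records into flow-level edges keyed by the
# communicating entity pair, so the graph grows with entities rather than packets.
def aggregate_flows(packets):
    flows = defaultdict(lambda: {"bytes": 0, "packets": 0, "first_ts": None, "last_ts": None})
    for p in packets:
        f = flows[(p["src"], p["dst"])]
        f["bytes"] += p["bytes"]
        f["packets"] += 1
        f["first_ts"] = p["ts"] if f["first_ts"] is None else min(f["first_ts"], p["ts"])
        f["last_ts"] = p["ts"] if f["last_ts"] is None else max(f["last_ts"], p["ts"])
    return flows  # one edge per (src, dst) pair, regardless of packet count

packets = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "bytes": 1500, "ts": 0.0},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "bytes": 900,  "ts": 0.4},
]
print(aggregate_flows(packets)[("10.0.0.5", "10.0.0.9")])
```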

Self-Supervised Pre-training for Networks

The key to foundation models is effective pre-training objectives—tasks that force the model to learn useful representations without labeled data.

For language models, this was next-token prediction. For our network foundation model, we use link prediction: given a partial view of the network graph, predict which communications will occur.

Why link prediction works:

To predict whether two entities will communicate, the model must understand:
- What types of entities they are (servers, workstations, IoT devices)
- Their normal communication patterns
- The broader network topology and communication norms
- Temporal patterns (what happens at this time of day?)

This single objective forces the model to learn comprehensive network behavior representations.
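As a minimal sketch of how such an objective can be scored, the head below concatenates two entity embeddings and outputs a logit for "these two entities will communicate". The architecture and dimensions are placeholder choices for illustration, not necessarily the model described in the paper.

```python
import torch
import torch.nn as nn

# Illustrative link-prediction head: given node embeddings from some graph encoder
# (stubbed here with a random tensor), score how likely two entities are to communicate.
class EdgeScorer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h_src: torch.Tensor, h_dst: torch.Tensor) -> torch.Tensor:
        # Concatenate the two entity embeddings and map them to a single logit.
        return self.mlp(torch.cat([h_src, h_dst], dim=-1)).squeeze(-1)

node_emb = torch.randn(100, 64)            # 100 entities, 64-dim embeddings (stand-in for GNN output)
scorer = EdgeScorer(dim=64)
logit = scorer(node_emb[3], node_emb[17])  # higher logit = more likely to communicate
print(torch.sigmoid(logit).item())
```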

Training procedure:

1. Mask random edges from the network graph
2. Train the model to predict which edges were masked
3. Use contrastive learning to distinguish real edges from fake ones
4. Incorporate temporal context through recurrent processing
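A condensed sketch of one such pre-training step, assuming some GNN encoder and a link-prediction head like the one above (both placeholders rather than the paper's architecture); recurrent processing over successive snapshots is omitted for brevity.

```python
import torch
import torch.nn.functional as F

# Illustrative pre-training step: mask a fraction of edges, encode the remaining graph,
# and train the model to tell masked (real) edges from random (fake) entity pairs.
def pretrain_step(encoder, scorer, node_features, edge_index, mask_ratio=0.15):
    num_edges = edge_index.size(1)
    perm = torch.randperm(num_edges)
    num_masked = max(1, int(mask_ratio * num_edges))
    masked, visible = perm[:num_masked], perm[num_masked:]

    # Steps 1-2: encode the graph without the masked edges, then try to recover them.
    h = encoder(node_features, edge_index[:, visible])
    pos_src, pos_dst = edge_index[0, masked], edge_index[1, masked]
    pos_logits = scorer(h[pos_src], h[pos_dst])

    # Step 3: contrastive negatives -- random entity pairs that (likely) never communicated.
    neg_src = torch.randint(0, h.size(0), (num_masked,))
    neg_dst = torch.randint(0, h.size(0), (num_masked,))
    neg_logits = scorer(h[neg_src], h[neg_dst])

    # Real masked edges are labeled 1, random pairs 0.
    logits = torch.cat([pos_logits, neg_logits])
    labels = torch.cat([torch.ones(num_masked), torch.zeros(num_masked)])
    return F.binary_cross_entropy_with_logits(logits, labels)
```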

After pre-training on diverse network data, the model has learned "network language"—general patterns that transfer across environments.

Few-Shot Learning Results

We evaluated our foundation model on three downstream tasks:

Intrusion Detection: Classifying network flows as malicious or benign.
- 50-shot accuracy: 89.3% (vs 82.1% training from scratch)
- 100-shot accuracy: 92.7% (vs 86.4% from scratch)

Traffic Classification: Identifying application types from flow characteristics.
- 50-shot accuracy: 84.6% (vs 77.8% from scratch)
- 100-shot accuracy: 88.2% (vs 82.1% from scratch)

Botnet Detection: Identifying command-and-control traffic patterns.
- 50-shot accuracy: 91.2% (vs 83.9% from scratch)
- 100-shot accuracy: 94.1% (vs 88.3% from scratch)

Average improvement: 6.87% with dramatically reduced labeling requirements.

More importantly, these results held across different network environments. A model pre-trained on enterprise networks improved performance on industrial control systems, healthcare networks, and IoT environments, even though these environments look very different on the surface.
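For intuition, here is a minimal sketch of the few-shot protocol: freeze the pre-trained encoder and fit a small task head on a handful of labeled flows. The encoder, embedding size, and labels below are stand-ins, not the actual evaluation code.

```python
import torch
import torch.nn as nn

# Illustrative few-shot fine-tuning: the foundation model stays frozen; only a tiny
# task-specific head is trained on the labeled support set.
def few_shot_finetune(pretrained_encoder, head, support_x, support_labels, epochs=200):
    pretrained_encoder.eval()
    with torch.no_grad():                    # keep the foundation model's weights fixed
        emb = pretrained_encoder(support_x)  # embed the labeled support flows once
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(emb), support_labels)
        loss.backward()
        opt.step()
    return head

head = nn.Linear(64, 2)          # embedding -> {benign, malicious}
support_x = torch.randn(50, 64)  # 50-shot support set (stand-in for encoder input)
support_y = torch.randint(0, 2, (50,))
few_shot_finetune(nn.Identity(), head, support_x, support_y)  # nn.Identity() stands in for the frozen encoder
```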

Comparison with Token-Based Approaches

Several research efforts have applied transformer architectures to network traffic by tokenizing packets—essentially treating network data like text.

Our graph-based approach differs fundamentally:

Structural representation: Token-based models flatten network structure into sequences. Relationships between entities become implicit, buried in token patterns. Graph models represent structure explicitly, making relationships first-class objects.

Computational efficiency: Token-based approaches must process every packet. Graph models operate on aggregated flows, reducing computation by 10-100x while maintaining detection accuracy.

Scalability: Enterprise networks generate terabytes of packet data daily. Graph representations compress this to manageable sizes without losing security-relevant information.

Interpretability: Detections on graphs map directly to network entities and relationships. Token-based detections require additional work to identify which entities are involved.

We believe graphs are the right abstraction for network security—they match how practitioners think about networks and how attacks actually unfold.

Implications for Security Operations

A true foundation model for network security would transform how detection systems are built and deployed:

Reduced data requirements: Instead of collecting thousands of labeled examples for each network and attack type, security teams could achieve strong detection with dozens of examples.

Faster deployment: Pre-trained models arrive already understanding network behavior. Site-specific fine-tuning takes hours, not months.

Better generalization: Models that understand general network patterns transfer better to new environments and previously unseen attack types.

Unified framework: A single foundation model could power intrusion detection, traffic classification, anomaly detection, and threat hunting—currently separate systems that don't share knowledge.

Continuous improvement: As the foundation model is updated with new data, all downstream applications benefit without individual retraining.

This is the vision we're working toward at Hypergraph: not just point solutions, but foundational AI infrastructure for network security.

Current Limitations and Future Work

Our research is a step toward foundation models for network security, not the final destination:

Scale: True foundation models require massive pre-training data. We've demonstrated the approach works; scaling to foundation model size is ongoing.

Multi-modal data: Networks generate more than flows—logs, alerts, endpoint telemetry. Incorporating these modalities would create richer representations.

Adversarial robustness: Attackers may attempt to evade detection by manipulating how their traffic appears in graph representations. Adversarial training and robustness testing are active research areas.

Standardization: Different networks use different addressing schemes, protocols, and conventions. Handling this heterogeneity at foundation model scale requires careful normalization.

Despite these challenges, we believe the graph-based foundation model paradigm is the future of ML for network security.

The Path Forward

Foundation models transformed NLP and computer vision by learning deep representations from massive unlabeled data. Our research demonstrates that the same paradigm can work for network security—with graphs as the representation rather than tokens.

The results are promising: 6.87% average improvement with dramatically reduced labeling requirements, and genuine cross-network transfer learning. This is early evidence that foundation models for network security are achievable.

At Hypergraph, we're building on this research to create practical, deployable foundation models for enterprise network security. Read the full research paper for technical details, or contact us to discuss how these capabilities could benefit your security operations.

Paper Citation: Van Langendonck, L., Castell-Uroz, I., & Barlet-Ros, P. (2024). Towards a graph-based foundation model for network traffic analysis. arXiv:2409.08111