Commit 2aeca19 (parent e16826d): Add enhancement proposal for egress flow in OpenShift. 2 files changed, 157 additions, 0 deletions (this file: 153 additions).

---
title: egress-flow
authors:
- "@bnshr"
reviewers:
- "@trozet"
- "@danwinship"
- "@msherif1234"
approvers:
- "@trozet"
- "@danwinship"
api-approvers:
- None
creation-date: 2025-11-10
last-updated: 2025-11-12
tracking-link:
- https://issues.redhat.com/browse/CNF-14073
status: implementable
---

# Communication egress flows matrix of OpenShift and Operators

## Summary

This enhancement makes it possible to automatically generate, for the product
documentation, a communication matrix of all egress flows of OpenShift
(multi-node and single-node deployments) and Operators.

## Motivation

Security-conscious customers need an OpenShift egress flows matrix for
regulatory reasons and/or to implement firewall rules, either on-node or
external, that restrict traffic to the minimum set of required flows.

### User Stories

- As an OpenShift cluster administrator, I want documentation on the expected
flows of outgoing traffic from every OpenShift installation so I can set up
firewall rules (for example, nftables or an NGFW) that restrict traffic to the
minimum required set of flows.

### Goals

- Provide a mechanism to automatically generate an accurate and up-to-date
OpenShift communication egress flows matrix.

- Keep the egress flows matrix in the OpenShift release documentation up to
date, and validate it.

### Non-Goals
N/A

## Proposal

We propose leveraging the OpenShift Network Observability Operator to collect
the egress traffic from the cluster to the outside world.

- A communication matrix describing the expected flows of outgoing traffic will
be included in the documentation of every OpenShift release.
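
The proposal does not yet fix the columns of the matrix. As a hedged illustration only, the Python sketch below assumes a hypothetical `EgressFlow` row (source namespace, protocol, destination address, destination port) and renders the unique rows as a Markdown table, sorted so that diffs between releases stay stable. The class name, function name, and column set are all assumptions, not the documented format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True, order=True)
class EgressFlow:
    """One row of a hypothetical egress flows matrix (column choice is an assumption)."""

    namespace: str  # namespace of the OpenShift component that opened the flow
    protocol: str   # transport protocol, e.g. "TCP" or "UDP"
    dst_addr: str   # destination address outside the cluster
    dst_port: int   # destination port


def to_markdown_table(flows: Iterable[EgressFlow]) -> str:
    """Render unique rows as a Markdown table, sorted for stable release-to-release diffs."""
    lines = [
        "| Namespace | Protocol | Destination | Port |",
        "|---|---|---|---|",
    ]
    # set() drops duplicate observations of the same flow; sorted() orders by all fields.
    for flow in sorted(set(flows)):
        lines.append(f"| {flow.namespace} | {flow.protocol} | {flow.dst_addr} | {flow.dst_port} |")
    return "\n".join(lines)
```

Because the dataclass is frozen and ordered, rows can be deduplicated with a set and sorted without a custom key.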

### Workflow Description

An OpenShift administrator would like to get an accurate and up-to-date
OpenShift communication egress flows matrix.

- The administrator consults the OpenShift release documentation for the
included communication matrix describing the expected flows of outgoing
traffic.

### API Extensions
N/A

### Topology Considerations

#### Hypershift / Hosted Control Planes
Out of scope for this proposal.

#### Standalone Clusters

The communication matrix can be generated on standalone clusters.

#### Single-node Deployments or MicroShift

The communication matrix can be generated on single-node deployments and MicroShift.

### Implementation Details/Notes/Constraints

1. OpenShift CI installs the Network Observability Operator in the cluster under test.
2. The eBPF agent of the Network Observability Operator collects the egress network flows, which are aggregated in Loki. Flow logs are retained in Loki for 24 hours. The `FlowCollector` is configured with a sampling rate of 1 so that all flows are captured.
3. The CI job runs the OpenShift tests so that any special flow that generates outgoing traffic from within the cluster is exercised and tracked.
4. The start and end times of the test run are recorded, and the egress flows aggregated in Loki are filtered to that time window for processing.
5. Data processing keeps only the egress flows that originate from OpenShift operators.
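
Steps 4 and 5 can be sketched in Python as a filter over flow records that have already been retrieved. This is a minimal sketch under assumptions: the field names (`TimeFlowEndMs`, `SrcK8S_Namespace`, `Proto`, `DstAddr`, `DstPort`) are modeled on Network Observability flow records but should be checked against the deployed version, and the rule that OpenShift operators run in `openshift-*` namespaces is a hypothetical heuristic standing in for whatever attribution the final query uses. The function name `egress_rows` is also hypothetical.

```python
from typing import Any, Iterable, Mapping

# Assumption: OpenShift-owned workloads run in namespaces prefixed "openshift-".
OPENSHIFT_NS_PREFIX = "openshift-"


def egress_rows(
    flows: Iterable[Mapping[str, Any]], start_ms: int, end_ms: int
) -> set[tuple[str, int, str, int]]:
    """Keep egress flows from the test window that originate in OpenShift namespaces.

    Returns unique (namespace, protocol number, destination address, destination port)
    tuples. The source port is dropped because it is usually ephemeral.
    """
    rows = set()
    for flow in flows:
        # Step 4: restrict to the recorded test start/end window (epoch milliseconds).
        if not start_ms <= flow["TimeFlowEndMs"] <= end_ms:
            continue
        namespace = flow.get("SrcK8S_Namespace", "")
        # Step 5: keep only flows whose source is an OpenShift operator namespace.
        if not namespace.startswith(OPENSHIFT_NS_PREFIX):
            continue
        rows.add((namespace, flow["Proto"], flow["DstAddr"], flow["DstPort"]))
    return rows
```

Dropping the source port collapses repeated connections that differ only in their ephemeral port into a single matrix row.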

**Basic Loki query**

```
{K8S_FlowLayer="infra", FlowDirection="1"}
  | json
  | DstSubnetLabel = ""
  | SrcSubnetLabel = "Pods"
  | line_format "{{.SrcAddr}},{{.SrcPort}},{{.DstAddr}},{{.DstPort}}"
```

This query will be refined to identify the Operators that generate the egress
flows.
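
A CI job could run the basic query over the recorded test window through Loki's HTTP `query_range` endpoint, which accepts `query`, `start`, `end`, `limit`, and `direction` parameters, with timestamps given as Unix epoch nanoseconds. The Python sketch below assumes a hypothetical `base_url`; with a LokiStack deployment the gateway path and authentication differ, so treat the URL layout as an assumption. `query_range_url` and `parse_flows` are illustrative names, and the parser assumes the comma-separated `line_format` output of the query.

```python
import json
from urllib.parse import urlencode

# The "Basic Loki query" from this proposal, as a single line.
BASIC_QUERY = (
    '{K8S_FlowLayer="infra", FlowDirection="1"} | json '
    '| DstSubnetLabel = "" | SrcSubnetLabel = "Pods" '
    '| line_format "{{.SrcAddr}},{{.SrcPort}},{{.DstAddr}},{{.DstPort}}"'
)


def query_range_url(base_url: str, start_s: int, end_s: int, limit: int = 5000) -> str:
    """Build a Loki query_range URL restricted to the test run window.

    base_url is hypothetical and deployment-specific. start_s and end_s are
    Unix epoch seconds; they are converted to the nanoseconds Loki expects.
    """
    params = {
        "query": BASIC_QUERY,
        "start": str(start_s * 1_000_000_000),
        "end": str(end_s * 1_000_000_000),
        "limit": str(limit),  # raised above Loki's default; large runs may still need paging
        "direction": "forward",
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


def parse_flows(response_body: str) -> set[tuple[str, int]]:
    """Extract unique (DstAddr, DstPort) pairs from a query_range JSON response.

    Each log line is assumed to be "SrcAddr,SrcPort,DstAddr,DstPort", matching
    the query's line_format stage.
    """
    destinations = set()
    for stream in json.loads(response_body)["data"]["result"]:
        for _timestamp, line in stream["values"]:
            _src_addr, _src_port, dst_addr, dst_port = line.split(",")
            destinations.add((dst_addr, int(dst_port)))
    return destinations
```

Integer arithmetic is used for the nanosecond conversion to avoid floating-point rounding on 19-digit timestamps.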

#### Architecture

![Egress flow collection architecture](./images/egress.drawio.svg)

### Risks and Mitigations

1. Setting the flow capture sampling rate to 1 (capturing every flow) may cause performance issues.
2. The small Loki size (`1x.small`) used for the installed Loki may run into storage limits.
3. Loki could become unavailable, causing data loss and requiring debugging. In that case, the CI job must be rerun.

### Drawbacks
N/A

## Open Questions

1. What should the reporting strategy be once the egress data report is generated?
2. Should reporting to the teams be automated? If so, how?
3. Do we need persistent storage for Loki, and cloud storage (for example, in AWS)?

## Test Plan

- E2E tests will be added to `openshift-tests`.
- Validate that a freshly generated egress flows matrix matches the one
documented in the OpenShift release documentation.
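
The validation step amounts to a set comparison between the generated matrix and the documented one. A minimal Python sketch, with hypothetical function names and matrix rows represented as hashable tuples:

```python
from collections.abc import Hashable, Iterable


def diff_matrix(generated: Iterable[Hashable], documented: Iterable[Hashable]) -> dict[str, set]:
    """Compare a freshly generated egress flows matrix with the documented one."""
    gen, doc = set(generated), set(documented)
    return {
        "undocumented": gen - doc,  # observed in CI but missing from the docs
        "stale": doc - gen,         # documented but not observed in this run
    }


def matrix_matches(generated: Iterable[Hashable], documented: Iterable[Hashable]) -> bool:
    """True when both matrices contain exactly the same rows."""
    diff = diff_matrix(generated, documented)
    return not diff["undocumented"] and not diff["stale"]
```

Whether stale entries should fail the test is a design choice: a documented flow may simply not have been exercised in a given run, so a more lenient check might fail only on undocumented flows.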

## Graduation Criteria

### Dev Preview -> Tech Preview
N/A

### Tech Preview -> GA
N/A

### Removing a deprecated feature
N/A

## Upgrade / Downgrade Strategy
N/A

## Version Skew Strategy
N/A

## Operational Aspects of API Extensions
N/A

## Support Procedures
N/A

## Alternatives (Not Implemented)
N/A
