Commit 3f1ac36

Add enhancement proposal for egress flow in OpenShift
1 parent e16826d commit 3f1ac36

2 files changed

+155
-0
lines changed

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
---
title: egress-flow
authors:
- "@bnshr"
reviewers:
- "@trozet"
- "@danwinship"
- "@msherif1234"
approvers:
- "@trozet"
- "@danwinship"
api-approvers:
- None
creation-date: 2025-11-10
last-updated: 2025-11-12
status: implementable
---

# Communication egress flows matrix of OpenShift and Operators

## Summary

This enhancement automatically generates the egress communication matrix in the
product documentation, covering all egress flows of OpenShift (multi-node and
single-node deployments) and Operators.

## Motivation

Security-conscious customers need the OpenShift flows matrix for regulatory reasons
and/or to implement firewall rules, on-node or external, that restrict traffic to
the minimum set of required flows.

### User Stories

- As an OpenShift cluster administrator, I want documentation on the expected
  flows of traffic outgoing from every OpenShift installation so I can set up
  firewall rules (for example with nftables or an NGFW) to restrict traffic to the
  minimum required set of flows.

### Goals

- Provide a mechanism to automatically generate an accurate and up-to-date
  OpenShift communication egress flows matrix.

- Keep the egress flow matrix documented in OpenShift release documents
  up to date and validate it.

### Non-Goals
N/A

## Proposal

We propose to leverage the OpenShift Network Observability Operator to collect the egress communication from the cluster to the outside world.

- A communication matrix describing the expected flows of outgoing traffic will
  be included in the documentation of every OpenShift release.

### Workflow Description

An OpenShift administrator wants an accurate and up-to-date OpenShift
communication egress flows matrix.

- The admin reviews the OpenShift release documentation, which includes the communication
  matrix describing the expected flows of outgoing traffic.

### API Extensions
N/A

### Topology Considerations

#### Hypershift / Hosted Control Planes
Out of scope for this proposal.

#### Standalone Clusters

The communication matrix can be generated on standalone clusters.

#### Single-node Deployments or MicroShift

The communication matrix can be generated on single-node deployments and MicroShift.

### Implementation Details/Notes/Constraints

1. OpenShift CI installs the Network Observability Operator in the cluster under test.
2. The eBPF agent of the Network Observability Operator collects the egress network data, which is aggregated in Loki. Flow logs are retained in Loki for 24 hours. `FlowCollector` is configured with a sampling rate of 1 so that all data is captured.
3. The CI job runs the OpenShift tests so that any special flow that generates outgoing traffic from within the cluster is captured.
4. The start and end times of the test run are captured and used to select the egress flows aggregated in Loki for processing.
5. The data processing filters the data down to egress flows originating from the OpenShift operators.
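
Step 4 above can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not the tooling this proposal will ship: it converts the captured test start and end times into a request URL for Loki's `query_range` HTTP endpoint. `LOKI_BASE_URL` and both helper names are hypothetical, and the real URL prefix depends on how Loki is exposed in the CI cluster.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical base URL (assumption): the real value depends on how Loki
# is exposed in the cluster under test.
LOKI_BASE_URL = "https://loki.example.invalid"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(ts: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch nanoseconds."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    delta = ts - EPOCH
    # Integer arithmetic avoids float rounding on large epoch values.
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def build_query_range_url(query: str, test_start: datetime,
                          test_end: datetime, limit: int = 5000) -> str:
    """Build a query_range URL restricted to the test run's time window."""
    if test_end <= test_start:
        raise ValueError("test_end must be after test_start")
    params = {
        "query": query,
        "start": to_unix_ns(test_start),  # nanosecond Unix epoch
        "end": to_unix_ns(test_end),
        "limit": limit,
        "direction": "forward",  # oldest entries first
    }
    return f"{LOKI_BASE_URL}/loki/api/v1/query_range?{urlencode(params)}"
```

Restricting the query to the test window keeps flows from cluster installation or teardown out of the result, provided the CI job records both timestamps in UTC.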

**Basic Loki query**

```
{K8S_FlowLayer="infra", FlowDirection="1"} | json | DstSubnetLabel = "" | SrcSubnetLabel = "Pods" | line_format "{{.SrcAddr}},{{.SrcPort}},{{.DstAddr}},{{.DstPort}}"
```

This query will be adjusted to identify the Operators that generate the egress flows.
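
The query's `line_format` stage emits one comma-separated line per flow log (`SrcAddr,SrcPort,DstAddr,DstPort`). A minimal Python sketch of how such lines could be collapsed into unique egress flows is shown below. The names `EgressFlow` and `summarize_flows` are hypothetical, and dropping the source port is an assumption made here because client source ports are usually ephemeral; the proposal does not prescribe this.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class EgressFlow(NamedTuple):
    """One row of the aggregated matrix (hypothetical shape)."""
    src_addr: str
    dst_addr: str
    dst_port: int


def summarize_flows(lines: Iterable[str]) -> Counter:
    """Count flow log lines per (source address, destination address, destination port)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 4:
            continue  # skip empty or malformed lines
        src_addr, _src_port, dst_addr, dst_port = fields
        if not dst_port.isdigit():
            continue  # skip lines without a numeric destination port
        counts[EgressFlow(src_addr, dst_addr, int(dst_port))] += 1
    return counts
```

For example, with made-up input, `summarize_flows(["10.128.0.5,41234,203.0.113.10,443", "10.128.0.5,41240,203.0.113.10,443"])` yields a single `EgressFlow` entry with a count of 2.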

#### Architecture

![Egress flow architecture diagram](./images/egress.drawio.svg)

### Risks and Mitigations

1. Capturing flows at a sampling rate of 1 may cause performance issues.
2. The small Loki deployment size (`1x.small`) may run into storage limits.
3. Loki could go down, which requires debugging and can cause data loss. In that case the CI job must be rerun.

### Drawbacks
N/A

## Open Questions

1. What should the reporting strategy be once we have the egress data report?
2. Should we automate reporting to the teams? If yes, how?
3. Do we need persistent storage for Loki and storage in the cloud (maybe in AWS)?

## Test Plan

- E2E tests will be added to `openshift-tests`.
- Validate that a freshly generated egress flow matrix matches the
  one documented in OpenShift release documents.
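
The validation step could be sketched as a set comparison between the documented and the freshly generated matrix. The Python below is an illustration only: the `MatrixEntry` fields and both function names are assumptions, since the proposal does not define the matrix schema.

```python
from typing import Dict, Iterable, List, NamedTuple


class MatrixEntry(NamedTuple):
    """One matrix row; the field set is a hypothetical schema."""
    source_namespace: str
    destination: str
    port: int
    protocol: str


def diff_matrices(documented: Iterable[MatrixEntry],
                  generated: Iterable[MatrixEntry]) -> Dict[str, List[MatrixEntry]]:
    """Return entries that appear in only one of the two matrices."""
    documented_set = set(documented)
    generated_set = set(generated)
    return {
        # Observed in the cluster but missing from the documentation.
        "undocumented": sorted(generated_set - documented_set),
        # Documented but not observed in the generated matrix.
        "stale": sorted(documented_set - generated_set),
    }


def matrices_match(documented: Iterable[MatrixEntry],
                   generated: Iterable[MatrixEntry]) -> bool:
    """True when both matrices contain exactly the same entries."""
    result = diff_matrices(documented, generated)
    return not result["undocumented"] and not result["stale"]
```

Reporting the two difference lists separately, rather than a single pass or fail, would let a test failure show whether the documentation is missing a flow or lists one that no longer occurs.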

## Graduation Criteria

### Dev Preview -> Tech Preview
N/A

### Tech Preview -> GA
N/A

### Removing a deprecated feature
N/A

## Upgrade / Downgrade Strategy
N/A

## Version Skew Strategy
N/A

## Operational Aspects of API Extensions
N/A

## Support Procedures
N/A

## Alternatives (Not Implemented)
N/A
