Commit 674fed2

Merge branch 'main' into knewbury01/fix-service-handler
2 parents 22fb705 + 64efaff commit 674fed2

69 files changed (+14687, -3245 lines)
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
## What This PR Contributes
2+
3+
<!-- Explain in Markdown bullet points what is covered in this PR:
4+
1. Organize the bullet points in a reasonable level of hierarchy, and
5+
2. Be as EXHAUSTIVE as possible. -->
6+
7+
## Future Works
8+
9+
<!-- Explain in Markdown bullet points what is OUT OF SCOPE of this PR.
10+
Also organize them with bullet points in a reasonable level of hierarchy. -->

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -71,3 +71,5 @@ tmp/
7171
**.testproj
7272
dbs
7373
*.cds.json
74+
.cds-extractor-cache
75+

extractors/README.md

Lines changed: 32 additions & 18 deletions
@@ -30,16 +30,20 @@ pre-finalize.sh`"]
3030
JSE[[javascript extractor]]
3131
DTRAC[codeql database<br>trace-command]
3232
SPF[[pre-finalize.sh]]
33-
DIDX[codeql database index-files<br> --language=cds<br>--include-extension=.cds]
34-
SIF[[index-files.sh]]
35-
SIT[[index-files.ts/js]]
36-
NPM[[npm install & build]]
37-
DETS[[Determine CDS command]]
38-
FIND[[Find package.json dirs]]
39-
INST[[Install dependencies]]
40-
CC[[cds compiler]]
33+
ABCMD[[autobuild.sh/cmd]]
34+
ABT[[cds-extractor.ts/js]]
35+
ENV[[setup & validate<br>environment]]
36+
PDG[[build project<br>dependency graph]]
37+
INSTC[[install dependencies<br>with caching]]
38+
PROC[[process CDS files<br>to JSON]]
39+
PMAP[[project-aware<br>dependency resolution]]
40+
FIND[[find project for<br>CDS file]]
41+
CDCMD[[determine CDS<br>command for project]]
42+
COMP[[compile CDS<br>to JSON]]
4143
CDJ([.cds.json files])
44+
FILT[[configure LGTM<br>index filters]]
4245
JSA[[javascript extractor<br>autobuild script]]
46+
DIAG[[add compilation<br>diagnostics]]
4347
TF([CodeQL TRAP files])
4448
DBF[codeql database finalize<br> -- /path/to/database]
4549
@@ -54,20 +58,30 @@ pre-finalize.sh`"]
5458
JSE ==> |run autobuild within<br>the javascript extractor| DTRAC
5559
5660
DTRAC ==> |run the build --command| SPF
57-
SPF ==> |run codeql index-files<br>for CDS files| DIDX
58-
DIDX ==> |invoke script via<br>--search-path| SIF
59-
SIF ==> |runs TypeScript version<br>after npm install| NPM
60-
NPM ==> |executes compiled<br>index-files.js| SIT
61+
SPF ==> |run autobuilder<br>for CDS files| ABCMD
62+
ABCMD ==> |runs TypeScript version<br>of CDS extractor| ABT
6163
62-
SIT ==> |finds project directories<br>with package.json| FIND
63-
FIND ==> |install CDS dependencies<br>in project directories| INST
64-
SIT ==> |determines which<br>cds command to use| DETS
65-
DETS ==> |processes each CDS file| CC
64+
ABT ==> |setup and validate<br>environment first| ENV
65+
ABT ==> |build project dependency<br>graph for source root| PDG
66+
PDG ==> |analyze CDS projects<br>structure & relationships| PMAP
67+
68+
ABT ==> |efficiently install<br>required dependencies| INSTC
69+
INSTC ==> |use cached approach for<br>dependency installation| PMAP
70+
71+
ABT ==> |process each CDS file<br>to generate JSON files| PROC
72+
PROC ==> |find which project<br>contains this CDS file| FIND
73+
FIND ==> |uses project-aware<br>dependency resolution| PMAP
74+
FIND ==> |determine appropriate<br>CDS command for project| CDCMD
75+
76+
CDCMD ==> |compile CDS file to JSON<br>with project context| COMP
77+
COMP ==> |generate JSON representation<br>with project awareness| CDJ
78+
COMP --x |if compilation fails,<br>report diagnostics| DIAG
79+
DIAG -.-> |diagnostics stored<br>in database| DB
6680
67-
CC ==> |compile .cds files to<br>create .cds.json files| CDJ
6881
CDJ -.-> |stored in same location<br>as original .cds files| DB
6982
70-
SIT ==> |configures extraction<br>filters for JSON files| JSA
83+
ABT ==> |configure extraction<br>filters for JSON files| FILT
84+
ABT ==> |run JavaScript extractor<br>to process JSON files| JSA
7185
JSA ==> |processes .cds.json files<br>via javascript extractor| CDJ
7286
7387
CDJ ==> |javascript extractor<br>generates TRAP files| TF

extractors/cds/tools/.gitignore

Lines changed: 0 additions & 12 deletions
This file was deleted.

extractors/cds/tools/README.md

Lines changed: 282 additions & 0 deletions
@@ -0,0 +1,282 @@
1+
# CodeQL CDS Extractor
2+
3+
A robust CodeQL extractor for [Core Data Services (CDS)][CDS] files used in [SAP Cloud Application Programming (CAP)][CAP] model projects. This extractor processes `.cds` files and compiles them into `.cds.json` files for CodeQL analysis while maintaining project-aware parsing and dependency resolution.
4+
5+
## Overview
6+
7+
The CodeQL CDS extractor is designed to efficiently process CDS projects by:
8+
9+
- **Project-Aware Processing**: Analyzes CDS files as related project configurations rather than independent definitions
10+
- **Optimized Dependency Management**: Caches and reuses `@sap/cds` and `@sap/cds-dk` dependencies across projects
11+
- **Enhanced Precision**: Reduces false positives in CodeQL queries by understanding cross-file relationships
12+
- **Performance Optimization**: Avoids duplicate processing and unnecessary dependency installations
13+
14+
## Architecture
15+
16+
The extractor uses an `autobuild` approach with the following key components:
17+
18+
### Core Components
19+
20+
- **`cds-extractor.ts`**: Main entry point that orchestrates the extraction process
21+
- **`src/cds/parser/`**: CDS project discovery and dependency graph building
22+
- **`src/cds/compiler/`**: Compilation orchestration and `.cds.json` generation
23+
- **`src/packageManager/`**: Dependency installation and caching
24+
- **`src/logging/`**: Unified logging and performance tracking
25+
- **`src/environment.ts`**: Environment setup and validation
26+
- **`src/codeql.ts`**: CodeQL JavaScript extractor integration
27+
28+
### Extraction Process
29+
30+
1. **Environment Setup**: Validates CodeQL tools and system requirements
31+
2. **Project Discovery**: Recursively scans for CDS projects and builds dependency graph
32+
3. **Dependency Management**: Installs and caches required CDS compiler dependencies
33+
4. **CDS Compilation**: Compiles `.cds` files to `.cds.json` using project-aware compilation
34+
5. **JavaScript Extraction**: Runs CodeQL's JavaScript extractor on source and compiled files
35+
36+
## Usage
37+
38+
### Prerequisites
39+
40+
- Node.js (accessible via `node` command)
41+
- CodeQL CLI tools
42+
- SAP CDS projects with `.cds` files
43+
44+
### Running the Extractor
45+
46+
The extractor is typically invoked by CodeQL during database creation:
47+
48+
```bash
49+
codeql database create --language=cds --source-root=/path/to/project my-database
50+
```
51+
52+
### Manual Execution
53+
54+
For development and testing purposes:
55+
56+
```bash
57+
# Build the extractor
58+
npm run build
59+
60+
# Run directly (from project source root)
61+
node dist/cds-extractor.js /path/to/source/root
62+
```
63+
64+
## Development
65+
66+
### Project Structure
67+
68+
```text
69+
extractors/cds/tools/
70+
├── cds-extractor.ts # Main entry point
71+
├── src/ # Source code modules
72+
│ ├── cds/ # CDS-specific functionality
73+
│ │ ├── compiler/ # Compilation orchestration
74+
│ │ └── parser/ # Project discovery and parsing
75+
│ ├── logging/ # Logging and performance tracking
76+
│ ├── packageManager/ # Dependency management
77+
│ ├── codeql.ts # CodeQL integration
78+
│ ├── diagnostics.ts # Error reporting
79+
│ ├── environment.ts # Environment setup
80+
│ ├── filesystem.ts # File system utilities
81+
│ └── utils.ts # General utilities
82+
├── test/ # Test suites
83+
├── dist/ # Compiled JavaScript output
84+
└── package.json # Project configuration
85+
```
86+
87+
### Building
88+
89+
```bash
90+
# Install dependencies
91+
npm install
92+
93+
# Build TypeScript to JavaScript
94+
npm run build
95+
96+
# Run all checks and build
97+
npm run build:all
98+
```
99+
100+
### Testing
101+
102+
```bash
103+
# Run tests
104+
npm test
105+
106+
# Run tests with coverage
107+
npm run test:coverage
108+
109+
# Run tests in watch mode
110+
npm run test:watch
111+
```
112+
113+
### Code Quality
114+
115+
```bash
116+
# Lint TypeScript files
117+
npm run lint
118+
119+
# Auto-fix linting issues
120+
npm run lint:fix
121+
122+
# Format code
123+
npm run format
124+
```
125+
126+
## Configuration
127+
128+
### Environment Variables
129+
130+
The extractor respects several CodeQL environment variables:
131+
132+
- `CODEQL_DIST`: Path to CodeQL distribution
133+
- `CODEQL_EXTRACTOR_CDS_WIP_DATABASE`: Target database path
134+
- `LGTM_INDEX_FILTERS`: File filtering configuration
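
As a rough illustration of what a "file filtering configuration" could look like, the sketch below builds a value for `LGTM_INDEX_FILTERS`. It assumes the JavaScript extractor reads this variable as newline-separated `include:<glob>` / `exclude:<glob>` rules applied in order; the `FilterRule` type, the `buildIndexFilters` helper, and the specific patterns are hypothetical and are not taken from the extractor's source.

```typescript
// Hypothetical rule type: one include/exclude glob per rule.
interface FilterRule {
  kind: "include" | "exclude";
  pattern: string;
}

// Join rules into a newline-separated string, preserving their order.
// Assumption: rule order is significant to the consumer of the variable.
function buildIndexFilters(rules: FilterRule[]): string {
  return rules.map((rule) => `${rule.kind}:${rule.pattern}`).join("\n");
}

// Illustrative patterns only: exclude everything, then re-include CDS sources
// and their compiled JSON, then drop anything under node_modules.
const exampleFilters = buildIndexFilters([
  { kind: "exclude", pattern: "**/*.*" },
  { kind: "include", pattern: "**/*.cds.json" },
  { kind: "include", pattern: "**/*.cds" },
  { kind: "exclude", pattern: "**/node_modules/**/*.*" },
]);
```

The resulting string would then be placed in the environment of the JavaScript extractor process before it runs.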
135+
136+
### CDS Project Detection
137+
138+
Projects are detected based on:
139+
140+
- Presence of `package.json` files
141+
- CDS files (`.cds`) in the project directory tree
142+
- Valid CDS dependencies (`@sap/cds`, `@sap/cds-dk`) in package.json
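
The detection criteria above can be sketched as a small predicate. This is a minimal sketch that assumes all three criteria must hold together, which the list does not state explicitly; the `PackageJson` shape and the function names are hypothetical.

```typescript
// Hypothetical subset of package.json that this sketch inspects.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

const CDS_PACKAGES = ["@sap/cds", "@sap/cds-dk"];

// True if package.json declares at least one of the CDS packages.
function hasCdsDependency(pkg: PackageJson): boolean {
  const declared = { ...pkg.dependencies, ...pkg.devDependencies };
  return CDS_PACKAGES.some((name) => name in declared);
}

// `pkg` is undefined when the directory has no package.json;
// `filesInTree` lists file paths found under the directory.
function looksLikeCdsProject(pkg: PackageJson | undefined, filesInTree: string[]): boolean {
  if (pkg === undefined) return false;
  const hasCdsFiles = filesInTree.some((file) => file.endsWith(".cds"));
  return hasCdsFiles && hasCdsDependency(pkg);
}
```

For example, a directory with `db/schema.cds` and a `package.json` that lists `@sap/cds` would satisfy this predicate, while a directory with no `.cds` files would not.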
143+
144+
### Compilation Strategy
145+
146+
The extractor uses a sophisticated compilation approach:
147+
148+
1. **Dependency Graph Building**: Maps relationships between CDS projects
149+
2. **Smart Caching**: Reuses compiled outputs and dependency installations
150+
3. **Error Recovery**: Handles compilation failures gracefully
151+
4. **Performance Tracking**: Monitors compilation times and resource usage
152+
153+
## Performance Features
154+
155+
### Optimized Dependency Management
156+
157+
- **Shared Dependency Cache**: Single installation per unique dependency combination
158+
- **Isolated Environments**: Dependencies installed in temporary cache directories
159+
- **No Source Modification**: Original project files remain unchanged
160+
161+
### Efficient Processing
162+
163+
- **Project-Level Compilation**: Compiles related CDS files together
164+
- **Duplicate Avoidance**: Prevents redundant processing of imported files
165+
- **Memory Tracking**: Monitors and reports memory usage throughout extraction
166+
167+
### Scalability
168+
169+
- **Large Codebase Support**: Optimized for enterprise-scale CDS projects
170+
- **Parallel Processing**: Where possible, processes independent projects concurrently
171+
- **Resource Management**: Cleans up temporary files and cached dependencies
172+
173+
## Integration with `cds` CLI
174+
175+
### Installation of CDS (Node) Dependencies
176+
177+
#### Installation of `@sap/cds` and `@sap/cds-dk`
178+
179+
The CDS extractor attempts to optimize performance for most projects by caching the installation of the unique combinations of resolved CDS dependencies across all projects under a given source root.
180+
181+
The "unique combinations of resolved CDS dependencies" means that we resolve the **latest** available version **within the semantic version range** for each `@sap/cds` and `@sap/cds-dk` dependency specified in the `package.json` file for a given CAP project.
182+
183+
In practice, this means that if "project-a" requires `@sap/cds@^6.0.0` and "project-b" requires `@sap/cds@^7.0.0` while the latest available version is `@sap/cds@<version>` (as a trivial example), the extractor will install `@sap/cds@<version>` once and reuse it for both projects.
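
To make the idea of "unique combinations of resolved CDS dependencies" more concrete, here is a minimal sketch of resolving a range to the latest matching version and deriving a cache key from the result. It is deliberately simplified: it handles only caret ranges with a non-zero major version and plain `x.y.z` versions, and every name in it (`resolveLatestInRange`, `dependencyCacheKey`, the sample version list) is hypothetical rather than taken from the extractor.

```typescript
type Version = [number, number, number];

// Parse a plain "x.y.z" version; prerelease tags are not supported here.
function parseVersion(text: string): Version {
  const parts = text.split(".").map((part) => Number.parseInt(part, 10));
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`Unsupported version: ${text}`);
  }
  return [parts[0], parts[1], parts[2]];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Resolve a caret range such as "^7.2.0" to the latest available version
// that shares its major version and is not lower than the range's base.
function resolveLatestInRange(range: string, available: string[]): string | undefined {
  if (!range.startsWith("^")) throw new Error(`Only caret ranges are handled: ${range}`);
  const base = parseVersion(range.slice(1));
  const candidates = available
    .map((text) => parseVersion(text))
    .filter((v) => v[0] === base[0] && compareVersions(v, base) >= 0)
    .sort(compareVersions);
  const latest = candidates[candidates.length - 1];
  return latest ? latest.join(".") : undefined;
}

// Projects whose resolved versions produce the same key can share one installation.
function dependencyCacheKey(resolved: Record<string, string>): string {
  return Object.keys(resolved)
    .sort()
    .map((name) => `${name}@${resolved[name]}`)
    .join("|");
}

// Two hypothetical projects with different ranges resolve to the same version.
const sampleVersions = ["7.0.0", "7.2.0", "7.9.1", "8.0.0"];
const resolvedForA = resolveLatestInRange("^7.0.0", sampleVersions);
const resolvedForB = resolveLatestInRange("^7.2.0", sampleVersions);
```

In this sketch both projects resolve to the same version, so they would map to one cache key and one installation. A real implementation would more likely rely on an established semver library than on hand-written range matching.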
184+
185+
This is much faster than installing all dependencies for every project individually, especially for large projects with many CDS files. The approach has the following benefits and trade-offs:
186+
187+
- This latest-first approach is more likely to choose the same version for multiple projects, which can reduce analysis time and can improve consistency in analysis between projects.
188+
- This approach does not read (or respect) the `package-lock.json` file, which means that we are more likely to use a `cds` version that is different from the one most recently tested/used by the project developers.
189+
- We are more likely to encounter incompatibility issues where a particular project hasn't been tested with the latest version of `@sap/cds` or `@sap/cds-dk`.
190+
191+
We can mitigate some of these issues through a (to be implemented) compilation retry mechanism for projects where some CDS compilation task(s) fail to produce the expected `.cds.json` output file(s).
192+
The proposed retry mechanism would install the full set of dependencies for the affected project(s) while respecting the `package-lock.json` file, and then re-run the compilation for the affected project(s).
193+
194+
```text
195+
TODO: retry mechanism expected before next release of the CDS extractor
196+
```
197+
198+
#### Installation of Additional Project-Specific Dependencies
199+
200+
```text
201+
TODO: implement installation of dependencies required for compilation to succeed for a given project
202+
```
203+
204+
### Integration with `cds compile` command
205+
206+
The CDS extractor uses the `cds compile` command to compile `.cds` files into `.cds.json` files, which are then processed by CodeQL's JavaScript extractor.
207+
208+
Where possible, a single `model.cds.json` file is generated for each project, containing all the compiled definitions from the project's `.cds` files. This results in a faster extraction process overall with minimal duplication of CDS code elements (e.g., annotations, entities, services, etc.) within the CodeQL database created from the extraction process.
209+
210+
Where project-level compilation is not possible (e.g., due to project structure), the extractor generates individual `.cds.json` files for each `.cds` file in the project. The main downside to this approach is that if one `.cds` file imports another `.cds` file, the imported definitions will be duplicated in the CodeQL database, which can lead to false positives in queries that expect unique definitions.
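
The two output strategies described above can be sketched as a planning step that decides which `.cds.json` files to produce. This is an illustration only: the `CompilationTask` type, the `planCompilation` function, and the choice to place `model.cds.json` in the project root are assumptions, and the sketch does not invoke `cds compile` itself.

```typescript
// Hypothetical description of one compilation: which sources feed which output.
interface CompilationTask {
  sources: string[];
  output: string;
}

function planCompilation(
  projectDir: string,
  cdsFiles: string[],
  projectLevel: boolean,
): CompilationTask[] {
  if (projectLevel) {
    // One task for the whole project: shared definitions are emitted once.
    return [{ sources: cdsFiles, output: `${projectDir}/model.cds.json` }];
  }
  // One task per file: "<name>.cds" becomes "<name>.cds.json" next to its source,
  // so definitions imported by several files may be emitted more than once.
  return cdsFiles.map((file) => ({ sources: [file], output: `${file}.json` }));
}
```

With two source files, the project-level plan yields a single task, while the per-file plan yields two tasks whose outputs sit alongside the original `.cds` files.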
211+
212+
```text
213+
TODO: use the unique (session) ID of the CDS extractor run as the `<session>` part of `<basename>.<session>.cds.json` and set JS extractor env vars to only extract `.<session>.cds.json` files
214+
```
215+
216+
### Integration with `cds env` command
217+
218+
The current version of the CDS extractor expects CAP projects to follow the [default project structure][CAP-project-structure], particularly regarding the names of the (`app`, `db`, & `srv`) subdirectories in which the extractor will look for `.cds` files to process (in addition to the root directory of the project).
219+
220+
The proposed solution will use the `cds env` command to discover configurations that affect the structure of the project and/or the expected "compilation tasks" for the project, including user customizations of environment settings such as:
221+
222+
- `cds.folders.app`
223+
- `cds.folders.db`
224+
- `cds.folders.srv`
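
A minimal sketch of how such folder overrides might be combined with the default layout is shown below. It assumes the overrides have already been obtained (for example, parsed from `cds env` output) and passed in as a plain object; the `CdsFolders` type and both function names are hypothetical.

```typescript
interface CdsFolders {
  app: string;
  db: string;
  srv: string;
}

// Default subdirectory names from the default CAP project structure.
const DEFAULT_FOLDERS: CdsFolders = { app: "app", db: "db", srv: "srv" };

// Use an override where one is provided, otherwise fall back to the default.
function resolveCdsFolders(overrides: Partial<CdsFolders> = {}): CdsFolders {
  return {
    app: overrides.app ?? DEFAULT_FOLDERS.app,
    db: overrides.db ?? DEFAULT_FOLDERS.db,
    srv: overrides.srv ?? DEFAULT_FOLDERS.srv,
  };
}

// Directories to scan for .cds files: the project root plus the three folders.
function cdsSearchDirs(projectDir: string, overrides: Partial<CdsFolders> = {}): string[] {
  const folders = resolveCdsFolders(overrides);
  const root = projectDir.replace(/\/+$/, "");
  const subDirs = [folders.app, folders.db, folders.srv].map(
    (folder) => `${root}/${folder.replace(/^\/+|\/+$/g, "")}`,
  );
  return [root, ...subDirs];
}
```

Calling `cdsSearchDirs("/repo/bookshop", { srv: "services/" })` in this sketch would scan `/repo/bookshop/services` in place of `/repo/bookshop/srv`, while leaving the `app` and `db` defaults untouched.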
225+
226+
```text
227+
TODO: add support for integration with `cds env` CLI command as a means of consistently getting configurations for CAP projects
228+
```
229+
230+
## Integration with `codeql` CLI
231+
232+
### File Processing
233+
234+
The extractor processes both:
235+
236+
- **Source Files**: Original `.cds` files for source code analysis
237+
- **Compiled Files**: Generated `.cds.json` files for semantic analysis
238+
239+
### Database Population
240+
241+
- Integrates with CodeQL's JavaScript extractor for final database population
242+
- Maintains proper file relationships and source locations
243+
- Supports CodeQL's standard indexing and filtering mechanisms
244+
245+
## Troubleshooting
246+
247+
### Common Issues
248+
249+
1. **Missing Node.js**: Ensure `node` command is available in PATH
250+
2. **CDS Dependencies**: Verify projects have valid `@sap/cds` dependencies
251+
3. **Compilation Failures**: Check CDS syntax and cross-file references
252+
4. **Memory Issues**: Monitor memory usage for very large projects
253+
254+
### Debugging
255+
256+
The extractor provides comprehensive logging:
257+
258+
- **Performance Tracking**: Times for each extraction phase
259+
- **Memory Usage**: Memory consumption at key milestones
260+
- **Error Reporting**: Detailed error messages with context
261+
- **Project Discovery**: Information about detected CDS projects
262+
263+
### Log Levels
264+
265+
- `info`: General progress and milestone information
266+
- `warn`: Non-critical issues that don't prevent extraction
267+
- `error`: Critical failures that may affect extraction quality
268+
269+
## References
270+
271+
- [SAP Cloud Application Programming Model][CAP]
272+
- [Default Structure of a CAP Project][CAP-project-structure]
273+
- [Core Data Services (CDS)][CDS]
274+
- [Project-Specific Configurations][CDS-ENV-project-configs]
275+
- [Conceptual Definition Language (CDL)][CDL]
276+
- [CodeQL Documentation](https://codeql.github.com/docs/)
277+
278+
[CAP]: https://cap.cloud.sap/docs/about/
279+
[CAP-project-structure]: https://cap.cloud.sap/docs/get-started/#project-structure
280+
[CDS]: https://cap.cloud.sap/docs/cds/
281+
[CDS-ENV-project-configs]: https://cap.cloud.sap/docs/node.js/cds-env#project-specific-configurations
282+
[CDL]: https://cap.cloud.sap/docs/cds/cdl
