# CodeQL CDS Extractor

A robust CodeQL extractor for [Core Data Services (CDS)][CDS] files used in [SAP Cloud Application Programming (CAP)][CAP] model projects. This extractor processes `.cds` files and compiles them into `.cds.json` files for CodeQL analysis while maintaining project-aware parsing and dependency resolution.

## Overview

The CodeQL CDS extractor is designed to efficiently process CDS projects by:

- **Project-Aware Processing**: Analyzes CDS files as related project configurations rather than independent definitions
- **Optimized Dependency Management**: Caches and reuses `@sap/cds` and `@sap/cds-dk` dependencies across projects
- **Enhanced Precision**: Reduces false positives in CodeQL queries by understanding cross-file relationships
- **Performance Optimization**: Avoids duplicate processing and unnecessary dependency installations

## Architecture

The extractor uses an `autobuild` approach with the following key components:

### Core Components

- **`cds-extractor.ts`**: Main entry point that orchestrates the extraction process
- **`src/cds/parser/`**: CDS project discovery and dependency graph building
- **`src/cds/compiler/`**: Compilation orchestration and `.cds.json` generation
- **`src/packageManager/`**: Dependency installation and caching
- **`src/logging/`**: Unified logging and performance tracking
- **`src/environment.ts`**: Environment setup and validation
- **`src/codeql.ts`**: CodeQL JavaScript extractor integration

### Extraction Process

1. **Environment Setup**: Validates CodeQL tools and system requirements
2. **Project Discovery**: Recursively scans for CDS projects and builds dependency graph
3. **Dependency Management**: Installs and caches required CDS compiler dependencies
4. **CDS Compilation**: Compiles `.cds` files to `.cds.json` using project-aware compilation
5. **JavaScript Extraction**: Runs CodeQL's JavaScript extractor on source and compiled files

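The five steps run strictly in sequence, with each phase consuming the output of the previous one. As a rough illustration, the TypeScript sketch below runs named phases in order and records per-phase timings (the kind of data the performance tracking reports). `Phase`, `runExtraction`, and the placeholder phase bodies are hypothetical; they are not the functions exported by `cds-extractor.ts`.

```typescript
// Hypothetical sketch: the phase names follow the "Extraction Process" list,
// but these types and functions are illustrative, not the extractor's API.
interface Phase {
  name: string;
  run: () => void;
}

interface PhaseTiming {
  name: string;
  durationMs: number;
}

// Runs the phases in order and records how long each one took. An exception
// thrown by a phase propagates and stops the remaining phases, because later
// phases depend on the output of earlier ones.
function runExtraction(phases: Phase[]): PhaseTiming[] {
  const timings: PhaseTiming[] = [];
  for (const phase of phases) {
    const start = Date.now();
    phase.run();
    timings.push({ name: phase.name, durationMs: Date.now() - start });
  }
  return timings;
}

// Usage with placeholder phase bodies that only record their execution order.
const executed: string[] = [];
const timings = runExtraction(
  [
    "Environment Setup",
    "Project Discovery",
    "Dependency Management",
    "CDS Compilation",
    "JavaScript Extraction",
  ].map((name) => ({
    name,
    run: () => {
      executed.push(name);
    },
  })),
);
```
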
## Usage

### Prerequisites

- Node.js (accessible via `node` command)
- CodeQL CLI tools
- SAP CDS projects with `.cds` files

### Running the Extractor

The extractor is typically invoked by CodeQL during database creation:

```bash
codeql database create --language=cds --source-root=/path/to/project my-database
```

### Manual Execution

For development and testing purposes:

```bash
# Build the extractor
npm run build

# Run directly (from project source root)
node dist/cds-extractor.js /path/to/source/root
```

## Development

### Project Structure

```text
extractors/cds/tools/
├── cds-extractor.ts          # Main entry point
├── src/                      # Source code modules
│   ├── cds/                  # CDS-specific functionality
│   │   ├── compiler/         # Compilation orchestration
│   │   └── parser/           # Project discovery and parsing
│   ├── logging/              # Logging and performance tracking
│   ├── packageManager/       # Dependency management
│   ├── codeql.ts             # CodeQL integration
│   ├── diagnostics.ts        # Error reporting
│   ├── environment.ts        # Environment setup
│   ├── filesystem.ts         # File system utilities
│   └── utils.ts              # General utilities
├── test/                     # Test suites
├── dist/                     # Compiled JavaScript output
└── package.json              # Project configuration
```

### Building

```bash
# Install dependencies
npm install

# Build TypeScript to JavaScript
npm run build

# Run all checks and build
npm run build:all
```

### Testing

```bash
# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Run tests in watch mode
npm run test:watch
```

### Code Quality

```bash
# Lint TypeScript files
npm run lint

# Auto-fix linting issues
npm run lint:fix

# Format code
npm run format
```

## Configuration

### Environment Variables

The extractor respects several CodeQL environment variables:

- `CODEQL_DIST`: Path to CodeQL distribution
- `CODEQL_EXTRACTOR_CDS_WIP_DATABASE`: Target database path
- `LGTM_INDEX_FILTERS`: File filtering configuration

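A minimal sketch of how these variables might be read and validated, assuming `CODEQL_DIST` and `CODEQL_EXTRACTOR_CDS_WIP_DATABASE` are required and `LGTM_INDEX_FILTERS` is optional (this README does not state which are mandatory). The function takes an environment-like object, so under Node.js it could be called with `process.env`; `readExtractorConfig` and `ExtractorConfig` are hypothetical names.

```typescript
// Hypothetical sketch: the variable names come from this README, but treating
// the first two as required, and the ExtractorConfig shape, are assumptions.
type EnvLike = Record<string, string | undefined>;

interface ExtractorConfig {
  codeqlDist: string;
  wipDatabase: string;
  indexFilters?: string; // optional; passed through unchanged when set
}

const REQUIRED_VARS = ["CODEQL_DIST", "CODEQL_EXTRACTOR_CDS_WIP_DATABASE"];

// Reads the extractor configuration from an environment-like object and
// fails fast if a required variable is missing or empty.
function readExtractorConfig(env: EnvLike): ExtractorConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variable(s): ${missing.join(", ")}`);
  }
  return {
    codeqlDist: env["CODEQL_DIST"] as string,
    wipDatabase: env["CODEQL_EXTRACTOR_CDS_WIP_DATABASE"] as string,
    indexFilters: env["LGTM_INDEX_FILTERS"],
  };
}
```

Failing before any project discovery starts keeps a misconfigured run from doing expensive dependency installation only to fail at the final extraction step.
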
### CDS Project Detection

Projects are detected based on:

- Presence of `package.json` files
- CDS files (`.cds`) in the project directory tree
- Valid CDS dependencies (`@sap/cds`, `@sap/cds-dk`) in `package.json`

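The detection rules can be expressed as a single predicate. In this sketch, requiring both a CDS dependency and at least one `.cds` file, and also checking `devDependencies`, are assumptions about how the rules combine; `PackageJson` models only the fields the check needs, and `isCdsProject` is a hypothetical name.

```typescript
// Hypothetical sketch of the detection rules listed above.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

const CDS_PACKAGES = ["@sap/cds", "@sap/cds-dk"];

// A directory counts as a CDS project when it has a package.json that
// declares a CDS dependency and at least one `.cds` file in its tree.
function isCdsProject(pkg: PackageJson | undefined, filesInTree: string[]): boolean {
  if (!pkg) return false; // no package.json at all
  const declared = { ...(pkg.dependencies ?? {}), ...(pkg.devDependencies ?? {}) };
  const hasCdsDependency = CDS_PACKAGES.some((name) => name in declared);
  const hasCdsFiles = filesInTree.some((file) => file.endsWith(".cds"));
  return hasCdsDependency && hasCdsFiles;
}
```
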
### Compilation Strategy

The extractor's compilation approach combines the following techniques:

1. **Dependency Graph Building**: Maps relationships between CDS projects
2. **Smart Caching**: Reuses compiled outputs and dependency installations
3. **Error Recovery**: Handles compilation failures gracefully
4. **Performance Tracking**: Monitors compilation times and resource usage

## Performance Features

### Optimized Dependency Management

- **Shared Dependency Cache**: Single installation per unique dependency combination
- **Isolated Environments**: Dependencies installed in temporary cache directories
- **No Source Modification**: Original project files remain unchanged

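One way to get a single installation per unique dependency combination is to derive a deterministic key from the resolved versions and group projects by that key, with each key naming one isolated cache directory. The key format, `ResolvedCdsDeps`, and `groupByCacheKey` are hypothetical, and the version numbers in the example are illustrative; the actual cache layout in `src/packageManager/` may differ.

```typescript
// Hypothetical sketch: one cache entry (and one install) per unique
// combination of resolved @sap/cds and @sap/cds-dk versions.
interface ResolvedCdsDeps {
  cds: string; // resolved @sap/cds version
  cdsDk: string; // resolved @sap/cds-dk version
}

// Deterministic key, usable as the name of an isolated cache directory.
function cacheKey(deps: ResolvedCdsDeps): string {
  return `cds-${deps.cds}__cds-dk-${deps.cdsDk}`;
}

// Maps each cache key to the project directories that can share it.
function groupByCacheKey(projects: Map<string, ResolvedCdsDeps>): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  projects.forEach((deps, projectDir) => {
    const key = cacheKey(deps);
    const members = groups.get(key) ?? [];
    members.push(projectDir);
    groups.set(key, members);
  });
  return groups;
}

// Illustrative versions: project-a and project-b share one install.
const groups = groupByCacheKey(
  new Map<string, ResolvedCdsDeps>([
    ["project-a", { cds: "7.9.2", cdsDk: "7.9.1" }],
    ["project-b", { cds: "7.9.2", cdsDk: "7.9.1" }],
    ["project-c", { cds: "6.8.4", cdsDk: "6.8.3" }],
  ]),
);
```

Three projects map to two cache keys here, so only two installations would be needed.
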
### Efficient Processing

- **Project-Level Compilation**: Compiles related CDS files together
- **Duplicate Avoidance**: Prevents redundant processing of imported files
- **Memory Tracking**: Monitors and reports memory usage throughout extraction

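Duplicate avoidance can be derived from the import graph: a file that another file imports (through a CDS `using ... from` statement) is compiled as part of its importer, so only files that nothing imports need to be compiled as entry points. The sketch assumes the import map has already been built during project discovery; `rootFiles` is a hypothetical helper.

```typescript
// Hypothetical sketch: pick compilation entry points from an import graph.
// Keys are `.cds` files; values are the files each one imports.
function rootFiles(imports: Map<string, string[]>): string[] {
  const imported = new Set<string>();
  imports.forEach((targets) => targets.forEach((target) => imported.add(target)));

  const roots: string[] = [];
  imports.forEach((_targets, file) => {
    // An imported file is reached through its importer, so compiling it
    // again on its own would duplicate its definitions.
    if (!imported.has(file)) roots.push(file);
  });
  // Files that only appear inside an import cycle have no root here and
  // would need separate handling (not shown).
  return roots.sort();
}

const graph = new Map<string, string[]>([
  ["db/schema.cds", []],
  ["srv/service.cds", ["db/schema.cds"]],
  ["app/annotations.cds", ["srv/service.cds"]],
]);
const roots = rootFiles(graph);
```

For this chain, compiling only `app/annotations.cds` pulls in the service and schema definitions exactly once.
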
### Scalability

- **Large Codebase Support**: Optimized for enterprise-scale CDS projects
- **Parallel Processing**: Where possible, processes independent projects concurrently
- **Resource Management**: Cleans up temporary files and cached dependencies

## Integration with `cds` CLI

### Installation of CDS (Node) Dependencies

#### Installation of `@sap/cds` and `@sap/cds-dk`

The CDS extractor attempts to optimize performance for most projects by caching the installation of the unique combinations of resolved CDS dependencies across all projects under a given source root.

The "unique combinations of resolved CDS dependencies" means that we resolve the **latest** available version **within the semantic version range** for each `@sap/cds` and `@sap/cds-dk` dependency specified in the `package.json` file for a given CAP project.

In practice, this means that if "project-a" and "project-b" both require `@sap/cds@^7.0.0`, the extractor resolves that range to the latest available 7.x release, installs it once, and reuses it for both projects. A "project-c" that requires `@sap/cds@^6.0.0` resolves to the latest available 6.x release instead, and therefore gets its own cached installation.

This is much faster than installing all dependencies for every project individually, especially for large projects with many CDS files. However, this approach has some limitations and trade-offs:

- This latest-first approach is more likely to choose the same version for multiple projects, which can reduce analysis time and can improve consistency in analysis between projects.
- This approach does not read (or respect) the `package-lock.json` file, which means that we are more likely to use a `cds` version that is different from the one most recently tested/used by the project developers.
- We are more likely to encounter incompatibility issues where a particular project hasn't been tested with the latest version of `@sap/cds` or `@sap/cds-dk`.

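Resolving "the latest available version within the semantic version range" amounts to picking the highest known version that satisfies each declared range. The sketch below is a hand-rolled illustration that handles caret ranges such as `^7.0.0` only (no pre-release tags, no other range syntaxes). The extractor may instead delegate this to npm or a semver library, and the list of available versions is made up for the example.

```typescript
// Hypothetical sketch: "latest version within a caret range" without a library.
type Version = [number, number, number];

function parseVersion(text: string): Version | undefined {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (!match) return undefined;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// npm caret semantics: ^1.2.3 := >=1.2.3 <2.0.0, ^0.2.3 := >=0.2.3 <0.3.0,
// ^0.0.3 := exactly 0.0.3.
function satisfiesCaret(version: Version, base: Version): boolean {
  if (compareVersions(version, base) < 0) return false;
  if (base[0] > 0) return version[0] === base[0];
  if (base[1] > 0) return version[0] === 0 && version[1] === base[1];
  return compareVersions(version, base) === 0;
}

// Highest available version matching `range` (e.g. "^7.0.0"), or undefined.
function maxSatisfyingCaret(available: string[], range: string): string | undefined {
  if (range.charAt(0) !== "^") return undefined; // other range syntaxes not handled
  const base = parseVersion(range.slice(1));
  if (!base) return undefined;
  let best: Version | undefined;
  let bestText: string | undefined;
  for (const text of available) {
    const version = parseVersion(text);
    if (version && satisfiesCaret(version, base) && (!best || compareVersions(version, best) > 0)) {
      best = version;
      bestText = text;
    }
  }
  return bestText;
}

// Made-up list of published versions, for illustration only.
const available = ["6.8.4", "7.0.0", "7.9.2", "8.0.1"];
const forCaret7 = maxSatisfyingCaret(available, "^7.0.0"); // "7.9.2"
const forCaret6 = maxSatisfyingCaret(available, "^6.0.0"); // "6.8.4"
```

Every project declaring `^7.0.0` resolves to the same version, which is what makes the shared cache effective; a `^6.0.0` project resolves elsewhere and needs its own install.
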
We can mitigate some of these issues through a (to be implemented) compilation retry mechanism for projects where some CDS compilation task(s) fail to produce the expected `.cds.json` output file(s).
The proposed retry mechanism would install the full set of dependencies for the affected project(s) while respecting the `package-lock.json` file, and then re-run the compilation for the affected project(s).

```text
TODO: retry mechanism expected before next release of the CDS extractor
```

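Because the retry mechanism is only proposed, the following is a sketch of one possible shape for it, not a description of existing code. `RetryHooks` and its three callbacks are hypothetical placeholders for compiling a project, checking for the expected `.cds.json` output, and installing the project's full dependency set (for example, honoring `package-lock.json`).

```typescript
// Hypothetical sketch of the proposed retry mechanism (not implemented yet).
interface RetryHooks {
  compile: (projectDir: string) => void; // may throw on failure
  outputExists: (projectDir: string) => boolean; // expected `.cds.json` present?
  installFullDependencies: (projectDir: string) => void;
}

interface RetryResult {
  succeeded: string[]; // compiled with the shared cached dependencies
  succeededAfterRetry: string[]; // compiled after the full dependency install
  failed: string[]; // still no expected output after the retry
}

// A compilation attempt fails if it throws or leaves no expected output.
function tryCompile(projectDir: string, hooks: RetryHooks): boolean {
  try {
    hooks.compile(projectDir);
  } catch (err) {
    return false;
  }
  return hooks.outputExists(projectDir);
}

function compileWithRetry(projects: string[], hooks: RetryHooks): RetryResult {
  const result: RetryResult = { succeeded: [], succeededAfterRetry: [], failed: [] };
  for (const project of projects) {
    if (tryCompile(project, hooks)) {
      result.succeeded.push(project);
      continue;
    }
    // Retry once with the project's own full dependency set.
    hooks.installFullDependencies(project);
    if (tryCompile(project, hooks)) {
      result.succeededAfterRetry.push(project);
    } else {
      result.failed.push(project);
    }
  }
  return result;
}
```

Only projects whose first attempt fails pay for a full installation, so the fast shared-cache path stays the default.
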
#### Installation of Additional Project-Specific Dependencies

```text
TODO: implement installation of dependencies required for compilation to succeed for a given project
```

### Integration with `cds compile` command

The CDS extractor uses the `cds compile` command to compile `.cds` files into `.cds.json` files, which are then processed by CodeQL's JavaScript extractor.

Where possible, a single `model.cds.json` file is generated for each project, containing all the compiled definitions from the project's `.cds` files. This results in a faster extraction process overall with minimal duplication of CDS code elements (e.g., annotations, entities, and services) within the CodeQL database created from the extraction process.

Where project-level compilation is not possible (e.g., due to project structure), the extractor generates individual `.cds.json` files for each `.cds` file in the project. The main downside to this approach is that if one `.cds` file imports another `.cds` file, the imported definitions will be duplicated in the CodeQL database, which can lead to false positives in queries that expect unique definitions.

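The choice between the two output layouts can be sketched as a small planning step. Whether project-level compilation is possible is decided outside the sketch and passed in as a flag (an assumption); `CompilationTask` and `planCompilation` are hypothetical names, and input paths are assumed to already include the project directory. Each task would then correspond to one `cds compile` invocation with `--to json`.

```typescript
// Hypothetical sketch: plan compilation tasks for one project.
interface CompilationTask {
  inputs: string[]; // `.cds` files passed to a single `cds compile` call
  output: string; // `.cds.json` file the call is expected to produce
}

function planCompilation(
  projectDir: string,
  cdsFiles: string[],
  projectLevelPossible: boolean,
): CompilationTask[] {
  if (projectLevelPossible) {
    // One model for the whole project: imported definitions appear once.
    return [{ inputs: cdsFiles, output: `${projectDir}/model.cds.json` }];
  }
  // Fallback: one output per file (`x.cds` -> `x.cds.json`); definitions a
  // file imports are repeated in that file's output.
  return cdsFiles.map((file) => ({ inputs: [file], output: `${file}.json` }));
}
```
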
```text
TODO: use the unique (session) ID of the CDS extractor run as the `<session>` part of `<basename>.<session>.cds.json` and set JS extractor env vars to only extract `.<session>.cds.json` files
```

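The proposed session-scoped naming could look roughly like this. Only the `<basename>.<session>.cds.json` pattern comes from the TODO above; the helper names and the example session ID are hypothetical, and how the filter is passed to the JavaScript extractor is not shown.

```typescript
// Hypothetical sketch of the proposed `<basename>.<session>.cds.json` naming.

// `db/schema.cds` + session `abc123` -> `db/schema.abc123.cds.json`
function sessionOutputName(cdsFile: string, session: string): string {
  const base = cdsFile.replace(/\.cds$/, "");
  return `${base}.${session}.cds.json`;
}

// True only for outputs written by this session, so `.cds.json` files left by
// earlier runs (or committed to the repository) would be skipped.
function belongsToSession(file: string, session: string): boolean {
  const suffix = `.${session}.cds.json`;
  return file.length >= suffix.length && file.slice(-suffix.length) === suffix;
}
```
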
### Integration with `cds env` command

The current version of the CDS extractor expects CAP projects to follow the [default project structure][CAP-project-structure], particularly regarding the names of the `app`, `db`, and `srv` subdirectories in which the extractor looks for `.cds` files to process (in addition to the root directory of the project).

The proposed solution will use the `cds env` command to discover configurations that affect the structure of the project and/or the expected "compilation tasks" for the project, such as user customizations of the following environment configurations:

- `cds.folders.app`
- `cds.folders.db`
- `cds.folders.srv`

```text
TODO: add support for integration with `cds env` CLI command as a means of consistently getting configurations for CAP projects
```

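A sketch of merging folder settings with CAP's defaults, assuming the values reported by `cds env` have already been parsed into an object with optional `app`, `db`, and `srv` keys, and assuming the defaults are `app/`, `db/`, and `srv/`. How the values are read from the CLI is not shown, and `resolveFolders` and `directoriesToScan` are hypothetical names.

```typescript
// Hypothetical sketch: merge configured CAP folder names with the defaults.
interface CdsFolders {
  app: string;
  db: string;
  srv: string;
}

// Assumed CAP defaults for cds.folders.app / .db / .srv.
const DEFAULT_FOLDERS: CdsFolders = { app: "app/", db: "db/", srv: "srv/" };

// Keeps the default for any folder that is not configured (or empty).
function resolveFolders(configured?: Partial<CdsFolders>): CdsFolders {
  return {
    app: configured?.app || DEFAULT_FOLDERS.app,
    db: configured?.db || DEFAULT_FOLDERS.db,
    srv: configured?.srv || DEFAULT_FOLDERS.srv,
  };
}

// Directories to scan for `.cds` files: the project root plus each folder,
// with trailing slashes removed before joining.
function directoriesToScan(projectDir: string, folders: CdsFolders): string[] {
  const subdirs = [folders.app, folders.db, folders.srv].map((f) => f.replace(/\/+$/, ""));
  return [projectDir, ...subdirs.map((dir) => `${projectDir}/${dir}`)];
}
```
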
## Integration with `codeql` CLI

### File Processing

The extractor processes both:

- **Source Files**: Original `.cds` files for source code analysis
- **Compiled Files**: Generated `.cds.json` files for semantic analysis

### Database Population

- Integrates with CodeQL's JavaScript extractor for final database population
- Maintains proper file relationships and source locations
- Supports CodeQL's standard indexing and filtering mechanisms

## Troubleshooting

### Common Issues

1. **Missing Node.js**: Ensure the `node` command is available in `PATH`
2. **CDS Dependencies**: Verify projects have valid `@sap/cds` dependencies
3. **Compilation Failures**: Check CDS syntax and cross-file references
4. **Memory Issues**: Monitor memory usage for very large projects

### Debugging

The extractor provides comprehensive logging:

- **Performance Tracking**: Times for each extraction phase
- **Memory Usage**: Memory consumption at key milestones
- **Error Reporting**: Detailed error messages with context
- **Project Discovery**: Information about detected CDS projects

### Log Levels

- `info`: General progress and milestone information
- `warn`: Non-critical issues that don't prevent extraction
- `error`: Critical failures that may affect extraction quality

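For illustration, a minimal leveled logger with elapsed-time stamps, using the three levels above. The class name, message format, `hasErrors` helper, and the example messages are all made up; the logging module in `src/logging/` may be structured differently.

```typescript
// Hypothetical sketch: a leveled logger using the three levels listed above.
type LogLevel = "info" | "warn" | "error";

interface LogEntry {
  level: LogLevel;
  message: string;
  elapsedMs: number; // milliseconds since the logger was created
}

class ExtractorLog {
  private readonly startedAt = Date.now();
  readonly entries: LogEntry[] = [];

  log(level: LogLevel, message: string): void {
    const entry: LogEntry = { level, message, elapsedMs: Date.now() - this.startedAt };
    this.entries.push(entry);
    const line = `[${level.toUpperCase()}] +${entry.elapsedMs}ms ${message}`;
    // Route each level to the matching console stream.
    if (level === "error") console.error(line);
    else if (level === "warn") console.warn(line);
    else console.log(line);
  }

  // Warnings do not block extraction; errors may affect its quality.
  hasErrors(): boolean {
    return this.entries.some((entry) => entry.level === "error");
  }
}

const extractorLog = new ExtractorLog();
extractorLog.log("info", "Starting project discovery");
extractorLog.log("warn", "Example warning: skipping an unreadable file");
```
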
## References

- [SAP Cloud Application Programming Model][CAP]
  - [Default Structure of a CAP Project][CAP-project-structure]
- [Core Data Services (CDS)][CDS]
  - [Project-Specific Configurations][CDS-ENV-project-configs]
- [Conceptual Definition Language (CDL)][CDL]
- [CodeQL Documentation](https://codeql.github.com/docs/)

[CAP]: https://cap.cloud.sap/docs/about/
[CAP-project-structure]: https://cap.cloud.sap/docs/get-started/#project-structure
[CDS]: https://cap.cloud.sap/docs/cds/
[CDS-ENV-project-configs]: https://cap.cloud.sap/docs/node.js/cds-env#project-specific-configurations
[CDL]: https://cap.cloud.sap/docs/cds/cdl