Hi team,
I'm experiencing very slow performance when running a CodeQL query on a Python project using `TaintTracking::Global`. The analysis never finishes, even after more than 2 hours, on a project that I believe is not very large. Below are some details:
- CVE project: CVE-2024-23637
- Python files: 263
- Total lines: ~88,981
- Sources: < 200
- Sinks: < 200
- Tracking config: `TaintTracking::Global<RemoteToFileConfiguration>`
My query looks like this:
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking

module RemoteToFileConfiguration implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    MySources::isSource(source)
  }

  predicate isSink(DataFlow::Node sink) {
    MySinks::isMySink(sink)
  }
}

module Flow = TaintTracking::Global<RemoteToFileConfiguration>;
import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "Flow path from source to sink"
I defined the sinks (and, in the same way, the sources) like this (simplified):
module MySinks {
  class Sink extends DataFlow::Node {
    Sink() {
      exists(FunctionValue func, Call call |
        // The name checks are parenthesized because `and` binds tighter than `or`;
        // without the parentheses, the first alternatives do not constrain `this`.
        (
          func.getQualifiedName() = "run_code" or
          func.getQualifiedName() = "check_syntax_error" or
          ...
        ) and
        call.getFunc().pointsTo(func) and
        this = DataFlow::exprNode(call.getAnArg())
      )
    }
  }

  predicate isMySink(DataFlow::Node sink) {
    sink instanceof Sink
  }
}
My questions:
- Why is the performance so slow in this case?
- Are there any best practices for optimizing `TaintTracking::Global` on Python?
- I tried using `func.getQualifiedName()` with a full path like "Module xml.etree.ElementInclude.Function default_loader", but it didn't work in VS Code (the function wasn't found). Is there a correct way to define sinks using fully qualified names for Python?
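
To help reproduce the last point, here is a minimal sketch of a diagnostic query, assuming the legacy points-to API (`FunctionValue`) that comes with `import python`. It lists the qualified names CodeQL reports for functions called `default_loader`, so they can be compared with the string passed to `getQualifiedName()`. The `getName()` filter is only an assumption for narrowing the output and is not part of the query above.

```ql
import python

// Diagnostic sketch (not part of the original query): list every function
// value named `default_loader` with the qualified name CodeQL reports for it.
from FunctionValue func
where func.getName() = "default_loader"
select func, func.getQualifiedName()
```

If this returns no rows, the function is probably not resolved by points-to analysis in this database at all, which would be a separate problem from the format of the name string.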
Thank you very much for any guidance or suggestions!