Commit 071ae9e

Explanation how to evaluate on new datasets

bhilprecht authored
1 parent 6d09007 commit 071ae9e

1 file changed: 15 additions, 1 deletion

README.md

Lines changed: 15 additions & 1 deletion
@@ -18,7 +18,21 @@ source venv/bin/activate
 pip3 install -r requirements.txt
 ```
 
-# Reproduce Experiments
+# How to experiment with DeepDB on a new Dataset
+- Specify a new schema in the schemas folder
+- Due to the current implementation, make sure to declare
+    - the primary key,
+    - the filename of the csv sample file,
+    - the correct table size and sample rate,
+    - the relationships among tables if you do not just run queries over a single table,
+    - any non-key functional dependencies (this is rather an implementation detail),
+    - and include all columns in the no-compression list by default (as done for the IMDB benchmark),
+- To further reduce the training time, you can exclude columns you do not need in your experiments (also done in the IMDB benchmark)
+- Generate the HDF/sampled HDF files and learn the RSPN ensemble
+- Use the RSPN ensemble to answer queries
+- For reference, please check the commands to reproduce the results of the paper
+
+# How to Reproduce Experiments in the Paper
 
 ## Cardinality Estimation
 Download the [Job dataset](http://homepages.cwi.nl/~boncz/job/imdb.tgz).
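
The schema items listed in the new README section can be read as a checklist. The following is a minimal, self-contained Python sketch of that checklist, not DeepDB's actual schema API: `TableSpec`, `Relationship`, `check_schema`, and the example tables `customer` and `orders` are hypothetical names introduced only for illustration. It assumes that functional dependencies are given as column pairs and that the sample rate is a fraction between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TableSpec:
    """Hypothetical record of the per-table items the README asks for."""
    name: str
    attributes: List[str]
    primary_key: List[str]
    csv_file: str        # filename of the csv sample file
    table_size: int      # row count of the full table
    sample_rate: float   # assumed: fraction of rows contained in the csv sample
    no_compression: List[str] = field(default_factory=list)
    # Non-key functional dependencies as (determinant, dependent) column pairs.
    functional_dependencies: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Relationship:
    """Hypothetical link: child.child_column references parent.parent_column."""
    child: str
    child_column: str
    parent: str
    parent_column: str


def check_schema(tables: List[TableSpec],
                 relationships: List[Relationship]) -> List[str]:
    """Return readable problems; an empty list means the checklist passes."""
    problems = []
    by_name = {t.name: t for t in tables}
    for t in tables:
        cols = set(t.attributes)
        if not t.primary_key or not set(t.primary_key) <= cols:
            problems.append(f"{t.name}: primary key missing or not among the attributes")
        if not t.csv_file:
            problems.append(f"{t.name}: no csv sample file given")
        if t.table_size <= 0 or not 0.0 < t.sample_rate <= 1.0:
            problems.append(f"{t.name}: table size or sample rate out of range")
        if cols - set(t.no_compression):
            problems.append(f"{t.name}: columns missing from the no-compression list")
        for det, dep in t.functional_dependencies:
            if det not in cols or dep not in cols:
                problems.append(f"{t.name}: dependency {det} -> {dep} uses unknown columns")
    for r in relationships:
        for table, column in ((r.child, r.child_column), (r.parent, r.parent_column)):
            if table not in by_name or column not in by_name[table].attributes:
                problems.append(f"relationship references unknown column {table}.{column}")
    return problems


# Illustrative two-table schema; all names and numbers are made up.
customer = TableSpec(
    name="customer",
    attributes=["id", "name", "region"],
    primary_key=["id"],
    csv_file="customer_sample.csv",
    table_size=1000,
    sample_rate=0.5,
    no_compression=["id", "name", "region"],
)
orders = TableSpec(
    name="orders",
    attributes=["id", "customer_id", "amount"],
    primary_key=["id"],
    csv_file="orders_sample.csv",
    table_size=5000,
    sample_rate=0.5,
    no_compression=["id", "customer_id", "amount"],
)
links = [Relationship("orders", "customer_id", "customer", "id")]

print(check_schema([customer, orders], links))
```

In this sketch, columns excluded to reduce training time would simply be left out of `attributes`. How the real schema declarations look should be taken from the IMDB benchmark schema in the repository's schemas folder.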
