This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit 5e9f1fb

sungchun12 and Sung Won Chung authored
Beautify readme (#671)
* reorganize and add lovely image
* capitalization
* add dev testing illustration
* fix emoji
* fix postgresql string
* efficient sentence
* Leo's feedback
* update image

---------

Co-authored-by: Sung Won Chung <[email protected]>
1 parent 2697d3a commit 5e9f1fb

3 files changed, +60 -49 lines changed

README.md

Lines changed: 60 additions & 49 deletions
@@ -1,25 +1,65 @@
-<p align="left">
+<p align="center">
 <a href="https://datafold.com/"><img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="30%" /></a>
 </p>

-<h1 align="left">
-data-diff: compare datasets fast, within or across SQL databases
-</h1>
+<h2 align="center">
+data-diff: Compare datasets fast, within or across SQL databases

+![data-diff-logo](docs/data-diff-logo.png)
+</h2>
 <br>

+# Use Cases
+
+## Data Migration & Replication Testing
+Compare source to target and check for discrepancies when moving data between systems:
+- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
+- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
+- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)
+
+
+## Data Development Testing
+Test SQL code and preview changes by comparing development/staging environment data to production:
+1. Make a change to some SQL code
+2. Run the SQL code to create a new dataset
+3. Compare the dataset with its production version or another iteration
+
+<p align="left">
+<img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" />
+</p>
+
+<details>
+<summary> data-diff integrates with dbt Core to seamlessly compare local development to production datasets
+
+</summary>
+
+![data-development-testing](docs/development_testing.png)
+
+</details>
+
+> [dbt Cloud users should check out Datafold's out-of-the-box deployment testing integration](https://www.datafold.com/data-deployment-testing)
+
+:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**
+
+**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)**
+
+Also available in a [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=Datafold.datafold-vscode)
+
+Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support
+
+
 # How it works

 When comparing the data, `data-diff` utilizes the resources of the underlying databases as much as possible. It has two primary modes of comparison:

-## joindiff
+## `joindiff`
 - Recommended for comparing data within the same database
 - Uses the outer join operation to diff the rows as efficiently as possible within the same database
 - Fully relies on the underlying database engine for computation
 - Requires both datasets to be queryable with a single SQL query
 - Time complexity approximates JOIN operation and is largely independent of the number of differences in the dataset
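
Reviewer note: the `joindiff` bullets describe an outer-join diff that runs entirely inside one database. The snippet below is a minimal sketch of that general idea, not `data-diff`'s actual implementation. It uses Python's built-in `sqlite3` purely for illustration; the function name `outer_join_diff`, the `prod`/`dev` tables, and the labels `changed`, `only_left`, and `only_right` are all hypothetical. A full outer join is emulated with two `LEFT JOIN`s so the query also runs on older SQLite builds.

```python
import sqlite3


def outer_join_diff(conn, left, right, key, columns):
    """Return (key, status) pairs for rows that differ between two tables.

    Illustrative only: identifiers are interpolated directly into the SQL,
    so pass trusted table and column names.
    """
    # NULL-safe inequality across the compared columns (SQLite's IS NOT).
    changed = " OR ".join(f"a.{c} IS NOT b.{c}" for c in columns)
    query = f"""
        SELECT a.{key}, CASE WHEN b.{key} IS NULL THEN 'only_left' ELSE 'changed' END
        FROM {left} AS a LEFT JOIN {right} AS b ON a.{key} = b.{key}
        WHERE b.{key} IS NULL OR {changed}
        UNION ALL
        SELECT b.{key}, 'only_right'
        FROM {right} AS b LEFT JOIN {left} AS a ON a.{key} = b.{key}
        WHERE a.{key} IS NULL
        ORDER BY 1
    """
    return conn.execute(query).fetchall()


# Hypothetical demo data: one changed row, one row per side that is missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dev  (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO prod VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO dev  VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")
diff = outer_join_diff(conn, "prod", "dev", "id", ["name"])
print(diff)
```

The database does all the comparison work in a single query, which is why this style of diff needs both tables to be reachable from the same connection.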

-## hashdiff
+## `hashdiff`
 - Recommended for comparing datasets across different databases
 - Can also be helpful in diffing very large tables with few expected differences within the same database
 - Employs a divide-and-conquer algorithm based on hashing and binary search
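
Reviewer note: the last `hashdiff` bullet mentions a divide-and-conquer algorithm based on hashing and binary search. The following is a rough sketch of that general technique under stated assumptions, not the library's code. Two in-memory dicts with integer keys stand in for tables in separate databases, and `checksum`, `hash_diff`, and `threshold` are hypothetical names. In a real cross-database setting, each side would presumably compute the segment checksum itself so that only hashes cross the network.

```python
import hashlib


def checksum(table, lo, hi):
    """Hash all (key, value) rows with lo <= key < hi, in key order."""
    h = hashlib.sha256()
    for k in sorted(k for k in table if lo <= k < hi):
        h.update(f"{k}:{table[k]};".encode())
    return h.hexdigest()


def hash_diff(a, b, lo, hi, threshold=4):
    """Return keys in [lo, hi) whose rows differ between dicts a and b."""
    if checksum(a, lo, hi) == checksum(b, lo, hi):
        return []  # segments match, so no individual rows need inspecting
    if hi - lo <= threshold:
        # Small segment: fall back to a direct row-by-row comparison.
        keys = {k for k in (a.keys() | b.keys()) if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    mid = (lo + hi) // 2  # bisect the key range and recurse into each half
    return hash_diff(a, b, lo, mid, threshold) + hash_diff(a, b, mid, hi, threshold)


# Hypothetical demo data: one modified row and one missing row.
source = {i: f"row-{i}" for i in range(1000)}
target = dict(source)
target[137] = "changed"
del target[900]
result = hash_diff(source, target, 0, 1000)
print(result)  # [137, 900]
```

Matching segments are discarded after a single checksum comparison, so the amount of row-level work scales with the number of differing segments. That fits the README's remark that this mode suits very large tables with few expected differences.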
@@ -52,61 +92,32 @@ data-diff \
 Check out [documentation](https://docs.datafold.com/reference/open_source/cli) for the full command reference.


-# Use cases
-
-## Data Migration & Replication Testing
-Compare source to target and check for discrepancies when moving data between systems:
-- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
-- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
-- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)
-
-
-## Data Development Testing
-Test SQL code and preview changes by comparing development/staging environment data to production:
-1. Make a change to some SQL code
-2. Run the SQL code to create a new dataset
-3. Compare the dataset with its production version or another iteration
-
-<p align="left">
-<img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" />
-</p>
-
-`data-diff` integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets.
-
-:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**
-
-**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)**
-
-Also available in a [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=Datafold.datafold-vscode)
-
-Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support
-
 # Supported databases


 | Database | Status | Connection string |
 |---------------|-------------------------------------------------------------------------------------------------------------------------------------|--------|
-| PostgreSQL >=10 | 💚 | `postgresql://<user>:<password>@<host>:5432/<database>` |
-| MySQL | 💚 | `mysql://<user>:<password>@<hostname>:5432/<database>` |
-| Snowflake | 💚 | `"snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]"` |
-| BigQuery | 💚 | `bigquery://<project>/<dataset>` |
-| Redshift | 💚 | `redshift://<username>:<password>@<hostname>:5439/<database>` |
-| Oracle | 💛 | `oracle://<username>:<password>@<hostname>/database` |
-| Presto | 💛 | `presto://<username>:<password>@<hostname>:8080/<database>` |
-| Databricks | 💛 | `databricks://<http_path>:<access_token>@<server_hostname>/<catalog>/<schema>` |
-| Trino | 💛 | `trino://<username>:<password>@<hostname>:8080/<database>` |
-| Clickhouse | 💛 | `clickhouse://<username>:<password>@<hostname>:9000/<database>` |
-| Vertica | 💛 | `vertica://<username>:<password>@<hostname>:5433/<database>` |
-| DuckDB | 💛 | |
+| PostgreSQL >=10 | 🟢 | `postgresql://<user>:<password>@<host>:5432/<database>` |
+| MySQL | 🟢 | `mysql://<user>:<password>@<hostname>:5432/<database>` |
+| Snowflake | 🟢 | `"snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]"` |
+| BigQuery | 🟢 | `bigquery://<project>/<dataset>` |
+| Redshift | 🟢 | `redshift://<username>:<password>@<hostname>:5439/<database>` |
+| Oracle | 🟡 | `oracle://<username>:<password>@<hostname>/database` |
+| Presto | 🟡 | `presto://<username>:<password>@<hostname>:8080/<database>` |
+| Databricks | 🟡 | `databricks://<http_path>:<access_token>@<server_hostname>/<catalog>/<schema>` |
+| Trino | 🟡 | `trino://<username>:<password>@<hostname>:8080/<database>` |
+| Clickhouse | 🟡 | `clickhouse://<username>:<password>@<hostname>:9000/<database>` |
+| Vertica | 🟡 | `vertica://<username>:<password>@<hostname>:5433/<database>` |
+| DuckDB | 🟡 | |
 | ElasticSearch | 📝 | |
 | Planetscale | 📝 | |
 | Pinot | 📝 | |
 | Druid | 📝 | |
 | Kafka | 📝 | |
 | SQLite | 📝 | |

-* 💚: Implemented and thoroughly tested.
-* 💛: Implemented, but not thoroughly tested yet.
+* 🟢: Implemented and thoroughly tested.
+* 🟡: Implemented, but not thoroughly tested yet.
 * ⏳: Implementation in progress.
 * 📝: Implementation planned. Contributions welcome.
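
Reviewer note: the connection strings in the table above follow the familiar URI layout (`scheme://user:password@host:port/path?options`). As a small illustration of how such a string breaks down, the sketch below uses only Python's standard `urllib.parse`. It is not how `data-diff` parses its arguments; `parse_connection_string` and the sample credentials are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_connection_string(uri):
    """Split a database URI into its parts (illustrative helper only)."""
    parts = urlsplit(uri)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # Path segments, e.g. [database] or [database, schema].
        "path": [p for p in parts.path.split("/") if p],
        # Query options such as warehouse or role; first value wins.
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


# Made-up values that follow the PostgreSQL and Snowflake row shapes.
pg = parse_connection_string("postgresql://alice:secret@localhost:5432/analytics")
sf = parse_connection_string("snowflake://bob@my_account/db/PUBLIC?warehouse=WH&role=analyst")
print(pg)
print(sf)
```

Splitting the string this way makes it easy to check that the port and path segments match the shape each row expects before trying to connect.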

docs/data-diff-logo.png

40.8 KB

docs/development_testing.png

69.7 KB
