
Commit 398024c

feat: csvstat supports the --no-inference, --locale, --blanks, --date-format, --datetime-format options, closes #965
1 parent ec9ca61 commit 398024c


4 files changed, +23 -9 lines changed


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ Unreleased
 ----------
 
 * :doc:`/scripts/csvformat` adds a :code:`--skip-header` (:code:`-E`) option to not output a header row.
+* :doc:`/scripts/csvstat` supports the :code:`--no-inference` (:code:`-I`), :code:`--locale` (:code:`-L`), :code:`--blanks`, :code:`--date-format` and :code:`--datetime-format` options.
 * :doc:`/scripts/csvstat` adds a :code:`--json` option to output results as JSON text.
 * :doc:`/scripts/csvstat` adds an :code:`--indent` option to indent the JSON text when :code:`--json` is set.
 * :doc:`/scripts/csvstat` reports a "Non-null values" statistic (or a :code:`nonnulls` column when :code:`--csv` is set).
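
The changelog entries above mention `--json`, `--indent` and a `nonnulls` statistic. As a rough, stdlib-only illustration (not csvstat's implementation), the sketch below counts non-null values per column and serializes the result with `json.dumps`, passing an optional indent through. The function names and the `column_name` key are hypothetical, and treating `None` as NULL is an assumption.

```python
import json


def column_summaries(columns):
    """Hypothetical helper: `columns` maps a column name to its parsed values (None = NULL)."""
    summaries = []
    for name, values in columns.items():
        # Count values that are not NULL, in the spirit of the "Non-null values" statistic.
        nonnulls = sum(1 for value in values if value is not None)
        summaries.append({'column_name': name, 'nonnulls': nonnulls})
    return summaries


def to_json(columns, indent=None):
    # indent=None gives compact JSON; an integer pretty-prints it, as an --indent option might.
    return json.dumps(column_summaries(columns), indent=indent)


print(to_json({'a': [1, None, 3], 'b': [None, None]}, indent=2))
```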

csvkit/utilities/csvstat.py

Lines changed: 4 additions & 1 deletion
@@ -69,7 +69,6 @@
 
 class CSVStat(CSVKitUtility):
     description = 'Print descriptive statistics for each column in a CSV file.'
-    override_flags = ['L', 'blanks', 'date-format', 'datetime-format']
 
     def add_arguments(self):
         self.argparser.add_argument(
@@ -144,6 +143,9 @@ def add_arguments(self):
             '-y', '--snifflimit', dest='sniff_limit', type=int, default=1024,
             help='Limit CSV dialect sniffing to the specified number of bytes. '
             'Specify "0" to disable sniffing entirely, or "-1" to sniff the entire file.')
+        self.argparser.add_argument(
+            '-I', '--no-inference', dest='no_inference', action='store_true',
+            help='Disable type inference when parsing the input. Disable reformatting of values.')
 
     def main(self):
         if self.args.names_only:
@@ -183,6 +185,7 @@ def main(self):
             self.input_file,
             skip_lines=self.args.skip_lines,
             sniff_limit=sniff_limit,
+            column_types=self.get_column_types(),
             **self.reader_kwargs,
         )
 
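
The hunks above drop `override_flags`, add a `store_true` `-I`/`--no-inference` flag, and pass `column_types=self.get_column_types()` to the reader, so the shared parsing options now affect how csvstat types its columns. The stdlib-only sketch below imitates that flow under stated assumptions: `build_parser` and `infer_type` are hypothetical names, the order in which types are tried is invented for illustration, and it uses neither agate nor csvkit's real `get_column_types`.

```python
import argparse
from datetime import datetime


def build_parser():
    """Hypothetical parser declaring -I/--no-inference the same way as the diff above."""
    parser = argparse.ArgumentParser(prog='csvstat-sketch')
    parser.add_argument(
        '-I', '--no-inference', dest='no_inference', action='store_true',
        help='Disable type inference when parsing the input.')
    parser.add_argument('--date-format', dest='date_format', default=None)
    return parser


def _all_parse(values, parse):
    """True if every value can be parsed by `parse` without raising ValueError."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def infer_type(values, no_inference=False, date_format=None):
    """Return a hypothetical type name for a column of raw strings."""
    if no_inference:
        # Assumption: with inference disabled, every column stays text.
        return 'text'
    # Illustrative order only: numbers first, then dates when a strptime format is given.
    if _all_parse(values, float):
        return 'number'
    if date_format and _all_parse(values, lambda v: datetime.strptime(v, date_format)):
        return 'date'
    return 'text'


args = build_parser().parse_args(['-I', '--date-format', '%m/%d/%Y'])
print(infer_type(['1', '2.5'], args.no_inference, args.date_format))  # text
print(infer_type(['03/15/2021'], date_format=args.date_format))      # date
```

Without the flag, `infer_type(['1', '2.5'])` returns `'number'`; the sketch keeps the "no inference" branch first so that no parsing is attempted at all when it is set.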

docs/common_arguments.rst

Lines changed: 5 additions & 2 deletions
@@ -31,8 +31,11 @@ csvkit's tools share a set of common command-line arguments. Not every argument
                         Specify the locale (en_US) of any formatted numbers.
   -S, --skipinitialspace
                         Ignore whitespace immediately following the delimiter.
-  --blanks              Do not coerce empty, "na", "n/a", "none", "null", "."
-                        strings to NULL values.
+  --blanks              Do not convert "", "na", "n/a", "none", "null", "." to
+                        NULL.
+  --null-value NULL_VALUES [NULL_VALUES ...]
+                        Convert this value to NULL. --null-value can be
+                        specified multiple times.
   --date-format DATE_FORMAT
                         Specify a strptime date format string like "%m/%d/%Y".
   --datetime-format DATETIME_FORMAT
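
The `--blanks` and `--null-value` help text above describes which cell strings become NULL. A minimal sketch of that rule follows. It assumes the comparison is made on the stripped, lowercased cell and that extra `--null-value` strings still apply when `--blanks` is set; neither detail comes from this diff. `is_null` and `to_nulls` are hypothetical helpers.

```python
# Default NULL strings, as listed in the --blanks help text above.
DEFAULT_NULL_VALUES = ('', 'na', 'n/a', 'none', 'null', '.')


def is_null(value, blanks=False, null_values=()):
    """Hypothetical helper: should this raw CSV cell become NULL (None)?"""
    # Assumption: compare the stripped, lowercased cell.
    normalized = value.strip().lower()
    # --blanks keeps the default NULL strings as ordinary values.
    if not blanks and normalized in DEFAULT_NULL_VALUES:
        return True
    # Assumption: extra --null-value strings apply whether or not --blanks is set.
    return normalized in (v.strip().lower() for v in null_values)


def to_nulls(row, blanks=False, null_values=()):
    """Replace NULL-like cells in a row with None."""
    return [None if is_null(cell, blanks, null_values) else cell for cell in row]


print(to_nulls(['', 'NA', 'x']))                                # [None, None, 'x']
print(to_nulls(['', 'NA', 'x'], blanks=True))                   # ['', 'NA', 'x']
print(to_nulls(['-', 'n/a'], blanks=True, null_values=['-']))   # [None, 'n/a']
```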

docs/scripts/csvstat.rst

Lines changed: 13 additions & 6 deletions
@@ -10,12 +10,15 @@ Prints descriptive statistics for all columns in a CSV file. Will intelligently
 .. code-block:: bash
 
   usage: csvstat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
-                 [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
-                 [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [--json]
+                 [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE]
+                 [-S] [--blanks] [--null-value NULL_VALUES [NULL_VALUES ...]]
+                 [--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT]
+                 [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [--json]
                  [-i INDENT] [-n] [-c COLUMNS] [--type] [--nulls] [--non-nulls]
-                 [--unique] [--min] [--max] [--sum] [--mean] [--median] [--stdev]
-                 [--len] [--freq] [--freq-count FREQ_COUNT] [--count]
-                 [--decimal-format DECIMAL_FORMAT] [-G] [-y SNIFF_LIMIT]
+                 [--unique] [--min] [--max] [--sum] [--mean] [--median]
+                 [--stdev] [--len] [--max-precision] [--freq]
+                 [--freq-count FREQ_COUNT] [--count]
+                 [--decimal-format DECIMAL_FORMAT] [-G] [-y SNIFF_LIMIT] [-I]
                  [FILE]
 
   Print descriptive statistics for each column in a CSV file.
@@ -48,6 +51,7 @@ Prints descriptive statistics for all columns in a CSV file. Will intelligently
     --median              Only output medians.
     --stdev               Only output standard deviations.
     --len                 Only output the length of the longest values.
+    --max-precision       Only output the most decimal places.
     --freq                Only output lists of frequent values.
     --freq-count FREQ_COUNT
                           The maximum number of frequent values to display.
@@ -59,7 +63,10 @@ Prints descriptive statistics for all columns in a CSV file. Will intelligently
                           Do not use grouping separators in decimal numbers.
     -y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT
                           Limit CSV dialect sniffing to the specified number of
-                          bytes. Specify "0" to disable sniffing.
+                          bytes. Specify "0" to disable sniffing entirely, or
+                          "-1" to sniff the entire file.
+    -I, --no-inference    Disable type inference when parsing the input. Disable
+                          reformatting of values.
 
 See also: :doc:`../common_arguments`.
 
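
Two behaviours documented in the csvstat help above lend themselves to a short sketch: the `-y`/`--snifflimit` values `0` (no sniffing) and `-1` (sniff the whole file), and `--max-precision` (the most decimal places in a column). The helpers below are hypothetical and stdlib-only, built on `csv.Sniffer` and `decimal.Decimal`. Note that they read characters from a text stream, whereas the help text speaks of bytes, and counting trailing zeros as decimal places is an assumption.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def sniff_sample(f, sniff_limit=1024):
    """Return the text to sniff, or None when sniffing is disabled."""
    if sniff_limit == 0:
        return None          # "0" disables sniffing entirely
    if sniff_limit == -1:
        return f.read()      # "-1" sniffs the entire file
    return f.read(sniff_limit)


def detect_dialect(f, sniff_limit=1024):
    """Guess the CSV dialect from a sample, then rewind so the file can be parsed."""
    sample = sniff_sample(f, sniff_limit)
    f.seek(0)
    if not sample:
        return None
    try:
        return csv.Sniffer().sniff(sample)
    except csv.Error:
        return None          # caller falls back to the default dialect


def max_precision(values):
    """Most decimal places among the numeric strings in a column."""
    places = 0
    for value in values:
        try:
            exponent = Decimal(value).as_tuple().exponent
        except InvalidOperation:
            continue         # skip non-numeric cells in this sketch
        # Special values such as NaN have a non-integer exponent.
        if isinstance(exponent, int) and exponent < 0:
            places = max(places, -exponent)
    return places


print(detect_dialect(io.StringIO('a;b\n1;2\n')).delimiter)  # ;
print(max_precision(['1', '2.5', '3.125', 'n/a']))          # 3
```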
