diff --git a/README.md b/README.md
index ee2d6d6..86930ca 100644
--- a/README.md
+++ b/README.md
@@ -25,10 +25,8 @@ There are a couple ways to install `tantivy-cli`.
 
 If you are a Rust programmer, you probably have `cargo` installed and you can
 just run `cargo install tantivy-cli`
-
-
 ## Creating the index: `new`
- 
+
 Let's create a directory in which your index will be stored.
 
 ```bash
@@ -36,39 +34,34 @@ Let's create a directory in which your index will be stored.
 mkdir wikipedia-index
 ```
 
-
 We will now initialize the index and create its schema.
 The [schema](https://quickwit-oss.github.io/tantivy/tantivy/schema/index.html) defines the list of your fields, and for each field:
 
-- its name 
+- its name
 - its type, currently `u64`, `i64` or `str`
 - how it should be indexed.
 
-You can find more information about the latter on 
+You can find more information about the latter on
 [tantivy's schema documentation page](https://quickwit-oss.github.io/tantivy/tantivy/schema/index.html)
 
 In our case, our documents will contain
 
 * a title
-* a body 
+* a body
 * a url
 
-We want the title and the body to be tokenized and indexed. We also want 
+We want the title and the body to be tokenized and indexed. We also want
 to add the term frequency and term positions to our index.
 
 Running `tantivy new` will start a wizard that will help you define the schema of the new index.
 
-Like all the other commands of `tantivy`, you will have to 
+Like all the other commands of `tantivy`, you will have to
 pass it your index directory via the `-i` or `--index` parameter as follows:
 
-
 ```bash
 tantivy new -i wikipedia-index
 ```
 
-
-
-
 Answer the questions as follows:
 
 ```none
@@ -142,13 +135,10 @@ It is a fairly human readable JSON, so you can check its content.
 
 It contains two sections:
 - segments (currently empty, but we will change that soon)
-- schema 
-
-
+- schema
 
 # Indexing the document: `index`
 
-
 Tantivy's `index` command offers a way to index a json file.
 The file must contain one JSON object per line.
 The structure of this JSON object must match that of our schema definition.
@@ -168,10 +158,9 @@ If you are in a rush you can [download 100 articles in the right format here (11
 
 The `index` command will index your document.
 By default it will use as 3 thread, each with a buffer size of 1GB split a
-across these threads. 
+across these threads.
 
-
-```
+```bash
 cat wiki-articles.json | tantivy index -i ./wikipedia-index
 ```
 
@@ -192,18 +181,18 @@ The main file is `meta.json`.
 You should also see a lot of files with a UUID as filename, and different extensions.
 Our index is in fact divided in segments.
 Each segment acts as an individual smaller index.
-Its name is simply a uuid. 
+Its name is simply a uuid.
 
 If you decided to index the complete wikipedia, you may also see some of these files disappear.
 Having too many segments can hurt search performance, so tantivy actually automatically starts
-merging segments. 
+merging segments.
 
 # Serve the search index: `serve`
 
 Tantivy's cli also embeds a search server.
 You can run it with the following command.
 
-```
+```bash
 tantivy serve -i wikipedia-index
 ```
 
@@ -218,32 +207,31 @@ By default this query is treated as `barack OR obama`.
 You can also search for documents that contains both term, by adding a `+` sign before the terms in your query.
 
 http://localhost:3000/api/?q=%2Bbarack%20%2Bobama&nhits=20
- 
+
 Also, `-` makes it possible to remove documents the documents containing a specific term.
 
 http://localhost:3000/api/?q=-barack%20%2Bobama&nhits=20
- 
+
 Finally tantivy handle phrase queries.
 
 http://localhost:3000/api/?q=%22barack%20obama%22&nhits=20
- 
+
 # Search the index via the command line
 
 You may also use the `search` command to stream all documents matching a specific query.
 The documents are returned in an unspecified order.
 
-```
+```bash
 tantivy search -i wikipedia-index -q "barack obama"
 tantivy search -i hdfs --query "*" --agg '{"severities":{"terms":{"field":"severity_text"}}}'
 ```
 
-
 # Benchmark the index: `bench`
 
 Tantivy's cli provides a simple benchmark tool.
 You can run it with the following command.
 
-```
+```bash
 tantivy bench -i wikipedia-index -n 10 -q queries.txt
 ```
 
diff --git a/src/main.rs b/src/main.rs
index d7beba8..b015c4a 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -33,6 +33,7 @@ fn main() {
                 .arg(Arg::new("host")
                     .long("host")
                     .value_name("host")
+                    .default_value("localhost")
                     .help("host to listen to")
                     .default_value("localhost")
                     .value_parser(clap::value_parser!(String))
@@ -42,7 +43,6 @@ fn main() {
                     .long("port")
                     .value_name("port")
                     .help("Port")
-                    .default_value("3000")
                     .value_parser(clap::value_parser!(usize))
                 )
             )
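The `serve` section of the README shows search URLs whose `q` parameter is percent-encoded: `+` becomes `%2B`, a space becomes `%20`, `"` becomes `%22`, and `-` is left unchanged. Below is a minimal, standard-library-only Rust sketch of how such a URL could be built from a host, a port and a raw query string. The helpers `encode_query` and `search_url` are hypothetical and not part of tantivy-cli; the sketch assumes the `/api/?q=...&nhits=...` shape and the `localhost:3000` address used in the README examples, and it encodes every byte outside the RFC 3986 unreserved set.

```rust
/// Hypothetical helper: percent-encode `query` for use as a URL query-parameter value.
/// Bytes in the RFC 3986 "unreserved" set are kept; every other byte
/// (including each byte of a multi-byte UTF-8 character) becomes `%XX`.
fn encode_query(query: &str) -> String {
    let mut out = String::with_capacity(query.len());
    for byte in query.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char);
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: build a search URL, assuming the
/// `/api/?q=<query>&nhits=<n>` shape used in the README examples.
fn search_url(host: &str, port: u16, query: &str, nhits: usize) -> String {
    format!(
        "http://{}:{}/api/?q={}&nhits={}",
        host,
        port,
        encode_query(query),
        nhits
    )
}

fn main() {
    // Host and port taken from the README's example URLs (localhost:3000).
    let (host, port) = ("localhost", 3000);
    let queries = ["barack obama", "+barack +obama", "-barack +obama", "\"barack obama\""];
    for query in queries {
        println!("{}", search_url(host, port, query, 20));
    }
}
```

Run as-is, it prints one URL per query; the last three are identical to the `+barack +obama`, `-barack +obama` and phrase-query URLs in the README. Encoding byte by byte keeps the helper correct for non-ASCII queries as well, since each UTF-8 byte is escaped separately.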