CHANGELOG

2024-06-28

  • Dolma v1.7 is now available! We will deprecate the Dolma v1.6 index (v4_dolma-v1_6_llama) on July 7.

2024-06-12

  • The count query now has two optional fields, max_clause_freq and max_diff_tokens, both are for customizing CNF queries.

  • The ntd and infgram_ntd queries now has an optional field, max_support, which is to customize the accuracy of result.

  • The search_docs query now has four optional fields. max_clause_freq and max_diff_tokens which are similar to those in count queries. max_disp_len which controls how many tokens to return per document. maxnum is now optional with a default value of 1.

2024-06-06

  • The input field corpus is renamed to index. Support for corpus will be discontinued sometime in the future. Please update your scripts accordingly.

  • count, ntd, and infgram_ntd queries now returns an extra field approx.

  • infgram_prob and infgram_ntd queries now returns an extra field suffix_len.

  • For search_docs queries, each returned document now contains an extra field token_ids.

2024-05-08

  • The count query now supports CNF inputs, similar to search_docs.

2024-04-15

  • We’re lifting the restriction on concurrent requests and sleeping between requests. Now it should be OK to issue concurrent requests. Though our server is serving a lot of requests and you may experience longer latency.

2024-03-02

  • The API now supports inputting a list of token IDs in place of a string as the query. Check out the query_ids field.

2024-02-23

  • The output field tokenized is deprecated and replaced by token_ids and tokens in all query types (except in search_docs, where the new fields are token_idsss and tokensss). The tokenized field will be removed on 2024-03-01.

  • For ntd and infgram_ntd queries, there is now a new output field prompt_cnt, and the output field ntd is deprecated and replaced by result_by_token_id. The ntd field will be removed on 2024-03-01.

  • For search_docs queries, the output field docs is deprecated and replaced by documents, which contains additional metadata of the retrieved documents. The docs field will be removed on 2024-03-01.