CHANGELOG
2024-06-28
Dolma v1.7 is now available! We will deprecate the Dolma v1.6 index (v4_dolma-v1_6_llama) on July 7.
2024-06-12
The count query now has two optional fields, max_clause_freq and max_diff_tokens, both for customizing CNF queries.
The ntd and infgram_ntd queries now have an optional field, max_support, which controls the accuracy of the result.
The search_docs query now has four optional fields: max_clause_freq and max_diff_tokens, which are similar to those in count queries; max_disp_len, which controls how many tokens to return per document; and maxnum, which is now optional with a default value of 1.
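As a rough illustration of the optional fields above, the sketch below builds a search_docs request body and includes only the optional fields the caller sets, so that the server-side defaults apply otherwise. The helper name build_search_docs_payload, the query_type and query keys, and the example values are assumptions made for illustration; only index, maxnum, max_clause_freq, max_diff_tokens, and max_disp_len are named in this changelog.

```python
from typing import Optional


def build_search_docs_payload(
    index: str,
    query: str,
    maxnum: Optional[int] = None,
    max_clause_freq: Optional[int] = None,
    max_diff_tokens: Optional[int] = None,
    max_disp_len: Optional[int] = None,
) -> dict:
    """Build a search_docs request body (hypothetical helper, not part of the API)."""
    # `query_type` and `query` are assumed key names for this sketch.
    payload = {"index": index, "query_type": "search_docs", "query": query}
    optional = {
        "maxnum": maxnum,
        "max_clause_freq": max_clause_freq,
        "max_diff_tokens": max_diff_tokens,
        "max_disp_len": max_disp_len,
    }
    # Leave out unset fields so the server applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


# The index name is the one mentioned in this changelog; 20 is an arbitrary value.
payload = build_search_docs_payload(
    "v4_dolma-v1_6_llama", "natural language processing", max_disp_len=20
)
print(payload)
```

Omitting unset fields, rather than sending guessed defaults, keeps the request valid even if the server-side defaults change later.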
2024-06-06
The input field corpus is renamed to index. Support for corpus will be discontinued sometime in the future. Please update your scripts accordingly.
count, ntd, and infgram_ntd queries now return an extra field approx.
infgram_prob and infgram_ntd queries now return an extra field suffix_len.
For search_docs queries, each returned document now contains an extra field token_ids.
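For scripts that still send the old corpus field, one possible way to migrate is to rename the key just before the request is sent. This is a minimal sketch: migrate_payload is a hypothetical helper, and the query_type and query keys in the example are assumptions rather than names confirmed by this changelog.

```python
def migrate_payload(payload: dict) -> dict:
    """Return a copy of a request body that uses `index` instead of `corpus`."""
    migrated = dict(payload)  # copy so the caller's dict is not modified
    if "corpus" in migrated:
        old_value = migrated.pop("corpus")
        # If both keys are present, keep the explicit `index` value.
        migrated.setdefault("index", old_value)
    return migrated


# `query_type` and `query` are assumed key names for this sketch.
old_payload = {
    "corpus": "v4_dolma-v1_6_llama",
    "query_type": "count",
    "query": "natural language processing",
}
new_payload = migrate_payload(old_payload)
print(new_payload)
```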
2024-05-08
The count query now supports CNF inputs, similar to search_docs.
2024-04-15
We’re lifting the restriction on concurrent requests and the requirement to sleep between requests, so it should now be OK to issue concurrent requests. Note, however, that our server is handling a lot of requests, so you may experience longer latency.
2024-03-02
The API now supports inputting a list of token IDs in place of a string as the query. Check out the query_ids field.
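A client that accepts either form of query might pick the field based on the input type, as in the sketch below. The helper build_query_payload and the key names index, query_type, and query are assumptions for illustration (index reflects the later rename noted in this changelog); the token IDs are arbitrary placeholders, not the output of any real tokenizer.

```python
from typing import Sequence, Union


def build_query_payload(
    index: str, query_type: str, query: Union[str, Sequence[int]]
) -> dict:
    """Build a request body from a string query or a list of token IDs."""
    payload = {"index": index, "query_type": query_type}
    if isinstance(query, str):
        payload["query"] = query  # assumed key name for string queries
    else:
        # Token IDs go in the query_ids field instead of a string query.
        payload["query_ids"] = [int(token_id) for token_id in query]
    return payload


text_payload = build_query_payload("v4_dolma-v1_6_llama", "count", "natural language")
ids_payload = build_query_payload("v4_dolma-v1_6_llama", "count", [101, 202, 303])
print(text_payload)
print(ids_payload)
```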
2024-02-23
The output field tokenized is deprecated and replaced by token_ids and tokens in all query types (except in search_docs, where the new fields are token_idsss and tokensss). The tokenized field will be removed on 2024-03-01.
For ntd and infgram_ntd queries, there is now a new output field prompt_cnt, and the output field ntd is deprecated and replaced by result_by_token_id. The ntd field will be removed on 2024-03-01.
For search_docs queries, the output field docs is deprecated and replaced by documents, which contains additional metadata of the retrieved documents. The docs field will be removed on 2024-03-01.
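During a deprecation window like this one, client code can read the new field name first and fall back to the old one. The sketch below shows that pattern with a hypothetical helper, read_field. The sample response dicts are made-up stand-ins whose values do not reflect the real response format.

```python
def read_field(result: dict, new_name: str, old_name: str):
    """Return the value under the new field name, falling back to the deprecated one."""
    if new_name in result:
        return result[new_name]
    return result.get(old_name)  # None if neither field is present


# Made-up stand-ins for parsed responses; the values are placeholders only.
new_style_response = {"documents": ["placeholder document"]}
old_style_response = {"docs": ["placeholder document"]}

print(read_field(new_style_response, "documents", "docs"))
print(read_field(old_style_response, "documents", "docs"))
```

The same helper would cover the other renames in this entry, for example read_field(result, "result_by_token_id", "ntd").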