原文地址 https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_data.html
样例数据集
如果你是从开始看到现在,那么对Elasticsearch应该有了最基础的了解。如果没有,建议从头读一遍。接下来工作将在接近实际环境的数据集上展开。这里准备了一份用户银行账号信息的虚构JSON文档。每个文档都有如下结构:
{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
}
你也许会感到奇怪,这些值和语法看起来很怪异,实际上它们是在www.json-generator.com上生成的,这些全都是随机值。
载入样例数据集
你可以从这里下载样例数据集。把它当到当前目录,然后按照下面的命令把它加载到簇中:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'
结果如下:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer aHuZLSakQNGcFgTMiyIXaA 5 1 2 0 7.4kb 7.4kb
yellow open bank 4aK4OrQOSWexaj54oCE4ag 5 1 1000 0 640.2kb 640.2kb
这意味着我们刚刚将批量的为1000个文档放倒bank索引中,类型为account。
检索API
接下来我们开始一些简单的检索。有很多基本的方式去进行检索:一种是通过REST请求URI进行参数传递,另一种是通过REST请求消息体。消息体的方式更具有表现力更强,可以用可读性更好的JSON格式定义检索条件。URI传参只做一个演示,以后的讲解仅使用请求消息体的方式。
检索API可以通过_search路径进行访问。下面的例子会返回bank索引中所有的文档:
curl -XGET 'localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty&pretty'
我们先来分析下检索所有。我们检索bank索引(_search
路径),q=*
参数告诉Elasticsearch去匹配索引中所有的文档。sort=account_number:asc
参数控制检索结果使用account_number
属性进行升序排列。pretty
参数告诉Elasticsearch以格式化好的JSON形式返回结果。
响应如下:
{
"took" : 88,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"_score" : null,
"_source" : {
"account_number" : 0,
"balance" : 16623,
"firstname" : "Bradshaw",
"lastname" : "Mckenzie",
"age" : 29,
"gender" : "F",
"address" : "244 Columbus Place",
"employer" : "Euron",
"email" : "bradshawmckenzie@euron.com",
"city" : "Hobucken",
"state" : "CO"
},
"sort" : [
0
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : null,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
},
"sort" : [
1
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "2",
"_score" : null,
"_source" : {
"account_number" : 2,
"balance" : 28838,
"firstname" : "Roberta",
"lastname" : "Bender",
"age" : 22,
"gender" : "F",
"address" : "560 Kingsway Place",
"employer" : "Chillium",
"email" : "robertabender@chillium.com",
"city" : "Bennett",
"state" : "LA"
},
"sort" : [
2
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "3",
"_score" : null,
"_source" : {
"account_number" : 3,
"balance" : 44947,
"firstname" : "Levine",
"lastname" : "Burks",
"age" : 26,
"gender" : "F",
"address" : "328 Wilson Avenue",
"employer" : "Amtap",
"email" : "levineburks@amtap.com",
"city" : "Cochranville",
"state" : "HI"
},
"sort" : [
3
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "4",
"_score" : null,
"_source" : {
"account_number" : 4,
"balance" : 27658,
"firstname" : "Rodriquez",
"lastname" : "Flores",
"age" : 31,
"gender" : "F",
"address" : "986 Wyckoff Avenue",
"employer" : "Tourmania",
"email" : "rodriquezflores@tourmania.com",
"city" : "Eastvale",
"state" : "HI"
},
"sort" : [
4
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "5",
"_score" : null,
"_source" : {
"account_number" : 5,
"balance" : 29342,
"firstname" : "Leola",
"lastname" : "Stewart",
"age" : 30,
"gender" : "F",
"address" : "311 Elm Place",
"employer" : "Diginetic",
"email" : "leolastewart@diginetic.com",
"city" : "Fairview",
"state" : "NJ"
},
"sort" : [
5
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : null,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
},
"sort" : [
6
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "7",
"_score" : null,
"_source" : {
"account_number" : 7,
"balance" : 39121,
"firstname" : "Levy",
"lastname" : "Richard",
"age" : 22,
"gender" : "M",
"address" : "820 Logan Street",
"employer" : "Teraprene",
"email" : "levyrichard@teraprene.com",
"city" : "Shrewsbury",
"state" : "MO"
},
"sort" : [
7
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "8",
"_score" : null,
"_source" : {
"account_number" : 8,
"balance" : 48868,
"firstname" : "Jan",
"lastname" : "Burns",
"age" : 35,
"gender" : "M",
"address" : "699 Visitation Place",
"employer" : "Glasstep",
"email" : "janburns@glasstep.com",
"city" : "Wakulla",
"state" : "AZ"
},
"sort" : [
8
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "9",
"_score" : null,
"_source" : {
"account_number" : 9,
"balance" : 24776,
"firstname" : "Opal",
"lastname" : "Meadows",
"age" : 39,
"gender" : "M",
"address" : "963 Neptune Avenue",
"employer" : "Cedward",
"email" : "opalmeadows@cedward.com",
"city" : "Olney",
"state" : "OH"
},
"sort" : [
9
]
}
]
}
}
结果主要有以下几个部分:
took
- Elasticsearch执行检索花费的时间(毫秒)timed_out
- 表示检索是否超时_shards
- 表示检索了多少瓦片,同时包含成功/失败的次数hits
- 检索的结果hits.total
- 符合检索条件的文档总数hits.hits
- 实际检索结果的数组(默认显示前10条)sort
- 结果排序关键字(不写就会使用score)_score
和max_score
- 暂时先不深究
下面是使用请求体传参与上面检索条件完全相同的命令:
curl -XGET 'localhost:9200/bank/_search?pretty' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}'
主要的不同之处是不再URI中使用q=*
,而是POST一个JSON形式的请求体。下一节会讨论JSON检索的使用方式。
一旦获取到检索结果,Elasticsearch就完成了请求的过程,不会在服务端维持任何资源或者在结果中有任何的开始标记。这与许多其他诸如SQL等平台有显著的不同,那些平台如果开始获取预先查询结果的一个子集,就必须继续向服务器发请求,通过某种服务端的状态标注来获取剩余的结果。