I have the code below, which adds data to Elasticsearch:
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)
for doc in r:
    #es.indices.update(index="my-index_1", body=doc)
    # No id is passed, so Elasticsearch generates one for each document
    es.index(index="my-index_1", body=doc)
# Retrieve the data
es.search(index='my-index_1')['hits']['hits']
What I need to know is how to update the documents with this new data:
r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
Here Dr. Messi and Dr. Christiano have to be added to the index, while Dr. Bernard M. Aaron should not be updated, because it is already present in the index.
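In Elasticsearch, when you index data without providing a custom ID, Elasticsearch creates a new ID for every document you index.
So, because you are not supplying an ID, Elasticsearch generates one for you. But you also want to check whether a Name has already been indexed, and how you do that depends on how you index the data. There are two possible approaches:
1. Index the data without passing an _id for each document. Afterwards you have to search on the Name field to find out whether a document exists.
2. Index the data with your own _id for each document. Afterwards you look documents up by _id. This is the faster and easier approach.
I am going with the second approach and creating my own IDs. Since the lookup is on the Name field, I will use a hash of the Name value as the _id. I use md5 here, but any other hash function would work.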
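To make the ID scheme concrete, here is a minimal sketch of the hashing step on its own. The helper name make_doc_id is my own choice for illustration and is not part of any Elasticsearch API. It only needs the standard library, so you can try it without a running cluster.

```python
import hashlib


def make_doc_id(name: str) -> str:
    """Return a deterministic document id derived from a Name value.

    make_doc_id is a hypothetical helper name; the hashing itself is the
    same md5-of-Name approach described above.
    """
    # str.encode() defaults to UTF-8; md5 needs bytes, not str
    return hashlib.md5(name.encode()).hexdigest()


if __name__ == "__main__":
    # The same name always yields the same 32-character hex id,
    # which is what lets a later lookup by _id detect duplicates.
    print(make_doc_id("Dr. Bernard M. Aaron"))
    print(make_doc_id("Dr. Bernard M. Aaron") == make_doc_id("Dr. Bernard M. Aaron"))
```

Note that the hash is sensitive to every character, so "Dr. Messi" and "dr. messi" would get different IDs. If that matters for your data, you would have to normalise the name before hashing.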
First, index the data:
import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
index_name = "my-index_1"
es.indices.create(index=index_name, ignore=400)
for doc in r:
    # Use the md5 hash of Name as the id, so the same name always maps to the same document
    es.index(index=index_name, body=doc, id=hashlib.md5(doc['Name'].encode()).hexdigest())
Output (the hits returned by es.search(index=index_name)['hits']['hits']):
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}}]
Next step: index the new data
from elasticsearch import NotFoundError

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
for rec in r:
    doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
    try:
        # get() raises NotFoundError when no document has this id
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        # Only records that are not in the index yet get indexed
        es.index(index=index_name, body=rec, id=doc_id)
Output (the hits returned by es.search(index=index_name)['hits']['hits']):
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': 'e2e0f463145568471097ff027b18b40d',
'_score': 1.0,
'_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '23bb4f1a3a41efe7f4cab8a80d766708',
'_score': 1.0,
'_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]
As you can see, the Dr. Bernard M. Aaron record was not indexed again, because it already exists.
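If you do this in more than one place, the check-then-index logic can be wrapped in a small function. The following is only a sketch under a few assumptions: index_if_absent is a name I made up, es is assumed to be a 7.x-style elasticsearch.Elasticsearch client, and it uses es.exists() rather than es.get() so that no exception handling is needed. The function takes the client as a parameter, so the block itself does not import elasticsearch.

```python
import hashlib


def index_if_absent(es, index_name, records, key="Name"):
    """Index only the records whose hashed key is not in the index yet.

    Assumptions: `es` behaves like elasticsearch.Elasticsearch (7.x style,
    where the document is passed as `body=`), and `index_if_absent` is a
    hypothetical helper name, not a library function.
    Returns the list of records that were newly indexed.
    """
    added = []
    for rec in records:
        doc_id = hashlib.md5(rec[key].encode()).hexdigest()
        # exists() returns True/False without fetching the document body
        if es.exists(index=index_name, id=doc_id):
            continue
        es.index(index=index_name, id=doc_id, body=rec)
        added.append(rec)
    return added
```

Usage would look like added = index_if_absent(es, index_name, r), and for the data above it should return only the Dr. Messi and Dr. Christiano records. Be aware that checking and then indexing is two separate requests, so two writers running at the same time could both index the same name. As far as I know, passing op_type="create" to es.index() makes Elasticsearch itself reject an existing id with a conflict error, which may be the safer choice in that situation. Newer client versions also prefer document= over body=.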