Golang Elasticsearch: Bulk Update, Query, and Sending to MQ


Posted in Golang on April 19, 2022

Bulk update with update_by_query

POST post-v1_1-2021.02,post-v1_1-2021.03,post-v1_1-2021.04/_update_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "join_field": {
              "value": "post"
            }
          }
        },
        {
          "term": {
            "platform": {
              "value": "toutiao"
            }
          }
        },
        {
          "exists": {
            "field": "liked_count"
          }
        }
      ]
    }
  },
  "script":{
    "source":"ctx._source.liked_count=0",
    "lang":"painless"
  }
}
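
The same request can also be issued from Python. The following is a minimal sketch, assuming an elasticsearch-py 7.x client whose update_by_query method accepts index, body, and conflicts. The helper names build_reset_body and reset_field are hypothetical and not part of the original example; es stands for any already-configured client instance.

```python
def build_reset_body(platform, field, value=0):
    """Build an update_by_query body that sets `field` to `value` on every
    post document of the given platform that already has the field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"join_field": {"value": "post"}}},
                    {"term": {"platform": {"value": platform}}},
                    {"exists": {"field": field}},
                ]
            }
        },
        "script": {
            # Pass the field name and value as params instead of
            # formatting them into the script source.
            "source": "ctx._source[params.field] = params.value",
            "lang": "painless",
            "params": {"field": field, "value": value},
        },
    }


def reset_field(es, indices, platform, field):
    """Run the update across several indices (hypothetical helper)."""
    return es.update_by_query(
        index=",".join(indices),
        body=build_reset_body(platform, field),
        conflicts="proceed",  # assumption: skip version conflicts rather than abort
    )
```

For the request above, the call would look like reset_field(es, ["post-v1_1-2021.02", "post-v1_1-2021.03", "post-v1_1-2021.04"], "toutiao", "liked_count").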

Adding fields to an index

PUT user_tiktok/_doc/_mapping?include_type_name=true
{
  "properties": {
    "post_signature": {
      "properties": {
        "StuClass": {
          "type": "keyword"
        },
        "post_token": {
          "type": "keyword"
        }
      }
    }
  }
}
PUT user_toutiao/_mapping
{
  "properties": {
    "user_token": {
      "type": "text"
    }
  }
}
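
From Python, the mapping update could be sketched as follows. This assumes an elasticsearch-py 7.x client, where es.indices.put_mapping accepts index and body. The helper names keyword_fields_mapping and add_keyword_fields are hypothetical.

```python
def keyword_fields_mapping(field_names):
    """Build a put-mapping body that declares each name as a keyword field."""
    return {"properties": {name: {"type": "keyword"} for name in field_names}}


def add_keyword_fields(es, index, field_names):
    """Add new fields to an existing index (hypothetical helper)."""
    return es.indices.put_mapping(
        index=index,
        body=keyword_fields_mapping(field_names),
    )
```

Note that keyword suits exact-match values such as tokens, while text is analyzed for full-text search. The put-mapping API can add new fields, but it cannot change the type of a field that already exists.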

Querying ES and sending to MQ

from celery import Celery
from elasticsearch import Elasticsearch
import logging
import arrow
import pytz
from elasticsearch.helpers import scan, streaming_bulk
import redis
pool_16_8 = redis.ConnectionPool(host='10.0.3.100', port=6379, db=8, password='EfcHGSzKqg6cfzWq')
rds_16_8 = redis.StrictRedis(connection_pool=pool_16_8)
logger = logging.getLogger('elasticsearch')
logger.disabled = False
logger.setLevel(logging.INFO)
es_zoo_connection = Elasticsearch('http://eswriter:<es-password>@<es-ip>:4000', dead_timeout=10,
                                  retry_on_timeout=True)
logger = logging.getLogger(__name__)
class ES(object):
    index = None
    doc_type = None
    id_field = '_id'
    version = ''
    source_id_field = ''
    aliase_field = ''
    separator = '-'
    aliase_func = None
    es = None
    tz = pytz.timezone('Asia/Shanghai')
    logger = logger
    @classmethod
    def mget(cls, ids=None, index=None, **kwargs):
        index = index or cls.index
        docs = cls.es.mget(body={'ids': ids}, doc_type=cls.doc_type, index=index, **kwargs)
        return docs
    @classmethod
    def count(cls, query=None, index=None, **kwargs):
        index = index or cls.index
        c = cls.es.count(doc_type=cls.doc_type, body=query, index=index, **kwargs)
        return c.get('count', 0)
    @classmethod
    def upsert(cls, doc, doc_id=None, index=None, doc_as_upsert=True, **kwargs):
        body = {
            "doc": doc,
        }
        if doc_as_upsert:
            body['doc_as_upsert'] = True
        id = doc_id or cls.id_name(doc)
        index = index or cls.index_name(doc)
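        # Use keyword arguments: the positional order of update() differs between client versions
        cls.es.update(index=index, id=id, doc_type=cls.doc_type, body=body, **kwargs)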
    @classmethod
    def search(cls, index=None, query=None, **kwargs):
        index = index or cls.index
        return cls.es.search(index=index, body=query, **kwargs)
    @classmethod
    def scan(cls, query, index=None, **kwargs):
        return scan(cls.es,
                    query=query,
                    index=index or cls.index,
                    **kwargs)
    @classmethod
    def index_name(cls, doc):
        if cls.aliase_field and cls.aliase_field in doc.keys():
            aliase_part = doc[cls.aliase_field]
            if isinstance(aliase_part, str):
                aliase_part = arrow.get(aliase_part)
            if isinstance(aliase_part, int):
                aliase_part = arrow.get(aliase_part).astimezone(cls.tz)
            if cls.version:
                index = '{}{}{}{}{}'.format(cls.index, cls.separator, cls.version, cls.separator,
                                            cls.aliase_func(aliase_part))
            else:
                index = '{}{}{}'.format(cls.index, cls.separator, cls.aliase_func(aliase_part))
        else:
            index = cls.index
        return index
    @classmethod
    def id_name(cls, doc):
        id = doc.get(cls.id_field) and doc.pop(cls.id_field) or doc.get(cls.source_id_field)
        if not id:
            print('========', doc)
        assert id, 'doc _id must not be None'
        return id
    @classmethod
    def bulk_upsert(cls, docs, **kwargs):
        """
        Bulk-write documents; only the index and update operations are supported.
        """
        op_type = kwargs.get('op_type') or 'update'
        # Fall back to streaming_bulk's default of 500 when chunk_size is not given
        chunk_size = kwargs.get('chunk_size') or 500
        if op_type == 'update':
            upsert = kwargs.get('upsert', True)
            if upsert is None:
                upsert = True
        else:
            upsert = False
        actions = cls._gen_bulk_actions(docs, cls.index_name, cls.doc_type, cls.id_name, op_type, upsert=upsert)
        result = streaming_bulk(cls.es, actions, chunk_size=chunk_size, raise_on_error=False, raise_on_exception=False,
                                max_retries=5, request_timeout=25)
        return result
    @classmethod
    def _gen_bulk_actions(cls, docs, index_name, doc_type, id_name, op_type, upsert=True, **kwargs):
        assert not upsert or (upsert and op_type == 'update'), 'upsert should use "update" as op_type'
        for doc in docs:
            # index_name may also be a factory function that takes the doc
            if callable(index_name):
                index = index_name(doc)
            else:
                index = index_name
            if op_type == 'index':
                _source = doc
            elif op_type == 'update' and not upsert:
                _source = {'doc': doc}
            elif op_type == 'update' and upsert:
                _source = {'doc': doc, 'doc_as_upsert': True}
            else:
                continue
            if callable(id_name):
                id = id_name(doc)
            else:
                id = id_name
            # Build the bulk action
            action = {
                "_op_type": op_type,
                "_index": index,
                "_type": doc_type,
                "_id": id,
                "_source": _source
            }
            yield action
class tiktokEsUser(ES):
    index = 'user_tiktok'
    doc_type = '_doc'
    id_field = '_id'
    source_id_field = 'user_id'
    es = es_zoo_connection
from kombu import Exchange, Queue, binding
def data_es_route_task_spider(name, args, kwargs, options, task=None, **kw):
    return {
        'exchange': 'tiktok',
        'exchange_type': 'topic',
        'routing_key': name
    }
class DataEsConfig_download(object):
    broker_url = 'amqp://<user>:<password>@<ip>:<port>/'
    task_ignore_result = True
    task_serializer = 'json'
    accept_content = ['json']
    task_default_queue = 'default'
    task_default_exchange = 'default'
    task_default_routing_key = 'default'
    exchange = Exchange('tiktok', type='topic')
    task_queues = [
        Queue(
            'tiktok.user_avatar.download',
            [binding(exchange, routing_key='tiktok.user_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.download',
            [binding(exchange, routing_key='tiktok.post_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.spider',
            [binding(exchange, routing_key='tiktok.post.spider')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.save',
            [binding(exchange, routing_key='tiktok.post.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user.save',
            [binding(exchange, routing_key='tiktok.user.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.invalid',
            [binding(exchange, routing_key='tiktok.post_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user_avatar.invalid',
            [binding(exchange, routing_key='tiktok.user_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.comment.save',
            [binding(exchange, routing_key='tiktok.comment.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
    ]
    task_routes = (data_es_route_task_spider,)
    enable_utc = True
    timezone = "Asia/Shanghai"
# Celery app for the download tasks
tiktok_app = Celery(
    'tiktok',
    include=[
        'task.tasks',
    ]
)
tiktok_app.config_from_object(DataEsConfig_download)
# Task producer: refreshes historical user information for public-opinion monitoring
def send_post():
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["region", "sec_uid", "post_signature"]
    }
    # query = {
    #     "query": {
    #         "bool": {
    #             "must": [
    #                 {"exists": {
    #                     "field": "post_signature"
    #                 }},
    #                 {
    #                     "match": {
    #                         "region": "MY"
    #                     }
    #                 }
    #             ]
    #         }
    #     },
    #     "_source": ["region", "sec_uid", "post_signature"]
    # }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.post.spider', args=(item,))
def send_sign_token():
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    },
                    {
                        "range": {
                            "create_time": {
                                "gte": "2021-01-06T00:00:00",
                                "lte": "2021-01-06T01:00:00"
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["user_id", "sec_uid"]
    }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.user.sign_token', args=(item,))
if __name__ == '__main__':
    send_post()
    # send_sign_token()
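
One detail of the ES class above is easy to miss: bulk_upsert returns the generator produced by streaming_bulk, so nothing is sent to Elasticsearch until that generator is iterated. The sketch below shows one way to drain it, assuming the generator yields (ok, item) tuples as streaming_bulk does. The helper name summarize_bulk is hypothetical and not part of the original script.

```python
def summarize_bulk(results):
    """Drain a stream of (ok, item) tuples and collect the failed items.

    `results` is expected to be what elasticsearch.helpers.streaming_bulk
    yields; iterating it is what actually triggers the bulk requests.
    """
    succeeded = 0
    failed = []
    for ok, item in results:
        if ok:
            succeeded += 1
        else:
            # With raise_on_error=False, failures are yielded instead of raised
            failed.append(item)
    return succeeded, failed
```

A caller might write succeeded, failed = summarize_bulk(tiktokEsUser.bulk_upsert(docs)) and then log or retry whatever ends up in failed.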

That covers this example of bulk updating and querying Elasticsearch and sending tasks to MQ.
