Golang Elasticsearch: Bulk Update, Query, and Send to MQ


Posted in Golang on April 19, 2022

Bulk updates with update_by_query

POST post-v1_1-2021.02,post-v1_1-2021.03,post-v1_1-2021.04/_update_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "join_field": {
              "value": "post"
            }
          }
        },
        {
          "term": {
            "platform": {
              "value": "toutiao"
            }
          }
        },
        {
          "exists": {
            "field": "liked_count"
          }
        }
      ]
    }
  },
  "script":{
    "source":"ctx._source.liked_count=0",
    "lang":"painless"
  }
}
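The same request can be assembled in Python before it is sent. The following is a minimal sketch, not the author's code: the helper name `build_reset_request` and its parameters are hypothetical, and the index naming simply mirrors the `post-v1_1-YYYY.MM` pattern in the request above.

```python
import json


def build_reset_request(months, platform, field, version="v1_1"):
    """Return (index_expression, body) for an _update_by_query call."""
    # Comma-joined index names, e.g. post-v1_1-2021.02,post-v1_1-2021.03
    index = ",".join("post-{}-{}".format(version, m) for m in months)
    body = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"join_field": {"value": "post"}}},
                    {"term": {"platform": {"value": platform}}},
                    {"exists": {"field": field}},
                ]
            }
        },
        "script": {
            # Painless script that overwrites the field with 0
            "source": "ctx._source.{}=0".format(field),
            "lang": "painless",
        },
    }
    return index, body


index, body = build_reset_request(["2021.02", "2021.03", "2021.04"], "toutiao", "liked_count")
print(index)
print(json.dumps(body["script"]))
# With an elasticsearch-py client `es`, the request could then be sent as:
# es.update_by_query(index=index, body=body)
```

Building the body in one function keeps the list of months, the platform, and the field in a single place, which helps when the same reset has to be repeated for other indices.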

Adding fields to an index

PUT user_tiktok/_doc/_mapping?include_type_name=true
{
  "properties": {
    "post_signature": {
      "properties": {
        "StuClass": {
          "type": "keyword"
        },
        "post_token": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT user_toutiao/_mapping
{
  "properties": {
    "user_token": {
      "type": "text"
    }
  }
}
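Elasticsearch expects field definitions in a put-mapping body to sit under a "properties" key, and the sub-fields of an object field under that field's own "properties". As a rough sketch, the body can be generated from dotted field paths; the helper `build_mapping` is hypothetical and assumes no path is both a leaf and a parent.

```python
import json


def build_mapping(fields):
    """Turn {"a.b": "keyword"} style paths into a put-mapping body."""
    root = {"properties": {}}
    for path, es_type in fields.items():
        node = root
        parts = path.split(".")
        for part in parts[:-1]:
            # Intermediate segments become object fields with their own "properties"
            node = node["properties"].setdefault(part, {"properties": {}})
        node["properties"][parts[-1]] = {"type": es_type}
    return root


body = build_mapping({
    "post_signature.StuClass": "keyword",
    "post_signature.post_token": "keyword",
})
print(json.dumps(body, indent=2))
# With an elasticsearch-py client `es`, this could be applied as:
# es.indices.put_mapping(index="user_tiktok", body=body)
```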

Querying ES and sending to MQ

import logging

import arrow
import pytz
import redis
from celery import Celery
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, streaming_bulk

# Redis connection pool (not used further in this script)
pool_16_8 = redis.ConnectionPool(host='10.0.3.100', port=6379, db=8, password='EfcHGSzKqg6cfzWq')
rds_16_8 = redis.StrictRedis(connection_pool=pool_16_8)

# Enable INFO-level logging for the elasticsearch client
logger = logging.getLogger('elasticsearch')
logger.disabled = False
logger.setLevel(logging.INFO)

# Replace <es-password> and <es-ip> with real values
es_zoo_connection = Elasticsearch('http://eswriter:<es-password>@<es-ip>:4000', dead_timeout=10,
                                  retry_on_timeout=True)

# Module-level logger used by the ES helper class below
logger = logging.getLogger(__name__)


# Base helper: class-level settings plus thin wrappers around the Elasticsearch client
class ES(object):
    index = None
    doc_type = None
    id_field = '_id'
    version = ''
    source_id_field = ''
    aliase_field = ''
    separator = '-'
    aliase_func = None
    es = None
    tz = pytz.timezone('Asia/Shanghai')
    logger = logger
    @classmethod
    def mget(cls, ids=None, index=None, **kwargs):
        index = index or cls.index
        docs = cls.es.mget(body={'ids': ids}, doc_type=cls.doc_type, index=index, **kwargs)
        return docs
    @classmethod
    def count(cls, query=None, index=None, **kwargs):
        index = index or cls.index
        c = cls.es.count(doc_type=cls.doc_type, body=query, index=index, **kwargs)
        return c.get('count', 0)
    @classmethod
    def upsert(cls, doc, doc_id=None, index=None, doc_as_upsert=True, **kwargs):
        body = {
            "doc": doc,
        }
        if doc_as_upsert:
            body['doc_as_upsert'] = True
        id = doc_id or cls.id_name(doc)
        index = index or cls.index_name(doc)
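        # Keyword arguments avoid depending on the client's positional parameter order
        cls.es.update(index=index, id=id, doc_type=cls.doc_type, body=body, **kwargs)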
    @classmethod
    def search(cls, index=None, query=None, **kwargs):
        index = index or cls.index
        return cls.es.search(index=index, body=query, **kwargs)
    @classmethod
    def scan(cls, query, index=None, **kwargs):
        return scan(cls.es,
                    query=query,
                    index=index or cls.index,
                    **kwargs)
    @classmethod
    def index_name(cls, doc):
        if cls.aliase_field and cls.aliase_field in doc.keys():
            aliase_part = doc[cls.aliase_field]
            if isinstance(aliase_part, str):
                aliase_part = arrow.get(aliase_part)
            if isinstance(aliase_part, int):
                aliase_part = arrow.get(aliase_part).astimezone(cls.tz)
            if cls.version:
                index = '{}{}{}{}{}'.format(cls.index, cls.separator, cls.version, cls.separator,
                                            cls.aliase_func(aliase_part))
            else:
                index = '{}{}{}'.format(cls.index, cls.separator, cls.aliase_func(aliase_part))
        else:
            index = cls.index
        return index
    @classmethod
    def id_name(cls, doc):
        id = doc.get(cls.id_field) and doc.pop(cls.id_field) or doc.get(cls.source_id_field)
        if not id:
            print('========', doc)
        assert id, 'doc _id must not be None'
        return id
    @classmethod
    def bulk_upsert(cls, docs, **kwargs):
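        """
        Bulk-operate on posts; only the index and update op types are supported.
        """
        op_type = kwargs.get('op_type') or 'update'
        # Fall back to streaming_bulk's default of 500 when no chunk_size is given
        chunk_size = kwargs.get('chunk_size') or 500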
        if op_type == 'update':
            upsert = kwargs.get('upsert', True)
            if upsert is None:
                upsert = True
        else:
            upsert = False
        actions = cls._gen_bulk_actions(docs, cls.index_name, cls.doc_type, cls.id_name, op_type, upsert=upsert)
        result = streaming_bulk(cls.es, actions, chunk_size=chunk_size, raise_on_error=False, raise_on_exception=False,
                                max_retries=5, request_timeout=25)
        return result
    @classmethod
    def _gen_bulk_actions(cls, docs, index_name, doc_type, id_name, op_type, upsert=True, **kwargs):
        assert not upsert or (upsert and op_type == 'update'), 'upsert should use "update" as op_type'
        for doc in docs:
            # index_name may also be a factory function
            if callable(index_name):
                index = index_name(doc)
            else:
                index = index_name
            if op_type == 'index':
                _source = doc
            elif op_type == 'update' and not upsert:
                _source = {'doc': doc}
            elif op_type == 'update' and upsert:
                _source = {'doc': doc, 'doc_as_upsert': True}
            else:
                continue
            if callable(id_name):
                id = id_name(doc)
            else:
                id = id_name
            # Build the bulk action
            action = {
                "_op_type": op_type,
                "_index": index,
                "_type": doc_type,
                "_id": id,
                "_source": _source
            }
            yield action


# ES helper bound to the user_tiktok index
class tiktokEsUser(ES):
    index = 'user_tiktok'
    doc_type = '_doc'
    id_field = '_id'
    source_id_field = 'user_id'
    es = es_zoo_connection


from kombu import Exchange, Queue, binding


# Route every task to the 'tiktok' topic exchange, using the task name as the routing key
def data_es_route_task_spider(name, args, kwargs, options, task=None, **kw):
    return {
        'exchange': 'tiktok',
        'exchange_type': 'topic',
        'routing_key': name
    }
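

# Celery configuration: broker, serialization, and the queues bound to the 'tiktok' topic exchange
class DataEsConfig_download(object):
    broker_url = 'amqp://<user>:<password>@<ip>:<port>/'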
    task_ignore_result = True
    task_serializer = 'json'
    accept_content = ['json']
    task_default_queue = 'default'
    task_default_exchange = 'default'
    task_default_routing_key = 'default'
    exchange = Exchange('tiktok', type='topic')
    task_queues = [
        Queue(
            'tiktok.user_avatar.download',
            [binding(exchange, routing_key='tiktok.user_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.download',
            [binding(exchange, routing_key='tiktok.post_avatar.download')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.spider',
            [binding(exchange, routing_key='tiktok.post.spider')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post.save',
            [binding(exchange, routing_key='tiktok.post.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user.save',
            [binding(exchange, routing_key='tiktok.user.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.post_avatar.invalid',
            [binding(exchange, routing_key='tiktok.post_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.user_avatar.invalid',
            [binding(exchange, routing_key='tiktok.user_avatar.invalid')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
        Queue(
            'tiktok.comment.save',
            [binding(exchange, routing_key='tiktok.comment.save')],
            queue_arguments={'x-queue-mode': 'lazy'}
        ),
    ]
    task_routes = (data_es_route_task_spider,)
    enable_utc = True
    timezone = "Asia/Shanghai"


# Celery app for download tasks
tiktok_app = Celery(
    'tiktok',
    include=[
        'task.tasks',
    ]
)
tiktok_app.config_from_object(DataEsConfig_download)


# Task producer: refresh historical info for public-opinion users
def send_post():
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["region", "sec_uid", "post_signature"]
    }
    # query = {
    #     "query": {
    #         "bool": {
    #             "must": [
    #                 {"exists": {
    #                     "field": "post_signature"
    #                 }},
    #                 {
    #                     "match": {
    #                         "region": "MY"
    #                     }
    #                 }
    #             ]
    #         }
    #     },
    #     "_source": ["region", "sec_uid", "post_signature"]
    # }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.post.spider', args=(item,))
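

# Task producer for sign-token jobs over a one-hour create_time window.
# No queue in task_queues above is bound to 'tiktok.user.sign_token', so one must exist elsewhere.
def send_sign_token():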
    query = {
        "query": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "post_signature"
                        }
                    },
                    {
                        "range": {
                            "following_num": {
                                "gte": 1000
                            }
                        }
                    },
                    {
                        "range": {
                            "create_time": {
                                "gte": "2021-01-06T00:00:00",
                                "lte": "2021-01-06T01:00:00"
                            }
                        }
                    }
                ]
            }
        },
        "_source": ["user_id", "sec_uid"]
    }
    r = tiktokEsUser.scan(query=query, scroll='30m', request_timeout=100)
    for item in map(lambda x: x['_source'], r):
        tiktok_app.send_task('tiktok.user.sign_token', args=(item,))


if __name__ == '__main__':
    send_post()
    # send_sign_token()
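
The two ideas behind `index_name` and `_gen_bulk_actions` are a time-partitioned index name and one bulk action per document. The sketch below shows them in isolation using only the standard library. It is an illustration under assumptions rather than the author's code: the function names, the sample document, its field names, and the fixed UTC+8 offset standing in for Asia/Shanghai are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # fixed UTC+8 offset, standing in for Asia/Shanghai


def monthly_index(base, version, ts):
    """Build a name such as post-v1_1-2021.02 from an epoch timestamp."""
    month = datetime.fromtimestamp(ts, TZ).strftime("%Y.%m")
    # Skip the version segment when it is empty, as index_name does
    return "-".join(part for part in (base, version, month) if part)


def gen_actions(docs, base, version, time_field, id_field):
    """Yield one bulk 'update' action per document, with doc_as_upsert set."""
    for doc in docs:
        yield {
            "_op_type": "update",
            "_index": monthly_index(base, version, doc[time_field]),
            "_id": doc[id_field],
            "_source": {"doc": doc, "doc_as_upsert": True},
        }


# Hypothetical sample document; 1612137600 is 2021-02-01 00:00:00 UTC
docs = [{"post_id": "p1", "created": 1612137600, "liked_count": 3}]
actions = list(gen_actions(docs, "post", "v1_1", "created", "post_id"))
print(actions[0]["_index"])
```

The action dictionaries mirror the shape that `_gen_bulk_actions` yields in the script above; how the bulk helper interprets `_source` on update actions can differ between elasticsearch-py versions, so it is worth checking against the installed client.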

That is the full example of bulk updating and querying Elasticsearch and sending the results to MQ.
