[DataSophon] Integrating Zeppelin into DataSophon 1.2.1 and Configuring the Hive, Trino, and Spark Interpreters

2024/10/9 12:04:15

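Contents

1. Introduction to Zeppelin

2. Implementation Steps

2.1 Downloading the Zeppelin Package

2.2 Worker Configuration Files

3. Configuring Common Interpreters

3.1 Configuring the Hive Interpreter

3.2 Configuring the Trino Interpreter

3.3 Configuring the Spark Interpreter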


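1. Introduction to Zeppelin


Zeppelin is an open-source project under the Apache Software Foundation. It is a web-based notebook that provides a framework for data visualization. Its backend can connect to multiple data engines, such as JDBC, Spark, and Hive, and it supports interactive data analysis in several languages, such as Scala, SQL, and Python. This article covers Zeppelin in two parts: installation and usage.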


2. Implementation Steps


Official installation guide: datasophon/docs/zh/datasophon集成zeppelin.md (dev branch of datavane/datasophon on GitHub)

2.1 Downloading the Zeppelin Package


https://dlcdn.apache.org/zeppelin/zeppelin-0.10.1/zeppelin-0.10.1-bin-all.tgz

# Create the directory that holds the service configuration
mkdir -p /opt/datasophon/DDP/packages/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/ZEPPELIN

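Prepare the JSON configuration file and the start/stop script. The start/stop script must also be copied into the root directory of the installation package.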

control_zeppelin.sh

#!/bin/bash

SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
ZEPPELIN_HOME=$SCRIPT_DIR  # The script's own directory is the Zeppelin installation directory
ZEPPELIN_DAEMON=$ZEPPELIN_HOME/bin/zeppelin-daemon.sh

start_zeppelin() {
    echo "Starting Zeppelin..."
    $ZEPPELIN_DAEMON start
}

stop_zeppelin() {
    echo "Stopping Zeppelin..."
    $ZEPPELIN_DAEMON stop
}

check_zeppelin_status() {
    echo "Checking Zeppelin status..."
    $ZEPPELIN_DAEMON status
    if [ $? -eq 0 ]; then
        echo "Zeppelin is running."
        exit 0
    else
        echo "Zeppelin is not running."
        exit 1
    fi
}

case "$1" in
    start)
        start_zeppelin
        ;;
    stop)
        stop_zeppelin
        ;;
    restart)
        stop_zeppelin
        sleep 5  # Wait a moment so that Zeppelin has stopped completely
        start_zeppelin
        ;;
    status)
        check_zeppelin_status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac

service_ddl.json

{
  "name": "ZEPPELIN",
  "label": "ZEPPELIN",
  "description": "Interactive data analysis notebook",
  "version": "0.10.1",
  "sortNum": 1,
  "dependencies": [],
  "packageName": "zeppelin-0.10.1.tar.gz",
  "decompressPackageName": "zeppelin-0.10.1",
  "roles": [
    {
      "name": "ZeppelinServer",
      "label": "ZeppelinServer",
      "roleType": "master",
      "cardinality": "1+",
      "runAs": {},
      "logFile": "logs/zeppelin-root-${host}.log",
      "startRunner": {
        "timeout": "60",
        "program": "control_zeppelin.sh",
        "args": [
          "start"
        ]
      },
      "stopRunner": {
        "timeout": "600",
        "program": "control_zeppelin.sh",
        "args": [
          "stop"
        ]
      },
      "statusRunner": {
        "timeout": "60",
        "program": "control_zeppelin.sh",
        "args": [
          "status"
        ]
      },
      "externalLink": {
        "name": "ZeppelinServer UI",
        "label": "ZeppelinServer UI",
        "url": "http://${host}:8889"
      }
    }
  ],
  "configWriter": {
    "generators": [
      {
        "filename": "zeppelin-env.sh",
        "configFormat": "custom",
        "outputDirectory": "conf",
        "templateName": "zeppelin-env.ftl",
        "includeParams": [
          "custom.zeppelin.env"
        ]
      },
      {
        "filename": "zeppelin-site.xml",
        "configFormat": "custom",
        "outputDirectory": "conf",
        "templateName": "zeppelin-site.ftl",
        "includeParams": [
          "jobmanagerEnable",
          "custom.zeppelin.site.xml"
        ]
      }
    ]
  },
  "parameters": [
    {
      "name": "jobmanagerEnable",
      "label": "jobmanagerEnable",
      "description": "The Job tab in zeppelin page seems not so useful instead it cost lots of memory and affect the performance.Disable it can save lots of memory",
      "configType": "map",
      "required": true,
      "type": "switch",
      "value": true,
      "configurableInWizard": true,
      "hidden": false,
      "defaultValue": true
    },
    {
      "name": "custom.zeppelin.env",
      "label": "Custom configuration: zeppelin-env.sh",
      "description": "Custom configuration",
      "configType": "custom",
      "required": false,
      "type": "multipleWithKey",
      "value": [{"HADOOP_CONF_DIR":"${HADOOP_CONF_DIR}"}],
      "configurableInWizard": true,
      "hidden": false,
      "defaultValue": [{"HADOOP_CONF_DIR":"${HADOOP_CONF_DIR}"}]
    },
    {
      "name": "custom.zeppelin.site.xml",
      "label": "Custom configuration: zeppelin-site.xml",
      "description": "Custom configuration",
      "configType": "custom",
      "required": false,
      "type": "multipleWithKey",
      "value": [],
      "configurableInWizard": true,
      "hidden": false,
      "defaultValue": ""
    }
  ]
}
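
The download above is named zeppelin-0.10.1-bin-all.tgz, while service_ddl.json expects a package called zeppelin-0.10.1.tar.gz that unpacks to zeppelin-0.10.1, so the archive presumably has to be repackaged with control_zeppelin.sh in its root. The following is a minimal sketch of that step, not the official procedure. The function name repackage_zeppelin, the .md5 companion file, and the target directory shown in the usage comment are assumptions that do not come from this article.

```shell
#!/bin/bash
# Sketch: rebuild the Zeppelin archive so that it matches service_ddl.json.
# Arguments: <downloaded tgz> <control_zeppelin.sh> <output directory>
repackage_zeppelin() {
    local src_tgz="$1" control_sh="$2" out_dir="$3"
    local name="zeppelin-0.10.1"   # must equal decompressPackageName
    local work
    work=$(mktemp -d) || return 1

    mkdir -p "$work/$name" "$out_dir" &&
    # Drop the archive's own top-level directory (zeppelin-0.10.1-bin-all)
    tar -xzf "$src_tgz" -C "$work/$name" --strip-components=1 &&
    # The start/stop script goes into the package root
    cp "$control_sh" "$work/$name/control_zeppelin.sh" &&
    chmod +x "$work/$name/control_zeppelin.sh" &&
    tar -czf "$out_dir/$name.tar.gz" -C "$work" "$name" &&
    # Assumption: a checksum file is expected next to the package
    md5sum "$out_dir/$name.tar.gz" | awk '{print $1}' > "$out_dir/$name.tar.gz.md5"
    local rc=$?

    rm -rf "$work"
    return "$rc"
}

# Hypothetical usage (the output directory is an assumption):
# repackage_zeppelin zeppelin-0.10.1-bin-all.tgz control_zeppelin.sh /opt/datasophon/DDP/packages
```

Stripping the top-level directory during extraction avoids a separate rename step and keeps the result independent of the name used inside the downloaded archive.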

2.2 Worker Configuration Files


On the worker side, two template files need to be prepared: zeppelin-env.ftl and zeppelin-site.ftl.

zeppelin-env.ftl

#!/bin/bash

export ZEPPELIN_ADDR=0.0.0.0
export ZEPPELIN_PORT=8889
parent_dir=$(dirname "$(cd "$(dirname "$0")" && pwd)")
export JAVA_HOME=$parent_dir/jdk1.8.0_333

<#list itemList as item>
export ${item.name}=${item.value}
</#list>

zeppelin-site.ftl

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<configuration>

<property>
  <name>zeppelin.server.addr</name>
  <value>127.0.0.1</value>
  <description>Server binding address</description>
</property>

<property>
  <name>zeppelin.server.port</name>
  <value>8080</value>
  <description>Server port.</description>
</property>

<property>
  <name>zeppelin.cluster.addr</name>
  <value></value>
  <description>Server cluster address, eg. 127.0.0.1:6000,127.0.0.2:6000,127.0.0.3:6000</description>
</property>

<property>
  <name>zeppelin.server.ssl.port</name>
  <value>8443</value>
  <description>Server ssl port. (used when ssl property is set to true)</description>
</property>

<property>
  <name>zeppelin.server.context.path</name>
  <value>/</value>
  <description>Context Path of the Web Application</description>
</property>

<property>
  <name>zeppelin.war.tempdir</name>
  <value>webapps</value>
  <description>Location of jetty temporary directory</description>
</property>

<property>
  <name>zeppelin.notebook.dir</name>
  <value>notebook</value>
  <description>path or URI for notebook persist</description>
</property>

<property>
  <name>zeppelin.interpreter.include</name>
  <value></value>
  <description>All the inteprreters that you would like to include. You can only specify either 'zeppelin.interpreter.include' or 'zeppelin.interpreter.exclude'. Specifying them together is not allowed.</description>
</property>

<property>
  <name>zeppelin.interpreter.exclude</name>
  <value></value>
  <description>All the inteprreters that you would like to exclude. You can only specify either 'zeppelin.interpreter.include' or 'zeppelin.interpreter.exclude'. Specifying them together is not allowed.</description>
</property>

<property>
  <name>zeppelin.notebook.homescreen</name>
  <value></value>
  <description>id of notebook to be displayed in homescreen. ex) 2A94M5J1Z Empty value displays default home screen</description>
</property>

<property>
  <name>zeppelin.notebook.homescreen.hide</name>
  <value>false</value>
  <description>hide homescreen notebook from list when this value set to true</description>
</property>

<property>
  <name>zeppelin.notebook.collaborative.mode.enable</name>
  <value>true</value>
  <description>Enable collaborative mode</description>
</property>

<!-- Google Cloud Storage notebook storage -->
<!--
<property>
  <name>zeppelin.notebook.gcs.dir</name>
  <value></value>
  <description>
    A GCS path in the form gs://bucketname/path/to/dir.
    Notes are stored at {zeppelin.notebook.gcs.dir}/{notebook-id}/note.json
 </description>
</property>

<property>
  <name>zeppelin.notebook.gcs.credentialsJsonFilePath</name>
  <value>path/to/key.json</value>
  <description>
    Path to GCS credential key file for authentication with Google Storage.
 </description>
</property>


<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.GCSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
-->

<!-- Amazon S3 notebook storage -->
<!-- Creates the following directory structure: s3://{bucket}/{username}/{notebook-id}/note.json -->
<!--
<property>
  <name>zeppelin.notebook.s3.user</name>
  <value>user</value>
  <description>user name for s3 folder structure</description>
</property>

<property>
  <name>zeppelin.notebook.s3.bucket</name>
  <value>zeppelin</value>
  <description>bucket name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.s3.endpoint</name>
  <value>s3.amazonaws.com</value>
  <description>endpoint for s3 bucket</description>
</property>

<property>
  <name>zeppelin.notebook.s3.timeout</name>
  <value>120000</value>
  <description>s3 bucket endpoint request timeout in msec.</description>
</property>

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

-->

<!-- Additionally, encryption is supported for notebook data stored in S3 -->
<!-- Use the AWS KMS to encrypt data -->
<!-- If used, the EC2 role assigned to the EMR cluster must have rights to use the given key -->
<!-- See https://aws.amazon.com/kms/ and http://docs.aws.amazon.com/kms/latest/developerguide/concepts.html -->
<!--
<property>
  <name>zeppelin.notebook.s3.kmsKeyID</name>
  <value>AWS-KMS-Key-UUID</value>
  <description>AWS KMS key ID used to encrypt notebook data in S3</description>
</property>
-->

<!-- provide region of your KMS key -->
<!-- See http://docs.aws.amazon.com/general/latest/gr/rande.html#kms_region for region codes names -->
<!--
<property>
  <name>zeppelin.notebook.s3.kmsKeyRegion</name>
  <value>us-east-1</value>
  <description>AWS KMS key region in your AWS account</description>
</property>
-->

<!-- Use a custom encryption materials provider to encrypt data -->
<!-- No configuration is given to the provider, so you must use system properties or another means to configure -->
<!-- See https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/EncryptionMaterialsProvider.html -->
<!--
<property>
  <name>zeppelin.notebook.s3.encryptionMaterialsProvider</name>
  <value>provider implementation class name</value>
  <description>Custom encryption materials provider used to encrypt notebook data in S3</description>
</property>
-->

<!-- Server-side encryption enabled for notebooks -->
<!--
<property>
  <name>zeppelin.notebook.s3.sse</name>
  <value>true</value>
  <description>Server-side encryption enabled for notebooks</description>
</property>
-->

<!-- Path style access for S3 bucket -->
<!--
<property>
  <name>zeppelin.notebook.s3.pathStyleAccess</name>
  <value>true</value>
  <description>Path style access for S3 bucket</description>
</property>
-->

<!-- S3 Object Permissions (Canned ACL) for notebooks -->
<!--
<property>
  <name>zeppelin.notebook.s3.cannedAcl</name>
  <value>BucketOwnerFullControl</value>
  <description>Saves notebooks in S3 with the given Canned Access Control List.</description>
</property>
-->

<!-- Optional override to control which signature algorithm should be used to sign AWS requests -->
<!-- Set this property to "S3SignerType" if your AWS S3 compatible APIs support only AWS Signature Version 2 such as Ceph. -->
<!--
<property>
  <name>zeppelin.notebook.s3.signerOverride</name>
  <value>S3SignerType</value>
  <description>optional override to control which signature algorithm should be used to sign AWS requests</description>
</property>
-->

<!-- Aliyun OSS notebook storage -->
<!-- Creates the following directory structure: oss://{bucket}/{notebook_dir}/note_path -->
<!--

<property>
  <name>zeppelin.notebook.oss.bucket</name>
  <value>zeppelin</value>
  <description>bucket name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.oss.endpoint</name>
  <value>http://oss-cn-hangzhou.aliyuncs.com</value>
  <description>endpoint for oss bucket</description>
</property>

<property>
  <name>zeppelin.notebook.oss.accesskeyid</name>
  <value></value>
  <description>Access key id for your OSS account</description>
</property>

<property>
  <name>zeppelin.notebook.oss.accesskeysecret</name>
  <value></value>
  <description>Access key secret for your OSS account</description>
</property>

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.OSSNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

-->

<!-- If using Azure for storage use the following settings -->
<!--
<property>
  <name>zeppelin.notebook.azure.connectionString</name>
  <value>DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey></value>
  <description>Azure account credentials</description>
</property>

<property>
  <name>zeppelin.notebook.azure.share</name>
  <value>zeppelin</value>
  <description>share name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.azure.user</name>
  <value>user</value>
  <description>optional user name for Azure folder structure</description>
</property>

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.AzureNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>
-->

<!-- Notebook storage layer using local file system
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
  <description>local notebook persistence layer implementation</description>
</property>
-->

<!-- Notebook storage layer using hadoop compatible file system
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo</value>
  <description>Hadoop compatible file system notebook persistence layer implementation, such as local file system, hdfs, azure wasb, s3 and etc.</description>
</property>

<property>
  <name>zeppelin.server.kerberos.keytab</name>
  <value></value>
  <description>keytab for accessing kerberized hdfs</description>
</property>

<property>
  <name>zeppelin.server.kerberos.principal</name>
  <value></value>
  <description>principal for accessing kerberized hdfs</description>
</property>
-->

<!-- For connecting your Zeppelin with ZeppelinHub -->
<!--
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo</value>
  <description>two notebook persistence layers (versioned local + ZeppelinHub)</description>
</property>
-->

<!-- MongoDB notebook storage -->
<!--
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
  <description>notebook persistence layer implementation</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.uri</name>
  <value>mongodb://localhost</value>
  <description>MongoDB connection URI used to connect to a MongoDB database server</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.database</name>
  <value>zeppelin</value>
  <description>database name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.collection</name>
  <value>notes</value>
  <description>collection name for notebook storage</description>
</property>

<property>
  <name>zeppelin.notebook.mongo.autoimport</name>
  <value>false</value>
  <description>import local notes into MongoDB automatically on startup, reset to false after import to avoid repeated import</description>
</property>
-->

<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value>
  <description>versioned notebook persistence layer implementation</description>
</property>

<property>
  <name>zeppelin.notebook.one.way.sync</name>
  <value>false</value>
  <description>If there are multiple notebook storages, should we treat the first one as the only source of truth?</description>
</property>

<property>
  <name>zeppelin.interpreter.dir</name>
  <value>interpreter</value>
  <description>Interpreter implementation base directory</description>
</property>

<property>
  <name>zeppelin.interpreter.localRepo</name>
  <value>local-repo</value>
  <description>Local repository for interpreter's additional dependency loading</description>
</property>

<property>
  <name>zeppelin.interpreter.dep.mvnRepo</name>
  <value>https://repo1.maven.org/maven2/</value>
  <description>Remote principal repository for interpreter's additional dependency loading</description>
</property>

<property>
  <name>zeppelin.dep.localrepo</name>
  <value>local-repo</value>
  <description>Local repository for dependency loader</description>
</property>

<property>
  <name>zeppelin.helium.node.installer.url</name>
  <value>https://nodejs.org/dist/</value>
  <description>Remote Node installer url for Helium dependency loader</description>
</property>

<property>
  <name>zeppelin.helium.npm.installer.url</name>
  <value>https://registry.npmjs.org/</value>
  <description>Remote Npm installer url for Helium dependency loader</description>
</property>

<property>
  <name>zeppelin.helium.yarnpkg.installer.url</name>
  <value>https://github.com/yarnpkg/yarn/releases/download/</value>
  <description>Remote Yarn package installer url for Helium dependency loader</description>
</property>

<!--
<property>
  <name>zeppelin.helium.registry</name>
  <value>helium,https://s3.amazonaws.com/helium-package/helium.json</value>
  <description>Location of external Helium Registry</description>
</property>
-->

<property>
  <name>zeppelin.interpreter.group.default</name>
  <value>spark</value>
  <description></description>
</property>

<property>
  <name>zeppelin.interpreter.connect.timeout</name>
  <value>60000</value>
  <description>Interpreter process connect timeout in msec.</description>
</property>

<property>
  <name>zeppelin.interpreter.output.limit</name>
  <value>102400</value>
  <description>Output message from interpreter exceeding the limit will be truncated</description>
</property>

<property>
  <name>zeppelin.ssl</name>
  <value>false</value>
  <description>Should SSL be used by the servers?</description>
</property>

<property>
  <name>zeppelin.ssl.client.auth</name>
  <value>false</value>
  <description>Should client authentication be used for SSL connections?</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.path</name>
  <value>keystore</value>
  <description>Path to keystore relative to Zeppelin configuration directory</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.type</name>
  <value>JKS</value>
  <description>The format of the given keystore (e.g. JKS or PKCS12)</description>
</property>

<property>
  <name>zeppelin.ssl.keystore.password</name>
  <value>change me</value>
  <description>Keystore password. Can be obfuscated by the Jetty Password tool</description>
</property>

<!--
<property>
  <name>zeppelin.ssl.key.manager.password</name>
  <value>change me</value>
  <description>Key Manager password. Defaults to keystore password. Can be obfuscated.</description>
</property>
-->

<property>
  <name>zeppelin.ssl.truststore.path</name>
  <value>truststore</value>
  <description>Path to truststore relative to Zeppelin configuration directory. Defaults to the keystore path</description>
</property>

<property>
  <name>zeppelin.ssl.truststore.type</name>
  <value>JKS</value>
  <description>The format of the given truststore (e.g. JKS or PKCS12). Defaults to the same type as the keystore type</description>
</property>

<!--
<property>
  <name>zeppelin.ssl.truststore.password</name>
  <value>change me</value>
  <description>Truststore password. Can be obfuscated by the Jetty Password tool. Defaults to the keystore password</description>
</property>
-->

<!--
<property>
  <name>zeppelin.ssl.pem.key</name>
  <value></value>
  <description>This directive points to the PEM-encoded private key file for the server.</description>
</property>
-->

<!--
<property>
  <name>zeppelin.ssl.pem.key.password</name>
  <value></value>
  <description>Password of the PEM-encoded private key.</description>
</property>
-->

<!--
<property>
  <name>zeppelin.ssl.pem.cert</name>
  <value></value>
  <description>This directive points to a file with certificate data in PEM format.</description>
</property>
-->

<!--
<property>
  <name>zeppelin.ssl.pem.ca</name>
  <value></value>
  <description>This directive sets the all-in-one file where you can assemble the Certificates of Certification Authorities (CA) whose clients you deal with. These are used for Client Authentication. Such a file is simply the concatenation of the various PEM-encoded Certificate files.</description>
</property>
-->

<property>
  <name>zeppelin.server.allowed.origins</name>
  <value>*</value>
  <description>Allowed sources for REST and WebSocket requests (i.e. http://onehost:8080,http://otherhost.com). If you leave * you are vulnerable to https://issues.apache.org/jira/browse/ZEPPELIN-173</description>
</property>

<property>
  <name>zeppelin.username.force.lowercase</name>
  <value>false</value>
  <description>Force convert username case to lower case, useful for Active Directory/LDAP. Default is not to change case</description>
</property>

<property>
  <name>zeppelin.notebook.default.owner.username</name>
  <value></value>
  <description>Set owner role by default</description>
</property>

<property>
  <name>zeppelin.notebook.public</name>
  <value>true</value>
  <description>Make notebook public by default when created, private otherwise</description>
</property>

<property>
  <name>zeppelin.websocket.max.text.message.size</name>
  <value>10240000</value>
  <description>Size in characters of the maximum text message to be received by websocket. Defaults to 10240000</description>
</property>

<property>
  <name>zeppelin.server.default.dir.allowed</name>
  <value>false</value>
  <description>Enable directory listings on server.</description>
</property>

<property>
  <name>zeppelin.interpreter.yarn.monitor.interval_secs</name>
  <value>10</value>
  <description>Check interval in secs for yarn apps monitors</description>
</property>

<!--
<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
  <description>LifecycleManager class for managing the lifecycle of interpreters, by default interpreter will
  be closed after timeout</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
  <value>60000</value>
  <description>Milliseconds of the interval to checking whether interpreter is time out</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>3600000</value>
  <description>Milliseconds of the interpreter timeout threshold, by default it is 1 hour</description>
</property>
-->

<property>
    <name>zeppelin.server.jetty.name</name>
    <value> </value>
    <description>Hardcoding Application Server name to Prevent Fingerprinting</description>
</property>

<!--
<property>
    <name>zeppelin.server.send.jetty.name</name>
    <value>false</value>
    <description>If set to false, will not show the Jetty version to prevent Fingerprinting</description>
</property>
-->

<!--
<property>
    <name>zeppelin.server.jetty.request.header.size</name>
    <value>8192</value>
    <description>Http Request Header Size Limit (to prevent HTTP 413)</description>
</property>
-->

<!--
<property>
    <name>zeppelin.server.jetty.thread.pool.max</name>
    <value>400</value>
    <description>Max Thread pool number for QueuedThreadPool in Jetty Server</description>
</property>
-->
<!--
<property>
    <name>zeppelin.server.jetty.thread.pool.min</name>
    <value>8</value>
    <description>Min Thread pool number for QueuedThreadPool in Jetty Server</description>
</property>
-->
<!--
<property>
    <name>zeppelin.server.jetty.thread.pool.timeout</name>
    <value>30</value>
    <description>Timeout number for QueuedThreadPool in Jetty Server</description>
</property>
-->

<!--
<property>
    <name>zeppelin.server.authorization.header.clear</name>
    <value>true</value>
    <description>Authorization header to be cleared if server is running as authcBasic</description>
</property>
-->

<property>
  <name>zeppelin.server.xframe.options</name>
  <value>SAMEORIGIN</value>
  <description>The X-Frame-Options HTTP response header can be used to indicate whether or not a browser should be allowed to render a page in a frame/iframe/object.</description>
</property>

<!--
<property>
  <name>zeppelin.server.strict.transport</name>
  <value>max-age=631138519</value>
  <description>The HTTP Strict-Transport-Security response header is a security feature that lets a web site tell browsers that it should only be communicated with using HTTPS, instead of using HTTP. Enable this when Zeppelin is running on HTTPS. Value is in Seconds, the default value is equivalent to 20 years.</description>
</property>
-->

<property>
  <name>zeppelin.server.xxss.protection</name>
  <value>1; mode=block</value>
  <description>The HTTP X-XSS-Protection response header is a feature of Internet Explorer, Chrome and Safari that stops pages from loading when they detect reflected cross-site scripting (XSS) attacks. When value is set to 1 and a cross-site scripting attack is detected, the browser will sanitize the page (remove the unsafe parts).</description>
</property>

<property>
  <name>zeppelin.server.xcontent.type.options</name>
  <value>nosniff</value>
  <description>The HTTP X-Content-Type-Options response header helps to prevent MIME type sniffing attacks. It directs the browser to honor the type specified in the Content-Type header, rather than trying to determine the type from the content itself. The default value "nosniff" is really the only meaningful value. This header is supported on all browsers except Safari and Safari on iOS.</description>
</property>

<!--
<property>
  <name>zeppelin.server.html.body.addon</name>
  <value><![CDATA[<script defer src="https://url/to/my/lib.min.js" /><script defer src="https://url/to/other/lib.min.js" />]]></value>
  <description>Addon html code to be placed at the end of the html->body section in index.html delivered by zeppelin server.</description>
</property>

<property>
  <name>zeppelin.server.html.head.addon</name>
  <value></value>
  <description>Addon html code to be placed at the end of the html->head section in index.html delivered by zeppelin server.</description>
</property>
-->


<!--
<property>
  <name>zeppelin.interpreter.callback.portRange</name>
  <value>10000:10010</value>
</property>
-->

<!--
<property>
  <name>zeppelin.recovery.storage.class</name>
  <value>org.apache.zeppelin.interpreter.recovery.LocalRecoveryStorage</value>
  <description>ReoveryStorage implementation based on java native local file system</description>
</property>

<property>
  <name>zeppelin.recovery.storage.class</name>
  <value>org.apache.zeppelin.interpreter.recovery.FileSystemRecoveryStorage</value>
  <description>ReoveryStorage implementation based on hadoop FileSystem</description>
</property>
-->

<!--
<property>
  <name>zeppelin.recovery.dir</name>
  <value>recovery</value>
  <description>Location where recovery metadata is stored</description>
</property>
-->

<!-- GitHub configurations
<property>
  <name>zeppelin.notebook.git.remote.url</name>
  <value></value>
  <description>remote Git repository URL</description>
</property>

<property>
  <name>zeppelin.notebook.git.remote.username</name>
  <value>token</value>
  <description>remote Git repository username</description>
</property>

<property>
  <name>zeppelin.notebook.git.remote.access-token</name>
  <value></value>
  <description>remote Git repository password</description>
</property>

<property>
  <name>zeppelin.notebook.git.remote.origin</name>
  <value>origin</value>
  <description>Git repository remote</description>
</property>

<property>
  <name>zeppelin.notebook.cron.enable</name>
  <value>false</value>
  <description>Notebook enable cron scheduler feature</description>
</property>
<property>
  <name>zeppelin.notebook.cron.folders</name>
  <value></value>
  <description>Notebook cron folders</description>
</property>
-->

<property>
  <name>zeppelin.run.mode</name>
  <value>auto</value>
  <description>'auto|local|k8s|docker'</description>
</property>

<property>
  <name>zeppelin.k8s.portforward</name>
  <value>false</value>
  <description>Port forward to interpreter rpc port. Set 'true' only on local development when zeppelin.k8s.mode 'on'</description>
</property>

<property>
  <name>zeppelin.k8s.container.image</name>
  <value>apache/zeppelin:0.9.0-SNAPSHOT</value>
  <description>Docker image for interpreters</description>
</property>

<property>
  <name>zeppelin.k8s.spark.container.image</name>
  <value>apache/spark:latest</value>
  <description>Docker image for Spark executors</description>
</property>

<property>
  <name>zeppelin.k8s.template.dir</name>
  <value>k8s</value>
  <description>Kubernetes yaml spec files</description>
</property>

<property>
  <name>zeppelin.docker.container.image</name>
  <value>apache/zeppelin:0.8.0</value>
  <description>Docker image for interpreters</description>
</property>

<property>
  <name>zeppelin.search.index.rebuild</name>
  <value>false</value>
  <description>Whether rebuild index when zeppelin start. If true, it would read all notes and rebuild the index, this would consume lots of memory if you have large amounts of notes, so by default it is false</description>
</property>

<property>
  <name>zeppelin.search.use.disk</name>
  <value>true</value>
  <description>Whether using disk for storing search index, if false, memory will be used instead.</description>
</property>

<property>
  <name>zeppelin.search.index.path</name>
  <value>/tmp/zeppelin-index</value>
  <description>path for storing search index on disk.</description>
</property>

<property>
  <name>zeppelin.jobmanager.enable</name>
  <value>${jobmanagerEnable}</value>
  <description>The Job tab in zeppelin page seems not so useful instead it cost lots of memory and affect the performance.
  Disable it can save lots of memory</description>
</property>

<property>
  <name>zeppelin.spark.only_yarn_cluster</name>
  <value>false</value>
  <description>Whether only allow yarn cluster mode</description>
</property>

<property>
  <name>zeppelin.note.file.exclude.fields</name>
  <value></value>
  <description>fields to be excluded from being saved in note files, with Paragraph prefix mean the fields in Paragraph, e.g. Paragraph.results</description>
</property>

<#list itemList as item>
<property>
  <name>${item.name}</name>
  <value>${item.value}</value>
</property>
</#list>

</configuration>
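Once DataSophon has rendered this template, individual values can be spot-checked from the shell. The sketch below assumes the rendered file ends up at conf/zeppelin-site.xml under the Zeppelin install directory and keeps the layout shown above (the name on one line, the value on the next). The helper name get_site_property is made up for illustration, and the helper does not skip properties that sit inside an XML comment.

```shell
#!/bin/bash
# Print the <value> of a named <property> from a Hadoop-style site XML file.
# Assumption: <name> and <value> are on consecutive lines, as in the template above.
# Limitation: commented-out properties are matched too.
get_site_property() {
    local file=$1 name=$2
    awk -v want="<name>${name}</name>" '
        index($0, want) { found = 1; next }      # remember that the name was seen
        found && /<value>/ {
            sub(/.*<value>/, "")                 # strip everything up to the value
            sub(/<\/value>.*/, "")               # strip the closing tag
            print
            exit
        }
    ' "$file"
}
```

For example, get_site_property /opt/datasophon/zeppelin-0.10.1/conf/zeppelin-site.xml zeppelin.jobmanager.enable would print whatever value was substituted for ${jobmanagerEnable}.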

After the configuration is in place, restart the DataSophon API service; the worker must be restarted as well.

/opt/datasophon/DDP/packages/datasophon-manager-1.2.1/bin/datasophon-api.sh start api

Once the restart completes, finish the Zeppelin installation.

Login page: http://192.168.2.100:8889/


3. Configure Common Interpreters


3.1 Configure the Hive Interpreter


Prerequisite: HiveServer2 must be running.
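A quick way to confirm the prerequisite is to test whether the HiveServer2 port accepts TCP connections. This is a minimal sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command. The helper name port_open is hypothetical, and an open port only shows that something is listening, not that HiveServer2 is healthy.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within wait_secs seconds.
# Uses bash's built-in /dev/tcp pseudo-device; no data is sent.
port_open() {
    local host=$1 port=$2 wait_secs=${3:-3}
    timeout "$wait_secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

With the address used later in this section, the check would be: port_open 192.168.21.102 10000 && echo "HiveServer2 port is reachable"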

Copy hive-site.xml into Zeppelin's conf directory:

cp /opt/datasophon/hive-3.1.0/conf/hive-site.xml /opt/datasophon/zeppelin-0.10.1/conf/

Copy the following JAR files into the directory /opt/datasophon/zeppelin-0.10.1/interpreter/jdbc:

commons-lang-2.6.jar
curator-client-2.12.0.jar
guava-19.0.jar
hadoop-common-3.3.3.jar
hive-common-3.1.0.jar
hive-exec-3.1.0.jar
hive-jdbc-3.1.0.jar
hive-serde-3.1.0.jar
hive-service-3.1.0.jar
hive-service-rpc-3.1.0.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
mysql-connector-java-5.1.46-bin.jar
mysql-connector-java.jar
protobuf-java-2.5.0.jar
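Copying the JARs by hand is error-prone, so the step can be scripted. The sketch below looks for each named JAR in a colon-separated list of source directories and copies the first match. The helper name copy_jars is made up, and the source directories in the usage note are assumptions about where Hive and Hadoop keep their libraries on this cluster.

```shell
#!/bin/bash
# copy_jars DEST SRC1:SRC2:... JAR [JAR...]
# Copies each JAR from the first source directory that contains it into DEST.
# Prints the names of JARs that were not found and returns non-zero if any is missing.
copy_jars() {
    local dest=$1 sources=$2
    shift 2
    local -a dirs
    IFS=: read -r -a dirs <<< "$sources"
    local jar dir found missing=0
    mkdir -p "$dest"
    for jar in "$@"; do
        found=""
        for dir in "${dirs[@]}"; do
            if [ -f "$dir/$jar" ]; then
                cp "$dir/$jar" "$dest/" && found=yes
                break
            fi
        done
        if [ -z "$found" ]; then
            echo "missing: $jar" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call might look like this (source paths are assumed, adjust them to your layout): copy_jars /opt/datasophon/zeppelin-0.10.1/interpreter/jdbc /opt/datasophon/hive-3.1.0/lib hive-jdbc-3.1.0.jar hive-service-3.1.0.jar libthrift-0.9.3.jar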

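After the JARs are copied, restart Zeppelin.

Configure the Hive integration in the web UI

Create a new interpreter in the jdbc group and name it hive.

Set the default JDBC URL and user.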

My settings:

Property name     Property value
default.url       jdbc:hive2://192.168.21.102:10000
default.user      hdfs
default.driver    org.apache.hive.jdbc.HiveDriver
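Before entering these values in Zeppelin, it can help to confirm that the same URL and user work outside Zeppelin. This is a minimal sketch, assuming the beeline client that ships with Hive is on the PATH; the helper name hive_smoke_test is hypothetical.

```shell
#!/bin/bash
# Run a trivial query against HiveServer2 with the URL and user that
# will be entered in the Zeppelin interpreter settings.
# Assumption: `beeline` is on the PATH.
hive_smoke_test() {
    local url=$1 user=$2
    beeline -u "$url" -n "$user" -e 'show databases;'
}
```

With the settings above: hive_smoke_test jdbc:hive2://192.168.21.102:10000 hdfs. If beeline cannot connect, the Zeppelin interpreter is unlikely to connect with the same values.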

Create a new notebook.

Interpreter: hive

Run a test query.

3.2 Configure the Trino Interpreter


Start the Trino service.

Add a new interpreter.

Name the interpreter trino and set its group to jdbc.

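Set the properties: add the URL and the driver. The user name can be anything, because Trino does not enable user authentication by default. The URL takes the form jdbc:trino://<host>:<port>/<catalog>.

Property name     Property value
default.url       jdbc:trino://192.168.2.98:8086/hive
default.user      hdfs
default.driver    io.trino.jdbc.TrinoDriver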
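The JDBC URL packs the coordinator address and the catalog into one string. The sketch below splits such a URL and passes the pieces to the Trino command-line client to run a trivial query. It assumes the client is installed on the PATH under the name trino, that the coordinator speaks plain HTTP, and that the URL carries no query parameters; the helper name trino_smoke_test is made up.

```shell
#!/bin/bash
# trino_smoke_test JDBC_URL USER
# Splits jdbc:trino://host:port/catalog[/schema] and runs SHOW SCHEMAS
# through the Trino CLI. Assumption: the CLI is on the PATH as `trino`.
trino_smoke_test() {
    local jdbc_url=$1 user=$2
    local rest=${jdbc_url#jdbc:trino://}    # host:port/catalog[/schema]
    local server=${rest%%/*}                # host:port
    local catalog=""
    if [[ $rest == */* ]]; then
        catalog=${rest#*/}                  # catalog[/schema]
        catalog=${catalog%%/*}              # catalog only
    fi
    local -a args=(--server "http://$server" --user "$user")
    if [ -n "$catalog" ]; then
        args+=(--catalog "$catalog")
    fi
    trino "${args[@]}" --execute 'SHOW SCHEMAS'
}
```

With the settings above: trino_smoke_test jdbc:trino://192.168.2.98:8086/hive hdfs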

After saving the configuration, restart the interpreter.

Create a new note.

Interpreter: trino

Test by querying data from a Hive table.

3.3 Configure the Spark Interpreter


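Zeppelin's built-in spark interpreter group includes the sub-interpreters %spark, %sql, %dep, %pyspark, %ipyspark and %r. In practice, open the interpreter settings page and adjust the properties to match your Spark cluster. The spark.master property, which defaults to local[*], selects the mode Spark runs in:

  • local mode: use local[*]; the value in brackets is the number of threads, and * means one thread per CPU core.
  • standalone mode: use spark://master:7077
  • yarn mode: use yarn-client or yarn-cluster
  • mesos mode: use mesos://zk://zk1:2181,zk2:2182,zk3:2181/mesos or mesos://host:5050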
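The mapping from a spark.master value to a run mode in the list above can be sketched as a small shell helper, for example to sanity-check a value before pasting it into the interpreter settings. The function name spark_master_mode is hypothetical. Accepting plain yarn is an addition beyond the list, on the assumption that a newer Spark release is in use, where the master is yarn and the deploy mode is set separately.

```shell
#!/bin/bash
# Print the run mode implied by a spark.master value; return 1 if unrecognised.
spark_master_mode() {
    case "$1" in
        local|local\[*\])              echo local ;;       # local, local[*], local[4]
        spark://*)                     echo standalone ;;
        yarn|yarn-client|yarn-cluster) echo yarn ;;        # plain "yarn" is an assumption
        mesos://*)                     echo mesos ;;
        *)                             echo unknown; return 1 ;;
    esac
}
```

For example, spark_master_mode spark://master:7077 prints standalone.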

Open the spark interpreter for editing.

Click edit and set SPARK_HOME and spark.master; see the official documentation for what each parameter means.
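A wrong SPARK_HOME is a common reason for the Spark interpreter failing to start, so it is worth checking the directory before saving. This is a minimal sketch, assuming that a usable Spark installation has an executable bin/spark-submit; the helper name check_spark_home and the path in the usage note are illustrative only.

```shell
#!/bin/bash
# Return 0 if the directory looks like a Spark installation,
# judged by the presence of an executable bin/spark-submit.
check_spark_home() {
    local spark_home=$1
    if [ -x "$spark_home/bin/spark-submit" ]; then
        echo "ok: $spark_home"
    else
        echo "no executable bin/spark-submit under: $spark_home" >&2
        return 1
    fi
}
```

For example (hypothetical path): check_spark_home /opt/datasophon/spark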

Create a note.

Select "spark" as the interpreter.

Run a test note:

%spark
// create DataFrame from scala Seq. It can infer schema for you.
val df1 = spark.createDataFrame(Seq((1, "andy", 20, "USA"), (2, "jeff", 23, "China"), (3, "james", 18, "USA"))).toDF("id", "name", "age", "country")
df1.printSchema
df1.show()

// create DataFrame from scala case class
case class Person(id:Int, name:String, age:Int, country:String)
val df2 = spark.createDataFrame(Seq(Person(1, "andy", 20, "USA"), Person(2, "jeff", 23, "China"), Person(3, "james", 18, "USA")))
df2.printSchema
df2.show()

import spark.implicits._
// you can also create Dataset from scala case class
val df3 = spark.createDataset(Seq(Person(1, "andy", 20, "USA"), Person(2, "jeff", 23, "China"), Person(3, "james", 18, "USA")))
df3.printSchema
df3.show()

Click Run.

The results are displayed below the paragraph.


