Spark SQL WHERE

Published: 2022-10-20 21:02:13

1. Does Spark SQL support subqueries?

Yes. For example, an uncorrelated scalar subquery in a WHERE clause (supported natively since Spark 2.0):
select ename,deptno,sal
from emp
where deptno=(select deptno from dept where loc='NEW YORK');
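
A minimal way to try this from the Scala shell, assuming Spark 2.0+ and that emp and dept are registered as temporary views (the names mirror the example above):

// Hedged sketch: run the subquery through SparkSession.sql (Spark 2.0+).
// `emp` and `dept` are assumed to be registered temp views.
val result = spark.sql(
  """SELECT ename, deptno, sal
    |FROM emp
    |WHERE deptno = (SELECT deptno FROM dept WHERE loc = 'NEW YORK')""".stripMargin)
result.show()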

2. What SQL statements does Spark SQL actually support?

Take a look at sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala in the Spark 1.x source tree.
The Scala is not the easiest read, but it contains the SQL parsing logic, so you can work out which statements are accepted; at the very least the keywords are spelled out explicitly (a toy illustration of how they feed a parser follows the list):

protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")
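
As a rough illustration of how such Keyword definitions drive a grammar, here is a toy sketch built on scala.util.parsing.combinator, the same family of parser combinators Spark 1.x used. This is not Spark's implementation, and on Scala 2.11+ it needs the scala-parser-combinators module:

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Toy sketch only: reserved keywords drive the grammar, in the spirit of the
// Keyword declarations above. Spark's real SqlParser is far more complete.
object MiniSqlParser extends StandardTokenParsers {
  lexical.reserved ++= Seq("SELECT", "FROM", "WHERE")
  lexical.delimiters ++= Seq("*", ",", "=")

  // SELECT <col>[, <col>...] FROM <table>
  def select: Parser[Any] =
    "SELECT" ~> repsep(ident | "*", ",") ~ ("FROM" ~> ident)

  def parse(sql: String) = phrase(select)(new lexical.Scanner(sql))
}

// e.g. MiniSqlParser.parse("SELECT name , age FROM people")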

3. Spark SQL truncate: how does TRUNCATE differ from DELETE?

A TRUNCATE behaves much like a DELETE without a WHERE clause, with these differences:
1. TRUNCATE is fast on large and small tables alike. DELETE must generate rollback (undo) information; TRUNCATE does not.
2. TRUNCATE is a DDL statement and commits implicitly, so it cannot be rolled back.
3. TRUNCATE resets the high-water mark (HWM) of the table and its indexes. Full table scans and index fast full scans read every block up to the HWM, so their performance does not improve after a DELETE, but it does after a TRUNCATE.
4. TRUNCATE fires no DELETE triggers.
5. The right to TRUNCATE another user's table cannot be granted directly; it requires granting that user the DROP ANY TABLE privilege.
6. After a TRUNCATE, the space used by the table and its indexes shrinks back to its initial extent, whereas a DELETE frees no table or index space.
7. A table referenced by foreign keys cannot be TRUNCATEd; the constraints must be removed first.

4. Why doesn't Spark SQL support (or recommend) some subqueries?

Execution efficiency. For example: select * from tablea where a not in (select a from tableb). If the subquery produces a large result set, this pattern is not recommended; a common rewrite is shown below.
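
The usual rewrite turns the NOT IN into an anti join, so the large subquery result is joined once instead of probed per row. A hedged sketch, assuming Spark 2.0+ (which accepts LEFT ANTI JOIN syntax) and the illustrative table names from the example; note that an anti join treats NULL keys differently from NOT IN:

// Hedged sketch: anti-join rewrite of `a NOT IN (SELECT a FROM tableb)`.
// Semantics differ from NOT IN when the join key can be NULL.
val rewritten = spark.sql(
  """SELECT t1.*
    |FROM tablea t1
    |LEFT ANTI JOIN tableb t2 ON t1.a = t2.a""".stripMargin)
rewritten.show()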

5. How do I use Spark SQL to fetch rows from a Hive table within a given date range?

The answer given picks the first row per (orderid, fenjian) group; in current Spark SQL the row_number window function expresses this directly:

select orderid, fenjian, timee
from (
  select orderid, fenjian, timee,
         row_number() over (partition by orderid, fenjian order by timee asc) as rn
  from tableName
) t2
where t2.rn = 1;
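
To answer the date-range part of the question, add a predicate on the date column inside the subquery. A hedged sketch run from Scala (window functions need Spark 1.4+ with a HiveContext, or any Spark 2.x session; the column bounds and date literals are illustrative):

// Hedged sketch: date-range filter combined with the dedup query above.
// The `timee` bounds are illustrative only.
val inRange = sqlContext.sql(
  """SELECT orderid, fenjian, timee FROM (
    |  SELECT orderid, fenjian, timee,
    |         row_number() OVER (PARTITION BY orderid, fenjian ORDER BY timee) AS rn
    |  FROM tableName
    |  WHERE timee >= '2015-01-01' AND timee < '2015-02-01'
    |) t WHERE t.rn = 1""".stripMargin)
inRange.show()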

6. How to use the Spark SQL JDBC server

Environment
Cluster: CDH 5.3.0

Component versions:
Spark: 1.2.0-cdh5.3.0
Hive: 0.13.1-cdh5.3.0
Hadoop: 2.5.0-cdh5.3.0
Start the JDBC server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008

Connecting to the JDBC server with Beeline
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000

[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>

Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.

CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

create table mytable(name string, addr string, status string) row format delimited fields terminated by '#';

-- load a local file
load data local inpath '/external/tmp/data.txt' into table mytable;

-- load a file from HDFS
load data inpath 'hdfs://ju51nn/external/tmp/data.txt' into table mytable;

describe mytable;

explain select * from mytable where name = '张三';

select * from mytable where name = '张三';

cache table mytable;

select count(*) total, count(distinct addr) num1, count(distinct status) num2 from mytable where addr = 'gz';

uncache table mytable;

Sample data:
张三#广州#学生
李四#贵州#教师
王五#武汉#讲师
赵六#成都#学生
lisa#广州#学生
lily#gz#student

Standalone Spark SQL Shell
Spark SQL also supports a simple shell you can use as a single process: spark-sql
It is intended mainly for local development; in a shared cluster environment, use the JDBC server instead.

cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql

7. How to use Spark SQL

I. Starting it
/data/spark-1.4.0-bin-cdh4/bin/spark-sql --master spark://master:7077 --total-executor-cores 10 --executor-memory 1g --executor-cores 2

Note: /data/spark-1.4.0-bin-cdh4/ is the Spark installation path.

/data/spark-1.4.0-bin-cdh4/bin/spark-sql --help lists the start-up options:

--master MASTER_URL the master URL to connect to
--executor-memory MEM memory per executor (default 1G)
--total-executor-cores NUM total number of cores across all executors
-e <quoted-query-string> run the given SQL query directly

-f <filename> run the SQL statements in a file as a batch

II. Hive functionality supported by Spark SQL

1. Query clauses: SELECT, GROUP BY, ORDER BY, CLUSTER BY, SORT BY
2. Hive operators:
1) relational: =, ==, <>, <, >, >=, <=
2) arithmetic: +, -, *, /, %
3) logical: AND, &&, OR, ||
4) complex data structures
5) math functions (sign, ln, cos, etc.)
6) string functions
3. UDFs
4. UDAFs

5. user-defined serialization formats (SerDes)
6. joins: JOIN, {LEFT|RIGHT|FULL} OUTER JOIN, LEFT SEMI JOIN, CROSS JOIN
7. unions
8. subqueries in FROM: SELECT col FROM (SELECT a + b AS col FROM t1) t2
9. sampling
10. EXPLAIN
11. partitioned tables
12. views
13. Hive DDL: CREATE TABLE, CREATE TABLE AS SELECT, ALTER TABLE

14. supported data types: TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DATE, ARRAY, MAP, STRUCT (a DDL sketch using several of these follows)
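
For instance, a table touching several of these types could be declared as follows. A hedged sketch: the table and column names are made up, and in Spark 1.x this DDL needs a HiveContext:

// Hedged sketch: hypothetical DDL exercising several supported types.
sqlContext.sql(
  """CREATE TABLE IF NOT EXISTS type_demo (
    |  id INT,
    |  flag BOOLEAN,
    |  score DOUBLE,
    |  name STRING,
    |  created TIMESTAMP,
    |  tags ARRAY<STRING>,
    |  props MAP<STRING, INT>
    |)""".stripMargin)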

III. Querying data programmatically from a client
1. Start spark-shell:
./spark-shell --master spark://master:7077 --total-executor-cores 10 --executor-memory 1g --executor-cores 2
2. Write the program:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("../examples/src/main/resources/people.json")
Show all rows: df.show()
Print the schema: df.printSchema()
Select only the name column: df.select("name").show()
Compute on columns: df.select(df("name"), df("age") + 1).show()
Filter rows: df.filter(df("age") > 21).show()

Group and count: df.groupBy("age").count().show()

1. Querying text data
import sqlContext.implicits._
case class Person(name: String, age: Int)
val people = sc.textFile("../examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")
2. Parquet files
val df = sqlContext.read.load("../examples/src/main/resources/users.parquet")
3. HDFS files

val df = sqlContext.read.load("hdfs://namenode.Hadoop:9000/user/hive/warehouse/spark_test.db/test_parquet/part-r-00001.gz.parquet")
4. Saving query results
val df = sqlContext.read.load("../examples/src/main/resources/users.parquet")

df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

IV. Performance tuning

Cache a table: sqlContext.cacheTable("tableName")

Uncache a table: sqlContext.uncacheTable("tableName")

Two relevant settings (see the sketch below for setting them at runtime):
spark.sql.inMemoryColumnarStorage.compressed (default: true): when true, Spark SQL automatically selects a compression codec for each column based on data statistics.
spark.sql.inMemoryColumnarStorage.batchSize (default: 10000): batch size for the columnar cache; larger batches improve memory utilization and compression, at the risk of OOMs.
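
Both properties can be changed programmatically before caching; a minimal sketch using the Spark 1.x SQLContext API shown in this article (the batch size value is illustrative):

// Hedged sketch: tune the in-memory columnar cache, then cache a table.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "20000") // illustrative
sqlContext.cacheTable("tableName")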

8. Given a keyword, how do I query with Spark SQL and return the result, search-engine style? How is the request sent to the cluster?

UPDATE m_data SET m_type = '1' WHERE (m_name LIKE '%食品%')
Just add the keyword as a LIKE condition; a client-side sketch follows below.
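
To send the request to the cluster programmatically rather than through Beeline, a plain JDBC client against the Thrift server from question 6 works. A hedged sketch: the host, port, table, and column names are illustrative, and the Hive JDBC driver must be on the classpath:

import java.sql.DriverManager

// Hedged sketch: submit a keyword query through the Spark SQL Thrift/JDBC server.
val conn = DriverManager.getConnection("jdbc:hive2://hadoop04:10000", "", "")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery("SELECT m_name, m_type FROM m_data WHERE m_name LIKE '%食品%'")
while (rs.next()) {
  println(rs.getString("m_name") + "\t" + rs.getString("m_type"))
}
rs.close(); stmt.close(); conn.close()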

9. How to insert data into MySQL with Spark SQL

Note: the PHP snippet below talks to MySQL directly and does not involve Spark; for the Spark-native route, see the sketch after it.

if (isset($_POST['submit']) && $_POST['submit'] == '提交') {
    // was the form submitted?
    $intext = $_POST['intext'];
    if ($intext != null || $intext != '') {
        $link = mysql_connect("localhost", "root", "123456");
        // database settings: host, user name, password
        mysql_select_db("szn_test");
        // choose the database to use
        $sql = "select * from demo where res = '".$intext."'";
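
For the Spark route, the DataFrameWriter JDBC API (available since Spark 1.4) writes a DataFrame straight into MySQL. A hedged sketch: the URL, credentials, source file, and table name are illustrative, and the MySQL JDBC driver must be on the classpath:

import java.util.Properties

// Hedged sketch: write a DataFrame into a MySQL table over JDBC (Spark 1.4+).
val props = new Properties()
props.setProperty("user", "root")       // illustrative credentials
props.setProperty("password", "123456")
val df = sqlContext.read.json("../examples/src/main/resources/people.json")
df.write.jdbc("jdbc:mysql://localhost:3306/szn_test", "people", props)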