當前位置:首頁 » 編程語言 » sparksqlwhere
擴展閱讀
webinf下怎麼引入js 2023-08-31 21:54:13
堡壘機怎麼打開web 2023-08-31 21:54:11

sparksqlwhere

發布時間: 2022-10-20 21:02:13

1. spark sql 支持子查詢嗎

支持子查詢
例如:
select ename,deptno,sal
from emp
where deptno=(select deptno from dept where loc='NEW YORK');

2. Spark SQL到底支持什麼SQL語句

試試看看spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\SQLParser.scala
scala語言不是很容易懂,但是裡面有解析SQL的方法,可以看出支持的SQL語句,至少關鍵詞是很明確的。

protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")

3. sparksql的truncate=false刪除表

truncate操作同沒有where條件的delete操作十分相似。
1、無論truncate大表還是小錶速度都非常快。delete要產生回滾信息來滿足回滾需求,而truncate是不產生的。
2、truncate是DDL語句進行隱式提交,不能進行回滾操作。
3、truncate重新設定表和索引的HWM(高水標記),由於全表掃描和索引快速掃描都要讀取所有的數據塊知道HWM為止。所以全表掃描的性能不會因為delete而提高,但是經過truncate操作後速度會很快。
4、truncate不觸發任何delete觸發器。
5、不能賦給某個用戶truncate其它用戶表的許可權。如果需要trucate其它用戶表的許可權必須對該用戶賦DROP ANY TABLE許可權。
6、當表被truncate後,這個表和索引所佔用的空間會恢復到初始大小,而delete操作不會減少表或索引所佔用的空間。
7、不能truncate一個帶有外鍵的表,如果要刪除首先要取消外鍵,然後再刪除。

4. 為什麼SparkSQL不支持子查詢

執行效率低 例:select * from table where a not in(select a from tableb) 如果子查詢包括一個比較大的結果集,就不建議使用。

5. spark sql怎麼去獲取hive 表一定日期范圍內的數據

select orderid,fenjian,timee
from
(
select orderid,fenjian,timee,row_number(orderid,fenjian) rn
from (
select orderid,fenjian,timee from tableName
distribute by orderid,fenjian sort by orderid,fenjian,timee asc
) t1
) t2
where t2.rn=1

6. 如何使用Spark SQL 的JDBC server

運行環境
集群環境:CDH5.3.0

具體JAR版本如下:
spark版本:1.2.0-cdh5.3.0
hive版本:0.13.1-cdh5.3.0
hadoop版本:2.5.0-cdh5.3.0
啟動 JDBC server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod- -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008

Connecting to the JDBC server with Beeline
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000

[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>

Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual,but here, we show a few common operations.

CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

create table mytable(name string,addr string,status string) row format delimited fields terminated by '#'

#載入本地文件
load data local inpath '/external/tmp/data.txt' into table mytable

#載入hdfs文件
load data inpath 'hdfs://ju51nn/external/tmp/data.txt' into table mytable;

describe mytable;

explain select * from mytable where name = '張三'

select * from mytable where name = '張三'

cache table mytable

select count(*) total,count(distinct addr) num1,count(distinct status) num2 from mytable where addr='gz';

uncache table mytable

使用數據示例
張三#廣州#學生
李四#貴州#教師
王五#武漢#講師
趙六#成都#學生
lisa#廣州#學生
lily#gz#studene

Standalone Spark SQL Shell
Spark SQL also supports a simple shell you can use as a single process: spark-sql
它主要用於本地的開發環境,在共享集群環境中,請使用JDBC SERVER

cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql

7. Spark SQL到底支持什麼SQL語句

scala語言不是很容易懂,但是裡面有解析SQL的方法,可以看出支持的SQL語句,至少關鍵詞是很明確的。
protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")

8. 如何使用 Spark SQL

一、啟動方法
/data/spark-1.4.0-bin-cdh4/bin/spark-sql --master spark://master:7077 --total-executor-cores 10 --executor-memory 1g --executor-cores 2

註:/data/spark-1.4.0-bin-cdh4/為spark的安裝路徑

/data/spark-1.4.0-bin-cdh4/bin/spark-sql –help 查看啟動選項

--master MASTER_URL 指定master url
--executor-memory MEM 每個executor的內存,默認為1G
--total-executor-cores NUM 所有executor的總核數
-e <quoted-query-string> 直接執行查詢SQL

-f <filename> 以文件方式批量執行SQL

二、Spark sql對hive支持的功能

1、查詢語句:SELECT GROUP BY ORDER BY CLUSTER BY SORT BY
2、hive操作運算:
1) 關系運算:= ==, <>, <, >, >=, <=
2) 算術運算:+, -, *, /, %
3) 邏輯運算:AND, &&, OR, ||
4) 復雜的數據結構
5) 數學函數:(sign, ln, cos, etc)
6) 字元串函數:
3、 UDF
4、 UDAF

5、 用戶定義的序列化格式
6、join操作:JOIN {LEFT|RIGHT|FULL} OUTER JOIN LEFT SEMI JOIN CROSS JOIN
7、 unions操作:
8、 子查詢: SELECT col FROM ( SELECT a + b AS col from t1) t2
9、Sampling
10、 Explain
11、 分區表
12、 視圖
13、 hive ddl功能:CREATE TABLE、CREATE TABLE AS SELECT、ALTER TABLE

14、 支持的數據類型:TINYINT SMALLINT INT BIGINT BOOLEAN FLOAT DOUBLE STRING BINARY TIMESTAMPDATE ARRAY MAP STRUCT

三、Spark sql 在客戶端編程方式進行查詢數據
1、啟動spark-shell
./spark-shell --master spark://master:7077 --total-executor-cores 10 --executor-memory 1g --executor-cores 2
2、編寫程序
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("../examples/src/main/resources/people.json")
查看所有數據:df.show()
查看錶結構:df.printSchema()
只看name列:df.select("name").show()
對數據運算:df.select(df("name"), df("age") + 1).show()
過濾數據:df.filter(df("age") > 21).show()

分組統計:df.groupBy("age").count().show()

1、查詢txt數據
import sqlContext.implicits._
case class Person(name: String, age: Int)
val people = sc.textFile("../examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")
2、parquet文件
val df = sqlContext.read.load("../examples/src/main/resources/users.parquet")
3、hdfs文件

val df = sqlContext.read.load("hdfs://namenode.Hadoop:9000/user/hive/warehouse/spark_test.db/test_parquet/part-r-00001.gz.parquet")
4、保存查詢結果數據
val df = sqlContext.read.load("../examples/src/main/resources/users.parquet")

df.select("name", "favorite_color").write.save("namesAndFavColors.parquet「)

四、Spark sql性能調優

緩存數據表:sqlContext.cacheTable("tableName")

取消緩存表:sqlContext.uncacheTable("tableName")

spark.sql.inMemoryColumnarStorage.compressedtrue當設置為true時,Spark SQL將為基於數據統計信息的每列自動選擇一個壓縮演算法。
spark.sql.inMemoryColumnarStorage.batchSize10000柱狀緩存的批數據大小。更大的批數據可以提高內存的利用率以及壓縮效率,但有OOMs的風險

9. 給定關鍵詞,怎麼使用spark sql進行查詢,然後返回查詢的結果。就跟搜索引擎類似,怎麼將請求發送給集群

UPDATE m_data SET m_type = '1' WHERE (m_name LIKE '%食品%')
加上條件就行了

10. 如何使用sparksql向mysql中插入數據

f(isset($_POST['submit'])&&$_POST['submit']=='提交'){
3 //判斷是否是提交過來的
4 $intext = $_POST['intext'];
5 if($intext!=null||$intext!=''){
6 $link = mysql_connect("localhost", "root", "123456");
7 //資料庫配置信息 第一個參數資料庫位置第二個是用戶名第三個是密碼
8 mysql_select_db("szn_test");
9 //設置要使用的資料庫
10 $sql = "select * from demo where res = '".$intext."'";