
BigData


BMR


Hadoop

version

  • View version information

sudo -u hdfs hadoop version


ls

  • List files in HDFS

sudo -u hdfs hadoop fs -ls /    # list the files under the root directory


ls -R

  • Recursively list all files under a directory

sudo -u hdfs hadoop fs -ls -R /hdp/apps/


mkdir

  • Create a directory

hadoop fs -mkdir /testdir001

The first attempt failed with a permission error.


After switching to Hadoop's superuser 'hdfs', the command succeeded:

sudo -u hdfs hadoop fs -mkdir /testdir001

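Rather than prefixing every command with sudo -u hdfs, a common approach is to let the hdfs superuser create a per-user directory once and hand over its ownership. A minimal sketch, assuming the working user is root (the directory names are illustrative):

sudo -u hdfs hadoop fs -mkdir -p /user/root         # create the user's home directory in HDFS
sudo -u hdfs hadoop fs -chown root:root /user/root  # transfer ownership to the root user
hadoop fs -mkdir /user/root/testdir001              # later operations under /user/root no longer need sudo -u hdfs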

copyFromLocal

  • Upload a local file to an HDFS directory

hadoop fs -copyFromLocal /root/get.tar /tmp/


copyToLocal

  • Download a file from HDFS to the local filesystem

hadoop fs -copyToLocal /tmp/spark /root/HDFS_Download

cat

  • View the contents of a file

hadoop fs -cat /tmp/spark.txt

cp

  • Copy a file within HDFS

hadoop fs -cp /tmp/spark.txt /hdp/apps/spark.txt

get

  • Download a file directly to the current local directory

hadoop fs -get /tmp/spark.txt

-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>

put

-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>
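put uploads local files to HDFS, much like copyFromLocal. A hedged sketch mirroring the copyFromLocal example above (file and target paths reused from that section):

hadoop fs -put /root/get.tar /tmp/      # upload a local file into an HDFS directory
hadoop fs -put -f /root/get.tar /tmp/   # -f overwrites the target if it already exists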

mv
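For moving or renaming files inside HDFS, a minimal sketch (the target name is illustrative):

hadoop fs -mv /tmp/spark.txt /tmp/spark_renamed.txt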

rm
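For deleting files and directories, a minimal sketch (paths are illustrative; -r recurses into directories and -skipTrash bypasses the trash):

hadoop fs -rm /tmp/spark_renamed.txt
sudo -u hdfs hadoop fs -rm -r -skipTrash /testdir004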

du
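For checking space usage, a minimal sketch (-h prints human-readable sizes, -s summarizes a directory into a single total):

sudo -u hdfs hadoop fs -du -h /tmp/
sudo -u hdfs hadoop fs -du -s -h /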

fsck

  • Check the health of the filesystem starting from the root directory '/'

sudo -u hdfs hadoop fsck /

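fsck can also report per-file block details; a hedged sketch using the stock options listed in the fsck usage (-files, -blocks, -locations):

sudo -u hdfs hadoop fsck /testdir001 -files -blocks -locations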

jar

jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.

hadoop jar /usr/local/1.2.0/hadoop-mapreduce-examples-3.1.1.jar wordcount /testdir001 /testdir001/output

This command launches a Hadoop MapReduce job, namely the bundled "wordcount" example. Piece by piece:

hadoop jar: tells Hadoop to run a Java JAR file as a MapReduce application; hadoop is the command-line tool and jar indicates that a Java archive should be executed.

/usr/local/1.2.0/hadoop-mapreduce-examples-3.1.1.jar: the path of the JAR to run. It ships with the Hadoop distribution and contains several example MapReduce jobs, including "wordcount".

wordcount: the example job to run. It counts how many times each word occurs in the input text files.

/testdir001: the HDFS path of the input data; the job reads the files in this directory.

/testdir001/output: the HDFS path where the results are written.

Taken together, the command runs the "wordcount" example over the files under /testdir001 and writes the per-word counts to /testdir001/output. It is the standard demonstration of using Hadoop to process text data at scale.
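After the job finishes, the result can be inspected with the fs commands shown earlier. A minimal sketch (part-r-00000 is the usual name of the first reducer's output file and is assumed here):

sudo -u hdfs hadoop fs -ls /testdir001/output
sudo -u hdfs hadoop fs -cat /testdir001/output/part-r-00000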

distcp

  • Hadoop distributed copy (distcp)

hadoop distcp /testdir001 /testdir004

This is an example of Hadoop's distributed copy (distcp) command, which copies data from one Hadoop filesystem path to another. Piece by piece:

hadoop distcp: invokes the distributed copy operation.

/testdir001: the source directory whose data is copied.

/testdir004: the target directory the data is copied into.

Taken together, the command copies the contents of /testdir001 into /testdir004. distcp is commonly used for data migration, backup, and replication within a Hadoop cluster; because the copy runs as a distributed job, it handles large volumes of data efficiently and scales with the cluster.
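distcp also takes options for incremental and repeated copies; a hedged sketch using two standard flags:

hadoop distcp -update /testdir001 /testdir004      # copy only files missing or changed at the target
hadoop distcp -overwrite /testdir001 /testdir004   # overwrite files that already exist at the target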

job -status

hadoop job -status application_168708787897_003

The application ID (application_xxxxxx_xx) is obtained from the Hadoop Web UI.
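On recent Hadoop releases the hadoop job script is deprecated (it delegates to mapred job), and the same status can also be queried through YARN. A minimal sketch, reusing the application ID above:

yarn application -status application_168708787897_003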

job -list all

sudo -u hdfs hadoop job -list all


dfsadmin -report

sudo -u hdfs hadoop dfsadmin -report

  • View a capacity and status report for the HDFS cluster


[root@sdpmaster ~]# sudo -u hdfs hadoop dfsadmin -report
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.

Configured Capacity: 1924159562752 (1.75 TB)
Present Capacity: 1924159562752 (1.75 TB)
DFS Remaining: 1920849536000 (1.75 TB)
DFS Used: 3310026752 (3.08 GB)
DFS Used%: 0.17%
Replicated Blocks:
	Under replicated blocks: 209
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups:
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.17.0.74:50010 (sdpnode1.litsoft.com.cn)
Hostname: sdpnode1.litsoft.com.cn
Decommission Status : Normal
Configured Capacity: 962079781376 (896.01 GB)
DFS Used: 1655013376 (1.54 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 960424768000 (894.47 GB)
DFS Used%: 0.17%
DFS Remaining%: 99.83%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Wed Sep 06 10:45:58 CST 2023
Last Block Report: Wed Sep 06 09:33:49 CST 2023
Num of Blocks: 209


Name: 10.17.0.75:50010 (sdpnode2.litsoft.com.cn)
Hostname: sdpnode2.litsoft.com.cn
Decommission Status : Normal
Configured Capacity: 962079781376 (896.01 GB)
DFS Used: 1655013376 (1.54 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 960424768000 (894.47 GB)
DFS Used%: 0.17%
DFS Remaining%: 99.83%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Wed Sep 06 10:45:59 CST 2023
Last Block Report: Wed Sep 06 08:30:43 CST 2023
Num of Blocks: 209


Dead datanodes (1):

Name: 10.17.0.73:50010 (sdpmaster.litsoft.com.cn)
Hostname: sdpmaster.litsoft.com.cn
Decommission Status : Normal
Configured Capacity: 962079781376 (896.01 GB)
DFS Used: 1628323890 (1.52 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 960317241079 (894.37 GB)
DFS Used%: 0.17%
DFS Remaining%: 99.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Tue Sep 05 10:56:03 CST 2023
Last Block Report: Tue Sep 05 10:54:53 CST 2023
Num of Blocks: 0

dfsadmin -safemode

sudo -u hdfs hadoop dfsadmin -safemode enter

Usage: hdfs dfsadmin [-safemode enter | leave | get | wait | forceExit]


Yarn

version

yarn version

application -list

yarn application -list


application -list -appStates RUNNING

yarn application -list -appStates RUNNING


Kill

yarn application -kill ${Application-Id}


Logs
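The usual way to pull the aggregated container logs of an application is yarn logs; a minimal sketch, using the same Application-Id placeholder as above:

yarn logs -applicationId ${Application-Id}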

node -list

yarn node -list


Queue

yarn queue -status root


Hive

hive

sudo -u hive hive


0: jdbc:hive2://sdpmaster.litsoft.com.cn:2181> use sys;
INFO  : Compiling command(queryId=hive_20230906114933_37c568e8-f94a-4e09-acf6-8677eaaf6521): use sys
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20230906114933_37c568e8-f94a-4e09-acf6-8677eaaf6521); Time taken: 0.011 seconds
INFO  : Executing command(queryId=hive_20230906114933_37c568e8-f94a-4e09-acf6-8677eaaf6521): use sys
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230906114933_37c568e8-f94a-4e09-acf6-8677eaaf6521); Time taken: 0.013 seconds
INFO  : OK
No rows affected (0.043 seconds)
0: jdbc:hive2://sdpmaster.litsoft.com.cn:2181> show tables;
INFO  : Compiling command(queryId=hive_20230906114937_38a441a5-b9c6-4f1f-a1eb-88e3baa91958): show tables
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230906114937_38a441a5-b9c6-4f1f-a1eb-88e3baa91958); Time taken: 0.012 seconds
INFO  : Executing command(queryId=hive_20230906114937_38a441a5-b9c6-4f1f-a1eb-88e3baa91958): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230906114937_38a441a5-b9c6-4f1f-a1eb-88e3baa91958); Time taken: 0.017 seconds
INFO  : OK
+----------------------------+
|          tab_name          |
+----------------------------+
| bucketing_cols             |
| cds                        |
| columns_v2                 |
| database_params            |
| db_privs                   |
| db_version                 |
| dbs                        |
| funcs                      |
| global_privs               |
| key_constraints            |
| mv_creation_metadata       |
| mv_tables_used             |
| part_col_privs             |
| part_col_stats             |
| part_privs                 |
| partition_key_vals         |
| partition_keys             |
| partition_params           |
| partition_stats_view       |
| partitions                 |
| role_map                   |
| roles                      |
| sd_params                  |
| sds                        |
| sequence_table             |
| serde_params               |
| serdes                     |
| skewed_col_names           |
| skewed_col_value_loc_map   |
| skewed_string_list         |
| skewed_string_list_values  |
| skewed_values              |
| sort_cols                  |
| tab_col_stats              |
| table_params               |
| table_stats_view           |
| tbl_col_privs              |
| tbl_privs                  |
| tbls                       |
| version                    |
| wm_mappings                |
| wm_pools                   |
| wm_pools_to_triggers       |
| wm_resourceplans           |
| wm_triggers                |
+----------------------------+
45 rows selected (0.059 seconds)
0: jdbc:hive2://sdpmaster.litsoft.com.cn:2181> desc version;
INFO  : Compiling command(queryId=hive_20230906114947_f34c651f-cc28-429f-8d86-2a797ce85f2f): desc version
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230906114947_f34c651f-cc28-429f-8d86-2a797ce85f2f); Time taken: 0.115 seconds
INFO  : Executing command(queryId=hive_20230906114947_f34c651f-cc28-429f-8d86-2a797ce85f2f): desc version
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230906114947_f34c651f-cc28-429f-8d86-2a797ce85f2f); Time taken: 0.026 seconds
INFO  : OK
+------------------+------------+----------+
|     col_name     | data_type  | comment  |
+------------------+------------+----------+
| ver_id           | int        |          |
| schema_version   | string     |          |
| version_comment  | string     |          |
+------------------+------------+----------+
3 rows selected (0.224 seconds)
0: jdbc:hive2://sdpmaster.litsoft.com.cn:2181> select * from version limit 1;
INFO  : Compiling command(queryId=hive_20230906115022_0d6b7c25-9637-4f4c-8432-9425ab2e766f): select * from version limit 1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:version.ver_id, type:int, comment:null), FieldSchema(name:version.schema_version, type:string, comment:null), FieldSchema(name:version.version_comment, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230906115022_0d6b7c25-9637-4f4c-8432-9425ab2e766f); Time taken: 2.056 seconds
INFO  : Executing command(queryId=hive_20230906115022_0d6b7c25-9637-4f4c-8432-9425ab2e766f): select * from version limit 1
INFO  : Completed executing command(queryId=hive_20230906115022_0d6b7c25-9637-4f4c-8432-9425ab2e766f); Time taken: 0.051 seconds
INFO  : OK
+-----------------+-------------------------+--------------------------------+
| version.ver_id  | version.schema_version  |    version.version_comment     |
+-----------------+-------------------------+--------------------------------+
| 1               | 3.1.1000                | Hive release version 3.1.1000  |
+-----------------+-------------------------+--------------------------------+
1 row selected (2.47 seconds)
0: jdbc:hive2://sdpmaster.litsoft.com.cn:2181>

beeline

beeline -u jdbc:hive2://litsoft.com.cn:10000/default -n hive -p *Hive123
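beeline can also run a statement non-interactively with -e; a hedged sketch reusing the connection string above:

beeline -u jdbc:hive2://litsoft.com.cn:10000/default -n hive -p *Hive123 -e "show databases;"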

hbase shell

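A hedged sketch of launching the HBase shell and a couple of basic commands inside it (the sudo -u hbase pattern follows the rest of this post and may not be required in every deployment):

sudo -u hbase hbase shell
# inside the shell:
#   status   -- show cluster status
#   list     -- list tables
#   exit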

Spark-shell

sudo -u spark spark-shell


val textFile = spark.read.textFile("file:///root/superfile.txt")


Spark-sql

sudo -u spark spark-sql


pyspark

sudo -u spark pyspark

textFile = spark.read.text("file:///testdir001/superfile.txt")

textFile.first()

textFile.count()

Caused by: java.io.FileNotFoundException: File file:/testdir001/superfile.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.

The read fails because the file:// scheme points to the local filesystem, where /testdir001/superfile.txt does not exist; that path lives in HDFS, so it should be read without the file:// scheme (or with an hdfs:// URI).