
Big Data Bug Log

A record of simple bugs I have run into.

2019-11 to 2019-12

Cluster clocks out of sync

Error details

While running Spark, YARN reported the following error when launching a container:

Application application_1576029485466_0001 failed 2 times due to Error launching 
appattempt_1576029485466_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException:
Unauthorized request to start container.
This token is expired. current time is 1576059076270 found 1576030868681

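Solution

Synchronize the clocks on all cluster nodes. In the log above, the node's current time (1576059076270) is about 7.8 hours past the token's expiry timestamp (1576030868681), so the token is rejected as expired as soon as it arrives.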
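
To see how far apart the clocks are, the two timestamps from the error message can be subtracted directly. The following is a minimal sketch, not the procedure used at the time: the function names are hypothetical, and the ssh loop assumes passwordless ssh to the hosts passed in. The note does not say which tool was used to synchronize the clocks (an NTP client such as chrony or ntpd is one common choice).

```shell
#!/usr/bin/env bash
# Gap in seconds between the node's clock and the token's expiry.
# Both arguments are epoch milliseconds, as printed in the YARN error.
token_gap_seconds() {
  local current_ms=$1 expiry_ms=$2
  echo $(( (current_ms - expiry_ms) / 1000 ))
}

# Values copied from the error message above.
token_gap_seconds 1576059076270 1576030868681   # prints 28207 (about 7.8 hours)

# Print each node's clock next to the local one (hypothetical helper;
# assumes passwordless ssh to every host given as an argument).
show_node_clocks() {
  local host
  for host in "$@"; do
    echo "$host: $(ssh "$host" date +%s)  local: $(date +%s)"
  done
}
# Example: show_node_clocks dell-r720 dell-r730-4
```

A gap of more than a few seconds between nodes is enough reason to check the time synchronization service on each host.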

unable to create new native thread

Error details

While running Spark, the job failed with:

2019-12-11 11:34:58,366 INFO scheduler.DAGScheduler: ResultStage 0 (collect at SparkSharder.java:430) failed in 80.679 s due to Job aborted due to stage failure: Task 5428 in stage 0.0 failed 4 times, most recent failure: Lost task 5428.3 in stage 0.0 (TID 5440, dell-r730-4, executor 3): java.io.IOException: DestHost:destPort dell-r720:8020 , LocalHost:localPort dell-r730-4/10.0.0.14:0. Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.OutOfMemoryError: unable to create new native thread
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:808)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:907)
at sun.reflect.GeneratedMethodAccessor307.invoke(Unknown Source)

Solution

The cause is that the number of threads exceeded the per-user maximum set by ulimit -u. Raising that limit fixes it.
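
Before raising the limit, it may be worth confirming that it is actually being hit. On Linux the nproc limit counts threads as well as processes for a user, so the check below compares the user's thread count with the soft limit. This is a minimal sketch assuming a Linux host with the procps version of ps; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Threads owned by a user; ps -L prints one line per thread (LWP).
count_user_threads() {
  local user=${1:-$(id -un)}
  ps -L -u "$user" -o lwp= 2>/dev/null | wc -l
}

# Remaining headroom under the limit (hypothetical helper).
nproc_headroom() {
  local used=$1 limit=$2
  if [ "$limit" = "unlimited" ]; then
    echo "unlimited"
  else
    echo $(( limit - used ))
  fi
}

used=$(count_user_threads)
limit=$(ulimit -Su)   # soft limit on user processes/threads for this shell
echo "threads in use: $used, soft limit: $limit, headroom: $(nproc_headroom "$used" "$limit")"
```

Run it as the account that owns the Spark executors while the job is under load; a headroom near zero points at the nproc limit rather than at heap memory.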

# Edit limits.conf
$ vim /etc/security/limits.conf
...
* soft nofile 65536 # open file limit (changed earlier)
* hard nofile 65536

* hard nproc 65536 # process/thread limit (add these two lines)
* soft nproc 65536
...


# If that has no effect, also edit 20-nproc.conf (it caps the maximum number of threads)
$ vim /etc/security/limits.d/20-nproc.conf
# * soft nproc 65536 # change this line
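
Files under /etc/security/limits.d are read after limits.conf, so an nproc entry there can override the one added above, and the new values only apply to sessions started after the change. The sketch below lists every active nproc entry so it is clear which one wins, then prints the limits of the current shell. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print every active (non-comment) nproc entry from the given limits files,
# prefixed with the file it came from.
show_nproc_entries() {
  grep -HE '^[^#]*[[:space:]]nproc[[:space:]]' "$@" 2>/dev/null
}

show_nproc_entries /etc/security/limits.conf /etc/security/limits.d/*.conf

# Run this part in a fresh login session, since limits are applied at login.
echo "soft nproc: $(ulimit -Su)  hard nproc: $(ulimit -Hu)"
```

If the printed soft limit is still the old value after logging in again, a later file in limits.d is the likely culprit.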