There are two settings that control the number of retries (i.e. the maximum number of
ApplicationMaster registration attempts with YARN before the entire Spark application is considered failed):
spark.yarn.maxAppAttempts - Spark's own setting. See https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala#L50.
yarn.resourcemanager.am.max-attempts - YARN's own setting, with a default of 2.
As you can see in YarnRMClient.getMaxRegAttempts, the actual number is the minimum of the two settings, with YARN's being the last resort (i.e. the value used when Spark's setting is not set).
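The resolution can be sketched as follows (a minimal standalone sketch of the min-with-fallback logic, not the actual Spark source; `resolveMaxAttempts` is a hypothetical helper):

```scala
// Sketch: Spark's spark.yarn.maxAppAttempts is optional; YARN's
// yarn.resourcemanager.am.max-attempts (default 2) is the fallback,
// and Spark's value can never exceed YARN's.
def resolveMaxAttempts(sparkMaxAppAttempts: Option[Int], yarnAmMaxAttempts: Int): Int =
  sparkMaxAppAttempts match {
    case Some(sparkAttempts) => math.min(sparkAttempts, yarnAmMaxAttempts)
    case None                => yarnAmMaxAttempts
  }
```

For example, with spark.yarn.maxAppAttempts=4 and YARN's default of 2, the effective number of attempts is still 2; to actually get 4 attempts you must raise the YARN setting as well.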