Jan Zyka - 5 months ago
Java Question

Should I call ugi.checkTGTAndReloginFromKeytab() before every action on hadoop?

In my server application I'm connecting to Kerberos secured Hadoop cluster from my java application. I'm using various components like the HDFS file system, Oozie, Hive etc. On the application startup I do call

UserGroupInformation.loginUserFromKeytabAndReturnUGI( ... );

This returns me a UserGroupInformation instance and I keep it for the application lifetime. When doing privileged actions I launch them with ugi.doAs(...).
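A minimal sketch of that startup pattern might look like the following. The principal name and keytab path are hypothetical placeholders, and running it of course requires a Kerberos-secured cluster on the classpath and network:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHadoopClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // One-time login at application startup; the UGI is kept for the
        // lifetime of the process. Principal and keytab path are examples.
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "myservice@EXAMPLE.COM", "/etc/security/keytabs/myservice.keytab");

        // Privileged actions run inside doAs so they carry the Kerberos
        // credentials of the logged-in principal.
        boolean exists = ugi.doAs(
                (PrivilegedExceptionAction<Boolean>) () ->
                        FileSystem.get(conf).exists(new Path("/tmp")));
        System.out.println("/tmp exists: " + exists);
    }
}
```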

This works fine but I wonder if and when I should renew the Kerberos ticket in the UserGroupInformation? I found a method checkTGTAndReloginFromKeytab() which seems to do the ticket renewal whenever it's close to expiry. I also found that this method is being called by various Hadoop tools, like WebHdfsFileSystem for example.

Now if I want my server application (possibly running for months or even years) to never experience ticket expiry, what is the best approach? To provide concrete questions:

  1. Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?

  2. Should I ever call checkTGTAndReloginFromKeytab myself in my code?

  3. If so, should I do that before every single call to ugi.doAs(...) or rather set up a timer and call it periodically (and how often)?


Hadoop committer here! This is an excellent question.

Unfortunately, it's difficult to give a definitive answer to this without a deep dive into the particular usage patterns of the application. Instead, I can offer general guidelines and describe when Hadoop would handle ticket renewal or re-login from a keytab automatically for you, and when it wouldn't.

The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Most of the daemon processes in the Hadoop ecosystem handle this by doing a single one-time call to UserGroupInformation#loginUserFromKeytab at process startup. Examples of this include the HDFS DataNode, which must authenticate its RPC calls to the NameNode, and the YARN NodeManager, which must authenticate its calls to the ResourceManager. How is it that daemons like the DataNode can do a one-time login at process startup and then keep on running for months, long past typical ticket expiration times?

Since this is such a common use case, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:

          // try re-login
          if (UserGroupInformation.isLoginKeytabBased()) {
            UserGroupInformation.getLoginUser().reloginFromKeytab();
          } else if (UserGroupInformation.isLoginTicketBased()) {
            UserGroupInformation.getLoginUser().reloginFromTicketCache();
          }
You can think of this as "lazy evaluation" of re-login. It only re-executes login in response to an authentication failure on an attempted RPC connection.

Knowing this, we can give a partial answer. If your application's usage pattern is to login from a keytab and then perform typical Hadoop RPC calls, then you likely do not need to roll your own re-login code. The RPC client layer will do it for you. "Typical Hadoop RPC" means the vast majority of Java APIs for interacting with Hadoop, including the HDFS FileSystem API, the YarnClient and MapReduce Job submissions.

However, some application usage patterns do not involve Hadoop RPC at all. An example of this would be applications that interact solely with Hadoop's REST APIs, such as WebHDFS or the YARN REST APIs. In that case, the authentication model uses Kerberos via SPNEGO as described in the Hadoop HTTP Authentication documentation.

Knowing this, we can add more to our answer. If your application's usage pattern does not utilize Hadoop RPC at all, and instead sticks solely to the REST APIs, then you must roll your own re-login logic. This is exactly why WebHdfsFileSystem calls UserGroupInformation#checkTGTAndReloginFromKeytab, just like you noticed. WebHdfsFileSystem chooses to make the call right before every operation. This is a fine strategy, because UserGroupInformation#checkTGTAndReloginFromKeytab only renews the ticket if it's "close" to expiration. Otherwise, the call is a no-op.
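To make the REST-only pattern concrete, here is a rough sketch of a client that refreshes the TGT before each WebHDFS call. The class name, base URL, and port are hypothetical, and real SPNEGO negotiation is only hinted at in a comment:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.security.UserGroupInformation;

public class WebHdfsClient {
    private final UserGroupInformation ugi;
    private final String base; // e.g. "http://namenode:50070" (hypothetical)

    public WebHdfsClient(UserGroupInformation ugi, String base) {
        this.ugi = ugi;
        this.base = base;
    }

    // Pure helper: build a WebHDFS REST URL for an operation on a path.
    static String opUrl(String base, String path, String op) {
        return base + "/webhdfs/v1" + path + "?op=" + op;
    }

    public int getFileStatus(String path) throws IOException {
        // REST calls bypass the RPC client's automatic re-login, so refresh
        // the TGT explicitly first; this is a no-op unless the ticket is
        // close to expiration.
        ugi.checkTGTAndReloginFromKeytab();

        HttpURLConnection conn = (HttpURLConnection)
                new URL(opUrl(base, path, "GETFILESTATUS")).openConnection();
        // A production client would also perform SPNEGO negotiation here,
        // e.g. with org.apache.hadoop.security.authentication.client.AuthenticatedURL.
        return conn.getResponseCode();
    }
}
```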

As a final use case, let's consider an interactive process, not logging in from a keytab, but rather requiring the user to run kinit externally before launching the application. In the vast majority of cases, these are going to be short-running applications, such as Hadoop CLI commands. However, in some cases these can be longer-running processes. To support longer-running processes, Hadoop starts a background thread to renew the Kerberos ticket "close" to expiration. This logic is visible in UserGroupInformation#spawnAutoRenewalThreadForUserCreds. There is an important distinction here though compared to the automatic re-login logic provided in the RPC layer. In this case, Hadoop only has the capability to renew the ticket and extend its lifetime. Tickets have a maximum renewable lifetime, as dictated by the Kerberos infrastructure. After that, the ticket won't be usable anymore. Re-login in this case is practically impossible, because it would imply re-prompting the user for a password, and they likely walked away from the terminal. This means that if the process keeps running beyond expiration of the ticket, it won't be able to authenticate anymore.

Again, we can use this information to inform our overall answer. If you rely on a user to login interactively via kinit before launching the application, and if you're confident the application won't run longer than the Kerberos ticket's maximum renewable lifetime, then you can rely on Hadoop internals to cover periodic renewal for you.

If you're using keytab-based login, and you're just not sure if your application's usage pattern can rely on the Hadoop RPC layer's automatic re-login, then the conservative approach is to roll your own. @SamsonScharfrichter gave an excellent answer here about rolling your own.

HBase Kerberos connection renewal strategy

Finally, I should add a note about API stability. The Apache Hadoop Compatibility guidelines discuss the Hadoop development community's commitment to backwards-compatibility in full detail. The interface of UserGroupInformation is annotated LimitedPrivate and Evolving. Technically, this means the API of UserGroupInformation is not considered public, and it could evolve in backwards-incompatible ways. As a practical matter, there is a lot of code already depending on the interface of UserGroupInformation, so it's simply not feasible for us to make a breaking change. Certainly within the current 2.x release line, I would not have any fear about method signatures changing out from under you and breaking your code.

Now that we have all of this background information, let's revisit your concrete questions.

Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?

You can rely on this if your application's usage pattern is to call the Hadoop clients, which in turn utilize Hadoop's RPC framework. You cannot rely on this if your application's usage pattern only calls the Hadoop REST APIs.

Should I ever call checkTGTAndReloginFromKeytab myself in my code?

You'll likely need to do this if your application's usage pattern is solely to call the Hadoop REST APIs instead of Hadoop RPC calls. You would not get the benefit of the automatic re-login implemented inside Hadoop's RPC client.

If so, should I do that before every single call to ugi.doAs(...) or rather set up a timer and call it periodically (and how often)?

It's fine to call UserGroupInformation#checkTGTAndReloginFromKeytab right before every action that needs to be authenticated. If the ticket is not close to expiration, then the method will be a no-op. If you're suspicious that your Kerberos infrastructure is sluggish, and you don't want client operations to pay the latency cost of re-login, then that would be a reason to do it in a separate background thread. Just be sure to stay a little bit ahead of the ticket's actual expiration time. You might borrow the logic inside UserGroupInformation for determining if a ticket is "close" to expiration. In practice, I've never personally seen the latency of re-login be problematic.
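If you do go with a background thread, the scheduling logic is small. Here is one possible sketch using plain java.util.concurrent; the class name is made up, and the 0.80 factor is an assumption modeled on the renew window UserGroupInformation itself uses to decide when a ticket is "close" to expiration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TgtRefreshScheduler {
    // Fire after 80% of the ticket lifetime has elapsed, so the refresh
    // stays comfortably ahead of actual expiration.
    static long refreshPeriodMillis(long ticketLifetimeMillis) {
        return (long) (ticketLifetimeMillis * 0.80f);
    }

    public static ScheduledExecutorService start(Runnable relogin, long ticketLifetimeMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // don't keep the JVM alive just for this thread
            return t;
        });
        long period = refreshPeriodMillis(ticketLifetimeMillis);
        scheduler.scheduleAtFixedRate(relogin, period, period, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The relogin Runnable would wrap the checked exception, e.g. `() -> { try { ugi.checkTGTAndReloginFromKeytab(); } catch (IOException e) { /* log and retry on the next tick */ } }`. Since the underlying call is a no-op when the ticket is still fresh, firing a bit too often is harmless.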