HDFS Transparent Encryption protects Hadoop data that’s at rest on disk. When the encryption is enabled for a cluster, data write and read operations on encrypted zones (HDFS directories) on the disk are automatically encrypted and decrypted. This process is “transparent” because it’s invisible to the application working with the data. HDFS Transparent Encryption does not affect user access to Hadoop data, although it can have a minor impact on performance.
Prerequisite
The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.
Important:
Security Setup must be enabled when creating the cluster. The person creating the cluster must choose the Security Setup: Enabled option on the Security page of the Create Cluster wizard, as described in Creating a Cluster. You can’t enable Kerberos for a cluster after it’s been created.
When you create a cluster with Security Setup enabled, the following takes place:
- HDFS Transparent Encryption is enabled on the cluster. You can verify this by entering the following at the command line:
bdacli getinfo cluster_hdfs_transparent_encryption_enabled
- MIT Kerberos, Sentry, Network Firewall, Network Encryption, and Auditing are also enabled on the cluster.
- Two principals are created as part of the Kerberos configuration:
 - `hdfs/clustername@BDACLOUDSERVICE.ORACLE.COM` — The password for authenticating this principal is your Cloudera admin password.
 - `oracle/clustername@BDACLOUDSERVICE.ORACLE.COM` — The password for authenticating this principal is your Oracle operating system password.
 In both cases, `clustername` is the name of your cluster and `BDACLOUDSERVICE.ORACLE.COM` is the Kerberos realm for Oracle Big Data Cloud Machine.
- A Key Trustee Server is installed and configured on the cluster. This server is used for managing keys and certificates for HDFS Transparent Encryption. See Cloudera Navigator Key Trustee Server for more information about this server. (You should back up Key Trustee Server databases and configuration files on a regular schedule. See the Cloudera documentation topic, Backing Up and Restoring Key Trustee Server.)
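As a sketch, the two principal names above can be assembled as follows. The cluster name is a placeholder, and the `kinit` call is shown commented because it requires the cluster's KDC:

```shell
# Hypothetical cluster name; the realm is fixed for Oracle Big Data Cloud Machine.
CLUSTERNAME=mycluster
HDFS_PRINCIPAL="hdfs/${CLUSTERNAME}@BDACLOUDSERVICE.ORACLE.COM"
ORACLE_PRINCIPAL="oracle/${CLUSTERNAME}@BDACLOUDSERVICE.ORACLE.COM"
echo "$HDFS_PRINCIPAL"
echo "$ORACLE_PRINCIPAL"

# Obtaining a ticket requires the cluster's KDC, so the call is only shown here:
#   kinit "$HDFS_PRINCIPAL"   # prompts for the Cloudera admin password
```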
#Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X
[AZURE.SELECTOR]
Secure Shell (SSH) allows you to remotely perform operations on your Linux-based HDInsight clusters using a command-line interface. This document provides information on using SSH with HDInsight from Linux, Unix, or OS X clients.
[AZURE.NOTE] The steps in this article assume you are using a Linux, Unix, or OS X client. These steps may be performed on a Windows-based client if you have installed a package that provides `ssh` and `ssh-keygen`, such as Bash on Ubuntu on Windows. If you do not have SSH installed on your Windows-based client, see Use SSH with Linux-based HDInsight (Hadoop) from Windows for information on installing and using PuTTY.
##Prerequisites
- ssh-keygen and ssh for Linux, Unix, and OS X clients. These utilities are usually provided with your operating system, or are available through its package management system.
- A modern web browser that supports HTML5.
OR
- Azure CLI. [AZURE.INCLUDE use-latest-version]
##What is SSH?
SSH is a utility for logging in to a remote server and executing commands on it remotely. With Linux-based HDInsight, SSH establishes an encrypted connection to the cluster headnode and provides a command line that you use to enter commands. Commands are then executed directly on the server.
###SSH user name
An SSH user name is the name you use to authenticate to the HDInsight cluster. When you specify an SSH user name during cluster creation, this user is created on all nodes in the cluster. Once the cluster is created, you can use this user name to connect to the HDInsight cluster headnodes. From the headnodes, you can then connect to the individual worker nodes.
###SSH password or Public key
An SSH user can use either a password or public key for authentication. A password is just a string of text you make up, while a public key is part of a cryptographic key pair generated to uniquely identify you.
A key is more secure than a password; however, it requires additional steps to generate the key, and you must keep the files containing the key in a secure location. If anyone gains access to the key files, they gain access to your account. If you lose the key files, you will not be able to log in to your account.
A key pair consists of a public key (which is sent to the HDInsight server) and a private key (which is kept on your client machine). When you connect to the HDInsight server using SSH, the SSH client uses the private key on your machine to authenticate with the server.
##Create an SSH key
Use the following information if you plan on using SSH keys with your cluster. If you plan on using a password, you can skip this section.
- Open a terminal session and list the contents of the `~/.ssh` directory to see if you have any existing SSH keys. Look for the following files in the directory listing. These are common names for public SSH keys.
- id_dsa.pub
- id_ecdsa.pub
- id_ed25519.pub
- id_rsa.pub
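As a minimal sketch, the check above can be done by listing `~/.ssh` (on a fresh client the directory may not exist yet, which also means you have no keys):

```shell
# List any existing SSH key files; a missing directory just means no keys yet.
ls -la ~/.ssh 2>/dev/null || echo "No ~/.ssh directory found"
```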
- If you do not want to use an existing file, or you have no existing SSH keys, use `ssh-keygen` to generate a new key pair. You will be prompted for the following information:
- The file location - The location defaults to ~/.ssh/id_rsa.
- A passphrase - You will be prompted to re-enter this. [AZURE.NOTE] We strongly recommend that you use a secure passphrase for the key. However, if you forget the passphrase, there is no way to recover it.
After the command finishes, you will have two new files, the private key (for example, id_rsa) and the public key (for example, id_rsa.pub).
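The generation step above can be sketched as follows. The scratch file path and the empty passphrase (`-N ''`) are illustrative choices so the example runs unattended; for a real key, accept the default `~/.ssh/id_rsa` location and use a secure passphrase:

```shell
# Generate an RSA key pair non-interactively into a scratch location.
rm -f /tmp/example_id_rsa /tmp/example_id_rsa.pub
ssh-keygen -t rsa -b 2048 -f /tmp/example_id_rsa -N '' -q

# Two files result: the private key and the public key.
ls /tmp/example_id_rsa /tmp/example_id_rsa.pub
```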
##Create a Linux-based HDInsight cluster
When creating a Linux-based HDInsight cluster, you must provide the public key created previously. From Linux, Unix, or OS X clients, there are two ways to create an HDInsight cluster:
- Azure Portal - Uses a web-based portal to create the cluster.
- Azure CLI for Mac, Linux and Windows - Uses command-line commands to create the cluster.
Each of these methods will require either a password or a public key. For complete information on creating a Linux-based HDInsight cluster, see Provision Linux-based HDInsight clusters.
###Azure Portal
When using the Azure Portal to create a Linux-based HDInsight cluster, you must enter an SSH USER NAME, and select to enter a PASSWORD or SSH PUBLIC KEY.
If you select SSH PUBLIC KEY, you can either paste the public key (contained in the file with the .pub extension) into the SSH PublicKey field, or select Select a file to browse and select the public key file.
[AZURE.NOTE] The key file is simply a text file. The contents should appear similar to the following:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCelfkjrpYHYiks4TM+r1LVsTYQ4jAXXGeOAF9Vv/KGz90pgMk3VRJk4PEUSELfXKxP3NtsVwLVPN1l09utI/tKHQ6WL3qy89WVVVLiwzL7tfJ2B08Gmcw8mC/YoieT/YG+4I4oAgPEmim+6/F9S0lU2I2CuFBX9JzauX8n1Y9kWzTARST+ERx2hysyA5ObLv97Xe4C2CQvGE01LGAXkw2ffP9vI+emUM+VeYrf0q3w/b1o/COKbFVZ2IpEcJ8G2SLlNsHWXofWhOKQRi64TMxT7LLoohD61q2aWNKdaE4oQdiuo8TGnt4zWLEPjzjIYIEIZGk00HiQD+KCB5pxoVtp user@system
This creates a login for the specified user by using the password or public key you provide.
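One way to sanity-check a `.pub` file before pasting it into the portal is to ask `ssh-keygen` for its fingerprint, which fails loudly on anything that is not a valid public key. The throwaway key below exists only so the check has something to inspect:

```shell
# Create a throwaway key pair for demonstration purposes.
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t rsa -b 2048 -f /tmp/demo_key -N '' -q

# Print the key's fingerprint; a non-key file would cause an error here.
ssh-keygen -l -f /tmp/demo_key.pub
```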
###Azure Command-Line Interface for Mac, Linux and Windows
You can use the Azure CLI for Mac, Linux and Windows to create a new cluster by using the `azure hdinsight cluster create` command. For more information on using this command, see Provision Hadoop Linux clusters in HDInsight using custom options.
##Connect to a Linux-based HDInsight cluster
From a terminal session, use the SSH command to connect to the cluster headnode by providing the address and user name:
- SSH address - There are two addresses that may be used to connect to a cluster using SSH:
- Connect to the headnode: The cluster name, followed by -ssh.azurehdinsight.net. For example, mycluster-ssh.azurehdinsight.net.
- Connect to the edge node: If your cluster is R Server on HDInsight, the cluster will also contain an edge node that can be accessed using RServer.CLUSTERNAME.ssh.azurehdinsight.net, where CLUSTERNAME is the name of the cluster.
- User name - The SSH user name you provided when you created the cluster.
The following example will connect to the primary headnode of mycluster as the user me:
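As a sketch, the address for that connection is assembled as follows. The cluster and user names are placeholders, and the actual `ssh` call is shown commented because it requires a live cluster:

```shell
# Placeholder cluster and user names; replace with your own.
CLUSTERNAME=mycluster
SSHUSER=me
HEADNODE_ADDR="${SSHUSER}@${CLUSTERNAME}-ssh.azurehdinsight.net"
echo "$HEADNODE_ADDR"

# To connect (requires a live cluster):
#   ssh "$HEADNODE_ADDR"
```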
If you used a password for the user account, you will be prompted to enter the password.
If you used an SSH key that is secured with a passphrase, you will be prompted to enter the passphrase. Otherwise, SSH will attempt to automatically authenticate by using one of the local private keys on your client.
[AZURE.NOTE] If SSH does not automatically authenticate with the correct private key, use the `-i` parameter to specify the path to the private key. The following example loads the private key from `~/.ssh/id_rsa`:

ssh -i ~/.ssh/id_rsa me@mycluster-ssh.azurehdinsight.net
If you are connecting using the address for the headnode and no port is specified, SSH defaults to port 22, which connects to the primary headnode of the HDInsight cluster. If you use port 23, you connect to the secondary headnode. For more information on the headnodes, see Availability and reliability of Hadoop clusters in HDInsight.
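The port selection can be sketched as follows; the names are placeholders, and the commands are assembled rather than run because they need a live cluster:

```shell
CLUSTERNAME=mycluster
SSHUSER=me
# Port 22 reaches the primary headnode; port 23 reaches the secondary headnode.
PRIMARY_CMD="ssh -p 22 ${SSHUSER}@${CLUSTERNAME}-ssh.azurehdinsight.net"
SECONDARY_CMD="ssh -p 23 ${SSHUSER}@${CLUSTERNAME}-ssh.azurehdinsight.net"
echo "$PRIMARY_CMD"
echo "$SECONDARY_CMD"
```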
###Connect to worker nodes
The worker nodes are not directly accessible from outside the Azure datacenter, but they can be accessed from the cluster headnode via SSH.
If you use an SSH key to authenticate your user account, you must complete the following steps on your client:
- Using a text editor, open `~/.ssh/config`. If this file doesn't exist, you can create it by entering `touch ~/.ssh/config` in the terminal.
- Add the following to the file, replacing CLUSTERNAME with the name of your HDInsight cluster. This configures SSH agent forwarding for your HDInsight cluster.
- Test SSH agent forwarding by listing the identities loaded into the agent from the terminal (for example, with `ssh-add -L`). If key information is returned, the agent is running. If nothing is returned, ssh-agent is not running. Consult your operating system documentation for specific steps on installing and configuring ssh-agent, or see Using ssh-agent with ssh.
- Once you have verified that ssh-agent is running, add your SSH private key to the agent with `ssh-add ~/.ssh/id_rsa`. If your private key is stored in a different file, replace `~/.ssh/id_rsa` with the path to the file.
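Pulled together, the client-side setup above can be sketched as follows. The `Host` entry is an assumption based on standard OpenSSH agent forwarding, and `mycluster` is a placeholder cluster name:

```shell
CLUSTERNAME=mycluster

# 1. Add an agent-forwarding entry for the cluster headnode.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<EOF
Host ${CLUSTERNAME}-ssh.azurehdinsight.net
  ForwardAgent yes
EOF

# 2. Make sure an ssh-agent is available, then list loaded identities.
eval "$(ssh-agent -s)" > /dev/null
ssh-add -L || true   # prints "The agent has no identities." when empty

# 3. Add your private key to the agent, if one exists at the default path.
[ -f ~/.ssh/id_rsa ] && ssh-add ~/.ssh/id_rsa || true
```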
Use the following steps to connect to the worker nodes for your cluster.
[AZURE.IMPORTANT] If you use an SSH key to authenticate your account, you must complete the previous steps to verify that agent forwarding is working.
- Connect to the HDInsight cluster by using SSH as described previously.
- Once you are connected, use the following to retrieve a list of the nodes in your cluster. Replace ADMINPASSWORD with the password for your cluster admin account, and CLUSTERNAME with the name of your cluster. This returns information in JSON format for the nodes in the cluster, including `host_name`, which contains the fully qualified domain name (FQDN) for each node.
- Once you have a list of the worker nodes you want to connect to, use the following command from the SSH session to the server to open a connection to a worker node. Replace USERNAME with your SSH user name and FQDN with the FQDN for the worker node, for example `workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net`. [AZURE.NOTE] If you use a password to authenticate your SSH session, you will be prompted to enter the password again. If you use an SSH key, the connection should finish without any prompts.
- Once the session has been established, the terminal prompt changes from `username@hn#-clustername` to `username@wk#-clustername` to indicate that you are connected to the worker node. Any commands you run at this point will run on the worker node.
- Once you have finished performing actions on the worker node, use the `exit` command to close the session to the worker node. This returns you to the `username@hn#-clustername` prompt.
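Since these steps need a live cluster, the sketch below only assembles the commands. The REST path is the standard Ambari v1 hosts resource; the cluster name, password, user name, and worker-node FQDN are placeholders:

```shell
CLUSTERNAME=mycluster
ADMINPASSWORD='cluster-admin-password'   # placeholder
SSHUSER=me

# 1. From the headnode session: list cluster hosts (host_name FQDNs) via Ambari.
echo "curl -u admin:${ADMINPASSWORD} https://${CLUSTERNAME}.azurehdinsight.net/api/v1/clusters/${CLUSTERNAME}/hosts"

# 2. Open a connection to a worker node by its FQDN (example value below).
FQDN=workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net
echo "ssh ${SSHUSER}@${FQDN}"

# 3. When finished on the worker node, 'exit' returns you to the headnode.
```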
##Connect to a Domain-joined HDInsight cluster
Domain-joined HDInsight integrates Kerberos with Hadoop in HDInsight. Because the SSH user is not an Active Directory domain user, this user account cannot run Hadoop commands directly from the SSH shell on a domain-joined cluster. You must run kinit first.
###To run Hive queries on a Domain-joined HDInsight cluster using SSH
- Connect to a Domain-joined HDInsight cluster using SSH. For instructions, see Connect to a Linux-based HDInsight cluster.
- Run kinit. It will ask you for a domain user name and domain user password. For more information on configuring domain users for domain-joined HDInsight clusters, see Configure Domain-joined HDInsight clusters.
- Open the Hive console by entering `hive`. Then you can run Hive commands.
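The session on a domain-joined cluster boils down to two commands. They are only echoed in this sketch because they require a live cluster and a domain user:

```shell
# The two commands to run inside the SSH session, in order.
STEP1="kinit"   # obtains a Kerberos ticket; prompts for domain credentials
STEP2="hive"    # opens the Hive console once the ticket is in place
echo "$STEP1"
echo "$STEP2"
```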
##Add more accounts
- Generate a new public key and private key for the new user account, as described in the Create an SSH key section. [AZURE.NOTE] The private key should either be generated on a client that the user will use to connect to the cluster, or securely transferred to such a client after creation.
- From an SSH session to the cluster, add the new user with the following command. This will create a new user account, but will disable password authentication.
- Create the directory and files to hold the key by using the following commands:
- When the nano editor opens, copy and paste in the contents of the public key for the new user account. Finally, use Ctrl-X to save the file and exit the editor.
- Use the following command to change ownership of the .ssh folder and contents to the new user account:
- You should now be able to authenticate to the server with the new user account and private key.
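A sketch of the steps above, run with root privileges on the headnode. The account name `newuser` is hypothetical, and the `adduser --disabled-password` flag is an assumption based on the Debian/Ubuntu user tools:

```shell
# 1. Create the account with password authentication disabled
#    (--gecos "" suppresses the interactive full-name prompts).
adduser --disabled-password --gecos "" newuser

# 2. Create the directory and file that hold the authorized public key.
mkdir -p /home/newuser/.ssh
touch /home/newuser/.ssh/authorized_keys
# (paste the new user's public key into authorized_keys, e.g. with nano)

# 3. sshd requires restrictive permissions and correct ownership.
chmod 700 /home/newuser/.ssh
chmod 600 /home/newuser/.ssh/authorized_keys
chown -R newuser:newuser /home/newuser/.ssh
```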
##SSH tunneling
SSH can be used to tunnel local requests, such as web requests, to the HDInsight cluster. The request will then be routed to the requested resource as if it had originated on the HDInsight cluster headnode.
[AZURE.IMPORTANT] An SSH tunnel is a requirement for accessing the web UI for some Hadoop services. For example, the Job History UI and the Resource Manager UI can only be accessed using an SSH tunnel.
For more information on creating and using an SSH tunnel, see Use SSH Tunneling to access Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UI's.
##Next steps
Now that you understand how to authenticate by using an SSH key, learn how to use MapReduce with Hadoop on HDInsight.