16/12/26 21:34:11 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Hadoop requires native libraries on Windows to work properly. That includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions. This is implemented in HADOOP.DLL and WINUTILS.EXE.
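As a rough diagnostic, the sketch below reproduces where that odd path comes from. Hadoop looks for winutils.exe under <Hadoop home>\bin, and when no home directory is configured, Java's string concatenation turns the missing value into the literal text "null", which is why the message reads null\bin\winutils.exe. This is a simplified illustration, not Hadoop's own code: it only reads the HADOOP_HOME environment variable (Hadoop also checks the hadoop.home.dir Java system property first), and the function names and the C:\Hadoop\hadoop-3.1.1 example value are hypothetical.

```python
import ntpath
from collections.abc import Callable, Mapping


def expected_winutils_path(env: Mapping[str, str]) -> str:
    """Build the path Hadoop would probe for winutils.exe (simplified)."""
    # Hypothetical simplification: only HADOOP_HOME is consulted here.
    # An unset home becomes the text "null", mirroring how Java
    # concatenates a null String into the error message.
    home = env.get("HADOOP_HOME") or "null"
    # ntpath builds Windows-style paths even when run on Linux/macOS.
    return ntpath.join(home, "bin", "winutils.exe")


def diagnose(env: Mapping[str, str], exists: Callable[[str], bool]) -> str:
    """Return a short, human-readable verdict about the winutils setup."""
    if not env.get("HADOOP_HOME"):
        return "HADOOP_HOME is not set"
    path = expected_winutils_path(env)
    if not exists(path):
        return f"missing {path}"
    return "ok"
```

On the server itself, calling diagnose(os.environ, os.path.isfile) would give a quick check before launching Spark; the exists argument is injectable so the logic can be tried without touching the filesystem.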
We recently got a big new server at work to run Hadoop and Spark (H/S) on for a proof-of-concept test of some software we're writing for the biopharmaceutical industry, and I hit a few snags while trying to get H/S up and running on Windows Server 2016. I've documented here, step by step, how I managed to install and run this pair of Apache products directly in the Windows cmd prompt, without any need for Linux emulation.

Get the Software

The first step is to download Java, Hadoop, and Spark. Spark seems to have trouble working with newer versions of Java, so I'm sticking with Java 8 for now. For Java, I downloaded the 'Windows x64' version of the JDK (jdk-8u191-windows-x64.exe); for Hadoop, the v3.1.1 binary (hadoop-3.1.1.tar.gz); and for Spark, v2.3.2 'Pre-built for Apache Hadoop 2.7 and later' (spark-2.3.2-bin-hadoop2.7.tgz). Move all three of these files to C:\. Next, extract the two .gz archives (hadoop-3.1.1.tar.gz and spark-2.3.2-bin-hadoop2.7.tgz).
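Any archive tool will do for the extraction; as one option, Python's standard-library tarfile module can unpack both files, since .tar.gz and .tgz are both gzip-compressed tarballs. This is a minimal sketch, not necessarily how I did it: the extract_archive name is made up, it assumes the archives sit in C:\ as described above, and it assumes Python 3.9+ is installed, which the rest of this setup doesn't otherwise need.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> set[str]:
    """Extract a .tar.gz/.tgz into dest and return its top-level names.

    Assumes member names look like "hadoop-3.1.1/...", as in the
    Apache release tarballs.
    """
    with tarfile.open(archive, "r:gz") as tar:
        top_level = {m.name.split("/")[0] for m in tar.getmembers() if m.name}
        # Newer Pythons offer extraction filters; "data" refuses absolute
        # paths and links escaping dest. Fall back on older versions.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return top_level
```

For example, extract_archive(Path(r"C:\hadoop-3.1.1.tar.gz"), Path("C:/")) should leave a C:\hadoop-3.1.1 directory behind, and the same call works for the Spark .tgz.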
Once they're extracted (Hadoop takes a while), you should have two directories, hadoop-3.1.1 and spark-2.3.2-bin-hadoop2.7, along with the JDK installer (jdk-8u191-windows-x64.exe). Run the Java installer, but change the destination folder from the default C:\Program Files\Java\jdk1.8.0_191 to just C:\Java. (H/S can have trouble with directories that have spaces in their names.) Another box will pop up asking for the 'Destination Folder' again; this time, use C:\Java\jre1.8.0_191. Make two more directories, C:\Hadoop and C:\Spark, and copy the hadoop-3.1.1 and spark-2.3.2-bin-hadoop2.7 directories into them, respectively.
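The layout step can be summarized as a small script. This is a sketch under my own assumptions, not something the setup requires: the LAYOUT table just restates the folders chosen above, paths_with_spaces shows why the installer's default JDK location is avoided, and copy_into is a hypothetical helper that does the same thing as copying hadoop-3.1.1 into C:\Hadoop by hand.

```python
import shutil
from collections.abc import Iterable
from pathlib import Path

# Restates the folders chosen in the post; the keys are just labels.
LAYOUT = {
    "JDK": r"C:\Java",
    "JRE": r"C:\Java\jre1.8.0_191",
    "Hadoop": r"C:\Hadoop\hadoop-3.1.1",
    "Spark": r"C:\Spark\spark-2.3.2-bin-hadoop2.7",
}

# The installer's default, which the post avoids because of the space.
DEFAULT_JDK = r"C:\Program Files\Java\jdk1.8.0_191"


def paths_with_spaces(paths: Iterable[str]) -> list[str]:
    """Return every path that contains a space character."""
    return [p for p in paths if " " in p]


def copy_into(root: Path, source_name: str, target_dir_name: str) -> Path:
    """Copy root/source_name to root/target_dir_name/source_name.

    Hypothetical helper: with root = C:\\ this mirrors copying
    hadoop-3.1.1 into C:\\Hadoop. copytree creates missing parent
    directories and fails if the destination already exists.
    """
    src = root / source_name
    dest = root / target_dir_name / source_name
    shutil.copytree(src, dest)
    return dest
```

Running paths_with_spaces(LAYOUT.values()) returns an empty list, while paths_with_spaces([DEFAULT_JDK]) flags the installer default.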