Tuesday, September 26, 2017

Schedule a command-line task with the Windows Task Scheduler.

Scheduling a command in the Windows Task Scheduler can be tricky, so here are some things to keep in mind.

In the "Action" tab:
1- start program

2- Program/script: C:\Windows\System32\cmd.exe

3- Add arguments: /c nameOfTheScript.bat
    If you wanna log error use the following:
     script.bat > logall.txt 2>&1
     If you wanna append on the the file:
     script.bat >> logall.txt 2>&1
 4- Start In:  c:/someDir (without the back slash)
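
The same task can also be created from the command line with schtasks. A minimal sketch, assuming a daily run at 9:00; the task name, script path, and log path are placeholders:

schtasks /Create /TN "RunMyScript" /SC DAILY /ST 09:00 /TR "cmd.exe /c C:\someDir\nameOfTheScript.bat >> C:\someDir\logall.txt 2>&1"

Note that schtasks has no equivalent of the "Start in" field, so use absolute paths inside the command (or cd into the working directory first).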


Monday, September 04, 2017

How to send files from HDFS to S3, compressed

Below is a bash script that transfers all files from HDFS to S3, compressing them on the fly.

#!/bin/bash
# Usage: ./s3check.sh /user/raw/2016
# Recursively checks every file under the given HDFS path against the
# equivalent path on S3; any file missing from S3 is created there with
# gzip compression.
if [ -z "$1" ]; then
  echo "usage: $0 directory"
  exit 1
fi
hdfsDir=$1
echo "hdfs dir: $hdfsDir"
# Change FlumeData to the name pattern of the files you want to check
files=$(hadoop fs -ls -R "$hdfsDir"/*/ | awk '{print $NF}' | grep 'FlumeData.*$')
let count=0
let errors=0
let checked=0
for file in $files
do
  # Change the bucket and the directory name (here and in the copy below).
  # aws s3 ls matches by prefix, so this also finds the compressed $file.gz
  lsFile=$( aws s3 ls "s3://bucket/directory$file" )
  fileArr=( $lsFile )
  fileSize=${fileArr[2]}
  if [[ -n "$lsFile" ]] && [[ $fileSize -gt 500 ]]; then
    let checked=checked+1
    echo "Checked: $file"
  else
    echo "copying: $file"
    # Stream the file out of HDFS, gzip it, and pipe it straight into S3
    if (hdfs dfs -cat "hdfs://$file" | gzip | aws s3 cp - "s3://bucket/directory$file.gz"); then
      let count=count+1
    else
      echo "error on $file"
      let errors=errors+1
    fi
  fi
done
echo "Count: $count"
echo "Errors: $errors"
echo "Checked: $checked"