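When creating tables, we often declare a date column as INT and store numeric values such as 20180601 to represent dates. This works fine for operations like date comparison, but date arithmetic requires converting the value to a date type first. How is that conversion done?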
Data preparation
First, create a table in Hive with an INT date column and insert two rows.

    create table tb (dt INT);
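The insert statement itself does not appear above. A minimal sketch of how the two rows might be loaded, assuming Hive 0.14 or later (where `INSERT ... VALUES` is available) and taking the two values from the result table shown further down:

```sql
-- Assumption: the two sample values are the ones that appear in the result table.
insert into table tb values (20180701), (20180715);

-- Quick check that both rows are present.
select dt from tb;
```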
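Converting the type
The first approach casts the `INT` date value to `STRING`, converts it to a Unix timestamp (seconds, as a bigint) with Hive's built-in `unix_timestamp` function, and finally formats that timestamp with `from_unixtime` as a `yyyy-MM-dd` date string.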
The second approach is more direct: cast the `INT` date value to `STRING`, slice the string into its year, month, and day parts, and join them with `-`.

    select dt,
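Only the first line of the query survives above. The following is a sketch of how the two approaches described could be written, not necessarily the original query. It assumes the table `tb` from the data preparation step and uses the column aliases `a` and `b` from the result table; the `'yyyyMMdd'` input pattern is an assumption based on the sample values.

```sql
select dt,
       -- Approach 1: INT -> STRING -> Unix timestamp (seconds) -> formatted string
       from_unixtime(unix_timestamp(cast(dt as string), 'yyyyMMdd'), 'yyyy-MM-dd') as a,
       -- Approach 2: INT -> STRING, then slice out year/month/day and join with '-'
       -- (substr positions in Hive are 1-based)
       concat(substr(cast(dt as string), 1, 4), '-',
              substr(cast(dt as string), 5, 2), '-',
              substr(cast(dt as string), 7, 2)) as b
from tb;
```

Both columns are strings in `yyyy-MM-dd` form, which Hive's date functions such as `datediff` and `date_add` accept directly.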
Result
dt | a | b |
---|---|---|
20180701 | 2018-07-01 | 2018-07-01 |
20180715 | 2018-07-15 | 2018-07-15 |
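Of course, writing this out every time is tedious. You can create a UDF or a macro in Hive and simply call it whenever a conversion is needed.
Creating a macro
A macro is simpler and more convenient than a UDF, but Hive macros can only be temporary: they are visible and valid only within the current session. You therefore need to place the macro definition at the top of your SQL script.

    DROP TEMPORARY MACRO IF EXISTS date_trans;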
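The macro definition that would follow the `DROP` statement is not shown. As a sketch, assuming the macro `date_trans` takes a single INT argument (the parameter name `d` is hypothetical) and wraps the `unix_timestamp` / `from_unixtime` conversion described earlier, it might look like this:

```sql
-- Assumption: date_trans takes one INT in yyyyMMdd form and returns a 'yyyy-MM-dd' string.
CREATE TEMPORARY MACRO date_trans(d INT)
  from_unixtime(unix_timestamp(cast(d as string), 'yyyyMMdd'), 'yyyy-MM-dd');

-- Usage: call the macro like any built-in function.
select dt, date_trans(dt) as a
from tb;
```

Because the macro exists only for the session, a script that relies on it has to run the `CREATE TEMPORARY MACRO` statement before the first query that calls it.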
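When the same function or macro is called in many places, the code becomes much easier to maintain and the statements much more concise.
Overview of Hive built-in date functions
Return Type | Name(Signature) | Description | Notes |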
---|---|---|---|
string | from_unixtime(bigint unixtime[, string format]) | Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00". | Formats a value in seconds using format (for example "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH", or "yyyy-MM-dd HH:mm"). For example, from_unixtime(1250111000, "yyyy-MM-dd") gives 2009-08-12 when the session time zone is UTC |
bigint | unix_timestamp() | Gets current Unix timestamp in seconds. | Returns the current Unix timestamp in seconds |
bigint | unix_timestamp(string date) | Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale, return 0 if fail: unix_timestamp('2009-03-20 11:30:01') = 1237573801 | Converts a time string in yyyy-MM-dd HH:mm:ss format to a Unix timestamp, e.g. unix_timestamp('2009-03-20 11:30:01') = 1237573801 |
bigint | unix_timestamp(string date, string pattern) | Convert time string with given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to Unix time stamp (in seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400. | Converts a time string in the given pattern to a Unix timestamp; returns 0 if the string does not match the pattern, e.g. unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400 |
string | to_date(string timestamp) | Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01". | Returns the date part of a time string |
int | year(string date) | Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970. | Returns the year part of a time string |
int | quarter(date/timestamp/string) | Returns the quarter of the year for a date, timestamp, or string in the range 1 to 4 (as of Hive 1.3.0). Example: quarter('2015-04-08') = 2. | Returns the quarter the date falls in, e.g. quarter('2015-04-08') = 2 |
int | month(string date) | Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11. | Returns the month part of a time string |
int | day(string date) dayofmonth(date) | Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1. | Returns the day-of-month part of a time string |
int | hour(string date) | Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12. | Returns the hour part of a time string |
int | minute(string date) | Returns the minute of the timestamp. | Returns the minute part of a time string |
int | second(string date) | Returns the second of the timestamp. | Returns the second part of a time string |
int | weekofyear(string date) | Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44. | Returns which week of the year the time string falls in, e.g. weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44 |
int | datediff(string enddate, string startdate) | Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2. | Number of days from startdate to enddate |
string | date_add(string startdate, int days) | Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'. | Adds days to startdate |
string | date_sub(string startdate, int days) | Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'. | Subtracts days from startdate |
timestamp | from_utc_timestamp(timestamp, string timezone) | Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0). For example, from_utc_timestamp('1970-01-01 08:00:00','PST') returns 1970-01-01 00:00:00. | Treats the given timestamp as UTC and converts it to the given time zone |
timestamp | to_utc_timestamp(timestamp, string timezone) | Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0). For example, to_utc_timestamp('1970-01-01 00:00:00','PST') returns 1970-01-01 08:00:00. | Treats the given timestamp as being in the given time zone and converts it to UTC |
date | current_date | Returns the current date at the start of query evaluation (as of Hive 1.2.0). All calls of current_date within the same query return the same value. | Returns the current date |
timestamp | current_timestamp | Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value. | Returns the current timestamp |
string | add_months(string start_date, int num_months) | Returns the date that is num_months after start_date (as of Hive 1.1.0). start_date is a string, date or timestamp. num_months is an integer. The time part of start_date is ignored. If start_date is the last day of the month or if the resulting month has fewer days than the day component of start_date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as start_date. | Returns the date num_months months after start_date |
string | last_day(string date) | Returns the last day of the month which the date belongs to (as of Hive 1.1.0). date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored. | Returns the last day of the date's month, ignoring the time part (HH:mm:ss) |
string | next_day(string start_date, string day_of_week) | Returns the first date which is later than start_date and named as day_of_week (as of Hive 1.2.0). start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or full name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of start_date is ignored. Example: next_day('2015-01-14', 'TU') = 2015-01-20. | Returns the date of the next given weekday after start_date, e.g. next_day('2015-01-14', 'TU') = 2015-01-20: starting from 2015-01-14, the next Tuesday falls on 2015-01-20 |
string | trunc(string date, string format) | Returns date truncated to the unit specified by the format (as of Hive 1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. Example: trunc('2015-03-17', 'MM') = 2015-03-01. | Returns the first day of the date's year or month, e.g. trunc("2016-06-26","MM") = 2016-06-01, trunc("2016-06-26","YY") = 2016-01-01. Note that the supported formats are MONTH/MON/MM and YEAR/YYYY/YY |
double | months_between(date1, date2) | Returns number of months between dates date1 and date2 (as of Hive 1.2.0). If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places. Example: months_between('1997-02-28 10:30:00', '1996-10-30') = 3.94959677 | Returns the number of months between date1 and date2: positive if date1 > date2, negative if date1 < date2 |
string | date_format(date/timestamp/string ts, string fmt) | Converts a date/timestamp/string to a value of string in the format specified by the date format fmt (as of Hive 1.2.0). Supported formats are Java SimpleDateFormat formats (see https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html). The second argument fmt should be constant. Example: date_format('2015-04-08', 'y') = '2015'. date_format can be used to implement other UDFs, e.g.: dayname(date) is date_format(date, 'EEEE'), dayofyear(date) is date_format(date, 'D') | Returns the date formatted with the given pattern, e.g. date_format("2016-06-22","MM-dd") = 06-22 |
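Once the INT value has been converted to a `yyyy-MM-dd` string, the functions in the table can be applied to it. A small sketch, assuming the table `tb` from earlier holds the row 20180701; the column aliases are illustrative only:

```sql
select dt,
       date_add(d, 7)           as plus_7_days,   -- for 20180701: 2018-07-08
       datediff('2018-07-15', d) as days_to_0715, -- for 20180701: 14
       last_day(d)              as month_end      -- for 20180701: 2018-07-31
from (
    -- Convert once in a subquery so the expression is not repeated.
    select dt,
           from_unixtime(unix_timestamp(cast(dt as string), 'yyyyMMdd'), 'yyyy-MM-dd') as d
    from tb
) t;
```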