Athena query on CloudFront Logs for max rate during 5 minute interval

skohrs

I'm trying to query CloudFront Logs to derive a traffic baseline for the max rate of requests by IP address. I'm struggling to combine the date and time columns into a timestamp I can use to generate the 5 minute intervals. Am I even close with this query?

SELECT from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) AS datetime, request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
  BETWEEN DATE '2020-10-28'
      AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip
ORDER BY datetime ASC
LIMIT 10;

I'm trying to get to a point where I can do something like this:

SELECT
  FROM_UNIXTIME((UNIX_TIMESTAMP(table.timestamp) DIV 300) * 300) AS Timestamp,
  request_ip,
  COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
  BETWEEN DATE '2020-10-28'
      AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip, count
ORDER BY datetime ASC
LIMIT 10;

Per request, here's the format of the cloudfront_logs table based on the log format. I have tried to create the concatenated timestamp during table creation, but I had a lot of syntax errors:

CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  `date` DATE,
  time STRING,
  location STRING,
  bytes BIGINT,
  request_ip STRING,
  method STRING,
  host STRING,
  uri STRING,
  status INT,
  referrer STRING,
  user_agent STRING,
  query_string STRING,
  cookie STRING,
  result_type STRING,
  request_id STRING,
  host_header STRING,
  request_protocol STRING,
  request_bytes BIGINT,
  time_taken FLOAT,
  xforwarded_for STRING,
  ssl_protocol STRING,
  ssl_cipher STRING,
  response_result_type STRING,
  http_version STRING,
  fle_status STRING,
  fle_encrypted_fields INT,
  c_port INT,
  time_to_first_byte FLOAT,
  x_edge_detailed_result_type STRING,
  sc_content_type STRING,
  sc_content_len BIGINT,
  sc_range_start BIGINT,
  sc_range_end BIGINT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/CloudFront/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )

Taken from the official AWS Documentation on Querying Amazon CloudFront Logs

Failed example:

CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  timestamp AS from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) DATE,
  `date` DATE,
  time STRING,
  location STRING,
  bytes BIGINT,

Syntax error:

line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: e5147782-2e62-465e...)

UPDATE:

Here is a sample from the actual logs:

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
2020-10-29      04:59:03        MEL50-C1        1524    192.168.0.2     GET     example.cloudfront.net    /mobile/js/category/category.js 200     https://www.example.com/    Mozilla/5.0%20(Linux;%20Android%2010;%20A7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/192.168.0.1%20Mobile%20Safari/537.36      v=2020102601    cookie=redacted Hit     BSVYL08_EXAMPLE==        cdn.example.com      https   2001    0.005   -       TLSv1.3 TLS_AES_128_GCM_SHA256  Hit     HTTP/2.0
        -       -       26906   0.004   Hit     application/javascript  787     -       -

SOLUTION:

SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
  BETWEEN DATE '2020-10-28'
      AND DATE '2020-11-04'
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip
ORDER BY count DESC;
Theo

Maybe there's a shorter way to express it, but you should be able to use the following expression to round your date + time to 5 minute intervals:

from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300)

If you use that in your first query you should get what you're after.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Amazon Athena not parsing cloudfront logs

How to add new 5-minute interval

How to get every 5 minute interval record?

Max files in Laravel 5 logs

Query S3 logs content using Athena or DynamoDB

Error on query parsing alb logs by datetime in aws athena

Query records that happen during a time interval

How to read logs elements to find 5 minute intervals

Create 5-minute interval between two timestamp

Filter dataframe using the most recent datetime entry on a 5 minute interval

Athena Access Logs

Plot 10 minute interval

How to test for 5 minute intervals with SQL, given Windows Event Manager event timestamps in AWS Athena

Slow query log logs queries that are fast during testing

Not able to Athena query CloudTrail Logs from KMS CMK Encrypted S3 bucket

Increment minute to the nearest 15 minute interval

Explain MySQL Query for "How to group time column into 5 minute intervals"

How to round off to the closest 5 minute interval if it is 1 or 2 minutes ahead or behind?

How to round it down to the closest 5 minute interval if it is 1,2,3,4 minutes ahead?

How to create a 5-minute interval datetime index using incomplete dataset (Python)

Round to nearest 15 minute interval

Does log4net provides Max Rate feature i.e. maximum speed allowed to write logs to the file?

How to establish successful rate by interval?

MySQL query to fetch row count of hourly records inserted and max record per minute in that hour

AWS Athena Query Partitioning

Athena SQL query issue

AWS Athena map query

AWS Athena Query Structure

AWS Lambda schedule rate(0 minute)