Speed up/replace the loop for millions of rows: judge multiple date ranges

Chon Kit Hui

Good evening, guys. I have 6 million rows of data, and they come in four types. Here is a sample:

z <- structure(list(
  date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
  beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
  end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
  ID1  = c(1003587, 1000396, 1010743, 1002113),
  beg2 = structure(c(NA, 14790, 14790, 13299), class = "Date"),
  end2 = structure(c(NA, 17621, 15217, 13969), class = "Date"),
  ID2  = c(NA, 1024488, 1027877, 1002824),
  beg3 = structure(c(NA, NA, 15218, 13970), class = "Date"),
  end3 = structure(c(NA, NA, 17621, 14760), class = "Date"),
  ID3  = c(NA, NA, 1031361, 1002113),
  beg4 = structure(c(NA, NA, NA, 14790), class = "Date"),
  end4 = structure(c(NA, NA, NA, 17621), class = "Date"),
  ID4  = c(NA, NA, NA, 1021290),
  realID = c(NA, NA, NA, NA)),
  row.names = c(267365L, 193587L, 5294385L, 2039421L), class = "data.frame")
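Judging from the sample, the four types seem to differ in how many of the four date ranges a row fills in (the sample has one row of each). That reading is an assumption, not something stated above. A quick way to check it, assuming unused ranges are always NA as in the sample, is to count the non-missing start dates per row. The sample z is repeated so the block runs on its own.

```r
# Sample data from the question (same values as above)
z <- structure(list(
  date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
  beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
  end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
  ID1  = c(1003587, 1000396, 1010743, 1002113),
  beg2 = structure(c(NA, 14790, 14790, 13299), class = "Date"),
  end2 = structure(c(NA, 17621, 15217, 13969), class = "Date"),
  ID2  = c(NA, 1024488, 1027877, 1002824),
  beg3 = structure(c(NA, NA, 15218, 13970), class = "Date"),
  end3 = structure(c(NA, NA, 17621, 14760), class = "Date"),
  ID3  = c(NA, NA, 1031361, 1002113),
  beg4 = structure(c(NA, NA, NA, 14790), class = "Date"),
  end4 = structure(c(NA, NA, NA, 17621), class = "Date"),
  ID4  = c(NA, NA, NA, 1021290),
  realID = c(NA, NA, NA, NA)),
  row.names = c(267365L, 193587L, 5294385L, 2039421L), class = "data.frame")

# Count how many begN columns are filled in for each row
n_ranges <- rowSums(!is.na(z[paste0("beg", 1:4)]))
print(n_ranges)  # 1, 2, 3 and 4 for the four sample rows (named by row name)

# On the full data, table(n_ranges) would show how many rows fall into each type
```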

I then tried to work out which date range each row's date falls in and assign the matching ID, using a loop:

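library(dplyr)  # between() is assumed to be dplyr's, which returns NA when a bound is NA
for (i in 1:nrow(z)) {
  tryCatch({
    print(i)
    if (between(z$date[i], z$beg1[i], z$end1[i]) == T) { z$realID[i] = z$ID1[i] }
    if (between(z$date[i], z$beg2[i], z$end2[i]) == T) { z$realID[i] = z$ID2[i] }
    if (between(z$date[i], z$beg3[i], z$end3[i]) == T) { z$realID[i] = z$ID3[i] }
    if (between(z$date[i], z$beg4[i], z$end4[i]) == T) { z$realID[i] = z$ID4[i] }
  }, error = function(e) {})  # if() fails on NA; the error silently skips the rest of that row
}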

The code works, but I have so much data that the loop is inefficient: it may take almost a whole day to finish.
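Part of the cost is probably per-row overhead rather than the comparisons themselves: print(i) writes six million lines, and every z$realID[i] = ... goes through the data.frame assignment method. A minimal sketch of the same row-by-row logic on plain vectors is below. It assumes the column names from the sample and swaps between() and tryCatch() for base comparisons wrapped in isTRUE(), so a missing range simply does not match. It is likely to be faster than the original loop, though still well behind a fully vectorized approach.

```r
# Sample data from the question
z <- structure(list(
  date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
  beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
  end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
  ID1  = c(1003587, 1000396, 1010743, 1002113),
  beg2 = structure(c(NA, 14790, 14790, 13299), class = "Date"),
  end2 = structure(c(NA, 17621, 15217, 13969), class = "Date"),
  ID2  = c(NA, 1024488, 1027877, 1002824),
  beg3 = structure(c(NA, NA, 15218, 13970), class = "Date"),
  end3 = structure(c(NA, NA, 17621, 14760), class = "Date"),
  ID3  = c(NA, NA, 1031361, 1002113),
  beg4 = structure(c(NA, NA, NA, 14790), class = "Date"),
  end4 = structure(c(NA, NA, NA, 17621), class = "Date"),
  ID4  = c(NA, NA, NA, 1021290),
  realID = c(NA, NA, NA, NA)),
  row.names = c(267365L, 193587L, 5294385L, 2039421L), class = "data.frame")

# Pull each column out of the data.frame once, before the loop
date <- as.numeric(z$date)
beg <- lapply(1:4, function(k) as.numeric(z[[paste0("beg", k)]]))
end <- lapply(1:4, function(k) as.numeric(z[[paste0("end", k)]]))
ids <- lapply(1:4, function(k) z[[paste0("ID", k)]])
realID <- z$realID

for (i in seq_len(nrow(z))) {
  for (k in 1:4) {
    # isTRUE() turns an NA comparison (missing range) into "no match",
    # so no tryCatch() is needed; a later matching range overwrites an earlier one
    if (isTRUE(date[i] >= beg[[k]][i] && date[i] <= end[[k]][i])) {
      realID[i] <- ids[[k]][i]
    }
  }
}
z$realID <- realID  # write the result back to the data.frame once
```

For the sample this gives NA, 1024488, 1010743, NA, the same as the original loop, since the sample fills its ranges in order.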

Does anyone know how I can improve or replace this code? Thank you so much.

Dave2e

Since R is a vectorized language, the best way to speed up this code is to operate on entire vectors rather than looping through each element.
A simple solution is to use a series of ifelse() statements:

# >= and <= keep the bounds inclusive, as between() does in the question's loop
z$realID <- ifelse(!is.na(z$beg1) & z$date >= z$beg1 & z$date <= z$end1, z$ID1, z$realID)
z$realID <- ifelse(!is.na(z$beg2) & z$date >= z$beg2 & z$date <= z$end2, z$ID2, z$realID)
z$realID <- ifelse(!is.na(z$beg3) & z$date >= z$beg3 & z$date <= z$end3, z$ID3, z$realID)
z$realID <- ifelse(!is.na(z$beg4) & z$date >= z$beg4 & z$date <= z$end4, z$ID4, z$realID)

When the ifelse() condition evaluates to TRUE, realID is updated; otherwise it keeps its prior value, so a later matching range overrides an earlier one.
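Because the four ifelse() lines differ only in the range number, the same idea can be written as a short loop over the ranges rather than over the rows, which also scales to any number of ranges. This is a sketch assuming the begN/endN/IDN naming from the sample; n_ranges is a hypothetical setting for the number of column groups. It uses which() to update only the matching rows, keeps inclusive bounds, and lets a later matching range override an earlier one, as the ifelse() chain does.

```r
# Sample data from the question
z <- structure(list(
  date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
  beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
  end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
  ID1  = c(1003587, 1000396, 1010743, 1002113),
  beg2 = structure(c(NA, 14790, 14790, 13299), class = "Date"),
  end2 = structure(c(NA, 17621, 15217, 13969), class = "Date"),
  ID2  = c(NA, 1024488, 1027877, 1002824),
  beg3 = structure(c(NA, NA, 15218, 13970), class = "Date"),
  end3 = structure(c(NA, NA, 17621, 14760), class = "Date"),
  ID3  = c(NA, NA, 1031361, 1002113),
  beg4 = structure(c(NA, NA, NA, 14790), class = "Date"),
  end4 = structure(c(NA, NA, NA, 17621), class = "Date"),
  ID4  = c(NA, NA, NA, 1021290),
  realID = c(NA, NA, NA, NA)),
  row.names = c(267365L, 193587L, 5294385L, 2039421L), class = "data.frame")

n_ranges <- 4  # hypothetical: number of begN/endN/IDN column groups
for (k in seq_len(n_ranges)) {
  b <- z[[paste0("beg", k)]]
  e <- z[[paste0("end", k)]]
  # One vectorized test over all rows; which() drops NA, so missing ranges never match
  hit <- which(z$date >= b & z$date <= e)
  z$realID[hit] <- z[[paste0("ID", k)]][hit]
}
print(z$realID)  # NA, 1024488, 1010743, NA for the sample
```

Each pass handles every row with a few vector operations, so the work becomes four vectorized passes instead of six million loop iterations.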
