Efficient way to clean data in R

Alegría

The input is:

[Image of the input table; not available]

Rows 3 and 5 have an incorrect format. The output I want is:

sale_date  produst_model  store_code
20210208   ASUS_DE552     AAE_08072
20210305   ASUS_AC693     AAE_08072
20210107   ASUS_DE551     AAR_7461
20210325   ASUS_DB341     CMHT_654
20210227   ASUS_HG0982    BR_981

If this table has 20,000 rows, is there a more efficient way to check that every row matches the rule?

Chris Ruehlemann

From looking at the data posted, my hunch is that the strings in the three columns were at some point extracted from a composite string such as 20210227_ASUS_HG0982_BR_981, but the extraction seems to have gone wrong in some places. If this assumption is correct, I would recommend going back to the original strings and fixing the extraction, for example with the extract function from tidyr:

library(tidyverse)

# Split the composite string into its three fields, one capture group per column
data.frame(original) %>%
  extract(original,
          into = c("sale_date", "produst_model", "store_code"),
          regex = "(\\d+)_(\\w+\\d+)_(\\w+)")

Output:

  sale_date produst_model store_code
1  20210227   ASUS_HG0982     BR_981

Data:

original = "20210227_ASUS_HG0982_BR_981"

Obviously, the regex here is based only on a single string and will likely have to be adapted as soon as you have more strings.
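
As a rough sketch of how this might scale to many rows, the same extract call can be run over a whole character vector, and the rows that fail the pattern can be listed afterwards. Everything specific here is an assumption rather than something given in the question: the composite strings are rebuilt by hand from the desired output, the last string is deliberately malformed for illustration, and the stricter pattern stored in `rule` is only a guess at what "the rule" is.

```r
library(tidyr)

# Hypothetical composite strings rebuilt from the rows in the question;
# the last one is deliberately malformed (the store code is missing).
original <- c("20210208_ASUS_DE552_AAE_08072",
              "20210305_ASUS_AC693_AAE_08072",
              "20210107_ASUS_DE551_AAR_7461",
              "20210325_ASUS_DB341_CMHT_654",
              "20210227_ASUS_HG0982_BR_981",
              "20210227_ASUS_HG0982")

# Assumed rule: 8-digit date, ASUS_<letters><digits>, <letters>_<digits>
rule <- "^(\\d{8})_(ASUS_[A-Z]+\\d+)_([A-Z]+_\\d+)$"

# extract() works on the whole column at once; remove = FALSE keeps
# the original string next to the three new columns
cleaned <- extract(data.frame(original = original, stringsAsFactors = FALSE),
                   original,
                   into = c("sale_date", "produst_model", "store_code"),
                   regex = rule, remove = FALSE)

# Rows that do not match the rule come back as NA in the new columns
bad_rows <- which(is.na(cleaned$sale_date))
bad_rows
```

With this made-up input, `bad_rows` points at row 6 only. The regex matching is vectorized, so no explicit loop over the rows is needed, and the `original` column kept by `remove = FALSE` makes it easy to inspect the strings that failed.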
