What is the best way to sample data based on a date range?

I have a weather dataset covering 01 Nov 2007 to 18 May 2008. The data is date-dependent.

I want to predict the temperature from 07 May 2008 to 18 May 2008 (roughly 10-15 observations). The dataset has about 200 rows.

I will be using a decision tree/random forest, an SVM, and a neural network to make the predictions.

I've never handled data like this, so I'm not sure how to sample it. If we ignore the bias factor, can I sample the training data from 01 Nov 2007 to 18 May 2008 and the test data from 07 May 2008 to 18 May 2008? Or is there a better way to handle this? Or would it be better to first sort my data by date, split the ordered data 80:20 into training and test sets, and then just output the predictions for the required dates?



install.packages("rattle")
install.packages("RGtk2")
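library("rattle")
library("janitor")  # provides convert_to_date(); run install.packages("janitor") first if it is missing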

seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")

dataset$Date <- convert_to_date(dataset$Date)

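# Date is already of class Date here, so it can be ordered directly
# (the original format string "%Y/%M/%D" was not a valid date format)
dataset <- dataset[order(dataset$Date), ]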
dataset <- dataset[1:200,]
str(dataset)
'data.frame':   200 obs. of  24 variables:
 $ Date         : Date, format: "2007-11-01" "2007-11-02" "2007-11-03" ...
 $ Location     : chr  "Canberra" "Canberra" "Canberra" "Canberra" ...
 $ MinTemp      : num  8 14 13.7 13.3 7.6 6.2 6.1 8.3 8.8 8.4 ...
 $ MaxTemp      : num  24.3 26.9 23.4 15.5 16.1 16.9 18.2 17 19.5 22.8 ...
 $ Rainfall     : num  0 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 ...
 $ Evaporation  : num  3.4 4.4 5.8 7.2 5.6 5.8 4.2 5.6 4 5.4 ...
 $ Sunshine     : num  6.3 9.7 3.3 9.1 10.6 8.2 8.4 4.6 4.1 7.7 ...
 $ WindGustDir  : chr  "NW" "ENE" "NW" "NW" ...
 $ WindGustSpeed: int  30 39 85 54 50 44 43 41 48 31 ...
 $ WindDir9am   : chr  "SW" "E" "N" "WNW" ...
 $ WindDir3pm   : chr  "NW" "W" "NNE" "W" ...
 $ WindSpeed9am : int  6 4 6 30 20 20 19 11 19 7 ...
 $ WindSpeed3pm : int  20 17 6 24 28 24 26 24 17 6 ...
 $ Humidity9am  : int  68 80 82 62 68 70 63 65 70 82 ...
 $ Humidity3pm  : int  29 36 69 56 49 57 47 57 48 32 ...
 $ Pressure9am  : num  1020 1012 1010 1006 1018 ...
 $ Pressure3pm  : num  1015 1008 1007 1007 1018 ...
 $ Cloud9am     : int  7 5 8 2 7 7 4 6 7 7 ...
 $ Cloud3pm     : int  7 3 7 7 7 5 6 7 7 1 ...
 $ Temp9am      : num  14.4 17.5 15.4 13.5 11.1 10.9 12.4 12.1 14.1 13.3 ...
 $ Temp3pm      : num  23.6 25.7 20.2 14.1 15.4 14.8 17.3 15.5 18.9 21.7 ...
 $ RainToday    : chr  "No" "Yes" "Yes" "Yes" ...
 $ RISK_MM      : num  3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 0 ...
 $ RainTomorrow : chr  "Yes" "Yes" "Yes" "Yes" ...
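
For reference, the date-based split asked about above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `weather` data frame, its temperature values, and the `cutoff` variable are made up so that the snippet runs on its own, and it shows one possible split (everything before the forecast window for training, the window itself for testing), not a recommended or verified answer.

```r
# Synthetic stand-in for the weather data: one row per day, 200 rows.
# The temperature values are random and purely illustrative.
set.seed(42)
dates <- seq(as.Date("2007-11-01"), as.Date("2008-05-18"), by = "day")
weather <- data.frame(
  Date    = dates,
  Temp9am = rnorm(length(dates), mean = 15, sd = 4),
  Temp3pm = rnorm(length(dates), mean = 22, sd = 5)
)

# Sort chronologically, then split on a cutoff date so that the
# training rows are strictly earlier than the test rows.
cutoff  <- as.Date("2008-05-07")  # first day of the forecast window
weather <- weather[order(weather$Date), ]
train   <- weather[weather$Date <  cutoff, ]
test    <- weather[weather$Date >= cutoff, ]

nrow(train)       # 188 rows: 01 Nov 2007 to 06 May 2008
nrow(test)        # 12 rows: 07 May 2008 to 18 May 2008
range(test$Date)
```

With the real data, `weather` would be replaced by `dataset` from the code above. Splitting on a cutoff date rather than sampling rows at random keeps the 07-18 May rows out of the training set entirely.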

