Save svmlight file


I created two dictionaries as follows:

pf (key is id)

{78: [(20.0, 1.0), (164.0, 1.0), (175.0, 1.0), (182.0, 1.0), (939.0, 1.0), (943.0, 1.0), (944.0, 1.0), (954.0, 1.0), (957.0, 1.0), (959.0, 1.0), (962.0, 1.0), (965.0, 1.0), (969.0, 1.0), (976.0, 1.0), (981.0, 1.0), (988.0, 1.0), (1217.0, 1.0)], 181: [(175.0, 1.0), (284.0, 1.0), (879.0, 1.0), (899.0, 1.0), (922.0, 1.0), (930.0, 1.0), (932.0, 1.0), (959.0, 1.0), (965.0, 1.0), (976.0, 1.0), (1026.0, 1.0), (1075.0, 1.0), (1097.0, 1.0), (1154.0, 1.0), (1162.0, 1.0), (1366.0, 1.0), (1441.0, 1.0), (1444.0, 1.0), (1447.0, 1.0), (1454.0, 1.0), (1464.0, 1.0), (114.0, 1.0)], 190: [(175.0, 1.0), (208.0, 1.0), (284.0, 1.0), (898.0, 1.0), (899.0, 1.0), (929.0, 1.0), (1455.0, 1.0), (390.0, 1.0), (937.0, 1.0), (987.0, 1.0), (1009.0, 1.0), (1131.0, 1.0), (1385.0, 1.0), (1432.0, 1.0), (1437.0, 1.0), (1442.0, 1.0), (1452.0, 1.0), (150.0, 1.0), (222.0, 1.0), (294.0, 1.0), (421.0, 1.0), (1231.0, 1.0), (1355.0, 1.0), ]}

and h (key is also id, same as in pf)

{33: 1, 64: 1, 156: 1, 181: 1, 199: 1, 216: 1, 307: 1, 352: 1, 355: 1, 358: 1, 373: 1, 415: 1, 445: 1, 478: 1, 542: 1, 563: 1, 606: 1, 625: 1, 644: 1, 658: 1, 686: 1, 752: 1, 774: 1, 871: 1, 932: 1, 935: 1, 953: 1, 1020: 1, 1139: 1, 1141: 1, 1152: 1, 1178: 1, 1267: 1, 1343: 1, 1344: 1, 1375: 1, 1386: 1, 1408: 1, 1490: 1, 1587: 1, 1612: 1, 1642: 1, 1663: 1, 1804: 1, 1816: 1, 1881: 1, 1932: 1, 1947: 1, 1987: 1, 2009: 1, 2028: 1, 2030: 1, 2041: 1, 2063: 1, 2079: 1, 2086: 1, 2121: 1, 2123: 1, 2146: 1, 2200: 1, 2230: 1, 2256: 1, 2294: 1, 2320: 1, 2342: 1, 2370: 1, 2449: 1, 2467: 1, 2477: 1, 2488: 1, 2492: 1, 2588: 1, 2647: 1, 2659: 1, 2679: 1, 2778: 1, 2793: 1, 2827: 1, 2856: 1, 2863: 1, 2931: 1, 2957: 1, 2984: 1, 2994: 1, 3008: 1, 3019: 1, 3040: 1, 3053: 1, 3076: 1, 3077: 1, 3089: 1, 3093: 1, 3114: 1, 3120: 1, 3149: 1, 3222: 1, 3239: 1, 3252: 1, 3287: 1, 3290: 1, 3331: 1}

Now I'd like to use the two functions from the util library (below) to write a third function that saves these dictionaries in svmlight format, as follows:

1 2:0.5 3:0.12 10:0.9 2000:0.3
0 4:1.0 78:0.6 1009:0.2
1 33:0.1 34:0.98 1000:0.8 3300:0.2
1 34:0.1 389:0.32

...where the first number on each line is the label, taken from h. (Keep in mind that an id refers to the same thing in both dictionaries; h has fewer ids than pf.) If an id in pf is not in h, then the label should be 0.
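To make the labelling rule concrete, here is a minimal sketch using shortened stand-ins for pf and h (the values are illustrative, not the full dictionaries above). It assumes the rule is exactly "use the value from h, or 0 when the id is missing", which `dict.get` expresses directly:

```python
# Small stand-ins for the dictionaries in the question (values shortened).
pf = {78: [(20.0, 1.0)], 181: [(175.0, 1.0)], 190: [(208.0, 1.0)]}
h = {33: 1, 181: 1}

# dict.get returns the default (0) when the id has no entry in h.
labels = {pid: h.get(pid, 0) for pid in pf}
print(labels)  # {78: 0, 181: 1, 190: 0}
```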

Here are the two functions that I'd like to use:

from sklearn.datasets import load_svmlight_file

def bag_to_svmlight(input):
    return ' '.join(( "%d:%f" % (fid, float(fvalue)) for fid, fvalue in input))

#input is features and label stored in the svmlight_file
def get_data_from_svmlight(svmlight_file):
    data_train = load_svmlight_file(svmlight_file,n_features=1473)
    X_train = data_train[0]
    Y_train = data_train[1]
    return X_train, Y_train
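
For reference, this is roughly what bag_to_svmlight produces for a short feature list. The helper is repeated here so the snippet runs on its own; the parameter is renamed to `bag` (my choice, not part of the original util library) to avoid shadowing the built-in `input`:

```python
def bag_to_svmlight(bag):
    # Same formatting as the helper above: "<feature id>:<value>" pairs joined by spaces.
    return ' '.join("%d:%f" % (fid, float(fvalue)) for fid, fvalue in bag)

line = bag_to_svmlight([(20.0, 1.0), (164.0, 1.0)])
print(line)  # 20:1.000000 164:1.000000
```

Note that `%d` truncates the float feature ids (20.0 becomes 20), which is what the svmlight format needs.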

This is what I have so far:

def save_svm(pf, h, file_name_str):
    deliverable = open(file_name_str, 'wb')
    #I'm stuck on this. Don't know python enough to write the labels and features correctly
    deliverable.write(bytes(f"{label} {feature_value}\n", 'utf-8'))

I am really unsure on how to save the output from these two dictionaries in the format specified above.
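
One possible way to do this, as a sketch rather than a definitive answer: loop over the ids in pf, look up the label in h with a default of 0, and format the features with bag_to_svmlight. The choices to sort the ids and to sort each feature list by feature id are assumptions on my part (some svmlight tools expect ascending feature ids), and the small pf and h below are shortened stand-ins for the real dictionaries.

```python
import os
import tempfile

def bag_to_svmlight(bag):
    # "<feature id>:<value>" pairs joined by spaces, as in the util helper.
    return ' '.join("%d:%f" % (fid, float(fvalue)) for fid, fvalue in bag)

def save_svm(pf, h, file_name_str):
    # The with-block closes the file even if a write fails.
    with open(file_name_str, 'wb') as deliverable:
        for pid in sorted(pf):
            label = h.get(pid, 0)  # 0 when the id is not in h
            # Assumption: write feature ids in ascending order.
            features = bag_to_svmlight(sorted(pf[pid]))
            deliverable.write(bytes(f"{label} {features}\n", 'utf-8'))

# Shortened stand-ins for the dictionaries in the question.
pf = {78: [(20.0, 1.0), (164.0, 1.0)], 181: [(175.0, 1.0), (114.0, 1.0)]}
h = {181: 1}

path = os.path.join(tempfile.mkdtemp(), "features.svmlight")
save_svm(pf, h, path)

with open(path) as f:
    written = f.read()
print(written)
# 0 20:1.000000 164:1.000000
# 1 114:1.000000 175:1.000000
```

The resulting file should then be readable with get_data_from_svmlight, provided no feature id exceeds the n_features value passed to load_svmlight_file.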
