I am doing an assignment for my class where I have a file that is 7 MB. I am supposed to process it in 2 phases. Phase 1: I add each word from the file to an ArrayList and sort it alphabetically. I then write every 100,000 words to a separate file, so I have 12 files in total, named as shown in the code below.
Phase 2: For every 2 files, I read one line from each file and write whichever comes first alphabetically to a new file (basically a merge), until the 2 files are merged into 1 sorted file. I do this in a loop, so that the number of files is halved on each pass, and in the end I have all 7 MB sorted in one file.
What I am having trouble with: In phase 2, I successfully read the phase 1 files, but my files seem to be copied repeatedly into multiple files rather than being sorted and merged. I appreciate any help, thank you.
File: It seems I cannot upload the .txt file, but the code should work for a file with any number of lines; only the num_lines variable needs to be changed.
Summary: 1 big unsorted file turns into multiple sorted files (e.g. 12); the first sort and merge turns them into 6 files, the second into 3 files, the third into 2 files, and the fourth into 1 big file again. Code:
package Assignment11;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class FileSorter_1
{
    public static ArrayList<String> storyline = new ArrayList<String>();
    public static int num_lines = 100000; //this number can be changed
    public static int num_files_initial;
    public static int num_files_sec;

    public static void main(String[] args) throws IOException
    {
        phase1();
        phase2();
    }

    public static void phase1() throws IOException
    {
        Scanner story = new Scanner(new File("Aesop_Shakespeare_Shelley_Twain.txt")); //file name
        int f = 0;
        while(story.hasNext())
        {
            int i = 0;
            while(story.hasNext())
            {
                String temp = story.next();
                storyline.add(temp);
                i++;
                if(i > num_lines)
                {
                    break;
                }
            }
            Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
            BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
            for(int x = 0; x<num_lines;x++)
            {
                write2file.write(storyline.get(x));
                write2file.newLine();
            }
            write2file.close();
            f++;
        }
        num_files_initial = f;
    }

    public static void phase2() throws IOException
    {
        int file_n = 1;
        int prev_fn = 0;
        int t = 0;
        int g = 0;
        while(g<5)
        {
            System.out.println(num_files_initial);
            if(t+1 > num_files_initial-1)
            {
                if(num_files_initial % 2 != 0)
                {
                    BufferedWriter w = new BufferedWriter(new FileWriter("temp_"+file_n +"_" + g + ".txt"));
                    Scanner file1 = new Scanner(new File("temp_"+prev_fn +"_" + t + ".txt"));
                    String word1 = file1.next();
                    while(file1.hasNext())
                    {
                        w.write(word1);
                        w.newLine();
                    }
                    g++;
                    break;
                }
                num_files_initial = num_files_initial / 2 + num_files_initial % 2;
                g = 0;
                t = 0;
                file_n++;
                prev_fn++;
            }
            String s1="temp_"+file_n +"_" + g + ".txt";
            String s2="temp_"+prev_fn +"_" + t + ".txt";
            String s3="temp_"+prev_fn +"_" + (t+1) + ".txt";
            System.out.println(s2);
            System.out.println(s3);
            BufferedWriter w = new BufferedWriter(new FileWriter(s1));
            Scanner file1 = new Scanner(new File(s2));
            Scanner file2 = new Scanner(new File(s3));
            String word1 = file1.next();
            String word2 = file2.next();
            System.out.println(num_files_initial);
            //System.out.println(t);
            //System.out.println(g);
            while(file1.hasNext() && file2.hasNext())
            {
                if(word1.compareTo(word2) == 1) //if word 1 comes first = 1
                {
                    w.write(word1);
                    w.newLine();
                    file1.next();
                }
                if(word1.compareTo(word2) == 0) //if word 1 comes second = 0
                {
                    w.write(word2);
                    w.newLine();
                    file2.next();
                }
            }
            while(file1.hasNext())
            {
                w.write(word1);
                w.newLine();
                break;
            }
            while(file2.hasNext())
            {
                w.write(word2);
                w.newLine();
                break;
            }
            g++;
            t+=2;
            w.close();
            file1.close();
            file2.close();
        }
    }
}
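After writing each chunk to its file in phase 1, you never clear storyline, so every later file is built from the same ever-growing list and the same words are written again. That is why the files look copied rather than sorted and merged. Here are some fixes:
In phase 1, call storyline.clear() after write2file.close(), and loop up to storyline.size() instead of num_lines, because the last chunk is usually shorter than num_lines. Also note that if(i > num_lines) lets each chunk read num_lines + 1 words; use >= if you want exactly num_lines.
In phase 2, compareTo returns a negative number, zero, or a positive number, not 1 or 0. A negative result means word1 comes first, so test for <= 0 and > 0. Since phase 1 sorts with String.CASE_INSENSITIVE_ORDER, compare with compareToIgnoreCase so the merge uses the same ordering. You also discard the result of file1.next() and file2.next(); assign it back to word1 or word2, otherwise the same two words are compared over and over. Finally, the two loops after the main loop write a single word and then break; they should copy every remaining word from whichever file still has input.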
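A minimal sketch of how the chunking and the two-file merge could look, not a drop-in replacement for your class. The class name ExternalSortSketch and the methods writeSortedChunks and mergeSorted are hypothetical names chosen for illustration. It assumes UTF-8 text and chunk files with one word per line, and it keeps your temp_0_N.txt naming for the chunks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; names are illustrative, not from the assignment.
class ExternalSortSketch {

    // Phase 1: read words, sort each chunk, write it out, then clear the buffer.
    // Returns the number of chunk files written (temp_0_0.txt, temp_0_1.txt, ...).
    static int writeSortedChunks(Path input, Path dir, int chunkSize) throws IOException {
        List<String> chunk = new ArrayList<>();
        int fileCount = 0;
        try (Scanner in = new Scanner(input, "UTF-8")) {
            while (in.hasNext()) {
                chunk.add(in.next());
                // Flush when the chunk is full or the input has run out.
                if (chunk.size() == chunkSize || !in.hasNext()) {
                    chunk.sort(String.CASE_INSENSITIVE_ORDER);
                    // Files.write emits one list element per line.
                    Files.write(dir.resolve("temp_0_" + fileCount + ".txt"), chunk);
                    chunk.clear(); // start the next chunk empty
                    fileCount++;
                }
            }
        }
        return fileCount;
    }

    // Phase 2: merge two sorted files (one word per line) into one sorted file.
    static void mergeSorted(Path first, Path second, Path out) throws IOException {
        try (BufferedReader a = Files.newBufferedReader(first);
             BufferedReader b = Files.newBufferedReader(second);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String wordA = a.readLine();
            String wordB = b.readLine();
            while (wordA != null && wordB != null) {
                // compare returns negative, zero, or positive; do not test for == 1.
                if (String.CASE_INSENSITIVE_ORDER.compare(wordA, wordB) <= 0) {
                    w.write(wordA);
                    w.newLine();
                    wordA = a.readLine(); // advance by storing the next word
                } else {
                    w.write(wordB);
                    w.newLine();
                    wordB = b.readLine();
                }
            }
            // One input is exhausted; copy everything left in the other.
            for (; wordA != null; wordA = a.readLine()) {
                w.write(wordA);
                w.newLine();
            }
            for (; wordB != null; wordB = b.readLine()) {
                w.write(wordB);
                w.newLine();
            }
        }
    }
}
```

Calling mergeSorted on temp_0_0.txt and temp_0_1.txt to produce temp_1_0.txt corresponds to one step of your phase 2 loop. The merge has to use the same case-insensitive ordering as the chunk sort, otherwise the merged output is not guaranteed to be sorted.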
Hope this helps.