Can anyone help me figure out why this program is so slow? Looking for optimization advice
- Java code
public class MyThread extends Thread {
    private CountDownLatch threadsSignal;
    private int hsmapCapacity;

    public MyThread(CountDownLatch threadsSignal, int capacity) {
        super();
        this.threadsSignal = threadsSignal;
        this.hsmapCapacity = capacity;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + "Start...");
        FileReader fr;
        BufferedReader bfr;
        FileWriter fw;
        BufferedWriter bfw;
        HashMap<String,Long> nodes = new HashMap<String,Long>(hsmapCapacity);
        String line, rev, s1, s1_rev;
        Random rd = new Random();
        int p, len, j;
        long cnt;
        try {
            File dir = new File("Maps");
            if (!dir.exists())
                dir.mkdir();
            nodes.clear();
            while (blocks.cardinality() < numOfBlocks) {
                p = rd.nextInt(numOfBlocks);
                while (blocks.get(p) == true)
                    p = rd.nextInt(numOfBlocks);
                blocks.set(p);
                fr = new FileReader("Nodes/nodes" + p);
                bfr = new BufferedReader(fr, bufSize);
                fw = new FileWriter("Maps/maps" + p);
                bfw = new BufferedWriter(fw, bufSize);
                //nodes.clear();
                while ((line = bfr.readLine()) != null) {
                    String[] strs = line.split("\t");
                    cnt = new Long(strs[1]);
                    rev = getReverse(getTwin(strs[0]));
                    len = rev.length();
                    long preOriginal = -1, preReplace = -1, Original = -1, Replace = -1;
                    long diff = -1;
                    boolean newOut = true, next = false;
                    for (j = 0; j < strs[0].length() - k + 1; j++) {
                        s1 = strs[0].substring(j, k + j);
                        s1_rev = rev.substring(len - j - k, len - j);
                        if (!nodes.containsKey(s1) && !nodes.containsKey(s1_rev)) {
                            nodes.put(s1, cnt + j*2);
                            if (!newOut && !next) {
                                bfw.write(preOriginal + "\t" + preReplace);
                                bfw.newLine();
                                newOut = true;
                            }
                        } else {
                            if (nodes.containsKey(s1)) {
                                Original = cnt + j*2;
                                Replace = nodes.get(s1);
                            } else if (nodes.containsKey(s1_rev)) {
                                Original = cnt + j*2;
                                Replace = nodes.get(s1_rev) + 1;
                            }
                            if (newOut) {
                                bfw.write(Original + "\t" + Replace);
                                bfw.newLine();
                                newOut = false;
                                next = true;
                            } else if (Original - preOriginal == 2) {
                                if (next) {
                                    diff = Replace - preReplace;
                                    bfw.write(diff > 0 ? "+" : "-");
                                    bfw.newLine();
                                    next = false;
                                } else {
                                    if (Replace - preReplace != diff) {
                                        bfw.write(preOriginal + "\t" + preReplace);
                                        bfw.newLine();
                                        bfw.write(Original + "\t" + Replace);
                                        bfw.newLine();
                                        next = true;
                                    }
                                }
                            }
                            preOriginal = Original;
                            preReplace = Replace;
                        }
                    }
                    if (!newOut && !next) {
                        bfw.write(preOriginal + "\t" + preReplace);
                        bfw.newLine();
                    }
                }
                nodes.clear();
                bfw.close();
                fw.close();
                bfr.close();
                fr.close();
            }
        } catch (Exception E) {
            System.out.println("Exception caught!");
            E.printStackTrace();
        }
        threadsSignal.countDown();
        System.out.println(Thread.currentThread().getName() + "End. Remaining" + threadsSignal.getCount() + " threads");
    }
}

private void BuildMap(int threadNum, int hsmapCapacity) throws Exception {
    CountDownLatch threadSignal = new CountDownLatch(threadNum);
    for (int i = 0; i < threadNum; i++) {
        Thread t = new MyThread(threadSignal, hsmapCapacity);
        t.start();
    }
    threadSignal.await();
    System.out.println(Thread.currentThread().getName() + "End.");
}
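The input is 256 files, node0 through node255, and the output is also 256 files, map0 through map255. Each line of every input file is a long string (longer than 59 characters) + "\t" + a number. The program reads each file, splits the long string on each line into short substrings (all of length 59), and puts them into a HashMap. When it meets a substring that is already in the map, it writes to the output file the replacement relationship between the id of the newly seen duplicate and the id of the substring already in the HashMap. To keep the output from growing too large, what gets written is the range of replacements. Since reading, writing, and processing each file is independent of the others, I used multiple threads. But in testing, with 256 input files totaling 3.2G, output files totaling 1.2G, and k = 59, 8 threads take 12-14 minutes. The disk reads fast, and in theory neither the I/O volume nor the amount of computation is large, so why is it this slow? Any optimization advice is appreciated, thanks!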
Note: rev = getReverse(getTwin(strs[0])); just substitutes every character of the original string and then reverses the result.
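For readers who want something concrete to picture, here is a minimal sketch of a "substitute each character, then reverse" step done in a single pass. The post does not show getTwin or getReverse, so the class name TwinReverse, both method names, and especially the A<->T, C<->G substitution table are assumptions made purely for illustration.

```java
// Hypothetical stand-in for getReverse(getTwin(s)); not the poster's actual code.
class TwinReverse {
    // ASSUMPTION: the real substitution table is not given in the post.
    // A<->T and C<->G are used here only as an example mapping.
    static char twin(char ch) {
        switch (ch) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return ch; // leave unknown characters unchanged
        }
    }

    // Substitutes every character and writes it to the mirrored position,
    // so the substitution and the reversal happen in one pass.
    static String reverseTwin(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[s.length() - 1 - i] = twin(s.charAt(i));
        }
        return new String(out);
    }
}
```

With this example mapping, TwinReverse.reverseTwin("AACG") gives "CGTT".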
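[Solution]
I think I roughly see what you mean.
This one is a bit troublesome: with 3.2G of file data, and each record sliced into fixed-length windows that advance one position at a time, the whole job is bound to be slow.
First, one point to make clear: multithreading is only for concurrent processing; it does not by itself make the work faster. On a single CPU the threads take turns using the CPU, so thread switching can itself cost time (and with multiple CPUs you can still be limited by shared memory).
Looking at your program, I don't think picking files at random in each thread is a good idea, i.e.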
while(blocks.get(p)==true)
p = rd.nextInt(numOfBlocks);
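This way, whenever a pick collides with a file that is already taken, the thread has to keep looping.
For a program like yours, you can set up a counter: each thread takes file p = counter and then increments counter by 1, so the next thread gets a different file number and the random retry loop is avoided. Since the counter is shared between threads, the read and the increment have to happen as one atomic step, otherwise two threads could take the same file.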
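The counter idea could look roughly like the sketch below. It uses java.util.concurrent.atomic.AtomicInteger so that taking a number and incrementing happen atomically. The class name BlockDispenser and the method nextBlock are made up for this illustration and do not appear in the original program.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out each block index 0..numOfBlocks-1 exactly once,
// even when several threads ask at the same time.
class BlockDispenser {
    private final AtomicInteger next = new AtomicInteger(0);
    private final int numOfBlocks;

    BlockDispenser(int numOfBlocks) {
        this.numOfBlocks = numOfBlocks;
    }

    // Returns the next unclaimed block index, or -1 once every block has been handed out.
    int nextBlock() {
        int p = next.getAndIncrement(); // atomic read-and-increment, no retry loop
        return p < numOfBlocks ? p : -1;
    }
}
```

One shared BlockDispenser would be passed to every thread, and the outer loop in run() would become something like `while ((p = dispenser.nextBlock()) >= 0) { ... }` in place of the random pick and the blocks check.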
Second, substring may be rather slow; you could consider using a StringBuilder, i.e.
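String[] strs = line.split("\t");
cnt = Long.parseLong(strs[1]);
char[] c = strs[0].toCharArray();
// assumes getTwin returns a String of the same length, as the original call suggests
char[] t = getTwin(strs[0]).toCharArray();
// seed both windows with the first k-1 characters
StringBuilder buf = new StringBuilder(strs[0].substring(0, k-1));
StringBuilder buf_rev = new StringBuilder(new String(t, 0, k-1)).reverse();
for (j=0; j+k<=c.length; j++) {      // j is the window start, as in the original loop
    buf.append(c[j+k-1]);            // window is now c[j..j+k-1]
    buf_rev.insert(0, t[j+k-1]);     // reversed twin of the same window
    s1 = buf.toString();
    s1_rev = buf_rev.toString();
    if(!nodes.containsKey(s1) && !nodes.containsKey(s1_rev)){
        ...
    }
    buf.deleteCharAt(0);             // drop the oldest character from the front
    buf_rev.setLength(k-1);          // drop the oldest character, which sits at the end here
}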
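To check that a rolling StringBuilder really yields the same windows as substring, a small self-contained sketch like the following can help. The class SlidingWindow and its two methods are hypothetical names for this illustration, and the twin substitution is left out so only the windowing is shown. Because toString() still copies k characters per window, it is worth timing both variants on real data before assuming one is faster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of rolling a fixed-length window with StringBuilder.
class SlidingWindow {
    // Every length-k window of s, left to right; same result as s.substring(j, j + k).
    static List<String> windows(String s, int k) {
        List<String> out = new ArrayList<>();
        if (k <= 0 || s.length() < k) return out;
        StringBuilder buf = new StringBuilder(k);
        buf.append(s, 0, k - 1);             // seed with the first k-1 characters
        for (int j = 0; j + k <= s.length(); j++) {
            buf.append(s.charAt(j + k - 1)); // window is now s[j..j+k-1]
            out.add(buf.toString());
            buf.deleteCharAt(0);             // drop the oldest character
        }
        return out;
    }

    // The same windows, each one reversed.
    static List<String> reversedWindows(String s, int k) {
        List<String> out = new ArrayList<>();
        if (k <= 0 || s.length() < k) return out;
        StringBuilder buf = new StringBuilder(s.substring(0, k - 1)).reverse();
        for (int j = 0; j + k <= s.length(); j++) {
            buf.insert(0, s.charAt(j + k - 1)); // newest character goes to the front
            out.add(buf.toString());
            buf.setLength(k - 1);               // oldest character is at the end; cut it off
        }
        return out;
    }
}
```

For example, SlidingWindow.windows("ABCDE", 3) yields ABC, BCD, CDE, and SlidingWindow.reversedWindows("ABCDE", 3) yields CBA, DCB, EDC.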