USACO 4.3 Letter Game

The basic idea is a brute force approach: enumerate all possible words that can be formed from the input word's letters and keep the ones found in the dictionary. There are a couple of optimizations that we can do:

  • There are at most two words in each answer, because of the length constraints (each dictionary word is at least 3 letters long and the input word is at most 7 letters long, so at most two words fit).
  • Similarly, since each word in the dictionary is at least 3 characters long, we can skip substrings of length 1 or 2 during enumeration.
  • During enumeration, whenever the first substring is lexicographically greater than the second, we can skip the pair, because the swapped pair (with the smaller string first, which is the order we want) will show up in another permutation. There is an edge case to handle here: when the second string is empty, the first string is the whole play by itself, so this comparison does not apply.
  • The STL next_permutation comes in really handy.
#include <vector>
#include <iostream>
#include <fstream>
#include <map>
#include <string>
#include <algorithm>
#include <set>

using namespace std;

map<string,int> wordval;   // dictionary word -> point value
set<string> used;          // plays already recorded (deduplication)
vector<string> ans;        // best-scoring plays found so far
vector<int> value = {2,5,4,4,1,6,5,5,1,7,6,3,5,2,3,5,7,2,1,2,4,6,6,7,5,7}; // letter values for 'a'..'z'
int ans1 = 0;              // best score found so far
string word;               // input word

void compute() {
    // Enumerate every permutation of the input letters; each permutation is
    // split into a prefix w1 and a following substring w2 (possibly empty).
    sort(word.begin(), word.end());
    do {
        for (int i = 3; i <= (int)word.size(); ++i) {
            for (int j = i; j <= (int)word.size(); ++j) {
                string w1, w2, w;
                w1 = word.substr(0, i);
                w2 = word.substr(i, j - i);
                // Keep only pairs in lexicographic order; the swapped pair
                // shows up in another permutation.
                if (w1 > w2 && w2 != "") continue;
                // Both words (or the single word) must be in the dictionary.
                if (!wordval[w1] || (!wordval[w2] && w2 != "")) continue;

                w = w1;
                if (!w2.empty()) {
                    w += " ";
                    w += w2;
                }

                if (ans1 == wordval[w1] + wordval[w2] && !used.count(w)) {
                    // Same score as the current best: record it if unseen.
                    used.insert(w);
                    ans.push_back(w);
                } else if (ans1 < wordval[w1] + wordval[w2]) {
                    // New best score: reset the answer set.
                    ans1 = wordval[w1] + wordval[w2];
                    used.clear();
                    ans.clear();
                    used.insert(w);
                    ans.push_back(w);
                }
            }
        }
    } while (next_permutation(word.begin(), word.end()));
}

int main() {
    ifstream fdin("lgame.dict");
    ifstream fin("lgame.in");
    ofstream fout("lgame.out");

    // Read the dictionary (terminated by ".") and precompute each word's value.
    string str;
    wordval[""] = 0;
    while (fdin >> str && str != ".") {
        int val = 0;
        for (int i = 0; i < (int)str.size(); ++i) {
            val += value[str[i] - 'a'];
        }
        wordval[str] = val;
    }

    fin >> word;
    cout << word << endl;
    compute();

    cout << ans1 << endl;
    fout << ans1 << endl;
    for (int i = 0; i < (int)ans.size(); ++i) {
        cout << ans[i] << endl;
        fout << ans[i] << endl;
    }

    return 0;
}

The Google File System

Paper digest - The Google File System

Motivation

  • A scalable distributed file system for large distributed data-intensive applications; the design is driven by observations of application workloads and the technological environment, both current and anticipated:
    • Component failures are the norm rather than the exception.
    • Files are huge by traditional standards. Multi-GB files are common.
    • Workloads: large streaming reads and small random reads; many large, sequential writes.

Goals

  • Provides fault tolerance while running on inexpensive commodity hardware.

  • Delivers high aggregate performance to a large number of clients.

  • Must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Atomicity with minimal synchronization overhead is essential.

  • Prefer bandwidth over latency (batch processing, not real time).

  • Consistency model: co-design the applications and the file system for flexibility (in other words, push the hard problems elsewhere).

System Design

  • Master-Slave architecture.

    • GFS Master:
      • Metadata storage: namespace, access control information, mapping from files to chunks, and current chunk locations (see the metadata sketch below).
      • System-wide activities: chunk lease management, garbage collection of orphaned chunks, chunk migration between chunk servers.
      • The system is designed to minimize the master's involvement so that it does not become a bottleneck (it is also a single point of failure).
    • GFS Slave (aka chunk server):
      • Chunks carry a globally unique 64-bit chunk handle, are stored as plain files on the local Linux file system, and are replicated (3 copies by default).
      • Does not cache file data (relies on Linux's buffer cache).
    • GFS Client:
      • Interacts with the master for metadata operations.
      • Actual data I/O does not go through the master; it flows directly between the client and the chunk servers.
      • Does not cache file data (working sets are too large to cache, and caching would introduce cache-coherence problems, so they are simply not solved).
      • Does not cache file data, continued: the main GFS applications are MapReduce and Bigtable. MapReduce does sequential reads, so there is no need to cache; Bigtable has its own cache management.
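
To make the master's metadata responsibilities concrete, here is a minimal C++ sketch of the kind of in-memory tables it could keep. The types and names are made up for illustration; persistence through the operation log, namespace locking, and many other details are omitted.

#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of the master's in-memory metadata.
using ChunkHandle = uint64_t;              // globally unique 64-bit chunk handle

struct ChunkInfo {
    std::vector<std::string> replicas;     // chunk servers currently holding a copy
    int version = 0;                       // chunk version number
    std::string primary;                   // replica holding the lease, if any
    int64_t leaseExpirationMs = 0;         // lease expiry time in milliseconds
};

struct FileInfo {
    std::vector<ChunkHandle> chunks;       // ordered list of chunk handles
};

struct MasterMetadata {
    std::map<std::string, FileInfo> files;         // full path name -> file info
    std::map<ChunkHandle, ChunkInfo> chunkTable;   // chunk handle -> replica info
};

Note that replica locations are not persisted: the master rebuilds them by asking chunk servers at startup and keeps them fresh through heartbeat messages.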
  • Lease Management

    • Lease is used to maintain a consistent mutation order across replicas.
    • Master grants chunk lease to one replica (primary).
    • The primary picks a serial order for all mutations to the chunk. All replicas follow this order when applying mutations.
    • A lease has an initial timeout of 60 seconds; it can be extended at the request of the primary chunk server, or revoked by the master before it expires (a rough sketch of this bookkeeping follows).
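
A rough sketch of the master-side lease bookkeeping described above, assuming a made-up Lease record and a simple local clock; only the 60-second timeout comes from the paper.

#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical lease state for one chunk; not the actual GFS implementation.
struct Lease {
    std::string primary;        // replica currently holding the lease ("" if none)
    int64_t expirationMs = 0;   // absolute expiry time in milliseconds
};

static int64_t nowMs() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
}

const int64_t kLeaseTimeoutMs = 60 * 1000;   // 60-second initial timeout

// Grant the lease to a chosen replica if no valid lease is outstanding.
bool grantLease(Lease& lease, const std::string& replica) {
    if (!lease.primary.empty() && lease.expirationMs > nowMs())
        return false;                        // another primary still holds the lease
    lease.primary = replica;
    lease.expirationMs = nowMs() + kLeaseTimeoutMs;
    return true;
}

// Extend the lease when the current primary asks for it (e.g., via heartbeat).
bool extendLease(Lease& lease, const std::string& replica) {
    if (lease.primary != replica || lease.expirationMs <= nowMs())
        return false;                        // only the active primary may extend
    lease.expirationMs = nowMs() + kLeaseTimeoutMs;
    return true;
}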
  • Consistency Model

    • Relaxed consistency model.
    • File namespace mutations (e.g., file creation) are atomic, handled by master exclusively.
    • Data mutation case: consistent but undefined (e.g., concurrent successful mutations).
    • Data mutation case: inconsistent and undefined (e.g., a failed mutation).
    • Concurrent append (record append): at-least-once semantics, which implies the file may contain padding and duplicate records in between that readers need to deal with.
    • Readers deal with this by validating records (through checksums) and identifying them (e.g., with unique record IDs), as sketched below.
    • Overall, the relaxed consistency model (instead of strong consistency) makes application development a bit more difficult.
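
A minimal sketch of the reader-side convention for coping with at-least-once record append. The record layout (magic marker, length, writer-assigned ID, checksum) is made up for illustration; GFS leaves this format entirely to the application.

#include <cstdint>
#include <cstring>
#include <set>
#include <string>
#include <vector>

struct Record {
    uint64_t id;          // writer-assigned unique ID, used to drop duplicates
    std::string payload;
};

// Simple stand-in checksum (FNV-1a); a real application might use CRC32.
uint32_t checksum(const std::string& data) {
    uint32_t h = 2166136261u;
    for (unsigned char c : data) { h ^= c; h *= 16777619u; }
    return h;
}

// Scan a buffer read from a chunk, skipping padding and corrupted or duplicate records.
std::vector<Record> readRecords(const std::vector<uint8_t>& buf,
                                std::set<uint64_t>& seenIds) {
    static const uint32_t kMagic = 0xfeedface;   // marks the start of a record
    std::vector<Record> out;
    size_t pos = 0;
    while (pos + 20 <= buf.size()) {             // 20-byte header: magic, len, id, checksum
        uint32_t magic, len, sum;
        uint64_t id;
        std::memcpy(&magic, &buf[pos], 4);
        if (magic != kMagic) { ++pos; continue; }          // padding or garbage: skip
        std::memcpy(&len, &buf[pos + 4], 4);
        std::memcpy(&id,  &buf[pos + 8], 8);
        std::memcpy(&sum, &buf[pos + 16], 4);
        if (pos + 20 + len > buf.size()) break;            // incomplete record at the end
        std::string payload(buf.begin() + pos + 20, buf.begin() + pos + 20 + len);
        pos += 20 + len;
        if (checksum(payload) != sum) continue;            // corrupted region: skip
        if (!seenIds.insert(id).second) continue;          // duplicate from a retried append
        out.push_back({id, payload});
    }
    return out;
}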
  • Append

    • Pipelined data push: data is pushed linearly along a chain of chunk servers, each forwarding to the next closest replica, to fully use each machine's outbound bandwidth.
    • Data flow and control flow are decoupled: data is first pushed to all replicas, then the client sends the write request to the primary, which assigns the mutation order and forwards it to the secondaries (see the toy sketch below).
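
A toy, in-memory illustration of that decoupling, with made-up names and no networking or failure handling: data is first staged on every replica in the chain, and a separate control step through the primary then applies it in one agreed order.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct ChunkServer {
    std::string name;
    std::map<uint64_t, std::string> staged;   // data pushed but not yet applied
    std::string chunk;                        // contents of the replica
};

// Data flow: push the data along the chain of replicas (pipelined in real GFS).
void pushData(std::vector<ChunkServer*>& chain, uint64_t dataId, const std::string& data) {
    for (ChunkServer* cs : chain)             // each server would forward to the next
        cs->staged[dataId] = data;
}

// Control flow: the client asks the primary to commit; the primary picks the
// mutation order and instructs the secondaries to apply the same staged data.
void commitWrite(ChunkServer& primary, std::vector<ChunkServer*>& secondaries, uint64_t dataId) {
    primary.chunk += primary.staged[dataId];
    for (ChunkServer* cs : secondaries)
        cs->chunk += cs->staged[dataId];
}

int main() {
    ChunkServer a{"a"}, b{"b"}, c{"c"};
    std::vector<ChunkServer*> chain = {&a, &b, &c};    // a acts as the primary
    pushData(chain, 1, "record-1\n");                  // step 1: data flow down the chain
    std::vector<ChunkServer*> secondaries = {&b, &c};
    commitWrite(a, secondaries, 1);                    // step 2: control flow via the primary
    std::cout << a.chunk << b.chunk << c.chunk;
    return 0;
}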
  • Fault Tolerance

    • Replicated master, checkpoints and snapshots (**Q: how is the replication done??)
    • Replicated chunk servers (default 3). Erasure codes / parity.