2017-02-22

USACO

USACO 4.4 Frame Up

The problem description mentions:

It is possible to see at least one part of each of the four sides of a frame.

With the information of the position of each side, we can calculate the position of each frame, as each frame can be determined by its left/down and right/up (or left/up right/down) corner. The relationship of frame B stacks on top of frame A can be modeled as a directed graph where A and B are vertices and there is an edge from A to B. Then do a topological sort pass to generate the result. Note it requires all possible results of topological sort, which requires back tracking with DFS. A good reference implementation of such algorithm can be found here.

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

int H, W;
vector<int> min_x(30, -1);
vector<int> min_y(30, -1);
vector<int> max_x(30, -1);
vector<int> max_y(30, -1);
vector<bool> exist(30, false);
char grid[33][33]; // processed input.
int g[30][30]; // matrix representation of graph.
vector<int> in_degree(30, 0);
ifstream fin("frameup.in");
ofstream fout("frameup.out");

int ans[30], nans = 0;

void buildGraph() {
    fin >> H >> W;
    for(int i = 0; i < H; ++i) {
        fin >> grid[i];
        for(int j = 0; j < W; ++j) {
            cout << grid[i][j];
            if(grid[i][j] == '.') continue;
            int v = grid[i][j] - 'A';
            if(min_x[v] == -1 || min_x[v]>i) min_x[v] = i;
            if(min_y[v] == -1 || min_y[v]>j) min_y[v] = j;
            if(max_x[v] == -1 || max_x[v]<i) max_x[v] = i;
            if(max_y[v] == -1 || max_y[v]<j) max_y[v] = j;
            exist[v] = true;
        }
        cout << endl;
    }
    
    for(int u = 0; u < 26; ++u) {
        if (!exist[u]) continue;
        for(int y = min_y[u]; y <= max_y[u]; ++y) {
            int v = grid[min_x[u]][y] - 'A';
            if(v != u && !g[u][v]) {
                g[u][v] = 1; ++in_degree[v];
            }
            
            v = grid[max_x[u]][y] - 'A';
            if(v != u && !g[u][v]) {
                g[u][v] = 1; ++in_degree[v];
            }
        }
        
        for(int x = min_x[u]; x <= max_x[u]; ++x) {
            int v = grid[x][min_y[u]] - 'A';
            if(v != u && !g[u][v]) {
                g[u][v] = 1; ++in_degree[v];
            }
            v = grid[x][max_y[u]] - 'A';
            if(v != u && !g[u][v]) {
                g[u][v] = 1; ++in_degree[v];
            }
        }
    }
}

// DFS that generates all topological sorts in alphabetic order.
void dfs() {
    bool all_used = true;
    for(int i = 0; i < 26; ++i) {
        if(exist[i]) {
            all_used = false;
            break;
        }
    }
    
    if(all_used) {
        for(int i = 0; i < nans; ++i) {
            fout << (char)(ans[i] + 'A');
            cout << (char)(ans[i] + 'A');
        }
        fout << endl; cout << endl;
        return;
    }
    
    for(int u = 0; u < 26; ++u) {
        if (!exist[u]) continue;
        if (in_degree[u]) continue;
        exist[u] = false;
        ans[nans++] = u;
        for(int v = 0; v < 26; ++v) {
            if(g[u][v]) --in_degree[v];
        }
        dfs();
        exist[u] = true;
        nans--;
        for(int v = 0; v < 26; ++v) {
            if(g[u][v]) in_degree[v]++;
        }
    }
}

int main() {
    buildGraph();
    dfs();
    return 0;
}

2017-02-19

USACO

USACO 4.4 Pollutant Control

It is an obvious max flow min cut problem. The nasty part of this problems is:

There might be multiple routes between same warehouses, so the graph might contain multiple edges between two vertices.
The output requires to print the routes under certain constraints, instead of just asking about the min cut cost.

The idea is:

Find the max flow first. The cost of the max flow must be min cut.
Sort all edges in cost descending order. Then iterate through the sorted edges, do two things:
- Remove this edge from the graph temporarily, calculate the maximum flow value. If the diff between the calculated maximum flow value and the original max flow value (the value calculated without removing the edge from the graph) is the capacity of the edge, then this edge must be in the min cut.
- If an edge is in a min cut, remove this edge from the graph, and repeat, until we get all edges.

This code is a simple modification of the Drainage Ditches Problem Solution that fits this problem, however it does not pass all test data because it does not deal with multiple edges connecting same two vertices. Will post another solution soon.

#include <vector>
#include <iostream>
#include <fstream>
#include <climits>
#include <functional>
#include <algorithm>
#include <numeric>

using namespace std;

int N, M, Si, Ei, Ci;
struct Edge {
    int to, cap, rev, index;
    Edge(int to, int cap, int rev, int index) :
    to(to), cap(cap), rev(rev), index(index) {}
};

vector<vector<Edge>> graph;
vector<vector<Edge>> graphCopy;
vector<bool> visited;
vector<Edge*> edges;
void addEdge(int from, int to, int cap, int index) {
    graph[from].emplace_back(Edge(to, cap, (int)graph[to].size(), index));
    graph[to].emplace_back(Edge(from, 0, (int)graph[from].size() - 1, index));
}

// Ford–Fulkerson
int dfs(int v, int t, int f) {
    if (v == t)
    return f;
    visited[v] = true;
    int size = (int)graph[v].size();
    for (int i = 0; i < size; ++i) {
        auto &e = graph[v][i];
        if (visited[e.to] || e.cap <= 0) continue;
        int d = dfs(e.to, t, std::min(f, e.cap));
        if (d <= 0) continue;
        e.cap -= d;
        graph[e.to][e.rev].cap += d;
        return d;
    }
    return 0;
}

int maxFlow(int s, int t) {
    int flow = 0;
    while (true) {
        std::fill(visited.begin(), visited.end(), false);
        int f = dfs(s, t, INT_MAX);
        if (f == 0) return flow;
        flow += f;
    }
}

int main() {
    ifstream fin("milk6.in");
    ofstream fout("milk6.out");
    
    fin >> N /* warehouses - vertices */ >> M /* trucks - edges */;
    graph = vector<vector<Edge>>(N + 1, vector<Edge>());
    
    visited = vector<bool>(N + 1, false);
    for (int i = 0; i < M; ++i) {
        fin >> Si >> Ei >> Ci;
        addEdge(Si, Ei, Ci, i);
        //cout << Si << " " << Ei << " " << Ci << " " << i << endl;
    }
    
    for (int i = 0; i < N; ++i) {
        for (auto &edge : graph[i]) {
            if (edge.cap == 0) continue;
            edges.emplace_back(&edge);
        }
    }

    sort(edges.begin(), edges.end(), [] (Edge *e1, const Edge *e2) {
        return e1->cap > e2->cap;
    });
    
    graphCopy = graph;
    int ans = maxFlow(1, N);
    graph = graphCopy;
    int total = ans;

    vector<int> ret;
    for (int i = 0; i < edges.size(); ++i) {
        //cout << "cap : " << edges[i]->cap << "; index : " << edges[i]->index << endl;
        graphCopy = graph;
        int cap = edges[i]->cap;
        edges[i]->cap -= cap;
        int flow = maxFlow(1, N);
        if (ans - flow == cap) {
            ret.push_back(edges[i]->index);
            graph = graphCopy;
            ans = flow;
        } else {
            graph = graphCopy;
            edges[i]->cap += cap;
        }
    }
    
    fout << total - ans << " " << ret.size() << endl;
    for (auto val : ret) {
        fout << val + 1 << endl;
    }
}

2017-02-11

Distributed System

Dynamo, Amazon’s Highly Available Key-value Store

Paper digest - Dynamo: Amazon’s Highly Available Key-value Store

Motivation

A highly available key-value storage system that manages state of Amazon services.
Reliability at massive scale, provide an “always-on” experience.

Requirements

Query Model: simple read and write operations to objects (relatively small, usually less than 1MB) that is uniquely identified by a key.
ACID (Atomicity, Consistency, Isolation, Durability) : weak consistency, no isolation, only single key update.
Efficiency:
- Function on a commodity hardware infrastructure.
- Strict latency requirements and SLA.
- Configurable to achieve customizable latency and throughput requirements.
Operation and Security:
- No authentication and authorization (operating in non-hostile environment).
- Scale up (each service uses its distinct instance of Dynamo).
SLA:
- Traditional SLA metrics (average, median and expected variance) are not good enough to build a system where all customers have a good experience.
- SLAs are expressed and measured at the 99.9th percentile of the distribution. Choose 99.9 for cost effective.

System Design

Favor availability over consistency by using optimistic replication techniques.
Optimistic replication introduces conflict. Conflict resolution introduces two problems: when to resolve them and who resolves them.
- When to resolve:
  - Write time: writes may be rejected if the data store cannot reach all the replicas at a given time. This leads to poor user experience.
  - Read time: Dynamo targets the design space of an “always write-able” data store. So conflicts are resolved at read time.
- Who to perform resolve:
  - Configurable.
  - System level resolution: last write wins.
  - Application level resolution: interactive guided merge conflicts, etc.
Other design considerations: Incremental scalability, Symmetry, Decentralization, Heterogeneity.

System Architecture

Partitioning: Consistent Hashing.
Replication: Replicated write protocol with configurable NWR parameters.
Conflict Resolution: Vector clock.
Temporary failures: Sloppy Quorum and hinted handoff.
Recovering from permanent failures: Merkle trees.
Membership and failure detection: Gossip-based membership protocol and failure detection.

Quantum Field