    What Makes Paris Look like Paris?

Carl Doersch1 Saurabh Singh1 Abhinav Gupta1 Josef Sivic2 Alexei A. Efros1,2

1Carnegie Mellon University 2INRIA / École Normale Supérieure, Paris

Figure 1: These two photos might seem nondescript, but each contains hints about which city it might belong to. Given a large image database of a given city, our algorithm is able to automatically discover the geographically-informative elements (patch clusters to the right of each photo) that help in capturing its look and feel. On the left, the emblematic street sign, a balustrade window, and the balcony support are all very indicative of Paris, while on the right, the neoclassical columned entryway sporting a balcony, a Victorian window, and, of course, the cast iron railing are very much features of London.

    Abstract

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.

CR Categories: I.3.m [Computer Graphics]: Miscellaneous - Application; I.4.10 [Image Processing and Computer Vision]: Image Representation - Statistical

Keywords: data mining, visual summarization, reference art, big data, computational geography, visual perception

Links: DL PDF WEB

    1 Introduction

Consider the two photographs in Figure 1, both downloaded from Google Street View. One comes from Paris, the other one from London. Can you tell which is which? Surprisingly, even for these nondescript street scenes, people who have been to Europe tend to do quite well on this task. In an informal survey, we presented 11 subjects with 100 random Street View images of which 50% were from Paris, and the rest from eleven other cities. We instructed the subjects (who have all been to Paris) to try and ignore any text in the photos, and collected their binary forced-choice responses (Paris / Not Paris). On average, subjects were correct 79% of the time (std = 6.3), with chance at 50% (when allowed to scrutinize the text, performance for some subjects went up as high as 90%). What this suggests is that people are remarkably sensitive to the geographically-informative features within the visual environment. But what are those features? In informal debriefings, our subjects suggested that for most images, a few localized, distinctive elements immediately gave it away. E.g. for Paris, things like windows with railings, the particular style of balconies, the distinctive doorways, the traditional blue/green/white street signs, etc. were particularly helpful. Finding those features can be difficult though, since every image can contain more than 25,000 candidate patches, and only a tiny fraction will be truly distinctive.

In this work, we want to find such local geo-informative features automatically, directly from a large database of photographs from a particular place, such as a city. Specifically, given tens of thousands of geo-localized images of some geographic region R, we aim to find a few hundred visual elements that are both: 1) repeating, i.e. they occur often in R, and 2) geographically discriminative, i.e. they occur much more often in R than in R^C (the rest of the world). Figure 1 shows sample output of our algorithm: for each photograph we show three of the most geo-informative visual elements that were automatically discovered. For the Paris scene (left), the street sign, the window with railings, and the balcony support are all flagged as informative.

But why is this topic important for modern computer graphics? 1) Scientifically, the goal of understanding which visual elements are fundamental to our perception of a complex visual concept, such as a place, is an interesting and useful one. Our paper shares this motivation with a number of other recent works that don't actually synthesize new visual imagery, but rather propose ways of finding and visualizing existing image data in better ways, be it selecting candid portraits from a video stream [Fiss et al. 2011], summarizing


a scene from photo collections [Simon et al. 2007], finding iconic images of an object [Berg and Berg 2009], etc. 2) More practically, one possible future application of the ideas presented here might be to help CG modelers by generating so-called reference art for a city. For instance, when modeling Paris for PIXAR's Ratatouille, the co-director Jan Pinkava faced exactly this problem: "The basic question for us was: what would Paris look like as a model of Paris?, that is, what are the main things that give the city its unique look?" [Paik 2006]. Their solution was to run around Paris "for a week like mad tourists, just looking at things, talking about them, and taking lots of pictures", not just of the Eiffel Tower but of the many stylistic Paris details, such as signs, doors etc. [Paik 2006] (see photos on pp. 120-121). But if going on location is not feasible, our approach could serve as basis for a detail-centric reference art retriever, which would let artists focus their attention on the most statistically significant stylistic elements of the city. 3) And finally, more philosophically, our ultimate goal is to provide a stylistic narrative for a visual experience of a place. Such narrative, once established, can be related to others in a kind of geo-cultural visual reference graph, highlighting similarities and differences between regions. E.g. one could imagine finding a visual appearance trail from Greece, through Italy and Spain and into Latin America. In this work, we only take the first steps in this direction: connecting visual appearance across cities, finding similarities within a continent, and differences between neighborhoods. But we hope that our work might act as a catalyst for research in this new area, which might be called computational geo-cultural modeling.

    2 Prior Work

In the field of architectural history, descriptions of urban and regional architectural styles and their elements are well established, e.g. [Loyer 1988; Sutcliffe 1996]. Such local elements and rules for combining them have been used in computer systems for procedural modeling of architecture to generate 3D models of entire cities in an astonishing level of detail, e.g. [Mueller et al. 2006], or to parse images of facades, e.g. [Teboul et al. 2010]. However, such systems require significant manual effort from an expert to specify the appropriate elements and rules for each architectural style.

At the other end of the spectrum, data-driven approaches have been leveraging the huge datasets of geotagged images that have recently become available online. For example, Crandall et al. [2009] use the GPS locations of 35 thousand consumer photos from Flickr to plot photographer-defined frequency maps of cities and countries, while Kalogerakis et al. [2009] use the locations and relative time-stamps of photos of the same photographer to model world-wide human travel priors. Geo-tagged datasets have also been used for place recognition [Schindler et al. 2007; Knopp et al. 2010; Chen et al. 2011] including famous landmarks [Li et al. 2008; Li et al. 2009; Zheng et al. 2009]. Our work is particularly related to [Schindler et al. 2007; Knopp et al. 2010], where geotags are also used as a supervisory signal to find sets of image features discriminative for a particular place. While these approaches can work very well, their image features typically cannot generalize beyond matching specific buildings imaged from different viewpoints. Alternatively, global image representations from scene recognition, such as the GIST descriptor [Oliva and Torralba 2006], have been used for geo-localization of generic scenes on the global Earth scale [Hays and Efros 2008; Kalogerakis et al. 2009]. There, too, reasonable recognition performance has been achieved, but the use of global descriptors makes it hard for a human to interpret why a given image gets assigned to a certain location.

Finally, our paper is related to a line of work on unsupervised object discovery [Russell et al. 2006; Chum et al. 2009; Karlinsky et al. 2009; Lee and Grauman 2009; Singh et al. 2012] (and especially [Quack et al. 2008], who also deal with mining geo-tagged image data). Such methods attempt to explicitly discover features or objects which occur frequently in many images and are also useful as human-interpretable elements of visual representation. But being unsupervised, these methods are limited to only discovering things that are both very common and highly visually consistent. However, adding supervision, such as by explicitly training object [Li et al. 2010] or object part detectors [Bourdev and Malik 2009], requires a labor-intensive labeling process for each object/part.

In contrast, here we propose a discovery method that is weakly constrained by location labels derived from GPS tags, and which is able to mine representative visual elements automatically from a large online image dataset. Not only are the resulting visual elements geographically discriminative (i.e. they occur only in a given locale), but they also typically look meaningful to humans, making them suitable for a variety of geo-data visualization applications. The next section describes the data used in this work, followed by the full description of our algorithm.

    3 The Data

Flickr has emerged as the data-source of choice for most recently developed data-driven applications in computer vision and graphics, including visual geo-location [Hays and Efros 2008; Crandall et al. 2009; Li et al. 2009]. However, the difficulty with Flickr and other consumer photo-sharing websites for geographical tasks is that there is a strong data bias towards famous landmarks. To correct for this bias and provide a more uniform sampling of the geographical space, we turn to Google Street View, a huge database of street-level imagery, captured as panoramas using specially-designed vehicles. This enables extraction of roughly fronto-parallel views of building facades and, to some extent, avoids dealing with large variations of camera viewpoint.

Given a geographical area on a map, we automatically scrape a dense sampling of panoramas of that area from Google Street View [Gronat et al. 2011]. From each panorama, we extract two perspective images (936x537 pixels), one on each side of the capturing vehicle, so that the image plane is roughly parallel to the vehicle's direction of motion. This results in approximately 10,000 images per city. For this project, we downloaded 12 cities: Paris, London, Prague, Barcelona, Milan, New York, Boston, Philadelphia, San Francisco, São Paulo, Mexico City, and Tokyo. We have also scraped suburbs of Paris for one experiment.

    4 Discovering geo-informative elements

Our goal is to discover visual elements which are characteristic of a given geographical locale (e.g. the city of Paris). That is, we seek patterns that are both frequently occurring within the given locale, and geographically discriminative, i.e. they appear in that locale and do not appear elsewhere. Note that neither of these two requirements by itself is enough: sidewalks and cars occur frequently in Paris but are hardly discriminative, whereas the Eiffel Tower is very discriminative, but too rare to be useful (< 0.0001% in our data). In this work, we will represent visual elements by square image patches at various resolutions, and mine them from our large image database. The database will be divided into two parts: (i) the positive set containing images from the location whose visual elements we wish to discover (e.g. Paris); and (ii) the negative set containing images from the rest of the world (in our case, the other 11 cities in the dataset). We assume that many frequently occurring but uninteresting visual patterns (trees, cars, sky, etc.) will occur in both the positive and negative sets, and should be filtered out.



Figure 3: Steps of our algorithm for three sample candidate patches in Paris. The first row: initial candidate and its NN matches. Rows 2-4: iterations of SVM learning (trained using patches on left). Red boxes indicate matches outside Paris. Rows show every 7th match for clarity. Notice how the number of not-Paris matches decreases with each iteration, except for the rightmost cluster, which is eventually discarded.


Figure 2: (a) k-means clustering using SIFT (visual words) is dominated by low-level features. (b) k-means clustering over higher-dimensional HOG features produces visually incoherent clusters.

Our biggest challenge is that the overwhelming majority of our data is uninteresting, so matching the occurrences of the rare interesting elements is like finding a few needles in a haystack.

One possible way to attack this problem would be to first discover repeated elements and then simply pick the ones which are the most geographically discriminative. A standard technique for finding repeated patterns in data is clustering. For example, in computer vision, visual word approaches [Sivic and Zisserman 2003] use k-means clustering on image patches represented by SIFT descriptors. Unfortunately, standard visual words tend to be dominated by low-level features, like edges and corners (Figure 2a), not the larger visual structures we are hoping to find. While we can try clustering using larger image patches (with a higher-dimensional feature descriptor, such as HOG [Dalal and Triggs 2005]), k-means behaves poorly in very high dimensions because the distance metric becomes less meaningful, producing visually inhomogeneous clusters (Figure 2b). We also experimented with other clustering approaches, such as Locality-Sensitive Hashing [Gong and Lazebnik 2011], with similar results.
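To make this baseline concrete, here is a minimal sketch of the k-means-over-HOG clustering just described, assuming patches are already cropped to grayscale arrays; scikit-image's HOG and scikit-learn's KMeans are our stand-ins, not the authors' implementation:

```python
# Hypothetical reconstruction of the k-means baseline rejected above;
# library and parameter choices are ours.
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans

def hog_descriptor(patch):
    """Encode one grayscale patch (here 80x80) as a HOG vector."""
    return hog(patch, orientations=9, pixels_per_cell=(10, 10),
               cells_per_block=(2, 2), feature_vector=True)

def kmeans_patch_clusters(patches, k):
    """Cluster patch descriptors; in high dimensions these clusters
    come out visually incoherent, as Figure 2b illustrates."""
    X = np.stack([hog_descriptor(p) for p in patches])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Toy usage: random arrays stand in for real Street View patches.
rng = np.random.default_rng(0)
labels = kmeans_patch_clusters([rng.random((80, 80)) for _ in range(200)], k=20)
```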

An alternative approach is to use the geographic information as part of the clustering, extracting elements that are both repeated and discriminative at the same time. We have experimented with such discriminative clustering methods [Moosmann et al. 2007; Fulkerson et al. 2008; Shotton et al. 2008], but found they did not provide the right behavior for our data: they either produce inhomogeneous clusters or focus too much on the most common visual features. We believe this is because such approaches include at least one step that partitions the entire feature space. This tends to lose the needles in our haystack: the rare discriminative elements get mixed with, and overwhelmed by, less interesting patches, making it unlikely that a distinctive element could ever emerge as its own cluster.

In this paper, we propose an approach that avoids partitioning the entire feature space into clusters. Instead, we start with a large number of randomly sampled candidate patches, and then give each candidate a chance to see if it can converge to a cluster that is both frequent and discriminative. We first compute the nearest neighbors of each candidate, and reject candidates with too many neighbors in the negative set. Then we gradually build clusters by applying iterative discriminative learning to each surviving candidate. The following section presents the details of this algorithm.

    4.1 Our Approach

From the tens of millions of patches in our full positive set, we randomly sample a subset of 25,000 high-contrast patches to serve as candidates for seeding the clusters. Throughout the algorithm, we represent such patches using a HOG+color descriptor [Dalal and Triggs 2005]. First, the initial geo-informativeness of each patch is estimated by finding the top 20 nearest neighbor (NN) patches in the full dataset (both positive and negative), measured by normalized correlation. Patches portraying non-discriminative elements tend to match similar elements in both positive and negative sets, while patches portraying a non-repeating element will have more-or-less random matches, also in both sets. Thus, we keep the candidate patches that have the highest proportion of their nearest neighbors in the positive set, while also rejecting near-duplicate patches (measured by spatial overlap of more than 30% between any 5 of their top 50 nearest neighbors). This reduces the number of candidates to about 1000.
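A rough sketch of this candidate-scoring step follows, assuming each patch is already a flat descriptor vector and using brute-force nearest-neighbor search (a real system would substitute something faster; the near-duplicate rejection is omitted):

```python
import numpy as np

def normalized_correlation(a, b):
    """Normalized correlation between two flattened patch descriptors."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(a @ b) / a.size

def candidate_scores(candidates, dataset, is_positive, top_k=20):
    """For each candidate, the fraction of its top_k nearest neighbors
    (searched over positive + negative sets together) that come from
    the positive set; candidates with the highest scores are kept."""
    scores = []
    for c in candidates:
        sims = np.array([normalized_correlation(c, d) for d in dataset])
        nn = np.argsort(-sims)[:top_k]
        scores.append(is_positive[nn].mean())
    return np.array(scores)

# Toy usage with random descriptors: 5 candidates, 100 database patches.
rng = np.random.default_rng(1)
db = rng.random((100, 64))
flags = rng.random(100) < 0.5          # True where the patch is "from Paris"
print(candidate_scores(rng.random((5, 64)), db, flags))
```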

Figure 3 (top row) shows three example candidate patches together with their nearest neighbor matches. We can see that, although some matches appear reasonable, many are not very geo-discriminative (red boxes show matches outside Paris), nor visually coherent (e.g. the street sign). The main problem is that a standard distance metric, such as normalized correlation, does not capture what the important parts are within an image patch, and instead treats all pixels equally. For example, the street sign candidate (Figure 3, center) has a dark vertical bar along the right edge, and so all the retrieved NN matches also have that bar, even though it's irrelevant to the street sign concept.

Recently, [Shrivastava et al. 2011] showed how one can improve visual retrieval by adapting the distance metric to the given query using discriminative learning. We adopt similar machinery, training a linear SVM detector for each visual element in an iterative manner as in [Singh et al. 2012] while also adding a weak geographical constraint. The procedure produces a weight vector which corresponds to a new, per-element similarity measure that aims to be geo-discriminative. Our iterative clustering works as follows. Initially, we train an SVM detector for each visual element, using the top k nearest neighbors from the positive set as positive examples, and all negative-set patches as negative examples. While this produces a small improvement (Figure 3, row 2), it is not enough, since the top k matches might not have been very good to begin with. So, we iterate the SVM learning, using the top k detections from the previous round as positives (we set k = 5 for all experiments). The idea is that with each round, the top detections will become better and better, resulting in a continuously improving detector. However, doing this directly would not produce much improvement because the SVM tends to overfit to the initial positive examples [Singh et al. 2012], and will prefer them in each subsequent round over new (and better) ones.



Figure 4: Google Street View vs. geo-informative elements for six cities (Paris, Prague, London, Barcelona, San Francisco, Boston). Arguably, the geo-informative elements (right) are able to provide a better stylistic representation of a city than randomly sampled Google Street View images (left).

Therefore, we apply cross-validation by dividing both the positive and the negative parts of the dataset into l equally-sized subsets (we set l = 3 for all experiments). At each iteration of the training, we apply the detectors trained on the previous round to a new, unseen subset of data to select the top k detections for retraining. In our experiments, we used three iterations, as most good clusters didn't need more to converge (i.e. stop changing). After the final iteration, we rank the resulting detectors based on their accuracy: the percentage of their top 50 firings that are in the positive dataset (i.e. in Paris). We return the top few hundred detectors as our geo-informative visual elements.
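Putting the last two paragraphs together, the per-candidate mining loop might look like the following sketch (descriptors assumed precomputed; scikit-learn's LinearSVC stands in for the paper's soft-margin SVM with C = 0.1):

```python
import numpy as np
from sklearn.svm import LinearSVC

def mine_element(seed_positives, pos_splits, neg_splits, k=5, rounds=3, C=0.1):
    """Iterative, cross-validated SVM mining for one candidate element.
    seed_positives: descriptors of the candidate's top-k positive-set NNs.
    pos_splits / neg_splits: l equally-sized descriptor arrays (l = 3 in
    the text), so each round trains on one subset and mines detections
    from the next, unseen one."""
    positives, svm = seed_positives, None
    for r in range(rounds):
        negatives = neg_splits[r % len(neg_splits)]
        X = np.vstack([positives, negatives])
        y = np.hstack([np.ones(len(positives)), -np.ones(len(negatives))])
        svm = LinearSVC(C=C).fit(X, y)
        unseen = pos_splits[(r + 1) % len(pos_splits)]
        scores = svm.decision_function(unseen)
        positives = unseen[np.argsort(-scores)[:k]]  # retrain on top-k firings
    return svm

def detector_accuracy(svm, pos_all, neg_all, top_n=50):
    """Final ranking: fraction of the detector's top_n firings that land
    in the positive set (i.e. in Paris)."""
    scores = np.hstack([svm.decision_function(pos_all),
                        svm.decision_function(neg_all)])
    labels = np.hstack([np.ones(len(pos_all)), np.zeros(len(neg_all))])
    return labels[np.argsort(-scores)[:top_n]].mean()

# Toy usage with random 64-d descriptors.
rng = np.random.default_rng(3)
svm = mine_element(rng.random((5, 64)),
                   [rng.random((50, 64)) for _ in range(3)],
                   [rng.random((200, 64)) for _ in range(3)])
print(detector_accuracy(svm, rng.random((100, 64)), rng.random((100, 64))))
```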

Figure 3 gives some intuition about the algorithm. For example, in the left column, the initial nearest neighbors contain only a few windows with railings. However, windows with railings differ more from the negative set than the windows without railings; thus the detector quickly becomes more sensitive to them as the algorithm progresses. The right-most example does not appear to improve, either in visual similarity or in geo-discriminativeness. This is because the original candidate patch was intrinsically not very geo-informative and would not make a good visual element. Such patches have a low final accuracy and are discarded.

Implementation Details: Our current implementation considers only square patches (although it would not be difficult to add other aspect ratios), and takes patches at scales ranging from 80-by-80 pixels all the way to height-of-image size. Patches are represented with standard HOG [Dalal and Triggs 2005] (8x8x31 cells), plus an 8x8 color image in L*a*b* colorspace (a* and b* only). Thus the resulting feature has 8x8x33 = 2112 dimensions. During iterative learning, we use a soft-margin SVM with C fixed to 0.1. The full mining computation is quite expensive; a single city requires approximately 1,800 CPU-hours. But since the algorithm is highly parallelizable, it can be done overnight on a cluster.
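The descriptor arithmetic can be sanity-checked with a short sketch. Since the 31-channel HOG variant is not in every library, the HOG cells are taken as an input here (the 8x8x31 grid is assumed computed elsewhere), and only the color part is built, using scikit-image:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.transform import resize

def patch_descriptor(patch_rgb, hog_cells):
    """Concatenate an 8x8x31 grid of HOG cells with an 8x8 a*b* color
    image: 8*8*31 + 8*8*2 = 8*8*33 = 2112 dimensions, as quoted above."""
    assert hog_cells.shape == (8, 8, 31)
    lab = rgb2lab(patch_rgb)                 # L*, a*, b* channels
    ab = resize(lab[..., 1:], (8, 8, 2))     # keep a* and b* only, at 8x8
    return np.concatenate([hog_cells.ravel(), ab.ravel()])

# Toy usage: a random 80x80 RGB "patch" and placeholder HOG cells.
rng = np.random.default_rng(2)
desc = patch_descriptor(rng.random((80, 80, 3)), rng.random((8, 8, 31)))
print(desc.shape)  # (2112,)
```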

    4.2 Results and Validation

Figure 4 shows the results of running our algorithm on several well-known cities. For each city, the left column shows randomly chosen images from Google Street View, while the right column shows some of the top-ranked visual element clusters that were automatically discovered (due to space limitations, a subset of elements was selected manually to show variety; see the project webpage for the full list). Note that for each city, our visual elements convey a better stylistic feel of the city than do the random images. For example, in Paris, the top-scoring elements zero in on some of the main features that make Paris look like Paris: doors, balconies, windows with railings, street signs and special Parisian lampposts. It is also interesting to note that, on the whole, the algorithm had more trouble with American cities: it was able to discover only a few geo-informative elements, and some of them turned out to be different brands of cars, road tunnels, etc. This might be explained by the relative lack of stylistic coherence and uniqueness in American cities (with their melting pot of styles and influences), as well as the supreme reign of the automobile on American streets.

In addition to the qualitative results, we would also like to provide a more quantitative evaluation of our algorithm. While validating data-mining approaches is difficult in general, there are a few questions about our method that we can measure: 1) do the discovered visual elements correspond to an expert opinion of what visually characterizes a particular city? 2) are they indeed objectively geo-informative? 3) do users find them subjectively geo-informative in a visual discrimination task? and 4) can the elements be potentially useful for some practical task? To answer the first question, we consulted a respected volume on 19th-century Paris architecture [Loyer 1988]. We found that a number of stylistic visual elements mentioned in the book correspond quite well to those discovered by our algorithm, as illustrated in Figure 5.

To evaluate how geo-informative our visual elements are, we ran the top 100 Paris element detectors over an unseen dataset which was 50% from Paris and 50% from elsewhere. For each element, we found its geo-informativeness by computing the percentage of the time it fired in Paris out of its top 100 firings. The average accuracy of our top detectors was 83% (where chance is 50%). We repeated this for our top 100 Prague detectors, and found the average accuracy on an unseen dataset of Prague to be 92%. Next, we repeated the above experiment with people rather than computers. To avoid subject fatigue, we reduced the dataset to 100 visual elements, 50 from Paris and 50 from Prague. 50% of the elements were the top-ranked ones returned by our algorithm for Paris and Prague. The other 50% were randomly sampled patches of Paris and Prague (but biased to be high-contrast, as before, to avoid empty sky patches, etc.). In a web-based study, subjects (who have all been to Paris but not necessarily Prague) were asked to label each of the 100 patches as belonging to either Paris or Prague (forced choice). The results of our study (22 naive subjects) are as follows: average classification performance for the algorithm-selected patches was 78.5% (std = 11.8), while for random patches it was 58.1% (std = 6.1); the p-value for a paired-samples t-test was < 10^-8. While on random patches subjects did not do much better than chance, performance on our geo-informative elements was roughly comparable to the much simpler full-image classification task reported in the beginning of the paper (although since here we only used Prague, the setups are not quite the same).

Figure 5: Books on Paris architecture are expressly written to give the reader a sample of the architectural elements that are specifically Parisian (panels: window balustrades, streetlamps on pedestal, Parisian doors). We consulted one such volume [Loyer 1988] and found that a number of their illustrative examples (left) were automatically discovered by our method (right).

Figure 6: Geo-informative visual elements can provide subtle cues to help artists better capture the visual style of a place. We asked an artist to make a sketch from a photo of Paris (left), and then sketch it again after showing her the top discovered visual elements for this image (right). Note, for example, that the street sign and window railings are missing in the left sketch. In our informal survey, most people found the right sketch to be more Paris-like.

Finally, to get a sense of whether our elements might serve as reference art, we asked an artist to sketch a photograph of Paris, allowing only 10 minutes so that some details had to be omitted. Several days later, she made another 10-minute sketch of the same photograph, this time aided by a display of the top 10 geo-informative elements our algorithm detected in the image. In an informal, randomized survey, 10 out of our 11 naive subjects (who had all been to Paris) found the second sketch to be more Paris-like. The two sketches are shown in Figure 6.

    5 Applications

Now that we have a tool for discovering geographically-informative visual elements for a given locale, we can use them to explore ways of building stylistic narratives for cities and of making visual connections between them. Here we discuss just a few such directions.



Figure 7: Examples of geographic patterns in Paris (shown as red dots on the maps) for three discovered visual elements (shown below each map). Balconies with cast-iron railings are concentrated on the main boulevards (left). Windows with railings mostly occur on smaller streets (middle). Arch-supporting columns are concentrated on Place des Vosges and the St. Germain market (right). Map data © OpenStreetMap contributors, CC BY-SA.


Figure 8: Architectural patterns across Europe. While arches (A) are common across all of Europe, double arches (B) seem rare in London. Similarly, while Paris, Barcelona and Milan all share cast-iron railings on their balconies (D), the grid-like balcony arrangement (E) of Paris and Barcelona is missing in Milan. Map data © OpenStreetMap contributors, CC BY-SA.

    5.1 Mapping Patterns of Visual Elements

So far, we have shown the discovered visual elements for a given city as an ordered list of patch clusters (Figure 4). Given that we know the GPS coordinates of each patch, however, we can easily display them on a map, and then search for interesting geo-spatial patterns in the occurrences of a given visual element. Figure 7 shows the geographical locations of the top-scoring detections for each of 3 different visual elements (a sampling of detections is shown below each map), revealing interestingly non-uniform distributions. For example, it seems that balconies with cast-iron railings (left) occur predominantly on the large thoroughfares (bd Saint-Michel, bd Saint-Germain, rue de Rivoli), whereas windows with cast-iron railings (middle) appear mostly on smaller streets. The arch-supporting column (right) is a distinguishing feature of the famous Place des Vosges, yet it also appears in other parts of Paris, particularly as part of the more recent Marché Saint-Germain (this is a possible example of so-called "architectural citation"). Automatically discovering such architectural patterns may be useful to both architects and urban historians.
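Since every detection inherits the GPS tag of its source panorama, the mapping itself is straightforward; here is a minimal matplotlib sketch (the score/lat/lon field names are hypothetical):

```python
import matplotlib.pyplot as plt

def plot_element_map(detections, top_n=200):
    """Scatter the top-scoring detections of one visual element as red
    dots, in the spirit of Figure 7. Each detection is assumed to carry
    the lat/lon of the panorama it was extracted from."""
    top = sorted(detections, key=lambda d: d["score"], reverse=True)[:top_n]
    plt.scatter([d["lon"] for d in top], [d["lat"] for d in top], s=8, c="red")
    plt.xlabel("longitude"); plt.ylabel("latitude")
    plt.show()

# Toy usage with two fake detections near the Marais and the Louvre.
plot_element_map([{"score": 1.2, "lat": 48.857, "lon": 2.362},
                  {"score": 0.7, "lat": 48.861, "lon": 2.336}])
```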

    5.2 Exploring Different Geo-spatial Scales

So far we have focused on extracting the visual elements which summarize appearance on one particular scale, that of a city. But what about visual patterns across larger regions, such as a continent, or a more specific region, such as a neighborhood? Here we demonstrate visual discovery at different geo-spatial scales.

We applied our algorithm to recover interesting patterns shared by the cities on the European subcontinent. Specifically, we used Street View images from five European cities (Barcelona, London, Milan, Paris and Prague) as the positive set, and the remaining 7 non-European cities as the negative set. Figure 8 shows some interesting discriminative features and patterns in terms of their membership across the 5 European cities. For example, while arches are common in cities across Europe, double arches seem rare in London. Similarly, while balcony railings in Paris, Barcelona and Milan are all made of cast iron, they tend to be made of stone in London and Prague.
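Note that changing the geo-spatial scale requires no change to the mining pipeline itself, only a relabeling of the positive and negative sets; a sketch (the per-image city field is hypothetical):

```python
# Scale selection as a relabeling of the same corpus; the mining code
# above is unchanged.
EUROPEAN = {"Barcelona", "London", "Milan", "Paris", "Prague"}

def split_by_scale(images, positive_cities):
    """Positive set: images whose city is in positive_cities;
    negative set: everything else in the corpus."""
    pos = [im for im in images if im["city"] in positive_cities]
    neg = [im for im in images if im["city"] not in positive_cities]
    return pos, neg

# Continent scale (this section) vs. city scale (Section 4):
# split_by_scale(images, EUROPEAN) or split_by_scale(images, {"Paris"}).
```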


Figure 9: Geographically-informative visual elements at the scale of city neighborhoods. Here we show a few discovered elements particular to three of the central districts of Paris: Louvre/Opera, the Marais, and the Latin Quarter/Luxembourg. Map data © OpenStreetMap contributors, CC BY-SA.


We also analyzed visual patterns at the scale of a city neighborhood. Specifically, we considered three well-defined districts of Paris: Louvre/Opera (1e, 2e), Le Marais (4e), and Latin Quarter/Luxembourg (5e, 6e). Figure 9 shows examples of geographically informative elements for each of the three districts (while taking the other districts and Paris suburbs as the negative set).



Figure 10: Detecting architectural influences. Each image shows confident detections for architectural styles at different geographic scales.


Figure 11: Visual Correspondence. Each row shows corresponding detections of a single visual element detector across three different cities (Paris, France; Prague, Czech Republic; London, England).

Predictably, Louvre/Opera is differentiated from the rest of Paris by the presence of big palatial facades. Le Marais is distinguished by its more cozy palaces, very close-up views due to narrow streets, and a specific shape of lampposts. Interestingly, one of the defining features of the Latin Quarter/Luxembourg is the high frequency of windows with closed shutters as compared to other districts in Paris. One possible explanation is that this neighborhood has become very prestigious and a lot of its real estate has been bought up by people who don't actually live there most of the time.

Given the detectors for visual elements at different geo-spatial scales, it becomes possible to analyze a scene in terms of the regions from which it draws its architectural influences. Figure 10 shows images from the 5th arrondissement of Paris, pointing out which elements are specific to that arrondissement, which are Paris-specific, and which are pan-European. For example, the stone balcony railings and arches are pan-European, windows with collapsible shutters and balconies with iron railings are Parisian, and the grooves around the windows are typical of the 5th arrondissement.

    5.3 Visual Correspondences Across Cities

Given a set of architectural elements (windows, balconies, etc.) discovered for a particular city, it is natural to ask what these same elements might look like in other cities. As it turns out, a minor modification to our algorithm can often accomplish this task. We have observed that a detector for a location-specific architectural element will often fire on functionally similar elements in other cities, just with a much lower score.

Figure 12: Object-centric image averages for the element detector in the top row of Figure 11. Note how the context captures the differences in facade styles between Paris (left) and London (right).

That is, a Paris balcony detector will return mostly London balconies if it is forced to run only on London images. Naturally these results will be noisy, but we can clean them up using an iterative learning approach similar to the one in Section 4.1. The only difference is that we require the positive patches from each iteration of training to be taken not just from the source city, but from all the cities where we wish to find correspondences. For example, to find correspondences between Paris, Prague, and London, we initialize with visual elements discovered in Paris and then, at each round of clean-up training, we use the 9 top positive matches to train each element SVM, 3 from each of the three cities. Figure 11 illustrates the result of this procedure. Note how capturing the correspondence between similar visual elements across cities can often highlight certain stylistic differences, such as the material for the balconies, the style of the street-lamps, or the presence and position of ledges on the facades.

Another interesting observation is that some discovered visual elements, despite having a limited spatial extent, can often encode a much larger architectural context. This becomes particularly apparent when looking at the same visual element detector applied in different cities. Figure 12 shows object-centric averages (in the style of [Torralba and Oliva 2003]) for the detector in the top row of Figure 11 for Paris and London. That is, for each city, the images with the top 100 detections of the element are first centered on that element and then averaged together in image space. Note that not only do the average detections (red squares) look quite different between the two cities, but the average contexts reveal quite a lot about the differences in the structure and style of facades. In Paris, one can clearly see four equal-height floors, with a balcony row on the third floor. In London, though, floor heights are uneven, with the first floor much taller and more stately.
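The averaging itself is straightforward; a sketch, assuming each of the top detections is given as an image index plus the pixel center of the detected element:

```python
import numpy as np

def object_centric_average(images, detections, half=100):
    """Average the top detections of one element in image space, with
    every crop centered on the detection, as in Figure 12.
    detections: iterable of (image_index, cx, cy) for the top firings."""
    acc, n = None, 0
    for idx, cx, cy in detections:
        if cy < half or cx < half:
            continue                  # too close to the top/left border
        crop = images[idx][cy - half:cy + half, cx - half:cx + half]
        if crop.shape[:2] != (2 * half, 2 * half):
            continue                  # too close to the bottom/right border
        acc = crop.astype(float) if acc is None else acc + crop
        n += 1
    return None if n == 0 else acc / n
```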

    5.4 Geographically-informed Image Retrieval

Figure 13: Geographically-informed retrieval. Given a query Prague image (left), we retrieve images in Paris (right).

Once we have detectors that set up the correspondence between different cities such as Paris and Prague (Sec. 5.3), we can use them for geographically-informed image retrieval. Given a query image from one location, such as Prague, our task is to retrieve similar images from another location, such as Paris. For this we use the correspondence detectors from Sec. 5.3 while also encoding their spatial positions in the image. In particular, we construct a feature vector of the query image by building a spatial pyramid and max-pooling the SVM scores of the correspondence detectors in each spatial bin, in the manner of [Li et al. 2010]. Retrieval is then performed using the Euclidean distance between the feature vectors. Figure 13 demonstrates this approach, where a query image from Prague retrieves images from Paris that contain similar balconies with cast-iron railings (bottom) while honoring the spatial layout of facades.
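A sketch of that feature construction (score maps of the correspondence detectors are assumed precomputed over a grid of image positions; the pyramid levels are our choice):

```python
import numpy as np

def pyramid_feature(score_maps, levels=(1, 2, 4)):
    """Max-pool each detector's score map over an n x n grid at every
    pyramid level and concatenate into one feature vector."""
    feats = []
    for n in levels:
        for m in score_maps:              # one (H, W) score map per detector
            H, W = m.shape
            for i in range(n):
                for j in range(n):
                    cell = m[i*H//n:(i+1)*H//n, j*W//n:(j+1)*W//n]
                    feats.append(cell.max())
    return np.array(feats)

def retrieve(query_maps, database_maps):
    """Rank database images by Euclidean distance in pyramid-feature space."""
    q = pyramid_feature(query_maps)
    d = [np.linalg.norm(pyramid_feature(m) - q) for m in database_maps]
    return np.argsort(d)
```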

    6 Conclusion

So, what makes Paris look like Paris? We argued that the look and feel of a city rests not so much on the few famous landmarks (e.g. the Eiffel Tower), but largely on a set of stylistic elements, the visual minutiae of daily urban life. We proposed a method that can automatically find a subset of such visual elements from a large dataset offered by Google Street View, and demonstrated some promising applications. This work is but a first step towards our ultimate goal of providing stylistic narratives to explore the diverse visual geographies of our world. Currently, the method is limited to discovering only local elements (image patches), so a logical next step would be trying to capture larger structures, both urban (e.g. facades), as well as natural (e.g. fields, rivers). Finally, the proposed algorithm is not limited to geographic data, and might potentially be useful for discovering stylistic elements in other weakly supervised settings, e.g. "What makes an Apple product?"

Acknowledgements: We thank Jessica Hodgins, Lavanya Sharan, and Yves Ubelmann for their valuable comments and suggestions. This work is a part of a larger effort with Dan Huttenlocher and David Crandall on modeling geo-informative visual attributes. We thank Google for letting us publish the Street View images. This work was partially supported by an NDSEG fellowship to CD and by Google, NSF IIS-0905402, EIT-ICT, ONR N000141010934, ONR N000141010766, and MSR-INRIA.

    References

Berg, T., and Berg, A. 2009. Finding iconic images. In The 2nd Internet Vision Workshop at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Bourdev, L., and Malik, J. 2009. Poselets: Body part detectors trained using 3D human pose annotations. In IEEE 12th International Conference on Computer Vision (ICCV), 1365-1372.

Chen, D., Baatz, G., Köser, K., Tsai, S., Vedantham, R., Pylvänäinen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., and Grzeszczuk, R. 2011. City-scale landmark identification on mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 737-744.

Chum, O., Perdoch, M., and Matas, J. 2009. Geometric min-hashing: Finding a (thick) needle in a haystack. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 17-24.

Crandall, D., Backstrom, L., Huttenlocher, D., and Kleinberg, J. 2009. Mapping the world's photos. In Proceedings of the 18th International Conference on World Wide Web (WWW), 761-770.

Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 886-893.

Fiss, J., Agarwala, A., and Curless, B. 2011. Candid portrait selection from video. ACM Transactions on Graphics (SIGGRAPH Asia) 30, 6, 128.

Fulkerson, B., Vedaldi, A., and Soatto, S. 2008. Localizing objects with smart dictionaries. In European Conference on Computer Vision (ECCV), 179-192.

Gong, Y., and Lazebnik, S. 2011. Iterative quantization: A procrustean approach to learning binary codes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 817-824.

Gronat, P., Havlena, M., Sivic, J., and Pajdla, T. 2011. Building streetview datasets for place recognition and city reconstruction. Tech. Rep. CTU-CMP-2011-16, Czech Tech. Univ.

Hays, J., and Efros, A. 2008. im2gps: estimating geographic information from a single image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-8.

Kalogerakis, E., Vesselova, O., Hays, J., Efros, A., and Hertzmann, A. 2009. Image sequence geolocation with human travel priors. In IEEE 12th International Conference on Computer Vision (ICCV), 253-260.

Karlinsky, L., Dinerstein, M., and Ullman, S. 2009. Unsupervised feature optimization (UFO): Simultaneous selection of multiple features with their detection parameters. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1263-1270.

Knopp, J., Sivic, J., and Pajdla, T. 2010. Avoiding confusing features in place recognition. In European Conference on Computer Vision (ECCV), 748-761.


Lee, Y., and Grauman, K. 2009. Foreground focus: Unsupervised learning from partially matching images. International Journal of Computer Vision (IJCV) 85, 2, 143-166.

Li, X., Wu, C., Zach, C., Lazebnik, S., and Frahm, J.-M. 2008. Modeling and recognition of landmark image collections using iconic scene graphs. In European Conference on Computer Vision (ECCV), 427-440.

Li, Y., Crandall, D., and Huttenlocher, D. 2009. Landmark classification in large-scale image collections. In IEEE 12th International Conference on Computer Vision (ICCV), 1957-1964.

Li, L., Su, H., Xing, E., and Fei-Fei, L. 2010. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in Neural Information Processing Systems (NIPS), vol. 24.

Loyer, F. 1988. Paris Nineteenth Century: Architecture and Urbanism, 1st American ed. Abbeville Press, New York.

Moosmann, F., Triggs, B., and Jurie, F. 2007. Fast discriminative visual codebooks using randomized clustering forests. In Advances in Neural Information Processing Systems (NIPS), vol. 19.

Mueller, P., Wonka, P., Haegler, S., Ulmer, A., and Van Gool, L. 2006. Procedural modeling of buildings. ACM Transactions on Graphics (SIGGRAPH) 25, 3, 614-623.

Oliva, A., and Torralba, A. 2006. Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research 155, 23-36.

Paik, K. 2006. The Art of Ratatouille. Chronicle Books.

Quack, T., Leibe, B., and Van Gool, L. 2008. World-scale mining of objects and events from community photo collections. In Proceedings of the International Conference on Content-based Image and Video Retrieval (CIVR), 47-56.

Russell, B. C., Efros, A. A., Sivic, J., Freeman, W. T., and Zisserman, A. 2006. Using multiple segmentations to discover objects and their extent in image collections. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1605-1614.

Schindler, G., Brown, M., and Szeliski, R. 2007. City-scale location recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-7.

Shotton, J., Johnson, M., and Cipolla, R. 2008. Semantic texton forests for image categorization and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-8.

Shrivastava, A., Malisiewicz, T., Gupta, A., and Efros, A. A. 2011. Data-driven visual similarity for cross-domain image matching. ACM Transactions on Graphics (SIGGRAPH Asia) 30, 6, 154.

Simon, I., Snavely, N., and Seitz, S. M. 2007. Scene summarization for online image collections. In IEEE 11th International Conference on Computer Vision (ICCV), 1-8.

Singh, S., Gupta, A., and Efros, A. A. 2012. Unsupervised discovery of mid-level discriminative patches. arXiv:1205.3137 [cs.CV].

Sivic, J., and Zisserman, A. 2003. Video Google: A text retrieval approach to object matching in videos. In IEEE 9th International Conference on Computer Vision (ICCV), 1470-1477.

Sutcliffe, A. 1996. Paris: An Architectural History. Yale University Press.

Teboul, O., Simon, L., Koutsourakis, P., and Paragios, N. 2010. Segmentation of building facades using procedural shape priors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3105-3112.

Torralba, A., and Oliva, A. 2003. Statistics of natural image categories. Network: Computation in Neural Systems, 391-412.

Zheng, Y.-T., Zhao, M., Song, Y., Adam, H., Buddemeier, U., Bissacco, A., Brucher, F., Chua, T.-S., and Neven, H. 2009. Tour the world: building a web-scale landmark recognition engine. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1085-1092.

