Yet another Conference

Key points 2010

Petr Popov, Yandex

Graduated from the Faculty of Mechanics and Mathematics of Moscow State University, where he defended his PhD in algebraic topology. He has worked in the software industry since 2003 and has been a developer at Yandex since 2009. He specializes in low-level optimization and computation on non-classical architectures (IBM Cell, GPU).

Presentation topic:

Basic optimization.

Key points:

The search infrastructure of Yandex is constantly growing and requires significant expenditure on support. The volume of indexed information is also on the rise, as is the complexity of ranking algorithms and queries. The usual theory that “computational power is less expensive than a programmer’s work on optimization” has proven to be wrong. A user's search query to Yandex goes through several stages of processing. The “basic search” is the lowest-level subsystem and the one most in need of optimization, since it consumes up to 95% of server capacity. Basic search itself includes finding possible webpage matches, calculating ranking factors for them and the final calculation of relevance. We will tell you how Yandex deals with search index compression and how it accelerates the MatrixNet ranking algorithms.

Watch the presentation.

Konstantin Serebryany, Google

He graduated from the Faculty of Mechanics and Mathematics of Moscow State University in 2000. He worked on optimizing compilers for 7 years (4 years at Sun, 3 years at Intel). In 2004 he defended his PhD dissertation, “Methods of high-level loop optimization”. Since 2007, Konstantin has worked at Google Moscow on the dynamic analysis of programs, including searching for errors in multithreaded code.

Presentation topic:

Data races or "finding the bug before it finds you".

Key points:

When creating high-load server or client systems, it becomes harder to do without multithreading. However, programming multithreaded applications is not becoming easier, in many cases because of data races. In most cases it is almost impossible to “catch” races using classic testing methods because they do not appear on every run of the program. This presentation will describe ThreadSanitizer, a race detection tool developed and deployed by Google. ThreadSanitizer helps programmers find races in programs written in C, C++ and Java and running on Linux, Mac OS X and Windows. We will share our experience of applying this tool in practice while testing such large-scale projects as Google Chrome and other Google software. You will find out what kinds of races we discovered, what difficulties we faced while training users, and how we implemented regular automatic testing.
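
To make the idea of a data race concrete, here is a minimal sketch in Python. It is not taken from the talk and does not use ThreadSanitizer, which according to the abstract targets C, C++ and Java; the `Counter` class and the `run` helper are hypothetical names chosen for illustration. The unsynchronized read-modify-write may lose updates on some runs and not on others, which is exactly why such bugs are hard to catch with ordinary tests.

```python
import threading


class Counter:
    """Shared counter with an unsynchronized and a synchronized increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write without synchronization: two threads may read
        # the same old value, and one of the two updates is then lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write atomic with respect to
        # other threads that use the same lock.
        with self._lock:
            self.value += 1


def run(method_name, threads=4, iterations=10_000):
    """Call the chosen increment method from several threads; return the total."""
    counter = Counter()
    increment = getattr(counter, method_name)

    def worker():
        for _ in range(iterations):
            increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print("with lock:   ", run("safe_increment"))
    print("without lock:", run("unsafe_increment"))
```

With the lock, the result is always `threads * iterations`. Without it, the result can never exceed that number but may fall short of it, and whether it does varies from run to run.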

Download the presentation (pdf).

Sergey Nurk, Yandex

Fifth-year student at the mechanical-mathematics faculty of Saint Petersburg State University. He develops systems for the automated collection of structured data at Yandex.

Presentation topic:

Key points:

A lot of useful information can now be found on the Internet. The general problem is collecting this data automatically. However, completely automated methods of data collection cannot provide the completeness and accuracy of results that we require. This presentation is about a new tool for managing the collection of structured data. Users will be able to mark examples of relevant information on several pages of a website. The system will then automatically generate special templates to look for the same type of information on the rest of the site's pages. The presentation also covers the algorithms used, the problems faced during implementation, a list of unsolved issues and plans for future development.

Download the presentation (pdf).

Kirill Mavrodiev, Intel

Kirill has been working as a software engineer at Intel for two years. He is a Compiler Technical Consulting Engineer for the EMEA region (Europe, Middle East and Africa).

Presentation topic:

An overview of modern methods to parallelize and vectorize applications using Intel® Parallel Composer.

Key points:

A new tool called Intel® Parallel Studio 2011 was launched at the beginning of September. Intel® Parallel Studio 2011 includes 4 components (Parallel Advisor, Parallel Composer, Parallel Amplifier and Parallel Inspector) that facilitate a quick and efficient transition from a serial application to a parallel application for systems with shared memory. Intel® Parallel Composer also offers additional compiler features: Intel® Cilk™ Plus, Array Notation, Guided Auto-parallelization (GAP), etc. You will see examples of these extensions as well as how they were used in the development of a moving particle simulator.

Download the presentation (pdf).

Vlad Seliverstov, Yandex

Graduated from the computer technologies and applied mathematics faculty of Kuban State University in 2004. Since 2005 he has worked for Yandex. He designed and launched the Yandex Advertising Network. Since 2008 he has led a group of advertising technologies administrators.

Presentation topic:

Phantom web-server.

Key points:

While designing high-load systems we sometimes face the fact that different types of queries to web servers use different amounts of resources, take different amounts of time and have different priorities. Some queries “cost” us little and have to be processed as fast as possible. Others “cost” a lot and must not block the processing of fast queries. Existing prioritization schemes are, in our view, cumbersome and inconvenient: as the number of such schemes grows, system configuration becomes more and more complicated. To solve this problem and to make query responses faster, we wrote our own web server, Phantom. This presentation tells you how it works, what tasks it solves, and finally how the prioritization of different types of queries works in practice, using a load testing tool based on Phantom.

Download the presentation (pdf).

Konstantin V. Shvachko, Yahoo!

He is a principal software engineer at Yahoo!, where he develops HDFS. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. He holds a Ph.D. in computer science from Moscow State University. He is a member of the Project Management Committee for Apache Hadoop.

Presentation topic:

Scaling Storage and Computation with Hadoop.

Abstract:

Hadoop provides distributed storage and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. Hadoop partitions data and computation across thousands of hosts and executes application computations in parallel, close to their data. A Hadoop cluster scales computation capacity, storage capacity and IO bandwidth by simply adding commodity servers. Hadoop is an Apache Software Foundation project; it unites hundreds of developers, and hundreds of organizations worldwide report using Hadoop. This presentation will give an overview of the Hadoop family of projects with a focus on its distributed storage solutions.

Download the presentation (pdf).

Andrey Kuzmichev, Yandex

Graduated from Bauman Moscow State Technical University. He has worked for Yandex since 2007. Until June 2008 he specialized in load testing. Since June 2008 he has led a load testing team.

Presentation topic:

Tanks in Lunapark: load testing in Yandex.

Key points:

Every day millions of people use Yandex services, and the number of users grows from one month to the next. Updating existing services under such loads and starting new projects is impossible without testing. We will tell you about “Lunapark”, our load testing tool that was developed and deployed at Yandex. You will find out how and why we developed such a tool. We will also discuss the risks connected with developing your own tool and the advantages it can bring.

Download the presentation (pdf).

Alexander Dmitriev, Yandex

Graduated from the Faculty of Mechanics and Mathematics of Moscow State University. Before Yandex, he worked in the computer games industry and in 3D medical visualization. He has worked for Yandex since 2007, where he develops distributed computing systems.

Presentation topic:

Yet Another MapReduce.

Key points:

One of the most successful concepts in the parallel processing of large amounts of data today is MapReduce. Its simplicity and easy scaling mean that MapReduce lends itself to a multitude of uses. In this presentation, we look at how MapReduce is used at Yandex. In addition to the traditional topics of fault-tolerant data storage and data processing in large clusters, we also talk about extensions to the classical approach that emerged from solving practical problems.
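
For readers unfamiliar with the classical MapReduce model mentioned above, the following single-process Python sketch shows its three phases: map, group by key, and reduce. It is an illustration only and says nothing about how the Yandex system is implemented; `map_reduce`, `word_mapper` and `count_reducer` are hypothetical names, and a real system would spread each phase across many machines and handle failures.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values emitted for the same key.
            groups[key].append(value)
    # Reduce phase: fold each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["yet another MapReduce", "yet another conference"]
    print(map_reduce(lines, word_mapper, count_reducer))
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can be run in parallel on independent machines, which is where the easy scaling comes from.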

Watch the presentation.

Alexandre (Shura) Iline, Oracle

Alexandre Iline is a lead quality engineer at Sun Microsystems, working as a quality architect for Java and JavaFX and for several products in the Java SE and JavaFX portfolio, and also as a test tools architect. Prior to that, Alexandre was the quality lead and architect for products such as Java Studio Creator and NetBeans. Before that, he was a member and then lead of the NetBeans QE tools team. Alexandre is the author of Jemmy, the open-source UI testing tool widely used for testing Swing/AWT UI applications as well as for testing Swing itself. Jemmy v3 is a new-generation tool developed under Alexandre's lead which, along with the other UI libraries, makes it possible to test JavaFX UI. The tool is used successfully for testing the JavaFX SDK and JavaFX products developed internally.

Title:

UI test automation techniques by an example of JavaFX UI.

Abstract:

Test automation is an essential part of a software development process. Used wisely, it

  • optimizes testing resources,
  • increases testing quality,
  • leads to earlier bug detection,
  • makes it possible to build continuous development processes.

UI test automation requires tools, experience and an investment of human time. The session demonstrates a solution used by the Java and JavaFX quality team, based on experience of testing such products as Swing, NetBeans, the JavaFX SDK and the JavaFX Authoring tool. The solution is designed to address the key aspects of UI test automation: effectiveness, test base scalability, stability, and maintainability. The core part of the solution is Jemmy, an open-source high-level UI test library. During the session, Alexandre will create JavaFX UI tests from scratch and demonstrate an existing test base for real JavaFX products. He will give an overview of the JemmyFX API and explain how UI test automation aspects are applied to real test code. The session is intended for Java and JavaFX UI application developers and quality engineers as well as for everyone interested in UI testing techniques and approaches.

Download the presentation (pdf).

Richard James Cole, Skype

Richard Cole is a Product Manager with 15 years of experience in the high-tech software communications industry. Richard leads Product Management for SkypeKit Desktop and is responsible for the product strategy and market requirements for SkypeKit on Windows, Mac and Linux desktop operating systems. Prior to that, Richard was programme manager for the technical delivery of many of the successful services found in Skype. Richard holds an MSc and a BSc in Applied Chemistry from Imperial College, London.

Title:

Our connected future and the rise of real time video.

A short brief:

What is driving Skype's popularity today, and how SkypeKit is helping our partners to share in our success.

Download the presentation (pdf).

Evgeny Polyakov, Yandex

Graduated from the physics and quantum electronics faculty of the Moscow Institute of Physics and Technology in 2005. Since 2000 he has been doing Linux kernel development and working with computing systems ranging from embedded PPC systems to multi-machine clusters. Since 2003 he has taken part in network stack development and in supporting the Linux kernel cryptography subsystem. Since 2005 he has been researching file systems and large-scale data storage technologies.

Presentation topic:

Distributed data storage systems: implementation features of a DHT in the Elliptics network project.

Key points:

This presentation is about Elliptics network, a data storage system whose main task is to give users access to data located on physically distributed servers, using a flat address model in a decentralized environment. A distributed storage system that provides key/value access, and a distributed hash table in particular, is a fairly effective solution with few limitations. To demonstrate that the idea works, the presentation shows a practical implementation of a distributed hash table with a modular storage system and several access methods, from a POSIX file system to access via HTTP. We will also discuss the limitations of distributed hash table technology and compare the characteristics of high-load, highly reliable access in an unreliable environment with classic models that use centralized systems. The performance of the system will be illustrated with practical results, along with its flexibility and the ways its functionality can be extended.
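
As a rough illustration of a flat address model in a distributed hash table, the Python sketch below maps every key to a position on a hash ring and assigns it to the first node at or after that position. This is a generic consistent-hashing scheme written under stated assumptions, not the actual Elliptics routing algorithm; `key_id`, `HashRing` and the node names are hypothetical, and the choice of SHA-256 is arbitrary.

```python
import bisect
import hashlib


def key_id(data: str) -> int:
    """Map an arbitrary string to a position in a flat integer ID space."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class HashRing:
    """Assign each key to the first node whose ID is at or after the key's ID."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Nodes are placed on the ring at the hash of their names.
        self._ring = sorted((key_id(name), name) for name in nodes)
        self._ids = [node_id for node_id, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Find the first node clockwise from the key, wrapping around
        # to the start of the ring when the key is past the last node.
        index = bisect.bisect_left(self._ids, key_id(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for key in ("photo.jpg", "index.html", "backup.tar"):
        print(key, "->", ring.node_for(key))
```

No central directory is consulted: any client that knows the node list can compute the owner of a key locally, and removing a node only reassigns the keys that node owned.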

Download the presentation (pdf).

Vyacheslav Borilin, SPIRIT DSP

He has worked for SPIRIT for more than 8 years. As vice-president for products, he leads the marketing of SPIRIT's voice and video solutions under the TeamSpirit™ brand and designs the development strategy for SPIRIT’s new VoIP products and solutions. Before SPIRIT, he worked as a team leader for electronic commerce at Mail.ru, where he initiated a number of innovation projects, successfully predicting the main trends in the Russian high-tech market.

Presentation topic:

Developing video communication systems for a large number of users on the Internet. Effective coding and real-time traffic distribution. Signaling and firewall traversal techniques.

Key points:

The Internet has offered quality VoIP services for a long time now. However, they only really allow people to connect on a one-to-one (PC-to-PC) basis. What about a video service that allows the simultaneous connection of 10, 20 or even 50 people who can see each other and discuss, for instance, football? High-quality multiuser audio and video communication is a very complex problem. We will try to find out why it is so complex and see what such a system should contain. We will compare different audio and video codecs as well. We will also tell you about recent developments in audio and video coding, and try to characterize the basic mechanisms for adapting real-time traffic to network conditions such as packet loss, jitter and timeouts. We will provide recommendations on how to create a communication service that keeps voice and video quality high while bringing dozens of people together in one conversation. Besides the block responsible for media data delivery in communication systems, the signaling block also creates its own challenges. Designed especially for the Internet, this block supports media data transfer through NAT servers and firewalls. We will give examples and describe the main approaches and techniques for passing through such servers.

Download the presentation (pdf).

Dmitry Nikolaev, SUP

Worked at “Akko” Ltd. in Saratov as a programmer. He codes in C++, Delphi, Perl and Java and knows the MySQL and MSSQL databases. He has created and maintained websites. Dmitry developed the educational system for the Ministry of Labor and Social Security. He currently works as a lead developer for the statistics and ratings service at SUP in Moscow, using C/C++, Perl, and the MySQL and PostgreSQL databases.

Presentation topic:

Statistics and ratings systems in LiveJournal.com.

Key points:

This presentation is about the overall architecture and interconnection of components in the existing statistics and ratings systems at LiveJournal.com. We will look at the methods developed for collecting, processing and storing data according to its specific characteristics, volume and required functionality. We analyze the practicality of choosing between relational databases and alternative data storage methods designed for a given task, and the advantages and disadvantages of moving to a file-system model of data storage. We also briefly cover administration and resiliency, ways of extending the system’s functionality, and the problems that arise along with their solutions.

Download the presentation (pdf).

Ruslan Garaschuk, ABBYY

Graduated from the Moscow Institute of Physics and Technology (MIPT) in 1993. Since 1994 he has worked for ABBYY (BIT Software at the time). He took part in the development of various FineReader subsystems and developed technologies for the input of fixed and flexible forms. Since 2003 he has worked in the linguistic technologies department at ABBYY.

Presentation topic:

The distributed testing system in machine translation.

Key points:

This presentation shows the principles of developing distributed systems, using the machine translation testing system as an example. We understand the term “distributed system” to mean a system that uses a large number of computers to solve tasks that require a lot of processing time. We pay special attention to issues of system resiliency and scaling.

Download the presentation (pdf).

Oleg Yukhno, Yandex

He has worked in IT since 1998. In 2000 he got his first degree, in law, and in 2006 he graduated from Bauman Moscow State Technical University with a degree in information systems and technologies. Since 2005 he has worked at Yandex as a system administrator, and he leads an administration group. He is interested in the operation of highly available, high-load systems and in performance optimization, as well as in the Oracle database management system.

Presentation topic:

From statistics to statistics. The evolution of a system’s architecture, using the Yandex statistics calculation system as an example.

Key points:

This presentation tells you how the Yandex statistics calculation system was built and how it has since evolved. Oleg talks about the advantages and disadvantages of a highly centralized monolithic system and of a decentralized modular system, as well as several technological solutions that were used to increase the system’s performance.

Download the presentation (pdf).
